CsvReader.Read / CsvReader.ReadAsync duplicates data


Inside an Activity in Azure Durable Functions, I have a method that reads a CSV file starting from a given index. The purpose of this method is to read 2000 rows of the file, or to read until the last row, starting from the specified index.

With files larger than 2000 rows, we reach BucketSize and stop the while loop, so no duplicates occur.

However, with files shorter than the bucket size, the Read/ReadAsync method should return false once it reaches the end of the file. What happens instead is that, upon reaching the last row, the reader starts again from the specified index, and Read/ReadAsync returns false only after a full second pass. This results in duplicated values.

public async Task<List<TModel>> StartProcessingBatchAsync(Stream stream, long index, CancellationToken token)
{
    _logger.LogInformation("Processing CSV async");
    using var reader = new StreamReader(stream);
    using var csv = new CsvReader(reader, new CsvConfiguration(CultureInfo.InvariantCulture) { HasHeaderRecord = CsvHasHeaderRecord });
    csv.Context.RegisterClassMap(_modelMap);

    stream.Position = 0;
    csv.Read();
    csv.ReadHeader();
    stream.Position = index;
    var modelsList = new List<TModel>();
    var errorCount = 0;
    var rowCounter = 0;

    _logger.LogInformation($"Reading {_configuration.BucketSize} items starting from index: {stream.Position}");
    while(await csv.ReadAsync() && rowCounter < _configuration.BucketSize)
    {
        TModel record;

        try
        {
            record = csv.GetRecord<TModel>();

            Guard.NotNull(record, nameof(record));
        }
        catch
        {
            errorCount++;
            continue;
        }
        modelsList.Add(record);
        rowCounter++;
    }
    _logger.LogInformation($"Found {errorCount} errors out of {_configuration.BucketSize} items");

    return modelsList;
}

Any ideas why this happens? Why does Read/ReadAsync go for a second run instead of returning false at the end of the file?
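One possible explanation, offered as a sketch rather than a confirmed diagnosis: StreamReader keeps its own buffer, so setting stream.Position after the first read does not change what the reader has already pulled in. With a small file, the whole file may be buffered while reading the header; the reader then drains that buffer and, when it asks the stream for more, the stream is sitting at the repositioned index, so the same rows come back a second time. The minimal repro below uses only StreamReader (no CsvHelper), and the three-line file and byte offset are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical three-line file: a header plus two data rows.
var bytes = Encoding.UTF8.GetBytes("Id\n1\n2\n");
using var stream = new MemoryStream(bytes);
using var reader = new StreamReader(stream);

// The first read pulls the whole (small) stream into the reader's buffer.
var header = reader.ReadLine();

// Repositioning the stream does not discard what the reader already buffered.
stream.Position = 3; // byte offset of the first data row ("Id\n" is 3 bytes)

var rows = new List<string>();
string? line;
while ((line = reader.ReadLine()) != null)
{
    // First the buffered rows are returned, then the reader refills
    // from the stream, which now starts again at offset 3.
    rows.Add(line);
}

Console.WriteLine(string.Join(",", rows));
```

From my reading of StreamReader's buffering, this should print 1,2,1,2, which matches the duplication described above. If that is the cause, calling reader.DiscardBufferedData() after changing the position would help in the plain StreamReader case. With CsvHelper it is probably safer to set stream.Position first and only then create the StreamReader and CsvReader, on the assumption that CsvReader also buffers internally.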

There are 0 answers