Parsing CSV from stream with TextFieldParser always reaches EndOfData


When I parse a CSV file as a stream downloaded from Azure Blob storage, TextFieldParser reports EndOfData immediately, without reading any data. The same code works when I pass the path to the same physical file instead of a stream.

    Dim storageAccount As CloudStorageAccount = CloudStorageAccount.Parse(AzureStorageConnection)
    Dim blobClient As CloudBlobClient = storageAccount.CreateCloudBlobClient()
    Dim BlobList As IEnumerable(Of CloudBlockBlob) = blobClient.GetContainerReference("containername").ListBlobs().OfType(Of CloudBlockBlob)

    For Each blb In BlobList
        Dim myList As New List(Of MyBusinessObject)

        Using memoryStream = New MemoryStream()
            blb.DownloadToStream(memoryStream)

            Using Reader As New FileIO.TextFieldParser(memoryStream)
                Reader.TextFieldType = FileIO.FieldType.FixedWidth
                Reader.SetFieldWidths(2, 9, 10)
                Dim currentRow As String()
                While Not Reader.EndOfData
                    Try
                        currentRow = Reader.ReadFields()
                        myList.Add(New GsmXFileRow() With {
                        ' code to read currentRow and add elements to myList
                        })
                    Catch ex As FileIO.MalformedLineException
                    End Try
                End While
            End Using
        End Using
    Next

I have also tried wrapping the MemoryStream in a TextReader:

    Dim myTextReader As TextReader = New StreamReader(memoryStream)

and then passing myTextReader to the TextFieldParser, but this does not work either:

    Using Reader As New FileIO.TextFieldParser(myTextReader)

Joel Coehoorn (best answer)

I see this:

Value of Length property equals file size

and this:

'Position' property has same value

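That means that by the time the read loop starts, the MemoryStream is already positioned at the end of its data: DownloadToStream writes into the stream and leaves Position just past the last byte written, so the parser finds nothing left to read. Set Position back to 0 before handing the stream to the parser, and you should be in a better place.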
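The effect can be reproduced without Azure. The following is a minimal sketch, assuming a hand-filled MemoryStream as a stand-in for the downloaded blob; the module name and the sample bytes are made up for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module PositionDemo
    Sub Main()
        ' Made-up sample bytes standing in for the blob content
        Dim bytes As Byte() = Encoding.UTF8.GetBytes("AB123456789ABCDEFGHIJ" & Environment.NewLine)

        Using ms As New MemoryStream()
            ' Writing into the stream advances Position to the end of the data
            ms.Write(bytes, 0, bytes.Length)
            Console.WriteLine($"After write:  Length={ms.Length}, Position={ms.Position}")

            ' Rewind so that the next reader starts at the first byte
            ms.Position = 0
            Console.WriteLine($"After rewind: Length={ms.Length}, Position={ms.Position}")
        End Using
    End Sub
End Module
```

After the write, Length and Position are equal, which matches the symptom quoted from the question; after the rewind, Position is 0 and a reader has the whole content in front of it.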

However, there may be another issue here, too. The stream holds raw bytes in some unknown encoding, while the TextFieldParser works with text. You need a way to tell the TextFieldParser which encoding is used.

In this case, I recommend a StreamReader. This type inherits from TextReader, so you can use it with the TextFieldParser:

Dim storageAccount As CloudStorageAccount = CloudStorageAccount.Parse(AzureStorageConnection)
Dim blobClient As CloudBlobClient = storageAccount.CreateCloudBlobClient()
Dim BlobList As IEnumerable(Of CloudBlockBlob) = blobClient.GetContainerReference("containername").ListBlobs().OfType(Of CloudBlockBlob)

Dim myList As New List(Of MyBusinessObject)
For Each blb In BlobList

    'Several constructor overloads allow you to specify the encoding here
    Using blobData As New StreamReader(New MemoryStream())
        'StreamReader exposes the wrapped stream as BaseStream
        blb.DownloadToStream(blobData.BaseStream)

        'Fix the position problem
        blobData.BaseStream.Position = 0

        Using Reader As New FileIO.TextFieldParser(blobData)
            Reader.TextFieldType = FileIO.FieldType.FixedWidth
            Reader.SetFieldWidths(2, 9, 10)
            Dim currentRow As String()
            While Not Reader.EndOfData
                Try
                    'Read inside the loop so the last row is not skipped
                    currentRow = Reader.ReadFields()
                    myList.Add(New GsmXFileRow() With {
                        ' code to read currentRow and add elements to myList
                    })
                Catch ex As FileIO.MalformedLineException
                    'Malformed lines are skipped
                End Try
            End While
        End Using 
    End Using
Next
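To try the rewind-and-decode pattern without a storage account, here is a self-contained sketch under a few assumptions: the blob is replaced by an in-memory string, the encoding is assumed to be UTF-8 (substitute whatever the files really use), and the field widths are the ones from the question. The module name and sample rows are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic.FileIO

Module FixedWidthDemo
    Sub Main()
        ' Two made-up fixed-width rows (2 + 9 + 10 characters each)
        Dim sample As String = "AB123456789ABCDEFGHIJ" & Environment.NewLine &
                               "CD987654321KLMNOPQRST" & Environment.NewLine
        Dim rows As New List(Of String())

        Using ms As New MemoryStream()
            Dim bytes As Byte() = Encoding.UTF8.GetBytes(sample)
            ms.Write(bytes, 0, bytes.Length)   ' stands in for blb.DownloadToStream(ms)
            ms.Position = 0                    ' rewind before anything reads from it

            ' Assumption: the data is UTF-8; pass the real encoding if it differs
            Using reader As New StreamReader(ms, Encoding.UTF8)
                Using parser As New TextFieldParser(reader)
                    parser.TextFieldType = FieldType.FixedWidth
                    parser.SetFieldWidths(2, 9, 10)

                    ' Read each row inside the loop so the final row is included
                    While Not parser.EndOfData
                        rows.Add(parser.ReadFields())
                    End While
                End Using
            End Using
        End Using

        ' Print the parsed fields, separated by a pipe character
        For Each fields As String() In rows
            Console.WriteLine(String.Join("|", fields))
        Next
    End Sub
End Module
```

Downloading and rewinding the MemoryStream first, and only then wrapping it in a StreamReader, keeps the order of operations explicit: the reader is created once the stream is already positioned at the start.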