VBScript processing CSV (RFC4180)

197 views Asked by At

What I'm attempting to do is read over a RFC4180 compliant csv file, and throw each value into an array, for processing further on down the script. Below is the most convoluted but compliant csv line I could come up with, and it works, but this script will be customer facing, so I'd love it if you could have a look over and test the logic / show me the parts where I've missed things.

I'd love best practice pointers too, if possible.

This is an extract, basically I read in a line from the CSV file, and iterate over each char, testing and applying different things based on my understanding of RFC 4180. chr(34) represents double quote ("), it's the only way I could find to do a comparison in VBScript.

If isFirstChar = True And thisChar = chr(34) Then
    'Mark it as in quotes, but don't print it, as it's not part of the text meant for the din file
    isInQuotes = True
    isFirstChar = False
ElseIf thisChar = chr(34) And isInQuotes = True Then
    If nextChar = chr(34) Then
    'Per RFC4108, "" is an escape sequence
    'Print it, jump up the old CharCounter to prevent double handling.
        CharCounter = CharCounter +1
        FieldData = FieldData & thisChar
        isFirstChar = False
    ElseIf nextChar = "," Then
    'It's the end of the field, we can set the isInQuotes to false so that the next char is handled as end of field
    isInQuotes = False
    isFirstChar = False
    Else
    'CSV isn't RFC4180 compliant
    WScript.Echo "This CSV isn't RFC4180 compliant, please fix the file or contact <redacted> for assistance."
    End If
ElseIf thisChar = "," Then
    If isInQuotes = False Then
    'End of Field, handle appropriately
        FieldCounter = FieldCounter + 1
        Redim Preserve FieldArray(FieldCounter)
        FieldArray(FieldCounter) = FieldData
        FieldData = ""
        isFirstChar = True
    Else
    'In quotes, handle as regular char
    FieldData = FieldData & thisChar
    isFirstChar = False
    End If
Else
    'Got all the way here, it's just a regular character! (We hope)
    FieldData = FieldData & thisChar
    isFirstChar = False
End If

And below is an example of a row from a CSV record:

"Luke ,, Pearson","Luke ""Lucky""","""111 ""Brown Mountain Cres",,"CO""OROY",QLD,4563,,1234567,1,1.11,N,AT FRONT GATE,0712345678,0.0022,2
1

There are 1 answers

0
Chris On

I wanted a standalone solution so here is what I came up with. (I am sure there is room for improvement):

 Private Function CSVSplit(ByVal csv As String) As String()
    ' see RFC 4180
    If csv = Empty Then
        Dim emptyArray(-1 To -1) As String
        CSVSplit = emptyArray
        Exit Function
    End If

    Dim i As Long
    Dim tokenBuf As Collection: Set tokenBuf = New Collection
    Dim tokens As Collection: Set tokens = New Collection
    Dim token As String
    Dim inQuotes As Boolean
    Dim nextChar As String
    Dim c_ As Variant

    For i = 1 To Len(csv)
        Dim c As String: c = Mid(csv, i, 1)

        If i = Len(csv) Then
            nextChar = Empty
        Else
            nextChar = Mid(csv, i + 1, 1)
        End If

        Select Case c
            Case ","
                If inQuotes Then
                    Call tokenBuf.Add(c)
                Else
                    token = ""
                    For Each c_ In tokenBuf
                        token = token & c_
                    Next
                    Call tokens.Add(token)
                    Set tokenBuf = New Collection
                End If

            Case """"
                If Not inQuotes Then
                    inQuotes = True
                ElseIf nextChar = """" Then
                    i = i + 1
                    c = Mid(csv, i, 1)
                    ' quote literal
                    Call tokenBuf.Add(c)
                Else
                    inQuotes = False
                End If

            Case Else
                Call tokenBuf.Add(c)
        End Select
    Next

    token = ""
    For Each c_ In tokenBuf
        token = token & c_
    Next
    Call tokens.Add(token)

    Dim result() As String
    ReDim result(tokens.count - 1)

    For i = 0 To tokens.count - 1
        result(i) = tokens(i+1)
    Next

    CSVSplit = result
End Function

Tested with the following (Note the WriteLine function is just a custom function that outputs to a console for testing purposes.:

Dim s As Variant
Dim ss(3) As String
ss(0) = "Panes, Trains, Automobiles"

ss(1) = "Wide, He Said ""Tall, Skinny and round"", ""Narrow, but """"manageable"""""

ss(2) = "Speaker: Cindy, Greeting: """"""Hello, " & vbNewLine & "world"""""", Farewell: Goodbye"

ss(3) = "Lions, Tigers, Bears,"

For Each s in ss
    Dim arr() As String: arr = CSVSplit(s)
    Dim i As Integer
    For i = 0 To UBound(arr)
        Call WriteLine("_________________________________________________")
        Call WriteLine(arr(i))
        Call WriteLine("_________________________________________________")
        Call WriteLine(vbNewLine)
    Next
Next

Output:

_________________________________________________
Panes
_________________________________________________


_________________________________________________
 Trains
_________________________________________________


_________________________________________________
 Automobiles
_________________________________________________


_________________________________________________
Wide
_________________________________________________


_________________________________________________
 He Said Tall, Skinny and round
_________________________________________________


_________________________________________________
 Narrow, but "manageable"
_________________________________________________


_________________________________________________
Speaker: Cindy
_________________________________________________


_________________________________________________
 Greeting: "Hello,
world"
_________________________________________________


_________________________________________________
 Farewell: Goodbye
_________________________________________________


_________________________________________________
Lions
_________________________________________________


_________________________________________________
 Tigers
_________________________________________________


_________________________________________________
 Bears
_________________________________________________


_________________________________________________

_________________________________________________