What's a good library for parsing fixed-length records in Groovy?

I want a library that I can give a file plus a config parameter describing each column's length, name, and possibly type, and get back a map of the columns for each row.

This isn't a difficult thing to do on my own, but I would be surprised if there weren't already a great solution. I've searched for one but had no luck.
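
Concretely, something along these lines is what I have in mind. This is only a sketch of the desired behaviour, not an existing library: `parseFixedWidth` and the `name`/`width`/`type` spec keys are made-up names, and only `Integer` conversion is handled.

```groovy
// Hypothetical helper (not an existing library): parse fixed-width text into maps.
// Each column spec is a map with a `name`, a `width` in characters and an optional
// `type`; only Integer is converted here, everything else stays a trimmed String.
List<Map> parseFixedWidth(Reader reader, List<Map> columns) {
    // Turn the widths into [start, end) offsets once, up front
    def offsets = []
    int recordWidth = 0
    columns.each { col ->
        int width = col.width as int
        offsets << [recordWidth, recordWidth + width]
        recordWidth += width
    }

    def rows = []
    reader.eachLine { String line ->
        if (!line.trim()) return                      // skip blank lines
        String padded = line.padRight(recordWidth)    // tolerate stripped trailing blanks
        def row = [:]
        columns.eachWithIndex { col, i ->
            String raw = padded.substring(offsets[i][0], offsets[i][1]).trim()
            row[col.name] = (col.type == Integer) ? (raw ? raw.toInteger() : null) : raw
        }
        rows << row
    }
    rows
}

// Usage with the same sample data as the answers
def specs = [
    [name: 'first', width: 10],
    [name: 'last',  width: 10],
    [name: 'age',   width: 10, type: Integer]
]
def sample = 'JOHN      DOE       123       \nJANE      ROE       456       \n'
def people = parseFixedWidth(new StringReader(sample), specs)
assert people[1] == [first: 'JANE', last: 'ROE', age: 456]
```

For a real file this would be called as `new File(path).withReader { parseFixedWidth(it, specs) }`, where `path` is whatever file you have.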

There are 3 answers

Jonny Heggheim (best answer)

You could use `FlatFileItemReader` from Spring Batch: with a `FixedLengthTokenizer`, each line is split into a `FieldSet`, which works much like a JDBC `ResultSet`.

But that may be overkill and add complexity. In Groovy, I find code like this easy to read and write:

```groovy
def file = '''\
JOHN      DOE       123       
JANE      ROE       456       
'''

// Ranges are zero-based and inclusive: 0..9 is the first 10 characters
def names = []
file.eachLine { names << [
    first: it[0..9].trim(),
    last:  it[10..19].trim(),
    age:   it[20..22].toInteger()
]}

assert names[0].first == 'JOHN'
assert names[1].age == 456
```
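
The question asks about files rather than string literals. The same slicing works line by line on a `File`; here is a minimal sketch, assuming a UTF-8 file with 30-character records (the temp file and its contents exist only to make the example runnable). Padding each line with `padRight` before slicing avoids an exception when an editor has stripped trailing blanks, since `it[20..22]` on a shorter line throws.

```groovy
// Write a small sample file so the example is self-contained (contents are made up)
def dataFile = File.createTempFile('people', '.txt')
dataFile.deleteOnExit()
dataFile.write('JOHN      DOE       123       \n' +
               'JANE      ROE       456\n' +   // trailing blanks stripped by an editor
               'BOB       SMITH\n' +           // age field missing entirely
               '\n', 'UTF-8')                  // stray blank line

def people = []
dataFile.eachLine('UTF-8') { line ->
    if (!line.trim()) return             // skip blank lines (acts like `continue`)
    def rec = line.padRight(30)          // pad to the full record width before slicing
    def age = rec[20..22].trim()
    people << [
        first: rec[0..9].trim(),
        last:  rec[10..19].trim(),
        age:   age ? age.toInteger() : null
    ]
}
```

Note that these widths count characters, not bytes. If a file spec gives byte offsets and the data contains multi-byte characters, the slices will not line up.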

Tim H

I benchmarked the capture-group regex approach against slicing with String's `getAt(Range)`. Over roughly 10,000 lines, `getAt` was about 2x faster than the regex:

```groovy
// Build ~10k identical 30-character records (StringBuilder avoids quadratic String concatenation)
def sb = new StringBuilder()
for (int i = 1; i < 10000; i++) {
    sb << "JOHN      DOE       123       \n"
}
def input = sb.toString()

def fieldDefs = [firstName: 10, lastName: 10, someValue: 10]

// Returns how long the closure took, in milliseconds
def benchmark = { Closure closure ->
    long start = System.currentTimeMillis()
    closure.call()
    System.currentTimeMillis() - start
}

def pattern = "^" + fieldDefs.collect { k, v -> "(.{$v})" }.join('') + "\$"

// Method 1: slice each line with String.getAt(Range)
def stringDuration = benchmark {
    def rows = []
    input.eachLine { line ->
        String firstName = line.getAt(0..9).trim()
        String lastName = line.getAt(10..19).trim()
        String someValue = line.getAt(20..29).trim()
        rows << [firstName: firstName, lastName: lastName, someValue: someValue]
    }
    //println rows
}

println "execution of string method took ${stringDuration} ms"

// Method 2: one regex capture group per field
def regexDuration = benchmark {
    def rows = []
    input.eachLine { line ->
        def m = line =~ pattern
        if (m) {
            def names = fieldDefs.keySet() as List
            def values = m[0][1..-1].collect { it.trim() }
            rows << [names, values].transpose().collectEntries { it }
        }
    }
    //println rows
}

println "execution of regex method took ${regexDuration} ms"
```

Output:

```
execution of string method took 245 ms
execution of regex method took 505 ms
```
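
As far as I know, when the right-hand side of `=~` is a String, as `pattern` is above, Groovy builds a new `java.util.regex.Pattern` for every line, so part of that gap may be pattern compilation rather than matching. Below is a sketch of a fairer comparison that compiles the pattern once with Groovy's `~` operator and times with `System.nanoTime()`. I have not run it for numbers, so no timings are claimed here.

```groovy
// Same two approaches, but the regex is compiled once and timing uses nanoTime
def fieldDefs = [firstName: 10, lastName: 10, someValue: 10]
def names = fieldDefs.keySet() as List

def sb = new StringBuilder()
10000.times { sb << 'JOHN      DOE       123       \n' }
def input = sb.toString()

// ~(String) yields a java.util.regex.Pattern, so it is compiled exactly once
def compiled = ~('^' + fieldDefs.collect { k, v -> "(.{$v})" }.join('') + '$')

// Runs the closure and returns [its result, elapsed milliseconds]
def timed = { Closure work ->
    long t0 = System.nanoTime()
    def result = work()
    [result, (System.nanoTime() - t0).intdiv(1_000_000)]
}

def (sliceRows, sliceMs) = timed {
    def rows = []
    input.eachLine { line ->
        rows << [firstName: line[0..9].trim(), lastName: line[10..19].trim(), someValue: line[20..29].trim()]
    }
    rows
}

def (regexRows, regexMs) = timed {
    def rows = []
    input.eachLine { line ->
        def m = compiled.matcher(line)
        if (m.matches()) {
            // group(0) is the whole line; groups 1..n are the fields in fieldDefs order
            def values = (1..m.groupCount()).collect { m.group(it).trim() }
            rows << [names, values].transpose().collectEntries { it }
        }
    }
    rows
}

println "getAt: ${sliceMs} ms, precompiled regex: ${regexMs} ms"
```

A single run in a fresh JVM also includes JIT warm-up, so repeating each block a few times and ignoring the first run gives steadier numbers.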

ataylor

I don't know of anything specific to Groovy. I've done something similar with regular expressions; here's a quick-and-dirty parser based on that approach:

```groovy
def input = "JOHN      DOE       123       \n" +
            "JANE      ROE       456       \n"

// Field name -> width in characters (Groovy map literals keep insertion order)
def fieldDefs = [firstName: 10, lastName: 10, someValue: 10]

// One capture group per field, e.g. ^(.{10})(.{10})(.{10})$
def pattern = "^" + fieldDefs.collect { k, v -> "(.{$v})" }.join('') + "\$"

def rows = []
input.eachLine { line ->
    def m = line =~ pattern
    if (m) {
        def names = fieldDefs.keySet() as List
        def values = m[0][1..-1].collect { it.trim() }   // drop group 0 (the whole line)
        rows << [names, values].transpose().collectEntries { it }
    }
}
```
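
One drawback of the quick-and-dirty version is that a line of the wrong length simply fails to match and disappears from the result. Here is a sketch that keeps such lines, with their line numbers, so they can be reported. `parseWithRejects` is a hypothetical name, and the pattern is compiled once with `~`.

```groovy
// Variant of the regex parser that reports lines it could not parse instead of
// silently skipping them. `parseWithRejects` is a hypothetical name.
Map parseWithRejects(String input, Map fieldDefs) {
    // Compile once; one capture group per field width
    def pattern = ~('^' + fieldDefs.collect { k, v -> "(.{$v})" }.join('') + '$')
    def names = fieldDefs.keySet() as List
    def rows = []
    def rejects = []
    // eachLine(1) numbers lines starting at 1
    input.eachLine(1) { String line, int lineNo ->
        def m = pattern.matcher(line)
        if (m.matches()) {
            def values = (1..m.groupCount()).collect { m.group(it).trim() }
            rows << [names, values].transpose().collectEntries { it }
        } else {
            rejects << [line: lineNo, text: line]   // keep the line number and raw text
        }
    }
    [rows: rows, rejects: rejects]
}

def result = parseWithRejects(
    "JOHN      DOE       123       \nJANE ROE 456\n",
    [firstName: 10, lastName: 10, someValue: 10])
result.rejects.each { println "line ${it.line} did not match: '${it.text}'" }
```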