Parsing keyed lists from a file in Tcl?

I have a file full of records in the following format:

{TOKEN 
    { NAME {name of this token} }
    { GROUPS {Group 1} }
    { VALUE value }
    { REPEATING {
        { MAX 3 }
        { TIME {nmin 30} }
    } }
    { WINDOW */*/*/* }
    { ACTION {
        { EXEC {code to run here} }
    } }
}
{TOKEN 
    { NAME {name of next token} }
    { GROUPS {Group 1} }
    { VALUE value }
    { WINDOW 0/0:30-2:00,3:30-7:30/*/* }
    { HOST {localhost} }
    { ACTION {
        { email {
            { FROM [email protected] }
            { TO [email protected] }
            { SUBJ {email subject test} }
            { MSG {this is the email body} }
        } }
    } }
}

Not all of the records have the same keywords but they all are nested keyed lists and I need to parse them into a .csv file for easier review. However, when I read in the file, it comes in as a single string rather than as a list of keyed lists. Splitting on whitespace or newline wouldn't help because they are located inside the keyed lists too. I tried to insert a pipe (|) between }\n and {T and split on the pipe but I still ended up with strings.

I hope someone can point me in the right direction to parse these s-expression files.

thanks in advance!

J

4

There are 4 answers

5
Donal Fellows On BEST ANSWER

That looks like a list of TclX keyed lists, which were an earlier attempt to do what modern Tcl does with dictionaries. Keyed lists nest quite nicely — that's a tree, not a table — so mapping to CSV will not be maximally efficient, but their syntax is such that the easiest way to handle them is with the TclX code.

Preliminaries:

package require Tclx;       # Note the case: the package is named "Tclx"
package require csv;        # From Tcllib

List the columns that we're going to be interested in. Note the . separating bits of names.

set columns {
    TOKEN.NAME TOKEN.GROUPS TOKEN.VALUE TOKEN.REPEATING.MAX TOKEN.REPEATING.TIME
    TOKEN.WINDOW TOKEN.HOST TOKEN.ACTION.EXEC TOKEN.ACTION.email.FROM
    TOKEN.ACTION.email.TO TOKEN.ACTION.email.SUBJ TOKEN.ACTION.email.MSG
}
# Optionally, put a header row in:
puts [csv::join $columns]

Loading the real data into Tcl:

set f [open "thefile.dta"]
set data [read $f]
close $f

Iterate over the lists, extract the info, and send to stdout as CSV:

foreach item $data {
    # Ugly hack to munge data into real TclX format
    set item [list [list [lindex $item 0] [lrange $item 1 end]]]
    set row {}
    foreach label $columns {
        if {![keylget item $label value]} {set value ""}
        lappend row $value
    }
    puts [csv::join $row]
}

Or something like that.
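
If TclX is not installed, the dotted-key lookup can be approximated in plain Tcl. The following is only a sketch: klget is a hypothetical helper (not a TclX command), and it assumes the leading TOKEN word has already been stripped (for example with lrange $item 1 end) and that every level is a well-formed list of two-element {key value} pairs.

```tcl
# Hypothetical helper (not part of TclX): look up a dotted key such as
# REPEATING.TIME in a keyed list stored as a list of {key value} pairs.
# Returns 1 and sets the caller's variable on success, 0 otherwise.
proc klget {klist path varName} {
    upvar 1 $varName result
    set current $klist
    foreach key [split $path .] {
        # A plain string value may not be a valid list, so guard it.
        if {[catch {llength $current}]} {return 0}
        set found 0
        foreach pair $current {
            # Skip anything that is not a two-element {key value} pair.
            if {[catch {llength $pair} n] || $n != 2} continue
            if {[lindex $pair 0] eq $key} {
                set current [lindex $pair 1]
                set found 1
                break
            }
        }
        if {!$found} {return 0}
    }
    set result $current
    return 1
}

# Sample record in the question's format, without the leading TOKEN.
set record {
    { NAME {name of this token} }
    { REPEATING {
        { MAX 3 }
        { TIME {nmin 30} }
    } }
}
if {[klget $record REPEATING.TIME value]} {
    puts $value
}
```

Like keylget with a result variable, the helper reports a missing key through its return value rather than raising an error, so it can drop into the column loop above in place of keylget.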

6
Hai Vu On

The Problem

Here is how I understand your problem.

  • You have a text file full of records. Each record is {TOKEN ...}
  • Each record is almost a keyed list, but not quite: the string TOKEN makes it an invalid keyed list. If we remove this string, then the rest will be a valid keyed list.
  • Each keyed list might be nested. That is, the value might be another keyed list.
  • You want to write each record as a row in a CSV file. However, in a CSV file, each row should contain the same number of columns, which is not the case here. I will leave it for you to find out how to best deal with it.
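
One possible way to deal with the differing column counts in the last point is to collect the union of all keys first, then emit an empty cell wherever a record lacks a key. This is a minimal sketch, assuming Tcl 8.5 or later (for dict and the ni operator) and records that have already been flattened to key/value pairs. The sample records and the csvField and csvRow helpers are illustrative only; for real data, csv::join from Tcllib is the safer choice.

```tcl
# Illustrative records, already flattened to key/value pairs.
set records [list \
    [dict create NAME "name of this token" VALUE value MAX 3] \
    [dict create NAME "name of next token" VALUE value HOST localhost]]

# Hypothetical helper: quote a field that contains a comma, quote or newline.
proc csvField {s} {
    if {[regexp {[",\n]} $s]} {
        set s [string map [list \" \"\"] $s]
        return \"$s\"
    }
    return $s
}

# Hypothetical helper: join a list of fields into one CSV line.
proc csvRow {fields} {
    set out {}
    foreach f $fields {lappend out [csvField $f]}
    return [join $out ,]
}

# Pass 1: collect every key, in first-seen order.
set columns {}
foreach rec $records {
    foreach key [dict keys $rec] {
        if {$key ni $columns} {lappend columns $key}
    }
}

# Pass 2: header row, then one row per record with empty cells for
# keys that the record does not have.
puts [csvRow $columns]
foreach rec $records {
    set row {}
    foreach key $columns {
        if {[dict exists $rec $key]} {
            lappend row [dict get $rec $key]
        } else {
            lappend row ""
        }
    }
    puts [csvRow $row]
}
```

Two passes are needed because the full set of columns is not known until every record has been seen.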

The Solution

What I suggest is to flatten each record into a dictionary-style list of key/value pairs with no nesting. Once the data is flat, writing it out becomes much easier. Here is my solution:

# myscript.tcl

package require Tclx

proc makeKey {prefix key} {
    return [string trim "$prefix $key"]
}   

proc keyedlist2dict {klname {keyPrefix ""}} {
    upvar 1 $klname kl
    set d {}
    foreach key [keylkeys kl] {
        set value [keylget kl $key]
        if {[catch {keylkeys value}]} {
            # value is not a nested keyed list
            lappend d [makeKey $keyPrefix $key] $value
        } else {
            # value is a nested keyed list; only the immediate parent key
            # is passed down as the prefix (hence "email FROM" in the output)
            set d [concat $d [keyedlist2dict value $key]] ;# TCL 8.4
        }   
    }   

    return $d
}   

set fh [open data.txt]
set contents [read $fh]
close $fh
foreach item $contents { 
    # Each item starts with "TOKEN", which we need to remove otherwise
    # the keyed list is invalid
    set item [lrange $item 1 end]

    # Convert the keyed list to a flat key/value list. We can then
    # display it or write it to a file.
    set rec [keyedlist2dict item]

    # Display it
    foreach {key value} $rec { ;# TCL 8.4
        puts "$key: $value"
    }   
    puts ""
}   

Run the Script

tclsh myscript.tcl

Output

NAME: name of this token
GROUPS: Group 1
VALUE: value
REPEATING MAX: 3
REPEATING TIME: nmin 30
WINDOW: */*/*/*
ACTION EXEC: code to run here

NAME: name of next token
GROUPS: Group 1
VALUE: value
WINDOW: 0/0:30-2:00,3:30-7:30/*/*
HOST: localhost
email FROM: [email protected]
email TO: [email protected]
email SUBJ: email subject test
email MSG: this is the email body

Discussion

  • I assume your data is in data.txt
  • The workhorse here is keyedlist2dict, where I take a keyed list and flatten it out to become a dictionary.
    • In this procedure, if the value is not a nested keyed list, I just append the key and value to the dictionary
    • If the value is indeed a nested keyed list, then I recursively call keyedlist2dict
    • Take a look at the output and you will see how I form the new keys
  • The original version of this script required TCL version 8.5 or later (see the update below)

Update

I made changes to the two lines which I marked TCL 8.4. The script should now work on a TCL 8.4 system.
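
In the output above, keys nested more than one level deep keep only their immediate parent (email FROM rather than ACTION email FROM). If the full key path is wanted, a variant along the following lines might do. It is a plain-Tcl sketch rather than the script above: looksKeyed and flatten are hypothetical names, the sample values are made up, and the nested-list test is a heuristic (a non-empty list whose elements all have exactly two elements), not the exact rule that keylkeys applies.

```tcl
# Hypothetical helper: guess whether a value is a nested keyed list.
# Heuristic only: a plain value such as "{a b} {c d}" would be misjudged.
proc looksKeyed {value} {
    if {[catch {llength $value} n] || $n == 0} {return 0}
    foreach elem $value {
        if {[catch {llength $elem} m] || $m != 2} {return 0}
    }
    return 1
}

# Flatten a keyed list into a flat key/value list, joining every level
# of the key path with a dot.
proc flatten {klist {prefix ""}} {
    set flat {}
    foreach pair $klist {
        foreach {key value} $pair break   ;# 8.4-compatible unpacking
        if {$prefix ne ""} {set key $prefix.$key}
        if {[looksKeyed $value]} {
            foreach {k v} [flatten $value $key] {lappend flat $k $v}
        } else {
            lappend flat $key $value
        }
    }
    return $flat
}

# Made-up record in the question's format, without the leading TOKEN.
set item {
    { NAME {name of next token} }
    { ACTION {
        { email {
            { FROM sender }
            { SUBJ {email subject test} }
        } }
    } }
}
foreach {key value} [flatten $item] {
    puts "$key: $value"
}
```

The only structural change from the recursive call above is that the prefix handed down is the already-joined path, not just the current key.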

0
glenn jackman On

You could treat the data as plain lists and read it line-by-line. The info complete command helps here:

set fh [open your.file r]
while {[gets $fh line] != -1} {
    append kl $line
    if {[info complete $kl]} {
        lappend lists $kl
        set kl ""
    }
}
close $fh
puts [llength $lists]                ;# 2
puts [llength [lindex $lists 0]]     ;# 1
puts [llength [lindex $lists 0 0]]   ;# 7
puts $lists

{{TOKEN { NAME {name of this token} } { GROUPS {Group 1} } { VALUE value } { REPEATING { { MAX 3 } { TIME {nmin 30} } } } { WINDOW */*/*/* } { ACTION { { EXEC {code to run here} } } }}} {{TOKEN { NAME {name of next token} } { GROUPS {Group 1} } { VALUE value } { WINDOW 0/0:30-2:00,3:30-7:30/*/* } { HOST {localhost} } { ACTION { { email { { FROM [email protected] } { TO [email protected] } { SUBJ {email subject test} } { MSG {this is the email body} } } } } }}}
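
The loop above depends on info complete, which reports whether a string has no unclosed braces, brackets or quotes. A small illustration with made-up input follows. Unlike the loop above, this sketch keeps the newline between lines when appending, which is an optional choice here.

```tcl
# Made-up three-line record; only the last line closes the outer brace.
set text "{TOKEN\n    { NAME {demo} }\n}"

set buffer ""
foreach line [split $text "\n"] {
    append buffer $line "\n"
    # Reports 0 while the outer brace is still open, 1 once it closes.
    puts [info complete $buffer]
}

# Once complete, the buffer is a one-element list holding the record,
# so nested indexing reaches the leading keyword.
puts [lindex $buffer 0 0]
```

Because the braces are balanced once a record is complete, ordinary list commands such as lindex and lrange can take the record apart without any manual splitting.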

1
jslaker On

I realize this is a few months old at this point, but I see that you're trying to parse Cloverleaf config files (which is how I stumbled on this myself).

For anyone else trying to do something similar, there are actually libraries available for handling this provided with Cloverleaf, though they're not mentioned anywhere in the documentation.

Check out $HCIROOT/tcl/lib/cloverleaf. Handling for alert configs looks like it's in configIO.tlib. NetConfig stuff is in nci.tlib and netData.tlib.