TXR lisp: processing TXR collected data

251 views Asked by At

I have the following lisp data for which I would like to achieve a specific output. I got this output using @(do (prinl order) (prinl location) ...) from my TXR parser.

(defvar order '(0 1 2 3 4 5))
(defvar location 
  '("shape" "shape/rectangle" "shape/square" "shape/rectangle" "shape/rectangle" ""))
(defvar headings 
  '(("geometer") ("id" "width: cm" "height: cm") 
    ("id" "length: m") ("id" "width: cm" "height: cm")
    ("angle: °") ("year" "month" "day")))
(defvar values 
  '(("Alice") 
    (("1" "13" "15") ("2" "12" "14"))
    (("1" "10") ("2" "5") ("3..5" "7") 
     ("6;8" "15;12") ("7" "20") ("9..10" "25;30"))
    (("3" "5" "12.2")) ("90") ("2017" "03" "01")))
(defvar type '("meta" "data" "data" "data" "meta" "meta"))

At the end of the day, my desired output are CSV tables

[shape/rectangle]
year,month,day,geometer,angle: °,id,width: cm,height: cm
2017,03,01,90,Alice,1,13,15
2017,03,01,90,Alice,2,12,14
2017,03,01,90,Alice,3,5,12.2

[shape/square]
year,month,day,geometer,id,length: m
2017,03,01,Alice,1,10
2017,03,01,Alice,2,5
2017,03,01,Alice,3,7
2017,03,01,Alice,4,7
2017,03,01,Alice,5,7
2017,03,01,Alice,6,15
2017,03,01,Alice,8,12
2017,03,01,Alice,7,20
2017,03,01,Alice,9,25
2017,03,01,Alice,10,30

I have written some TXR lisp code for decompressing values:

(defun str-range-p (x)
  (m^$ #/\d+\.\.\d+/ x))

(defun str-range-expand (x)
  [apply range [mapcar int-str (split-str x "..")]])

(defun str-int-list-p (s)
  (and (str-list-p s)
       (all (str-list-expand s)
            (lambda (x)
              (or (int-str x)
                  (str-range-p x))))))

(defun str-list-p (x)
  (search-str x ";"))

(defun str-list-expand (x)
  (split-str x ";"))

(defun expand (s)
  (cond ((str-int-list-p s)
         (flatten [mapcar (lambda (x)
                            (if (str-range-p x)
                                (str-range-expand x)
                              (int-str x)))
                  (str-list-expand s)]))
        ((str-list-p s) (str-list-expand s))
        ((str-range-p s) (str-range-expand s))
        ((int-str s) (int-str s))
        (t s)))

And for checking if a location string is the parent of another location string:

(defun level-up (x)
  (cond ((equal x "") nil)
        ((search-str x "/")
         (sub-str x 0 (search-str x "/" 0 t)))
        (t "")))

(defun parent-location-p (x y)
  (or (equal x y)
      (equal x "")
      (and (not (equal y ""))
           (match-str (level-up y) x))))

I'm primarily interested in what TXR lisp built-in functions you think might be good for solving the remainder of this task to achieve the desired output. And, how would you approach the existing code differently to take advantage of existing TXR lisp features?

1

There are 1 answers

4
Kaz On BEST ANSWER

This solution works on the sample data given in an earlier edit of the question. It doesn't save the data in different .csv files but its output indicates what goes where.

A few objects are used to organize the logic. Locations are represented by the locations structure which automatically breaks the path names into components for easy analysis. Headings are represented as heading objects, which treat the type notation somewhat; currently it is only used for reformatting the integers representing years, days and months to the proper notation with leading zeros. Tables are represented as table objects which hold various properties. Values, however, are just lists. A table contains a list of rows, and rows are just lists of values. Values are normally scalar. If one or more values in a row are value are lists, that means that the row is a compression of multiple rows (as a result of the .. and ; notation). Ranges are expanded using code straight from the Rosetta Code range expansion, adapted to the delimiters used here.

The parser is modified only very slightly. The :counter is gone, and the main collect has instead a :vars (tables): only a list of tables emerges and these are objects constructed using the new macro. Also, there is a new @(rebind values (values)) so that meta tables come out in the same representation: though they have just one row, we want their rows property to hold a list of rows, just like data tables.

@(do
   (defstruct (location str) nil
     str path

     (:method parse (me)
       (set me.path (tok-str me.str #/[^\/]+/)))

     (:method format (me)
       (set me.str `@{me.path "/"}`))

     (:method level-up (me)
       (new location path (butlast me.path)))

     (:method is-prefix-of (me maybe-suffix)
       (let ((mm (mismatch me.path maybe-suffix.path)))
         (or (not mm) (eql mm (length me.path)))))

     (:method print (me stream pretty-p)
       (put-string `@{me.path "/"}` stream))

     (:method equal (me) me.path)

     (:postinit (me)
       (if me.str
         me.(parse)
         me.(format))))

   (defstruct (heading str) nil
     str name type

     (:method parse (me)
       (tree-case (split-str me.str #/: */)
         ((nm ty) (set me.name nm me.type ty))
         ((nm)    (set me.name nm me.type nm))))

     (:method format-value (me arg)
       (casequal me.type
         ("year" (fmt "~,04d" arg))
         (("month" "day") (fmt "~,02d" arg))
         (t (if (stringp arg)
              arg
              (tostringp arg)))))

     (:method print (me stream pretty-p)
       (put-string (or me.str
                       (if (equal me.name me.type)
                         `@{me.name}`
                         `@{me.name}: @{me.type}`))
                   stream))

     (:postinit (me)
       (when me.str
         me.(parse))))

  (defun expand-helper (list)
     (cond
       ((null list) nil)
       ((consp (first list))
        (append (range (first (first list))
                       (second (first list)))
                (rangeexpand (rest list))))
       (t (cons (first list) (rangeexpand (rest list))))))

   (defun rangeexpand (list)
     (uniq (expand-helper list)))

   (defun make-values (string)
     (if [#/\.\.|;/ string]
       (let ((syntax (collect-each ((p (split-str string ";")))
                       (tree-case (split-str p "..")
                         ((from to . junk)
                          ;; if junk isn't nil, error!
                          (list (num-str from) (num-str to)))
                         ((single . junk)
                          (num-str single))))))
         (rangeexpand syntax))
       (or (num-str string) string)))

   (defstruct table nil
     location headings rows type order
     (:static order-cnt 0)

     (:method merge (me other)
       (new table
            location other.location
            headings (append me.headings other.headings)
            type other.type
            rows (append-each ((mr me.rows))
                   (collect-each ((or other.rows))
                     (append mr or)))
            order other.order))

     (:method cat (me other)
       (let ((me-copy (copy-struct me)))
         (set me-copy.rows (append me.rows other.rows))
         me-copy))

     (:method expand-rows (me)
       (labels ((expand-row (row)
                  (build
                    (if [find-if consp row]
                      (while* [find-if consp row]
                        (let ((this (mapcar [iffi consp car] row))
                              (next (mapcar [iffi consp cdr] row)))
                          (add this)
                          (set row next)))
                      (add row)))))
         [mappend expand-row me.rows]))

     (:postinit (me)
       (unless me.order
         (set me.order (inc me.order-cnt))))))
@(define os)@/[ ]*/@(end)
@(define location)@\
@  (cases)@\
@/[a-z]+/@(eol)@\
@  (or)@\
@/[a-z]+//@(location)@\
@  (end)@\
@(end)
@(define heading)@/[a-z]+(:[^,]*)?/@(end)
@(define value)@/[^,]+/@(end)
@(define table (location headings values type))
@  (cases)
@    (cases)@\
[[@location]]@(or)[[]]@(bind location "")@\
@    (end)
@    (coll)@(os)@{headings (heading)}@(os)@(end)
@    (coll)@(os)@{values (value)}@(os)@(end)
@    (rebind values (values))
@    (bind type "meta")
@(os)
@  (or)
[@location]
@    (coll)@(os)@{headings (heading)}@(os)@(end)
@    (collect :gap 0)
@      (coll)@(os)@{values (value)}@(os)@(end)
@    (until)
@      (os)
@    (end)
@    (bind type "data")
@  (end)
@(end)
@(collect :vars (tables))
@  (table location headings values type)
@  (bind tables @(new table
                      location (new (location location))
                      headings (mapcar (do new (heading @1)) headings)
                      rows (mapcar (op mapcar make-values) values)
                      type type))
@(until)
@  (eof)
@(end)
@(do
   (let* ((metas (keepqual "meta" tables (usl type)))
          (datas (remqual "meta" tables (usl type)))
          (sorted-metas [sort (copy metas) > (op length @1.location.path)])
          (combined-datas (hash-values (group-reduce (hash :equal-based)
                                                     (usl location)
                                                     (do if @1 @1.(cat @2) @2)
                                                     datas)))
          (augmented-datas (collect-each ((d combined-datas))
                             (each ((m sorted-metas))
                               (when m.location.(is-prefix-of d.location)
                                 (set d m.(merge d))))
                             d)))
     (each ((a augmented-datas))
       (put-line `@{a.location}.csv:`)
       (put-line `@{a.headings ","}`)
       (each ((r a.(expand-rows)))
         (put-line `@{(mapcar (ret @1.(format-value @2))
                              a.headings r) ","}`))
       (put-line))))

The requirement for catenating tables with the same location is handled using the group-reduce expression which relies on a hash table to identify like items and combine them using the table structure's cat method. A table catenates another table by producing a copy of itself with the rows replaced by appending its original rows with that of the other one.

Merging the additional properties from meta tables is performed by iterating over all of the data tables and applying the matching properties. For each data table, we iterate all of the meta tables in order of decreasing path length (most particular to least). From each meta table whose location is a prefix of the data table's location, we merge on the properties with the table merge method. (This also works functionally, like cat: it returns a new, merged table). Merge means that we stick on all the headings from the meta table, and do a cross-producting operation on the rows: each new meta row on the left is paired with every row of the table being extended on the right.

Expanding rows which contain multiple values is done by table expand-rows. This simply makes a copy of each row, with each list substituted by its first item (the Lisp car). It then iterates over the cdr: a new row is calculated where the lists are replaced by their cdr. This repeats, until the lists are exhausted. For instance (1 (a b) 3 (x y)) will produce (1 a 3 x), with a "remainder" of (1 (b) 3 (y)). This remainder produces (1 b 3 y) with a new remainder of (1 nil 3 nil). This doesn't contain any more consp values (all are atom) so the iteration terminates.