Keeping file name information with Cascalog Tuples


I'm looking for a way to keep the name of the source file associated with the tuples that originate from it. I've searched around and found that hfs-wholefile works well for getting filenames, but it returns each file's contents as one large chunk of binary data. Is it possible to take this binary data and turn it back into tuples that I can then process as if I had gotten them from hfs-textline?

(defn file-name-with-data
  "Process a file and associate a filename with it"
  [file]
  (<- [?file-name ?data1 ?data2 ?data3 ?data4]
      ;; hfs-wholefile emits one (filename, file contents) pair per file
      ((hfs-wholefile file) ?file-name ?binary-data)
      ;; placeholder for the function I'm looking for
      (function-that-im-looking-for ?binary-data :> ?data1 ?data2 ?data3 ?data4)))

The example above is ideally how I would like to process this information. Is there a way in Cascalog/Cascading to turn the bytes into regular variables that I can use in queries?
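To make it clearer what I mean by function-that-im-looking-for, here is a rough sketch of the decoding step in plain Clojure. It assumes the file is UTF-8 text with one record per line and four tab-separated fields, which is only a guess at my own data layout. The names bytes->lines, line->fields and bytes->rows are hypothetical helpers, not part of Cascalog. The comment at the end about wiring it into a query is an untested assumption about how hfs-wholefile hands over the contents.

```clojure
(ns example.wholefile-lines
  (:require [clojure.string :as str]))

(defn bytes->lines
  "Decode the first `len` bytes of `buf` as UTF-8 text and split it into lines.
  `len` is passed explicitly because a buffer may be longer than its valid data."
  [^bytes buf len]
  (str/split-lines (String. buf 0 (int len) "UTF-8")))

(defn line->fields
  "Split one line into fields. Tab-separated input is an assumption."
  [line]
  (str/split line #"\t"))

(defn bytes->rows
  "Turn the raw contents of a whole file into a seq of field vectors,
  one vector per line."
  [^bytes buf len]
  (map line->fields (bytes->lines buf len)))

;; Untested assumption: if ?binary-data arrives as a Hadoop BytesWritable,
;; a mapcat operation (defmapcatop in Cascalog 1.x, defmapcatfn in 2.x)
;; could wrap this, emitting one tuple per line:
;;
;; (defmapcatop parse-file [bw]
;;   (bytes->rows (.getBytes bw) (.getLength bw)))
;;
;; .getLength matters here because the array returned by .getBytes
;; is only valid up to that length.
```

Something like (parse-file ?binary-data :> ?data1 ?data2 ?data3 ?data4) would then take the place of the placeholder in my query, if that is a valid approach.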

There are 0 answers