Count Lines, grep, head, and tail inside Feather Files


Setup: I am contemplating switching from writing large (~20GB) data files with csv to feather format, since I have plenty of storage space and the extra speed is more important. One thing I like about csv files is that at the command line, I can do a quick

wc -l filename

to get a row count, even for large data files. Also, I can quickly search for a simple string with

grep search_string filename

The head and tail commands are also very useful at times. These are straightforward and work well with csv files, but not with feather. If I try any of them on a feather file, the output is neither meaningful nor helpful.

While I certainly can read a feather file into, say, Python or R, and analyze it then, the hassle of writing out the path and importing the necessary libraries is something I'd rather dispense with.

My Question: Does there exist either a cross-platform (at least Mac and Linux) feather file reader I can use to quickly read in and view feather data (this would be in tabular format) with features corresponding to row count, grep, head, and tail? Or are there simple CLI utilities I could install that would enable me to do the equivalent of line count, grep, head, and tail?

I've seen this question, but it is very incomplete relative to my question.


There is 1 answer

Dudi Boy On BEST ANSWER

To work with feather files, you must use a Python or R program.

With csv, you can use any of the common text manipulation utilities available to Linux/Unix users.

Linux text manipulation tools:

reader: less

search: grep

converters: awk, sed

extractor: split

editor: vim

Each of the above tools requires some learning and practice.

Suggestion

If you have programming skills, write a small program to manipulate your feather file.
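As a rough illustration of that suggestion, here is a minimal sketch of such a program in Python. It assumes the pyarrow package is installed (version 7 or later, for `to_pylist`); the script name `feather_peek.py` and the helper names are made up for this example. It reads the whole table, so for a ~20GB file it may need a lot of memory, especially if the file was written with compression.

```python
"""Sketch of a tiny Feather inspection CLI (hypothetical name: feather_peek.py)."""
import sys


def load_table(path):
    # Imported lazily so the pure-Python helpers below work without pyarrow.
    import pyarrow.feather as feather
    # memory_map=True can avoid an up-front copy for uncompressed files.
    return feather.read_table(path, memory_map=True)


def format_row(row):
    """Render one row (a dict of column -> value) as a tab-separated line."""
    return "\t".join(str(value) for value in row.values())


def row_matches(row, needle):
    """Return True if `needle` is a substring of any value in the row."""
    return any(needle in str(value) for value in row.values())


def print_rows(table):
    # Header line first, then one tab-separated line per row.
    print("\t".join(table.column_names))
    for row in table.to_pylist():
        print(format_row(row))


def main(argv):
    command, path = argv[0], argv[1]
    table = load_table(path)
    if command == "count":      # rough equivalent of `wc -l`
        print(table.num_rows)
    elif command == "head":     # first N rows (default 10)
        n = int(argv[2]) if len(argv) > 2 else 10
        print_rows(table.slice(0, n))
    elif command == "tail":     # last N rows (default 10)
        n = int(argv[2]) if len(argv) > 2 else 10
        print_rows(table.slice(max(table.num_rows - n, 0)))
    elif command == "grep":     # rows where any value contains STRING
        if len(argv) < 3:
            sys.exit("grep needs a search string")
        print("\t".join(table.column_names))
        for batch in table.to_batches():
            for row in batch.to_pylist():
                if row_matches(row, argv[2]):
                    print(format_row(row))
    else:
        sys.exit(f"unknown command: {command}")


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: feather_peek.py {count|head|tail|grep} FILE [N|STRING]")
    else:
        main(sys.argv[1:])
```

Saved as feather_peek.py, it could be called as `python feather_peek.py count filename`, `python feather_peek.py head filename 5`, or `python feather_peek.py grep filename search_string`. Unlike grep, the search here is a plain substring match against each value, not a regular expression.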