I am currently working with a huge TSV file (~5,000 columns and 500,000 records) structured approximately as follows:
f.ID f.1.0.0 f.2.0.0 f.3.0.1 f.3.0.2
1 A 22 B32 -1
2 F 38 B1 65
I cannot inspect it manually, but I have a sister file that should be in the same file format (with the join key f.ID in common).
Everything works fine on the sister file:
$ mlr --itsv cut -f f.ID file1.tab | head -n2
f.ID=1
f.ID=2
But when I try to subset the main file on known columns (e.g. f.ID), Miller returns nothing:
$ mlr --itsv cut -f f.ID file2.tab | head -n2
I am having a hard time diagnosing what is going on with this file; I suspect it's formatted in a non-standard way. Is there a way to see what Miller is doing for each record, or to find out where it is failing?
If you can use another tool, try using duckdb cli and run
Start with a limited number of rows.
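In the same spirit of looking at only a few rows, here is a minimal sketch using standard tools (`head`, `od`, `awk`, `sort`) rather than DuckDB. The helper names `inspect_header` and `field_count_tally` are made up for illustration and are not part of Miller or DuckDB. The sketch assumes the file is meant to be tab-separated and that your `head` supports `-c` (GNU and BSD versions do).

```shell
# Hypothetical helpers for illustration; not part of Miller or DuckDB.

# Show what the start of the file really contains.
inspect_header() {
  # od -c makes a UTF-8 BOM (357 273 277), \r line endings,
  # and spaces-vs-tabs visible in the first 64 bytes.
  head -c 64 "$1" | od -c
  # Count tab-separated header fields; 1 means the header line has no tabs.
  head -n 1 "$1" | awk -F'\t' '{ print "header_fields=" NF }'
}

# Tally tab-separated field counts over the first N lines (default 1000).
field_count_tally() {
  head -n "${2:-1000}" "$1" |
    awk -F'\t' '{ c[NF]++ } END { for (k in c) print k " fields: " c[k] " lines" }' |
    sort -n
}
```

You would call these as `inspect_header file2.tab` and `field_count_tally file2.tab 1000`, then compare against `file1.tab`. If the header reports a single field, or the first bytes show something unexpected before `f.ID`, the column may not literally be named `f.ID` as far as Miller is concerned, which would be consistent with the empty output.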