I have a big file, around 60 GB.
I need to get the n middle lines of the file. I am using a command with head and tail like
tail -m file | head -n > output.txt
where m and n are numbers.
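For example, with made-up numbers and modern option syntax, this is what the pipeline looks like to keep 10 lines starting 1,000,000 lines before the end of the file:
# keep the last 1,000,000 lines, then the first 10 of those
tail -n 1000000 file | head -n 10 > output.txt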
The general structure of the file is shown below: a set of records with comma-separated columns. Lines can have different lengths (say, 5000 chars max).
col1,col2,col3,col4...col10
Is there another way to extract the n middle lines in less time? The current command takes a long time to execute.
The only solution I can think of to speed up the search is to build an index of your lines, something like the sketch below.
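This is a minimal sketch of my own (assuming one record per line, single-byte newlines, and awk run in the C locale so that length() counts bytes): write out the byte offset at which every line starts, one offset per line, so that line k of index.txt holds the starting offset of line k of the data file.
# Build an index of line-start byte offsets (index.txt is a placeholder name).
LC_ALL=C awk 'BEGIN { o = 0 } { print o; o += length($0) + 1 }' file > index.txt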
Then, knowing the length of the index (i.e. the total number of lines), you could quickly jump to the middle of your data file, or wherever you like... Of course, you would have to keep the index updated whenever the file changes...
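A hedged sketch of the lookup step, reusing your m and n (sed fetches the m-th offset from the index; GNU tail can then seek straight to that byte instead of reading everything before it):
# Read n lines starting at line m, using the precomputed offsets.
start=$(sed -n "${m}p" index.txt)
tail -c +$((start + 1)) file | head -n "$n" > output.txt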
Obviously, the canonical solution to such a problem would be to keep the data in a DB (see for example SQLite) and not in a plain file... :-)
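Roughly like this, assuming the data really is plain CSV (no embedded newlines inside fields); the file and table names here are placeholders:
# Load the file once into an SQLite table.
sqlite3 data.db <<'EOF'
CREATE TABLE records(col1,col2,col3,col4,col5,col6,col7,col8,col9,col10);
.mode csv
.import bigfile.csv records
EOF
# Any slice of rows is then a cheap lookup on the implicit rowid:
sqlite3 data.db "SELECT * FROM records WHERE rowid BETWEEN 30000000 AND 30000009;"
The rowid lookup is an indexed B-tree search, so it stays fast no matter where in the table the slice falls.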