Sorting file with 1.8 million records using script


I am trying to remove duplicate lines from a file with 1.8 million records and write the result to a new file, using the following command:

sort tmp1.csv | uniq -c | sort -nr > tmp2.csv

Running the script creates a new file sort.exe.stackdump with the following information:

"Exception: STATUS_ACCESS_VIOLATION at rip=00180144805
..
..
program=C:\cygwin64\bin\sort.exe, pid 6136, thread main
cs=0033 ds=002B es=002B fs=0053 gs=002B ss=002B"

The script works for a small file with 10 lines, so it seems sort.exe cannot handle this many records. How do I work with such a large file of more than 1.8 million records? We do not have any database other than Access, and I was trying to do this manually in Access.


There are 2 answers

user2653586 (best answer):

The following awk command turned out to be a much faster way to get rid of the duplicate lines:

awk '!v[$0]++' $FILE2 > tmp.csv

where $FILE2 is the name of the file containing the duplicate lines.
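For anyone unfamiliar with the idiom: v[$0]++ counts how many times each whole input line has been seen, and the leading ! makes awk print a line only on its first occurrence, so duplicates are dropped while the original order is preserved and no sorting is needed. A quick sanity check (the file name and contents here are only illustrative):

$ printf 'a\nb\na\nc\nb\n' > demo.txt
$ awk '!v[$0]++' demo.txt
a
b
c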

Rob Starling:

It sounds like your sort command is broken. Since the path says Cygwin, I'm assuming this is GNU sort, which should generally have no problem with this task, given sufficient memory and disk space. Try playing with the flags that control where and how much it uses the disk: http://www.gnu.org/software/coreutils/manual/html_node/sort-invocation.html
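For example, GNU sort's -S option sets the size of the in-memory buffer and -T sets the directory used for temporary merge files, which can help if the default temp location is small or slow. The buffer size and temp path below are only illustrative and should be adjusted for your machine:

sort -S 512M -T /cygdrive/c/temp tmp1.csv | uniq -c | sort -nr > tmp2.csv

And if you only need the duplicates removed and not the counts, sort -u tmp1.csv > tmp2.csv does it in a single pass.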