I found quite a few posts here suggesting solutions using awk and sed, but none of them seems to do the job. Either the whole line is removed, or nothing at all is removed. I'm also not a command-line wizard and my knowledge is fairly limited, so I decided to ask for help here. The tool doesn't matter, whether it is awk, grep, or sed. I honestly can't tell the difference in this case, so use whatever you feel is best.
What I have is several files with a few million lines each, and the lines look something like this:
50somethingcharactergibberish shortword
50somethingcharactergibberish shortword
50somethingcharactergibberish shortword
50somethingcharactergibberish shortword
50somethingcharactergibberish shortword
50somethingcharactergibberish shortword
And this goes on for several million lines. What I need to do is remove the 50somethingcharactergibberish and leave only the shortword. The problem is that there is no pattern: the long word sometimes starts with a letter and sometimes with a number. So I assume I'll have to count the characters.
The most minimal awk that could work for you is something like:
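As a rough sketch of that idea (not necessarily the exact command intended here), and assuming the gibberish and the short word are separated by whitespace, awk can print just the last field of each line, so no character counting is needed. The function name last_field and the sample lines are made up for illustration:

```shell
# Hypothetical helper: print only the last whitespace-separated field
# of every input line (read from the given files, or from stdin).
last_field() {
    awk '{ print $NF }' "$@"
}

# Made-up sample lines standing in for the real multi-million-line files.
printf '%s\n' \
    'a1b2c3d4gibberish0001 apple' \
    '9z8y7x6wgibberish0002 pear' | last_field
# prints:
# apple
# pear
```

On the real data this could be run as something like last_field bigfile.txt > bigfile.short.txt, where both file names are placeholders. It does not matter whether the long word starts with a letter or a digit, because awk splits on the whitespace rather than on the content.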