Extracting text from HTML files with bash


I have a script:

cd ../data;
dossier=$(ls crawl);

let "compte = 1";

for file in $dossier
do
    lynx --dump --nolist $file >> ../data/txt/$compte'.txt';
    let "compte = compte + 1";
done

I am using lynx to extract the text from all my HTML files, but when I open the resulting text file, it contains:

410 GONE

This doesn't exist any more. Try html.com.

I do not know why. When I am in the terminal inside my crawl folder and run the lynx dump on each HTML file by hand, it produces the text file correctly, but when I use the script to run lynx over all my HTML files, the results are not good.


1 Answer

fernand0:

You need the protocol and (not sure about this) the path. For example:

lynx -dump file:///where/my/file/is/file.html
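For the script in the question, a minimal sketch of that fix might look like the following. It assumes the HTML files live in ../data/crawl and that ../data/txt already exists (both taken from the question's paths); a glob replaces the ls call so each entry keeps its crawl/ prefix:

cd ../data
compte=1
for file in crawl/*.html
do
    # file:// plus an absolute path makes lynx read the local file
    # instead of guessing a web address from a bare file name.
    lynx -dump -nolist "file://$PWD/$file" >> "txt/$compte.txt"
    compte=$((compte + 1))
done

With a path that actually resolves on disk, lynx renders the local file instead of trying to fetch a similarly named page from the web, which is presumably where the 410 GONE response in the output came from.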