How to get CSV files using wget


I want to download some CSV files from a webpage using wget (this is the webpage: http://sinca.mma.gob.cl/index.php/region/index/id/II). However, with wget I only get some cgi-bin files and files in other formats, which I suppose are used to build the CSV files. I have no knowledge at all of JavaScript or whatever else is required to build the CSV files. Is there a way to get those CSV/Excel files directly with wget?

This is the log output after running wget:

--10:30:06--  http://sinca.mma.gob.cl/index.php/region/index/id/II
           => `sinca.mma.gob.cl/index.php/region/index/id/II'
Resolving sinca.mma.gob.cl... 190.215.49.125
Connecting to sinca.mma.gob.cl[190.215.49.125]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    0K .......... .......... .......... .......... ..........   28.17 KB/s
   50K .......... .......... .......... .......... .......... 226.24 KB/s
  100K .                                                          1.44 MB/s

Last-modified header missing -- time-stamps turned off.
10:30:09 (50.81 KB/s) - `sinca.mma.gob.cl/index.php/region/index/id/II.html' saved [103911]

Removing sinca.mma.gob.cl/index.php/region/index/id/II.html since it should be rejected.

FINISHED --10:30:09--
Downloaded: 103,911 bytes in 1 files
Converted 0 files in 0.00 seconds.


There are 2 answers

Answer by Luke Rixson

Depending on the options, wget will get all the files you specify: if you ask it to grab all the files, that is exactly what it will do, unless permissions prevent those files from being downloaded. For example, if you use

wget -r --no-parent http://www.example.com/folder/

That will pull all files, folders, and subfolders of that directory, unless you filter out files of a particular type.

To filter for specific file extensions:

wget -A pdf,jpg -m -p -E -k -K -np http://site/path/

Or, if you prefer long option names:

wget --accept pdf,jpg --mirror --page-requisites --adjust-extension --convert-links --backup-converted --no-parent http://site/path/

This will mirror the site, but files without a jpg or pdf extension will be removed automatically.

So, to answer your question: yes, you can specify that you want only the CSV/Excel files and nothing else (for CSV files, -A csv).
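A minimal sketch of the CSV-only variant, assuming the CSV files are exposed as ordinary links that wget's recursion can follow (the other answer explains why that may not be the case on this site). The function name fetch_csv_only and the URL in the usage comment are hypothetical placeholders. Note that with -A, wget still downloads HTML pages to discover links and then deletes them, which is likely what the "since it should be rejected" line in the question's log reflects.

```shell
#!/bin/sh
# Hypothetical helper: recursively fetch only .csv files below a directory URL.
# Assumes the CSV files are linked as static files from the HTML pages.
fetch_csv_only() {
  # -r           follow links recursively
  # --no-parent  never ascend above the starting directory
  # -nd          save files into the current directory (no remote tree)
  # -A csv       keep only files whose names end in .csv
  wget -r --no-parent -nd -A csv "$1"
}

# Usage (placeholder URL):
# fetch_csv_only http://www.example.com/folder/
```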

If it still does not work, you could try using the

-o wget.log

option to write wget's output to a log file so you can see what is going on. Post the log results and I will try to help you further.
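A small sketch for reading such a log, assuming wget's English messages as they appear in the question's log (" saved [" for a kept file and "since it should be rejected" for a removed one). The function name summarize_wget_log is hypothetical, and other wget versions or locales may word these lines differently.

```shell
#!/bin/sh
# Hypothetical helper: count kept and rejected files in a wget log
# produced with "-o wget.log". Relies on English message text.
summarize_wget_log() {
  # Lines like: 10:30:09 (50.81 KB/s) - 'file.html' saved [103911]
  echo "saved:    $(grep -c ' saved \[' "$1")"
  # Lines like: Removing file.html since it should be rejected.
  echo "rejected: $(grep -c 'since it should be rejected' "$1")"
}

# Usage:
# wget -o wget.log -r --no-parent -A csv http://www.example.com/folder/
# summarize_wget_log wget.log
```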

Answer by tekim

You need to provide wget with the full URL that generates the file you want, for example:

wget -O test.csv "http://sinca.mma.gob.cl/cgi-bin/APUB-MMA/apub.tsindico2.cgi?outtype=xcl&macro=./RII/237/Cal/PM25//PM25.diario.diario.ic&from=13060100&to=15110323&path=/usr/airviro/data/CONAMA/&lang=esp&rsrc=&macropath="

I tested the above and got exactly the same CSV file as when I click the link on the site. The link runs some JavaScript that generates the URL used above. To get that URL, I clicked the link and then copied the address that appeared in the browser's address bar.
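If you need more than one file, the URL can be rebuilt with different values instead of clicking each link. This is a sketch only: the functions sinca_url and sinca_fetch are hypothetical, and the meaning of the parameters is guessed from the single URL above (from and to look like YYMMDDHH timestamps, and macro seems to encode region, station and pollutant). The site does not document them, so copy real macro values from the links it generates.

```shell
#!/bin/sh
# Base CGI endpoint taken from the URL in this answer.
SINCA_BASE="http://sinca.mma.gob.cl/cgi-bin/APUB-MMA/apub.tsindico2.cgi"

# Hypothetical helper: build the download URL.
#   $1  macro path, copied from a link on the site
#   $2  start of period (apparently YYMMDDHH, an assumption)
#   $3  end of period   (apparently YYMMDDHH, an assumption)
sinca_url() {
  printf '%s?outtype=xcl&macro=%s&from=%s&to=%s&path=/usr/airviro/data/CONAMA/&lang=esp&rsrc=&macropath=\n' \
    "$SINCA_BASE" "$1" "$2" "$3"
}

# Hypothetical helper: download one period into the file named by $4.
sinca_fetch() {
  wget -O "$4" "$(sinca_url "$1" "$2" "$3")"
}

# Usage, reproducing the request above:
# sinca_fetch ./RII/237/Cal/PM25//PM25.diario.diario.ic 13060100 15110323 test.csv
```

Building the URL in a separate function keeps the query string in one place, so you can print and compare it against the address bar before downloading anything.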