The following command was aborted:
wget -w 10 -m -H "<URL>"
I would like to resume this download without checking the dates on the server for every file that I've already downloaded.
I'm using: GNU Wget 1.21.3 built on darwin18.7.0.
The following doesn't work for me: it requests headers at a rate of one every 10 seconds (so as not to overwhelm the server) and doesn't re-download the files, but the checking is very slow. 10 seconds times 80,000 files is a long time, and if the download aborts again after 300,000 files, resuming with this command will take even longer. In fact it takes as long as starting over, which I'd like to avoid.
wget -c -w 10 -m -H "<URL>"
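(Worked out: 80,000 files × 10 seconds = 800,000 seconds, roughly 9.3 days of header checks before anything new is fetched; after 300,000 files it would be about 35 days.)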
The following is not recursive: the first file already exists, so it is not parsed for URLs, and nothing beyond it is downloaded recursively.
wget -w 10 -r -nc -l inf --no-remove-listing -H "<URL>"
The result of this command is:
File ‘<URL>’ already there; not retrieving.
The file that's "already there" contains links that should be followed, and if those files are "already there" then they too should not be retrieved. This process should continue until wget encounters files that haven't yet been downloaded.
I need to download 600,000 files without overwhelming the server, and I have already downloaded 80,000 of them. wget should be able to zip through the already-downloaded files really fast until it finds a missing file that it needs to download, and only then rate-limit the downloads to one every 10 seconds.
I've read through the entire man page and can't find anything that looks like it will work beyond what I have already tried. I don't care about the dates on the files, retrieving updated files, or completing partially downloaded files. I only want to download the files from the 600,000 that I haven't already downloaded, without bogging down the server with unnecessary requests.
If said file contains absolute links then you might try using a combination of --force-html and -i file.html. Consider the following simple example: let the content of file.html be
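something like this (a minimal sketch: the favicon.ico link matches the file removed later in the example, and the second link is an illustrative placeholder):

<html>
<body>
<a href="https://archive.org/offshoot_assets/favicon.ico">favicon</a>
<a href="https://archive.org/offshoot_assets/blue.png">logo</a>
</body>
</html>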
then a recursive, no-clobber run does create the matching directory structure on disk.
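For example (the exact invocation is an assumption pieced together from the flags above: --force-html tells wget to parse the local file as HTML, -i feeds it the link list, and -nc skips files already on disk):

wget -r -nc --force-html -i file.html

which should leave

archive.org/
└── offshoot_assets/
    ├── favicon.ico
    └── blue.png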
And if you remove one of the files, say archive.org/offshoot_assets/favicon.ico, a subsequent run will download only that missing file.
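Mapped back onto the question (hedged, not tested against that server), the same idea would keep the politeness delay and host spanning while seeding wget with a local copy of the starting page:

wget -w 10 -r -nc -l inf -H --force-html -i seed.html

where seed.html is a hypothetical name for the already-downloaded top-level page. Per the man page's note on -nc, files with an .html or .htm suffix that are already present are loaded from disk and parsed as if they had been downloaded, so the server is only contacted for files that are actually missing.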