How to wget a file when the filename isn't known? - php

I am trying to automate the download of a file using wget, calling the PHP script from cron. The filename always consists of a fixed name plus a date, but the date changes depending on when the file is uploaded. There is no certainty about when the file is updated, so the final name can never really be known until the directory is checked.
An example filename is file20100818.tbz
I tried using wildcards within wget, both * and %, but they failed.
Thanks in advance,
Greg

Assuming the file type is constant then from the wget man page:
You want to download all the GIFs from a directory on an HTTP server. You tried wget http://www.server.com/dir/*.gif, but that didn't work because HTTP retrieval does not support globbing. In that case, use:
wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
So, you want to use the -A flag, something like:
wget -r -l1 --no-parent -A.tbz http://www.mysite.com/path/to/files/
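If only the newest dated file is wanted rather than every .tbz in the directory, one option is to read the directory index first and pick the latest name from it. The sketch below assumes the server returns an HTML listing that mentions names of the form fileYYYYMMDD.tbz. The helper name latest_tbz is made up for illustration, and the URL in the closing comment is the placeholder from the command above.

```shell
# Hypothetical helper: read an HTML directory listing on stdin and print
# the newest fileYYYYMMDD.tbz name found in it.
latest_tbz() {
    # -o prints only the matching part; YYYYMMDD names sort chronologically.
    grep -o 'file[0-9]\{8\}\.tbz' | sort -u | tail -n 1
}

# Demonstration on a canned listing (no network access needed).
listing='<a href="file20100816.tbz">file20100816.tbz</a>
<a href="file20100818.tbz">file20100818.tbz</a>
<a href="file20100817.tbz">file20100817.tbz</a>'
newest=$(printf '%s\n' "$listing" | latest_tbz)
printf '%s\n' "$newest"

# Real use, assuming the placeholder URL serves such a listing:
#   base=http://www.mysite.com/path/to/files/
#   name=$(wget -qO- "$base" | latest_tbz) && wget "$base$name"
```

Sorting works here only because the date is written year first; a DDMMYYYY scheme would need a different comparison.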

For the sake of clarity, because this thread shows up in Google when searching for "wget and wildcards", because the answers above don't offer a sensible solution, and because there doesn't seem to be anything else on SO answering this:
According to the wget manual, you can use wildcards when using FTP with the option -g on (--glob=on); however, wget will return an error unless you also use all of the -r -np -nd options. Thanks to Wiseman20@ubuntuforums for showing us the way.
Sample code (the URL is quoted so that the shell does not try to expand the * itself):
wget -r -np -nd --glob=on 'ftp://ftp.ncbi.nlm.nih.gov/blast/db/nt.*.tar.gz'

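You can loop over candidate dates like this:
<?php
// Start with today's date and step back one day at a time (placeholder base URL).
for ($i = 0; $i < 30; $i++) {
    $filename = "file" . date("Ymd", time() - 86400 * $i) . ".tbz";
    // Try the download; wget exits with 0 on success, so stop at the first hit.
    exec("wget -q " . escapeshellarg("http://www.mysite.com/path/to/files/" . $filename), $output, $status);
    if ($status === 0) {
        break;
    }
}
?>
You can increase the number of tries in the for loop.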

Related

A better way of putting a lot of information into an SQL database than querying in the URL

I have a simple PHP script set up, through which I put information into a MySQL database using the following URL query:
https://www.thesite.com/addinformation.php?WHERE_TO_LABEL_IT_AS&WHAT_TEXT_TO_ADD
I call it from bash by simply using wget:
wget --no-check-certificate --user=username --password='password' --delete-after "https://www.thesite.com/addinformation.php?$LABELVARIABLE&$STRINGVARIABLE"
It works for my needs, but I would now like to use a similar way to add large chunks of text into the database. Perhaps even whole text files. What method would you recommend I read up on to do this?
You could save your content to a file and post it using either wget or curl.
$ wget --post-file=<file path> <url>
or
$ curl --data-binary @<file path> <url>
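As a rough sketch of how the curl variant might be wrapped for use from a bash script: the function name post_file and the file name notes.txt are invented for illustration, and the URL in the closing comment is the placeholder from the question. The function only adds a readability check before handing the file to curl.

```shell
# Hypothetical wrapper: POST the contents of a file to a URL.
# Usage: post_file URL FILE
post_file() {
    url=$1
    file=$2
    # Refuse to run if the file is missing or unreadable, so an empty
    # body is never posted by accident.
    if [ ! -r "$file" ]; then
        printf 'cannot read %s\n' "$file" >&2
        return 1
    fi
    # "@" tells curl to read the request body from the named file;
    # --data-binary sends it byte for byte, newlines included.
    curl --silent --show-error --data-binary "@$file" "$url"
}

# Example call (placeholder URL and file name):
#   post_file 'https://www.thesite.com/addinformation.php' notes.txt
```

Because the body is sent without a field name, the receiving script would probably need to read the raw request body (in PHP, via php://input) rather than a named POST field.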

How can I download a web page by using Wget?

I want to download this web page: http://acm.sgu.ru/problem.php?contest=0&problem=161
I tried the command:
wget -o 161.html http://acm.sgu.ru/problem.php?contest=0&problem=161
But it does not work!
Can anyone help me?
The URL you are providing to wget contains characters that have special meaning in the shell (&), therefore you have to escape them by putting them inside single quotes.
Option -o file is used to log all messages to the provided file.
If you want the page to be written to the provided file, use option -O file (capital O).
Try:
wget -O 161.html 'http://acm.sgu.ru/problem.php?contest=0&problem=161'
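To see what the shell does with the unquoted URL, the following sketch prints the arguments a command would receive in each case. It uses printf as a stand-in for wget, so nothing is downloaded; show_args is a made-up helper name.

```shell
# Print each argument a command would receive, one per line, in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Quoted: the whole URL arrives as one argument.
quoted=$(show_args 'http://acm.sgu.ru/problem.php?contest=0&problem=161')

# Unquoted, run in a child shell: "&" ends the command and puts it in the
# background, and "problem=161" is then read as a variable assignment.
unquoted=$(sh -c 'printf "[%s]\n" http://acm.sgu.ru/problem.php?contest=0&problem=161')

printf 'quoted:   %s\nunquoted: %s\n' "$quoted" "$unquoted"
```

In the unquoted case the command only ever sees the URL up to contest=0, which is why wget fetched the wrong page.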

How to set up a wget cron job command

How do I set up a cron job command to request a URL?
/usr/bin/wget -q http://www.domain.com/cron_jobs/job1.php >/dev/null 2>&1
Why can't I make this work!? I have tried everything. The PHP script should send an email and create some files, but neither happens.
The command returns this:
Output from command /usr/bin/wget -q http://www.domain.com/cron_jobs/job1.php ..
No output generated
... but it still creates an empty file in /root on each execute!? Why?
Use curl like this:
/usr/bin/curl http://domain.com/page.php
Don't worry about the output, it will be ignored
I had the same problem. The solution is understanding that wget is outputting two things: the results of the url request AND activity messages about what it's doing.
By default, if you do not specify an output file, it will create one, seemingly named after the file in your url, in the current folder where wget is run.
If you want to specify a different output file:
-O outputfile.txt
will output the URL results to outputfile.txt, overwriting what's there.
If you wish to append to that file instead, write to standard output and append from there. Here's the trick: to write to standard output use:
-O-
The second dash stands in for a filename and tells wget to write the URL results to standard output.
Then use the append syntax, >>, to append to a file of your choice:
wget -O- http://www.invisibility.com >>/var/log/invisibility.log
The lowercase -o specifies the location of the activity log, so if you wish to log activity for the URL request, you can use:
wget -o /var/log/activity.log http://someurl.com
-q suppresses output of activity messages, so
wget -q -o /var/log/activity.log http://someurl.com
will not log any activity to the specified file, and I think that is the crux where people get confused.
Remember:
-O is shorthand for --output-document
-o is shorthand for --output-file, which is the activity log.
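Putting the two options together for a cron job might look like the sketch below. The function run_job, the WGET override, and the log path /tmp/job1.log are all assumptions made for illustration. The override exists only so the option order can be checked without touching the network.

```shell
# Hypothetical cron helper. WGET can be overridden for a dry run;
# it defaults to the real wget.
WGET=${WGET:-wget}

# Usage: run_job URL LOGFILE
# Discards the page itself (-O /dev/null) so no stray files are created,
# and writes wget's activity messages to LOGFILE (-o LOGFILE).
run_job() {
    "$WGET" -O /dev/null -o "$2" "$1"
}

# Dry run: replace wget with a function that prints its arguments.
print_args() { printf '%s\n' "$@"; }
dry_run=$(WGET=print_args run_job 'http://www.domain.com/cron_jobs/job1.php' /tmp/job1.log)
printf '%s\n' "$dry_run"
```

Each option is immediately followed by its own argument. Writing the URL between -o and the log path would make wget treat the URL as the log file name.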
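It took me hours to get it working. Thank you to the people who write down their solutions.
Also check whether single or double quotes are needed, otherwise the shell may parse the URL wrongly and produce error messages. Single quotes pass the URL through literally; inside double quotes the shell still expands $, backticks and backslashes, so the two forms differ only when the URL contains those characters.
This worked (using single quotes):
/usr/bin/wget -q -O - 'http://domain.com/cron-file.php'
This gave errors (using double quotes):
/usr/bin/wget -q -O - "http://domain.com/cron-file.php"
The /usr/bin/ prefix may not be strictly needed, but an absolute path avoids depending on cron's PATH. The order of the options matters: -O takes the next argument as its filename, so write -q -O - (or -qO-), because -O -q would save the page to a file named -q. It is hard to find a reliable, definitive source on the web for this subject; there are so many different examples.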
An online wget manual can be found here, for the available options (but check with the Linux distro one is using for an up to date version):
http://unixhelp.ed.ac.uk/CGI/man-cgi?wget
To use wget to display the HTML:
wget -qO- http://www.example.com

Why could wget not work with PHP's exec function?

My script tries to exec() wget but seems to fail (though no error is raised). What could be the problem? Should I tune PHP somehow? I just installed Apache and PHP on Ubuntu...
Add third parameter to exec() to find out the exit code of wget.
Maybe wget is not in the (search) path of the apache/php process.
Did you try an absolute path to the wget executable?
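The same checks can be tried from a shell running as the web server's user. This sketch uses a made-up helper, find_tool, and looks up sh instead of wget so that it runs on any system. It reports where a tool lives and shows the exit code, which is the value exec() returns through its third parameter.

```shell
# Hypothetical check: report where a tool lives, or fail if the current
# PATH does not contain it (as can happen under a web server's account).
find_tool() {
    command -v "$1" || {
        printf '%s not found in PATH=%s\n' "$1" "$PATH" >&2
        return 1
    }
}

# Show the exit code alongside the result.
find_tool sh
printf 'exit code: %s\n' "$?"
```

Replacing sh with wget in the last call shows the absolute path to use in the exec() call, if wget is installed.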
What is your $_GET['one']? The name of a video file? A number? A URL? What's $file? What's $one?
Obvious error sources:
Are all of those variables set? If $one is blank, then wget has nowhere to go to fetch your file. If $_GET['one'] or $file is blank, then your output file will most likely not exist: either the directory can't be found (because $_GET['one'] is empty), or $file is empty, causing wget to try to output to a directory name, which is not allowed.
'illegal' characters in any of the variables. Does $file contain shell meta-characters? Any of ;?*/\ etc...? Those will all screw up the command line.
Why are you using wget anyways? You're passing raw query parameters out to a shell, which is just asking for trouble. It would be trivial to pass in shell metacharacters, which would allow remote users to run ANYTHING on your webserver. Consider the following query:
http://example.com/fetch.php?one=;%20rm%20-rf%20/%20;
which in your script becomes:
wget -O /var/www/videos/; rm -rf / ;/$file $one
and now your script is happily deleting everything on the server which your web server's user has permissions for.
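One common defence is to accept only values that match a strict whitelist before they get anywhere near a command line. The sketch below shows the idea in shell. The function name is_safe_name and the sample inputs are invented, and in PHP itself the usual tools would be a similar pattern check plus escapeshellarg() around every argument.

```shell
# Hypothetical whitelist check: accept only names made of letters,
# digits, dot, underscore and hyphen; reject empty or dot-leading names.
is_safe_name() {
    case $1 in
        '' | .* | *[!A-Za-z0-9._-]*) return 1 ;;
    esac
    return 0
}

for candidate in 'video1.mp4' '; rm -rf / ;' '../etc/passwd'; do
    if is_safe_name "$candidate"; then
        printf 'accept: %s\n' "$candidate"
    else
        printf 'reject: %s\n' "$candidate"
    fi
done
```

Rejecting anything outside a small allowed set is easier to get right than trying to list every dangerous character.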

Cron job creating empty file each time it runs

I have a PHP script I want to run every minute to see if there are draft news posts that need to be posted. I was using "wget" for the cron command in cPanel, but I noticed (after a couple of days) that this was creating a blank file in the main directory every single time it ran. Is there something I need to do to stop that from happening?
Thanks.
When wget runs, by default, it generates an output file, from what I remember.
You probably need to use some option of wget to specify the file to which it should write its output, and use /dev/null as the destination file (it's a "special file" that will "eat" everything you write to it).
Judging from man wget, the -O or --output-document option would be a good candidate:
-O file
--output-document=file
The documents will not be written to the appropriate files, but all will be concatenated together and written to file.
so, you might need to use something like this :
wget -O /dev/null http://www.example.com/your-script.php
And, btw, the output of scripts run from the crontab is often redirected to a logfile -- it can always help.
Something like this might help, about that :
wget -O /dev/null http://www.example.com/your-script.php >> /YOUR_PATH/logfile.log
And you might also want to redirect the error output to another file (can be useful, to help with debugging, the day something goes wrong) :
wget -O /dev/null http://www.example.com/your-script.php >>/YOUR_PATH/log-output.log 2>>/YOUR_PATH/log-errors.log
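The effect of the two redirections can be tried without wget. In this sketch a stand-in function, fake_job, writes one line to each stream, and a temporary directory takes the place of /YOUR_PATH; both are assumptions for the demonstration. wget itself prints its progress messages on standard error, so with -O /dev/null it is the error log that receives them.

```shell
# Stand-in for a cron command: one line on standard output and one on
# standard error.
fake_job() {
    echo "page body"
    echo "progress message" >&2
}

logdir=$(mktemp -d)   # temporary directory instead of /YOUR_PATH
fake_job >>"$logdir/log-output.log" 2>>"$logdir/log-errors.log"
fake_job >>"$logdir/log-output.log" 2>>"$logdir/log-errors.log"

# Each log now holds two lines, because >> appends rather than truncates.
wc -l "$logdir/log-output.log" "$logdir/log-errors.log"
```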
