Execute PHP scraper script in cronjob - php

I am trying to execute this script that will go fetch data from a site and import it into my database.
I have created the cronjob and waited for 20 minutes. There is no error or result, it is just silent like nothing happened.
I am also not getting an email showing the result of the command. How can I execute this script and also receive the result via email?
This is the cronjob I am currently using:
20 * * * * /usr/bin/GET http://example.com/wp-content/plugins/ScriptName/scrap_data2.php?request_type=import_animes&site=2

Although Hostgator does use GET as one of its cron examples, the usual way to "run" a script from cron via its HTTP link is to use wget or curl.
I believe GET is part of the libwww-perl package, so whether it works depends on whether and where that package is installed. Try using wget instead:
20 * * * * wget -O /dev/null http://example.com/wp-content/plugins/ScriptName/scrap_data2
The -O /dev/null above discards the output that wget downloads, which you won't need since your script emails on success.
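A related detail to check in the original entry: cron hands the command line to /bin/sh, where an unquoted & is a control operator, so a URL with several query parameters is cut off at the first &. The sketch below is only an illustration of that behaviour; it uses a shortened stand-in URL and echo in place of a real fetch, so nothing is requested over the network.

```shell
#!/bin/sh
# echo stands in for the fetch command: it prints the argument it receives.
# The URL is a shortened, hypothetical version of the one in the question.

# Unquoted: the shell treats & as "run in background", so echo only sees
# the part before it, and site=2 becomes a separate variable assignment.
unquoted=$(sh -c 'echo http://example.com/scrap_data2.php?request_type=import_animes&site=2')

# Quoted: the whole URL, including &site=2, is passed as one argument.
quoted=$(sh -c 'echo "http://example.com/scrap_data2.php?request_type=import_animes&site=2"')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

In a crontab entry this would mean wrapping the URL in double quotes.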

Related

cronjob executing a php file

So I have a PHP file that executes each time you reload it in the web browser. It uses PHPMailer to send mail based on criteria in my db. I was attempting to use a cron job to execute the file, which I thought would basically do the same as reloading the page. The PHP file that I need to run in the cron job is test.php, and its path is /var/www/html/mailer/test.php.
My cronjob is:
1 * * * * root /var/www/html/mailer/test.php >> /var/www/html/mailer/cron_status.log
and it should be throwing errors into that cron_status.log file, but it's empty. I realize that this is firing every minute, but I'm just doing it to test the cronjob, and I really need to set it to 24hrs. With no error output, and no emails landing where they should be, I don't think I've properly set up my cronjob. This is my first time ever trying this. This is on a CentOS 7 droplet, and I've followed the tutorial from DigitalOcean with no success.
I would need to see the PHP file to be sure, but the entry runs the file directly instead of passing it to the php interpreter, so you probably need to change it to this:
1 * * * * php /var/www/html/mailer/test.php >> /var/www/html/mailer/cron_status.log
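Note that >> on its own only appends standard output to the log, so PHP errors written to standard error would not appear there. A minimal sketch of the difference, assuming a hypothetical shell function job in place of the PHP script and temporary files in place of cron_status.log:

```shell
#!/bin/sh
# Hypothetical stand-in for the PHP script: one line on stdout, one on stderr.
job() {
    echo "mail sent"
    echo "PHP Warning: something failed" >&2
}

stdout_only_log=$(mktemp)
both_log=$(mktemp)

# As in the cron entry above: only stdout is appended to the log.
# The warning goes to the terminal (under cron it would go to cron's mail).
job >> "$stdout_only_log"

# With 2>&1 added, stderr follows stdout into the same file.
job >> "$both_log" 2>&1

stdout_only_lines=$(wc -l < "$stdout_only_log" | tr -d ' ')
both_lines=$(wc -l < "$both_log" | tr -d ' ')
echo "stdout only: $stdout_only_lines line(s); stdout+stderr: $both_lines line(s)"

rm -f "$stdout_only_log" "$both_log"
```

Appending 2>&1 to the cron command would send errors to the same log file.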

Codeigniter CLI command with Tsohost not working

I currently have a PHP website built with CodeIgniter, and I'm having issues with CLI and cron jobs.
The CLI is set up so that the controller running the script is at /application/controllers/scrape; on the server (looking via FTP) this is /public_html/application/controllers/scrape. The function to run is called all_sites.
I'm hosted with TSOhost and can successfully run the command in the browser via the URL (website.com/index.php/scrape/all_sites); however, the script times out, hence the need to use a cron job to run it.
So far I have tried the following raw cron commands in the advanced mode of the TSOhost control panel when trying to get the script to run daily:
The TSOhost technician set this up
03 19 * * * /usr/bin/php-5.3 /var/sites/s/website.com/public_html/application/controllers/scrape.php (didn't work)
0 6 * * * /usr/bin/wget -O /dev/null -o /dev/null http://www.speeddatemate.com/index.php/scrape/all_sites
0 6 * * * /usr/bin/php-5.3 /var/sites/s/speeddatemate.com/public_html/application/controllers/scrape/all_sites
03 19 * * * /usr/bin/php-5.3 /var/sites/s/speeddatemate.com/public_html/application/controllers/scrape.php
03 19 * * * /usr/bin/php-5.3 /var/sites/s/speeddatemate.com/public_html/application/controllers/scrape/all_sites
10 18 * * * /usr/bin/php-5.3 /var/sites/s/speeddatemate.com/public_html/index.php scrape all_sites
TSO host have stated:
For referencing your site path, use /var/sites/s/website.com
The path to PHP 5.2 is /usr/bin/php and for 5.3 it's /usr/bin/php-5.3
The technician also said:
"To run from CLI you would need to find a way to get those parameters into the script, they can't be appended to the command."
But is that not the point of the command?
I've also tried running it via the "make a http request" option which creates the raw job as:
0 6 * * * /usr/bin/wget -O /dev/null -o /dev/null http://www.speeddatemate.com/index.php/scrape/all_sites
Again this does not work.
I've searched high and low for a way to get this working, read various posts, and tried various methods, but nothing has worked. Can anyone help?
First off, don't bother setting up a cron job unless you have it working on the command line. You will end up mixing different things and generating a plethora of unwanted possibilities for debugging which will confuse you further.
Now, you are saying that
I can successfully run the command using the browser via URL (website.com/index.php/scrape/all_sites), however the script times out
This could be because the script takes longer than 30 seconds to complete and is therefore hitting PHP's max_execution_time limit. Check this first.
If this is the case, resolve it by increasing the limit.
Once you have a working version of the script, either via browser or via command line, you can go ahead and schedule the cron jobs, like already shared by your TSOhost tech support.
03 19 * * * /usr/bin/php-5.3 /var/sites/s/website.com/public_html/application/controllers/scrape.php (didn't work)
0 6 * * * /usr/bin/wget -O /dev/null -o /dev/null http://www.speeddatemate.com/index.php/scrape/all_sites
Once again, setup the crons only after you have the other pieces working.
If it still doesn't work, give more info regarding:
What exactly the script does? Does it download something, does it backup something; update the question with whatever it does.
What parameters does it require? and how were you passing them via the url?
What do the logs say? Assuming your webserver is apache, you can usually find them at /var/log/apache2/error.log and /var/log/apache2/access.log.
EDIT 1
OP says in comments that php index.php scrape all_sites works for him.
Assuming that works from the root of his app, where the path to index.php can be assumed to be /var/sites/s/website.com/public_html/application/index.php, try this cron job:
03 19 * * * cd /var/sites/s/website.com/public_html/application/ && /usr/bin/php-5.3 index.php scrape all_sites
If possible, schedule it for a time close to the current time rather than a fixed job at 19:03, so that you can see the result quickly.
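As a small convenience, the minute and hour fields for "a couple of minutes from now" can be computed rather than worked out by hand. This is only a sketch: the helper name cron_time_after is made up for illustration, and the command appended to the schedule is the one from the entry above.

```shell
#!/bin/sh
# Print "minute hour" for the time `offset` minutes after hour:minute,
# wrapping around at midnight. Arguments: hour minute offset.
cron_time_after() {
    h=${1#0}   # strip one leading zero so 08 is not read as octal
    m=${2#0}
    total=$(( (h * 60 + m + $3) % 1440 ))
    printf '%d %d\n' $(( total % 60 )) $(( total / 60 ))
}

# Build a test entry that fires two minutes from the current time.
when=$(cron_time_after "$(date +%H)" "$(date +%M)" 2)
echo "$when * * * cd /var/sites/s/website.com/public_html/application/ && /usr/bin/php-5.3 index.php scrape all_sites"
```

Once the job is confirmed to run, the entry can be changed back to the intended daily time.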
If this still doesn't work, and assuming the max-execution-time has already been taken care of, the problem could be with one of your environment variables: your CLI shell environment may have some variables that are missing from your cron environment.
In my experience, I have found that PATH variable causes the most troubles, so run echo $PATH on your shell, and if the path value you get is /some/path:/some/other/path:/more/path/values, run your cron job like
03 19 * * * export PATH="/some/path:/some/other/path:/more/path/values" && cd /var/sites/s/website.com/public_html/application/ && /usr/bin/php-5.3 index.php scrape all_sites
If this doesn't work out, check all the environment variables next.
Use printenv > ~/shell_environment.txt from a normal shell to get all the environment variables set in the shell. Then use the following cron entry * * * * * printenv > ~/cron_environment.txt to get the variables from the cron environment.
Compare the two - shell_environment.txt and cron_environment.txt for any unset variable which you need to tinker with in cron environment.
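The comparison can be sketched as follows. This is an illustration rather than a substitute for the real cron dump: it assumes env -i with three variables roughly imitates cron's sparse environment, and it compares variable names only, writing to a temporary directory instead of the home directory.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Variable names visible in the current shell.
env | cut -d= -f1 | LC_ALL=C sort -u > "$workdir/shell_names.txt"

# Assumption: a near-empty environment stands in for what cron provides.
# On a real system, use the output of the printenv cron entry instead.
env -i HOME="$HOME" PATH=/usr/bin:/bin SHELL=/bin/sh \
    | cut -d= -f1 | LC_ALL=C sort -u > "$workdir/cron_names.txt"

# Names that are set in the shell but absent from the cron-like environment.
missing=$(LC_ALL=C comm -23 "$workdir/shell_names.txt" "$workdir/cron_names.txt")
cron_count=$(wc -l < "$workdir/cron_names.txt" | tr -d ' ')

echo "Variables missing from the cron-like environment:"
echo "$missing"

rm -rf "$workdir"
```

Any variable in that list that the script depends on is a candidate for an explicit export in the cron command.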
First of all, make sure your script runs as expected from your browser. If so, you can then try to run it from the command line. Let's assume the following is your controller.
<?php
class Scrape extends CI_Controller {
    public function all_sites($some_parameter = 'working')
    {
        echo "Its {$some_parameter}!".PHP_EOL;
    }
}
?>
Based on the information you provided, you can run the command as
/usr/bin/php-5.3 /var/sites/s/website.com/public_html/index.php scrape all_sites
And set cron as
03 19 * * * /usr/bin/php-5.3 /var/sites/s/website.com/public_html/index.php scrape all_sites
If you need to pass a parameter, pass the parameter like:
/usr/bin/php-5.3 /var/sites/s/website.com/public_html/index.php scrape all_sites "Working fine"
Enjoy!
The CodeIgniter documentation has a reference section on running via the CLI.

Cron job is not giving required result, but accessing same file through browser does

I need to run a PHP script from a cron job that uses CutyCapt to generate snapshots of some websites. I get the websites' addresses from the database and then delete each record after generating its screenshot.
I used */5 * * * * /usr/bin/php -f /path/generate.php
It didn't work from the cron job, but if I access the same file in a browser it works fine, and if I run the script with the php command from the command line it also works fine.
Then I created another file that accessed the URL using file_get_contents and added that file to the cron job. It worked fine for some days, but now this approach is not working either. I don't know what happened; I didn't change any of these files.
I also tried the wget command to access that URL but failed to get the required output.
My crontab now looks like this:
*/5 * * * * wget "http://www.mysite.com/generate.php" -O /dev/null
The strange thing is that the cron job executes fine: it fetches data from the database and deletes the record as well, but it does not update the images.
Is there a permissions problem or something similar that prevents it from generating images from a cron job but not when accessed from a browser?
Please help, I am stuck.
I don't know what your script is doing internally, but keep in mind that a user's cron jobs do not inherit the user's environment. So you may be running into a PATH issue if your PHP script runs any shell commands.
If you are running shell commands from the script, try setting the PATH environment variable from within your php script and see if that helps.
Are there any user credentials on this page, such as Basic authentication?
If so, you have to supply the user name and password in the wget request, like this:
wget --http-user=user --http-password=password "http://url"
Another solution is to run your script from the PHP command line,
so your crontab could look like:
*/5 * * * * /usr/bin/php -f /path/to/generate.php
Try this solution; it should work, and it is better than hitting the web server to run background operations on your data.
I hope this helps.

Cron running but functionality not working

I have several PHP files to be run by cron. I set up the crons using command-
crontab crontab.txt
Inside the crontab.txt file, I have written cron commands like this:-
#(Updating tutor activities) - every minute
* * * * * /usr/bin/wget -O - -q -t 1 http://project/cron/tutor_activities.php
But none of the functionalities are working (database queries, sending reminder mails etc.). Running the URLs manually works.
Then I put my mail address in MAILTO and received the mails. In the mail, I received entire HTML source of the page. What is expected in the mail? Why are my functionalities not working?
Updates
If I change my cron commands to
#(Updating tutor activities) - every minute
* * * * * /usr/bin/wget http://project/cron/tutor_activities.php
Still no success and this comes in my mail -
--15:03:01-- http://project/cron/tutor_activities.php
=> `tutor_activities.php'
Resolving project... IP Address
Connecting to test.project|IP Address|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://project./ [following]
--15:03:01-- http://project./
=> `index.html.1'
Resolving project.... IP Address
Connecting to project.|IP Address|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://project/home/ [following]
--15:03:01-- http://project/home/
=> `index.html.1'
Resolving project... IP Address
Connecting to wproject|IP Address|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
index.html.1 has sprung into existence.
Retrying.
And lots of index.html.1 , index.html.2 files are accumulating in the root of my project. I do not want these files to be created. Just want the files to execute.
Same results if I use either of the two commands -
* * * * * /usr/bin/wget http://project/cron/tutor_activities.php
* * * * * wget http://project/cron/tutor_activities.php
Running the php command below with MAILTO set sends me this error: /bin/sh: php: command not found.
* * * * * php /path/to/test.php
So, I am not able to use php command.
I have written a simple mailto() inside my test.php. The mail does not come when run through cron (using both wget and php fails) but running the URL manually works.
My problem
To make it clear again, my main problem is that the functionality inside the cron files is not running. Creation of files is a secondary issue.
Any help would be appreciated
Thanks,
Sandeepan
If you want to call a URL as a cron job, you'll have to use something like wget. If it's a PHP script on your own server, it would be easier to use php /...pathtomyscript.../cron/tutor_activities.php
Try
which php
The path it returns should be used in the command that cron runs. If you are setting up the cron job through a shell it usually won't cause a problem, but to be sure, give the absolute path when running a PHP page:
/path/to/php /path/to/cron/script/
Try giving your command like this; if the problem persists, feel free to discuss.
When you call wget with -O -, it will send the downloaded content to stdout, which cron is sending to you via the email message. In the first case, it's doing exactly what it should.
When you call wget without the -O parameter, it will try to save the downloaded content as a file with the same name as the web page being downloaded. If that file already exists, it appends an incrementing number to the name, as you saw. In this second case, it's doing exactly what it should.
It's not clear from your question where you want the output to go, but if you want to save the output to the same file each time, use -O myfilename.html.
If you're running PHP from cron or the command line, make sure you give the full path to the php executable.
It's entirely possible that PHP's not in the path within the cron environment - it's definitely not going to have the same setup as your regular shell. Try using the absolute path to BOTH the php interpreter AND the php script in the cron command:
* * * * * /path/to/php /path/to/test.php
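Whether a bare command name would resolve under cron can be checked ahead of time by repeating the lookup with a restricted PATH. A minimal sketch, assuming cron's default PATH is as short as /usr/bin:/bin (the real value varies by system) and using a made-up helper named found_in_path:

```shell
#!/bin/sh
# Report whether a command can be found using only the given PATH.
# The lookup runs in a subshell so the caller's PATH is left untouched.
found_in_path() {
    ( PATH=$2; command -v "$1" >/dev/null 2>&1 )
}

# Assumption: a typical minimal PATH as cron might provide it.
cron_like_path=/usr/bin:/bin

for tool in sh php; do
    if found_in_path "$tool" "$cron_like_path"; then
        echo "$tool: found on $cron_like_path"
    else
        echo "$tool: NOT found on $cron_like_path, use its absolute path in the crontab"
    fi
done
```

If php is reported as not found, the absolute path printed by which php in a normal shell is the one to put in the cron command.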
As for the creation of files, you just have to add a redirect to your wget command:
wget -O - ... http://.... > /dev/null
-O - forces wget to write anything it downloads to standard output, which cron will then happily email to you. By adding > /dev/null at the end of the command, this output instead goes to the Great Bitbucket in the Sky. If you don't want wget's stderr output emailed either, you can also add 2>&1 after the /dev/null, which further redirects stderr to stdout, which is now going to /dev/null.
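The order of the two redirections matters, which is easy to get wrong. The sketch below assumes a hypothetical shell function noisy in place of wget and captures whatever would still reach cron's mail in each case.

```shell
#!/bin/sh
# Hypothetical stand-in for wget: page content on stdout, progress on stderr.
noisy() {
    echo "page body"
    echo "progress log" >&2
}

# Correct order: stdout goes to /dev/null first, then stderr is pointed at
# the same place, so nothing is left for cron to mail.
leak_correct=$( { noisy > /dev/null 2>&1; } 2>&1 )

# Reversed order: stderr is attached to the original stdout before stdout
# is redirected, so the stderr text still escapes.
leak_reversed=$( { noisy 2>&1 > /dev/null; } 2>&1 )

echo "correct order leaks:  '$leak_correct'"
echo "reversed order leaks: '$leak_reversed'"
```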
I found the problem myself. I did not put the same URL in my crontab file which I was running manually and that was my mistake.
While running manually I was just typing test in the URL bar, my browser's list of saved URLs was appearing, and I was selecting the URL http://www.test.project.com/cron/tutor_activities.php, but in the crontab file I had put http://test.project.com/cron/tutor_activities.php. I was mistakenly assuming this would run http://www.test.project.com/cron/tutor_activities.php (because we have a rewrite rule present to add www).
But the rewrite rule was redirecting it to http://www.test.project.com/home. That's why the HTML content in the mails.
So, the most important thing to learn here is to make sure we don't miss the minute things and don't assume that we did everything correctly. In my case, better to copy-paste the working URL into the cron file.
An easy and secure (no tmp files) way to do this is to use bash's process substitution:
* * * * * bash -c "/path/to/php <(/usr/bin/wget -O - -q -t 1 http://project/cron/tutor_activities.php)"
Process substitution runs the command within <() and puts the output into a file object that is only visible from the current process. In order to use it from cron, invoke bash directly and pass it as a command string.
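To see the mechanism in isolation, here is a small sketch that needs bash to be installed. printf stands in for the wget call and cat stands in for the php interpreter; both substitutions are assumptions made purely for illustration, so that nothing is fetched or executed.

```shell
#!/bin/sh
# Plain sh does not support <( ), so bash is invoked explicitly,
# just as the cron entry above does with bash -c.
# cat receives a file name from <( ) and reads the inner command's output from it.
out=$(bash -c 'cat <(printf "hello from the inner command\n")')
echo "$out"
```

Whatever program is placed before <( ) is given a readable file name containing the inner command's output, not the original script source.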
And as others have mentioned, use the full path to php which you can find out with which php.

My php script is not sending an email when called by a cron job

I am running a cron job every five minutes which calls a php script to check to see if users have imported any files for processing.
When I run the php script by going to the address in my web browser it runs the script and then sends the user a notification by email. When I run the script using the cron job, the script works fine, but it doesn't send the user an email. Any thoughts about why no email is sent?
I'm running Ubuntu Hardy LTS. The cron job is:
*/5 * * * * /usr/bin/wget --delete-after http://www.mywebsite.com/import_processing.php >/dev/null 2>&1
I'm using delete-after so that I don't get copies of the script piling up in my server directory. I'm suppressing output and errors also as I don't need email confirmation myself.
The script uses the basic mail function, and as I said, works just fine when run from my browser.
Update: It looks like the issue is my php script is looking for a browser cookie to send the email. I imagine I'll have to find another way to get the user's identity.
Run it like this:
*/5 * * * * /usr/bin/php /var/www/htdocs/blah/blah/import_processing.php >/dev/null 2>&1
When you use wget, you are requesting the page over HTTP and downloading the response; with php, you are running the file directly.
Test your script by running
/usr/bin/php /var/www/htdocs/blah/blah/import_processing.php >/dev/null 2>&1
using the local path. Just in case, run
$ which php
to figure out where php is installed.
The script was running properly using wget or php; the problem was that the php script looked for a browser cookie which didn't exist when the page was run outside of a web browser.
I changed the php script to do a database lookup instead and it worked just fine. Thanks for your suggestions.
