I'm working on an app written in CakePHP 2.3.8 on an Ubuntu 12.04 server running apache2. I'd like to create a cron job to handle a situation that occurs on the first day of every month. Each month users are given a set number of specific tasks they can use; if they go over this limit, they're charged based on the number of tasks they go over by. I'd like to create a cron job to accomplish this, but my concern is someone accessing the URL of the CakePHP action for this specific task, which could then initiate financial transactions.
I read through this writeup from Google about cron jobs, but I'm not quite sure I understand what they're saying about securing URLs.
A cron handler is just a normal handler defined in app.yaml. You can prevent users
from accessing URLs used by scheduled tasks by restricting access to administrator
accounts. Scheduled tasks can access admin-only URLs. You can restrict a URL by
adding login: admin to the handler configuration in app.yaml.
If the URL being accessed is powered by my CakePHP app, how is cron able to determine whether or not an administrator is accessing it? Or am I supposed to write a stand-alone PHP (or whatever language) file to handle these cron jobs, and inside that file it can "talk" to cron to determine if an admin is accessing it?
Say I do use CakePHP to power it. Would it be safe (or rather necessary) to use a long string in the URL so that basically no one would guess it, and have it match that string in the code?
So something like www.mysite.com/url/to/task/jdbpojzm2929qJjfwX82j3zze9iwj919jsfjmmwmwi
And then my code for that job
function cron_called_function($code) {
    // strict comparison avoids PHP's loose type juggling
    if ($code === "jdbpojzm2929qJjfwX82j3zze9iwj919jsfjmmwmwi") {
        // do task
    }
}
Non-public member functions cannot be accessed via the url. Cake convention says prefix the method with an underscore.
private function _cron_called_function() { // or protected
    // do task
}
Or perhaps look at creating a shell and setting up a cron in cake.
Never use URLs for this kind of task. It is simply wrong and insecure, and it can cause your script to die or the server to stop responding.
Let's say you have 10000 users and a script runtime of 30 sec. It is very likely that the script times out before it has finished, and you end up with only part of your users being processed. The other scenario, with a high or unlimited script runtime, can lock your server. Depending on the script or DB actions it might put the server under high load, and users who use the site while the script is running will encounter a horribly slow or unresponsive site.
Also, you can't really run a loop on a single URL. Well, you could redirect from one URL to another that does the limit and offset thing to simulate a loop over the 100000 users. If you don't loop over the records but fetch all 100000 at the same time, it's likely your script dies because it runs out of memory.
You should create a shell that processes the users in a loop and always just processes batches of for example 10, 50 or 100 users.
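A minimal sketch of that batching idea in plain PHP, not tied to CakePHP's shell classes. The function name processInBatches and both callbacks are made up for illustration: $fetchBatch stands in for a LIMIT/OFFSET query and $chargeUser for whatever the billing logic is.

```php
<?php
// Hypothetical helper: walks through all users in fixed-size batches.
// $fetchBatch($offset, $limit) stands in for a LIMIT/OFFSET query and must
// return an array of user IDs (empty when there are no more users).
// $chargeUser($userId) is a placeholder for the actual billing logic.
function processInBatches(callable $fetchBatch, callable $chargeUser, $batchSize = 100)
{
    $offset = 0;
    $processed = 0;

    while (true) {
        $batch = $fetchBatch($offset, $batchSize);
        if (count($batch) === 0) {
            break; // nothing left to process
        }
        foreach ($batch as $userId) {
            $chargeUser($userId);
            $processed++;
        }
        $offset += $batchSize;
    }

    return $processed; // total number of users handled
}
```

Only one batch is held in memory at a time, which is the point of the advice above. If the work done on each user changes which rows the query returns, paging by offset can skip rows, so in that case paging by "id greater than the last one seen" would be the safer variant.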
When executing your shell I recommend running it under the "nice" command, which lowers its CPU priority so that the shell does not take 100% of the CPU and your site keeps responding.
You can't "talk" to a cron either; a cron is nothing more than a timed execution of something. You can't really specify a user either, unless you implement the shell in a way that allows you to pass a specific user as an argument, for example "cake transactions --user admin". If you mean executing the shell as a specific system user, see How to specify in crontab by what user to run script?.
Look at creating a shell and setting up a cron in cake.
There are a bunch of ways to prevent anyone but your own server from accessing a url. None are perfect, but some are better than others.
If possible, point the cron to a page that is simply not visible on the web. This could be a page that is located above the public_html hierarchy. From within the server, this page will be accessible, but it will not be accessible via url. This is the best option, IMO.
Another option is to restrict the page to the ip address of the server and to other values in the request such as a post or querystring variable.
And, of course, you have already figured out that you can include a secret or token in the url that is long enough to make it difficult or unlikely to guess.
You could also ping a page that, in turn, uses CURL to log in as an administrator and runs the page. This is, in some ways, the option that most reflects how you interact with the site. You could create an admin called "cron" and then there would be a log of "cron"'s activities just like any other admin. http://php.net/manual/en/book.curl.php
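The first three options above (command line only, IP restriction, secret token) could be combined roughly like this. It is only a sketch: isTrustedCronRequest and all of its parameters are made-up names, nothing here is CakePHP API, and hash_equals needs PHP 5.6 or later.

```php
<?php
// Hypothetical guard for a cron entry point. All names are placeholders.
//   $sapi          - value of php_sapi_name() ('cli' when started by cron)
//   $remoteAddr    - e.g. $_SERVER['REMOTE_ADDR'] for a web request
//   $suppliedToken - token taken from the URL or POST data
//   $allowedIps    - addresses allowed to trigger the job over HTTP
//   $secretToken   - the long secret stored on the server
function isTrustedCronRequest($sapi, $remoteAddr, $suppliedToken, array $allowedIps, $secretToken)
{
    // Started from the command line: no web request is involved at all.
    if ($sapi === 'cli') {
        return true;
    }

    // Web request: the caller must come from an allowed address ...
    if (!in_array($remoteAddr, $allowedIps, true)) {
        return false;
    }

    // ... and must present the secret. hash_equals compares in constant
    // time, so the comparison does not leak how many characters matched.
    return is_string($suppliedToken) && hash_equals($secretToken, $suppliedToken);
}
```

A script would call it with php_sapi_name() and the request values before doing any work, and simply exit when it returns false.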
Related
I developed a site using Zend Framework 2. It is basically a price comparison site that integrates with many of the top affiliate networks out there. I wrote a script that checks prices from each affiliate network, and then updates my local DB with that price. Depending on which affiliate network I am contacting, I may be making an API call (Amazon or CJ.com), or I may be looking at an XML product feed (Pepperjam or LinkShare). The XML product feed would be hosted locally.
At present, there are around 3,500 SKUs that I am checking with this script. The vast majority of them (95%+) are targeting an XML product feed. I would estimate that this script should probably take in the neighborhood of 10 minutes to complete. Some of the XML files I am looking at are around 8 MB in size.
I have tested this script thoroughly in my local environment and gone to great lengths to make sure that there is no memory leak or something of that nature which would cause performance issues. As an example, I made sure to use data streams where possible to avoid putting the XML file in memory over and over, etc. Suffice to say, the script runs locally without issue.
This script is intended to be run as a cron job, however I do have a way to trigger it via the secure admin interface ad-hoc. Locally, this is how I initiate the script to run, and everything goes rather smoothly.
When I deploy my code to the shared hosting account, I am having all sorts of problems. In order to troubleshoot, I attached logging to various stages of this script to track when it starts, how it progresses, and when each step completes, etc. All of this is being logged to a MySQL database.
Problem #1: If I run the script ad-hoc via an HTTP request, I find that it will run for a couple of minutes, and then the script starts again (so there are now apparently two instances running). Wait another couple of minutes, and a third one will start, etc. Here is an example from when I triggered the script to run at 10:09pm via an HTTP request.
Screenshot of process manager
Needless to say, I DO NOT run it via an HTTP request because it only serves to get me in trouble with my web hosting provider :)
Problem #2: When the script runs on the server, triggered via a cron job, it fails to complete. I have copied the production database and the XML files to my local environment, and it runs fine there, so it should not be a problem with bad data exposing bad code. My observation is that the script runs for nearly the exact same amount of time before it aborts, or is terminated, or whatever. The last record updated is generally timestamped around 4 minutes and 30 seconds or so (if memory serves) after the script is triggered. The SKU list is constantly changing, so the record that it ends on differs, but the time of the last update is nearly the same each time. Nothing is being logged in the error logs. I monitored server resources via the top command over SSH and there is nothing out of the ordinary: CPU usage is in check and memory used does not go up.
I have a shared hosting account through Bluehost. My thoughts were that perhaps it was a script max execution time issue. I extended the max execution time in the script itself and via php.ini. Made no difference.
So I guess what I am looking for is some fresh ideas of where to go next. What questions should I be asking my hosting company so they can help me get to the bottom of this. They are only somewhat helpful to say the least. Could it be some limitation on my hosting account? Triggering some sort of automatic monitor that is killing the script? What types of Apache settings could be problematic for a script of this nature? PHP.ini settings? Absolutely any input you can provide would be helpful.
And why, when triggered via HTTP, would it keep spinning up new instances? I guess I could live w/o running it manually, and only run it via a cron job, but that isn't working either. So I'm interested in hearing the community's thoughts on this. Thanks!
I haven't seen your script, nor have I worked with your host, so everything below is just a guess and a suggestion.
Given your description, I would say you're right that your script might have been killed by a timeout when run from cron. I'm not sure why it keeps spawning new instances of your script when you execute it manually via an HTTP request, but it may also be related to a timeout (e.g. if they have logic that restarts a script if it has not produced any output within a certain time, or something like that).
You can follow up with your hosting provider about running long-running (or memory-consuming) scripts in their environment; they might have a FAQ or document already written that covers this topic.
Let me suggest an option for you in case your provider is unable to help.
From what you said, I expect your script runs an SQL query to get a list of SKUs, and then slowly iterates over this list, performing some job on every item (and eventually dies for whatever reason, as we learned).
How about if you create a temporary table (or file - just any kind of persistent storage on the server) that would save the last processed record ID of the script, or NULL if the script successfully completed. That way you'll be able to make your script start with the last processed record (if the last processed record had id = 1000, add ... WHERE id > 1000 to the main query that fetches SKUs), and you won't really care if the script completed its first attempt or not (if not, it will keep processing from that very point when it was killed, on its second try).
Alternatively, to extend this approach, you can limit one invocation to a certain number of records (e.g. 100 or 1000), again saving the last processed record ID in the database or somewhere else.
The main idea is: if the script fails to process all SKUs at once, just make it restartable so that it does not lose its progress.
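As a rough illustration of the restartable approach, here is a sketch that keeps the checkpoint in a file. The function names, the checkpoint file and the $items array (standing in for the result of the SKU query, keyed by record ID) are all assumptions made for the example; deleting the file plays the role of storing NULL after a successful run.

```php
<?php
// Reads the last processed ID from the checkpoint file (0 if there is none).
function loadCheckpoint($file)
{
    return is_file($file) ? (int) file_get_contents($file) : 0;
}

// Processes at most $limit items whose ID is greater than the checkpoint,
// saving progress after every item. Returns true once everything is done.
//   $items   - array of id => data, a stand-in for the SKU query result
//   $handler - placeholder for the real per-record work (price update)
function processFromCheckpoint(array $items, $file, $limit, callable $handler)
{
    $lastId = loadCheckpoint($file);
    ksort($items); // make sure IDs are visited in ascending order
    $done = 0;

    foreach ($items as $id => $data) {
        if ($id <= $lastId) {
            continue; // already handled by an earlier run
        }
        if ($done >= $limit) {
            return false; // more work remains for the next invocation
        }
        $handler($id, $data);
        file_put_contents($file, (string) $id, LOCK_EX); // record progress
        $done++;
    }

    if (is_file($file)) {
        unlink($file); // finished: clear the checkpoint
    }
    return true;
}
```

If the script is killed halfway, the next cron run picks up after the last ID written to the file, so no progress is lost and nothing is processed twice.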
I have this scenario:
A user submits a link to my PHP website and closes the browser. Now that the server has the link, it will analyse the submitted link (page) for broken links, and after it has completely analysed the page it will send an email to the user. I fully understand the second part, i.e. how to analyse the page for broken links and send the mail to the user. The only problem I have is how to achieve the first part, i.e. how to make the server keep running the actions on its own even if there is no request made by the client end.
I have learned that "Crontab" or a "fork" may work for me. What do you say about these? Is it possible to achieve what I want, using these? What are the alternatives?
crontab would be the way to go for something like this.
Essentially you have two applications:
A web site where users submit data to a database.
An offline script, scheduled to run via cron, which checks for records in the database and performs the analysis, sending notifications of the results when complete.
Both of these applications share the same database, but are otherwise oblivious to each other.
A website itself isn't suited well for this sort of offline work, it's mainly a request/response system. But a scheduled task works for this. Unless the user is expecting an immediate response, a small delay of waiting for the next scheduled run of the offline task is fine.
The server should run the script independently of the browser. Once the request is submitted, the php server runs the script and returns the result to the browser (if it has a result to return)
An alternative would be to add the request to a database and then use crontab to run the PHP script at a given interval. The script would then check the database to see if there's anything that needs to be processed. You could limit the script to processing one database entry every minute (or whatever works). This will help prevent performance problems if you have a lot of requests at once, but will be slower to send the email.
A typical approach would be to enter the link into a database when the user submits it. You would then use a cron job to execute a script periodically, which will process any pending links.
Exactly how to setup a cron job (or equivalent scheduled task) depends on your server. If you have a host which provides a web-based admin tool (such as CPanel), there will often be a way to do it in there.
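A PHP script can keep running after the client closes the browser (terminating the connection), provided ignore_user_abort is enabled; by default PHP may abort the script the next time it tries to send output to the disconnected client.
Keep in mind that a PHP script's maximum execution time is limited to the value of the "max_execution_time" directive.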
Of course here I suppose the link submission happens calling your script page... I don't understand if this is your use case...
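For completeness, this is roughly what a web-triggered script would have to do at the top to survive a client disconnect and the time limit. The helper name is made up; ignore_user_abort and set_time_limit are standard PHP functions, though some hosts disable set_time_limit or enforce their own limits outside PHP, so treat it as a sketch rather than a guarantee.

```php
<?php
// Hypothetical helper to call at the start of a long web-triggered task.
function prepareLongRunningRequest()
{
    ignore_user_abort(true); // keep running if the client disconnects
    set_time_limit(0);       // remove the max_execution_time limit for this run

    // Called without arguments, ignore_user_abort() reports the current setting.
    return ignore_user_abort() === 1;
}
```

Even with this in place, the other answers' advice still applies: handing the work to a cron job or a queue is usually the more robust choice.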
For the sake of simplicity, a cronjob could do wonders. The user submits a link, and the web handler simply saves the link into a DB (let me pretend here that the table is named "queued_links"). Then a cronjob scheduled to run each minute (for example) selects every link from queued_links, does the application logic (finds broken page links) and sends the email. It then also deletes the link from queued_links (or updates a flag to represent the fact that the link has already been processed).
For the sake of scale and speed, a cronjob wouldn't fit as well as a message queue (see RabbitMQ, ActiveMQ, Gearman, and beanstalkd; Gearman and beanstalkd are my two favorites, simple and a good fit with PHP). Instead of spawning a cronjob every minute, a queue processor listens for 'events' (think 'onLinkSubmission($link)') and processes the messages asynchronously, as soon as possible. The cronjob solution is just a simplified implementation of one of these MQ solutions; an MQ will give better, more predictable results, but at the cost of adding new services to maintain, etc.
Well, there are a couple of ways; the simplest of them would be:
When a user submits a request, save the request somewhere (let's call it the jobs table) and inform the customer that the request has been received and that they'll be updated once the site finishes processing it, or whatever suits you.
Now create one or more scripts (depending on your requirements) and run them from cron; each script picks requests from the jobs table, processes them, and does whatever is required.
Alternatively, you can evaluate the possibility of a message queue, or maybe of using a job server for this.
So it all depends on your requirements.
I started to learn programming about a month ago. I already knew HTML and CSS, so I thought I should learn PHP. I learned a lot of it from tutorials and books, and now I am making MySQL-based websites for practice.
I always used to play browser-based strategy games like Travian when I was a kid. I was thinking about how those sites worked. I didn't have any problem until I realized that the game actually kept working after you closed the browser. For example: you log in to your account, start a construction and log off. But even after you close the browser, the game knows that in "x" amount of time it needs to update your data for that specific building.
Can someone tell me how that works? Is it something with PHP or MySQL or some other programming language? Even if you can only tell me what to search for online, it would be enough.
Despite being someone who loves tackling steep learning curves, I would advise against trying to jump into something that requires background processes until you have a bit more programming experience.
But either way, here's what you need to know:
Normal PHP Process
The way that PHP normally works is the following:
User types a url into the browser and hits enter (or just clicks on a link)
Request is sent to a bunch of servers and magically finds its way to the right web server (beyond scope of this answer)
Server program like Apache or IIS listening on port 80 grabs the request
Apache sees that there's a .php extension on the requested page
Apache looks up if any processors have been assigned to .php and finds php.exe
The requested page is fed into php.exe
php.exe starts up a new process for the specific user, runs everything on the script, returns its result
The result is then sent back to the user
When the script finishes and the response has been sent, the process started by PHP exits; nothing keeps running once the request is over.
So the problem you encounter when you want something running in the background is that PHP is in most cases accessed through the web server, and hence usually requires a browser (and a user making requests through the browser). Since nothing runs between requests, you need a way to run PHP scripts without a browser.
Luckily PHP can also be run outside of the webserver, as a normal process on the server. But then the problem is that you have to access the server. You probably don't want your users to ssh into your server in order to manually run scripts (and I'm assuming you don't want to do it manually on behalf of your users every single time either). Hence one option is to create cronjobs that automatically execute a command at a specific frequency, as if you had typed it yourself on your server's command line. Another option is to manually start, once, a script that doesn't shut down unless your server shuts down.
Triggering a Script based on Time:
Cron is a task scheduler on *nix systems; Windows has the Windows Task Scheduler. What you can do is set up a cronjob to run a specific PHP file at a specific frequency, and execute all the "background" tasks you need to run from within there.
One way of doing this would be to have a mysql table containing things that need to be executed along with when they need to be executed. The script then queries the table based on time to retrieve which tasks need to be executed, executes them, and then marks them executed (or just deletes them) in the mysql table.
This is a basic form of process queuing.
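The table-driven approach just described might look like this in outline. To keep the sketch self-contained, the $tasks array stands in for rows of a hypothetical MySQL table with columns id, run_at (a Unix timestamp) and done; the function name and the $execute callback are likewise invented for the example.

```php
<?php
// Runs every task that is due and not yet done, and returns the updated rows.
//   $tasks   - stand-in for rows of a hypothetical tasks table
//              (keys: 'id', 'run_at' as Unix timestamp, 'done' as bool)
//   $now     - current Unix timestamp, passed in so the logic is testable
//   $execute - placeholder for the real work, e.g. finishing a building
function runDueTasks(array $tasks, $now, callable $execute)
{
    foreach ($tasks as $i => $task) {
        if (!$task['done'] && $task['run_at'] <= $now) {
            $execute($task);
            $tasks[$i]['done'] = true; // would be an UPDATE (or DELETE) in MySQL
        }
    }
    return $tasks;
}
```

A cron job that runs once a minute and calls something like this is enough for a construction timer: the building is marked finished on the first run after its run_at time has passed, whether or not the player is logged in.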
Building a Queue Server
This is a lot more advanced, but here's a tutorial for creating a script that will queue processes in the background without the need for any external databases: Building a Queue Server in PHP.
Let me know if this makes sense or if you have any questions :)
PHP is a server-side language. Any time anybody accesses a PHP program on the server, it runs, regardless of who the client is.
So, imagine a program that holds a counter. It stores this in a database. Every time updatecounter.php is called, the counter gets updated by one.
You browse to updatecounter.php, and it tells you that the counter is now at 34.
Next time you browse to updatecounter.php it tells you that the counter is at 53.
It's gone up by 18 more counts than you were expecting.
This is because updatecounter.php was being run without your intervention. It was being run by other people.
Now, if you looked at updatecounter.php, you might see code like this:
require_once("my_code.php");
$counterValue = increment_counter_value();
echo "New Counter Value = ".$counterValue;
Notice that the main core of the program is stored in a separate file from the one that you are calling.
Also, notice that instead of calling increment_counter_value, you could call anything. So every time somebody browsed to updatecounter.php, or whatever your game would be called, the internal game mechanics could be run. You could for instance, have an hourly stat management routine which would check each time it was called if it had been run in the last hour, and if it hadn't it would perform all the stats.
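The "has it been run in the last hour?" check could be sketched like this. Both function names are invented for the example, and where the last run time is stored (a database row, a file) is left to the caller.

```php
<?php
// True if maintenance has never run, or last ran at least $interval seconds ago.
function isMaintenanceDue($lastRun, $now, $interval = 3600)
{
    return $lastRun === null || ($now - $lastRun) >= $interval;
}

// Runs $maintenance only when it is due and returns the timestamp the caller
// should store as the new "last run" value.
//   $lastRun     - Unix timestamp of the previous run, or null if none
//   $now         - current Unix timestamp
//   $maintenance - placeholder for the hourly stat routine
function maybeRunMaintenance($lastRun, $now, callable $maintenance)
{
    if (!isMaintenanceDue($lastRun, $now)) {
        return $lastRun; // too soon, nothing to do
    }
    $maintenance();
    return $now;
}
```

Each page request would call maybeRunMaintenance with the stored timestamp and time(), then save the returned value.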
Now, what if nobody else is playing your game? If that happens, then the hourly stat management wouldn't get called, and your game world would die. So what you would need to do is create another program whose sole function is to run your stats. You would then schedule that program on the server to run at an hourly interval. You do this using something called a cron job. You will probably find that your host already has this facility built in. I won't go into any more detail about task scheduling, as without knowing your environment it's impossible to give the correct answer. But basically, you would need to schedule a PHP program to run on the server to perform the hourly maintenance.
Here's a tutorial on CRON jobs:
http://net.tutsplus.com/tutorials/other/scheduling-tasks-with-cron-jobs/
I haven't used it myself but I've had no problems with other stuff on tutsplus so you should be ok.
This is not only PHP. Browser-based games are a combination of PHP, MySQL, JavaScript and HTML, and a lot of technologies are used for this kind of work. When you do something in the browser, let's say adding a building, an AJAX request is sent to the server so that the server updates the database. It can't wait until logout, because then other users wouldn't know your status (in the case of multiplayer).
I have a PHP website in which, when a member visits a page, a series of database maintenance actions are made.
For example, in a common page, I've included a PHP script which checks how many posts every user has made and updates the database giving them points accordingly.
The problem with this method is that my website has 100+ members, and I'm worried that my scripts will start to slow down as my member base grows.
Is there any way to code a bot in PHP, so my database can be updated without the user's intervention?
You should run a PHP file from within a cron job. Most PHP hosts including shared hosting provide cron access.
With cron you can schedule a task to run on an interval basis. The PHP program will then go through and do the updating that you require. So take the code you have now, move it into a separate PHP file, and then tell cron to run it maybe once an hour or at whatever interval you deem correct.
For best performance, update the users table when a user publishes a post, rather than recounting every time you need to know how many posts they have published.
Create a cron to run daily (for instance) with the following command:
php -q /home/cpaneluser/cron.php
And put a cron.php outside of public_html with all the maintenance tasks.
Or allow only administrators to do the maintenance tasks with a link in administrative panel.
I wonder how I can schedule and automate tasks in PHP. Can I? Or are web server features like cron jobs needed?
I am wondering if there is a way I can, say, delete files after 3 days, when the files are likely outdated or no longer needed.
PHP doesn't natively support automating tasks; you have to build a solution yourself or search Google for available solutions. If you have a frequently visited site/page, you could store a timestamp in the database for each file; then, when someone visits your site at a chosen time (e.g. 8 in the morning), a script (e.g. deleteOlderDocuments.php) runs and deletes the older files.
Just an idea. Hope it helps.
PHP operates under the request-response model, so it won't be the responsibility of PHP to initiate and perform the scheduled job. Use cron, or have your PHP site register the cron jobs.
(Note: the script that the job executes can be written in PHP of course)
In most shared hosting environments, a PHP interpreter is started for each page request. This means that for each PHP script in said environment, all that script will know about is the fact that it's handling a request, and the information that request gave it. Technically you could check the current time in PHP and see if a task needs to be performed, but that is relying on a user requesting that script near a given time.
It is better to use cron for such tasks, especially if the tasks can be slow. Otherwise, every once in a while, around a certain time, a user would get a particularly slow response, because their request caused the server to do a whole bunch of scheduled work.
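For the "delete files after 3 days" case from the question, the script that cron runs could be as small as the following sketch. The function name and its parameters are made up for the example; it uses file modification time as the age, which is an assumption about what "outdated" means here.

```php
<?php
// Deletes regular files directly inside $dir whose modification time is more
// than $maxAgeDays days old. Returns the names of the files it removed.
//   $now - current Unix timestamp; defaults to time(), overridable for testing
function deleteOldFiles($dir, $maxAgeDays = 3, $now = null)
{
    $now = ($now === null) ? time() : $now;
    $cutoff = $now - $maxAgeDays * 86400; // 86400 seconds per day
    $deleted = array();

    foreach (scandir($dir) as $name) {
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        // is_file() skips '.', '..' and subdirectories
        if (is_file($path) && filemtime($path) < $cutoff) {
            if (unlink($path)) {
                $deleted[] = $name;
            }
        }
    }

    return $deleted;
}
```

Scheduled once a day from cron, this needs no visitor to trigger it and never slows down a page request.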