Related
Question
Using PHP & Jquery how would you execute code after a given amount of time, say 1 month (even after the user has closed the browser etc)
Scenario
I've wanted to build an application that does something in an amount of time specified by the user, "sort of like hootsuite". But i cant get my head around how it would work.
I know you can use node.js (I struggle to understand and implement this in any of my laravel projects...) but even then wouldnt the server be filled with stress if say 1000 people had something waiting to be executed on the server for a whole month or even a year while still handling other user requests?
I've looked around a bit and CRON jobs came up but this doesnt sound like what i was looking for! Im not sure, ill be grateful if anyone can explain to me how they think i could go about it
Essentially what you're looking for is a scheduling system. The reason why the UNIX cron tool has come up in your searches is because it is a scheduling tool; it allows UNIX users to schedule tasks to happen at certain times. Other operating systems also have task schedulers.
Schedulers
The principal implementation strategy for a scheduler is some kind of polling mechanism, i.e., a software component which periodically checks to see if there are any scheduled tasks which are now due to be executed and, if so, executes them.
Implementation strategies
In order to implement something like this you would need a way to store information about scheduled tasks (e.g. when they're supposed to happen, who they belong to, what they're supposed to do). For example, you might use a database management system, or a file on disk.
You would also need a component to do the polling. This could be a daemon process (i.e. a process which is always running in the background) which includes a sleep (or wait or timeout) call which allows it to check at intervals for scheduled tasks, rather than constantly checking (and thereby most likely consuming all the CPU cycles!). Or it could be a program (in PHP if you like) which is itself run by cron on the host system, say, every five minutes which checks for scheduled tasks and then executes in, perhaps in separate processes. If you were to use cron, there are numerous PHP wrappers to help such as https://packagist.org/packages/peppeocchi/php-cron-scheduler.
Services
However, instead of implementing all this yourself, you may consider making use of an existing service. There seem to be several options, including at least one free (within limits) service: https://atrigger.com/.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
This is a design question and I appreciate your insight / advise. I understand this question may have different answers based on experience and I am merely trying to seek some guidance before I make a selection on how I proceed.
Background -
My application is primarily built on LAMP stack - Linux, Apache, MySQL and PHP. I also use jQuery to client side scripting and the application is fairly simple and executes very fast. I am also using CakePHP framework
Scenario #1 -
The user clicks a link on the web page
The click triggers an AJAX call to a PHP script on the server
The PHP script make a cURL request to another web address to process some information and usually returns in 4-5 seconds
Upon return the PHP script completes execution and terminates
Question -
I keep hearing that PHP is synchronous and will hang until this request is finished - so if multiple users make multiple requests in the above scenario will PHP hang until each request is processed sequentially or does Apache take care of spawning multiple threads to process each web request separately?
I am trying to figure out a way to better handle this - even if it means I should step outside of PHP. Would you recommend I use PERL scripting to handle to cURL request and just have PHP fork a shell thread and exit or would it be better to create a JAVA servlet that the AJAX can call since JAVA is multi-threaded it can handle this on the same.
I am reading up on pThreads - is this a scenario where pThreads would be
Scenario 2
User uploads a zip file and clicks the process button and then quits the application
Upon clicking the process button an AJAX request is sent to the server to process the zip file. The PHP script receiving this request has ignore_user_abort enabled so it does execute even if the user quits.
However processing of this zip file can take multiple minutes as it involves multiple cURL calls and SOAP calls across web servers
Once processing is done, the PHP script updates the database and terminates
Question
Again similar to the above question, is this something that will be blocking in nature if multiple people upload files at the same time?
Assuming PHP would queue all the various requests - would this cause a timeout scenario and loss of requests?
Is this something better done with PERL/JAVA etc?
Thank you for your advise and insight
The short answer is
Scenario #1
all / most languages are synchronous, that said running ajax is asynchronous and by extension running php by ajax is asynchronous. The thing is here you are confusing "synchronous" which in this context means block until an operation is finished or process blocking, with parallel processing or even multi-threading.
again multi-threading is quite different then parallel processing, php is quite capable of running dozens of parallel processes. Is it the best language for it, probably not but it can do it with as little effort as running a shell script with exec and a command like this exec(usr/bin/php -f pathtophpfile/index.php arg1 > /dev/null & ); on linux. multi-threading is defined as this:
Multithreading is the ability of a program or an operating system process
to manage its use by more than one user at a time and to even manage
multiple requests by the same user without having to have multiple
copies of the programming running in the computer
Parallel processing is defined as this
Parallel processing is the simultaneous use of more than one CPU or
processor core to execute a program or multiple computational threads.
So while technically php cant do either of these, you can run multiple copies of php at the same time on the same machine, much in the same way as you can manually open multiple shell windows and run commands in each of them. Is it parallel processing or multi-threading? No, it's just running multiple copies of PHP at the same time.
But the biggest challenge with any " multi-threaded or parallel process " is race conditions. If you are careful to avoid them you will be fine. Race conditions are like this
process1 loads text.txt
process1 makes changes
process2 loads text.txt - before process1 has saved its data
process2 makes changes
process2 saves changes
process1 saves changes
Now you will lose any changes made by process2 because process1 had the data in memory and never accounted for process2 changing it. This is also what I would call a concurrency issue, they are basically the same thing. Another thing to look out for if using CRON or some other rudimentary queuing method, is not pulling the same job with multiple processes.
Also debugging can be a challenge, this is true of any background process and not specific to php. The simplest thing to do here is use a file to log your output to using things like ob_start() & $var = ob_get_clean() ( output buffering) and recording that. It's also useful to use a shutdown handler to log errors such as
http://php.net/manual/en/function.register-shutdown-function.php
Of course these are over simplified examples, explanations but that is the gist of it.
Scenario #2
how would it be? as I mentioned php and Apache can serve over 200 clients at once, another request is just another connection to Apache ( when using ajax or CURL ) but its basically the same even when just using the CLI (command line interface). There is no inherent reason you cant run several dozen php processes at once.
How would it Queue it, they just execute again like oping multiple tabs in a browser. As for a timeout, there are always resource limits on a server no matter what language you use. You could use a queuing system to insure that only a few files are processed at a give time, this could be as simple as cron and a database table with some status column, such as queued, running, complete. then the cron script runs one job marked as queued, marks it as running while running, marks it complete when done, rinse and repeat.
That is a matter of opinion and more so a matter of your ability with those languages.
I'm actually building a system in php that takes one csv file and breaks it into 25000 row chunks ( without re-writing separate files, just reading from offsets in the same file with multiple threads ). These chunks are then processed in parallel by up to 10 workers and then aggregated back together, and then some reports and emails ect are generated. Is it easy to do, no. Is it possible, sure is.
The system I am building for example takes a file with say 1million + rows, and queries a database with over 700k records. It works a bit like this
Job Preprocess ( one process creates multiple chunks )
create a job file
calculate ofsets
queue ( in rabbitMq ) multiple jobs
Process ( multiple processes each handle one or more chunk )
load data from queue
access input_file.csv at offset and read to end of offset
generate a numbered result file such as 0.csv, 1.csv for each chunk
Aggregation ( one process only, receives the bits of the job )
load previously saved job file ( from step 1 )
as each chunk completes record that in job file
when all chunks are done, compact all the results from the numbered files in order.
The trick here is that the multiple process part ( step 2 ) doesn't touch that job file in step one ( or it would encounter race conditions ), further only one process receives all of the chunks for a job. Once all the chunks are received, we compact them into one file do some clean up and then send out emails etc..
With this I have ran a file with 1 million rows in under 2 minutes. Using a single thread / process it takes about 15 minutes to run the same file.
So ( again ) I assure you It can be done, it's tricky and you have to be very careful on how you move your data around but it's not impossible to do these things in php. PHP and modern hardware for that matter can handle thousands of operations a second. Usually the bottle necks are bad indexing in a database or waiting on network connections ect...
If you plan on doing some real heavy duty work I'd suggest looking into a queuing or messaging system like I use ( RabbitMq ) but that might be overkill in your case. I use the queuing system to help keep the process flow sane and avoid race conditions, basically it's sole purpose for me is to organize the data flow.
Scenario #1
1) PHP is synchronous, but the question is confused. PHP executes instructions synchronously, normally, however Apache defines the processing model. Apache will reuse or spawn a worker process or thread to handle the request, up to the configured limit.
2) The way you are handling it is fine, you might want to try and reduce the amount of time it takes to update the user interface, because 4-5 seconds is rather long.
3) I will talk a little about using threads at the frontend.
Using threads at the frontend doesn't make sense. As mentioned, your webserver has a defined processing model, it is designed to scale with that model, creating user threads as the result of a web request disrupts that model. Even if user code creates a reasonable number of threads, for example 8, if 100 clients come along at once, you will be asking your hardware to execute 800 threads concurrently.
That is clearly a bad idea !
Scenario #2
1) The same answer as #1.1, it's the processing model of the server that handles multiple clients.
2) The same answer as question 1 in both scenarios.
3) That's entirely a matter of opinion.
The problem you seem to have is essentially the same in both scenarios.
Advice
Don't make anything more complex than it has to be; in both scenarios, the problem is your receiving server side code responds slower than is desirable.
In the case where you have many HTTP requests to make to process a request, your code is I/O bound, don't go straight to multi-processing or multi-threading at all, try non-blocking I/O first, this is simpler, more accessible, more suitable, and scales with PHP.
In the case where you have code that is CPU bound, for example, you have solved the I/O problem, and are making all your requests using non-blocking I/O, but once data is downloaded, it requires considerable processing to be used. Then you might think about using multiple processes or threads.
Whatever happens, you should not use multi-threading at the frontend, what you want to do is isolate those parts of the application that require multi-threading and communicate with this isolated sub-application using some sane form of RPC.
I'm putting together my first commercial PHP application, it's nothing really huge as I'm still eagerly learning PHP :)
Right now I'm still in the conceptual stage of planning my application but I run into one problem all the time, the application is supposed to be self-hosted by my customers, on their own servers and will include some very long running scripts, depending on how much data every customer enters in his application.
Now I think I have two options, either use cronjobs, like for example let one or multiple cronjobs run at a time that every customer can set himself, OR make the whole processing of data as daemons that run in the background...
My question is, since it's a self-hosted application (and every server is different)... is it even recommended to try to write php that starts background processes on a customers server, or is this more something that you can do reliably only on your own server...?
Or should I use cronjobs for these long running processes?
(depending on the amount of data my customers will enter in the application, a process could run 3+ hours)
Is that even a problem that can be solved, reliably, with PHP...? Excuse me if this should be a weird question, I'm really not experienced with PHP daemons and/or long running cronjobs created by php.
So to recap everything:
Commercial self-hosted application, including long running processes, cronjobs or daemons? And is either or maybe both also a reliable solution for a paid application that you can give to your customers with a clear conscience because you know it will work reliable on all kinds of different servers...?
EDIT*
PS: Sorry, I forgot to mention that the application targets only Linux servers, so everything like Debian, Ubuntu etc etc.
Short answer, no, don't go for background process if this will be a client hosted solution. If you go towards the ASP concept (Application Service Provider... not Active Server Pages ;)) then you can do some wacky stuff with background processes and external apps connecting to your sql servers and processing stuff for you.
What i suggest is to create a strong task management backbone and link that to a solid task processing infrastructure. I'll recommend you read an old post i did quite some time ago regarding background processes and a strategy i had adopted to fix long running processes:
Start & Stop PHP Script from Backend Administrative Webpage
Happy reading...
UPDATE
I realize that my old post is far from easy to understand so here goes:
You need 2 models: Job and JobQueue, 2 controller: JobProcessor, XYZProcessor
JobProcessor is called either by a user when a page triggers or using a cronjob as you wish. JobProcessor::process() is the key that starts the whole processing or continues it. It loads the JobQueues and asks the job queues if there is work to do. If there is work to do, it asks the jobqueue to start/continue it's job.
JobQueue Model: Used to queue several JOBS one behind each other and controls what job is currently current by keep some kind of ID and STATE about which job is running.
Job Model: Represents exactly what needs to be done, it contains for example the name of the controller that will process the data, the function to call to process the data and a serialized configuration property that describe what must be done.
XYZController: Is the one that contains the processing method. When the processing method is called, the controller must load everything it needs to memory and then process each individual unit of work as fast as possible.
Example:
Call of index.php
Index.php creates a jobprocessor controller
Index.php calls the jobprocessor's process()
JobProcessor::Process() loads all the queues and processes them
For each JobQueue::Process(), the job queue loads it's possible Jobs and detects if one is currently running or not. If none is running, it starts the next one by calling Job::Process();
Job::Process() creates the XYZController that will work the task at hand. For example, my old system had an InvoicingController and a MassmailingController that worked hand in hand.
Job::Process() calls XYZController::Prepare() so that it loads it's information to process. (For example, load a batch of emails to process, load a batch of invoices to create)
Job::Process() calls XYZController::RunWorkUnit() so that it processes a single unit of work (For example, create one invoice, send one email)
Job::Process() asks JobProcessingController::DoIStillHaveTimeToProcess() and if so, continues processing the next element.
Job::Process() runs out of time and calls XYZController::Cleanup() so that all resources are released
JobQueue::Process() ends and returns to JobController
JobController::Process() is about to end? Open a socket, call myself back so i can start another round of processing until i don't have anything to do anymore
Handle the request from the user that start in position #1.
Ultimately, you can instead open a socket each time and ask the processor to do something, or you can queue a CronJob to call your processor. This way your users won't get stuck waiting for the 3/4 work units to complete each time.
Its worth noting that, in addition to running daemons or cron jobs, you can kick off long running processes from a web request (but note that it must run outside of the webserver process group) and of course asynchronous message processing (which is essentially a variant on the batch approach).
All four of these approaches are very different in terms of how they behave, how concurrency and timing are managed. The factors which make them all different are the same ones you omitted from your question - so it's not really possible to answer.
Unfortunately all rely on facilities which are very different between MSWindows and POSIX systems - so although PHP will run on both, if you want to sell your app on both platforms it's going to need 2 versions.
Maybe you should talk to your potential customer base and ask them what they want?
Greetings All!
I am having some troubles on how to execute thousands upon thousands of requests to a web service (eBay), I have a limit of 5 million calls per day, so there are no problems on that end.
However, I'm trying to figure out how to process 1,000 - 10,000 requests every minute to every 5 minutes.
Basically the flow is:
1) Get list of items from database (1,000 to 10,000 items)
2) Make a API POST request for each item
3) Accept return data, process data, update database
Obviously a single PHP instance running this in a loop would be impossible.
I am aware that PHP is not a multithreaded language.
I tried the CURL solution, basically:
1) Get list of items from database
2) Initialize multi curl session
3) For each item add a curl session for the request
4) execute the multi curl session
So you can imagine 1,000-10,000 GET requests occurring...
This was ok, around 100-200 requests where occurring in about a minute or two, however, only 100-200 of the 1,000 items actually processed, I am thinking that i'm hitting some sort of Apache or MySQL limit?
But this does add latency, its almost like performing a DoS attack on myself.
I'm wondering how you would handle this problem? What if you had to make 10,000 web service requests and 10,000 MySQL updates from the return data from the web service... And this needs to be done in at least 5 minutes.
I am using PHP and MySQL with the Zend Framework.
Thanks!
I've had to do something similar, but with Facebook, updating 300,000+ profiles every hour. As suggested by grossvogel, you need to use many processes to speed things up because the script is spending most of it's time waiting for a response.
You can do this with forking, if your PHP install has support for forking, or you can just execute another PHP script via the command line.
exec('nohup /path/to/script.php >> /tmp/logfile 2>&1 & echo $!'), $processId);
You can pass parameters (getopt) to the php script on the command line to tell it which "batch" to process. You can have the master script do a sleep/check cycle to see if the scripts are still running by checking for the process id's. I've tested up to 100 scripts running at once in this manner, at which point the CPU load can get quite high.
Combine multiple processes with multi-curl, and you should easily be able to do what you need.
My two suggestions are (a) do some benchmarking to find out where your real bottlenecks are and (b) use batching and cacheing wherever possible.
Mysqli allows multiple-statement queries, so you could definitely batch those database updates.
The http requests to the web service are more likely the culprit, though. Check the API you're using to see if you can get more info from a single call, maybe? To break up the work, maybe you want a single master script to shell out to a bunch of individual processes, each of which makes an api call and stores the results in a file or memcached. The master can periodically read the results and update the db. (Careful to rotate the data store for safe reading and writing by multiple processes.)
To understand your requirements better, you must implement your solution only in PHP? Or you can interface a PHP part with another part written in another language?
If you could not go for another language, try to perform this update maybe as php script that runs in the background and not through the apache.
You can follow Brent Baisley advice for a simple use case.
If you want to build a robuts solution, then you need to :
set up a representation of the actions in a table in database that will be your process queue;
set up a script that pop this queue and process your action;
set up a cron daemon that run this script every x.
This way you can have 1000 PHP scripts running, using your OS parallelism capabilities and not hanging when ebay is taking to to respond.
The real advantage of this system is that you can fully control the firepower you throw at your task by adjusting :
the number of request one PHP script does;
the order / number / type / priority of the action in the queue;
the number or scripts the cron daemon runs.
Thanks everyone for the awesome and quick answers!
The advice from Brent Baisley and e-satis works nicely, rather than executing the sub-processes using CURL like i did before, the forking takes a massive load off, it also nicely gets around the issues with max out my apache connection limit.
Thanks again!
It is true that PHP is not multithreaded, but it can certainly be setup with multiple processes.
I have created a system that resemebles the one that you are describing. It's running in a loop and is basically a background process. It uses up to 8 processes for batch processing and a single control process.
It is somewhat simplified because i do not have to have any communication between the processes. Everything resides in a database so each process is spawned with the full context taken from the database.
Here is a basic description of the system.
1. Start control process
2. Check database for new jobs
3. Spawn child process with the job data as a parameter
4. Keep a table of the child processes to be able to control the number of simultaneous processes.
Unfortunately it does not appear to be a widespread idea to use PHP for this type of application, and i really had to write wrappers for the low level functions.
The manual has a whole section on these functions, and it appears that there are methods for allowing IPC as well.
PCNTL has the functions to control forking/child processes, and Semaphore covers IPC.
The interesting part of this is that i'm able to fork off actual PHP code, not execute other programs.
That question may appear strange.
But every time I made PHP projects in the past, I encountered this sort of bad experience:
Scripts cancel running after 10 seconds. This results in very bad database inconsistencies (bad example for an deleting loop: User is about to delete an photo album. Album object gets deleted from database, and then half way down of deleting the photos the script gets killed right where it is, and 10.000 photos are left with no reference).
It's not transaction-safe. I've never found a way to do something securely, to ensure it's done. If script gets killed, it gets killed. Right in the middle of a loop. It gets just killed. That never happened on tomcat with java. Java runs and runs and runs, if it takes long.
Lot's of newsletter-scripts try to come around that problem by splitting the job up into a lot of packages, i.e. sending 100 at a time, then relading the page (oh man, really stupid), doing the next one, and so on. Most often something hangs or script will take longer than 10 seconds, and your platform is crippled up.
But then, I hear that very big projects use PHP like studivz (the german facebook clone, actually the biggest german website). So there is a tiny light of hope that this bad behavior just comes from unprofessional hosting companies who just kill php scripts because their servers are so bad. What's the truth about this? Can it be configured in such a way, that scripts never get killed because they take a little longer?
Is PHP suitable for very large projects?
Whenever I see a question like that, I get a bit uneasy. What does very large mean? What may be large to you, may be small to me or vice versa. And that is even assuming that we use the same metric. Are you measuring time to build the project, complete life-cycle of the project, money that are involved, number of people using it, number of developers to build/maintain it, etc. etc.
That said, the problems you're describing sounds like you don't know your technology good enough. That would be a problem for you regardless of which technology you picked. For example, use database transactions to ensure atomicity. And use asynchronous offline jobs to process long running tasks (Such as dispatching a mailing list).
A lot if the bad behaviour is covered in good frameworks like the Zend Framework.
Anything that takes longer the 10 seconds is really messed up but you can always raise the execution time with http://de3.php.net/set_time_limit
A lot of big sites are writen in PHP: Facebook, Wikipedia, StudiVZ, Digg.com etc.. a lot of the things you are talking about are just configuration things maybe you should look into that?
Are you looking for set_time_limit() and ignore_user_abort()?
Performance is not a feature you can just throw in after most of the site is done.
You have to design the site for heavy load.
If a database task is normally involving 10K rows, you should be prepared not just the execution time issues, but other maintenance questions.
Worst case: make a consistency tool to check and fix those errors.
Better: instead of phisically delete the images, just flag them and let background services to take care of the expensive maneuvers.
Best: you can utilize a job queue service and add this job to the queue.
If you do need to do transactions in php, you can just do:
mysql_query("BEGIN");
/// do your queries here
mysql_query("COMMIT");
The commit command will just complete the transaction.
If any errors occur, you can just rollback with:
mysql_query("ROLLBACK");
Edit: Note this will only work if you are using a database that supports transactions, such as InnoDB
You can configure how much time is allowed for executing a script, either in the php.ini setting or via ini_set/set_time_limit
Instead of studivz (the German Facebook clone), you could look at the actual Facebook which is entirely PHP. Or Digg. Or many Yahoo sites. Or many, many others.
ignore_user_abort is probably what you're looking for, but you could also add another layer in terms of scheduled maintenance jobs. They basically run on a specified interval and do various things to make sure your data/filesystem are in a state that you want... deleting old/unlinked files is just one of many things you can do.
For these large loops like deleting photo albums or sending 1000's of emails your looking for ignore_user_abort and set_time_limit.
Something like this:
ignore_user_abort(true); //users leaves webpage will not kill script
set_time_limit(0); //script can take as long as it wants
for(i=0;i<10000;i++)
costly_very_important_operation();
Be carefull however that this could potentially run the script forever:
ignore_user_abort(true); //users leaves webpage will not kill script
set_time_limit(0); //script can take as long as it wants
while(true)
do_something();
That script will never die, unless you restart your server.
Therefore it is best to never set the time_limit the 0.
Technically no programming language is transaction safe, it's the database that needs to be transaction safe. So if the script/code running dies or disconnects, for whatever reason, the transaction will be rolled back.
Putting queries in a loop is a very bad idea unless it is specifically design to be running in batches and breaking a much larger set into smaller pieces. Adjusting PHP timers and limits is generally a stop gap solution, you are still dependent on the client browser if using the web to kick off a script.
If I have a long process that needs to be kicked off by a browser, I "disconnect" the process from the browser and web server so control is returned to the user while the script runs. PHP scripts run from the command line can run for hours if you want. You can then use AJAX, or reload the page, to check on the progress of the long running script.
There are security concern with this code, but to "disconnect" a process from PHP running under something like Apache:
exec("nohup /usr/bin/php -f /path/to/script.php > /dev/null 2>&1 &");
But that really has nothing to do with PHP being suitable for large projects or being transaction safe. PHP can be used for large projects, but since by default there is no code that remains "resident" between hits, it can get slow if not designed right. Also, since there is no namespace support, you want to plan ahead if you have a large development team.
It's fine for a Java based system to take a few minutes to startup, initialize and load all the default objects. But this is unacceptable with PHP. PHP will take more planning for larger systems. The question is, when does the time saved in using PHP get wasted by the additional planning time required for a large system?
The reason you most likely experienced bad database consistencies in the past is because you were using the MyISAM engine for mysql (which DOES NOT support transactions). Use InnoDB instead, it supports transactions and performs row level locking.
Or use postgreSQL.
Many, many software sites are made in PHP. However, you will not hear about millions of web pages made in PHP that do not exist anymore because they were abandoned. Those pages may have burned all company money for dealing with PHP mess, or maybe they bankrupted because their soft was so crappy that customer did not want it… PHP seems good at the startup, but it does not scale very well. Yes, there are many huge web sites made in PHP, but they are rather exceptions, than a norm.