I'm putting together my first commercial PHP application; it's nothing huge, as I'm still eagerly learning PHP :)
Right now I'm still in the conceptual stage of planning my application, but I keep running into one problem: the application is supposed to be self-hosted by my customers, on their own servers, and it will include some very long-running scripts, depending on how much data each customer enters into the application.
Now I think I have two options: either use cron jobs (for example, one or more cron jobs that every customer can set up himself), OR implement the whole data processing as daemons that run in the background...
My question is, since it's a self-hosted application (and every server is different)... is it even advisable to write PHP that starts background processes on a customer's server, or is this something you can only do reliably on your own server...?
Or should I use cron jobs for these long-running processes?
(Depending on the amount of data my customers enter into the application, a process could run for 3+ hours.)
Is that even a problem that can be solved reliably with PHP...? Excuse me if this is a weird question; I'm really not experienced with PHP daemons and/or long-running cron jobs created by PHP.
So to recap everything:
Commercial self-hosted application, including long-running processes: cron jobs or daemons? And is either (or maybe both) a reliable solution for a paid application, one you can give to your customers with a clear conscience because you know it will work reliably on all kinds of different servers...?
EDIT:
PS: Sorry, I forgot to mention that the application targets only Linux servers, e.g. Debian, Ubuntu, etc.
Short answer: no, don't go for background processes if this will be a client-hosted solution. If you go towards the ASP concept (Application Service Provider... not Active Server Pages ;)) then you can do some wacky stuff with background processes and external apps connecting to your SQL servers and processing things for you.
What I suggest is to create a strong task-management backbone and link it to a solid task-processing infrastructure. I recommend you read an old post I wrote quite some time ago about background processes and a strategy I adopted for handling long-running processes:
Start & Stop PHP Script from Backend Administrative Webpage
Happy reading...
UPDATE
I realize that my old post is far from easy to understand, so here goes:
You need 2 models: Job and JobQueue, and 2 controllers: JobProcessor and XYZProcessor.
JobProcessor is called either by a user when a page triggers it, or by a cron job, as you wish. JobProcessor::process() is the key that starts the whole processing, or continues it. It loads the JobQueues and asks each queue whether there is work to do. If there is, it asks the queue to start or continue its job.
JobQueue model: used to queue several JOBS one behind the other; it controls which job is current by keeping some kind of ID and STATE for the running job.
Job model: represents exactly what needs to be done. It contains, for example, the name of the controller that will process the data, the function to call to process the data, and a serialized configuration property describing what must be done.
XYZController: the one that contains the processing method. When the processing method is called, the controller must load everything it needs into memory and then process each individual unit of work as fast as possible.
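A minimal PHP skeleton of that structure might look like this; it is a sketch, not the original implementation, and the persistence details (how queues and jobs are loaded) are assumptions:

    <?php
    class JobProcessor
    {
        private $startTime;
        private $maxSeconds = 25;   // stop before the web request times out (assumption)

        public function process()
        {
            $this->startTime = time();
            foreach (JobQueue::loadAll() as $queue) {   // load the JobQueues
                if ($queue->hasWorkToDo()) {
                    $queue->process($this);             // start/continue its job
                }
            }
        }

        public function doIStillHaveTimeToProcess()
        {
            return (time() - $this->startTime) < $this->maxSeconds;
        }
    }

    class JobQueue
    {
        private $jobs = [];   // ordered Jobs; the current one is tracked by ID/STATE

        public static function loadAll()
        {
            return [];        // fetch the queues from storage (assumption)
        }

        public function hasWorkToDo()
        {
            return !empty($this->jobs);
        }

        public function process(JobProcessor $processor)
        {
            $this->jobs[0]->process($processor);   // start or continue the current job
        }
    }

    class Job
    {
        public $controllerName;   // e.g. 'InvoicingController'
        public $config;           // serialized description of what must be done

        public function process(JobProcessor $processor)
        {
            $class      = $this->controllerName;
            $controller = new $class();
            $controller->prepare(unserialize($this->config));   // load what it needs
            while ($processor->doIStillHaveTimeToProcess()) {
                if (!$controller->runWorkUnit()) {   // one invoice, one email, ...
                    break;                           // nothing left to do
                }
            }
            $controller->cleanup();                  // release all resources
        }
    }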
Example:
1. Call of index.php
2. index.php creates a JobProcessor controller
3. index.php calls JobProcessor::process()
4. JobProcessor::process() loads all the queues and processes them
5. For each JobQueue::process(), the job queue loads its possible Jobs and detects whether one is currently running. If none is running, it starts the next one by calling Job::process()
6. Job::process() creates the XYZController that will work the task at hand. For example, my old system had an InvoicingController and a MassmailingController that worked hand in hand
7. Job::process() calls XYZController::prepare() so that it loads the information it needs to process (for example, load a batch of emails to process, or a batch of invoices to create)
8. Job::process() calls XYZController::runWorkUnit() so that it processes a single unit of work (for example, create one invoice, send one email)
9. Job::process() asks JobProcessor::doIStillHaveTimeToProcess() and, if so, continues processing the next element
10. Job::process() runs out of time and calls XYZController::cleanup() so that all resources are released
11. JobQueue::process() ends and returns to the JobProcessor
12. JobProcessor::process() is about to end? Open a socket and call myself back, so I can start another round of processing until there is nothing left to do
13. Handle the request from the user that started at step #1
Ultimately, you can instead open a socket each time and ask the processor to do something, or you can queue a cron job to call your processor. This way your users won't get stuck waiting for the 3/4 work units to complete each time.
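That socket trick from step 12 can be as simple as firing a request at your own endpoint and closing the connection without reading the response. A minimal sketch (host, port, and path are assumptions):

    <?php
    // Fire-and-forget HTTP request to ourselves so a fresh PHP process
    // picks up the next round of processing.
    $fp = fsockopen('127.0.0.1', 80, $errno, $errstr, 5);
    if ($fp) {
        fwrite($fp, "GET /index.php?continue=1 HTTP/1.1\r\n"
                  . "Host: www.example.com\r\n"
                  . "Connection: Close\r\n\r\n");
        fclose($fp);   // deliberately don't wait for the response
    }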
It's worth noting that, in addition to running daemons or cron jobs, you can kick off long-running processes from a web request (but note that they must run outside of the webserver's process group), and of course there is asynchronous message processing (which is essentially a variant of the batch approach).
All four of these approaches behave very differently in terms of how concurrency and timing are managed. The factors that differentiate them are the same ones you omitted from your question, so it's not really possible to give a definitive answer.
Unfortunately, all of them rely on facilities that differ greatly between MS Windows and POSIX systems; so although PHP will run on both, if you want to sell your app on both platforms it's going to need 2 versions.
Maybe you should talk to your potential customer base and ask them what they want?
Related
I would appreciate some help understanding typical best practices for carrying out a series of tasks using Gearman in conjunction with PHP (among other things).
Here is the basic scenario:
A user uploads a set of image files through a web-based interface. The PHP code responding to the POST request generates an entry in a database for each file (mostly with null values in the columns), queues a job for each one to do analysis using Gearman, generates a status page, and exits.
The Gearman worker gets a job for a file and starts a relatively long-running analysis. The result of that analysis is a set of parameters that need to be inserted back into the database record for that file.
My question is, what is the generally accepted method of doing this? Should I use a callback that will ultimately kick off a different PHP script to do the modification, or should the worker function itself do the database modification?
Everything is currently running on the same machine; I'm planning on using Gearman for background scheduling rather than for scaling by farming work out to different machines, but in any case any of the functions could connect to the database wherever it is.
Any thoughts appreciated; I'm just looking for some insight into how this typically gets structured and what might be considered best practice.
Are you sure you want to use Gearman? I only ask because it was the de facto PHP job server about 15 years ago, but it hasn't been a reliable solution for quite some time. I am not sure if things have drastically improved in the last 12 months, but the last time I evaluated Gearman, it wasn't production-capable.
Now, on to the questions.
what is the generally accepted method of doing this? Should I use a callback that will ultimately kick off a different php script that is going to do the modification, or should the worker function itself do the database modification?
You are going to follow this general pattern with any job queue:
Collect a unit of work. In your case, it will be one of the images and any information about who that image belongs to (user id, etc.).
Submit the work to the job queue with this information.
The job queue's worker process picks up the work and starts processing it. This is where I would create the records in the database, since you can opt not to create them if the job fails. (A sketch of this pattern follows below.)
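Here is a minimal sketch of that pattern using the pecl/gearman extension; the analyze() helper, the table, and its columns are hypothetical:

    <?php
    // --- web request: submit one background job per image ---
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    foreach ($uploadedPaths as $path) {
        $client->doBackground('analyze_image', json_encode([
            'path'    => $path,
            'user_id' => $userId,
        ]));
    }

    // --- worker process: pick up jobs and write the results itself ---
    $pdo    = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('analyze_image', function (GearmanJob $job) use ($pdo) {
        $work   = json_decode($job->workload(), true);
        $params = analyze($work['path']);   // the long-running analysis (hypothetical)
        $stmt   = $pdo->prepare(
            'INSERT INTO image_results (path, user_id, params) VALUES (?, ?, ?)'
        );
        $stmt->execute([$work['path'], $work['user_id'], json_encode($params)]);
    });
    while ($worker->work());                // loop, handling one job at a time

Having the worker write its own results keeps failure handling in one place, instead of bouncing the result through a callback into yet another script.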
The job queue is going to track which jobs have completed, and usually the status of completion. If you are using Gearman, this is the gearmand process.

You also need something to pick up work and process it; I will refer to this as the job worker. The job worker is where the concurrency happens, which is what I think you were referring to when you said "kick off a different php script." You can just kick off a PHP script at an interval (with supervisord or a cron job) for a kind of poll & fork approach; a sketch follows after the links below. It's not the most efficient approach, but it doesn't sound like that will really matter for your application's use case.

You could also use pcntl_fork or pthreads in PHP to get more control over your concurrent processes and implement a worker pool pattern, but that is much more complicated than just firing off a script. If you are interested in trying to implement some concurrency in PHP, I have a proof-of-concept job worker for beanstalkd available on GitHub that implements a worker pool with both fork and pthreads. I have also included a couple of other resources on the subject of concurrency:
Job Worker (pthreads)
Job Worker (fork)
PHP Daemon Example
PHP IPC Example
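As a rough illustration of the poll & fork approach mentioned above (this is a sketch, not the linked GitHub code; process_job() is hypothetical):

    <?php
    $pendingJobs = [101, 102, 103];   // job ids fetched from a queue (assumption)

    foreach ($pendingJobs as $jobId) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            die("fork failed\n");
        }
        if ($pid === 0) {             // child process
            process_job($jobId);      // do the actual work (hypothetical)
            exit(0);                  // never fall back into the parent loop
        }
    }

    // parent: reap all children so none are left as zombies
    while (pcntl_wait($status) > 0);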
Question
Using PHP & jQuery, how would you execute code after a given amount of time, say 1 month (even after the user has closed the browser, etc.)?
Scenario
I've wanted to build an application that does something at a time specified by the user, "sort of like Hootsuite". But I can't get my head around how it would work.
I know you can use node.js (though I struggle to understand and implement it in any of my Laravel projects...), but even then, wouldn't the server be under a lot of stress if, say, 1000 people had something waiting to be executed on the server for a whole month or even a year, while it was still handling other user requests?
I've looked around a bit and cron jobs came up, but that doesn't sound like what I was looking for! I'm not sure; I'll be grateful if anyone can explain how they think I could go about it.
Essentially what you're looking for is a scheduling system. The reason why the UNIX cron tool has come up in your searches is because it is a scheduling tool; it allows UNIX users to schedule tasks to happen at certain times. Other operating systems also have task schedulers.
Schedulers
The principal implementation strategy for a scheduler is some kind of polling mechanism, i.e., a software component which periodically checks to see if there are any scheduled tasks which are now due to be executed and, if so, executes them.
Implementation strategies
In order to implement something like this you would need a way to store information about scheduled tasks (e.g. when they're supposed to happen, who they belong to, what they're supposed to do). For example, you might use a database management system, or a file on disk.
You would also need a component to do the polling. This could be a daemon process (i.e., a process which is always running in the background) which includes a sleep (or wait, or timeout) call allowing it to check for scheduled tasks at intervals, rather than constantly (and thereby most likely consuming all the CPU cycles!). Or it could be a program (in PHP, if you like) which is itself run by cron on the host system, say every five minutes, and which checks for scheduled tasks and then executes them, perhaps in separate processes. If you were to use cron, there are numerous PHP wrappers to help, such as https://packagist.org/packages/peppeocchi/php-cron-scheduler.
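A bare-bones sketch of the daemon variant, assuming a hypothetical tasks table with id, run_at, and done columns:

    <?php
    // poller.php -- run once as a background process, e.g. `php poller.php &`
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    while (true) {
        $due = $pdo->query(
            "SELECT id FROM tasks WHERE done = 0 AND run_at <= NOW()"
        )->fetchAll(PDO::FETCH_COLUMN);

        foreach ($due as $taskId) {
            // Run each due task in its own detached process so one slow
            // task can't block the polling loop.
            exec('nohup php run_task.php ' . escapeshellarg((string)$taskId)
                 . ' > /dev/null 2>&1 &');
            $pdo->prepare("UPDATE tasks SET done = 1 WHERE id = ?")
                ->execute([$taskId]);
        }

        sleep(60);   // the "sleep" call: check once a minute instead of spinning
    }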
Services
However, instead of implementing all this yourself, you may consider making use of an existing service. There seem to be several options, including at least one free (within limits) service: https://atrigger.com/.
This side of PHP is rather new to me.
I am interested in firing off a large number (25-50) of separate processes from a parent script. I would like the parent script not to wait for these other scripts to complete, AND I would like these other scripts to run in parallel.
Each script would run for a specified amount of time calling a webservice.
Can anyone give me some direction with this? I'm not asking for a coded answer specifically, but I just need some guidance.
Much thanks.
It really depends on what you want to achieve. @Julien's forking method could work, but it is not preferable if your web service calls are data-intensive. I am not saying that forking is bad, on the contrary it works, but with the amount of different web services you want to call you should have a way to manage things better.
Another thing you can do is base this on cron jobs. For example, if you're calling these web services for some users in your app, create a queue: a DB table to which you add records that need to be processed. If you are using Cake, use the Cake Shells. Then set up cron jobs that call the shells that process these records every now and then. Divide the services into separate queues, at least for those that are very different in logic. This way you also divide your risk, because a failure in one of the web service calls won't jeopardise everything. Have separate logging for each queue, which will enable you to quickly track down problems. When consuming web services, problems are very often external to your application.
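For the fire-and-forget part of the original question, here is a minimal sketch (worker.php is hypothetical). Redirecting output and backgrounding with & lets exec() return immediately instead of waiting for the child:

    <?php
    // parent.php -- launch 25 workers in parallel without waiting for any of them
    for ($i = 1; $i <= 25; $i++) {
        exec('nohup php worker.php ' . escapeshellarg((string)$i)
             . ' > /dev/null 2>&1 &');
    }
    echo "All workers launched; parent exits immediately.\n";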
I wonder how I can schedule and automate tasks in PHP? Can I, or are web server features like cron jobs needed?
I am wondering if there is a way I can, say, delete files after 3 days, when the files are likely outdated or no longer needed.
PHP natively doesn't support automating tasks; you have to build a solution yourself or search for available ones. If you have a frequently visited site/page, you could add a timestamp to the database linking to the file; when your site is visited within a chosen time window (e.g. 8 in the morning), a script (e.g. deleteOlderDocuments.php) runs and deletes the files that are older.
Just an idea. Hope it helps.
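For what it's worth, a minimal sketch of such a deleteOlderDocuments.php (the uploads/ path and the 3-day cutoff are assumptions):

    <?php
    $cutoff = time() - 3 * 24 * 60 * 60;   // anything older than 3 days

    foreach (glob(__DIR__ . '/uploads/*') as $file) {
        if (is_file($file) && filemtime($file) < $cutoff) {
            unlink($file);                 // remove the outdated file
        }
    }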
PHP operates under the request-response model, so it won't be PHP's responsibility to initiate and perform the scheduled job. Use cron, or make your PHP site register the cron jobs.
(Note: the script that the job executes can be written in PHP of course)
In most shared hosting environments, a PHP interpreter is started for each page request. This means that each PHP script in such an environment only knows that it's handling a request, and the information that request gave it. Technically you could check the current time in PHP and see if a task needs to be performed, but that relies on a user requesting the script near a given time.
It is better to use cron for such tasks, especially if the tasks can be slow; otherwise, every once in a while, around a certain time, some user would get a particularly slow response, because their request caused the server to do a whole bunch of scheduled work.
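For example, a crontab entry that runs a cleanup script every night at 3 a.m. could look like this (paths are assumptions):

    0 3 * * * /usr/bin/php /var/www/app/deleteOlderDocuments.php >> /var/log/cleanup.log 2>&1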
When executing proc_nice(), is it actually nice'ing Apache's thread?
If so, and if the current user (a non-superuser) can't renice back to the original priority, is killing the Apache thread (apache_child_terminate) appropriate on an Apache 2.0.x server?
The issue is that I am trying to limit the impact of an app that allows the user to run ad-hoc queries. The queries can be massive, and the resulting transform on the data requires a lot of memory and CPU.
I've already rewritten the process to be more stream-based, which helps with memory consumption, but I would also like the process to run at a lower priority. However, I can't leave the Apache thread at low priority, as we have a lot of high-priority web services running on this same box.
TIA
In that kind of situation, a solution is often not to do that kind of heavy work within the Apache processes, but either:
run an external PHP process, using something like shell_exec, for instance; this is if you must work in synchronous mode (i.e., if you cannot execute the task a couple of minutes later)
or push the task to a FIFO system, and immediately return a message to the user saying "your task will be processed soon",
and have some other process (launched via a crontab every minute, for instance) check that FIFO queue
and do the processing if there is something in the queue
That process, itself, can run at low priority.
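For the synchronous option, a minimal sketch (heavy_query.php is hypothetical); the child process runs at the lowest priority while the Apache process keeps its own:

    <?php
    $queryId = 42;   // id of the ad-hoc query to run (assumption)
    $output  = shell_exec(
        'nice -n 19 php heavy_query.php ' . escapeshellarg((string)$queryId)
    );
    echo $output;    // Apache itself was never reniced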
As often as possible, especially if the heavy calculations take some time, I would go for the second solution:
It allows users to get immediate feedback: "the server has received your request and will process it soon".
It doesn't keep Apache's processes "working" for long: the heavy stuff is done by other processes.
If, one day, you need so much processing power that one server is not enough anymore, this kind of system will be easier to scale: just add a second server that picks from the same FIFO queue.
If your server is really too loaded, you can stop processing from the queue, at least for some time, so the load can recover; for instance, this can be useful if your critical web services are heavily used in a specific time frame.
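A sketch of that FIFO approach (table and script names are assumptions): the web request only enqueues and returns immediately, and a cron-launched consumer does the heavy work at low priority:

    <?php
    // In the web request: enqueue and answer immediately.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $pdo->prepare("INSERT INTO task_queue (payload, done) VALUES (?, 0)")
        ->execute([json_encode(['query_id' => 42])]);
    echo "Your task will be processed soon.";

    // consumer.php, launched by crontab every minute, e.g.:
    //   * * * * * nice -n 19 php /var/www/app/consumer.php
    $task = $pdo->query(
        "SELECT id, payload FROM task_queue WHERE done = 0 ORDER BY id LIMIT 1"
    )->fetch(PDO::FETCH_ASSOC);
    if ($task) {
        run_heavy_task(json_decode($task['payload'], true));   // hypothetical
        $pdo->prepare("UPDATE task_queue SET done = 1 WHERE id = ?")
            ->execute([$task['id']]);
    }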
Another (nice-looking, but I haven't tried it yet) solution would be to use some kind of tool like, for instance, Gearman :
Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.