Asynchronous logging strategy for Python & PHP

Here's the situation: we have a bunch of Python scripts continuously doing work and ultimately writing data to MySQL, and we need a log to analyse the error rate and script performance.
We also have a PHP front-end that interacts with the MySQL data, and we need to log the user actions so that we can analyse their behaviour and compute some scoring functions.
So we thought of having a MySQL table for each case (one for the "Python scripts" log and one for the "user actions" log).
Ideally, we would be writing to these log tables asynchronously, for performance and low-latency reasons. Is there a way to do so in Python (we are using the Django ORM) and in PHP (we are using the Yii framework)?
Are there any better approaches for solving this problem?
Update:
For the user actions (Web UI), we are now considering loading the Apache log into MySQL automatically, with the relevant session info, through simple Apache configuration.

There are (AFAIK) only two ways to do anything asynchronously in PHP:
Fork the process (requires pcntl_fork)
exec() a process and release it by (assuming *nix) appending > /dev/null & to the end of the command string.
Both of these approaches result in a new process being created, albeit temporarily, so whether this would afford any performance increase is debatable and depends highly on your server environment - I suspect it would make things worse, not better. If your database is very heavily loaded (and therefore the thing slowing you down), you might get a faster result from dumping the log messages to a file and having a daemon script that crawls it for entries to insert into the DB - but again, whether this would help is debatable.
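A minimal sketch of the exec-and-release approach, assuming a hypothetical log_writer.php script that performs the actual INSERT:
// Fire-and-forget: launch a separate PHP process to write the log entry,
// redirecting its output and backgrounding it so the current request isn't blocked.
$msg = escapeshellarg(json_encode(['script' => 'importer', 'status' => 'error', 'ms' => 123]));
exec("php /path/to/log_writer.php {$msg} > /dev/null 2>&1 &");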
Python supports multi-threading which makes life a lot easier.

You could open a raw Unix or network socket to a logging service that caches messages and writes them to disk or database asynchronously. If your PHP and Python processes are long-running and generate many messages per execution, keeping an open socket would be more performant than making separate HTTP/database requests synchronously.
You'd have to measure it against appending to a file (open once, then lock, seek, write, and unlock while running, and close at the end) to see which is faster.
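For comparison, a minimal sketch of the locked file-append approach (the log path and entry format are just examples):
// Open the log file once per request/process and append entries under an exclusive lock.
$fh = fopen('/var/log/myapp/user_actions.log', 'ab');
if ($fh !== false) {
    $line = date('c') . "\t" . json_encode(['user' => 42, 'action' => 'click']) . "\n";
    if (flock($fh, LOCK_EX)) {       // block until we hold the lock
        fwrite($fh, $line);          // 'a' mode appends at the end of the file
        fflush($fh);
        flock($fh, LOCK_UN);         // release the lock for other writers
    }
    fclose($fh);                     // close at the end of the run
}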

Related

PHP or Java to handle long running background requests [closed]

This is a design question and I appreciate your insight / advice. I understand this question may have different answers based on experience; I am merely seeking some guidance before I decide how to proceed.
Background -
My application is primarily built on the LAMP stack - Linux, Apache, MySQL and PHP. I also use jQuery for client-side scripting; the application is fairly simple and executes very fast. I am also using the CakePHP framework.
Scenario #1 -
The user clicks a link on the web page
The click triggers an AJAX call to a PHP script on the server
The PHP script makes a cURL request to another web address to process some information, which usually returns in 4-5 seconds
Upon return the PHP script completes execution and terminates
Question -
I keep hearing that PHP is synchronous and will hang until this request is finished - so if multiple users make multiple requests in the above scenario, will PHP hang until each request is processed sequentially, or does Apache take care of spawning multiple threads to process each web request separately?
I am trying to figure out a way to handle this better - even if it means I should step outside of PHP. Would you recommend I use Perl scripting to handle the cURL request and just have PHP fork a shell process and exit, or would it be better to create a Java servlet that the AJAX can call, since Java is multi-threaded and can handle this on its own?
I am reading up on pthreads - is this a scenario where pthreads would be useful?
Scenario 2
User uploads a zip file and clicks the process button and then quits the application
Upon clicking the process button an AJAX request is sent to the server to process the zip file. The PHP script receiving this request has ignore_user_abort enabled so it does execute even if the user quits.
However processing of this zip file can take multiple minutes as it involves multiple cURL calls and SOAP calls across web servers
Once processing is done, the PHP script updates the database and terminates
Question
Again similar to the above question, is this something that will be blocking in nature if multiple people upload files at the same time?
Assuming PHP would queue all the various requests - would this cause a timeout scenario and loss of requests?
Is this something better done with PERL/JAVA etc?
Thank you for your advice and insight
The short answer is
Scenario #1
All / most languages are synchronous; that said, running AJAX is asynchronous, and by extension running PHP via AJAX is asynchronous. The thing is, here you are confusing "synchronous", which in this context means blocking until an operation is finished (process blocking), with parallel processing or even multi-threading.
Again, multi-threading is quite different than parallel processing; PHP is quite capable of running dozens of parallel processes. Is it the best language for it? Probably not, but it can do it with as little effort as running a shell command with exec, like this on Linux: exec('/usr/bin/php -f pathtophpfile/index.php arg1 > /dev/null &');. Multi-threading is defined as this:
Multithreading is the ability of a program or an operating system process
to manage its use by more than one user at a time and to even manage
multiple requests by the same user without having to have multiple
copies of the programming running in the computer
Parallel processing is defined as this
Parallel processing is the simultaneous use of more than one CPU or
processor core to execute a program or multiple computational threads.
So while technically PHP can't do either of these, you can run multiple copies of PHP at the same time on the same machine, much in the same way as you can manually open multiple shell windows and run commands in each of them. Is it parallel processing or multi-threading? No, it's just running multiple copies of PHP at the same time.
But the biggest challenge with any "multi-threaded or parallel process" is race conditions. If you are careful to avoid them, you will be fine. A race condition looks like this:
process1 loads text.txt
process1 makes changes
process2 loads text.txt - before process1 has saved its data
process2 makes changes
process2 saves changes
process1 saves changes
Now you will lose any changes made by process2, because process1 had the data in memory and never accounted for process2 changing it. This is also what I would call a concurrency issue; they are basically the same thing. Another thing to look out for, if using cron or some other rudimentary queuing method, is not pulling the same job with multiple processes.
Also, debugging can be a challenge; this is true of any background process and not specific to PHP. The simplest thing to do here is log your output to a file, using things like ob_start() and $var = ob_get_clean() (output buffering) and recording that. It's also useful to use a shutdown handler to log errors, such as
http://php.net/manual/en/function.register-shutdown-function.php
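A rough sketch of that pattern (the log file path is just an example):
// Capture everything the background script prints and log fatal errors on shutdown.
ob_start();

register_shutdown_function(function () {
    $output = ob_get_clean();        // whatever the script echoed
    $error  = error_get_last();      // last fatal error, if any
    $entry  = date('c') . ' ' . $output;
    if ($error !== null) {
        $entry .= ' FATAL: ' . $error['message'] . ' in ' . $error['file'] . ':' . $error['line'];
    }
    file_put_contents('/tmp/worker.log', $entry . "\n", FILE_APPEND | LOCK_EX);
});

// ... rest of the background job ...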
Of course, these are over-simplified examples and explanations, but that is the gist of it.
Scenario #2
How would it be blocking? As I mentioned, PHP and Apache can serve over 200 clients at once; another request is just another connection to Apache (when using AJAX or cURL), and it's basically the same even when just using the CLI (command line interface). There is no inherent reason you can't run several dozen PHP processes at once.
Why would it queue them? They just execute again, like opening multiple tabs in a browser. As for a timeout, there are always resource limits on a server no matter what language you use. You could use a queuing system to ensure that only a few files are processed at a given time; this could be as simple as cron and a database table with a status column, such as queued, running, complete. Then the cron script picks one job marked as queued, marks it as running while it runs, marks it complete when done, rinse and repeat.
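A minimal sketch of such a cron-driven worker, assuming a hypothetical jobs table (id, payload, status) and a process_zip() function that does the actual work:
// Run from cron, e.g. * * * * * /usr/bin/php /path/to/worker.php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Claim one queued job atomically so two cron runs can't grab the same one.
$pdo->beginTransaction();
$job = $pdo->query("SELECT id, payload FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1 FOR UPDATE")
           ->fetch(PDO::FETCH_ASSOC);
if ($job) {
    $pdo->prepare("UPDATE jobs SET status = 'running' WHERE id = ?")->execute([$job['id']]);
}
$pdo->commit();

if ($job) {
    process_zip($job['payload']);    // hypothetical: the actual long-running work
    $pdo->prepare("UPDATE jobs SET status = 'complete' WHERE id = ?")->execute([$job['id']]);
}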
That is a matter of opinion and more so a matter of your ability with those languages.
I'm actually building a system in PHP that takes one CSV file and breaks it into 25,000-row chunks (without re-writing separate files, just reading from offsets in the same file with multiple worker processes). These chunks are then processed in parallel by up to 10 workers and then aggregated back together, and then some reports, emails etc. are generated. Is it easy to do? No. Is it possible? Sure is.
The system I am building, for example, takes a file with say 1 million+ rows and queries a database with over 700k records. It works a bit like this:
Job Preprocess ( one process creates multiple chunks )
create a job file
calculate offsets
queue (in RabbitMQ) multiple jobs
Process ( multiple processes each handle one or more chunk )
load data from queue
access input_file.csv at offset and read to end of offset
generate a numbered result file such as 0.csv, 1.csv for each chunk
Aggregation ( one process only, receives the bits of the job )
load previously saved job file ( from step 1 )
as each chunk completes record that in job file
when all chunks are done, compact all the results from the numbered files in order.
The trick here is that the multiple-process part (step 2) doesn't touch the job file from step 1 (or it would encounter race conditions); further, only one process receives all of the chunks for a job. Once all the chunks are received, we compact them into one file, do some clean-up, and then send out emails, etc.
With this I have run a file with 1 million rows in under 2 minutes. Using a single thread / process, it takes about 15 minutes to run the same file.
So (again) I assure you it can be done; it's tricky, and you have to be very careful about how you move your data around, but it's not impossible to do these things in PHP. PHP, and modern hardware for that matter, can handle thousands of operations a second. Usually the bottlenecks are bad indexing in a database or waiting on network connections, etc.
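As an aside, a rough sketch of the offset-reading step (step 2 above), assuming each queued job carries the start byte offset and length of its chunk of input_file.csv:
// Each worker reads only its own byte range of the shared CSV, so the file
// is never split into separate chunk files.
function read_chunk(string $path, int $start, int $length): array {
    $rows = [];
    $fh = fopen($path, 'rb');
    fseek($fh, $start);                         // jump to this worker's offset
    $end = $start + $length;
    while (ftell($fh) < $end && ($row = fgetcsv($fh)) !== false) {
        $rows[] = $row;
    }
    fclose($fh);
    return $rows;
}

// $job would come off the queue, e.g. ['offset' => 0, 'length' => 1048576]
$rows = read_chunk('input_file.csv', $job['offset'], $job['length']);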
If you plan on doing some real heavy-duty work, I'd suggest looking into a queuing or messaging system like the one I use (RabbitMQ), but that might be overkill in your case. I use the queuing system to help keep the process flow sane and avoid race conditions; basically, its sole purpose for me is to organize the data flow.
Scenario #1
1) PHP is synchronous, but the question is confused. PHP executes instructions synchronously, normally; however, Apache defines the processing model. Apache will reuse or spawn a worker process or thread to handle each request, up to the configured limit.
2) The way you are handling it is fine, you might want to try and reduce the amount of time it takes to update the user interface, because 4-5 seconds is rather long.
3) I will talk a little about using threads at the frontend.
Using threads at the frontend doesn't make sense. As mentioned, your webserver has a defined processing model; it is designed to scale with that model, and creating user threads as the result of a web request disrupts that model. Even if user code creates a reasonable number of threads, for example 8, if 100 clients come along at once, you will be asking your hardware to execute 800 threads concurrently.
That is clearly a bad idea!
Scenario #2
1) The same answer as #1.1, it's the processing model of the server that handles multiple clients.
2) The same answer as question 1 in both scenarios.
3) That's entirely a matter of opinion.
The problem you seem to have is essentially the same in both scenarios.
Advice
Don't make anything more complex than it has to be; in both scenarios, the problem is your receiving server side code responds slower than is desirable.
In the case where you have many HTTP requests to make to process a request, your code is I/O bound. Don't go straight to multi-processing or multi-threading at all; try non-blocking I/O first. It is simpler, more accessible, more suitable, and scales with PHP.
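For example, a minimal non-blocking I/O sketch using curl_multi (the URLs are placeholders):
// Issue several HTTP requests concurrently from a single PHP process.
$urls = ['https://api.example.com/a', 'https://api.example.com/b'];
$mh = curl_multi_init();
$handles = [];

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive all transfers until they are finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);          // wait for activity instead of busy-looping
} while ($running > 0);

foreach ($handles as $ch) {
    $response = curl_multi_getcontent($ch);
    // ... process $response ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);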
In the case where you have code that is CPU bound (for example, you have solved the I/O problem and are making all your requests using non-blocking I/O, but once the data is downloaded it requires considerable processing to be used), then you might think about using multiple processes or threads.
Whatever happens, you should not use multi-threading at the frontend; what you want to do is isolate those parts of the application that require multi-threading and communicate with this isolated sub-application using some sane form of RPC.

What is a daemon? What is its practical use? Usage with PHP?

Could someone explain to me, in a few words, what a daemon is and what use it has in PHP?
I know that this is a process which is running all the time.
But I can't understand what use it has in a PHP app.
Can someone please give examples of use?
Can I use a daemon to lessen the memory usage of my app?
As I understand it, a daemon can hold data and return it on request, so basically I could store the most frequently used data there, to avoid getting it from MySQL for each visitor?
Or am I totally wrong? :)
Thanks ;)
A daemon is an endlessly running process which just waits for jobs. A webserver ("httpd", an HTTP daemon) waits for requests to handle, a printer daemon waits for something to print, and so on. On Windows systems it's called a "service".
Whether you can use one for your application in some way depends highly on your application and what you want to do with a daemon. But I also don't recommend PHP for that.
Could someone explain me in two words, what is daemon and what use of them in php?
A CLI application or process.
I, know that this is a process, which is runing all the time. But i can't understand what use of it in php app?
You can use it for jobs that are not visible to the user or triggered from the interface, e.g. cleaning up stale data in the database, or scheduled tasks that update part of the DB or a page in the background.
Can someone please give examples of use? Can i use daemon to lessen memory usage of my app?
I think Drupal has a cron script... perhaps checking it would help. Lessen memory? No, memory optimization always comes down to the application design and how the script is coded.
As i understand, daemon can hold data and give it on request, so basically i can store most usable data there, to avoid getting it from mysql for each visitor?
No, a daemon is just a script; however, you can create a JSON or XML data file that the daemon script can process.
Please see this answer regarding the use of PHP for a daemon. There are times when you might want to fork a child process in PHP, perhaps to execute some query while the parent does other work and then inform the parent that the job as a whole can be completed.
I would not, however use PHP to set up a socket server or similar, nor would I use PHP in any other instance where execution was measured in units greater than seconds.
I don't want to discourage you from exploring and experimenting, just caution you against putting too much trust in an implementation that exceeds the capabilities of the language.
Because a daemon is just a process that runs in an infinite loop, whether or not a daemon can be helpful for your particular app is entirely up to the daemon and the requirements of your app.
MySQL is itself run as a daemon, but a typical way of decreasing the number of calls to MySQL is to cache their output in Memcached (which not surprisingly also runs as a daemon). So the advantage of using Memcached isn't that it's a daemon, it's that it's a daemon more geared to a specific task (caching objects) than MySQLd (providing a SQL-queryable database).
If your app repeatedly needs to make the same SQL queries, then it's definitely worth considering using Memcache or another caching layer (which, yes, will most likely be provided by a daemon) in between the app and MySQL.
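A rough sketch of that caching pattern using the Memcached extension (the key name and query are just examples):
// Check Memcached first; only hit MySQL on a cache miss.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$key = 'top_scores';
$scores = $memcached->get($key);

if ($scores === false) {                                   // cache miss
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $scores = $pdo->query('SELECT user_id, score FROM scores ORDER BY score DESC LIMIT 10')
                  ->fetchAll(PDO::FETCH_ASSOC);
    $memcached->set($key, $scores, 60);                    // cache for 60 seconds
}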

Multithreaded Programming in PHP to avoid runtime limitations

I know about PHP not being multithreaded, but I talked with a friend about this: if I have a large algorithmic problem I want to solve with PHP, isn't the solution to simply use the "curl_multi_xxx" interface and start n HTTP requests on the same server? This is what I would call PHP-style multithreading.
Are there any problems with this in the typical webserver environment? The master request, which is waiting on "curl_multi_exec", shouldn't count any of that time against its maximum runtime or memory limit.
I have never seen this anywhere promoted as a solution to prevent a script killed by too restrictive admin settings for PHP.
If I add this as a feature to a popular PHP system, will server admins hire a Russian mafia hitman to get revenge for this hack?
If I add this as a feature to a popular PHP system, will server admins hire a Russian mafia hitman to get revenge for this hack?
No, but it's still a terrible idea, for no other reason than that PHP is supposed to render web pages, not run big algorithms. I see people trying to do this in ASP.NET all the time. There are two proper solutions:
Have your PHP script spawn a process that runs independently of the web server and updates a common data store (probably a database) with information about the progress of the task, which your PHP scripts can access.
Have a constantly running daemon that checks for jobs in a common data store, to which the PHP scripts can issue jobs and in which they can view the progress of currently running jobs.
By using curl, you are adding a network timeout dependency into the mix. Ideally you would run everything from the command line to avoid timeout issues.
PHP does support forking (pcntl_fork). You can fork some processes and then monitor them with something like pcntl_waitpid. You end up with one "parent" process to monitor the children it spawned.
Keep in mind that while one process can start up, load everything, and then fork, you can't share things like database connections, so each forked process should establish its own. I've used forking for up to 50 processes.
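A minimal sketch of that fork-and-monitor pattern (the per-child work and connection details are placeholders):
// Fork a handful of workers, then have the parent wait for each one to finish.
$children = [];

for ($i = 0; $i < 4; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // Child: open its own DB connection and do its slice of the work.
        $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
        // ... process batch $i ...
        exit(0);
    }
    $children[] = $pid;               // parent keeps track of the child PIDs
}

// Parent: block until every child has exited.
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
}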
If forking isn't available for your install of PHP, you can spawn a process as Spencer mentioned. Just make sure you spawn the process in such a way that it doesn't stop processing of your main script. You also want to get the process ID so you can monitor the spawned processes.
exec("nohup /path/to/php.script > /dev/null 2>&1 & echo $!", $output);
$pid = $output[0];
You can also use the above exec() setup to spawn a process started from a web page and get control back immediately.
Out of curiosity - what is your "large algorithmic problem" attempting to accomplish?
You might be better to write it as an Amazon EC2 service, then sell access to the service rather than the package itself.
Edit: you now mention "mass emails". There are already services that do this; they're generally known as "spammers". Please don't.
Lothar,
As far as I know, PHP doesn't work with services the way its competitors do, so you don't have a way for PHP to know how much time has passed unless you're constantly interrupting the process to check the elapsed time... So, IMO, no, you can't do that in PHP :)

Does PHP proc_nice leave Apache threads at new priority setting?

When executing proc_nice(), is it actually nice'ing Apache's thread?
If so, and if the current user (a non-superuser) can't renice back to its original priority, is killing the Apache thread (apache_child_terminate) appropriate on an Apache 2.0.x server?
The issue is that I am trying to limit the impact of an app that allows the user to run ad-hoc queries. The queries can be massive, and the resulting transform on the data requires a lot of memory and CPU.
I've already re-written the process to be more stream-based, which helps with the memory consumption, but I would also like the process to run at a lower priority. However, I can't leave the Apache thread at low priority, as we have a lot of high-priority web services running on this same box.
TIA
In that kind of situation, a solution is often not to do that kind of heavy work within the Apache processes, but to either:
run an external PHP process, using something like shell_exec, for instance -- this is if you must work in synchronous mode (ie, if you cannot execute the task a couple of minutes later)
push the task to a FIFO system, and immediately return a message to the user saying "your task will be processed soon"
and have some other process (launched via a crontab every minute, for instance) check that FIFO queue
and do the processing if there is something in the queue
That process, itself, can run in low priority mode.
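For instance, a minimal sketch of launching that worker at a lower priority (the script path is just an example); alternatively, the worker can lower its own priority with proc_nice():
// From the web request: hand the heavy work off to a background process
// started with a reduced scheduling priority, so Apache keeps its own priority.
shell_exec('nohup nice -n 15 /usr/bin/php /path/to/heavy_worker.php > /dev/null 2>&1 &');

// Or, inside heavy_worker.php itself, lower only that process's priority:
proc_nice(15);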
As often as possible, especially if the heavy calculations take some time, I would go for the second solution:
It allows users to get some feedback immediately: "the server has received your request, and will process it soon"
It doesn't keep Apache's processes "working" for long: the heavy stuff is done by other processes
If, one day, you need such an amount of processing power that one server is not enough anymore, this kind of system will be easier to scale: just add a second server that'll pick from the same FIFO queue
If your server is really too loaded, you can stop processing from the queue, at least for some time, so the load can recover -- for instance, this can be useful if your critical web services are used a lot in a specific time-frame.
Another (nice-looking, but I haven't tried it yet) solution would be to use some kind of tool like, for instance, Gearman:
Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.

PHP: Multithreaded PHP / Web Services?

Greetings All!
I am having some trouble working out how to execute thousands upon thousands of requests to a web service (eBay). I have a limit of 5 million calls per day, so there are no problems on that end.
However, I'm trying to figure out how to process 1,000 - 10,000 requests every minute to every 5 minutes.
Basically the flow is:
1) Get list of items from database (1,000 to 10,000 items)
2) Make an API POST request for each item
3) Accept return data, process data, update database
Obviously a single PHP instance running this in a loop would be impossible.
I am aware that PHP is not a multithreaded language.
I tried the CURL solution, basically:
1) Get list of items from database
2) Initialize multi curl session
3) For each item add a curl session for the request
4) execute the multi curl session
So you can imagine 1,000-10,000 GET requests occurring...
This was OK; around 100-200 requests were occurring in about a minute or two. However, only 100-200 of the 1,000 items actually processed, so I am thinking that I'm hitting some sort of Apache or MySQL limit?
But this does add latency; it's almost like performing a DoS attack on myself.
I'm wondering how you would handle this problem? What if you had to make 10,000 web service requests and 10,000 MySQL updates from the return data from the web service... and this needs to be done within 5 minutes at most.
I am using PHP and MySQL with the Zend Framework.
Thanks!
I've had to do something similar, but with Facebook, updating 300,000+ profiles every hour. As suggested by grossvogel, you need to use many processes to speed things up, because the script spends most of its time waiting for a response.
You can do this with forking, if your PHP install has support for forking, or you can just execute another PHP script via the command line.
exec('nohup /path/to/script.php >> /tmp/logfile 2>&1 & echo $!', $processId);
You can pass parameters (getopt) to the PHP script on the command line to tell it which "batch" to process. You can have the master script do a sleep/check cycle to see if the scripts are still running by checking for the process IDs. I've tested up to 100 scripts running at once in this manner, at which point the CPU load can get quite high.
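A small sketch of the getopt side, assuming the master passes a --batch number when it launches each worker:
// worker.php - launched as: php worker.php --batch=3
$options = getopt('', ['batch:']);           // long option with a required value
$batch = isset($options['batch']) ? (int) $options['batch'] : 0;

// Each worker then loads and processes only the items belonging to its batch,
// e.g. SELECT ... FROM items WHERE batch_no = $batch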
Combine multiple processes with multi-curl, and you should easily be able to do what you need.
My two suggestions are (a) do some benchmarking to find out where your real bottlenecks are and (b) use batching and caching wherever possible.
Mysqli allows multiple-statement queries, so you could definitely batch those database updates.
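A rough sketch of batching those updates with mysqli::multi_query, assuming $results holds the processed data keyed by item ID (the table and column names are examples); make sure the values are sanitized before building the string:
// Build a batch of UPDATE statements and send them to MySQL in one round trip.
$mysqli = new mysqli('localhost', 'user', 'pass', 'app');

$sql = '';
foreach ($results as $itemId => $data) {
    $price = (float) $data['price'];
    $id    = (int) $itemId;
    $sql  .= "UPDATE items SET price = {$price} WHERE id = {$id};";
}

if ($mysqli->multi_query($sql)) {
    // Flush every result set so the connection is ready for the next query.
    do {
        if ($res = $mysqli->store_result()) {
            $res->free();
        }
    } while ($mysqli->more_results() && $mysqli->next_result());
}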
The http requests to the web service are more likely the culprit, though. Check the API you're using to see if you can get more info from a single call, maybe? To break up the work, maybe you want a single master script to shell out to a bunch of individual processes, each of which makes an api call and stores the results in a file or memcached. The master can periodically read the results and update the db. (Careful to rotate the data store for safe reading and writing by multiple processes.)
To understand your requirements better: must you implement your solution only in PHP, or can you interface a PHP part with another part written in another language?
If you cannot go for another language, try to perform this update as a PHP script that runs in the background, not through Apache.
You can follow Brent Baisley's advice for a simple use case.
If you want to build a robust solution, then you need to:
set up a representation of the actions in a database table that will be your process queue;
set up a script that pops this queue and processes your actions;
set up a cron daemon that runs this script every x.
This way you can have 1000 PHP scripts running, using your OS's parallelism capabilities, and not hanging when eBay is taking too long to respond.
The real advantage of this system is that you can fully control the firepower you throw at your task by adjusting:
the number of requests one PHP script does;
the order / number / type / priority of the actions in the queue;
the number of scripts the cron daemon runs.
Thanks everyone for the awesome and quick answers!
The advice from Brent Baisley and e-satis works nicely. Rather than executing the sub-processes using cURL like I did before, the forking takes a massive load off; it also nicely gets around the issue of maxing out my Apache connection limit.
Thanks again!
It is true that PHP is not multithreaded, but it can certainly be set up with multiple processes.
I have created a system that resembles the one you are describing. It runs in a loop and is basically a background process. It uses up to 8 processes for batch processing and a single control process.
It is somewhat simplified because I do not need any communication between the processes. Everything resides in a database, so each process is spawned with the full context taken from the database.
Here is a basic description of the system.
1. Start control process
2. Check database for new jobs
3. Spawn child process with the job data as a parameter
4. Keep a table of the child processes to be able to control the number of simultaneous processes.
Unfortunately it does not appear to be a widespread idea to use PHP for this type of application, and I really had to write wrappers for the low-level functions.
The manual has a whole section on these functions, and it appears that there are methods for allowing IPC as well.
PCNTL has the functions to control forking/child processes, and Semaphore covers IPC.
The interesting part of this is that I'm able to fork off actual PHP code, not execute other programs.
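As a small illustration of the Semaphore extension mentioned above, a sketch of forked children serializing access to a shared resource (the key derivation and the work in the critical section are just examples):
// A System V semaphore lets sibling processes take turns on a critical section.
$sem = sem_get(ftok(__FILE__, 'a'), 1);   // one permit shared by all processes

for ($i = 0; $i < 4; $i++) {
    if (pcntl_fork() === 0) {
        sem_acquire($sem);                // wait for exclusive access
        // ... update the shared job table, write to a common file, etc. ...
        sem_release($sem);
        exit(0);
    }
}

while (pcntl_waitpid(-1, $status) > 0);   // parent reaps all children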
