I have 3 servers with processes that require all the CPU they can get. I let these processes write their standard output to a file
./run.sh > run.log
Would this writing slow down the process? (The ./run.sh script starts e.g. a Java program and sometimes a Ruby program.)
Now I want to create a web interface that would display the output from the script while it is running. I can imagine writing a PHP script which refreshes every 5 seconds, creates an SSH connection to the server and gets the run.log file.
But wouldn't that interfere with the process or slow it down? It is really crucial that the server is able to use as much of its power as possible. Or are there better ways to handle this? Instead of creating an SSH connection every 5 seconds, maybe a persistent connection and updates with Ajax? (Security is not a requirement.)
Would this writing slow down the process? (The ./run.sh script starts e.g. a Java program and sometimes a Ruby program.)
Maybe; if the process writes a lot of data, it can easily slow down the process, because the process will likely be writing synchronously to the disk. Otherwise, you don't have to worry.
An alternative would be having a setup where the script sends the output to the machine with the web application via some kind of message service. This would avoid polling the server; whether it would be more efficient depends on the details.
A simple and efficient mechanism would be forwarding stdout to a UDP socket and having the web application listen and temporarily store those messages in a circular buffer.
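For illustration, a minimal sketch of such a listener in PHP, assuming stdout is forwarded to UDP port 9999 (for example with bash's /dev/udp redirection: ./run.sh > /dev/udp/webhost/9999); the port, buffer size and temp file below are arbitrary choices, not part of the original suggestion:
<?php
// udp_tail.php -- UDP listener keeping only the last N lines of output
$socket = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);
socket_bind($socket, '0.0.0.0', 9999);                            // arbitrary example port
$buffer = [];
$max = 200;                                                       // circular buffer: last 200 lines
while (true) {
    socket_recvfrom($socket, $msg, 65535, 0, $ip, $port);
    $buffer[] = $msg;
    if (count($buffer) > $max) {
        array_shift($buffer);                                     // drop the oldest line
    }
    file_put_contents('/tmp/run_tail.txt', implode('', $buffer)); // what the web page serves
}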
Related
I have an Ubuntu server which is collecting incoming SNMP traps. Currently these traps are handled and logged using a PHP script.
file /etc/snmp/snmptrapd.conf
traphandle default /home/svr/00-VHOSTS/nagios/scripts/snmpTrap.php
This script is quite long and it contains many database operations. Usually the server receives thousands of traps per day, so this script is taking too much CPU time. My understanding is that this is due to the high start-up cost of the PHP script every time a trap is received.
I got a request to re-write this, and I was thinking of running this script as a daemon. I can create an Ubuntu daemon. My question is: how can I pass the trap handler to this daemon using the snmptrapd.conf file?
Thank you in advance.
One suggestion is to use the MySQL support that's built into snmptrapd 5.5. That way you can use MySQL as a queue and process the traps in bulk.
Details of this are on the snmptrapd page: http://www.net-snmp.org/wiki/index.php/Snmptrapd
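If you go that route, the bulk-processing side might look roughly like this in PHP; the database, table and column names are placeholders for whatever schema you end up configuring, not the actual net-snmp schema:
<?php
// hedged sketch of draining a trap queue table in batches
function handle_trap(array $trap): void { /* existing per-trap logic goes here */ }

$pdo = new PDO('mysql:host=localhost;dbname=snmp', 'user', 'pass');   // placeholder DSN
while (true) {
    $rows = $pdo->query('SELECT * FROM traps WHERE processed = 0 LIMIT 500')->fetchAll();
    foreach ($rows as $trap) {
        handle_trap($trap);
        $pdo->prepare('UPDATE traps SET processed = 1 WHERE id = ?')->execute([$trap['id']]);
    }
    if (count($rows) === 0) {
        sleep(5);   // nothing queued right now; poll again in a few seconds
    }
}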
If not using MySQL, another option is to use a named pipe.
Do mkfifo snmptrapd.log
Now change snmptrapd to write to this log. It's not a regular file, but it looks like one. You then write another daemon to watch the named pipe for new data.
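A rough sketch (not from the original answer) of such a reader daemon in PHP; the path and the per-trap handling are placeholders:
<?php
// fifo_reader.php -- long-running reader for the named pipe
$path = '/var/log/snmptrapd.log';        // the path you ran mkfifo on
$fh = fopen($path, 'r');                 // blocks until snmptrapd opens the pipe for writing
while (true) {
    $line = fgets($fh);
    if ($line === false) {               // writer closed the pipe; reopen and keep waiting
        fclose($fh);
        $fh = fopen($path, 'r');
        continue;
    }
    // ...parse the trap line and do the database work here, with no per-trap PHP start-up...
}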
You can probably use php-fpm / php-fcgi to minimize PHP script start-up cost.
Although you'd probably need to write a wrapper shell script to forward the request from snmptrapd to the FastCGI protocol.
But first I'd recommend checking the PHP script itself. PHP's start-up cost is not so high that a few requests per minute should raise CPU usage noticeably.
I'm writing PHP code for a web server that is required to do some heavy-duty processing when requested, before returning the results to the users.
My question is: does the Apache server create a separate thread/process for each client, or should I use multi-threading to separate them?
The processing includes executing other applications through the command line and downloading files to the server.
Well, every request to the web server is a separate process which will try to use a free core from the CPU, and if there isn't a free one currently, it will go in a queue and wait.
You can't have multithreading in PHP with Apache within a single web request. You simply can't. Usually Apache forks a new OS process for each request.
This is configurable, but it is typically the setup chosen when working with PHP, since many functions in the PHP standard library are not thread-safe.
When I've had to handle heavy computation, I've always chosen to make the user request asynchronous and let a separate daemon process do the actual computation in the background. In that case, after the user request, I let the client poll the daemon (through other web requests) to know when the computation is done.
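To make that pattern concrete, here is a minimal sketch of the web-facing side, assuming the hand-off happens through a database table; the table name, columns and connection details are invented for the example, and the daemon doing the actual work is not shown:
<?php
// enqueue.php -- hedged sketch: the web request only records the job and returns immediately
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');   // placeholder DSN
$pdo->prepare('INSERT INTO jobs (payload, status) VALUES (?, "pending")')
    ->execute([json_encode($_POST)]);
echo json_encode(['job_id' => $pdo->lastInsertId()]);

<?php
// status.php -- the client polls this endpoint until the status becomes "done"
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');   // same placeholder DSN
$stmt = $pdo->prepare('SELECT status FROM jobs WHERE id = ?');
$stmt->execute([(int) $_GET['id']]);
echo json_encode(['status' => $stmt->fetchColumn()]);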
Background:
I currently have a daemon written in PHP. I knew PHP wasn't the best solution to this problem when I wrote it, but it's what I had access to at the time and what I'm doing makes PHP more than ideal.
Actually, I am using two daemons in PHP. Both are simple while(true) loops with set_time_limit(0). One likes to crash more than the other (which isn't a problem because I have a cron that restarts it if it ever crashes) and I'm guessing it's because of the increased network activity.
Anyway, the daemons:
Daemon 1:
This daemon requests information from an external server, loops very intensively through that data (some 10+ foreach loops) and inserts it into a database. It does this 24/7. It is critical that this daemon is running at 11:59pm each day.
Daemon 2:
This daemon requests the same data. However, when it loops through that data it acts upon certain data it finds, making an external network request when it does. It makes requests like this fairly often, probably around once every few minutes if it's running properly (if it crashes and needs to be restarted, or freezes, the requests build up). This daemon absolutely loves to crash. Crashes are okay, though. This daemon also likes to freeze, at which point it must be killed to start working again.
The problem:
Well, requesting the same data twice (currently about twice per second per script) is extremely inefficient. I need to merge them both into one daemon. However, daemon 1 is critical and needs to keep doing its job. If, after merging, the buggier daemon's code causes the combined daemon to crash, I could have problems.
So, the question:
I'm thinking I could create the new daemon so that it makes the network requests outside of the script. What I mean is that when the new daemon needs to make a network request (which would really slow down the script and likely cause more issues), it calls another script that doesn't block the main one. So, for example, if the new daemon needs to make 20 network requests, it can send all 20 at the same time by calling another script to handle them. This takes the work off the daemon, will likely cause fewer crashes, and means I won't need to request the same data twice.
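A hedged sketch of that hand-off from the daemon's side; the worker script name, the batch size of 20 and the way pending requests are collected are all hypothetical:
<?php
// inside the merged daemon: hand each batch of requests to a non-blocking helper script
$pendingRequests = [/* ...URLs/payloads the daemon has collected... */];
foreach (array_chunk($pendingRequests, 20) as $batch) {
    $arg = escapeshellarg(json_encode($batch));
    // nohup + output redirection + & return control to the daemon immediately
    exec("nohup php /path/to/send_requests.php $arg > /dev/null 2>&1 &");
}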
Here's the situation: we have a bunch of Python scripts continuously doing stuff and ultimately writing data to MySQL, and we need a log to analyse the error rate and script performance.
We also have a PHP front-end that interacts with the MySQL data, and we also need to log the user actions so that we can analyse their behaviour and compute some scoring functions.
So we thought of having a MySQL table for each case (one for the "Python scripts" log and one for the "user actions" log).
Ideally, we would be writing to these log tables asynchronously, for performance and low-latency reasons. Is there a way to do so in Python (we are using the Django ORM) and in PHP (we are using the Yii framework)?
Are there any better approaches for solving this problem ?
Update:
For the user actions (web UI), we are now considering automatically loading the Apache log, with the relevant session info, into MySQL through simple Apache configuration.
There are (AFAIK) only two ways to do anything asynchronously in PHP:
Fork the process (requires pcntl_fork)
exec() a process and release it by (assuming *nix) appending > /dev/null & to the end of the command string.
Both of these approaches result in a new process being created, albeit temporarily, so whether this would afford any performance increase is debatable and depends highly on your server environment - I suspect it would make things worse, not better. If your database is very heavily loaded (and therefore the thing that is slowing you down), you might get a faster result from dumping the log messages to a file and having a daemon script that crawls it for things to enter into the DB - but again, whether this would help is debatable.
Python supports multi-threading, which makes life a lot easier.
You could open a raw Unix or network socket to a logging service that caches messages and writes them to disk or a database asynchronously. If your PHP and Python processes are long-running and generate many messages per execution, keeping an open socket would be more performant than making separate HTTP/database requests synchronously.
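For example, a minimal PHP client for such a logging service could look like this; the Unix socket path and the JSON-lines protocol are assumptions, and the daemon that drains the socket is not shown:
<?php
// log_client.php -- write log events over a long-lived Unix socket
$sock = stream_socket_client('unix:///var/run/applog.sock', $errno, $errstr, 1);
if ($sock !== false) {
    // keep $sock open for the lifetime of the process; send one JSON object per line
    fwrite($sock, json_encode([
        'source' => 'php-frontend',
        'action' => 'login',
        'ts'     => time(),
    ]) . "\n");
}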
You'd have to measure it against appending to a file (open once, then lock, seek, write and unlock while running, and close at the end) to see which is faster.
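For comparison, the file-append pattern described above might be sketched like this; the log path is illustrative, and flock() is what keeps concurrent writers from interleaving:
<?php
// file_log.php -- the open-once / lock-write-unlock pattern
function log_line($fh, string $msg): void {
    flock($fh, LOCK_EX);                           // lock out other writers
    fseek($fh, 0, SEEK_END);                       // re-seek in case another process appended
    fwrite($fh, $msg . PHP_EOL);
    flock($fh, LOCK_UN);                           // unlock
}

$fh = fopen('/var/log/app/actions.log', 'ab');     // open once, keep open while running
log_line($fh, json_encode(['user' => 42, 'action' => 'login', 'ts' => time()]));
fclose($fh);                                       // close at the end of the script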
I know about PHP not being multithreaded, but I talked with a friend about this: if I have a large algorithmic problem I want to solve with PHP, isn't the solution simply to use the "curl_multi_xxx" interface and start n HTTP requests on the same server? This is what I would call PHP-style multithreading.
Are there any problems with this in a typical web server environment? The master request, which is waiting in "curl_multi_exec", shouldn't count any of that time against its maximum runtime or memory limit.
I have never seen this promoted anywhere as a solution to prevent a script being killed by overly restrictive admin settings for PHP.
If I add this as a feature to a popular PHP system, will server admins hire a Russian mafia hitman to get revenge for this hack?
If I add this as a feature to a popular PHP system, will server admins hire a Russian mafia hitman to get revenge for this hack?
No, but it's still a terrible idea, if for no other reason than that PHP is supposed to render web pages, not run big algorithms. I see people trying to do this in ASP.NET all the time. There are two proper solutions:
Have your PHP script spawn a process that runs independently of the web server and updates a common data store (probably a database) with information about the progress of the task that your PHP scripts can access.
Have a constantly running daemon that checks for jobs in a common data store that the PHP scripts can issue jobs to and view the progress on currently running jobs (see the sketch after this list).
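A minimal sketch of that second option, assuming jobs are exchanged through a MySQL table (all names below are invented for the example):
<?php
// worker.php -- a constantly running daemon polling the jobs table
function run_job(array $job): void { /* the actual heavy computation goes here */ }

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');   // placeholder DSN
while (true) {
    $job = $pdo->query('SELECT * FROM jobs WHERE status = "pending" LIMIT 1')->fetch();
    if ($job) {
        $pdo->prepare('UPDATE jobs SET status = "running" WHERE id = ?')->execute([$job['id']]);
        run_job($job);
        $pdo->prepare('UPDATE jobs SET status = "done" WHERE id = ?')->execute([$job['id']]);
    } else {
        sleep(2);   // no pending work; wait before polling again
    }
}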
By using curl, you are adding a network timeout dependency into the mix. Ideally you would run everything from the command line to avoid timeout issues.
PHP does support forking (pcntl_fork). You can fork some processes and then monitor them with something like pcntl_waitpid. You end up with one "parent" process that monitors the children it spawned.
Keep in mind that while one process can start up, load everything, and then fork, you can't share things like database connections, so each forked process should establish its own. I've used forking for up to 50 processes.
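A hedged sketch of that fork-and-monitor pattern; it requires the pcntl extension and the CLI SAPI, and do_work() is just a stand-in for whatever each child actually does:
<?php
// fork_workers.php -- parent forks N children and waits for each of them
function do_work(int $slice): void { sleep(1); /* stand-in for the real per-child work */ }

$children = [];
for ($i = 0; $i < 4; $i++) {              // 4 workers here; tune to your machine
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // child: open its OWN database connection here if it needs one, do its slice, then exit
        do_work($i);
        exit(0);
    }
    $children[] = $pid;                   // parent: remember the child's PID
}
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);         // parent blocks until that child exits
}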
If forking isn't available in your install of PHP, you can spawn a process as Spencer mentioned. Just make sure you spawn the process in such a way that it doesn't block processing of your main script. You also want to get the process ID so you can monitor the spawned processes.
exec("nohup /path/to/php.script > /dev/null 2>&1 & echo $!", $output);
$pid = $output[0];
You can also use the above exec() setup to spawn a process started from a web page and get control back immediately.
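As a follow-up, the captured $pid can be checked later to see whether the spawned process is still alive; this assumes the posix extension is available:
// later, using the $pid captured above
if (posix_kill((int) $pid, 0)) {       // signal 0 only tests for existence; nothing is sent
    echo "process $pid is still running\n";
} else {
    echo "process $pid has finished or died\n";
}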
Out of curiosity - what is your "large algorithmic problem" attempting to accomplish?
You might be better off writing it as an Amazon EC2 service, then selling access to the service rather than the package itself.
Edit: you now mention "mass emails". There are already services that do this - they're generally known as "spammers". Please don't.
Lothar,
As far as I know, PHP doesn't work with services the way its competitors do, so you don't have a way for PHP to know how much time has passed unless you're constantly interrupting the process to check the elapsed time. So, IMO, no, you can't do that in PHP :)