I have a PHP script that runs as a background process. The script simply uses fopen to read from the Twitter Streaming API, essentially an HTTP connection that never ends. I can't post the script, unfortunately, because it is proprietary. On Ubuntu the script runs normally and uses very little CPU, but on BSD the exact same script always uses nearly 100% CPU. The script works just fine on both machines. Can anyone think of something that might point me in the right direction to fix this? This is the first PHP script I have written to run continuously in the background.
The script is an infinite loop: it reads the data out and writes it to a JSON file every minute, and it writes to a MySQL database whenever a reconnect happens, which is usually after days of running. The script does nothing else and is not very long. I have little experience with BSD or with writing PHP scripts that run infinite loops. Thanks in advance for any suggestions, and let me know if this belongs on another Stack Exchange. I will try to answer any questions as quickly as possible, because I realize the question is very vague.
Without seeing the script it is very difficult to give you a definitive answer, but what you need to do is ensure that your script waits for data appropriately. What you absolutely should not do is call stream_set_timeout($fp, 0); or stream_set_blocking($fp, 0); on your file pointer.
The basic structure of a script that does something like this while avoiding racing would be:
// Open the file pointer and set blocking mode
$fp = fopen('http://www.domain.tld/somepage.file', 'r');
if ($fp === false) {
    die('Unable to open stream');
}
stream_set_timeout($fp, 1);
stream_set_blocking($fp, 1);

while (!feof($fp)) { // This loops until the server closes the connection
    // This should be pretty much the first line in the loop.
    // It tries to fetch a line from $fp, blocking for up to 1 second
    // or until data is available. This avoids racing.
    // You can also use fread() in the same way if necessary.
    if (($str = fgets($fp)) === FALSE) continue;

    // rest of app logic goes here
}
You can use sleep()/usleep() to avoid racing as well, but the better approach is to rely on a blocking function call to do your blocking. If it works on one OS but not on another, try setting the blocking modes/behaviour explicitly, as above.
If you can't get this to work with a call to fopen() passing an HTTP URL, it may be a problem with the HTTP wrapper implementation in PHP. To work around this, you could use fsockopen() and handle the request yourself. This is not too difficult, especially if you only need to send a single request and read a constant stream as the response.
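If you go that route, a minimal sketch might look like the following (the host and path are hypothetical placeholders, and note that a real HTTP/1.1 response may also be chunk-encoded, which this ignores):

// Open a raw TCP connection (hypothetical host)
$fp = fsockopen('www.domain.tld', 80, $errno, $errstr, 30);
if ($fp === false) {
    die("Connection failed: $errstr ($errno)\n");
}

// Send a single HTTP request by hand
$request  = "GET /somepage.file HTTP/1.1\r\n";
$request .= "Host: www.domain.tld\r\n";
$request .= "Connection: keep-alive\r\n\r\n";
fwrite($fp, $request);

stream_set_blocking($fp, 1);

// Read the never-ending response line by line
while (!feof($fp)) {
    if (($line = fgets($fp)) === false) continue;
    // process $line (headers arrive first, then the streamed body)
}
fclose($fp);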
It sounds to me like one of your functions is blocking briefly on Linux but not on BSD. Without seeing your script it is hard to be specific, but one thing I would suggest is adding a usleep() before the next loop iteration:
usleep(100000); //Sleep for 100ms
You don't need a long sleep... just enough so that you're not using 100% CPU.
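For context, the sleep belongs at the point where a read comes back empty. A rough sketch of the loop from the question, with fgets() standing in for whatever read call the script actually uses:

while (!feof($fp)) {
    $str = fgets($fp);
    if ($str === false) {
        usleep(100000); // nothing available yet; sleep 100ms instead of spinning
        continue;
    }
    // process $str here
}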
Edit: Since you mentioned you don't have a good way to run this in the background right now, I suggest checking out this tutorial for "daemonizing" your script. It includes some handy code for doing this, and can even create an init.d file for you.
What does the code that does the actual reading look like? Do you just hammer the socket until you get something?
One really effective way to deal with this is to use the libevent extension, but that's not for the feeble-minded.
Related
After much digging around I haven't been able to identify the issue behind a race condition I'm finding in a little PHP pseudo-cron.
The code looks like this:
$fh = fopen(ROOT . '.cron.lock', 'w+');
if (flock($fh, LOCK_EX | LOCK_NB)) {
    // Cron logic goes here
    flock($fh, LOCK_UN);
}
It should be pretty straightforward, and usually does work. The point is that every so often, this little cron executes twice (sending a duplicate email to a user), which is rather annoying.
Initially I thought I'd have to use the third $ewouldblock parameter, but that turned out not to work; it just caused the cron to always execute, without regard for any other process.
Whenever I test this code in a CLI environment it works perfectly fine, but it fails as soon as I move to running the script inside an HTTPD (Apache) request.
http://php.net/manual/en/function.flock.php
If anyone can help, or maybe give me a pointer about what the whole $ewouldblock parameter is for (since the documentation is not very clear about it), I would appreciate it a lot.
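For what it's worth, the third parameter is documented as an output flag: flock() sets it to 1 when a non-blocking lock attempt would have blocked. It does not change flock()'s own behaviour. A minimal sketch of checking it, using the lock file from the question:

$fh = fopen(ROOT . '.cron.lock', 'w+');
if (flock($fh, LOCK_EX | LOCK_NB, $wouldBlock)) {
    // Lock acquired: safe to run the cron logic
    flock($fh, LOCK_UN);
} elseif ($wouldBlock === 1) {
    // Another process holds the lock: flock() returned false and
    // set $wouldBlock, so skip this run instead of duplicating work
}
fclose($fh);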
I have a very troubling problem at hand. I am using a web-socket server that runs in PHP. The issue is I need to be able to use a setInterval/setTimeout function similar to JavaScript, but within my PHP socket server.
I do not have the time or resources to convert my entire project over to Node.js/JavaScript; it would take forever. I love PHP so much that I do not want to make the switch. Everything else works fine, and I feel it's not worth rewriting everything just because I cannot use a setInterval-like function in PHP.
Since the PHP socket server runs through the shell, I can get a setInterval-type function using a loop:
http://pastebin.com/nzcvXRph
This code does work as intended, but it seems a bit heavy-handed, and I feel like that while loop will eat up a lot of resources.
Is there any way I can recompile PHP from source and include a "while2" loop that only iterates every 500 milliseconds instead of instantly?
I don't think recompiling PHP to add a custom "while2" construct is a realistic option.
If you want to delay each iteration of the loop, you can use the sleep() function, which pauses execution for a given number of whole seconds.
For example, if I want to print 10 numbers, one every 2 seconds, the code below does the job.
for ($i = 1; $i <= 10; $i++) {
    print($i);
    sleep(2);
}
Check the PHP docs for sleep().
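For the 500 millisecond interval asked about above, usleep() works the same way at sub-second resolution. A minimal sketch, where doTick() is a hypothetical stand-in for the per-interval work:

while (true) {
    doTick();       // hypothetical: the work to run on each interval
    usleep(500000); // 500,000 microseconds = 500 ms
}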
EDIT
Following up on what I mentioned in the replies: if you want each user to have their own instance of the runtime, then threads would be an option. There are very few examples of multi-threaded applications in PHP, so I would recommend checking out some examples in Java; it shouldn't be hard to understand. Here is a good video tutorial.
For PHP
php.net/threads
Check out the contributor notes; sometimes people write good examples.
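For PHP threads specifically, the basic shape with the pthreads extension looks like this. This is a sketch assuming a ZTS (thread-safe) PHP build with pthreads installed, CLI only:

// Requires the pthreads extension
class WorkerThread extends Thread {
    public function run() {
        // Each thread executes this method in its own context;
        // hypothetical per-thread work goes here
    }
}

$thread = new WorkerThread();
$thread->start(); // executes run() in a new thread
$thread->join();  // wait for the thread to finish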
Initial Condition: I have code in a PHP file. Executing it took 30 seconds, and within that file the code was called 5 times.
What will happen next: If I need to execute this code 50 times, one browser request will take 300 seconds; for 500 times, 3000 seconds. So this is serial execution of the code.
What I Need: I need to execute this code in parallel, as several instances, to minimize the execution time so the user does not have to wait so long.
What I Did: I used PHP cURL to execute this code in parallel, calling the file several times to cut down the execution time.
So I want to know whether this method is correct, how many cURL calls I can run at once, and how many resources they require. I would appreciate a better method for executing this code in parallel, ideally with a tutorial.
Any help will be appreciated.
Probably the simplest option without changing your code (too much) would be to invoke PHP through the command line rather than cURL. This cuts out the overhead of Apache (both memory and speed), networking, etc. Plus, cURL is not a portable option, as some servers can't reach themselves over the network.
$process1 = popen('php myfile.php [parameters]', 'r');
$process2 = popen('php myfile.php [parameters]', 'r');

// Get the responses from the children; stream_get_contents() blocks
// until each child has finished writing
$response1 = stream_get_contents($process1);
$response2 = stream_get_contents($process2);

pclose($process1);
pclose($process2);
You'll need to remove any references to Apache-added variables in $_SERVER and replace $_GET with argv/argc references, but otherwise it should just work.
But the best solution will probably be pthreads (http://php.net/manual/en/book.pthreads.php), which lets you do exactly what you want. It will require some editing of your code (and possibly installing the extension), but it does what you're asking.
PHP cURL's overhead is low enough that you don't need to worry about it. If you can make loopback calls to a server farm through a load balancer, that's a good use case for cURL. I've also used pcntl_fork() for same-host parallelism, but it's harder to set up. I've written classes built on both; see my PHP lib at https://github.com/andrasq/quicklib for ideas (or just borrow code, it's open source).
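For reference, a minimal pcntl_fork() sketch (CLI only, requires the pcntl extension; run_job() is a hypothetical worker function):

$pids = array();
for ($i = 0; $i < 5; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        run_job($i); // child process: do one unit of work
        exit(0);
    }
    $pids[] = $pid; // parent: remember the child's PID
}
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status); // parent: wait for each child to finish
}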
Consider using Gearman. Documentation:
http://php.net/manual/en/book.gearman.php
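To give a feel for the model, here is a minimal client/worker sketch, assuming a gearmand server running on localhost and a hypothetical 'process_chunk' job name:

// Worker: a long-running CLI process that registers the job and waits for work
$worker = new GearmanWorker();
$worker->addServer(); // defaults to 127.0.0.1:4730
$worker->addFunction('process_chunk', function (GearmanJob $job) {
    do_the_slow_work($job->workload()); // hypothetical 30-second unit of work
});
while ($worker->work());

// Client: e.g. the web request, firing off jobs without waiting for them
$client = new GearmanClient();
$client->addServer();
for ($i = 0; $i < 50; $i++) {
    $client->doBackground('process_chunk', (string) $i);
}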
What is the best way to break up a recursive function that is using a ton of resources?
For example:
function do_a_lot() {
    // a lot of code and processing is done here
    // it takes a lot of execution time
    if ($true) {
        // if true we have to do all of that processing again
        do_a_lot();
    }
}
Is there any way to make the server only take the brunt of the first execution and then break the recursion up into separate processes? Or am I dreaming?
Honestly, if your function is using up that much of your system's resources, I'd most likely refactor the code. It's not true multithreading, but you could perhaps look at using popen() to fork your process.
One of the rules of PHP is "share nothing": every PHP process is independent and shares nothing with the others. So if you want to split your execution across several PHP processes, you'll have to store the data somewhere. That can be memcached storage, a database, or the session, as you prefer.
Then you'll need to 'fork' your PHP process. There are solutions available to get this done on the server side, but IMHO they are all hacks: dangerous, and not in the spirit of the PHP/web model, with the exception of 'work queue' tools.
I think the nicest way is to break your task up with AJAX. This gives you a clean user interface and avoids long response timeouts in the web process. That is: show a 'working' zone to your user, then ask via AJAX for the first step of the job, store the response on the server side, then ask for the next step, store the new response, and so on. You can even add a 'stop that stuff' button on the client side.
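As a rough illustration of that flow, each AJAX call could hit an endpoint like this (step.php and run_step() are hypothetical names; state is kept in the session between calls):

// step.php: performs one bounded chunk of the long task per request
session_start();
$step = isset($_SESSION['step']) ? $_SESSION['step'] : 0;

$done = run_step($step); // hypothetical: does one chunk, returns true when finished
$_SESSION['step'] = $step + 1;

echo json_encode(array('step' => $step, 'done' => $done));
// The client keeps calling step.php until 'done' comes back true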
You can also search Google for 'php work queue'.
If it's a long-running task, divide and conquer with Gearman.
There is a family of methods (birddog, shadow, and follow) in the Twitter API that opens a (mostly) permanent connection and allows you to follow many users. I've run the sample connection code with cURL in bash, and it works nicely: when a user I specify writes a tweet, I get a stream of XML in my console.
My question is: how can I access data with PHP that isn't returned as a direct function call, but is streamed? This data arrives sporadically and unpredictably, and it's not something I've ever dealt with nor do I know where to begin looking for answers. Any advice and descriptions of libraries or pitfalls would be appreciated.
fopen and fgets
<?php
$sock = fopen('http://domain.tld/path/to/file', 'r');

// Strict comparison: fgets() returns false at EOF or on error,
// but a loose == TRUE check would also bail out on falsy lines
while (($data = fgets($sock)) !== false) {
    echo $data;
}

fclose($sock);
This is by no means great (or even good) code, but it should provide the functionality you need. You will need to add error handling and data parsing, among other things.
I'm pretty sure your script will time out after ~30 seconds of listening for data on the stream. Even if it doesn't, once you get significant server load, the sheer number of open, listening connections will bring the server to its knees.
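For what it's worth, that ~30 second limit is PHP's max_execution_time. A long-running CLI listener would normally lift it along with the socket read timeout; a sketch, not a recommendation to run this under Apache:

set_time_limit(0);                     // remove the script execution time limit
ini_set('default_socket_timeout', -1); // never time out socket stream reads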
I would suggest taking a look at an AJAX solution that calls a script which just stores a queue of messages. I'm not sure exactly how the Twitter API works, though, so I don't know whether you can have a script fetch all the tweets on request, or whether you need some sort of daemon appending the tweets to a queue that PHP can read and pass back via your AJAX call.
There are libraries for this these days that make things much easier (and handle the tricky bits like reconnections, socket handling, TCP backoff, etc.), e.g.:
http://code.google.com/p/phirehose/
I would suggest looking into using AJAX. I'm not a PHP developer, but I would think that you could wire up an AJAX call to the API and update your web page.
Phirehose is definitely the way to go:
http://code.google.com/p/phirehose/