Currently I'm using file_get_contents() to submit GET data to an array of sites, but upon execution of the page I get this error:
Fatal error: Maximum execution time of 30 seconds exceeded
All I really want the script to do is start loading the webpage, and then leave. Each webpage may take up to 5 minutes to load fully, and I don't need it to load fully.
Here is what I currently have:
foreach ($sites as $s) // Create one line to read from a wide array
{
    file_get_contents($s['url']); // Send to the shells
}
EDIT: To clear any confusion, this script is being used to start scripts on other servers, that return no data.
EDIT: I'm now attempting to use cURL to do the trick, by setting a timeout of one second to make it send the data and then stop. Here is my code:
$ch = curl_init($s['url']); //load the urls
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1); //Only send the data, don't wait.
curl_exec($ch); //Execute
curl_close($ch); //Close it off.
Perhaps I've set the option wrong. I'm looking through some manuals as we speak. Just giving you an update. Thank you all of you that are helping me thus far.
EDIT: Ah, found the problem. I was using CURLOPT_CONNECTTIMEOUT instead of CURLOPT_TIMEOUT. Whoops.
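For anyone hitting the same thing, the corrected line is simply:
curl_setopt($ch, CURLOPT_TIMEOUT, 1); // limits the whole transfer, not just the connect phase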
However, now the scripts aren't triggering. They each use ignore_user_abort(TRUE); so I can't understand the problem.
Hah, scratch that. Works now. Thanks a lot, everyone.
There are many ways to solve this.
You could use cURL with its curl_multi_* functions to execute the requests asynchronously. Or use cURL the common way, but with 1 second as the timeout limit; the call will return with a timeout, but the request will still have been made.
If you don't have cURL installed, you could continue using file_get_contents but fork processes (not so cool, but it works) using something like ZendX_Console_Process_Unix, so you avoid the waiting between each request.
As Franco mentioned, and I'm not sure it was picked up on, you specifically want to use the curl_multi functions, not the regular curl ones. This packs multiple curl objects into a curl_multi object and executes them simultaneously, returning (or not, in your case) the responses as they arrive.
Example at http://php.net/curl_multi_init
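Adapted to the question's $sites array, a sketch based on that manual example might look like this (responses are captured only so they aren't printed, then discarded):

$mh = curl_multi_init();
$handles = array();

foreach ($sites as $s) {
    $ch = curl_init($s['url']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep responses off the page
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

$running = null;
do {
    curl_multi_exec($mh, $running);   // drive all transfers at once
    if (curl_multi_select($mh) === -1) {
        usleep(100);                  // select failed; back off briefly
    }
} while ($running > 0);

foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);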
Re your update that you only need to trigger the operation:
You could try using file_get_contents with a timeout. This would lead to the remote script being called, but the connection being terminated after n seconds (e.g. 1).
If the remote script is configured so it continues to run even if the connection is aborted (in PHP that would be ignore_user_abort), it should work.
Try it out. If it doesn't work, you won't get around increasing your time_limit or using an external executable. But from what you're saying - you just need to make the request - this should work. You could even try to set the timeout to 0 but I wouldn't trust that.
From here:
<?php
$ctx = stream_context_create(array(
    'http' => array(
        'timeout' => 1
    )
));
file_get_contents("http://example.com/", 0, $ctx);
?>
To be fair, Chris's answer already includes this possibility: curl also has a timeout switch.
It is not file_get_contents() that consumes that much time, but the network connection itself.
Consider not submitting GET data to an array of sites; instead, create an RSS feed and let them fetch the RSS data.
I don't fully understand the purpose of your script.
But here is what you can do:
In order to avoid the fatal error quickly, you can just add set_time_limit(120) at the beginning of the file. This will allow the script to run for 2 minutes. Of course you can use any number you want, and 0 for infinite.
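For example, at the top of the file:

set_time_limit(120); // allow 2 minutes; set_time_limit(0) removes the limit entirely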
If you just need to call the URLs and you don't care about the result, you should use cURL in asynchronous mode (curl_multi). That way no call waits for the previous one to finish, and you can fire them all off very quickly.
BR.
If the remote pages take up to 5 minutes to load, your file_get_contents will sit and wait for that 5 minutes. Is there any way you could modify the remote scripts to fork into a background process and do the heavy processing there? That way your initial hit will return almost immediately, and not have to wait for the startup period.
Another possibility is to investigate whether a HEAD request would do the trick. HEAD does not return any data, just headers, so it may be enough to trigger the remote jobs without waiting for the full output.
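A sketch of that with cURL (CURLOPT_NOBODY is what switches the method to HEAD):

$ch = curl_init($s['url']);
curl_setopt($ch, CURLOPT_NOBODY, true);         // send HEAD instead of GET
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep output off the page
curl_exec($ch);
curl_close($ch);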
Related
Initial condition: I have code written in a PHP file. Initially, executing it took 30 seconds; within that file the code was called 5 times.
What will happen next: if I need to execute this code 50 times, it will take 300 seconds for one execution in the browser; for 500 times, 3000 seconds. So the code is executed serially.
What I need: I need to execute this code in parallel, as several instances, to minimize the execution time so the user doesn't have to wait so long.
What I did: I used PHP cURL to execute this code in parallel. I called this file several times to minimize the execution time.
So I want to know whether this method is correct, how many cURL requests I can run, and how many resources they require. If there is a better method for executing this code in parallel, ideally with a tutorial, that would be even better.
Any help will be appreciated.
Probably the simplest option without changing your code (too much), though, would be to call PHP through the command line and not cURL. This cuts the overhead of Apache (both in memory and speed), networking, etc. Plus, cURL is not a portable option, as some servers can't see themselves (in network terms).
$process1 = popen('php myfile.php [parameters]', 'r'); // popen() requires a mode argument
$process2 = popen('php myfile.php [parameters]', 'r');

// get the responses from the children; each call blocks until that child completes
$response1 = stream_get_contents($process1);
$response2 = stream_get_contents($process2);

pclose($process1);
pclose($process2);
You'll need to remove any reference to Apache-added variables in $_SERVER and replace $_GET with argv/argc references, but otherwise it should just work.
But the best solution will probably be pThreads (http://php.net/manual/en/book.pthreads.php), which allows you to do what you want. It will require some editing of code (and possibly installing the extension), but it does what you're asking.
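A rough sketch of what that might look like (this assumes a ZTS PHP build with the pthreads extension; the RequestWorker class name and the $urls list are illustrative):

class RequestWorker extends Thread
{
    private $url;

    public function __construct($url)
    {
        $this->url = $url;
    }

    public function run()
    {
        file_get_contents($this->url); // each thread makes its own request
    }
}

$workers = array();
foreach ($urls as $url) {
    $worker = new RequestWorker($url);
    $worker->start();
    $workers[] = $worker;
}
foreach ($workers as $worker) {
    $worker->join(); // wait for all threads to finish
}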
PHP cURL has low enough overhead that you don't have to worry about it. If you can make loopback calls to a server farm through a load balancer, that's a good use case for cURL. I've also used pcntl_fork() for same-host parallelism, but it's harder to set up. I've written classes built on both; see my PHP lib at https://github.com/andrasq/quicklib for ideas (or just borrow code, it's open source).
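For comparison, a minimal pcntl_fork() sketch (not code from quicklib; the $urls list is illustrative):

$pids = array();
foreach ($urls as $url) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die('fork failed');
    } elseif ($pid === 0) {
        file_get_contents($url); // child: do one request, then exit
        exit(0);
    }
    $pids[] = $pid;              // parent: remember the child
}
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status); // reap each child
}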
Consider using Gearman. Documentation:
http://php.net/manual/en/book.gearman.php
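A fire-and-forget sketch of the idea (the 'run_job' function name and the default gearmand host/port are assumptions, not from the question):

$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);             // default gearmand host/port

for ($i = 0; $i < 50; $i++) {
    $client->doBackground('run_job', (string) $i); // queue the job and return immediately
}

// A separate worker process would service the queue, e.g.:
// $worker = new GearmanWorker();
// $worker->addServer('127.0.0.1', 4730);
// $worker->addFunction('run_job', function (GearmanJob $job) {
//     // do the 30-second unit of work here, using $job->workload()
// });
// while ($worker->work());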
I have a PHP script that runs as a background process. This script simply uses fopen to read from the Twitter Streaming API, essentially an HTTP connection that never ends. I can't post the script, unfortunately, because it is proprietary. On Ubuntu the script runs normally and uses very little CPU; however, on BSD the script always uses nearly 100% CPU. The script works just fine on both machines and is the exact same script. Can anyone think of something that might point me in the right direction to fix this? This is the first PHP script I have written to run continuously in the background.
The script is an infinite loop; it reads the data out and writes to a JSON file every minute. The script writes to a MySQL database whenever a reconnect happens, which is usually after days of running. The script does nothing else and is not very long. I have little experience with BSD or with writing PHP scripts that run in infinite loops. Thanks in advance for any suggestions, and let me know if this belongs on another StackExchange. I will try to answer any questions as quickly as possible, because I realize the question is very vague.
Without seeing the script, it is very difficult to give you a definitive answer; however, what you need to do is ensure that your script waits for data appropriately. What you absolutely should not do is call stream_set_timeout($fp, 0); or stream_set_blocking($fp, 0); on your file pointer.
The basic structure of a script that should avoid racing would be something like this:
// Open the file pointer and set blocking mode
$fp = fopen('http://www.domain.tld/somepage.file', 'r');
stream_set_timeout($fp, 1);
stream_set_blocking($fp, 1);

while (!feof($fp)) { // This should loop until the server closes the connection

    // This line should be pretty much the first line in the loop.
    // It will try to fetch a line from $fp, and block for 1 second
    // or until one is available. This should help avoid racing.
    // You can also use fread() in the same way if necessary.
    if (($str = fgets($fp)) === FALSE) continue;

    // rest of app logic goes here
}
You can use sleep()/usleep() to avoid racing as well, but the better approach is to rely on a blocking function call to do your blocking. If it works on one OS but not on another, try setting the blocking modes/behaviour explicitly, as above.
If you can't get this to work with a call to fopen() passing an HTTP URL, it may be a problem with the HTTP wrapper implementation in PHP. To work around this, you could use fsockopen() and handle the request yourself. This is not too difficult, especially if you only need to send a single request and read a constant stream response.
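A rough sketch of that approach (the host, path, and timeouts are placeholders):

$fp = fsockopen('www.domain.tld', 80, $errno, $errstr, 5);
if (!$fp) {
    die("connect failed: $errstr ($errno)");
}

// write the HTTP request by hand
fwrite($fp, "GET /somepage.file HTTP/1.1\r\n"
          . "Host: www.domain.tld\r\n"
          . "Connection: keep-alive\r\n\r\n");

stream_set_timeout($fp, 1);
stream_set_blocking($fp, 1);

while (!feof($fp)) {
    if (($line = fgets($fp)) === FALSE) continue;
    // process $line: headers first, then the streaming body
}
fclose($fp);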
It sounds to me like one of your functions is blocking briefly on Linux, but not on BSD. Without seeing your script it is hard to get specific, but one thing I would suggest is adding a usleep() before the next loop iteration:
usleep(100000); //Sleep for 100ms
You don't need a long sleep... just enough so that you're not using 100% CPU.
Edit: Since you mentioned you don't have a good way to run this in the background right now, I suggest checking out this tutorial for "daemonizing" your script. Included is some handy code for doing this. It can even make a file in init.d for you.
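The core of the daemonizing technique, for reference, is a minimal sketch like this (assuming the pcntl and posix extensions are available):

$pid = pcntl_fork();
if ($pid === -1) {
    die('could not fork');
} elseif ($pid > 0) {
    exit(0);        // parent exits; the child keeps running
}
posix_setsid();     // detach the child from the controlling terminal
chdir('/');         // don't hold the working directory open

// ... the infinite read loop goes here ...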
What does the code that does the actual reading look like? Do you just hammer the socket until you get something?
One really effective way to deal with this is to use the libevent extension, but that's not for the feeble-minded.
I have PHP code that calls a SOAP request to a remote server. The remote server processes the SOAP request and I can then fetch the results.
Ideally I would like to call the SOAP request, wait 5 seconds, and then go and look for the results. The reason is that the remote server takes a couple of seconds to finish its processing. I have no control over the remote server.
At present I have this code:
$object = new Resource_Object();
$id = $_GET['id'];
$object->sendBatch($id);
sleep(5);
$results = $object->getBatchReport();
echo $results;
The problem with the above code is that sendBatch takes a few seconds to complete. After adding sleep(5), the page takes 5 seconds longer to load, but the results are still not displayed. If I load the page again, or call getBatchReport() from another page, the results are there.
I guess this has something to do with the statelessness of HTTP causing the whole page to execute at once. I considered using output buffering, but I don't really understand what output buffering is for.
I was also considering using jQuery and Ajax to continuously poll getBatchReport(), but the problem is that I need to call this page from another location, and as sendBatch() grows, the 5-second delay might go up, probably to about 2 minutes. I don't think Ajax will work if I call this page remotely (this page is already being called in the background, spawned with output redirected via /dev/null 2>&1 &).
I have no control over the remote server specified in the sendBatch routine, and as far as I know it doesn't have any callback functions. I would prefer not to use cron, because that would mean querying the remote server the whole time.
Any ideas?
I was overly optimistic when I thought 5 seconds would do the job. Upon retesting I found that 15 seconds is a more realistic value. It's working now.
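For what it's worth, a polling loop is more robust than any fixed sleep; here is a sketch using the same objects as the question (it assumes getBatchReport() returns an empty value until the report is ready):

$object = new Resource_Object();
$id = $_GET['id'];
$object->sendBatch($id);

$results = '';
for ($i = 0; $i < 30; $i++) { // give up after ~30 seconds
    sleep(1);
    $results = $object->getBatchReport();
    if (!empty($results)) {
        break; // the report is ready
    }
}
echo $results;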
I'm currently using cURL in PHP a lot. It takes a lot of time to get the results of about 100 pages each time. For every request I'm using code like this:
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$source = curl_exec($ch); // get source
curl_close($ch);
What are my options to speed things up?
How should I use curl_multi_init() etc.?
Reuse the same cURL handle ($ch) without calling curl_close after each request (see the sketch below). This will speed it up just a little bit.
Use curl_multi_init to run the processes in parallel. This can have a tremendous effect.
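A minimal sketch of the first suggestion, reusing one handle across requests ($urls stands in for your 100 pages):

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

foreach ($urls as $url) {
    curl_setopt($ch, CURLOPT_URL, $url); // point the same handle at the next page
    $source = curl_exec($ch);            // connection reuse saves the handshakes
    // ... process $source ...
}

curl_close($ch); // close once, at the end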
Use curl_multi - it is far better. Save the handshakes - they are not needed every time!
When I use the code given at http://php.net/curl_multi_init, the responses of the two requests conflict with each other.
But the code in the link below returns each response separately (in array format):
https://stackoverflow.com/a/21362749/3177302
Or take pcntl_fork and fork some new processes to execute curl_exec. But it's not as good as curl_multi.
Basically I need to get around max execution time.
I need to scrape pages for info at varying intervals, which means calling the bot at those intervals to load a link from the database and scrape the page the link points to.
The problem is loading the bot. If I load it with JavaScript (like an Ajax call), the browser will throw up an error saying that the page is taking too long to respond, yadda yadda yadda, plus I will have to keep the page open.
If I do it from within PHP, I could probably extend the execution time to however long is needed, but then if it does throw an error I don't have access to kill the process, and nothing is displayed in the browser until the PHP execution is completed, right?
I was wondering if anyone had any tricks to get around this: the scraper executing by itself at various intervals without me needing to watch it the whole time.
Cheers :)
Use set_time_limit() as such:
set_time_limit(0);
// Do Time Consuming Operations Here
"nothing is displayed in the browser until the PHP execute is completed"
You can use flush() to work around this:
flush()
(PHP 4, PHP 5)
Flushes the output buffers of PHP and whatever backend PHP is using (CGI, a web server, etc). This effectively tries to push all the output so far to the user's browser.
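A minimal sketch of that (the sleep() stands in for the real work):

while (ob_get_level() > 0) {
    ob_end_flush(); // disable PHP's own output buffering first
}

echo "Starting long job...\n";
flush();   // push this to the browser immediately

sleep(5);  // stand-in for the real work

echo "Done.\n";
flush();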
Take a look at how Sphider (a PHP search engine) does this.
Basically, you process some part of the sites you need, do your thing, and go on to the next request if there's a continue=true parameter set.
Run it via cron and split the spider into chunks, so it will only do a few chunks at once. Call it from cron with different parameters to process only a few chunks.
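A rough sketch of the chunked approach (the chunk size, the offset argument, and the crontab entries are illustrative, not Sphider's actual interface):

// e.g. crontab entries:
//   */10 * * * * php spider.php 0
//   */10 * * * * php spider.php 20
$offset = isset($argv[1]) ? (int) $argv[1] : 0;

$links = array(); // ... load the links from the database here ...

foreach (array_slice($links, $offset, 20) as $link) {
    // scrape $link and store the results
}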