I can't let file_get_contents run for more than 1 second. If it can't finish in that time, I need to skip to the next loop iteration.
for ($i = 0; $i <=59; ++$i) {
$f=file_get_contents('http://example.com');
// if it finished within 1 second: do something, then continue with the next iteration
// otherwise: abandon file_get_contents(), do something else, then continue
}
Is it possible to make a function like this?
Actually I'm using curl_multi and I can't figure out how to set a timeout on a WHOLE curl_multi request.
If you are working with http urls only you can do the following:
$ctx = stream_context_create(array(
'http' => array(
'timeout' => 1
)
));
for ($i = 0; $i <=59; $i++) {
file_get_contents("http://example.com/", 0, $ctx);
}
However, this is only the read timeout: the maximum time between two read operations (or the time before the first read operation). If data keeps arriving at a steady rate, no such gap occurs, and the whole download can still take an hour.
If the whole download must not take more than a second, file_get_contents() is no longer an option. I would recommend cURL in that case, like this:
for ($i = 0; $i <= 59; $i++) {
// create a fresh curl resource for each request
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "http://example.com/");
// limit the whole transfer to 1 second
curl_setopt($ch, CURLOPT_TIMEOUT, 1);
// return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string, or false on timeout or error
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
if ($output === false) {
// timed out or failed: skip to the next iteration
continue;
}
}
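For the curl_multi case mentioned in the question, one possible approach is to track a wall-clock deadline around the curl_multi_exec() loop and stop polling once it has passed. The following is only a sketch under that assumption: the helper name runBatchWithDeadline and its parameters are hypothetical, and transfers that have not finished by the deadline are simply reported as null.

```php
<?php
// Hypothetical helper: run a batch of URLs through curl_multi, but stop
// waiting once $deadline seconds have elapsed for the batch as a whole.
function runBatchWithDeadline(array $urls, float $deadline): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    $start = microtime(true);
    $active = 0;
    do {
        $status = curl_multi_exec($mh, $active);
        $remaining = $deadline - (microtime(true) - $start);
        if ($active > 0 && $remaining > 0) {
            // Wait for socket activity, but never longer than the time left.
            if (curl_multi_select($mh, min($remaining, 0.1)) === -1) {
                usleep(1000); // avoid spinning if select fails immediately
            }
        }
    } while ($active > 0 && $status === CURLM_OK && $remaining > 0);

    // Collect the transfers that completed successfully before the deadline.
    $done = [];
    while ($info = curl_multi_info_read($mh)) {
        if ($info['result'] === CURLE_OK) {
            $key = array_search($info['handle'], $handles, true);
            if ($key !== false) {
                $done[$key] = true;
            }
        }
    }

    $results = [];
    foreach ($handles as $key => $ch) {
        // null marks a transfer that failed or was still running at the deadline
        $results[$key] = isset($done[$key]) ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

Calling runBatchWithDeadline(array('http://example.com/'), 1.0) would then return either the body or null for each URL, and the caller can move on to the next iteration in both cases.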
I have a custom cURL function that has to download a huge number of images from a remote server. I was banned a couple of times when I used file_get_contents(). I found that curl_multi_init() is a better option, since with one connection it can download, for example, 20 images at once.
I made a custom function that uses curl_init(), and I am trying to figure out how to implement curl_multi_init() so that, in the loop where I grab the list of 20 URLs from the database, I can call my custom function and only call curl_close() on the last iteration. Currently my function opens a new connection for each URL in the loop. Here is the function:
function downloadUrlToFile($remoteurl,$newfileName){
$errors = 0;
$options = array(
CURLOPT_FILE => fopen('../images/products/'.$newfileName, 'w'),
CURLOPT_TIMEOUT => 28800,
CURLOPT_URL => $remoteurl,
CURLOPT_RETURNTRANSFER => 1
);
$ch = curl_init();
curl_setopt_array($ch, $options);
$imageString =curl_exec($ch);
$image = imagecreatefromstring($imageString);
if($imageString !== false AND !empty($imageString)){
if ($image !== false){
$width_orig = imagesx($image);
if($width_orig > 1000){
$saveimage = copy_and_resize_remote_image_product($image,$newfileName);
}else $saveimage = file_put_contents('../images/products/'.$newfileName,$imageString);
}else $errors++;
}else $errors++;
curl_close($ch);
return $errors;
}
There has to be a way to use curl_multi_init() and my function downloadUrlToFile because:
I need to change the file name on the fly
In my function I also check several things about the remote image. In the sample function I only check the width and resize if necessary, but the real function does much more (I cut that part for brevity, and I also use the function to pass more variables).
How should the code be changed so that the loop connects to the remote server only once?
Thanks in advance
Try this pattern for Multi CURL
$urls = array($url_1, $url_2, $url_3);
$content = array();
$ch = array();
$mh = curl_multi_init();
foreach( $urls as $index => $url ) {
$ch[$index] = curl_init();
curl_setopt($ch[$index], CURLOPT_URL, $url);
curl_setopt($ch[$index], CURLOPT_HEADER, 0);
curl_setopt($ch[$index], CURLOPT_RETURNTRANSFER, true);
curl_multi_add_handle($mh, $ch[$index]);
}
$active = null;
for(;;) {
curl_multi_exec($mh, $active);
if($active < 1){
// all downloads completed
break;
}else{
// sleep-wait for more data to arrive on socket.
// (without this, we would be wasting 100% cpu of 1 core while downloading,
// with this, we'll be using like 1-2% cpu of 1 core instead.)
curl_multi_select($mh, 1);
}
}
foreach ( $ch AS $index => $c ) {
$content[$index] = curl_multi_getcontent($c);
curl_multi_remove_handle($mh, $c);
//You can add some functions here and use $content[$index]
}
curl_multi_close($mh);
UPDATE: Setup tested and it works, but my web host cannot handle 600 emails in about 6 seconds. I had each connection wait 20 seconds and then send one mail, and those all went through.
I have a mailing list with 600+ emails
I have a function to send out the 600+ emails
Unfortunately, there is a limit on the execution time (90 seconds), so the script is shut down before it completes. I cannot change the limit with set_time_limit(0), as it is set by my web host (and not in an ini file that I can change either).
My solution is to make POST requests from a main file to a sub file that sends out chunks of 100 mails at a time. But will these requests be sent without delay, or will each wait for a response before the next one is sent?
The code:
for($i=0;$i<$mails;$i+=100) {
$url = 'http://www.bedsteforaeldreforasyl.dk/siteadmin/php/sender.php';
$myvars = 'start=' . $i . '&emne=' . $emne . '&besked=' . $besked;
$ch = curl_init( $url );
curl_setopt( $ch, CURLOPT_POST, 1);
curl_setopt( $ch, CURLOPT_POSTFIELDS, $myvars);
curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt( $ch, CURLOPT_HEADER, 0);
curl_setopt( $ch, CURLOPT_SAFE_UPLOAD, 0);
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt( $ch, CURLOPT_TIMEOUT, 1);
$response = curl_exec( $ch );
curl_close($ch);
}
$mails is the total number of recipients
$start is the start row number in the SQL statement
Will this (as I hope) start 6 parallel connections, or will it (as I fear) start 6 processes one after the other?
In the receiving script I have:
ignore_user_abort(true);
$q1 = "SELECT * FROM maillist ORDER BY navn LIMIT $start,100";
Create six php scripts, one for each 100 emails (or pass a value (e.g. 0-5) to a single script).
Create a main script to call these six sub-scripts.
Use stream_socket_client() to call the sub-scripts.
The six scripts will run simultaneously.
You can catch anything echoed back by the sub-scripts (e.g. status).
$timeout = 120;
$buffer_size = 8192;
$result = array();
$sockets = array();
$id = 0;
header('Content-Type: text/plain; charset=utf-8');
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail1.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail2.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail3.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail4.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail5.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail6.php");
$err = '';
foreach($urls as $id => $url){
$host = $url['host'];
$path = $url['path'];
$http = "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
$stream = stream_socket_client("$host:80", $errno,$errstr, 120,STREAM_CLIENT_ASYNC_CONNECT|STREAM_CLIENT_CONNECT);
if ($stream) {
$sockets[$id] = $stream; // supports multiple sockets, keyed like $urls
fwrite($stream, $http);
}
else {
$err .= "$id Failed<br>\n";
}
}
echo $err;
while (count($sockets)) {
$read = $sockets;
// stream_select() takes its arrays by reference, so pass real variables
$write = null;
$except = null;
stream_select($read, $write, $except, $timeout);
if (count($read)) {
foreach ($read as $r) {
$id = array_search($r, $sockets);
$data = fread($r, $buffer_size);
if (strlen($data) == 0) {
// echo "$id Closed: " . date('h:i:s') . "\n\n\n";
$closed[$id] = microtime(true);
fclose($r);
unset($sockets[$id]);
}
else {
// start from an empty string the first time data arrives for this socket
$result[$id] = ($result[$id] ?? '') . $data;
}
}
}
else {
// echo 'Timeout: ' . date('h:i:s') . "\n\n\n";
break;
}
}
var_export($result);
I'll provide some ideas on how the objective can be achieved.
First option: use the curl_multi_* suite of functions. It provides non-blocking cURL requests.
Second option: use an asynchronous library like amphp or ReactPHP. It would essentially provide the same benefit as curl_multi_*, IIRC.
Third option: use pcntl_fork() to create separate processes and distribute the job among them as worker nodes.
Fourth option: use the pthreads extension, which essentially provides a userland PHP implementation of true multi-threading.
I'll warn you though: the last two options should be a last resort, since parallel processing comes with some spooky situations that can prove really pesky ;-).
I'd also suggest that if you plan to scale this sort of application, the best course of action would be to use an external service.
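Whichever option is chosen, the main script first has to split the list into chunks and build one request per chunk. A minimal sketch of that step follows, assuming the field names start, emne and besked from the question; the helper name buildChunkBodies is hypothetical. It uses http_build_query() so that subjects and messages containing spaces or & are encoded correctly, which plain string concatenation does not do.

```php
<?php
// Hypothetical helper: build one URL-encoded POST body per chunk of recipients.
function buildChunkBodies(int $total, int $chunkSize, string $emne, string $besked): array
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('chunkSize must be at least 1');
    }
    $bodies = [];
    for ($start = 0; $start < $total; $start += $chunkSize) {
        // http_build_query() URL-encodes every value
        $bodies[] = http_build_query([
            'start'  => $start,
            'emne'   => $emne,
            'besked' => $besked,
        ], '', '&');
    }
    return $bodies;
}

$bodies = buildChunkBodies(600, 100, 'Hi there', 'a&b');
echo count($bodies), "\n"; // 6
echo $bodies[1], "\n";     // start=100&emne=Hi+there&besked=a%26b
```

Each body could then be set as CURLOPT_POSTFIELDS on its own handle and added to a curl_multi handle, so that the six requests are in flight at the same time.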
I'm trying to find a way to only quickly access a file and then disconnect immediately.
So I've decided to use cURL since it's the fastest option for me. But I can't figure out how I should "disconnect" cURL.
With the code below, Apache's access logs say that the file I tried accessing was indeed accessed, but I'm feeling a little iffy about this, because when I just run the while loop without breaking out of it, it just keeps looping. Shouldn't the loop stop when cURL has finished fetching the file? Or am I just being silly; is the loop just restarting constantly?
<?php
$Resource = curl_init();
curl_setopt($Resource, CURLOPT_URL, '...');
curl_setopt($Resource, CURLOPT_HEADER, 0);
curl_setopt($Resource, CURLOPT_USERAGENT, '...');
while(curl_exec($Resource)){
break;
}
curl_close($Resource);
?>
I tried setting the CURLOPT_CONNECTTIMEOUT_MS / CURLOPT_CONNECTTIMEOUT options to very small values, but it didn't help in this case.
Is there a more "proper" way of doing this?
This statement is superfluous:
while(curl_exec($Resource)){
break;
}
Instead just keep the return value for future reference:
$result = curl_exec($Resource);
The while loop does not help anything. So now to your question: You can tell curl that it should only take some bytes from the body and then quit. That can be achieved by reducing the CURLOPT_BUFFERSIZE to a small value and by using a callback function to tell curl it should stop:
$withCallback = array(
CURLOPT_BUFFERSIZE => 20, # ~ value of bytes you'd like to get
CURLOPT_WRITEFUNCTION => function($handle, $data) {
echo "WRITE: (", strlen($data), ") $data\n";
return 0;
},
);
$handle = curl_init("http://stackoverflow.com/");
curl_setopt_array($handle, $withCallback);
curl_exec($handle);
curl_close($handle);
Output:
WRITE: (10) <!DOCTYPE
Another alternative is to make a HEAD request by using CURLOPT_NOBODY which will never fetch the body. But it's not a GET request.
The connect timeout settings control how long cURL waits for the connection to be established. The connect phase lasts until the server accepts input from cURL and cURL starts to learn what the server does. It is not related to the phase in which cURL fetches data from the server; that is governed by:
CURLOPT_TIMEOUT: The maximum number of seconds to allow cURL functions to execute.
You can find a long list of available options in the PHP Manual: curl_setopt (Docs).
Perhaps that might be helpful?
$GLOBALS["dataread"] = 0;
define("MAX_DATA", 3000); // how many bytes should be read?
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.php.net/");
curl_setopt($ch, CURLOPT_WRITEFUNCTION, "handlewrite");
curl_exec($ch);
curl_close($ch);
function handlewrite($ch, $data)
{
$GLOBALS["dataread"] += strlen($data);
echo "READ " . strlen($data) . " bytes\n";
if ($GLOBALS["dataread"] > MAX_DATA) {
return 0;
}
return strlen($data);
}
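The same idea can be written without globals by building the write callback as a closure. This is only a sketch: the helper name makeLimitedWriter is hypothetical. It relies on the documented behaviour that a CURLOPT_WRITEFUNCTION callback aborts the transfer when it returns a number different from the length of the data it was given.

```php
<?php
// Hypothetical helper: returns a CURLOPT_WRITEFUNCTION callback that keeps
// at most $maxBytes in $buffer and aborts the transfer once the limit is hit.
function makeLimitedWriter(int $maxBytes, string &$buffer): callable
{
    return function ($ch, string $data) use ($maxBytes, &$buffer): int {
        $room = $maxBytes - strlen($buffer);
        if ($room <= 0) {
            return 0; // limit already reached: abort
        }
        $buffer .= substr($data, 0, $room);
        // Returning anything other than strlen($data) makes cURL stop.
        return strlen($data) <= $room ? strlen($data) : 0;
    };
}

$body = '';
$writer = makeLimitedWriter(3000, $body);
// Usage, assuming $ch is a cURL handle created elsewhere:
// curl_setopt($ch, CURLOPT_WRITEFUNCTION, $writer);
```

After curl_exec() returns, $body holds at most the first 3000 bytes of the response. curl_exec() itself reports a write error in that case, which is expected here.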
I am using cURL multi to get data from some websites. With code:
function getURL($ids)
{
global $mh;
$curl = array();
$response = array();
$n = count($ids);
for($i = 0; $i < $n; $i++) {
$id = $ids[$i];
$url = 'http://www.domain.com/?id='.$id;
// Init cURL
$curl[$i] = curl_init($url);
curl_setopt($curl[$i], CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl[$i], CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl[$i], CURLOPT_USERAGENT, 'Googlebot/2.1 (http://www.googlebot.com/bot.html)');
//curl_setopt($curl[$i], CURLOPT_FORBID_REUSE, true);
//curl_setopt($curl[$i], CURLOPT_HEADER, false);
curl_setopt($curl[$i], CURLOPT_HTTPHEADER, array(
'Connection: Keep-Alive',
'Keep-Alive: 300'
));
// Set to multi cURL
curl_multi_add_handle($mh, $curl[$i]);
}
// Execute
do {
curl_multi_exec($mh, $flag);
} while ($flag > 0);
// Get response
for($i = 0; $i < $n; $i++) {
// Get data
$id = $ids[$i];
$response[] = array(
'id' => $id,
'data' => curl_multi_getcontent($curl[$i])
);
// Remove handle
//curl_multi_remove_handle($mh, $curl[$i]);
}
// Response
return $response;
}
But my problem is that cURL opens too many sockets to the web server. For each connection, cURL creates a new socket to the web server.
I want the current connection to be kept alive and reused for the next request. I don't want cURL to create 100 sockets to handle 100 URLs :(
Please help me. Thanks so much!
So don't open that many sockets. Modify your code to only open X sockets, and then repeatedly use those sockets until all of your $ids have been consumed. That or pass fewer $ids into the function to begin with.
I know, this is old, but the correct answer has not been given, yet, IMHO.
Please have a look at the CURLMOPT_MAX_TOTAL_CONNECTIONS option, which should solve your problem:
https://curl.se/libcurl/c/CURLMOPT_MAX_TOTAL_CONNECTIONS.html
Also make sure that multiplexing via HTTP/2 is not accidentally disabled:
https://curl.se/libcurl/c/CURLMOPT_PIPELINING.html
Classical HTTP/1 pipelining is no longer supported by cURL, but cURL can still re-use an existing HTTP/1 connection to send a new request once the current request has finished on that connection.
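In PHP these options are set on the multi handle with curl_multi_setopt(). A minimal sketch, assuming a PHP build where the CURLMOPT_MAX_TOTAL_CONNECTIONS, CURLMOPT_MAX_HOST_CONNECTIONS and CURLPIPE_MULTIPLEX constants are available; the helper name createLimitedMulti and the limit of 5 are illustrative only:

```php
<?php
// Hypothetical helper: create a multi handle that opens at most
// $maxConnections sockets in total and per host.
function createLimitedMulti(int $maxConnections)
{
    $mh = curl_multi_init();
    // Cap the number of simultaneously open connections.
    if (!curl_multi_setopt($mh, CURLMOPT_MAX_TOTAL_CONNECTIONS, $maxConnections)) {
        throw new RuntimeException('CURLMOPT_MAX_TOTAL_CONNECTIONS is not supported');
    }
    curl_multi_setopt($mh, CURLMOPT_MAX_HOST_CONNECTIONS, $maxConnections);
    // Allow HTTP/2 multiplexing where the server supports it.
    curl_multi_setopt($mh, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);
    return $mh;
}

$mh = createLimitedMulti(5);
```

Handles added to $mh beyond the limit should then wait for a free connection instead of each opening a new socket.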
[Updated At Bottom]
Hi everyone.
Start With Short URLs:
Imagine that you've got a collection of 5 short urls (like http://bit.ly) in a php array, like this:
$shortUrlArray = array("http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123",
"http://bit.ly/123");
End with Final, Redirected URLs:
How can I get the final url of these short urls with php? Like this:
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
http://www.example.com/some-directory/some-page.html
I have one method (found online) that works well with a single url, but when looping over multiple urls, it only works with the final url in the array. For your reference, the method is this:
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => true, // return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
//$header['errno'] = $err;
//$header['errmsg'] = $errmsg;
//$header['content'] = $content;
print($header[0]);
return $header;
}
//Using the above method in a for loop
$finalURLs = array();
$lineCount = count($shortUrlArray);
for($i = 0; $i <= $lineCount; $i++){
$singleShortURL = $shortUrlArray[$i];
$myUrlInfo = get_web_page( $singleShortURL );
$rawURL = $myUrlInfo["url"];
array_push($finalURLs, $rawURL);
}
Close, but not enough
This method works, but only with a single url. I can't use it in a for loop, which is what I want to do. When used in a for loop as in the above example, the first four elements come back unchanged, and only the final element is converted into its final url. This happens whether the array is 5 elements or 500 elements long.
Solution Sought:
Please give me a hint as to how you'd modify this method to work when used inside of a for loop with collection of urls (Rather than just one).
-OR-
If you know of code that is better suited for this task, please include it in your answer.
Thanks in advance.
Update:
After some further prodding I've found that the problem lies not in the above method (which, after all, seems to work fine in for loops) but possibly in encoding. When I hard-code an array of short urls, the loop works fine. But when I pass in a block of newline-separated urls from an html form using GET or POST, the above-mentioned problem ensues. Are the urls somehow being changed into a format not compatible with the method when I submit the form?
New Update:
You guys, I've found that my problem was due to something unrelated to the above method. The URL encoding of my short urls converted what I thought were just newline characters (separating the urls) into %0D%0A, which is a carriage return followed by a line feed. As a result, every short url except the final one in the collection had a "ghost" character appended to its tail, making it impossible to retrieve the final urls for those. I identified the ghost character, corrected my PHP explode, and all works fine now. Sorry and thanks.
This may be of some help: How to put string in array, split by new line?
You would probably do something like this, assuming you're getting the URLs returned in POST:
$final_urls = array();
$short_urls = explode( chr(10), $_POST['short_urls'] ); //You can replace chr(10) with "\n" or "\r\n", depending on how you get your urls. And of course, change $_POST['short_urls'] to the source of your string.
foreach ( $short_urls as $short ) {
$final_urls[] = get_web_page( $short );
}
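Since form input may arrive with \r\n, \n or \r line endings, a splitter that accepts all of them and trims each line avoids the stray carriage return described in the question. A minimal sketch; the helper name splitUrls is hypothetical:

```php
<?php
// Hypothetical helper: split a block of text into non-empty, trimmed lines,
// regardless of whether the lines end in \r\n, \n or \r.
function splitUrls(string $raw): array
{
    $lines = preg_split('/\R/', $raw); // \R matches any newline sequence
    $lines = array_map('trim', $lines);
    // Drop empty lines and reindex from 0.
    return array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
}

print_r(splitUrls("http://bit.ly/123\r\nhttp://bit.ly/123\n\nhttp://bit.ly/123\r"));
```

The result can be passed to the foreach loop in place of the explode() call.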
I get the following output, using var_dump($final_urls); and your bit.ly url:
http://codepad.org/8YhqlCo1
And my source: $_POST['short_urls'] = "http://bit.ly/123\nhttp://bit.ly/123\nhttp://bit.ly/123\nhttp://bit.ly/123";
I also got an error, using your function: Notice: Undefined offset: 0 in /var/www/test.php on line 27 Line 27: print($header[0]); I'm not sure what you wanted there...
Here's my test.php, if it will help: http://codepad.org/zI2wAOWL
I think you almost have it. Try this:
$shortUrlArray = array("http://yhoo.it/2deaFR",
"http://bit.ly/900913",
"http://bit.ly/4m1AUx");
$finalURLs = array();
$lineCount = count($shortUrlArray);
for($i = 0; $i < $lineCount; $i++){
$singleShortURL = $shortUrlArray[$i];
$myUrlInfo = get_web_page( $singleShortURL );
$rawURL = $myUrlInfo["url"];
echo $rawURL."\n"; // echo rather than printf, since a URL may contain % characters
array_push($finalURLs, $rawURL);
}
I implemented this to read a plain text file with one shortened URL per line and get the corresponding redirect URL for each:
<?php
// input: textfile with one bitly shortened url per line
$plain_urls = file_get_contents('in.txt');
$bitly_urls = explode("\r\n", $plain_urls);
// output: where should we write
$w_out = fopen("out.csv", "a+") or die("Unable to open file!");
foreach($bitly_urls as $bitly_url) {
$c = curl_init($bitly_url);
curl_setopt($c, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36');
curl_setopt($c, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($c, CURLOPT_HEADER, 1);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 20);
// curl_setopt($c, CURLOPT_PROXY, 'localhost:9150');
// curl_setopt($c, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
$r = curl_exec($c);
// get the redirect url:
$redirect_url = curl_getinfo($c)['redirect_url'];
// free the handle before moving on to the next url
curl_close($c);
// write output as csv
$out = '"'.$bitly_url.'";"'.$redirect_url.'"'."\n";
fwrite($w_out, $out);
}
fclose($w_out);
Have fun and enjoy!
pw