I have a script that reads links from a some.txt file and checks whether each page contains a backlink to my website. The problem is that it is very slow, and I want to increase its speed. Is there any way to do that?
<?php
ini_set('max_execution_time', 3000);
$source = file_get_contents("your-backlinks.txt");
$needle = "http://www.submitage.com"; // the backlink URL to look for
$new = explode("\n", $source);
$found = array();
$notfound = array();
foreach ($new as $check) {
    $a = file_get_contents(trim($check));
    // strpos() returns 0 for a match at the very start of the page,
    // so compare strictly against false
    if (strpos($a, $needle) !== false) {
        $found[] = $check;
    } else {
        $notfound[] = $check;
    }
}
echo "Matches that were found: \n " . implode("\n", $found) . "\n";
echo "Matches that were not found \n" . implode("\n", $notfound);
?>
Your biggest bottleneck is the fact that you are executing the HTTP requests in sequence, not in parallel. curl is able to perform multiple requests in parallel. Here's an example from the documentation, heavily adapted to use a loop and actually collect the results. I cannot promise it's correct, I only promise I've followed the documentation correctly:
$mh = curl_multi_init();
$handles = array();
foreach ($new as $check) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, trim($check));
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // needed for curl_multi_getcontent()
    curl_multi_add_handle($mh, $ch);
    $handles[$check] = $ch;
}
// verbatim from the demo
$active = null;
// execute the handles
do {
    $mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
while ($active && $mrc == CURLM_OK) {
    if (curl_multi_select($mh) != -1) {
        do {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }
}
// end of verbatim code
foreach ($handles as $check => $ch) {
    $a = curl_multi_getcontent($ch);
    // ... strpos() check as in the original script ...
}
You won't be able to squeeze any more speed out of the operation by optimizing the PHP, except maybe some faux-multithreading solution.
However, you could create a queue system that would allow you to run the check as a background task. Instead of checking the URLs as you iterate through them, add them to the queue instead. Then write a cron script that grabs unchecked URLs from the queue one by one, checks if they contain a reference to your domain and saves the result.
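One way the cron half of that could look (a minimal sketch, assuming a SQLite table named queue; the schema, file name, and batch size are my own inventions, not part of the original setup):

```php
<?php
// Hypothetical cron worker: pull unchecked URLs from a SQLite queue,
// fetch each one, and record whether it contains the backlink.
$needle = "http://www.submitage.com";
$db = new PDO("sqlite:" . __DIR__ . "/queue.sqlite");
$db->exec("CREATE TABLE IF NOT EXISTS queue (
    url TEXT PRIMARY KEY,
    checked INTEGER NOT NULL DEFAULT 0,
    found INTEGER
)");

// Process a small batch per cron run so no single run takes too long.
$urls = $db->query("SELECT url FROM queue WHERE checked = 0 LIMIT 20")
           ->fetchAll(PDO::FETCH_COLUMN);
$update = $db->prepare("UPDATE queue SET checked = 1, found = ? WHERE url = ?");

foreach ($urls as $url) {
    $body = @file_get_contents($url);
    $found = ($body !== false && strpos($body, $needle) !== false) ? 1 : 0;
    $update->execute(array($found, $url));
}
```

The main script then only needs to `INSERT OR IGNORE` each URL into the queue, and a separate report page can read the found column whenever you like.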
Related
I'm using curl_multi to process multiple API requests in parallel.
However, I've noticed there is a lot of fluctuation in the time it takes to complete the requests.
Is this related to the speed of the APIs themselves, or the timeout I set on curl_multi_select? Right now it is 0.05. Should it be less? How can I know this process is finishing the requests as fast as possible without wasted time in between checks to see if they're done?
<?php
// Build the multi-curl handle, adding each curl handle
$handles = array(/* many curl handles */);
$mh = curl_multi_init();
foreach ($handles as $curl) {
    curl_multi_add_handle($mh, $curl);
}
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh, 0.05); // Should this value be less than 0.05?
} while ($running > 0);
// Remove and close the handles
foreach ($handles as $curl) {
    curl_multi_remove_handle($mh, $curl);
}
curl_multi_close($mh);
?>
The current implementation of curl_multi_select() in PHP doesn't block and doesn't respect the timeout parameter; maybe that will be fixed later. The proper way of waiting is not implemented in your code: it has to be two loops. I'll post some tested code from my bot as an example:
$running = 1;
while ($running)
{
# execute request
if ($a = curl_multi_exec($this->murl, $running)) {
throw BotError::text("curl_multi_exec[$a]: ".curl_multi_strerror($a));
}
# check finished
if (!$running) {
break;
}
# wait for activity
while (!$a)
{
if (($a = curl_multi_select($this->murl, $wait)) < 0)
{
throw BotError::text(
($a = curl_multi_errno($this->murl))
? "curl_multi_select[$a]: ".curl_multi_strerror($a)
: 'system select failed'
);
}
usleep($wait * 1000000);# wait for some time <1sec
}
}
Doing
$running = null;
for (;;) {
    curl_multi_exec($mh, $running);
    if ($running < 1) {
        break;
    }
    curl_multi_select($mh, 1);
}
should be better; that way you avoid a useless select() when nothing is running.
I have a list of domains which I want to check to see whether they are active or not. I can check each one separately, but I'm having a hard time getting the batch version to work.
$c200 = array();
$c301 = array();
$c302 = array();
$urls = array();
foreach (new SplFileObject("oList.txt") as $line) {
    $urls[] = $line;
}
//print_r($urls);
$mh = curl_multi_init();
foreach ($urls as $key => $value) {
    $ch[$key] = curl_init($value);
    curl_setopt($ch[$key], CURLOPT_HEADER, true);
    curl_setopt($ch[$key], CURLOPT_TIMEOUT, 10);
    curl_multi_add_handle($mh, $ch[$key]);
}
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
foreach (array_keys($ch) as $key) {
    echo curl_getinfo($ch[$key], CURLINFO_HTTP_CODE);
    echo "\n";
    curl_multi_remove_handle($mh, $ch[$key]);
}
curl_multi_close($mh);
I wrote the above code but it gives me zeros as output.
any help would be appreciated.
curl_errno() does not return the resulting code when it's used inside curl_multi.
It seems to be undocumented, but if an error occurs inside curl_multi, the handles will not carry the resulting error code until curl_multi_info_read() is called. There is a corresponding bug/documentation request: https://bugs.php.net/bug.php?id=79318&thanks=4
Original answer
Usually when I encounter a 0 response code, it turns out to be a local issue (DNS, network, SSL, URL, ...).
To dig further, you can check whether curl had an error during execution: curl_errno() returns a curl error number, and curl_error() returns a descriptive error string.
The error code and error message will likely be among those listed here: https://curl.haxx.se/libcurl/c/libcurl-errors.html
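For a single handle, the check looks like this (a sketch; the host name is deliberately bogus so the error path is taken):

```php
<?php
// Sketch: after curl_exec() fails, curl_errno()/curl_error() tell you why.
// The host name is intentionally unresolvable for demonstration.
$ch = curl_init("http://no-such-host.invalid/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
$body = curl_exec($ch);
$errno = curl_errno($ch);
if ($body === false) {
    // e.g. error 6: "Could not resolve host: no-such-host.invalid"
    echo "curl error " . $errno . ": " . curl_error($ch) . "\n";
}
curl_close($ch);
```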
EDIT #2
If you work with curl_multi, you need to call curl_multi_info_read() to get the resulting codes. Below is an example of how you can fetch the relevant result entries.
// your code...
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
while ($result = curl_multi_info_read($mh)) {
    // 'result' is the per-transfer CURLcode; CURLE_OK means success
    if ($result['result'] == CURLE_OK) {
        echo 'Success: ' . curl_getinfo($result['handle'], CURLINFO_HTTP_CODE) . "\n";
    } else {
        echo 'Error: ' . curl_strerror($result['result']) . "\n";
    }
}
A real test now outputs the following:
$ php test.php
Error: Couldn't resolve host name
Success: 200
Success: 200
EDIT #3
Additionally, it seems that simply calling curl_multi_info_read($mh) also does the trick: it internally populates the error information into your existing handles/resources.
In my opinion this is a bit misleading. I will file a bug/documentation report with PHP, as I can't find anything about it. I only stumbled upon it while checking how Guzzle does its low-level implementation.
// your code...
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
while ($result = curl_multi_info_read($mh)) {}
foreach ($ch as $handle) {
    echo "Handle: " . curl_errno($handle) . PHP_EOL;
}
Here's a typical multi-curl request example for PHP:
$mh = curl_multi_init();
foreach ($urls as $index => $url) {
    $curly[$index] = curl_init($url);
    curl_setopt($curly[$index], CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $curly[$index]);
}
// execute the handles
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
// get content and remove handles
$result = array();
foreach ($curly as $index => $c) {
    $result[$index] = curl_multi_getcontent($c);
    curl_multi_remove_handle($mh, $c);
}
// all done
curl_multi_close($mh);
The process involves 3 steps:
1. Preparing data
2. Sending requests and waiting until they're finished
3. Collecting responses
I'd like to split step #2 into two parts: first send all the requests, then do some useful job instead of just waiting (for example, processing the responses of the previous group of requests), and only then collect all the responses.
So, how can I split this part of code
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
into separate parts?
Send all the requests
Do some another job while waiting
Retrieve all the responses
I tried like this:
// Send the requests
$running = null;
curl_multi_exec($mh, $running);
// Do some job while waiting
// ...
// Get all the responses
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
but it doesn't seem to work properly.
I found the solution. Here's how to launch all the requests without waiting for the responses:
do {
    curl_multi_exec($mh, $running);
} while (curl_multi_select($mh) === -1);
Then we can do any other jobs and collect the responses any time later.
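Put together, the split version might look something like this (a sketch; it uses local file:// URLs and a placeholder do_other_work() so the example is self-contained and doesn't need a network):

```php
<?php
// Sketch of the split pattern: start transfers, do other work, then drain.
// do_other_work() is a stand-in for your real workload.
function do_other_work() {
    // ... process the previous batch, write logs, etc. ...
}

// Create two small local files to "download" via file:// URLs.
$urls = array();
foreach (array("a", "b") as $name) {
    $path = sys_get_temp_dir() . "/multi_demo_$name.txt";
    file_put_contents($path, "contents of $name\n");
    $urls[] = "file://" . $path;
}

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// 1. Kick the transfers off without blocking.
$running = null;
curl_multi_exec($mh, $running);

// 2. Do some other job in the meantime.
do_other_work();

// 3. Drain the remaining transfers, waiting briefly between iterations.
do {
    curl_multi_exec($mh, $running);
    if ($running > 0) {
        curl_multi_select($mh, 0.1);
    }
} while ($running > 0);

$responses = array();
foreach ($handles as $url => $ch) {
    $responses[$url] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
}
curl_multi_close($mh);
```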
When I run the below code it seems to me curl_multi_select and curl_multi_info_read are contradicting each other. As I understand it curl_multi_select is supposed to be blocking until curl_multi_exec has a response but I haven't seen that actually happen.
$url = "http://google.com";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
$mc = curl_multi_init();
curl_multi_add_handle($mc, $ch);
do {
    $exec = curl_multi_exec($mc, $running);
} while ($exec == CURLM_CALL_MULTI_PERFORM);
$ready = curl_multi_select($mc, 100);
var_dump($ready);
$info = curl_multi_info_read($mc, $msgs);
var_dump($info);
this returns
int 1
boolean false
which seems to contradict itself. How can it be ready and not have any messages?
The PHP version I'm using is 5.3.9.
Basically curl_multi_select blocks until there is something to read or send with curl_multi_exec. If you loop around curl_multi_exec without using curl_multi_select this will eat up 100% of a CPU core.
So curl_multi_info_read is used to check if any transfer has ended (correctly or with an error).
Code using the multi handle should follow the following pattern:
do
{
    $mrc = curl_multi_exec($this->mh, $active);
}
while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK)
{
    curl_multi_select($this->mh);
    do
    {
        $mrc = curl_multi_exec($this->mh, $active);
    }
    while ($mrc == CURLM_CALL_MULTI_PERFORM);

    while ($info = curl_multi_info_read($this->mh))
    {
        $this->process_ch($info);
    }
}
See also: Doing curl_multi_exec the right way.
From the spec:
Ask the multi handle if there are any messages or information from the individual transfers. Messages may include information such as an error code from the transfer or just the fact that a transfer is completed.
The 1 could mean there is activity, but not necessarily a message waiting: in this case, probably some of your download data is available, but not all of it. The example in the curl_multi_select documentation explicitly tests for false return values from curl_multi_info_read.
I am looking for a cURL function that can open a particular number of webpages at a time; it would be even better if there were no output (return data disabled). I need to access 5-10 URLs at the same time. I have heard about cURL multi threading, but I don't have a proper function or class to use for it.
I found some by searching, but most of them seem to loop, i.e. they don't make simultaneous connections, just one after another. I want something that can make multiple connections at a time, not one by one!
I made one :
function mutload($url) {
    if (!is_array($url)) {
        exit;
    }
    for ($i = 0; $i < count($url); $i++) {
        // create one cURL resource per URL
        $ch[$i] = curl_init();
        // set URL and other appropriate options
        curl_setopt($ch[$i], CURLOPT_URL, $url[$i]);
        curl_setopt($ch[$i], CURLOPT_HEADER, 0);
        curl_setopt($ch[$i], CURLOPT_RETURNTRANSFER, 0);
    }
    // create the multiple cURL handle
    $mh = curl_multi_init();
    for ($i = 0; $i < count($url); $i++) {
        // add the handles
        curl_multi_add_handle($mh, $ch[$i]);
    }
    $active = null;
    // execute the handles
    do {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    while ($active && $mrc == CURLM_OK) {
        if (curl_multi_select($mh) != -1) {
            do {
                $mrc = curl_multi_exec($mh, $active);
            } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        }
    }
    // close the handles
    for ($i = 0; $i < count($url); $i++) {
        curl_multi_remove_handle($mh, $ch[$i]);
    }
    curl_multi_close($mh);
}
OK! But I'm confused: will it connect to all the URLs at a time, or one by one? Moreover, I'm still getting the content; I only want to connect to or request the site, I don't need any content from it. I set CURLOPT_RETURNTRANSFER to false but that didn't work. Please help me, thanks!
You're looking for the curl_multi_* family of functions. Have a look at curl_multi_exec.
Set CURLOPT_NOBODY to prevent curl from downloading any content.
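A sketch of how that could look combined with curl_multi (the URLs here are placeholders, not from your list):

```php
<?php
// Sketch: CURLOPT_NOBODY makes curl issue a HEAD request, so the status
// code comes back without transferring any body; with curl_multi all
// the checks run in parallel.
$urls = array("http://example.com/", "http://example.org/");

$mh = curl_multi_init();
$ch = array();
foreach ($urls as $i => $url) {
    $ch[$i] = curl_init($url);
    curl_setopt($ch[$i], CURLOPT_NOBODY, true);         // HEAD request, no body
    curl_setopt($ch[$i], CURLOPT_RETURNTRANSFER, true); // don't echo anything
    curl_setopt($ch[$i], CURLOPT_TIMEOUT, 10);
    curl_multi_add_handle($mh, $ch[$i]);
}

$running = null;
do {
    curl_multi_exec($mh, $running);
    if ($running > 0) {
        curl_multi_select($mh);
    }
} while ($running > 0);

$codes = array();
foreach ($ch as $i => $h) {
    $codes[$urls[$i]] = curl_getinfo($h, CURLINFO_HTTP_CODE); // 0 on failure
    curl_multi_remove_handle($mh, $h);
}
curl_multi_close($mh);
```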
I didn't test your code, but curl_multi adds items to a queue from a loop and processes them in parallel. Sometimes there can be issues if you are trying to load hundreds of URLs, but it should be fine for a few. If you have long DNS lookups or slow servers, all your results will have to wait for the slowest request.
This code is tested and should work, it is somewhat similar to yours:
http://www.onlineaspect.com/2009/01/26/how-to-use-curl_multi-without-blocking/