I'm using cURL to scrape two websites, both with the same PHP script (which is run every 30 minutes by a cron job). The request is very simple:
//website 1
$ch = curl_init();
$url = 'url';
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
//website 2
$ch2 = curl_init();
$url2 = 'url';
curl_setopt($ch2, CURLOPT_URL, $url2);
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, true);
$result2 = curl_exec($ch2);
curl_close($ch2);
My question is: what is the best practice in cases like this to prevent running out of memory (it hasn't happened yet, but who knows) and to maximize execution speed?
Is there a way to free memory after each cURL request?
Thank you! :D
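One common approach is to run both requests in parallel with curl_multi and free each response as soon as you have processed it; a minimal sketch (the URLs are placeholders, as in your code):

$urls = array('url', 'url2'); // placeholders for the two real addresses
$mh = curl_multi_init();
$handles = array();

foreach ($urls as $u) {
    $ch = curl_init($u);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30); // don't let one slow site hang the cron job
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Drive both transfers until they finish.
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh); // wait for activity instead of busy-looping
    }
} while ($running && $status == CURLM_OK);

foreach ($handles as $ch) {
    $result = curl_multi_getcontent($ch);
    // ... process $result here ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
    unset($result); // release the response body once it has been processed
}
curl_multi_close($mh);

For two requests every 30 minutes, memory is unlikely to become a problem either way; curl_close() plus unset() on the result releases what each request held.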
I'm doing a simple cURL request to this address: https://github.com/users/davidhariri/contributions_calendar_data
When I grab the result with this function:
function fetch_data($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $result = curl_exec($ch);
    curl_close($ch);
    print_r($result);
    return $result;
}
The strings are correct, but the ints (the contributions) are wrong.
Results from curl
[...["2014/01/04",0],["2014/01/05",0],["2014/01/06",0],["2014/01/07",1],["2014/01/08",0]]
Results from just navigating to the address
[...["2014/01/04",0],["2014/01/05",0],["2014/01/06",1],["2014/01/07",5],["2014/01/08",5]]
Could something during the cURL process be transforming the ints to binary and back again? I have no idea what's happening here.
Check that you are not logged in to the site in the browser; you may get different results if so.
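If you want to double-check exactly what cURL sees (i.e. the logged-out data), decode the JSON and print the per-day pairs; a minimal sketch reusing the fetch_data() function above:

$url = 'https://github.com/users/davidhariri/contributions_calendar_data';
$days = json_decode(fetch_data($url), true);
// Each entry is a [date, count] pair, e.g. ["2014/01/07", 1]
foreach ($days as $day) {
    echo $day[0] . ' => ' . $day[1] . "\n";
}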
I have a cURL script that calls Rotten Tomatoes. Every time I run it, even in a for loop from 1 to 10, it runs forever; the only way to stop it is to restart the server, and the page keeps calling the Rotten Tomatoes site until the server goes down. The cURL script works for other APIs, so it should work for this one. Here it is, any ideas?
($temp_movie gets its value elsewhere and works properly.)
$ch = curl_init();
$api_link = "http://api.rottentomatoes.com/api/public/v1.0/movies.json?apikey=****&q=".$temp_movie."&page_limit=1";
echo $api_link."<br>";
curl_setopt($ch, CURLOPT_URL, $api_link);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 3);
$content = trim(curl_exec($ch));
curl_close($ch);
$rottentomatoes = json_decode($content, true);
I have no idea why this worked, but as I said, the cURL script worked for other APIs, so I copied and pasted the same cURL code (again) and tried again. For some reason this works. Is there any difference that I'm just not seeing?
$ch = curl_init();
$api_link = "http://api.rottentomatoes.com/api/public/v1.0/movies.json?apikey=****&q=".$temp_movie."&page_limit=1";
curl_setopt($ch, CURLOPT_URL, $api_link);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 3);
$content = trim(curl_exec($ch));
curl_close($ch);
$rottentomatoes = json_decode($content, true);
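For the record, the two blocks are character-for-character identical apart from the echo line, so nothing in the code itself explains the change. One thing worth ruling out (an assumption on my part, since $temp_movie is set outside these snippets): URL-encode it before building the query string, so titles with spaces or special characters cannot produce a malformed request:

// Hypothetical fix: encode the search term before interpolating it.
$api_link = 'http://api.rottentomatoes.com/api/public/v1.0/movies.json'
    . '?apikey=****' // key elided as in the question
    . '&q=' . urlencode($temp_movie)
    . '&page_limit=1';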
I can't work out why this URL is not being found by cURL; the request is simply served a 400 error page.
My code is very simple and works fantastically with non-dynamic URLs.
I am hoping it's something easy to spot, for example, a missing CURL option.
I have tried using $url = urlencode($url) but that didn't work either.
Here's the code:
$url = 'http://www.destinations-uk.com/accommodations.php?link=accommodations&country=england&category=Reviews&id=1';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
$r = curl_exec($ch);
curl_close($ch); // close the handle once the response is in
$r = explode("\r\n", $r); // HTTP headers are separated by CRLF, not just LF
$keys = array();
if (!empty($r)) $keys[] = array_shift($r); // keep the status line
foreach ($r as $line) {
    preg_match('/.+:\s/', $line, $match);
    if ($match) $keys[substr($match[0], 0, -2)] = preg_replace('/.+:\s/', '', $line);
}
print_r($keys);
Perhaps this is something done on the server side to prevent automated requests.
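If so, making the request look more like a browser sometimes helps; a sketch to try before the curl_exec() call (the header values are just examples):

// Hypothetical browser-like headers; adjust as needed.
curl_setopt($ch, CURLOPT_USERAGENT,
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept: text/html,application/xhtml+xml',
    'Accept-Language: en-GB,en;q=0.9',
));
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow any redirect instead of stopping at it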
I fetch files by their URLs with this code:
file_get_contents($_POST['url']);
Then I do something with them.
But I don't want to operate on big files. How do I limit the size of the received file?
It should throw an error if the file is bigger than 500 KB.
See my answer to this question. You need the cURL extension, with which you can make an HTTP HEAD request to the remote server. The response will tell you how big the file is, and you can then decide accordingly.
You are interested specifically in this line:
$size = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
Agreed with @Jon:
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true); // HEAD request: headers only, no body
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_URL, $url); // specify the url
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$head = curl_exec($ch);
$size = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
curl_close($ch);
// -1 means the server did not send a Content-Length header
if ($size >= 0 && $size <= 500 * 1024) {
    $data = file_get_contents($url);
} else {
    throw new Exception('File is bigger than 500 KB (or size unknown)');
}
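Since CURLINFO_CONTENT_LENGTH_DOWNLOAD returns -1 whenever the server omits the Content-Length header, the HEAD check alone is not bulletproof. As a fallback you can cap the read itself with the $length parameter of file_get_contents() (a sketch, using the 500 KB limit from the question):

$limit = 500 * 1024; // 500 KB
// Read at most $limit + 1 bytes; the extra byte reveals an oversized file.
$data = file_get_contents($url, false, null, 0, $limit + 1);
if ($data === false) {
    throw new Exception('Download failed');
}
if (strlen($data) > $limit) {
    throw new Exception('File is bigger than 500 KB');
}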
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $PathUrl);
curl_setopt($ch, CURLOPT_USERPWD, 'someuser:somepass');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
Any ideas on why it works about 30% of the time and fails the other 70%? Viewing the URL in any browser works every time.
You may be better off setting the Authorization header explicitly via CURLOPT_HTTPHEADER. Note that CURLOPT_HTTPHEADER takes an array of complete header lines (not key => value pairs), and Basic auth credentials must be base64-encoded, e.g.:
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: Basic ' . base64_encode('user:pass')));
Edit: also, this may not apply because you say it works 30% of the time, but just be aware of the common forms of encoding for auth headers, e.g. base64.
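Putting that together, a minimal sketch that forces Basic auth up front instead of waiting for a 401 challenge (assuming the endpoint does expect Basic; credentials and $PathUrl as in the question):

$ch = curl_init($PathUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC); // send credentials on the first request
curl_setopt($ch, CURLOPT_USERPWD, 'someuser:somepass');
// Equivalent manual form:
// curl_setopt($ch, CURLOPT_HTTPHEADER, array(
//     'Authorization: Basic ' . base64_encode('someuser:somepass'),
// ));
$data = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);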