Facebook graph extremely slow in PHP - php

Whether I use the Facebook PHP SDK or just load the data with $contents = file_get_contents("https://graph.facebook.com/$id?access_token=$accessToken"), the response takes around a whole second to arrive.
That counts as very slow when I need to check the data for a bunch of ids.
If I type a Facebook Graph URL into a browser, I get the results almost instantly, in under a tenth of the time it takes in PHP.
What is causing this problem, and how can I make it as fast as it is in a browser? I know the browser can do it, so there has to be a way to make it fast in PHP too.
IDEA: perhaps I need to configure something in cURL?
What I have tried:
Using the PHP SDK. It's just as slow. The reason I tried file_get_contents() in the first place was that I was hoping the PHP SDK simply wasn't configured properly.
Using curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);. It didn't make a difference. AFTER ANSWER ACCEPT EDIT: actually, this together with reusing the curl handle made the subsequent requests really fast.
EDIT: here is a pastebin of the code I used to measure the time it takes to do the requests: http://pastebin.com/bEbuqq5g.
I corrected the text that used to say microseconds to seconds; this is what produces results similar to the ones I wrote in my comment on this question: Facebook graph extremely slow in PHP. Note also that the requests are similarly slow even when the access token is expired, as in my pastebin example.
EDIT 2: part of the problem seems to be SSL. I benchmarked http://graph.facebook.com/4 (no httpS), and three requests took 1.2 seconds, whereas the same requests over https took 2.2 seconds. This is in no way a solution though, because any request that needs an access token must use https.

file_get_contents can be very slow in PHP because it doesn't send/process headers properly, leading to the HTTP connection not getting closed properly when the file transfer is complete. I have also read about DNS issues, though I don't have any information about that.
The solution that I highly recommend is to either use the PHP SDK, which is designed for making API calls to Facebook, or make use of cURL (which the SDK uses). With cURL you can really configure a lot of aspects of the request, since it's basically designed for making API calls like this.
PHP SDK information: https://developers.facebook.com/docs/reference/php/
PHP SDK source: https://github.com/facebook/facebook-php-sdk
If you choose to do it without the SDK, you could look at how base_facebook.php makes use of cURL. Here is some sample code you could use to fetch a URL with cURL:
function get_url($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, FALSE);         // Return contents only, no headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);  // Return results instead of outputting them
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);    // Give up after trying to connect for 10 seconds
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);           // Only execute for 60s at most
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // Don't verify the SSL cert
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}
$contents = get_url("https://graph.facebook.com/$id?access_token=$accessToken");
The function will return FALSE on failure.
I see that you said you've used the PHP SDK, but maybe you didn't have cURL set up. Try installing or updating it, and if it still seems to be slow, you should use
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
and check out the output.

I wondered what would happen if I made two consecutive curl_exec() calls on the same handle without calling curl_close() in between, enabling HTTP keep-alive.
The test code:
$ch = curl_init('https://graph.facebook.com/xxx');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// FIRST REQUEST
curl_exec($ch);
print_r(curl_getinfo($ch));
// SECOND REQUEST
curl_exec($ch);
print_r(curl_getinfo($ch));
curl_close($ch);
Below are the results, showing parts of the output from curl_getinfo():
// FIRST REQUEST
[total_time] => 0.976259
[namelookup_time] => 0.008271
[connect_time] => 0.208543
[pretransfer_time] => 0.715296
// SECOND REQUEST
[total_time] => 0.253083
[namelookup_time] => 3.7E-5
[connect_time] => 3.7E-5
[pretransfer_time] => 3.9E-5
The first request is pretty slow, almost one whole second, similar to your experience. But from the time of the second request (only 0.25s) you can see how much difference the keep-alive made.
Your browser uses this technique as well, of course; loading the page in a fresh browser instance would take considerably longer.
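Building on that, here is a minimal sketch of reusing one handle for a batch of IDs (the $ids array and $accessToken value are placeholders for illustration):
$ids = array('4', '5', '6');        // Placeholder object IDs
$accessToken = 'YOUR_ACCESS_TOKEN'; // Placeholder token

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$results = array();
foreach ($ids as $id) {
    // Only the URL changes between iterations; the handle (and its TLS
    // connection) is reused, so only the first request pays the handshake cost.
    curl_setopt($ch, CURLOPT_URL, "https://graph.facebook.com/$id?access_token=$accessToken");
    $results[$id] = curl_exec($ch);
}
curl_close($ch);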

Just two thoughts:
Have you verified that the browser doesn't have a persistent connection to Facebook, and that it hasn't cached the DNS lookup? (You could try adding graph.facebook.com to your hosts file to rule DNS in or out.)
Are you running the PHP code on the same system/environment as your browser (not in a VM, not on another host)? And is PHP running with the same scheduling priorities as your browser (same nice level, etc.)?

The overall biggest factor making Graph API calls "slow" is the HTTP connection itself.
Maybe there is a little improvement to be had by tweaking some parameters or getting a server with a better connection.
But this will most likely make no big difference, as HTTP is generally considered "slow", and there is little that can be done about that.
"That counts as very slow when I need to check the data for a bunch of ids."
The best thing you can do to speed things up is, of course, to minimize the number of HTTP requests.
If you have to do several Graph API calls in a row, try doing them as a Batch Request instead. That allows you to query several pieces of data while making only one HTTP request.
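As a rough sketch (the object IDs and token are placeholders; check Facebook's batch request documentation for the exact format), a batch call via cURL looks roughly like this:
$accessToken = 'YOUR_ACCESS_TOKEN'; // Placeholder token
// One entry per Graph object to fetch, instead of one HTTP request each.
$batch = array(
    array('method' => 'GET', 'relative_url' => '4'),
    array('method' => 'GET', 'relative_url' => '5'),
    array('method' => 'GET', 'relative_url' => '6'),
);

$ch = curl_init('https://graph.facebook.com/');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array(
    'access_token' => $accessToken,
    'batch'        => json_encode($batch),
)));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$responses = json_decode(curl_exec($ch), true); // One response entry per batched request
curl_close($ch);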

This is purely speculation, but the cause might be that Facebook uses the SPDY protocol (I'm not sure whether that's true for the API). PHP is not able to load the page using the SPDY protocol.

Related

PHP : understand the CURL timeout

From a PHP page, I have to do a GET request to another PHP file.
I don't care about waiting for the response of the GET or knowing whether it succeeded.
The called file's script could also take 5-6 seconds to finish, so I don't know how to handle the GET timeout given the above.
The code is this
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://mywebsite/myfile.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
curl_setopt($ch, CURLOPT_TIMEOUT, 1);
$content = trim(curl_exec($ch));
curl_close($ch);
For the first requirement (where you don't need to wait for the response), you can start a new background process, and below that write the code that redirects to another page.
Yeah, you definitely shouldn't be creating a file on the server in response to a GET request. Even as a side-effect, it's less than ideal; as the main purpose of the request, it just doesn't make sense.
If you were doing this as a POST, you'd still have the same issue to work with, however. In that case, if the action can't be guaranteed to happen quickly enough to be acceptable in the context of HTTP, you'll need to hive it off somewhere else. E.g. make your HTTP request send a message to some other system which then works in parallel whilst the HTTP response is free to be sent back immediately.
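For what it's worth, one way to fire the request without waiting (a sketch only; the millisecond timeout value is an arbitrary assumption) is to give cURL a very short timeout and ignore the resulting timeout error:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://mywebsite/myfile.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, false);
curl_setopt($ch, CURLOPT_NOSIGNAL, 1);     // Required for sub-second timeouts on many builds
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 200); // Fire the request, then stop waiting almost immediately
curl_exec($ch);                            // Times out by design; we ignore the result
curl_close($ch);
Note that the called script should call ignore_user_abort(true) so it keeps running after the connection is dropped; otherwise a background process, as suggested above, is the safer route.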

Retrieve / send back HTTP headers with PHP / Curl

I have an HTML/PHP/JS page that I use for an automation process.
On load, it performs a cURL request like:
function get_data($url) {
    $curl = curl_init();
    $timeout = 5;
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($curl);
    curl_close($curl);
    return $data;
}
$html = get_data($url);
Then it uses DOMDocument to retrieve a specific element on the remote page. My PHP code handles it, makes some operations, then stores it in a variable.
My purpose, as you can guess, is to simulate a "normal" connection. To do so, I used the Tamper tool to see what requests are performed while I was physically interacting with the remote page. The HTTP headers are made up of the UA, cookies (among them, a session cookie), and so on. The only POST variable I have to send back is my PHP variable (you know, the one which was calculated and stored in a PHP var). I also tested the process with Chrome, which allows me to copy/paste requests as curl.
My question is simple: is there a way to handle HTTP requests / cookies in a simple way? Or do I have to retrieve them, parse them, store them, and send them back "one by one"?
Indeed, a request and a response are slightly different, but in this case they share many things in common. So I wonder if there is a way to explore the remote page as a browser would, and interact with it, using for instance an extra PHP library.
Or maybe I'm doing it the wrong way and I should use another language (Perl...)?
The code shown above does not handle requests and cookies. I've tried, but it was a bit too tricky to handle, hence this question :) I'm not lazy, but I wonder if there is a simpler way to achieve my goal.
Thanks for your advice, and sorry for the English.
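cURL itself can persist cookies between requests if you point CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE at the same file. A minimal sketch (the URL and cookie-file path are placeholder assumptions):
$cookieFile = '/tmp/cookies.txt'; // Placeholder path; must be writable by PHP

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'https://remote.example.com/page');  // Placeholder URL
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);     // Follow redirects like a browser would
curl_setopt($curl, CURLOPT_COOKIEJAR, $cookieFile);   // Save cookies here when the handle is closed
curl_setopt($curl, CURLOPT_COOKIEFILE, $cookieFile);  // Send them back on subsequent requests
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0'); // Optional: mimic a browser UA
$html = curl_exec($curl);
curl_close($curl);
Reusing the same handle (or the same cookie file across handles) keeps the session cookie flowing without parsing headers yourself.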

PHP curl maximum execution time using hhvm

I am trying to download all the data from an API, so I am curling into it and saving the results to a JSON file. But the execution stops, the results are truncated, and it never finishes.
How can this be remedied? Maybe the API server's maximum execution time is too short to serve such a long response, so it stops. I think there are more than 10,000 results.
Is there a way to download the first 1000 results, the second 1000, and so on? By the way, the API uses sails.js.
Here is my code :
<?php
$url = 'http://api.example.com/model';
$data = array(
    'app_id' => '234567890976',
    'limit' => 100000
);
$fields_string = '';
foreach ($data as $key => $value) {
    $fields_string .= $key.'='.urlencode($value).'&';
}
$fields_string = rtrim($fields_string, '&');
$url = $url.'?'.$fields_string;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, '300000000');
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
$response = curl_exec($ch);
print($response);
$file = fopen("results.json", 'w+'); // Create a new file, or overwrite the existing one.
fwrite($file, $response);
fclose($file);
curl_close($ch);
Lots of possible problems might be the cause. Without more details that help understand if the problem is on the client or server, such as with error codes or other info, it's hard to say.
Given that you are calling the API with a URL, what happens when you put your URL into a browser? If you get a good response in a browser then it seems likely the problem is with your local configuration and not with node/sails.
Here are a few ideas to see if the problem is local, but I'll admit I can't say any one is the right answer because I don't have enough information to do better:
Check your php.ini settings for memory_limit, max_execution_time and if you are using Apache, the httpd.conf timeout setting. A test using the URL in a browser is a way to see if these settings may help. If the browser downloads the response fine, start checking things like these settings for reasons your system is prematurely ending things.
If you are saving the response to disk and not manipulating the data, you could try removing CURLOPT_RETURNTRANSFER and instead using CURLOPT_FILE. This can be more memory efficient and (in my experience) faster if you don't need the data in memory. See this article or this article on this site for info on how to do this; there is also a short sketch after this list.
Check curl_errno() if the script isn't crashing.
Related: what is your error reporting level? If error reporting is off...why haven't you turned it on as you debug this? If error reporting is on...are you getting any errors?
Given the way you are using foreach to construct a URL, I have to wonder if you are writing a really huge URL with up to 10,000 items in your query string. If so, that's a bad approach. In a situation like that, you could consider breaking up the requests into individual queries and then use curl_multi or the Rolling Curl library that uses curl_multi to do the work to queue and execute multiple requests. (If you are just making a single request and get one gigantic response with tons of detail, this won't be useful.)
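To illustrate the CURLOPT_FILE idea from above (assuming $url has been built exactly as in the question; treat this as a sketch, not a drop-in fix), streaming straight to disk avoids holding the whole response in memory:
$fh = fopen('results.json', 'w');          // Stream the body straight to disk
$ch = curl_init($url);                     // $url built as in the question above
curl_setopt($ch, CURLOPT_FILE, $fh);       // Write directly to the file handle instead of returning a string
curl_setopt($ch, CURLOPT_TIMEOUT, 300);    // Keep a sane overall timeout
if (curl_exec($ch) === false) {
    echo 'cURL error: ' . curl_error($ch); // See why the transfer stopped, if it did
}
curl_close($ch);
fclose($fh);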
Good luck.

php curl check if url is reachable before query

We're having problems with an api we are using.
Here is the code we're using (naming no names on the api front)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://apiurl.com/whatever/api/we/call');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$ch_output = curl_exec($ch);
curl_close($ch);
This request times out, but only after a very long wait. That hideously slows down our web app, and further code breaks because of the bad return value. The broken code I can fix; the response timeout I don't know how to fix. Is there any way to quickly see if a URL is "responding" (e.g. something like ping in a terminal) before trying to do a curl request?
Thank you.
Do you mean using curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, NUMERIC_TIMEOUT_VALUE); to set the timeout?
Your best option would be to set the timeouts on curl to a more acceptable level. There are several timeout options available for DNS lookup, connect timeout, transfer timeout, etc. More information is available here: http://php.net/manual/en/function.curl-setopt.php
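For example (the timeout values here are arbitrary; tune them to what your app can tolerate):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://apiurl.com/whatever/api/we/call');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 2); // Give up if we can't even connect within 2 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 5);        // Give up if the whole transfer takes more than 5 seconds
$ch_output = curl_exec($ch);
if ($ch_output === false) {
    error_log('API call failed: ' . curl_error($ch)); // curl_errno() tells you whether it was a timeout
}
curl_close($ch);
This way a dead API costs you a few seconds at most instead of hanging the page.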

How can I get data after making a POST to an external HTTPS web page?

I need to make a POST in JSON format to an HTTPS web page on a remote server and receive an answer in JSON format.
The data to be sent to the remote server is taken from the URL bar <--- done in PHP.
My problem is sending this data and receiving an answer.
I tried doing it in PHP and in HTML, using cURL (PHP) and submit (HTML).
The results: in PHP I can't send anything.
In HTML I can submit the data and get an answer, but I can't catch it in my code.
I can see the answer using Wireshark; the POST is made after a negotiation protocol, and as I said I receive an answer (encrypted due to HTTPS, I think).
Now I need to receive that answer in my code to generate a URL link, so I'm considering using JavaScript.
I have never done anything similar before.
Any suggestions will be appreciated, thanks.
I'm using the following code, with no result but a 20 second delay and then a blank page.
<?php
$url = 'https://www.google.com/loc/json';
$body = '{"version":"1.1.0","cell_towers":[{"cell_id":"48","location_area_code":1158,"mobile_country_code":752,"mobile_network_code.":7,"age":0,"signal_strength":-71,"timing_advance":2255}]}';
$c = curl_init();
curl_setopt($c, CURLOPT_URL, $url);
curl_setopt($c, CURLOPT_POST, true);
curl_setopt($c, CURLOPT_POSTFIELDS, $body);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
//curl_setopt($c, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
$page = curl_exec($c);
echo($page);
//print_r($page);
curl_close($c);
?>
New info
I just got some very important new info:
"The Gears Terms of Service prohibits direct use of the Google location server (http://www.google.com/loc/json) via HTTP requests. This service may only be accessed through the Geolocation API."
So I was going about it the wrong way, and from now on I will start learning about Gears in order to use the Gears API.
Cheers!
There's no real reason PHP couldn't do the POST for you, if you set things up properly.
For instance, it may require a cookie that it had set on the client browser at some point, which your PHP/curl request doesn't have.
To do proper debugging, use HTTPFox or Firebug in Firefox, which monitor the requests from within the browser itself and can show the actual data, not the encrypted garbage that Wireshark would capture.
Of course, you could use the client browser as a sort of proxy for your server. Browser posts to the HTTPS server, gets a response, then sends that response to your server. But if that data is "important" and shouldn't be exposed, then the client-side solution is a bad one.
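For reference, if you do end up POSTing JSON from PHP to some other HTTPS endpoint, the header option is CURLOPT_HTTPHEADER and it expects an array. A minimal sketch (placeholder URL and payload):
$url  = 'https://remote.example.com/endpoint';    // Placeholder endpoint
$body = json_encode(array('version' => '1.1.0')); // Placeholder payload

$c = curl_init($url);
curl_setopt($c, CURLOPT_POST, true);
curl_setopt($c, CURLOPT_POSTFIELDS, $body);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
$page = curl_exec($c);
curl_close($c);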
