I have a setup with two servers, each running a thin client (Apache, PHP). Server A acts as the client machine and connects to Server B to obtain data via a RESTful API. Both servers are on the same network. The response Server B produces for the request is shown below:
{
"code": 200,
"response_time": {
"time": 0.43,
"measure": "seconds"
}
}
Server B calculates the time taken for each task by using microsecond timestamps to mark the start and end of a request block. But when I use curl on Server A to call Server B, I get very strange results in terms of execution time:
$url = "https://example.com/api";
/*server B address. I've tried IP address as well without any change in results.
This must go over a SSL connection. */
$start_time = microtime(true);
$curl2 = curl_init();
curl_setopt($curl2, CURLOPT_URL, $url);
curl_setopt($curl2, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl2, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl2, CURLOPT_USERAGENT, "Server A User Agent");
$result = curl_exec($curl2);
$HttpCode = curl_getinfo($curl2, CURLINFO_HTTP_CODE);
$total_time = curl_getinfo($curl2, CURLINFO_TOTAL_TIME);
$connect_time = curl_getinfo($curl2, CURLINFO_CONNECT_TIME);
$namelookup_time = curl_getinfo($curl2, CURLINFO_NAMELOOKUP_TIME);
$end_time = microtime(true);
$timeDiff = round(((float)$end_time - (float)$start_time), 3);
I get the following for each Time Check:
$timeDiff = 18.7381 (Using Microseconds)
$total_time = 18.7381 (Transfer Time)
$connect_time = 0.020679
$namelookup_time = 0.004144
So I'm not sure why this is happening. Is there a better way to fetch data from another server on your network that hosts your API? It would be like Twitter's site consuming their API from a server that isn't the API server. I would expect the time for the curl call to the API to be fairly close to the time reported by the API. I understand that the API's figure doesn't take into account network traffic or the time needed to open the connection, but 18 seconds versus 0.43 seems strange to me.
Any ideas here?
This does not look like a curl issue. Rather, it's a problem with your network setup. You can check this by doing a few things.
1) Use ping command to check the response time.
From Server-A: ping Server-B-IP
From Server-B: ping Server-A-IP
2) Similarly, you can use the traceroute command (tracert on Windows) to check the response time as well. You should get the response instantly.
From Server-A: traceroute Server-B-IP
From Server-B: traceroute Server-A-IP
3) Use wget or curl on the command line to download a large file (say 100 MB) from one server to the other, and check how long it takes. For example, using wget:
From Server-B: wget http://server-A-IP/test/test-file.flv
From Server-A: wget http://server-B-IP/test/test-file.flv
4) Apart from these basic routine checks, you can also use some more advanced tools to track down the network problem. For example, the commands/examples from the following two links:
Test network connection performance between two Linux servers
Command line tool to test bandwidth between 2 servers
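Before running the network checks above, it may help to see which phase of the request eats the 18 seconds. curl reports cumulative timers, so subtracting neighbouring values gives a rough per-phase breakdown. The sketch below is only an illustration: the helper name timing_phases() is made up for this example, and it assumes it is given the array returned by curl_getinfo($curl2) after curl_exec() in the question's code.

```php
<?php
// Hypothetical helper (not part of the question's code): turns curl's
// cumulative timers into per-phase durations, all in seconds.
// $info is assumed to be the array returned by curl_getinfo($handle).
function timing_phases(array $info): array
{
    return [
        // Time spent resolving the host name
        'dns'       => $info['namelookup_time'],
        // Time spent opening the TCP connection
        'connect'   => $info['connect_time'] - $info['namelookup_time'],
        // Everything between TCP connect and sending the request
        // (for https this includes the SSL/TLS handshake)
        'handshake' => $info['pretransfer_time'] - $info['connect_time'],
        // Time spent waiting for the first byte of the response
        'wait'      => $info['starttransfer_time'] - $info['pretransfer_time'],
        // Time spent receiving the rest of the response
        'download'  => $info['total_time'] - $info['starttransfer_time'],
    ];
}
```

In the question's script this could be called as print_r(timing_phases(curl_getinfo($curl2))); right after curl_exec(). Small "dns" and "connect" values combined with a large "handshake" or "wait" value would suggest the delay sits after the connection is opened, which narrows down what to ask the provider about.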
I had the same problem about 3 days ago and wasted an entire afternoon trying to find it. In the end I contacted my server provider and described the problem. He said that it was not a problem with my script, but with the carrier (network).
Maybe it is the same problem I had, so contact your server provider and ask.
Did you try it with file_get_contents? It would be interesting to see whether the response time is the same with it.
I have an XML file locally. It contains data from a marketplace.
It roughly looks like this:
<offer id="2113">
<picture>https://anotherserver.com/image1.jpg</picture>
<picture>https://anotherserver.com/image2.jpg</picture>
</offer>
<offer id="2117">
<picture>https://anotherserver.com/image3.jpg</picture>
<picture>https://anotherserver.com/image4.jpg</picture>
</offer>
...
What I want is to save the images in the <picture> nodes locally.
There are about 9,000 offers and about 14,000 images.
When I iterate through them I see that images are being copied from the other server, but at some point it gives 504 Gateway Timeout.
The thing is that sometimes the error occurs after 2,000 images, sometimes after many more or fewer.
I tried getting only one image 12,000 times from that server (i.e. only https://anotherserver.com/image3.jpg) but it still gave the same error.
From what I've read, the other server is blocking my requests after a certain number.
I tried using PHP sleep(20) after every 100th image but it still gave me the same error (sleep(180): same). When I tried a local image with its full path, it didn't give any errors. I tried a second (non-local) server and the same thing occurred.
I use the PHP copy() function to fetch the images from that server.
I've also used file_get_contents() for testing purposes but got the same error.
I have
set_time_limit(300000);
ini_set('default_socket_timeout', 300000);
as well but no luck.
Is there any way to do this without chunking requests?
Does this error occur on one particular image? Would it be possible to catch this error, or to keep track of the response delay and send another request after some time?
Is there a fixed number of seconds that I have to wait in order to get those requests going again?
And please give me non-curl answers if possible.
UPDATE
Curl and exec(wget) didn't work either. They both ran into the same error.
Can the remote server be tweaked so it doesn't block me (if it does)?
P.S. If I do echo "<img src='https://anotherserver.com/image1.jpg' />"; in a loop for all 12,000 images, they show up just fine.
Since you're accessing content on a server you have no control over, only the server administrators know the blocking rules in place.
But you have a few options, as follows:
Run batches of 1000 or so, then sleep for a few hours.
Split the request up between computers that are requesting the information.
Maybe even something as simple as changing the requesting user agent info every 1000 or so images would be good enough to bypass the blocking mechanism.
Or some combination of all of the above.
I would suggest trying the following:
1. Reuse the previously opened connection using cURL
$imageURLs = array('https://anotherserver.com/image1.jpg', 'https://anotherserver.com/image2.jpg', ...);
$notDownloaded = array();
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
foreach ($imageURLs as $URL) {
$filepath = parse_url($URL, PHP_URL_PATH);
$fp = fopen(basename($filepath), "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_URL, $URL);
curl_exec($ch);
fclose($fp);
if (curl_getinfo($ch, CURLINFO_RESPONSE_CODE) == 504) {
$notDownloaded[] = $URL;
}
}
curl_close($ch);
// check to see if $notDownloaded is empty
2. If the images are accessible via both https and http, try using http instead (this will at least speed up the downloading).
3. Check the response headers when 504 is returned, as well as when you load the URL in your browser. Make sure there are no X-RateLimit-* headers. By the way, what are the response headers, actually?
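The URLs collected in $notDownloaded could then be retried with a growing pause between attempts. Since only the remote administrators know the blocking rules, there is no known "right" delay; the sketch below simply doubles the wait each time. The helper name retry_with_backoff() and its parameters are made up for this example, and the $sleep argument exists only so the logic can be tried out without actually waiting.

```php
<?php
// Hypothetical helper: calls $attempt until it returns true or $maxTries
// attempts have been made. $attempt receives the 1-based attempt number.
// $sleep receives the delay in seconds; it defaults to PHP's sleep().
function retry_with_backoff(
    callable $attempt,
    int $maxTries = 5,
    int $baseDelay = 2,
    ?callable $sleep = null
): bool {
    $sleep = $sleep ?? 'sleep';
    for ($try = 1; $try <= $maxTries; $try++) {
        if ($attempt($try)) {
            return true;
        }
        if ($try < $maxTries) {
            // The delay doubles after every failure: base, 2*base, 4*base, ...
            $sleep($baseDelay * (2 ** ($try - 1)));
        }
    }
    return false; // gave up after $maxTries failed attempts
}
```

Here $attempt would wrap one download and return true unless the response code was 504. Whether exponential delays are enough to satisfy the remote server is an assumption that has to be tested against it.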
I have a PHP script that connects to a URL through cURL and then does something, depending on the returned HTTP status code:
$ch = curl_init();
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_URL => $url,
CURLOPT_USERAGENT => "What?!?"
);
curl_setopt_array($ch, $options);
$out = curl_exec($ch);
$code = curl_getinfo($ch)["http_code"];
curl_close($ch);
if ($code == "200") {
echo "200";
} else {
echo "not 200";
}
Some web servers are slow to reply, and although the page loads in my browser after a few seconds, my script, when it tries to connect to that server, tells me that it did not receive a positive ("200") reply. So, apparently, the connection initiated by cURL timed out.
But why? I don't set a timeout in my script, and according to other answers on this site the default timeout for cURL is definitely longer than the three or four seconds it takes for the page to load in my browser.
So why does the connection time out, and how can I get it to last longer if, apparently, it is already set to infinite?
Notes:
The same URL doesn't always time out. So sometimes cURL can connect.
It is not one specific URL that sometimes times out, but different URLs at different times.
I'm on a shared server, so I don't have root access to any files.
I tried to look at curl_getinfo($ch) and curl_error($ch) – as per #drew010's suggestion in the comments – but both were empty whenever the problem happened.
The whole script runs for a little more than one minute. In this time it connects to 300+ URLs successfully. Even when one of the URLs fails, the other connections are successfully made. So the script does not time out.
cURL does not time out either, because when I try to connect to an URL with a script sleeping for 59 seconds, cURL successfully connects. So apparently the slowness of the failing URL is not a problem in itself for cURL.
Update
Following #Karlos' suggestion in his answer, I used:
CURLOPT_VERBOSE => 1,
CURLOPT_STDERR => $curl_log
(using code from this answer) and found the following in $curl_log when an URL failed (URL and IP changed):
* About to connect() to www.somesite.com port 80 (#0)
* Trying 104.16.37.249... * connected
* Connected to www.somesite.com (104.16.37.249) port 80 (#0)
GET /wp_german/?feed=rss2 HTTP/1.1
User-Agent: myURL
Host: www.somesite.com
Accept: */*
* Recv failure: Connection reset by peer
* Closing connection #0
So, I have found the why – thank you #Karlos! – and apparently #Axalix was right and it is a network problem. I'll now follow suggestions given on this site for that kind of failure. Thanks to everyone for their help!
In my experience with curl, sometimes when using the option:
CURLOPT_RETURNTRANSFER => true
the server might not give a successful reply or, at least, not within the timeframe curl has to receive the response and buffer it, so that the result can be returned by curl into the variable you assign. In your code:
$out = curl_exec($ch);
In the Stack Overflow question "CURLOPT_RETURNTRANSFER set to true doesnt work on hosting server", you can see that the option CURLOPT_RETURNTRANSFER is directly affected by the requested host's web server implementation.
As you are not explicitly using the response body, and your code relies on the response headers, a good way to solve this might be to set:
CURLOPT_RETURNTRANSFER => false
and execute the curl code so that it works on the response headers.
Once you have the header with the code you are interested in, you could run a PHP script that echoes the curl response and parse it yourself:
<?php
$url=isset($_GET['url']) ? $_GET['url'] : 'http://www.example.com';
$ch= curl_init();
$options = array(
CURLOPT_RETURNTRANSFER => false,
CURLOPT_URL => $url,
CURLOPT_USERAGENT => "myURL"
);
curl_setopt_array($ch, $options);
curl_exec($ch);
curl_close($ch);
?>
In any case, as to why your request does not produce an error, I guess that the option CURLOPT_NOSIGNAL and the various timeout options explained on the curl_setopt page of the PHP manual might get you closer to an answer.
In order to dig further, the option CURLOPT_VERBOSE might help you to have extra information about the request behavior through the STDERR.
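Alongside the verbose log, it can help to tell apart "no HTTP response arrived at all" (timeout, reset connection) from "the server answered with a status other than 200", because the question's if/else treats both as "not 200". The following is a minimal sketch under assumptions: classify_result() is a made-up name, $errno is assumed to come from curl_errno($ch) and $httpCode from curl_getinfo($ch, CURLINFO_HTTP_CODE), both read before curl_close($ch).

```php
<?php
// Hypothetical helper: classifies the outcome of one curl_exec() call.
// $errno    - value of curl_errno($ch), 0 means the transfer itself succeeded
// $httpCode - value of curl_getinfo($ch, CURLINFO_HTTP_CODE)
function classify_result(int $errno, int $httpCode): string
{
    if ($errno !== 0) {
        // The transfer failed before a complete response was received,
        // e.g. a timeout or "Connection reset by peer"
        return 'transport_error';
    }
    if ($httpCode === 200) {
        return 'ok';
    }
    // A response arrived, but with some other status (404, 500, 504, ...)
    return 'http_error';
}
```

A 'transport_error' result would point at the network or the remote host dropping the connection, while 'http_error' means the server did answer and the status code itself is worth logging.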
The reason may be your hosting provider is imposing some limits on outgoing connections.
Here is what can be done to secure your script:
Create a queue in DB with all the URLs that need to be fetched.
Run cron every minute or 5 minutes, take a few URLs from DB - mark them as in progress.
Try to fetch those URLs. Mark every fetched URL as success in DB.
Increment failure count for unsuccessful ones.
Continue going through the queue until it's empty.
If you implement such a solution you will be able to process every single URL under any unfavourable conditions.
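The steps above can be sketched roughly as follows. This is not a complete implementation: a plain PHP array stands in for the DB table, the row fields (url, status, failures) and the function name process_batch() are invented for the example, and the cap on failures ($maxFailures) is an extra assumption so that a permanently broken URL does not stay in the queue forever.

```php
<?php
// Hypothetical sketch of one cron run. $queue is an in-memory stand-in for
// the DB table; each row is ['url' => ..., 'status' => ..., 'failures' => ...].
// $fetch is any callable that returns true when the URL was fetched.
function process_batch(
    array $queue,
    callable $fetch,
    int $batchSize = 5,
    int $maxFailures = 3
): array {
    $taken = 0;
    foreach ($queue as $i => $row) {
        if ($taken >= $batchSize) {
            break; // only a few URLs per run
        }
        if ($row['status'] !== 'pending') {
            continue; // already done, given up on, or being processed
        }
        $taken++;
        $queue[$i]['status'] = 'in_progress';

        if ($fetch($row['url'])) {
            $queue[$i]['status'] = 'success';
        } else {
            // Count the failure and put the URL back unless the cap is reached
            $queue[$i]['failures']++;
            $queue[$i]['status'] =
                $queue[$i]['failures'] >= $maxFailures ? 'failed' : 'pending';
        }
    }
    return $queue; // the updated rows would be written back to the DB
}
```

With a real database, marking rows as in progress would have to happen in a way that keeps two overlapping cron runs from taking the same rows; that part is left out here.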
I'm trying to write a PHP script that checks the HTTP status of a website as fast as possible.
I'm currently using get_headers() and running it in a loop over 200 random URLs from a MySQL database.
Checking all 200 takes an average of 2m 48s.
Is there anything I can do to make it (much) faster?
(I know about fsockopen. It can check port 80 on 200 sites in 20s, but that's not the same as requesting the HTTP status code, because the server may be responding on the port but not serving websites correctly, etc.)
Here is the code:
<?php
function get_httpcode($url) {
$headers = get_headers($url, 0);
// Return http status code
return substr($headers[0], 9, 3);
}
###
## Grab task and execute it
###
// Loop through task
while($data = mysql_fetch_assoc($sql)):
$result = get_httpcode('http://'.$data['url']);
echo $data['url'].' = '.$result.'<br/>';
endwhile;
?>
You can try the cURL library. You can send multiple requests in parallel with curl_multi_exec().
Example:
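$ch = curl_init('http_url');
curl_setopt($ch, CURLOPT_HEADER, 1);
// Return the response instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Skip the body (sends a HEAD request): only the status code is needed
curl_setopt($ch, CURLOPT_NOBODY, 1);
$c = curl_exec($ch);
$info = curl_getinfo($ch, CURLINFO_HTTP_CODE);
print_r($info);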
UPDATED
Take a look at this example: http://www.codediesel.com/php/parallel-curl-execution/
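For the 200-URL status check, the parallel approach might look roughly like the sketch below. It is a hedged illustration rather than a tuned solution: the function name fetch_status_codes() and the 10-second timeout are assumptions, all handles are started at once (for much larger lists you would probably want to cap how many run simultaneously), and duplicate URLs collapse into one array key.

```php
<?php
// Hypothetical helper: requests all URLs in parallel and returns
// [url => HTTP status code]. A code of 0 means no response was received.
function fetch_status_codes(array $urls, int $timeout = 10): array
{
    if (!$urls) {
        return []; // nothing to do
    }

    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_NOBODY         => true, // headers only, like get_headers()
            CURLOPT_RETURNTRANSFER => true, // do not print anything
            CURLOPT_CONNECTTIMEOUT => $timeout,
            CURLOPT_TIMEOUT        => $timeout,
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $codes = [];
    foreach ($handles as $url => $ch) {
        $codes[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($mh, $ch);
    }
    return $codes; // the handles are released when they go out of scope
}
```

In the question's loop, the URLs would first be collected into an array from the database rows and then passed in one call, e.g. fetch_status_codes(['http://example.com', 'http://example.org']). The total time should then be closer to the slowest single request than to the sum of all of them.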
I don't know if this is an option you can consider, but you could run all of them at almost the same time using a fork. This way the script will take only a bit longer than a single request.
http://www.php.net/manual/en/function.pcntl-fork.php
You could put this in a script that is run in CLI mode and launch all the requests at the same time, for example.
Edit: You say that you have 200 calls to make, so one thing you might run into is loss of the database connection. The problem is caused by the fact that the link is destroyed when the first script completes. To avoid that, you could create a new connection for each child. I see that you are using the standard mysql_* functions, so be sure to pass the 4th parameter (new_link) to mysql_connect() so that a new link is created each time. Also check the maximum number of simultaneous connections on your server.
I'm not very experienced with PHP. I want to know how to communicate between 2 web servers. To clarify: from the 1st server, I want to run a function (a query) on a remote server and have the result returned to the 1st server.
The setup would be:
Web Server (1) ----------------> Web Server (2) ---------------> Database Server
Web Server (1) <---------------- Web Server (2) <--------------- Database Server
The query function() will be located only on Web Server (2), so I need to run it remotely from Web Server (1).
What is this called? And is it possible?
Yes.
A nice way I can think of would be to send a request to the 2nd server via a URL. In the GET (or POST) parameters, specify which method you'd like to call and (for security) some sort of hash that changes with time. The hash is there to ensure that no third party can run the function arbitrarily on the 2nd server.
To send the request, you could use cURL:
function get_url($request_url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $request_url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
curl_close($ch);
return $response;
}
This sends a GET request. You can then use:
$request_url = 'http://second-server-address/listening_page.php?function=somefunction&securityhash=HASH';
$response = get_url($request_url);
On your second server, set up the listening_page.php (with whatever filename you like, of course) that checks for GET requests and verifies the integrity of the request (i.e. the hash, correct & valid params).
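One possible way to build the "hash that changes with time" is an HMAC over the function name and a timestamp, using a secret that both servers know. This is a sketch under assumptions, not the answer's actual scheme: the function names sign_request() and verify_request(), the extra timestamp parameter, and the 300-second validity window are all invented for illustration.

```php
<?php
// Hypothetical helper for server 1: signs the function name together with
// a Unix timestamp, using a secret shared by both servers.
function sign_request(string $function, int $timestamp, string $secret): string
{
    return hash_hmac('sha256', $function . '|' . $timestamp, $secret);
}

// Hypothetical helper for server 2 (inside listening_page.php): accepts the
// request only if the timestamp is recent and the hash matches.
function verify_request(
    string $function,
    int $timestamp,
    string $hash,
    string $secret,
    int $now,
    int $maxAge = 300
): bool {
    if (abs($now - $timestamp) > $maxAge) {
        return false; // too old (or too far in the future): reject
    }
    // hash_equals() compares in constant time to avoid timing leaks
    return hash_equals(sign_request($function, $timestamp, $secret), $hash);
}
```

Server 1 would then send function, timestamp and securityhash as parameters, and server 2 would call verify_request() with time() as $now before running anything. The function name should still be checked against a fixed list of allowed functions, and within the validity window a captured request can be replayed, so sending the request over HTTPS is advisable.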
You can do so by using an API. Create a page on the second server that takes variables and communicates with the server using those variables (depending on what you need). The standard reply from that page should be either JSON or XML. Then read it from server 1 by requesting that page and getting the reply from the 2nd server.
*NOTE: if it's a private file, make sure you use an authentication method to prevent users from accessing the file.
What you are aiming to do is definitely possible. You will need to set up some sort of API so that server 1 can make requests to server 2.
I suggest you read up on SOAP and REST APIs:
http://www.netmagazine.com/tutorials/make-your-own-soap-api
Generally you will use something like cURL to contact server 2 from server 1.
Google curl and you should quickly get the idea.
It's not going to be easy to give you a complete solution, so I hope this nudge in the right direction is helpful.
I am using PHP and cURL to make HTTP reverse geocoding (lat, long -> address) requests to Google Maps. I have a premier account, so we can make a lot of requests without being throttled or blocked.
Unfortunately, I have reached a performance limit. We get approximately 500,000 requests daily that need to be reverse geocoded.
The code is quite trivial (I will write pieces in pseudo-code for the sake of saving time and space). The following code fragment is called every 15 seconds via a job.
<?php
//get requests from database
$requests = get_requests();
foreach($requests as $request) {
//build up the url string to send to google
$url = build_url_string($request->latitude, $request->longitude);
//make the curl request
$response = Curl::get($url);
//write the response address back to the database
write_response($response);
}
class Curl {
public static function get($p_url, $p_timeout = 5) {
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, $p_url);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, $p_timeout);
curl_setopt($curl_handle, CURLOPT_TIMEOUT, $p_timeout);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($curl_handle);
curl_close($curl_handle);
return $response;
}
}
?>
The performance problem seems to be the cURL requests. They are extremely slow, probably because a full HTTP request is made for every operation. We have a 100 Mbps connection, but the script running at full speed is only utilizing about 1 Mbps. The load on the server is essentially nothing. The server is a quad core with 8 GB of memory.
What can we do to increase the throughput of this? Is there a way to open a persistent (keep-alive) HTTP connection to Google Maps? How about spreading the work out horizontally, i.e. making 50 concurrent requests?
Thanks.
Some things I would do:
No matter how "premium" you are, external HTTP requests will always be a bottleneck, so for starters, cache request + response. You can still refresh them via cron on a regular basis.
These are single HTTP requests. You will never get "full speed" with them, especially if request and response are that small (< 1 MB), because of the TCP handshake, headers, etc.
So try using multi-cURL (if your premium account allows it) to start multiple requests at once. This should give you full speed ;)
Add "Connection: close" to the request headers you send. This will close the HTTP connection immediately, so your server and Google's won't get hammered with half-open connections.
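The caching suggestion could be sketched like this. Everything here is an assumption made for illustration: cached_lookup() is a made-up name, a plain array stands in for a real cache (database table, Memcached, etc.), $lookup is whatever performs the actual request (for example a wrapper around Curl::get()), and rounding coordinates to four decimals so that nearby points share one cache entry is a trade-off that may or may not be acceptable for your data.

```php
<?php
// Hypothetical helper: returns the cached address for a coordinate pair,
// calling $lookup only when the (rounded) coordinates were not seen before.
// $cache is passed by reference so that new entries persist between calls.
function cached_lookup(
    float $lat,
    float $lng,
    array &$cache,
    callable $lookup,
    int $precision = 4
) {
    // Nearby points collapse onto the same key after rounding
    $key = round($lat, $precision) . ',' . round($lng, $precision);
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $lookup($lat, $lng); // the only place a request is made
    }
    return $cache[$key];
}
```

How much this helps depends on how often the same locations recur among the 500,000 daily requests, and whether storing the results is permitted under the account's terms is something to check separately.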
Considering you are running all your requests sequentially you should look into dividing the work up onto multiple machines or processes. Then each can be run in parallel. Judging by your benchmarks you are limited by how fast each Curl response takes, not by CPU or bandwidth.
My first guess is to look at a queuing system (Gearman, RabbitMQ).