I'm trying to POST some data (a JSON string) from a PHP script to a Java server (all written by myself) and get the response back.
I tried the following code:
$url="http://localhost:8000/hashmap";
$opts = array('http' => array('method' => 'POST', 'content' => $JSONDATA,'header'=>"Content-Type: application/x-www-form-urlencoded"));
$st = stream_context_create($opts);
echo file_get_contents($url, false,$st);
Now, this code actually works (I get back the right answer), but file_get_contents hangs for 20 seconds every time it executes (I printed the time before and after the call). The server completes its work in a fraction of that time, and I'm sure it's not normal to wait this long for the response.
Am I missing something?
Possibly a badly misconfigured server that doesn't send the right Content-Length while using HTTP/1.1; the client then keeps the connection open waiting for more data until it times out.
Either fix the server or request the data as HTTP/1.0.
Try adding Connection: close and Content-Length: strlen($JSONDATA) headers to the $opts.
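For example, a sketch of the same request with those headers added (and, per the suggestion above, optionally forcing HTTP/1.0 via the protocol_version context option):
$opts = array('http' => array(
    'method'           => 'POST',
    'content'          => $JSONDATA,
    'protocol_version' => 1.0, // or fix the server and keep 1.1
    'header'           => "Content-Type: application/x-www-form-urlencoded\r\n"
                        . "Content-Length: " . strlen($JSONDATA) . "\r\n"
                        . "Connection: close\r\n"
));
$st = stream_context_create($opts);
echo file_get_contents($url, false, $st);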
Also, if you want to avoid using extensions, have a look at this class I wrote some time ago to perform HTTP requests using PHP core only. It works on PHP4 (which is why I wrote it) and PHP5, and the only extension it ever requires is OpenSSL, and you only need that if you want to do an HTTPS request. Documented(ish) in comments at the top.
Supports all sorts of stuff - GET, POST, PUT and more, including file uploads, cookies, automatic redirect handling. I have used it quite a lot on a platform I work with regularly that is stuck with PHP/4.3.10 and it works beautifully... Even if I do say so myself...
Related
I have an interesting situation when calling the Shopify API. I use the standard procedure for calling the URL and getting the data, like this:
define('SHOPIFY_SHOP', 'myteststore.myshopify.com');
define('SHOPIFY_APP_API_KEY', 'xxxx');
define('SHOPIFY_APP_PASSWORD', 'yyy');
$shop_url = 'https://'.SHOPIFY_APP_API_KEY.':'.SHOPIFY_APP_PASSWORD.'@'.SHOPIFY_SHOP;
$response = Requests::get($shop_url.'/admin/products.json');
And I correctly get the response, parse the data and all works great. Now, when I put it on the actual server (Ubuntu 12.04), I noticed a weird message from the Shopify API:
[API] Invalid API key or access token (unrecognized login or wrong password)
I tried creating a new app, but it's still the same. So the same file and the same setup works on my machine, but not on the server. (The only difference in the file is the path to the Requests library: require_once './Requests/library/Requests.php'; for Linux and require_once '..\Requests\library\Requests.php'; for Windows.) As stated, I use the Requests library, and I assume there has to be some trick where the library (or something else) rewrites the URL so it doesn't reach Shopify correctly.
I tried using cURL with the URL directly, and it works that way as well. Can anyone point me to what might be causing this?
Update: I moved to another library which solved the issue, but would like to know what was causing this since I had great experience with Requests up to this point.
I'm starting to use the same lib, and I stumbled upon something relevant right after finding this question:
https://github.com/rmccue/Requests/issues/142#issuecomment-147276906
Quoting relevant part:
This is an intentional part of the API design; in a typical use case,
you won't necessarily need data sent along with a request. Building
the URL for you is just a convenience.
Requests::get is a helper function designed to make GET requests
lightweight in the code, which is why there's no $data parameter
there. If you need to send data, use Requests::request instead
$response = Requests::request( 'http://httpbin.org/get', $headers, $data, Requests::GET, $options );
// GET is the default for type, and $options can be blank, so this can be shortened:
$response = Requests::request( 'http://httpbin.org/get', $headers, $data );
I couldn't figure out why this was happening; it appears the Requests library is stripping the parameters from GET requests, so I moved to the unirest library and that solved the issue.
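For what it's worth, the simplest workaround seems to be building the query string into the URL yourself, so there is nothing for Requests::get to drop. A sketch (the parameters here are made up):
// append the query string manually instead of relying on a $data parameter
$params = http_build_query(array('limit' => 50, 'page' => 1)); // hypothetical parameters
$response = Requests::get($shop_url . '/admin/products.json?' . $params);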
This is something that has been bugging me for a while... I'm building a RESTful API that has to receive files on some occasions.
When using HTTP POST, we can read data from $_POST and files from $_FILES.
When using HTTP GET, we can read data from $_GET and files from $_FILES.
However, when using HTTP PUT, AFAIK the only way to read data is to use the php://input stream.
All well and good, until I want to send a file over HTTP PUT. Now the php://input stream doesn't work as expected anymore, since it contains a file as well.
Here's how I currently read data on a PUT request:
(which works great as long as there are no files posted)
$handle = fopen('php://input', 'r');
$rawData = '';
while ($chunk = fread($handle, 1024)) {
$rawData .= $chunk;
}
parse_str($rawData, $data);
When I then output $rawData, it shows:
-----ZENDHTTPCLIENT-44cf242ea3173cfa0b97f80c68608c4c
Content-Disposition: form-data; name="image_01"; filename="lorem-ipsum.png"
Content-Type: image/png; charset=binary
�PNG
���...etc etc...
���,
-----ZENDHTTPCLIENT-8e4c65a6678d3ef287a07eb1da6a5380
Content-Disposition: form-data; name="testkey"
testvalue
-----ZENDHTTPCLIENT-8e4c65a6678d3ef287a07eb1da6a5380
Content-Disposition: form-data; name="otherkey"
othervalue
Does anyone know how to properly receive files over HTTP PUT, or how to parse files out of the php://input stream?
===== UPDATE #1 =====
I have only tried the method above; I don't really have a clue what else I could do.
I get no errors using this method, other than not getting the desired result of the posted data and files.
===== UPDATE #2 =====
I'm sending this test request using Zend_Http_Client, as follows:
(haven't had any problems with Zend_Http_Client so far)
$client = new Zend_Http_Client();
$client->setConfig(array(
    'strict'       => false,
    'maxredirects' => 0,
    'timeout'      => 30
));
$client->setUri( 'http://...' );
$client->setMethod(Zend_Http_Client::PUT);
$client->setFileUpload( dirname(__FILE__) . '/files/lorem-ipsum.png', 'image_01');
$client->setParameterPost(array('testkey' => 'testvalue', 'otherkey' => 'othervalue'));
$client->setHeaders(array(
'api_key' => '...',
'identity' => '...',
'credential' => '...'
));
===== SOLUTION =====
Turns out I made some wrong assumptions, mainly that HTTP PUT would be similar to HTTP POST. As you can read below, DaveRandom explained to me that HTTP PUT is not meant for transferring multiple files on the same request.
I have now moved the transfer of form data from the body to the URL query string. The body now holds the contents of a single file.
For more information, read DaveRandom's answer. It's epic.
The data you show does not depict a valid PUT request body (well, it could, but I highly doubt it). What it shows is a multipart/form-data request body - the MIME type used when uploading files via HTTP POST through an HTML form.
PUT requests should exactly complement the response to a GET request - they send the file contents in the message body, and nothing else.
Essentially what I'm saying is that it is not your code to receive the file that is wrong, it is the code that is making the request - the client code is incorrect, not the code you show here (although the parse_str() call is a pointless exercise).
If you explain what the client is (a browser, a script on another server, etc.) then I can help you take this further. As it is, the appropriate request method for the request body that you depict is POST, not PUT.
Let's take a step back from the problem, and look at the HTTP protocol in general - specifically the client request side - hopefully this will help you understand how all of this is supposed to work. First, a little history (if you're not interested in this, feel free to skip this section).
History
HTTP was originally designed as a mechanism for retrieving HTML documents from remote servers. At first it effectively supported only the GET method, whereby the client would request a document by name and the server would return it to the client. The first public specification for HTTP, labelled as HTTP 0.9, appeared in 1991 - and if you're interested, you can read it here.
The HTTP 1.0 specification (formalised in 1996 with RFC 1945) expanded the capabilities of the protocol considerably, adding the HEAD and POST methods. It was not backwards compatible with HTTP 0.9, due to a change in the format of the response - a response code was added, as well as the ability to include metadata for the returned document in the form of MIME format headers - key/value data pairs. HTTP 1.0 also abstracted the protocol from HTML, allowing for the transfer of files and data in other formats.
HTTP 1.1, the form of the protocol that is almost exclusively in use today, is built on top of HTTP 1.0 and was designed to be backwards compatible with HTTP 1.0 implementations. It was standardised in 1999 with RFC 2616. If you are a developer working with HTTP, get to know this document - it is your bible. Understanding it fully will give you a considerable advantage over your peers who do not.
Get to the point already
HTTP works on a request-response architecture - the client sends a request message to the server, the server returns a response message to the client.
A request message includes a METHOD, a URI and optionally, a number of HEADERS. The request METHOD is what this question relates to, so it is what I will cover in the most depth here - but first it is important to understand exactly what we mean when we talk about the request URI.
The URI is the location on the server of the resource we are requesting. In general, this consists of a path component, and optionally a query string. There are circumstances where other components may be present as well, but for the purposes of simplicity we shall ignore them for now.
Let's imagine you type http://server.domain.tld/path/to/document.ext?key=value into the address bar of your browser. The browser dismantles this string, and determines that it needs to connect to an HTTP server at server.domain.tld, and ask for the document at /path/to/document.ext?key=value.
The generated HTTP 1.1 request will look (at a minimum) like this:
GET /path/to/document.ext?key=value HTTP/1.1
Host: server.domain.tld
The first part of the request is the word GET - this is the request METHOD. The next part is the path to the file we are requesting - this is the request URI. At the end of this first line is an identifier indicating the protocol version in use. On the following line you can see a header in MIME format, called Host. HTTP 1.1 mandates that the Host: header be included with every request; it is the only header for which this is true.
The request URI is broken into two parts - everything to the left of the question mark ? is the path, everything to the right of it is the query string.
Request Methods
RFC 2616 (HTTP/1.1) defines 8 request methods.
OPTIONS
The OPTIONS method is rarely used. It is intended as a mechanism for determining what kind of functionality the server supports before attempting to consume a service the server may provide.
Off the top of my head, the only place in fairly common usage that I can think of where this is used is when opening documents in Microsoft Office directly over HTTP from Internet Explorer - Office will send an OPTIONS request to the server to determine whether it supports the PUT method for the specific URI, and if it does, it will open the document in a way that allows the user to save their changes directly back to the remote server. This functionality is tightly integrated within these specific Microsoft applications.
GET
This is by far and away the most common method in every day usage. Every time you load a regular document in your web browser it will be a GET request.
The GET method requests that the server return a specific document. The only data that should be transmitted to the server is information that the server requires to determine which document should be returned. This can include information that the server can use to dynamically generate the document, which is sent in the form of headers and/or query string in the request URI. While we're on the subject - Cookies are sent in the request headers.
HEAD
This method is identical to the GET method, with one difference - the server will not return the requested document; it will only return the headers that would be included in the response. This is useful for determining, for example, whether a particular document exists without having to transfer and process the entire document.
POST
This is the second most commonly used method, and arguably the most complex. POST method requests are almost exclusively used to invoke some actions on the server that may change its state.
A POST request, unlike GET and HEAD, can (and usually does) include some data in the body of the request message. This data can be in any format, but most commonly it is a query string (in the same format as it would appear in the request URI) or a multipart message that can communicate key/value pairs along with file attachments.
Many HTML forms use the POST method. In order to upload files from a browser, you would need to use the POST method for your form.
Unlike GET and HEAD, the POST method is not idempotent: a second identical POST request may result in a further change to the state of the server. This is why POST is used for actions with side effects, while idempotent methods such as PUT and DELETE are preferred where repeating a request must be safe.
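For illustration, here is roughly how a multipart POST like the one shown in the question's dump could be produced from PHP with cURL (a sketch, PHP 5.5+; the URL and file path are placeholders, the field names are borrowed from the question):
$ch = curl_init('http://server.domain.tld/upload');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// passing an array containing a CURLFile makes cURL build a
// multipart/form-data body, just like an HTML file-upload form
curl_setopt($ch, CURLOPT_POSTFIELDS, array(
    'image_01' => new CURLFile('/path/to/lorem-ipsum.png', 'image/png'),
    'testkey'  => 'testvalue',
));
$result = curl_exec($ch);
curl_close($ch);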
PUT
This directly complements GET. Where a GET request indicates that the server should return the document at the location specified by the request URI in the response body, the PUT method indicates that the server should store the data in the request body at the location specified by the request URI.
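Tying that back to the question: a PUT that follows this definition carries only the raw file bytes in its body. A minimal sketch (URL and file path are placeholders):
$opts = array('http' => array(
    'method'  => 'PUT',
    'header'  => "Content-Type: image/png\r\n",
    'content' => file_get_contents('/path/to/lorem-ipsum.png') // the raw bytes, nothing else
));
$response = file_get_contents('http://server.domain.tld/files/lorem-ipsum.png',
    false, stream_context_create($opts));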
DELETE
This indicates that the server should destroy the document at the location indicated by the request URI. Very few internet facing HTTP server implementations will perform any action when they receive a DELETE request, for fairly obvious reasons.
TRACE
This provides an application-layer mechanism that allows a client to inspect the request it has sent, as it looks by the time it reaches the destination server. This is mostly useful for determining the effect that any proxy servers between the client and the destination server may be having on the request message.
CONNECT
HTTP 1.1 reserves the name for a CONNECT method, but does not define its usage, or even its purpose. Some proxy server implementations have since used the CONNECT method to facilitate HTTP tunnelling.
I've never tried using PUT (GET, POST and FILES were sufficient for my needs), but this example is from the PHP docs, so it might help you (http://php.net/manual/en/features.file-upload.put-method.php):
<?php
/* PUT data comes in on the stdin stream */
$putdata = fopen("php://input", "r");
/* Open a file for writing */
$fp = fopen("myputfile.ext", "w");
/* Read the data 1 KB at a time
and write to the file */
while ($data = fread($putdata, 1024))
fwrite($fp, $data);
/* Close the streams */
fclose($fp);
fclose($putdata);
?>
Here is the solution that I found to be the most useful.
$put = array();
parse_str(file_get_contents('php://input'), $put);
$put will be an array, just like you are used to seeing in $_POST, except now you can follow true REST HTTP protocol.
Use POST and include an X- header to indicate the actual method (PUT in this case). Usually this is how one works around a firewall which does not allow methods other than GET and POST. Simply declare PHP buggy (since it refuses to handle multipart PUT payloads, it IS buggy), and treat it as you would an outdated/draconian firewall.
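A sketch of that workaround, on both sides (the header name X-HTTP-Method-Override is a common convention, not a standard; the receiving script has to honour it explicitly):
// client side: an ordinary multipart POST, flagged with the intended method
$client = new Zend_Http_Client('http://...');
$client->setHeaders(array('X-HTTP-Method-Override' => 'PUT'));
$client->setFileUpload(dirname(__FILE__) . '/files/lorem-ipsum.png', 'image_01');
$client->setParameterPost(array('testkey' => 'testvalue'));
$response = $client->request(Zend_Http_Client::POST);

// server side: $_POST and $_FILES are populated as usual; just check the flag
$method = isset($_SERVER['HTTP_X_HTTP_METHOD_OVERRIDE'])
    ? strtoupper($_SERVER['HTTP_X_HTTP_METHOD_OVERRIDE'])
    : $_SERVER['REQUEST_METHOD'];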
The opinions as to what PUT means in relation to GET are just that: opinions. The HTTP spec makes no such requirement. It simply says 'equivalent'... it is up to the designer to determine what 'equivalent' means. If your design can accept a multi-file upload via PUT and produce an 'equivalent' representation for a subsequent GET on the same resource, that's just fine and dandy, both technically and philosophically, with the HTTP specifications.
Just follow what it says in the doc (the same php://input example quoted in an earlier answer): read the PUT data from php://input and write it to a local file.
This should read the whole file that is on the PUT stream and save it locally, then you could do what you want with it.
I have been made aware of the Accept-Ranges header.
I have a URL that I am calling that always returns a 2mb file. I don't need that much, and only need the last 20-50k.
I am not sure how to go about using it. Would I need to use cURL? I am currently using file_get_contents().
Would someone be able to provide me with an example / tutorial?
Thanks.
EDIT: If this isn't possible, then what is this post on about? Here ...
EDIT: Ulrika! I'm not insane.
This is possible using the Range header, provided the server supports it. See the HTTP 1.1 spec. You would want to send a header in the following format in your request:
Range: bytes=-50000
This would give you the last 50,000 bytes. Adjust to whatever you need.
You can specify this header in file_get_contents using a context. For example:
// Create a stream
$opts = array(
'http'=>array(
'method' => "GET",
'header' => "Range: bytes=-50000\r\n"
)
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.example.com/', false, $context);
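One caveat: a server that doesn't support ranges will just ignore the header and send the whole document with a 200 instead of a 206 Partial Content, so it's worth checking the status line that PHP exposes in $http_response_header after the call:
// after the file_get_contents() call above:
if (strpos($http_response_header[0], '206') === false) {
    // no "206 Partial Content" in the status line: the server ignored
    // the Range header and $file holds the full document
}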
If you were to file_get_contents() the URL and dump that to a passthrough 'cache' file on disk, then you could use the unix/linux tail -c to grab back only the last 20kb or so. This doesn't avoid the full transfer, but it gets just that 20kb into the application.
This is indeed possible - see this question for an example of the HTTP headers sent and received
You can't do that. You're going to have to load the entire file (which is sent in its entirety, sequentially, by the source server), and just discard most of it.
What you're asking is like "I'm tuning to this radio station on my car stereo and I only want to hear the last 5 minutes of the show, without having to wait for the rest to complete or change channels".
I'm trying to get the contents from another file with file_get_contents (don't ask why).
I have two files: test1.php and test2.php. test1.php returns a string, based on the user that is logged in.
test2.php tries to get the contents of test1.php; it is executed by the browser, so it has the user's cookies.
To send the cookies with file_get_contents, I create a streaming context:
$opts = array('http' => array('header' => 'Cookie: ' . $_SERVER['HTTP_COOKIE'] . "\r\n"));
I'm retrieving the contents with:
$contents = file_get_contents("http://www.example.com/test1.php", false, $opts);
But now I get the error:
Warning: file_get_contents(http://www.example.com/test1.php) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found
Does somebody know what I'm doing wrong here?
edit:
Forgot to mention: without the stream context, the page just loads. But without the cookies I don't get the info I need.
First, this is probably just a typo in your question, but the third argument to file_get_contents() needs to be your stream context, NOT the array of options. I ran a quick test with something like this, and everything worked as expected:
$opts = array('http' => array('header'=> 'Cookie: ' . $_SERVER['HTTP_COOKIE']."\r\n"));
$context = stream_context_create($opts);
$contents = file_get_contents('http://example.com/test1.txt', false, $context);
echo $contents;
The error indicates the server is returning a 404. Try fetching the URL from the machine PHP is running on and not from your workstation/desktop/laptop. It may be that your web server is having trouble reaching the site, your local machine has a cached copy, or some other network screwiness.
Be sure you repeat your exact request when running this test, including the cookie you're sending (command line curl is good for this). It's entirely possible that the page in question may load fine in a browser without the cookie, but when you send the cookie the site actually is returning a 404.
Make sure that $_SERVER['HTTP_COOKIE'] has the raw cookie you think it does.
If you're screen scraping, download Firefox and a copy of the LiveHTTPHeaders extension. Perform all the necessary steps to reach whatever page it is you want in Firefox. Then, using the output from LiveHTTPHeaders, recreate the exact same request sequence. Include every header, not just the cookies.
Finally, PHP Curl exists for a reason. If at all possible, (I'm not asking!) use it instead. :)
Just to share this information.
When using session_start(), the session file is locked by PHP, so the current script is the only one that can access it. If you then request another of your own pages via fsockopen() or file_get_contents() and that page starts the same session, you can wait a long time, since it is trying to open a session file that is still locked.
One way to solve this problem is to call session_write_close() to unlock the file, then relock it afterwards with session_start().
Example:
<?php
$opts = array('http' => array('header'=> 'Cookie: ' . $_SERVER['HTTP_COOKIE']."\r\n"));
$context = stream_context_create($opts);
session_write_close(); // unlock the file
$contents = file_get_contents('http://127.0.0.1/controler.php?c=test_session', false, $context);
session_start(); // Lock the file
echo $contents;
?>
Since file_get_contents() is a blocking function, the two scripts won't try to modify the session file concurrently.
But I'm sure this is not the best way to manipulate a session over an external connection.
Btw: it's faster than cURL and fsockopen()
Let me know if you find something better.
Just out of curiosity, are you attempting file_get_contents on a page that has a space in it? I remember trying to use fgc on a URL that had a space in the name and while my web browser parsed it just fine, fgc didn't. I ended up having to use a str_replace to replace ' ' with '%20'.
I would think this should be relatively easy to spot, though, as the error would report only half of the filename. Also, I noticed in one of these posts someone used \r\n while defining the headers. Keep in mind that PHP doesn't interpret these escapes in single quotes, but they work fine in double quotes.
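For reference, a slightly more general fix than a one-off str_replace is to encode each path segment (a sketch; the URL is made up):
$path = '/files/my document.png';
// rawurlencode() each segment (space becomes %20), keeping the slashes intact
$encoded = implode('/', array_map('rawurlencode', explode('/', $path)));
$contents = file_get_contents('http://www.example.com' . $encoded);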
Make sure that test1.php exists on the server. Try opening it in your own browser to make sure!
Given a list of urls, I would like to check that each url:
Returns a 200 OK status code
Returns a response within X amount of time
The end goal is a system that is capable of flagging urls as potentially broken so that an administrator can review them.
The script will be written in PHP and will most likely run on a daily basis via cron.
The script will be processing approximately 1000 urls at a go.
Question has two parts:
Are there any big-time gotchas with an operation like this? What issues have you run into?
What is the best method for checking the status of a url in PHP considering both accuracy and performance?
Use the PHP cURL extension. Unlike fopen(), it can also make HTTP HEAD requests, which are sufficient to check the availability of a URL and save you a ton of bandwidth, as you don't have to download the entire body of the page.
As a starting point you could use some function like this:
function is_available($url, $timeout = 30) {
$ch = curl_init(); // get cURL handle
// set cURL options
$opts = array(CURLOPT_RETURNTRANSFER => true, // do not output to browser
CURLOPT_URL => $url, // set URL
CURLOPT_NOBODY => true, // do a HEAD request only
CURLOPT_TIMEOUT => $timeout); // set timeout
curl_setopt_array($ch, $opts);
curl_exec($ch); // do it!
$retval = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200; // check if HTTP OK
curl_close($ch); // close handle
return $retval;
}
However, there's a ton of possible optimizations: You might want to re-use the cURL instance and, if checking more than one URL per host, even re-use the connection.
Oh, and this code checks strictly for HTTP response code 200. It does not follow redirects (302), but there is also a cURL option for that (CURLOPT_FOLLOWLOCATION).
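A sketch of both of those tweaks, re-using one handle for every URL and following redirects so a 301/302 that ends on a 200 still counts as available:
function check_urls(array $urls, $timeout = 30) {
    $ch = curl_init(); // one handle, re-used for every URL (and, per host, the connection)
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_NOBODY         => true,  // HEAD requests only
        CURLOPT_FOLLOWLOCATION => true,  // follow 301/302 redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => $timeout,
    ));
    $results = array();
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_exec($ch);
        $results[$url] = (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200);
    }
    curl_close($ch);
    return $results;
}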
Look into cURL. There's a library for PHP.
There's also an executable version of cURL so you could even write the script in bash.
I actually wrote something in PHP that does this over a database of 5k+ URLs. I used the PEAR class HTTP_Request, which has a method called getResponseCode(). I just iterate over the URLs, send a request for each, and evaluate the response code.
However, it doesn't work for FTP addresses, URLs that don't begin with http or https (unconfirmed, but I believe it's the case), or sites with invalid security certificates (a 0 is returned). A 0 is also returned for server-not-found (there's no status code for that).
And it's probably easier than cURL, as you just include a few files and use a single method to get an integer code back.
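From memory, the usage looks roughly like this (a sketch against the old PEAR API; treat the details as approximate):
require_once 'HTTP/Request.php';

$req = new HTTP_Request('http://www.example.com/');
$result = $req->sendRequest();
if (PEAR::isError($result)) {
    $code = 0; // server not found, bad certificate, etc.
} else {
    $code = $req->getResponseCode(); // e.g. 200
}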
fopen() supports HTTP URIs.
If you need more flexibility (such as timeout), look into the cURL extension.
Seems like it might be a job for curl.
If you're not stuck on PHP, Perl's LWP might be an answer too.
You should also be aware of URLs returning 301 or 302 HTTP responses which redirect to another page. Generally this doesn't mean the link is invalid. For example, http://amazon.com returns 301 and redirects to http://www.amazon.com/.
Just returning a 200 response is not enough; many valid links will continue to return "200" after they change into porn / gambling portals when the former owner fails to renew the domain.
Domain squatters typically ensure that every URL in their domains returns 200.
One potential problem you will undoubtedly run into is when the box this script is running on loses access to the Internet... you'll get 1000 false positives.
It would probably be better for your script to keep some type of history and only report a failure after 5 days of failure.
Also, the script should be self-checking in some way (like checking a known good web site [google?]) before continuing with the standard checks.
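That self-check could reuse the is_available() helper from the cURL answer above against a site you trust to be up (google is just the example already mentioned):
// abort the whole run rather than flag 1000 false positives
if (!is_available('http://www.google.com/')) {
    error_log('URL checker: no internet access, skipping this run');
    exit(1);
}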
You only need a bash script to do this. Please check my answer on a similar post here. It is a one-liner that reuses HTTP connections to dramatically improve speed, retries n times for temporary errors and follows redirects.