Stream context in PHP - what is it? - php

I have searched for hours and I cannot figure out what a 'stream context' in PHP is. I'm trying to use an API and it involves using this 'stream context'.
The documentation says:
A context is a set of parameters and wrapper specific options which modify or enhance the behavior of a stream.
A parameter of what?
What is meant by an option being 'specific to a wrapper'?
What stream?
Here is the code I'm talking about:
// Encode the credentials and create the stream context.
$auth = base64_encode("$acctKey:$acctKey");
$data = array(
'http' => array(
'request_fulluri' => true,
// ignore_errors can help debug – remove for production. This option added in PHP 5.2.10
'ignore_errors' => true,
'header' => "Authorization: Basic $auth")
);
$context = stream_context_create($data);
// Get the response from Bing.
$response = file_get_contents($requestUri, 0, $context);

It took me a while to understand the stream contexts options and wrappers of PHP. I wrote an article about what helped me finally wrap my brain around how to understand PHP stream contexts options and wrappers. I hope it helps.
To properly handle whatever is coming down the line (streamed data), you will need the appropriate code to handle the different kinds of items being passed (data types). The tools for handling each different kind of data type are the “parameters”.
The “context” is determined by what is being pass along (streamed). So for different “contexts” (kinds of items) being “streamed” (passed along) the “parameters” (required tools for handling) the “data type” (kind of item) will change.
The term context simply makes reference to the fact that for different data types the situation is unique with its own required parameters.
The PHP stream wrapper would require a context in order to know which parameters are needed to handle the data type.

A parameter of the context that modifies the properties of the stream.
The options are specific to whatever wrapper the stream is using. Examples of these include files, all the different php:// URIs, the HTTP wrapper (like when you do file_get_contents('http://example.com') — it’s not the same thing as file_get_contents('some-file.txt'))
Any stream!
In this case, the stream context is passed to file_get_contents to tell it to send that authorization header and those options to the wrapper that allows file_get_contents to get contents from HTTP URLs.
You can find a list of the HTTP context options on the PHP website.

http, request_fulluri, ignore_errors, header are all parameters.
They change the way the function (file_get_contents in this case) works.
An option that is specific to a wrapper is something like 'http' --
you wouldn't use that on a filesystem file stream since it's not applicable.
The stream is the transfer of data itself which occurs when file_get_contents opens the connection, transfers everything, etc...

Related

When is a StreamContext reusable ? And when should it not be reused?

I'm passing from http to https, and therefore I have to add a StreamContext to several read_file and get_file_contents calls.
I need to replace
read_file('http://'.$host.$uri);
by
$stream_context = stream_context_create([
/* some lenghty options array */
]);
read_file('https://'.$host.$uri, false, $stream_context);
Now my question: Is a $stream_context reusable like this:
$stream_context = stream_context_create([
/* some lenghty options array */
]);
read_file('https://'.$host.$uri, false, $stream_context);
get_file_contents($another_url, false, $stream_context);
read_file($even_another, false, $stream_context);
or do I need to recreate a new StreamContext for each URL ?
Asked differently: Is a stream context just a descriptor for parameters and options, or does it get bound to the resource when using it ?
Edit: It seems from the comments, that one can reuse StreamContext often, but not always. This is not quite satisfactory as an answer.
When can or should it be reused, and when can't it be reused ? Can someone shed some light on the internal working of StreamContext. The documentation looks quite sparse to me.
stream contexts are re-usable and they can be re-used always, not often.
The comment from #ilpaijin pointing to "unpredicted behaviour comment" is simple a misunderstanding of the author leaving the comment.
When you specify your context for HTTP wrapper, you specify the wrapper as HTTP regardless of schema you are targeting, meaning there is no such thing as HTTPS wrapper.
If you try to do the following:
"https" => [
// options will not be applied to HTTPS stream as there is no such wrapper (https)
]
The correct way:
"http" => [
// options will apply to http:// and https:// streams.
]
When should/could re-use?
It's really up to you and up to the logic you are trying to implement.
Don't forget you have default context set for all native PHP wrappers.
The example you have posted where you have the same context stream being passed to 3 different call s is unnecessary, simple use stream_context_set_default and set the default context for request originating from your code.
There are certain situations where you set the default but for one particular request you want to have different context, this would be a good idea to create another stream and pass it in.
Does the stream context contain state, like for instance cookies or tls initial negotiation that are passes from one call to another?
Stream context does not contain state, however you could achieve a mock like this with additional code. Any state, let it be cookie or TLS handshake, are simply request headers. You would need to read that information from incoming request and set it in the stream, and then pass that stream to other request, thus mocking "the state" of parent request. That being said - don't do it, just use CURL.
On a side, the real power of streams is creating your own/custom stream. The header manipulation and state control are much easier (and better) achieved with CURL.
It apparently serves as a connection object (same logic like with database connection) and can be reused in a similar way:
<?php
$default_opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie: foo=bar",
'proxy'=>"tcp://10.54.1.39:8000"
)
);
$alternate_opts = array(
'http'=>array(
'method'=>"POST",
'header'=>"Content-type: application/x-www-form-urlencoded\r\n" .
"Content-length: " . strlen("baz=bomb"),
'content'=>"baz=bomb"
)
);
$default = stream_context_get_default($default_opts);
$alternate = stream_context_create($alternate_opts);
/* Sends a regular GET request to proxy server at 10.54.1.39
* For www.example.com using context options specified in $default_opts
*/
readfile('http://www.example.com');
/* Sends a POST request directly to www.example.com
* Using context options specified in $alternate_opts
*/
readfile('http://www.example.com', false, $alternate);
?>
It appears that you can. I used xdebug_debug_zval and ran some simple tests to see if PHP was retaining it internally (I used PHP 7.1.3 with xdebug on an internal development server)
$context = stream_context_create(['https' => ['method' => 'GET']]);
xdebug_debug_zval('context');
$stream = file_get_contents('https://secure.php.net/manual/en/function.file-get-contents.php', false, $context);
xdebug_debug_zval('context');
$stream = fopen('https://secure.php.net/', 'r', false, $context);
xdebug_debug_zval('context');
What I got back was
context:
(refcount=1, is_ref=0)resource(2, stream-context)
context:
(refcount=1, is_ref=0)resource(2, stream-context)
context:
(refcount=2, is_ref=0)resource(2, stream-context)
Interestingly, the second call increased the refcount, meaning it was passed by reference internally. Even unsetting $stream didn't remove it or prevent me from calling it again.
Edit
Since the question was modified...
A context creates a resource data type. Because it contains an instance of PHP data, it is passed by reference implicitly, meaning that PHP is passing the internal data directly and not simply making a copy of it. There's no native way to destroy a context.
I agree with above answers, stream_context_create() will create and return handle to resource by taking option parameters for a connection. This can be re-used to different resources, as it is a handle. Does not matter, where it is used but needs to have handle within the request.

Is there a way for retrieving an external DOM in PHP (with loadHTMLfile) including some headers in the html call?

I'm trying to get some DOM from external URLs while passing some headers, so the received DOM must be different depending on them.
For instance, if I have a mobile device header, the received DOM should be completely different than from a PC header.
At the moment I haven't found the way to do this, as I'm always receiving the same DOM (with my default headers).
Thank you in advance.
You can specify a so called stream context for the load operation. Check this example, which is derived from the PHP manual
// Specify headers
$opts = array(
'http' => array(
'user_agent' => 'PHP libxml agent',
)
);
// Create and assign a stream context
$context = stream_context_create($opts);
libxml_set_streams_context($context);
// request the HTML file through HTTP
$doc = DOMDocument::loadHTMLFile('http://www.example.com/file.html');
You can follow theses articles in the PHP manual:
DOMDocument::loadHTMLFile()
stream_context_create()
libxml_set_streams_context()
Supported Protocols and Wrappers
HTTP context options

How to receive a file via HTTP PUT with PHP

This is something that has been bugging me for a while.. I'm building of a RESTful API that has to receive files on some occasions.
When using HTTP POST, we can read data from $_POST and files from $_FILES.
When using HTTP GET, we can read data from $_GET and files from $_FILES.
However, when using HTTP PUT, AFAIK the only way to read data is to use the php://input stream.
All good and well, untill I want to send a file over HTTP PUT. Now the php://input stream doesn't work as expected anymore, since it has a file in there as well.
Here's how I currently read data on a PUT request:
(which works great as long as there are no files posted)
$handle = fopen('php://input', 'r');
$rawData = '';
while ($chunk = fread($handle, 1024)) {
$rawData .= $chunk;
}
parse_str($rawData, $data);
When I then output rawData, it shows
-----ZENDHTTPCLIENT-44cf242ea3173cfa0b97f80c68608c4c
Content-Disposition: form-data; name="image_01"; filename="lorem-ipsum.png"
Content-Type: image/png; charset=binary
�PNG
���...etc etc...
���,
-----ZENDHTTPCLIENT-8e4c65a6678d3ef287a07eb1da6a5380
Content-Disposition: form-data; name="testkey"
testvalue
-----ZENDHTTPCLIENT-8e4c65a6678d3ef287a07eb1da6a5380
Content-Disposition: form-data; name="otherkey"
othervalue
Does anyone know how to properly receive files over HTTP PUT, or how to parse files out of the php://input stream?
===== UPDATE #1 =====
I have tried only the above method, don't really have a clue as to what I can do else.
I have gotten no errors using this method, besides that I don't get the desired result of the posted data and files.
===== UPDATE #2 =====
I'm sending this test request using Zend_Http_Client, as follows:
(haven't had any problems with Zend_Http_Client so far)
$client = new Zend_Http_Client();
$client->setConfig(array(
'strict' => false,
'maxredirects' => 0,
'timeout' => 30)
);
$client->setUri( 'http://...' );
$client->setMethod(Zend_Http_Client::PUT);
$client->setFileUpload( dirname(__FILE__) . '/files/lorem-ipsum.png', 'image_01');
$client->setParameterPost(array('testkey' => 'testvalue', 'otherkey' => 'othervalue');
$client->setHeaders(array(
'api_key' => '...',
'identity' => '...',
'credential' => '...'
));
===== SOLUTION =====
Turns out I made some wrong assumptions, mainly that HTTP PUT would be similar to HTTP POST. As you can read below, DaveRandom explained to me that HTTP PUT is not meant for transferring multiple files on the same request.
I have now moved the transferring of formdata from the body to url querystring. The body now holds the contents of a single file.
For more information, read DaveRandom's answer. It's epic.
The data you show does not depict a valid PUT request body (well, it could, but I highly doubt it). What it shows is a multipart/form-data request body - the MIME type used when uploading files via HTTP POST through an HTML form.
PUT requests should exactly compliment the response to a GET request - they send you the file contents in the message body, and nothing else.
Essentially what I'm saying is that it is not your code to receive the file that is wrong, it is the code that is making the request - the client code is incorrect, not the code you show here (although the parse_str() call is a pointless exercise).
If you explain what the client is (a browser, script on other server, etc) then I can help you take this further. As it is, the appropriate request method for the request body that you depict is POST, not PUT.
Let's take a step back from the problem, and look at the HTTP protocol in general - specifically the client request side - hopefully this will help you understand how all of this is supposed to work. First, a little history (if you're not interested in this, feel free to skip this section).
History
HTTP was originally designed as a mechanism for retrieving HTML documents from remote servers. At first it effectively supported only the GET method, whereby the client would request a document by name and the server would return it to the client. The first public specification for HTTP, labelled as HTTP 0.9, appeared in 1991 - and if you're interested, you can read it here.
The HTTP 1.0 specification (formalised in 1996 with RFC 1945) expanded the capabilities of the protocol considerably, adding the HEAD and POST methods. It was not backwards compatible with HTTP 0.9, due to a change in the format of the response - a response code was added, as well as the ability to include metadata for the returned document in the form of MIME format headers - key/value data pairs. HTTP 1.0 also abstracted the protocol from HTML, allowing for the transfer of files and data in other formats.
HTTP 1.1, the form of the protocol that is almost exclusively in use today is built on top of HTTP 1.0 and was designed to be backwards compatible with HTTP 1.0 implementations. It was standardised in 1999 with RFC 2616. If you are a developer working with HTTP, get to know this document - it is your bible. Understanding it fully will give you a considerable advantage over your peers who do not.
Get to the point already
HTTP works on a request-response architecture - the client sends a request message to the server, the server returns a response message to the client.
A request message includes a METHOD, a URI and optionally, a number of HEADERS. The request METHOD is what this question relates to, so it is what I will cover in the most depth here - but first it is important to understand exactly what we mean when we talk about the request URI.
The URI is the location on the server of the resource we are requesting. In general, this consists of a path component, and optionally a query string. There are circumstances where other components may be present as well, but for the purposes of simplicity we shall ignore them for now.
Let's imagine you type http://server.domain.tld/path/to/document.ext?key=value into the address bar of your browser. The browser dismantles this string, and determines that it needs to connect to an HTTP server at server.domain.tld, and ask for the document at /path/to/document.ext?key=value.
The generated HTTP 1.1 request will look (at a minimum) like this:
GET /path/to/document.ext?key=value HTTP/1.1
Host: server.domain.tld
The first part of the request is the word GET - this is the request METHOD. The next part is the path to the file we are requesting - this is the request URI. At the end of this first line is an identifier indicating the protocol version in use. On the following line you can see a header in MIME format, called Host. HTTP 1.1 mandates that the Host: header be included with every request. This is the only header of which this is true.
The request URI is broken into two parts - everything to the left of the question mark ? is the path, everything to the right of it is the query string.
Request Methods
RFC 2616 (HTTP/1.1) defines 8 request methods.
OPTIONS
The OPTIONS method is rarely used. It is intended as a mechanism for determining what kind of functionality the server supports before attempting to consume a service the server may provide.
Off the top of my head, the only place in fairly common usage that I can think of where this is used is when opening documents in Microsoft office directly over HTTP from Internet Explorer - Office will send an OPTIONS request to the server to determine if it supports the PUT method for the specific URI, and if it does it will open the document in a way that allows the user to save their changes to the document directly back to the remote server. This functionality is tightly integrated within these specific Microsoft applications.
GET
This is by far and away the most common method in every day usage. Every time you load a regular document in your web browser it will be a GET request.
The GET method requests that the server return a specific document. The only data that should be transmitted to the server is information that the server requires to determine which document should be returned. This can include information that the server can use to dynamically generate the document, which is sent in the form of headers and/or query string in the request URI. While we're on the subject - Cookies are sent in the request headers.
HEAD
This method is identical to the GET method, with one difference - the server will not return the requested document, if will only return the headers that would be included in the response. This is useful for determining, for example, if a particular document exists without having to transfer and process the entire document.
POST
This is the second most commonly used method, and arguably the most complex. POST method requests are almost exclusively used to invoke some actions on the server that may change its state.
A POST request, unlike GET and HEAD, can (and usually does) include some data in the body of the request message. This data can be in any format, but most commonly it is a query string (in the same format as it would appear in the request URI) or a multipart message that can communicate key/value pairs along with file attachments.
Many HTML forms use the POST method. In order to upload files from a browser, you would need to use the POST method for your form.
The POST method is semantically incompatible with RESTful APIs because it is not idempotent. That is to say, a second identical POST request may result in a further change to the state of the server. This contradicts the "stateless" constraint of REST.
PUT
This directly complements GET. Where a GET requests indicates that the server should return the document at the location specified by the request URI in the response body, the PUT method indicates that the server should store the data in the request body at the location specified by the request URI.
DELETE
This indicates that the server should destroy the document at the location indicated by the request URI. Very few internet facing HTTP server implementations will perform any action when they receive a DELETE request, for fairly obvious reasons.
TRACE
This provides an application-layer level mechanism to allow clients to inspect the request it has sent as it looks by the time it reaches the destination server. This is mostly useful for determining the effect that any proxy servers between the client and the destination server may be having on the request message.
CONNECT
HTTP 1.1 reserves the name for a CONNECT method, but does not define its usage, or even its purpose. Some proxy server implementations have since used the CONNECT method to facilitate HTTP tunnelling.
I've never tried using PUT (GET POST and FILES were sufficient for my needs) but this example is from the php docs so it might help you (http://php.net/manual/en/features.file-upload.put-method.php):
<?php
/* PUT data comes in on the stdin stream */
$putdata = fopen("php://input", "r");
/* Open a file for writing */
$fp = fopen("myputfile.ext", "w");
/* Read the data 1 KB at a time
and write to the file */
while ($data = fread($putdata, 1024))
fwrite($fp, $data);
/* Close the streams */
fclose($fp);
fclose($putdata);
?>
Here is the solution that I found to be the most useful.
$put = array();
parse_str(file_get_contents('php://input'), $put);
$put will be an array, just like you are used to seeing in $_POST, except now you can follow true REST HTTP protocol.
Use POST and include an X- header to indicate the actual method (PUT in this case). Usually this is how one works around a firewall which does not allow methods other than GET and POST. Simply declare PHP buggy (since it refuses to handle multipart PUT payloads, it IS buggy), and treat it as you would an outdated/draconian firewall.
The opinions as to what PUT means in relation to GET are just that, opinions. The HTTP makes no such requirement. It simply states 'equivalent' .. it is up to the designer to determine what 'equivalent' means. If your design can accept a multi-file upload PUT and produce an 'equivalent' representation for a subsequent GET for the same resource, that's just fine and dandy, both technically and philosophically, with the HTTP specifications.
Just follow what it says in the DOC:
<?php
/* PUT data comes in on the stdin stream */
$putdata = fopen("php://input", "r");
/* Open a file for writing */
$fp = fopen("myputfile.ext", "w");
/* Read the data 1 KB at a time
and write to the file */
while ($data = fread($putdata, 1024))
fwrite($fp, $data);
/* Close the streams */
fclose($fp);
fclose($putdata);
?>
This should read the whole file that is on the PUT stream and save it locally, then you could do what you want with it.

How to add a parameter to the request header sent by a PHP script?

I'm trying to use a web service REST API for which I need to add a parameter for authorization (with the appropriate key, of course) to get a XML result. I'm developing in PHP. How can I add a parameter to the request header in such a situation?
Edit: The way I'm doing the request right now is $xml = simplexml_load_file($query_string);
Are you using curl? (recommended)
I assume that you are using curl to do these requests towards the REST API, if you aren't; use it.
When using curl you can add a custom header by calling curl_setopt with the appropriate parameters, such as in below.
curl_setopt (
$curl_handle, CURLOPT_HTTPHEADER,
array ('Authentication-Key: foobar')
); // make curl send a HTTP header named 'Authentication-key'
// with the value 'foobar'
Documentation:
PHP: cURL - Manual
PHP: curl_setopt - Manual
Are you using file_get_contents or similar?
This method is not recommended, though it is functional.
Note: allow_url_fopen needs to be enabled for file_get_contents to be able to access resources over HTTP.
If you'd like to add a custom header to such request you'll need to create yourself a valid stream context, as in the below snippet:
$context_options = array(
'http' =>array (
'method' => 'GET',
'header' => 'Authentication-Key'
)
);
$context = stream_context_create ($context_options);
$response = file_get_contents (
'http://www.stackoverflow.com', false, $context_options
);
Documentation:
PHP: file_get_contents - Manual
PHP: stream_context_create - Manual
PHP: Runtime Configuration, allow_url_fopen
I'm using neither of the above solutions, what should I do?
[Post OP EDIT]
My recommendation is to fetch the data using curl and then pass it off to the parser in question when all the data is received. Separate data fetching from the processing of the returned data.
[/Post OP EDIT]
When you use $xml = simplexml_load_file($query_string);, the PHP interpreter invokes it's wrapper over fopen to open the contents of a file located at $query_string. If $query_string is a remote file, the PHP interpreter opens a stream to that remote URL and retrieves the contents of the file there (if the HTTP response code 200 OK). It uses the default stream context to do that.
There is a way to alter the headers sent by altering that stream context, however, in most cases, this is a bad idea. You're relying on PHP to always open all files, local or remote, using a function that was meant to take a local file name only. Not only is it a security problem but it also could be the source of a bug that is very hard to track down.
Instead, consider splitting the loading of the remote content using cURL (checking the returned HTTP status code and other sanity checks) and then parsing that content into a SimpleXMLElement object to use. When you use cURL, you can set any headers you want to send with the request by invoking something similar to curl_setopt($ch, CURLOPT_HTTPHEADER, array('HeaderName' => 'value');
Hope this helps.

How to load a URL and only get back the last 20k of it

I have been made aware of the Accept-Range header.
I have a URL that I am calling that always returns a 2mb file. I don't need this much and only need the last section 20-50k.
I am not sure how to go about using it? Would I need to use cURL? I am currently using file_get_contents().
Would someone be able to provide me with an example / tutorial?
Thanks.
EDIT: If this isn't possible then what is post on about? Here ...
EDIT: Ulrika! I'm not insane.
This is possible using the Range header, provided the server supports it. See the HTTP 1.1 spec. You would want to send a header in the following format in your request:
Range: bytes=-50000
This would give you the last 50,000 bytes. Adjust to whatever you need.
You can specify this header in file_get_contents using a context. For example:
// Create a stream
$opts = array(
'http'=>array(
'method' => "GET",
'header' => "Range: bytes=-50000\r\n"
)
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.example.com/', false, $context);
If you were to file_get_contents() and dump that to a passthrough 'cache' file on disk, then you could use the unix/linux tail -c to only grab back the last 20kb or so. This doesn't mitigate the actual transfer, but gets that 20kb into the application.
This is indeed possible - see this question for an example of the HTTP headers sent and received
you can't do that. You're going to have to load the entire file (which is sent in its entirety, sequentially, by the source server), and just discard most of it.
What you're asking is like "I'm tuning to this radio station on my car stereo and I only want to hear the last 5 minutes of the show, without having to wait for the rest to complete or change channels".

Categories