When is a StreamContext reusable, and when should it not be reused? - php

I'm moving from http to https, and therefore I have to add a stream context to several readfile and file_get_contents calls.
I need to replace
readfile('http://'.$host.$uri);
with
$stream_context = stream_context_create([
/* some lengthy options array */
]);
readfile('https://'.$host.$uri, false, $stream_context);
Now my question: Is a $stream_context reusable like this:
$stream_context = stream_context_create([
/* some lengthy options array */
]);
readfile('https://'.$host.$uri, false, $stream_context);
file_get_contents($another_url, false, $stream_context);
readfile($even_another, false, $stream_context);
or do I need to create a new StreamContext for each URL?
Asked differently: is a stream context just a descriptor for parameters and options, or does it get bound to the resource when it is used?
Edit: It seems from the comments that one can reuse a StreamContext often, but not always. This is not quite satisfactory as an answer.
When can or should it be reused, and when can't it be? Can someone shed some light on the internal workings of StreamContext? The documentation looks quite sparse to me.

Stream contexts are reusable, and they can always be reused, not just often.
The comment from @ilpaijin pointing to the "unpredicted behaviour" comment simply reflects a misunderstanding by the author of that comment.
When you specify a context for the HTTP wrapper, the wrapper name is http regardless of the scheme you are targeting; there is no such thing as an https wrapper.
If you try to do the following:
"https" => [
// options will not be applied to HTTPS stream as there is no such wrapper (https)
]
The correct way:
"http" => [
// options will apply to http:// and https:// streams.
]
When should or could you reuse a context?
It's really up to you and the logic you are trying to implement.
Don't forget that a default context is set for all native PHP wrappers.
The example you posted, where the same context is passed to three different calls, is unnecessary; simply use stream_context_set_default to set the default context for requests originating from your code.
There are situations where you set the default but want a different context for one particular request; in that case it is a good idea to create another context and pass it in.
Does the stream context contain state, for instance cookies or the initial TLS negotiation, that is passed from one call to another?
A stream context does not contain state, although you could mock it with additional code. Cookies are simply headers: you would need to read them from the response to one request, set them in the context, and then pass that context to the next request, thus mocking "the state" of the earlier request. A TLS handshake is not a header at all; it is negotiated per connection. That being said, don't do it; just use cURL.
As an aside, the real power of streams is in creating your own custom streams. Header manipulation and state control are much easier (and better) achieved with cURL.
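As a rough illustration of the reuse described above, here is a minimal sketch. It assumes nothing beyond core PHP; local temporary files stand in for the https:// URLs so that it runs without network access, and the option values are placeholders.

```php
<?php
// One context, reused for several reads.
$options = [
    'http' => [                        // the "http" key also covers https:// URLs
        'method'  => 'GET',
        'timeout' => 5,
        'header'  => "Accept-language: en\r\n",
    ],
];
$context = stream_context_create($options);

// Temporary files play the role of the remote resources (an assumption
// made only to keep the sketch runnable offline).
$first  = tempnam(sys_get_temp_dir(), 'ctx');
$second = tempnam(sys_get_temp_dir(), 'ctx');
file_put_contents($first, 'first body');
file_put_contents($second, 'second body');

// The same context is passed to two independent calls.
$a = file_get_contents($first, false, $context);
$b = file_get_contents($second, false, $context);

// The options read back from the context still match what was set,
// so a further call would see the same settings.
$unchanged = (stream_context_get_options($context) == $options);

unlink($first);
unlink($second);
```

With real URLs the calls look the same; only the first argument changes.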

It apparently serves as a connection object (the same logic as with a database connection) and can be reused in a similar way:
<?php
$default_opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie: foo=bar",
'proxy'=>"tcp://10.54.1.39:8000"
)
);
$alternate_opts = array(
'http'=>array(
'method'=>"POST",
'header'=>"Content-type: application/x-www-form-urlencoded\r\n" .
"Content-length: " . strlen("baz=bomb"),
'content'=>"baz=bomb"
)
);
$default = stream_context_get_default($default_opts);
$alternate = stream_context_create($alternate_opts);
/* Sends a regular GET request to proxy server at 10.54.1.39
* For www.example.com using context options specified in $default_opts
*/
readfile('http://www.example.com');
/* Sends a POST request directly to www.example.com
* Using context options specified in $alternate_opts
*/
readfile('http://www.example.com', false, $alternate);
?>

It appears that you can. I used xdebug_debug_zval and ran some simple tests to see if PHP was retaining it internally (I used PHP 7.1.3 with xdebug on an internal development server)
$context = stream_context_create(['http' => ['method' => 'GET']]); // options go under "http", even for https:// URLs
xdebug_debug_zval('context');
$stream = file_get_contents('https://secure.php.net/manual/en/function.file-get-contents.php', false, $context);
xdebug_debug_zval('context');
$stream = fopen('https://secure.php.net/', 'r', false, $context);
xdebug_debug_zval('context');
What I got back was
context:
(refcount=1, is_ref=0)resource(2, stream-context)
context:
(refcount=1, is_ref=0)resource(2, stream-context)
context:
(refcount=2, is_ref=0)resource(2, stream-context)
Interestingly, the refcount went up after the fopen call, which suggests that the open stream keeps its own reference to the context. Even unsetting $stream didn't remove the context or prevent me from using it again.
Edit
Since the question was modified...
A context is a resource data type. A resource variable holds a handle to data managed internally by PHP, so passing it around hands over the same underlying context rather than a copy of it. There's no dedicated function to destroy a context.

I agree with the answers above: stream_context_create() takes the option parameters for a connection and returns a handle to a resource. Because it is a handle, it can be reused for different resources. It does not matter where it is used, as long as the handle is available within the request.
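The "handle" behaviour mentioned in these answers can be checked with a small sketch like the following. It only uses core stream functions; the variable names are illustrative. One practical consequence for reuse: changing an option on a shared context affects every later call that uses it.

```php
<?php
$context = stream_context_create(['http' => ['method' => 'GET']]);

// A context is a resource, so copying the variable copies the handle,
// not the options behind it.
$alias = $context;
$type  = get_resource_type($context);

// Changing an option through one variable is visible through the other.
stream_context_set_option($alias, 'http', 'method', 'POST');
$method = stream_context_get_options($context)['http']['method'];
```

If one request needs different options, creating a second context is safer than modifying the shared one.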

Related

file_get_contents VS dom->loadHTMLFile

I've been making a PHP crawler that needs to get all links from a site and fire those links (instead of clicking it manually or doing client-side JS).
I have read these:
How do I make a simple crawler in PHP?
How do you parse and process HTML/XML in PHP?
and others more, and I decided to follow 1.
So far it has been working, but I have been baffled by the difference in the approach of using file_get_contents against dom->loadHTMLFile. Can you please enlighten me with these and the implications it might cause, pros and cons, or simple versus scenario.
Effectively these methods do the same thing. However, with file_get_contents() you need to store the result, at least temporarily, in a string variable unless you pass it directly to DOMDocument::loadHTML(). This leads to higher memory usage in your application.
Some sites may require you to set special header values or to use an HTTP method other than GET. For this you need to specify a so-called stream context. You can do this for both of the above methods using stream_context_create():
Example:
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie: foo=bar\r\n"
)
);
$ctx = stream_context_create($opts);
You can set this context using both of the above ways, but they differ in how to achieve this:
// With file_get_contents ...
file_get_contents($url, false, $ctx);
// With DOM
libxml_set_streams_context($ctx);
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
It remains to be said that the curl extension gives you even more control over the HTTP transfer, which might be necessary in some special cases.

Stream context in PHP - what is it?

I have searched for hours and I cannot figure out what a 'stream context' in PHP is. I'm trying to use an API and it involves using this 'stream context'.
The documentation says:
A context is a set of parameters and wrapper specific options which modify or enhance the behavior of a stream.
A parameter of what?
What is meant by an option being 'specific to a wrapper'?
What stream?
Here is the code I'm talking about:
// Encode the credentials and create the stream context.
$auth = base64_encode("$acctKey:$acctKey");
$data = array(
'http' => array(
'request_fulluri' => true,
// ignore_errors can help debug – remove for production. This option added in PHP 5.2.10
'ignore_errors' => true,
'header' => "Authorization: Basic $auth")
);
$context = stream_context_create($data);
// Get the response from Bing.
$response = file_get_contents($requestUri, 0, $context);
It took me a while to understand the stream contexts options and wrappers of PHP. I wrote an article about what helped me finally wrap my brain around how to understand PHP stream contexts options and wrappers. I hope it helps.
To properly handle whatever is coming down the line (streamed data), you will need the appropriate code to handle the different kinds of items being passed (data types). The tools for handling each different kind of data type are the “parameters”.
The “context” is determined by what is being passed along (streamed). So for different “contexts” (kinds of items) being “streamed” (passed along), the “parameters” (required tools for handling) for the “data type” (kind of item) will change.
The term context simply makes reference to the fact that for different data types the situation is unique with its own required parameters.
The PHP stream wrapper would require a context in order to know which parameters are needed to handle the data type.
A parameter of the context that modifies the properties of the stream.
The options are specific to whatever wrapper the stream is using. Examples of these include files, all the different php:// URIs, the HTTP wrapper (like when you do file_get_contents('http://example.com') — it’s not the same thing as file_get_contents('some-file.txt'))
Any stream!
In this case, the stream context is passed to file_get_contents to tell it to send that authorization header and those options to the wrapper that allows file_get_contents to get contents from HTTP URLs.
You can find a list of the HTTP context options on the PHP website.
http, request_fulluri, ignore_errors, header are all parameters.
They change the way the function (file_get_contents in this case) works.
An option that is specific to a wrapper is something like 'http' --
you wouldn't use that on a filesystem file stream since it's not applicable.
The stream is the transfer of data itself which occurs when file_get_contents opens the connection, transfers everything, etc...
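To make the distinction between wrapper-specific options and context parameters more concrete, here is a small sketch using only core functions. The option values and the empty notification callback are placeholders chosen for illustration.

```php
<?php
// Options are grouped by wrapper name ("http", "ssl", ...); each wrapper
// only looks at its own group.
$options = [
    'http' => ['method' => 'GET', 'header' => "Accept-language: en\r\n"],
    'ssl'  => ['verify_peer' => true],
];

// Parameters apply to the context as a whole; "notification" is one of them.
$params = [
    'notification' => function () {
        // placeholder progress callback, does nothing in this sketch
    },
];

$context = stream_context_create($options, $params);

// Read both back to see how they are stored.
$storedOptions = stream_context_get_options($context);
$storedParams  = stream_context_get_params($context);
```

Here $storedOptions is keyed by wrapper name, while $storedParams carries the options under its own 'options' key alongside any parameters that were set.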

How to add a parameter to the request header sent by a PHP script?

I'm trying to use a web service REST API for which I need to add a parameter for authorization (with the appropriate key, of course) to get an XML result. I'm developing in PHP. How can I add a parameter to the request header in such a situation?
Edit: The way I'm doing the request right now is $xml = simplexml_load_file($query_string);
Are you using curl? (recommended)
I assume that you are using curl to make these requests to the REST API; if you aren't, use it.
When using curl you can add a custom header by calling curl_setopt with the appropriate parameters, as shown below.
curl_setopt (
$curl_handle, CURLOPT_HTTPHEADER,
array ('Authentication-Key: foobar')
); // make curl send a HTTP header named 'Authentication-key'
// with the value 'foobar'
Documentation:
PHP: cURL - Manual
PHP: curl_setopt - Manual
Are you using file_get_contents or similar?
This method is not recommended, though it is functional.
Note: allow_url_fopen needs to be enabled for file_get_contents to be able to access resources over HTTP.
If you'd like to add a custom header to such request you'll need to create yourself a valid stream context, as in the below snippet:
$context_options = array(
'http' => array(
'method' => 'GET',
// a header line has the form "Name: value"
'header' => 'Authentication-Key: foobar'
)
);
$context = stream_context_create($context_options);
// pass the context resource, not the options array
$response = file_get_contents(
'http://www.stackoverflow.com', false, $context
);
Documentation:
PHP: file_get_contents - Manual
PHP: stream_context_create - Manual
PHP: Runtime Configuration, allow_url_fopen
I'm using neither of the above solutions, what should I do?
[Post OP EDIT]
My recommendation is to fetch the data using curl and then pass it off to the parser in question when all the data is received. Separate data fetching from the processing of the returned data.
[/Post OP EDIT]
When you use $xml = simplexml_load_file($query_string);, the PHP interpreter invokes its wrapper over fopen to open the contents of a file located at $query_string. If $query_string is a remote file, the PHP interpreter opens a stream to that remote URL and retrieves the contents of the file there (if the HTTP response code is 200 OK). It uses the default stream context to do that.
There is a way to alter the headers sent by altering that stream context, however, in most cases, this is a bad idea. You're relying on PHP to always open all files, local or remote, using a function that was meant to take a local file name only. Not only is it a security problem but it also could be the source of a bug that is very hard to track down.
Instead, consider splitting the loading of the remote content using cURL (checking the returned HTTP status code and other sanity checks) and then parsing that content into a SimpleXMLElement object to use. When you use cURL, you can set any headers you want to send with the request by invoking something similar to curl_setopt($ch, CURLOPT_HTTPHEADER, array('HeaderName: value')); where each array entry is a complete "Name: value" line.
Hope this helps.
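Both routes discussed above expect header lines of the form "Name: value". The following sketch shows one way to build them from name/value pairs; format_headers is a hypothetical helper name, and the header names and values are placeholders.

```php
<?php
/**
 * Turn ['Name' => 'value'] pairs into "Name: value" lines.
 * (format_headers is a hypothetical helper, not part of PHP.)
 */
function format_headers(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

$lines = format_headers([
    'Authentication-Key' => 'foobar',
    'Accept'             => 'application/xml',
]);

// For the stream route: the header lines joined with CRLF.
$context = stream_context_create([
    'http' => ['method' => 'GET', 'header' => implode("\r\n", $lines)],
]);
```

With cURL, the same $lines array can be passed as the value of CURLOPT_HTTPHEADER. For simplexml_load_file, the context would have to be registered with libxml_set_streams_context($context) before the call.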

PHP multi curl - find out what proxy was used for a particular curl handle

I'm using multi curl with anonymous proxies, and I want to flag the proxies based on performance and location etc after the curl handle is returned. I've tried curl_getinfo() but that does not return information about the proxy used for that curl handle.
Any ideas? I've thought about maybe a way to identify a particular handle and storing that with the proxy used, then when the handle has fired off and returned via curl_multi_info_read() I can look up the handle via the proxy. Not sure what to use as an identifier though. Doing a dump shows the handle as resource(20), but not sure if that is something I can rely on?
I guess something like getOpt() would be ideal, but I don't see anything like that for a curl handle in the research I have done.
Check the latest version of the MultiRequest library. There you can do something like this:
$request = new MultiRequest_Request($url);
$request->setCurlOption(CURLOPT_PROXY, $proxy);
// ...
$curlOptions = $request->getCurlOptions();
list($proxyIp, $proxyPort) = explode(':', $curlOptions[CURLOPT_PROXY]);
I found a parallel curl class (by Pete Warden), that passes data for multi-curl using the following..
$this->outstanding_requests[$ch] = array(
'url' => $url,
'callback' => $callback,
'user_data' => $user_data,
'proxy' => $proxy
);
When the multi-curl is done, it can use the curl handle to look up the stored information in the outstanding requests array. If you're interested in multi-curl, check out the class; it sets up everything for you and is very customizable.
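Another option, not mentioned in the answers above, is to store the proxy on the handle itself with CURLOPT_PRIVATE and read it back with CURLINFO_PRIVATE. The sketch below assumes the curl extension is loaded; the function names are hypothetical.

```php
<?php
// Tag each handle with its proxy so it can be recovered later.
function make_tagged_handle(string $url, string $proxy)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // CURLOPT_PRIVATE stores an arbitrary string on the handle.
    curl_setopt($ch, CURLOPT_PRIVATE, $proxy);
    return $ch;
}

// Read the stored string back from a handle.
function proxy_of($ch): string
{
    return (string) curl_getinfo($ch, CURLINFO_PRIVATE);
}
```

In the curl_multi_info_read() loop, the finished handle is available as $info['handle'], so proxy_of($info['handle']) would return the proxy that was used for that transfer.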

What functions can I use to contact another webpage (to send GET data) and set a timeout with?

I need a function that I can use in my script to contact another script to send it some GET data. But I need to be able to set a timeout so that it only loads for a few seconds, then continues with the rest of the script. I know I could easily use cURL to do this, but I'd like to know if there are any alternatives?
You can specify a timeout for the standard file access functions (like file_get_contents()) using stream_context_create():
<?php
$opts = array(
'http'=>array(
'method'=>"GET",
'timeout' => 5
)
);
$context = stream_context_create($opts);
$fp = fopen('http://www.example.com', 'r', false, $context);
fpassthru($fp);
fclose($fp);
?>
See the list of context options for an explanation on the timeout option.
This requires, of course, that you can access external URLs using fopen() and consorts.
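For the "send some GET data" part of the question, the URL and the context could be prepared along these lines. This is only a sketch: build_get_request is a hypothetical helper, and the endpoint and parameter names are placeholders.

```php
<?php
// Build the GET URL and a context with a short timeout.
function build_get_request(string $endpoint, array $data, float $timeoutSeconds): array
{
    // Encode the GET data into a query string, using "&" as separator.
    $url = $endpoint . '?' . http_build_query($data, '', '&');

    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds,   // read timeout in seconds
        ],
    ]);

    return [$url, $context];
}

[$url, $context] = build_get_request(
    'http://www.example.com/notify.php',
    ['id' => 42, 'action' => 'ping'],
    3.0
);
```

The request itself would then be something like $response = @file_get_contents($url, false, $context); a return value of false means the request failed or timed out, and the script can simply carry on.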
The nice thing about curl is that curl_multi lets you run transfers concurrently and without blocking, even though PHP itself doesn't support threads. So you can start the request with curl_multi, let the rest of the script run, and check on the transfer later. This way your regular processing isn't blocked, which reduces the need for a short timeout.
