I'm trying to stream an ipcamera through PHP using the following code;
<?php
# Stop apache timing out
set_time_limit(0);
# Set the correct header
header('Content-Type: multipart/x-mixed-replace;boundary=ipcamera');
# Read the images
readfile('http://[USER]:[PASSWORD]@[IPADDRESS]:[PORT]/videostream.cgi');
?>
This script works just fine on my localhost running Apache and PHP; however, on my web server (tested on two servers), I receive a 400 Bad Request error. I was previously receiving a 'connection refused' error, but that was resolved by my host forwarding the correct port.
Doesn't 400 indicate something like incorrect syntax? Could this be because I have "[USER]:[PASSWORD]@" in the URL? If so, is there another way I can authenticate before calling readfile?
I have run the following cases to determine the response codes:
readfile('http://[USER]:[PASSWORD]@[IPADDRESS]:[PORT]/NONEXISTINGFILE.cgi');
// Returns 400 Bad Request (should be 404)
readfile('http://[IPADDRESS]:[PORT]/NONEXISTINGFILE.cgi');
// Returns 404 Not Found (correct response)
readfile('http://[IPADDRESS]:[PORT]/videostream.cgi');
// Returns 401 Unauthorized (correct response)
readfile('http://[USER]:[PASSWORD]@[IPADDRESS]:[PORT]/videostream.cgi');
// Returns 400 Bad Request (incorrect - should be 200(?) OK)
readfile('http://[USER]:[PASSWORD]@[IPADDRESS]:[PORT]/FILETHATDOESEXIST.jpg');
// Returns 400 Bad Request (should be 200(?) OK)
readfile('http://[IPADDRESS]:[PORT]/FILETHATDOESEXIST.jpg');
// Returns 200(?) OK (correct response)
If someone is able to give me the curl equivalent of this script, perhaps that is the solution. The bounty still stands for anyone who can solve this for me :)
Regards,
For curl, try something like:
<?php
# Stop apache timing out
set_time_limit(0);
# Set the correct header
header('Content-Type: multipart/x-mixed-replace;boundary=ipcamera');
# Read the images
$ch = curl_init('http://[IPADDRESS]:[PORT]/videostream.cgi');
// Pass the credentials via CURLOPT_USERPWD instead of embedding them in the URL
curl_setopt($ch, CURLOPT_USERPWD, '[USER]:[PASSWORD]');
// With no CURLOPT_RETURNTRANSFER set, curl_exec() streams the response straight to the output
curl_exec($ch);
curl_close($ch);
If you want to continue to use readfile(), you can use stream_context_create() to build a stream context for it. See the php.net documentation on stream contexts for how that is done; specifically, you will want to create an HTTP stream context that passes an Authorization header.
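For example, a minimal sketch of that approach, using the same placeholders as the question (untested against the camera itself):
<?php
# Stop apache timing out
set_time_limit(0);
header('Content-Type: multipart/x-mixed-replace;boundary=ipcamera');
# Pass the credentials as an explicit Authorization header instead of in the URL
$context = stream_context_create(array(
    'http' => array(
        'header' => 'Authorization: Basic ' . base64_encode('[USER]:[PASSWORD]') . "\r\n",
    ),
));
readfile('http://[IPADDRESS]:[PORT]/videostream.cgi', false, $context);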
Simply check your [USER] and [PASSWORD] and make sure they do not contain :, /, or # characters.
If they do, percent-encode them (e.g. with rawurlencode()) so the URL still parses correctly.
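A quick sketch of that idea, with the placeholders standing in for the real credentials:
// Percent-encode the credentials so reserved characters survive in the URL
$user = rawurlencode('[USER]');
$pass = rawurlencode('[PASSWORD]');
readfile("http://$user:$pass@[IPADDRESS]:[PORT]/videostream.cgi");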
You might have something set in php.ini on the production server that is disabling readfile()'s ability to load external URLs.
I'm specifically thinking of allow_url_fopen, which may be set to Off, though it may be something else.
You can also check your web server's error log for clues; PHP will usually emit a specific warning.
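A quick way to check from a script on the same server:
// "1" (or "On") means PHP may open remote URLs with readfile()/file_get_contents()
var_dump(ini_get('allow_url_fopen'));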
Related
We have a PHP web app that uses Guzzle 5 to download WordPress RSS feeds.
It's working fine except for this feed https://www.socialquant.net/blog/feed/
The owner of this site does want us to pull the feed, and is not knowingly attempting to block access.
I can successfully download the file from my local machine and from the production web server (where we initially noticed the problem) using wget or curl with no special options.
This happened once before; that time we believed the issue was caused by mod_security on Apache, and it was solved by adding an arbitrary User-Agent header. But that time I was able to reproduce the issue consistently on the command line; this time it's only failing through Guzzle/PHP.
I've copied the response headers from a browser request to the problem feed and to another feed that is working, crossed off those that were the same, and was left with the following.
Problem feed:
Server: Apache/2.2.22
Vary: User-Agent
X-Powered-By: PHP/5.3.29
Content-Encoding: gzip
Working feed:
Server: Apache
Vary: Accept-Encoding
X-Powered-By: PHP/5.5.30
That's not offering much insight. The gzip Content-Encoding jumps out; I'm trying to find another working feed that uses gzip to verify this, but it shouldn't matter, as Guzzle handles content encoding automatically by default. And we're using the same settings to download images from CDNs that use gzip.
Does anyone have any ideas please? Thanks :)
EDIT
Using Guzzle 5.3.0
Code:
$client = new \GuzzleHttp\Client();
try {
    $res = $client->get( $feed, [
        'headers' => ['User-Agent' => 'Mozilla/4.0']
    ] );
} catch (\Exception $e) {
    // the exception is swallowed here, so the failing status code never surfaces
}
I'm afraid I don't have a proper solution to your problem, but I have it working again.
tl;dr version
It's the User-Agent header, changing it to pretty much anything else works.
This wget call fails:
wget -d --header="User-Agent: Mozilla/4.0" https://www.socialquant.net/blog/feed/
but this works
wget -d --header="User-Agent: SomeRandomText" https://www.socialquant.net/blog/feed/
And with that, the PHP below now also works:
require 'vendor/autoload.php';

$client = new \GuzzleHttp\Client();
$feed = 'https://www.socialquant.net/blog/feed/';

try {
    $res = $client->get(
        $feed,
        [
            'headers' => [
                'User-Agent' => 'SomeRandomText',
            ],
        ]
    );
    echo $res->getBody();
} catch (\Exception $e) {
    echo 'Exception: ' . $e->getMessage();
}
My thoughts
I started with wget and curl as you pointed out, which work when no special headers or options are set. Opening it in my browser also worked. I also tried using Guzzle without the User-Agent set, and that works too.
Once I set the User-Agent to Mozilla/4.0 or even Mozilla/5.0, it started failing with 406 Not Acceptable.
According to the HTTP Status Code definitions, a 406 means
The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request.
In theory, adding Accept and Accept-Encoding headers should resolve the issue, but it didn't. Not via Guzzle or wget.
I then found the Mozilla Developer Network definition which states:
This response is sent when the web server, after performing server-driven content negotiation, doesn't find any content following the criteria given by the user agent.
This kinda points at the User-Agent again. This led me to believe that you are indeed correct that mod_security is doing something odd. I am convinced that an update to mod_security or Apache on the client's servers added a rule to parse the Mozilla/* user agents in a specific way since sending the User-Agent: Mozilla/4.0 () also works.
That's why I'm saying I don't have a proper solution for you. Even though the client wants you to pull the feed, they (or their hosting) are still in control of the rules.
Note: I noticed my IP getting blacklisted after a number of failed 406 attempts, after which I had to wait an hour before I could access the site again. Most likely a mod_security rule. mod_security might even be picking up on the automated requests with your user agent and start blocking or rejecting them with the 406.
I don't have a solution for you either, as I'm also experiencing this same issue (except I get error 503 and it fails 60% of the time). Let me know if you have found a solution.
However, I would like to share with you what I have found through my recent research. I found that certain User-Agents work better than others for me. This makes me believe that it's not what Donovan states to be the case (at least for me).
When I set User-Agent to null, it works 100% of the time. However, I haven't made any large requests yet, as I'm afraid of getting IP banned, as I know I would with a large request.
When I do a var_dump of the request itself, I see a lot of arrays that include Guzzle markers. I'm thinking maybe Amazon's detection services can tell that I'm spoofing the headers? I don't know.
Hope you figured it out.
The URL in question : http://www.roblox.com/asset/?id=149996624
When accessed in a browser, it will correctly download a file (which is an XML document). I wanted to fetch the file in PHP and simply display its contents on a page.
$contents = file_get_contents("http://www.roblox.com/asset/?id=149996624");
The above is what I've tried using (as far as I know, the page does not expect any headers). I get a 500 HTTP error. However, in Python, the following code works and I receive the file.
r = requests.get("http://www.roblox.com/asset/?id=147781188")
I'm confused as to what the distinction is between how these two requests are sent. I am almost 100% sure it is not a header problem. I've also tried the cURL library in PHP to no avail. Nothing I've tried in PHP seems to succeed with the URL (with any valid id parameter), but Python succeeds nonchalantly.
Any insight as to why this issue may be happening would be great.
EDIT : I have already tried copying Python's headers into my PHP request.
EDIT2 : It also appears that there are two requests happening upon navigating to the link.
Is this on a Linux/Mac host by chance? If so, you could use ngrep to see the differences in the requests themselves on the wire. Something like the following should work:
ngrep -t '^(GET) ' 'src host 127.0.0.1 and tcp and dst port 80'
EDIT - The problem is that your server is responding with a 302 and the PHP library is not following it automatically. Cheers!
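If you're using cURL, something along these lines should follow the redirect; the redirect limit of 5 is an arbitrary choice:
$ch = curl_init('http://www.roblox.com/asset/?id=149996624');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the 302 Location header
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // guard against redirect loops
$contents = curl_exec($ch);
curl_close($ch);
echo $contents;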
I have some PHP code that calls a web service. It works perfectly when running on my local machine, but when I try to run it on the remote server (where it will ultimately reside), I get the following error:
Warning: simplexml_load_file(http://XXXXXXXXX:8080/symws/rest/standard/searchCatalog?clientID=StorageClient&term1=ocm00576702): failed to open stream: HTTP request failed! HTTP/1.1 400 Bad Request
in /var/www/ShoppingCart/storage_request_button.php on line 42
My local machine is running OS X; the server is running Debian Linux.
Any idea what could be causing this different behavior? Is there another package I need to install on the server?
UPDATE:
While putting the URL in a browser works fine, when I try to wget the URL from the Linux server, I get the 400 error. The server the URL is accessing is also running Debian Linux. There's no firewall on the server; I've never had to configure that server to allow access from anywhere else.
You could try to suppress errors, as this question mentions, to get rid of the status = 400. Perhaps with simplexml_load_file('url', null, LIBXML_NOERROR).
If it still doesn't work, there are a lot of things that can go wrong, and simplexml_load_file() doesn't have a lot of options for debugging. You could try using cURL:
<?php
// make the request
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://XXXXXXXXX:8080/symws/rest/standard/searchCatalog?clientID=StorageClient&term1=ocm00576702");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);

// on a non-200 response, dump the raw output for debugging
if (curl_getinfo($ch, CURLINFO_HTTP_CODE) !== 200) {
    var_dump($output);
} else {
    // load the XML
    $xml = simplexml_load_string($output);
}
curl_close($ch);

if (empty($xml)) {
    echo 'Something went wrong';
    exit;
}
?>
UPDATE:
Please give this a try and see if you get some info back:
<?php
$content = file_get_contents('http://XXXXXXXXX:8080/symws/rest/standard/searchCatalog?clientID=StorageClient&term1=ocm00576702');
var_dump($http_response_header);
?>
My guess is that your URL is malformed.
Here's what the 400 code means.
"The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications."
Well, I don't think anyone was going to be able to help me with this one. Here's what it was: the server that was sending the request, storagews, was originally cloned from the server receiving the request, sws. Because of this, in storagews's /etc/hosts file, the hostname "sws" was resolving to localhost, so it was trying to send the request to itself. It was raj_nt's comment that clued me in to this. If you want to make an answer, I'll give you the bounty.
Thanks for trying everyone!
Check out the request being sent by simplexml_load_file().
You could also try accessing the URL directly in the browser (assuming you are performing a GET request).
You also need to consider the server side: your server might be returning a 400 response even though everything is working normally, or it might not be programmed to accept the request you are sending.
I assume that the file loads directly in your browser. Check with cURL and spoof the user agent to the one from your browser. Also try enabling the cookie feature.
Here's an example: http://www.electrictoolbox.com/php-curl-user-agent/
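A rough sketch of that suggestion; the User-Agent string and cookie file path here are only illustrative:
$ch = curl_init('http://www.canadapost.ca/cpc2/addrm/hh/current/indexa/caONu-e.asp');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0');
curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookies.txt');  // store cookies the site sets
curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookies.txt'); // send them back on later requests
$output = curl_exec($ch);
curl_close($ch);
var_dump($output);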
$output = file_get_contents("http://www.canadapost.ca/cpc2/addrm/hh/current/indexa/caONu-e.asp");
var_dump($output);
HTTP 505 Status means the webserver does not support the HTTP version used by the client (in this case, your PHP program).
What version of PHP are you running, and what HTTP/Web package(s) are you using in your PHP program?
[edit...]
Some servers deliberately block some browsers -- your code may "look like" a browser that the server is configured to ignore. I would particularly check the user agent string that your code is passing along to the server.
Check in your PHP installation (php.ini file) whether allow_url_fopen is enabled.
If not, any calls to file_get_contents() will fail.
It works fine for me.
That site could be blocking the server that you're using to access it.
When you run the URL from your browser, your own ISP is used to get the information and display in your browser. But when you run from PHP, the ISP of your web host is used to get the information, then it passes it back to you.
Maybe you can do this to check and see what kind of headers it's returning for you:
$headers=get_headers("http://www.canadapost.ca/cpc2/addrm/hh/current/indexa/caONu-e.asp");
print_r($headers);
I hesitate to ask this question because it looks weird.
But anyway.
Just in case someone has encountered the same problem already...
The filesystem functions (fopen, file, file_get_contents) behave very strangely with the http:// wrapper:
It seemingly works: no errors are raised, and fopen() returns a resource.
It returns no data for known-working URLs (e.g. http://google.com/):
file() returns an empty array, file_get_contents() returns an empty string, and fread() returns false.
For intentionally wrong URLs (e.g. http://goog973jd23le.com/) it behaves exactly the same, save for a small (supposedly DNS lookup) timeout, after which I get no error (while I should!) but an empty string.
allow_url_fopen is turned on.
curl (both the command-line and PHP versions) works fine; all other utilities and applications work fine; local files open fine.
This error seems inapplicable because in my case it fails for every URL and host.
php-fpm 5.2.11
Linux version 2.6.35.6-48.fc14.i686 (mockbuild@x86-18.phx2.fedoraproject.org)
I fixed this issue on my server (running PHP 5.3.3 on Fedora 14) by removing the --with-curlwrapper flag from the PHP configuration and rebuilding it.
Sounds like a bug. But just for posterity, here are a few things you might want to debug:
allow_url_fopen: already tested
PHP under Apache might behave differently than PHP-CLI, which would hint at chroot/SELinux/FastCGI/etc. security restrictions
local firewall: unlikely, since curl works
user-agent blocking: this is quite common actually; websites block crawlers and unknown clients
transparent proxy from your ISP, which either mangles or blocks requests (a PHP user-agent, or the lack of one, could be interpreted as malware)
PHP stream wrapper problems
Anyway, first let's prove that PHP's stream handlers are functional:
<?php
if (!file_get_contents("data:,ok")) {
    die("Houston, we have a stream wrapper problem.");
}
Then try to see if PHP makes real HTTP requests at all. First open netcat on the console:
nc -l 8000
And debug with just:
<?php
print file_get_contents("http://localhost:8000/hello");
And from here you can try to communicate with PHP and see if anything returns when you vary the response. Enter an invalid response first into netcat. If there's no error thrown, your PHP package is borked.
(You might also try communicating over a "tcp://.." handle then.)
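For instance, a rough version of that raw-TCP check against the same netcat listener:
// Talk to the nc listener over a raw TCP stream instead of the http:// wrapper
$fp = stream_socket_client('tcp://localhost:8000', $errno, $errstr, 5);
if (!$fp) {
    die("tcp:// connect failed: $errstr ($errno)");
}
fwrite($fp, "GET /hello HTTP/1.0\r\nHost: localhost\r\n\r\n");
echo stream_get_contents($fp); // whatever you type into netcat shows up here
fclose($fp);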
Next up is experimenting with http stream wrapper parameters. Use http://www.example.com/ literally, which is known to work and never blocks user agents.
$context = stream_context_create(array("http" => array(
    "method" => "GET",
    "header" => "Accept: xml/*, text/*, */*\r\n",
    "ignore_errors" => false,
    "timeout" => 50,
)));
print file_get_contents("http://www.example.com/", false, $context, 0, 1000);
I think ignore_errors is very relevant here. But check out http://www.php.net/manual/en/context.http.php and specifically try to set protocol_version to 1.1 (you will get a chunked and possibly misinterpreted response, but at least we'll see if anything returns).
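A variation of the context above with that setting; the Connection: close header is my addition, since HTTP/1.1 keep-alive can otherwise stall the read:
$context = stream_context_create(array("http" => array(
    "protocol_version" => 1.1,
    "header" => "Connection: close\r\n", // keep-alive would stall file_get_contents()
)));
print file_get_contents("http://www.example.com/", false, $context);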
If even this remains unsuccessful, then try to hack the http wrapper.
<?php
ini_set("user_agent" , "Mozilla/3.0\r\nAccept: */*\r\nX-Padding: Foo");
This will not only set the User-Agent, but also inject extra headers. If there is a processing issue with constructing the request within the http stream wrapper, then this might eventually catch it.
Otherwise try disabling any Zend extensions, Suhosin, Xdebug, APC and other core modules. There could be interference. Failing that, this is potentially an issue specific to the Fedora package; try a new version and see if it persists on your system.
When you use the http stream wrapper, PHP creates an array for you called $http_response_header after file_get_contents() (or any of the other f-family functions) is called. This contains useful info on the state of the response. Could you do a var_dump() of this array and see if it gives you any more info on the response?
It's a really weird error that you're getting. The only thing I can think of is that something else on the server is blocking the http requests from PHP, but then I can't see why cURL would still be ok...
Is http stream registered in your PHP installation? Look for "Registered PHP Streams" in your phpinfo() output. Mine says "https, ftps, compress.zlib, compress.bzip2, php, file, glob, data, http, ftp, phar, zip".
If http is missing, set allow_url_fopen to On in your php.ini.
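You can also check programmatically rather than reading the phpinfo() page:
// stream_get_wrappers() returns the same list phpinfo() shows
print_r(stream_get_wrappers()); // should include "http" (and "https" when OpenSSL is loaded)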
My problem was solved by dealing with the SSL context:
$arrContextOptions = array(
    "ssl" => array(
        // disables certificate verification; fine for debugging, risky in production
        "verify_peer" => false,
        "verify_peer_name" => false,
    ),
);
$context = stream_context_create($arrContextOptions);
$jsonContent = file_get_contents("https://www.yoursite.com", false, $context);
What does a test with fsockopen tell you?
Is the test isolated from other code?
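For example, a bare-bones fsockopen() probe (host and port are illustrative):
$fp = fsockopen('www.example.com', 80, $errno, $errstr, 10);
if (!$fp) {
    echo "fsockopen failed: $errstr ($errno)\n"; // network-level failure
} else {
    fwrite($fp, "GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n");
    echo fread($fp, 512); // first bytes of the raw HTTP response
    fclose($fp);
}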
I had the same issue in Windows after installing XAMPP 1.7.7. Eventually I managed to solve it by adding the following line to php.ini (while having allow_url_fopen = On):
extension=php_openssl.dll
Take the PHP_Compat reimplementation of file_get_contents() from http://pear.php.net/reference/PHP_Compat-latest/__filesource/fsource_PHP_Compat__PHP_Compat-1.6.0a2CompatFunctionfile_get_contents.php.html, rename it, and test whether the error occurs with this rewritten function.