I've been using simplexml_load_file to get RSS feeds from several websites for a while.
Sometimes I get errors from some of these websites, and for about 5 days I've been getting errors from 2 specific websites.
Here are the errors from simplexml_load_file:
PHP Warning: simplexml_load_file(http://example.com/feed): failed to open stream: Connection timed out
PHP Warning: simplexml_load_file(): I/O warning : failed to load external entity "http://example.com/feed"
Here are the errors from file_get_contents:
PHP Warning: file_get_contents(http://example.com/page): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden
That's how I'm using simplexml_load_file:
simplexml_load_file( $url );
That's how I'm using file_get_contents:
file_get_contents( $url );
Is this because I'm not using a proxy, or because I'm passing invalid arguments?
UPDATE:
The 2 websites are using something like a firewall, or a service that checks for robots:
Accessing http://example.com/feed securely…
This is an automatic process. Your browser will redirect to your requested content in 5 seconds.
You're relying on an assumption that http://example.com/feed is always going to exist and always return exactly the content you're looking for. As you've discovered, this is a bad assumption.
You're attempting to access the network with your file_get_contents() and simplexml_load_file() calls and finding out that sometimes those calls fail. You must always plan for these calls to fail. It doesn't matter if some websites openly allow this kind of behavior or if you have a very reliable web host. There are circumstances out of your control, such as an Internet backbone outage, that will eventually cause your application to get back a bad response. In your situation, the third party has blocked you. This is one of the failures that happen with network requests.
The first takeaway is that you must handle the failure better. You cannot do this with file_get_contents(), because file_get_contents() was designed to get the contents of files; in my opinion, the PHP implementers made a very serious mistake when they allowed it to make network calls at all. I'd recommend using curl:
function doRequest($url) {
    $ch = curl_init($url);
    // Return the response body as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Give up after 10 seconds instead of hanging indefinitely
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $output = curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($output !== false && $httpcode >= 200 && $httpcode < 300) {
        return $output;
    } else {
        throw new Exception('Sorry, an error occurred');
    }
}
Using this, you will be able to handle errors (they will happen) more gracefully for your own users.
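For example, a caller could wrap it like this (a minimal sketch; the feed URL is a placeholder and the logging is up to you):

try {
    // Fetch the raw feed body; doRequest() throws if the request fails
    $body = doRequest('http://example.com/feed');
    // Parse it ourselves so we can also react to malformed XML
    $xml = simplexml_load_string($body);
    if ($xml === false) {
        throw new Exception('Feed returned invalid XML');
    }
    // ... work with $xml here ...
} catch (Exception $e) {
    // Log the failure and show the user something friendly instead of a PHP warning
    error_log($e->getMessage());
}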
Your second problem is that this specific host is giving you a 403 error. This is probably intentional on their end, and I would assume it's their way of telling you that they don't want you using their website like this. You will need to engage them directly and ask what you can do. They might ask you to use a real API, they might ignore you entirely, or they might tell you to pound sand - but there isn't anything we can do to advise here. This is strictly a problem (or feature) of their software, and you must contact them directly for advice.
You could potentially use multiple IP addresses to connect to these websites and rotate IPs each time one gets blocked, but doing so would be considered a malicious attack on their service.
Related
I was trying to get RSS info from a website the other day, but when I tried to load it using PHP it returned a 403 error.
This was my PHP code:
<?php
$rss = file_get_contents('https://hypixel.net/forums/-/index.rss');
echo $rss;
?>
And the error I got was:
failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden
I must say that loading it regularly from a browser works just fine, but when I try loading it using PHP or any other server-side method it won't work.
Some people don't like servers accessing their stuff. They provide a service intended for human consumers, not bots, so they may include code that checks whether you really are a human using a web browser, a check your naïve PHP script fails. That is why the third party returns a 403 Forbidden error: your program is forbidden from accessing the resource.
There are ways around this, of course, depending on how it's implemented. The most obvious thing to do is send a User-Agent header pretending to be a browser. But servers may do more clever checks than this, and it's questionably moral.
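For what it's worth, here is a minimal sketch of that first idea with file_get_contents(); the User-Agent string is only an example, and the server may still refuse the request if it runs deeper checks:

<?php
// Build a stream context that adds a browser-like User-Agent header to the request
$context = stream_context_create([
    'http' => [
        'header' => "User-Agent: Mozilla/5.0 (compatible; MyFeedReader/1.0)\r\n",
    ],
]);

// Pass the context as the third argument; false is returned on failure
$rss = file_get_contents('https://hypixel.net/forums/-/index.rss', false, $context);
var_dump($rss);
?>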
I am having trouble getting either a cURL request or file_get_contents() for a https page that returns JSON. Please note that I have looked at multiple StackOverflow topics with similar problems, but none seem to address my specific issue.
BACKGROUND: I am using a Mac running OS X 10.10.3 (Yosemite). I just installed XAMPP 5.6.8 so that I can run a local web server in order to play around with PHP, etc.
GOAL: I would like to simply display the contents of a JSON object returned by a https GET request. For context, the page is https://api.discogs.com/releases/249504.
ACTIONS: I have read in other posts that, to retrieve HTTPS pages, I must make the following changes to the out-of-the-box php.ini file, which I have done:
- uncomment/turn on allow_url_fopen=On
- uncomment/turn on allow_url_include=on
- uncomment extension=php_openssl.dll
RESULTS:
Using file_get_contents()...
Code:
<?php
$url = "https://api.discogs.com/releases/249504";
$text = file_get_contents($url);
var_dump($text);
?>
Result:
Warning: file_get_contents(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed in /Applications/XAMPP/xamppfiles/htdocs/phpWorkspace/firstPhp/firstPhp.php on line 5
Warning: file_get_contents(): Failed to enable crypto in /Applications/XAMPP/xamppfiles/htdocs/phpWorkspace/firstPhp/firstPhp.php on line 5
Warning: file_get_contents(https://api.discogs.com/releases/249504): failed to open stream: operation failed in /Applications/XAMPP/xamppfiles/htdocs/phpWorkspace/firstPhp/firstPhp.php on line 5
bool(false)
Using cURL...
Code:
https://api.discogs.com/releases/249504";
// Initiate curl
$ch = curl_init();
// Will return the response, if false it print the response
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Set the url
curl_setopt($ch, CURLOPT_URL,$url);
// Execute
$result=curl_exec($ch);
// Closing
curl_close($ch);
var_dump($result);
?>
Result:
bool(false)
Any recommendations would be extremely helpful. I've played around with different cURL calls based on other SO posts; each has returned either NULL, an error related to certificates/authentication, or false as above.
UPDATE!
I added CURLOPT_SSL_VERIFYPEER to my cURL code and turned it off in order to validate that I could at least communicate with the server, which helped me identify that I also needed to provide a user agent. However, once I removed SSL_VERIFYPEER again, I still had certificate issues.
As a last-ditch effort I uninstalled XAMPP and turned on the PHP/Apache that ships with Mac OS. Once configured, both the cURL and file_get_contents() calls worked out of the box with the user agent, using the same code as above [frustrating!].
My gut tells me I had a bad configuration in XAMPP related to my certificates. Interestingly, though, I didn't see much online about this being an issue for other XAMPP users.
Try adding this
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
The certificate is not being successfully verified. Depending on how severe the underlying issue is, this may not correct it.
You should not rely on this as a fix; however, it should tell you whether your system can communicate with theirs at all. Also see the post Frankzers posted.
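If it does turn out to be a certificate problem, a better long-term fix is usually to keep verification on and point cURL at a valid CA bundle instead. A rough sketch (the bundle path is an assumption; use wherever your cacert.pem actually lives):

// Keep peer verification enabled, but tell cURL where to find the CA bundle
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_CAINFO, '/path/to/cacert.pem');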
Background
I am trying to include an rss feed using php into a html document
Code
<?php
include ("feed url");
?>
I have used SSI to successfully add the include directive in the HTML file, like this:
<!--#include virtual="rssfeed.php" -->
which works fine after editing the .htaccess file. Now the problem is that, because my PHP uses include ("feed url"), I am getting this error:
Warning: include() [function.include]: URL file-access is disabled in
the server configuration in path/rssfeed.php on line 2
Warning: include(feed url) [function.include]: failed to open stream:
no suitable wrapper could be found in path/rssfeed.php on line 2
Things to note: I have tried setting php_value allow_url_fopen 1, but no luck. The files are hosted on a third-party server, so I do not have a lot of access, and they have blocked me from turning allow_url_fopen on for obvious reasons. So my question is: how do I approach this problem? Any directions will be greatly appreciated.
Thanks everyone for reading.
Your server is configured in such a way that you cannot include from a remote location. This is common in shared hosting environments to help reduce server load and reduce the possibility of malicious code being accidentally executed.
However, if I understand you correctly, you could not just include the RSS feed using the include() construct anyway, because it is not valid PHP code; include() expects the path to point to a valid PHP source file. What you are doing, if your server allowed it, would result in either useless output or a parse error.
You need to connect to the RSS feed (e.g. using cURL or fsockopen(), depending on the level of control you want over the request to the remote site) and parse the feed data so you can output it in a sensible format.
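A rough sketch of that approach, assuming cURL is available and the feed is ordinary RSS 2.0 (the feed URL and the markup you output are placeholders):

<?php
// Fetch the feed body as a string rather than printing it directly
$ch = curl_init('http://example.com/feed.rss');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$body = curl_exec($ch);
curl_close($ch);

// Parse the XML and output only the parts you actually want
$feed = simplexml_load_string($body);
if ($feed !== false) {
    foreach ($feed->channel->item as $item) {
        echo '<a href="' . htmlspecialchars((string) $item->link) . '">'
            . htmlspecialchars((string) $item->title) . '</a><br>';
    }
}
?>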
include "http://..." is a bad idea because the contents of http://... are evaluated as PHP code which opens your site open to attacks if someone can inject PHP code in the response of that RSS feed.
Use curl if you want to display data from another site. From the PHP Manual example:
<?php
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
// grab URL and pass it to the browser
curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
?>
Hey guys,
I developed a website on my local Apache setup on my Mac. I'm making two requests to foreign domains. One goes out to geoplugin.net to get the current geolocation.
This works just fine on my local setup. However, when I transfer the files to my real server, the website prints the following:
Warning:
file_get_contents(http://www.geoplugin.net/php.gp?ip=185.43.32.341)
[function.file-get-contents]: failed
to open stream: HTTP request failed!
HTTP/1.0 403 Forbidden in
/home/.sites/74/site484/web/testsite/wp-content/themes/test/header.php
on line 241
What can I do here? What am I doing wrong?
Furthermore, I'm using a cURL request on my website which doesn't retrieve data either. Both work fine on my local MAMP setup.
Any ideas?
The server responds with an "403 FORBIDDEN" status code. So file_get_contents() works fine, but the server you are trying to access (or a proxy or something in between) dont allow it.
This can have many reasons. For example (like the comment of the question) you are banned, or blocked (because of to much requests), or something.
HTTP/1.0 403 Forbidden
means you are not allowed to access these files! Try adding a user agent header.
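A minimal sketch of that with cURL (the User-Agent string is only an example; the server may still refuse the request):

// Send a browser-like User-Agent so the remote server is less likely to reject the request
$ch = curl_init('http://www.geoplugin.net/php.gp?ip=185.43.32.341');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)');
$response = curl_exec($ch);
curl_close($ch);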
You need to create an account at geoplugin.com and subscribe your domain to use the web service without limitation; then you will stop receiving the 403 Forbidden error. Don't worry about costs: it's a free service, and I'm using it on three sites.
Try to urlencode the query string.
I would also recommend using the cURL extension.
That is because geoPlugin is limited to 120 lookups per minute.
http://www.geoplugin.com/premium
So any website feature based on this solution can break suddenly.
I would recommend using both www.geoplugin.net/json.gp?ip={ip} and freegeoip.net/json/{ip}, and checking whether the first one returns null (which means the limit has already been reached); if so, fall back to the other.
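Roughly like this (the helper name is made up, and the field I check in geoPlugin's response is an assumption; adjust it to whatever your lookups actually return when the limit is hit):

// Hypothetical helper: try geoPlugin first, fall back to freegeoip when the lookup fails
function lookupGeo($ip) {
    $data = json_decode(@file_get_contents("http://www.geoplugin.net/json.gp?ip={$ip}"), true);
    // An empty or null result usually means the request failed or the rate limit was reached
    if (empty($data) || empty($data['geoplugin_countryName'])) {
        $data = json_decode(@file_get_contents("http://freegeoip.net/json/{$ip}"), true);
    }
    return $data;
}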
This problem seems to have been discussed in the past everywhere on google and here, but I have yet to find a solution.
A very simple fopen gives me a
PHP Warning: fopen(http://www.google.ca): failed to open stream: HTTP request failed!".
The URL I am fetching has no importance, because even when I fetch http://www.google.com it doesn't work. The exact same script works on a different server. The failing one is Ubuntu 10.04 with PHP 5.3.2. This is not a problem in my script; it's something different on my server, or it might be a bug in PHP.
I have tried setting a user_agent in php.ini, but no success. My allow_url_fopen is set to On.
If you have any ideas, feel free!
It sounds like your configuration isn't allowed to use the file functions over HTTP, which is common these days because of security concerns. If you have the cURL libraries available, I would recommend trying those out.
PHP: cURL
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.google.ca/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$file = curl_exec($ch);
curl_close($ch);
echo $file;
Check that your php.ini config is set to allow fopen to open external URL's:
allow_url_fopen "1"
http://www.php.net/manual/en/filesystem.configuration.php#ini.allow-url-fopen
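You can check what the running configuration actually resolves to from a script, without hunting for the right php.ini:

// Prints the current value of allow_url_fopen ("1" when URL wrappers are enabled)
var_dump(ini_get('allow_url_fopen'));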
I'm not at all sure whether this is the problem or not, but I know in the past I've had problems opening URLs with fopen, often due to php.ini's allow_url_fopen or other security settings.
You may want to try cURL in PHP, which often works for me, you'll find an example really easily by googling that.
Check your phpinfo output - is http present under Registered PHP Streams?
Are you getting "HTTP request failed" without further details? The socket timeout could be expired. This default to 60 seconds. See: http://php.net/manual/en/function.socket-set-timeout.php