simplexml_load_file() does not load the XML file when the URL includes an ampersand. I have tried two examples, one with and one without an ampersand:
$source1 = simplexml_load_file("http://www.isws.illinois.edu/warm/data/outgoing/nbska/datastream.aspx?id=ncu");
print_r($source1); //works
$source2 = simplexml_load_file("http://forecast.weather.gov/MapClick.php?lat=38.8893&lon=-77.0494&unit=0&lg=english&FcstType=dwml");
print_r($source2); //no output
The first example works because its URL does not include an ampersand, but the second example does not work because its URL does.
I have consulted "simplexml_load_file with & (ampersand) in url with Solr" and "simplexml_load_file ampersand in url", but the suggestions there did not work.
The issue is not the ampersand in the URL. The issue, instead, is that weather.gov appears to be blocking these types of requests: it will not serve clients that do not send a user agent.
The fastest way to get around this is to set a user agent within PHP, which you can do by putting this code above your XML call:
ini_set('user_agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20100101 Firefox/9.0');
However, I would recommend using cURL instead of simplexml_load_file, because loading remote URLs through simplexml_load_file is often restricted by server configuration (for example, when allow_url_fopen is disabled). If you were to do this with cURL, you'd want to do something like the first answer here:
SimpleXML user agent
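A minimal sketch of the cURL route in PHP. It assumes the curl and SimpleXML extensions are available. The function name load_xml_with_user_agent is hypothetical. The User-Agent value is just the Firefox string used above; if the user-agent theory is right, any browser-like value should do.

```php
<?php
// Sketch: fetch an XML document with cURL while sending a browser-like
// User-Agent, then parse the body with SimpleXML.
// load_xml_with_user_agent() is a hypothetical helper name.
function load_xml_with_user_agent(string $url, string $userAgent): ?SimpleXMLElement
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // the header the server appears to require
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects, if any
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($body === false || $status !== 200) {
        return null; // network error or non-200 response
    }

    $xml = simplexml_load_string($body);
    return $xml === false ? null : $xml;
}

// Usage (performs a network request):
// $source2 = load_xml_with_user_agent(
//     'http://forecast.weather.gov/MapClick.php?lat=38.8893&lon=-77.0494&unit=0&lg=english&FcstType=dwml',
//     'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0) Gecko/20100101 Firefox/9.0'
// );
```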
I have tested this locally and got it working just by specifying a user agent.
EDIT: Also, welcome to SO! Be sure to vote often ;D
I came across some weird code on a site, and I am very confused about it. Here is the HTTP request I tested against my own server:
http://192.168.1.3/folder/ui/login_html.php/TEST/TEST
Folders named login_html.php and TEST do not exist. I checked the debug information in Chrome: it can request the files properly, but it cannot parse them.
It seems to request all the CSS and JS resources referenced in login_html.php, and the requests are initiated by TEST.
By the way, login_html.php does nothing special; it just references those files and contains some HTML.
Here is the Apache entry in access_log (there is nothing in error_log):
"GET /cos/ui/login_html.php/TEST/js/cloudmanager.js HTTP/1.1" 200 9564 "http://192.168.1.3/cos/ui/login_html.php/TEST/TEST" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36"
I am confused about this. Can someone explain it?
Everything after the ? is the query string.
Often this contains variables in the form var1=value1&var2=value2. In that case, PHP parses these automatically and puts them in the $_GET array.
In your example the query string doesn't contain a normal set of variables, so the $_GET array would likely be little use. However, you could just get the entire query string from the $_SERVER array.
$var = $_SERVER['QUERY_STRING'];
// $var would be "/HOME/getVersion"
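To see the difference, here is a small sketch using parse_str(), which parses a query string essentially the way PHP fills $_GET. The var1/var2 values are placeholders.

```php
<?php
// A conventional query string parses into named variables.
$query = 'var1=value1&var2=value2';
parse_str($query, $params);
// $params is ['var1' => 'value1', 'var2' => 'value2']

// A query string without key=value pairs still parses, but only into a single
// key with an empty value, so the raw $_SERVER['QUERY_STRING'] is more useful.
parse_str('/HOME/getVersion', $odd);
// $odd is ['/HOME/getVersion' => '']

print_r($params);
print_r($odd);
```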
I am writing a Perl program to fetch the content of a website. When I pass a cookie in the request, the response I get is "Disallowed Key Characters.". The page I am trying to fetch is written in PHP. Is there another way to pass cookies cleanly and get the content of the page, the same way browsers do?
The perl snippet is as follows:
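use LWP::UserAgent;
use HTTP::Request;
my $usragt = LWP::UserAgent->new;  # user agent object that sends the request below
my $req = HTTP::Request->new(GET => $link);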
$req->header("Host" => "www.example.com/sms");
$req->header('User-Agent' => 'Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0');
$req->header("Accept" => "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
$req->header("Accept-Language" => "en-us,en;q=0.5");
$req->header('Referer' => 'www.example.com/sms');
$req->header("Cookie" => 'ci_session=a:15:{s:10:"session_id";s:32:"6a023126d470b5c23231f38b00be945f";s:10:"ip_address";s:14:"122.165.230.17";s:10:"user_agent";s:76:"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0";s:13:"last_activity";i:1402922915;s:9:"user_data";s:0:"";s:1:"u";s:7:"username";s:2:"id";s:2:"47";s:7:"uidtype";s:1:"0";s:2:"to";s:0:"";s:4:"from";s:0:"";s:6:"userid";s:0:"";s:3:"fto";s:0:"";s:5:"ffrom";s:0:"";s:12:"sendcontacts";s:0:"";s:6:"checks";s:0:"";}4bdd1a196f5e2fff297cbc0333fde8be');
$req->header("Connection" => "keep-alive");
my $res = $usragt->request($req);
my $code = $res->code();
my $content = $res->content();
print "\n<p>$content</p>\n";
Output:
<p>Disallowed Key Characters.s:32:"6a023126d470b5c23231f38b00be945f"</p>
I was going to put this in a comment since it's not really an answer, but it's too long:
I'm looking at the structure of your cookie, and it's a bit suspicious looking. Here's the breakdown of that cookie:
ci_session=a:15:
{
s:10:"session_id";
s:32:"6a023126d470b5c23231f38b00be945f";
s:10:"ip_address";
s:14:"122.165.230.17";
s:10:"user_agent";
s:76:"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0";
s:13:"last_activity";
i:1402922915;
s:9:"user_data";
s:0:"";
s:1:"u";
s:7:"username";
s:2:"id";
s:2:"47";
s:7:"uidtype";
s:1:"0";
s:2:"to";
s:0:"";
s:4:"from";
s:0:"";
s:6:"userid";
s:0:"";
s:3:"fto";
s:0:"";
s:5:"ffrom";
s:0:"";
s:12:"sendcontacts";
s:0:"";
s:6:"checks";
s:0:"";
}
4bdd1a196f5e2fff297cbc0333fde8be
That last line could be cookie data, but the rest looks like something else. Cookies are bits of data that point to a unique ID set by the server. They usually have a name and a value associated with them.
The purpose of a cookie is to identify the browser that had previously visited the site. HTTP is stateless, and cookies help establish state. For example, if you visit a store, a cookie could be set to represent your personal shopping cart. The items you buy won't be stored in the cookie, only the cart ID. This way, the server can recognize you as you move around the store.
A cookie could contain an ID that recognizes you as a user. For example, I log into a site and check the "keep me logged in" box. A cookie is set to identify my user ID. When I return to the site, it sees the cookie associated with that ID and skips the login process.
The point is that cookies themselves are usually short and sweet. Maybe 64 characters at the most. They may have an expiration date associated with them. Your cookie doesn't look like this. It's long, it's complex, and it contains a lot of information that normally belongs in other parts of the request. I see an IP address, a session ID, a user agent, and what looks like some sort of query in that string. Much of it would change from system to system, so it'd make a terrible cookie.
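For what it's worth, the structure in the breakdown above is PHP's serialize() format. In that format, a:15:{...} is an array with 15 elements and s:10:"session_id" is a 10-character string.

The sketch below decodes a shortened, hypothetical copy of the value: two of the fifteen entries plus the trailing hex string. It assumes the trailing 32 hex characters are a hash appended after the serialized data. That is an assumption about the site, not something the server confirmed.

```php
<?php
// Shortened copy of the ci_session value: two of the fifteen entries,
// followed by the trailing 32-character hex string.
$cookieValue = 'a:2:{s:10:"session_id";s:32:"6a023126d470b5c23231f38b00be945f";'
    . 's:10:"ip_address";s:14:"122.165.230.17";}'
    . '4bdd1a196f5e2fff297cbc0333fde8be';

// Assumption: the last 32 characters are a hash appended after the data.
$serialized = substr($cookieValue, 0, -32);
$hash = substr($cookieValue, -32);

// allowed_classes => false stops unserialize() from instantiating objects.
$data = unserialize($serialized, ['allowed_classes' => false]);

print_r($data);
echo "hash: $hash\n";
```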
Did you check the return status code from that webpage? I wouldn't be surprised if it was a 200 OK code. If that's the case, it means you've successfully contacted the server, and talked to that PHP page. It's the PHP page that's sending you back the error.
Since the error mentions the session_id value, it could be that it's an invalid session ID. Or it could be almost anything else. You're talking to a PHP program, and it's hard to say what the error could possibly mean in that case.
You may need to find out how to talk to the webpage you want to reach, and exactly which headers it needs. Using curl could help: you can experiment with its --header and --data options and see what's going on.
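One possibility worth checking: cookie pairs in a Cookie header are separated by semicolons, and the raw value here contains both semicolons and double quotes. The sketch below assumes the server splits the header on ';'. It uses a shortened, hypothetical copy of the header.

The second fragment it produces is exactly the text echoed after "Disallowed Key Characters." in the output. URL-encoding the value removes those characters. This sketch uses rawurlencode(); in Perl, URI::Escape's uri_escape would be the analogue.

```php
<?php
// Shortened copy of the Cookie header sent by the Perl script.
$header = 'ci_session=a:1:{s:10:"session_id";s:32:"6a023126d470b5c23231f38b00be945f";}'
    . '4bdd1a196f5e2fff297cbc0333fde8be';

// Cookie pairs are delimited by ';', so the raw serialized value is cut apart.
$fragments = array_map('trim', explode(';', $header));
print_r($fragments);

// URL-encoding only the value leaves no ';' or '"' in the header.
[$name, $value] = explode('=', $header, 2);
$encodedHeader = $name . '=' . rawurlencode($value);
echo $encodedHeader, "\n";
```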
Sorry I can't give you a better answer than this.
I'm getting errors while scraping data from usaspending.gov, and I can't figure out why. I've checked that my PHP settings are all open and even set up a test scrape of another, random site URL.
I also tried passing context options for the method and user agent.
I suspect it's timing out, but if that's not it, I'm not sure what else to try to get this to work. I have no problem fetching any other URL I try. If anyone has any suggestions, I'd love to read them!!
Here's my sample code.
$opts = array(
'http'=>array(
'method'=>"GET",
'user_agent'=>"Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8",
'timeout'=>60
)
);
$context = stream_context_create($opts);
$test = file_get_contents('http://www.usaspending.gov/fpds/fpds.php?state=MI&detail=c&fiscal_year=2013',false,$context);
I'll also add, I've tried this with fopen, file_get_contents, and simplexml_load_file with no luck. I've tried it with the extended options on fopen and file_get_contents, no change. I'm sure I'm missing something small, just can't figure out what it is.
Edit: Here's the error message
Warning: file_get_contents(http://www.usaspending.gov/fpds/fpds.php?state=MI&detail=c&fiscal_year=2013) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in...
Additionally, the link I'm trying to open works: if you copy/paste it into your browser, you should get the download.
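To see what the server actually returns instead of just the warning, here is a sketch that assumes PHP's HTTP stream wrapper. The ignore_errors context option makes file_get_contents() return the body of a 404. $http_response_header lists the status line and headers of every response, including any redirects. The function name fetch_with_status is hypothetical.

```php
<?php
// Sketch: fetch a URL but keep the body and headers even on 4xx/5xx responses.
function fetch_with_status(string $url): array
{
    $context = stream_context_create([
        'http' => [
            'method'        => 'GET',
            'timeout'       => 60,
            'ignore_errors' => true, // return the body instead of failing on 404
        ],
    ]);

    $body = file_get_contents($url, false, $context);

    // The HTTP wrapper fills $http_response_header in this scope. It holds the
    // headers of every response in order, so redirects show up here too.
    $headers = $http_response_header ?? [];

    return [
        'first_status' => $headers[0] ?? null, // e.g. "HTTP/1.1 404 Not Found"
        'headers'      => $headers,
        'body'         => $body,
    ];
}

// Usage (performs a network request):
// print_r(fetch_with_status('http://www.usaspending.gov/fpds/fpds.php?state=MI&detail=c&fiscal_year=2013')['headers']);
```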
After beating my head against this same wall for a while, I used a curl method (How to get the real URL after file_get_contents if redirection happens?) to find where the basic API URL was redirecting and that seems to be working now!
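A sketch of that redirect check in PHP, assuming the curl extension. It sends a HEAD-style request (CURLOPT_NOBODY), which some servers treat differently from GET; drop that option if the result looks wrong. The function name resolve_final_url is hypothetical.

```php
<?php
// Sketch: follow redirects and report the URL the request finally landed on.
function resolve_final_url(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow Location: headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);        // avoid redirect loops
    curl_setopt($ch, CURLOPT_NOBODY, true);         // headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't print anything

    $ok = curl_exec($ch);
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);

    return $ok === false ? null : $finalUrl;
}

// Usage (performs a network request):
// echo resolve_final_url('http://www.usaspending.gov/fpds/fpds.php?detail=c&fiscal_year=2013&state=AL');
```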
Instead of getting your same error message with:
file_get_contents(http://www.usaspending.gov/fpds/fpds.php?detail=c&fiscal_year=2013&state=AL&max_records=1000&records_from=0)
It is now working for me with:
file_get_contents(http://www.usaspending.gov/api/fpds_api_complete.php?fiscal_year=2013&vendor_state=AL&Contracts=c&sortby=OBLIGATED_AMOUNT%2Bdesc&records_from=0&max_records=20&sortby=OBLIGATED_AMOUNT+desc)
So pretty much using this as my base URL to access the API with more parameters added on (with the "Contracts" parameter replacing the original "detail" parameter):
http://www.usaspending.gov/api/fpds_api_complete.php?Contracts=c&sortby=OBLIGATED_AMOUNT%2Bdesc&sortby=OBLIGATED_AMOUNT+desc
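If you add parameters to that base URL, letting http_build_query() do the encoding avoids mistakes with spaces and ampersands. Here is a sketch using parameters from the working URL above. A PHP array cannot hold the same key twice, so it sends sortby only once.

```php
<?php
$base = 'http://www.usaspending.gov/api/fpds_api_complete.php';

$params = [
    'fiscal_year'  => 2013,
    'vendor_state' => 'AL',
    'Contracts'    => 'c',                      // replaces the old "detail" parameter
    'sortby'       => 'OBLIGATED_AMOUNT desc',  // the space is encoded as "+"
    'records_from' => 0,
    'max_records'  => 20,
];

// Pass '&' explicitly so a custom arg_separator.output setting can't change it.
$url = $base . '?' . http_build_query($params, '', '&');
echo $url, "\n";
```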
I hope this helps, and works for you too!
I'm trying to download the contents of a web page using PHP.
When I issue the command:
$f = file_get_contents("http://mobile.mybustracker.co.uk/mobile.php?searchMode=2");
It returns a page that reports that the server is down. Yet when I paste the same URL into my browser I get the expected page.
Does anyone have any idea what's causing this? Does file_get_contents transmit any headers that differentiate it from a browser request?
Yes, there are differences: the browser tends to send plenty of additional HTTP headers, I'd say, and the ones that are sent by both probably don't have the same value.
Here, after doing a couple of tests, it seems that passing the HTTP header called Accept is necessary.
This can be done using the third parameter of file_get_contents, which specifies additional context information:
$opts = array('http' =>
    array(
        'method' => 'GET',
        //'user_agent' => "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2) Gecko/20100301 Ubuntu/9.10 (karmic) Firefox/3.6",
        // Send the same Accept header Firefox sent for this page.
        'header' => array(
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
        ),
    )
);
$context = stream_context_create($opts);
$f = file_get_contents("http://mobile.mybustracker.co.uk/mobile.php?searchMode=2", false, $context);
echo $f;
With this, I'm able to get the HTML code of the page.
Notes :
I first tested passing the User-Agent, but it doesn't seem to be necessary -- which is why the corresponding line is here as a comment
The value used for the Accept header is the one Firefox sent when I requested that page with Firefox, before trying with file_get_contents.
Some other values might be OK, but I didn't do any test to determine which value is the required one.
For more information, you can take a look at:
file_get_contents
stream_context_create
Context options and parameters
HTTP context options -- that's the interesting page, here ;-)
Replace all spaces in the URL with %20.
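A short sketch of that, plus the usual alternative of encoding only the values you insert. The example.com URL is hypothetical.

```php
<?php
$url = 'http://example.com/some page.php?name=John Smith';

// 1. Blanket replacement of every space in the URL.
$fixed = str_replace(' ', '%20', $url);

// 2. Encode only the inserted value; rawurlencode() turns a space into %20
//    (urlencode() would produce "+" instead).
$safe = 'http://example.com/some%20page.php?name=' . rawurlencode('John Smith');

echo $fixed, "\n", $safe, "\n";
```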
How can we get Browser Name and Version information using php script?
<?php
echo $_SERVER['HTTP_USER_AGENT'];
?>
As Palantir says; in addition, have a look at the get_browser function, which also lets you check the capabilities enabled in the browser.
http://php.net/manual/en/function.get-browser.php
You will need to create a function to translate user agent data into common browser names.
For example, $_SERVER['HTTP_USER_AGENT'] could return
Mozilla/5.0 (Windows; ?; Windows NT 5.1; *rv:*) Gecko/* Firefox/0.9* is Firefox
or
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.55 Safari/533.4 is Chrome
The details provide you with the rendering engine, code base, version, OS, etc.
I'd suggest using preg_match and/or a list of known browsers if you want to do something like
echo browserCommonName($_SERVER['HTTP_USER_AGENT']);
to output "Google Chrome".
browserCommonName($userAgent) would need a list of known browsers.
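A minimal sketch of such a browserCommonName() function, assuming a small, hand-maintained list of patterns. Real User-Agent strings vary far more than this, so treat it as illustrative.

```php
<?php
// Sketch: map a User-Agent string to a common browser name with preg_match().
// Order matters: Chrome's User-Agent also contains "Safari", and Edge's also
// contains "Chrome", so the more specific patterns are checked first.
function browserCommonName(string $userAgent): string
{
    $knownBrowsers = [
        '/Edg(e|A|iOS)?\//' => 'Microsoft Edge',
        '/Firefox\//'       => 'Mozilla Firefox',
        '/Chrome\//'        => 'Google Chrome',
        '/Safari\//'        => 'Safari',
        '/MSIE |Trident\//' => 'Internet Explorer',
    ];

    foreach ($knownBrowsers as $pattern => $name) {
        if (preg_match($pattern, $userAgent)) {
            return $name;
        }
    }
    return 'Unknown';
}

echo browserCommonName('Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.55 Safari/533.4'), "\n";
```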
edit: just noticed that get_browser, built into PHP, does this; my bad for not reading the thread.
See get_browser().
<?php
echo $_SERVER['HTTP_USER_AGENT'] . "\n\n";
$browser = get_browser(null, true);
print_r($browser);
?>
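Note that get_browser() only works when the browscap directive in php.ini points to a browscap.ini file. Here is a sketch that checks for that first; the wrapper name browser_info is hypothetical.

```php
<?php
// Sketch: call get_browser() only when a browscap file is configured.
function browser_info(?string $userAgent = null): ?array
{
    if (!ini_get('browscap')) {
        return null; // no browscap.ini configured in php.ini
    }

    // true => return an associative array instead of an object.
    $info = get_browser($userAgent, true);
    return is_array($info) ? $info : null;
}

if (isset($_SERVER['HTTP_USER_AGENT'])) {
    $info = browser_info($_SERVER['HTTP_USER_AGENT']);
    if ($info !== null) {
        echo ($info['browser'] ?? 'unknown'), ' ', ($info['version'] ?? ''), "\n";
    }
}
```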
All in all, you can't. You can certainly try to get it, and you're almost certainly guaranteed to get something that looks like what you want; but there is absolutely no way of checking whether or not the information is correct. When you receive a user agent string, the browser on the other end could be truthful, or it could be lying. When dealing with users, always assume that it is, in fact, lying.
There is no "best way" to deal with this, but what you most likely want to do is test your site with a wide variety of browsers, use portable HTML and CSS techniques, and if you absolutely must, fill the holes with JavaScript.
Choosing what data to send to a browser based on what browser you think it is, is a Bad Idea™.
You could always take a look at the php function get_browser http://php.net/manual/en/function.get-browser.php. You will need $_SERVER['HTTP_USER_AGENT'] for this.
You may also want to take a look at Chris Schuld Browser Detection Class. http://chrisschuld.com/projects/browser-php-detecting-a-users-browser-from-php.html