File_get_contents returns "..." - php

I'm trying to scrape information from the site http://steamstat.us. What I want to get is the status values shown on the site.
I'm currently using only this code:
<?php
$homepage = file_get_contents('http://www.steamstat.us/');
echo $homepage;
?>
The problem is that "Normal (16h)" and the other status values come back as just three dots.
I can't figure out what the problem is.
Does anyone have a clue?
EDIT
This is now fixed.
I solved the problem as follows:
<?php
$opts = array('http' => array('header' => "User-Agent:MyAgent/1.0\r\n"));
$context = stream_context_create($opts);
$json_url = file_get_contents('https://crowbar.steamdb.info/Barney', FALSE, $context);
$data = json_decode($json_url);
?>

The site is served over HTTPS, which is not easy to scrape. The website does allow it, though: the "access-control-allow-origin" response header is set to *, which means the content can be requested by any other site.
You are not receiving the content because "Normal (16h)" is not yet populated on page load. It comes from an AJAX request.
The HTML source says <span class="status" id="repo">…</span>. Those three dots inside the span tag are what file_get_contents returns.
The only way to do it is to find the AJAX call in the browser's network log and then call file_get_contents on the URL that the AJAX call requests.
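To see why only the placeholder comes back, here is a minimal sketch that parses the static markup quoted above. The surrounding html/body wrapper is an assumption added so the snippet is self-contained; only the span comes from the page source.

```php
<?php
// Static markup as served, before any JavaScript runs.
// Only the span is taken from the answer above; the wrapper is illustrative.
$html = '<html><body><span class="status" id="repo">&hellip;</span></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$node  = $xpath->query('//span[@id="repo"]')->item(0);

// Only the placeholder is present; the real status is filled in later by JavaScript,
// which file_get_contents never executes.
$status = $node !== null ? trim($node->textContent) : null;
var_dump($status);
```

Whatever a script writes into that span after page load never reaches PHP, which is why the request has to go to the URL the script itself calls.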

Related

Simple DOM file_get_html returns nothing

I'm trying to scrape data from some websites. For several sites it all seems to go fine, but for one website it doesn't seem to be able to get any HTML. This is my code:
<?php include_once('simple_html_dom.php');
$html = file_get_html('https://www.magiccardmarket.eu/?mainPage=showSearchResult&searchFor=' . $_POST['data']);
echo $html; ?>
I'm using ajax to fetch the data. When I log the returned value in my js it's completely empty.
Could it be due to the fact that this website is running on https? And if so, is there any way to work around it? (I've tried changing the URL to http, but I get the same result.)
Update:
If I var_dump the $html variable, I get bool(false).
My PHP error log says this:
[27-Feb-2014 22:20:50 Europe/Amsterdam] PHP Warning: file_get_contents(http://www.magiccardmarket.eu/?mainPage=showSearchResult&searchFor=tarmogoyf): failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden
in /Users/leondewit/PhpstormProjects/Magic/stores/simple_html_dom.php on line 75
It's your user agent. file_get_contents doesn't send one by default, so:
$url = 'http://www.magiccardmarket.eu/?mainPage=showSearchResult&searchFor=tarmogoyf';
$context = stream_context_create(array('http' => array('header' => 'User-Agent: Mozilla compatible')));
$response = file_get_contents($url, false, $context);
$html = str_get_html($response);
echo $html;
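
As an alternative sketch, the http stream wrapper also accepts a user_agent context option, which avoids writing the header line by hand. The helper name, the user agent string, and the timeout value below are assumptions made for illustration.

```php
<?php
// Hypothetical helper: builds a stream context that sends a User-Agent header.
function make_context_with_user_agent(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent, // used when no User-Agent line is given in 'header'
            'timeout'    => 10,         // seconds; an arbitrary choice for this sketch
        ],
    ]);
}

$context = make_context_with_user_agent('Mozilla/5.0 (compatible; MyScraper/1.0)');
$options = stream_context_get_options($context);
echo $options['http']['user_agent'], "\n";

// Usage (performs a real request, so it is left commented out here):
// $response = file_get_contents($url, false, $context);
```

Setting the user_agent value in php.ini (or with ini_set) should have the same effect for every request made by the script.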

Stuck getting price info from a web page with multi-currencies

page: http://www.nastygal.com/accessories/minnie-bow-clutch
code: $html = file_get_contents('http://www.nastygal.com/accessories/minnie-bow-clutch');
The $html always contains the USD price of the product even when I change the currency on the upper right of the page. How do I capture the html that has the CAD price when I change the currency of the page to CAD?
It looks like currency preferences are being saved in a cookie named: CURRENCYPREFERENCE
Since it's not your browser making the connection to retrieve that view, you're likely not sending any cookie data along with your request.
I believe example #4 here will get you what you need:
http://php.net/manual/en/function.file-get-contents.php
It seems as though the country and currency selection are stored in cookies.
I'm assuming you're going to have to pass those values along with your file_get_contents() call. See: PHP - Send cookie with file_get_contents
EDIT #1
To follow up on my comment, I just tested this:
// Create a stream
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "Accept-language: en\r\n" .
                    "Cookie: CURRENCYPREFERENCE=cad\r\n"
    )
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.nastygal.com/accessories/minnie-bow-clutch', false, $context);
print_r($file);
And was able to get this:
EDIT #2:
In response to your second comment. Those were important details. What does your bookmarklet do with the scraped contents? Are you saving a copy of the bookmarked product page on your own website? Regardless, you're going to have to modify your bookmarklet to check the user's cookies before submitting the request to run file_get_contents().
I was able to access my cookies from nastygal.com using the following simple bookmarklet example. Note: nastygal.com uses jQuery and the jQuery UI cookie plugin. If you're looking for a more generic solution, you should not rely on these scripts being there:
javascript:(function(){ console.log($.cookie('CURRENCYPREFERENCE')); }());
Output in the JS console:
cad
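
Once the bookmarklet has read the cookie value, the PHP side needs to turn it into a Cookie header. A minimal sketch, assuming the value arrives in an associative array; the helper name is hypothetical, and values are sent as-is without any encoding.

```php
<?php
// Hypothetical helper: turns ['name' => 'value', ...] into a single Cookie header line.
// Values are passed through unchanged; encode them first if the site expects that.
function build_cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs) . "\r\n";
}

$header  = build_cookie_header(['CURRENCYPREFERENCE' => 'cad']);
$context = stream_context_create(['http' => ['method' => 'GET', 'header' => $header]]);
echo $header;

// Usage (performs a real request, so it is left commented out here):
// $file = file_get_contents('http://www.nastygal.com/accessories/minnie-bow-clutch', false, $context);
```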

cUrl alternatives to get POST answer on a webpage

I would like to get the resulting web page of a specific form submit. The form uses POST, so my current goal is to send POST data to a URL and get the HTML content of the result in a variable.
My problem is that I cannot use cURL (it is not enabled), so I'm asking whether another solution is possible.
Thanks in advance
See this, using fsockopen:
http://www.jonasjohn.de/snippets/php/post-request.htm
fsockopen is part of the PHP standard library, so every PHP version from 4 onward has it :)
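For readers who cannot follow the link, here is a rough sketch of what a POST over fsockopen involves: the request text is assembled by hand and written to the socket. The helper name, the path /form.php, and the field values are assumptions for illustration, not taken from the linked snippet.

```php
<?php
// Hypothetical helper: builds the raw HTTP POST request text to write to a socket.
// HTTP/1.0 is used so the server does not reply with chunked transfer encoding.
function build_post_request(string $host, string $path, array $fields): string
{
    $body = http_build_query($fields);
    return "POST {$path} HTTP/1.0\r\n"
        . "Host: {$host}\r\n"
        . "Content-Type: application/x-www-form-urlencoded\r\n"
        . "Content-Length: " . strlen($body) . "\r\n"
        . "Connection: close\r\n"
        . "\r\n"
        . $body;
}

$request = build_post_request('www.example.com', '/form.php', ['status' => 'hello world']);
echo $request, "\n";

// Usage (opens a real connection, so it is left commented out here):
// $fp = fsockopen('www.example.com', 80, $errno, $errstr, 10);
// if ($fp !== false) {
//     fwrite($fp, $request);
//     $response = stream_get_contents($fp); // response headers and body together
//     fclose($fp);
// }
```

Note that the response read this way still contains the HTTP headers, so the body has to be split off at the first blank line.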
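Try file_get_contents() with a stream context:
$opts = array('http' => array(
    'method'  => "POST",
    'header'  => "Content-Type: application/x-www-form-urlencoded\r\n", // otherwise PHP raises a notice
    'content' => http_build_query(array('status' => $message)), // $message: the value to submit
));
$context = stream_context_create($opts);
// Send the POST request with the options set above and read the response
$file = file_get_contents('http://www.example.com/', false, $context);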

PHP Magento Screen Scraping

I am trying to scrape a supplier's Magento site to save some time, because there are around 2000 products I need to gather info for. I'm totally OK with writing a screen scraper for pretty much anything, but I've encountered a major problem. I'm using file_get_contents to gather the HTML of the product page.
The problem is:
You need to be logged in to view the product page. It's a standard Magento login, so how can I get around this in my screen scraper? I don't require a full script, just advice on a method.
Using stream_context_create you can specify headers to be sent when calling your file_get_contents.
What I'd suggest is, open your browser and login to the site. Open up Firebug (or your favorite Cookie viewer) and grab the cookies and send them with your request.
Edit: Here's an example from PHP.net:
<?php
// Create a stream
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "Accept-language: en\r\n" .
                    "Cookie: foo=bar\r\n"
    )
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.example.com/', false, $context);
?>
Edit (2): This is out of the scope of your question, but if you are wondering how to scrape the website afterwards, you could look into the DOMDocument::loadHTML method. This will essentially give you the required functions (e.g. getElementsByTagName, getElementById, and XPath queries through DOMXPath) to scrape what you need.
If you want to scrape something simple, you can also use RegEx with preg_match_all.
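A minimal sketch of the DOMDocument approach follows. The markup, class names, and values are made up for illustration; a real Magento product page will use different tags and classes, so the XPath expressions would need adjusting.

```php
<?php
// Made-up product markup standing in for the HTML returned by file_get_contents.
$html = '<html><body>'
      . '<h1 class="product-name">Sample Widget</h1>'
      . '<span class="price">12.50</span>'
      . '</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect warnings quietly
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$name  = trim($xpath->query('//h1[@class="product-name"]')->item(0)->textContent);
$price = trim($xpath->query('//span[@class="price"]')->item(0)->textContent);

echo $name, ': ', $price, "\n";
```

On a real page, item(0) can be null when the expression matches nothing, so a check before reading textContent is advisable.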
If you're familiar with cURL, this should be relatively simple to do in a day or so. I've created some similar apps that log in to banks to retrieve data, which of course also requires authentication.
Below is a link with an example of how to use cURL with cookies for authentication purposes:
http://coderscult.com/php/php-curl/2008/05/20/php-curl-cookies-example/
If you can grab the output of the page you can parse for your results with a regex. Alternatively, you can use a class like Snoopy to do this work for you:
http://sourceforge.net/projects/snoopy/

PHP Get Content of HTTP 400 Response

I am using PHP with the Amazon Payments web service. I'm having problems with some of my requests. Amazon is returning an error as it should, however the way it goes about it is giving me problems.
Amazon returns XML data with a message about the error, but it also sends an HTTP 400 status (or sometimes even 404). This makes file_get_contents() fail right away, and I have no way to get the content. I've also tried cURL, but never got it to give me back a response.
I really need a way to get the XML returned regardless of HTTP status code. It has an important "message" element that gives me clues as to why my billing requests are failing.
Does anyone have a cURL example or otherwise that will allow me to do this? All my requests currently use file_get_contents() but I am not opposed to changing them. Everyone else seems to think cURL is the "right" way.
You have to define a custom stream context (the third argument of file_get_contents) with the ignore_errors option turned on.
As a follow-up to DoubleThink's post, here is a working example:
$url = 'http://whatever.com';
//Set stream options
$opts = array(
    'http' => array('ignore_errors' => true)
);
//Create the stream context
$context = stream_context_create($opts);
//Open the file using the defined context
$file = file_get_contents($url, false, $context);
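
With ignore_errors on, the body comes back for any status, so the status code has to be checked separately. After an http request PHP fills $http_response_header with the response header lines; a sketch of reading the code from them is below. The helper name and the sample header lines are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns the status code from the last "HTTP/x.y NNN" line,
// or null if none is found. After redirects there can be several status lines.
function last_http_status(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Sample header lines in the shape PHP puts into $http_response_header.
$sample = ['HTTP/1.1 400 Bad Request', 'Content-Type: text/xml'];
var_dump(last_http_status($sample));

// Usage after a real request made with ignore_errors enabled:
// $body   = file_get_contents($url, false, $context);
// $status = last_http_status($http_response_header);
```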
