In my PHP code, I fetch the web page like this:
$opts = array(
    'http' => array(
        'method'  => "GET",
        'timeout' => 5
    )
);
$context = stream_context_create($opts);
$response = file_get_contents($url, false, $context);
It works, but I have a problem: I would like to fetch the page with a 1 or 2 second delay. I receive the source code, but the page is updated by jQuery after it loads, so what I get is not the same as what the browser ends up showing.
For example, the target page has a div#result that is filled by a jQuery request after the page loads. For my part, I only ever get the empty div#result.
It's impossible this way, whether in PHP or jQuery: file_get_contents never executes the page's JavaScript, so no amount of waiting will populate the div.
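To see why, consider the raw HTML the server sends. A minimal illustration (the markup below is made up):

```php
<?php
// Made-up markup illustrating what the server actually sends: the
// placeholder div is empty, and the jQuery that fills it is just text
// as far as PHP is concerned — it never executes.
$raw = '<div id="result"></div><script>$("#result").load("/data");</script>';
// This is exactly the kind of string file_get_contents() hands you:
$empty = strpos($raw, '<div id="result"></div>') !== false;
echo $empty ? "div#result is empty in the fetched source\n" : "";
```

In practice, the workaround is to request the URL that the jQuery code itself requests, rather than the page that hosts it.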
Related
I'm trying to scrape information from the site http://steamstat.us - the thing I want to get is the status and such from the site.
I'm currently only using this code:
<?php
$homepage = file_get_contents('http://www.steamstat.us/');
echo $homepage;
?>
The problem I have here is that "Normal (16h)" and the rest just return three dots.
I can't figure out what the problem is.
Does anyone have a clue?
EDIT
This is now fixed.
I solved the problem as follows:
<?php
$opts = array('http' => array('header' => "User-Agent:MyAgent/1.0\r\n"));
$context = stream_context_create($opts);
$json_url = file_get_contents('https://crowbar.steamdb.info/Barney', FALSE, $context);
$data = json_decode($json_url);
?>
It's an HTTPS resource, which is not as easy to scrape. The website does allow it, though, since the "access-control-allow-origin" header is set to *, which means the content can be requested by any other site.
You are not receiving the content because "Normal (16h)" is not yet populated on page load; it comes from an Ajax call.
The HTML source says <span class="status" id="repo">…</span>; those three dots inside the span tag are what file_get_contents receives.
The only way to do it is to look for the Ajax call in the browser's network log and then use file_get_contents on the URL that the Ajax call requests.
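As a sketch of that approach, suppose the network log shows the page fetching a JSON endpoint (the URL and field names below are hypothetical):

```php
<?php
// Hypothetical Ajax endpoint spotted in the browser's network log:
// $raw = file_get_contents('https://example.com/api/status.json', false, $context);
// Stand-in for the response body, for illustration:
$raw = '{"services":{"repo":"Normal (16h)"}}';
$data = json_decode($raw, true);
echo $data['services']['repo'], "\n"; // Normal (16h)
```

Once you decode the Ajax response directly, there are no dots to worry about, because you are reading the same data the page's JavaScript reads.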
Any ideas? A page from a different domain is loaded in an iframe and contains a table. How can I get that table, using any language: HTML, JavaScript, jQuery, PHP, etc.?
Edit:
You can get the data after posting the form with PHP too. Try this:
$post = http_build_query(
    array(
        'var1' => 'some content',
        'var2' => 'doh'
    )
);
$opts = array('http' =>
    array(
        'method'  => 'POST',
        'header'  => 'Content-type: application/x-www-form-urlencoded',
        'content' => $post
    )
);
$context = stream_context_create($opts);
$file = file_get_contents('http://example.com/submit.php', false, $context);
$file now contains the response to the posted data. Now parse it with SimpleHTMLDom, as described below.
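For reference, http_build_query produces the URL-encoded body that the Content-type header above promises:

```php
<?php
// http_build_query() URL-encodes the array into key=value pairs joined
// by '&', with spaces encoded as '+' by default (RFC 1738 style).
$post = http_build_query(array(
    'var1' => 'some content',
    'var2' => 'doh'
));
echo $post, "\n"; // var1=some+content&var2=doh
```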
Obviously, it won't work due to the cross-origin policy restrictions.
If you know the src of the iframe, you can fetch its source with either cURL or file_get_contents and then walk the DOM structure to get the desired table data.
A sample code:
$file = file_get_contents( 'IFRAME_URL_HERE' );
Once you have the source, you can parse it very easily using a library called SimpleHTMLDom: http://simplehtmldom.sourceforge.net/
It can get you the desired table data in a line or two of code (very similar syntax to jQuery, just written in PHP).
If you post the iframe src and say which table data you want, I can give you a working sample.
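If you'd rather not pull in a library, PHP's built-in DOM extension can do the same job; a minimal sketch (the HTML below is a made-up stand-in for the iframe source):

```php
<?php
// Minimal sketch using PHP's built-in DOM extension instead of
// SimpleHTMLDom. The HTML is a made-up stand-in for the iframe source.
$html = '<table><tr><td>Item</td><td>Price</td></tr></table>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$cells = array();
foreach ($doc->getElementsByTagName('td') as $td) {
    $cells[] = $td->textContent;
}
print_r($cells); // [0] => Item, [1] => Price
```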
Hope it helps.
You could get it using cURL in PHP - just request the URL that the iframe is loading. Here's a simple guide to getting started with this kind of thing (scraping with PHP):
http://www.phpbuilder.com/columns/marc_plotz011410.php3
page: http://www.nastygal.com/accessories/minnie-bow-clutch
code: $html = file_get_contents('http://www.nastygal.com/accessories/minnie-bow-clutch');
The $html always contains the USD price of the product, even when I change the currency at the upper right of the page. How do I capture the HTML that has the CAD price after I switch the page's currency to CAD?
It looks like currency preferences are being saved in a cookie named: CURRENCYPREFERENCE
Since it's not your browser making the connection to retrieve that view, you're likely not sending any cookie data along with your request.
I believe example #4 here will get you what you need:
http://php.net/manual/en/function.file-get-contents.php
It seems as though the country and currency selection are stored in cookies.
I'm assuming you're going to have to pass those values along with your file_get_contents() call. See: PHP - Send cookie with file_get_contents
EDIT #1
To follow up on my comment, I just tested this:
// Create a stream
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "Accept-language: en\r\n" .
                    "Cookie: CURRENCYPREFERENCE=cad\r\n"
    )
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.nastygal.com/accessories/minnie-bow-clutch', false, $context);
print_r($file);
And was able to get this:
EDIT #2:
In response to your second comment: those were important details. What does your bookmarklet do with the scraped contents? Are you saving a copy of the bookmarked product page on your own website? Regardless, you're going to have to modify your bookmarklet to check the user's cookies before submitting the request that runs file_get_contents().
I was able to access my cookies from nastygal.com using the following simple bookmarklet example. Note: nastygal.com uses jQuery and the jQuery UI cookie plugin. If you're looking for a more generic solution, you should not rely on these scripts being there:
javascript:(function(){ console.log($.cookie('CURRENCYPREFERENCE')); }());
Output in the JS console:
cad
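Tying the two halves together, the cookie value collected by the bookmarklet just needs to be folded into the Cookie header on the PHP side; a sketch ('cad' stands in for whatever the user's cookie actually holds):

```php
<?php
// Sketch: build the Cookie header from a value collected client-side.
// 'cad' stands in for whatever CURRENCYPREFERENCE holds for this user.
$cookies = array('CURRENCYPREFERENCE' => 'cad');
$pairs = array();
foreach ($cookies as $name => $value) {
    $pairs[] = $name . '=' . urlencode($value);
}
$cookieHeader = "Cookie: " . implode('; ', $pairs) . "\r\n";
echo $cookieHeader; // Cookie: CURRENCYPREFERENCE=cad
```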
I would like to get the resulting web page of a specific form submit. This form uses POST, so my current goal is to be able to send POST data to a URL and get the HTML content of the result in a variable.
My problem is that I cannot use cURL (it's not enabled), which is why I'm asking whether another solution is possible.
Thanks in advance.
See this example using fsockopen:
http://www.jonasjohn.de/snippets/php/post-request.htm
fsockopen is in the PHP standard library, so every PHP version from 4 onward has it :)
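The heart of that technique is writing a raw HTTP request to the socket. A sketch of the request such code builds (host and path are placeholders; nothing is actually sent here):

```php
<?php
// Build the raw HTTP POST request that fsockopen-based code writes to
// the socket. Host and path are placeholders; no connection is made.
$host = 'example.com';
$path = '/submit.php';
$body = http_build_query(array('var1' => 'some content'));
$request  = "POST $path HTTP/1.1\r\n";
$request .= "Host: $host\r\n";
$request .= "Content-Type: application/x-www-form-urlencoded\r\n";
$request .= "Content-Length: " . strlen($body) . "\r\n";
$request .= "Connection: close\r\n\r\n";
$request .= $body;
// Real code would then do: $fp = fsockopen($host, 80);
// fwrite($fp, $request); and read the response with fgets().
echo $request;
```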
Try file_get_contents() and a stream context:
$opts = array(
    'http' => array(
        'method'  => "POST",
        'content' => http_build_query(array('status' => $message))
    )
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.example.com/', false, $context);
Recently, with no changes to my code, my PHP page started to hang at a certain point. It generates all of the HTML on the page right up to this line:
$tickerJSON = file_get_contents("http://mtgox.com/code/data/ticker.php");
I commented out everything else and this is the cause of the error.
I know the JSON URL is valid and the array names are correct. I'm not sure where the problem is in this case. Any help?
Note: It doesn't display a partial or white page, it'll keep loading forever with no display output.
The problem is that the remote server appears to purposely stall requests that don't send a user-agent string. By default, PHP's user-agent string is blank.
Try adding this line directly above your call:
ini_set('user_agent', 'PHP/' . PHP_VERSION);
I've tested the above using this script and it worked great for me:
<?php
ini_set('user_agent', 'PHP/' . PHP_VERSION);
$tickerJSON = file_get_contents("http://mtgox.com/code/data/ticker.php");
echo $tickerJSON;
Update:
$tickerJSON = shell_exec('wget --no-check-certificate -q -O - https://mtgox.com/code/data/ticker.php');
The remote connection you make takes a very long time. You can work around that by providing a timeout value. If the request takes too long, the function won't return any data, but it also won't keep the script from continuing to run.
Next to that you need to set the user-agent:
// Create a stream
$opts = array(
    'http' => array(
        'timeout'    => 3, // 3 second timeout
        'user_agent' => 'hashcash',
        'header'     => "Accept-language: en\r\n"
    )
);
$context = stream_context_create($opts);
$url = "https://mtgox.com/code/data/ticker.php";
$tickerJSON = file_get_contents($url, FALSE, $context);
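One caveat worth adding to the snippet above: on a timeout or any other failure, file_get_contents returns false rather than throwing, so check the return value before decoding (sketch; the hostname is deliberately unresolvable):

```php
<?php
// Sketch: file_get_contents() returns FALSE on timeout or failure.
// The .invalid TLD is reserved and never resolves, so this fails fast.
$tickerJSON = @file_get_contents('http://nonexistent.invalid/ticker.php');
if ($tickerJSON === false) {
    echo "request failed\n";
} else {
    $ticker = json_decode($tickerJSON);
}
```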