page: http://www.nastygal.com/accessories/minnie-bow-clutch
code: $html = file_get_contents('http://www.nastygal.com/accessories/minnie-bow-clutch');
The $html always contains the USD price of the product even when I change the currency on the upper right of the page. How do I capture the html that has the CAD price when I change the currency of the page to CAD?
It looks like currency preferences are being saved in a cookie named: CURRENCYPREFERENCE
Since it's not your browser making the connection to retrieve that view, you're likely not sending any cookie data along with your request.
I believe example #4 here will get you what you need:
http://php.net/manual/en/function.file-get-contents.php
It seems as though the country and currency selection are stored in cookies.
I'm assuming you're going to have to pass those values along with your file_get_contents() call. See: PHP - Send cookie with file_get_contents
EDIT #1
To follow up on my comment, I just tested this:
// Create a stream
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie: CURRENCYPREFERENCE=cad\r\n"
)
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.nastygal.com/accessories/minnie-bow-clutch', false, $context);
print_r($file);
And was able to get back the page with the CAD price.
EDIT #2:
In response to your second comment. Those were important details. What does your bookmarklet do with the scraped contents? Are you saving a copy of the bookmarked product page on your own website? Regardless, you're going to have to modify your bookmarklet to check the user's cookies before submitting the request to run file_get_contents().
I was able to access my cookies from nastygal.com using the following simple bookmarklet example. Note: nastygal.com uses jQuery and the jQuery UI cookie plugin. If you're looking for a more generic solution, you should not rely on these scripts being there:
javascript:(function(){ console.log($.cookie('CURRENCYPREFERENCE')); }());
Output in the JS console:
cad
Related
I have a situation like this: a crawler script fetches the content of a URL using file_get_contents(). It sets the user agent to "CrawlerBot" via ini_set('user_agent') just above the line where file_get_contents() is called.
My concern is that when I call ini_get('user_agent') in the code of the target URL, it returns a blank value. However, when I use $_SERVER['HTTP_USER_AGENT'] it detects the correct user agent. Both files are hosted on the same server.
Does anybody know why this happens?
That's not what ini_get() does. It's for retrieving server configuration values (the configuration of your server), not request-specific values like the user agent sent by a requesting browser/script/whatever.
So, you can use ini_get() to find out what user agent value, if any, is set for requests made by your server, like the one you are actually making. You cannot use it to find out the user agent of a request made to your server.
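To make the distinction concrete, here is a minimal hedged sketch (crawler.php and target.php are hypothetical files on the same server):
// crawler.php -- the requesting side
ini_set('user_agent', 'CrawlerBot');  // configures PHP's own outgoing requests
echo ini_get('user_agent');           // "CrawlerBot": reading back our own configuration
$html = file_get_contents('http://example.com/target.php');
// target.php -- the receiving side
echo $_SERVER['HTTP_USER_AGENT'];     // "CrawlerBot": read from the incoming request
// ini_get('user_agent') here would return target.php's own (usually empty) setting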
Here is an example of code that sets the user agent and retrieves a resource with file_get_contents():
//Set uri
$uri = 'http://example.com';
//Init context
$ctx = stream_context_create(
array(
'http' => array(
'user_agent' => 'MySuperAgent/3.0'
)
)
);
//Try to retrieve content
if (($data = file_get_contents($uri, false, $ctx)) === false) {
die('file_get_contents error');
}
PS: Note that the context options must live under the 'http' key even for HTTPS resources.
PS2: I strongly suggest that you also set the timeout and the maximum number of acceptable redirections on the context to avoid slowdowns in your application.
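A minimal sketch of that suggestion (the timeout and redirect values are arbitrary assumptions; tune them to your application):
//Init context with a user agent, a timeout and a redirect cap
$ctx = stream_context_create(
    array(
        'http' => array(
            'user_agent'    => 'MySuperAgent/3.0',
            'timeout'       => 10,  // give up after 10 seconds
            'max_redirects' => 3    // follow at most 3 redirections
        )
    )
);
//Try to retrieve content
if (($data = file_get_contents('http://example.com', false, $ctx)) === false) {
    die('file_get_contents error');
}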
I would like to get the resulting web page of a specific form submit. This form uses POST, so my current goal is to be able to send POST data to a URL and get the HTML content of the result in a variable.
My problem is that I cannot use cURL (not enabled), which is why I'm asking if another solution is possible.
Thanks in advance.
See this, using fsockopen:
http://www.jonasjohn.de/snippets/php/post-request.htm
fsockopen() is in the PHP standard library, so every PHP version from 4 onwards has it :)
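A minimal sketch of the approach from that snippet (the host, path and form fields here are hypothetical):
// Build the POST body
$data = http_build_query(array('status' => 'hello'));
// Open a raw TCP connection to the web server
$fp = fsockopen('www.example.com', 80, $errno, $errstr, 30);
if (!$fp) {
    die("fsockopen error: $errstr ($errno)");
}
// Write the HTTP POST request by hand
$request  = "POST /form.php HTTP/1.1\r\n";
$request .= "Host: www.example.com\r\n";
$request .= "Content-Type: application/x-www-form-urlencoded\r\n";
$request .= "Content-Length: " . strlen($data) . "\r\n";
$request .= "Connection: close\r\n\r\n";
$request .= $data;
fwrite($fp, $request);
// Read back the full response (headers + body)
$response = '';
while (!feof($fp)) {
    $response .= fgets($fp, 1024);
}
fclose($fp);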
Try file_get_contents() with a stream context:
$opts = array(
    'http' => array(
        'method'  => "POST",
        // Without a Content-Type header many servers won't parse the body as form data
        'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => http_build_query(array('status' => $message)),
    )
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.example.com/', false, $context);
I want to make an HTTP POST to an outside URL using PHP. By outside URL I mean a URL not hosted on my servers. The URL is called in an iframe. I need to know if this is technically possible.
I tried doing this using cURL, but cURL creates its own session with the remote server, while I want to use the session the browser has already created.
Please let me know your thoughts on this.
<?php
// PHP code to make the HTTP POST goes here
?>
<iframe src="outside url to be posted" height="100" width="100"></iframe>
The outside URL is Google Calendar, so when I call it, if the user is already logged into Google, their calendar should display, and I need to make an HTTP POST to the calendar to save a calendar event.
I hope this makes it clearer what I am trying to achieve.
Update - Current Answer
After the update to your question, here's a different answer that I think addresses your issue more closely.
I think the question you are asking involves doing things with a user's credentials on another site. This is dancing dangerously close to Cross-site Request Forgery.
If you only do the POSTing when the user requests that you do it, it's a little better (I guess) but still inadvisable.
Why don't you use the Google Calendar API to do what you need?
Previous Answer
You need to tell cURL to use a particular session. Because PHP is managing the session, you'll also need to tell PHP to stop writing to the session while cURL uses it.
Try this:
$strCookie = 'PHPSESSID=' . $_COOKIE['PHPSESSID'] . '; path=/';
session_write_close();
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt( $ch, CURLOPT_COOKIE, $strCookie );
$response = curl_exec($ch);
curl_close($ch);
$_COOKIE['PHPSESSID'] will be the identifier for your PHP session, and $url will be the URL you've pulled out of the iframe.
This is taken virtually verbatim from this blog post. It was one of the first links on Google, so I didn't do a lot of extra digging.
I've done a bit of messing with cURL and PHP sessions, so this looks right based on what I remember.
Edit:
By the way, you should reference this SO question for the method to do POSTs with cURL. I assume you at least have some idea of how to do this, but there it is in case you need a refresher.
Also (in case it's not clear already), you can run as many
curl_setopt($handle, (CURL OPTION), (CURL VALUE));
lines as you need to configure cURL the way you need it: POST values, session settings, and so on, as in the sketch below.
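For instance, a minimal hedged sketch of a cookie-forwarding POST ($url and the form fields are placeholders, not Google Calendar's actual endpoint or parameters):
$strCookie = 'PHPSESSID=' . $_COOKIE['PHPSESSID'] . '; path=/';
session_write_close();
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIE, $strCookie);
// Make it a POST and attach the form fields
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array('field1' => 'value1')));
$response = curl_exec($ch);
curl_close($ch);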
Good luck!
It's JavaScript, not PHP.
<form id="post_form" method="post" target="post_frame">
<input type="hidden" name="field1" value="value1">
.... other fields
</form>
<iframe name="post_frame" height="100" width="100"></iframe>
<script type="text/javascript">
// The iframe must exist before the form is submitted into it
document.getElementById("post_form").submit();
</script>
Right off the file_get_contents man page:
<?php
// Create a stream
$opts = array(
    'http'=>array(
        'method'=>"POST",
        'header'=>"Accept-language: en\r\n" .
                  "Cookie: foo=bar\r\n",
        // Put the POST body into the 'content' part, not the Cookie header
        'content' => http_build_query(array('field1' => 'value1'))
    )
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.example.com/', false, $context);
?>
<div><?=$file?></div>
Not really an iframe, but the same idea.
I have been made aware of the Accept-Ranges header.
I have a URL that I am calling that always returns a 2 MB file. I don't need that much; I only need the last 20-50 KB.
I am not sure how to go about using it. Would I need to use cURL? I am currently using file_get_contents().
Would someone be able to provide me with an example / tutorial?
Thanks.
EDIT: If this isn't possible, then what is this post on about? Here ...
EDIT: Ulrika! I'm not insane.
This is possible using the Range header, provided the server supports it. See the HTTP 1.1 spec. You would want to send a header in the following format in your request:
Range: bytes=-50000
This would give you the last 50,000 bytes. Adjust to whatever you need.
You can specify this header in file_get_contents using a context. For example:
// Create a stream
$opts = array(
'http'=>array(
'method' => "GET",
'header' => "Range: bytes=-50000\r\n"
)
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.example.com/', false, $context);
If you were to file_get_contents() the URL and dump the result to a pass-through 'cache' file on disk, you could then use the Unix/Linux tail -c to grab back only the last 20 KB or so. This doesn't mitigate the actual transfer, but it gets just that 20 KB into the application.
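A minimal sketch of that workaround (the cache path and byte count are arbitrary assumptions):
// Dump the full download to a pass-through 'cache' file on disk
$cache = '/tmp/download.cache';
file_put_contents($cache, file_get_contents('http://www.example.com/bigfile'));
// Pull back only the last 20 KB into the application
$tail = shell_exec('tail -c 20480 ' . escapeshellarg($cache));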
This is indeed possible - see this question for an example of the HTTP headers sent and received
You can't do that. You're going to have to load the entire file (which is sent in its entirety, sequentially, by the source server) and just discard most of it.
What you're asking is like saying "I'm tuned to this radio station on my car stereo and I only want to hear the last 5 minutes of the show, without having to wait for the rest to play out or changing channels."
I am trying to scrape a supplier's Magento site in an effort to save some time, because there are around 2000 products I need to gather info for. I'm totally OK with writing a screen scraper for pretty much anything, but I've encountered a major problem. I'm using file_get_contents() to gather the HTML of the product page.
The problem is:
You need to be logged in to view the product page. It's a standard Magento login, so how can I get around this in my screen scraper? I don't require a full script, just advice on a method.
Using stream_context_create you can specify headers to be sent when calling your file_get_contents.
What I'd suggest is, open your browser and login to the site. Open up Firebug (or your favorite Cookie viewer) and grab the cookies and send them with your request.
Edit: Here's an example from PHP.net:
<?php
// Create a stream
$opts = array(
'http'=>array(
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie: foo=bar\r\n"
)
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents('http://www.example.com/', false, $context);
?>
Edit (2): This is outside the scope of your question, but if you are wondering how to scrape the website afterwards, you could look into the DOMDocument::loadHTML method. This will essentially give you the required functions (i.e. XPath queries, getElementsByTagName, getElementById) to scrape what you need.
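For instance, a minimal hedged sketch (the 'product-name' class is a made-up placeholder, not necessarily Magento's actual markup):
// $file holds the HTML fetched above
$doc = new DOMDocument();
libxml_use_internal_errors(true);  // real-world HTML is rarely valid XML
$doc->loadHTML($file);
libxml_clear_errors();
// Query the document with XPath
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//h1[@class="product-name"]') as $node) {
    echo trim($node->textContent), "\n";
}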
If you want to scrape something simple, you can also use RegEx with preg_match_all.
If you're familiar with cURL, this should be relatively simple to do in a day or so. I've created some similar apps to log in to banks and retrieve data, which of course also requires authentication.
Below is a link with an example of how to use CURL with cookies for authentication purposes:
http://coderscult.com/php/php-curl/2008/05/20/php-curl-cookies-example/
If you can grab the output of the page, you can parse it for your results with a regex. Alternatively, you can use a class like Snoopy to do this work for you (a sketch follows after the link):
http://sourceforge.net/projects/snoopy/
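A minimal hedged sketch using Snoopy's fetch/submit interface (the login URL and field names are placeholders based on a standard Magento 1 login form, so verify them against your supplier's site):
require_once 'Snoopy.class.php';
$snoopy = new Snoopy();
// Log in by POSTing the login form fields
$snoopy->submit('http://supplier.example.com/customer/account/loginPost/', array(
    'login[username]' => 'you@example.com',
    'login[password]' => 'secret',
));
// Carry the session cookies from the login response into later requests
$snoopy->setcookies();
// Fetch a product page as the logged-in user
$snoopy->fetch('http://supplier.example.com/some-product.html');
echo $snoopy->results;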