I am trying to cURL this URL so that it automatically adds a product to a basket:
http://www.juno.co.uk/cart/add/440551/01/
When I follow the URL in a browser it adds the product to the basket, but when I cURL it, nothing is added.
This is my cURL code:
$url = "http://www.juno.co.uk/cart/add/440551/01/";
$c = curl_init();
curl_setopt($c, CURLOPT_URL, $url);
$file_path = 'cookies.txt';
curl_setopt($c, CURLOPT_POST, true);
curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 50);
curl_setopt($c, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_COOKIEJAR, $file_path);
$complete = curl_exec($c);
curl_close($c);
Any ideas? cURL is definitely set up on my server, as I am successfully using it in other scripts.
You can see the output at http://soundshelter.net/addjuno.php?id=440551. It redirects to the page I expect (i.e. the item is added to the basket), but I do not want to send the user to that page. I only want to ping the page so that the item is added while the user stays on my page. Any ideas?
Thanks in advance.
The cart (or something about it: its id, contents, etc.) is stored in a session. You would have to create a custom function to which you can pass the id of the cart so that you can update it.
EDIT:
If this were possible, it would be a security risk (add items to anybody's cart?).
The user is identified by a session id, so you would need to "steal" it from your visitor and call the URL via cURL as if you were the user (you can create cookies for the cURL session, I think, and set the session id). Of course, this is very similar to stealing cookie/session data, and there are techniques that defend against it.
In my opinion, the only realistic solution is for juno.co.uk to offer a public API for such operations.
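To illustrate the mechanism described above (purely hypothetical, not a working cross-site solution): if you somehow had the visitor's juno.co.uk session id, you could replay it in the cURL request. The cookie name "session_id" and the variable below are assumptions for illustration; a third-party site cannot actually read this cookie, which is exactly the security point.
// Hypothetical sketch only: replaying a known session id in the cURL request.
$visitorSessionId = 'abc123'; // would have to be the visitor's juno.co.uk session id
$c = curl_init("http://www.juno.co.uk/cart/add/440551/01/");
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_COOKIE, "session_id=" . $visitorSessionId); // cookie name is a guess
curl_exec($c);
curl_close($c);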
The answer may be as simple as this: you shouldn't need to POST, and that might be causing problems since you aren't sending or specifying any data. What I mean is to comment out that line:
//curl_setopt($c, CURLOPT_POST, true);
Sidebar: can you show the output that you do get?
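For reference, a minimal version of the original snippet with the POST line removed and everything else unchanged:
$c = curl_init("http://www.juno.co.uk/cart/add/440551/01/");
// no CURLOPT_POST: the cart URL looks like a plain GET endpoint
curl_setopt($c, CURLOPT_CONNECTTIMEOUT, 50);
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLOPT_COOKIEJAR, 'cookies.txt');
$complete = curl_exec($c);
curl_close($c);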
I need to retrieve and parse the text of public domain books, such as those found on gutenberg.org, with PHP.
To retrieve the content of most webpages I can use cURL requests to get the HTML exactly as I would find it had I navigated to the URL in a browser.
Unfortunately, some pages, most importantly gutenberg.org pages, display different content or send a redirect header.
For example, when attempting to load this target gutenberg.org page, a cURL request gets redirected to this different but logically related gutenberg.org page. I am able to visit the target page with both cookies and JavaScript turned off in my browser.
Why is the curl request being redirected while a regular browser request to the same site is not?
Here is the code I use to retrieve the webpage:
$urlToScan = "http://www.gutenberg.org/cache/epub/34175/pg34175.txt";
if (!isset($userAgent)) {
    $userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36";
}
$ch = curl_init();
$timeout = 15;
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
#curl_setopt($ch, CURLOPT_HEADER, 1); // return HTTP headers with response
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_URL, $urlToScan);
$html = curl_exec($ch);
curl_close($ch);
if ($html === false) { // curl_exec() returns false on failure
    return false;
}
print $html;
The hint is probably in the URL: it says "welcome stranger". They are redirecting every first-time visitor to this page. Once you have visited the page, they will not redirect you any more.
They don't seem to be saving a lot of stuff in your browser, but they do set a cookie with a session id. This is the most logical thing really: check if there is a session.
What you need to do is connect with cURL AND a cookie. You could use your browser's cookie for this, but in case it expires you'd be better off doing the following:
1. Request the page.
2. If the page is redirected, save the cookie (you now have a session).
3. Request the page again with that cookie.
If all goes well, the second request will not redirect, until the cookie/session expires, and then you start again. See the manual for how to work with cookies/cookie jars, and the sketch below.
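A minimal sketch of that flow, assuming a writable cookie-jar file (the file name is arbitrary):
$jar = 'gutenberg_cookies.txt'; // arbitrary writable path
$ch = curl_init("http://www.gutenberg.org/cache/epub/34175/pg34175.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);  // save cookies set during the redirect
curl_setopt($ch, CURLOPT_COOKIEFILE, $jar); // send stored cookies on later requests
curl_exec($ch);         // first request: redirected, session cookie stored
$html = curl_exec($ch); // second request: cookie sent, should not redirect
curl_close($ch);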
The reason one could navigate to the target page in a browser without cookies or JavaScript, yet not with cURL, was that the website checks the referrer in the request headers. The page can be loaded without cookies by setting the appropriate referrer header:
curl_setopt($ch, CURLOPT_REFERER, "http://www.gutenberg.org/ebooks/34175?msg=welcome_stranger");
As pointed out by @madshvero, the page can also, surprisingly, be loaded by simply excluding the user agent.
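Put together, a minimal request using that fix might look like this:
$ch = curl_init("http://www.gutenberg.org/cache/epub/34175/pg34175.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, "http://www.gutenberg.org/ebooks/34175?msg=welcome_stranger");
$text = curl_exec($ch); // plain text of the book, no redirect expected
curl_close($ch);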
I was able to crawl a newspaper website successfully before, but it started failing today.
I can still access the site in Firefox; the failure only happens with cURL. That means my IP is allowed and has not been banned.
Here is the error shown by the site:
Please enable cookies.
Error 1010 Ray ID: 1a17d04d7c4f8888
Access denied
What happened?
The owner of this website (www1.hkej.com) has banned your access based
on your browser's signature (1a17d04d7c4f8888-ua45).
CloudFlare Ray ID: 1a17d04d7c4f8888 • Your IP: 2xx.1x.1xx.2xx •
Performance & security by CloudFlare
Here is my code, which worked before:
// Two different cookie files are needed, since cURL overwrites the old one
// when storing cookies. The files live under the apache folder.
$cookieMain = "cookieHKEJ.txt";
$cookieMobile = "cookieMobile.txt";
$agent = "User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:33.0) Gecko/20100101 Firefox/33.0";
// submit a login
function cLogin($url, $post, $agent, $cookiefile, $referer) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if the page refers elsewhere automatically
    curl_setopt($ch, CURLOPT_MAXREDIRS, 100);       // allow up to 100 redirects (the original passed 100 to FOLLOWLOCATION)
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // get the returned value as a string (don't print to screen)
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);    // spoof the user agent of the browser the user is on
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookiefile); // file for STORING cookies
    curl_setopt($ch, CURLOPT_POST, true);           // tell cURL that we are posting data
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post);    // post the data passed in
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    $output = curl_exec($ch); // execute
    curl_close($ch);
    return $output;
}
$input = cDisplay("http://www1.hkej.com/dailynews/toc", $agent, $cookieMain); // cDisplay() is a similar GET helper, not shown here
echo $input;
How can I use cURL to successfully pretend to be a browser? Did I miss some parameters?
As I said in the post, I can access the site in Firefox and my IP is not banned.
At last, I got it working after I changed the code from
$agent = "User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:33.0) Gecko/20100101 Firefox/33.0";
to
$agent = $_SERVER['HTTP_USER_AGENT'];
Honestly, I don't know why it started failing yesterday when the "User-Agent: " prefix was present, when it was fine before.
Thanks all anyway.
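A likely explanation (my reading, not confirmed in the original post): CURLOPT_USERAGENT expects only the header value, so including the literal "User-Agent: " prefix sends a malformed value, which a signature check like Cloudflare's can flag:
$ch = curl_init("http://www1.hkej.com/dailynews/toc");
// wrong: the header name baked into the value looks like a bot signature
// curl_setopt($ch, CURLOPT_USERAGENT, "User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:33.0) Gecko/20100101 Firefox/33.0");
// right: CURLOPT_USERAGENT takes the value only
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 5.1; rv:33.0) Gecko/20100101 Firefox/33.0");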
The site owners have used Cloudflare's security features to prevent you from crawling their website; more than likely you were flagged as a malicious bot. They will have done this based on your user agent and IP address.
Try changing your IP (if you are a home user, try rebooting your router; you will sometimes get a different IP address), try using a proxy, and try sending different headers with cURL.
More importantly, they do not want people crawling their site and affecting their traffic, so you should really ask permission for this.
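As a sketch of "sending different headers", you could make the request look more browser-like. The header values here are plausible examples, not values known to satisfy Cloudflare:
$ch = curl_init("http://www1.hkej.com/dailynews/toc");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 5.1; rv:33.0) Gecko/20100101 Firefox/33.0");
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-US,en;q=0.5',
));
$html = curl_exec($ch);
curl_close($ch);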
I have a small web page that, every day, displays a one word answer - either Yes or No - depending on some other factor that changes daily.
Underneath this, I have a Facebook like button. I want this button to post, in the title/description, either "Yes" or "No", depending on the verdict that day.
I have set up the OG metadata dynamically using php to echo the correct string into the og:title etc. But Facebook caches the value, so someone sharing my page on Tuesday can easily end up posting the wrong content to Facebook.
I have confirmed this is the issue by using the Facebook object debugger. As soon as I force a refresh, all is well. I attempted to automate this using curl, but this doesn't seem to work.
$ch = curl_init();
$timeout = 30;
curl_setopt($ch, CURLOPT_URL, "http://developers.facebook.com/tools/lint/?url=" . urlencode("http://ispizzahalfprice.com")); // the braces in the original were sent literally; encode the parameter instead
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$data = curl_exec($ch);
curl_close($ch);
echo $data;
Am I missing some easy fix here? Or do I need to re-evaluate my website structure to achieve what I am looking for (e.g. use two separate pages)?
Here's the page in case it's useful: http://ispizzahalfprice.com
Using two separate URLs would be the safe bet. As you have observed, Facebook caches URL scrapes quite heavily. You have also seen that you, as the admin of the app, can flush and refresh Facebook's cache by pulling the page through the debugger again.
Using two URLs solves the issue because Facebook can cache the results all they want: there will still be a separate URL for "yes" and one for "no".
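A rough sketch of how the page could pick the share target each day; getDailyVerdict() and the /yes and /no URLs are hypothetical stand-ins:
$verdict = getDailyVerdict(); // hypothetical: however the site computes the daily yes/no
$shareUrl = $verdict ? 'http://ispizzahalfprice.com/yes' : 'http://ispizzahalfprice.com/no';
// /yes and /no each serve fixed og:title / og:description tags,
// so whatever Facebook caches for them is always correct
echo '<div class="fb-like" data-href="' . $shareUrl . '"></div>';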
I am trying to get content from my Facebook page like so:
echo file_get_contents("http://www.facebook.com/dma.y");
The problem is that it doesn't give me the page but redirects me to another page saying that I need to upgrade my browser. So I thought I would use cURL and fetch it by sending a request with some headers.
echo get_follow_url('http://www.facebook.com/dma.y');
function get_follow_url($url) {
    // must set $url first. Duh...
    $http = curl_init($url);
    curl_setopt($http, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($http, CURLOPT_HTTPHEADER, get_headers('http://google.com'));
    // do your curl thing here
    $result = curl_exec($http);
    if (curl_errno($http)) {
        echo "<br/>An error has been thrown!<br/>";
        exit();
    }
    $http_status = curl_getinfo($http, CURLINFO_HTTP_CODE);
    curl_close($http);
    return $http_status;
}
Still no luck. I should get a status code of either 404 or 200, depending on whether I am logged into Facebook. But it returns 301, because it identifies my request as not being a regular browser request. So what am I missing in the cURL option settings?
UPDATE
What I am actually trying to do is to replicate this functionality:
The script triggers a function on load or on error, depending on the status code returned.
That code retrieves the page. However, that JavaScript method is clumsy and breaks in some browsers like Firefox, because it isn't a JavaScript file.
What you might want to try is to set the user agent with cURL.
$url = 'https://www.facebook.com/cocacola';
$http = curl_init($url);
$fake_user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040803 Firefox/0.9.3';
curl_setopt($http, CURLOPT_USERAGENT, $fake_user_agent);
curl_setopt($http, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it
$result = curl_exec($http);
curl_close($http);
The user agent is the parameter that servers look at to see what browser you are using. I'm not 100% sure this will bypass Facebook's checks and give you ALL the information on the page, but it's definitely worth a try! :)
I am creating an application via which a user shares a specific post on their Facebook wall or timeline page. This is done via the JavaScript SDK and the Facebook Graph API.
I need to collect all the comments and likes on that shared post, whose id I store in the database.
I then run a cron job which uses the Graph API again to get the posts and comments on a specific feed (id from the db) on Facebook.
But I want to know: is there any way to get real-time updates? For example, if someone comments on the feed, Facebook sends a request to my URL and that endpoint saves/updates the comment in my database.
If not, let me know whether my cron approach is the best way to do this, or whether there is another way.
Facebook does indeed give you the ability to get real-time updates, as discussed in this document.
According to that document, however, it doesn't look like you can get updates about the comments/likes of a post; you can only get updates to specific fields/collections of the User object, not a specific post.
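For reference, a rough sketch of what a real-time updates callback endpoint could look like, based on the subscription handshake described in that document; the verify token is an assumption (whatever you chose when subscribing), and payload handling is simplified:
define('VERIFY_TOKEN', 'my_token'); // assumption: the token registered with the subscription
if (isset($_GET['hub_mode']) && $_GET['hub_mode'] === 'subscribe'
        && $_GET['hub_verify_token'] === VERIFY_TOKEN) {
    echo $_GET['hub_challenge']; // verification handshake: echo the challenge back
} else {
    $updates = json_decode(file_get_contents('php://input'), true);
    // record which objects changed, then re-fetch them via the Graph API
}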
There is no such ability to update it in real time; you can do it with cron, or update the comment and like counts when the user hits a refresh button.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $POST_URL);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.1) Gecko/20100101 Firefox/10.0.1");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$file_content = curl_exec($ch);
curl_close($ch);
if ($file_content === false) {
    // the post was deleted or something else went wrong
} else {
    $post_data = json_decode($file_content, true);
}
In $POST_URL you put https://graph.facebook.com/ followed by the post id.
$post_data['likes']['count'] will then hold the like count, and $post_data['comments']['count'] the comment count.
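For example, with a hypothetical post id:
$POST_ID  = '123456789_987654321'; // hypothetical id from your database
$POST_URL = 'https://graph.facebook.com/' . $POST_ID;
// ...run the cURL snippet above, then:
echo $post_data['likes']['count'] . ' likes, ' . $post_data['comments']['count'] . ' comments';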