I need the user's image as an image object in PHP.
The obvious choice would be to do the following:
$url = 'https://graph.facebook.com/'.$fb_id.'/picture?type=large';
$img = imagecreatefromjpeg($url);
This works on my test server, but not on the server this script is eventually supposed to run on (allow_url_fopen is turned off there).
So I tried to get the image via curl:
function LoadJpeg($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    $fileContents = curl_exec($ch);
    curl_close($ch);
    $img = imagecreatefromstring($fileContents);
    return $img;
}
$url = 'https://graph.facebook.com/'.$fb_id.'/picture?type=large';
$img = LoadJpeg($url);
This, however, doesn't work with Facebook profile pictures.
Loading, for example, Google's logo from google.com using cURL works perfectly.
Can someone tell me why, or tell me how to achieve what I am trying to do?
You have to set
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
so that cURL follows the redirect to the image. Without it you get a 302 response with no image body, because the picture is served from another location, given in the Location field of the response header.
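For reference, here is the question's LoadJpeg() function with just that one option added (a minimal sketch; everything else is unchanged):
function LoadJpeg($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // follow the 302 redirect to the actual image location
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $fileContents = curl_exec($ch);
    curl_close($ch);
    return imagecreatefromstring($fileContents);
}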
The easiest solution: turn on allow_url_fopen
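Note that allow_url_fopen is a system-level setting: it can only be turned on in php.ini (allow_url_fopen = On) or the web server configuration, not at runtime with ini_set(). Code that has to run on both kinds of servers can detect it and fall back to cURL, along these lines:
// check whether URL wrappers are available before using them
if (ini_get('allow_url_fopen')) {
    $img = imagecreatefromjpeg($url);
} else {
    $img = LoadJpeg($url); // the cURL-based loader from the question
}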
Facebook most likely matches your user agent.
Spoof it like this:
// spoofing Chrome
$useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/13.0.782.215 Safari/525.13";
$ch = curl_init();
// set user agent
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
// set the rest of your cURL options here
I don't have to mention that this violates their TOS and might lead to legal problems, right? Also, make sure you follow their robots.txt!
I want my site to open internal links on the current page, without the browser navigating to a new URL (something like what the Instagram site does).
To do this I wanted to use jQuery/Ajax to make an XMLHttpRequest to a PHP file that prepares the new page's HTML, so I started with a PHP file that returns the contents of a URL for use in my current page:
<?php
$html = file_get_contents("http://example.com");
echo $html;
?>
It worked! So I tried my own link in that simple script:
<?php
$html = file_get_contents("http://apps.bestbadboy.ir/solver");
echo $html;
?>
It doesn't work! I searched Google and Stack Overflow for the reason; I learned that some sites don't allow this method and that cURL might help me. So I tried cURL:
<?php
function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0');
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
echo curl('http://apps.bestbadboy.ir/solver');
?>
After all that, it still doesn't work. :(
I kept searching but found nothing! My guess is that I should edit the .htaccess file, or add something to the URL(s) I want to fetch, to grant access to the PHP file doing the fetching. Please help me make my site work this way (like Instagram opening its internal links).
Very well, I found my solution! After a lot of searching and trying many approaches, I found the problem: cURL is a good method for getting a page's HTML source, but what kept it from working for me is that I had edited my .htaccess file to redirect HTTP requests to HTTPS, and my cURL call didn't follow that redirect. So I requested the URL in https:// form and it worked nicely:
<?php
function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0');
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
echo curl('https://apps.bestbadboy.ir/solver');
?>
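As a side note, when a cURL call like this comes back empty, curl_error() and CURLINFO_EFFECTIVE_URL usually reveal whether a redirect (or an SSL problem) is to blame. A small diagnostic sketch:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://apps.bestbadboy.ir/solver');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$data = curl_exec($ch);
if ($data === false) {
    // connection, redirect, or SSL errors that otherwise fail silently
    echo 'cURL error: ' . curl_error($ch);
}
// the URL cURL actually ended up at after any redirects
echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);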
My question is: I'm sending SMS to users from the user panel by requesting the following URL with the file() function:
$url = "http://sms.emefocus.com/sendsms.jsp?user=$uname&password=$pwd&mobiles=$mobiil_no&sms=$msg&senderid=$sender_id";
$ret = file($url);
After executing this, when I try to print $ret, it gives me status true and generates a message ID and a sending ID.
But the SMS never gets delivered to the user.
When I execute the same URL in the browser, with real values filled in, e.g.
http://sms.emefocus.com/sendsms.jsp?user=$uname&password=$pwd&mobiles=98xxxxxx02&sms=Hi..&senderid=$sender_id
it gets delivered immediately.
Can anyone help me out? Thanks in advance.
It is possible that this SMS service needs to think a browser, not a bot, is executing the request, or there is some "protection" we don't know about. Is there any documentation for this particular service? Is it intended to be used the way you're using it?
You can try with cURL and see if the behaviour is still the same:
<?php
// create cURL resource
$ch = curl_init();
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)';
// set the URL
curl_setopt($ch, CURLOPT_URL, "example.com");
// fake a real browser
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
// return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $ret contains the response body
$ret = curl_exec($ch);
// close cURL resource to free up system resources
curl_close($ch);
?>
Does it help?
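One more thing worth ruling out: the browser URL-encodes the query string for you (spaces and special characters in the message text, for example), while a hand-built string in PHP is sent as-is. A sketch using http_build_query(), with the parameter names taken from the question's URL:
$params = array(
    'user'     => $uname,
    'password' => $pwd,
    'mobiles'  => $mobiil_no,
    'sms'      => $msg,        // gets URL-encoded automatically
    'senderid' => $sender_id,
);
$url = 'http://sms.emefocus.com/sendsms.jsp?' . http_build_query($params);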
I have an app that uses cURL to scrape some elements of sites.
I've started receiving some errors that look like this:
"Not Acceptable!Not Acceptable!An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security."
Have you ever seen this?
If so, How can I get around it?
I checked two sites that do the same thing I do, and everything worked fine.
As for cURL, this is what I use:
public function cURL_scraping($url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($curl, CURLOPT_MAXREDIRS, 10);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_setopt($curl, CURLOPT_HTTPHEADER, array('Expect:'));
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl, CURLOPT_ENCODING, 'identity');
    $response['str'] = curl_exec($curl);
    $response['header'] = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    return $response;
}
Well, I found the reason. I removed the user agent and it works. I guess the server was blocking that specific user agent.
It looks like the site you are scraping has set up detection and blocking of scraping. To check this, you can try to fetch the page from the same IP and/or with all the same headers.
If that is the case, you really should respect the site owner's wish not to be scraped. You could ask them, or experiment to find out what level of scraping they accept. Did you read their robots.txt?
The block usually has a timeout, but it might be permanent. In that case you probably need to change your IP address to try again.
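If the block really is IP-based, routing the request through a proxy is the usual way to get a different address. A sketch (the proxy host and port here are placeholders, not a real service):
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// hypothetical proxy; replace with one you actually have access to
curl_setopt($ch, CURLOPT_PROXY, 'proxy.example.com:8080');
$response = curl_exec($ch);
curl_close($ch);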
I got the same error and found an answer just by playing around.
If you understand some basic Python, it will be easy to port the relevant code to the language you are working with.
I just added a header like this:
import requests
from bs4 import BeautifulSoup

url = "https://example.com/page"  # the page you are scraping
headers = {
    "User-Agent":
        "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
And this works!
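Since the question uses PHP, the equivalent change in the question's cURL code is a single option, reusing the user-agent string from the Python snippet above:
curl_setopt($curl, CURLOPT_USERAGENT,
    'Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0');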
In my application I am loading product info from a supplier:
$start_url = "http://www.example.com/product/product_code";
These URLs are usually redirected by the supplier's website, and I have written a function that successfully finds the destination URL, like so:
$end_url = destination( $start_url );
echo $start_url; // link gets redirected to the correct page
echo $end_url;   // links straight to the correct page, no redirection
However, if I want to get the HTML from the page...
echo file_get_contents( $start_url ); // 404
echo file_get_contents( $end_url ); // 404
...I just get the supplier's 404 page (not a generic one but their custom one).
I have allow_url_fopen enabled; file_get_contents( "http://www.example.com/" ) works fine.
I can use either URL to load the expected content in an iframe client-side, but cross-origin security prevents me from extracting the data I need.
The only thing I can think of is that the site might be using a URL rewriter; could this mess things up?
The PHP is running on my local machine, so as far as I'm aware the request should look no different from me viewing the website in a browser.
Thanks to @Loz Cherone ツ's comments, using cURL and changing the user agent worked.
$user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13";
$url = $_REQUEST["url"]; // e.g. www.example.com/product/ABC123
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follows any redirection
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
echo curl_exec($ch);
curl_close($ch);
I then put the response into the srcdoc attribute of an iframe client-side so I can access the DOM.
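That last step can look something like this (a sketch, assuming the cURL response was captured in $html rather than echoed directly; htmlspecialchars() escapes the fetched HTML so it can sit inside the attribute):
// embed the fetched page so its DOM is accessible client-side
echo '<iframe srcdoc="' . htmlspecialchars($html, ENT_QUOTES) . '"></iframe>';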
I want to access https://graph.facebook.com/19165649929?fields=name (it's also accessible via plain http) with cURL to get the file's content; more specifically, I need the "name" field (the response is JSON).
Since allow_url_fopen is disabled on my webserver, I can't use file_get_contents! So I tried it this way:
<?php
$page = 'http://graph.facebook.com/19165649929?fields=name';
$ch = curl_init();
//$useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";
//curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_URL, $page);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
?>
With that code I get a blank page! When I use another page, like http://www.google.com, it works like a charm (I get the page's content). I guess Facebook is checking something I don't know about... What can it be? How can I make the code work? Thanks!
Did you double-post this here?
php: Get html source code with cURL
However, in the thread above we found your problem to be that the host could not be resolved, and this was the solution:
//$url = "https://graph.facebook.com/19165649929?fields=name";
// DNS resolution fails, so request one of Facebook's IPs directly
$url = "https://66.220.146.224/19165649929?fields=name";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
// the certificate is issued for graph.facebook.com, not the bare IP
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
// tell the server which virtual host we actually want
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: graph.facebook.com'));
$output = curl_exec($ch);
curl_close($ch);
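Once $output holds the JSON, extracting the name is straightforward with json_decode():
// $output is a JSON string like {"name":"...","id":"19165649929"}
$data = json_decode($output, true);
echo $data['name'];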
Note that the Facebook Graph API requires authentication before you can view any of these pages.
You basically have two options for this: either you log in as an application (which you've registered before) or as a user. See the API documentation to find out how this works.
My recommendation is to use the official PHP SDK. You'll find it here. It does all the session and cURL magic for you and is very easy to use. Take the examples included in the package and start experimenting.
Good luck.