How to get the HTML source of a page with PHP

I want my site to open internal links on the current page, without the browser navigating to a new URL (similar to how Instagram's site behaves).
To do this, I planned to use jQuery/Ajax to send an XMLHttpRequest to a PHP file that prepares the new page's HTML. So I started with a PHP script that returns a URL's source for use in the current page:
<?php
$html = file_get_contents("http://example.com");
echo $html;
?>
It worked! So I tried the same simple script with my own link:
<?php
$html = file_get_contents("http://apps.bestbadboy.ir/solver");
echo $html;
?>
It doesn't work! I searched Google and Stack Overflow for the reason and learned that some sites don't allow this method, and that cURL might help, so I tried cURL:
<?php
function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0');
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
echo curl('http://apps.bestbadboy.ir/solver');
?>
After all that, it still doesn't work :(
I kept searching but found nothing. My guess is that I need to edit the .htaccess file, or add something to the URLs I want to fetch, to grant access to the PHP script doing the fetching. Please help me make my site work this way (like Instagram opening its internal links).

Very well, I found my solution! After a lot of searching and trying many approaches, I found the problem. cURL is a good method for getting a page's HTML source, but what kept it from working for me is that I had edited my .htaccess file to redirect HTTP requests to HTTPS, and that redirect wasn't being followed by the cURL setup I used. Once I requested the URL in its https:// form directly, it worked very nicely:
<?php
function curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:31.0) Gecko/20100101 Firefox/31.0');
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
echo curl('https://apps.bestbadboy.ir/solver');
?>
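For anyone hitting the same wall: when curl_exec() returns false, cURL can tell you why, instead of failing silently. A minimal debugging sketch (the helper name is mine, not from the answer above):

```php
<?php
// Hypothetical helper: fetch a URL, or return cURL's own error message.
function fetch_or_explain($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $data = curl_exec($ch);
    if ($data === false) {
        // curl_errno()/curl_error() explain failures such as SSL problems
        // or unresolved hosts that otherwise just look like "it doesn't work".
        $err = curl_errno($ch) . ': ' . curl_error($ch);
        curl_close($ch);
        return $err;
    }
    curl_close($ch);
    return $data;
}

// An unsupported protocol fails immediately, without any network access:
echo fetch_or_explain('foo://bar');
```

Checking curl_error() first would have revealed the redirect/HTTPS problem much faster than trial and error.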

Related

Loading website from a different domain PHP

In my application I am loading product info from a supplier:
$start_url = "http://www.example.com/product/product_code";
These URLs are usually redirected by the supplier's website, and I have written a function that successfully finds the destination URL, like so:
$end_url = destination( $start_url );
echo "start url"; // link get redirected to correct page
echo "end url"; // links straight to correct page, no redirection
However, if I want to get the HTML from the page...
echo file_get_contents( $start_url ); // 404
echo file_get_contents( $end_url ); // 404
...I just get the supplier's 404 page (not a generic one but their custom one).
I have allow_url_fopen enabled; file_get_contents( "http://www.example.com/" ) works fine.
I can use either URL to load the expected content in an iframe client-side, but XSS security prevents me extracting the data I need.
The only thing I can think of is that the site might be using a URL rewriter; could this mess things up?
The PHP is running on my local machine, so it should appear no different from me looking at the website via a browser as far as I'm aware.
Thanks to @Loz Cherone ツ's comments, using cURL and changing the user agent worked.
$user_agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13";
$url = $_REQUEST["url"]; // e.g. www.example.com/product/ABC123
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follows any redirection
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
echo curl_exec($ch);
curl_close($ch);
I then put the response into the srcdoc attribute of an iframe client-side so I can access the DOM.
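One detail worth noting with the srcdoc approach: the fetched HTML must be attribute-escaped before it is placed inside srcdoc, or the first double quote in the page will break the attribute. A sketch, assuming the cURL response is in $html (the sample markup is illustrative):

```php
<?php
// Assumption: $html holds the HTML returned by the cURL call above.
$html = '<p>Product "ABC123" &amp; details</p>';

// htmlspecialchars() with ENT_QUOTES escapes <, >, &, " and ' so the
// markup can sit safely inside the srcdoc="..." attribute.
$escaped = htmlspecialchars($html, ENT_QUOTES);

echo '<iframe srcdoc="' . $escaped . '"></iframe>';
```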

cURL retrieve only URL address

Using PHP and cURL, I'd like to check whether I can log in to a website with the provided user credentials. For that I currently retrieve the entire website and then use a regex to filter for keywords that might indicate the login didn't work.
The URL itself contains the string "errormessage" if a wrong username/password has been entered. Is it possible to use cURL to get only the URL address, without the contents, to speed things up?
Here's my curl PHP code:
function curl_get_request($referer, $submit_url, $ch)
{
    global $cookie_path;
    // sends a request via cURL with the options listed below
    $agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
    curl_setopt($ch, CURLOPT_URL, $submit_url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_path);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_path);
    return curl_exec($ch);
}
Also, if somebody has a better idea on how to handle a problem like this, please let me know!
What you should do is check the URL each time there is a redirect. Most redirects are going to be done with the proper HTTP headers. If that is the case, see this answer:
PHP: cURL and keep track of all redirections
Basically, turn off automatic redirection following, and check the HTTP status code for 301 or 302. If you get one of those, you can continue to follow the redirection if needed, or exit from there.
If instead, the redirection is happening client side, you will have to parse the page with a DOM parser.
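A sketch of the manual approach described above (the helper names are mine): turn off CURLOPT_FOLLOWLOCATION, then read the status code and the redirect target from curl_getinfo(). CURLOPT_NOBODY also answers the "without the contents" part of the question, since it skips the response body entirely:

```php
<?php
// Returns true for the HTTP status codes that signal a redirect.
function is_redirect($status) {
    return in_array($status, array(301, 302, 303, 307, 308), true);
}

// Issue a body-less request and report where the URL leads.
function check_redirect($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // do NOT auto-follow
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD-style: skip the body
    curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL); // redirect target, if any
    curl_close($ch);
    return array($status, $target);
}
```

From there you can test $target for the "errormessage" string without ever downloading the page body.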

Getting a user's profile image in PHP via cURL doesn't work

I need the user's image as an image object within PHP.
The obvious choice would be to do the following:
$url = 'https://graph.facebook.com/'.$fb_id.'/picture?type=large';
$img = imagecreatefromjpeg($url);
This works on my test server, but not on the server this script is supposed to run eventually (allow_url_fopen is turned off there).
So I tried to get the image via curl:
function LoadJpeg($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    $fileContents = curl_exec($ch);
    curl_close($ch);
    $img = imagecreatefromstring($fileContents);
    return $img;
}
$url = 'https://graph.facebook.com/'.$fb_id.'/picture?type=large';
$img = LoadJpeg($url);
This, however, doesn't work with Facebook profile pictures.
Loading, for example, Google's logo from google.com using cURL works perfectly.
Can someone tell me why or tell me how to achieve what I am trying to do?
You have to set
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
so that cURL follows the redirect and finds the image.
Without it you get a 302 response and no image, because the image lives at another location, given in the Location field of the response header.
The easiest solution: turn on allow_url_fopen
Facebook most likely matches your user agent.
Spoof it like ...
// spoofing Chrome
$useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/13.0.782.215 Safari/525.13";
$ch = curl_init();
// set user agent
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
// set the rest of your cURL options here
I don't have to mention that this violates their TOS and might lead to legal problems, right? Also, make sure you follow their robots.txt!

Cannot get XML output through cURL

I am using PHP cURL to fetch XML output from a URL. Here is what my code looks like:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.mydomain.com?querystring');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, "username:password");
$store = curl_exec($ch);
echo $store;
curl_close($ch);
But instead of returning the XML it just shows my 404 error page. If I type the URL http://www.mydomain.com?querystring into a web browser, I can see the XML there.
What am I missing here? :(
Thanks.
Some website owners check for the existence of certain things to make sure the request comes from a web browser and not a bot (or cURL). You should try adding curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'); and see if that fixes the problem. That will send a user-agent string. The site may also check for the existence of cookies or other things.
To output the XML in a web page, you'll need to use htmlentities(). You might want to wrap it inside an HTML <pre> element as well.
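To illustrate that last point: raw XML echoed into a page is treated as markup by the browser and renders as nothing visible. A minimal sketch, assuming the cURL response is in $store (the sample XML is illustrative):

```php
<?php
// Assumption: $store holds the XML string returned by curl_exec().
$store = '<note><to>you</to></note>';

// htmlentities() turns < and > into &lt; and &gt; so the browser displays
// the markup instead of trying to render it; <pre> preserves the layout.
echo '<pre>' . htmlentities($store) . '</pre>';
```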

php: Get url content (json) with cURL

I want to access https://graph.facebook.com/19165649929?fields=name (it's also accessible over plain "http") with cURL to get the file's contents; more specifically, I need the "name" field (it's JSON).
Since allow_url_fopen is disabled on my webserver, I can't use file_get_contents! So I tried it this way:
<?php
$page = 'http://graph.facebook.com/19165649929?fields=name';
$ch = curl_init();
//$useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";
//curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_URL, $page);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
?>
With that code I get a blank page! When I use another page, like http://www.google.com, it works like a charm (I get the page's content). I guess Facebook is checking something I don't know about... What can it be? How can I make the code work? Thanks!
Did you double-post this here?
php: Get html source code with cURL
However, in the thread above we found that your problem was being unable to resolve the host, and this was the solution:
//$url = "https://graph.facebook.com/19165649929?fields=name";
$url = "https://66.220.146.224/19165649929?fields=name";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: graph.facebook.com'));
$output = curl_exec($ch);
curl_close($ch);
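Once the JSON arrives, pulling out the name is a one-liner with json_decode(). A sketch (the payload below is an illustrative sample shaped like a Graph response, not live API output):

```php
<?php
// Illustrative payload shaped like the Graph API response for this query.
$json = '{"name":"Example Page","id":"19165649929"}';

// true => decode into an associative array instead of a stdClass object.
$data = json_decode($json, true);

echo $data['name'];
```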
Note that the Facebook Graph API requires authentication before you can view any of these pages.
You basically have two options for this: either you log in as an application (one you've registered before) or as a user. See the API documentation to find out how this works.
My recommendation is to use the official PHP SDK. You'll find it here. It does all the session and cURL magic for you and is very easy to use. Take the examples included in the package and start experimenting.
Good luck.
