How to get the full HTML content of a specific URL? - PHP

I used several methods to get the HTML content of aptoide.com in PHP.
1) file_get_contents();
2) readfile();
3) cURL as a PHP function:
function get_dataa($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Konqueror/4.0; Microsoft Windows) KHTML/4.0.80 (like Gecko)");
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
4) PHP Simple HTML DOM Parser:
include_once('simple_html_dom.php');
$url = "http://aptoide.com";
$html = file_get_html($url);
But all of them give empty output for aptoide.com.
Is there a way to get the full HTML content of that URL?

echo file_get_contents('http://www.aptoide.com/'); works fine for me.
So it's possible that aptoide.com has blocked you. If you want to change your IP (as you said in the comments), you have to use a proxy like this:
$url = 'http://aptoide.com/';
$proxy = '127.0.0.1:9095'; // Your proxy
// $proxyauth = 'user:password'; // Proxy authentication if required
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
//curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyauth);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
echo $curl_scraped_page;

Use your cURL get_dataa function with this line added:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
because that page redirects to www.aptoide.com.
Full function:
function get_dataa($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; Konqueror/4.0; Microsoft Windows) KHTML/4.0.80 (like Gecko)");
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
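You can confirm the redirect yourself with curl_getinfo; a small sketch (using the URL from the question):
$ch = curl_init("http://aptoide.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // the final URL after redirects
echo curl_getinfo($ch, CURLINFO_REDIRECT_COUNT); // how many redirects were followed
curl_close($ch);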

Related

I can't get the content of a specific URL

I can't get the content of this URL (I get a blank page): https://www.euronews.com/api/watchlive.json
I have always used this function and never had problems:
function file_get_contents_curl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($ch, CURLOPT_TIMEOUT, 3);
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    curl_setopt($ch, CURLOPT_HEADER, 5);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$data = json_decode(file_get_contents_curl("https://www.euronews.com/api/watchlive.json"), true);
$url = $data['url'];
echo $url;
Any idea?
EDIT: Unknown problem, caused (possibly) by the cURL module in Raspbian (Raspberry Pi).
I think you need to echo the function's return value if you want to see the content. I tried this on my end and I got the content, although I had some undefined variable errors because of $referer, $header, $useragent, and $cookie. But it runs and gets the result. Here is how I call the function:
echo file_get_contents_curl("https://www.euronews.com/api/watchlive.json");
You're getting more than you expect in your function response, so the json_decode is failing. You need to set CURLOPT_HEADER to false instead of 5.
curl_setopt($ch, CURLOPT_HEADER, FALSE);
You can verify what you're getting by doing
$response = file_get_contents_curl("https://www.euronews.com/api/watchlive.json");
var_dump($response);
$data = json_decode($response, true);
$url = $data['url'];
echo $url;
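If you do want to keep CURLOPT_HEADER enabled for debugging, a sketch of splitting the headers from the body with CURLINFO_HEADER_SIZE (this goes inside the function, where $ch is still open):
curl_setopt($ch, CURLOPT_HEADER, true);
$response = curl_exec($ch);
$headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE); // byte length of the header block
$headers = substr($response, 0, $headerSize); // raw response headers
$body = substr($response, $headerSize); // the JSON payload to decode
$data = json_decode($body, true);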

Proxy Authentication Required with cURL

I would like to use a cURL function, but I'm behind a proxy, so I get an HTTP/1.1 407 Proxy Authentication Required error...
This is the PHP code I use:
$proxy_user = 'Michiel';
$proxy_pass = 'mypassword';
$proxy_url = 'myproxyurl:port';
$proxy = true;
$service_url = "https://www.myapiurltocall.com";
$service_user = 'user:password:FO';
$service_pass = 'password';
$ch = curl_init($service_url);
// Set proxy if necessary
if ($proxy) {
    curl_setopt($ch, CURLOPT_PROXY, $proxy_url);
    curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxy_user.':'.$proxy_pass);
    curl_setopt($ch, CURLOPT_PROXYPORT, 8080);
    curl_setopt($ch, CURLOPT_PROXYAUTH, CURLAUTH_NTLM);
}
// Set service authentication
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_USERPWD, "{$service_user}:{$service_pass}");
// HTTP headers
$headers['Authorization'] = 'Basic ' . base64_encode("$proxy_user:$proxy_pass");
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_ENCODING, '');
curl_setopt($ch, CURLOPT_TIMEOUT, 15);
//WARNING: this would prevent curl from detecting a 'man in the middle' attack
curl_setopt ($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, 0);
$data = curl_exec($ch);
And I don't know what I'm doing wrong... How can I get around this error?
This is what I mostly use:
function curlFile($url, $proxy_ip, $proxy_port, $loginpassw)
{
    //$loginpassw = 'username:password';
    //$proxy_ip = '192.168.1.1';
    //$proxy_port = '12345';
    //$url = 'http://www.domain.com';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_PROXYPORT, $proxy_port);
    curl_setopt($ch, CURLOPT_PROXYTYPE, 'HTTP');
    curl_setopt($ch, CURLOPT_PROXY, $proxy_ip);
    curl_setopt($ch, CURLOPT_PROXYUSERPWD, $loginpassw);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
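A usage sketch, with the hypothetical values from the comments above:
echo curlFile('http://www.domain.com', '192.168.1.1', '12345', 'username:password');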
This is what I use and it works perfectly:
$proxy = "1.1.1.1:12121";
$useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";
$url = "http://www.google.pt/search?q=anonymous";
$credentials = "user:pass"
$ch = curl_init();
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT,15);
curl_setopt($ch, CURLOPT_HTTP_VERSION,'CURL_HTTP_VERSION_1_1' );
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_PROXYUSERPWD,$credentials);
curl_setopt($ch, CURLOPT_USERAGENT,$useragent);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER,0);
$result=curl_exec ($ch);
curl_close ($ch);
Try implementing this in your function:
function send_string($data) {
    // pr($data); exit;
    $equi_input = "Your values";
    // echo $equi_input; exit;
    $agent = ""; // agent setting for Netscape
    $url = ""; // URL to POST the form to (action of the form)
    // XML format
    $efx_request = strtoupper($equi_input);
    // echo $efx_request; exit;
    $post_fields = "site_id=&service_name=&efx_request=" . $efx_request; // fill in your site_id and service_name
    $credentials = ""; // your Base64-encoded "user:password"
    $headers = array("HTTP/1.1",
        "Content-Type: application/x-www-form-urlencoded",
        "Authorization: Basic " . $credentials
    );
    $fh = fopen('/tmp/test.txt', 'w') or die("can't open file"); // file to write the header information of the cURL request
    $ch = curl_init(); // initialize a cURL session
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_URL, $url); // pass URL as parameter
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_STDERR, $fh);
    curl_setopt($ch, CURLOPT_POST, 1); // use this option to POST a form
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields); // pass form fields
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ch); // grab the URL and pass it to the variable
    $error = curl_error($ch); // capture any error before closing the handle
    curl_close($ch); // close the cURL resource and free system resources
    fclose($fh);
    //echo '<pre>';print_r($array);echo '</pre>';exit;
    if (!empty($result)) {
        return $result;
    } else {
        echo "Curl Error: " . $error;
    }
}
Check that the request returns 200 OK...
And check your SSL certificate if you are using one...
Thanks.
Assuming $proxy_user:$proxy_pass are the correct values: if you are behind an office firewall, the authentication credentials to use would be those of a network administrator and not your own. I've encountered something very similar in my work environment, and that was the case for me, as my normal credentials didn't have the necessary privileges. The best way to confirm this is to make sure the code works on an open network with your proxy settings commented out. Hope this helps!
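To check whether the proxy itself is rejecting your credentials, you can inspect the status code after the request; a minimal sketch (using the $ch handle from the question, after curl_exec):
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // status code of the last response
if ($status === 407) {
    echo 'Proxy authentication failed (HTTP 407)';
}
curl_close($ch);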

Fetch the description of an article from Wikipedia

I am trying to make an API call to Wikipedia through http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=xml, but the XML is full of HTML and CSS tags.
Is there a way to fetch only the plain text, without tags? Thanks!
Edit 1:
$json = json_decode(file_get_contents('http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=json'));
$txt = strip_tags($json->text);
var_dump($json);
NULL is displayed.
The question was partially answered here:
$url = 'http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=json&prop=text';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "TestScript"); // required by the wikipedia.org server
$c = curl_exec($ch);
$json = json_decode($c);
var_dump(strip_tags($json->{'parse'}->{'text'}->{'*'}));
I was not able to use file_get_contents but it works fine with cURL.
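For completeness, file_get_contents can also send the User-Agent header that the wikipedia.org server requires, via a stream context; a sketch (untested against the live API):
$context = stream_context_create(array(
    'http' => array('header' => "User-Agent: TestScript\r\n")
));
$c = file_get_contents($url, false, $context);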
It is possible to fetch the info or description from Wikipedia by using XML:
$url = "http://en.wikipedia.org/w/api.php?action=opensearch&search=".$term."&format=xml&limit=1";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
curl_setopt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_HEADER, false); // Include head as needed
curl_setopt($ch, CURLOPT_NOBODY, FALSE); // Return body
curl_setopt($ch, CURLOPT_VERBOSE, FALSE); // Minimize logs
curl_setopt($ch, CURLOPT_REFERER, ""); // Referer value
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // No certificate
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
curl_setopt($ch, CURLOPT_MAXREDIRS, 4); // Limit redirections to four
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return in string
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; he; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8"); // Webbot name
$page = curl_exec($ch);
$xml = simplexml_load_string($page);
if((string)$xml->Section->Item->Description) {
print_r(array((string)$xml->Section->Item->Text,
(string)$xml->Section->Item->Description,
(string)$xml->Section->Item->Url));
} else {
echo "sorry";
}
But cURL must be installed on the server... have a nice day...

How to use CURL instead of file_get_contents?

I use the file_get_contents function to get and show external links on a specific page.
In my local environment everything is okay, but my server doesn't support the file_get_contents function, so I tried to use cURL with the code below:
function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
echo file_get_contents_curl('http://google.com');
But it returns a blank page. What is wrong?
Try this:
function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
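Call it the same way as in the question:
echo file_get_contents_curl('http://google.com');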
This should work:
function curl_load($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}
$url = "http://www.google.com";
echo curl_load($url);
I encountered this problem accessing Google Drive content via a direct link.
file_get_contents failed because the link returned 302 Moved Temporarily:
// Any Google URL. This example is a fake Google Drive direct link.
$url = "https://drive.google.com/uc?id=0BxQKKJYjuNElbFBNUlBndmVHHAj";
$html = file_get_contents($url);
echo $html; // prints nothing because of the 302 response
With the code below it worked again:
// Any Google URL. This example is a fake Google Drive direct link.
$url = "https://drive.google.com/uc?id=0BxQKKJYjuNElbFBNUlBndmVHHAj";
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); // 2 checks the certificate name against the host
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$html = curl_exec($ch);
curl_close($ch);
echo $html;
I tested it today, 03/19/2018
// You can try this. It should work fine.
function curl_tt($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); // 2 checks the certificate name against the host
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
echo curl_tt("https://google.com");

How to use CURL via a proxy?

I am looking to set cURL to use a proxy server. The URL is provided by an HTML form, which has not been a problem. Without the proxy it works fine. I have found code on this and other sites, but it does not work. Any help in finding the correct solution would be much appreciated. I feel that the below is close, but that I am missing something. Thank you.
The code below, which I adapted from http://www.webmasterworld.com/forum88/10572.htm, returns an error message about a missing T_VARIABLE on line 12.
<?
$url = '$_POST[1]';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 0);
curl_setopt($ch, CURLOPT_PROXY, '66.96.200.39:80');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST,'GET');
curl_setopt ($ch, CURLOPT_HEADER, 1)
curl_exec ($ch);
$curl_info = curl_getinfo($ch);
curl_close($ch);
echo '<br />';
print_r($curl_info);
?>
The code below is from curl through proxy returns no content:
<?
$proxy = "66.96.200.39:80";
$proxy = explode(':', $proxy);
$url = "$_POST[1]";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_PROXY, $proxy[0]);
curl_setopt($ch, CURLOPT_PROXYPORT, $proxy[1]);
curl_setopt($ch, CURLOPT_HEADER, 1);
$exec = curl_exec($ch);
echo curl_error($ch);
print_r(curl_getinfo($ch));
echo $exec;
?>
It is currently live on pelican-cement.com but also does not work.
UPDATE:
Thank you for all your help. I made the above changes, but now it only returns a blank screen.
<?
$url = $_POST['1'];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 0);
curl_setopt($ch, CURLOPT_PROXY, '66.96.200.39:80');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST,'GET');
curl_setopt ($ch, CURLOPT_HEADER, 1);
curl_exec ($ch);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
echo $curl_scraped_page;
?>
Here is a working version with your bugs removed.
$url = 'http://dynupdate.no-ip.com/ip.php';
$proxy = '127.0.0.1:8888';
//$proxyauth = 'user:password';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
//curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyauth);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
echo $curl_scraped_page;
I have added CURLOPT_PROXYUSERPWD in case any of your proxies require a user name and password.
I set CURLOPT_RETURNTRANSFER to 1 so that the data is returned to the $curl_scraped_page variable.
I removed the second, extra curl_exec($ch); which would have stopped the variable from being returned.
I consolidated your proxy IP and port into one setting.
I also removed CURLOPT_HTTPPROXYTUNNEL and CURLOPT_CUSTOMREQUEST, as they were the defaults.
If you don't want the headers returned, comment out CURLOPT_HEADER.
To disable the proxy simply set it to null.
curl_setopt($ch, CURLOPT_PROXY, null);
If you have any questions, feel free to ask; I work with cURL every day.
I have explained the use of the various cURL options required for using cURL with a proxy:
$url = 'http://dynupdate.no-ip.com/ip.php';
$proxy = '127.0.0.1:8888';
$proxyauth = 'user:password';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); // URL for the cURL call
curl_setopt($ch, CURLOPT_PROXY, $proxy); // proxy details with port
curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyauth); // use if the proxy requires a username and password
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5); // use if you need a specific proxy type
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // if the URL redirects, follow to the final URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_HEADER, 1); // set to 1 if you want header information in the response, else 0
$curl_scraped_page = curl_exec($ch);
curl_close($ch);
echo $curl_scraped_page;
root#APPLICATIOSERVER:/var/www/html# php connectiontest.php
61e23468-949e-4103-8e08-9db09249e8s1 OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 10.172.123.1:80
root#APPLICATIOSERVER:/var/www/html#
After declaring the proxy settings in the PHP script file, the issue was fixed:
$proxy = '10.172.123.1:80';
curl_setopt($cSession, CURLOPT_PROXY, $proxy); // proxy details with port
Here is a well-tested function which I used in my projects, with detailed, self-explanatory comments.
Ports other than 80 are often blocked by the server's firewall, so code that appears to work fine on localhost may not work on the server.
function get_page($url) {
    global $proxy;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    //curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_HEADER, 0); // return headers: 0 no, 1 yes
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return page: 1 yes
    curl_setopt($ch, CURLOPT_TIMEOUT, 200); // HTTP request timeout: 200 seconds
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects; needed if the URL changes
    curl_setopt($ch, CURLOPT_MAXREDIRS, 2); // in case the HTTP server gives a redirection response
    curl_setopt($ch, CURLOPT_USERAGENT,
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7");
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt"); // cookie storage
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // false for https
    curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // the page encoding
    $data = curl_exec($ch); // execute the HTTP request
    curl_close($ch); // close the connection
    return $data;
}
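A usage sketch (hypothetical URL; set the global $proxy and uncomment the CURLOPT_PROXY line to route through it):
$proxy = '127.0.0.1:8888'; // hypothetical proxy, only used if the CURLOPT_PROXY line is uncommented
echo get_page('http://www.example.com/');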
