PHP cURL returns encrypted HTML page

I'm trying to fetch plain HTML with a cURL GET request in PHP.
A GET request to the domain root, like http://example.com/ (not exactly this domain), returns the HTML I need, but a GET request to a page on that domain, like http://example.com/something, returns what looks like gzip-compressed data.
What I already tried to fix this issue:
curl_setopt($ch, CURLOPT_ENCODING, ''); // returns ''
curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // returns ''
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,compressed'); // returns ''
$html = gzdecode($data); // data error
By the way, an inspector like Fiddler shows the same weird symbols for this page, but one click fixes it: 'Click to decrypt'. How can I decrypt my data programmatically, using PHP?

If I understood you well, you need to get the HTML content from a URL.
Please check this link:
Get HTML from URL using curl in PHP
You don't need to use CURLOPT_ENCODING in curl_setopt.
EDIT
I tried this and it works:
<?php
function get_data($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$html_content = get_data('https://stackoverflow.com/questions/61548866/php-curl-returns-encrypted-html-page/61549219?noredirect=1#comment108875034_61549219');
echo "You are getting HTML code from an url <br>".$html_content;
?>
Thank you, I hope it helps you.
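For what it's worth, the compressed response itself is easy to handle: passing an empty string to CURLOPT_ENCODING makes libcurl advertise every encoding it supports (gzip, deflate, ...) and transparently decompress the body for you. A minimal sketch, with example.com standing in for the real domain:
<?php
// Placeholder URL; substitute the real page that returns gzip data.
$ch = curl_init('http://example.com/something');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Empty string: send an Accept-Encoding header with every supported
// encoding and let libcurl decode the compressed body automatically.
curl_setopt($ch, CURLOPT_ENCODING, '');
$html = curl_exec($ch);
if ($html === false) {
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);
echo $html;
?>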

Related

How to copy content from a dynamic page using PHP?

Is it possible, using PHP, to get the information displayed at the page linked below? I want all the text content displayed on the page copied to a variable or a file.
http://www.ncbi.nlm.nih.gov/nuccore/24655740?report=fasta&format=text
I have tried cURL too, but it didn't work, whereas cURL worked with a few other sites I know. But do post cURL-based solutions anyway; I might not have tried every method in which cURL can be used.
Use cURL to get the page content and then parse it - extract the <pre> section.
$ch = curl_init();
// Query data goes in the URL; the endpoint is the viewer.fcgi link from the update below
curl_setopt($ch, CURLOPT_URL, 'http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=24655740&db=nuccore&dopt=fasta&extrafeat=0&fmt_mask=0&maxplex=1&sendto=t&withmarkup=on&log$=seqview&maxdownloadsize=1000000');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 3);
$content = trim(curl_exec($ch));
curl_close($ch);
// show ALL the content
print $content;
$start_index = strpos($content, '<pre>')+5;
$end_index = strpos($content, '</pre>');
$your_text = substr($content, $start_index, $end_index-$start_index);
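If the strpos()/substr() slicing ever proves brittle, a DOMDocument-based extraction does the same job more robustly. This is just a sketch of that alternative, not part of the original answer:
<?php
// $content holds the HTML fetched above.
$doc = new DOMDocument();
// Suppress warnings about imperfect real-world markup.
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();
$pres = $doc->getElementsByTagName('pre');
if ($pres->length > 0) {
    // textContent strips the tags and returns the plain text.
    $your_text = $pres->item(0)->textContent;
    print $your_text;
}
?>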
UPDATE
Using the link from #ovitinho's answer - it now works :)
You need to request the URL that the form uses (via JavaScript) to show this result.
I found this final URL:
http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=24655740&db=nuccore&dopt=fasta&extrafeat=0&fmt_mask=0&maxplex=1&sendto=t&withmarkup=on&log$=seqview&maxdownloadsize=1000000
Note that it reuses 24655740 from your first link in this request.
You can use cURL.

Grabbing Destination Link of a Redirect

Hopefully I am just overlooking this.
I am trying to grab the destination URL of a redirect link using PHP. It's to get the site URL of an affiliate/cloaked link.
Best example: http://tinyurl.com/2tx goes to google.com
NOTE: This is an example, the links are created dynamically
Right now I pass the URL through
www.mysite.com/redirect.php?link=http://tinyurl.com/2tx
Here is the code from the site. NOTE: since the URL has ampersands in it, I had to go this route instead of reading $_GET directly.
<?php
$name = http_build_query($_GET);
// then strip away the leading 'name='
$name = substr($name, strlen('name='));
//change link to a nice URL
$url = rawurldecode($name);
?>
I have a simple script that grabs the URL, how could I process the URL to get the destination URL?
Hopefully that's not too confusing.
Cheers,
Robb
You should post some of your code next time. I assume you are using cURL to do this. It's fairly simple:
// sanitize/validate $_GET['link'] before using it
$ch = curl_init($_GET['link']);
// follow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// return the transfer instead of echoing it to the page
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
EDIT: per Dagon, you just want to "know the url but not go there." It is more efficient to use this setting if you only need to know the url but not get its contents:
curl_setopt($ch, CURLOPT_NOBODY, true);
Here is how I would do it (Read the comments):
<?php
// Connect to the page:
$ch = curl_init("http://tinyurl.com/2tx");
// Don't get the body (remove if you want the body):
curl_setopt($ch, CURLOPT_NOBODY, true);
// Follow the page redirects:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Return the data as a string (remove to echo to the page):
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Execute:
curl_exec($ch);
// Get data:
print_r($data = curl_getinfo($ch));
// Get just the url:
echo $data["url"];
Make an HTTP HEAD request to the URL you have. You will get back an HTTP 301 or 302 response with the destination URL in the Location header.
Example: paste your URL into an online HTTP request tool to see the response returned for a HEAD request.
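A rough PHP sketch of that idea, reading the Location header yourself instead of letting cURL follow it (the tinyurl link is the example from the question):
<?php
$ch = curl_init('http://tinyurl.com/2tx');
// HEAD request: headers only, no body.
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Do NOT follow the redirect; we want to read it ourselves.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
$headers = curl_exec($ch);
curl_close($ch);
// Pull the destination out of the Location header.
if (preg_match('/^Location:\s*(.+)$/mi', $headers, $m)) {
    echo trim($m[1]); // e.g. http://www.google.com
}
?>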
This might be an encoding issue. The parameter in your URL is not encoded, so it probably gets mangled when you read it with $_GET.
You want to use this URL:
www.mysite.com/redirect.php?link=http%3A%2F%2Ftinyurl.com%2F2tx
You can encode URL variables in PHP using the urlencode() function. The variable that (I think) you want can now be accessed like this:
echo $_GET['link']; // http://tinyurl.com/2tx
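For example, building the redirect link safely with urlencode() (mysite.com and redirect.php are the names from the question):
<?php
$target = 'http://tinyurl.com/2tx';
// urlencode() escapes :, /, & and so on, so the whole URL
// survives as a single GET parameter.
$link = 'http://www.mysite.com/redirect.php?link=' . urlencode($target);
echo $link;
// http://www.mysite.com/redirect.php?link=http%3A%2F%2Ftinyurl.com%2F2tx
?>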

Validate link href attribute

I need to periodically loop through links in my PHP database to check whether each link leads to a valid page. If the link has expired or is invalid, I don't want to output it. How can I efficiently check that the href value leads to a valid page?
Thanks for any *pointers.
You can also use curl_multi to run several cURL requests in parallel and get through the whole list faster.
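Here is a rough sketch of the curl_multi approach (the URLs are placeholders; in practice you would pull them from your database):
<?php
$urls = array('http://example.com/a', 'http://example.com/b');
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // status code is enough
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}
// Run all handles until every request has finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);
foreach ($handles as $url => $ch) {
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    echo $url . ' => ' . $code . "\n";
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>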
Look into cURL. It lets you pull a site in PHP: http://www.php.net/manual/en/function.curl-exec.php. Then just check the status code of the response, or something like a title tag.
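Checking the status code with curl_getinfo() is straightforward; a minimal, untested sketch:
<?php
function link_is_valid($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, skip the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    // Treat 2xx and 3xx as valid.
    return $code >= 200 && $code < 400;
}
?>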
I'm kind of a noob myself, but I would suggest using cURL. A quick Google search revealed the following code (which I haven't tested):
<?php
$statusCode = validateurl($_REQUEST['url']);
if ($statusCode == '200')
    echo 'Voila! URL '.$_REQUEST['url'].
        ' exists, returned code is: '.$statusCode;
else
    echo 'Oops! URL '.$_REQUEST['url'].
        ' does NOT exist, returned code is: '.$statusCode;

function validateurl($url)
{
    // Initialize the handle
    $ch = curl_init();
    // Set the URL to be executed
    curl_setopt($ch, CURLOPT_URL, $url);
    // Include the header in the output
    curl_setopt($ch, CURLOPT_HEADER, true);
    // Do NOT output the body content
    curl_setopt($ch, CURLOPT_NOBODY, true);
    /* Set to TRUE to return the transfer as the string
       return value of curl_exec(), instead of outputting
       it directly */
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Execute it
    $data = curl_exec($ch);
    // Finally close the handle
    curl_close($ch);
    /* We are only interested in the HTTP status code,
       so we use preg_match to extract it; the second
       element of the returned array is the status code */
    preg_match("/HTTP\/1\.[10]\s(\d{3})/", $data, $matches);
    return $matches[1];
}
?>
Source: http://www.ajaxapp.com/2009/03/23/to-validate-if-an-url-exists-use-php-curl/

Using PHP, how can I read a server's error page?

I want to read a server's reply to a certain request, modify it to my needs, and send it to the site visitor. get_headers() works perfectly for the headers, but if the requested file is missing (404), which is exactly the case I want to handle, file_get_contents(), readfile() and the other functions I've tried all break with a warning that the file is missing, instead of reading the response stream into a variable.
So what I want is a function similar to get_headers(), only for the rest of the data: a get_data() that doesn't bail out on errors. Is there such a thing?
Thanks for reading.
Use curl_exec. It will always return the body unless the CURLOPT_FAILONERROR option is set to TRUE.
Here's an example:
$url = 'http://www.example.com/thisrequestwillerror';
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, false);
// This is the default, but just making sure...
curl_setopt($curl, CURLOPT_FAILONERROR, false);
// Execute and return as a string
$str = curl_exec($curl);
curl_close($curl);
// Dump the response body
var_dump($str);
Wrap this in a function and use it wherever you need to get an HTTP response body in your application.
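For example (get_response_body is just a suggested name):
<?php
// Returns the response body even for 4xx/5xx responses,
// or false if the request itself failed.
function get_response_body($url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_HEADER, false);
    curl_setopt($curl, CURLOPT_FAILONERROR, false);
    $str = curl_exec($curl);
    curl_close($curl);
    return $str;
}

var_dump(get_response_body('http://www.example.com/thisrequestwillerror'));
?>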

Why does curl_exec() return partial HTML from one URL and full (as expected) HTML on others?

Compare the following two chunks of code using two values for $url:
1)
$url = 'http://www.localharvest.org';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
echo htmlspecialchars(curl_exec($ch));
2)
$url = 'http://www.localharvest.org/caledonia-farm-M136';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
echo htmlspecialchars(curl_exec($ch));
1 returns full HTML as expected. 2 only returns a single line of HTML. Visiting the second page confirms there is in fact much more HTML.
Why?
<3
I just tried this.
I got the same result. It might be because of the way cURL looks for the headers. Headers are separated from the body by a blank line (\r\n\r\n).
If you look at the content of the second URL you will see that rather fantastically there is a lot of white space around the line:
<!--jsp:setProperty name="mapg" property="projection" value="init"/-->
cURL might be getting confused about what is body and what is header.
I suggest you use some different options to see what cURL is actually getting back; try CURLOPT_HEADER.
Full list of PHP Curl options
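A sketch of that debugging step, using CURLINFO_HEADER_SIZE to see exactly where libcurl thinks the headers end (URL taken from the question):
<?php
$ch = curl_init('http://www.localharvest.org/caledonia-farm-M136');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Include the raw response headers in the output.
curl_setopt($ch, CURLOPT_HEADER, true);
$response = curl_exec($ch);
// Where libcurl says the headers stop and the body starts.
$headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
curl_close($ch);
echo htmlspecialchars(substr($response, 0, $headerSize)); // headers
echo htmlspecialchars(substr($response, $headerSize));    // body
?>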
