How to copy content from a dynamic page using PHP? - php

Is it possible to get the information displayed on the page linked below using PHP? I want all the text content displayed on the page to be copied to a variable or to a file.
http://www.ncbi.nlm.nih.gov/nuccore/24655740?report=fasta&format=text
I have tried cURL too, but it didn't work, whereas cURL worked with a few other sites I know. Even so, if there are solutions using cURL, please post them. I may already have tried some of the ways cURL can be used.

Use cURL to get the page content and then parse it to extract the <pre> section.
$ch = curl_init();
// Set the full URL here, including the query data
curl_setopt($ch, CURLOPT_URL, 'http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=24655740&db=nuccore&dopt=fasta&extrafeat=0&fmt_mask=0&maxplex=1&sendto=t&withmarkup=on&log$=seqview&maxdownloadsize=1000000');
// Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Give up after 3 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 3);
$content = trim(curl_exec($ch));
curl_close($ch);
// show ALL the content
print $content;
$start_index = strpos($content, '<pre>')+5;
$end_index = strpos($content, '</pre>');
$your_text = substr($content, $start_index, $end_index-$start_index);
UPDATE
Using the link from @ovitinho's answer, it now works :)
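The strpos()/substr() extraction can be wrapped in a small helper that copes with a missing tag. This is only a sketch: extract_pre() is a hypothetical name, and the sample string is made up for illustration rather than taken from the real NCBI response.

```php
<?php
// Hypothetical helper: return the text inside the first <pre>...</pre>
// pair, or null when either tag is missing.
function extract_pre(string $html): ?string
{
    $open = strpos($html, '<pre>');
    if ($open === false) {
        return null;
    }
    $start = $open + strlen('<pre>');
    // Look for the closing tag only after the opening one
    $end = strpos($html, '</pre>', $start);
    if ($end === false) {
        return null;
    }
    return substr($html, $start, $end - $start);
}

// Made-up sample input, not the real response
$sample = '<html><body><pre>ACGT</pre></body></html>';
echo extract_pre($sample), "\n";
```

Checking for false matters because strpos() returns false when the tag is absent, and false + 5 would silently become 5.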

You need to request the URL that the form uses to load this result via JavaScript.
I found this final URL:
http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=24655740&db=nuccore&dopt=fasta&extrafeat=0&fmt_mask=0&maxplex=1&sendto=t&withmarkup=on&log$=seqview&maxdownloadsize=1000000
Make sure to use the ID 24655740 from your first link in this request.
You can use cURL.

Related

PHP cURL returns encrypted html page

I'm trying to get simple HTML code from a cURL GET request in PHP.
A default GET request to a URL like http://example.com/ (not exactly this domain) returns the HTML code I need, but a GET request to a page on this domain, like http://example.com/something, returns gzip-encoded data or something similar.
What I have already tried to fix this issue:
curl_setopt($ch, CURLOPT_ENCODING, ''); // returns ''
curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // returns ''
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,compressed'); // returns ''
$html = gzdecode($data); // data error
By the way, in an inspector like Fiddler, this page shows similar weird symbols, but that is fixed with one click: 'Click to decrypt'. How can I decrypt my data programmatically, using PHP?
If I understood you correctly, you need to get the HTML content from a URL.
Please, check this link:
Get HTML from URL using curl in PHP
You don't need to use CURLOPT_ENCODING in curl_setopt.
EDIT
I tried this and it works:
<?php
function get_data($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$html_content = get_data('https://stackoverflow.com/questions/61548866/php-curl-returns-encrypted-html-page/61549219?noredirect=1#comment108875034_61549219');
echo "You are getting HTML code from an url <br>".$html_content;
?>
Thank you, I hope it helps you.
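For what it's worth, gzip data is compressed rather than encrypted, and a gzip stream starts with the two bytes 0x1f 0x8b. A minimal sketch, assuming the zlib extension is available and that the body really is gzip (it may be another encoding, in which case this does not apply); maybe_gunzip() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: decompress $data when it starts with the gzip
// magic bytes (0x1f 0x8b); otherwise return it unchanged.
function maybe_gunzip(string $data): string
{
    if (strncmp($data, "\x1f\x8b", 2) !== 0) {
        return $data; // does not look like gzip
    }
    // gzdecode() returns false when the data is corrupt
    $decoded = @gzdecode($data);
    return $decoded === false ? $data : $decoded;
}

// Round trip with locally compressed data, no network involved
$compressed = gzencode('<html>hello</html>');
echo maybe_gunzip($compressed), "\n";
echo maybe_gunzip('<html>plain</html>'), "\n";
```

In a real request, $data would be the string returned by curl_exec() with CURLOPT_RETURNTRANSFER enabled.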

Load divs' content from another domain to my pages' php variable

I'm trying to figure out how to take information from another site (with a different domain name) and use it in my PHP program.
Explanation:
The user inputs a URL from another site.
jQuery or PHP takes the information from the entered URL. I know where the information is (I know its div's ID).
That value is put into my PHP program as a variable, $kaina for example.
EX:
User enters URL:http://www.sportsdirect.com/lee-cooper-bud-mens-boots-118358
And I want to get the Price. (27,99)
What language should I use: PHP, jQuery, or something else?
What function should I use?
What should the program look like?
Thank you for your answers :)
I'd say you have to use PHP (cURL or file_get_contents) to download the page onto your server, then parse it or use a regular expression to get the price. In this case it will be even trickier, because it looks like this link leads to a page that uses JavaScript.
You also have to know the format of the data you are going to extract. So PHP will do the job.
PHP's cURL library should do the trick for you: http://php.net/manual/en/book.curl.php
<?php
$ch = curl_init("http://www.example.com/");
$fp = fopen("example_homepage.txt", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
?>
You need to research each of the steps mentioned below.
One thing you can do is post the message entered by the user to the server, i.e. to a PHP file, where you can extract the URL entered by the user.
To extract the URL from the user's post, you can use a regex search.
Check this link out:
Extract URLs from text in PHP
Now you can use cURL to fetch the URL extracted from the user input.
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $extracted_url );
$html = curl_exec ( $ch );
curl_close($ch);
The cURL output will contain the complete HTML of the page; you can then use an HTML parser
$DOM = new DOMDocument;
$DOM->loadHTML($html);
to walk the document until the required div is found and read its value.
I would probably do something like this:
get the contents of the page: $contents = file_get_contents("http://www.sportsdirect.com/lee-cooper-bud-mens-boots-118358");
convert the contents you just got to XML: $xml = new SimpleXMLElement($contents); (this only works if the page is well-formed XML)
search the XML for the node with attribute itemprop="price" using an XPath query
read the contents of that node, et voilà, you have your price
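Since most real pages are not well-formed XML, a more forgiving variant of these steps uses DOMDocument with DOMXPath. This is a sketch under assumptions: find_price() is a hypothetical helper, and the sample markup is invented for illustration, so the real page may be structured differently.

```php
<?php
// Hypothetical helper: return the text of the first element carrying
// itemprop="price", or null when there is none.
function find_price(string $html): ?string
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//*[@itemprop="price"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up markup for illustration only
$sample = '<div><span itemprop="price">27,99</span></div>';
echo find_price($sample), "\n";
```

If the price is filled in by JavaScript after the page loads, it will not be present in the downloaded HTML and this lookup returns null.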

Grabbing Destination Link of a Redirect

Hopefully I am just overlooking this.
I am trying to grab the destination URL of a redirect link using PHP. It's to get the site URL of an affiliate/cloaked link.
Best example: http://tinyurl.com/2tx goes to google.com
NOTE: This is an example, the links are created dynamically
Right now I pass the URL through
www.mysite.com/redirect.php?link=http://tinyurl.com/2tx
Here is the code from the site. NOTE: since the URL has ampersands in it, I had to go this route rather than plain GET.
<?php
$name = http_build_query($_GET);
// then you may want to strip away the leading 'name='
$name = substr($name, strlen('name='));
//change link to a nice URL
$url = rawurldecode($name);
?>
I have a simple script that grabs the URL, how could I process the URL to get the destination URL?
Hopefully that's not too confusing.
Cheers,
Robb
You should post some of your code next time. I assume you are using cURL to do this. It's fairly simple:
// sanitize $_GET['link'] before using it
$ch = curl_init($_GET['link']);
//follow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
$url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
EDIT: per Dagon, you just want to "know the url but not go there." It is more efficient to use this setting if you only need to know the url but not get its contents:
curl_setopt($ch, CURLOPT_NOBODY, true);
Here is how I would do it (Read the comments):
<?php
// Connect to the page:
$ch = curl_init("http://tinyurl.com/2tx");
// Don't get the body (remove if you want the body):
curl_setopt($ch, CURLOPT_NOBODY, true);
// Follow the page redirects:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Return the data as a string (remove to echo to the page):
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Execute:
curl_exec($ch);
// Get data:
print_r($data = curl_getinfo($ch));
// Get just the url:
echo $data["url"];
Make a HTTP HEAD Request to the URL you have. You will get back an HTTP 301 or 302 response with the destination URL.
Example: Put your URL here to see the response returned when making an HTTP Head Request.
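With a HEAD request that does not follow redirects, the destination is in the Location response header. A minimal sketch of pulling it out of raw header text; extract_location() is a hypothetical helper and the sample response is made up, not captured from a real server. With cURL, such header text could be obtained by setting CURLOPT_NOBODY, CURLOPT_HEADER and CURLOPT_RETURNTRANSFER to true and leaving CURLOPT_FOLLOWLOCATION off.

```php
<?php
// Hypothetical helper: return the value of the first Location header
// found in raw response headers, or null when there is none.
function extract_location(string $rawHeaders): ?string
{
    // Header names are case-insensitive; match one header line only
    if (preg_match('/^Location:[ \t]*(.+?)[ \t\r]*$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Made-up sample response for illustration
$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Location: http://www.google.com/\r\n"
        . "Content-Length: 0\r\n\r\n";
echo extract_location($sample), "\n";
```

Note that a chain of several redirects needs this repeated per hop, which is what CURLOPT_FOLLOWLOCATION with CURLINFO_EFFECTIVE_URL does for you.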
This might be an encoding issue. The parameter in your URL is not encoded, so it's probably damaged when trying to get it using $_GET.
You want to use this URL:
www.mysite.com/redirect.php?link=http%3A%2F%2Ftinyurl.com%2F2tx
You can encode URL variables in PHP using the urlencode() function. The variable that (I think) you want can now be accessed like this:
echo $_GET['link']; // http://tinyurl.com/2tx
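A short sketch of both sides, assuming the redirect.php address from the question. The second target URL is a made-up example that contains an ampersand, to show that it survives the round trip.

```php
<?php
// Building the link: encode the target before putting it in the query string
$target = 'http://tinyurl.com/2tx';
$link = 'www.mysite.com/redirect.php?link=' . urlencode($target);
echo $link, "\n";

// Reading it back: parse_str() decodes the value, as PHP does for $_GET
parse_str(parse_url('http://' . $link, PHP_URL_QUERY), $params);
echo $params['link'], "\n";

// A target that itself contains '&' stays in one piece (made-up URL)
$tricky = 'http://example.com/?a=1&b=2';
parse_str('link=' . urlencode($tricky), $decoded);
echo $decoded['link'], "\n";
```

Because the ampersand is encoded as %26, it is no longer treated as a separator between parameters.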

Header() substitute

Hi, I am new to PHP and want to know an alternative to header('Location: mysit.php');
My scenario is that I am sending the request like this:
header('Location: http://localhost/(some external site).php?var=test');
or something like it, but what I want is to send the values of variables to the external site without that page actually being shown.
I mean the variables should be sent to some external site/page, but on screen I want to be redirected to my login page. I don't know of any alternative, so please guide me. Thanks.
You are looking for PHP cURL:
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
// grab URL and pass it to the browser
curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
Set the location header to the place you actually want to redirect the browser to and use something like cURL to make an HTTP request to the remote site.
The way you usually would do that is by sending those parameters by cURL, parse the return values and use them however you need.
By using cURL you can pass POST and GET variables to any URL.
Like so:
$ch = curl_init('http://example.org/?aVariable=theValue');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
Now, in $result you have the response from the URL passed to curl_init().
If you need to post data, the code needs a little more:
$ch = curl_init('http://example.org/page_to_post_to.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, 'variable1=value1&variable2=value2');
$result = curl_exec($ch);
curl_close($ch);
Again, the result of your POST request is saved to $result.
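When the values come from variables, it can be safer to build the POST body with http_build_query() than to concatenate strings, since it URL-encodes each value. A minimal sketch; the field names and values are placeholders:

```php
<?php
// Placeholder field names and values for illustration
$fields = [
    'variable1' => 'value1',
    'variable2' => 'a value & more',
];

// Produces a URL-encoded string suitable for CURLOPT_POSTFIELDS
$body = http_build_query($fields);
echo $body, "\n";

// Decoding the body again gives back the original values
parse_str($body, $roundTrip);
var_dump($roundTrip === $fields);
```

The resulting $body can be passed as the third argument of curl_setopt($ch, CURLOPT_POSTFIELDS, ...).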
You could connect to another URL in the background in numerous ways. There's cURL ( http://php.net/curl - already mentioned here in previous comments ), there's fopen ( http://php.net/manual/en/function.fopen.php ), there's fsockopen ( http://php.net/manual/en/function.fsockopen.php - little more advanced )

Why does curl_exec() return partial HTML from one URL and full (as expected) HTML on others?

Compare the following two chunks of code using two values for $url:
1)
$url = 'http://www.localharvest.org';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
echo htmlspecialchars(curl_exec($ch));
2)
$url = 'http://www.localharvest.org/caledonia-farm-M136';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
echo htmlspecialchars(curl_exec($ch));
1 returns full HTML as expected. 2 only returns a single line of HTML. Visiting the second page confirms there is in fact much more HTML.
Why?
<3
I just tried this.
I got the same result. It might be because of the way cURL looks for the headers. Headers are separated from the main content by a blank line (the four characters \r\n\r\n).
If you look at the content of the second URL you will see that rather fantastically there is a lot of white space around the line:
<!--jsp:setProperty name="mapg" property="projection" value="init"/-->
Curl might be getting confused as to what is body and what is header.
I suggest you use some different options to see what Curl is actually getting back, try CURLOPT_HEADER
Full list of PHP Curl options
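To inspect what comes back, the response returned with CURLOPT_HEADER enabled can be split into headers and body at the first blank line. This is a sketch only: split_response() is a hypothetical helper and the sample response is made up. When cURL follows redirects there may be several header blocks, in which case curl_getinfo($ch, CURLINFO_HEADER_SIZE) is a more reliable way to find where the body starts.

```php
<?php
// Hypothetical helper: split a raw HTTP response into [headers, body]
// at the first blank line (\r\n\r\n).
function split_response(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    return [$parts[0], $parts[1] ?? ''];
}

// Made-up response; the body itself contains blank lines
$sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
        . "<html>\r\n\r\n<body>hi</body></html>";
[$headers, $body] = split_response($sample);
echo $headers, "\n---\n", $body, "\n";
```

Only the first blank line is treated as the separator, so extra whitespace inside the HTML stays part of the body.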
