Possible Duplicate:
How to extract img src, title and alt from html using php?
I want to replicate some functionality from Digg.com whereby when you post a new address it automatically scans the url and finds the page title.
How is this done in PHP? And is there an existing content management system that lets you build a site like Digg?
You can use file_get_contents() to get the data from the page, then use preg_match() along with a regex pattern to get the data between <title></title>
'/<title>(.*?)<\/title>/'
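A minimal sketch of that approach (the example URL is only a placeholder; the i and s modifiers are added so the match survives odd casing and line breaks):
<?php
// Fetch the page and pull the text between <title> and </title>.
$html = file_get_contents('http://www.example.com/');
if ($html !== false && preg_match('/<title>(.*?)<\/title>/is', $html, $match)) {
    echo trim($match[1]); // the page title
}
?>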
You can achieve this with an Ajax call to the server, where you cURL the URL and send back the details you want. You might be interested in the title, description, keywords, etc.
function get_title($url) {
    $ch = curl_init();
    $timeout = 5; // connection timeout in seconds
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    // $data now holds the whole page; pull out the text between
    // <title> and </title> (7 is the length of '<title>')
    $titleName = '';
    $start = strpos($data, '<title>');
    $end = strpos($data, '</title>');
    if ($start !== false && $end !== false) {
        $titleName = substr($data, $start + 7, $end - ($start + 7));
    }
    return $titleName;
}
You will need a smarter way of parsing, though, because a simple string search will not find variants like <title > Google < /title>.
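One way to do that, sketched here with PHP's built-in DOMDocument (an assumption on my part, not part of the original answer), is to let a real HTML parser find the tag regardless of spacing or case:
<?php
// Sketch: extract the title from the page source returned by curl_exec() above.
function extract_title($data) {
    $dom = new DOMDocument();
    @$dom->loadHTML($data); // @ silences warnings on malformed real-world markup
    $titles = $dom->getElementsByTagName('title');
    return $titles->length ? trim($titles->item(0)->textContent) : '';
}
?>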
Possible Duplicate:
How do you parse and process HTML/XML in PHP?
Is it possible to scrape a webpage with PHP without downloading some sort of PHP library or extension?
Right now, I can grab meta tags from a website with PHP like this:
$tags = get_meta_tags('http://www.example.com/');
echo $tags['author']; // name
echo $tags['description']; // description
Is there a similar way to grab info, like the href from this tag, from any given website:
<link rel="img_src" href="image.png"/>
I would like to be able to do it with just PHP.
Thanks!
Try the file_get_contents function. For example:
<?php
// Fetch the raw HTML (note the scheme; without it PHP looks for a local file)
$data = file_get_contents('http://www.example.com');
$regex = '/Search Pattern/';
preg_match($regex, $data, $match);
var_dump($match);
echo $match[1]; // first captured group, if the pattern has one
?>
You could also use the cURL library - http://php.net/manual/en/book.curl.php
Use cURL for more advanced functionality. You'll be able to access headers, redirections, etc. See PHP cURL.
<?php
$c = curl_init();
// set some options
curl_setopt($c, CURLOPT_URL, "google.com");
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($c);
curl_close($c);
?>
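To answer the original question about the <link rel="img_src"> tag, here is a hedged sketch using the built-in DOMDocument class, so no extra library needs to be downloaded (the URL is a placeholder):
<?php
// Sketch: grab the href from <link rel="img_src" href="image.png"/>.
$html = file_get_contents('http://www.example.com/');
$dom = new DOMDocument();
@$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('link') as $link) {
    if ($link->getAttribute('rel') === 'img_src') {
        echo $link->getAttribute('href'); // image.png
    }
}
?>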
Is it possible to get the information displayed at the page linked below using PHP? I want all the text content displayed on the page copied into a variable or a file.
http://www.ncbi.nlm.nih.gov/nuccore/24655740?report=fasta&format=text
I have tried cURL too, but it didn't work, even though cURL worked with a few other sites I know. Still, if there is a cURL-based solution, please post it; I may not have tried every way cURL can be used.
Use cURL to get the page content and then parse it - extract the <pre> section.
$ch = curl_init();
// The full request URL, including the query data (see the UPDATE below for where it comes from)
curl_setopt($ch, CURLOPT_URL, 'http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=24655740&db=nuccore&dopt=fasta&extrafeat=0&fmt_mask=0&maxplex=1&sendto=t&withmarkup=on&log$=seqview&maxdownloadsize=1000000');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 3);
$content = trim(curl_exec($ch));
curl_close($ch);
// show ALL the content
print $content;
$start_index = strpos($content, '<pre>')+5;
$end_index = strpos($content, '</pre>');
$your_text = substr($content, $start_index, $end_index-$start_index);
UPDATE
Using the link from #ovitinho's answer - it now works :)
You need to request the URL that the form uses (via JavaScript) to show this result.
I found this final URL:
http://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?val=24655740&db=nuccore&dopt=fasta&extrafeat=0&fmt_mask=0&maxplex=1&sendto=t&withmarkup=on&log$=seqview&maxdownloadsize=1000000
Note that the 24655740 from your first link is reused in this request.
You can use cURL.
I'm trying to figure out how to take information from another site (on a different domain) and use it in my PHP program.
Explanation:
User inputs URL from another site.
jQuery or PHP takes the information from the entered URL. I know where the information is (I know the ID of its div).
That value is then stored in my PHP program as a variable, for example $kaina.
EX:
User enters the URL: http://www.sportsdirect.com/lee-cooper-bud-mens-boots-118358
I want to get the price (27,99).
What language should I use? PHP, jQuery, or something else?
What function should I use?
What should the program look like?
Thank you for your answers :)
I'd say you have to use PHP (cURL or file_get_contents) to download the page onto your server, then parse it or use a regular expression to get the price. In this case it will be even trickier, because the link appears to lead to a page that relies on JavaScript.
You also have to know the format of the data you are going to extract, but PHP will do the job.
PHP's cURL library should do the trick for you: http://php.net/manual/en/book.curl.php
<?php
$ch = curl_init("http://www.example.com/");
$fp = fopen("example_homepage.txt", "w"); // file to write the response into
curl_setopt($ch, CURLOPT_FILE, $fp);      // send cURL output to that file instead of stdout
curl_setopt($ch, CURLOPT_HEADER, 0);      // leave response headers out of the output
curl_exec($ch);
curl_close($ch);
fclose($fp);
?>
You need to research each of the steps mentioned below.
One thing you can do is post the message entered by the user to the server (a PHP file), where you can extract the URL the user entered.
To extract the URL from the user's post, you can use a regex search.
Check this link out:
Extract URLs from text in PHP
Now you can cURL the URL extracted from the user input.
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $extracted_url);
$html = curl_exec($ch);
curl_close($ch);
The cURL output will contain the complete HTML of the page; you can then use an HTML parser
$DOM = new DOMDocument;
$DOM->loadHTML($html);
to walk the document until the required div is found and read its value.
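Putting those pieces together, a rough sketch (the id 'product-price' is purely hypothetical; substitute the real div ID you found on the page):
<?php
// Sketch: $html is the page source from curl_exec() above.
$DOM = new DOMDocument;
@$DOM->loadHTML($html);                        // tolerate malformed markup
$node = $DOM->getElementById('product-price'); // loadHTML() marks id as an ID attribute
if ($node !== null) {
    $kaina = trim($node->textContent);         // e.g. "27,99"
}
?>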
I would probably do something like this:
get the contents of the file: $contents = file_get_contents("http://www.sportsdirect.com/lee-cooper-bud-mens-boots-118358")
convert the contents you just got to xml: $xml = new SimpleXMLElement($contents);
search the xml for the node with attribute itemprop="price" using xpath query
read the contents of that node, et voila, you have your price
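A hedged sketch of those steps. Real-world HTML is rarely valid XML, so this version swaps SimpleXMLElement for DOMDocument plus an XPath query; the itemprop="price" attribute is assumed to exist in the target markup:
<?php
// Sketch: fetch the page and read the node carrying itemprop="price".
$contents = file_get_contents('http://www.sportsdirect.com/lee-cooper-bud-mens-boots-118358');
$dom = new DOMDocument();
@$dom->loadHTML($contents);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//*[@itemprop="price"]');
if ($nodes->length) {
    $kaina = trim($nodes->item(0)->textContent); // the price, e.g. "27,99"
}
?>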
Possible Duplicate:
Fetching data from another website
I want to create a webpage that displays another webpage at the user's request. The user enters a URL and sees the page they want on my website. The request to the other page has to come from my server, not from the user; otherwise I could just use an iframe.
I'm willing to write it in PHP because I know some of it. Can anyone tell me what subjects one must know to do this?
You need some kind of "PHP proxy" for this, which means getting the website contents via cURL or file_get_contents(). Have a look at this: http://davidwalsh.name/curl-download
Your proxy script may look like this:
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
echo get_data($_GET["url"]);
Please note that you may have to pay attention to response headers for things like images and CSS, and there are security implications (this script is essentially an open proxy), but that is the basic idea.
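For non-HTML resources such as images and stylesheets, one option is to forward the upstream Content-Type header. A minimal sketch, my own variation on the get_data() function above rather than part of the original answer:
<?php
// Sketch: proxy a single resource and pass its Content-Type through.
function proxy_resource($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    $data = curl_exec($ch);
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE); // e.g. "image/png" or "text/css"
    curl_close($ch);
    if ($type) {
        header('Content-Type: ' . $type);
    }
    echo $data;
}
// Reminder: validate or whitelist $_GET["url"] first, or this remains an open proxy.
proxy_resource($_GET["url"]);
?>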
Now you have to parse the contents of the initial website you just got and change all links from this format:
http://example.com/thecss.css
to
http://yoursite.com/proxy.php?url=http://example.com/thecss.css
Some regexes or a PHP HTML parser may work here.
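As a starting point, here is a rough sketch of that rewriting step with DOMDocument; the tag/attribute list and the proxy.php?url= format follow the example above, and the relative-URL handling is deliberately naive:
<?php
// Sketch: rewrite href/src attributes so they point back through the proxy.
function rewrite_links($html, $base, $proxy = 'http://yoursite.com/proxy.php?url=') {
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $map = array('a' => 'href', 'link' => 'href', 'img' => 'src', 'script' => 'src');
    foreach ($map as $tag => $attr) {
        foreach ($dom->getElementsByTagName($tag) as $node) {
            $value = $node->getAttribute($attr);
            if ($value === '') {
                continue;
            }
            // Naively resolve relative URLs against the original site.
            $absolute = preg_match('#^https?://#i', $value) ? $value : rtrim($base, '/') . '/' . ltrim($value, '/');
            $node->setAttribute($attr, $proxy . urlencode($absolute));
        }
    }
    return $dom->saveHTML();
}
?>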
You could just use
echo file_get_contents('http://google.com')
But why not just download a PHP web proxy package like http://sourceforge.net/projects/poxy/
Possible Duplicate:
How do I save a web page, programatically?
I'm just starting with curl and I've managed to pull an external website:
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    $userAgent = 'Mozilla/5.0 (compatible; example scraper)'; // set this to whatever user agent string you want to send
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$test = get_data("http://www.selfridges.com");
echo $test;
However, the CSS and images are not included. I also need to retrieve the CSS and images, basically the whole website. Can someone please post a brief way for me to get started in understanding how to parse out the CSS, images, and URLs?
There are better tools than PHP for this, e.g. wget with the --page-requisites option.
Note however that automatic scraping is often a violation of the site's TOS.
There are HTML parsers for PHP. There are quite a few available; here's a post that discusses them: How do you parse and process HTML/XML in PHP?
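As a starting point for that parsing step, here is a hedged sketch that collects the stylesheet and image URLs referenced by the page fetched above (it does not resolve relative URLs or download the files for you):
<?php
// Sketch: list CSS and image URLs from the HTML stored in $test above.
$dom = new DOMDocument();
@$dom->loadHTML($test);
$assets = array();
foreach ($dom->getElementsByTagName('link') as $link) {
    if (strtolower($link->getAttribute('rel')) === 'stylesheet') {
        $assets[] = $link->getAttribute('href');
    }
}
foreach ($dom->getElementsByTagName('img') as $img) {
    $assets[] = $img->getAttribute('src');
}
print_r($assets); // fetch each of these with get_data() to mirror the page
?>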