How to get specific content with PHP and DOM Document? - php

I have a URL I want to fetch. I only want a short piece of content from it. The content in question is in a div that has an ID of sample.
<div id="sample">
Content
</div>
I can grab the file like so:
$url= file_get_contents('http://www.example.com/');
But how do I select just that sample div?
Any ideas?

I'd recommend using the PHP Simple HTML DOM Parser.
Then you can do:
$html = file_get_html('http://www.example.com/');
// Select the first div whose id is "sample"
$sample = $html->find('div#sample', 0);

I would recommend something like Simple HTML DOM, although if you are very sure of the format, you may wish to look at using regex to extract the data you want.
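Since the question title asks about DOM Document, here is a minimal sketch of the same extraction with PHP's built-in DOMDocument class, with no third-party library. The helper name extractElementById and the inline $page string are assumptions for illustration; in practice $page would come from file_get_contents() as in the question.

```php
<?php
// Hypothetical helper: returns the HTML of the element with the given id, or null.
function extractElementById(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementById($id);
    // Passing a node to saveHTML() serializes only that node.
    return $node === null ? null : $doc->saveHTML($node);
}

// Inline stand-in for file_get_contents('http://www.example.com/')
$page = '<html><body><div id="sample">Content</div><p>Other</p></body></html>';
$sample = extractElementById($page, 'sample');
echo $sample, "\n";
```

If only the text is needed rather than the markup, the node's textContent property could be returned instead.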

A while ago, I released an open source library named PHPPowertools/DOM-Query, which allows you to (1) load an HTML file and then (2) select or change parts of your HTML much like you'd do it with jQuery.
Using that library, here's how you'd select the sample div for your example:
use \PowerTools\DOM_Query;
// Get file content
$htmlcode = file_get_contents('http://www.example.com/');
// Create a new DOM_Query object
$H = new DOM_Query($htmlcode);
// Find the elements that match selector "div#sample"
$s = $H->select('div#sample');

Related

How can I update an element property using simple_html_dom

I'd like to update the 'src' attribute of an img tag using Simple HTML DOM. I've got this at the top of the PHP file (join.php) which contains the img tag:
include_once("simplehtmldom/simple_html_dom.php");
$htmldomOb = file_get_html('join.php');
$htmldomOb->find('img[id=imgtapchat]', 0)->src = './tapchat/clss_tapcht-1.php';
echo $htmldomOb;
This works, but it outputs the entire page again since I read the entire page into the DOM object. How can I update just the image src, similar to how it is done in jQuery? As the Simple HTML DOM site docs say:
Find tags on an HTML page with selectors just like jQuery
With thanks
I rewrote PHP Simple HTML DOM on top of the PHP DOM extension and have just finished it. You can try it here:
http://shinbonlin.github.io/html-parser/
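One way to avoid echoing the whole page is to serialize only the element that was changed. The sketch below swaps in PHP's built-in DOMDocument rather than Simple HTML DOM, and the inline $html string is a stand-in for the real page markup. Note that unlike jQuery, which edits a live page in the browser, PHP can only change the markup it is about to send.

```php
<?php
// Inline stand-in for the page markup; the id matches the question.
$html = '<html><body><h1>Join</h1><img id="imgtapchat" src="old.png"></body></html>';

$doc = new DOMDocument();
// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$img = $doc->getElementById('imgtapchat');
$imgHtml = '';
if ($img !== null) {
    $img->setAttribute('src', './tapchat/clss_tapcht-1.php');
    // Passing a node to saveHTML() serializes only that node, not the whole page.
    $imgHtml = $doc->saveHTML($img);
}
echo $imgHtml, "\n";
```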

Php file_get_contents() issue

With PHP's file_get_contents() I want to get only the post and image, but it gets the whole page. (I know there are other ways to do this.)
Example:
$homepage = file_get_contents('http://www.bdnews24.com/details.php?cid=2&id=221107&hb=5',
true);
echo $homepage;
It shows the full page. Is there any way to show only the post with cid=2&id=221107&hb=5?
Thanks a lot.
Use PHP's DomDocument to parse the page. You can filter it more if you wish, but this is the general idea.
$url = 'http://www.bdnews24.com/details.php?cid=2&id=221107&hb=5';
// Create new DomDocument
$doc = new DomDocument();
$doc->loadHTMLFile($url);
// Get the post
$post = $doc->getElementById('opage_mid_left');
// Output only the HTML of that element
echo $doc->saveHTML($post);
Update:
Unless the image is a requirement, I'd use the printer-friendly version: http://www.bdnews24.com/pdetails.php?id=221107, it's much cleaner.
You will need to parse the resulting HTML using a DOM parser to get the HTML of only the part you want. I like PHP Simple HTML DOM Parser, but as Paul pointed out, PHP also has its own.
You can extract the
<div id="page">
//POST AND IMAGE EXIST HERE
</div>
part from the fetched contents using a regex and output it on your page.

Creating a personalization engine with php

I am new to PHP and I want to create a PHP engine that changes the content of a webpage using data in MySQL (for example, reordering the navigation links on a webpage by highest click count). I am not sure how PHP will read the HTML file and change the elements in the HTML file and also output the HTML file with the changes. Is this possible?
I am not quite sure why you would want to generate the html, read it, change it and then output it. It seems to be a lot easier to just generate it the way you want to in the first place.
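As a rough illustration of generating the markup in the desired order from the start, the sketch below sorts link rows by click count and prints the navigation. The $links array and its keys (label, url, clicks) are hypothetical stand-ins for whatever a MySQL query would return.

```php
<?php
// Hypothetical rows, as they might come back from a MySQL query.
$links = [
    ['label' => 'Home',    'url' => '/',        'clicks' => 12],
    ['label' => 'Blog',    'url' => '/blog',    'clicks' => 40],
    ['label' => 'Contact', 'url' => '/contact', 'clicks' => 7],
];

// Highest click count first.
usort($links, function (array $a, array $b): int {
    return $b['clicks'] <=> $a['clicks'];
});

// Build the navigation markup in the sorted order.
$nav = "<ul>\n";
foreach ($links as $link) {
    $nav .= sprintf(
        "  <li><a href=\"%s\">%s</a></li>\n",
        htmlspecialchars($link['url']),
        htmlspecialchars($link['label'])
    );
}
$nav .= "</ul>\n";
echo $nav;
```

In a real application the sorting could also be pushed into the query itself with an ORDER BY clause.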
I am not sure how PHP will read the HTML file and change the elements in the HTML file and also output the HTML file with the changes. Is this possible?
You could use file_get_contents:
$html = file_get_contents($url);
Then use a html-parser like Simple HTML DOM Parser, change what you want to do and output it.
If you want to modify HTML structure, use ganon - HTML DOM parser for PHP
include('path/ganon.php');
// Parse the google code website into a DOM
$html = file_get_dom('http://code.google.com/');
foreach($html('p[class]') as $element) {
echo $element->class, "<br>\n";
}

How do I get the link element in a html page with PHP

First, I know that I can get the HTML of a webpage with:
file_get_contents($url);
What I am trying to do is get a specific link element in the page (found in the head).
e.g:
<link type="text/plain" rel="service" href="/service.txt" /> (the element could close with just >)
My question is: How can I get that specific element with the "rel" attribute equal to "service" so I can get the href?
My second question is: Should I also get the "base" element? Does it apply to the "link" element? I am trying to follow the standard.
Also, the HTML might have errors. I don't have control over how my users code their stuff.
Using PHP's DOMDocument, this should do it (untested):
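// Fetch the page; $url is the address of the page to inspect
$file = file_get_contents($url);
// The HTML may contain errors, so collect parser warnings instead of printing them
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($file);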
$head = $doc->getElementsByTagName('head')->item(0);
$links = $head->getElementsByTagName("link");
foreach($links as $l) {
if($l->getAttribute("rel") == "service") {
echo $l->getAttribute("href");
}
}
You should get the Base element, but know how it works and its scope.
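For the base element, a possible approach is to read both values in one pass and resolve the link against the base afterwards. The sketch below uses DOMXPath for that; the helper name findServiceLink and the sample markup are assumptions. It only returns the two raw values and does not resolve the URL, and it matches rel="service" exactly, so a rel attribute with several tokens or different letter case would need extra handling.

```php
<?php
// Hypothetical helper: returns the service link's href and the base href (null if absent).
function findServiceLink(string $html): array
{
    $doc = new DOMDocument();
    // User-supplied HTML may be malformed; keep the parser quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $link = $xpath->query('//link[@rel="service"][@href]')->item(0);
    $base = $xpath->query('//base[@href]')->item(0);

    return [
        'href' => $link ? $link->getAttribute('href') : null,
        'base' => $base ? $base->getAttribute('href') : null,
    ];
}

// Sample markup with a deliberately unclosed paragraph.
$page = '<html><head><base href="http://www.example.com/app/">'
      . '<link type="text/plain" rel="service" href="/service.txt">'
      . '</head><body><p>Unclosed paragraph</body></html>';
$result = findServiceLink($page);
print_r($result);
```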
In truth, when I have to screen-scrape, I use phpquery. This is an older PHP port of jQuery... and while that may sound like something of a dumb concept, it is awesome for document traversal... and doesn't require well-formed XHTML.
http://code.google.com/p/phpquery/
I'm working with Selenium under Java for Web-Application-Testing. It provides very nice features for document traversal using CSS-Selectors.
Have a look at How to use Selenium with PHP.
But this setup might be too complex for your needs if you only want to extract this one link.

How to write this crawler in php?

I need to create a php script.
The idea is very simple:
When I send the link of a blog post to this PHP script, the webpage is crawled and the first image and the page title are saved on my server.
Which PHP functions do I have to use for this crawler?
Use PHP Simple HTML DOM Parser
// Create DOM from URL
$html = file_get_html('http://www.example.com/');
// Find all images
$images = array();
foreach($html->find('img') as $element) {
$images[] = $element->src;
}
Now the $images array holds the image links of the given webpage, and you can store the desired image in your database.
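The question also asks for the page title, which the snippet above does not collect. A minimal sketch that returns both the title and the first image, using PHP's built-in DOMDocument in place of Simple HTML DOM, could look like this. The helper name firstImageAndTitle and the sample markup are assumptions, and downloading and saving the image is left out.

```php
<?php
// Hypothetical helper: returns the page title and the first image's src (null if absent).
function firstImageAndTitle(string $html): array
{
    $doc = new DOMDocument();
    // Blog markup is often invalid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $title = $doc->getElementsByTagName('title')->item(0);
    $img = $doc->getElementsByTagName('img')->item(0);

    return [
        'title' => $title ? trim($title->textContent) : null,
        'image' => ($img && $img->hasAttribute('src')) ? $img->getAttribute('src') : null,
    ];
}

// Inline stand-in for the fetched blog post.
$page = '<html><head><title> My Post </title></head><body>'
      . '<img src="/img/first.jpg"><img src="/img/second.jpg"></body></html>';
$info = firstImageAndTitle($page);
print_r($info);
```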
HTML Parser: HTMLSQL
Features: you can get external html file, http or ftp link and parse content.
Well, you'll have to use quite a few functions :)
But I'm going to assume that you're asking specifically about finding the image, and say that you should use a DOM parser like Simple HTML DOM Parser, then curl to grab the src of the first img element.
I would use file_get_contents() and a regular expression to extract the first image tag's src attribute.
CURL or a HTML Parser seem overkill in this case, but you are welcome to check it out.
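A rough sketch of the regular-expression approach follows. The helper name firstImageSrc and the sample markup are assumptions, and the pattern only handles quoted src attributes; unquoted attributes, or an img tag inside a comment or script, would trip it up, which is the usual trade-off against a DOM parser.

```php
<?php
// Hypothetical helper: returns the src of the first <img> tag
// that carries a quoted src attribute, or null if there is none.
function firstImageSrc(string $html): ?string
{
    // Group 1 captures the opening quote, group 2 the attribute value.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is';
    if (preg_match($pattern, $html, $m) === 1) {
        // Decode entities such as &amp; that may appear inside the URL.
        return html_entity_decode($m[2]);
    }
    return null;
}

// Inline stand-in for file_get_contents($url)
$page = '<p>Intro</p><img class="hero" src="/images/first.jpg" alt="">'
      . '<img src="/images/second.jpg">';
$src = firstImageSrc($page);
echo $src, "\n";
```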
