Dom element of paragraph's text - php

I'm making a web scraper and this is driving me crazy!
I need to get the text of a paragraph. Simple, right?! Here's the code.
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//div");
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('class');
echo "<br />Found it: $url";
}
It works perfectly, grabs the class of every div on the page and echoes it out. But what I really need to do is find all <p> tags - every one on the page - and echo the text that is in between the <p>! I have a feeling it's simple but I just can't figure it out.
edit
All it took was the following:
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings from malformed HTML
$node = $doc->getElementsByTagName('p')->item(3);
echo $node->textContent."\n";
What you really want is getElementsByTagName, and once you have the node, textContent gives you its text. Thanks, folks! Not sure if it will apply to everyone else's situation, but it sure does to mine. =o

Use getElementsByTagName to retrieve all <p> elements. Then iterate over the resulting DOMNodeList and fetch the nodeValue of each item.
<?php
$dom=new DOMDocument;
$dom->loadXML('<html><body><p>para1<p>para2<p>para3</p></p></p></body></html>');
$paras=$dom->getElementsByTagName('p');
for($p=0;$p<$paras->length;++$p)
{
echo htmlentities($paras->item($p)->nodeValue).'<hr/>';
}
?>
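Since the question started from DOMXPath, the same idea can also be sketched with an XPath query over HTML loaded with loadHTML. This is only a minimal sketch: the helper name paragraphTexts and the sample markup are made up for illustration, and libxml_use_internal_errors is used here to keep warnings about sloppy real-world HTML out of the output.

```php
<?php
// Hypothetical helper (not from the original posts): returns the text of
// every <p> element found in an HTML string.
function paragraphTexts(string $html): array
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $texts = [];
    // "//p" matches every <p> element, at any depth.
    foreach ($xpath->query('//p') as $p) {
        $texts[] = trim($p->textContent);
    }
    return $texts;
}

// Sample markup, invented for this example.
$html = '<html><body><div><p>First</p></div><p>Second <b>bold</b></p></body></html>';
foreach (paragraphTexts($html) as $text) {
    echo htmlentities($text), "<br />\n";
}
```

textContent includes the text of nested elements, so a paragraph containing a <b> tag still comes back as one string.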

This jQuery snippet may help. When the textarea is clicked, it collects the text of all p elements (plus any span and li text)
and loads it into the textarea.
/** BEGIN **/
$(document).ready(function(){
$('textarea').click(function(){
var pText = $('p').text();
var spanText = '', liText = '';
// .children() always returns a jQuery object, so test its length
if($('p').children('a, span, li').length)
{
spanText = $('span').text();
liText = $('li').text();
}
//alert('the value p is ' + pText +''+ spanText+''+liText);
// a textarea's content is set with .val()
$(this).val(pText + spanText + liText);
});
});
/** END **/

Related

blank element when trying to fetch the elements from html tag

I need some help with my PHP. I'm trying to fetch the contents of the HTML element with the id streams, but nothing is fetched: the output is blank.
Here is the code:
<?php
//ini_set('max_execution_time', 1000);
error_reporting(0);
//$errmsg_arr = array();
//$errflag = false;
$baseUrl = file_get_contents('http://www.example.com/streams.php');
$domdoc = new DOMDocument();
$domdoc->strictErrorChecking = false;
$domdoc->recover=true;
#$domdoc->loadHTML($baseUrl);
$links = $domdoc->getElementById('streams');
echo $links;
?>
Here is the html source:
<p id='channels'>BBC One South E</p><p id='streams'>http://www.example.com/live/35vlz78As8/191</p>
Can you please help me fetch an element by its id?
Remove the # at the beginning of the $domdoc->loadHTML() line. As written, that line is commented out, so the document is never loaded and getElementById finds nothing.
Also, echo the element's textContent rather than the element itself:
$domdoc->loadHTML($baseUrl);
$links = $domdoc->getElementById('streams');
echo $links->textContent;
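getElementById returns null when no element has the requested id, so reading textContent directly can fail. A minimal sketch with a null check follows; the helper name textById is hypothetical, and the markup is the snippet from the question.

```php
<?php
// Hypothetical helper: text of the element with the given id,
// or null when the document has no such element.
function textById(string $html, string $id): ?string
{
    // Keep libxml warnings about imperfect HTML out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $dom->getElementById($id);
    return $node === null ? null : trim($node->textContent);
}

// Markup taken from the question above.
$html = "<p id='channels'>BBC One South E</p><p id='streams'>http://www.example.com/live/35vlz78As8/191</p>";

$stream = textById($html, 'streams');
echo $stream === null ? 'No element with id "streams"' : $stream, "\n";
```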

How to get data or value from any div in php

I have created a PHP page that uses many divs with different ids,
and I want to get the data or value from one of them.
Here is one div with its id;
this is the div I want to read:
<div id="tablename">tablename</div>
I have tried this, but it's not working:
$doc = new DomDocument();
$thediv = $doc->getElementById('tablename');
echo $thediv->textContent;
So please tell me: how can I get this value from my div?
You need to pass the whole content of your page to the class; otherwise it can't select anything, because it thinks the document is empty:
$content = '<div id="tablename">tablename</div>';
$doc = new DomDocument();
$doc->loadHTML($content); // That's the addition
$thediv = $doc->getElementById('tablename');
echo $thediv->textContent;
More info:
loadHTML(): Load the HTML from a string.
loadHTMLFile(): Load the HTML from a file.
Download and include PHP Simple HTML DOM Parser from https://sourceforge.net/projects/simplehtmldom/files/ and
try this:
include 'simple_html_dom.php';
$html = file_get_html("http://www.facebook.com");
$displaybody = $html->find('div[id=blueBarDOMInspector]', 0)->plaintext;
echo $displaybody ;exit;

Obtain text between div tags within only certain span IDs with DOMdocument or SimpleDOM

This is my code:
<?php
$url = file_get_contents("http://www.youtube.com/watch?v=QR8A3T6sPzU");
$doc = new DOMDocument();
@$doc->loadHTML($url);
$xpath = new DOMXPath($doc);
$myNews = $xpath->query('//@id="watch7-views-info"')->item(0);
echo $myNews;
?>
How do I get all the text between div tags within only certain span IDs?
Thanks.
I'd use simpleHtmlDOM:
include 'simpledom.php';
$html = file_get_html('http://www.youtube.com/watch?v=QR8A3T6sPzU');
// Find all divs with id watch7-views-info and echo their contents
foreach($html->find('div[id=watch7-views-info]') as $element)
echo $element->plaintext . '<br>';
This finds all divs with the specified id. You mentioned something about spans, but you'll have to elaborate, because I don't know what you mean.
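For comparison, the built-in DOM extension can probably do the same with a valid XPath expression such as //*[@id="watch7-views-info"]. The sketch below is an illustration only: the helper name xpathTexts and the sample markup are invented, and the real page's structure is not known here.

```php
<?php
// Hypothetical helper: evaluates an XPath expression against an HTML string
// and returns the trimmed text of each matching node.
function xpathTexts(string $html, string $expression): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($expression);
    if ($nodes === false) {
        return []; // query() returns false for a malformed expression
    }

    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Invented sample markup standing in for the downloaded page.
$html = '<html><body><div id="watch7-views-info"><span class="count">301 views</span></div>'
      . '<div id="other">ignored</div></body></html>';

// Select any element whose id attribute equals the wanted value.
print_r(xpathTexts($html, '//*[@id="watch7-views-info"]'));
```

Nested lookups follow the same pattern; for example, '//div[@id="stats"]//span' would select spans anywhere inside a div with the (assumed) id stats.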

Get the first image in a page with class foo

I'm trying to get the first image with a specific class from a page using PHP:
<?php
$document = new DOMDocument();
@$document->loadHTML(file_get_contents('http://www.cbsnews.com/8301-501465_162-57471379-501465/first-picture-on-the-internet-turns-20/'));
$lst = $document->getElementsByTagName('img');
for ($i=0; $i<$lst->length; $i++) {
$image = $lst->item($i);
echo $image->attributes->getNamedItem('src')->value, '<br />';
}
?>
This code gets all images from the page; now I'm trying to get only the images with class "cnet-image".
You should be able to do what you need with Simple HTML Dom. Give it a try; I've used it for several similar things, including image crawlers. http://simplehtmldom.sourceforge.net/
It looks like you should be able to use the following for what you need.
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
$ret = $html->find('img[class=foo]', 0); // index 0 returns only the first match
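The same lookup can likely be done without an extra library by combining DOMDocument with DOMXPath. In this rough sketch the helper name firstImageSrcByClass and the sample markup are assumptions made for illustration; the class test matches whole class names, so an image with class="big cnet-image" also counts.

```php
<?php
// Hypothetical helper: src of the first <img> carrying the given class,
// or null when there is none. Assumes $class contains no quotes.
function firstImageSrcByClass(string $html, string $class): ?string
{
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    // Pad the class attribute with spaces so only whole class names match.
    $expression = sprintf(
        '//img[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $images = $xpath->query($expression);
    if ($images === false || $images->length === 0) {
        return null;
    }
    // Results come back in document order, so item(0) is the first image.
    return $images->item(0)->getAttribute('src');
}

// Invented sample markup.
$html = '<html><body><img class="logo" src="logo.png">'
      . '<img class="big cnet-image" src="first.jpg">'
      . '<img class="cnet-image" src="second.jpg"></body></html>';

var_dump(firstImageSrcByClass($html, 'cnet-image'));
```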
I presume that you want to retrieve the first image with a specific class attribute in an HTML document.
If that's the case, then this could help.
var l = document.images;
var myclass = "myclass";//This is the class you want
var firstImageWithMyClass = null;
for(var i = 0; i<l.length; i++)
if(document.images[i].className==myclass){
firstImageWithMyClass = document.images[i];
break;
}
//Then you can see if an image with that class was found,
//then do what you want to do with it here;
if(firstImageWithMyClass!=null){
var imageSource = firstImageWithMyClass.src;
//etc, etc
}
jQuery makes this easier. Let me know if you would like to know how to do the same with jQuery and I can share with you.

Finding and Printing all Links within a DIV

I am trying to find all links in a div and then print those links.
I am using the Simple HTML Dom to parse the HTML file. Here is what I have so far, please read the inline comments and let me know where I am going wrong.
include('simple_html_dom.php');
$html = file_get_html('tester.html');
$articles = array();
//find the div with the id abcde
foreach($html->find('#abcde') as $article) {
//find all a tags that have a href in the div abcde
foreach($article->find('a[href]') as $link){
//if the href contains singer then echo this link
if(strstr($link, 'singer')){
echo $link;
}
}
}
What currently happens is that the above takes a long time to load (I never got it to finish). Because it took too long to wait, I printed what it was doing in each loop, and I found that it's going through things I don't need it to! This suggests my code is wrong.
The HTML is basically something like this:
<div id="abcde">
<!-- lots of html elements -->
<!-- lots of a tags -->
<a href="singer/tom" />
<img src="image..jpg" />
</a>
</div>
Thanks all for any help
The correct way to select a div (or whatever) by ID using that API is:
$html->find('div[id=abcde]');
Also, since IDs are supposed to be unique, the following should suffice:
//find all a tags that have a href in the div abcde
$article = $html->find('div[id=abcde]', 0);
foreach($article->find('a[href]') as $link){
//if the href contains singer then echo this link
if(strstr($link, 'singer')){
echo $link;
}
}
Why don't you use the built-in DOM extension instead?
<?php
$cont = file_get_contents("http://stackoverflow.com/") or die("1");
$doc = new DOMDocument();
@$doc->loadHTML($cont) or die("2");
$nodes = $doc->getElementsByTagName("a");
for ($i = 0; $i < $nodes->length; $i++) {
$el = $nodes->item($i);
if ($el->hasAttribute("href"))
echo "- {$el->getAttribute("href")}\n";
}
gives
... (lots of links before) ...
- http://careers.stackoverflow.com
- http://serverfault.com
- http://superuser.com
- http://meta.stackoverflow.com
- http://www.howtogeek.com
- http://doctype.com
- http://creativecommons.org/licenses/by-sa/2.5/
- http://www.peakinternet.com/business/hosting/colocation-dedicated#
- http://creativecommons.org/licenses/by-sa/2.5/
- http://blog.stackoverflow.com/2009/06/attribution-required/
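To narrow the built-in DOM approach down to what the question asked for (links inside the div with id abcde whose href contains "singer"), one option is a single XPath query. This is a sketch under assumptions: the helper name linksInDiv and the sample markup are invented, and both arguments are assumed to contain no double quotes.

```php
<?php
// Hypothetical helper: href values of all <a> elements inside the div with
// the given id whose href contains $needle.
function linksInDiv(string $html, string $divId, string $needle): array
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Restrict the search to descendants of the div, then filter by href.
    $expression = sprintf('//div[@id="%s"]//a[contains(@href, "%s")]', $divId, $needle);
    $nodes = $xpath->query($expression);
    if ($nodes === false) {
        return [];
    }

    $hrefs = [];
    foreach ($nodes as $a) {
        $hrefs[] = $a->getAttribute('href');
    }
    return $hrefs;
}

// Invented sample markup loosely based on the question.
$html = '<html><body><div id="abcde">'
      . '<a href="singer/tom"><img src="image.jpg"></a>'
      . '<a href="band/xyz">A band</a>'
      . '</div><a href="singer/outside">Outside the div</a></body></html>';

foreach (linksInDiv($html, 'abcde', 'singer') as $href) {
    echo "- $href\n";
}
```

Because the filtering happens inside the query, the PHP loop only visits links that already match.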
