Get part of text with DOMDocument in PHP

I have this HTML coming from file_get_contents:
<div class="attractions-attraction-filtered-common-ListingsHeader__listingsCount--PflJ1">
<span>We found <b>10 results</b> for you.</span>
</div>
How can I get the number of results (i.e. 10)?
Note that the PflJ1 part is random.
This is what I tried:
$page = file_get_contents($url);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXpath($dom);
$activitiesNb = $xp->query('//div/span/text()');
$activitiesNb = $activitiesNb->nodeValue;
echo $activitiesNb;
But it does not work.
What am I missing?
Thanks.

Using evaluate():
$results = $xp->evaluate('string(//span[contains(., "We found ")]/b/text())');

Your XPath needs to be fixed (you're looking for the content of a b element). Also, use item() in combination with nodeValue.
<?php
$html = <<<'HTML'
<div class="attractions-attraction-filtered-common-ListingsHeader__listingsCount--PflJ1">
<span>We found <b>10 results</b> for you.</span>
</div>
HTML;
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXpath($document);
$res = $xpath->query('//div[starts-with(@class,"attractions-")]/span/b');
$val = $res->item(0)->nodeValue;
echo substr($val,0,2);
?>
Output: 10
Alternative:
$res = $xpath->evaluate("substring(//div[starts-with(@class,'attractions-')]/span/b,1,2)");
echo $res;
And if you have just a one-digit number (e.g. 8):
echo substr($val,0,1);
or
$res = $xpath->evaluate("substring(//div[starts-with(@class,'attractions-')]/span/b,1,1)");
echo $res;
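The substr()/substring() variants above depend on knowing how many digits the count has. Here is a minimal sketch that avoids this, assuming the text inside the b element always starts with the number, as in the sample markup. The value 1250 and the variable names are made up for illustration.

```php
<?php
$html = <<<'HTML'
<div class="attractions-attraction-filtered-common-ListingsHeader__listingsCount--PflJ1">
<span>We found <b>1250 results</b> for you.</span>
</div>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// string() returns the text of the first matching <b>, or '' when nothing matches.
$text = $xpath->evaluate('string(//div[starts-with(@class, "attractions-")]/span/b)');

// Take the leading run of digits, whatever its length; null when there is none.
$count = preg_match('/^\s*(\d+)/', $text, $m) ? (int) $m[1] : null;

var_dump($count);
```

With the sample markup this prints int(1250); if the element is missing, $count is null rather than a misleading 0.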

Related

Displaying the results of php document parser

Can you echo the results of a document parser or do you have to first create an array to display the results? Anyway, when running the code, nothing appears (no output or errors), and I have tried both methods. Could possibly be a site issue but I have tried a few others and get the same result.
<?php
$ebayquery ='halo';
$ebayhtml = 'https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1311.R6.TR12.TRC2.A0.H0.X.TRS0&_nkw=' . $ebayquery . '&_sacat=0';
$ebayresults = array();
$document = new \DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$document->loadHTML($ebayhtml);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXpath($document);
$links = $xpath->query('//h3[@id="lvtitle"]/a');
foreach($links as $a) {
echo $a->nodeValue;
}
?>

There are a couple of problems with the code. First, loadHTML() takes a string of HTML, not a filename or URI, so you have to read the web page first and pass its contents in (I've used file_get_contents() here).
Second, the XPath was looking for any <h3> tag with an id attribute of lvtitle, but the page only has <h3> tags whose class attribute is lvtitle. I've updated the XPath expression to use this instead.
$ebayquery ='halo';
$ebayhtml = 'https://www.ebay.com/sch/i.html?_from=R40&_trksid=p2380057.m570.l1311.R6.TR12.TRC2.A0.H0.X.TRS0&_nkw=' . $ebayquery . '&_sacat=0';
$ebayresults = array();
$document = new \DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$ebayhtml = file_get_contents($ebayhtml);
$document->loadHTML($ebayhtml);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXpath($document);
$links = $xpath->query('//h3[@class="lvtitle"]/a');
print_r($links);
foreach($links as $a) {
echo $a->nodeValue.PHP_EOL;
}
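The first point above, that loadHTML() parses a string and does not fetch anything, can be sketched without network access. This is only an illustration: the markup below is invented and stands in for whatever file_get_contents() would return, and the $markup, $previous, and $titles names are not from the original code.

```php
<?php
// Stand-in for a downloaded page; in real use this string would come from
// file_get_contents($url). The markup is made up for illustration.
$markup = <<<'HTML'
<html><body>
<h3 class="lvtitle"><a href="/itm/1">Halo 3</a></h3>
<h3 class="lvtitle"><a href="/itm/2">Halo Reach</a></h3>
<h3 class="other"><a href="/itm/3">Not a listing</a></h3>
</body></html>
HTML;

$document = new DOMDocument();

// Collect parser warnings instead of printing them, then restore the old setting.
$previous = libxml_use_internal_errors(true);
$loaded = $document->loadHTML($markup);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$titles = [];
if ($loaded) {
    $xpath = new DOMXPath($document);
    // Match on the class attribute, not the id attribute.
    foreach ($xpath->query('//h3[@class="lvtitle"]/a') as $a) {
        $titles[] = trim($a->nodeValue);
    }
}

print_r($titles);
```

Collecting the results in an array is optional; echoing inside the loop works just as well once the document has actually been loaded from a string.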

How to Get value with name by Dom

Hi friends, I am using this method to get all href links from the <a> tags of a site:
$DOM = new DOMDocument();
@$DOM->loadHTML($data);
$links = $DOM->getElementsByTagName('a');
foreach($links as $link){
$url = $link->getAttribute('href');
echo $url;
}
Now I don't know how to get the value of the input named fb_dtsg. Here is the source code:
<input type="hidden" name="fb_dtsg" value="AQF0dSiG6Lyr:AQEnJP0PhWzy" autocomplete="off" />
I want to get its value with DOM. How can I do this? Thanks in advance.

$DOM = new DOMDocument();
@$DOM->loadHTML($data);
$inputs = $DOM->getElementsByTagName('input');
foreach($inputs as $input) {
if ($input->getAttribute('name') == 'fb_dtsg') {
echo 'found, do whatever';
break;
}
}

You can use DOMXpath()'s query method to get elements by the name attribute.
$DOM = new DOMDocument();
@$DOM->loadHTML($data);
$links = $DOM->getElementsByTagName('a');
$xpath = new DOMXpath($DOM);
$input = $xpath->query('//input[@name="fb_dtsg"]');
echo $input->item(0)->getAttribute('value');
This will print the value of the first input element with name 'fb_dtsg'.
Hope it helps :) Feel free to ask if you need to know anything more.

Use xpath for that.
$DOM = new DOMDocument();
@$DOM->loadHTML($data);
$xpath = new DOMXpath($DOM);
$elementByName = $xpath->query("//input[@name='fb_dtsg']");
...
http://php.net/manual/ro/class.domxpath.php
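As a possible way to finish the XPath approach, here is a small sketch that reads the value attribute directly with evaluate(). It assumes $data holds the page markup; the one-line form below is just the input from the question wrapped in a made-up form element, and $token is an illustrative name.

```php
<?php
$data = '<form><input type="hidden" name="fb_dtsg" value="AQF0dSiG6Lyr:AQEnJP0PhWzy" autocomplete="off" /></form>';

$DOM = new DOMDocument();
$DOM->loadHTML($data);
$xpath = new DOMXPath($DOM);

// string(.../@value) yields the attribute value of the first match, or '' if none.
$token = $xpath->evaluate('string(//input[@name="fb_dtsg"]/@value)');

echo $token === '' ? 'not found' : $token, PHP_EOL;
```

Because string() returns an empty string when nothing matches, there is no need to check item(0) for null before reading the attribute.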

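$DOM->getElementsByTagName('a'); // for tag name
$DOM->getElementsByName('fb_dtsg'); // for name (note: PHP's DOMDocument has no such method; it exists only in the browser DOM)
document.getElementById('fb_dtsg_id').value // for showing value of the field (JavaScript, not PHP)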

How can I add an element into the middle of a text node's text?

Given the following HTML:
$content = '<html>
<body>
<div>
<p>During the interim there shall be nourishment supplied</p>
</div>
</body>
</html>';
How can I alter it to the following HTML:
<html>
<body>
<div>
<p>During the <span>interim</span> there shall be nourishment supplied</p>
</div>
</body>
</html>
I need to do this using DomDocument. Here's what I've tried:
$dom = new DomDocument();
$dom->loadHTML($content);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXpath($dom);
$elements = $xpath->query("//*[contains(text(),'interim')]");
if (!is_null($elements)) {
foreach ($elements as $element) {
$text = $element->nodeValue;
$element->nodeValue = str_replace('interim','<span>interim</span>',$text);
}
}
echo $dom->saveHTML();
However, this outputs literal html entities so it renders like this in the browser:
During the <span>interim</span> there shall be nourishment supplied
I imagine one should use createElement and appendChild methods instead of assigning nodeValue directly but I can't see how to insert an element in the middle of a textNode string?

Marcus Harrison's answer using splitText is a good one, but it can be simplified and needs to use mb_* methods to work with UTF-8 input:
<?php
$html = <<<END
<html>
<meta charset="utf-8">
<body>
<div>
<p>During € the interim there shall be nourishment supplied</p>
</div>
</body>
</html>
END;
$replace = 'interim';
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query(sprintf('//text()[contains(., "%s")]', $replace));
foreach ($nodes as $node) {
$start = mb_strpos($node->textContent, $replace);
$end = $start + mb_strlen($replace);
$node->splitText($end); // do this first
$node->splitText($start); // do this last
$newnode = $doc->createElement('span');
$node->parentNode->insertBefore($newnode, $node->nextSibling);
$newnode->appendChild($newnode->nextSibling);
}
$doc->encoding = 'UTF-8';
print $doc->saveHTML($doc->documentElement);

Create a new DOMDocument with the modified element and replace the old one:
foreach ($elements as $element) {
$text = $element->nodeValue;
$el = new DomDocument();
$el->loadHTML('<iframe>'. str_replace('interim','<span>interim</span>',$text) . '</iframe>');
$new = $dom->importNode($el->getElementsByTagName('iframe')->item(0), true);
unset($el);
$element->parentNode->replaceChild($new, $element);
}

In order to do this, you must use DOMText's splitText method. It accepts an offset, which can be found using strpos:
$dom = new DomDocument();
$dom->loadHTML($content);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXpath($dom);
$elements = $xpath->query("//*[contains(text(),'interim')]");
if (!is_null($elements)) {
foreach ($elements as $element) {
$text = $element->childNodes->item(0);
$text->splitText(strpos($text->textContent, "interim"));
$text2 = $element->childNodes->item(1);
$text2->splitText(strpos($text2->textContent, " "));
$element->removeChild($text2);
$span = $dom->createElement("span");
$span->appendChild($dom->createTextNode("interim"));
$element->insertBefore($span, $element->childNodes->item(1));
}
}
echo $dom->saveHTML();
Edits: having just tested it, I realise I hadn't removed the original "interim" in the second text node. Edited this answer to do that. I have also edited this code to be as compatible with old versions of PHP as I can think of making it: as I don't run an old version of PHP it isn't possible for me to test that.
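The splitText() idea from these answers can be generalised to every occurrence of the word, not only the first. The following is a rough sketch under stated assumptions: wrapWordInSpan() is a hypothetical helper name, and it relies on mb_strpos() because PHP's splitText() takes its offset in characters.

```php
<?php
// Hypothetical helper: wraps every occurrence of $word found in text nodes in a <span>.
function wrapWordInSpan(DOMDocument $doc, string $word): void
{
    if ($word === '') {
        return; // nothing to search for
    }
    $xpath = new DOMXPath($doc);
    // Snapshot the text nodes first, because the loop modifies the tree.
    $nodes = iterator_to_array($xpath->query('//text()'));
    foreach ($nodes as $node) {
        while (($start = mb_strpos($node->data, $word, 0, 'UTF-8')) !== false) {
            $match = $node->splitText($start);                     // $match begins at the word
            $rest  = $match->splitText(mb_strlen($word, 'UTF-8')); // $rest is the text after it
            $span  = $doc->createElement('span');
            $match->parentNode->replaceChild($span, $match);       // put <span> where the word was
            $span->appendChild($match);                            // move the word into the <span>
            $node = $rest;                                         // keep searching in the remainder
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div><p>During the interim there shall be nourishment supplied</p></div></body></html>');
wrapWordInSpan($doc, 'interim');
$out = $doc->saveHTML($doc->getElementsByTagName('p')->item(0));
echo $out, PHP_EOL;
```

Each match is split into its own text node and then moved into a new span, so no markup string is ever assigned to nodeValue and nothing gets entity-escaped.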

How to extract the contents inside a div based on its class?

I tried this code:
$html= file_get_contents("page.html");
$dom = new DOMDocument;
$dom->loadHTML($html);
$div = $dom->getElementsByClassName('mydiv1');
$result = $dom->saveHTML($div);
echo $result;
page.html
<html>
<body>
<div id="test">
<div class="mydiv1">Hello</div>
<div class="mydiv2">How are you</div>
</div>
</body>
</html>
But when I try with the id, it works, like this:
$html= file_get_contents("page.html");
$dom = new DOMDocument;
$dom->loadHTML($html);
$div = $dom->getElementById('test');
$result = $dom->saveHTML($div);
echo $result;
How can I get the content based on class?

Try this code,
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[@class="mydiv1"]');
$div = $div->item(0);
$result = $dom->saveXML($div);
echo $result;

DOMDocument has no getElementsByClassName method (yet), but the same result can be produced using DOMXPath:
$dom = new DOMDocument();
$dom->loadHTMLFile($filePath);
$finder = new DOMXPath($dom);
$nodes = $finder->query('//div[@class="mydiv1"]');
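Note that @class="mydiv1" only matches when the attribute is exactly that string. A sketch of a looser match that treats class as a space-separated list follows. The extra classes in the sample markup are invented for illustration, and $class is assumed to be a trusted value without quote characters.

```php
<?php
$html = <<<'HTML'
<html><body>
<div id="test">
<div class="box mydiv1 wide">Hello</div>
<div class="mydiv10">Not this one</div>
<div class="mydiv2">How are you</div>
</div>
</body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

$class = 'mydiv1';
// Pad the normalized class list with spaces so only whole class names match.
$query = sprintf(
    '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
    $class
);

$found = [];
foreach ($finder->query($query) as $div) {
    $found[] = $dom->saveHTML($div);
}
print_r($found);
```

Only the first div is returned: mydiv10 is rejected because the padded comparison requires a space on both sides of the class name.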

DOM Parser grabbing href of <a> tag by class="Decision"

I'm working with a DOM parser and I'm having issues. I'm trying to grab the href of every <a> tag whose class is 'thumbnail '. I've been trying to print the links on the screen and still get no results. Any help is appreciated. I also turned on error_reporting(E_ALL); and still nothing.
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$classId = "thumbnail ";
$div = $html->find('a#'.$classId);
echo $div;
I also tried this but still had the same result of NOTHING:
include('simple_html_dom.php');
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab all the on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
$ret = $html->find('a[class=thumbnail]');
echo $ret;

You were almost there:
<?php
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.reddit.com/r/funny');
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a[contains(concat(' ',normalize-space(@class),' '),' thumbnail ')]");
var_dump($hrefs);
Gives:
class DOMNodeList#28 (1) {
public $length =>
int(25)
}
25 matches, I'd call it success.

This code would probably work:
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hyperlinks = $xpath->query('//a[@class="thumbnail"]');
foreach($hyperlinks as $hyperlink) {
echo $hyperlink->getAttribute('href'), '<br>';
}

If you're using simple_html_dom, why are you doing all these superfluous things? It already wraps the resource in everything you need: http://simplehtmldom.sourceforge.net/manual.htm
include('simple_html_dom.php');
// set up:
$html = new simple_html_dom();
// load from URL:
$html->load_file('http://www.reddit.com/r/funny');
// find those <a> elements:
$links = $html->find('a[class=thumbnail]');
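// done; find() returns an array of elements, so print each href:
foreach ($links as $link) {
echo $link->href, PHP_EOL;
}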

Tested it and made some changes. This works perfectly too.
<?php
// load the url and set up an array for the links
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.reddit.com/r/funny');
$links = array();
// loop thru all the A elements found
foreach($dom->getElementsByTagName('a') as $link) {
$url = $link->getAttribute('href');
$class = $link->getAttribute('class');
// Check if the URL is not empty and if the class contains thumbnail
if(!empty($url) && strpos($class,'thumbnail') !== false) {
array_push($links, $url);
}
}
// Print results
print_r($links);
?>
