Get specific data from a URL - PHP

I looked up the other answers, but none of them seem to work for me because those who answered did not add comments. I am trying to get a specific <p> tag from a div at a URL. I have 3 cases; how can I get the first <p> in div class="entry-content" in each of them?
CASE 1
<div class="entry-content">
<div></div>
<div></div>
<p> want to get content here </p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<div></div>
</div>
CASE 2
<div class="entry-content">
<div></div>
<p> want to get content here </p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<div></div>
</div>
CASE 3
<div class="entry-content">
<p> want to get content here </p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<p></p>
<div></div>
</div>
.PHP
$html = file_get_contents('http://www.myurl.com/');
$doc = new DOMDocument();
@$doc->loadHTML($html);
$p=$doc->getElementByClassName('entry-content')->getElementsByTagName('p')->item(0);
echo $p->nodeValue;

PHP's DOMDocument class does not have a getElementsByClassName method, but you can use the DOMXPath class to select elements by class.
<?php
$html = file_get_contents('http://www.myurl.com/');
$doc = new DOMDocument;
$doc->loadHTML($html);
$finder = new DomXPath($doc);
$p = $finder->query("//*[contains(@class, 'entry-content')]")->item(0)->getElementsByTagName('p')->item(0);
echo $p->nodeValue;
?>
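A slightly more defensive variant of the same idea, as a sketch only: the helper name firstParagraphText and the inline sample markup are assumptions made for illustration (the sample stands in for the page fetched with file_get_contents). It matches the class as a whole token, so a class such as "entry-content-wide" is not picked up by accident, and it returns null instead of failing when nothing matches.

```php
<?php
// Hypothetical helper: returns the trimmed text of the first <p> found inside
// an element whose class list contains $class, or null if there is none.
function firstParagraphText(string $html, string $class = 'entry-content'): ?string
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole token, then take the first <p> in document order.
    $query = sprintf(
        "(//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]//p)[1]",
        $class
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Sample markup standing in for the downloaded page (CASE 1 from the question).
$sample = '<div class="entry-content"><div></div><div></div>'
        . '<p> want to get content here </p><p>second</p><div></div></div>';

echo firstParagraphText($sample), PHP_EOL;
```

Because the query takes the first <p> in document order, the same call should cover all three cases, regardless of how many empty divs precede the paragraph.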

With jquery it is easy:
var firstP = $('.entry-content p:first');
But your code looks like PHP, so I am a little confused about what you want to achieve.

Related

How can I select only the immediate parent node of a text string using xpath for every match

Note: this differs from the following question in that here we have values appearing within a node and within a childnode of that same node:
XPath contains(text(),'some string') doesn't work when used with node with more than one Text subnode
Given the following html:
$content =
'<html>
<body>
<div>
<p>During the interim there shall be nourishment supplied</p>
</div>
<div>
<p>During the <a>interim</a> there shall be interim nourishment supplied</p>
</div>
<div>
<ul><li>During the interim there shall be nourishment supplied</li></ul>
</div>
</body>
</html>';
And the following xpath:
//*[contains(text(),'interim')]
... only provides 3 matches, whereas I want four matches. As per comments, the four elements I'm expecting are P P A LI.
This works exactly as expected. See this glot.io link.
<?php
$html = <<<HTML
<html>
<body>
<div>
<p>During the interim there shall be nourishment supplied</p>
</div>
<div>
<p>During the <a>interim</a> there shall be interim nourishment supplied</p>
</div>
<div>
<ul><li>During the interim there shall be nourishment supplied</li></ul>
</div>
</body>
</html>
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//*/text()[contains(.,"interim")]') as $n) var_dump($n->getNodePath());
You will get four matches:
/html/body/div[1]/p/text()
/html/body/div[2]/p/a/text()
/html/body/div[2]/p/text()[2]
/html/body/div[3]/ul/li/text()
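If the goal is the parent elements rather than the text nodes themselves, the predicate can be moved onto the element. The following is a minimal sketch; the markup assumes the second paragraph wraps its first "interim" in an <a> element, as the paths above suggest.

```php
<?php
$html = '<html><body>'
      . '<div><p>During the interim there shall be nourishment supplied</p></div>'
      . '<div><p>During the <a>interim</a> there shall be interim nourishment supplied</p></div>'
      . '<div><ul><li>During the interim there shall be nourishment supplied</li></ul></div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select every element that has at least one text-node child containing the word.
$names = [];
foreach ($xpath->query('//*[text()[contains(., "interim")]]') as $element) {
    $names[] = $element->nodeName;
}
echo implode(' ', $names), PHP_EOL; // p p a li
```

The difference from contains(text(), 'interim') is that in XPath 1.0 the latter only tests the first text node of each element, whereas text()[contains(., ...)] tests every text-node child.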

PHP: xPath, getting all <p> tags inside a div

I'm attempting to access all the p tags inside a specific div. My XPath query looks like this. In theory it should return all p tags, but it only returns the first. Does anybody know how I might return all p tags?
//*[@id="shopMain"]/div/div/p
The structure is as follows:
<div id="shopMain">
<div id="px10">
<div id="pB30">
<p>
<span>Text I need</span>
</p>
<p>
<span>Text I need</span>
</p>
</div>
</div>
</div>
This worked for me..
define('BR','<br />');
$strhtml='<div id="shopMain">
<div id="px10">
<div id="pB30">
<p>
<span>Text I need</span>
</p>
<p>
<span>Text I need</span>
</p>
</div>
</div>
</div>';
$dom=new DOMDocument;
$dom->loadHTML( $strhtml );
$xpath=new DOMXPath( $dom );
$col=$xpath->query('//div[@id="shopMain"]/div/div/p');
if( $col ){
foreach( $col as $node ) echo $node->tagName.' '.$node->nodeValue.BR;
}
/*
output
------
p Text I need
p Text I need
*/

How to copy content of element and place it into another element?

Example file:
<p>
some content
<sup>3</sup>
some content</p>
<p>
some content
<sup>4</sup>
some content<sup>5</sup></p>
<div class="footnote">
<li id="fn3">
<p>
content3
↩
</p>
</li>
<li id="fn4">
<p>
content4
↩
</p>
</li>
<li id="fn5">
<p>
content5
↩
</p>
</li>
</div>
I need to place each footnote at the bottom of the paragraph in which it is referenced, i.e. if the content of a <p> tag has an <a> element with class fn-ref (one or many <a> tags in a paragraph), I need to place the related footnote at the bottom of that paragraph. The related footnote content can be found in the div class="footnotes".
I should search every <p> tag for class="fn-ref". If I find one, I should create a div class="footnote" in which the related footnote content is placed. If there is more than one reference, their contents should be placed one after another within that same div element.
Expected output:
<p>
some content
<sup>3</sup>
some content</p>
<div class=footnote>
<p>
<span class="label-fn">
3
</span>
content3
</p>
</div>
<p>
some content
<sup>4</sup>
some content<sup>5</sup></p>
<div class=footnote>
<p>
<span class="label-fn">
4
</span>
content4
</p>
<p>
<span class="label-fn">
5
</span>
content5
</p>
</div>
I should probably try something like parent().clone().html() and then add content before and after, but I don't know where to get started as I am a newbie with the DOM parser classes.
Tried so far:
$dom = new DOMDocument;
$dom->loadHTMLFile("test.html", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
$xp->registerNamespace("php", "http://php.net/xpath");
$pElement = $xp->query("//*[contains(@class, 'fn-ref')]");
foreach($pElement as $pNode) {
if ($pNode->nodeName[0] === 'p') {
//??

xpath not returning text if p tag is followed by any other tag

I want to get all the text inside the <p> and <h3> tags in the following HTML:
<div class="bodyText">
<p>
<div class="articleBox articleSmallHorizontal channel-32333770 articleBoxBordered alignRight">
<div class="one">
<img src="url" alt="bar" class="img" width="80" height="60" />
</div>
<div class="two">
<h4 class="preTitle">QIEZ-Lieblinge</h4>
<h3 class="title"><a href="url" title="ABC" onclick="cmsTracking.trackClickOut({element:this, channel : 32333770, channelname : 'top_listen', content : 14832081, callTemplate : '_htmltagging.Text', action : 'click', mouseevent : event});">
Prominente Gastronomen </a></h3>
<span class="postTitle"></span>
<span class="district">Berlin</span> </div>
<div class="clear"></div>
</div>
I want this TEXT</p>
<h3>I want this TEXT</h3>
<p>I want this TEXT</p>
<p>
<div class="inlineImage alignLeft">
<div class="medium">
<img src="http://images03.qiez.de/Restaurant+%C3%96_QIEZ.jpg/280x210/0/167.231.886/167.231.798" width="280" height="210" alt="Schöne Lage: das Restaurant Ø. (c)QIEZ"/>
<span class="caption">
Schöne Lage: das Restaurant Ø. (c)QIEZ </span>
</div>
</div>I want this TEXT</p>
<p>I want this TEXT</p>
<p>I want this TEXT<br /> </p>
<blockquote><img src="url" alt="" width="68" height="68" />
"Eigentlich nur drei Worte: Ich komme wieder."<span class="author">Tina Gerstung</span></blockquote>
<div class="clear"></div>
</div>
I want every "I want this TEXT". I used this XPath query:
//div[contains(@class,'bodyText')]/*[local-name()='p' or local-name()='h3']
but it does not give me the text if the <p> tag is followed by any other tag.
It looks like you have div elements contained within your p element, which is not valid and is messing things up. If you use a var_dump in the loop you can see that it does actually pick up the node, but the nodeValue is empty.
A quick and dirty fix to your html would be to wrap the first div that is contained in the p element in a span.
<span><div class="articleBox articleSmallHorizontal channel-32333770 articleBoxBordered alignRight">...</div></span>
A better fix would be to put the div element outside the paragraph.
If you use the dirty workaround you will need to change your query like so:
$xpath->query("//div[contains(@class,'bodyText')]/*[local-name()='p' or local-name()='h3']/text()");
If you do not have control of the source HTML, you can make a copy of the HTML and remove the offending divs:
$nodes = $xpath->query("//div[contains(@class,'articleBox')]");
$node = $nodes->item(0);
$node->parentNode->removeChild($node);
It might be easier to work with simple_html_dom. Maybe you can try this:
include('simple_html_dom.php');
$dom = new simple_html_dom();
$dom->load($html);
foreach($dom->find("div[class=bodyText]") as $parent) {
foreach($parent->children() as $child) {
if ($child->tag == 'p' || $child->tag == 'h3') {
// remove the inner text of divs contained within a p element
foreach($child->find('div') as $e)
$e->innertext = '';
echo $child->plaintext . '<br>';
}
}
}
This is mixed content. Depending on what defines the position of the element, you can use a number of factors. In this case, simply selecting all the text nodes will probably be sufficient:
//div[contains(@class, 'bodyText')]/(p | h3)/text()
If the union operator within a location path is not allowed in your processor, then you can use your syntax as before or, a little simpler in my opinion:
//div[contains(@class, 'bodyText')]/*[local-name() = ('p', 'h3')]/text()
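PHP's DOMXPath is built on libxml2, which implements XPath 1.0, so a parenthesised union inside a path and a comparison against a sequence such as ('p', 'h3') are not available there. A minimal XPath 1.0 sketch of the same selection follows; the markup is a simplified, well-formed stand-in for the HTML in the question, not the original page.

```php
<?php
// Simplified, well-formed stand-in for the markup in the question.
$html = '<div class="bodyText">'
      . '<p><span class="box">skip me</span>I want this TEXT</p>'
      . '<h3>I want this TEXT too</h3>'
      . '<blockquote>not wanted</blockquote>'
      . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// XPath 1.0: "self::p or self::h3" replaces "(p | h3)";
// the last predicate drops whitespace-only text nodes.
$query = "//div[contains(@class, 'bodyText')]/*[self::p or self::h3]/text()[normalize-space()]";

$texts = [];
foreach ($xpath->query($query) as $textNode) {
    $texts[] = trim($textNode->nodeValue);
}
print_r($texts);
```

Only the direct text-node children of each p and h3 are returned, so text inside nested elements such as the span is left out.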

How can I use XPath and DOM to replace a node/element in php?

Say I have the following html
$html = '
<div class="website">
<div>
<div id="old_div">
<p>some text</p>
<p>some text</p>
<p>some text</p>
<p>some text</p>
<div class="a class">
<p>some text</p>
<p>some text</p>
</div>
</div>
<div id="another_div"></div>
</div>
</div>
';
And I want to replace #old_div with the following:
$replacement = '<div id="new_div">this is new</div>';
To give an end result of:
$html = '
<div class="website">
<div>
<div id="new_div">this is new</div>
<div id="another_div"></div>
</div>
</div>
';
Is there an easy cut-and-paste function for doing this with PHP?
Final working code thanks to all Gordon's help:
<?php
$html = <<< HTML
<div class="website">
<div>
<div id="old_div">
<p>some text</p>
<p>some text</p>
<p>some text</p>
<p>some text</p>
<div class="a class">
<p>some text</p>
<p>some text</p>
</div>
</div>
<div id="another_div"></div>
</div>
</div>
HTML;
$dom = new DOMDocument;
$dom->loadXml($html); // use loadHTML if it's invalid XHTML
//create replacement
$replacement = $dom->createDocumentFragment();
$replacement->appendXML('<div id="new_div">this is new</div>');
//make replacement
$xp = new DOMXPath($dom);
$oldNode = $xp->query('//div[@id="old_div"]')->item(0);
$oldNode->parentNode->replaceChild($replacement, $oldNode);
//save html output
$new_html = $dom->saveXml($dom->documentElement);
echo $new_html;
?>
Since the answer in the linked duplicate is not that comprehensive, I'll give an example:
$dom = new DOMDocument;
$dom->loadXml($html); // use loadHTML if it's invalid (X)HTML
// create the new element
$newNode = $dom->createElement('div', 'this is new');
$newNode->setAttribute('id', 'new_div');
// fetch and replace the old element
$oldNode = $dom->getElementById('old_div');
$oldNode->parentNode->replaceChild($newNode, $oldNode);
// print xml
echo $dom->saveXml($dom->documentElement);
Technically, you don't need XPath for this. However, it can happen that your version of libxml cannot do getElementById for non-validated documents (id attributes are special in XML). In that case, replace the call to getElementById with
$xp = new DOMXPath($dom);
$oldNode = $xp->query('//div[@id="old_div"]')->item(0);
Demo on codepad
To create a $newNode with child nodes without having to create and append elements one by one, you can do
$newNode = $dom->createDocumentFragment();
$newNode->appendXML('
<div id="new_div">
<p>some other text</p>
<p>some other text</p>
<p>some other text</p>
<p>some other text</p>
</div>
');
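Since the question asks for a cut-and-paste function, the steps above could be wrapped roughly as follows. This is only a sketch: the function name replaceElementById is made up for illustration, it assumes the input has a single root element, and it assumes the replacement snippet is well-formed XML (appendXML does not accept arbitrary HTML).

```php
<?php
// Hypothetical helper: replaces the element with the given id by a snippet
// and returns the resulting markup. Assumes $html has a single root element.
function replaceElementById(string $html, string $id, string $replacementXml): string
{
    $dom = new DOMDocument();
    // Parse as HTML without adding <html>/<body> wrappers or a doctype.
    $dom->loadHTML(trim($html), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // Look the node up by attribute, which works for non-validated documents.
    $xpath = new DOMXPath($dom);
    $oldNode = $xpath->query(sprintf('//*[@id="%s"]', $id))->item(0);
    if ($oldNode === null) {
        return $html; // nothing to replace
    }

    $fragment = $dom->createDocumentFragment();
    if (!$fragment->appendXML($replacementXml)) {
        return $html; // replacement was not well-formed XML
    }
    $oldNode->parentNode->replaceChild($fragment, $oldNode);

    return $dom->saveHTML();
}

$html = '<div class="website"><div><div id="old_div"><p>some text</p></div>'
      . '<div id="another_div"></div></div></div>';
$result = replaceElementById($html, 'old_div', '<div id="new_div">this is new</div>');
echo $result;
```

The id is interpolated into the query unescaped, so this sketch is only suitable for ids you control.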
With jQuery, first use remove() (or hide()) on the particular div, then use append() to add the new div to its parent (#parent-id below is a placeholder for the parent's selector):
$('#div-id').remove();
$('#parent-id').append('<div id="new_div">this is new</div>');
