How to select the right div by class with Xpath? - php

$divs = $xpathsuj->query("//div[@class='txt-msg text-enrichi-forum ']");
$div = $divs[$i];
With this XPath command I'm able to select the div with the class "txt-msg text-enrichi-forum " :
<div class="bloc-contenu">
<div class="txt-msg text-enrichi-forum ">
<p>Tu pourrais écrire en FRANCAIS si ce n'est pas trop demandé?
<img src="http://image.jeuxvideo.com/smileys_img/54.gif" alt=":coeur:" data-code=":coeur:" title=":coeur:" width="21" height="20" />
</p>
</div>
</div>
But not this one :
<div class="bloc-contenu">
<div class="txt-msg text-enrichi-forum ">
<p>
<img src="http://image.jeuxvideo.com/smileys_img/42.gif" alt=":salut:" data-code=":salut:" title=":salut:" width="46" height="41" />
</p>
</div>
<div class="signature-msg text-enrichi-forum ">
<p>break;</p>
</div>
</div>
What am I doing wrong?

I've tried it with both segments of XML and it seems to work with both, but there is a possibility that there is some issue with spacing.
In your XPath query, you're looking for an exact match of 'txt-msg text-enrichi-forum ', which has two spaces after txt-msg and one after the last part. If any of those spaces is missing in the document, the query will not find the element.
If you change it to...
$divs = $xpathsuj->query("//div[contains(@class,'txt-msg') and contains(@class,'text-enrichi-forum')]");
foreach ( $divs as $div ) {
echo $doc->saveXML($div).PHP_EOL;
}
It should be a bit more tolerant.
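A stricter variant of the same idea, as a minimal sketch: plain contains() matches substrings, so it would also accept a class such as "txt-msg-old". Padding the normalized class list with spaces makes each test match a whole class name only. The sample HTML and the variable names ($html, $matches) are made up for illustration.

```php
<?php
// Sample markup (assumed for this sketch): the first div has two spaces between class names.
$html = '<div class="bloc-contenu">'
      . '<div class="txt-msg  text-enrichi-forum ">message</div>'
      . '<div class="signature-msg text-enrichi-forum ">signature</div>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// normalize-space() collapses runs of whitespace; the added spaces turn
// the class list into ' a b c ' so ' txt-msg ' can only match a whole token.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' txt-msg ')"
       . " and contains(concat(' ', normalize-space(@class), ' '), ' text-enrichi-forum ')]";

$matches = [];
foreach ($xpath->query($query) as $div) {
    $matches[] = trim($div->textContent);
}
print_r($matches);
```

Only the message div is selected here; the signature div shares one class but not both.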

Related

How to remove P tags surrounding img using jQuery?

I've got a webpage that is outputted through CKEditor. I need it to display the image without the <p></p> tags but I need it to leave the actual text within the paragraph tags so I can target it for styling.
I've tried to achieve this with the jQuery below, which I found in another post here, but it isn't working for me.
I have tried:
$('img').unwrap();
and I've tried:
$('p > *').unwrap();
Neither of these works. I can disable the tags altogether in my editor's config, but then I won't be able to target the text on its own if it's not wrapped in a tag.
The outputted HTML is:
<body>
<div id="container" class="container">
<p><img alt="" src="http://localhost/integrated/uploads/images/roast-dinner-main-xlarge%281%29.jpg" style="height:300px; width:400px" /></p><p>Our roast dinners are buy one get one free!</p>
</div>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
<script>
$(document).ready(function() {
$('p > *').unwrap();
});
</script>
</body>
All help is appreciated!
Usually done using
$('img').unwrap("p");
but this will also orphan any other content (like text) from its <p> parent (the one that contained the image).
So basically you want to move the image out of the <p> tags.
There are two places you can move your image: before or after the p tag:
$("p:has(img)").before(function() { // or use .after()
return $(this).find("img");
});
p {
background: red;
padding: 10px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="container" class="container">
<p>
<img alt="" src="http://placehold.it/50x50/f0b" />
</p>
<p>
Our roast dinners are buy one get one free!
</p>
</div>
<p>
<img src="http://placehold.it/50x50/f0b" alt="">
Lorem ipsum dolor ay ay
<img src="http://placehold.it/50x50/0bf" alt="">
</p>
<p>
<img src="http://placehold.it/50x50/0bf" alt="">
</p>
Notice, though, that the above will not remove the empty <p> tags left behind.
Remedy
If you want to remove the empty paragraphs - if the image was the only child -
and keep paragraphs that had both image and other content:
$("p:has(img)").each(function() {
$(this).before( $(this).find("img") );
if(!$.trim(this.innerHTML).length) $(this).remove();
});
p{
background:red;
padding: 10px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="container" class="container">
<p>
<img alt="" src="http://placehold.it/50x50/f0b" />
</p>
<p>
Our roast dinners are buy one get one free!
</p>
</div>
<p>
<img src="http://placehold.it/50x50/f0b" alt="">
Lorem ipsum dolor ay ay
<img src="http://placehold.it/50x50/0bf" alt="">
</p>
<p>
<img src="http://placehold.it/50x50/0bf" alt="">
</p>
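This should work, assuming the paragraph holds nothing but the image (any other content in it is removed along with the paragraph):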
var par = $(".par");
var tmp = par.find('.img').clone();
var parent = par.parent();
par.remove();
tmp.appendTo(parent);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="parent">
<p class="par">
<img src="https://webkit.org/demos/srcset/image-src.png" class="img" alt="">
</p>
</div>

PHP: xPath, getting all <p> tags inside a div

I'm attempting to access all the p tags inside a specific div. My XPath query is shown below; in theory it should return all p tags, but it only returns the first. Does anybody know how I might return all p tags?
//*[@id="shopMain"]/div/div/p
The structure is as follows:
<div id="shopMain">
<div id="px10">
<div id="pB30">
<p>
<span>Text I need</span>
</p>
<p>
<span>Text I need</span>
</p>
</div>
</div>
</div>
This worked for me..
define('BR','<br />');
$strhtml='<div id="shopMain">
<div id="px10">
<div id="pB30">
<p>
<span>Text I need</span>
</p>
<p>
<span>Text I need</span>
</p>
</div>
</div>
</div>';
$dom=new DOMDocument;
$dom->loadHTML( $strhtml );
$xpath=new DOMXPath( $dom );
$col=$xpath->query('//div[@id="shopMain"]/div/div/p');
if( $col ){
foreach( $col as $node ) echo $node->tagName.' '.$node->nodeValue.BR;
}
/*
output
------
p Text I need
p Text I need
*/

xpath not returning text if p tag is followed by any other tag

I want to get all the text between the <p> and <h3> tags for the following HTML:
<div class="bodyText">
<p>
<div class="articleBox articleSmallHorizontal channel-32333770 articleBoxBordered alignRight">
<div class="one">
<img src="url" alt="bar" class="img" width="80" height="60" />
</div>
<div class="two">
<h4 class="preTitle">QIEZ-Lieblinge</h4>
<h3 class="title"><a href="url" title="ABC" onclick="cmsTracking.trackClickOut({element:this, channel : 32333770, channelname : 'top_listen', content : 14832081, callTemplate : '_htmltagging.Text', action : 'click', mouseevent : event});">
Prominente Gastronomen </a></h3>
<span class="postTitle"></span>
<span class="district">Berlin</span> </div>
<div class="clear"></div>
</div>
I want this TEXT</p>
<h3>I want this TEXT</h3>
<p>I want this TEXT</p>
<p>
<div class="inlineImage alignLeft">
<div class="medium">
<img src="http://images03.qiez.de/Restaurant+%C3%96_QIEZ.jpg/280x210/0/167.231.886/167.231.798" width="280" height="210" alt="Schöne Lage: das Restaurant Ø. (c)QIEZ"/>
<span class="caption">
Schöne Lage: das Restaurant Ø. (c)QIEZ </span>
</div>
</div>I want this TEXT</p>
<p>I want this TEXT</p>
<p>I want this TEXT<br /> </p>
<blockquote><img src="url" alt="" width="68" height="68" />
"Eigentlich nur drei Worte: Ich komme wieder."<span class="author">Tina Gerstung</span></blockquote>
<div class="clear"></div>
</div>
I want all "I want this TEXT". I used the XPath query
//div[contains(@class,'bodyText')]/*[local-name()='p' or local-name()='h3']
but it does not give me the text if the <p> tag is followed by any other tag.
It looks like you have div elements contained within your p element, which is not valid HTML and is messing things up. If you use var_dump in the loop, you can see that it does actually pick up the node, but the nodeValue is empty.
A quick and dirty fix to your html would be to wrap the first div that is contained in the p element in a span.
<span><div class="articleBox articleSmallHorizontal channel-32333770 articleBoxBordered alignRight">...</div></span>
A better fix would be to put the div element outside the paragraph.
If you use the dirty workaround you will need to change your query like so:
$xpath->query("//div[contains(@class,'bodyText')]/*[local-name()='p' or local-name()='h3']/text()");
If you do not have control of the source HTML, you can make a copy of it and remove the offending divs:
$nodes = $xpath->query("//div[contains(@class,'articleBox')]");
$node = $nodes->item(0);
$node->parentNode->removeChild($node);
It might be easier to work with simple_html_dom. Maybe you can try this:
include('simple_html_dom.php');
$dom = new simple_html_dom();
$dom->load($html);
foreach($dom->find("div[class=bodyText]") as $parent) {
foreach($parent->children() as $child) {
if ($child->tag == 'p' || $child->tag == 'h3') {
// remove the inner text of divs contained within a p element
foreach($dom->find('div') as $e)
$e->innertext = '';
echo $child->plaintext . '<br>';
}
}
}
This is mixed content. Depending on what defines the position of the element, you can use a number of factors. In this case, simply selecting all the text nodes will probably be sufficient:
//div[contains(@class, 'bodyText')]/(p | h3)/text()
If the union operator within a path location is not allowed in your processor (it requires XPath 2.0, which PHP's DOMXPath does not implement), then you can use your syntax as before or, with an XPath 2.0 processor, a form that is a little simpler in my opinion:
//div[contains(@class, 'bodyText')]/*[local-name() = ('p', 'h3')]/text()
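For PHP's DOMXPath, which only speaks XPath 1.0, the same selection can be written with the self axis. This is a minimal sketch on simplified, valid sample markup (no div nested inside p); the sample HTML and the $texts variable are assumptions for illustration.

```php
<?php
// Simplified sample (assumed): valid HTML, unlike the markup in the question.
$html = '<div class="bodyText">'
      . '<p>first paragraph</p>'
      . '<h3>a heading</h3>'
      . '<p>second paragraph</p>'
      . '<blockquote>not wanted</blockquote>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// self::p or self::h3 is the XPath 1.0 way to say "child element named p or h3".
$nodes = $xpath->query("//div[contains(@class, 'bodyText')]/*[self::p or self::h3]/text()");

$texts = [];
foreach ($nodes as $node) {
    $text = trim($node->nodeValue);
    if ($text !== '') {          // skip whitespace-only text nodes
        $texts[] = $text;
    }
}
print_r($texts);
```

The text nodes come back in document order, so paragraphs and headings stay interleaved as in the page.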

php remove tags before a specified tag

I want to remove all image-tags before the headline starts, but they are not nested the same way. And then remove the empty tags.
<div class="c2">
<img src="image/file" width="480" height="360" alt="Image" />
</div>
<div class="c2">
<div class="headline">
headline
</div>
<div class="headline">
headline2
</div>
</div>
and different nested tags like
<div class="c2">
<p>
<img src="image/A.JPG" width="480" height="319" alt="Image" />
</p>
<div class="headline">
A headline
</div>
</div>
I think that could be solved recursively, but I don't know how.
Thanks for your help!
EDIT: if you want to remove only an <img> followed by <div><div class="headline"> or <div class="headline">, use this XPath:
$imgs = $xpath->query("//img[../following-sibling::div[1]/div/@class='headline' or ../following-sibling::div[1]/@class='headline']");
see it working: http://codepad.viper-7.com/QhprLP
Do it like this:
$doc = new DOMDocument();
$doc->loadHTML($x); // assuming HTML in $x
$xpath = new DOMXpath($doc);
$imgs = $xpath->query("//img"); // select all <img> nodes
foreach ($imgs as $img) { // loop through list of all <img> nodes
$parent = $img->parentNode;
$parent->removeChild($img); // delete <img> node
if (trim($parent->textContent) === '' && $parent->getElementsByTagName('*')->length === 0) // if parent node of <img> is now empty (whitespace only), delete it
$parent->parentNode->removeChild($parent);
}
echo htmlentities($doc->saveHTML()); // display the new HTML
see it working: http://codepad.viper-7.com/350Hw6
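Putting the two pieces together, here is a minimal sketch that removes only the images sitting directly before a headline and then drops the wrappers that are left empty. The sample HTML is condensed from the question; the counter variables at the end are added only to show the outcome.

```php
<?php
// Both nesting variants from the question, condensed into one document.
$html = '<div class="c2"><img src="image/file" alt="Image"></div>'
      . '<div class="c2"><div class="headline">headline</div>'
      . '<div class="headline">headline2</div></div>'
      . '<div class="c2"><p><img src="image/A.JPG" alt="Image"></p>'
      . '<div class="headline">A headline</div></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Images whose parent is followed by a headline div, or by a div containing one.
$query = "//img[../following-sibling::div[1]/div/@class='headline'"
       . " or ../following-sibling::div[1]/@class='headline']";

foreach ($xpath->query($query) as $img) {
    $parent = $img->parentNode;
    $parent->removeChild($img);
    // Remove the wrapper as well when only whitespace is left inside it.
    if ($parent->parentNode !== null
        && trim($parent->textContent) === ''
        && $parent->getElementsByTagName('*')->length === 0) {
        $parent->parentNode->removeChild($parent);
    }
}

$remainingImages = $xpath->query('//img')->length;
$remainingParagraphs = $xpath->query('//p')->length;
$headlines = $xpath->query("//div[@class='headline']")->length;
$containers = $xpath->query("//div[@class='c2']")->length;
echo $doc->saveHTML();
```

The node list returned by query() is not live, so removing nodes inside the loop is safe here.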

how to scrape the external url using php simple html dom parser

<div id="lyrics">
<img />
<span id="line_31" class="line line-s">En medio de este tropico mortal</span>
<br>
<span id="line_32" class="line line-s">Roots and creation, come again!</span>
<br>
<span id="line_33" class="line line-s">So mi guardian, mi guardian mi lift up di plan</span>
<span id="line_34" class="line line-s">Now everybody a go' do dis one</span>
<span id="line_35" class="line line-s">Like in down di Caribbean</span>
<span id="line_36" class="line line-s">San Andrés, Providence Island</span>
<br>
</div>
Here I have a div containing multiple span tags, with br tags between them. I want to scrape the span text and the br tags as they are. How can I do this with PHP Simple HTML DOM Parser?
Thanks for any help.
Let's say the html file you have above is called "index.html".
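include('simple_html_dom.php');
$html = file_get_html("index.html");
// find() returns an array unless an index is given; 0 selects the first match
$element = $html->find('div#lyrics', 0);
$result = $element->innertext;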
You want to consult the manual: http://simplehtmldom.sourceforge.net/
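If Simple HTML DOM is not available, roughly the same result can be had with PHP's built-in DOM extension instead. This is a sketch of that swapped-in approach, not of the parser asked about; the shortened sample HTML and the $result variable are assumptions for illustration.

```php
<?php
// Shortened sample of the lyrics markup (assumed for this sketch).
$html = '<div id="lyrics">'
      . '<span id="line_31" class="line line-s">En medio de este tropico mortal</span>'
      . '<br>'
      . '<span id="line_32" class="line line-s">Roots and creation, come again!</span>'
      . '<br>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// First (and only) div with id="lyrics".
$lyrics = $xpath->query("//div[@id='lyrics']")->item(0);

// Serialize each child in turn to get the div's inner HTML, br tags included.
$result = '';
foreach ($lyrics->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result;
```

For pages with non-ASCII text, the input encoding may need to be declared before loading, which this sketch leaves out.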
