How to keep <p><img ... /></p> with XPATH? - php

I use XPath to remove empty, untidy HTML tags:
$nodeList = $xpath->query("//*[normalize-space(.)='' and not(self::br)]");
foreach($nodeList as $node)
{
$node->parentNode->removeChild($node);
}
It removes unwanted markup like this:
<p><em><br /></em></p>
<p><span style="text-decoration: underline;"><em><br /></em></span></p>
but it also removes the img tag below, which I want to keep:
<p><img title="picture summit" src="images/32913430_127001_e.jpg" alt="picture summit" width="590" height="366" /></p>
How can I keep the img tag with XPath?

Use:
//p[not(descendant::*[self::img or self::br]) and normalize-space()='']
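If you would rather keep the question's original query, which also removes empty wrappers around <br />, another option is to add not(descendant-or-self::img) to it so that images, and every element that contains one, are skipped. A minimal sketch, assuming the markup is loaded with DOMDocument::loadHTML(); the sample HTML (including the extra <p>Some text</p>) is made up for illustration:

```php
<?php
// Sample input: two "empty" paragraphs, one image paragraph, one text paragraph.
$html = '<p><em><br /></em></p>'
      . '<p><span style="text-decoration: underline;"><em><br /></em></span></p>'
      . '<p><img title="picture summit" src="images/32913430_127001_e.jpg" alt="picture summit" width="590" height="366" /></p>'
      . '<p>Some text</p>';

libxml_use_internal_errors(true); // keep libxml parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Whitespace-only elements that are not <br> and that neither are
// nor contain an <img>.
$nodeList = $xpath->query(
    "//*[normalize-space(.)='' and not(self::br) and not(descendant-or-self::img)]"
);
foreach ($nodeList as $node) {
    $node->parentNode->removeChild($node);
}

echo $dom->saveHTML();
```

Because DOMXPath::query() returns a fixed list, removing a parent before its already-selected descendants is harmless; those descendants are simply removed from a subtree that is already detached.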

You could also use an XPath 1.0 expression like the one below to select the unwanted paragraphs (those with no child text nodes and no child img elements) and then remove them:
//p[count(text())=0 and count(img)=0]

Related

How to replace div with one of its child p nodes

I get this HTML from a response, and I need to remove the extra text.
The string looks like this:
<?php
$str = <<<HTML
AAAA <span>span txt</span>
<div class='unique_div' id='xrz' data-id='1'>
div text
<span>span text</span>
<p class='unique_p'>
<span>p span text</span>
<p>p p text</p>
</p>
div text
</div>
BBBB <span>span txt</span>
HTML;
How can I replace the div with the p that is inside it?
I need to write a regular expression that produces the following result:
<?php
$str = <<<HTML
AAAA <span>span txt</span>
<p class='unique_p'>
<span>p span text</span>
<p>p p text</p>
</p>
BBBB <span>span txt</span>
HTML;
There is only one div and one p with these attributes.
Since you're working with what appears to be HTML, and your requirements entail modifying the Document Object Model (DOM), I would suggest using a DOM parser like DOMDocument.
If I understood your question correctly, you're looking to replace the <div> node which appears to have an id attribute of xrz with the p node that has a class attribute of unique_p and is a child of the div.
Getting the div is easy, because it has an id and they are unique. So we can use a method like DOMDocument::getElementById to get that div.
Getting its child p gets a little trickier since we want to make sure it's both a child of div and has the specified class. So we'll rely on an XPath query for that using DOMXPath.
Finally, we'll call DOMNode::replaceChild on the div's parent to replace the div with its captured child p.
Here's a simple example.
$str = <<<HTML
AAAA <span>span txt</span>
<div class='unique_div' id='xrz' data-id='1'>
div text
<span>span text</span>
<p class='unique_p'>
<span>p span text</span>
<p>p p text</p>
</p>
div text
</div>
BBBB <span>span txt</span>
HTML;
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($str, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$children = $xpath->query('//div/p[@class="unique_p"]');
$p = $children->item(0);
$div = $dom->getElementById('xrz');
$div->parentNode->replaceChild($p, $div);
echo $dom->saveHTML();
The output should look something like this.
<p>AAAA <span>span txt</span>
<p class="unique_p">
<span>p span text</span>
</p><p>
BBBB <span>span txt</span></p></p>
If the output looks different from what you expected, that is because the HTML in your question is actually malformed.
See section 9.3.1 of the HTML 4.01 specification:
The P element represents a paragraph. It cannot contain block-level elements (including P itself).
So each time a DOM parser finds an opening p tag inside of another p tag it will just implicitly close the previous one first.
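To see this implicit closing in action, here is a small sketch (the markup is invented for illustration) that parses a p nested inside another p and checks where the inner one ends up:

```php
<?php
// libxml's HTML parser closes an open <p> when it meets another <p>,
// so "nested" paragraphs end up as siblings after parsing.
libxml_use_internal_errors(true); // silence warnings about the stray closing </p>
$dom = new DOMDocument();
$dom->loadHTML('<p class="outer">outer text<p class="inner">inner text</p></p>');
$xpath = new DOMXPath($dom);

// How many <p> elements ended up directly inside another <p>.
$nested = $xpath->query('//p/p')->length;
// The element that contains the inner paragraph after parsing.
$parent = $xpath->query('//p[@class="inner"]')->item(0)->parentNode->nodeName;

echo "nested: $nested, inner p parent: $parent\n";
```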

preg_replace regex to remove stray end tag

I have a string containing different types of html tags and stuff, including some <img> elements. I am trying to wrap those <img> elements inside a <figure> tag. So far so good using a preg_replace like this:
preg_replace( '/(<img.*?>)/s','<figure>$1</figure>',$content);
However, if the <img> tag has a neighboring <figcaption> tag, the result is rather ugly and produces a stray end tag for the figure element:
<figure id="attachment_9615">
<img class="size-full" src="http://www.example.com/pic.png" alt="name" width="1699" height="354" />
<figcaption class="caption-text"></figure>Caption title here</figcaption>
</figure>
I've tried a whole bunch of preg_replace regex variations to wrap both the img-tag and figcaption-tag inside figure, but can't seem to make it work.
My latest try:
preg_replace( '/(<img.*?>)(<figcaption .*>*.<\/figcaption>)?/s',
'<figure">$1$2</figure>',
$content);
As others have pointed out, you are better off using a parser such as DOMDocument. The following code wraps a <figure> tag around each img whose next sibling element is a <figcaption>:
<?php
$html = <<<EOF
<html>
<img class="size-full" src="http://www.example.com/pic.png" alt="name" width="1699" height="354" />
<figcaption class="caption-text">Caption title here</figcaption>
<img class="size-full" src="http://www.example.com/pic.png" alt="name" width="1699" height="354" />
<img class="size-full" src="http://www.example.com/pic.png" alt="name" width="1699" height="354" />
<figcaption class="caption-text">Caption title here</figcaption>
</html>
EOF;
libxml_use_internal_errors(true); # libxml does not know HTML5 tags such as <figcaption>
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
# get all images
$imgs = $xpath->query("//img");
foreach ($imgs as $img) {
# the next element sibling ($img->nextSibling is usually a whitespace text node)
$next = $xpath->query("following-sibling::*[1]", $img)->item(0);
if ($next !== null && $next->nodeName == 'figcaption') {
# create a new figure tag and append the cloned elements
$figure = $dom->createElement('figure');
$figure->appendChild($img->cloneNode(true));
$figure->appendChild($next->cloneNode(true));
# insert the newly generated elements right before $img
$img->parentNode->insertBefore($figure, $img);
# and remove both the figcaption and the image from the DOM
$next->parentNode->removeChild($next);
$img->parentNode->removeChild($img);
}
}
$dom->formatOutput=true;
echo $dom->saveHTML();
See a demo on ideone.com.
To have a <figure> tag around all your images, you might want to add an else branch:
} else {
$figure = $dom->createElement('figure');
$figure->appendChild($img->cloneNode(true));
$img->parentNode->insertBefore($figure, $img);
$img->parentNode->removeChild($img);
}

PHP - wrap an image with a span tag

Hi, I am trying to wrap images that have a specific class (pinthis in this example) in a span to which I will add schema information. This is a basic example, and I will need to inject other schema information as well. To get me started, can anyone help me get from my existing code to my example output? I need to update multiple pages dynamically, and some of the content will come via PHP from WordPress taxonomies and other data, so I would prefer to do it in PHP if possible.
<p>
<a class="fancybox" rel="gallery1" href="image.jpg">
<img src="img.jpg" alt="alt text" width="1000" height="1000" class="various classes including ... pinthis">
</a>
</p>
which I would like to turn into:
<p>
<a class="fancybox" rel="gallery1" href="image.jpg">
<span itemscope itemtype="http://schema.org/ImageObject">
<img src="img.jpg" alt="alt text" width="1000" height="1000" class="various classes including ... pinthis">
</span>
</a>
</p>
I think if someone could point me in the right direction and give me a push start, that would be enough for me to carry on from there.
Many thanks.
Using PHP DOMDocument, you could do something like this:
$html = '<p><a class="fancybox" rel="gallery1" href="image.jpg"><img src="img.jpg" alt="alt text" width="1000" height="1000" class="various classes pinthis"></a></p>';
// Create a DOMDocument and load the HTML.
$dom = new DOMDocument();
$dom->loadHTML($html);
// Create the span wrapper.
$span = $dom->createElement('span');
$span->setAttribute('itemscope', '');
$span->setAttribute('itemtype', 'http://schema.org/ImageObject');
// Get all the images.
$images = $dom->getElementsByTagName('img');
// Loop the images.
foreach ($images as $image) {
// Only affect those with the pinthis class.
if (strpos($image->getAttribute('class'), 'pinthis') !== false) {
// Clone the span if we need to use it often.
$span_clone = $span->cloneNode();
// Replace the image tag with the span tag.
$image->parentNode->replaceChild($span_clone, $image);
// Add the image tag as a child of the new span tag.
$span_clone->appendChild($image);
}
}
// Get your HTML with saveHTML()
$html = $dom->saveHTML();
echo $html;
Just modify the code to suit your specific needs, for example if you need to change the span's attributes or search for a different class. You might even want to write a function that takes the class and the span attributes as parameters.
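A sketch of such a function, assuming you pass the class name and a map of span attributes; the name wrapImagesWithClass is hypothetical. It also swaps the strpos() check for an XPath class-token match, so that a class like pinthis2 is not picked up:

```php
<?php
/**
 * Hypothetical helper: wraps every <img> carrying $class in a <span>
 * with the given attributes and returns the resulting HTML.
 * Assumes $class contains no quote characters.
 */
function wrapImagesWithClass(string $html, string $class, array $attributes): string
{
    libxml_use_internal_errors(true); // note: this setting stays on after the call
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Match $class as a whole class token, not as a substring.
    $query = "//img[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";

    foreach ($xpath->query($query) as $image) {
        $span = $dom->createElement('span');
        foreach ($attributes as $name => $value) {
            $span->setAttribute($name, $value);
        }
        // Put the span where the image was, then move the image into it.
        $image->parentNode->replaceChild($span, $image);
        $span->appendChild($image);
    }

    return $dom->saveHTML();
}

echo wrapImagesWithClass(
    '<p><a class="fancybox" rel="gallery1" href="image.jpg"><img src="img.jpg" alt="alt text" class="various classes pinthis"></a></p>',
    'pinthis',
    ['itemscope' => '', 'itemtype' => 'http://schema.org/ImageObject']
);
```

Because the XPath result is a fixed list, moving the images while looping over it does not skip or revisit any of them.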
DOMDocument documentation: http://php.net/manual/en/class.domdocument.php
Use wrapAll:
check if the image has the required class
if the image has the class, wrap it with the desired <span></span>
Try it this way:
if ($('img.classes').hasClass('pinthis')){
$('img.classes').wrapAll('<span itemscope itemtype="http://schema.org/ImageObject"></span>');
}
Helpful thread: jquery, wrap elements inside a div

Regex: Extracting img-tags from string

I'm trying to make this:
<span class="introduction">
<img alt="image" src="/picture.jpg" />
</span>
transform into this:
<img alt="image" src="/picture.jpg" />
How would I do this with regex? That is, how do I extract ONLY the img-tag from a given string of html?
Note: There can be a lot more HTML within the introduction span, BUT only one img tag.
You shouldn't really use regex on HTML. What about this?
$string = '<span class="introduction"><img alt="image" src="/picture.jpg" /></span>';
echo strip_tags($string, '<img>');
Otherwise, I would use an HTML/XML parser.
How about:
"<img[^>]*>"
Try it with grep:
kent$ echo '<span class="introduction">
quote> <img alt="image" src="/picture.jpg" />
quote> </span>
quote> '|grep -P "<img[^>]*>"
<img alt="image" src="/picture.jpg" />
preg_match('#(<img.*?>)#', $string, $results);
should work; the result is in $results[1].
Use DOM and this XPath:
//span[@class="introduction"]/img
to find all img elements that are direct children of any span element with a class attribute of introduction.
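A minimal sketch of that approach with DOMDocument and DOMXPath, assuming the fragment from the question (the extra text inside the span is illustrative):

```php
<?php
$string = '<span class="introduction">Some <em>intro</em> text <img alt="image" src="/picture.jpg" /></span>';

libxml_use_internal_errors(true); // ignore parser warnings about the fragment
$dom = new DOMDocument();
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);

// First img that is a direct child of span.introduction (null if there is none).
$img = $xpath->query('//span[@class="introduction"]/img')->item(0);

$result = '';
if ($img !== null) {
    // Passing a node to saveHTML() serializes only that node.
    $result = $dom->saveHTML($img);
}
echo $result;
```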
I came up with this solution:
/<img ([^>"']*("[^"]*"|'[^']*')?[^>"']*)*>/
Tested on:
<other html elements like span or whatever><img src="asd>qwe" attr1='asd>qwe' attr2='as"dqwe' attr3="as'dqwe" ></other html elements like span or whatever>
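To try this pattern in PHP, a small sketch runs it with preg_match() against the same test string and, for comparison, the simpler <img[^>]*> pattern, which stops at the first > even when it sits inside a quoted attribute value:

```php
<?php
$html = <<<'HTML'
<other html elements like span or whatever><img src="asd>qwe" attr1='asd>qwe' attr2='as"dqwe' attr3="as'dqwe" ></other html elements like span or whatever>
HTML;

// The quote-aware pattern from above (single quotes escaped for a PHP string).
$quoteAware = '/<img ([^>"\']*("[^"]*"|\'[^\']*\')?[^>"\']*)*>/';
// The simpler pattern, which ends the match at the first ">".
$naive = '/<img[^>]*>/';

preg_match($quoteAware, $html, $m);
preg_match($naive, $html, $n);

echo $m[0], "\n"; // the whole img tag, including the quoted ">" characters
echo $n[0], "\n"; // cut off inside the src attribute
```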

PHP allow img tags only

I need your assistance with PHP. I want to allow only HTML <img> tags. I tried PHP's built-in function strip_tags(), but it's not giving me the output I need. For instance, in the following code strip_tags() keeps the img tags, but the text remains as well.
$img = "<img src='/img/fawaz.jpg' alt= ''> <br /> <p> This is a detailed paragraph about Fawaz and his mates.</p>";
echo strip_tags($img , "<img>");
What would be the proper way to allow only <img> (or any other single tag) in the result?
Any help would be appreciated.
Thanks
This might be due to the unclosed img tag in your code. Try this:
$img = "<img src='/img/fawaz.jpg' alt= '' /> <br /> <p> This is a detailed paragraph about Fawaz and his mates.</p>";
echo strip_tags($img , "<img>");
strip_tags() doesn't behave the way you want it to. If you supply a second argument, the tags listed there are kept in the resulting string and all other tags are removed. It will not remove the text inside or between the tags.
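A quick way to see this behaviour (sample strings made up for illustration): the allowed img tag survives, the p tags are dropped, and their text stays. Writing the img tag as self-closing makes no difference to that.

```php
<?php
// strip_tags() keeps the allowed tags and all text; it only drops the other tags.
$open   = strip_tags("<img src='a.jpg'> <p>Some text</p>", '<img>');
$closed = strip_tags("<img src='a.jpg' /> <p>Some text</p>", '<img>');

var_dump($open);   // string(27) "<img src='a.jpg'> Some text"
var_dump($closed);
```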
If you want to extract <img/> elements only, don't even think about using a regex. Use a DOM parser for that:
libxml_use_internal_errors(true);
$doc=new DOMDocument;
$html=$doc->loadHTML('<img src="/img/fawaz.jpg" alt= ""> <br /> <p> This is a
detailed paragraph about Fawaz and his mates.</p>');
$path=new DOMXPath($doc);
foreach ($path->query('//img') as $found)
var_dump($doc->saveXML($found));
Delete all HTML tags except <img>, <a>, <br />, <hr /> and so on:
$img = "
<img src='/img/fawaz.jpg' alt= '' />
<br /><br/>
<hr/>
<p> This is a detailed paragraph about Fawaz and his mates.</p>
<a href='cft'>123</a>
";
$img = strip_tags($img, "<img><a><br><hr>");
echo $img;
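On PHP 7.4 or later, strip_tags() also accepts the allowed tags as an array, which avoids having to get the string format right. A minimal sketch with a shortened version of the sample string:

```php
<?php
// Since PHP 7.4, the allowed tags can be passed as an array of tag names.
$img = "<img src='/img/fawaz.jpg' alt='' /><br /><hr/><p>This is a detailed paragraph.</p><a href='cft'>123</a>";
$clean = strip_tags($img, ['img', 'a', 'br', 'hr']);
echo $clean;
```

As with the string form, the text of the removed p element is kept.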
