Regular Expression to ignore a link text - php

I have the following code:
<p> <img src="spas01.jpg" alt="" width="630" height="480"></p>
<p style="text-align: right;">Spas</p>
<p>My Site Content [...]</p>
I need a regular expression to get only the "My Site Content [...]".
So I need to ignore the first image (and possibly others) and any links.

Try This:
Use (?<=<p>)([^><]+)(?=</p>) or <p>\K([^><]+)(?=</p>)
Update
$re = "#<p>\\K([^><]+)(?=</p>)#m";
$str = "<p> <img src=\"spas01.jpg\" alt=\"\" width=\"630\" height=\"480\"></p>\n<p style=\"text-align: right;\">Spas</p>\n<p>My Site Content [...]</p>";
preg_match_all($re, $str, $matches);
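For reference, a minimal sketch of what the \K pattern captures on the sample markup from the question. The string is rebuilt with single quotes here purely for readability; nothing else is assumed beyond the pattern given above.

```php
<?php
// Pattern from the answer: \K discards the already-matched "<p>",
// and the lookahead requires "</p>" without consuming it.
$re = '#<p>\K([^><]+)(?=</p>)#';

$str = '<p> <img src="spas01.jpg" alt="" width="630" height="480"></p>' . "\n"
     . '<p style="text-align: right;">Spas</p>' . "\n"
     . '<p>My Site Content [...]</p>';

preg_match_all($re, $str, $matches);

// Only a bare <p> whose content has no nested tags qualifies.
print_r($matches[0]);
```

On this input the only match is "My Site Content [...]": the first paragraph contains a nested tag, and the second is skipped because the pattern requires a literal <p> with no attributes.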

With DOMDocument and DOMXPath:
$html = <<<'EOD'
<p> <img src="spas01.jpg" alt="" width="630" height="480"></p>
<p style="text-align: right;">Spas</p>
<p>My Site Content [...]</p>
EOD;
$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
$query = '//p//text()[not(ancestor::a)]';
$textNodes = $xp->query($query);
foreach ($textNodes as $textNode) {
echo $textNode->nodeValue . PHP_EOL;
}
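A small sketch of how the not(ancestor::a) filter behaves once links are actually present. The <a href="#"> wrappers below are an assumption added for illustration (the sample markup above carries no anchors), and whitespace-only text nodes are skipped by hand.

```php
<?php
// Hypothetical input: the <a> elements are assumed, not taken from the question.
$html = '<p><a href="#"><img src="spas01.jpg" alt=""></a></p>'
      . '<p style="text-align: right;"><a href="#">Spas</a></p>'
      . '<p>My Site Content [...]</p>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

$texts = [];
// Select every text node below a <p> that has no <a> among its ancestors.
foreach ($xp->query('//p//text()[not(ancestor::a)]') as $textNode) {
    $value = trim($textNode->nodeValue);
    if ($value !== '') {          // ignore whitespace-only nodes
        $texts[] = $value;
    }
}

print_r($texts);
```

With the assumed anchors in place, the link text "Spas" is filtered out and only the plain paragraph text remains.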

Related

How to get html tags starting with 'abcd' using PHP?

I have the following html code:
<div class="pictures">
<figure>
<img src="img/foo.jpg" height="400" width="400" id="abcd001"/>
<figcaption>foo</figcaption>
</figure>
<figure>
<img src="img/bar.jpg" height="400" width="400" id="abcd002"/>
<figcaption>bar</figcaption>
</figure>
<figure>
<img src="img/Joe.jpg" height="400" width="400" id="abcd003"/>
<figcaption>Joe</figcaption>
</figure>
</div>
<div id="abcd004">
Lorem Ipsum.
</div>
I am trying to obtain all HTML tags, and their children, whose ids start with 'abcd'. The query method of the DOMXPath class does not seem to work with regex.
$dom = new DomDocument();
$dom->load("/var/www/html/myWebsite.html");
$xpath = new DOMXPath($dom);
$xdom = $xpath->query("//img[@id='abcd*']");
foreach($xdom as $entry)
{
echo $entry;
}
Any suggestions?
EDIT: start-with does not work because that function does not seem to work on the IDs of html tags
I think you are not using the correct syntax: the XPath function is named starts-with. Try the following:
$xpath = new DOMXPath($dom);
$xdom = $xpath->query("//img[starts-with(@id, 'abcd')]");
foreach($xdom as $entry) {
echo $entry->getAttribute('src') . "\n";
}
See it working here: https://3v4l.org/04i5a
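Since the question asks for every tag whose id starts with 'abcd', not only images, here is a hedged sketch that uses the wildcard //* instead of //img. The markup is a shortened, assumed version of the question's HTML loaded from a string rather than from a file.

```php
<?php
// Shortened stand-in for the question's markup (an assumption for illustration).
$html = '<div class="pictures">'
      . '<img src="img/foo.jpg" id="abcd001">'
      . '<img src="img/bar.jpg" id="abcd002">'
      . '</div>'
      . '<div id="abcd004">Lorem Ipsum.</div>'
      . '<div id="other">ignored</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$ids = [];
// //* matches any element; starts-with() tests the id attribute prefix.
foreach ($xpath->query("//*[starts-with(@id, 'abcd')]") as $node) {
    $ids[] = $node->getAttribute('id');
}

print_r($ids);
```

Each matched node is a DOMElement, so its children remain reachable through childNodes or a further relative XPath query.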

Add data-mfp-src attribute to image tags PHP

This is the content:
<div class="image">
<img src="https://www.gravatar.com/avatar/" alt="test" width="50" height="50">
</div>
I want to use preg_replace to add a data-mfp-src attribute (taking its value from the src attribute) so that the final code looks like this:
<div class="image">
<img src="https://www.gravatar.com/avatar/" data-mfp-src="https://www.gravatar.com/avatar/" alt="test" width="50" height="50">
</div>
This is my code and it works without any issues, but I want to use preg_replace for some specific reasons:
function lazyload_images( $content ){
$content = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
$dom = new DOMDocument;
libxml_use_internal_errors(true);
@$dom->loadHTML($content);
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);
foreach ($xpath->evaluate('//div[img]') as $paragraphWithImage) {
//$paragraphWithImage->setAttribute('class', 'test');
foreach ($paragraphWithImage->getElementsByTagName('img') as $image) {
$image->setAttribute('data-mfp-src', $image->getAttribute('src'));
$image->removeAttribute('src');
}
};
return preg_replace('~<(?:!DOCTYPE|/?(?:html|head|body))[^>]*>\s*~i', '', $dom->saveHTML($dom->documentElement));
}
To isolate the src value robustly and set the new attribute to it, I urge you to avoid regex. It can be done, but the snippet below won't break if more classes are added to the <div> or if the <img> attributes are reordered.
Code:
$html = <<<HTML
<div class="image">
<img src="https://www.gravatar.com/avatar/" alt="test" width="50" height="50">
</div>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
// using a loop in case there are multiple occurrences
foreach ($xpath->query("//div[contains(@class, 'image')]/img") as $node) {
$node->setAttribute('data-mfp-src', $node->getAttribute('src'));
}
echo $dom->saveHTML();
Output:
<div class="image">
<img src="https://www.gravatar.com/avatar/" alt="test" width="50" height="50" data-mfp-src="https://www.gravatar.com/avatar/">
</div>
Resources:
http://php.net/manual/en/domelement.setattribute.php
http://php.net/manual/en/domelement.getattribute.php
Just to show you what the regex might look like...
Find: ~<img src="([^"]*)"~
Replace: <img src="$1" data-mfp-src="$1"
Demo: https://regex101.com/r/lXIoFw/1 but again, I don't recommend it because it could silently let you down in the future.
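As a rough sketch, the find/replace pair above maps onto preg_replace as follows. The second input, with the attributes reordered, is an assumed example added to show the kind of silent failure being warned about.

```php
<?php
$pattern     = '~<img src="([^"]*)"~';
$replacement = '<img src="$1" data-mfp-src="$1"';

$html = '<div class="image">' . "\n"
      . '<img src="https://www.gravatar.com/avatar/" alt="test" width="50" height="50">' . "\n"
      . '</div>';

// Works while src is the first attribute of the tag.
$result = preg_replace($pattern, $replacement, $html);
echo $result, PHP_EOL;

// Assumed variant: src is no longer first, so the pattern matches nothing
// and the markup comes back unchanged, with no error raised.
$reordered = '<img alt="test" src="https://www.gravatar.com/avatar/">';
$unchanged = preg_replace($pattern, $replacement, $reordered);
echo $unchanged, PHP_EOL;
```

The second call returns its input untouched, which is why the DOM approach is the safer default for markup you do not fully control.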

Get img tag details using preg_replace [PHP]

This is my content:
<p><img src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>
This is my PHP code:
function convert_the_content($content){
$content = preg_replace('/<p><img.+src=[\'"]([^\'"]+)[\'"].*>/i', "<p class=\"uploaded-img\"><img class=\"lazy-load\" data-src=\"$1\" /></p>", $content);
return $content;
}
I am using this code to add a class to the <p> and <img> tags and to convert src="" to data-src="".
The problem is that my code removes the width and height attributes from the <img> tag. How can I change my code so that it keeps these attributes too?
NOTE: My content may have many <img> and <p> tags.
If you only have this very exact HTML snippet, you can do it simpler by just doing
$html = <<< HTML
<p><img src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>
HTML;
$html = str_replace('<p>', '<p class="foo">', $html);
$html = str_replace(' src=', ' data-src=', $html);
echo $html;
This will output
<p class="foo"><img data-src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>
If you are trying to convert arbitrary HTML, consider using a DOM Parser instead:
<?php
$html = <<< HTML
<html>
<body>
<p><img src="http://localhost/contents/uploads/2017/11/1.jpg" width="215" height="1515"></p>
<p><img width="215" height="1515" src="http://localhost/contents/uploads/2017/11/1.png"></p>
<p ><img
class="blah"
height="1515"
width="215"
src="http://localhost/contents/uploads/2017/11/1.png"
>
</p>
</body>
</html>
HTML;
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);
$xpath = new DOMXPath($dom);
foreach ($xpath->evaluate('//p[img]') as $paragraphWithImage) {
$paragraphWithImage->setAttribute('class', 'foo');
foreach ($paragraphWithImage->getElementsByTagName('img') as $image) {
$image->setAttribute('class', trim('bar ' . $image->getAttribute('class')));
$image->setAttribute('data-src', $image->getAttribute('src'));
$image->removeAttribute('src');
}
};
echo $dom->saveHTML($dom->documentElement);
Output:
<html><body>
<p class="foo"><img width="215" height="1515" class="bar" data-src="http://localhost/contents/uploads/2017/11/1.jpg"></p>
<p class="foo"><img width="215" height="1515" class="bar" data-src="http://localhost/contents/uploads/2017/11/1.png"></p>
<p class="foo"><img class="bar blah" height="1515" width="215" data-src="http://localhost/contents/uploads/2017/11/1.png"></p>
</body></html>

getting first images next to id with DOMXpath::query

<span class="byline">
<ul class="foobar"></ul>
<img alt="" src="resize_image.php?src=images/newsManagement/87600069ef0dffad5fd02f862ea3787b.jpg&w=675&h=675">
<p style="text-align: justify;">
<img alt="" src="resize_image.php?src=images/newsManagement/87600069ef0dffad5fd02f862ea3787b.jpg&w=675&h=675">
<hr>
Hi, this is my HTML. I can fetch all images using DOMDocument, but I want to get only the first image that comes after the ul with class foobar. I don't want the other images. How can I query for that?
I tried this for all images:
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($url);
//$xpath = new DomXpath($doc);
//$entries = $xpath->query("//div[@id='newsbox']/ul[@class='foobar']");
$elements = $dom->getElementsByTagName('img');
if (!is_null($elements)) {
foreach ($elements as $element) {
echo "<br/>". $element->getAttribute('src'). ": ";
}
}
I think you can use DOMXPath query with this xpath expression:
$image = $xpath->query('//ul[@class="foobar"]/following-sibling::img')->item(0);
This will get the following img siblings for <ul class="foobar"> using following-sibling and then get the first item.
The $image is of type DOMElement.
In this example I've used loadHTML to load the html from a string $source.
If you want to load your html from a file, you could for example use loadHTMLFile.
$source = <<<SOURCE
<span class="byline">
<ul class="foobar"></ul>
<img alt="" src="resize_image.php?src=images/newsManagement/87600069ef0dffad5fd02f862ea3787b.jpg&w=675&h=675">
<p style="text-align: justify;">
<img alt="" src="resize_image.php?src=images/newsManagement/87600069ef0dffad5fd02f862ea3787b.jpg&w=675&h=675">
<hr>
SOURCE;
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($source);
$xpath = new DomXpath($dom);
$image = $xpath->query('//ul[@class="foobar"]/following-sibling::img')->item(0);
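A short sketch of reading the result, since item(0) returns null when nothing matches. The file names first.jpg and second.jpg are placeholders, not the URLs from the question.

```php
<?php
// Placeholder markup: file names are assumptions for illustration.
$source = '<span class="byline">'
        . '<ul class="foobar"></ul>'
        . '<img alt="" src="first.jpg">'
        . '<p style="text-align: justify;"><img alt="" src="second.jpg"></p>'
        . '</span>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($source);
$xpath = new DOMXPath($dom);

// following-sibling only sees nodes that share the <ul>'s parent,
// so the image nested inside <p> is never a candidate.
$image = $xpath->query('//ul[@class="foobar"]/following-sibling::img')->item(0);

// Guard against a missing match before reading the attribute.
$src = $image instanceof DOMElement ? $image->getAttribute('src') : null;
echo $src, PHP_EOL;
```

Here $src holds the src of the image directly after the list, and would be null if the list had no following img sibling.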

Why the getElementsByTagName is not working in this example

I have an DomElement with this content:
$cell = <td colspan=3>
<p class=5tablebody>
<span style='position:relative;top:14.0pt'>
<img width=300 height=220 src="forMerrin_files/image020.png">
</span>
</p>
</td>
There, I am getting the p element with:
$paragraphs = $xpath->query('.//p', $cell);
My goal is to get the img element from the cell element.
I have tried:
$paragraph->getElementsByTagName('img')->item(0);
But I am getting null. Any idea why?
Thank you
Is this what you are after?
$htmlStr = '<td colspan=3>
<p class=5tablebody>
<span style=\'position:relative;top:14.0pt\'>
<img width=300 height=220 src="forMerrin_files/image020.png">
</span>
</p>
</td>';
$doc = new DOMDocument();
$doc->loadHTML($htmlStr);
$paragraphs = $doc->getElementsByTagName('img');
var_dump($paragraphs->item(0)->getAttribute('src'));
Outputs:
string 'forMerrin_files/image020.png' (length=28)
The second argument of DOMXpath::query() has to be a context node; you cannot just pass an HTML string. I suggest using DOMXpath::evaluate() anyway. Both methods have the same syntax, but query() is limited to XPath expressions that return a node list, whereas evaluate() also allows XPath expressions that return scalars.
$html = <<<HTML
<td colspan=3>
<p class=5tablebody>
<span style='position:relative;top:14.0pt'>
<img width=300 height=220 src="forMerrin_files/image020.png">
</span>
</p>
</td>
HTML;
$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXpath($dom);
// for each td element
foreach ($xpath->evaluate('//td') as $cell) {
// for each img inside a p
foreach ($xpath->evaluate('.//p//img', $cell) as $img) {
var_dump($img->getAttribute('src'));
}
}
Output: https://eval.in/147576
string(28) "forMerrin_files/image020.png"
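To illustrate the scalar results mentioned above, here is a minimal sketch using count() and string() with a context node. The table and tr wrapper is an assumption added so the cell parses predictably.

```php
<?php
// The <table><tr> wrapper is assumed; the cell content follows the question.
$html = '<table><tr><td colspan="3"><p><span>'
      . '<img width="300" height="220" src="forMerrin_files/image020.png">'
      . '</span></p></td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);

// Use the first <td> element as the context node.
$cell = $xpath->evaluate('//td')->item(0);

// XPath numbers come back as PHP floats, XPath strings as PHP strings.
$count = $xpath->evaluate('count(.//p//img)', $cell);
$src   = $xpath->evaluate('string(.//p//img/@src)', $cell);

var_dump($count, $src);
```

Passing the same scalar expressions to query() would not yield a usable result, because query() only handles expressions that produce node lists.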
