Regex to replace html src attribute in PHP

I'm trying to use a regex to replace the src attribute (on an image or any other tag) in PHP.
I have a string like this:
$string2 = "<html><body><img src = 'images/test.jpg' /><img src = 'http://test.com/images/test3.jpg'/><video controls=\"controls\" src='../videos/movie.ogg'></video></body></html>";
And I would like to turn it into:
$string2 = "<html><body><img src = 'test.jpg' /><img src = 'test3.jpg'/><video controls=\"controls\" src='movie.ogg'></video></body></html>";
Here's what I tried:
$string2 = preg_replace("/src=["']([/])(.*)?["'] /", "'src=' . convert_url('$1') . ')'" , $string2);
echo htmlentities ($string2);
It didn't change anything and gave me a warning about an unescaped string.
Doesn't $1 pass the matched content to the function? What is wrong here?
The convert_url function comes from an example I posted here before:
function convert_url($url)
{
if (preg_match('#^https?://#', $url)) {
$url = parse_url($url, PHP_URL_PATH);
}
return basename($url);
}
It's supposed to strip out url paths and just return the filename.

Don't use regular expressions on HTML - use the DOMDocument class.
$html = "<html>
<body>
<img src='images/test.jpg' />
<img src='http://test.com/images/test3.jpg'/>
<video controls='controls' src='../videos/movie.ogg'></video>
</body>
</html>";
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML( $html );
$xpath = new DOMXPath( $dom );
libxml_clear_errors();
$doc = $dom->getElementsByTagName("html")->item(0);
$src = $xpath->query(".//@src");
foreach ( $src as $s ) {
$s->nodeValue = basename( $s->nodeValue );
}
$output = $dom->saveXML( $doc );
echo $output;
Which outputs the following:
<html>
<body>
<img src="test.jpg">
<img src="test3.jpg">
<video controls="controls" src="movie.ogg"></video>
</body>
</html>

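You have to use the e modifier. (Note that this modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so the code below only works on older versions.)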
$string = "<html><body><img src='images/test.jpg' /><img src='http://test.com/images/test3.jpg'/><video controls=\"controls\" src='../videos/movie.ogg'></video></body></html>";
$string2 = preg_replace("~src=[']([^']+)[']~e", '"src=\'" . convert_url("$1") . "\'"', $string);
Note that when using the e modifier, the replacement script fragment needs to be a string to prevent it from being interpreted before the call to preg_replace.
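On PHP 7 and later, where the e modifier is no longer available, the same idea could be sketched with preg_replace_callback(). This is only a sketch: it reuses convert_url() from the question, and the pattern (which accepts either quote style and optional spaces around the equals sign) is an assumption, not part of the original answer.

```php
<?php
// convert_url() is copied from the question: it reduces a URL or path to its file name.
function convert_url($url)
{
    if (preg_match('#^https?://#', $url)) {
        $url = parse_url($url, PHP_URL_PATH);
    }
    return basename($url);
}

$string = "<html><body><img src = 'images/test.jpg' /><img src = 'http://test.com/images/test3.jpg'/><video controls=\"controls\" src='../videos/movie.ogg'></video></body></html>";

// Group 1: "src", "=" and any surrounding spaces. Group 2: the opening quote.
// Group 3: the attribute value, up to the matching closing quote (\2).
$string2 = preg_replace_callback(
    '~(\bsrc\s*=\s*)(["\'])(.*?)\2~i',
    function ($m) {
        // Rebuild the attribute with the original spacing and quote style.
        return $m[1] . $m[2] . convert_url($m[3]) . $m[2];
    },
    $string
);

echo $string2;
```

The callback receives the match array directly, so no code has to be embedded in a string as with the e modifier.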

function replace_img_src($img_tag) {
$doc = new DOMDocument();
$doc->loadHTML($img_tag);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
$old_src = $tag->getAttribute('src');
$new_src_url = 'website.com/assets/'.$old_src;
$tag->setAttribute('src', $new_src_url);
}
return $doc->saveHTML();
}

Related

Using PHP preg_replace to append text to pattern found with regex

I want to wrap every img tag in a div.
So I have
<img src=%random url image% />
And it should be replaced with
<div class="demo"><img src=%random url image% /></div>
Can I do it with preg_replace?
$string = %page source code%;
$find = array("/<img(.*?)\/>/");
$replace = array('<div class="demo">'.$find[0].'</div>');
$result = preg_replace($find, $replace, $string);
But it doesn't work.

A better way to parse HTML is to use PHP's DOMDocument and DOMXPath classes. In your case, you can use XPath to find all the images, then add a div around each of them, as shown in this example:
$html = '<div><img src="http://x.com" /><span>xyz</span><a href="http://example.com"><img src="http://example.com" /></a></div>';
$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXpath($doc);
$images = $xpath->query('//img');
foreach ($images as $image) {
$div = $doc->createElement('div');
$div->setAttribute('class', 'demo');
$image->parentNode->replaceChild($div, $image);
$div->appendChild($image);
}
echo $doc->saveHTML();
Output:
<div>
<div class="demo"><img src="http://x.com"></div>
<span>xyz</span>
<a href="http://example.com">
<div class="demo"><img src="http://example.com"></div>
</a>
</div>

Regex match if pattern is not after another pattern - PHP

I want to wrap an iframe object in a div class, but only if it isn't already wrapped in that div class. I'm trying to use a negative match pattern for that div class so preg_replace will not match and return the original $content. However it still matches:
<?php
$content = <<< EOL
<div class="aoa_wrap"><iframe width="500" height="281" src="https://www.youtube.com/embed/uuZE_IRwLNI" frameborder="0" allowfullscreen></iframe></div>
EOL;
$pattern = "~(?!<div(.*?)aoa_wrap(.*?)>)<iframe\b[^>]*>(?:(.*?)?</iframe>)?~";
$replace = '<div class="aoa_wrap">${0}</div>';
$content = preg_replace( $pattern, $replace, $content);
echo $content . "\n";
?>
Output (incorrect):
<div class="aoa_wrap"><div class="aoa_wrap"><iframe width="500" height="281" src="https://www.youtube.com/embed/uuZE_IRwLNI" frameborder="0" allowfullscreen></iframe></div></div>
I'm not sure why the negative pattern at the beginning is not causing preg_replace to return the original $content as expected. Am I missing something obvious?

I ended up trying DOM as suggested in the comments above. This is what works for me:
<?php
$content = <<< EOL
<p>something here</p>
<iframe width="500" height="281" src="https://www.youtube.com/embed/uuZE_IRwLNI" frameborder="0" allowfullscreen></iframe>
<p><img src="test.jpg" /></p>
EOL;
$doc = new DOMDocument();
$doc->loadHTML( "<div>" . $content . "</div>" );
// remove <!DOCTYPE and html and body tags that loadHTML adds:
$container = $doc->getElementsByTagName('div')->item(0);
$container = $container->parentNode->removeChild($container);
while ($doc->firstChild) {
$doc->removeChild($doc->firstChild);
}
while ($container->firstChild ) {
$doc->appendChild($container->firstChild);
}
// get all iframes and see if we need to wrap them in our aoa_wrap class:
$nodes = $doc->getElementsByTagName( 'iframe' );
foreach ( $nodes as $node ) {
$parent = $node->parentNode;
// skip if already wrapped in div class 'aoa_wrap'
if ( isset( $parent->tagName ) && 'div' == $parent->tagName && 'aoa_wrap' == $parent->getAttribute( 'class' ) ) {
continue;
}
// create new element for class "aoa_wrap"
$wrap = $doc->createElement( "div" );
$wrap->setAttribute( "class", "aoa_wrap" );
// clone the iframe node as child
$wrap->appendChild( $node->cloneNode( true ) );
// replace original iframe node with new div class wrapper node
$parent->replaceChild( $wrap, $node );
}
echo $doc->saveHTML();
?>
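As for why the original pattern still matched: a negative lookahead such as (?!...) is tested at the position where <iframe begins and looks forward, so it only asserts that the text starting there is not a <div ...aoa_wrap...> tag, which is always true for an iframe. Checking what comes before the iframe needs a lookbehind. The following is a minimal sketch, assuming the wrapper tag appears exactly as <div class="aoa_wrap"> immediately before the iframe; the two input strings are hypothetical.

```php
<?php
// Hypothetical inputs: one iframe already wrapped, one not.
$wrapped   = '<div class="aoa_wrap"><iframe src="https://www.youtube.com/embed/uuZE_IRwLNI"></iframe></div>';
$unwrapped = '<p><iframe src="https://www.youtube.com/embed/uuZE_IRwLNI"></iframe></p>';

// (?<!...) looks backwards from the position just before "<iframe".
// The wrapper tag is matched literally here, because a lookbehind in
// PHP's PCRE generally has to match a fixed-length string.
$pattern = '~(?<!<div class="aoa_wrap">)<iframe\b[^>]*>.*?</iframe>~s';
$replace = '<div class="aoa_wrap">${0}</div>';

$outWrapped   = preg_replace($pattern, $replace, $wrapped);
$outUnwrapped = preg_replace($pattern, $replace, $unwrapped);

echo $outWrapped, "\n";
echo $outUnwrapped, "\n";
```

Any whitespace or extra attribute between the wrapper and the iframe defeats this pattern, which is one reason the DOM approach above is more robust.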

Wrap <img> elements in <div> but allow for <a> tags

I have a function that scans for img tags in a string using DOMDocument and wraps them in a div.
$str = 'string containing HTML';
$doc = new DOMDocument();
$doc->loadHtml(mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8'));
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
$div = $doc->createElement('div');
$tag->parentNode->insertBefore($div, $tag);
$div->appendChild($tag);
}
return $str;
However, when an img is wrapped in <a> tags, the <a> tags are removed and 'replaced' with the div. How can I keep the <a> tags?
Currently,
<a href="#"><img src="srctoimg"/></a>
results in:
<div><img src="srctoimg"/></div>
rather than:
<div><a href="#"><img src="srctoimg"/></a></div>
Is there a 'wildcard' I can pass as the second argument to insertBefore(), or how else can I achieve this?

You can do this with an XPath query.
'//*[img/parent::a or (self::img and not(parent::a))]'
This will get the parent for any img tag that has an a parent, as well as any image tag itself for any img tag that does not have an immediate a parent.
This way you don't have to change the code within your loop.
$str = <<<EOS
<html>
<body>
Image with link:
<a href="http://google.com">
<img src="srctoimg"/>
</a>
Image without link:
<img src="srctoimg"/>
</body>
</html>
EOS;
$doc = new DOMDocument();
$doc->loadHtml(mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8'));
$xpath = new DOMXPath($doc);
$tags = $xpath->query(
'//*[img/parent::a or (self::img and not(parent::a))]'
);
foreach ($tags as $tag) {
$div = $doc->createElement('div');
$tag->parentNode->insertBefore($div, $tag);
$div->appendChild($tag);
}
echo $doc->saveHTML();
Output (indented for clarity):
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
Image with link:
<div>
<a href="http://google.com">
<img src="srctoimg">
</a>
</div>
Image without link:
<div>
<img src="srctoimg">
</div>
</body>
</html>

Try with this inside your foreach
$parent = $tag->parentNode;
if( $parent->tagName == 'a' )
{
$parent->parentNode->insertBefore($div, $parent);
$div->appendChild($parent);
}
else
{
$tag->parentNode->insertBefore($div, $tag);
$div->appendChild($tag);
}

It might be simplest to just use an if clause:
foreach ($tags as $tag) {
$div = $doc->createElement('div');
$parent = $tag->parentNode;
// Parent node is not 'a': insert the div before the <img> and move the <img> into it
if ($parent->nodeName != 'a') {
$parent->insertBefore($div, $tag);
$div->appendChild($tag);
}
// Parent node is 'a': insert the div before the <a> and move the whole <a> into it
else {
$parent->parentNode->insertBefore($div, $parent);
$div->appendChild($parent);
}
}

Using XPath:
$str = 'string containing HTML';
$doc = new DOMDocument();
$doc->loadHtml(mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8'));
$xpath = new DOMXPath($doc);
$tags0 = $xpath->query('//a/img'); // get all <img> in <a> tag
$tags1 = $xpath->query('//img[not(parent::a)]'); // get all <img> except those with parent <a> tag
$tags = array_merge(iterator_to_array($tags0), iterator_to_array($tags1)); // query() returns DOMNodeList objects, so convert them to arrays before merging
foreach ($tags as $tag) {
$div = $doc->createElement('div');
$tag->parentNode->insertBefore($div, $tag);
$div->appendChild($tag);
}
return $doc->saveHTML();

Here is a solution using jQuery, although I know you asked about PHP.
<a href="#"><img src="srctoimg"/></a>
<script src="http://code.jquery.com/jquery-1.11.2.min.js"></script>
<script>
$('a').replaceWith(function(){
return $('<div/>', {
html: this.innerHTML
})
})
</script>

PHP - replace <img> tags and return src

The goal is to replace all <img> tags in a given string with <div> tags that contain the src value as their inner text.
While searching for an answer I found a similar question:
<?php
$content = "this is something with an <img src=\"test.png\"/> in it.";
$content = preg_replace("/<img[^>]+\>/i", "(image) ", $content);
echo $content;
?>
result:
this is something with an (image) in it.
Question: how can I change the script to get this result:
this is something with an <div>test.png</div> in it.

This is the kind of problem that PHP's DOMDocument class excels at:
$dom = new DOMDocument();
$dom->loadHTML($content);
foreach ($dom->getElementsByTagName('img') as $img) {
// put your replacement code here
}
$content = $dom->saveHTML();
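One way the loop body could be filled in is sketched below. It assumes the input is an ASCII HTML fragment; the wrapper <div> and the $result variable are additions made for this illustration so that only the fragment, not the html and body tags that loadHTML() adds, is printed.

```php
<?php
$content = 'this is something with an <img src="test.png"/> in it.';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
// Wrap the fragment in a <div> so there is a single element to serialize afterwards.
$dom->loadHTML('<div>' . $content . '</div>');
libxml_clear_errors();

// Copy the live node list to an array first: replacing nodes while iterating
// the live list can skip elements.
foreach (iterator_to_array($dom->getElementsByTagName('img')) as $img) {
    $div = $dom->createElement('div');
    // Use the src value as the text content of the new <div>.
    $div->appendChild($dom->createTextNode($img->getAttribute('src')));
    $img->parentNode->replaceChild($div, $img);
}

// Serialize only the children of the wrapper <div> (the first div in document order).
$wrapper = $dom->getElementsByTagName('div')->item(0);
$result = '';
foreach ($wrapper->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}
echo $result;
```

Creating a text node for the src value, rather than concatenating markup, leaves the escaping of special characters to the DOM.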

$content = "this is something with an <img src=\"test.png\"/> in it.";
$content = preg_replace('/<img[^>]*\bsrc="([^"]*)"[^>]*>/i', '<div>$1</div>', $content);
echo $content;

<?php
$strings = 'awdaw <img src="http://ua1.us/media/media.jpg" alt="Image" width="100" height="100"> aw <img src="http://ua1.us/media/media1awdwa.jpg"> wawadwad';
preg_match_all('/<img[^>]+>/i', $strings, $images);
foreach ($images[0] as $image) {
preg_match('/src="([^"]+)/i', $image, $replacements);
$replacement = isset($replacements[1]) ? $replacements[1] : (isset($replacements[0]) ? $replacements[0] : "image");
$strings = str_replace($image, $replacement, $strings);
}
echo $strings;

How to strip a tag and all of its inner html using the tag's id?

I have the following html:
<html>
<body>
bla bla bla bla
<div id="myDiv">
more text
<div id="anotherDiv">
And even more text
</div>
</div>
bla bla bla
</body>
</html>
I want to remove everything from <div id="anotherDiv"> up to and including its closing </div>. How do I do that?

With native DOM
$dom = new DOMDocument;
$dom->loadHTML($htmlString);
$xPath = new DOMXPath($dom);
$nodes = $xPath->query('//*[@id="anotherDiv"]');
if($nodes->item(0)) {
$nodes->item(0)->parentNode->removeChild($nodes->item(0));
}
echo $dom->saveHTML();

You can use preg_replace(), although this pattern only removes the opening tag, not the element's contents:
$string = preg_replace('/<div id="someid"[^>]*>/i', "", $string);

Using the native XML Manipulation Library
Assuming that your html content is stored in the variable $html:
$html='<html>
<body>
bla bla bla bla
<div id="myDiv">
more text
<div id="anotherDiv">
And even more text
</div>
</div>
bla bla bla
</body>
</html>';
To delete the tag by ID use the following code:
$dom=new DOMDocument;
$dom->validateOnParse = false;
$dom->loadHTML( $html );
// get the tag
$div = $dom->getElementById('anotherDiv');
// delete the tag
if( $div && $div->nodeType==XML_ELEMENT_NODE ){
$div->parentNode->removeChild( $div );
}
echo $dom->saveHTML();
Note that certain versions of libxml require a doctype to be present in order to use the getElementById method.
In that case you can prepend $html with <!doctype>
$html = '<!doctype>' . $html;
Alternatively, as suggested by Gordon's answer, you can use DOMXPath to find the element using the xpath:
$dom=new DOMDocument;
$dom->validateOnParse = false;
$dom->loadHTML( $html );
$xp=new DOMXPath( $dom );
$col = $xp->query( '//div[ @id="anotherDiv" ]' );
if( !empty( $col ) ){
foreach( $col as $node ){
$node->parentNode->removeChild( $node );
}
}
echo $dom->saveHTML();
The first method works regardless of the tag. If you want to use the second method with the same id but a different tag, say form, simply replace //div in //div[ @id="anotherDiv" ] with //form.

strip_tags() function is what you are looking for.
http://us.php.net/manual/en/function.strip-tags.php
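One caveat, shown as a small sketch: strip_tags() removes the tags themselves but keeps the text inside them, and it cannot target an element by its id, so on its own it may not do what the question asks. The shortened $html string below is an assumption based on the question's markup.

```php
<?php
// Shortened version of the markup from the question.
$html = '<div id="myDiv">more text <div id="anotherDiv">And even more text</div></div>';

// strip_tags() drops the markup only; the text inside every tag survives.
$stripped = strip_tags($html);
echo $stripped; // more text And even more text
```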

I wrote these to strip specific tags and attributes. Since they're regex they're not 100% guaranteed to work in all cases, but it was a fair tradeoff for me:
// Strips only the given tags in the given HTML string.
function strip_tags_blacklist($html, $tags) {
foreach ($tags as $tag) {
$regex = '#<\s*' . $tag . '[^>]*>.*?<\s*/\s*'. $tag . '>#msi';
$html = preg_replace($regex, '', $html);
}
return $html;
}
// Strips the given attributes found in the given HTML string.
function strip_attributes($html, $atts) {
foreach ($atts as $att) {
$regex = '#\b' . $att . '\b(\s*=\s*[\'"][^\'"]*[\'"])?(?=[^<]*>)#msi';
$html = preg_replace($regex, '', $html);
}
return $html;
}

How about this?
// Strips only the given tags in the given HTML string.
function strip_tags_blacklist($html, $tags) {
$html = preg_replace('/<'. $tags .'\b[^>]*>(.*?)<\/'. $tags .'>/is', "", $html);
return $html;
}

Following RafaSashi's answer using preg_replace(), here's a version that works for a single tag or an array of tags:
/**
* @param $str string
* @param $tags string | array
* @return string
*/
function strip_specific_tags ($str, $tags) {
if (!is_array($tags)) { $tags = array($tags); }
foreach ($tags as $tag) {
$_str = preg_replace('/<\/' . $tag . '>/i', '', $str);
if ($_str != $str) {
$str = preg_replace('/<' . $tag . '[^>]*>/i', '', $_str);
}
}
return $str;
}
