Using PHP preg_replace to append text to a pattern found with regex

I want to wrap every img tag in a div, that is, add an opening div before it and a closing div after it.
So I have
<img src=%random url image% />
And it should be replaced with
<div class="demo"><img src=%random url image% /></div>
Can I do it with preg_replace?
$string = %page source code%;
$find = array("/<img(.*?)\/>/");
$replace = array('<div class="demo">'.$find[0].'</div>');
$result = preg_replace($find, $replace, $string);
But it doesn't work.

A better way to parse HTML is to use PHP's DOMDocument and DOMXPath classes. In your case, you can use XPath to find all the images, then wrap each one in a div, as shown in this example:
$html = '<div><img src="http://x.com" /><span>xyz</span><a href="http://example.com"><img src="http://example.com" /></a></div>';
$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);
$images = $xpath->query('//img');
foreach ($images as $image) {
    $div = $doc->createElement('div');
    $div->setAttribute('class', 'demo');
    $image->parentNode->replaceChild($div, $image);
    $div->appendChild($image);
}
echo $doc->saveHTML();
Output:
<div>
<div class="demo"><img src="http://x.com"></div>
<span>xyz</span>
<a href="http://example.com">
<div class="demo"><img src="http://example.com"></div>
</a>
</div>
Demo on 3v4l.org
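As for the original preg_replace attempt: $find[0] is the pattern string itself, not the text that matched, so the replacement ends up containing the literal pattern. A backreference such as $0 refers to the whole match instead. The following is only a minimal sketch, assuming simple, well-formed input where every img tag is self-closed with />; the sample string and the tightened pattern are illustrative, and the DOM approach above remains the more robust option.

```php
<?php
// Hypothetical sample input, for illustration only.
$string = '<p><img src="a.png" /> and <img src="b.png" /></p>';

// $0 in the replacement stands for the entire matched <img ... /> tag.
$result = preg_replace(
    '/<img\b[^>]*\/>/i',
    '<div class="demo">$0</div>',
    $string
);

echo $result;
// <p><div class="demo"><img src="a.png" /></div> and <div class="demo"><img src="b.png" /></div></p>
```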

Related

Replace all title attributes in an html document

I have html code in a variable. For example $html equals:
<div title="Cool stuff" alt="Cool stuff"><a title="another stuff">......</a></div>
I need to replace the content of all title attributes (title="Cool stuff", title="another stuff" and so on) with title="$newTitle".
Is there any non-regex way to do this?
And if I have to use regex, is there a better (performance-wise) and/or more elegant solution than what I came up with?
$html = '...';
$newTitle = 'My new title';
$matches = [];
preg_match_all(
'/title=(\"|\')([^\"\']{1,})(\"|\')/',
$html,
$matches
);
$attributeTitleValues = $matches[2];
foreach ($attributeTitleValues as $title) {
    $html = str_replace("title='{$title}'", "title='{$newTitle}'", $html);
    $html = str_replace("title=\"{$title}\"", "title=\"{$newTitle}\"", $html);
}
Definitely don't use regex -- it is a dirty rabbit hole....the hole is dirty, not the rabbit :)
I prefer to use DOMDocument and XPath to directly target all title attributes of all elements in your HTML document.
The LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD flags are in place to prevent your output being garnished with a doctype declaration and <html> tags.
The // in the XPath expression means: go to any depth in search of matches.
Code: (Demo)
$html = <<<HTML
<div title="Cool stuff" alt="Cool stuff"><a title="another stuff">......</a></div>
HTML;
$newTitle = 'My new title';
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//@title') as $attr) {
    $attr->value = $newTitle;
}
echo $dom->saveHTML();
Output:
<div title="My new title" alt="Cool stuff"><a title="My new title">......</a></div>
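To see what the two flags change, it can help to serialize the same markup with and without them. This is a small sketch rather than a definitive reference: the variable names are made up for the comparison, and the exact wrapper markup may vary between libxml versions, so the checks below only look for the <html> tag. NOIMPLIED also tends to behave best when the markup has a single root element, as it does here.

```php
<?php
$html = '<div title="Cool stuff"><a title="another stuff">......</a></div>';

// Without flags: the parser adds the implied doctype, <html> and <body>.
$plain = new DOMDocument();
$plain->loadHTML($html);
$withoutFlags = $plain->saveHTML();

// With flags: only the markup that was passed in is serialized.
$flagged = new DOMDocument();
$flagged->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$withFlags = $flagged->saveHTML();

var_dump(strpos($withoutFlags, '<html>') !== false); // wrapper present
var_dump(strpos($withFlags, '<html>') !== false);    // wrapper absent
```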

Get img src value inside a document.write PHP

I need to get the image src value from the following code using PHP XPath & node.
Sample HTML
<div class=\"thumb-inside\">
<div class=\"thumb\">
<script>document.write(thumbs.replaceThumbUrl('<img src=\".....\" />'));</script>
</div>
</div>
I tried like this:
$node = $xpath->query("div[@class='thumb-inside']/div[@class='thumb']/a/img/attribute::src", $e);
$th = $node->item(0)->nodeValue;
I got it working with the following code, but I don't know whether it is correct.
$string = str_replace("document.write(thumbs.replaceThumbUrl(","",$string);
$string = str_replace("'));","",$string);
$pattern = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#';
preg_match_all($pattern, $string, $matches, PREG_PATTERN_ORDER);
$th = $matches[0][0];
You can use DOMDocument in PHP as shown below to get the image source.
$html = file_get_contents('file_path');
$doc = new \DOMDocument();
// @ suppresses warnings about malformed markup
@$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
    echo $tag->getAttribute('src');
}
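One caveat: in the sample HTML the <img> sits inside a <script> element, and as far as I know the HTML parser treats script content as plain text rather than as elements, so getElementsByTagName('img') may not find it. A possible workaround, sketched here under that assumption, is to select the script node with XPath and pull the src out of its text with a small regex. The image URL is a placeholder, and the sample uses unescaped quotes for readability.

```php
<?php
// Hypothetical sample; the image URL is a placeholder.
$html = <<<'HTML'
<div class="thumb-inside">
<div class="thumb">
<script>document.write(thumbs.replaceThumbUrl('<img src="http://example.com/thumb.jpg" />'));</script>
</div>
</div>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$sources = [];
foreach ($xpath->query("//div[@class='thumb-inside']/div[@class='thumb']/script") as $script) {
    // The script body is text, so search it for src="..." inside an <img ...>.
    if (preg_match_all('/<img[^>]*\ssrc=["\']([^"\']+)["\']/i', $script->textContent, $m)) {
        $sources = array_merge($sources, $m[1]);
    }
}

print_r($sources);
```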

Wrap <img> elements in <div> but allow for <a> tags

I have a function that scans for img tags in a string using DOMDocument and wraps them in a div.
$str = 'string containing HTML';
$doc = new DOMDocument();
$doc->loadHtml(mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8'));
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
    $div = $doc->createElement('div');
    $tag->parentNode->insertBefore($div, $tag);
    $div->appendChild($tag);
}
return $str;
However, when an img is wrapped in <a> tags, the <a> tags are removed and 'replaced' with the div. How can I keep the <a> tags?
Currently,
<a><img src="srctoimg"/></a>
results in:
<div><img src="srctoimg"/></div>
rather than:
<div><a><img src="srctoimg"/></a></div>
Is there a 'wildcard' I can pass as the second argument to insertBefore(), or how else can I achieve this?
You can do this with an XPath query.
'//*[img/parent::a or (self::img and not(parent::a))]'
This will get the parent for any img tag that has an a parent, as well as any image tag itself for any img tag that does not have an immediate a parent.
This way you don't have to change the code within your loop.
$str = <<<EOS
<html>
<body>
Image with link:
<a href="http://google.com">
<img src="srctoimg"/>
</a>
Image without link:
<img src="srctoimg"/>
</body>
</html>
EOS;
$doc = new DOMDocument();
$doc->loadHtml(mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8'));
$xpath = new DOMXPath($doc);
$tags = $xpath->query(
'//*[img/parent::a or (self::img and not(parent::a))]'
);
foreach ($tags as $tag) {
    $div = $doc->createElement('div');
    $tag->parentNode->insertBefore($div, $tag);
    $div->appendChild($tag);
}
echo $doc->saveHTML();
Output (indented for clarity):
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
Image with link:
<div>
<a href="http://google.com">
<img src="srctoimg">
</a>
</div>
Image without link:
<div>
<img src="srctoimg">
</div>
</body>
</html>
Try this inside your foreach, after creating $div:
$parent = $tag->parentNode;
if ($parent->tagName == 'a') {
    // The image is inside a link: wrap the whole <a>
    $parent->parentNode->insertBefore($div, $parent);
    $div->appendChild($parent);
} else {
    // No link: wrap the <img> itself
    $tag->parentNode->insertBefore($div, $tag);
    $div->appendChild($tag);
}
It might be simplest to just use an if clause:
foreach ($tags as $tag) {
    $div = $doc->createElement('div');
    $x = $tag->parentNode;
    // Parent node is not 'a': insert before <img> and wrap the <img>
    if ($x->nodeName != 'a') {
        $x->insertBefore($div, $tag);
        $div->appendChild($tag);
    }
    // Parent node is 'a': insert before <a> and wrap the <a>
    else {
        $x->parentNode->insertBefore($div, $x);
        $div->appendChild($x);
    }
}
Using XPath:
$str = 'string containing HTML';
$doc = new DOMDocument();
$doc->loadHtml(mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8'));
$xpath = new DOMXPath($doc);
$tags0 = $xpath->query('//a/img'); // get all <img> in <a> tag
$tags1 = $xpath->query('//img[not(parent::a)]'); // get all <img> except those with parent <a> tag
// DOMNodeList is not an array, so convert both lists before merging
$tags = array_merge(iterator_to_array($tags0), iterator_to_array($tags1));
foreach ($tags as $tag) {
    $div = $doc->createElement('div');
    $tag->parentNode->insertBefore($div, $tag);
    $div->appendChild($tag);
}
return $doc->saveHTML(); // return the modified markup, not the original string
Here is a solution using jQuery, although I know the question asks about PHP.
<a><img src="srctoimg"/></a>
<script src="http://code.jquery.com/jquery-1.11.2.min.js"></script>
<script>
$('a').replaceWith(function(){
return $('<div/>', {
html: this.innerHTML
})
})
</script>

How to use preg_match_all to remove the <a> tag

$content = preg_replace("~(<a href=\"(.*)\">\w+)~iU", '', $content);
$ok = preg_replace("~(</a>)~iU", '', $content);
echo $ok;
I need to clean up $content.
I want to remove every link in $content.
Even when a link wraps an <img xxxx>, only the <a> tag should be removed and the <img xxx> kept.
How can I do this?
Do I need to change the regex?
Why does it only remove the first one?
You can replace the anchors with their contents using DOMDocument:
$html = <<<'EOS'
<img src="http://example.com">
<img src="http://example.com">
EOS;
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//a') as $anchor) {
    $fragment = $doc->createDocumentFragment();
    // collect the child nodes; childNodes is a live list, so keep taking
    // the first child until none remain rather than iterating with foreach
    while ($anchor->firstChild) {
        $fragment->appendChild($anchor->firstChild);
    }
    // replace anchor with all child nodes
    $anchor->parentNode->replaceChild($fragment, $anchor);
}
echo $doc->saveHTML();
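A self-contained variation of the same idea, as a rough sketch: instead of building a fragment, each child is moved in front of its anchor and the emptied anchor is then removed. The sample markup and its href values are hypothetical, and the two LIBXML flags are only there to keep the output free of the doctype and <html> wrapper.

```php
<?php
// Hypothetical sample input; the href values are placeholders.
$html = '<p><a href="http://example.com"><img src="http://example.com/a.png"> caption</a>'
      . ' and <a href="#">text</a></p>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);

// Copy the result list to an array so removing anchors cannot disturb the loop.
foreach (iterator_to_array($xpath->query('//a')) as $anchor) {
    // Move every child in front of the anchor, preserving order.
    while ($anchor->firstChild !== null) {
        $anchor->parentNode->insertBefore($anchor->firstChild, $anchor);
    }
    // The anchor is now empty and can be dropped.
    $anchor->parentNode->removeChild($anchor);
}

$result = trim($doc->saveHTML());
echo $result;
```

Images and text that were inside links stay in place; only the <a> elements themselves disappear.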

Get href value from matching anchor text

I'm pretty new to the DOMDocument class and can't seem to find an answer for what I'm trying to do.
I have a large HTML file and I want to grab the link from an element based on its anchor text.
So, for example:
$html = <<<HTML
<div class="main">
<a href="http://link.com"><span><font><img src="http://images.com/spacer.gif"/>Keyword</font></span></a>
other text
</div>
HTML;
// domdocument
$doc = new DOMDocument();
$doc->loadHTML($html);
I want to get the value of the href attribute of any element that has the text "Keyword". I hope that is clear.
$html = <<<HTML
<div class="main">
<a href="http://link.com"><span><font><img src="http://images.com/spacer.gif"/>Keyword</font></span></a>
other text
</div>
HTML;
$keyword = "Keyword";
// domdocument
$doc = new DOMDocument();
$doc->loadHTML($html);
$as = $doc->getElementsByTagName('a');
foreach ($as as $a) {
    if ($a->nodeValue === $keyword) {
        echo $a->getAttribute('href'); // prints "http://link.com"
        break;
    }
}
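The same lookup can also be expressed as a single XPath query, which may be convenient when the anchor text carries stray whitespace. This is a sketch under a few assumptions: the markup is a simplified, hypothetical version of the sample, the keyword contains no double quotes (it is concatenated into the expression), and normalize-space() is used to trim and collapse whitespace before comparing.

```php
<?php
// Hypothetical, simplified markup for illustration.
$html = '<div class="main"><a href="http://link.com"><span>'
      . '<img src="http://images.com/spacer.gif"> Keyword </span></a> other text</div>';
$keyword = 'Keyword'; // assumed to contain no double quotes

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);

// Select the href attribute of every <a> whose trimmed text equals the keyword.
$hrefs = [];
foreach ($xpath->query('//a[normalize-space(.) = "' . $keyword . '"]/@href') as $attr) {
    $hrefs[] = $attr->value;
}

print_r($hrefs);
```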
