Replace all title attributes in an html document - php

I have html code in a variable. For example $html equals:
<div title="Cool stuff" alt="Cool stuff"><a title="another stuff">......</a></div>
I need to replace the content of every title attribute (title="Cool stuff", title="another stuff", and so on) with title="$newTitle".
Is there any non-regex way to do this?
And if I have to use regex, is there a better (performance-wise) and/or more elegant solution than what I came up with?
$html = '...';
$newTitle = 'My new title';
$matches = [];
preg_match_all(
'/title=(\"|\')([^\"\']{1,})(\"|\')/',
$html,
$matches
);
$attributeTitleValues = $matches[2];
foreach ($attributeTitleValues as $title)
{
$html = str_replace("title='{$title}'", "title='{$newTitle}'", $html);
$html = str_replace("title=\"{$title}\"", "title=\"{$newTitle}\"", $html);
}

Definitely don't use regex -- it is a dirty rabbit hole....the hole is dirty, not the rabbit :)
I prefer to use DOMDocument and XPath to directly target all title attributes of all elements in your html document.
LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD flags are in place to prevent your output being garnished with <doctype> and <html> tags.
// in the XPath expression says: go to any depth in search of matches
Code: (Demo)
$html = <<<HTML
<div title="Cool stuff" alt="Cool stuff"><a title="another stuff">......</a></div>
HTML;
$newTitle = 'My new title';
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//@title') as $attr) {
$attr->value = $newTitle;
}
echo $dom->saveHTML();
Output:
<div title="My new title" alt="Cool stuff"><a title="My new title">......</a></div>
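The same approach can be packaged as a small function. This is only a sketch: the name replaceTitleAttributes is hypothetical, and it assumes the input has a single root element, which the LIBXML_HTML_NOIMPLIED flag needs in order to behave predictably.

```php
<?php
// Hypothetical helper (the name is not from the answer above).
// Assumes $html has exactly one root element.
function replaceTitleAttributes(string $html, string $newTitle): string
{
    $dom = new DOMDocument();
    // These flags stop libxml from adding doctype/html/body wrappers.
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($dom);
    // //@title selects every title attribute node at any depth.
    foreach ($xpath->query('//@title') as $attr) {
        $attr->value = $newTitle;
    }

    // saveHTML() ends its output with a newline, so trim it off.
    return trim($dom->saveHTML());
}

$result = replaceTitleAttributes(
    '<div title="Cool stuff" alt="Cool stuff"><a title="another stuff">......</a></div>',
    'My new title'
);
echo $result, PHP_EOL;
```

The alt attribute is left alone because the XPath expression only matches attributes named title.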

Related

How does one strip tags (and their content) from an HTML string using PHP's DOMDocument?

I'd like to remove all links and their contents from an HTML string.
So this ...
LINK1 and <i>also</i> LINK2 ... should become this: and <i>also</i>
The following ...
$html = 'LINK1 - and <i>also</i> LINK2';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = false;
$dom->resolveExternals = false;
$dom->substituteEntities = false;
$dom->loadHTML( $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
$list = $dom->getElementsByTagName('a');
while ($list->length > 0) {
$p = $list->item(0);
$p->parentNode->removeChild($p);
}
$html_new = $dom->saveHTML();
echo htmlentities($html);
echo '<br><br><hr><br>';
echo htmlentities($html_new);
... does not work unless I wrap $html in a <div>, but then I have <div> and <i>also</i> </div>. I could use substr to strip the first 5 and last 6 characters off the result, but that's just stupid, and my face is already too sore from all the face-palming I've endured trying to figure out the above.
Any advice on how to strip all tags out of a string without using regex, or resorting to facepalmy hacks?
Based on Niet the Dark Absol's comment, my solution was to simply wrap my code snippet in a div, and then use substr to remove it. Seems like an acceptable workaround for working with valid inline HTML snippets (and not the entire DOM) via DOMDocument.
$html = 'LINK1 - and <i>also</i> LINK2';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = false;
$dom->resolveExternals = false;
$dom->substituteEntities = false;
$dom->loadHTML( '<div>'.$html.'</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
$list = $dom->getElementsByTagName('a');
while ($list->length > 0) {
$p = $list->item(0);
$p->parentNode->removeChild($p);
}
// saveHTML() appends a trailing newline, so trim before slicing off <div> and </div>
$result = substr(trim($dom->saveHTML()), 5, -6);
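An alternative to slicing the wrapper off with substr is to serialize only the wrapper's children, since saveHTML() accepts a node argument. A minimal sketch follows; the sample input with its href values is an assumption, because the exact markup of the links is not shown above.

```php
<?php
// Assumed sample input: the href values are placeholders.
$html = '<a href="#one">LINK1</a> - and <i>also</i> <a href="#two">LINK2</a>';

$dom = new DOMDocument();
// Wrap the snippet so the parser sees a single root element.
$dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// getElementsByTagName() returns a live list, so keep removing item(0)
// until the list is empty.
$list = $dom->getElementsByTagName('a');
while ($list->length > 0) {
    $a = $list->item(0);
    $a->parentNode->removeChild($a);
}

// Serialize each child of the wrapper instead of the wrapper itself.
$wrapper = $dom->documentElement;
$result = '';
foreach ($wrapper->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}

echo $result, PHP_EOL;
```

This avoids hard-coding character offsets, so it keeps working if the wrapper element or its attributes change.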

Using PHP preg_replace to append text to pattern found with regex

I want to wrap every img tag in a div.
So I have
<img src=%random url image% />
And it should be replaced with
<div class="demo"><img src=%random url image% /></div>
Can I do it with preg_replace?
$string = %page source code%;
$find = array("/<img(.*?)\/>/");
$replace = array('<div class="demo">'.$find[0].'</div>');
$result = preg_replace($find, $replace, $string);
But it doesn't work :/
A better way to parse HTML is using PHP's DOMDocument and DOMXPath classes. In your case, you can use XPath to find all the images, then add a div around them as shown in this example:
$html = '<div><img src="http://x.com" /><span>xyz</span><a href="http://example.com"><img src="http://example.com" /></a></div>';
$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXpath($doc);
$images = $xpath->query('//img');
foreach ($images as $image) {
$div = $doc->createElement('div');
$div->setAttribute('class', 'demo');
$image->parentNode->replaceChild($div, $image);
$div->appendChild($image);
}
echo $doc->saveHTML();
Output:
<div>
<div class="demo"><img src="http://x.com"></div>
<span>xyz</span>
<a href="http://example.com">
<div class="demo"><img src="http://example.com"></div>
</a>
</div>
Demo on 3v4l.org
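For comparison, the preg_replace attempt in the question fails because $find[0] is the pattern string itself, not the matched tag. The matched text is available in the replacement as $0. A minimal sketch of that fix, with a made-up sample string; it shares the usual regex limitations (for example, a > inside an attribute value breaks it), which is why the DOM approach above is preferable:

```php
<?php
// Sample input (an assumption for illustration).
$string = '<p><img src="a.png" /></p>';

// $0 in the replacement stands for the whole matched <img ...> tag.
$result = preg_replace('/<img\b[^>]*>/i', '<div class="demo">$0</div>', $string);

echo $result, PHP_EOL;
```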

how to use preg_match_all to remove the <a> tag

$content = preg_replace("~(<a href=\"(.*)\">\w+)~iU", '', $content);
$ok = preg_replace("~(</a>)~iU", '', $content);
echo $ok;
I need to clean up $content.
I want to remove every link in $content.
Even when a link wraps an <img xxxx>, I want to remove the <a> tag and keep only the <img xxx>.
How can I do this?
Do I need to edit the regex?
Why does it only delete the first one?
You can replace the anchors with their contents using DOMDocument:
$html = <<<'EOS'
<img src="http://example.com">
<img src="http://example.com">
EOS;
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//a') as $anchor) {
$fragment = $doc->createDocumentFragment();
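// move the child nodes into the fragment; childNodes is a live list,
// so drain it via firstChild instead of foreach, which would skip nodes
while ($anchor->firstChild) {
$fragment->appendChild($anchor->firstChild);
}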
// replace anchor with all child nodes
$anchor->parentNode->replaceChild($fragment, $anchor);
}
echo $doc->saveHTML();
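Here is a self-contained sketch of the same unwrapping idea on a sample input. The markup is an assumption chosen so that the anchor has more than one child, which is the case where moving nodes out of a live childNodes list needs care.

```php
<?php
// Assumed sample input: an anchor wrapping two child elements.
$html = '<p><a href="http://example.com"><img src="http://example.com/a.png"><b>caption</b></a></p>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($doc);
// The XPath result is a static snapshot, so replacing nodes while
// iterating over it is safe.
foreach ($xpath->query('//a') as $anchor) {
    $fragment = $doc->createDocumentFragment();
    // Appending a node moves it, so firstChild advances on every pass.
    while ($anchor->firstChild) {
        $fragment->appendChild($anchor->firstChild);
    }
    // Swap the anchor for the nodes it used to contain.
    $anchor->parentNode->replaceChild($fragment, $anchor);
}

$result = trim($doc->saveHTML());
echo $result, PHP_EOL;
```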

Modify html attribute with php

I have a html string that contains exactly one a-element in it. Example:
test
In php I have to test if rel contains external and if yes, then modify href and save the string.
I have looked for DOM nodes and objects. But they seem to be too much for only one A-element, as I have to iterate to get html nodes and I am not sure how to test if rel exists and contains external.
$html = new DOMDocument();
$html->loadHtml($txt);
$a = $html->getElementsByTagName('a');
$attr = $a->item(0)->attributes;
...
At this point I am going to get a DOMNamedNodeMap, which seems to be overhead. Is there any simpler way for this or should I do it with DOM?
Is there any simpler way for this or should I do it with DOM?
Do it with DOM.
Here's an example:
<?php
$html = 'test';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]");
foreach($nodes as $node) {
$node->setAttribute('href', 'http://example.org');
}
echo $dom->saveHTML();
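To see what the concat/normalize-space idiom buys you, here is a sketch on two sample links. The function name rewriteExternalLinks and the input markup are assumptions for illustration. Padding the rel value with spaces means only the whole token external matches, not a longer word that merely contains it.

```php
<?php
// Hypothetical helper; assumes $html has a single root element.
function rewriteExternalLinks(string $html, string $newHref): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($dom);
    // Pad rel with spaces so ' external ' matches whole tokens only.
    $query = "//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]";
    foreach ($xpath->query($query) as $node) {
        $node->setAttribute('href', $newHref);
    }

    return trim($dom->saveHTML());
}

// rel contains the token "external": href is rewritten.
$matched = rewriteExternalLinks(
    '<a href="http://old.example/" rel="nofollow external">test</a>',
    'http://example.org'
);

// rel contains "externally", which is a different token: href is kept.
$skipped = rewriteExternalLinks(
    '<a href="http://old.example/" rel="externally">test</a>',
    'http://example.org'
);

echo $matched, PHP_EOL, $skipped, PHP_EOL;
```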
I went ahead and did the modification with DOM. This is what I ended up with:
$html = new DOMDocument();
$html->loadHtml('<?xml encoding="utf-8" ?>' . $txt);
$nodes = $html->getElementsByTagName('a');
foreach ($nodes as $node) {
foreach ($node->attributes as $att) {
if ($att->name == 'rel') {
if (strpos($att->value, 'external') !== false) {
$node->setAttribute('href','modified_url_goes_here');
}
}
}
}
$txt = $html->saveHTML();
I did not want to load any other library for just this one string.
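One thing to watch with a strpos-based check on rel: strpos returns the integer 0 when the needle sits at the very start of the string, and 0 is falsy, so the result has to be compared against false with !==. The sketch below uses a made-up rel value to show the difference, plus a stricter whole-token check as an optional variant.

```php
<?php
// Sample rel value (an assumption) where "external" is the first token.
$rel = 'external nofollow';

// Loose check: strpos() returns 0 here, which casts to false.
$loose = (bool) strpos($rel, 'external');

// Strict check: only a real "not found" result is false.
$strict = strpos($rel, 'external') !== false;

// Stricter variant: split rel into tokens and look for an exact match.
$tokens = preg_split('/\s+/', trim($rel), -1, PREG_SPLIT_NO_EMPTY);
$hasExternal = in_array('external', $tokens, true);

var_dump($loose, $strict, $hasExternal); // bool(false), bool(true), bool(true)
```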
The best way is to use a HTML parser/DOM, but here's a regex solution:
$html = 'test<br>
<p> Some text</p>
test2<br>
<a rel="external">test3</a> <-- This won\'t work since there is no href in it.
';
$new = preg_replace_callback('/<a.+?rel\s*=\s*"([^"]*)"[^>]*>/i', function($m){
if(strpos($m[1], 'external') !== false){
$m[0] = preg_replace('/href\s*=\s*(("[^"]*")|(\'[^\']*\'))/i', 'href="http://example.com"', $m[0]);
}
return $m[0];
}, $html);
echo $new;
Online demo.
You could use a regular expression like
if it matches /\s+rel\s*=\s*".*external.*"/
then do a regExp replace like
/(<a.*href\s*=\s*")([^"]*)("[^>]*>)/\1[your new href here]\3/
Though using a library that can do this kind of stuff for you is much easier (like jQuery for JavaScript)

PHP Manipulating HTML from string

I'm reading in an HTML string from a text editor and need to manipulate some of the elements before saving it to the DB.
What I have is something like this:
<h3>Some Text<img src="somelink.jpg" /></h3>
or
<h3><img src="somelink.jpg" />Some Text</h3>
and I need to put it into the following format
<h3>Some Text</h3><div class="img_wrapper"><img src="somelink.jpg" /></div>
This is the solution that I came up with.
$html = '<html><body>' . $field["data"][0] . '</body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$domNodeList = $dom->getElementsByTagName("img");
// Move img tags out of the h3 and place them right after the h3 tag
foreach ($domNodeList as $domNode) {
if ($domNode->parentNode->nodeName == "h3") {
$parentNode = $domNode->parentNode;
$parentParentNode = $parentNode->parentNode;
$parentParentNode->insertBefore($domNode, $parentNode->nextSibling);
}
}
echo $dom->saveHtml();
You may be looking for a preg_replace
// take a search pattern, wrap the image tag matching parts in a tag
// and put the start and ending parts before the wrapped image tag.
// note: this will not match tags that contain > characters within them,
// and will only handle a single image tag
$output = preg_replace(
'|(<h3>[^<]*)(<img [^>]+>)([^<]*</h3>)|',
'$1$3<div class="img_wrapper">$2</div>',
$input
);
I updated the question with the answer, but for good measure, here it is again in the answers section.
$html = '<html><body>' . $field["data"][0] . '</body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$domNodeList = $dom->getElementsByTagName("img");
// Move img tags out of the h3 and place them right after the h3 tag
foreach ($domNodeList as $domNode) {
if ($domNode->parentNode->nodeName == "h3") {
$parentNode = $domNode->parentNode;
$parentParentNode = $parentNode->parentNode;
$parentParentNode->insertBefore($domNode, $parentNode->nextSibling);
}
}
echo $dom->saveHtml();
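The code above moves the image out of the h3 but does not add the div.img_wrapper shown in the target format. A minimal sketch of one way to add it, using a literal sample string in place of $field["data"][0] (an assumption), and serializing only the body's children so the html/body wrapper is not part of the result:

```php
<?php
// Sample input standing in for $field["data"][0] (an assumption).
$input = '<h3>Some Text<img src="somelink.jpg" /></h3>';

$dom = new DOMDocument();
$dom->loadHTML('<html><body>' . $input . '</body></html>');

// Take a snapshot of the live node list before moving nodes around.
$images = iterator_to_array($dom->getElementsByTagName('img'));
foreach ($images as $img) {
    $h3 = $img->parentNode;
    if ($h3->nodeName !== 'h3') {
        continue;
    }
    $wrapper = $dom->createElement('div');
    $wrapper->setAttribute('class', 'img_wrapper');
    // appendChild() moves the image out of the h3 and into the wrapper.
    $wrapper->appendChild($img);
    // Insert the wrapper right after the h3 (appends if there is no next sibling).
    $h3->parentNode->insertBefore($wrapper, $h3->nextSibling);
}

// Serialize only the children of <body>.
$body = $dom->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}

echo $result, PHP_EOL;
```

The serializer may insert line breaks between block-level elements, so compare the result structurally rather than byte for byte.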
