I have html code in a variable. For example $html equals:
<div title="Cool stuff" alt="Cool stuff"><a title="another stuff">......</a></div>
I need to replace the content of all title attributes (title="Cool stuff", title="another stuff", and so on) with title="$newTitle".
Is there any non-regex way to do this?
And if I have to use regex, is there a better (performance-wise) and/or more elegant solution than what I came up with?
$html = '...';
$newTitle = 'My new title';
$matches = [];
preg_match_all(
'/title=(\"|\')([^\"\']{1,})(\"|\')/',
$html,
$matches
);
$attributeTitleValues = $matches[2];
foreach ($attributeTitleValues as $title)
{
$html = str_replace("title='{$title}'", "title='{$newTitle}'", $html);
$html = str_replace("title=\"{$title}\"", "title=\"{$newTitle}\"", $html);
}
Definitely don't use regex -- it is a dirty rabbit hole....the hole is dirty, not the rabbit :)
I prefer to use DOMDocument and XPath to directly target the title attributes of all elements in your HTML document.
The LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD flags are in place to prevent your output from being garnished with <!DOCTYPE> and <html> tags.
// in the XPath expression says: go to any depth in search of matches
Code:
$html = <<<HTML
<div title="Cool stuff" alt="Cool stuff"><a title="another stuff">......</a></div>
HTML;
$newTitle = 'My new title';
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//@title') as $attr) {
$attr->value = $newTitle;
}
echo $dom->saveHTML();
Output:
<div title="My new title" alt="Cool stuff"><a title="My new title">......</a></div>
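As for the regex half of the question: if a regex really is unavoidable, the preg_match_all plus str_replace loop could probably be collapsed into a single pass. The following is only a sketch; replaceTitleAttributes is a hypothetical helper name, and the pattern assumes quoted attribute values with no whitespace around the equals sign. It will also touch things like data-title="..." or the literal text title="..." outside of tags, which is exactly why the DOM approach is safer.

```php
<?php
// Hypothetical helper (not from the original post): rewrite every
// title="..." / title='...' value in one pass.
function replaceTitleAttributes(string $html, string $newTitle): string
{
    // Escape the new value so quotes or ampersands cannot break the attribute.
    $escaped = htmlspecialchars($newTitle, ENT_QUOTES);

    $result = preg_replace_callback(
        // (["\']) captures the opening quote, \1 requires the same closing quote.
        '/\btitle=(["\'])(.*?)\1/s',
        function (array $m) use ($escaped): string {
            return 'title=' . $m[1] . $escaped . $m[1];
        },
        $html
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$html = '<div title="Cool stuff" alt="Cool stuff"><a title=\'another stuff\'>......</a></div>';
$result = replaceTitleAttributes($html, 'My new title');
echo $result, PHP_EOL;
```

Using a callback instead of a plain replacement string means a $ or backslash in the new title is not interpreted as a backreference.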
Related
I'd like to remove all links and their contents from an HTML string.
So this ...
LINK1 and <i>also</i> LINK2 ... should become this: and <i>also</i>
The following ...
$html = 'LINK1 - and <i>also</i> LINK2';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = false;
$dom->resolveExternals = false;
$dom->substituteEntities = false;
$dom->loadHTML( $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
$list = $dom->getElementsByTagName('a');
while ($list->length > 0) {
$p = $list->item(0);
$p->parentNode->removeChild($p);
}
$html_new = $dom->saveHTML();
echo htmlentities($html);
echo '<br><br><hr><br>';
echo htmlentities($html_new);
... does not work unless I wrap $html in a <div>, but then I have <div> and <i>also</i> </div>. I could use substr to strip the first 5 and last 6 characters off the result, but that's just stupid, and my face is already too sore from all the face-palming I've endured trying to figure out the above.
Any advice on how to strip all tags out of a string without using regex, or resorting to facepalmy hacks?
Based on Niet the Dark Absol's comment, my solution was to simply wrap my code snippet in a div, and then use substr to remove it. Seems like an acceptable workaround for working with valid inline HTML snippets (and not the entire DOM) via DOMDocument.
$html = 'LINK1 - and <i>also</i> LINK2';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = false;
$dom->resolveExternals = false;
$dom->substituteEntities = false;
$dom->loadHTML( '<div>'.$html.'</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
$list = $dom->getElementsByTagName('a');
while ($list->length > 0) {
$p = $list->item(0);
$p->parentNode->removeChild($p);
}
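// saveHTML() ends its output with a newline, so trim before cutting off "<div>" (5 chars) and "</div>" (6 chars)
$result = substr(trim($dom->saveHTML()), 5, -6);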
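A possible alternative to counting characters is to serialize only the children of the wrapper, since saveHTML() accepts a node argument. This is a sketch under assumptions: removeLinks is a hypothetical helper name, and the sample anchors with href="#" are made up for illustration because the original markup is not shown.

```php
<?php
// Hypothetical helper: remove all <a> elements (and their contents) from an HTML fragment.
function removeLinks(string $fragment): string
{
    $dom = new DOMDocument();
    // Temporary wrapper so the fragment has a single root element.
    $dom->loadHTML('<div>' . $fragment . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // getElementsByTagName() returns a live list, so take a snapshot before removing nodes.
    foreach (iterator_to_array($dom->getElementsByTagName('a')) as $a) {
        $a->parentNode->removeChild($a);
    }

    // Serialize each child of the wrapper instead of the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Sample input with made-up links.
$cleaned = removeLinks('<a href="#">LINK1</a> and <i>also</i> <a href="#">LINK2</a>');
echo trim($cleaned), PHP_EOL;
```

This way the wrapper tag never appears in the result, so nothing depends on its exact length.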
I want to append a tag div before and after all tags img.
So I have
<img src=%random url image% />
And it should be replaced with
<div class="demo"><img src=%random url image% /></div>
Can I do it with preg_replace?
$string = %page source code%;
$find = array("/<img(.*?)\/>/");
$replace = array('<div class="demo">'.$find[0].'</div>');
$result = preg_replace($find, $replace, $string);
But it doesn't work :/
A better way to parse HTML is using PHP's DOMDocument and DOMXPath classes. In your case, you can use XPath to find all the images, then add a div around them as shown in this example:
$html = '<div><img src="http://x.com" /><span>xyz</span><a href="http://example.com"><img src="http://example.com" /></a></div>';
$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXpath($doc);
$images = $xpath->query('//img');
foreach ($images as $image) {
$div = $doc->createElement('div');
$div->setAttribute('class', 'demo');
$image->parentNode->replaceChild($div, $image);
$div->appendChild($image);
}
echo $doc->saveHTML();
Output:
<div>
<div class="demo"><img src="http://x.com"></div>
<span>xyz</span>
<a href="http://example.com">
<div class="demo"><img src="http://example.com"></div>
</a>
</div>
Demo on 3v4l.org
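To answer the literal question as well: the attempt fails because $find[0] is just the pattern string, not the matched text. In a preg_replace replacement the whole match is available as $0. A minimal sketch, assuming simple img tags with no > inside attribute values (the sample markup and file names are made up):

```php
<?php
// Made-up sample markup for illustration.
$string = '<p><img src="a.png" /> text <img src="b.png" /></p>';

// $0 in the replacement refers to the entire matched <img ...> tag.
$result = preg_replace('/<img\b[^>]*>/i', '<div class="demo">$0</div>', $string);

echo $result, PHP_EOL;
```

This works for simple input, but the DOM version is more robust against unusual markup.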
$content = preg_replace("~(<a href=\"(.*)\">\w+)~iU", '', $content);
$ok = preg_replace("~(</a>)~iU", '', $content);
echo $ok;
I need to clean up $content.
I want to remove all links in $content, including links around images: for a linked <img xxxx>, remove only the A tag and keep the <img xxx>.
How can I do that? Do I need to change the regex?
Why does it only delete the first one?
You can replace the anchors with their contents using DOMDocument:
$html = <<<'EOS'
<img src="http://example.com">
<img src="http://example.com">
EOS;
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//a') as $anchor) {
$fragment = $doc->createDocumentFragment();
// move the child nodes into the fragment; childNodes is a live list,
// so keep taking the first child until none remain
while ($anchor->firstChild) {
$fragment->appendChild($anchor->firstChild);
}
// replace anchor with all child nodes
$anchor->parentNode->replaceChild($fragment, $anchor);
}
echo $doc->saveHTML();
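One detail worth checking with this technique is that every child of the anchor actually ends up in the fragment. childNodes is a live list, so appending a child elsewhere removes it from the list while you walk it. The sketch below drains the anchor through firstChild instead; the sample markup is made up for illustration.

```php
<?php
// Made-up sample: an anchor with three child elements.
$doc = new DOMDocument();
$doc->loadHTML(
    '<div><a href="#"><b>one</b><i>two</i><u>three</u></a></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

$anchor = $doc->getElementsByTagName('a')->item(0);
$fragment = $doc->createDocumentFragment();

// Each appendChild() moves the node out of the anchor, so firstChild
// always points at the next remaining child.
while ($anchor->firstChild !== null) {
    $fragment->appendChild($anchor->firstChild);
}

// Swap the now-empty anchor for its former children.
$anchor->parentNode->replaceChild($fragment, $anchor);

$unwrapped = trim($doc->saveHTML());
echo $unwrapped, PHP_EOL;
```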
I have a html string that contains exactly one a-element in it. Example:
test
In PHP I have to test whether rel contains external and, if it does, modify href and save the string.
I have looked for DOM nodes and objects. But they seem to be too much for only one A-element, as I have to iterate to get html nodes and I am not sure how to test if rel exists and contains external.
$html = new DOMDocument();
$html->loadHtml($txt);
$a = $html->getElementsByTagName('a');
$attr = $a->item(0)->attributes;
...
At this point I get a DOMNamedNodeMap, which seems like overhead. Is there any simpler way to do this, or should I do it with DOM?
Is there any simpler way to do this, or should I do it with DOM?
Do it with DOM.
Here's an example:
<?php
$html = 'test';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]");
foreach($nodes as $node) {
$node->setAttribute('href', 'http://example.org');
}
echo $dom->saveHTML();
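The concat/normalize-space expression is the usual XPath 1.0 idiom for matching one whole token in a space-separated attribute, so rel="externally" is not treated as external. A small sketch of that behaviour; the URLs and rel values below are made up for illustration:

```php
<?php
// Made-up sample: the first link has the "external" token, the second only a look-alike.
$html = '<p><a href="http://old.example" rel="nofollow external">test</a> '
      . '<a href="http://keep.example" rel="externally">other</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Pad the normalized rel value with spaces, then look for " external " as a whole token.
$query = "//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]";
foreach ($xpath->query($query) as $node) {
    $node->setAttribute('href', 'http://example.org');
}

$result = trim($dom->saveHTML());
echo $result, PHP_EOL;
```

Only the first link should have its href rewritten; the second keeps its original URL.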
I kept working on it with DOM. This is what I ended up with:
$html = new DOMDocument();
$html->loadHtml('<?xml encoding="utf-8" ?>' . $txt);
$nodes = $html->getElementsByTagName('a');
foreach ($nodes as $node) {
foreach ($node->attributes as $att) {
if ($att->name == 'rel') {
if (strpos($att->value, 'external') !== false) {
$node->setAttribute('href','modified_url_goes_here');
}
}
}
}
$txt = $html->saveHTML();
I did not want to load any other library for just this one string.
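The inner loop over all attributes can probably be avoided: getAttribute() returns an empty string when the attribute is missing, so rel can be read directly. Note also that strpos() returns 0 when external is at the very start of the value, so a check without a strict comparison would miss that case. The following sketch splits rel into tokens instead; the sample anchor and its URL are made up for illustration.

```php
<?php
// Made-up sample: "external" is the first token of rel.
$txt = '<a href="http://old.example" rel="external nofollow">test</a>';

$dom = new DOMDocument();
$dom->loadHTML($txt, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

foreach ($dom->getElementsByTagName('a') as $a) {
    // Split rel on whitespace; a missing attribute yields an empty token list.
    $tokens = preg_split('/\s+/', $a->getAttribute('rel'), -1, PREG_SPLIT_NO_EMPTY);
    if (in_array('external', $tokens, true)) {
        $a->setAttribute('href', 'modified_url_goes_here');
    }
}

$txt = trim($dom->saveHTML());
echo $txt, PHP_EOL;
```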
The best way is to use a HTML parser/DOM, but here's a regex solution:
$html = 'test<br>
<p> Some text</p>
test2<br>
<a rel="external">test3</a> <-- This won\'t work since there is no href in it.
';
$new = preg_replace_callback('/<a.+?rel\s*=\s*"([^"]*)"[^>]*>/i', function($m){
if(strpos($m[1], 'external') !== false){
$m[0] = preg_replace('/href\s*=\s*(("[^"]*")|(\'[^\']*\'))/i', 'href="http://example.com"', $m[0]);
}
return $m[0];
}, $html);
echo $new;
You could use a regular expression. First check whether the tag matches
/\s+rel\s*=\s*"[^"]*external[^"]*"/
and, if it does, do a regex replace like
/(<a.*href\s*=\s*")([^"]*)("[^>]*>)/\1[your new href here]\3/
That said, using a library that can do this kind of stuff for you is much easier (like jQuery for JavaScript).
I'm reading in an HTML string from a text editor and need to manipulate some of the elements before saving it to the DB.
What I have is something like this:
<h3>Some Text<img src="somelink.jpg" /></h3>
or
<h3><img src="somelink.jpg" />Some Text</h3>
and I need to put it into the following format
<h3>Some Text</h3><div class="img_wrapper"><img src="somelink.jpg" /></div>
This is the solution that I came up with.
$html = '<html><body>' . $field["data"][0] . '</body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$domNodeList = $dom->getElementsByTagName("img");
// Move img tags out of the H3 and place them right after it
foreach ($domNodeList as $domNode) {
if ($domNode->parentNode->nodeName == "h3") {
$parentNode = $domNode->parentNode;
$parentParentNode = $parentNode->parentNode;
$parentParentNode->insertBefore($domNode, $parentNode->nextSibling);
}
}
echo $dom->saveHtml();
You may be looking for a preg_replace
// take a search pattern, wrap the image tag matching parts in a tag
// and put the start and ending parts before the wrapped image tag.
// note: this will not match tags that contain > characters within them,
// and will only handle a single image tag
$output = preg_replace(
'|(<h3>[^<]*)(<img [^>]+>)([^<]*</h3>)|',
'$1$3<div class="img_wrapper">$2</div>',
$input
);
I updated the question with the answer, but for good measure, here it is again in the answers section.
$html = '<html><body>' . $field["data"][0] . '</body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$domNodeList = $dom->getElementsByTagName("img");
// Move img tags out of the H3 and place them right after it
foreach ($domNodeList as $domNode) {
if ($domNode->parentNode->nodeName == "h3") {
$parentNode = $domNode->parentNode;
$parentParentNode = $parentNode->parentNode;
$parentParentNode->insertBefore($domNode, $parentNode->nextSibling);
}
}
echo $dom->saveHtml();
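The DOM snippet moves the image out of the h3 but does not add the div with class img_wrapper that the target format asks for. A sketch that does both might look like the following. It assumes a temporary wrapper div around the input and prints only that wrapper's children, so no html or body tags are added; note that saveHTML() serializes the image as <img ...> without the trailing slash.

```php
<?php
$html = '<h3>Some Text<img src="somelink.jpg" /></h3>';

$dom = new DOMDocument();
// Temporary <div> root (an assumption of this sketch) so the input has a single root element.
$dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// XPath results are a static snapshot, so moving nodes inside the loop is safe.
foreach ($xpath->query('//h3/img') as $img) {
    $h3 = $img->parentNode;

    $wrapper = $dom->createElement('div');
    $wrapper->setAttribute('class', 'img_wrapper');

    // Insert the wrapper right after the h3, then move the image into it.
    $h3->parentNode->insertBefore($wrapper, $h3->nextSibling);
    $wrapper->appendChild($img);
}

// Serialize only the children of the temporary root.
$out = '';
foreach ($dom->documentElement->childNodes as $child) {
    $out .= $dom->saveHTML($child);
}
echo $out, PHP_EOL;
```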