I want to add target="_blank" to <a> tags so that the links open in a new page. I found this regex:
$content = preg_replace('/(<a href[^<>]+)>/is', '\\1 target="_blank">', $content);
This works, but it adds target="_blank" to every link. I only want to add it to links whose href starts with http://
How can I do this?
You've asked for a regular expression here, but it's not the right tool for the job.
$doc = new DOMDocument;
$doc->loadHTML($html); // Load your HTML
$xpath = new DOMXPath($doc);
$links = $xpath->query('//a[starts-with(@href, "http://")]');
foreach($links as $link) {
$link->setAttribute('target', '_blank');
}
echo $doc->saveHTML();
If you want to exclude internal links as suggested in the comments, you can do:
$links = $xpath->query('//a[starts-with(@href, "http://") and
not(starts-with(@href, "http://yoursite.com")) and
not(starts-with(@href, "http://www.yoursite.com"))]');
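To try the DOM approach end to end, here is a minimal sketch. The function name addBlankTargets and the $ownHost parameter are hypothetical, and the internal-link check uses parse_url() on the host instead of the string-prefix tests in the XPath above, which is a swapped-in variation rather than the answer's exact method.

```php
<?php
// Hypothetical helper: adds target="_blank" to absolute http:// links
// whose host is not $ownHost (an assumed parameter, e.g. "yoursite.com").
function addBlankTargets(string $html, string $ownHost): string
{
    $doc = new DOMDocument;
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[starts-with(@href, "http://")]') as $link) {
        $host = parse_url($link->getAttribute('href'), PHP_URL_HOST);
        if ($host === $ownHost || $host === 'www.' . $ownHost) {
            continue; // internal link: leave it untouched
        }
        $link->setAttribute('target', '_blank');
    }

    // saveHTML() without arguments returns the whole document,
    // including the doctype and the html/body wrappers.
    return $doc->saveHTML();
}
```

Relative links such as href="/about" never match the XPath, so they are left alone without any extra check.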
You can use this regex:
(<a\b[^<>]*href=['"]?http[^<>]+)>
I have added \b[^<>]* to account for any other attributes before href.
Sample code:
$re = "/(<a\\b[^<>]*href=['\"]?http[^<>]+)>/is";
$str = "<a href=\"do.com\">\n<a href=\"do.com\">\n<a another=\"val\" href=\"http://do.com\">\n";
$subst = "$1 target=\"_blank\">";
$result = preg_replace($re, $subst, $str);
I have the following code:
$string = 'Try to remove the link text from the content links in it Try to remove the link text from the content testme Try to remove the link text from the content';
$string = preg_replace('#(<a.*?>).*?(</a>)#', '$1$2', $string);
$result = preg_replace('/<a href="(.*?)">(.*?)<\/a>/', "\\2", $string);
echo $result; // this will output "I am a lot of text with links in it";
I am looking to merge these preg_replace lines. Please suggest.
You need to use DOM for these tasks. Here is a sample that removes links from this content of yours:
$str = 'Try to remove the link text from the content links in it Try to remove the link text from the content testme Try to remove the link text from the content';
$dom = new DOMDocument;
@$dom->loadHTML($str, LIBXML_HTML_NOIMPLIED|LIBXML_HTML_NODEFDTD);
$xp = new DOMXPath($dom);
$links = $xp->query('//a');
foreach ($links as $link) {
$link->parentNode->removeChild($link);
}
echo preg_replace('/^<p>([^<>]*)<\/p>$/', '$1', @$dom->saveHTML());
Since the text node is the only one in the document, PHP's DOM creates a dummy p node to wrap the text, so I am using preg_replace to remove it. That probably won't be needed for your real content.
How can I remove all attributes from an <a> tag except href="/index.php..." and add a custom class to it?
So this:
content
Becomes:
content
I can't get preg_replace to do this:
<?php
$text = 'content';
echo preg_replace("/<a([a-z][a-z0-9]*)(?:[^>]*(\shref=['\"][^'\"]['\"]))?>/i", '<$1$2$3>', $text);
?>
DOMDocument is better, but with a regex:
preg_replace("/<a [^>]*?(href=[^ >]+)[^>]*>/i", '<a $1 class="custom">', $text);
Assumes no space in href and no > in attributes.
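For comparison, a DOM-based sketch does not depend on those two assumptions. The function name keepOnlyHref and the $class parameter are hypothetical, and this version returns the whole serialized document instead of a single tag.

```php
<?php
// Hypothetical helper: strips every attribute except href from <a> tags
// and then sets the given class on each of them.
function keepOnlyHref(string $html, string $class): string
{
    $doc = new DOMDocument;
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $a) {
        // Collect the names first: removing attributes while iterating
        // the live attribute map can skip entries.
        $names = [];
        foreach ($a->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            if ($name !== 'href') {
                $a->removeAttribute($name);
            }
        }
        $a->setAttribute('class', $class);
    }

    return $doc->saveHTML();
}
```

Because the parser handles quoting, an href containing spaces or an attribute value containing > does not break it the way it breaks the regex.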
You could use DOMDocument:
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('content');
$items = $doc->getElementsByTagName('a');
$href = $items->item(0)->getAttribute('href');
$value = $items->item(0)->nodeValue;
libxml_clear_errors();
echo ''.$value.'';
Say I have this.
$string = "<div class=\"name\">anything</div>1234<div class=\"name\">anything</div>abcd";
$regex = "#([<]div)(.*)([<]/div[>])#";
echo preg_replace($regex,'',$string);
The output is
abcd
But I want
1234abcd
How do I do it?
Like this:
preg_replace('/(<div[^>]*>)(.*?)(<\/div>)/i', '$1$3', $string);
If you want to remove the divs too:
preg_replace('/<div[^>]*>.*?<\/div>/i', '', $string);
To replace only the content in the divs with class name and not other classes:
preg_replace('/(<div.*?class="name"[^>]*>)(.*?)(<\/div>)/i', '$1$3', $string);
$string = "<div class=\"name\">anything</div>1234<div class=\"name\">anything</div>abcd";
echo preg_replace('%<div.*?</div>%i', '', $string); // echo's 1234abcd
Live example:
http://codepad.org/1XEC33sc
Add a ? after the quantifier to make it lazy, so each match stops at the first closing tag:
preg_replace('~<div .*?>(.*?)</div>~','', $string);
http://sandbox.phpcode.eu/g/c201b/3
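The difference between the greedy pattern from the question and the lazy versions in the answers can be seen side by side in this small sketch, which uses the question's input string with simplified patterns:

```php
<?php
$string = '<div class="name">anything</div>1234<div class="name">anything</div>abcd';

// Greedy: .* runs on to the LAST </div>, so "1234" is swallowed as well.
$greedy = preg_replace('#<div.*</div>#', '', $string);

// Lazy: .*? stops at the FIRST </div> after each opening tag,
// so the two divs are matched and removed separately.
$lazy = preg_replace('#<div.*?</div>#', '', $string);

echo $greedy, "\n"; // abcd
echo $lazy, "\n";   // 1234abcd
```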
This might be a simple example, but if you have a more complex one, use an HTML/XML parser. For example with DOMDocument:
$doc = new DOMDocument;
$doc->loadHTML($string); // loadHTML() is an instance method; calling it statically fails on PHP 8
$xpath = new DOMXPath($doc);
$query = "//body/text()";
$nodes = $xpath->query($query);
$text = "";
foreach($nodes as $node) {
$text .= $node->wholeText;
}
Which query to use, or whether you have to process the DOM tree in some other way, depends on the particular content you have.
I'm not sure how to explain this, so I'll show it on my code.
First and
Second and
Third
How can I delete the opening and closing <a class="delete"> tags but keep the rest?
I'm asking about preg_replace(); I'm not looking for DOMDocument or other methods. I just want to see an example using preg_replace().
How is this achievable?
Only pick the groups you want to preserve:
$pattern = '~()([^<]*)()~';
// 1 2 3
$result = preg_replace($pattern, '$2', $subject);
You can find more examples on the preg_replace manual page.
Since you asked me in the comments to show any method of doing this, here it is.
$html =<<<HTML
First and
Second and
Third
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$elems = $xpath->query("//a[@class='delete']");
foreach ($elems as $elem) {
$elem->parentNode->removeChild($elem);
}
echo $dom->saveHTML();
Note that saveHTML() saves a complete document even if you only parsed a fragment.
As of PHP 5.3.6 you can add a $node parameter to specify the fragment it should return - something like $xpath->query("/*/body")[0] would work.
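As an illustration of the $node parameter, the sketch below serializes only the children of body, so the doctype and the html/body wrappers are left out. The helper name innerBodyHtml and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: returns the markup inside <body> only.
function innerBodyHtml(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing was parsed into a body element
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        // Passing a node makes saveHTML() serialize just that subtree.
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$dom = new DOMDocument;
$dom->loadHTML('<p>First and <b>Second</b> and Third</p>');
$fragment = innerBodyHtml($dom);
echo $fragment; // only the paragraph markup, without doctype or html/body tags
```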
$pattern = '/<a (.*?)href=[\"\'](.*?)\/\/(.*?)[\"\'](.*?)>(.*?)<\/a>/i';
$new_content = preg_replace($pattern, '$5', $content);
$pattern = '/<a[^<>]*?class="delete"[^<>]*?>(.*?)<\/a>/';
$test = 'First and Second and Third';
echo preg_replace($pattern, '$1', $test)."\n";
$test = 'First and <b class="delete">seriously</b> and Third';
echo preg_replace($pattern, '$1', $test)."\n";
$test = 'First and <b class="delete">seriously</b> and Third';
echo preg_replace($pattern, '$1', $test)."\n";
$test = 'First and <a class="delete" href="url2.html">Second</a> and Third';
echo preg_replace($pattern, '$1', $test)."\n";
preg_replace('#(.+?)#', '$1', $html_string);
It is important to understand this is not an ideal solution. First, it requires markup in this exact format. Second, if there were, say, a nested anchor tag (albeit unlikely) this would fail. These are some of the many reasons why Regular Expressions should not be used for parsing/manipulating HTML.
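For readers who can use a parser after all, here is a sketch of the same "drop the tags, keep the contents" operation done with DOM. The function name unwrapDeleteLinks and the wrapper div are hypothetical; the wrapper is only there so the fragment can be serialized again without the html/body scaffolding the parser adds.

```php
<?php
// Hypothetical helper: removes <a class="delete"> tags but keeps their contents.
function unwrapDeleteLinks(string $fragment): string
{
    $dom = new DOMDocument;
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    // Wrap the fragment in a known container (an assumption of this sketch).
    $dom->loadHTML('<div id="wrapper">' . $fragment . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $wrapper = $xpath->query('//div[@id="wrapper"]')->item(0);

    // Work on a copy of the result list because nodes are removed in the loop.
    foreach (iterator_to_array($xpath->query('.//a[@class="delete"]', $wrapper)) as $a) {
        // Move every child in front of the link, then drop the empty link.
        while ($a->firstChild !== null) {
            $a->parentNode->insertBefore($a->firstChild, $a);
        }
        $a->parentNode->removeChild($a);
    }

    // Serialize only what is inside the wrapper.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Since the children are moved rather than re-matched as text, markup nested inside the link survives, and other elements with class="delete" are not touched.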
I am interested in removing all the text within the following tags:
<p class="wp-caption-text">Remove this text</p>
Can anybody give me an idea of how this can be done in php?
Thank you very much
Get rid of the tag and content inside of it:
$content = preg_replace('/<p\sclass=\"wp\-caption\-text\">[^<]+<\/p>/i', '', $content);
or if you want to preserve the tags:
$content = preg_replace('/(<p\sclass=\"wp\-caption\-text\">)[^<]+(<\/p>)/i', '$1$2', $content);
As a somewhat higher-level alternative to regular expressions, you can process the markup with DOM. You can match all the nodes you're looking for with the XPath //p[@class="wp-caption-text"].
For example:
$doc = new DOMDocument();
$doc->loadHTML($yourHTMLasString);
$xpath = new DOMXPath($doc);
$query = '//p[@class="wp-caption-text"]';
$entries = $xpath->query($query);
foreach ($entries as $entry) {
$entry->textContent = '';
}
echo $doc->saveHTML();
Try this:
$string = '<p class="wp-caption-text">Remove this text</p>';
$pattern = '/(.*<p .*>).*(<\/p>.*)/';
$replacement = '$1$2';
echo preg_replace($pattern, $replacement, $string);
If it's always the same tag, you could simply search for it as a string, then use the resulting position to cut out everything from there up to the closing tag.
Or you could use a regular expression; there are good ones posted here that can help you.
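The plain string-search idea could look roughly like this. The function name clearBetween is hypothetical, and the sketch assumes the opening tag always appears in exactly the same form and that the paragraph contains no nested p.

```php
<?php
// Hypothetical helper: empties the text between a fixed opening tag
// and the next closing tag, keeping both tags in place.
function clearBetween(string $html, string $open, string $close): string
{
    $offset = 0;
    while (($start = strpos($html, $open, $offset)) !== false) {
        $contentStart = $start + strlen($open);
        $end = strpos($html, $close, $contentStart);
        if ($end === false) {
            break; // no closing tag found: stop rather than guess
        }
        // Keep everything up to the end of the opening tag,
        // then continue from the closing tag onwards.
        $html = substr($html, 0, $contentStart) . substr($html, $end);
        // The closing tag now sits right at $contentStart; resume after it.
        $offset = $contentStart + strlen($close);
    }
    return $html;
}

$content = 'Intro <p class="wp-caption-text">Remove this text</p> outro';
echo clearBetween($content, '<p class="wp-caption-text">', '</p>');
// Intro <p class="wp-caption-text"></p> outro
```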