Extracting and printing an HTML element by its id using DOMDocument - PHP

I want to extract a couple of tables from a web page and show them on my page.
I was going to use regex to extract them, but then I saw the DOMDocument class
and it seems cleaner. I've looked on Stack Overflow, and all the questions seem to be about getting inner text or looping over an element's inner nodes. I want to know how I can extract and print an HTML element by its id.
$html = file_get_contents("http://www.site.com");
$xml = new DOMDocument();
$xml->loadHTML($html);
$xpath = new DOMXPath($xml);
$table = $xpath->query("//*[@id='myid']");
$table->saveHTML(); // this obviously doesn't work
How can I show or echo $table as an actual HTML table on my page?

Firstly, DOMDocument has a getElementById() method so your XPath is unnecessary - although I suspect that is how it works underneath.
Secondly, to get a fragment of markup rather than a whole document, you can use DOMNode::C14N(), so your code would look like this:
<?php
// Load the HTML into a DOMDocument
// Don't forget you could just pass the URL to loadHTMLFile()
$html = file_get_contents("http://www.site.com");
$dom = new DOMDocument('1.0');
$dom->loadHTML($html);
// Get the target element
$element = $dom->getElementById('myid');
// Get the HTML as a string
$string = $element->C14N();
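Note that C14N() returns canonical XML. On PHP 5.3.6 and later, DOMDocument::saveHTML() also accepts a node and serializes just that element as HTML. A minimal sketch, assuming a made-up inline document in place of the fetched page and the id 'myid' from the question:

```php
<?php
// Hypothetical markup standing in for the fetched page.
$html = '<html><body><p>intro</p>'
      . '<table id="myid"><tr><td>cell</td></tr></table>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep warnings about sloppy real-world HTML quiet
$dom->loadHTML($html);
libxml_clear_errors();

$element = $dom->getElementById('myid');

// saveHTML() with a node argument serializes only that element (PHP >= 5.3.6).
$tableHtml = $element !== null ? $dom->saveHTML($element) : '';

echo $tableHtml;
```

The null check matters because getElementById() returns null when no element has that id.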

You can use DOMElement::C14N() to get the canonicalized HTML(XML) representation of a DOMElement, or if you like a bit more control so that you can filter certain elements and attributes you can use something like this:
function toHTML($nodeList, $tagsToStrip=array('script','object','noscript','form','style'),$attributesToSkip=array('on*')) {
$html = '';
foreach($nodeList as $subIndex => $values) {
if(!in_array(strtolower($values->nodeName), $tagsToStrip)) {
if(substr($values->nodeName,0,1) != '#') {
$html .= ' <'.$values->nodeName;
if($values->attributes) {
for($i=0;$values->attributes->item($i);$i++) {
if( !in_array( strtolower($values->attributes->item($i)->nodeName) , $attributesToSkip ) && (in_array('on*',$attributesToSkip) && substr( strtolower($values->attributes->item($i)->nodeName) ,0 , 2) != 'on') ) {
$vvv = $values->attributes->item($i)->nodeValue;
if( in_array( strtolower($values->attributes->item($i)->nodeName) , array('src','href') ) ) {
$vvv = resolve_href( $this->url , $vvv );
}
$html .= ' '.$values->attributes->item($i)->nodeName.'="'.$vvv.'"';
}
}
}
if(in_array(strtolower($values->nodeName), array('br','img'))) {
$html .= ' />';
} else {
$html .= '> ';
if(!$values->firstChild) {
$html .= htmlspecialchars( $values->textContent , ENT_COMPAT , 'UTF-8' , true );
} else {
$html .= toHTML($values->childNodes,$tagsToStrip,$attributesToSkip);
}
$html .= ' </'.$values->nodeName.'> ';
}
} elseif(substr($values->nodeName,1,1) == 't') {
$inner = htmlspecialchars( $values->textContent , ENT_COMPAT , 'UTF-8' , true );
$html .= $inner;
}
}
}
return $html;
}
echo toHTML($table);

Related

Shortcode content not visible

I have this code that creates a dynamic TOC inside single.php at the top of the page, based on the given class "toc-item" for specific elements. It works as a filter but I would like to use it in a shortcode.
I tried using add_shortcode('toc_content', 'create_toc'); but it does not display anything. Any idea how can I change this into a shortcode function? Thanks a lot!
function create_toc($html) {
$toc = '';
if (is_single()) {
if (!$html) return $html;
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();
$toc = '<div class="toc-bound">
<div class="toc-ctr">
table of contents
</div>
<ul class="toc">';
$h2_status = 0;
$i = 1;
$xpath = new DOMXpath($dom);
$expression = '//*[contains(concat(" ", normalize-space(@class), " "), " toc-item ")]';
foreach ($xpath->evaluate($expression) as $element) {
if($h2_status){
$toc .= '</li>';
$h2_status = 0;
}
$h2_status = 1;
$toc .= '<li><a href="#toc-' . $i . '">' . $element->textContent . '</a>';
$element->setAttribute('id', 'toc-' . $i);
$i++;
}
if($h2_status){
$toc .= '</li>';
}
$toc .= '</ul></div>';
$html = $dom->saveHTML();
}
return $toc . $html;
}
add_filter('the_content', 'create_toc');
//add_shortcode('toc_content', 'create_toc'); does not work
The add_shortcode() callback function is passed three arguments: $atts (an array of shortcode arguments), $content (whatever it is inside the shortcode tags, if anything at all) and $tag (the name of the shortcode).
Your $html argument, as it is, would be an array of shortcode arguments (always empty I'm assuming).
If you just need the shortcode to return a div element, you probably don't need any argument at all. Just return the element.
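A minimal sketch of that advice, assuming the shortcode wraps the content to index ([toc_content]...[/toc_content]) so that the callback receives it as $content. The name toc_shortcode is hypothetical, the list markup is simplified, and the add_shortcode() call is guarded so the snippet also runs outside WordPress:

```php
<?php
// Hypothetical shortcode callback: WordPress passes ($atts, $content, $tag).
function toc_shortcode($atts = array(), $content = null, $tag = '')
{
    if ($content === null || trim($content) === '') {
        return '';
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // The leading XML declaration is a common trick to make loadHTML() read UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $content);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $expression = '//*[contains(concat(" ", normalize-space(@class), " "), " toc-item ")]';

    $items = '';
    $i = 1;
    foreach ($xpath->query($expression) as $element) {
        $items .= '<li>' . htmlspecialchars($element->textContent) . '</li>';
        $element->setAttribute('id', 'toc-' . $i);
        $i++;
    }

    // Serialize only the children of <body>, not the wrapper loadHTML() adds.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $content;
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }

    return '<ul class="toc">' . $items . '</ul>' . $html;
}

// Register only when running inside WordPress.
if (function_exists('add_shortcode')) {
    add_shortcode('toc_content', 'toc_shortcode');
}
```

A shortcode callback has to return its markup rather than echo it, which is why the function builds and returns a string.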

PHP Encode parts of a string and then decode

I have a string containing HTML and some placeholders.
The placeholders always start with {{ and end with }}.
I'm trying to encode the contents of the placeholders and then decode them later.
While they're encoded they ideally need to be valid HTML, as I want to use DOMDocument on the string. The problem I'm having is that it ends up being a mess because the placeholders are usually something like:
<img src="{{image url="mediadir/someimage.jpg"}}"/>
Sometimes they're something like this though:
<p>Some text</p>
{{widget type="pagelink" pageid="1"}}
<div class="whatever">Content</div>
I was wondering what the best way of doing this is, thanks!
UPDATE: CONTEXT
The overall problem is that I have a Magento site with a bunch of static links like:
Link text
And I need to replace them with widgets to the page so that if the URL changes the links update. So replace the above with something like this.
{{widget type="Magento\Cms\Block\Widget\Page\Link" anchor_text="Link Text" template="widget/link/link_block.phtml" page_id="123"}}
I have something which does this using the PHP DOMDocument functionality. It looks up CMS pages by their URL, finds the ID and replaces the anchor node with the widget text. This works fine if the page doesn't already contain any widgets or URL placeholders.
However, if it does, the placeholders come out broken when processed through the DOMDocument saveHTML() function.
My idea for a solution was to encode the widgets and URL placeholders before passing the content to the DOMDocument loadHTML() function and decode them after the saveHTML() function, when it is a string again.
UPDATE: CODE
This is a cut down version of what I've got currently. It's messy but it does work in replacing pages with widgets.
$pageCollection = $this->pageCollectionFactory->create();
$collection = $pageCollection->load();
$findarray = array('http', 'mailto', '.pdf', '{', '}');
$findarray2 = array('mailto', '.pdf', '{', '}');
$specialurl = 'https://www.example.com';
$relative_links = 0;
$missing_pages = 0;
$fixed_links = 0;
try {
foreach ($collection as $page) {
$dom = new \DOMDocument();
$content = $this->cleanMagentoCode( $page->getContent() );
libxml_use_internal_errors(true); // Suppress warnings created by reading bad HTML
$dom->loadHTML( $content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD ); // Load HTML without doctype or html containing elements
$elements = $dom->getElementsByTagName("a");
for ($i = $elements->length - 1; $i >= 0; $i --) {
$link = $elements->item($i);
$found = false;
// To clean up later
if ( strpos($link->getAttribute('href'), $specialurl) !== FALSE ) {
foreach ($findarray2 as $find) {
if (stripos($link->getAttribute('href'), $find) !== FALSE) {
$found = true;
break;
}
}
} else {
foreach ($findarray as $find) {
if (stripos($link->getAttribute('href'), $find) !== FALSE) {
$found = true;
break;
}
}
}
if ( strpos($link->getAttribute('href'), '#') === 0 ) {
$found = true;
}
if ( $link->getAttribute('href') == '' ) {
$found = true;
}
if ( !$found ) {
$url = parse_url($link->getAttribute('href'));
if ( isset( $url['path'] ) ) {
$identifier = rtrim( ltrim($url['path'], '/'), '/' );
try {
$pagelink = $this->pageRepository->getById($identifier);
// Fix link
if ($this->fixLinksFlag($input)) {
if ( stripos( $link->getAttribute('class'), "btn" ) !== FALSE ) {
$link_template = "widget/link/link_block.phtml";
} else {
$link_template = "widget/link/link_inline.phtml";
}
$widgetcode = '{{widget type="Magento\Cms\Block\Widget\Page\Link" anchor_text="' . $link->nodeValue . '" template="' . $link_template . '" page_id="' . $pagelink->getId() . '"}}';
$widget = $dom->createTextNode($widgetcode);
$link->parentNode->replaceChild($widget, $link);
}
}
}
}
}
$page->setContent( $this->dirtyMagentoCode( $dom->saveHTML() ) );
$page->save();
}
}
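The encode/decode idea described in the question could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names and the MAGEPH token format are hypothetical, and it assumes the token text never occurs naturally in the content and that placeholders never contain a literal }}.

```php
<?php
// Swap every {{...}} placeholder for an inert alphanumeric token and remember the original.
function encode_placeholders($content, array &$map)
{
    $map = array();
    return preg_replace_callback('/\{\{.*?\}\}/s', function ($m) use (&$map) {
        $token = 'MAGEPH' . count($map) . 'X';   // hypothetical token format
        $map[$token] = $m[0];
        return $token;
    }, $content);
}

// Put the original placeholders back once DOMDocument has produced its string.
function decode_placeholders($content, array $map)
{
    // strtr() with an empty array returned false before PHP 8, so guard it.
    return $map ? strtr($content, $map) : $content;
}

$map = array();
$source = '<p>Some text</p>'
        . '{{widget type="pagelink" pageid="1"}}'
        . '<img src="{{image url="mediadir/someimage.jpg"}}"/>';

$encoded = encode_placeholders($source, $map);

// In the real script this is where loadHTML($encoded), the link rewriting
// and saveHTML() would happen; here the string is passed through unchanged.
$processed = $encoded;

$restored = decode_placeholders($processed, $map);
echo $restored;
```

Because the tokens contain only letters and digits, they are safe both as text nodes and inside attribute values, so the parser has nothing to "repair".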

adding rel="nofollow" while saving data

My application allows users to write comments on my website, and it works fine. There is also a tool for inserting their web links into comments, and I'm happy for comments to include their own links.
Now I want to add rel="nofollow" to every link in the content they have written.
I would like to add rel="nofollow" using PHP, i.e. while saving the data.
So what's a simple method to add rel="nofollow", or to update rel="someother" to rel="someother nofollow", using PHP?
A good example would be very helpful.
Regexes really aren't the best tool for dealing with HTML, especially when PHP has a pretty good HTML parser built in.
This code will handle adding nofollow if the rel attribute is already populated.
$dom = new DOMDocument;
$dom->loadHTML($str);
$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $anchor) {
$rel = array();
if ($anchor->hasAttribute('rel') AND ($relAtt = $anchor->getAttribute('rel')) !== '') {
$rel = preg_split('/\s+/', trim($relAtt));
}
if (in_array('nofollow', $rel)) {
continue;
}
$rel[] = 'nofollow';
$anchor->setAttribute('rel', implode(' ', $rel));
}
var_dump($dom->saveHTML());
The resulting HTML is in $dom->saveHTML(). Except it will wrap it with html, body elements, etc, so use this to extract just the HTML you entered...
$html = '';
foreach($dom->getElementsByTagName('body')->item(0)->childNodes as $element) {
$html .= $dom->saveXML($element, LIBXML_NOEMPTYTAG);
}
echo $html;
If you have PHP 5.3.6 or later, replace saveXML() with saveHTML() and drop the second argument.
Example
This HTML...
hello
hello
hello
hello
...is converted into...
hello
hello
hello
hello
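A self-contained sketch of the same approach on made-up links (the URLs, rel values and the function name add_nofollow_to_fragment are illustrative, not the original example). It uses saveHTML($node), so it assumes PHP 5.3.6 or later:

```php
<?php
// Add "nofollow" to the rel attribute of every <a> in an HTML fragment.
function add_nofollow_to_fragment($str)
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($str);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('a') as $anchor) {
        // getAttribute() returns '' when rel is missing, which splits to an empty list.
        $rel = preg_split('/\s+/', trim($anchor->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        if (!in_array('nofollow', $rel)) {
            $rel[] = 'nofollow';
            $anchor->setAttribute('rel', implode(' ', $rel));
        }
    }

    // Return only what was inside <body>, without the html/body wrapper.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $str;
    }
    $html = '';
    foreach ($body->childNodes as $node) {
        $html .= $dom->saveHTML($node);
    }
    return $html;
}

$result = add_nofollow_to_fragment(
    '<p><a href="http://example.com">one</a> '
    . '<a href="http://example.com" rel="external">two</a> '
    . '<a href="http://example.com" rel="nofollow">three</a></p>'
);
echo $result;
```

The first link gains rel="nofollow", the second keeps its existing value and gains nofollow, and the third is left as it was.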
Good Alex. If it is in the form of a function it is more useful. So I made it below:
function add_no_follow($str){
$dom = new DOMDocument;
$dom->loadHTML($str);
$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $anchor) {
$rel = array();
if ($anchor->hasAttribute('rel') AND ($relAtt = $anchor->getAttribute('rel')) !== '') {
$rel = preg_split('/\s+/', trim($relAtt));
}
if (in_array('nofollow', $rel)) {
continue;
}
$rel[] = 'nofollow';
$anchor->setAttribute('rel', implode(' ', $rel));
}
$dom->saveHTML();
$html = '';
foreach($dom->getElementsByTagName('body')->item(0)->childNodes as $element) {
$html .= $dom->saveXML($element, LIBXML_NOEMPTYTAG);
}
return $html;
}
Use as follows :
$str = "Some content with link Some content ... ";
$str = add_no_follow($str);
I've copied Alex's answer and made it into a function that makes links nofollow and open in a new tab/window (and added UTF-8 support). I'm not sure if this is the best way to do this, but it works (constructive input is welcome):
function nofollow_new_window($str)
{
$dom = new DOMDocument;
$dom->loadHTML($str);
$anchors = $dom->getElementsByTagName('a');
foreach($anchors as $anchor)
{
$rel = array();
if ($anchor->hasAttribute('rel') AND ($relAtt = $anchor->getAttribute('rel')) !== '') {
$rel = preg_split('/\s+/', trim($relAtt));
}
if (in_array('nofollow', $rel)) {
continue;
}
$rel[] = 'nofollow';
$anchor->setAttribute('rel', implode(' ', $rel));
$target = array();
if ($anchor->hasAttribute('target') AND ($relAtt = $anchor->getAttribute('target')) !== '') {
$target = preg_split('/\s+/', trim($relAtt));
}
if (in_array('_blank', $target)) {
continue;
}
$target[] = '_blank';
$anchor->setAttribute('target', implode(' ', $target));
}
$str = utf8_decode($dom->saveHTML($dom->documentElement));
return $str;
}
Simply use the function like this:
$str = '<html><head></head><body>fdsafffffdfsfdffff dfsdaff flkklfd aldsfklffdssfdfds Google</body></html>';
$str = nofollow_new_window($str);
echo $str;

Find and append hrefs of a certain class

I've been searching for a solution to this but haven't found quite the right thing yet.
The situation is this:
I need to find all links on a page with a given class (say class="tracker") and then append query string values on the end, so when a user loads a page, those certain links are updated with some dynamic information.
I know how this can be done with Javascript, but I'd really like to adapt it to run server side instead. I'm quite new to PHP, but from the looks of it, XPath might be what I'm looking for but I haven't found a suitable example to get started with. Is there anything like GetElementByClass?
Any help would be greatly appreciated!
Shadowise
Is there anything like GetElementByClass?
Here is an implementation I whipped up...
function getElementsByClassName(DOMDocument $domNode, $className) {
$elements = $domNode->getElementsByTagName('*');
$matches = array();
foreach($elements as $element) {
if ( ! $element->hasAttribute('class')) {
continue;
}
$classes = preg_split('/\s+/', $element->getAttribute('class'));
if ( ! in_array($className, $classes)) {
continue;
}
$matches[] = $element;
}
return $matches;
}
This version doesn't rely on the helper function above.
$str = '<body>
a
a
a
a
</body>
';
$dom = new DOMDocument;
$dom->loadHTML($str);
$anchors = $dom->getElementsByTagName('body')->item(0)->getElementsByTagName('a');
foreach($anchors as $anchor) {
if ( ! $anchor->hasAttribute('class')) {
continue;
}
$classes = preg_split('/\s+/', $anchor->getAttribute('class'));
if ( ! in_array('tracker', $classes)) {
continue;
}
$href = $anchor->getAttribute('href');
$url = parse_url($href);
$attach = 'stackoverflow=true';
if (isset($url['query'])) {
$href .= '&' . $attach;
} else {
$href .= '?' . $attach;
}
$anchor->setAttribute('href', $href);
}
echo $dom->saveHTML();
Output
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
a
a
a
a
</body></html>
I need to find all links on a page
with a given class (say
class="tracker")
[...]
I'm quite new to PHP, but from the
looks of it, XPath might be what I'm
looking for but I haven't found a
suitable example to get started with.
Is there anything like
GetElementByClass?
This XPath 1.0 expression:
//a[contains(
concat(' ',normalize-space(@class),' '),
' tracker '
)
]
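A minimal sketch of how that expression might be used from PHP with DOMXPath. The sample markup and the source=tracker query string are made up for illustration:

```php
<?php
// Hypothetical page fragment: two links carry the "tracker" class, one only looks similar.
$html = '<body>'
      . '<a class="tracker" href="/a">a</a>'
      . '<a class="nav tracker big" href="/b?x=1">b</a>'
      . '<a class="trackers" href="/c">c</a>'
      . '</body>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Padding with spaces makes "tracker" match as a whole class name only.
$query = "//a[contains(concat(' ', normalize-space(@class), ' '), ' tracker ')]";
$extra = 'source=tracker';   // hypothetical value to append

$hrefs = array();
foreach ($xpath->query($query) as $anchor) {
    $href = $anchor->getAttribute('href');
    $existing = parse_url($href, PHP_URL_QUERY);
    // Use "&" when the link already has a query string, "?" otherwise.
    $separator = (is_string($existing) && $existing !== '') ? '&' : '?';
    $anchor->setAttribute('href', $href . $separator . $extra);
    $hrefs[] = $anchor->getAttribute('href');
}

print_r($hrefs);
```

Unlike a plain contains(@class, "tracker"), this does not match class="trackers", so no second check in PHP is needed.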
A bit shorter, using xpath:
$dom = new DomDocument();
$dom->loadXml('<?xml version="1.0" encoding="UTF-8" ?>
<root>
label
label
label
label
label
label
label
</root>');
$xpath = new DomXPath($dom);
foreach ($xpath->query('//a[contains(@class, "tracker")]') as $node) {
if (preg_match('/\btracker\b/', $node->getAttribute('class'))) {
$node->setAttribute(
'href',
$node->getAttribute('href') . '#some_extra'
);
}
}
header('Content-Type: text/xml; charset=UTF-8');
echo $dom->saveXml();

How to add rel="nofollow" to links with preg_replace()

The function below is designed to apply rel="nofollow" attributes to all external links and no internal links unless the path matches a predefined root URL defined as $my_folder below.
So given the variables...
$my_folder = 'http://localhost/mytest/go/';
$blog_url = 'http://localhost/mytest';
And the content...
internal
internal cloaked link
external
The end result, after replacement should be...
internal
internal cloaked link
external
Notice that the first link is not altered, since it's an internal link.
The link on the second line is also an internal link, but since it matches our $my_folder string, it gets the nofollow too.
The third link is the easiest: since it does not match the blog_url, it's obviously an external link.
However, in the script below, ALL of my links are getting nofollow. How can I fix the script to do what I want?
function save_rseo_nofollow($content) {
$my_folder = $rseo['nofollow_folder'];
$blog_url = get_bloginfo('url');
preg_match_all('~<a.*>~isU',$content["post_content"],$matches);
for ( $i = 0; $i <= sizeof($matches[0]); $i++){
if ( !preg_match( '~nofollow~is',$matches[0][$i])
&& (preg_match('~' . $my_folder . '~', $matches[0][$i])
|| !preg_match( '~'.$blog_url.'~',$matches[0][$i]))){
$result = trim($matches[0][$i],">");
$result .= ' rel="nofollow">';
$content["post_content"] = str_replace($matches[0][$i], $result, $content["post_content"]);
}
}
return $content;
}
Here is the DOMDocument solution...
$str = 'internal
internal cloaked link
external
external
external
external
';
$dom = new DOMDocument();
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($str);
$a = $dom->getElementsByTagName('a');
$host = strtok($_SERVER['HTTP_HOST'], ':');
foreach($a as $anchor) {
$href = $anchor->attributes->getNamedItem('href')->nodeValue;
if (preg_match('/^https?:\/\/' . preg_quote($host, '/') . '/', $href)) {
continue;
}
$noFollowRel = 'nofollow';
$oldRelAtt = $anchor->attributes->getNamedItem('rel');
if ($oldRelAtt == NULL) {
$newRel = $noFollowRel;
} else {
$oldRel = $oldRelAtt->nodeValue;
$oldRel = explode(' ', $oldRel);
if (in_array($noFollowRel, $oldRel)) {
continue;
}
$oldRel[] = $noFollowRel;
$newRel = implode(' ', $oldRel);
}
$newRelAtt = $dom->createAttribute('rel');
$noFollowNode = $dom->createTextNode($newRel);
$newRelAtt->appendChild($noFollowNode);
$anchor->appendChild($newRelAtt);
}
var_dump($dom->saveHTML());
Output
string(509) "<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
internal
internal cloaked link
external
external
external
external
</body></html>
"
Try to make it more readable first, and only afterwards make your if rules more complex:
function save_rseo_nofollow($content) {
$content["post_content"] =
preg_replace_callback('~<(a\s[^>]+)>~isU', "cb2", $content["post_content"]);
return $content;
}
function cb2($match) {
list($original, $tag) = $match; // regex match groups
$my_folder = "/hostgator"; // re-add quirky config here
$blog_url = "http://localhost/";
if (strpos($tag, "nofollow")) {
return $original;
}
elseif (strpos($tag, $blog_url) && (!$my_folder || !strpos($tag, $my_folder))) {
return $original;
}
else {
return "<$tag rel='nofollow'>";
}
}
Gives following output:
[post_content] =>
internal
<a href="http://localhost/mytest/go/hostgator" rel=nofollow>internal cloaked link</a>
<a href="http://cnn.com" rel=nofollow>external</a>
The problem in your original code might have been $rseo which wasn't declared anywhere.
Try this one (PHP 5.3+). It skips a selected address and leaves a manually set rel attribute untouched. The code:
function nofollow($html, $skip = null) {
return preg_replace_callback(
"#(<a[^>]+?)>#is", function ($mach) use ($skip) {
return (
!($skip && strpos($mach[1], $skip) !== false) &&
strpos($mach[1], 'rel=') === false
) ? $mach[1] . ' rel="nofollow">' : $mach[0];
},
$html
);
}
Examples:
echo nofollow('something');
// stays the same because it already contains a rel attribute
echo nofollow('something');
// adds rel="nofollow" to the anchor
echo nofollow('something', 'localhost');
// skipped as an internal link
Using regular expressions to do this job properly would be quite complicated. It would be easier to use an actual parser, such as the one from the DOM extension. DOM isn't very beginner-friendly, so what you can do is load the HTML with DOM then run the modifications with SimpleXML. They're backed by the same library, so it's easy to use one with the other.
Here's what it can look like:
$my_folder = 'http://localhost/mytest/go/';
$blog_url = 'http://localhost/mytest';
$html = '<html><body>
internal
internal cloaked link
external
</body></html>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$sxe = simplexml_import_dom($dom);
// grab all <a> nodes with an href attribute
foreach ($sxe->xpath('//a[@href]') as $a)
{
if (substr($a['href'], 0, strlen($blog_url)) === $blog_url
&& substr($a['href'], 0, strlen($my_folder)) !== $my_folder)
{
// skip all links that start with the URL in $blog_url, as long as they
// don't start with the URL from $my_folder;
continue;
}
if (empty($a['rel']))
{
$a['rel'] = 'nofollow';
}
else
{
$a['rel'] .= ' nofollow';
}
}
$new_html = $dom->saveHTML();
echo $new_html;
As you can see, it's really short and simple. Depending on your needs, you may want to use preg_match() in place of the substr() checks, for example:
// change the regexp to your own rules, here we match everything under
// "http://localhost/mytest/" as long as it's not followed by "go"
if (preg_match('#^http://localhost/mytest/(?!go)#', $a['href']))
{
continue;
}
Note
I missed the last code block in the OP when I first read the question. The code I posted (and basically any solution based on DOM) is better suited at processing a whole page rather than a HTML block. Otherwise, DOM will attempt to "fix" your HTML and may add a <body> tag, a DOCTYPE, etc...
Thanks @alex for your nice solution. However, I had a problem with Japanese text, which I fixed as follows. This code can also skip multiple domains via the $whiteList array.
public function addRelNoFollow($html, $whiteList = [])
{
$dom = new \DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
$a = $dom->getElementsByTagName('a');
/** @var \DOMElement $anchor */
foreach ($a as $anchor) {
$href = $anchor->attributes->getNamedItem('href')->nodeValue;
$domain = parse_url($href, PHP_URL_HOST);
// Skip whiteList domains
if (in_array($domain, $whiteList, true)) {
continue;
}
// Check & get existing rel attribute values
$noFollow = 'nofollow';
$rel = $anchor->attributes->getNamedItem('rel');
if ($rel) {
$values = explode(' ', $rel->nodeValue);
if (in_array($noFollow, $values, true)) {
continue;
}
$values[] = $noFollow;
$newValue = implode(' ', $values);
} else {
$newValue = $noFollow;
}
// Create new rel attribute
$rel = $dom->createAttribute('rel');
$node = $dom->createTextNode($newValue);
$rel->appendChild($node);
$anchor->appendChild($rel);
}
// There is a problem with saveHTML() and saveXML(), both of them do not work correctly in Unix.
// They do not save UTF-8 characters correctly when used in Unix, but they work in Windows.
// So we need to do as follows. @see https://stackoverflow.com/a/20675396/1710782
return $dom->saveHTML($dom->documentElement);
}
<?php
$str='internal
internal cloaked link
external';
function test($x){
if (preg_match('#localhost/mytest/(?!go/)#i',$x[0])>0) return $x[0];
return 'rel="nofollow" '.$x[0];
}
echo preg_replace_callback('/href=[\'"][^\'"]+/i', 'test', $str);
?>
Here is another solution, which has a whitelist option and adds a target="_blank" attribute.
It also checks whether there is already a rel attribute before adding a new one.
function Add_Nofollow_Attr($Content, $Whitelist = [], $Add_Target_Blank = true)
{
$Whitelist[] = $_SERVER['HTTP_HOST'];
foreach ($Whitelist as $Key => $Link)
{
$Host = preg_replace('#^https?://#', '', $Link);
$Host = "https?://". preg_quote($Host, '/');
$Whitelist[$Key] = $Host;
}
if(preg_match_all("/<a .*?>/", $Content, $matches, PREG_SET_ORDER))
{
foreach ($matches as $Anchor_Tag)
{
$IS_Rel_Exist = $IS_Follow_Exist = $IS_Target_Blank_Exist = $Is_Valid_Tag = false;
if(preg_match_all("/(\w+)\s*=\s*['|\"](.*?)['|\"]/",$Anchor_Tag[0],$All_matches2))
{
foreach ($All_matches2[1] as $Key => $Attr_Name)
{
if($Attr_Name == 'href')
{
$Is_Valid_Tag = true;
$Url = $All_matches2[2][$Key];
// bypass #.. or internal links like "/"
if(preg_match('/^\s*[#|\/].*/', $Url))
{
continue 2;
}
foreach ($Whitelist as $Link)
{
if (preg_match("#$Link#", $Url)) {
continue 3;
}
}
}
else if($Attr_Name == 'rel')
{
$IS_Rel_Exist = true;
$Rel = $All_matches2[2][$Key];
preg_match("/[n|d]ofollow/", $Rel, $match, PREG_OFFSET_CAPTURE);
if( count($match) > 0 )
{
$IS_Follow_Exist = true;
}
else
{
$New_Rel = 'rel="'. $Rel . ' nofollow"';
}
}
else if($Attr_Name == 'target')
{
$IS_Target_Blank_Exist = true;
}
}
}
$New_Anchor_Tag = $Anchor_Tag;
if(!$IS_Rel_Exist)
{
$New_Anchor_Tag = str_replace(">",' rel="nofollow">',$Anchor_Tag);
}
else if(!$IS_Follow_Exist)
{
$New_Anchor_Tag = preg_replace("/rel=[\"|'].*?[\"|']/",$New_Rel,$Anchor_Tag);
}
if($Add_Target_Blank && !$IS_Target_Blank_Exist)
{
$New_Anchor_Tag = str_replace(">",' target="_blank">',$New_Anchor_Tag);
}
$Content = str_replace($Anchor_Tag,$New_Anchor_Tag,$Content);
}
}
return $Content;
}
To use it:
$Page_Content = 'internal
internal
google
example
stackoverflow';
$Whitelist = ["http://yoursite.com","http://localhost"];
echo Add_Nofollow_Attr($Page_Content,$Whitelist,true);
WordPress solution:
function replace__method($match) {
list($original, $tag) = $match; // regex match groups
$my_folder = "/articles"; // re-add quirky config here
$blog_url = 'https://'.$_SERVER['SERVER_NAME'];
if (strpos($tag, "nofollow")) {
return $original;
}
elseif (strpos($tag, $blog_url) && (!$my_folder || !strpos($tag, $my_folder))) {
return $original;
}
else {
return "<$tag rel='nofollow'>";
}
}
add_filter( 'the_content', 'add_nofollow_to_external_links', 1 );
function add_nofollow_to_external_links( $content ) {
$content = preg_replace_callback('~<(a\s[^>]+)>~isU', "replace__method", $content);
return $content;
}
A good script that adds nofollow automatically and keeps the other attributes:
function nofollow(string $html, ?string $baseUrl = null) {
return preg_replace_callback(
'#<a([^>]*)>(.+)</a>#isU', function ($mach) use ($baseUrl) {
list ($a, $attr, $text) = $mach;
if (preg_match('#href=["\']([^"\']*)["\']#', $attr, $url)) {
$url = $url[1];
if (is_null($baseUrl) || !str_starts_with($url, $baseUrl)) {
if (preg_match('#rel=["\']([^"\']*)["\']#', $attr, $rel)) {
$relAttr = $rel[0];
$rel = $rel[1];
}
$rel = 'rel="' . ($rel ? (strpos($rel, 'nofollow') !== false ? $rel : $rel . ' nofollow') : 'nofollow') . '"';
$attr = isset($relAttr) ? str_replace($relAttr, $rel, $attr) : $attr . ' ' . $rel;
$a = '<a ' . $attr . '>' . $text . '</a>';
}
}
return $a;
},
$html
);
}
