JavaScript regular expression to replace HTML anchors - php

I have an HTML string in which I want to convert certain special a tags into something else. I need this for a TinyMCE plugin; I am trying to adapt the WordPress wpgallery plugin.
For example, the HTML string contains:
Yahoo
Google
<a href="#" rel='special' title='link cat_id="4" content_id="5" content_slug="Slug 1"'>Some where else</a>
Here I have to find the special link and convert it to something else built from its title value, like:
{link cat_id="4" content_id="5" content_slug="Slug 1"}
I need a return value like this to insert into MySQL:
Yahoo
Google
{link cat_id="4" content_id="5" content_slug="Slug 1"}
I tried this
function getAttr(s, n) {
n = new RegExp(n + '="([^"]+)"', 'g').exec(s);
return n ? tinymce.DOM.decode(n[1]) : '';
};
return co.replace(/[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g, function(a,im) {
var cls = getAttr(im, 'rel');
if ( cls.indexOf('special') != -1 )
return '{'+tinymce.trim(getAttr(im, 'title'))+'}';
return a;
});
This pattern
/[^<]*(<a href="([^"]+)">([^<]+)<\/a>)/g
matches all the other links, but not the tags with rel equal to 'special'.

You might want to look into the DOMDocument and related classes. They are much better at parsing HTML than a homebrewed regex solution would be.
You can create a DOMDocument from your supplied markup, call getElementsByTagName to get all the hyperlinks, scan their attributes for a rel attribute with the value special, and then take the appropriate action.
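Since the question's code runs in JavaScript inside a TinyMCE plugin, a regex-only fix may also be useful there. The pattern in the question appears to miss the special link for two reasons: it requires > immediately after the href value, so any tag with extra attributes is skipped, and getAttr only matches double-quoted values while the example uses single quotes for rel and title. What follows is a minimal sketch under those observations, not a robust HTML parser; convertSpecialLinks is a hypothetical name, and the code assumes that no attribute value contains a > character and that links are not nested.

```javascript
// Read one attribute from an opening tag.
// Handles both single- and double-quoted values.
function getAttr(openTag, name) {
  var re = new RegExp('\\s' + name + '\\s*=\\s*(?:"([^"]*)"|\'([^\']*)\')', 'i');
  var m = re.exec(openTag);
  if (!m) return '';
  return m[1] !== undefined ? m[1] : m[2];
}

// Replace every <a ...>...</a> whose rel contains "special"
// with "{" + title + "}"; all other links are left untouched.
// (Hypothetical helper name, not part of TinyMCE.)
function convertSpecialLinks(html) {
  return html.replace(/<a\b[^>]*>[\s\S]*?<\/a>/gi, function (whole) {
    // Assumption: the first ">" ends the opening tag.
    var openTag = whole.slice(0, whole.indexOf('>') + 1);
    var rel = getAttr(openTag, 'rel').split(/\s+/);
    if (rel.indexOf('special') === -1) return whole;
    return '{' + getAttr(openTag, 'title').trim() + '}';
  });
}
```

Inside the plugin, the final line of the question's code would then become something like return convertSpecialLinks(co);.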

Related

PHP Regex replace link if it does not have data attribute

I need to loop through a bunch of HTML code and remove the <a> </a> tags from all links which don't include the data attribute data-link="keepLink".
Here is an example of the body value I need to modify:
<p><a data-link=\"keepLink\" href=\"[1|9999|16|191967|256]\">Daily Racing Link</a></p>\r\n<br>\n <strong>OFFER – Get up to a £400 deposit bonus when you sign up with Fanduel.</strong>
After the modification I need it to look like (so the offer link is removed):
<p><a data-link=\"keepLink\" href=\"[1|9999|16|191967|256]\">Daily Racing Link</a></p>\r\n<br>\n <strong>OFFER – Get up to a £400 deposit bonus when you sign up with Fanduel.</strong>
So far I have managed to remove the opening half of the link when it doesn't include a data-link="keepLink" attribute, but the closing </a> is still present.
Here is the regex I have used:
$result["body_value"] = preg_replace('/<a (?![^>]*data-link="keepLink").*?>/i', '', $result["body_value"]);
So the new body value looks like:
<p><a data-link=\"keepLink\" href=\"[1|9999|16|191967|256]\">Daily Racing Link</a></p>\r\n<br>\n <strong>OFFER – Get up to a £400 deposit bonus when you sign up with Fanduel</a>.</strong>
The DOMDocument extension is available by default in PHP. It is presumably faster and is designed exactly for what you are trying to achieve. You can use it to load your document and search for any links without a data-link attribute like this:
$dom = new DOMDocument;
$dom->loadHTMLFile('http://www.example.com'); // load the file
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a[not(@data-link=\'keepLink\')]'); // search for links that do not have the 'data-link' attribute set to 'keepLink'
foreach($nodes as $element){
$textInside = $element->nodeValue; // get the text inside the link
$parentNode = $element->parentNode; // save parent node
$parentNode->replaceChild(new DOMText($textInside), $element); // remove the element
}
$myNewHTML = $dom->saveHTML(); // see http://php.net/manual/ro/domdocument.savehtml.php for limitations such as auto-adding of doc-type
echo $myNewHTML;
Proof of concept: https://3v4l.org/ejatQ.
Please bear in mind that this will take only the text values inside the elements without a data-link='keepLink' attribute value.
If you are set on regex and don't want to use a parser, try this:
<a (?!data-link=)[^>]*>((?!<\/a>).*?)<\/a>
Replace it with $1 to keep your link text. Note that the lookahead only checks the first attribute, so it assumes data-link comes directly after <a.
See https://regex101.com/r/wKQk4p/2
Please say if you need any further explanation.
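For comparison, the question's own lookahead can be combined with a capture group so that the opening tag, the link text, and the closing tag are matched in one go, wherever data-link sits in the tag. The sketch below is written in JavaScript purely for illustration; stripUnkeptLinks is a hypothetical name. The same pattern should carry over to preg_replace with '$1' as the replacement, though that is untested here. It assumes links are not nested, attribute values contain no > character, and the quotes in the markup are plain " rather than escaped \".

```javascript
// Remove <a>...</a> wrappers (keeping the inner text) from every link
// that lacks data-link="keepLink". Hypothetical helper name.
function stripUnkeptLinks(html) {
  // (?![^>]*\bdata-link="keepLink") : reject tags that carry the marker
  //                                   anywhere before the closing ">"
  // ([\s\S]*?)                      : lazily capture the link content
  var pattern = /<a\b(?![^>]*\bdata-link="keepLink")[^>]*>([\s\S]*?)<\/a>/gi;
  return html.replace(pattern, '$1');
}
```

Because the closing </a> is part of the same match, it is removed together with the opening tag instead of being left behind.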

Retrieve HTML from database, and format it as html instead of plain text

I have a database query that returns the raw HTML for a page, but if I use it on my page, it gets shown as plain text (of course). How would I format it as HTML so that it uses the tags and such.
An example of what I have in my database:
<div class="test">SOME TEXT HERE</div>
But it is also displayed like that. I would like it to format the text as if it was HTML. So it would just display:
SOME TEXT HERE
But that it would also be in a div with the class: "test"
What would be the best approach to reach this goal?
I'm using Twig in an MVC setup to render the page, so the page renderer looks like this:
public function renderArticle() {
$twig = new TwigHelper();
$args['title'] = "Artikel $this->articleId";
$args['blogHTML'] = BlogController::retrieveBlogHTML($this->articleId);
echo $twig->render('article.twig', $args);
}
And the "BlogController::retrieveBlogHTML" goes like this:
public static function retrieveBlogHTML($id) {
$db = DatabaseHelper::get();
$st = $db->prepare("SELECT PageHTML FROM T_Blog WHERE BlogId = :BlogId");
$st->execute([
':BlogId' => $id,
]);
if ($st->errorCode() !== \PDO::ERR_NONE) {
return null;
}
return $st->fetchAll();
}
This means that I cannot use JavaScript at this point; if that turns out to be the only way to fix the problem, I'll have to build a workaround.
I don't know whether I am accidentally escaping the output or something along those lines, but I'm not using any headers.
You need to escape the HTML characters (so < becomes &lt; for example).
In JavaScript you can use the HE library, or there's this function, which is generally fine but doesn't cover all the cases that the HE library does:
var encodedStr = rawStr.replace(/[\u00A0-\u9999<>\&]/gim, function(i) {
return '&#'+i.charCodeAt(0)+';';
});
If you're using PHP you can use htmlentities; other languages will have a similar function, either built in or provided via a library.

Use a selector search on HTML code (a string) in a PHP variable, or similar approaches

What I'm currently doing: I have a text area where the user pastes HTML code.
I want to get a certain element of that HTML.
In a plain HTML page this can be done with a jQuery selector,
but I think it's a whole different thing when the HTML is held in a variable as a string.
How can I locate a certain element that way?
The code is:
function searchHtml() {
$html = $_POST; // text area input contains html code
$selector = "#rso > div > div > div:nth-child(1) > div > h3 > a"; //example - the a element with hello world
$getValue = getValueBySelector($selector); //will return hello world
}
function getValueBySelector($selector) {
//what will i do here?
}
searchHtml();
You can look at SimpleHTMLDom Parser (manual at http://simplehtmldom.sourceforge.net/manual.htm). This is a powerful tool for parsing HTML code to find and extract various elements and their attributes.
For your particular case, you can use
// Create a DOM object from the input string
$htmlDom = str_get_html($html);
// Find the required element
$e = $htmlDom->find($selector);
Oh, and you have to pass the provided input value to the getValueBySelector() function :-)

php dom parser return parent and child

I think this is a simple question, but I can't sort it out. I am trying to get all heading tags with the Simple PHP DOM parser. My code works only one way, for example:
$heading['h2']=$html->find('h2 a');//works fine
I have found some sites wrap the h2 within the a tag like this
<a href='#'><h2> my heading</h2></a>
The problem is trying to get both tags so I can display the link with it. So when I do this
$heading['h2']=$html->find('a h2');
I get the h2 fine, but it does not wrap the link tag around it. That of course makes sense, since the selector finds all h2 tags that are children of a. But how do I get the entire parent tag? What I want it to return is
<a href='#'><h2>My Headings</h2></a>
then I can just print the output with
echo $headings['h2']; //and the link will be there
If the <a href="[..]"> is just the outer element, you can do it like this:
$heading['h2']=$html->find('a h2');
foreach ($heading['h2'] as $h2) {
echo $h2->parent(), "\n";
}
You could also go up the DOM tree until you reach an <a> tag:
$heading['h2']=$html->find('a h2');
foreach ($heading['h2'] as $h2) {
$a = $h2;
while ($a && $a->tag != "a") $a = $a->parent(); // climb until an <a> is reached
if (!$a) continue; // no <a> above <h2>
echo $a, "\n";
}
Well, my first thought would be to use
$html->find('a');
But I'm guessing you have multiple links on your page. So the correct practice would then be to use an ID (or a class) to identify your link
<a href='#' id='titleLink'><h2> my heading</h2></a>
And then search for that specific ID:
$html->find('a#titleLink');
I don't know what library you're using and what syntax it supports, but I hope you get the idea anyway.
According to docs: $heading['h2']=$html->find('a > h2')->parent(); would return the anchor tag wrapping the h2, but if you have multiple 'a > h2' in the page, the find function will return an array, so try it and/or use foreach.
$info = $html->find('a,h2');
echo '<a href='.$info[0]->href.'>'.$info[1]->innertext.'</a>';

What does {VARIABLE} in HTML mean, and how do I initialize it?

I have recently been reading someone's code. In it I see a weird HTML text written like {VARIABLE}. What does that syntax mean, and how do I create it? Thanks
In PHP, there's something called "Complex (curly) syntax" (look for this deeper in the page) where you inject variable's values into strings using {} instead of cutting and concatenating the string.
A similar answer can be found here
Another case where the HTML could contain that text is when it is used as a template, like this one in CodeIgniter.
You don't initialize it. It's part of their templating engine.
Regardless of how they are doing it, the idea is to find/replace "{VAR}" with the actual data you want.
var songTemplate = "<li class=\"track\"><span class=\"num\">{{TRACKNUM}}.</span>" +
"<span class=\"title\">{{TITLE}}</span>" +
"<span class=\"duration\">{{DURATION}}</span></li>";
var songs = [ { tracknum : 1, title : "Speak to Me/Breathe", duration : "4:13" },
{ tracknum : 2, title : "On the Run", duration : "3:36" },
{ tracknum : 3, title : "Time", duration : "7:01" } ];
function makeTrack (song, template) {
// start from the template and fill in each placeholder in turn
var track = template;
track = track.replace("{{TRACKNUM}}", song.tracknum);
track = track.replace("{{TITLE}}", song.title);
track = track.replace("{{DURATION}}", song.duration);
return track;
}
function trackList (songs, template) {
var list = "<ul class=\"tracklist\">";
songs.forEach(function (song) {
list += makeTrack(song, template);
});
list += "</ul>";
return list;
}
var songlist = trackList(songs, songTemplate);
parentEl.innerHTML = songlist;
The basic idea, regardless of what language is used to template it, is that you start with a string of HTML, pull out what you know you want to replace, and put in the data that you want.
I've shown you an ugly, ugly template (it'd be better if I only had to write in an array of variable names, and it did the rest... ...or if it looked through the string to find {{X}} and then looked through an object for the right value to replace what it found).
This also has security holes, if you don't control both the template and the data (if you allow for end-user input anywhere on your site, then you don't have control).
But this should be enough to show how templates do what they do, and why.
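The less ugly variant described above, scanning the string for {{X}} and looking each name up in an object, can be sketched roughly as follows. renderTemplate and escapeHtml are hypothetical helper names; the lowercase key lookup is an assumption chosen to match the songs objects, and the escaping is one way to narrow the security hole mentioned, not a complete defence.

```javascript
// Escape characters that are significant in HTML so that user-supplied
// values cannot inject markup into the rendered template.
function escapeHtml(value) {
  var map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(value).replace(/[&<>"']/g, function (ch) {
    return map[ch];
  });
}

// Replace every {{NAME}} with data[name] (key lowercased by assumption).
// Placeholders with no matching key are left in place untouched.
function renderTemplate(template, data) {
  return template.replace(/\{\{(\w+)\}\}/g, function (placeholder, name) {
    var key = name.toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(data, key)) return placeholder;
    return escapeHtml(data[key]);
  });
}
```

With this in place, makeTrack(song, template) could simply return renderTemplate(template, song), and adding a new field to the template no longer requires another replace call.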
