PHP regex replace with different action for each result - php

In PHP I'm trying to replace every iframe tag with a paragraph tag that contains the iframe tag.
In other words, I'm trying to surround each iframe tag with a <p> tag that has a random number.
Everything works fine except when $record contains more than one iframe tag; in that case all the <p> tags get the same number.
Here is my code:
$x = rand(1, 99);
$replacement = '<p' . $x . '>$1</p' . $x . '><br>';
$record = preg_replace("/(<iframe.*<\/iframe>)/U", $replacement, $record);
I want each iframe tag to get a wrapping tag with its own unique number, e.g.:
<p1><iframe>sometext</iframe></p1>
<p2><iframe>sometext</iframe></p2>

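Use preg_replace_callback, which calls a function once per match, so each iframe gets its own number:
$s = <<<'HTML'
<iframe>sometext1</iframe>
<iframe>sometext2</iframe>
HTML;
// No /U modifier here: combined with .*? it would flip the quantifier back to greedy.
$re = "/(<iframe[^>]*>.*?<\/iframe>)/";
echo preg_replace_callback($re, function ($a) {
    // Note: rand() can still return the same number twice.
    $x = rand(1, 99);
    return '<p' . $x . '>' . $a[1] . '</p' . $x . '><br>';
}, $s);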
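Since rand(1, 99) can repeat, a counter is one way to guarantee that no two wrappers share a number. The following is only a sketch of that idea: the function name wrap_iframes is hypothetical, and it assumes that sequential numbers (1, 2, 3, ...) are acceptable in place of random ones.

```php
<?php
// Hypothetical helper: wraps each <iframe>...</iframe> in <pN>...</pN>,
// where N counts up from 1, so two wrappers can never collide.
function wrap_iframes(string $html): string
{
    $n = 0;
    // "s" lets . match newlines, "i" ignores case; .*? stops at the first </iframe>.
    return preg_replace_callback('~<iframe\b[^>]*>.*?</iframe>~is', function (array $m) use (&$n) {
        $n++; // captured by reference, so the count survives between matches
        return '<p' . $n . '>' . $m[0] . '</p' . $n . '><br>';
    }, $html);
}

echo wrap_iframes('<iframe>a</iframe><iframe>b</iframe>');
```

The counter is captured by reference (use (&$n)) because the callback is invoked separately for every match.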

Related

PHP strip_tags html validation and bracket check?

At the moment I use strip_tags($content, '<a>') to remove all HTML tags except <a>.
Example 1: for "lorem ipsum dolor <sit amet.....", it cuts everything after "<".
Example 2: if the content starts with "<test lorem ipsum", I get only an empty string.
I tried to check it with a regex, but the outcome is the same:
preg_replace('/<[^>]*>/', '', $content) returns the same result.
I need some way to strip the HTML while keeping legitimate uses of the "<" bracket inside the content.
If you want to clear every tag except plain <a> and </a>, you could just filter them, replace them, then clear the HTML and replace them back, like this:
<?php
$text = "<a> ahahahasjusjhcbzdeu <div>JEY ssjisuj</div>jn<p> here somehing else </p></a>";
$EndText = str_replace("<a>", "&ATL", $text);
$EndText = str_replace("</a>", "&ATR", $EndText);
$EndText = strip_tags($EndText);
$EndText = str_replace("&ATL", "<a>", $EndText);
$EndText = str_replace("&ATR", "</a>", $EndText);
echo htmlspecialchars($EndText);
?>
But a link that carries attributes, such as an href, would get deleted, too.
So you need to pull out the text between <a and > first (that can be done with explode, substr and str_replace), then do the same as in the solution above, and then paste it back in.
Code that does this:
<?php
$text = "<a>Here something</a><div>Again<a href='website.com'>That's a better link</a> Here</div>";
$Texts = explode("<a", $text);
$Begin = strip_tags(array_shift($Texts));
$Middles = [];
foreach ($Texts as &$value) {
$Middle = explode(">", $value)[0];
array_push($Middles, $Middle);
$Position = strpos($value, ">");
$value = substr($value, $Position+1);
$value = str_replace("</a>", "&htlENDA&", $value);
$value = strip_tags($value);
}
$EndText = $Begin;
for ($i = 0; $i < count($Texts); $i++) {
$EndText = $EndText."<a".$Middles[$i].">".$Texts[$i];
}
$EndText = str_replace("&htlENDA&", "</a>", $EndText);
echo "<br><br>Ende: ".htmlspecialchars($EndText);
?>
That would solve your problem, as it deletes every HTML tag except <a ... > and </a>.
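For the stray "<" cases from the question, another option is to remove only sequences that look like complete tags, so a "<" with no matching ">" is left alone. This is a heuristic sketch, not a full HTML parser: the function name strip_tags_keep_links is hypothetical, and text such as "a <b and c> d" would still be treated as a tag.

```php
<?php
// Hypothetical helper: removes complete tags other than <a ...> and </a>.
// A "<" that never closes with ">" is not a complete tag and stays in the text.
function strip_tags_keep_links(string $content): string
{
    // </?           opening or closing tag
    // (?!a[\s>/])   skip the <a> element itself
    // [a-z][a-z0-9]* tag name, then optional attributes without < or >
    return preg_replace('~</?(?!a[\s>/])[a-z][a-z0-9]*(?:\s[^<>]*)?/?>~i', '', $content);
}

echo strip_tags_keep_links('lorem <b>ipsum</b> dolor <sit amet <a href="x">link</a>');
```

Here "<sit amet " survives because the next angle bracket after it is another "<", not a ">".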

php remove all attributes from a tag

Here is my code:
$content2= preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $content1);
This code removes all attributes from all tags in my website, but what I want is to only remove attributes from the form tag. This is what I have tried:
$content2 = preg_replace("/<form([a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $content1);
and
$content2 = preg_replace("/<(form[a-z][a-z0-9]*)[^>]*?(\/?)>/i",'<$1$2>', $content1);
This should do it for you.
<?php
$content1 = '<form method="post">test</form><form>2</form><form action=\'test\' method="post" type="blah"><img><b>bold</b></form>';
$content2 = preg_replace("~<form\s+.*?>~i",'<form>', $content1);
echo $content2;
Output:
<form>test</form><form>2</form><form><img><b>bold</b></form>
Explanation and demo: https://regex101.com/r/oA1fV8/1
The \s+ requires whitespace after the opening form tag name. If it is there, we presume an attribute follows, so we use .*? to consume everything up to the next >. We don't need capture groups, because the only thing you want is an empty form element, right?
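One caveat with .*? is that the dot does not match line breaks by default, so a form tag whose attributes span several lines would be skipped. A variant using a negated character class avoids that. This is a sketch under the assumption that no attribute value contains a literal ">"; the function name strip_form_attributes is hypothetical.

```php
<?php
// Hypothetical helper: strips attributes from <form> opening tags only.
function strip_form_attributes(string $html): string
{
    // (?=[\s>]) makes sure the tag name is exactly "form" (not e.g. "formfoo");
    // [^>]* consumes the attributes and, unlike ".", also matches newlines.
    return preg_replace('~<form(?=[\s>])[^>]*>~i', '<form>', $html);
}

echo strip_form_attributes("<form method=\"post\"\n action=\"x\">a</form><div class=\"c\">b</div>");
```

Other elements, such as the div in the usage line, keep their attributes.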
Answer from a related question:
<?php
function stripArgumentFromTags( $htmlString ) {
$regEx = '/([^<]*<\s*[a-z](?:[0-9]|[a-z]{0,9}))(?:(?:\s*[a-z\-]{2,14}\s*=\s*(?:"[^"]*"|\'[^\']*\'))*)(\s*\/?>[^<]*)/i'; // match any start tag
$chunks = preg_split($regEx, $htmlString, -1, PREG_SPLIT_DELIM_CAPTURE);
$chunkCount = count($chunks);
$strippedString = '';
for ($n = 1; $n < $chunkCount; $n++) {
$strippedString .= $chunks[$n];
}
return $strippedString;
}
?>
Then call it like this:
$strippedTag = stripArgumentFromTags($initialTag);
Related question with more answers

Multiple occurrences of delimiters within an HTML template

I am facing a problem that I can't get my head around. I thought I would turn to the experts once again to shed some light.
I have an HTML template and within the template I have delimiters like:
[has_image]<p>The image is <img src="" /></p>[/has_image]
These delimiters may occur multiple times within the template, and below is what I am trying to achieve:
Find all occurrences of these delimiters and replace the content between them with an image source, or with an empty string if the image doesn't exist, while keeping the rest of the template intact.
Below is my code. It works for one occurrence, but I am struggling to make it work for multiple occurrences.
function replace_text_template($template_body, $start_tag, $end_tag, $replacement = ''){
$occurances = substr_count($template_body, $start_tag);
$x = 1;
while($x <= $occurances) {
$start = strpos($template_body, $start_tag);
$stop = strpos($template_body, $end_tag);
$template_body = substr($template_body, 0, $start) . $start_tag . $replacement . substr($template_body, $stop);
$x++;
}
return $template_body;
}
$template_body holds the HTML code with the delimiters:
replace_text_template($template_body, "[has_image]", "[/has_image]");
Even if I remove the while loop, it still works for a single delimiter.
I have managed to solve the problem. If anybody finds this useful, please feel free to use the code. However, if anyone finds a better way, please do share it.
function replace_text_template($template_body, $start_tag, $end_tag, $replacement = ''){
$occurances = substr_count($template_body, $start_tag);
$x = 1;
while($x <= $occurances) {
$start = strpos($template_body, $start_tag);
$stop = strpos($template_body, $end_tag);
$template_body = substr($template_body, 0, $start) . $start_tag . $replacement . substr($template_body, $stop);
$template_body = str_replace($start_tag.''.$end_tag, '', $template_body); // replace the tags so on next loop the position will be correct
$x++;
}
return $template_body;
}
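The same loop can also be written without re-scanning the template from the start on each pass, by keeping a running offset for strpos. The following is a sketch of that variant, not the original poster's code: the function name replace_delimited_blocks is hypothetical, and it assumes the delimiters themselves should be dropped along with the old content.

```php
<?php
// Hypothetical helper: replaces every $start_tag ... $end_tag block (delimiters
// included) with $replacement, scanning the template once from left to right.
function replace_delimited_blocks(string $body, string $start_tag, string $end_tag, string $replacement = ''): string
{
    $result = '';
    $offset = 0;
    while (($start = strpos($body, $start_tag, $offset)) !== false) {
        $stop = strpos($body, $end_tag, $start + strlen($start_tag));
        if ($stop === false) {
            break; // unbalanced delimiter: leave the remainder untouched
        }
        // Copy the text before the block, then the replacement instead of the block.
        $result .= substr($body, $offset, $start - $offset) . $replacement;
        $offset = $stop + strlen($end_tag);
    }
    return $result . substr($body, $offset);
}

echo replace_delimited_blocks('a [x]one[/x] b [x]two[/x] c', '[x]', '[/x]', '-');
```

Because the search resumes after each closing delimiter, the positions stay correct for every occurrence.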
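// Alternative: handle every [tag]...[/tag] block in one pass with preg_replace_callback.
function replace_text_template($template_body, $start_tag, $replacement = '') {
    // Escape the tag name for use inside the ~...~ pattern.
    $tag = preg_quote($start_tag, '~');
    // "s" lets .*? span line breaks inside a block.
    return preg_replace_callback("~\[" . $tag . "\].*?\[/" . $tag . "\]~is", function ($matches) use ($replacement) {
        // If the block holds an <img> with a non-empty src that loads as an image, return its URL.
        if (preg_match('~<img.*?src="([^"]+)"~i', $matches[0], $match)) {
            if (is_array(getimagesize($match[1]))) return $match[1];
        }
        return $replacement;
    }, $template_body);
}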
$template_body = <<<EOL
text
[has_image]<p>The image is <img src="" /></p>[/has_image]
abc [has_image]<p>The image is <img src="http://blog.stackoverflow.com/wp-content/themes/se-company/images/logo.png" /></p>[/has_image]xyz
EOL;
echo replace_text_template($template_body, "has_image", "replacement");
Returns:
text
replacement
abc http://blog.stackoverflow.com/wp-content/themes/se-company/images/logo.pngxyz

How to wrap user mentions in an HTML link in PHP?

I'm working on a commenting web application and I want to parse user mentions (#user) as links. Here is what I have so far:
$text = "#user is not #user1 but #user3 is #user4";
$pattern = "/\#(\w+)/";
preg_match_all($pattern,$text,$matches);
if($matches){
$sql = "SELECT *
FROM users
WHERE username IN ('" .implode("','",$matches[1]). "')
ORDER BY LENGTH(username) DESC";
$users = $this->getQuery($sql);
foreach($users as $i=>$u){
$text = str_replace("#{$u['username']}",
"<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
echo $text;
}
The problem is that the user links end up nested inside each other:
<a rel="11327" class="ct-userLink" href="#">
<a rel="21327" class="ct-userLink" href="#">#user</a>1
</a>
How can I avoid links overlapping?
Answer Update
Thanks to the accepted answer, this is what my new foreach loop looks like:
foreach($users as $i=>$u){
$text = preg_replace("/#".$u['username']."\b/",
"<a href='#' title='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
The problem seems to be that some usernames contain other usernames. You replace user1 properly with <a>user1</a>; then user matches inside it and you end up with <a><a>user</a>1</a>. My suggestion is to change your string replace to a regex with a word boundary, \b, required after the username.
The Twitter widget has JavaScript code to do this. I ported it to PHP in my WordPress plugin. Here's the relevant part:
function format_tweet($tweet) {
// add #reply links
$tweet_text = preg_replace("/\B[#@]([a-zA-Z0-9_]{1,20})/",
"#<a class='atreply' href='http://twitter.com/$1'>$1</a>",
$tweet);
// make other links clickable
$matches = array();
$link_info = preg_match_all("/\b(((https*\:\/\/)|www\.)[^\"\']+?)(([!?,.\)]+)?(\s|$))/",
$tweet_text, $matches, PREG_SET_ORDER);
if ($link_info) {
foreach ($matches as $match) {
$http = preg_match("/w/", $match[2]) ? 'http://' : '';
$tweet_text = str_replace($match[0],
"<a href='" . $http . $match[1] . "'>" . $match[1] . "</a>" . $match[4],
$tweet_text);
}
}
return $tweet_text;
}
Instead of parsing for '#user', parse for '#user ' (with a space at the end) or ' #user ' to also avoid wrongly parsing email addresses (e.g. mailaddress#user.com). Maybe ' #user: ' should also be allowed. This will only work if usernames contain no whitespace.
You can go for a custom str_replace function that stops after the first replacement. Something like:
function str_replace_once($needle , $replace , $haystack){
$pos = strpos($haystack, $needle);
if ($pos === false) {
// Nothing found
return $haystack;
}
return substr_replace($haystack, $replace, $pos, strlen($needle));
}
And use it like:
foreach($users as $i=>$u){
$text = str_replace_once("#{$u['username']}",
"<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
You shouldn’t replace one certain user mention at a time but all at once. You could use preg_split to do that:
// split text at mention while retaining user name
$parts = preg_split("/#(\w+)/", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$n = count($parts);
// $n is always an odd number; 1 means no match found
if ($n > 1) {
// collect user names
$users = array();
for ($i=1; $i<$n; $i+=2) {
$users[$parts[$i]] = '';
}
// get corresponding user information
$sql = "SELECT *
FROM users
WHERE username IN ('" .implode("','", array_keys($users)). "')";
$users = array();
foreach ($this->getQuery($sql) as $user) {
$users[$user['username']] = $user;
}
// replace mentions
for ($i=1; $i<$n; $i+=2) {
$u = $users[$parts[$i]];
$parts[$i] = "<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a>";
}
// put everything back together
$text = implode('', $parts);
}
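The single-pass idea can also be expressed with preg_replace_callback, which avoids the index bookkeeping. This is a sketch only: the $users array is a hypothetical stand-in for the result of the database query, and mentions of unknown users are deliberately left as plain text.

```php
<?php
// Hypothetical lookup table (username => user_id) standing in for the SQL query result.
$users = ['user' => 21327, 'user1' => 11327];

// Hypothetical helper: links every known #mention in one pass over the text.
function link_mentions(string $text, array $users): string
{
    return preg_replace_callback('/#(\w+)/', function (array $m) use ($users) {
        $name = $m[1];
        if (!isset($users[$name])) {
            return $m[0]; // unknown user: keep the mention unchanged
        }
        return "<a href='#' class='ct-userLink' rel='{$users[$name]}'>#{$name}</a>";
    }, $text);
}

echo link_mentions('#user is not #user1 but #user3', $users);
```

Because \w+ always consumes the whole name, #user can never match inside #user1, and already inserted links are never scanned again.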
I like dnl's solution of parsing ' #user', but maybe it is not suitable for you.
Anyway, did you try using the strip_tags function to remove the anchor tags? That way you have the string without the links, and you can parse it and build the links again.
strip_tags

Convert clickable anchor tags to plain text in html document

I am trying to match <a> tags within my content and replace them with the link text followed by the url in square brackets for a print-version.
The following example works if there is only the "href". If the <a> contains another attribute, it matches too much and doesn't return the desired result.
How can I match the URL and the link text and that's it?
Here is my code:
<?php
$content = '<a href="http://www.website.com">This is a text link </a>';
$result = preg_replace('/<a href="(http:\/\/[A-Za-z0-9\\.:\/]{1,})">([\\s\\S]*?)<\/a>/',
'<strong>\\2</strong> [\\1]', $content);
echo $result;
?>
Desired result:
<strong>This is a text link </strong> [http://www.website.com]
You should be using DOM to parse HTML, not regular expressions.
Edit: Updated code to do simple regex parsing on the href attribute value.
Edit #2: Made the loop run backwards so it can handle multiple replacements.
$content = '
<p>This is a text link</p>
bah
I wont change
';
$dom = new DOMDocument();
$dom->loadHTML($content);
$anchors = $dom->getElementsByTagName('a');
$len = $anchors->length;
if ( $len > 0 ) {
$i = $len-1;
while ( $i > -1 ) {
$anchor = $anchors->item( $i );
if ( $anchor->hasAttribute('href') ) {
$href = $anchor->getAttribute('href');
$regex = '/^http/';
if ( !preg_match ( $regex, $href ) ) {
$i--;
continue;
}
$text = $anchor->nodeValue;
$textNode = $dom->createTextNode( $text );
$strong = $dom->createElement('strong');
$strong->appendChild( $textNode );
$anchor->parentNode->replaceChild( $strong, $anchor );
}
$i--;
}
}
echo $dom->saveHTML();
?>
You can make the match ungreedy using ?.
You should also take into account there may be attributes before the href attribute.
$result = preg_replace('/<a [^>]*?href="(http:\/\/[A-Za-z0-9\\.:\/]+?)">([\\s\\S]*?)<\/a>/',
'<strong>\\2</strong> [\\1]', $content);
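For the desired print format (link text in <strong>, followed by the URL in square brackets), the DOM approach can be extended so that attribute order and extra attributes no longer matter. This is a sketch under a few assumptions: the function name anchors_to_print_text is hypothetical, the input is an HTML fragment in plain ASCII, and only links whose href starts with http:// or https:// are converted.

```php
<?php
// Hypothetical helper: rewrites <a href="http...">text</a> as <strong>text</strong> [url].
function anchors_to_print_text(string $html): string
{
    $dom = new DOMDocument();
    // Wrap the fragment in a div so there is a known container to serialize later;
    // @ silences libxml warnings about imperfect markup.
    @$dom->loadHTML('<div>' . $html . '</div>');
    // Copy the live node list into an array so replacing nodes does not disturb iteration.
    $anchors = iterator_to_array($dom->getElementsByTagName('a'));
    foreach ($anchors as $anchor) {
        $href = $anchor->getAttribute('href');
        if (!preg_match('~^https?://~i', $href)) {
            continue; // leave relative or non-HTTP links untouched
        }
        $fragment = $dom->createDocumentFragment();
        $strong = $dom->createElement('strong');
        $strong->appendChild($dom->createTextNode($anchor->textContent));
        $fragment->appendChild($strong);
        $fragment->appendChild($dom->createTextNode(' [' . $href . ']'));
        $anchor->parentNode->replaceChild($fragment, $anchor);
    }
    // Serialize only the children of the wrapper div, not the html/body shell.
    $root = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo anchors_to_print_text('Visit <a class="x" href="http://www.website.com">This is a text link</a> now');
```

The class attribute placed before href in the usage line is the case where the original regex matched too much.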
