Removing excess ">" in string in PHP [duplicate]

This question already has answers here:
Remove style attribute from HTML tags
(9 answers)
Closed 9 years ago.
In my code I get a string which has HTML tags, like so:
$string = '<div style="width:100px;">ABC 1234 <span> Test string, testing this string</span></div>';
Now, I removed the style attribute from the said string using preg_replace:
$string = preg_replace('/(<[^>]+) style=".*?"/i', '', $string);
After doing this, the style attribute was gone and the div tag ended up looking like <div>. The problem I encountered is that I now get an excess > after the closing tag for the span, so the string looks like this:
$string = '<div>ABC 1234 <span> Test string, testing this string</span> > </div>';
My question is, why did I suddenly get an excess >? Is there a different regular expression I can use that will get rid of the style attribute without the additional > appearing? Or is there any other way I can get rid of it?
I tried using str_replace twice like so:
$string = str_replace("\n", "", $string);
$string = str_replace(">>", ">", $string);
But that did not work either.
I am not trying to remove the HTML tags, just the style part.

This approach works only for this particular string:
<?php
$string = "<div style=\"width:100px;\">ABC 1234 <span> Test string, testing this string</span></div>";
$string = strip_tags($string,"<span>");
$string = "<div>".$string."</div>";
?>
Now the string is:
<div>ABC 1234 <span> Test string, testing this string</span></div>

I used this:
$string = '<div style="width:100px;">ABC 1234 <span> Test string, testing this string</span></div>';
$output = preg_replace('/(<[^>]+) style=".*?"/i', '$1', $string);
die(htmlentities($output));
and the output is
<div>ABC 1234 <span> Test string, testing this string</span></div>
which is what you need.
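A note on why the replacement string matters: the pattern captures the tag start (<div) in group 1 and then matches the style attribute, so an empty replacement deletes both and only the tag's closing > is left behind, which is most likely the stray > from the question. A minimal sketch comparing the two replacements on the question's string (the names $pattern, $broken and $fixed are only for illustration):

```php
<?php
$string = '<div style="width:100px;">ABC 1234 <span> Test string, testing this string</span></div>';
$pattern = '/(<[^>]+) style=".*?"/i';

// Empty replacement: the captured "<div" is thrown away together with the attribute.
$broken = preg_replace($pattern, '', $string);

// '$1' puts the captured tag start back, so only the style attribute is removed.
$fixed = preg_replace($pattern, '$1', $string);

echo $broken, "\n", $fixed, "\n";
```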

Related

How to preg_match_all to get the text inside the tags "<h3>" and "<h3> <a/> </h3>"

Hello, I am currently creating an automatic table of contents for my WordPress site. My reference is
https://webdeasy.de/en/wordpress-table-of-contents-without-plugin/
Problem:
Everything works well unless the <h3> tag contains an <a> link. In that case the heading is missing from the $names result.
I think the problem comes from this regex section:
preg_match_all("/<h[3,4](?:\sid=\"(.*)\")?(?:.*)?>(.*)<\/h[3,4]>/", $content, $matches);
// get text under <h3> or <h4> tag.
$names = $matches[2];
I have tried modifying the regex (I don't really understand it):
preg_match_all("/<h[3,4](?:\sid=\"(.*)\")?(?:.*)?><a(.*)>(.*)<\/a><\/h[3,4]>/", $content, $matches);
// get text under <a> tag.
$names = $matches[4];
The code above works for finding the text inside <h3><a>a text</a></h3>, but then an h3 tag that doesn't contain an <a> tag is a problem.
My question:
How can I combine the two snippets above?
My expectation is that when the first pattern gives no result, the second one is used instead.
Or maybe there is a better solution? Thank you.
Here's a way that will remove any tags inside of header tags
$html = <<<EOT
<h3>Here's an alternative solution</h3> to using regex. <h3>It may <a name='#thing'>not</a></h3> be the most elegant solution, but it works
EOT;
preg_match_all('#<h(.*?)>(.*?)<\/h(.*?)>#si', $html, $matches);
foreach ($matches[0] as $num=>$blah) {
    $look_for = preg_quote($matches[0][$num],"/");
    $tag = str_replace("<","",explode(">",$matches[0][$num])[0]);
    $replace_with = "<$tag>" . strip_tags($matches[2][$num]) . "</$tag>";
    $html = preg_replace("/$look_for/", $replace_with,$html,1);
}
echo "<pre>$html</pre>";
The answer by @kinglish is the basis of this solution, thank you very much. I slightly modified and simplified it to fit the article linked in my question. This code worked for me:
preg_match_all('#(\<h[3-4])\sid=\"(.*?)\"?\>(.*?)(<\/h[3-4]>)#si',$content, $matches);
$tags = $matches[0];
$ids = $matches[2];
$raw_names = $matches[3];
/* Clean $raw_names of other HTML tags */
$clean_names = array_map(function($v){
    return trim(strip_tags($v));
}, $raw_names);
$names = $clean_names;
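For reference, a self-contained sketch of the snippet above, run against a small made-up $content (the sample headings and ids are assumptions for illustration, not taken from the linked article):

```php
<?php
// Hypothetical sample content; real post content would come from WordPress.
$content = '<h3 id="intro">Intro</h3><p>Text</p>'
         . '<h4 id="more"><a href="#more">More info</a></h4>';

// Same pattern as above: group 2 is the id, group 3 the raw heading contents.
preg_match_all('#(\<h[3-4])\sid=\"(.*?)\"?\>(.*?)(<\/h[3-4]>)#si', $content, $matches);

$ids = $matches[2];

// strip_tags() drops the nested <a>, leaving only the heading text.
$names = array_map(function ($v) {
    return trim(strip_tags($v));
}, $matches[3]);

print_r(array_combine($ids, $names));
```

Because the tags are stripped after matching, the same pattern handles headings with and without a nested link, so no second regex is needed.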

PHP can't take <img> tag from page

I have a problem with the PHP preg_match function.
In the DLE CMS, I am trying to extract a picture from a news item (image-x), but in the module I'm referring to it via a direct link.
//remove <p></p> tags
$row[$i]['short_story'] = str_replace( "</p><p>", " ",$row[$i]['short_story'] );
//remove the \" escapes (DLE put it in the MySQL column)
$row[$i]['short_story'] = str_replace("\\\"", " ", $row[$i]['short_story']);
//remove all tags except <img>, but there remains a simple text that is stored without tags
$row[$i]['img'] = strip_tags($row[$i]['short_story'], "<img>");
//try to find <img> (by '>'), to remove the simple text;
preg_match(".*>", $row[$i]['img'], $matches);
// print only <br/> (matches is empty)
print_r($matches."<br/>\n");
For example, print_r($row[$i]['img']) gives
<img src="somelink" class="fr-fic" fr-dib="" alt=""> Some text
And I need only
<img src="somelink" class="fr-fic" fr-dib="" alt="">
Your regex pattern for selecting <img> is incorrect: it has no delimiters and does not target the <img> tag. Use /<img[^>]+>/ instead. The code should change to:
preg_match("/<img[^>]+>/", $row[$i]['img'], $matches);
You can also use preg_replace() to remove the additional text after <img>:
preg_replace("/(<img[^>]+>)[\w\s]+/", "$1", $string);
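As a minimal, self-contained sketch of the preg_match() suggestion, using the sample value from the question (the variable names $html and $img are only for illustration):

```php
<?php
// Sample value from the question.
$html = '<img src="somelink" class="fr-fic" fr-dib="" alt=""> Some text';

// preg_match() returns 1 on a match and fills $matches[0] with the whole tag.
$img = preg_match('/<img[^>]+>/', $html, $matches) ? $matches[0] : '';

// $matches is an array, so print the element rather than concatenating the array.
echo $img, "<br/>\n";
```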

How to remove plain text from a string after using strip_tags()

So I have a string, and I used the strip_tags() function to remove all tags except <img>, but I still have plain text next to my <img> element. Here is an example:
$myvariable = "This text needs to be removed<a href='blah_blah_blah'>Blah</a><img src='blah.jpg'>";
Using PHP's strip_tags() I was able to remove all tags except the <img> tag (which is what I want), but it did not remove the text.
How do I remove the leftover text? The text will always be either before or after the tag.
[ADDED MORE DETAILS]
$description = 'crazy stuff<img src="https://scontent.cdninstagram.com/t51.2885-15/e15/14287934_1389514537744146_673363238_n.jpg?ig_cache_key=MTMzNzM3MzgwNjAyNDY5NDAzMA%3D%3D.2">';
That's what the variable is actually holding.
Thanks in Advance
Instead of replacing something you can very well extract the values you want:
(<(\w+).+</\2>)
To be used with preg_match(), see a demo on regex101.com.
In PHP:
<?php
$regex = '~(<(\w+).+</\2>)~';
$string = 'crazy stuff<img src="https://scontent.cdninstagram.com/t51.2885-15/e15/14287934_1389514537744146_673363238_n.jpg?ig_cache_key=MTMzNzM3MzgwNjAyNDY5NDAzMA%3D%3D.2">here as well';
if (preg_match($regex, $string, $match)) {
echo $match[1];
}
?>
Please show your whole piece of code with the use of strip_tags.
You can try: preg_replace('~.*(<img[^>]+>)~', '$1', $myvariable);
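Since <img> has no closing tag, a pattern that requires a matching </tag> will not find it, so another option is to collect the <img> tags directly and discard everything else. A rough sketch, assuming a shortened stand-in for $description (the example.com URL and the name $onlyImages are made up for illustration):

```php
<?php
// Shortened, hypothetical stand-in for the $description value in the question.
$description = 'crazy stuff<img src="https://example.com/photo.jpg">here as well';

// Collect every <img ...> tag; the surrounding text is simply never captured.
preg_match_all('/<img\b[^>]*>/i', $description, $matches);
$onlyImages = implode('', $matches[0]);

echo $onlyImages;
```

This works whether the plain text comes before the tag, after it, or both.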

Regex exclude character between string tag [duplicate]

This question already has answers here:
Find an element by id and replace its contents with php
(3 answers)
Closed 8 years ago.
So I have a string that I want to search through using regex, and not any other method such as DOMDocument.
Example:
<div class="form-item form-type-textarea form-item-answer2">
<div class="form-textarea-wrapper resizable"><textarea id="edit-answer2" name="answer2" cols="60" rows="5" class="form-textarea">
this is some text
</textarea>
</div>
</div>
Desired:
this is some text
So what I want is to use a single regex and be left with 'this is some text', which is not fixed and will be dynamic. I will then pass this through preg_replace to get the desired outcome.
Current regex is
div class="form-item.*class="form-textarea">$\A<\/textarea>.*<\/div>/gU
I have tried using the end of string and start of string anchors, but to no avail.
Don't parse HTML with regexes. Use a DOM parser:
$doc = new DOMDocument();
$doc->loadHTML($html);
$textarea = $doc->getElementById("edit-answer2");
echo $textarea->nodeValue;
If you want to modify the value:
$textarea->nodeValue = "foo bar";
$html = $doc->saveHTML();
Your regex would be,
/<textarea id[^>]*>\n([^\n]*)/gs
OR
/<textarea id[^>]*>(.*?)(?=<\/textarea>)/gs
Captured group 1 contains the string this is some text.
OR
You could use the regex below to match only the string this is some text.
/div class="form-item.*class="form-textarea">[^\n]*\n\K[^\n]*/s
DEMO
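Note that the g flag in the patterns above is regex tester notation; PHP's preg functions do not accept it. A minimal sketch of using the lookahead variant from PHP with preg_match(), assuming the HTML from the question is held in a variable named $html:

```php
<?php
// The markup from the question, held in a nowdoc so no escaping is needed.
$html = <<<'HTML'
<div class="form-item form-type-textarea form-item-answer2">
<div class="form-textarea-wrapper resizable"><textarea id="edit-answer2" name="answer2" cols="60" rows="5" class="form-textarea">
this is some text
</textarea>
</div>
</div>
HTML;

$text = null;

// The s modifier lets . match newlines; group 1 is everything up to </textarea>.
if (preg_match('/<textarea id[^>]*>(.*?)(?=<\/textarea>)/s', $html, $m)) {
    // Trim the line breaks that surround the text inside the textarea.
    $text = trim($m[1]);
}

echo $text;
```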

preg_replace only OUTSIDE tags ? (... we're not talking full 'html parsing', just a bit of markdown)

What is the easiest way of applying highlighting of some text excluding text within OCCASIONAL tags "<...>"?
CLARIFICATION: I want the existing tags PRESERVED!
$t =
preg_replace(
"/(markdown)/",
"<strong>$1</strong>",
"This is essentially plain text apart from a few html tags generated with some
simplified markdown rules: <a href=markdown.html>[see here]</a>");
Which should display as:
"This is essentially plain text apart from a few html tags generated with some simplified markdown rules: see here"
... BUT NOT MESS UP the text inside the anchor tag (i.e. <a href=markdown.html> ).
I've heard the arguments of not parsing html with regular expressions, but here we're talking essentially about plain text except for minimal parsing of some markdown code.
Actually, this seems to work ok:
<?php
$item="markdown";
$t="This is essentially plain text apart from a few html tags generated
with some simplified markdown rules: <a href=markdown.html>[see here]</a>";
//_____1. apply emphasis_____
$t = preg_replace("|($item)|","<strong>$1</strong>",$t);
// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules: <a href=
// <strong>markdown</strong>.html>[see here]</a>"
//_____2. remove emphasis if WITHIN opening and closing tag____
$t = preg_replace("|(<[^>]+?)(<strong>($item)</strong>)([^<]+?>)|","$1$3$4",$t);
// this preserves the text before ($1), after ($4)
// and inside <strong>..</strong> ($2), but without the tags ($3)
// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules: <a href=markdown.html>
// [see here]</a>"
?>
A string like $item="odd|string" would cause some problems, but I won't be using that kind of string anyway... (probably needs htmlentities(...) or the like...)
You could split the string into tag/no-tag parts using preg_split:
$parts = preg_split('/(<(?:[^"\'>]|"[^"<]*"|\'[^\'<]*\')*>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
Then you can iterate over the parts, skipping every second one (the tags, which end up at the odd indices), and apply your replacement only to the text parts:
for ($i=0, $n=count($parts); $i<$n; $i+=2) {
    $parts[$i] = preg_replace("/(markdown)/", "<strong>$1</strong>", $parts[$i]);
}
At the end put everything back together with implode:
$str = implode('', $parts);
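Put together, the three steps above could look like the following sketch. The function name highlight_outside_tags and the shortened sample string are assumptions for illustration; preg_quote() is added so that a search term containing regex characters is treated literally.

```php
<?php
// Hypothetical helper: wraps $term in <strong> only where it appears outside tags.
function highlight_outside_tags(string $str, string $term): string
{
    // Split into alternating text / tag parts; the tags are kept as delimiters.
    $parts = preg_split(
        '/(<(?:[^"\'>]|"[^"<]*"|\'[^\'<]*\')*>)/',
        $str,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $pattern = '/(' . preg_quote($term, '/') . ')/';

    // Even indices hold plain text, odd indices hold the captured tags.
    for ($i = 0, $n = count($parts); $i < $n; $i += 2) {
        $parts[$i] = preg_replace($pattern, '<strong>$1</strong>', $parts[$i]);
    }

    return implode('', $parts);
}

$t = 'Some simplified markdown rules: <a href=markdown.html>[see here]</a>';
$result = highlight_outside_tags($t, 'markdown');

echo $result;
```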
But note that this is really not the best solution. It is better to use a proper HTML parser such as PHP’s DOM library. See for example these related questions:
Highlight keywords in a paragraph
Regex / DOMDocument - match and replace text not in a link
First make sure the string starts with a tag by prepending a dummy one, then replace only text that comes after a tag:
$show=preg_replace("|(>[^<]*)(markdown)|i",'$1<strong>$2</strong>',"<null>$t");
Then delete the dummy tag:
$show=preg_replace("|<null>|",'',$show);
You could split your string into an array at every '<' or '>' using preg_split(), then loop through that array and replace only in entries not beginning with a '>'. Afterwards you combine the array back into a string using implode().
This regex should strip all HTML opening and closing tags: /(<.*?>)+/
You can use it with preg_replace like this:
$test = "Hello <strong>World!</strong>";
$regex = "/(<.*?>)+/";
$result = preg_replace($regex,"",$test);
Actually, this is not very efficient, but it worked for me:
$your_string = '...';
$search = 'markdown';
$left = '<strong>';
$right = '</strong>';
$left_Q = preg_quote($left, '#');
$right_Q = preg_quote($right, '#');
$search_Q = preg_quote($search, '#');
while(preg_match('#(>|^)[^<]*(?<!'.$left_Q.')'.$search_Q.'(?!'.$right_Q.')[^>]*(<|$)#isU', $your_string))
    $your_string = preg_replace('#(^[^<]*|>[^<]*)(?<!'.$left_Q.')('.$search_Q.')(?!'.$right_Q.')([^>]*<|[^>]*$)#isU', '${1}'.$left.'${2}'.$right.'${3}', $your_string);
echo $your_string;
