Replace all html codes by preg_replace - php

I want to replace all HTML codes with a space. I think I should use the preg_replace function, but I'm not sure how to do that when the HTML codes look like this:
&#8221;
&#946;
$text = "&#946; something &#8221; test...";
$text = preg_replace("&# [what should be here?] ;", " ", $text);
echo $text;
result = something test...
I think it should be only numeric, because I found only numeric ones here: http://www.ascii.cl/htmlcodes.htm

You could look at strip_tags, although it strips tags rather than entities. Note that those aren't HTML codes; they are called HTML entities.
The regex to match what you want looks like this:
(&#.+?;)
It's rather simple: look for &#, then any repeated characters up to the next ;.
Edit: As Qtax pointed out, they don't have to be numbers. The dot matches any character.

HTML character references can be defined in two ways. Assuming that you only want to replace numeric character references, you need a regular expression that parses these formats:
&#D; where D is a decimal number
&#xH; where H is a hexadecimal number
The regex that takes care of both:
/&#(\d+|x[\da-f]+);/i

If you want to replace all HTML entities like &foo; you could use something like:
preg_replace('/&(?:[a-z]+|#x[\da-f]+|#\d+);/i', ' ', $text);
If you want to decode them, use html_entity_decode.
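As a minimal sketch of the two options above (replacing numeric references versus decoding them), assuming the sample string from the question is written with its entities spelled out. The whitespace tidy-up step and the variable names are additions for illustration only, not part of any answer:

```php
<?php
// Sample input from the question, with the entities written out.
$text = "&#946; something &#8221; test...";

// Replace decimal (&#946;) and hexadecimal (&#x3b2;) references with a space.
$replaced = preg_replace('/&#(?:\d+|x[\da-f]+);/i', ' ', $text);

// Optional tidy-up (an assumption, not from the answers): collapse runs of whitespace.
$tidy = trim(preg_replace('/\s+/', ' ', $replaced));
echo $tidy, "\n"; // something test...

// Decoding instead of removing: turns the references back into characters.
$decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
echo $decoded, "\n";
```

Without the tidy-up step, the replaced string keeps the extra spaces left behind by each removed reference.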

&<something>; is the syntax for an HTML entity. If you want to replace all of them, use this regexp:
preg_replace('/&.*?;/', '', $subject); // from an ampersand to the next semicolon
It will replace all HTML entities with an empty string, including &auml;, &#x20; and others. Be aware that it also removes any other text that sits between an ampersand and the following semicolon.

Related

str_replace on string only if it INCLUDES a whitespace

I have a substitution code that replaces all instances of XD with an XD smiley face... thing is, should a link include the string 'XD', it then breaks the link.
I want it to only replace the XD, if it is followed by a whitespace, as in 'XD ', except I can't seem to get it to work (tried &nbsp, /\s/ and as in 'XD&nbsp')
Chances are I'm getting something really obvious wrong, but I can't find any help (all of it seems to be about removing whitespace, not requiring it), so I'm hoping someone can help me.
Here's the code for reference:
function BB_CODE($content) {
    $content = str_replace("XD", "<img src=\"images/smilies/icon_xd.gif\" alt=\"XD\">", $content);
    return $content; // return the result so the caller can use it
}
The content is user input. Thanks for any help!
You should surround "XD" with %
$content= str_replace("%XD%", "<img src=\"images/smilies/icon_xd.gif\" alt=\"XD\">", $content);
EDIT :
Or using preg_replace
preg_replace("/XD/", "<img src=\"images/smilies/icon_xd.gif\" alt=\"XD\">", $content);
So you want to replace XD only if it's alone?
preg_replace('/\bXD\b/', '(ಠ‿ಠ)', "Then I was like XD")
Use \b to watch for word boundaries instead of \s. This means it works at the beginning and the end of string too, like in my example.
With preg_replace() there are two common gotchas:
The delimiter character: I used / here by convention, but in my own code I prefer %. You could write the regex as '%\bXD\b%' with the same meaning.
Escaping the backslashes: I used a single-quoted string so I don't have to escape the backslash in \b. If you use double quotes, it is safest to escape it, like so: "/\\bXD\\b/"
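Here is a rough sketch that combines the word boundary with what the question literally asks for (XD followed by whitespace). The function name replace_xd_smiley and the example URL are made up for illustration; the image path is the one from the question:

```php
<?php
// Hypothetical helper name; the <img> markup is taken from the question.
function replace_xd_smiley(string $content): string
{
    $img = '<img src="images/smilies/icon_xd.gif" alt="XD">';
    // \b stops "XD" from matching in the middle of a longer word; the
    // lookahead requires whitespace or the end of the string right after it.
    return preg_replace('/\bXD(?=\s|$)/', $img, $content);
}

echo replace_xd_smiley("lol XD see http://example.com/XDfile"), "\n";
```

The lookahead does not consume the whitespace, so spacing in the message is preserved. A link that ends in /XD and is followed by a space would still be matched, so this only narrows the problem rather than solving it for every possible URL.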

regular expression gone wrong

I want to find all strings looking like [!plugin=tesplugin arg=dfd arg=2!] and put them in array.
Important feature: the string could contain arg=uments or NOT(in some cases). and of course there could be any number of arg's. So the string could look like:
[!plugin=myname!] or [!plugin=whatever1 arg=22!] or even [!plugin=gal-one arg=1 arg=text arg=tx99!]. I need to put them all in $strarray items
Here is what i did...
$inp = "[!plugin=tesplugin arg=dfd!] sometxt [!plugin=second arg=1 arg=2!] 1sd";
preg_match_all('/\[!plugin=[a-z0-9 -_=]*!]/i', $inp, $str);
but $str[0][0] contains:
[!plugin=tesplugin arg=dfd!] sometxt [!plugin=second arg=1 arg=2!]
instead of putting each expression in a new array item..
I think my problem in regex.. but can't find one. Plz help...
The last ] should be escaped, and the - in the character class needs to be at the start, at the end, or escaped. As written, " -_" is a range of ASCII characters from space to underscore, which includes !, [ and ], so the greedy * runs on to the last !] in the input.
\[!plugin=[a-z0-9 \-_=]*!\]
Regex101 Demo: https://regex101.com/r/zV4bO2/1
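For reference, a small sketch of the corrected pattern applied to the input from the question. The variable names $matches and $strarray are only for illustration:

```php
<?php
$inp = "[!plugin=tesplugin arg=dfd!] sometxt [!plugin=second arg=1 arg=2!] 1sd";

// The hyphen is escaped, so the class no longer contains the range from space to underscore.
preg_match_all('/\[!plugin=[a-z0-9 \-_=]*!\]/i', $inp, $matches);

// $matches[0] holds the full matches, one array item per tag.
$strarray = $matches[0];
print_r($strarray);
```

With the escaped hyphen, the class can no longer match ! or ], so each tag ends at its own !] and lands in a separate array item.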

Encoding SEO friendly URL

I am trying to encode a phrase in order to pass it inside a URL. Currently it works fine with basic words, where spaces are replaced with dashes.
<a href="./'.str_replace(' ', '-', preg_replace("/[^A-Za-z0-9- ]/", '', $phrase)).'">
It produces something like:
/this-is-my-phase
On the page that this URL takes me I am able to replace the dashes with spaces and query my db for this phrase.
The problem I have is if the phrase contains apostrophe. My current script removes it. Is there any way to preserve it or replace with some URL-friendly character to accommodate something like?
this is bob's page
There is a PHP standard library function, urlencode(), which encodes non-alphanumeric characters as %XX, where XX is the hex value of the character.
If the limitations of that conversion (&, ©, £, etc.) are not acceptable, see rawurlencode().
If you want to allow another character, you have to add it to this section: ^A-Za-z0-9- so if, for example, you wish to allow ' the regex will be [^A-Za-z0-9-' ]
If you only need to replace all the apostrophes ('), then you can replace it with the URL-encoded character %27:
$url = str_replace("'", "%27", $url);
EDIT
If you want to replace all URL-non-safe characters, use a built-in function like in @wallyk's answer. It's much simpler.
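To illustrate the built-in functions mentioned in these answers, here is a short sketch using the phrase from the question. The variable names are arbitrary:

```php
<?php
$phrase = "this is bob's page";

// urlencode() turns spaces into + and the apostrophe into %27.
$encoded = urlencode($phrase);
echo $encoded, "\n"; // this+is+bob%27s+page

// rawurlencode() uses %20 for spaces instead of +.
$raw = rawurlencode($phrase);
echo $raw, "\n"; // this%20is%20bob%27s%20page

// urldecode() restores the original phrase on the receiving page.
echo urldecode($encoded), "\n"; // this is bob's page
```

Either form keeps the apostrophe recoverable, at the cost of a less readable URL than the dash-separated version.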

How do you loop through a string in php and replace '%' characters that are ONLY followed by another '%' character?

Here is what I am trying to achieve in PHP:
I have this string: host/%%%25asd%%
Now I want to loop through it and replace only the % _blank characters with %25. So I get the output as host/%25%25%25asd%25%25. (The %25 was untouched because the % wasn't followed by another %)
How should I go about doing this? With a regex? If so, do you have an example? Or should I loop through every character in the string and replace? I was thinking about using strpos for this, but after one replacement the positions in the string would change :(
[Edit: Let me add a bit more information to clear up the confusion. %25 is just an example; it could be anything like %30 or %0a, and I won't know beforehand. The string could also be host/%%25asd%%, so a simple replace of %% would mangle it into host/%2525asd%25 instead of host/%25%25asd%25. What I am trying to achieve is to parse a URL into the form Google wants for their websafe API: http://code.google.com/apis/safebrowsing/developers_guide_v2.html#Canonicalization. Take a look at their weird examples.]
Use preg_replace:
$string = preg_replace('/%(?=%)/', '%25', $string);
Note the lookahead assertion. This matches every % that is followed by a % and replaces it with %25.
Result is:
host/%25%25%25asd%25%
EDIT Missed the case for the last %, see:
$string = preg_replace('/%(?=(%|$))/', '%25', $string);
So the lookahead assertion checks the next character for another % or the end of the line.
How about a simple string (non-regex) replace of '%%' by '%25%25'?
This is assuming you indeed want the output to be host/%25%25%25asd%25%25 as you mentioned and not one %25 at the end.
edit: This is another method that might work in your case:
Use the methods urlencode and urldecode, e.g.:
$string = urlencode(urldecode("host/%%%25asd%%"));
Use str_replace instead of preg_replace; it's a lot easier to apply and cleaner.
How about something like this?
s/%(?![0-9a-f]+)/%25/ig;
$str = 'host/%%%25asd%%';
$str =~ s/ % (?![0-9a-f]+) /%25/xig;
print $str."\n";
host/%25%25%25asd%25%25
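The snippet above is Perl. A possible PHP equivalent, sketched under the assumption that a valid escape is a % followed by exactly two hex digits (the helper name escape_bare_percents is made up for illustration):

```php
<?php
// Hypothetical helper: escape every % that does not start a %XX escape.
function escape_bare_percents(string $s): string
{
    // Negative lookahead: match % only when it is NOT followed by two hex digits.
    return preg_replace('/%(?![0-9a-f]{2})/i', '%25', $s);
}

echo escape_bare_percents('host/%%%25asd%%'), "\n"; // host/%25%25%25asd%25%25
echo escape_bare_percents('host/%%25asd%%'), "\n";  // host/%25%25asd%25%25
```

Because the lookahead checks for a hex pair rather than for another %, existing escapes such as %30 or %0a are left alone, and a trailing % is handled without a special case.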

How to clean a string by removing anything that is not a letter in PHP

Let's say I have an HTML document.
How can I remove everything from the document?
I want to remove the HTML tags
I want to remove any special character
I want to remove everything except letters
and extract the text
Thanks
You can use strip_tags and preg_replace to accomplish this:
function clean($in)
{
    // Remove HTML
    $out = strip_tags($in);
    // Filter all other characters
    return preg_replace("/[^a-z]+/i", "", $out);
}
[^a-z] will match any character other than a to z, the + sign specifies that it should match a run of any length of such characters, and the /i modifier makes the search case-insensitive. All matched characters will be replaced with an empty string, leaving only the letters.
If you want to keep spaces you can use [^a-z ] instead and if you want to keep numbers as well [^a-z0-9 ]. This allows you to whitelist all allowed characters and discard the rest.
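The three whitelists described above can be compared side by side. This is only a sketch; the sample HTML and the variable names are invented for illustration:

```php
<?php
$html = "<p>Hello, World 42!</p>";
$text = strip_tags($html); // "Hello, World 42!"

// Letters only.
$lettersOnly = preg_replace('/[^a-z]+/i', '', $text);

// Letters and spaces (note the trailing space left where "42!" was removed).
$withSpaces = preg_replace('/[^a-z ]+/i', '', $text);

// Letters, digits and spaces.
$withDigits = preg_replace('/[^a-z0-9 ]+/i', '', $text);

echo $lettersOnly, "\n"; // HelloWorld
echo $withDigits, "\n";  // Hello World 42
```

Wrapping the result in trim() would remove the leftover trailing space in the second variant.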
Use strip_tags() to get rid of HTML first, then use Emil H's regex.
Prepend a
$in = preg_replace("/<[^>]*>/", "", $in);
to Emil H's solution, so your tags will get stripped. Otherwise, "<p>Hello World</p>" will appear as "pHelloWorldp".
