I have been trying to remove a junk character from a stream of HTML strings using PHP, but haven't been successful yet. Is there any special syntax or logic for removing special characters from a string?
I have tried this so far, but it isn't working:
$new_string = preg_replace("�", "", $HtmlText);
echo '<pre>'.$new_string.'</pre>';
\p{S}
You can use this. \p{S} matches math symbols, currency signs, dingbats, box-drawing characters, etc.
See demo.
https://www.regex101.com/r/rK5lU1/30
$re = "/\\p{S}/u"; // the u modifier makes PCRE read the subject as UTF-8, so multi-byte symbols can match
$str = "asdas�sadsad";
$subst = "";
$result = preg_replace($re, $subst, $str);
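If the junk character is specifically the Unicode replacement character (U+FFFD, shown as "�"), a narrower pattern may be enough. The following is a minimal sketch, assuming the input is valid UTF-8 and PHP 7 or later; the helper name strip_replacement_char is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: removes only U+FFFD (the replacement character).
// Assumes $text is valid UTF-8; the u modifier makes PCRE treat it as such.
function strip_replacement_char(string $text): string
{
    $clean = preg_replace('/\x{FFFD}/u', '', $text);
    // preg_replace returns null on error (for example on invalid UTF-8),
    // so fall back to the unchanged input in that case.
    return $clean ?? $text;
}

echo strip_replacement_char("asdas\u{FFFD}sadsad"), "\n"; // prints asdassadsad
```

Unlike \p{S}, this leaves currency signs and other legitimate symbols in place.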
This is due to a charset mismatch between the database and the front end. Correcting that will fix the issue.
function clean($string) {
return preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
}
$string = @iconv("UTF-8", "UTF-8", $string);
I'm using this code to replace Unicode characters in my string, but what it actually does is remove all characters after the first Unicode character in the string. Is there any other function that can help me do this?
I suggest doing this with preg_replace like this:
preg_replace('/[\x00-\x1F\x7F]/u', '', $string);
or even better:
preg_replace('/[\x00-\x1F\x7F\xA0]/u', '', $string);
If the above does not work for your case, this might:
preg_replace('/[[:cntrl:]]/', '', $string); // removes control characters (a negated class would remove everything else)
There is also the option to filter what you need instead of removing what you do not. Something like this should work:
filter_var($string, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW|FILTER_FLAG_STRIP_HIGH);
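The control-character patterns above also remove tabs and line breaks, which is not always wanted. As a rough sketch, the variant below strips the ASCII control characters but keeps tab, line feed, and carriage return; the function name and the choice of which characters to keep are assumptions made for illustration.

```php
<?php
// Hypothetical helper: strips ASCII control characters except \t, \n and \r.
function strip_control_chars(string $text): string
{
    // Removed: \x00-\x08, \x0B, \x0C, \x0E-\x1F and \x7F.
    // Kept:    \x09 (tab), \x0A (line feed), \x0D (carriage return).
    $clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $text);
    return $clean ?? $text;
}

var_dump(strip_control_chars("a\x00b\x07c\td\n") === "abc\td\n"); // bool(true)
```

Because every byte in the class is below 0x80, this pattern does not need the u modifier and leaves multi-byte UTF-8 sequences untouched.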
I'm making a function that detects and removes all trailing special characters from a string. It should convert strings like:
"hello-world"
"hello-world/"
"hello-world--"
"hello-world/%--+..."
into "hello-world".
Does anyone know a trick for this that doesn't require writing a lot of code?
Just for fun
[^a-z\s]+
Regex demo
Explanation:
[^x]: One character that is not x
\s: "whitespace character": space, tab, newline, carriage return, vertical tab
+: One or more
PHP:
$re = "/[^a-z\\s]+/i";
$str = "Hello world\nhello world/\nhello world--\nhellow world/%--+...";
$subst = "";
$result = preg_replace($re, $subst, $str);
try this
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
or, to keep apostrophes as well, add an escaped apostrophe to the character class:
preg_replace('/[^A-Za-z0-9\-\']/', '', $string); // Also keeps apostrophes.
You could use a regex like this, depending on your definition of "special characters":
function clean_string($input) {
return preg_replace('/\W+$/', '', $input);
}
It replaces any run of non-word characters (\W) at the end of the string (anchored by $) with nothing. \W is equivalent to [^a-zA-Z0-9_], so anything that is not a letter, digit, or underscore is removed. To specify exactly which characters count as special, use a regex like this, listing them inside the [] brackets:
function clean_string($input) {
return preg_replace('/[\/%.+-]+$/', '', $input);
}
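To check the character-class version against the strings from the question, a short sketch like the following could be used. The function name strip_trailing_specials is hypothetical; the pattern is the one shown above.

```php
<?php
// Hypothetical helper using the character-class pattern from the answer.
function strip_trailing_specials(string $input): string
{
    // Remove one or more of / % . + - at the very end of the string.
    // Hyphens inside the string are kept because the match is anchored by $.
    return preg_replace('/[\/%.+-]+$/', '', $input);
}

$samples = ['hello-world', 'hello-world/', 'hello-world--', 'hello-world/%--+...'];
foreach ($samples as $sample) {
    echo strip_trailing_specials($sample), "\n"; // each line prints hello-world
}
```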
This one is what you are looking for:
([^\n\w\d \"]*)$
It removes any trailing run of characters that are not word characters (letters, digits, underscore), spaces, double quotes, or newlines.
Just call it like this :
preg_replace('/([^\n\w\s]*)$/', '', $string);
I'm trying to get words from a string in PHP using preg_split like this:
$result = preg_split('/[^A-Za-z]+/', $text);
but this doesn't work; some words are split.
What am I doing wrong?
Edit: the fact is that it doesn't work with Russian text, e.g. $text = "фыва ывафы фываф";
$result = preg_split('/[^А-яа-я]+/', $text);
[^A-Za-z] only takes ASCII letters into account. You need to split on Unicode non-letters:
$result = preg_split('/\P{L}+/u', $subject);
[^А-Яа-я]+ won't work either, because in the Unicode character set А (U+0410) is not the first Cyrillic letter and я (U+044F) is not the last one: Ё (U+0401), for example, comes before А, and letters such as ӹ (U+04F9) come after я. I don't know Russian at all, so I can't speculate on why this is so.
You can check this easily using your character map program.
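As an illustration of the \P{L} approach, here is a minimal sketch that splits UTF-8 text into words. The function name split_words and the use of PREG_SPLIT_NO_EMPTY are additions for this example, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: splits text into words made of Unicode letters.
function split_words(string $text): array
{
    // \P{L} matches any code point that is not a letter; the u modifier
    // switches PCRE to UTF-8 mode so Cyrillic letters are seen as letters.
    // PREG_SPLIT_NO_EMPTY drops empty pieces caused by leading or
    // trailing separators.
    $words = preg_split('/\P{L}+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    return $words === false ? [] : $words;
}

print_r(split_words("фыва ывафы, фываф!"));
```

The same function handles ASCII text, so no separate pattern is needed per language.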
$str ="As sdf fdasf";
$result = preg_split('/[\b ]/', $str);
edit:
$result = preg_split('/\b\s+/', $str); //this is not for Unicode
I have to remove and replace ASCII newline characters (nl) and carriage return characters (cr) in a PHP string.
I tried the following statement to replace every ASCII (nl) character in $input with a blank space, but it didn't work:
preg_replace('/[\x0a]+/',' ',$input);
Then I tried to replace all the ASCII control characters with blank spaces, using the following statement:
ereg_replace('[[:cntrl:]]', ' ', $encoded); // didn't work
I tried the following statements also but no luck with them:
ereg_replace("[:cntrl:]", "", $pString);
preg_replace('/[\x00-\x1F\x7F]/', '', $input);
preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/', '', $input);
What is the regular expression to remove ASCII newline (nl) and carriage return (cr) characters from a PHP string?
I referred to a few links, listed below:
ASCII Table
Regular Expressions
Regular expression posix
Can't you just use str_replace?
str_replace( array("\n", "\r"), "", $stringinput );
Why use a regexp? What's wrong with
str_replace(array("\n", "\r"), "", $string);
? In a double-quoted PHP string, \n and \r are guaranteed to be the actual line feed and carriage return characters: http://php.net/manual/en/language.types.string.php
If you insist on using preg_replace() for such a simple task, you can use:
$result = preg_replace('/[\r\n]/', '', $subject);
That said, you should use str_replace(array("\n", "\r"), "", $string); as advised previously.
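The str_replace approach can be wrapped up as in the sketch below. The function name strip_newlines is hypothetical; the key detail is that the escape sequences must be written in double quotes.

```php
<?php
// Hypothetical helper: removes every line feed and carriage return.
function strip_newlines(string $text): string
{
    // Double quotes matter: "\n" and "\r" are the control characters,
    // whereas '\n' in single quotes is a backslash followed by the letter n.
    return str_replace(["\r", "\n"], '', $text);
}

var_dump(strip_newlines("line1\r\nline2\nline3") === "line1line2line3"); // bool(true)
```

Note that str_replace returns the new string and leaves its argument unchanged, so the result has to be assigned or returned.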
I've got text from which I want to remove all characters that ARE NOT the following.
desired_characters =
0123456789!&',-./abcdefghijklmnopqrstuvwxyz\n
The last is a \n (newline) that I do want to keep.
To match all characters except the listed ones, use an inverted character set [^…]:
$chars = "0123456789!&',-./abcdefghijklmnopqrstuvwxyz\n";
$pattern = "/[^".preg_quote($chars, "/")."]/";
Here preg_quote is used to escape certain special characters so that they are interpreted as literal characters.
You could also use character ranges to express the listed characters:
$pattern = "/[^0-9!&',-.\\/a-z\n]/";
In this case it doesn't matter whether the literal - in ,-. is escaped or not, because ,-. is interpreted as a character range from , (0x2C) to . (0x2E), which already contains - (0x2D).
Then you can remove those characters that are matched with preg_replace:
$output = preg_replace($pattern, "", $str);
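Put together, the preg_quote variant could look like the sketch below. The function name keep_only and the sample input are made up for illustration, and $allowed is assumed to be non-empty, since an empty list would produce an invalid character class.

```php
<?php
// Hypothetical helper: keeps only the characters listed in $allowed.
// Assumes $allowed is non-empty.
function keep_only(string $text, string $allowed): string
{
    // preg_quote escapes regex metacharacters in the list; passing '/'
    // as the second argument also escapes the pattern delimiter.
    $pattern = '/[^' . preg_quote($allowed, '/') . ']/';
    return preg_replace($pattern, '', $text);
}

$allowed = "0123456789!&',-./abcdefghijklmnopqrstuvwxyz\n";
echo keep_only("Call 555-1234, ok?\n", $allowed);
```

Building the pattern from the list keeps the allowed characters in one place, so changing the whitelist does not require editing the regex by hand.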
$string = 'This is anexample $tring! :)';
$string = preg_replace('/[^0-9!&\',\-.\/a-z\n]/', '', $string);
echo $string; // hisisanexampletring!
^ This is case sensitive, hence the capital T is removed from the string. To allow capital letters as well, use $string = preg_replace('/[^0-9!&\',\-.\/A-Za-z\n]/', '', $string);