PHP: How to remove non-language characters from a string?

How can I remove all non-language characters?
I want to remove characters like these, and all other characters that don't belong to any language:

I'm using this:
$text = preg_replace("/[^a-z0-9A-Z\-\'\|\!\.\?\:\)\(\;\*\"]/u", " ", $text);
This works well for English, but I need to allow the characters of all languages, such as Russian, Arabic, Hebrew, Japanese...
Are there any string functions I can use to keep all language characters?
Thanks

No regex will be perfect for what you want; language and writing are just too complex for that. But an approximation could be:
preg_replace('/[^\p{L}\p{M}\p{Z}\p{N}\p{P}]/u', ' ', $text);
This replaces with a space any character that does not have one of the Unicode properties “letter” (\p{L}), “mark” (\p{M}), “separator” (\p{Z}), “number” (\p{N}) or “punctuation” (\p{P}).
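A minimal sketch of this answer wrapped in a function; the name keepLanguageChars is hypothetical. Two side effects worth knowing: \p{Z} covers spaces but not tabs or newlines (those are control characters), so line breaks are turned into spaces too, and symbols such as $ or ☺ (category \p{S}) are replaced as well.

```php
<?php
// Hypothetical helper: keep letters (L), combining marks (M), separators (Z),
// numbers (N) and punctuation (P); replace every other code point with a space.
function keepLanguageChars(string $text): string
{
    $result = preg_replace('/[^\p{L}\p{M}\p{Z}\p{N}\p{P}]/u', ' ', $text);
    if ($result === null) {
        // With the u modifier, preg_replace() returns null on invalid UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

echo keepLanguageChars("Привет, мир! ☺ שלום 日本語 123"), "\n";
```

Cyrillic, Hebrew and Japanese text passes through unchanged; only the ☺ becomes a space.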

Tim Pietzcker's answer did not work in my case.
This works (note that it also strips punctuation):
$after = preg_replace('/[^\w\s]+/u','' , $before);
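A short sketch of what this pattern keeps and drops, assuming PHP's u modifier (which also makes \w Unicode-aware): non-Latin letters survive, but commas and exclamation marks are removed along with everything else that is neither a word character nor whitespace.

```php
<?php
// Remove every run of characters that is neither a "word" character (\w)
// nor whitespace (\s). With the u modifier this keeps non-Latin letters,
// but punctuation such as commas and exclamation marks is removed too.
$before = "Привет, мир! Hello, world!";
$after = preg_replace('/[^\w\s]+/u', '', $before);
echo $after, "\n";
```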

Related

Replace All Special Characters Except Language-Specific Ones

Remove everything from the string except the language-specific special signs, characters, etc.
I've been using this method:
$string = preg_replace('/[^A-Za-z0-9\-]/', ' ', $string);
Obviously, it doesn't work with the following languages:
1. Arabic
2. Hindi
3. Spanish (accented characters)
and any other language besides English.
So my question is simple: what is the best way to remove all the special characters from the string?
Try this:
$string = "abcßöäü #.,}* हिंदी عربى";
$string = preg_replace('/[^\w0-9 \-]/u', '', $string);
var_dump($string);
//string(28) "abcßöäü  हद عربى"
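Whether \w matches non-ASCII letters depends on the PHP and PCRE version and configuration. Note also that the Devanagari vowel signs were removed (हिंदी became हद): they are combining marks, not letters, so \w did not match them here.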
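A variant of the answer above, offered as a sketch: it uses \p{L} and \p{N} instead of \w and adds \p{M} (combining marks). In the output shown above the vowel signs of हिंदी were dropped; including \p{M} is one way to keep them.

```php
<?php
// Keep letters (\p{L}), combining marks (\p{M}), numbers (\p{N}),
// spaces and hyphens; remove everything else. Including \p{M} keeps the
// Devanagari vowel signs, so हिंदी survives intact.
$string = "abcßöäü #.,}* हिंदी عربى";
$string = preg_replace('/[^\p{L}\p{M}\p{N} \-]/u', '', $string);
var_dump($string);
```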

Replace all kinds of dashes

I have an Excel document which I import into MySQL using a library.
Some of the texts in the document contain dashes which I thought I had replaced, but apparently not all of them.
-, –, - <- all of these are different characters.
Is there any way I could replace all kinds of dashes with this one: -
The main problem is that I don't know all of the dashes that exist.
Just use a regex with the Unicode modifier u and the Unicode property \p{Pd}:
$output = preg_replace('#\p{Pd}#u', '-', $input);
From the manual: Pd = Dash punctuation
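A minimal sketch of the \p{Pd} answer as a function; the name normalizeDashes is hypothetical. One caveat: U+2212 MINUS SIGN is classified as a math symbol (Sm), not as dash punctuation, so it is not matched by \p{Pd}. The sketch adds it explicitly, which is an assumption about what should count as a "dash".

```php
<?php
// Hypothetical helper: replace every Unicode "dash punctuation" character
// (\p{Pd}) plus U+2212 MINUS SIGN (category Sm, added by assumption)
// with an ASCII hyphen-minus.
function normalizeDashes(string $input): string
{
    $result = preg_replace('/[\p{Pd}\x{2212}]/u', '-', $input);
    if ($result === null) {
        // preg_replace() returns null on invalid UTF-8 when the u modifier is set.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result;
}

// "\u{...}" escapes (PHP 7+) make the look-alike characters explicit.
echo normalizeDashes("2010\u{2013}2020 \u{2014} a \u{2212} b"), "\n";
```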
How about:
$string = str_replace(array('-','–','-','—', ...), '-', $string);
Try the code above. If you still see some dashes that aren't replaced, just add them to the array.
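A sketch of the str_replace approach with a hand-maintained list. Because many dashes look identical in an editor, the list here is written with "\u{...}" escapes (PHP 7+) so each code point is visible; which characters belong in the list is an assumption you would adjust to your data.

```php
<?php
// Hypothetical, hand-maintained list of dash look-alikes, written as
// "\u{...}" escapes so the different characters are visible in the source.
$dashes = [
    "\u{2010}", // HYPHEN
    "\u{2011}", // NON-BREAKING HYPHEN
    "\u{2012}", // FIGURE DASH
    "\u{2013}", // EN DASH
    "\u{2014}", // EM DASH
    "\u{2015}", // HORIZONTAL BAR
    "\u{2212}", // MINUS SIGN
];

$string = "A\u{2013}B\u{2014}C";
$string = str_replace($dashes, '-', $string);
echo $string, "\n"; // A-B-C
```

Unlike \p{Pd}, this only catches the characters you list, so a dash you have not seen yet will slip through.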

Replace all HTML codes with preg_replace

I want to replace all HTML codes with a space. I think I should use the preg_replace function, but I'm not sure how to do that when the HTML codes look like this:
&#8221;
&#946;
$text = "&#946; something &#8221; test...";
$text=preg_replace("&# [what should be here?] ;", " ", $text);
echo $text;
result = something test...
I think they are only numeric, because I found only numeric ones here: http://www.ascii.cl/htmlcodes.htm
You might look at strip_tags, but note that it removes HTML tags, not entities. Also, those aren't HTML codes; they are called HTML entities.
The regex to match what you want looks like this:
(&#.+?;)
It's rather simple: look for &#, then any characters (as few as possible) up to the next ;.
Edit: As Qtax pointed out, they don't have to be numbers. The dot matches any character except a newline.
HTML character references can be defined in two ways. Assuming that you only want to replace numeric character references, you need a regular expression that parses these formats:
&#D; where D is a decimal number
&#xH; where H is a hexadecimal number
The regex that takes care of both:
/&#(\d+|x[\da-f]+);/i
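A minimal sketch using that regex with preg_replace; the function name replaceNumericRefs is hypothetical. It replaces decimal and hexadecimal references and leaves named entities such as &amp; untouched.

```php
<?php
// Hypothetical helper: replace decimal (&#946;) and hexadecimal (&#x3B2;)
// numeric character references with a space; named entities are left alone.
function replaceNumericRefs(string $text): string
{
    return preg_replace('/&#(\d+|x[\da-f]+);/i', ' ', $text);
}

echo replaceNumericRefs("&#946; something &#x201D; test &amp;"), "\n";
```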
If you want to replace all HTML entities, such as &foo; or &frac12;, you could use something like:
preg_replace('/&(?:[a-z][a-z\d]*|#x[\da-f]+|#\d+);/i', ' ', $text);
If you want to decode them, use html_entity_decode.
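A sketch covering both options: replacing every entity with a space, and decoding with html_entity_decode instead. Named entities can contain digits (e.g. &frac12;), which is why the name part of the pattern here is [a-z][a-z\d]*. The function name replaceEntities is hypothetical, and the ENT_QUOTES | ENT_HTML5 flags and UTF-8 charset are assumptions about your input.

```php
<?php
// Hypothetical helper: replace named (&beta;, &frac12;), hexadecimal and
// decimal references with a space. A bare "&" not followed by a valid
// name/number and ";" is left alone.
function replaceEntities(string $text): string
{
    return preg_replace('/&(?:[a-z][a-z\d]*|#x[\da-f]+|#\d+);/i', ' ', $text);
}

$text = "&#946; something &#8221; test &frac12;";
echo replaceEntities($text), "\n";

// Decoding instead of removing: turns the references into UTF-8 characters.
echo html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n";
```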
&<something>; is the syntax of an HTML entity. If you want to replace all of them, use this regexp:
preg_replace('/&.*?;/', '', $subject); // from ampersand till the next semicolon
It will replace all HTML entities with an empty string, including &auml;, &x20; and others.
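A short demonstration of the trade-off with this lazy pattern: it removes everything from an ampersand up to the next semicolon, so it also removes text that is not an entity at all. A stricter pattern that only accepts an entity name or number between & and ; avoids this.

```php
<?php
// The lazy pattern matches from any "&" to the next ";", so a plain
// ampersand followed later by a semicolon is removed together with the text between.
$subject = "Tom & Jerry; ok";
$result = preg_replace('/&.*?;/', '', $subject);
echo $result, "\n";
```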

PHP - Group by similar words

I was just wondering how we could group or separate similar words in PHP or MySQL. For instance, for the Samsung Galaxy Ace, is it possible to recognize S120, S-120, s120, S-120 as the same model?
Is this even possible?
Thanks
What you could do is strip all non-alphanumeric characters (including spaces) and strtoupper() the string.
$new_string = preg_replace("/[^a-zA-Z0-9]/", "", $string);
$new_string = strtoupper($new_string);
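The two lines above, sketched as a reusable function (the name normaliseModel is hypothetical) and applied to a few spellings of the same model code.

```php
<?php
// Hypothetical helper: normalise a model code by dropping everything except
// ASCII letters and digits, then upper-casing, so variants compare equal.
function normaliseModel(string $string): string
{
    $new_string = preg_replace("/[^a-zA-Z0-9]/", "", $string);
    return strtoupper($new_string);
}

foreach (['S120', 'S-120', 's120', 's 120'] as $variant) {
    echo $variant, ' => ', normaliseModel($variant), "\n";
}
```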
Only those? Easily.
/S-?120/i
But if you want to extend this, you'll probably need to move from regex to something a little more sophisticated.
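A quick check of which variants the pattern accepts. It allows only an optional hyphen, so "S 120" (with a space) is not matched. The pattern is also unanchored, so it would match inside longer strings such as "S1200"; /^S-?120$/i would restrict it to the whole string.

```php
<?php
// The answer's pattern: an S (any case), an optional hyphen, then 120.
$variants = ['S120', 'S-120', 's120', 's-120', 'S 120'];
$matching = array_values(array_filter(
    $variants,
    fn ($v) => preg_match('/S-?120/i', $v) === 1 // arrow functions need PHP 7.4+
));
print_r($matching);
```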
The best thing to do here is to pick a format and standardise on it. So for your example, you would just store S120, and when you get a value from a user, strip all non-alphanumeric characters from it and convert it to upper case.
You can do this in PHP with this code:
$result = strtoupper(preg_replace('/(\W|_)+/', '', $userInput));
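To connect this back to the "group by" part of the question, here is a sketch that groups raw inputs under their standardised form. The function name standardise and the sample inputs are hypothetical; the regex is the one from this answer. Without the u modifier, \W treats non-ASCII bytes as non-word characters, so accented letters would be stripped as well.

```php
<?php
// Hypothetical helper built on the answer's regex: drop non-word characters
// and underscores, then upper-case.
function standardise(string $userInput): string
{
    return strtoupper(preg_replace('/(\W|_)+/', '', $userInput));
}

// Group the raw inputs by their standardised key.
$inputs = ['S120', 'S-120', 's120', 's_120', 'Galaxy Ace'];
$groups = [];
foreach ($inputs as $input) {
    $groups[standardise($input)][] = $input;
}
print_r($groups);
```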

PHP Regex Problem:

$string1 = preg_replace('/[^A-Za-z0-9äöü!&_=\+-]/', ' ', $string4);
This Regex shouldn't replace the chars äöü.
In Ruby it worked as expected.
But in PHP it replaces also the ä ö and ü.
Can someone give me a hint how to fix it?
Set the u pattern modifier (it tells PHP to treat the pattern and the subject as UTF-8):
'/[^A-Za-z0-9äöü!&_=\+-]/u'
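A minimal sketch of the fixed call, assuming the PHP file itself is saved as UTF-8 (otherwise the äöü in the pattern are different bytes). Letters not in the list, such as ß, are still replaced.

```php
<?php
// With the u modifier each of ä, ö, ü is a single character inside the
// negated class instead of a sequence of bytes.
$string4 = "Grüße aus Köln & Wien!";
$string1 = preg_replace('/[^A-Za-z0-9äöü!&_=\+-]/u', ' ', $string4);
echo $string1, "\n";
```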
I think this should work:
$string1 = preg_replace('/[^A-Za-z0-9\pL!&_=\+-]/u', ' ', $string4);
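A sketch of the \pL approach on the same kind of input. \pL matches any Unicode letter, so ß (and é, ñ, ...) is kept without being listed. Note that the character class must begin with an unescaped [; escaping it as \[ turns it into a literal bracket.

```php
<?php
// \pL (any Unicode letter) inside the negated class keeps ß, ä, ö, ü and
// other letters; everything else except digits and !&_=+- becomes a space.
$string4 = "Grüße aus Köln & Wien!";
$string1 = preg_replace('/[^A-Za-z0-9\pL!&_=\+-]/u', ' ', $string4);
echo $string1, "\n";
```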
Unicode support was one of the features promised for PHP 6, which was never released.
In PHP 5, use the multibyte string functions, such as mb_ereg.
With preg_match and preg_replace, PHP interprets '/regex/u' as UTF-8.
