Regex to match all characters except letters and numbers - php

I want to clean the filenames of all uploaded files. I want to remove all characters except periods, letters and numbers. I'm not good with regex so I thought I would ask here.
Can someone show me how to put this together? I'm using PHP.

$newfilename=preg_replace('/[^a-zA-Z0-9.]/','',$filename);

s/[^.a-zA-Z\d]//g
This is the Perl form of the expression. In PHP you do:
$output = preg_replace('/[^.a-zA-Z\d]/', '', $input);

Try to use this:
$cleanString = preg_replace('#\W#', '', $string);
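It will remove everything except letters, digits, and underscores. Note that it also removes periods, so it does not keep the dots the question asks to preserve.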
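To compare the two approaches side by side, here is a minimal sketch. The function name cleanFilename and the sample filename are made up for illustration; only the patterns come from the answers above.

```php
<?php
// Hypothetical helper name; only the pattern is taken from the answers above.
function cleanFilename(string $filename): string
{
    // Remove every character that is not an ASCII letter, a digit, or a period.
    return preg_replace('/[^a-zA-Z0-9.]/', '', $filename);
}

$sample = 'my file (1)_final.jpg'; // made-up sample input

echo cleanFilename($sample), "\n";               // myfile1final.jpg
// \W keeps the underscore and drops the period:
echo preg_replace('/\W/', '', $sample), "\n";    // myfile1_finaljpg
```

The character class approach keeps the extension dot, which is usually what you want for uploaded files.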

Related

How to replace (.) dots by (·) interpuncts in PHP?

French typography has a rule for writing some words in a gender-neutral way by adding an interpunct (·) between letters.
However, a few authors on my website type a simple dot (.) instead.
As a solution, I'd like to create a PHP function that replaces every dot placed between two lowercase letters with an interpunct. But my PHP skills are rather limited… Here is what I'm looking for:
REPLACE THIS:
$string = "T.N.T.: Chargé.e des livreur.se.s."
BY THIS:
$string = "T.N.T.: Chargé·e des livreur·se·s."
Could someone help me please?
Thank you.
Use preg_replace with a pattern that matches three groups: a lowercase letter (including accented French letters), the dot, and another lowercase letter. Then use the first and third captured groups in the replacement, with an interpunct between them:
$string = "T.N.T.: Chargé.e des livreur.se.s.";
$pattern = '/([a-zàâçéèêëîïôûùüÿñæœ])(\.)([a-zàâçéèêëîïôûùüÿñæœ])/u'; //u modifier: treat pattern and subject as UTF-8
$replacement = '$1·$3'; //captured first and third group, and interpunct in the middle
//results in "T.N.T.: Chargé·e des livreur·se·s."
$string_replaced = preg_replace($pattern, $replacement, $string);
More about preg_replace:
https://www.php.net/manual/en/function.preg-replace.php
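Since the question asks for a function, here is a minimal sketch of one. It is a variation on the answer above rather than the same pattern: it assumes UTF-8 input and uses lookarounds with the Unicode property \p{Ll} (any lowercase letter) instead of listing the accented letters. The name dotsToInterpuncts is made up.

```php
<?php
// Hypothetical helper; assumes $text is valid UTF-8.
function dotsToInterpuncts(string $text): string
{
    // (?<=\p{Ll})  the dot must be preceded by a lowercase letter
    // (?=\p{Ll})   and followed by a lowercase letter
    // Lookarounds do not consume the letters, so chains like "a.b.c" are handled.
    return preg_replace('/(?<=\p{Ll})\.(?=\p{Ll})/u', '·', $text);
}

echo dotsToInterpuncts("T.N.T.: Chargé.e des livreur.se.s."), "\n";
// T.N.T.: Chargé·e des livreur·se·s.
```

Like any pattern-based approach, this will also convert dots in lowercase abbreviations or domain names (for example "e.g" or "example.com"), so it only suits text where those do not occur.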
You could use str_replace() if you know the grammar rules surrounding the dots you want to replace. For instance, if every dot between é and e is concerned, you can do:
$bodytag = str_replace("é.e", "é·e", $sourceText);
But you will always risk some side effects. For instance if there is an acronym you don't want to be replaced with this pattern. I don't think there is any magic way to avoid this.
More specifically:
"I'd like to create a function to replace in PHP strings each dots which are placed between two lowercase letters by interpuncts."
This can be achieved with preg_replace() and an appropriate regex.

Using regex for filtering some words in Persian in PHP

I'm working on a script that identifies offensive words in text messages. The problem is that users sometimes alter words to make them unidentifiable. My code has to be able to identify those too, as far as possible.
First of all, I replace all non-alphanumeric characters with spaces.
And then:
I've written two regex patterns.
The first removes repeated characters from the string.
For example, if the user has written seeeeex, it replaces it with sex:
preg_replace('/(.)\1+/', '$1', $text)
This regex works fine for English words but not for Farsi words, which is my case.
For example, if you write:
امیییییییییین
it does nothing with it.
I also tried
mb_ereg_replace
But it didn't work either.
My other regex is meant to remove the spaces around one-letter words.
For example, I want it to convert S E X to sex:
preg_replace('/( [a-zA-Zآ-ی] )\1+/', trim('$1'), $text);
This regex doesn't work at all and needs to be corrected.
Thank you for your help
When working with multi-byte characters, you should enable the Unicode-aware u modifier so that tokens such as . match whole characters rather than single bytes. In your first case it should be:
/(.)\1+/u
Your second regex, however, has both syntax and semantic errors. You could change it to:
/\b(\pL)\s+/u
PHP:
preg_replace('/\b(\pL)\s+/u', '$1', $text);
Putting it all together:
$text = 'سسس ککک سسس';
echo preg_replace(['/(.)\1+/u', '/\b(\pL)\s+/u'], '$1', $text); // outputs: سکس
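To see why the u modifier matters, here is a small sketch that runs the repeated-character pattern with and without it. The test word is the one from the question, written with \u{...} escapes (PHP 7+) so that the exact code points are unambiguous; the variable names are made up.

```php
<?php
// ALEF, MEEM, ten FARSI YEH, NOON: the repeated-letter word from the question.
$word = "\u{0627}\u{0645}" . str_repeat("\u{06CC}", 10) . "\u{0646}";

// Byte mode: "." matches a single byte, and no two adjacent bytes
// of this UTF-8 string are equal, so nothing is replaced.
$byteMode = preg_replace('/(.)\1+/', '$1', $word);

// UTF-8 mode: "." matches one whole character, so the run collapses.
$utf8Mode = preg_replace('/(.)\1+/u', '$1', $word);

var_dump($byteMode === $word);                               // bool(true)
var_dump($utf8Mode === "\u{0627}\u{0645}\u{06CC}\u{0646}");  // bool(true)

// Joining spaced-out single letters with the second pattern from the answer.
echo preg_replace('/\b(\pL)\s+/u', '$1', 'S E X'), "\n";     // SEX
```

Keep in mind that the second pattern also glues genuine one-letter words to the following word (for example "a cat" becomes "acat"), which may or may not be acceptable for word filtering.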

PHP converting plain text to hashtag link

I am trying to convert user's posts (text) into hashtag clickable links, using PHP.
From what I found, hashtags should only contain alpha-numeric characters.
$text = 'Testing#one #two #three.test';
$text = preg_replace('/#([0-9a-zA-Z]+)/i', '#$1', $text);
It places links on all three (#one #two #three), but I think #one should not be converted because it directly follows another alphanumeric character. How can I adjust the regex to fix that?
The third one is also OK: it matches just #three, which I think is correct.
You could modify your regex to include a negative lookbehind for a non-whitespace character, so that the # only matches at the start of the string or after whitespace, like so:
(?<!\S)#([0-9a-zA-Z]+)
Working regex example:
http://regex101.com/r/mR4jZ7
PHP:
$text = preg_replace('/(?<!\S)#([0-9a-zA-Z]+)/', '#$1', $text);
Edit:
And to make the expression work with other languages (non-English characters), use \p{L}, together with the u modifier when the PHP string is UTF-8:
(?<!\S)#([0-9\p{L}]+)
Working example:
https://regex101.com/r/Pquem3/1
A Unicode-aware variant that is safe for HTML-encoded text (it skips a # that follows &, as in numeric entities such as &#39;) and also matches hashtags joined together: ~(?<!&)#([\pL\d]+)~u
It matches tags such as #tag1 #tag2#tag3, etc.
I finally found a solution similar to how Facebook and others turn hashtags into URLs; it may help you too. This code also works with Unicode. I tested it with some Bangla text and I think it will work with any language, but let me know how it goes with others.
$str = '#Your Text #Unicode #ফ্রিকেলস বা #তিল মেলানিনের #অতিরিক্ত উৎপাদনের জন‍্য হয় যা #সূর্যালোকে #বাড়ে';
$regex = '/(?<!\S)#([0-9a-zA-Z\p{L}\p{M}]+)/mu';
$text = preg_replace($regex, '#$1', $str);
echo $text;
To match the second and third hashtags but not the first, you need to specify that the hashtag must be at the beginning of the string or be preceded by one or more whitespace characters, as follows:
$text = 'Testing#one #two #three.test';
$text = preg_replace('/(^|\s+)#([0-9a-zA-Z]+)(\b|$)/', '$1#$2', $text);
The \b in the third group defines a word boundary, which allows the pattern to match #three when it is immediately followed by a non-word character.
Edit: MElliott's answer above is more efficient, for the record.
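Note that the replacement strings quoted in this thread contain no link markup, so as written they leave the text unchanged. Here is a minimal sketch of producing actual links with the lookbehind pattern. The function name linkHashtags and the /tag/ URL scheme are assumptions for illustration, not something from the answers.

```php
<?php
// Hypothetical helper; "/tag/" is a made-up URL prefix.
function linkHashtags(string $text, string $baseUrl = '/tag/'): string
{
    return preg_replace_callback(
        // "#" at the start or after whitespace, then letters, marks, digits.
        '/(?<!\S)#([\p{L}\p{M}\p{N}]+)/u',
        function (array $m) use ($baseUrl): string {
            $href  = $baseUrl . rawurlencode($m[1]);               // tag without "#"
            $label = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8'); // tag with "#"
            return '<a href="' . $href . '">' . $label . '</a>';
        },
        $text
    );
}

echo linkHashtags('Testing#one #two #three.test'), "\n";
// Testing#one <a href="/tag/two">#two</a> <a href="/tag/three">#three</a>.test
```

This sketch only escapes the tag itself. If the surrounding text comes from users, it should be HTML-escaped before the hashtags are linked.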

PHP - Group by similar words

I was wondering how we could group or separate similar words in PHP or MySQL. For instance, I have a Samsung Galaxy Ace; is it possible to recognize S120, S-120, s120, S-120 as the same thing?
Is this even possible?
Thanks
What you could do is strip everything that is not alphanumeric (including spaces), and then strtoupper() the string.
$new_string = preg_replace("/[^a-zA-Z0-9]/", "", $string);
$new_string = strtoupper($new_string);
Only those? Easily.
/S-?120/i
But if you want to extend, you'll probably need to move from REGEX to something a little more sophisticated.
The best thing to do here is to pick a format and standardise on it. So for your example, you would just store S120, and when you get a value from a user, strip all non-alphanumeric characters from it and convert it to upper case.
You can do this in PHP with this code:
$result = strtoupper(preg_replace('/(\W|_)+/', '', $userInput));
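As a rough illustration of the "pick one format" idea, the sketch below normalizes several spellings to the same key. The function name normalizeModel and the sample inputs are made up.

```php
<?php
// Hypothetical helper: reduce a model name to uppercase ASCII letters and digits.
function normalizeModel(string $input): string
{
    return strtoupper(preg_replace('/[^a-zA-Z0-9]+/', '', $input));
}

foreach (['S120', 'S-120', 's120', 's 120'] as $variant) {
    echo normalizeModel($variant), "\n"; // prints S120 each time
}
```

Storing the normalized value next to the original lets you GROUP BY it in MySQL while still displaying what the user typed.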

Regex replace one or two letter words

I am trying to remove one- or two-letter words from a string. Please consider this regex:
$str = 'I haven\'t got much time to spend!';
echo preg_replace('/\b([a-z0-9]{1,2})\b/i','',$str);
returns: haven' got much time spend!
expected output: haven't got much time spend!
My goal is to remove any words that are one or two characters long from a string. These can be alphanumeric or special characters.
Use lookarounds:
preg_replace('/(?<!\S)\S{1,2}(?!\S)/', '', $str)
Although this leaves double whitespace where words are removed. To also remove the spaces, you could try something like:
preg_replace('/\s+\S{1,2}(?!\S)|(?<!\S)\S{1,2}\s+/', '', $str)
Just use:
echo preg_replace('/(?<!\S)\S{1,2}(?!\S)/i', '', 'a dljlj-b2 adl xy zq a');
The output is as wanted, apart from the whitespace left around the removed words:
dljlj-b2 adl
So don't forget to handle beginning/end of a string by negative assertions.
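A minimal sketch that combines the lookaround pattern with a whitespace cleanup step follows. The function name removeShortWords is made up, and collapsing the whitespace in a second pass is just one of several ways to do it.

```php
<?php
// Hypothetical helper: drop "words" of one or two non-space characters.
function removeShortWords(string $text): string
{
    // A short word is 1-2 non-space characters with no non-space neighbour.
    $stripped = preg_replace('/(?<!\S)\S{1,2}(?!\S)/', '', $text);

    // Collapse the leftover runs of whitespace and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $stripped));
}

echo removeShortWords("I haven't got much time to spend!"), "\n";
// haven't got much time spend!
echo removeShortWords('a dljlj-b2 adl xy zq a'), "\n";
// dljlj-b2 adl
```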
