Regex replace one or two letter words - php

I am trying to remove one- or two-letter words from a string. Please consider this regex:
$str = 'I haven\'t got much time to spend!';
echo preg_replace('/\b([a-z0-9]{1,2})\b/i','',$str);
returns: haven' got much time spend!
expected output: haven't got much time spend!
My goal is to remove any word that is one or two characters long from a string. The characters can be alphanumeric or special characters.

Use lookarounds:
preg_replace('/(?<!\S)\S{1,2}(?!\S)/', '', $str)
However, this leaves double whitespace where words are removed. To also remove the adjacent spaces, you could try something like:
preg_replace('/\s+\S{1,2}(?!\S)|(?<!\S)\S{1,2}\s+/', '', $str)

Just use:
echo preg_replace('/(?<!\S)\S{1,2}(?!\S)/i', '', 'a dljlj-b2 adl xy zq a');
The output is as wanted:
dljlj-b2 adl
So don't forget to handle beginning/end of a string by negative assertions.
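The lookaround approach from the answers above can be wrapped in a small helper. This is a minimal sketch, not code from the thread: the function name remove_short_words and the $maxLen parameter are hypothetical. It removes the short tokens first and then collapses the leftover whitespace in a second pass, which avoids the double spaces mentioned above.

```php
<?php
// Hypothetical helper, shown for illustration only.
function remove_short_words(string $str, int $maxLen = 2): string
{
    // (?<!\S) and (?!\S) require whitespace or a string boundary on both
    // sides, so "t" in "haven't" is not treated as a separate word.
    $pattern = '/(?<!\S)\S{1,' . $maxLen . '}(?!\S)/';
    $stripped = preg_replace($pattern, '', $str);

    // Collapse the whitespace runs left behind and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $stripped));
}

echo remove_short_words("I haven't got much time to spend!"), "\n";
// prints: haven't got much time spend!
```

Without the u modifier, \S{1,2} counts bytes rather than characters, so multi-byte UTF-8 input would need the u modifier added to the first pattern.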

Related

PHP RegExp match and substitute

I am testing a RegExp with the online regexr.com tool. I will test the string against multiple cases, but I can't get the substitution to work.
The RegEx for matching the string is:
/^[0-9]{1,3}[0-9]{6,7}$/
It matches a local mobile number in my country, like this:
0921234567
But then I want to rewrite the number this way: add a "+" sign, add my country code "385", add a "." sign, and finally add the matched number with the leading zero stripped.
Final number will be:
+385.921234567
I have a basic idea of how to insert the matched string, but I am not sure how to prepend characters and strip the zero from the matched string in the following substitution pattern:
\+$&\n\t
I will use PHP preg_replace function.
EDIT:
As someone wisely mentioned, there may be one, two, or no leading zeros, but I will create separate test cases with a regex that only tests the number of zeros. Doing it all in one regex seems too complicated for now.
Possible numbers will be:
0921234567
00111921234567
Here 111 is the country code. I know that some country codes consist of 2 or 3 digits, but I will create special cases for most country codes.
You can use this preg_replace to strip optional zeroes from start of your mobile #:
$str = preg_replace('~^0*(\d{7,9})$~', '+385.$1', $str);
^[0-9]([0-9]{1,2}[0-9]{6,7})$
You just need to add groups. Replace by +385.$1. See demo.
https://regex101.com/r/cJ6zQ3/22
$re = "/^[0-9]([0-9]{1,2}[0-9]{6,7})$/m";
$str = "0921234567\n";
$subst = "+385.$1";
$result = preg_replace($re, $subst, $str);
I would use a 2-step solution:
1. Check whether the input matches the main regex.
2. Replace the number by prepending "+", the country code, and ".", followed by the number without its leading zeros.
PHP code:
$re = "/^[0-9]{7,10}$/";
$str = "0921234567";
if (preg_match($re, $str, $match)) {
echo "+385." . preg_replace('/^0+/', '', $match[0]);
}
Note that splitting the character class in your regex pattern makes no sense when you are not using capture groups: ^[0-9]{7,10}$ is then the same as ^[0-9]{1,3}[0-9]{6,7}$, meaning match 7 to 10 digits from the start to the end of the string.
Leading zeros are easily trimmed from the start with /^0+/ regex.
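The two-step idea (validate, then strip zeros and prepend the prefix) can be sketched as a single function. The name to_international and the $countryCode parameter are assumptions made for this example; the digit range 7 to 10 is taken from the answer above, and ltrim is used here in place of the /^0+/ regex.

```php
<?php
// Hypothetical helper, shown for illustration only.
// Returns null when the input does not look like a local number.
function to_international(string $number, string $countryCode = '385'): ?string
{
    // Step 1: accept only 7 to 10 digits, as in the answer above.
    if (!preg_match('/^[0-9]{7,10}$/', $number)) {
        return null;
    }

    // Step 2: drop leading zeros, then prepend "+", country code and ".".
    return '+' . $countryCode . '.' . ltrim($number, '0');
}

echo to_international('0921234567'), "\n";
// prints: +385.921234567
```

Numbers that already carry an international prefix, such as 00111921234567, are longer than 10 digits and are rejected by this sketch, which matches the asker's plan to handle them as separate cases.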

Removing a letter followed by three or four numbers in PHP

I'm trying to remove part of a string in my PHP script. The strings will be similar to this one:
Samsung I8730 Galaxy Express
I need to remove the "I8730" part; the same should work for other models such as "i9500", "B2100", etc.
Please suggest a preg_replace pattern or something else that will solve this problem.
Thanks.
A letter is [A-Za-z]. A number is \d.
/[A-Za-z]\d{3,4}/
First explode the string on " ".
And then use preg_match() for the following regex.
^[a-zA-Z][0-9]{4}$
Use a regex and replace the match with nothing. This one matches one or more letters followed by one or more digits:
$string = preg_replace('/[A-Za-z]+[0-9]+/', '', $string);
Assuming that the letter can be followed by 2 to 5 digits:
$str = 'Samsung I8730 Galaxy Express';
echo preg_replace( '/([a-zA-Z][0-9]{2,5} )/','', $str);
This might be helpful.
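The patterns above can be tightened with word boundaries so that only a standalone token of one letter plus three or four digits is removed. This is a sketch under that assumption; the function name strip_model_code is hypothetical and not from the thread.

```php
<?php
// Hypothetical helper, shown for illustration only.
function strip_model_code(string $name): string
{
    // \b on both sides keeps the match from starting or ending
    // in the middle of a longer alphanumeric token.
    $without = preg_replace('/\b[A-Za-z]\d{3,4}\b/', '', $name);

    // Collapse the whitespace left behind and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $without));
}

echo strip_model_code('Samsung I8730 Galaxy Express'), "\n";
// prints: Samsung Galaxy Express
```

Because the pattern requires exactly three or four digits between the boundaries, a token such as S10 or A12345 is left untouched.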

How to remove special/accented characters and words with digits?

I am trying to create slugs. My string is like this: $string='möbel#*-jérôme-mp3-how?';
Step: 1
First, I want to remove special characters, non-alphanumeric and non-latin characters from this string.
Like this: $string='möbel-jérôme-mp3-how';
Previously, I used to have only english characters in the string.
So, I used to do like this: $string = preg_replace("([^a-z0-9])", "-", $string);
However, since I also want to retain foreign characters, this is not working.
Step: 2
Then, I want to remove all the words that have one or more numbers in them.
In this example string, I want to remove the word mp3 as it contains one or more numbers.
So, the final string looks like this: $string='möbel-jérôme-how';
I used to do like this:
$words = explode('-',$string);
$result = array();
foreach($words as $word)
{
if( ($word ==preg_replace("([^a-z])", "-", $word)) && strlen($word)>2)
$result[]=$word;
}
$string = implode(' ',$result);
This does not work now as it contains foreign characters.
In PHP, you have access to Unicode properties:
$result = preg_replace('/[^\p{L}\p{N}-]+/u', '', $subject);
will do step 1 for you. (\p{L} matches any Unicode letter, \p{N} matches any Unicode numeric character, and the u modifier makes PHP treat the pattern and the subject as UTF-8.)
Removing words with digits is just as easy:
$result2 = preg_replace('/\b\w*\d\w*\b-?/', '', $result);
(\b matches the start and end of a word).
I would strongly suggest to transliterate the unicode characters if you are actually doing slugs for links. You can use PHP's iconv to achieve that.
Similar question here. The ingenuity and simplicity of the top voted answer, I think, is great:)
I would suggest doing this in multiple steps:
Create a string of all allowed characters and go through the input, keeping only those characters. (It will take some time, but it's a one-time thing.)
Do an explode on - and go through all the words and keep only the ones, that don't contain numbers. Then implode it again.
I believe, you can write the script on your own from now.
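The two steps discussed in this thread can be combined as follows. This is a minimal sketch that assumes the input is valid UTF-8; the function name slug_without_digit_words is hypothetical. It uses the Unicode-property regex for step 1 and the explode/implode idea for step 2.

```php
<?php
// Hypothetical helper, shown for illustration only.
// Assumes $string is valid UTF-8.
function slug_without_digit_words(string $string): string
{
    // Step 1: keep only Unicode letters, Unicode numbers and hyphens.
    $clean = preg_replace('/[^\p{L}\p{N}-]+/u', '', $string);

    // Step 2: drop empty parts and any word containing a number.
    $kept = [];
    foreach (explode('-', $clean) as $word) {
        if ($word !== '' && !preg_match('/\p{N}/u', $word)) {
            $kept[] = $word;
        }
    }

    return implode('-', $kept);
}

echo slug_without_digit_words('möbel#*-jérôme-mp3-how?'), "\n";
// prints: möbel-jérôme-how
```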

Regex to match sentences with at least n words

I'm trying to pull all sentences from a text that consist of, say, at least 5 words in PHP. Assuming sentences end with full stop, question or exclamation mark, I came up with this:
/[\w]{5,*}[\.|\?|\!]/
Any ideas, what's wrong?
Also, what needs to be done for this to work with UTF-8?
\w only matches a single character. A single word would be \w+. If you need at least 5 words, you could do something like:
/(\w+\s){4,}\w+[.?!]/
i.e. at least 4 words followed by spaces, followed by another word followed by a sentence delimiter.
I agree with the solution posted here. If you're using preg functions in PHP you can add 'u' pattern modifier for this to work with UTF-8. /(\w+\s){4,}\w+[.?!]/u for example
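Putting the pattern and the u modifier together, a sketch with a configurable minimum might look like this. The function name sentences_with_min_words and its parameters are hypothetical; the pattern is the one from the answer above, with \s+ instead of \s so that repeated spaces are tolerated.

```php
<?php
// Hypothetical helper, shown for illustration only.
function sentences_with_min_words(string $text, int $minWords): array
{
    // ($minWords - 1) words each followed by whitespace, then one more
    // word directly followed by a sentence delimiter.
    $repeats = max(0, $minWords - 1);
    $pattern = '/(?:\w+\s+){' . $repeats . ',}\w+[.?!]/u';

    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$text = 'This is a more than five word sentence. But this is not. '
      . 'Is this one long enough now?';
print_r(sentences_with_min_words($text, 5));
```

With this input the first and third sentences are returned and the four-word middle sentence is skipped. Commas, apostrophes and other punctuation inside a sentence are not matched by \w or \s, so a sentence containing them may be missed or matched only from the word after the punctuation.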
A method without regex:
$str = "this is a more than five word sentence. But this is not. Neither this. NO";
$sentences = explode(".", $str);
foreach($sentences as $s)
{
$words = explode(' ', $s);
if(count(array_filter($words, 'is_notempty')) >= 5) // "at least 5 words"
echo "Found matching sentence : $s" . "<br/>";
}
function is_notempty($x)
{
return !empty($x);
}
This outputs:
Found matching sentence : this is a more than five word sentence

Regex to match all characters except letters and numbers

I want to clean the filenames of all uploaded files. I want to remove all characters except periods, letters and numbers. I'm not good with regex so I thought I would ask here.
Can someone show me how to put this together? I'm using PHP.
$newfilename=preg_replace('/[^a-zA-Z0-9.]/','',$filename);
s/[^.a-zA-Z\d]//g
(This is the Perl form of the expression.) In PHP you do:
$output = preg_replace('/[^.a-zA-Z\d]/', '', $input);
Try to use this:
$cleanString = preg_replace('#\W#', '', $string);
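It removes every non-word character, so letters, digits, and underscores are kept. Note that it also strips periods, which the question wants to keep; use a class such as [^A-Za-z0-9.] to preserve them.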
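The difference between the character-class answer and the \W answer is easy to see side by side. In this sketch the function name clean_filename is hypothetical, and the sample file names are made up for illustration.

```php
<?php
// Hypothetical helper, shown for illustration only.
function clean_filename(string $filename): string
{
    // Remove everything except ASCII letters, digits and periods.
    return preg_replace('/[^A-Za-z0-9.]/', '', $filename);
}

echo clean_filename('my file (1).tar.gz'), "\n";
// prints: myfile1.tar.gz

// For comparison: \W keeps the underscore but drops the period.
echo preg_replace('#\W#', '', 'my_file.txt'), "\n";
// prints: my_filetxt
```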
