I need to know whether there is a way to merge two regular expressions into a single one. Recently I wrote the following PHP code, but I feel there must be a simpler way to do this without multiple preg_replace calls. What I am trying to do is strip HTML entities such as &amp; and &copy; and collapse runs of multiple spaces.
$textinput = 'this is a test input \' """""" """" ##$$%&*)_+!##$%^&*) 123 456';
$var = preg_replace("/&#?[a-z0-9]{2,8};/i",'',$textinput);
$string = preg_replace('/\s+/', ' ', $var);
output
this is a test input ' """""""""" ##$$%&*)_+!##$%^&*) 123 456
I am aware of PHP's html_entity_decode function for handling these special characters; this is just an example. How can I merge both regexps into a single one?
Thank you!
This will do your two replacements in one efficient step (without losing the whitespace character):
$replaced = preg_replace('~(?:&#?[a-z0-9]{2,8};)+|\s\K\s+~', '', $yourstring);
Explanation
On the left side of the |, (?:&#?[a-z0-9]{2,8};)+ targets entities such as &amp; or &copy;, not just one at a time but several together if they are adjacent.
On the right side, the \s matches one whitespace character, then the \K tells the engine to drop it from the match (so it will not be replaced), then the \s+ matches any whitespace characters that follow.
We replace with the empty string.
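As a rough check of the combined pattern, here is a minimal sketch that compares it with the original two-step approach. The sample inputs are made up for illustration, and the i flag is carried over from the question's pattern. It also shows one caveat: when an entity sits between two spaces, a single pass leaves both spaces in place, because they only become adjacent after the entity is removed.

```php
<?php
// Combined pattern from the answer above, plus the "i" flag used in the question.
$combined = '~(?:&#?[a-z0-9]{2,8};)+|\s\K\s+~i';

// Two-step version from the question, wrapped in a hypothetical helper for comparison.
function twoPass(string $s): string
{
    $s = preg_replace('/&#?[a-z0-9]{2,8};/i', '', $s);
    return preg_replace('/\s+/', ' ', $s);
}

// Hypothetical input: entities glued to words, plus runs of whitespace.
$input   = "one&amp;&copy;   two \t three&#169;";
$onePass = preg_replace($combined, '', $input);
$twoStep = twoPass($input);
echo $onePass, "\n"; // one two three
echo $twoStep, "\n"; // one two three

// Caveat: an entity surrounded by single spaces.
$tricky        = 'a &amp; b';
$trickyOnePass = preg_replace($combined, '', $tricky); // "a  b" (two spaces remain)
$trickyTwoStep = twoPass($tricky);                     // "a b"
var_dump($trickyOnePass, $trickyTwoStep);
```

If inputs like the second one matter, the two-step version may be the safer choice.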
$var = preg_replace_callback('/&#?[a-z0-9]{2,8};|\s+/i', function($match) {
return $match[0][0] === '&' ? '' : ' ';
}, $textinput);
You could use the alternation operator (|) to combine both regexes:
(?:&#?[a-z0-9]{2,8};)+|(?<=\s)\s+
Your code would be,
<?php
$mystring = 'this is a test input \' """""" """" ##$$%&*)_+!##$%^&*) 123 456';
$pattern = "~(?:&#?[a-z0-9]{2,8};)+|(?<=\s)\s+~";
$replacement = "";
echo preg_replace($pattern, $replacement, $mystring);
?>
OR
<?php
$mystring = 'this is a test input \' """""" """" ##$$%&*)_+!##$%^&*) 123 456';
$pattern = "~&#?[a-z0-9]{2,8};|(?<=\s)\s+~";
$replacement = "";
echo preg_replace($pattern, $replacement, $mystring);
?>
output:
this is a test input ' """""" """" ##$$%&*)_+!##$%^&*) 123 456
Consider the below use of preg_replace
$str='{{description}}';
$repValue='$0.0 $00.00 $000.000 $1.1 $11.11 $111.111';
$field = 'description';
$pattern = '/{{'.$field.'}}/';
$str =preg_replace($pattern, $repValue, $str );
echo $str;
// Expected output: $0.0 $00.00 $000.000 $1.1 $11.11 $111.111
// Actual output: {{description}}.0 {{description}}.00 {{description}}0.000 .1 .11 1.111
It's clear to me that the actual output is not as expected because preg_replace is treating $0, $00, $00, $1, $11, and $11 as back references to matched groups, replacing $0 and $00 with the full match, and $1 and $11 with an empty string since there are no capture groups 1 or 11.
How can I prevent preg_replace from treating prices in my replacement value as back references and attempting to fill them?
Note that $repValue is dynamic and its content will not be known before the operation.
Escape the dollar character beforehand using a character translation (strtr):
$repValue = strtr('$0.0 $00.00 $000.000 $1.1 $11.11 $111.111', ['$'=>'\$']);
For more complicated cases (with dollars and escaped dollars) you can do this kind of substitution (totally waterproof this time):
$str = strtr($str, ['%'=>'%%', '$'=>'$%', '\\'=>'\\%']);
$repValue = strtr($repValue, ['%'=>'%%', '$'=>'$%', '\\'=>'\\%']);
$pattern = '/{{' . strtr($field, ['%'=>'%%', '$'=>'$%', '\\'=>'\\%']) . '}}/';
$str = preg_replace($pattern, $repValue, $str );
echo strtr($str, ['%%'=>'%', '$%'=>'$', '\\%'=>'\\']);
Note: if $field contains only a literal string (not a subpattern), you don't need to use preg_replace. You can use str_replace instead and in this case you don't have to substitute anything.
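As an alternative sketch (not taken from the answer above), preg_replace_callback sidesteps the escaping question, because the callback's return value is inserted as-is and is not scanned for $n or \n references. The preg_quote call and the explicitly escaped braces are additions for safety. The str_replace variant mentioned in the note is shown alongside it.

```php
<?php
$str      = '{{description}}';
$repValue = '$0.0 $00.00 $000.000 $1.1 $11.11 $111.111';
$field    = 'description';

// Quote the field name so it is matched literally inside the pattern.
$pattern = '/\{\{' . preg_quote($field, '/') . '\}\}/';

// The string returned by the callback is used verbatim as the replacement.
$viaCallback = preg_replace_callback(
    $pattern,
    function (array $match) use ($repValue) {
        return $repValue;
    },
    $str
);

// Plain string replacement: no regex, so nothing needs escaping.
$viaStrReplace = str_replace('{{' . $field . '}}', $repValue, $str);

echo $viaCallback, "\n";   // $0.0 $00.00 $000.000 $1.1 $11.11 $111.111
echo $viaStrReplace, "\n"; // $0.0 $00.00 $000.000 $1.1 $11.11 $111.111
```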
I am trying to sanitize my string, so it would be made only from A-Z (with unicode), 0-9, and ".", ",", "-" symbols.
Example
Maama-Paapaa-Test
Must be
Mama-Papa-Test
What I've done so far
$string = 'lietuviškos';
$string .= ' +!##$%^&*()(,,,*&^%AAAA-Sdas.. .d#$%#%#dasf0000-!!####$$%%^^&&**())__-+---++aaaa';
$string .= ' klaviatūros-įgūdžiams';
$string = preg_replace('/[^\p{L}\p{N} \-]/u', '', $string);
$string = preg_replace('/[,-.]/u', '', $string);
$string = ucfirst(strtolower($string));
var_dump($string);
The only problem is that if a character or symbol is duplicated somewhere in the string, it gets removed from the string everywhere.
So
Maama-Paapaa-Test
Becomes
Mm-Pp-Test
What's the problem with using a simple (.)\1+ ?
I am trying to sanitize my string, so it would be made only from A-Z (with unicode), 0-9, and ".", ",", "-" symbols.
So in your case it will be ([A-Z0-9.,-])\1+ (with the i flag, so that lowercase letters are covered too).
Explanation: This captures a single character in a group and then uses \1+ to check whether it is immediately repeated.
Match should be replaced with \1 i.e single such character.
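Since the question asks for Unicode letters, here is a minimal sketch of the same idea with a Unicode-aware character class. The class [\p{L}\p{N}.,-] and the second sample string are assumptions based on the question's description, not something the answer spelled out.

```php
<?php
// Capture one allowed character, then match any immediate repeats of it.
// The "u" flag makes the pattern work on UTF-8 code points rather than bytes.
$pattern = '/([\p{L}\p{N}.,-])\1+/u';

$first  = preg_replace($pattern, '$1', 'Maama-Paapaa-Test');
$second = preg_replace($pattern, '$1', 'klaviatūūros--įgūdžiams');

echo $first, "\n";  // Mama-Papa-Test
echo $second, "\n"; // klaviatūros-įgūdžiams
```

Note that this also collapses legitimate double letters (for example "book" would become "bok"), so it only fits data where repeats are always unwanted.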
Please check and let me know
<?php
echo preg_replace("/(.)\\1+/", "$1", "Maama-Paapaa-Test");
?>
Output: Mama-Papa-Test
Thanks
I'm trying to remove all words of fewer than 3 characters from a string, specifically with RegEx.
The following doesn't work because it is looking for double spaces. I suppose I could convert all spaces to double spaces beforehand and then convert them back after, but that doesn't seem very efficient. Any ideas?
$text='an of and then some an ee halved or or whenever';
$text=preg_replace('# [a-z]{1,2} #',' ',' '.$text.' ');
echo trim($text);
Removing the Short Words
You can use this:
$replaced = preg_replace('~\b[a-z]{1,2}\b~', '', $yourstring);
Explanation
\b is a word boundary: it matches a position where one side is a word character (a letter, digit, or underscore) and the other side is not (for instance a space character, or the beginning of the string)
[a-z]{1,2} matches one or two letters
\b another word boundary
Replace with the empty string.
Option 2: Also Remove Trailing Spaces
If you also want to remove the spaces after the words, we can add \s* at the end of the regex:
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $yourstring);
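A quick sketch of Option 2 applied to the sample string from the question. The trim call is an extra step (borrowed from the question's own code) to drop any leftover space at either end.

```php
<?php
$text = 'an of and then some an ee halved or or whenever';

// Remove one- and two-letter words together with the whitespace that follows them.
$result = trim(preg_replace('~\b[a-z]{1,2}\b\s*~', '', $text));

echo $result; // and then some halved whenever
```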
You can use the word boundary anchor \b:
Replace: \b[a-z]{1,2}\b with ''
Use this
preg_replace('/(\b.{1,2}\s)/','',$your_string);
Although some of the solutions here worked, they had a problem with my language's multi-character letters, such as "ch". A simple explode and implode worked for me.
// Words shorter than this many characters are removed.
$minWordLength = 3;
$string = "my super string";
$exploded = explode(" ", $string);
foreach ($exploded as $key => $word) {
    if (mb_strlen($word) < $minWordLength) {
        unset($exploded[$key]);
    }
}
$string = implode(" ", $exploded);
echo $string;
// outputs "super string"
To me, it seems that this hack works fine with most PHP versions:
$string2 = preg_replace("/\b[a-zA-Z0-9]{1,2}\b/i", "", trim($string1));
Where [a-zA-Z0-9] are the accepted Char/Number range.
I have the following strings:
Johnny arrived at BOB
Peter is at SUSAN
I want a function where I can do this:
$string = stripWithWildCard("Johnny arrived at BOB", "*at ");
$string must equal BOB. Also if I do this:
$string = stripWithWildCard("Peter is at SUSAN", "*at ");
$string must be equal to SUSAN.
What is the shortest way to do this?
A regular expression. You substitute .* for * and replace with the empty string:
echo preg_replace('/.*at /', '', 'Johnny arrived at BOB');
Keep in mind that if the string "*at " is not hardcoded then you also need to quote any characters which have special meaning in regular expressions. So you would have:
$find = '*at ';
$find = preg_quote($find, '/'); // "/" is the delimiter used below
$find = str_replace('\*', '.*', $find); // preg_quote escaped that, unescape and convert
echo preg_replace('/'.$find.'/', '', $input);
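Putting those steps together, a minimal sketch of the stripWithWildCard function the question asks for might look like this. The function name comes from the question. The fallback to the original input when preg_replace reports an error is an assumption about the desired behaviour.

```php
<?php
function stripWithWildCard(string $input, string $wildcard): string
{
    // Escape regex metacharacters; "/" is the delimiter used below.
    $find = preg_quote($wildcard, '/');
    // preg_quote turned "*" into "\*"; convert it to the regex wildcard ".*".
    $find = str_replace('\*', '.*', $find);
    // preg_replace returns null on error; fall back to the unmodified input.
    return preg_replace('/' . $find . '/', '', $input) ?? $input;
}

$bob   = stripWithWildCard('Johnny arrived at BOB', '*at ');
$susan = stripWithWildCard('Peter is at SUSAN', '*at ');

echo $bob, "\n";   // BOB
echo $susan, "\n"; // SUSAN
```

Because .* is greedy, everything up to the last occurrence of "at " is removed.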
I'm trying to remove excess whitespace from a string like this:
hello     world
to
hello world
Does anyone have any idea how to do that in PHP?
With a regexp :
preg_replace('/( )+/', ' ', $string);
If you also want to collapse runs of any whitespace characters (tabs, newlines, and so on), you can use \s, which matches any whitespace character:
preg_replace('/(\s)+/', ' ', $string);
$str = 'Why   do I
          have so much white   space?';
$str = preg_replace('/\s{2,}/', ' ', $str);
var_dump($str); // string(34) "Why do I have so much white space?"
You could also use the + quantifier, because each match is replaced with a single space anyway. However, I find that {2,} shows your intent more clearly.
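To make the comparison between the two quantifiers concrete, here is a small sketch with made-up sample strings. The results only differ when a single whitespace character other than a space (such as a lone tab) appears: \s+ turns it into a space, while \s{2,} leaves it alone.

```php
<?php
// Runs of spaces and tabs: both quantifiers give the same result.
$input     = "hello     world\t\tagain";
$plus      = preg_replace('/\s+/', ' ', $input);
$twoOrMore = preg_replace('/\s{2,}/', ' ', $input);
echo $plus, "\n";      // hello world again
echo $twoOrMore, "\n"; // hello world again

// A single tab: only \s+ converts it to a space.
$single          = "a\tb";
$singlePlus      = preg_replace('/\s+/', ' ', $single);    // "a b"
$singleTwoOrMore = preg_replace('/\s{2,}/', ' ', $single); // "a\tb" (unchanged)
var_dump($singlePlus, $singleTwoOrMore);
```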
There is an example on how to strip excess whitespace in the preg_replace documentation
Not a PHP expert, but this sounds like a job for regex...
<?php
$string = 'Hello World and Everybody!';
$pattern = '/\s+/';
$replacement = ' ';
echo preg_replace($pattern, $replacement, $string);
?>
Again, PHP is not my language, but the idea is to replace runs of whitespace with a single space. The \s stands for whitespace, and the + means one or more. No g flag is needed: preg_replace replaces every match by default, and PHP rejects g as an unknown modifier.