PHP preg_replace odd characters not working

I have the following code, but for some reason the characters are not replaced.
test.php
<?php
$s = 'AABBCC����ˮ��������Ƽ���� �˾XXYYZZ';
$softwareVersion = preg_replace('[^a-zA-Z\d\s\.]', '', $s);
echo $softwareVersion . "\n";
What I am getting:
jeffreylroberts:~$ php test.php
AABBCC����ˮ��������Ƽ���� �˾XXYYZZ
jeffreylroberts:~$
What I am expecting:
jeffreylroberts:~$ php test.php
AABBCC XXYYZZ
jeffreylroberts:~$
Any ideas on how to preg_replace those characters?

You forgot to add leading and trailing forward slashes (the pattern delimiters) to the regex. This will give you the output you need:
$softwareVersion = preg_replace('/[^a-zA-Z0-9\d\s\.]/', '', $s);
Alternatively, you can do it this way, which removes every character except letters, digits, and underscores (so the space is removed as well):
$softwareVersion = preg_replace('/\W/', '', $s);
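
A note on why the original call fails silently instead of raising a warning: PHP also accepts bracket pairs such as [ ] as pattern delimiters, so '[^a-zA-Z\d\s\.]' is read as the pattern ^a-zA-Z\d\s\. (a start anchor followed by the literal text "a-zA-Z"). The following is a minimal sketch of that behavior; the ASCII-only sample string is a hypothetical stand-in for the mis-encoded input in the question.

```php
<?php
// Hypothetical ASCII-only sample, used here instead of the mis-encoded input.
$sample = 'AABBCC@@ ##XXYYZZ';

// '[' and ']' act as delimiters, so the effective pattern is ^a-zA-Z\d\s\.
// It never matches this input, and the string comes back unchanged.
$unchanged = preg_replace('[^a-zA-Z\d\s\.]', '', $sample);

// With explicit / delimiters the brackets form a negated character class.
$cleaned = preg_replace('/[^a-zA-Z\d\s.]/', '', $sample);

var_dump($unchanged === $sample); // bool(true)
echo $cleaned, "\n";              // AABBCC XXYYZZ
```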

A few things to tweak:
Use a pattern delimiter. / is the most common one.
Shorten your pattern by writing only a-z in the character class and adding the i (case-insensitive) modifier at the end of the pattern.
Escaping the dot is not necessary in the character class.
Use the + ("one or more") quantifier to improve efficiency. It matches consecutive unwanted characters and replaces the whole multi-character substring in one shot.
Code:
$s='AABBCC����ˮ��������Ƽ���� �˾XXYYZZ';
$softwareVersion = preg_replace('/[^a-z\d\s.]+/i','',$s);
echo $softwareVersion . "\n";
Output:
AABBCC XXYYZZ
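
To see what the + quantifier changes, the optional fifth argument of preg_replace can report how many replacements were made. This is only a sketch; the sample string and the variable names are assumptions chosen for illustration.

```php
<?php
// Hypothetical sample with two runs of unwanted characters.
$sample = 'AABBCC@@@@ ####XXYYZZ';

// Without +, every unwanted character is a separate match.
$perChar = preg_replace('/[^a-z\d\s.]/i', '', $sample, -1, $singleCount);

// With +, each consecutive run is a single match.
$perRun = preg_replace('/[^a-z\d\s.]+/i', '', $sample, -1, $runCount);

echo $perChar, "\n";                // AABBCC XXYYZZ
echo $perRun, "\n";                 // AABBCC XXYYZZ
echo "$singleCount vs $runCount\n"; // 8 vs 2
```

The result is identical either way; only the number of matches the engine has to replace differs.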

Related

PHP rtrim all trailing special characters

I'm making a function that detects and removes all trailing special characters from a string. It should convert strings like:
"hello-world"
"hello-world/"
"hello-world--"
"hello-world/%--+..."
into "hello-world".
Does anyone know a trick for this without writing a lot of code?

Just for fun
[^a-z\s]+
Explanation:
[^x]: one character that is not x
\s: a whitespace character (space, tab, newline, carriage return, vertical tab)
+: one or more of the preceding element
PHP:
$re = "/[^a-z\\s]+/i";
$str = "Hello world\nhello world/\nhello world--\nhellow world/%--+...";
$subst = "";
$result = preg_replace($re, $subst, $str);

Try this:
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
Or, to keep apostrophes as well, escape the apostrophe inside the single-quoted string:
preg_replace('/[^A-Za-z0-9\-\']/', '', $string); // Also keeps apostrophes.

You could use a regex like this, depending on your definition of "special characters":
function clean_string($input) {
    return preg_replace('/\W+$/', '', $input);
}
It replaces any run of non-word characters (\W) at the end of the string ($) with nothing. \W is equivalent to [^a-zA-Z0-9_], so anything that is not a letter, digit, or underscore gets removed. To specify exactly which characters count as special, use a regex like this, listing them inside the [] brackets:
function clean_string($input) {
    return preg_replace('/[\/%.+-]+$/', '', $input);
}
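
As a quick check, the explicit character class can be run against the four inputs from the question. This is a sketch: the helper name strip_trailing_specials is hypothetical, and the rtrim call with a character list is shown only as a built-in alternative for the same fixed set of characters.

```php
<?php
// Hypothetical helper name; the pattern is the explicit character class shown above.
function strip_trailing_specials(string $input): string
{
    return (string) preg_replace('/[\/%.+-]+$/', '', $input);
}

$inputs = ['hello-world', 'hello-world/', 'hello-world--', 'hello-world/%--+...'];

foreach ($inputs as $input) {
    // Both columns print hello-world for every input; the inner hyphen is
    // kept because only characters at the end of the string are stripped.
    echo strip_trailing_specials($input), ' | ', rtrim($input, '/%.+-'), "\n";
}
```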

This is what you are looking for:
([^\n\w\d \"]*)$
It removes any trailing characters that are not letters, digits, underscores, spaces, double quotes, or newlines.
Just call it like this (here with \s covering all whitespace):
preg_replace('/([^\n\w\s]*)$/', '', $string);

PHP Regex: Remove words less than 3 characters

I'm trying to remove all words of less than 3 characters from a string, specifically with RegEx.
The following doesn't work because it is looking for double spaces. I suppose I could convert all spaces to double spaces beforehand and then convert them back after, but that doesn't seem very efficient. Any ideas?
$text='an of and then some an ee halved or or whenever';
$text=preg_replace('# [a-z]{1,2} #',' ',' '.$text.' ');
echo trim($text);

Removing the Short Words
You can use this:
$replaced = preg_replace('~\b[a-z]{1,2}\b~', '', $yourstring);
Explanation
\b is a word boundary: it matches a position where one side is a word character (a letter, digit, or underscore) and the other side is not (for instance a space character, or the beginning of the string)
[a-z]{1,2} matches one or two letters
\b another word boundary
Replace with the empty string.
Option 2: Also Remove Trailing Spaces
If you also want to remove the spaces after the words, we can add \s* at the end of the regex:
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $yourstring);
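
Applied to the sample string from the question, the second option behaves as sketched below. The variable names are illustrative; a final trim may still be useful when the last word of the input is a short one.

```php
<?php
$text = 'an of and then some an ee halved or or whenever';

// Remove words of one or two letters together with the whitespace after them.
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $text);

echo $replaced, "\n"; // and then some halved whenever
```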

You can use the word boundary anchor \b:
Replace: \b[a-z]{1,2}\b with ''

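Use this:
preg_replace('/(\b.{1,2}\s)/','',$your_string);
Note that it requires whitespace after the word, so a short word at the very end of the string is left in place.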

Although some of the solutions here worked, they had a problem with my language's multi-character letters, such as "ch". A simple explode and implode worked for me.
$minWordLength = 3;
$string = "my super string";
$exploded = explode(" ", $string);
foreach ($exploded as $key => $word) {
    if (mb_strlen($word) < $minWordLength) {
        unset($exploded[$key]);
    }
}
$string = implode(" ", $exploded);
echo $string;
// outputs "super string"
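
The same idea can be written with preg_split and array_filter, which also copes with repeated spaces, tabs, and newlines between words. This is a sketch under the same assumption as the snippet above (the mbstring extension is available); the variable names are illustrative.

```php
<?php
// Requires the mbstring extension, like the snippet above.
$minWordLength = 3; // illustrative name: words shorter than this are dropped
$string = 'my super string';

// Split on any run of whitespace, ignoring empty pieces.
$words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

// Keep only the words that are long enough, counted in characters, not bytes.
$kept = array_filter($words, function ($word) use ($minWordLength) {
    return mb_strlen($word) >= $minWordLength;
});

echo implode(' ', $kept), "\n"; // super string
```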

To me, it seems that this works fine with most PHP versions:
$string2 = preg_replace('/\b[a-zA-Z0-9]{1,2}\b/', '', trim($string1));
Here [a-zA-Z0-9] is the accepted range of letters and digits.

preg_replace vs trim PHP

I am working with a slug function, and I don't fully understand some of it, so I am looking for help explaining it.
My first question is about this line in my slug function: $string = preg_replace('# +#', '-', $string); I understand that this replaces all spaces with a '-'. What I don't understand is what the + sign after the space between the # signs is for.
Which leads to my next problem. I want a trim function that gets rid of spaces, but only the spaces after the entered value. For example, someone accidentally entered "Arizona " with two spaces after the a, and it broke the pages linked to Arizona.
So, after all my rambling, I basically want to figure out how to use trim to get rid of accidental spaces while still having preg_replace insert '-' between words.
Example: "Sun City West " = "sun-city-west"
This is my full slug function:
function getSlug($string){
    if(isset($string) && $string <> ""){
        $string = strtolower($string);
        //var_dump($string); echo "<br>";
        $string = preg_replace('#[^\w ]+#', '', $string);
        //var_dump($string); echo "<br>";
        $string = preg_replace('# +#', '-', $string);
    }
    return $string;
}

You can try this:
function getSlug($string) {
    return preg_replace('#\s+#', '-', trim($string));
}
It first trims extra spaces at the beginning and end of the string, and then replaces each remaining run of whitespace with the - character.
Here your regex is:
#\s+#
which is:
# = regex delimiter
\s = any whitespace character
+ = match the previous character or group one or more times
# = regex delimiter again
so the regex here means: "match any sequence of one or more whitespace characters"

The + means at least one of the preceding character, so it matches one or more spaces. The # signs are one of the ways of marking the start and end of a regular expression's pattern block.
For a trim function, PHP handily provides trim() which removes all leading and trailing whitespace.
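
Putting the two pieces together, a minimal sketch of the original slug function with trim() added might look like this. It keeps the question's own patterns; the last two lines only illustrate what the + changes.

```php
<?php
function getSlug($string)
{
    if (isset($string) && $string !== '') {
        $string = strtolower(trim($string));              // drop leading/trailing whitespace first
        $string = preg_replace('#[^\w ]+#', '', $string); // remove anything that is not a word character or space
        $string = preg_replace('# +#', '-', $string);     // each run of spaces becomes one hyphen
    }
    return $string;
}

echo getSlug('Sun City West  '), "\n";             // sun-city-west
echo preg_replace('# #', '-', 'Sun  City'), "\n";  // Sun--City (no +: one hyphen per space)
echo preg_replace('# +#', '-', 'Sun  City'), "\n"; // Sun-City
```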

Regex to insert dot (.) after characters, before new line

I'm reformatting some text, and sometimes I have a string, where there is a sentence which is not ended by a dot.
I'm running various checks for this purpose, and one more I'd like is to "Add dot after last character before new line".
I'm not sure how to form the regular expression for this:
$string = preg_replace("/???/", ".\n", $string);

Try this one:
$string = preg_replace("/(?<![.])(?=[\n\r]|$)/", ".", $string);
The negative lookbehind (?<![.]) checks that the previous character is not a dot.
The positive lookahead (?=[\n\r]|$) checks that the next character is a newline or the end of the string.

Like this, I suppose:
<?php
$string = "Add dot after last character before new line\n";
$string = preg_replace("/(.)$/", "$1.\n", $string);
print $string;
?>
This way the dot will be added after the word line in the sentence and before the \n.
Demo: http://ideone.com/J4g7tH

I'd do:
$string = "Add dot after last character before new line\n";
$string = preg_replace("/([^.\r\n])$/s", "$1.", $string);

Thanks for all the answers, but none of them really caught all scenarios right.
I fumbled my way to a good solution using the word boundary assertion:
// Add a dot after every word boundary that is followed by a new line.
$string = preg_replace("/\b\n/", ".\n", $string);
Note that \b has to stay outside square brackets: inside a character class, [\b] matches a backspace character instead of a word boundary.

This works for me:
$content = preg_replace("/(\w+)(\n)/", "$1.$2", $content);
It will match a word immediately followed by a new line, and add a dot in between.
Will match:
Hello\n
Will not match:
Hello \n
or
Hello.\n
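
For text with several lines, one possible variant uses the m (multiline) modifier so that $ matches at the end of every line, plus a lookbehind so that a dot is only inserted directly after a word character. This is a sketch, not a complete solution: it assumes LF ("\n") line endings, and the sample text is made up for illustration.

```php
<?php
// Assumes LF ("\n") line endings and that only lines ending in a word character need a dot.
$text = "First sentence\nSecond sentence.\nThird, with trailing space \nLast one";

// /m lets $ match at the end of every line; the lookbehind requires a word character there.
$fixed = preg_replace('/(?<=\w)$/m', '.', $text);

echo $fixed, "\n";
// First sentence.
// Second sentence.
// Third, with trailing space 
// Last one.
```

Like the \w-based answer above, this leaves lines that already end in a dot or in a space untouched.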

php replace if two or more non alphanumeric characters

I have been trying to replace a portion of a string if two or more non-alphanumeric characters are found.
I have it partly working, but it does not replace when an underscore is in there.
This is what I am trying:
$str = "-dxs_ s";
$str = preg_replace('/\W{2,}|\_{2,}/', ' ', $str);
This results in -dxs_ s, but it should be -dxs s.
So how do you replace if two or more non alphanumeric characters are found in a string?

Simply
$str = preg_replace('/(\W|_){2,}/', ' ', $str);
This groups the "non-word or underscore" alternation and applies the {2,} quantifier to it as a whole.

\W does not match _ (the underscore counts as a word character), so you need your own character class:
/[^a-zA-Z0-9]{2,}/
or
$result = preg_replace('/[^a-z\d]{2,}/i', ' ', $subject);
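
Run against the string from the question, the difference between the original attempt and the explicit character class can be sketched as follows (variable names are illustrative):

```php
<?php
$str = '-dxs_ s';

// Original attempt: "_" is a word character, so "_ " is not two \W in a row,
// and a single "_" does not satisfy \_{2,} either. Nothing is replaced.
$original = preg_replace('/\W{2,}|\_{2,}/', ' ', $str);

// Explicit class: anything that is not a letter or digit, two or more times.
$fixed = preg_replace('/[^a-z\d]{2,}/i', ' ', $str);

echo $original, "\n"; // -dxs_ s
echo $fixed, "\n";    // -dxs s
```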
