Remove duplicated chars in string - php

I am trying to sanitize my string, so it would be made only from A-Z (with unicode), 0-9, and ".", ",", "-" symbols.
Example
Maama-Paapaa-Test
Must be
Mama-Papa-Test
What I've done so far
$string = 'lietuviškos';
$string .= ' +!##$%^&*()(,,,*&^%AAAA-Sdas.. .d#$%#%#dasf0000-!!####$$%%^^&&**())__-+---++aaaa';
$string .= ' klaviatūros-įgūdžiams';
$string = preg_replace('/[^\p{L}\p{N} \-]/u', null, $string);
$string = preg_replace('/[,-.]/u', null, $string);
$string = ucfirst(strtolower($string));
var_dump($string);
The only problem is that if a char/symbol is duplicated anywhere in the string, it is removed from the string everywhere.
So
Maama-Paapaa-Test
Becomes
Mm-Pp-Test

What's the problem with using a simple (.)\1+ ?
I am trying to sanitize my string, so it would be made only from A-Z (with unicode), 0-9, and ".", ",", "-" symbols.
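So in your case it will be ([A-Z0-9.,-])\1+ (use \p{L} and \p{N} with the u modifier instead of A-Z0-9 to cover Unicode letters and digits).
Explanation: This captures a single character in a group and checks whether it is immediately repeated with \1+.
Each match should be replaced with \1, i.e. a single copy of that character.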
Regex101 Demo
Ideone Demo
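As a minimal sketch of how the two steps could be combined (the function name sanitizeAndCollapse is hypothetical, and the order "strip first, then collapse" is an assumption rather than something stated in the answer): first remove everything outside the allowed set, then collapse each run of a repeated character with (.)\1+.

```php
<?php
// Hypothetical helper: the name and the two-step order are assumptions.
function sanitizeAndCollapse(string $input): string
{
    // Step 1: drop everything except letters, digits, ".", "," and "-".
    // The hyphen is placed last in the class so it is a literal, not a range.
    $clean = preg_replace('/[^\p{L}\p{N}.,-]/u', '', $input);

    // Step 2: collapse any run of the same character into a single copy.
    $collapsed = preg_replace('/(.)\1+/u', '$1', (string) $clean);

    // preg_replace returns null on error (e.g. invalid UTF-8 input).
    return (string) $collapsed;
}

echo sanitizeAndCollapse('Maama-Paapaa-Test'), "\n"; // Mama-Papa-Test
echo sanitizeAndCollapse('AAAA-Sdas.. .d#$%'), "\n"; // A-Sdas.d
```

Note that this collapses every doubled character, including ones that may be legitimate (for example "00" or "ll"), which is what the question asks for.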

Please check and let me know
<?php
echo preg_replace("/(.)\\1+/", "$1", "Maama-Paapaa-Test");
?>
Output: Mama-Papa-Test
Thanks

Related

PHP function to keep only a-z 0-9 and replace spaces with "-" (including regular expression)

I want to write a PHP function that keeps only a-z (keeps all letters as lowercase) 0-9 and "-", and replace spaces with "-".
Here is what I have so far:
...
$s = strtolower($s);
$s = str_replace(' ', '-', $s);
$s = preg_replace("/[^a-z0-9]\-/", "", $s);
But I noticed that it keeps "?" (question marks) and I'm hoping that it doesn't keep other characters that I haven't noticed.
How could I correct it to obtain the expected result?
(I'm not super comfortable with regular expressions, especially when switching languages/tools.)
$s = strtolower($s);
$s = str_replace(' ', '-', $s);
$s = preg_replace("/[^a-z0-9\-]+/", "", $s);
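You did not have the \- inside the [] brackets, so the pattern only matched a disallowed character followed by a literal "-".
You can also use - instead of \- when it is the last (or first) character in the class; both worked for me.
I also added a quantifier after the character class.
In this case, I used +.
The plus sign matches one or more occurrences of the preceding element, so a whole run of unwanted characters is removed in one replacement. It is not strictly required, because preg_replace replaces every match anyway.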
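The three steps from this answer can be wrapped in a small function. This is only a sketch; the name slugify and the sample input are assumptions for illustration.

```php
<?php
// Hypothetical helper name; the three steps are the ones from the answer above.
function slugify(string $s): string
{
    $s = strtolower($s);                 // keep letters lowercase
    $s = str_replace(' ', '-', $s);      // spaces become dashes
    // Remove every run of characters that is not a-z, 0-9 or "-".
    return (string) preg_replace('/[^a-z0-9-]+/', '', $s);
}

echo slugify('Hello World? 2024!'), "\n"; // hello-world-2024
```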

Php make spaces in a word with a dash

I have the following string:
$thetextstring = "jjfnj 948";
At the end I want to have:
echo $thetextstring; // should print jjf-nj948
So basically what I am trying to do is join the separated parts of the string, then separate the first 3 letters from the rest with a -.
So far I have
$string = trim(preg_replace('/\s+/', ' ', $thetextstring));
$result = explode(" ", $thetextstring);
$newstring = implode('', $result);
print_r($newstring);
I have been able to join the words, but how do I add the separator after the first 3 letters?
Use a regex with preg_replace function, this would be a one-liner:
^.{3}\K([^\s]*) *
Breakdown:
^ # Assert start of string
.{3} # Match 3 characters
\K # Reset match
([^\s]*) * # Capture everything up to space character(s) then try to match them
PHP code:
echo preg_replace('~^.{3}\K([^\s]*) *~', '-$1', 'jjfnj 948');
PHP live demo
Without knowing more about how your strings can vary, this is a working solution for your task:
Pattern:
~([a-z]{2}) ~ // 2 letters (contained in capture group1) followed by a space
Replace:
-$1
Demo Link
Code: (Demo)
$thetextstring = "jjfnj 948";
echo preg_replace('~([a-z]{2}) ~','-$1',$thetextstring);
Output:
jjf-nj948
Note this pattern can easily be expanded to include characters beyond lowercase letters that precede the space. ~(\S{2}) ~
You can use str_replace to remove the unwanted space:
$newString = str_replace(' ', '', $thetextstring);
$newString:
jjfnj948
And then preg_replace to put in the dash:
$final = preg_replace('/^([a-z]{3})/', '\1-', $newString);
The meaning of this regex instruction is:
from the beginning of the line: ^
capture three a-z characters: ([a-z]{3})
replace this match with itself followed by a dash: \1-
$final:
jjf-nj948
$thetextstring = "jjfnj 948";
// replace all spaces with nothing
$thetextstring = str_replace(" ", "", $thetextstring);
// insert a dash after the third character
$thetextstring = substr_replace($thetextstring, "-", 3, 0);
echo $thetextstring;
This gives the requested jjf-nj948
Your approach is correct. For the last step, which consists of inserting a - after the third character, you can use the substr_replace function as follows:
$thetextstring = 'jjfnj 948';
$string = trim(preg_replace('/\s+/', ' ', $thetextstring));
$result = explode(' ', $string);
// a length of 0 means "insert without replacing anything"
$newstring = substr_replace(implode('', $result), '-', 3, 0);
If you are confident enough that your string will always have the same format (characters followed by a whitespace followed by numbers), you can also reduce your computations and simplify your code as follows:
$thetextstring = 'jjfnj 948';
$newstring = substr_replace(str_replace(' ', '', $thetextstring), '-', 3, 0);
Oldschool without regex
$test = "jjfnj 948";
$test = str_replace(" ", "", $test); // strip all spaces from string
echo substr($test, 0, 3)."-".substr($test, 3); // isolate first three chars, add hyphen, and concat all characters after the first three
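The non-regex variants above can be generalised slightly. The sketch below is one possible helper, not part of any answer: the name insertDashAfter, the $position parameter and the short-string guard are assumptions, and it treats the input as single-byte (ASCII) text.

```php
<?php
// Hypothetical helper: joins the parts of a string and inserts a dash
// after the first $position characters. Assumes ASCII input.
function insertDashAfter(string $text, int $position = 3): string
{
    // Remove every whitespace character (spaces, tabs, newlines).
    $joined = (string) preg_replace('/\s+/', '', $text);

    // Nothing to separate if the string is not longer than $position.
    if (strlen($joined) <= $position) {
        return $joined;
    }

    // Length 0 inserts the dash without overwriting any character.
    return substr_replace($joined, '-', $position, 0);
}

echo insertDashAfter('jjfnj 948'), "\n";     // jjf-nj948
echo insertDashAfter("ab  cd\tef", 2), "\n"; // ab-cdef
```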

PHP Regex: Remove words less than 3 characters

I'm trying to remove all words of less than 3 characters from a string, specifically with RegEx.
The following doesn't work because it is looking for double spaces. I suppose I could convert all spaces to double spaces beforehand and then convert them back after, but that doesn't seem very efficient. Any ideas?
$text='an of and then some an ee halved or or whenever';
$text=preg_replace('# [a-z]{1,2} #',' ',' '.$text.' ');
echo trim($text);
Removing the Short Words
You can use this:
$replaced = preg_replace('~\b[a-z]{1,2}\b~', '', $yourstring);
Explanation
\b is a word boundary: it matches a position where one side is a word character (a letter, digit, or underscore) and the other side is not (for instance a space character, or the beginning of the string)
[a-z]{1,2} matches one or two letters
\b another word boundary
Replace with the empty string.
Option 2: Also Remove Trailing Spaces
If you also want to remove the spaces after the words, we can add \s* at the end of the regex:
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $yourstring);
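Applied to the string from the question, Option 2 could be used roughly like this. The wrapper name removeShortWords is hypothetical, and the final trim() is an addition to drop a space left behind when the last word is removed.

```php
<?php
// Hypothetical wrapper around the Option 2 pattern.
function removeShortWords(string $text): string
{
    // Remove words of one or two lowercase ASCII letters plus the
    // whitespace that follows them.
    $replaced = (string) preg_replace('~\b[a-z]{1,2}\b\s*~', '', $text);

    // A kept word followed by a removed last word leaves a trailing space.
    return trim($replaced);
}

$text = 'an of and then some an ee halved or or whenever';
echo removeShortWords($text), "\n"; // and then some halved whenever
```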
Reference
Word Boundaries
You can use the word boundary anchor \b:
Replace: \b[a-z]{1,2}\b with ''
Use this
preg_replace('/(\b.{1,2}\s)/','',$your_string);
Although some of the solutions here worked, they had a problem with my language's multi-character letters, such as "ch". A simple explode and implode worked for me.
$minWordLength = 3; // words shorter than this are removed
$string = "my super string";
$exploded = explode(" ", $string);
foreach ($exploded as $key => $word) {
    if (mb_strlen($word) < $minWordLength) unset($exploded[$key]);
}
$string = implode(" ", $exploded);
echo $string;
// outputs "super string"
To me, it seems that this works fine with most PHP versions:
$string2 = preg_replace('/\b[a-z0-9]{1,2}\b/i', '', trim($string1));
Where [a-z0-9], together with the i modifier, is the accepted letter/digit range.

php replace if two or more non alphanumeric characters

I have been trying to replace a portion of a string if two or more non-alphanumeric characters are found.
I have it partly working, but it does not replace when an underscore is in there.
This is what I am trying.
$str = "-dxs_ s";
$str = preg_replace('/\W{2,}|\_{2,}/', ' ', $str);
This results in -dxs_ s, but it should be -dxs s.
So how do you replace if two or more non alphanumeric characters are found in a string?
Simply
$str = preg_replace('/(\W|_){2,}/', ' ', $str);
What this is doing is grouping the "non-word or underscore" part and applies the 2+ quantifier to it as a whole.
See it in action.
\W does not match _ (the underscore counts as a word character), so you need your own character class:
/[^a-zA-Z0-9]{2,}/
or
$result = preg_replace('/[^a-z\d]{2,}/i', ' ', $subject);
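A short sketch of the custom character class in use. The function name collapseNonAlnum and the second sample string are assumptions for illustration.

```php
<?php
// Hypothetical helper: replaces each run of two or more characters that
// are not ASCII letters or digits with a single space.
function collapseNonAlnum(string $s): string
{
    return (string) preg_replace('/[^a-z\d]{2,}/i', ' ', $s);
}

echo collapseNonAlnum('-dxs_ s'), "\n";   // -dxs s
echo collapseNonAlnum('a--b__c d'), "\n"; // a b c d

// \W on its own does not match an underscore:
var_dump(preg_match('/\W/', '_'));        // int(0)
```

A single non-alphanumeric character (such as the leading "-" or the lone space) is left untouched because of the {2,} quantifier.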

Remove all non-matching characters in PHP string?

I've got text from which I want to remove all characters that ARE NOT the following.
desired_characters =
0123456789!&',-./abcdefghijklmnopqrstuvwxyz\n
The last is a \n (newline) that I do want to keep.
To match all characters except the listed ones, use an inverted character set [^…]:
$chars = "0123456789!&',-./abcdefghijklmnopqrstuvwxyz\n";
$pattern = "/[^".preg_quote($chars, "/")."]/";
Here preg_quote is used to escape certain special characters so that they are interpreted as literal characters.
You could also use character ranges to express the listed characters:
$pattern = "/[^0-9!&',-.\\/a-z\n]/";
In this case it doesn’t matter if the literal - in ,-. is escaped or not. Because ,-. is interpreted as character range from , (0x2C) to . (0x2E) that already contains the - (0x2D) in between.
Then you can remove those characters that are matched with preg_replace:
$output = preg_replace($pattern, "", $str);
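Put together, the preg_quote approach might look like the sketch below. The function name keepOnly and the sample text are assumptions; it expects a non-empty list of allowed characters.

```php
<?php
// Hypothetical helper: removes every character of $text that is not
// listed in $allowed (which must not be empty).
function keepOnly(string $text, string $allowed): string
{
    // preg_quote escapes regex metacharacters and the "/" delimiter.
    $pattern = '/[^' . preg_quote($allowed, '/') . ']/';
    return (string) preg_replace($pattern, '', $text);
}

$allowed = "0123456789!&',-./abcdefghijklmnopqrstuvwxyz\n";
echo keepOnly("Line 1: OK!\nline-2 (done)", $allowed), "\n";
// ine1!
// line-2done

// The unescaped range ,-. (0x2C to 0x2E) already includes "-" (0x2D):
var_dump(preg_match('/^[,-.]$/', '-')); // int(1)
```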
$string = 'This is anexample $tring! :)';
$string = preg_replace('/[^0-9!&\',\-.\/a-z\n]/', '', $string);
echo $string; // hisisanexampletring!
^ This is case sensitive, hence the capital T is removed from the string. To allow capital letters as well, use: $string = preg_replace('/[^0-9!&\',\-.\/A-Za-z\n]/', '', $string);
