How to replace double/more letters to a single letter? - php

I need to replace any letter that occurs twice or more in a row within a word with a single instance of that letter.
For example:
School -> Schol
Google -> Gogle
Gooooogle -> Gogle
VooDoo -> Vodo
I tried the following, but I'm stuck at the second parameter of eregi_replace.
$word = 'Goooogle';
$word2 = eregi_replace("([a-z]{2,})", "?", $word);
If I use \\\1 to replace ?, it would display the exact match.
How do I make it single letter?
Can anyone help? Thanks

See regular expression to replace two (or more) consecutive characters by only one?
By the way: you should use the preg_* (PCRE) functions instead of the deprecated ereg_* functions (POSIX).
Richard Szalay's answer leads the right way:
$word = 'Goooogle';
$word2 = preg_replace('/(\w)\1+/', '$1', $word);

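Not only are you capturing the entire run (instead of just the first character), but {2,} re-matches [a-z] (any two or more letters), not the character that was originally matched. The ereg_* functions are deprecated, and inside a double-quoted PHP string "\1" is read as an octal escape rather than a backreference, so use preg_replace with a single-quoted pattern:
$word2 = preg_replace('/(\w)\1+/', '$1', $word);
This backreferences the original match. You can replace \w with [a-z] if you wish.
The + is required for your Goooogle example: without it, (\w)\1 matches only pairs, so "oooo" would become "oo" rather than "o".
Unlike JavaScript, where you would need the "global" flag ("g"), preg_replace replaces every match by default.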

Try this:
$string = "thhhhiiiissssss hasss sooo mannnny letterss";
$string = preg_replace('/([a-zA-Z])\1+/', '$1', $string);
How this works:
/ ... / # Marks the start and end of the expression.
([a-zA-Z]) # Match any single a-z character lowercase or uppercase.
\1+ # One or more occurrence of the single character we matched previously.
$1 # In the replacement: the same single character we matched previously.
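As a quick check of the explanation above, here is a minimal sketch that runs the pattern against the examples from the question. The helper name collapse_repeats is made up for illustration, and the "i" flag is an added assumption so that a run like "Oo" also counts as a repeat.

```php
<?php
// Hypothetical helper (not from the answers above): collapse any run of the
// same letter into one letter. The "i" flag makes the backreference
// case-insensitive, so "Oo" is treated as a run as well.
function collapse_repeats(string $text): string
{
    return preg_replace('/([a-z])\1+/i', '$1', $text);
}

echo collapse_repeats('School'), "\n";     // Schol
echo collapse_repeats('Gooooogle'), "\n";  // Gogle
echo collapse_repeats('VooDoo'), "\n";     // VoDo
echo collapse_repeats('thhhhiiiissssss hasss sooo mannnny letterss'), "\n";
// this has so many leters
```

Note that the original case of each kept letter is preserved, so VooDoo becomes VoDo; getting Vodo as in the question would need an extra case-conversion step.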

Related

Using preg_replace()?

I'm trying to understand the function preg_replace(), but it looks rather cryptic to me. From looking at the documentation on the function, I understand that it consists of three things: the subject, the pattern for matching, and what to replace it with.
I'm currently trying to 'sanitize' numerical input by replacing anything that isn't a number. So far I know I would need to allow the numbers 0-9, but remove anything that isn't a number and replace it with: "".
Instead of escaping every character I need to, is there some way to simply not allow any other character than the numbers 0-9? Also if anyone could shed light on how the 'pattern for matching' part works...
If you want to sanitize a string by removing anything that isn't a number, you would write a regular expression that matches characters not in a list.
The pattern [0-9] would match all numerals. Placing a caret (^) at the beginning of the set matches everything that isn't in the set: [^0-9]
$result = preg_replace('/[^0-9]/', '', $input);
Note that this will also filter out periods/decimal points and other mathematical marks. You could include periods/decimal points (allowing floats) by making the period allowed:
$result = preg_replace('/[^0-9.]/', '', $input);
Note that the period (.) is the wildcard character in regular expressions. It doesn't need to be escaped in a bracket expression, but it would elsewhere in the pattern.
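To see the two negated character classes side by side, here is a small sketch. The sample input is made up for illustration.

```php
<?php
// Sample input is invented for this example.
$input = '$1,234.50 USD';

// Keep digits only: everything that is not 0-9 is removed.
$digits = preg_replace('/[^0-9]/', '', $input);

// Keep digits and periods, so a decimal point survives.
$decimal = preg_replace('/[^0-9.]/', '', $input);

echo $digits, "\n";   // 123450
echo $decimal, "\n";  // 1234.50
```

Keep in mind that [^0-9.] lets any number of periods through, so an input such as "1.2.3" would pass unchanged; a follow-up check like is_numeric() may still be worthwhile.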
preg_replace() is easier than you think:
$p = "/[A-Z a-z]/";
$r = "";
$s = "12345t7890";
echo preg_replace($p, $r, $s);
Will output "123457890" - note no '6'
Doesn't is_numeric() solve your problem?

Does preg_replace use a regular expression to target, or some other format?

I am trying to replace all occurrences of one of the groups of three letters (in capitals) followed by 5 digits (0-9) with a link. Nothing I've done so far seems to work. This is what I have at the moment.
return preg_replace("(MTM|SIR|FDF|TAA)[0-9]{5}", "<a href='$&'>$&</a>", $str);
You don't need to have a capturing group in your case since you can refer to the whole pattern with $0. And you must add delimiters to your pattern:
$str = preg_replace('~(?>MTM|SIR|FDF|TAA)\d{5}~', '<a href="$0">$0</a>', $str);
Create a backreference to the characters by wrapping them in a group. Then use $1 to refer to the contents of the first group.
return preg_replace("/((?:MTM|SIR|FDF|TAA)[0-9]{5})/", "<a href='$1'>$1</a>", $str);
Also, you'll probably want (MTM|SIR|FDF|TAA) instead of [MTM|SIR|FDF|TAA]{3}. This means you must have either MTM, SIR, FDF or TAA.
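The difference between the group and the bracket expression is easy to miss, so here is a short sketch of both. The sample text and the test code RAT12345 are made up for illustration.

```php
<?php
// Sample text is invented for this example.
$str = 'Orders MTM12345 and TAA00042 shipped; RAT12345 did not.';

// Alternation: the three letters must be exactly one of the listed codes.
$linked = preg_replace('/(?:MTM|SIR|FDF|TAA)[0-9]{5}/', '<a href="$0">$0</a>', $str);
echo $linked, "\n";
// Orders <a href="MTM12345">MTM12345</a> and <a href="TAA00042">TAA00042</a> shipped; RAT12345 did not.

// Character class: any three characters drawn from the set M, T, |, S, I, R,
// F, D, A, so "RAT" is accepted even though it is not one of the codes.
$classMatches = preg_match('/^[MTM|SIR|FDF|TAA]{3}[0-9]{5}$/', 'RAT12345');
$groupMatches = preg_match('/^(?:MTM|SIR|FDF|TAA)[0-9]{5}$/', 'RAT12345');
var_dump($classMatches, $groupMatches); // int(1) and int(0)
```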

How to use search and replace all the matching words in a sentence in php

I have to search and replace all the words starting with # and # in a sentence. Can you please let me know the best way to do this in PHP. I tried with
preg_replace('/(\#+|\#+).*?(?=\s)/','--', $string);
This will solve only one word in a sentence. I want all the matches to be replace.
I cannot use the g flag here like in Perl.
preg_replace replaces all matches by default. If it is not doing so, it is an issue with your pattern or the data.
Try this pattern instead:
(?<!\S)[##]+\w+
(?<!\S) - do not match if the pattern is preceded by a non-whitespace character.
[##]+ - match one or more of # and #.
\w+ - match one or more word characters (letter, numbers, underscores). This will preserve punctuation. For example, #foo. would be replaced by --.. If you don't want this, you could use \S+ instead, which matches all characters that are not whitespace.
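Here is a minimal sketch of the \w+ versus \S+ trade-off described above. It assumes a single marker character (#) for simplicity, and the sample sentence is made up. The optional fifth argument of preg_replace returns the number of replacements, which confirms that every match is replaced without any "g" flag.

```php
<?php
// Sample sentence is invented for this example.
$string = '#Test let us meet_me#noon see #foo.';

// \w+ stops at punctuation, so the trailing period is kept.
$byWord = preg_replace('/(?<!\S)#\w+/', '--', $string, -1, $count);
echo $byWord, "\n";  // -- let us meet_me#noon see --.
echo $count, "\n";   // 2

// \S+ runs up to the next whitespace, so the period is replaced too.
$byNonSpace = preg_replace('/(?<!\S)#\S+/', '--', $string);
echo $byNonSpace, "\n";  // -- let us meet_me#noon see --
```

In both cases "meet_me#noon" is left alone, because the # there is preceded by a non-whitespace character and the lookbehind rejects it.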
A word starting with a character implies that it has a space right before this character. Try something like that:
/(?<!\S)[##].*(?=[^a-z])/
Why not use (?=\s)? Because if there is some punctuation right after the word, it's not part of the word. Note: you can replace [^a-z] with any list of characters not allowed in your word.
Be careful though, there are two particular cases where that doesn't work. You have to use 3 preg_replace calls in a row; the two others are for words that begin and end the string:
/^[##].*(?=[^a-z])/
/(?<!\S)[##].*$/
Try this :
$string = "#Test let us meet_me#noon see #Prasanth";
$new_pro_name = preg_replace('/(?<!\S)(#\w+|#\w+)/','--', $string);
echo $new_pro_name;
This replaces all the words starting with # OR #
Output: -- let us meet_me#noon see --
If you want to replace the word after # OR # even if it is in the middle of a word:
$string = "#Test let us meet_me#noon see #Prasanth";
$new_pro_name = preg_replace('/(#\w+|#\w+)/','--', $string);
echo $new_pro_name;
Output: -- let us meet_me-- see --

PHP: How to convert a string that contains upper case characters

i'm working on class names and i need to check if there is any upper camel case name and break it this way:
"UserManagement" becomes "user-management"
or
"SiteContentManagement" becomes "site-content-management"
after extensive search i only found various uses of ucfirst, strtolower, strtoupper and ucwords, and i can't see how to use them to suit my needs. any ideas?
thanks for reading ;)
You can use preg_replace to replace any instance of a lowercase letter followed with an uppercase with your lower-dash-lower variant:
$dashedName = preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className);
Then followed by a strtolower() to take care of any remaining uppercase letters:
return strtolower($dashedName);
The full function here:
function camel2dashed($className) {
    return strtolower(preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className));
}
To explain the regular expression used:
/ Opening delimiter
( Start Capture Group 1
[^A-Z-] Character Class: Any character NOT an uppercase letter and not a dash
) End Capture Group 1
( Start Capture Group 2
[A-Z] Character Class: Any uppercase letter
) End Capture Group 2
/ Closing delimiter
As for the replacement string
$1 Insert Capture Group 1
- Literal: dash
$2 Insert Capture Group 2
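To make the breakdown concrete, here is the same function run against a few inputs. The last two inputs are made up to show what the character class does at the edges: a name that already contains a dash, and a name with a run of capitals.

```php
<?php
// Same function as in the answer, repeated so this block runs on its own.
function camel2dashed($className) {
    return strtolower(preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className));
}

echo camel2dashed('UserManagement'), "\n";         // user-management
echo camel2dashed('SiteContentManagement'), "\n";  // site-content-management

// The dash is excluded from group 1, so no second dash is inserted here.
echo camel2dashed('Already-Dashed'), "\n";         // already-dashed

// Consecutive capitals never satisfy [^A-Z-] followed by [A-Z],
// so an acronym is not split from the word that follows it.
echo camel2dashed('HTMLParser'), "\n";             // htmlparser
```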
There's no built-in way to do it.
This will ConvertThis into convert-this:
$str = preg_replace('/([a-z])([A-Z])/', '$1-$2', $str);
$str = strtolower($str);
You can use a regex to get each word, then add the dashes like this:
preg_match_all ('/[A-Z][a-z]+/', $className, $matches); // get each camelCase words
$newName = strtolower(implode('-', $matches[0])); // add the dashes and lowercase the result
This is simply done without any capture groups: just find the zero-width position before an uppercase letter (excluding the first letter of the string), replace it with a hyphen, then call strtolower on the new string.
Code:
echo strtolower(preg_replace('~(?!^)(?=[A-Z])~', '-', $string));
The lookahead (?=...) makes the match but doesn't consume any characters.
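For comparison, here is a sketch that puts the zero-width lookahead approach next to the preg_match_all word-extraction approach. The two function names are hypothetical, and the lowerCamelCase input is an added test case, not one from the question.

```php
<?php
// Hypothetical helper: insert a dash at every position that is not the start
// of the string and is followed by an uppercase letter.
function dash_by_lookahead(string $name): string
{
    return strtolower(preg_replace('~(?!^)(?=[A-Z])~', '-', $name));
}

// Hypothetical helper: collect capitalised words and join them with dashes.
function dash_by_words(string $name): string
{
    preg_match_all('/[A-Z][a-z]+/', $name, $matches);
    return strtolower(implode('-', $matches[0]));
}

foreach (['UserManagement', 'siteContentManagement'] as $name) {
    echo dash_by_lookahead($name), ' | ', dash_by_words($name), "\n";
}
// user-management | user-management
// site-content-management | content-management
```

The word-extraction version silently drops anything that is not a capital followed by lowercase letters (here the leading "site"), while the lookahead version never removes characters because it only matches positions.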
The best way to do that might be preg_replace using a pattern that replaces uppercase letters with their lowercase counterparts adding a "-" before them.
You could also go through each letter and rebuild the whole string.

Insert separators into a string in regular intervals

I have the following string in php:
$string = 'FEDCBA9876543210';
The string can have 2 or more hexadecimal characters.
I wanted to group string by 2 like :
$output_string = 'FE:DC:BA:98:76:54:32:10';
I wanted to use regex for that, I think I saw a way to do like "recursive regex" but I can't remember it.
Any help appreciated :)
If you don't need to check the content, there is no use for regex.
Try this
$outputString = chunk_split($string, 2, ":");
// generates: FE:DC:BA:98:76:54:32:10:
You might need to remove the last ":".
Or this :
$outputString = implode(":", str_split($string, 2));
// generates: FE:DC:BA:98:76:54:32:10
Resources :
www.w3schools.com - chunk_split()
www.w3schools.com - str_split()
www.w3schools.com - implode()
On the same topic :
Split string into equal parts using PHP
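A short sketch of the two non-regex approaches, including the rtrim() call that removes the trailing colon left by chunk_split(). The odd-length input at the end is made up to show what happens to a leftover character.

```php
<?php
$string = 'FEDCBA9876543210';

// chunk_split appends the separator after every chunk, including the last,
// so the trailing ":" is trimmed off afterwards.
$viaChunk = rtrim(chunk_split($string, 2, ':'), ':');

// str_split + implode only puts the separator between chunks.
$viaSplit = implode(':', str_split($string, 2));

echo $viaChunk, "\n";  // FE:DC:BA:98:76:54:32:10
echo $viaSplit, "\n";  // FE:DC:BA:98:76:54:32:10

// With an odd number of characters the last chunk is simply shorter.
$odd = implode(':', str_split('ABC', 2));
echo $odd, "\n";       // AB:C
```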
Sounds like you want a regex like this:
/([0-9a-f]{2})/${1}:/gi
Which, in PHP is...
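<?php
$string = 'FEDCBA9876543210';
// PHP has no "g" modifier; preg_replace replaces every match by default.
$pattern = '/([0-9A-F]{2})/i';
$replacement = '${1}:';
// Prints FE:DC:BA:98:76:54:32:10: (note the trailing colon).
echo preg_replace($pattern, $replacement, $string);
?>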
Please note the above code is currently untested.
You can make sure there are two or more hex characters doing this:
if (preg_match('!^\d*[A-F]\d*[A-F][\dA-F]*$!i', $string)) {
...
}
No need for a recursive regex. By the way, recursive regex is a contradiction in terms, as a regular language (which a regex parses) can't be recursive, by definition.
If you want to also group the characters in pairs with colons in between, ignoring the two hex characters for a second, use:
if (preg_match('!^[\dA-F]{2}(?::[\dA-F]{2})*$!i', $string)) {
...
}
Now if you want to add the condition requiring two hex characters, use a positive lookahead:
if (preg_match('!^(?=[\d:]*[A-F][\d:]*[A-F])[\dA-F]{2}(?::[\dA-F]{2})*$!i', $string)) {
...
}
To explain how this works: the first thing it does is check (with a positive lookahead, i.e. (?=...)) that you have zero or more digits or colons, followed by a hex letter, followed by zero or more digits or colons, and then another hex letter. This ensures there are at least two hex letters in the expression.
After the positive lookahead is the original expression that makes sure the string is pairs of hex digits.
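Here is a sketch of how the lookahead changes which strings are accepted. It uses a variant of the pair pattern in which each colon is followed by exactly two hex digits, and the test strings are made up for illustration.

```php
<?php
// Pairs of hex digits separated by colons
// (assumed form: each colon is followed by exactly two hex digits).
$pairs = '!^[\dA-F]{2}(?::[\dA-F]{2})*$!i';

// Same pattern, preceded by a lookahead that requires at least
// two hex letters (A-F) somewhere in the string.
$pairsWithLetters = '!^(?=[\d:]*[A-F][\d:]*[A-F])[\dA-F]{2}(?::[\dA-F]{2})*$!i';

var_dump(preg_match($pairs, 'FE:DC:BA'));          // int(1)
var_dump(preg_match($pairs, 'FE:DCB'));            // int(0), last group has 3 characters
var_dump(preg_match($pairs, '12:34'));             // int(1)
var_dump(preg_match($pairsWithLetters, '12:34'));  // int(0), no hex letters
var_dump(preg_match($pairsWithLetters, 'a1:b2'));  // int(1)
```

The lookahead consumes nothing, so after it succeeds the main pattern still starts matching from the beginning of the string.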
Recursive regular expressions are usually not possible. You may use a regular expression recursively on the results of a previous regular expression, but most regular expression grammars will not allow recursivity. This is the main reason why regular expressions are almost always inadequate for parsing stuff like HTML. Anyways, what you need doesn't need any kind of recursivity.
What you want, simply, is to match a group multiple times. This is quite simple:
preg_match_all("/[a-f0-9]{2}/i", $string, $matches);
This will fill $matches[0] with all occurrences of two hexadecimal digits (in a case-insensitive way). To replace them, use preg_replace, which takes the pattern, then the replacement, then the subject:
echo preg_replace("/([a-f0-9]{2})/i", '\1:', $string);
There will be one ':' too many at the end; you can strip it with substr:
echo substr(preg_replace("/([a-f0-9]{2})/i", '\1:', $string), 0, -1);
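A runnable sketch of both routes, assuming the input holds only hex digits. Note the argument order of preg_replace: pattern, replacement, subject.

```php
<?php
$string = 'FEDCBA9876543210';

// Collect every pair of hex digits; the full matches land in $matches[0].
preg_match_all('/[0-9a-f]{2}/i', $string, $matches);
$joined = implode(':', $matches[0]);
echo $joined, "\n";  // FE:DC:BA:98:76:54:32:10

// Append ":" after each pair, then drop the final ":" with substr.
$replaced = substr(preg_replace('/([0-9a-f]{2})/i', '\1:', $string), 0, -1);
echo $replaced, "\n";  // FE:DC:BA:98:76:54:32:10
```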
While it is not horrible practice to use rtrim(chunk_split($string, 2, ':'), ':'), I prefer to use direct techniques that avoid "mopping up" after making modifications.
Code:
$string = 'FEDCBA9876543210';
echo preg_replace('~[\dA-F]{2}(?!$)\K~', ':', $string);
Output:
FE:DC:BA:98:76:54:32:10
Don't be intimidated by the regex. The pattern says:
[\dA-F]{2} # match exactly two numeric or A through F characters
(?!$) # that is not located at the end of the string
\K # restart the fullstring match
When I say "restart the fullstring match" I mean "forget the previously matched characters and start matching from this point forward". Because there are no additional characters matched after \K, the pattern effectively delivers the zero-width position where the colon should be inserted. In this way, no original characters are lost in the replacement.
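To see how the (?!$) guard behaves at the edges, here is the same pattern applied to a few inputs. The two short inputs are made up for illustration.

```php
<?php
$pattern = '~[\dA-F]{2}(?!$)\K~';

$full = preg_replace($pattern, ':', 'FEDCBA9876543210');
echo $full, "\n";    // FE:DC:BA:98:76:54:32:10

// A single pair sits at the end of the string, so nothing is inserted.
$single = preg_replace($pattern, ':', 'AB');
echo $single, "\n";  // AB

// With an odd length, the leftover character follows the last colon.
$odd = preg_replace($pattern, ':', 'ABC');
echo $odd, "\n";     // AB:C
```

The pattern as written is case-sensitive; adding the i modifier would be one way to accept lowercase hex digits as well.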
