PHP: How to convert a string that contains upper case characters - php

I'm working on class names and I need to check whether a name is in upper camel case and, if so, break it up this way:
"UserManagement" becomes "user-management"
or
"SiteContentManagement" becomes "site-content-management"
After an extensive search I only found various uses of ucfirst, strtolower, strtoupper and ucwords, and I can't see how to use them to suit my needs. Any ideas?
thanks for reading ;)

You can use preg_replace to find any character that is neither an uppercase letter nor a dash and is followed by an uppercase letter, and insert a dash between the two:
$dashedName = preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className);
Then followed by a strtolower() to take care of any remaining uppercase letters:
return strtolower($dashedName);
The full function here:
function camel2dashed($className) {
    return strtolower(preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className));
}
To explain the regular expression used:
/ Opening delimiter
( Start Capture Group 1
[^A-Z-] Character Class: Any character NOT an uppercase letter and not a dash
) End Capture Group 1
( Start Capture Group 2
[A-Z] Character Class: Any uppercase letter
) End Capture Group 2
/ Closing delimiter
As for the replacement string
$1 Insert Capture Group 1
- Literal: dash
$2 Insert Capture Group 2
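As a quick check of the function above, here is a minimal sketch that runs it on the two names from the question. The third input, HTMLParser, is a made-up edge case that the question does not cover; it is only there to show what the pattern does with consecutive capitals.

```php
<?php
// Same function as in the answer above.
function camel2dashed($className) {
    return strtolower(preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className));
}

echo camel2dashed('UserManagement'), "\n";        // user-management
echo camel2dashed('SiteContentManagement'), "\n"; // site-content-management

// Assumed edge case: consecutive capitals get no dash between them,
// because the pattern needs a non-uppercase character before the capital.
echo camel2dashed('HTMLParser'), "\n";            // htmlparser
```

Whether "htmlparser" is the result you want for acronyms depends on your naming rules.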

There's no built-in way to do it.
This will ConvertThis into convert-this:
$str = preg_replace('/([a-z])([A-Z])/', '$1-$2', $str);
$str = strtolower($str);

You can use a regex to get each word, then add the dashes like this:
preg_match_all('/[A-Z][a-z]+/', $className, $matches); // get each capitalized word
$newName = strtolower(implode('-', $matches[0])); // add the dashes and lowercase the result

This can be done simply, without any capture groups: find the zero-width position before each uppercase letter (excluding the first letter of the string), replace it with a hyphen, then call strtolower on the new string.
Code:
echo strtolower(preg_replace('~(?!^)(?=[A-Z])~', '-', $string));
The lookahead (?=...) makes the match but doesn't consume any characters.
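One thing to watch with the lookahead version is a run of capitals, since every uppercase letter gets its own hyphen. The sketch below compares it with a variant that tries to keep acronyms together. The helper name camelToDashedAcronymAware and its two-part pattern are my own assumptions about how acronyms should split, not something from the answer above.

```php
<?php
// Hypothetical helper: splits after a lowercase letter or digit, and before
// the last capital of an acronym when a lowercase letter follows it.
function camelToDashedAcronymAware(string $name): string {
    $pattern = '~(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])~';
    return strtolower(preg_replace($pattern, '-', $name));
}

// The pattern from the answer above, applied to an assumed acronym input.
echo strtolower(preg_replace('~(?!^)(?=[A-Z])~', '-', 'HTMLParser')), "\n"; // h-t-m-l-parser

echo camelToDashedAcronymAware('HTMLParser'), "\n";     // html-parser
echo camelToDashedAcronymAware('UserManagement'), "\n"; // user-management
```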

The best way to do that might be preg_replace using a pattern that replaces uppercase letters with their lowercase counterparts adding a "-" before them.
You could also go through each letter and rebuild the whole string.
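A rough sketch of the letter-by-letter approach, in case you want to avoid regular expressions entirely. The function name camelToDashedLoop is hypothetical, and the loop assumes plain ASCII class names because it walks the string byte by byte.

```php
<?php
// Hypothetical helper: rebuilds the string one character at a time.
function camelToDashedLoop(string $name): string {
    $out = '';
    $length = strlen($name);
    for ($i = 0; $i < $length; $i++) {
        $char = $name[$i];
        if (ctype_upper($char)) {
            // Add a dash before every capital except the very first character.
            if ($i > 0) {
                $out .= '-';
            }
            $out .= strtolower($char);
        } else {
            $out .= $char;
        }
    }
    return $out;
}

echo camelToDashedLoop('SiteContentManagement'), "\n"; // site-content-management
```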

Related

PHP/Laravel trim all but last word in a namespace

Trying to trim a fully qualified namespace so as to use just the last word. The example namespace is App\Models\FruitTypes\Apple, where that final word could be any number of fruit types. Shouldn't this...
$fruitName = 'App\Models\FruitTypes\Apple';
trim($fruitName, "App\\Models\\FruitTypes\\");
...do the trick? It is returning an empty string. If I try to trim just App\\Models\\ it returns FruitTypes\Apples as expected. I know the backslash is an escape character, but doubling should treat those as actual backslashes.
If you want to use native functionality for this rather than string manipulation, then ReflectionClass::getShortName will do the job:
$reflection = new ReflectionClass('App\\Models\\FruitTypes\\Apple');
echo $reflection->getShortName();
Apple
See https://3v4l.org/eVl9v
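As for why the trim() call in the question returns an empty string: trim() treats its second argument as a set of individual characters, not as a prefix, and strips any of them from both ends. Every character of "Apple" also appears in that set, so nothing is left. The sketch below shows this, plus a plain string alternative using strrpos and substr (my own suggestion, not from the answers here).

```php
<?php
$fruitName = 'App\Models\FruitTypes\Apple';

// The second argument is a character list: A, p, \, M, o, d, e, l, s, F, r, u, i, t, T, y.
// Every character in $fruitName is in that list, so the whole string is stripped.
$stripped = trim($fruitName, "App\\Models\\FruitTypes\\");
var_dump($stripped); // string(0) ""

// Alternative: keep everything after the last backslash.
$pos = strrpos($fruitName, '\\');
$short = ($pos === false) ? $fruitName : substr($fruitName, $pos + 1);
echo $short, "\n"; // Apple
```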
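preg_match() with the regex pattern \\([[:alpha:]]*)$ should do the trick. Inside a single-quoted PHP string the two backslashes have to be written as four, and the matches go into the third argument:
preg_match('/\\\\([[:alpha:]]*)$/', $fruitName, $trimmed);
Your result will then live in $trimmed[1]. If you don't mind the pattern being a bit less explicit, you could do: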
preg_match('/([[:alpha:]]*)$/', $fruitName, $trimmed);
And your result would then be in $trimmed[0].
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
preg_match - php.net
(matches is the third parameter that I named $trimmed, see documentation for full explanation)
An explanation for the regex pattern
\\ matches the character \ literally to establish the start of the match.
The parentheses () create a capturing group to return the match or a substring of the match.
In the capturing group ([[:alpha:]]*):
[:alpha:] matches an alphabetic character [a-zA-Z]
The * quantifier means match between zero and unlimited times, as many times as possible
Then $ asserts position at the end of the string.
So basically, "Find the last \ then return all letters between this and the end of the string".
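To see the difference between $matches[0] and $matches[1] with this pattern, here is a small runnable sketch. The variable name $matches is just my choice; note that the literal backslash in the pattern is written as four backslashes inside a single-quoted PHP string.

```php
<?php
$fruitName = 'App\Models\FruitTypes\Apple';

// '\\\\' in a single-quoted PHP string is two backslashes,
// which the regex engine reads as one literal backslash.
if (preg_match('/\\\\([[:alpha:]]*)$/', $fruitName, $matches)) {
    echo $matches[0], "\n"; // \Apple  (full match, including the backslash)
    echo $matches[1], "\n"; // Apple   (first capture group)
}
```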

Regex to replace character with character itself and hyphen

I need to replace some camelCase characters with the camel Case character and a -.
What I have got is a string like those:
Albert-Weisgerber-Allee 35
Bruninieku iela 50-10
Those strings are going through this regex to separate the number from the street:
$data = preg_replace("/[^ \w]+/", '', $data);
$pcre = '/\A\s*(.*?)\s*\x2f?(\pN+\s*[a-zA-Z]?(?:\s*[-\x2f\pP]\s*\pN+\s*[a-zA-Z]?)*)\s*\z/ux';
preg_match($pcre, $data, $h);
Now, I have two problems.
I'm very bad at regex.
The code above also strips every - from the street names, and there are a lot of such names in Germany and the rest of Europe.
Actually it would be quite easy to just adjust the regex to not cut any hyphens, but I want to learn how regex works and so I decided to try to find a regex that just replaces every camel case letter in the string with
- & matched Camel Case letter
except for the first uppercase letter appearance.
I've managed to find a regex that shows me the places I need to paste a hyphen like so:
/.[A-Z]{1}/ug
https://regex101.com/r/qI2iA9/1
But how on earth do I replace this string:
AlbertWeisgerberAllee
that it becomes
Albert-Weisgerber-Allee
To insert dashes before caps use this regex:
$string="AlbertWeisgerberAllee";
$string=preg_replace("/([a-z])([A-Z])/", "\\1-\\2", $string);
Just use capture groups:
(.)([A-Z]) //removed {1} because [A-Z] implicitly matches {1}
And replace with $1-$2
See https://regex101.com/r/qI2iA9/3
You seem to be overcomplicating the expression. You can use the following to place - before any uppercase letters except the first:
(.)(?=[A-Z])
Just replace that with $1-. Essentially, what this regex does is:
(.) Find any character and place that character in group 1.
(?=[A-Z]) See if an uppercase character follows.
$1- If matched, replace with the character found in group 1 followed by a hyphen.
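Here is how that replacement might look in PHP with the string from the question. The second half is an assumed variant of mine: it only splits where a lowercase letter comes directly before the capital, so a space in front of a capitalized word is left alone. The input 'Am MarktPlatz 5' is a made-up example.

```php
<?php
$street = 'AlbertWeisgerberAllee';

// Capture any character that is followed by an uppercase letter,
// then re-emit it with a hyphen after it.
$hyphenated = preg_replace('/(.)(?=[A-Z])/', '$1-', $street);
echo $hyphenated, "\n"; // Albert-Weisgerber-Allee

// Assumed variant: zero-width split between a lowercase letter and a capital.
$spaced = 'Am MarktPlatz 5'; // hypothetical input
$splitAfterLowercase = preg_replace('/(?<=[a-z])(?=[A-Z])/', '-', $spaced);
echo $splitAfterLowercase, "\n"; // Am Markt-Platz 5
```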

PHP Regex Not Matching Desired Substrings

I've written the following regular expression
$pattern = "~\d+[.][\s]*[A-Z]{1}[A-Za-z0-9\s-']+~";
in order to match substrings such as 2.bon jovi - it's my life.
The problem is that the only part recognized is - bon jovi;
neither " - " nor " ' " is recognized by this regular expression.
I'd prefer to know what is wrong with the regular expression I've written rather than get a new one.
Your regular expression states that after the period character (which can also be written as \.), you will have zero or more whitespace characters, which should then be followed by one uppercase letter. Your string does not have any uppercase letters.
Secondly, the - should be placed last when you want to match it. So, changing your regex to this: ~\d+[.][\s]*[A-Z]{1}[A-Za-z0-9\s'-]+~ will match something like so: 2.Bon jovi - it's my life.
On the other hand, you can change it to this: ~\d+[.][\s]*[A-Za-z0-9\s'-]+~ to match something like so: 2.bon jovi - it's my life.
EDIT: Amended as per the comments of Marko D and aleation.
A better regular expression to handle that would be...
$pattern = "~\d+\.\s*[\pL\pP\s]+~";
This will match a number, followed by a ., followed by optional whitespace, followed by one or more Unicode letters, whitespace or punctuation marks.
$pattern = "~\d+\..*~";
$string = "2.bon jovi - it's my life";
preg_match($pattern, $string, $match);
print_r($match);
output: Array ( [0] => 2.bon jovi - it's my life )
So the way I understand this regular expression is:
\d+ // Match any digit, 1 or more times
[.] // Match a dot
[\s]* // Match 0 or more whitespace characters
[A-Z]{1} // Match characters between an UPPERCASE A-Z Range 1 time
[A-Za-z0-9\s-']+ // Match characters between A-Z, a-z, 0-9, whitespace, dash and apostrophe, 1 or more times
So straight away, your 'bon jovi' might not get matched as it's lower case and you're only looking for uppercase characters. 'bon jovi' also contains a space so perhaps changing that part of the regular expression to allow for lowercase characters and whitespace might help so you'd end up with:
$pattern = "~\d+[.][\s]*[A-Za-z\s]{1}[A-Za-z0-9\s-']+~";
Note: I quickly tested this on RegExr ( http://gskinner.com/RegExr/ ) and it appeared to match the string fine.
Your regex is as follows.
~ // delimiter
\d+ // 1 or more numbers
[.] // a period
[\s]* // 0 or more whitespace characters
[A-Z]{1} // 1 upper case letter
[A-Za-z0-9\s-\']+ // 1 or more characters, from the character class
~ //delimiter
Comparing that to the string "2.bon jovi" You have:
~ //
\d+ // "2"
[.] // "."
[\s]* // ""
[A-Z]{1} // <- NO MATCH
[A-Za-z0-9\s-\']+ //
~ //
"bon" does not start with a capital letter; it therefore does not match [A-Z]{1}.
Cleaner regex
There are a few simple things you can do to clean up your regex:
don't use character classes for a single character
don't specify {1}; it's the same as leaving it out
Applying the above to your existing regex you get:
$pattern = "~\d+\.\s*[A-Z][A-Za-z0-9\s-']+~";
Which is slightly easier to read.
Your [A-Z]{1} sub-pattern requires one capital letter, so "2.bon jovi - it's my life" will not match.
And you need to escape the - in the [A-Za-z0-9\s-'] character class, or put it at the start or end, otherwise it is specifying a range.
"~\d+\.[A-Za-z0-9\s'-]+~"
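As pointed out in the comments, it is actually not necessary to escape the - in the character class in your regex. That is only because you happened to precede it with a metacharacter \s that cannot be part of a range. Normally, if you want to match a literal - and you have it in a character class, you must escape it or position it as described above. Be aware that PHP 7.3 and later (which use PCRE2) treat a hyphen placed after \s in the middle of a class as an invalid range and refuse to compile the pattern, so escaping or repositioning it is the safer habit.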
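To make the two points from the answers concrete (the required capital letter, and the hyphen placed last in the class), here is a small sketch. The variable names are mine, and both patterns put the hyphen at the end of the class so it is always read as a literal.

```php
<?php
$lower = "2.bon jovi - it's my life";
$upper = "2.Bon jovi - it's my life";

// Hyphen placed last in the class, so it is always a literal hyphen.
$requiresCapital = "~\d+\.\s*[A-Z][A-Za-z0-9\s'-]+~";
$anyCase         = "~\d+\.\s*[A-Za-z0-9\s'-]+~";

var_dump(preg_match($requiresCapital, $lower)); // int(0): "b" is not in [A-Z]
var_dump(preg_match($requiresCapital, $upper)); // int(1)

preg_match($anyCase, $lower, $m);
echo $m[0], "\n"; // 2.bon jovi - it's my life
```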

Consolidate repeating pattern

I am working on a script that develops certain strings of alphanumeric characters, separated by a dash -. I need to test the string to see if there are any sets of characters (the characters that lie in between the dashes) that are the same. If they are, I need to consolidate them. The repeating chars would always occur at the front in my case.
Examples:
KRS-KRS-454-L
would become:
KRS-454-L
DERP-DERP-545-P
would become:
DERP-545-P
<?php
$s = 'KRS-KRS-454-L';
echo preg_replace('/^(\w+)-(?=\1)/', '', $s);
?>
// KRS-454-L
This uses a positive lookahead (?=...) to check for repeated strings.
Note that \w also contains the underscore. If you want to limit to alphanumeric characters only, use [a-zA-Z0-9].
Also, I've anchored with ^ as you've mentioned: "The repeating chars would always occur at the front [...]"
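Two things the pattern above does not cover, as far as I can tell: it removes only one repeat (DERP-DERP-DERP-545-P keeps two DERPs), and the lookahead only checks that the next segment starts with the first one (KR-KRS-454-L would lose its KR-). The sketch below is a variation under those assumptions. The helper name collapseLeadingRepeats is hypothetical, and I've used [A-Za-z0-9] instead of \w to leave out the underscore.

```php
<?php
// Hypothetical helper; the pattern is a variation on the answer above, not the original.
function collapseLeadingRepeats(string $code): string {
    // ^([A-Za-z0-9]+)  first segment
    // (?:-\1)+         one or more repeats of that segment
    // (?=-|$)          the last repeat must end at a dash or at the end of the string
    return preg_replace('/^([A-Za-z0-9]+)(?:-\1)+(?=-|$)/', '$1', $code);
}

echo collapseLeadingRepeats('KRS-KRS-454-L'), "\n";        // KRS-454-L
echo collapseLeadingRepeats('DERP-DERP-DERP-545-P'), "\n"; // DERP-545-P
echo collapseLeadingRepeats('KR-KRS-454-L'), "\n";         // KR-KRS-454-L (unchanged)
```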
Try the pattern:
/([a-z]+)(?:-\1)*(.*)/i
and replace it with:
$1$2
A demo:
$tests = array(
'KRS-KRS-454-L',
'DERP-DERP-DERP-545-P',
'OKAY-666-A'
);
foreach ($tests as $t) {
echo preg_replace('/([a-z]+)(?:-\1)*(.*)/i', '$1$2', $t) . "\n";
}
produces:
KRS-454-L
DERP-545-P
OKAY-666-A
A quick explanation:
([a-z]+) # group the first "word" in match group 1
(?:-\1)* # match a hyphen followed by what was matched in
# group 1, and repeat it zero or more times
(.*) # match the rest of the input and store it in group 2
The replacement string $1$2 is replaced by what was matched by groups 1 and 2 in the pattern above.
Use this regex ((?:[A-Z-])+)\1{1} and replace the matched string with $1.
\1 is used in connection with {1} in the above regex. It will look for repeating instance of characters.
You need back references. Using perl syntax, this would work for you:
$line =~ s/([A-Za-z0-9]+-)\1+/\1/gi;

How to replace double/more letters to a single letter?

I need to convert any letter that occur twice or more within a word with a single letter of itself.
For example:
School -> Schol
Google -> Gogle
Gooooogle -> Gogle
VooDoo -> Vodo
I tried the following, but got stuck at the second parameter in eregi_replace.
$word = 'Goooogle';
$word2 = eregi_replace("([a-z]{2,})", "?", $word);
If I use \\\1 to replace ?, it would display the exact match.
How do I make it single letter?
Can anyone help? Thanks
See regular expression to replace two (or more) consecutive characters by only one?
By the way: you should use the preg_* (PCRE) functions instead of the ereg_* functions (POSIX), which were deprecated in PHP 5.3 and removed in PHP 7.0.
Richard Szalay's answer leads the right way:
$word = 'Goooogle';
$word2 = preg_replace('/(\w)\1+/', '$1', $word);
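Not only are you capturing the entire run (instead of just the first character), but {2,} re-matches [a-z] rather than repeating the original match. It should work if you use the following (the backreference has to be written as \\1, because "\1" in a double-quoted PHP string is an octal escape):
$word2 = eregi_replace("(\w)\\1+", "\\1", $word);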
Which backreferences the original match. You can replace \w with [a-z] if you wish.
The + is required for your Goooogle example (for the JS regex engine, anyway), but I'm not sure why.
Remember that in JavaScript you will need to use the "global" flag ("g"); PHP's preg_replace replaces every match by default.
Try this:
$string = "thhhhiiiissssss hasss sooo mannnny letterss";
$string = preg_replace('/([a-zA-Z])\1+/', '$1', $string);
How this works:
/ ... / # Marks the start and end of the expression.
([a-zA-Z]) # Match any single a-z character lowercase or uppercase.
\1+ # One or more occurrence of the single character we matched previously.
$1 # Replacement: the same single character we matched previously, once.
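Putting it together against the examples from the question, as a rough sketch. The function name collapseRepeatedLetters is made up, and the i flag is my assumption so that a mixed-case pair such as "Oo" also counts as a repeat. Note that case is preserved, so VooDoo comes out as VoDo rather than Vodo.

```php
<?php
// Hypothetical helper: collapses any run of the same letter into one letter.
function collapseRepeatedLetters(string $word): string {
    return preg_replace('/([a-z])\1+/i', '$1', $word);
}

echo collapseRepeatedLetters('School'), "\n";    // Schol
echo collapseRepeatedLetters('Gooooogle'), "\n"; // Gogle
echo collapseRepeatedLetters('VooDoo'), "\n";    // VoDo
```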
