Regex to replace character with character itself and hyphen

I need to insert a hyphen before some of the capital letters in a camelCase string, i.e. replace each such capital with a - followed by the capital itself.
What I have are strings like these:
Albert-Weisgerber-Allee 35
Bruninieku iela 50-10
These strings are run through the following code to separate the house number from the street:
$data = preg_replace("/[^ \w]+/", '', $data);
$pcre = '/\A\s*(.*?)\s*\x2f?(\pN+\s*[a-zA-Z]?(?:\s*[-\x2f\pP]\s*\pN+\s*[a-zA-Z]?)*)\s*\z/ux';
preg_match($pcre, $data, $h);
Now, I have two problems.
I'm very bad at regex.
The first preg_replace above also strips every - from the street name, and there are a lot of hyphenated street names in Germany and the rest of Europe.
Actually, it would be quite easy to adjust the regex so it doesn't strip hyphens, but I want to learn how regex works, so I decided to try to find a regex that replaces every camel-case capital letter in the string with a hyphen followed by the matched capital letter, except for the first uppercase letter.
I've managed to find a regex that shows me the places where I need to insert a hyphen, like so:
.[A-Z]{1}/ug
https://regex101.com/r/qI2iA9/1
But how on earth do I replace this string:
AlbertWeisgerberAllee
that it becomes
Albert-Weisgerber-Allee
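For comparison, here is a minimal sketch of the "easy" fix mentioned above, assuming the only goal is to keep hyphens: leave the hyphen out of the characters the first preg_replace strips, and run the question's $pcre pattern with an opening / delimiter. splitAddress() is a hypothetical helper name; the two sample addresses are the ones from the question.

```php
<?php
// Hypothetical helper: split "street number" into [street, number]
// while keeping hyphens in the street name.
function splitAddress(string $data): ?array
{
    // Strip everything except spaces, word characters and hyphens
    $data = preg_replace('/[^ \w-]+/', '', $data);

    // The question's pattern, with an opening "/" delimiter
    $pcre = '/\A\s*(.*?)\s*\x2f?(\pN+\s*[a-zA-Z]?(?:\s*[-\x2f\pP]\s*\pN+\s*[a-zA-Z]?)*)\s*\z/ux';

    if (preg_match($pcre, $data, $h) !== 1) {
        return null; // no house number found
    }
    return [$h[1], $h[2]];
}

print_r(splitAddress('Albert-Weisgerber-Allee 35'));
print_r(splitAddress('Bruninieku iela 50-10'));
```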

To insert dashes before caps use this regex:
$string="AlbertWeisgerberAllee";
$string=preg_replace("/([a-z])([A-Z])/", "\\1-\\2", $string);
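A quick sketch of that answer applied to both strings from the question. Since the pattern only fires between a lowercase ASCII letter and an uppercase one, the Latvian address passes through unchanged.

```php
<?php
// Insert a hyphen between a lowercase letter and the uppercase letter after it
$samples = ['AlbertWeisgerberAllee', 'Bruninieku iela 50-10'];

$results = [];
foreach ($samples as $string) {
    $results[] = preg_replace("/([a-z])([A-Z])/", "\\1-\\2", $string);
}

print_r($results);
```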

Just use capture groups:
(.)([A-Z]) //removed {1} because [A-Z] already matches exactly one character
And replace with $1-$2
See https://regex101.com/r/qI2iA9/3
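One difference from the ([a-z])([A-Z]) version, shown in this sketch: . matches any character, including a space, so (.)([A-Z]) also puts a hyphen after a space that precedes a capital. The spaced street name is an illustrative input, not one from the question.

```php
<?php
$input = 'Albert Weisgerber Allee';

// (.) also matches the space in front of "W" and "A"
$anyChar = preg_replace('/(.)([A-Z])/', '$1-$2', $input);

// ([a-z]) only matches a lowercase letter, so the spaced name is left alone
$lowerOnly = preg_replace('/([a-z])([A-Z])/', '$1-$2', $input);

echo $anyChar, PHP_EOL;   // Albert -Weisgerber -Allee
echo $lowerOnly, PHP_EOL; // Albert Weisgerber Allee
```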

You seem to be overcomplicating the expression. You can use the following to place a - before every uppercase letter except the first:
(.)(?=[A-Z])
Just replace that with $1-. Essentially, what this regex does is:
(.) Find any character and place that character in group 1.
(?=[A-Z]) See if an uppercase character follows.
$1- If matched, replace with the character found in group 1 followed by a hyphen.
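Because the lookahead doesn't consume the capital letter, that letter can be captured as group 1 of the next match, so runs of capitals are split one by one. A sketch contrasting it with (.)([A-Z]); XMLHttpRequest is just an illustrative input:

```php
<?php
$input = 'XMLHttpRequest';

// Lookahead: each capital stays available for the next match
$lookahead = preg_replace('/(.)(?=[A-Z])/', '$1-', $input);

// Two capture groups: the capital is consumed, so "ML" is not split
$consuming = preg_replace('/(.)([A-Z])/', '$1-$2', $input);

echo $lookahead, PHP_EOL; // X-M-L-Http-Request
echo $consuming, PHP_EOL; // X-ML-Http-Request
```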

Related

Php Regex to insert character after first all-capital letter word in a string

I'm trying to use a preg_replace or similar php function to:
- identify the first all capital letter word in a string,
- and insert a character directly after it (a dash or semi-colon will do)
- the all capital letter word should be 3 characters long or more.
So far I have the regular expression:
/(?<!\ )([^A-Z{3,}])/
But this isn't restricting the match to words of 3 or more characters. I'm also not sure I have it strictly looking only at the very first word.
I believe that once I have the regex sorted out, this
$string = "LONDON On November 12th twelve people...";
$replaced_string = preg_replace('/myregex/',': ', $string);
will output the following:
LONDON: On November 12th twelve people...

It's a fairly simple regex, really:
$replacedString = preg_replace('/\b([A-Z]{3,})\b/', '$1: ', $string);
It works like this:
\b: word boundary. This detects the start and end of a "word"
([A-Z]{3,}): Match 3 or more upper-case characters. The brackets capture this part of the match, so we can use it in the replacement string
\b: Another word boundary
Replace this match with:
'$1: ': the $1 refers back to the first captured group (the 3 or more upper case characters). To this, we're adding a colon and a space. That will be our replacement string
This will add the colon and space after all upper-case words of 3 or more characters. To replace only 1 word, just pass a limit to preg_replace:
$replaced = preg_replace('/\b([A-Z]{3,})\b/', '$1: ', $string, 1);
The last argument is the maximum number of replacements: -1 (the default) means no limit, 1 replaces only the first match, 2 the first two, and so on.
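A sketch of the limit argument on the question's sample string. One detail: the sample already has a space after LONDON, so the replacement '$1: ' leaves two spaces, while '$1:' reproduces the output the question asks for. The NYC AND PARIS string is an illustrative input.

```php
<?php
$string = "LONDON On November 12th twelve people...";
$pattern = '/\b([A-Z]{3,})\b/';

// The space that already follows LONDON is kept, so '$1: ' yields two spaces
$withSpace = preg_replace($pattern, '$1: ', $string, 1);

// Reusing the existing space gives the output the question asks for
$withoutSpace = preg_replace($pattern, '$1:', $string, 1);

// The limit matters once there are several all-caps words
$first = preg_replace($pattern, '$1:', 'NYC AND PARIS', 1);
$all   = preg_replace($pattern, '$1:', 'NYC AND PARIS');

echo $withoutSpace, PHP_EOL; // LONDON: On November 12th twelve people...
```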
Judging by your sample string, the upper-case words are city names. It's possible for city names to contain a dash, or even a space. To address this, you might want to match all strings containing upper-case chars, dashes and spaces:
$replaceAll = preg_replace('/\b([A-Z -]{2,}[A-Z])\b/', '$1: ', $string);
What changed:
([A-Z -]{2,}: The capturing group starts by matching 2 or more (not 3) upper-case chars, spaces or dashes.
[A-Z]): The last character of the captured group must be an upper-case character, this avoids capturing the trailing spaces or dashes. The result is that we capture stuff like "NEW YORK" or "FOO-TOWN", but not "ON - Something".
The rest is the same as before. If you want to allow for other characters that might occur (like a dot) just add them to the first part of the capturing group. The most complete pattern will probably be something like this:
$replaced = preg_replace('/\b([A-Z][A-Z .-]+[A-Z])\b/', '$1: ', $string);
This ensures the captured group starts and ends with an upper-case character, with one or more upper-case chars, spaces, dots and dashes in between. So this will match something like "ST. LEWIS", too.
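A sketch running that last pattern against a few illustrative strings (not from the question), with '$1:' so the space already following the city name is reused, and a limit of 1:

```php
<?php
$pattern = '/\b([A-Z][A-Z .-]+[A-Z])\b/';
$samples = [
    'NEW YORK On Monday',
    'ST. LEWIS On Monday',
    'FOO-TOWN On Monday',
];

$results = [];
foreach ($samples as $s) {
    // Only the first all-caps run (the city name) is replaced
    $results[] = preg_replace($pattern, '$1:', $s, 1);
}
print_r($results);
```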

Match 2 or more uppercase characters in entire string

I'm trying to create a pattern in PHP that matches 2 or more upper case characters in a string.
I've tried the following, but it only matches 2 or more upper case characters in a row, not the entire string:
preg_match('/[A-Z]{2,}/', $string);
For example, the string "aBcDe" or "Red Apple" should return true.

You just have to allow other characters between your uppercase letters:
^(?:.*?\p{Lu}){2}
Demo
I used \p{Lu} here to include Unicode uppercase letters as well (in PHP, add the u modifier to the pattern for this). If you don't want that, just use [A-Z] instead, like you did in your pattern.
This simply means:
^ from the start of the string
(?: group:
.*? match anything, but as few chars as possible
\p{Lu} match an uppercase letter
){2} ... two times
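A sketch of that pattern in PHP. The u modifier makes PCRE treat the subject as UTF-8, which \p{Lu} needs in order to see multibyte letters such as Ä as single characters. hasTwoUppercase() is a hypothetical helper name and the umlaut string is an illustrative input.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer
function hasTwoUppercase(string $s): bool
{
    return preg_match('/^(?:.*?\p{Lu}){2}/u', $s) === 1;
}

$umlauts = "\u{C4}pfel \u{D6}l"; // "Äpfel Öl"

var_dump(hasTwoUppercase('aBcDe'));     // bool(true)
var_dump(hasTwoUppercase('Red Apple')); // bool(true)
var_dump(hasTwoUppercase('Red apple')); // bool(false)
var_dump(hasTwoUppercase($umlauts));    // bool(true)

// The ASCII-only class does not see Ä or Ö at all
var_dump(preg_match('/^(?:.*?[A-Z]){2}/', $umlauts)); // int(0)
```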

If all you need to do is identify that a string contains at least 2 uppercase characters then you can use the following:
[A-Z].*?[A-Z]
If you need to identify the specific uppercase characters in the string then things get more complicated.
UPDATE: As Lucas mentioned, you need a different regex if you want Unicode support (use it with the u modifier in PHP):
\p{Lu}.*?\p{Lu}
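The patterns in this answer rely on ., which does not match a newline unless the s modifier is set, so two capitals on different lines are missed. A different approach, sketched here as an alternative rather than a single pattern, counts the capitals with preg_match_all(), which returns the number of matches. countUppercase() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: count uppercase letters (Unicode-aware via /u)
function countUppercase(string $s): int
{
    // preg_match_all() returns the number of matches (false on error, cast to 0)
    return (int) preg_match_all('/\p{Lu}/u', $s);
}

var_dump(countUppercase('aBcDe') >= 2);      // bool(true)
var_dump(countUppercase("Red\nApple") >= 2); // bool(true)
var_dump(countUppercase('Red apple') >= 2);  // bool(false)

// Without the s modifier, . stops at the newline
var_dump(preg_match('/[A-Z].*?[A-Z]/', "Red\nApple"));  // int(0)
var_dump(preg_match('/[A-Z].*?[A-Z]/s', "Red\nApple")); // int(1)
```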

^.*[A-Z].*[A-Z].*$
A simple pattern stating the same requirement works too. See the demo:
https://regex101.com/r/pT4tM5/23

[A-Z].*[A-Z]
is about as simple as it gets: match an uppercase letter, followed by any character repeated any number of times, followed by another uppercase letter.

If you need to match the whole line/string that has at least 2 upper case letters, you can also use
^(?=(?:.*[A-Z]){2}).+$
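Since that pattern returns the whole line, it can also pull the qualifying lines out of a multi-line string when combined with the m modifier, which makes ^ and $ match at each line boundary. A sketch with an illustrative three-line string:

```php
<?php
$text = "Red Apple\ngreen pear\nBig Blue Sky";

// m: ^ and $ match at the start and end of every line
$count = preg_match_all('/^(?=(?:.*[A-Z]){2}).+$/m', $text, $matches);

// $matches[0] holds each line with at least two capitals
print_r($matches[0]);
```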

regex exclude space from \W

What I have now: preg_match("[\W|_]",$string), which matches any non-word character and underscores. However, I only want to match strings containing \w and individual spaces in the middle of the string (as opposed to $string starting or ending with any number of spaces), but not underscores. Thanks for your help!
Examples that should be matched: Example 123 or One Two Three.
Examples that should be rejected: example&, or " Example  of foo" (which starts with one or more spaces and has multiple spaces between "Example" and "of").

Ah, so you don't need to capture the results of the match, just test whether or not the string matches some pattern. That can be done with...
$pattern = '/^[A-Z0-9](?:[A-Z0-9 ]*[A-Z0-9])?$/i';
... but that's destined to fail if you want to cover letters outside of ASCII range. You should use this instead then:
$pattern = '/^[\p{L}0-9](?:[\p{L}0-9 ]*[\p{L}0-9])?$/u';
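Note that [A-Z0-9 ]* in the middle of that pattern accepts several spaces in a row, so "Example  of foo" (two spaces) still matches, although the question asks to reject it. A stricter variant, sketched here as an assumption rather than the answer's own pattern, requires each space to sit between two groups of letters or digits:

```php
<?php
// Pattern from the answer: a run of spaces of any length is allowed in the middle
$answerPattern = '/^[A-Z0-9](?:[A-Z0-9 ]*[A-Z0-9])?$/i';

// Stricter variant (an assumption): word groups separated by exactly one space
$singleSpace = '/^[A-Z0-9]+(?: [A-Z0-9]+)*$/i';

$inputs = ['Example 123', 'One Two Three', 'example&', ' Example of foo', 'Example  of foo'];

$results = [];
foreach ($inputs as $input) {
    $results[$input] = [
        'answer' => preg_match($answerPattern, $input),
        'strict' => preg_match($singleSpace, $input),
    ];
}
print_r($results);
```

The same change for the Unicode version would be /^[\p{L}0-9]+(?: [\p{L}0-9]+)*$/u.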

PHP Regex Not Matching Desired Substrings

I've written the following regular expression
$pattern = "~\d+[.][\s]*[A-Z]{1}[A-Za-z0-9\s-']+~";
in order to match substrings such as 2.bon jovi - it's my life
The problem is that the only part that gets recognized is bon jovi:
neither " - " nor " ' " is recognized by this regular expression.
I'd prefer to know what is wrong with the regular expression that I've wrote rather than getting a new one.

Your regular expression states that after the period character (which can be written as \.), you will have zero or more whitespace characters, which should then be followed by 1 upper-case letter. In your string, you do not have any upper-case letters.
Secondly, the - should be placed last when you want to match it. So, changing your regex to this: ~\d+[.][\s]*[A-Z]{1}[A-Za-z0-9\s'-]+~ will match something like so: 2.Bon jovi - it's my life.
On the other hand, you can change it to this: ~\d+[.][\s]*[A-Za-z0-9\s'-]+~ to match something like so: 2.bon jovi - it's my life.
EDIT: Amended as per the comments of Marko D and aleation.

A better regular expression to handle that would be...
$pattern = "~\d+\.\s*[\pL\pP\s]+~u";
This will match a number, followed by a ., followed by optional whitespace, followed by one or more Unicode letters, whitespace or punctuation marks.

$pattern = "~\d+\..*~";
$string = "2.bon jovi - it's my life";
preg_match($pattern, $string, $match);
print_r($match);
output: Array ( [0] => 2.bon jovi - it's my life )

So the way I understand this regular expression is:
\d+ // Match any digit, 1 or more times
[.] // Match a dot
[\s]* // Match 0 or more whitespace characters
[A-Z]{1} // Match characters between an UPPERCASE A-Z Range 1 time
[A-Za-z0-9\s-']+ // Match characters between A-Z, a-z, 0-9, whitespace, dashes and apostrophes
So straight away, your 'bon jovi' won't get matched, as it's lower case and you're only looking for uppercase characters. 'bon jovi' also contains a space, so changing that part of the regular expression to allow for lowercase characters and whitespace might help, and you'd end up with:
$pattern = "~\d+[.][\s]*[A-Za-z\s]{1}[A-Za-z0-9\s'-]+~";
Note: I quickly tested this on RegExr ( http://gskinner.com/RegExr/ ) and it appeared to match the string fine.

Your regex is as follows:
~ // delimiter
\d+ // 1 or more numbers
[.] // a period
[\s]* // 0 or more whitespace characters
[A-Z]{1} // 1 upper case letter
[A-Za-z0-9\s-\']+ // 1 or more characters, from the character class
~ //delimiter
Comparing that to the string "2.bon jovi", you have:
~ //
\d+ // "2"
[.] // "."
[\s]* // ""
[A-Z]{1} // <- NO MATCH
[A-Za-z0-9\s-\']+ //
~ //
"bon" does not start with a capital letter, so it does not match [A-Z]{1}.
Cleaner regex
There are a few simple things you can do to clean up your regex
don't use character-classes for one character
don't specify {1} it's the same as not being present
Applying the above to your existing regex you get:
$pattern = "~\d+\.\s*[A-Z][A-Za-z0-9\s'-]+~";
Which is slightly easier to read. (The - has also been moved to the end of the character class, so it is always a literal dash.)

Your [A-Z]{1} sub-pattern requires one capital letter, so "2.bon jovi - it's my life" will not match.
And you need to escape the - in the [A-Za-z0-9\s-'] character class, or put it at the start or end, otherwise it is specifying a range.
"~\d+\.[A-Za-z0-9\s'-]+~"
As pointed out in the comments, with the older PCRE library it was not strictly necessary to escape the - in your character class, only because you happened to precede it with the shorthand class \s, which cannot be part of a range. PHP 7.3 and later use PCRE2, which rejects a hyphen in that position as an invalid range unless it is the last character in the class. So, if you want to match a literal - in a character class, escape it or position it as described above.
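A quick check against the string from the question, as a sketch. The hyphen is placed last in the class, and the second call keeps the original [A-Z] requirement after the dot to show that it is what rejects the lowercase "bon":

```php
<?php
$string = "2.bon jovi - it's my life";

// Hyphen last in the class, so it is a literal "-"
$pattern = "~\d+\.\s*[A-Za-z0-9\s'-]+~";
preg_match($pattern, $string, $match);

// Requiring a capital right after the dot fails on "bon"
$withCapital = preg_match("~\d+\.\s*[A-Z][A-Za-z0-9\s'-]+~", $string);

print_r($match);        // the whole string is matched
var_dump($withCapital); // int(0)
```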

PHP: How to convert a string that contains upper case characters

I'm working on class names and I need to check whether there is any UpperCamelCase name and break it up this way:
"UserManagement" becomes "user-management"
or
"SiteContentManagement" becomes "site-content-management"
After an extensive search I only found various uses of ucfirst, strtolower, strtoupper and ucwords, and I can't see how to use them to suit my needs. Any ideas?
Thanks for reading ;)

You can use preg_replace to replace any instance of a character that is not an uppercase letter or a dash, followed by an uppercase letter, with the same two characters separated by a dash:
$dashedName = preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className);
Then followed by a strtolower() to take care of any remaining uppercase letters:
return strtolower($dashedName);
The full function here:
function camel2dashed($className) {
    return strtolower(preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className));
}
To explain the regular expression used:
/ Opening delimiter
( Start Capture Group 1
[^A-Z-] Character Class: Any character NOT an uppercase letter and not a dash
) End Capture Group 1
( Start Capture Group 2
[A-Z] Character Class: Any uppercase letter
) End Capture Group 2
/ Closing delimiter
As for the replacement string
$1 Insert Capture Group 1
- Literal: dash
$2 Insert Capture Group 2
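A sketch exercising camel2dashed() as written above. Existing dashes are not doubled, because [^A-Z-] excludes them, but a run of capitals such as the illustrative HTMLParser is not split at all, since an uppercase letter can never be group 1.

```php
<?php
function camel2dashed($className) {
    return strtolower(preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className));
}

$examples = ['UserManagement', 'SiteContentManagement', 'User-Management', 'HTMLParser'];
$converted = array_map('camel2dashed', $examples);

print_r($converted);
```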

There's no built-in way to do it.
This will ConvertThis into convert-this:
$str = preg_replace('/([a-z])([A-Z])/', '$1-$2', $str);
$str = strtolower($str);
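A sketch of those two lines on a couple of inputs. Because [a-z] excludes digits, a capital that follows a digit gets no dash; Version2Beta is an illustrative input. The [^A-Z-] class in the answer above would give version2-beta for it instead.

```php
<?php
$inputs = ['ConvertThis', 'Version2Beta'];

$out = [];
foreach ($inputs as $str) {
    // Dash only between a lowercase ASCII letter and an uppercase one
    $str = preg_replace('/([a-z])([A-Z])/', '$1-$2', $str);
    $out[] = strtolower($str);
}
print_r($out);
```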

You can use a regex to get each word, then add the dashes like this:
preg_match_all('/[A-Z][a-z]+/', $className, $matches); // get each capitalized word
$newName = strtolower(implode('-', $matches[0])); // add the dashes and lowercase the result
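A sketch showing a side effect of this approach: anything that is not part of a capital-plus-lowercase run is dropped, so digits and acronyms disappear. HTMLParser is an illustrative input.

```php
<?php
$newNames = [];
foreach (['UserManagement', 'SiteContentManagement', 'HTMLParser'] as $className) {
    // Only "capital followed by lowercase letters" runs are kept
    preg_match_all('/[A-Z][a-z]+/', $className, $matches);
    $newNames[] = strtolower(implode('-', $matches[0]));
}
print_r($newNames);
```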

This can be done simply, without any capture groups: just find the zero-width position before each uppercase letter (excluding the first letter of the string), replace it with a hyphen, then call strtolower on the new string.
Code:
echo strtolower(preg_replace('~(?!^)(?=[A-Z])~', '-', $string));
The lookahead (?=...) checks that an uppercase letter follows without consuming any characters, and (?!^) rules out the very start of the string.
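A sketch of the lookahead version on a few illustrative inputs. Because it inserts a dash before every capital except one at the start, an acronym is split letter by letter and an existing dash gets doubled:

```php
<?php
$results = [];
foreach (['SiteContentManagement', 'HTMLParser', 'User-Management'] as $string) {
    // (?!^) skips position 0, (?=[A-Z]) finds the spot before each capital
    $results[] = strtolower(preg_replace('~(?!^)(?=[A-Z])~', '-', $string));
}
print_r($results);
```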

The best way to do that might be preg_replace_callback, with a pattern that matches uppercase letters and replaces them with their lowercase counterparts preceded by a "-".
You could also go through each letter and rebuild the whole string.
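A sketch of both suggestions. preg_replace_callback() is used for the first one because a plain preg_replace() replacement string cannot lowercase the match. camelToDashedCallback() and camelToDashedLoop() are hypothetical names, and the loop only recognizes ASCII capitals.

```php
<?php
// Hypothetical helper: callback version
function camelToDashedCallback(string $name): string
{
    $dashed = preg_replace_callback(
        '/(?!^)[A-Z]/', // every capital except one at the very start
        fn (array $m): string => '-' . strtolower($m[0]),
        $name
    );
    return lcfirst($dashed); // lowercase the leading capital, if any
}

// Hypothetical helper: loop version that rebuilds the string letter by letter
function camelToDashedLoop(string $name): string
{
    $result = '';
    $length = strlen($name);
    for ($i = 0; $i < $length; $i++) {
        $char = $name[$i];
        if ($char >= 'A' && $char <= 'Z') {
            // Dash before every capital except the first character
            if ($i > 0) {
                $result .= '-';
            }
            $result .= strtolower($char);
        } else {
            $result .= $char;
        }
    }
    return $result;
}

echo camelToDashedCallback('SiteContentManagement'), PHP_EOL; // site-content-management
echo camelToDashedLoop('SiteContentManagement'), PHP_EOL;     // site-content-management
```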
