I have an issue with correctly capitalizing (Dutch) city names when they start with an apostrophe. For instance, I could have the names:
'S-HERTOGENBOSCH or 's gravendeel or 'T Harde
What I would like to do is convert everything to lowercase and then capitalize the first letter after the prefix 'S or 's or 'T. So the outcome should be:
's-Hertogenbosch and 's Gravendeel and 't Harde
I'm thinking about using a Regex to do this but am not quite sure yet how this should be done. Could someone point me in the right direction?
Thanks!
You could use preg_replace_callback.
$city = strtolower("'T-HERTOGENBOSCH");
echo preg_replace_callback("/('(s|t)( |\-))([a-z])/", function($matches) {
    return $matches[1] . ucfirst($matches[4]);
}, $city);
The pattern uses multiple subpatterns, whose results get reassembled in the callback function:
('(s|t)( |\-)) # Apostrophe, then 's' or 't', then '<space>' or '-'
([a-z]) # The following lowercased character
Note that I've wrapped the first part into an additional subpattern. This makes reassembling it simpler.
Try the following function, which is based on sanchises' regex (I edited it slightly...):
function dutch_city_name($name) {
    $name = strtolower(trim($name));
    $matches = array();
    // Anchor at the start so the fixed offset of 3 below is always valid
    preg_match("/^'([a-z])( |-)[a-z]*/", $name, $matches);
    if(count($matches) == 0) {
        return $name;
    }
    return "'".$matches[1].$matches[2].ucfirst(substr($name, 3));
}
I tried it and it is working.
First, I would like to recommend websites like regex101.com or an equivalent. Then, let me talk you through a very basic regex:
- You want the literal "'" character followed by exactly one other character, which you would like to match in order to 'uncapitalize' it,
- and then a whole word.
Basically, you need to match something of the form '([a-zA-Z])(?: |\-)[a-zA-Z]*. From left to right:
' Literal '
([a-zA-Z]) Single character in the alphabet, lower- or uppercase. Is a matching group.
(?: |\-) Either a space or a dash. Is not a matching group.
[a-zA-Z]* A series of characters in the alphabet. Could be ([a-zA-Z]*) if you want to capture this bit too.
Now that you have your matching, all you need to do is replace it with the uncapitalized version, for example using a PHP function.
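One way to turn the pattern above into an actual replacement in PHP is preg_replace_callback. This is only a minimal sketch, not part of the original answer: the function name fix_dutch_prefix is made up for illustration, the pattern is anchored to the start of the string, and the first letter of the word gets its own capture group. It assumes ASCII-only names.

```php
<?php
// Hypothetical helper: lowercase everything, then capitalize the first
// letter that follows a leading "'s" / "'t" style prefix.
function fix_dutch_prefix(string $name): string
{
    $name = strtolower(trim($name));

    // ^'        apostrophe at the start of the string
    // ([a-z])   the single prefix letter (stays lowercase)
    // ([ -])    the separator: a space or a dash
    // ([a-z])   first letter of the actual name (gets uppercased)
    $result = preg_replace_callback(
        "/^'([a-z])([ -])([a-z])/",
        function (array $m): string {
            return "'" . $m[1] . $m[2] . strtoupper($m[3]);
        },
        $name
    );

    // preg_replace_callback returns null on a regex error; fall back to input.
    return $result ?? $name;
}

echo fix_dutch_prefix("'S-HERTOGENBOSCH"), "\n"; // 's-Hertogenbosch
echo fix_dutch_prefix("'s gravendeel"), "\n";    // 's Gravendeel
echo fix_dutch_prefix("'T Harde"), "\n";         // 't Harde
```

Names without the apostrophe prefix come back fully lowercased in this sketch; wrap the call in ucfirst() if those should be capitalized too.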
Here's one without regex. It simply checks if the first character is an apostrophe and if so, skips the character after the apostrophe when searching for the first letter to capitalize.
function capitalizeCityName($name) {
    $name = strtolower(trim($name));
    // Skip the letter right after a leading apostrophe; otherwise start at the first character
    $i = ($name !== '' && $name[0] === "'") ? 2 : 0;
    for(; $i<strlen($name); $i++) {
        if(ctype_alpha($name[$i])) {
            $name[$i] = strtoupper($name[$i]);
            break;
        }
    }
    return $name;
}
print capitalizeCityName("'T Harde"); //'t Harde
print capitalizeCityName("Harde"); //Harde
I don't know if the PHP replace function you want to use supports changing the case of letters in a dynamic replacement string. But the following worked with the Perl regular expression engine in the text editor UltraEdit v21.10.
Search string:
'([STst])(\W)(\w)([\w\-]+)
Replace string:
'\L\1\E\2\U\3\E\L\4\E
or
'\l\1\2\u\3\L\4\E
The search string matches:
a straight apostrophe,
followed by character s or t in any case marked for backreferencing as string 1,
a single non word character marked for backreferencing as string 2,
a single word character marked for backreferencing as string 3,
1 or more additional word characters or hyphens marked for backreferencing as string 4.
The replace string:
keeps the apostrophe,
first marked string (character s or t in any case) converted to lower case,
second marked string unmodified,
third marked string (first word character of city name) converted to upper case,
fourth marked string converted to lower case.
Explanation of the special characters in replace string:
\l ... convert only next character to lower case.
\u ... convert only next character to upper case.
\L ... convert all characters up to \E to lower case.
\U ... convert all characters up to \E to upper case.
Note: The case conversion works only for the ASCII letters A-Za-z and not for language specific, localized letters like German umlauts, characters with an accent, etc.
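PHP's preg_replace() does not interpret \l, \u, \L, \U or \E in the replacement string, so in PHP the same idea has to go through a callback. The sketch below reuses the search string from above and does the case conversion per marked string. The function name recase_city is hypothetical, and the code assumes the mbstring extension is loaded and the input is valid UTF-8; under those assumptions the mb_* functions also handle non-ASCII letters.

```php
<?php
// Hypothetical helper mirroring the four marked strings of the search pattern.
// Assumes the mbstring extension is loaded and $name is valid UTF-8.
function recase_city(string $name): string
{
    $result = preg_replace_callback(
        "/'([STst])(\W)(\w)([\w\-]+)/u",
        function (array $m): string {
            return "'"
                . mb_strtolower($m[1], 'UTF-8')   // \L\1\E : prefix letter, lowercased
                . $m[2]                           // \2     : separator, unchanged
                . mb_strtoupper($m[3], 'UTF-8')   // \U\3\E : first letter, uppercased
                . mb_strtolower($m[4], 'UTF-8');  // \L\4\E : rest of the name, lowercased
        },
        $name
    );

    // null means the regex failed (for example invalid UTF-8); keep the input.
    return $result ?? $name;
}

echo recase_city("'S-HERTOGENBOSCH"), "\n"; // 's-Hertogenbosch
echo recase_city("'s gravendeel"), "\n";    // 's Gravendeel
echo recase_city("'T Harde"), "\n";         // 't Harde
```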
I have a string with text, numbers, and symbols. I'm trying to extract the numbers and symbols from the string, with limited success. Instead of getting the entire run of numbers and symbols, I'm only getting part of it. I will explain my regex below to make it clearer and easier to understand.
\d : any number
[+,-,*,/,0-9]+ : 1 or more of any +,-,*,/, or number
\d : any number
Code:
$string = "text 1+1-1*1/1= text";
$regex = "~\d[+,-,*,/,0-9]+\d~siU";
preg_match_all($regex, $string, $matches);
echo $matches[0][0];
Expected Results
1+1-1*1/1
Actual Results
1+1
Remove the U flag. It's causing the + to be non-greedy in its matching. Also, you don't need commas between characters in your character list (you only need one , if you're trying to match a comma). You do need to escape the - so that it isn't treated as a range operator.
The problem here is that your regex mixes up quite a few unescaped metacharacters. In your character class you have [+,-,*,/,0-9]. You do not need to separate different characters with commas; that only tells the regex engine to include commas in the class. Furthermore, you need to escape the -, as it has a special meaning inside a character class. As it is, it will be interpreted as 'characters from "," to ","' instead of the literal character "-". The "/" character would need escaping as well if you used / as the delimiter; with ~ as the delimiter it can stay as it is. The expression \d[+\-*/0-9]+\d should do the trick once the U modifier is removed.
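Put together with the code from the question, the corrected version could look like the following sketch. It keeps the asker's variable names and only changes the pattern and its modifiers.

```php
<?php
$string = "text 1+1-1*1/1= text";

// No U modifier (so + stays greedy), no commas in the class, and the dash
// escaped so it is a literal "-" rather than a range operator. With ~ as
// the delimiter, the "/" inside the class needs no escaping.
$regex = '~\d[+\-*/0-9]+\d~';

preg_match_all($regex, $string, $matches);
echo $matches[0][0], "\n"; // 1+1-1*1/1
```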
Didn't test it with your code but should work :)
((?:[0-9]+[\+|\-|\*|\/]?)+)
For more detail, if you want to understand my pattern: https://regex101.com/r/mF0zO8/2
Could you help me with PHP function/regex that in given text finds all words starting with character ":" ?
..in other words all substrings that start with ":" and are separated with " " (a space)
Since :word should probably be valid, and I guess :word:another should be considered two words, then you cannot say that there is always a space.
Words in natural languages can be followed by dots and other characters.
In digital input, they can be followed by end of line.
I suggest using this regexp:
~:\w+~
It takes any : character followed by at least one word character and ends at the first character that is not a word character.
Example: on RegExr.com
You can also try ~:\w+\b~, where \b is a word boundary (literally the end of the word), but I don't think it is necessary here.
Note: \w stands for [a-zA-Z0-9_] meaning it catches underscores _ and digits 0-9 as well. It works pretty much like variable/function naming in PHP
EDIT (some notes on usage):
You said that in a given text (I understand that as arbitrary input) you want to extract all words prefixed with :, for example :word. To do that easily, you should use the preg_match_all() function with the PREG_PATTERN_ORDER flag.
Example:
$regex = '~(:\w+)~';
if (preg_match_all($regex, $input, $matches, PREG_PATTERN_ORDER)) {
    foreach ($matches[1] as $word) {
        echo $word .'<br/>';
    }
}
regex: /:\w+\s/ (PHP has no g flag; use preg_match_all() to get every match)
\w Matches any word character
\s Matches any whitespace character
This would work:
preg_match_all('/:\w+\s/', $var, $matches);
Sorry, because I don't use PHP. But I suppose that your problem is that PHP would have reserved the character ":" for some reason in its regex implementation ?
Well, in that case, you still can catch any word beginning with ":" and ending with some space this way:
(...)
match('\x3A\S*\s');
("3A" is hexadecimal value for 58, which is the ASCII code for ":")
This should work, I think...
I am trying to set a validation rule for a field in my form that checks that the input only contains letters.
At first I tried to make a function that returned true if there were no numbers in the string, for that I used preg_match:
function my_format($str)
{
    return preg_match('/^([^0-9])$', $str);
}
It doesn't matter how many times I look at the php manual, it seems like I won't get to understand how to create the pattern I want. What's wrong with what I made?
But I'd like to extend the question: I want the input text to contain any letter but no numbers or symbols, like question marks, exclamation marks, and all those you can imagine. BUT the letters I want are not only a-z; I want letters with all kinds of accents, such as those used in Spanish, Portuguese, Swedish, Polish, Serbian, Icelandic...
I guess this is no easy task and hard or impossible to do with preg_match. Is there any library that covers my exact needs?
If you're using UTF-8 encoded input, go for a Unicode regex, using the u modifier.
This one would match a string that only consists of letters and any kind of whitespace/invisible separators:
preg_match('~^[\p{L}\p{Z}]+$~u', $str);
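As a rough sketch of how that pattern could be wrapped into a validation function (the name only_letters and the sample strings are made up for illustration; the source file is assumed to be saved as UTF-8):

```php
<?php
// Hypothetical validator: true only if $str consists entirely of letters
// (from any script) and separator characters such as spaces.
function only_letters(string $str): bool
{
    // preg_match returns 1 on a match, 0 on no match, and false on error
    // (for example when $str is not valid UTF-8), so compare against 1.
    return preg_match('~^[\p{L}\p{Z}]+$~u', $str) === 1;
}

var_dump(only_letters("José Łukasz Þór")); // bool(true)
var_dump(only_letters("abc123"));          // bool(false)
var_dump(only_letters("what?"));           // bool(false)
```

Note that $ also matches just before a trailing newline; adding the D modifier tightens that if it matters for your form.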
function my_format($str)
{
    // The u modifier is needed so that UTF-8 input is read as characters, not bytes
    return preg_match('/^\p{L}+$/u', $str);
}
Simpler than you think!
\p{L} matches any kind of letter from any language
First of all, Merry Christmas.
You are on the right track with the first one, just missing a + to match one or more non-number characters:
preg_match('/^([^0-9]+)$/', $str);
As you can see, 0-9 is a range, from 0 to 9. The same applies in other cases, like a-z or A-Z: the '-' is special and indicates a range. For 0-9, you can use the shorthand \d, like:
preg_match('/^([^\d]+)$/', $str);
For symbols, if your list is the punctuation characters . , " ' ? ! ; : # $ % & ( ) * + - / < > = [ ] \ ^ _ { } | ~, there is a shorthand.
preg_match('/^([^[:punct:]]+)$/', $str);
Combined you get:
preg_match('/^([^[:punct:]\d]+)$/', $str);
Use the [:alpha:] POSIX expression.
function my_format($str) {
    return preg_match('/^[[:alpha:]]+$/u', $str);
}
The extra [] turns the POSIX class into a character class, which is modified by the + to match 1 or more alphabetical characters. With the u modifier, the [:alpha:] POSIX class matches accented characters as well. The ^ and $ anchors make sure the whole string is checked rather than just some part of it.
If you want to include whitespace, just add \s to the class:
preg_match('/^[[:alpha:]\s]+$/u', $str);
EDIT: Sorry, I misread your question when I looked over it a second time and thought you wanted punctuation. I've taken it back out.
I'm working on class names and I need to check whether there is any upper camel case name and break it up this way:
"UserManagement" becomes "user-management"
or
"SiteContentManagement" becomes "site-content-management"
After an extensive search I only found various uses of ucfirst, strtolower, strtoupper, and ucwords, and I can't see how to use them to suit my needs. Any ideas?
Thanks for reading ;)
You can use preg_replace to replace any instance of a lowercase letter followed by an uppercase letter with your lower-dash-lower variant:
$dashedName = preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className);
Then followed by a strtolower() to take care of any remaining uppercase letters:
return strtolower($dashedName);
The full function here:
function camel2dashed($className) {
    return strtolower(preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className));
}
To explain the regular expression used:
/ Opening delimiter
( Start Capture Group 1
[^A-Z-] Character Class: Any character NOT an uppercase letter and not a dash
) End Capture Group 1
( Start Capture Group 2
[A-Z] Character Class: Any uppercase letter
) End Capture Group 2
/ Closing delimiter
As for the replacement string
$1 Insert Capture Group 1
- Literal: dash
$2 Insert Capture Group 2
There's no built-in way to do it.
This will ConvertThis into convert-this:
$str = preg_replace('/([a-z])([A-Z])/', '$1-$2', $str);
$str = strtolower($str);
You can use a regex to get each word, then add the dashes like this:
preg_match_all('/[A-Z][a-z]+/', $className, $matches); // get each word of the camel case name
$newName = strtolower(implode('-', $matches[0])); // add the dashes and lowercase the result
This is simply done without any capture groups: just find the zero-width position before an uppercase letter (excluding the first letter of the string), replace it with a hyphen, then call strtolower on the new string.
Code: (Demo)
echo strtolower(preg_replace('~(?!^)(?=[A-Z])~', '-', $string));
The lookahead (?=...) makes the match but doesn't consume any characters.
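To make the zero-width behaviour concrete, here is a small sketch wrapping the one-liner in a function (the name camel_to_dashed is made up for illustration). It uses the two class names from the question plus one input with consecutive capitals to show a limitation.

```php
<?php
// Hypothetical wrapper around the lookahead-based replacement.
function camel_to_dashed(string $name): string
{
    // (?!^)      not at the very start of the string
    // (?=[A-Z])  the next character is an uppercase ASCII letter
    // Both assertions are zero-width, so a "-" is inserted and nothing is removed.
    return strtolower(preg_replace('~(?!^)(?=[A-Z])~', '-', $name));
}

echo camel_to_dashed("UserManagement"), "\n";        // user-management
echo camel_to_dashed("SiteContentManagement"), "\n"; // site-content-management
echo camel_to_dashed("HTMLParser"), "\n";            // h-t-m-l-parser
```

As the last line shows, every uppercase letter gets its own hyphen, so acronyms are split letter by letter; whether that is acceptable depends on your naming scheme.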
The best way to do that might be preg_replace using a pattern that replaces uppercase letters with their lowercase counterparts adding a "-" before them.
You could also go through each letter and rebuild the whole string.
I want a regular expression in PHP which will check whether a string is all caps.
If the given string contains only capital letters, irrespective of numbers and other characters, then it should match.
Since you want to match other characters too, look for lowercase letters instead of uppercase letters. If found, return false. (Or use tdammers' suggestion of a negative character class.)
return !preg_match('/[a-z]/', $str);
You can also skip regex and just compare strtoupper($str) with the original string, this leaves digits and symbols intact:
return strtoupper($str) == $str;
Both don't account for multi-byte strings though; for that, you could try adding a u modifier to the regex and using mb_strtoupper() respectively (I've not tested either — could someone more experienced with Unicode verify this?).
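A minimal sketch of what those two multi-byte variants might look like. The function names are hypothetical, and the code assumes the mbstring extension is loaded and that both the source file and the input are UTF-8.

```php
<?php
// Variant 1 (hypothetical name): compare against the uppercased copy.
function is_all_caps_mb(string $str): bool
{
    return mb_strtoupper($str, 'UTF-8') === $str;
}

// Variant 2 (hypothetical name): fail as soon as any Unicode lowercase
// letter (\p{Ll}) is found. preg_match returns 0 when nothing matched.
function is_all_caps_regex(string $str): bool
{
    return preg_match('/\p{Ll}/u', $str) === 0;
}

var_dump(is_all_caps_mb("ÅNGSTRÖM 42!"));    // bool(true)
var_dump(is_all_caps_mb("Ångström"));        // bool(false)
var_dump(is_all_caps_regex("ÅNGSTRÖM 42!")); // bool(true)
var_dump(is_all_caps_regex("Ångström"));     // bool(false)
```

Both variants report true for strings with no letters at all (such as "123"), so add a separate check if at least one capital letter is required.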
if (preg_match('/^[^\p{Ll}]*$/u', $subject)) {
    # String doesn't contain any lowercase characters
} else {
    # String contains at least one lowercase character
}
\p{Ll} matches a Unicode lowercase letter; [^\p{Ll}] therefore matches any character that is not a lowercase letter.
Something like this maybe:
'/^[^a-z]*$/'
The trick is to use a negated character class: this one matches all characters that are not lowercase ASCII letters. Note that accented letters aren't checked.