PHP: Find all words starting with ":" - php

Could you help me with a PHP function/regex that finds all words in a given text that start with the character ":"?
...in other words, all substrings that start with ":" and are separated by " " (a space).

Since :word should probably be valid, and I guess :word:another should be considered two words, then you cannot say that there is always a space.
Words in natural languages can be followed by dots and other characters.
In digital input, they can be followed by end of line.
I suggest using this regexp:
~:\w+~
It matches any : character followed by at least one word character and stops at the first character that is not a word character.
Example: on RegExr.com
You can also try ~:\w+\b~, where \b is a word boundary (here, the end of the word), but it is not necessary: \w+ is greedy, so the match already ends where the word does.
Note: \w stands for [a-zA-Z0-9_], meaning it matches underscores _ and digits 0-9 as well. It works pretty much like variable/function naming in PHP.
EDIT (some notes on usage):
You said that you want to extract all words prefixed with : (for example :word) from a given text, which I take to mean arbitrary input. To do that easily, use the preg_match_all() function with the PREG_PATTERN_ORDER flag.
Example:
$regex = '~(:\w+)~';
if (preg_match_all($regex, $input, $matches, PREG_PATTERN_ORDER)) {
    foreach ($matches[1] as $word) {
        echo $word . '<br/>';
    }
}
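A minimal sketch of the same approach wrapped in a helper. The function name extractColonWords and the sample text are made up for illustration; the pattern is the one suggested above.

```php
<?php
// Hypothetical helper: returns every ":word" token found in $input.
function extractColonWords(string $input): array
{
    // :\w+ is a colon followed by one or more word characters [a-zA-Z0-9_].
    preg_match_all('~:\w+~', $input, $matches);

    // $matches[0] holds the full-pattern matches in order of appearance.
    return $matches[0];
}

print_r(extractColonWords('Use :name and :id, or :word:another.'));
// Finds :name, :id, :word and :another
```

Because the pattern does not require a trailing space, a word followed by punctuation or by the end of the input is still found.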

regex: /:\w+\s/g
\w matches any word character
\s matches any whitespace character

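This would work (PHP's PCRE functions have no g modifier, so use preg_match_all() to find every match; anchoring with ^ and $ would only test the whole string):
preg_match_all('/:\w+/', $var, $matches);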

Sorry, I don't use PHP, but I suppose your problem is that PHP reserves the character ":" for some reason in its regex implementation?
In that case, you can still catch any word beginning with ":" and ending with whitespace this way:
(...)
match('\x3A\S+\s');
("3A" is hexadecimal value for 58, which is the ASCII code for ":")
This should work, I think...

Related

regex - Why doesn't this work to make sure a variable has some text?

I have been looking around and googling about regex and I came up with this to make sure some variable has letters in it (and nothing else).
/^[a-zA-Z]*$/
In my mind, ^ denotes the start of the string, and the token [a-zA-Z] should make sure only letters are allowed. The star should match everything in the token, and the $ sign denotes the end of the string I'm trying to match.
But it doesn't work when I try it on regexr. What's the correct way to do it? I would also like to allow hyphens and spaces, but figured letters alone are good enough to start with and expand from.
Short answer : this is what you are looking for.
/^[a-zA-Z]+$/
The star * quantifier means "zero or more", so your regexp matches every time, even with an empty string as the subject. You need the + quantifier instead (meaning "one or more") to achieve what you need.
If you also want to allow whitespace and hyphens (still requiring at least one character), add them to the character class: ^[A-Za-z -]+$, using the plus sign + for the repetition.
If you use preg_match, you can shorten the pattern to ^[a-z]+$ and add the i modifier to make the regex case insensitive. With a hyphen and a space allowed as well, this becomes ^[a-z -]+$
For example:
$strings = [
    "TeSt",
    "Te s-t",
    "",
    "Te4St"
];
foreach ($strings as $string) {
    if (preg_match('#^[a-z -]+$#i', $string, $matches)) {
        echo $matches[0] . PHP_EOL;
    }
}
That would result in:
TeSt
Te s-t
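To see the difference between the two quantifiers in isolation, here is a small sketch that runs both patterns against the empty string:

```php
<?php
// * allows zero repetitions, so the empty string passes.
var_dump(preg_match('/^[a-zA-Z]*$/', ''));    // int(1)

// + requires at least one letter, so the empty string is rejected.
var_dump(preg_match('/^[a-zA-Z]+$/', ''));    // int(0)

// A non-empty string of letters passes with +.
var_dump(preg_match('/^[a-zA-Z]+$/', 'abc')); // int(1)
```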

Remove string's portion from the start until certain word

I have a string :
$s = "I am not foo+bar";
I want to remove the first portion of $s, from the beginning of the string up to and including the word "foo+", so that "I am not foo+" is dropped and the result is:
$s == "bar"
How can I achieve that with PHP?
Edit: I have a "+" sign inside the string. Why is preg_replace not replacing it? The pattern that I've used is /^(.*?\bfoo+)\b/. Any ideas?
You should be able to use a regex to find everything up until a certain word. For your example,
/^(.*?\bfoo)\b/
should work with preg_replace. The ^ makes sure we start at the beginning of the string. .*? matches anything (excluding newlines; add the s modifier to allow newlines as well) up to the first foo.
Simply put: \b allows you to perform a "whole words only" search using a regular expression in the form of \bword\b. A "word character" is a character that can be used to form words. All characters that are not "word characters" are "non-word characters".
-http://www.regular-expressions.info/wordboundaries.html
Regex demo: https://regex101.com/r/gJ3nS7/3
Rough untested replacement example using preg_quote:
preg_replace('/^(.*?\b' . preg_quote('foo', '/') . '\b)/', '', $s);
Longer example: the + is a special character, so it must be escaped as \+. It is also a non-word character, so a \b placed right after it only matches when a word character follows. You can put the \+ into a non-capturing group as an alternative to the word boundary, and that should work.
https://regex101.com/r/gJ3nS7/5
/^(.*?\bfoo(?:\+|\b))/
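When the word to cut at is a variable, one option is to let preg_quote() do the escaping. The following is a sketch, not the answer's exact pattern: the helper name removeThrough is hypothetical, and it assumes the word starts with a word character so that the leading \b can match.

```php
<?php
// Hypothetical helper: drops everything from the start of $s through the
// first occurrence of $word, which is matched literally.
function removeThrough(string $s, string $word): string
{
    // preg_quote() escapes regex metacharacters such as "+" in $word;
    // the second argument also escapes the "/" delimiter.
    $pattern = '/^.*?\b' . preg_quote($word, '/') . '/';

    // If the pattern does not match, $s is returned unchanged.
    return preg_replace($pattern, '', $s);
}

echo removeThrough('I am not foo+bar', 'foo+'); // bar
```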

Capitalize city names when they start with apostrophe

I have an issue with right capitalization of (dutch) city names when they start with an apostrophe. For instance I could have the names:
'S-HERTOGENBOSCH or 's gravendeel or 'T Harde
What I would like to do is to bring all to lowercase and then capitalize the following letter after the prefix 'S or 's or 'T. So the outcome should be:
's-Hertogenbosch and 's Gravendeel and 't Harde
I'm thinking about using a Regex to do this but am not quite sure yet how this should be done. Could someone point me in the right direction?
Thanks!
You could use preg_replace_callback.
$city = strtolower("'T-HERTOGENBOSCH");
echo preg_replace_callback("/('(s|t)( |\-))([a-z])/", function($matches) {
    return $matches[1] . ucfirst($matches[4]);
}, $city);
The pattern uses multiple subpatterns, whose results are reassembled in the callback function:
('(s|t)( |\-)) # Apostrophe, then 's' or 't', then '<space>' or '-'
([a-z]) # The following lowercased character
Note that I've wrapped the first part into an additional subpattern. This makes reassembling it simpler.
Try the following function, which is based on sanchises' regex (I edited it slightly):
function dutch_city_name($name) {
    $name = strtolower(trim($name));
    $matches = array();
    preg_match("/'([a-z])( |-)[a-z]*/", $name, $matches);
    if (count($matches) == 0) {
        return $name;
    }
    return "'" . $matches[1] . $matches[2] . ucfirst(substr($name, 3, strlen($name) - 3));
}
I tried it and it is working.
Firstly, I would like to recommend websites like regex101.com or equivalent. Then, let's walk through a very basic regex:
-You want the literal "'" character followed by exactly one other character, which you would like to match in order to 'uncapitalize' it,
-and then a whole word
Basically, you need to match something of the form '([a-zA-Z])(?: |\-)[a-zA-Z]*. From left to right:
' Literal '
([a-zA-Z]) Single character in the alphabet, lower- or uppercase. Is a capturing group.
(?: |\-) Either a space or a dash. Is not a capturing group.
[a-zA-Z]* A series of characters in the alphabet. Could be ([a-zA-Z]*) if you want to capture this bit too.
Now that you have your match, all you need to do is replace it with the uncapitalized version, for example using a PHP function.
Here's one without regex. It simply checks if the first character is an apostrophe and if so, skips the character after the apostrophe when searching for the first letter to capitalize.
function capitalizeCityName($name) {
    $name = strtolower(trim($name));
    // Skip the apostrophe and the prefix letter; otherwise start at the first character.
    $i = ($name !== '' && $name[0] === "'") ? 2 : 0;
    for (; $i < strlen($name); $i++) {
        if (ctype_alpha($name[$i])) {
            $name[$i] = strtoupper($name[$i]);
            break;
        }
    }
    return $name;
}
print capitalizeCityName("'T Harde"); //'t Harde
print capitalizeCityName("Harde"); //Harde
I don't know if the PHP replace function you want to use supports changing the case of letters in a dynamic replace string. But the following worked with the Perl regular expression engine in the text editor UltraEdit v21.10.
Search string:
'([STst])(\W)(\w)([\w\-]+)
Replace string:
'\L\1\E\2\U\3\E\L\4\E
or
'\l\1\2\u\3\L\4\E
The search string matches:
a straight apostrophe,
followed by character s or t in any case marked for backreferencing as string 1,
a single non word character marked for backreferencing as string 2,
a single word character marked for backreferencing as string 3,
1 or more additional word characters or hyphens marked for backreferencing as string 4.
The replace string:
keeps the apostrophe,
first marked string (character s or t in any case) converted to lower case,
second marked string unmodified,
third marked string (first word character of city name) converted to upper case,
fourth marked string converted to lower case.
Explanation of the special characters in replace string:
\l ... convert only next character to lower case.
\u ... convert only next character to upper case.
\L ... convert all characters up to \E to lower case.
\U ... convert all characters up to \E to upper case.
Note: The case conversion works only for the ASCII letters A-Za-z and not for language specific, localized letters like German umlauts, characters with an accent, etc.
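As far as I know, PHP's preg_replace() does not understand \L, \U, \l, \u or \E in the replacement string, so in PHP the case changes have to happen in a callback. Below is a sketch that translates the search and replace strings above; the helper name fixDutchPrefix is hypothetical.

```php
<?php
// Hypothetical helper: PHP translation of the UltraEdit search/replace pair.
function fixDutchPrefix(string $name): string
{
    return preg_replace_callback(
        '/\'([STst])(\W)(\w)([\w\-]+)/',
        function (array $m): string {
            // \L\1\E: prefix letter to lower case; \2: separator unchanged;
            // \U\3\E: first letter of the name to upper case;
            // \L\4\E: rest of the name to lower case.
            return "'" . strtolower($m[1]) . $m[2]
                . strtoupper($m[3]) . strtolower($m[4]);
        },
        $name
    );
}

echo fixDutchPrefix("'S-HERTOGENBOSCH"), PHP_EOL; // 's-Hertogenbosch
echo fixDutchPrefix("'s gravendeel"), PHP_EOL;    // 's Gravendeel
echo fixDutchPrefix("'T Harde"), PHP_EOL;         // 't Harde
```

Like the original, strtolower() and strtoupper() here are only reliable for the ASCII letters A-Za-z.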

PHP preg_match: any letter but no numbers (and symbols)

I am trying to set a validation rule for a field in my form that checks that the input only contains letters.
At first I tried to make a function that returned true if there were no numbers in the string, for that I used preg_match:
function my_format($str)
{
return preg_match('/^([^0-9])$', $str);
}
It doesn't matter how many times I look at the php manual, it seems like I won't get to understand how to create the pattern I want. What's wrong with what I made?
But I'd like to extend the question: I want the input text to contain any letter but no numbers or symbols, like question marks, exclamation marks, and all those you can imagine. BUT the letters I want are not only a-z; I want letters with all kinds of accents, such as those used in Spanish, Portuguese, Swedish, Polish, Serbian, Icelandic...
I guess this is no easy task and hard or impossible to do with preg_match. Is there any library that covers my exact needs?
If you're using UTF-8 encoded input, go for a Unicode regex by using the u modifier.
This one would match a string that only consists of letters and any kind of whitespace/invisible separators:
preg_match('~^[\p{L}\p{Z}]+$~u', $str);
function my_format($str)
{
    return preg_match('/^\p{L}+$/u', $str);
}
Simpler than you think!
\p{L} matches any kind of letter from any language
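A quick sketch of how such a check behaves on a few inputs. The function name isLettersOnly and the sample strings are made up for illustration, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical validator: true when $str consists only of letters of any script.
function isLettersOnly(string $str): bool
{
    // The u modifier makes PCRE treat both pattern and subject as UTF-8,
    // so \p{L} sees whole characters instead of single bytes.
    // preg_match() returns false on invalid UTF-8, which also yields false here.
    return preg_match('/^\p{L}+$/u', $str) === 1;
}

var_dump(isLettersOnly('Ñandú'));  // bool(true)
var_dump(isLettersOnly('abc123')); // bool(false)
var_dump(isLettersOnly('¿qué?'));  // bool(false)
```

Comparing against 1 keeps the return value a strict boolean, which is convenient for a form validation rule.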
First of all, Merry Christmas.
You are on the right track with the first one, just missing a + to match one or more non-number characters:
preg_match('/^([^0-9]+)$/', $str);
As you can see, 0-9 is a range from 0 to 9. The same applies to other cases like a-z or A-Z: inside a character class the '-' is special and indicates a range. For 0-9 you can use the shorthand \d:
preg_match('/^([^\d]+)$/', $str);
For symbols, if your list is the punctuation characters . , " ' ? ! ; : # $ % & ( ) * + - / < > = [ ] \ ^ _ { } | ~, there is a shorthand:
preg_match('/^([^[:punct:]]+)$/', $str);
Combined you get:
preg_match('/^([^[:punct:]\d]+)$/', $str);
Use the [:alpha:] POSIX expression.
function my_format($str) {
    return preg_match('/^[[:alpha:]]+$/u', $str);
}
The extra [] wraps the POSIX class in a character class, and the + then matches one or more alphabetic characters; the ^ and $ anchors make sure the whole string is checked. With the u modifier, [:alpha:] matches accented characters as well.
If you want to include whitespace, just add \s to the character class:
preg_match('/^[[:alpha:]\s]+$/u', $str);
EDIT: Sorry, I misread your question when I looked over it a second time and thought you wanted punctuation. I've taken it back out.

Regular Expression: Detect Specific String

I'm trying to search a string in PHP using preg_match(), to return a string that takes the format of 'xdr', where x is a single digit.
The string is otherwise made of characters, numbers, '.' full stops and spaces.
I'm hopeless with regular expressions, can somebody help me out?
I've tried ^([a-zA-Z0-9]+)\s2dr$ but it doesn't work
For example, the string might look like A really big 2.0 was 3dr once
As far as I understand your needs, this regex will work to match a string that contains some number followed by dr:
/\b\d+dr\b/
in action in preg_match:
preg_match('/\b\d+dr\b/', $string);
explanation:
/ : regex delim
\b : word boundary
\d+ : one or more digits
dr : literally dr
\b : word boundary
/ : regex delim
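Since the goal is to return the matching string, here is a short sketch that uses the third argument of preg_match() to get it back. The helper name findDrToken is hypothetical; the pattern is the one explained above.

```php
<?php
// Hypothetical helper: returns the first "<digits>dr" token in $text,
// or null when there is none.
function findDrToken(string $text): ?string
{
    if (preg_match('/\b\d+dr\b/', $text, $matches) === 1) {
        return $matches[0]; // the text matched by the whole pattern
    }
    return null;
}

var_dump(findDrToken('A really big 2.0 was 3dr once')); // string(3) "3dr"
```

If only a single digit should be accepted, as the question describes, \d+ can be replaced with \d.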
If you want the other characters to be only characters, numbers, '.' full stops and spaces use this:
preg_match('/^[\w\d. ]*\b\d+dr\b[\w\d. ]*$/', $string);
And, to be Unicode compatible (note the u modifier):
preg_match('/^[\pL\pN. ]*\b\pN+dr\b[\pL\pN. ]*$/u', $string);
You're close.
The following regex will match a string that starts with a single digit, followed by one or more letters, numbers, full stops and spaces, and ends in dr.
^\d[a-zA-Z0-9. ]+dr$
Note: Not sure if you want the literal dr, if not, drop it from the end.
