Find a pattern in string - php

I have a string like this:
{param1}{param2}{param3}....{myparam paramvalue}{paramn}
How can I get the paramvalue of myparam?

Simple regex:
/\{([^ {}]+?) ([^}]+?)\}/
([^ {}]+?) - matches one or more characters other than a space or a curly brace, non-greedily, and captures them in the first subpattern (the parameter name)
([^}]+?) - matches one or more characters other than }, non-greedily, and captures them in the second subpattern (the value)
Use it with the preg_match() function.
OR
The other simple regex:
preg_match('/([a-z0-9]+?) ([a-z0-9]+?)\}/', $str, $matches);
([a-z0-9]+?) - one or more characters from a-z or 0-9, non-greedy (the parameter name)
([a-z0-9]+?) - the same again for the second group (the value)
Output:
Array ( [0] => myparam paramvalue} [1] => myparam [2] => paramvalue )
Demo

To specifically get that parameter value, you first have to match the left part:
/\{myparam/
Followed by at least one space:
/\{myparam\s+/
Capture characters until a closing curly brace is found:
/\{myparam\s+([^}]+)\}/
The expression [^}]+ is a negated character class, indicated by the ^ just after the opening bracket; it means "match one or more characters other than these".
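A minimal sketch of the pattern built above, applied to a string shaped like the one in the question. The variable names and the sample input are assumptions made for illustration.

```php
<?php
// Sample input modelled on the question; the content is illustrative.
$str = '{param1}{param2}{param3}{myparam paramvalue}{paramn}';

// \{myparam  literal opening brace followed by the parameter name
// \s+        at least one whitespace character
// ([^}]+)    capture everything up to the closing brace
$value = null;
if (preg_match('/\{myparam\s+([^}]+)\}/', $str, $matches)) {
    $value = $matches[1];
}

echo $value, "\n"; // paramvalue
```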

Try with this regex:
/\{\w+\s+(\w+)\}/

if (preg_match('/\{'.preg_quote('myparam', '/').' ([^\}]+)\}/', $input, $matches)) {
echo "myparam=".$matches[1];
} else {
echo "myparam not found";
}
in a regex, '{' can start a quantifier, so escaping '{' and '}' keeps them unambiguous literals
preg_quote may not be necessary, as long as "myparam" never contains special regex characters
the (cryptic) part ([^}]+)} matches one or more chars that are not '}', followed by '}'
the parentheses make that match available in the third argument to preg_match, $matches in this case
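The points above can be folded into a small helper. This is only a sketch: the function name findParamValue and the sample input are hypothetical, chosen to show why preg_quote matters when the parameter name contains a regex metacharacter such as a dot.

```php
<?php
// Hypothetical helper: returns the value stored next to $name, or null.
function findParamValue(string $input, string $name): ?string
{
    // preg_quote() escapes regex metacharacters in $name; the second
    // argument additionally escapes the '/' delimiter.
    $pattern = '/\{' . preg_quote($name, '/') . ' ([^}]+)\}/';

    return preg_match($pattern, $input, $matches) ? $matches[1] : null;
}

// Without quoting, the dot in "my.param" would also match "myxparam".
$input = '{myxparam wrong}{my.param right}';

var_dump(findParamValue($input, 'my.param')); // string(5) "right"
var_dump(findParamValue($input, 'missing'));  // NULL
```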

You can try this one as well:
.+?\s+([^}]+)
EDIT
Explanation:
.+? matches any character one or more times, but lazily: it prefers to match as few characters as possible.
\s+ matches one or more whitespace characters.
([^}]+) matches one or more characters other than `}` (closing brace) and captures them in a group.

Related

Why regex with lookaheads doesn't match?

I need (in PHP) to split a sentence by a word that cannot be the first or the last one in the sentence. Say the word is "pression" and here is my regex
/^.+?(?=[\s\.\,\:\;])pression(?=[\s\.\,\:\;]).+$/i
Live here: https://regex101.com/r/CHAhKj/1/
First, it doesn't match.
Next, I wonder: is it possible to split that way at all? I tried a simplified example
print_r(preg_split('/^.+pizza.+$/', 'my pizza is cool'));
live here http://sandbox.onlinephpfunctions.com/code/10b674900fc1ef44ec79bfaf80e83fe1f4248d02
and it prints an array of 2 empty strings, when I expect
['my ', ' is cool']
I need (in PHP) to split a sentence by the word that cannot be the first or the last one in the sentence
You may use this regex:
(?<=[^\s.?]\h)pression(?=\h[^\s.?])
RegEx Demo
RegEx Details:
(?<=[^\s.?]\h): Lookbehind to assert that before the current position we have a character that is not whitespace, not a dot and not a ?, followed by a horizontal space.
pression: Match the word pression
(?=\h[^\s.?]): Lookahead to assert that after the current position we have a horizontal space followed by a character that is not whitespace, not a dot and not a ?
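As a rough illustration, the same lookarounds can be passed to preg_split with the simplified "pizza" sentence from the question. Swapping pression for pizza and the second test sentence are assumptions made for the demo.

```php
<?php
// Split on "pizza" only when another word precedes and follows it.
$pattern = '/(?<=[^\s.?]\h)pizza(?=\h[^\s.?])/';

$middle = preg_split($pattern, 'my pizza is cool');
$first  = preg_split($pattern, 'pizza is cool');

print_r($middle); // two chunks: "my " and " is cool"
print_r($first);  // one chunk: the word comes first, so nothing is split
```

Because lookarounds consume no characters, the surrounding spaces stay in the resulting chunks.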
First, ^.+?(?=[\s\.\,\:\;])pression(?=[\s\.\,\:\;]).+$ can't match any string at all: a lookahead consumes no characters, so (?=[\s\.\,\:\;])p requires the next character to be p and, at the same time, a whitespace char or one of ., ,, : or ;, which is impossible.
Second, the ^.+pizza.+$ pattern does not ensure that the matched pizza is not the first or last word in a sentence, because . matches whitespace, too. It also returns nothing meaningful from preg_split: the function treats each match as a delimiter, and here the match is the whole string, so the two empty values are what remains 1) before it and 2) after it.
That said, all you need is:
preg_match('~^(.*?\w\W+)pression(\W+\w.*)$~is', $text, $m)
See the regex demo. Details:
^ - start of string
(.*?\w\W+) - Capturing group 1: any zero or more chars, as few as possible, then a word char and then one or more non-word chars
pression - a word
(\W+\w.*) - Capturing group 2: one or more non-word chars, a word char, and then any zero or more chars as many as possible
$ - end of string.
s makes the . match across lines and i flag makes the pattern match in a case insensitive way.
See the PHP demo:
$text = "You can use any regular expression pression inside the lookahead ";
if (preg_match('~^(.*?\w\W+)pression(\W+\w.*)$~is', $text, $m)) {
echo $m[1] . " << | >> " . $m[2];
}
// => You can use any regular expression << | >> inside the lookahead

preg match between two strings

I need help with this preg match. I tried this from other post but did not get the result. So finally posting it.
I am trying to extract z,a,b from first and a from second example.
1) Write a function operations with parameter z,a,b and returns b.
2) write a function factorial with parameter a.
This is what I tried so far:
preg_match_all('/\parameter(.*?)\and?/', $question, $match);
$questionVars = $match[1];
print $questionVars;
Thank you so much!
Your solution can be different depending on actual requirements.
If you need a string after parameter as a whole word that can consist of word and comma chars you may use
preg_match('~\bparameter\s+\K\w+(?:\s*,\s*\w+)*~', $s, $m)
See the regex demo. The \bparameter\s+ matches a word boundary, parameter and 1+ whitespace chars, and all this text is omitted with the help of \K, the match reset operator. \w+(?:\s*,\s*\w+)* matches and returns the 1+ word chars followed with 0+ repetitions of a comma enclosed with optional whitespace chars and again 1+ word chars.
If you plan to get those comma-separated chunks separately, use
preg_match_all('~(?:\G(?!^)\s*,\s*|\bparameter\s+)\K\w+~', $s, $m)
See another regex demo. Here, (?:\G(?!^)\s*,\s*|\bparameter\s+) will either match the whole word parameter with 1+ whitespace chars after it (\bparameter\s+, as in the previous solution) or the end of the previous successful match followed by a , enclosed with optional whitespace chars (\G(?!^)\s*,\s*). The \K will omit the text matched so far and \w+ will grab the value. You may replace \w+ with [^,]* to grab 0+ chars other than a comma.
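A short sketch of the \G-based pattern run against the two sentences from the question. The $questions and $found variable names are assumptions.

```php
<?php
$questions = [
    'Write a function operations with parameter z,a,b and returns b.',
    'write a function factorial with parameter a.',
];

$found = [];
foreach ($questions as $s) {
    preg_match_all('~(?:\G(?!^)\s*,\s*|\bparameter\s+)\K\w+~', $s, $m);
    // \K drops the prefix from each match, so $m[0] holds only the names.
    $found[] = $m[0];
}

print_r($found); // first sentence: z, a, b; second sentence: a
```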

Regular expressions, allow specific format only. "John-doe"

I've researched a little, but I found nothing that relates exactly to what I need and whenever tried to create the expression it is always a little off from what I require.
I attempted something along the lines of [AZaz09]{3,8}\-[AZaz09]{3,8}.
I want the valid result to only allow text-text, where either of the texts can be alphabetical or numeric; the only symbol allowed is -, and it must sit between the two texts.
Each text must be at least three characters long ({3,8}?), then separated by the -.
Therefore for it to be valid some examples could be:
Text-Text
Abc-123
123-Abc
A2C-def4gk
Invalid tests could be:
Ab-3
Abc!-ajr4
a-bc3-25aj
a?c-b%
You need to anchor the pattern and put a - between the endpoints so the character class is read as ranges rather than individual characters: [AZaz09] matches only the six characters A, Z, a, z, 0 and 9.
Try:
^[A-Za-z0-9]{3,8}-[A-Za-z0-9]{3,8}$
Demo: https://regex101.com/r/xH3oM8/1
You could also simplify it a bit with the i modifier and the \d meta character.
(?i)^[a-z\d]{3,8}-[a-z\d]{3,8}$
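A minimal check of the anchored pattern against the valid and invalid samples listed in the question. The helper name isValidPair is hypothetical.

```php
<?php
// Hypothetical helper wrapping the anchored pattern from the answer.
function isValidPair(string $s): bool
{
    return preg_match('/^[A-Za-z0-9]{3,8}-[A-Za-z0-9]{3,8}$/', $s) === 1;
}

$valid   = ['Text-Text', 'Abc-123', '123-Abc', 'A2C-def4gk'];
$invalid = ['Ab-3', 'Abc!-ajr4', 'a-bc3-25aj', 'a?c-b%'];

// Print one line per sample with the verdict.
foreach (array_merge($valid, $invalid) as $s) {
    printf("%-12s %s\n", $s, isValidPair($s) ? 'valid' : 'invalid');
}
```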
If accented letters should be allowed, or any other letter that exists in the Unicode range (like Greek or Cyrillic letters), then use the u modifier (for UTF-8 support) and \pL to match Unicode letters (and \d for digits):
$string ="
Mañana-déjà
Text-Text
Abc-123
123-Abc
A2C-def4gk
Ab-3
Abc!-ajr4
a-bc3-25aj
a?c-b%";
$regex='/^[\pL\d]{3,}-[\pL\d]{3,}$/mu';
preg_match_all($regex, $string, $matches);
var_export($matches);
Output:
array (
0 =>
array (
0 => 'Mañana-déjà',
1 => 'Text-Text',
2 => 'Abc-123',
3 => '123-Abc',
4 => 'A2C-def4gk',
),
)
NB: the difference with \w is that [\pL\d] will not match an underscore.
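The underscore difference can be seen with a quick comparison. The input foo_-bar1 is an invented example, and the accented sample is written with \u{...} escapes only so that the snippet does not depend on the file encoding.

```php
<?php
$s = 'foo_-bar1'; // illustrative input containing an underscore

// \w accepts the underscore, so "foo_" counts as four word characters.
$withWordClass = preg_match('/^\w{3,}-\w{3,}$/', $s);

// [\pL\d] allows only letters and digits, so the underscore breaks the match.
$withLetters = preg_match('/^[\pL\d]{3,}-[\pL\d]{3,}$/u', $s);

// "Mañana-déjà" spelled with code point escapes.
$accented = preg_match(
    '/^[\pL\d]{3,}-[\pL\d]{3,}$/u',
    "Ma\u{00F1}ana-d\u{00E9}j\u{00E0}"
);

var_dump($withWordClass, $withLetters, $accented); // int(1) int(0) int(1)
```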
You could come up with the following:
<?php
$string ="
Text-Text
Abc-123
123-Abc
A2C-def4gk
Ab-3
Abc!-ajr4
a-bc3-25aj
a?c-b%";
$regex='~
^\w{3,} # at least three word characters at the beginning of the line
- # a dash
\w{3,}$ # at least three word characters at the end of the line
~xm'; # multiline and freespacing mode (for this explanation)
# ~xmu for accented characters
preg_match_all($regex, $string, $matches);
print_r($matches);
?>
As @chris85 pointed out, \w will match an underscore as well. Trincot made a good point about matching accented characters. To achieve this, simply use the u modifier.
See a demo on regex101.com and a complete code on ideone.com.
You can use this regex
^\w{3,}-\w{3,}$
^ // start of the string
\w{3,} // match "a" to "z", "A" to "Z", 0 to 9 and the underscore; requires at least 3 characters
- // requires "-"
\w{3,} // same as above
$ // end of the string
Regex Demo
And a short one.
^([^\W_]{3,8})-(?1)$
[^\W_] can be used as short for alnum. It subtracts the underscore from \w
(?1) is a subroutine call to the pattern in first group
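A brief sketch of how the subroutine call behaves. Note that (?1) reuses the pattern of group 1, not the text it matched, so the two halves may differ. The sample inputs are assumptions.

```php
<?php
$pattern = '/^([^\W_]{3,8})-(?1)$/';

$ok         = preg_match($pattern, 'Abc-123');   // both halves are 3 alnum chars
$tooShort   = preg_match($pattern, 'Abc-12');    // second half has only 2 chars
$underscore = preg_match($pattern, 'Ab_c-1234'); // underscore is excluded

var_dump($ok, $tooShort, $underscore); // int(1) int(0) int(0)
```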
Demo at regex101
My vote goes to @chris85's answer, which is the most obvious and performant.
This one
^([\w]{3,8}-[\w]{3,8})$
https://regex101.com/r/uS8nB5/1

Can you explain/simplify this regular expression (PCRE) in PHP?

preg_match('/.*MyString[ (\/]*([a-z0-9\.\-]*)/i', $contents, $matches);
I need to debug this one. I have a good idea of what it's doing but since I was never an expert at regular expressions I need your help.
Can you tell me what it does block by block (so I can learn)?
Can the syntax be simplified (I think there is no need to escape the dot with a backslash)?
The regexp...
'/.*MyString[ (\/]*([a-z0-9\.\-]*)/i'
.* matches any character zero or more times
MyString matches that string. But you are using case insensitive matching, so the matched text will spell "mystring" with any capitalization.
EDIT: (Thanks to Alan Moore) [ (\/]*. This matches any of the characters space, ( or /, repeated zero or more times. As Alan points out, the / is escaped to stop it being treated as the regexp delimiter.
EDIT: The ( does not need escaping and neither does the . (thanks AlexV) because:
All non-alphanumeric characters other than \, -, ^ (at the start) and
the terminating ] are non-special in character classes, but it does no
harm if they are escaped.
-- http://www.php.net/manual/en/regexp.reference.character-classes.php
The hyphen, generally does need to be escaped, otherwise it will try to define a range. For example:
[A-Z] // matches all upper case letters of the alphabet
[A\-Z] // matches 'A', '-', and 'Z'
However, where the hyphen is at the end of the list you can get away with not escaping it (but it is best to be in the habit of escaping it; I got caught out by this).
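To make the three cases concrete, here is a small sketch. The subject string A-M-Z is an invented example.

```php
<?php
$subject = 'A-M-Z';

preg_match_all('/[A-Z]/', $subject, $range);    // range: A through Z
preg_match_all('/[A\-Z]/', $subject, $escaped); // literal A, hyphen, Z
preg_match_all('/[AZ-]/', $subject, $trailing); // hyphen last: also literal

echo implode(' ', $range[0]), "\n";    // A M Z
echo implode(' ', $escaped[0]), "\n";  // A - - Z
echo implode(' ', $trailing[0]), "\n"; // A - - Z
```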
([a-z0-9\.\-]*) matches any string containing the characters a through z (note again that this is affected by the case insensitive match), 0 through 9, a dot or a hyphen, repeated zero or more times. The surrounding parentheses () tell preg_match to "capture" this string, which means that $matches[1] will contain the text matched by [a-z0-9\.\-]*.
e.g.
<?php
$input = "aslghklfjMyString(james321-james.org)blahblahblah";
preg_match('/.*MyString[ (\/]*([a-z0-9.\-]*)/i', $input, $matches);
print_r($matches);
?>
outputs
Array
(
[0] => aslghklfjMyString(james321-james.org
[1] => james321-james.org
)
Note that because you use a case insensitive match...
$input = "aslghklfjmYsTrInG(james321898-james.org)blahblahblah";
will also match, and $matches[1] will contain james321898-james.org.
Hope this helps....
Let's break this down step by step, removing the explained parts from the expression.
"/.*MyString[ (\/]*([a-z0-9\.\-]*)/i"
Let's first strip the regex delimiters (/i at the end means it's case-insensitive):
".*MyString[ (\/]*([a-z0-9\.\-]*)"
Then we've got a greedy wildcard, .*, which consumes any characters (except newlines) any number of times and then backtracks until the rest of the pattern can match.
"MyString[ (\/]*([a-z0-9\.\-]*)"
Then match 'MyString' literally, followed by any number (note the '*') of any of the following: ' ', '(', '/'. Inside a character class the '(' is already literal and needs no escaping; the '/' is escaped only because it is also the regex delimiter.
"([a-z0-9\.\-]*)"
Then we get a capture group for any number of any of the following: a-z literals, 0-9 digits, '.', or '-'.
That's pretty much all of it.

PHP Regex Not Matching Desired Substrings

I've written the next regular expression
$pattern = "~\d+[.][\s]*[A-Z]{1}[A-Za-z0-9\s-']+~";
in order to match substrings as 2.bon jovi - it's my life
the problem is the only part that is recognized is - bon jovi
none " - " or " ' " are recognized by this regular expression.
I'd prefer to know what is wrong with the regular expression that I've written rather than getting a new one.
Your regular expression states that after the period character (which can also be written as \.), you will have zero or more whitespace characters, which should then be followed by 1 upper case letter. In your string, you do not have any upper case letters.
Secondly, the - should be placed last when you want to match it. So, changing your regex to this: ~\d+[.][\s]*[A-Z]{1}[A-Za-z0-9\s'-]+~ will match something like so: 2.Bon jovi - it's my life.
On the other hand, you can change it to this: ~\d+[.][\s]*[A-Za-z0-9\s'-]+~ to match something like so: 2.bon jovi - it's my life.
EDIT: Amended as per the comments of Marko D and aleation.
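A small sketch comparing the two variants on the string from the question. The [A-Z]{1} of the original is shortened to [A-Z], and the hyphen is placed last in the class as suggested. The variable names are assumptions.

```php
<?php
$string = "2.bon jovi - it's my life";

// Original shape: requires an uppercase letter right after the number.
$strict = preg_match("~\d+[.][\s]*[A-Z][A-Za-z0-9\s'-]+~", $string);

// Relaxed: no uppercase requirement, hyphen placed last in the class.
$relaxed = preg_match("~\d+[.][\s]*[A-Za-z0-9\s'-]+~", $string, $match);

var_dump($strict);    // int(0)
echo $match[0], "\n"; // 2.bon jovi - it's my life
```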
A better regular expression to handle that would be...
$pattern = "~\d+\.\s*[\pL\pP\s]+~";
CodePad.
This will match a number, followed by a ., followed by optional whitespace, followed by one or more Unicode letters, whitespace or punctuation marks.
$pattern = "~\d+\..*~";
$string = "2.bon jovi - it's my life";
preg_match($pattern, $string, $match);
print_r($match);
output: Array ( [0] => 2.bon jovi - it's my life )
So the way I understand this regular expression is:
\d+ // Match any digit, 1 or more times
[.] // Match a dot
[\s]* // Match 0 or more whitespace characters
[A-Z]{1} // Match characters between an UPPERCASE A-Z Range 1 time
[A-Za-z0-9\s-']+ // Match characters between A-Z, a-z, 0-9, whitespace, dash and apostrophe, 1 or more times
So straight away, your 'bon jovi' might not get matched as it's lower case and you're only looking for uppercase characters. 'bon jovi' also contains a space so perhaps changing that part of the regular expression to allow for lowercase characters and whitespace might help so you'd end up with:
$pattern = "~\d+[.][\s]*[A-Za-z\s]{1}[A-Za-z0-9\s-']+~";
Note: I quickly tested this on RegExr ( http://gskinner.com/RegExr/ ) and it appeared to match the string fine.
Your regex is as follows.
~ // delimiter
\d+ // 1 or more numbers
[.] // a period
[\s]* // 0 or more whitespace characters
[A-Z]{1} // 1 upper case letter
[A-Za-z0-9\s-\']+ // 1 or more characters, from the character class
~ //delimiter
Comparing that to the string "2.bon jovi" You have:
~ //
\d+ // "2"
[.] // "."
[\s]* // ""
[A-Z]{1} // <- NO MATCH
[A-Za-z0-9\s-\']+ //
~ //
"bon" does not start with a capital letter; it therefore does not match [A-Z]{1}
Cleaner regex
There are a few simple things you can do to clean up your regex
don't use a character class for a single character
don't specify {1}; it is the same as having no quantifier at all
Applying the above to your existing regex you get:
$pattern = "~\d+\.\s*[A-Z][A-Za-z0-9\s-']+~";
Which is slightly easier to read.
Your [A-Z]{1} sub-pattern requires one capital letter, so "2.bon jovi - it's my life" will not match.
And you need to escape the - in the [A-Za-z0-9\s-'] character class, or put it at the start or end, otherwise it is specifying a range.
"~\d+\.[A-Za-z0-9\s'-]+~"
As pointed out in the comments, older PCRE versions accept the unescaped - in your regex, because it follows the shorthand class \s, which cannot be the start of a range. Do not rely on this: PHP 7.3 and later, which use PCRE2, reject such a pattern with an "invalid range in character class" compilation error. To match a literal - inside a character class, escape it or position it as described above.
