I need some help writing a regexp to parse input strings like these:
test-12-1
blabla12412-5
t-dsf-gsdg-x-10
into the following matches:
test and 1
blabla12412 and 5
t-dsf-gsdg-x and 10
I tried to achieve this with something like
$matches = [];
preg_match('/^[a-zA-Z0-9]+(-\d+)+$/', 'test-12-1', $matches);
But I got an unexpected result:
array (
0 => 'test-12-1',
1 => '-1',
)
You can experiment on this playground: https://ru.functions-online.com/preg_match.html?command={"pattern":"/^[a-zA-Z0-9]+(-\d+)+$/","subject":"test-12-1"}
Thanks a lot!
You may use
'~^(.*?)(?:-(\d+))+$~'
See the regex demo
Details
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
(?:-(\d+))+ - 1 or more occurrences of
- - a hyphen
(\d+) - Group 2: one or more digits (the last occurrence is kept in the group value since it is located in a repeated non-capturing group)
$ - end of string.
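As a minimal sketch of how the two groups can be read in PHP (the helper name splitTrailingNumber is hypothetical, not part of the original answer):

```php
<?php
// Hypothetical helper: splits "prefix-...-N" into [prefix, last number].
// Returns null when the input does not end in one or more "-digits" parts.
function splitTrailingNumber(string $input): ?array
{
    // Group 1 is lazy, so it stops before the first run of "-digits" parts
    // that reaches the end of the string; Group 2 keeps only the last number.
    if (preg_match('~^(.*?)(?:-(\d+))+$~', $input, $m)) {
        return [$m[1], $m[2]];
    }
    return null;
}

foreach (['test-12-1', 'blabla12412-5', 't-dsf-gsdg-x-10'] as $s) {
    [$name, $number] = splitTrailingNumber($s);
    echo $name, ' and ', $number, "\n";
}
```

Because Group 2 sits inside a repeated group, intermediate numbers such as the 12 in test-12-1 are matched but not kept.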
Related
I can't figure this out. I think it should be possible with a single pattern; please help me improve it.
I have the string 2 / 3 items and I want to get only 2 / 3
Items can also be written in Cyrillic, e.g. 2 / 3 штуки
So I thought the best way was to remove everything matching \D, i.e. all non-digits (result: 23)
But this also deletes the slash, which I want to keep. How can I do that?
// This is my solution for now,
// but it does not work for Cyrillic because the output is broken:
// it returns: 2 / 3 �
// Maybe it is something to do with the encoding?
preg_replace('#[a-zA-Zа-яА-Я]*#', '', '2 / 3 штуки');
// So I chose to do this instead, but I don't know how to keep the slash
preg_replace('#[\D]*#', '', '2 / 3 штуки');
// it returns: 23
# How do I get 2 / 3 ?
You can use
if (preg_match('~\d+\s*/\s*\d+~u', $text, $match)) {
echo $match[0];
}
Also, if the fraction part is optional, use
preg_match('~\d+(?:\s*/\s*\d+)?~u', $text, $match)
And if you need to extract all occurrences, use preg_match_all:
preg_match_all('~\d+(?:\s*/\s*\d+)?~u', $text, $matches)
See the regex demo and the PHP demo. Note that preg_match extracts the match rather than removing it (as is the case with preg_replace).
Pattern details
\d+ - one or more digits
\s*/\s* - / enclosed with zero or more whitespaces
\d+ - one or more digits
Note that u is used in case the whitespace in your string can be other than regular ASCII whitespace, like \xA0.
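A small sketch that wraps the patterns above in two functions; the names extractFraction and extractAllFractions are hypothetical and only serve the illustration:

```php
<?php
// Hypothetical helper: returns the first "N" or "N / M" found in $text, or null.
function extractFraction(string $text): ?string
{
    if (preg_match('~\d+(?:\s*/\s*\d+)?~u', $text, $match)) {
        return $match[0];
    }
    return null;
}

// Hypothetical helper: returns every "N" or "N / M" found in $text.
function extractAllFractions(string $text): array
{
    preg_match_all('~\d+(?:\s*/\s*\d+)?~u', $text, $matches);
    return $matches[0];
}

var_dump(extractFraction('2 / 3 штуки'));
var_dump(extractAllFractions('2 / 3 штуки, 5 items'));
```

Since the pattern only describes digits, whitespace and the slash, it does not need to know which alphabet the trailing word is written in.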
There is a string of digits I need to validate with PHP preg_match.
If it starts with 10, 20 or 30, I need 7 more digits after the initial 2; in all other cases I need exactly 8 digits and don't care what the leading characters are.
The first part is the simple one
/^(1|2|3)0\d{7}$/
But how can I add an ELSE part? There I need a simple
^\d{8}$
I need to match these examples:
101234567
201234567
12345678
33445566
You may use
^(?:[1-3]0\d{7}|(?![1-3]0)\d{8})$
See the regex demo
Details
^ - start of string
(?: - start of a non-capturing group:
[1-3]0\d{7} - 1, 2 or 3, then 0 and any 7 digits
| - or
(?![1-3]0)\d{8} - no 10, 20 or 30 immediately at the start of the string are allowed, then any 8 digits are matched
) - end of the group
$ - end of the string.
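A quick way to check the alternation pattern against the examples from the question; the wrapper name isValidNumber is hypothetical:

```php
<?php
// Hypothetical helper wrapping the alternation-based pattern.
function isValidNumber(string $s): bool
{
    // Branch 1: 10/20/30 followed by 7 digits (9 digits in total).
    // Branch 2: 8 digits that do NOT start with 10/20/30.
    return preg_match('/^(?:[1-3]0\d{7}|(?![1-3]0)\d{8})$/', $s) === 1;
}

foreach (['101234567', '201234567', '12345678', '33445566', '10123456'] as $s) {
    echo $s, ' => ', isValidNumber($s) ? 'valid' : 'invalid', "\n";
}
```

The last input, 10123456, has only 8 digits but starts with 10, so the negative lookahead rejects it.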
Here's an alternative using (?(?=regex)then|else) aka conditionals:
^(?(?=[1-3]0)[1-3]0\d{7}|\d{8})$
It literally says: if [1-3]0 is right at the start, match [1-3]0\d{7}, else match \d{8}.
Demo: https://regex101.com/r/LXoHyk/1 (examples shamelessly taken from Wiktor's answer)
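The conditional form can be tried in PHP the same way, since PCRE supports lookahead conditions. This is only a sketch, and isValidNumberConditional is a hypothetical name:

```php
<?php
// Hypothetical helper wrapping the conditional-based pattern.
function isValidNumberConditional(string $s): bool
{
    // If the string starts with 10/20/30, require [1-3]0 plus 7 digits;
    // otherwise require exactly 8 digits.
    return preg_match('/^(?(?=[1-3]0)[1-3]0\d{7}|\d{8})$/', $s) === 1;
}

foreach (['101234567', '12345678', '30000000'] as $s) {
    echo $s, ' => ', isValidNumberConditional($s) ? 'valid' : 'invalid', "\n";
}
```

Here 30000000 is rejected: the condition sees 30 at the start, so the 9-digit branch is the only one tried.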
In my WordPress post content, I have a line [yu_TOC title="Short Stories"]. I am trying to match it with
preg_match('/\[yu_TOC title=\"(.*?)\"\s*\]/', $content[0], $matchedTitle);
I have printed out the line I wanted to match using error_log(substr($content, 0, 1000));.
The output (relevant part of it) is [yu_TOC title=”Short Stories”]</p>
Is it expected that the quotes have changed from " to ”?
Why doesn't my pattern match the line that should be matched?
How to fix it?
Update: I have tried to replace []s with {}s, still the same issue.
If those quotes have changed and you also want to match the converted version, you could use an alternation inside a capturing group to match either quote, and then use the backreference \1 so that the closing quote must be the same character as the opening one.
Your value is then in the second capturing group, as the first group is used for the backreference.
\[yu_TOC title=("|”)(.*?)\1\s*\]
Regex demo | Php demo
Note that you don't have to escape "
For example
$content = ["[yu_TOC title=”Short Stories”]</p>"];
preg_match('/\[yu_TOC title=("|”)(.*?)\1\s*\]/', $content[0], $matchedTitle);
print_r($matchedTitle);
Output
Array
(
[0] => [yu_TOC title=”Short Stories”]
[1] => ”
[2] => Short Stories
)
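If the opening and closing quotes may differ (for instance “ at the start and ” at the end), a backreference will not match. A possible variation, offered only as a sketch, is a character class on both sides. The helper name extractTocTitle and the assumption that the content is UTF-8 are mine, not from the question:

```php
<?php
// Hypothetical helper: extracts the title whether the quotes are straight or curly.
// Assumes both this source file and $content are UTF-8 encoded.
function extractTocTitle(string $content): ?string
{
    // ["“”] accepts a straight quote or either curly quote;
    // the u modifier makes the class operate on characters instead of bytes.
    if (preg_match('/\[yu_TOC title=["“”](.*?)["“”]\s*\]/u', $content, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(extractTocTitle('[yu_TOC title=”Short Stories”]</p>'));
```

With this variation the title is in group 1, because no group is spent on the quote.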
I have these two regular expressions
^(((98)|(\+98)|(0098)|0)(9){1}[0-9]{9})+$
^(9){1}[0-9]{9}+$
How can I combine these expressions into one?
Valid phone numbers:
start only with: 0098, +98, 98, 09 or 9
Samples:
00989151855454
+989151855454
989151855454
09151855454
9151855454
You haven't provided what passes and what doesn't, but I think this will work if I understand correctly...
/^\+?0{0,2}98?/
Live demo
^ Matches the start of the string
\+? Matches 0 or 1 plus symbols (the backslash is to escape)
0{0,2} Matches between 0 and 2 (0, 1, or 2) of the 0 character
9 Matches a literal 9
8? Matches 0 or 1 of the literal 8 characters
Looking at your second regex, it looks like you want to make the first part ((98)|(\+98)|(0098)|0) of your first regex optional. Just put ? after it and it will also allow the numbers allowed by the second regex. Change this,
^(((98)|(\+98)|(0098)|0)(9){1}[0-9]{9})+$
to,
^(?:98|\+98|0098|0)?9[0-9]{9}$
The ? after the non-capturing group makes the whole group optional; the group contains the alternative prefixes you want to allow.
I've made a few more corrections to the regex. {1} is redundant, as matching exactly once is the default behavior with or without it, and you don't need to group parts of a regex unless you need the groups. I've also removed the outermost parentheses and the + after them, as they are not needed.
Demo
This regex
^(?:98|\+98|0098|0)?9[0-9]{9}$
matches
00989151855454
+989151855454
989151855454
09151855454
9151855454
Demo: https://regex101.com/r/VFc4pK/1/
However, note that this requires a 9 as the first digit after the country code or the leading 0.
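A short sketch for checking the combined pattern against the sample numbers in PHP; the function name isValidPhone is hypothetical:

```php
<?php
// Hypothetical helper wrapping the combined pattern.
function isValidPhone(string $phone): bool
{
    // Optional prefix (98, +98, 0098 or 0), then a 9 and exactly 9 more digits.
    return preg_match('/^(?:98|\+98|0098|0)?9[0-9]{9}$/', $phone) === 1;
}

$samples = ['00989151855454', '+989151855454', '989151855454', '09151855454', '9151855454'];
foreach ($samples as $phone) {
    echo $phone, ' => ', isValidPhone($phone) ? 'valid' : 'invalid', "\n";
}
```

A number such as 8151855454 is rejected because the digit after the optional prefix is not a 9.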
I have the following value:
start=2011-03-10T13:00:00Z;end=2011-03-30T13:00:00Z;scheme=W3C-DTF
I use the following regular expression to strip out the 'start' and 'end' dates and assign them to their own named capture pair:
#^start=(?P<publishDate>.+);end=(?P<expirationDate>.+);#ix
Probably not the absolute best REGEX, but it works well enough if both 'start' and 'end' values are present.
Now, what I need to do is still match 'publishDate' if 'expirationDate' is missing and vice versa.
How can I do this using a single expression? I'm not the greatest at regular expressions and I'm starting to wander off into the more advanced stuff, so any help with this would be greatly appreciated.
Thanks!
UPDATE:
Thanks to Mr. Chung, I have resolved this issue with the following expression:
#^(start=(?P<publishDate>.*?);)?(end=(?P<expirationDate>.*?);)?#xi
As always, thank you so much for all of your help, everyone. :)
Use (...)? for an optional section
^(start=(?P<publishDate>.+);)?(end=(?P<expirationDate>.+);)?
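A minimal sketch of the optional-section idea in PHP. It uses the lazy .*? form from the asker's update, because on the sample value a greedy .+ in the first group runs on to the last semicolon and swallows the end= part. The helper name parseValidity and the choice to return null for a missing date are assumptions made for the illustration:

```php
<?php
// Hypothetical helper: returns both dates, with null for a missing one.
function parseValidity(string $value): array
{
    $pattern = '#^(start=(?P<publishDate>.*?);)?(end=(?P<expirationDate>.*?);)?#i';
    preg_match($pattern, $value, $m);

    // A group that did not take part in the match is either an empty string
    // or absent from $m, so both cases are normalised to null here.
    $pick = function (string $name) use ($m): ?string {
        return ($m[$name] ?? '') !== '' ? $m[$name] : null;
    };

    return [
        'publishDate'    => $pick('publishDate'),
        'expirationDate' => $pick('expirationDate'),
    ];
}

print_r(parseValidity('start=2011-03-10T13:00:00Z;end=2011-03-30T13:00:00Z;scheme=W3C-DTF'));
print_r(parseValidity('end=2011-03-30T13:00:00Z;scheme=W3C-DTF'));
```

This form expects start to come before end; it does not handle the two keys in the opposite order.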
These both set the named buffer to a value (instead of null or undefined)
I recommend the first one.
1. To find either/both in any order:
/^(?=.*\bstart=(?P<publishDate>.*?);|(?P<publishDate>))(?=.*\bend=(?P<expirationDate>.*?);|(?P<expirationDate>))/ix
/^(?= # from beginning, look ahead for start
.*\b # any character 0 or more times (backtrack to match 'start')
start=(?P<publishDate>.*?); # put start date in publish
| (?P<publishDate>) # OR, put empty string publish
)
(?= # from beginning, look ahead for end
.*\b # same criteria as above ...
end=(?P<expirationDate>.*?);
| (?P<expirationDate>)
)
/ix
2. To find either/both in start/end order:
/^(?:.*\bstart=(?P<publishDate>.*?);|(?P<publishDate>))(?:.*\bend=(?P<expirationDate>.*?);|(?P<expirationDate>))/ix
Edit -
@Josh Davis - I had to go searching PCRE.org; there is some great stuff there.
With Perl there is no problem with duplicate names.
Docs: "If multiple groups have the same name then it refers to the leftmost defined group in the current match."
This is never a problem when used in an alternation.
With PCRE ..
Duplicate names will work properly with PHP if they are used with the branch reset.
Branch reset ensures that duplicate names occupy the same capture group.
After that, using the dup names option, $match['name'] will contain either a value
or an empty string, but it will exist.
ie:
(?J) = PCRE_DUPNAMES
(?| ... | ...) = Branch reset
This works:
/(?Ji)^
(?= (?| .* end = (?P<expirationDate> .*? ); | (?P<expirationDate>)) )
(?= (?| .* start = (?P<publishDate> .*? ); | (?P<publishDate>)) )
/x
Try it here: http://www.ideone.com/zYd24
<?php
$string = "start=2011-03-(start)10T13:00:00Z;end=2011-03-(end)30T13:00:00Z;scheme=W3C-DTF";
preg_match('/(?Ji)^
(?= (?| .* end = (?P<expirationDate> .*? ); | (?P<expirationDate>)) )
(?= (?| .* start = (?P<publishDate> .*? ); | (?P<publishDate>)) )
/x', $string, $matches);
echo "Published = ",$matches['publishDate'],"\n";
echo "Expires = ",$matches['expirationDate'],"\n";
print_r($matches);
?>
Output
Published = 2011-03-(start)10T13:00:00Z
Expires = 2011-03-(end)30T13:00:00Z
Array
(
[0] =>
[expirationDate] => 2011-03-(end)30T13:00:00Z
[1] => 2011-03-(end)30T13:00:00Z
[publishDate] => 2011-03-(start)10T13:00:00Z
[2] => 2011-03-(start)10T13:00:00Z
)
If 'start=;' is not present at all when the corresponding date is absent, Stephen Chung's code is OK.
Otherwise (the key is present but its value is empty), I think that replacing '+' with '*' is enough:
#^start=(?P<publishDate>.*?);end=(?P<expirationDate>.*?);#ix
By the way, the '?' is necessary to make the dot non-greedy in each of these patterns.