How can I repeat my pattern with a minor change? - php

Here is my pattern: \/\d+$. It matches a slash followed by a number at the end of the string. Now I want to extend it so that it works for a sequence of them.
Here is the input:
ticket/2/1/19
And here is the current result:
ticket/2/1
And this is the expected result:
ticket
How can I do that?

You may wrap \/\d+ with a grouping construct and quantify the group:
(?:\/\d+)+$
See the regex demo
Details
(?: - start of a non-capturing group
\/ - a /
\d+ - 1+ digits
)+ - end of the group, repeated 1 or more times (+)
$ - end of string.
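As a minimal sketch of how this could be applied in PHP, the quantified group can be passed to preg_replace to strip every trailing /number segment at once. The function name stripTrailingNumbers is only an illustration and does not come from the question; ~ is used as the delimiter so the slash needs no escaping.

```php
<?php
// Remove one or more trailing "/<digits>" segments from a path-like string.
// stripTrailingNumbers is a hypothetical helper name used for illustration.
function stripTrailingNumbers(string $path): string
{
    // (?:/\d+)+ : one or more "/digits" segments, $ : anchored at the end.
    return preg_replace('~(?:/\d+)+$~', '', $path);
}

echo stripTrailingNumbers('ticket/2/1/19'), PHP_EOL; // ticket
```

A segment that is not purely numeric stops the removal, so only the numeric tail is dropped.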

Related

Regex optional groups and digit length

Maybe some regex-Master can solve my problem.
I have a big list of addresses with no separators (, ;).
The address string contains the following information:
The first group is the street name
The second group is the street number
The third group is the zipcode (optional)
The last group is the town name (optional)
As you can see in the image above, the last two test strings do not match.
I need the last two regex groups to be optional and the third group should be either 4 or 5 digits.
I tried (\d{4,5}) to allow 4 or 5 digits, but this only works halfway, as you can see here: https://regex101.com/r/ZurqHh/1
(This sometimes mixes the street number and zipcode together)
I also tried (?:\d{5})? to make the third and fourth group optional. But this destroys my whole group layout...
https://regex101.com/r/EgxeMy/1
This is my current regex:
/^([a-zäöüÄÖÜß\s\d.,-]+?)\s*([\d\s]+(?:\s?[-|+\/]\s?\d+)?\s*[a-z]?)?\s*(\d{5})\s*(.+)?$/im
Try it out yourself:
https://regex101.com/r/zC8NCP/1
My brain is only farting at this moment and I can't think straight anymore.
Please help me fix this problem so I can die in peace.
You can use
^(.*?)(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))?(?:\s+(\d{4,5})(?:\s+(.*))?)?$
See the regex demo (note all \s are replaced with \h to only match horizontal whitespaces).
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))? - an optional non-capturing group matching
\s+ - one or more whitespaces
(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b) - Group 2:
\d+ - one or more digits
(?:\s*[-|+\/]\s*\d+)* - zero or more sequences of zero or more whitespaces, -, +, | or /, zero or more whitespaces, one or more digits
\s* - zero or more whitespaces
[a-z]?\b - an optional lowercase ASCII letter and a word boundary
(?:\s+(\d{4,5})(?:\s+(.*))?)? - an optional non-capturing group matching
\s+ - one or more whitespaces
(\d{4,5}) - Group 3: four or five digits
(?:\s+(.*))? - an optional sequence of one or more whitespaces and then any zero or more chars other than line break chars as many as possible
$ - end of string.
Please note that the (?:\s+(.*))? optional group must be inside the (?:\s+(\d{4,5})...)? group to work.
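A rough sketch of how the pattern above might be used from PHP follows. The helper name parseAddress, the array keys, and the sample addresses are assumptions made for illustration (the test strings from the question are not reproduced here), and the i flag is added so that [a-z] also accepts uppercase letters.

```php
<?php
// Split an address line into street, number, zip code and town.
// parseAddress and the array keys are hypothetical names for this sketch.
function parseAddress(string $line): ?array
{
    $pattern = '~^(.*?)(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))?'
             . '(?:\s+(\d{4,5})(?:\s+(.*))?)?$~i';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    // An optional group that did not take part in the match is either
    // missing from $m or an empty string, so both cases map to null.
    $get = fn(int $i) => isset($m[$i]) && $m[$i] !== '' ? $m[$i] : null;
    return [
        'street' => $m[1],
        'number' => $get(2),
        'zip'    => $get(3),
        'town'   => $get(4),
    ];
}

print_r(parseAddress('Hauptstrasse 12 12345 Berlin'));
print_r(parseAddress('Hauptstrasse 12'));
```

Mapping unmatched groups to null keeps the caller from having to distinguish between a missing array index and an empty capture.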
Addresses are difficult to parse because they sit halfway between formatted text and natural language. Here is a pattern that keeps the number of optional parts as small as possible while still handling the examples offered, without asking too much of the regex engine. To do this, I mainly rely on character classes, atomic groups, and a relatively accurate description of street names. Obviously, the examples in the question cannot be representative of every real address, and characters may need to be added to or removed from the classes to deal with new cases. Nevertheless, the structure of this pattern is a good starting point.
~
^
(?<strasse> [\pL\d-]+ \.? (?> \h+ [\pL\d-]+ \.? )*? ) \h*
(?<nummer> \b (?> \d+ | [-+/\h]+ | [a-z] \b )*? )
(?: \h+ (?<plz> \d{4,5} )
\h+ (?<stadt> .+ ) )?
$
~mxui
demo
Note that in the above link you can also see a previous version of this pattern with a more accurate description of the street number (a bit more efficient but longer).
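For illustration, here is one way the named groups of this pattern could be read in PHP. The function name parseNamed and the sample address are assumptions, not part of the answer; the pattern itself is copied unchanged, and the x flag lets it span several lines.

```php
<?php
// Read the named groups of the extended-mode pattern.
// parseNamed is a hypothetical helper name used for this sketch.
function parseNamed(string $line): ?array
{
    $pattern = '~
        ^
        (?<strasse> [\pL\d-]+ \.? (?> \h+ [\pL\d-]+ \.? )*? ) \h*
        (?<nummer> \b (?> \d+ | [-+/\h]+ | [a-z] \b )*? )
        (?: \h+ (?<plz> \d{4,5} )
        \h+ (?<stadt> .+ ) )?
        $
    ~mxui';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return [
        'strasse' => $m['strasse'],
        'nummer'  => $m['nummer'],
        // plz and stadt belong to an optional group and may be absent.
        'plz'     => $m['plz'] ?? null,
        'stadt'   => $m['stadt'] ?? null,
    ];
}

print_r(parseNamed('Hauptstrasse 12 12345 Berlin'));
```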

How to apply regular expressions to get a number from a dynamic string

I have a variable that can contain a few variations, it also contains a number which can be any number.
The variations:
($stuksprijs+9.075);
($stuksprijs-9.075);
($stuksprijs*9.075);
($m2+9.075);
($m2-9.075);
($m2*9.075);
($o+9.075);
($o-9.075);
($o*9.075);
These are the only variations except for the numbers in it, they can change. And I need that number.
So there can be:
($m2+5);
or
($o+8.25);
or
($stuksprijs*3);
How can I get the number from those variations? How can I get the 9.075 or 5 or 8.25 or 3 from my above examples with regular expression?
I am trying to fix this with PHP, my variable that contains the string is: $explodeberekening[1]
I read multiple regex tutorials and got it to work for a single string that never changes, but how can I write a regex to get the number from above variations?
As per my comment, which seems to have worked, you can try:
^\(\$(?:stuksprijs|m2|o)[+*-](\d+(?:\.\d+)?)\);$
The number is captured in the 1st capture group. See the online demo.
A quick breakdown:
^ - Start string anchor.
\(\$ - Literally match "($".
(?: - Open a non-capture group to list alternation:
stuksprijs|m2|o - Match one of these literal alternatives.
) - Close non-capture group.
[+*-] - Match one of the symbols from the character-class.
( - Open 1st capture group:
\d+ - 1+ digits.
(?:\.\d+)? - Extra optional non-capture group to match a literal dot and 1+ digits.
) - Close 1st capture group.
\); - Literally match ");".
$ - End string anchor.
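A small sketch of how the capture group could be read in PHP is shown below. The helper name extractNumber is hypothetical; in the question the input would come from $explodeberekening[1]. The sample strings are single-quoted so that PHP does not try to interpolate $m2 or $o as variables.

```php
<?php
// Return the number from strings such as '($m2+5);', or null if the
// string does not have the expected shape.
// extractNumber is a hypothetical helper name used for illustration.
function extractNumber(string $expr): ?string
{
    $pattern = '~^\(\$(?:stuksprijs|m2|o)[+*-](\d+(?:\.\d+)?)\);$~';
    return preg_match($pattern, $expr, $m) ? $m[1] : null;
}

echo extractNumber('($stuksprijs+9.075);'), PHP_EOL; // 9.075
echo extractNumber('($m2+5);'), PHP_EOL;             // 5
echo extractNumber('($o+8.25);'), PHP_EOL;           // 8.25
```

The result is a string; cast it with (float) if it is needed for arithmetic.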

Regex allow only datetime (h:m:s) or an integer

I'm looking for a regex pattern that only accepts a datetime in h:m:s form (e.g. 01:02:00) or an integer (123456789). The leading zeros in the datetime are optional, meaning 1:2:10 should also be allowed.
It should allow or disallow these inputs:
0123456789✅
0123456789 word❌
word❌
01:00:10✅
1:2:10✅
1:10❌
1:2:❌
1:❌
I tried this pattern, but it is not working correctly:
if (preg_match('~^[0-9:]*$|[0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}~', $input)) {
//allowed
}
https://regex101.com/r/mRxBNu/1
I would just use a regex alternation here:
^(?:\d+|\d{1,2}:\d{1,2}:\d{1,2})$
Demo
Explanation of regex:
^ from the start of the input
(?:
\d+ match one or more digits
| OR
\d{1,2}:\d{1,2}:\d{1,2} match an H:M:S timestamp
)
$ end of the input
I'd suggest the following could work:
^\d\d?(?:\d*|:\d\d?:\d\d?)$
See the online demo
^ - Start string anchor.
\d\d? - A single digit and an optional one.
(?: - Open non-capture group:
\d* - 0+ digits;
| - Or:
:\d\d?:\d\d? - A colon, digit and an optional digit (two times in a row).
) - Close non-capture group.
$ - End string anchor.
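Both suggested patterns can be checked against the inputs listed in the question with a short script along these lines. The constant and function names are made up for this sketch; only the two patterns and the input list come from the thread.

```php
<?php
// The two patterns suggested in the answers (constant names are hypothetical).
const PATTERN_ALTERNATION = '~^(?:\d+|\d{1,2}:\d{1,2}:\d{1,2})$~';
const PATTERN_COMPACT     = '~^\d\d?(?:\d*|:\d\d?:\d\d?)$~';

// True when $input is all digits or an h:m:s value with 1-2 digits per part.
function isTimeOrInteger(string $input, string $pattern = PATTERN_ALTERNATION): bool
{
    return preg_match($pattern, $input) === 1;
}

$inputs = ['0123456789', '0123456789 word', 'word', '01:00:10',
           '1:2:10', '1:10', '1:2:', '1:'];
foreach ($inputs as $input) {
    printf("%-16s %s\n", $input, isTimeOrInteger($input) ? 'allowed' : 'rejected');
}
```

Neither pattern checks ranges, so a value such as 99:99:99 is still accepted; range checks would need an extra step.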
This regex matches a calendar date (day, month and year separated by /, - or .) or a 10-digit integer; it does not match h:m:s values:
^(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]))\1|(?:(?:29|30)(\/|-|\.)(?:0?[13-9]|1[0-2])\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)0?2\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9])|(?:1[0-2]))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^([0-9]{10})$
Inspired by:
https://ihateregex.io/expr/date/
I just added the last "|^([0-9]{10})$" for the 10-digit number.

PHP regex preg_match to identify url pattern

Is there any way to make the rule allow only examples 1 and 3 and not all 4 of them?
/^(en\/|)([\d]{1,3})([-])(.+?)([\/])$/
examples:
12-blog/
12-blog/blog2/
en/12-blog/
en/12-blog/blog2/
https://www.phpliveregex.com/p/tFe
You might use an optional part for en/, followed by 1-3 digits, a -, and then one or more chars other than / using a negated character class, followed by the closing /.
Note that you can omit the square brackets for [\d], [-] and [\/]. If you choose a different delimiter than / you don't have to escape the forward slash.
^(?:en/)?\d{1,3}-[^/]+/$
In parts
^ Start of string
(?:en/)? Optionally match en/
\d{1,3} Match 1-3 digits
- Match literally
[^/]+/ Match 1+ times any char except /, then match a /
$ End of string
Regex demo | Php demo
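As a quick sketch, the four example paths from the question can be run through the pattern like this. The function name isAllowedPath is an assumption made for illustration; ~ is used as the delimiter so that / needs no escaping.

```php
<?php
// True for paths such as "12-blog/" or "en/12-blog/", false when a
// second path segment follows. isAllowedPath is a hypothetical name.
function isAllowedPath(string $path): bool
{
    return preg_match('~^(?:en/)?\d{1,3}-[^/]+/$~', $path) === 1;
}

foreach (['12-blog/', '12-blog/blog2/', 'en/12-blog/', 'en/12-blog/blog2/'] as $path) {
    printf("%-20s %s\n", $path, isAllowedPath($path) ? 'match' : 'no match');
}
```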

Update a regex that matches twitter like mentions to allow for dots

I have already found helpful answers for a regex that matches twitter like username mentions in this answer and this answer
(?<=^|(?<=[^a-zA-Z0-9-_\.]))#([A-Za-z]+[A-Za-z0-9_]+)
(?<=^|(?<=[^a-zA-Z0-9-_\.]))#([A-Za-z]+[A-Za-z0-9-_]+)
However, I need to update this regex to also include usernames that have dots.
One or more dots are allowed in a username.
The username must not start or end with a dot.
No two consecutive dots are allowed.
Example of a matched string:
#valid.user.name
^^^^^^^^^^^^^^^^
Examples of non-matched strings:
#.user.name // starts with a dot
#user.name. // ends with a dot
#user..name // has two consecutive dots
You can use this refactored regex:
(?<=[^\w.-]|^)#([A-Za-z]+(?:\.\w+)*)$
RegEx Demo
RegEx Details:
(?<=[^\w.-]|^): Lookbehind to assert that we have start of line or any non-word, non-dot, non-hyphen character before current position
#: Match literal #
(: Start capture group
[A-Za-z]+: Match 1+ ASCII letters
(?:\.\w+)*: Match 0 or more instances of a dot followed by 1+ word characters
): End capture group
$: End of string
The (?<=^|(?<=[^a-zA-Z0-9-_\.])) is a positive lookbehind that requires a match to be at the start of the string or right after a char other than an alphanumeric, -, _ or .; you may write it in a more compact way as (?<![\w.-]), a negative lookbehind.
Next, ([A-Za-z]+[A-Za-z0-9_]+) captures 1+ ASCII letters and then 1+ ASCII letters, digits or/and underscores. You seem to want to make sure the first char is a letter, and then any number of sequences of . and 1+ word chars are allowed, that is, you may use [A-Za-z]\w*(?:\.\w+)*.
As you do not want to match it if there is a . right after the expected match, you need to set a lookahead that will require a space or end of string, (?!\S).
So, combining it, you can use
'~(?<![\w.-])#([A-Za-z]\w*(?:\.\w+)*)(?!\S)~'
See the regex demo
Details
(?<![\w.-]) - no letters, digits, _, . and - immediately to the left of the current location are allowed
# - a # char
([A-Za-z]\w*(?:\.\w+)*) - Group 1:
[A-Za-z] - an ASCII letter
\w* - 0+ letters, digits, _
(?:\.\w+)* - 0+ sequences of
\. - dot
\w+ - 1+ letters, digits, _
(?!\S) - whitespace or end of string are required immediately to the right of the current location.
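To show the unanchored pattern at work on running text, here is a minimal sketch using preg_match_all. The function name findMentions and the sample sentence are illustrative assumptions; the pattern is the one given above.

```php
<?php
// Collect every valid "#user.name" mention in a piece of text.
// findMentions is a hypothetical helper name used for this sketch.
function findMentions(string $text): array
{
    $pattern = '~(?<![\w.-])#([A-Za-z]\w*(?:\.\w+)*)(?!\S)~';
    preg_match_all($pattern, $text, $m);
    return $m[1]; // contents of capture group 1 for each match
}

print_r(findMentions('hello #valid.user.name and #user..name'));
// Only "valid.user.name" is returned; "#user..name" is skipped.
```

Because of the (?!\S) lookahead, a mention followed directly by punctuation, such as "#name," is not matched either.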
EDIT: Simpler version (same result)
^#[a-zA-Z](\.?[\w-]+)*$
Original
Another one:
^#[a-zA-Z][a-zA-Z_-]?(\.?[\w\d-]+){0,}$
^# starts with #
[a-zA-Z] first char
[a-zA-Z_-]? match a-zA-Z_- 0 or 1 times
( start group
\.? match . (optional)
[\w\d-]+ match a-zA-Z0-9-_ 1 or more times
) end group
{0,} repeat group 0 to infinite times
$ end
Tests
valid:
#validusername
#valid.user.name
#valid-user-name
#valid_user-name
#valid-user123_name
#a.valid-user123_name
not valid:
#-invalid.user
#_invalid.user
#1notvalid-user_123name33
#.user.name
#user.name.
#user..name
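The test lists above can be replayed with a few lines of PHP. This is only a sketch built on the simpler anchored pattern, and the function name isValidUsername is an assumption.

```php
<?php
// Validate a whole string as a "#username" using the simpler pattern.
// isValidUsername is a hypothetical helper name used for illustration.
function isValidUsername(string $name): bool
{
    return preg_match('~^#[a-zA-Z](\.?[\w-]+)*$~', $name) === 1;
}

$valid   = ['#validusername', '#valid.user.name', '#valid-user-name',
            '#valid_user-name', '#valid-user123_name', '#a.valid-user123_name'];
$invalid = ['#-invalid.user', '#_invalid.user', '#1notvalid-user_123name33',
            '#.user.name', '#user.name.', '#user..name'];

foreach (array_merge($valid, $invalid) as $name) {
    printf("%-28s %s\n", $name, isValidUsername($name) ? 'valid' : 'not valid');
}
```

Unlike the lookaround-based pattern, this one is anchored with ^ and $, so it suits validating a single field rather than finding mentions inside longer text.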
