I cannot make this regular expression work - php

Maybe it's simple, but I can't get it to work.
I have two filename strings:
wrap.html
wrap-popup.html
I try to select both using the
/.*wrap.*\.htm.*/ pattern.
But it only matches the first one, "wrap.html".
If I use /.*wrap.+\.htm.*/, it only matches the second one, "wrap-popup.html".
I thought * means zero or more characters.
What's the correct pattern to select both strings?

Consider the string "this is text with 2 html pages: wrap.html and wrap-popup.html".
The first regex, /.*wrap.*\.htm.*/, will match that whole string.
So if you don't want to include the first part of the string, you need to remove the first .*
Now /wrap.*\.htm.*/ will match "wrap.html and wrap-popup.html" from the string.
That's because the .* after "wrap" is a greedy match: it grabs as much as it can and only gives back enough for \.htm to match at the last ".htm".
So when we change the regex to /wrap.*?\.html?/, the .*? is now a lazy match, and the l? is an optional l. So the regex will return "wrap.html".
But if we want to retrieve both, we need a global search; otherwise only the first match is found.
A preg_match_all (instead of preg_match) with the regex /wrap[\w\-]*?\.html?/ will match both "wrap.html" and "wrap-popup.html".
That second regex of yours wouldn't match wrap.html because, with the .+, it expects at least one character between "wrap" and ".htm".
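A small PHP sketch of the greedy match versus the global search, using the sample sentence from this answer. The variable names are only for illustration:

```php
<?php
// Sample sentence containing both file names.
$text = 'this is text with 2 html pages: wrap.html and wrap-popup.html';

// Greedy: the .* after "wrap" runs to the last ".htm", so one match spans both names.
preg_match('/wrap.*\.htm.*/', $text, $greedy);

// Global search: [\w\-] cannot cross spaces, so each file name is its own match.
preg_match_all('/wrap[\w\-]*?\.html?/', $text, $all);

echo $greedy[0], "\n"; // wrap.html and wrap-popup.html
print_r($all[0]);      // wrap.html, wrap-popup.html
```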
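To see the * versus + difference on its own, here is a sketch that tests each file name separately. On single file names the * version matches both, while the + version rejects wrap.html because nothing sits between "wrap" and ".htm":

```php
<?php
$results = [];
foreach (['wrap.html', 'wrap-popup.html'] as $file) {
    // .* allows zero characters between "wrap" and ".htm"; .+ demands at least one.
    $results[$file] = [
        preg_match('/.*wrap.*\.htm.*/', $file),
        preg_match('/.*wrap.+\.htm.*/', $file),
    ];
}
print_r($results); // wrap.html => [1, 0], wrap-popup.html => [1, 1]
```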

Related

PHP/Laravel trim all but last word in a namespace

Trying to trim a fully qualified namespace so as to use just the last word. An example namespace is App\Models\FruitTypes\Apple, where that final word could be any number of fruit types. Shouldn't this...
$fruitName = 'App\Models\FruitTypes\Apple';
trim($fruitName, "App\\Models\\FruitTypes\\");
...do the trick? It is returning an empty string. If I try to trim just App\\Models\\ it returns FruitTypes\Apples as expected. I know the backslash is an escape character, but doubling should treat those as actual backslashes.
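For context, the empty string comes from how trim() works: its second argument is a list of characters to remove from both ends of the string, not a prefix. A small sketch of that behavior, plus a prefix-stripping alternative built on strpos()/substr() (an illustration, not one of the thread's answers):

```php
<?php
$fruitName = 'App\Models\FruitTypes\Apple';

// trim() reads its second argument as a SET of characters to strip from both
// ends, not as a prefix. Every character in $fruitName is in this set, so
// nothing survives.
$all = trim($fruitName, "App\\Models\\FruitTypes\\");
var_dump($all); // string(0) ""

// Same rule with the shorter list: letters are also stripped from the right end.
$partial = trim($fruitName, "App\\Models\\");
var_dump($partial); // string(7) "FruitTy"

// Removing a known prefix instead, with plain string functions.
$prefix = 'App\\Models\\FruitTypes\\';
$short = strpos($fruitName, $prefix) === 0
    ? substr($fruitName, strlen($prefix))
    : $fruitName;
echo $short, "\n"; // Apple
```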
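If you want to use native functionality for this rather than string manipulation, then ReflectionClass::getShortName will do the job. Note that the class has to exist (or be autoloadable); otherwise the ReflectionClass constructor throws a ReflectionException: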
$reflection = new ReflectionClass('App\\Models\\FruitTypes\\Apple');
echo $reflection->getShortName();
Apple
See https://3v4l.org/eVl9v
preg_match() with the regex pattern \\([[:alpha:]]*)$ should do the trick. Inside a single-quoted PHP string, each of those regex backslashes has to be doubled again:
preg_match('/\\\\([[:alpha:]]*)$/', $fruitName, $trimmed);
Your result will then be in $trimmed[1]. If you don't mind the pattern being a bit less explicit, you could do:
preg_match('/([[:alpha:]]*)$/', $fruitName, $trimmed);
And your result would then be in $trimmed[0].
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
preg_match - php.net
(matches is the third parameter that I named $trimmed, see documentation for full explanation)
An explanation for the regex pattern
\\ matches the character \ literally to establish the start of the match.
The parentheses () create a capturing group to return the match or a substring of the match.
In the capturing group ([[:alpha:]]*):
[:alpha:] matches an alphabetic character [a-zA-Z]
The * quantifier means match between zero and unlimited times, as many times as possible
Then $ asserts position at the end of the string.
So basically, "Find the last \ then return all letters between it and the end of the string".
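A runnable sketch of the capturing-group version, keeping the $trimmed name used above, followed by a non-regex alternative with strrchr() (a different technique, shown only for comparison):

```php
<?php
$fruitName = 'App\Models\FruitTypes\Apple';

// In a single-quoted PHP string, '\\\\' becomes the two characters \\,
// which the regex engine reads as one literal backslash.
if (preg_match('/\\\\([[:alpha:]]*)$/', $fruitName, $trimmed)) {
    echo $trimmed[1], "\n"; // Apple
}

// Alternative without regex: strrchr() returns the tail from the last backslash,
// and substr(..., 1) drops that backslash.
$short = substr(strrchr($fruitName, '\\'), 1);
echo $short, "\n"; // Apple
```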

NOT words in Regex Pattern

I am trying to grab the text after the first hyphen using this pattern:
<title>.*?-(.*?)(-|<\/title>)
which grabs DesiredText from the string below:
<title>Stuff - DesiredText - Other Stuff</title>
However, in this string:
<title>Stuff - Unwanted - DesiredText - Otherstuff</title>
I want it to skip the 'Unwanted' text and match the text after the next hyphen instead (DesiredText). I made a regex101 demo with both strings and need to modify my basic regex so that, if a word or words I don't want to match are present in that capture group, it matches the text after the second hyphen instead:
https://regex101.com/r/veSqH3/1
I believe this is what you are looking for. The key is using the caret (^) inside the square-bracket character list ([]). Together, the caret and brackets form a negated character class: it only matches characters that are NOT in the list.
https://regex101.com/r/alAZhj/3
Pattern: <title>.*?-\s*([^-\s]*)\s*- End<\/title>
This matches anything in between the middle hyphens that is not a hyphen or space. You can of course modify the pattern to include such characters by using the following pattern.
Pattern: <title>.*?-\s*([^-]*)\s*- End<\/title>
This will match anything in between the middle hyphens that is not a hyphen, so that you can have less restricted text in there.
This uses a negative lookahead to disqualify Note. There may be ways to optimize the pattern, but I cannot do so with confidence because I don't know how variable your input strings are.
Pattern: /<title>.*?- (?P<title>(?!Note).*?)(?= -|<)/
I am using a positive lookahead to ensure the captured match doesn't have any unwanted trailing characters.
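A sketch of the negative-lookahead answer in PHP. Two assumptions: the input is hypothetical (the question's example with Note in place of Unwanted, since this lookahead specifically rejects Note), and the closing lookahead is written as (?= -|<), stopping before " -" or the closing tag:

```php
<?php
// HYPOTHETICAL inputs: the second title uses "Note" as the unwanted word.
$titles = [
    '<title>Stuff - DesiredText - Other Stuff</title>',
    '<title>Stuff - Note - DesiredText - Otherstuff</title>',
];

// ASSUMPTION: lookahead (?= -|<) ends the capture before " -" or "</title>".
$pattern = '/<title>.*?- (?P<title>(?!Note).*?)(?= -|<)/';

$found = [];
foreach ($titles as $t) {
    if (preg_match($pattern, $t, $m)) {
        // Named groups are available by name as well as by number.
        $found[] = $m['title'];
    }
}
print_r($found); // DesiredText, DesiredText
```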
If you just want the second-to-last delimited value, you could do something like this to return the value as the full-string match:
~- \K[^-]*(?= - [^-]*?</title>)~
Or faster with a capture group:
~- ([^-]*) - [^-]*?</title>~
This assumes there are no hyphens in the value.
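A quick PHP check of both variants on the two titles from the question; the \K version returns the value as $m[0], the capture-group version as $m[1]:

```php
<?php
$titles = [
    '<title>Stuff - DesiredText - Other Stuff</title>',
    '<title>Stuff - Unwanted - DesiredText - Otherstuff</title>',
];

$viaK = [];
$viaGroup = [];
foreach ($titles as $t) {
    // \K resets the start of the reported match, so $m[0] is just the value.
    if (preg_match('~- \K[^-]*(?= - [^-]*?</title>)~', $t, $m)) {
        $viaK[] = $m[0];
    }
    // Same value taken from capture group 1 instead.
    if (preg_match('~- ([^-]*) - [^-]*?</title>~', $t, $m)) {
        $viaGroup[] = $m[1];
    }
}
print_r($viaK);     // DesiredText, DesiredText
print_r($viaGroup); // DesiredText, DesiredText
```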
I took a different approach and focused on returning the capture prior to the last word, rather than any sort of negation. In this way it's highly generic.
This pattern will match what you want in the capture group:
\s-\s([a-zA-Z]+)\s-\s[a-zA-Z]+<\/title>
If you want to make sure this only matches between title tags, you can add:
<title>.*?\s-\s([a-zA-Z]+)\s-\s[a-zA-Z]+<\/title>
The only limitation I see is that it only matches single words made of letters, so if your desired match is "- Some phrase -", this won't work (the same applies to the last segment, such as "Other Stuff"). That was not indicated in your example, though. It's a bit unclear because you used "Other Stuff" and then "Otherstuff".

regexp - match pattern and prefix before pattern

I need to match a specific pattern
(?<!\d|\d )(?:dk)?(\d{2})\D?(\d{2})\D?(\d{2})\D?(\d{2})(?!\d)
e.g.:
dk30344510
dk30 34 45 10
30344510
30 34 45 10
But I also need to fetch the "prefix" string before the pattern.
This is my solution, but it doesn't always work:
^(.*)(?<!\d|\d )(?:dk)?(\d{2})\D?(\d{2})\D?(\d{2})\D?(\d{2})(?!\d)
It's hard to explain, so check it here:
https://regex101.com/r/fM1xD3/2
It's too "greedy" and matches across multiple numbers in the string: the first actual match ends up as part of the "prefix" of the second match.
The example should output two matches: one with dk30344510 and one with 62226420.
The first match should have CVR-nr. as the prefix and dk30344510 as the pattern, and the second match should have / Tlf. as the prefix and 62226420 as the pattern.
Your regex doesn't produce the expected results because you have a start-of-string anchor ^ followed by a greedy .*. That means matching can only begin at the start of the string, and the greedy .* swallows everything up to the last number it can find, so you only get one match.
Solution
Regex:
\s*(.*?)\s*\b((?i:dk)?(?:\d{2}\D?){3}\d{2})\b
I didn't change your main regex much: I collapsed the repeating \d{2}\D? pattern into a quantified group, made dk case-insensitive with (?i:dk), and replaced the lookarounds with word-boundary \b tokens.
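A PHP sketch of this regex with preg_match_all(), which plays the role of a global search. The input line is an assumption rebuilt from the expected output in the question; the actual regex101 text is not reproduced in the thread:

```php
<?php
// ASSUMPTION: sample text reconstructed from the question's expected prefixes/numbers.
$text = 'CVR-nr. dk30344510 / Tlf. 62226420';

$pattern = '/\s*(.*?)\s*\b((?i:dk)?(?:\d{2}\D?){3}\d{2})\b/';

// PREG_SET_ORDER groups each match's captures together.
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $m) {
    // $m[1] is the prefix, $m[2] is the number (with optional dk).
    $pairs[] = [$m[1], $m[2]];
    printf("prefix=%-8s number=%s\n", $m[1], $m[2]);
}
```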
You can try this one with the g (global) flag to get multiple results (PHP has no g modifier; use preg_match_all() instead):
^(.*?)\s(dk\d+)\s(.*?)\s(\d+)

PHP RegEx get first letter after set of characters

I have some text made of a leading string of letters followed by digits.
I need to get the first single digit that comes after the letters.
Example text:
ABC105001
ABC205001
ABC305001
ABCD105001
ABCD205001
ABCD305001
My RegEx:
^(\D*)(\d{1})(?=\d*$)
Link: http://www.regexr.com/390gv
As you can see, the regex works, but it also returns the first group in the results. I need to get only this integer, and when I try to put ?= in the first group like this: ^(?=\D*)(\d{1})(?=\d*$), the regex doesn't work.
Any ideas?
Thanks in advance.
(?=..) is a lookahead that means followed by and checks the string on the right of the current position.
(?<=...) is a lookbehind that means preceded by and checks the string on the left of the current position.
What's interesting about these two features is that the content matched inside them is not part of the overall match result. The only problem is that in PCRE a lookbehind can't contain unbounded variable-length content such as [A-Z]+.
One way around that is the \K feature, which removes everything to its left from the match result:
^[A-Z]+\K\d(?=\d*$)
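A short PHP sketch of the \K approach on the question's sample codes, first one string at a time and then on the whole block with the m (multiline) modifier so that ^ and $ apply to each line:

```php
<?php
$codes = ['ABC105001', 'ABC205001', 'ABC305001', 'ABCD105001', 'ABCD205001', 'ABCD305001'];

$digits = [];
foreach ($codes as $code) {
    // \K drops the leading letters from the reported match, so $m[0] is the digit.
    if (preg_match('/^[A-Z]+\K\d(?=\d*$)/', $code, $m)) {
        $digits[] = $m[0];
    }
}
echo implode(', ', $digits), "\n"; // 1, 2, 3, 1, 2, 3

// Whole text at once: /m makes ^ and $ match at line boundaries.
$text = implode("\n", $codes);
preg_match_all('/^[A-Z]+\K\d(?=\d*$)/m', $text, $all);
print_r($all[0]); // same six digits
```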
You're trying to use a positive lookahead when really you want to use non-capturing groups.
The one match you want will work with this regex:
^(?:\D*)(\d{1})(?:\d*)$
The (?: string starts a non-capturing group. Its contents will not come back as a separate entry in the matches.
So, if you used preg_match(';^(?:\D*)(\d{1})(?:\d*)$;', $string, $matches) to find your match, $matches[1] would be the digit you're looking for. (This is because $matches[0] will always be the full match from preg_match.)
Try:
^(?:\D*)(\d{1})(?=\d*$)
(?: is the beginning of a non-capturing group.
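For illustration, with this last pattern the letters still appear in the full match $matches[0], but group 1 holds only the digit:

```php
<?php
$code = 'ABCD305001';

// (?:\D*) is matched but not captured, so group 1 holds only the first digit.
preg_match('/^(?:\D*)(\d{1})(?=\d*$)/', $code, $matches);
print_r($matches); // [0] => ABCD3, [1] => 3
```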

Matching ugly extra abbreviations and numbers in titles with PHP regex

I have to create a regex to match ugly abbreviations and numbers. These can be in one of the following "formats":
1) [any alphabet char length of 1 char][0-9]
2) [double][whitespace][2-3 length of any alphabet char]
I tried to match the double first:
preg_match("/^-?(?:\d+|\d*\.\d+)$/", $source, $matches);
But I couldn't get it to match the following example: 1.1 AA My test title. What is wrong with my regex, and how can I add the other formats to it too?
In your regex you say: "start of string, optionally a -, then either at least one digit, or zero or more digits followed by a dot and at least one digit, then end of string."
So your regex could match, for example, 4.5 or -.1. This is exactly what you told it to do.
Your test input string does not match because there are other characters after the number 1.1, and even if it somehow matched, your "double"-matching regex would still be wrong.
For a double without scientific notation you usually use this regex:
[-+]?\b[0-9]+(\.[0-9]+)?\b
Now that we have this out of the way, we need a whitespace \s and [2-3 length of alphabet].
Now, I have no idea what [2-3 length of alphabet] means, but by combining the above you get a regex like this:
[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]
You can also place the anchors ^ and $ if you want the whole string to match:
^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]$
Feel free to ask if you are stuck! :)
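The answer leaves the letter part as a placeholder. If "[2-3 length of alphabet]" is read as two or three ASCII letters (an assumption based on format 2 in the question), the combined pattern might look like this in PHP, with no trailing $ because the title text follows the abbreviation:

```php
<?php
$source = '1.1 AA My test title';

// ASSUMPTION: "[2-3 length of alphabet]" means two or three ASCII letters,
// followed by a word boundary so longer words are not partially matched.
$pattern = '/^([-+]?\b[0-9]+(?:\.[0-9]+)?)\b\s([A-Za-z]{2,3})\b/';

if (preg_match($pattern, $source, $m)) {
    echo "number: {$m[1]}, abbreviation: {$m[2]}\n"; // number: 1.1, abbreviation: AA
}
```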
I see multiple issues with your regex:
You anchor the pattern with ^ at the beginning and $ at the end, so the whole string has to be a number. If you don't want that, remove those anchors.
The number group is non-capturing. It is still used for matching, but its contents won't be added to $matches as a separate group. That's because of the ?: you set in (?:...). Remove ?: to make the group capturing.
You place the shorter digit pattern before the longer one. Alternatives are tried left to right and the first one that succeeds wins, so without the anchors the \d+ branch stops at 1 on the input 1.1. If you swap the order, the regex engine tries the longer pattern first and, when it succeeds, prefers it over the shorter one.
Maybe this already solves your issue:
preg_match("/-?(\d*\.\d+|\d+)/", $source, $matches);
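To see why the order of the alternatives matters once the anchors are removed, here is a small comparison on the question's example title (a sketch; only the order of the branches differs):

```php
<?php
$source = '1.1 AA My test title';

// Original order: \d+ is tried first and already succeeds on "1".
preg_match('/-?(?:\d+|\d*\.\d+)/', $source, $shortFirst);

// Swapped order with a capturing group, as suggested in the answer.
preg_match('/-?(\d*\.\d+|\d+)/', $source, $longFirst);

echo $shortFirst[0], "\n"; // 1
echo $longFirst[1], "\n";  // 1.1
```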
