Regex any string with max length - php

I'm trying to write a MySQL regex query that will find any strings with a maximum length. I can do it in JavaScript, PHP and so on, but for some reason I can't get it to work in MySQL.
Example strings:
a12
a123
a1234
a12345
a123456
So let's say I want my regex to match ANY string with a length of 4 or less; in the examples above, it should match the first two (a12 and a123).
The pattern below works perfectly for matching strings above a certain length, but I cannot get it to work for strings below a certain length.
([a-zA-Z])[0-9a-zA-Z]{20,}
I obviously tried
([a-zA-Z])[0-9a-zA-Z]{0,6}
but it doesn't work for some reason.

I think you want:
^[a-zA-Z][0-9a-zA-Z]{0,3}$
This matches a string made of an alphabetic character followed by 0 to 3 alphanumeric characters. The important thing is to match the entire string: that's what the anchors ^ (beginning of the string) and $ (end of the string) do.
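A quick way to see why the anchors matter: without them, the engine only has to find some substring that fits. That is why {20,} works as a lower bound (a 20-character substring implies a long string), while {0,6} is satisfied by a short piece of any longer string. The minimal PHP sketch below compares both patterns on the sample strings; PHP is used only as a convenient test bed, and in MySQL the anchored pattern would go into a condition such as WHERE col REGEXP '^[a-zA-Z][0-9a-zA-Z]{0,3}$' (col being a hypothetical column name).

```php
<?php
// Sample strings from the question.
$samples = ['a12', 'a123', 'a1234', 'a12345', 'a123456'];

$unanchored = '/([a-zA-Z])[0-9a-zA-Z]{0,6}/'; // the attempt from the question
$anchored   = '/^[a-zA-Z][0-9a-zA-Z]{0,3}$/'; // the pattern from the answer

foreach ($samples as $s) {
    // preg_match() returns 1 on a match, 0 on no match (false on error).
    printf("%-8s unanchored=%d anchored=%d\n",
        $s, preg_match($unanchored, $s), preg_match($anchored, $s));
}

// Keep only the strings of at most 4 characters; array_values() reindexes.
$short = array_values(preg_grep($anchored, $samples));
print_r($short);
```

The unanchored pattern reports a match for every sample, while the anchored one accepts only a12 and a123.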

Match all occurrences of group A followed by two groups B, with padding characters

I have a string with the following "valid" pattern which is repeated multiple times:
A specific group of characters, say "ab", any number of other characters, say "xx", a different specific group of characters, say "cd", any number of other characters, say "xx".
So a valid sequence would be:
"abxcdabxxcdabxcdxx"
I'm trying to detect invalid sequences of this specific form: "abxxcdxxcd", and remove the middle "cd" to make it valid: "abxxxxcd"
I have tried the following regex:
/(?<=ab).*(cd).*(?=ab)/gsU
It works for a single sequence, but it fails for the following string:
"abxxcdxcdxxabxcdxxabxcdxxcd", which contains an invalid sequence, followed by a valid sequence, followed by another invalid sequence. I want to capture the extra "cd" in each of the two invalid sequences.
Note that the other characters "xx" may contain anything, including line breaks. They will never, however, contain the strings "ab" or "cd", except in the invalid case I specified.
Here's the corresponding regex101 link: https://regex101.com/r/U9pRfo/1
Edit:
Wiktor's answer worked out for me. However, I was getting PREG_JIT_STACKLIMIT_ERROR in PHP when using that regex on a very large string. I ended up splitting the string into smaller chunks and rebuilding it afterwards, which worked perfectly.
You may use
'~(?:\G(?!^)|ab)(?:(?!ab).)*?\Kcd(?=(?:(?!ab).)*?cd)~s'
(?:\G(?!^)|ab) - a non-capturing group matching ab or the end of the previous match
(?:(?!ab).)*? - matches any char that does not start an ab char sequence, 0 or more times, as few as possible
\K - match reset operator (discards the text matched so far from the reported match)
cd - a literal cd substring
(?=(?:(?!ab).)*?cd) - a positive lookahead that requires any chars that do not start an ab char sequence (0 or more repetitions, as few as possible), followed by a cd char sequence.
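A minimal PHP sketch applying this pattern to the string from the question, assuming the goal is to delete each extra "cd". Because of \K, every match is just the extra "cd", so preg_match_all() with PREG_OFFSET_CAPTURE reports where they are and preg_replace() with an empty replacement removes them. The null check reflects that preg_replace() returns null on failure, which is how a PREG_JIT_STACKLIMIT_ERROR on a very large input would surface.

```php
<?php
// Input from the question: invalid, valid, then invalid sequence.
$s = 'abxxcdxcdxxabxcdxxabxcdxxcd';

// Pattern from the answer; \K drops everything before "cd" from the match,
// so each match is exactly one extra "cd".
$re = '~(?:\G(?!^)|ab)(?:(?!ab).)*?\Kcd(?=(?:(?!ab).)*?cd)~s';

preg_match_all($re, $s, $m, PREG_OFFSET_CAPTURE);
$offsets = array_column($m[0], 1); // byte offset of each matched "cd"
print_r($offsets);

$fixed = preg_replace($re, '', $s);
if ($fixed === null) {
    // preg_replace() returns null on failure, e.g. when the JIT stack limit
    // is hit on a very large input (PREG_JIT_STACKLIMIT_ERROR).
    echo 'preg error: ', preg_last_error(), PHP_EOL;
} else {
    echo $fixed, PHP_EOL;
}
```

On this input, the two matches start at byte offsets 4 and 21, and the repaired string is abxxxcdxxabxcdxxabxxxcd.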

Match string that doesn't have number after letter

I've got the following scenario. Our system needs to pull filters from a string passed in as a query parameter, but also throw a 404 error if the string isn't correctly formatted. Let's take the following three strings as an example:
pf0pt1000r
pfasdfadf
pf2000pt2100
By the application requirements, only #3 is supposed to match as a "valid" string. My current regex to match it is /([a-z]+)(\d+)/, but this also matches #1: not the entire string, but it still finds a match.
My problem thus is twofold - I need 2 patterns, 1 that will match only the 3rd string in this list, and another that will match the "not-acceptable" strings 1 and 2. I believe there must be some way to "negate" a pattern (so then I'd technically only need one pattern, I'm assuming), but I'm not sure how exactly to do that.
Thanks for any help!
EDIT
For clarity's sake, let me explain. The "filter parameters" present here take the following structure - 1 or 2 letters, followed by a number of, well, numbers. That structure can repeat itself however many times. So for example, valid filter strings could be
pf100pt2000
pf100pt2000r2wp0to1
etc.
Invalid strings could be
pf10000pt2000r
pf3000pt2123wpno
... anything not following the structure above.
After clarifying the question:
^([a-zA-Z]{1,2}\d+)*$
Explanation:
[a-zA-Z] - a lower or upper case letter
{1,2} - one or two of those
\d+ - one or more digits
()* - the whole thing repeated any number of times
^$ - match the entire string from start(^) to end($)
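A minimal PHP sketch of how this could drive the 404 check, run against the strings from the question and its edit. Rather than writing a second "negated" pattern, it simply negates the preg_match() result in PHP. Note that with * the pattern also accepts the empty string.

```php
<?php
// Pattern from the answer: one or two letters followed by one or more digits,
// repeated, with ^ and $ so the whole string must follow that structure.
$valid = '/^([a-zA-Z]{1,2}\d+)*$/';

$inputs = [
    'pf0pt1000r', 'pfasdfadf', 'pf2000pt2100',  // original three strings
    'pf100pt2000', 'pf100pt2000r2wp0to1',       // valid examples from the edit
    'pf10000pt2000r', 'pf3000pt2123wpno',       // invalid examples from the edit
];

foreach ($inputs as $s) {
    // Negate the result in PHP instead of writing an "invalid" pattern.
    $ok = preg_match($valid, $s) === 1;
    echo str_pad($s, 22), $ok ? 'valid' : 'invalid -> 404', PHP_EOL;
}
```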
You can use this regex for valid input:
^([a-zA-Z]+\d+)+$
To find invalid inputs use:
^(?!([a-zA-Z]+\d+)+$).+$
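A PHP sketch comparing the two patterns on the three original strings, using preg_grep() to keep the matching elements. PREG_GREP_INVERT is shown as an alternative to the lookahead pattern; the difference is that it would also return an empty string, which the .+ in the lookahead version rejects. Keep in mind that [a-zA-Z]+ accepts any number of letters (for example abc1), so these patterns do not enforce the "1 or 2 letters" rule from the question's edit.

```php
<?php
$valid   = '/^([a-zA-Z]+\d+)+$/';         // valid-input pattern from the answer
$invalid = '/^(?!([a-zA-Z]+\d+)+$).+$/';  // negative-lookahead version

$inputs = ['pf0pt1000r', 'pfasdfadf', 'pf2000pt2100'];

// preg_grep() keeps the elements that match; array_values() reindexes them.
print_r(array_values(preg_grep($valid, $inputs)));
print_r(array_values(preg_grep($invalid, $inputs)));

// Alternative without a second pattern: invert the first one.
print_r(array_values(preg_grep($valid, $inputs, PREG_GREP_INVERT)));
```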
/^(?:(?:[a-z]+)(?:\d+))*$/
You were hella close, man. Just need to repeat that pattern over and over again till the end.
Change the * to a + to reject the empty string.
Oh, you had more specific requirements, try this:
/^(?:[a-z]{1,2}\d+)*$/
Broken down:
^ - Matches the start of the string (an anchor)
(?: - start a non-capturing group
[a-z] - A to Z. This you already had.
{1,2} - Repeat 1 or 2 times
\d+ - one or more digits. You had this, too.
)* - Repeat that group ad nauseum
$ - Match the end of the string
If you only want digits at the end of the string, then
/\d$/
would do. \d = digit, $ = end of string.

Regex OR matching stuff that I don't want

I am using PHP.
I have strings like:
example.123.somethingelse
example.1234.somethingelse
example.2015.123.somethingelse
example.2015.1234.somethingelse
and I came up with this regex
/example\.(2015\.|)([0-9]{3,4})\./
What I want to get is "123" or "1234" and it works for these strings. But when the string is
example.2015.A01.somethingelse
the result is "2015".
The way I see it, after "2015." there is an "A", which should not be matched by the regex, but it is (and I suppose there is a solid reason for it that I don't understand at the moment).
How can I fix it (make the regex match nothing, since the last string does not follow the same structure as the others)?
Your regex is this:
/example\.(2015\.|)([0-9]{3,4})\./
That says
First match "example" followed by a period
Then match either "2015" followed by a period OR nothing at all.
Then match 3 or 4 digits in a row followed by a period
When you have the string example.2015.A01.somethingelse, it matches "example.2015." but then, as you said, the "A" messes it up, so it backtracks and matches just "example." (remember that the "OR" allows nothing to be matched). So it matches "example." followed by NOTHING followed by 3 or 4 numeric digits and a period; since "2015." is 4 numeric digits followed by a period, the whole match comfortably becomes "example.2015." again.
It's hard to tell from your description, but I think you've just got a mis-placed vertical bar:
/example\.(2015\.)|([0-9]{3,4})\./
That should match EITHER "example.2015." OR numbers like 123 -- but "2015" is still 4 numeric digits in a row, so it will still match. I don't have a clear enough idea of the pattern to figure out how that could be avoided.
Maybe use \d+ and take the first result in the array.
In your regex, you use the following:
(2015\.|)
This allows the regex to match either 2015. or the empty string (zero characters).
When the regex example\.(2015\.|)([0-9]{3,4})\. is applied to the following example:
example.2015.A01.somethingelse
it will match the literal characters example., then the empty string with (2015\.|), and then use ([0-9]{3,4})\. to match 2015., i.e. the 4 numerical characters 2015 followed by a period. Thus your expression matches the following:
example.2015.
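The backtracking described above can be reproduced in PHP. This sketch runs the question's pattern on the problem string and dumps the full match and both groups: group 1 takes the empty alternative and group 2 captures 2015.

```php
<?php
$re = '/example\.(2015\.|)([0-9]{3,4})\./'; // pattern from the question

preg_match($re, 'example.2015.A01.somethingelse', $m);
// After "2015." the engine hits "A", backtracks into the empty alternative,
// and then reads "2015" as the 3-4 digit group.
var_dump($m[0], $m[1], $m[2]);
```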
Looks like you need a possessive quantifier:
/example\.(2015\.)?+([0-9]{3,4})\./
The 2015. is still optional, but once the regex has matched it, it won't give it up, even if that causes the match to fail. I'm assuming the substring you're trying to capture with ([0-9]{3,4}) can never have the value 2015. That is, you won't need to match something like this:
example.2015.somethingelse
If that's not the case, it's going to be much more complicated.
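A PHP sketch checking the possessive version against all five strings from the question, collecting group 2 (the 3 or 4 digit number) or null when there is no match. It relies on PCRE's possessive quantifier ?+, which PHP's preg_match() supports.

```php
<?php
$re = '/example\.(2015\.)?+([0-9]{3,4})\./'; // possessive ?+ from the answer

$inputs = [
    'example.123.somethingelse',
    'example.1234.somethingelse',
    'example.2015.123.somethingelse',
    'example.2015.1234.somethingelse',
    'example.2015.A01.somethingelse',
];

$found = [];
foreach ($inputs as $s) {
    // Group 2 holds the 3-4 digit number; null marks "no match".
    $found[] = preg_match($re, $s, $m) === 1 ? $m[2] : null;
}
var_dump($found);
```

The first four strings yield 123, 1234, 123 and 1234; the A01 string no longer matches at all.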
Here is one more pattern:
example\.(?:2015\.)?\K(\d+)
or, restricted to your specific number of digits:
example\.(?:2015\.)?\K(\d{3,4})
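A PHP sketch of the \K variant. \K resets the start of the reported match, so $m[0] holds only the number. One caveat, visible in the second call: the optional group can still give up "2015.", so example.2015.A01.somethingelse yields 2015 here as well; making the group possessive, (?:2015\.)?+, makes that input fail to match.

```php
<?php
$re = '/example\.(?:2015\.)?\K(\d{3,4})/'; // \K variant from the answer

// \K resets the start of the reported match, so $m[0] is just the number.
preg_match($re, 'example.2015.1234.somethingelse', $m);
$good = $m[0];

// Caveat: the optional group can still give back "2015.", so this input
// yields "2015" instead of no match.
preg_match($re, 'example.2015.A01.somethingelse', $m);
$bad = $m[0];

// Possessive version: once "2015." is matched it is never given back.
$possessive = '/example\.(?:2015\.)?+\K(\d{3,4})/';
$none = preg_match($possessive, 'example.2015.A01.somethingelse');

var_dump($good, $bad, $none);
```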

Regular Expression to extract numbers from string with special characters

I have a string like
8.123.351 (Some text here)
I have used the Regex
/([0-9,]+(\.[0-9]{2,})+(\.[0-9]{2,})?)/
to take value "8.123.351" from string. It is working for the string given above.
But it does not work when the string has no ".", for example "179 (Some text here)".
I modified the regex to match this value too, but with no success.
So can anyone suggest a regex to get the numbers from strings like:
8.123.351 (Some text here)
179 (Some text here)
179.123 (Some text here)
179.1 (Some text here)
Your question is not very clear, so I'll make some assumptions to create a pattern:
The numbers are at the start of the string
There is at least 1 digit and at most 3 digits before there is a dot
Now we create your expression
Match 1 to 3 digits at the start of the row
/^\d{1,3}/
There is optionally (the ? after the group) a dot and one to three more digits
/^\d{1,3}(?:\.\d{1,3})?/
This part with the dot can be repeated 0 or more times (replace the ? with a *)
/^\d{1,3}(?:\.\d{1,3})*/
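A minimal PHP sketch running this final pattern over the four example lines from the question, under the same assumptions (the number is at the start of the string, with 1 to 3 digits per group).

```php
<?php
// Pattern from the answer: 1-3 digits at the start, then any number of
// "." + 1-3 digit groups.
$re = '/^\d{1,3}(?:\.\d{1,3})*/';

$lines = [
    '8.123.351 (Some text here)',
    '179 (Some text here)',
    '179.123 (Some text here)',
    '179.1 (Some text here)',
];

$numbers = [];
foreach ($lines as $line) {
    if (preg_match($re, $line, $m) === 1) {
        $numbers[] = $m[0]; // the whole match is the number
    }
}
print_r($numbers);
```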
If you want to read some basics about regular expressions, I wrote a blog post about that.
/([0-9]+[,\.]?)+/
matches all of your strings
By the way, your regex needs a dot to match, because + means 1 or more matches; * is 0 or more, and ? is 0 or 1.
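A PHP sketch comparing the question's pattern with this one on the four example lines. The original requires at least one dot followed by two or more digits, so it finds nothing in "179 (Some text here)" or "179.1 (Some text here)". One caveat of the looser pattern: a trailing separator is included, so "179. text" would give "179." with the dot.

```php
<?php
$original = '/([0-9,]+(\.[0-9]{2,})+(\.[0-9]{2,})?)/'; // pattern from the question
$loose    = '/([0-9]+[,\.]?)+/';                       // pattern from this answer

$lines = ['8.123.351 (Some text here)', '179 (Some text here)',
          '179.123 (Some text here)', '179.1 (Some text here)'];

$origResults  = [];
$looseResults = [];
foreach ($lines as $line) {
    // null means the pattern found nothing in this line.
    $origResults[]  = preg_match($original, $line, $m) === 1 ? $m[0] : null;
    $looseResults[] = preg_match($loose, $line, $m) === 1 ? $m[0] : null;
}
var_dump($origResults, $looseResults);
```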

Matching ugly extra abbreviations and numbers in titles with PHP regex

I have to create a regex to match ugly abbreviations and numbers. These can be in one of the following "formats":
1) [any alphabet char length of 1 char][0-9]
2) [double][whitespace][2-3 length of any alphabet char]
I tried to match the double with:
preg_match("/^-?(?:\d+|\d*\.\d+)$/", $source, $matches);
But I couldn't get it to match the following example: 1.1 AA My test title. What is wrong with my regex, and how can I add the other formats to it too?
In your regex you say: start of string, followed by an optional -, followed by either at least one digit, or 0 or more digits followed by a dot and at least one digit, followed by the end of the string.
So your regex could match, for example, 4.5, -.1, etc. This is exactly what you tell it to do.
Your test input string does not match, since there are other characters present after the number 1.1, and even if it somehow magically matched, your "double"-matching regex is wrong.
For a double without scientific notation, you usually use this regex:
[-+]?\b[0-9]+(\.[0-9]+)?\b
Now that we have this out of the way, we need a whitespace \s and
[2-3 length of alphabet]
Now, I have no idea what [2-3 length of alphabet] means, but by combining the above you get a regex like this:
[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]
You can also place the anchors ^ and $ if you want the whole string to match:
^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]$
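A sketch of how the two formats from the question could be combined in PHP, assuming "[2-3 length of any alphabet char]" means two or three letters ([A-Za-z]{2,3}) and that the junk appears at the start of the title. Both are assumptions, not confirmed by the question; the number part is a simplified version of the double pattern above, and the trailing \b keeps {2,3} from matching the start of a longer word.

```php
<?php
// ASSUMPTION: "[2-3 length of any alphabet char]" means two or three letters.
// The two formats from the question become two alternatives, anchored at the
// start of the title:
//   format 1: one letter followed by one digit, e.g. "A1"
//   format 2: a number (double), one whitespace, 2-3 letters, e.g. "1.1 AA"
$re = '/^(?:[A-Za-z][0-9]|[-+]?[0-9]+(?:\.[0-9]+)?\s[A-Za-z]{2,3})\b/';

$titles = ['1.1 AA My test title', 'A1 My test title', 'My test title'];

$prefixes = [];
foreach ($titles as $t) {
    // null means the title does not start with either format.
    $prefixes[] = preg_match($re, $t, $m) === 1 ? $m[0] : null;
}
var_dump($prefixes);
```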
Feel free to ask if you are stuck! :)
I see multiple issues with your regex:
You try to match the whole string (as a number) with the anchors ^ at the beginning and $ at the end. If you don't want that, remove them.
The number group is non-capturing. It will be checked for matches, but its contents won't be added to $matches as a separate group. That's because of the ?: you set in (?:...). Remove ?: to make that group capturing.
You place the shorter digit pattern before the longer one. If you swap the order, the regex engine will try the longer one first and, on success, prefer it over the shorter one.
Maybe this already solves your issue:
preg_match("/-?(\d*\.\d+|\d+)/", $source, $matches);
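A PHP sketch running this on the example title, next to the same pattern with the alternatives in the original order, to show the effect of putting the longer alternative first.

```php
<?php
$source = '1.1 AA My test title';

// Pattern from the answer: no anchors, a capturing group, and the longer
// alternative (\d*\.\d+) tried before the shorter one (\d+).
preg_match('/-?(\d*\.\d+|\d+)/', $source, $matches);
print_r($matches);

// Alternatives in the original order: \d+ succeeds first and stops at "1".
preg_match('/-?(\d+|\d*\.\d+)/', $source, $shortFirst);
print_r($shortFirst);
```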