Trying to create a regex for 1d barcode(RegexIterator)

I've been trying for a couple of days to create a regex that finds the correct picture for a product barcode in the pictures folder.
The folder contains roughly 4500 pictures.
The file name can be in one of 4 formats:
1. XXXXXX.jpg/png - the short barcode, an unknown number of characters (digits only).
2. 00000XXXX.jpg/png - one or more leading zeros (unknown count), then the short barcode.
3. 72900000XXXX.jpg/png - 729 as a leading number, then one or more leading zeros (unknown count), then the short barcode.
4. 72900000XXXXXXYYY YYY YYY.jpg/png - same as option 3 but followed by some characters (Y represents a character).
I came up with something like this:
$i = new RegexIterator($a, '($barcode)\D*|^([0][0-9]+$barcode)\D+|(729[0-9][0-9]+$barcode)\D+|(729[0-9][0-9]+$barcode).+/', RegexIterator::GET_MATCH);
$barcode can be 7290000232, 0000232, or 232.
But it doesn't work.
Any ideas?

You have four cases that build on each other:
1. Only numbers, 1 to unlimited times: \d+
2. Case 1. with leading zeros: effectively the same as 1., as zeros are numbers ;) No need for a special case here.
3. Case 1. optionally preceded by 729: (?:729)?\d+ (this already covers cases 1.-3.)
4. Case 3. with optional characters (zero to unlimited): (?:729)?\d+(?:[a-zA-Z])*
Only the extension is left to be added:
((?:729)?\d+(?:[a-zA-Z])*\.(?:jpg|png))
Now there's one thing left. This regex would match on abc123.jpg, as 123.jpg is perfectly valid. To counter this we add ^ (this denotes the start of the input):
^((?:729)?\d+(?:[a-zA-Z])*\.(?:jpg|png))
demo # regex101
Since you insert the barcode (from case 1) yourself, there are a few adjustments to be made:
^((?:729)?0*?$barcode(?:[a-zA-Z])*\.(?:jpg|png))
Here we have to insert the second case with 0*? (the digit 0, zero to unlimited times, lazy).
Regarding the [a-zA-Z]: you have to decide what to allow here. Currently it only allows lowercase and uppercase letters. If you want to allow spaces (for example), simply add them to the character class: [a-zA-Z ].
For non-latin characters you can use [\x{00BF}-\x{1FFF}\x{2C00}-\x{D7FF}a-zA-Z] (credits to this comment) as your character group, so your regex would then look like:
^((?:729)?0*?123(?:[\x{00BF}-\x{1FFF}\x{2C00}-\x{D7FF}a-zA-Z])*\.(?:jpg|png))
demo # regex101
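A minimal sketch of how the final pattern could be plugged into RegexIterator, assuming the file names are already available as plain strings (for example from scandir()). The helper name barcodePattern(), the sample file names, and the space added to the character class are illustrative assumptions, not part of the answer above.

```php
<?php
// Hypothetical helper: builds the answer's pattern for one short barcode.
function barcodePattern(string $barcode): string
{
    // preg_quote() is a precaution; a digits-only barcode has nothing to escape.
    $quoted = preg_quote($barcode, '/');
    // Optional 729 prefix, optional zeros, the barcode, optional letters/spaces, extension.
    return '/^((?:729)?0*?' . $quoted . '(?:[a-zA-Z ])*\.(?:jpg|png))/';
}

// Sample file names (made up for illustration).
$files = [
    '232.jpg',
    '000232.png',
    '7290000232.jpg',
    '7290000232abc def.jpg',
    '1232.jpg',          // different barcode, should not match
    '72900001232.png',   // different barcode, should not match
];

// RegexIterator in its default MATCH mode keeps only the entries that match.
$iterator = new RegexIterator(new ArrayIterator($files), barcodePattern('232'));
$matches = iterator_to_array($iterator, false);

print_r($matches);
```

The sketch passes the short barcode (232) to the helper; a value that already carries the 729 prefix or leading zeros would have to be reduced to the short form first.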

From what I understand - options 1-3 are all the same (729 is a digit string same as others):
^\d+[.](?:jpg|png)$
With 4 you are saying 'allow word characters and whitespaces, but only if name starts with 729'. So it is now:
(?:(?:^\d+[.](?:jpg|png)$)|(?:^729\d*[\w\s]+[.](?:jpg|png)$))
Demo here.
\s matches whitespace characters, \w matches word characters (letters, digits, and underscore).

Related

Match all occurrences of group A followed by two groups B, with padding characters

I have a string with the following "valid" pattern which is repeated multiple times:
A specific group of characters, say "ab", any number of other characters, say "xx", a different specific group of characters, say "cd", any number of other characters, say "xx".
So a valid sequence would be:
"abxcdabxxcdabxcdxx"
I'm trying to detect invalid sequences of this specific form: "abxxcdxxcd", and remove the middle "cd" to make it valid: "abxxxxcd"
I have tried the following regex:
/(?<=ab).*(cd).*(?=ab)/gsU
It works for a single sequence, but it fails for the following string:
"abxxcdxcdxxabxcdxxabxcdxxcd", which contains an invalid sequence, followed by a valid sequence, followed by another invalid sequence. I want to capture both groups in bold.
Note that the other characters "xx" may contain anything, including line breaks. They will never, however, contain the strings "ab" or "cd", except in the invalid case I specified.
Here's the corresponding regex101 link: https://regex101.com/r/U9pRfo/1
Edit:
Wiktor's answer worked out for me. However, I was getting PREG_JIT_STACKLIMIT_ERROR in PHP when using that regex on a very large string. I ended up splitting the string into smaller chunks and rebuilding it afterwards, which worked perfectly.

You may use
'~(?:\G(?!^)|ab)(?:(?!ab).)*?\Kcd(?=(?:(?!ab).)*?cd)~s'
See the regex demo
(?:\G(?!^)|ab) - a non-capturing group matching ab or the end of the previous match
(?:(?!ab).)*? - matches any char, 0 or more times, as few as possible, that does not start an ab char sequence
\K - match reset operator
cd - a substring
(?=(?:(?!ab).)*?cd) - a positive lookahead that requires any char, 0 or more repetitions, as few as possible, that does not start the ab char sequence, followed by the cd char sequence.
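As a rough sketch, the pattern can be applied with preg_replace() to drop the extra cd in each invalid sequence. The wrapper name removeExtraCd() is hypothetical; the inputs are the strings from the question.

```php
<?php
// The pattern from the answer; the s modifier lets . match line breaks too.
$pattern = '~(?:\G(?!^)|ab)(?:(?!ab).)*?\Kcd(?=(?:(?!ab).)*?cd)~s';

// Hypothetical wrapper: removes every "cd" that is followed by another "cd"
// before the next "ab".
function removeExtraCd(string $pattern, string $input): string
{
    $result = preg_replace($pattern, '', $input);
    if ($result === null) {
        // preg_replace() returns null on failure, e.g. when a PCRE limit is hit.
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

echo removeExtraCd($pattern, 'abxxcdxxcd'), "\n";
echo removeExtraCd($pattern, 'abxxcdxcdxxabxcdxxabxcdxxcd'), "\n";
```

Checking for null matters on very large inputs, where errors such as the PREG_JIT_STACKLIMIT_ERROR mentioned in the question would otherwise go unnoticed.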

Add min char and a way to find words with first letter capitalized to a regex

I have the following regex:
/([A-Z][\w-]*(\s+[A-Z][\w-]*)+)/
I've tried different approaches, but I'm not a pro with regex. This is what I want to do:
Add a rule that matches only words of 3+ characters.
Add a rule that can match a name like "Institute of Technology" (three words, with a lowercase word between the first and the last).
Can you help me do that? (I should use separate regexes, am I right?)

In order to help you to understand, this is what you have:
[A-Z]: one character in the class A-Z
[\w-]*: a concatenation of zero or more word characters or hyphens
(...)+: one or more:
\s+: at least one space
[A-Z]: one character in the class A-Z
[\w-]*: a concatenation of zero or more word characters or hyphens
This is what you want:
[A-Z]: a capital letter
[\w-]*: a concatenation of zero or more word characters or hyphens
\s+: at least one space
[a-z]: a lower-case letter
[\w-]*: a concatenation of zero or more word characters or hyphens
\s+: at least one space
[A-Z]: a capital letter
[\w-]*: a concatenation of zero or more word characters or hyphens
That is:
[A-Z][\w-]*\s+[a-z][\w-]*\s+[A-Z][\w-]*
You may want to make some small changes. I think you can do them on your own.
A rule that matches only words of 3+ characters is \w{3,}. If you want the first character to be a capital letter, use [A-Z]\w{2,}.
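A short sketch of both rules in PHP. The sample sentences are made up, and the \b word boundaries in the second pattern are an added assumption so that only whole words are reported.

```php
<?php
// Capitalized word, lowercase word, capitalized word (the pattern from the answer).
$threeWords = '/[A-Z][\w-]*\s+[a-z][\w-]*\s+[A-Z][\w-]*/';
preg_match($threeWords, 'She studied at the Institute of Technology in 2010.', $m);
echo $m[0], "\n";

// Capitalized words of 3+ characters; the \b boundaries are an assumption,
// added so that only whole words are matched.
$longWords = '/\b[A-Z]\w{2,}\b/';
preg_match_all($longWords, 'Al met Bob and Christine', $all);
print_r($all[0]);
```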

(\w\w\w+)|(\w+ [a-z]+ \w+) - This regex searches for a word of at least 3 word characters OR a phrase made of 1+ word characters, a space, 1+ lowercase letters, a space, and 1+ word characters. You can replace \w with [A-Z] if necessary.
If your 3-word phrase has to have 2 words starting with capital letters, change the second group to ([A-Z]\w* [a-z]+ [A-Z]\w*). Try it here: https://regex101.com/r/E3IPTj/1

I'm not sure about the scope of your requirements, but a few 'building blocks' might help. I'd also suggest starting from the basics. I don't know of any recent websites that teach regex well, but when I started I used http://www.regular-expressions.info/tutorial.html (it's been many years, and the website does show its age).
However, on to your regex:
Following your example: Institute of Technology
You need to know just a few things: character sets (and how to control the match length) and the space.
A character set matches exactly one character by default and is written like [abc], which matches a, b, or c. Character sets also support ranges (a-z) and shorthand classes (e.g. \d for all digits).
The match length can be changed using:
+ - one or more (examples: a+, [abc]+, \d+)
* - zero or more (examples: a*, [abc]*)
And this one you might want, but that's up to you:
{min,max} - specific range, e.g. b{3,5} matches 3-5 consecutive 'b' characters (bbb, bbbb, bbbbb). max can be omitted, as in {min,}, to require at least min characters with no upper limit.
Spaces are written as " " (a literal space); \s matches any whitespace character (equal to [\r\n\t\f\v ]: spaces, tabs, newlines, ...).
In your example it's a matter of whether the match is case sensitive. If it is not, we can use a simple [A-Za-z]+ to match one or more upper- or lowercase letters a-z; together with the space we get something along the lines of
/[A-Za-z]+ [A-Za-z]+ [A-Za-z]+/
It's that simple. For case-insensitive matching there is also an option flag, i, which results in
/[a-z]+ [a-z]+ [a-z]+/i
If you do want case-sensitive matching, you will need to separate the cases as you like:
/[A-Z][a-z]* [a-z]+ [A-Z][a-z]*/ // (*A a A*)
As a small change I've also changed + into * after the capital letters, so the lowercase part is not required; again, up to you.
Also note that to match the beginning of a string you're required to use ^, and to match the end of a string, $. The above examples will match any segment, not the whole input, e.g. qhg8Institute of Technology8tghagus would match.
So the final result:
/^[A-Z][a-z]* [a-z]+ [A-Z][a-z]*$/ // case sensitive (Aa a Aa)
/^[a-z]+ [a-z]+ [a-z]+$/i // case insensitive
Obviously there is a lot more to learn that can be used to expand or optimize this, but regexes are so customizable that it's really up to the person needing them to specify their limitations and requirements.
As a side note, I noticed people using \w for word chars, but this also includes digits and _, and, depending on the locale or Unicode mode, letters like à, ü, etc. Again, up to you what to do with this.
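To see the effect of the anchors and the i flag, here is a small sketch using the patterns from this answer with preg_match(); the variable names are only illustrative.

```php
<?php
$input = 'qhg8Institute of Technology8tghagus';

// Without anchors the pattern matches a segment inside the longer string.
$unanchored = preg_match('/[A-Z][a-z]* [a-z]+ [A-Z][a-z]*/', $input);

// With ^ and $ the whole input has to fit the pattern.
$anchored = preg_match('/^[A-Z][a-z]* [a-z]+ [A-Z][a-z]*$/', $input);
$exact = preg_match('/^[A-Z][a-z]* [a-z]+ [A-Z][a-z]*$/', 'Institute of Technology');

// The i flag makes [a-z] match uppercase letters as well.
$insensitive = preg_match('/^[a-z]+ [a-z]+ [a-z]+$/i', 'INSTITUTE of technology');

var_dump($unanchored, $anchored, $exact, $insensitive);
```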

PHP preg_replace match numbers following a special character

I'm creating a comment board feature that allows users to reference post IDs, which will be automatically hyperlinked to the relevant post using regex.
Post references are formatted as follows, using the double-arrow symbol: »1234
At most 6 digits can follow the double arrow for the reference to be hyperlinked, so »1234567 would not be hyperlinked, but »1, »12, »123, etc. would.
How would I go about doing this with regex?

Match the special character followed by 1-6 digits and then followed by a word boundary, so it won't match if it's concatenated with any other string.
»\d{1,6}\b
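A minimal sketch of the hyperlinking step with preg_replace(), assuming UTF-8 input. The helper name linkPostRefs() and the /post/<id> URL scheme are hypothetical, and the capture group around the digits is added so the ID can be reused in the replacement.

```php
<?php
// Hypothetical helper; the /post/<id> URL scheme is an assumption for illustration.
function linkPostRefs(string $text): string
{
    // Group 1 captures the 1-6 digits; \b rejects longer digit runs.
    // The u modifier treats pattern and subject as UTF-8.
    $linked = preg_replace('/»(\d{1,6})\b/u', '<a href="/post/$1">»$1</a>', $text);
    // preg_replace() returns null on error (e.g. invalid UTF-8); fall back to the input.
    return $linked ?? $text;
}

echo linkPostRefs('See »1234 and »1234567, also »1.'), "\n";
```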

Here is one solution: » matches the arrow character, \d matches a digit between 0 and 9, and {1,6} specifies that at least 1 and at most 6 digits should follow. If you want to match only whole words, you can use a word boundary (\b) at the front and back of the regex. If you want to check whether the whole string consists only of this pattern, you can use anchors (^ at the beginning, $ at the end).
»\d{1,6}

php preg_match only numbers, letters and dot

I have been searching for 2 hours now and I still don't get it.
I need to evaluate the input of the account name. That can ONLY contain numbers (0-9), letters (a-z and A-Z) and the dot (.).
Everything else is forbidden. So, no underscore (_), plus (+) and so on.
Valid accounts should look like, e.g.:
john.green
luci.mayer89
admin
I tried many preg_match/regex examples but I can't get it working. Whenever I do echo preg_match(...) I get 1 (true).
$accountname = "+_#luke123*";
echo preg_match("/[a-z0-9.]/i", $accountname);
//--> gives back 1 as true
In addition, it would be great to control that the account name starts with at least 2 letters or numbers and ends with at least 1 letter or number - but I am far, far away from that.

You need to use anchors and a quantifier:
echo preg_match("/^[a-z0-9.]+$/i", $accountname);
Your string +_#luke123* contains a letter and a number, thus there is a match. If we tell the engine to only match the whole string from beginning (^) to end ($), we'll make sure this will not match. + ensures we capture not just 1, but all characters.
See this demo, now there is no match!
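The difference can be checked directly; this sketch runs the original and the anchored pattern against the string from the question (the variable names $loose and $strict are illustrative).

```php
<?php
$accountname = '+_#luke123*';

// Unanchored: one allowed character anywhere is enough, so this returns 1.
$loose = preg_match('/[a-z0-9.]/i', $accountname);

// Anchored with a quantifier: every character must be allowed, so this returns 0.
$strict = preg_match('/^[a-z0-9.]+$/i', $accountname);

var_dump($loose, $strict);
```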
EDIT:
Since you also need to check these conditions:
string must start with 2 or more letters or numbers and end with 1 or more letters or numbers
I can suggest this ^[a-z0-9]{2,}[a-z0-9.]*[a-z0-9]+$ regex (must be used with i option) that means:
Starts with 2 or more letters or numbers
then follow any number of digits, letters or periods
and ends in 1 or more letters or numbers.
Another demo
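A small sketch wrapping the stricter pattern in a function; the name isValidAccountName() and the extra test names beyond those in the question are assumptions for illustration.

```php
<?php
// Hypothetical helper wrapping the stricter pattern from the edit above.
function isValidAccountName(string $name): bool
{
    return preg_match('/^[a-z0-9]{2,}[a-z0-9.]*[a-z0-9]+$/i', $name) === 1;
}

$names = ['john.green', 'luci.mayer89', 'admin', '.admin', 'admin.', 'a.b', '+_#luke123*'];
foreach ($names as $name) {
    echo $name, ' => ', isValidAccountName($name) ? 'valid' : 'invalid', "\n";
}
```

Note that the pattern needs at least 3 characters in total, and that $ also matches just before a single trailing newline; the D modifier or \z closes that gap if it matters.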

Regular Expression to extract numbers from string with special characters

I have a string like
8.123.351 (Some text here)
I have used the Regex
/([0-9,]+(\.[0-9]{2,})+(\.[0-9]{2,})?)/
to take the value "8.123.351" from the string. It works for the string given above.
But it does not work when the string has no ".", for example "179 (Some text here)".
I modified the regex to match this value as well, but without success.
So can anyone suggest a regex to get the numbers from strings like:
8.123.351 (Some text here)
179 (Some text here)
179.123 (Some text here)
179.1 (Some text here)

Your question is not very clear, so I made some assumptions to create a pattern:
The numbers are at the start of the string
There is at least 1 digit and at most 3 digits before there is a dot
Now we create your expression
Match 1 to 3 digits at the start of the string
/^\d{1,3}/
There is optionally (the ? after the group) a dot and one to three more digits
/^\d{1,3}(?:\.\d{1,3})?/
This part with the dot can be repeated 0 or more times (replace the ? with a *)
/^\d{1,3}(?:\.\d{1,3})*/
See it here on Regexr
If you want to read some basics about regular expressions, I wrote a blog post about that.
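A brief sketch applying the final pattern to the four sample strings from the question with preg_match(); the variable names are illustrative.

```php
<?php
$pattern = '/^\d{1,3}(?:\.\d{1,3})*/';

$inputs = [
    '8.123.351 (Some text here)',
    '179 (Some text here)',
    '179.123 (Some text here)',
    '179.1 (Some text here)',
];

$numbers = [];
foreach ($inputs as $input) {
    // $m[0] holds the full match, i.e. the leading number.
    if (preg_match($pattern, $input, $m)) {
        $numbers[] = $m[0];
    }
}

print_r($numbers);
```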

/([0-9]+[,\.]?)+/
matches all of your strings
By the way, your regex needs a dot to match because + means 1 or more occurrences. * means 0 or more and ? means 0 or 1.
