Regular Expression to extract numbers from string with special characters


I have a string like
8.123.351 (Some text here)
I have used the regex
/([0-9,]+(\.[0-9]{2,})+(\.[0-9]{2,})?)/
to take the value "8.123.351" from the string. It works for the string given above.
But it does not work when the string has no ".", for example "179 (Some text here)".
I modified the regex to match this value as well, but without success.
Can anyone suggest a regex to get the numbers from strings like:
8.123.351 (Some text here)
179 (Some text here)
179.123 (Some text here)
179.1 (Some text here)

Your requirements are not very clear, so I make some assumptions to create a pattern:
The numbers are at the start of the string
There are at least 1 and at most 3 digits before each dot
Now we build your expression step by step.
Match 1 to 3 digits at the start of the string
/^\d{1,3}/
There is optionally (the ? after the group) a dot and one to three more digits
/^\d{1,3}(?:\.\d{1,3})?/
This part with the dot can be repeated 0 or more times (replace the ? with a *)
/^\d{1,3}(?:\.\d{1,3})*/
See it here on Regexr
If you want to read some basics about regular expressions, I wrote a blog post about that.
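As a minimal sketch, the final pattern can be tried in PHP against the four sample strings from the question. The helper name extractLeadingNumber() is made up for this illustration and is not part of the original answer.

```php
<?php
// Pattern from the answer: 1-3 digits at the start, then zero or more ".ddd" groups.
const NUMBER_AT_START = '/^\d{1,3}(?:\.\d{1,3})*/';

// Hypothetical helper: returns the leading number, or null when the string has none.
function extractLeadingNumber(string $text): ?string
{
    return preg_match(NUMBER_AT_START, $text, $matches) === 1 ? $matches[0] : null;
}

$samples = [
    '8.123.351 (Some text here)',
    '179 (Some text here)',
    '179.123 (Some text here)',
    '179.1 (Some text here)',
];

foreach ($samples as $sample) {
    // Prints the number part of each sample on its own line.
    echo extractLeadingNumber($sample), PHP_EOL;
}
```

Because of the ^ anchor, a string that does not start with a digit yields null instead of a number found somewhere in the middle.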

/([0-9]+[,\.]?)+/
matches all of your strings.
By the way, your regex needs a dot to match because + means 1 or more repetitions; * means 0 or more and ? means 0 or 1.
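To make the quantifier point concrete, here is a small PHP sketch that compares the pattern from the question with the one suggested in this answer. The helper firstMatch() is a hypothetical name used only for this example.

```php
<?php
// Pattern from the question: the "(\.[0-9]{2,})+" part demands at least one ".dd" group.
$original = '/([0-9,]+(\.[0-9]{2,})+(\.[0-9]{2,})?)/';
// Pattern from this answer: the separator is optional after each run of digits.
$suggested = '/([0-9]+[,\.]?)+/';

// Hypothetical helper: returns the first full match, or null if nothing matches.
function firstMatch(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $matches) === 1 ? $matches[0] : null;
}

var_dump(firstMatch($original, '179 (Some text here)'));       // no dot, so no match
var_dump(firstMatch($suggested, '179 (Some text here)'));      // matches the number
var_dump(firstMatch($suggested, '8.123.351 (Some text here)')); // matches the number
```

One caveat with the suggested pattern: since the separator is optional after every digit run, a trailing dot or comma is included in the match, so 'Version 2. Done' yields '2.' rather than '2'.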

Related

Regex any string with max length

I'm trying to write a MySQL regex that will find any string up to a maximum length. I can do it in JavaScript, PHP and so on, but for some reason I can't get it to work in MySQL.
Example strings:
a12
a123
a1234
a12345
a123456
So let's say I want my regex to hit ANY string with a length of 4 or less; in the examples above, my regex should hit the first two.
The code below works perfectly for strings above a certain length, but I cannot get it to work for strings below a certain length.
([a-zA-Z])[0-9a-zA-Z]{20,}
I obviously tried
([a-zA-Z])[0-9a-zA-Z]{0,6}
but it doesn't work for some reason.

I think you want:
^[a-zA-Z][0-9a-zA-Z]{0,3}$
This matches a string made of one alphabetic character followed by 0 to 3 alphanumeric characters. The important part is matching the entire string: that's what the anchors ^ (beginning of the string) and $ (end of the string) do.
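The effect of the anchors can be sketched in PHP rather than MySQL, on the assumption that ^ and $ behave the same way for these simple inputs. Without anchors, the pattern only has to match somewhere inside the string, so an upper bound like {0,3} never rejects longer strings.

```php
<?php
// Same character classes, with and without anchors.
$unanchored = '/[a-zA-Z][0-9a-zA-Z]{0,3}/';
$anchored   = '/^[a-zA-Z][0-9a-zA-Z]{0,3}$/';

$strings = ['a12', 'a123', 'a1234', 'a12345', 'a123456'];

// preg_grep() keeps the array entries that match the pattern.
$matchedAnywhere = array_values(preg_grep($unanchored, $strings));
$matchedWhole    = array_values(preg_grep($anchored, $strings));

print_r($matchedAnywhere); // all five strings: a prefix of each one satisfies the pattern
print_r($matchedWhole);    // only the strings of length 4 or less
```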

Using preg_match to validate a string format

I have an html form with an input for a sales order number which should have the format of K1234/5678. It should always start with the letter K then 4 numbers, a / and followed by another set of 4 numbers.
I'm trying to validate the formatting using preg_match and I'm getting lost in the syntax of preg_match. From http://php.net/manual/en/function.preg-match.php I've gotten close. With the following code I'm able to verify that it contains at least 1 letter, some numbers and at least 1 non-alphanumeric value.
$so= $_POST['so'];
if (preg_match("/^(?=.*[a-z]{1})(?=.*[0-9]{4})(?=.*[^a-z0-9]{1})/i", $so))
{
print $so;
}
What is the correct syntax to use for this? Is preg_match even the best way to do this?

Try this:
preg_match("#^K[0-9]{4}/[0-9]{4}$#i", $so)
Explanation:
The # characters are regular expression delimiters - they indicate the start/end of the pattern. The ^ and $ indicate the start and end of the string - this means that it will only match if your sales order number is the only thing in the string. The letter K means match that letter, [0-9]{4} means match a digit exactly 4 times. The i at the end means a case-insensitive match - the K will match either "K" or "k".
When developing regular expressions, I often use regular expression testers - these allow you to enter your data and try a bunch of different things to refine your regex. Google PHP regex tester to find a list of tools. Also, there's a very complete reference to regular expressions at http://www.regular-expressions.info/.
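A minimal sketch of the validation wrapped in a function, so it can be tried without a form. The function name isValidSalesOrder() is an assumption made for this example.

```php
<?php
// Hypothetical helper: true only for "K", 4 digits, "/", 4 digits (case-insensitive K).
function isValidSalesOrder(string $so): bool
{
    // "#" is the delimiter, so the "/" inside the pattern needs no escaping.
    return preg_match('#^K[0-9]{4}/[0-9]{4}$#i', $so) === 1;
}

var_dump(isValidSalesOrder('K1234/5678'));  // valid format
var_dump(isValidSalesOrder('k1234/5678'));  // valid because of the i modifier
var_dump(isValidSalesOrder('K123/5678'));   // only 3 digits before the slash
var_dump(isValidSalesOrder('XK1234/5678')); // extra leading character
```

Note that in PHP, $ also matches just before a trailing newline. If that matters for the form input, adding the D modifier or trimming the value first would tighten the check.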

Trying to create a regex for 1d barcode(RegexIterator)

I have been trying for a couple of days to create a regex that finds the correct picture in the pictures folder by product barcode.
The folder contains about 4500 pictures.
The file name can be in one of 4 formats:
XXXXXX.jpg/png - short barcode, unknown number of characters (digits only).
00000 (1 to an unknown number of leading zeros) XXXX (then the short barcode).jpg/png
729 (as leading number) 00000 (1 to an unknown number of leading zeros) XXXX (then the short barcode).jpg/png
72900000XXXXXXYYY YYY YYY.jpg/png - same as option 3 but followed by some characters (Y represents a character).
I came up with something like that:
$i = new RegexIterator($a, '($barcode)\D*|^([0][0-9]+$barcode)\D+|(729[0-9][0-9]+$barcode)\D+|(729[0-9][0-9]+$barcode).+/', RegexIterator::GET_MATCH);
$barcode - can be 7290000232 or 0000232 or 232
But it doesn't work.
Any ideas?

You have four cases that build up on each other:
1. Only numbers, 1 to unlimited times: \d+
2. Case 1 with leading zeros: effectively the same as case 1, as zeros are numbers ;) No need for a special case here
3. Case 1 optionally preceded by 729: (?:729)?\d+ (this already covers cases 1 to 3)
4. Case 3 with optional characters (zero to unlimited): (?:729)?\d+(?:[a-zA-Z])*
Only the extension is left to be added:
((?:729)?\d+(?:[a-zA-Z])*\.(?:jpg|png))
Now there's one thing left. This regex would match on abc123.jpg, as 123.jpg is perfectly valid. To counter this we add ^ (this denotes the start of the input):
^((?:729)?\d+(?:[a-zA-Z])*\.(?:jpg|png))
demo # regex101
Since you insert the barcode (from case 1) yourself, a few adjustments are needed:
^((?:729)?0*?$barcode(?:[a-zA-Z])*\.(?:jpg|png))
Here we have to insert the second case with 0*? (0 zero to unlimited times, lazy).
Regarding the [a-zA-Z]: you have to decide what to allow here. Currently it only allows lowercase and uppercase letters. If you want to allow spaces (for example), then simply add them to the character group: [a-zA-Z ].
For non-Latin characters you can use [\x{00BF}-\x{1FFF}\x{2C00}-\x{D7FF}a-zA-Z] (credits to this comment) as your character group (in PHP this requires the u modifier so that code points above \x{FF} are accepted), so your regex would then look like:
^((?:729)?0*?123(?:[\x{00BF}-\x{1FFF}\x{2C00}-\x{D7FF}a-zA-Z])*\.(?:jpg|png))
demo # regex101
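The barcode-specific pattern can be sketched in PHP as follows. This is an illustration under a few assumptions: the function name buildBarcodePattern() and the file names are made up, $barcode holds the short barcode only, spaces are allowed in the trailing characters, and preg_grep() on a plain array stands in for the RegexIterator from the question.

```php
<?php
// Hypothetical helper: builds the file-name pattern for one short barcode.
function buildBarcodePattern(string $barcode): string
{
    // preg_quote() escapes regex metacharacters in the inserted value.
    $quoted = preg_quote($barcode, '/');

    // Optional 729 prefix, optional zeros, the barcode, optional letters/spaces, extension.
    return '/^((?:729)?0*?' . $quoted . '(?:[a-zA-Z ])*\.(?:jpg|png))/';
}

// Made-up file names covering the four formats plus two that should be rejected.
$files = [
    '232.jpg',
    '0000232.png',
    '7290000232.jpg',
    '7290000232abc def.jpg',
    '1232.jpg',
    'abc232.jpg',
];

$hits = array_values(preg_grep(buildBarcodePattern('232'), $files));
print_r($hits); // the first four names
```

Double quotes or concatenation are needed when building the pattern: in a single-quoted PHP string, $barcode is not interpolated and would be passed to the regex engine literally.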

From what I understand, options 1-3 are all the same (729 is a digit string like the others):
^\d+[.](?:jpg|png)$
With 4 you are saying 'allow word characters and whitespaces, but only if name starts with 729'. So it is now:
(?:(?:^\d+[.](?:jpg|png)$)|(?:^729\d*[\w\s]+[.](?:jpg|png)$))
Demo here.
\s matches whitespace, \w matches word characters.

Why preg_match("/[^(22|75)]/", "25") returns false?

I want to test that a given string does not belong to the following group of strings: 22 75.
Could anyone please tell why PHP's preg_match("/[^(22|75)]/", "25") returns 0?
The weirdest thing is that preg_match("/[^(22|76)]/", "25") returns 1 as expected...
Edit:
I guess I understand the reason and the nature of my mistake, but how do I check that a given two-digit number is not one of 20, 21, 22, 23, 24, 75, 76, 77, 78, 79, 80?
I need to assemble an expression to check a given age against the list of allowed ages (this presumes only two-digit numbers).
I cannot use anything other than preg_match() (!preg_match() is not available in my case); I can only change the regex pattern.

Time for a Regular Expressions Lesson!
Explanation of your regular expressions
[^(22|75)]
Returns 0 (no match) because it is looking for the following:
A single character NOT in this list of characters: |()275
[^(22|76)]
Returns 1 (match) because it is looking for:
A single character NOT in this list of characters: |()276
Why does it do this?
You wrapped your regex in a character class.
To give an example of how character classes work, look at this regex:
[2222222222222221111111199999999999]
This character class will match only ONE character, and only if it is a 2, a 1 or a 9.
How to make it work for you:
To match the number 25 (or 22, 52, and 55), you can use this character class:
[25]{2}
This will match a 2 digit number containing either 2 or 5 at either place.

What are character classes
A character class is a collection of characters (not strings). With a character class, you're telling the regex engine to match only one out of several characters.
For example, if you wanted to match an a or e, you'd write [ae]. If you wanted to match grey or gray, you'd write gr[ae]y.
Explanation for first regex
[^(22|75)]
As said above, character classes match a single character from the list. Here, you're using ^ to get a negated character class, so this will match a single character that's not in the supplied list. In this case, our list contains the following characters:
( 2 2 | 7 5 )
Repeated characters are only counted once. So this effectively becomes:
( 2 | 7 5 )
25 is the string you're matching against. The regular expression asks: does the supplied string contain a single character that's not in the above list? 2 and 5 are both in the list, so the answer is no. That explains why preg_match() returns 0 (not false, to be precise).
Explanation for second regex
/[^(22|76)]/
It is the same as above. The only difference is that 5 changed to 6. It now looks for a single character that is not one of the following:
( 2 | 7 6 )
The supplied string is still the same as before - 25. Does the string contain any character that's not in the list above? Yes! It does contain 5 (which is not in the list anymore). That explains why preg_match() returns 1.
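The two results can be reproduced with a short PHP sketch. Capturing the match for the second pattern shows which character satisfied the negated class.

```php
<?php
// Both patterns are negated character classes: each matches ONE character
// that is not in the listed set, regardless of the "(", "|" and ")" inside.
$first  = preg_match('/[^(22|75)]/', '25');           // "2" and "5" are both in the set
$second = preg_match('/[^(22|76)]/', '25', $matches); // "5" is no longer in the set

var_dump($first);      // int(0)
var_dump($second);     // int(1)
var_dump($matches[0]); // string(1) "5"
```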
Difference between character classes and alternation
They look similar but they do different things. Alternation can be used when you want to match a single regular expression out of several possible regular expressions. Unlike character classes, alternation works with a regex. A simple string, say foo is also a valid regular expression. It matches f followed by o, followed by o.
Use a character class when you want to match one of the included characters. Use alternation when you want to match one of several strings.
How should you modify the regex to obtain correct results
Negate your preg_match() call and use the regex (22|75):
if (!preg_match('/(22|75)/', '25')) {
# code...
}
This is the easiest approach. If you want to achieve this directly using a regex, then you may want to use look-arounds.
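For the case where the call itself cannot be negated, a negative lookahead is one possible sketch. The constant names are made up for this example, and the second pattern assumes the excluded values are exactly 20 to 24 and 75 to 80, as listed in the question's edit.

```php
<?php
// (?!...) is a negative lookahead: it succeeds only if the enclosed pattern
// does NOT match at this position. Here it rejects the listed values.
const NOT_22_OR_75  = '/^(?!(?:22|75)$)\d{2}$/';
const NOT_IN_RANGES = '/^(?!(?:2[0-4]|7[5-9]|80)$)\d{2}$/';

var_dump(preg_match(NOT_22_OR_75, '25'));  // int(1): 25 is not excluded
var_dump(preg_match(NOT_22_OR_75, '22'));  // int(0): 22 is excluded
var_dump(preg_match(NOT_IN_RANGES, '80')); // int(0): 80 is excluded
var_dump(preg_match(NOT_IN_RANGES, '81')); // int(1): 81 is not excluded
```

With this form, preg_match() returns 1 for allowed two-digit values and 0 for excluded ones, so no negation of the call is needed.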
Alternative solution
If this is exactly what you're trying to do, then you don't need a regular expression at all. Leverage PHP's built-in functions for string manipulation! Not only will it be faster, it will be more readable too.
In this case, a simple in_array() should suffice:
if (!in_array('25', array('22', '75'))) {
# code ...
}

In a regular expression, [...] matches any single character listed inside the brackets.
To be more precise:
[^...] matches any single character not listed inside the brackets (^ negates the list).
Remove the [ and ] if you want to match a string that contains 22 or 76.

Your regex is asking "does the string contain a character that is not (, 2, 7, 5, | or )?"
This is obviously not what you want.
Try this:
if( !in_array("25", array("22","75")))

^ inside of [...] is a negation of a character list.
(22|76)
Negating a multi-character sequence in a regex is tricky and has no simple general solution.
But you could invert the return value of preg_match(), i.e.:
if(!preg_match('#22|76#', '25', $matches))
doSomething();

Matching ugly extra abbreviations and numbers in titles with PHP regex

I have to create a regex to match ugly abbreviations and numbers. These can be in one of the following "formats":
1) [any alphabet char length of 1 char][0-9]
2) [double][whitespace][2-3 length of any alphabet char]
I tried to match the double:
preg_match("/^-?(?:\d+|\d*\.\d+)$/", $source, $matches);
But I couldn't get it to match the following example: 1.1 AA My test title. What is wrong with my regex, and how can I add the other formats to it as well?

Your regex says: "start of string, followed by an optional -, followed by either at least one digit, or zero or more digits, a dot and at least one digit, followed by the end of the string".
So your regex could match, for example, 4.5, -.1 etc. This is exactly what you tell it to do.
Your test input string does not match because other characters follow the number 1.1, and the anchored pattern requires the string to end right after the number. Even if it did match, your "double" pattern is not the usual one.
For a double without scientific notation you usually use this regex :
[-+]?\b[0-9]+(\.[0-9]+)?\b
Now that we have this out of our way we need a whitespace \s and
[2-3 length of alphabet]
Now I have no idea what [2-3 length of alphabet] means but by combining the above you get a regex like this :
[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]
You can also place anchors ^$ if you want the string to match entirely :
^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]$
Feel free to ask if you are stuck! :)
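A runnable sketch of the combined pattern, under the assumption that "[2-3 length of alphabet]" means two or three ASCII letters, written as [a-zA-Z]{2,3}. The constant and function names are made up for this example, and only the start anchor is used because the title continues after the abbreviation.

```php
<?php
// Assumption: "[2-3 length of alphabet]" is read as [a-zA-Z]{2,3}.
// The trailing \b stops a longer word from matching on its first letters.
const DOUBLE_THEN_ABBREVIATION = '/^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[a-zA-Z]{2,3}\b/';

// Hypothetical helper: returns the leading "number + abbreviation" part, or null.
function leadingAbbreviation(string $title): ?string
{
    return preg_match(DOUBLE_THEN_ABBREVIATION, $title, $matches) === 1
        ? $matches[0]
        : null;
}

var_dump(leadingAbbreviation('1.1 AA My test title')); // the "1.1 AA" prefix
var_dump(leadingAbbreviation('1.1 ABCD title'));       // four letters, so no match
var_dump(leadingAbbreviation('My test title'));        // no number, so no match
```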

I see multiple issues with your regex:
You try to match the whole string (as a number) with the anchors: ^ at the beginning and $ at the end. If you don't want that, remove them.
The number group is non-capturing. It will be checked for matches, but those won't be added to $matches. That's because of the ?: at the start of (?:...). Remove ?: to make the group capturing.
You place the shorter digit pattern before the longer one. Without the anchors, the engine would take the first alternative that succeeds, so \d+ would win with just "1". If you swap the order, the engine tries the longer pattern first and prefers it on success.
Maybe this already solves your issue:
preg_match("/-?(\d*\.\d+|\d+)/", $source, $matches);
Demo
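The effect of the alternation order can be checked with a small PHP sketch using the sample title from the question. The variable names are chosen for this illustration only.

```php
<?php
$source = '1.1 AA My test title';

// Shorter alternative first: \d+ succeeds on "1" and the engine stops there.
preg_match('/-?(\d+|\d*\.\d+)/', $source, $shortFirst);

// Longer alternative first: \d*\.\d+ is tried first and consumes "1.1".
preg_match('/-?(\d*\.\d+|\d+)/', $source, $longFirst);

var_dump($shortFirst[1]); // string(1) "1"
var_dump($longFirst[1]);  // string(3) "1.1"
```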
