My pattern in preg_match is not working as I want - php

I want to get:
false for A3312+A192389+B2323+B948348
false for A6712+A1922389
false for A4512
true for A4552+B948348
(only one Aelement and one or more Belement)
I tried:
print_r(preg_match('/^A((?!\+A).)\+B(.*)$/', $string));

So it looks as if the basic pattern you're after is "A(digits)+B(digits)"
Your expression seems a bit over-complicated for that purpose, I'd simply use:
preg_match('/^A\d+(\+B\d+)+$/', $input, $match);
If the input can be alphanumeric (A(alnum)+B(alnum)), just use
preg_match('/^A[[:alnum:]]+(\+B[[:alnum:]]+)+$/', $input, $match);
instead. Note the double brackets: a POSIX class such as [:alnum:] is only valid inside a bracket expression.
Basically, the 2 absolute hard requirements are: the input string should start with an upper-case A, and there should be at least one + sign followed by an upper-case B. Whatever the characters in between should be, you just have to add a character group that best fits your requirements. From the examples you gave, \d+ (one or more digits) seems to fit the bill. If "A00FF33+B123ABC" should be valid, I'd either use [[:alnum:]] or [0-9A-F] (for hex values) instead.
The trick for the one-or-more requirement is to create a group for the +Belement part of the match, and repeat that group one or more times:
\+B\d+ //matches once
(\+B\d+)+ //matches once or more
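As a quick check, here is a minimal sketch that runs the digits-only pattern from this answer against the four sample strings from the question. The variable names ($pattern, $inputs, $results) are only illustrative.

```php
<?php
// Pattern from the answer above: one A element, then one or more +B elements.
$pattern = '/^A\d+(\+B\d+)+$/';

// Sample inputs taken from the question.
$inputs = [
    'A3312+A192389+B2323+B948348',
    'A6712+A1922389',
    'A4512',
    'A4552+B948348',
];

$results = [];
foreach ($inputs as $input) {
    // preg_match returns 1 on a match, 0 on no match (and false on error).
    $results[$input] = preg_match($pattern, $input) === 1;
    echo $input, ' => ', var_export($results[$input], true), PHP_EOL;
}
```

Only the last input should come out as true, which is the behaviour asked for in the question.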

This regex will work for you:
^A\d+\+B\d+(?:\+B\d+)*$
Use it as:
preg_match('/^A\d+\+B\d+(?:\+B\d+)*$/', $string);
This matches A followed by digits, then +B followed by digits, repeated 1 or more times.
A4552+B948348 is matched,
A4552+B948348+B948348+B948348 is matched,
A3312+A192389+B2323+B948348 is not matched

And if something other than digits can follow A/B (anything except a +):
^A[^+]+(?:\+B[^+]+)+$
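A small sketch of the "anything except +" variant, written as a single-quoted PHP pattern. The helper name isOneAThenBs is hypothetical and not part of any answer here.

```php
<?php
// Hypothetical helper: true when the input is one A element followed by
// one or more +B elements, where an element body is any run of non-plus chars.
function isOneAThenBs(string $input): bool
{
    return preg_match('/^A[^+]+(?:\+B[^+]+)+$/', $input) === 1;
}

var_dump(isOneAThenBs('A00FF33+B123ABC'));  // bool(true)
var_dump(isOneAThenBs('A00FF33+A12+B123')); // bool(false)
var_dump(isOneAThenBs('A00FF33'));          // bool(false)
```

Keep in mind that [^+] is very permissive: it also accepts letters such as A or B inside an element, so a stricter class like [0-9A-F] may be a better fit if the element format is known.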

Related

Sanitize phone number: regular expression to match all except the first occurrence, only if it is in the first position

Regarding this post, "https://stackoverflow.com/questions/35413960/regular-expression-match-all-except-first-occurence", I'm wondering how to keep the first occurrence in a string only if the string starts with a specific character, in PHP.
I would like to sanitize phone numbers. Example of a bad phone number:
+49+12423#23492#aosd#+dasd
Regex to remove all "+" except the first occurrence:
\G(?:\A[^\+]*\+)?+[^\+]*\K\+
Problem: the first "+" should be kept only if the string starts with it, not if its first occurrence is at a later position.
The regex to remove everything except numbers is easy:
[^0-9]*
But I don't know how to combine those two within one regex. I would just use preg_replace() twice.
Of course I would be able to use a workaround like if ($str[0] === '+') {...} but I prefer to learn some new stuff (regex :)
Thanks for helping.
You can use
(?:\G(?!\A)|^\+)[^+]*\K\+
See the regex demo. Details:
(?:\G(?!\A)|^\+) - either the end of the preceding successful match or a + at the start of string
[^+]* - zero or more chars other than +
\K - match reset operator discarding the text matched so far
\+ - a + char.
See the PHP demo:
$re = '/(?:\G(?!\A)|^\+)[^+]*\K\+/m';
$str = '+49+12423#23492#aosd#+dasd';
echo preg_replace($re, '', $str);
// => +4912423#23492#aosd#dasd
You seem to want to combine the two queries:
A regex to remove everything except numbers
A regex to remove all "+" except the first occurrence
Here is my two cents:
(?:^\+|\d)(*SKIP)(*F)|.
Replace what is matched with nothing. Here is an online demo
(?:^\+|\d) - A non-capture group to match a starting literal plus or any digit in the range from 0-9.
(*SKIP)(*F) - Consume the previous matched characters and fail them in the rest of the matching result.
| - Or:
. - Any single character other than newline.
I'd like to think that this is a slight adaptation of what some consider "The best regex trick ever" where one would first try to match what you don't want, then use an alternation to match what you do want. With the use of the backtracking control verbs (*SKIP)(*F) we reverse the logic. We first match what we do want, exclude it from the results and then match what we don't want.
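A minimal sketch of how the (*SKIP)(*F) pattern above could be used with a single preg_replace() call. The variable names and the second input string are only illustrative.

```php
<?php
// Keep a leading "+" and every digit; remove any other single character.
$re = '/(?:^\+|\d)(*SKIP)(*F)|./';

$withPlus = preg_replace($re, '', '+49+12423#23492#aosd#+dasd');
$withoutPlus = preg_replace($re, '', '49+12423#x');

echo $withPlus, PHP_EOL;    // +491242323492
echo $withoutPlus, PHP_EOL; // 4912423
```

In the second input the "+" is not at the start of the string, so it is removed like any other non-digit character.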

Regex to select a certain word followed by a integer or decimal?

I need a regular expression to detect the phrase Figure 1.5: in a given string. Also, I intend on using this expression in a PHP preg_replace() function.
Here are some more examples:
...are given. Figure 2.1: shows that...
...are given. Figure 3: shows that...
...are given. Figure 1.16: shows that...
...are given. Figure 0.4 shows that...
...are given. figure 5.1: shows that...
With my limited Regex knowledge, I was able to create this:
/\wFigure \d*\.?\d*/g
But that doesn't even begin to handle all of the permutations that could occur.
I would appreciate any suggestions that you might have.
There are several points here:
You are using \w at the start, perhaps, as a word boundary. In fact, \w matches a letter, digit or _ and actually requires this char to be at the exact location. However, there is no word char before Figure, so you need to either remove \w or replace with \b.
preg_replace replaces all non-overlapping occurrences by default, you do not need the g modifier
\d*\.?\d* is fine here, but since you want to match any digits followed with zero or more occurrences of . and digits you can use a more specific pattern like \d+(?:\.\d+)*.
You can use
preg_replace('/Figure \d+(?:\.\d+)*/', '', $string)
See the regex demo.
Details:
Figure - a string
(space) - a literal space (replace with \s+ to match one or more whitespace characters, and consider adding the u flag after the last / if you need to match all Unicode whitespace)
\d+ - one or more digits
(?:\.\d+)* - zero or more occurrences of . and one or more digits.
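The examples in the question also include a lower-case "figure" and an optional trailing colon, which the pattern above does not cover. A possible sketch, assuming both should be matched: it adds \b, the i flag and an optional :?, none of which come from the answer itself.

```php
<?php
// Sample lines taken from the question.
$lines = [
    '...are given. Figure 2.1: shows that...',
    '...are given. Figure 3: shows that...',
    '...are given. Figure 1.16: shows that...',
    '...are given. Figure 0.4 shows that...',
    '...are given. figure 5.1: shows that...',
];

// Assumed extension of the answer's pattern: word boundary, case-insensitive,
// one or more whitespace characters, optional colon at the end.
$pattern = '/\bFigure\s+\d+(?:\.\d+)*:?/i';

$found = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m) === 1) {
        $found[] = $m[0];
    }
}
print_r($found);
```

The same $pattern can be passed to preg_replace() to remove or rewrite the phrase instead of just collecting it.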

Regex to get the first number after a certain string followed by any data until the number

I have a piece of data, retrieved from the database, that contains information I need. The text is entered in free form, so it's written in many different ways. The only thing I know for sure is that I'm looking for the first number after a given string, but between that string and the number there can be any text as well.
I tried these (where mytoken is the string I know for sure is there), but none of them work.
/(mytoken|MYTOKEN)(.*)\d{1}/
/(mytoken|MYTOKEN)[a-zA-Z]+\d{1}/
/(mytoken|MYTOKEN)(.*)[0-9]/
/(mytoken|MYTOKEN)[a-zA-Z]+[0-9]/
Even mytoken can be written in capitals, lowercase or a mix of capitals and lowercase character. Can the expression be case insensitive?
You do not need any lazy matching since you want to match any number of non-digit symbols up to the first digit. It is better done with a \D*:
/(mytoken)(\D*)(\d+)/i
See the regex demo
The pattern details:
(mytoken) - Group 1 matching mytoken (case insensitively, as there is a /i modifier)
(\D*) - Group 2 matching zero or more characters other than a digit
(\d+) - Group 3 matching 1 or more digits.
Note that \D also matches newlines, whereas . needs the DOTALL (s) modifier to match across newlines.
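A short sketch of the \D* approach wrapped in a function. The name firstNumberAfter and the sample texts are made up for illustration; preg_quote() is added on the assumption that the token might contain regex metacharacters.

```php
<?php
// Hypothetical helper: returns the first run of digits after $token,
// or null when the token (or a number after it) is not found.
function firstNumberAfter(string $token, string $text): ?string
{
    // \D* skips any non-digit characters, including newlines.
    $pattern = '/' . preg_quote($token, '/') . '\D*(\d+)/i';
    return preg_match($pattern, $text, $m) === 1 ? $m[1] : null;
}

var_dump(firstNumberAfter('mytoken', 'Order MyToken is nr. 42, then 7')); // string(2) "42"
var_dump(firstNumberAfter('mytoken', "MYTOKEN:\nvalue 15"));              // string(2) "15"
var_dump(firstNumberAfter('mytoken', 'no token here 42'));                // NULL
```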
You need to use a lazy quantifier. You can do that by putting a question mark after the star quantifier in the regex: .*?. Otherwise, the numbers will be matched by the dot operator until the last number, which will be matched by \d.
Regex: /(mytoken|MYTOKEN)(.*?)\d/
You can use the opposite:
/(mytoken|MYTOKEN)(\D+)(\d)/
This says: mytoken, followed by anything not a number, followed by a number. The (lazy) dot-star-soup is not always your best bet. The desired number will be in $3 in this example.

PHP regex replacement doesn't match

I'm using this regex to get the house number of a street address.
[a-zA-ZßäöüÄÖÜ .]*(?=[0-9])
Usually, the street is something like "Ohmstraße 2a". At regexpal.com my pattern matches, but I guess the regex engine behind preg_replace() isn't identical to the one used there.
$num = preg_replace("/[a-zA-ZßäöüÄÖÜ .]*(?=[0-9])/", "", $num);
Update:
It seems that my pattern matches, but I've got some encoding problems with the special chars like äöü
Update #2:
Turns out to be an encoding problem with mysqli.
First of all if you want to get the house number then you should not replace it. So instead of preg_replace use preg_match.
I modified your regex a little bit to match better:
$street = 'Öhmsträße 2a';
if (preg_match('/\s+(\d+[a-z]?)$/i', trim($street), $matches) === 1) {
    // $matches[1] holds the house number, e.g. "2a"
    var_dump($matches);
} else {
    echo 'no house number';
}
\s+ matches one or more space chars (blanks, tabs, etc.)
(...) defines a capture group which can be accessed in $matches
\d+ matches one or more digits (2, 23, 235, ...)
[a-z] matches one character from a to z
? means it's optional (not every house number has a letter in it)
$ means end of string, so it makes sure the house number is at the end of the string
Make sure you strip any spaces after the end of the house number with trim().
The u modifier can help sometimes for handling "extra" characters.
I feel this may be a character set or UTF-8 issue.
It would be a good idea to find out what version of PHP you're running too. If I recall correctly, full Unicode support came in around 5.1.x

Matching ugly extra abbreviations and numbers in titles with PHP regex

I have to create regex to match ugly abbreviations and numbers. These can be one of following "formats":
1) [any alphabet char length of 1 char][0-9]
2) [double][whitespace][2-3 length of any alphabet char]
I tried to match double:
preg_match("/^-?(?:\d+|\d*\.\d+)$/", $source, $matches);
But I couldn't get it to match the following example: 1.1 AA My test title. What is wrong with my regex, and how can I add the other formats to it too?
In your regex you say: "start of string, followed by maybe a -, followed by either at least one digit, or zero or more digits, a dot and at least one digit, followed by the end of string."
So your regex could match, for example, 4.5, -.1, etc. This is exactly what you tell it to do.
Your test input string does not match since there are other characters present after the number 1.1, and even if it somehow magically matched, your "double" matching regex is wrong.
For a double without scientific notation you usually use this regex :
[-+]?\b[0-9]+(\.[0-9]+)?\b
Now that we have this out of our way we need a whitespace \s and
[2-3 length of alphabet]
Now I have no idea what [2-3 length of alphabet] means but by combining the above you get a regex like this :
[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]
You can also place anchors ^$ if you want the string to match entirely :
^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]$
Feel free to ask if you are stuck! :)
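A sketch of the combined pattern with the placeholder filled in, assuming "[2-3 length of alphabet]" means two or three ASCII letters, i.e. [A-Za-z]{2,3}. That reading is a guess, not something confirmed in the question. The pattern is anchored only at the start, because the actual title follows the abbreviation.

```php
<?php
// Assumption: the abbreviation is 2-3 ASCII letters followed by a word boundary.
$pattern = '/^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[A-Za-z]{2,3}\b/';
$title = '1.1 AA My test title';

$prefix = null;
$rest = $title;
if (preg_match($pattern, $title, $m) === 1) {
    $prefix = $m[0];                                  // the "ugly" part
    $rest = ltrim(substr($title, strlen($prefix)));   // the remaining title
}

echo 'prefix: ', var_export($prefix, true), PHP_EOL; // prefix: '1.1 AA'
echo 'rest: ', $rest, PHP_EOL;                       // rest: My test title
```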
I see multiple issues with your regex:
You try to match the whole string (as a number) by the anchors: ^ at the beginning and $ at the end. If you don't want that, remove those.
The number group is non-capturing. It will be checked for matches, but those won't be added to $matches. That's because of the ?: at the start of (?:...). Remove ?: to make that group capturing.
You place the shorter digit-pattern before the longer one. If you swap the order, the regex engine will look for it first and on success prefer it over the shorter one.
Maybe this already solves your issue:
preg_match("/-?(\d*\.\d+|\d+)/", $source, $matches);
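To see why the order of the alternatives matters once the anchors are gone, here is a small comparison sketch on the example input. The variable names are illustrative.

```php
<?php
$source = '1.1 AA My test title';

// Shorter alternative first: \d+ succeeds on "1" and the engine stops there.
preg_match('/-?(\d+|\d*\.\d+)/', $source, $shortFirst);

// Longer alternative first: \d*\.\d+ is tried first and consumes "1.1".
preg_match('/-?(\d*\.\d+|\d+)/', $source, $longFirst);

echo $shortFirst[1], PHP_EOL; // 1
echo $longFirst[1], PHP_EOL;  // 1.1
```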
