PHP lookahead assertion at the end of the regex - php

I want to write a regex with assertions to extract the number 55 from the string unknownstring/55.1. Here is my regex:
$str = 'unknownstring/55.1';
preg_match('/(?<=\/)\d+(?=\.1)$/', $str, $match);
So basically I am trying to say: give me the number that comes after a slash and is followed by a dot and the number 1, with no characters after that. But the regex does not match. When I removed the $ sign from the end, it matched. But that condition is essential, as I need that to be the end of the string, because the unknownstring part can contain similar text, e.g. unknow/545.1nstring/55.1. Perhaps I could use preg_match_all and take the last match, but I want to understand why the first regex does not work and where my mistake is.
Thanks

Use anchor $ inside lookahead:
(?<=\/)\d+(?=\.1$)
You cannot use $ outside the positive lookahead because a lookahead is zero-width: after \d+ matches 55, the engine is still positioned right before .1, so your number is NOT at the end of input and $ cannot match there.
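To make the difference concrete, here is a minimal sketch that runs both patterns against the trickier input from the question. The variable names are only illustrative.

```php
<?php
// Input taken from the question's second example.
$str = 'unknow/545.1nstring/55.1';

// $ outside the lookahead: after \d+ the engine sits before ".1",
// which is not the end of the string, so this never matches.
$outside = preg_match('/(?<=\/)\d+(?=\.1)$/', $str, $m1);

// $ inside the lookahead: ".1" must be followed by the end of the string.
$inside = preg_match('/(?<=\/)\d+(?=\.1$)/', $str, $m2);

var_dump($outside); // int(0)
echo $m2[0], "\n";  // 55
```

The earlier /545.1 is skipped because its .1 is not followed by the end of the string.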

Related

Sanitize phone number: regular expression to match all except the first occurrence when it is in the first position

Regarding this post, "https://stackoverflow.com/questions/35413960/regular-expression-match-all-except-first-occurence", I'm wondering how to find the first occurrence in a string only if the string starts with a specific character in PHP.
I would like to sanitize phonenumbers. Example bad phone number:
+49+12423#23492#aosd#+dasd
Regex to remove all "+" except the first occurrence:
\G(?:\A[^\+]*\+)?+[^\+]*\K\+
Problem: it should remove every other "+" only if the string starts with "+", not if the first occurrence is at a position greater than 1.
The regex to remove everything except numbers is easy:
[^0-9]*
But I don't know how to combine those two within one regex. I would just use preg_replace() twice.
Of course I would be able to use a workaround like if ($str[0] === '+') {...} but I prefer to learn some new stuff (regex :)
Thanks for helping.
You can use
(?:\G(?!\A)|^\+)[^+]*\K\+
See the regex demo. Details:
(?:\G(?!\A)|^\+) - either the end of the preceding successful match or a + at the start of string
[^+]* - zero or more chars other than +
\K - match reset operator discarding the text matched so far
\+ - a + char.
See the PHP demo:
$re = '/(?:\G(?!\A)|^\+)[^+]*\K\+/m';
$str = '+49+12423#23492#aosd#+dasd';
echo preg_replace($re, '', $str);
// => +4912423#23492#aosd#dasd
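As a small sketch of how the \G-based pattern behaves on both kinds of input, the snippet below wraps it in a helper. The function name stripExtraPlus and the second sample number are made up for illustration; the pattern itself is the one shown above.

```php
<?php
// Hypothetical helper; the pattern is copied from the answer.
function stripExtraPlus(string $phone): string
{
    // Removes every "+" after a leading "+"; does nothing when the
    // string does not start with "+", because \G(?!\A) can only
    // continue from a previous successful match.
    return preg_replace('/(?:\G(?!\A)|^\+)[^+]*\K\+/m', '', $phone);
}

echo stripExtraPlus('+49+12423#23492#aosd#+dasd'), "\n"; // +4912423#23492#aosd#dasd
echo stripExtraPlus('49+12423+77'), "\n";                // 49+12423+77
```

The second call leaves the string untouched, which is the "only if it starts with +" behaviour the question asks for.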
You seem to want to combine the two queries:
A regex to remove everything except numbers
A regex to remove all "+" except the first occurrence
Here is my two cents:
(?:^\+|\d)(*SKIP)(*F)|.
Replace what is matched with nothing. Here is an online demo
(?:^\+|\d) - A non-capture group to match a starting literal plus or any digit in the range from 0-9.
(*SKIP)(*F) - Consume the previously matched characters and force a failure, so they are excluded from the match results.
| - Or:
. - Any single character other than newline.
I'd like to think that this is a slight adaptation of what some consider "The best regex trick ever" where one would first try to match what you don't want, then use an alternation to match what you do want. With the use of the backtracking control verbs (*SKIP)(*F) we reverse the logic. We first match what we do want, exclude it from the results and then match what we don't want.
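A minimal sketch of the single-pass approach, assuming the goal is to keep only digits plus an optional leading +. The helper name sanitizePhone and the second input are hypothetical.

```php
<?php
// Hypothetical helper; the pattern is the (*SKIP)(*F) one from the answer.
function sanitizePhone(string $phone): string
{
    // A leading "+" and every digit are skipped (kept);
    // any other single character is matched and replaced with nothing.
    return preg_replace('/(?:^\+|\d)(*SKIP)(*F)|./', '', $phone);
}

echo sanitizePhone('+49+12423#23492#aosd#+dasd'), "\n"; // +491242323492
echo sanitizePhone('49+123'), "\n";                     // 49123
```

Note that . does not match a newline by default, so input containing line breaks would need the s modifier.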

PHP regex that matches substring followed by any length of character and then comma

I have a long string containing Copyright: 'any length of unknown string here',
What regex should I write to match exactly this substring within the string?
I tried preg_replace('/Copyright:(.*?)/', 'mytext', $str); but it's not working; it only matches the Copyright: part.
A lazily quantified subpattern at the end of the pattern will always match no text in the case of *? and only 1 char in the case of +?, i.e. it will match as few chars as possible to return a valid match.
You need to make sure you get to the ', by putting those characters into the pattern:
'/Copyright:.*?\',/'
               ^^^
See the regex demo
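The effect of a trailing lazy quantifier can be seen in a short sketch. The sample text is invented; only the Copyright: '...', shape comes from the question.

```php
<?php
// Hypothetical sample text.
$str = "Header Copyright: 'Example Corp 2014', footer";

// Lazy .*? at the very end of the pattern matches nothing at all.
preg_match('/Copyright:(.*?)/', $str, $m);
var_dump($m[0]); // string(10) "Copyright:"

// With the closing quote and comma in the pattern, .*? has to expand
// until it reaches them.
$replaced = preg_replace('/Copyright:.*?\',/', 'mytext', $str);
echo $replaced, "\n"; // Header mytext footer
```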
The ? in your group 1 (.*?) makes this block lazy, i.e. matching as few characters as possible. Removing that would solve it.
Copyright:(.*)',
However, that would match everything up to the last ', in that same line. If you have more text in that same line, make sure to limit it further. The parentheses () are only there for grouping, to make the result easier to inspect; you can do without them.
I usually use Regxr.com to test my regular expressions. There are many other similar tools online; note that this one has great UX but does not support lookbehind.
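The caveat about the greedy version shows up as soon as the line contains a second ', sequence. The following sketch uses an invented input to compare the two quantifiers.

```php
<?php
// Hypothetical line with two "'," sequences.
$str = "Copyright: 'A', other 'B', tail";

// Greedy: .* runs to the end of the line and backtracks to the LAST "',".
preg_match('/Copyright:(.*)\',/', $str, $greedy);

// Lazy: .*? stops at the FIRST "',".
preg_match('/Copyright:(.*?)\',/', $str, $lazy);

echo $greedy[0], "\n"; // Copyright: 'A', other 'B',
echo $lazy[0], "\n";   // Copyright: 'A',
```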

PHP RegEx get first letter after set of characters

I have some text where each line starts with a set of letters.
I need to get the first single digit after that set of letters.
Example text:
ABC105001
ABC205001
ABC305001
ABCD105001
ABCD205001
ABCD305001
My RegEx:
^(\D*)(\d{1})(?=\d*$)
Link: http://www.regexr.com/390gv
As you can see, the RegEx works OK, but it also captures the first group in the results. I need to get only this integer, and when I try to put ?= in the first group like this: ^(?=\D*)(\d{1})(?=\d*$) , the RegEx doesn't work.
Any ideas?
Thanks in advance.
(?=...) is a lookahead that means followed by and checks the string on the right of the current position.
(?<=...) is a lookbehind that means preceded by and checks the string on the left of the current position.
What is interesting about these two features is that the content matched inside them is not part of the whole match result. The only problem is that a lookbehind can't match variable-length content.
A way to avoid the problem is to use the \K feature, which removes everything on its left from the match result:
^[A-Z]+\K\d(?=\d*$)
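A quick sketch of the \K pattern applied to a few of the sample lines from the question; the loop and variable names are only illustrative.

```php
<?php
$lines = ['ABC105001', 'ABC205001', 'ABCD305001'];

$digits = [];
foreach ($lines as $line) {
    // Everything matched before \K is dropped from the result,
    // so $m[0] holds only the first digit after the letters.
    if (preg_match('/^[A-Z]+\K\d(?=\d*$)/', $line, $m)) {
        $digits[] = $m[0];
    }
}

echo implode(',', $digits), "\n"; // 1,2,3
```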
You're trying to use a positive lookahead when really you want to use non-capturing groups.
The one match you want will work with this regex:
^(?:\D*)(\d{1})\d*$
The (?: string starts a non-capturing group. Its contents will not come back as a separate entry in the matches.
So, if you used preg_match(';^(?:\D*)(\d{1})\d*$;', $string, $matches) to find your match, $matches[1] would be the string you're looking for. (This is because $matches[0] will always be the full match from preg_match.)
try:
^(?:\D*)(\d{1})(?=\d*$) // (?: is the beginning of a no capture group
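For comparison with the \K approach, here is a small sketch of the capture-group variant on one of the sample lines. It shows that the full match still contains the letters, while group 1 holds only the digit.

```php
<?php
$line = 'ABCD205001';

// (?:\D*) matches the letters without creating a group;
// (\d{1}) is therefore group 1.
preg_match('/^(?:\D*)(\d{1})(?=\d*$)/', $line, $m);

echo $m[0], "\n"; // ABCD2
echo $m[1], "\n"; // 2
```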

Parsing from:x; but not lfrom:x;

I am trying to parse a string with something like :
preg_match( "|from:(.*?);|", $string, $match);
But then I found that the string can also contain lfrom: and _from:
A few examples of how the string can be:
var1:34234;from:website1.com;lfrom:website2.com;var2:343423;
lfrom:website1.com;var1:4234234;from:website2.com
from:website1.com;_from:website2.com;lfrom:website2.com;var1:43523;
How can I parse only from:(.*?); and not lfrom, _from, etc.
I was going to just give you the solution, but I'd better explain lookbehind assertions first.
In a regex, each time you match a character, an h for example, the engine's pointer into the string advances by 1. Here you don't want to advance the pointer: you just want to check whether from is preceded by ;, whitespace, or the start of the string. You don't want to match the empty string on its own either, because there are empty positions everywhere!
So, an example: (?<=a)b matches a b that has an a before it. When a b is found, the engine looks behind it, and if there is an a, the regex matches.
So... (?<=[;\s\b]|^)from:(\w+\.\w+) would match a from that has [;\s\b] OR ^ (the string start) right before it. Note that inside a character class \b means a backspace character, not a word boundary.
Pretty easy, huh!?
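To see that a lookbehind such as (?<=a)b checks the preceding character without consuming it, the following sketch records match offsets on a made-up string.

```php
<?php
// Hypothetical input: only the b's at offsets 1 and 7 are preceded by an a.
$s = 'ab cb ab';

preg_match_all('/(?<=a)b/', $s, $m, PREG_OFFSET_CAPTURE);

$offsets = [];
foreach ($m[0] as [$text, $offset]) {
    // $text is always just "b": the a is checked, not matched.
    $offsets[] = $offset;
}

echo implode(',', $offsets), "\n"; // 1,7
```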
You could either use an assertion:
|(?<!l)from:(.*?);|
Or look for the preceding ; or line start:
/(;|^)from:(.*?);/m
It might also be a good idea to replace the generic .*? match with [^;]*
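Assuming from is preceded by whitespace or a ;
/[\s\b;]from:([^;]+);/
This will only match from preceded by whitespace, a backspace character, or ; (inside a character class, \b means backspace rather than a word boundary), so it will not match a from at the very start of the string. I also prefer to narrow captures, i.e. [^;]+ vs. .*?.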
There is a concept called (negative) lookbehind, which asserts that your current position is (not) preceded by certain things. I guess in this case I would go with a positive lookbehind, and assert that from is preceded by the start of the string, a line break or a ;:
preg_match('/(?<=^|;)from:(.*?);/m', $string, $match);
Make sure to use multi-line mode m, so that ^ will also match at the start of each line and not just at the start of the string. Also note the / delimiter: with | as the delimiter, the unescaped | inside the pattern would end the pattern early.
If you only wanted to exclude l and _ in front of from but accept any other characters, then a negative lookbehind might be what you are looking for:
preg_match('|(?<![l_])from:(.*?);|m', $string, $match);
The convenient thing about lookbehinds is, that they are not included in the actual match. They just check what's there without actually consuming it. Here is some reading.
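As a rough sketch of the lookbehind approach, the snippet below runs the three sample strings from the question through a positive lookbehind. The helper name extractFrom is made up for illustration. It also uses [^;]* without a required closing ; so that a trailing from: entry (as in the second sample) is still captured; that is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper: returns the value of the from: field, or null.
function extractFrom(string $s): ?string
{
    // from must be preceded by a line start or a ";",
    // which rules out lfrom: and _from:.
    if (preg_match('/(?<=^|;)from:([^;]*)/m', $s, $m)) {
        return $m[1];
    }
    return null;
}

echo extractFrom('var1:34234;from:website1.com;lfrom:website2.com;var2:343423;'), "\n"; // website1.com
echo extractFrom('lfrom:website1.com;var1:4234234;from:website2.com'), "\n";            // website2.com
echo extractFrom('from:website1.com;_from:website2.com;lfrom:website2.com;var1:43523;'), "\n"; // website1.com
```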

Regular Expression get part of string

How can I get only the text inside "()"
For example from "(en) English" I want only the "en".
I've written this pattern "/\(.[a-z]+\)/i" but it also gets the "()";
Thanks in advance.
<?php
$string = '(en) English';
preg_match('#\((.*?)\)#is', $string, $matches);
echo $matches[1]; # en
?>
$matches[0] will contain the entire matched string; $matches[1] will contain the first group, in this case (.*?) between ( and ).
First, what is the dot in your regex good for? I assume it's there by mistake.
Second, to give you an alternative to the capturing group answer (which is perfectly fine!), here is a solution using lookbehind and lookahead.
(?<=\()[a-z]+(?=\))
See it here on Regexr
The trick here is, those lookarounds do not match the characters inside, they just check if they are there. So those characters are not included in the result.
(?<=\() positive look behind assertion, checking for the character ( before its position
(?=\)) positive look ahead assertion, checking for the character ) ahead of its position
That should do the job.
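A minimal sketch of the lookaround version in PHP, using preg_match_all on an invented string with two parenthesised codes. The i modifier is an assumption carried over from the question's pattern.

```php
<?php
// Hypothetical input with two parenthesised codes.
$string = '(en) English, (de) German';

// The parentheses are only asserted by the lookarounds,
// so they are not part of the matches.
preg_match_all('/(?<=\()[a-z]+(?=\))/i', $string, $matches);

echo implode(', ', $matches[0]), "\n"; // en, de
```

Here $matches[0] already holds the bare codes, so no capture group index is needed.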
"/\(([a-z]+)\)/i"
The easiest way is to get "/\(([a-z]+)\)/i" and use the capture group to get what you want.
Otherwise, you have to get into look ahead, look behinds
You could use a capture group like everyone else proposes
OR
you can make your pattern check only that the match is preceded by "(" and followed by ")". This is called lookbehind and lookahead.
"/(?<=\()[a-z]+(?=\))/i"
