RegEx - Positive Lookbehind Problem - php

How do I use a positive look-behind to match more than 1 occurrence using a greedy + ?
This works:
(?<=\w)\w+
But I need to match all \w similar to:
(?<=\w+)\w+
The syntax is wrong in the second example and it does not work.
How do I make a positive lookbehind match multiple occurrences?

A very dirty way to do it is to reverse the string and use a positive lookahead instead. This is a trick that I use in Javascript (no lookbehinds supported there :( ).
So you must do something like:
$string = 'This is a long string that must show this is what will happen';
$str_rev = strrev($string);
if (preg_match('!(si)(?=\w+)(\w+)!i', $str_rev, $matches)) {
    print_r($matches);
}
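The code above matches "si" in the reversed string, that is, the "is" inside an occurrence of "this" in the original string (the i flag makes the match case-insensitive). The second \w+ is just to show where it matched and is not needed in your example.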
Keep in mind that this technique is possible only if you use only one direction for greediness for the Lookbehind/aheads (e.g. you can't use a lookbehind with \w+ together with a lookahead with \w+ )

You probably just want to match it without any lookbehinds and then use your capturing groups:
if (preg_match('~[abc]+([cde]+)~', $string, $matches)) {
    echo $matches[1]; // will contain the [cde]+ part
}

Sorry to say, but no variable-length quantifiers (such as + or *) in lookbehinds!
I found this in perlretut:
Lookahead (?=regexp) can match arbitrary regexps, but lookbehind (?<=fixed-regexp) only works for regexps of fixed width
I assume that this is also valid for the php regex engine.
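One way around the fixed-width restriction in PHP is \K, which drops everything matched so far from the reported match, so a variable-length prefix can sit in front of it. The following is a minimal sketch, not a drop-in replacement for a lookbehind: the helper name valueAfterLabel() and the "price:" label are made up for illustration.

```php
<?php
// Hypothetical helper: return the digits that follow "price:" and any
// amount of whitespace, or null when the label is not found.
function valueAfterLabel(string $input): ?string
{
    // \K resets the start of the reported match, so the variable-length
    // prefix "price:\s+" is required but not included in $m[0].
    if (preg_match('/price:\s+\K\d+/', $input, $m) === 1) {
        return $m[0];
    }
    return null;
}

var_dump(valueAfterLabel('price:    42 USD'));
```

Unlike a true lookbehind, the prefix before \K is consumed by the match, so consecutive matches cannot share it.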

Related

Regex to only match two asterisks

I'm sorry if the question is unclear
I am trying to make a regular expression that replaces everything with ** at the beginning and end with "Test" (for now at least.)
Currently this is my pattern:
\*{2}[\w\s]+\*{2}
This works so that strings like **Car**, **123**, **This is a test** get replaced with "Test", except also for example ***Bird*** becomes *Test*.
So my question is if there is a way to make sure strings only get replaced with "Test" when there's exactly two ** at beginning and end, no more (so ***Bird*** stays ***Bird*** and doesn't get replaced).
In my opinion, you can have a lazy regex that does match the *-chars-* pattern in a way where it doesn't bother about how many * are there before and after.
Use preg_replace_callback to check with the captured groups and return Test accordingly if only 2 * before and after meet this condition. This way, your code is much more readable and simple.
Snippet:
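<?php
$text = '**Car** and ***Bird***'; // sample input, for illustration only
$newText = preg_replace_callback(
    '/([*]+)[^*]+([*]+)/',
    function ($matches) {
        // Replace only when both runs of stars are exactly two long.
        return strlen($matches[1]) == 2 && strlen($matches[2]) == 2 ? 'Test' : $matches[0];
    },
    $text
);
echo $newText; // Test and ***Bird***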
If you wish to keep the text inside ** as is and make it bold, you can capture it in a group and surround it with bold tags.
Snippet:
<?php
$newText = preg_replace_callback(
    '/([*]+)([^*]+)([*]+)/',
    function ($matches) {
        return strlen($matches[1]) == 2 && strlen($matches[3]) == 2 ? '<b>' . $matches[2] . '</b>' : $matches[0];
    },
    $text
);
You can do it with a handful of zero-length assertions. This is the regex that I suggest: (?<!\*)\*{2}(?!\*).*?(?<!\*)\*{2}(?!\*)
Explanation:
(?<!\*) A negative lookbehind: the match must not be preceded with a star character. It can be preceded with any other character, as well as with the line start. For the record, ^ is a well-known zero-length assertion.
\*{2} - matches two stars
(?!\*) - negative lookahead. This means that the next character must not be a star. However, this is a zero-length assertion, so the next character will not be matched.
.*? - everything else. The ? after the * makes the match non-greedy, so it stops at the first valid closing pair instead of running on to a later one on the same line. You can also group this if you want to do something with the match later.
(?<!\*) - negative lookbehind - another zero-length assertion. It specifies that the character before the closing stars must not be a star.
\*{2} - two stars, to close the match
(?!\*) - A negative lookahead: the match must not be followed by a star. It can be any other character, as well as the end of line. Btw, $ is a well-known zero-length assertion.
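To see the assertions working together, here is a small sketch that applies the pattern (with a single lookbehind before the closing stars) through preg_replace. The function name replaceDoubleStarred() and the sample text are assumptions for illustration.

```php
<?php
// Hypothetical helper: replace spans that are delimited by exactly two
// stars on each side; runs of three or more stars are left untouched.
function replaceDoubleStarred(string $text): string
{
    $pattern = '/(?<!\*)\*{2}(?!\*).*?(?<!\*)\*{2}(?!\*)/';
    // preg_replace returns null on error; fall back to the input then.
    return preg_replace($pattern, 'Test', $text) ?? $text;
}

echo replaceDoubleStarred('**Car** and ***Bird*** and **This is a test**'), "\n";
// Test and ***Bird*** and Test
```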

Regex negative lookahead to match url

I'm trying to match urls
/api/v1/users...
/api/v1/other_stuff...
Except for
/api/v1/users/invitation_register
I've been trying to use negative lookbehind
^\/api.*(?<!\binvitation_register\b)
and several similar constructs and have no idea how to actually do this.
Any help would be more than welcome.
You can use this negative lookahead instead of lookbehind:
^\/api\/(?!v1\/users\/invitation_register\b).*
(?!v1\/users\/invitation_register\b) is a negative lookahead that asserts that /api/ is not immediately followed by v1/users/invitation_register.
If your intended match is always starting with /api/v1/... then you can use:
^\/api\/v1\/(?!users\/invitation_register\b).*
which asserts that /api/v1/ is not immediately followed by users/invitation_register.
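A quick way to check the behaviour in PHP is sketched below. The helper name isAllowedApiPath() and the sample paths are made up; the pattern is the second one above, written with ~ as the delimiter so the slashes need no escaping.

```php
<?php
// Hypothetical helper: true for /api/v1/... paths, except those that
// continue with users/invitation_register.
function isAllowedApiPath(string $path): bool
{
    return preg_match('~^/api/v1/(?!users/invitation_register\b).*~', $path) === 1;
}

foreach (['/api/v1/users/42', '/api/v1/other_stuff', '/api/v1/users/invitation_register'] as $path) {
    printf("%s => %s\n", $path, isAllowedApiPath($path) ? 'match' : 'no match');
}
```

Because of the \b, a path such as /api/v1/users/invitation_registered still matches: the word continues past "register", so the lookahead's subpattern does not apply there.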

Matching something with regex in a string and removing / cutting out everything that did not match

I am wondering how to solve this. Let's say I have a string looking like this:
xx-123-456-12-xxl-1235-6122
I also have a regex that will try to match anything that looks like this
[LETTER][LETTER]-[NUMBER][NUMBER][NUMBER]-[NUMBER][NUMBER][NUMBER]
meaning in the string above it would match this:
xx-123-456
How do I go about cutting everything else out of that string that did not match the regular expression? Meaning that everything after xx-123-456 should be cut out and removed. This would need to work no matter where in the string the regex finds the match.
Any ideas / solutions?
This will work:
$txt = 'xx-123-456-12-xxl-1235-6122';
preg_match( '/^[a-z]{2}-\d{3}-\d{3}/i', $txt, $matches );
echo $matches[0];
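^ = beginning of the string (drop it if the match can start anywhere in the string);
[a-z] = any character from a through z (the i flag also allows A through Z);
{2} = repeat the previous pattern exactly 2 times;
\d = any digit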
There are several ways to do this in php:
1. Use preg_match to match what you want and print matched array element
2. Use preg_replace and use captured group to use a back-reference in replacement.
3. Use preg_replace and use lookbehind assertion
4. Use preg_replace and use \K (match reset)
Here is one approach using #4:
$str = 'xx-123-456-12-xxl-1235-6122';
$str = preg_replace('/^\p{L}{2}-\p{N}{3}-\p{N}{3}\K.*$/u', '', $str);
//=> xx-123-456
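For the case where the match can sit anywhere in the string, the preg_match approach can be sketched without the ^ anchor, keeping only $matches[0]. The helper name extractCode() and the second sample string are assumptions for illustration.

```php
<?php
// Hypothetical helper: return the first LL-NNN-NNN sequence found
// anywhere in the input, or null when there is none.
function extractCode(string $input): ?string
{
    // No ^ anchor, so the pattern may match at any position.
    if (preg_match('/[a-z]{2}-\d{3}-\d{3}/i', $input, $m) === 1) {
        return $m[0]; // everything outside the match is simply not returned
    }
    return null;
}

var_dump(extractCode('xx-123-456-12-xxl-1235-6122'));
var_dump(extractCode('id: ab-987-654 trailing'));
```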

Php lookahead assertion at the end of the regex

I want to write a regex with assertions to extract the number 55 from string unknownstring/55.1, here is my regex
$str = 'unknownstring/55.1';
preg_match('/(?<=\/)\d+(?=\.1)$/', $str, $match);
so, basically I am trying to say give me the number that comes after slash, and is followed by a dot and number 1, and after that there are no characters. But it does not match the regex. I just tried to remove the $ sign from the end and it matched. But that condition is essential, as I need that to be the end of the string, because the unknownstring part can contain similar text, e.g. unknow/545.1nstring/55.1. Perhaps I can use preg_match_all and take the last match, but I want to understand why the first regex does not work and where my mistake is.
Thanks
Use anchor $ inside lookahead:
(?<=\/)\d+(?=\.1$)
You cannot use $ outside the positive lookahead because a lookahead consumes no characters: after (?=\.1) succeeds, the match position is still right after the number, which is NOT at the end of input since \.1 follows it.
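The difference between the two patterns can be checked with a short sketch. The helper name trailingNumber() is made up; the sample strings come from the question.

```php
<?php
// Hypothetical helper: return the number that follows a slash and is
// followed by ".1" at the very end of the string, or null otherwise.
function trailingNumber(string $input): ?string
{
    // The $ anchor sits inside the lookahead, next to the text it anchors.
    if (preg_match('/(?<=\/)\d+(?=\.1$)/', $input, $m) === 1) {
        return $m[0];
    }
    return null;
}

var_dump(trailingNumber('unknow/545.1nstring/55.1'));

// The original pattern never matches: right after the digits the engine
// would need both ".1" ahead and the end of the string at the same spot.
var_dump(preg_match('/(?<=\/)\d+(?=\.1)$/', 'unknownstring/55.1'));
```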

Regular Expression get part of string

How can I get only the text inside "()"
For example from "(en) English" I want only the "en".
I've written this pattern "/\(.[a-z]+\)/i" but it also gets the "()";
Thanks in advance.
<?php
$string = '(en) English';
preg_match('#\((.*?)\)#is', $string, $matches);
echo $matches[1]; # en
?>
$matches[0] will contain the entire matched string; $matches[1] will contain the first group, in this case the (.*?) between ( and ).
What is the dot in your regex good for? I assume it's there by mistake.
Second, to give you an alternative to the capturing group answer (which is perfectly fine!), here is a solution using lookbehind and lookahead.
(?<=\()[a-z]+(?=\))
The trick here is, those lookarounds do not match the characters inside, they just check if they are there. So those characters are not included in the result.
(?<=\() positive look behind assertion, checking for the character ( before its position
(?=\)) positive look ahead assertion, checking for the character ) ahead of its position
That should do the job.
"/\(([a-z]+)\)/i"
The easiest way is to get "/\(([a-z]+)\)/i" and use the capture group to get what you want.
Otherwise, you have to get into look ahead, look behinds
You could use a capture group like everyone else proposes
OR
you can make your pattern only check that the match is preceded by "(" and followed by ")". These are called lookbehind and lookahead.
"/(?<=\()[a-z]+(?=\))/i"
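Both approaches can be compared side by side. This is a minimal sketch with made-up helper names (insideParensCapture() and insideParensLookaround()); the lookaround pattern is used without the stray dot.

```php
<?php
// Hypothetical helper using a capture group: the parentheses are part of
// the match, but group 1 holds only the letters between them.
function insideParensCapture(string $input): ?string
{
    return preg_match('/\(([a-z]+)\)/i', $input, $m) === 1 ? $m[1] : null;
}

// Hypothetical helper using lookarounds: the parentheses are only
// checked, so the whole match ($m[0]) is already the text between them.
function insideParensLookaround(string $input): ?string
{
    return preg_match('/(?<=\()[a-z]+(?=\))/i', $input, $m) === 1 ? $m[0] : null;
}

var_dump(insideParensCapture('(en) English'));
var_dump(insideParensLookaround('(en) English'));
```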
