Regular Expression get part of string - php

How can I get only the text inside "()"?
For example, from "(en) English" I want only the "en".
I've written this pattern "/\(.[a-z]+\)/i" but it also gets the "()".
Thanks in advance.

<?php
$string = '(en) English';
preg_match('#\((.*?)\)#is', $string, $matches);
echo $matches[1]; # en
?>
$matches[0] will contain the entire matched string; $matches[1] will contain the first group, in this case (.*?) between ( and ).

What is the dot in your regex good for? I assume it's there by mistake.
Second, to give you an alternative to the capturing group answer (which is perfectly fine!), here is a solution using lookbehind and lookahead.
(?<=\()[a-z]+(?=\))
See it here on Regexr
The trick here is, those lookarounds do not match the characters inside, they just check if they are there. So those characters are not included in the result.
(?<=\() is a positive lookbehind assertion, checking for the character ( before its position
(?=\)) is a positive lookahead assertion, checking for the character ) ahead of its position
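As a rough sketch of how that lookaround pattern behaves in PHP (the helper name text_in_parens is made up for illustration and is not part of the question), the whole match in $m[0] is already the inner text, so no capture group is needed:

```php
<?php
// Hypothetical helper: returns the letters inside the first "(...)" or null.
function text_in_parens(string $input): ?string
{
    // (?<=\() asserts that "(" sits just before the match,
    // (?=\))  asserts that ")" follows it.
    // Neither parenthesis is consumed, so $m[0] holds only the inner text.
    if (preg_match('/(?<=\()[a-z]+(?=\))/i', $input, $m) === 1) {
        return $m[0];
    }
    return null;
}

echo text_in_parens('(en) English'), "\n"; // en
```

With the capture-group answers the same value would be read from $matches[1] instead of $matches[0].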

That should do the job.
"/\(([a-z]+)\)/i"

The easiest way is to use "/\(([a-z]+)\)/i" and read the capture group to get what you want.
Otherwise, you have to get into lookaheads and lookbehinds.

You could use a capture group like everyone else proposes
OR
you can make the pattern only check that your match is preceded by "(" and followed by ")". This is called lookahead and lookbehind.
"/(?<=\()[a-z]+(?=\))/i"

Related

Php lookahead assertion at the end of the regex

I want to write a regex with assertions to extract the number 55 from the string unknownstring/55.1. Here is my regex:
$str = 'unknownstring/55.1';
preg_match('/(?<=\/)\d+(?=\.1)$/', $str, $match);
So, basically I am trying to say: give me the number that comes after a slash and is followed by a dot and the number 1, with no characters after that. But the regex does not match. When I removed the $ sign from the end, it matched. But that condition is essential, as I need that to be the end of the string, because the unknownstring part can contain similar text, e.g. unknow/545.1nstring/55.1. Perhaps I can use preg_match_all and take the last match, but I want to understand why the first regex does not work and where my mistake is.
Thanks
Use anchor $ inside lookahead:
(?<=\/)\d+(?=\.1$)
RegEx Demo
You cannot use $ outside the positive lookahead because your number is NOT at the end of input and there is a \.1 following it.
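A small sketch of the difference, using the strings from the question (the helper name trailing_number is an assumption for illustration only):

```php
<?php
// Hypothetical helper: the number after a "/" when ".1" then ends the string.
function trailing_number(string $input): ?string
{
    // The $ lives inside the lookahead, so ".1" must be followed by the end
    // of the string, while the match itself still stops right after the digits.
    if (preg_match('/(?<=\/)\d+(?=\.1$)/', $input, $m) === 1) {
        return $m[0];
    }
    return null;
}

// With $ outside the lookahead the engine needs the end of the string
// directly after the digits, which can never hold when ".1" follows them.
$outside = preg_match('/(?<=\/)\d+(?=\.1)$/', 'unknownstring/55.1');

echo trailing_number('unknown/545.1nstring/55.1'), "\n"; // 55
```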

PHP RegEx get first letter after set of characters

I have some text with heading string and set of letters.
I need to get first one-digit number after set of string characters.
Example text:
ABC105001
ABC205001
ABC305001
ABCD105001
ABCD205001
ABCD305001
My RegEx:
^(\D*)(\d{1})(?=\d*$)
Link: http://www.regexr.com/390gv
As you can see, the RegEx works, but the result also includes the first group. I need only this digit, and when I try to put ?= in the first group like this: ^(?=\D*)(\d{1})(?=\d*$) , the RegEx doesn't work.
Any ideas?
Thanks in advance.
(?=...) is a lookahead; it means "followed by" and checks the string to the right of the current position.
(?<=...) is a lookbehind; it means "preceded by" and checks the string to the left of the current position.
What is interesting about these two features is that the content matched inside them is not part of the overall match result. The only problem is that a lookbehind can't match variable-length content.
A way to avoid the problem is to use the \K feature, which removes everything to its left from the match result:
^[A-Z]+\K\d(?=\d*$)
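For illustration, here is a minimal sketch of the \K pattern applied to the sample values from the question (the function name first_digit_after_letters is hypothetical):

```php
<?php
// Hypothetical helper: the first digit after the leading letters, or null.
function first_digit_after_letters(string $input): ?string
{
    // ^[A-Z]+   the letter prefix, matched but then dropped by \K
    // \d        the single digit that ends up in $m[0]
    // (?=\d*$)  only digits may follow, up to the end of the string
    if (preg_match('/^[A-Z]+\K\d(?=\d*$)/', $input, $m) === 1) {
        return $m[0];
    }
    return null;
}

foreach (['ABC105001', 'ABCD305001'] as $code) {
    echo $code, ' => ', first_digit_after_letters($code), "\n";
}
// ABC105001 => 1
// ABCD305001 => 3
```

Because \K resets the start of the reported match, no capture group is needed and $m[0] is the digit itself.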
You're trying to use a positive lookahead when really you want to use non-capturing groups.
The one match you want will work with this regex:
^(?:\D*)(\d)\d*$
The (?: sequence starts a non-capturing group. Its contents are matched but do not come back in $matches.
So, if you used preg_match(';^(?:\D*)(\d)\d*$;', $string, $matches) to find your match, $matches[1] would be the digit you're looking for. (This is because $matches[0] will always be the full match from preg_match.)
try:
^(?:\D*)(\d{1})(?=\d*$) // (?: is the beginning of a no capture group

Parsing from:x; but not lfrom:x;

I am trying to parse a string with something like :
preg_match( "|from:(.*?);|", $string, $match);
But then I found that the string can also contain lfrom: and _from:
A few examples of how the string can be:
var1:34234;from:website1.com;lfrom:website2.com;var2:343423;
lfrom:website1.com;var1:4234234;from:website2.com
from:website1.com;_from:website2.com;lfrom:website2.com;var1:43523;
How can I parse only from:(.*?); and not lfrom, _from, etc.
I was going to just give you the solution, but I'd better explain the lookbehind assertion.
In a regex, each character you match (an h, for example) advances the engine's position in the string. A lookbehind checks what comes before the current position without advancing it. Here you only want to check that from is preceded by a ;, whitespace, or the start of the string; you don't want to consume anything.
So, an example: (?<=a)b matches a b that has an a before it. When a b is found, the engine looks behind it, and if there is an a, the regex matches.
So... (?<=[;\s]|^)from:(\w+\.\w+) would match a from that has ; or whitespace right before it, OR sits at ^ (the start of the string). Don't put \b in that character class: inside a class it means backspace, not a word boundary.
DEMO
Pretty easy, huh!?
You could either use an assertion:
|(?<!l)from:(.*?);|
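Or look for the preceding ; or line start. This needs a delimiter other than |, because the pattern itself uses | for alternation, and the value ends up in the second group:
~(;|^)from:(.*?);~m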
It might also be a good idea to replace the generic .*? match with [^;]*
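Assuming from is preceded by whitespace or a ;
/[\s;]from:([^;]+);/
This will only match from preceded by whitespace or ;, so it will not match a from at the very start of the string. A word boundary cannot be added to the class, because \b inside a character class means backspace. I also prefer to narrow captures, i.e. [^;]+ vs. .*?.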
There is a concept called (negative) lookbehind, which asserts that your current position is (not) preceded by certain things. In this case I would go with a positive lookbehind and assert that from is preceded by the start of the string, a line break, or a ;:
preg_match('~(?<=^|;)from:(.*?);~m', $string, $match);
Make sure to use multi-line mode m, so that ^ also matches at the start of each line and not just at the start of the string. The delimiter is ~ rather than | because the pattern itself contains a | for alternation.
If you only wanted to exclude l and _ in front of from but accept any other characters, then a negative lookbehind might be what you are looking for:
preg_match('|(?<![l_])from:(.*?);|m', $string, $match);
The convenient thing about lookbehinds is, that they are not included in the actual match. They just check what's there without actually consuming it. Here is some reading.
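A minimal sketch that runs the positive lookbehind over the three sample strings from the question. The helper name extract_from is made up, and as an assumption on my part the trailing ; is not required, since the second sample ends without one:

```php
<?php
// Hypothetical helper: value of the plain "from:" field, ignoring lfrom:/_from:.
function extract_from(string $input): ?string
{
    // "~" is the delimiter so that "|" stays free for alternation.
    // (?<=^|;) accepts "from:" only at a line start or right after ";".
    // [^;]* stops at the next ";" or at the end of the string.
    if (preg_match('~(?<=^|;)from:([^;]*)~m', $input, $m) === 1) {
        return $m[1];
    }
    return null;
}

$samples = [
    'var1:34234;from:website1.com;lfrom:website2.com;var2:343423;',
    'lfrom:website1.com;var1:4234234;from:website2.com',
    'from:website1.com;_from:website2.com;lfrom:website2.com;var1:43523;',
];
foreach ($samples as $s) {
    echo extract_from($s), "\n";
}
// website1.com
// website2.com
// website1.com
```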

RegEx - Positive Lookbehind Problem

How do I use a positive look-behind to match more than 1 occurrence using a greedy + ?
This works:
(?<=\w)\w+
But I need to match all \w similar to:
(?<=\w+)\w+
The syntax is wrong in the second example and it does not work.
How do I make a positive lookbehind match multiple occurrences?
A very dirty way to do it is to reverse the string and use a positive lookahead instead. This is a trick that I use in Javascript (no lookbehinds supported there :( ).
So you must do something like:
$string = 'This is a long string that must show this is what will happen';
$str_rev = strrev($string);
if (preg_match('!(si)(?=\w+)(\w+)!i', $str_rev, $matches)) {
    print_r($matches);
}
The code above matches the "is" inside "this" (which reads "siht" in the reversed string), i.e. an "is" that is preceded by word characters in the original string. The second \w+ is just to show where it matched and is not needed in your example.
Keep in mind that this technique is possible only if you use only one direction for greediness for the Lookbehind/aheads (e.g. you can't use a lookbehind with \w+ together with a lookahead with \w+ )
You probably just want to match it without any lookbehinds and then use your capturing groups:
if (preg_match('~[abc]+([cde]+)~', $string, $matches)) {
    echo $matches[1]; // will contain the [cde]+ part
}
Sorry to say, but no quantifiers in lookbehinds!
I found this in the perlretut:
"Lookahead (?=regexp) can match arbitrary regexps, but lookbehind (?<=fixed-regexp) only works for regexps of fixed width"
I assume that this is also valid for the php regex engine.
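As a hedged sketch of the usual workarounds in PHP, a variable-length "preceded by" can be expressed with a capture group or with \K instead of a lookbehind. The subject string and the digits-then-letters pattern are made-up examples, not taken from the question:

```php
<?php
// Hypothetical input: we want the letters that directly follow a run of
// digits of any length, which (?<=\d+)[a-z]+ cannot express.
$subject = 'order 12345abc shipped';

// Capture-group variant: the prefix is part of the match,
// the wanted text is read from group 1.
preg_match('/\d+([a-z]+)/', $subject, $viaGroup);

// \K variant: the prefix is matched, then dropped from the reported match,
// so the wanted text is the whole match in index 0.
preg_match('/\d+\K[a-z]+/', $subject, $viaK);

echo $viaGroup[1], ' ', $viaK[0], "\n"; // abc abc
```

Unlike a real lookbehind, both variants consume the prefix, so with preg_match_all the prefix of one match cannot overlap with the previous match.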

Getting a random string within a string

I need to find a random string within a string.
My string looks as follows
{theme}pink{/theme} or {theme}red{/theme}
I need to get the text between the tags, the text may differ after each refresh.
My code looks as follows
$str = '{theme}pink{/theme}';
preg_match('/{theme}*{\/theme}/',$str,$matches);
But no luck with this.
* is only the quantifier, you need to specify what the quantifier is for. You've applied it to }, meaning there can be 0 or more '}' characters. You probably want "any character", represented by a dot.
And maybe you want to capture only the part between the {..} tags with (.*)
$str = '{theme}pink{/theme}';
preg_match('/{theme}(.*){\/theme}/',$str,$matches);
var_dump($matches);
'/{theme}(.*?){\/theme}/' or even more restrictive '/{theme}(\w*){\/theme}/' should do the job
preg_match_all('/{theme}(.*?){\/theme}/', $str, $matches);
You should use ungreedy matching here. $matches[1] will contain the contents of all matched tags as an array.
$matches = array();
$str = '{theme}pink{/theme}';
preg_match('/{([^}]+)}([^{]+){\/([^}]+)}/', $str, $matches);
var_dump($matches);
That will dump out all matches of all "tags" you may be looking for. Try it out and look at $matches and you'll see what I mean. I'm assuming you're trying to build your own rudimentary template language so this code snippet may be useful to you. If you are, I may suggest looking at something like Smarty.
In any case, you need parentheses to capture values in regular expressions. There are three captured values above:
([^}]+)
will capture the value of the opening "tag," which is theme. The [^}]+ means "one or more of any character BUT the } character", so the match cannot run past the closing brace, which has the same effect here as a non-greedy match.
([^{]+)
Will capture the value between the tags. In this case we want to match all characters BUT the { character.
([^}]+)
Will capture the value of the closing tag.
preg_match('/{theme}([^{]*){\/theme}/',$str,$matches);
[^{] matches any character except the opening brace, so the match cannot run past the start of the next tag. This has the same effect as a non-greedy match, which is important if you have more than one tag per string/line.
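To make the difference concrete, here is a small sketch comparing the three variants from the answers on a string that contains two tags (the variable names are only illustrative):

```php
<?php
$str = '{theme}pink{/theme} or {theme}red{/theme}';

// Greedy .* runs to the LAST {/theme}, swallowing everything in between.
preg_match('/{theme}(.*){\/theme}/', $str, $greedy);

// Lazy .*? stops at the first {/theme} after each opening tag.
preg_match_all('/{theme}(.*?){\/theme}/', $str, $lazy);

// [^{]* cannot cross a "{", so it also stops at the first closing tag.
preg_match_all('/{theme}([^{]*){\/theme}/', $str, $negated);

echo $greedy[1], "\n";              // pink{/theme} or {theme}red
echo implode(',', $lazy[1]), "\n";    // pink,red
echo implode(',', $negated[1]), "\n"; // pink,red
```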
