Help with PHP regex - php

I'm trying to replace all the letters and spaces after the first two, using PHP's preg_replace. Here is my failed attempt at doing so:
echo preg_replace('/^[a-zA-Z]{2}([a-zA-Z ])*.*$/i','','FALL 2012'); //expected output is "FA2012", but it outputs nothing
I'm just trying to replace the part in parentheses, ([a-zA-Z ]). I'm guessing this isn't the right way to replace only that part.

You're asking it to replace the entire string. Use a lookbehind to match the first two characters instead.
echo preg_replace('/(?<=^[a-z]{2})[a-z ]*/i','','FALL 2012');
By the way, the i modifier means it's case insensitive, so you don't need to match both uppercase and lowercase characters.

The /i at the end makes it case-insensitive. The (?<=regex) is a lookbehind: it checks that the text immediately before the current position matches (here, the beginning of the line followed by 2 letters) without including that text in the match.
echo preg_replace('/(?<=^[a-z]{2})[a-z ]*/i','','FALL 2012');

You are telling it to replace the entire match with an empty string (''). Instead, put parentheses around the parts you want to keep and then replace with $1$2, which stand for what was captured by the first ($1) and second ($2) set of parentheses.
preg_replace("/^([a-z]{2})[a-z\s]*(.*)$/i", '$1$2', $string);

In this case you can get away with a (?<=lookbehind); however, in certain other cases you may find the \K escape to be more suitable. It resets the reported start of the match to the current position in the subject, effectively dropping the portion of the string that was consumed so far from the match. For example:
preg_replace('/^[a-z]{2}\K[a-z ]*/i', '', 'FALL 2012');
Now only the substring matched by [a-z ]* is substituted.
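As a rough check, the three approaches from the answers above can be run side by side on the input from the question. This is only a sketch; the variable names are illustrative and do not come from the original posts.

```php
<?php
// Sketch comparing the three approaches on the input from the question.
$input = 'FALL 2012';

// Lookbehind: only the letters/spaces after the first two letters are matched.
$viaLookbehind = preg_replace('/(?<=^[a-z]{2})[a-z ]*/i', '', $input);

// Capture groups: match the whole string, then put back the parts to keep.
$viaGroups = preg_replace('/^([a-z]{2})[a-z\s]*(.*)$/i', '$1$2', $input);

// \K: the first two letters are consumed but dropped from the reported match.
$viaK = preg_replace('/^[a-z]{2}\K[a-z ]*/i', '', $input);

echo $viaLookbehind, "\n"; // FA2012
echo $viaGroups, "\n";     // FA2012
echo $viaK, "\n";          // FA2012
```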

Related

I cannot make this regular expression work

Maybe it's simple, but I cannot make it work.
I have two filename strings:
wrap.html
wrap-popup.html
I try to select both using
/.*wrap.*\.htm.*/ mask
But it only matches the first one "wrap.html".
If I use /.*wrap.+\.htm.*/, it only matches the second one "wrap-popup.html"
I thought * means 0 to infinitely many characters.
What's the correct mask to select both strings ???
Consider the string "this is text with 2 html pages: wrap.html and wrap-popup.html"
The first regex /.*wrap.*\.htm.*/ will match that whole string.
So if you don't want to include the first part of the string then you need to remove the first .*
Now /wrap.*\.htm.*/ will match "wrap.html and wrap-popup.html" from the string.
That's because the first .* is a greedy match.
So when we change the regex to /wrap.*?\.html?/ the .*? is now a lazy match. And the l? is an optional l. So the regex will return "wrap.html".
But if we want to retrieve both we need a global search, or it would only find the first match.
A preg_match_all (instead of preg_match) with the regex /wrap[\w\-]*?\.html?/ will match both "wrap.html" and "wrap-popup.html".
That second regex of yours wouldn't match wrap.html because with the .+ it expects at least 1 character between "wrap" and the dot.
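The greedy, lazy, and global variants described above can be compared with a short sketch. The sample sentence is the one from the answer; the variable names are made up for illustration.

```php
<?php
// Sketch: greedy vs. lazy matching on the sample sentence from the answer.
$text = 'this is text with 2 html pages: wrap.html and wrap-popup.html';

// Greedy .* runs to the last ".htm" it can find.
preg_match('/wrap.*\.htm.*/', $text, $greedy);
echo $greedy[0], "\n"; // wrap.html and wrap-popup.html

// Lazy .*? stops at the first ".htm".
preg_match('/wrap.*?\.html?/', $text, $lazy);
echo $lazy[0], "\n"; // wrap.html

// preg_match_all collects every match instead of only the first one.
preg_match_all('/wrap[\w\-]*?\.html?/', $text, $all);
print_r($all[0]); // wrap.html and wrap-popup.html as separate entries

// With .+ at least one character must sit between "wrap" and ".htm".
$plusOnShort = preg_match('/.*wrap.+\.htm.*/', 'wrap.html');       // 0
$plusOnLong  = preg_match('/.*wrap.+\.htm.*/', 'wrap-popup.html'); // 1
```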

PHP - Preg match reversal?

How do you invert a regular expression in PHP?
This is my code:
preg_match("!<div class=\"foo\">.*?</div>!is", $source, $matches);
This checks the $source string for everything within the container and stores it in the $matches variable.
But what I want to do is reverse the expression, i.e. I want to get everything that is NOT inside the container.
I know there is something called negative lookahead, but I am really bad with Regular expressions and didn't manage to come up with a working solution.
Simply using ?!
preg_match("?!<div class=\"foo\">.*?</div>!is", $source, $matches);
Does not seem to work.
Thanks!
New solution
Since your goal is to remove the matching divs, as mentioned in the comment, using the original regex with preg_split, plus implode would be the simpler solution:
implode('', preg_split('~<div class="foo">.*?</div>~is', $text))
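A minimal sketch of that one-liner in action follows. The sample markup and variable names are made up for illustration.

```php
<?php
// Sketch: remove every <div class="foo">...</div> and keep the rest.
// The sample markup below is invented for this example.
$text = 'before<div class="foo">one</div>middle<div class="foo">two</div>after';

// Split on the divs, so only the text between them survives.
$pieces = preg_split('~<div class="foo">.*?</div>~is', $text);
$outside = implode('', $pieces);

echo $outside, "\n"; // beforemiddleafter
```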
Old solution
I'm not sure whether this is a good idea, but here is my solution:
~(.*?)(?:<div class="foo">.*?</div>|$)~is
The result can be picked out from capturing group 1 of each match.
Note that the last match is always an empty string, and there can be an empty match between 2 adjacent matching divs or when the string starts with a matching div. However, you need to concatenate the pieces anyway, so this is a non-issue.
The idea is to rely on the fact that lazy quantifier .*? will always try the sequel (whatever comes after it) first before advancing itself, resulting in something similar to look-ahead assertion that makes sure that whatever matched by .*? will not be inside <div class="foo">.*?</div>.
The div tag is matched along in each match in order to advance the cursor past the closing tag. $ is used to match the text after the last matching div.
The s flag makes . match any character, including line separators.
Revision: I had to change .+? to .*?, since .+? does not handle strings with 2 matching divs next to each other or strings that start with a matching div.
Anyway, it's not a good idea to modify HTML with regular expression. Use a parser instead.
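The capturing-group idea from the old solution can be tried out roughly like this. It is a sketch on invented sample markup, not a robust way to process HTML.

```php
<?php
// Sketch: collect the text outside the divs from capturing group 1.
// The sample markup below is invented for this example.
$text = 'before<div class="foo">one</div>middle<div class="foo">two</div>after';

preg_match_all('~(.*?)(?:<div class="foo">.*?</div>|$)~is', $text, $matches);

// $matches[1] holds group 1 of every match; empty entries vanish on implode.
$outside = implode('', $matches[1]);

echo $outside, "\n"; // beforemiddleafter
```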
<div class=\"foo\">.*?</div>\K|.
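You can do this simply by using \K: the pattern matches either a whole div followed by \K, or any single character.
\K resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match, so each div produces an empty match while every character outside a div is matched as-is.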
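A hedged sketch of how this \K pattern might be applied with preg_match_all follows. The delimiter, the flags, and the sample markup are assumptions, since the answer only gives the bare pattern.

```php
<?php
// Sketch: divs are consumed and then dropped via \K; everything else is
// matched one character at a time. Sample markup is invented.
$text = 'before<div class="foo">one</div>middle<div class="foo">two</div>after';

preg_match_all('~<div class="foo">.*?</div>\K|.~is', $text, $matches);

// Each div contributes an empty match, so joining the matches leaves
// only the characters outside the divs.
$outside = implode('', $matches[0]);

echo $outside, "\n"; // beforemiddleafter
```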

Php lookahead assertion at the end of the regex

I want to write a regex with assertions to extract the number 55 from string unknownstring/55.1, here is my regex
$str = 'unknownstring/55.1';
preg_match('/(?<=\/)\d+(?=\.1)$/', $str, $match);
So basically I am trying to say: give me the number that comes after a slash and is followed by a dot and the number 1, after which there are no more characters. But the regex does not match. When I removed the $ sign from the end, it matched. That condition is essential, though, because I need that to be the end of the string: the unknownstring part can contain similar text, e.g. unknow/545.1nstring/55.1. Perhaps I could use preg_match_all and take the last match, but I want to understand why the first regex does not work. Where is my mistake?
Thanks
Use anchor $ inside lookahead:
(?<=\/)\d+(?=\.1$)
You cannot use $ outside the positive lookahead because your number is NOT at the end of input and there is a \.1 following it.
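The difference can be checked with a small sketch using the strings from the question. The variable names are illustrative.

```php
<?php
// Sketch: $ outside vs. inside the lookahead.
$str = 'unknownstring/55.1';

// $ outside: after \d+ the position is right before ".1", not at the end.
$outsideResult = preg_match('/(?<=\/)\d+(?=\.1)$/', $str, $m1);
echo $outsideResult, "\n"; // 0

// $ inside: the lookahead itself requires ".1" to end the string.
$insideResult = preg_match('/(?<=\/)\d+(?=\.1$)/', $str, $m2);
echo $m2[0], "\n"; // 55

// The tricky case from the question: only the last number qualifies.
preg_match('/(?<=\/)\d+(?=\.1$)/', 'unknow/545.1nstring/55.1', $m3);
echo $m3[0], "\n"; // 55
```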

Parsing from:x; but not lfrom:x;

I am trying to parse a string with something like :
preg_match( "|from:(.*?);|", $string, $match);
But then I found that the string can also contain lfrom: and _from:
A few examples of how the string can be:
var1:34234;from:website1.com;lfrom:website2.com;var2:343423;
lfrom:website1.com;var1:4234234;from:website2.com
from:website1.com;_from:website2.com;lfrom:website2.com;var1:43523;
How can I parse only from:(.*?); and not lfrom, _from, etc.
I was going to just give you the solution, but I'd better explain the lookbehind assertion.
In a regex, each time you match a character (an h, for example), the engine's position pointer advances by one. Here you don't want to advance the pointer; you just want to check whether the from is preceded by a ;, whitespace, or the start of the string. You can't simply match an empty string either, because there are empty strings everywhere!
So, an example: (?<=a)b would match a b that has an a before it. When a b is found, the engine looks behind it; if there is an a, the regex matches.
So (?<=[;\s]|^)from:(\w+\.\w+) would match a from that has ; or whitespace right before it, OR that sits at the start of the string (^). Note that \b is left out of the character class: inside [...] it means a backspace character, not a word boundary.
Pretty easy, huh!?
You could either use an assertion:
|(?<!l)from:(.*?);|
Or look for the preceding ; or line start:
/(;|^)from:(.*?);/m
It might also be a good idea to replace the generic .*? match with [^;]*
Assuming from is preceded by whitespace or a ;
/[\s;]from:([^;]+);/
This will only match from when it is preceded by whitespace or a ; (inside a character class \b means backspace, not a word boundary, so it is left out). I also prefer to narrow captures, i.e. [^;]+ vs. .*?
There is a concept called (negative) lookbehind, which asserts that your current position is (not) preceded by certain things. In this case I would go with a positive lookbehind and assert that from is preceded by the start of the string, a line break, or a ;. The | delimiter cannot be used here because the pattern itself contains a |:
preg_match('/(?<=^|;)from:(.*?);/m', $string, $match);
Make sure to use multi-line mode m, so that ^ will also match at the start of each line and not just at the start of the string.
If you only want to exclude l and _ in front of from but accept any other characters, then a negative lookbehind might be what you are looking for:
preg_match('|(?<![l_])from:(.*?);|m', $string, $match);
The convenient thing about lookbehinds is that they are not included in the actual match. They just check what's there without actually consuming it. Here is some reading.
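To see the lookbehind variants at work, here is a small sketch run against the sample strings from the question. The / delimiter, the [^;]* capture, and the trailing ; added to the second sample are choices made for this example only.

```php
<?php
// Sketch: pick out only "from:", never "lfrom:" or "_from:".
$samples = [
    'var1:34234;from:website1.com;lfrom:website2.com;var2:343423;',
    // Trailing ; added here (assumption) so the value has a terminator.
    'lfrom:website1.com;var1:4234234;from:website2.com;',
    'from:website1.com;_from:website2.com;lfrom:website2.com;var1:43523;',
];

// Positive lookbehind: "from" must follow a ; or start a line.
$found = [];
foreach ($samples as $string) {
    if (preg_match('/(?<=^|;)from:([^;]*);/m', $string, $match)) {
        $found[] = $match[1];
    }
}
print_r($found); // website1.com, website2.com, website1.com

// Negative lookbehind: anything may precede "from" except l or _.
preg_match('/(?<![l_])from:([^;]*);/', $samples[1], $negative);
echo $negative[1], "\n"; // website2.com
```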

php regular expression help finding multiple filenames only not full URL

I am trying to fix a regular expression I have been using in PHP. It finds all filenames within a sentence or paragraph. The file names always look like this: /this-a-valid-page.php
With help I received on Stack Overflow, my old pattern was modified to the one below, which avoids full URLs (the issue I was having), but this pattern only finds one occurrence at the beginning of a string, nothing inside the string.
/^\/(.*?).php/
I have a live example here: http://vzio.com/upload/reg_pattern.php
Remove the ^. The caret signifies the beginning of a string/line, which is why it's not matching elsewhere.
If you need to avoid full URLs, you might want to change the ^ to something like (?:^|\s) which will match either the beginning of the string or a whitespace character - just remember to strip whitespace from the beginning of your match later on.
The last dot in your expression could still cause problems, since it'll match "one anything". You could match, for example, /somefilename#php with that pattern. Backslash it to make it a literal period:
/\/(.*?)\.php/
Also note that the ? making .* non-greedy is necessary, and Arda Xi's pattern won't work. A greedy .* would race to the end of the string and then back up one character at a time until it can match the .php, which certainly isn't what you'd want.
To find all the occurrences, you'll have to remove the start anchor and use the preg_match_all function instead of preg_match :
if (preg_match_all('/\/(.*?)\.php/', $input, $matches)) {
    var_dump($matches[1]); // will print all filenames (after / and before .php)
}
Also . is a meta char. You'll have to escape it as \. to match a literal period.
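Putting these suggestions together, a sketch might look like the following. The sample sentences and the example.com URL are invented for illustration, and the (?:^|\s) prefix is the optional tweak suggested above for skipping full URLs.

```php
<?php
// Sketch: find every /name.php in a sentence. Sample text is invented.
$input = 'See /this-a-valid-page.php and also /another-page.php for details.';

preg_match_all('/\/(.*?)\.php/', $input, $matches);
print_r($matches[1]); // this-a-valid-page, another-page

// Variant that requires start-of-string or whitespace before the slash,
// so the path inside a full URL is skipped.
$mixed = 'Visit http://example.com/full.php or /local-page.php today.';
preg_match_all('/(?:^|\s)\/(.*?)\.php/', $mixed, $local);
print_r($local[1]); // local-page
```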
