How to find this regex using PHP - php

I want to find all the <1></1>, <2></2>, <3></3>, ... in a string.

<(\d+)></\1>
should work. This ensures that the regex won't match <1></4> for example.
\1 is here a backreference which must match exactly the same string as the first capturing group (the (\d+) in the first angle brackets).
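As a minimal sketch of the backreference in action (the sample string and variable names are made up for illustration; only the pattern comes from the answer above), preg_match_all can collect every matching pair:

```php
<?php
// Hypothetical sample input; <1></4> is included to show a rejected pair.
$subject = 'a <1></1> b <2></2> c <1></4> d <12></12>';

// ~ is used as the delimiter so the / in the closing tag needs no escaping.
preg_match_all('~<(\d+)></\1>~', $subject, $matches);

print_r($matches[0]); // full matches: <1></1>, <2></2>, <12></12>
print_r($matches[1]); // captured numbers: 1, 2, 12
```

The pair <1></4> is skipped because \1 has to repeat the text captured by (\d+).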

One regex to match any of them?
<([1-3])></\1>
Should the code allow for anything to be posted in between the > and the <? Something like this then:
<([1-3])>(.*?)</\1>
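A short sketch of how that pattern might be used, assuming a made-up input string; PREG_SET_ORDER groups the results per match so the tag number and the content stay together:

```php
<?php
// Hypothetical input: two valid pairs, one mismatched pair, one tag outside 1-3.
$subject = '<1>first</1> <2>second</2> <3>mismatch</1> <4>out of range</4>';

preg_match_all('~<([1-3])>(.*?)</\1>~', $subject, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    // $m[1] is the tag number, $m[2] is the text between the tags.
    $found[] = [$m[1], $m[2]];
    echo "tag {$m[1]}: {$m[2]}\n";
}
```

Note that . does not match newlines by default, so content spanning several lines would need the s flag.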

<STYLE[^>]*>([\s\S]*?)<\/STYLE[^>]*>
Just replace STYLE with your tag like 1, 2 whatever.

You can use the following regex: <[0-9]></[0-9]>
EDIT: To avoid that the search matches <2></3> too, you can use a sub-expression and a backreference to instantiate it: <([0-9])></\1>

Related

PHP - Preg match reversal?

How do you inverse a Regex expression in PHP?
This is my code:
preg_match("!<div class=\"foo\">.*?</div>!is", $source, $matches);
This checks the $source string for everything within the container and stores it in the $matches variable.
But what I want to do is reverse the expression, i.e. I want to get everything that is NOT inside the container.
I know there is something called negative lookahead, but I am really bad with Regular expressions and didn't manage to come up with a working solution.
Simply using ?!
preg_match("?!<div class=\"foo\">.*?</div>!is", $source, $matches);
Does not seem to work.
Thanks!
New solution
Since your goal is to remove the matching divs, as mentioned in the comment, using the original regex with preg_split, plus implode would be the simpler solution:
implode('', preg_split('~<div class="foo">.*?</div>~is', $text))
Demo on ideone
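For reference, a self-contained sketch of the preg_split approach. The sample text is invented; it contains a line break inside the first div to show why the s flag is there:

```php
<?php
// Hypothetical input with two matching divs, the first one spanning two lines.
$text = 'keep1<div class="foo">drop' . "\n" . 'me</div>keep2<div class="foo">x</div>keep3';

// Split on every matching div, then glue the remaining pieces back together.
$pieces = preg_split('~<div class="foo">.*?</div>~is', $text);
$kept = implode('', $pieces);

echo $kept; // keep1keep2keep3
```

Since the pieces are simply concatenated, preg_replace with an empty replacement string should give the same result.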
Old solution
I'm not sure whether this is a good idea, but here is my solution:
~(.*?)(?:<div class="foo">.*?</div>|$)~is
Demo on regex101
The result can be picked out from capturing group 1 of each match.
Note that the last match is always an empty string, and there can be an empty-string match between 2 adjacent matching divs or when the string starts with a matching div. However, you need to concatenate them anyway, so it seems to be a non-issue.
The idea is to rely on the fact that lazy quantifier .*? will always try the sequel (whatever comes after it) first before advancing itself, resulting in something similar to look-ahead assertion that makes sure that whatever matched by .*? will not be inside <div class="foo">.*?</div>.
The div tag is matched along in each match in order to advance the cursor past the closing tag. $ is used to match the text after the last matching div.
The s flag makes . match any character, including line separators.
Revision: I had to change .+? to .*?, since .+? can't handle strings with 2 matching divs next to each other or strings that start with a matching div.
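A rough sketch of how the old solution could be run, assuming an invented input that starts with a matching div and has two divs next to each other (the cases mentioned in the revision note):

```php
<?php
// Hypothetical input: leading div, then text, then two adjacent divs, then text.
$text = '<div class="foo">a</div>keep1<div class="foo">b</div><div class="foo">c</div>keep2';

preg_match_all('~(.*?)(?:<div class="foo">.*?</div>|$)~is', $text, $matches);

// Group 1 of each match holds the text outside the divs (possibly empty).
$outside = implode('', $matches[1]);

echo $outside; // keep1keep2
```

The empty strings in $matches[1] disappear once the pieces are joined.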
Anyway, it's not a good idea to modify HTML with regular expression. Use a parser instead.
<div class=\"foo\">.*?</div>\K|.
You can simply do this by using \K.
\K resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
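A small sketch of the \K variant, assuming a made-up input with a single matching div. Each match is either one character outside the div, or an empty string where a div was consumed and then dropped by \K, so joining the matches rebuilds the outside text:

```php
<?php
// Hypothetical input with one matching div in the middle.
$text = 'keep1<div class="foo">x</div>keep2';

preg_match_all('~<div class="foo">.*?</div>\K|.~is', $text, $matches);

// Single characters plus one empty string (left behind by \K after the div).
$outside = implode('', $matches[0]);

echo $outside; // keep1keep2
```

This was only traced for the simple input above; for real HTML a parser is still the safer choice.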

Php lookahead assertion at the end of the regex

I want to write a regex with assertions to extract the number 55 from string unknownstring/55.1, here is my regex
$str = 'unknownstring/55.1';
preg_match('/(?<=\/)\d+(?=\.1)$/', $str, $match);
So basically I am trying to say: give me the number that comes after a slash and is followed by a dot and the number 1, with no characters after that. But the regex does not match. When I removed the $ sign from the end, it matched. That condition is essential, though, because I need this to be at the end of the string: the unknownstring part can contain similar text, e.g. unknow/545.1nstring/55.1. Perhaps I can use preg_match_all and take the last match, but I want to understand why the first regex does not work and where my mistake is.
Thanks
Use anchor $ inside lookahead:
(?<=\/)\d+(?=\.1$)
RegEx Demo
You cannot use $ outside the positive lookahead because your number is NOT at the end of input and there is a \.1 following it.
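To see the difference, here is a minimal comparison using the trickier string from the question. A lookahead consumes nothing, so a $ placed after it is tested right after the digits, where .1 still follows. The variable names are only illustrative:

```php
<?php
$str = 'unknow/545.1nstring/55.1';

// $ inside the lookahead: ".1" must be followed by the end of the string.
$inside = preg_match('~(?<=/)\d+(?=\.1$)~', $str, $match);

// $ outside the lookahead: the position right after the digits would have
// to be the end of the string and also be followed by ".1", which is impossible.
$outside = preg_match('~(?<=/)\d+(?=\.1)$~', $str, $noMatch);

var_dump($inside, $match[0], $outside); // int(1), string "55", int(0)
```

The earlier /545.1 is rejected by the first pattern because the n after .1 means the end of the string has not been reached.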

PHP RegEx get first letter after set of characters

I have some text with heading string and set of letters.
I need to get first one-digit number after set of string characters.
Example text:
ABC105001
ABC205001
ABC305001
ABCD105001
ABCD205001
ABCD305001
My RegEx:
^(\D*)(\d{1})(?=\d*$)
Link: http://www.regexr.com/390gv
As you can see, the RegEx works OK, but it also captures the first group in the results. I need to get only this integer, and when I try to put ?= in the first group like this: ^(?=\D*)(\d{1})(?=\d*$) , the RegEx doesn't work.
Any ideas?
Thanks in advance.
(?=..) is a lookahead that means followed by and checks the string on the right of the current position.
(?<=...) is a lookbehind that means preceded by and checks the string on the left of the current position.
What is interesting about these two features is that the content matched inside them is not part of the whole match result. The only problem is that a lookbehind can't match variable-length content.
A way to avoid the problem is to use the \K feature, which removes everything to its left from the match result:
^[A-Z]+\K\d(?=\d*$)
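A quick sketch of that pattern applied to a few of the sample lines from the question (the loop and the variable names are just for illustration):

```php
<?php
$lines = ['ABC105001', 'ABCD205001', 'ABCD305001'];

$digits = [];
foreach ($lines as $line) {
    // \K drops the leading letters, so $m[0] holds only the first digit.
    if (preg_match('/^[A-Z]+\K\d(?=\d*$)/', $line, $m)) {
        $digits[] = $m[0];
    }
}

print_r($digits); // 1, 2, 3
```

No capturing group is needed here, because the whole match is already just the digit.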
You're trying to use a positive lookahead when really you want to use non-capturing groups.
The one match you want will work with this regex:
^(?:\D*)(\d{1})\d*$
The (?: string starts a non-capturing group, which does not come back in the matches.
So, if you used preg_match(';^(?:\D*)(\d{1})\d*$;', $string, $matches) to find your match, $matches[1] would be the digit you're looking for. (This is because $matches[0] is always the full match from preg_match.)
try:
^(?:\D*)(\d{1})(?=\d*$) // (?: is the beginning of a non-capturing group
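For illustration, this is roughly what that last pattern returns for one of the sample lines. The non-capturing group still takes part in the full match; it just does not get its own entry in the result array:

```php
<?php
preg_match('/^(?:\D*)(\d{1})(?=\d*$)/', 'ABCD305001', $m);

// $m[0] is the full match (letters plus the digit); the lookahead adds nothing.
// $m[1] is the only capturing group: the first digit.
print_r($m); // [0] => ABCD3, [1] => 3
```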

How to match 2nd instance in regex

get_by_my_column
If I only want to match the get_by portion of the above string, how can I do this? I keep reading on this regex cheatsheet that I should use \n but I can't figure out how to implement it properly...
I've tried variations of the following...
/((_){2})/
/(_+){2}/
/(\w+?_\w+?)_\w+/ (use non greedy quantifiers, your substring should be in capture group 1)
or just /\w+?_\w+?/ <---(edit: won't work, you do need that second underscore as regex structure to force the non greedy \w up to it :])
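A small sketch of the first pattern on the string from the question. Since \w also matches underscores, the lazy quantifiers are what stop each part at the next underscore:

```php
<?php
$name = 'get_by_my_column';

// Group 1 is the lazy "word_word" prefix; the trailing _\w+ forces the
// second lazy \w+? to stretch up to the second underscore.
preg_match('/(\w+?_\w+?)_\w+/', $name, $m);

echo $m[1]; // get_by
```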
Do you need to use a regex for this? You could use explode() and just grab the first two elements of the resulting array.
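Something along these lines, as a sketch of the explode() idea (array_slice and implode are used here to join the first two pieces back together; the variable names are made up):

```php
<?php
$name = 'get_by_my_column';

// Split on underscores: ['get', 'by', 'my', 'column']
$parts = explode('_', $name);

// Keep the first two elements and join them again.
$prefix = implode('_', array_slice($parts, 0, 2));

echo $prefix; // get_by
```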
Try
preg_match('/(^[a-z]+[_][a-z]+)/', $string, $results);
This matches a string that starts with a group of letters followed by an underscore followed by another set of letters.
Edit: (lowercase letters)
Try /^get_by/. The ^ anchors the match, so g must be the first character of the string.

preg_match doesn't capture the content

what is wrong with my preg_match ?
preg_match('numVar("XYZ-(.*)");',$var,$results);
I want to get all the CONTENT from here:
numVar("XYZ-CONTENT");
Thank you for any help!
I assume this is PHP? If so there are three problems with your code.
PHP's PCRE functions require that regular expressions be formatted with a delimiter. The usual delimiter is /, but you can use any matching pair you want.
You did not escape your parentheses in your regular expression, so you're not matching a ( character but creating a RE group.
You should use non-greedy matching in your RE. Otherwise a string like numVar("XYZ-CONTENT1");numVar("XYZ-CONTENT2"); will match both, and your "content" group will be CONTENT1");numVar("XYZ-CONTENT2.
Try this:
$var = 'numVar("XYZ-CONTENT");';
preg_match('/numVar\("XYZ-(.*?)"\);/',$var,$results);
var_dump($results);
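To illustrate the third point, here is a side-by-side sketch of greedy and non-greedy matching on the two-call string mentioned above (the variable names $greedy and $lazy are just for this example):

```php
<?php
$var = 'numVar("XYZ-CONTENT1");numVar("XYZ-CONTENT2");';

// Greedy: .* runs to the last "); in the string.
preg_match('/numVar\("XYZ-(.*)"\);/', $var, $greedy);

// Non-greedy: .*? stops at the first "); it can reach.
preg_match('/numVar\("XYZ-(.*?)"\);/', $var, $lazy);

echo $greedy[1], "\n"; // CONTENT1");numVar("XYZ-CONTENT2
echo $lazy[1], "\n";   // CONTENT1
```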
Paste your example string into http://txt2re.com and look at the PHP result.
It will show that you need to escape characters that have special meaning to the regex engine (such as the parentheses).
You should escape some chars:
preg_match('/numVar\("XYZ-(.*)"\);/',$var,$results);
preg_match("/XYZ\-(.+)\b/", $string, $result);
print_r($result[0]); // full match, i.e. XYZ-CONTENT
print_r($result[1]); // match of the first parenthesized group (.+)
