I'm creating a text parser, that basically looks for the following:
{IF SOMETHING} then include this text {ENDIF SOMETHING}
I can find that by using the regex:
/{IF [A-Z]+}.*{ENDIF [A-Z]+}/
But that won't help if there are nested conditions. So I was looking to do something more like:
/{IF ([A-Z]+)}.*{ENDIF $1}/
But that doesn't seem to work - is it possible?
You can also use the \g{1} syntax, which avoids confusion when a backreference is followed by a literal digit. This syntax also allows relative references such as \g{-1} (i.e. the most recently defined capture group to the left).
$1 is only used in a replacement string with preg_replace.
PHP Regex uses \1 instead of $1. For more information, refer to the PHP Manual on regex back references.
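To make the backreference syntax concrete, here is a minimal sketch. The sample text and the condition names FOO and BAR are made up for illustration, and the braces are escaped only as a precaution.

```php
<?php
// Hypothetical template text; FOO and BAR are illustrative condition names.
$text = 'a {IF FOO}x {IF BAR}y{ENDIF BAR} z{ENDIF FOO} b';

// \1 must repeat whatever group 1 captured, so the closing tag has to name
// the same condition as the opening tag. The s modifier lets . span newlines.
$pattern = '/\{IF ([A-Z]+)\}(.*?)\{ENDIF \1\}/s';
preg_match($pattern, $text, $m);
// $m[1] is the condition name, $m[2] the text between the matching tags.

// The same backreference written with the \g{} syntax. The body is left
// uncaptured here so that \g{-1} still points at the condition name.
$absolute = '/\{IF ([A-Z]+)\}.*?\{ENDIF \g{1}\}/s';
$relative = '/\{IF ([A-Z]+)\}.*?\{ENDIF \g{-1}\}/s';
```

Note that this only pairs tags with different names. A condition nested inside another condition of the same name would still be paired with the first closing tag.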
My string is: /var/www/domain.com/public_html/foo/bar/folder/another/..
I want to remove the root folder from this string and keep only the path inside the public folder, because some servers host multiple websites.
My current regex is: /^(.*?)(www|public_html|public|html)/s
My current result is: /domain.com/public_html/foo/bar/folder/another/..
But I want to match up to the last occurrence and get something like this: /foo/bar/folder/another/..
Thanks!
You have to use a greedy quantifier and to check if the alternative is enclosed between slashes using lookarounds:
/^.*(?<![^\/])(?:www|public(?:_html)?|html)(?![^\/])/
About the lookarounds: I use negative lookarounds with a negated character class to check for either a slash or the boundary of the string at the same time. This way you are sure that, for instance, html is a folder name and not part of another folder name.
I removed the s modifier, which is unnecessary here. I removed the capture groups too, since the goal is to replace the whole match with an empty string.
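As a rough illustration of the answer above, the following sketch applies the pattern with preg_replace. The second path is a made-up example that shows why the lookarounds matter.

```php
<?php
// Pattern from the answer: greedy .* plus lookarounds that require the
// folder name to be delimited by slashes or by the string boundaries.
$pattern = '/^.*(?<![^\/])(?:www|public(?:_html)?|html)(?![^\/])/';

$path = '/var/www/domain.com/public_html/foo/bar/folder/another/..';
$inside = preg_replace($pattern, '', $path);

// Hypothetical path where "html" is only part of a folder name ("myhtml"),
// so the last real public folder is "www".
$tricky = '/var/www/site/myhtml/docs';
$trickyInside = preg_replace($pattern, '', $tricky);

echo $inside, "\n";       // /foo/bar/folder/another/..
echo $trickyInside, "\n"; // /site/myhtml/docs
```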
The ? makes your expression non-greedy, which is not what you want here. Try:
^(.*)(www|public_html|public|html)
which should keep going until the last match.
Demo: https://regex101.com/r/v5WbB3/1/
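For comparison, a small sketch that runs the lazy pattern from the question and the greedy one from this answer against the same path:

```php
<?php
$path = '/var/www/domain.com/public_html/foo/bar/folder/another/..';

// Lazy .*? stops at the first folder name it can find ("www").
$lazy = preg_replace('/^(.*?)(www|public_html|public|html)/s', '', $path);

// Greedy .* consumes as much as possible and backtracks to the last one.
$greedy = preg_replace('/^(.*)(www|public_html|public|html)/', '', $path);

echo $lazy, "\n";   // /domain.com/public_html/foo/bar/folder/another/..
echo $greedy, "\n"; // /foo/bar/folder/another/..
```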
I have a string like this
<div><span style="">toto</span> some character <span>toto2</span></div>
My regex:
/(<span .*>)(.*)(<\/span>)/
I used preg_match and it returns the entire string
<span style="">toto</span> some character <span>toto2</span>
I want it to return:
<span style="">toto</span>
and
<span>toto2</span>
What do I need to do to achieve this? Thanks.
How about this:
/(<span[^>]*>)(.*?)(<\/span>)/
Check the Repetition section of the PHP regex documentation:
By default, the quantifiers are "greedy", that is, they match as much as possible
and
However, if a quantifier is followed by a question mark, then it becomes lazy, and instead matches the minimum number of times possible
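A short sketch of the difference, using the string from the question. It uses preg_match_all rather than preg_match (my substitution, not part of the answer) so that both spans are returned in one call.

```php
<?php
$html = '<div><span style="">toto</span> some character <span>toto2</span></div>';

// Greedy pattern from the question: one match running to the last </span>.
preg_match('/(<span .*>)(.*)(<\/span>)/', $html, $greedy);

// Lazy pattern from the answer: each span is matched separately.
preg_match_all('/(<span[^>]*>)(.*?)(<\/span>)/', $html, $lazy);

// $lazy[0] holds the full matches, $lazy[2] the text inside each span.
print_r($lazy[0]);
```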
Even though I guess all previous answers are correct, I just want to add that, since you only want to capture the whole expressions (i.e. from <span> to </span>), you don't have to capture everything inside the regexp with ()
The following does what you expect without capturing additional expressions
/(<span\w*[^>]*>[^<]*<\/span>)/
(tested on http://rubular.com/)
EDIT : of course there might be some differences between PHP and ruby regexp implementations, but the idea is the same :)
I have a problem with a regular expression.
I'm working with tokens and I have to parse a text like this:
Just some random text
#IT=AB|First statement# #xxxx=xxx|First statement|Second statement#
More text
I use preg_replace_callback since I have to use the first statement or the second one, depending on whether the first expression is true or not; it's a sort of IF...ELSE... statement.
What I expect are 2 elements like this:
#IT=AB|First statement#
#xxxx=xxx|First statement|Second statement#
So I can start manipulating them inside my callback function.
I tried the regex /#.*#/, but I get the entire string; it's not split into elements.
How can I achieve that? I'm sorry, but regexes aren't my thing :(
The quantifier * is greedy by default. So a .* will match as much as it can, and as a result it will match a # as well. To fix this you can make the * non-greedy by adding a ? after it. Now a .*? will try to match as little as it can.
/#.*?#/
or you can look for only non # characters between two #:
/#[^#]*#/
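Here is a minimal sketch of both patterns applied to the sample text, followed by a preg_replace_callback call. The callback body is only a placeholder assumption: it always picks the first statement, since the question does not say how the condition is evaluated.

```php
<?php
$text = "Just some random text\n"
      . "#IT=AB|First statement# #xxxx=xxx|First statement|Second statement#\n"
      . "More text";

// Both variants split the line into the two expected tokens.
preg_match_all('/#.*?#/', $text, $lazy);
preg_match_all('/#[^#]*#/', $text, $negated);

// A capture group exposes the token body to the callback.
$result = preg_replace_callback('/#([^#]*)#/', function (array $m): string {
    $parts = explode('|', $m[1]); // [condition, first statement, optional second]
    // Placeholder logic (an assumption): always return the first statement.
    return $parts[1] ?? '';
}, $text);

print_r($negated[0]);
```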
I need a regex to find function calls in strings in PHP. I have searched here on Stack Overflow, but none of the patterns I've tried worked.
this pattern: ^.*([\w][\(].*[\)])
This will match: functionone(fgfg) but also functionone(fgfg) dhgfghfgh functiontwo() as one match, not as 2 separate matches (as in functionone(fgfg) and functiontwo()).
I don't know how to write it but I think this is what I need.
1. Any string, followed by (
2. Any string followed by )
And then it should stop, not continue. Any regex-gurus that can help me out?
I see 5 issues with your regex
If you want to match 2 functions in the same row, don't use the anchor ^; it anchors the regex to the start of the string.
You then don't need .* at the start; something like \w+ is probably closer (I am not sure what the spec of a function name in PHP is)
If there is only one entry in a character class (and it's not a negated one), you don't need the character class
The quantifier between the brackets needs to be a lazy one (followed by a ?). So after these 4 points your regex would look something like
\w+\(.*?\)
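A quick sketch of that pattern on the string from the question, using preg_match_all to collect every match:

```php
<?php
$input = 'functionone(fgfg) dhgfghfgh functiontwo()';

// No ^ anchor, \w+ for the name, and a lazy .*? between the parentheses.
preg_match_all('/\w+\(.*?\)/', $input, $m);

print_r($m[0]); // functionone(fgfg) and functiontwo() as separate matches
```

Because the lazy quantifier stops at the first closing parenthesis, nested calls such as a(b(c)) would be cut short.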
Is a regex really the right tool for this job?
Don't use regexp for this... use PHP's built-in tokenizer
A function signature is not a regular language. As such, you cannot use a regular expression to match a function signature. Your current regex will match signatures that are NOT valid function signatures.
What I would suggest you use is the PHP tokenizer.
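As a rough sketch of the tokenizer approach, the code below uses token_get_all and treats any T_STRING token followed by an opening parenthesis as a call. The source snippet is hypothetical, and the heuristic is an assumption of mine: it would also pick up function declarations and method names, and it ignores namespaced names.

```php
<?php
// Hypothetical code to scan; token_get_all expects an opening PHP tag.
$source = '<?php functionone($a); $x = functiontwo();';

$tokens = token_get_all($source);
$count = count($tokens);
$calls = [];

for ($i = 0; $i < $count; $i++) {
    // Identifier tokens are arrays of [id, text, line]; punctuation is a string.
    if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_STRING) {
        continue;
    }
    // Skip whitespace between the name and a possible "(".
    $j = $i + 1;
    while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
        $j++;
    }
    if ($j < $count && $tokens[$j] === '(') {
        $calls[] = $tokens[$i][1];
    }
}

print_r($calls);
```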
I have written the following Regex in PHP for use within preg_replace().
/\b\S*(.com|.net|.us|.biz|.org|.info|.xxx|.mx|.ca|.fr|.in|.cn|.hk|.ng|.pr|.ph|.tv|.ru|.ly|.de|.my|.ir)\S*\b/i
This regex removes all URLs from a string pretty effectively so far (though I am sure I can write a better one). I need to be able to add an exclusion for a specific domain, though. So the pseudocode will look like this:
IF string contains: .com or .net or .biz etc... and does not contain: foo.com THEN execute condition.
Any idea on how to do this?
Just add a negative lookahead assertion:
/(?<=\s|^)(?!\S*foo\.com)\S*\.(com|net|us|biz|org|info|xxx|mx|ca|fr|in|cn|hk|ng|pr|ph|tv|ru|ly|de|my|ir)\S*\b/im
Also, remember that you need to escape the dot - and that you can move it outside the alternation since each of the alternatives starts with a dot.
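To illustrate, here is a small sketch that runs the pattern through preg_replace. The sample sentence is made up; foo.com is the excluded domain from the question.

```php
<?php
$pattern = '/(?<=\s|^)(?!\S*foo\.com)\S*\.(com|net|us|biz|org|info|xxx|mx|ca|fr|in|cn|hk|ng|pr|ph|tv|ru|ly|de|my|ir)\S*\b/im';

// Hypothetical input: one URL to strip and one on the excluded domain.
$text = 'see example.com and foo.com/page';

// The lookahead rejects any whitespace-delimited word containing foo.com,
// so only example.com is removed.
$clean = preg_replace($pattern, '', $text);

var_export($clean);
```

The removed URL leaves its surrounding spaces behind, so the result contains a double space where example.com used to be.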
Use preg_replace_callback instead.
Let your callback decide whether to replace.
It can give more flexibility if the requirements become too complicated for a simple regex.
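A minimal sketch of the callback approach, assuming a shortened TLD list (com, net, org) for brevity and the same made-up sample sentence; the exclusion test inside the callback is one possible choice, not the only one.

```php
<?php
// Hypothetical input: one URL to strip and one on the excluded domain.
$text = 'see example.com and foo.com/page';

// The pattern only finds URL-like words; the callback decides what to do.
$clean = preg_replace_callback(
    '/\S*\.(?:com|net|org)\S*/i',
    function (array $m): string {
        // Keep anything on the excluded domain, drop everything else.
        return stripos($m[0], 'foo.com') !== false ? $m[0] : '';
    },
    $text
);

var_export($clean);
```

Further rules, such as a whitelist array, can then be added in plain PHP without touching the pattern.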