PHP regex last occurrence of words - php

My string is: /var/www/domain.com/public_html/foo/bar/folder/another/..
I want to remove the root folder from this string so that only the path inside the public folder remains, because some servers host multiple websites.
My actual regex is: /^(.*?)(www|public_html|public|html)/s
My actual result is: /domain.com/public_html/foo/bar/folder/another/..
But I want to remove everything up to the last occurrence, and get something like this: /foo/bar/folder/another/..
Thanks!

You have to use a greedy quantifier and to check if the alternative is enclosed between slashes using lookarounds:
/^.*(?<![^\/])(?:www|public(?:_html)?|html)(?![^\/])/
About the lookarounds: I use negative lookarounds with a negated character class so that one test covers both cases, a slash or the limit of the string. This way you are sure that, for instance, html is a folder name and not part of another folder name.
I removed the s modifier, which is useless here. I removed the capture groups too, since the goal is to replace the whole match with an empty string.
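A minimal sketch of how this pattern could be applied with preg_replace, using the sample path from the question. The variable names are only illustrative, and the pattern is written with ~ delimiters (an assumption for readability) so the slashes need no escaping:

```php
<?php
// Sample path taken from the question.
$path = '/var/www/domain.com/public_html/foo/bar/folder/another/..';

// Same pattern as above, with ~ delimiters so "/" does not have to be escaped.
$pattern = '~^.*(?<![^/])(?:www|public(?:_html)?|html)(?![^/])~';

// Remove everything up to and including the last matching folder name.
$relative = preg_replace($pattern, '', $path);

echo $relative, PHP_EOL; // /foo/bar/folder/another/..
```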

The ? makes your expression non-greedy which is not actually what you want here. Try:
^(.*)(www|public_html|public|html)
which should keep going until the last match.
Demo: https://regex101.com/r/v5WbB3/1/
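For comparison, here is a small sketch that runs this plain greedy pattern and the lookaround pattern from the other answer against a made-up path. The path and variable names are hypothetical; the point is only that the plain alternation can also match inside a longer folder name such as myhtml:

```php
<?php
// Hypothetical path containing a folder whose name merely ends in "html".
$path = '/var/www/site/public_html/myhtml/x';

// Plain greedy pattern: matches up to the last occurrence of any keyword.
$plain = '~^(.*)(www|public_html|public|html)~';

// Lookaround pattern: the keyword must be a complete folder name.
$strict = '~^.*(?<![^/])(?:www|public(?:_html)?|html)(?![^/])~';

$plainResult  = preg_replace($plain, '', $path);  // cuts inside "myhtml"
$strictResult = preg_replace($strict, '', $path); // cuts after "public_html"

echo $plainResult, PHP_EOL;  // /x
echo $strictResult, PHP_EOL; // /myhtml/x
```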

Related

PHP preg_replace_callback match string but exclude urls

What I'm trying to do is find all the matches within a content block, but ignore anything that is inside tags, for use inside preg_replace_callback().
For example:
test
test title
test
In this case, I want the first line to match, and the third line to match, but NOT the url match, nor the title match in between the a tags.
I've got a regex that I feel like is close:
#(?!<.*?)(\btest\b)(?![^<>]*?>)#si
(and this will not match the url part)
But how do I modify the regex to also exclude the "test" between a and /a?
If it's always the same pattern you can use [A-Z] or a combination like [A-Za-z]
I ended up solving it myself. This regex pattern will do what I wanted:
#(?!<a[^>]*?>)(\btest\b)(?![^<]*?<\/a>)#si
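A rough sketch of that pattern inside preg_replace_callback. The sample markup is an assumption (the original example lost its tags), and the callback simply upper-cases each match so the effect is visible:

```php
<?php
// Assumed sample content: plain "test", a link mentioning "test", plain "test".
$content = "test\n<a href=\"test.html\" title=\"test\">test title</a>\ntest";

// Pattern from the post: skip "test" when only non-"<" text separates it from "</a>".
$pattern = '#(?!<a[^>]*?>)(\btest\b)(?![^<]*?<\/a>)#si';

// Upper-case every match that survives the lookaheads.
$result = preg_replace_callback(
    $pattern,
    function (array $m): string {
        return strtoupper($m[1]);
    },
    $content
);

echo $result, PHP_EOL;
```

Only the first and last lines change here. Note that the lookahead stops at the next <, so a "test" that sits before a nested tag inside a link would probably still match.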

How to remove backpath/parentpath from the URL?

Input:
http://foo/bar/baz/../../qux/
Desired Output:
http://foo/qux/
This can be achieved using a regular expression (unless someone can suggest a more efficient alternative).
If it was a forward look-up, it would be as simple as:
/\.\.\/[^\/]+/
Though I am not familiar with how to make a backward look-up for the first "/" (i.e. not doing /[a-z0-9-_]+\/\.\./).
One of the solutions I thought of is to use strrev then apply forward look up regex (first example) and then do strrev. Though I am sure there is a more efficient way.
Not the clearest question I've ever seen, but if I understand what you're asking, I think you only need to switch around what you have like this:
/[^\/]+/\.\./
...then replace that with a /
Do that until no replacements are made and you should have what you want
EDIT
Your attempt seems to try to match two dots \.\. followed by a slash / (or \/ - they should both match the same thing), then one or more non-slash characters [^/]+, terminated by a slash /. Flipping it around, you want to find a slash followed by one or more non-slash characters and a terminating slash, then two dots and a final slash.
You may be confused into thinking that the regex engine parses and consumes things as it goes (so you wouldn't want to consume a directory name that is not followed by the correct number of dots), but that's not how it typically works - a regex engine matches the entire expression before it replaces or returns anything. So, you can have two dots followed by a directory name, or a directory name followed by two dots - it doesn't make a difference to the engine.
If your attempt is using the slash-enclosed Perl-style syntax, then you would of course need to use \/ for any slashes you're trying to match such as the middle one, but I would also recommend matching and replacing the enclosing slashes in the url as well: I think the PHP would be something like
preg_replace('/\/[^\/]+\/\.\.\//', '/', $input)
(??)
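The "repeat until no replacements are made" step could look roughly like this. The function name is hypothetical, and the pattern is the same one as above written with ~ delimiters:

```php
<?php
/**
 * Collapse "/dir/../" pairs by replacing until nothing changes.
 * (Hypothetical helper name, for illustration only.)
 */
function collapseParentSegments(string $url): string
{
    do {
        // $count receives the number of replacements made in this pass.
        $url = preg_replace('~/[^/]+/\.\./~', '/', $url, -1, $count);
    } while ($count > 0);

    return $url;
}

echo collapseParentSegments('http://foo/bar/baz/../../qux/'), PHP_EOL; // http://foo/qux/
```

Because [^/]+ also accepts "..", a URL with more ".." segments than directories may collapse in unexpected ways; this sketch does not try to handle that case.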
Technically, what you want is to replace segments like '/path1/path2/../../' with '/'. To do that you need to match 'pathx/'^n '../'^n, which is definitely NOT a regular language (it is a context-free language) ... but most regex libraries support some non-regular languages and can (with a lot of effort) handle this kind of language.
An easy way to solve it is to stay within regular expressions and loop several times, replacing '[^./]+/\.\./' with ''
If you still want to do it in a single step, lookahead and grouping are needed, but it will be hard to write (I'm not so used to it, but I will try).
EDIT:
I've found a solution using only one regex, but it requires PCRE:
([^/.]+/(?1)?\.\./)
I've based my solution on the following link:
Match a^n b^n c^n (e.g. "aaabbbccc") using regular expressions (PCRE)
(Note that dots are "forbidden" in the first section, so you cannot have path.1/path.2/. Allowing them is quite a bit more complex, because you have to admit dots while still forbidding '../' as a valid name in the first section.)
This sub-expression admits path names like 'path1/':
[^/.]+/
This sub-expression admits the double dots:
\.\./
You can test the regexp at
https://www.debuggex.com/
(remember to set it in PCRE mode)
Here is a working copy:
https://eval.in/52675
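As a quick check, the recursive pattern can be tried from PHP like this (a sketch only; the ~ delimiters and variable names are my own choice). The (?1) call re-enters group 1, so every directory name is paired with its own "../":

```php
<?php
$url = 'http://foo/bar/baz/../../qux/';

// Group 1: a dot-free directory name, an optional nested copy of group 1, then "../".
$pattern = '~([^/.]+/(?1)?\.\./)~';

// Balanced "dir/.../../" runs are removed in a single pass.
$result = preg_replace($pattern, '', $url);

echo $result, PHP_EOL; // http://foo/qux/
```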

PHP string replace question

If I have a string that equals "firstpart".$unknown_var."secondpart", how can I delete everything between "firstpart" and "secondpart" (on a page that does not know the value of $unknown_var)?
Thanks.
Neel
substr_replace
start and length can be computed with strpos. Or you could go the regex route if you're comfortable learning about them.
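A minimal sketch of the substr_replace route, assuming both markers occur once and in order. The function name is hypothetical:

```php
<?php
/**
 * Remove whatever sits between $first and $second (hypothetical helper).
 * Returns the input unchanged if either marker is missing.
 */
function stripBetween(string $s, string $first, string $second): string
{
    $start = strpos($s, $first);
    if ($start === false) {
        return $s;
    }
    $start += strlen($first); // first character after the opening marker

    $end = strpos($s, $second, $start);
    if ($end === false) {
        return $s;
    }

    // Replace the span between the markers with an empty string.
    return substr_replace($s, '', $start, $end - $start);
}

echo stripBetween('firstpart' . 'some unknown value' . 'secondpart', 'firstpart', 'secondpart'), PHP_EOL;
// firstpartsecondpart
```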
As long as $unknown_var contains neither firstpart nor secondpart, you can match against
firstpart(.*)secondpart
and replace it with
firstpartsecondpart
You should use a regexp to do so.
preg_replace('/firstpart(.*)secondpart/','firstpartsecondpart',$yourstring);
will replace anything between the first occurrence of firstpart and the last occurrence of secondpart. If you want to delete the text between several firstpart/secondpart pairs, you can make the expression ungreedy by replacing (.*) with (.*?) in the expression:
preg_replace('/firstpart(.*?)secondpart/','firstpartsecondpart',$yourstring);
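To make the greedy/ungreedy difference concrete, here is a small sketch with a made-up input that contains two firstpart/secondpart pairs:

```php
<?php
// Hypothetical input with two marker pairs and text in between.
$s = 'firstpart A secondpart keep firstpart B secondpart';

// Greedy: spans from the first "firstpart" to the last "secondpart".
$greedy = preg_replace('/firstpart(.*)secondpart/', 'firstpartsecondpart', $s);

// Ungreedy: each pair is handled on its own.
$lazy = preg_replace('/firstpart(.*?)secondpart/', 'firstpartsecondpart', $s);

echo $greedy, PHP_EOL; // firstpartsecondpart
echo $lazy, PHP_EOL;   // firstpartsecondpart keep firstpartsecondpart
```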

How can I check if a string EXACTLY matches a regex pattern?

I'm working on a registration script for my client's product sales website.
I'm currently working on a reference ID input area, and I want to make sure that the reference ID is within the correct parameters of the payment method
The Reference ID will look something like this: XXXXX-XXXXX-XXXXX
I'm trying to use this RegEx pattern to match it: /(\w+){5}-(\w+){5}-(\w+){5}/
This matches it perfectly, but it also matches XXXXX-XXXXX-XXXXXXXXXX
Or at least it finds a match in there. I want it to make sure the entire string matches. I'm not too familiar with RegEx
How can I do this?
You need to use start and end anchors. Also, if you don't need to capture those groups, you can omit the parentheses.
Also, (\w+){5} means "one or more word characters, repeated exactly five times". I believe you didn't want that, so I dropped the +.
/^\w{5}-\w{5}-\w{5}\z/
Also, I used \z so your string doesn't match "abcde-12345-edcba\n".
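A short sketch comparing the unanchored pattern from the question, a $-anchored version, and the \z version on three sample inputs (the inputs and array keys are made up for illustration):

```php
<?php
$patterns = [
    'unanchored'  => '/(\w+){5}-(\w+){5}-(\w+){5}/',
    'dollar'      => '/^\w{5}-\w{5}-\w{5}$/',
    'backslash_z' => '/^\w{5}-\w{5}-\w{5}\z/',
];

$inputs = [
    'ABCDE-12345-EDCBA',      // well-formed ID
    'ABCDE-12345-EDCBAEDCBA', // last group too long
    "ABCDE-12345-EDCBA\n",    // well-formed ID plus trailing newline
];

// $results[name] holds one boolean per input, in the order above.
$results = [];
foreach ($patterns as $name => $pattern) {
    foreach ($inputs as $input) {
        $results[$name][] = preg_match($pattern, $input) === 1;
    }
}

var_export($results);
```

The unanchored pattern accepts all three inputs, the $ version still accepts the one with the trailing newline, and only the \z version rejects both bad inputs.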
Use ^ and $ to match the start and end of the input string, respectively.
Also note that your use of + was superfluous, as (\w+){5} means "a word character, at least once, times five", so each group can match five or more characters. You probably meant (\w){5} (or just \w{5} if you don't need the backreference; I'll assume in my example that you do).
/^(\w){5}-(\w){5}-(\w){5}$/
Put the regular expression between ^ and $ to match the whole string, and check whether it matches anything. The + has to go as well, since (\w+){5} would still accept groups longer than five characters.
Example:
/^\w{5}-\w{5}-\w{5}$/
Try
/^([\w]{5,5})-([\w]{5,5})-([\w]{5,5})$/i
There are several online regex testers out there; I work with this one before I code.
Enclose it in "^" and "$" (and drop the +, which lets each group run past five characters) thus:
/^\w{5}-\w{5}-\w{5}$/
You need ^ to match the start of the string and $ to match the end:
/^\w{5}-\w{5}-\w{5}$/
Note that (\w+){5} is incorrect because that means five repetitions of \w+, but that in turn means "one or more word characters".
/^(\w){5}-(\w){5}-(\w){5}$/
You need to explicitly say that you want the pattern to start at the beginning of the string and end at its end.
You can improve it: /^((\w){5}-){2}(\w){5}$/ ; this way, you can easily modify the number of elements your serial number might have.
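Building on that idea, the group count and group length could be turned into parameters. This is only a sketch: the function name is hypothetical, it uses non-capturing groups, and it ends with \z rather than $ so a trailing newline is rejected:

```php
<?php
/**
 * Build an anchored pattern for IDs such as XXXXX-XXXXX-XXXXX.
 * (Hypothetical helper; $groups must be at least 1.)
 */
function referenceIdPattern(int $groups, int $length): string
{
    // ($groups - 1) groups followed by "-", then one final group without it.
    return sprintf('/^(?:\w{%d}-){%d}\w{%d}\z/', $length, $groups - 1, $length);
}

echo referenceIdPattern(3, 5), PHP_EOL; // /^(?:\w{5}-){2}\w{5}\z/
var_dump(preg_match(referenceIdPattern(3, 5), 'XXXXX-XXXXX-XXXXX') === 1);
```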
Use ^ and $ to mark the start and end of the regex string:
/^\w{5}-\w{5}-\w{5}$/
http://www.regular-expressions.info/anchors.html
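In preg, \b marks word boundaries. So you could try something like the following, keeping in mind that it only rejects extra word characters right next to the ID and will still find an ID inside a longer string:
/\b\w{5}-\w{5}-\w{5}\b/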

What does this Regular Expression do

$pee = preg_replace( '|<p>|', "$1<p>", $pee );
This regular expression is from the Wordpress source code (formatting.php, wpautop function); I'm not sure what it does, can anyone help?
Actually I'm trying to port this function to Python...if anyone knows of an existing port already, that would be much better as I'm really bad with regex.
The preg_replace() function - somewhat confusingly - allows you to use other delimiters besides the standard "/" for regular expressions, so
"|<p>|"
Would be a regular expression just matching
"<p>"
in the text. However, I'm not clear on what the replacement parameter of
"$1<p>"
would be doing, since there's no grouping to map to $1. It would seem like as given, this is just replacing a paragraph tag with an empty string followed by a paragraph tag, and in effect doing nothing.
Anyone with more in-depth knowledge of PHP quirks have a better analysis?
wordpress really calls a variable "pee" ?
I'm not sure what the $1 stands for (there are no parentheses in the first parameter?), so I don't think it actually does anything, but I could be wrong.
...?
Actually, it looks like this takes the first <p> tag and prepends the previous regular expression's first match to it (since there's no match in this one).
However, it seems that this behavior is bad to say the least, as there's no guarantee that preg_* functions won't clobber $1 with their own values.
Edit: Judging from Jay's comment, this regex actually does nothing.
The pipe symbols | in this case do not have the default meaning of "match this or that" but are used as alternative delimiters for the pattern instead of the more common slashes /. This may make sense if you want to match / without having to escape each occurrence (e.g. /(.*)\/(.*)\// is not as readable as #/(.*)/(.*)/#). It seems quite counterproductive to use | instead, though, since it is just another reserved character in patterns.
Normally $1 in the replacement pattern should match the first group denoted by parentheses. E.g if you've got a pattern like
"(.*)<p>"
$0 would contain the whole match and $1 the part before the <p>.
As the given reg-ex does not declare any groups and $1 is not a valid name for a variable (in PHP4) defined elsewhere, this call seems to replace any occurrences of <p> with <p>?
To be honest, now I'm also quite confused. Just a guess: does another pattern-matching function (preg_match and the like) get called before the given line, so that the $1 is "leaked" from there?
I highly recommend the amazing RegexBuddy
I believe that line does nothing.
For what it's worth, this is the previous line, in which $1 is set:
$pee = preg_replace('!<p>([^<]+)\s*?(</(?:div|address|form)[^>]*>)!', "<p>$1</p>$2", $pee);
However, I don't think that's worth anything. In my testing, $1 does not maintain a value from one preg_replace to the next, even if the next doesn't set its own value for $1. Remember that PHP variable names cannot begin with a number (see: http://php.net/language.variables ), so $1 is not a PHP variable. It only means something within a single preg_replace, and in this case the rules of preg_replace suggest it doesn't mean anything.
That said, autop being such a widely-used function makes me doubt my own conclusion that this line is doing nothing. So I look forward to someone correcting me.
The regex simply matches the literal text <p>. The choice to delimit the regex with the vertical bar instead of forward slashes is very unfortunate. It doesn't change the code, but it makes it harder for humans to read. (It also makes it impossible to use the alternation operator in the regex.)
$1 is not a valid variable name in PHP, so $1 is never interpolated in double-quoted strings. The $1 gets passed to preg_replace unchanged. preg_replace parses the replacement string, and replaces $1 with the contents of the first capturing group. If there is no capturing group, $1 is replaced with nothing.
Thus, this code does the same as:
$pee = preg_replace( '/<p>/', "<p>", $pee );
It's not correct that this does nothing. The search-and-replace will run, slowing down your software, and eating up memory for temporary copies of $pee.
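This is easy to check with a short sketch. The sample strings are made up, and the first call only mimics the shape of the preceding WordPress line so that a capturing group has been used just before:

```php
<?php
// A preceding call that does capture something into $1.
$first = preg_replace('!<p>([^<]+)!', '<p>[$1]', 'x<p>hello');

// "$1" is not a PHP variable, so the double-quoted string is passed through as-is.
$replacement = "$1<p>";

// No capturing group in this pattern, so $1 expands to an empty string.
$pee = 'intro<p>text</p>';
$out = preg_replace('|<p>|', $replacement, $pee);

var_dump($first);                  // string(12) "x<p>[hello]"
var_dump($replacement === '$1<p>'); // bool(true)
var_dump($out === $pee);            // bool(true): the text is unchanged
```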
It replaces the match from the pattern
"|<p>|"
by the string
"$1<p>"
The | in a pattern normally causes the regex engine to match either the part on the left side or the part on the right side.
I do not get why it's used that way here, because usually it's for something like "ta(b|p)e"...
For the $1, I guess the variable $1 is in the PHP code and it is replaced during the preg_replace, so if $1 = "test"; the replacement will turn the
"<p>"
to
"test<p>"
But I am not sure of it for the $1
