How to fix this regular expression? - php

I have a regular expression that removes certain parts from a URI. However, it doesn't handle multiple parts correctly :-). Can somebody assist?
$regex = '~/{(.*?)}\*~';
$uri = '/user/{action}/{id}*/{subAction}*';
$newuri = preg_replace($regex, '' , $uri);
//$newuri = /user/
//Should be: $newuri = /user/{action}/
I know it matches the following part as one match:
/{action}/{id}/{subAction}
But it should match the following two separately:
/{id}*
/{subAction}*

To me it looks like your {(.*?)}\* test is matching all of {action}/{id}*, which judging from what you've written isn't what you want.
The .*? is already lazy, but it can still run across a closing brace. So replace it with a negated character class that stops at the first }:
'~/{([^}]*)}\*~'
But do you really need to capture the part inside the curly braces? Seems to me you could go with this one instead:
'~/{[^}]*}\*~'
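Either way, [^}]* cannot cross a closing brace, so a match has to end at the first } and that brace has to be followed by an asterisk. {action} is followed by a slash rather than an asterisk, so the expression will not match it.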
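As a quick check, here is a minimal sketch that runs the second pattern against the URI from the question. The braces are escaped here for readability, which should not change what the pattern matches. Note that the slash in front of each removed group is part of the match, so the result ends in {action} without a trailing slash.

```php
<?php
// A slash, a brace group that cannot span a '}', then a literal asterisk.
$regex = '~/\{[^}]*\}\*~';
$uri = '/user/{action}/{id}*/{subAction}*';

// Both starred groups are removed; {action} has no asterisk and is kept.
$newuri = preg_replace($regex, '', $uri);
echo $newuri, PHP_EOL; // /user/{action}
```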

Related

PHP - regex match a URL that contains parentheses

I'm using the following pattern to match URLs in a string:
$pattern = '%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s';
This works pretty well. However, the match fails with URLs like this:
https://twitter.com/search/from:username(exclude:replies)min_faves:20
It seems to stop at the parentheses. Any ideas on how I could modify the pattern to match this type of URL? Thanks in advance!
Take the parentheses out of your negated character class and it works:
[^\s<>]+
https://regex101.com/r/4amF6u/1/
Full version:
$pattern = '%\b(([\w-]+://?|www[.])[^\s<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s';
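A small sketch to try the full version against the URL from the question. The surrounding sentence in $text is made up for illustration; only the pattern and the URL come from the posts above.

```php
<?php
// Pattern from the answer: parentheses are no longer excluded from the URL body.
$pattern = '%\b(([\w-]+://?|www[.])[^\s<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s';

// Hypothetical input text wrapped around the URL from the question.
$url = 'https://twitter.com/search/from:username(exclude:replies)min_faves:20';
$text = "See $url for details";

// $m[0] holds the whole match if the pattern finds a URL.
$found = preg_match($pattern, $text, $m);
echo $m[0], PHP_EOL;
```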

Extract only numbers from link with codeception

I have this link, and I need to work with only the numbers from it.
How would I extract them?
I didn't find any answer that works with Codeception.
https://www.my-website.com/de/booking/extras#tab-nav-extras-1426
I tried something like this:
$I->grabFromCurrentUrl('\d+');
But it won't work.
Any ideas?
Staying within the framework:
The manual clearly says that:
grabFromCurrentUrl
Executes the given regular expression against the current URI and
returns the first capturing group. If no parameters are provided, the
full URI is returned.
Since you didn't use any capturing groups (...), nothing is returned.
Try this:
$I->grabFromCurrentUrl('~(\d+)$~');
The $ at the end is optional; it just requires the digits to sit at the very end of the string.
Also note that the pattern delimiters you would normally use (/) are replaced by tilde (~) characters for convenience, since the input string is likely to contain several forward slashes. Custom pattern delimiters are completely standard in PHP's PCRE functions, as @Naktibalda pointed out in this answer.
You can use parse_url() to parse the entire URL and then extract the part you are interested in. After that, you can use a regex to keep only the digits from that string.
$url = "https://www.my-website.com/de/booking/extras#tab-nav-extras-1426";
$parsedUrl = parse_url($url);
$fragment = $parsedUrl['fragment']; // Contains: tab-nav-extras-1426
$id = preg_replace('/[^0-9]/', '', $fragment);
var_dump($id); // Output: string(4) "1426"
A variant using preg_match() after parse_url():
$url = "https://www.my-website.com/de/booking/extras#tab-nav-extras-1426";
preg_match('/\d+$/', parse_url($url)['fragment'], $id);
var_dump($id[0]);
// Outputs: string(4) "1426"

Regex solution to find a regex pattern and parse it.

I am trying to write a simple router for PHP, and I am facing a problem. Examples of the routes are as follows.
$route = [];
$route['index'] = "/";
$route['home'] = "/home";
$route['blog'] = "/blog/[a-z]";
$route['article'] = "/article/id/[\d+]/title/[\w+]";
Taking the last example, I would like the regex to look only for patterns such as [\d+] and [\w+]. I will use explode() to check whether the URL contains /blog/, /id/ and /title/. I don't want the regex's help with that, only to detect the patterns and match them.
For example, if a given $URL was dev.test/blog/id/11/title/politics
I would need something like: preg_match($route['url'], $URL)
So preg_match() knows that after /article/id/ there is a pattern that accepts only digits. If a digit is found it continues parsing; otherwise it fails and returns 0.
I don't know enough about regex to handle this problem.
Your question is a little unclear, but if you only want to capture the [\d+] and [\w+] parts of the target string, consider using parentheses to capture sub-matches, and the (?:xxx) non-capturing group, which checks for the pattern but does not add it to the matches array. Note that in a regex [\d+] is a character class that matches a single character (a digit or a plus sign), so the quantifier has to go outside it, as in \d+. Something like:
$route['article'] = '~(?:/article/id/)(\d+)(?:/title/)(\w+)~';
With preg_match($route['article'], $URL, $matches), the whole match goes into $matches[0] and only the two captured values are added after it. You'll find them in:
$matches[1] and $matches[2].
See http://www.regular-expressions.info/tutorial.html for an outstanding tutorial on regexes, by the way.
If you aren't sure of the values of 'article', 'id', and 'title' in advance, then you will probably at least need to be sure of the number of path segments in the URL. As long as you know the positions of the [\d+] and [\w+] entries, you could use
$route['article'] = '~(?:/\w+/\w+/)(\d+)(?:/\w+/)(\w+)~';
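To illustrate how the captured values come back, here is a minimal sketch. The URL is a hypothetical variation of the one in the question, using /article/ so that it fits the route, and the non-capturing groups are dropped because plain literals behave the same way here.

```php
<?php
// Hypothetical route entry; '~' is the delimiter, so slashes need no escaping.
$route = [];
$route['article'] = '~/article/id/(\d+)/title/(\w+)~';

$url = 'dev.test/article/id/11/title/politics';

// preg_match() returns 1 on a match and fills $matches:
// [0] is the whole match, [1] and [2] are the two capture groups.
if (preg_match($route['article'], $url, $matches)) {
    $id = $matches[1];
    $title = $matches[2];
    echo "$id $title", PHP_EOL;
}

// A non-numeric id does not match, so preg_match() returns 0.
$noMatch = preg_match($route['article'], 'dev.test/article/id/abc/title/politics');
```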

preg_replace between anything between {} problems with javascripts

I need help solving this problem. I am not good with preg patterns, so maybe it is very simple :)
I have this one preg_replace in my template system:
$code = preg_replace('#\{([a-z0-9\-_].*?)\}#is', '\1', $code);
which works fine, but when I have some JavaScript code like this Google Plus button:
window.___gcfg = {lang: 'sk'};
it replaces it with this:
window.___gcfg = ;
I tried this pattern: #\{([a-z0-9\-_]*?)\}#is
That works well with the Google Plus button, but when I have something like this (Google AdSense code): (adsbygoogle = window.adsbygoogle || []).push({});
the result is (adsbygoogle = window.adsbygoogle || []).push();
I need a rule like the following, but I don't know why it is not working:
\{([a-z0-9-_])\} - Just letters, numbers, underscore and dash. Anything else I need to keep as it is.
Thank you for answers.
Edit:
More simple example of what I need:
{SOMETHING} -> do rewrite
{A_SOMETHING} -> do rewrite
{} -> do not rewrite
{name : 'me'} -> do not rewrite
So if there is something other than a-z0-9-_ or if there is nothing between {}, just do not rewrite and skip that.
So, it looks like you want to match curly braces where the contents are solely a-z0-9_-.
In that case, try:
$code = preg_replace('#\{([a-z0-9\-_]+?)\}#is',
'whatever_you_wanted_to_replace_with',
$code);
Your original regex said "match [a-z0-9_-] followed by 0 or more of anything" (the .*?).
This one says "match 1 or more of [a-z0-9_-]".
As to what you want to replace such things with, you haven't made it clear, so I assume you can do that bit.
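A minimal sketch that runs this pattern over the cases listed in the question. The replacement '[$1]' is only a placeholder chosen for illustration, since the real replacement was not given, and the lazy +? is written as a plain + because the closing brace already bounds the match.

```php
<?php
// One or more of a-z, 0-9, dash or underscore between braces (case-insensitive).
$pattern = '#\{([a-z0-9\-_]+)\}#i';

$samples = [
    '{SOMETHING}',
    '{A_SOMETHING}',
    '{}',
    "{name : 'me'}",
    "window.___gcfg = {lang: 'sk'};",
    '(adsbygoogle = window.adsbygoogle || []).push({});',
];

// Placeholder replacement: wrap the tag name in square brackets.
$results = [];
foreach ($samples as $sample) {
    $results[] = preg_replace($pattern, '[$1]', $sample);
}

print_r($results);
```

Only the first two samples change; the empty braces and both JavaScript snippets come back untouched.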
You can match script substrings with the first branch of the pattern and your template tags with the second. A script substring is then replaced by itself, and a template tag by its content.
Since the pattern uses the branch reset feature (?|...|...) the capture groups have the same number (i.e. the number 1).
$pattern = '#(?|(<script\b(?>[^<]++|<(?!/script>))+</script>)|{([\w-]++)})#i';
$code = preg_replace($pattern, '$1', $code);
Note that you can do the same without the branch reset feature, but you must change the replacement pattern:
$pattern = '#(<script\b(?>[^<]++|<(?!/script>))+</script>)|{([\w-]++)}#i';
$code = preg_replace($pattern, '$1$2', $code);
Another way is to use the backtracking control verbs (*SKIP) and (*FAIL) to skip script substrings. Once the subpattern to the right of (*SKIP) fails, the engine does not retry inside the substring already matched by the subpattern on its left. (*FAIL) makes the pattern fail immediately:
$pattern = '#<script\b(?>[^<]++|<(?!/script>))+</script>(*SKIP)(*FAIL)|{([\w-]++)}#i';
$code = preg_replace($pattern, '$1', $code);
The difference from the two preceding patterns is that the replacement does not need any reference to the script substrings.
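A short sketch of the (*SKIP)(*FAIL) variant in action. The template snippet in $code is made up for illustration; it puts a brace tag inside a script block, which should be left alone, and one outside it, which should be unwrapped.

```php
<?php
// Script blocks are matched and then discarded via (*SKIP)(*FAIL);
// only brace tags outside of them reach the second branch.
$pattern = '#<script\b(?>[^<]++|<(?!/script>))+</script>(*SKIP)(*FAIL)|{([\w-]++)}#i';

// Hypothetical template fragment.
$code = '<script>window.___gcfg = {lang};</script><p>{TITLE}</p>';

$result = preg_replace($pattern, '$1', $code);
echo $result, PHP_EOL;
// <script>window.___gcfg = {lang};</script><p>TITLE</p>
```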

Need help with a PHP preg_match

I'm not very good with preg_match so please forgive me if this is easy. So far I have...
preg_match("/".$clear_user."\/[0-9]{10}/", $old_data, $matches)
The thing I'm trying to match looks like..
:userid/time()
Right now, the preg_match would have a problem with 22/1266978013 and 2/1266978013. I need to figure out how to match the colon.
Also, is there a way to match all the numbers until the next colon instead of just the next 10 numbers, because time() could be more or less than 10.
Try this as your pattern:
#:$userId/[0-9]+#
preg_match("#:$userId/[0-9]+#", $old_data, $matches);
preg_match("/:".$clear_user."\/[0-9]{10}/", $old_data, $matches);
You need to extend your match to include the : delimiter in the pattern.
Failing to do so leads to the erratic behaviour you already experienced.
By the way, it is not so erratic. Take into account the two cases you mentioned:
22/1266978013 and 2/1266978013.
For user 2, the regex engine matches your pattern twice, at the parenthesized spots: :2(2/1266978013):(2/1266978013). If you include the field delimiter (:), you can be sure that only the intended target is affected.
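The effect of the delimiter can be checked with a small sketch. The data string is assembled from the two cases in the question, and the variable names are chosen for illustration. It uses [0-9]+ rather than {10} so the timestamp may have any length.

```php
<?php
// Two records in ":userid/time" form, for users 22 and 2.
$old_data = ':22/1266978013:2/1266978013';
$userId = 2;

// Without the colon, "2/..." is also found inside the record of user 22.
$withoutColon = preg_match_all("#$userId/[0-9]+#", $old_data, $loose);

// With the colon, only the record that really belongs to user 2 matches.
$withColon = preg_match_all("#:$userId/[0-9]+#", $old_data, $strict);

echo $withoutColon, ' ', $withColon, PHP_EOL; // 2 1
echo $strict[0][0], PHP_EOL;                  // :2/1266978013
```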
I would use preg_replace to substitute the pattern directly;
once you fire up the expensive regular expression engine, you should, in my view, let it do as much of the work as it can.
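$pattern = '#:'.$clear_user.'/[0-9]{10}#';
$replacement = ":$clear_user/$new_time";
// The fifth argument receives the number of replacements made.
$last_message = preg_replace($pattern, $replacement, $old_data, -1, $count);
if (!$count) {
    // No existing entry for this user: append a new one.
    $last_message .= $replacement;
}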
