Using regex to match multiple optional arguments in an enclosed custom function - php

I'm building a website and I have created a couple of custom functions that add certain information on the page. One of them is as follows:
{{i|Custom argument 1|Custom argument 2|Custom argument 3|...|Custom argument n}}
The number of arguments has no limit, and they all follow the same pattern, starting with |, and finishing with another |, or }} for the last argument.
I'm trying to capture all custom arguments with a RegEx pattern, and then use preg_replace_callback to make the replacements.
I managed to create a RegEx pattern that works with 1 optional argument, but I can't manage to add more. A single pattern to capture all arguments as groups would be the best scenario.
Thanks!
This is my current code that works with 1 mandatory $match[1] and 1 optional argument $match[2].
I'm trying to get the 2nd optional argument $match[3] to work.
$text = preg_replace_callback("{{{i\|(.*?)(?:\|(.*?))?}}}",
function($match)
{
return ''. (isset($match[3]) ? $match[3] : "Default text") .'';
}, $text);
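One possible approach, offered only as a sketch and not taken from the question: capture everything between {{i| and }} as a single group, then split it on | inside the callback, so the number of arguments does not have to be known in advance. The sample $text and the $result variable are assumptions for illustration.

```php
<?php
// Sample input (assumed): one call with three arguments, one with a single argument.
$text = 'Intro {{i|first|second|third}} and {{i|only one}}.';

$result = preg_replace_callback(
    '/\{\{i\|(.*?)\}\}/s',
    function (array $match): string {
        // $args[0] is the mandatory argument; any further entries are optional.
        $args = explode('|', $match[1]);
        // Use the third argument when present, otherwise fall back to a default.
        return $args[2] ?? 'Default text';
    },
    $text
);

echo $result, PHP_EOL;
```

Splitting in the callback sidesteps the fact that a repeated capture group in PCRE keeps only its last repetition, which is why a single pattern cannot return an open-ended list of groups.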

Related

Can't find a correct regular expression for parsing

I'm trying to write my own template engine which gives me the possibility to use something like placeholders inside html.
An example regex I've tried:
{:(.+):\s(.*)}
It works, but unfortunately not as I want. I'm trying to get the name of the placeholder and its arguments, preferably split.
Here's an example:
https://regex101.com/r/Z7KahK/1
First, it matches "var" and "any_id". My parser looks for a variable named "any_id" and will replace the whole placeholder.
Second (and further) is for example "render". It matches the name of the placeholder and all its arguments together, which I don't really want. I want to match every argument alone.
For example:
Match 1 = render
Match 2 = partial (_order/service_section)
Match 3 = args (myArguments)
This placeholder will be replaced with a new file (provided as argument "partial"), with additional arguments ("args") given in this argument.
I'm not really sure if you understand what I'm trying to say, but I tried my best to write it as best as possible.

Using different names for subpatterns of the same number with preg_replace_callback

I'm having a hard time getting my head around what exactly is being numbered in my regex subpatterns. I'm being given the PHP warning:
PHP Warning: preg_replace_callback(): Compilation failed: different names for subpatterns of the same number are not allowed
When attempting the following:
$input = "A string that contains [link-ssec-34] and a [i]word[/i] here";
$matchLink = "\[link-ssec-(0?[1-9]|[1-9][0-9]|100)\]";
$matchItalic = "\[i](.+)\[\/i]";
$output = preg_replace_callback(
"/(?|(?<link>$matchLink)|(?<italic>$matchItalic))/",
function($m) {
if(isset($m['link'])){
$matchedLink = substr($m['link'][0], 1, -1);
//error_log('m is: ' . $matchedLink);
$linkIDExplode = explode("-",$matchedLink);
$linkHTML = createSubSectionLink($linkIDExplode[2]);
return $linkHTML;
} else if(isset($m['italic'])){
// TO DO
}
},
$input);
If I remove the named capture groups, like so:
"/(?|(?:$matchLink)|(?:$matchItalic))/"
There are no warnings, and I get matches fine but can't target them conditionally in my function. I believe I'm following the correct procedure for naming capture groups, but PHP is saying they're using the same subpattern number, which is where I'm lost, as I'm not sure what's being numbered. I'm familiar with addressing subpatterns using $1, $2, etc. but don't see the relevance here when used with named groups.
Goal
In case I'm using completely the wrong technique, I should include my goal. I was originally using preg_replace_callback() to replace tagged strings that matched a pattern like so:
$output = preg_replace_callback(
"/\[link-ssec-(0?[1-9]|[1-9][0-9]|100)\]/",
function($m) {
$matchedLink = substr($m[0], 1, -1);
$linkIDExplode = explode("-",$matchedLink);
$linkHTML = createSubSectionLink($linkIDExplode[2]);
return $linkHTML;
},
$input);
The requirement has grown to needing to match multiple tags in the same paragraph (my original example included the next one, [i]word[/i]). Rather than parsing the entire string from scratch for each pattern, I'm trying to look for all the patterns in a single sweep of the paragraph/string in the belief that it will be less taxing on the system. Researching it led me to believe that using named capture groups in a branch reset was the best means of being able to target matches with conditional statements. Perhaps I'm walking down the wrong trail with this one but would appreciate being directed to a better method.
Result Desired
$input = "A string that contains [link-ssec-34] and a [i]word[/i] here";
$output = "A string that contains <a href='linkfromdb.php'>Link from Database</a> and a <span class='italic'>word</span> here."
With the potential to add further patterns as needed in the format of square brackets encompassing a word or being self-contained.
To answer your question about the warning:
PHP Warning: preg_replace_callback(): Compilation failed: different names for subpatterns of the same number are not allowed
Your pattern defines named capture groups, but it wraps them in a branch reset group (?|...), where each alternative (|) restarts group numbering from the same number, so a whole part of the pattern does not need to be matched at all.
That means that the named group link gets match number 1, and italic gets match number 1 as well.
Since both groups share the same number, they are only allowed to have the same NAME:
#(?|(?<first>one)|(?<first>two))#
would be allowed.
#(?|(?<first>one)|(?<second>two))#
throws this warning.
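A minimal sketch of the allowed form, assuming a plain preg_match call on a made-up subject string: both branches name group number 1 "first", so the match is reachable by that name or by number.

```php
<?php
// Both branches use the same name for group number 1, so the pattern compiles.
$pattern = '#(?|(?<first>one)|(?<first>two))#';

preg_match($pattern, 'two', $m);

echo $m['first'], PHP_EOL; // access by name
echo $m[1], PHP_EOL;       // the same group, accessed by number
```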
Without fully understanding what I've done (but I will look into it now), I did some trial and error on @bobblebubble's comment and got the following to produce the desired result. I can now use conditional statements targeting named capture groups to decide what action to take with matches.
I changed the regex to the following:
$matchLink = "\[link-ssec-(0?[1-9]|[1-9][0-9]|100)\]"; // matches [link-ssec-N]
$matchItalic = "\[i](.+)\[\/i]"; // matches [i]word[/i]
$output = preg_replace_callback(
"/(?<link>$matchLink)|(?<italic>$matchItalic)/",
function($m) { etc...
Hopefully it's also an efficient way, in terms of overhead, of matching multiple regex patterns with callbacks in the same string.
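A runnable sketch of that approach follows. The createSubSectionLink() body here is a hypothetical stand-in (the real one reads from a database), and the italic pattern uses .+? instead of .+ as an assumption, so that two italic tags in one string would not be merged. Note that when the italic branch matches, PHP still fills $m['link'] with an empty string, so the callback tests with empty() rather than isset().

```php
<?php
// Hypothetical stand-in for the asker's createSubSectionLink().
function createSubSectionLink(string $id): string
{
    return "<a href='section.php?id=" . $id . "'>Section " . $id . "</a>";
}

$input = "A string that contains [link-ssec-34] and a [i]word[/i] here";

$matchLink   = '\[link-ssec-(0?[1-9]|[1-9][0-9]|100)\]'; // matches [link-ssec-N]
$matchItalic = '\[i](.+?)\[\/i]';                         // matches [i]word[/i]

$output = preg_replace_callback(
    "/(?<link>$matchLink)|(?<italic>$matchItalic)/",
    function (array $m): string {
        // Group numbers: link = 1, its ID = 2, italic = 3, its text = 4.
        if (!empty($m['italic'])) {
            return "<span class='italic'>" . $m[4] . "</span>";
        }
        return createSubSectionLink($m[2]);
    },
    $input
);

echo $output, PHP_EOL;
```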

Regex: ignoring match with two brackets

I try to match markup by regex:
1. thats an [www.external.com External Link], as you can see
2. thats an [[Internal Link]], as you can see
That should result in
1. thats an [External Link](www.external.com), as you can see
2. thats an [Internal Link](wiki.com/Internal Link), as you can see
Both work fine with these preg_replace calls:
1. $line = preg_replace("/(\[)(.*?)( )(.*)(\])/", "[$4]($2)", $line);
2. $line = preg_replace("/(\[\[)(.*)(\]\])/", "[$2](wiki.com/$2)", $line);
But they interfere with each other, so applying the replacements one after the other returns ugly results. So I am trying to make one of the matches ignore the other. I tried to replace the first regex with this one:
([^\[]{0,})(\[)([^\[]{1,})( )(.*)(])
It should check that there is only one [ and that the characters before and after it aren't [. But it's still matching the [Internal Link] within the outer [], when it should ignore this part completely.
With preg_replace_callback you can build a pattern to handle the two cases and to define a conditional replacement in the callback function. In this way the string is parsed only once.
$str = <<<'EOD'
1. thats an [www.external.com External Link], as you can see
2. thats an [[Internal Link]], as you can see
EOD;
$domain = 'wiki.com';
$pattern = '~\[(?:\[([^]]+)]|([^] ]+) ([^]]+))]~';
$str = preg_replace_callback($pattern, function ($m) use ($domain) {
return empty($m[1]) ? "[$m[3]]($m[2])" : "[$m[1]]($domain/$m[1])";
}, $str);
echo $str;
The pattern uses an alternation (?: xxx | yyy). The first branch describes internal links and the second external links.
When the second branch succeeds, capture group 1 is empty (but defined). The callback function has to test it to know which branch succeeded and to return the appropriate replacement string.

Error trying to pass regex match to function

I'm getting Syntax error, unexpected T_LNUMBER, expecting T_VARIABLE or '$'
This is the code I'm using:
function wpse44503_filter_content( $content ) {
$regex = '#src=("|\')'.
'(/images/(19|20)(0-9){2}/(0|1)(0-9)/[^.]+\.(jpg|png|gif|bmp|jpeg))'.
'("|\')#';
$replace = 'src="'.get_site_url( $2 ).'"';
$output = preg_replace( $regex, $replace, $content );
return $output;
}
This is the line where I'm getting that error: $replace = 'src="'.get_site_url( $2 ).'"';
Can anyone help me to fix it?
Thanks
You can't have '$2' as a variable name. It must start with a letter or underscore.
http://php.net/manual/en/language.variables.basics.php
Variable names follow the same rules as other labels in PHP. A valid variable name starts with a letter or underscore, followed by any number of letters, numbers, or underscores. As a regular expression, it would be expressed thus: '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'
Edit: the above was my original answer and is the correct answer to the simple "syntax error" question. More in-depth answer below...
You are trying to use $2 to represent "the second capture group", but you haven't done anything at that point to match your regex. Even if $2 was a valid PHP variable name, it still wouldn't be set at that point in your script. Because of this, you can determine that you are using preg_replace improperly and that it may not suit your actual needs.
Note that the preg_replace documentation doesn't support using $n as a separate variable outside of the replacement operation. In other words, 'foo' . $1 . 'bar' is not a valid replacement string, but 'foo$1bar' is.
Depending on the complexity of get_site_url, you have 2 options:
If get_site_url is simply adding a root directory or server name, you could change your replacement string to src="/myotherlocation$2". This will effectively replace "/image/..." with "/myotherlocation/image/..." in the img src. This will not work if get_site_url is doing something more complex.
If get_site_url is complex, you should use preg_replace_callback per other answers. Give the documentation a read and post a new question (or I guess update this question?) if you have trouble with the implementation.
What you're trying to do (ie replacing the matched string with the result of a function call) can't be done using preg_replace, you'll need to use preg_replace_callback instead to get a function called for every match.
A short example of preg_replace_callback:
$get_site_url = // Returns replacement
function($row) {
return '!'.$row[1].'!'; // row[1] is first "backref"
};
$str = 'olle';
$regex = '/(ll)/'; // String to match
$output = preg_replace_callback( // Match, calling get_site_url for replacement
$regex,
$get_site_url,
$str);
var_dump($output); // output "o!ll!e"
PHP variable names can't begin with a number.
$2 is not a valid PHP variable. If you meant the second group in the regex then you want to put \2 in a string. However, since you're passing it to a function then you'll need to use preg_replace_callback() instead and substitute appropriately in the callback.
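Applied to the function from the question, the callback approach might look like the sketch below. example_site_url() is a hypothetical stand-in so the code runs outside WordPress, and the (0-9) groups from the question are written as [0-9] character classes on the assumption that digits were intended.

```php
<?php
// Hypothetical stand-in for get_site_url(): prefixes a fixed address to a path.
function example_site_url(string $path): string
{
    return 'https://example.com' . $path;
}

function wpse44503_filter_content(string $content): string
{
    // [0-9] is a character class; the question's (0-9) would only match
    // the literal text "0-9".
    $regex = '#src=("|\')'
        . '(/images/(19|20)[0-9]{2}/(0|1)[0-9]/[^.]+\.(jpg|png|gif|bmp|jpeg))'
        . '("|\')#';

    return preg_replace_callback(
        $regex,
        function (array $m): string {
            // $m[2] is the second capture group: the image path.
            return 'src="' . example_site_url($m[2]) . '"';
        },
        $content
    );
}

$html = '<img src="/images/2012/03/photo.jpg">';
echo wpse44503_filter_content($html), PHP_EOL;
```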
If a property name begins with a number, use the following brace syntax.
I ran into this when I was getting the following result set from a third-party API.
Code Works
$stockInfo->original->data[0]->close_yesterday
Code Failed
$stockInfo->original->data[0]->52_week_low
Solution
$stockInfo->original->data[0]->{'52_week_low'}
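A self-contained sketch of the brace syntax, using made-up JSON in place of the third-party API response (the field values are illustrative only):

```php
<?php
// Illustrative data only; the field names mimic the API response above.
$data = json_decode('{"close_yesterday": 101.5, "52_week_low": 80.25}');

echo $data->close_yesterday, PHP_EOL; // ordinary property access
echo $data->{'52_week_low'}, PHP_EOL; // braces allow a name that starts with a digit
```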

PHP preg_replace non-greedy trouble

I've been using the following site to test a PHP regex so I don't have to constantly upload:
http://www.spaweditor.com/scripts/regex/index.php
I'm using the following regex:
/(.*?)\.{3}/
on the following string (replacing with nothing):
Non-important data...important data...more important data
and preg_replace is returning:
more important data
yet I expect it to return:
important data...more important data
I thought the ? is the non-greedy modifier. What's going on here?
Your non-greedy modifier is working as expected. But preg_replace replaces all occurrences of the (non-greedy) match with the replacement text ("" in your case). If you want only the first one replaced, you could pass 1 as the optional 4th argument (limit) to the preg_replace function (PHP docs for preg_replace). On the website you linked, this can be accomplished by typing 1 into the text input between the word "Flags" and the word "limit".
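The effect of the limit argument on the string from the question can be sketched as follows; the variable names are chosen only for illustration.

```php
<?php
$subject = 'Non-important data...important data...more important data';

// Without a limit, every non-greedy match is removed.
$all = preg_replace('/(.*?)\.{3}/', '', $subject);

// With a limit of 1 (the fourth argument), only the first match is removed.
$first = preg_replace('/(.*?)\.{3}/', '', $subject, 1);

echo $all, PHP_EOL;   // more important data
echo $first, PHP_EOL; // important data...more important data
```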
Just an actual example of @Asaph's solution. In this example you don't need non-greediness because you can specify a limit.
Replace just the first occurrence of # in a line with a marker:
$line=preg_replace('/#/','zzzzxxxzzz',$line,1);
