preg_match_all with callback? - php

I'd like to replace numeric matches as they are found and convert them to hexadecimal.
I was wondering whether this is possible without a foreach loop.
In other words, everything of the form:
= {numeric value} ;
should be converted to:
= {hexadecimal numeric value} ;
preg_match_all('/\=[0-9]\;/',$src,$matches);
Is there a callback for preg_match_all, so that instead of performing a loop afterwards I can manipulate each match as soon as preg_match_all finds it (in real time)?
This is not correct syntax, but it should give you the idea:
preg_match_all_callback('/\=[0-9]\;/',$src,$matches,{convertAll[0-9]ToHexadecimal});

You want preg_replace_callback().
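You would match them with a regex like /=(\d+?);/ (the capture group isolates the digits) and then your callback would look like...
function($matches) { return '=' . dechex($matches[1]) . ';'; }
Combined, it gives us...
preg_replace_callback('/=(\d+?);/', function($matches) {
    // The whole match (including = and ;) is replaced, so re-emit the delimiters
    return '=' . dechex($matches[1]) . ';';
}, $str);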
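Alternatively, you could use a positive lookbehind and lookahead to match the delimiters without consuming them, so that only the digits are replaced and the callback can simply return dechex($matches[0]). Note that the callback receives an array of matches, not a string, so 'dechex' cannot be passed directly as the callback.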
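As a minimal runnable sketch of both variants (the sample input in $src is invented for illustration, and it assumes there is no whitespace between the delimiters and the digits):

```php
<?php
// Sample input, made up for this example.
$src = 'a=255;b=16;c=7;';

// Variant 1: capture group; the callback re-emits the "=" and ";" delimiters.
$viaGroup = preg_replace_callback('/=(\d+);/', function (array $m) {
    return '=' . dechex((int) $m[1]) . ';';
}, $src);

// Variant 2: lookbehind/lookahead; only the digits are part of the match,
// so the delimiters are left untouched.
$viaLookaround = preg_replace_callback('/(?<==)\d+(?=;)/', function (array $m) {
    return dechex((int) $m[0]);
}, $src);

echo $viaGroup, "\n";      // a=ff;b=10;c=7;
echo $viaLookaround, "\n"; // a=ff;b=10;c=7;
```

Both calls return the converted string in a single pass, so no separate foreach over the matches is needed.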

Or you could use the T-Regx library, which offers automatic delimiters, exceptions instead of warnings, and a cleaner API:
pattern('=(\d+?);')->replace($str)->group(1)->callback('dechex');
or if you prefer the anonymous function
pattern('=(\d+?);')->replace($str)->group(1)->callback(function (Group $group) {
return dechex($group);
});

Related

Matching substrings with PHP preg_match_all()

I'm attempting to create a lightweight BBCode parser without hardcoding regex matches for each element. My approach uses preg_replace_callback() to process each match in the callback.
My simple yet frustrating approach uses a regex to capture the element's name and then handles each element differently with a switch.
Here is my regex pattern:
'~\[([a-z]+)(?:=(.*))?(?: (.*))?\](.*)(?:\[/\1\])~siU'
And here is the preg_replace_callback() I've got to test.
return preg_replace_callback(
'~\[([a-z]+)(?:=(.*))?(?: (.*))?\](.*)(?:\[/\1\])~siU',
function($matches) {
var_dump($matches);
return "<".$matches[1].">".$matches[4]."</".$matches[1].">";
},
$this->raw
);
This one issue has stumped me. The regex pattern won't seem to recursively match, meaning if it matches an element, it won't match elements inside it.
Take this BBCode for instance:
[i]This is all italics along with a [b]bold[/b].[/i]
This will only match the [i], and won't match any of the elements inside of it, so the result looks like
This is all italics along with a [b]bold[/b].
preg_match_all() continues to show this to be the case, and I've tried messing with greedy syntax and modes.
How can I solve this?
Thanks to @Casimir et Hippolyte for their comment, I was able to solve this using a do...while loop and the count parameter, as they suggested.
The basic regex strings don't work because I would like to use values in the tags like [color=red] or [img width=""].
Here is the finalized code. It isn't perfect but it works.
$str = $this->raw;
do {
$str = preg_replace_callback(
'~\[([a-z]+)(?:=([^]\s]*))?(?: ([^[]*))?\](.*?)(?:\[/\1\])~si',
function($matches) {
return "<".$matches[1].">".$matches[4]."</".$matches[1].">";
},
$str,
-1,
$count
);
} while ($count);
return $str;
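For experimenting outside the class, the loop above can be wrapped in a standalone function. This is only a sketch: the function name bbcode_to_html is made up for illustration, and like the original it ignores tag attributes and does no HTML escaping.

```php
<?php
// Hypothetical helper name; repeats the replacement until no tag matches,
// so nested elements are converted from the outside in.
function bbcode_to_html(string $raw): string
{
    $str = $raw;
    do {
        $str = preg_replace_callback(
            '~\[([a-z]+)(?:=([^]\s]*))?(?: ([^[]*))?\](.*?)(?:\[/\1\])~si',
            function (array $matches) {
                // $matches[1] is the tag name, $matches[4] the inner content
                return '<' . $matches[1] . '>' . $matches[4] . '</' . $matches[1] . '>';
            },
            $str,
            -1,
            $count // number of replacements made in this pass
        );
    } while ($count);

    return $str;
}

echo bbcode_to_html('[i]This is all italics along with a [b]bold[/b].[/i]');
// <i>This is all italics along with a <b>bold</b>.</i>
```

The first pass converts the outer [i] element, the second pass converts the [b] inside it, and the third pass finds nothing, which ends the loop.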

preg_replace. How to use \0 in function?

I want to use a class method as the second argument of preg_replace, like
$x = preg_replace('/\[\[\[(.+)\]\]\]/',
(new ButtonGroupWidget(['idsForLoad' => ['\0']]))->run(),
$code);
The idea is to generate buttons in place of [[[button id]]]. Yes, this is kind of strange, and yes, I know what Smarty is.
You may use preg_replace_callback and pass a callback function instead of a string replacement pattern into that function. If you define the match argument as $m, the whole match will reside in $m[0] (and the captured ID, without the brackets, in $m[1]).
function repl($m) {
return (new ButtonGroupWidget(['idsForLoad' => [$m[0]]]))->run();
}
$code = "[[[btn1]]] [[[btn2]]]";
$x = preg_replace_callback('/\[\[\[(.+?)]]]/', 'repl', $code);
I also advise using a lazy dot pattern (.+?) so that the regex matches the shortest string between [[[ and ]]]. Note that ] does not have to be escaped here.
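To see what the callback receives, here is a small self-contained sketch. ButtonGroupWidget is not available here, so render_button is a hypothetical stand-in for (new ButtonGroupWidget([...]))->run(), and it uses the captured ID in $m[1] rather than the full match.

```php
<?php
// Hypothetical stand-in for the widget's run() method.
function render_button(string $id): string
{
    $safe = htmlspecialchars($id);
    return '<button data-id="' . $safe . '">' . $safe . '</button>';
}

$code = '[[[btn1]]] [[[btn2]]]';

$x = preg_replace_callback(
    '/\[\[\[(.+?)]]]/',
    function (array $m): string {
        // $m[0] is the full match, e.g. "[[[btn1]]]"; $m[1] is just "btn1"
        return render_button($m[1]);
    },
    $code
);

echo $x;
// <button data-id="btn1">btn1</button> <button data-id="btn2">btn2</button>
```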

How to use preg_replace_callback?

I have the following HTML statement
[otsection]Wallpapers[/otsection]
WALLPAPERS GO HERE
[otsection]Videos[/otsection]
VIDEOS GO HERE
What I am trying to do is replace the [otsection] tags with an html div. The catch is I want to increment the id of the div from 1->2->3, etc..
So for example, the above statement should be translated to
<div class="otsection" id="1">Wallpapers</div>
WALLPAPERS GO HERE
<div class="otsection" id="2">Videos</div>
VIDEOS GO HERE
As far as I can tell from my research, the best way to do this is via preg_replace_callback, incrementing the id variable between replacements. But after an hour of working on this, I just can't get it working.
Any assistance with this would be much appreciated!
Use the following:
$out = preg_replace_callback(
"(\[otsection\](.*?)\[/otsection\])is",
function($m) {
static $id = 0;
$id++;
return "<div class=\"otsection\" id=\"ots".$id."\">".$m[1]."</div>";
},
$in);
In particular, note that I used a static variable. This variable persists across calls to the function, meaning that it will be incremented every time the function is called, which happens for each match.
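Also, note that I prepended ots to the ID. In HTML 4, element IDs must not start with a digit (HTML5 permits it, but such IDs are awkward to use in CSS selectors).
For PHP before 5.3 (create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so use this only on legacy versions):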
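If you would rather keep the counter outside the callback, for example to reset it or read it afterwards, a closure can capture it by reference instead of using a static variable. This is a sketch of that alternative, not the code from the answer; the sample $in string is taken from the question.

```php
<?php
$in = "[otsection]Wallpapers[/otsection]\nWALLPAPERS GO HERE\n"
    . "[otsection]Videos[/otsection]\nVIDEOS GO HERE";

$id = 0; // counter lives in the calling scope
$out = preg_replace_callback(
    '~\[otsection\](.*?)\[/otsection\]~is',
    function (array $m) use (&$id) {
        $id++; // captured by reference, so the change is visible outside
        return '<div class="otsection" id="ots' . $id . '">' . $m[1] . '</div>';
    },
    $in
);

echo $out, "\n";
echo $id, "\n"; // 2
```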
$out = preg_replace_callback(
"(\[otsection\](.*?)\[/otsection\])is",
create_function('$m','
static $id = 0;
$id++;
return "<div class=\"otsection\" id=\"ots".$id."\">".$m[1]."</div>";
'),
$in);
Note: The following is intended to be a general answer and does not attempt to solve the OP's specific problem as it has already been addressed before.
What is preg_replace_callback()?
This function is used to perform a regular expression search-and-replace. It is similar to str_replace(), but instead of plain strings, it searches for a user-defined regex pattern, and then applies the callback function on the matched items. The function returns the modified string if matches are found, unmodified string otherwise.
When should I use it?
preg_replace_callback() is very similar to preg_replace() - the only difference is that instead of specifying a replacement string for the second parameter, you specify a callback function.
Use preg_replace() when you want to do a simple regex search and replace. Use preg_replace_callback() when you want to do more than just replace. See the example below for understanding how it works.
How to use it?
Here's an example to illustrate the usage of the function. Here, we are trying to convert a date string from YYYY-MM-DD format to DD-MM-YYYY.
// our date string
$string = '2014-02-22';
// search pattern
$pattern = '~(\d{4})-(\d{2})-(\d{2})~';
// the function call
$result = preg_replace_callback($pattern, 'callback', $string);
// the callback function
function callback ($matches) {
print_r($matches);
return $matches[3].'-'.$matches[2].'-'.$matches[1];
}
echo $result;
Here, our regular expression pattern searches for a date string of the format NNNN-NN-NN where N could be a digit ranging from 0 - 9 (\d is a shorthand representation for the character class [0-9]). The callback function will be called and passed an array of matched elements in the given string.
The final result will be:
22-02-2014
Note: The above example is for illustration purposes only. You should not use it to parse dates. Use DateTime::createFromFormat() and DateTime::format() instead.
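For comparison, the same YYYY-MM-DD to DD-MM-YYYY conversion with the DateTime functions mentioned in the note might look like this (a minimal sketch using the same sample date):

```php
<?php
$string = '2014-02-22';

// createFromFormat() returns false when the input does not match the format
$date = DateTime::createFromFormat('Y-m-d', $string);
$result = ($date !== false) ? $date->format('d-m-Y') : null;

echo $result; // 22-02-2014
```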

How do I filter this in array?

I have tags like this
search100
web250
seo36
analytics5060
traffic8000
web2.0
I want to remove the trailing numbers from these tags, so I can use this code in PHP
preg_replace("/\d+$/m", "", $input)
but I want to keep web2.0 intact. How do I filter this when I am using a loop? I have more than 100k tags like this.
You could use the pattern /(\w)\d+$/m and $1 as replacement:
preg_replace('/(\w)\d+$/m', '$1', $input)
This pattern requires that there is at least one word character directly before the trailing sequence of digits. Since . is not a word character, the lone 0 at the end of web2.0 does not match.
And to apply this replacement on each element of an array use array_map:
array_map(function($elem) { return preg_replace('/(\w)\d+$/m', '$1', $elem); }, $arr);
If you can’t use an anonymous function (available since PHP 5.3) like in my example, you can either define a separate function, use create_function instead or just use a foreach.
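Applied to the sample tags from the question, a runnable sketch of the array_map approach looks like this:

```php
<?php
$tags = ['search100', 'web250', 'seo36', 'analytics5060', 'traffic8000', 'web2.0'];

$clean = array_map(function ($tag) {
    // Keep the word character captured in group 1, drop the trailing digits
    return preg_replace('/(\w)\d+$/m', '$1', $tag);
}, $tags);

print_r($clean);
// search, web, seo, analytics, traffic, web2.0
```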
From your vague description and the small sample, it seems you could just use:
$input = preg_replace("/\d\d+$/m", "", $input);
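This would spare the 2.0 suffix, because it requires at least two consecutive digits (which also means a tag ending in a single digit is left unchanged). Another way to accomplish it would be a negative lookbehind, (?<!2\.)\d+$, although that only protects suffixes of the form 2.x.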

Replacing Tags with Includes in PHP with RegExps

I need to read a string, detect a {VAR}, and then do a file_get_contents('VAR.php') in place of {VAR}. The "VAR" can be named anything, like TEST, or CONTACT-FORM, etc. I don't want to know what VAR is -- not to do a hard-coded condition, but to just see an uppercase alphanumeric tag surrounded by curly braces and just do a file_get_contents() to load it.
I know I need to use preg_match and preg_replace, but I'm stumbling through the RegExps on this.
How is this useful? It's useful in hooking WordPress.
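Orion's answer is a valid solution, but it's not really necessary to use a callback function in your simple case.
Assuming that the filenames are A-Z + hyphens you can do it in 1 line using PHP's /e flag in the regex (note that the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so this only works on older versions):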
$str = preg_replace('/{([-A-Z]+)}/e', 'file_get_contents(\'$1.html\')', $str);
This'll replace any instance of {VAR} with the contents of VAR.html. You could prefix a path into the second term if you need to specify a particular directory.
There are the same vague security worries as outlined above, but I can't think of anything specific.
You'll need to do a number of things. I'm assuming you can do the legwork to get the page data you want to preprocess into a string.
First, you'll need the regular expression to match correctly. That should be fairly easy with something like /{\w+}/ (add - to the character class, e.g. /{[\w-]+}/, if names such as CONTACT-FORM must match).
Next you'll need to pass the PREG_OFFSET_CAPTURE flag to preg_match to get the offset of the match in the page data. This offset will let you divide the string into the before, matching, and after parts of the match.
Once you have the 3 parts, you'll need to run your include, and stick them back together.
Lather, rinse, repeat.
Stop when you find no more variables.
This isn't terribly efficient, and there are probably better ways. You may wish to consider doing a preg_split instead, splitting on /[{}]/. No matter how you slice it you're assuming that you can trust your incoming data, and this will simplify the whole process a lot. To do this, I'd lay out the code like so:
Take your content and split it like so: $parts = preg_split('/[{}]/', $page_string);
Write a recursive function over the parts with the following criteria:
Halt when length of arg is < 3
Else, return a new array composed of
$arg[0] . load_data($arg[1]) . $arg[2]
plus whatever is left in $arg[3...]
Run your function over $parts.
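The steps above can be sketched as follows. The names expand_parts and $load are made up for illustration, and the loader reads from an in-memory array instead of calling file_get_contents() so that the example runs on its own. It assumes the braces in the input are balanced.

```php
<?php
// Recursively folds [text, NAME, text, NAME, text, ...] into a single string.
function expand_parts(array $parts, callable $load): string
{
    // Halt when fewer than 3 parts remain
    if (count($parts) < 3) {
        return implode('', $parts);
    }
    // Merge "before" . loaded data . "after" into one leading element
    $head = $parts[0] . $load($parts[1]) . $parts[2];
    return expand_parts(array_merge([$head], array_slice($parts, 3)), $load);
}

// Hypothetical stand-in for loading VAR.php from disk.
$fragments = ['HEADER' => '<h1>Hi</h1>', 'CONTACT-FORM' => '<form></form>'];
$load = function (string $name) use ($fragments): string {
    return $fragments[$name] ?? '';
};

$page_string = 'top {HEADER} middle {CONTACT-FORM} bottom';
$parts = preg_split('/[{}]/', $page_string);
$result = expand_parts($parts, $load);

echo $result; // top <h1>Hi</h1> middle <form></form> bottom
```

Because the split happens on every brace, any stray { or } in the page shifts the alternation between text and names, which is why this only suits input you control.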
You can do it without regexes (god forbid), something like:
//return true if $str ends with $sub
function endsWith($str,$sub) {
return ( substr( $str, strlen( $str ) - strlen( $sub ) ) === $sub );
}
$theStringWithVars = "blah.php cool.php awesome.php";
$sub = '.php';
// split() was removed in PHP 7.0; explode() does the same job for a plain delimiter
$splitStr = explode(" ", $theStringWithVars);
for($i=0;$i<count($splitStr);$i++) {
if(endsWith(trim($splitStr[$i]),$sub)) {
//file_get_contents($splitStr[$i]) etc...
}
}
Off the top of my head, you want this:
// load the "template" file
$input = file_get_contents($template_file_name);
// define a callback. Each time the regex matches something, it will call this function.
// whatever this function returns will be inserted as the replacement
function replaceCallback($matches){
// match zero will be the entire match - eg {FOO}.
// match 1 will be just the bits inside the curly braces because of the grouping parens in the regex - eg FOO
// convert it to lowercase and append ".html", so you're loading foo.html
// then return the contents of that file.
// BEWARE. GIANT MASSIVE SECURITY HOLES ABOUND. DO NOT DO THIS
return file_get_contents( strtolower($matches[1]) . ".html" );
}
// run the actual replace, giving it our pattern, the callback name, and the input file contents
// (the pattern needs delimiters, and the callback name must be passed as a string)
$output = preg_replace_callback('/\{([-A-Z]+)\}/', 'replaceCallback', $input);
// todo: print the output
Now I'll explain the regex
\{([-A-Z]+)\}
The \{ and \} just tell it to match the curly braces. You need the backslashes, as { and } are special characters, so they need escaping.
The ( and ) create a grouping. Basically this lets you extract particular parts of the match. I use it in the function above to just match the things inside the braces, without matching the braces themselves. If I didn't do this, then I'd need to strip the { and } out of the match, which would be annoying.
The [-A-Z] says "match any uppercase letter or a hyphen".
The + after the [-A-Z] means we need to have at least 1 character, but we can have up to any number.
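To check what the pattern captures without touching any files, a quick sketch with preg_match_all can help (the sample input is invented for illustration):

```php
<?php
$input = 'Header: {HEADER} Form: {CONTACT-FORM} ignored: {lower} {}';

preg_match_all('/\{([-A-Z]+)\}/', $input, $matches);

// $matches[0] holds the full matches, braces included
print_r($matches[0]); // {HEADER}, {CONTACT-FORM}
// $matches[1] holds only what the ( ) group captured
print_r($matches[1]); // HEADER, CONTACT-FORM
```

The lowercase tag and the empty braces are skipped, because the character class only allows uppercase letters and hyphens and the + requires at least one character.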
Comparatively speaking, regular expressions are expensive. While you may need them to figure out which files to load, you certainly don't need them for doing the replace, and probably shouldn't use regular expressions. After all, you know exactly what you are replacing, so why do you need a fuzzy search?
Use an associative array and str_replace to do your replacements. str_replace supports arrays for doing multiple substitutions at once. One line substitution, no loops.
For example:
$substitutions = array('{VAR}'=>file_get_contents('VAR.php'),
'{TEST}'=>file_get_contents('TEST.php'),
// ...
);
$outputContents = str_replace( array_keys($substitutions), $substitutions, $outputContents);
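When the tag names are not known in advance, one way to combine the two ideas is to use a regex only to discover the names and then let str_replace do the substitution. This is a rough sketch under stated assumptions: the $files array is a hypothetical in-memory stand-in for file_get_contents($name . '.php'), and unknown tags are simply left in place.

```php
<?php
// Hypothetical stand-in for the files on disk.
$files = ['VAR' => 'var contents', 'TEST' => 'test contents'];

$outputContents = 'Start {VAR} middle {TEST} end {VAR} {UNKNOWN}';

// Step 1: discover which tags occur in the text.
preg_match_all('/\{([-A-Z]+)\}/', $outputContents, $found);

// Step 2: build the substitution map once per distinct tag.
$substitutions = [];
foreach (array_unique($found[1]) as $name) {
    if (isset($files[$name])) {
        $substitutions['{' . $name . '}'] = $files[$name];
    }
}

// Step 3: replace all tags in one call.
$outputContents = str_replace(
    array_keys($substitutions),
    array_values($substitutions),
    $outputContents
);

echo $outputContents;
// Start var contents middle test contents end var contents {UNKNOWN}
```

Note that str_replace applies the pairs one after another, so a tag appearing inside already-inserted content could be replaced as well.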
