PHP regex replace multiple patterns with callback

I'm trying to run a simple replacement on some input data that could be described as follows:
take a regular expression
take an input data stream
on every match, replace the match through a callback
Unfortunately, preg_replace_callback() doesn't work as I'd expect. It gives me all the matches on the entire line, not individual matches. So I need to put the line together again after replacement, but I don't have the information to do that. Case in point:
<?php
echo replace("/^\d+,(.*),(.*),.*$/", "12,LOWERME,ANDME,ButNotMe")."\n";
echo replace("/^\d+-\d+-(.*) .* (.*)$/", "13-007-THISLOWER ThisNot THISAGAIN")."\n";
function replace($pattern, $data) {
return preg_replace_callback(
$pattern,
function($match) {
return strtolower($match[0]);
}, $data
);
}
https://www.tehplayground.com/hE1ZBuJNtFiHbdHO
gives me 12,lowerme,andme,butnotme, but I want 12,lowerme,andme,ButNotMe.
I know using $match[0] is wrong. It's just to illustrate here. Inside the closure I need to run something like
foreach ($match as $m) { /* do something */ }
But as I said, I have no information about the position of the matches in the input string which makes it impossible to put the string together again.
I've dug through the PHP documentation and run several searches, but couldn't find a solution.
Clarifications:
I know that $match[1], $match[2]... etc contain the matches. But only a string, not a position. Imagine in my example the final string is also ANDME instead of ButNotMe - according to the regex, it should not be matched and the callback should not be applied to it. That's why I'm using regexes in the first place instead of string replacements.
Also, the reason I'm using capture groups this way is that I need the replacement process to be configurable. So I cannot hardcode something like "replace #1 and #2 but not #3". On a different input file, the positions might be different, or there might be more replacements needed, and only the regex used should change.
So if my input is "15,LOWER,ME,NotThis,AND,ME,AGAIN", I want to be able to just change the regex, not the code and get the desired result. Basically, both $pattern and $data are variable.

This uses preg_match() and PREG_OFFSET_CAPTURE to return the capture groups and the offset within the original string where it is found. This then uses substr_replace() with each capture group to replace only the part of the string which is to be changed - this stops any chance of replacing similar text which you do not want to be changed...
function lowerParts (string $input, string $regex ) {
preg_match($regex, $input, $matches, PREG_OFFSET_CAPTURE);
array_shift($matches);
foreach ( $matches as $match ) {
$input = substr_replace($input, strtolower($match[0]),
$match[1], strlen($match[0]));
}
return $input;
}
echo lowerParts ("12,LOWERME,ANDME,ButNotMe", "/^\d+,(.*),(.*),.*$/");
gives...
12,lowerme,andme,ButNotMe
But also with
echo lowerParts ("12,LOWERME,ANDME,LOWERME", "/^\d+,(.*),(.*),.*$/");
it gives
12,lowerme,andme,LOWERME
Edit:
If the replacement data is of different lengths, then you would need to chop the string up into parts and replace each one. The complication is that each change in length alters the relative position of the offsets, so this has to keep track of what this offset is. This version also has a parameter which is the process you want to apply to the strings (this example just passes "strtolower") ...
function processParts (string $input, string $regex, callable $process ) {
preg_match($regex, $input, $matches, PREG_OFFSET_CAPTURE);
array_shift($matches);
$offset = 0;
foreach ( $matches as $match ) {
$replacement = $process($match[0]);
$input = substr($input, 0, $match[1]+$offset)
.$replacement.
substr($input, $match[1]+$offset+strlen($match[0]));
$offset += strlen($replacement) - strlen($match[0]);
}
return $input;
}
echo processParts ("12,LOWERME,ANDME,LOWERME", "/^\d+,.*,(.*),(.*)$/", "strtolower");

This will work:
function replaceGroups(string $pattern, string $string, callable $callback)
{
preg_match($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
array_shift($matches);
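// Replace from the last group backwards so earlier offsets stay valid.
foreach (array_reverse($matches) as $match) {
// PREG_OFFSET_CAPTURE offsets are byte offsets, so measure with strlen(), not mb_strlen().
$string = substr_replace($string, $callback($match[0]), $match[1], strlen($match[0]));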
}
return $string;
}
echo replaceGroups("/^\d+-\d+-(.*) .* (.*)$/", "13-007-THISLOWER ThisNot THISAGAIN", 'strtolower');
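The answers above call preg_match(), so they only handle the first match in the input. As a rough sketch of the callback-based approach the question asks about: since PHP 7.4, preg_replace_callback() accepts a flags argument, and PREG_OFFSET_CAPTURE makes the callback receive each group together with its byte offset. The function name replaceCaptureGroups is made up for this illustration, and the sketch assumes the capture groups are not nested inside one another.

```php
<?php
// Sketch (PHP 7.4+): run $process on every capture group of every match.
// Assumes the groups are not nested inside one another.
function replaceCaptureGroups(string $pattern, string $subject, callable $process): ?string
{
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($process): string {
            [$whole, $start] = $m[0]; // full match text and its byte offset in $subject
            unset($m[0]);
            // Keep numeric groups only, so named groups are not processed twice.
            $groups = array_filter($m, 'is_int', ARRAY_FILTER_USE_KEY);
            // Go right to left so earlier offsets stay valid when lengths change.
            foreach (array_reverse($groups) as [$text, $offset]) {
                if ($offset < 0) {
                    continue; // this group did not take part in the match
                }
                $whole = substr_replace($whole, $process($text), $offset - $start, strlen($text));
            }
            return $whole;
        },
        $subject,
        -1,
        $count,
        PREG_OFFSET_CAPTURE
    );
}

echo replaceCaptureGroups('/^\d+,(.*),(.*),.*$/', '12,LOWERME,ANDME,ButNotMe', 'strtolower'), "\n";
// 12,lowerme,andme,ButNotMe
```

Because the offsets are relative to the whole subject, the callback subtracts the start of the full match before splicing. Only the pattern needs to change for a different input layout.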

Related

Regular Expression For Time string [duplicate]

I need to print all matches using preg_match_all.
$search = preg_match_all($pattern, $string, $matches);
foreach ($matches as $match) {
echo $match[0];
echo $match[1];
echo $match[...];
}
The problem is I don't know how many matches there are in my string, and even if I knew, and it was 1000, it would be pretty dumb to type all those $match[]'s.
The $match[0], $match[1], etc., items are not the individual matches, they're the "captures".
Regardless of how many matches there are, the number of entries in $matches is constant, because it's based on what you're searching for, not the results. There's always at least one entry, plus one more for each pair of capturing parentheses in the search pattern.
For example, if you do:
$matches = array();
$search = preg_match_all("/\D+(\d+)/", "a1b12c123", $matches);
print_r($matches);
Matches will have only two items, even though three matches were found. $matches[0] will be an array containing "a1", "b12" and "c123" (the entire match for each item) and $matches[1] will contain only the first capture for each item, i.e., "1", "12" and "123".
I think what you want is something more like:
foreach ($matches[1] as $match) {
echo $match;
}
Which will print out the first capture expression from each matched string.
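If it is more convenient to get one entry per match rather than one entry per capture group, preg_match_all() also accepts the PREG_SET_ORDER flag. A minimal sketch using the same pattern and input as above (the $lines variable exists only for this illustration):

```php
<?php
// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all('/\D+(\d+)/', 'a1b12c123', $matches, PREG_SET_ORDER);

$lines = [];
foreach ($matches as $match) {
    // $match[0] is the whole match, $match[1] the first capture of that match
    $lines[] = $match[0] . ' => ' . $match[1];
}
echo implode("\n", $lines), "\n";
// a1 => 1
// b12 => 12
// c123 => 123
```

With this flag, count($matches) equals the number of matches found, which is 3 here.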
Does print_r($matches) give you what you want?
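You could loop recursively. This example requires SPL (PHP 5.1+) and wraps a RecursiveArrayIterator in a RecursiveIteratorIterator, so that the loop visits the nested strings rather than the inner arrays:
foreach (new RecursiveIteratorIterator(new RecursiveArrayIterator($matches)) as $match) {
print $match;
}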

preg_match how to return matches?

According to PHP manual "If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on."
How can I return a value from a string with only knowing the first few characters?
The string is dynamic and its contents will always change, but the first four characters will always be the same.
For example how could I return "Car" from this string "TmpsCar". The string will always have "Tmps" followed by something else.
From what I understand I can return using something like this
preg_match('/(Tmps+)/', $fieldName, $matches);
echo($matches[1]);
Should return "Car".
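Your regex is flawed: /(Tmps+)/ captures the prefix itself (with one or more "s" characters), not the text that follows it. Use this: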
preg_match('/^Tmps(.+)$/', $fieldName, $matches);
echo($matches[1]);
$matches = []; // Initialize the matches array first
if (preg_match('/^Tmps(.+)/', $fieldName, $matches)) {
// if the regex matched the input string, echo the first captured group
echo($matches[1]);
}
Note that this task could easily be accomplished without regex at all (with better performance): See startsWith() and endsWith() functions in PHP.
"The string will always have "Tmps" followed by something else."
You don't need a regular expression, in that case.
$result = substr($fieldName, 4);
If the first four characters are always the same, just take the portion of the string after that.
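If the prefix is not guaranteed to be present, substr() alone would silently cut off four unrelated characters. A small sketch that checks the prefix first; the helper name afterPrefix is invented for this example, and strncmp() is used so it also runs on PHP versions older than 8 (on PHP 8, str_starts_with() would do the same check):

```php
<?php
// Return the part of $value after $prefix, or null if $value does not start with it.
function afterPrefix(string $value, string $prefix): ?string
{
    // strncmp() compares only the first strlen($prefix) bytes of both strings.
    if (strncmp($value, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    return substr($value, strlen($prefix));
}

var_dump(afterPrefix('TmpsCar', 'Tmps'));  // string(3) "Car"
var_dump(afterPrefix('OtherCar', 'Tmps')); // NULL
```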
An alternative way is using the explode function
$fieldName= "TmpsCar";
$matches = explode("Tmps", $fieldName);
if(isset($matches[1])){
echo $matches[1]; // return "Car"
}
If the text you are searching contains more than just the string starting with Tmps, you can use the \w+ pattern, which matches one or more "word" characters.
This results in the following regular expression:
/Tmps(\w+)/
and altogether in php
$text = "This TmpsCars is a test";
if (preg_match('/Tmps(\w+)/', $text, $m)) {
echo "Found:" . $m[1]; // this would return Cars
}

How do I get number from this format:,[[5,["95",1,"#ffffff"]]]], using regex

I have a string like this:
",[[3,"bus.png",null,"Bus",[["https://maps.gstatic.com/mapfiles/transit/iw2/b/bus.png",0,[15,15],null,0]]]],[[null,null,null,null,"0x31da18325b415901:0xeb661015c651c24a",[[5,["48",1,"#ffffff"]]]],[null,null,null,null,"0x31da19f34e04d59b:0x5758ef6990938b",[[5,["61",1,"#ffffff"]]]],[null,null,null,null,"0x31da1a5b8b75c379:0x6a13e189555f9fab",[[5,["95",1,"#ffffff"]]]],[null,null,null,null,"0x31da1a16ea23bf95:0xd7c90f15535c2b9f",[[5,["106",1,"#ffffff"]]]],[null,null,null,null,"0x31da10a7613d616f:0xf1f61ffeac2ea8a4",[[5,["970",1,"#ffffff"]]]],[null,null,null,null,"0x31da1a0bd6262d0b:0xfbd5d2bfd7a1252",[[5,["NR8",1,"#ffffff"]]]]],null,0,"5"]]],["http://www
I need to get all the numbers: "48, 61, 95, 106, 970, NR8". So basically, I need to process this format: ,[[5,["95",1,"#ffffff"]]]],
I tried:
function get_numbers_from($input) {
$matches = preg_match_all('(\[\"[]a-zA-Z0-9]*?\"\,\d*?\,\".*?\"\])', $input);
foreach($matches[1] as $key => $match) {
array_push($numbers, explode(',', $match)[0]);
}
return $numbers;
}
But it shows: Invalid argument supplied for foreach()
How can I correct it?
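Check the manual for preg_match_all(): the function returns the number of full pattern matches as an integer (or false on failure), not the matches themselves. The matches are written to the variable you pass as the third parameter.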
Also you can change your regex to this one:
\[\[\d+,\[\"(\w+)\",\d+,"#[\da-fA-F]+"]]]]
To get the number directly from it without explode(), e.g.
function get_numbers_from($input) {
preg_match_all('/\[\[\d+,\[\"(\w+)\",\d+,"#[\da-fA-F]+"]]]]/', $input, $matches);
return $matches[1];
}
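To make the difference between the return value and the third parameter concrete, here is a small sketch on a shortened input. The function name getNumbers and the simplified pattern are assumptions made for this illustration only; they are not the pattern from the answer above.

```php
<?php
// Sketch: the return value is a count (or false); the captures land in $matches.
function getNumbers(string $input): array
{
    $count = preg_match_all('/\["(\w+)",\d+,"#[\da-fA-F]+"\]/', $input, $matches);
    if ($count === false || $count === 0) {
        return []; // regex error or nothing found
    }
    return $matches[1]; // first capture group of every match
}

$input = '[[5,["48",1,"#ffffff"]]]],[[5,["NR8",1,"#ffffff"]]]]';
print_r(getNumbers($input));
```

For this shortened input the result contains the two strings 48 and NR8.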
You can use
'~\["([A-Z]*\d+)"~'
See the regex demo and the IDEONE demo
$re = '~\["([A-Z]*\d+)"~';
$str = "\",[[3,\"bus.png\",null,\"Bus\",[[\"https://maps.gstatic.com/mapfiles/transit/iw2/b/bus.png\",0,[15,15],null,0]]]],[[null,null,null,null,\"0x31da18325b415901:0xeb661015c651c24a\",[[5,[\"48\",1,\"#ffffff\"]]]],[null,null,null,null,\"0x31da19f34e04d59b:0x5758ef6990938b\",[[5,[\"61\",1,\"#ffffff\"]]]],[null,null,null,null,\"0x31da1a5b8b75c379:0x6a13e189555f9fab\",[[5,[\"95\",1,\"#ffffff\"]]]],[null,null,null,null,\"0x31da1a16ea23bf95:0xd7c90f15535c2b9f\",[[5,[\"106\",1,\"#ffffff\"]]]],[null,null,null,null,\"0x31da10a7613d616f:0xf1f61ffeac2ea8a4\",[[5,[\"970\",1,\"#ffffff\"]]]],[null,null,null,null,\"0x31da1a0bd6262d0b:0xfbd5d2bfd7a1252\",[[5,[\"NR8\",1,\"#ffffff\"]]]]],null,0,\"5\"]]],[\"http://www\n48, 61,95,106,970,NR8";
preg_match_all($re, $str, $matches);
print_r($matches[1]);
The pattern matches:
\[ - a [
" - a quote
([A-Z]*\d+) - Group 1: any uppercase ASCII letter, 0 or more times, followed with 1 or more digits
" - a quote
The value you need is located inside the $matches[1] variable. It holds all the values captured with the parenthesized subpattern (Group 1).

Put URLs from string into array using regex (problem with trailing period)

I am trying to write a function that pulls all URLs from a string and removes a potential trailing period from the end of each.
function getUrls($string) {
$regex = '/https?\:\/\/[^\" ]+/i';
preg_match_all($regex, $string, $matches);
return ($matches[0]);
}
But that returns http://test.com. (with the trailing period). If I have
$string = "Hi I am sharing http://test.com.";
$urls = getUrls($string);
It returns the URL with the period at the end.
This one seems to work (taken from here)
$regex="/(https?:\/\/+[\w\-]+\.[\w\-]+)/i";
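Another option, sketched here under the assumption that sentence punctuation at the very end of a URL is never wanted, is to keep a permissive pattern and trim the trailing punctuation afterwards with rtrim(). The function name getUrlsTrimmed and the set of trimmed characters are choices made for this example.

```php
<?php
// Sketch: match greedily up to whitespace or a quote, then strip trailing punctuation.
function getUrlsTrimmed(string $text): array
{
    preg_match_all('/https?:\/\/[^"\s]+/i', $text, $matches);
    return array_map(
        function (string $url): string {
            // Remove any run of these characters from the end of the URL only.
            return rtrim($url, '.,;:!?');
        },
        $matches[0]
    );
}

print_r(getUrlsTrimmed('Hi I am sharing http://test.com.'));
```

Unlike the shorter pattern above, this keeps paths and query strings intact, at the cost of also trimming a URL that legitimately ends in one of those characters.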
In case anyone comes across this, here is what I put together:
$aProtocols = array('http:\/\/', 'https:\/\/', 'ftp:\/\/', 'news:\/\/', 'nntp:\/\/', 'telnet:\/\/', 'irc:\/\/', 'mms:\/\/', 'ed2k:\/\/', 'xmpp:', 'mailto:');
$aSubdomains = array('www'=>'http://', 'ftp'=>'ftp://', 'irc'=>'irc://', 'jabber'=>'xmpp:');
$sRELinks = '/(?:(' . implode('|', $aProtocols) . ')[^\^\[\]{}|\\"\'<>`\s]*[^!#\^()\[\]{}|\\:;"\',.?<>`\s])|(?:(?:(?:(?:[^#:<>(){}`\'"\/\[\]\s]+:)?[^#:<>(){}`\'"\/\[\]\s]+#)?(' . implode('|', array_keys($aSubdomains)) . ')\.(?:[^`~!##$%^&*()_=+\[{\]}\\|;:\'",<.>\/?\s]+\.)+[a-z]{2,6}(?:[\/#?](?:[^\^\[\]{}|\\"\'<>`\s]*[^!#\^()\[\]{}|\\:;"\',.?<>`\s])?)?)|(?:(?:[^#:<>(){}`\'"\/\[\]\s]+#)?((?:(?:(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))(?:\.(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))){3})|(?:[A-Fa-f0-9:]{16,39}))|(?:(?:[^`~!##$%^&*()_=+\[{\]}\\|;:\'",<.>\/?\s]+\.)+[a-z]{2,6}))\/(?:[^\^\[\]{}|\\"\'<>`\s]*[^!#\^()\[\]{}|\\:;"\',.?<>`\s](?:[#?](?:[^\^\[\]{}|\\"\'<>`\s]*[^!#\^()\[\]{}|\\:;"\',.?<>`\s])?)?)?)|(?:[^#:<>(){}`\'"\/\[\]\s]+:[^#:<>(){}`\'"\/\[\]\s]+#((?:(?:(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))(?:\.(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))){3})|(?:[A-Fa-f0-9:]{16,39}))|(?:(?:[^`~!##$%^&*()_=+\[{\]}\\|;:\'",<.>\/?\s]+\.)+[a-z]{2,6}))(?:\/(?:(?:[^\^\[\]{}|\\"\'<>`\s]*[^!#\^()\[\]{}|\\:;"\',.?<>`\s])?)?)?(?:[#?](?:[^\^\[\]{}|\\"\'<>`\s]*[^!#\^()\[\]{}|\\:;"\',.?<>`\s])?)?))|([^#:<>(){}`\'"\/\[\]\s]+#(?:(?:(?:[^`~!##$%^&*()_=+\[{\]}\\|;:\'",<.>\/?\s]+\.)+[a-z]{2,6})|(?:(?:(?:(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))(?:\.(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))){3})|(?:[A-Fa-f0-9:]{16,39}))))(?:[^\^*\[\]{}|\\"<>\/`\s]+[^!#\^()\[\]{}|\\:;"\',.?<>`\s])?)/i';
function getUrls($string) {
global $sRELinks;
preg_match_all($sRELinks, $string, $matches);
return ($matches[0]);
}
From http://yellow5.us/journal/server_side_text_linkification/
Depending on how strict you want to be, consider the Liberal, Accurate Regex Pattern for Matching URLs regular expression pattern discussed on Daring Fireball. The pattern in full is:
\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))
If you are interested in how it works, Alan Storm has a great explanation.

PHP preg_replace() backreferences used as arguments of another function

I am trying to extract information from tags using a regex, then return a result based on various parts of the tag.
preg_replace('/<(example )?(example2)+ />/', analyze(array($0, $1, $2)), $src);
So I'm grabbing parts and passing it to the analyze() function. Once there, I want to do work based on the parts themselves:
function analyze($matches) {
if ($matches[0] == '<example example2 />')
return 'something_awesome';
else if ($matches[1] == 'example')
return 'ftw';
}
etc. But once I get to the analyze function, $matches[0] just equals the string '$0'. Instead, I need $matches[0] to refer to the backreference from the preg_replace() call. How can I do this?
Thanks.
EDIT: I just saw the preg_replace_callback() function. Perhaps this is what I am looking for...
You can't use preg_replace like that. You probably want preg_replace_callback
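A minimal sketch of what that could look like. The analyze() function here is a simplified stand-in for the one in the question, and the sample $src string is made up. The callback receives the same array that preg_match() would fill, and whatever it returns replaces the whole match.

```php
<?php
// Callback: $matches[0] is the full tag, $matches[1] and $matches[2] are the groups.
function analyze(array $matches): string
{
    if ($matches[0] === '<example example2 />') {
        return 'something_awesome';
    }
    return $matches[0]; // leave any other matching tag unchanged
}

$src = 'a <example example2 /> b <example2 /> c';
$out = preg_replace_callback('/<(example )?(example2)+ \/>/', 'analyze', $src);
echo $out, "\n";
// a something_awesome b <example2 /> c
```

Note that the forward slash in the pattern has to be escaped because it is also the delimiter.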
$regex = '/<(example )?(example2)+ \/>/';
if (preg_match($regex, $subject, $matches)) {
// now you have the matches in $matches and you can process them as you want
$replacement = (string) analyze($matches);
// here you can replace all matches with the result you computed
$subject = preg_replace($regex, $replacement, $subject);
}
