PHP preg_replace() backreferences used as arguments of another function

I am trying to extract information from tags using a regex, then return a result based on various parts of the tag.
preg_replace('/<(example )?(example2)+ />/', analyze(array($0, $1, $2)), $src);
So I'm grabbing parts and passing them to the analyze() function. Once there, I want to do work based on the parts themselves:
function analyze($matches) {
if ($matches[0] == '<example example2 />')
return 'something_awesome';
else if ($matches[1] == 'example')
return 'ftw';
}
etc. But once I get to the analyze function, $matches[0] just equals the string '$0'. Instead, I need $matches[0] to refer to the backreference from the preg_replace() call. How can I do this?
Thanks.
EDIT: I just saw the preg_replace_callback() function. Perhaps this is what I am looking for...

You can't use preg_replace() like that. You probably want preg_replace_callback().

$regex = '/<(example )?(example2)+ \/>/';
if (preg_match($regex, $subject, $matches)) {
    // now you have the matches in $matches and you can process them as you want
    $replacement = analyze($matches);
    // replace the match with the result (preg_replace() expects a string
    // replacement here, not the $matches array)
    $subject = preg_replace($regex, $replacement, $subject);
}
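The answer above recommends preg_replace_callback() but does not show it. The following is a minimal sketch that adapts the question's analyze() so that PCRE calls it once per match and uses its return value as the replacement. The sample $src string and the fallback of returning the tag unchanged are assumptions made for illustration.

```php
<?php
// Sketch: preg_replace_callback() calls analyze() once per match and uses the
// returned string as the replacement. The sample $src is made up.
function analyze(array $matches): string
{
    // $matches[0] is the whole tag, $matches[1] and $matches[2] the groups.
    if ($matches[0] === '<example example2 />') {
        return 'something_awesome';
    }
    // Note: group 1 is "(example )", so it captures the trailing space too.
    if ($matches[1] === 'example ') {
        return 'ftw';
    }
    // Assumption: leave any other matching tag unchanged.
    return $matches[0];
}

$src = 'a <example example2 /> b <example example2example2 /> c <example2 /> d';
$result = preg_replace_callback('/<(example )?(example2)+ \/>/', 'analyze', $src);
echo $result, "\n";
```

Unlike preg_replace() with a "$0" string, the callback receives the real captured text, so ordinary PHP comparisons work on it.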

Related

PHP regex replace multiple patterns with callback

I'm trying to run a simple replacement on some input data that could be described as follows:
take a regular expression
take an input data stream
on every match, replace the match through a callback
Unfortunately, preg_replace_callback() doesn't work as I'd expect. It gives me all the matches on the entire line, not individual matches. So I need to put the line together again after replacement, but I don't have the information to do that. Case in point:
<?php
echo replace("/^\d+,(.*),(.*),.*$/", "12,LOWERME,ANDME,ButNotMe")."\n";
echo replace("/^\d+-\d+-(.*) .* (.*)$/", "13-007-THISLOWER ThisNot THISAGAIN")."\n";
function replace($pattern, $data) {
return preg_replace_callback(
$pattern,
function($match) {
return strtolower($match[0]);
}, $data
);
}
https://www.tehplayground.com/hE1ZBuJNtFiHbdHO
gives me 12,lowerme,andme,butnotme, but I want 12,lowerme,andme,ButNotMe.
I know using $match[0] is wrong. It's just to illustrate here. Inside the closure I need to run something like
foreach ($match as $m) { /* do something */ }
But as I said, I have no information about the position of the matches in the input string, which makes it impossible to put the string together again.
I've dug through the PHP documentation and done several searches, but couldn't find a solution.
Clarifications:
I know that $match[1], $match[2], etc. contain the matches, but only as strings, not positions. Imagine that in my example the final field is also ANDME instead of ButNotMe: according to the regex, it should not be matched and the callback should not be applied to it. That's why I'm using regexes in the first place instead of string replacements.
Also, the reason I'm using capture groups this way is that I need the replacement process to be configurable. So I cannot hardcode something like "replace #1 and #2 but not #3". On a different input file, the positions might be different, or there might be more replacements needed, and only the regex used should change.
So if my input is "15,LOWER,ME,NotThis,AND,ME,AGAIN", I want to be able to just change the regex, not the code and get the desired result. Basically, both $pattern and $data are variable.
This uses preg_match() with PREG_OFFSET_CAPTURE to return each capture group together with the offset in the original string where it was found. It then uses substr_replace() on each capture group to replace only the part of the string that is to be changed; this avoids accidentally replacing similar text elsewhere that you do not want changed...
function lowerParts (string $input, string $regex ) {
preg_match($regex, $input, $matches, PREG_OFFSET_CAPTURE);
array_shift($matches);
foreach ( $matches as $match ) {
$input = substr_replace($input, strtolower($match[0]),
$match[1], strlen($match[0]));
}
return $input;
}
echo lowerParts ("12,LOWERME,ANDME,ButNotMe", "/^\d+,(.*),(.*),.*$/");
gives...
12,lowerme,andme,ButNotMe
But also with
echo lowerParts ("12,LOWERME,ANDME,LOWERME", "/^\d+,(.*),(.*),.*$/");
it gives
12,lowerme,andme,LOWERME
Edit:
If the replacement text can have a different length from the original, you need to cut the string into parts and replace each one. The complication is that every change in length shifts the positions of the later groups, so the code has to keep track of the accumulated offset. This version also takes a parameter for the process you want to apply to each group (this example just passes "strtolower")...
function processParts (string $input, string $regex, callable $process ) {
preg_match($regex, $input, $matches, PREG_OFFSET_CAPTURE);
array_shift($matches);
$offset = 0;
foreach ( $matches as $match ) {
$replacement = $process($match[0]);
$input = substr($input, 0, $match[1]+$offset)
.$replacement.
substr($input, $match[1]+$offset+strlen($match[0]));
$offset += strlen($replacement) - strlen($match[0]);
}
return $input;
}
echo processParts ("12,LOWERME,ANDME,LOWERME", "/^\d+,.*,(.*),(.*)$/", "strtolower");
This will work:
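function replaceGroups(string $pattern, string $string, callable $callback)
{
    preg_match($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
    array_shift($matches); // drop the full match, keep the capture groups
    // Process the last group first so that a replacement of a different length
    // does not shift the offsets of the groups that are still to be replaced.
    foreach (array_reverse($matches) as $match) {
        if ($match[1] < 0) {
            continue; // group did not participate in the match
        }
        // PREG_OFFSET_CAPTURE offsets are byte offsets, so use strlen(), not mb_strlen()
        $string = substr_replace($string, $callback($match[0]), $match[1], strlen($match[0]));
    }
    return $string;
}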
echo replaceGroups("/^\d+-\d+-(.*) .* (.*)$/", "13-007-THISLOWER ThisNot THISAGAIN", 'strtolower');
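The functions above all rely on preg_match(), so they only process the first match in the subject. Since PHP 7.4, preg_replace_callback() also accepts a flags argument, and with PREG_OFFSET_CAPTURE each group arrives as [text, byte offset]. That makes it possible to rebuild every match while changing only its groups. The following is a sketch: replaceGroupsAll() is a hypothetical name, and it assumes the capture groups do not nest or overlap.

```php
<?php
// Sketch (PHP 7.4+): run $callback on every capture group of every match and
// leave the rest of each match untouched. replaceGroupsAll() is hypothetical;
// it assumes the groups do not nest or overlap.
function replaceGroupsAll(string $pattern, string $subject, callable $callback): ?string
{
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($callback): string {
            // With PREG_OFFSET_CAPTURE every entry is [text, byte offset in $subject].
            [$whole, $start] = $m[0];
            // Keep only numbered groups (skip the full match and any named keys).
            $groups = array_filter($m, fn($k) => is_int($k) && $k > 0, ARRAY_FILTER_USE_KEY);
            // Last group first, so a length change cannot shift groups still to do.
            foreach (array_reverse($groups) as [$text, $offset]) {
                if ($offset < 0) {
                    continue; // this group did not take part in the match
                }
                $whole = substr_replace($whole, $callback($text), $offset - $start, strlen($text));
            }
            return $whole;
        },
        $subject,
        -1,
        $count,
        PREG_OFFSET_CAPTURE
    );
}

echo replaceGroupsAll("/^\d+,(.*),(.*),.*$/", "12,LOWERME,ANDME,ButNotMe", 'strtolower'), "\n";
echo replaceGroupsAll('/(\w+)=(\w+)/', 'A=B, C=D', fn($s) => "<$s>"), "\n";
```

Only the pattern and the callback vary between inputs, which matches the question's requirement that the code stay fixed and only the regex change.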

preg_match how to return matches?

According to the PHP manual: "If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on."
How can I return a value from a string when I only know its first few characters?
The string is dynamic and its contents will change, but the first four characters will always be the same.
For example, how could I return "Car" from the string "TmpsCar"? The string will always have "Tmps" followed by something else.
From what I understand, I can return it using something like this:
preg_match('/(Tmps+)/', $fieldName, $matches);
echo($matches[1]);
Should return "Car".
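Your regex is flawed: Tmps+ matches "Tmp" followed by one or more "s" characters, and the parentheses capture that prefix itself, so $matches[1] would be "Tmps". Capture what comes after the prefix instead: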
preg_match('/^Tmps(.+)$/', $fieldName, $matches);
echo($matches[1]);
$matches = []; // Initialize the matches array first
if (preg_match('/^Tmps(.+)/', $fieldName, $matches)) {
// if the regex matched the input string, echo the first captured group
echo($matches[1]);
}
Note that this task could easily be accomplished without a regex at all (with better performance): see "startsWith() and endsWith() functions in PHP".
"The string will always have "Tmps" followed by something else."
You don't need a regular expression, in that case.
$result = substr($fieldName, 4);
If the first four characters are always the same, just take the portion of the string after that.
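The two non-regex ideas above (checking the prefix, then taking the rest with substr()) can be combined so that a missing prefix is detected instead of silently cutting off four characters. The following is a minimal sketch. It assumes PHP 8.0+ for str_starts_with(), and afterPrefix() is a hypothetical helper name.

```php
<?php
// Sketch: strip a known prefix without a regex, but only if it is really there.
// str_starts_with() needs PHP 8.0+; afterPrefix() is a hypothetical name.
function afterPrefix(string $value, string $prefix): ?string
{
    if (!str_starts_with($value, $prefix)) {
        return null; // prefix missing: report "no match" instead of guessing
    }
    return substr($value, strlen($prefix));
}

echo afterPrefix('TmpsCar', 'Tmps'), "\n"; // prints "Car"
var_dump(afterPrefix('Car', 'Tmps'));      // prints NULL
```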
An alternative is the explode() function:
$fieldName = "TmpsCar";
// limit 2: split only at the first "Tmps", so a later "Tmps" stays in the result
$matches = explode("Tmps", $fieldName, 2);
if (isset($matches[1])) {
    echo $matches[1]; // prints "Car"
}
If the text you are searching contains more than just the string starting with Tmps, you can look for the \w+ pattern, which matches one or more "word" characters (letters, digits, or underscores).
This results in the following regular expression:
/Tmps(\w+)/
and, altogether, in PHP:
$text = "This TmpsCars is a test";
if (preg_match('/Tmps(\w+)/', $text, $m)) {
echo "Found:" . $m[1]; // this would return Cars
}

php regex replace substring

I am trying to detect a URL with a PHP regex and replace every &amp it contains with just &. I had run htmlspecialchars() on all my input data, but I want URLs to be readable. I tried the following, which obviously doesn't work because the replacement part is wrong:
preg_replace('!(http(s)?://((\S)|(&amp))*)!m', '&', $message);
Basically, I want the rest of the string to remain the same, but change the &amp when it occurs within a URL. I was thinking of using preg_match_all(), but since the values of the matches array are not passed by reference, that won't work.
Any ideas on how I could do it?
You may match the URLs with a relatively simple !https?://\S+! (matching http:// or https:// followed by one or more non-whitespace characters) and modify the &amp; inside each match using preg_replace_callback():
$message = preg_replace_callback('!https?://\S+!', function ($m) {
    // htmlspecialchars() encodes "&" as "&amp;", so replace the whole entity
    return str_replace('&amp;', '&', $m[0]);
}, $message);
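To see the effect of the callback approach end to end, the following is a small sketch. The sample message is invented, and it assumes the input was escaped with htmlspecialchars(), which turns "&" into "&amp;". Only the &amp; inside the URL is decoded; the ones in the surrounding text stay escaped.

```php
<?php
// Sketch: decode "&amp;" only inside http(s) URLs, leaving other text escaped.
// The sample message is made up for illustration.
$message = htmlspecialchars('Tom & Jerry: https://example.com/?a=1&b=2 & more');

$fixed = preg_replace_callback('!https?://\S+!', function (array $m): string {
    return str_replace('&amp;', '&', $m[0]);
}, $message);

echo $fixed, "\n";
```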
This may work for you:
preg_match_all('%https?://\S+%msi', $html, $matches, PREG_PATTERN_ORDER);
foreach ($matches[0] as $match) {
    $fixed = preg_replace('/&amp;/i', '&', $match);
    // pass the "#" delimiter so preg_quote() also escapes any "#" in the URL
    $match = preg_quote($match, '#');
    $html = preg_replace("#$match#", $fixed, $html);
}

PHP get specific word of string

What PHP function can I use to extract only the word Duitsland from the string /Duitsland|/groepsreizen fietsen|Stars/1?
I tried everything but did not find the right method.
http://php.net/manual/de/function.str-replace.php
$result= str_replace("Duitsland", "", "/Duitsland|/groepsreizen fietsen|Stars/1");
Result: "/|/groepsreizen fietsen|Stars/1"
There are multiple ways to do this. One is to use a regex with preg_match() to find a specific word in the string:
$str = "/Duitsland|/groepsreizen fietsen|Stars/1";
preg_match("/[^|\/]+/", $str, $match);
echo $match[0];
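If you need other segments besides the first, the same separators can drive a split instead of a match. The following is a minimal sketch using preg_split(). It assumes "/" and "|" are the only separators in this kind of string. PREG_SPLIT_NO_EMPTY drops the empty pieces caused by the leading "/" and by the "|/" pair.

```php
<?php
// Sketch: split on "/" and "|" to get every segment, then pick one by index.
// Assumes "/" and "|" are the only separators in this kind of string.
$str = "/Duitsland|/groepsreizen fietsen|Stars/1";
$parts = preg_split('~[/|]~', $str, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
echo $parts[0], "\n"; // prints "Duitsland"
```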
With the strpos() function, you can check whether the required word is present:
$haystack = '/Duitsland|/groepsreizen fietsen|Stars/1';
$needle = 'Duitsland';
if (strpos($haystack,$needle) !== false) {
echo $needle;
}
I think this is what you're looking for. This code does not need to know the word in advance; it just finds the first run of word characters in the string:
<?php
$str = "/Duitsland|/groepsreizen fietsen|Stars/1";
preg_match("/\w+/i", $str, $matches);
$first_word = array_shift($matches);
echo $first_word;
It works no matter how many non-word characters precede the word, i.e. it does not depend on a fixed number of leading characters.
Demo: http://ideone.com/GX8IT6

Put URLs from string into array using regex (problem with trailing period)

I am trying to write a function that pulls all URLs from a string and removes a potential trailing period from the end.
function getUrls($string) {
$regex = '/https?\:\/\/[^\" ]+/i';
preg_match_all($regex, $string, $matches);
return ($matches[0]);
}
But it returns http://test.com. (with a trailing period) if I have:
$string = "Hi I am sharing http://test.com.";
$urls = getUrls($string);
It returns the URL with the period at the end.
This one seems to work (taken from here)
$regex="/(https?:\/\/+[\w\-]+\.[\w\-]+)/i";
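Another option, not taken from the answers here, is to keep the question's permissive pattern and strip trailing punctuation from each match with rtrim(). The following is a sketch. The trimmed character set '.,;:!?' is an assumption that you should adjust to your data, and the arrow function needs PHP 7.4+.

```php
<?php
// Sketch: keep the question's permissive pattern, then strip trailing
// characters that are more likely sentence punctuation than part of the URL.
// The trimmed set '.,;:!?' is an assumption; adjust it to your data.
function getUrls(string $string): array
{
    preg_match_all('/https?\:\/\/[^\" ]+/i', $string, $matches);
    return array_map(
        fn(string $url): string => rtrim($url, '.,;:!?'),
        $matches[0]
    );
}

print_r(getUrls("Hi I am sharing http://test.com."));
```

Unlike the domain-only pattern above, this keeps paths and query strings, at the cost of also trimming a legitimate trailing "?" or "." if a URL really ends with one.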
In case anyone comes across this, here is what I put together:
$aProtocols = array('http:\/\/', 'https:\/\/', 'ftp:\/\/', 'news:\/\/', 'nntp:\/\/', 'telnet:\/\/', 'irc:\/\/', 'mms:\/\/', 'ed2k:\/\/', 'xmpp:', 'mailto:');
$aSubdomains = array('www'=>'http://', 'ftp'=>'ftp://', 'irc'=>'irc://', 'jabber'=>'xmpp:');
$sRELinks = '/(?:(' . implode('|', $aProtocols) . ')[^\^\[\]{}|\\"\'<>`\s]*[^!#\^()\[\]{}|\\:;"\',.?<>`\s])|(?:(?:(?:(?:[^#:<>(){}`\'"\/\[\]\s]+:)?[^#:<>(){}`\'"\/\[\]\s]+#)?(' . implode('|', array_keys($aSubdomains)) . ')\.(?:[^`~!##$%^&*()_=+\[{\]}\\|;:\'",<.>\/?\s]+\.)+[a-z]{2,6}(?:[\/#?](?:[^\^\[\]{}|\\"\'<>`\s]*[^!#\^()\[\]{}|\\:;"\',.?<>`\s])?)?)|(?:(?:[^#:<>(){}`\'"\/\[\]\s]+#)?((?:(?:(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))(?:\.(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))){3})|(?:[A-Fa-f0-9:]{16,39}))|(?:(?:[^`~!##$%^&*()_=+\[{\]}\\|;:\'",<.>\/?\s]+\.)+[a-z]{2,6}))\/(?:[^\^\[\]{}|\\"\'<>`\s]*[^!#\^()\[\]{}|\\:;"\',.?<>`\s](?:[#?](?:[^\^\[\]{}|\\"\'<>`\s]*[^!#\^()\[\]{}|\\:;"\',.?<>`\s])?)?)?)|(?:[^#:<>(){}`\'"\/\[\]\s]+:[^#:<>(){}`\'"\/\[\]\s]+#((?:(?:(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))(?:\.(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))){3})|(?:[A-Fa-f0-9:]{16,39}))|(?:(?:[^`~!##$%^&*()_=+\[{\]}\\|;:\'",<.>\/?\s]+\.)+[a-z]{2,6}))(?:\/(?:(?:[^\^\[\]{}|\\"\'<>`\s]*[^!#\^()\[\]{}|\\:;"\',.?<>`\s])?)?)?(?:[#?](?:[^\^\[\]{}|\\"\'<>`\s]*[^!#\^()\[\]{}|\\:;"\',.?<>`\s])?)?))|([^#:<>(){}`\'"\/\[\]\s]+#(?:(?:(?:[^`~!##$%^&*()_=+\[{\]}\\|;:\'",<.>\/?\s]+\.)+[a-z]{2,6})|(?:(?:(?:(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))(?:\.(?:(?:[0-1]?[0-9]?[0-9])|(?:2[0-4][0-9])|(?:25[0-5]))){3})|(?:[A-Fa-f0-9:]{16,39}))))(?:[^\^*\[\]{}|\\"<>\/`\s]+[^!#\^()\[\]{}|\\:;"\',.?<>`\s])?)/i';
function getUrls($string) {
global $sRELinks;
preg_match_all($sRELinks, $string, $matches);
return ($matches[0]);
}
From http://yellow5.us/journal/server_side_text_linkification/
Depending on how strict you want to be, consider the "Liberal, Accurate Regex Pattern for Matching URLs" discussed on Daring Fireball. The pattern in full is:
\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))
If you are interested in how it works, Alan Storm has a great explanation.
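To use that pattern from PHP, it needs delimiters. The following sketch uses "~" because the pattern itself contains "/". extractUrls() is a hypothetical helper name. Because the pattern's last character must be a non-punctuation character or "/", a sentence-ending period is left out of the match.

```php
<?php
// Sketch: the Daring Fireball pattern wrapped for preg_match_all().
// "~" is the delimiter because the pattern contains "/"; extractUrls() is a
// hypothetical helper name.
function extractUrls(string $text): array
{
    $pattern = '~\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))~';
    preg_match_all($pattern, $text, $matches);
    return $matches[0]; // full matches only
}

print_r(extractUrls("Hi I am sharing http://test.com."));
```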
