Regex rule to match value and replace with specific related value - php

This is primarily a regex question, but it is being used in CodeIgniter's routing file. The routing file is a list of regex rules that it tries to match, hence the need for a one-liner.
Take the following 4 strings:
techniques/foo3
news/bar-4-22
reviews/non23-a
features/wins
I'm looking for a one-line regex rule that will find techniques, news, reviews or features and replace it with a particular int value of 5, 1, 7 or 3. The number corresponds to the name, so techniques=5, news=1, reviews=7 and features=3. The last value after the slash can be any URL-friendly text string. I'll be selecting this as well, straight as is. I essentially want to convert them to the following:
categorysearch/5/foo3
categorysearch/1/bar-4-22
categorysearch/7/non23-a
categorysearch/3/wins
Can this be done with 1 regex line?

Use preg_replace_callback() like so:
$tokens = [
    'techniques' => 5,
    'news' => 1,
    'reviews' => 7,
    'features' => 3
];
// $string holds the incoming path, e.g. 'techniques/foo3'
echo preg_replace_callback('#^([^/]+)/(.*)$#', function ($m) use ($tokens) {
    // Known category: swap the name for its ID under categorysearch/
    if (array_key_exists($m[1], $tokens)) {
        return sprintf('categorysearch/%d/%s', $tokens[$m[1]], $m[2]);
    }
    // Unknown category: return the path unchanged
    return sprintf('%s/%s', $m[1], $m[2]);
}, $string);
If the replacement is as simple as this, a regex is not even required. A simple sscanf() or explode() (with the list construct) should suffice.
Demo
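As a rough illustration of the explode() route mentioned above, here is a minimal sketch. The function name rewriteCategoryPath is made up for this example and is not part of CodeIgniter; the mapping is the one given in the question.

```php
<?php
// Hypothetical helper: turn "category/slug" into "categorysearch/<id>/<slug>".
function rewriteCategoryPath(string $path, array $tokens): string
{
    // Split on the first slash only, so the slug is kept exactly as is.
    $parts = explode('/', $path, 2);
    if (count($parts) !== 2 || !array_key_exists($parts[0], $tokens)) {
        return $path; // unknown category or no slash: leave untouched
    }
    list($category, $slug) = $parts;
    return sprintf('categorysearch/%d/%s', $tokens[$category], $slug);
}

$tokens = ['techniques' => 5, 'news' => 1, 'reviews' => 7, 'features' => 3];
echo rewriteCategoryPath('techniques/foo3', $tokens), "\n"; // categorysearch/5/foo3
echo rewriteCategoryPath('unknown/foo3', $tokens), "\n";    // unknown/foo3
```

No regex engine is involved here, which is the point of the remark: a single split and an array lookup cover the whole mapping.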

$route['techniques/(:any)'] = 'categorysearch/5/$1';
$route['news/(:any)'] = 'categorysearch/1/$1';
This should work, I would think! The reviews and features routes follow the same pattern with 7 and 3.
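The same idea can also be written as a loop so that all four categories are covered. This is only a sketch: the $categories array is a name introduced for this example, and the IDs are the ones given in the question.

```php
<?php
// Stand-in so the snippet runs on its own; in the real routing file,
// keep adding entries to the existing $route array instead.
$route = [];

// Hypothetical lookup table: category name => category ID (from the question).
$categories = ['techniques' => 5, 'news' => 1, 'reviews' => 7, 'features' => 3];

foreach ($categories as $name => $id) {
    // Single quotes keep '$1' literal so the router can substitute it later.
    $route[$name . '/(:any)'] = 'categorysearch/' . $id . '/$1';
}

print_r($route);
```

This is not one regex line, but it keeps the name-to-ID mapping in one place instead of repeating it per route.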

You shouldn't use a regex for everything: they are not efficient for simple things like this and are designed for more complex scenarios.
The following will do the job (there are better ways, but this is the easiest to follow!)
$rawstring = "techniques/foo3";
$find = array('techniques', 'news', 'reviews', 'features');
// Each replacement carries the "categorysearch/" prefix; concatenating a string with an array would not work
$replace = array('categorysearch/5', 'categorysearch/1', 'categorysearch/7', 'categorysearch/3');
$return = str_replace($find, $replace, $rawstring);
// $return = "categorysearch/5/foo3"
This way also works and is simple to follow for noobies like me :-P Amal's answer is far superior!!

Related

Is it possible to change this foreach PHP structure?

Is it possible to change this foreach PHP structure?
function token($word)
{
$result = $word;
$listconjunctions = ['and', 'on', 'in', 'or', 'which'];
foreach ($listconjuctions as $conjunctions){
$result = str_replace($conjunctions,'',$result);
}
return $result;
}
You ask whether it is possible to change this foreach structure. Yes: there is no need for the loop at all.
function token($word, array $listconjunctions=['and', 'on', 'in', 'or', 'which'])
{
return str_replace($listconjunctions,'',$word);
}
There, I fixed it for you, and I added the ability to pass an array of words to remove from the $word string. For example:
$string = "this that and the other which.";
echo token($string, ['that','the','this']);
outputs (str_replace() matches substrings, so the "the" inside "other" is removed as well)
  and  or which.
I tested it with this code, just to show they are functionally equivalent, by default.
function token($word)
{
$result = $word;
$listconjunctions = ['and', 'on', 'in', 'or', 'which'];
foreach ($listconjunctions as $conjunctions){
$result = str_replace($conjunctions,'',$result);
}
return $result;
}
function token2($word, $listconjunctions=['and', 'on', 'in', 'or', 'which'])
{
return str_replace($listconjunctions,'',$word);
}
$string = "this that and the other which.";
echo token($string)."\n\n";
echo token2($string)."\n\n";
Output
this that  the other .
this that  the other .
Try it yourself
https://3v4l.org/K83CL
Additionally
The problem with your original one, besides being bloated, is this:
$listconjunctions
$listconjuctions
See the difference? You're missing an n in the one used in the foreach.
More Advanced
This is a much more advanced version using regular expressions and preg_replace. Regular expressions, or regex for short, are almost like another language in themselves. They let you match patterns in strings.
function token1($word, array $listconjunctions=['and', 'on', 'in', 'or', 'which'])
{
//escape words for use in regular expressions
$listconjunctions = array_map('preg_quote', $listconjunctions);
$pattern = [
'~\b('.implode('|',$listconjunctions).')\b~i', //words
'~\s{2,}~', //run on spaces, 2 or more. eg. 'one  two'
'~\s+([^\w$])~' //spaces before punctuation. eg. 'word .'
];
return preg_replace($pattern, [' ', ' ', '$1'], $word);
}
$string = "this that and on and on the other which.";
echo token($string)."\n\n";
echo token1($string);
I named it token1 and when running it against either your original, or my slimmed down version, we get these differing outputs.
//original functionality
this that     the other .
//advanced version
this that the other.
So as you can see, the second one removes all those improper spaces. The [^\w$] is a character class (a set of characters): the [^ makes it negative, the \w matches 0-9a-zA-Z_, and the $ is just a literal dollar sign. So this means: match anything but 0-9a-zA-Z_$. What it does match are all the special characters and punctuation.
I mention this because the $ is in there to account for things like this string.
'this $5.00 is what you owe me for fixing your code.' //just kidding ... lol
Without excluding the $ from the match, that would become this:
'this$5.00 is what you owe me for fixing your code.'
You may need to add other characters in there if you have problems like that. I just couldn't think of any other punctuation that should always be preceded by a space, although I am sure there must be some.
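To see the effect of the $ inside the character class in isolation, here is a small side-by-side sketch. The sample sentence and the variable names are made up for this example.

```php
<?php
$subject = 'this $5.00 is what you owe me , ok ?';

// With $ excluded: spaces before punctuation are removed, " $5.00" is left alone.
$kept = preg_replace('~\s+([^\w$])~', '$1', $subject);

// Without the exclusion: the space in front of the dollar sign is swallowed too.
$glued = preg_replace('~\s+([^\w])~', '$1', $subject);

echo $kept, "\n";  // this $5.00 is what you owe me, ok?
echo $glued, "\n"; // this$5.00 is what you owe me, ok?
```

Any other character that should keep its leading space (an opening parenthesis, for instance) could be added to the class in the same way.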
I saw that "defect" in the original and I wouldn't feel right if I ignored it.
I hope that makes sense.
https://3v4l.org/XhOlQ
Cheers.

How to get equal parts of multiple strings/array?

I have the following problem: an xls file contains one column with codes. The codes have a prefix and a unique code like this:
- VIP-AX757
- VIP-QBHE6
- CODE-IUEF7
- CODE-QDGF3
- VIP-KJQFB
- ...
How can I get the equal parts of strings or an array? Ideally I would get an array like this:
- $result[VIP] = 3;
- $result[CODE] = 2;
An array with the found prefix and the sum of cells with that prefix. But the result is not so important at the moment.
I couldn't find a solution for getting the equal parts of two strings: how do I compare "VIP-AX757" and "VIP-QBHE6" and get a result that says "VIP-" is the same prefix/part in these two strings?
Hope someone has an idea.
thx!
-drum roll- Time for a one-liner!
$result = array_count_values(array_map(function($v) {list($a) = explode("-",$v); return $a;},$input));
(Assumes $input is your array of codes)
If you are using PHP 5.4 or newer (you should be), then:
$result = array_count_values(array_map(function($v) {return explode("-",$v)[0];},$input));
Tested in PHP CLI:
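If the prefix is always followed by a '-' then you can do something like this:
$result = array();
foreach ($codes as $code) {
    $tmp = explode("-", $code);
    // Start at 1 on first sight of a prefix to avoid an undefined-index notice
    $result[$tmp[0]] = isset($result[$tmp[0]]) ? $result[$tmp[0]] + 1 : 1;
}
print_r($result);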
Depends on the variability of the data, but something like:
preg_match_all('/^([^-]+)/m', $string, $matches);
$result = array_count_values($matches[1]);
print_r($result);
If you don't know that there is an - after the prefix but the prefix is always letters then:
preg_match_all('/^([A-Z]+)/im', $string, $matches);
$result = array_count_values($matches[1]);
Otherwise you'll have to define exactly what the prefix can contain if it's not the delimiter.
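For the literal question of finding the part that two codes share, one fallback is a character-by-character comparison. This is a minimal sketch; the function name commonPrefix is made up for this example, and it compares bytes, which is fine for ASCII codes like the ones shown.

```php
<?php
// Hypothetical helper: return the leading part that both strings share.
function commonPrefix(string $a, string $b): string
{
    $max = min(strlen($a), strlen($b));
    $i = 0;
    // Advance while both strings agree at position $i.
    while ($i < $max && $a[$i] === $b[$i]) {
        $i++;
    }
    return substr($a, 0, $i);
}

echo commonPrefix('VIP-AX757', 'VIP-QBHE6'), "\n";  // VIP-
echo commonPrefix('CODE-IUEF7', 'CODE-QDGF3'), "\n"; // CODE-
```

Be aware that two codes whose unique parts happen to start with the same character (say VIP-AX757 and VIP-AB123) would yield "VIP-A", so this is less reliable than splitting on a known delimiter.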
Since you stated via comment to Niet that you don't have a reliable delimiter, then we can only write a pattern that identifies your targeted substrings based on their location in each line.
I recommend preg_match_all() with no capture group, a start of the line anchor, and a multi-line pattern modifier (m).
I've written a preg_split() alternative, but the pattern is a little "clunkier" because of the way I'm handling the line returns.
Code: (Demo)
$string = 'VIP-AX757
VIP-QBHE6
CODE-IUEF7
CODE-QDGF3
VIP-KJQFB';
var_export(array_count_values(preg_match_all('~^[A-Z]+~m', $string, $out) ? $out[0] : []));
echo "\n\n";
var_export(array_count_values(preg_split('~[^A-Z][^\r\n]+\R?~', $string, -1, PREG_SPLIT_NO_EMPTY)));
Output:
array (
'VIP' => 3,
'CODE' => 2,
)
array (
'VIP' => 3,
'CODE' => 2,
)

preg_replace: how to consider whole array of patterns before replacing?

I'm using preg_replace to match and replace improperly encoded UTF-8 characters with their proper characters. I've created an "old" array containing the wrong characters, and a corresponding "new" array with the replacements. Here is a snippet of each array:
$old = array(
'/â€/',
'/â€™/',
);
$new = array(
'†',
'’',
);
(Note: If you're curious about why I'm doing this, read more here)
A sample string that may contain the wrong data could be:
The programmerâ€™s becoming very frustrated
Which should become:
The programmer's becoming very frustrated
I'm using this function:
$result = preg_replace($old, $new, $str);
But the subject is actually becoming:
The programmer†™s becoming very frustrated
It's clear that PHP is doing what I call a non-greedy match on the subject (not the correct term to use here, I know). preg_replace is executing the replacement on the first pair in the old/new array without considering whether there may be a different pattern in the pattern array that is more appropriate. If I reverse the order of the replacement pairs, then it works as expected.
My question is: Is there an approach that will allow preg_replace to consider all elements of the pattern array before executing a replacement, or is my only option to re-order the arrays?
I don't think there is any option like that. However, you could use an associative array to store your replacements and sort it using uasort and strlen, so larger matches would come first and you wouldn't need to manage your array order manually.
Then you can use array_keys and array_values to act just like your separated $old and $new arrays.
$replacements = array(
'†' => '/â€/',
'’' => '/â€™/',
);
// sorts the replacements array by value string length keeping the indexes intact
uasort($replacements, function($a, $b) {
return strlen($b) - strlen($a);
});
$str = 'The programmerâ€™s becoming very frustrated';
$result = preg_replace(array_values($replacements), array_keys($replacements), $str);
EDIT: As @CasimiretHippolyte pointed out, using array_values is not necessary on the first parameter of the preg_replace function in this case. It would only return a copy of $replacements with numerical indexes, but the order would be the same. Unless you need an array with identical structure to $old from your question, you do not need to use it.
Order the arrays $old and $new in such a way that the longest regex comes first:
$old = array(
'/â€™/',
'/â€/',
);
$new = array(
'’',
'†',
);
$str = 'The programmerâ€™s becoming very frustrated';
$result = preg_replace($old, $new, $str);
echo $result,"\n";
output:
The programmer’s becoming very frustrated
I don't believe there is a way to do this using only preg_replace. However, you can easily do this by sorting your array beforehand:
$replacements = array_combine($old, $new);
// note: krsort() compares the keys as strings, not by length
krsort($replacements);
$result = preg_replace(array_keys($replacements), array_values($replacements), $string);
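If the patterns are plain strings rather than real regular expressions, a different technique avoids the ordering problem altogether: strtr() with an array of pairs tries the longest keys first and does not rescan text it has already replaced. This is a swapped-in alternative to preg_replace, not a way of changing preg_replace's behaviour. The sketch below assumes the file is saved as UTF-8 and that the mis-encoded apostrophe appears as â€™ in the subject.

```php
<?php
// Plain-string pairs (no regex delimiters); the order in the array does not matter.
$map = [
    "â€"  => "†",
    "â€™" => "’",
];

$str = "The programmerâ€™s becoming very frustrated";

// strtr() picks the longest matching key at each position.
$result = strtr($str, $map);

echo $result, "\n"; // The programmer’s becoming very frustrated
```

Because the longer key wins at each position, the shorter "â€" entry never gets the chance to break up the three-character sequence.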

How to split delimited string with leading and trailing characters to array?

For example, if I have this string:
$stuff = "[1379082600-1379082720],[1379082480-1379082480],[1379514420-1379515800],";
I know I can do this to split it into an array:
$stuff = str_replace(array("[","]"),array("",""),$stuff);
$stuff = explode(",",$stuff);
But it seems like there would be an easier way since the string is already in an array form almost. Is there an easier way?
"since the string is already in an array form almost."
It is not. A string and an array are quite different things in a programming language.
"Is there an easier way?"
There is little point in looking for "an easier way". The way you have at the moment is pretty easy already.
You can get inside [] with preg_match_all. Try following:
preg_match_all("/\[(.*?)\]/",$stuff, $matches);
Output of $matches[1]
array (size=3)
0 => string '1379082600-1379082720' (length=21)
1 => string '1379082480-1379082480' (length=21)
2 => string '1379514420-1379515800' (length=21)
Trim the leading and trailing chars and then split on ],[:
$stuff = explode('],[', trim($stuff, '[],'));
This is about as good as you're going to get, I think
$stuff = array_filter(explode(",",str_replace(array("[","]"),"",$stuff)));
print_r($stuff);
[0] => 1379082600-1379082720
[1] => 1379082480-1379082480
[2] => 1379514420-1379515800
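The array_filter() call matters because of the trailing comma in the input: without it, explode() returns a fourth, empty element. A small sketch of the difference, using the string from the question (the variable names $raw and $clean are made up for this example):

```php
<?php
$stuff = "[1379082600-1379082720],[1379082480-1379082480],[1379514420-1379515800],";

// Strip the brackets, then split on commas.
$raw = explode(",", str_replace(["[", "]"], "", $stuff));
echo count($raw), "\n"; // 4: the trailing comma produces a final empty string

// Drop only empty strings; an explicit callback avoids also discarding a literal "0".
$clean = array_filter($raw, function ($value) {
    return $value !== '';
});
echo count($clean), "\n"; // 3
```

array_filter() keeps the original keys, which is harmless here because the empty element is the last one.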
Using a regex-based solution will be slower / less efficient than the other methods.
If you are considering "simpler" to mean "fewer function calls", then I would recommend preg_split() or preg_match_all(). I want to explain, though, that preg_match_all() adds a variable to the current scope and preg_split() doesn't have to. Also, preg_match_all() produces a multidimensional array and you merely want a 1-dim array -- this is another advantage of preg_split().
Here is a battery of options. Some are mine and some are posted by others. Some work, some work better than others, and some don't work. It's education time...
$stuff = "[1379082600-1379082720],[1379082480-1379082480],[1379514420-1379515800],";
// NOTICE THE TRAILING COMMA ON THE STRING!
// my preg_split() pattern #1 (72 steps):
var_export(preg_split('/[\],[]+/', $stuff, 0, PREG_SPLIT_NO_EMPTY));
// my preg_split() pattern #2 (72 steps):
var_export(preg_split('/[^\d-]+/', $stuff, 0, PREG_SPLIT_NO_EMPTY));
// my preg_match_all pattern #1 (16 steps):
var_export(preg_match_all('/[\d-]+/', $stuff, $matches) ? $matches[0] : 'failed');
// my preg_match_all pattern #2 (16 steps):
var_export(preg_match_all('/[^\],[]+/', $stuff, $matches) ? $matches[0] : 'failed');
// Bora's preg_match_all pattern (144 steps); capture group 1 holds the text between the brackets:
var_export(preg_match_all('/\[(.*?)\]/', $stuff, $matches) ? $matches[1] : 'failed');
// Alex Howansky's is the cleanest, efficient / correct method
var_export(explode('],[', trim($stuff, '[],')));
// Andy Gee's method (flawed / incorrect -- 4 elements in output)
var_export(explode(",", str_replace(["[","]"], "", $stuff)));
// OP's method (flawed / incorrect -- 4 elements in output)
$stuff = str_replace(["[", "]"], ["", ""], $stuff);
$stuff = explode(",", $stuff);
var_export($stuff);
If you want to see the method demonstrations click here.
If you want to see the step counts click this pattern demonstration and swap in the patterns that I have provided.

preg_match PHP suggestions

I'm not too great at preg_match yet and I was wondering if someone could give me a hand.
I have an array of values, e.g. array("black*", "blue", "red", "grey*"). I need to find the values with a * at the end and then return the word before it.
I believe preg_match() is the best way of doing it but I'm open to suggestions.
Thanks in advance!
If you must use a regex...
$words = array_map(function($word) {
return preg_replace('/\*\z/', '', $word);
}, $arr);
CodePad.
...but you're probably better off not using regex and using something like...
$words = array_map(function($word) {
return rtrim($word, '*');
}, $arr);
CodePad.
If you want to return only the words which have a trailing *, try something like this first...
$words = preg_grep('/\*\z/', $arr);
CodePad.
The only disadvantage with this (as mentioned in the comments) is PHP will iterate twice over the array. You can simply use a foreach loop to do both of these in one loop if you wish.
Also, it is worth mentioning that anonymous functions are a PHP 5.3 thing. You can still use most of this code on older versions: just move the callbacks into their own named functions and pass the function names instead.
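Putting the two steps from this answer together, a minimal sketch might look like the following. The sample array comes from the question; the variable names $starred and $words are just for illustration.

```php
<?php
$arr = ['black*', 'blue', 'red', 'grey*'];

// Step 1: keep only the entries ending in '*' (preg_grep preserves the original keys).
$starred = preg_grep('/\*\z/', $arr);

// Step 2: strip the trailing asterisk, then reindex from 0.
$words = array_values(array_map(function ($word) {
    return rtrim($word, '*');
}, $starred));

print_r($words); // black, grey
```

This still walks the data twice, as noted above; for small arrays that is unlikely to matter.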
If you always have an array like that (i.e. no complex strings, just word*), you really shouldn't use regular expressions; it's overkill.
Use string functions, like strpos for searching and str_replace or rtrim for removing *.
If you don't need fancy replacing rules (like regular expressions), you should always use this function instead of preg_replace().
— from str_replace manual
Don't need to use preg_match for this - simple char lookup on the string will work:
$words = array('red*', 'grey', 'white', 'green*');
$return = array();
foreach ($words as $word) {
if ($word[strlen($word) - 1] === '*') {
$return[] = substr($word, 0, -1);
}
}
var_dump($return);
