PHP - preg_replace not matching multiple occurrences - php

Trying to replace a string, but it seems to only match the first occurrence, and if I have another occurrence it doesn't match anything, so I think I need to add some sort of end delimiter?
My code:
$mappings = array(
'fname' => $prospect->forename,
'lname' => $prospect->surname,
'cname' => $prospect->company,
);
foreach($mappings as $key => $mapping) if(empty($mapping)) $mappings[$key] = '$2';
$match = '~{(.*)}(.*?){/.*}$~ise';
$source = 'Hello {fname}Default{/fname} {lname}Last{/lname}';
// $source = 'Hello {fname}Default{/fname}';
$text = preg_replace($match, '$mappings["$1"]', $source);
So if I use the $source that's commented, it matches fine, but if I use the one currently in the code above where there's 2 matches, it doesn't match anything and I get an error:
Message: Undefined index: fname}Default{/fname} {lname
Filename: schedule.php(62) : regexp code
So am I right in saying I need to provide an end delimiter or something?
Thanks,
Christian

Apparently your regexp matches fname}Default{/fname} {lname instead of Default.
As I mentioned here use {(.*?)} instead of {(.*)}.
{ has special meaning in regexps so you should escape it \\{.
And I recommend using preg_replace_callback instead of e modifier (you have more flow control and syntax higlighting and it's impossible to force your program to execute malicious code).
Last mistake you're making is not checking whether the requested index exists. :)
My solution would be:
<?php
class A { // Of course with better class name :)
public $mappings = array(
'fname' => 'Tested'
);
public function callback( $match)
{
if( isset( $this->mappings[$match[1]])){
return $this->mappings[$match[1]];
}
return $match[2];
}
}
$a = new A();
$match = '~\\{([^}]+)\\}(.*?)\\{/\\1\\}~is';
$source = 'Hello {fname}Default{/fname} {lname}Last{/lname}';
echo preg_replace_callback( $match, array($a, 'callback'), $source);
This results into:
[vyktor#grepfruit tmp]$ php stack.php
Hello Tested Last

Your regular expression is anchored to the end of the string so you closing {/whatever} must be the last thing in your string. Also, since your open and closing tags are simply .*, there's nothing in there to make sure they match up. What you want is to make sure that your closing tag matches your opening one - using a backreference like {(.+)}(.*?){/\1} will make sure they're the same.
I'm sure there's other gotchas in there - if you have control over the format of strings you're working with (IE - you're rolling your own templating language), I'd seriously consider moving to a simpler, easier to match format. Since you're not 'saving' the default values, having enclosing tags provides you with no added value but makes the parsing more complicated. Just using $VARNAME would work just as well and be easier to match (\$[A-Z]+), without involving back-references or having to explicitly state you're using non-greedy matching.

Related

Create a function to find a specific word in the title

I have the following title formation on my website:
It's no use going back to yesterday, because at that time I was... Lewis Carroll
Always is: The phrase… (author).
I want to delete everything after the ellipsis (…), leaving only the sentence as the title. I thought of creating a function in php that would take the parts of the titles, throw them in an array and then I would work each part, identifying the only pattern I have in the title, which is the ellipsis… and then delete everything. But when I do that, in the X space of my array, it returns the following:
was...
In position 8 of the array comes the word and the ellipsis and I don't know how to find a pattern to delete the author of the title, my pattern was the ellipsis. Any idea?
<?php
$a = get_the_title(155571);
$search = '... ';
if(preg_match("/{$search}/i", $a)) {
echo 'true';
}
?>
I tried with the code above and found the ellipsis, but I needed to bring it into an array to delete the part I need. I tried something like this:
<?php
define('WP_USE_THEMES', false);
require('./wp-blog-header.php');
global $wpdb;
$title_array = explode(' ', get_the_title(155571));
$search = '... ';
if (array_key_exists("/{$search}/i",$title_array)) {
echo "true";
}
?>
I started doing it this way, but it doesn't work, any ideas?
Thanks,
If you use regex you need to escape the string as preg_quote() would do, because a dot belongs to the pattern.
But in your simple case, I would not use a regex and just search for the three dots from the end of the string.
Note: When the elipsis come from the browser, there's no way to detect in PHP.
$title = 'The phrase... (author).';
echo getPlainTitle($title);
function getPlainTitle(string $title) {
$rpos = strrpos($title, '...');
return ($rpos === false) ? $title : substr($title, 0, $rpos);
}
will output
The phrase
First of all, since you're working with regular expressions, you need to remember that . has a special meaning there: it means "any character". So /... / just means "any three characters followed by a space", which isn't what you want. To match a literal . you need to escape it as \.
Secondly, rather than searching or splitting, you could achieve what you want by replacing part of the string. For instance, you could find everything after the ellipsis, and replace it with an empty string. To do that you want a pattern of "dot dot dot followed by anything", where "anything" is spelled .*, so \.\.\..*
$title = preg_replace('/\.\.\..*/', '', $title);

Replacing variables in a string

I am working on a multilingual website in PHP and in my languages files i often have strings which contain multiple variables that will be later filled in to complete the sentences.
Currently i am placing {VAR_NAME} in the string and manually replacing each occurence with its matching value when used.
So basically :
{X} created a thread on {Y}
becomes :
Dany created a thread on Stack Overflow
I have already thought of sprintf but i find it inconvenient because it depends on the order of the variables which can change from a language to another.
And I have already checked How replace variable in string with value in php? and for now i basically use this method.
But i am interested in knowing if there is a built-in (or maybe not) convenient way in PHP to do that considering that i already have variables named exactly as X and Y in the previous example, more like $$ for a variable variable.
So instead of doing str_replace on the string i would maybe call a function like so :
$X = 'Dany';
$Y = 'Stack Overflow';
$lang['example'] = '{X} created a thread on {Y}';
echo parse($lang['example']);
would also print out :
Dany created a thread on Stack Overflow
Thanks!
Edit
The strings serve as templates and can be used multiple times with different inputs.
So basically doing "{$X} ... {$Y}" won't do the trick because i will lose the template and the string will be initialized with the starting values of $X and $Y which aren't yet determined.
I'm going to add an answer here because none of the current answers really cut the mustard in my view. I'll dive straight in and show you the code I would use to do this:
function parse(
/* string */ $subject,
array $variables,
/* string */ $escapeChar = '#',
/* string */ $errPlaceholder = null
) {
$esc = preg_quote($escapeChar);
$expr = "/
$esc$esc(?=$esc*+{)
| $esc{
| {(\w+)}
/x";
$callback = function($match) use($variables, $escapeChar, $errPlaceholder) {
switch ($match[0]) {
case $escapeChar . $escapeChar:
return $escapeChar;
case $escapeChar . '{':
return '{';
default:
if (isset($variables[$match[1]])) {
return $variables[$match[1]];
}
return isset($errPlaceholder) ? $errPlaceholder : $match[0];
}
};
return preg_replace_callback($expr, $callback, $subject);
}
What does that do?
In a nutshell:
Create a regular expression using the specified escape character that will match one of three sequences (more on that below)
Feed that into preg_replace_callback(), where the callback handles two of those sequences exactly and treats everything else as a replacement operation.
Return the resulting string
The regex
The regex matches any one of these three sequences:
Two occurrences of the escape character, followed by zero or more occurrences of the escape character, followed by an opening curly brace. Only the first two occurrences of the escape character are consumed. This is replaced by a single occurrence of the escape character.
A single occurrence of the escape character followed by an opening curly brace. This is replaced by a literal open curly brace.
An opening curly brace, followed by one or more perl word characters (alpha-numerics and the underscore character) followed by a closing curly brace. This is treated as a placeholder and a lookup is performed for the name between the braces in the $variables array, if it is found then return the replacement value, if not then return the value of $errPlaceholder - by default this is null, which is treated as a special case and the original placeholder is returned (i.e. the string is not modified).
Why is it better?
To understand why it's better, let's look at the replacement approaches take by other answers. With one exception (the only failing of which is compatibility with PHP<5.4 and slightly non-obvious behaviour), these fall into two categories:
strtr() - This provides no mechanism for handling an escape character. What if your input string needs a literal {X} in it? strtr() does not account for this, and it would be substituted for the value $X.
str_replace() - this suffers from the same issue as strtr(), and another problem as well. When you call str_replace() with an array argument for the search/replace arguments, it behaves as if you had called it multiple times - one for each of the array of replacement pairs. This means that if one of your replacement strings contains a value that appears later in the search array, you will end up substituting that as well.
To demonstrate this issue with str_replace(), consider the following code:
$pairs = array('A' => 'B', 'B' => 'C');
echo str_replace(array_keys($pairs), array_values($pairs), 'AB');
Now, you'd probably expect the output here to be BC but it will actually be CC (demo) - this is because the first iteration replaced A with B, and in the second iteration the subject string was BB - so both of these occurrences of B were replaced with C.
This issue also betrays a performance consideration that might not be immediately obvious - because each pair is handled separately, the operation is O(n), for each replacement pair the entire string is searched and the single replacement operation handled. If you had a very large subject string and a lot of replacement pairs, that's a sizeable operation going on under the bonnet.
Arguably this performance consideration is a non-issue - you would need a very large string and a lot of replacement pairs before you got a meaningful slowdown, but it's still worth remembering. It's also worth remembering that regex has performance penalties of its own, so in general this consideration shouldn't be included in the decision-making process.
Instead we use preg_replace_callback(). This visits any given part of the string looking for matches exactly once, within the bounds of the supplied regular expression. I add this qualifier because if you write an expression that causes catastrophic backtracking then it will be considerably more than once, but in this case that shouldn't be a problem (to help avoid this I made the only repetition in the expression possessive).
We use preg_replace_callback() instead of preg_replace() to allow us to apply custom logic while looking for the replacement string.
What this allows you to do
The original example from the question
$X = 'Dany';
$Y = 'Stack Overflow';
$lang['example'] = '{X} created a thread on {Y}';
echo parse($lang['example']);
This becomes:
$pairs = array(
'X' = 'Dany',
'Y' = 'Stack Overflow',
);
$lang['example'] = '{X} created a thread on {Y}';
echo parse($lang['example'], $pairs);
// Dany created a thread on Stack Overflow
Something more advanced
Now let's say we have:
$lang['example'] = '{X} created a thread on {Y} and it contained {X}';
// Dany created a thread on Stack Overflow and it contained Dany
...and we want the second {X} to appear literally in the resulting string. Using the default escape character of #, we would change it to:
$lang['example'] = '{X} created a thread on {Y} and it contained #{X}';
// Dany created a thread on Stack Overflow and it contained {X}
OK, looks good so far. But what if that # was supposed to be a literal?
$lang['example'] = '{X} created a thread on {Y} and it contained ##{X}';
// Dany created a thread on Stack Overflow and it contained #Dany
Note that the regular expression has been designed to only pay attention to escape sequences that immediately precede an opening curly brace. This means that you don't need to escape the escape character unless it appears immediately in front of a placeholder.
A note about the use of an array as an argument
Your original code sample uses variables named the same way as the placeholders in the string. Mine uses an array with named keys. There are two very good reasons for this:
Clarity and security - it's much easier to see what will end up being substituted, and you don't risk accidentally substituting variables you don't want to be exposed. It wouldn't be much good if someone could simply feed in {dbPass} and see your database password, now would it?
Scope - it's not possible to import variables from the calling scope unless the caller is the global scope. This makes the function useless if called from another function, and importing data from another scope is very bad practice.
If you really want to use named variables from the current scope (and I do not recommend this due to the aforementioned security issues) you can pass the result of a call to get_defined_vars() to the second argument.
A note about choosing an escape character
You'll notice I chose # as the default escape character. You can use any character (or sequence of characters, it can be more than one) by passing it to the third argument - and you may be tempted to use \ since that's what many languages use, but hold on before you do that.
The reason you don't want to use \ is because many languages use it as their own escape character, which means that when you want to specify your escape character in, say, a PHP string literal, you run into this problem:
$lang['example'] = '\\{X}'; // results in {X}
$lang['example'] = '\\\{X}'; // results in \Dany
$lang['example'] = '\\\\{X}'; // results in \Dany
It can lead to a readability nightmare, and some non-obvious behaviour with complex patterns. Pick an escape character that is not used by any other language involved (for example, if you are using this technique to generate fragments of HTML, don't use & as an escape character either).
To sum up
What you are doing has edge-cases. To solve the problem properly, you need to use a tool capable of handling those edge-cases - and when it comes to string manipulation, the tool for the job is most often regex.
Here's a portable solution, using variable variables. Yay!
$string = "I need to replace {X} and {Y}";
$X = 'something';
$Y = 'something else';
preg_match_all('/\{(.*?)\}/', $string, $matches);
foreach ($matches[1] as $value)
{
$string = str_replace('{'.$value.'}', ${$value}, $string);
}
First you set up your string, and your replacements. Then, you perform a regular expression to get an array of matches (strings within { and }, including those brackets). Finally, you loop around these and replace those with the variables you created above, using variable variables. Lovely!
Just thought I'd update this with another option even though you've marked it as correct. You don't have to use variable variables, and an array can be used in it's place.
$map = array(
'X' => 'something',
'Y' => 'something else'
);
preg_match_all('/\{(.*?)\}/', $string, $matches);
foreach ($matches[1] as $value)
{
$string = str_replace('{'.$value.'}', $map[$value], $string);
}
That would allow you to create a function with the following signature:
public function parse($string, $map); // Probably what I'd do tbh
Another option thanks to toolmakersteve in the comments does away with the need for a loop and uses strtr, but requires minor additions to the variables and single quotes instead of double quotes:
$string = 'I need to replace {$X} and {$Y}';
$map = array(
'{$X}' => 'something',
'{$Y}' => 'something else'
);
$string = strtr($string, $map);
If you're running 5.4 and you care about being able to use PHP's builtin variable interpolation in the string, you can use the bindTo() method of Closure like so:
// Strings use interpolation, but have to return themselves from an anon func
$strings = [
'en' => [
'message_sent' => function() { return "You just sent a message to $this->recipient that said: $this->message."; }
],
'es' => [
'message_sent' => function() { return "Acabas de enviar un mensaje a $this->recipient que dijo: $this->message."; }
]
];
class LocalizationScope {
private $data;
public function __construct($data) {
$this->data = $data;
}
public function __get($param) {
if(isset($this->data[$param])) {
return $this->data[$param];
}
return '';
}
}
// Bind the string anon func to an object of the array data passed in and invoke (returns string)
function localize($stringCb, $data) {
return $stringCb->bindTo(new LocalizationScope($data))->__invoke();
}
// Demo
foreach($strings as $str) {
var_dump(localize($str['message_sent'], array(
'recipient' => 'Jeff Atwood',
'message' => 'The project should be done in 6 to 8 weeks.'
)));
}
//string(93) "You just sent a message to Jeff Atwood that said: The project should be done in 6 to 8 weeks."
//string(95) "Acabas de enviar un mensaje a Jeff Atwood que dijo: The project should be done in 6 to 8 weeks."
(Codepad Demo)
Perhaps, it feels a bit hacky, and I don't particularly like using $this in this instance. But you do get the added benefit of relying on PHP's variable interpolation (which allows you to do things like escaping, that are difficult to achieve with regex).
EDIT: Added LocalizationScope, which adds another benefit: no warnings if localization anonymous functions try to access data that was not provided.
strtr is probably a better choice for this kind of things, because it replaces longest keys first:
$repls = array(
'X' => 'Dany',
'Y' => 'Stack Overflow',
);
foreach($data as $key => $value)
$repls['{' . $key . '}'] = $value;
$result = strtr($text, $repls);
(think of situations where you have keys like XX and X)
And if you don't want to use an array and instead expose all variables from the current scope:
$repls = get_defined_vars();
If your only issue with sprintf is the order of the arguments you can use argument swapping.
From the doc (http://php.net/manual/en/function.sprintf.php):
$format = 'The %2$s contains %1$d monkeys';
echo sprintf($format, $num, $location);
gettext is a widely used universal localization system that does exactly what you want.
There are libraries for most programming languages and PHP has a built-in engine.
It is driven by po-files, simple text based format, for which there are many editors around and it is compatible with sprintf syntax.
It even has some functions to deal with things like complicated plurals that some languages have.
Here are some examples of what it does. Note that _() is an alias for gettext():
echo _('Hello world'); // will output hello world in the current selected language
echo sprintf(_("%s has created a thread on %s"), $name, $site); // translates the string, and hands it over to sprintf()
echo sprintf(_("%2$s has created a thread on %1$s"), $site, $name); // same as above, but with changed order of parameters.
If you have more than a handful of strings, you should definitely use an existing engine, rather than writing your own one.
Adding a new language is just a matter of translating a list of strings and most professional translation tools can work with this file format, too.
Check Wikipedia and the PHP documentation for a basic overview on how this works:
http://en.wikipedia.org/wiki/Gettext
http://de.php.net/gettext
Google finds heaps of documentation and your favourite software repository will most likely have a handful of tools for managing po-files.
Some that I have used are:
poedit: Very light and simple. Good if you don't have too much stuff to translate and don't want to spend time thinking about how that stuff works.
Virtaal: A bit more complex and has a bit of a learning curve, but also some nice features that make your life easier. Good if you need to translate a lot.
GlotPress is a web application (from the wordpress people) that allows collaborative editing of the translation database files.
Why not use str_replace then? If you want it as template.
echo str_replace(array('{X}', '{Y}'), array($X, $Y), $lang['example']);
for every occurrence of this that you need
str_replace was built for this in the first place.
How about defining the "variable" parts as an array with keys corresponding to the placeholders in your string?
$string = "{X} created a thread on {Y}";
$values = array(
'X' => "Danny",
'Y' => "Stack Overflow",
);
echo str_replace(
array_map(function($v) { return '{'.$v.'}'; }, array_keys($values)),
array_values($values),
$string
);
Why can't you just use the template string within a function?
function threadTemplate($x, $y) {
return "{$x} created a thread on {$y}";
}
echo threadTemplate($foo, $bar);
Simple:
$X = 'Dany';
$Y = 'Stack Overflow';
$lang['example'] = "{$X} created a thread on {$Y}";
Hence:
echo $lang['example'];
Will output:
Dany created a thread on Stack Overflow
As you requested.
UPDATE:
As per the OP's comments about making the solution more portable:
Have a class do the parsing for you each time:
class MyParser {
function parse($vstr) {
return "{$x} created a thread on {$y}";
}
}
That way, if the following occurs:
$X = 3;
$Y = 4;
$a = new MyParser();
$lang['example'] = $a->parse($X, $Y);
echo $lang['example'];
Which will return:
3 created a thread on 4;
And, double checking:
$X = 'Steve';
$Y = 10.9;
$lang['example'] = $a->parse($X, $Y);
Will print:
Steve created a thread on 10.9;
As desired.
UPDATE 2:
As per the OP's comments about improving portability:
class MyParser {
function parse($vstr) {
return "{$vstr}";
}
}
$a = new MyParser();
$X = 3;
$Y = 4;
$vstr = "{$X} created a thread on {$Y}";
$a = new MyParser();
$lang['example'] = $a->parse($vstr);
echo $lang['example'];
Will output the results cited previously.
Try
$lang['example'] = "$X created a thread on $Y";
EDIT: Based on latest info
Maybe you need to look at the sprintf() function
Then you could have your template string defined as this
$template_string = '%s created a thread on %s';
$X = 'Fred';
$Y = 'Sunday';
echo sprintf( $template_string, $X, $Y );
$template_string does not change but later in your code when you have assigned different values to $X and $Y you can still use the echo sprintf( $template_string, $X, $Y );
See PHP Manual
just throwing another solution in using associative arrays. This will loop through the associative array and either replace the template or leave it blank.
example:
$list = array();
$list['X'] = 'Dany';
$list['Y'] = 'Stack Overflow';
$str = '{X} created a thread on {Y}';
$newstring = textReplaceContent($str,$list);
function textReplaceContent($contents, $list) {
while (list($key, $val) = each($list)) {
$key = "{" . $key . "}";
if ($val) {
$contents = str_replace($key, $val, $contents);
} else {
$contents = str_replace($key, "", $contents);
}
}
$final = preg_replace('/\[\w+\]/', '', $contents);
return ($final);
}

Parsing plain text in such a way that will recognise a custom if statement

I have the following string:
$string = "The man has {NUM_DOGS} dogs."
I'm parsing this by running it through the following function:
function parse_text($string)
{
global $num_dogs;
$string = str_replace('{NUM_DOGS}', $num_dogs, $string);
return $string;
}
parse_text($string);
Where $num_dogs is a preset variable. Depending on $num_dogs, this could return any of the following strings:
The man has 1 dogs.
The man has 2 dogs.
The man has 500 dogs.
The problem is that in the case that "the man has 1 dogs", dog is pluralised, which is undesired. I know that this could be solved simply by not using the parse_text function and instead doing something like:
if($num_dogs = 1){
$string = "The man has 1 dog.";
}else{
$string = "The man has $num_dogs dogs.";
}
But in my application I'm parsing more than just {NUM_DOGS} and it'd take a lot of lines to write all the conditions.
I need a shorthand way which I can write into the initial $string which I can run through a parser, which ideally wouldn't limit me to just two true/false possibilities.
For example, let
$string = 'The man has {NUM_DOGS} [{NUM_DOGS}|0=>"dogs",1=>"dog called fred",2=>"dogs called fred and harry",3=>"dogs called fred, harry and buster"].';
Is it clear what's happened at the end? I've attempted to initiate the creation of an array using the part inside the square brackets that's after the vertical bar, then compare the key of the new array with the parsed value of {NUM_DOGS} (which by now will be the $num_dogs variable at the left of the vertical bar), and return the value of the array entry with that key.
If that's not totally confusing, is it possible using the preg_* functions?
The premise of your question is that you want to match a specific pattern and then replace it after performing additional processing on the matched text.
Seems like an ideal candidate for preg_replace_callback
The regular expressions for capturing matched parenthesis, quotes, braces etc. can become quite complicated, and to do it all with a regular expression is in fact quite inefficient. In fact you'd need to write a proper parser if that's what you require.
For this question I'm going to assume a limited level of complexity, and tackle it with a two stage parse using regex.
First of all, the most simple regex I can think off for capturing tokens between curly braces.
/{([^}]+)}/
Lets break that down.
{ # A literal opening brace
( # Begin capture
[^}]+ # Everything that's not a closing brace (one or more times)
) # End capture
} # Literal closing brace
When applied to a string with preg_match_all the results look something like:
array (
0 => array (
0 => 'A string {TOK_ONE}',
1 => ' with {TOK_TWO|0=>"no", 1=>"one", 2=>"two"}',
),
1 => array (
0 => 'TOK_ONE',
1 => 'TOK_TWO|0=>"no", 1=>"one", 2=>"two"',
),
)
Looks good so far.
Please note that if you have nested braces in your strings, i.e. {TOK_TWO|0=>"hi {x} y"}, this regex will not work. If this wont be a problem, skip down to the next section.
It is possible to do top-level matching, but the only way I have ever been able to do it is via recursion. Most regex veterans will tell you that as soon as you add recursion to a regex, it stops being a regex.
This is where the additional processing complexity kicks in, and with long complicated strings it's very easy to run out of stack space and crash your program. Use it carefully if you need to use it at all.
The recursive regex taken from one of my other answers and modified a little.
`/{((?:[^{}]*|(?R))*)}/`
Broken down.
{ # literal brace
( # begin capture
(?: # don't create another capture set
[^{}]* # everything not a brace
|(?R) # OR recurse
)* # none or more times
) # end capture
} # literal brace
And this time the ouput only matches top-level braces
array (
0 => array (
0 => '{TOK_ONE|0=>"a {nested} brace"}',
),
1 => array (
0 => 'TOK_ONE|0=>"a {nested} brace"',
),
)
Again, don't use the recursive regex unless you have to. (Your system may not even support them if it has an old PCRE library)
With that out of the way we need to work out if the token has options associated with it. Instead of having two fragments to be matched as per your question, I'd recommend keeping the options with the token as per my examples. {TOKEN|0=>"option"}
Lets assume $match contains a matched token, if we check for a pipe |, and take the substring of everything after it we'll be left with your list of options, again we can use regex to parse them out. (Don't worry I'll bring everything together at the end)
/(\d)+\s*=>\s*"([^"]*)",?/
Broken down.
(\d)+ # Capture one or more decimal digits
\s* # Any amount of whitespace (allows you to do 0 => "")
=> # Literal pointy arrow
\s* # Any amount of whitespace
" # Literal quote
([^"]*) # Capture anything that isn't a quote
" # Literal quote
,? # Maybe followed by a comma
And an example match
array (
0 => array (
0 => '0=>"no",',
1 => '1 => "one",',
2 => '2=>"two"',
),
1 => array (
0 => '0',
1 => '1',
2 => '2',
),
2 => array (
0 => 'no',
1 => 'one',
2 => 'two',
),
)
If you want to use quotes inside your quotes, you'll have to make your own recursive regex for it.
Wrapping up, here's a working example.
Some initialisation code.
$options = array(
'WERE' => 1,
'TYPE' => 'cat',
'PLURAL' => 1,
'NAME' => 2
);
$string = 'There {WERE|0=>"was a",1=>"were"} ' .
'{TYPE}{PLURAL|1=>"s"} named bob' .
'{NAME|1=>" and bib",2=>" and alice"}';
And everything together.
$string = preg_replace_callback('/{([^}]+)}/', function($match) use ($options) {
$match = $match[1];
if (false !== $pipe = strpos($match, '|')) {
$tokens = substr($match, $pipe + 1);
$match = substr($match, 0, $pipe);
} else {
$tokens = array();
}
if (isset($options[$match])) {
if ($tokens) {
preg_match_all('/(\d)+\s*=>\s*"([^"]*)",?/', $tokens, $tokens);
$tokens = array_combine($tokens[1], $tokens[2]);
return $tokens[$options[$match]];
}
return $options[$match];
}
return '';
}, $string);
Please note the error checking is minimal, there will be unexpected results if you pick options that don't exist.
There's probably a lot simpler way to do all of this, but I just took the idea and ran with it.
First of all, it is a bit debatable, but if you can easily avoid it, just pass $num_dogs as an argument to the function as most people believe global variables are evil!
Next, for the getting the "s", I generally do something like this:
$dogs_plural = ($num_dogs == 1) ? '' : 's';
Then just do something like this:
$your_string = "The man has $num_dogs dog$dogs_plural";
It's essentially the same thing as doing an if/else block, but less lines of code and you only have to write the text once.
As for the other part, I am STILL confused about what you're trying to do, but I believe you are looking for some sort of way to convert
{NUM_DOGS}|0=>"dogs",1=>"dog called fred",2=>"dogs called fred and harry",3=>"dogs called fred, harry and buster"]
into:
switch $num_dogs {
case 0:
return 'dogs';
break;
case 1:
return 'dog called fred';
break;
case 2:
return 'dogs called fred and harry';
break;
case 3:
return 'dogs called fred, harry and buster';
break;
}
The easiest way is to try to use a combination of explode() and regex to then get it to do something like I have above.
In a pinch, I have done something similar to what you're asking with an implementation vaguely like the code below.
This is nowhere near as feature rich as #Mike's answer, but it has done the trick in the past.
/**
* This function pluralizes words, as appropriate.
*
* It is a completely naive, example-only implementation.
* There are existing "inflector" implementations that do this
* quite well for many/most *English* words.
*/
function pluralize($count, $word)
{
if ($count === 1)
{
return $word;
}
return $word . 's';
}
/**
* Matches template patterns in the following forms:
* {NAME} - Replaces {NAME} with value from $values['NAME']
* {NAME:word} - Replaces {NAME:word} with 'word', pluralized using the pluralize() function above.
*/
function parse($template, array $values)
{
$callback = function ($matches) use ($values) {
$number = $values[$matches['name']];
if (array_key_exists('word', $matches)) {
return pluralize($number, $matches['word']);
}
return $number;
};
$pattern = '/\{(?<name>.+?)(:(?<word>.+?))?\}/i';
return preg_replace_callback($pattern, $callback, $template);
}
Here are some examples similar to your original question...
echo parse(
'The man has {NUM_DOGS} {NUM_DOGS:dog}.' . PHP_EOL,
array('NUM_DOGS' => 2)
);
echo parse(
'The man has {NUM_DOGS} {NUM_DOGS:dog}.' . PHP_EOL,
array('NUM_DOGS' => 1)
);
The output is:
The man has 2 dogs.
The man has 1 dog.
It may be worth mentioning that in larger projects I've invariably ended up ditching any custom rolled inflection in favour of GNU gettext which seems to be the most sane way forward once multi-lingual is a requirement.
This was copied from an answer posted by flussence back in 2009 in response to this question:
You might want to look at the gettext extension. More specifically, it sounds like ngettext() will do what you want: it pluralises words correctly as long as you have a number to count from.
print ngettext('odor', 'odors', 1); // prints "odor"
print ngettext('odor', 'odors', 4); // prints "odors"
print ngettext('%d cat', '%d cats', 4); // prints "4 cats"
You can also make it handle translated plural forms correctly, which is its main purpose, though it's quite a lot of extra work to do.

Get more backreferences from regexp than parenthesis

Ok this is really difficult to explain in English, so I'll just give an example.
I am going to have strings in the following format:
key-value;key1-value;key2-...
and I need to extract the data to be an array
array('key'=>'value','key1'=>'value1', ... )
I was planning to use regexp to achieve (most of) this functionality, and wrote this regular expression:
/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/
to work with preg_match and this code:
for ($l = count($matches),$i = 1;$i<$l;$i+=2) {
$parameters[$matches[$i]] = $matches[$i+1];
}
However the regexp obviously returns only 4 backreferences - first and last key-value pairs of the input string. Is there a way around this? I know I can use regex just to test the correctness of the string and use PHP's explode in loops with perfect results, but I'm really curious whether it's possible with regular expressions.
In short, I need to capture an arbitrary number of these key-value; pairs in a string by means of regular expressions.
You can use a lookahead to validate the input while you extract the matches:
/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/
(?=(?:\w++-[^;-]++;?)++$) is the validation part. If the input is invalid, matching will fail immediately, but the lookahead still gets evaluated every time the regex is applied. In order to keep it (along with the rest of the regex) in sync with the key-value pairs, I used \G to anchor each match to the spot where the previous match ended.
This way, if the lookahead succeeds the first time, it's guaranteed to succeed every subsequent time. Obviously it's not as efficient as it could be, but that probably won't be a problem--only your testing can tell for sure.
If the lookahead fails, preg_match_all() will return zero (false). If it succeeds, the matches will be returned in an array of arrays: one for the full key-value pairs, one for the keys, one for the values.
regex is powerful tool, but sometimes, its not the best approach.
$string = "key-value;key1-value";
$s = explode(";",$string);
foreach($s as $k){
$e = explode("-",$k);
$array[$e[0]]=$e[1];
}
print_r($array);
Use preg_match_all() instead. Maybe something like:
$matches = $parameters = array();
$input = 'key-value;key1-value1;key2-value2;key123-value123;';
preg_match_all("/(\w+)-([^-;]+)/", $input, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$parameters[$match[1]] = $match[2];
}
print_r($parameters);
EDIT:
to first validate if the input string conforms to the pattern, then just use:
if (preg_match("/^((\w+)-([^-;]+);)+$/", $input) > 0) {
/* do the preg_match_all stuff */
}
EDIT2: the final semicolon is optional
if (preg_match("/^(\w+-[^-;]+;)*\w+-[^-;]+$/", $input) > 0) {
/* do the preg_match_all stuff */
}
No. Newer matches overwrite older matches. Perhaps the limit argument of explode() would be helpful when exploding.
what about this solution:
$samples = array(
"good" => "key-value;key1-value;key2-value;key5-value;key-value;",
"bad1" => "key-value-value;key1-value;key2-value;key5-value;key-value;",
"bad2" => "key;key1-value;key2-value;key5-value;key-value;",
"bad3" => "k%ey;key1-value;key2-value;key5-value;key-value;"
);
foreach($samples as $name => $value) {
if (preg_match("/^(\w+-\w+;)+$/", $value)) {
printf("'%s' matches\n", $name);
} else {
printf("'%s' not matches\n", $name);
}
}
I don't think you can do both validation and extraction of data with one single regexp, as you need anchors (^ and $) for validation and preg_match_all() for the data, but if you use anchors with preg_match_all() it will only return the last set matched.

Regular Expressions: how to do "option split" replaces

those reqular expressions drive me crazy. I'm stuck with this one:
test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not
Task:
Remove all [[ and ]] and if there is an option split choose the later one so output should be:
test1:link test2:silver test3:out1insideout2 test4:this|not
I came up with (PHP)
$text = preg_replace("/\\[\\[|\\]\\]/",'',$text); // remove [[ or ]]
this works for part1 of the task. but before that I think I should do the option split, my best solution:
$text = preg_replace("/\\[\\[(.*\|)(.*?)\\]\\]/",'$2',$text);
Result:
test1:silver test3:[[out1[[inside]]out2]] this|not
I'm stuck. may someone with some free minutes help me? Thanks!
I think the easiest way to do this would be multiple passes. Use a regular expression like:
\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]
This will replace option strings to give you the last option from the group. If you run it repeatedly until it no longer matches, you should get the right result (the first pass will replace [[out1[[inside]]out2]] with [[out1insideout2]] and the second will ditch the brackets.
Edit 1: By way of explanation,
\[\[ # Opening [[
(?: # A non-matching group (we don't want this bit)
[^\[\]] # Non-bracket characters
* # Zero or more of anything but [
\| # A literal '|' character representing the end of the discarded options
)? # This group is optional: if there is only one option, it won't be present
( # The group we're actually interested in ($1)
[^\[\]] # All the non-bracket characters
+ # Must be at least one
) # End of $1
\]\] # End of the grouping.
Edit 2: Changed expression to ignore ']' as well as '[' (it works a bit better like that).
Edit 3: There is no need to know the number of nested brackets as you can do something like:
$oldtext = "";
$newtext = $text;
while ($newtext != $oldtext)
{
$oldtext = $newtext;
$newtext = preg_replace(regexp,replace,$oldtext);
}
$text = $newtext;
Basically, this keeps running the regular expression replace until the output is the same as the input.
Note that I don't know PHP, so there are probably syntax errors in the above.
This is impossible to do in one regular expression since you want to keep content in multiple "hierarchies" of the content. It would be possible otherwise, using a recursive regular expression.
Anyways, here's the simplest, most greedy regular expression I can think of. It should only replace if the content matches your exact requirements.
You will need to escape all backslashes when putting it into a string (\ becomes \\.)
\[\[((?:[^][|]+|(?!\[\[|]])[^|])++\|?)*]]
As others have already explained, you use this with multiple passes. Keep looping while there are matches, performing replacement (only keeping match group 1.)
Difference from other regular expressions here is that it will allow you to have single brackets in the content, without breaking:
test1:[[link]] test2:[[gold|si[lv]er]]
test3:[[out1[[in[si]de]]out2]] test4:this|not
becomes
test1:[[link]] test2:si[lv]er
test3:out1in[si]deout2 test4:this|not
Why try to do it all in one go. Remove the [[]] first and then deal with options, do it in two lines of code.
When trying to get something going favour clarity and simplicity.
Seems like you have all the pieces.
Why not just simply remove any brackets that are left?
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$str = preg_replace('/\\[\\[(?:[^|\\]]+\\|)+([^\\]]+)\\]\\]/', '$1', $str);
$str = str_replace(array('[', ']'), '', $str);
Well, I didn't stick to just regex, because I'm of a mind that trying to do stuff like this with one big regex leads you to the old joke about "Now you have two problems". However, give something like this a shot:
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not'; $reg = '/(.*?):(.*?)( |$)/';
preg_match_all($reg, $str, $m);
foreach($m[2] as $pos => $match) {
if (strpos($match, '|') !== FALSE && strpos($match, '[[') !== FALSE ) {
$opt = explode('|', $match); $match = $opt[count($opt)-1];
}
$m[2][$pos] = str_replace(array('[', ']'),'', $match );
}
foreach($m[1] as $k=>$v) $result[$k] = $v.':'.$m[2][$k];
This is C# using only using non-escaped strings, hence you will have to double the backslashes in other languages.
String input = "test1:[[link]] " +
"test2:[[gold|silver]] " +
"test3:[[out1[[inside]]out2]] " +
"test4:this|not";
String step1 = Regex.Replace(input, #"\[\[([^|]+)\|([^\]]+)\]\]", #"[[$2]]");
String step2 = Regex.Replace(step1, #"\[\[|\]\]", String.Empty);
// Prints "test1:silver test3:out1insideout2 test4:this|not"
Console.WriteLine(step2);
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$s = preg_split("/\s+/",$str);
foreach ($s as $k=>$v){
$v = preg_replace("/\[\[|\]\]/","",$v);
$j = explode(":",$v);
$j[1]=preg_replace("/.*\|/","",$j[1]);
print implode(":",$j)."\n";
}

Categories