named groups in PHP pcre regex - php

I'm trying to match strings like these:
/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing
/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing/
and
/2011/10/4545
/2011/10/4545/
and get the year, the month and the third segment back. This is the regex I've got:
%/(?P<year>\d{4})/(?P<month>\d{2})/((?P<id>\d{1,})|(?P<permalink>.{1,}))[/]{0,1}$%
I thought the resulting matches array would always contain three variables: year, month and either id or permalink. What actually happens is that when permalink is matched, I still get an empty id variable in the resulting array. Is there a way to rewrite the regex so that the resulting array contains only year, month and whichever of id or permalink matched?

I believe named groups aren't "ignored" when using the | syntax because there's no way of knowing whether you want to keep both of the results. In other words, both sides of | are evaluated even when one of them has or doesn't have a match, unlike conditional or in most programming languages.
As an example, if you have a regular expression
/(?P<foo>abc)|(?P<bar>def)/
and the string to compare against is abcdef, in some cases you'd want to know that both subexpressions matched and so both variables should be set. And if both variables are set in some cases, it's better to set them in all cases so that the programmer doesn't first have to check if they've been set before handling them.
And as a comment to the question "Is there a way to rewrite a regex so resulting array will only contain year, month and id or permalink", why would you want that? Just check if the variable is empty. If the regex would leave either of them out, you'd still need a check which of them is set. The exact same logic can be used to check which of them is empty.
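If you really do want an array that holds only the groups that matched, one option is to filter the result after the match. This is a minimal sketch, assuming PHP 7.1+; the function name parsePostUrl is made up for illustration, and the pattern is the question's with three changes of mine: a leading ^ anchor, a non-capturing wrapper around the alternation, and [^/] instead of . so a trailing slash stays out of the permalink.

```php
<?php
// Hypothetical helper: returns only the named groups that captured something,
// or null when the URL does not match at all.
function parsePostUrl(string $url): ?array
{
    $pattern = '%^/(?P<year>\d{4})/(?P<month>\d{2})/(?:(?P<id>\d+)|(?P<permalink>[^/]+))/?$%';
    if (!preg_match($pattern, $url, $m)) {
        return null;
    }
    // Drop the numeric keys and any named group that is an empty string.
    return array_filter($m, function ($value, $key) {
        return is_string($key) && $value !== '';
    }, ARRAY_FILTER_USE_BOTH);
}

print_r(parsePostUrl('/2011/10/4545/'));
print_r(parsePostUrl('/2011/10/Lorem-ipsum-dolor'));
```

The first call should leave year, month and id; the second should leave year, month and permalink.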

Since they are present in the regex, the named groups will be always included in the match groups even if they did not match anything due to the |.
You may also want to improve the regex a bit, substituting the . in <permalink> with [^/] because you don't want a trailing slash (if present) as part of the permalink.
However, as Mob notes, there's a much easier way to parse such an easy target:
list($year, $target, $link) = array_slice(explode('/', $url), 1);
if (is_numeric($link)) {
// $link == id
}
else {
// $link == permalink
}

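You don't necessarily need regex.
$x = "/2011/10/4545";
$v = explode("/", $x);
array_shift($v); // drop the empty element before the leading slash
if (end($v) === "") {
    array_pop($v); // drop the empty element left by a trailing slash, if any
}
print_r($v);
Outputs
Array
(
[0] => 2011
[1] => 10
[2] => 4545
)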
$url = "/2011/10/Lorem-ipsum-dolor-it-amet-consectetur-adipisicing";
$v = explode("/", $url);
array_shift($v);
array_pop($v);
if(count($v) == 3){
array_pop($v);
print_r($v);
} else {
print_r($v); }
Outputs
Array
(
[0] => 2011
[1] => 10
)

Related

PHP Regex Pattern - Match url if only one level deep

My question is similar to this but I can't get it to work: Path Regular Expression - Allow only one level
I have an array with a bunch of URLs from a website that are either category or sub-category pages, so:
http://www.mysite.com/dogs/
http://www.mysite.com/cats/
http://www.mysite.com/food/
are category pages (only one level beyond the root domain).
Sub-category pages look like:
http://www.mysite.com/dogs/poodles/
http://www.mysite.com/cats/siamese/
http://www.mysite.com/food/pizza/
I want to strip out the sub-categories and only be left with category pages in the array. Any url that contains anything beyond the first set of / / after the root url should be filtered out.
I think I need to use preg_grep but using the pattern in the updated answer that I referenced above like
$regex = "#^/[^/]+/?$#";
$categories_only = preg_grep($regex,$array);
yields an empty array.
What pattern will match this correctly?
So I think you don't need regex for this task.
You could implement a function to filter the array:
$urls = array('http://www.mysite.com/dogs/',
'http://www.mysite.com/cats/siamese/junk/?trash=1&x=y',
'http://www.mysite.com/food/pizza/');
function filter_url($url) {
$split = explode('/', $url);
return (count($split) == 5 && empty($split[4])) ||
(count($split) == 4 && !empty($split[3]));
}
print_r(array_filter($urls, 'filter_url'));
This would output:
Array ( [0] => http://www.mysite.com/dogs/ )
<?php
$array = array("http://www.mysite.com/dogs/poodles/",
"http://www.mysite.com/cats/siamese/",
"http://www.mysite.com/dogs/",
"http://www.mysite.com/cats/",
"http://www.mysite.com/food/",
"http://www.mysite.com/food/pizza/");
$regex = "#^http://[^/]+/?[^/]+/?$#";
$categories_only = preg_grep($regex,$array);
print_r($categories_only);
This outputs:
Array
(
[2] => http://www.mysite.com/dogs/
[3] => http://www.mysite.com/cats/
[4] => http://www.mysite.com/food/
)
I think this works:
^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})\/([\da-z\.-]+)\/$
It only allows two forward slashes after the .com (or whatever the top-level domain is); the trailing $ is what rejects anything deeper.
Play with it here: http://rubular.com/r/TBLpnJFdJg
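For what it's worth, the pattern from the question returns an empty array because it is anchored to a path that starts with /, while the array holds full URLs that start with http://. Here is a sketch of one way around that, assuming it is acceptable to pull the path out with parse_url first (the variable names are only illustrative):

```php
<?php
$urls = array(
    "http://www.mysite.com/dogs/poodles/",
    "http://www.mysite.com/cats/siamese/",
    "http://www.mysite.com/dogs/",
    "http://www.mysite.com/cats/",
    "http://www.mysite.com/food/",
    "http://www.mysite.com/food/pizza/",
);

// Keep a URL only when its path is exactly one level deep.
$categoriesOnly = array_filter($urls, function ($url) {
    $path = parse_url($url, PHP_URL_PATH);
    // parse_url returns null/false when there is no usable path.
    return is_string($path) && preg_match('#^/[^/]+/?$#', $path) === 1;
});

print_r($categoriesOnly);
```

array_filter keeps the original keys, so wrap the result in array_values() if you need a reindexed list.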

Regex with possible empty matches and multi-line match

I've been trying to "parse" some data using a regex, and I feel as if I'm close, but I just can't seem to bring it all home.
The data that needs parsing generally looks like this: <param>: <value>\n. The number of params can vary, just as the value can. Still, here's an example:
FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 789654
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of \n's. It can be empty, too.
What's worse, though is that this CAN contain colons (but they're _"escaped"_ using `\`), and even basic markup!
To push this text into an object, I put together this little expression:
if (preg_match_all('/^([^:\n\\\\]+):\s*(.+)/m', $this->structuredMessage, $data))
{
$data = array_combine($data[1], $data[2]);
//$data is assoc array FooID => 123456, Name => Chuck, ...
$report = new Report($data);
}
Now, this works all right most of the time, except for the User Message bit: . doesn't match newlines, and if I were to use the s flag, the second group would match everything after FooID: till the very end of the string.
I'm having to use a dirty workaround for that:
$msg = explode(end($data[1]), $string);
$data[2][count($data[2])-1] = array_pop($msg);
After some testing, I've come to understand that sometimes, one or two of the parameters aren't filled in (for example the InternalID can be empty). In that case, my expression doesn't fail, but rather results in:
[1] => Array
(
[0] => FooID
[1] => Name
[2] => When
[3] => InternalID
)
[2] => Array
(
[0] => 123456
[1] => Chuck
[2] => 01/02/2013 01:23:45
[3] => User Message: Hello,
)
I've been trying various other expressions, and came up with this:
/^([^:\n\\]++)\s{0,}:(.*+)(?!^[^:\n\\]++\s{0,}:)/m
//or:
/^([^:\n\\]+)\s{0,}:(.*)(?!^[^:\\\n]+\s{0,}:)/m
The second version being slightly slower.
That solves the issues I had with InternalID: <void>, but still leaves me with the final obstacle: User Message: <multi-line>. Using the s flag doesn't do the trick with my expression ATM.
I can only think of this:
^([^:\n\\]++)\s{0,}:((\n(?![^\n:\\]++\s{0,}:)|.)*+)
Which is, to my eye at least, too complex to be the only option. Ideas, suggestions, links, ... anything would be greatly appreciated
The following regex should work, but I'm not so sure anymore if it is the right tool for this:
preg_match_all(
'%^ # Start of line
([^:]*) # Match anything until a colon, capture in group 1
:\s* # Match a colon plus optional whitespace
( # Match and capture in group 2:
(?: # Start of non-capturing group (used for alternation)
.*$ # Either match the rest of the line
(?= # only if one of the following follows here:
\Z # The end of the string
| # or
\r?\n # a newline
[^:\n\\\\]* # followed by anything except colon, backslash or newline
: # then a colon
) # End of lookahead
| # or match
(?: # Start of non-capturing group (used for alternation/repetition)
[^:\\\\] # Either match a character except colon or backslash
| # or
\\\\. # match any escaped character
)* # Repeat as needed (end of inner non-capturing group)
) # End of outer non-capturing group
) # End of capturing group 2
$ # Match the end of the line%mx',
$subject, $result, PREG_PATTERN_ORDER);
See it live on regex101.
I'm pretty new to PHP, so maybe this is totally out of whack, but maybe you could use something like:
$data = <<<EOT
FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 789654
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of \n's. It can be empty, too
EOT;
if ($key = preg_match_all('~^[^:\n]+?:~m', $data, $match)) {
$val = explode('¬', preg_filter('~^[^:\n]+?:~m', '¬', $data));
array_shift($val);
$res = array_combine($match[0], $val);
}
print_r($res);
yields
Array
(
[FooID:] => 123456
[Name:] => Chuck
[When:] => 01/02/2013 01:23:45
[InternalID:] => 789654
[User Message:] => Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of
's. It can be empty, too
)
So here's what I came up with using a tricky preg_replace_callback():
$string ='FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 789654
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of \n\'s. It can be empty, too
Yellow:cool';
$array = array();
preg_replace_callback('#^(.*?):(.*)|.*$#m', function($m)use(&$array){
static $last_key = ''; // We are going to use this as a reference
if(isset($m[1])){// If there is a normal match (key : value)
$array[$m[1]] = $m[2]; // Then add to array
$last_key = $m[1]; // define the new last key
}else{ // else
$array[$last_key] .= PHP_EOL . $m[0]; // add the whole line to the last entry
}
}, $string); // Anonymous function used thus PHP 5.3+ is required
print_r($array); // print
Online demo
Downside: I'm using PHP_EOL to add newlines which is OS related.
I think I'd avoid using regex to do this task, instead split it into sub-tasks.
Basic algorithm outline
1. Split the string on \n using explode.
2. Loop over the resulting array.
3. Split each resulting string on : (also using explode) with a limit of 2.
4. If the produced array's length is less than 2, add the entirety of the data to the previous key's value.
5. Else, use the first array index as your key and the second as the value, unless the split colon was escaped (in which case, instead add the key + split + value to the previous key's value).
This algorithm does assume there are no keys with escaped colons. Escaped colons in values will be dealt with just fine (i.e. user input).
Code
$str = <<<EOT
FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID:
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
This\: works too. And can start with any number of \\n's. It can be empty, too.
What's worse, though is that this CAN contain colons (but they're _"escaped"_
using `\`) like so `\:`, and even basic markup!
EOT;
$arr = explode("\n", $str);
$prevKey = '';
$split = ': ';
$output = array();
for ($i = 0, $arrlen = sizeof($arr); $i < $arrlen; $i++) {
$keyValuePair = explode($split, $arr[$i], 2);
// ?: Is this a valid key/value pair
if (sizeof($keyValuePair) < 2 && $i > 0) {
// -> Nope, append the value to the previous key's value
$output[$prevKey] .= "\n" . $keyValuePair[0];
}
else {
// -> Maybe
// ?: Did we miss an escaped colon
if (substr($keyValuePair[0], -1) === '\\') {
// -> Yep, this means this is a value, not a key/value pair append both key and
// value (including the split between) to the previous key's value ignoring
// any colons in the rest of the string (allowing dates to pass through)
$output[$prevKey] .= "\n" . $keyValuePair[0] . $split . $keyValuePair[1];
}
else {
// -> Nope, create a new key with a value
$output[$keyValuePair[0]] = $keyValuePair[1];
$prevKey = $keyValuePair[0];
}
}
}
var_dump($output);
Output
array(5) {
["FooID"]=>
string(6) "123456"
["Name"]=>
string(5) "Chuck"
["When"]=>
string(19) "01/02/2013 01:23:45"
["InternalID"]=>
string(0) ""
["User Message"]=>
string(293) "Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
This\: works too. And can start with any number of \n's. It can be empty, too.
What's worse, though is that this CAN contain colons (but they're _"escaped"_
using `\`) like so `\:`, and even basic markup!"
}
Online demo
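If you would still rather stay with a single expression, a lookahead that stops each value at the next "key:" line also seems to cover both the empty values and the multi-line message. Treat this as a sketch under assumptions, not a tested drop-in: it assumes \n line endings, keys that contain no backslash, and that any colon on a continuation line of the message is escaped as described in the question. The function name parseReport is made up.

```php
<?php
// Illustrative helper, not taken from any of the answers above.
function parseReport(string $text): array
{
    // Group 1: a key, from the start of a line up to the first colon.
    // Group 2: the value, lazily, up to the next "key:" line or the end of the string.
    // The s flag lets . cross newlines; the lookahead keeps it from running to the end.
    $pattern = '/^([^:\n\\\\]+):[ \t]*(.*?)(?=\n[^:\n\\\\]+:|\z)/ms';
    preg_match_all($pattern, $text, $m);
    return array_combine($m[1], $m[2]);
}

$text = <<<'EOT'
FooID: 123456
When: 01/02/2013 01:23:45
InternalID:
User Message: Hello,
this can span several lines
and contain an escaped\: colon
EOT;

print_r(parseReport($text));
```

A continuation line with an unescaped colon would be read as a new key, which is the same limitation the explode-based answer has.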

Efficient way to parse this string into array in PHP?

Background
I have an array which I create by splitting a string based on every occurrence of 0d0a using preg_split('/(?<=0d0a)(?!$)/').
For example:
$string = "78781110d0a78782220d0a";
will be split into:
Array ( [0] => 78781110d0a [1] => 78782220d0a )
A valid array element has to start with 7878 and end with 0d0a.
The Problem
But sometimes, there's an additional 0d0a in the string which splits into an extra and invalid array element, i.e., that doesn't begin with 7878.
Take this string for example:
$string = "78781110d0a2220d0a78783330d0a";
This is split into:
Array ( [0] => 78781110d0a [1] => 2220d0a [2] => 78783330d0a )
But it should actually be:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a)
My Solution
I've written the following (messy) code to get around this:
$data = Array('78781110d0a','2220d0a','78783330d0a');
$i = 0; //count for $data array;
$j = 0; //count for $dataFixed array;
$dataFixed = $data;
foreach($data as $packet) {
if (substr($packet,0,4) != "7878") { //if packet doesn't start with 7878, do some fixing
if ($i != 0) { // nothing can be done about the first packet, so it is skipped
$j++;
if ((substr(strtolower($packet), -4, 4) == "0d0a")) { // merge only packets that end with 0d0a; anything else is 'mostly' not valid, so discard it
$dataFixed[$i-$j] = $dataFixed[$i-$j] . $packet;
}
unset($dataFixed[$i-$j+1]);
$dataFixed = array_values($dataFixed);
}
}
$i++;
}
Description
I first copy the array to another array $dataFixed. In a foreach loop of the $data array, I check whether it starts with 7878. If it doesn't, I join it with the previous array in $data. I then unset the current array in $dataFixed and reset the array elements with array_values.
But I'm not very confident about this solution.. Is there a better, more efficient way?
UPDATE
What if the input string doesn't end in 0d0a like it's supposed to? The extra characters will stick to the previous array element.
For example, in the string 78781110d0a2220d0a78783330d0a0000, 0000 should be separated as another array element.
Use another positive lookahead (?=7878) to form:
preg_split('/(?<=0d0a)(?=7878)/',$string)
Note: I removed (?!$) because I wasn't sure what that was for, based on your example data.
For example, this code:
$string = "78781110d0a2220d0a78783330d0a";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
print_r($array);
Results in:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a )
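Since the question says a valid element has to start with 7878 and end with 0d0a, the split can be followed by a check that sorts the pieces accordingly. A rough sketch, assuming lowercase hex input as in the examples; splitPackets and the 'valid'/'invalid' keys are names I made up:

```php
<?php
// Hypothetical helper: split on "0d0a followed by 7878", then classify each piece.
function splitPackets(string $hex): array
{
    $parts = preg_split('/(?<=0d0a)(?=7878)/', $hex);
    $result = array('valid' => array(), 'invalid' => array());
    foreach ($parts as $part) {
        // A valid packet starts with 7878 and ends with 0d0a.
        if (preg_match('/^7878.*0d0a$/', $part) === 1) {
            $result['valid'][] = $part;
        } else {
            $result['invalid'][] = $part;
        }
    }
    return $result;
}

print_r(splitPackets('78781110d0a2220d0a78783330d0a'));
```

A piece with trailing garbage (for example 78783330d0a0000) ends up under 'invalid' here rather than being repaired.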
UPDATE:
Based on your revised question of having possible random characters at the end of the input string, you can add three lines to make a complete program of:
$string = "78781110d0a2220d0a787830d0a330d0a0000";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
$temp = preg_split('/(7878.*0d0a)/',$array[count($array)-1],-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$array[count($array)-1] = $temp[0];
if(count($temp)>1) { $array[] = $temp[1]; }
print_r($array);
We basically do the initial splitting, then split the last element of the resulting array by the expected data format, keeping the delimiter using PREG_SPLIT_DELIM_CAPTURE. The PREG_SPLIT_NO_EMPTY ensures we won't get an empty array element if the input string doesn't end in random characters.
UPDATE 2:
Based on your comment below where it seems you're implying there might be random characters between any of the desired matches, and you want these random characters preserved, you could do this:
$string = "0078781110d0a2220d0a2220d0a0000787830d0a330d0a000078781110d0a2220d0a0000787830d0a330d0a0000";
$split1 = preg_split('/(7878.*?0d0a)/',$string,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$result = array();
foreach($split1 as $e){
$split2 = preg_split('/(.*0d0a)/',$e,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
foreach($split2 as $el){
// test if $el doesn't start with 7878 and ends with 0d0a
if(strpos($el,'7878') !== 0 && substr($el,-4) == '0d0a'){
//if(preg_match('/^(?!7878).*0d0a$/',$el) === 1){
$result[ count($result)-1 ] = $result[ count($result)-1 ] . $el;
} else {
$result[] = $el;
}
}
}
print_r($result);
The strategy employed here is different than above. First we split the input string based on the delimiter that matches your desired data, using the nongreedy regex .*?. At this point we have some strings that contain the ending of a desired value and some garbage at the end, so we split again based on the last occurrence of "0d0a" with the greedy regex .*0d0a. We then append any of those resulting values that don't start with "7878" but end with "0d0a" to the previous value, as this should repair the first and second halves that got split because it contained an extra "0d0a".
I provided two methods for the innermost if statement, one using regular expressions. The regex one is marginally slower in my testing, so I've left that one commented out.
I might still not have your full requirements, so you'll have to let me know if it works and perhaps provide your full dataset.
I think you are using a delimiter, "0d0a", which also happens to be part of the content! It's not possible to avoid getting junk data as long as the delimiter can also be part of the content. Somehow the delimiter must be unique.
Possible solutions:
Change the delimiter to something else that doesn't occur as part of your data (000000, #!.;).
If you are certain about the length each array item may have, use it. Going by the examples, that's not possible.
The solutions given in the other answers consider only the sample data you have shared. If you are confident about what the content of the string will be, then those solutions are pretty good to use. Otherwise they won't give you any guarantee!
Best solution: fix the delimiter first, then use regex or explode, whichever you prefer.
Why don't you use preg_match_all instead? You can avoid all of the non-capturing groups (the look aheads, look behinds) in order to split the string (which without the non-capturing groups removes the matches), and just find the matches you're looking for:
Updated
<?php
$string = "00787817878110d0a22278780d0a78783330d0a00";
preg_match_all('/7878.*?0d0a(?=7878|[^(7878)]*?$)/', $string, $arr);
print_r($arr);
?>
Gives an array $arr[0] => ( [0] => 787817878110d0a22278780d0a, [1] => 78783330d0a ). It strips leading and trailing garbage characters (whatever doesn't start with 7878 or end with 7878 or 0d0a).
So $arr[0] would be the array of values that you are looking for.
See example on ideone
Works with multiple 7878 values and multiple 0d0a values (even though that's ridiculous).
Update
If splitting is more your style, why not avoid regular expressions altogether?
<?php
$string = "787817878110d0a22278780d0a78783330d0a";
$arr = explode('0d0a7878', $string);
$string = implode('0d0a,7878', $arr);
$arr = explode(',', $string);
print_r($arr);
?>
Here we split the string by the delimiter 0d0a7878, which is what @CharlieGorichanaz's solution is doing, and props to him for the quick, accurate solution. We then add a comma, because who doesn't love comma separated values? And we explode again on the commas for an array of desired values. Performance-wise, this ought to be faster than using regular expressions. See example.

Parsing plain text in such a way that will recognise a custom if statement

I have the following string:
$string = "The man has {NUM_DOGS} dogs.";
I'm parsing this by running it through the following function:
function parse_text($string)
{
global $num_dogs;
$string = str_replace('{NUM_DOGS}', $num_dogs, $string);
return $string;
}
parse_text($string);
Where $num_dogs is a preset variable. Depending on $num_dogs, this could return any of the following strings:
The man has 1 dogs.
The man has 2 dogs.
The man has 500 dogs.
The problem is that in the case of "The man has 1 dogs", dog is pluralised, which is undesired. I know that this could be solved simply by not using the parse_text function and instead doing something like:
if($num_dogs == 1){
$string = "The man has 1 dog.";
}else{
$string = "The man has $num_dogs dogs.";
}
But in my application I'm parsing more than just {NUM_DOGS} and it'd take a lot of lines to write all the conditions.
I need a shorthand way which I can write into the initial $string which I can run through a parser, which ideally wouldn't limit me to just two true/false possibilities.
For example, let
$string = 'The man has {NUM_DOGS} [{NUM_DOGS}|0=>"dogs",1=>"dog called fred",2=>"dogs called fred and harry",3=>"dogs called fred, harry and buster"].';
Is it clear what's happened at the end? I've attempted to initiate the creation of an array using the part inside the square brackets that's after the vertical bar, then compare the key of the new array with the parsed value of {NUM_DOGS} (which by now will be the $num_dogs variable at the left of the vertical bar), and return the value of the array entry with that key.
If that's not totally confusing, is it possible using the preg_* functions?
The premise of your question is that you want to match a specific pattern and then replace it after performing additional processing on the matched text.
Seems like an ideal candidate for preg_replace_callback
The regular expressions for capturing matched parenthesis, quotes, braces etc. can become quite complicated, and to do it all with a regular expression is in fact quite inefficient. In fact you'd need to write a proper parser if that's what you require.
For this question I'm going to assume a limited level of complexity, and tackle it with a two stage parse using regex.
First of all, the simplest regex I can think of for capturing tokens between curly braces:
/{([^}]+)}/
Let's break that down.
{ # A literal opening brace
( # Begin capture
[^}]+ # Everything that's not a closing brace (one or more times)
) # End capture
} # Literal closing brace
When applied to a string with preg_match_all the results look something like:
array (
0 => array (
0 => '{TOK_ONE}',
1 => '{TOK_TWO|0=>"no", 1=>"one", 2=>"two"}',
),
1 => array (
0 => 'TOK_ONE',
1 => 'TOK_TWO|0=>"no", 1=>"one", 2=>"two"',
),
)
Looks good so far.
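To try this yourself, something along these lines should do; the subject string is just an example I made up to exercise both token forms:

```php
<?php
// Example input with one plain token and one token that carries options.
$subject = 'A string {TOK_ONE} with {TOK_TWO|0=>"no", 1=>"one", 2=>"two"}';

// $matches[0] holds the full matches (braces included),
// $matches[1] holds the text captured between the braces.
preg_match_all('/{([^}]+)}/', $subject, $matches);

var_export($matches);
```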
Please note that if you have nested braces in your strings, i.e. {TOK_TWO|0=>"hi {x} y"}, this regex will not work. If this won't be a problem, skip down to the next section.
It is possible to do top-level matching, but the only way I have ever been able to do it is via recursion. Most regex veterans will tell you that as soon as you add recursion to a regex, it stops being a regex.
This is where the additional processing complexity kicks in, and with long complicated strings it's very easy to run out of stack space and crash your program. Use it carefully if you need to use it at all.
Here is the recursive regex, taken from one of my other answers and modified a little.
/{((?:[^{}]*|(?R))*)}/
Broken down:
{ # literal brace
( # begin capture
(?: # don't create another capture set
[^{}]* # everything not a brace
|(?R) # OR recurse
)* # none or more times
) # end capture
} # literal brace
And this time the output only matches top-level braces:
array (
0 => array (
0 => '{TOK_ONE|0=>"a {nested} brace"}',
),
1 => array (
0 => 'TOK_ONE|0=>"a {nested} brace"',
),
)
Again, don't use the recursive regex unless you have to. (Your system may not even support them if it has an old PCRE library)
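A small sketch of the recursive version in use. Note that I have swapped [^{}]* for the possessive [^{}]++ here; that is my own substitution, meant to cut down on backtracking when the braces are unbalanced, and the subject string is made up:

```php
<?php
$subject = 'x {TOK_ONE|0=>"a {nested} brace"} y {TOK_TWO}';

// (?R) recurses into the whole pattern, so an inner {...} is consumed
// as part of the outer token instead of being reported separately.
preg_match_all('/\{((?:[^{}]++|(?R))*)\}/', $subject, $matches);

print_r($matches[1]);
```

Only the two top-level tokens should show up in $matches[1]; {nested} stays inside the first one.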
With that out of the way we need to work out if the token has options associated with it. Instead of having two fragments to be matched as per your question, I'd recommend keeping the options with the token as per my examples. {TOKEN|0=>"option"}
Let's assume $match contains a matched token. If we check for a pipe |, and take the substring of everything after it, we'll be left with your list of options, and again we can use regex to parse them out. (Don't worry, I'll bring everything together at the end.)
/(\d+)\s*=>\s*"([^"]*)",?/
Broken down.
(\d+) # Capture one or more decimal digits
\s* # Any amount of whitespace (allows you to do 0 => "")
=> # Literal pointy arrow
\s* # Any amount of whitespace
" # Literal quote
([^"]*) # Capture anything that isn't a quote
" # Literal quote
,? # Maybe followed by a comma
And an example match
array (
0 => array (
0 => '0=>"no",',
1 => '1 => "one",',
2 => '2=>"two"',
),
1 => array (
0 => '0',
1 => '1',
2 => '2',
),
2 => array (
0 => 'no',
1 => 'one',
2 => 'two',
),
)
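One detail worth checking when you parse the options: where the quantifier sits relative to the capture group. With (\d)+ only the last digit of a multi-digit key is kept, while (\d+) keeps the whole number. A quick sketch with a made-up option list:

```php
<?php
$optionList = '0=>"no", 1 => "one", 12=>"twelve"';

// (\d+) captures the complete key, so 12 survives intact.
preg_match_all('/(\d+)\s*=>\s*"([^"]*)",?/', $optionList, $m);
$map = array_combine($m[1], $m[2]);
print_r($map);

// For contrast: a repeated group only remembers its last iteration.
preg_match('/(\d)+/', '12', $lastDigit);
print_r($lastDigit);
```

PHP turns numeric string keys into integers, so $map can be indexed with the integer value of the token directly.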
If you want to use quotes inside your quotes, you'll have to make your own recursive regex for it.
Wrapping up, here's a working example.
Some initialisation code.
$options = array(
'WERE' => 1,
'TYPE' => 'cat',
'PLURAL' => 1,
'NAME' => 2
);
$string = 'There {WERE|0=>"was a",1=>"were"} ' .
'{TYPE}{PLURAL|1=>"s"} named bob' .
'{NAME|1=>" and bib",2=>" and alice"}';
And everything together.
$string = preg_replace_callback('/{([^}]+)}/', function($match) use ($options) {
$match = $match[1];
if (false !== $pipe = strpos($match, '|')) {
$tokens = substr($match, $pipe + 1);
$match = substr($match, 0, $pipe);
} else {
$tokens = array();
}
if (isset($options[$match])) {
if ($tokens) {
preg_match_all('/(\d+)\s*=>\s*"([^"]*)",?/', $tokens, $tokens);
$tokens = array_combine($tokens[1], $tokens[2]);
return $tokens[$options[$match]];
}
return $options[$match];
}
return '';
}, $string);
Please note the error checking is minimal, there will be unexpected results if you pick options that don't exist.
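If the missing error checking worries you, here is one way the lookups could be guarded. This is only a sketch: renderTemplate is a made-up name, and the fallback choices (leave unknown tokens untouched, use an empty string when no option matches) are my assumptions rather than anything the question asks for.

```php
<?php
// Hypothetical wrapper around the same callback idea, with guarded lookups.
function renderTemplate(string $template, array $values): string
{
    return preg_replace_callback('/\{([^}]+)\}/', function ($m) use ($values) {
        // Split "NAME|options" into at most two parts.
        $parts = explode('|', $m[1], 2);
        $name = $parts[0];
        if (!array_key_exists($name, $values)) {
            return $m[0]; // unknown token: leave it as it was
        }
        $value = $values[$name];
        if (!isset($parts[1])) {
            return (string) $value; // plain token, no options
        }
        preg_match_all('/(\d+)\s*=>\s*"([^"]*)",?/', $parts[1], $opt);
        $map = array_combine($opt[1], $opt[2]);
        // No option for this value: fall back to an empty string.
        return isset($map[$value]) ? $map[$value] : '';
    }, $template);
}

$template = 'There {WERE|0=>"was a",1=>"were"} {TYPE}{PLURAL|1=>"s"} named bob'
    . '{NAME|1=>" and bib",2=>" and alice"}';
echo renderTemplate($template, array('WERE' => 1, 'TYPE' => 'cat', 'PLURAL' => 1, 'NAME' => 2)), PHP_EOL;
```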
There's probably a lot simpler way to do all of this, but I just took the idea and ran with it.
First of all, it is a bit debatable, but if you can easily avoid it, just pass $num_dogs as an argument to the function as most people believe global variables are evil!
Next, for getting the "s", I generally do something like this:
$dogs_plural = ($num_dogs == 1) ? '' : 's';
Then just do something like this:
$your_string = "The man has $num_dogs dog$dogs_plural";
It's essentially the same thing as doing an if/else block, but with fewer lines of code, and you only have to write the text once.
As for the other part, I am STILL confused about what you're trying to do, but I believe you are looking for some sort of way to convert
[{NUM_DOGS}|0=>"dogs",1=>"dog called fred",2=>"dogs called fred and harry",3=>"dogs called fred, harry and buster"]
into:
switch ($num_dogs) {
case 0:
return 'dogs';
break;
case 1:
return 'dog called fred';
break;
case 2:
return 'dogs called fred and harry';
break;
case 3:
return 'dogs called fred, harry and buster';
break;
}
The easiest way is to try to use a combination of explode() and regex to then get it to do something like I have above.
In a pinch, I have done something similar to what you're asking with an implementation vaguely like the code below.
This is nowhere near as feature rich as @Mike's answer, but it has done the trick in the past.
/**
* This function pluralizes words, as appropriate.
*
* It is a completely naive, example-only implementation.
* There are existing "inflector" implementations that do this
* quite well for many/most *English* words.
*/
function pluralize($count, $word)
{
if ($count === 1)
{
return $word;
}
return $word . 's';
}
/**
* Matches template patterns in the following forms:
* {NAME} - Replaces {NAME} with value from $values['NAME']
* {NAME:word} - Replaces {NAME:word} with 'word', pluralized using the pluralize() function above.
*/
function parse($template, array $values)
{
$callback = function ($matches) use ($values) {
$number = $values[$matches['name']];
if (array_key_exists('word', $matches)) {
return pluralize($number, $matches['word']);
}
return $number;
};
$pattern = '/\{(?<name>.+?)(:(?<word>.+?))?\}/i';
return preg_replace_callback($pattern, $callback, $template);
}
Here are some examples similar to your original question...
echo parse(
'The man has {NUM_DOGS} {NUM_DOGS:dog}.' . PHP_EOL,
array('NUM_DOGS' => 2)
);
echo parse(
'The man has {NUM_DOGS} {NUM_DOGS:dog}.' . PHP_EOL,
array('NUM_DOGS' => 1)
);
The output is:
The man has 2 dogs.
The man has 1 dog.
It may be worth mentioning that in larger projects I've invariably ended up ditching any custom rolled inflection in favour of GNU gettext which seems to be the most sane way forward once multi-lingual is a requirement.
This was copied from an answer posted by flussence back in 2009 in response to this question:
You might want to look at the gettext extension. More specifically, it sounds like ngettext() will do what you want: it pluralises words correctly as long as you have a number to count from.
print ngettext('odor', 'odors', 1); // prints "odor"
print ngettext('odor', 'odors', 4); // prints "odors"
print ngettext('%d cat', '%d cats', 4); // prints "4 cats"
You can also make it handle translated plural forms correctly, which is its main purpose, though it's quite a lot of extra work to do.

preg_match multiple responses

<?php
$string = "Movies and Stars I., 32. part";
$pattern = "((IX|IV|V?I{0,3}[\.]))";
if(preg_match($pattern, $string, $x) == false)
{
print "NAPAKA!";
}
else
{
print_r($x);
}
?>
And the response is:
Array ( [0] => I. [1] => I. )
I should get only 1 response... Why do I get multiple responses?
The element at index 0 is the whole matched string. The element at index 1 is the contents of the first capture group, i.e. the content inside the parenthesis. In this case, they just happen to be the same. Just use $x[0] to get the value you're looking for.
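A short sketch to make the difference visible, using ~ as the delimiter so the parentheses are unambiguous (the variable names are only for illustration):

```php
<?php
$string = "Movies and Stars I., 32. part";

// With a capturing group: index 0 is the whole match, index 1 is the group.
preg_match('~(IX|IV|V?I{0,3}\.)~', $string, $withGroup);

// With a non-capturing group (?: ... ): only the whole match is returned.
preg_match('~(?:IX|IV|V?I{0,3}\.)~', $string, $withoutGroup);

print_r($withGroup);
print_r($withoutGroup);
```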
The nested parentheses should, in this instance, be a "non-capturing" subpattern.
$pattern = "~(?:IX|IV|V?I{0,3}[\.])~";
Try that. It will tell the regex compiler not to capture the results of those parentheses into the array.
In fact, looking at your regex, you don't even need those parentheses. Make your regex this:
$pattern = "~IX|IV|V?I{0,3}[\.]~";
That should also work.
Your pattern has multiple groups in it -> the () brackets tell you what to capture in your match.
Try this:
$pattern = "(IX|IV|V?I{0,3}[\.])";
If you have a hard time identifying the wanted groups in the result you can name them as specified in the php.net documentation.
That would look something like this:
$pattern = "~(?P<groupname>IX|IV|V?I{0,3}[\.])~";
Index 0 holds the whole matched string, and you get one further element for every pair of parentheses (). That's helpful for getting groups, e.g.
preg_match('~([0-9]+)([a-z]+)~','12abc',$x);
$x is ([0]=>12abc [1]=>12 [2]=>abc)
In your case you can simply delete one pair of () (the other pair is used as the delimiters).
