Regex Multiple Capture of Group - php

I'm using regex to capture the dimensions of ads.
The source content is an HTML file, and I'm trying to capture content that looks like:
size[200x400,300x1200] (could be 1-4 different sizes)
I'm trying to get an array with the different sizes in it.
My capture code looks like this:
$size_declaration = array();
$sizes = array();
$declaration_pattern = "/size\[(\d{2,4}x\d{2,4}|\d{2,4}x\d{2,4},){1,4}\]/";
$sizes_pattern = "/\d{2,4}x\d{2,4}/";
$result = preg_match($declaration_pattern, $html, $size_declaration);
if( $result ) {
$result = preg_match_all($sizes_pattern, $size_declaration[0], $sizes);
var_dump($sizes);
}
The code above produces usable results:
$sizes = array(
[0] => array (
[0] => '200x400',
[1] => '300x1200'
)
)
but it takes quite a bit of code. I thought it would be possible to collect the results with a single regex, but I couldn't find a pattern that works. Is there a way to clean this up a bit?

It's not very practical to turn it into a single expression, so it is better to keep them separate. The first expression finds the boundaries and does rudimentary checks on the inner contents; the second expression breaks it down into individual pieces:
if (preg_match_all('/size\[([\dx,]+)\]/', $html, $matches)) {
foreach ($matches[0] as $size_declaration) {
if (preg_match_all('/\d+x\d+/', $size_declaration, $sizes)) {
print_r($sizes[0]);
}
}
}
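A variation on the two-step idea, as a minimal sketch: the hypothetical helper extract_ad_sizes() (a name invented here, not taken from the answers) keeps the stricter \d{2,4} checks from the question, captures the comma-separated list in group 1, and splits it with explode() instead of running a second regex.

```php
<?php
// Hypothetical helper: returns one array of "WxH" strings per size[...] declaration.
function extract_ad_sizes($html)
{
    // Group 1 captures one to four comma-separated WxH pairs.
    $pattern = '/size\[(\d{2,4}x\d{2,4}(?:,\d{2,4}x\d{2,4}){0,3})\]/';
    $result = array();
    if (preg_match_all($pattern, $html, $matches)) {
        foreach ($matches[1] as $list) {
            // The list is already validated, so a plain split is enough.
            $result[] = explode(',', $list);
        }
    }
    return $result;
}

print_r(extract_ad_sizes('<div>size[200x400,300x1200]</div> size[523x800]'));
```

Declarations that fail the checks (for example size[1x2]) are simply skipped.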

This one is a little simpler, though note that it matches any WxH token in the input, not only those inside size[...]:
$html = "size[200x400,300x600,300x100]";
if (($result = preg_match_all("/(\d{2,4}x\d{2,4}){1,4}/", $html, $matches)) > 0)
var_dump($matches);
//
// $matches =>
// array(
// (int) 0 => array(
// (int) 0 => '200x400',
// (int) 1 => '300x600',
// (int) 2 => '300x100'
// ),
// (int) 1 => array(
// (int) 0 => '200x400',
// (int) 1 => '300x600',
// (int) 2 => '300x100'
// )
// )
//

One way to get every size in its own capture group is to repeat the four possible sizes in the pattern:
$subject = <<<LOD
size[523x800]
size[200x400,300x1200]
size[201x300,352x1200,123x456]
size[142x396,1444x32,143x89,231x456]
LOD;
$pattern = '`size\[(\d{2,4}x\d{2,4})(?:,(\d{2,4}x\d{2,4}))?(?:,(\d{2,4}x\d{2,4}))?(?:,(\d{2,4}x\d{2,4}))?]`';
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
foreach ($matches as &$match) { array_shift($match); }
print_r($matches);
The pattern can also be shortened using references to capture groups:
$pattern = '`size\[(\d{2,4}x\d{2,4})(?:,((?1)))?(?:,((?1)))?(?:,((?1)))?]`';
or with the Oniguruma syntax:
$pattern = '`size\[(\d{2,4}x\d{2,4})(?:,(\g<1>))?(?:,(\g<1>))?(?:,(\g<1>))?]`';
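If one match per size is acceptable instead of one capture group per size, the \G anchor may be worth a look. The following is only a sketch under a stated assumption: it does not verify the closing ] or the limit of four sizes, so it is looser than the patterns above.

```php
<?php
// Sketch: \G matches where the previous match ended, so after "size[" is found,
// each following ",WxH" is picked up as its own match.
// Assumption: the closing "]" and the maximum of four sizes are NOT validated.
$html = 'size[200x400,300x1200] and size[523x800]';
$pattern = '~(?:size\[|\G(?!\A),)(\d{2,4}x\d{2,4})~';
preg_match_all($pattern, $html, $matches);
$sizes = $matches[1]; // flat list of every size found
print_r($sizes);
```

The (?!\A) guard keeps \G from matching at the very start of the subject, where no size[ has been seen yet.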

Related

update the string and preserve old data in array

I'm curious if it is possible to make this piece of code I've made a bit shorter and probably faster? The goal of this code below is to update the string by changing (and preserving) numbers in it with ordered replacements such as {#0}, {#1} and so on for each number found.
Also, keep the numbers found in a separate array so we can recover the information at any time.
The code below works but I believe it may be significantly optimized and hopefully done in one step.
$str = "Lnlhkjfs7834hfdhrf87whf4akuhf999re";//could be any string
$nums = array();
$count = 0;
$res = preg_replace_callback('/\d+/', function($match) use(&$count) {
global $nums;
$nums[] = $match[0];
return "{#".($count++)."}";
}, $str);
print_r($str); // "Lnlhkjfs7834hfdhrf87whf4akuhf999re"
print_r($res); // "Lnlhkjfs{#0}hfdhrf{#1}whf{#2}akuhf{#3}re"
print_r($nums); // ( [0] => 7834 [1] => 87 [2] => 4 [3] => 999 )
Is it possible?
$str = "Lnlhkjfs7834hfdhrf87whf4akuhf999re";//could be any string
$nums = array();
$count = 0;
$res = preg_replace_callback('/([0-9]+)/', function($match) use (&$count,&$nums) {
$nums[] = $match[0];
return "{#".($count++)."}";
}, $str);
print_r($str); // "Lnlhkjfs7834hfdhrf87whf4akuhf999re"
print_r($res); // "Lnlhkjfs{#0}hfdhrf{#1}whf{#2}akuhf{#3}re"
print_r($nums); // ( [0] => 7834 [1] => 87 [2] => 4 [3] => 999 )
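After a few small fixes it works. \d+ works too.
NOTE: global $nums; only works if $nums lives in the global scope. If this code runs inside a function or method, $nums is a local variable there, so the closure has to import it by reference with use (&$nums) instead.
Nothing to add to @JustOnUnderMillions' answer, just another way that avoids the callback function: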
$nums = [];
$res = preg_split('~([0-9]+)~', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
foreach ($res as $k => &$v) {
if ( $k & 1 ) {
$nums[] = $v;
$v = '{#' . ($k >> 1) . '}';
}
}
$res = implode('', $res);
Not shorter, but faster.
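Since the question also asks to recover the information later, here is a minimal round-trip sketch. The helper names mask_numbers() and restore_numbers() are invented for illustration, and the sketch assumes the original string never contains a literal {#n} placeholder of its own.

```php
<?php
// Hypothetical helper: replaces each run of digits with {#n} and collects the digits.
function mask_numbers($str)
{
    $nums = array();
    // $nums is local to this function, so the closure imports it by reference.
    $masked = preg_replace_callback('/\d+/', function ($m) use (&$nums) {
        $nums[] = $m[0];
        return '{#' . (count($nums) - 1) . '}';
    }, $str);
    return array($masked, $nums);
}

// Hypothetical helper: puts the collected numbers back into the placeholders.
function restore_numbers($masked, array $nums)
{
    return preg_replace_callback('/\{#(\d+)\}/', function ($m) use ($nums) {
        $i = (int) $m[1];
        // Leave unknown placeholders untouched.
        return isset($nums[$i]) ? $nums[$i] : $m[0];
    }, $masked);
}

list($masked, $nums) = mask_numbers('Lnlhkjfs7834hfdhrf87whf4akuhf999re');
echo $masked, "\n";
echo restore_numbers($masked, $nums), "\n";
```

Using count($nums) - 1 as the index removes the need for a separate $count variable.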

use preg_split to split chords and words

I'm working on a little piece of code handling song tabs, but I'm stuck on a problem.
I need to parse each song tab line and split it to get chunks of chords on the one hand, and words on the other.
Each chunk would be like :
$line_chunk = array(
0 => //part of line containing one or several chords
1 => //part of line containing words
);
They should stay "grouped". I mean by this that it should split only when the function reaches the "limit" between chords and words.
I guess I should use preg_split to achieve this. I made some tests, but I've been only able to split on chords, not "groups" of chords:
$line_chunks = preg_split('/(\[[^]]*\])/', $line, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
These examples show what I would like to get:
on a line containing no chords :
$input = '{intro}';
$results = array(
array(
0 => null,
1 => '{intro}'
)
);
on a line containing only chords :
$input = '[C#] [Fm] [C#] [Fm] [C#] [Fm]';
$results = array(
array(
0 => '[C#] [Fm] [C#] [Fm] [C#] [Fm]',
1 => null
)
);
on a line containing both :
$input = '[C#]I’m looking for [Fm]you [G#]';
$results = array(
array(
0 => '[C#]',
1 => 'I’m looking for'
),
array(
0 => '[Fm]',
1 => 'you '
),
array(
0 => '[G#]',
1 => null
),
);
Any ideas of how to do this ?
Thanks !
preg_split isn't the way to go. Most of the time, when you have a complicated split task, it's easier to match what you are interested in than to split on a separator that is hard to define.
A preg_match_all approach:
$pattern = '~ \h*
(?| # open a "branch reset group"
( \[ [^]]+ ] (?: \h* \[ [^]]+ ] )*+ ) # one or more chords in capture group 1
\h*
( [^[\n]* (?<=\S) ) # optional lyrics (group 2)
| # OR
() # no chords (group 1)
( [^[\n]* [^\s[] ) # lyrics (group 2)
) # close the "branch reset group"
~x';
if (preg_match_all($pattern, $input, $matches, PREG_SET_ORDER)) {
$result = array_map(function($i) { return [$i[1], $i[2]]; }, $matches);
print_r($result);
}
A branch reset group preserves the same group numbering for each branch.
Note: feel free to add:
if (empty($i[1])) $i[1] = null;
if (empty($i[2])) $i[2] = null;
in the map function if you want to obtain null items instead of empty items.
Note2: if you work line by line, you can remove the \n from the pattern.
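Putting the pattern and the null handling together might look like the sketch below. The function name split_tab_line() is an assumption made for illustration, and the comparison uses === '' rather than empty() so that a lyric consisting of "0" is not turned into null.

```php
<?php
// Hypothetical wrapper around the branch-reset pattern; empty groups become null.
function split_tab_line($line)
{
    $pattern = '~ \h*
        (?|
            ( \[ [^]]+ ] (?: \h* \[ [^]]+ ] )*+ )  # group 1: one or more chords
            \h*
            ( [^[\n]* (?<=\S) )                    # group 2: lyrics, possibly empty
          |
            ()                                     # group 1: no chords
            ( [^[\n]* [^\s[] )                     # group 2: lyrics
        )
    ~x';
    $chunks = array();
    if (preg_match_all($pattern, $line, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $chunks[] = array(
                $m[1] === '' ? null : $m[1],
                $m[2] === '' ? null : $m[2],
            );
        }
    }
    return $chunks;
}

print_r(split_tab_line("[C#]I'm looking for [Fm]you [G#]"));
```

Note that trailing whitespace in the lyrics is dropped, so 'you ' from the question comes back as 'you'.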
I would go with PHP explode:
/*
* Process data
*/
$input = '[C#]I’m looking for [Fm]you [G#]';
$parts = explode("[", $input);
$results = array();
foreach ($parts as $item)
{
$pieces = explode("]", $item);
if (count($pieces) < 2)
{
$arrayitem = array( "Chord" => $pieces[0],
"Lyric" => "");
}
else
{
$arrayitem = array( "Chord" => $pieces[0],
"Lyric" => $pieces[1]);
}
$results[] = $arrayitem;
}
/*
* Echo results
*/
foreach ($results as $str)
{
echo "Chord: " . $str["Chord"];
echo "Lyric: " . $str["Lyric"];
}
Boundaries are not tested in the code, nor is leftover whitespace handled, but it is a base to work on.

PHP Regex to find first 3 match between slash

I have a string like this:
$url = '/controller/method/para1/para2/';
Expected output:
Array(
[0] => 'controller',
[1] => 'method',
[2] => array(
[0] => 'para1',
[1] => 'para2'
)
)
I am trying to build a regex to achieve this but not able to construct the pattern properly.
Please assist.
I tried to use explode function to split,
$split_url = explode('/',$url);
$controller = $split_url[1];
$method = $split_url[2];
unset($split_url[0]);
unset($split_url[1]);
unset($split_url[2]);
$para = $split_url;
But this is really not a great way of doing this and is prone to errors.
Without regex:
$url = '/controller/method/para1/para2/para3/';
$arr = explode('/', trim($url, '/'));
$result = array_slice($arr, 0, 2);
$result[] = array_slice($arr, 2);
print_r($result);
Note: if you need to always have parameters at the same index (even if there is no method or parameters), you can change $result[] = array_slice($arr, 2); to $result[2] = array_slice($arr, 2);
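The same idea wrapped in a function, as a sketch: parse_route() is a hypothetical name, and returning null for a missing controller or method is an assumption about what the caller wants.

```php
<?php
// Hypothetical helper: always returns array(controller, method, params).
function parse_route($url)
{
    // Drop empty segments so "//" or a bare "/" cannot produce '' entries.
    $parts = array_values(array_filter(explode('/', $url), 'strlen'));
    return array(
        isset($parts[0]) ? $parts[0] : null, // controller, or null if absent
        isset($parts[1]) ? $parts[1] : null, // method, or null if absent
        array_slice($parts, 2),              // remaining segments (may be empty)
    );
}

print_r(parse_route('/controller/method/para1/para2/'));
```

Because the three slots are always present, the caller can destructure the result with list() without checking its length first.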
Here's a slightly nasty method using explode:
$url = '/controller/method/para1/para2/para3/';
# get rid of leading and trailing slashes
$url = trim($url, '/');
$arr = explode('/', $url);
$results = array( $arr[0], $arr[1], array_slice($arr, 2) );
print_r($results);
Output:
Array
(
[0] => controller
[1] => method
[2] => Array
(
[0] => para1
[1] => para2
[2] => para3
)
)
It will work for any number of para elements.
And just to show that regexes are not scary, they're lovely fluffy friendly things, here's a regex version:
preg_match_all("/\/(\w+)/", $url, $matches);
$arr = $matches[1];
$results = array( $arr[0], $arr[1], array_slice($arr, 2) );
It's actually very easy to match this URL: just search for / followed by word characters (\w+ matches letters, digits and underscores).
How about something like:
$url = '/controller/method/para1/para2/para3/';
$regex = '~^/([^/]+)/([^/]+)/(?:(.*)/)?$~';
if(preg_match($regex, $url, $matches)) {
$controller = $matches[1];
$method = $matches[2];
$parameters = explode('/', $matches[3]);
}
This captures three parts delimited by /: the controller, the method, and the remaining parameter string. The parameter string can then be split with explode(). To get the array exactly like in your question:
$array = array($controller, $method, $parameters);
// Array
// (
// [0] => controller
// [1] => method
// [2] => Array
// (
// [0] => para1
// [1] => para2
// [2] => para3
// )
// )
An alternate way of thinking about this is to actually parse your route to determine the controller and then pass the remaining route components off to the controller to determine what to do.
$url = '/controller/method/para1/para2/para3/';
$route_parts = explode('/', trim($url, '/')); // we don't need leading and trailing forward slashes
$controller_str = array_shift($route_parts);
$method_str = array_shift($route_parts);
// instantiate controller object by some means (a factory pattern shown here for demo purposes)
$controller = controllerFactory::getInstance($controller_str);
// set method on controller
$controller->setMethod($method_str);
// pass parameters to controller
$controller->setParams($route_parts);
// do whatever with controller
$controller->execute();

Query string like parameters regex

From a text like:
category=[123,456,789], subcategories, id=579, not_in_category=[111,333]
I need a regex to get something like:
$params[category][0] = 123;
$params[category][1] = 456;
$params[category][2] = 789;
$params[subcategories] = ; // I just need to know that this exists
$params[id] = 579;
$params[not_in_category][0] = 111;
$params[not_in_category][1] = 333;
Thanks everyone for the help.
PS
As you suggested, I clarify that the structure and the number of items may change.
Basically the structure is:
key=value, key=value, key=value, ...
where value can be:
a single value (e.g. category=123 or postID=123 or mykey=myvalue, ...)
an "array" (e.g. category=[123,456,789])
a "boolean" where the TRUE value is an assumption from the fact that "key" exists in the array (e.g. subcategories)
This method should be flexible enough:
$str = 'category=[123,456,789], subcategories, id=579, not_in_category=[111,333]';
$str = preg_replace('#,([^0-9 ])#',', $1',$str); //fix for string format with no spaces (count=10,paginate,body_length=300)
preg_match_all('#(.+?)(,[^0-9]|$)#',$str,$sections); //get each section
$params = array();
foreach($sections[1] as $param)
{
list($key,$val) = array_pad(explode('=',$param,2),2,null); //Put either side of the "=" into variables $key and $val ($val is NULL when there is no "=")
if(!is_null($val) && preg_match('#\[([0-9,]+)\]#',$val,$match)>0)
{
$val = explode(',',$match[1]); //turn the comma separated numbers into an array
}
$params[$key] = is_null($val) ? '' : $val;//Use blank string instead of NULL
}
echo '<pre>'.print_r($params,true).'</pre>';
var_dump(isset($params['subcategories']));
Output:
Array
(
[category] => Array
(
[0] => 123
[1] => 456
[2] => 789
)
[subcategories] =>
[id] => 579
[not_in_category] => Array
(
[0] => 111
[1] => 333
)
)
bool(true)
Alternate (no string manipulation before process):
$str = 'count=10,paginate,body_length=300,rawr=[1,2,3]';
preg_match_all('#(.+?)(,([^0-9,])|$)#',$str,$sections); //get each section
$params = array();
foreach($sections[1] as $k => $param)
{
list($key,$val) = array_pad(explode('=',$param,2),2,null); //Put either side of the "=" into variables $key and $val ($val is NULL when there is no "=")
$key = isset($sections[3][$k-1]) ? trim($sections[3][$k-1]).$key : $key; //Fetch first character stolen by previous match
if(!is_null($val) && preg_match('#\[([0-9,]+)\]#',$val,$match)>0)
{
$val = explode(',',$match[1]); //turn the comma separated numbers into an array
}
$params[$key] = is_null($val) ? '' : $val;//Use blank string instead of NULL
}
echo '<pre>'.print_r($params,true).'</pre>';
Another alternate: full re-format of string before process for safety
$str = 'count=10,paginate,body_length=300,rawr=[1, 2,3] , name = mike';
$str = preg_replace(array('#\s+#','#,([^0-9 ])#'),array('',', $1'),$str); //fix for varying string formats
preg_match_all('#(.+?)(,[^0-9]|$)#',$str,$sections); //get each section
$params = array();
foreach($sections[1] as $param)
{
list($key,$val) = array_pad(explode('=',$param,2),2,null); //Put either side of the "=" into variables $key and $val ($val is NULL when there is no "=")
if(!is_null($val) && preg_match('#\[([0-9,]+)\]#',$val,$match)>0)
{
$val = explode(',',$match[1]); //turn the comma separated numbers into an array
}
$params[$key] = is_null($val) ? '' : $val;//Use blank string instead of NULL
}
echo '<pre>'.print_r($params,true).'</pre>';
You can also use JSON, which is native in PHP: http://php.net/manual/fr/ref.json.php
It would be easier ;)
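As a rough illustration of the JSON route, assuming you control the input format and can store the parameters as a JSON object (the JSON string below is a made-up equivalent of the example, not the original input):

```php
<?php
// Assumed JSON form of the same parameters; the original input is NOT JSON.
$json = '{"category":[123,456,789],"subcategories":true,"id":579,"not_in_category":[111,333]}';
$params = json_decode($json, true); // true => decode objects as associative arrays
if ($params === null) {
    die('Invalid JSON: ' . json_last_error_msg());
}
var_dump(isset($params['subcategories'])); // the flag exists as a key
print_r($params['category']);
```

With JSON the numbers come back as integers rather than strings, and no custom parsing is needed.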
<?php
$subject = "category=[123,456,789], subcategories, id=579, not_in_category=[111,333]";
$pattern = '/category=\[(.*?)\,(.*?)\,(.*?)\]\,\s(subcategories),\sid=(.*?)\,\snot_in_category=\[(.*?)\,(.*?)\]/';
preg_match($pattern, $subject, $matches); // no offset: starting at offset 3 would skip past "cat" and the pattern could never match
print_r($matches);
?>
I think this will get you the matches out... didn't actually test it but it might be a good starting point.
Then you just need to push the matches to the correct place in the array you need. Also test if the subcategories string exists with strcmp or something...
Also, notice that I assumed your subject string has that fixed type of structure... if it changes often, you'll need much more than this...
$str = 'category=[123,456,789], subcategories, id=579, not_in_category=[111,333]';
$main_arr = preg_split('/(,\s)+/', $str);
$params = array();
foreach( $main_arr as $value) {
$pos = strpos($value, '=');
if($pos === false) {
$params[$value] = null;
} else {
$index_part = substr($value, 0, $pos);
$value_part = substr($value, $pos+1, strlen($value));
$match = preg_match('/\[(.*?)\]/', $value_part,$xarr);
if($match) {
$inner_arr = preg_split('/(,)+/', $xarr[1]);
foreach($inner_arr as $v) {
$params[$index_part][] = $v;
}
} else {
$params[$index_part] = $value_part;
}
}
}
print_r( $params );
Output :
Array
(
[category] => Array
(
[0] => 123
[1] => 456
[2] => 789
)
[subcategories] =>
[id] => 579
[not_in_category] => Array
(
[0] => 111
[1] => 333
)
)
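For comparison, a compact sketch that splits only on commas outside the brackets. parse_params() is a hypothetical name, and storing bare keys such as subcategories as true is an assumption based on the "boolean" case described in the question.

```php
<?php
// Hypothetical helper; flags such as "subcategories" are stored as true.
function parse_params($str)
{
    $params = array();
    // Split on commas that are not inside [...]: the lookahead rejects a comma
    // when a "]" follows before any "[".
    foreach (preg_split('/,\s*(?![^\[]*\])/', $str) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        if (strpos($part, '=') === false) {
            $params[$part] = true; // bare key: only its presence matters
            continue;
        }
        list($key, $val) = array_map('trim', explode('=', $part, 2));
        if (preg_match('/^\[(.*)\]$/', $val, $m)) {
            // "[a,b,c]" becomes an array; "[]" becomes an empty array.
            $val = $m[1] === '' ? array() : array_map('trim', explode(',', $m[1]));
        }
        $params[$key] = $val;
    }
    return $params;
}

print_r(parse_params('category=[123,456,789], subcategories, id=579, not_in_category=[111,333]'));
```

Values stay strings here; cast them if integers are needed. Nested brackets are not handled.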

Turn text inside brackets to an array PHP

If I have a string that looks like this:
$myString = "[sometext][moretext][993][112]This is a long text";
I want it to be turned into:
$string = "This is a long text";
$arrayDigits[0] = 993;
$arrayDigits[1] = 112;
$arrayText[0] = "sometext";
$arrayText[1] = "moretext";
How can I do this with PHP?
I understand Regular Expressions is the solution. Please notice that $myString was just an example. There can be several brackets, not just two of each, as in my example.
Thanks for your help!
This is what I came up with.
<?php
#For better display
header("Content-Type: text/plain");
#The String
$myString = "[sometext][moretext][993][112]This is a long text";
#Initialize the array
$matches = array();
#Fill it with matches. It would populate $matches[1].
preg_match_all("|\[(.+?)\]|", $myString, $matches);
#Remove anything inside of square brackets, and assign to $string.
$string = preg_replace("|\[.+?\]|", "", $myString);
#Display the results.
print_r($matches[1]);
print_r($string);
After that, you can iterate over the $matches array and check each value to assign it to a new array.
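That iteration step could look roughly like this. The function name split_bracket_tokens() and the use of ctype_digit() to decide what counts as a number are assumptions made for this sketch.

```php
<?php
// Hypothetical helper: separates bracketed tokens into digits and text,
// and returns the remaining string without the bracketed parts.
function split_bracket_tokens($input)
{
    preg_match_all('/\[([^\[\]]+)\]/', $input, $matches);
    $digits = array();
    $text = array();
    foreach ($matches[1] as $token) {
        if (ctype_digit($token)) {
            $digits[] = (int) $token; // all-digit tokens become integers
        } else {
            $text[] = $token;
        }
    }
    // Remove every [ ... ] group, one at a time.
    $rest = preg_replace('/\[[^\[\]]+\]/', '', $input);
    return array('string' => $rest, 'digits' => $digits, 'text' => $text);
}

print_r(split_bracket_tokens('[sometext][moretext][993][112]This is a long text'));
```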
Try this:
$s = '[sometext][moretext][993][112]This is a long text';
preg_match_all('/\[(\w+)\]/', $s, $m);
$m[1] will contain all texts in the brackets; after this you could check the type of each value. Alternatively, you could use two preg_match_all calls: first with the pattern /\[(\d+)\]/ (which returns the array of digits), then with the pattern /\[([a-zA-Z]+)\]/ (which returns the words):
$s = '[sometext][moretext][993][112]This is a long text';
preg_match_all('/\[(\d+)\]/', $s, $matches);
$arrayOfDigits = $matches[1];
preg_match_all('/\[([a-zA-Z]+)\]/', $s, $matches);
$arrayOfWords = $matches[1];
For cases like yours you can make use of named subpatterns so to "tokenize" your string. With some little code, this can be made easily configurable with an array of tokens:
$subject = "[sometext][moretext][993][112]This is a long text";
$groups = array(
'digit' => '\[\d+]',
'text' => '\[\w+]',
'free' => '.+'
);
Each group contains the subpattern and its name. They match in their order, so if the group digit matches, it won't give text a chance (which is necessary here because \d+ is a subset of \w+). This array can then be turned into a full pattern:
foreach($groups as $name => &$subpattern)
$subpattern = sprintf('(?<%s>%s)', $name, $subpattern);
unset($subpattern);
$pattern = sprintf('/(?:%s)/', implode('|', $groups));
The pattern looks like this:
/(?:(?<digit>\[\d+])|(?<text>\[\w+])|(?<free>.+))/
Everything left to do is to execute it against your string, capture the matches and filter them for some normalized output:
if (preg_match_all($pattern, $subject, $matches))
{
$matches = array_intersect_key($matches, $groups);
$matches = array_map('array_filter', $matches);
$matches = array_map('array_values', $matches);
print_r($matches);
}
The matches are now nicely accessible in an array:
Array
(
[digit] => Array
(
[0] => [993]
[1] => [112]
)
[text] => Array
(
[0] => [sometext]
[1] => [moretext]
)
[free] => Array
(
[0] => This is a long text
)
)
The full example at once:
$subject = "[sometext][moretext][993][112]This is a long text";
$groups = array(
'digit' => '\[\d+]',
'text' => '\[\w+]',
'free' => '.+'
);
foreach($groups as $name => &$subpattern)
$subpattern = sprintf('(?<%s>%s)', $name, $subpattern);
unset($subpattern);
$pattern = sprintf('/(?:%s)/', implode('|', $groups));
if (preg_match_all($pattern, $subject, $matches))
{
$matches = array_intersect_key($matches, $groups);
$matches = array_map('array_filter', $matches);
$matches = array_map('array_values', $matches);
print_r($matches);
}
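The matched tokens still carry their square brackets. If the bare values are wanted, one possible post-processing step is sketched below; the sample array simply mirrors the output printed above, and converting the digit tokens to integers is an assumption.

```php
<?php
// Sample data in the shape printed above.
$matches = array(
    'digit' => array('[993]', '[112]'),
    'text'  => array('[sometext]', '[moretext]'),
    'free'  => array('This is a long text'),
);
// Strip the surrounding brackets from a token.
$strip = function ($token) {
    return trim($token, '[]');
};
$clean = array(
    'digit' => array_map('intval', array_map($strip, $matches['digit'])),
    'text'  => array_map($strip, $matches['text']),
    'free'  => $matches['free'], // free text has no brackets to remove
);
print_r($clean);
```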
You could try something along the lines of:
<?php
function parseString($string) {
// identify data in brackets
static $pattern = '#(?:\[)([^\[\]]+)(?:\])#';
// result container
$t = array(
'string' => null,
'digits' => array(),
'text' => array(),
);
$t['string'] = preg_replace_callback($pattern, function($m) use(&$t) {
// shove matched string into digits/text groups
$t[is_numeric($m[1]) ? 'digits' : 'text'][] = $m[1];
// remove the brackets from the text
return '';
}, $string);
return $t;
}
$string = "[sometext][moretext][993][112]This is a long text";
$result = parseString($string);
var_dump($result);
/*
$result === array(
"string" => "This is a long text",
"digits" => array(
"993",
"112",
),
"text" => array(
"sometext",
"moretext",
),
);
*/
(PHP5.3 - using closures)
