I have this as an input to my command line interface as parameters to the executable:
-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"
What I want is to get all of the parameters into a key-value (associative) array in PHP, like this:
$result = [
'Parameter1' => '1234',
'Parameter2' => '38518',
'param3' => 'Test \"escaped\"',
'param4' => '10',
'param5' => '0',
'param6' => 'TT',
'param7' => 'Seven',
'param8' => 'secret',
'SuperParam9' => '4857',
'SuperParam10' => '123',
];
The difficulties are the following:
the parameter prefix can be - or --
the parameter glue (value assignment operator) can be either an = sign or a space
some parameters may be inside a quoted block and may use different separators, glues and prefixes, e.g. a ? mark as the separator.
Since I'm still learning regex, all I have so far is this:
/(-[a-zA-Z]+)/gui
With it I can get all the parameter names starting with a -...
I could explode the whole string and parse it manually, but there are far too many edge cases to think about.
You can try this pattern, which uses the branch reset feature (?|...|...) to deal with the different possible formats of the values:
$str = '-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"';
$pattern = '~ --?(?<key> [^= ]+ ) [ =]
(?|
" (?<value> [^\\\\"]*+ (?s:\\\\.[^\\\\"]*)*+ ) "
|
([^ ?"]*)
)~x';
preg_match_all ($pattern, $str, $matches);
$result = array_combine($matches['key'], $matches['value']);
print_r($result);
demo
In a branch reset group, the capture groups have the same number or the same name in each branch of the alternation.
This means that (?<value> [^\\\\"]*+ (?s:\\\\.[^\\\\"]*)*+ ) is (obviously) the value named capture, but that ([^ ?"]*) is also the value named capture.
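To make the numbering behaviour concrete, here is a minimal sketch comparing an ordinary non-capturing group with a branch reset group. The patterns and sample strings are simplified illustrations, not the pattern from the answer:

```php
<?php
// Ordinary alternation: each branch gets its own group number.
preg_match('~(?:"([^"]*)"|(\S+))~', 'plain', $plain);
// $plain is [0 => 'plain', 1 => '', 2 => 'plain']

// Branch reset: both branches write into group 1.
preg_match('~(?|"([^"]*)"|(\S+))~', 'plain', $reset);
// $reset is [0 => 'plain', 1 => 'plain']

// The quoted branch also fills group 1.
preg_match('~(?|"([^"]*)"|(\S+))~', '"quoted text"', $quoted);
// $quoted is [0 => '"quoted text"', 1 => 'quoted text']

print_r($plain);
print_r($reset);
print_r($quoted);
```

With the branch reset group, the calling code can always read group 1 without checking which branch matched.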
You could use
--?
(?P<key>\w+)
(?|
=(?P<value>[^-\s?"]+)
|
\h+"(?P<value>.*?)(?<!\\)"
|
\h+(?P<value>\H+)
)
See a demo on regex101.com.
Which in PHP would be:
<?php
$data = <<<DATA
-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"
DATA;
$regex = '~
--?
(?P<key>\w+)
(?|
=(?P<value>[^-\s?"]+)
|
\h+"(?P<value>.*?)(?<!\\\\)"
|
\h+(?P<value>\H+)
)~x';
if (preg_match_all($regex, $data, $matches)) {
$result = array_combine($matches['key'], $matches['value']);
print_r($result);
}
?>
This yields
Array
(
[Parameter1] => 1234
[Parameter2] => 38518
[param3] => Test \"escaped\"
[param4] => 10
[param5] => 0
[param6] => TT
[param7] => Seven
[param8] => secret
[SuperParam9] => 4857
[SuperParam10] => 123
)
I need some help. I want to ignore a comma inside a specific string. The data is a comma-separated (CSV) file, but one of the names contains a comma, and I need that comma to be ignored.
What I have so far is
<?php
$pattern = '/([\\W,\\s]+Inc.])|[,]/';
$subject = 'hypertext language, programming, Amazon, Inc., 100';
$limit = -1;
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;
$result = preg_split ($pattern, $subject, $limit, $flags);
?>
Result is
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon',
3 => ' Inc.',
4 => ' 100',
);
?>
And I want the result to be
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon, Inc.',
3 => ' 100',
);
?>
Thanks for your help :)
Note that [\W,\s] is equivalent to \W, since \W matches any character that is not a letter, digit or underscore (which already covers the comma and whitespace). However, it seems you just want to split on a , that is not followed by optional whitespace and the literal text Inc.
You may use a negative lookahead to achieve this:
/,(?!\s*Inc\.)/
^^^^^^^^^^^^
See the regex demo
The (?!\s*Inc\.) lookahead fails any , match if the comma is followed by zero or more whitespace characters (\s*) and then the literal characters Inc.
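As a quick check, the lookahead pattern can be passed to preg_split with the subject string from the question. This is a minimal sketch; the $parts variable name is only illustrative:

```php
<?php
$subject = 'hypertext language, programming, Amazon, Inc., 100';

// Split on every comma except one followed by optional whitespace and "Inc."
$parts = preg_split('/,(?!\s*Inc\.)/', $subject);

print_r($parts);
// $parts is ['hypertext language', ' programming', ' Amazon, Inc.', ' 100']
```

The leading spaces are kept, as in the desired result from the question; array_map('trim', $parts) would remove them if they are not wanted.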
From your tutorial, if I pull the Amazon information as a CSV, I get the following format, which you can then parse with one of PHP's native functions. This shows you don't need explode or a regex to handle this data. Use the right tool for the job:
<?php
$csv =<<<CSV
"amzn","Amazon.com, Inc.",765.56,"11/2/2016","4:00pm","-19.85 - -2.53%",10985
CSV;
$array = str_getcsv($csv);
var_dump($array);
Output:
array (size=7)
0 => string 'amzn' (length=4)
1 => string 'Amazon.com, Inc.' (length=16)
2 => string '765.56' (length=6)
3 => string '11/2/2016' (length=9)
4 => string '4:00pm' (length=6)
5 => string '-19.85 - -2.53%' (length=15)
6 => string '10985' (length=5)
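Note that str_getcsv can only keep the name together when the field is quoted in the data. The sketch below contrasts a quoted and an unquoted line; both sample lines are assumptions modelled on the question's data, and the delimiter, enclosure and escape arguments are passed explicitly:

```php
<?php
// Assumed sample lines: the same record with and without quotes around the name.
$quoted   = 'hypertext language,programming,"Amazon, Inc.",100';
$unquoted = 'hypertext language,programming,Amazon, Inc.,100';

$fromQuoted   = str_getcsv($quoted, ',', '"', '\\');
$fromUnquoted = str_getcsv($unquoted, ',', '"', '\\');

print_r($fromQuoted);   // 4 fields, the name stays intact as 'Amazon, Inc.'
print_r($fromUnquoted); // 5 fields, the name is split into 'Amazon' and ' Inc.'
```

If the source cannot provide quoted fields, a split that treats ", Inc." specially (as in the lookahead answer) is a possible fallback.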
In PHP I have an array like this:
array
0 => string 'open' (length=4)
1 => string 'http://www.google.com' (length=21)
2 => string 'blank' (length=5)
but it could also be like:
array
0 => string 'blank' (length=5)
1 => string 'open' (length=4)
2 => string 'http://www.google.com' (length=21)
Now it is easy to find "blank" with in_array("blank", $array), but how can I check whether one of the strings starts with "http"?
I've tried
array_search('http', $array); // not working
array_search('http://www.google.com', $array); // is working
Everything after http can vary.
Do I need a regex, or how else can I check whether http exists in one of the array's strings?
Thanks for any advice.
"Welcome to PHP, there's a function for that."
Try preg_grep
preg_grep("/^http\b/i",$array);
Regex explained:
/^http\b/i
^ : start of string
http : literal match of http
\b : word boundary
i : case-insensitive match
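The following sketch shows what preg_grep returns for an anchored pattern compared with an unanchored one. The extra entry 'see http docs' is a made-up value added only to show the difference:

```php
<?php
// 'see http docs' is a hypothetical entry, not part of the question's data.
$array = array('open', 'http://www.google.com', 'blank', 'see http docs');

// Anchored: only strings that start with "http" followed by a word boundary.
$anchored = preg_grep('/^http\b/i', $array);

// Unanchored: any string containing "http" anywhere.
$anywhere = preg_grep('/http/', $array);

print_r($anchored); // [1 => 'http://www.google.com']
print_r($anywhere); // [1 => 'http://www.google.com', 3 => 'see http docs']

$found = !empty($anchored); // true when at least one entry starts with http
```

preg_grep preserves the original keys, so array_values() may be useful if a re-indexed list is needed. Note that \b prevents the anchored pattern from matching https URLs; /^https?\b/i would be one way to allow both.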
Try using the preg_grep function which returns an array of entries that match the pattern.
$array = array("open", "http://www.google.com", "blank");
$search = preg_grep('/http/', $array);
print_r($search);
Solution without regex:
$input = array('open', 'http://www.google.com', 'blank');
$output = array_filter($input, function($item){
return strpos($item, 'http') === 0;
});
Output:
array (size=1)
1 => string 'http://www.google.com' (length=21)
You can use preg_grep
$match = preg_grep("/http/",$array);
if(!empty($match)) echo "http exist in the array of string.";
or you can use foreach and preg_match
foreach($array as $check) {
if (preg_match("/http/", $check))
echo "http exist in the array of string.";
}
I have PHP code (an array definition) stored in a string like this:
$code=' array(
0 => "a",
"a" => $GlobalScopeVar,
"b" => array("nested"=>array(1,2,3)),
"c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
); ';
Is there a regular expression to extract this array? I mean I want something like
$array=array(
0 => '"a"',
'a' => '$GlobalScopeVar',
'b' => 'array("nested"=>array(1,2,3))',
'c' => 'function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }',
);
PS: I did some research trying to find a regular expression, but found nothing.
PS2: gods of Stack Overflow, let me put a bounty on this now and I will offer 400 :3
PS3: this will be used in an internal app, where I need to extract an array from a PHP file to be 'processed' in parts; I try to explain it with this: codepad.org/td6LVVme
Regex
So here's the MEGA regex I came up with:
\s* # white spaces
########################## KEYS START ##########################
(?: # We\'ll use this to make keys optional
(?P<keys> # named group: keys
\d+ # match digits
| # or
"(?(?=\\\\")..|[^"])*" # match string between "", works even 4 escaped ones "hello \" world"
| # or
\'(?(?=\\\\\')..|[^\'])*\' # match string between \'\', same as above :p
| # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])* # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
) # close group: keys
########################## KEYS END ##########################
\s* # white spaces
=> # match =>
)? # make keys optional
\s* # white spaces
########################## VALUES START ##########################
(?P<values> # named group: values
\d+ # match digits
| # or
"(?(?=\\\\")..|[^"])*" # match string between "", works even 4 escaped ones "hello \" world"
| # or
\'(?(?=\\\\\')..|[^\'])*\' # match string between \'\', same as above :p
| # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])* # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
| # or
array\s*\((?:[^()]|(?R))*\) # match an array()
| # or
\[(?:[^[\]]|(?R))*\] # match an array, new PHP array syntax: [1, 3, 5] is the same as array(1,3,5)
| # or
(?:function\s+)?\w+\s* # match functions: helloWorld, function name
(?:\((?:[^()]|(?R))*\)) # match function parameters (wut), (), (array(1,2,4))
(?:(?:\s*use\s*\((?:[^()]|(?R))*\)\s*)? # match use(&$var), use($foo, $bar) (optionally)
\{(?:[^{}]|(?R))*\} # match { whatever}
)?;? # match ; (optionally)
) # close group: values
########################## VALUES END ##########################
\s* # white spaces
I've added some comments. Note that you need to use 3 modifiers:
x : lets me add comments
s : makes the dot match newlines
i : case-insensitive matching
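A tiny sketch of what the three modifiers change, using a deliberately simple pattern and a made-up sample string rather than the big regex:

```php
<?php
$pattern = '~
    begin   # literal text, layout whitespace and comments are ignored (x)
    .*      # the dot also crosses newlines (s)
    end     # letters match regardless of case (i)
~xsi';

$text = "BEGIN\nmiddle\nEnd"; // hypothetical sample input

$withModifiers    = preg_match($pattern, $text);         // 1: matches
$withoutModifiers = preg_match('~begin.*end~', $text);   // 0: case and newline get in the way

var_dump($withModifiers, $withoutModifiers);
```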
PHP
$code='array(0 => "a", 123 => 123, $_POST["hello"][\'world\'] => array("is", "actually", "An array !"), 1234, \'got problem ?\',
"a" => $GlobalScopeVar, $test_further => function test($noway){echo "this works too !!!";}, "yellow" => "blue",
"b" => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3)), "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
"bug", "fixed", "mwahahahaa" => "Yeaaaah"
);'; // Sample data
$code = preg_replace('#(^\s*array\s*\(\s*)|(\s*\)\s*;?\s*$)#s', '', $code); // Just to get rid of array( at the beginning, and ); at the end
preg_match_all('~
\s* # white spaces
########################## KEYS START ##########################
(?: # We\'ll use this to make keys optional
(?P<keys> # named group: keys
\d+ # match digits
| # or
"(?(?=\\\\")..|[^"])*" # match string between "", works even 4 escaped ones "hello \" world"
| # or
\'(?(?=\\\\\')..|[^\'])*\' # match string between \'\', same as above :p
| # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])* # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
) # close group: keys
########################## KEYS END ##########################
\s* # white spaces
=> # match =>
)? # make keys optional
\s* # white spaces
########################## VALUES START ##########################
(?P<values> # named group: values
\d+ # match digits
| # or
"(?(?=\\\\")..|[^"])*" # match string between "", works even 4 escaped ones "hello \" world"
| # or
\'(?(?=\\\\\')..|[^\'])*\' # match string between \'\', same as above :p
| # or
\$\w+(?:\[(?:[^[\]]|(?R))*\])* # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
| # or
array\s*\((?:[^()]|(?R))*\) # match an array()
| # or
\[(?:[^[\]]|(?R))*\] # match an array, new PHP array syntax: [1, 3, 5] is the same as array(1,3,5)
| # or
(?:function\s+)?\w+\s* # match functions: helloWorld, function name
(?:\((?:[^()]|(?R))*\)) # match function parameters (wut), (), (array(1,2,4))
(?:(?:\s*use\s*\((?:[^()]|(?R))*\)\s*)? # match use(&$var), use($foo, $bar) (optionally)
\{(?:[^{}]|(?R))*\} # match { whatever}
)?;? # match ; (optionally)
) # close group: values
########################## VALUES END ##########################
\s* # white spaces
~xsi', $code, $m); // Matching :p
print_r($m['keys']); // Print keys
print_r($m['values']); // Print values
// Since some keys may be empty in case you didn't specify them in the array, let's fill them up !
foreach($m['keys'] as $index => &$key){
if($key === ''){
$key = 'made_up_index_'.$index;
}
}
$results = array_combine($m['keys'], $m['values']);
print_r($results); // printing results
Output
Array
(
[0] => 0
[1] => 123
[2] => $_POST["hello"]['world']
[3] =>
[4] =>
[5] => "a"
[6] => $test_further
[7] => "yellow"
[8] => "b"
[9] => "c"
[10] =>
[11] =>
[12] => "mwahahahaa"
[13] => "this is"
)
Array
(
[0] => "a"
[1] => 123
[2] => array("is", "actually", "An array !")
[3] => 1234
[4] => 'got problem ?'
[5] => $GlobalScopeVar
[6] => function test($noway){echo "this works too !!!";}
[7] => "blue"
[8] => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3))
[9] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
[10] => "bug"
[11] => "fixed"
[12] => "Yeaaaah"
[13] => "a test"
)
Array
(
[0] => "a"
[123] => 123
[$_POST["hello"]['world']] => array("is", "actually", "An array !")
[made_up_index_3] => 1234
[made_up_index_4] => 'got problem ?'
["a"] => $GlobalScopeVar
[$test_further] => function test($noway){echo "this works too !!!";}
["yellow"] => "blue"
["b"] => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3))
["c"] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
[made_up_index_10] => "bug"
[made_up_index_11] => "fixed"
["mwahahahaa"] => "Yeaaaah"
["this is"] => "a test"
)
Online regex demo
Online php demo
Known bug (fixed)
$code='array("aaa", "sdsd" => "dsdsd");'; // fail
$code='array(\'aaa\', \'sdsd\' => "dsdsd");'; // fail
$code='array("aaa", \'sdsd\' => "dsdsd");'; // succeed
// Which means, if a value with no keys is followed
// by key => value and they are using the same quotation
// then it will fail (first value gets merged with the key)
Online bug demo
Credits
Goes to Bart Kiers for his recursive pattern to match nested brackets.
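For readers unfamiliar with recursion in PCRE, here is a minimal sketch of the general technique of matching balanced parentheses with (?R). It is a simplified illustration with a made-up input string, not the exact sub-pattern used in the big regex:

```php
<?php
// \(             an opening parenthesis
// (?: ... )*     then any number of:
//   [^()]++        runs of non-parenthesis characters, or
//   (?R)           the whole pattern again (a nested, balanced group)
// \)             and the matching closing parenthesis
$pattern = '~\((?:[^()]++|(?R))*\)~';

preg_match_all($pattern, 'array(1, array(2, 3), f(4)) + (5)', $m);

print_r($m[0]);
// $m[0] is ['(1, array(2, 3), f(4))', '(5)']
```

This simple form does not know about string literals, so a parenthesis inside quotes would still be counted.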
Advice
You should probably go with a parser, since regexes are fragile. @bwoebi has done a great job in his answer.
Even though you asked for a regex, this also works with pure PHP; token_get_all is the key function here. For a regex, check out @HamZa's answer.
The advantage is that it is more dynamic than a regex. A regex has a static pattern, while with token_get_all you can decide what to do after every single token. It even escapes single quotes and backslashes where necessary, which a regex wouldn't do.
Also, even a commented regex makes it hard to see what it is supposed to do; PHP code is much easier to understand when you read it.
$code = ' array(
0 => "a",
"a" => $GlobalScopeVar,
"b" => array("nested"=>array(1,2,3)),
"c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
"string_literal",
12345
); ';
$token = token_get_all("<?php ".$code);
$newcode = "";
$i = 0;
while (++$i < count($token)) { // enter into array; then start.
if (is_array($token[$i]))
$newcode .= $token[$i][1];
else
$newcode .= $token[$i];
if ($token[$i] == "(") {
$ending = ")";
break;
}
if ($token[$i] == "[") {
$ending = "]";
break;
}
}
// init variables
$escape = 0;
$wait_for_non_whitespace = 0;
$parenthesis_count = 0;
$entry = "";
// main loop
while (++$i < count($token)) {
// don't match commas in func($a, $b)
if ($token[$i] == "(" || $token[$i] == "{") // ( -> normal parenthesis; { -> closures
$parenthesis_count++;
if ($token[$i] == ")" || $token[$i] == "}")
$parenthesis_count--;
// begin new string after T_DOUBLE_ARROW
if (!$escape && $wait_for_non_whitespace && (!is_array($token[$i]) || $token[$i][0] != T_WHITESPACE)) {
$escape = 1;
$wait_for_non_whitespace = 0;
$entry .= "'";
}
// here is a T_DOUBLE_ARROW, there will be a string after this
if (is_array($token[$i]) && $token[$i][0] == T_DOUBLE_ARROW && !$escape) {
$wait_for_non_whitespace = 1;
}
// entry ended: comma reached
if (!$parenthesis_count && $token[$i] == "," || ($parenthesis_count == -1 && $token[$i] == ")" && $ending == ")") || ($ending == "]" && $token[$i] == "]")) {
// go back to the first non-whitespace
$whitespaces = "";
if ($parenthesis_count == -1 || ($ending == "]" && $token[$i] == "]")) {
$cut_at = strlen($entry);
while ($cut_at && ord($entry[--$cut_at]) <= 0x20); // 0x20 == " "
$whitespaces = substr($entry, $cut_at + 1, strlen($entry));
$entry = substr($entry, 0, $cut_at + 1);
}
// $escape == true means: there was somewhere a T_DOUBLE_ARROW
if ($escape) {
$escape = 0;
$newcode .= $entry."'";
} else {
$newcode .= "'".addcslashes($entry, "'\\")."'";
}
$newcode .= $whitespaces.($parenthesis_count?")":(($ending == "]" && $token[$i] == "]")?"]":","));
// reset
$entry = "";
} else {
// add actual token to $entry
if (is_array($token[$i])) {
$addChar = $token[$i][1];
} else {
$addChar = $token[$i];
}
if ($entry == "" && $token[$i][0] == T_WHITESPACE) {
$newcode .= $addChar;
} else {
$entry .= $escape?str_replace(array("'", "\\"), array("\\'", "\\\\"), $addChar):$addChar;
}
}
}
//append remaining chars like whitespaces or ;
$newcode .= $entry;
print $newcode;
Demo at: http://3v4l.org/qe4Q1
Should output:
array(
0 => '"a"',
"a" => '$GlobalScopeVar',
"b" => 'array("nested"=>array(1,2,3))',
"c" => 'function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }',
'"string_literal"',
'12345'
)
To get the array's data, you can use print_r(eval("return $newcode;")); which gives the entries of the array:
Array
(
[0] => "a"
[a] => $GlobalScopeVar
[b] => array("nested"=>array(1,2,3))
[c] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
[1] => "string_literal"
[2] => 12345
)
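To see what the loops above are iterating over, it can help to dump the token stream for a small input. This is a minimal sketch that assumes the tokenizer extension is available (it is enabled by default); the one-line sample code and the $lines variable are illustrative only:

```php
<?php
$tokens = token_get_all('<?php array(0 => "a");');

$lines = array();
foreach ($tokens as $token) {
    if (is_array($token)) {
        // Array tokens are [token id, source text, line number].
        $lines[] = token_name($token[0]) . ' ' . $token[1];
    } else {
        // Single characters such as ( ) , ; are returned as plain strings.
        $lines[] = 'char ' . $token;
    }
}

print_r($lines);
```

This is why the answer's code checks is_array($token[$i]) before reading the token id or its text.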
The clean way to do this is obviously to use the tokenizer (but keep in mind that the tokenizer alone doesn't solve the problem).
For the challenge, I propose a regex approach.
The idea is not to describe the full PHP syntax, but rather to describe it negatively (in other words, I describe only the basic PHP structures needed to obtain the result). The advantage of this basic description is that it can deal with more complex objects than functions, strings, integers or booleans. The result is a more flexible pattern that can deal, for example, with multi-line and single-line comments and with heredoc/nowdoc syntaxes:
<pre><?php
$code=' array(
0 => "a",
"a" => $GlobalScopeVar,
"b" => array("nested"=>array(1,2,3)),
"c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
); ';
$pattern = <<<'EOD'
~
# elements
(?(DEFINE)
# comments
(?<comMulti> /\* .*? (?:\*/|\z) ) # multiline comment
(?<comInlin> (?://|\#) \N* $ ) # inline comment
(?<comments> \g<comMulti> | \g<comInlin> )
# strings
(?<strDQ> " (?>[^"\\]+|\\.)* ") # double quote string
(?<strSQ> ' (?>[^'\\]+|\\.)* ') # single quote string
(?<strHND> <<<(["']?)([a-zA-Z]\w*)\g{-2} (?>\R \N*)*? \R \g{-1} ;? (?=\R|$) ) # heredoc and nowdoc syntax
(?<string> \g<strDQ> | \g<strSQ> | \g<strHND> )
# brackets
(?<braCrl> { (?> \g<nobracket> | \g<brackets> )* } )
(?<braRnd> \( (?> \g<nobracket> | \g<brackets> )* \) )
(?<braSqr> \[ (?> \g<nobracket> | \g<brackets> )* ] )
(?<brackets> \g<braCrl> | \g<braRnd> | \g<braSqr> )
# nobracket: content between brackets except other brackets
(?<nobracket> (?> [^][)(}{"'</\#]+ | \g<comments> | / | \g<string> | <+ )+ )
# ignored elements
(?<s> \s+ | \g<comments> )
)
# array components
(?(DEFINE)
# key
(?<key> [0-9]+ | \g<string> )
# value
(?<value> (?> [^][)(}{"'</\#,\s]+ | \g<s> | / | \g<string> | <+ | \g<brackets> )+? (?=\g<s>*[,)]) )
)
(?J)
(?: \G (?!\A)(?<!\)) | array \g<s>* \( ) \g<s>* \K
(?: (?<key> \g<key> ) \g<s>* => \g<s>* )? (?<value> \g<value> ) \g<s>* (?:,|,?\g<s>*(?<stop> \) ))
~xsm
EOD;
if (preg_match_all($pattern, $code, $m, PREG_SET_ORDER)) {
foreach($m as $v) {
echo "\n<strong>Whole match:</strong> " . $v[0]
. "\n<strong>Key</strong>:\t" . $v['key']
. "\n<strong>Value</strong>:\t" . $v['value'] . "\n";
if (isset($v['stop']))
echo "\n<strong>done</strong>\n\n";
}
}
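The (?(DEFINE)...) blocks in the pattern above only declare named sub-patterns; nothing inside them matches until it is called with \g<name>. A reduced sketch of that mechanism, reusing just the two quoted-string definitions and a made-up subject string:

```php
<?php
$pattern = <<<'EOD'
~
(?(DEFINE)
    (?<strDQ> " (?> [^"\\]+ | \\. )* " )   # double-quoted string
    (?<strSQ> ' (?> [^'\\]+ | \\. )* ' )   # single-quoted string
    (?<string> \g<strDQ> | \g<strSQ> )     # either kind
)
\g<string>                                 # the actual match: call the definition
~xs
EOD;

// Hypothetical subject: one double-quoted string with an escaped quote, one single-quoted string.
preg_match_all($pattern, 'a "b \" c" and \'d\'', $m);

print_r($m[0]);
// $m[0] contains "b \" c" (with its quotes) and 'd' (with its quotes)
```

The named groups declared inside DEFINE still show up in $m, but they stay empty because they never capture directly; only $m[0] is of interest here.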
Here is what you asked for, very compact.
Please let me know if you'd like any tweaks.
THE CODE (you can run this straight in php)
$code=' array(
0 => "a",
"a" => $GlobalScopeVar,
"b" => array("nested"=>array(1,2,3)),
"c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
); ';
$regex = "~(?xm)
^[\s'\"]*([^'\"\s]+)['\"\s]*
=>\s*+
(.*?)\s*,?\s*$~";
if(preg_match_all($regex,$code,$matches,PREG_SET_ORDER)) {
$array=array();
foreach($matches as $match) {
$array[$match[1]] = $match[2];
}
echo "<pre>";
print_r($array);
echo "</pre>";
} // END IF
THE OUTPUT
Array
(
[0] => "a"
[a] => $GlobalScopeVar
[b] => array("nested"=>array(1,2,3))
[c] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
)
$array contains your array.
You like?
Please let me know if you have any questions or require tweaks. :)
Just for this situation:
$code=' array(
0=>"a",
"a"=>$GlobalScopeVar,
"b"=>array("nested"=>array(1,2,3)),
"c"=>function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; },
); ';
preg_match_all('#\s*(.*?)\s*=>\s*(.*?)\s*,?\s*$#m', $code, $m);
$array = array_combine($m[1], $m[2]);
print_r($array);
Output:
Array
(
[0] => "a"
["a"] => $GlobalScopeVar
["b"] => array("nested"=>array(1,2,3))
["c"] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
)