Parsing parameters from command line with RegEx and PHP

My command line interface receives the following parameters for the executable:
-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"
What I want is to get all of the parameters in a key-value / associative array with PHP like this:
$result = [
'Parameter1' => '1234',
'Parameter2' => '38518',
'param3' => 'Test \"escaped\"',
'param4' => '10',
'param5' => '0',
'param6' => 'TT',
'param7' => 'Seven',
'param8' => 'secret',
'SuperParam9' => '4857',
'SuperParam10' => '123',
];
The difficulties are the following:
a parameter's prefix can be - or --
a parameter's glue (value assignment operator) can be either an = sign or a whitespace ' '
some parameters may sit inside a quoted block, which can mix different prefixes, glues and separators, e.g. a ? mark as the separator.
I'm still learning RegEx, and all I have so far is this:
/(-[a-zA-Z]+)/gui
With which I can get all the parameters starting with an -...
I could explode the whole string and parse it by hand, but there are far too many edge cases to cover.

You can try the following pattern, which uses the branch reset feature (?|...|...) to handle the different possible formats of the values:
$str = '-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"';
$pattern = '~ --?(?<key> [^= ]+ ) [ =]
(?|
" (?<value> [^\\\\"]*+ (?s:\\\\.[^\\\\"]*)*+ ) "
|
([^ ?"]*)
)~x';
preg_match_all ($pattern, $str, $matches);
$result = array_combine($matches['key'], $matches['value']);
print_r($result);
In a branch reset group, the capture groups have the same number or the same name in each branch of the alternation.
This means that (?<value> [^\\\\"]*+ (?s:\\\\.[^\\\\"]*)*+ ) is (obviously) the value named capture, but that ([^ ?"]*) is also the value named capture.
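As a minimal sketch of that behaviour (the sample patterns and variable names below are illustrative, not taken from the answer), compare an ordinary non-capturing alternation with a branch reset group:

```php
<?php
// Plain alternation: each branch keeps its own group number,
// so the value ends up in group 1 or group 2 depending on the branch.
preg_match('/(?:"([^"]*)"|(\S+))/', 'plain', $plain);

// Branch reset: every branch starts numbering again at 1,
// so the value is always found in group 1.
preg_match('/(?|"([^"]*)"|(\S+))/', 'plain', $reset);
preg_match('/(?|"([^"]*)"|(\S+))/', '"quoted text"', $quoted);

var_dump($plain, $reset, $quoted);
```

With the plain alternation the caller has to check two groups; with the branch reset group a single index (or a single name) is enough, which is what makes array_combine() on the key and value captures possible.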

You could use
--?
(?P<key>\w+)
(?|
=(?P<value>[^-\s?"]+)
|
\h+"(?P<value>.*?)(?<!\\)"
|
\h+(?P<value>\H+)
)
See a demo on regex101.com.
Which in PHP would be:
<?php
$data = <<<DATA
-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"
DATA;
$regex = '~
--?
(?P<key>\w+)
(?|
=(?P<value>[^-\s?"]+)
|
\h+"(?P<value>.*?)(?<!\\\\)"
|
\h+(?P<value>\H+)
)~x';
if (preg_match_all($regex, $data, $matches)) {
    $result = array_combine($matches['key'], $matches['value']);
    print_r($result);
}
?>
This yields
Array
(
[Parameter1] => 1234
[Parameter2] => 38518
[param3] => Test \"escaped\"
[param4] => 10
[param5] => 0
[param6] => TT
[param7] => Seven
[param8] => secret
[SuperParam9] => 4857
[SuperParam10] => 123
)

Related

Get content in parentheses following right after string using regex in php

I have a PHP file as a string. I am looking for places where certain functions are called, and I want to extract the arguments passed to the function.
I need to match the following cases:
some_function_name("abc123", ['key' => 'value'])
some_function_name("abc123", array("key" => 'value'))
So far I have this, but it breaks as soon as I have any nesting conditions:
(function_name)\(([^()]+)\)
$text = "test test test test some_function_name('abc123', ['key' => 'value']) sdohjsh dsfkjh spkdo sdfopmsdfohp some_function_name('abc123', array('key' => 'value'))";
preg_match_all('/\w+\(.*?\)(\)|!*)/', $text, $matches);
var_dump($matches[0]);
Is this the result you want?
$text = "blah some_function_name('abc123', ['key' => 'value']) blah some_function_name('abc123', array('key' => 'value')) blah";
preg_match_all('/\w+\(.+?(?:array\(.+?\)|\[.+?\])\)/', $text, $matches);
var_dump($matches);
Output:
array(1) {
[0]=>
array(2) {
[0]=>
string(48) "some_function_name('abc123', ['key' => 'value'])"
[1]=>
string(53) "some_function_name('abc123', array('key' => 'value'))"
}
}
Explanation:
\w+ # 1 or more word character (i.e. [a-zA-Z0-9_])
\( # opening parenthesis
.+? # 1 or more any character, not greedy
(?: # non capture group
array\(.+?\) # array(, 1 or more any character, )
| # OR
\[.+?\] # [, 1 or more any character, ]
) # end group
\) # closing parenthesis
I managed to solve it using the following pattern:
((\'.*?\'|\".*?\")(\s*,\s*.*?)*?\);?
Thanks everyone for your suggestions!
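Since the question mentions nesting, another option worth considering is a recursive PCRE subpattern, which matches balanced parentheses to any depth. The following is only a sketch: the sample text is adapted from the answers above, the variable names are made up for illustration, and it assumes that no parentheses appear inside the string literals being scanned.

```php
<?php
$text = "x some_function_name('abc123', ['key' => 'value']) y "
      . "some_function_name('abc123', array('key' => 'value')) z";

// Group 1 matches one balanced (...) block:
//   [^()]++  consumes anything that is not a parenthesis
//   (?1)     recurses into group 1 for a nested (...) block
$pattern = '/some_function_name(\((?:[^()]++|(?1))*\))/';

preg_match_all($pattern, $text, $m);

// Strip the outer parentheses to keep only the argument list.
$args = array_map(function ($s) {
    return substr($s, 1, -1);
}, $m[1]);

print_r($args);
```

For anything beyond simple cases, PHP's own tokenizer (token_get_all) is likely a more robust tool than a regex, because it understands strings and comments.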

simple regex to remove _ and 0

I'm looking for a regex that converts the following strings (string => result):
_0001 => 1
_0001r => 1r
_0021v-s001r => 21v-s1r
_0000_0001r => 1r
It should essentially remove the _ and all zeros.
My attempt is: /[^_0]/
but for some reason it doesn't work:
https://regex101.com/r/4CWo9S/3
Why bother with regex?
It's a simple str_replace that is needed.
$str = "_0001 => 1
_0001r => 1r
_0021v-s001r => 21v-s1r
_0000_0001r => 1r";
echo str_replace(["0","_"], "", $str);
output:
1 => 1
1r => 1r
21v-s1r => 21v-s1r
1r => 1r
https://3v4l.org/BrL1M
From your question I assume you mean /[_0]/; the ^ negates the character class.
It's because you're negating the character class with the ^ token. You just need to search for /[_0]/ and replace with "".
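If you do want to stay with a regex, the corrected character class can be applied with preg_replace. This is a small sketch using the inputs from the question; the variable names are illustrative.

```php
<?php
$inputs = ['_0001', '_0001r', '_0021v-s001r', '_0000_0001r'];

// [_0] matches a single underscore or zero; each match is replaced by ''.
// preg_replace accepts an array subject and returns an array.
$outputs = preg_replace('/[_0]/', '', $inputs);

print_r($outputs);
```

Note that this removes every zero, as the question asks, so a value such as _0010 would become 1 rather than 10.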

php preg_split ignore comma in specific string

I need some help. I want to ignore a comma inside a specific string. The file is a comma-separated CSV, but one of the names contains a comma, and I need to ignore that one.
What I got is
<?php
$pattern = '/([\\W,\\s]+Inc.])|[,]/';
$subject = 'hypertext language, programming, Amazon, Inc., 100';
$limit = -1;
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;
$result = preg_split ($pattern, $subject, $limit, $flags);
?>
Result is
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon',
3 => ' Inc.',
4 => ' 100',
);
?>
And I want the result to be
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon, Inc.',
3 => ' 100',
);
?>
Thanks for your help :)
Note that [\W,\s] = \W, since \W matches any char that is not a letter, digit or underscore. However, it seems you just want to split on a , that is not followed by optional whitespace and the literal text Inc.
You may use a negative lookahead to achieve this:
/,(?!\s*Inc\.)/
^^^^^^^^^^^^
See the regex demo
The (?!\s*Inc\.) lookahead fails any , match when the comma is followed by 0+ whitespace characters (\s*) and then the literal characters Inc.
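Applied to the subject string from the question, the lookahead pattern can be used with preg_split like this (a minimal sketch; no flags are needed because the pattern has no capture groups):

```php
<?php
$subject = 'hypertext language, programming, Amazon, Inc., 100';

// Split on every comma except one followed by optional whitespace and "Inc."
$result = preg_split('/,(?!\s*Inc\.)/', $subject);

print_r($result);
```

The leading spaces are kept, as in the desired result; wrap the call in array_map('trim', ...) if they should be removed.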
From your tutorial, if I pull the Amazon information as a CSV, I get the following format, which you can then parse with one of PHP's native functions. This shows that you don't need explode or regex to handle this data. Use the right tool for the job:
<?php
$csv =<<<CSV
"amzn","Amazon.com, Inc.",765.56,"11/2/2016","4:00pm","-19.85 - -2.53%",10985
CSV;
$array = str_getcsv($csv);
var_dump($array);
Output:
array (size=7)
0 => string 'amzn' (length=4)
1 => string 'Amazon.com, Inc.' (length=16)
2 => string '765.56' (length=6)
3 => string '11/2/2016' (length=9)
4 => string '4:00pm' (length=6)
5 => string '-19.85 - -2.53%' (length=15)
6 => string '10985' (length=5)

PregMatch on 3 digits with a dot

I have numbers like these:
1.80
2.75
#1.55
These numbers are in strings and I'm trying to get them through preg_match. So far I have this:
$pattern = '/ [0-9]{1}\.[0-9]{2}/';
$result = preg_match($pattern, $feed, $matches);
This works pretty well, but I need more precision in my preg_match and I haven't found a solution.
With this pattern, numbers like 1.556 are also found. I don't want this: my numbers are always 4 characters long, dot included.
Also, I am not able to catch numbers preceded by a #, only those preceded by a space. How can I do this?
$result = preg_match($pattern, 'test 1.556 red #1.62 blue 2.33 ?', $matches);
Here the results needed are 1.62 and 2.33
As an alternative to regular expressions, you can use PHP's sanitization filters:
$array = explode(' ', 'test 1.556 red #1.62 blue 2.33 ?');
$result = filter_var_array(
    array(
        'convert' => $array
    ),
    array(
        'convert' => array(
            'filter' => FILTER_SANITIZE_NUMBER_FLOAT,
            'flags' => FILTER_FLAG_ALLOW_FRACTION | FILTER_FORCE_ARRAY
        )
    )
);
var_dump(array_filter(array_map('floatval', $result['convert'])));
results in (note that 1.556 is included too):
array(3) {
[1]=>
float(1.556)
[3]=>
float(1.62)
[5]=>
float(2.33)
}
The following pattern will match all numbers in the format of #.## with an optional leading space or hash sign.
[ #]?(\d{1}\.\d{2})\b
Demo: http://regex101.com/r/eB4bL5
If you want to limit the match to 4 characters and also catch the #, this is what you need:
$pattern = '/ #*([0-9]{1}\.[0-9]{2})\b /';
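Another way to express the same constraint is with lookarounds, so that the surrounding characters are checked but not consumed. This is a sketch rather than one of the answers above; it assumes a valid number is exactly one digit, a dot and two digits, and that it must not be adjacent to another digit or dot.

```php
<?php
$feed = 'test 1.556 red #1.62 blue 2.33 ?';

// (?<![\d.])  not preceded by a digit or a dot
// \d\.\d{2}   one digit, a dot, two digits
// (?!\d)      not followed by another digit (rejects 1.556)
preg_match_all('/(?<![\d.])\d\.\d{2}(?!\d)/', $feed, $matches);

print_r($matches[0]);
```

Because preg_match_all is used instead of preg_match, all occurrences in the string are returned, and a leading # or space needs no special handling.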

Regex Optional Matches

I'm trying to match two types of strings using the preg_match function in PHP which could be the following.
'_mything_to_newthing'
'_onething'
'_mything_to_newthing_and_some_stuff'
In the third one above, I only want "mything" and "newthing"; everything after the third part is just optional text the user could add. Ideally, the regex would return the following for the cases above:
'mything', 'newthing'
'onething'
'mything', 'newthing'
The patterns should match a-zA-Z0-9 if possible :-)
My regex is terrible, so any help would be appreciated!
Thanks in advance.
Assuming you're talking about _ delimited text:
$regex = '/^_([a-zA-Z0-9]+)(|_to_([a-zA-Z0-9]+).*)$/';
$string = '_mything_to_newthing_and_some_stuff';
preg_match($regex, $string, $match);
$match = array(
0 => '_mything_to_newthing_and_some_stuff',
1 => 'mything',
2 => '_to_newthing_and_some_stuff',
3 => 'newthing',
);
For anything further, please provide more details and better sample text/output.
Edit: You could always just use explode:
$parts = explode('_', $string);
$parts = array(
0 => '',
1 => 'mything',
2 => 'to',
3 => 'newthing',
4 => 'and',
5 => 'some',
6 => 'stuff',
);
As long as the format is consistent, it should work well...
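A variation on the regex above makes the _to_ part an optional non-capturing group, so that only the wanted parts are captured. This is a sketch under the assumption that the separator is always the literal _to_; extractParts is a hypothetical helper name, not something from the answer.

```php
<?php
// Hypothetical helper: returns the one or two captured parts, or [] on no match.
function extractParts(string $s): array
{
    // (?:_to_(...))? is optional; trailing text such as _and_some_stuff is ignored
    // because the pattern is not anchored at the end.
    if (!preg_match('/^_([a-zA-Z0-9]+)(?:_to_([a-zA-Z0-9]+))?/', $s, $m)) {
        return [];
    }
    // Drop the full match at index 0 and keep only the captures.
    return array_slice($m, 1);
}

print_r(extractParts('_mything_to_newthing'));
print_r(extractParts('_onething'));
print_r(extractParts('_mything_to_newthing_and_some_stuff'));
```

preg_match omits trailing groups that did not take part in the match, which is why the second call yields a single element.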
