How to rewrite a string and get params by pattern in PHP

$string = '/start info#example.com';
$pattern = '/{command} {name}#{domain}';
I want to extract the parameters into an array in PHP, as in the examples below:
['command' => 'start', 'name' => 'info', 'domain' => 'example.com']
and
$string = '/start info#example.com';
$pattern = '/{command} {email}';
['command' => 'start', 'email' => 'info#example.com']
and
$string = '/start info#example.com';
$pattern = '{command} {email}';
['command' => '/start', 'email' => 'info#example.com']

If it's a single-line string, you can use preg_match with a regular expression such as this:
preg_match('/^\/(?P<command>\w+)\s(?P<name>[^#]+)\#(?P<domain>.+?)$/', '/start info#example.com', $match );
Depending on the variation in the data, you may have to adjust the regex a bit. This outputs:
command [1-6] start
name [7-11] info
domain [12-23] example.com
The $match array will also contain the numeric indexes alongside the named keys.
https://regex101.com/r/jN8gP7/1
Just to break this down a bit, in English.
The leading ^ is the start of the line, then a literal / (\/), then a named capture of \w+ (one or more of a-z, A-Z, 0-9, _), then a whitespace character \s, then a named capture of anything but the # sign ([^#]+), then the # sign (\#), then a named capture of anything (.+?) up to the end of the line $.
This will capture anything in this format:
/(abc123_) space (anything but #)#(anything)
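The regex above is written by hand for one pattern. As a minimal sketch of the generic case asked about in the question, the {placeholder} template can be turned into a regex automatically. The helper name extractParams is hypothetical, and the sketch assumes placeholder names consist of word characters and that each placeholder should match lazily:

```php
<?php
// Hypothetical helper: turns a template such as '/{command} {name}#{domain}'
// into a regex with named groups and returns the captured parameters.
function extractParams(string $pattern, string $string): ?array
{
    // Split the template into literal text and placeholder names.
    // With PREG_SPLIT_DELIM_CAPTURE the odd indexes hold the names.
    $parts = preg_split('/\{(\w+)\}/', $pattern, -1, PREG_SPLIT_DELIM_CAPTURE);

    $regex = '';
    foreach ($parts as $i => $part) {
        $regex .= ($i % 2 === 1)
            ? '(?P<' . $part . '>.+?)'   // placeholder: lazy named capture
            : preg_quote($part, '~');    // literal text: escaped
    }

    if (!preg_match('~^' . $regex . '$~', $string, $m)) {
        return null; // the string does not fit the template
    }

    // Keep only the named groups, dropping the numeric indexes.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(extractParams('/{command} {name}#{domain}', '/start info#example.com'));
```

Because every placeholder is lazy, each one stops at the first occurrence of the literal text that follows it in the template; the last placeholder runs to the end of the string.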

Related

Parsing parameters from command line with RegEx and PHP

I have this as an input to my command line interface as parameters to the executable:
-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"
What I want is to get all of the parameters in a key-value / associative array with PHP like this:
$result = [
'Parameter1' => '1234',
'Parameter2' => '38518',
'param3' => 'Test \"escaped\"',
'param4' => '10',
'param5' => '0',
'param6' => 'TT',
'param7' => 'Seven',
'param8' => 'secret',
'SuperParam9' => '4857',
'SuperParam10' => '123',
];
The difficulties are the following:
the parameter prefix can be - or --
the parameter glue (value assignment operator) can be either an = sign or a whitespace ' '
some parameters may be inside a quoted block and can also use different separators, glues and prefixes, e.g. a ? mark as the separator.
What I have so far, since I'm really bad with RegEx and still learning it, is this:
/(-[a-zA-Z]+)/gui
With which I can get all the parameters starting with an -...
I can go to manually explode the entire thing and parse it manually, but there are way too many contingencies to think about.
You can try this pattern, which uses the branch reset feature (?|...|...) to deal with the different possible formats of the values:
$str = '-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"';
$pattern = '~ --?(?<key> [^= ]+ ) [ =]
(?|
" (?<value> [^\\\\"]*+ (?s:\\\\.[^\\\\"]*)*+ ) "
|
([^ ?"]*)
)~x';
preg_match_all ($pattern, $str, $matches);
$result = array_combine($matches['key'], $matches['value']);
print_r($result);
demo
In a branch reset group, the capture groups have the same number or the same name in each branch of the alternation.
This means that (?<value> [^\\\\"]*+ (?s:\\\\.[^\\\\"]*)*+ ) is (obviously) the value named capture, but that ([^ ?"]*) is also the value named capture.
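To illustrate the numbering rule on its own, here is a small sketch that compares a branch reset group with an ordinary non-capturing group. The two patterns and the sample input are made up for this illustration:

```php
<?php
// Sample input for illustration: a quoted value.
$input = '"abc"';

// Ordinary alternation: each branch gets its own group number,
// so the quoted value ends up in group 2 and group 1 stays empty.
preg_match('~(?:(\d+)|"([^"]*)")~', $input, $plain);

// Branch reset: both branches share group number 1.
preg_match('~(?|(\d+)|"([^"]*)")~', $input, $reset);

print_r($plain); // groups 0, 1 (empty) and 2
print_r($reset); // groups 0 and 1
```

With the branch reset group, the value can always be read from the same index (or name), whichever branch matched.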
You could use
--?
(?P<key>\w+)
(?|
=(?P<value>[^-\s?"]+)
|
\h+"(?P<value>.*?)(?<!\\)"
|
\h+(?P<value>\H+)
)
See a demo on regex101.com.
Which in PHP would be:
<?php
$data = <<<DATA
-Parameter1=1234 -Parameter2=38518 -param3 "Test \"escaped\"" -param4 10 -param5 0 -param6 "TT" -param7 "Seven" -param8 "secret" "-SuperParam9=4857?--SuperParam10=123"
DATA;
$regex = '~
--?
(?P<key>\w+)
(?|
=(?P<value>[^-\s?"]+)
|
\h+"(?P<value>.*?)(?<!\\\\)"
|
\h+(?P<value>\H+)
)~x';
if (preg_match_all($regex, $data, $matches)) {
$result = array_combine($matches['key'], $matches['value']);
print_r($result);
}
?>
This yields
Array
(
[Parameter1] => 1234
[Parameter2] => 38518
[param3] => Test \"escaped\"
[param4] => 10
[param5] => 0
[param6] => TT
[param7] => Seven
[param8] => secret
[SuperParam9] => 4857
[SuperParam10] => 123
)

Regexp that matches these strings

I want to filter strings that I have in a CSV file, and I'm looking for a correct regexp that matches these strings:
PLP_LES_HALLES.VOLUME_POMPE
Newyork:Flow(m3/h)
In fact, the string should not contain any characters like : ç & é # ! ? “ ' ³ = + etc.
I tried this one :
([a-zA-Z0-9_:.(\/)]*) but when I tested it, I figured out that it matches everything. Kindly help me to find the correct one.
Here is my code to test:
while (($line = fgetcsv($handle, 1024, ";")) !== FALSE) {
$total = count( $line );
$keys = array('mesure', 'timestamp', 'value');
$args=array(
'mesure' => array('filter' => FILTER_VALIDATE_REGEXP,
'options' => array('regexp' => '([a-zA-Z0-9_:.(\/)]*)')),
'timestamp' => array( 'filter' => FILTER_VALIDATE_INT,
'options' => array('min_range' => 20000000000000, 'length' => 14)),
'value' => FILTER_VALIDATE_FLOAT);
$testing = filter_var_array(array_combine($keys, $line), $args);
var_dump($testing);
}
EDIT
These strings should not match:
PLP_LES_HALLéS.VOLUME_POMPE
PLP_LES_HàLLES.VOLUME_POMPE
Newyork:Flow(m³/h)
To sum up, all strings that contain any character from the list ç & é # ! ? “ ' ³ = + etc. should not match.
Your regex does not match the whole string, and you are using an ambiguous regex delimiter; it is recommended to use more common symbols as regex delimiters.
'/^[a-zA-Z0-9_:.()\/]*$/'
 ^^                   ^^
The ^ will match the start of the string, and $ will match its end, requiring a whole string match.
Also, [a-zA-Z0-9_] can be written as \w to shorten the pattern (this is recommended only if you do not use the u modifier, because with it \w also matches Unicode letters and digits):
'/^[\w:.()\/]*$/'
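A minimal sketch of how the anchored pattern behaves with FILTER_VALIDATE_REGEXP, using the sample strings from the question. The helper name isValidMeasureName is hypothetical:

```php
<?php
// Hypothetical helper: true when $name contains only the allowed characters.
function isValidMeasureName(string $name): bool
{
    $options = ['options' => ['regexp' => '/^[a-zA-Z0-9_:.()\/]*$/']];

    // FILTER_VALIDATE_REGEXP returns the input on success and false on failure.
    return filter_var($name, FILTER_VALIDATE_REGEXP, $options) !== false;
}

$samples = [
    'PLP_LES_HALLES.VOLUME_POMPE',
    'Newyork:Flow(m3/h)',
    'PLP_LES_HALLéS.VOLUME_POMPE',
    'Newyork:Flow(m³/h)',
];

foreach ($samples as $sample) {
    echo $sample, ' => ', isValidMeasureName($sample) ? 'valid' : 'invalid', PHP_EOL;
}
```

Note that $ also matches just before a final newline; if that matters for your data, \z in place of $ is the stricter choice.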

How to replace a substring with help of preg_replace

I have a string that consists of repeated words. I want to replace a substring 'OK' located between 'L3' and 'L4'. Below you can find my code:
$search = "/(?<=L3).*(OK).*(?=L4)/";
$replace = "REPLACEMENT";
$subject = "'L1' => ('Vanessa', 'Prague', 'OK'), 'L2' => ('Alex', 'Paris', 'OK'), 'L3' => ('Paul', 'Paris', 'OK'), 'L4' => ('John', 'Madrid', 'OK')";
$str = preg_replace($search, $replace, $subject);
If I use that pattern with preg_match, it finds the correct substring (the third 'OK'). However, when I apply that pattern with preg_replace, it replaces the substring that matches the full pattern instead of the parenthesized subpattern.
So could you please advise what I should change in my code? I know that there are plenty of similar questions about regex, but as I understand it my pattern is correct and I'm only confused by the preg_replace function.
It is true that your regex matches a place in the string that is preceded with L3 then contains the last OK substring after 0+ chars other than linebreak symbols and then matches any 0+ chars up to the place followed with L4. See your regex demo.
A possible solution is to use 2 capturing groups around the subpatterns before and after the OK, and use backreferences in the replacement pattern:
$search = "/(L3.*?)OK(.*?L4)/";
$replace = "REPLACEMENT";
$subject = "'L1' => ('Vanessa', 'Prague', 'OK'), 'L2' => ('Alex', 'Paris', 'OK'), 'L3' => ('Paul', 'Paris', 'OK'), 'L4' => ('John', 'Madrid', 'OK')";
$str = preg_replace($search, '$1'.$replace.'$2', $subject);
echo $str; // => 'L1' => ('Vanessa', 'Prague', 'OK'), 'L2' => ('Alex', 'Paris', 'OK'), 'L3' => ('Paul', 'Paris', 'REPLACEMENT'), 'L4' => ('John', 'Madrid', 'OK')
See the PHP demo
If there cannot be any L3.5 in between L3 and L4, the (L3.*?)OK(.*?L4) pattern is safe to use. It will match and capture L3 and then 0+ chars other than a linebreak up to the first OK, then will match OK, and then will match and capture 0+ chars up to the first L4.
If there can be no L4, use a (?:(?!L4).)* tempered greedy token matching any symbol other than a linebreak symbol that is not starting an L4 sequence:
'~(L3(?:(?!L4).)*)OK~'
See the regex demo
NOTE: If you want to make the regexps safer, add ' around L# inside the patterns.
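As a sketch of the tempered greedy token variant, the pattern below captures everything from L3 up to the last OK that occurs before any L4, and the replacement puts the captured prefix back. The ${1} form is used so the backreference is not confused with the text that follows it:

```php
<?php
$subject = "'L1' => ('Vanessa', 'Prague', 'OK'), 'L2' => ('Alex', 'Paris', 'OK'), "
         . "'L3' => ('Paul', 'Paris', 'OK'), 'L4' => ('John', 'Madrid', 'OK')";

// Group 1: L3 plus any characters that do not start an L4 sequence.
// The greedy quantifier backtracks to the last OK before L4.
$search = '~(L3(?:(?!L4).)*)OK~';

// Single quotes keep PHP from treating ${1} as a variable.
$result = preg_replace($search, '${1}REPLACEMENT', $subject);

echo $result, PHP_EOL;
```

Only the OK inside the L3 entry changes; the entries for L1, L2 and L4 are left untouched.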

Stop regex splitting a matched url with preg_split

Given the following code:
$regex = '/(http\:\/\/|https\:\/\/)([a-z0-9-\.\/\?\=\+_]*)/i';
$text = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);
It returns an array such as:
array (size=4)
0 => string '...' (length=X)
1 => string 'https://' (length=8)
2 => string 'duckduckgo.com/?q=how+much+wood+could+a+wood-chuck+chuck+if+a+wood-chuck+could+chuck+wood' (length=89)
3 => string '...' (length=X)
I would prefer it if the returned array had size=3, with one single URL. Is this possible?
Sure, that can be done: just remove the extra capturing groups from your regex. Try the following code:
$regex = '#(https?://[a-z0-9./?=+_-]*)#i';
$text = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);
Now the resulting array will have 3 elements instead of 4.
Besides removing the extra grouping, I have also simplified your regex, since most special characters don't need to be escaped inside a character class. The / is kept in the class so that the path and query string stay part of the URL.
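A minimal sketch of the single-group split on a made-up $note value. The character class here includes /, so the path and query string remain part of the captured URL:

```php
<?php
// Sample text for illustration only.
$note = 'Search: https://duckduckgo.com/?q=wood+chuck for details';

// One capturing group around the whole URL, so PREG_SPLIT_DELIM_CAPTURE
// adds exactly one delimiter element per match.
$regex = '#(https?://[a-z0-9./?=+_-]*)#i';

$parts = preg_split($regex, $note, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($parts);
```

The result has three elements: the text before the URL, the URL itself, and the text after it.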

Negotiate arrays inside an array

When I run a regular expression
preg_match_all('~(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)~', $content, $turls);
print_r($turls);
I get arrays inside an array. I need a single array only.
How do I negotiate the arrays inside another array?
By default preg_match_all() uses PREG_PATTERN_ORDER flag, which means:
Orders results so that $matches[0] is
an array of full pattern matches,
$matches[1] is an array of strings
matched by the first parenthesized
subpattern, and so on.
See http://php.net/preg_match_all
Here is sample output:
array(
0 => array( // Full pattern matches
0 => 'http://www.w3.org/TR/html4/strict.dtd',
1 => ...
),
1 => array( // First parenthesized subpattern.
// In your case it is the same as full pattern, because first
// parenthesized subpattern includes all pattern :-)
0 => 'http://www.w3.org/TR/html4/strict.dtd',
1 => ...
),
2 => array( // Second parenthesized subpattern.
0 => 'www.w3.org',
1 => ...
),
...
)
So, as R. Hill answered, you need $matches[0] to access all matched URLs.
And as budinov.com pointed out, you should remove the outer parentheses so that the first subpattern does not duplicate the full match, e.g.:
preg_match_all('~https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?~', $content, $turls);
// where $turls[0] is what you need
Not sure what you mean by 'negotiate'. If you mean fetching the inner array, this should work:
$urls = preg_match_all('~(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)~', $content, $matches) ? $matches[0] : array();
if ( count($urls) ) {
...
}
Generally, you can replace your regexp with one that doesn't contain capturing parentheses (). This way your results will be held only in $turls[0]:
preg_match_all('/https?\:\/\/[^\"\'\s]+/i', file_get_contents('http://www.yahoo.com'), $turls);
and then do some code to make urls unique like this:
$result = array_keys(array_flip($turls[0]));
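As a sketch of the same idea without a network request, the example below runs a pattern with no capturing groups over a made-up $content string and then removes duplicates with array_unique, which is an alternative to the array_flip approach:

```php
<?php
// Sample text for illustration only.
$content = 'See http://example.com/a and https://example.org/b?x=1 then http://example.com/a again';

// No capturing groups, so $matches contains only index 0 (the full matches).
preg_match_all('~https?://[^"\'\s]+~i', $content, $matches);

// Drop duplicates and reindex the array from 0.
$urls = array_values(array_unique($matches[0]));

print_r($urls);
```

array_unique keeps the first occurrence of each URL, and array_values closes the gaps it leaves in the keys.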
