Understanding preg_match formula

I somehow got my stuff working with the following line, but I really can't understand it.
if (preg_match_all('/[^=]*=([^;#]*)/', shell_exec("/home/technoworld/Desktop/test/b '$test'"), $matches))
{
    $x = (int) $matches[1][0]; // optionally cast to int
    $y = (int) $matches[1][1];
    $pcount = round((100 * $x) / ($x + $y), 2); // x as a percentage of x + y
    $ncount = round((100 * $y) / ($x + $y), 2); // y as a percentage of x + y
}
b is an executable file whose output is something like x=10 and y=20.
Can someone explain what is going on inside the if()?

This pattern, /[^=]*=([^;#]*)/, collects every ...=... pair into the $matches array:
[^=] means any character except =
[^;#] means any character except ; and #
() is a capturing group, so whatever it matches (the value after =) is collected into $matches[1]
$pcount and $ncount then turn the two values into percentages of their sum, rounded to two decimals.
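As a quick illustration, here is a sketch that runs the same pattern on a hard-coded string instead of the output of b. The string x=10;y=20 is an assumption: because [^;#]* only stops at ; or #, the pattern needs the pairs to be separated by one of those characters. With output like "x=10 and y=20", the first capture would swallow the rest of the string.

```php
<?php
// Assumed stand-in for the output of the executable "b"; the pattern
// needs pairs separated by ";" or "#" to capture each value separately.
$output = 'x=10;y=20';

$count = preg_match_all('/[^=]*=([^;#]*)/', $output, $matches);

// $matches[0] holds the whole matches: 'x=10' and ';y=20'
// $matches[1] holds only capturing group 1: '10' and '20'
$x = (int) $matches[1][0];
$y = (int) $matches[1][1];

// Guard against division by zero before computing the percentages.
if ($x + $y !== 0) {
    $pcount = round((100 * $x) / ($x + $y), 2);
    $ncount = round((100 * $y) / ($x + $y), 2);
    echo "$pcount% / $ncount%\n"; // 33.33% / 66.67%
}
```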

Pattern details:
[^=]*      # a negated character class meaning "any character except ="
           # * is a quantifier meaning zero or more times
           # (replacing it with + would arguably be more logical,
           # since + means one or more times)
=          # a literal =
(          # open capturing group 1
  [^;#]*   # any characters except ";" and "#", zero or more times
           # (same remark about +)
)          # close capturing group 1
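To see what the suggested switch from * to + would change, here is a small comparison on an assumed input in which x has no value. With *, the empty value is still captured, so the indexes stay aligned; with +, that pair is skipped and the value of y moves to index 0, which would leave $matches[1][1] undefined in the question's code.

```php
<?php
// Hypothetical input where the value of x is missing.
$input = 'x=;y=20';

// Original pattern: both quantifiers are * (zero or more).
preg_match_all('/[^=]*=([^;#]*)/', $input, $star);

// Suggested variant: both quantifiers are + (one or more).
preg_match_all('/[^=]+=([^;#]+)/', $input, $plus);

// With *, the empty value is kept: $star[1] contains '' and '20'.
var_dump($star[1]);

// With +, the pair without a value is skipped: $plus[1] contains only '20'.
var_dump($plus[1]);
```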

Related

PHP: Validate a given string is a valid number

I am trying to write a function that validates whether or not a given string is a valid number. I know I can use PHP's is_numeric(), but the requirement is that the function must also accept commas when:
Commas appear only in the integer part of the number
Each comma is followed by exactly 3 digits
There is at least one digit to the left of each comma
There are no more than 3 contiguous digits to the left of a comma
For instance:
It should accept 1,000,230 but not ,021,201 or 1,023,12.
It should also accept a leading plus or minus sign and a dollar sign.
I am thinking of using preg_match to check the number format, but I am not familiar with preg_match. Can you help me with this? Any tip is appreciated! Thank you!

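No regex needed. Strip the commas, then reformat the result with number_format(). If that matches your original input, you're good.
$plain = str_replace(',', '', $number);
// is_numeric() guards against a TypeError from number_format() on PHP 8
if (is_numeric($plain) && number_format((float) $plain) === $number) {
    // pass
}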
You can adjust how you want to handle decimals by providing a second argument to number_format().
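For example, one way to choose that second argument is to count the digits after the decimal point in the input. The name hasThousandsFormat() is made up for this sketch; it still does not handle the dollar sign from the question, and very long decimal parts can lose precision because the value passes through a float.

```php
<?php
// Hypothetical helper: true when $number uses correctly placed thousands
// separators, keeping as many decimals as the input has.
function hasThousandsFormat(string $number): bool
{
    $plain = str_replace(',', '', $number);
    if (!is_numeric($plain)) {
        return false; // avoids a TypeError from number_format() on PHP 8
    }
    $dot = strpos($plain, '.');
    $decimals = ($dot === false) ? 0 : strlen($plain) - $dot - 1;
    return number_format((float) $plain, $decimals) === $number;
}

var_dump(hasThousandsFormat('1,000,230'));  // bool(true)
var_dump(hasThousandsFormat(',021,201'));   // bool(false)
var_dump(hasThousandsFormat('1,023,12'));   // bool(false)
var_dump(hasThousandsFormat('-1,234.56'));  // bool(true)
```

Note that a plain 1000 is rejected, because number_format() would add the comma.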

You could use this function:
function hasNumericFormat($str) {
    return preg_match("/^[-+]?[$]?([1-9]\d{0,2}(,\d{3})*|0)(\.\d+)?$/", $str);
}
Test code:
function test($str) {
    if (hasNumericFormat($str)) {
        echo "$str is OK<br>";
    } else {
        echo "$str violates numerical format<br>";
    }
}
test("-$1,234,567.89"); // ok
test(",123,567.89"); // not ok: digit missing before comma
test("1,123,56"); // not ok: not 3 digits in last group
test("-0,123,567"); // not ok: first digit is zero
test("+0"); // ok
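The one-line pattern can be hard to read. Below is a sketch of the same pattern rewritten with the x (extended) modifier so that each part carries a comment. hasNumericFormatCommented() is a made-up name, and \$ stands in for [$], which matches the same literal dollar sign.

```php
<?php
// Same pattern as hasNumericFormat(), written with the x modifier:
// whitespace is ignored and # starts a comment until the end of the line.
function hasNumericFormatCommented(string $str): bool
{
    $pattern = '/
        ^                  # start of the string
        [-+]?              # optional sign
        \$?                # optional dollar sign
        (                  # integer part, either:
            [1-9]\d{0,2}   #   1 to 3 digits, the first one not zero,
            (,\d{3})*      #   followed by any number of ",ddd" groups
          | 0              #   or a single zero
        )
        (\.\d+)?           # optional decimal part
        $                  # end of the string
    /x';
    return preg_match($pattern, $str) === 1;
}
```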

The intl extension's NumberFormatter may be sufficient for you. See the NumberFormatter class and NumberFormatter::parse() in the PHP manual.
NOTE: You may need to install or enable the intl extension to use the NumberFormatter class.
To enable the extension, open your php.ini file (for example C:\xampp\php\php.ini) and search for extension=intl. Remove the leading ; if there is one, then restart Apache and try again.
This extension is bundled with PHP as of PHP version 5.3.0.
/**
 * Parse a number.
 *
 * @param string $number - Formatted number.
 * @param string $locale - The locale of the formatted number.
 * @return mixed - The value of the parsed number or FALSE on error.
 */
function parse_number(string $number, string $locale)
{
    return (new NumberFormatter($locale, NumberFormatter::DECIMAL))
        ->parse(trim($number));
}
// var_dump() is used because echo prints nothing for FALSE
var_dump(parse_number('4.678.567.345,3827', 'de_DE')); // 4678567345.3827
var_dump(parse_number('4.678567.345,3827', 'de_DE'));  // FALSE
var_dump(parse_number('4,678,567,345.3827', 'en_US')); // 4678567345.3827
var_dump(parse_number(',678567,345.3827', 'en_US'));   // FALSE

I would write it like this:
if (preg_match('~\A(?>[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]+)?\z~', $number)) {
    // true
} else {
    // false
}
Details:
~                     # my favourite pattern delimiter
\A                    # start of the string
(?>                   # group for the integer part:
                      # I made it atomic to fail faster when there aren't exactly 3 digits
                      # between commas
    [1-9] [0-9]{0,2}  # must not start with a zero
    (?:,[0-9]{3})*    # zero or more groups of a comma followed by 3 digits
  |                   # OR
    0                 # zero alone
)
(?:                   # optional group for decimals
    \.[0-9]+          # at least 1 digit
)?
\z                    # end of the string
~
Note that I have chosen not to allow numbers with a leading zero like 012, numbers without an integer part like .12, or integers with a trailing dot like 12. Feel free to edit the pattern if that isn't what you want.
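If you do need the optional sign and dollar sign from the question, one possible edit is to put [-+]?\$? right after \A. The sketch below assumes that order (sign first, as in -$1,234) and wraps the pattern in a hypothetical isFormattedNumber() helper. Because it ends with \z rather than $, it also rejects a trailing newline.

```php
<?php
// Hypothetical helper: the answer's pattern plus an optional sign and
// dollar sign (in that order) right after \A.
function isFormattedNumber(string $number): bool
{
    $pattern = '~\A[-+]?\$?(?>[1-9][0-9]{0,2}(?:,[0-9]{3})*|0)(?:\.[0-9]+)?\z~';
    return preg_match($pattern, $number) === 1;
}

foreach (['-$1,234,567.89', '1,000,230', ',021,201', '1,023,12', '012', "1,000\n"] as $candidate) {
    // json_encode() makes the trailing newline visible in the output
    echo json_encode($candidate), ' => ', isFormattedNumber($candidate) ? 'valid' : 'invalid', "\n";
}
```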

preg_match_all in php showing result blank

How do I match this with a regular expression in PHP?
"s:6:\"[\"50\"]\";",
"s:5:\"[\"1\"]\";"
I want to match the numbers inside [\"50\"]; there can be one or more of them.
I have a pattern and want to take only the numbers from a json_encode() value that was also passed through serialize() in PHP. This is my code:
$result = [];
foreach ($impressions as $impression) {
    preg_match_all('/\x5C/', $impression->subcategories, $result);
}
return $result;
Without preg_match_all, here is the result:
"s:6:\"[\"50\"]\";",
"s:5:\"[\"1\"]\";"
I am trying to match where the \ is, so that I can take only the number, like 50 or 1.
Any idea how I can pick out the numbers with a regular expression? The hex pattern '/\x5C/' gives me a blank result, yet it works fine when I test the same regular expression against this result in an online tester.

First of all, you cannot loop over an array of strings that way with preg_match_all: your $result array gets overwritten in each loop iteration.
Then, you need to capture the numbers you want to see in your result set. To do that, you must escape the [, ] and \ characters, each with another \, and then capture the digits in the middle by putting them in ( and ).
$impressions = [];
$impressions[] = "s:6:\"[\"50\"]\";";
$impressions[] = "s:5:\"[\"1\"]\";";
$results = [];
foreach ($impressions as $impression) {
    // I chose # as the delimiter here: with so many \ involved,
    // we don't need / around the pattern to add to the confusion
    preg_match_all('#\[\\"([0-9]+)\\"\]#', $impression, $matches);
    // $matches is overwritten in each iteration, so we preserve
    // its content by putting it into the $results array
    $results[] = $matches;
}
var_dump($results);
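If you only need the numbers themselves rather than the full $matches structures, a variation is to merge group 1 from each iteration into one flat array. This sketch reuses the sample strings from the answer; $numbers is just an illustrative name. The quotes are written unescaped here, since \" and " match the same character in a PCRE pattern.

```php
<?php
// Sample strings taken from the answer above.
$impressions = ["s:6:\"[\"50\"]\";", "s:5:\"[\"1\"]\";"];

$numbers = [];
foreach ($impressions as $impression) {
    // Capture the digits between [" and "] in group 1.
    if (preg_match_all('#\["([0-9]+)"\]#', $impression, $matches)) {
        // Append only group 1, so nothing is lost between iterations.
        $numbers = array_merge($numbers, $matches[1]);
    }
}

print_r($numbers); // the captured strings '50' and '1'
```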

Regular expression to find a value and return it in PHP

I've been trying to figure this out for 2 hours now with no success. It's a bit complicated for me, I guess.
I am trying to parse a script file in PHP and return some values to the user. The ones I want look like this:
_value = object runFunction blah blah blah
Basically, what I want is (as an algorithm):
IF runFunction is found in the line, case-insensitively (because it might be runfunction)
AND the line starts with _ (underscore) (or, if possible, the value before the = starts with _, to be sure of the result)
THEN return that underscore value before the = to me.
99.9% of the time the format is like this, but in rare cases it can look like this:
_value = _object runFunction blah blah blah
(in case the _ after the = messes things up).
Any help here? :)
Thanks

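Try something like:
$str = 'YOUR FILE CONTENTS HERE';
$match = preg_match_all('/(_[a-zA-Z0-9_]+) ?= ?[a-zA-Z0-9_]+ runFunction/i', $str, $matches);
var_dump($matches);
The i flag makes runFunction match case-insensitively. The s flag has no effect on this pattern because it contains no dot, and the multiline flag only matters if you add ^ or $ anchors.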

How about this:
if (preg_match('/^_([^=]+?)(?=\s*=).*runfunction/im', $subject, $regs)) {
    $result = $regs[1];
} else {
    $result = "";
}
Here is the regex by itself. The result is in capturing group 1:
^_([^=]+?)(?=\s*=).*runfunction
The regex does the following:
look for the beginning of a line
match the first underscore
capture everything that is not an '=' into capturing group 1 (lazily, as few characters as possible)
provided it is followed by 0 or more whitespace characters and an equal sign
then match everything up to runfunction.
The case-insensitive (i) and multiline (m) options need to be set.
If the first underscore does not need to be at the beginning of the line, remove the ^ anchor.
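To collect every matching name in a whole file rather than only the first, the same pattern can be used with preg_match_all(). The script text below is made up for illustration. Note that group 1 holds the name without its leading underscore, so prepend '_' if you need it.

```php
<?php
// Hypothetical script contents, including the "_object" case from the question.
$script = <<<'TXT'
_value = object runFunction blah blah blah
_other = _object RUNFUNCTION blah
notme = object runFunction blah
_skip = object otherFunction blah
TXT;

// Same pattern as above; preg_match_all collects every matching line.
preg_match_all('/^_([^=]+?)(?=\s*=).*runfunction/im', $script, $regs);

// Group 1 holds the names without the leading underscore.
print_r($regs[1]);
```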

Regex with possible empty matches and multi-line match

I've been trying to "parse" some data using a regex, and I feel as if I'm close, but I just can't seem to bring it all home.
The data that needs parsing generally looks like this: <param>: <value>\n. The number of params can vary, just as the value can. Still, here's an example:
FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 789654
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of \n's. It can be empty, too.
What's worse, though is that this CAN contain colons (but they're _"escaped"_ using `\`), and even basic markup!
To push this text into an object, I put together this little expression:
if (preg_match_all('/^([^:\n\\\\]+):\s*(.+)/m', $this->structuredMessage, $data))
{
    $data = array_combine($data[1], $data[2]);
    //$data is assoc array FooID => 123456, Name => Chuck, ...
    $report = new Report($data);
}
Now, this works all right most of the time, except for the User Message bit: . doesn't match newlines, and if I were to use the s flag, the second group would match everything after FooID: until the very end of the string.
I'm having to use a dirty workaround for that:
$msg = explode(end($data[1]), $string);
$data[2][count($data[2]) - 1] = array_pop($msg);
After some testing, I've come to understand that sometimes one or two of the parameters aren't filled in (for example, InternalID can be empty). In that case, my expression doesn't fail, but instead results in:
[1] => Array
    (
        [0] => FooID
        [1] => Name
        [2] => When
        [3] => InternalID
    )
[2] => Array
    (
        [0] => 123465
        [1] => Chuck
        [2] => 01/02/2013 01:23:45
        [3] => User Comment: Hello,
    )
I've been trying various other expressions, and came up with this:
/^([^:\n\\]++)\s{0,}:(.*+)(?!^[^:\n\\]++\s{0,}:)/m
//or:
/^([^:\n\\]+)\s{0,}:(.*)(?!^[^:\\\n]+\s{0,}:)/m
The second version is slightly slower.
That solves the issues I had with InternalID: <void>, but still leaves me with the final obstacle: User Message: <multi-line>. Using the s flag doesn't do the trick with my expression at the moment.
I can only think of this:
^([^:\n\\]++)\s{0,}:((\n(?![^\n:\\]++\s{0,}:)|.)*+)
Which is, to my eye at least, too complex to be the only option. Ideas, suggestions, links... anything would be greatly appreciated.
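For what it's worth, that last expression does seem to handle both problems on a small sample. The check below is a sketch, not a guarantee: the sample text is a shortened, made-up version of the question's data, and a continuation line that itself looks like "key:" (text without a backslash, then a colon) would still be read as a new key.

```php
<?php
// Shortened, hypothetical sample: InternalID is empty, User Message spans lines.
$message = <<<'TXT'
FooID: 123456
Name: Chuck
InternalID:
User Message: Hello,
this spans
several lines
TXT;

// The last expression above, with the backslashes doubled for a PHP string.
// Group 2 only consumes a newline when the next line does not look like "key:".
$pattern = '/^([^:\n\\\\]++)\s{0,}:((\n(?![^\n:\\\\]++\s{0,}:)|.)*+)/m';

preg_match_all($pattern, $message, $m);
$data = array_combine($m[1], array_map('trim', $m[2]));

var_dump($data);
```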

The following regex should work, but I'm not so sure anymore if it is the right tool for this:
preg_match_all(
    '%^               # Start of line
    ([^:]*)           # Match anything until a colon, capture in group 1
    :\s*              # Match a colon plus optional whitespace
    (                 # Match and capture in group 2:
     (?:              # Start of non-capturing group (used for alternation)
      .*$             # Either match the rest of the line
      (?=             # only if one of the following follows here:
       \Z             # The end of the string
      |               # or
       \r?\n          # a newline
       [^:\n\\\\]*    # followed by anything except colon, backslash or newline
       :              # then a colon
      )               # End of lookahead
     |                # or match
      (?:             # Start of non-capturing group (used for alternation/repetition)
       [^:\\\\]       # Either match a character except colon or backslash
      |               # or
       \\\\.          # match any escaped character
      )*              # Repeat as needed (end of inner non-capturing group)
     )                # End of outer non-capturing group
    )                 # End of capturing group 2
    $                 # Match the end of the line%mx',
    $subject, $result, PREG_PATTERN_ORDER);

I'm pretty new to PHP, so maybe this is totally out of whack, but maybe you could use something like:
$data = <<<EOT
FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 789654
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of \n's. It can be empty, too
EOT;
if ($key = preg_match_all('~^[^:\n]+?:~m', $data, $match)) {
    $val = explode('¬', preg_filter('~^[^:\n]+?:~m', '¬', $data));
    array_shift($val);
    $res = array_combine($match[0], $val);
}
print_r($res);
yields
Array
(
[FooID:] => 123456
[Name:] => Chuck
[When:] => 01/02/2013 01:23:45
[InternalID:] => 789654
[User Message:] => Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of
's. It can be empty, too
)

So here's what I came up with using a tricky preg_replace_callback():
$string ='FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID: 789654
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
And can start with any number of \n\'s. It can be empty, too
Yellow:cool';
$array = array();
// The ^ in the second alternative keeps it from also matching the empty string
// left at the end of every "key: value" line (which would append an extra newline)
preg_replace_callback('#^(.*?):(.*)|^.*$#m', function($m) use (&$array) {
    static $last_key = ''; // We are going to use this as a reference
    if (isset($m[1])) { // If there is a normal match (key : value)
        $array[$m[1]] = $m[2]; // Then add to array
        $last_key = $m[1]; // define the new last key
    } else { // else
        $array[$last_key] .= PHP_EOL . $m[0]; // add the whole line to the last entry
    }
}, $string); // Anonymous function used, thus PHP 5.3+ is required
print_r($array); // print
Downside: I'm using PHP_EOL to add newlines, which is OS-dependent.

I think I'd avoid using a regex for this task and instead split it into sub-tasks.
Basic algorithm outline
Split the string on \n using explode
Loop over the resulting array
Split the resulting strings on : also using explode with a limit of 2.
If the produced array's length is less than 2, add the entirety of the data to the previous key's value
Else, use the first array index as your key, the second as the value unless the split colon was escaped (in which case, instead add the key + split + value to the previous key's value)
This algorithm does assume there are no keys with escaped colons. Escaped colons in values will be dealt with just fine (i.e. user input).
Code
$str = <<<EOT
FooID: 123456
Name: Chuck
When: 01/02/2013 01:23:45
InternalID:
User Message: Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
This\: works too. And can start with any number of \\n's. It can be empty, too.
What's worse, though is that this CAN contain colons (but they're _"escaped"_
using `\`) like so `\:`, and even basic markup!
EOT;
$arr = explode("\n", $str);
$prevKey = '';
$split = ': ';
$output = array();
for ($i = 0, $arrlen = sizeof($arr); $i < $arrlen; $i++) {
    $keyValuePair = explode($split, $arr[$i], 2);
    // ?: Is this a valid key/value pair
    if (sizeof($keyValuePair) < 2 && $i > 0) {
        // -> Nope, append the value to the previous key's value
        $output[$prevKey] .= "\n" . $keyValuePair[0];
    }
    else {
        // -> Maybe
        // ?: Did we miss an escaped colon
        if (substr($keyValuePair[0], -1) === '\\') {
            // -> Yep, this means this is a value, not a key/value pair; append both key and
            // value (including the split between) to the previous key's value, ignoring
            // any colons in the rest of the string (allowing dates to pass through)
            $output[$prevKey] .= "\n" . $keyValuePair[0] . $split . $keyValuePair[1];
        }
        else {
            // -> Nope, create a new key with a value
            $output[$keyValuePair[0]] = $keyValuePair[1];
            $prevKey = $keyValuePair[0];
        }
    }
}
var_dump($output);
Output
array(5) {
["FooID"]=>
string(6) "123456"
["Name"]=>
string(5) "Chuck"
["When"]=>
string(19) "01/02/2013 01:23:45"
["InternalID"]=>
string(0) ""
["User Message"]=>
string(293) "Hello,
this is nillable, but can be quite long. Text can be spread out over many lines
This\: works too. And can start with any number of \n's. It can be empty, too.
What's worse, though is that this CAN contain colons (but they're _"escaped"_
using `\`) like so `\:`, and even basic markup!"
}

Regular Expressions: how to do "option split" replaces

Those regular expressions drive me crazy. I'm stuck with this one:
test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not
Task:
Remove all [[ and ]], and if there is an option split, choose the last option, so the output should be:
test1:link test2:silver test3:out1insideout2 test4:this|not
I came up with (PHP)
$text = preg_replace("/\\[\\[|\\]\\]/",'',$text); // remove [[ or ]]
This works for part 1 of the task, but before that I think I should do the option split. My best attempt:
$text = preg_replace("/\\[\\[(.*\|)(.*?)\\]\\]/",'$2',$text);
Result:
test1:silver test3:[[out1[[inside]]out2]] this|not
I'm stuck. Could someone with a few free minutes help me? Thanks!

I think the easiest way to do this would be multiple passes. Use a regular expression like:
\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]
This will replace option strings to give you the last option from the group. If you run it repeatedly until it no longer matches, you should get the right result (the first pass will replace [[out1[[inside]]out2]] with [[out1insideout2]], and the second will ditch the brackets).
Edit 1: By way of explanation,
\[\[          # Opening [[
(?:           # A non-capturing group (we don't want this bit)
  [^\[\]]     # Non-bracket characters
  *           # Zero or more of them
  \|          # A literal '|' character marking the end of the discarded options
)?            # This group is optional: if there is only one option, it won't be present
(             # The group we're actually interested in ($1)
  [^\[\]]     # All the non-bracket characters
  +           # At least one of them
)             # End of $1
\]\]          # Closing ]]
Edit 2: Changed expression to ignore ']' as well as '[' (it works a bit better like that).
Edit 3: There is no need to know the number of nested brackets, as you can do something like:
$oldtext = "";
$newtext = $text;
while ($newtext !== $oldtext)
{
    $oldtext = $newtext;
    $newtext = preg_replace('/\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]/', '$1', $oldtext);
}
$text = $newtext;
Basically, this keeps running the regular expression replace until the output is the same as the input.
Note that I don't know PHP, so there are probably syntax errors in the above.

This is impossible to do in one regular expression since you want to keep content in multiple "hierarchies" of the content. It would be possible otherwise, using a recursive regular expression.
Anyways, here's the simplest, most greedy regular expression I can think of. It should only replace if the content matches your exact requirements.
You will need to escape all backslashes when putting it into a string (\ becomes \\.)
\[\[((?:[^][|]+|(?!\[\[|]])[^|])++\|?)*]]
As others have already explained, you use this with multiple passes. Keep looping while there are matches, performing replacement (only keeping match group 1.)
The difference from the other regular expressions here is that it allows single brackets in the content without breaking:
test1:[[link]] test2:[[gold|si[lv]er]]
test3:[[out1[[in[si]de]]out2]] test4:this|not
becomes
test1:link test2:si[lv]er
test3:out1in[si]deout2 test4:this|not
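A minimal PHP sketch of that looping, assuming preg_replace()'s optional $count argument is used to detect when nothing more was replaced; stripOptionBrackets() is a hypothetical name. Because group 1 sits inside a repeated group, $1 refers to its last iteration, which is why only the last option survives.

```php
<?php
// Hypothetical helper: apply the pattern above, keeping only group 1,
// until a pass makes no replacement ($count is set by preg_replace).
function stripOptionBrackets(string $text): string
{
    $pattern = '/\[\[((?:[^][|]+|(?!\[\[|]])[^|])++\|?)*]]/';
    do {
        $text = preg_replace($pattern, '$1', $text, -1, $count);
    } while ($count > 0);
    return $text;
}

echo stripOptionBrackets('test1:[[link]] test2:[[gold|si[lv]er]] test3:[[out1[[in[si]de]]out2]] test4:this|not'), "\n";
// test1:link test2:si[lv]er test3:out1in[si]deout2 test4:this|not
```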

Why try to do it all in one go? Remove the [[ ]] first and then deal with the options; do it in two lines of code.
When trying to get something going, favour clarity and simplicity.
Seems like you have all the pieces.

Why not simply remove any brackets that are left afterwards?
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$str = preg_replace('/\\[\\[(?:[^|\\]]+\\|)+([^\\]]+)\\]\\]/', '$1', $str);
$str = str_replace(array('[', ']'), '', $str);

Well, I didn't stick to just regex, because I'm of a mind that trying to do stuff like this with one big regex leads you to the old joke about "Now you have two problems". However, give something like this a shot:
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$reg = '/(.*?):(.*?)( |$)/';
preg_match_all($reg, $str, $m);
foreach ($m[2] as $pos => $match) {
    if (strpos($match, '|') !== FALSE && strpos($match, '[[') !== FALSE) {
        $opt = explode('|', $match);
        $match = $opt[count($opt) - 1];
    }
    $m[2][$pos] = str_replace(array('[', ']'), '', $match);
}
$result = array();
foreach ($m[1] as $k => $v) {
    $result[$k] = $v . ':' . $m[2][$k];
}

This is C# using verbatim (non-escaped) strings, hence you will have to double the backslashes in other languages.
using System;
using System.Text.RegularExpressions;
String input = "test1:[[link]] " +
               "test2:[[gold|silver]] " +
               "test3:[[out1[[inside]]out2]] " +
               "test4:this|not";
// [^|\]]+ cannot cross a ], so a match cannot run from test1 into test2
String step1 = Regex.Replace(input, @"\[\[([^|\]]+)\|([^\]]+)\]\]", @"[[$2]]");
String step2 = Regex.Replace(step1, @"\[\[|\]\]", String.Empty);
// Prints "test1:link test2:silver test3:out1insideout2 test4:this|not"
Console.WriteLine(step2);

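$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$s = preg_split("/\s+/", $str);
foreach ($s as $v) {
    // Only tokens that had [[...]] contain an option split; test4:this|not must stay as-is
    $hasBrackets = strpos($v, '[[') !== false;
    $v = preg_replace("/\[\[|\]\]/", "", $v);
    $j = explode(":", $v, 2);
    if ($hasBrackets) {
        $j[1] = preg_replace("/.*\|/", "", $j[1]);
    }
    print implode(":", $j) . "\n";
}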

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Understanding preg_match formula - php

This: /[^=]=([^;#])/ collects all ...=... things to the $matches array. [^=] means any character except = [^;#] means any character except ; and # () means collect it into $matches explicitly The $pcount/$ncount makes percent from the values showing theirs ratio.

Related

PHP: Validate a given string is a valid number

preg_match_all in php showing result blank

Regular expression to find a value and return it in PHP

Regex with possible empty matches and multi-line match

Regular Expressions: how to do "option split" replaces

Categories

Resources

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Understanding preg_match formula - php

This: /[^=]*=([^;#]*)/ collects all ...=... things to the $matches array. [^=] means any character except = [^;#] means any character except ; and # () means collect it into $matches explicitly The $pcount/$ncount makes percent from the values showing theirs ratio.

Related

PHP: Validate a given string is a valid number

preg_match_all in php showing result blank

Regular expression to find a value and return it in PHP

Regex with possible empty matches and multi-line match

Regular Expressions: how to do "option split" replaces

Categories

Resources

This: /[^=]=([^;#])/ collects all ...=... things to the $matches array. [^=] means any character except = [^;#] means any character except ; and # () means collect it into $matches explicitly The $pcount/$ncount makes percent from the values showing theirs ratio.