php: "sscanf" to 'consume' a string but allows a missing parameter

This is for an osCommerce contribution called "Automatically add multiple products with attribute to cart from external source".
The existing code uses sscanf to 'explode' a string that represents a
- product ID,
- a productOption,
- and quantity:
sscanf('28{8}17[1]', '%d{%d}%d[%f]',
$productID, // 28
$productOptionID, $optionValueID, //{8}17 <--- Product Options!!!
$productQuantity //[1]
);
This works great if there is only 1 'set' of Product Options (e.g. {8}17).
But this procedure needs to be adapted so that it can handle multiple Product Options, and put them into an array, e.g.:
'28{8}17{7}15{9}19[1]' //array(8=>17, 7=>15, 9=>19)
OR
'28{8}17{7}15[1]' //array(8=>17, 7=>15)
OR
'28{8}17[1]' //array(8=>17)
Thanks in advance. (I'm a pascal programmer)

You should not try to do complex recursive parses with one sscanf. Stick it in a loop. Something like:
<?php
$str = "28{8}17{7}15{9}19[1]";
#$str = "28{8}17{7}15[1]";
#$str = "28{8}17[1]";
sscanf($str,"%d%s",$prod,$rest);
printf("Got prod %d\n", $prod);
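# Require all three conversions: a partial match (e.g. a missing bracket) leaves $rest unchanged and would otherwise loop forever.
while (sscanf($rest,"{%d}%d%s",$opt,$id,$rest) === 3)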
{
printf("opt=%d id=%d\n",$opt,$id);
}
sscanf($rest,"[%d]",$quantity);
printf("Got qty %d\n",$quantity);
?>

Regular expressions may be worth a look here:
$a = '28{8}17{7}15{9}19[1]';
$matches = null;
preg_match_all('~\\{[0-9]{1,3}\\}[0-9]{1,3}~', $a, $matches);
To get the remaining values:
$id = (int) $a; // ;)
$quantity = substr($a, strrpos($a, '[')+ 1, -1);
Following the comment, a small update:
$a = '28{8}17{7}15{9}19[1]';
$matches = null;
preg_match_all('~\\{([0-9]{1,3})\\}([0-9]{1,3})~', $a, $matches, PREG_SET_ORDER);
$result = array();
foreach ($matches as $entry) {
$result[$entry[1]] = $entry[2];
}

sscanf() is not the ideal tool for this task because it doesn't handle recurring patterns and I don't see any real benefit in type casting or formatting the matched subexpressions.
If this was purely a text extraction task (in other words your incoming data was guaranteed to be perfectly formatted and valid), then I could have recommended a cute solution that used strtr() and parse_str() to quickly generate a completely associative multi-dimensional output array.
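A rough sketch of what such a strtr() plus parse_str() approach could look like is shown below. It assumes the input is always well formed and performs no validation; the key names id, options and quantity are illustrative choices, not something the original code prescribes.

```php
<?php
// Sketch only: assumes perfectly formatted input, no validation at all.
// The key names 'id', 'options' and 'quantity' are illustrative.
$str = '28{8}17{7}15{9}19[1]';

// Rewrite the custom syntax into a URL-style query string, e.g.
// 28{8}17[1]  becomes  id=28&options[8]=17&quantity=1
$query = 'id=' . strtr($str, [
    '{' => '&options[',
    '}' => ']=',
    '[' => '&quantity=',
    ']' => '',
]);

// parse_str() turns the query string into a nested array.
parse_str($query, $parsed);
var_export($parsed);
```

All values come back as strings, so cast them if integers are needed.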
However, your comment "with sscanf I had an infinite loop if there is a missing bracket in the string (because it looks for open and closing {}s). Or if I leave out a value. But with your regex solution, if I drop a bracket or leave out a value" means that validation is an integral component of this process.
For that reason, I'll recommend a regex pattern that both validates the string and breaks it into its meaningful parts. There are several logical aspects to the pattern, but the hero here is the \G metacharacter, which allows the pattern to "continue" matching from where the previous match finished. This way we have an array of contiguous fullstring matches to pull data from when creating your desired multidimensional output.
The pattern ^\d+(?=.+\[\d+]$)|\G(?!^)(?:{\K\d+}\d+|\[\K\d+(?=]$)) in preg_match_all() generates the following type of output in the fullstring element ([0]):
[id], [option0, option1, ...](optional), [quantity]
The first branch in the pattern (^\d+(?=.+\[\d+]$)) checks that the string starts with the id number and ends with a number wrapped in square brackets, which represents the quantity.
The second branch begins with the "continue" character and contains two logical branches itself. The first matches an option expression (and forgets the leading { thanks to \K) and the second matches the number in the quantity expression (\d+, so that multi-digit quantities are captured as well).
To create the associative array of options, target the "middle" elements (if there are any), then split the strings on the lingering } and assign these values as key-value pairs.
This is a direct solution because it only uses one preg_ call and it does an excellent job of validating and parsing the variable length data.
Code:
if (!preg_match_all('~^\d+(?=.+\[\d+]$)|\G(?!^)(?:{\K\d+}\d+|\[\K\d+(?=]$))~', $test, $m)) {
echo "invalid input";
} else {
var_export(
[
'id' => array_shift($m[0]),
'quantity' => array_pop($m[0]),
'options' => array_reduce(
$m[0],
function($result, $string) {
[$key, $result[$key]] = explode('}', $string, 2);
return $result;
},
[]
)
]
);
}
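If stricter validation is wanted, a two-step variant is one option: first check the whole string against an anchored pattern, then collect the option pairs from the part that was just validated. The sketch below is only an illustration; parseCartItem() is a hypothetical helper name, and it assumes that ids, option values and the quantity are all non-negative integers.

```php
<?php
// Sketch: validate the whole string first, then collect the options.
// parseCartItem() is a hypothetical helper, not part of osCommerce.
function parseCartItem(string $input): ?array
{
    // id, zero or more {option}value pairs, then [quantity]; nothing else.
    if (!preg_match('~^(\d+)((?:\{\d+\}\d+)*)\[(\d+)\]$~', $input, $m)) {
        return null; // missing bracket, missing value, stray characters, ...
    }

    // $m[2] holds only the validated option pairs, e.g. "{8}17{7}15".
    preg_match_all('~\{(\d+)\}(\d+)~', $m[2], $pairs, PREG_SET_ORDER);

    $options = [];
    foreach ($pairs as $pair) {
        $options[(int) $pair[1]] = (int) $pair[2];
    }

    return ['id' => (int) $m[1], 'options' => $options, 'quantity' => (int) $m[3]];
}

var_export(parseCartItem('28{8}17{7}15[1]'));
var_export(parseCartItem('28{8}17{715[1]')); // NULL: a closing brace is missing
```

Because the first pattern is anchored at both ends, a dropped bracket or a missing value makes the function return null instead of a partial result.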

Related

Get everything after specific word and before specific word in PHP

Take a look at this string: (parent)item category(child)master data(name)category
The string is dynamic. I want each word inside () to become an array key, and everything after it, up to the next (), to become that key's value.
How can I turn the string above into this array: ["parent" => "item category", "child" => "master data", "name" => "category"]?
This probably is what you are looking for:
<?php
$input = "(parent)item category(child)master data(name)category";
preg_match_all('/\(([^()]+)\)([^()]+)/', $input, $matches);
$output = array_combine($matches[1], $matches[2]);
print_r($output);
The output obviously is:
Array
(
[parent] => item category
[child] => master data
[name] => category
)
The approach uses a "regular expression" matching all occurrences of a pattern in the input string. All that is left is to combine the matched tokens which is done by the array_combine(...) call.
Note that such an approach works, but is very limited. It fails with more complex input structures because pattern matching based on regular expressions is itself limited. In such cases you would either have to implement a real language parser (or use a compiler-compiler like yacc or bison to do that for you), or simplify your input data structure, which is usually the more promising route ;-)
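As a small illustration of that limitation, here is the same pattern applied to a made-up input in which one of the values itself contains parentheses. The input string is an assumption chosen for the example.

```php
<?php
// Same pattern as above; the input is a made-up example with a nested "(sub)".
$input = "(parent)item (sub) category(child)master data";
preg_match_all('/\(([^()]+)\)([^()]+)/', $input, $matches);
$output = array_combine($matches[1], $matches[2]);
print_r($output);
// "sub" shows up as a key, although it was meant to be part of the first value.
```

The pattern cannot tell a key from a parenthesised word inside a value, which is the kind of case where a real parser or a simpler input format is needed.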
You can also use explode() to split the string on the delimiters:
<?php
$str = "(parent)item category(child)master data(name)category";
$list = explode("(", $str);
$x = [];
foreach($list as $item){
if($item != null) {
$i = explode(")",$item);
$x[$i[0]] = $i[1];
}
}
print_r($x);

Efficient way to parse this string into array in PHP?

Background
I have an array which I create by splitting a string based on every occurrence of 0d0a using preg_split('/(?<=0d0a)(?!$)/').
For example:
$string = "78781110d0a78782220d0a";
will be split into:
Array ( [0] => 78781110d0a [1] => 78782220d0a )
A valid array element has to start with 7878 and end with 0d0a.
The Problem
But sometimes, there's an additional 0d0a in the string which splits into an extra and invalid array element, i.e., that doesn't begin with 7878.
Take this string for example:
$string = "78781110d0a2220d0a78783330d0a";
This is split into:
Array ( [0] => 78781110d0a [1] => 2220d0a [2] => 78783330d0a )
But it should actually be:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a)
My Solution
I've written the following (messy) code to get around this:
$data = Array('78781110d0a','2220d0a','78783330d0a');
$i = 0; //count for $data array;
$j = 0; //count for $dataFixed array;
$dataFixed = $data;
foreach($data as $packet) {
if (substr($packet,0,4) != "7878") { //if packet doesn't start with 7878, do some fixing
if ($i != 0) { //if it's the first packet, nothing can be done; otherwise fix it
$j++;
if ((substr(strtolower($packet), -4, 4) == "0d0a")) { //only merge packets that end with 0d0a; anything else is 'mostly' not valid, so discard it
$dataFixed[$i-$j] = $dataFixed[$i-$j] . $packet;
}
unset($dataFixed[$i-$j+1]);
$dataFixed = array_values($dataFixed);
}
}
$i++;
}
Description
I first copy the array to another array $dataFixed. In a foreach loop of the $data array, I check whether it starts with 7878. If it doesn't, I join it with the previous array in $data. I then unset the current array in $dataFixed and reset the array elements with array_values.
But I'm not very confident about this solution.. Is there a better, more efficient way?
UPDATE
What if the input string doesn't end in 0d0a like it's supposed to? It will stick to the previous array element.
For example, in the string 78781110d0a2220d0a78783330d0a0000, 0000 should be separated as another array element.
Use another positive lookahead (?=7878) to form:
preg_split('/(?<=0d0a)(?=7878)/',$string)
Note: I removed (?!$) because I wasn't sure what that was for, based on your example data.
For example, this code:
$string = "78781110d0a2220d0a78783330d0a";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
print_r($array);
Results in:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a )
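To check the result against the rule stated in the question (every element starts with 7878 and ends with 0d0a), a small helper can be run over the split output. This is only a sketch; isValidPacket() is a hypothetical name introduced for the illustration.

```php
<?php
// Sketch: split as above, then test each element against the rule from the
// question. isValidPacket() is a hypothetical helper name.
function isValidPacket(string $packet): bool
{
    return substr($packet, 0, 4) === '7878' && substr($packet, -4) === '0d0a';
}

$string = "78781110d0a2220d0a78783330d0a";
$packets = preg_split('/(?<=0d0a)(?=7878)/', $string);

foreach ($packets as $packet) {
    printf("%s %s\n", $packet, isValidPacket($packet) ? 'ok' : 'INVALID');
}
```

An element with trailing garbage, such as 78783330d0a0000, would be reported as invalid by this check.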
UPDATE:
Based on your revised question of having possible random characters at the end of the input string, you can add three lines to make a complete program of:
$string = "78781110d0a2220d0a787830d0a330d0a0000";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
$temp = preg_split('/(7878.*0d0a)/',$array[count($array)-1],-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$array[count($array)-1] = $temp[0];
if(count($temp)>1) { $array[] = $temp[1]; }
print_r($array);
We basically do the initial splitting, then split the last element of the resulting array by the expected data format, keeping the delimiter using PREG_SPLIT_DELIM_CAPTURE. The PREG_SPLIT_NO_EMPTY ensures we won't get an empty array element if the input string doesn't end in random characters.
UPDATE 2:
Based on your comment below where it seems you're implying there might be random characters between any of the desired matches, and you want these random characters preserved, you could do this:
$string = "0078781110d0a2220d0a2220d0a0000787830d0a330d0a000078781110d0a2220d0a0000787830d0a330d0a0000";
$split1 = preg_split('/(7878.*?0d0a)/',$string,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$result = array();
foreach($split1 as $e){
$split2 = preg_split('/(.*0d0a)/',$e,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
foreach($split2 as $el){
// test if $el doesn't start with 7878 and ends with 0d0a
if(strpos($el,'7878') !== 0 && substr($el,-4) == '0d0a'){
//if(preg_match('/^(?!7878).*0d0a$/',$el) === 1){
$result[ count($result)-1 ] = $result[ count($result)-1 ] . $el;
} else {
$result[] = $el;
}
}
}
print_r($result);
The strategy employed here is different than above. First we split the input string based on the delimiter that matches your desired data, using the nongreedy regex .*?. At this point we have some strings that contain the ending of a desired value and some garbage at the end, so we split again based on the last occurrence of "0d0a" with the greedy regex .*0d0a. We then append any of those resulting values that don't start with "7878" but end with "0d0a" to the previous value, as this should repair the first and second halves that got split because it contained an extra "0d0a".
I provided two methods for the innermost if statement, one using regular expressions. The regex one is marginally slower in my testing, so I've left that one commented out.
I might still not have your full requirements, so you'll have to let me know if it works and perhaps provide your full dataset.
I think you are using a delimiter, "0d0a", that also happens to be part of the content! It's not possible to avoid junk data as long as the delimiter can also appear in the content. The delimiter must be unique.
Possible solutions:
Change the delimiter to something that doesn't occur in your data (000000, #!.;).
If you know for certain how long each array item is, use that length. Judging by the examples, that isn't possible.
The solutions given in the other answers only consider the sample data you have shared. If you are confident about what the string will contain, those solutions are pretty good to use. Otherwise they give you no guarantee!
Best solution: fix the delimiter first, then use regex or explode, whichever you prefer.
Why don't you use preg_match_all instead? You can avoid the zero-width assertions (the lookaheads and lookbehinds) that are needed to split the string (without them, the split removes the matched text), and just find the matches you're looking for:
Updated
<?php
$string = "00787817878110d0a22278780d0a78783330d0a00";
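// (?:(?!7878).)*$ means "no further 7878 before the end"; a character class like [^(7878)] would only exclude the single characters (, 7, 8 and )
preg_match_all('/7878.*?0d0a(?=7878|(?:(?!7878).)*$)/', $string, $arr);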
print_r($arr);
?>
Gives an array $arr[0] => ( [0] => 787817878110d0a22278780d0a, [1] => 78783330d0a ). Strips leading and trailing garbage characters (whatever doesn't start with 7878 or end with 0d0a).
So $arr[0] would be the array of values that you are looking for.
Works with multiple 7878 values and multiple 0d0a values (even though that's ridiculous).
Update
If splitting is more your style, why not avoid regular expressions altogether?
<?php
$string = "787817878110d0a22278780d0a78783330d0a";
$arr = explode('0d0a7878', $string);
$string = implode('0d0a,7878', $arr);
$arr = explode(',', $string);
print_r($arr);
?>
Here we split the string by the delimiter 0d0a7878, which is what @CharlieGorichanaz's solution is doing, and props to him for the quick, accurate solution. We then add a comma, because who doesn't love comma separated values? And we explode again on the commas for an array of desired values. Performance-wise, this ought to be faster than using regular expressions.

using preg_split to fetch terms between two markers in string

ok I have two strings.
(I use this for a language library system to allow translators to provide translations with placeholders).
In the first string, there are two instances. note that it's not always a single instance, some cases it will be none, one, two, or more.
This is a {[John Doe]} and this is {[Jane Doe]}
and then I have a string that is stored like this:
C'est {[1]} et c'est {[2]}
(translation)
This is a {[1]} and this is a {[2]}
So what I need to do is take the first string, extract everything between {[ and ]}, and match each instance with a placeholder in the second string, i.e. the first instance with {[1]}, the second with {[2]}, etc. Keep in mind that the reason I am using {[1]} and {[2]} is that in some languages, terms may appear in a different order for grammatical accuracy, but they are still terms that don't need translation themselves (names).
So the question is: how do I do this? I am thinking preg_split and then matching index+1 of each with the second string. That part I can handle. The problem I am having is getting the right regex going.
this is as close as I could get it..
preg_split('/[(\{\[).*(\]\})]/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
That returns an array of everything before and after each instance of {[ and ]}, when I am just trying to get the contents in between the two.
EDIT: solution derived from NikiC's answer.
function lang($str){
$nwStr = $str;
preg_match_all('(\{\[(.+?)\]\})', $str, $placeholders);
foreach ($placeholders[0] as $mk => $match) {
$pos = $mk+1;
$nwStr = str_replace("$match","{[$pos]}",$nwStr);
}
$result = preg_replace_callback('(\{\[(\d+)\]\})', function ($matches) use ($placeholders) {
$n = $matches[1]-1;
return $placeholders[1][$n];
}, $translation);
return $result;
}
basically what i am doing here is first looping through to replace the matches with the placeholders so that I can match the proper placeholder text in my language files. (i.e. create the right label string out of the input string)
First grab the placeholders from the string:
preg_match_all('(\{\[(.+?)\]\})', $string, $matches);
$placeholders = $matches[1];
Now replace with a callback:
$result = preg_replace_callback('(\{\[(\d+)\]\})', function ($matches) use ($placeholders) {
$n = $matches[1] - 1; // {[1]} refers to the first placeholder, which is index 0
return $placeholders[$n];
}, $translation);
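Put together, the two steps could be run as in the sketch below. The sample strings are taken from the question, with the placeholder order in the translation swapped on purpose to show reordering; the fallback for unknown placeholder numbers is an added assumption.

```php
<?php
// Sketch: sample strings from the question; translation order swapped on purpose.
$string      = 'This is a {[John Doe]} and this is {[Jane Doe]}';
$translation = "C'est {[2]} et c'est {[1]}";

// Step 1: collect the placeholder contents from the source string.
preg_match_all('(\{\[(.+?)\]\})', $string, $matches);
$placeholders = $matches[1]; // ['John Doe', 'Jane Doe']

// Step 2: substitute each numbered placeholder in the translation.
$result = preg_replace_callback('(\{\[(\d+)\]\})', function ($m) use ($placeholders) {
    $n = (int) $m[1] - 1;              // {[1]} refers to index 0
    return $placeholders[$n] ?? $m[0]; // leave unknown numbers untouched
}, $translation);

echo $result, "\n";
```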
You're almost there. PREG_SPLIT_DELIM_CAPTURE captures the groups between ( and ), so this:
preg_split('/(\{\[.*\]\})/U', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
should work better. I also added the U modifier so that * is ungreedy.
Edit: also, you have a pair of [ and ] which definitely don't belong there!
Another thing, you probably want to have the parts between the {[...]} construct, so this is better:
preg_split('/\{\[(.*)\]\}/U', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
By removing the PREG_SPLIT_NO_EMPTY, you now know for certain that you will find the tagged parts at odd indexes.
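A short sketch of picking the tagged parts out of the odd indexes, using the sample sentence from the question:

```php
<?php
$str = 'This is a {[John Doe]} and this is {[Jane Doe]}';
$parts = preg_split('/\{\[(.*)\]\}/U', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

// Without PREG_SPLIT_NO_EMPTY the captured names sit at indexes 1, 3, 5, ...
$names = [];
for ($i = 1; $i < count($parts); $i += 2) {
    $names[] = $parts[$i];
}
print_r($names);
```

Note that the last element of $parts is an empty string here, because the sentence ends with a placeholder.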

Get more backreferences from regexp than parenthesis

Ok this is really difficult to explain in English, so I'll just give an example.
I am going to have strings in the following format:
key-value;key1-value;key2-...
and I need to extract the data to be an array
array('key'=>'value','key1'=>'value1', ... )
I was planning to use regexp to achieve (most of) this functionality, and wrote this regular expression:
/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/
to work with preg_match and this code:
for ($l = count($matches),$i = 1;$i<$l;$i+=2) {
$parameters[$matches[$i]] = $matches[$i+1];
}
However the regexp obviously returns only 4 backreferences - first and last key-value pairs of the input string. Is there a way around this? I know I can use regex just to test the correctness of the string and use PHP's explode in loops with perfect results, but I'm really curious whether it's possible with regular expressions.
In short, I need to capture an arbitrary number of these key-value; pairs in a string by means of regular expressions.
You can use a lookahead to validate the input while you extract the matches:
/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/
(?=(?:\w++-[^;-]++;?)++$) is the validation part. If the input is invalid, matching will fail immediately, but the lookahead still gets evaluated every time the regex is applied. In order to keep it (along with the rest of the regex) in sync with the key-value pairs, I used \G to anchor each match to the spot where the previous match ended.
This way, if the lookahead succeeds the first time, it's guaranteed to succeed every subsequent time. Obviously it's not as efficient as it could be, but that probably won't be a problem--only your testing can tell for sure.
If the lookahead fails, preg_match_all() will return zero (which is falsy; false itself is only returned on error). If it succeeds, the matches will be returned in an array of arrays: one for the full key-value pairs, one for the keys, one for the values.
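A minimal sketch of how the pattern might be used; parsePairs() is a hypothetical wrapper name, and returning null for invalid input is a design choice made for the example.

```php
<?php
// Sketch: parsePairs() is a hypothetical wrapper around the pattern above.
function parsePairs(string $input): ?array
{
    $pattern = '/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/';
    if (!preg_match_all($pattern, $input, $m)) {
        return null; // 0 matches (invalid input) or false (regex error)
    }
    // $m[1] holds the keys, $m[2] the values, in matching order.
    return array_combine($m[1], $m[2]);
}

print_r(parsePairs('key-value;key1-value1;key2-value2;'));
var_dump(parsePairs('key;key1-value1')); // NULL: the first pair has no value
```

Repeated keys in the input would overwrite each other in the resulting array.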
Regex is a powerful tool, but sometimes it's not the best approach.
$string = "key-value;key1-value";
$array = array();
$s = explode(";",$string);
foreach($s as $k){
if ($k === '') continue; // skip the empty piece left by a trailing semicolon
$e = explode("-",$k);
$array[$e[0]]=$e[1];
}
print_r($array);
Use preg_match_all() instead. Maybe something like:
$matches = $parameters = array();
$input = 'key-value;key1-value1;key2-value2;key123-value123;';
preg_match_all("/(\w+)-([^-;]+)/", $input, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$parameters[$match[1]] = $match[2];
}
print_r($parameters);
EDIT:
to first validate if the input string conforms to the pattern, then just use:
if (preg_match("/^((\w+)-([^-;]+);)+$/", $input) > 0) {
/* do the preg_match_all stuff */
}
EDIT2: the final semicolon is optional
if (preg_match("/^(\w+-[^-;]+;)*\w+-[^-;]+$/", $input) > 0) {
/* do the preg_match_all stuff */
}
No. Newer matches overwrite older matches. Perhaps the limit argument of explode() would be helpful when exploding.
what about this solution:
$samples = array(
"good" => "key-value;key1-value;key2-value;key5-value;key-value;",
"bad1" => "key-value-value;key1-value;key2-value;key5-value;key-value;",
"bad2" => "key;key1-value;key2-value;key5-value;key-value;",
"bad3" => "k%ey;key1-value;key2-value;key5-value;key-value;"
);
foreach($samples as $name => $value) {
if (preg_match("/^(\w+-\w+;)+$/", $value)) {
printf("'%s' matches\n", $name);
} else {
printf("'%s' not matches\n", $name);
}
}
I don't think you can do both validation and extraction of data with one single regexp, as you need anchors (^ and $) for validation and preg_match_all() for the data, but if you use anchors with preg_match_all() it will only return the last set matched.

Validate measurements with PHP

I need to validate measurements entered into a form generated by PHP.
I intend to compare them to upper and lower control limits and decide if they fail or pass.
As a first step, I imagine a PHP function which accepts strings representing engineering measurements and converts them to pure numbers before the comparison.
At the moment I'm only expecting measurements of small voltages and currents, so strings like
'1.234uA', '2.34 nA', '39.9mV', or '-1.003e-12'
will be converted to
1.234e-6, 2.34e-9, 3.99e-2 and -1.003e-12, respectively.
But the method should be generalisable to any measured quantity.
function convert($value) {
$units = array('p' => 'e-12',
'n' => 'e-9',
'u' => 'e-6',
'm' => 'e-3');
$unitstring = implode("", array_keys($units));
$matches = array();
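// The optional minus applies to both number forms; the prefix is matched case-sensitively so that e.g. 'M' is not mistaken for 'm'.
$pattern = "/^(-?(?:\\d*\\.\\d+|\\d+))\\s*([$unitstring])([a-zA-Z])$/";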
$result = preg_match($pattern, $value, $matches);
if ($result)
$retval = $matches[1].$units[$matches[2]].$matches[3];
else
$retval = $value;
return $retval;
}
So to explain what the above does:
$units is an array to map unit-prefix to the exponent.
$unitstring conglomerates the units into a single string (in the example it would be 'pnum')
The regular expression will match an optional -, followed by either 0 or more digits, a period and 1 or more digits OR 1 or more digits, followed by one of the unit prefixes (only one) and then a single alphabetical character. There can be any amount of whitespace between the number and the units.
Because of the parentheses and the use of preg_match, the number section, the unit prefix, and the unit are all separately captured in the array $matches as elements 1, 2, and 3. (0 will contain the entire string.)
$result will be 1 if it matched the regex, 0 otherwise.
$retval is constructed by just connecting the number, the exponent (based on the unit prefix from the array) and the units provided, or it will just be the passed in string (such as if you're given the -1.003e-12, it will be returned)
Of course you can tweak some things, but in general this is a good start. Hope it helps.
In your function, first define the exponent for each unit prefix (-6 for u, -3 for m, and so on),
then split the string into the number and the unit prefix (micro (u), milli (m), etc.).
Say the entered number is $num and the exponent for its prefix is $exp:
$x = 0; // $x keeps track of how far the decimal point has moved
while (abs($num) >= 10)
{
$num = $num / 10;
$x++;
}
$exp = $exp + $x; // the exponent is adjusted by the number of shifts
echo $num . 'e' . $exp;
That may do it!
My own possibly simple-minded approach has been to use an array of patterns in preg_replace:
function convert($value) {
// The arrays are defined inside the function so that they are in scope.
$patterns = array('/p[av]/i', '/n[av]/i', '/u[av]/i', '/m[av]/i');
$replacements = array('e-12', 'e-9', 'e-6', 'e-3');
$result = preg_replace($patterns, $replacements, $value);
return $result;
}
And it could be extended to higher prefixes, but it seems heavy-handed to keep adding increasingly complex regexes to the $patterns array.
Edit: The comparison, later, should interpret the return value as a real number.
I'm hoping someone can suggest something more elegant.
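One possible shape for the conversion and the limit check is sketched below. It is an illustration, not a definitive implementation: toNumber() and withinLimits() are hypothetical names, and it assumes that only the prefixes p, n, u, m and the units A and V need to be recognised.

```php
<?php
// Sketch: toNumber() and withinLimits() are hypothetical helper names.
// Assumption: only the prefixes p, n, u, m and the units A and V are accepted.
function toNumber(string $value): ?float
{
    $multipliers = ['' => 1.0, 'm' => 1e-3, 'u' => 1e-6, 'n' => 1e-9, 'p' => 1e-12];
    // number (optionally in e-notation), optional whitespace, optional prefix + unit
    $pattern = '/^\s*([-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?)\s*(?:([pnum])?([AV]))?\s*$/';

    if (!preg_match($pattern, $value, $m)) {
        return null; // not a recognisable measurement
    }
    // $m[2] is missing or empty when no prefix was given.
    return (float) $m[1] * $multipliers[$m[2] ?? ''];
}

function withinLimits(string $value, float $lower, float $upper): bool
{
    $number = toNumber($value);
    return $number !== null && $number >= $lower && $number <= $upper;
}

var_dump(withinLimits('39.9mV', 0.03, 0.05));
var_dump(withinLimits('2.34 nA', 0.0, 1e-9));
```

Returning a float (or null for unparseable input) keeps the later comparison against the control limits a plain numeric comparison; supporting more prefixes only means extending the multiplier table and the character class.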
