PHP preg_match_all expression - php

I have virtually no experience with regex, but I'm trying my best.
I have a string like this:
$fString = "Name=Sök,Value=2,Title=Combine me,Options=[Item1=1,Item2=2,Item3=3]";
I want to get an array looking like this:
Array[0] = "Name=Sök"
Array[1] = "Value=2"
Array[2] = "Title=Combine me"
Array[3] = "Options=[Item1=1,Item2=2,Item3=3]"
What I have managed to do so far is:
preg_match_all("/[^,]*[\w\d]*=[^,]*/",$fString,$Data);
But I can't figure out how to handle the last item, "Options".
Array ( [0] => Array ( [0] => Name=Sök [1] => Value=2 [2] => Title=Combine me [3] => Options=[Item1=1 [4] => Item2=2 [5] => Item3=3] ) )
...and why is the result an array inside an array?!?
[EDIT]
I guess I need to explain the whole idea of what I'm trying to do here, I'm not sure I'm on the right track any more.
I have created some classes where I store all the "persistent" variables in an array. I have a function that serializes this array so it can be stored in a database.
I know all about the serialize() function, but I'm doing some filtering so I can't use it as it is, and I also prefer to have it more readable for manual editing. This array can have nested arrays within it, which need to be preserved. When I read it all back from the database, the original array must be created again.
I had it all working with eval(), but ran into trouble with nested arrays because the " or ' characters were breaking the main outer string. So this approach was an attempt to serialize everything without nested strings that needed to be preserved.
So if I can solve the nested data with preg_match_all I'm there, otherwise I need to come up with another solution.
I guess the data needs to be escaped as well, such as the , and [ ]

Here is a function that will do basically what you need:
function explode_me($str) {
$a = array();
$v = "";
$ignore = false;
for ($i = 0; $i < strlen($str); $i++) {
if ($str[$i] == ',' && !$ignore) {
$a[] = $v;
$v = "";
}
else if ($str[$i] == '[' && !$ignore) {
$ignore = true;
$v .= $str[$i];
}
else if ($str[$i] == ']' && $ignore) {
$ignore = false;
$v .= $str[$i];
}
else {
$v .= $str[$i];
}
}
$a[] = $v;
return $a;
}
To test it:
$str = "Name=Sök,Value=2,Title=Combine me,Options=[Item1=1,Item2=2,Item3=3]";
$a = explode_me($str);
print_r($a);
which prints:
Array
(
[0] => Name=Sök
[1] => Value=2
[2] => Title=Combine me
[3] => Options=[Item1=1,Item2=2,Item3=3]
)
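The function above tracks a single on/off flag, so it handles one level of brackets. If the nested arrays mentioned in the question can themselves contain brackets, a depth counter is one way to extend the idea. This is only a sketch; split_top_level is a hypothetical name, and it assumes brackets are balanced and never appear escaped inside values:

```php
<?php
// Hypothetical helper: split on commas that are not inside any [...] pair.
function split_top_level(string $str): array {
    $parts = array();
    $current = '';
    $depth = 0; // number of currently open '[' brackets
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];
        if ($ch === '[') {
            $depth++;
        } elseif ($ch === ']' && $depth > 0) {
            $depth--;
        }
        if ($ch === ',' && $depth === 0) {
            $parts[] = $current; // a top-level comma ends the current item
            $current = '';
        } else {
            $current .= $ch;
        }
    }
    $parts[] = $current; // whatever follows the last top-level comma
    return $parts;
}

print_r(split_top_level("A=1,B=[C=2,D=[E=3,F=4]],G=5"));
```

Because the delimiters are all single-byte ASCII characters, walking the string byte by byte does not break multi-byte values such as "Sök".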

(\w+)=(\[[^\]]+\]|[^,]+)
This breaks down as:
(\w+) # a word (store in match group 1)
= # the "=" character
( # begin match group 2
\[ # a "[" character
[^\]]+ # anything but "]" character
\] # a "]" character
| # or...
[^,]+ # anything but a comma
) # end match group 2
Apply with preg_match_all():
$fString = "Name=Sök,Value=2,Title=Combine me,Options=[Item1=1,Item2=2,Item3=3]";
$matches = array();
preg_match_all("/(\\w+)=(\\[[^\\]]+\\]|[^,]+)/", $fString, $matches);
Which results in something even more detailed than you wanted to have:
Array
(
[0] => Array
(
[0] => Name=Sök
[1] => Value=2
[2] => Title=Combine me
[3] => Options=[Item1=1,Item2=2,Item3=3]
)
[1] => Array
(
[0] => Name
[1] => Value
[2] => Title
[3] => Options
)
[2] => Array
(
[0] => Sök
[1] => 2
[2] => Combine me
[3] => [Item1=1,Item2=2,Item3=3]
)
)
$matches[0] is what you wanted. $matches[1] and $matches[2] are the property names and values separately, which lets you use them right away instead of needing an extra step that splits things like "Options=[Item1=1,Item2=2,Item3=3]" at the correct =.
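On the "array inside an array" question: by default preg_match_all() returns one sub-array per capture group (index 0 being the full matches). The sketch below shows two ways of working with that, using the same pattern as above: array_combine() to turn the name and value groups into one associative array, and the PREG_SET_ORDER flag to get one sub-array per match instead. The variable names $properties and $sets are just illustrative:

```php
<?php
$fString = "Name=Sök,Value=2,Title=Combine me,Options=[Item1=1,Item2=2,Item3=3]";
$pattern = '/(\w+)=(\[[^\]]+\]|[^,]+)/';

// Default order (PREG_PATTERN_ORDER): one sub-array per capture group.
preg_match_all($pattern, $fString, $matches);
$properties = array_combine($matches[1], $matches[2]); // name => value

// PREG_SET_ORDER: one sub-array per match (full match, name, value).
preg_match_all($pattern, $fString, $sets, PREG_SET_ORDER);

print_r($properties);
print_r($sets[3]);
```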

If you could change the separators between the items (where it says Item1=1,Item2=2,Item3=3 to something like Item1=1|Item2=2|Item3=3) you could easily use explode(',',$fString) to convert a string to an array.
I can also offer this piece of code that will change the separators, as I have no experience with regex:
$newstr = str_replace(',Item','|Item',$fString);
$newarray = explode(',',$newstr);
$newarray will look like this:
Array[0] = "Name=Sök"
Array[1] = "Value=2"
Array[2] = "Title=Combine me"
Array[3] = "Options=[Item1=1|Item2=2|Item3=3]"

This is a problem that lends itself more to parsing than to regex extraction. But you can separate out the special case to make it work:
preg_match_all("/(\w+)=( \w[^,]* | \[[^\]]+\] )/x", $str, $m);
$things = array_combine($m[1], $m[2]);
This gives you a PHP array like the following (you can still access $m[0] for the unparsed strings):
[Name] => Sök
[Value] => 2
[Title] => Combine me
[Options] => [Item1=1,Item2=2,Item3=3]
You can reapply the function on Options to explode that too.
The trick, again, is differentiating between a plain value, which starts with a word character (\w), and the bracket-enclosed options (\[...\]). For the latter, [^\]]+ matches everything up to the closing bracket, and that's it.
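Reapplying the same match to the bracketed value could look roughly like this. It is a sketch only: parse_pairs is a hypothetical name, and because [^\]]* stops at the first closing bracket it assumes a single level of brackets:

```php
<?php
// Hypothetical helper: parse "k=v,k=v" and recurse into bracketed values.
function parse_pairs(string $str): array {
    preg_match_all('/(\w+)=(\[[^\]]*\]|[^,]*)/', $str, $m);
    $result = array();
    foreach ($m[1] as $i => $name) {
        $value = $m[2][$i];
        if (strlen($value) >= 2 && $value[0] === '[' && substr($value, -1) === ']') {
            // strip the surrounding brackets and parse the inner list the same way
            $value = parse_pairs(substr($value, 1, -1));
        }
        $result[$name] = $value;
    }
    return $result;
}

print_r(parse_pairs("Name=Sök,Value=2,Title=Combine me,Options=[Item1=1,Item2=2,Item3=3]"));
```

All leaf values come back as strings, so "2" stays "2"; cast them afterwards if you need integers.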

So, here is another approach. It's a mini parser for nested structures. Adapt the regex if you need escape codes.
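function parse(&$s) {
    $r = array(); // key => value pairs found at this nesting level
    $key = "";    // most recently seen key
    while (strlen($s) && preg_match("/^(.*?)([=,\[\]])/", $s, $m)) {
        // consume the token and the delimiter that ended it
        $s = substr($s, 1 + strlen($m[1]));
        switch ($m[2]) {
            case "=":
                $key = $m[1];
                break;
            case ",":
                // skip if the key already holds a nested array
                if (!isset($r[$key])) {
                    $r[$key] = $m[1];
                }
                break;
            case "[":
                $r[$key] = parse($s);
                break;
            case "]":
                // store the last value of this level before returning
                if ($key !== "" && !isset($r[$key])) {
                    $r[$key] = $m[1];
                }
                return $r;
        }
    }
    if (strlen($s)) { $r[$key] = $s; } // remainder
    return $r;
}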

Related

Most elegant way to clean a string into only comma separated numerals

After instructing clients to input only
number comma number comma number
(no set length, but generally < 10), the results of their input have been, erm, unpredictable.
Given the following example input:
3,6 ,bannana,5,,*,
How could I most simply, and reliably end up with:
3,6,5
So far I am trying a combination:
$test= trim($test,","); //Remove any leading or trailing commas
$test= preg_replace('/\s+/', '', $test); //Remove any whitespace
$test= preg_replace("/[^0-9]/", ",", $test); //Replace any non-number with a comma
But before I keep throwing things at it...is there an elegant way, probably from a regex boffin!
In a purely abstract sense this is what I'd do:
$test = array_filter(array_map('trim',explode(",",$test)),'is_numeric');
Example:
http://sandbox.onlinephpfunctions.com/code/753f4a833e8ff07cd9c7bd780708f7aafd20d01d
<?php
$str = '3,6 ,bannana,5,,*,';
$str = explode(',', $str);
$newArray = array_map(function($val){
return is_numeric(trim($val)) ? trim($val) : '';
}, $str);
print_r(array_filter($newArray)); // <-- this will give you array
echo implode(',',array_filter($newArray)); // <--- this give you string
?>
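Two details of the snippets above are worth knowing: is_numeric() also accepts values such as "1.5" or "1e3", and array_filter() without a callback drops "0" because it is falsy. If the input is meant to be whole, non-negative numbers only, a variant using ctype_digit() avoids both. This is a sketch; clean_number_list is a hypothetical name and it assumes the ctype extension is available:

```php
<?php
// Hypothetical helper: keep only tokens made up entirely of digits.
function clean_number_list(string $input): string {
    $tokens = array_map('trim', explode(',', $input));
    // ctype_digit() is true only for non-empty strings consisting of 0-9
    $numbers = array_filter($tokens, 'ctype_digit');
    return implode(',', $numbers);
}

echo clean_number_list('3,6 ,bannana,5,,*,'); // prints 3,6,5
```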
Here's an example using regex,
$string = '3,6 ,bannana,5,-6,*,';
preg_match_all('#(-?[0-9]+)#',$string,$matches);
print_r($matches);
will output
Array
(
[0] => Array
(
[0] => 3
[1] => 6
[2] => 5
[3] => -6
)
[1] => Array
(
[0] => 3
[1] => 6
[2] => 5
[3] => -6
)
)
Use $matches[0] and you should be on your way.
If you don't need negative numbers, just remove the leading -? from the regex.

Parsing PHP strings with quoted values

I'd like to parse a string like the following :
'serviceHits."test_server"."http_test.org" 31987'
into an array like :
[0] => serviceHits
[1] => test_server
[2] => http_test.org
[3] => 31987
Basically I want to split in dots and spaces, treating strings within quotes as a single value.
The format of this string is not fixed, this is just one example. It might contain different numbers of elements with quoted and numerical elements in different places.
Other strings might look like :
test.2 3 which should parse to [test|2|3]
test."342".cake.2 "cheese" which should parse to [test|342|cake|2|cheese]
test."red feet".3."green" 4 which should parse to [test|red feet|3|green|4]
And sometimes the oid string may contain a quote mark, which should be included if possible, but it's the least important part of the parser:
test."a \"b\" c" "cheese face" which should parse to [test|a "b" c|cheese face]
I'm trying to parse SNMP OID strings from agents written by people with quite varying ideas on what an OID should look like, in a generic manner.
Parsing the OID string (the bit separated with dots) and the return value (the last value) into separate named arrays would be nice. Simply splitting on the space before parsing the string wouldn't work, as both the OID and the value can contain spaces.
Thanks!
I agree it can be hard to find a single regexp that resolves this issue.
Here's a complete solution :
$results = array();
$str = 'serviceHits."test_\"server"."http_test.org" 31987';
// Encode \" to something else temporary
$str_encoded_quotes = strtr($str,array('\\"'=>'####'));
// Split by strings between double-quotes
$str_arr = preg_split('/("[^"]*")/',$str_encoded_quotes,-1,PREG_SPLIT_DELIM_CAPTURE);
foreach ($str_arr as $substr) {
// If value is a dot or a space, do nothing
if (!preg_match('/^[\s\.]$/',$substr)) {
// If value is between double-quotes, it's a string
// Return as is
if (preg_match('/^"(.*)"$/',$substr)) {
$substr = preg_replace('/^"(.*)"$/','\1',$substr); // Remove double-quotes around
$results[] = strtr($substr,array('####'=>'"')); // Get escaped double-quotes back inside the string
// Else, it must be split
} else {
// Split by dot or space
$substr_arr = preg_split('/[\.\s]/',$substr,-1,PREG_SPLIT_NO_EMPTY);
foreach ($substr_arr as $subsubstr)
$results[] = strtr($subsubstr,array('####'=>'"')); // Get escaped double-quotes back inside string
}
}
// Else, it's an empty substring
}
var_dump($results);
Tested with all of your new string examples.
First attempt (OLD)
Using preg_split :
$str = 'serviceHits."test_server"."http_test.org" 31987';
// -1 : no limit
// PREG_SPLIT_NO_EMPTY : do not return empty results
preg_split('/[\.\s]?"[\.\s]?/',$str,-1,PREG_SPLIT_NO_EMPTY);
The easiest way is probably to replace dots and spaces inside strings with placeholders, split, then remove the placeholders. Something like this:
$in = 'serviceHits."test_server"."http_test.org" 31987';
$a = preg_replace_callback('!"([^"]*)"!', 'quote', $in);
$b = preg_split('![. ]!', $a);
foreach ($b as $k => $v) $b[$k] = unquote($v);
print_r($b);
# the functions that do the (un)quoting
function quote($m){
return str_replace(array('.',' '),
array('PLACEHOLDER-DOT', 'PLACEHOLDER-SPACE'), $m[1]);
}
function unquote($str){
return str_replace(array('PLACEHOLDER-DOT', 'PLACEHOLDER-SPACE'),
array('.',' '), $str);
}
Here is a solution that works with all of your test samples (plus one of my own) and allows you to escape quotes, dots, and spaces.
Due to the requirement of handling escape codes, a split is not really possible.
Although one can imagine a regex that matches the entire string with '()' to mark the separate elements, I was unable to get it working using preg_match or preg_match_all.
Instead I parsed the string incrementally, pulling off one element at a time. I then use stripslashes to unescape quotes, spaces, and dots.
<?php
$strings = array
(
'serviceHits."test_server"."http_test.org" 31987',
'test.2 3',
'test."342".cake.2 "cheese"',
'test."red feet".3."green" 4',
'test."a \\"b\\" c" "cheese face"',
'test\\.one."test\\"two".test\\ three',
);
foreach ($strings as $string)
{
print"'{$string}' => " . print_r(parse_oid($string), true) . "\n";
}
/**
* parse_oid parses an OID and returns an array of the parsed elements.
* This is an all-or-none function, and will return NULL if it cannot completely
* parse the string.
* @param string $string The OID to parse.
* @return array|NULL A list of OID elements, or null if error parsing.
*/
function parse_oid($string)
{
$result = array();
while (true)
{
$matches = array();
$match_count = preg_match('/^(?:((?:[^\\\\\\. "]|(?:\\\\.))+)|(?:"((?:[^\\\\"]|(?:\\\\.))+)"))((?:[\\. ])|$)/', $string, $matches);
if (null !== $match_count && $match_count > 0)
{
// [1] = unquoted, [2] = quoted
$value = strlen($matches[1]) > 0 ? $matches[1] : $matches[2];
$result[] = stripslashes($value);
// Are we expecting any more parts?
if (strlen($matches[3]) > 0)
{
// I do this (vs keeping track of offset) to use ^ in regex
$string = substr($string, strlen($matches[0]));
}
else
{
return $result;
}
}
else
{
// All or nothing
return null;
}
} // while
}
This generates the following output:
'serviceHits."test_server"."http_test.org" 31987' => Array
(
[0] => serviceHits
[1] => test_server
[2] => http_test.org
[3] => 31987
)
'test.2 3' => Array
(
[0] => test
[1] => 2
[2] => 3
)
'test."342".cake.2 "cheese"' => Array
(
[0] => test
[1] => 342
[2] => cake
[3] => 2
[4] => cheese
)
'test."red feet".3."green" 4' => Array
(
[0] => test
[1] => red feet
[2] => 3
[3] => green
[4] => 4
)
'test."a \"b\" c" "cheese face"' => Array
(
[0] => test
[1] => a "b" c
[2] => cheese face
)
'test\.one."test\"two".test\ three' => Array
(
[0] => test.one
[1] => test"two
[2] => test three
)
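For comparison, here is a rough sketch of the single preg_match_all() variant mentioned above. Instead of matching the whole string, it matches one token at a time: either a quoted token or an unquoted one. tokenize_oid is a hypothetical name. Unlike parse_oid it is not all-or-nothing: characters that fit neither alternative (an unbalanced quote, for instance) are silently skipped, so treat it as a lenient tokenizer rather than a validator:

```php
<?php
// Hypothetical helper: tokenize an OID-like string with one preg_match_all() call.
function tokenize_oid(string $oid): array {
    // Alternative 1: a double-quoted token (group 1), backslash escapes allowed.
    // Alternative 2: an unquoted token (group 2), ending at a dot, space or quote.
    $pattern = '/"((?:[^"\\\\]|\\\\.)*)"|((?:[^\\\\. "]|\\\\.)+)/';
    preg_match_all($pattern, $oid, $matches, PREG_SET_ORDER);
    $tokens = array();
    foreach ($matches as $m) {
        // Group 2 is non-empty only when the unquoted alternative matched.
        $value = (isset($m[2]) && $m[2] !== '') ? $m[2] : $m[1];
        $tokens[] = stripslashes($value); // unescape \" \. and "\ "
    }
    return $tokens;
}

print_r(tokenize_oid('test."red feet".3."green" 4'));
```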

PHP Regex removing unwanted values from pattern

I have a large array of scraped names and prices similar to the following:
Array([0] => apple3 [1] => £0.40 [2] => banana6 [3] => £1.80 [4] => lemon [5] => grape [6] => pear5 [7] => melon4 [8] => £2.32 [9] => kiwi [10] => £0.50)
I would like to remove the fruit names that are not immediately followed by a price. In the above example this would remove: [4] => lemon [5] => grape [6] => pear5 resulting in the following output:
Array([0] => apple3 [1] => £0.40 [2] => banana6 [3] => £1.80 [7] => melon4 [8] => £2.32 [9] => kiwi [10] => £0.50)
If the array needs to be converted to a string in order for me to do this that is not a problem, nor is adding values between the array items in order to aid with regex searches. I have so far been unable to find the correct regular expression to do this using preg_match and preg_replace.
The most important factor is the need to maintain the sequential order of the fruits and prices in order for me at a later stage to convert this into an associative array of fruits and prices.
Thanks in advance.
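Why involve regular expressions? This is doable with a simple foreach loop wherein you iterate over the array and remove every name that is followed by another name:
$prevNameKey = null; // key of the previous item if it was a name, else null
foreach ($array as $k => $v) {
    // prices start with '£'; names such as "apple3" may contain digits
    if (strpos($v, '£') !== 0) {
        // it's a name
        if ($prevNameKey !== null) {
            unset($array[$prevNameKey]); // previous name had no price; remove it
        }
        $prevNameKey = $k;
    }
    else {
        // it's a price
        $prevNameKey = null;
    }
}
if ($prevNameKey !== null) {
    unset($array[$prevNameKey]); // trailing name without a price
}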
The following code does both of your tasks at once: getting rid of the fruit without value and turning the result into an associative array of fruits with prices.
$arr = array('apple', '£0.40', 'banana', '£1.80', 'lemon', 'grape', 'pear', 'melon', '£2.32', 'kiwi', '£0.50' );
preg_match_all( '/#?([^£][^#]+)#(£\d+\.\d{2})#?/', implode( '#', $arr ), $pairs );
$final = array_combine( $pairs[1], $pairs[2] );
print_r( $final );
First, the array is converted to a string, separated by '#'. The regex captures all groups of fruits with prices - each stored as a separate subgroup in the result. Combining them into an associative array is a single function call.
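The same pairing can also be done without a regex, which avoids having to pick a separator that never occurs in the data. The following is a sketch under the assumption that every price starts with '£' and everything else is a name; pair_names_with_prices is a hypothetical name:

```php
<?php
// Hypothetical helper: pair each name with the price that immediately follows it.
function pair_names_with_prices(array $items): array {
    $pairs = array();
    $items = array_values($items); // normalise keys to 0..n-1
    $count = count($items);
    for ($i = 0; $i < $count - 1; $i++) {
        $isName = strpos($items[$i], '£') !== 0;
        $nextIsPrice = strpos($items[$i + 1], '£') === 0;
        if ($isName && $nextIsPrice) {
            $pairs[$items[$i]] = $items[$i + 1];
            $i++; // skip the price that was just consumed
        }
    }
    return $pairs;
}

$arr = array('apple3', '£0.40', 'banana6', '£1.80', 'lemon', 'grape',
             'pear5', 'melon4', '£2.32', 'kiwi', '£0.50');
print_r(pair_names_with_prices($arr));
```

Note that if the same name appears twice with a price, the later price overwrites the earlier one, as with any associative array.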
Something like this might help you
$array = ...;
$index = 0;
while (isset($array[$index + 1])) {
if (!is_fruit($array[$index + 1])) {
// Not followed by a fruit, continue to next pair
$index += 2;
} else {
unset($array[$index]); // Will maintain indices in array
$index += 1;
}
}
Not tested though. Also, you need to create the function is_fruit yourself ;)
Without reformatting it, I don't think you can do it with preg_match or preg_replace-- maybe, but nothing is coming to mind.
What is creating that array? If possible, I would alter it to look more like:
Array([apple] => £0.40 [banana] => £1.80 [lemon] => '' [grape] => '' [pear] => '' [melon] => £2.32 [kiwi] => £0.50)
Then array_filter($array) is all you'd need to clean it up. If you can't alter the way the original array is created I'd lean towards creating key/value array out of the original.
Try replacing the pattern " => ([a-zA-Z])" with " => £0.00 $1".
Basically searching for the context where there is null price and inserting zero pounds.
Hope this helps.
Good luck
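Simply do this:
<?php
$n = count($my_array); // fix the bound up front, since unset() changes count()
for ($i = 0; $i < $n - 1; $i++)
{
    // remove an item when the value after it is empty
    if ($my_array[$i+1] == "")
        unset($my_array[$i]);
}
?>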
Assume $a is your array.
function isPrice($str) {
return (substr($str, 0, 1) == '£');
}
$newA = array();
for($i=0;$i<count($a);$i++) {
if( isPrice($a[$i]) != isPrice($a[$i+1] ?? '') ){ // ?? '' avoids reading past the last element
$newA[] = $a[$i];
}
}

Regex to find sequential integers

I am having a difficult time getting my regular expression code to work properly in PHP. Here is my code:
$array = array(); // Used to satisfy the 3rd argument requirement of preg_match_all.
$regex = '/(012|345|678|987|654|321|123|456|789|876|543|210|234|567|765|432)/';
$subject = '123456';
echo preg_match_all($regex, $subject, $array).'<br />';
print_r($array);
When this code is run it will output:
2
Array
(
[0] => Array
(
[0] => 123
[1] => 456
)
[1] => Array
(
[0] => 123
[1] => 456
)
)
What can I do so that it will match 123, 234, 345 and 456?
Thanks in advance!
Regex is not the right tool for this job (it's not going to return "sub-matches"). Simply use strpos in a loop.
$subject = '123456';
$seqs = array('012', '345', '678', '987', '654', '321', '123', '456', '234');
foreach ($seqs as $seq) {
if (strpos($subject, $seq) !== false) {
// found
}
}
$regex = '/(?=(012|345|678|987|654|321|123|456|789|876|543|210|234|567|765|432))/';
$subject = '123456';
preg_match_all($regex, $subject, $array);
print_r($array[1]);
output:
Array
(
[0] => 123
[1] => 234
[2] => 345
[3] => 456
)
You're trying to retrieve matches that overlap each other in the subject string, which in general is not possible. However, in many cases you can fake it by wrapping the whole regex in a capturing group, then wrapping that in a lookahead. Because the lookahead doesn't consume any characters when it matches, the regex engine manually bumps forward one position after each successful match, to avoid getting stuck in an infinite loop. But capturing groups still work, so you can retrieve the captured text in the usual way.
Notice that I only printed the contents of the first capturing group ($array[1]). If I had printed the whole array of arrays ($array), it would have looked like this:
Array
(
[0] => Array
(
[0] =>
[1] =>
[2] =>
[3] =>
)
[1] => Array
(
[0] => 123
[1] => 234
[2] => 345
[3] => 456
)
)
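The alternation does not have to be typed out by hand. As a sketch of the same lookahead technique, the list of ascending and descending runs can be generated and then wrapped in (?=(...)). find_sequential_runs and its $length parameter are hypothetical names, not part of the original code:

```php
<?php
// Hypothetical helper: find every run of $length consecutive digits
// (ascending or descending), including overlapping runs.
function find_sequential_runs(string $subject, int $length = 3): array {
    $digits = '0123456789';
    $alternatives = array();
    for ($i = 0; $i + $length <= strlen($digits); $i++) {
        $run = substr($digits, $i, $length);
        $alternatives[] = $run;         // ascending, e.g. 123
        $alternatives[] = strrev($run); // descending, e.g. 321
    }
    // The lookahead consumes nothing, so matches may overlap;
    // the capturing group inside it still records the text.
    $regex = '/(?=(' . implode('|', $alternatives) . '))/';
    preg_match_all($regex, $subject, $m);
    return $m[1];
}

print_r(find_sequential_runs('123456')); // 123, 234, 345, 456
```

With the default length of 3 this generates the same 16 alternatives as the hand-written pattern.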
It can be done with regular expressions. The problem with your original code is that as soon as a match occurs, the matched characters are consumed, so the next match can only start after them. Here's one way to do it:
$array = array(); // Used to satisfy the 3rd argument requirement of preg_match.
$regex = '/^(?:012|345|678|987|654|321|123|456|789|876|543|210|234|567|765|432)/'; // anchored: only test the start of the string
$subject = '123456';
$tempSubject = $subject;
$finalAnswer = array();
do {
    if (preg_match($regex, $tempSubject, $array)) {
        $finalAnswer[] = $array[0];
    }
    $tempSubject = substr($tempSubject, 1); // slide the window one character
} while (strlen($tempSubject) >= 3);
print_r($finalAnswer);
As suggested in another answer, however, regular expressions might not be the correct tool to use in this situation, depending on your larger goal. In addition, the above code may not be the most efficient way (wrt memory or wrt performance) to solve this with regular expressions. It's just a straightforward fulfill-the-requirement solution.
Yeah it's a hack but you can use RegEx
<?php
$subject = '123456';
$rs = findmatches($subject);
echo '<pre>'.print_r($rs,true).'</pre><br />';
function findmatches($x) {
$regex = '/(\d{3})/';
// Loop through the subject string
for($counter = 0; $counter <= strlen($x); $counter++) {
$y = substr($x, $counter);
if(preg_match_all($regex, $y, $array)) {
$rs_array[$counter] = array_unique($array);
}
}
// Parse results array
foreach($rs_array as $tmp_arr) {
$rs[] = $tmp_arr[0][0];
}
return $rs;
}
?>
Returns:
Array
(
[0] => 123
[1] => 234
[2] => 345
[3] => 456
)
NOTE: \d{3} matches any three digits, so this only works when the subject is made up of consecutive numbers.

How to get everything from between parentheses in PHP?

Array(
[1] => put returns (between) paragraphs
[2] => (for) linebreak (add) 2 spaces at end
[3] => indent code by 4 (spaces!)
[4] => to make links
)
I want to get the text inside the brackets (for each value):
take only first match
remove this match from the value
write all matches to new array
After the function runs, the arrays should look like:
Array(
[1] => put returns paragraphs
[2] => linebreak (add) 2 spaces at end
[3] => indent code by 4
[4] => to make links
)
Array(
[1] => between
[2] => for
[3] => spaces!
[4] =>
)
What is the solution?
I would use the regular expression /\((\([^()]*\)|[^()]*)\)/ (this will match one or two pairs of parentheses) together with preg_split:
$matches = array();
foreach ($arr as &$value) {
$parts = preg_split('/\((\([^()]*\)|[^()]*)\)/', $value, 2, PREG_SPLIT_DELIM_CAPTURE);
if (count($parts) > 1) {
$matches[] = current(array_splice($parts, 1, 1));
$value = implode('', $parts);
}
}
With the PREG_SPLIT_DELIM_CAPTURE flag set, preg_split includes the captured part of the delimiter in the result array. So if a match was found, there are three parts, and the second member is the one we are looking for. That member is removed with array_splice, which also returns an array of the removed members; current is used on that return value to get the removed member itself. The remaining members are then put back together.
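A variation on the same idea, sketched with preg_match and PREG_OFFSET_CAPTURE rather than preg_split. It also records an empty string when a value has no parentheses and swallows the whitespace in front of the removed group, which is what the expected output in the question shows. extract_first_parens is a hypothetical name, and the pattern assumes a single, non-nested pair of parentheses:

```php
<?php
// Hypothetical helper: pull the first "(...)" group out of each string.
// Returns array($cleanedValues, $extractedTexts), both keyed like the input.
function extract_first_parens(array $values): array {
    $extracted = array();
    foreach ($values as $key => $value) {
        if (preg_match('/\s*\(([^()]*)\)/', $value, $m, PREG_OFFSET_CAPTURE)) {
            $extracted[$key] = $m[1][0]; // text between the parentheses
            // cut the whole match (with its leading whitespace) out of the value
            $cut = substr_replace($value, '', $m[0][1], strlen($m[0][0]));
            $values[$key] = ltrim($cut);
        } else {
            $extracted[$key] = ''; // no parentheses in this value
        }
    }
    return array($values, $extracted);
}

list($cleaned, $found) = extract_first_parens(array(
    1 => 'put returns (between) paragraphs',
    2 => '(for) linebreak (add) 2 spaces at end',
    3 => 'indent code by 4 (spaces!)',
    4 => 'to make links',
));
print_r($cleaned);
print_r($found);
```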
Assuming you meant (between) and not ((between))
$arr = array(
0 => 'put returns (between) paragraphs',
1 => '(for) linebreak (add) 2 spaces at end',
2 => 'indent code by 4 (spaces!)',
3 => 'to make links');
var_dump($arr);
$new_arr = array();
foreach($arr as $key => &$str) {
if(preg_match('/\((.*?)\)/',$str,$m)) { // capture only the text inside the parentheses
$new_arr[] = $m[1];
$str = preg_replace('/\(.*?\)/','',$str,1);
}
else {
$new_arr[] = '';
}
}
var_dump($arr);
var_dump($new_arr);
