Tricky PHP string matching

I have a string that looks like this:
[2005]
one
two
three
[2004]
six
What would be the smoothest way to get an array from it that looks like this:
array(
['2005'] => "one \n two \n three",
['2004'] => "six",
)
... or maybe even get each value split into an array of lines?
I tried doing it with preg_split, which worked but didn't give associative array keys, so I didn't have the year numbers as keys.
Is there any cool way of doing this without iterating through all the lines?

/(\[[0-9]{4}\])([^\[]*)/ will give you the date and whatever is after until the next one.
Use the groups to create your array: With preg_match_all() you get a $matches array where $matches[1] is the date and $matches[2] is the data following it.

Using Sylverdrag's regex as a guide:
<?php
$test = "[2005]
one
two
three
[2004]
six";
$r = "/(\[[0-9]{4}\])([^\[]*)/";
preg_match_all($r, $test, $m);
$output = array();
foreach ($m[1] as $key => $name)
{
$name = str_replace(array('[',']'), array('',''), $name);
$output[ $name ] = $m[2][$key];
}
print_r($output);
?>
Output (PHP 5.2.12):
Array
(
[2005] =>
one
two
three
[2004] =>
six
)

That's slightly more complex:
preg_match_all('/\[(\d+)\]\n((?:(?!\[).+\n?)+)/', $ini, $matches, PREG_SET_ORDER);
(This could be simplified if the real format constraints were known.)
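As a rough sketch, the PREG_SET_ORDER result from that pattern could be turned into the year-keyed array the question asks for, with each value split into lines. The names $ini and $byYear are illustrative, and the input is assumed to use \n line endings:

```php
<?php
// Sample input matching the question's format (assumed "\n" line endings).
$ini = "[2005]\none\ntwo\nthree\n[2004]\nsix";

// Each match set holds the year in [1] and the text block below it in [2].
preg_match_all('/\[(\d+)\]\n((?:(?!\[).+\n?)+)/', $ini, $matches, PREG_SET_ORDER);

$byYear = array();
foreach ($matches as $set) {
    // Trim the trailing newline, then split the block into separate lines.
    $byYear[$set[1]] = explode("\n", trim($set[2]));
}

print_r($byYear);
```

This still loops over the matches, but only once per year rather than once per line. To keep each value as a single string, drop the explode() call and store trim($set[2]) directly.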

Related

PHP explode by comma, ignoring thousands separator

I am scraping the following kind of strings from an external resource which I can't change:
["one item",0,0,2,0,1,"800.12"],
["another item",1,3,2,5,1,"1,713.59"],
(etc...)
I use the following code to explode the elements into an array.
<?php
$id = 0;
foreach($lines AS $line) {
$id = 0;
// remove brackets and line-end commas
$found_data[] = str_replace(array('],', '[',']', '"'), array('','','',''), $line);
// add data to array
$results[$id] = explode(',', $line);
}
This works fine for the first line, but the second line uses a comma as the thousands separator in its last item, so it fails there. Somehow I need explode to ignore commas between " characters.
If all values were surrounded by " characters, I could just use something like
explode('","', $line);
Unfortunately that's not the case here: some values are surrounded by " and some aren't (and not always the same ones). So I'm a bit lost as to how I should proceed. Can anyone point me in the right direction?
You can use json_decode here, since each line of your input appears to be a valid JSON array.
$str = '["another item",1,3,2,5,1,"1,713.59"]';
$arr = json_decode($str);
You can then access individual indices from resulting array or print the whole array using:
print_r($arr);
Output:
Array
(
[0] => another item
[1] => 1
[2] => 3
[3] => 2
[4] => 5
[5] => 1
[6] => 1,713.59
)
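The scraped lines in the question end with a trailing comma, so each line would presumably need that comma stripped before decoding. A minimal sketch, assuming $lines holds one bracketed row per element (the sample data is copied from the question; $results is an illustrative name):

```php
<?php
// Sample rows in the scraped format, including the trailing commas.
$lines = array(
    '["one item",0,0,2,0,1,"800.12"],',
    '["another item",1,3,2,5,1,"1,713.59"],',
);

$results = array();
foreach ($lines as $line) {
    // Strip surrounding whitespace and the trailing comma so the row is valid JSON.
    $row = json_decode(rtrim(trim($line), ','), true);
    if (is_array($row)) {
        // Rows that fail to decode are skipped.
        $results[] = $row;
    }
}

print_r($results);
```

Quoted values such as "1,713.59" come back as strings with the comma intact, so the thousands separator still has to be removed before any numeric conversion.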

Error parsing regex pattern in php

I want to split a string such as the following (by a divider like '~##' (and only that)):
to=enquiry#test.com~##subject=test~##text=this is body/text~##date=date
into an array containing e.g.:
to => enquiry#test.com
subject => test
text => this is body/text
date => date
I'm using php5 and I've got the following regex, which almost works, but there are a couple of errors and there must be a way to do it in one go:
//Split the string in the url of $text at every ~##
$regexp = "/(?:|(?<=~##))(.*?=.*?)(?:~##|$|\/(?!.*~##))/";
preg_match_all($regexp, $text, $a);
//$a[1] is an array containing var1=content1 var2=content2 etc;
//Now create an array in the form [var1] = content, [var2] = content2
foreach($a[1] as $key => $value) {
//Get the two groups either side of the equals sign
$regexp = "/([^\/~##,= ]+)=([^~##,= ]+)/";
preg_match_all($regexp, $value, $r);
//Assign to array key = value
$val[$r[1][0]] = $r[2][0]; //e.g. $val['subject'] = 'hi'
}
print_r($val);
My queries are that:
It doesn't seem to capture more than 3 different sets of parameters
It is breaking on the # symbol and so not capturing email addresses e.g. returning:
to => enquiry
subject => test
text => this is body/text
I am doing multiple different regex searches where I suspect I would be able to do one.
Any help would be really appreciated.
Thanks
Why use a regex when there is a much simpler way to do this with explode, like this:
$str = 'to=enquiry#test.com~##subject=test~##text=this is body/text~##date=date';
$array = explode('~##',$str);
$finalArr = array();
foreach($array as $val)
{
$tmp = explode('=', $val, 2); // limit of 2 keeps any '=' inside the value
$finalArr[$tmp[0]] = $tmp[1];
}
echo '<pre>';
print_r($finalArr);
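If a single pass is preferred, something along these lines might work. It assumes keys never contain = or ~ and values never contain the ~## divider. The pattern and the name $pairs are illustrative, not taken from the answer above:

```php
<?php
$str = 'to=enquiry#test.com~##subject=test~##text=this is body/text~##date=date';

// Group 1 is the key; group 2 is everything up to the next "~##" or the end.
preg_match_all('/([^=~]+)=(.*?)(?:~##|$)/', $str, $m);

// Pair up the captured keys with their values.
$pairs = array_combine($m[1], $m[2]);

print_r($pairs);
```

Because the value group is only terminated by the divider or the end of the string, characters such as # and / inside a value are kept.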

To look for a simple way to extract matched parts of strings from an array

I want to extract the matched part (the digits) of each string in an array:
array("HK00003.Day","HK00005.Day").
<?php
$arr=array("HK00003.Day","HK00005.Day");
$result= array();
foreach ($arr as $item){
preg_match('/[0-9]+/',$item,$match);
array_push($result,$match[0]);
}
This gives the result 00003 and 00005, but it seems tedious. preg_grep looks simpler, but its result is not what I want:
preg_grep('/[0-9]+/',$arr);
The output is "HK00003.Day","HK00005.Day", not 00003 and 00005.
Is there a simpler way to get the job done?
You can use preg_filter (which works like preg_replace and does not require additional callback functions) to replace each entry in the array with the number inside it:
<?php
$arr = array("HK00003.Day","HK00005.Day");
$matches = preg_filter('/^.*?([0-9]+).*/', '$1',$arr);
print_r($matches);
?>
Output of a sample program:
Array
(
[0] => 00003
[1] => 00005
)
This should work for you:
(Here I just get rid of every character in your array that isn't a digit, using preg_replace())
<?php
$arr = ["HK00003.Day", "HK00005.Day"];
$result = preg_replace("/[^0-9]/", "", $arr);
print_r($result);
?>
output:
Array ( [0] => 00003 [1] => 00005 )
Your code is fine, not tedious at all. If you want a one-liner you can try something like this (remove everything that's not a digit):
array_push($result, preg_replace("~[^0-9]~", "", $item));
preg_grep returns the array entries that match the pattern, so you get whole entries back rather than the matched substrings.
Try the following instead:
preg_match_all('/[0-9]+/',implode('-',$arr),$result);
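With this approach the matched strings end up under index 0 of the result array, not at its top level. A short sketch using the sample data from the question ($numbers is an illustrative name):

```php
<?php
$arr = array("HK00003.Day", "HK00005.Day");

// Join the entries with a non-digit separator, then collect every digit run.
preg_match_all('/[0-9]+/', implode('-', $arr), $result);

// Full-pattern matches are stored in $result[0].
$numbers = $result[0];

print_r($numbers);
```

Note that an entry containing several separate digit runs would contribute several values here, so this only fits data with one number per entry.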

A bit lost with preg_match regular expression

I'm a beginner with regular expressions, so it didn't take long for me to get totally lost :]
What I need to do:
I've got a string of values 'a:b,a2:b2,a3:b3,a4:b4' in which I need to find a specific pair (e.g. a2:b2) by its second value (b2) and get the first value of that pair (a2) as output.
All characters are allowed (except ',', which separates the pairs) and each second value (b, b2, b3, b4) is unique (it can't appear more than once in the string).
Let me show a better example as the previous may not be clear:
This is a string: 2 minutes:2,5 minutes:5,10 minutes:10,15 minutes:15,never:0
Searched pattern is: 5
I thought the best way was to use preg_match with its named subpattern feature.
So I tried the following:
$str = '2 minutes:2,5 minutes:5,10 minutes:10,15 minutes:15,20 minutes:20,30 minutes:30, never:0';
$re = '/(?P<name>\w+):5$/';
preg_match($re, $str, $matches);
echo $matches['name'];
Wanted output was '5 minutes' but it didn't work.
I would also like to stick with Perl-Compatible reg. expressions as the code above is included in a PHP script.
Can anyone help me out? I'm getting a little desperate, as I've spent most of the day on this ...
Thanks to all of you guys.
$str = '2 minutes:2,51 seconds:51,5 minutes:5,10 minutes:10,15 minutes:51,never:0';
$search = 5;
preg_match("~([^,\:]+?)\:".preg_quote($search)."(?:,|$)~", $str, $m);
echo '<pre>'; print_r($m); echo '</pre>';
Output:
Array
(
[0] => 5 minutes:5
[1] => 5 minutes
)
$re = '/(?:^|,)(?P<name>[^:]*):5(?:,|$)/';
Your expression requires $ right after 5, so it only works if 5 is the last element. Beyond that, you want to make sure that the 5 is followed by either the end of the string or another pair, that the first element of the pair is preceded by either the start of the string or a comma, and that the first element can contain more than just \w characters.
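For illustration, that pattern could be applied with a variable search value roughly as follows. The use of preg_quote and the variable name $search are additions for this sketch, not part of the expression above:

```php
<?php
$str = '2 minutes:2,5 minutes:5,10 minutes:10,15 minutes:15,never:0';
$search = '5';

// Escape the search value so regex metacharacters in it are treated literally.
$re = '/(?:^|,)(?P<name>[^:]*):' . preg_quote($search, '/') . '(?:,|$)/';

if (preg_match($re, $str, $matches)) {
    // The named group holds the first value of the matching pair.
    echo $matches['name'];
}
```

Anchoring on both sides is what stops a search for 5 from matching the pair ending in :15.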
A preg_match call will be shorter for certain, but I think I wouldn't bother with regular expressions, and instead just use string and array manipulations.
$pairstring = '2 minutes:2,5 minutes:5,10 minutes:10,15 minutes:15,20 minutes:20,30 minutes:30, never:0';
function match_pair($searchval, $pairstring) {
$pairs = explode(",", $pairstring); // use the function parameter, not an undefined $str
foreach ($pairs as $pair) {
$each = explode(":", $pair);
if ($each[1] == $searchval) {
echo $each[0];
}
}
}
// Call as:
match_pair(5, $pairstring);
Almost the same as @Michael's. It doesn't search for an element but builds an array from the string. You say that the values are unique, so they are used as keys in my array:
$str = '2 minutes:2,5 minutes:5,10 minutes:10,15 minutes:15,20 minutes:20,30 minutes:30, never:0';
$a = array();
foreach(explode(',', $str) as $elem){
list($key, $val) = explode(':', $elem);
$a[$val] = $key;
}
Then accessing an element is very simple:
echo $a[5];

How to return only named groups with preg_match or preg_match_all?

Example:
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d\d\d\d-\d\d-\d\d)|i', $string, $arr_result);
print_r($arr_result);
Returns:
Array
(
[0] => 2010-07-18
[date] => 2010-07-18
[1] => 2010-07-18
)
But I want it to be:
Array
(
[date] => 2010-07-18
)
In PHP's PDO object there is an option that is filtering results from database by removing these duplicate numbered values : PDO::FETCH_ASSOC. But I haven't seen similar modifier for the PCRE functions in PHP yet.
How to return only named groups with preg_match or preg_match_all?
This is currently (PHP7) not possible.
You will always get a mixed type array, containing numeric and named keys.
Let's quote the PHP manual (http://php.net/manual/en/regexp.reference.subpatterns.php):
This subpattern will then be indexed in the matches array by its
normal numeric position and also by name.
To solve the problem the following code snippets might help:
1. filter the array by using an is_string check on the array key (for PHP5.6+)
$array_filtered = array_filter($array, "is_string", ARRAY_FILTER_USE_KEY);
2. foreach over the elements and unset if array key is_int() (all PHP versions)
/**
* @param array $array
* @return array
*/
function dropNumericKeys(array $array)
{
foreach ($array as $key => $value) {
if (is_int($key)) {
unset($array[$key]);
}
}
return $array;
}
It's a simple PHP function named dropNumericKeys(), meant for post-processing a matches array after a preg_match*() call that uses named groups. The function accepts an $array, iterates over it, and unsets every key of integer type, leaving string keys untouched. Finally, it returns the array, which now has only named keys.
Note: The function is for backward compatibility with older PHP; it works on all versions. The array_filter solution relies on the constant ARRAY_FILTER_USE_KEY, which is only available in PHP 5.6+. See http://php.net/manual/de/array.constants.php#constant.array-filter-use-key
preg_match does not (yet) have any flag or option that makes it return only named matches, so what you want is not directly possible. However, you can remove all items with non-fitting keys from your matches array, and then you get what you're looking for:
$matches = array_intersect_key($matches, array_flip(array('name', 'likes')));
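Applied to the date example from the question, this might look like the sketch below. The group list array('date') and the name $named are assumptions made for illustration:

```php
<?php
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d{4}-\d{2}-\d{2})|', $string, $matches);

// Keep only the entries whose keys are in the list of wanted group names.
$named = array_intersect_key($matches, array_flip(array('date')));

print_r($named);
```

The drawback is that the group names have to be listed a second time outside the pattern, so the two can drift apart if the pattern changes.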
I do not think you can make preg_* do it, but you can do it with a simple loop. But I don't see why those elements pose a problem.
It is also possible to unset all numeric indexes before returning:
foreach (range(0, floor(count($arr_result) / 2)) as $index) {
unset($arr_result[$index]);
}
Similar to the answer that hakre posted above, I use this snippet to get just the named parameters:
$subject = "This is some text written on 2010-07-18.";
$pattern = '|(?<date>\d\d\d\d-\d\d-\d\d)|i';
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
echo '<pre>Before Diff: ', print_r($matches, 1), '</pre>';
$matches = array_diff_key($matches[0], range(0, count($matches[0])));
echo '<pre>After Diff: ', print_r($matches, 1), '</pre>';
...which produces this:
Before Diff: Array
(
[0] => Array
(
[0] => 2010-07-18
[date] => 2010-07-18
[1] => 2010-07-18
)
)
After Diff: Array
(
[date] => 2010-07-18
)
I read in your post that the concern is possible memory overhead later on.
In that case, why not solve it with unset():
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d{4}-\d{2}-\d{2})|i', $string, $arr_result);
$date = array("date" => $arr_result['date']);
unset($arr_result, $string); // delete the original preg_match array and string
print_r($date);
//or create a new:
// $arr_result = $date;
//print_r($arr_result);
You could use T-Regx and go with group() or namedGroups() which only returns named capturing groups.
<?php
$subject = "This is some text written on 2010-07-18.";
pattern('(?<date>\d\d\d\d-\d\d-\d\d)', 'i')->match($subject)->first(function ($match) {
$date = $match->get('date');
// 2010-07-18
$groups = $match->namedGroups();
// [
// 'date' => '2010-07-18'
// ]
});
I combined some of the snippets posted here; this is the final code, which works on PHP 5.6+:
$re = '/\d+\r\n(?<start>[\d\0:]+),\d+\s--\>\s(?<end>[\d\0:]+),.*\r\nHOME.*\r\nGPS\((?<x>[\d\.]+),(?<y>[\d\.]+),(?<d>[\d\.]+)\)\sBAROMETER\:(?<h>[\d\.]+)/';
$str= file_get_contents($srtFile);
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
echo '<pre>';
$filtered=array_map(function ($d){
return $array_filtered = array_filter($d, "is_string", ARRAY_FILTER_USE_KEY);
},$matches);
var_dump($filtered);
If you are interested in what it does: it reads position data from an SRT file that DJI drones generate while recording video.
Try this:
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d\d\d\d-\d\d-\d\d)|i',$string,$arr_result);
echo $arr_result['date'];
