Split string into array regex php - php

I need to split the string below into an array with keys, in this format:
string = "(731) some text here with number 2 (220) some 54 number other text here" convert into:
array(
'731' => 'some text here with number 2',
'220' => 'some 54 number other text here'
);
I have tried:
preg_split( '/\([0-9]{3}\)/', $string );
and got:
array (
0 => 'some text here',
1 => 'some other text here'
);

Code
$string = "(731) some text here with number 2 (220) some 54 number other text here";
preg_match_all("/\((\d{3})\) *([^( ]*(?> +[^( ]+)*)/", $string, $matches);
$result = array_combine($matches[1], $matches[2]);
var_dump($result);
Output
array(2) {
[731]=>
string(28) "some text here with number 2"
[220]=>
string(30) "some 54 number other text here"
}
Description
The regex uses
\((\d{3})\) to match 3 digits in parentheses and captures it (group 1)
\ * (an escaped space followed by *; the code above uses a plain space, which is equivalent) to match the spaces between keys and values
([^( ]*(?> +[^( ]+)*) to match everything except a ( and captures it (group 2)
This subpattern matches exactly the same as [^(]*(?<! ) but more efficiently, based on the unrolling-the-loop technique.
*Note that I am assuming a value field cannot contain a (. If that is not the case, do tell and I will modify it accordingly.
After that, we have $matches[1] with keys and $matches[2] with values. Using array_combine() we generate the desired array.
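To check the claim that the unrolled subpattern and the shorter [^(]*(?<! ) form capture the same text, a minimal sketch can run both against the sample string. The variable names $unrolled and $simple are illustrative only, and the comparison covers just this one input, not every possible string.

```php
<?php
$string = "(731) some text here with number 2 (220) some 54 number other text here";

// Unrolled form from the answer above.
preg_match_all('/\((\d{3})\) *([^( ]*(?> +[^( ]+)*)/', $string, $m1);
$unrolled = array_combine($m1[1], $m1[2]);

// Shorter form: take everything up to the next "(", then give back trailing spaces.
preg_match_all('/\((\d{3})\) *([^(]*(?<! ))/', $string, $m2);
$simple = array_combine($m2[1], $m2[2]);

// Both arrays should be identical for this input.
var_dump($unrolled === $simple);
```

Note that PHP converts the numeric string keys to integers, so the values are read back as $simple[731] and $simple[220].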

Try this:
$string = "(731) some text here with number 2 (220) some 54 number other text here";
$a = preg_split('/\s(?=\()/', $string);//split by spaces preceding the left bracket
$res = array();
foreach($a as $v){
$r = preg_split('/(?<=\))\s/', $v);//split by spaces following the right bracket
if(isset($r[0]) && isset($r[1])){
$res[trim($r[0],'() ')] = trim($r[1]);//trim brackets and spaces
}
}
print_r($res);
Output:
Array
(
[731] => some text here with number 2
[220] => some 54 number other text here
)
If you want to limit it only to those numbers in brackets that have 3 digits, just modify the lookarounds:
$a = preg_split('/\s(?=\([0-9]{3}\))/', $string);

You can try this one:
<?php
$str="(731) some text here (220) some other text here";
echo $str .'<br>';
$arr1=explode('(', $str);
$size_arr=count($arr1);
$final_arr=array();
for($i=1;$i<$size_arr; $i++){
$arr2=explode(')', $arr1[$i]);
$final_arr[$arr2[0]]=trim($arr2[1]);
}
echo '<pre>';
print_r($final_arr);
?>
I tried to keep the syntax simple, so I hope it is easy to follow.

I'm pretty sure that defining the keys directly is not possible, as the regex will add matches continuously.
I would define 2 regex,
one for the keys:
preg_match_all("/(\()([0-9]*)(\))\s/", $input_line, $output_array);
you will find your keys in $output_array[2].
And one for the texts (that looks quite the same):
preg_split("/(\()([0-9]*)(\))\s/", $input_line);
After that, you can build your custom array iterating over both.
Make sure to trim the strings in the second array when inserting.
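The two steps above can be combined roughly as follows. This is only a sketch: it simplifies the pattern to a single capture group (so the keys land in index 1 rather than 2), uses [0-9]+ instead of [0-9]*, and assumes the string starts with a key, so the first piece returned by preg_split() is empty and can be dropped.

```php
<?php
$input_line = "(731) some text here with number 2 (220) some 54 number other text here";
$pattern = '/\(([0-9]+)\)\s/';

// Step 1: collect the keys (capture group 1).
preg_match_all($pattern, $input_line, $output_array);
$keys = $output_array[1];

// Step 2: split on the same pattern; the first piece is the (empty) text
// before the first key, so drop it.
$texts = array_slice(preg_split($pattern, $input_line), 1);

// Step 3: pair keys with trimmed texts.
$result = array_combine($keys, array_map('trim', $texts));
print_r($result);
```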

Using preg_replace_callback() you can quickly achieve what you want (assuming the keys are always exactly 3 digits in parentheses):
$string = "(731) some text here with number 2 (220) some 54 number other text here";
$array = array();
preg_replace_callback('~(\((\d{3})\))(.*?)(?=(?1)|\Z)~s', function($match) use (&$array) {
$array[$match[2]] = trim($match[3]);
}, $string);
var_dump($array);
Output:
array(2) {
[731]=>
string(28) "some text here with number 2"
[220]=>
string(30) "some 54 number other text here"
}

Maybe you can add the PREG_SPLIT_DELIM_CAPTURE flag to preg_split. From the preg_split manual page (http://php.net/manual/en/function.preg-split.php):
PREG_SPLIT_DELIM_CAPTURE
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
So if you change your code to:
$results = preg_split('/\(([0-9]+)\)/s', $data, -1, PREG_SPLIT_DELIM_CAPTURE);
You will obtain an array similar to:
Array
(
[0] => KS/M/ 2013/1238
[1] => 220
[2] => 23/12/2013
[3] => 300
[4] =>
[5] => 731
[6] => VALDETE BUZA ADEM JASHARI- PRIZREN, KS
[7] => 526
[8] =>
[9] => 591
[10] =>
[11] => 740
[12] =>
[13] => 540
[14] => DEINA
[15] => 546
[16] =>
[17] => 511
[18] => 3 Preparatet për zbardhim dhe substancat tjera për larje rrobash; preparatet për pastrim, shkëlqim, fërkim dhe gërryerje; sapunët; parfumet, vajrat esencialë, preparatet kozmetike, losionet për flokë, pasta për dhembe
14 Metalet e cmueshme dhe aliazhet e tyre; mallrat në metale të cmueshme ose të veshura me to, që nuk janë përfshire në klasat tjera; xhevahirët, gurët e cmueshëm; instrumentet horologjike dhe kronometrike (për matjen dhe regjistrimin e kohës)
25 Rrobat, këpucët, kapelat
35 Reklamim, menaxhim biznesi; administrim biznesi; funksione zyre
)
What you should then do is loop over the array, ignoring the first element:
$myArray = array();
$myKey = '';
foreach ($results as $k => $v) {
if ( ($k > 0) && ($myKey == '')) {
$myKey = $v;
} else if ($k > 0) {
$myArray[$myKey] = $v;
$myKey = '';
}
}
EDIT: This answer is for:
$data ='KS/M/ 2013/1238 (220) 23/12/2013 (300)
(731) VALDETE BUZA ADEM JASHARI- PRIZREN, KS (526)
(591)
(740)
(540) DEINA (546)
(511) 3 Preparatet për zbardhim dhe substancat tjera për larje rrobash; preparatet për pastrim, shkëlqim, fërkim dhe gërryerje; sapunët; parfumet, vajrat esencialë, preparatet kozmetike, losionet për flokë, pasta për dhembe
14 Metalet e cmueshme dhe aliazhet e tyre; mallrat në metale të cmueshme ose të veshura me to, që nuk janë përfshire në klasat tjera; xhevahirët, gurët e cmueshëm; instrumentet horologjike dhe kronometrike (për matjen dhe regjistrimin e kohës)
25 Rrobat, këpucët, kapelat
35 Reklamim, menaxhim biznesi; administrim biznesi; funksione zyre';
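As an alternative to tracking $myKey by hand, the key/value pairs can be read off in chunks of two. The following is a sketch on a shortened, made-up sample in the same shape as the data above; the trim() call is an addition that the original loop does not make.

```php
<?php
// Shortened, hypothetical sample in the same shape as $data above.
$data = "KS/M/ 2013/1238 (220) 23/12/2013 (300)\n(731) SOME NAME (526)";

// With one capture group, the result alternates: text, key, text, key, ..., text.
$parts = preg_split('/\(([0-9]+)\)/', $data, -1, PREG_SPLIT_DELIM_CAPTURE);
array_shift($parts); // drop the text before the first key

$myArray = [];
foreach (array_chunk($parts, 2) as [$key, $value]) {
    $myArray[$key] = trim($value);
}
print_r($myArray);
```

Because preg_split() always returns the text after the last delimiter (possibly empty), the remaining list has an even length and every chunk holds exactly one key and one value.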

Related

Grouping of regex with same name

I am trying to write a regex to get the ingredient's name, quantity and unit from a string. The string can follow any of these patterns: "pohe 2 kg", "2 Kg pohe" or "2Kg Pohe".
I have tried with below code -
<?PHP
$units = array("tbsp", "ml", "g", "grams", "kg", "few drops"); // add whatever other units are allowed
//mixed pattern
$pattern = '/(?J)(((?<i>^[a-zA-Z\s]+)(?<q>\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . '))|(?<q>^\d*\s*)(?<u>' . join("|", array_map("preg_quote", $units)) . ')(?<i>[a-zA-Z\s]+))/';
$ingredients = '2kg pohe';
preg_match_all($pattern, $ingredients, $m);
print_r($m);
$quantities = $m['q'];
$units = array_map('trim', $m['u']);
$ingrd = array_map('trim', $m['i']);
print_r($quantities);
print_r($units);
print_r($ingrd);
?>
The above code works for the string "2kg pohe", but not for "pohe 2kg".
If anyone has an idea of what I am missing, please help me with this.
For pohe 2kg duplicate named groups are empty, as the documentation of preg_match_all states that for the flag PREG_PATTERN_ORDER (which is the default)
If the pattern contains duplicate named subpatterns, only the
rightmost subpattern is stored in $matches[NAME].
In the pattern that you generate, there is a match in the second part (after the alternation) for 2kg pohe, but for pohe 2kg there is only a match in the first part, so no values are stored for the second part.
What you might do, is use the PREG_SET_ORDER flag instead, which gives:
$ingredients = '2kg pohe';
preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
print_r($m[0]);
Output
Array
(
[0] => 2kg pohe
[i] => pohe
[1] =>
[q] => 2
[2] =>
[u] => kg
[3] =>
[4] => 2
[5] => kg
[6] => pohe
)
And
$ingredients = 'pohe 2kg';
preg_match_all($pattern, $ingredients, $m, PREG_SET_ORDER);
print_r($m[0]);
Output
Array
(
[0] => pohe 2kg
[i] => pohe
[1] => pohe
[q] => 2
[2] => 2
[u] => kg
[3] => kg
)
Then you can get the named subgroups for both strings like $m[0]['i'] etc..
Note that in the example there is 2Kg and you can make the pattern case insensitive to match.
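One way to sidestep duplicate group names altogether is to give each alternative its own names and merge them afterwards. The sketch below is an assumption-laden variant, not the pattern from the question: the function name parseIngredient, the group names i1/q1/u1 and i2/q2/u2, the anchoring with ^ and $, the requirement of at least one digit, and the i flag are all choices made for illustration.

```php
<?php
// Hypothetical helper: returns ingredient, quantity and unit, or null if nothing matches.
function parseIngredient(string $text): ?array
{
    // Longer units first so "grams" is tried before "g".
    $units = ['tbsp', 'ml', 'grams', 'kg', 'g', 'few drops'];
    $u = implode('|', array_map('preg_quote', $units));

    // Alternative 1: name, quantity, unit. Alternative 2: quantity, unit, name.
    $pattern = '/^(?:(?<i1>[a-z\s]+?)\s*(?<q1>\d+)\s*(?<u1>' . $u . ')'
             . '|(?<q2>\d+)\s*(?<u2>' . $u . ')\s+(?<i2>[a-z\s]+))$/i';

    if (!preg_match($pattern, trim($text), $m)) {
        return null;
    }

    // Groups from the alternative that did not match are empty or missing.
    $pick = fn(string $a, string $b) => ($m[$a] ?? '') !== '' ? $m[$a] : ($m[$b] ?? '');

    return [
        'ingredient' => trim($pick('i1', 'i2')),
        'quantity'   => $pick('q1', 'q2'),
        'unit'       => strtolower($pick('u1', 'u2')),
    ];
}

print_r(parseIngredient('pohe 2 kg'));
print_r(parseIngredient('2Kg Pohe'));
```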

Operation on String - Substrings: Get Positions and Count

I have a problem with operations on a string in PHP, I have one string like this:
$words = "Ala ma kota a kot ma ale";
How do I get the number of appearances of al in this long string $words? Additionally, I need the index of the beginning of all appearances of al.
$count = substr_count($words, 'al');
I tried it with the substr_count(), but it only returned the count. I need the index of the appearance as well.
EDIT Adding expected output:
number of al: 2, at index: 0, at index: 22
This is easily accomplished with preg_match_all() using PREG_OFFSET_CAPTURE flag:
$words = "Ala ma kota a kot ma ale. All in Valhalla shall recall the fall.";
preg_match_all('~(al)~i', $words, $matches, PREG_OFFSET_CAPTURE);
print_r($matches[1]);
See a live demo at https://3v4l.org/Frsa5. We have the i modifier for case-insensitive matching. If you want case-sensitive matching, remove it. If you want only als at the start of words, use \bal (\b = word boundary). The result is an array with matches and offsets, as follows:
Array [
[0] => [
[0] => Al
[1] => 0
]
[1] => [
[0] => al
[1] => 21
]
[2] => [
[0] => Al
[1] => 26
]
[3] => [
[0] => al
[1] => 34
]
[4] => [
[0] => al
[1] => 37
]
[5] => [
[0] => al
[1] => 44
]
[6] => [
[0] => al
[1] => 51
]
[7] => [
[0] => al
[1] => 60
]
]
Edit: Since there's nothing else being matched, you don't really need the (al) capture group. You can also just remove the brackets, match ~al~i, and get the results in $matches[0] (containing full pattern matches). I've left it as is with the capture group in place, in case you may want to use more complex matching rules in the future (& being lazy to update the demo).
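To turn the offset-capture result into the format requested in the question, a short sketch along these lines should work. It uses the question's original string and the group-free pattern ~al~i; the variable name $offsets is illustrative. Keep in mind that the offsets are byte offsets, which matters for multibyte text.

```php
<?php
$words = "Ala ma kota a kot ma ale";

// Each entry of $matches[0] is [matched text, byte offset].
preg_match_all('~al~i', $words, $matches, PREG_OFFSET_CAPTURE);

// Keep only the offsets (column 1 of every entry).
$offsets = array_column($matches[0], 1);

echo 'number of al: ' . count($offsets);
foreach ($offsets as $offset) {
    echo ', at index: ' . $offset;
}
```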
You could build a loop that repeatedly calls strpos() on the string and counts the hits at the end. Note that you need to make sure you're not checking the same position over and over; that is what the $position++ is for in this piece of code.
Here, this will output exactly what you asked for in the comment, with the correct position that is. (I didn't see the comment before)
$position = 0;
$words = strtolower("Ala ma kota a kot ma ale");
$needle = 'al';
$positions = [];
do {
$position = strpos($words, $needle, $position);
if ($position !== false) {
array_push($positions, $position);
$position++;
}
} while ($position);
echo 'number of al: '.count($positions);
foreach ($positions as $position) {
echo ', at index: '.$position;
}
Output:
number of al: 2, at index: 0, at index: 21
Edit: As noted by eis, this could be simplified down to
<?php
$words = strtolower("Ala ma kota a kot ma alet ma ale");
$needle = 'al';
$positions = [];
$position = -1; // start at -1 so the first search begins at offset 0
while (($position = strpos($words, $needle, $position + 1)) !== false) {
array_push($positions, $position);
}
echo 'number of al: ' . count($positions);
foreach ($positions as $position) {
echo ', at index: ' . $position;
}
Edit: As Markus AO said, you might want to base your position movement (position++;) on the length of the needle. I think that depends on the way you wish to match.
a) If you want to match all appearances (aba → abababa = 3 ... match may occur within the previous match,) keep it this way.
b) If you want to match all 'full' appearances (aba → abababa = 2,) increment the position by the needle's length.
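The difference between (a) and (b) can be sketched with a small helper. The function name findAll and its $overlapping parameter are made up for this illustration; the only thing that changes between the two modes is how far the search offset advances after a hit.

```php
<?php
// Hypothetical helper: returns all positions of $needle in $haystack.
function findAll(string $haystack, string $needle, bool $overlapping = true): array
{
    $positions = [];
    if ($needle === '') {
        return $positions; // an empty needle has no meaningful positions
    }

    // (a) advance by 1 to allow overlapping hits, (b) by the needle length otherwise.
    $step = $overlapping ? 1 : strlen($needle);
    $offset = 0;

    while (($pos = strpos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + $step;
    }
    return $positions;
}

print_r(findAll('abababa', 'aba', true));  // overlapping hits
print_r(findAll('abababa', 'aba', false)); // "full" hits only
```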

How can I write a regex to pick repeating patterns in php from a file

With this string from a file with similar lines,
03/21/19 11:20 LOC3 UNA:
03/21/19 11:40 LOC2 IN: NEW BD PN VO LVA
03/21/19 11:50 LOC3 OFF:
03/21/19 12:20 LOC2 IN: OLD XD AB VO LVA
I need to capture NEW, BD, PN, VO, LVA from line 1, and OLD, XD, AB, VO, LVA from line 2, and so on, ignoring the other lines.
This only picks the last 'VO' term
IN:\s(([^\s]+)\s+)+.*LVA
You may match the occurrences of non-whitespace chunks of text after a specific text having some text further in the string using
preg_match_all('~(?:\G(?!\A)(?=.*LVA)|IN:)\h+\K\S+~', $s, $matches)
Details
(?:\G(?!\A)(?=.*LVA)|IN:) - either the end of the previous match (that has LVA later in the string after 0+ chars other than line break chars) or IN: substring (basically, it means match consecutive substrings that meet the pattern after IN: but only if there is LVA later)
\h+ - 1+ horizontal whitespaces
\K - match reset operator
\S+ - 1+ non-whitespace chars.
PHP:
$s = "03/21/19 11:20 LOC2 IN: NEW BD PN VO LVA";
if (preg_match_all('~(?:\G(?!\A)(?=.*LVA)|IN:)\h+\K\S+~', $s, $matches)) {
print_r($matches[0]);
}
// => Array ( [0] => NEW [1] => BD [2] => PN [3] => VO [4] => LVA )
To get multiple matches, wrap the pattern in the first non-capturing group with a capturing group and then check the submatches when building the final output. Something like
$s = "03/21/19 11:20 LOC2 IN: NEW BD PN VO LVA
03/21/19 11:20 LOC2 IN: NEW BD PN VO LVA VB";
$res = [];
if (preg_match_all('~(?:\G(?!\A)(?=.*LVA)|(IN:))\h+\K\S+~', $s, $matches, PREG_SET_ORDER, 0)) {
$tmp = [];
foreach ($matches as $r) {
if (count($r) > 1) {
if (count($tmp)>0) {
$res[] = $tmp;
$tmp = [];
}
}
$tmp[] = $r[0];
}
if (count($tmp)>0) {
$res[] = $tmp;
}
}
print_r($res);
// => Array (
// [0] => Array ( [0] => NEW [1] => BD [2] => PN [3] => VO [4] => LVA )
// [1] => Array ( [0] => NEW [1] => BD [2] => PN [3] => VO [4] => LVA )
// )
If the string always has the same pattern then you can use a budget solution by exploding on new line and ": " and get the last value.
$str = "03/21/19 11:20 LOC2 IN: NEW BD PN VO LVA
03/21/19 11:20 LOC2 IN: OLD XD AB VO LVA";
$result = [];
foreach(explode("\n", $str) as $line){
$tmp = explode(": ", $line);
$result[] = end($tmp);
}
var_dump($result);
https://3v4l.org/IbfBo
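If the lines without IN: should be skipped and each value returned as its own token, a per-line sketch such as the following may be easier to read than a single \G-based pattern. It is an assumption-based variant: it keeps everything between IN: and the last LVA on the line, and the variable names are illustrative.

```php
<?php
$text = implode("\n", [
    '03/21/19 11:20 LOC3 UNA:',
    '03/21/19 11:40 LOC2 IN: NEW BD PN VO LVA',
    '03/21/19 11:50 LOC3 OFF:',
    '03/21/19 12:20 LOC2 IN: OLD XD AB VO LVA',
]);

$result = [];
foreach (preg_split('/\R/', $text) as $line) {
    // Only lines with "IN:" followed later by "LVA" qualify.
    if (preg_match('~\bIN:\h+(.*\bLVA)\b~', $line, $m)) {
        // Split the captured part on horizontal whitespace.
        $result[] = preg_split('/\h+/', $m[1]);
    }
}
print_r($result);
```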

PHP preg_split separates a number containing a comma into two different numbers

$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
I need to get this array:
Array ( [0] => Bid [1] => 12/20/2018 08:10 AM (PST) [2] => $8,000 [3] => 14 [4] => 0 [5] => [6] => 120270 [7] => $10,75 [8] => false )
I agree with Andreas about using preg_match_all(), but not with his pattern.
For stability, I recommend consuming the entire string from the beginning.
1. Match the label and its trailing colon. [^:]+:
2. Match zero or more whitespace characters. \s*
3. Forget what you matched so far. \K
4. Lazily match zero or more characters (giving back when possible, to make a minimal match). .*?
5. "Look ahead" and demand that the matched characters from #4 are followed (after optional whitespace) by a comma, then one or more characters that are neither a comma nor a colon (the next label), then a colon ,[^,:]+: OR by the end of the string $.
Code:
$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
var_export(
preg_match_all(
'/[^:]+:\s*\K.*?(?=\s*(?:$|,[^,:]+:))/',
$line,
$out
)
? $out[0] // isolate fullstring matches
: [] // no matches
);
Output:
array (
0 => 'Bid',
1 => '12/20/2018 08:10 AM (PST)',
2 => '$8,000',
3 => '14',
4 => '0',
5 => '',
6 => '120270',
7 => '$10,75',
8 => 'false',
)
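If the labels are wanted as array keys as well, the same lookahead idea can be extended to capture both parts. This goes slightly beyond what was asked and is only a sketch: the pattern below captures the label in group 1 and the value in group 2 instead of using \K, and the variable name $pairs is made up.

```php
<?php
// Single quotes keep "$8" and "$10" literal without relying on interpolation rules.
$line = 'Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false';

// Group 1: label (no comma or colon). Group 2: value, as short as possible,
// ending where the next ", label:" or the end of the string follows.
preg_match_all(
    '/(?:^|,)\s*([^,:]+):\s*(.*?)(?=\s*(?:$|,[^,:]+:))/',
    $line,
    $m
);

$pairs = array_combine($m[1], $m[2]);
print_r($pairs);
```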
New answer according to new request:
I use the same regex for splitting the string, and afterwards I remove everything up to and including the colon:
$line = "Type:Bid, End Time: 12/20/2018 08:10 AM (PST), Price: $8,000,Bids: 14, Age: 0, Description: , Views: 120270, Valuation: $10,75, IsTrue: false";
$parts = preg_split("/(?<!\d),|,(?!\d)/", $line);
$result = array();
foreach($parts as $elem) {
$result[] = preg_replace('/^[^:]+:\h*/', '', $elem);
}
print_r ($result);
Output:
Array
(
[0] => Bid
[1] => 12/20/2018 08:10 AM (PST)
[2] => $8,000
[3] => 14
[4] => 0
[5] =>
[6] => 120270
[7] => $10,75
[8] => false
)
I'd use preg_match_all() instead.
Here the pattern looks for digit(s) comma digit(s) or just digit(s) or a word and a comma.
I append a comma to the string to make the regex simpler.
$line = "TRUE,59,m,10,500";
preg_match_all("/(\d+,\d+|\d+|\w+),/", $line . ",", $match);
var_dump($match);
https://3v4l.org/HQMgu
Even with a different order of the items this code will still produce a correct output: https://3v4l.org/SRJOf
A much better idea:
$parts=explode(',',$line,4); //explode has a limit you can use in this case 4
Same result, less code.
I would keep it simple and do this
$line = "TRUE,59,m,10,500";
$parts = preg_split("/,/", $line);
//print_r ($parts);
$parts[3]=$parts[3].','.$parts[4]; //create a new part 3 from 3 and 4
//$parts[3].=','.$parts[4]; //alternative syntax to the above
unset($parts[4]);//remove old part 4
print_r ($parts);
I would also just use explode(), rather than a regular expression.

How to manipulate complex strings in php?

I am trying to group a bunch of text fragments from a string and create an array from them.
The string is something like this:
<em>string</em> and the <em>test</em> here.
tableBegin rowNumber:2, columnNumber:2 11 22 33 44 tableEnd
<em>end</em> text here
I was hoping to get an array like the following results
array (0 => '<em>string</em> and the <em>test</em> here.',
1=>'rowNumber:5',
2=>'columnNumber:3',
3=>'11',
4=>'22',
5=>'33',
6=>'44'
7=>'<em>end</em> text here')
11,22,33,44 are the table cell data the user enters. I want to make them have unique index but keep the rest of texts together.
tableBegin and tableEnd are just the check for the table cell data
Any help or tips? Thanks a lot!
You may try the following, note that you need PHP 5.3+:
$string = '<em>string</em> and the <em>test</em> here.
tableBegin rowNumber:2, columnNumber:2 11 22 33 44 tableEnd
SOme other text
tableBegin rowNumber:3, columnNumber:3 11 22 33 44 55 tableEnd
<em>end</em> text here';
$array = array();
preg_replace_callback('#tableBegin\s*(.*?)\s*tableEnd\s*|.*?(?=tableBegin|$)#s', function($m)use(&$array){
if(isset($m[1])){ // If group 1 exists, which means if the table is matched
$array = array_merge($array, preg_split('#[\s,]+#s', $m[1])); // add the split string to the array
// split by one or more whitespace or comma --^
}else{// Else just add everything that's matched
if(!empty($m[0])){
$array[] = $m[0];
}
}
}, $string);
print_r($array);
Output
Array
(
[0] => string and the test here.
[1] => rowNumber:2
[2] => columnNumber:2
[3] => 11
[4] => 22
[5] => 33
[6] => 44
[7] => SOme other text
[8] => rowNumber:3
[9] => columnNumber:3
[10] => 11
[11] => 22
[12] => 33
[13] => 44
[14] => 55
[15] => end text here
)
Regex explanation
tableBegin : match tableBegin
\s* : match a whitespace zero or more times
(.*?) : match everything ungreedy and put it in group 1
\s* : match a whitespace zero or more times
tableEnd : match tableEnd
\s* : match a whitespace zero or more times
| : or
.*?(?=tableBegin|$) : match everything, ungreedy, until tableBegin or the end of the string
The s modifier : make dots also match newlines
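The same tableBegin ... tableEnd pattern can also drive preg_split() with PREG_SPLIT_DELIM_CAPTURE, which avoids the callback. This is a sketch under the assumption that tables are never nested; the leading \s* in the pattern and the variable name $pieces are additions made for illustration.

```php
<?php
$string = "<em>string</em> and the <em>test</em> here.\n"
        . "tableBegin rowNumber:2, columnNumber:2 11 22 33 44 tableEnd\n"
        . "<em>end</em> text here";

// Result alternates: plain text (even index), table body (odd index), plain text, ...
$pieces = preg_split(
    '#\s*tableBegin\s*(.*?)\s*tableEnd\s*#s',
    $string,
    -1,
    PREG_SPLIT_DELIM_CAPTURE
);

$array = [];
foreach ($pieces as $i => $piece) {
    if ($i % 2 === 1) {
        // Table body: split on runs of whitespace or commas.
        $array = array_merge($array, preg_split('#[\s,]+#', $piece, -1, PREG_SPLIT_NO_EMPTY));
    } elseif ($piece !== '') {
        // Surrounding text: keep as one element.
        $array[] = $piece;
    }
}
print_r($array);
```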
Here is the ugly way to do it, if you can't find a regex guru out there.
So, this is your text
$string = "<em>string</em> and the <em>test</em> here.
tableBegin rowNumber:2, columnNumber:2 11 22 33 44 tableEnd
<em>end</em> text here";
And this is my code
$E = explode(' ', $string);
$A = $E[0].$E[1].$E[2].$E[3].$E[4].$E[5];
$B = $E[17].$E[18].$E[19];
$All = [$A, $E[8],$E[9], $E[11], $E[12], $E[13], $E[14], $B];
print_r($All);
And this is the output
Array
(
[0] => stringandthetesthere.
[1] => rowNumber:2,
[2] => columnNumber:2
[3] => 11
[4] => 22
[5] => 33
[6] => 44
[7] => endtexthere
)
Of course, the <em> tags won't be visible unless you view the page source.
