PHP: split string based on array - php

Below is the data I'm trying to parse:
50‐59 1High300.00 Avg300.00
90‐99 11High222.00 Avg188.73
120‐1293High204.00 Avg169.33
The first section is a weight range, next is a count, followed by Highprice, ending with Avgprice.
As an example, I need to parse the data above into an array which would look like
[0]50-59
[1]1
[2]High300.00
[3]Avg300.00
[0]90-99
[1]11
[2]High222.00
[3]Avg188.73
[0]120‐129
[1]3
[2]High204.00
[3]Avg169.33
I thought about creating an array of what the possible weight ranges can be but I can't figure out how to use the values of the array to split the string.
$arr = array("10-19","20-29","30-39","40-49","50-59","60-69","70-79","80-89","90-99","100-109","110-119","120-129","130-139","140-149","150-159","160-169","170-179","180-189","190-199","200-209","210-219","220-229","230-239","240-249","250-259","260-269","270-279","280-289","290-299","300-309");
Any ideas would be greatly appreciated.
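The idea of driving the split from the array of ranges could be sketched roughly as follows. This is only an illustration: it assumes the dash in the input is a plain ASCII hyphen and that every row starts with one of the listed ranges, and the helper name splitRow is made up for the example.

```php
<?php
// Hypothetical helper: splits one row using the known weight ranges.
function splitRow(string $row, array $ranges): ?array
{
    // Turn the ranges into a regex alternation, escaping each value.
    $alternation = implode('|', array_map(fn($r) => preg_quote($r, '/'), $ranges));
    $pattern = '/^(' . $alternation . ')\s*(\d+)(High[\d.]+)\s*(Avg[\d.]+)/';

    // Return the four captured pieces, or null when the row does not fit.
    return preg_match($pattern, $row, $m) ? array_slice($m, 1) : null;
}

$ranges = ['50-59', '90-99', '120-129']; // shortened list for the example
print_r(splitRow('120-1293High204.00 Avg169.33', $ranges));
```

Because the range is matched against known values, the digits that follow it can only be the count, even when the space is missing.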

Hope this will work:
$string='50-59 1High300.00 Avg300.00
90-99 11High222.00 Avg188.73
120-129 3High204.00 Avg169.33';
$requiredData=array();
$dataArray=explode("\n",$string);
$counter=0;
foreach($dataArray as $data)
{
if(preg_match('#^([\d]+\-[\d]+) ([\d]+)([a-zA-Z]+[\d\.]+) ([a-zA-Z]+[\d\.]+)#', $data,$matches))
{
$requiredData[$counter][]=$matches[1];
$requiredData[$counter][]=$matches[2];
$requiredData[$counter][]=$matches[3];
$requiredData[$counter][]=$matches[4];
$counter++;
}
}
print_r($requiredData);

'#^([\d]+\-[\d]+) ([\d]+)([a-zA-Z]+[\d\.]+) ([a-zA-Z]+[\d\.]+)#'
I don't think that will work because of the space you have in the regex
between the weight and count. The thing I'm struggling with is a row
like this where there is no space. 120‐1293High204.00 Avg169.33 that
needs to be parsed like [0]120‐129 [1]3 [2]High204.00 [3]Avg169.33
You are right. That can be remedied by limiting the number of weight digits to three and making the space optional.
'#^(\d+-\d{1,3}) *…

$arr = array('50-59 1High300.00 Avg300.00',
'90-99 11High222.00 Avg188.73',
'120-129 3High204.00 Avg169.33');
foreach($arr as $str) {
if (preg_match('/^(\d+-\d{1,3})\s*(\d+)(High\d+\.\d\d) (Avg\d+\.\d\d)/i', $str, $m)) {
array_shift($m); //remove group 0 (ie. the whole match)
$result[] = $m;
}
}
print_r($result);
Output:
Array
(
[0] => Array
(
[0] => 50-59
[1] => 1
[2] => High300.00
[3] => Avg300.00
)
[1] => Array
(
[0] => 90-99
[1] => 11
[2] => High222.00
[3] => Avg188.73
)
[2] => Array
(
[0] => 120-129
[1] => 3
[2] => High204.00
[3] => Avg169.33
)
)
Explanation:
/ : regex delimiter
^ : beginning of string
( : start group 1
\d+-\d{1,3} : 1 or more digits, a dash and 1 up to 3 digits, i.e. weight range
) : end group 1
\s* : 0 or more whitespace characters
(\d+) : group 2 ie. count
(High\d+\.\d\d) : group 3 literal High followed by price
(Avg\d+\.\d\d) : Group 4 literal Avg followed by price
/i : regex delimiter and case Insensitive modifier.
To be more generic, you could replace High and Avg by [a-z]+
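A minimal sketch of that more generic variant, assuming the labels are always plain letters and the dash is an ASCII hyphen (the sample rows are shortened from the question):

```php
<?php
$rows = ['50-59 1High300.00 Avg300.00', '120-1293High204.00 Avg169.33'];
$result = [];
foreach ($rows as $row) {
    // [a-z]+ with the /i modifier stands in for the literal High and Avg labels.
    if (preg_match('/^(\d+-\d{1,3})\s*(\d+)([a-z]+\d+\.\d\d) ([a-z]+\d+\.\d\d)/i', $row, $m)) {
        $result[] = array_slice($m, 1); // drop group 0, the whole match
    }
}
print_r($result);
```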

This is a pattern you can trust (Pattern Demo):
/^((\d{0,2})0‐(?:\2)9) ?(\d{1,3})High(\d{1,3}\.\d{2}) ?Avg(\d{1,3}\.\d{2})/m
The other answers overlooked the digit pattern in the weight range substring. The range start integer always ends in 0, and the range end integer always ends in 9; the range always spans ten integers.
My pattern will capture the digits that precede the 0 in the starting integer and reference them immediately after the dash, then require that captured number to be followed by a 9.
I want to point out that your sample input was a little bit tricky because your ‐ is not the standard - that is between the 0 and = on my keyboard. This was a sneaky little gotcha for me to solve.
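If you would rather work with the ordinary keyboard hyphen, one option is to normalise the input first. The sketch below assumes the character in the sample is U+2010 (HYPHEN), which is how it appears in the post; check your real data before relying on that.

```php
<?php
// "\u{2010}" is the Unicode HYPHEN character; this escape needs PHP 7.0 or later.
$raw = "120\u{2010}1293High204.00 Avg169.33";

// Replace every U+2010 with the ASCII hyphen-minus.
$normalised = str_replace("\u{2010}", '-', $raw);

echo $normalised, PHP_EOL;
```

After this step the patterns that use a plain - will match the data.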
Method (Demo):
$text = '50‐59 1High300.00 Avg300.00
90‐99 11High222.00Avg188.73
120‐1293High204.00 Avg169.33';
preg_match_all(
'/^((\d{0,2})0‐(?:\2)9) ?(\d{1,3})High(\d{1,3}\.\d{2}) ?Avg(\d{1,3}\.\d{2})/m',
$text,
$matches,
PREG_SET_ORDER
);
var_export(
array_map(
fn($captured) => [
'weight range' => $captured[1],
'count' => $captured[3],
'Highprice' => $captured[4],
'Avgprice' => $captured[5]
],
$matches
)
);
Output:
array (
0 =>
array (
'weight range' => '50‐59',
'count' => '1',
'Highprice' => '300.00',
'Avgprice' => '300.00',
),
1 =>
array (
'weight range' => '90‐99',
'count' => '11',
'Highprice' => '222.00',
'Avgprice' => '188.73',
),
2 =>
array (
'weight range' => '120‐129',
'count' => '3',
'Highprice' => '204.00',
'Avgprice' => '169.33',
),
)

Related

Find all occurrences of a "unknown" substring in a string with PHP

I have a string and I need to find all occurrences of some substrings in it, but I only know the initial characters of the substrings. How can I do this?
Example:
$my_string = "This is a text cointaining [substring_aaa attr], [substring_bbb attr] and [substring], [substring], [substring] and I'll try to find them!";
I know all substrings begin with '[substring' and end with a space char (before attr) or ']' char, so in this example I need to find substring_aaa, substring_bbb and substring and count how many occurrences for each one of them.
The result would be an associative array with the substrings as keys and occurrences as values, for example:
$result = array(
'substring' => 3,
'substring_aaa' => 1,
'substring_bbb' => 1
)
Match [substring and then NOT ] zero or more times and then a ]:
preg_match_all('/\[(substring[^\]]*)\]/', $my_string, $matches);
$matches[1] will yield:
Array
(
[0] => substring_aaa attr
[1] => substring_bbb attr
[2] => substring
[3] => substring
[4] => substring
)
Then you can count the values:
$result = array_count_values($matches[1]);
After rereading the question, if you don't want what comes after a space (attr in this case) then:
preg_match_all('/\[(substring[^\]\s]*)[\]\s]/', $my_string, $matches);
For which $matches[1] will yield:
Array
(
[0] => substring_aaa
[1] => substring_bbb
[2] => substring
[3] => substring
[4] => substring
)
With the array_count_values yielding:
Array
(
[substring_aaa] => 1
[substring_bbb] => 1
[substring] => 3
)
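Put together, the two steps could be wrapped in a small function. This is only a sketch; the name countTags and its parameters are made up for illustration.

```php
<?php
// Hypothetical helper: counts bracketed tags that start with $prefix.
function countTags(string $text, string $prefix): array
{
    // Capture the tag name up to the first whitespace or closing bracket.
    $pattern = '/\[(' . preg_quote($prefix, '/') . '[^\]\s]*)[\]\s]/';
    preg_match_all($pattern, $text, $matches);

    // Tally how often each captured name occurs.
    return array_count_values($matches[1]);
}

$my_string = "This is a text containing [substring_aaa attr], [substring_bbb attr] and [substring], [substring], [substring]";
print_r(countTags($my_string, 'substring'));
```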

How to shred an array with a regex pattern in unified text?

I'm converting text from a txt file into an array. I need to shred the texts in this array using regex.
This is the array in my text file.
Array
(
[0] => 65S34523APPLE IS VERY BEAUTIFUL6.000TX786.34563.675 234.89
[1] => 06W01232BOOK IS SUCCESSFUL1.000YJ160.00021.853 496.00
[2] => 67E45643DO YOU HAVE A PEN? 7/56.450EQ9000.3451.432 765.12
)
To explain with one line as an example:
input => 65S34523APPLE IS VERY BEAUTIFUL6.000TX786.34563.675 234.89
required sections => 65S34523 APPLE IS VERY BEAUTIFUL 6.000 TX 786.345 63.67 5 234.89
target I want :
Array
(
[0] => 65S34523
[1] => APPLE IS VERY BEAUTIFUL
[2] => TX
[3] => 786.345
)
I need multiple regex patterns to achieve this. I need to shred the data I want, in order, in a loop. But since there is no specific layout, I don't know what to select with the regex patterns.
I've tried various codes to smash this array.
$smash =
array('65S34523APPLE IS VERY BEAUTIFUL6.000TX786.34563.675 234.89',
'06W01232BOOK IS SUCCESSFUL1.000YJ160.00021.853 496.00',
'67E45643DO YOU HAVE A PEN? 7/56.450EQ9000.3451.432 765.12');
I'm trying to foreach and parse the array. For example, I tried to get the text first.
foreach ($smash as $row) {
$delete_numbers = preg_replace('/\d/', '', $smash);
}
echo "<pre>";
print_r($delete_numbers);
echo "</pre>";
While it turned out it was that way.
Array
(
[0] => SAPPLE IS VERY BEAUTIFUL.TX.. .
[1] => WBOOK IS SUCCESSFUL.YJ.. .
[2] => EDO YOU HAVE A PEN? /.EQ.. .
)
Naturally, this is not what I want. Each array has a different structure, so I have to check with if-else too.
As you can see in the example, there is no pure text. Here TX, YJ, EQ should be deleted. The dots should be wiped using apples. The first letters at the beginning of the text should be removed. The remaining special characters must be removed.
I have tried many of the above. I have looked at alternative examples.
AS A RESULT;
I'm in a dead end.
Code: (Demo)
$smash = ['65S34523APPLE IS VERY BEAUTIFUL6.000TX786.34563.675 234.89',
'06W01232BOOK IS SUCCESSFUL1.000YJ160.00021.853 496.00',
'67E45643DO YOU HAVE A PEN? 7/56.450EQ9000.3451.432 765.12'];
foreach ($smash as $line) {
$result[] = preg_match('~(\w+\d)(\D+)[^A-Z]+([A-Z]{2})(\d+\.\d{3})~', $line, $out) ? array_slice($out, 1) : [];
}
var_export($result);
Output:
array (
0 =>
array (
0 => '65S34523',
1 => 'APPLE IS VERY BEAUTIFUL',
2 => 'TX',
3 => '786.345',
),
1 =>
array (
0 => '06W01232',
1 => 'BOOK IS SUCCESSFUL',
2 => 'YJ',
3 => '160.000',
),
2 =>
array (
0 => '67E45643',
1 => 'DO YOU HAVE A PEN? ',
2 => 'EQ',
3 => '9000.345',
),
)
My pattern assumes:
The first group will consist of numbers and letters and conclude with a digit.
The second group contains no digits.
The third group is consistently 2 uppercase letters.
The fourth group will reliably have three decimal places.
p.s. If you don't want that pesky trailing space after PEN?, you could use this:
https://3v4l.org/9XpA6
~(\w+\d)([^\d ]+(?: [^\d ]+)*) ?[^A-Z]+([A-Z]{2})(\d+\.\d{3})~
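The same assumptions can be made visible in the result by naming the groups. The sketch below reuses the first pattern unchanged apart from the names; code, text, unit and price are labels invented for this example, not names from the question.

```php
<?php
$line = '65S34523APPLE IS VERY BEAUTIFUL6.000TX786.34563.675 234.89';
$pattern = '~(?<code>\w+\d)(?<text>\D+)[^A-Z]+(?<unit>[A-Z]{2})(?<price>\d+\.\d{3})~';

$fields = [];
if (preg_match($pattern, $line, $out)) {
    // Keep only the named entries; preg_match also returns numbered copies.
    $fields = array_filter($out, 'is_string', ARRAY_FILTER_USE_KEY);
}
print_r($fields);
```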

PHP regex - count number of exclamation marks before and after a word

I need help to refine a regex, in PHP, intended to count the number of exclamation marks that appear before and after a word. Words, in this situation, can include any character except a space (even exclamation marks), as follows (I am showing the expected "before, after" counts):
!!!!Hi!! => 4, 2
!!!!Hi => 4, 0
!Hi!!! => 1, 3
!easdf.kjaf!! => 1, 2
!hjdfa!sdfk!jaf!! => 1, 2
!,!!!!!fdgsdfg!!sdgj => 1, 0
!!!,!ksfgfdg!jkft!!! => 3, 3
How to code the regex so that, for the before, it stops looking for consecutive exclamation marks when some non-exclamation mark is reached, and start counting for the after when there are only exclamation marks remaining?
The tricky part, is when punctuation characters appear within the word. These should be ignored, these are considered as part of the word.
Here is where I am at:
preg_match_all('/(!*)\b(\S+)\b(!*)/', $w, $m);
$w is the word (as shown above), $m is matching array
As an example, "!!Hi!" would result in $m equal to
Array
(
[0] => Array
(
[0] => !!Hi!
)
[1] => Array
(
[0] => !!
)
[2] => Array
(
[0] => Hi
)
[3] => Array
(
[0] => !
)
)
That is correct and what I am looking for. However, things get thrown off when a punctuation character starts or ends the word: the regex anchor "\b" does not recognize that as part of the word (as it is defined in this exercise). Here is an example of a failure to parse the word "!!!!!!!!xd.sfgdx!!!,!!"
Array
(
[0] => Array
(
[0] => !!!!!!!!xd.sfgdx!!!
)
[1] => Array
(
[0] => !!!!!!!!
)
[2] => Array
(
[0] => xd.sfgdx
)
[3] => Array
(
[0] => !!!
)
)
Help, please.
You just need anchors (^ for beginning and $ for end) and basically anything in the middle. With anchors, a ! in the middle won't match because it is not at either end. This might be a first attempt:
/^(!*).*(!*)$/
The problem with the anything in the middle here (.*) is that it is greedy, and will take precedence over the final group (!*). The anything in the middle would match all to the end and the group just nothing. Simple to fix though, just make the middle un-greedy:
/^(!*).*?(!*)$/
Now it will match any ! on the beginning, as much as possible, then anything in the middle step by step until the next condition matches (! at the end).
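As a rough illustration, the lazy pattern can be applied to a few of the sample words and the two groups measured with strlen(). The variable names here are only for the example.

```php
<?php
$tests = ['!!!!Hi!!', '!,!!!!!fdgsdfg!!sdgj', '!!!,!ksfgfdg!jkft!!!'];
$counts = [];
foreach ($tests as $w) {
    // Group 1 holds the leading marks, group 2 the trailing ones.
    preg_match('/^(!*).*?(!*)$/', $w, $m);
    $counts[$w] = [strlen($m[1]), strlen($m[2])];
}
print_r($counts);
```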
Here is a quick non-regex solution, just because:
$test = ['!!!!Hi!!',
'!!!!Hi',
'!Hi!!!',
'!easdf.kjaf!!',
'!hjdfa!sdfk!jaf!!',
'!,!!!!!fdgsdfg!!sdgj',
'!!!,!ksfgfdg!jkft!!!'];
foreach($test as $str) {
$count = $rcount = 0;
for ($i = 0; $i < strlen($str); $i++) {
if ($str[$i] == '!') {
$count += 1;
continue;
}
break;
}
for ($i = strlen($str) - 1; $i > 0; $i--) {
if ($str[$i] == '!') {
$rcount += 1;
continue;
}
break;
}
echo $str . ': ' . $count . ', ' . $rcount . '<br />';
}
Output:
!!!!Hi!!: 4, 2
!!!!Hi: 4, 0
!Hi!!!: 1, 3
!easdf.kjaf!!: 1, 2
!hjdfa!sdfk!jaf!!: 1, 2
!,!!!!!fdgsdfg!!sdgj: 1, 0
!!!,!ksfgfdg!jkft!!!: 3, 3
Use this regexp:
preg_match_all('/^(!*)[^!]{1}.*[^!]{1}(!*)/', $w, $m);
For your examples the outputs are:
Array
(
[0] => Array
(
[0] => !!!!Hi!!
)
[1] => Array
(
[0] => !!!!
)
[2] => Array
(
[0] => !!
)
)
Array
(
[0] => Array
(
[0] => !!!,!ksfgfdg!jkft!!,!
)
[1] => Array
(
[0] => !!!
)
[2] => Array
(
[0] => !
)
)

PHP: split a string of alternating groups of characters into an array

I have a string whose correct syntax is the regex ^([0-9]+[abc])+$. So examples of valid strings would be: '1a2b' or '00333b1119a555a0c'
For clarity, the string is a list of (value, letter) pairs and the order matters. I'm stuck with the input string so I can't change that. While testing for correct syntax seems easy in principle with the above regex, I'm trying to think of the most efficient way in PHP to transform a compliant string into a usable array something like this:
Input:
'00333b1119a555a0c'
Output:
array (
0 => array('num' => '00333', 'let' => 'b'),
1 => array('num' => '1119', 'let' => 'a'),
2 => array('num' => '555', 'let' => 'a'),
3 => array('num' => '0', 'let' => 'c')
)
I'm having difficulty using preg_match for this. For example this doesn't give the expected result, the intent being to greedy-match on EITHER \d+ (and save that) OR [abc] (and save that), repeated until end of string reached.
$text = '00b000b0b';
$out = array();
$x = preg_match("/^(?:(\d+|[abc]))+$/", $text, $out);
This didn't work either, the intent here being to greedy-match on \d+[abc] (and save these), repeated until end of string reached, and split them into numbers and letter afterwards.
$text = '00b000b0b';
$out = array();
$x = preg_match("/^(?:\d+[abc])+$/", $text, $out);
I'd planned to check syntax as part of the preg_match, then use the preg_match output to greedy-match the 'blocks' (or keep the delimiters if using preg_split), then if needed loop through the result 2 items at a time using for (...; i+=2) to extract value-letter in their pairs.
But I can't seem to even get that basic preg_split() or preg_match() approach to work smoothly, much less explore if there's a 'neater' or more efficient way.
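For reference, the preg_split() plan described above (validate first, keep the delimiters, then read the pieces two at a time) might look something like this. It is a sketch under the stated syntax ^([0-9]+[abc])+$, not a tuned solution.

```php
<?php
$input = '00333b1119a555a0c';
$pairs = [];

// Validate the whole string first, then split it.
if (preg_match('/^(?:\d+[abc])+$/', $input)) {
    // Split on the letters but keep them in the result as delimiters.
    $parts = preg_split('/([abc])/', $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    // Walk the pieces two at a time: a number followed by its letter.
    foreach (array_chunk($parts, 2) as [$num, $let]) {
        $pairs[] = ['num' => $num, 'let' => $let];
    }
}
print_r($pairs);
```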
Your regex needs a few matching groups
/([0-9]+?)([a-z])/i
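This captures the numbers in one group and the letter in another. preg_match_all() returns every match.
The ? after + makes the quantifier non-greedy (lazy), so it matches the shortest possible string; here a greedy [0-9]+ would capture the same digits, because a run of digits always ends at the next letter.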
match[0] is the whole match
match[1] is the first match group (the numbers)
match[2] is the second match group (the letter)
example below
<?php
$input = '00333b1119a555a0c';
$regex = '/([0-9]+?)([a-z])/i';
$out = [];
$parsed = [];
if (preg_match_all($regex, $input, $out)) {
foreach ($out[0] as $index => $value) {
$parsed[] = [
'num' => $out[1][$index],
'let' => $out[2][$index],
];
}
}
var_dump($parsed);
output
array(4) {
[0] =>
array(2) {
'num' =>
string(5) "00333"
'let' =>
string(1) "b"
}
[1] =>
array(2) {
'num' =>
string(4) "1119"
'let' =>
string(1) "a"
}
[2] =>
array(2) {
'num' =>
string(3) "555"
'let' =>
string(1) "a"
}
[3] =>
array(2) {
'num' =>
string(1) "0"
'let' =>
string(1) "c"
}
}
Simple solution with preg_match_all(with PREG_SET_ORDER flag) and array_map functions:
$input = '00333b1119a555a0c';
preg_match_all('/([0-9]+?)([a-z]+?)/i', $input, $matches, PREG_SET_ORDER);
$result = array_map(function($v) {
return ['num' => $v[1], 'let' => $v[2]];
}, $matches);
print_r($result);
The output:
Array
(
[0] => Array
(
[num] => 00333
[let] => b
)
[1] => Array
(
[num] => 1119
[let] => a
)
[2] => Array
(
[num] => 555
[let] => a
)
[3] => Array
(
[num] => 0
[let] => c
)
)
You can use:
$str = '00333b1119a555a0c';
$arr=array();
if (preg_match_all('/(\d+)(\p{L}+)/', $str, $m)) {
array_walk( $m[1], function ($v, $k) use(&$arr, $m ) {
$arr[] = [ 'num'=>$v, 'let'=>$m[2][$k] ]; });
}
print_r($arr);
Output:
Array
(
[0] => Array
(
[num] => 00333
[let] => b
)
[1] => Array
(
[num] => 1119
[let] => a
)
[2] => Array
(
[num] => 555
[let] => a
)
[3] => Array
(
[num] => 0
[let] => c
)
)
All of the above work. But they didn't seem to have the elegance I wanted - they needed to loop, use array mapping, or (for preg_match_all()) they needed another almost identical regex as well, just to verify the string matched the regex.
I eventually found that preg_match_all() combined with named captures solved it for me. I hadn't used named captures for that purpose before and it looks powerful.
I also added an optional extra step to simplify the output if dups aren't expected (which wasn't in the question but may help someone).
$input = '00333b1119a555a0c';
preg_match_all("/(?P<num>\d+)(?P<let>[abc])/", $input, $raw_matches, PREG_SET_ORDER);
print_r($raw_matches);
// if dups not expected this is also worth doing
$matches = array_column($raw_matches, 'num', 'let');
print_r($matches);
More complete version with input+duplicate checking
$input = '00333b1119a555a0c';
if (!preg_match("/^(\d+[abc])+$/",$input)) {
// OPTIONAL: detected $input incorrectly formatted
}
preg_match_all("/(?P<num>\d+)(?P<let>[abc])/", $input, $raw_matches, PREG_SET_ORDER);
$matches = array_column($raw_matches, 'num', 'let');
if (count($matches) != count($raw_matches)) {
// OPTIONAL: detected duplicate letters in $input
}
print_r($matches);
Explanation:
This uses preg_match_all() as suggested by @RomanPerekhrest and @exussum to break out the individual groups and split the numbers and letters. I used named groups so that the resulting array of $raw_matches is created with the correct names already.
If dups aren't expected, I then used an extra step with array_column(), which directly extracts data from a nested array of entries and creates the desired flat array, without any need for loops, mapping, walking, or assigning item by item: from
(group1 => (num1, let1), group2 => (num2, let2), ... )
to the "flat" array:
(let1 => num1, let2 => num2, ... )
If named regex groups feel too advanced, they can be ignored: the matches are numbered anyway and this works just as well. You just have to refer to the groups by number, which is harder to follow.
preg_match_all("/(\d+)([abc])/", $input, $raw_matches, PREG_SET_ORDER);
$matches = array_column($raw_matches, 1, 2);
If you need to check for duplicated letters (which wasn't in the question but could be useful), here's how: when array_column() is used, each letter becomes a key of the new array, and duplicate keys can't exist, so if the original matches contained more than one entry for a letter, only one entry for that letter is kept. So we just test whether the number of matches originally found is the same as the number of entries in the final array after array_column(). If not, there were duplicates.

Split a single string into an array using specific Regex rules

I'm processing a single string which contains many pairs of data. Each pair is separated by a ; sign. Each pair contains a number and a string, separated by an = sign.
I thought it would be easy to process, but I've found that the string half of the pair can contain the = and ; signs, making simple splitting unreliable.
Here is an example of a problematic string:
123=one; two;45=three=four;6=five;
For this to be processed correctly I need to split it up into an array that looks like this:
'123', 'one; two'
'45', 'three=four'
'6', 'five'
I'm at a bit of dead end so any help is appreciated.
UPDATE:
Thanks to everyone for the help, this is where I am so far:
$input = '123=east; 456=west';
// split matches into array
preg_match_all('~(\d+)=(.*?);(?=\s*(?:\d|$))~', $input, $matches);
$newArray = array();
// extract the relevant data
for ($i = 0; $i < count($matches[2]); $i++) {
$type = $matches[2][$i];
$price = $matches[1][$i];
// add each key-value pair to the new array
$newArray[$i] = array(
'type' => "$type",
'price' => "$price"
);
}
Which outputs
Array
(
[0] => Array
(
[type] => east
[price] => 123
)
)
The second item is missing as it doesn't have a semicolon on the end; I'm not sure how to fix that.
I've now realised that the numeric part of the pair sometimes contains a decimal point, and that the last string pair does not have a semicolon after it. Any hints would be appreciated as I'm not having much luck.
Here is the updated string taking into account the things I missed in my initial question (sorry):
12.30=one; two;45=three=four;600.00=five
You need a look-ahead assertion for this; the look-ahead matches if a ; is followed by a digit or the end of your string:
$s = '12.30=one; two;45=three=four;600.00=five';
preg_match_all('/(\d+(?:\.\d+)?)=(.+?)(?=(;\d|$))/', $s, $matches);
print_r(array_combine($matches[1], $matches[2]));
Output:
Array
(
[12.30] => one; two
[45] => three=four
[600.00] => five
)
I think this is the regex you want:
\s*(\d+)\s*=(.*?);(?=\s*(?:\d|$))
The trick is to consider only the semicolon that's followed by a digit as the end of a match. That's what the lookahead at the end is for.
You can see a detailed visualization on www.debuggex.com.
You can use following preg_match_all code to capture that:
$str = '123=one; two;45=three=four;6=five;';
if (preg_match_all('~(\d+)=(.+?);(?=\d|$)~', $str, $arr))
print_r($arr);
Live Demo: http://ideone.com/MG3BaO
$str = '123=one; two;45=three=four;6=five;';
preg_match_all('/(\d+)=([a-zA-Z ;=]+)/', $str,$matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
o/p:
Array
(
[0] => Array
(
[0] => 123=one; two;
[1] => 45=three=four;
[2] => 6=five;
)
[1] => Array
(
[0] => 123
[1] => 45
[2] => 6
)
[2] => Array
(
[0] => one; two;
[1] => three=four;
[2] => five;
)
)
Then you can combine:
echo '<pre>';
print_r(array_combine($matches[1],$matches[2]));
echo '</pre>';
o/p:
Array
(
[123] => one; two;
[45] => three=four;
[6] => five;
)
Try this. The code is written in C#, but you can port it to PHP:
string[] res = Regex.Split("123=one; two;45=three=four;6=five;", @";(?=\d)");
--SJ
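A possible PHP port of that split, offered as a sketch. It assumes the string half never contains a semicolon directly followed by a digit, and it trims the trailing semicolon before splitting.

```php
<?php
$str = '123=one; two;45=three=four;6=five;';
$pairs = [];

// Split only at semicolons that are directly followed by a digit.
foreach (preg_split('/;(?=\d)/', rtrim($str, ';')) as $chunk) {
    // A limit of 2 keeps any later "=" inside the string half.
    $pairs[] = explode('=', $chunk, 2);
}
print_r($pairs);
```

Keeping the result as a list of pairs avoids PHP turning numeric keys such as '123' into integers.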
