I'm processing a single string which contains many pairs of data. Each pair is separated by a ; sign. Each pair contains a number and a string, separated by an = sign.
I thought it would be easy to process, but i've found that the string half of the pair can contain the = and ; sign, making simple splitting unreliable.
Here is an example of a problematic string:
123=one; two;45=three=four;6=five;
For this to be processed correctly I need to split it up into an array that looks like this:
'123', 'one; two'
'45', 'three=four'
'6', 'five'
I'm at a bit of dead end so any help is appreciated.
UPDATE:
Thanks to everyone for the help, this is where I am so far:
$input = '123=east; 456=west';
// split matches into array
preg_match_all('~(\d+)=(.*?);(?=\s*(?:\d|$))~', $input, $matches);
$newArray = array();
// extract the relevant data
for ($i = 0; $i < count($matches[2]); $i++) {
$type = $matches[2][$i];
$price = $matches[1][$i];
// add each key-value pair to the new array
$newArray[$i] = array(
'type' => "$type",
'price' => "$price"
);
}
Which outputs
Array
(
[0] => Array
(
[type] => east
[price] => 123
)
)
The second item is missing as it doesn't have a semicolon on the end, i'm not sure how to fix that.
I've now realised that the numeric part of the pair sometimes contains a decimal point, and that the last string pair does not have a semicolon after it. Any hints would be appreciated as i'm not having much luck.
Here is the updated string taking into account the things I missed in my initial question (sorry):
12.30=one; two;45=three=four;600.00=five
You need a look-ahead assertion for this; the look-ahead matches if a ; is followed by a digit or the end of your string:
$s = '12.30=one; two;45=three=four;600.00=five';
preg_match_all('/(\d+(?:.\d+)?)=(.+?)(?=(;\d|$))/', $s, $matches);
print_r(array_combine($matches[1], $matches[2]));
Output:
Array
(
[12.30] => one; two
[45] => three=four
[600.00] => five
)
I think this is the regex you want:
\s*(\d+)\s*=(.*?);(?=\s*(?:\d|$))
The trick is to consider only the semicolon that's followed by a digit as the end of a match. That's what the lookahead at the end is for.
You can see a detailed visualization on www.debuggex.com.
You can use following preg_match_all code to capture that:
$str = '123=one; two;45=three=four;6=five;';
if (preg_match_all('~(\d+)=(.+?);(?=\d|$)~', $str, $arr))
print_r($arr);
Live Demo: http://ideone.com/MG3BaO
$str = '123=one; two;45=three=four;6=five;';
preg_match_all('/(\d+)=([a-zA-z ;=]+)/', $str,$matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
o/p:
Array
(
[0] => Array
(
[0] => 123=one; two;
[1] => 45=three=four;
[2] => 6=five;
)
[1] => Array
(
[0] => 123
[1] => 45
[2] => 6
)
[2] => Array
(
[0] => one; two;
[1] => three=four;
[2] => five;
)
)
then y can combine
echo '<pre>';
print_r(array_combine($matches[1],$matches[2]));
echo '</pre>';
o/p:
Array
(
[123] => one; two;
[45] => three=four;
[6] => five;
)
Try this but this code is written in c#, you can change it into php
string[] res = Regex.Split("123=one; two;45=three=four;6=five;", #";(?=\d)");
--SJ
Related
I have an array variable which contain values like this:
$items = array(
"tbFrench",
"eaItaly1",
"discount21",
"kkMM5",
"NbndA",
"fcMNSS334"
);
i nedd to remove last character of string from this array values if the last character contain number, for example:
$newItems = array();
foreach($items as $item){
$newItems[] = $this->removeLastCharacter($item);
}
print_r($newItems);
....
function removeLastCharacter($string){
// ????
}
i want the result to look like this when i print_r the $newItems variable:
Array ( [0] => tbFrench [1] => eaItaly [2] => discount2 [3] => kkMM [4] => NbndA [5] => fcMNSS33 )
You could use regular expressions to remove the last digit.
function removeLastCharacter($string){
return preg_replace('[\d$]', '', $string);
}
\d matches every digit and $ references the end of the string. So this will only replace the last character if it is a digit at the end.
You can do a RegEx replacement over all the items in an array by simply providing the array as the subject, like so:
$items = preg_replace('/^d$/', '', $items);
There's no need to put it into a function at all - print_r($items) outputs:
Array
(
[0] => tbFrench
[1] => eaItaly
[2] => discount2
[3] => kkMM
[4] => NbndA
[5] => fcMNSS33
)
If you want to replace all trailing digits you can use /^\d+$/
Give a try with below code if it solve your problem
$items = array(
"tbFrench",
"eaItaly1",
"discount21",
"kkMM5",
"NbndA",
"fcMNSS334"
);
$newArr=array();
foreach($items as $item){
$data = preg_replace('[\d$]','',$item);
array_push($newArr,$data);
}
print_r($newArr);
Why not simply?
print_r( preg_replace( '/\d+$/', "", $items ) ); // preg_replace accepts an array as argument, pass yours directly, no need for a loop.
Array
(
[0] => tbFrench
[1] => eaItaly
[2] => discount
[3] => kkMM
[4] => NbndA
[5] => fcMNSS
)
Regex Explanation:
\d+ — matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
$ — asserts position at the end of a line
There is a quick trick using rtrim.
$result = rtrim($str,"0..9");
second argument is a range using 2 dots ".."
You are done !!!
Consider I have this string 'aaaabbbaaaaaabbbb' I want to convert this to array so that I get the following result
$array = [
'aaaa',
'bbb',
'aaaaaa',
'bbbb'
]
How to go about this in PHP?
PHP code demo
Regex: (.)\1{1,}
(.): Match and capture single character.
\1: This will contain first match
\1{1,}: Using matched character one or more times.
<?php
ini_set("display_errors", 1);
$string="aaaabbbaaaaaabbbb";
preg_match_all('/(.)\1{1,}/', $string,$matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => aaaa
[1] => bbb
[2] => aaaaaa
[3] => bbbb
)
[1] => Array
(
[0] => a
[1] => b
[2] => a
[3] => b
)
)
Or:
PHP code demo
<?php
$string="aaaabbbaaaaaabbbb";
$array=str_split($string);
$start=0;
$end= strlen($string);
$indexValue=$array[0];
$result=array();
$resultantArray=array();
while($start!=$end)
{
if($indexValue==$array[$start])
{
$result[]=$array[$start];
}
else
{
$resultantArray[]=implode("", $result);
$result=array();
$result[]=$indexValue=$array[$start];
}
$start++;
}
$resultantArray[]=implode("", $result);
print_r($resultantArray);
Output:
Array
(
[0] => aaaa
[1] => bbb
[2] => aaaaaa
[3] => bbbb
)
I have written a one-liner using only preg_split() that generates the expected result with no wasted memory (no array bloat):
Code (Demo):
$string = 'aaaabbbaaaaaabbbb';
var_export(preg_split('/(.)\1*\K/', $string, 0, PREG_SPLIT_NO_EMPTY));
Output:
array (
0 => 'aaaa',
1 => 'bbb',
2 => 'aaaaaa',
3 => 'bbbb',
)
Pattern:
(.) #match any single character
\1* #match the same character zero or more times
\K #keep what is matched so far out of the overall regex match
The real magic happens with the \K, for more reading go here.
The 0 parameter in preg_split() means "unlimited matches". This is the default behavior, but it needs to hold its place in the function so that the next parameter is used appropriately as a flag
The final parameter is PREG_SPLIT_NO_EMPTY which removes any empty matches.
Sahil's preg_match_all() method preg_match_all('/(.)\1{1,}/', $string,$matches); is a good attempt but it is not perfect for two reasons:
The first issue is that his use of preg_match_all() returns two subarrays which is double the necessary result.
The second issue is revealed when $string="abbbaaaaaabbbb";. His method will ignore the first lone character. Here is its output:
Array (
[0] => Array
(
[0] => bbb
[1] => aaaaaa
[2] => bbbb
)
[1] => Array
(
[0] => b
[1] => a
[2] => b
)
)
Sahil's second attempt produces the correct output, but requires much more code. A more concise non-regex solution could look like this:
$array = str_split($string);
$last = "";
foreach ($array as $v) {
if (!$last || strpos($last, $v) !== false) {
$last .= $v;
} else {
$result[] = $last;
$last = $v;
}
}
$result[] = $last;
var_export($result);
I have a string whose correct syntax is the regex ^([0-9]+[abc])+$. So examples of valid strings would be: '1a2b' or '00333b1119a555a0c'
For clarity, the string is a list of (value, letter) pairs and the order matters. I'm stuck with the input string so I can't change that. While testing for correct syntax seems easy in principle with the above regex, I'm trying to think of the most efficient way in PHP to transform a compliant string into a usable array something like this:
Input:
'00333b1119a555a0c'
Output:
array (
0 => array('num' => '00333', 'let' => 'b'),
1 => array('num' => '1119', 'let' => 'a'),
2 => array('num' => '555', 'let' => 'a'),
3 => array('num' => '0', 'let' => 'c')
)
I'm having difficulty using preg_match for this. For example this doesn't give the expected result, the intent being to greedy-match on EITHER \d+ (and save that) OR [abc] (and save that), repeated until end of string reached.
$text = '00b000b0b';
$out = array();
$x = preg_match("/^(?:(\d+|[abc]))+$/", $text, $out);
This didn't work either, the intent here being to greedy-match on \d+[abc] (and save these), repeated until end of string reached, and split them into numbers and letter afterwards.
$text = '00b000b0b';
$out = array();
$x = preg_match("/^(?:\d+[abc])+$/", $text, $out);
I'd planned to check syntax as part of the preg_match, then use the preg_match output to greedy-match the 'blocks' (or keep the delimiters if using preg_split), then if needed loop through the result 2 items at a time using for (...; i+=2) to extract value-letter in their pairs.
But I can't seem to even get that basic preg_split() or preg_match() approach to work smoothly, much less explore if there's a 'neater' or more efficient way.
Your regex needs a few matching groups
/([0-9]+?)([a-z])/i
This means match all numbers in one group, and all letters in another. Preg match all gets all matches.
The key to the regex is the non greedy flag ? which matches the shortest possible string.
match[0] is the whole match
match[1] is the first match group (the numbers)
match[2] is the second match group (the letter)
example below
<?php
$input = '00333b1119a555a0c';
$regex = '/([0-9]+?)([a-z])/i';
$out = [];
$parsed = [];
if (preg_match_all($regex, $input, $out)) {
foreach ($out[0] as $index => $value) {
$parsed[] = [
'num' => $out[1][$index],
'let' => $out[2][$index],
];
}
}
var_dump($parsed);
output
array(4) {
[0] =>
array(2) {
'num' =>
string(5) "00333"
'let' =>
string(1) "b"
}
[1] =>
array(2) {
'num' =>
string(4) "1119"
'let' =>
string(1) "a"
}
[2] =>
array(2) {
'num' =>
string(3) "555"
'let' =>
string(1) "a"
}
[3] =>
array(2) {
'num' =>
string(1) "0"
'let' =>
string(1) "c"
}
}
Simple solution with preg_match_all(with PREG_SET_ORDER flag) and array_map functions:
$input = '00333b1119a555a0c';
preg_match_all('/([0-9]+?)([a-z]+?)/i', $input, $matches, PREG_SET_ORDER);
$result = array_map(function($v) {
return ['num' => $v[1], 'let' => $v[2]];
}, $matches);
print_r($result);
The output:
Array
(
[0] => Array
(
[num] => 00333
[let] => b
)
[1] => Array
(
[num] => 1119
[let] => a
)
[2] => Array
(
[num] => 555
[let] => a
)
[3] => Array
(
[num] => 0
[let] => c
)
)
You can use:
$str = '00333b1119a555a0c';
$arr=array();
if (preg_match_all('/(\d+)(\p{L}+)/', $str, $m)) {
array_walk( $m[1], function ($v, $k) use(&$arr, $m ) {
$arr[] = [ 'num'=>$v, 'let'=>$m[2][$k] ]; });
}
print_r($arr);
Output:
Array
(
[0] => Array
(
[num] => 00333
[let] => b
)
[1] => Array
(
[num] => 1119
[let] => a
)
[2] => Array
(
[num] => 555
[let] => a
)
[3] => Array
(
[num] => 0
[let] => c
)
)
All of the above work. But they didn't seem to have the elegance I wanted - they needed to loop, use array mapping, or (for preg_match_all()) they needed another almost identical regex as well, just to verify the string matched the regex.
I eventually found that preg_match_all() combined with named captures solved it for me. I hadn't used named captures for that purpose before and it looks powerful.
I also added an optional extra step to simplify the output if dups aren't expected (which wasn't in the question but may help someone).
$input = '00333b1119a555a0c';
preg_match_all("/(?P<num>\d+)(?P<let>[dhm])/", $input, $raw_matches, PREG_SET_ORDER);
print_r($raw_matches);
// if dups not expected this is also worth doing
$matches = array_column($raw_matches, 'num', 'let');
print_r($matches);
More complete version with input+duplicate checking
$input = '00333b1119a555a0c';
if (!preg_match("/^(\d+[abc])+$/",$input)) {
// OPTIONAL: detected $input incorrectly formatted
}
preg_match_all("/(?P<num>\d+)(?P<let>[dhm])/", $input, $raw_matches, PREG_SET_ORDER);
$matches = array_column($raw_matches, 'num', 'let');
if (count($matches) != count($raw_matches)) {
// OPTIONAL: detected duplicate letters in $input
}
print_r($matches);
Explanation:
This uses preg_match_all() as suggested by #RomanPerekhrest and #exussum to break out the individual groups and split the numbers and letters. I used named groups so that the resulting array of $raw_matches is created with the correct names already.
But if dups arent expected, then I used an extra step with array_column(), which directly extracts data from a nested array of entries and creates a desired flat array, without any need for loops, mapping, walking, or assigning item by item: from
(group1 => (num1, let1), group2 => (num2, let2), ... )
to the "flat" array:
(let1 => num1, let2 => num2, ... )
If named regex matches feels too advanced then they can be ignored - the matches will be given numbers anyway and this will work just as well, you would have to manually assign letters and it's just harder to follow.
preg_match_all("/(\d+)([dhm])/", $input, $raw_matches, PREG_SET_ORDER);
$matches = array_column($raw_matches, 1, 2);
If you need to check for duplicated letters (which wasn't in the question but could be useful), here's how: If the original matches contained >1 entry for any letter then when array_column() is used this letter becomes a key for the new array, and duplicate keys can't exist. Only one entry for each letter gets kept. So we just test whether the number of matches originally found, is the same as the number of matches in the final array after array_coulmn. If not, there were duplicates.
Good day,
I have an I think rather odd question and I also do not really know how to ask this question.
I want to create a string variable that looks like this:
[car]Ford[/car]
[car]Dodge[/car]
[car]Chevrolet[/car]
[car]Corvette[/car]
[motorcycle]Yamaha[/motorcycle]
[motorcycle]Ducati[/motorcycle]
[motorcycle]Gilera[/motorcycle]
[motorcycle]Kawasaki[/motorcycle]
This should be processed and look like:
$variable = array(
'car' => array(
'Ford',
'Dodge',
'Chevrolet',
'Corvette'
),
'motorcycle' => array(
'Yamaha',
'Ducati',
'Gilera',
'Kawasaki'
)
);
Does anyone know how to do this?
And what is it called what I am trying to do?
I want to explode the string into the two arrays. If it is a sub array
or two individual arrays. I do not care. I can always combine the
latter if I wish so.
But from the above mentioned string to two arrays. That is what I
want.
Solution by Dlporter98
<?php
///######## GET THE STRING FILE OR DIRECT INPUT
// $str = file_get_contents('file.txt');
$str = '[car]Ford[/car]
[car]Dodge[/car]
[car]Chevrolet[/car]
[car]Corvette[/car]
[motorcycle]Yamaha[/motorcycle]
[motorcycle]Ducati[/motorcycle]
[motorcycle]Gilera[/motorcycle]
[motorcycle]Kawasaki[/motorcycle]';
$str = explode(PHP_EOL, $str);
$finalArray = [];
foreach($str as $item){
//Use preg_match to capture the pieces of the string we want using a regular expression.
//The first capture will grab the text of the tag itself.
//The second capture will grab the text between the opening and closing tag.
//The resulting captures are placed into the matches array.
preg_match("/\[(.*?)\](.*?)\[/", $item, $matches);
//Build the final array structure.
$finalArray[$matches[1]][] = $matches[2];
}
print_r($finalArray);
?>
This gives me the following array:
Array
(
[car] => Array
(
[0] => Ford
[1] => Dodge
[2] => Chevrolet
[3] => Corvette
)
[motorcycle] => Array
(
[0] => Yamaha
[1] => Ducati
[2] => Gilera
[3] => Kawasaki
)
)
The small change I had to make was:
Change
$finalArray[$matches[1]] = $matches[2]
To:
$finalArray[$matches[1]][] = $matches[2];
Thanks a million!!
There are many ways to convert the information in this string to an associative array.
split the string on the new line into an array using the explode function:
$str = "[car]Ford[/car]
[car]Dodge[/car]
[car]Chevrolet[/car]
[car]Corvette[/car]
[motorcycle]Yamaha[/motorcycle]
[motorcycle]Ducati[/motorcycle]
[motorcycle]Gilera[/motorcycle]
[motorcycle]Kawasaki[/motorcycle]";
$items = explode(PHP_EOL, $str);
At this point each delimited item is now an array entry.
Array
(
[0] => [car]Ford[/car]
[1] => [car]Dodge[/car]
[2] => [car]Chevrolet[/car]
[3] => [car]Corvette[/car]
[4] => [motorcycle]Yamaha[/motorcycle]
[5] => [motorcycle]Ducati[/motorcycle]
[6] => [motorcycle]Gilera[/motorcycle]
[7] => [motorcycle]Kawasaki[/motorcycle]
)
Next, loop over the array and pull out the appropriate pieces needed to build the final associative array using the preg_match function with a regular expression:
$finalArray = [];
foreach($items as $item)
{
//Use preg_match to capture the pieces of the string we want using a regular expression.
//The first capture will grab the text of the tag itself.
//The second capture will grab the text between the opening and closing tag.
//The resulting captures are placed into the matches array.
preg_match("/\[(.*?)\](.*?)\[/", $item, $matches);
//Build the final array structure.
$finalArray[$matches[1]] = $matches[2]
}
The following is an example of what will be found in the matches array for a given iteration of the foreach loop.
Array
(
[0] => [motorcycle]Gilera[
[1] => motorcycle
[2] => Gilera
)
Please note that I use the PHP_EOL constant to explode the initial string. This may not work if the string was pulled from a different operating system than the one you are running this code on. You may need to replace this with the actual end of line characters that is being used by the string.
Why don't you create two separate arrays?
$cars = array("Ford", "Dodge", "Chevrolet", "Corvette");
$motorcycle = array("Yamaha", "Ducati", "Gilera", "Kawasaki");
You could also use an Associative array to do this.
$variable = array("Ford"=>"car", "Yamaha"=>"motorbike");
Is possible to capturing group under capturing group so i can have an array like that
regex = (asd1).(lol1),(asd2).(asd2)
string = asd1.lol1,asd2.lol2
return_array[0]=>group[0]='asd1';
return_array[0]=>group[1]='lol1';
return_array[1]=>group[0]='asd2';
return_array[1]=>group[1]='lol2';
While using regular expressions can get what you want, you could also use strtok() to iterate through what seems to simply be comma separated sets:
$results = array();
$str = 'asd1.lol1,asd2.lol2';
$token = strtok($str, ',');
while ($token !== false) {
$results[] = explode('.', $token, 2);
$token = strtok(',');
}
Output:
Array
(
[0] => Array
(
[0] => asd1
[1] => lol1
)
[1] => Array
(
[0] => asd2
[1] => lol2
)
)
With regular expressions your pattern needs to only include the two terms surrounding a period, i.e.:
$pattern = '/(?<=^|,)(\w+)\.(\w+)/';
preg_match_all($pattern, $str, $result, PREG_SET_ORDER);
The (?<=^|,) is a look-behind assertion; it makes sure to only match what comes after if preceded by either the start of your search string or a comma, but it doesn't "consume" anything.
Output:
Array
(
[0] => Array
(
[0] => asd1.lol1
[1] => asd1
[2] => lol1
)
[1] => Array
(
[0] => asd2.lol2
[1] => asd2
[2] => lol2
)
)
You're probably looking for preg_match_all.
$regex = '/^((\w+)\.(\w+)),((\w+)\.(\w+))$/';
$string = 'asd1.lol1,asd2.lol2';
preg_match_all($regex, $string, $matches);
This function will create a 2-dimensional array, where the first dimension represents the matched groups (i.e. the parentheses, 0 contains the whole matched string though) and each have subarrays to all the matched lines (only 1 in this case).
[0] => ("asd1.lol1,asd2.lol2") // a view of $matches
[1] => ("asd1.lol1")
[2] => ("asd1")
[3] => ("lol1")
[4] => ("asd2.lol2")
[5] => ("asd2")
[6] => ("lol2")
Your best bet to have groups is to process the first dimension of the array that you want and to then process them further, i.e. get "asd1.lol1" from 1 and 4 and then process these further into asd1 and lol1.
You wouldn't need as many parentheses in your first run:
$regex = '/^(\w+\.\w+),(\w+\.\w+)$/';
will yield:
[0] => ("asd1.lol1,asd2.lol2")
[1] => ("asd1.lol1")
[2] => ("asd2.lol2")
Then you can split the array in 1 and 2 into more granular values.
Flags can be set to preg_match_all to order the output differently. Particularly, PREG_SET_ORDER allows you to have all matched instances in the same subarray. This is of little importance if you're only processing one string, but if you're matching a pattern in a text, it might be more convenient to have all info about one match in $matches[0], and so forth.
Note that if you're just separating a string by comma and then by any periods, you might not need regular expressions and could conveniently use explode() as so:
$string = 'asd1.lol1,asd2.lol2';
$matches = explode(',', $string);
foreach($matches as &$match) {
$match = explode('.', $match);
}
This will give you exactly what you want, but do note that you don't have as much control over the process as with regular expressions – for instance, asd1.lol1.lmao,asd2.lol2.rofl.hehe will also work and they'll produce bigger arrays than you may want. You can check with count() on the size of the subarray and handle the cases when the array isn't of the appropriate size, though. I still believe that's more comfortable than using regular expressions.