Efficient way to parse this string into array in PHP? - php

Background
I have an array which I create by splitting a string based on every occurrence of 0d0a using preg_split('/(?<=0d0a)(?!$)/').
For example:
$string = "78781110d0a78782220d0a";
will be split into:
Array ( [0] => 78781110d0a [1] => 78782220d0a )
A valid array element has to start with 7878 and end with 0d0a.
The Problem
But sometimes, there's an additional 0d0a in the string which splits into an extra and invalid array element, i.e., that doesn't begin with 7878.
Take this string for example:
$string = "78781110d0a2220d0a78783330d0a";
This is split into:
Array ( [0] => 78781110d0a [1] => 2220d0a [2] => 78783330d0a )
But it should actually be:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a)
My Solution
I've written the following (messy) code to get around this:
$data = Array('78781110d0a','2220d0a','78783330d0a');
$i = 0; //count for $data array;
$j = 0; //count for $dataFixed array;
$dataFixed = $data;
foreach($data as $packet) {
if (substr($packet,0,4) != "7878") { //if packet doesn't start with 7878, do some fixing
if ($i != 0) { //not the first packet, so there is a previous element to append to
$j++;
if ((substr(strtolower($packet), -4, 4) == "0d0a")) { //only append if the packet ends with 0d0a; otherwise it's 'mostly' not valid, so discard it
$dataFixed[$i-$j] = $dataFixed[$i-$j] . $packet;
}
unset($dataFixed[$i-$j+1]);
$dataFixed = array_values($dataFixed);
}
}
$i++;
}
Description
I first copy the array to another array, $dataFixed. In a foreach loop over the $data array, I check whether each element starts with 7878. If it doesn't, I join it with the previous element in $dataFixed. I then unset the current element in $dataFixed and reindex the array with array_values.
But I'm not very confident about this solution. Is there a better, more efficient way?
UPDATE
What if the input string doesn't end in 0d0a like it's supposed to? The leftover will stick to the previous array element.
For example, in the string 78781110d0a2220d0a78783330d0a0000, 0000 should be separated as another array element.

Use another positive lookahead (?=7878) to form:
preg_split('/(?<=0d0a)(?=7878)/',$string)
Note: I removed (?!$) because I wasn't sure what that was for, based on your example data.
For example, this code:
$string = "78781110d0a2220d0a78783330d0a";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
print_r($array);
Results in:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a )
UPDATE:
Based on your revised question of having possible random characters at the end of the input string, you can add three lines to make a complete program of:
$string = "78781110d0a2220d0a787830d0a330d0a0000";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
$temp = preg_split('/(7878.*0d0a)/',$array[count($array)-1],-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$array[count($array)-1] = $temp[0];
if(count($temp)>1) { $array[] = $temp[1]; }
print_r($array);
We basically do the initial splitting, then split the last element of the resulting array by the expected data format, keeping the delimiter using PREG_SPLIT_DELIM_CAPTURE. The PREG_SPLIT_NO_EMPTY ensures we won't get an empty array element if the input string doesn't end in random characters.
UPDATE 2:
Based on your comment below where it seems you're implying there might be random characters between any of the desired matches, and you want these random characters preserved, you could do this:
$string = "0078781110d0a2220d0a2220d0a0000787830d0a330d0a000078781110d0a2220d0a0000787830d0a330d0a0000";
$split1 = preg_split('/(7878.*?0d0a)/',$string,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$result = array();
foreach($split1 as $e){
$split2 = preg_split('/(.*0d0a)/',$e,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
foreach($split2 as $el){
// test if $el doesn't start with 7878 and ends with 0d0a
if(strpos($el,'7878') !== 0 && substr($el,-4) == '0d0a'){
//if(preg_match('/^(?!7878).*0d0a$/',$el) === 1){
$result[ count($result)-1 ] = $result[ count($result)-1 ] . $el;
} else {
$result[] = $el;
}
}
}
print_r($result);
The strategy employed here is different than above. First we split the input string based on the delimiter that matches your desired data, using the nongreedy regex .*?. At this point we have some strings that contain the ending of a desired value and some garbage at the end, so we split again based on the last occurrence of "0d0a" with the greedy regex .*0d0a. We then append any of those resulting values that don't start with "7878" but end with "0d0a" to the previous value, as this should repair the first and second halves that got split because it contained an extra "0d0a".
I provided two methods for the innermost if statement, one using regular expressions. The regex one is marginally slower in my testing, so I've left that one commented out.
I might still not have your full requirements, so you'll have to let me know if it works and perhaps provide your full dataset.

I think you are using a delimiter, 0d0a, which also happens to be part of the content! It's not possible to avoid getting junk data as long as the delimiter can also appear in the content. Somehow the delimiter must be unique.
Possible solutions:
Change the delimiter to something that doesn't occur in your data (000000, #!.;).
If you know the length that each array item may have, use it. Judging by the examples, that isn't possible here.
The solutions in the other answers consider only the sample data you have shared. If you are confident about what the string will contain, they are pretty good to use. Otherwise they give you no guarantee!
Best solution: fix the delimiter first, then use regex or explode, whichever you prefer.

Why don't you use preg_match_all instead? You can avoid all of the lookaround assertions (the lookaheads and lookbehinds) that are needed to split the string without consuming the delimiters, and just find the matches you're looking for:
Updated
<?php
$string = "00787817878110d0a22278780d0a78783330d0a00";
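// (?:(?!7878).)*$ means "no further 7878 before the end of the string";
// a character class such as [^(7878)] only excludes the single characters (, 7, 8 and )
preg_match_all('/7878.*?0d0a(?=7878|(?:(?!7878).)*$)/', $string, $arr);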
print_r($arr);
?>
This gives $arr[0] => ( [0] => 787817878110d0a22278780d0a, [1] => 78783330d0a ). It strips leading and trailing garbage characters (anything before the first 7878 or after the last 0d0a).
So $arr[0] would be the array of values that you are looking for.
See example on ideone
Works with multiple 7878 values and multiple 0d0a values (even though that's ridiculous).
Update
If splitting is more your style, why not avoid regular expressions altogether?
<?php
$string = "787817878110d0a22278780d0a78783330d0a";
$arr = explode('0d0a7878', $string);
$string = implode('0d0a,7878', $arr);
$arr = explode(',', $string);
print_r($arr);
?>
Here we split the string by the delimiter 0d0a7878, which is what @CharlieGorichanaz's solution is doing, and props to him for the quick, accurate solution. We then add a comma, because who doesn't love comma-separated values? And we explode again on the commas for an array of desired values. Performance-wise, this ought to be faster than using regular expressions. See example.

Related

Get everything after specific word and before specific word in PHP

Take a look at this string: (parent)item category(child)master data(name)category
By the way, the string is dynamic. I want each word inside () to become an array key, and everything after it, up to the next (), to become that key's value.
How can I turn the string above into this array: ["parent" => "item category", "child" => "master data", "name" => "category"]?
This probably is what you are looking for:
<?php
$input = "(parent)item category(child)master data(name)category";
preg_match_all('/\(([^()]+)\)([^()]+)/', $input, $matches);
$output = array_combine($matches[1], $matches[2]);
print_r($output);
The output obviously is:
Array
(
[parent] => item category
[child] => master data
[name] => category
)
The approach uses a "regular expression" matching all occurrences of a pattern in the input string. All that is left is to combine the matched tokens which is done by the array_combine(...) call.
Note that such an approach works, but is very limited. It fails with more complex input structure due to the fact that pattern matching based on regular expressions is limited itself. In such cases you'd either have to implement a real language parser (or use a compiler-compiler like yacc or bison to do that for you). Or you simplify your input data structure which usually is more promising ;-)
you can use explode to get an array based on the selected word
<?php
$str = "(parent)item category(child)master data(name)category";
$list = explode("(", $str);
$x = [];
foreach($list as $item){
if($item != null) {
$i = explode(")",$item);
$x[$i[0]] = $i[1];
}
}
print_r($x);

How to implode a multi-dimensional array?

I have an array of arrays like:
$array = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]]
Now, I want to implode this array such that it would return something like this:
COTV_LITE(1800)
COTV_PREMIUM(2200)
How do I achieve this? Calling just the implode() function did not work:
implode ('<br>', $array);
You can call array_map() to implode the nested arrays:
echo implode('<br>', array_map(function($a) { return implode(' ', $a); }, $array));
DEMO
output:
1. COTV_LITE(1800)<br>2. COTV_PREMIUM(2200)
You can use argument unpacking (the ... operator) in PHP >= 5.6
Option1
$items = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]];
echo implode(' ',array_merge(...$items));
Output
1. COTV_LITE(1800) 2. COTV_PREMIUM(2200)
This is more of a precursor for the next option.
Option2
If you want to get a bit more creative you can use preg_replace too:
$items = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]];
$replace = [
'/^(\d+\.)$/' => '<li>\1 ',
'/^(\w+\(\d+\))$/' => '\1</li>'
];
echo '<ul>'.implode(preg_replace(array_keys($replace),$replace,array_merge(...$items))).'</ul>';
Output
<ul><li>1. COTV_LITE(1800)</li><li>2. COTV_PREMIUM(2200)</li></ul>
Option3
And lastly, using an ol (ordered list), which does the numbers for you. In this case we only need the second item from each array (index 1):
$items = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]];
echo '<ol><li>'.implode('</li><li>',array_column($items,1)).'</li></ol>';
Output
<ol><li>COTV_LITE(1800)</li><li>COTV_PREMIUM(2200)</li></ol>
Personally, I would put it in the ol; that way you don't have to worry about the order of the numbers, and you can let HTML + CSS handle them. Also, it's probably the easiest and most semantically correct way, but I don't know if the numbering in the array has any special meaning or not.
In any case I would most definitely put this into a list to render it to HTML. This will give you a lot more options for styling it, later.
Update
want to use option 1. But how do I put each option on a different line using <br>
That one will put the <br> between each array element:
echo implode('<br>',array_merge(...$items));
Output
1.<br>COTV_LITE(1800)<br>2.<br>COTV_PREMIUM(2200)
The only way to easily fix that (while keeping the array_merge) is with preg_replace, which is the second one. So I will call this:
Option 1.2
$items = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]];
echo implode(preg_replace('/^(\w+\(\d+\))$/',' \1<br>',array_merge(...$items)));
Output
1. COTV_LITE(1800)<br>2. COTV_PREMIUM(2200)<br>
Sandbox
Basically there is no way to tell where the end item is after merging them. That operation effectively flattens the array out and gives us something like this:
["1.","COTV_LITE(1800)","2.","COTV_PREMIUM(2200)"]
So that regex turns 'COTV_PREMIUM(2200)' into ' COTV_PREMIUM(2200)<br>'. This is just a way of changing that without having to dip into the array with some logic or something. We wind up with this modification to the array:
["1."," COTV_LITE(1800)<br>","2."," COTV_PREMIUM(2200)<br>"]
Then with implode we just flatten it again into a string:
"1. COTV_LITE(1800)<br>2. COTV_PREMIUM(2200)<br>"
The Regex ^(\w+\(\d+\))$
^ - Match start of string
(...) - capture group 1
\w+ - match any word character a-zA-Z0-9_ one or more times, eg. COTV_PREMIUM
\( - match the ( literally
\d+ - match digits 0-9 one or more, eg 2200
\) - match the ) literally
$ - match end of string
So this matches the pattern of the second (or even) items in the array, then we replace that with this:
The Replacement ' \1<br>'
{space} - adds a leading space
\1 - the value of capture group 1 (from above)
<br> - append a line break
Hope that makes sense. This should work as long as they meet that pattern. Obviously we can adjust the pattern, but with such a small sample size it's hard for me to know what variations will be there.
For example, something as simple as (.+\))$ will work (TestIt). This one just looks for the ending ). We just need something to capture all of the even ones while not matching the odd ones. Regular expressions can be very confusing the first few times you see them, but they are extremely powerful.
PS - I added a few links to the function names; these go to the PHP documentation pages for them.
Cheers!
Try this
$items = [["1.","COTV_LITE(1800)"],["2.","COTV_PREMIUM(2200)"]];
$imploded = [];
foreach($items as $item) {
$item_entry = implode(' ', $item);
echo $item_entry . '<br/>'; // display items
$imploded[] = $item_entry;
}
// your desired result is in $imploded variable for further use

How to numerically sort an array like this: ['11--2017 name.png','1--2016 name.png','2--1999 name.png']

Am I correct that character precedence would order these like this:
1--2016 name.png, 11--2017 name.png, 2--1999 name.png
Numerically, however, they would be like this:
1--2016 name.png, 2--1999 name.png, 11--2017 name.png
That is, if I'm looking at the first numbers alone. How do you numerically sort an array with strings like this? Namely, integers appended with "--".
It's important to note that these "strings" are actually pathnames which cannot be renamed. See glob for more information.
Edit, after modified question:
After your edit, obviously all answers in this thread are wrong. Also, don't just copy and paste a piece of code; read the entire answer. Sure enough, in my original answer, I say:
if you have a value like “12--3”, it will be sorted like “123”
So, you could see right away that your real case is not consistent with the sample provided.
This second solution sorts an array by the number at the start of the basename of each path, followed by two dashes. It applies to the following cases:
String                          Will be sorted by
------------------------------  -----------------
/Absolute/Path/12--             12
/Absolute/Path/12--2001.png     12
/12--2001.png                   12
12--2001.png                    12
a12--2001.png                   a12--2001.png
-12--2001.png                   -12--2001.png
Having this array:
[
'/path/to/image/1--2016 name.png',
'/path/to/image/11--2017.png',
'/path/to/image/2--1999.png'
]
You can replace the regular expression pattern of the original solution above with this pattern (and use '\2' as the replacement, so that only the number is kept):
~^(.*/)?(\d+)--[^/]*$~
And above array will be sorted in this way:
Array
(
[0] => /path/to/image/1--2016 name.png
[1] => /path/to/image/2--1999.png
[2] => /path/to/image/11--2017.png
)
eval.in demo
Pattern explanation:
~
^ # Start of string
(.*/)? # Group 1 (optional): zero-or-more characters followed by a slash
(\d+) # Group 2: one-or-more digits
-- # two dashes
[^/]* # zero-or-more characters, except slash
$ # End of string
~
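To make the substitution concrete, here is a minimal sketch that plugs the pattern into a usort callback. It assumes the replacement string is '$2' (so that only the leading number survives) and uses the <=> operator, which needs PHP 7 or later. The helper name sortKey is hypothetical and not part of the original answer.

```php
<?php
// Sample paths from the answer above.
$paths = [
    '/path/to/image/1--2016 name.png',
    '/path/to/image/11--2017.png',
    '/path/to/image/2--1999.png',
];

// Hypothetical helper: reduce a path to the number that precedes "--"
// in its basename. Paths that do not match are returned unchanged.
function sortKey(string $path): string
{
    return preg_replace('~^(.*/)?(\d+)--[^/]*$~', '$2', $path);
}

usort($paths, function ($a, $b) {
    // <=> compares two numeric strings as numbers.
    return sortKey($a) <=> sortKey($b);
});

print_r($paths);
```

Paths that do not match the pattern keep their full text as the sort key, which mirrors the last two rows of the table above.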
In the future, take a look at How to create a Minimal, Complete, and Verifiable example
Original answer (for original question):
There are surely many ways to obtain your result. Using usort and preg_replace:
$array = ['11--','23--','1--'];
usort
(
$array,
function( $a, $b )
{
return preg_replace( '~[^\d]~', '', $a ) - preg_replace( '~[^\d]~', '', $b );
}
);
$array now is:
Array
(
[0] => 1--
[1] => 11--
[2] => 23--
)
The solution above sorts your array by deleting[1] all non-digit characters.
So, if you have a value like 12--3, it will be sorted like 123. Consequently, it doesn't work on non-integer or negative numbers.
[1] Actually, the original array values are not changed.
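If only the leading integer should count, one alternative (not part of the answer above) is to cast each string to int inside the comparison: PHP's (int) cast reads the leading digits and ignores the rest. A minimal sketch, assuming every name starts with digits and PHP 7 or later for <=>:

```php
<?php
// File names from the question title.
$array = ['11--2017 name.png', '1--2016 name.png', '2--1999 name.png'];

usort($array, function ($a, $b) {
    // (int) '11--2017 name.png' evaluates to 11, so the year is ignored.
    return (int) $a <=> (int) $b;
});

print_r($array);
```

With this comparison a value like 12--3 is sorted as 12 rather than 123.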
If you wanted a quick fix to getting this done, you could:
$strings = array('5--', '2--', '11--');
$newStrings = array();
foreach ($strings as $string) {
$stringNew = str_replace('--', '', $string);
array_push($newStrings, $stringNew);
}
sort($newStrings);
$doneArray = array();
foreach ($newStrings as $newString) {
array_push($doneArray, $newString.'--');
}
// $doneArray is the new array full of the sorted strings.
I didn't really bother with the variable names, but that's a nice way to do it.
natsort
See here.
I'm not sure how glob orders its results; I would have expected sort to order them correctly, but natsort will do the trick.
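A short sketch of the natsort suggestion, using the file names from the question. natsort keeps the original keys, so array_values is used here to reindex; that step is an addition for readability and can be dropped if the keys matter.

```php
<?php
$files = ['11--2017 name.png', '1--2016 name.png', '2--1999 name.png'];

// Natural-order sort: runs of digits are compared as numbers.
natsort($files);

// natsort preserves keys, so reindex from 0.
$files = array_values($files);

print_r($files);
```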

PHP regex to find values of terms in a string

I am trying to generate a regex that allows me to do the following:
I have a string containing several terms, all which are alphanumeric and maybe some of these special characters: +.#
They are separated by a comma as well.
This is kind of how it looks like:
$string = 'Term1,Term2,Term3,Term4'; ... And so on... (around 60 terms)
I want to be able to get each term and assign it to a variable, because I want to employ a second Regex to a long string, for example:
$secondString = 'This string may contain some terms, such as Term1, or maybe Term2';
So pretty much I want to be able to check if any of the terms in the first string are present in the second string.
I watched the following tutorial:
https://www.youtube.com/watch?v=EkluES9Rvak
But I just seem to not be able to come up with something.
Thank you so much for your help in advance!
Cheers!
You can use array_intersect function after splitting strings into tokens:
$string = 'Term1,Term2,Term3,Term4';
$secondString = 'This string may contain some terms, such as Term1, or maybe Term2';
$arr1 = explode(',', $string);
$arr2 = preg_split('/[,\h]+/', $secondString);
$arr = array_intersect(array_map('strtolower', $arr1), array_map('strtolower', $arr2));
print_r($arr);
Output:
Array
(
[0] => term1
[1] => term2
)
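Because both arrays are passed through strtolower before the comparison, the code above returns lowercased terms. If the original spelling should be kept, one possible variant (an assumption about what is wanted, not part of the answer) is array_uintersect with strcasecmp as the comparison callback:

```php
<?php
$string = 'Term1,Term2,Term3,Term4';
$secondString = 'This string may contain some terms, such as Term1, or maybe Term2';

$arr1 = explode(',', $string);
$arr2 = preg_split('/[,\h]+/', $secondString);

// Case-insensitive intersection; the values (and keys) come from $arr1 unchanged.
$found = array_uintersect($arr1, $arr2, 'strcasecmp');

print_r($found);
```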

Parse text and populate associative array from two substrings per line

Given a large string of text, I want to search for the following patterns:
#key: value
So an example is:
some crazy text
more nonesense
#first: first-value;
yet even more non-sense
#second: second-value;
finally more non-sense
The output should be:
array("first" => "first-value", "second" => "second-value");
<?php
$string = 'some crazy text
more nonesense
#first: first-value;
yet even more non-sense
#second: second-value;
finally more non-sense';
$return = array();
preg_match_all('/#(.*?): (.*?);/is', $string, $matches);
$count = count($matches[0]);
for($i = 0; $i < $count; $i++)
{
$return[$matches[1][$i]] = $matches[2][$i];
}
print_r($return);
?>
Link http://ideone.com/fki3U
Array (
[first] => first-value
[second] => second-value )
Tested in PHP 5.3:
// set-up test string and final array
$myString = "#test1: test1;#test2: test2;";
$myArr = array();
// do the matching
preg_match_all('/#([^:]+):\s*([^;]+);/', $myString, $matches);
// $matches[0] holds the full matches; $matches[1] and $matches[2] hold the keys and values
$matchCount = count($matches[0]);
for ($i = 0; $i < $matchCount; $i++) {
$myArr[$matches[1][$i]] = $matches[2][$i];
}
print_r($myArr);
The reasoning behind this is this:
The regex is creating two capture groups. One capture group is the key, the
other the data for that key. The capture groups are the portions of the regex
inside left and right bananas, i.e., (...).
$matchCount is the number of full matches, taken from $matches[0]. Counting
$matches itself would give the number of capture groups plus one (3 here), not
the number of key/value pairs.
Demo.
Match whole qualifying lines starting with # and ending with ;.
Capture the substring that does not contain any colons as the first group and capture the substring between the space after the colon and the semicolon at the end of the line.
By using the any character dot in the second capture group, the substring may contain a semicolon without damaging any extracted data.
Call array_combine() to form key-value relationships between the two capture groups.
Code: (Demo)
preg_match_all(
'/^#([^:]+): (.+);$/m',
$text,
$m
);
var_export(array_combine($m[1], $m[2]));
Output:
array (
'first' => 'first-value',
'second' => 'second-value',
)
You can try looping over the string line by line (explode and foreach), checking whether each line starts with # (substr) and, if it does, exploding the line by :.
http://php.net/manual/en/function.explode.php
http://nl.php.net/manual/en/control-structures.foreach.php
http://nl.php.net/manual/en/function.substr.php
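The line-by-line approach described above could look roughly like this. It is a sketch under a few assumptions: lines are separated by "\n", the key never contains a colon, and the trailing semicolon should be removed from the value.

```php
<?php
$text = "some crazy text\nmore nonesense\n#first: first-value;\n"
      . "yet even more non-sense\n#second: second-value;\nfinally more non-sense";

$result = [];
foreach (explode("\n", $text) as $line) {
    $line = trim($line);
    // Skip every line that does not start with "#".
    if (substr($line, 0, 1) !== '#') {
        continue;
    }
    // Split into key and value at the first colon only.
    $parts = explode(':', substr($line, 1), 2);
    if (count($parts) !== 2) {
        continue;
    }
    // Drop surrounding whitespace and the trailing semicolon.
    $result[trim($parts[0])] = rtrim(trim($parts[1]), ';');
}

print_r($result);
```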
Depending on what your input string looks like, you might be able to simply use parse_ini_string, or make some small changes to the string then use the function.
