Cutting big string into objects - php

I have a huge string from a server, and I want each line as an object (for a later foreach loop).
This is part of the string:
1535;;34290;;teaserbanner_881.jpg;;Not allowed;;closed;;;;closed;;
1535;;34291;;teaserbanner_8832.jpg;;Not allowed;;closed;;;;closed;;
1379;;31912;;teaserbanner_844.jpg;;Allowed;;open;;;;open;;
1379;;31913;;teaserbanner_8422.jpg;;allowed;;closed;;;;closed;;
The only thing that stays the same for each line is the "closing tags"; there are only two options:
;;closed;;;;closed;;
;;open;;;;open;;
I was thinking the closing tag could be the needle for explode() or for some regex...
The final output should be:
element[0] 1535;;34290;;teaserbanner_881.jpg;;Not allowed;;closed;;;;closed;;
element[1] 1535;;34291;;teaserbanner_8832.jpg;;Not allowed;;closed;;;;closed;;
element[2] 1379;;31912;;teaserbanner_844.jpg;;Allowed;;open;;;;open;;
element[3] 1379;;31913;;teaserbanner_8422.jpg;;allowed;;closed;;;;closed;;
The string doesn't come in "lines"; it is one big line.

You can use the preg_match_all() function:
$s = <<< EOF
1535;;34290;;teaserbanner_881.jpg;;Not allowed;;closed;;;;closed;;
1535;;34291;;teaserbanner_8832.jpg;;Not allowed;;closed;;;;closed;;
1379;;31912;;teaserbanner_844.jpg;;Allowed;;open;;;;open;;
1379;;31913;;teaserbanner_8422.jpg;;allowed;;closed;;;;closed;;
EOF;
if (preg_match_all('~(.*?;;(open|closed);{4}\2;;)~', $s, $arr))
print_r($arr[1]);
OUTPUT:
Array
(
[0] => 1535;;34290;;teaserbanner_881.jpg;;Not allowed;;closed;;;;closed;;
[1] => 1535;;34291;;teaserbanner_8832.jpg;;Not allowed;;closed;;;;closed;;
[2] => 1379;;31912;;teaserbanner_844.jpg;;Allowed;;open;;;;open;;
[3] => 1379;;31913;;teaserbanner_8422.jpg;;allowed;;closed;;;;closed;;
)

Please have a look at explode(). explode("\n", $string) will give you an array where each entry is one line of the string. (The older split() function was deprecated in PHP 5.3 and removed in PHP 7.0.)

You can use file() for this:
$lines = file('path/to/file');
foreach($lines as $line){
//do something with $line
}
$lines is an array with each element representing a line in the file, so
var_dump($lines);
would give something like:
array (size=4)
0 => string '1535;;34290;;teaserbanner_881.jpg;;Not allowed;;closed;;;;closed;;' (length=68)
1 => string '1535;;34291;;teaserbanner_8832.jpg;;Not allowed;;closed;;;;closed;; ' (length=69)
2 => string '1379;;31912;;teaserbanner_844.jpg;;Allowed;;open;;;;open;; ' (length=60)
3 => string '1379;;31913;;teaserbanner_8422.jpg;;allowed;;closed;;;;closed;;' (length=63)
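The lengths above count the line break that file() leaves at the end of each element. A minimal sketch, assuming the server response has first been saved to a file (the temporary file below is only a stand-in for 'path/to/file'), showing the FILE_IGNORE_NEW_LINES and FILE_SKIP_EMPTY_LINES flags that drop those line breaks and any blank lines:

```php
<?php
// Write two sample records to a temporary file (stand-in for 'path/to/file').
$path = tempnam(sys_get_temp_dir(), 'rec');
file_put_contents($path,
    "1535;;34290;;teaserbanner_881.jpg;;Not allowed;;closed;;;;closed;;\n" .
    "1379;;31912;;teaserbanner_844.jpg;;Allowed;;open;;;;open;;\n");

// FILE_IGNORE_NEW_LINES drops the trailing "\n" from each element;
// FILE_SKIP_EMPTY_LINES (only effective together with it) drops blank lines.
$lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
unlink($path);

foreach ($lines as $line) {
    echo strlen($line), "\n"; // the line break is no longer counted
}
```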

Try using preg_split:
$array = preg_split('/(?<=;;closed;;;;closed;;|;;open;;;;open;;)(?!$)/', $string);
(?<=;;closed;;;;closed;;|;;open;;;;open;;) ensures that one of the closing tags immediately precedes the split point, and (?!$) ensures the string isn't split at the very end (which would leave an empty last element).
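A self-contained run of that split, assuming the records arrive glued together on one line as the question describes (two of the sample records are concatenated here):

```php
<?php
// Two of the question's records, glued together on one line.
$string = '1535;;34290;;teaserbanner_881.jpg;;Not allowed;;closed;;;;closed;;'
        . '1379;;31912;;teaserbanner_844.jpg;;Allowed;;open;;;;open;;';

// Split wherever a closing tag ends, but not at the very end of the string.
$array = preg_split('/(?<=;;closed;;;;closed;;|;;open;;;;open;;)(?!$)/', $string);

foreach ($array as $i => $record) {
    echo "element[$i] $record\n";
}
```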

What does huge mean?
Calling explode() on something truly huge can exhaust PHP's memory limit, because the whole string and the resulting array have to fit in memory at once.
You need to parse it the old-school way: read it character by character and add the characters to a bucket. When your condition is met (for example, the 5th ; or the 10th ;), treat the bucket as a finished object and handle it. But don't store it: push it to a file, a database or similar.
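A minimal sketch of that bucket approach, assuming the data can be read as a PHP stream (php://memory stands in for the server response here) and using the two closing tags from the question as the "condition met" test. readRecords() is a hypothetical name; it is a generator, so each record is handed over and forgotten instead of being stored:

```php
<?php
/**
 * Hypothetical helper: read a stream one character at a time and yield each
 * record as soon as one of the question's closing tags completes it, so only
 * the current record is ever held in memory.
 */
function readRecords($stream): Generator
{
    $endings = [';;closed;;;;closed;;', ';;open;;;;open;;'];
    $bucket = '';
    while (($char = fgetc($stream)) !== false) {
        $bucket .= $char;
        foreach ($endings as $end) {
            if (substr($bucket, -strlen($end)) === $end) {
                yield $bucket;      // record complete: hand it to the caller
                $bucket = '';       // start a fresh bucket
                break;
            }
        }
    }
    if ($bucket !== '') {
        yield $bucket;              // leftover data without a closing tag
    }
}

// php://memory stands in for the real server response.
$stream = fopen('php://memory', 'r+');
fwrite($stream, '1535;;34290;;teaserbanner_881.jpg;;Not allowed;;closed;;;;closed;;'
    . '1379;;31912;;teaserbanner_844.jpg;;Allowed;;open;;;;open;;');
rewind($stream);

foreach (readRecords($stream) as $record) {
    // Push $record to a file or database here instead of collecting it.
    echo $record, "\n";
}
fclose($stream);
```

Reading one byte at a time with fgetc() keeps the logic simple but is slow; reading larger chunks with fread() would be faster, at the cost of handling closing tags that straddle two chunks.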
If things are not that huge, use a regular expression with an 'object' format, like:
// repeat (.*?);; once per column; the records in the question have seven columns
preg_match_all('~(.*?);;(.*?);;(.*?);;(.*?);;(.*?);;(.*?);;(.*?);;~s', $string, $matches, PREG_SET_ORDER);
This breaks it all into objects (one match per record) and properties (one capture group per column), which you can iterate over and use.
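A runnable version of that idea for the seven-column records in the question, turning each match into a stdClass object. The property names in $names are made up for illustration, since the question does not say what each column means:

```php
<?php
$string = '1535;;34290;;teaserbanner_881.jpg;;Not allowed;;closed;;;;closed;;'
        . '1379;;31912;;teaserbanner_844.jpg;;Allowed;;open;;;;open;;';

// Seven ";;"-terminated columns per record (the sixth is empty in the sample).
$pattern = '~(.*?);;(.*?);;(.*?);;(.*?);;(.*?);;(.*?);;(.*?);;~s';

// Hypothetical names for the seven columns.
$names = ['category', 'id', 'image', 'permission', 'state', 'extra', 'state2'];

$objects = [];
if (preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        array_shift($m);                          // drop the full match, keep the 7 groups
        $objects[] = (object) array_combine($names, $m);
    }
}

foreach ($objects as $obj) {
    echo $obj->id, ' => ', $obj->image, "\n";
}
```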

Related

Why is my foreach loop returning an extra index position?

I am pulling text from a .txt file. I explode the line, then explode it again so it can be stored as an array. When I print_r after the second explode, it outputs an array with an extra index 4, when it should only have 0 1 2 3. Any ideas why?
Any help is appreciated, thank you.
PHP
list($title, $author, $publisher, $isbn) = explode('*', $line);
print_r($line); //outputs a string
$books = explode('*', $line);
echo "<br>";
print_r($books); //outputs array with extra index position.
Sample of Output
Array ( [0] => Business 101 [1] => John Smith [2] => 2002-07-18 [3] => 1-444-2589-x [4] => )
Sample of line from .txt file
Business 101*John Smith*2002-07-18*1-444-2589-x*
Please share the content of $line so I can try to help you.
I can imagine only one explanation: $line has more values than you expect.
This is understandable if you import the data from a file (for example, a CSV file).
So, check if there's something like this:
BookName*Author*PublishDate*ISBN*
Have you seen the last *? That is the problem.
explode() treats the trailing * as one more delimiter and creates an extra, empty element at the end of the array.
If you want to fix this, you can remove the last * (for example with rtrim($line, '*')), or pass a negative limit to explode(): -1 drops the last element.
array explode(string $delimiter, string $string, int $limit = PHP_INT_MAX);
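Both fixes side by side, using the sample line from the question:

```php
<?php
$line = 'Business 101*John Smith*2002-07-18*1-444-2589-x*';

// Option 1: strip the trailing delimiter before exploding.
$books = explode('*', rtrim($line, '*'));

// Option 2: a negative limit drops that many elements from the end.
$books2 = explode('*', $line, -1);

print_r($books);   // indexes 0 to 3 only
print_r($books2);  // the same four elements
```

Note that rtrim() removes every trailing *, so it would also swallow an empty last field (e.g. a missing ISBN), whereas the -1 limit removes exactly one element.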
Hope it helps you.
Best regards.

Why should one use str_split() in php?

Given the following code :
$str = 'CLAX';
echo $str[2]; //prints 'A'
then why should I use str_split($str) to convert a string to an array of characters?
I understand str_split( $str , 2 ) will return array of strings; each string being 2 characters long.
http://php.net/manual/en/function.str-split.php
This function splits a string into an array of chunks of a given length.
By default, the chunk length is 1.
If you want to split a string into pieces of a given length, use str_split(). In your case you are splitting with the default length of 1, and that is why you are confused: indexing the string and indexing the resulting array give the same character.
<?php
$str = "CLAX";
echo $str[2]; // here you are reading offset 2 of the string: A
$arr2 = str_split($str);
print_r($arr2);
/*
Array
(
[0] => C
[1] => L
[2] => A
[3] => X
)
*/
echo $arr2[2]; // here you are reading index 2 of the array: A
<?php
$str = "Hello Friend";
$arr2 = str_split($str, 3);
print_r($arr2);
/*
Array
(
[0] => Hel
[1] => lo
[2] => Fri
[3] => end
)
*/
Using str_split() comes in pretty handy when you want to leverage array functions to perform a task on the components in a string.
str_split() works like explode() except it doesn't care what the characters are, just their position in the string -- there are specific use cases for this.
Use Case #1: Group Array Elements by Letter Range
Rather than manually declaring an array with 3 letters per element, like this:
$chunks=['ABC','DEF','GHI','JKL','MNO','PQR','STU','VWX','YZ'];
The same array can be produced with:
$chunks=str_split(implode(range('A','Z')),3);
This is purely for demonstration. Of course, declaring it manually would be more efficient. The potential benefit in other cases is code flexibility and ease of modification.
Use Case #2: Split a string into an array wherever the character changes
Use str_split() when using a foreach loop to process each character.
$string="abbbaaaaaabbbb";
$array=str_split($string);
$last="";
$result=[];
foreach($array as $v){
if($last==="" || strpos($last,$v)!==false){
$last.=$v;
}else{
$result[]=$last;
$last=$v;
}
}
$result[]=$last;
var_export($result);
If you try to pass $string itself to the foreach loop, PHP will choke on it: foreach only iterates over arrays and objects, so it emits a warning and the loop body never runs. str_split() is the right tool for this job.
Use Case #3: Find the elements of an array that contain only a specific character set in PHP
Use str_split() in association with other array functions to check values in a way that string functions are not well suited for.
[I'll refrain from transferring the full code block across.]
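The original code for this use case isn't reproduced above. As a separate minimal sketch of the idea (the word list and the allowed character set are made up), str_split() plus array_diff() keeps only the words built entirely from allowed characters:

```php
<?php
// Keep only the words made up entirely of characters from $allowed.
$words   = ['abc', 'cab', 'abd', 'bbb'];
$allowed = str_split('abc');

$matching = array_values(array_filter($words, function ($word) use ($allowed) {
    // array_diff() returns the characters of $word that are not allowed.
    return array_diff(str_split($word), $allowed) === [];
}));

print_r($matching);
```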

Extract dimensions from a string using PHP

I want to extract the dimension from this given string.
$str = "enough for hitting practice. The dimension is 20'X10' *where";
I expect 20'X10' as the result.
I tried the following code to get the number before and after the string 'X, but it returns no matches.
$regexForMinimumPattern ='/((?:\w+\W*){0,1})\'X\b((?:\W*\w+){0,1})/i';
preg_match_all ($regexForMinimumPattern, $str, $minimumPatternMatches);
print_r($minimumPatternMatches);
Can anyone please help me to fix this? Thanks in advance.
Just remove the \b from your pattern (and append a \' at the end if you want the trailing quote):
$regexForMinimumPattern ='/((?:\w+\W*){0,1})\'X((?:\W*\w+){0,1})\'/i';
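NB: \b is the meta-character for word boundaries. X and the 1 that follows it are both word characters, so there is no boundary between them and \b makes the match fail; you don't need it here.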
Assuming that the format of the string we want is 00'X00 :
$regexForMinimumPattern ='/[0-9]{1,2}\'X[0-9]{1,2}/i';
this gives you a result like
Array ( [0] => Array ( [0] => 20'X10 ) )
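A sketch along the same lines, assuming the dimension is always two numbers (optionally with decimals), each followed by ', separated by an X. Named groups keep width and height apart, and the trailing quote is optional:

```php
<?php
$str = "enough for hitting practice. The dimension is 20'X10' *where";

// w and h are hypothetical group names; \s* also tolerates "20' X 10'".
$pattern = '/(?<w>\d+(?:\.\d+)?)\'\s*X\s*(?<h>\d+(?:\.\d+)?)\'?/i';

if (preg_match($pattern, $str, $m)) {
    echo $m[0], "\n";                   // the full dimension text
    echo $m['w'], ' x ', $m['h'], "\n"; // width and height separately
}
```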
So: can a simple preg_replace() do that? Perhaps...
<?php
$str = "enough for hitting practice. The dimension is 20'X10' *where";
$dim = preg_replace("#(.*?)(\d*?)(\.\d*)?(')(X)(\d*?)(\.\d*)?(')(.+)#i","$2$3$4$5$6$7", $str);
var_dump($dim); //<== YIELDS::: string '20'X10' (length=6)

Efficient way to parse this string into array in PHP?

Background
I have an array which I create by splitting a string based on every occurrence of 0d0a using preg_split('/(?<=0d0a)(?!$)/').
For example:
$string = "78781110d0a78782220d0a";
will be split into:
Array ( [0] => 78781110d0a [1] => 78782220d0a )
A valid array element has to start with 7878 and end with 0d0a.
The Problem
But sometimes there's an additional 0d0a in the string, which splits off an extra, invalid array element, i.e. one that doesn't begin with 7878.
Take this string for example:
$string = "78781110d0a2220d0a78783330d0a";
This is split into:
Array ( [0] => 78781110d0a [1] => 2220d0a [2] => 78783330d0a )
But it should actually be:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a)
My Solution
I've written the following (messy) code to get around this:
$data = Array('78781110d0a','2220d0a','78783330d0a');
$i = 0; //count for $data array;
$j = 0; //count for $dataFixed array;
$dataFixed = $data;
foreach($data as $packet) {
if (substr($packet,0,4) != "7878") { //if packet doesn't start with 7878, do some fixing
if ($i != 0) { // skip the first packet: there is nothing to join it to
$j++;
if ((substr(strtolower($packet), -4, 4) == "0d0a")) { // if the packet doesn't end with 0d0a, it's 'mostly' not valid, so discard it
$dataFixed[$i-$j] = $dataFixed[$i-$j] . $packet;
}
unset($dataFixed[$i-$j+1]);
$dataFixed = array_values($dataFixed);
}
}
$i++;
}
Description
I first copy the array to another array, $dataFixed. In a foreach loop over $data, I check whether each element starts with 7878. If it doesn't, I append it to the previous element in $dataFixed. I then unset the current element in $dataFixed and reindex the array with array_values().
But I'm not very confident about this solution. Is there a better, more efficient way?
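A single-pass sketch of the same merging idea that builds the fixed array directly, so nothing has to be unset or reindexed. It is a simplification: unlike the code above, it does not discard fragments that fail to end in 0d0a.

```php
<?php
// Fragments produced by the original preg_split() call.
$data = ['78781110d0a', '2220d0a', '78783330d0a'];

$dataFixed = [];
foreach ($data as $packet) {
    $last = count($dataFixed) - 1;
    if ($last >= 0 && strncmp($packet, '7878', 4) !== 0) {
        // Not a packet start: glue it onto the packet before it.
        $dataFixed[$last] .= $packet;
    } else {
        // A packet start (or the very first fragment): keep it as-is.
        $dataFixed[] = $packet;
    }
}

print_r($dataFixed);
```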
UPDATE
What if the input string doesn't end in 0d0a like it's supposed to? The remainder will stick to the previous array element.
For example, in the string 78781110d0a2220d0a78783330d0a0000, 0000 should be separated out as another array element.
Use another positive lookahead (?=7878) to form:
preg_split('/(?<=0d0a)(?=7878)/',$string)
Note: I removed (?!$) because I wasn't sure what that was for, based on your example data.
For example, this code:
$string = "78781110d0a2220d0a78783330d0a";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
print_r($array);
Results in:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a )
UPDATE:
Based on your revised question of having possible random characters at the end of the input string, you can add three lines to make a complete program of:
$string = "78781110d0a2220d0a787830d0a330d0a0000";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
$temp = preg_split('/(7878.*0d0a)/',$array[count($array)-1],-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$array[count($array)-1] = $temp[0];
if(count($temp)>1) { $array[] = $temp[1]; }
print_r($array);
We basically do the initial splitting, then split the last element of the resulting array by the expected data format, keeping the delimiter using PREG_SPLIT_DELIM_CAPTURE. The PREG_SPLIT_NO_EMPTY ensures we won't get an empty array element if the input string doesn't end in random characters.
UPDATE 2:
Based on your comment below where it seems you're implying there might be random characters between any of the desired matches, and you want these random characters preserved, you could do this:
$string = "0078781110d0a2220d0a2220d0a0000787830d0a330d0a000078781110d0a2220d0a0000787830d0a330d0a0000";
$split1 = preg_split('/(7878.*?0d0a)/',$string,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$result = array();
foreach($split1 as $e){
$split2 = preg_split('/(.*0d0a)/',$e,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
foreach($split2 as $el){
// test if $el doesn't start with 7878 and ends with 0d0a
if(strpos($el,'7878') !== 0 && substr($el,-4) == '0d0a'){
//if(preg_match('/^(?!7878).*0d0a$/',$el) === 1){
$result[ count($result)-1 ] = $result[ count($result)-1 ] . $el;
} else {
$result[] = $el;
}
}
}
print_r($result);
The strategy employed here is different from the one above. First we split the input string on the delimiter that matches your desired data, using the non-greedy regex .*?. At this point some strings contain the end of a desired value followed by garbage, so we split again on the last occurrence of "0d0a" with the greedy regex .*0d0a. We then append each resulting value that doesn't start with "7878" but ends with "0d0a" to the previous value; this repairs the first and second halves of a packet that got split apart because it contained an extra "0d0a".
I provided two methods for the innermost if statement, one using regular expressions. The regex one is marginally slower in my testing, so I've left that one commented out.
I might still not have your full requirements, so let me know if it works, and perhaps provide your full dataset.
I think you are using a delimiter, "0d0a", that also happens to appear inside the content! It's not possible to avoid getting junk data as long as the delimiter can also be part of the content. The delimiter must be unique.
Possible solutions:
Change the delimiter to something else that doesn't occur in your data (e.g. 000000 or #!.;).
If you know for certain how long each array item will be, split on that length instead. Judging by your examples, that's not possible here.
The solutions in the other answers are based only on the sample data you have shared. If you are confident about what the string will contain, they are good to use; otherwise they come with no guarantee!
Best solution: fix the delimiter, then use a regex or explode(), whichever you prefer.
Why don't you use preg_match_all() instead? That way you don't need the zero-width lookaround assertions (lookaheads and lookbehinds) that preg_split() requires to keep the delimiters in the pieces; you just find the matches you're looking for:
Updated
<?php
$string = "00787817878110d0a22278780d0a78783330d0a00";
preg_match_all('/7878.*?0d0a(?=7878|(?:(?!7878).)*$)/', $string, $arr);
print_r($arr);
?>
This gives $arr[0] => ( [0] => 787817878110d0a22278780d0a, [1] => 78783330d0a ). Leading garbage before the first 7878 is skipped, and trailing garbage after the last 0d0a is dropped as long as it contains no 7878. The lookahead (?:(?!7878).)*$ means "no further 7878 before the end of the string".
So $arr[0] is the array of values you are looking for.
Works with multiple 7878 values and multiple 0d0a values (even though that's ridiculous).
Update
If splitting is more your style, why not avoid regular expressions altogether?
<?php
$string = "787817878110d0a22278780d0a78783330d0a";
$arr = explode('0d0a7878', $string);
$string = implode('0d0a,7878', $arr);
$arr = explode(',', $string);
print_r($arr);
?>
Here we split the string on the delimiter 0d0a7878, which is the boundary CharlieGorichanaz's solution splits on (props to him for the quick, accurate solution). We then glue the pieces back together with a comma between 0d0a and 7878, because who doesn't love comma-separated values? Finally we explode on the commas to get an array of the desired values. Performance-wise, this ought to be faster than using regular expressions.
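The comma round-trip only works as long as the data never contains a comma. A sketch of the same no-regex idea that re-attaches the two halves of the delimiter instead; splitPackets() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: split on every "0d0a7878" boundary with explode(),
// then restore the end marker and start marker that explode() removed.
function splitPackets(string $s): array
{
    $parts  = explode('0d0a7878', $s);
    $last   = count($parts) - 1;
    $result = [];
    foreach ($parts as $i => $part) {
        $prefix = $i > 0 ? '7878' : '';      // start marker eaten by explode()
        $suffix = $i < $last ? '0d0a' : '';  // end marker eaten by explode()
        $result[] = $prefix . $part . $suffix;
    }
    return $result;
}

print_r(splitPackets("787817878110d0a22278780d0a78783330d0a"));
```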

php convert string with new lines into array?

I am getting data from an API and the resulting string is
[RESPONSE]
PROPERTY[STATUS][0]=ACTIVE
PROPERTY[REGISTRATIONEXPIRATIONDATE][0]=2012-04-04 19:48:48
DESCRIPTION=Command completed successfully
QUEUETIME=0
CODE=200
RUNTIME=0.352
QUEUETIME=0
RUNTIME=0.8
EOF
I am trying to convert this into an array like
Array(
['PROPERTY[STATUS][0]'] => ACTIVE,
['CODE'] => 200,
...
);
So I am trying to split the result of file_get_contents() with explode(), like
$output = explode('=',file_get_contents($url));
But the problem is that the values are not always returned in the same order, so I need something like $array['CODE'] = 200 and $array['RUNTIME'] = 0.352. However, there don't seem to be any newline characters: I tried \r\n, \n, <br> and \r\n\r\n in the explode function to no avail, even though the text shows line breaks in both Notepad and the browser.
So my question is: is there some way to determine whether a string continues on a new line, or which character is forcing the new line? If not, is there some other way I could read this into an array?
To find out what the line-break character is, you could do this (if $data contains the example string you've posted):
echo ord($data[strlen('[RESPONSE]')]) . PHP_EOL;
echo ord($data[strlen('[RESPONSE]')+1]); // if there's a second char
Then take a look in the ASCII table to see what it is.
EDIT: Then you could explode the data using that newly found character (chr() turns the code back into a character):
explode(chr($ascii_value), $data);
Btw, does file() return a correct array?
Explode on "\n" in double quotes, so PHP understands this is a line feed and not a backslash followed by n ;-) Then explode each item on =.
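A minimal sketch of that approach. The sample below is abbreviated from the question, and its \r\n line endings are an assumption; trim() strips a leftover \r, so the same code also works with plain \n line endings:

```php
<?php
// Abbreviated sample response; the \r\n line endings are assumed.
$response = "[RESPONSE]\r\nPROPERTY[STATUS][0]=ACTIVE\r\nCODE=200\r\n"
          . "RUNTIME=0.352\r\nRUNTIME=0.8\r\nEOF\r\n";

$result = [];
// "\n" in double quotes is a real line feed; trim() removes any "\r" left over.
foreach (explode("\n", $response) as $line) {
    $line = trim($line);
    if (strpos($line, '=') === false) {
        continue;                      // skip [RESPONSE], EOF and blank lines
    }
    // Limit 2: split on the first '=' only, in case a value contains '='.
    list($key, $value) = explode('=', $line, 2);
    $result[$key] = $value;            // a repeated key keeps its last value
}

print_r($result);
```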
Why not just use parse_ini_file() or parse_ini_string()?
It should do everything you need (build an array) in one easy step.
Try
preg_split("/$/m", $str)
or
preg_split("/$\n?/m", $str)
for the split
The lazy solution would be:
$response = strtr($response, "\r", "\n");
preg_match_all('#^(.+)=(.+)\s*$#m', $response, $parts);
$parts = array_combine($parts[1], $parts[2]);
Gives you:
Array (
[PROPERTY[STATUS][0]] => ACTIVE
[PROPERTY[REGISTRATIONEXPIRATIONDATE][0]] => 2012-04-04 19:48:48
[DESCRIPTION] => Command completed successfully
[QUEUETIME] => 0
[CODE] => 200
[RUNTIME] => 0.8
)
