Dealing with commas in CSV - php

I get CSV data from a SOAP call in PHP. Unfortunately, the data may have commas in it. It is formatted correctly, as in
1,name,2,lariat,3,"first, last",5,NMEA,...
I need to parse it to individual values in either php or javascript. I have browsed through threads on stack overflow and elsewhere but have not found a specific solution in php / javascript.
The approach I am currently using is
$subject = '123,name,456,lryyrt,123213,"first,last",8585,namea3';
$pattern = '/,|,"/';
$t2=preg_replace ('/,|(".*")/','$0*',$subject);
$t2=str_replace(',','*',$t2);
$t2=str_replace('*',',',$t2);
Where * is the delimiter, but the preg_replace generates an extra *. I have tried a couple of other approaches involving preg_match and other preg_ functions but did not succeed in getting a clean split.
Any suggestion on how to split up CSV data that contains commas in it?

Don't attempt to do this with a regular expression. Just use str_getcsv()! The third parameter informs str_getcsv() to look for quote-enclosed fields.
$subject = '123,name,456,lryyrt,123213,"first,last",8585,namea3';
$array = str_getcsv($subject, ",", '"');
print_r($array);
// Prints:
Array
(
[0] => 123
[1] => name
[2] => 456
[3] => lryyrt
[4] => 123213
[5] => first,last
[6] => 8585
[7] => namea3
)
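If the SOAP payload holds several CSV lines, or a quoted field contains a line break, one possible approach is to stream the string through fgetcsv() instead of splitting it by hand. This is only a sketch: the helper name parse_csv_string and the sample data are made up for illustration, and it assumes the same comma separator and double-quote enclosure as above.

```php
<?php
// Hypothetical helper (not from the original answer): parse a multi-line
// CSV string by streaming it through fgetcsv() on an in-memory stream.
function parse_csv_string(string $csv): array
{
    $handle = fopen('php://temp', 'r+');
    fwrite($handle, $csv);
    rewind($handle);

    $rows = [];
    // Delimiter, enclosure and escape character are passed explicitly.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() reports a blank line as [null]
        }
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}

// Made-up sample data: one quoted comma, one quoted line break.
$rows = parse_csv_string("1,name,\"first, last\"\n2,lariat,\"line one\nline two\"\n");
print_r($rows);
```

Each element of $rows is one record, so the quoted comma and the quoted line break both stay inside their field.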

Just another way to convert a csv file to an associative array.
<?php
//
// Convert csv file to associative array:
//
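function csv_to_array($input, $delimiter=',')
{
$header = null;
$data = array();
// Split on any line ending; this assumes no quoted field contains a line break.
$csvData = preg_split('/\R/', trim($input));
foreach($csvData as $csvLine){
// Parse with str_getcsv() rather than explode() so quoted delimiters stay inside their field.
$items = str_getcsv($csvLine, $delimiter, '"', '\\');
if(is_null($header)) $header = $items;
else{
$prepareData = array(); // reset so values from the previous row cannot leak in
for($n = 0, $m = count($header); $n < $m; $n++){
$prepareData[$header[$n]] = isset($items[$n]) ? $items[$n] : null;
}
$data[] = $prepareData;
}
}
return $data;
}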
//-----------------------------------
//
//Usage:
$csvArr = csv_to_array(file_get_contents('test.csv'));
?>

For JavaScript use jQuery-CSV
If you're already using jQuery, just add the jquery-csv.js module to expose the extension methods. Then just convert the CSV directly to a JavaScript array.
If you're only parsing a single-line CSV string, the following will convert it to a one-dimensional array:
$.csv.toArray(csv);
If you have a multi-line CSV string the following will convert it to a two-dimensional array:
$.csv.toArrays(csv);
Note: all the different line endings are detected and split correctly using a smart regex.
The default delimiter is a double-quote (") and the default separator is a comma (,), but you can use custom settings by specifying them in the method call.
Ex:
$.csv.toArray(csv, {
separator:';',
delimiter:"'"
});
I created the project to provide an end-to-end CSV parser written in JavaScript that takes the guesswork out of importing-exporting CSV.
Doing the heavy lifting on the client removes unnecessary load on the server and removes any unnecessary AJAX round-trips to the server.
Update:
If you're looking for a server-side solution, the library also works on Node.js.

Related

Converting visually formatted plain text output to array

I am using Virtualmin and testing out the Remote API PHP features for a project. I can't seem to find any good documentation on this so I am testing out different functions randomly.
Making an API call is done like this:
<?php
$result = shell_exec("wget -O - --quiet --http-user=root --http-passwd=pass --no-check-certificate 'https://localhost:10000/virtual-server/remote.cgi?program=list-users&domain=testdomain.co.uk'");
echo $result;
?>
The issue I am facing is the result output is plain text formatted in a way which is easily readable. I have provided an example image below. I have tried using explode to create an array which is easily manageable but I cannot explode by blank space as parts such as "real name" or the actual real names have spaces? This also seems to output tens of empty array parts [1] => [2] => [3] => [4] => [5] =>
This is what I currently have which is providing numerous empty array values.
$lines = explode("\n", $result);
$out = array();
foreach ($lines as $line) {
$parts = explode(" ", $line);
}
Ideally I would like the array to work like a multidimensional array.

Obtaining PHP regex matches but unable to do anything with them

I have some PHP code that accepts an uploaded file from an HTML form then reads through it using regex to look for specific lines (in the case below, those with "Track Number" followed by an integer).
The file is an XML file that looks like this normally...
<key>Disc Number</key><integer>2</integer>
<key>Disc Count</key><integer>2</integer>
<key>Track Number</key><integer>1</integer>
But when PHP reads it in it gets rid of the XML tags for some reason, leaving me with just...
Disc Number2
Disc Count2
Track Number1
The file has to be XML, and I don't want to use SimpleXML because that's a whole other headache. The regex matches the integers like I want it to (I can print them out "0","1","2"...) but of course they're returned as strings in $matches, and it seems I'm unable to make use of these strings. I need to check if the integer is between 0 and 9 but I am unable to do this no matter what I try.
Using intval() or (int) to first convert the matches to integers always returns 0 even though the given string contains only integers. And using in_array to compare the integer to an array of 0-9 as strings always returns false as well for some reason. Here's the trouble code...
$myFile = file($myFileTmp, FILE_IGNORE_NEW_LINES);
$numLines = count($myFile) - 1;
$matches = array();
$nums = array('0','1','2','3','4','5','6','7','8','9');
for ($i=0; $i < $numLines; $i++) {
$line = trim($myFile[$i]);
$numberMatch = preg_match('/Track Number(.*)/', $line, $matches); // if I try matching integers specifically it doesn't return a match at all, only if I do it like this - it gives me the track number I want but I can't do anything with it
if ($numberMatch == 1 and ctype_space($matches[1]) == False) {
$number = trim($matches[1]); // string containing an integer only
echo(intval($number)); // conversion doesn't work - returns 0 regardless
if (in_array($number,$nums)===True) { // searching in array doesn't work - returns FALSE regardless
$number = "0" . $number;
}
}
}
I've tried type checking, double quotes, single quotes, trimming whitespace, UTF8 encoding, === operator, regex matching numbers specifically with (\d+) (which doesn't return a match at all)...what else could it possibly be? When I try these things with regular strings it works fine, but the regex is messing everything up here. I'm about to give up on this app entirely, please save me.
Why is SimpleXML not an option? Consider the following code:
$str = "<container><key>Disc Number</key><integer>2</integer>
<key>Disc Count</key><integer>2</integer>
<key>Track Number</key><integer>1</integer></container>";
$xml = simplexml_load_string($str);
foreach ($xml->key as $k) {
// do sth. here with it
}
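As a sketch of what "do something with it" could look like, the loop below pairs every key with the integer element that follows it and then zero-pads a single-digit track number. It assumes the same container wrapper as above and that keys and values strictly alternate; the variable names $values, $currentKey and $padded are illustrative only.

```php
<?php
$str = "<container><key>Disc Number</key><integer>2</integer>
<key>Disc Count</key><integer>2</integer>
<key>Track Number</key><integer>1</integer></container>";

$xml = simplexml_load_string($str);

$values = [];
$currentKey = null;
// children() walks all child elements in document order.
foreach ($xml->children() as $node) {
    if ($node->getName() === 'key') {
        $currentKey = (string) $node;        // remember the key text
    } elseif ($currentKey !== null) {
        $values[$currentKey] = (int) $node;  // cast the element's text to int
        $currentKey = null;
    }
}

$track = $values['Track Number'];
// Prefix single-digit track numbers with a zero, as the question wants.
$padded = ($track >= 0 && $track <= 9) ? '0' . $track : (string) $track;
echo $padded, "\n";
```

Because the value is read from the parsed element rather than from echoed output, the cast to int behaves as expected.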
You should read RegEx match open tags except XHTML self-contained tags. While it doesn't exactly match your use case, it gives good reasons why one should use something other than straight-up regexp matching here.
Assuming that files only contain a single Track Number you can simplify what you're doing a lot. See the following:
test.xml
<key>Disc Number</key><integer>2</integer>
<key>Disc Count</key><integer>2</integer>
<key>Track Number</key><integer>1</integer>
test.php
<?php
$contents = file_get_contents('test.xml');
$result = preg_match_all("/<key>Track Number<\/key><integer>(\d+)<\/integer>/", $contents, $matches);
if ($result > 0) {
print_r($matches);
$trackNumber = (int) $matches[1][0];
print gettype($trackNumber) . " - " . $trackNumber;
}
Result
$ php -f test.php
Array
(
[0] => Array
(
[0] => <key>Track Number</key><integer>1</integer>
)
[1] => Array
(
[0] => 1
)
)
integer - 1
As you can see, there is no need to iterate through the file line by line when using preg_match_all. The matching here is very specific, so you don't have to do extra checks for whitespace or validate that it's a number, which is what you're currently doing against a string value.

Efficient way to parse this string into array in PHP?

Background
I have an array which I create by splitting a string based on every occurrence of 0d0a using preg_split('/(?<=0d0a)(?!$)/').
For example:
$string = "78781110d0a78782220d0a";
will be split into:
Array ( [0] => 78781110d0a [1] => 78782220d0a )
A valid array element has to start with 7878 and end with 0d0a.
The Problem
But sometimes, there's an additional 0d0a in the string which splits into an extra and invalid array element, i.e., that doesn't begin with 7878.
Take this string for example:
$string = "78781110d0a2220d0a78783330d0a";
This is split into:
Array ( [0] => 78781110d0a [1] => 2220d0a [2] => 78783330d0a )
But it should actually be:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a)
My Solution
I've written the following (messy) code to get around this:
$data = Array('78781110d0a','2220d0a','78783330d0a');
$i = 0; //count for $data array;
$j = 0; //count for $dataFixed array;
$dataFixed = $data;
foreach($data as $packet) {
if (substr($packet,0,4) != "7878") { //if packet doesn't start with 7878, do some fixing
if ($i != 0) { //only fix when this isn't the first packet; the first one can't be helped
$j++;
if ((substr(strtolower($packet), -4, 4) == "0d0a")) { //only merge packets that end with 0d0a; anything else is 'mostly' not valid, so it gets discarded
$dataFixed[$i-$j] = $dataFixed[$i-$j] . $packet;
}
unset($dataFixed[$i-$j+1]);
$dataFixed = array_values($dataFixed);
}
}
$i++;
}
Description
I first copy the array to another array, $dataFixed. In a foreach loop over the $data array, I check whether each element starts with 7878. If it doesn't, I join it with the previous element. I then unset the current element in $dataFixed and reindex the array with array_values.
But I'm not very confident about this solution. Is there a better, more efficient way?
UPDATE
What if the input string doesn't end in 0d0a like it's supposed to? It will stick to the previous array element.
For example, in the string 78781110d0a2220d0a78783330d0a0000, 0000 should be separated as another array element.
Add a positive lookahead (?=7878) after the lookbehind to form:
preg_split('/(?<=0d0a)(?=7878)/',$string)
Note: I removed (?!$) because I wasn't sure what that was for, based on your example data.
For example, this code:
$string = "78781110d0a2220d0a78783330d0a";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
print_r($array);
Results in:
Array ( [0] => 78781110d0a2220d0a [1] => 78783330d0a )
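To make the effect of the combined lookbehind and lookahead easier to check, here is a small sketch that wraps the split in a function and flags any element that does not match the expected 7878...0d0a shape. The helper names split_packets and is_valid_packet are hypothetical and not part of the answer.

```php
<?php
// Hypothetical helpers; the names are illustrative, not from the answer.
function split_packets(string $stream): array
{
    // Split only where "0d0a" is immediately followed by "7878".
    return preg_split('/(?<=0d0a)(?=7878)/', $stream);
}

function is_valid_packet(string $packet): bool
{
    // A valid packet starts with 7878 and ends with 0d0a.
    return strncmp($packet, '7878', 4) === 0 && substr($packet, -4) === '0d0a';
}

// Sample taken from the question's update: trailing "0000" after the last packet.
$packets = split_packets('78781110d0a2220d0a78783330d0a0000');
foreach ($packets as $packet) {
    echo $packet, is_valid_packet($packet) ? " ok\n" : " needs attention\n";
}
```

With trailing characters the last element fails the check, which is the case the question's update asks about.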
UPDATE:
Based on your revised question of having possible random characters at the end of the input string, you can add three lines to make a complete program of:
$string = "78781110d0a2220d0a787830d0a330d0a0000";
$array = preg_split('/(?<=0d0a)(?=7878)/',$string);
$temp = preg_split('/(7878.*0d0a)/',$array[count($array)-1],-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$array[count($array)-1] = $temp[0];
if(count($temp)>1) { $array[] = $temp[1]; }
print_r($array);
We basically do the initial splitting, then split the last element of the resulting array by the expected data format, keeping the delimiter using PREG_SPLIT_DELIM_CAPTURE. The PREG_SPLIT_NO_EMPTY ensures we won't get an empty array element if the input string doesn't end in random characters.
UPDATE 2:
Based on your comment below where it seems you're implying there might be random characters between any of the desired matches, and you want these random characters preserved, you could do this:
$string = "0078781110d0a2220d0a2220d0a0000787830d0a330d0a000078781110d0a2220d0a0000787830d0a330d0a0000";
$split1 = preg_split('/(7878.*?0d0a)/',$string,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
$result = array();
foreach($split1 as $e){
$split2 = preg_split('/(.*0d0a)/',$e,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
foreach($split2 as $el){
// test if $el doesn't start with 7878 and ends with 0d0a
if(strpos($el,'7878') !== 0 && substr($el,-4) == '0d0a'){
//if(preg_match('/^(?!7878).*0d0a$/',$el) === 1){
$result[ count($result)-1 ] = $result[ count($result)-1 ] . $el;
} else {
$result[] = $el;
}
}
}
print_r($result);
The strategy employed here is different than above. First we split the input string based on the delimiter that matches your desired data, using the nongreedy regex .*?. At this point we have some strings that contain the ending of a desired value and some garbage at the end, so we split again based on the last occurrence of "0d0a" with the greedy regex .*0d0a. We then append any of those resulting values that don't start with "7878" but end with "0d0a" to the previous value, as this should repair the first and second halves that got split because it contained an extra "0d0a".
I provided two methods for the innermost if statement, one using regular expressions. The regex one is marginally slower in my testing, so I've left that one commented out.
I might still not have your full requirements, so you'll have to let me know if it works and perhaps provide your full dataset.
I think you are using a delimiter, "0d0a", which also happens to be part of the content! It's not possible to avoid getting junk data as long as the delimiter can also appear in the content. The delimiter must somehow be unique.
Possible solutions:
Change the delimiter to something that doesn't occur in your data (000000, #!.;).
If you know the exact length each array item must have, use that. Judging by your examples, that isn't possible.
The solutions in the other answers only consider the sample data you have shared. If you are confident about what the string will contain, they are pretty good to use. Otherwise they offer no guarantee!
Best solution: fix the delimiter first, then use regex or explode, whichever you prefer.
Why don't you use preg_match_all instead? You can avoid the zero-width assertions (the lookaheads and lookbehinds) that are needed to split the string (without them, splitting removes the matched delimiters), and just find the matches you're looking for:
Updated
<?php
$string = "00787817878110d0a22278780d0a78783330d0a00";
preg_match_all('/7878.*?0d0a(?=7878|[^(7878)]*?$)/', $string, $arr);
print_r($arr);
?>
Gives an array $arr[0] => ( [0] => 787817878110d0a22278780d0a, [1] => 78783330d0a ). Strips leading and trailing garbage characters (whatever doesn't start with 7878 or end with 0d0a).
So $arr[0] would be the array of values that you are looking for.
See example on ideone
Works with multiple 7878 values and multiple 0d0a values (even though that's ridiculous).
Update
If splitting is more your style, why not avoid regular expressions altogether?
<?php
$string = "787817878110d0a22278780d0a78783330d0a";
$arr = explode('0d0a7878', $string);
$string = implode('0d0a,7878', $arr);
$arr = explode(',', $string);
print_r($arr);
?>
Here we split the string by the delimiter 0d0a7878, which is what @CharlieGorichanaz's solution is doing, and props to him for the quick, accurate solution. We then add a comma, because who doesn't love comma separated values? And we explode again on the commas for an array of desired values. Performance-wise, this ought to be faster than using regular expressions. See example.

php preg_grep and umlaut/accent

I have an array that consists of terms, some of them contain accented characters. I do a preg grep like this
$data= array('Napoléon','Café');
$result = preg_grep('~' . $input . '~i', $data);
So if the user types in 'le', I would also want 'Napoléon' to be matched, which does not work with the above command.
I did some searching and found that this function might be relevant
preg_match("/[\w\pL]/u",$var);
How can I combine these and make it work?
This is not possible with a regular expression pattern alone, because you cannot tell the regex engine to match "e" and all of its accented variants. However, it is possible to first normalize the data (both the array and the search input), search the normalized data, and return the results from the original, non-normalized data.
In the following example I use transliteration to do this kind of normalization, I guess that is what you're looking for:
$data = ['Napoléon', 'Café'];
$result = array_translit_search('le', $data);
print_r($result);
$result = array_translit_search('leó', $data);
print_r($result);
The exemplary output is:
Array
(
[0] => Napoléon
)
Array
(
[0] => Napoléon
)
The search function itself is rather straightforward, as described above: transliterate the inputs, run preg_grep, and then return the matching original inputs:
/**
* @param string $search
* @param array $data
* @return array
*/
function array_translit_search($search, array $data) {
$transliterator = Transliterator::create('ASCII-Latin', Transliterator::REVERSE);
$normalize = function ($string) use ($transliterator) {
return $transliterator->transliterate($string);
};
$dataTrans = array_map($normalize, $data);
$searchTrans = $normalize($search);
$pattern = sprintf('/%s/i', preg_quote($searchTrans, '/'));
$result = preg_grep($pattern, $dataTrans);
return array_intersect_key($data, $result);
}
This code requires the Transliterator from the Intl extension, you can replace it with any other similar transliteration or translation function.
By the way, I can't recommend str_replace here; if you need to fall back to a translation table, use strtr instead. But I prefer a library that ships its own transliteration rules, especially if it's the Intl lib; you normally can't beat it.
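For setups without the Intl extension, the strtr fallback mentioned above could look roughly like the sketch below. The function names fold_accents and accent_insensitive_search and the translation table are assumptions made for illustration; the table covers only a handful of lowercase characters and would need extending for real data.

```php
<?php
// Hypothetical fallback: fold a few accented characters with strtr().
// The table is deliberately small and only an example.
function fold_accents(string $text): string
{
    static $map = [
        'é' => 'e', 'è' => 'e', 'ê' => 'e', 'ë' => 'e',
        'á' => 'a', 'à' => 'a', 'â' => 'a', 'ä' => 'a',
        'ó' => 'o', 'ô' => 'o', 'ö' => 'o',
        'ú' => 'u', 'ü' => 'u', 'ç' => 'c',
    ];
    return strtr($text, $map);
}

function accent_insensitive_search(string $search, array $data): array
{
    $needle = fold_accents($search);
    // array_filter() preserves the original keys and the original values.
    return array_filter($data, function ($item) use ($needle) {
        return stripos(fold_accents($item), $needle) !== false;
    });
}

print_r(accent_insensitive_search('le', ['Napoléon', 'Café']));
```

The comparison runs on the folded copies, while the returned array still holds the original, accented strings.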
You can write something like this:
$data = array('Napoléon','Café');
// do something with your input, but for testing purposes it will be simply as you wrote in your example
$input = 'le';
foreach($data as $var) {
if (preg_match("/".str_replace(array("é"....), array("e"....), $input)."/i", str_replace(array("é"....), array("e"....), $var)))
//do something as there is a match
}
Actually, you don't even need a regex in this case; a simple strpos (or stripos, for a case-insensitive match) will be enough.

php convert string with new lines into array?

I am getting data from an API and the resulting string is
[RESPONSE]
PROPERTY[STATUS][0]=ACTIVE
PROPERTY[REGISTRATIONEXPIRATIONDATE][0]=2012-04-04 19:48:48
DESCRIPTION=Command completed successfully
QUEUETIME=0
CODE=200
RUNTIME=0.352
QUEUETIME=0
RUNTIME=0.8
EOF
I am trying to convert this into an array like
Array(
['PROPERTY[STATUS][0]'] => ACTIVE,
['CODE'] => 200,
...
);
So I am trying to split the result of the file_get_contents function with an explode like
$output = explode('=',file_get_contents($url));
But the problem is the values are not always returned in the same order, so I need to have it like $array['CODE'] = 200 and $array['RUNTIME'] = 0.352. However, there do not seem to be any newline characters. I tried \r\n, \n, <br>, \r\n\r\n in the explode function to no avail. But there are new lines in both Notepad and the browser.
So my question is there some way to determine if a string is on a new line or determine what the character forcing the new line is? If not is there some other way I could read this into an array?
To find out what the breaking character is, you could do this (if $data contains the string example you've posted):
echo ord($data[strlen('[RESPONSE]')]) . PHP_EOL;
echo ord($data[strlen('[RESPONSE]')+1]); // if there's a second char
Then take a look in the ASCII table to see what it is.
EDIT: Then you could explode the data using that newly found character:
explode(chr($ascii_value), $data);
Btw, does file() return a correct array?
Explode on "\n" with double quotes so PHP understands this is a line feed and not a backslashed n ;-) then explode each item on =
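A sketch of that two-step split follows. It swaps the plain explode on "\n" for preg_split with \R so that \r\n and bare \r line endings are handled too, and it skips lines without an = sign such as [RESPONSE] and EOF. The function name parse_response is made up for illustration.

```php
<?php
// Hypothetical helper: turn "KEY=VALUE" lines into an associative array.
function parse_response(string $response): array
{
    $result = [];
    // \R matches \n, \r\n and \r, so the exact line ending does not matter.
    foreach (preg_split('/\R/', $response) as $line) {
        $line = trim($line);
        if ($line === '' || strpos($line, '=') === false) {
            continue; // skips markers such as [RESPONSE] and EOF
        }
        // Limit of 2 keeps any "=" inside the value intact.
        list($key, $value) = explode('=', $line, 2);
        $result[$key] = $value; // a repeated key keeps its last value
    }
    return $result;
}

$data = "[RESPONSE]\r\nPROPERTY[STATUS][0]=ACTIVE\r\nCODE=200\r\nRUNTIME=0.352\r\nRUNTIME=0.8\r\nEOF\r\n";
$array = parse_response($data);
echo $array['CODE'], "\n";
```

All values come back as strings, so cast them where a number is needed.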
Why not just use parse_ini_file() or parse_ini_string()?
It should do everything you need (build an array) in one easy step.
Try
preg_split("/$/m", $str)
or
preg_split("/$\n?/m", $str)
for the split
The lazy solution would be:
$response = strtr($response, "\r", "\n");
preg_match_all('#^(.+)=(.+)\s*$#m', $response, $parts);
$parts = array_combine($parts[1], $parts[2]);
Gives you:
Array (
[PROPERTY[STATUS][0]] => ACTIVE
[PROPERTY[REGISTRATIONEXPIRATIONDATE][0]] => 2012-04-04 19:48:48
[DESCRIPTION] => Command completed successfully
[QUEUETIME] => 0
[CODE] => 200
[RUNTIME] => 0.8
)
