Parsing a malformed CSV file - php

How can I parse a CSV like this one in PHP (there's a double quote near value 8)?
"03720108";"value 8"";"";"219";"03720108";"value";"value";"value";"";"";"";"";"";"";"value";"";"";"value";"value";
I tried with fgetcsv($pointer, 4096, ';', '"');

Your data seems to be malformed starting at the prior line. You have an opening quote with no closing quote.

Yes. Notice the extra quote.
"value 8"";

You might still be able to parse this string, though:
$str = '"03720108";"value 8"";"";"219";"03720108";"value";"value";"value";"";"";"";"";"";"";"value";"";"";"value";"value";';
$parsed = array_map(
    function ($str) { return substr($str, 1, -1); },
    explode(';', $str)
);
var_export($parsed);
/*
array (
0 => '03720108',
1 => 'value 8"',
2 => '',
3 => '219',
4 => '03720108',
5 => 'value',
6 => 'value',
7 => 'value',
8 => '',
9 => '',
10 => '',
11 => '',
12 => '',
13 => '',
14 => 'value',
15 => '',
16 => '',
17 => 'value',
18 => 'value',
19 => false,
)
*/
Things get more complicated if any value contains a ; character (which would normally be protected by enclosing the value in quotes... d'oh!). The code above also assumes that you only need to parse a single line, though if you read the input stream with fgets() you should be OK.
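A line-by-line version of the same idea might look like the sketch below. The helper name parseLooseLine is hypothetical, and it assumes that no value contains a ; character and that every field is wrapped in exactly one pair of quotes, with any stray quote left inside the value.

```php
<?php
// Hypothetical helper: split one malformed line on ';' and strip the
// wrapping quotes from every field, keeping stray inner quotes as data.
function parseLooseLine(string $line): array
{
    $line = rtrim($line, "\r\n");

    // Drop the single trailing delimiter so it does not yield an extra field.
    if (substr($line, -1) === ';') {
        $line = substr($line, 0, -1);
    }

    $fields = [];
    foreach (explode(';', $line) as $field) {
        // Remove one leading and one trailing double quote, nothing more.
        $fields[] = preg_replace('/^"|"$/', '', $field);
    }

    return $fields;
}

$row = parseLooseLine('"03720108";"value 8"";"";"219";');
var_export($row);
```

When reading a file, each line returned by fgets() can be passed to the helper in turn.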

Related

Regex for splitting apparel sizes

I have the following input (only for example, real input contains much more crazy data)
$values = [
'32/34, 36/38, 40/42, 44/46',
'40/42/44/46/48',
'58/60',
'39-42',
'40-50-60',
'24-25,26,28,30',
'36 40,5 44',
];
and want to split it by separators like / or , while keeping pairs of values together. A pair should only be kept if its separator does not chain more than two values (so 40/42/44/46/48 is split completely). The result should look like:
'32/34, 36/38, 40/42, 44/46'
=> [ '32/34', '36/38', '40/42', '44/46' ]
'40/42/44/46/48'
=> [ '40', '42', '44', '46', '48' ]
'58/60'
=> [ '58/60' ]
'39-42'
=> [ '39-42' ]
'40-50-60'
=> [ '40', '50', '60' ]
'24-25,26,28,30'
=> [ '24-25', '26', '28', '30' ]
'36 40,5 44'
=> [ '36', '40,5', '44' ]
What I have so far is
$separator = '^|$|[\s,\/-]';
$decimals = '\d+(?:[,.][05])?';
foreach ($values as $value) {
preg_match_all('/' .
'(?<=' . $separator . ')' .
'(?:' .
'(?P<var1>(' . $decimals . ')[\/-](?-1)|(?-1))' .
')(?=' . $separator . ')' .
'/ui', $value, $matches);
print_r($matches);
}
But this fails for 40/42/44/46/48 which returns
[var1] => Array
(
[0] => 40/42
[1] => 44/46
[2] => 48
)
But each number should be returned separately. Modifying the regex to '(?P<var1>(' . $decimals . ')([\/-])(?-2)|(?-2))(?!\3)' is better, but it still returns the wrong result:
[var1] => Array
(
[0] => 40
[1] => 42
[2] => 44
[3] => 46/48
)
What should the correct regex look like?
As stated in comments above, I know that a 100% match is not possible, because of user input. But I've found a regex which fits most of my use cases:
(?<=^|$|[\s,\/-])(?:(?P<var1>(?<![\/-])(?!(?:(\d+(?:[,.][05])?)[\/-]){2}(?-1))(\d+(?:[,.][05])?)[\/-](?-1)|(?-1)))(?=^|$|[\s,\/-])
See https://regex101.com/r/q3YSa7/1

Json encode Unicode Symbol in PHP array

Given the following array
$locationIcon = array(
'face' => 'FontAwesome',
'code' => '\uf015',
'size' => 75,
'color' => 'gray',
);
which is encoded via json_encode, I would like to have this output:
{
face: 'FontAwesome',
code: '\uf015',
size: 75,
color: 'gray'
}
but instead I get these results:
Version 1
json_encode($array)
=>
"icon":{"face":"FontAwesome","code":"\\uf2bd","size":40,"color":"gray"}
Version 2 as seen here
json_encode($array, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE)
=>
"icon" {"face":"FontAwesome","code":"\\uf2bd","size":40,"color":"gray"} (same)
Version 3 (add an escape char)
$locationIcon = array(
'face' => 'FontAwesome',
'code' => sprintf('%cuf2bd', 27),
'size' => 100,
'color' => 'gray',
);
json_encode($array)
=>
"icon" {"face":"FontAwesome","code":"\u001buf233","size":40,"color":"gray"}
Any ideas what I am doing wrong?
Well, you have written the string "backslash u f zero one five", and JSON-encoding that preserves it exactly with that meaning. There's no sane way around that. Write the actual character you want, not "\uf015". Since this particular character can be slightly awkward to write, write it in some alternative notation, like raw UTF-8 bytes:
$locationIcon = [
'code' => "\xEF\x80\x95", // U+F015
...
];
echo json_encode($locationIcon); // {"code": "\uf015", ...}
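On PHP 7.0 and later, the same character can also be written with the Unicode code point escape in a double-quoted string, which may read more clearly than raw bytes. A minimal sketch, reusing the array from the question (note that json_encode always quotes keys, so the unquoted keys in the desired output are not achievable as JSON):

```php
<?php
// "\u{f015}" is PHP 7+ double-quoted string syntax for the code point U+F015.
// A single-quoted '\uf015' would stay a literal backslash followed by "uf015".
$locationIcon = [
    'face'  => 'FontAwesome',
    'code'  => "\u{f015}",
    'size'  => 75,
    'color' => 'gray',
];

$json = json_encode($locationIcon);
echo $json, PHP_EOL;
// {"face":"FontAwesome","code":"\uf015","size":75,"color":"gray"}
```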

How can I replace some things in my array?

I need to format a date in my array, but the date isn't stored as a datetime in a database or anything like that. I cut the dates out of my server's output as plain strings.
So I need to work with preg_replace or str_replace.
What I've tried so far using str_replace:
$reverse_date = str_replace( '[', '' ,$reverse_date);
$reverse_date = str_replace( ']', '' ,$reverse_date);
$reverse_date = str_replace( '/', '.' ,$reverse_date);
but I don't want to use three lines for this.
If I print_r the result, I get: 12.Oct.2015:01:10:43 +0200
Before, it looked like this: [12/Oct/2015:00:37:29 +0200]
So this is okay, but I still don't want to use three lines for it, and I don't understand the preg_replace syntax.
I want the following output:
12.Oct.2015(space)01:10:43 +0200
As you said, you are getting the date from an array in the following format:
[12/Oct/2015:00:37:29 +0200]
So instead of using str_replace or preg_replace, you can simply use PHP's DateTime::createFromFormat function, like this:
$date = DateTime::createFromFormat("[d/M/Y:H:i:s P]","[12/Oct/2015:00:37:29 +0200]");
echo $date->format('d.M.Y H:i:s P');//12.Oct.2015 00:37:29 +02:00
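The P format character prints the offset as +02:00, while the question asks for +0200. A variant that may fit better is sketched below: it trims the brackets first, uses O for the offset, and checks for a failed parse. The variable names are illustrative only.

```php
<?php
$raw = '[12/Oct/2015:00:37:29 +0200]';

// Strip the surrounding brackets, then parse the remaining fixed layout.
$date = DateTime::createFromFormat('d/M/Y:H:i:s O', trim($raw, '[]'));

// createFromFormat returns false when the input does not match the format.
if ($date === false) {
    throw new RuntimeException('Unexpected date format: ' . $raw);
}

// "O" prints the UTC offset without a colon.
$formatted = $date->format('d.M.Y H:i:s O');
echo $formatted, PHP_EOL; // 12.Oct.2015 00:37:29 +0200
```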
Use date_parse to disassemble the date and combine the parts to form your needed result:
[40] boris> $date_array = date_parse(" [12/Oct/2015:00:37:29 +0200] ");
// array(
// 'year' => 2015,
// 'month' => 10,
// 'day' => 12,
// 'hour' => 0,
// 'minute' => 37,
// 'second' => 29,
// 'fraction' => 0,
// 'warning_count' => 0,
// 'warnings' => array(
//
// ),
// 'error_count' => 2,
// 'errors' => array(
// 0 => 'Unexpected character',
// 27 => 'Unexpected character'
// ),
// 'is_localtime' => true,
// 'zone_type' => 1,
// 'zone' => -120,
// 'is_dst' => false
// )
You don't get the month as an abbreviated string, but that is trivial to add via an associative array (array(1 => 'Jan', ..., 12 => 'Dec')), and you are on the safe side concerning the date parsing and future changes in your needs.
OK, I found out how to do it with preg_replace in one line. However, I like Uchiha's answer with the date format more; even though it does not use a regex, it is probably the best way to go.
echo preg_replace(['~(?<=\d{4}):~', '~\[~', '~\]~', '~/~'], [' ', '', '', '.'], '[12/Oct/2015:00:37:29 +0200]');
12.Oct.2015 00:37:29 +0200
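For the original three-line str_replace attempt, it may help to know that str_replace accepts arrays of search and replacement strings, so the three calls collapse into one. A minimal sketch, assuming the input always has the bracketed layout shown in the question; the separating colon is handled by limiting preg_replace to a single replacement:

```php
<?php
$raw = '[12/Oct/2015:00:37:29 +0200]';

// One call: drop both brackets and turn slashes into dots.
$clean = str_replace(['[', ']', '/'], ['', '', '.'], $raw);

// Replace only the first colon (between date and time) with a space;
// the fourth argument limits preg_replace to one replacement.
$clean = preg_replace('/:/', ' ', $clean, 1);

echo $clean, PHP_EOL; // 12.Oct.2015 00:37:29 +0200
```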

php preg_match_all multiple patterns

[-27439367, 160818667, 'http:\/\/cs13110.vk.me\/u109515688\/video\/l_97403fde.jpg', 'Super Bass', '', '0', 38674081, 37, 0, '2:34', '3', '_8cb245a336c2e35049', '']
Hello! Here is my sample text. I need to use preg_match with multiple patterns to find:
1. -27439367
2. 165375317
3. http://cs6067.vk.me/u189929178/video/l_02613a05.jpg
4. Super Bass
5. 0
6. 38674081
7. 37
8. 2:34
9. 0
10. 3
11. _8cb245a336c2e35049
I used:
preg_match_all("/[(.*?), (.*?), '(.*?)', '(.*?)', '', '0', 0, 23, 0, '', '0', '(.*?)', '']/mis", $a, $hashtweet);
Here you go:
$json = "[-27439367, 160818667, 'http:\/\/cs13110.vk.me\/u109515688\/video\/l_97403fde.jpg', 'Super Bass', '', '0', 38674081, 37, 0, '2:34', '3', '_8cb245a336c2e35049', '']" ;
$json = preg_replace("/'/", '"', $json); //Replace single quotes by double quotes
$obj = json_decode($json);
var_dump($obj);
array (size=13)
0 => int -27439367
1 => int 160818667
2 => string 'http://cs13110.vk.me/u109515688/video/l_97403fde.jpg' (length=52)
3 => string 'Super Bass' (length=10)
4 => string '' (length=0)
5 => string '0' (length=1)
6 => int 38674081
7 => int 37
8 => int 0
9 => string '2:34' (length=4)
10 => string '3' (length=1)
11 => string '_8cb245a336c2e35049' (length=19)
12 => string '' (length=0)
Maybe strip the [] brackets and do explode(',', $input) (docs)?
Another idea: once the single quotes are replaced with double quotes, this is valid JSON, so json_decode (docs) should do the trick.
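The json_decode route could be sketched as follows, with an error check added. It assumes that no value contains an apostrophe or a double quote, since the blanket quote replacement would corrupt such values.

```php
<?php
// Sample input from the question, kept verbatim in a nowdoc.
$raw = <<<'TXT'
[-27439367, 160818667, 'http:\/\/cs13110.vk.me\/u109515688\/video\/l_97403fde.jpg', 'Super Bass', '', '0', 38674081, 37, 0, '2:34', '3', '_8cb245a336c2e35049', '']
TXT;

// JSON strings need double quotes, so swap the single quotes first.
$json = str_replace("'", '"', $raw);

$fields = json_decode($json, true);
if (json_last_error() !== JSON_ERROR_NONE) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// The escaped slashes (\/) are decoded to plain slashes.
echo $fields[2], PHP_EOL;
```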

My PHP code serializes, but doesn't unserialize

This is my code:
$data = array(
'24 Jan|8:30' => '12.6',
'22 Feb|8:30' => '250',
'11 Mar|8:10' => '0',
'31 Apr|23:30' => '7',
'32 Apr|23:30' => '80',
'33 Apr|23:30' => '67',
'34 r|23:30' => '45',
'35 Ap|23:30' => '66',
'34 Lr|23:30' => '23',
'3 Apr|23:30' => '23'
);
//echo serialize($data);
$x = unserialize('a:10:{s:12:"24 Jan|8:30 ";s:4:"12.6";s:12:"22 Feb|8:30 ";s:3:"250";s:12:"11 Mar|8:10 ";s:1:"0";s:12:"31 Apr|23:30";s:1:"7";s:12:"32 Apr|23:30";s:2:"80";s:12:"33 Apr|23:30";s:2:"67";s:12:"34 r|23:30 ";s:2:"45";s:12:"35 Ap|23:30 ";s:2:"66";s:12:"34 Lr|23:30 ";s:2:"23";s:12:"3 Apr|23:30 ";s:2:"23";}');
var_dump($x);
The unserialize function does not work on this string.
Please help!
The serialized representation of $data and the string you are trying to unserialize differ.
http://codepad.viper-7.com/3zlk1a
At offset 199 you see
s:12:"34 r|23:30 "
but the string (s) isn't 12 characters long (that's what s:12: means). I guess something modified the serialized string directly. Just don't do that :) Always unserialize and work with the structured values.
'a:10:{s:12:"24 Jan|8:30 ";s:4:"12.6";s:12:"22 Feb|8:30 ";s:3:"250";s:12:"11 Mar|8:10 ";s:1:"0";s:12:"31 Apr|23:30";s:1:"7";s:12:"32 Apr|23:30";s:2:"80";s:12:"33 Apr|23:30";s:2:"67";s:12:"34 r|23:30 ";s:2:"45";s:12:"35 Ap|23:30 ";s:2:"66";s:12:"34 Lr|23:30 ";s:2:"23";s:12:"3 Apr|23:30 ";s:2:"23";}'
...is not a valid serialization. Specifically, the s:12:"34 r|23:30 "; segment indicates that the string 34 r|23:30 contains 12 characters, which it does not.
$a = serialize($data);
$x = unserialize($a);
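The length check can be seen on a one-element array. The sketch below uses a reduced, illustrative version of the data: the correctly serialized string round-trips, while a hand-edited copy whose declared length (12) no longer matches the key makes unserialize return false.

```php
<?php
$data = ['34 r|23:30' => '45'];

// serialize() records the real byte length of every string (10 here).
$serialized = serialize($data);
echo $serialized, PHP_EOL; // a:1:{s:10:"34 r|23:30";s:2:"45";}

$roundTrip = unserialize($serialized);

// Hand-edited copy: a trailing space was added and the length set to 12,
// but the key is only 11 bytes long, so the parser cannot find the
// closing quote where it expects it.
$broken = 'a:1:{s:12:"34 r|23:30 ";s:2:"45";}';
$result = @unserialize($broken); // @ hides the notice/warning PHP emits

var_dump($result); // bool(false)
```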
