splitting string into php array


I have an array that outputs the following:
Array
(
[0] => #EXTM3U
[1] => #EXTINF:206,"Weird" Al Yankovic - Dare to be Stupid
[2] => E:\Dare to be Stupid.mp3
[3] => #EXTINF:156,1910 Fruitgum Company - Chewy, Chewy
[4] => E:\Chewy Chewy.mp3
[5] => #EXTINF:134,1910 Fruitgum Company - Goody Goody Gumdrops
[6] => E:\Goody Goody Gumdrops.mp3
[7] => #EXTINF:134,1910 Fruitgum Company - Simon Says
[8] => E:\Simon Says.mp3
[9] => #EXTINF:255,3 Doors Down - When I'm Gone
[10] => E:\When I'm Gone.mp3
[11] => #EXTINF:179,? And the Mysterians - 96 Tears
)
I need to split this array then loop through and save each value to the database, e.g:
"Weird" Al Yankovic - Dare to be Stupid
Fruitgum Company - Chewy, Chewy
Save each value above to database individually.
Thanks in advance!
Edit: Added from the comments
Let me try and explain in more detail. I start with a string that looks like this:
#EXTM3U
#EXTINF:266,10cc - Dreadlock Holiday
D:\Music - Sorted\Various Artists\De Beste Pop Klassiekers Disc 1\10cc - Dreadlock Holiday.mp3
#EXTINF:263,1919 - Cry Wolf
D:\Music - Sorted\Various Artists\Gothic Rock, Vol. 3 Disc 2\1919 - Cry Wolf.mp3
#EXTINF:318,3 Doors Down - [Untitled Hidden Track]
D:\Music - Sorted\3 Doors Down\Away From The Sun\3 Doors Down - [Untitled Hidden Track].mp3
I'm then trying to strip everything else out of this and keep just an array of track titles; this is a playlist file for online radio. What I am doing so far:
$finaloutput = $_POST['thestring'];
$finaloutput = str_replace('#EXTINF:','',$finaloutput);
$finaloutput = str_replace('#EXTM3U','',$finaloutput);
$finaloutput = preg_split("/\r\n|\r|\n/", $finaloutput);
foreach ($finaloutput as $value) {
echo $value; echo '<br>';
}
But I still have rows like the one below remaining. I need to remove everything between a line break and the trailing .mp3:
D:\Music - Sorted\3 Doors Down\Away From The Sun\3 Doors Down - [Untitled Hidden Track].mp3

You can extract the relevant parts from the source string by using preg_match_all with a regex like this:
$pattern = '/^#[^,\n]+,\K.*/m';
^ anchors the match at the start of a line; the m flag enables multiline mode.
#[^,\n]+, matches the part from # up to the first comma by means of a negated character class.
\K resets the beginning of the reported match, so the preceding part is discarded.
.* is the part to be extracted: any number of characters up to the end of the line.
// $finaloutput must be the raw source string here, before any str_replace calls
if (preg_match_all($pattern, $finaloutput, $out) > 0) {
    print_r($out[0]);
}
PHP demo at eval.in
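For anyone who wants to try this end to end, here is a minimal sketch. The $playlist sample and the $titles variable are made up for illustration, and the pattern is a small variation on the one above: it excludes \r as well as \n, on the assumption that the posted text may contain Windows line endings, which .* would otherwise carry into each title.

```php
<?php
// Hypothetical sample; in the question this text comes from $_POST['thestring'].
$playlist = "#EXTM3U\r\n"
    . "#EXTINF:266,10cc - Dreadlock Holiday\r\n"
    . "D:\\Music\\10cc - Dreadlock Holiday.mp3\r\n"
    . "#EXTINF:263,1919 - Cry Wolf\r\n"
    . "D:\\Music\\1919 - Cry Wolf.mp3\r\n";

// \K discards the "#EXTINF:<seconds>," prefix; [^\r\n]* stops before any line break.
$pattern = '/^#[^,\r\n]+,\K[^\r\n]*/m';

$titles = [];
if (preg_match_all($pattern, $playlist, $out) > 0) {
    $titles = $out[0];
}

// Each element is now one track title, ready to be saved individually.
print_r($titles);
```

The #EXTM3U header and the file path lines never match, because the first has no comma and the others do not start with #.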

Related

Regex: match ":" and "-" but don't match "I-"

Ohare:Montrose:I-290 Circle:IL:IL
Ohare-Montrose-I_290-Circle-IL-IL
EB:Kennedy Expy:O'Hare:IL-43 (Harlem Ave):IL:IL
NB:I-894/US-45:Hale Interchange:Zoo Interchange:WI:IL
NB
I-894/US-45
Hale
Interchange
Zoo Interchange
WI
IL
WB:Indiana-East-West:Eastpoint:Middlebury:IN:25:IL
WB
Indiana-East-West
Eastpoint
Middlebury
IN
25
IL
Trying to extract words from two different sources that use different conventions.
Using regex for that, I cannot create one regex that deals with both options.
If I try to extract using : or - then the first one gets extracted as
Ohare, Montrose, I, 290 Circle, IL, IL
How can I get a regex to split on : or - but ignore I- or ignore 'IL-', 'US-', 'Indiana-East-West' and many other that I may find?

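You can use this regex, which relies on the (*SKIP)(*F) backtracking verbs to skip over the prefixes that must stay intact and then splits on : or - everywhere else: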
(?:(?:IL?|US)-|Indiana-East-West)(*SKIP)(*F)|[:-]
Example Code:
$s = 'NB:I-894/US-45:Hale Interchange:Zoo Interchange:WI:IL';
print_r(preg_split('/(?:(?:IL?|US)-|Indiana-East-West)(*SKIP)(*F)|[:-]/' , $s));
Array
(
[0] => NB
[1] => I-894/US-45
[2] => Hale Interchange
[3] => Zoo Interchange
[4] => WI
[5] => IL
)
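A quick sketch that runs the same pattern over three of the sample strings from the question (the variable names are only for illustration). It also shows a limitation worth knowing: a protected prefix such as IL- is never split, even when the hyphen after it was meant as a separator.

```php
<?php
// Pattern from the answer: protect the listed prefixes, split on ':' or '-' elsewhere.
$pattern = '/(?:(?:IL?|US)-|Indiana-East-West)(*SKIP)(*F)|[:-]/';

$colonStyle  = preg_split($pattern, 'Ohare:Montrose:I-290 Circle:IL:IL');
$withIndiana = preg_split($pattern, 'WB:Indiana-East-West:Eastpoint:Middlebury:IN:25:IL');
$hyphenStyle = preg_split($pattern, 'Ohare-Montrose-I_290-Circle-IL-IL');

print_r($colonStyle);   // "I-290 Circle" stays in one piece
print_r($withIndiana);  // "Indiana-East-West" stays in one piece
print_r($hyphenStyle);  // the trailing "IL-IL" is NOT split, because "IL-" is protected
```

So for the hyphen-delimited source you would probably need either a different list of protected prefixes or a separate pattern per source.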

Get all matches with pure regex?

I'm working in PHP and need to parse strings looking like this:
Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)
I need to get the rake, pot, and rake contribution per player with names. The number of players is variable. Order is irrelevant so long as I can match player name to rake contribution in a consistent way.
For example I'm looking to get something like this:
Array
(
[0] => Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)
[1] => 100
[2] => 1000
[3] => andy
[4] => 10
[5] => bob
[6] => 20
[7] => cindy
[8] => 70
)
I was able to come up with a regex which matches the string but it only returns the last player-rake contribution pair
^Rake \(([0-9]+)\) Pot \(([0-9]+)\) Players \((?:([a-z]*): ([0-9]*)(?:, )?)*\)$
Outputs:
Array
(
[0] => Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)
[1] => 100
[2] => 1000
[3] => cindy
[4] => 70
)
I've tried using preg_match_all and g modifiers but to no success. I know preg_match_all would be able to get me what I wanted if I ONLY wanted the player-rake contribution pairs but there is data before that I also require.
Obviously I can use explode and parse the data myself but before going down that route I need to know if/how this can be done with pure regex.

You could use the below regex,
(?:^Rake \(([0-9]+)\) Pot \(([0-9]+)\) Players \(|)(\w+):?\s*(\d+)(?=[^()]*\))
The empty alternative (the | at the end of the first non-capturing group) lets that group match nothing, so after the first match the engine can keep matching the remaining player pairs with the part of the pattern that follows the group.
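Here is a rough sketch of how that pattern could be used with preg_match_all. The variable names ($sets, $rake, $pot, $players) are my own, and it assumes the input always starts with the Rake/Pot prefix, so only the first match carries groups 1 and 2.

```php
<?php
$string = 'Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)';
$regex  = '/(?:^Rake \(([0-9]+)\) Pot \(([0-9]+)\) Players \(|)(\w+):?\s*(\d+)(?=[^()]*\))/';

$rake = null;
$pot = null;
$players = [];

// PREG_SET_ORDER gives one array per match, i.e. one per player.
if (preg_match_all($regex, $string, $sets, PREG_SET_ORDER) > 0) {
    // Only the first match goes through the "Rake ... Players (" branch.
    $rake = (int) $sets[0][1];
    $pot  = (int) $sets[0][2];

    foreach ($sets as $set) {
        $players[$set[3]] = (int) $set[4]; // name => rake contribution
    }
}

print_r([$rake, $pot, $players]);
```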

I would use the following Regex to validate the input string:
^Rake \((?<Rake>\d+)\) Pot \((?<Pot>\d+)\) Players \(((?:\w*: \d*(?:, )?)+)\)$
And then just use the explode() function on the last capture group to split the players out:
preg_match($regex, $string, $matches);
// Named groups also get numeric indexes: 1 = Rake, 2 = Pot, 3 = the player list
$players = explode(', ', $matches[3]);
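Put together as a runnable sketch, it could look like the following. I added a third named group (Players) and tightened \w* and \d* to \w+ and \d+; both are my own tweaks rather than part of the answer above, and $result is just an illustrative structure.

```php
<?php
$string = 'Rake (100) Pot (1000) Players (andy: 10, bob: 20, cindy: 70)';

// Validate the whole line and capture the raw player list in one go.
$regex = '/^Rake \((?<Rake>\d+)\) Pot \((?<Pot>\d+)\) Players \((?<Players>(?:\w+: \d+(?:, )?)+)\)$/';

$result = null;
if (preg_match($regex, $string, $matches) === 1) {
    $contributions = [];
    foreach (explode(', ', $matches['Players']) as $pair) {
        // Each $pair looks like "andy: 10"
        list($name, $amount) = explode(': ', $pair);
        $contributions[$name] = (int) $amount;
    }
    $result = [
        'rake'    => (int) $matches['Rake'],
        'pot'     => (int) $matches['Pot'],
        'players' => $contributions,
    ];
}

print_r($result);
```

Using the named key instead of a numeric index avoids having to count groups when the pattern changes.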

php's str_getcsv breaking on tab separated list with no enclosure and individual double quotes

I'm using str_getcsv to parse tab-separated values returned from a NoSQL query. However, I'm running into a problem, and the only solution I've found is illogical.
Here's some sample code to demonstrate (FYI, it seems the tabs aren't being preserved when showing here)...
$data = '0 16 Gruesome Public Executions In North Korea - 80 Killed http://www.youtube.com/watch?v=Dtx30AQpcjw&feature=youtube_gdata "North Korea staged gruesome public executions of 80 people this month, some for offenses as minor as watching South Korean entertainment videos or being fou... 1384357511 http://gdata.youtube.com/feeds/api/videos/Dtx30AQpcjw 0 The Young Turks 1 2013-11-13 12:53:31 9ab8f5607183ed258f4f98bb80f947b4 35afc4001e1a50fb463dac32de1d19e7';
$data = str_getcsv($data,"\t",NULL);
echo '<pre>'.print_r($data,TRUE).'</pre>';
Pay particular attention to the fact that one column (beginning with "North Korea....) actually starts with a double quote " but doesn't finish with one. This is why I supply NULL as the third parameter (enclosure), to override the default " enclosure value.
Here is the result:
Array
(
[0] => 0
[1] => 16
[2] => Gruesome Public Executions In North Korea - 80 Killed
[3] => http://www.youtube.com/watch?v=Dtx30AQpcjw&feature=youtube_gdata
[4] =>
[5] => North Korea staged gruesome public executions of 80 people this month, some for offenses as minor as watching South Korean entertainment videos or being fou... 1384357511 http://gdata.youtube.com/feeds/api/videos/Dtx30AQpcjw 0 The Young Turks 1 2013-11-13 12:53:31 9ab8f5607183ed258f4f98bb80f947b4 35afc4001e1a50fb463dac32de1d19e7
)
As you can see, the quote is breaking the function. Logically, I thought I would be able to use NULL or an empty string '' as the third parameter for str_getcsv (enclosure), but neither worked.
The only thing I could use to get str_getcsv to work properly was a space character ' '. That doesn't make any sense to me, because none of the columns start or end with whitespace.
$data = '0 16 Gruesome Public Executions In North Korea - 80 Killed http://www.youtube.com/watch?v=Dtx30AQpcjw&feature=youtube_gdata "North Korea staged gruesome public executions of 80 people this month, some for offenses as minor as watching South Korean entertainment videos or being fou... 1384357511 http://gdata.youtube.com/feeds/api/videos/Dtx30AQpcjw 0 The Young Turks 1 2013-11-13 12:53:31 9ab8f5607183ed258f4f98bb80f947b4 35afc4001e1a50fb463dac32de1d19e7';
$data = str_getcsv($data,"\t",' ');
echo '<pre>'.print_r($data,TRUE).'</pre>';
Now the result is:
Array
(
[0] => 0
[1] => 16
[2] => Gruesome Public Executions In North Korea - 80 Killed
[3] => http://www.youtube.com/watch?v=Dtx30AQpcjw&feature=youtube_gdata
[4] =>
[5] => "North Korea staged gruesome public executions of 80 people this month, some for offenses as minor as watching South Korean entertainment videos or being fou...
[6] => 1384357511
[7] => http://gdata.youtube.com/feeds/api/videos/Dtx30AQpcjw
[8] => 0
[9] => The Young Turks
[10] =>
[11] =>
[12] =>
[13] =>
[14] => 1
[15] => 2013-11-13 12:53:31
[16] => 9ab8f5607183ed258f4f98bb80f947b4
[17] => 35afc4001e1a50fb463dac32de1d19e7
)
So my question is: why does it work with a space as the enclosure, but not with NULL or an empty string? Also, are there repercussions to this?
UPDATE 1: It seems this reduced the number of errors I was receiving in our logs but didn't eliminate them, so I'm guessing that the space I used as the enclosure has caused unintended side effects, albeit less troubling than the previous problem. But my question remains the same: why can't I use NULL or an empty string as the enclosure, and secondly, is there a better way of dealing with this?

Just to give a starting point ...
You might wanna consider working with the string itself, instead of using a function like str_getcsv in your case.
But be aware that there are at least some pitfalls, if you choose this route (might be your only option though):
Handling of escaped characters
Line breaks within the data (not meant as delimiters)
If you know that you don't have any other TABS in your string other than those ending the fields, and you don't have any linebreaks other than those delimiting a row, you might be fine with this:
$data = explode("\n", $the_whole_csv_string_block);
foreach ($data as $line)
{
    $arr = explode("\t", $line);
    // $arr[0] will have every first field of every row, $arr[1] the 2nd, ...
    // Usually this is what I want when working with a csv file
    // But if you rather want a multidimensional array, you can simply add
    // $arr to a different array and after this loop you are good to go.
}
Otherwise, treat this as a starting point that you can tweak for your individual situation. Hope it helps.
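To make the idea concrete, here is a small sketch that collects every row into a multidimensional array. The $block sample is invented (it only mimics the stray opening quote from the question), and it assumes no field contains a tab or a line break of its own.

```php
<?php
// Hypothetical two-row block; the second field of row 1 opens a double quote that is never closed.
$block = "0\t\"North Korea staged...\t1384357511\r\n"
       . "1\tplain text\t1384357512\n";

$rows = [];
// Split on any line-break style so that a stray \r does not end up in the last field.
foreach (preg_split('/\r\n|\r|\n/', $block) as $line) {
    if ($line === '') {
        continue; // skip the empty piece after the final line break
    }
    // explode() has no notion of enclosures, so the lone quote is kept as plain data.
    $rows[] = explode("\t", $line);
}

print_r($rows);
```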

Simply use chr(0) as enclosure and escape:
$data = str_getcsv($data, "\t", chr(0), chr(0));

$ not matching position immediately before a newline that is the last character

$ is not matching the position immediately before a newline that is the last character of the string.
I would expect /1...$/ to match, but a match only happens with the pattern /1....$/ (one extra dot), which seems wrong.
What could be the reason?
The PHP documentation also says: "A dollar character ($) is an assertion which is TRUE only if the current matching point is at the end of the subject string, or immediately before a newline character that is the last character in the string (by default)."
$subject = 'abc#
123#
';
$pattern = '/1...$/';
preg_match_all($pattern,$subject,$matches); // no match
Update:
I suspect the extra dot is needed because of the \r\n newline format.
I ran the following experiment and it gave me a hint:
$pattern = '/1...(.)$/';
preg_match_all($pattern, $subject, $matches);
echo bin2hex($matches[1][0]); // 0d
0d is the hex code of \r (CR), so $ is matching before \n, not before \r\n. That may be the cause of my problem.

The issue was caused by the different newline representations in Windows and Linux files.
Why this happened:
I created the PHP file on Windows and transferred it to the Linux machine where PHP was installed.
Windows uses \r\n to represent a newline and Linux uses \n, which is why an extra dot was initially needed to match.
The experiment below confirmed it:
$subject = 'abc#
123#
';
$pattern = '/1...(.)$/';
preg_match_all($pattern,$subject,$matches);
echo bin2hex($matches[1][0]); // 0d
// 0d is the hex code of \r, i.e. CR (carriage return)
After I created a new file on the Linux system, /1...$/ found the match :)
I hope this will save someone's time if they are stuck with the same problem.
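If you want to reproduce this without moving files between systems, a sketch like the following spells the line endings out explicitly. The variable names are illustrative, and the last two patterns are just two possible workarounds, not the only ones.

```php
<?php
// Same text, once with Unix (\n) and once with Windows (\r\n) line endings.
$unix    = "abc#\n123#\n";
$windows = "abc#\r\n123#\r\n";

$plain = '/1...$/';
$unixHit    = preg_match($plain, $unix);     // $ matches right before the final \n
$windowsHit = preg_match($plain, $windows);  // fails: a \r sits between "#" and the final \n

// Workaround 1: allow an optional \r before the end.
$tolerant = preg_match('/1...\r?$/', $windows);

// Workaround 2: tell PCRE to treat \r, \n and \r\n all as newlines.
$anycrlf = preg_match('/(*ANYCRLF)1...$/', $windows);

echo bin2hex(substr($windows, -2)), "\n"; // hex of the trailing \r\n
var_dump($unixHit, $windowsHit, $tolerant, $anycrlf);
```

Normalizing the input once, for example with str_replace("\r\n", "\n", $subject), is another option if you control where the string comes from.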

Your string is multi-line. By default, $ only matches at the very end of the subject (or just before a final newline), not at the end of every line. You have to add the m modifier to make it match at each line end.
For example:
/1...$/m

I was stuck on this issue for two days. I did a lot of testing to find the logic behind it, because it all depends on where your data comes from (internal and controlled vs. external and uncontrolled). In my case it was an input field (<textarea>) on my website, reachable from various browsers (and various operating systems), and there were no such problems in JavaScript with pattern testing/matching/checking. Here is a hint for those of you who are trying to fight off (or at least work around) the problem of matching a pattern correctly at the end ($) of any line in multiline mode (/m).
<?php
// Various operating systems use different end-of-line (a.k.a. line break) characters:
// - Windows uses CR+LF (\r\n);
// - Linux uses LF (\n);
// - classic Mac OS (before OS X) uses CR (\r); OS X itself uses LF like Linux.
// And that's why the single dollar assertion ($) sometimes fails in multiline (/m) mode - possible bug in PHP 5.3.8 or just a "feature"(?).
$str="ABC ABC\n\n123 123\r\ndef def\rnop nop\r\n890 890\nQRS QRS\r\r~-_ ~-_";
// C 3 p 0 _
$pat1='/\w$/mi'; // This works excellent in JavaScript (Firefox 7.0.1+)
$pat2='/\w\r?$/mi'; // Slightly better
$pat3='/\w\R?$/mi'; // Somehow disappointing according to php.net and pcre.org when used improperly
$pat4='/\w(?=\R)/i'; // Much better with allowed lookahead assertion (just to detect without capture) without multiline (/m) mode; note that with alternative for end of string ((?=\R|$)) it would grab all 7 elements as expected
$pat5='/\w\v?$/mi';
$pat6='/(*ANYCRLF)\w$/mi'; // Excellent but undocumented on php.net at the moment (described on pcre.org and en.wikipedia.org)
$n=preg_match_all($pat1, $str, $m1);
$o=preg_match_all($pat2, $str, $m2);
$p=preg_match_all($pat3, $str, $m3);
$r=preg_match_all($pat4, $str, $m4);
$s=preg_match_all($pat5, $str, $m5);
$t=preg_match_all($pat6, $str, $m6);
echo $str."\n1 !!! $pat1 ($n): ".print_r($m1[0], true)
."\n2 !!! $pat2 ($o): ".print_r($m2[0], true)
."\n3 !!! $pat3 ($p): ".print_r($m3[0], true)
."\n4 !!! $pat4 ($r): ".print_r($m4[0], true)
."\n5 !!! $pat5 ($s): ".print_r($m5[0], true)
."\n6 !!! $pat6 ($t): ".print_r($m6[0], true);
// Note the difference among the three very helpful escape sequences in $pat2 (\r), $pat3 and $pat4 (\R), $pat5 (\v) and altered newline option in $pat6 ((*ANYCRLF)) - for some applications at least.
/* The code above results in the following output:
ABC ABC
123 123
def def
nop nop
890 890
QRS QRS
~-_ ~-_
1 !!! /\w$/mi (3): Array
(
[0] => C
[1] => 0
[2] => _
)
2 !!! /\w\r?$/mi (5): Array
(
[0] => C
[1] => 3
[2] => p
[3] => 0
[4] => _
)
3 !!! /\w\R?$/mi (5): Array
(
[0] => C
[1] => 3
[2] => p
[3] => 0
[4] => _
)
4 !!! /\w(?=\R)/i (6): Array
(
[0] => C
[1] => 3
[2] => f
[3] => p
[4] => 0
[5] => S
)
5 !!! /\w\v?$/mi (5): Array
(
[0] => C
[1] => 3
[2] => p
[3] => 0
[4] => _
)
6 !!! /(*ANYCRLF)\w$/mi (7): Array
(
[0] => C
[1] => 3
[2] => f
[3] => p
[4] => 0
[5] => S
[6] => _
)
*/
?>
Unfortunately, I haven't got any access to a server with the latest PHP version - my local PHP is 5.3.8 and my public host's PHP is version 5.2.17.

regex to find year/month substring

Can someone help me with a regular expression to get the year and month from a text string?
Here is an example text string:
http://www.domain.com/files/images/2012/02/filename.jpg
I'd like the regex to return 2012/02.

This regex pattern would match what you need:
(?<=\/)\d{4}\/\d{2}(?=\/)
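In PHP that pattern could be used roughly like this. I switched the delimiter to ~ so the slashes do not need escaping; $yearMonth is just an illustrative name.

```php
<?php
$url = 'http://www.domain.com/files/images/2012/02/filename.jpg';

$yearMonth = null;
// Lookbehind and lookahead require the surrounding slashes without including them in the match.
if (preg_match('~(?<=/)\d{4}/\d{2}(?=/)~', $url, $m) === 1) {
    $yearMonth = $m[0];
}

echo $yearMonth, "\n";
```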

Depending on your situation and how much your strings vary - you might be able to dodge a bullet by simply using PHP's handy explode() function.
A simple demonstration - Dim the lights please...
$str = 'http://www.domain.com/files/images/2012/02/filename.jpg';
print_r( explode("/",$str) );
Returns :
Array
(
[0] => http:
[1] =>
[2] => www.domain.com
[3] => files
[4] => images
[5] => 2012 // Jack
[6] => 02 // Pot!
[7] => filename.jpg
)
The explode() function splits a string according to a "delimiter" that you provide. In this example I have used the / (slash) character.
So you see - you can just grab the values at indexes 5 and 6 to get the date values.
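If the number of path segments before the date can vary, counting from the end may be a bit more robust. This is only a sketch, and it assumes the year and month are always the two segments directly before the filename:

```php
<?php
$str = 'http://www.domain.com/files/images/2012/02/filename.jpg';

// Keep only the path part, e.g. "/files/images/2012/02/filename.jpg"
$path = parse_url($str, PHP_URL_PATH);
$segments = explode('/', trim($path, '/'));

// Take the two segments right before the filename (assumption: .../YYYY/MM/file).
$yearMonth = implode('/', array_slice($segments, -3, 2));

echo $yearMonth, "\n";
```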
