My original test implementation consisted of building an array of "ignore words" with the following code:
$ignoreList = array("test1", "test2", "test3");
Later on, I test for individual words in the $ignoreList:
if(in_array($word, $ignoreList)){
} else{
$words[$word] = $words[$word] + 1;
}
This code works perfectly - upon later echoing my word list, no words on the $ignoreList show up. I refactored to make it easier to add or remove words:
//Import ignore list
$ignore_raw = file_get_contents("includes/ignore.txt");
$ignoreList = explode("\n", $ignore_raw);
ignore.txt is a plain text file with each item on its own line, no spaces. The import and explode seem to be working, because a print_r statement on $ignoreList results in:
Array ( [0] => a [1] => and [2] => are [3] => as [4] => for [5] => in [6] => is [7] => more [8] => of [9] => than [10] => that [11] => the [12] => to [13] => with )
The comparison code, however, stops working properly, and words on the ignore list show up once again in my final results. Any ideas what's wrong?
Your ignore.txt file may have \r\n line endings, in which case each word actually carries a trailing \r that print_r does not make visible.
Try this:
$ignoreList = array_map('trim', file("includes/ignore.txt"));
BTW, your code could be refactored like this (assuming $words is a plain list of words at this point):
$words = array_diff($words, $ignoreList); // removes ignored words
$words = array_count_values($words); // count words
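To illustrate the trailing \r problem, here is a minimal sketch. The file contents are simulated with an inline string (an assumption, so the example runs without ignore.txt), and the word "and" is just one of the entries from the question.

```php
<?php
// Simulated contents of ignore.txt saved with Windows (\r\n) line endings.
$ignore_raw = "a\r\nand\r\nthe\r\n";

// Splitting on "\n" alone leaves the "\r" attached to every word.
$broken = explode("\n", $ignore_raw);
var_dump(in_array("and", $broken));     // the stored entry is "and\r", so no match

// Trimming each entry (and dropping the empty last one) fixes the lookup.
$ignoreList = array_filter(array_map('trim', explode("\n", $ignore_raw)), 'strlen');
var_dump(in_array("and", $ignoreList)); // now the word is found
```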
Hope you guys can help me out. I have the following .m3u file
#EXTM3U
#EXTINF:-1 tvg-id="" tvg-name="A&E" tvg-logo="" group-title="ENTRETENIMIENTO",A&E
http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts
#EXTINF:-1 tvg-id="" tvg-name="ABC Puerto Rico" tvg-logo="" group-title="NACIONALES",ABC Puerto Rico
http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/96.ts
#EXTINF:-1 tvg-id="" tvg-name="Animal Planet" tvg-logo="" group-title="ENTRETENIMIENTO",Animal Planet
http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/185.ts
As you can see, the file starts with the main tag #EXTM3U. Below it, each video has an information tag (#EXTINF:-1 ...) followed by the video link entry (http:// .....)
Can you tell me explicitly how I can parse this whole file (it's a pretty large one) and save the fields in an array, for example videos[ ],
so that later I can access every video's attributes? Say, videos[0]['title'] to get the title of the first video, and so on with the other attributes, for example videos[42]['link'] to get the link to video #42.
I am already using curl to get the file content into a variable like this
<?php
$handler = curl_init("link to m3u file");
curl_setopt($handler, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
$response = curl_exec($handler);
curl_close($handler);
echo $response;
?>
What I need now is to parse the curl response and save all the video information into an array where I can access every attribute of every video.
I know I must use some regexp or something like that; I just don't understand how. Can you please help me with some code? Thank you so much.
Behold the magik of Regx
$string = <<<CUT
#EXTM3U
#EXTINF:-1 tvg-id="" tvg-name="A&E" tvg-logo="" group-title="ENTRETENIMIENTO",A&E`http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts
http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts
#EXTINF:-1 tvg-id="" tvg-name="ABC Puerto Rico" tvg-logo="" group-title="NACIONALES",ABC Puerto Rico
http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/96.ts
CUT;
preg_match_all('/(?P<tag>#EXTINF:-1)|(?:(?P<prop_key>[-a-z]+)=\"(?P<prop_val>[^"]+)")|(?<something>,[^\r\n]+)|(?<url>http[^\s]+)/', $string, $match );
$count = count( $match[0] );
$result = [];
$index = -1;
for( $i =0; $i < $count; $i++ ){
$item = $match[0][$i];
if( !empty($match['tag'][$i])){
//is a tag increment the result index
++$index;
}elseif( !empty($match['prop_key'][$i])){
//is a prop - split item
$result[$index][$match['prop_key'][$i]] = $match['prop_val'][$i];
}elseif( !empty($match['something'][$i])){
//is the text after the comma (apparently the channel title)
$result[$index]['something'] = $item;
}elseif( !empty($match['url'][$i])){
$result[$index]['url'] = $item ;
}
}
print_r( $result );
Returns
array (
0 =>
array (
'tvg-name' => 'A&E',
'group-title' => 'ENTRETENIMIENTO',
'something' => ',A&E`http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts',
'url' => 'http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts',
),
1 =>
array (
'tvg-name' => 'ABC Puerto Rico',
'group-title' => 'NACIONALES',
'something' => ',ABC Puerto Rico',
'url' => 'http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/96.ts',
),
)
Seriously though, I have no clue what some of these fields are, which is why one of them is simply called something. Anyway, this should get you started.
For the regx, it's actually pretty simple when it's broken down. The real trick is in using preg_match_all instead of preg_match.
Here is our regx
/(?P<tag>#EXTINF:-1)|(?:(?P<prop_key>[-a-z]+)=\"(?P<prop_val>[^"]+)")|(?<something>,[^\r\n]+)|(?<url>http[^\s]+)/
First we will break it down into more manageable bits. These are separated by the pipe | for "or". Each one can be thought of as a separate pattern: match this one or the next one. Now, the order can be important, because they are tried left to right, so if one on the left matches, it stops there. So you have to be careful not to have a regx that can match in two places (if you don't want that). However, it can be used to your advantage too, as I will show below. This is really what we are dealing with
(?P<tag>#EXTINF:-1)
(?:(?P<prop_key>[-a-z]+)=\"(?P<prop_val>[^"]+)")
(?<something>,[^\r\n]+)
(?<url>http[^\s]+)
Four regular expressions. In all of these, (?P<name>...) is a named capture group; it just makes things more readable and the bits easier to find. If you look at the conditions I use to find the matches, for example !empty($match['tag'][$i]), we can use the tag key because of the named capture group; otherwise it would be 1. With several regx combined, plain 1 2 3 can get messy, especially since the result is nested, so it would be $match[1][$i] for tag, etc. Anyway, once the names are taken out we have
#EXTINF:-1 match this string literally
(?:(?P<prop_key>[-a-z]+)=\"(?P<prop_val>[^"]+)") this is more complicated. (?: .. ) is a non-capture group; it is there so the key and value wind up at the same index in the match array without being captured together. Broken down, this is ([-a-z]+)=\"([^"]+)\", or: match a word followed by =, then ", then anything but a ", ending with ". Basically one side captures the key and the other the value, excluding the double quotes. Note that [^"]+ needs at least one character, so empty attributes such as tvg-id="" are skipped
,[^\r\n]+ starts with a comma, then anything but a line return
and last, http[^\s]+, a URL
Now remember what I said about order being important: the URL http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts would match the last expression, except that in the sample it is part of the text starting with ,A&E, which the 3rd expression matches and consumes first, so that URL never gets to number 4
Hope that helps. Granted, you'll need a basic grasp of regx; this is not really the place for a full tutorial on that, and you can find better examples than I can provide in a few short minutes.
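As a small sketch of that last point, the snippet below uses only the last two branches of the pattern on a made-up subject (the example.test URLs are placeholders, not from the original file). The text after the comma swallows the URL on the same line, while a URL on its own line is still picked up by the url branch.

```php
<?php
// Placeholder data: a title line that also contains a URL, then a URL line.
$subject = ",A&E http://example.test/76.ts\nhttp://example.test/96.ts";

// Only the "something" and "url" branches of the original pattern.
$pattern = '/(?<something>,[^\r\n]+)|(?<url>http[^\s]+)/';

preg_match_all($pattern, $subject, $m);

// $m[0] holds the full matches in the order they were found.
print_r($m[0]);
// The first match starts at the comma and runs to the end of the line,
// so the URL inside it is consumed and never reaches the url branch.
print_r($m['url']);
```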
Just for the sake of completeness, here is part of what preg_match_all returns
(
[0] => Array(
[0] => #EXTINF:-1
[1] => tvg-name="A&E"
[2] => group-title="ENTRETENIMIENTO"
[3] => ,A&E`http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts
[4] => http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts
[5] => #EXTINF:-1
[6] => tvg-name="ABC Puerto Rico"
[7] => group-title="NACIONALES"
[8] => ,ABC Puerto Rico
[9] => http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/96.ts
)
[tag] => Array(
[0] => #EXTINF:-1
[1] =>
[2] =>
[3] =>
[4] =>
[5] => #EXTINF:-1
[6] =>
[7] =>
[8] =>
[9] =>
)
[1] => Array(
[0] => #EXTINF:-1
[1] =>
[2] =>
[3] =>
[4] =>
[5] => #EXTINF:-1
[6] =>
[7] =>
[8] =>
[9] =>
)
[prop_key] => Array(
[0] =>
[1] => tvg-name
[2] => group-title
[3] =>
[4] =>
[5] =>
[6] => tvg-name
[7] => group-title
[8] =>
[9] =>
)
[2] => Array( ... duplicate of prop_key .. )
etc.
)
To find an item in the array above, look at the for loop. $match[0] contains every match, while the tag array is filled only at the positions where that particular regx matched, so we can correlate the two using the $i index.
if( !empty($match['tag'][$i])){
//is a tag increment the result index
++$index;
}
This checks whether $match['tag'][$i] is not empty. If you look at $match['tag'][0], when $i = 0, you will see that indeed it is not empty. On the second loop, $match['tag'][1] is empty but $match['prop_key'][1] is not, so we know that when $i = 1 the item is a prop_key match. That's how that works.
-ps- if you can find a way to remove the duplicated numeric indexes, please share it with me ... lol ... these are the normal matches if I didn't use a named capture group, as I said it can get messy.
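One possible way to drop the duplicated numeric indexes is to filter the result by key after matching, keeping only the string keys. This is a sketch rather than the answer's own method; the sample input is shortened and uses a placeholder URL. Since the full matches live under the numeric key 0, they are saved in a separate variable first.

```php
<?php
// Shortened sample input with a placeholder URL.
$string = "#EXTINF:-1 tvg-name=\"A&E\" group-title=\"ENTRETENIMIENTO\",A&E\n"
        . "http://example.test/76.ts";

$pattern = '/(?P<tag>#EXTINF:-1)|(?:(?P<prop_key>[-a-z]+)="(?P<prop_val>[^"]+)")'
         . '|(?<something>,[^\r\n]+)|(?<url>http[^\s]+)/';

preg_match_all($pattern, $string, $match);

// Keep the full matches, which are stored under the numeric key 0.
$full = $match[0];

// Keep only the named groups: array_filter passes each key to is_string().
$named = array_filter($match, 'is_string', ARRAY_FILTER_USE_KEY);

print_r(array_keys($named));
```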
I wrote a simple working m3u8 parser in PHP.
It parses a remote m3u8 file to JSON, but it is easy to change the output.
https://github.com/onigetoc/m3u8-PHP-Parser
I may soon change it or add a cURL-based fetch instead of file_get_contents().
m3u-parser.php?url=https://raw.githubusercontent.com/onigetoc/m3u8-PHP-Parser/master/ressources/demofile.m3u
Once you get the cURL response, read the file from the remote location via cURL or the fopen function.
For that, you have to read the files that are in the directory at the remote location and save them all to the local server.
You can use the file function "stat" to get all the information and keep it in $files.
That is the general idea of how to collect all the information; from there you can create the array.
Once the array is created, you can serialize the response for printing.
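As a rough sketch of building the array the question asks for (videos[0]['title'], videos[42]['link']), the playlist can also be walked line by line. The function name parseM3u and the 'title' and 'link' keys are hypothetical choices for illustration, the URL is a placeholder, and the code assumes titles contain no double quotes.

```php
<?php
// Hypothetical helper, not part of any library: turns playlist text into
// a list of arrays with 'title', 'link' and any key="value" attributes.
function parseM3u(string $content): array
{
    $videos = [];
    $current = null;
    foreach (preg_split('/\R/', $content) as $line) {
        $line = trim($line);
        if (strpos($line, '#EXTINF:') === 0) {
            $current = ['title' => '', 'link' => ''];
            // Collect key="value" pairs; empty values are allowed here.
            preg_match_all('/([-a-z]+)="([^"]*)"/', $line, $attrs, PREG_SET_ORDER);
            foreach ($attrs as $attr) {
                $current[$attr[1]] = $attr[2];
            }
            // Remove the quoted values, then take the text after the first comma.
            $rest = preg_replace('/"[^"]*"/', '', $line);
            $pos = strpos($rest, ',');
            if ($pos !== false) {
                $current['title'] = substr($rest, $pos + 1);
            }
        } elseif ($line !== '' && $line[0] !== '#' && $current !== null) {
            // First non-comment line after #EXTINF is taken as the link.
            $current['link'] = $line;
            $videos[] = $current;
            $current = null;
        }
    }
    return $videos;
}

$sample = "#EXTM3U\n"
        . "#EXTINF:-1 tvg-id=\"\" tvg-name=\"A&E\" tvg-logo=\"\" group-title=\"ENTRETENIMIENTO\",A&E\n"
        . "http://example.test/76.ts\n";
$videos = parseM3u($sample);
print_r($videos);
```

With the cURL response from the question, the call would be parseM3u($response).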
I've looked and can't find a solution to this feature we would like to write. I'm fairly new to PHP so any help, advice and code examples are always greatly appreciated.
Let me explain what we want to do...
We have a block of HTML inside a string - the content could be up to 2000 words with styling such as <p>, <ul>, <h2> included in this HTML content string.
We also have an array of images related to this content inside a separate string.
We need to add the images from the array string into the HTML content at equal spaces without breaking the HTML code. So a simple character count won't work as it could break the HTML tags.
We need to equally space the images. So, for example; if we had 2000 words inside the HTML content string and 10 images in the array, we need to place an image every 200 words.
Any help or coding samples provided in order to achieve this is greatly appreciated - thank you for your help in advance.
You can use
$numword = str_word_count($str, 0);
to get the number of words,
or
$array = str_word_count($str,1);
to get in $array an array with all the words (one per index). You can then iterate over this array to rebuild the text, adding the code for your image after every given number of words.
This sample is from the PHP manual:
<?php
$str = "Hello fri3nd, you're
looking good today!";
print_r(str_word_count($str, 1));
print_r(str_word_count($str, 2));
print_r(str_word_count($str, 1, 'àáãç3'));
echo str_word_count($str);
?>
and this is the related result:
Array
(
[0] => Hello
[1] => fri
[2] => nd
[3] => you're
[4] => looking
[5] => good
[6] => today
)
Array
(
[0] => Hello
[6] => fri
[10] => nd
[14] => you're
[29] => looking
[46] => good
[51] => today
)
Array
(
[0] => Hello
[1] => fri3nd
[2] => you're
[3] => looking
[4] => good
[5] => today
)
7
You can find it in the PHP manual entry for str_word_count.
For the insert, you can try it this way:
$num = 200; // number of words after which to insert an image
$text = $array[0]; // initialize the text with the first word in the array
for ($cnt = 1; $cnt < count($array); $cnt++) {
    $text .= ' ' . $array[$cnt]; // append the next word, separated by a space
    if (($cnt + 1) % $num == 0) { // after every $num-th word, insert the image
        $text .= "<img src='your_img_path'>";
    }
}
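Because str_word_count drops tags and punctuation, rebuilding the text from its output would lose the HTML. A possible alternative, sketched here under the assumption that the markup is reasonably well formed (no stray < characters in the text), is to split the string into tags and text runs, copy the tags untouched, and count words only in the text. The function name insertImages and its parameters are hypothetical.

```php
<?php
// Hypothetical helper: inserts one <img> after every $every words of
// visible text, never inside a tag.
function insertImages(string $html, array $images, int $every): string
{
    // Split into tags and text runs; DELIM_CAPTURE keeps the tags in the result.
    $flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;
    $parts = preg_split('/(<[^>]+>)/', $html, -1, $flags);

    $out = '';
    $count = 0;
    foreach ($parts as $part) {
        if ($part[0] === '<') {
            $out .= $part; // a tag: copy it unchanged
            continue;
        }
        // A text run: split into words and whitespace, keeping both.
        foreach (preg_split('/(\s+)/', $part, -1, $flags) as $token) {
            $out .= $token;
            if (trim($token) === '') {
                continue; // whitespace does not count as a word
            }
            $count++;
            if ($count % $every === 0 && $images) {
                $src = htmlspecialchars(array_shift($images));
                $out .= '<img src="' . $src . '" alt="">';
            }
        }
    }
    return $out;
}

$result = insertImages('<p>one two three four</p>', ['a.png', 'b.png'], 2);
echo $result;
```

For equal spacing, $every could be computed as the whitespace-separated word count of strip_tags($html) divided by the number of images. Note that this places images mid-paragraph; snapping to the next closing </p> would need extra logic.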
I have a problem with str_getcsv function for PHP.
I have this code:
<?php
$string = '#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=714000,RESOLUTION=640x480,CODECS="avc1.77.30, mp4a.40.34"';
$array = str_getcsv($string, ",", '"');
print_r($array);
Which should return:
Array
(
[0] => #EXT-X-STREAM-INF:PROGRAM-ID=1
[1] => BANDWIDTH=714000
[2] => RESOLUTION=640x480
[3] => CODECS=avc1.77.30, mp4a.40.34
)
But instead, it is returning:
Array
(
[0] => #EXT-X-STREAM-INF:PROGRAM-ID=1
[1] => BANDWIDTH=714000
[2] => RESOLUTION=640x480
[3] => CODECS="avc1.77.30
[4] => mp4a.40.34"
)
This happens because it ignores the enclosure of the last parameter, CODECS, and splits that information too. I'm using str_getcsv instead of just doing explode(",", $string) precisely for that reason (the function should respect the enclosure), but it behaves the same as explode would.
The code being executed: http://eval.in/17471
The enclosure (third) parameter does not have quite that effect. The enclosure character is treated as such only when it appears next to the delimiter.
To get your desired output, the input would need to be
#EXT-X-STREAM-INF:PROGRAM-ID=1,...,"CODECS=avc1.77.30, mp4a.40.34"
See it in action.
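A short sketch of both options follows. The first call shows str_getcsv on input where the quote opens the whole field, as described above. The second is an alternative that is not from the answer: when the input cannot be changed, preg_split can split on commas that are followed by an even number of double quotes (that is, commas outside quotes), and the quotes are stripped afterwards.

```php
<?php
// Option 1: the enclosure starts the field, so str_getcsv honours it.
$adjusted = '#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=714000,"CODECS=avc1.77.30, mp4a.40.34"';
$fromCsv = str_getcsv($adjusted, ",", '"', "\\");
print_r($fromCsv);

// Option 2: keep the original input and split only on commas outside quotes.
$string = '#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=714000,RESOLUTION=640x480,CODECS="avc1.77.30, mp4a.40.34"';
$parts = preg_split('/,(?=(?:[^"]*"[^"]*")*[^"]*$)/', $string);

// Remove the double quotes to get the output the question expects.
$parts = array_map(function ($part) {
    return str_replace('"', '', $part);
}, $parts);
print_r($parts);
```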
$this->row['contents'] = strip_tags($this->row['contents']);
$this->words = explode(" ", $this->row['contents']);
The code above should create an array with a key => value pair for each word of $this->row['contents']. Under normal circumstances it works just fine, but with a string such as:
This costs U$ 10.40 per liter.
It will separate as
[0] => This
[1] => Costs U$
[2] => 10.40 per
[3] => liter.
Any ideas how to solve this?
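Maybe this code will help you. It splits on any run of whitespace (tabs and newlines included) instead of only on a single space: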
$this->words = preg_split('/\s+/', $this->row['contents']);
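A quick comparison of the two approaches, assuming the stray separators in the stored text are a tab and a newline (the question does not say which characters they are, so the sample string is a guess):

```php
<?php
// Assumed sample: a tab after "costs" and a newline after "10.40".
$contents = "This costs\tU$ 10.40\nper liter.";

// explode() only splits on the literal space character.
$bySpace = explode(" ", $contents);

// preg_split() with \s+ splits on any whitespace run; NO_EMPTY drops
// empty entries caused by leading or trailing whitespace.
$byWhitespace = preg_split('/\s+/', $contents, -1, PREG_SPLIT_NO_EMPTY);

print_r($bySpace);
print_r($byWhitespace);
```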
Hello can someone help me with this regex please
here is my $lang_file:
define(words_picture,"Снимка");
define(words_amount,"бр.");
define(words_name,"Име");
define(words_price_piece,"Ед. цена");
define(words_total,"Обща цена");
define(words_del,"Изтрий");
define(words_delivery,"Доставка,но няма");
this is my code :
$fh = fopen($lang_file, 'r');
$data = str_replace($rep,"",fread($fh, filesize($lang_file)));
fclose($fh);
preg_match_all('/define\((.*?)\)/i', $data,$defines,PREG_PATTERN_ORDER);
when i print $defines i get this :
[0] => words_picture,"Снимка"
[1] => words_amount,"бр."
[2] => words_name,"Име"
[3] => words_price_piece,"Ед. цена"
[4] => words_total,"Обща цена"
[5] => words_del,"Изтрий"
[6] => words_delivery,"Доставка" //here is the part that is missing and i need it :-)
so when there is a comma inside the string it breaks the string there, and doesn't return correct value.
Try (koko.*?) as the match. That'll return koko for koko,goko. If you want it to return koko,goko, remove the ?. Make it (koko.*). That will return koko,goko for koko,goko.
Here's a site that I use to test my regex against a number of cases:
http://www.cyber-reality.com/regexy.html
Based on your edit, I'd say you're looking for the greedy form, (koko.*). If your code worked for everything else, use this:
preg_match_all('/define\((.*)\)/i', $data,$defines,PREG_PATTERN_ORDER);
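As a sketch of a stricter alternative (not the answer's pattern), the name and the quoted value can be captured separately, so a comma inside the quotes cannot cut the value short. It assumes every value is wrapped in double quotes and contains no double quote itself; two lines from the question's file are used as sample data.

```php
<?php
// Two sample lines from the language file in the question.
$data = <<<'TXT'
define(words_total,"Обща цена");
define(words_delivery,"Доставка,но няма");
TXT;

// Group 1: the constant name. Group 2: everything between the double quotes.
// The u flag treats the pattern and subject as UTF-8.
preg_match_all('/define\(\s*(\w+)\s*,\s*"([^"]*)"\s*\)/u', $data, $m, PREG_SET_ORDER);

$defines = [];
foreach ($m as $row) {
    $defines[$row[1]] = $row[2];
}
print_r($defines);
```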