Hello, can someone help me with this regex, please?
Here is my $lang_file:
define(words_picture,"Снимка");
define(words_amount,"бр.");
define(words_name,"Име");
define(words_price_piece,"Ед. цена");
define(words_total,"Обща цена");
define(words_del,"Изтрий");
define(words_delivery,"Доставка,но няма");
This is my code:
$fh = fopen($lang_file, 'r');
$data = str_replace($rep,"",fread($fh, filesize($lang_file)));
fclose($fh);
preg_match_all('/define\((.*?)\)/i', $data,$defines,PREG_PATTERN_ORDER);
When I print $defines, I get this:
[0] => words_picture,"Снимка"
[1] => words_amount,"бр."
[2] => words_name,"Име"
[3] => words_price_piece,"Ед. цена"
[4] => words_total,"Обща цена"
[5] => words_del,"Изтрий"
[6] => words_delivery,"Доставка" // here is the part that is missing, and I need it :-)
So when there is a comma inside the string, the match breaks there and doesn't return the correct value.
Try (koko.*?) as the match. That'll return koko for koko,goko. If you want it to return koko,goko, remove the ?. Make it (koko.*). That will return koko,goko for koko,goko.
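A quick way to check the lazy/greedy difference described above, using the answer's own koko,goko example. With nothing after .*? in the pattern, the lazy quantifier settles for zero extra characters:

```php
<?php
// Lazy: .*? matches as little as possible, and nothing follows it in the pattern.
preg_match('/(koko.*?)/', 'koko,goko', $lazy);

// Greedy: .* runs to the end of the line.
preg_match('/(koko.*)/', 'koko,goko', $greedy);

echo $lazy[1], "\n";   // koko
echo $greedy[1], "\n"; // koko,goko
```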
Here's a site that I use to test my regex against a number of cases:
http://www.cyber-reality.com/regexy.html
Based on your edit, I'd say you're looking for (koko.*). If your code worked for everything else, use this:
preg_match_all('/define\((.*)\)/i', $data, $defines, PREG_PATTERN_ORDER);
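Since every value in the question is wrapped in double quotes, another option (my own sketch, not part of the answer above) is to capture the constant name and the quoted value separately and let [^"]* accept commas. This assumes every define() in the file has a double-quoted value with no escaped quotes inside it:

```php
<?php
// A subset of the language file from the question.
$data = <<<'TXT'
define(words_name,"Име");
define(words_total,"Обща цена");
define(words_delivery,"Доставка,но няма");
TXT;

// Group 1: the constant name. Group 2: everything between the double quotes,
// commas included, because [^"]* only stops at a double quote.
preg_match_all('/define\(\s*(\w+)\s*,\s*"([^"]*)"\s*\)/u', $data, $m, PREG_SET_ORDER);

$defines = [];
foreach ($m as $set) {
    $defines[$set[1]] = $set[2];
}
print_r($defines);
```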
Hope you guys can help me out. I have the following .m3u file:
#EXTM3U
#EXTINF:-1 tvg-id="" tvg-name="A&E" tvg-logo="" group-title="ENTRETENIMIENTO",A&E
http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts
#EXTINF:-1 tvg-id="" tvg-name="ABC Puerto Rico" tvg-logo="" group-title="NACIONALES",ABC Puerto Rico
http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/96.ts
#EXTINF:-1 tvg-id="" tvg-name="Animal Planet" tvg-logo="" group-title="ENTRETENIMIENTO",Animal Planet
http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/185.ts
As you can see, there is the main tag for the file, #EXTM3U. Below that starts the video information tag (#EXTINF:-1 ...), and below that the video link entry (http://...).
Can you explicitly tell me how I can parse this whole file (it's a pretty large one) and save the fields in an array, for example videos[]?
Later I want to access every video's attributes, say videos[0]['title'] to get the title of the first video, and likewise for the other attributes, for example videos[42]['link'] to get the link to video #42.
I am already using cURL to get the file content into a variable like this:
<?php
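$handler = curl_init("link to m3u file");
// Without this option curl_exec() prints the body and returns true instead of returning the content
curl_setopt($handler, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec ($handler);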
curl_close($handler);
echo $response;
?>
What I need now is to parse the cURL response and save all the video information into an array, where I can access every attribute of every video.
I know I must use some regexp or something like that; I just don't understand how. Can you please help me with some code? Thank you so much.
Behold the magic of regex:
$string = <<<CUT
#EXTM3U
#EXTINF:-1 tvg-id="" tvg-name="A&E" tvg-logo="" group-title="ENTRETENIMIENTO",A&E`http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts
http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts
#EXTINF:-1 tvg-id="" tvg-name="ABC Puerto Rico" tvg-logo="" group-title="NACIONALES",ABC Puerto Rico
http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/96.ts
CUT;
preg_match_all('/(?P<tag>#EXTINF:-1)|(?:(?P<prop_key>[-a-z]+)=\"(?P<prop_val>[^"]+)")|(?<something>,[^\r\n]+)|(?<url>http[^\s]+)/', $string, $match );
$count = count( $match[0] );
$result = [];
$index = -1;
for( $i =0; $i < $count; $i++ ){
$item = $match[0][$i];
if( !empty($match['tag'][$i])){
//is a tag increment the result index
++$index;
}elseif( !empty($match['prop_key'][$i])){
//is a prop - split item
$result[$index][$match['prop_key'][$i]] = $match['prop_val'][$i];
}elseif( !empty($match['something'][$i])){
//is the text after the comma (the display title) - store it whole
$result[$index]['something'] = $item;
}elseif( !empty($match['url'][$i])){
$result[$index]['url'] = $item ;
}
}
print_r( $result );
Returns
array (
0 =>
array (
'tvg-name' => 'A&E',
'group-title' => 'ENTRETENIMIENTO',
'something' => ',A&E`http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts',
'url' => 'http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts',
),
1 =>
array (
'tvg-name' => 'ABC Puerto Rico',
'group-title' => 'NACIONALES',
'something' => ',ABC Puerto Rico',
'url' => 'http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/96.ts',
),
)
Seriously though, I have no clue what some of this is (the "something" field, for example). Anyway, this should get you started.
For the regex, it's actually pretty simple when it's broken down. The real trick is in using preg_match_all instead of preg_match.
Here is our regex:
/(?P<tag>#EXTINF:-1)|(?:(?P<prop_key>[-a-z]+)=\"(?P<prop_val>[^"]+)")|(?<something>,[^\r\n]+)|(?<url>http[^\s]+)/
First we will break it down into more manageable bits. These are separated by the pipe |, meaning "or". Each one can be thought of as a separate pattern: match this one or the next one. The order can be important, because at each position the alternatives are tried left to right, and as soon as one matches, the ones to its right are not tried. So you have to be careful not to have a regex that can match in two places (if you don't want that). However, it can be used to your advantage too, as I will show below. This is really what we are dealing with:
(?P<tag>#EXTINF:-1)
(?:(?P<prop_key>[-a-z]+)=\"(?P<prop_val>[^"]+)")
(?<something>,[^\r\n]+)
(?<url>http[^\s]+)
Four regular expressions. In all of these, (?P<name>...) is a named capture group; it just makes the pattern more readable and the pieces easier to find. If you look at the conditions I use to find the matches, for example !empty($match['tag'][$i]), we can use the tag key because of the named capture group; otherwise it would be 1. With several regexes combined, numbering them 1, 2, 3 can get messy, especially since the result is nested, so it would be $match[1][$i] for tag, etc. Anyway, once the group names are taken out, we have:
#EXTINF:-1 matches this string literally.
(?:(?P<prop_key>[-a-z]+)=\"(?P<prop_val>[^"]+)") is more complicated. (?: ... ) is a non-capturing group, so the key and value wind up at the same index in the match array without being captured together. Broken down, this is ([-a-z]+)=\"([^"]+)\": match a word, followed by =, then ", then anything but a ", ending with ". One side captures the key, the other the value, excluding the double quotes.
,[^\r\n]+ starts with a comma, then anything but a line break.
And last, http[^\s]+, a URL.
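To see the key/value part in isolation, here is a small sketch that runs just that sub-pattern over the first #EXTINF line from the question. It also shows why tvg-id and tvg-logo are missing from the result above: [^"]+ requires at least one character, so empty values never match. Swapping in [^"]* is my own variation, not part of the original answer:

```php
<?php
$line = '#EXTINF:-1 tvg-id="" tvg-name="A&E" tvg-logo="" group-title="ENTRETENIMIENTO",A&E';

// As in the answer: [^"]+ needs at least one character, so tvg-id="" is skipped.
preg_match_all('/(?P<prop_key>[-a-z]+)="(?P<prop_val>[^"]+)"/', $line, $plus);
print_r($plus['prop_key']); // tvg-name, group-title

// Variation: [^"]* also accepts empty values.
preg_match_all('/(?P<prop_key>[-a-z]+)="(?P<prop_val>[^"]*)"/', $line, $star);
print_r($star['prop_key']); // tvg-id, tvg-name, tvg-logo, group-title
```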
Now remember what I said about order being important: the URL http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts would match the last expression, but on the first entry of the sample it directly follows ,A&E` on the same line, so ,A&E`http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts matches the 3rd expression first and never gets to number 4.
Hope that helps. Granted, you'll need a basic grasp of regex; this is not really the place for a full tutorial, and you can find better examples than I can provide in a few short minutes.
Just for the sake of completeness, here is part of what preg_match_all returns
(
[0] => Array(
[0] => #EXTINF:-1
[1] => tvg-name="A&E"
[2] => group-title="ENTRETENIMIENTO"
[3] => ,A&E`http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts
[4] => http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts
[5] => #EXTINF:-1
[6] => tvg-name="ABC Puerto Rico"
[7] => group-title="NACIONALES"
[8] => ,ABC Puerto Rico
[9] => http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/96.ts
)
[tag] => Array(
[0] => #EXTINF:-1
[1] =>
[2] =>
[3] =>
[4] =>
[5] => #EXTINF:-1
[6] =>
[7] =>
[8] =>
[9] =>
)
[1] => Array(
[0] => #EXTINF:-1
[1] =>
[2] =>
[3] =>
[4] =>
[5] => #EXTINF:-1
[6] =>
[7] =>
[8] =>
[9] =>
)
[prop_key] => Array(
[0] =>
[1] => tvg-name
[2] => group-title
[3] =>
[4] =>
[5] =>
[6] => tvg-name
[7] => group-title
[8] =>
[9] =>
)
[2] => Array( ... duplicate of prop_key .. )
etc.
)
To find an item in the array above, look at the for loop: on the first run, with index 0, the main part of the match, $match[0][$i], contains all the matches, but the tag array only contains the items that matched that sub-pattern. We can correlate them using the $i index.
if( !empty($match['tag'][$i])){
//is a tag increment the result index
++$index;
}
If $match['tag'][$i] is not empty, the item is a tag. If you look at $match['tag'][0] when $i = 0, you will see that it is indeed not empty. On the second loop, $match['tag'][1] is empty but $match['prop_key'][1] is not, so we know that when $i = 1 the item is a prop_key match. That's how it works.
PS: if you can find a way to remove the duplicated numeric indexes, please share it with me... lol. These are the normal matches you would get without named capture groups; as I said, it can get messy.
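One way to drop the duplicated numeric indexes, sketched under the assumption of PHP 7.4+ (for the arrow function), is to filter the match array by key after the call. Key 0 is kept because the loop above reads $match[0][$i]; the shortened $string is just the first entry from the sample:

```php
<?php
$string = "#EXTINF:-1 tvg-name=\"A&E\" group-title=\"ENTRETENIMIENTO\",A&E\nhttp://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts";

preg_match_all('/(?P<tag>#EXTINF:-1)|(?:(?P<prop_key>[-a-z]+)=\"(?P<prop_val>[^"]+)")|(?<something>,[^\r\n]+)|(?<url>http[^\s]+)/', $string, $match);

// Keep the full-match array (key 0) and the named groups (string keys);
// drop the numeric duplicates 1..5.
$named = array_filter($match, fn($key) => is_string($key) || $key === 0, ARRAY_FILTER_USE_KEY);

print_r(array_keys($named)); // 0, tag, prop_key, prop_val, something, url
```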
I wrote a simple working m3u8 parser in PHP.
It parses a remote m3u8 file into JSON, but it's easy to change the output:
https://github.com/onigetoc/m3u8-PHP-Parser
I may soon change it or add a CURL parser instead of file_get_contents().
m3u-parser.php?url=https://raw.githubusercontent.com/onigetoc/m3u8-PHP-Parser/master/ressources/demofile.m3u
Once you have the cURL response, read the file from the remote location via cURL or fopen().
For that, you have to read the files in the directory at the remote location and save them all on the local server.
You can use the stat() function to get all the information about each file and keep it in $files.
That's the general idea of how to collect all the information; from there you can create the array.
Once the array is created, you can serialize it for printing.
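For the array itself, here is a line-based sketch (not the stat() approach described above, and not the parser from the GitHub link) that builds the videos[n]['title'] / videos[n]['link'] structure the question asks for. It assumes the playlist text is already in $response, as in the question's cURL snippet, that every #EXTINF line has the form #EXTINF:<duration> key="value" ...,<title>, and that the next non-comment line is the stream URL; parse_m3u is a hypothetical name:

```php
<?php
function parse_m3u(string $content): array
{
    $videos  = [];
    $current = null;
    foreach (preg_split('/\R/', $content) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        // #EXTINF:<duration> key="value" ...,<title>
        if (preg_match('/^#EXTINF:(-?\d+(?:\.\d+)?)((?:\s+[-a-z]+="[^"]*")*)\s*,(.*)$/i', $line, $m)) {
            $current = ['duration' => $m[1], 'title' => $m[3]];
            // Attributes, empty values included (e.g. tvg-id="").
            preg_match_all('/([-a-z]+)="([^"]*)"/i', $m[2], $attrs, PREG_SET_ORDER);
            foreach ($attrs as $a) {
                $current[$a[1]] = $a[2];
            }
        } elseif ($current !== null && $line[0] !== '#') {
            // First non-comment line after #EXTINF is the stream URL.
            $current['link'] = $line;
            $videos[] = $current;
            $current = null;
        }
    }
    return $videos;
}

$response = <<<'M3U'
#EXTM3U
#EXTINF:-1 tvg-id="" tvg-name="A&E" tvg-logo="" group-title="ENTRETENIMIENTO",A&E
http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts
#EXTINF:-1 tvg-id="" tvg-name="ABC Puerto Rico" tvg-logo="" group-title="NACIONALES",ABC Puerto Rico
http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/96.ts
M3U;

$videos = parse_m3u($response);
echo $videos[1]['title'], "\n"; // ABC Puerto Rico
echo $videos[0]['link'], "\n";  // http://nxtv.tk:8080/live/jarenas/iDKZrC56xZ/76.ts
echo serialize($videos), "\n";
```

Because it reads one line at a time, this avoids the alternation-order subtlety of a single combined regex; the trade-off is that it relies on the #EXTINF / URL line pairing.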
I would like to split a string where any character is a space or punctuation (excluding apostrophes). The following regex works as intended.
/[^a-z']/i
Words like I'll and Didn't are accepted, which is great.
The problem is with words like 'ere and 'im. I would like to remove the beginning apostrophe and have the words im and ere.
I would ideally like to stop/remove this within the regex pattern if possible.
Thanks in advance.
Try this
$str = "Words like I'll and Didn't are accepted, which is great.
The problem is with words like 'ere and 'im";
print_r(preg_split("/'?[^a-z']+'?/i", $str));
//Array ( [0] => Words [1] => like [2] => I'll [3] => and [4] => Didn't ...
// [16] => ere [17] => and [18] => im )
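One case the pattern above does not cover is an apostrophe at the very start of the string, since there is no preceding separator for '? to attach to. A possible extension (my own sketch, assuming such a leading apostrophe should be dropped too) adds a ^' alternative and uses PREG_SPLIT_NO_EMPTY to discard the empty pieces produced by splits at the start or end of the string:

```php
<?php
// Pattern from the answer above plus ^' for a leading apostrophe.
$pattern = "/'?[^a-z']+'?|^'/i";

print_r(preg_split($pattern, "'ere we go.", -1, PREG_SPLIT_NO_EMPTY)); // ere, we, go
print_r(preg_split($pattern, "I'll say 'im.", -1, PREG_SPLIT_NO_EMPTY)); // I'll, say, im
```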
I wrote a function to strip parameters from URLs; it looks like this:
function remove_it($c_link){
$regex = array();
$award = array();
$regex[] = '/[\?&](?<name>sa)=(?<value>[^&=]+)/';
$regex[] = '/[\?&](?<name>ei)=(?<value>[^&=]+)/';
$regex[] = '/[\?&](?<name>ved)=(?<value>[^&=]+)/';
$regex[] = '/[\?&](?<name>usg)=(?<value>[^&=]+)/';
foreach($regex as $remove){
$c_link = preg_replace($remove,'',$c_link);
}
return $c_link;
}
When I use a test URL like this:
$test = 'http://forum.gofeminin.de/forum/dietetique/__f2955_dietetique-Diatpillen.html&sa=U&ei=8doOUa6HOsfKtAaDpICIBQ&ved=0CB0QFjAA&usg=AFQjCNEcFS48QvteNkSNcszXv5RG6VUe2g';
it works perfectly. Now I wanted to use it in my code, so I called the function with my data, and it doesn't affect the string. I used print_r to see whether the string looks strange, but it's identical to $test:
$TEST-> http://forum.gofeminin.de/forum/dietetique/__f2955_dietetique-Diatpillen.html&sa=U&ei=C9wOUZuvCoeQtQavpoHoDg&ved=0CB0QFjAA&usg=AFQjCNHkRBKRpZXZX7idJ6YmSG0AIxtOdw
print_r-> http://forum.gofeminin.de/forum/dietetique/__f2955_dietetique-Diatpillen.html&sa=U&ei=C9wOUZuvCoeQtQavpoHoDg&ved=0CB0QFjAA&usg=AFQjCNHkRBKRpZXZX7idJ6YmSG0AIxtOdw
As I've used all the debugging methods I know of, I don't really know where to start searching... any pointers?
I made another test run and saved all the data in an array; later on I wanted to strip the parameters from one URL. Here is the test code:
echo '<pre>';
print_r($test); echo '<br>';
echo remove_it($test[0]);
echo '</pre>';
break;
The output was:
Array
(
[0] => http://forum.gofeminin.de/forum/dietetique/__f2955_dietetique-Diatpillen.html&sa=U&ei=LOIOUaqQGITntQbmmIHYBQ&ved=0CDUQFjAA&usg=AFQjCNGgMS-nHM2JY_PkIt7C_RT2dr9bUw
[1] => http://www.fitforfun.de/abnehmen/gesund-essen/diaetpillen/diaetpillen-appetitzuegler_aid_2100.html&sa=U&ei=LOIOUaqQGITntQbmmIHYBQ&ved=0CEEQFjAB&usg=AFQjCNG60KJy3wLR8DnLm9gKQEn-uR6l3w
[2] => http://www.stern.de/ernaehrung/uebergewicht-abnehmen/diaetpillen-check-welche-mittel-machen-duenn-das-abc-der-schlankmacher-615772.html&sa=U&ei=LOIOUaqQGITntQbmmIHYBQ&ved=0CEYQFjAC&usg=AFQjCNGLzi5UMG4g5INDkeBdMpENgY4gHg
[3] => http://getslim.de/diaetpillen-im-test&sa=U&ei=LOIOUaqQGITntQbmmIHYBQ&ved=0CEoQFjAD&usg=AFQjCNEcZnpSlVVxLgskK9DfhBF9AHGC2w
[4] => http://www.br.de/fernsehen/bayerisches-fernsehen/sendungen/gesundheit/themenuebersicht/medizin/schlankheitspillen-diaet-tabletten100.html&sa=U&ei=LOIOUaqQGITntQbmmIHYBQ&ved=0CFQQFjAE&usg=AFQjCNHujKjdfNsOkarYf6MwHCPODcISjw
[5] => http://www.diaetpillenvergleich.de/beste-diatpillen/&sa=U&ei=LOIOUaqQGITntQbmmIHYBQ&ved=0CFoQFjAF&usg=AFQjCNFBgbYjgutHJfp-eQztXTsKYk7rTw
[6] => http://www.diaetpillen-online.de/&sa=U&ei=LOIOUaqQGITntQbmmIHYBQ&ved=0CF4QFjAG&usg=AFQjCNF083onO0rkMuQjY0tEIhhdSM4Igg
[7] => http://diaet.erdbeerlounge.de/Diaetpillen/&sa=U&ei=LOIOUaqQGITntQbmmIHYBQ&ved=0CGIQFjAH&usg=AFQjCNFhNr-gsFxK1-vfjhnC1A5qQi1ZjQ
[8] => http://diaet.erdbeerlounge.de/abnehmen-forum/Diaetpillen-_t2698848s1&sa=U&ei=LOIOUaqQGITntQbmmIHYBQ&ved=0CGcQFjAI&usg=AFQjCNHhHY3zUnJtwF6-HV-DbsxaVUFxsg
[9] => http://www.gutefrage.net/tag/diaetpillen/1&sa=U&ei=LOIOUaqQGITntQbmmIHYBQ&ved=0CG0QFjAJ&usg=AFQjCNHPYODXZA1Sa2rs6ItnUWTOYkJj3w
)
http://forum.gofeminin.de/forum/dietetique/__f2955_dietetique-Diatpillen.html&sa=U&ei=LOIOUaqQGITntQbmmIHYBQ&ved=0CDUQFjAA&usg=AFQjCNGgMS-nHM2JY_PkIt7C_RT2dr9bUw
I made the test array and it works for me. It seems that your code is fine and something else is wrong.
Try wrapping the function input in double quotes.
remove_it("$test[0]");
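Another possible cause, offered as an assumption since the question doesn't confirm it: if the URLs were scraped from HTML, the & characters may actually be the entity &amp;. A browser renders &amp;sa= as &sa=, so print_r output looks identical to $test, but the pattern [\?&]sa= never matches it. The sketch below redefines remove_it() with the question's four patterns (passed to preg_replace() as an array) and uses a hypothetical encoded input to show the difference html_entity_decode() makes:

```php
<?php
function remove_it($c_link) {
    // The four patterns from the question; preg_replace() accepts an array of patterns.
    $regex = [
        '/[\?&](?<name>sa)=(?<value>[^&=]+)/',
        '/[\?&](?<name>ei)=(?<value>[^&=]+)/',
        '/[\?&](?<name>ved)=(?<value>[^&=]+)/',
        '/[\?&](?<name>usg)=(?<value>[^&=]+)/',
    ];
    return preg_replace($regex, '', $c_link);
}

// Hypothetical input: a URL whose & characters arrived as the HTML entity &amp;.
$raw = 'http://diaet.erdbeerlounge.de/Diaetpillen/&amp;sa=U&amp;ei=LOIOUaqQGITntQbmmIHYBQ&amp;ved=0CGIQFjAH';

echo remove_it($raw), "\n";                     // unchanged: no bare "&sa=" in the string
echo remove_it(html_entity_decode($raw)), "\n"; // http://diaet.erdbeerlounge.de/Diaetpillen/
```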
I want to test a URL against two regexes:
$reg1 = "/(^(((www\.))|(?!(www\.)))domain\.com\/paramsindex\/([a-z]+)\/([a-z]+)\/((([a-z0-9]+)(\-[a-z0-9]+){0,})(\/([a-z0-9]+)(\-[a-z0-9]+){0,}){0,})|()\/?$)/";
$reg2 = "/(^(((www\.))|(?!(www\.)))domain\.com\/paramsassoc\/([a-z]+)\/([a-z]+)\/((([a-z0-9]+)(\-[a-z0-9]+){0,})(\/([a-z0-9]+)(\-[a-z0-9]+){0,}){0,})|()\/?$)/";
$uri = "www.domain.com/paramsindex/cont/meth/par1/par2/par3/";
$r1 = preg_match($reg1, $uri);
echo "<p>First regex returned: {$r1}</p>";
$r2 = preg_match($reg2, $uri);
echo "<p>Second regex returned: {$r2}</p>";
Now these strings are not the same; the difference is this:
www.domain.com/paramsindex/cont/meth/par1/par2/par3/
vs.
www.domain.com/paramsassoc/cont/meth/par1/par2/par3/
And yet PHP preg_match returns 1 for both of them.
Now you will say this is a long regex, so why use it? The thing is, I could build a shorter regex, but it is built on the fly and... it just needs to be like that.
And what bothers me is that in Rubular the regexes work as they should.
When testing them I was using Rubular, and now in PHP it won't work. I know Rubular is a Ruby regex editor, but I thought it should be the same :(
What is the problem here? How should I write that regex in PHP so preg_match can see the difference? The regex should stay as close as possible to the one I already wrote. Is there some simple fix to my problem? Something I'm overlooking?
That behavior is by design, preg_match returns 1 when a match is found. If you want to capture matches, see the matches parameter at: http://php.net/manual/en/function.preg-match.php
Edit: For example
$matches = array();
$r2 = preg_match($reg2, $uri, $matches);
echo "<p>Second regex returned: ";
print_r($matches);
echo "</p>";
I'll leave the above to document my own stupidity for not answering the right question.
At the end of your regex you have |()\/?$)/, which lets the whole regex match just the end of the string (an empty group followed by an optional trailing slash), so any URL matches. Take it out and, from my tests, it looks like you're golden.
Always remember to group your operands!
I can imagine this one is quite hard to spot, but it's all because of your use of the or-operator |. You are not grouping the operands correctly, and that produces the result described in your post.
In the provided case, your top-level |() means the regex matches either the full expression to the left of the | or just an empty group followed by an optional slash at the end of the string, and that second alternative matches any URL.
To solve this issue you will need to put parentheses around the operands that should be ORed.
An easy way to see where everything goes wrong is to run the snippet below:
$reg1 = "/(^(((www\.))|(?!(www\.)))domain\.com\/paramsindex\/([a-z]+)\/([a-z]+)\/((([a-z0-9]+)(\-[a-z0-9]+){0,})(\/([a-z0-9]+)(\-[a-z0-9]+){0,}){0,})|()\/?$)/";
$reg2 = "/(^(((www\.))|(?!(www\.)))domain\.com\/paramsassoc\/([a-z]+)\/([a-z]+)\/((([a-z0-9]+)(\-[a-z0-9]+){0,})(\/([a-z0-9]+)(\-[a-z0-9]+){0,}){0,})|()\/?$)/";
$uri = "www.domain.com/paramsindex/cont/meth/par1/par2/par3/";
var_dump (preg_match($reg1, $uri, $match1));
var_dump (preg_match($reg2, $uri, $match2));
print_r ($match1);
print_r ($match2);
Output:
int(1)
int(1)
Array
(
[0] => www.domain.com/paramsindex/cont/meth/par1/par2/par3
[1] => www.domain.com/paramsindex/cont/meth/par1/par2/par3
[2] => www.
[3] => www.
[4] => www.
[5] =>
[6] => cont
[7] => meth
[8] => par1/par2/par3
[9] => par1
[10] => par1
[11] =>
[12] => /par3
[13] => par3
)
Array
(
[0] => /
[1] => /
[2] =>
[3] =>
[4] =>
[5] =>
[6] =>
[7] =>
[8] =>
[9] =>
[10] =>
[11] =>
[12] =>
[13] =>
[14] =>
[15] =>
)
As you see $reg2 matches a bunch of empty strings in $uri, which is an indication of what I described earlier.
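Here is one way to group the optional part so that the anchors apply to the whole pattern. It is a sketch, not the asker's generator: build_route_regex() is a hypothetical helper, the groups are rewritten as non-capturing (?: ... ) so the capture numbering differs from the original, and instead of a top-level |() the parameter list is made optional with ?:

```php
<?php
// Hypothetical helper: builds the pattern for one route segment such as "paramsindex".
function build_route_regex(string $segment): string
{
    $param = '[a-z0-9]+(?:-[a-z0-9]+)*';                 // e.g. "par1" or "my-param"
    return '/^(?:www\.)?domain\.com\/' . preg_quote($segment, '/')
         . '\/([a-z]+)\/([a-z]+)\/'                       // controller / method
         . '(' . $param . '(?:\/' . $param . ')*)?'       // optional parameter list
         . '\/?$/';                                       // optional trailing slash, then end
}

$reg1 = build_route_regex('paramsindex');
$reg2 = build_route_regex('paramsassoc');
$uri  = 'www.domain.com/paramsindex/cont/meth/par1/par2/par3/';

var_dump(preg_match($reg1, $uri, $m1)); // int(1)
var_dump(preg_match($reg2, $uri));      // int(0)
print_r($m1);
```

With every alternative inside a group, ^ and $ now bind the entire URL, so the two segments can no longer both match.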
If you come up with a short description of what you are trying to do, I can provide you with a fully functional (and probably somewhat neater than your current) regular expression.
Your RegEx is a mess and you will have to change it if you want it to work.
Check out the Rubular for your paramsindex: http://www.rubular.com/r/3ptjQ5aIrD
Now, for paramsassoc: http://www.rubular.com/r/o7GCbCsHyX
They both return a result. Sure, it's an array full of empty strings, but it is a result nonetheless.
That is why both are TRUE.
My original test implementation consisted of building an array of "ignore words" with the following code:
$ignoreList = array("test1", "test2", "test3");
Later on, I test for individual words in the $ignoreList:
if (!in_array($word, $ignoreList)) {
// ?? 0 avoids an "undefined index" notice the first time a word is counted
$words[$word] = ($words[$word] ?? 0) + 1;
}
This code works perfectly - upon later echoing my word list, no words on the $ignoreList show up. I refactored to make it easier to add or remove words:
//Import ignore list
$ignore_raw = file_get_contents("includes/ignore.txt");
$ignoreList = explode("\n", $ignore_raw);
ignore.txt is a plain text file with each item on its own line, no spaces. The import and explode seems to be working, because a print_r statement on $ignoreList results in:
Array ( [0] => a [1] => and [2] => are [3] => as [4] => for [5] => in [6] => is [7] => more [8] => of [9] => than [10] => that [11] => the [12] => to [13] => with )
The comparison code, however, stops working properly, and words on the ignore list show up once again in my final results. Any ideas what's wrong?
Your ignore.txt file may have \r\n line endings, and your words actually have a trailing \r.
Try this:
$ignoreList = array_map('trim', file("includes/ignore.txt"));
BTW, your code could be refactored like this:
$words = array_diff($words, $ignoreList); // $words is the plain list of words here; removes the ignored ones
$words = array_count_values($words); // counts occurrences of each remaining word
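A small sketch of both points together, using a simulated ignore.txt with \r\n line endings instead of a real file and a hypothetical word list. array_filter(..., 'strlen') also drops the empty entry produced by the final line break:

```php
<?php
// Simulated contents of includes/ignore.txt saved with Windows (\r\n) line endings.
$ignore_raw = "a\r\nand\r\nthe\r\n";

$broken = explode("\n", $ignore_raw);                         // "a\r", "and\r", "the\r", ""
$fixed  = array_filter(array_map('trim', $broken), 'strlen'); // "a", "and", "the"

var_dump(in_array('the', $broken)); // bool(false): "the" != "the\r"
var_dump(in_array('the', $fixed));  // bool(true)

// Hypothetical word list, then the array_diff + array_count_values refactoring.
$words  = explode(' ', 'the cat and the hat');
$counts = array_count_values(array_diff($words, $fixed));
print_r($counts); // cat => 1, hat => 1
```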