Extract all MP3 and OGG Links from String with preg_match_all - php

I was trying to write a regular expression to extract all MP3/OGG links from an example string, but I couldn't. This is the example string I'm trying to extract the MP3/OGG links from:
this is a example word http://domain.com/sample.mp3 and second file is https://www.mydomain.com/sample2.ogg. then this is a link for third file <a href="http://seconddomain.com/files/music.mp3">Download</a>
And the PHP part:
$Word = "this is a example word http://domain.com/sample.mp3 and second file is https://www.mydomain.com/sample2.ogg. then this is a link for third file <a href="http://seconddomain.com/files/music.mp3">Download</a>";
$Pattern = '/href=\"(.*?)\".mp3/';
preg_match_all($Pattern,$Word,$Matches);
print_r($Matches);
I tried these too:
$Pattern = '/href="([^"]\.mp3|ogg)"/';
$Pattern = '/([-a-z0-9_\/:.]+\.(mp3|ogg))/i';
So I need your help to fix this code and extract all MP3/OGG links from that example string.
Thank you, guys.

To retrieve all links, you can use:
((https?:\/\/)?(\w+?\.)+?(\w+?\/)+\w+?\.(mp3|ogg))
((https?:\/\/)? Optional http:// or https://
(\w+?\.)+? Matches the domain labels, each followed by a dot
(\w+?\/)+ Matches the final domain label and any path segments, each followed by a forward slash
\w+?\.(mp3|ogg)) Matches a filename ending in .mp3 or .ogg (the dot is escaped so that it matches only a literal ".")
The string you provided contains several unescaped quotation marks. With those escaped and my regex added in:
$Word = "this is a example word http://domain.com/sample.mp3 and second file is https://www.mydomain.com/sample2.ogg. then this is a link for third file <a href=\"http://seconddomain.com/files/music.mp3\">Download</a>";
$Pattern = '/((https?:\/\/)?(\w+?\.)+?(\w+?\/)+\w+?\.(mp3|ogg))/im';
preg_match_all($Pattern,$Word,$Matches);
var_dump($Matches[0]);
Produces the following output:
array (size=3)
0 => string 'http://domain.com/sample.mp3' (length=28)
1 => string 'https://www.mydomain.com/sample2.ogg' (length=36)
2 => string 'http://seconddomain.com/files/music.mp3' (length=39)

..extract all MP3/OGG links from that example word.
e.g.:
(?<=https?://(.+)?)\.(mp3|ogg)
$1 - uri
$2 - extension
Update:
Unfortunately, PHP (tested on v5.5) rejects a search with:
(?<=https?://(.+)?)\.(mp3|ogg)
because the lookbehind there must have a fixed length:
Compilation failed: lookbehind assertion is not fixed length at offset n
So use the mirrored variant with a lookahead instead:
(?<=p1(.+)?)p2 - matches p2 only if p1 occurs before it (variable-length lookbehind, rejected by PHP)
p2(?=(.+)p3) - matches p2 only if p3 occurs after it (a lookahead may have variable length in PHP, so .+? works)
For your sample:
//p2(?=.*p3)
preg_match_all("#https?://(?=(.+?)\.(mp3|ogg))#im", $Word, $Matches);
/*
[0] => Array
(
[0] => http://
[1] => https://
[2] => http://
)
[1] => Array
(
[0] => domain.com/sample
[1] => www.mydomain.com/sample2
[2] => seconddomain.com/files/music
)
[2] => Array
(
[0] => mp3
[1] => ogg
[2] => mp3
)
*/
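Because the lookahead does not consume the host and path, $Matches[0] above holds only the scheme. A minimal sketch of gluing the pieces back into full URLs, assuming the same pattern and a shortened version of the sample string ($urls is just an illustrative name):

```php
<?php
// Shortened sample string (assumption: same shape as the question's input).
$Word = 'first http://domain.com/sample.mp3 and second https://www.mydomain.com/sample2.ogg.';

// PREG_SET_ORDER groups the captures per match instead of per group.
preg_match_all('#https?://(?=(.+?)\.(mp3|ogg))#i', $Word, $Matches, PREG_SET_ORDER);

$urls = [];
foreach ($Matches as $m) {
    // $m[0] = scheme, $m[1] = host and path, $m[2] = extension
    $urls[] = $m[0] . $m[1] . '.' . $m[2];
}
print_r($urls);
```

Note that .+? can run across whitespace when a URL in the text does not end in .mp3 or .ogg, so replacing the dot with \S may be a tighter choice for free-form input.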

Related

Extract urls from string without spaces between

Let's say I have a string like this:
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";
and I want to get an array like this:
array(
[0] => "http://foo.com/bar",
[1] => "https://bar.com",
[2] => "//foo.com/foo/bar"
);
I'm looking for something like:
preg_split("~((https?:)?//)~", $urlsString, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
Where PREG_SPLIT_DELIM_CAPTURE definition is:
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
That said, the above preg_split returns:
array (size=3)
0 => string '' (length=0)
1 => string 'foo.com/bar' (length=11)
2 => string 'bar.com//foo.com/foo/bar' (length=24)
Any idea of what I'm doing wrong or any other idea?
PS: I was using this regex until I've realized that it doesn't cover this case.
Edit:
As @sidyll pointed out, I'm missing the $limit argument in the preg_split call. In any case, there is something wrong with my regex, so I will use @WiktorStribiżew's suggestion.
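The three-element result shown earlier has a simple explanation: the third parameter of preg_split is $limit and the flags come fourth, and PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE evaluates to 3. The call in the question therefore splits into at most three pieces with no flags set. A small sketch that reproduces this ($limitByAccident and $parts are illustrative names):

```php
<?php
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";

// The flags were passed in the $limit slot: 1 | 2 === 3.
$limitByAccident = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;

// Equivalent to the call in the question: limit 3, no flags.
$parts = preg_split("~((https?:)?//)~", $urlsString, $limitByAccident);
var_dump($limitByAccident, $parts);
```

Passing -1 as $limit and the flags as the fourth argument applies the flags as intended, although the result then contains the delimiters as separate elements rather than whole URLs.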
You may use a preg_match_all with the following regex:
'~(?:https?:)?//.*?(?=$|(?:https?:)?//)~'
Details:
(?:https?:)? - https: or http:, optional (1 or 0 times)
// - double /
.*? - any 0+ chars other than line break as few as possible up to the first
(?=$|(?:https?:)?//) - either of the two:
$ - end of string
(?:https?:)?// - https: or http:, optional (1 or 0 times), followed with a double /
Below is a PHP demo:
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";
preg_match_all('~(?:https?:)?//.*?(?=$|(?:https?:)?//)~', $urlsString, $urls);
print_r($urls[0]);
// => Array ( [0] => http://foo.com/bar [1] => https://bar.com [2] => //foo.com/foo/bar )

preg_match to match number csv format and capture each number?

I want to use preg_match to parse '123,456,789,323' and capture each number into the array $m.
My PHP code:
preg_match("/^(\d+)(?:,(\d+))*?$/",'123,456,789,323',$m);
print_r($m);
This is how I interpret my regexp:
^: Begin of line
1st (\d+): Capture 1st number
,(\d+): Match pattern 'a comma then a number'.
(?:,(\d+))*?: Match zero or more [using *] of above pattern but don't
capture whole pattern [using ?:] instead only capture
the number [using (\d+)]. Lastly, match pattern
nongreedy [using last ?]
$: Match end of line.
But I get this output:
Array
(
[0] => 123,456,789,323
[1] => 123
[2] => 323
)
What I want is:
Array
(
[0] => 123,456,789,323
[1] => 123
[2] => 456
[3] => 789
[4] => 323
)
I thought (...)* was too greedy, so I used (...)*?, but that doesn't improve the output. What am I missing?
PS: I want to know how a regexp can do this, rather than using another approach such as explode().
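A repeated capturing group keeps only the text from its last iteration, which is why group 2 ends up holding 323 and the numbers in between are lost. One possible regex-only workaround, sketched here under the assumption that validating first and collecting afterwards is acceptable, is to check the overall format with preg_match and then gather the numbers with preg_match_all ($csv is an illustrative name):

```php
<?php
$csv = '123,456,789,323';
$m = [[]];

// Step 1: make sure the whole string is a comma-separated list of numbers.
if (preg_match('/^\d+(?:,\d+)*$/', $csv)) {
    // Step 2: collect every run of digits; $m[0] holds the full matches.
    preg_match_all('/\d+/', $csv, $m);
}
print_r($m[0]);
```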

split string by spaces and colon but not if inside quotes

having a string like this:
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
the desired result is:
[0] => Array (
[0] => dateto:'2015-10-07 15:05'
[1] => xxxx
[2] => datefrom:'2015-10-09 15:05'
[3] => yyyy
[4] => asdf
)
what I get with:
preg_match_all("/\'(?:[^()]|(?R))+\'|'[^']*'|[^(),\s]+/", $str, $m);
is:
[0] => Array (
[0] => dateto:'2015-10-07
[1] => 15:05'
[2] => xxxx
[3] => datefrom:'2015-10-09
[4] => 15:05'
[5] => yyyy
[6] => asdf
)
I also tried preg_split("/[\s]+/", $str), but I have no clue how to avoid splitting when the whitespace is between quotes. Can anyone show me how, and also please explain the regex? Thank you!
I would use the PCRE verbs (*SKIP)(*F):
preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);
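A quick check of this split against the string from the question (a sketch; $parts is an illustrative name). A quoted section is matched first and then discarded by (*SKIP)(*F), so only whitespace outside quotes acts as a delimiter:

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// Left branch: consume a quoted section, then force a failure and skip past it.
// Right branch: the real delimiter, one or more whitespace characters.
$parts = preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);
print_r($parts);
```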
Often, when you want to split a string, preg_split isn't the best approach (that seems a little counterintuitive, but it's true most of the time). A more efficient way is to find all items (with preg_match_all) using a pattern that describes everything that is not the delimiter (white-space here):
$pattern = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;
if (preg_match_all($pattern, $str, $m))
$result = $m[0];
pattern details:
~                        # pattern delimiter
(?=\S)                   # the lookahead assertion only succeeds if there is a non-
                         # white-space character at the current position.
                         # (This lookahead is useful for two reasons:
                         #  - it allows the regex engine to quickly find the start of
                         #    the next item without having to test each branch of the
                         #    following alternation at each position in the string
                         #    until one succeeds.
                         #  - it ensures that there's at least one non-white-space
                         #    character. Without it, the pattern may match an empty string.
                         # )
[^'"\s]*                 # all that is not a quote or a white-space
(?:                      # optional quoted parts
    '[^']*' [^'"\s]*     # single quotes
  |
    "[^"]*" [^'"\s]*     # double quotes
)*
~
Note that with this slightly longer pattern, the five items of your example string are found in only 60 steps. You can also use this shorter, simpler pattern:
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
but it's a little less efficient.
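For comparison, a short sketch that runs the shorter pattern on the string from the question; it should collect the same five items. The nowdoc is used only to avoid escaping the quotes inside the pattern:

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// One item = any mix of unquoted runs and complete quoted sections.
$pattern = <<<'EOD'
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
EOD;

preg_match_all($pattern, $str, $m);
print_r($m[0]);
```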
For your example, you can use preg_split with negative lookbehind (?<!\d), i.e.:
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
$matches = preg_split('/(?<!\d)(\s)/', $str);
print_r($matches);
Output:
Array
(
[0] => dateto:'2015-10-07 15:05'
[1] => xxxx
[2] => datefrom:'2015-10-09 15:05'
[3] => yyyy
[4] => asdf
)
Demo:
http://ideone.com/EP06Nt
Regex Explanation:
(?<!\d)(\s)
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\d)»
Match a single character that is a “digit” «\d»
Match the regex below and capture its match into backreference number 1 «(\s)»
Match a single character that is a “whitespace character” «\s»

PHP Regex: How to get optional text if present?

Let's take an example of following string:
$string = "length:max(260):min(20)";
In the above string, :max(260):min(20) is optional. I want to get it if it is present otherwise only length should be returned.
I have following regex but it doesn't work:
/(.*?)(?::(.*?))?/se
It doesn't return anything in the array when I use preg_match function.
Remember, there can be something else than above string. Maybe like this:
$string = "number:disallow(negative)";
Is there any problem in my regex or PHP won't return anything? Dumping preg_match returns int 1 which means the string matches the regex.
Fully Dumped:
int 1
array (size=2)
0 => string '' (length=0)
1 => string '' (length=0)
Your pattern starts with a lazy quantifier, (.*?), and everything after it is optional, so the engine is satisfied with an empty match at position zero and stops there. That is why preg_match returns 1 while both array entries are empty strings. Switching from preg_match to preg_match_all makes this visible, since it lists every match the pattern finds.
Another problem is the regular expression itself: the lazy groups make the engine do needless work. Also, the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, and it only ever applied to the preg_replace function.
You don't need the s modifier either.
This works for your case:
/([^:]+)(:.*)?/
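A minimal sketch of this pattern applied to the strings from the question, assuming PHP 7 or later for the ?? operator ($inputs and $results are illustrative names). When the optional part is missing, group 2 is simply absent from the match array:

```php
<?php
$inputs = ['length:max(260):min(20)', 'length', 'number:disallow(negative)'];
$results = [];

foreach ($inputs as $string) {
    preg_match('/([^:]+)(:.*)?/', $string, $m);
    // $m[1] is the name; $m[2] exists only if the optional part matched.
    $results[$string] = [$m[1], $m[2] ?? null];
}
print_r($results);
```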
I tried to prepare a regex that should solve your issue and also add some value:
it not only matches the optional elements but also captures them as key/value pairs.
Regex
/(?<=:|)(?'prop'\w+)(?:\((?'val'.+?)\))?/g
Test string
length:max(260):min(20)
length
number:disallow(negative)
Result
MATCH 1
prop [0-6] length
MATCH 2
prop [7-10] max
val [11-14] 260
MATCH 3
prop [16-19] min
val [20-22] 20
MATCH 4
prop [24-30] length
MATCH 5
prop [31-37] number
MATCH 6
prop [38-46] disallow
val [47-55] negative
EDIT
I think I understand what you meant by duplicate array entries with different keys: they were caused by the named captures (prop and val).
Here is a revision without named capturing groups:
Regex
/(?<=:|)(\w+)(?:\((.+?)\))?/
Sample code
$str = "length:max(260):min(20)";
$str .= "\nlength";
$str .= "\nnumber:disallow(negative)";
preg_match_all("/(?<=:|)(\w+)(?:\((.+?)\))?/",
$str,
$matches);
print_r($matches);
Result
Array
(
[0] => Array
(
[0] => length
[1] => max(260)
[2] => min(20)
[3] => length
[4] => number
[5] => disallow(negative)
)
[1] => Array
(
[0] => length
[1] => max
[2] => min
[3] => length
[4] => number
[5] => disallow
)
[2] => Array
(
[0] =>
[1] => 260
[2] => 20
[3] =>
[4] =>
[5] => negative
)
)

regex to find year/month substring

Can someone help me with a regular expression to get the year and month from a text string?
Here is an example text string:
http://www.domain.com/files/images/2012/02/filename.jpg
I'd like the regex to return 2012/02.
This regex pattern would match what you need:
(?<=\/)\d{4}\/\d{2}(?=\/)
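A short sketch of using this pattern from PHP with the URL from the question. The ~ delimiter is a choice made here so that the slashes need no escaping:

```php
<?php
$url = 'http://www.domain.com/files/images/2012/02/filename.jpg';

// Four digits, a slash, two digits, with a slash on either side.
if (preg_match('~(?<=/)\d{4}/\d{2}(?=/)~', $url, $m)) {
    echo $m[0], "\n";
}
```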
Depending on your situation and how much your strings vary - you might be able to dodge a bullet by simply using PHP's handy explode() function.
A simple demonstration - Dim the lights please...
$str = 'http://www.domain.com/files/images/2012/02/filename.jpg';
print_r( explode("/",$str) );
Returns :
Array
(
[0] => http:
[1] =>
[2] => www.domain.com
[3] => files
[4] => images
[5] => 2012 // Jack
[6] => 02 // Pot!
[7] => filename.jpg
)
The explode() function splits a string on a "delimiter" that you provide. In this example I have used the / (slash) character.
So you see, you can just grab the values at indices 5 and 6 to get the date values.
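Put together, that could look like the following sketch ($yearMonth is an illustrative name). It assumes the year and month always sit at the same depth in the path, which only holds if every URL has this layout:

```php
<?php
$str = 'http://www.domain.com/files/images/2012/02/filename.jpg';

$parts = explode('/', $str);

// Indices 5 and 6 hold the year and month for this URL layout.
$yearMonth = $parts[5] . '/' . $parts[6];
echo $yearMonth, "\n";
```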
