Regex to match YouTube URLs


I am trying to validate a Youtube URL using regex:
preg_match('~http://youtube.com/watch\?v=[a-zA-Z0-9-]+~', $videoLink)
It kind of works, but it can also match URLs that are malformed. For example, this matches, as it should:
http://www.youtube.com/watch?v=Zu4WXiPRek
But so will this:
http://www.youtube.com/watch?v=Zu4WX£&P!ek
And this won't:
http://www.youtube.com/watch?v=!Zu4WX£&P4ek
I think it's because of the + operator. It seems to match only the first characters after v=, when it needs to match everything after v= against [a-zA-Z0-9-]. Any help is appreciated, thanks.

Here is an alternative that is longer and much less elegant than a regex, but it uses PHP's native URL parsing functions, so it may be a bit more reliable in the long run:
$url = "http://www.youtube.com/watch?v=Zu4WXiPRek";
$query_string = parse_url($url, PHP_URL_QUERY); // v=Zu4WXiPRek
$query_string_parsed = array();
parse_str($query_string, $query_string_parsed); // an array with all GET params
echo($query_string_parsed["v"]); // Will output Zu4WXiPRek that you can then
// validate for [a-zA-Z0-9_-] using a regex
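A minimal sketch of that idea wrapped in a function, assuming a valid ID consists only of letters, digits, "_" and "-" (the function name youtube_id_from_query is hypothetical). It does not check the host or path; it returns null when there is no query string, no v parameter, or v contains other characters:

```php
<?php
// Hypothetical helper: extract "v" with parse_url()/parse_str(), then validate it.
// Assumption: IDs only contain [A-Za-z0-9_-].
function youtube_id_from_query(string $url): ?string
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string, or the URL could not be parsed
    }
    parse_str($query, $params); // $params now holds every GET parameter
    $id = $params['v'] ?? null;
    if (!is_string($id) || !preg_match('/^[A-Za-z0-9_-]+$/', $id)) {
        return null; // missing, array-valued (v[]=...), or bad characters
    }
    return $id;
}

echo youtube_id_from_query('http://www.youtube.com/watch?feature=share&v=Zu4WXiPRek'), "\n";
```

Because parse_str() handles the whole query, v does not have to be the first parameter.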

The problem is that you are not requiring any particular number of characters in the v= part of the URL. So, for instance, checking
http://www.youtube.com/watch?v=Zu4WX£&P!ek
will match
http://www.youtube.com/watch?v=Zu4WX
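and therefore return true. You need to either specify the number of characters you need in the v= part (10 in your examples; real YouTube IDs are typically 11):
preg_match('~http://(www\.)?youtube\.com/watch\?v=[a-zA-Z0-9-]{10}~', $videoLink)
or specify that the group [a-zA-Z0-9-] must be the last part of the string:
preg_match('~http://(www\.)?youtube\.com/watch\?v=[a-zA-Z0-9-]+$~', $videoLink)
In both patterns, (www\.)? allows the www. prefix your example URLs use, and \. matches a literal dot.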
Your other example
http://www.youtube.com/watch?v=!Zu4WX£&P4ek
does not match, because the + sign requires at least one character matching [a-zA-Z0-9-] right after v=, and ! is not one of them.
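A quick way to check the end-anchored variant against the three URLs from the question. This sketch assumes an optional www. prefix, escapes the dot in youtube\.com, and adds a ^ anchor so that the whole string has to match:

```php
<?php
// End-anchored check; ^ and $ force the entire string to fit the pattern.
const YT_ANCHORED = '~^http://(www\.)?youtube\.com/watch\?v=[a-zA-Z0-9-]+$~';

$examples = [
    'http://www.youtube.com/watch?v=Zu4WXiPRek',   // well-formed
    'http://www.youtube.com/watch?v=Zu4WX£&P!ek',  // junk after a valid prefix
    'http://www.youtube.com/watch?v=!Zu4WX£&P4ek', // junk right after v=
];

foreach ($examples as $url) {
    // preg_match() returns 1 on a match and 0 otherwise
    printf("%s => %d\n", $url, preg_match(YT_ANCHORED, $url));
}
```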

Short answer:
preg_match('%(http://www\.youtube\.com/watch\?v=(?:[a-zA-Z0-9-])+)(?:[&"\'\s]|$)%', $videoLink)
There are a few assumptions made here, so let me explain:
I added a capturing group ( ... ) around the entire http://www.youtube.com/watch?v=blah part of the link, so that we can say "I want to get the whole validated link up to and including the ?v=movieHash".
I added the non-capturing group (?: ... ) around your character set [a-zA-Z0-9-] and left the + sign outside of it. This lets us match all allowable characters up to a certain point.
Most importantly, you need to tell it how you expect your link to terminate. I'm taking a guess for you with (?:[&"\'\s]|$):
- Will it be in HTML format (e.g. an anchor tag)? If so, the link in href will obviously end with a " or '.
- Or maybe there's more to the query string, so there would be an & after the value of v.
- Maybe there's a space or line break after the end of the link (\s).
- Or the link is the last thing in the string, which $ covers.
The important piece is that you can get much more accurate results if you know what's surrounding what you are searching for, as is the case with many regular expressions.
This non-capturing group (in which I'm making assumptions for you) will take a stab at finding and ignoring all the extra junk after what you care about (the ?v=awesomeMovieHash).
Results:
http://www.youtube.com/watch?v=Zu4WXiPRek
- Group 1 contains http://www.youtube.com/watch?v=Zu4WXiPRek
http://www.youtube.com/watch?v=Zu4WX&a=b
- Group 1 contains http://www.youtube.com/watch?v=Zu4WX
http://www.youtube.com/watch?v=!Zu4WX£&P4ek
- No match
<a href="http://www.youtube.com/watch?v=Zu4WX&size=large">
- Group 1 contains http://www.youtube.com/watch?v=Zu4WX
http://www.youtube.com/watch?v=Zu4WX£&P!ek
- No match
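A sketch that runs this kind of pattern over the inputs from the results list. It includes |$ in the terminator group (assuming a link may also be the last thing in the string) and escapes the dots in the host; first_link() is a hypothetical helper:

```php
<?php
// Terminator group: &, quote, whitespace, or end of string.
$pattern = '%(http://www\.youtube\.com/watch\?v=(?:[a-zA-Z0-9-])+)(?:[&"\'\s]|$)%';

function first_link(string $pattern, string $text): ?string
{
    // Group 1 holds the link up to and including the v= value
    return preg_match($pattern, $text, $m) ? $m[1] : null;
}

echo first_link($pattern, '<a href="http://www.youtube.com/watch?v=Zu4WX&size=large">'), "\n";
```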

The "v=..." blob is not guaranteed to be the first parameter in the query part of the URL. I'd recommend using PHP's parse_url() function to break the URL into its component parts. You can also reassemble a pristine URL if someone began the string with "https://" or simply used "youtube.com" instead of "www.youtube.com", etc.
function get_youtube_vidid ($url) {
    $vidid = false;
    $valid_schemes = array ('http', 'https');
    $valid_hosts = array ('www.youtube.com', 'youtube.com');
    $valid_paths = array ('/watch');
    $bits = parse_url ($url);
    if (! is_array ($bits)) {
        return false;
    }
    if (! (array_key_exists ('scheme', $bits)
           and array_key_exists ('host', $bits)
           and array_key_exists ('path', $bits)
           and array_key_exists ('query', $bits))) {
        return false;
    }
    if (! in_array ($bits['scheme'], $valid_schemes)) {
        return false;
    }
    if (! in_array ($bits['host'], $valid_hosts)) {
        return false;
    }
    if (! in_array ($bits['path'], $valid_paths)) {
        return false;
    }
    $querypairs = explode ('&', $bits['query']);
    if (count ($querypairs) < 1) {
        return false;
    }
    foreach ($querypairs as $querypair) {
        # Pad to two elements so a pair without "=" does not raise a warning
        list ($key, $value) = array_pad (explode ('=', $querypair, 2), 2, '');
        if ($key == 'v') {
            if (preg_match ('/^[a-zA-Z0-9\-_]+$/', $value)) {
                # Set the return value
                $vidid = $value;
            }
        }
    }
    return $vidid;
}
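To illustrate the "reassemble a pristine URL" idea, here is a sketch that accepts http or https and the host with or without www., then rebuilds a single canonical form. The function name and the choice of https://www.youtube.com/watch?v= as the canonical form are assumptions; it uses parse_str() for the query instead of splitting on & by hand:

```php
<?php
// Hypothetical helper: validate the URL parts, then rebuild one canonical URL.
function canonical_youtube_url(string $url): ?string
{
    $bits = parse_url($url);
    if (!is_array($bits) || !isset($bits['scheme'], $bits['host'], $bits['path'], $bits['query'])) {
        return null;
    }
    $scheme = strtolower($bits['scheme']);
    $host = strtolower($bits['host']);
    if (!in_array($scheme, ['http', 'https'], true)
        || !in_array($host, ['www.youtube.com', 'youtube.com'], true)
        || $bits['path'] !== '/watch') {
        return null;
    }
    parse_str($bits['query'], $params); // "v" may appear anywhere in the query
    $id = $params['v'] ?? null;
    if (!is_string($id) || !preg_match('/^[a-zA-Z0-9_-]+$/', $id)) {
        return null;
    }
    // The ID was validated above, so it can be concatenated without encoding
    return 'https://www.youtube.com/watch?v=' . $id;
}

echo canonical_youtube_url('http://youtube.com/watch?feature=share&v=Zu4WXiPRek'), "\n";
```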

The following regex matches most YouTube links, on youtu.be or youtube.com, including the /embed/, /v/ and /watch?v= forms:
$pattern='#(((http(s)?://(www\.)?)|(www\.)|\s)(youtu\.be|youtube\.com)/(embed/|v/|watch(\?v=|\?.+&v=|/))?([a-zA-Z0-9._\/~\#&=;%+?!-]+))#si';
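For PCRE to compile this pattern, the "#" inside the character class has to be escaped (it is also the delimiter), and "-" has to sit at the end of the class (otherwise "?-\!" is read as a range from "?" down to "!", which is out of order). A sketch that applies the pattern with both of those points in place and reads two of its groups:

```php
<?php
// "#" escaped inside the class; "-" placed last so it is a literal dash.
$pattern = '#(((http(s)?://(www\.)?)|(www\.)|\s)(youtu\.be|youtube\.com)/(embed/|v/|watch(\?v=|\?.+&v=|/))?([a-zA-Z0-9._\/~\#&=;%+?!-]+))#si';

foreach (['https://youtu.be/Zu4WXiPRek', 'http://www.youtube.com/watch?v=Zu4WXiPRek'] as $url) {
    if (preg_match($pattern, $url, $m)) {
        // Group 7 is the host; group 10 is everything after the recognised path prefix
        echo $m[7], ' ', $m[10], "\n";
    }
}
```

Group 10 is not always just the ID: for watch?v=ID&feature=x it also includes &feature=x, because & and = are in the character class.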

Related

Check if string is a comma-separated list of digits

In table1 I have a varchar field where I store a comma-separated list of ids from table2 (id - INT UNSIGNED AUTOINCREMENT).
For example: 1,3,5,12,90
Also, ids should not be repeated.
I need to check whether a string (coming from outside) matches this rule.
For example, I need to check $_POST['id_list'].
Data consistency is not important for now (for example, I insert this value without checking whether these ids really exist in table2).
Any advice will be helpful.

The easiest way to do such a check is to use a regular expression (preg_match).
Let's try to find a pattern matching our rule.
Just comma-separated digits:
^[0-9]+(,[0-9]+)*$
^ - means start of string.
$ - means end of string.
[0-9]+ - means that our string MUST start with one or more digits.
(,[0-9]+)* - means that our string CAN continue with ",$someDigits", from zero to as many times as you wish.
But if our digits are "INT UNSIGNED AUTOINCREMENT", we should modify our pattern this way:
^[1-9][0-9]*(,[1-9][0-9]*)*$
to exclude cases like: 0,01,02,009,000,012
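A sketch that runs the stricter pattern against a few sample strings (the samples are made up for illustration) to show which inputs pass:

```php
<?php
// Each element must start with 1-9, followed by any number of further digits.
const ID_LIST = '/^[1-9][0-9]*(,[1-9][0-9]*)*$/';

$samples = ['1,3,5,12,90', '1', '0,1', '1,02', '1,,2', '1,2,'];
foreach ($samples as $s) {
    printf("%-12s %s\n", $s, preg_match(ID_LIST, $s) ? 'ok' : 'rejected');
}
```

Note that $ also matches just before a final newline, so "1,2\n" passes this pattern; adding the D modifier (/...$/D) makes $ match only at the very end.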
As for unique values, I think it is clearer to split the string by comma into an array (explode), pass it through array_unique and compare the counts.
So the resulting check function is:
function isComaSeparatedIds($string, $allowEmpty = false) {
    if ($allowEmpty AND $string === '') {
        return true;
    }
    if (!preg_match('#^[1-9][0-9]*(,[1-9][0-9]*)*$#', $string)) {
        return false;
    }
    $idsArray = explode(',', $string);
    return count($idsArray) == count(array_unique($idsArray));
}
I also added an $allowEmpty argument in case you would like to allow empty strings.

For the sake of completeness I would like to mention the following solution:
<?php
$check = explode(',', $string);
if ($diff = array_diff_key($check, array_filter($check, 'ctype_digit'))) {
// at least one is not a digit
foreach ($diff as $failIndex => $failValue) {
// handle
}
}
For fewer than 1000 digits in the string this is a little faster than preg_match, and as a little extra you get the positions and values of the items that are not digits.
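A sketch that wraps the same explode()/array_filter()/array_diff_key() idea in a helper (non_digit_items is a hypothetical name). Note that ctype_digit() accepts "0" and "012", and nothing here checks for duplicates, so those rules still need separate checks:

```php
<?php
// Returns position => value for every piece that is not made of digits only.
// An empty array means every piece passed.
function non_digit_items(string $string): array
{
    $check = explode(',', $string);
    // array_filter() keeps the original keys, so array_diff_key() can report
    // exactly which positions failed ctype_digit()
    return array_diff_key($check, array_filter($check, 'ctype_digit'));
}

print_r(non_digit_items('1,3,x,12,9a'));
```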

This is a bit of a cheeky solution, but it sure does the job.
<?php
$a="1,3,5,12,90";
$b=explode(",",$a);
$str='';
for($c=0;$c<count($b);$c++)
{
    // Add one "Y" for every element that consists of digits only
    if (preg_match('/^[0-9]+$/', $b[$c]))
    {
        $str=$str."Y";
    }
}
if(count(array_unique($b))==count($b) && (count($b)==strlen($str)))
{
    echo $str;
    //FURTHER CODE HERE WHEN ALL ELEMENTS ARE UNIQUE AND VALID NUMBERS
}
else
{
    //FURTHER CODE HERE WHEN NOT UNIQUE OR NOT VALID NUMBERS
}
?>

Replace string between two slashes [duplicate]

This question already has answers here: My regex is matching too much. How do I make it stop?
I have to modify a URL like this:
$string = "/st:1/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30";
Namely, I want to delete st:1 with a regex. I used:
preg_replace("/\/st:(.*)\//",'',$string)
but I got
end:2015-07-30
while I would like to get:
/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30
The same goes if I want to delete fp:1.

You can use:
$string = preg_replace('~/st:[^/]*~','',$string);
[^/]* only matches up to the next /, so the match stops before it.

You are using greedy matching with ., which matches any character, so (.*)\/ runs on to the last / in the string.
Use a more restricted pattern:
preg_replace("/\/st:[^\/]*/",'',$string)
The [^\/]* negated character class only matches 0 or more characters other than /.
Another solution would be to use lazy matching with the *? quantifier, but it is not as efficient as the negated character class.
FULL REGEX EXPLANATION:
\/st: - literal /st:
[^\/]* - 0 or more characters other than /.
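The negated-class fix generalises to any key. A sketch, with remove_segment() as a hypothetical helper name; preg_quote() escapes the key in case it contains regex metacharacters, and the required ":" keeps st from also hitting start:

```php
<?php
// Remove "/key:value" from a slash-separated path; [^/]* stops at the next "/".
function remove_segment(string $path, string $key): string
{
    return preg_replace('~/' . preg_quote($key, '~') . ':[^/]*~', '', $path);
}

$string = "/st:1/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30";
echo remove_segment($string, 'st'), "\n";
echo remove_segment($string, 'fp'), "\n";
```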

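You need to add ? to make the quantifier lazy:
<?php
$string = "/st:1/sc:RsrlYQhSQvs=/fp:1/g:3/start:2015-07-01/end:2015-07-30";
echo preg_replace("/\/st:(.*?)\//",'',$string);
?>
Output: https://eval.in/397658
You can do the same for the other segments. Note that this pattern also consumes the / after st:1, so the result starts with sc: rather than /sc:; use '/' as the replacement if you want to keep the leading slash.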

Instead of using regex here, you could write parsing utility functions for your special format string. They are simple, they don't take too long to write, and they will make your life a lot easier:
function readPath($path) {
    $parameters = array();
    foreach(explode('/', $path) as $piece) {
        // Here we make sure we have something
        if ($piece == "") {
            continue;
        }
        // This here is just a fancy way of splitting the array returned
        // into two variables. The limit of 2 keeps any further ":" in the value.
        list($key, $value) = explode(':', $piece, 2);
        $parameters[$key] = $value;
    }
    return $parameters;
}
function writePath($parameters) {
    $path = "";
    foreach($parameters as $key => $value) {
        $path .= "/" . implode(":", array($key, $value));
    }
    return $path;
}
Now you can work on it as a PHP array; in this case you would do:
$parameters = readPath($string);
unset($parameters['st']);
$string = writePath($parameters);
This makes for much more readable and reusable code. Additionally, since most of the time you are dealing with only slight variations of this format, you can just change the delimiters each time, or even abstract these functions to accept different delimiters.
Another way to deal with this is to convert the string into a normal query string, using something like:
function readPath($path) {
    // parse_str() returns nothing; it writes the result into its second argument
    parse_str(strtr($path, "/:", "&="), $parameters);
    return $parameters;
}
In your case, though, the values can contain characters that are special in a query string (sc ends with "=", and a "+" or "%" would be decoded), so you would also need to URL-encode each value first; that would involve code structured much like the functions above.

How to return false if all numerical values of 0-9 are matching WITH other characters in string?

I am specifically targeting numerical values only. On the front end I use a JavaScript phone mask (jquery.maskedinput-1.3.js) plus a mobile filter that restricts user input to (000)000-0000, basically [2-9] and [0-9] as the mask:
jQuery(function ($e) {
var isMobile = navigator.userAgent.match(/(iPhone|iPod|iPad|Android|BlackBerry)/);
$e('#refer').val(window.location.href);
if (!(isMobile)) {
$e('#phone').mask('(299)299-9999');
$e('#field_phone_number').mask('299-299-9999');
}
});
For the server side I have a regular expression in PHP (nothing special yet):
function phonenumber($value)
{
return preg_match("/\(?\b[(. ]?[0-9]{3}\)?[). ]?[0-9]{3}[-. ]?[0-9]{4}\b/i", $value);
}
How can I create a regex or PHP script that targets numbers made of one repeated digit without writing a very long regex for each digit? I just want to make sure that if someone types in (222)222-2222, they get false as the return value.

function phonenumber($value)
{
    $prefix = '\d{3}'; // You might want to specify '2\d\d' (200 to 299)
    $regex = '#^(\('.$prefix.'\)|'.$prefix.')[\s\.-]?\d{3}[\.-]?\d{4}$#';
    if (preg_match($regex, $value))
    {
        // Number is in a suitable format
        // Now extract digits -- remove this section to not test repeated pattern
        $digits = preg_replace('#[^\d]+#', '', $value);
        // All numbers equal are rejected
        if (preg_match('#^(\d)\1{9}$#', $digits))
            return false;
        // end of pattern check
        // Otherwise it is accepted
        return true;
    }
    return false; // Not in a recognized format
}
This will accept (299)423-1234 and 277-111-2222, and also (400)1234567 or 4001234567. It will reject (400-1234567 and 400-12-34-56-7. It will also reject (222)222-2222 because of the repeated 2's.
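As an alternative to the \1{9} backreference, the "all digits identical" check can also be written without a regex once the digits are extracted. A sketch, with has_single_repeated_digit() as a hypothetical name:

```php
<?php
// True when the input contains digits and they are all the same digit.
// Formatting characters such as ( ) - . and spaces are ignored.
function has_single_repeated_digit(string $value): bool
{
    $digits = preg_replace('/\D+/', '', $value); // keep digits only
    if ($digits === '') {
        return false; // nothing to compare
    }
    // str_split() yields one element per digit; one distinct value means all equal
    return count(array_unique(str_split($digits))) === 1;
}

var_dump(has_single_repeated_digit('(222)222-2222')); // bool(true)
```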

You can use a backreference \1 to detect recurring patterns. In your case you can simply mix in a .* to ignore in-between fillers like ( and -:
/(\d)(.*\1){7}/
This looks for a digit followed by at least 7 more occurrences of the same digit (8 in total), ignoring any other characters used as filler. It does not ensure that they are consecutive, however, so (222)222-8222 would match too.
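A sketch that runs the backreference pattern on three numbers, including the non-consecutive case mentioned above:

```php
<?php
// A digit, then at least 7 more copies of it with anything in between.
$pattern = '/(\d)(.*\1){7}/';
foreach (['(222)222-2222', '(222)222-8222', '(299)423-1234'] as $phone) {
    printf("%s => %d\n", $phone, preg_match($pattern, $phone));
}
```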

filter specific string in php

$var="UseCountry=1
UseCountryDefault=1
UseState=1
UseStateDefault=1
UseLocality=1
UseLocalityDefault=1
cantidad_productos=5
expireDays=5
apikey=ABQIAAAAFHktBEXrHnX108wOdzd3aBTupK1kJuoJNBHuh0laPBvYXhjzZxR0qkeXcGC_0Dxf4UMhkR7ZNb04dQ
distancia=15
AutoCoord=1
user_add_locality=0
SaveContactForm=0
ShowVoteRating=0
Listlayout=0
WidthThumbs=100
HeightThumbs=75
WidthImage=640
HeightImage=480
ShowImagesSystem=1
ShowOrderBy=0
ShowOrderByDefault=0
ShowOrderDefault=DESC
SimbolPrice=$
PositionPrice=0
FormatPrice=0
ShowLogoAgent=1
ShowReferenceInList=1
ShowCategoryInList=1
ShowTypeInList=1
ShowAddressInList=1
ShowContactLink=1
ShowMapLink=1
ShowAddShortListLink=1
ShowViewPropertiesAgentLink=1
ThumbsInAccordion=5
WidthThumbsAccordion=100
HeightThumbsAccordion=75
ShowFeaturesInList=1
ShowAllParentCategory=0
AmountPanel=
AmountForRegistered=5
RegisteredAutoPublish=1
AmountForAuthor=5
AmountForEditor=5
AmountForPublisher=5
AmountForManager=5
AmountForAdministrator=5
AutoPublish=1
MailAdminPublish=1
DetailLayout=0
ActivarTabs=0
ActivarDescripcion=1
ActivarDetails=1
ActivarVideo=1
ActivarPanoramica=1
ActivarContactar=1
ContactMailFormat=1
ActivarReservas=1
ActivarMapa=1
ShowImagesSystemDetail=1
WidthThumbsDetail=120
HeightThumbsDetail=90
idCountryDefault=1
idStateDefault=1
ms_country=1
ms_state=1
ms_locality=1
ms_category=1
ms_Subcategory=1
ms_type=1
ms_price=1
ms_bedrooms=1
ms_bathrooms=1
ms_parking=1
ShowTextSearch=1
minprice=
maxprice=
ms_catradius=1
idcatradius1=
idcatradius2=
ShowTotalResult=1
md_country=1
md_state=1
md_locality=1
md_category=1
md_type=1
showComments=0
useComment2=0
useComment3=0
useComment4=0
useComment5=0
AmountMonthsCalendar=3
StartYearCalendar=2009
StartMonthCalendar=1
PeriodOnlyWeeks=0
PeriodAmount=3
PeriodStartDay=1
apikey=ABQIAAAAJ879Hg7OSEKVrRKc2YHjixSmyv5A3ewe40XW2YiIN-ybtu7KLRQiVUIEW3WsL8vOtIeTFIVUXDOAcQ
";
In that string, I only want "api==ABQIAAAAJ879Hg7OSEKVrRKc2YHjixSmyv5A3ewe40XW2YiIN-ybtu7KLRQiVUIEW3WsL8vOtIeTFIVUXDOAcQ".
Please guide me.

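EDIT
As shamittomar pointed out, parse_str() will not work in this situation; I have posted the proper regex below.
Given this seems to be a QUERY STRING, use the parse_str() function PHP provides.
UPDATE
If you want to do it with a regex, using preg_match() as powertieke pointed out, keep in mind that the string contains two apikey lines and preg_match() stops at the first one. To get the last one, use preg_match_all() with the m modifier and take the last match:
preg_match_all('/^apikey=(\S+)/m', $var, $matches);
echo end($matches[1]);
That should do the trick.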

preg_match(); should be right up your alley

People are quick to jump to preg_match() when this can be done with regular string functions, which are faster.
$string = '
expireDays=5
apikey=ABQIAAAAFHktBEXrHnX108wOdzd3aBTupK1kJuoJNBHuh0laPBvYXhjzZxR0qkeXcGC_0Dxf4UMhkR7ZNb04dQ
distancia=15
AutoCoord=1';
// Test which type of line break it is and explode by that.
$parts = (strstr($string,"\r\n") ? explode("\r\n",$string) : explode("\n",$string));
$data = array();
foreach($parts as $part)
{
    // Split on the first "=" only, and skip lines that have none
    $sub = explode("=", trim($part), 2);
    if($sub[0] !== '' && isset($sub[1]))
    {
        $data[$sub[0]] = $sub[1];
    }
}
Then use $data['apikey'] for your API key; with your full string, later lines overwrite earlier ones, so it holds the last apikey, which is the one you want. I would also advise wrapping this in a function.
I'd bet this is a better way to parse the string, and much faster.
function ParsemyString($string)
{
    $parts = (strstr($string,"\r\n") ? explode("\r\n",$string) : explode("\n",$string));
    $data = array();
    foreach($parts as $part)
    {
        // Split on the first "=" only, and skip lines that have none
        $sub = explode("=", trim($part), 2);
        if($sub[0] !== '' && isset($sub[1]))
        {
            $data[$sub[0]] = $sub[1];
        }
    }
    return $data;
}
$data = ParsemyString($string);

First of all, you are not looking for
api==ABQIAAAAJ879Hg7OSEKVrRKc2YHjixSmyv5A3ewe40XW2YiIN-ybtu7KLRQiVUIEW3WsL8vOtIeTFIVUXDOAcQ
but you are looking for
apikey=ABQIAAAAJ879Hg7OSEKVrRKc2YHjixSmyv5A3ewe40XW2YiIN-ybtu7KLRQiVUIEW3WsL8vOtIeTFIVUXDOAcQ
It is important to know whether the apikey property always occurs at the end and whether the length of the apikey value is always the same. If that is the case, you could use PHP's substr() function, which would be easiest.
If not, you would most probably need a regular expression that you can feed to PHP's preg_match() function, something along the lines of apikey=([a-zA-Z0-9_-]+), which matches an apikey value made of lowercase and uppercase letters, digits, underscores and dashes. With preg_match() you can retrieve the matches (and thus your apikey value); note that preg_match() returns only the first of the two apikey lines in your string.
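If the wanted key is always on the last apikey line, as in the question, strrpos() finds it without depending on a fixed length. A sketch under that assumption (last_apikey() is a hypothetical name); note that it would also match a key such as myapikey=:

```php
<?php
// Returns the value on the last line containing "apikey=", or null if none.
function last_apikey(string $config): ?string
{
    $needle = 'apikey=';
    $pos = strrpos($config, $needle); // position of the last occurrence
    if ($pos === false) {
        return null;
    }
    $rest = substr($config, $pos + strlen($needle));
    // Stop at the end of that line; works for both "\n" and "\r\n" endings
    return substr($rest, 0, strcspn($rest, "\r\n"));
}

echo last_apikey("a=1\napikey=FIRST\nb=2\napikey=SECOND\nc=3"), "\n";
```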

Get more backreferences from regexp than parenthesis

OK, this is really difficult to explain in English, so I'll just give an example.
I am going to have strings in the following format:
key-value;key1-value;key2-...
and I need to extract the data to be an array
array('key'=>'value','key1'=>'value1', ... )
I was planning to use regexp to achieve (most of) this functionality, and wrote this regular expression:
/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/
to work with preg_match and this code:
for ($l = count($matches),$i = 1;$i<$l;$i+=2) {
$parameters[$matches[$i]] = $matches[$i+1];
}
However the regexp obviously returns only 4 backreferences - first and last key-value pairs of the input string. Is there a way around this? I know I can use regex just to test the correctness of the string and use PHP's explode in loops with perfect results, but I'm really curious whether it's possible with regular expressions.
In short, I need to capture an arbitrary number of these key-value; pairs in a string by means of regular expressions.

You can use a lookahead to validate the input while you extract the matches:
/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/
(?=(?:\w++-[^;-]++;?)++$) is the validation part. If the input is invalid, matching will fail immediately, but the lookahead still gets evaluated every time the regex is applied. In order to keep it (along with the rest of the regex) in sync with the key-value pairs, I used \G to anchor each match to the spot where the previous match ended.
This way, if the lookahead succeeds the first time, it's guaranteed to succeed every subsequent time. Obviously it's not as efficient as it could be, but that probably won't be a problem--only your testing can tell for sure.
If the lookahead fails, preg_match_all() will return 0 (which is falsy). If it succeeds, the matches will be returned in an array of arrays: one for the full key-value pairs, one for the keys, one for the values.
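A sketch of how the \G pattern can be used with preg_match_all() to build the array asked for in the question (parse_pairs() is a hypothetical name). It returns null when the input fails validation:

```php
<?php
// Validate and extract in one pass: \G chains each match to the previous one,
// and the lookahead checks that the rest of the input is well-formed.
function parse_pairs(string $input): ?array
{
    $re = '/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/';
    if (!preg_match_all($re, $input, $m)) {
        return null; // invalid input: the lookahead failed at the start
    }
    // $m[1] holds the keys, $m[2] the values, in input order
    return array_combine($m[1], $m[2]);
}

print_r(parse_pairs('key-value;key1-value1;key2-value2;'));
```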

Regex is a powerful tool, but sometimes it's not the best approach.
$string = "key-value;key1-value";
$array = array();
$s = explode(";",$string);
foreach($s as $k){
    if ($k === '') continue; // skip the empty piece left by a trailing ";"
    $e = explode("-",$k);
    $array[$e[0]]=$e[1];
}
print_r($array);

Use preg_match_all() instead. Maybe something like:
$matches = $parameters = array();
$input = 'key-value;key1-value1;key2-value2;key123-value123;';
preg_match_all("/(\w+)-([^-;]+)/", $input, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$parameters[$match[1]] = $match[2];
}
print_r($parameters);
EDIT:
To validate first that the input string conforms to the pattern, use:
if (preg_match("/^((\w+)-([^-;]+);)+$/", $input) > 0) {
/* do the preg_match_all stuff */
}
EDIT2: the final semicolon is optional
if (preg_match("/^(\w+-[^-;]+;)*\w+-[^-;]+$/", $input) > 0) {
/* do the preg_match_all stuff */
}

No. With a repeated capturing group, each new iteration overwrites the previous capture, so only the last one is kept. Perhaps the limit argument of explode() would be helpful when exploding.

What about this solution?
$samples = array(
    "good" => "key-value;key1-value;key2-value;key5-value;key-value;",
    "bad1" => "key-value-value;key1-value;key2-value;key5-value;key-value;",
    "bad2" => "key;key1-value;key2-value;key5-value;key-value;",
    "bad3" => "k%ey;key1-value;key2-value;key5-value;key-value;"
);
foreach($samples as $name => $value) {
    if (preg_match("/^(\w+-\w+;)+$/", $value)) {
        printf("'%s' matches\n", $name);
    } else {
        printf("'%s' does not match\n", $name);
    }
}

I don't think you can do both validation and extraction of data with one single regexp, as you need anchors (^ and $) for validation and preg_match_all() for the data, but if you use anchors with preg_match_all() it will only return the last set matched.

We Keep Coding
