PHP URL recognition

How can you check whether text typed in by a user is a URL?
Let's say I want to check whether the text is a URL, and then whether the string contains "youtube.com". Afterwards, I want to get the part of the link I'm interested in: the text between "watch?v=" and the first "&" parameter, if one exists.

parse_url() is probably a good choice here. If you pass it a badly malformed URL, the function returns false. Otherwise, it breaks the URL up into its component pieces and you can use the ones you need.
Example:
$urlParts = parse_url('http://www.youtube.com/watch?v=MX0D4oZwCsA');
if ($urlParts === false) {
    echo "Bad URL";
} else {
    echo "Param string is " . $urlParts['query'];
}
Outputs:
Param string is v=MX0D4oZwCsA
You could then split the query portion with explode(), or turn it into an array with parse_str(), to get at specific parameters.
Edit: Keep in mind that parse_url() tries as hard as possible to parse the string it is given, so bad URLs will often succeed, although the resulting data array will be very odd. It's obviously up to you how definitive you want your validation to be and what exactly you require out of your user input.
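Putting those pieces together, here is a minimal sketch (the function name getYoutubeId is hypothetical) that parses the input with parse_url(), checks the host, and reads the v parameter with parse_str() instead of explode(), so any extra & parameters are simply ignored. Checking the host rather than searching the whole string for "youtube.com" is an assumption on my part: it accepts youtube.com and its subdomains such as www.youtube.com, and nothing else (youtu.be short links are not handled).

```php
<?php
/**
 * Hypothetical helper: returns the YouTube video id (the "v" query
 * parameter) from user input, or null if the input is not a usable
 * youtube.com watch URL.
 */
function getYoutubeId(string $input): ?string
{
    $parts = parse_url(trim($input));
    // parse_url() returns false only for badly malformed input; we also
    // need a host and a query string to go any further.
    if ($parts === false || !isset($parts['host'], $parts['query'])) {
        return null;
    }

    // Assumption: accept youtube.com itself and any subdomain of it.
    $host = strtolower($parts['host']);
    $suffix = '.youtube.com';
    $isYoutube = $host === 'youtube.com'
        || substr($host, -strlen($suffix)) === $suffix;
    if (!$isYoutube) {
        return null;
    }

    // parse_str() splits "v=...&feature=..." into an array, so anything
    // after the first "&" ends up under its own key.
    parse_str($parts['query'], $params);

    return isset($params['v']) && is_string($params['v']) ? $params['v'] : null;
}

echo getYoutubeId('http://www.youtube.com/watch?v=MX0D4oZwCsA&feature=related'), "\n";
```

Because parse_url() is lenient, the host check and the null returns do most of the actual validation here.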

if (preg_match('#watch\?v=([^&]+)#', $url, $matches)) {
    echo $matches[1];
}
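A slightly more defensive sketch of the same regex, wrapped in a hypothetical helper that returns null instead of reading $matches[1] when nothing matched:

```php
<?php
/**
 * Hypothetical helper around the regex above: returns the text between
 * "watch?v=" and the next "&" (or the end of the string), or null when
 * the pattern does not match.
 */
function getWatchId(string $url): ?string
{
    // \? matches a literal "?"; [^&]+ grabs one or more characters that are
    // not "&". Note that a trailing "#fragment" would be captured as well.
    if (preg_match('#watch\?v=([^&]+)#', $url, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

var_dump(getWatchId('http://www.youtube.com/watch?v=MX0D4oZwCsA&feature=related'));
var_dump(getWatchId('http://example.com/'));
```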

It's strongly recommended not to use parse_url() for URL validation.
Here is a nice solution.

Related

Look for phrase in string, and also look for wildcards after the phrase

I'm looking for a way to display some code based on the URL of the site.
Case 1:
If url contains getting-started do something
Case 2:
If url contains getting-started/* (could be anything) show something else.
I'm currently getting the URL string as:
$url = $_SERVER['REQUEST_URI'];
And then using strpos():
if (strpos($url, 'getting-started') !== false) {
But I need something that specifically targets URLs ending in getting-started, and that can also tell when something comes after it.
You should use a regex:
(getting-started)$ // matches the phrase only at the end of the string
(getting-started)/(.+) // matches the phrase followed by a slash and at least one more character
You can use these patterns with preg_match().
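A minimal sketch of how the two cases might be told apart with preg_match(). Assumptions: the helper name classifyGettingStarted is hypothetical, the query string is stripped first with parse_url() so that ?step=2 does not count as "something after", a trailing slash is treated as the exact page, and the phrase must begin a path segment.

```php
<?php
/**
 * Hypothetical helper: tells the two cases from the question apart.
 * Returns 'exact' for .../getting-started (optionally with a trailing slash),
 * 'sub' for .../getting-started/<anything>, or null otherwise.
 */
function classifyGettingStarted(string $uri): ?string
{
    // Look only at the path, so "?step=2" is not treated as a sub-page.
    $path = parse_url($uri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }

    // (^|/) makes sure "getting-started" starts a path segment.
    if (preg_match('#(^|/)getting-started/?$#', $path)) {
        return 'exact';
    }
    if (preg_match('#(^|/)getting-started/.+#', $path)) {
        return 'sub';
    }
    return null;
}

// In a real request you would pass $_SERVER['REQUEST_URI'].
var_dump(classifyGettingStarted('/docs/getting-started'));
var_dump(classifyGettingStarted('/docs/getting-started/installation'));
```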

Parsing link from javascript function

I'm trying to parse a direct link out of a JavaScript function within a page. I'm able to parse the HTML info I need, but am stumped on the JavaScript part. Is this achievable with PHP and possibly regex?
function videoPoster() {
document.getElementById("html5_vid").innerHTML =
"<video x-webkit-airplay='allow' id='html5_video' style='margin-top:"
+ style_padding
+ "px;' width='400' preload='auto' height='325' controls onerror='cantPlayVideo()' "
+ "<source src='http://video-website.com/videos/videoname.mp4' type='video/mp4'>";
}
What I need to pull out is the link "http://video-website.com/videos/videoname.mp4". Any help or pointers would be greatly appreciated!
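The pattern http://.*\.mp4 will give you all characters between http:// and .mp4, inclusive. In PHP, either escape the slashes (/http:\/\/.*\.mp4/) or use another delimiter (#http://.*\.mp4#). Note also that .* is greedy, so if a line holds several .mp4 URLs, the match runs to the last one.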
If you need the session id, use something like http://.*\.mp4\?sessionid=\d+ (the ? must be escaped; unescaped, it just makes the 4 optional).
In general, no. Nothing short of a full JavaScript parser will reliably extract URLs, and even then you'll have trouble with URLs that are computed nontrivially.
In practice, it is often best to use the simplest capturing regexp that works for the code you actually need to parse. In this case:
['"](http://[^'"]*)['"]
If you have to enter that regexp as a string, beware of escaping.
If you ever have unescaped quotation marks in URLs, this will fail. That's valid but rare. Whoever writes the code you're parsing is unlikely to use them, because they make referring to the URLs in JavaScript a pain.
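For example, in PHP the regex above could be written as a single-quoted string like this (a sketch; the helper name extractQuotedUrls is hypothetical). Using # as the delimiter avoids escaping the slashes, but the single quotes inside the pattern still have to be written as \'.

```php
<?php
/**
 * Hypothetical helper: returns every http:// URL that appears inside
 * single or double quotes, using the regex ['"](http://[^'"]*)['"].
 */
function extractQuotedUrls(string $source): array
{
    // Inside a single-quoted PHP string, \' stands for a literal quote,
    // so this is exactly the pattern ['"](http://[^'"]*)['"].
    $pattern = '#[\'"](http://[^\'"]*)[\'"]#';

    preg_match_all($pattern, $source, $matches);
    return $matches[1]; // group 1: each URL without its surrounding quotes
}

$js = <<<'JS'
+ "<source src='http://video-website.com/videos/videoname.mp4' type='video/mp4'>";
JS;

print_r(extractQuotedUrls($js));
```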
For your specific case, this should work, provided that none of the characters in the URL are escaped.
preg_match("/src='([^']*)'/", $html, $matches);
$url = $matches[1];
See the preg_match() manual page. You should probably add error handling, ensuring that the function returns 1 (that the regex matched) and possibly performing some additional checks as well (such as ensuring that the URL begins with http:// and contains .mp4?).
(As with all Web scraping techniques, the owner or maintainer of the site you are scraping may make a future change that breaks your script, and you should be prepared for that.)
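A sketch of that approach with the suggested error handling, using a hypothetical helper name extractVideoSrc: it only uses the capture if preg_match() returned 1, and then checks that the URL starts with http:// and contains .mp4.

```php
<?php
/**
 * Hypothetical helper: the answer's preg_match() plus the suggested checks.
 * Returns the src URL, or null if there was no match or the checks fail.
 */
function extractVideoSrc(string $html): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match("/src='([^']*)'/", $html, $matches) !== 1) {
        return null;
    }
    $url = $matches[1];

    // Additional sanity checks: must start with http:// and mention .mp4.
    if (strpos($url, 'http://') !== 0 || strpos($url, '.mp4') === false) {
        return null;
    }
    return $url;
}

$page = "<source src='http://video-website.com/videos/videoname.mp4' type='video/mp4'>";
echo extractVideoSrc($page) ?? 'no video URL found', "\n";
```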
The following captures every URL found in a src attribute of your HTML:
$matches=array();
if (preg_match_all('/src=["\'](?P<urls>https?:\/\/[^"\']+)["\']/', $html, $matches)){
print_r($matches['urls']);
}
If you want to do the same in JavaScript, you could use this:
var matches;
if (matches=html.match(/src=["'](https?:\/\/[^"']+)["']/g)){
// gives you all matches, but they still include the src=" and " parts, so you would
// have to run every match again against the regex without the g modifier
}

Regex, how to match this string?

I have the URL http://domain.com/script.php?l=7&p=146#p146. I want to get the number after p=, without the #. The hash may not always be there, so sometimes the URL is just script.php?l=7&p=146. I know it has something to do with the regex character +, but I'm not completely sure how to use it. Can someone please create the regex and explain how it works?
No need for regular expressions here.
$query = parse_url("http://domain.com/script.php?l=7&p=146#p146", PHP_URL_QUERY);
parse_str($query, $params);
echo $params['p'];
parse_url can get you all the distinct elements of a URL. And parse_str takes a query string (the part between the ? and an optional # in a URL) and figures out the different parameters for you. You could also omit the $params argument, in which case parse_str would define variables for you (afterward you would find the result in $p). But I personally dislike using parse_str with this side effect. Note that this one-argument form was deprecated in PHP 7.2 and removed in PHP 8.0, where the second argument is required.
If you want to read up some more: PHP documentation on parse_url and parse_str
Don't reinvent the wheel. Use a built-in function, such as parse_url to parse the URL.
Documentation and examples: http://php.net/manual/en/function.parse-url.php
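A small sketch wrapping the parse_url()/parse_str() answer in a hypothetical helper that returns null when p is missing, plus, for comparison, the regex the question asked about. In [?&]p=(\d+), \d matches one digit and + means "one or more of the preceding item", so the capture stops at the first non-digit, such as the #.

```php
<?php
/**
 * Hypothetical helper: returns the "p" query parameter, or null if absent.
 * parse_url() separates the #fragment from the query, so the hash does not
 * matter.
 */
function getPageParam(string $url): ?string
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string (null) or a badly malformed URL (false)
    }
    parse_str($query, $params);
    return isset($params['p']) && is_string($params['p']) ? $params['p'] : null;
}

// Regex version, for comparison:
//   [?&]  the parameter must start right after "?" or "&"
//   p=    the literal text "p="
//   (\d+) one or more digits, captured as group 1
function getPageParamRegex(string $url): ?string
{
    return preg_match('/[?&]p=(\d+)/', $url, $m) === 1 ? $m[1] : null;
}

var_dump(getPageParam('http://domain.com/script.php?l=7&p=146#p146'));
var_dump(getPageParamRegex('http://domain.com/script.php?l=7&p=146'));
```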

URL with query string validation using PHP

I need a PHP validation function for URLs with a query string (parameters separated with &). Currently I have the following pattern for validating URLs:
$pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?#)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/';
echo preg_match($pattern, $url);
This function correctly validates input like
google.com
www.google.com
http://google.com
http://www.google.com ...etc
But it won't validate the URL when it comes with parameters (a query string), for example:
http://google.com/index.html?prod=gmail&act=inbox
I need a function that accepts both types of URL inputs. Please help. Thanks in advance.
A simple filter_var() call:
if(filter_var($yoururl, FILTER_VALIDATE_URL))
{
echo 'Ok';
}
might do the trick, although it rejects URLs that are not preceded by a scheme (such as google.com):
http://codepad.org/1HAdufMG
You can work around this by prepending http:// to URLs that lack a scheme.
As suggested by @DaveRandom, you could do something like:
$parsed = parse_url($url);
if (!isset($parsed['scheme'])) $url = "http://$url";
before feeding the filter_var() function.
Overall it's still a simpler solution than an extra-complicated regex.
It also has these flags available:
FILTER_FLAG_PATH_REQUIRED (for FILTER_VALIDATE_URL): requires the URL to contain a path part.
FILTER_FLAG_QUERY_REQUIRED (for FILTER_VALIDATE_URL): requires the URL to contain a query string.
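A minimal sketch combining these suggestions: prepend http:// when parse_url() finds no scheme, then validate with filter_var(), optionally passing FILTER_FLAG_QUERY_REQUIRED. The function name validateUrl is hypothetical. Keep in mind that FILTER_VALIDATE_URL only accepts ASCII URLs, so internationalized domain names would need converting first.

```php
<?php
/**
 * Hypothetical helper: adds "http://" when no scheme is present, then
 * validates with filter_var(). Returns the (possibly prefixed) URL on
 * success, or false on failure.
 */
function validateUrl(string $url, int $flags = 0)
{
    $parsed = parse_url($url);
    if ($parsed === false) {
        return false; // seriously malformed input
    }
    if (!isset($parsed['scheme'])) {
        $url = "http://$url";
    }
    // $flags may be e.g. FILTER_FLAG_QUERY_REQUIRED or FILTER_FLAG_PATH_REQUIRED.
    return filter_var($url, FILTER_VALIDATE_URL, $flags);
}

var_dump(validateUrl('google.com'));
var_dump(validateUrl('http://google.com/index.html?prod=gmail&act=inbox', FILTER_FLAG_QUERY_REQUIRED));
```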
http://php.net/manual/en/function.parse-url.php
Some might think this is not 100% bulletproof, but you can give it a try as a start.

PHP Regex help, getting part of a link

I'm trying to write a regex in PHP that, given a line like
<a href="mypage.php?(some junk)&p=12345&(other junk)" other link stuff>Text</a>
will return only "p=12345", or even just "12345". Note that the (some junk)& and the &(other junk) may or may not be present.
Can I do this with one expression, or will I need more than one? I can't seem to work out how to do it in one, which is what I would like if at all possible. I'm also open to other methods of doing this, if you have a suggestion.
Thanks
Perhaps a better tactic than a regular expression in this case is to use parse_url.
You can use it to get the query (what comes after the ? in your URL), then split on the '&' character and then on '=' to put things into a nice associative array.
Use parse_url and parse_str:
$url = 'mypage.php?(some junk)&p=12345&(other junk)';
$parsed_url = parse_url($url);
parse_str($parsed_url['query'], $parsed_str);
echo $parsed_str['p'];
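Since the question starts from a whole <a> tag rather than a bare URL, here is a sketch that first extracts the href (assuming it is double-quoted) and then applies the parse_url()/parse_str() approach. The helper name getPFromAnchor is hypothetical; html_entity_decode() is included because & is often written as &amp; in HTML source.

```php
<?php
/**
 * Hypothetical helper: pulls the href out of an <a> tag and returns its
 * "p" query parameter, or null if either step fails.
 * Assumes the href value is enclosed in double quotes.
 */
function getPFromAnchor(string $line): ?string
{
    if (preg_match('/href="([^"]*)"/i', $line, $m) !== 1) {
        return null;
    }
    // Turn "&amp;" back into "&" before splitting the query string.
    $href = html_entity_decode($m[1], ENT_QUOTES);

    $query = parse_url($href, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }
    parse_str($query, $params);
    return isset($params['p']) && is_string($params['p']) ? $params['p'] : null;
}

$line = '<a href="mypage.php?(some junk)&p=12345&(other junk)" other link stuff>Text</a>';
var_dump(getPFromAnchor($line));
```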
