Regex, how to match this string? - php

I have the URL http://domain.com/script.php?l=7&p=146#p146. I want to get the number after p=, without the #. The hash may not always be there, so sometimes the URL is just script.php?l=7&p=146. I know it has something to do with the regex character +, but I'm not completely sure how to use it. Can someone please write the regex and explain how it works?

No need for regular expressions here.
$query = parse_url("http://domain.com/script.php?l=7&p=146#p146", PHP_URL_QUERY);
parse_str($query, $params);
echo $params['p'];
parse_url can get you all the distinct components of a URL, and parse_str takes a query string (the part between the ? and the optional # in a URL) and splits it into its individual parameters for you. In older PHP versions you could also omit the $params argument, in which case parse_str would define variables in the current scope instead (afterward you would find the result in $p). I personally dislike that side effect, and it is no longer an option anyway: calling parse_str without the second argument was deprecated in PHP 7.2 and removed in PHP 8.0.
If you want to read up some more: PHP documentation on parse_url and parse_str
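Since the question also asks how a regex would do it, here is a minimal sketch of that route for comparison. It assumes p is always numeric, and the helper name extractPageNumber is made up for illustration. [?&]p= requires that p= starts a parameter, and \d+ means "one or more digits": the + is the quantifier the question mentions. Because the capture stops at the first non-digit, a trailing #p146 is never included, whether or not it is present.

```php
<?php
// Hypothetical helper: returns the digits after "p=" in a URL, or null.
// Assumes the value of p is purely numeric.
function extractPageNumber(string $url): ?string
{
    // [?&]  : "p=" must start a parameter (right after "?" or "&")
    // (\d+) : capture one or more digits; "+" means "one or more"
    // The capture ends at the first non-digit, so "#p146" is excluded.
    if (preg_match('/[?&]p=(\d+)/', $url, $matches)) {
        return $matches[1];
    }
    return null;
}

echo extractPageNumber('http://domain.com/script.php?l=7&p=146#p146'), "\n"; // 146
echo extractPageNumber('http://domain.com/script.php?l=7&p=146'), "\n";      // 146
```

Unlike parse_str, this does not decode percent-encoded values or cope with a differently formatted parameter, which is one reason the parse_url/parse_str approach is usually the safer choice.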

Don't reinvent the wheel. Use a built-in function, such as parse_url to parse the URL.
Documentation and examples: http://php.net/manual/en/function.parse-url.php

Related

Escape a url within a regular expression in a preg_replace

I am trying to redirect some <a> tags to another page, passing each link's href as a URL parameter. The code I'm using is something like this:
preg_replace(
"/<a(\s[^>]*)href=[\"\']??([^\" >]*?)[\"\']??([^>]*)>(.*)<\/a>/siU",
"<a$1href=\"".WWW."go.php?to=".urlencode("$2")."\"$3>$4</a>", $text
);
It is a modified version of the regexp found here. I use this code in this block:
$text = "<...some other tags...><a target=\"_blank\" href=\"http://www.google.com\" style=\"...\" class=\"...\">Google</a></...some other tags...>";
The link is captured correctly, but urlencode("$2") receives the literal string "$2" rather than the value captured by the regex (as I would expect). The problem is not specific to urlencode: it happens whenever I pass a backreference as an argument to any other function. So besides encoding this (I can always extend the regexp a little to accept URLs), I would generally like to use captured values inside function calls.
Do you know any workaround to this? Thanks in advance.
This is expected: urlencode("$2") is evaluated once, before preg_replace even runs, on the literal two-character string "$2". It returns "%242", so what you actually pass is
"<a$1href=\"".WWW."go.php?to=%242\"$3>$4</a>"
as the second parameter, and the captured value never reaches urlencode. On PHP 5 you could get urlencode evaluated for every match with the e (eval) modifier, as below; note that /e was deprecated in PHP 5.5 and removed in PHP 7.0:
preg_replace(
"/<a(\s[^>]*)href=[\"\']??([^\" >]*?)[\"\']??([^>]*)>(.*)<\/a>/seiU",
"'<a$1href=\"'.WWW.'go.php?to='.urlencode('$2').'\"$3>$4</a>'", $text
);
A preferable solution is to use preg_replace_callback, which avoids evaluating strings built from unknown input as PHP code.
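A sketch of the preg_replace_callback version, reusing the pattern from the question. WWW is given a placeholder value here, and the function name rewriteLinks is hypothetical. The callback receives the array of captures for each match, so urlencode is applied to the actual href rather than to the string "$2".

```php
<?php
// Placeholder for the WWW constant from the question (hypothetical value).
const WWW = 'http://example.com/';

// Rewrites each <a ... href="..."> so it points at go.php?to=<encoded href>.
function rewriteLinks(string $text): string
{
    return preg_replace_callback(
        // Same pattern as in the question
        "/<a(\s[^>]*)href=[\"\']??([^\" >]*?)[\"\']??([^>]*)>(.*)<\/a>/siU",
        function (array $m): string {
            // $m[1]..$m[4] hold the real captured values, so urlencode()
            // now receives the href itself instead of the string "$2".
            return '<a' . $m[1] . 'href="' . WWW . 'go.php?to=' . urlencode($m[2]) . '"'
                . $m[3] . '>' . $m[4] . '</a>';
        },
        $text
    ) ?? $text; // fall back to the unchanged input if the regex fails
}

echo rewriteLinks('<a target="_blank" href="http://www.google.com" class="ext">Google</a>'), "\n";
```

Because nothing is passed to eval, this also works on PHP 7 and later, where /e no longer exists. Rewriting HTML with a regex remains fragile, so the pattern itself is carried over from the question rather than endorsed.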

Using regex to get string from URL?

Regex is my bete noire, can anyone help me isolate a string from a URL?
I want to get the page name from a URL which could appear in any of the following ways from an input form:
https://www.facebook.com/PAGENAME?sk=wall&filter=2
http://www.facebook.com/PAGENAME?sk=wall&filter=2
www.facebook.com/PAGENAME
facebook.com/PAGENAME?sk=wall
... and so on.
I can't seem to find a way to isolate the string after .com/ but before ? (if present at all). Is it preg_match, replace or split?
If anyone can recommend a particularly clear and introductory regex guide they found useful, it'd be appreciated.
You can use the parse_url function and then get the last segment from the path of the url:
$parts=parse_url($url);
$path_parts=explode("/", $parts["path"]);
$page=$path_parts[count($path_parts)-1];
For learning and testing regexes I found RegExr, an online tool, very useful: http://gskinner.com/RegExr/
But as others mentioned, parsing the url with appropriate functions might be better in this case.
You can use PHP's parse_url function directly instead of a regex.
Use something like this:
substr(parse_url('https://www.facebook.com/PAGENAME?sk=wall&filter=2', PHP_URL_PATH), 1);
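Note that the substr() one-liner assumes the input has a scheme: for www.facebook.com/PAGENAME, parse_url finds no host and treats the whole string as the path, so substr would return ww.facebook.com/PAGENAME. (The explode-based answer above takes the last segment and is not affected.) A minimal sketch that handles all four input forms from the question, assuming the page name is always the first path segment; facebookPageName is a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns the first path segment (the page name) of a
// Facebook URL, whether or not the scheme and "www." are present.
function facebookPageName(string $input): ?string
{
    // parse_url() only recognises the host when a scheme is present,
    // so prepend one for inputs like "facebook.com/PAGENAME".
    if (!preg_match('#^https?://#i', $input)) {
        $input = 'http://' . $input;
    }
    $path = parse_url($input, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no path at all, e.g. "facebook.com"
    }
    // "/PAGENAME" -> "PAGENAME"; any further segments are ignored
    $segments = explode('/', trim($path, '/'));
    return $segments[0] !== '' ? $segments[0] : null;
}

echo facebookPageName('www.facebook.com/PAGENAME'), "\n"; // PAGENAME
```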

regex to get $_GET variables

I have a URL string and would like to extract parts of it. I have been trying to understand how to do it with a regex, but with no luck.
http://www.example.com?id=example.id&v=other.variable
From the example above I would like to extract the id value, i.e. example.id
I'm assuming you're not referring to actual $_GET variables, but to a string containing a URL with a query string.
PHP has built-in functions to process those:
parse_url() to extract the query string from a URL
parse_str() to split the query string into its components
No need for a regexp here; just use PHP's built-in parse_url together with parse_str:
$url = 'http://www.example.com?id=example.id&v=other.variable';
parse_str(parse_url($url, PHP_URL_QUERY), $vars);
echo $vars['id']; // example.id
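One caveat: when the URL has no query string, parse_url(..., PHP_URL_QUERY) returns null, and passing null to parse_str is deprecated as of PHP 8.1. A slightly more defensive sketch (queryParam is a hypothetical name) checks for that and for a missing key:

```php
<?php
// Hypothetical helper: returns one query-string parameter from a URL,
// or null when the URL has no query string or lacks that parameter.
function queryParam(string $url, string $name): ?string
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no "?..." part, or the URL could not be parsed
    }
    parse_str($query, $vars);
    $value = $vars[$name] ?? null;
    // Array-style parameters such as "id[]=1" are treated as "not found".
    return is_string($value) ? $value : null;
}

echo queryParam('http://www.example.com?id=example.id&v=other.variable', 'id'), "\n"; // example.id
```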

PHP url recognition

How can you check if text typed in from a user is an url?
Let's say I want to check whether the text is a URL and then whether the string contains "youtube.com". Afterwards, I want to extract the part of the link I'm interested in: the text between "watch?v=" and the next "&" parameter, if there is one.
parse_url() is probably a good choice here. If you get a bad URL, the function will return false. Otherwise, it will break the URL up into its component pieces and you can use the ones you need.
Example:
$urlParts = parse_url('http://www.youtube.com/watch?v=MX0D4oZwCsA');
if ($urlParts === false) echo "Bad URL";
else echo "Param string is ".$urlParts['query'];
Outputs:
Param string is v=MX0D4oZwCsA
You could split the query portion as needed using explode() for specific parameters.
Edit: Keep in mind that parse_url() tries as hard as possible to parse the string it is given, so bad URLs will often succeed, although the resulting data array will be very odd. It's obviously up to you how definitive you want your validation to be and what exactly you require out of your user input.
if (preg_match('#watch\?v=([^&]+)#', $url, $matches)) {
    echo $matches[1];
}
It's strongly recommended not to use parse_url() for URL validation.
Here is a nice solution.
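One way to combine the steps from the question, sketched under assumptions: filter_var with FILTER_VALIDATE_URL is swapped in here as the validity check (it is not necessarily the solution linked above), the host must be youtube.com or a subdomain of it, and youtubeVideoId is a hypothetical name. parse_str then extracts v without having to split on "&" by hand.

```php
<?php
// Hypothetical helper: returns the "v" parameter of a YouTube watch URL,
// or null if the input is not an absolute URL on a youtube.com host.
function youtubeVideoId(string $input): ?string
{
    // filter_var() rejects strings that are not syntactically valid URLs;
    // it does not check that the URL actually exists.
    if (filter_var($input, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    $host = strtolower((string) parse_url($input, PHP_URL_HOST));
    // Accept "youtube.com" and subdomains such as "www.youtube.com"
    if ($host !== 'youtube.com' && substr($host, -12) !== '.youtube.com') {
        return null;
    }
    $query = parse_url($input, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }
    // parse_str() splits on "&", so parameters after v become separate keys
    parse_str($query, $params);
    return isset($params['v']) && is_string($params['v']) ? $params['v'] : null;
}

echo youtubeVideoId('http://www.youtube.com/watch?v=MX0D4oZwCsA'), "\n"; // MX0D4oZwCsA
```

FILTER_VALIDATE_URL only checks syntax and requires a scheme, so input such as www.youtube.com/watch?v=... would be rejected unless you prepend http:// first.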

PHP Regex help, getting part of a link

I'm trying to write a regex in PHP that, given a line like
<a href="mypage.php?(some junk)&p=12345&(other junk)" other link stuff>Text</a>
will return only "p=12345", or even just "12345". Note that the (some junk)& and the &(other junk) may or may not be present.
Can I do this with one expression, or will I need more than one? I can't seem to work out how to do it in one, which is what I would like if at all possible. I'm also open to other methods of doing this, if you have a suggestion.
Thanks
Perhaps a better tactic than a regular expression in this case is to use parse_url.
You can use it to get the query (what comes after the ? in your URL), then split that on the '&' character and then on '=' to put the parameters into an associative array.
Use parse_url and parse_str:
$url = 'mypage.php?(some junk)&p=12345&(other junk)';
$parsed_url = parse_url($url);
parse_str($parsed_url['query'], $parsed_str);
echo $parsed_str['p'];
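If you start from the whole anchor tag rather than just the URL, you first need the href value. A minimal sketch, assuming the attribute is written as href="..." with double quotes; pFromAnchor is a hypothetical name. html_entity_decode turns &amp; back into & before the query string is parsed:

```php
<?php
// Hypothetical helper: pulls the "p" parameter out of the first
// double-quoted href in an HTML snippet. Assumes href="..." quoting.
function pFromAnchor(string $html): ?string
{
    if (!preg_match('/href="([^"]*)"/i', $html, $m)) {
        return null;
    }
    // Attribute values may contain "&amp;"; turn them back into "&"
    $href = html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5);
    $query = parse_url($href, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }
    parse_str($query, $params);
    return isset($params['p']) && is_string($params['p']) ? $params['p'] : null;
}

echo pFromAnchor('<a href="mypage.php?a=1&amp;p=12345&amp;b=2" class="x">Text</a>'), "\n"; // 12345
```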
