PHP Regex help, getting part of a link - php

I'm trying to write a regex in PHP that, given a line like
<a href="mypage.php?(some junk)&p=12345&(other junk)" other link stuff>Text</a>
returns only "p=12345", or even just "12345". Note that the (some junk)& and the &(other junk) parts may or may not be present.
Can I do this with one expression, or will I need more than one? I can't seem to work out how to do it in one, which is what I would like if at all possible. I'm also open to other methods of doing this, if you have a suggestion.
Thanks

Perhaps a better tactic than using a regular expression in this case is to use parse_url.
You can use that to get the query (what comes after the ? in your URL), then split it on the '&' character and split each piece on '=' to put things into a nice associative array.
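The manual split described above could look roughly like the sketch below. The helper name queryToArray and the sample URL are made up for illustration, and the type declarations assume PHP 7 or later; parse_str does the same job with less code.

```php
<?php
// Hypothetical helper: split a URL's query string into key => value pairs by hand.
function queryToArray(string $url): array
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return []; // no query string present
    }
    $params = [];
    foreach (explode('&', $query) as $pair) {
        // Limit to 2 parts so values that contain '=' stay intact.
        $parts = explode('=', $pair, 2);
        $params[urldecode($parts[0])] = isset($parts[1]) ? urldecode($parts[1]) : '';
    }
    return $params;
}

$params = queryToArray('mypage.php?foo=bar&p=12345&baz=qux');
echo $params['p'], "\n";
```

Because the lookup is by key, it does not matter whether other parameters come before or after p.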

Use parse_url and parse_str:
$url = 'mypage.php?(some junk)&p=12345&(other junk)';
$parsed_url = parse_url($url);
parse_str($parsed_url['query'], $parsed_str);
echo $parsed_str['p'];

Related

Regex, how to match this string?

I have the url http://domain.com/script.php?l=7&p=146#p146. I want to be able to get the number after p=, without the #. Also, the hash may not always be there, so sometimes it could turn out as script.php?l=7&p=146. I know it's something to do with the regex character +, but I'm not completely sure on how to use it. Can someone please create the regex and explain how it works?
No need for regular expressions here.
$query = parse_url("http://domain.com/script.php?l=7&p=146#p146", PHP_URL_QUERY);
parse_str($query, $params);
echo $params['p'];
parse_url can get you all the distinct elements of a URL, and parse_str takes a query string (the part between ? and an optional # in a URL) and works out the individual parameters for you. On older PHP versions you could also omit the $params argument, in which case parse_str would define variables for you (afterward you could find the result in $p). I personally dislike relying on that side effect, and calling parse_str without the second argument is deprecated as of PHP 7.2 and no longer allowed in PHP 8.0.
If you want to read up some more: PHP documentation on parse_url and parse_str
Don't reinvent the wheel. Use a built-in function, such as parse_url to parse the URL.
Documentation and examples: http://php.net/manual/en/function.parse-url.php
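If a regex is still wanted, since the question asked for one with an explanation, a minimal sketch might look like this. The function name extractP is hypothetical, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: return the digits of the "p" parameter, or null if absent.
function extractP(string $url): ?string
{
    // [?&]  the parameter must start right after '?' or '&'
    // p=    the literal parameter name followed by '='
    // (\d+) one or more digits, captured as group 1; '+' means "one or more",
    //       so matching stops at the first non-digit such as '#'
    if (preg_match('/[?&]p=(\d+)/', $url, $m)) {
        return $m[1];
    }
    return null;
}

echo extractP('http://domain.com/script.php?l=7&p=146#p146'), "\n";
echo extractP('script.php?l=7&p=146'), "\n";
```

The [?&] prefix keeps the pattern from matching the fragment #p146 or a parameter whose name merely ends in p.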

Using regex to get string from URL?

Regex is my bete noire, can anyone help me isolate a string from a URL?
I want to get the page name from a URL which could appear in any of the following ways from an input form:
https://www.facebook.com/PAGENAME?sk=wall&filter=2
http://www.facebook.com/PAGENAME?sk=wall&filter=2
www.facebook.com/PAGENAME
facebook.com/PAGENAME?sk=wall
... and so on.
I can't seem to find a way to isolate the string after .com/ but before ? (if present at all). Is it preg_match, replace or split?
If anyone can recommend a particularly clear and introductory regex guide they found useful, it'd be appreciated.
You can use the parse_url function and then get the last segment from the path of the url:
$parts=parse_url($url);
$path_parts=explode("/", $parts["path"]);
$page=$path_parts[count($path_parts)-1];
For learning and testing regexes I found RegExr, an online tool, very useful: http://gskinner.com/RegExr/
But as others mentioned, parsing the url with appropriate functions might be better in this case.
I think you can use this php function (parse_url) directly instead of using regex.
Use something like:
substr(parse_url('https://www.facebook.com/PAGENAME?sk=wall&filter=2', PHP_URL_PATH), 1);
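One caveat: parse_url only recognises the host when the URL has a scheme, so for input such as www.facebook.com/PAGENAME the whole string comes back as the path. A possible workaround, sketched here with a hypothetical helper name and assuming PHP 7 or later, is to prepend a scheme when none is present:

```php
<?php
// Hypothetical helper: return the first path segment of a URL, with or without a scheme.
function pageNameFromUrl(string $url): string
{
    // Without a scheme, parse_url treats "www.facebook.com/PAGENAME" as a path,
    // so add one before parsing.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return ''; // no path component, e.g. "facebook.com"
    }
    $segments = explode('/', trim($path, '/'));
    return $segments[0];
}

echo pageNameFromUrl('www.facebook.com/PAGENAME'), "\n";
```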

replace url using preg_replace php

Hi all, I know preg_replace can be used for reformatting strings, but I need help applying it here.
My URL will look like this:
www.example.com/en/index.php
or
www.example.com/fr/index.php
The result I want is:
www.example.com/index.php
I need this in PHP code so that I can store it in a session.
Can anyone please explain how?
preg_replace('/www\.example\.com\/([^\/]+)\/index\.php/i', 'www.example.com/index.php?lang=$1', $url); will do it, and it keeps the language code as a lang query parameter.
This is one way to do it:
$newurl = preg_replace('/\/[a-z][a-z]\//', '/', $url);
Note that the pattern is written inside quotes and forward-slash delimiters ('/.../'), so the forward slashes in the URL have to be escaped (\/). The language code is matched with '[a-z][a-z]', but there are several other ways to do this, and you may want something more liberal in case there are ever three-letter codes or capitals. Equally, you may need something tighter depending on what other URL schemes might appear.
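A more liberal variant along those lines might look like the following sketch. The helper name is hypothetical and it assumes PHP 7 or later. Be aware that it would also strip any other two- or three-letter first segment, such as /img/.

```php
<?php
// Hypothetical helper: drop one 2- or 3-letter language segment from a URL.
function stripLanguageSegment(string $url): string
{
    // '#' as the delimiter avoids escaping '/'; the 'i' flag also accepts capitals.
    // The limit of 1 replaces only the first matching segment.
    return preg_replace('#/[a-z]{2,3}/#i', '/', $url, 1);
}

echo stripLanguageSegment('www.example.com/en/index.php'), "\n";
```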
I suspect in this instance it would be faster simply to use str_replace as follows:
$cleanedData = str_replace(array('www.example.com/en/', 'www.example.com/fr/'), 'www.example.com/', $sourceData);
In the end I found a method, thanks to Purpletoucan:
$newurl = preg_replace('/\/(en|esp|fr)\//', '/', $url);
It's working now, I think!

php: delete elements from string

I have this situation:
..<img src="//http://www... OR ..<img src="/http://www... OR ..<img src="////http://www...
(there may be any number of / characters)
How do I delete the / characters before http?
The result should always be:
..<img src="http://www...
Thanks ;)
This should do the trick.
ltrim($url, "/");
This seems like a rather ad hoc solution. You might want to get to the bottom of the issue and eliminate it at source.
A regular expression along the lines of this should do the trick I think:
$string = preg_replace('/="\/+http:/', '="http:', $string);
Assuming that the URL is held in a variable in your PHP code, ltrim() could be the answer:
$url = ltrim($url,'/');
though you couldn't use this option if you also had local URLs (e.g. '/images/img.gif') without the 'http://', since their leading slash would be stripped too.
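One way around the local-URL problem is to strip the leading slashes only when a scheme follows them. This is a sketch under that assumption; the helper name is hypothetical, it only recognises http and https, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: remove leading slashes only if they precede http:// or https://.
function stripSlashesBeforeScheme(string $url): string
{
    // ^/+              one or more slashes at the very start
    // (?=https?://)    lookahead: only match when a scheme follows (not consumed)
    return preg_replace('#^/+(?=https?://)#i', '', $url);
}

echo stripSlashesBeforeScheme('////http://www.example.com/a.png'), "\n";
echo stripSlashesBeforeScheme('/images/img.gif'), "\n";
```

Local paths pass through unchanged because the lookahead fails for them.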
You could do something like this (str_replace() because it is faster than a regular expression), though note that it only handles the case of exactly two leading slashes:
$markup = str_replace('//http://', 'http://', $markup);
Why do you need this? It might be better to eliminate the source of this problem.

regex to get current page or directory name?

I am trying to get the page or last directory name from a URL.
For example, if the URL is http://www.example.com/dir/ I want it to return dir, and if the URL is http://www.example.com/page.php I want it to return page. Note that I do not want the trailing slash or the file extension.
I tried this:
$regex = "/.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*/i";
$name = strtolower(preg_replace($regex,"$2",$url));
I ran this regex in PHP and it returned nothing. (however I tested the same regex in ActionScript and it worked!)
So what am I doing wrong here, how do I get what I want?
Thanks!!!
Don't use / as the regex delimiter if it also contains slashes. Try this:
$regex = "#^.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*$#i";
You may try to escape the "/" in the middle; unescaped, it simply closes your regex. So this may work:
$regex = "/.*\.(com|gov|org|net|mil|edu)\/([a-z_\-]+).*/i";
You may also make the regex somewhat more general, but that's another problem.
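You can use this (array_pop needs a variable, so the exploded array is stored first, and rtrim keeps a trailing slash from producing an empty last element):
$segments = explode('/', rtrim($url, '/'));
$last = array_pop($segments);
Then apply a simple regex to remove any file extension.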
Assuming you want to match the entire address after the domain portion:
$regex = "%://[^/]+/([^?#]+)%i";
The above assumes a URL of the format scheme://domainpart/everythingelse.
Then again, it seems that the problem here isn't that your regex isn't powerful enough, just that it was mistyped (closing delimiter in the middle of the string). I'll leave this up for posterity, but I strongly recommend you check out PHP's parse_url() function.
This should adequately deliver:
substr($s = basename($_SERVER['REQUEST_URI']), 0, strrpos($s,'.') ?: strlen($s))
But this is better:
preg_replace('/[#\.\?].*/','',basename($path));
Although, your example is short, so I cannot tell if you want to preserve the entire path or just the last element of it. The preceding example will only preserve the last piece, but this should save the whole path while being generic enough to work with just about anything that can be thrown at you:
preg_replace('~(?:/$|[#\.\?].*)~','',substr(parse_url($path, PHP_URL_PATH),1));
As much as I personally love using regular expressions, more 'crude' (for want of a better word) string functions might be a good alternative for you. The snippet below uses sscanf to parse the path part of the URL for the first bunch of letters.
$url = "http://www.example.com/page.php";
$path = parse_url($url, PHP_URL_PATH);
sscanf($path, '/%[a-z]', $part);
// $part = "page";
This expression:
(?<=^[^:]+://[^.]+(?:\.[^.]+)*/)[^/]*(?=\.[^.]+$|/$)
Gives the following results:
http://www.example.com/dir/ dir
http://www.example.com/foo/dir/ dir
http://www.example.com/page.php page
http://www.example.com/foo/page.php page
Apologies in advance if this is not valid PHP regex - I tested it using RegexBuddy.
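PCRE, the engine behind PHP's preg_* functions, does not accept unbounded quantifiers such as + inside a lookbehind, so the expression above would likely need reworking for PHP. One possible port, offered as a sketch rather than an exact equivalent, replaces the lookarounds with a capture group. The function name is hypothetical and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: last directory or page name of a URL, without slash or extension.
function lastPathName(string $url): ?string
{
    // ^[^:]+://[^/]+/   scheme, host and the first slash
    // (?:.*/)?          any leading directories (greedy, optional)
    // ([^/]+?)          the name we want, captured lazily as group 1
    // (?:\.[^./]+|/)$   followed by either a file extension or a trailing slash
    $pattern = '#^[^:]+://[^/]+/(?:.*/)?([^/]+?)(?:\.[^./]+|/)$#';
    return preg_match($pattern, $url, $m) ? $m[1] : null;
}

echo lastPathName('http://www.example.com/foo/page.php'), "\n";
```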
Save yourself the regular expression and make PHP's other functions feel more loved.
$url = "http://www.example.com/page.php";
$filename = pathinfo(parse_url($url, PHP_URL_PATH), PATHINFO_FILENAME);
Note: PATHINFO_FILENAME requires PHP 5.2 or later.
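The same two calls should also cover the directory case from the question, since pathinfo ignores a trailing slash when picking the last component. Wrapped in a hypothetical helper (assuming PHP 7 or later), with strtolower added to mirror the original attempt:

```php
<?php
// Hypothetical helper: page or last directory name, lower-cased, without extension.
function pageOrDirName(string $url): string
{
    // parse_url returns null when there is no path; cast so pathinfo gets a string.
    $path = (string) parse_url($url, PHP_URL_PATH);
    return strtolower(pathinfo($path, PATHINFO_FILENAME));
}

echo pageOrDirName('http://www.example.com/dir/'), "\n";
echo pageOrDirName('http://www.example.com/page.php'), "\n";
```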
