regex to get current page or directory name? - php

I am trying to get the page name or the last directory name from a URL.
For example, if the URL is http://www.example.com/dir/ I want it to return dir, and if the URL is http://www.example.com/page.php I want it to return page. Notice that I do not want the trailing slash or the file extension.
I tried this:
$regex = "/.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*/i";
$name = strtolower(preg_replace($regex,"$2",$url));
I ran this regex in PHP and it returned nothing. (however I tested the same regex in ActionScript and it worked!)
So what am I doing wrong here, how do I get what I want?
Thanks!!!

Don't use / as the regex delimiter if it also contains slashes. Try this:
$regex = "#^.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*$#i";
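As a minimal sketch of that fix, assuming the two sample URLs from the question (the helper name firstSegmentAfterTld is hypothetical and not part of the original answer):

```php
<?php
// Hypothetical helper: returns the first path segment after a known TLD.
function firstSegmentAfterTld(string $url): string
{
    // '#' is the delimiter, so the '/' inside the pattern needs no escaping.
    $regex = '#^.*\.(com|gov|org|net|mil|edu)/([a-z_\-]+).*$#i';
    return strtolower(preg_replace($regex, '$2', $url));
}

$fromPage = firstSegmentAfterTld('http://www.example.com/page.php');
$fromDir  = firstSegmentAfterTld('http://www.example.com/dir/');
```

Note that this pattern captures the segment directly after the domain, so for a deeper path such as /foo/dir/ it would pick up foo rather than dir.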

You may try to escape the "/" in the middle; left unescaped, it simply closes your regex. So this may work:
$regex = "/.*\.(com|gov|org|net|mil|edu)\/([a-z_\-]+).*/i";
You may also make the regex somewhat more general, but that's another problem.

You can use this
array_pop(explode('/', $url));
Then apply a simple regex to remove any file extension
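The two steps could be combined roughly as follows. This is only a sketch: the helper name lastSegmentName is made up for illustration, and it assumes the URL actually has a path after the host.

```php
<?php
// Hypothetical helper built on explode(); not from the original answer.
function lastSegmentName(string $url): string
{
    // Drop a trailing slash so "/dir/" does not yield an empty last element.
    $parts = explode('/', rtrim($url, '/'));
    // end() works on a real variable, which also avoids the by-reference
    // notice that array_pop(explode(...)) can raise.
    $last = end($parts);
    // Remove a final ".ext" if there is one.
    return preg_replace('/\.[^.]*$/', '', $last);
}

$page = lastSegmentName('http://www.example.com/page.php');
$dir  = lastSegmentName('http://www.example.com/dir/');
```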

Assuming you want to match the entire address after the domain portion:
$regex = "%://[^/]+/([^?#]+)%i";
The above assumes a URL of the format extension://domainpart/everythingelse.

Then again, it seems that the problem here isn't that your regex isn't powerful enough, just that it is mistyped (the closing delimiter appears in the middle of the string). I'll leave this up for posterity, but I strongly recommend you check out PHP's parse_url() function.
This should adequately deliver:
substr($s = basename($_SERVER['REQUEST_URI']), 0, strrpos($s,'.') ?: strlen($s))
But this is better:
preg_replace('/[#\.\?].*/','',basename($path));
Although, your example is short, so I cannot tell if you want to preserve the entire path or just the last element of it. The preceding example will only preserve the last piece, but this should save the whole path while being generic enough to work with just about anything that can be thrown at you:
preg_replace('~(?:/$|[#\.\?].*)~','',substr(parse_url($path, PHP_URL_PATH),1));

As much as I personally love using regular expressions, more 'crude' (for want of a better word) string functions might be a good alternative for you. The snippet below uses sscanf to parse the path part of the URL for the first bunch of letters.
$url = "http://www.example.com/page.php";
$path = parse_url($url, PHP_URL_PATH);
sscanf($path, '/%[a-z]', $part);
// $part = "page";

This expression:
(?<=^[^:]+://[^.]+(?:\.[^.]+)*/)[^/]*(?=\.[^.]+$|/$)
Gives the following results:
http://www.example.com/dir/ dir
http://www.example.com/foo/dir/ dir
http://www.example.com/page.php page
http://www.example.com/foo/page.php page
Apologies in advance if this is not valid PHP regex - I tested it using RegexBuddy.

Save yourself the regular expression and make PHP's other functions feel more loved.
$url = "http://www.example.com/page.php";
$filename = pathinfo(parse_url($url, PHP_URL_PATH), PATHINFO_FILENAME);
Warning: for PHP 5.2 and up.
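To cover both cases from the question (a page and a directory with a trailing slash), the one-liner can be wrapped as below. This is a sketch under assumptions: the function name nameFromUrl is hypothetical, and the type hints require PHP 7 or later.

```php
<?php
// Hypothetical wrapper around the pathinfo()/parse_url() one-liner.
function nameFromUrl(string $url): string
{
    // parse_url() isolates the path, so a query string or fragment
    // never reaches pathinfo().
    $path = (string) parse_url($url, PHP_URL_PATH);
    // Trim the trailing slash explicitly so "/dir/" is treated as "dir".
    return pathinfo(rtrim($path, '/'), PATHINFO_FILENAME);
}

$page = nameFromUrl('http://www.example.com/foo/page.php?x=1#top');
$dir  = nameFromUrl('http://www.example.com/foo/dir/');
```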


Removing 'http://' from link via REGEX

What I would like to do is remove the "http://" part of these autogenerated links, below is an example of it.
http://google.com/search?gc...
Here are the regexes I am using in PHP to generate these links from a URL.
$patterns_sp[5] = '~([\S]+)~';
$replaces_sp[5] = '<a href=\1 target="_blank">\1<br/>';
$patterns_sp[6] = '~(?<=\>)([\S]{1,25})[^\s]+~';
$replaces_sp[6] = '\1...</a><br/>';
When these patterns are run on a URL like this:
http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex
the REGEX gives me:
http://google.com/search?gc...
Where I am stuck:
There is no obvious reason why I cannot modify the fourth line of code to read like this:
$patterns_sp[6] = '~(?<=\>http\:\/\/)([\S]{1,25})[^\s]+~';
However, the REGEX still seems to capture the "http://" part of the address, thus making a long list of these very redundant looking. What I am left with is the same thing as in the first example.
Replace...
$patterns_sp[5] = '~([\S]+)~';
...with...
$patterns_sp[5] = '~^(?:https?|ftp)://([\S]+)~';
Then you can access the protocol-less version with $1 and the whole link with $0.
Optionally, you can remove a leading protocol with something like...
preg_replace('~^(?:https?|ftp)://~', '', $str);
I suggest not writing your own regex, instead have a look at http://php.net/manual/en/function.parse-url.php
Retrieve the components of the URL, then compose a new version that only contains the parts you want.
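A rough sketch of that approach, using the sample URL from the question. The helper name withoutScheme is hypothetical, and for brevity it ignores any user:password part of the URL.

```php
<?php
// Hypothetical helper: rebuilds a URL without its scheme.
function withoutScheme(string $url): string
{
    $p = parse_url($url);
    if ($p === false || !isset($p['host'])) {
        return $url; // nothing parse_url() can split; leave it alone
    }
    $out = $p['host'];
    if (isset($p['port']))     { $out .= ':' . $p['port']; }
    if (isset($p['path']))     { $out .= $p['path']; }
    if (isset($p['query']))    { $out .= '?' . $p['query']; }
    if (isset($p['fragment'])) { $out .= '#' . $p['fragment']; }
    return $out;
}

$short = withoutScheme('http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex');
```

The shortened string can then be truncated for display while the original URL is kept for the href attribute.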

PHP remove page name Regex - preg_replace

I have URLs like these (several similar ones):
images/image1/image1.jpg
images/images1/images2/image2.jpg
images/images2/images3/images4/image4.jpg
I have this regex, but I want it to strip the image name from the string:
<?php $imageurlfolder = $pagename1;
$imageurlfolder = preg_replace('/[A-Za-z0-9]+.asp/', '', $pagename1);?>
The resulting string should look like the URLs above, e.g. images/images2/images3/images4/, but without the image4.jpg.
Hope you can help.
Thanks
For this particular purpose function dirname() would be sufficient:
<?php echo dirname('images/images2/images3/images4/image4.jpg'); ?>
Would return:
images/images2/images3/images4
I think you can use the dirname function
for instance (from that page)
dirname("/etc/passwd")
would print
/etc
A quite straightforward way to do it:
preg_replace("#(?<=/)[^/]+$#","",$your_string);
It will remove everything between the last / and the end of the string.
Edit: as many people pointed out, you can also use dirname, which might prove faster…
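The two approaches differ in whether the trailing slash survives, which matters here because the question asks for it to be kept. A small comparison on one of the paths from the question:

```php
<?php
$path = 'images/images2/images3/images4/image4.jpg';

// dirname() drops the last segment and the slash before it.
$viaDirname = dirname($path);

// The lookbehind regex removes only the file name, keeping the slash.
$viaRegex = preg_replace('#(?<=/)[^/]+$#', '', $path);
```

If the trailing slash is needed with dirname(), appending '/' to its result gives the same string as the regex version for paths like these.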

Using PHP to split a URL

I am creating a PHP proxy where it accepts a url and confirms it is on my list of servers.
When importing the URL from the application I ran into an issue where I needed two parser tags. I need it to split on a "\?" tag as well as on a string, in my case "export?".
I am using preg for the first tag. Does this accept strings like my export tag, or is there some other method for doing this?
Please let me know how this is accomplished, or if you have more questions.
As ircmaxell has already stated in the comments, PHP does already have a function to parse a URL: parse_url.
And when you have the URL path (I assume your export? is the path suffix plus the query indicator), you can use explode to split the path into its path segments:
$path = parse_url($url, PHP_URL_PATH);
$segments = explode('/', $path);
You can then get the last path segment with one of the following:
end($segments)
$segments[count($segments)-1]
And to cope with trailing slashes, you can use rtrim($path, '/') to remove them.
All together:
$url = 'http://www.example.com/subfolders/export?';
$path = parse_url($url, PHP_URL_PATH);
$segments = explode('/', rtrim($path, '/'));
echo end($segments);
A regular expression should do the trick, something like the below would work. This is what Django uses in their URL dispatcher
r'^export/$'
Regular expressions are string patterns that may also include variable matches. Because ? is included within export?, you have to do your split twice: once on export? first, and a second pass on each of those pieces with ? as your delimiter. As written below, you're just splitting on either of two different strings.
$first = preg_split('/export\?/', $url);
$second = array();
foreach ($first as $piece) {
    // split each piece again on a literal "?"
    $second[] = preg_split('/\?/', $piece);
}
Treat this as a sketch: $url stands for whatever string you are splitting.
Hey guys, I ended up using an explode which looked for the string (export?), and then I used the preg_split command to search for the \?. This provided me with the portion I was looking for. Thanks guys.

PHP Regex help, getting part of a link

I'm trying to write a regex in PHP that, given a line like
<a href="mypage.php?(some junk)&p=12345&(other junk)" other link stuff>Text</a>
returns only "p=12345", or even "12345". Note that the (some junk)& and the &(other junk) may or may not be present.
Can I do this with one expression, or will I need more than one? I can't seem to work out how to do it in one, which is what I would like if at all possible. I'm also open to other methods of doing this, if you have a suggestion.
Thanks
Perhaps a better tactic than using a regular expression in this case is to use parse_url.
You can use that to get the query (what comes after the ? in your URL) and split on the '&' character and then the '=' to put things into a nice dictionary.
Use parse_url and parse_str:
$url = 'mypage.php?(some junk)&p=12345&(other junk)';
$parsed_url = parse_url($url);
parse_str($parsed_url['query'], $parsed_str);
echo $parsed_str['p'];
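Since the surrounding parameters may or may not be present, it can help to guard against a missing query string or a missing p. A minimal sketch, with a hypothetical helper name pageParam and made-up sample links:

```php
<?php
// Hypothetical helper: returns the value of "p" from a link, or null.
function pageParam(string $href): ?string
{
    $query = parse_url($href, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string at all
    }
    parse_str($query, $params);
    // Only accept a plain scalar value such as p=12345.
    return (isset($params['p']) && is_string($params['p'])) ? $params['p'] : null;
}

$both  = pageParam('mypage.php?a=1&p=12345&b=2');
$alone = pageParam('mypage.php?p=12345');
$none  = pageParam('mypage.php');
```

If the href is taken from HTML source where ampersands appear as &amp;, running it through html_entity_decode() first is probably needed.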

php: delete elements from string

I have this situation:
..<img src="//http://www... OR ..<img src="/http://www... OR ..<img src="////http://www...
(there may be any number of / characters)
How do I delete the / characters before http?
The result should always be:
..<img src="http://www...
Thanks ;)
This should do the trick.
$url = ltrim($url, "/");
This seems like a rather ad hoc solution. You might want to get to the bottom of the issue and eliminate it at source.
A regular expression along the lines of this should do the trick I think:
$string = preg_replace('/="\/+http:/', '="http:', $string);
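A quick check of that pattern against the three variants from the question, using a made-up image URL on example.com:

```php
<?php
$samples = array(
    '<img src="/http://www.example.com/a.png">',
    '<img src="//http://www.example.com/a.png">',
    '<img src="////http://www.example.com/a.png">',
);

$cleaned = array();
foreach ($samples as $markup) {
    // "/+" matches one or more slashes between the quote and "http:".
    $cleaned[] = preg_replace('/="\/+http:/', '="http:', $markup);
}

// A local path has no "http:" after the slash, so it is left untouched.
$local = preg_replace('/="\/+http:/', '="http:', '<img src="/images/img.gif">');
```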
Assuming that the url is defined in a variable within your PHP, ltrim() could be the answer
$url = ltrim($url,'/');
Though you wouldn't be able to use this option if you had local URLs (e.g. '/images/img.gif') without the 'http://'.
You could do something like this (str_replace() because it is faster than a regular expression):
$markup = str_replace('//http://', 'http://', $markup);
Why do you need this? It might be better to eliminate the source of this problem.
