Convert absolute to relative url with preg_replace

(I searched, and found lots of questions about converting relative to absolute urls, but nothing for absolute to relative.)
I'd like to take input from a form field and end up with a relative url. Ideally, this would be able to handle any of the following inputs and end up with /page-slug.
http://example.com/page-slug
http://www.example.com/page-slug
https://example.com/page-slug
https://www.example.com/page-slug
example.com/page-slug
/page-slug
And maybe more I'm not thinking of...?
Edit: I'd also like this to work for something where the relative url is e.g. /page/post (i.e. something with more than one slash).

Take a look at parse_url if you are always working with URLs. Specifically:
parse_url($url, PHP_URL_PATH)
FYI, I tested it against all your input, and it worked on all except: example.com/page-slug
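One way to cover the scheme-less case is to prepend a scheme before calling parse_url. This is only a sketch: the function name to_relative_url is made up, and it assumes that any input with no scheme and no leading slash starts with a host name (so a bare "page-slug" would be misread as a host).

```php
<?php
// Hypothetical helper: return the path part of an absolute or relative URL.
function to_relative_url(string $url): string
{
    $url = trim($url);
    // Assumption: input without a scheme and without a leading slash
    // (e.g. "example.com/page-slug") begins with a host name.
    if ($url !== '' && $url[0] !== '/' && !preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $path = parse_url($url, PHP_URL_PATH);
    // parse_url() returns null when there is no path and false on malformed input.
    return (is_string($path) && $path !== '') ? $path : '/';
}

foreach (['http://example.com/page-slug', 'example.com/page-slug', '/page/post'] as $input) {
    echo to_relative_url($input), "\n";
}
```

Note that PHP_URL_PATH drops any query string or fragment, which may or may not be what you want for a form field.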

Try this regexp.
#^        The start of the string
(
  ://     Match either ://
  |       Or
  [^/]    Not a /
)+        One or more times
#
And replace it with the empty string.
$pattern = '#^(://|[^/])+#';
$replacement = '';
echo preg_replace($pattern, $replacement, $string);
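A quick way to check this pattern is to run it over the inputs from the question. The $inputs and $results names below are just for illustration. Unlike parse_url with PHP_URL_PATH, this only strips the scheme and host, so a query string would be kept.

```php
<?php
$pattern = '#^(://|[^/])+#';

// The inputs listed in the question, plus a path with more than one slash.
$inputs = [
    'http://example.com/page-slug',
    'http://www.example.com/page-slug',
    'https://example.com/page-slug',
    'https://www.example.com/page-slug',
    'example.com/page-slug',
    '/page-slug',
    'http://example.com/page/post',
];

$results = [];
foreach ($inputs as $input) {
    // Everything up to (but not including) the first single slash is removed.
    $results[$input] = preg_replace($pattern, '', $input);
}

print_r($results);
```

An input that already starts with a slash does not match at all, so it is returned unchanged.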

I think you want the part of the URL after the hostname, you can use parse_url:
$path = parse_url($url, PHP_URL_PATH);
Note that this gets the whole of the URL after the hostname, so http://example.com/page/slug will give /page/slug.

If you know your application, I would just do this a slightly hacky way and use a regex to search for the end of the domain name:
[a-z]\.(com|org|net)

Related

extract part of a path with filename with php

I need to extract a portion of urls using php. The last 6 segments of the url are the part I need. The first part of the url varies in length and number of directories. So if I have a url like this:
https://www.random.ccc/random2/part1/part2/part3/2017/08/file.txt
or this:
https://www.random.vov/part1/part2/part3/2016/08/file.pdf
What I need is this:
/part1/part2/part3/2017/08/file.txt
or this:
/part1/part2/part3/2016/08/file.pdf
I have tried this:
$string = implode("/",array_slice(explode("/",$string,8),6,4));
which works ok on the first example but not the second. I am not so good with regex and I suppose that is the way. What is the most graceful solution?

Your approach is fine, though adding parse_url in there to isolate just the path will help a lot:
$path = parse_url($url, PHP_URL_PATH); // just the path part of the URL
$parts = explode('/', $path); // all the components
$parts = array_slice($parts, -6); // the last six
$path = '/' . implode('/', $parts); // back together as a string, with the leading slash restored
Try it online at 3v4l.org.
Now, to qualify: if you only need the string part of the path, then use parse_url. If, however, you need to work with each of the segments (such as keeping only the last six, as asked), then use the common pattern of explode/manipulate/implode.
I have left each of these steps separate in the above so you can debug and choose the parts that work best for you.
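If you do want it wrapped up, the same steps could be folded into one function. This is a sketch only: the name last_path_segments and the choice to skip empty segments (from doubled or trailing slashes) are assumptions, not something the question specifies.

```php
<?php
// Hypothetical helper: keep only the last $count path segments of a URL.
function last_path_segments(string $url, int $count): string
{
    if ($count < 1) {
        return '/';
    }
    $path = (string) parse_url($url, PHP_URL_PATH);
    // Drop empty pieces produced by leading, trailing, or doubled slashes.
    $parts = array_values(array_filter(
        explode('/', $path),
        fn (string $part): bool => $part !== ''
    ));
    // A negative offset counts from the end; shorter paths are returned whole.
    return '/' . implode('/', array_slice($parts, -$count));
}

echo last_path_segments('https://www.random.ccc/random2/part1/part2/part3/2017/08/file.txt', 6), "\n";
echo last_path_segments('https://www.random.vov/part1/part2/part3/2016/08/file.pdf', 6), "\n";
```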

Use this, substituting $url as you wish:
$url= "https://www.random.vov/part1/part2/part3/2016/08/file.pdf";
preg_match("%/[^/]*?/[^/]*?/[^/]*?/[^/]*?/[^/]*?/[^/]*?$%", $url, $matches);
echo $matches[0];
best regards!

Extracting text from URL using PHP

I'm curious as to how I would get a certain value after a delimiter in a URL?
If I have a URL of http://www.testing.site.com/site/biz/i-want-this, how would I extract only the part that says "i-want-this", or initially after the last /?
Thank you!

You want basename($path). It should give you what you need:
http://www.ideone.com/8hFSN

$url = "http://www.testing.site.com/site/biz/i-want-this";
preg_match( "/[^\/]*$/", $url, $match);
echo $match[0]; // i-want-this
You can use basename() but if you are on Windows, it will break on not just slashes but also backslashes. This is unlikely to come up as backslashes are unusual in a URL. But I suspect you could find them in a query string in a valid URL.
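To keep a query string from leaking into the result, one option is to isolate the path with parse_url first and only then take the last segment. A rough sketch, with the made-up name last_segment and a made-up ?ref=a/b query used for the demonstration:

```php
<?php
// Hypothetical helper: return the last path segment of a URL.
function last_segment(string $url): string
{
    // Work on the path only, so slashes in the query string are ignored.
    $path = (string) parse_url($url, PHP_URL_PATH);
    // rtrim() makes "/site/biz/i-want-this/" behave like "/site/biz/i-want-this".
    $segments = explode('/', rtrim($path, '/'));
    return end($segments);
}

echo last_segment('http://www.testing.site.com/site/biz/i-want-this'), "\n";
echo last_segment('http://www.testing.site.com/site/biz/i-want-this?ref=a/b'), "\n";
```

Both calls should print i-want-this, whereas a plain "everything after the last slash" match on the second URL would return b.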

Regex to Remove Everything After 4th Slash in URL

I'm working in PHP with friendly URL paths in the form of:
/2011/09/here-is-the-title
/2011/09/here-is-the-title/2
I need to standardize these URL paths to remove anything after the 4th slash, including the slash itself. The value after the 4th slash is sometimes a number, but can also be any parameter.
Any thoughts on how I could do this? I imagine regex could handle it, but I'm terrible with it. I also thought a combination of strpos and substr might be able to handle it, but cannot quite figure it out.

You can use explode() function:
$parts = explode('/', '/2011/09/here-is-the-title/2');
$output = implode('/', array_slice($parts, 0, 4));

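Replace
%^((/[^/]*){3}).*%
with $1. (The g flag used on regexr is not a valid PCRE modifier in PHP; preg_replace already replaces every match.)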
see http://regexr.com?2vlr8 for a live example

If your regex implementation supports arbitrary-length look-behind assertions, you could replace
(?<=^[^/]*(/[^/]*){3})/.*$
with an empty string.
If it does not, you can replace
^([^/]*(?:/[^/]*){3})/.*$
with the contents of the first capturing group. A PHP example for the second one can be found at ideone.com.
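In case the ideone link goes away, here is roughly what the second pattern looks like with preg_replace. The wrapper name strip_after_fourth_slash is only for illustration.

```php
<?php
// Hypothetical wrapper around the capturing-group pattern above.
function strip_after_fourth_slash(string $path): string
{
    // $1 holds everything before the 4th slash; the rest is discarded.
    // Paths with fewer than four slashes do not match and come back unchanged.
    return preg_replace('%^([^/]*(?:/[^/]*){3})/.*$%', '$1', $path);
}

echo strip_after_fourth_slash('/2011/09/here-is-the-title'), "\n";
echo strip_after_fourth_slash('/2011/09/here-is-the-title/2'), "\n";
```

Both lines should print /2011/09/here-is-the-title.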

you could also use a loop:
$result = '';
$count = 0;
foreach (str_split($url) as $c) {
    if ($c === '/') $count++;   // count the slashes seen so far
    if ($count >= 4) break;     // stop at the 4th slash
    $result .= $c;
}

Would a regular expression be best for this problem?

I need to take a url like this:
https://www.domain.com/m/281/[imagename].jpg
and turn it into this:
http://www.NEWdomain.com/images/[imagename].jpg
I will need to do this to many URLs, so I want to write a quick PHP script that puts the URLs in an array and then loops over them to change the domain name and remove the directory structure of the original URLs. Not all the original URLs use /m/281; some are slightly different.
I thought I could do a str_replace from https://www.domain.com to http://www.NEWdomain.com, but I am stumped on how to change the varying /m/281/ in the URLs to my file structure, like /images/.
Would a regular expression be best to solve this problem?

you could try something like:
strip off the "https://"
do a str_replace() as you said on the domain
split the string into an array based on "/": explode("/", $urlString);
loop through and remove any elements after the URL element but not the last.
result will be:
$arr[0] = www.NEWdomain.com
$arr[1] = [imagename].jpg
then just insert "images" before the last element
result will then be:
$arr[0] = www.NEWdomain.com
$arr[1] = images
$arr[2] = [imagename].jpg
finally implode it back to a string:
$blah = implode("/", $arr);
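Put together, those steps might look something like the sketch below. The function name move_image_url and the file name photo.jpg are invented for the example, and the domains are the placeholders from the question. It also puts http:// back on the front, which the steps above leave out.

```php
<?php
// Hypothetical helper following the steps above.
function move_image_url(string $url): string
{
    $url = str_replace('https://', '', $url);                         // strip the scheme
    $url = str_replace('www.domain.com', 'www.NEWdomain.com', $url);  // swap the domain
    $parts = explode('/', $url);
    // Keep only the host (first element) and the file name (last element),
    // with "images" inserted between them.
    $arr = [$parts[0], 'images', $parts[count($parts) - 1]];
    return 'http://' . implode('/', $arr);
}

echo move_image_url('https://www.domain.com/m/281/photo.jpg'), "\n";
```

Because only the first and last elements are kept, it does not matter how many directories sit in between.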

Why don't you try a URL parsing function such as parse_url, then take each component and do a simpler string replace on it?

If you want to change all image urls from all paths, this tested function should do the trick.
function fixurls($text) {
    $re = '% # Match image urls in domain.com
        https://www\.domain\.com/       # Required domain.
        (?:[^\s/]+/)*                   # Optional pathname.
        ([^\s/]+\.(?:jpe?g|png|gif))    # $1: Filename (images only)
        \b                              # Anchor to word boundary.
        %xim';
    // Fix all image URLs in $text string.
    $replace = 'http://www.NEWdomain.com/images/$1';
    $text = preg_replace($re, $replace, $text);
    return $text;
}
You can easily modify the path portion of the regex if you only wish to change images from specific paths.

Your regular expression could match /[a-zA-Z]/[0-9]*/, if I didn't make a bad assumption about your old pattern.

I think what you need is preg_replace().

If only the first two subdirectory segments are variable, you could try:
$src = preg_replace(
    '~https?://www\.domain\.com/\w+/\d+/(.*?\.jpg)~', // match regex
    'http://www.NEWdomain.com/images/$1',             // replacement
    $src);
The \w+ matches one or more word characters (letters, digits, or underscores), and \d+ matches one or more digits. The .*? works on almost anything, since you didn't give any criteria for the filename.
In the replacement string the $1 just becomes what was previously matched with the ( capture ) parens.

Regular expression to extract from URI

I need a regular expression to extract from two types of URIs
http://example.com/path/to/page/?filter
http://example.com/path/to/?filter
Basically, in both cases I need to somehow isolate and return
/path/to
and
?filter
That is, both /path/to and filter are arbitrary. So I suppose I need 2 regular expressions for this? I am doing this in PHP but if someone could help me out with the regular expressions I can figure out the rest. Thanks for your time :)
EDIT: So just want to clarify, if for example
http://example.com/help/faq/?sort=latest
I want to get /help/faq and ?sort=latest
Another example
http://example.com/site/users/all/page/?filter=none&status=2
I want to get /site/users/all and ?filter=none&status=2. Note that I do not want to get the page!

Using parse_url might be easier and have fewer side-effects than regex:
$querystring = parse_url($url, PHP_URL_QUERY);
$path = parse_url($url, PHP_URL_PATH);
You could then use explode on the path to get the first two segments:
$segments = explode("/", $path);
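Going by the examples in the edited question, the rule seems to be "drop a trailing segment literally named page", not "keep the first two segments". That reading is an assumption on my part. Under it, a sketch could look like this (split_uri is a made-up name):

```php
<?php
// Hypothetical helper: split a URL into [path, "?query"].
// Assumption: a trailing segment literally named "page" is to be dropped.
function split_uri(string $url): array
{
    $path  = (string) parse_url($url, PHP_URL_PATH);
    $query = (string) parse_url($url, PHP_URL_QUERY);

    $segments = explode('/', trim($path, '/'));
    if (end($segments) === 'page') {
        array_pop($segments);
    }

    return [
        '/' . implode('/', $segments),
        $query === '' ? '' : '?' . $query,
    ];
}

print_r(split_uri('http://example.com/help/faq/?sort=latest'));
print_r(split_uri('http://example.com/site/users/all/page/?filter=none&status=2'));
```

If the requirement really is "first two segments only", swap the end()/array_pop() check for array_slice($segments, 0, 2).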

Try this:
^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)
This will get you the first two URL path segments and query.
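For what it's worth, here is how that pattern could be used with preg_match. The ~ delimiters and the variable names are my choice. Note that group 1 comes back without the leading slash and group 2 without the ?, so both are added back by hand.

```php
<?php
$re  = '~^http://[^/?#]+/([^/?#]+/[^/?#]+)[^?#]*\?([^#]*)~';
$url = 'http://example.com/help/faq/?sort=latest';

$path  = null;
$query = null;
if (preg_match($re, $url, $m)) {
    $path  = '/' . $m[1];   // first two path segments
    $query = '?' . $m[2];   // everything after the ? up to a fragment
    echo $path, "\n", $query, "\n";
}
```

With the URL above this should print /help/faq and ?sort=latest. A URL with fewer than two path segments, or with no query string, will not match at all.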

Not tested, but:
^https?://[^ /]+[^ ?]+.*
This should match http and https URLs with or without a path: [^ /]+ matches the host, [^ ?]+ matches everything up to the ? (from ?filter, for instance), and .* matches any remaining characters except \n.

Have you considered using explode() instead (http://nl2.php.net/manual/en/function.explode.php) ? The task seems simple enough for it. You would need 2 calls (one for the / and one for the ?) but it should be quite simple once you did that.
