urlencode only the directory and file names of a URL - php

I need to URL encode just the directory path and file name of a URL using PHP.
So I want to encode something like http://example.com/file name and have it result in http://example.com/file%20name.
Of course, if I do urlencode('http://example.com/file name'); then I end up with http%3A%2F%2Fexample.com%2Ffile+name.
The obvious (to me, anyway) solution is to use parse_url() to split the URL into scheme, host, etc. and then just urlencode() the parts that need it like the path. Then, I would reassemble the URL using http_build_url().
Is there a more elegant solution than that? Or is that basically the way to go?

@deceze definitely got me going down the right path, so go upvote his answer. But here is exactly what worked:
$encoded_url = preg_replace_callback('#://([^/]+)/([^?]+)#', function ($match) {
return '://' . $match[1] . '/' . join('/', array_map('rawurlencode', explode('/', $match[2])));
}, $unencoded_url);
There are a few things to note:
http_build_url() requires a PECL install, so if you are distributing your code to others (as I am in this case) you might want to avoid it and stick with regex parsing like I did here (stealing heavily from @deceze's answer; again, go upvote that thing).
urlencode() is not the way to go! You need rawurlencode() for the path so that spaces get encoded as %20 and not +. Encoding spaces as + is fine for query strings, but not so hot for paths.
This won't work for URLs that need a username/password encoded. For my use case, I don't think I care about those, so I'm not worried. But if your use case is different in that regard, you'll need to take care of that.
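If you would rather not depend on either a regex or the PECL extension, the same idea can be sketched with parse_url() and a manual reassembly. This is only a rough sketch: encode_url_path() is a made-up name, it assumes an absolute URL with a scheme and host, and like the regex above it does not carry a username/password across.

```php
<?php
// Hypothetical helper: percent-encode only the path segments of an absolute URL.
function encode_url_path(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave anything we cannot parse untouched
    }

    // Encode each segment separately so the slashes between them survive.
    $path = isset($parts['path'])
        ? implode('/', array_map('rawurlencode', explode('/', $parts['path'])))
        : '';

    // Reassemble by hand instead of calling http_build_url().
    $result = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $result .= ':' . $parts['port'];
    }
    $result .= $path;
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query']; // query string is passed through as-is
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}
```

Calling encode_url_path('http://example.com/file name') should give back the URL from the question with the space encoded as %20.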

As you say, something along these lines should do it:
$parts = parse_url($url);
if (!empty($parts['path'])) {
$parts['path'] = join('/', array_map('rawurlencode', explode('/', $parts['path'])));
}
$url = http_build_url($parts);
Or possibly:
$url = preg_replace_callback('#(https?://[^/]+/)([^?]+)#', function ($match) {
    // Keep the scheme and host ($match[1]) and encode only the path segments.
    return $match[1] . join('/', array_map('rawurlencode', explode('/', $match[2])));
}, $url);
(Regex not fully tested though)

function encode_uri($url){
$exp = "{[^0-9a-z_.!~*'();,/?:@&=+$#%\[\]-]}i";
return preg_replace_callback($exp, function($m){
return sprintf('%%%02X',ord($m[0]));
}, $url);
}

Much simpler:
$encoded = implode("/", array_map("rawurlencode", explode("/", $path)));

I think this function works:
function newUrlEncode ($url) {
    // Restore the colons and slashes that urlencode() escaped.
    return str_replace(array('%3A', '%2F'), array(':', '/'), urlencode($url));
}

Related

PHP Regex to get second occurrence from the path

I have a path "../uploads/e2c_name_icon/" and I need to extract e2c_name_icon from the path.
What I tried is using str_replace function
$msg = str_replace("../uploads/","","../uploads/e2c_name_icon/");
This result in an output "e2c_name_icon/"
$msg = str_replace("/","","e2c_name_icon/");
Is there a better way to do this? I am looking for an alternative method that uses a regular expression.
Try this. Outputs: e2c_name_icon
<?php
$path = "../uploads/e2c_name_icon/";
// Outputs: 'e2c_name_icon'
echo explode('/', $path)[2];
However, this is technically the third component of the path, the ../ being the first. If you always need to get the third index, then this should work. Otherwise, you'll need to resolve the relative path first.
Use basename function provided by PHP.
$var = "../uploads/e2c_name_icon/";
echo basename( $var ); // prints e2c_name_icon
If you strictly want to get the last part of the URL after '../uploads',
Then you could use this :
$url = '../uploads/e2c_name_icon/';
$regex = '/\.\.\/uploads\/(\w+)/';
preg_match($regex, $url, $m);
print_r($m); // $m[1] holds the captured directory name when the pattern matches
You can trim after the str_replace.
echo $msg = trim(str_replace("../uploads/","","../uploads/e2c_name_icon/"), "/");
I don't think you need regex for this. Simple string functions are usually faster.
You could also use strrpos to find the second last /, then trim off both /.
$path = "../uploads/e2c_name_icon/";
echo $msg = trim(substr($path, strrpos($path, "/",-2)),"/");
I added -2 in strrpos to skip the last /. That means it returns the position of the / after uploads.
So substr will return /e2c_name_icon/ and trim will remove both /.
You'd be much better off using the native PHP path functions vs trying to parse it yourself.
For example:
$path = "../uploads/e2c_name_icon/";
$msg = basename(realpath($path)); // e2c_name_icon, provided the directory exists (realpath() returns false otherwise)

Convert unicode URL to ASCII

I'm writing a PHP application that accepts a URL from the user and then processes it by making some calls to binaries with system()*. However, to avoid the many complications that arise with this, I'm trying to convert the URL, which may contain Unicode characters, into ASCII characters.
Let's say I have the following URL:
https://täst.de:8118/news/zh-cn/新闻动态/2015/
Here two parts need to be dealt with: the hostname and the path.
For the hostname, I can simply call idn_to_ascii().
However, I can't simply call urlencode() over the path, as each of the characters that need to remain unmodified will also be converted (e.g. news/zh-cn/新闻动态/2015/ -> news%2Fzh-cn%2F%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81%2F2015%2F as opposed to news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/).
How should I approach this problem?
*I'd rather not deal with system() calls and the resulting complexity, but given that the functionality is only available by calling binaries, I unfortunately have no choice.
Split the URL by /, then encode the parts that need it (idn_to_ascii() for the host, urlencode() for the path segment), then put it back together:
$url = explode("/", $url);
$url[2] = idn_to_ascii($url[2]);
$url[5] = urlencode($url[5]);
$url = join("/", $url);
You could use PHP's iconv function:
$ascii = iconv("UTF-8", "ASCII//TRANSLIT", $url);
The following can be used for this transformation:
function convertpath ($path) {
$path1 = '';
$len = strlen ($path);
for ($i = 0; $i < $len; $i++) {
if (preg_match ('/^[A-Za-z0-9\/?=+%_.~-]$/', $path[$i])) {
$path1 .= $path[$i];
}
else {
$path1 .= urlencode ($path[$i]);
}
}
return $path1;
}
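For what it's worth, the per-segment idea can also be applied to the whole URL in one go. The sketch below is an assumption-laden illustration, not a tested library routine: url_to_ascii() is a made-up name, the regex only understands simple scheme://host[:port]/path URLs, the query string and fragment are passed through untouched, and the host is only converted when the intl extension (idn_to_ascii()) is available.

```php
<?php
// Hypothetical helper: ASCII-only version of a simple absolute URL.
function url_to_ascii(string $url): string
{
    // scheme :// host [:port] path [?query / #fragment]
    $pattern = '~^([a-z][a-z0-9+.-]*)://([^/:?#]+)(:\d+)?([^?#]*)(.*)$~i';
    if (!preg_match($pattern, $url, $m)) {
        return $url; // not a URL shape this sketch understands
    }
    [, $scheme, $host, $port, $path, $rest] = $m;

    // Convert an internationalised host name to punycode when intl is loaded.
    if (function_exists('idn_to_ascii')) {
        $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
        if ($ascii !== false) {
            $host = $ascii;
        }
    }

    // Percent-encode every path segment but keep the slashes between them.
    $path = implode('/', array_map('rawurlencode', explode('/', $path)));

    return $scheme . '://' . $host . $port . $path . $rest;
}
```

The port is split off before the host is handed to idn_to_ascii(), so a host like täst.de:8118 is converted without the :8118 suffix getting in the way.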

PHP regex: How to remove ?file in url?

My URLs look like this:
http://mywebsite.com/movies/937-lan-kwai-fong-2?file=Rae-Ingram&q=
http://mywebsite.com/movies/937-big-daddy?file=something&q=
I want to get "lan-kwai-fong-2" and "big-daddy", so I use this code, but it doesn't work. Please help me fix it! If you can shorten it, that would be great!
$url= $_SERVER['REQUEST_URI'];
preg_replace('/\?file.*/','',$url);
preg_match('/[a-z][\w\-]+$/',$url,$matches);
$matches= str_replace("-"," ",$matches[0]);
First, there are issues with your code that I'm going to go over because they are general things:
preg_replace does not work by reference, so you are never actually modifying the URL. You need to assign the result of the replace to a variable:
// this would overwrite the current value of $url with the replaced value
$url = preg_replace('/\?file.*/','',$url);
It is possible that preg_match will not find anything, so you need to test the result:
// it should also be noted that sometimes you may need a more exact test here
// because it can return false (if there's an error) or 0 (if there is no match)
if (preg_match('/[a-z][\w\-]+$/',$url,$matches)) {
// do stuff
}
Now, with that out of the way, you are making this more difficult than it needs to be. There are specific functions for working with URLs: parse_url and parse_str.
You can use these to easily work with the information:
$urlInfo = parse_url($_SERVER['REQUEST_URI']);
$movie = basename($urlInfo['path']); // yields 937-the-movie-title
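To get from 937-lan-kwai-fong-2 to the title the question asks for, one more step is needed to drop the leading numeric id. A small sketch of the whole thing follows; movie_slug() is a name I made up, and it assumes the id is always digits followed by a hyphen, as in the two example URLs.

```php
<?php
// Hypothetical helper: "…/movies/937-lan-kwai-fong-2?file=…" -> "lan-kwai-fong-2".
function movie_slug(string $url): ?string
{
    // parse_url() discards the query string for us.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }

    $segment = basename($path);                     // last path segment
    $slug = preg_replace('/^\d+-/', '', $segment);  // strip the leading "937-"

    return ($slug === null || $slug === '') ? null : $slug;
}
```

If you want spaces instead of hyphens, as in the original code, run str_replace('-', ' ', ...) on the result.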
Just replace
preg_replace('/\?file.*/','',$url);
with
$url= preg_replace('/\?file.*/','',$url);
Regex works, and parse_url is the right way to do it. But for something quick and dirty I would usually use explode. I think it's clearer.
@list($path, $query) = explode("?", $url, 2); // separate path from query
$segments = explode("/", $path); // array_pop() needs a variable, not a function result
$match = array_pop($segments); // get last part of path
How about this:
$url = $_SERVER['REQUEST_URI'];
preg_match('/\/[^-]+-([^?]+)\?/', $url, $matches);
$str = isset($matches[1]) ? $matches[1] : false;
match last '/'
match anything besides '-' until '-'
capture anything besides '?' until (not including) '?'

Urlencode everything but slashes?

Is there any clean and easy way to urlencode() an arbitrary string but leave slashes (/) alone?
Split by /
urlencode() each part
Join with /
You can do like this:
$url = "http://www.google.com/myprofile/id/1001";
$encoded_url = urlencode($url);
$after_encoded_url = str_replace("%2F", "/", $encoded_url);
Basically what @clovecooks said, but split() is deprecated as of 5.3:
$path = '/path with some/illegal/characters.html';
$parsedPath = implode('/', array_map(function ($v) {
return rawurlencode($v);
}, explode('/', $path)));
// $parsedPath == '/path%20with%20some/illegal/characters.html';
Also might want to decode before encoding, in case the string is already encoded.
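A rough sketch of that decode-then-encode idea is below. encode_path_once() is a hypothetical name, and the approach assumes that any %XX sequence already in the input is meant as an escape; a literal string such as "50%20off" would be treated as already encoded.

```php
<?php
// Hypothetical helper: encode path segments without double-encoding.
function encode_path_once(string $path): string
{
    return implode('/', array_map(
        function (string $segment): string {
            // Decode first so an already-encoded segment is not encoded twice.
            return rawurlencode(rawurldecode($segment));
        },
        explode('/', $path)
    ));
}
```

Running the function on its own output leaves the string unchanged, which is the point of decoding first.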
I suppose you are trying to encode a whole HTTP url.
I think the best solution for encoding a whole HTTP URL is to follow the browser strictly.
If you just skip slashes, then you will get double-encode issue if the url has already been encoded.
And if there are some parameters in the url, (?, &, =, # are in the url) the encoding will break the link.
The browsers only encode the space, ", <, >, ` and multi-byte characters. (Paste all the symbols into the browser and you will get the list.)
You only need to encode these characters.
echo preg_replace_callback("/[\ \"<>`\\x{0080}-\\x{FFFF}]+/u", function ($match) {
return rawurlencode($match[0]);
}, $path);
Yes, by properly escaping the individual parts before assembling them with slashes:
$url = urlencode($foo) . '/' . urlencode($bar) . '/' . urlencode($baz);
$encoded = implode("/", array_map(function($v) { return urlencode($v); }, split("/", $url)));
This will split the string, encode the parts and join the string together again.

Get vine video id using php

I need to get the vine video id from the url
so the output from link like this
https://vine.co/v/bXidIgMnIPJ
be like this
bXidIgMnIPJ
I tried to use code from another question here for Vimeo (NOT VINE)
Get img thumbnails from Vimeo?
This is what I tried, but I did not succeed
$url = 'https://vine.co/v/bXidIgMnIPJ';
preg_replace('~^https://(?:www\.)?vine\.co/(?:clip:)?(\d+)~','$1',$url);
basename maybe?
<?php
$url = 'https://vine.co/v/bXidIgMnIPJ';
var_dump(basename($url));
http://codepad.org/vZiFP27y
Assuming it will always be in that format, you can just split the url by the / delimiter. Regex is not needed for a simple url such as this.
$parts = explode('/', $url);
$id = end($parts); // end() takes its argument by reference, so it needs a variable
Since the question asks about it, here is a solution using preg_replace:
$s = 'https://vine.co/v/bXidIgMnIPJ';
$new_s = preg_replace('/^.*\//','',$s);
echo $new_s;
// => bXidIgMnIPJ
or if you need to validate that an input string is indeed a link to vine.co :
$new_s = preg_replace('/^(https?:\/\/)?(www\.)?vine\.co.*\//','',$s);
I don't know if that /v/ part is always present or is it always v... if it is then it may also be added to regex for stricter validation:
$new_s = preg_replace('/^(https?:\/\/)?(www\.)?vine\.co\/v\//','',$s);
Here's what I am using:
function getVineId($url) {
preg_match("#(?<=vine\.co/v/)[0-9A-Za-z]+#", $url, $matches);
if (isset($matches[0])) {
return $matches[0];
}
return false;
}
I used a look-behind to ensure "vine.co/v/" always precedes the ID, while ignoring if the url is HTTP or HTTPS (or if it lacks a protocol altogether). It assumes the ID is alphanumeric, of any length. It will ignore any characters or parameters after the id (like Google campaign tracking parameters, etc).
I used the "#" delimiter so I wouldn't have to escape the forward slashes (/), for a cleaner look.
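If you also want to be sure the link really points at vine.co rather than merely containing that text somewhere, a variation is to let parse_url() pick out the host and path first. This is only a sketch under my own assumptions: vine_id_from_url() is a made-up name, the URL must include a scheme (otherwise parse_url() does not report a host), and the ID is assumed to be alphanumeric as above.

```php
<?php
// Hypothetical helper: return the Vine ID, or null if the URL is not a vine.co video link.
function vine_id_from_url(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($host) || !is_string($path)) {
        return null;
    }

    // Accept only the two host names used in this thread.
    if (!in_array(strtolower($host), ['vine.co', 'www.vine.co'], true)) {
        return null;
    }

    // The ID is the alphanumeric run right after /v/; anything after it is ignored.
    if (preg_match('#^/v/([0-9A-Za-z]+)#', $path, $m)) {
        return $m[1];
    }
    return null;
}
```

Tracking parameters end up in the query string, so they never reach the path match.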
explode the string with '/' and the last string is what you are looking for :) Code:
$vars = explode("/",$url);
echo $vars[count($vars)-1];
$url = 'https://vine.co/v/b2PFre2auF5';
$regex = '/^http(?:s?):\/\/(?:www\.)?vine\.co\/v\/([a-zA-Z0-9]{1,13})$/';
preg_match($regex,$url,$m);
print_r($m);
// $m[1] => b2PFre2auF5
