Recently I've been reading some web pages, and I found a lot of links like:
href="/./foo/bar.php"
Isn't this the same as href="/foo/bar.php"? Or is there some difference between the two forms that I don't know about?
The relative URL /./foo/bar.php is not the same as the relative URL /foo/bar.php. The former has /. at the beginning.
They have the same effect, though. When URLs are processed, relative URLs are resolved to absolute URLs, and in this process a leading /./ in a relative URL is replaced by /. Reference: STD 66 (RFC 3986), section 5.2.4, Remove Dot Segments. (Such a reference is in turn resolved relative to the server root, giving something like http://www.example.com/foo/bar.php.)
So these two relative URLs always resolve to the same absolute URL. There is in general no reason to use the longer URL, which looks more complicated and confusing.
Note that this has absolutely nothing to do with folders or files. It is simply string manipulation, based on the URL standard. Whether URLs get mapped to folders and files is at the discretion of a server and in principle invisible to the world outside it.
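The dot-segment removal described above can be checked with any standards-following URL library. The following is a minimal sketch using Python's urllib.parse.urljoin; the page URL is a made-up example, and the behaviour shown assumes Python 3.5 or later, where urljoin follows the RFC 3986 resolution rules.

```python
from urllib.parse import urljoin

# Hypothetical page on which both links appear.
base = "http://www.example.com/some/page.html"

# Resolve both relative references against the same base URL.
with_dot = urljoin(base, "/./foo/bar.php")
without_dot = urljoin(base, "/foo/bar.php")

print(with_dot)
print(without_dot)
print(with_dot == without_dot)
```

Both references should resolve to http://www.example.com/foo/bar.php, so the comparison is expected to print True.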
Related
Friends,
I'm looking for the most efficient way to make an anchor tag that contains a user-submitted link point to the external site, instead of the link erroneously being appended to the end of the current site URL.
// Explanation:
As many of you know, when writing links in Joomla such as the following:
Google
or
Google
It appends the href to the current site url.
For example, if my site was http://www.stackoverflow.com/questions/ask
And I clicked on either link above it would take me to http://www.stackoverflow.com/questions/ask/google.com
...as opposed to what would seem natural, just taking me to google.com
// End Explanation
Of course I know prepending http:// to the href solves this issue. However, for user-submitted content this means calling a string-based method to check that each user-submitted link starts with http:// (or https://, etc.) and, if not, to prepend it.
Could someone shed some light on other options for doing this? I'm hoping to find out if there are possibly better, more efficient methods.
Also, if it turns out that I am doing it the best way possible, then I would love to see what others use for this string function.
Thank you Stackfriends.
That's not a "Joomla" behaviour; that's the way URLs are to be resolved, as defined by the standard. What you're talking about is how browsers etc. are supposed to process a relative or absolute URL.
Changing this behaviour is IMO only likely to result in grief.
A URL is a string that represents an identifier.
A URL is either a relative URL or an absolute URL. Either form can be
followed by a fragment.
A relative URL is a URL without a scheme. A relative URL must be
relative to a base URL.
An absolute URL is a URL with a scheme.
A base URL is an absolute URL with a relative scheme.
You might want to read more of this at http://url.spec.whatwg.org/#urls
This code will enforce http:// for $unknownlink without scheme (protocol):
$link = JUri::getInstance($unknownlink);
if (!$link->getScheme()) $link->setScheme('http');
echo JHtml::link($link, $link, ['target' => '_blank']);
This works in Joomla 3.4; I'm not sure about older versions.
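The same "add a scheme only when none is present" check is not specific to Joomla. Below is a language-neutral sketch of the idea in Python, not a drop-in replacement for the snippet above. The function name ensure_scheme and the default of http are assumptions made for illustration, and the function does not validate the link in any other way.

```python
from urllib.parse import urlsplit


def ensure_scheme(link, default="http"):
    """Return link unchanged if it already has a scheme, else prepend one.

    ensure_scheme is a hypothetical helper; it only checks for a scheme and
    does not verify that the result points at a real or safe destination.
    """
    link = link.strip()
    # Protocol-relative links ("//host/path") only lack the scheme itself.
    if link.startswith("//"):
        return f"{default}:{link}"
    # urlsplit reports an empty scheme for inputs such as "google.com".
    if urlsplit(link).scheme:
        return link
    return f"{default}://{link}"


print(ensure_scheme("google.com"))
print(ensure_scheme("https://google.com"))
```

Because any existing scheme is kept, links such as mailto: or javascript: pass through untouched, so user-submitted content would still need separate filtering.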
I have a common header.php that is included in virtually every page of the website, and it contains relative links to various resources. As soon as I include it in pages located in different sub-folders under the root folder, some of the links break. I could make all the links in header.php absolute, but then another problem pops up: if you move the application to another domain and put it somewhere under the wwwroot, the absolute links no longer work. I couldn't find a way in PHP to extract the part of the URL that is the root of this application. I ended up defining a variable for the application root and prefixing every link inside header.php with it. This way, I only need to change one variable when the application is moved from one place to another.
I am wondering if there are better ways to handle this kind of situation.
Your feedback would be much appreciated.
Edit: hoping to receive more feedback.
You write that it's a common header.php file. As with the central variable, you can make use of the <base> HTML element, but this is limited to HTML.
You can write yourself a mapping function that resolves absolute URLs against the request URI, or one that resolves relative URLs against the context they come from so that they can be mapped relative to the request URI again.
Then you can implement an output filter that handles URIs on its own, and you can use special prefixes for special treatment.
I am actually using PHP, but such crawling can be done in any programming language. It is a bit difficult to cater for all the situations. Please help me look through the problem and tell me whether I am going in the right direction.
What I know is the current url address from which I can get a list of links from <a href=" or from <frame src=".
What I am doing is this: from the current URL I first get the root URL. For example, from http://www.abc.com/def I get http://www.abc.com. This is to cater for cases like <a href="/fff.html", so I have to know the root URL first.
Secondly, I need to get the URL directory from the current URL. This is a little difficult and I still have no idea how to do it perfectly. For example, for http://www.abc.com/def/xyz.htm, its URL directory is http://www.abc.com/def. This is to cater for cases like <a href="../../xyz.html">.
The problem I am facing is how to get the current URL directory. For example, if the current URL is http://www.abc.com/def, how can I know whether def is a directory or a file? If def is a file, then the URL directory would be http://www.abc.com. But if def is a directory, then the URL directory would just be http://www.abc.com/def.
You could say that if there is a "/" at the end, then it is a directory. But when I am crawling a web page, I can't be sure that the page's author added a "/" at the end of a directory URL. A directory URL without it is perfectly valid; for example, if def is a directory, then http://www.abc.com/def would probably stand for http://www.abc.com/def/index.html.
Since it's hard to know whether http://www.abc.com/def is a directory or a script file, it is hard to build a full URL from a relative href such as <a href="xyz.html">.
Am I over complicating the problem? Is there any solution to this?
There are other situations. For example, href="# means an anchor, in which case I'll just append it to the end of the current URL. Is that correct and valid for any current URL? That is, when the current URL is http://www.abc.com/def (def is a directory), will http://www.abc.com/def#xyz be converted to http://www.abc.com/def/index.html#xyz?
And for href="javascript: or href="vbscript: etc., I'll just ignore it.
And for href="xyz.???", if ??? is an image file, exe file, or anything that is not valid HTML, should I just ignore it?
Thanks.
The question might be a little messy, I hope I explained it clearly.
Anything after the domain name can map to whatever the person configuring the domain wants.
There is no guarantee that a URL ending in .html refers to a real file on a filesystem somewhere, or that it will return valid HTML, or anything else.
You can arbitrarily decide to count def/ as a directory or part of a filename, whatever floats your boat, as any choice is equally correct.
If http://www.abc.com/def is a directory then the web server will usually redirect to http://www.abc.com/def/ in order to avoid confusing the client. You simply need to notice the redirect and use urlparse.urljoin() (urllib.parse.urljoin() in Python 3) or the appropriate function in <language-of-choice> to fuse the two components together in either case as a browser would.
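As a rough illustration of that advice, here is a sketch in Python 3 that resolves every href against the URL the page was finally served from, that is, the URL after any redirect. The helper name resolve_links is hypothetical. Skipping javascript: and vbscript: links and dropping fragments are choices made for this example, not requirements.

```python
from urllib.parse import urldefrag, urljoin


def resolve_links(final_url, hrefs):
    """Resolve hrefs against final_url the way a browser would.

    final_url should be the URL after following redirects, so a directory
    served as ".../def/" already carries its trailing slash.
    """
    resolved = []
    for href in hrefs:
        # Script pseudo-links do not name a page to crawl.
        if href.lower().startswith(("javascript:", "vbscript:")):
            continue
        # Drop the fragment: "#xyz" names a position inside the same page.
        absolute, _fragment = urldefrag(urljoin(final_url, href))
        resolved.append(absolute)
    return resolved


hrefs = ["xyz.html", "/fff.html", "../xyz.html", "#top", "javascript:void(0)"]
print(resolve_links("http://www.abc.com/def/", hrefs))  # def served as a directory
print(resolve_links("http://www.abc.com/def", hrefs))   # def served as a file
```

With this approach the crawler never has to guess whether def is a directory: the trailing slash, or its absence, in the final URL decides how xyz.html is resolved.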
In my PHP script I created a constant that defines the base URL to be put in my hyperlinks, but do I really need this?
Here is an example of what I mean.
I have this:
// base url is only defined once and reused throughout
define("__BASE_URL","http://localhost/test1/");
print '<a href="'.__BASE_URL.'index.php?var1=open">Open</a>';
(I have a lot of these spread throughout my script.)
I tried this and it works:
print '<a href="index.php?var1=open">Open</a>';
So which way is the proper way of doing this? I noticed the second way even works for loading images, CSS, and JavaScript files.
It really comes down to how you're structuring your site. Relative URLs are great (by writing href="index.php" you're really saying href="./index.php"), but they can start to become messy when you begin spreading pages over multiple directories.
Personally I like to base all of my relative URLs off of the root directory, meaning that all of my URLs start with a slash ('/'). That way it doesn't matter if my script is in / or /admin, as I will always have a constant reference point - the document root - as opposed to some relative directory in the structure.
Your first example, storing document paths in variables, really starts to come in handy when you begin developing larger systems where you want the paths to be configurable. For example, maybe you want your system admins to be able to define where images are pulled from, or where the cached downloads are.
So really consider your use cases and size of your system.
Also keep in mind that if you ever move the script to another server that your URLs and directory structures may change, which could cause havoc (ex., you might have your script moved to a different subdomain, into the document root, etc.). A lot of people will drop in Apache's mod_rewrite in this case.
It depends. Without the __BASE_URL, your link will be relative to the current document. In your case, that means index.php must be in the same directory as the file that has the index.php link on it.
If you have the __BASE_URL, then the link will work no matter where its containing file is located (i.e. doesn't have to be in same directory as index.php).
Another option is to use a starting slash only. Then your link will be relative to your domain root:
print '<a href="/index.php?var1=open">Open</a>';
In other words, the above link would point to http://localhost/index.php.
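The difference between the two link styles shows up once the same link is placed on pages in different directories. The sketch below uses Python's urljoin purely to mimic browser resolution; the admin/users.php page is a hypothetical example that does not come from the question.

```python
from urllib.parse import urljoin

pages = [
    "http://localhost/test1/index.php",
    "http://localhost/test1/admin/users.php",  # hypothetical page in a subdirectory
]

for page in pages:
    print(page)
    # Relative to the directory of the current document.
    print("  document-relative:", urljoin(page, "index.php?var1=open"))
    # Relative to the domain root, wherever the current document lives.
    print("  root-relative:    ", urljoin(page, "/index.php?var1=open"))
```

The document-relative link changes its target with the page it sits on, while the root-relative link always points at http://localhost/index.php?var1=open.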
It sounds like your question is regarding absolute vs relative URLs. Are you going for portability? It's generally best to use relative URLs, especially if you plan to work in a test environment and then later transfer files to production.
Is the HTML <base> tag safe to use in terms of browser support? Or should I generate a root path with PHP, which I then add like this somepage to make up an absolute URL?
Using the base tag like this, <base href="<?=BASE?>" />, I am then able to use links like this:
somepage
now I am fully aware that it would be much easier to just do this without using the base tag:
somepage
But how do I test locally then, with a base URL of http://localhost/testsite/?
Edit:
Thanks guys, you're the people who make the Stack Overflow community so great :)
My advice would be to use absolute URLs beginning with a slash, and just set up a virtual host that uses /localhost/testsite/ as its document root. Then you could leave the absolute URLs and just access your site at something like http://testsite/ locally.
You've definitely given this some thought but I would like to throw one more consideration into the mix. If you are writing a web application, you should construct it in such a way that you can install it into any sub-directory in the future and it will continue to work with little change.
This means that hrefs, srcs, actions and even HTTP Location headers will need to be aware of such a change. That's why I recommend prepending your URIs with <?php echo SITE_BASE ?> or whatever you want to call it.
Debate can rage on as to whether SITE_BASE should contain a trailing slash.
I like the first option, outputting a base directory in PHP tags
somepage
the very best. This makes your site independent from which directory it is installed in (you may not be able to change virtual host settings on shared hosting packages). It is also easy to outsource image files to a different server/subdomain if ever need be.
<base> is safe to use in terms of browser support, but I personally recommend strongly against using it for reasons of code maintainability. <base> changes the rules for how URLs are processed; you need to keep the base in mind for all relative URLs; and it is very easy to overlook. From a programming perspective, it feels like a dirty fix.
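To make "changes the rules" concrete, here is a small sketch that resolves the same hrefs once against the page's own URL and once against a base URL. Python's urljoin stands in for the browser's resolution; the page URL is hypothetical, and the base value is the http://localhost/testsite/ example from the question.

```python
from urllib.parse import urljoin

base_href = "http://localhost/testsite/"               # value assumed for <base href>
page_url = "http://localhost/testsite/blog/post.php"   # hypothetical current page

for href in ["somepage", "/somepage", "#top"]:
    # Without <base>, the page's own URL is the reference point.
    print(f"{href!r:12} without <base>: {urljoin(page_url, href)}")
    # With <base>, every relative reference is resolved against base_href.
    print(f"{'':12} with <base>:    {urljoin(base_href, href)}")
```

Note the last case: with a base in place, even a fragment-only link such as #top resolves against the base rather than the current page, which is one of the easily overlooked side effects.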
Coming back to this question 2 years later, I now realise that the best way to handle absolute URLs is to put the URL into a wrapper function, for example like this:
somepage
The wrapper function then gives you full control over the URL handling:
function url($url) {
    // Root-relative URLs get the site base prepended; anything else passes through.
    if (strpos($url, '/') === 0) {
        return 'http://localhost/testsite' . $url;
    } else {
        return $url;
    }
}