Encode the URL including its path in PHP

I want to encode the URL including its path in PHP.
For example, my URL is currently www.yoursite.com/code/results/show.php?u=10&n="tom".
I want to encode this URL so that the user is not able to see
"/code/results/show.php?u=10&n="tom".
I need this because:
I do not want to expose my server's data locations to the user.
I want to keep my server safe.
Thanks in advance.

You will need to look into .htaccess files; from there you can perform URL rewrites that take a URL of (for example) www.yoursite.com/code/results/show.php?u=10&n=tom and instead output www.yoursite.com/results/10/tom.
If the u=10&n=tom part is important, it can't be removed entirely from the URL, but it can be masked in the above way; the alternative is to do everything with POST, which is not a good way to go. A sketch of such a rewrite follows the link below.
Take a look at this link: http://www.addedbytes.com/articles/for-beginners/url-rewriting-for-beginners/
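As a rough sketch of that approach (this assumes mod_rewrite is enabled and the rules live in a .htaccess file at the document root; the exact pattern is illustrative, not a drop-in rule):

RewriteEngine on
# Internally map the clean URL /results/10/tom onto the real script,
# so the visitor never sees /code/results/show.php?u=10&n=tom
RewriteRule ^results/([0-9]+)/([^/]+)/?$ /code/results/show.php?u=$1&n=$2 [L,QSA]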

The best way to hide critical information is to keep it secret: put only a reference in the URL and get the information from the database.
In general it is not a good sign if security depends on how users send requests.
Sending it with POST would hide it, but not really; there are various ways to read and manipulate POST data. A sketch of the reference approach follows.
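A minimal sketch of that idea (the results table, its columns, and the PDO credentials are all hypothetical):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// When generating the link: store the real parameters under a random token.
$token = bin2hex(random_bytes(16)); // opaque reference, reveals nothing
$stmt = $pdo->prepare('INSERT INTO results (token, user_id, name) VALUES (?, ?, ?)');
$stmt->execute([$token, 10, 'tom']);
// The link becomes show.php?r=<token> instead of show.php?u=10&n=tom

// When serving the request: resolve the token instead of trusting raw params.
$ref = isset($_GET['r']) ? $_GET['r'] : '';
$stmt = $pdo->prepare('SELECT user_id, name FROM results WHERE token = ?');
$stmt->execute([$ref]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);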

The problem is not with the URL. A URL should only identify a resource; if you want to hide something, it should not reach the client (be part of the URL) in the first place.

Related

Convert URL into one standard format

Here are a few URLs:
http://sub.example.com/?feed=atom&hello=world
http://www.sub.example.com/?feed=atom&hello=world
http://sub.example.com/?hello=world&feed=atom
http://www.sub.example.com/?hello=world&feed=atom
http://www.sub.example.com/?hello=world&feed=atom
http://www.sub.example.com/?hello=world&feed=atom#123
As you can see, they all lead to the exact same page but the URL format is different. Here are two other basic examples:
http://example.com/hello/
http://example.com/hello
Both are the same.
I want to convert the URL into one standard format so that when I store the URL in the database, I can easily check whether the URL string already exists in the database.
Because of the various ways a URL can be formatted, this can be puzzling.
What's the definitive approach to converting a URL into one standard format? Maybe the parse_url() route...?
Edit
As outlined in the comments, there is no definitive solution to this, but the aim is to get as close as possible with what we have without "retrieving" the page. Please read comments before posting an answer to this bounty.
After you parse_url:
Remove the www prefix from the domain name
If the path is not empty - remove the trailing slash from it
Sort query parameters alphabetically by their name - if there are any
Combine these parts in order to get a canonical URL (a sketch follows below).
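A minimal sketch of those steps (the function name is mine; it ignores scheme and fragment, and assumes full http(s) URLs):

<?php
function canonicalizeUrl($url)
{
    $p = parse_url($url);

    // Remove the www prefix from the domain name
    $host = preg_replace('/^www\./i', '', strtolower($p['host']));

    // If the path is not empty, remove the trailing slash
    $path = isset($p['path']) ? rtrim($p['path'], '/') : '';

    // Sort query parameters alphabetically by name, if there are any
    $query = '';
    if (isset($p['query'])) {
        parse_str($p['query'], $params);
        ksort($params);
        $query = '?' . http_build_query($params);
    }

    // Combine the parts; the fragment (#...) is deliberately dropped
    return $host . $path . $query;
}

// Every example URL from the question, e.g.
// canonicalizeUrl('http://www.sub.example.com/?hello=world&feed=atom#123'),
// now yields the same string: sub.example.com?feed=atom&hello=world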
I had the same issue with a report-configuration-save feature. In our system, users can design their own sales reports (like JQL in Jira); for that, we use GET params as conditions and the fragment identifier (after #) as layout setup, like this:
http://example.com/report.php?since=20180101&until=20180806#sort=amount&color=blue
For our system, the order of the GET params and of the params after # is irrelevant: you reach the same report configuration if you set the "until" param before "since", so for us they are the same request.
Considering this, subdomains are out of the discussion: you must solve those using rewrite techniques (like mod_rewrite with a 301 in Apache) or by maintaining a pool of domain exceptions at the software level. Also, different domains can point to different websites, so you must decide whether treating them as equal is a good idea; for the "www" subdomain this is very easy to figure out, but it will take you time in other cases.
The server side can help with the vars in the query section. For example, in PHP you can use the parse_str() function on $_SERVER['QUERY_STRING'] to get an array, sort it, and finally compare whether two requests are the same (e.g. with array_diff()). A sketch follows.
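A minimal sketch of that comparison (the function name is mine; I use ksort() so that parameter order no longer matters):

<?php
function sameQuery($queryA, $queryB)
{
    // Parse both query strings into arrays and sort them by key,
    // so two requests compare equal regardless of parameter order
    parse_str($queryA, $a);
    parse_str($queryB, $b);
    ksort($a);
    ksort($b);
    return $a === $b;
}

// sameQuery('since=20180101&until=20180806', 'until=20180806&since=20180101')
// returns true: for us, these are the same request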
Unfortunately, the server side alone is not an option, since it has no way to see the content after the hash (#), and we still haven't considered other problems, like the script name being included, or protocols and ports:
http://www.sub.example.com/index.php?hello=world&feed=atom
https://www.sub.example.com/?hello=world&feed=atom
http://www.sub.example.com:8081/?hello=world&feed=atom
In my personal experience, the closest solution is JavaScript: handle the URL there, parse the query section into an array, compare them, and do the same with the fragment identifier. If you need this on the server side, every page load must be followed by an AJAX request sending this data to the server.
Apologies in advance for the length of my answer, but it is what I had to go through in order to solve the same problems you have. Greetings!
Get protocol, domain, and port from URL
How can I get query string values in JavaScript?
How do I get the fragment identifier (value after hash #) from a URL?
Adding the preferred <link rel="canonical" ...> tag to the HTML headers is the only reliable solution for referencing unique content from a single SEF URL. See Google's documentation on consolidating duplicate URLs, which possibly answers the whole question more authoritatively and reliably than I ever could.
The idea of determining the canonical URL, or of resolving a bunch of external URLs, without parsing those servers' .htaccess rewrite rules or the HTML headers, does not appear to be workable (simply because one can maintain a table of URL aliases, which subsequently does not permit guessing how an HTTP request might have been rewritten).
This question might belong on https://webmasters.stackexchange.com/search?q=cannonical.
Since the question is tagged "PHP", I assume you are on the backend.
There are enough answers on how to compare URLs (protocol, host, port, path, list of request params), where the path is case-sensitive while protocol and host are not. Changing the order of request parameters is, strictly speaking, also changing the URL.
My impression is that you want to differentiate by the RESOURCE the server is serving (http://www.sub.example.com/ serves the same resource as http://sub.example.com/, and .../hello serves the same resource as .../hello/).
Which resource is served you should know perfectly well at the backend level, since you (the backend) know what you are serving. Find the perfect ID for the resource and use it.
PS: the URL is not a good identifier for that. But if you must use it, use a sanitized version, where "sanitized" means whatever is right for your purpose: normalize to your preferred host, strip or add slashes at the ends of paths, drop things like /../ from the path (a security issue anyway), and bring the request params into a defined order.
Best regards, iPirat
This is a case of duplicate URLs, and you can avoid this kind of duplicate URL by using a URL factory that redirects all URLs which are not proper to the proper URL.
The same thing is explained in this article:
https://www.tinywebhut.com/remove-duplicate-urls-from-your-website-38
Any other URL leading to the same page is 301 redirected to the proper version of the URL.
This is a best practice of search engine optimization (SEO). Here I'm going to give you a couple of examples.
Consider the URLs of this website; for example, the wrong links for this page are
https://stackoverflow.com/questions/51685850
https://stackoverflow.com/questions/51685850/convert-url-into-one-s
https://stackoverflow.com/questions/51685850/
If you go to the above wrong URLs of this page, you'll be redirected to the proper URL which is
https://stackoverflow.com/questions/51685850/convert-url-into-one-standard-format
And if you change the title of this question, all other URLs are 301 redirected to the proper URL. The idea here is the 301 redirect, which tells the search engines to replace the old URL with the new one; otherwise the search engines find different URLs serving the same content.
The real deal here is the ID of the question, 51685850. This ID is used to create the proper URL with the information from the database. With the URL factory created in the article linked above, you do not even need to store URLs in the database. A sketch of the idea follows.
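A minimal sketch of that idea (the path pattern and the slug lookup are hypothetical stand-ins for your database):

<?php
// The numeric ID in the URL is authoritative; every other variant of the
// URL is 301 redirected to the one proper URL built from it.
$slugsById = [51685850 => 'convert-url-into-one-standard-format'];

if (preg_match('#^/questions/(\d+)#', $_SERVER['REQUEST_URI'], $m)
    && isset($slugsById[(int) $m[1]])) {
    $id = (int) $m[1];
    $properPath = '/questions/' . $id . '/' . $slugsById[$id];
    if ($_SERVER['REQUEST_URI'] !== $properPath) {
        header('Location: https://' . $_SERVER['HTTP_HOST'] . $properPath, true, 301);
        exit;
    }
}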
You can read more on duplicate content here:
https://moz.com/learn/seo/duplicate-content
The same rules are applied to tinywebhut.com as well; the wrong URLs are
https://www.tinywebhut.com/remove-duplicate-38
https://www.tinywebhut.com/some-text-38
https://www.tinywebhut.com/remove-duplicate-urls-from-your-website-38/
In the above URLs, the ID appended to the end of the URL is 38, and if you go to any of these URLs, you'll be 301 redirected to the proper version of the URL, which is
https://www.tinywebhut.com/remove-duplicate-urls-from-your-website-38
I didn't write any functions to explain this here because it is already done in this article:
https://www.tinywebhut.com/remove-duplicate-urls-from-your-website-38
You can achieve the goal with a couple of really simple functions, and you can apply the same idea to remove other duplicate URLs such as /about.php, /about, /about.php/, /about/ and so on. To achieve this you just need to add a little more code to your existing functions.
One alternative is the canonical tag. For example, even if you have more than one URL going to the same page, you just need to add a canonical tag pointing to the proper URL.
<link rel="canonical" href="https://stackoverflow.com/questions/51685850/convert-url-into-one-standard-format" />
This way you are telling the search engines that the multiple URLs should be considered as one, and the search engines show the link used in the canonical tag in their search results. You can read more on canonicalization here:
https://moz.com/learn/seo/canonicalization
But still, the best way to get rid of duplicate content is the 301 redirect. If you have a 301 redirect like the one I described at the beginning, all problems are solved without surprises.
My original answer assumes that the pages are all owned by the OP, as per the line "As you can see, they all lead to the exact same page but the URL format is different...". I am adapting the answer to handle multiple options and adding a list of assumptions you can and cannot make about URLs.
As others have pointed out there is no definitive easy answer to this if you do not know that the page(s) are the same. However, if you follow these assumptions, you should be safe standardizing some things:
CAN ASSUME
Query strings with the same values point to the same location regardless of order. Example: https://example.com/?fruit=apple&color=red is the same as https://example.com/?color=red&fruit=apple
301 redirects to a specific source can be followed. If you receive a 301 redirect response, follow the redirect and use the URL it points to; a permanent redirect is a directive that the content lives at the target URL (see the sketch after this list).
If there is a single <link rel="canonical"> tag in the HTML, that too can be used to recover the canonical link (see below for why).
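For the 301 rule above, a minimal sketch with cURL that follows the redirect chain and reports the final URL (one possible approach, not the only one):

<?php
function resolveFinalUrl($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 301/302 chains
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo the response
    curl_setopt($ch, CURLOPT_NOBODY, true);         // headers are enough here
    curl_exec($ch);
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return $final; // the URL after all redirects were followed
}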
CANNOT ASSUME
That one URL is the same as another if the two differ at all (by URL in this case I am talking about anything before the query string).
http://example.com can be different from https://example.com, which can be different from http://www.example.com or https://www.example.com. There is no restriction against showing a different website with "www" than without it. That's also why search-engine page rank really suffers in such cases.
Any two URLs, even if they currently have exactly the same content, will keep exactly the same content. An example would be https://example.com/test and https://sub.example.com/test. Both may feasibly be set to the same generic test page content. In the future, https://sub.example.com/test may be changed. You can't assume it won't be.
If you own the site
Redirect all traffic to your preferred URL format first: Do you want www.example.com or example.com or sub.example.com? Do you want a trailing slash or not? Redirect to this first, either using server rules or PHP. This is also highly beneficial for search page rank (if that matters to you).
An example of this would be something like this:
if (empty($_SERVER['HTTPS']) || 'example.com' !== $_SERVER['HTTP_HOST'] || rtrim($_SERVER['PHP_SELF'], '/') !== $_SERVER['PHP_SELF']) {
    // Redirect to the HTTPS, bare-domain, no-trailing-slash form of the URL
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: ' . 'https://example.com' . rtrim($_SERVER['PHP_SELF'], '/'));
    exit;
}
Finally, to manage any remaining SEO concerns, you can add this HTML tag:
<link rel="canonical" href="<?php echo $url; ?>">
Whether you own the site or not, you can standardize query order
Even if you don't control the site, you can assume that query order does not matter. To standardize this, take your query and rebuild the parameters, appending it to your normalized URL.
function getSortedQuery()
{
    // Parse the query string into an array, sort it by parameter name,
    // and rebuild it so that equivalent queries compare equal
    $url = [];
    parse_str($_SERVER['QUERY_STRING'], $url);
    ksort($url);
    return http_build_query($url);
}

$url = $_SERVER['HTTP_HOST'] . $_SERVER['PHP_SELF'] . '?' . getSortedQuery();
Another option is to grab the contents of the page and see if there is a <link rel="canonical"> tag, and use that link to log your data. This is a bit more costly as it requires a full page load. A sketch follows.
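A minimal sketch of that option using DOMDocument (returns null when no canonical link is declared):

<?php
function getCanonicalLink($url)
{
    $html = file_get_contents($url); // the full page load mentioned above
    if ($html === false) {
        return null;
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from real-world markup
    foreach ($doc->getElementsByTagName('link') as $link) {
        if (strtolower($link->getAttribute('rel')) === 'canonical') {
            return $link->getAttribute('href');
        }
    }
    return null;
}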
To repeat: do make sure you honor 301 redirects, as they are directives, not suggestions, about the end-result URL.
One final suggestion
I might recommend using two columns, one being "canonical_url" and another being "effective_url". Sometimes a URL works and then later becomes a 301 redirect. This is just my take but I would like to know these things.
All of the answers have great information. Assuming you are using an Apache-like server, for the URL bit I would use .htaccess (or, preferably, if you can change it, the equivalent Apache server config file) to do the rewrites. For a simple example:
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteRule (.*) http://example.com/$1 [R=Permanent]
In this example, the "R=Permanent" DOES do a redirect. This is usually not a big issue because, a) it tells the browser to remember the redirect, and b) your internal links are presumably relative, so the protocol (http or https) and the server (example.com or whatever) are preserved. So generally the redirect happens once per session or less - time well spent, IMO, to avoid doing all this in PHP.
I guess you could use it to rewrite the order of the query bits as well, though when the query bits are significant, I tend to (not recommending you do, just sayin') add them to my path (e.g. rewrite ".../blah/atom" to ".../blah.php?feed=atom"). At any rate, there are loads of rewrite tricks available, and I recommend you read about them in the Apache mod_rewrite documentation.
If you do go this route, be sure to carefully think through what you want to happen - once you start mucking with the URLs, you are usually stuck with your decisions for a long while.
As several have pointed out, while the URLs you show may currently point to the same content, there is no way to tell if they will in the future. A change in either protocol or hostname can get you different sets of content, even example.com vs. www.example.com, even if served up by the same machine at the same IP. Not common, but it can happen...
So if I wanted to maintain a list of URLs, I would store the protocol, the hostname, the directory path, the filename if present (aka "whatever came after the last slash before a question mark"), and a set of key/value pairs, sorted on key, for the GET arguments.
And then don't forget that you can go to https://www.google.com and not have anything BUT the protocol and hostname...
Avoid passing the parameters in the URL. Pass your parameters to the web page using JSON. A sketch follows.
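A minimal sketch of that approach (the parameter names are illustrative; the client would send the parameters as a JSON POST body instead of putting them in the query string):

<?php
// e.g. the client POSTs {"u": 10, "n": "tom"} and nothing appears in the URL
$body = file_get_contents('php://input');
$params = json_decode($body, true);
$userId = isset($params['u']) ? (int) $params['u'] : null;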

Using a QR code securely

All,
I'm going to use a QR code from the following URL:
http://qrcode.kaywa.com/
I want to use the URL option so when someone scans it they are sent to the URL that I specified on the code. I want to have something like the following URL:
http://www.website.com/web-page/?type=uplights&action=checkout
Based on the variables in the URL I want to allow my user to insert some data.
Is there a way to secure this so that I know a user got to this URL by scanning the QR code instead of just typing that information into the URL?
Thanks!
Short Answer: Not directly.
QR codes were not designed to keep the content stored within them secret. Someone could use a QR reader to scan your URL, store it, and keep using it over and over again without actually scanning it again.
One way we used to circumvent this issue was to encrypt our URL so that our own application (based on ZXing) would be the only one capable of reading our QR code. It then sends the actual request with a nonce over a secure channel, so that a replay attack would also be rendered useless (in case someone was sniffing outbound connections). All other readers see the encrypted URL, which isn't of any use to them.
Other than that, there isn't another way of ensuring the user actually does scan your QR and doesn't type it out/paste it in.
The way we implemented this:
We stored the URL as http://www.website.com/app.php?<encrypted_string>. If someone read our URL with a different QR decoder, they would be taken to our app.php page, which urged them to read the QR using our application.
Our app itself, on encountering that URL, stripped off the encrypted query string, decrypted it, and formed its own request to the right page. In PHP, you could execute that request at the server end itself, so it is never visible to the user. You could use mcrypt as detailed here for encryption; a rough sketch follows.
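A rough sketch of the encrypted-URL idea, using openssl_encrypt as a modern stand-in for the mcrypt approach described above (key handling is simplified here; in practice the key would live in secure config shared only with your own app):

<?php
function encryptPayload($plaintext, $key)
{
    $iv = random_bytes(openssl_cipher_iv_length('aes-256-cbc'));
    $ciphertext = openssl_encrypt($plaintext, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
    // URL-safe base64 so the result can live in a query string
    return rtrim(strtr(base64_encode($iv . $ciphertext), '+/', '-_'), '=');
}

function decryptPayload($encoded, $key)
{
    $raw = base64_decode(strtr($encoded, '-_', '+/'));
    $ivLen = openssl_cipher_iv_length('aes-256-cbc');
    $iv = substr($raw, 0, $ivLen);
    return openssl_decrypt(substr($raw, $ivLen), 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
}

// The QR code would then contain something like:
$key = random_bytes(32); // illustration only; persist this securely
$qrUrl = 'http://www.website.com/app.php?q=' . encryptPayload('type=uplights&action=checkout', $key);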
You can add a secret-ish parameter to the URL and not publish the URL with that parameter. But basically, no, you still won't know if someone didn't just type in that URL. (For example, I may have used the QR code, then cut and paste the URL in an email to a friend, and that friend may have typed it in.) But you'll know that they probably didn't just type it in.
QR codes are just easily reversible encodings for text. There's no magic there. So there are things you can do to make it less likely that someone typed in the URL, but you can never be certain.

XSS Vulnerability in PHP scripts

I have been searching everywhere to try and find a solution to this. I have recently been running scans on our websites to find any vulnerabilities to XSS and SQL Injection. Some items have been brought to my attention.
Any user-inputted data is now validated and sanitized using filter_var().
My issue now is with XSS and persons manipulating the URL. The simple one which seems to be everywhere is:
http://www.domainname.com/script.php/">< script>alert('xss');< /script >
This then changes some of the $_SERVER variables and causes all of my relative paths to CSS, links, images, etc. to be invalid, so the page doesn't load correctly.
I clean any variables that are used within the script, but I am not sure how I get around removing this unwanted data in the URL.
Thanks in advance.
Addition:
This then causes a simple relative link in a template file:
<a href="anotherpage.php">Link</a>
to actually link to:
"http://www.domainname.com/script.php/">< script>alert('xss');< /script >/anotherpage.php
This then changes some of the $_SERVER variables and causes all of my relative paths to CSS, links, images, etc. to be invalid, so the page doesn't load correctly.
This sounds like you made a big mistake with your website and should rethink how you inject link information from the input into your output.
Filtering input alone does not help here; you need to escape the output as well.
Often it's easier, if your application receives a request that does not match the superset of allowed requests, to simply return a 404 error.
I am not sure how I get around removing this unwanted data in the URL.
Actually, the request has already been sent, so the URL is set. You can't "change" it. It's just the information about what was requested.
It's now your part to deal with it, not to blindly pass it along any further, e.g. into your output (where your links then break).
Edit: You now wrote more specifically what you're concerned about. I would go along with dqhendricks here: Who cares?
If you really feel uncomfortable with the fact that a user is just using her browser and enters any URL she feels free to enter, well, the technically correct response is:
400 Bad Request (ref)
And return a page with no URIs, only fully qualified URIs (absolute URIs), or a redefinition of the base URI; otherwise the browser will take the URI entered into its address bar as the base URI. See Uniform Resource Identifier (URI): Generic Syntax, RFC 3986, Section 5 (Reference Resolution).
First, if someone adds that crap to their URL, who cares if the page doesn't load images correctly? Also, if the request isn't valid, why would it load any page at all? And why are you using $_SERVER vars to get paths anyway?
Second, you should also be escaping any user-submitted database input with the appropriate method for your particular database to avoid SQL injection; filter_var() generally will not help there.
Third, XSS is simple to protect against: any user-submitted data that is to be displayed on any page needs to be escaped with htmlspecialchars(). This is easier to ensure if you use a view class that has this escaping built in.
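A minimal sketch of that escaping (the parameter name is illustrative):

<?php
// Escape user-submitted data at output time, before it reaches the HTML
$name = isset($_GET['n']) ? $_GET['n'] : '';
echo 'Hello, ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8');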
To your concern about XSS: the altered URL won't get into your page unless you blindly use the related $_SERVER variables. The fact that the relative links seem to include the injected script is browser behavior that risks only breaking your relative links. Since you are not blindly using the $_SERVER variables, you don't have to worry.
To your concern about your relative paths breaking: don't use relative paths. Reference all your resources with at least a root-of-domain path (starting with a slash, e.g. /css/style.css rather than css/style.css) and this sort of URL corruption will not break your site in the way you described.

How do I get this URL without considering the Apache settings?

Hello, I have this URL I need to get with PHP:
http://www.domain.com/forum/#forum/General-discussions-0.htm
The problem is that this is not a real URL, but a mask created by the .htaccess file.
I need to get the visible URL, not the real path of the file, because I need to compare it with some PHP variables I have.
In fact the real path will look like this:
http://domain.com/modules/boonex/forum/index.php
And in that form it is totally useless for me.
How do I get the first URL as it is?
You can't get that from http://www.domain.com/forum/#forum/General-discussions-0.htm. Everything after the fragment (#) is not even sent to the server; there is no way to retrieve it, save for a delayed update with JavaScript. All you'll get is http://www.domain.com/forum/ sent to the server, and on the onload event of your document you can possibly load something in with JavaScript.
Look into the source code; the site may not have real URLs at all. The # part is for AJAX-based navigation. It may mean that there are no real URLs on that site, and if there are, they should be extracted from the <a href="someurl"> tags, as they might be masked using JavaScript.
With file_get_contents(), for example; neither the user nor your server minds about .htaccess.
It's the server processing the request that has to direct you to the correct address.
However, PHP ignores everything after the #, so in this case you have no chance to get it without the real URL.
As @Wrikken said, there is no way to get the URL after the # fragment.

Taking a hashed URL and sending it to a new URL

For example, I'd like to have my registration, about and contact pages resolve to different content, but via hash tags:
three links, one each to the registration, contact and about pages -
www.site.com/index.php#about
www.site.com/index.php#registration
www.site.com/index.php#contact
Is there a way using Javascript or PHP to resolve these pages to the separated content?
The hash is not sent to the server, so you can only do it in JavaScript.
Check the value of location.hash.
There's no server-side way to do it. You could work with AJAX, but that would break the site for non-JavaScript users. The best way would probably be to have server-side content URLs (index.php?page=<page_id>), rewrite these locally with JavaScript (to #<page_id>), and handle the content loading with AJAX. That way you can have your hash URLs for JS-enabled devices and everybody else can still use the site.
It does, however, require a bit of redundancy, because you need to provide the same content twice: once for inclusion via AJAX and once with the proper layout and everything via PHP.
If you just want hash URLs for aesthetic reasons but don't want to rely on JS, you're out of luck. The semantics of URLs are against you: fragment IDs shouldn't really affect the content the URL refers to, merely the fragment within that content. AJAX URLs change those semantics, but there's no good reason to do that if you don't have to.
I suppose you probably have a good reason, but may I ask why you would do this? It breaks the widely understood standard of how hashes in URLs are supposed to work, and it's just begging for interoperability trouble with other clients down the road.
You can use PHP's Global $_REQUEST variables to grab the requested URL and parse out the hashtag...
