Using the URL as a pseudo GET variable? - php

I understand how to parse the URL to get data. What I don't know how to do, or rather, can't seem to search properly for it, is how to prevent redundant file creation.
Here's what I mean. Let's say we have referrer1, referrer2, and referrer3. I want the URL link for each referrer (they are each given their own) to be www.test.com/referrer1, www.test.com/referrer2, and www.test.com/referrer3. Other than pulling the refferer name from the URL, the website functions identically. Is there some way that I can do this that scales to having any arbitrary number of referrers, so that I don't have to make an identical subfolder for every single referrer?

You are looking for url rewriting for example. You could have a adress example.com/index.php?ref=ref1,2,3 etc and use url rewrite to make it look lite example.com/referrer1,2,3 etc
http://en.wikipedia.org/wiki/Rewrite_engine

Related

Convert URL into one standard format

Here are a few URLs:
http://sub.example.com/?feed=atom&hello=world
http://www.sub.example.com/?feed=atom&hello=world
http://sub.example.com/?hello=world&feed=atom
http://www.sub.example.com/?hello=world&feed=atom
http://www.sub.example.com/?hello=world&feed=atom
http://www.sub.example.com/?hello=world&feed=atom#123
As you can see, they all lead to the exact same page but the URL format is different. Here is two other basic examples:
http://example.com/hello/
http://example.com/hello
Both are the same.
I want to convert the URL into one standard format so that when I store the URL in the database, I can easily check whether if the URL string already exists in the database.
Because of the various ways of how the URL can be formatted, this can be puzzling.
What's the definitive approach to converting URL into one standard format? Maybe parse_url() route...?
Edit
As outlined in the comments, there is no definitive solution to this, but the aim is to get as close as possible with what we have without "retrieving" the page. Please read comments before posting an answer to this bounty.
After you parse_url:
Remove the www prefix from the domain name
If the path is not empty - remove the trailing slash from it
Sort query parameters alphabetically by their name - if there are any
Combine these parts in order to get a canonical URL.
I had the same issue for a reports-configuration-save functionality. In our system, users can design his own reports of sales (like JQL of Jira); for that, we use get params as conditions, and fragment identifier (after #) as layout setup, like this:
http://example.com/report.php?since=20180101&until=20180806#sort=amount&color=blue
For our system, order of GET or after # params are irrelevant as well you reach the same report configuration if set param "until" first than "since", so for us are the same request.
Considering this, subdomains are out of discussion, cause you must solve this using rewrite techniques (like mod_rewrite with 301 in Apache) or create a pool of domain exceptions to do this at software level. Also, different domains can point into different websites, so you must decide if is a good idea; in subdos "www" is very easy to figured it out, but it will toke you time in another cases.
Server side can help to get vars in query section. For example, in PHP you can use function parse_str and $_SERVER['QUERY_STRING'] to get array, and then, you will need use asort() to order it to finnaly compare if are the same request (array_diff function).
Unfortunately, server side is not an option since have no capability to get after hash (#) content, and we still without consider another problems, like scriptname included, protocols or ports:
http://www.sub.example.com/index.php?hello=world&feed=atom
https://www.sub.example.com/?hello=world&feed=atom
http://www.sub.example.com:8081/?hello=world&feed=atom
In my personal experience, the most close solution is JavaScript, for handling url, parsing query section as array, compare them and do the same with fragment identifier. If you need to use it in server side, every load page will must be followed with an ajax request sending this data to the server.
Apologies in advance for length of my answer, but it is what I had to go through in order to solve the same problems you have. Greetings!
Get protocol, domain, and port from URL
Get protocol, domain, and port from URL
How can I get query string values in JavaScript?
How can I get query string values in JavaScript?
How do I get the fragment identifier (value after hash #) from a URL?
How do I get the fragment identifier (value after hash #) from a URL?
adding the preferred <link rel="canonical" ... > tag into the HTML headers is the only reliable solution, in order to reference unique content to a single SEF URL. see Google's documentation, concerning Consolidate duplicate URLs, which possibly answers the whole question more autoritative and reliable, than I ever could.
the idea of being able to know of the canonical URL or to resolve a bunch externals URLs, without parsing those server's .htaccess rewrite-rules or the HTML headers, does not appear to be applicable (simply because one can maintain a table with URL aliases, which subsequently do not permit guessing how a HTTP request might have been re-written).
this question might belong to https://webmasters.stackexchange.com/search?q=cannonical.
Since the question is marked „PHP“ I assume you are in the backend.
There are enough answers how you can compare URLs (protocol, host, port, path, list of request params) where path is case sensitive, protocol and host are not. Changing the order of request parameters is strictly speaking also changing the URL.
My impression is that you want to differentiate by the RESOURCE which the server is serving (http://www.sub.example.com/ serves the same resource as http://sub.example.com/ or .../hello serves the same resource as .../hello/)
Which resource is served, you should perfectly know on the backend level, since you (the backend) know what you are serving. Find the perfect ID for the resource and use it.
PS: the URL is not a good identifier for that. But if you must use it, just use a sanitized version (sanitization for your purpose => sanitize to your preferred host, strip or add slashes at end of paths, drop things like /../ from path (security issue anyway), bring the request params in a certain order, whatever is right for your purpose.
Best regards, iPirat
It's the case with duplicate URLs and you can avoid these kind of duplicate URLs using a URL factory redirecting all URLs which are not proper to the proper URL.
And the same thing is explained in this article:
https://www.tinywebhut.com/remove-duplicate-urls-from-your-website-38
Any other URLs leading to the same page are 301 redirected to the proper version of the URLs.
This is the best practice of Search Engine Optimization(SEO). Here I'm going to give you a couple of examples.
You can consider the URLs of this website, for example the wrong links of this page are
https://stackoverflow.com/questions/51685850
https://stackoverflow.com/questions/51685850/convert-url-into-one-s
https://stackoverflow.com/questions/51685850/
If you go to the above wrong URLs of this page, you'll be redirected to the proper URL which is
https://stackoverflow.com/questions/51685850/convert-url-into-one-standard-format
And if you change the title of this question, all other URLs are 301 redirected to the proper URL. The idea here is the 301 redirection which tells the search engines to replace the old URL with the new one otherwise the search engines find different URLs providing the same content.
The real deal here is the id of the question, 51685850. This id is used to create the proper URL with the information from the database. With the URL factory that is created in the article in the link provided, you do not even need to store URLs in the database.
You can read more on duplicate content here:
https://moz.com/learn/seo/duplicate-content
The same rules are applied to tinywebhut.com as well, the wrong URLs are
https://www.tinywebhut.com/remove-duplicate-38
https://www.tinywebhut.com/some-text-38
https://www.tinywebhut.com/remove-duplicate-urls-from-your-website-38/
In the above URLs the ID is appended to the end of the URL which is 38 and if you go to any of these URLs, you'll be 301 redirected to the proper version of the URLs which is
https://www.tinywebhut.com/remove-duplicate-urls-from-your-website-38
I didn't make any functions to explain this here because it is already done in this article:
https://www.tinywebhut.com/remove-duplicate-urls-from-your-website-38
You can achieve the goal with a couple of really simple functions and you can apply the same idea to remove other duplicate URLs such as /about.php, /about, /about.php/, /about/ and so on. And to achieve this you just need a little more code to your existing functions.
One alternative is adding canonical tag, for example, even if you have more than one URL to go the same page, you just need to apply canonical tag and add the link to the proper URL.
<link rel="canonical" href="https://stackoverflow.com/questions/51685850/convert-url-into-one-standard-format" />
This way you are telling the search engines that the multiple URLs should be considered as one and the search engines add the link used in the canonical tag in their search results. You can read more on canonicalization here:
https://moz.com/learn/seo/canonicalization
But still the best way to get rid of duplicate content is the 301 redirect. If you have a 301 redirect like I talked at the beginning, all problems are solved without surprises.
My original answer assumes that the pages are all owned by the OP, as per the line "As you can see, they all lead to the exact same page but the URL format is different...". I am adapting the answer to handle multiple options and adding a list of assumptions you can and cannot make about URLs.
As others have pointed out there is no definitive easy answer to this if you do not know that the page(s) are the same. However, if you follow these assumptions, you should be safe standardizing some things:
CAN ASSUME
Query strings with the same values point to the same location regardless of order. Example: https://example.com/?fruit=apple&color=red is the same as https://example.com/?color=red&fruit=apple
301 redirects to a specific source can be followed. If you receive a 301 redirect response, follow the redirect and use that URL. You can safely assume that if a URL actually does point to the same page, and page rank is optimized, then you can follow it.
If there is a single <link rel="canonical"> tag in the HTML, that too can be used to cover the canonical link (see below for why).
CANNOT ASSUME
Any URL is guaranteed to be the same as any other URL, if they are different (by URL in this case I am talking about anything before the query string).
http://example.com can be different from https://example.com can be different from http://www.example.com or https://www.example.com. There is no restriction against showing a different website when putting "www" or leaving it out. That's why page rank on search engines is really damaged here.
Any two URLs, even if they currently have exactly the same content, will keep exactly the same content. An example would be https://example.com/test and https://sub.example.com/test. Both may feasibly be set to the same generic test page content. In the future, https://sub.example.com/test may be changed. You can't assume it won't be.
If you own the site
Redirect all traffic in the first part of the URL format you want: Do you want www.example.com or example.com or sub.example.com? Do you want a trailing slash or not? Redirect this first, either using server rules or PHP. This is also highly beneficial for search page rank (if that matters to you).
An example of this would be something like this:
if (!$_SERVER['HTTPS'] || 'example.com' !== $_SERVER['HTTP_HOST'] || rtrim($_SERVER['PHP_SELF'], '/') !== $_SERVER['PHP_SELF']) {
header('HTTP/1.1 301 Moved Permanently');
header('Location: '. 'https://example.com/'.rtrim($_SERVER['PHP_SELF']), '/'));
exit;
}
Finally, to manage any remaining SEO concerns, you can add this HTML tag:
`<link rel="canonical" href="<?php echo $url; ?>">`
Whether you own the site or not, you can standardize query order
Even if you don't control the site, you can assume that query order does not matter. To standardize this, take your query and rebuild the parameters, appending it to your normalized URL.
function getSortedQuery()
{
$url = [];
parse_str($_SERVER['QUERY_STRING'], $url);
ksort($url);
return http_build_query($url);
}
$url = $_SERVER['HTTP_HOST'].$_SERVER['PHP_SELF'].'?'.getSortedQuery();
Another option is to grab the contents of the page and see if there is a <link rel="canonical"> string, and use that string to log your data. This is a bit more costly as it requires a full page load.
To repeat, do make sure you grab 301 redirects as they are not suggestions, but directives, as to the end result URL.
One final suggestion
I might recommend using two columns, one being "canonical_url" and another being "effective_url". Sometimes a URL works and then later becomes a 301 redirect. This is just my take but I would like to know these things.
All of the answers have great information. Assuming you are using an Apache-like server, for the URL bit, I would use .htaccess (or, preferably, if you can change it - the equivalent server Apache config file) to do the rewrites. For a simple example:
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^www\.example\.com$
RewriteRule (.*) http://example.com/$1 [R=Permanent]
In this example, the "R=Permanent" DOES do a redirect. This is usually not a big issue as, a) it tells the browser to remember the redirect, and b) your internal links are presumably relative, so protocol (http or https) and the server (example.com or whatever) are preserved. So generally the redirect will be once per session or less - time well spent, IMO, to avoid doing all this in PHP.
I guess you could use it to rewrite the order of the query bits as well, though when the query bits are significant, I tend to (not recommending you do, just sayin') add them to my path (eg rewrite ".../blah/atom" to ".../blah.php?feed=atom"). At any rate, there are loads of rewrite tricks available, and I recommend you read about them in
Apache mod_rewrite.
If you do go this route, be sure to carefully think through what you want to happen - once you start mucking with the URL's, you are usually stuck with your decisions for a long while.
As several have pointed out, while the URLs you show may currently point to the same content, there is no way to tell if they will in the future. A change in either protocol or hostname can get you different sets of content, even example.com vs. www.example.com, even if served up by the same machine at the same IP. Not common, but it can happen...
So if I were wanting to maintain a list of URLs, I would store protocol, hostname, directory path, filename if present (aka "whatever came after the last slash before a questionmark"), and a sorted on key set of key/value pairs for the GET arguments
And then don't forget that you can go to https://www.google.com and not have anything BUT the protocol and hostname...
Avoid passing the parameters in the url. Pass your parameters to the web page using JSON.

Why long URL's and what are those extra characters for?

I was looking at a few web e-commerce website and noticed that they usually have long URL something like this (taken from an e-commerce site)
https://www.example.com/bsquare-tuxedo-solid-men-s-suit/p/itmejr4zbjt6fr3h?gclid=CjwKEAiAz4XFBRCW87vj6-28uFMSJAAHeGZbXBpy4Rw5UckmBAiPaacHLhr5MbPj4bMxVThaQe5A3xoCIi7w_wcB&pid=SUIEJR4ZAK7FZKRT&cmpid=content_suit_8965229628_gmc_pla&tgi=sem%2C1%2CG%2C11214002%2Cg%2Csearch%2C%2C95089233620%2C1o5%2C%2C%2Cc%2C%2C%2C%2C%2C%2C%2C&s_kwcid=AL!739!3!95089233620!!!g!182171694500!&ef_id=WJitiQAAAfz4lw1Y%3A20170213091758%3As
Generally a router would be
http://www.example.com/controller/method/param1/param2
Then what all codes go up in the longer URL as above and why do a longer version when a short URL can get the same job done?
You can start to read more about get method:
http://www.html-form-guide.com/php-form/php-form-get.html
To send data server side in php you will need to make a request get or post. The second link is an get but rewritten.In the end to access your data the url will be rewritten by the framework you use ( http://www.example.com/controller/method?id1=param1&id2=param2 ). If you do not use any framework you will need to rewrite yourself with .htaccess Usually the second link is used for
more common pages so will be better indexed by google. The first link is for data that is more dynamic and may have more or less parameters, null or "" values etc. And also you do not care to index that page. For exempla a search : http://www.example.com/controller/method?searchValue=&page=1
You also can read more here. It explain how and why we rewrite links: https://www.smashingmagazine.com/2011/11/introduction-to-url-rewriting/

I like to change my id number to my post title in php

I am using this function in php
http://domain.com/page.php?do=post&id=124
but i wish to change this URL to
http://domain.com/page.php/this is first post.html
How can I change this URL?
You need to have knowledge of .htaccess to accomplish this.
What you need to achieve here is URL Rewriting.
What is "URL Rewriting"?
Most dynamic sites include variables in their URLs that tell the site
what information to show the user. Typically, this gives URLs like the
following, telling the relevant script on a site to load product
number 7.
http://www.pets.com/show_a_product.php?product_id=7 The problems with
this kind of URL structure are that the URL is not at all memorable.
It's difficult to read out over the phone (you'd be surprised how many
people pass URLs this way). Search engines and users alike get no
useful information about the content of a page from that URL. You
can't tell from that URL that that page allows you to buy a Norwegian
Blue Parrot (lovely plumage). It's a fairly standard URL - the sort
you'd get by default from most CMSes. Compare that to this URL:
http://www.pets.com/products/7/ Clearly a much cleaner and shorter
URL. It's much easier to remember, and vastly easier to read out. That
said, it doesn't exactly tell anyone what it refers to. But we can do
more:
You can find more information here.
You have to perform url rewriting. Its a technique for you you need to configure your .htaccess file.
Check this for Details

How do I get this URL without considering the Apache settings?

HEllo I have this URL I need to get with PHP
http://www.domain.com/forum/#forum/General-discussions-0.htm
The problem is this is not a real URL, but this the mask created by the .htaccess.
I need to get the visible URL and not the real path of the file, because I need to compare it with some PHP variables I have.
In fact the real path will look like this:
http://domain.com/modules/boonex/forum/index.php
And in that way is totally useless for me.
How do I get the first URL as it is?
You can't get that from http://www.domain.com/forum/#forum/General-discussions-0.htm. Everything after the fragment (#) is not even send to the server, there is no way to retrieve it save for a delayed update with javascript. All you'll get it is http://www.domain.com/forum/ send to the server, and on the onload event of your document you can possibly load something in with javascript.
Look into the source code or it may not have real urls at all. The part is for ajax based navigation. It may mean that there are no real urls on that site and if there are then they should be extracted from <a href="someurl"> as they might masked using javascript.
With
file_get_contents();
for example. Neither user nor your server mind about .htaccess
It's server proccessing the request who have to direct you to correct address
however php does ignore everything after #, so in this case you have no chance to get it without real url
As #Wrikken said, there is no way to get url after # fragment

REALLY basic mod_rewrite question

I am trying to use SEO-friendly URLs for a website. The website displays information in hierarchical manner. For instance, if my website was about cars I would want the URL 'http://example.com/ford' to show all Ford models stored in my 'cars' table. Then the URL 'http://example.com/ford/explorer' would show all the Ford Explorer Models (V-6, V-8, Premium, etc).
Now for the question:
Is mod_rewrite used to rewrite a query string style URL into a semantic URL, or the other way around? In other words, when I use JavaScript to set window.location=$URL, should the URL be the query string version 'http://example.com/page.php?type=ford&model=explorer' OR do I internally use 'http://example.com/ford/explorer' which then gives me access to the query string variables?
Hopefully, someone can see my confusion with this issue. For what it's worth, I do have unique 'slugs' made for all my top level categories (obviously, the site isn't about cars).
Thanks for all the help so far. I got the rewrite working but it is affecting other paths on the site (CSS, JavaScript, images). I using the correct path structure for all these includes (ie '/images/car.png' and '/css/main.css'). Do I have to use an absolute path ('http://example.com/css/main.css') for all files? Thanks!
Generally, people who use mod_rewrite use the terminology like this:
I want mod_rewrite to rewrite A to be B.
What this means is that any request from the outside world for page A gets rewritten to file B on the server.
You want the outside world to see URLs that look like
A) http://example.com/ford/explorer
but your web server wants them to look like
B) http://example.com/page.php?type=ford&model=explorer
I would say you want to rewrite (A) to look like (B), or you want to rewrite the semantic URL into a query string URL.
Since all the links on your page are clicked on by the user and/or requested by the browser, you want them to look like (A). This includes links that javascript uses in window.location. They can and should look like (A).
once you have set up mod_rewrite then your links should point to the mod_rewritten version of the URL (in your example: http://mysite.com/ford/explorer). Internally in your system you will still reference the variables as if they are traditional query string name value pairs though.
Its also worth pointing out that Google is now starting to advocate more logical URLs from a search engine point of view, i.e. a query string over mod rewrite
Does that mean I should avoid
rewriting dynamic URLs at all? That's
our recommendation, unless your
rewrites are limited to removing
unnecessary parameters, or you are
very diligent in removing all
parameters that could cause problems.
If you transform your dynamic URL to
make it look static you should be
aware that we might not be able to
interpret the information correctly in
all cases
http://googlewebmastercentral.blogspot.com/2008/09/dynamic-urls-vs-static-urls.html
also worth looking at:
http://googlewebmastercentral.blogspot.com/2009/08/optimize-your-crawling-indexing.html

Categories