While building a site in PHP, I have found that a URL can carry extra information that doesn't belong, e.g.
http://www.mydomain.com/index.php/extrainformation
I've read that it ends up as part of $_SERVER['PATH_INFO'], but I need a way to stop these URLs from being served, because they are showing up in Google search results. Is this something I can prevent by adding a condition to my .htaccess file?
Any insight?
That information is technically a valid URL even if it is ignored by your web page. So if a search engine like Google finds a URL, probably through a link, that contains that extra information, and it pulls up a valid web page, they will display it in their results.
You can solve this a few ways:
Use canonical URLs to specify the proper URL without the extra information
Do a 301 redirect to the URL without the garbage information if it is appended to a URL
Return an error (HTTP 40x) that the URL is invalid
All three should keep Google from indexing pages with those kinds of URLs.
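The 301 option can be sketched in PHP roughly as follows. This is a minimal sketch, not a definitive implementation: the function name pathInfoRedirectTarget is made up for illustration, and it assumes the server populates PATH_INFO and SCRIPT_NAME in the usual way.

```php
<?php
// Hypothetical helper: returns the clean URL to redirect to when the request
// carried extra path information after the script name, or null otherwise.
function pathInfoRedirectTarget(array $server): ?string
{
    $pathInfo = $server['PATH_INFO'] ?? '';
    if ($pathInfo === '') {
        return null; // nothing extra was appended
    }

    // SCRIPT_NAME is the script path without the trailing extra part.
    $target = $server['SCRIPT_NAME'] ?? '/';
    $query  = $server['QUERY_STRING'] ?? '';

    return $query === '' ? $target : $target . '?' . $query;
}

// Usage at the top of index.php (commented out so the sketch has no side effects):
// $target = pathInfoRedirectTarget($_SERVER);
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```

Passing 301 as the third argument to header() matters here, because a bare Location header is sent as a 302 by default.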
Those look like Apache's MultiViews. Add this to your .htaccess file:
Options -MultiViews
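MultiViews mainly affects requests such as /index/extrainformation, where the file extension is left out. For a URL like /index.php/extrainformation, the trailing part is governed by Apache's AcceptPathInfo directive, so a fragment along these lines may be closer to what is needed. Treat it as a sketch: it assumes the host allows this directive in .htaccess, and the result can differ depending on how PHP is wired into Apache.

```apache
# Keep content negotiation from matching /index to index.php
Options -MultiViews

# Reject requests that have extra path segments after a real file name;
# /index.php/extrainformation should then get a 404 instead of the page
AcceptPathInfo Off
```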
I've been reading about redirection, and how it can affect (or not if done properly) SEO.
I'm moving my website's content platform from Drupal to custom-made PHP code.
On my current site, every article is reachable through two URLs, like this:
.../node/123
.../my-node-title
This is because Drupal lets you create custom-made links, so every article has a default one (node/123) and a custom-made one (/my-node-title).
My question is about what to do in order to prevent losing any SEO that each link may have.
In the new website all articles are structured like this: content.php?id=123
I've stored in the database the custom-made link of every article.
Instead of doing a 301 redirect, I'm sending every request for a link that does not exist to a redirect.php page, which processes it. There I take the string from the link, look it up in the database and redirect the user.
The process is like this:
in .htaccess file:
RewriteRule ^.*$ ./redirect.php
In redirect.php:
I grab $_SERVER['REQUEST_URI'] and, using explode(), get the last part of the link (e.g. my-node-title), look it up in the database, take the ID of the article (e.g. 123) and build the target URL in a $link variable.
Then I use header() function and do the redirect: header('Location: '.$link);
So people still click on .../my-node-title, but when the article loads, the address bar shows /content.php?id=123
I would like to hear your comments on this solution. I know that with SEO there are no fixed rules or certainty in anything, but I would like to know whether what I am doing is acceptable. Thanks!
Your SEO strategy should not only focus on the discoverability of your pages, but also take proper UX into account. Having a user follow /some-link/ and then land on /index.php?page_id=123 may disorient them.
As for saving your ranking, a 302 redirect (which is what the 'Location' header sends by default in PHP) will not affect PageRank, according to Google. I have no information on how it might adversely affect other ranking signals. You would probably do well to specify a canonical URL for all distinct links that point to the same resource.
Also, be aware that your algorithm won't work if query parameters are present. You might also want to handle optional trailing slashes properly.
Ideally, in my opinion, you would want to provide consistent URLs to the outside world, without any need for redirection. Your URL handling would then internally resolve them to their respective resources, serving the canonical URL on every page load.
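The points about query parameters and trailing slashes can be sketched as below. This is only an illustration under stated assumptions: slugFromRequestUri is a made-up name, and the database lookup that turns the slug into an article ID is left out.

```php
<?php
// Hypothetical helper: extracts the last path segment of a request URI,
// ignoring any query string and any trailing slash.
function slugFromRequestUri(string $requestUri): string
{
    // parse_url() drops the query string; it can return null or false
    // when there is no usable path component.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }

    $segments = explode('/', trim($path, '/'));

    return rawurldecode(end($segments));
}

// Usage in redirect.php (commented out so the sketch has no side effects):
// $slug = slugFromRequestUri($_SERVER['REQUEST_URI']);
// ... look up the article ID for $slug in the database, then:
// header('Location: /content.php?id=' . $id, true, 301);
// exit;
```

Passing 301 as the third argument to header() makes the redirect permanent instead of the default 302.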
I have a page where I load a JSON file and match data based on a user's search.
The caveat however, is that I want to have really clean URLs for these results without actually making a new page for them. (For the life of me I don't know what the terminology for this is)
So when a user goes to website.com/names/adrian it will just land on /names/ and load the data based on "adrian".
You can do that with Apache's RewriteRule directive (mod_rewrite):
RewriteEngine on
RewriteRule ^names/([A-Za-z0-9_-]+)/?$ names.php?name=$1 [L]
Add this to your .htaccess file.
It will pass a request for example.com/names/aName to names.php as a GET parameter, which you can read with $_GET["name"] in names.php.
By the way, you can see regex result in here: https://regexr.com/415mq
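On the PHP side, names.php could then match the parameter against the JSON data. The sketch below is hedged: findByName is a made-up helper, and it assumes the JSON file holds a list of objects that each have a "name" key, which the question does not confirm.

```php
<?php
// Hypothetical helper: returns every record whose "name" equals $name,
// compared case-insensitively. Invalid JSON yields an empty result.
function findByName(string $json, string $name): array
{
    $records = json_decode($json, true);
    if (!is_array($records)) {
        return [];
    }

    $matches = array_filter($records, function ($record) use ($name) {
        return is_array($record)
            && isset($record['name'])
            && is_string($record['name'])
            && strcasecmp($record['name'], $name) === 0;
    });

    // Re-index so the result is a plain list.
    return array_values($matches);
}

// Usage in names.php (the file name names.json is an assumption):
// $name    = $_GET['name'] ?? '';
// $results = findByName(file_get_contents(__DIR__ . '/names.json'), $name);
```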
I need to do this using htaccess
When a request is made for http://www.example.com/home, it should (internally) load the page http://www.example.com/home_st.php
Similarly, when a request is made for another link called products (http://www.example.com/products), it should (internally) load http://www.example.com/products_st.php
In short, it appends "_st.php" and loads that URL. But if the user types http://www.example.com/home_st.php or http://www.example.com/products_st.php directly into the browser, it should show a 404 / Page not found error.
I have a few other pages in that folder and I want them to behave the same way. I understand the .htaccess file should do something like this:
Turn on the rewrite
Forbid access if the URL is called with page names like home_st.php, products_st.php etc.
If it's "home" or "products", then rewrite (append?) it to home_st.php and products_st.php respectively. I have other files too which need to follow the same pattern
Note: My URL should not show the actual filename, for example home_st.php, products_st.php etc. It should only show as http://www.example.com/home, http://www.example.com/products etc.
.htaccess and regex are not things I am well acquainted with. Any help would be great. Thanks
You want to be able to rewrite URLs.
This has been covered before, but I'll say it again.
You want to use the .htaccess file and the RewriteRule directive.
# Turn on the rewriting engine
RewriteEngine On
# Handle requests for "pet-care"
RewriteRule ^pet-care/?$ pet_care_info_01_02_2008.php [NC,L]
This will make this URL: http://www.pets.com/pet_care_info_01_02_2008.php
look like this: http://www.pets.com/pet-care/
Links to more information: How to make Clean URLs
and the webpage I used as a reference for this information: https://www.addedbytes.com/articles/for-beginners/url-rewriting-for-beginners/
To get your pages set up the way you want them, I would advise that you read the two articles and try to work out what you are looking for.
If you need some specific help with doing this, ask.
Hope that helps.
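Applied to the home / products case from the question, the idea could look something like the fragment below. It is a sketch under assumptions rather than a tested configuration: it assumes the .htaccess file sits in the document root next to the *_st.php files and that mod_rewrite is enabled.

```apache
RewriteEngine On

# If the browser itself asked for something ending in _st.php, answer 404.
# THE_REQUEST holds the original request line, so internal rewrites
# done by the rule below do not trigger this.
RewriteCond %{THE_REQUEST} \s/[^\s?]*_st\.php[\s?] [NC]
RewriteRule ^ - [R=404,L]

# Internally map /home to /home_st.php, /products to /products_st.php, etc.,
# but only when the matching file actually exists.
RewriteCond %{DOCUMENT_ROOT}/$1_st.php -f
RewriteRule ^([A-Za-z0-9_-]+)/?$ $1_st.php [L]
```

Because the second rule is an internal rewrite with no R flag, the address bar keeps showing /home or /products.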
Google has picked up around 30,000 404 URLs in Google Webmaster Tools. I've been manually redirecting a lot of them and using some regex/explode() to cope with others, but I can't seem to write rules that cover them all.
When a 404 is about to occur, I would like my 404 PHP script to check my table of existing URLs for the closest match and redirect to it.
For example, if the bad URL is "http://www.example.com/category-somenonexistingpart-someactuallyexistingpart-somejibberish-1234.htm", I would like my database to return the existing URL that most resembles it: "http://www.example.com/category-someactuallyexistingpart.htm"
Can this be done? Is it a good or bad idea?
First, create a 404.php page and have .htaccess send all 404s to it (for example with an ErrorDocument 404 directive). Then, using $_SERVER['REQUEST_URI'], you can parse the requested URI, split it, and search the DB. You may notice an increased load on the DB, however.
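One way to pick the "closest" existing URL is PHP's built-in similar_text(), sketched below. The function name closestKnownPath and the 50 percent threshold are assumptions chosen for illustration, and comparing against every known URL in PHP may be too slow for a large table.

```php
<?php
// Hypothetical helper: returns the known path most similar to the requested
// one, or null when nothing reaches $minPercent similarity (0 to 100).
function closestKnownPath(string $requested, array $knownPaths, float $minPercent = 50.0): ?string
{
    $best = null;
    $bestPercent = -1.0;

    foreach ($knownPaths as $candidate) {
        // similar_text() writes the similarity percentage into $percent.
        similar_text($requested, $candidate, $percent);
        if ($percent >= $minPercent && $percent > $bestPercent) {
            $best = $candidate;
            $bestPercent = $percent;
        }
    }

    return $best;
}

// Usage in 404.php (commented out so the sketch has no side effects):
// $path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
// $match = closestKnownPath($path, $pathsFromDatabase);
// if ($match !== null) {
//     header('Location: ' . $match, true, 301);
//     exit;
// }
```

Returning null below the threshold leaves room to serve a genuine 404 when nothing resembles the request, rather than redirecting to an unrelated page.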
This is a noob question, I believe. In a content management system, as well as several other types of sites that work on submissions (once you submit a URL to a URL shortening website, for instance), how do you use PHP to redirect to the appropriate URL without a 404 and without using an .htaccess file?
Based on what I've found in simple URL shortening scripts online, an .htaccess file is always used to redirect 404s to a PHP file, which processes the URL and goes to the specific page. How do you do this without an .htaccess file?
Another example would be any blog software: once you submit a post, if you go to the specific URL, it retrieves the appropriate post without the use of an .htaccess file.
I hope I'm being clear, thanks.
You are talking about two different concepts here. One is "URL rewriting", the other is "redirection".
URL rewriting is the process of transforming one URL into another, and it may or may not involve redirection. This happens server-side, before PHP kicks in, so PHP simply receives the rewritten request. It is performed with .htaccess directives. What you obtain is usually the transformation of a clean, nested URL into a plain script URL with a query string.
For example: /blog/2010/10/30 rewritten to blog.php?year=2010&month=10&day=30
This is a beautification, in the sense that PHP responds to the second URL. You could skip the URL rewriting entirely; it exists only for the sake of search engines and URL usability.
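As a rough illustration of the /blog/2010/10/30 example, a rule along these lines would do the mapping. It is a sketch that assumes mod_rewrite is enabled and that blog.php sits in the same directory as the .htaccess file.

```apache
RewriteEngine On

# /blog/2010/10/30 is served internally by blog.php?year=2010&month=10&day=30
# QSA keeps any query string the visitor added; L stops further rule processing
RewriteRule ^blog/([0-9]{4})/([0-9]{2})/([0-9]{2})/?$ blog.php?year=$1&month=$2&day=$3 [L,QSA]
```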
All of this happens before PHP starts. PHP can then perform its own redirects, either with a call to header("Location: ..."), through JavaScript, or with an HTML meta refresh tag.
None of this involves any 404.