Website localization when already indexed on search engines - php

I found lots of topics on how to localize a website, and the most common solutions are adding subdomains or creating separate subdirectories (e.g. "/en/"). However, I could not find anything addressing the risk of losing Google indexing for sites that were originally localized for only one language.
So far, Google has indexed pages like this:
http://website.com/threads/this-is-the-title/11111
If I move the localizations into separate sub-directories, the URL would become:
http://website.com/en/threads/this-is-the-title/11111
What will happen to the hundreds of pages indexed by Google? Can you help me figure out a way to localize the website without running into trouble with Google?
What I found that partially solves the problem:
Hreflang: https://support.google.com/webmasters/answer/189077?hl=en
This would work, except that I would end up with localizations on two different folder levels:
/
...files of already indexed content
/en
...files of the second language
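Hreflang annotations do not require every language to live in its own subdirectory, so the two folder levels can coexist. As a sketch, assuming the already-indexed language stays at the root and is tagged with the placeholder code `it` (the real language is not stated), a hypothetical helper `hreflangTags()` could print the `<link>` tags for each page:

```php
<?php
// Hypothetical helper: emits hreflang <link> tags for one page.
// Assumption: the original language ('it' is only a placeholder) stays at the
// root, and every other language lives under /<code>/.
function hreflangTags(string $baseUrl, string $path, string $defaultLang, array $langs): string
{
    $tags = [];
    foreach ($langs as $lang) {
        // The default language keeps its already-indexed, unprefixed URL.
        $prefix = ($lang === $defaultLang) ? '' : '/' . $lang;
        $href = htmlspecialchars($baseUrl . $prefix . $path, ENT_QUOTES);
        $tags[] = '<link rel="alternate" hreflang="' . $lang . '" href="' . $href . '" />';
    }
    // x-default tells search engines which URL to use when no language matches.
    $fallback = htmlspecialchars($baseUrl . $path, ENT_QUOTES);
    $tags[] = '<link rel="alternate" hreflang="x-default" href="' . $fallback . '" />';
    return implode("\n", $tags);
}

// Every language version of the page should print the same full set of tags.
echo hreflangTags('http://website.com', '/threads/this-is-the-title/11111', 'it', ['it', 'en']), "\n";
```

Because the root URLs stay where they are, nothing that Google has already indexed has to move.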
Update:
Current htaccess file:
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^threads/(?:([a-zA-Z0-9-_]+\/)?)([0-9]+)$ thread.php?qid=$2 [QSA,L]
...other
I turned that line into:
RewriteRule ^(?:([a-zA-Z]+\/)?)threads/(?:([a-zA-Z0-9-_]+\/)?)([0-9]+)$ $1/thread.php?qid=$3 [QSA,L]
This is not enough, since it does not redirect to a localized sub-directory.

Use .htaccess and mod_rewrite to tell Google that the URL has moved to another URI with a permanent (301) redirect,
for example:
RewriteEngine On
RewriteRule ^oldsite\.html$ /newsite.html [R=301,L]
Google follows the 301 and, over time, replaces the old URL in its index with the new one.
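The same permanent redirect can be issued from PHP. A minimal sketch, assuming the already-indexed thread URLs are being moved under a language prefix; the `/it` prefix and the `localizedTarget()` name are hypothetical:

```php
<?php
// Hypothetical helper: returns the new localized path for an old, unprefixed
// thread URL, or null when the path should be left alone.
function localizedTarget(string $path, string $prefix): ?string
{
    // Matches /threads/<optional-slug>/<id>, the shape of the indexed URLs.
    if (preg_match('#^/threads/(?:[A-Za-z0-9_-]+/)?[0-9]+$#', $path) === 1) {
        return $prefix . $path;
    }
    return null;
}

$path = parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH) ?: '/';
$target = localizedTarget($path, '/it'); // '/it' is a placeholder prefix
if ($target !== null && PHP_SAPI !== 'cli') {
    // 301 marks the move as permanent for search engines.
    header('Location: ' . $target, true, 301);
    exit;
}
```

Only one strategy applies per URL: redirect when the old URLs genuinely move, or leave them at the root and annotate them with hreflang.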

Related

Make website appear on Google regardless of PHP routing

Over the past few days, I have been working on a new website. For now, I have chosen something similar to MVC: I use PHP to route to other pages depending on a value passed via GET, with some rewrite settings in the .htaccess file.
When I search for my website on Google, I find three links: one for the website and two for different subdomains:
www.example.com
sub.example.com
one.example.com
The structure of the links on my website looks like this, because of the routing:
www.example.com/test/be
dev.example.com/that/this
This is my .htaccess in case you need it:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+)$ index.php?url=$1 [QSA,L]
RewriteBase /
All the different parts of the website are linked together with plain <a href> links, but I worry that this may not be enough to make it fully SEO-friendly.
Is this going to be a problem? Is there anything I can do about it?
Edit: what I am asking is whether this type of layout will be a problem for SEO. If so, my follow-up question is how I would go about fixing it, which would most certainly be a programming-related task (so the question is not off-topic).

Google indexing two URLs for the same page - with and without index.php

I am new to web development and am currently working on five content-based websites for a customer. The sites are built with Laravel 4 and run on shared hosting with no access to the server configuration. I am required to remove 'index.php' from all the article pages and give them clean URLs for SEO purposes. I am facing two problems, which seem similar in nature, so I am asking about both in the same question.
For addon domains: I changed the document root of the domains to Laravel's 'public' folder, then added the code below to the .htaccess file to make the URLs cleaner, without the 'index.php' part:
Options -MultiViews
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /index.php?/$1 [L]
The code works fine when URLs are typed directly into the browser's address bar. The problem is that when I search for my site on Google using site:mysite.com, some pages are listed without index.php in the URL and others with it. Worst of all, some pages appear twice: once with index.php in the URL and once without.
For example, the search results contain URLs like these:
www.mysite.com/index.php/article-1-content
www.mysite.com/article-1-content
If I am not mistaken, this results in duplicate content. Moreover, when I open an article page through the URL containing index.php, all the other URLs on that page, such as internal links and sidebar articles, also contain index.php. What should I do to remove index.php completely from the URLs and from Google's index? Please help.
Sorry for the long question, but my second problem seems to be similar:
For the primary domain: I put the whole site's code in public_html and added this code to the .htaccess file in public_html to change the document root:
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^primarydomain\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^www\.primarydomain\.com$ [NC]
RewriteCond %{REQUEST_URI} !public/
RewriteRule ^(.*)$ public/$1 [L]
I also added the code that removes the index.php part to the .htaccess file in the public folder (the same code as for the addon domains). The problem is that when I search in Google, every single article page appears twice, with URLs like these:
www.primarydomain.com/article-1-content
www.primarydomain.com/public/index.php/article-1-content
Google is also indexing the URL with the 'public/index.php' part. What should I do to completely get rid of these unclean URLs?
Thanks for bearing with such a long question :) Any help would be appreciated.
Regards.
This sounds like a job for a canonical tag.
As long as the URL loads, the canonical tag does the rest (i.e. on every variant of the page you specify which URL search engines should treat as the preferred one).
https://support.google.com/webmasters/answer/139066?hl=en
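A minimal sketch of the canonical-tag approach for this Laravel setup, assuming the preferred form is the clean path without `/public` or `/index.php`. The `canonicalPath()` helper and the `http://www.mysite.com` host are illustrative placeholders:

```php
<?php
// Hypothetical helper: maps every variant Google has indexed
// (/index.php/..., /public/index.php/..., clean path) to one clean path.
function canonicalPath(string $requestUri): string
{
    // REQUEST_URI may carry a query string; only the path matters here.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    // Strip an optional leading "/public" and an optional "/index.php" segment.
    $path = preg_replace('#^(?:/public)?(?:/index\.php)?(?=/|$)#', '', $path);
    return $path === '' ? '/' : $path;
}

$canonical = 'http://www.mysite.com' . canonicalPath($_SERVER['REQUEST_URI'] ?? '/');
// Print this inside <head> on every article page.
echo '<link rel="canonical" href="' . htmlspecialchars($canonical, ENT_QUOTES) . '" />', "\n";
```

When canonicalPath() differs from the requested path, the same value could also serve as the target of a 301 redirect, and building internal links from it avoids the index.php links described in the question.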

How to create Clean URL for every page using .Htaccess?

I'm currently in the process of creating a huge website, but instead of regular URLs I'd like to use clean, user-friendly URLs. I have been searching for how to tailor Apache mod_rewrite rules to my needs; however, I could not find any solution for my particular problem.
Below is what I'd like to achieve with the URLs (I won't write the domain name each time; just imagine http://www.example.com in front of each path).
/register/ OR /register ---> /register.php (It should support both of the variations.)
I actually have more files for the registration, and I'd like them to be accessible using "part" segments like:
/register/part1/ OR /register/part1 ---> /register.php?part=1 (It should support both of the variations.)
Also, what if I have more than one query variable? (Like "personal=1")
/register/part1/personal/ OR /register/part1/personal ---> /register.php?part=1&personal=1
And what if I have many more of these query variables, but I CAN'T list all of them in advance? Any of them can be present (like "thing", "name", "job", etc.).
/register/part1/personal/Nicky/ OR /register/part1/personal/Nicky ---> /register.php?part=1&personal=1&name=Nicky
Or any other variation you can imagine:
/register/part1/personal/thing/employee/ OR /register/part1/personal/thing/employee ---> /register.php?part=1&personal=1&thing=1&job=employee
EDIT:
This is what I've tried so far, but it just sends every page to index.php :/
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
Those are examples of what I'd basically like to achieve. I can have many other pages besides "register.php", so the solution shouldn't be specific to that page. It is also VERY important that IF someone goes to, for example, register.php?part=1, they are redirected to the appropriate clean URL (of course in PHP).
I would also like to ask what I should do on the PHP side to make everything work. I saw that WordPress has a really good solution for this, which is pretty automatic, and it looks great!
Could someone please explain how to create a good .htaccess mod_rewrite solution for this? I would be really, really glad!
Please do not mark this question as a duplicate, because I really did not find anything specific to my case.
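For the redirect from register.php?part=1 to its clean URL, a sketch could rebuild the path from the query parameters. The segment convention (part=N becomes partN, a value of 1 becomes the bare key, any other value becomes the value itself) is inferred from the examples above, and `cleanUrl()` is a hypothetical name. The redirect only fires when REQUEST_URI, which still holds the URL the visitor originally requested, points at register.php itself, so clean URLs that mod_rewrite maps onto the script do not loop:

```php
<?php
// Hypothetical helper: register + ['part' => 1, 'personal' => 1] -> /register/part1/personal
function cleanUrl(string $page, array $params): string
{
    $segments = [$page];
    foreach ($params as $key => $value) {
        if (!is_scalar($value)) {
            continue; // ignore array-style parameters such as x[]=1
        }
        if ($key === 'part') {
            $segments[] = 'part' . $value;   // part=1     -> part1
        } elseif ((string) $value === '1') {
            $segments[] = (string) $key;     // personal=1 -> personal
        } else {
            $segments[] = (string) $value;   // name=Nicky -> Nicky
        }
    }
    return '/' . implode('/', array_map('rawurlencode', $segments));
}

// Redirect only direct hits on the real script, never the rewritten clean URL.
$requestPath = parse_url($_SERVER['REQUEST_URI'] ?? '', PHP_URL_PATH);
if ($requestPath === '/register.php' && PHP_SAPI !== 'cli') {
    header('Location: ' . cleanUrl('register', $_GET), true, 301);
    exit;
}
```

Parsing in the other direction is ambiguous with this convention: the PHP side must know which words are flags (personal, thing) and which are values (Nicky, employee), so a fixed segment order or an explicit key per value is easier to decode.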
You mentioned WordPress, which has something like this:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
This internally rewrites (rather than redirects) any request that doesn't match a real file or directory to the index.php script. That script then decides which page to display by looking at $_SERVER['REQUEST_URI'], which holds a path like /register/part1 (plus any query string).
With this method it is easier to figure out which page to show, because PHP offers many ways to parse that path string and then map it to a function.
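A sketch of the index.php side, assuming a small table that maps the first path segment to a handler; `showHome()`, `showRegister()` and `dispatch()` are hypothetical names, and the remaining segments (such as part1) are handed to the handler to interpret. Trimming slashes means /register/part1 and /register/part1/ reach the same handler:

```php
<?php
// Hypothetical handlers; a real site would render templates here.
function showHome(array $args): string
{
    return 'home';
}

function showRegister(array $args): string
{
    return 'register, args: ' . implode(',', $args);
}

function dispatch(string $requestUri, array $routes): string
{
    // REQUEST_URI includes the query string, so keep only the path part.
    $path = trim((string) parse_url($requestUri, PHP_URL_PATH), '/');
    $segments = $path === '' ? [] : explode('/', $path);
    $page = $segments[0] ?? 'home';
    if (!isset($routes[$page])) {
        http_response_code(404); // unknown pages should not return 200
        return 'not found';
    }
    // Call the mapped function with the remaining segments.
    return $routes[$page](array_slice($segments, 1));
}

$routes = ['home' => 'showHome', 'register' => 'showRegister'];
echo dispatch($_SERVER['REQUEST_URI'] ?? '/', $routes), "\n";
```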
You should be able to construct clean URLs like this in your .htaccess file:
RewriteEngine On
RewriteRule ^index\.html$ /index.php?pageID=Home [L]
RewriteRule ^about-my-homepage\.html$ /index.php?pageID=About [L]
RewriteRule ^contact-us\.html$ /index.php?pageID=Contact [L]
In each rule, the first part (the pattern) is the "clean" URL you want to expose, and the second is the URL that is actually opened. Good luck!
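On the receiving end of these one-rule-per-page rewrites, index.php only has to look up pageID. A sketch with hypothetical template file names, using a whitelist so the parameter is never turned into a file path directly:

```php
<?php
// Hypothetical lookup: pageID (set by the rewrite rules) -> template file.
function templateFor(?string $pageId): ?string
{
    $pages = [
        'Home'    => 'home.php',
        'About'   => 'about.php',
        'Contact' => 'contact.php',
    ];
    // Whitelist lookup: unknown IDs yield null instead of an arbitrary file.
    return $pages[$pageId ?? 'Home'] ?? null;
}

$pageId = $_GET['pageID'] ?? null;
$template = templateFor(is_string($pageId) ? $pageId : null);
if ($template === null) {
    http_response_code(404);
    echo "Not found\n";
} else {
    // A real site would include __DIR__ . '/templates/' . $template here.
    echo "Rendering $template\n";
}
```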

How to redirect all API requests using .htaccess, while keeping asset requests intact?

TL;DR: I would like to hit the index-api.php file if api is found in the URL, but keep all other requests pointing to the site/dist directory as if it were the 'root' of the site.
So, I've spent way too many hours on this and, trust me, I've dug through all of the mod_rewrite resources. I guess I'm just not quite understanding it, so I figured I'd ask here.
What I want to do seems simple in theory. I'm building a single-page application (an Angular app) using Grunt and outputting it to the root of a WordPress install. The WordPress install simply serves an API using the WordPress JSON API plugin, so I want the root of the site to hit my Grunt directory (located at site/dist/index.html), but all requests to siteurl.com/api to hit the index.php file and proceed normally.
Keep in mind I have other assets / images located in this site/dist directory, so ideally, it would be awesome if all requests to the site root would simply use this folder as the "base" of the site (e.g. a request to siteurl.com/images/testimage.jpg pulls from site/dist/images/testimage.jpg).
I feel like I'm onto something here and am surprised I couldn't find anything that directly tackles this issue.
What I've done so far is rename WordPress's index.php to index-api.php, leaving its contents unchanged:
index-api.php:
<?php
define('WP_USE_THEMES', true);
/** Loads the WordPress Environment and Template */
require('./wordpress/wp-blog-header.php');
// phpInfo();
.htaccess:
<ifModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^api/(.*)$ index-api.php [L]
RewriteRule (.*)$ site/dist/index.html [L]
</ifModule>
I tried many other approaches from various posts, and it seems to me like this should work. The funny thing is that if I comment out the last line, RewriteRule (.*)$ site/dist/index.html [L], the API request works as expected, so I know I'm close.
Any suggestions?
I would appreciate anyone's help on this; it's been really confusing!
First, you need to make sure that requests to /index-api.php are not matched and rewritten by the second rule. RewriteCond lines only apply to the RewriteRule immediately after them, and after the first rewrite the request runs through the rules again, so the rewritten /index-api.php is caught by the catch-all rule. In the second rule you can use $1, which is replaced with whatever the first capture group matched. You also need to make sure that the second rule does not match its own output, or you'll end up with an infinite loop and an internal server error.
You can use $1 in the first rule too, as shown here:
RewriteRule ^api/(.*)$ index-api.php?url=$1 [L]
RewriteCond %{REQUEST_URI} !^/site/dist/
RewriteCond %{REQUEST_URI} !^/index-api\.php
RewriteRule (.*)$ site/dist/$1 [L]
I recommend reading the mod_rewrite documentation to get a better understanding of how to use it and what you have at your disposal when rewriting URLs.
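The reasoning about rule order and loops can be checked with a small PHP model of the corrected rules. This is a simplification, not Apache itself: it assumes the rules are re-applied to each rewritten path until none matches, that per-directory patterns see the path without its leading slash while %{REQUEST_URI} keeps it, and it uses an arbitrary cap of 10 passes in place of Apache's own recursion limit. `applyRules()` and `resolve()` are illustrative names:

```php
<?php
// Model of the answer's rules; returns the rewritten URL or null if no rule applies.
function applyRules(string $path): ?string
{
    $rel = ltrim($path, '/'); // per-directory patterns see no leading slash
    // RewriteRule ^api/(.*)$ index-api.php?url=$1 [L]
    if (preg_match('#^api/(.*)$#', $rel, $m) === 1) {
        return '/index-api.php?url=' . $m[1];
    }
    // The two RewriteCond guards, then RewriteRule (.*)$ site/dist/$1 [L]
    if (strpos($path, '/site/dist/') !== 0 && strpos($path, '/index-api.php') !== 0) {
        return '/site/dist/' . $rel;
    }
    return null;
}

function resolve(string $url): string
{
    // Each rewritten URL goes through the rules again (only its path is matched).
    for ($pass = 0; $pass < 10; $pass++) {
        $next = applyRules((string) parse_url($url, PHP_URL_PATH));
        if ($next === null) {
            return $url;
        }
        $url = $next;
    }
    throw new RuntimeException('rewrite loop');
}

foreach (['/api/posts', '/images/testimage.jpg', '/'] as $url) {
    echo $url, ' -> ', resolve($url), "\n";
}
```

Removing the two strpos() guards makes resolve() throw, which corresponds to the infinite loop the answer warns about.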

htaccess mod_rewrite on subdomain with subdirectories

I know there are TONS of questions on this website about this topic, as well as TONS of tutorials on Google. However, I have searched for a while now and cannot find one that explains my specific situation.
I am building a social application in which the whole application is stored in a directory of a subdomain of the website, like so: subdomain.example.com/network/ (network being the directory that holds all the application files).
I have my .htaccess file in the root of the subdomain, and right now it removes .php from all the files, which works throughout the entire project as it should. I have a user.php page on which each user's profile is based, and the unique identifier for each user is their username (not an 'id', as most people use). So right now the URLs look like /network/user.php?username=TylerB, which is not very pretty. I have tried many different things to make this look like /network/user/TylerB, but nothing seems to work. I don't know if it's because it's in a directory, a subdomain, or something else. Right now, /network/user/TylerB (TylerB is my username) gives me a 404 error.
Here is my current .htaccess file:
RewriteEngine On
RewriteRule ^user/([^/\.]+)/?$ user.php?username=$1 [L]
RewriteRule ^([a-z]+)/([a-z\-]+)$ /$1/$2.php [L]
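I would recommend setting the RewriteBase directive and matching on the full path, making sure to rewrite to the /network/ directory. Because the .htaccess file sits in the subdomain's root, the path the rule sees starts with network/, so a pattern beginning with ^user/ never matches.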
RewriteEngine On
RewriteBase /
RewriteRule ^network/user/([^/\.]+)/?$ /network/user.php?username=$1 [L]
RewriteRule ^([a-z]+)/([a-z\-]+)$ /$1/$2.php [L]
Using the testing tool at http://htaccess.madewithlove.be/ I get an output of:
This rule was met, the new url is http://example.com/network/user.php?username=name
The tests are stopped because the L in your RewriteRule options
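On the PHP side, user.php still reads the username from $_GET, and the application's own links should use the new clean form. A sketch with hypothetical helper names `profileUrl()` and `requestedUsername()`; the validation mirrors the rule's ([^/\.]+) class so that direct requests to user.php?username=... are checked the same way:

```php
<?php
// Hypothetical helper: builds the clean profile link used in the app's HTML.
function profileUrl(string $username): string
{
    // rawurlencode keeps unusual characters from breaking the path.
    return '/network/user/' . rawurlencode($username);
}

// Hypothetical helper: returns the requested username, or null if invalid.
function requestedUsername(array $query): ?string
{
    $name = $query['username'] ?? null;
    // Same character class as the rewrite rule; D makes $ reject a trailing newline.
    if (!is_string($name) || preg_match('#^[^/.]+$#D', $name) !== 1) {
        return null;
    }
    return $name;
}

$username = requestedUsername($_GET);
if ($username === null) {
    http_response_code(404);
    echo "User not found\n";
} else {
    echo 'Profile of ' . htmlspecialchars($username, ENT_QUOTES) . ', link: ' . profileUrl($username), "\n";
}
```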
