Google Index: Using final URL or Link - php

This may seem like a basic question (and I've searched for the answer), but I'm wondering if anyone knows whether Google indexes using the final URL displayed (in the address bar), or just the link it used to get there (or both).
A client has a Drupal 7 website with product category links in friendly URL format, e.g.
website.com/product/cat/bedroom
...which is perfectly fine for the primary Menu, but when using the filtering menu I would like those links to be "standard" PHP Queries e.g.
/?q=product/cat/bedroom/cat/bathroom
This can be achieved quite easily as Drupal (and obviously PHP) already accept the query and just requires editing of the filter block's link. However, when landing on that filtered page the URL will drop the query and revert to the friendly URL. My question is, will Google index that URL, or the link's query URL - or both?
The ultimate goal is to block Google from indexing the q parameter, meaning it won't dive deep into filtering, which has resulted in tens of thousands of useless indexes, and only have the pages linked directly to with friendly URLs.

Related

How to handle Google indexing 'pages' that not exists

I build dynamic websites where structure is hierarchically saved in the database (Own CMS). I am using the Adjacency model to manage this database tables (PHP and Mysql through PDO)
I detected that Google is indexing pages that it should not.
An example of a tree structure used for navigation:
home
about us
products
productgroup 1
productgroup 2
contact
support
sales
Imagine this structure in a pulldown menu with links to the pages. When I select products->productgroup 1 I get a url like www.domain.com/products/productgroup-1 which pulls the data from the database (based on the last uri element: productgroup-1, a slug version of the title) and shows it in my template. I do not query all elements, only the last (I should, I know).
So far so good. Google is indexing this page as expected:
http://www.domain.com/products/productgroup-1
But... When I use Google webmaster tools I see a lot of pages indexed with 404's, like:
http://www.domain.com/products
http://www.domain.com/contact
And so fort.
These pages are empty and have no link in the navigation structure.
I have designed my structure so that these pages return a 404 error. Webmastertools confirms this but keeps indexing these pages. I know I can use robots.txt to disallow Google's search bot to keep it drom indexing url's. Is there another way to do this? Should I generate a 403 instead of a 404?
I am in the dark here.
You should do a few things:
Use 301 Permanent Redirection to direct this empty pages to a relevant page:
Even if Google does not crawl http://www.domain.com/products, some people may still access this link by removing the last segment from the URL from the browser. You probably don't want to show them 404s but some relevant information.
For example, you can redirect http://www.domain.com/products AND http://www.domain.com/products/ to http://www.domain.com/products/productgroup-1
Learn more about 301 redirection from Moz
It is possible to use mod-rewrite to do 301 redirects instead of doing it at code level.
Submit a sitemap to google webmaster tools.
This is a definitive list of URLs in your site.
Having a sitemap will note remove the list of 404 URLs already indexed on Google, but will inform Google of all your "official" URLs in your site and the intended crawl frequency.
Read more from Google webmaster tools here.
Check your HTML code for references to "/products" or "/contact". Googlebot will not be crawling these URLs otherwise.
301 redirection is the best option which you dont want pages and also you can assign those pages in robots.txt page.

Strategy for permanent links (not wordpress)

I'm building a little database driven PHP CMS. I'm trying to figure the best strategy for this case scenario:
I have a URL like this:
http://www.my.com/news/cool-slug
Someone saves or share this URL (or it gets indexed by Google).
Now I realize that the slug is not quite right and change it to:
http://www.my.com/news/coolest-slug
Google and users who previously saved the URL will hit a 404 error.
Is this the best and common solution (showing the 404) or should I keep a table in my database with all the history of the generated URLs mapped to the ID of the page and redirect with a 301 header?
Will this be an unnecessary load on my system (this table can get lots of records...)?
One very common solution used by many sites (including StackOverflow as far as I can tell) is to include the ID in the URL. The slug is just here for SEO/beauty/whatever, but is not used to identify the page.
Example: http://stackoverflow.com/questions/27877901/strategy-for-permanent-links-not-wordpress
As long as you have the right ID, it doesn't matter what slug you use. The site will just detect that the slug is wrong, and generate a redirect to the right one. For example, the following URL is valid:
http://stackoverflow.com/questions/27877901/old-slug
If for some reason you do not want the ID in the URL, then either you forbid changes (many news site do that: you can notice that sometimes the slug and the title of a news article do not match), or your have to live with the occasional 404 when the slug changes. I've never seen any website having a system to keep the slug history, as it can be quite annoying (you won't be able to "reuse" a slug for example).

How to make search engines index search results on my website?

I have a classifieds website.
It has an index.html, which consists of a form. This form is the one users use to search for classifieds. The results of the search are displayed in an iframe in index.html, so the page wont reload or anything. However, the action of the form is a php-page, which does the work of fetching the classifieds etc.
Very simple.
My problem is, that google hasn't indexed any of the search results yet.
Must the links be on the same page as index.html for google to index the Search Results? (because it is currently displayed in an iframe)
Or is it because the content is dynamic?
I have a sitemap which works, with all URLS to the classifieds in the sitemap, but still not indexed.
I also have this robots.txt:
Disallow: /bincgi/
the php code is inside the /bincgi/ folder, could this be the reason why it isn't being indexed?
I have used rewrite to rewrite the URLS of the classifieds to
/annons/classified_title_here
And that is how the sitemap is made up, using the rewritten urls.
Any ideas why this isn't working?
Thanks
If you need more input let me know.
If the content is entirely dynamic and there is no other way to get to that content except by submitting the form, then Google is likely not indexing the results because of that. Like I mentioned in a comment elsewhere, Google did some experimental form submission on large sites in 2008, but I really have no idea if they expanded on that.
However, if you have a valid and accessible Google Sitemap, Google should index your classifieds fine. I suggest to use the Google Webmaster Tools to find out how Google treats your site and to diagnose any potential problems with crawling.
To use ebay is probably a bad example as its not impossible that google uses custom rules for such a popular site.
Although it is worth considering that ebay has text links to categories and sub categories of auction types, so it is possible to find auction items without actually filling in a form.
Personally, I'd get rid of the iframe, it's not unreasonable when submitting a form to load a new page.
that question is not answerable with the information given, to many open detail questions. if you post your site domain and URLs that you want to get indexed.
based on how you use GWT it can produce unindexable content.
Switch every parameters to GET
Make html links to those search queries on "known by Googlebot" webpages
and they'll be index

How to redirect a Google search result to a dynamic Web page?

I'm trying to enter a list of items into Google Base via an XML feed so that, when a user searches for one of these items and then clicks the search result link in Google Base (or plain Google), the user is directed to a dynamic Web page on my Web site. I'm assuming that the only way to specify a specific link (either static or dynamic) is through the attribute in the XML feed. Is that correct? So, for example, if my attribute is:
http://www.example.com/product1-info.html
the user will be directed to the product1-info.html page.
But if, instead of a static product page, I want to have the user redirected to a dynamic page that generates search results from my local database (on my Web site) for all products containing the keyword "product1", would I be able to do something like this?:
http://www.example.com/products.php?productID=product1
Finally, and most importantly, is there any way to specify this landing page (or any specific landing page) from a "regular" Google search? Or is it only possible via Google Base and the attribute? In other words, if I put a bunch of stuff into Google Base, if any of it shows up in a regular Google search, is there a way for me to control what parameters get passed to the landing page (and thus, what search is performed on the landing page), or is that out of my control? I hope I explained this correctly. Thanks in advance for any help.
first question: Yes, urls containing a query_string part are allowed.
http://base.google.com/support/bin/answer.py?hl=en&answer=78170 says:XML example:
<link>http://www.example.com/asp/sp.asp?cat=12&id=1030</link>
--
Let me rephrase the second question to see if I understand it correctly (might be completely on the wrong track): E.g. products.php?productID=product1 performs a db-search for the product "FooEx" and products.php?productID=product2 for "BarPlus". Now you want google to show the link .../products.php?productID=product1 but not ....?productId=product2 if someone searched for "FooEx" and google decided that your site is relevant? Then it's the same "problem" we all face with search engines: communicate what each url is relevant for. I.e. e.g. have the appropriate (and only the appropriate) keywords appear in the title/h1 element of the page, avoid linking to the same contents with different urls (e.g. product.php?x=1&productId=1 <-> product.php?productId=1&x1, different urls requesting most probably the exact same contents), submit a sitemap, and so on and on....
edit:
and you can avoid the query-string part all together by using something like mod_rewrite (e.g. the front controller for the zend framework makes use of it) or by parsing the contents of $_SERVER["PATH_INFO"] (this requires the webserver to provide that information), e.g. http://localhoast/test.php/foo/bar -> $_SERVER['PATH_INFO']=='/foo/bar'
Also take a look at the link to this thread: How to redirect a Google search result to a dynamic Web page?, it contains the title of the thread, but SO is perfectly happy with How to redirect a Google search result to a dynamic Web page?, too. The title is "only" additional data for search engines and (even more) the user.
You can do the same:
http://www.example.com/products.php/product1/FooEx <-> http://www.example.com/products.php/product2/BarPlus

How to Have Search Engines Index Database-Driven Content?

How can I make it so that content from a database is available to search engines, like google, for indexing?
Example:
Table in mysql has a field named 'Headline' which equals 'BMW M3 2005'.
My site name is 'MySite'
User enters 'BMW M3 2005 MySite' in google and the record will show up with results?
Google indexes web pages, so you will need to have a page for each of your records, this doesn't mean to say you need to create 1,000 HTML pages, but following my advice above will dynamically / easily provide a seemingly unique page for each product.
For example:
www.mydomain.com/buy/123/nice-bmw-m3-2005
You can use .htaccess to change this link to:
www.mydomain.com/product.php?id=123
Within this script you can dynamically create each page with up-to-date information by querying your database based on the product id in this case 123.
Your script will provide each record with it's own title ('Nice BMW M3 2005'), a nice friendly URL ('www.mydomain.com/buy/123/nice-bmw-m3-2006') and you can include the correct meta information too, as well as images, reviews etc etc.
Job done, and you don't have to create hundreds of static HTML pages.
For more information on .htaccess, check out this tutorial.
What ILMV is trying to explain is that you have to have HTML pages that Google and other search engines can 'crawl' in order for them to index your content.
Since your information is loading dynamically from a database, you will need to use a server-side language like PHP to dynamically load information from the database and then output that information to an HTML page.
You have any number of options for how to accomplish this specifically, ILMV's suggestion is one of the better ways though.
Basically what you need to do first is figure out how to pull the information from the database and then use PHP (or another server-side language) to output the information to an HTML page.
Then you will need to determine whether you want to use the uglier, default url style for php driven pages:
mysite.com/products.php?id=123
But this url is not very user or search engine friendly and will result in your content not being indexed very well.
Or you can use some sort of URL rewriting mechanism (mod_rewrite in a .htaccess file does this or you can look at more complex PHP oriented solutions like Zend Framework that provide what's called a Front Controller to handle mapping of all requests) to make it so that your url's look like:
mysite.com/products/123/nice-bmw-m3-2006
This is what ILMV is talking about with regard to url masking.
Using this method of dynamically loading content will allow you to develop a single page to load the information for a number of different products based on the Id thus making it seem to the various search engines as though you have a unique page for each product.

Categories