I build dynamic websites whose structure is stored hierarchically in the database (my own CMS). I am using the adjacency list model to manage these database tables (PHP and MySQL through PDO).
I detected that Google is indexing pages that it should not.
An example of a tree structure used for navigation:
home
about us
products
  productgroup 1
  productgroup 2
contact
  support
  sales
Imagine this structure in a pulldown menu with links to the pages. When I select products -> productgroup 1, I get a URL like www.domain.com/products/productgroup-1, which pulls the data from the database (based on the last URI element, productgroup-1, a slug version of the title) and shows it in my template. I do not query all elements, only the last (I should, I know).
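As a hedged sketch of the lookup described above (the table and column names "pages", "slug", "title", and "body" are assumptions, not from the question):

```php
<?php
// Sketch of a slug lookup based on the last URI segment; table and
// column names are assumptions.

// Extract the last URI segment: "/products/productgroup-1" -> "productgroup-1".
function lastSlug(string $uri): string
{
    $path = parse_url($uri, PHP_URL_PATH);
    $segments = explode('/', trim($path, '/'));
    return end($segments);
}

// Fetch the page row for a slug, or null when no such page exists.
function fetchPage(PDO $pdo, string $slug): ?array
{
    $stmt = $pdo->prepare('SELECT id, title, body FROM pages WHERE slug = ?');
    $stmt->execute([$slug]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// Usage (assumes a configured $pdo PDO connection):
// $page = fetchPage($pdo, lastSlug($_SERVER['REQUEST_URI']));
// if ($page === null) { http_response_code(404); exit; }
```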
So far so good. Google is indexing this page as expected:
http://www.domain.com/products/productgroup-1
But... when I use Google Webmaster Tools, I see a lot of indexed pages returning 404s, like:
http://www.domain.com/products
http://www.domain.com/contact
And so forth.
These pages are empty and have no link in the navigation structure.
I have designed my structure so that these pages return a 404 error. Webmaster Tools confirms this, but Google keeps these pages indexed. I know I can use robots.txt to disallow Google's search bot and keep it from indexing URLs. Is there another way to do this? Should I return a 403 instead of a 404?
I am in the dark here.
You should do a few things:
Use a 301 permanent redirect to point these empty pages to a relevant page:
Even if Google does not crawl http://www.domain.com/products, some people may still access this link by removing the last segment of the URL in the browser. You probably don't want to show them a 404, but some relevant information instead.
For example, you can redirect both http://www.domain.com/products and http://www.domain.com/products/ to http://www.domain.com/products/productgroup-1.
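A hedged sketch of a code-level 301 redirect (the path map is an assumption based on the example URLs in this thread):

```php
<?php
// Map known "empty" parent URLs to a relevant child page; the entries
// here are assumptions based on the example structure in the question.
function redirectTarget(string $path): ?string
{
    $map = ['/products' => '/products/productgroup-1'];
    return $map[rtrim($path, '/')] ?? null; // "/products/" matches too
}

$path = parse_url($_SERVER['REQUEST_URI'] ?? '/', PHP_URL_PATH);
$target = redirectTarget($path);
if ($target !== null) {
    header('Location: ' . $target, true, 301); // 301 = moved permanently
    exit;
}
```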
Learn more about 301 redirection from Moz
It is possible to use mod_rewrite to do 301 redirects instead of doing it at the code level.
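A sketch of the mod_rewrite variant, using the example URLs from the question (assumes mod_rewrite is enabled and the rules live in the site's .htaccess):

```apache
RewriteEngine On
# Permanently redirect the empty parent URL to a relevant child page.
RewriteRule ^products/?$ /products/productgroup-1 [R=301,L]
```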
Submit a sitemap to Google Webmaster Tools.
This is a definitive list of the URLs on your site.
Having a sitemap will not remove the 404 URLs already indexed by Google, but it will inform Google of all the "official" URLs on your site and the intended crawl frequency.
Read more from Google Webmaster Tools here.
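For illustration, a minimal sitemap for the structure in the question might look like this (the URLs and crawl frequency are examples, not prescriptions):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.domain.com/products/productgroup-1</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.domain.com/products/productgroup-2</loc>
    <changefreq>weekly</changefreq>
  </url>
</urlset>
```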
Check your HTML code for references to "/products" or "/contact"; Googlebot would not be crawling these URLs otherwise.
A 301 redirect is the best option for the pages you don't want; you can also list those pages in your robots.txt file.
You can manually create PHP pages in your directory.
e.g.
Index.php
About.php
Contact.php
But with PHP frameworks like Laravel, the pages do not exist as files; they are stored in the database and retrieved when the user visits the page.
e.g.
If a person visits http://mywebsite.com/contact, the framework will look in the database for a page named 'contact' and then output it to the user.
But how does Google (or other search engines) find those pages if they only exist in the database?
Google can index these fine, as they are generated "server-side". Files do not need to exist on disk for Google to be able to index them; the content just has to exist at the server-side level.
Where Google has issues indexing is when your site is "client-side" based and uses something like AJAX to pull the content into the browser. A search engine spider that can't execute JavaScript will never find that content. However, Google has defined some guidelines for getting this content indexed in its Webmaster Guide.
You have a public website address, www.domain.com, and that is real. Once Google comes to know that there is a website named www.domain.com, it will visit the site. Once the crawler is on your website, it will look for the links available on the home page of www.domain.com, and those pages will be crawled in turn. It's that simple.
In Laravel, pages DO NOT exist in the database, although they might be dynamically generated.
As @expodax pointed out,
Google will index LINKS for your web app, and links (URIs) are generated in accordance with your routes.php file (found in app/Http/routes.php).
In essence, Google will index the links / URIs available to the end user; it DOES NOT depend upon how you've organized the files in your web app.
For detailed documentation about Routes in Laravel (how they can be generated or used) please check this.
http://laravel.com/docs/5.0/routing
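As a sketch of the idea (Laravel 5.0 syntax; the controller, model, and view names are assumptions, not from the original post), a routes.php file serving database-backed pages might look like:

```php
// app/Http/routes.php (Laravel 5.0); names below are assumptions.
Route::get('/', 'PageController@home');

// Catch-all: look the slug up in the database, 404 when it is missing.
Route::get('/{slug}', function ($slug) {
    $page = \App\Page::where('slug', $slug)->firstOrFail();
    return view('page', ['page' => $page]);
});
```

Google indexes whatever URIs these routes expose, not the files on disk.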
A sitemap is a file where you can list the web pages of your site to tell Google and other search engines about the organization of your site's content. Search engine web crawlers like Googlebot read this file to crawl your site more intelligently. (more info)
If you want to generate a sitemap for your Laravel application, you can do it manually or use a package like this: https://github.com/RoumenDamianoff/laravel-sitemap
Do search engine robots crawl my dynamically generated URLs? By this I mean HTML pages generated by PHP based on GET variables in the URL. The links look like this:
http://www.mywebsite.com/view.php?name=something
http://www.mywebsite.com/view.php?name=somethingelse
I have tried crawling my website with a test crawler found here: http://robhammond.co/tools/seo-crawler but it only visits my view page once, with just one variable in the header.
Most of the content on my website is generated by these GET variables from the database so I would really like the search engines to crawl those pages.
Some search engines do, and some don't. Google for one does include dynamically generated pages: https://support.google.com/webmasters/answer/35769?hl=en
Be sure to check your robots.txt file to ensure files you do not want the crawlers to see are blocked, and that files you do want indexed are not blocked.
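As a minimal robots.txt illustration (the /private/ path is a placeholder, not from the question):

```
User-agent: *
Disallow: /private/
```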
Also, ensure that all pages you want indexed are linked via other pages, that you have a sitemap, or submit individual URLs to the search engine(s) you want to index your site.
Yes, search engines will crawl those pages, assuming they can find them. The best thing to do is simply to create links to those pages on your website, ideally reachable, or at least traversable, from the home page.
Googlebot couldn't crawl this URL because it points to a non-existent page. Generally, 404s don't harm your site's performance in search, but you can use them to help improve the user experience.
This error occurs on the following URLs.
How can I solve it?
Check and see which pages link to these URLs. Maybe your website's domain had a previous owner who had a web page, and there are some inbound links pointing to that old website. That is something you can't control; if that's the case, you should redirect these pages to your start page. Do this with your .htaccess file:
ErrorDocument 404 /index.html
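One caveat: ErrorDocument serves a custom error page but keeps the 404 status code; it is not a redirect. For a genuine 301 of a specific legacy URL, mod_alias's Redirect directive can be used (the paths here are placeholders):

```apache
# Permanently redirect one known legacy URL to the current site root.
Redirect 301 /old-page http://www.example.com/
```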
Some things to check that may produce malformed URLs and are under your control:
your paging code in search results and/or categories of products/services/content
your sitemap
I also had a similar experience. On one of my websites I had this scheme:
example.com?category=1
example.com?category=2
example.com?category=3
and in webmaster tools i was getting random strings:
example.com?category=xxcCzxvsd
In my analytics, nobody (except Googlebot) ever visited example.com?category=xxcCzxvsd. I couldn't find the source of this, so there is a strong chance the problem is on Google's side.
I am planning an informational site on php with mysql.
I have read about google sitemap and webmaster tools.
What I did not understand is whether Google will be able to index the dynamic pages of my site using any of these tools.
For example if i have URLs like www.domain.com/articles.php?articleid=103
Obviously this page will always have the same title and the same meta information, but the content will change according to articleid. So how will Google come to know about the article on the page in order to display it in search results?
Is there some way that I can get Google rankings for these pages?
A URL is a URL; Google doesn't give up when it sees a question mark in one (although excessive parameters may get ignored, you only have one). All you need is a link to the page.
You could alternatively make the URL SEO-friendly with mod_rewrite: www.domain.com/articles/103
RewriteRule ^articles/(.*)$ articles.php?articleid=$1 [L]
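A sketch of the PHP side behind a rewrite rule like that (the table and column names are assumptions; validating the id keeps malformed URLs out of the query and lets you serve a clean 404):

```php
<?php
// Validate the raw articleid from the URL; null means "not a valid id".
function articleId(string $raw): ?int
{
    $id = (int) $raw;
    return $id > 0 ? $id : null;
}

// Fetch the article row, or null when it does not exist.
// Table and column names are assumptions.
function loadArticle(PDO $pdo, int $id): ?array
{
    $stmt = $pdo->prepare('SELECT id, title, body FROM articles WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// Usage (assumes a configured $pdo connection):
// $id = articleId($_GET['articleid'] ?? '');
// $article = $id !== null ? loadArticle($pdo, $id) : null;
// if ($article === null) { http_response_code(404); exit; }
```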
I do suggest you give each individual page relevant meta tags of no more than 80 characters, and don't place the article content within a table tag, as Google's placement algorithm is strict; random unrelated links will also harm the rank.
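A hedged sketch of deriving a unique meta description per article, so pages don't share identical meta information (the 80-character cap follows the suggestion above; the helper name and column names are assumptions):

```php
<?php
// Build a per-article meta description from the article body.
function metaDescription(string $body, int $max = 80): string
{
    // Strip markup and collapse whitespace before truncating.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($body)));
    if (strlen($text) <= $max) {
        return $text;
    }
    return substr($text, 0, $max - 3) . '...';
}

// Usage when rendering the page head (column names are assumptions):
// echo '<title>' . htmlspecialchars($article['title']) . '</title>';
// echo '<meta name="description" content="'
//     . htmlspecialchars(metaDescription($article['body'])) . '">';
```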
You have to link to a page for Google to notice it. And the more links you have, the higher your page will appear in Google's results. A smart thing to do is to create a page that links to all of your pages; this way Google will find them and rank them higher than if you linked to them only once.
I have a classifieds website.
It has an index.html, which consists of a form. This is the form users use to search for classifieds. The results of the search are displayed in an iframe in index.html, so the page won't reload or anything. However, the action of the form is a PHP page, which does the work of fetching the classifieds, etc.
Very simple.
My problem is, that google hasn't indexed any of the search results yet.
Must the links be on the same page as index.html for google to index the Search Results? (because it is currently displayed in an iframe)
Or is it because the content is dynamic?
I have a working sitemap with all the URLs to the classifieds in it, but they are still not indexed.
I also have this robots.txt:
Disallow: /bincgi/
The PHP code is inside the /bincgi/ folder; could this be the reason why it isn't being indexed?
I have used rewrite to rewrite the URLS of the classifieds to
/annons/classified_title_here
And that is how the sitemap is made up, using the rewritten urls.
Any ideas why this isn't working?
Thanks
If you need more input let me know.
If the content is entirely dynamic and there is no other way to get to that content except by submitting the form, then Google is likely not indexing the results because of that. Like I mentioned in a comment elsewhere, Google did some experimental form submission on large sites in 2008, but I really have no idea if they expanded on that.
However, if you have a valid and accessible Google Sitemap, Google should index your classifieds fine. I suggest to use the Google Webmaster Tools to find out how Google treats your site and to diagnose any potential problems with crawling.
eBay is probably a bad example to use, as it's not impossible that Google applies custom rules for such a popular site.
Although it is worth noting that eBay has text links to categories and subcategories of auction types, so it is possible to find auction items without actually filling in a form.
Personally, I'd get rid of the iframe, it's not unreasonable when submitting a form to load a new page.
That question is not answerable with the information given; there are too many open questions about the details. It would help if you posted your site's domain and the URLs that you want to get indexed.
Depending on how you use GWT, it can produce unindexable content.
Switch every parameter to GET.
Make HTML links to those search queries on pages already known by Googlebot.
Then they'll be indexed.
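For instance, plain HTML links to the search results (using the rewritten URL scheme mentioned in the question; the titles and paths are placeholders) give Googlebot something to follow:

```html
<!-- Crawlable alternatives to the search form -->
<ul>
  <li><a href="/annons/classified-title-one">Classified title one</a></li>
  <li><a href="/annons/classified-title-two">Classified title two</a></li>
</ul>
```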