SEO questions before launching a new website - PHP

I'm launching a big database-driven website (1.5+ million records) and I want to know some SEO tips before going live.
Which links do I need to tag as rel="nofollow", rel="me", etc.?
How do I prevent search engines from following links that are meant for users only, like 'login', 'post message', 'search', etc.?
Do I need to prevent search engines from entering the 'search' section of the site? How do I prevent it?
The site is basically a database of movies and actors. How do I create a good sitemap?
Do I need to prevent search engines from reading user comments and reviews?
Is any other robots.txt or .htaccess configuration needed?
How do I use noindex the right way?
Additional tips?
Thanks!

If you just have internal links, there's no reason to make them nofollow.
Make them buttons on forms with method="post" (that's the correct way to do it anyway; see the sketch below these answers).
I don't think you need to do that.
Perhaps see how IMDb does it? I'd consider just listing all actors and all movies in some sort of sensible manner.
Why would you need to do that?
That depends on whether you want to block something (via robots.txt) or whether you need .htaccess for something else.
No idea.
Remember to use semantic HTML: use h1 elements for page titles and so on.
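As a sketch of the method="post" point above (the login action and field names are assumptions; adjust to your app):
<form action="/login.php" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
  <button type="submit">Log in</button>
</form>
Well-behaved crawlers don't submit forms, so actions exposed this way stay out of the index.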

Use nofollow when you don't want your link to a page to give it additional weight in Google's PageRank. So, for example, you'd use it on links to user homepages in comments or signatures. Use rel="me" when you are linking to your other "identities", e.g. your Facebook page or your Myspace account.
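For example (the URLs are placeholders):
<a href="http://users.example.com/joe" rel="nofollow">joe's homepage</a>
<a href="http://www.facebook.com/yourname" rel="me">me on Facebook</a>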
robots.txt allows you to give webcrawlers a set of rules on what they can or can't crawl and how to crawl it. nofollow also tells Google not to follow a link or pass weight through it. Additionally, if you have application queries that are non-idempotent (cannot be safely called multiple times), they should be POST requests; these include things like news/message/page deletions.
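A minimal robots.txt sketch that blocks the user-only areas mentioned in the question (the paths are assumptions; match them to your actual URL scheme):
User-agent: *
Disallow: /login
Disallow: /search
Disallow: /post-message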
Unless your searches are incredibly database-intensive (perhaps they should be cached) then you probably don't need to worry about this.
Google is intelligent enough to figure out a sitemap that you've created for your users. And that's the way you ought to be thinking instead of SEO, e.g. "how can I make my site more usable/accessible/user-friendly?", all of which will indirectly optimize your site for search engines. But if you want to go the distance, there are semantic sitemap technologies you can use, like RDF or XML sitemaps. Google Webmaster Tools also lets you submit and test sitemaps.
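As a sketch, an XML sitemap for the movie database could be generated with a few lines of PHP (the table, column names, and domain here are hypothetical; $db is an assumed PDO connection):
<?php
// Emit an XML sitemap from a hypothetical movies table
header('Content-Type: application/xml; charset=utf-8');
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
foreach ($db->query('SELECT slug, updated_at FROM movies') as $movie) {
    echo "  <url>\n";
    echo '    <loc>http://example.com/movies/' . htmlspecialchars($movie['slug']) . "</loc>\n";
    echo '    <lastmod>' . date('Y-m-d', strtotime($movie['updated_at'])) . "</lastmod>\n";
    echo "  </url>\n";
}
echo '</urlset>';
Note that a single sitemap file is limited to 50,000 URLs, so a site with 1.5 million records will need several files tied together by a sitemap index.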
No, why would you want to hide content from the search engine? Probably 90% of StackOverflow's search engine referrals are from user-generated content.
What? Configure your web server for people, not search engines.
This is easy to find the answer to.
Don't make your site spammy, such as overloading it with banners or using popup ads; use semantic markup (h1, h2, p, etc.); use good spelling and grammar; use REST-style URLs (even if it's not a RESTful application); use slugs to hide ugly URI-encoding; observe accessibility standards and guidelines; and, most importantly, make your site useful, to encourage return visits and backlinks. That is the most surefire way of attaining good search ranking.
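For instance, a minimal sketch of a slug helper (real-world versions usually also transliterate accented characters and handle collisions):
<?php
// Turn "Raging Bull (1980)" into "raging-bull-1980"
function slugify($title) {
    $slug = strtolower($title);
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug); // collapse anything non-alphanumeric
    return trim($slug, '-');
}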

Related

How to show HTML pages instead of Flash to search engines

Let's say I have a plain HTML website. More than 80% of my visitors are usually from search engines like Google, Yahoo, etc. What I want to do is to make my whole website in Flash.
However, search engines can't read information from Flash or JavaScript. That means my web page would lose more than half of the visitors.
So how do I show HTML pages instead of Flash to the search engines?
Note: you could reach a specific page/category/etc. in Flash by using PHP's $_GET; for example, you can surf through all the web pages from the homepage and link to a specific web page by typing page?id=1234.
Short answer: don't make your whole site in Flash.
Longer answer: If you show humans one view and the googlebot another, you are potentially guilty of "cloaking". If the Google Gods find you guilty, you will be banned to the Supplemental Index, never to be heard from again.
Also, doing an entire site in Flash breaks the basic contract of the web, namely that you can link to specific content from other sites or in emails. If your site has just one URL and everything else is handled inside of Flash ... well, I don't know what you have, but it isn't a website anymore. Adobe may like you, but many people will not. Oh, and Flash is very unfriendly to people with handicaps.
I recommend using Flash where it is needed (videos, animations, etc.), but make it part of an honest-to-God website.
"What I want to do is to make my whole website in Flash."
"So how to accomplish this: show HTML pages instead of Flash?"
These two seem a bit contradictory.
It's important to understand the reasoning behind choosing Flash to build your entire website.
"More than 80 percent of my visitors are usually from search engines."
You did some analysis, but did you look at how many visitors access your website via a mobile device? Because apart from SEO, Flash won't work on the majority of those devices.
Have you considered HTML5 as an alternative for anything you want to do with Flash?
Facebook requires you to build applications in Flash (among other technologies) rather than plain HTML. Why? I don't know, but that is their policy and there has to be a reason.
I have recently been developing simple social applications in Flash (*.swf). My latest app is a website in Flash that will display in a tab of my company's Facebook page; at the same time, I also want to use that website as a regular webpage on the internet for my company. The only way I could find to display HTML text within a Flash file is to change the text properties, wherever possible, under CHARACTER to "Render text as HTML" (look for the "<>" symbol). I think that way the search engines will be able to read your content and process your website accordingly. Good luck.
As you say, you can reach a specific Flash page via a GET variable such as a page ID, which is good. I assume you will embed the Flash in each HTML page. Besides this, you can include all the other HTML content in hidden form, so that crawlers can reach the content while your site still displays in Flash.
Since no one actually gave you a straight answer (probably because your question is absolutely face-palm-esque), I'll try:
Consider using the web-development approach called progressive enhancement. Now, it's fair to say that it probably wasn't intended for the Flashification of a website, but you can make use of its principles.
Start with your standard HTML version of your website
Introduce swfobject to dynamically (important bit) swap out the HTML content for its Flash equivalent
Introduce swfaddress to allow for deep linking into your Flash movies (pseudo-URLs)
Granted, steps 2 and 3 are a little more advanced than how I've described them, and your site's size/structure/design may not suit this approach, but at least it's an answer.
All that being said, I agree with the other answers/comments about using Flash to display your entire site - there are very, very few reasons anyone would do that, and there are even more reasons than already given not to (iOS devices, etc.)...

Do search engines read/apply GET variables?

Let's say we have index.php with the following links:
<a href="index.php?page=home">Home</a>
<a href="index.php?page=contact">Contact</a>
followed by this dynamic content...
<div id="content">
<?php
// basename() keeps the include inside ./content/ (avoids directory traversal)
include "./content/" . basename($_GET['page']) . ".php";
?>
</div>
I am in the process of creating my own lightweight CMS and I'm wondering if search engines will crawl through these links with the GET variables and pull/index the content. I also plan on controlling my meta content in a similar fashion.
Do search engines read/apply GET variables?
They surely will. Otherwise they'd miss most of the dynamic content on the web that doesn't use nice URLs ;)
Search engines will scan a webpage for hyperlinks and store any unique locations they come across. index.php is a different location than index.php?page=home, which is different again from index.php?page=contact.
Unless of course you've told the search engines not to scan that page with a robots.txt file.
Back in the early days of search engines, the answer was no. Nowadays search engines are smarter and are for the most part able to differentiate pages even with the same root pagename.
However, it will definitely be better to use a RESTful application design, and that will entail using mod_rewrite or some other technique to make your URLs more transparent. Given that you're in the planning stages of creating the CMS, I would definitely read up on how to implement REST in your program, avoiding the problem entirely.
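For example, a minimal mod_rewrite sketch in .htaccess for the question's index.php?page=... scheme (the /page/ URL prefix is an arbitrary choice):
# Map the clean URL /page/about onto index.php?page=about
RewriteEngine On
RewriteRule ^page/([a-z0-9-]+)/?$ index.php?page=$1 [L,QSA]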

How to make website more SEO friendly when the content is coming from the database? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 6 years ago.
How can I make my database content searchable by search engines? Basically, how do I make a website more SEO-friendly when the data is not static but comes from a database?
It doesn't matter whether the content is loaded from a database or a static file, as long as it's being loaded server-side (i.e. by PHP) rather than client-side (i.e. by JavaScript). Crawlers see no difference, so the same guidelines apply.
FRKT is correct that the search engines don't know where content is coming from.
Meta tags, while still somewhat important, don't have the same effect they used to. Include them, but don't consider them the be-all, end-all of how to get higher in SEO.
Start by making sure that the pages you generate are W3C compliant. Once a page is working, put it into the W3C validator at http://validator.w3.org/ and make it 100% correct. A search engine can't make sense of a page whose markup is poorly structured.
Now comes the tough part: the other stuff. Nobody REALLY knows everything that the Googles of the world look for, but we've all got pretty good ideas. For example, you'll rank higher if your domain has "aged", or been out on the web for a while; that makes sense, since you're not a fly-by-night operation if your URL has been in operation for months. Keep your content fresh, use proper markup (such as titles in h1 tags and content in p tags), and ensure that you're not "hiding" your content in images without alt text or burying important text in Flash.
Google and Bing provide "webmaster tools" that you can register your site with and use to analyze what the crawler sees, taking some of the guesswork out of it. See https://www.google.com/webmasters/tools/ and http://www.bing.com/webmaster. Don't miss this free opportunity to make things better.
Good luck. Building a strong SEO site with a CMS is not difficult at all if you take your time and think through your actions.
You need to provide the correct meta tags on your web pages, such as the keywords tag, so that search engine crawlers can determine that the contents of your pages are relevant.
If your content is coming from the database and you cannot change it, then perhaps you could write a web control that determines the most popular words in your content and presents them automatically within the keywords meta tag.
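A rough sketch of that idea (whether the keywords meta tag helps at all is debatable, as noted above; $content is assumed to hold the page's body text):
<?php
// Pick the most frequent longer words in the page body for a keywords meta tag
function keywords_from_content($content, $limit = 10) {
    $words = str_word_count(strtolower(strip_tags($content)), 1);
    $words = array_filter($words, function ($w) { return strlen($w) > 3; }); // skip short words
    $freq = array_count_values($words);
    arsort($freq);
    return implode(', ', array_slice(array_keys($freq), 0, $limit));
}
echo '<meta name="keywords" content="' . htmlspecialchars(keywords_from_content($content)) . '">';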
Provide links to it.
You can't rely on a form where end users specify what they want to retrieve, because a search engine won't fill in the form (and therefore won't retrieve the data).
Instead you need to serve pages that include hyperlinks to the various data.
Most search engines provide a way of specifying sitemaps that essentially tell them how to access certain pages that can't be found through normal crawling. For example, pages accessed through JavaScript or form submissions that generate a URL (method=GET).
Search engines index pages, not databases. Your pages can be dynamic, crawlers come back often enough to update the indexed content and incorporate any new content. You don't have to provide a URL for all pages, just the first page in a series. The search engine will find and follow any pagination links, and index the subsequent pages.
In addition to the other comments, use search engine friendly URLs. This will require you to rewrite your URLs.
Some links:
http://www.seoconsultants.com/articles/1000/urls
http://articles.sitepoint.com/article/search-engine-friendly-urls
http://www.evolt.org/article/Search_Engine_Friendly_URLs_with_PHP_and_Apache/17/15049/index.html
The basic idea is that a search engine can do more with a URL in the format:
http://mysite.com/cars/toyota/tacoma
Than it can with a URL in the format:
http://mysite.com/item.php?mid=123&modid=456

PHP library for keeping your site indexed by Google, Bing, etc.

I need some library that can keep my URLs indexed and described. I want to say to it something like:
Index this new URL "www.bla-bla.com/new_url" with these keywords
or something like that. And I want to be sure that if I tell my library about my new URL, Google and the others will find it as soon as possible, and people will be able to find the URL on the web.
Do you know of any such libraries?
I don't know of any libraries that will achieve this, but I think you need to do some reading on Search Engine Optimisation. From my understanding (and please correct me if I am wrong), when the Googlebot comes to your website to index it, it will check for a file called sitemap.xml. In this file you define entries as follows:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.myhost.com/mypage.html</loc>
    <lastmod>YYYY-MM-DD</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.00</priority>
  </url>
</urlset>
As far as I know, you cannot specify particular keywords for a particular page. The use of META tags can, to some (arguable) extent, influence this. The main influence will be the actual content of the page.
I would recommend using Google's "Webmaster Tools", which will give you feedback and errors about the indexing of your site. You can add your site to Google and join a queue for indexing.
There are several automated sitemap generators; I have no experience with them, so I can't comment on them.
There is no way to (immediately and on-demand) manipulate the search results in any search engine. It will always take at least a week for your site to be indexed (maybe even longer).
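The closest you can get is telling the engines when something has changed. A minimal sketch, assuming Google's sitemap ping endpoint, a publicly reachable sitemap.xml, and allow_url_fopen enabled (the domain is a placeholder; Bing offers a similar ping URL):
<?php
// Notify Google that the sitemap has been updated
file_get_contents('http://www.google.com/ping?sitemap=' . urlencode('http://www.example.com/sitemap.xml'));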

How to build an in-site search engine with PHP?

I want to build an in-site search engine with PHP. Users must log in to see the information, so I can't use the Google or Yahoo search engine code.
I want the engine to search the text and pages, not the MySQL database tables, for now.
Has anyone ever done this? Could you give me some pointers to help me get started?
You'll need a spider that harvests pages from your site (in a cron job, for example), strips the HTML, and saves them in a database.
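A bare-bones sketch of such a spider (the pages table is hypothetical, $db is an assumed PDO connection, and REPLACE INTO is MySQL-specific and needs a UNIQUE key on url):
<?php
// Fetch a page, strip the markup, and store the plain text for searching
function index_page(PDO $db, $url) {
    $html = file_get_contents($url);
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($html)));
    $stmt = $db->prepare('REPLACE INTO pages (url, content) VALUES (?, ?)');
    $stmt->execute(array($url, $text));
}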
You might want to have a look at Sphinx (http://sphinxsearch.com/); it is a search engine that can easily be accessed from PHP scripts.
You can cheat a little bit, the way the much-hated Experts-Exchange website does. They are a for-profit programmers' Q&A site, much like Stack Overflow. In order to see answers you have to pay, but sometimes the answers come up in Google search results. It is rather clear that E-E presents one page to web crawlers and a different one to humans. You could use the same trick, then add Google Custom Search to your site. Users who are logged in would see the results; otherwise they'd be bounced to the login screen.
Do you have control over your server? Then I would recommend installing Solr/Lucene for indexing and SolPHP for interacting with PHP. That way you can have facets and other nice full-text search features.
I would not spider the actual pages; instead I would spider versions of the pages without navigation and other elements that are not content related.
Solr requires Java on the server.
I ended up using Sphider, which is a free tool, and it works well with PHP.
Thanks all.
If the content and the titles of your pages are already managed by a database, you just need to write your search engine in PHP. There are plenty of solutions for querying your database, for example:
http://www.webreference.com/programming/php/search/
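If you go that route, a minimal sketch of such a query (this assumes a pages table with a FULLTEXT index on the content column, which requires MyISAM or MySQL 5.6+ InnoDB, and a PDO handle in $db):
<?php
// Search the indexed pages for the user's query
$stmt = $db->prepare('SELECT url FROM pages WHERE MATCH(content) AGAINST (?)');
$stmt->execute(array($_GET['q']));
foreach ($stmt->fetchAll() as $row) {
    echo '<a href="' . htmlspecialchars($row['url']) . '">' . htmlspecialchars($row['url']) . '</a><br>';
}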
If the content is just contained in HTML files and not in the db, you might want to write a spider.
You may also be interested in caching the results to improve performance.
I would say that everything depends on the size and the complexity of your website/web application.
