Do search engines read/apply GET variables? - PHP

Let's say we have index.php with the following links:
<a href="index.php?page=home">Home</a>
<a href="index.php?page=contact">Contact</a>
followed by this dynamic content:
<div id="content">
<?php include "./content/" . $_GET['page'] . ".php"; ?>
</div>
I am in the process of creating my own lightweight CMS and I'm wondering if search engines will crawl through these links with the GET variables and pull/index the content. I also plan on controlling my meta content in a similar fashion.
Do search engines read/apply GET variables?

They surely will. Otherwise they'd miss most of the dynamic content on the web that doesn't use nice URLs ;)

Search engines will scan a webpage for hyperlinks and store any unique locations that they come across: index.php, index.php?q=home, and index.php?q=about are all different locations.
Unless of course you've told the search engines not to scan that page with a robots.txt file.

Back in the early days of search engines, the answer was no. Nowadays search engines are smarter and are for the most part able to differentiate pages even with the same root pagename.
However, it will definitely be better to use a RESTful application design, and that will entail using mod_rewrite or some other technique to make your URLs more transparent. Given that you're in the planning stages of creating the CMS, I would definitely read up on how to implement REST in your program, avoiding the problem entirely.
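For example, here is a minimal front-controller sketch of that idea. The .htaccess rule and the page whitelist are assumptions added for illustration; the whitelist also closes the file-inclusion hole that passing $_GET['page'] straight into include() would open:

<?php
// index.php: assumes an .htaccess rewrite rule such as
//   RewriteEngine On
//   RewriteCond %{REQUEST_FILENAME} !-f
//   RewriteRule ^(.*)$ index.php?page=$1 [L,QSA]

// Whitelist the includable pages so a crafted ?page= value
// (e.g. ../../etc/passwd) cannot pull in arbitrary files.
$pages = ['home', 'contact'];

$page = $_GET['page'] ?? 'home';
if (!in_array($page, $pages, true)) {
    http_response_code(404);
    $page = 'home';
}

include "./content/" . $page . ".php";
?>

With this in place, /home and index.php?page=home serve the same content, and the clean URL is what crawlers will index.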

Related

page generator from template

I was wondering if there is a 'best' way to have a website with multiple pages where all pages share the same layout but have different content (like Facebook, Stack Overflow and many other sites)?
The way I'm currently doing this is like this:
(The content in my case is a database table name, which is then queried into content through SQL.)
template.php:
<?php
function tplate($tablename) {
    echo "<html>content from $tablename</html>";
}
?>
subcat.php:
<?php
include 'template.php';
tplate('contenttable');
?>
This works in very simple cases, but once I try to add more functionality it gets really complicated and I spend a lot of time debugging.
Does anyone know how this is generally done/best handled?
You'd typically use a templating engine (often in concert with a framework, like Laravel or Symfony). Twig is a popular one for PHP.
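For instance, a minimal Twig sketch; the template name and the variables passed in are invented for illustration:

<?php
// composer require twig/twig
require 'vendor/autoload.php';

$loader = new \Twig\Loader\FilesystemLoader('templates');
$twig   = new \Twig\Environment($loader);

// templates/page.html holds the shared layout; only the
// variables passed here change from page to page.
echo $twig->render('page.html', [
    'title'   => 'Sub category',
    'content' => 'content from contenttable',
]);

The layout lives in one template file, and each page becomes nothing more than a different set of variables, which avoids the debugging sprawl described above.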
This is what I have been doing for a while: http://controllingnetworks.tk/ The URL is rewritten using mod_rewrite; if I were to type the URL in full it would be index.php?page=home instead of /home.
All the pages have the same template with different content.
I have a few sites using databases to manage the content and a few that use PHP files to do this.

Will crawlers understand page titles that have been changed by jQuery?

In a project I'm trying to fetch data within the <body> tag, so I can't echo anything in the <title> because I haven't fetched anything yet. I want to change the title tag with jQuery after the page has loaded.
Will crawlers understand this and when they index the page will they use the title I have provided with jQuery?
Nope, search engine crawlers see what is rendered by the server.
But if you are building an Ajax website, you can read Google's guide Making AJAX Applications Crawlable.
Quoting the guide:
If you're running an AJAX application with content that you'd like to appear in search results, we have a new process that, when implemented, can help Google (and potentially other search engines) crawl and index your content.
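That said, if the goal is simply to get a real title into the server-rendered HTML, a hedged alternative is to fetch the data before emitting any markup. A sketch, where fetchArticle() is a hypothetical data-access helper:

<?php
// Fetch the data *before* any HTML is emitted, so the real title
// can be rendered server-side where crawlers will see it.
// fetchArticle() is a hypothetical data-access helper.
$article = fetchArticle((int) ($_GET['id'] ?? 0));
?>
<!DOCTYPE html>
<html>
<head>
    <title><?php echo htmlspecialchars($article['title']); ?></title>
</head>
<body>
    <?php echo $article['body']; ?>
</body>
</html>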
No, crawlers are highly unlikely to execute any of the javascript on the page. Some may inspect any javascript and make some assumptions based on that. But one should not assume that this is the case.
Google's spider can run JavaScript on pages that it processes, but I don't think there's any advice anywhere on what it can and can't do. Of course other crawlers won't be as sophisticated and will probably ignore dynamic content.
It's an interesting test actually. I'll try this on one of my sites and post back. I know Googlebot does understand some JavaScript, but I think this is more for detecting dark SEO tactics, i.e. $('.spammystuff').hide(); type things.

SEO questions before launching a new website

I'm launching a big database-driven website (1.5+ million records) and I want some SEO tips beforehand.
1. Which links do I need to tag as rel="nofollow", rel="me", etc.?
2. How do I prevent search engines from following links that are meant for users only, like 'login', 'post message', 'search', etc.?
3. Do I need to prevent search engines from entering the 'search' section of the site? How do I prevent it?
4. The site is basically a database of movies and actors. How do I create a good sitemap?
5. Do I need to prevent search engines from reading user comments and reviews?
6. Is any other robots.txt or .htaccess configuration needed?
7. How do I use noindex the right way?
8. Any additional tips?
Thanks!
1. If you just have internal links, there's no reason to make them nofollow.
2. Make them buttons on forms with method="post" (that's the correct way to do it anyway).
3. I don't think you need to do that.
4. Perhaps see how IMDb does it? I'd consider just listing all actors and all movies in some sort of a sensible manner.
5. Why would you need to do that?
6. It depends on whether you want to block something (via robots.txt) or need .htaccess for something else.
7. No idea.
8. Remember to use semantic HTML: use h1 for page titles and so on.
1. Use nofollow when you don't want your link to a page to give it additional weight in Google's PageRank. So, for example, you'd use it on links to user homepages in comments or signatures. Use rel="me" when you are linking to your other "identities", e.g. your Facebook page, your MySpace account, etc.
2. robots.txt lets you give web crawlers a set of rules for what they can or can't crawl and how to crawl it. nofollow also supposedly tells Google not to crawl a link. Additionally, any application queries that are non-idempotent (cannot be safely called multiple times) should be POST requests; these include things like news/message/page deletions.
3. Unless your searches are incredibly database-intensive (perhaps they should be cached), you probably don't need to worry about this.
4. Google is intelligent enough to figure out a sitemap that you've created for your users, and that's the way you ought to be thinking instead of SEO: how can I make my site more usable/accessible/user-friendly, all of which indirectly optimizes your site for search engines. But if you want to go the distance, there are semantic sitemap technologies you can use, like RDF or XML sitemaps (see the sketch after this list). Also, Google Webmaster Tools offers sitemap creation.
5. No; why would you want to hide content from the search engine? Probably 90% of Stack Overflow's search engine referrals are from user-generated content.
6. What? Configure your web server for people, not search engines.
7. This is easy to find the answer to.
8. Don't make your site spammy, such as overloading it with banners or using popup ads; use semantic markup (h1, h2, p, etc.); use good spelling/grammar; use REST-style URLs (even if it's not a RESTful application); use slugs to hide ugly URI encoding; observe accessibility standards and guidelines; and, most importantly, make your site useful to encourage return visits and backlinks; that is the most surefire way of attaining a good search ranking.
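On point 4, a minimal sketch of generating an XML sitemap straight from the movie database; the table and column names are assumptions:

<?php
// sitemap.php: a sketch only; the table and column names
// ("movies", "slug", "updated_at") are assumptions.
header('Content-Type: application/xml; charset=utf-8');

$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

foreach ($pdo->query('SELECT slug, updated_at FROM movies') as $row) {
    printf("  <url><loc>https://example.com/movies/%s</loc><lastmod>%s</lastmod></url>\n",
        htmlspecialchars($row['slug']),
        date('Y-m-d', strtotime($row['updated_at'])));
}

echo '</urlset>';

Note that the sitemap protocol caps each file at 50,000 URLs, so a 1.5-million-record site would need a sitemap index pointing at multiple chunked files like this one.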

How to make a website more SEO friendly when the content is coming from the database? [closed]

How can I make my database content searchable on search engines? Basically, how do I make a website more SEO friendly when the data is not static but comes from the database?
It doesn't matter whether the content is loaded from a database or a static file, as long as it's being loaded server-side (i.e. by PHP) rather than client-side (i.e. by JavaScript). Crawlers see no difference, so the same guidelines apply.
FRKT is correct that the search engines don't know where content is coming from.
Meta tags, while still somewhat important, don't have the same effect they used to. Include them, but don't consider them the be-all, end-all of how to get higher in SEO.
Start by making sure that the page you generate is W3C compliant. Once it's working, put it into the W3C validator at http://validator.w3.org/ and make it 100% correct. A search engine can't make sense of your content if the markup is poorly structured.
Now comes the tough part: the other stuff. Nobody really knows everything that the Googles of the world look for, but we've all got pretty good ideas. For example, you'll rank higher if your domain has "aged", or been out on the web for a while; that makes sense, since you're not a fly-by-night operation if your URL has been in operation for months. Keep your content fresh, use proper markup (such as titles in h1 tags and content in p tags), and ensure that you're not "hiding" your content in images without alt text or burying important text in Flash.
Google and Bing provide webmaster tools that let you verify your site and analyze what the crawler sees, taking some of the guesswork out of it. See https://www.google.com/webmasters/tools/ and http://www.bing.com/webmaster Don't miss this free opportunity to make things better.
Good luck. Building a strong SEO site with a CMS is not difficult at all if you take your time and think through your actions.
You need to provide the correct meta tags on your web pages such as the Keywords tag in order for search engine crawlers to determine that the contents on your pages are relevant.
If your content is coming from the database and you cannot change it then perhaps you could write a web control to determine the most popular words in your content and then present these automatically within the keywords meta tag.
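A rough sketch of that idea in PHP; getPageContentFromDb() is a hypothetical helper, and the word filtering is deliberately naive:

<?php
// Derive a keywords meta tag from the page's own text.
function popularWords(string $text, int $limit = 10): array {
    $words  = str_word_count(strtolower(strip_tags($text)), 1);
    $counts = array_count_values($words);
    // Ignore very short words, then sort by frequency.
    $counts = array_filter($counts, fn($w) => strlen($w) > 3, ARRAY_FILTER_USE_KEY);
    arsort($counts);
    return array_slice(array_keys($counts), 0, $limit);
}

$content = getPageContentFromDb(); // hypothetical
echo '<meta name="keywords" content="'
   . htmlspecialchars(implode(', ', popularWords($content)))
   . '">';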
Provide links to it.
You can't rely on a form in which end-users specify what they want to retrieve, because a search engine won't fill in the form (and therefore won't retrieve the data).
Instead you need to serve a page which includes hyperlinks to the various data.
Most search engines provide a way of specifying sitemaps that essentially tell them how to access certain pages that can't be found through normal crawling, for example pages accessed through JavaScript or form submissions that generate a URL (method=GET).
Search engines index pages, not databases. Your pages can be dynamic, crawlers come back often enough to update the indexed content and incorporate any new content. You don't have to provide a URL for all pages, just the first page in a series. The search engine will find and follow any pagination links, and index the subsequent pages.
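A small sketch of such a crawlable series; the /movies?p=N URL pattern, total count and page size are assumptions:

<?php
// Sketch of a paginated record index.
$total   = 1500000;
$perPage = 100;
$pages   = (int) ceil($total / $perPage);

$current = max(1, (int) ($_GET['p'] ?? 1));
// ... render the $perPage records for page $current here ...

// Plain hyperlinks: a crawler landing on page 1 can follow
// these through the whole series.
if ($current > 1) {
    printf('<a href="/movies?p=%d" rel="prev">Previous</a> ', $current - 1);
}
if ($current < $pages) {
    printf('<a href="/movies?p=%d" rel="next">Next</a>', $current + 1);
}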
In addition to the other comments, use search engine friendly URLs. This will require you to rewrite your URLs.
Some links:
http://www.seoconsultants.com/articles/1000/urls
http://articles.sitepoint.com/article/search-engine-friendly-urls
http://www.evolt.org/article/Search_Engine_Friendly_URLs_with_PHP_and_Apache/17/15049/index.html
The basic idea is that a search engine can do more with a URL in the format:
http://mysite.com/cars/toyota/tacoma
Than it can with a URL in the format:
http://mysite.com/item.php?mid=123&modid=456

How to build an in-site search engine with PHP?

I want to build an in-site search engine with PHP. Users must log in to see the information, so I can't use the Google or Yahoo search engine code.
For now I want the engine to search the text and pages, not the tables in the MySQL database.
Has anyone ever done this? Could you give me some pointers to help me get started?
You'll need a spider that harvests pages from your site (in a cron job, for example), strips the HTML and saves the text in a database.
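A bare-bones sketch of that idea, meant to run from cron; the URL list and the search_index table (with a unique key on url) are placeholders for whatever your site actually has:

<?php
$pdo  = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');
$stmt = $pdo->prepare('REPLACE INTO search_index (url, body) VALUES (?, ?)');

$urls = ['https://example.com/page1', 'https://example.com/page2'];

foreach ($urls as $url) {
    $html = file_get_contents($url);
    if ($html === false) {
        continue; // skip pages that failed to load
    }
    // Strip markup and collapse whitespace before storing.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($html)));
    $stmt->execute([$url, $text]);
}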
You might want to have a look at Sphinx (http://sphinxsearch.com/); it is a search engine that can easily be accessed from PHP scripts.
You can cheat a little bit, the way the much-hated Experts-Exchange website does. They are a for-profit programmers' Q&A site, much like Stack Overflow. In order to see answers you have to pay, but sometimes the answers come up in Google search results. It is rather clear that E-E presents a different page to web crawlers than to humans. You could use the same trick, then add Google Custom Search to your site. Users who are logged in would then see the results; otherwise they'd be bounced to the login screen.
Do you have control over your server? Then I would recommend that you install Solr/Lucene for the index and SolPHP for interacting with it from PHP. That way you can have facets and other nice full-text search features.
I would not spider the actual pages; instead I would spider versions of the pages without navigation and other elements that are not content-related.
Solr requires Java on the server.
In the end I used Sphider, which is a free tool, and it works well with PHP.
Thanks all.
If the content and the titles of your pages are already managed by a database, you will just need to write your search engine in PHP. There are plenty of solutions to query your database, for example:
http://www.webreference.com/programming/php/search/
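As one hedged illustration, MySQL's built-in full-text search can be queried directly from PHP; the table and column names here are assumptions:

<?php
// Assumes a FULLTEXT index exists, e.g.:
//   ALTER TABLE pages ADD FULLTEXT(title, body);
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

$stmt = $pdo->prepare(
    'SELECT url, title, MATCH(title, body) AGAINST (:q1) AS score
     FROM pages
     WHERE MATCH(title, body) AGAINST (:q2)
     ORDER BY score DESC
     LIMIT 20'
);
$stmt->execute([':q1' => $_GET['q'] ?? '', ':q2' => $_GET['q'] ?? '']);

foreach ($stmt as $row) {
    printf("<a href=\"%s\">%s</a><br>\n",
        htmlspecialchars($row['url']),
        htmlspecialchars($row['title']));
}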
If the content is just contained in HTML files and not in the database, you might want to write a spider.
You may also be interested in caching the results to improve performance.
I would say that everything depends on the size and the complexity of your website/web application.
