Expand my knowledge in DB programming

Expand my knowledge in DB programming - php

my name is Tal,
Im working on a PHP Application the should have lots and lots of records, what DB should I use, and are there guides on the web that explains how to build efficient dbs and tables?
Tnx!

Get familiar with database normalization. It's a brilliant theory that will help you build stable and scalable databases. There is a good article on the subject at http://en.wikipedia.org/wiki/Database_normalization
As to what database management system you should use, I'd recommend that you start with MySQL. It's the world's most popular database software (it's what they use in WordPress, phpBB and Drupal). It's fast, reliable, open source and there are plenty of learning materials about it.

This is a big topic, merely selecting technology A or B is not enough. You have to consider the whole chain from the end user all the way up to the web server.
Most DB's are close performance wise. How well you utilise each DB will have a direct and bigger impact than simply using one platform over another.
A badly designed DB schema and poorly optimised query will perform badly in ANY db platform.

Don't select every row (*) just select what you need in a query.
Always close connections.
Use either prepared statements or escape inputted user information when inserting or updating.

PHP is widely used with MySql. You can also use PHP with sqlite. sqlite is faster and embeddable, but is not ideal for large dbs.
As far as db efficiency is concerned check this out.

MySql is easy to use, but probably not the most efficient b/c it is free.
SQLite works well too if you need an embedded database in your app.

Related

Mirror MySQL Database Schema with HTML5 local storage for querying

I've done some research on HTML5 local storage, and it seems plausible that I could mirror a MySQL database's structure for use in an application that needs a lot of data for only one person.
Why would I do this? In my spare time, I'm a web game developer: PHP, MySQL, and all the technologies to dress it up. So far I've built databases that support many players, but my games are intended to be "single-player with multi capabilities". And for games that are only intended to be played single-player, there's no point in even having a DB connection unless they're saving to a web server!
I want to achieve a single-player mode that will never touch my database, and will be available offline. However, the code behind all of this is still going to be making SQL queries. Ideally I imagine that I could set up a sort of abstraction layer of local storage that would respond to queries.
And in short, I'm wondering what's out there. Searching local storage and HTML5 will give you endless posts about the technologies, but I'm not certain that my idea here will work well, or should even be attempted. Likewise there could be frameworks out there already that handle this with ease. I've found nothing yet.
update: The deprecation of web SQL database worries me. It looked very appealing for my situation; as it uses SQL, modifying my queries shouldn't be so difficult. Now with a push toward IndexedDB, I'm not certain it will be as easy.

I've read your question a few times now and continue to wonder 'Why would you do this' ;-) The question brings up more questions for me, so...
What do you mean by 'a lot of data'? Ultimately speaking, is the sql metaphor appropriate in this case? Abstraction layer that can respond to 'sql-like' queries is interesting in and of itself, but sounds extremely complex. Could a simpler solution do the job, like a JSON object? What about persisting the data in cases when users clean cache, history, re-install the browser, etc? Even complex javascript literal or JSON object could provide very straight forward means of occasional persistance/backup and recovery. (It's interesting that just released PostgreSql 9.2 includes JSON as datatype. Could it be that SQL and NoSql will gravitate toward some common ground?) Sorry if this comes off as more of a commentary than an answer, your question comes off as being on a higher plane.
EDIT
googling 'javascript sql interpreter' turns up some interesting stuff:
http://www.terminally-incoherent.com/blog/2009/05/19/sql-emulation-tool-in-javascript-part-2/#comments
https://github.com/forward/sql-parser#readme
Generating a JavaScript SQL parser for SQLite3 (with Lemon? ANTLR3?)
http://jsdb.sourceforge.net/demo.html

How is Gmail search is so fast?

What is the most efficient way to search through so many characters? What do you think?
Let's say website built in PHP and MySQL.
What should I learn to be able to build this as much efficiently as possible? Are there any algorythms I should learn or something?

Text indexing algorithm

Google uses a custom-made database solution called BigTable, http://en.wikipedia.org/wiki/Big_table, which is run linked over hundreds of servers all over the world. So they're fast because they wrote the software specifically to be fast, and set up the hardware in such a way that they could squeeze the most out of it.
You can get to a decent set with PHP and MySQL, but once you start dealing with very large data sets, MySQL, and any other generic database, will start to buckle under the stress. If you want to learn more about this, a good place to start is to do a search for concurrency in database design (briefly explained in http://en.wikipedia.org/wiki/Concurrency_control amongst others), which is a topic way too large to cover in a stackoverflow reply =)

Google goes beyond simply optimizing the databases and the code. They also do a lot of distributed programming. While the exact mechanisms they use to power systems such as Gmail are guarded secrets, it is known that they have entire farms of computers networked, each working on parts of the index at any given time, rather than just one server.

For MySQL, look at the Full-Text Search Functions.
This is assuming your content is stored in the database (such as in a CMS).

Business Logic in PHP or MySQL?

On a site with a reasonable amount of traffic , would it matter if the application/business logic is written as stored procedures ,triggers and views , instead of inside the PHP code itself?
What would be the best way to go keeping scalability in mind.

I can't provide you statistics, but unless you plan to change PHP for another language in the future, i can say keeping the business logic in PHP is more "scalability friendly".
Its always easier and cheaper to solve web server load problems than having them in the database. Your database will always need to be lighting quick and just throwing mirrors at it won't solve the problem. The more database slaves you have, the more writes you have to do.

In my experience, you should put business logic in PHP code rather than move it onto the database. Assuming your database is on a separate server, you don't want your database to be busy calculating formulas when requests come in.
Keep your database lightning fast to handle selects, inserts and updates.

I think you will have far better scalibility keeping database code in the database where it can be performance tuned as the number of records gets larger. You will also have better data integrity which is critical to the data even being useful. You don't see a lot of terrabyte sized relational dbs with all their code in the application.
Read some books on database performance tuning and then decide if you want to risk your company's data on application code.

There are several things to consider when trying to decide whether to place the business logic in the database or in the application code.
Will the same database be accessed
from different websites / web
applications? Will the sites /
applications be written in the same
language or in a different language?
If the database will be used from a single site, and the site is written in a single language then this becomes a non-issue. Otherwise, you'll need to consider the added complexity of stored procedures, triggers, etc vs trying to maintain database access logic etc in multiple code bases.
What are relational databases in
general good for and what is MySQL
good for specifically? What is PHP
best at?
This consideration is fairly straight-forward. Relational databases across the board and specifically in any variant of SQL are going to do a great job at inserting, updating, and deleting data. Generally they also handle ATOMIC transactions well. However, most variants of SQL (including MySQL) are not good at complex calculations, on-the-fly date handling, file system access etc.
PHP on the other hand is very fast at handling calculations, dates, file system accesses. By taking a little time you can even design your PHP code to work in such a way that records are only retrieved once and then stored when necessary.
What are you most familiar /
comfortable with using?
Obviously it tends to make more sense to use the tool with which you are most familiar.
As a last point consider that just because a drill can be used to cut sheet rock or because a hammer can be used to drive a screw doesn't mean that they should be used for these things. Sometimes I think that programmers do more potential damage by trying to make more powerful tools that do everything rather than making simpler tools that do one thing really, really well.

A well done PHP application should be enought, but keep in mind that it also requires you to do the less calls to the database you can. Store values you'll need later in PHP, shorten queries, cache, etc.
MySQL optimization is always a must, as it will also decrease the amount of databse calls by PHP, and thus getting a better performance. Therefore, there's no way you can't think of stored procedures, etc, if your aim is to increase performance. But MySQL by itself would't be enought if your PHP code isn't well done (lots of unecessary database calls), that's why I think PHP must be well coded, keeping in mind the hole process while developing it, so that unecessary stuff doesn't get in the way. Cache for instance, in "duet" with proper MySQL, is a great boost on performance.

My POV, even not having much experience in developing large applications is to write business logic in the DB for some reasons:
1 - Maintainability, I think that languages deprecate functions and changes many other things in a short time period, so if PHP changes version, you'll need to adapt your code to the new version
2 - DBs tends to be more language stable, so when a new version of a RDBMS comes out, it usually doesn't change many things in the way you write your queries or SPs, or it even doesn't change. Writing your logic in DB will reduce code adaptation because of a new DB version
3 - A RDBMS is more likely to be alive for a long period rather than a programming language. Also, as your data is critical, there is a big worry from the RDBMS developers for automatic migration of your whole data to the new RDBMS version, including your SPs. When clipper died, there were no ways to migrate systems to a new programming language, they had to be completely rewritten.
4 - If you think someday to change completely the language you are writing the application for some reason(language death, for example), the only thing to be rewritten will be the presentation and the SP calls, not business logic.
I'd like to know from other people here if what I pointed out makes sense, and if not, why. I'm on the same situation as Sabeen Malik, I'm thinking to begin my first huge project and I'm tending towards SPs because of what I wrote. So it's time to correct my POV if it's not so correct.

MySQL sucks at using advanced DB techniques, it's simple and fast. PHP, being a dynamic language, makes processing data very easy. Therefore, it usually makes sense to use PHP.

flat-file database php application

I'm creating and app that will rely on a database, and I have all intention on using a flat file db, is there any serious reasons to stay away from this?
I'm using mimesis (http://mimesis.110mb.com)
it's simpler than using mySQL, which I have to admit I have little experience with.
I'm wondering about the security of the db. but the files are stored as php and it seems to be a solid database solution.
I really like the ease of backing up and transporting the databases, which I have found harder with mySQL. I see that everyone seems to prefer the mySQL way - and it likely is faster when it comes to queries but other than that is there any reason to stay away from flat-file dbs and (finally) properly learn mysql ?
edit
Just to let people know,
I ended up going with mySQL, and am using the CodeIgniter framework. Still like the flat file db, but have now realized that it's way more complex for this project than necessary.

Use SQLite, you get a database with many SQL features and yet it's only a single file.

Greetings, I'm the creator of Mimesis. Relational databases and SQL are important in situations where you have massive amounts of data that needs to be handled. Are flat files superior to relation databases? Well, you could ask Google, as their entire archiving system works with flat files, and its the most popular search engine on Earth. Does Mimesis compare to their system? Likely not.
Mimesis was created to solve a particular niche problem. I only use free websites for my online endeavors. Plenty of free sites offer the ability to use PHP. However, they don't provide free SQL database access. Therefore, I needed to create a database that would store data, implement locking, and work around file permissions. These were the primary design parameters of Mimesis, and it succeeds on all of those.
If you need an idea of Mimesis's speed, if you navigate to the first page it will tell you what country you're viewing the site from. This free database is taken from the site ip2nation.com and ported into a Mimesis ffdb. It has hundreds if not thousands of entries.
Furthermore, the hit counter on the main page has already tracked over 7000 visitors. These are UNIQUE visits, which means that the script has to search the database to see if the IP address that's visiting already exists, and also performs a count of the total IPs.
If you've noticed the main page loads up pretty quickly and it has two fairly intensive Mimesis database scripts running on the backend. The way Mimesis stores data is done to speed up read and write procedures and also translation procedures. Most ffdb example scripts or other ffdb scripts out there use a simple CVS file or other some such structure for storing data. Mimesis actually interprets binary data at some levels to augment its functionality. Mimesis is somewhat of a hybrid between a flat file database and a relational database.
Most other ffdb scripts involve rewriting the COMPLETE file every time an update is made. Mimesis does not do this, it rewrites only the structural file and updates the actual row contents. So that even if an error does occur you only lose new data that's added, not any of the older data. Mimesis also maintains its history. Unless the table is refreshed the data that rows had previously is still contained within.
I could keep going on about all the features, but this isn't intended as a "Mimesis is the greatest database ever" rant. Moreso, its intended to open people's eyes to the fact that SQL isn't the ONLY technology available, and that flat files, when given proper development paradigms are superior to a relational database, taking into account they are more specialized.
Long live flat files and the coders who brave the headaches that follow.

The answer is "Fine" if you only NEED a flat-file structure. One test: Would a single simple spreadsheet handle all needs? If not, you need a relational structure, not a flat file.
If you're not sure, perhaps you can start flat-file. SQLite is a great app for getting started.
It's not good to learn you made the wrong choice, if you figure it out too far along in the process. But if you understand the importance of a relational structure, and upsize early on if needed, then you are fine.

I really like the ease of backing up
and transporting the databases, which
I have found harder with mySQL.
Use SQLite as mentioned in another answer. There is only one file to backup, or set up periodic dumps of the MySQL databases to SQL files. This is a relatively simple thing to do.
I see that everyone seems to prefer
the mySQL way - and it likely is
faster when it comes to queries
Speed is definitely a consideration. Databases tend to be a lot faster, because the data is organized better.
other than that is there any reason to
stay away from flat-file dbs and
(finally) properly learn mysql ?
There sure are plenty of reasons to use a database solution, but there are arguments to be made for flat files. It is always good to learn things other than what you "usually" use.
Most decisions depend on the application. How many concurrent users are you going to have? Do you need transaction support?

Wanted to inform that Mimesis has moved from the original URL to http://mimesis.site11.com/
Furthermore, I am shifting the focus of Mimesis from an ffdb to a key-value store. It's more sensible Given the types of information I'm storing and the methods I use to retrieve it. There was also a grave error present in the coding of Mimesis (which I've since fixed). However, I'm still in the testing phase of the new key-value store type. I've also been side-tracked by other things. Locking has also been changed from the use of file creation to directory creation as the mutex mechanism.

Interoperability. MySQL can be interfaced by basically any language that counts. Mimesis is unlikely to be usable outside PHP.
This becomes significant the moment you try to use profilers, or modify data from the outside.

You might also look at http://lukeplant.me.uk/resources/flatfile/ for the PHP Flatfile Package.

The issue with going flatfile is that in order to adjust the situation for further development you have to alter a significant amount of code in order to improve the foundation of the system. Whereas if it was a pure SQL system it would require little to no modification to proceed in the future.

What are alternatives to SQL database storage for a web site?

An SQL database is overkill if your storage needs are small. When I was young and dumb, I used a text file and flock()ed it when I needed to access it. This doesn't scale, but I still feel that non-database solutions have been completely ignored in Web 2.0.
Does anyone not use an SQL database for storage? What are the alternatives?

There are a lot of alternatives. But having SQLite which gives you SQL power combined with no fuss of file based storage, there is no need to look for these alternatives. SQLite is light enough to be used in cell phones and MP3 players, so I don't see how it could be considered an overkill.
So unless your application needs something very specific, don't bother. Most alternatives are a lot harder to use and have less performance.

SQLite is invented for this.
It's just a flat-file that contains a complete SQL database. You can query, update, insert, delete, there's little to no overhead in installation and all you need is the driver (which comes standard in PHP )
SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.
Kind of weird that nobody mentioned this already?

CouchDB (http://couchdb.apache.org/index.html) is a non-sql database, and seems to be a popular project these days, as well as Google's bigtable, or GT.M (http://sourceforge.net/projects/fis-gtm) which has been around forever.
Object databases abound as well; dbforobjects (http://www.db4o.com/), ZODB (http://www.zope.org/Products/StandaloneZODB), just to name a few.
All of these are supposedly faster and simpler than traditional SQL databases for certain use cases, but none approach the simplicity of a flat file.

A distributed hash table like google bigtable or hadoop is a simple and scalable non SQL database and often suits the websites far better than a SQL database. SQL is great for complex relational data, but most websites don't have this requirement. Most websites store and retrieve data in a few forms and don't need to run complex operations on the data.
Take a look at one of these solutions as they will provide all of the concurrent access that you need but don't subscribe to the traditional ideas of data normalisation. They can be thought of as pretty analogous to a bunch of named text files.

It probably depends how dynamic your web site is. I used wiki software once that used RCS to check in and out text files. I wouldn't recommend that solution for something that gets as many updates as StackOverflow or Wikipedia. The thing about database is that they scale well, and the database engine writers have figured out all the fiddly little details of simultaneous access, load balancing, replication, etc.

I would say that it doesn't depend on whether you store less or more information, it depends on how often you are requesting the stored data. Databasemanagers are superb on caching queries, so they are often the better choice performance wise. How ever, if you don't need a dynamic web page and are just loading static data - maybe a text file is the better option. Which format the data is stored in (i.e. XML, JSON, key=pair) doesn't matter - it's I/O operations that are performance heavy.
When I'm developing web applications, I always use a RDBMS as the primary data holder. If the web application don't need to serve dynamic data at every request, I simply apply a cache functionality storing the data in a cache file that gets requested when no new data have been added to the primary data source (the RDBMS).

I wouldn't choose whether to use an SQL database based on how much data I wanted to store - I would choose based on what kind of data I wanted to store and how it is to be used.
Wikipeadia defines a database as: A database is a structured collection of records or data that is stored in a computer system. And I think your answer lies there: If you want to store records such as customer accounts, access rights and so on then a DB such as mySQL or SQLite or whatever is not overkill. They give you a tried and trusted mechanism for managing those records.
If, on the other hand, your website stores and delivers unchanging file-based content such as PDFs, reports, mp3s and so on then simply storing them in a well-defined directory layout on a disk is more than enough. I would also include XML documents here: if you had for example a production department that created articles for a website in XML format there is no need to put them in a DB - store them on disk and use XSLT to deliver them.
Your choice of SQL or not will also depend on how the content you wish to store is to be retrieved. SQL is obviously good for retrieving many records based on search criteria whereas a directory tree, XML database, RDF database, etc are more likely to be used to retrieve single records.
Choice of storage mechanism is very important when trying to scale high-traffic site and stuffing everything into a SQL DB will quickly become a bottleneck.

It depends what you are storing. My blog uses Blosxom (written in Perl but a similar thing could be done for PHP) where each individual entry is a separate text file. The first line is plain text (the title) and the rest is unrestricted HTML. Following a few simple rules, these are rendered to form a simple but effective blogging framework.
It does have drawbacks but it also means that each post is a discrete file, which works well for updating on a local machine and then publishing to a remote web server. This is limited when it comes to efficient querying though, so certainly not a good choice if you want fine-grained control and web-based interaction with your data.

Check CouchDB.

I have used LINQ to XML as a data source in a .NET project. It was a small solution, and used caching to mitigate performance concerns. I would do it again for the quick site that just needs to keep data in a common place without increasing server requirements.

Depends on what you're storing and how you need to access it. Generally sql provides great reporting and manual management ability. Almost everything needs some way to manage what's stored and report on it.

In Perl I use DBM or Storable for such tasks. DBM will update automatically when variable is updated.

One level down from SQL databases is an ISAM (Indexed Sequential Access Method) - basically tables and indexes but no SQL and no explicit relationships among tables. As long as the conceptual basis fits your design, it will scale nicely. I've used Codebase effectively for a long time.
If you want to work with SQL-database-type data, then consider FileMaker.

A Simple answer is that you can use any data storage format, from standard defined, to database (which generally involved a protocol), even a bespoke file-format.
There are trade-offs for every choice you make in IT, and certainly websites are no different. In the early 2000's file-based forum systems were popular as it allows anyone with limited technical ability to edit pages and posts. Completely static sites swiftly become unmanageable and content does not benefit from upgrades to the site user-interface; however the site if coded correctly can simply be moved to a sub-directory, or ripped into the new design. CMS's and dynamic systems bring with them their own set of problems, namely that there does not yet exist a widely adopted standard for data storage amongst them; that they often rely on third-party plugins to provide features between design styles (despite their documentation advocating for separation of function & form).
In 2016, it's pretty uncommon not to use a standard storage mechanism, such as a *SQL RDBMS; although static site generators such as Jekyll (powers a lot of GitHub pages); and independent players such as October CMS still provision for static file-based storage.
My personal preference is to use an *SQL enabled RDBMS, it provides me syntax that is standardised at least at the vendor level, familiar and powerful syntax, but unlike a lot of people I don't think this is the only way, and in most cases would advocate for using a site-generator to save parts that don't have to be dynamic to a static store as this is the cheapest way to live on the web.
TLDR; it's up to you, SQL & RDBMS backed are popular.

Well, this is a bit of an open-ended question from the OP and there are two questions ... around SQL alternatives and non-SQL.
In general, in the "Why is SQL good" category ... it's a mature and robust standard that provides referential-integrity. Java JDBC supports it fully as do tools like TOAD and there a many SQL implementations such as SQL-Lite referenced earlier.
Now specific to a "for a web-site" is not particularly indicative of anything. Does a web-site need referential integrity? Maybe. If the business nature of the web-site is largely unstructured content, then one can consider any kind of persistent storage really from so called "no-SQL" databases like AWS DynamoDB to Mongo (not a fan though).
For managing the complexities of SQL stores - one suggestion versus a list of every persistence store ever created ... is AWS Aurora (part of RDS service). It is multi-region active-active and fully MySQL-compliant. JDBC/ODBC based driver frameworks would work out-of-the-box and it effectively offers "zero administration".

I would check out XML if I were you. See w3schools XML tutorial section on the left side. Tons of possibilities without using SQL database.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.