I am developing a college management web application with PHP and MySQL. I chose MySQL as my database because of its free license. Will it handle large amounts of data? The data grows gradually as more schools are added and as records accumulate over the years. Is MySQL a good choice for large amounts of data?
Thanks in advance
MySQL is perfectly fine; Facebook uses MySQL, for instance, and I can't imagine a more extensive database. See https://blog.facebook.com/blog.php?post=7899307130 from Facebook's blog.
MySQL is definitely the best choice for you to start with, as it is...
freely available
a de facto standard in combination with PHP
a good start for beginners
and yes, it can handle a huge amount of data
I've seen lots of companies and startups that use MySQL and handle tons of data. If you run into performance issues later, you can deal with them then, e.g. add a caching layer, tune MySQL, etc.
MySQL will handle large amounts of data just fine; making sure your tables are properly indexed will go a long way toward ensuring that you can retrieve large data sets in a timely manner. We have a client with a database of over 5 million records, and we don't have much trouble beyond the normal issues of dealing with a table that large.
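As a rough illustration of the indexing point, here is a minimal sketch, assuming a hypothetical `students` table with a frequently filtered `last_name` column (the table, column and credential names are placeholders, not the poster's):

```php
<?php
// Minimal sketch: assumes a hypothetical `students` table in a `college` database.
// Adjust names and credentials to your own schema.
$pdo = new PDO('mysql:host=localhost;dbname=college;charset=utf8mb4', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Add an index on the column you filter by most often (run once, e.g. in a migration).
$pdo->exec('ALTER TABLE students ADD INDEX idx_last_name (last_name)');

// EXPLAIN shows whether MySQL actually uses the index for your query.
$plan = $pdo->query("EXPLAIN SELECT * FROM students WHERE last_name = 'Smith'");
print_r($plan->fetchAll(PDO::FETCH_ASSOC));
```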
Each flavor of SQL has its own differences, so do your due diligence to find the best options for your database and tables based on your needs.
MySQL tables have a default maximum size of 4 GB, but you can change this. In PostgreSQL, you set the limit when you create a table.
MySQL is an OK choice, but if you're expecting vast amounts of data I would prefer PostgreSQL (IMHO the best free DB available).
Related
I'm developing a project where I need to retrieve HUGE amounts of data from an MsSQL database and process that data. The data comes from 4 tables: two of them with 800-1000 rows, and the other two with 55,000-65,000 rows each.
The execution time wasn't tolerable, so I started to rewrite the code, but I'm quite inexperienced with PHP and MsSQL. At the moment I'm running PHP on localhost:8000, starting the server with "php -S localhost:8000".
I think that is part of my problem: the built-in server is too weak for such a huge amount of data. I thought about XAMPP, but I need a server where I can install the MsSQL drivers without problems so I can use those functions.
I cannot change from MsSQL to MySQL or make other changes like that; the company wants it that way...
Can you give me some advice on how to improve performance? Is there a server I could use to improve the PHP execution? Thank you very much in advance.
PHP execution should be the least of your concerns; if it is the bottleneck, you are most likely going about things the wrong way. All the PHP should be doing is running the SQL query against the database. If you are not using PDO, consider it: http://php.net/manual/en/book.pdo.php
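For reference, a minimal sketch of querying SQL Server through PDO, assuming Microsoft's pdo_sqlsrv driver is installed (the server, database, table and column names are placeholders):

```php
<?php
// Minimal sketch: assumes the pdo_sqlsrv extension; all names and credentials are placeholders.
$pdo = new PDO('sqlsrv:Server=localhost;Database=mydb', 'username', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Prepared statements avoid SQL injection and let the server reuse execution plans.
$stmt = $pdo->prepare('SELECT id, name FROM big_table WHERE status = ?');
$stmt->execute(['active']);

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // Process one row at a time instead of loading the whole result set into an array.
}
```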
First look at the way your SQL query is structured and how it can be optimised. If in complete doubt, you could try posting the query here. Be aware that if you can't post a single SQL query that encapsulates your problem, you're probably approaching it from the wrong angle.
I am assuming from your post that you cannot alter the database schema, but if you can, that would be the second course of action.
Try to do as much data processing as possible in SQL Server. Don't do joins or other data processing in PHP that could be done in the RDBMS.
I've seen PHP code that retrieved data from multiple tables and matched rows based on several conditions. That is exactly the kind of misuse I mean.
Also try to handle data in sets in SQL (be it MS* or My*) and, if possible, avoid row-by-row processing. The optimizer will produce a much better execution plan.
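A small sketch of what that means in practice, with hypothetical `orders` and `parts` tables: one JOIN in SQL Server returns only the rows you need, instead of pulling 60k+ rows into PHP and matching them in loops.

```php
<?php
// Minimal sketch: table and column names are hypothetical; connection details are placeholders.
$pdo = new PDO('sqlsrv:Server=localhost;Database=mydb', 'username', 'password');

$stmt = $pdo->prepare(
    'SELECT o.order_id, o.qty, p.part_name
       FROM orders o
       JOIN parts  p ON p.part_id = o.part_id
      WHERE o.due_date BETWEEN ? AND ?'
);
$stmt->execute(['2024-01-01', '2024-01-31']);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Compare with fetching both tables separately and re-implementing the join in PHP:
// that moves far more data over the wire and is much slower.
```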
This is a small database, really. My advice:
- Use paging and fetch the data in portions (see the sketch after this list)
- Use indexes on your tables
- Try to find a more powerful server. Hosting companies often put thousands of users' databases on a single database server, and the speed suffers badly. I suffered from this and finally bought a dedicated server.
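A minimal sketch of the paging suggestion, assuming SQL Server 2012 or later (which supports OFFSET/FETCH) and placeholder table and column names:

```php
<?php
// Minimal sketch: OFFSET/FETCH needs SQL Server 2012+; names and credentials are placeholders.
$pdo = new PDO('sqlsrv:Server=localhost;Database=mydb', 'username', 'password');

$pageSize = 500;
$page     = 0; // zero-based page index, normally taken from user input

$stmt = $pdo->prepare(
    'SELECT id, name, qty
       FROM big_table
      ORDER BY id
     OFFSET ? ROWS FETCH NEXT ? ROWS ONLY'
);
$stmt->bindValue(1, $page * $pageSize, PDO::PARAM_INT);
$stmt->bindValue(2, $pageSize, PDO::PARAM_INT);
$stmt->execute();

$rows = $stmt->fetchAll(PDO::FETCH_ASSOC); // one manageable chunk at a time
```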
I'm working with a bunch of manufacturing processes and trying to create a basic auto-scheduler. This question focuses on gathering the requirements from the DB2 server our system runs on.
In a previous incarnation I queried the orders for each part individually by day, transformed those days into Mondays to group the orders by week, propagated the requirements down through the components, and finally stored all of that in specific Excel files for each resource.
In the newest incarnation I've built a database with the bill-of-materials information. I query for all of the orders at once to build raw data files for the different kinds of processes, and to get the schedule for each component I parse those raw files and build specific Excel files for the schedules.
My question then is: is it more efficient to limit the queries or to limit the Excel lookups? I've looked at other PHPExcel efficiency questions and made a few changes to improve that side, and I've done the same with the MySQL queries, but in general, which approach is more efficient for what I'm doing? (As an additional note, the server running my MySQL database has enough RAM to hold the entire database, which I know increases speed, but I'm not sure that should be a determining factor, since it might eventually change.)
MySQL is really good at optimizing queries; what you need are the right indexes. Running a SQL query against an indexed column will definitely be faster than parsing the data with PHPExcel (or any other library).
This is especially true for PHPExcel if your dataset is large: PHPExcel requires a lot of memory, so you are likely to hit an "Out of Memory" error. You can work around this by caching data, but that will hurt overall performance.
So my advice is to make sure your tables are correctly indexed, get the data you need from MySQL and avoid filtering this data in your application. This pattern works for the vast majority of use cases and scales well.
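To make that concrete, here is a minimal sketch, assuming a hypothetical `requirements` table: the filtering and grouping happen in MySQL on an indexed column, and PHPExcel only receives the already reduced result.

```php
<?php
// Minimal sketch: `requirements`, its columns and the index name are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=scheduler;charset=utf8mb4', 'user', 'pass');

// Run once: a composite index matching the WHERE/GROUP BY of the report query.
// $pdo->exec('ALTER TABLE requirements ADD INDEX idx_comp_week (component_id, week_start)');

$componentId = 42; // placeholder

$stmt = $pdo->prepare(
    'SELECT week_start, SUM(qty) AS total_qty
       FROM requirements
      WHERE component_id = ?
      GROUP BY week_start
      ORDER BY week_start'
);
$stmt->execute([$componentId]);

$schedule = $stmt->fetchAll(PDO::FETCH_ASSOC); // small array, cheap to hand to PHPExcel
```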
I am running a CRM application that uses a MySQL database. My application generates a lot of data in MySQL. Now I want to give my customers a reporting section where an admin can view real-time reports and filter them in real time. Basically, I want to be able to slice and dice the data in real time, as fast as possible.
I have implemented the reporting using MySQL and PHP, but now there is so much data that the queries take too long and the page does not load. After some reading I came across terms like NoSQL, MongoDB, Cassandra, OLAP, Hadoop, etc., but I was confused about which to choose. Is there a mechanism that would transfer my data from MySQL to NoSQL, so that I could run my reporting queries there and serve my customers while keeping my MySQL database as it is?
It doesn't matter what database / datastore technology you use for reporting: you still will have to design it to extract the information you need efficiently.
Improving performance by switching from MySQL to MongoDB or one of the other scalable key/value store systems is like solving a pedestrian traffic jam by building a railroad. It's going to take a lot of work to make it help the situation. I suggest you try getting things to work better in MySQL first.
First of all, you need to take a careful look at which SQL queries in your reporting system are causing trouble. You may be able to optimize their performance by adding indexes or doing other refactoring. That should be your first step. MySQL has a slow query log. Look at it.
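For example, the slow query log can be switched on at runtime (given sufficient privileges) or in my.cnf; here is a minimal sketch, with placeholder credentials, a placeholder `orders` table, and an example query to EXPLAIN:

```php
<?php
// Minimal sketch: SET GLOBAL needs SUPER-level privileges; the same settings can go in my.cnf.
// The `orders` table here is just a placeholder.
$pdo = new PDO('mysql:host=localhost;dbname=crm;charset=utf8mb4', 'admin', 'pass');

$pdo->exec("SET GLOBAL slow_query_log = 'ON'");
$pdo->exec('SET GLOBAL long_query_time = 1'); // log anything slower than one second

// Then take the offenders from the log and EXPLAIN them to see whether indexes are used.
$plan = $pdo->query('EXPLAIN SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id');
print_r($plan->fetchAll(PDO::FETCH_ASSOC));
```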
Secondly, you may be able to add resources (RAM, faster disks, etc) to MySQL, and you may be able to tune it for higher performance. There's a book called High Performance MySQL that offers a sound methodology for doing this.
Thirdly, many people who need to add a reporting function to their busy application use MySQL replication. That is, they configure one or two slave MySQL servers to accept copies of all data from the master server.
http://dev.mysql.com/doc/refman/5.5/en/replication-howto.html
They then use the slave server or servers to run reporting queries. The slaves are ordinarily a few seconds or minutes behind the master (that is, they're slightly out of date). But it usually is good enough to give users the illusion of real-time reporting.
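A minimal sketch of that read/write split, with placeholder hostnames, credentials and tables; writes go to the master, reporting reads go to the (slightly lagging) slave:

```php
<?php
// Minimal sketch: hostnames, credentials and the `orders` table are placeholders.
$master  = new PDO('mysql:host=db-master;dbname=crm;charset=utf8mb4', 'app', 'pass');
$replica = new PDO('mysql:host=db-replica;dbname=crm;charset=utf8mb4', 'report', 'pass');

// Normal application writes keep going to the master...
$master->prepare('INSERT INTO orders (customer_id, amount) VALUES (?, ?)')
       ->execute([42, 99.95]);

// ...while heavy reporting queries run against the replica, so they
// cannot slow down the transactional workload.
$report = $replica->query(
    'SELECT DATE(created_at) AS day, SUM(amount) AS revenue
       FROM orders
      GROUP BY DATE(created_at)'
)->fetchAll(PDO::FETCH_ASSOC);
```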
Notice that if you use MongoDB or some other technology you will also have to replicate your data.
I will throw out this link for you to read, which gives some concrete use cases: http://www.mongodb.com/use-cases/real-time-analytics but I will speak to a more traditional setup of just MongoDB.
I have used both MySQL and MongoDB for analytical purposes and I find MongoDB better suited, even if it needs a little bit of hacking to work well.
The great thing about MongoDB when it comes to retrieving analytical data is that it does not require the IO/memory to write out a separate result set each time. This makes reads on a single member of a replica set extremely scalable, since you just add your analytical collections to the working set (i.e. memory) and serve straight from those using batch responses (this is the default behaviour of the drivers).
So with MongoDB, replication rarely gives an advantage in terms of reads/writes, and in reality I have found the same with MySQL. If it does, you are probably running the wrong queries, which will not scale anyway; at that point you install memcache on your database servers and, look, you have stale data being served from memory in a NoSQL fashion anyway... whoop, I guess.
Okay, so we have some basic ideas set out; time to talk about that hack. In order to get the best possible speed out of MongoDB, and since it does not have JOINs, you need to flatten your data so that no result set has to be assembled on your side.
There are many tactics for this, but the one I will mention here is pre-aggregated reports: http://docs.mongodb.org/ecosystem/use-cases/pre-aggregated-reports/ This method also works well with SQL technologies, since it is essentially in the same vein as logically splitting tables to make queries on a large table faster and lighter.
What you do is take your analytical data, split it into buckets such as per day or per month (or both), and then aggregate the data across those ranges in a de-normalised manner, essentially one document (row) per bucket.
After this you can serve reports straight from a collection without any need for a result set, which makes for some very fast querying.
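A minimal sketch of writing such a pre-aggregated document from PHP, assuming the mongodb extension plus the mongodb/mongodb Composer library; the database, collection and field names are made up for illustration:

```php
<?php
// Minimal sketch: assumes the mongodb extension and the mongodb/mongodb library;
// database, collection and field names are illustrative only.
require 'vendor/autoload.php';

$client = new MongoDB\Client('mongodb://localhost:27017');
$daily  = $client->analytics->pageviews_daily;

// One document per page per day; every hit bumps pre-aggregated counters, so
// reports read a handful of documents instead of scanning raw events.
$daily->updateOne(
    ['_id' => 'page:/home:' . date('Y-m-d')],
    ['$inc' => ['total' => 1, 'hourly.' . date('G') => 1]],
    ['upsert' => true]
);
```

The SQL equivalent is a counter table keyed by (page, day) that you bump with INSERT ... ON DUPLICATE KEY UPDATE.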
Later on you could add a map-reduce step to create better analytics, but so far I have not needed to; I have built full video-based analytics without it.
This should get you started.
TiDB may be a good fit: https://en.pingcap.com/tidb/ It is MySQL-compatible, good at real-time analytics, and can replicate data from MySQL through the binlog.
I'm in the planning stage of building a web app for a school. I'm worried about the speed and efficiency of MySQL when multiple people are accessing it. The app will allow teachers to CRUD student records. Is it better to cache a JSON/XML result when a record is created or updated so that the app can quickly display it to the user (using JavaScript)? Or is MySQL fast enough to handle updates and queries for the same data?
I have a program that does exactly this (plus more). Use a database; they're designed for these queries. Currently I've hit just under 100 concurrent users and a few thousand students, and I've had no latency issues.
It's better, faster, safer to use a database.
JSON and XML are used for exchanging data between different platforms or pieces of software, like between PHP and Java, or between Twitter and a C client. They are good when you don't have a protocol defined and you fall back on a common format like XML, JSON, or YAML. But if you have a protocol defined (between MySQL and PHP), use that: it will be much, much faster. Besides, as a database, MySQL can perform many data manipulation operations that you cannot achieve with plain XML, JSON, or YAML. So use MySQL.
"Is MySQL fast enough to handle a large number of queries that will just be retrieving data to display?"
It's a lot faster than you need. I doubt you'll ever run into its limits. I have seen many devices (not humans) hitting MySQL concurrently, producing a tremendous load, and MySQL still handles that data.
I am starting a new project. In this project I will need to use local provinces and local city names. I do not want to create many MySQL tables unless I have to, or unless CSV turns out to be too slow. For the province-city case I am not sure which one to use.
I have job announcements related to cities and provinces. In the CSV case I would keep the name of the city in the announcements table, so when I do a search I send the selected city name to the database in the query.
Can anyone give me a better idea of how to do this? CSV or MySQL? Why?
Thanks in advance.
Database Pros
Relating cities to provinces and job announcements will mean less redundant data, and consistently formatted data
The ability to search/report data is much simpler, being [relatively] standardized by the use of SQL
More scalable, accommodating GBs of data if necessary
Infrastructure is already in place, well documented in online resources
Flat File (CSV) Pros
I'm trying, but I can't think of any. Reading from a CSV means loading the contents into memory, whether they will be used or not. As astander mentioned, changes while the application is in use would be a nightmare. And then there's the infrastructure you'd have to build to pull data out, search it, etc.
Conclusion
Use a database, be it MySQL or the free editions of Oracle or SQL Server. Basing things on a CSV is coding yourself into a corner, with no long-term benefits.
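For illustration, a minimal sketch of the normalised layout being argued for, with made-up table, column and credential names:

```php
<?php
// Minimal sketch: all table, column and credential names here are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=jobs;charset=utf8mb4', 'user', 'pass');

$pdo->exec('CREATE TABLE IF NOT EXISTS provinces (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL UNIQUE
)');

$pdo->exec('CREATE TABLE IF NOT EXISTS cities (
    id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    province_id INT UNSIGNED NOT NULL,
    name        VARCHAR(100) NOT NULL,
    FOREIGN KEY (province_id) REFERENCES provinces (id)
)');

$pdo->exec('CREATE TABLE IF NOT EXISTS announcements (
    id      INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    city_id INT UNSIGNED NOT NULL,
    title   VARCHAR(255) NOT NULL,
    FOREIGN KEY (city_id) REFERENCES cities (id)
)');

// Searching by city is then a join on ids, not string matching on names.
$stmt = $pdo->prepare(
    'SELECT a.title, c.name AS city, p.name AS province
       FROM announcements a
       JOIN cities    c ON c.id = a.city_id
       JOIN provinces p ON p.id = c.province_id
      WHERE c.id = ?'
);
$stmt->execute([7]); // 7 = placeholder city id
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);
```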
If you use CSV you will eventually run into problems if you are planning on a lot of traffic. If you are just going to use this personally on your machine, or with a couple of people in an office, then CSV is probably sufficient.
I would recommend keeping it in the DB. If you store the names in the announcements table, any changes to the CSV will not be reflected in the queries.
DBs are meant to handle these issues.
If you don't want to use a database table, use a hardcoded array directly in PHP: if performance is that critical, I don't know of anything faster than that (and I don't see a single advantage in using CSV either).
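A minimal sketch of that hardcoded-array approach, with made-up place names, for the case where the list is small and effectively never changes:

```php
<?php
// Minimal sketch: province and city names are placeholders.
$provinces = [
    'Province A' => ['City A1', 'City A2', 'City A3'],
    'Province B' => ['City B1', 'City B2'],
];

// Populate a city <select> for the chosen province.
$selectedProvince = 'Province A';
foreach ($provinces[$selectedProvince] ?? [] as $city) {
    echo '<option>' . htmlspecialchars($city) . '</option>' . PHP_EOL;
}
```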
Apart from that, I think this is clearly premature optimization. You should make your application extensible, especially at the planning stage. Not using a table will make the overall structure rigid.
While people often worry about the proliferation of tables inside a database, those tables are at least under management: management by the DBMS. That means data control tasks like updating are under control, and it also takes you down the route of organising the data properly, i.e. normalisation.
Large collections of CSV or XML files can get extremely unwieldy unless you are prepared to write management systems around them (which a DBMS already provides, as it were, for free).
There can be good reasons for not using a DBMS, but I have not found many, and certainly not in mainstream development.