I have any number of users in a database (this could be 100, 2000, or 3). What I'm doing is using MySQL's "SHOW TABLES" and storing the table names in an array; then I'm running a while loop, taking every table name (the user's name), inserting it into some code, and running that piece of code for every table name. With 3 users, this script takes around 20 seconds. It uses the Twitter API and does some MySQL inserts. Is this the most efficient way to do it or not?
Certainly not!
I don't understand why you store each user in their own table. You should create a single users table and select from there.
It will run in 0.0001 seconds.
Update:
A table has rows and columns. You can store multiple users in rows, and information about each user in columns.
Please try some database design tutorials/books; they will help you a great deal.
If you're worried about storing multiple entries for each user within the same users table, you can have a separate table for tweets, with each tweet row referring back to its user by user id.
I'd certainly go for one users table.
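As a rough sketch of what I mean (the table and column names here are just examples, not your actual schema):

CREATE TABLE users (
    user_id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(64) NOT NULL,
    UNIQUE KEY (username)
);

CREATE TABLE tweets (
    tweet_id   BIGINT UNSIGNED NOT NULL PRIMARY KEY,  -- e.g. the id the Twitter API gives you
    user_id    INT UNSIGNED NOT NULL,                 -- which user this tweet belongs to
    tweet_text TEXT,
    created_at DATETIME,
    KEY (user_id),
    FOREIGN KEY (user_id) REFERENCES users (user_id)
);

-- all tweets for one user in a single query, no looping over table names
SELECT t.*
FROM tweets t
JOIN users u ON u.user_id = t.user_id
WHERE u.username = 'some_user';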
Databases are optimized for processing many rows; some of the techniques used are indexes, the physical layout of data on disk, and so on. Operations spread across many tables will always be slower - this is just not what an RDBMS was built to do.
There is one exception - sometimes you optimize databases by sharding (partitioning data), but this approach has as many advantages as disadvantages. One of the disadvantages is that queries like the one you described take a lot of time.
You should put all your users in one table because, from a logical point of view, they represent one entity.
Related
I am planning to design a database which may have to store huge amounts of data, but I am not sure which approach I should use. The records may have fields like user id, record date, group, coordinate, and perhaps other properties like that, but the key is the user id.
Then I may have to call (select) or process the records by that user id. There may be thousands of user ids, so here is the question.
1-) On every record, should I directly store all records in a single table and then call or process them like "... WHERE userId=12345 ..."?
2-) On every record, should I check whether a table with that user id exists, create a new table with the user id as the table name if not, store the data in that table, and then call or process it with "SELECT * FROM ..."?
So what would you suggest?
There are different views about using many databases vs. many tables; the common view is that there isn't any performance disadvantage. I preferred to go with the 1st way (a single table). The project is finished and there aren't any problems; I don't need to alter the table all the time. But my main reason was that the many-tables style is a little more complicated and time-consuming to program.
1-) On every record, should I directly store all records in a single table and then call or process them like "... WHERE userId=12345 ..."?
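For illustration only (the table and column names are made up), that single-table layout with an index on userId would look something like this:

CREATE TABLE records (
    record_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    userId      INT UNSIGNED NOT NULL,
    record_date DATETIME NOT NULL,
    group_name  VARCHAR(64),
    coordinate  VARCHAR(64),
    KEY idx_user (userId)   -- this index is what makes the WHERE clause cheap
);

-- fetch one user's records; MySQL uses idx_user instead of scanning the whole table
SELECT * FROM records WHERE userId = 12345;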
Besides that, here is a link from mysql.com about the drawbacks of having many tables.
Disadvantages of Creating Many Tables in the Same Database
If you have many MyISAM tables in the same database directory, open, close, and create operations are slow. If you execute SELECT statements on many different tables, there is a little overhead when the table cache is full, because for every table that has to be opened, another must be closed. You can reduce this overhead by increasing the number of entries permitted in the table cache.
(http://dev.mysql.com/doc/refman/5.7/en/creating-many-tables.html)
Part of my project involves storing and retrieving loads of IPs in my database. I have estimated that my database will have millions of IPs within months of starting the project. That being the case, I would like to know how slow simple queries to a big database can get. What will be the approximate speeds of the following queries:
SELECT * FROM table where ip= '$ip' LIMIT 1
INSERT INTO table (ip, xxx, yyy) VALUES ('$ip', '$xxx', '$yyy')
on a table with 265 million rows?
Could I speed queries up by creating 255^2 tables, with names corresponding to all possible first two octets of IPv4 addresses? Each table would then have a maximum of 255^2 rows to accommodate all possible second halves of the IP. So, for example, to query the IP address "216.27.61.137", it would be split into two parts, "216.27" (p1) and "61.137" (p2). First the script would select the table named p1, then it would check whether there is a row for "p2", and if so it would pull the required data from that row. The same process would be used to insert new IPs into the database.
If the above plan would not work what would be a good way to speed up queries in a big database?
The answers to both your questions hinge on the use of INDEXES.
If your table is indexed on ip, your first query should execute more or less immediately, regardless of the size of your table: MySQL will use the index. Your second query will be a little slower, as MySQL will have to update the index on each INSERT.
If your table is not indexed then the second query will execute almost immediately as MySQL can just add the row at the end of the table. Your first query may become unusable as MySQL will have to scan the entire table each time.
The problem is balance. Adding an index will speed up the first query but slow down the second. Exactly what happens will depend on server hardware, which database engine you choose, the configuration of MySQL, and what else is going on at the time. If performance is likely to be critical, do some tests first.
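As a sketch (ip_table and its columns are placeholders for your actual schema), the index in question is just:

-- an index on ip makes the SELECT fast, at the cost of a little extra work on every INSERT
ALTER TABLE ip_table ADD INDEX idx_ip (ip);

-- now this uses idx_ip instead of scanning 265 million rows
SELECT * FROM ip_table WHERE ip = '216.27.61.137' LIMIT 1;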
Before doing anything of that sort, read this question and, more importantly, its answers: How to store an IP in mySQL
It is generally not a good idea to split data among multiple tables. Database indexes are good at what they do, so just make sure you create them accordingly. A binary column to store IPv4 addresses will work rather nicely - it is more a question of query load than of table size.
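A minimal sketch of that idea, using MySQL's built-in INET_ATON()/INET_NTOA() to store an IPv4 address as an unsigned integer (table and column names are only examples):

CREATE TABLE ip_log (
    ip  INT UNSIGNED NOT NULL,   -- 4 bytes per IPv4 address
    xxx VARCHAR(255),
    yyy VARCHAR(255),
    KEY idx_ip (ip)
);

INSERT INTO ip_log (ip, xxx, yyy) VALUES (INET_ATON('216.27.61.137'), 'foo', 'bar');

SELECT INET_NTOA(ip) AS ip, xxx, yyy
FROM ip_log
WHERE ip = INET_ATON('216.27.61.137')
LIMIT 1;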
First and foremost, you can't predict how long a query will take, even if we knew all the information about the database, the database server, the network performance, and a thousand other variables.
Second, if you are using a decent database engine, you don't have to split the data into different tables. It knows how to handle big data. Leave the database functionality to the database itself.
There are several workarounds to deal with large datasets. Using the right data types and creating the right indexes will help a lot.
When you begin to have problems with your database, then search for something specific to the problem you are having.
There are no silver bullets to big data problems.
So... assuming I have a database with three tables:
Table clients
Table data
and Table clients_to_data
And I have an API which allows clients to access data from Table data. Every client has a record in Table clients (with things like IP address etc.). To log who accesses what, I'm logging to the table clients_to_data (which contains the ID from table clients, the ID from table data, and a timestamp).
Every time a user accesses my API, the request gets logged in the clients_to_data table. (So records in clients and data are not updated, just read.)
I also want to be able to get the number of hits per client. Pretty easy: just query the clients_to_data table with a client_id and count the results. But as my DB grows, I'll have tens of thousands of records in the clients_to_data table.
And here's my question:
Is it a better practice to add a field "hits" to Table clients that stores the number of hits for that user and increment it every time the user queries the API?
This would be adding redundancy to the DB, which I've heard is generally a bad thing. But in this case I think it would speed up the process of retrieving the number of hits.
So which method is better and faster in this case? Thanks for your help!
Faster when?
Appending to the log table will be faster than finding the record and updating it, and much faster than reading it, incrementing it, and updating it.
However, having hits "precalculated" will be faster than the aggregate query to count them.
What you gain on the swings you lose on the roundabouts; which choice you make depends on your current usage patterns. So, are you prepared to slow down adding a hit to gain a significant boost in finding out how many you've had?
Obviously, selecting a single integer column from a table will be faster than selecting a count() of rows from a table.
The complexity trade-off is a bit moot. One way you need to write more complex SQL; the other way you need to update/insert into two tables in your code.
How often is the number of hits queried? Do your clients look it up, or do you check it once a month? If you only look now and then, I probably wouldn't be too concerned about the time taken to select count(*).
If your clients look up the hit count with every request, then I would look at storing a hits column.
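To make the two options concrete (the column names on clients_to_data are assumed, following the question):

-- option 1: count on demand; needs an index on clients_to_data(client_id)
SELECT COUNT(*) FROM clients_to_data WHERE client_id = 42;

-- option 2: keep a redundant counter on clients and bump it with every API call
UPDATE clients SET hits = hits + 1 WHERE client_id = 42;
SELECT hits FROM clients WHERE client_id = 42;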
Now that our table structures are all clearly defined, let's get to work.
You want to record in the DB the number of times every client has accessed the data - in other terms, insert a record into the table "clients_to_data" for every client "impression".
You are worried about two things:
1. Redundancy
2. Performance when retrieving the count
But what about the performance when storing the count (the INSERT statements)?
This is a classic scenario, where I would write the data to be inserted into memcache, and do a bulk insert at the end of the day.
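Such a nightly bulk load could boil down to one multi-row INSERT (the column names on clients_to_data are my assumption, and the values are placeholders), which is much cheaper than thousands of single-row INSERTs:

INSERT INTO clients_to_data (client_id, data_id, accessed_at) VALUES
    (42, 7, '2013-05-01 10:00:00'),
    (42, 9, '2013-05-01 10:00:05'),
    (57, 7, '2013-05-01 10:00:07');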
More importantly, I will normalize the data before inserting it into the DB.
As for SELECTs, create indexes. If it's text, install Sphinx.
Thanks.
I'm developing software for conducting online surveys. When a lot of users are filling in a survey simultaneously, I'm experiencing trouble handling the high database write load. My current table (MySQL, InnoDB) for storing survey data has the following columns: dataID, userID, item_1 .. item_n. The item_* columns have different data types corresponding to the type of data acquired with the specific items. Most item columns are TINYINT(1), but there are also some TEXT item columns. Large surveys can have more than a hundred items, leading to a table with more than a hundred columns. The user answers around 20 items in one HTTP POST, and the corresponding row has to be updated accordingly. The user may skip a lot of items, leading to a lot of NULL values in the row.
I'm considering the following solution to my write load problem. Instead of having a single table with many columns, I set up several tables corresponding to the data types used, e.g.: data_tinyint_1, data_smallint_6, data_text. Each of these tables would have only the following columns: userID, itemID, value (the value column has the data type corresponding to its table). For one HTTP POST with e.g. 20 items, I then might have to create 19 rows in data_tinyint_1 and one row in data_text (instead of updating one large row with many columns). However, for every item, I need to determine its data type (via two table joins) so I know in which table to create the new row. My Zend Framework-based application code will get more complicated with this approach.
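To make the proposed layout concrete, here is roughly what I have in mind (only a sketch):

CREATE TABLE data_tinyint_1 (
    userID INT UNSIGNED NOT NULL,
    itemID INT UNSIGNED NOT NULL,
    value  TINYINT(1),
    PRIMARY KEY (userID, itemID)
);

CREATE TABLE data_text (
    userID INT UNSIGNED NOT NULL,
    itemID INT UNSIGNED NOT NULL,
    value  TEXT,
    PRIMARY KEY (userID, itemID)
);

-- one answered item becomes one small INSERT instead of an UPDATE of a very wide row
INSERT INTO data_tinyint_1 (userID, itemID, value) VALUES (1001, 17, 3);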
My questions:
Will my solution be better for heavy write load?
Do you have a better solution?
Since you're getting to the point of abstracting this schema to mimic actual datatypes, it might stand to reason that you should simply create a new set of tables per survey instead. The benefit will be that locking will lessen, and you could isolate heavy loads to outside machines if the load becomes unbearable.
The single-survey database structure then can more accurately reflect your real world conditions and data input handlers. It ought to make your abstraction headaches go away.
There's nothing wrong with creating tables on the fly. In some configurations, soft sharding is preferable.
The obvious solution here would seem to be to use a document database for fast writes and then bulk-insert answers into MySQL asynchronously using cron or something like that. You can create a view in the document database for quick statistics, but allow filtering and other complicated stuff only in MySQL, if you're not a fan of document DBMSs.
I have two routes:
1) creating sub-tables for each user and storing his individual content in them
2) creating a few tables and storing the data of all users in them.
For instance:
1) 100,000 tables each with 1000 rows
2) 50 tables each with 2,000,000 rows
I want to know which route is the better and more efficient one.
Context: like Facebook, with millions of users and their posts, photos, and tags. Should all this information live in some giant tables shared by all users, or should each user have their own sub-tables?
These are some pros and cons of the two approaches in MySQL.
1. Many small tables.
Cons:
More concurrent tables used means more file descriptors needed (check this)
A database with 100,000 tables is a mess.
Pros:
Small tables mean small indexes. Small indexes can be loaded entirely into memory, which means that your queries will run faster.
Also, because of small indexes, data manipulation like inserts will run faster.
2. Few big tables
Cons:
A huge table implies very big indexes. If your index cannot be loaded entirely into memory, most of the queries will be very slow.
Pros:
The database (and also your code) is clear and easy to maintain.
You can use partitioning if your tables become very big (check this); a sketch appears at the end of this answer.
In my experience, a table of two million rows (I've worked with 70-million-row tables) is not a performance problem under MySQL if you are able to load all your active indexes into memory.
If you'll have many concurrent users, I suggest you evaluate other technologies like Elasticsearch, which seem to fit this kind of scenario better.
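Regarding the partitioning mentioned in the pros above, a rough sketch (the posts table is hypothetical) would be:

-- split one big table into 16 physical pieces by user id;
-- the partitioning column must be part of every unique key, hence the composite primary key
CREATE TABLE posts (
    post_id BIGINT UNSIGNED NOT NULL,
    user_id INT UNSIGNED NOT NULL,
    body    TEXT,
    PRIMARY KEY (post_id, user_id)
)
PARTITION BY HASH(user_id)
PARTITIONS 16;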
Creating a table for each user is the worst design possible. It is one of the first things you are taught in a DB design class.
A table is a strong logical component of a database, and hence it is used for many of the maintenance tasks by the RDBMS. E.g. it is customary to set up table file space, limitations, quota, log space, transaction space, index tree space, and many, many other things. If every table gets its own file to put data in, you'll get big round-trip times when joining tables and the like.
When you create many tables, you'll have a really BIG maintenance overhead. Also, you'll be denying the very nature of relational databases. And just suppose you're adding a record to the database - creating a new table each time? It'd be a bit harder on your code.
But then again, you could try and see for yourself.
You should leverage the power of MySQL indexes which will basically provide something similar to having one table per user.
Creating one table called user_data, indexed on user_id, will (in the big picture) transform your queries that have a WHERE clause on user_id, like this one:
SELECT picture FROM user_data WHERE user_id = INT
Into:
Look in the index to find rows of user_data where user_id = INT
Then, in that batch of rows, load the value of picture
By doing that, MySQL won't search through all rows of user_data, only the relevant ones found in the index.
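A minimal sketch of that single indexed table (the columns are just examples):

CREATE TABLE user_data (
    user_id INT UNSIGNED NOT NULL,
    picture VARCHAR(255),
    KEY idx_user_id (user_id)   -- the index MySQL walks instead of scanning every row
);

SELECT picture FROM user_data WHERE user_id = 12345;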