I'm new to MySQL and I have just coded my first PHP-MySQL application.
I have a list of 30 devices, and I want each user to be able to add their preferred devices to their account.
So I created MySQL table "A", where device IDs are stored under a particular user ID like this:
UserID | Device Id
1 | 33
1 | 21
1 | 52
2 | 12
2 | 45
3 | 22
3 | 08
1 | 5
more.....
Say I have 5000 user IDs and 30 device IDs, and each user has 15 device IDs on average under his account records.
Then it will be 5000 x 15 = 75000 records in table "A".
So, my questions: Is there any limit on how many records can be stored in a MySQL table?
Is my approach for storing records, as described above, correct? Will it affect query performance as more users are added?
Or is there a better way to do this?
It's very unlikely that you will approach the limitations of a MySQL table with two columns that are only integers.
If you're really concerned about query performance, you can just go ahead and throw an index on both columns. It's likely that the cost of inserting / updating your table will be negligible even with an index on the device ID. If your database gets huge, it can speed up queries such as "which users prefer this device". Your queries that ask "what devices does this user prefer" will also be fast with an index on the user column.
I would just say to make this table a simple two column table with a two-part composite key (indexed). This way, it will be as atomic as possible and won't require any of the tom-foolery that some may suggest to "increase performance."
Keep it atomic and normal -- your performance will be fine and you won't exceed any limitations of your DBMS
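For illustration, the two-column, composite-key table could be created roughly like this (the table and column names below are placeholders, not from the question):

CREATE TABLE user_devices (
    user_id   INT UNSIGNED NOT NULL,
    device_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, device_id),   -- composite key: one row per user/device pair
    KEY idx_device (device_id)          -- secondary index for "which users prefer this device"
) ENGINE=InnoDB;

-- "what devices does this user prefer" is served by the primary key
SELECT device_id FROM user_devices WHERE user_id = 1;

-- "which users prefer this device" is served by the secondary index
SELECT user_id FROM user_devices WHERE device_id = 33;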
I don't see anything wrong with it. If you are worried about saving server space, don't be; let your database do its underlying job. Index your table properly, with an id column as an int(10) auto-increment primary key. Think about scalability when it is actually needed. Your first target should be to complete the application you are making, then test it. If you find that it causes any lag or problems, start working on solving them then. Don't bother yourself with things you might not even face.
But considering the scale of your application (75k to 100k records), it shouldn't be much of a task. Alternatively, you can have a schema like this for your users:
(device_table)
device_id
23
45
56

(user_table)
user_id | device_id
1 | 23,45,67,45,23
2 | 45,67,23,45
That is, storing the device_ids as a comma-separated string and then getting the device IDs for a particular user with
$device_for_user = explode(',', $device_id);
where, of course, $device_id is the value retrieved from the MySQL database.
so you'll have
$device_for_user[0]=23
$device_for_user[1]=45
and so on.
But this method isn't a very good design or approach. It's mentioned just for your information, as one way of doing it.
A visitor opens a URL, for example
/transport/cars/audi/a6
or
/real-estate/flats/some-city/city-district
I plan separate tables for cars and real estate (a separate table for each top-level category).
The URL is split with PHP's explode() into an array:
$array[0] - based on this value I know which table to SELECT from,
and so on for $array[1], $array[2] ...
For example, the RealEstate table may look like:
IdOfAd | RealEstateType | Location1 | Location2 | TextOfAd | and so on
----------------------------------------------------------------------
1 | flat | City1 | CityDistrict1 | text.. |
2 | land | City2 | CityDistrict2 | text.. |
And the MySQL query to display ads would be something like:
SELECT `TextOfAd`, `and so on...`
FROM `RealEstate`
WHERE RealEstateType = ? AND Location1 = ? AND Location2 = ?
-- and possibly additional AND .. AND
LIMIT $start, $limit
I'm thinking about performance. Hopefully, after a long while, the number of active ads will be high (I also plan not to delete expired ads, just to change a column value to 0 so they are not displayed by the normal SELECT, but are still shown when visited directly from a search engine).
What do I need to do (change the database design, or SELECT in some other way) if, for example, the number of rows in the table grows to 100,000 or millions?
I'm also thinking about moving expired ads to another table (for which performance is not important). For example, a user comes from a search engine to a URL with an expired ad: first SELECT from the main table, and if it's not found, SELECT from the expired-ads table. Is this some kind of solution?
Two hints:
Use ENGINE=InnoDB when creating your table. InnoDB uses row-level locking, which is MUCH better for bigger tables, as this allows rows to be read much faster, even when you're updating some of them.
ADD INDEX on suitable columns. Indexing big tables can reduce search times by several orders of magnitude. They're easy to forget and a pain to debug! More than once I've been investigating a slow query, realised I forgot a suitable INDEX, added it, and had immediate results on a query that used to take 15 seconds to run.
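For example, applied to the RealEstate table from the question (the exact column types, the Active flag, and the index choice here are only an illustration):

CREATE TABLE RealEstate (
    IdOfAd         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    RealEstateType VARCHAR(50),
    Location1      VARCHAR(100),
    Location2      VARCHAR(100),
    TextOfAd       TEXT,
    Active         TINYINT(1) NOT NULL DEFAULT 1   -- 0 = expired, but still reachable directly
) ENGINE=InnoDB;

-- one composite index covering the columns used in the WHERE clause above
ALTER TABLE RealEstate
    ADD INDEX idx_type_location (RealEstateType, Location1, Location2);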
I am working on a system where the requirements include:
PHP + PostgreSQL
A multitenant system, using a single database for all the tenants (tenantId).
Each tenant's data is unknown in advance, so they should have the flexibility to add whatever fields they want:
e.g. for an accounts table,
tenant 1 > account_no | date_created | due_date
tenant 2 > account_holder | start_date | end_date | customer_name | ...
The only solution I can see for this case is using a key-value-pair database structure, e.g.:
accounts table
id | tenant_id | key        | value
1  | 1         | account_no | 12345
accounts_data table
account_id | key          | value
1          | date_created | 01-01-2014
1          | due_date     | 30-02-2014
The drawbacks I see for this approach in the long run:
- Monster queries
- Inefficient with large data
- Lots of coding to handle data validation, since there are no data types and everything is saved as a string
- Filtering can be lots of work
That said, I would appreciate suggestions, as well as any other approach I could use to achieve this.
Warning, you're walking into the inner platform effect and Enterprisey design.
Stop and back away slowly, then revisit your assumptions about why you have to do things this way.
Something has to give here; either:
Use a schemaless free-form database for schemaless, free-form data;
Allow tenant users to define useful schema for their data based on their needs; or
Compromise with something like hstore or json storage
Please, please, please don't create an EAV database-within-a-database. Developers everywhere in the world will cry, and your design will soon end up being talked about on The Daily WTF.
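If you do go the compromise route, a minimal sketch of the json option might look like this (PostgreSQL 9.4+ for jsonb; the table and column names are illustrative only, not from the question):

-- shared columns stay strongly typed, tenant-defined fields go into jsonb
CREATE TABLE accounts (
    id        serial PRIMARY KEY,
    tenant_id integer NOT NULL,
    extra     jsonb NOT NULL DEFAULT '{}'::jsonb
);

-- tenant 2 stores its own fields without any schema change
INSERT INTO accounts (tenant_id, extra)
VALUES (2, '{"account_holder": "ACME", "start_date": "2014-01-01"}');

-- filtering on a tenant-defined field
SELECT * FROM accounts
WHERE tenant_id = 2 AND extra->>'account_holder' = 'ACME';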
When storing relationship data for a user (potentially a thousand friends per user), would it be faster to create a new row for each relationship, or to concatenate all of their friends into a string and then parse it later?
I.e.
Primary id | Friend1ID | Friend2ID
1          | 234       | 5789
2          | 5789      | 234
Where the IDs are references to primary IDs in a 'Users' table.
Or for the 'Users' table to just have a column called friends, which may look like this:
Primary id | Friends
234        | 5789.123.8474
5789       | 234
I'm of the understanding that string concatenation and parsing is generally quite slow, so I'd be tempted to lean towards the first method. However, as the number of users grows, this becomes a case of selecting one row and parsing it vs. searching millions of rows for those that match the WHERE criteria.
Is one method distinctly faster than the other? Particularly as the number of users grows.
You should use a second table to store the friends.
Users Table
----------
userid | username
1 | Bob
2 | Mike
3 | John
Users Friends Table
--------------------
userid | friend_id
1 | 2
3 | 2
Here you can see that Mike is friends with both Bob and John.... This is of course a very simple demonstration.
Your second option will not scale. Some people may have hundreds of thousands of friends, and storing every ID in a single field is going to cause a headache further down the line: adding friends, removing friends, working out complex relationships between people. Lots of overhead.
Querying millions of records with a WHERE clause on a properly indexed table should take no more than a second; the first option is the better one.
The "correct" way would probably be keeping multiple rows. This allows for much easier statistical analysis and more complex queries (like friends of friends) without any hacky stuff. Integer storage size is also often smaller than string storage, even though you're repeating one ID - especially if you use an appropriately sized integer store (like mediumint).
It's also more maintainable, scalable (if they start getting a damn lot of friends), and easier to export and import. The speed gain from concatenation, if any, wouldn't be worth the rest of the benefits.
If you wanted, for instance, to check whether Bob is a friend of Jane, this would be a single-row lookup in the multiple-row implementation. In the single-row implementation you would have to fetch Bob's row, decode the field, and loop through it looking for Jane. DBMS optimisation and indexing make the multiple-row implementation much faster in this case; if the primary key is (id, friendid), the lookup is pretty much instantaneous because the table is indexed on that key.
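A minimal sketch of that multiple-row layout (the table and column names here are illustrative, not from the question):

CREATE TABLE user_friends (
    user_id   INT UNSIGNED NOT NULL,
    friend_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, friend_id)
) ENGINE=InnoDB;

-- "is 5789 a friend of 234?" -- answered straight from the primary key
SELECT 1 FROM user_friends WHERE user_id = 234 AND friend_id = 5789;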
I believe the proper way to do it, which may also be faster, is to use a two-column table:
user | friend
1 | 2
1 | 3
It is simple, it makes querying and updating much easier, and you can have as many relationships as you want.
Don't overcomplicate the problem...
... Asking for the more "correct" way is itself the wrong question.
It depends on the case.
If your web application has a low access rate, having more rows won't change anything; on the other hand (I'm not a native English speaker), with medium and large access volumes it may be better to keep database access to a minimum.
To achieve this, as you've already thought, you can concatenate the values, split them when the user logs in, and put everything into the $_SESSION superglobal.
At least, this is what I think.
I've recently been working on normalizing and restructuring my database to make it more effective in the long run. Currently I have around 500 records, and obviously I don't want to lose the users' data.
I assume SQL through phpMyAdmin is the easiest way to do this?
So let me give you guys an example
In my old table I would have something like this
records //this table has misc fields, but they are unimportant right now
id | unit |
1 | g |
With my new one, I have it split apart into 3 different tables.
records
id
1
units
id | unit
1 | g
record_units
id | record_id | unit_id
1 | 1 | 1
Just to be clear, I am not adding anything new into the units table. That table is there as a reference for which id to store in the record_units table.
As you can see, it's pretty simple. What changed in the new structure is that I started using a lookup table to hold my units, since they are repeated quite often. I then store that unit id and the paired record id in the record_units table, so I can retrieve the fields later.
I am not incredibly experienced with SQL, though I'd say my knowledge is average. I know this operation would be quite simple to do with my CakePHP setup, because all my associations are already set up, but I can't do that.
If I understand correctly, you want to copy related records from your old table to the new tables, in which case you can use something like this:
UPDATE units u
INNER JOIN records r
ON u.id=r.id
SET u.unit = r.unit
This will copy the unit type from your old table to the matching id in the new units table, and then you can do something similar for your third table.
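If the units table starts out empty, a fuller migration could look roughly like this (a sketch only, assuming units.id is AUTO_INCREMENT and the old unit column is still present on records; verify on a copy of the data first):

-- 1. fill the lookup table with the distinct unit values from the old table
INSERT INTO units (unit)
SELECT DISTINCT unit FROM records;

-- 2. create the join rows by matching the old text value to its new unit id
INSERT INTO record_units (record_id, unit_id)
SELECT r.id, u.id
FROM records r
INNER JOIN units u ON u.unit = r.unit;

-- 3. only after verifying the copy, drop the old column
-- ALTER TABLE records DROP COLUMN unit;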
I have a large database of artists, albums, and tracks. Each of these items may have one or more tags assigned via glue tables (track_attributes, album_attributes, artist_attributes). There are several thousand (or even hundred thousand) tags applicable to each item type.
I am trying to accomplish two tasks, and I'm having a very hard time getting the queries to perform acceptably.
Task 1) Get all tracks that have any given tags (if provided) by artists that have any given tags (if provided) on albums with any given tags (if provided). Any set of tags may not be present (i.e. only a track tag is active, no artist or album tags)
Variation: The results are also presentable by artist or by album rather than by track
Task 2) Get a list of tags that are applied to the results from the previous filter, along with a count of how many tracks have each given tag.
What I am after is some general guidance in approach. I have tried temp tables, inner joins, IN(), all my efforts thus far result in slow responses. A good example of the results I am after can be seen here: http://www.yachtworld.com/core/listing/advancedSearch.jsp, except they only have one tier of tags, I am dealing with three.
Table structures:
Table: attribute_tag_groups
Column | Type |
------------+-----------------------------+
id | integer |
name | character varying(255) |
type | enum (track, album, artist) |
Table: attribute_tags
Column | Type |
--------------------------------+-----------------------------+
id | integer |
attribute_tag_group_id | integer |
name | character varying(255) |
Table: track_attribute_tags
Column | Type |
------------+-----------------------------+
track_id | integer |
tag_id | integer |
Table: artist_attribute_tags
Column | Type |
------------+-----------------------------+
artist_id | integer |
tag_id | integer |
Table: album_attribute_tags
Column | Type |
------------+-----------------------------+
album_id | integer |
tag_id | integer |
Table: artists
Column | Type |
------------+-----------------------------+
id | integer |
name | varchar(350) |
Table: albums
Column | Type |
------------+-----------------------------+
id | integer |
artist_id | integer |
name | varchar(300) |
Table: tracks
Column | Type |
-------------+-----------------------------+
id | integer |
artist_id | integer |
album_id | integer |
compilation | boolean |
name | varchar(300) |
EDIT: I am using PHP, and I am not opposed to doing sorting or other hijinks in the script; my #1 concern is speed of return.
If you want speed, I would suggest you look into Solr/Lucene. You can store your data and have very speedy lookups by calling Solr and parsing the result from PHP. As an added benefit you get faceted searches as well (which is task 2 of your question, if I interpret it correctly). The downside is, of course, that you might have redundant information (once stored in the DB, once in the Solr document store). And it does take a while to set up (well, you could learn a lot from the Drupal Solr integration).
Just check out the PHP reference docs for Solr.
Here's an article on how to use Solr with PHP, just in case: http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/.
You probably should try to denormalize your data. Your structure is optimised for insert/update load, but not for queries. As I understand it, you will have many more select queries than insert/update queries.
For example you can do something like this:
store your data in normalized structure.
create an aggregate table like this
track_id, artist_tags, album_tags, track_tags
1 , jazz/pop/, jazz/rock, /heavy-metal/
or
track_id, artist_tags, album_tags, track_tags
1 , 1/2/, 1/3, 4/
to speed up searches, you should probably create a FULLTEXT index on the *_tags columns
query this table with SQL like
SELECT * FROM aggregate WHERE MATCH (album_tags) AGAINST ('rock')
rebuild this table incrementally once a day.
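A rough sketch of such an aggregate table (the table name and the per-column FULLTEXT indexes are illustrative; MyISAM is used here because older MySQL versions only support FULLTEXT indexes on MyISAM):

CREATE TABLE track_tag_aggregate (
    track_id    INT UNSIGNED NOT NULL PRIMARY KEY,
    artist_tags TEXT,
    album_tags  TEXT,
    track_tags  TEXT,
    FULLTEXT KEY ft_artist (artist_tags),
    FULLTEXT KEY ft_album  (album_tags),
    FULLTEXT KEY ft_track  (track_tags)
) ENGINE=MyISAM;

-- find tracks whose album tags mention 'rock'
SELECT track_id
FROM track_tag_aggregate
WHERE MATCH (album_tags) AGAINST ('rock');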
I think the answer greatly depends on how much money you are willing to spend on your project - some tasks are theoretically impossible to accomplish under strict constraints (for example, that you must use only one weak server). I will assume that you are ready to upgrade your system.
First of all, your table structure forces JOINs; I think you should avoid them where possible when writing high-performance applications. I don't know what "attribute_tag_groups" is, so I propose a table structure: tag (varchar 255), id (int), id_type (enum (track, album, artist)). id can be artist_id, track_id or album_id depending on id_type. This way you will be able to look up all your data in one table, but of course it will use much more memory.
Next, you should consider using several databases. It will help even more if each database contains only part of your data (each lookup will be faster). Deciding how to spread your data between databases is usually a rather hard task: I suggest you gather some statistics about tag length, find length ranges that give similar track/artist result counts, and hard-code them into your lookup code.
Of course you should consider MySQL tuning (I am sure you did that, but just in case): all your tables should reside in RAM; if that is impossible, try to get SSDs, RAID, etc. Proper indexing and database types/settings are really important too (MySQL may even show some bottlenecks in its internal statistics).
This suggestion may sound mad, but sometimes it is good to let PHP do calculations that MySQL could do itself. MySQL databases are much harder to scale, while a server for PHP processing can be added in a matter of minutes. Different PHP processes can also run on different CPU cores, which MySQL has problems with. You can increase your PHP performance by using some advanced modules (you can even write them yourself: profile your PHP scripts and hard-code bottlenecks in fast C code).
Last, but I think most important: you must use some type of caching. I know that it is really hard, but I don't think there has been any big project without a really good caching system. In your case some tags will surely be much more popular than others, so caching should greatly increase performance. Caching is a form of art: depending on how much time you can spend on it and how many resources are available, you can make 99% of all requests hit the cache.
Using other databases/indexing tools may help you, but you should always consider the theoretical query-speed comparison (O(n), O(n log n), ...) to understand whether they can really help; these tools sometimes give only a small performance gain (like a constant 20%), while complicating your application design, and most of the time it is not worth it.
From my experience, most 'slow' MySQL databases don't have correct indexes and/or queries. So I would check these first:
Make sure all data tables' id fields are primary keys. Just in case.
For all data tables, create an index on the external id fields and then the id, so that MySQL can use it in searches.
For your glue tables, set a primary key on the two fields, first the subject, then the tag. This is for normal browsing. Then create a normal index on the tag id. This is for searching. (A sketch follows below.)
Still slow? Are you using MyISAM for your tables? It is designed for quick queries.
If it's still slow, run an EXPLAIN on a slow query and post both the query and the result in the question, preferably with an importable SQL dump of your complete database structure.
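The glue-table indexing suggested above, sketched against the track_attribute_tags table from the question (the index name is just a placeholder):

ALTER TABLE track_attribute_tags
    ADD PRIMARY KEY (track_id, tag_id),  -- browsing: all tags of a given track
    ADD INDEX idx_tag (tag_id);          -- searching: all tracks with a given tag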
Things you may want to try:
Use a query analyzer to explore the bottlenecks of your queries. (Most of the time the underlying DBMS already does an amazing job of optimizing.)
Your table structure is well normalized, but personal experience has shown me that you can achieve much greater performance with structures that let you avoid joins and subqueries. For your case I would suggest storing the tag information in one field. (This requires support from the underlying DBMS.)
So far.
Check your indices and whether they are used correctly. Maybe MySQL isn't up to the task. PostgreSQL should be similar to use, but has better performance in complex situations.
On a completely different track, google "map-reduce" and look at one of these new fancy NoSQL databases for really, really large data sets. They can do distributed search on multiple servers in parallel.