For a MySQL database I am using the InnoDB engine, and the structure of my tables looks like this:
Table user
id | username | etc...
---|----------|-------
1  | bruce    | ...
2  | clark    | ...
3  | tony     | ...
Table user-emails
id | person_id | email
---|-----------|--------------------------
1  | 1         | bruce@wayne-ent.com
2  | 1         | ceo@wayne-ent.com
3  | 2         | clark.k@daily-planet.com
To fetch data from the database I've written a tiny framework. For example, on __construct($id) it checks whether a person with the given id exists; if so, it creates the corresponding model and stores only the id field in an array. At runtime, if I need another field from the model, it fetches just that value from the database, saves it to the array and returns it. The same goes for the emails field: for that, my code accesses the user-emails table and gets all the emails for the corresponding user.
For small models this works all right, but now I am working on another project where I have to fetch a lot of data at once for a list, and that takes some time. I also know that many connections to MySQL and many queries are quite stressful for the server, so...
My question now is: should I fetch all the data at once (with LEFT JOINs etc.) while constructing the model and save the fields in an array, or should I use some other method?
Why do people insist on referring to entities and domain objects as "models"?
Unless your entities are extremely large, I would populate the entire entity when you need it. And if the "email list" is part of that entity, I would populate that too.
As I see it, the question is more about "what to do with tables that are related by foreign keys".
Let's say you have Users and Articles tables, where each article has a specific owner associated via a user_id foreign key. In this case, when populating the Article entity, I would only retrieve the user_id value instead of pulling in all the information about the user.
But in your example with Users and UserEmails, the emails seem to be a part of the User entity, and something that you would often call via $user->getEmailList().
TL;DR
I would do this in two queries, when populating User entity:
select all you need from the Users table and apply it to the User entity
select all the user's emails from the UserEmails table and apply them to the User entity.
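As a rough sketch (table and column names taken from the question; the placeholders are assumptions):
-- Query 1: populate the User entity itself
SELECT id, username /* , other fields */ FROM user WHERE id = ?;
-- Query 2: populate its email list
SELECT email FROM `user-emails` WHERE person_id = ?;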
P.S.
You might want to look at the data mapper pattern for the "how" part.
In my opinion you should fetch all your fields at once, and divide your queries in a way that makes your code easier to read and manage.
When we're talking about one query or two, the difference is usually negligible unless the combined query (with JOINs or whatever) is overly complex. Usually an index or two is the solution to a very slow query.
If we're talking about one vs hundreds or thousands of queries, that's when the connection/transmission overhead becomes more significant, and reducing the number of queries can make an impact.
It seems that your framework suffers from premature optimization. You are hyper-concerned about fetching too many fields from a row, but why? Do you have thousands of columns or something?
The time-consuming part of your query is almost always the lookup, not the transmission of data. You are making the database do the "hard" part over and over again as you pull one field at a time.
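To make that concrete, here is a hedged sketch using the tables from the question: the lazy approach repeats the indexed lookup for every field, while a combined query does each lookup only once.
-- Lazy, field-by-field: one round trip and one lookup per field
SELECT username FROM user WHERE id = 1;
SELECT email FROM `user-emails` WHERE person_id = 1;
-- ...and so on for every further field accessed

-- Eager: one query, lookups done once, all fields transmitted together
SELECT u.id, u.username, e.email
FROM user u
LEFT JOIN `user-emails` e ON e.person_id = u.id
WHERE u.id = 1;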
Related
When storing relationship data for a user (potentially a thousand friends per user), would it be faster to create a new row for each relationship, or to concatenate all of their friends into a string and then parse that later?
I.e.
Primary id | Friend1ID | Friend2ID
1          | 234       | 5789
2          | 5789      | 234
Where the IDs are references to primary IDs in a 'Users' table.
Or for the 'Users' table to just have a column called friends which may look like this:
Primary id | Friends
234        | 5789.123.8474
5789       | 234
I'm of the understanding that string concatenation and parsing is generally quite slow, so I'd be tempted to lean towards the first method. However, as the number of users grows, this becomes a case of selecting one row and parsing it vs. searching millions of rows for those matching the WHERE criteria.
Is one method distinctly faster than the other, particularly as the number of users grows?
You should use a second table to store the friends.
Users Table
----------
userid | username
1 | Bob
2 | Mike
3 | John
Users Friends Table
--------------------
userid | friend_id
1 | 2
3 | 2
Here you can see that Mike is friends with both Bob and John... This is of course a very simple demonstration.
Your second option will not scale. Some people may have hundreds of thousands of friends, and storing each ID in a single field is going to cause a headache further down the line: adding friends, removing friends, working out complex relationships between people. Lots of overhead.
Querying millions of records with a WHERE clause on a properly indexed table should take no more than a second; the first option is the better one.
The "correct" way would probably be keeping multiple rows. This allows for much easier statistical analysis and more complex queries (like friends of friends) without any hacky stuff. Integer storage size is also often smaller than string storage, even though you're repeating one ID - especially if you use an appropriately sized integer store (like mediumint).
It's also more maintainable, scalable (if they start getting a damn lot of friends) export and importable. The speed gain from concatenation, if any, wouldn't be worth the rest of the benefits.
If you wanted for instance to search if Bob was a friend of Jane, this would be a single row lookup in the multiple row implementation, or in the single row implementation: get Bob's row, decode field, loop through field looking for Jane - found Jane. DBMS optimisation and indexing would make the multiple row implementation much faster in this case - if you had the primary key as (id, friendid) then it'd be pretty much instantaneous as the table would probably be hashed on that key.
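A minimal sketch of that schema, assuming InnoDB and the column names used above:
CREATE TABLE users_friends (
    userid    INT UNSIGNED NOT NULL,
    friend_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (userid, friend_id)  -- one index probe answers "is X a friend of Y?"
) ENGINE=InnoDB;

-- "Is user 234 a friend of user 5789?" resolves via the primary key
SELECT 1 FROM users_friends WHERE userid = 234 AND friend_id = 5789;
An extra index on friend_id would also cover the reverse question (whose friend is X?).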
I believe the proper way to do it, which may also be faster, is to use a two-column table:
user | friend
1 | 2
1 | 3
It is simple, it makes querying and updating much easier, and you can have as many relationships as you want.
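For example, listing a user's friends is then a single indexed query; a sketch, assuming a users table holding the names:
SELECT u.username
FROM friends f
JOIN users u ON u.id = f.friend
WHERE f.`user` = 1;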
Don't overcomplicate the problem...
...asking for the one "correct" way is itself the wrong question.
It depends on the case.
If your web application has a low access rate, having more rows won't change anything; on the other side of the coin, for medium and large applications it may be better to keep the number of database accesses to a minimum.
To achieve this, as you've already thought, you can concatenate the values, split them when the user logs in, and then put everything into the $_SESSION superglobal.
At least this is what I think.
I have data that should produce one output row for each social medium the user interacted with.
There are 4 interactions: fblike_point, fbshare_point, tweet_point and follow_point.
So let's say I've interacted with fblike_point and tweet_point, judging from the data below.
What I want is for it to output 2 rows, since I've interacted with fblike_point and tweet_point.
Output:
2013-05-14 | fblike_point
2013-05-14 | tweet_point
If I interacted 4 times, it should output 4 rows with the corresponding social media interactions.
Well, I can manage to do this, but it feels redundant; for example, I'm using these MySQL queries in PHP to select the data:
SELECT date_participated, fblike_point FROM table WHERE fblike_point = 1
SELECT date_participated, fbshare_point FROM table WHERE fbshare_point = 1
SELECT date_participated, tweet_point FROM table WHERE tweet_point = 1
SELECT date_participated, follow_point FROM table WHERE follow_point = 1
So is there a shorter way to do this?
If I interacted 4 times, it should output 4 times
With your data schema, you'd either need the four distinct queries you quoted, or a UNION over these.
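A sketch of the UNION variant over your current schema, labelling each row with its interaction type (`table` stands in for your actual table name, as in the question):
SELECT date_participated, 'fblike_point' AS interaction FROM `table` WHERE fblike_point = 1
UNION ALL
SELECT date_participated, 'fbshare_point' FROM `table` WHERE fbshare_point = 1
UNION ALL
SELECT date_participated, 'tweet_point' FROM `table` WHERE tweet_point = 1
UNION ALL
SELECT date_participated, 'follow_point' FROM `table` WHERE follow_point = 1;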
it was like redundancy
This is redundant because of the way your schema is organized. If you want to be able to treat these different interactions alike (which makes a lot of sense), you'd want an extra table for them, with one column identifying the row of your original table that it refers to, and a second column (probably of an ENUM type) identifying the social medium. Together they would form the primary key of that table.
You can then create a VIEW from the actual tables which looks just like your table does now. That way you can maintain compatibility with existing queries and still provide more flexible queries for those cases where you need them.
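A hedged sketch of that layout (the table, column and view names are assumptions):
-- One row per (participation, medium); both columns form the primary key
CREATE TABLE interactions (
    participation_id INT UNSIGNED NOT NULL,  -- points at the original table's id
    medium ENUM('fblike_point','fbshare_point','tweet_point','follow_point') NOT NULL,
    PRIMARY KEY (participation_id, medium)
);

-- The desired output then needs only one query
SELECT t.date_participated, i.medium
FROM `table` t
JOIN interactions i ON i.participation_id = t.id;

-- Optional compatibility view mimicking the old wide layout
CREATE VIEW table_compat AS
SELECT t.id, t.date_participated,
       MAX(i.medium = 'fblike_point')  AS fblike_point,
       MAX(i.medium = 'fbshare_point') AS fbshare_point,
       MAX(i.medium = 'tweet_point')   AS tweet_point,
       MAX(i.medium = 'follow_point')  AS follow_point
FROM `table` t
LEFT JOIN interactions i ON i.participation_id = t.id
GROUP BY t.id, t.date_participated;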
I want to improve the speed of a notification board. It retrieves data from the event table.
At the moment the events MySQL table looks like this:
id | event_type | who_added_id | date
In the events table I store one row of information per event. Each time user A asks for new notifications, the query runs through the table and checks whether the notifications added by user B suit him (they have to be friends, members of the same groups, or have previously chatted).
The events table has become big, and because of the bulky query the page loads slowly.
I'm thinking of changing this design entirely and, instead of adding one event row and then comparing whether the event suits each user, adding as many rows as there are interested users. I would change the events table structure as follows:
id | event_type | who_added_id | forwho_id | date
Now, if user B creates an event which interests 50 other members, I create 50 rows with the same information, and in the forwho_id field I put each of the 50 members who must get this notification.
I think the query will become much simpler and it will take less time to run.
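With the fan-out design the read side reduces to a sketch like this (an index on forwho_id is assumed):
SELECT id, event_type, who_added_id, `date`
FROM events
WHERE forwho_id = ?   -- the asking user's id
ORDER BY `date` DESC;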
What do you think:
1. Is this a good approach to storing this kind of data, or should duplicate data be avoided at any cost?
2. How do you think the events table will behave if the number of interested users is not 50 but hundreds?
Thank you for reading this, and I hope I made myself clear.
Duplicated data is not "bad", and it's not to be "avoided at all cost".
What is "bad" is uncontrolled redundancy, and the kind of problems that come up when the logical data model isn't third normal form. It is acceptable and expected that an implementation will deviate from a logical data model, and introduce redundancy for performance.
Your revised design looks appropriate for your needs.
Possible Duplicate:
What is the most efficient/elegant way to parse a flat table into a tree?
This I am finding rather tricky and would like some opinions on the matter.
I am trying to store hierarchical (tree-like) data with an unknown number of levels and branches. I want to be able to add new nodes and delete any of them at any time.
I need to be able to query, from any node in the hierarchy, for all of the children's ids in one go, and efficiently, due to the large user base.
Let's take a hypothetical example of a website where families socialise and update their status, as on Facebook, and at any time you can be viewing a family member's "wall", which also includes all the recent status updates from the people below them in the hierarchy, in chronological order.
Obviously, fetching the posts once you have the array of ids of the family members who are children of this family member's node is easy enough in a loop.
Let's take this simple example table structure:
id | parentId | name
________________________
1 | NULL | John
2 | 1 | Peter
3 | 1 | Bob
4 | 3 | Emma
5 | 2 | Sam
6 | 4 | Gill
etc.... You get the idea.
I need to be able to do the above with something like this, unless you think the structure needs to be adapted.
I have read up on the MySQL nested set model.
This seems very fiddly and could be unreliable if something were to update incorrectly and mess everything up.
I am used to using PHP and MySQL but have been reading a bit about Cassandra and Thrift. Not sure if this would be easier?
There are already good approaches out there which are simpler than the solution you propose.
Here are a couple of links which explain how to do it (we use this ourselves for much the same problem you describe, and it works well):
Managing Hierarchical Data in MySQL (from MySQL)
Storing Hierarchical Data in a Database (from Sitepoint, but a clearer explanation, I think)
This makes inserting/updating more complex, but makes selecting portions of the tree structure far faster (with only one query). It allows finding all the children of any given node in one query, and all the ancestors of a given node in one query.
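For illustration, a sketch of the nested set "all descendants" query from those articles, assuming the usual lft/rgt columns on a hypothetical family_tree table:
SELECT child.id, child.name
FROM family_tree parent
JOIN family_tree child ON child.lft BETWEEN parent.lft AND parent.rgt
WHERE parent.id = ?          -- the node whose subtree we want
  AND child.id <> parent.id; -- exclude the node itself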
So I think I have come up with an idea.
The reason I am against the nested set model is that it still does not seem to be the best way, nor the ideal performance solution.
I am going to cover a proposed solution I have been thinking about.
The concept means creating a hierarchical map table to keep track of all the relationships between each family member/node.
The way it would work is:
Using a map table structure like this:
id | fMemberId | parentid
=====================================
1 | 3 | 2
2 | 4 | 3
3 | 4 | 2
1) When a new family member is created as a child of a parent, we take the parent's id and create a new row in our family members table with the parent id set, for future additional uses and functionality.
2) As this row is created, we create new rows in the map table with all of the parent ids for the new family member.
A quick way to do this would be to take the new family member's parent id, query the map table for all rows whose family member id matches that parent id, and collect the resulting parent ids in a PHP array for storage alongside the new family member's id in the map table. This requires only one SQL query to grab all the parent ids, rather than a number of queries that grows with the number of nodes.
This would mean that when viewing a family member's feed of posts, we could simply query the map table for all the children ids of the current family member, and subsequently query other tables for the post data.
The main trade-off is the amount of storage potentially required by this kind of system.
However, I believe reading would be quicker, as there are no conditional SQL statements, and writing to the db this way may be just as quick.
We could overcome this by using InnoDB's clustered ids, assigning an initial family id index and creating a new table with the "next family member's id" based on the family id.
There's also reliability: if a row wasn't written, it would be easy enough to add it in later. It avoids having to continually edit rows just to create a member.
What are your thoughts on this?
So far this seems to be a good way, in my opinion. It took a lot of thinking to get here. I also believe it could be improved over time, perhaps by storing arrays of ids per member rather than all of them. Still trying to work that one out!
Yes, your solution is called a transitive closure. I have written about it before:
What is the most efficient/elegant way to parse a flat table into a tree?
Models for Hierarchical Data
You also need the zero-length paths, e.g. 2-2, 3-3, 4-4.
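A minimal closure-table sketch along those lines (the names and the PDO-style :new/:parent placeholders are assumptions), including the zero-length paths:
CREATE TABLE tree_paths (
    ancestor   INT UNSIGNED NOT NULL,
    descendant INT UNSIGNED NOT NULL,
    PRIMARY KEY (ancestor, descendant)
) ENGINE=InnoDB;

-- Adding node :new under :parent: copy every path that ends at the parent,
-- then add the zero-length path for the new node itself
INSERT INTO tree_paths (ancestor, descendant)
SELECT ancestor, :new FROM tree_paths WHERE descendant = :parent
UNION ALL
SELECT :new, :new;

-- All children (direct and indirect) of family member 2
SELECT descendant FROM tree_paths WHERE ancestor = 2 AND descendant <> 2;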
Soon I'll be working on a catalog (PHP + MySQL) that will have multilanguage content support, and now I'm considering the best approach to designing the database structure. At the moment I see 3 ways to handle multiple languages:
1) Having separate tables for each language's specific data, i.e. schematically it'll look like this:
There will be one table, Main_Content_Items, storing basic data that cannot be translated, like ID, creation_date, hits, votes and so on; there will be only one of it and it will serve all languages.
And here are the tables that will be duplicated for each language:
Common_Data_LANG table (example: common_data_en_us), storing common/"static" fields that can be translated but are present for every catalog item: title, desc and so on...
Extra_Fields_Data_LANG table, storing extra-field data that can be translated but can differ between custom item groups, i.e.: | id | item_id | field_type | value | ...
Then on an item request we look in the table for the user's/default language and join the translatable data with the main_content table.
Pros:
we can update the "main" data (i.e. hits, votes...) that is updated most often with only one query
we don't need to duplicate data 4x or more if we have 4 or more languages, compared with a structure using only one table with a 'lang' field, so MySQL queries go through a catalog of 100,000 records (for example) rather than 400,000 or more
Cons:
+2 tables for each language
2) Using 'lang' field in content tables:
Main_Content_Items table (storing basic data that cannot be translated, like ID, creation_date, hits, votes and so on...)
Common_Data table (storing common/"static" fields that can be translated but are present for every catalog item: | id | item_id | lang | title | desc | and so on...)
Extra_Fields_Data table (storing extra-field data that can be translated but can differ between custom item groups, i.e.: | id | item_id | lang | field_type | value | ...)
So we'll join common_data and extra_fields to main_content_items according to the 'lang' field.
Pros:
we can update the "main" data (i.e. hits, votes...) that is updated most often with only one query
we need only 3 tables for content data
Cons:
the common_data and extra_fields tables are filled with data for all languages, so they are X times bigger and queries run slower
3) The same as the 2nd way, but with the Main_Content_Items table merged into Common_Data, which has the 'lang' field:
Pros:
...?
Cons:
we need to update the "main" data (i.e. hits, votes...) that is updated most often once for every language
the common_data and extra_fields tables are filled with data for all languages, so they are X times bigger and queries run slower
I will be glad to hear suggestions about "what is better" and "why". Or are there better ways?
Thanks in advance...
I've given a similar answer in this question and highlighted the advantages of this technique (it would, for example, be important for me to let the application decide on the language and build the query accordingly by only changing the lang parameter in the WHERE clause of the SQL query).
This gets pretty close to your second solution. I didn't quite get the "extra_fields" part, but if it makes sense, you could(!) merge it into the common_data table. I would advise you against the first idea, since there will be too many tables and it can be easy to lose track of the items in them.
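A sketch of the per-request query for the second approach, using the table names from the question; only the lang value changes per language:
SELECT m.id, m.creation_date, m.hits, m.votes,
       c.title, c.`desc`
FROM Main_Content_Items m
JOIN Common_Data c ON c.item_id = m.id
WHERE m.id = ?
  AND c.lang = ?;  -- e.g. 'en_us'; the only part that varies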
To your edit: I still consider the second approach the better one (it's my opinion, so it's relative ;)). I'm no expert on optimization, but I think that with proper indexes and a proper table structure, speed should not be a problem. As always, the best way to find the most effective method is to implement both and see which is faster, since speed will vary with the data, the structure, ...