Arrays vs Tables (PHP/MySQL) - php

Meet Jimmy. He has made his new life goal to prove that Chocolate is the best ice cream flavor ever. For this he built a simple form with radio buttons and a text field for the name so he can send the link to his friends.
He is using a very common setup, MySQL and PHP, to save the form submissions in a table that looks like this:
selection being the id of the flavor. The flavors are stored in a PHP array because he plans to use the flavor list in future pages:
$flavors = array(
1=>"Chocolate",
2=>"Cherry",
....
);
The form was a success and his friends are starting to ask Jimmy to add new options, so Jimmy has decided to take it to the next level and add country, age, email and other things to the form. This time, though, he is unsure whether it is better to put the flavor names, countries, ages and other static data in arrays or to save each of them in a database table. He knows how to do joins in queries anyway.
The first approach would mean having a PHP file with many arrays and having to access it every time Jimmy needs the flavor name, something like:
$query = mysql_query("SELECT name, flavor FROM votes");
while($row = mysql_fetch_assoc($query)){
echo $row["name"]." - ".$flavors[$row["flavor"]];
}
The second approach would mean having many tables in the database and having to do a join every time he needs a name, like this:
// assumes a flavors table with columns id and flavor (the flavor name)
$query = mysql_query("SELECT votes.name, flavors.flavor FROM votes LEFT JOIN flavors
ON votes.flavor = flavors.id");
while($row = mysql_fetch_assoc($query)){
echo $row["name"]." - ".$row["flavor"];
}
Although there seems to be little difference this is an important decision for Jimmy as he wants to build many more and bigger forms in the future.
What is the best way for Jimmy to handle static data like flavor names, countries, age groups, etc. that is associated with IDs in the database?
Given environmental details:
The arrays are static and will almost never change
He will be using the data on several pages so hard coding is not convenient
Adding a new array is usually faster
Thanks in advance for helping him out.

The second option is of course more scalable and better in almost every aspect. The only argument could be performance, given that his data is eventually going to get really big. But even at that point Jimmy can easily cache the result from the new flavors table, using a technology like memcache or XCache, or even write code that creates the PHP file with the array of flavors dynamically from the flavors database. I am very confused as to why someone with your reputation would ask such a question?!
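The last idea above, generating the PHP file from the flavors table, could look roughly like the following sketch. It assumes the rows have already been fetched from a flavors table with id and name columns (hard-coded here so the example runs without a database); the helper name writeFlavorCache and the cache file location are illustrative, not part of the original answer.

```php
<?php
// Rows as they might come back from "SELECT id, name FROM flavors".
// Hard-coded (assumption) so the sketch runs without a database.
$rows = array(
    array('id' => 1, 'name' => 'Chocolate'),
    array('id' => 2, 'name' => 'Cherry'),
);

// Hypothetical helper: writes a PHP file that returns an id => name array.
function writeFlavorCache(array $rows, $path)
{
    $flavors = array();
    foreach ($rows as $row) {
        $flavors[(int) $row['id']] = $row['name'];
    }
    // var_export() emits valid PHP source for the array.
    $code = "<?php\nreturn " . var_export($flavors, true) . ";\n";
    file_put_contents($path, $code, LOCK_EX);
}

// Regenerate the file whenever the flavors table changes...
$cacheFile = sys_get_temp_dir() . '/flavors_cache.php';
writeFlavorCache($rows, $cacheFile);

// ...and load it on every page without touching the database.
$flavors = include $cacheFile;
echo $flavors[1], "\n"; // Chocolate
```

The pages keep using $flavors exactly as in the first approach, while the table remains the single place where flavors are edited.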

I think Jimmy should be thinking about how he wants to lay his data out. Why limit his conquest to flavors?
If Jimmy is looking to store a lot of small bits of data, databases are the way to go. If Jimmy wants to store images of the items, those should be stored in files, and he should store their location relative to some root directory in the database.
Maybe one table can contain:
VOTE_ITEMS
ID - PRIMARY KEY
NAME
IMAGE
TAGS - (Maybe an imploded ID array with the IDs pointing to a TAG table)
...
Another table can contain:
USERS
ID - PRIMARY KEY
...
(As much information as you want to collect from your users)
...
On to voting:
POLLS
ID
VOTE_ITEM_IDS
...
USER_VOTES
POLLS_ID
VOTE_ITEM
USER_ID
Since Jimmy seems to know a lot about databases, anytime he wants to add something he can just add another column (or table), depending on his needs. Also, if he wraps up a sweet user system, he can reuse it in other projects in the future!

I tend to store values like this in db tables, mainly so they can be modified via CMS.
Then I retrieve them all at once, only once, near the beginning of my PHP code, in a globals array ... e.g. $glob['flavors'], $glob['cities'], etc. Then it's as simple as ...
foreach ($people as $person) {
echo 'Their flavor = '. $glob['flavors'][$person['flavor_id']];
}
... but you have to remember to include the global in any functions that will use it.
Benefits of this: Only one db lookup, global access.
Drawbacks of this: Memory hog if array is huge
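A minimal sketch of this pattern follows, assuming the lookup rows were fetched once at startup (hard-coded here so it runs without a database). The helper flavorName and the sample people are hypothetical; the point is the global declaration that every function using the array needs.

```php
<?php
// Lookup rows as they might be fetched once at startup (hard-coded assumption).
$flavorRows = array(
    array('id' => 1, 'name' => 'Chocolate'),
    array('id' => 2, 'name' => 'Cherry'),
);

// Build the globals array once: id => name for each lookup table.
$glob = array('flavors' => array());
foreach ($flavorRows as $row) {
    $glob['flavors'][$row['id']] = $row['name'];
}

// Hypothetical helper: functions must import $glob explicitly.
function flavorName($flavorId)
{
    global $glob;
    return isset($glob['flavors'][$flavorId]) ? $glob['flavors'][$flavorId] : 'Unknown';
}

$people = array(
    array('name' => 'Ann', 'flavor_id' => 2),
    array('name' => 'Ben', 'flavor_id' => 9), // id with no matching flavor
);
foreach ($people as $person) {
    echo $person['name'] . ' - ' . flavorName($person['flavor_id']) . "\n";
}
```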

Related

Best way to create nested array from tables: multiple queries/loops VS single query/loop style

Say I have 2 tables,
which I can "merge" and represent in a single nested array.
I'm wondering what would be the best way to do that, considering:
efficiency
best-practice
DB/server-side usage trade-off
what you should do in real life
same case for 3, 4 or more tables that can be "merged" that way
The question is about ANY server-side/relational-db.
2 simple ways I was thinking about
(if you have others, please suggest!
notice I'm asking for a simple SERVER-SIDE and RELATIONAL-DB,
so please don't waste your time explaining why I shouldn't
use this kind of DB, use MVC design, etc., etc. ...):
2 loops, 5 simple "SELECT" queries
1 loop, 1 "JOIN" query
I've tried to give a simple and detailed example,
in order to explain myself & understand better your answers
(though how to write the code and/or
finding possible mistakes is not the issue here,
so try not to focus on that...)
SQL SCRIPTS FOR CREATING AND INSERTING DATA TO TABLES
CREATE TABLE persons
(
id int NOT NULL AUTO_INCREMENT,
fullName varchar(255),
PRIMARY KEY (id)
);
INSERT INTO persons (fullName) VALUES ('Alice'), ('Bob'), ('Carl'), ('Dan');
CREATE TABLE phoneNumbers
(
id int NOT NULL AUTO_INCREMENT,
personId int,
phoneNumber varchar(255),
PRIMARY KEY (id)
);
INSERT INTO phoneNumbers (personId, phoneNumber) VALUES ( 1, '123-456'), ( 1, '234-567'), (1, '345-678'), (2, '456-789'), (2, '567-890'), (3, '678-901'), (4, '789-012');
A JSON REPRESENTATION OF THE TABLES AFTER I "MERGED" THEM:
[
{
"id": 1,
"fullName": "Alice",
"phoneNumbers": [
"123-456",
"234-567",
"345-678"
]
},
{
"id": 2,
"fullName": "Bob",
"phoneNumbers": [
"456-789",
"567-890"
]
},
{
"id": 3,
"fullName": "Carl",
"phoneNumbers": [
"678-901"
]
},
{
"id": 4,
"fullName": "Dan",
"phoneNumbers": [
"789-012"
]
}
]
PSEUDO CODE FOR 2 WAYS:
1.
query: "SELECT id, fullName FROM persons"
personList = new List<Person>()
foreach row x in query result:
    current = new Person(x.fullName)
    "SELECT phoneNumber FROM phoneNumbers WHERE personId = x.id"
    foreach row y in query result:
        current.phoneNumbers.Push(y.phoneNumber)
    personList.Push(current)
print personList
2.
query: "SELECT persons.id, fullName, phoneNumber FROM persons
        LEFT JOIN phoneNumbers ON persons.id = phoneNumbers.personId
        ORDER BY persons.id"
personList = new List<Person>()
current = null
previousId = null
foreach row x in query result:
    if ( x.id != previousId )
        if ( current != null )
            personList.Push(current)
        current = new Person(x.fullName)
        previousId = x.id
    if ( x.phoneNumber != null )
        current.phoneNumbers.Push(x.phoneNumber)
if ( current != null )
    personList.Push(current)
print personList
CODE IMPLEMENTATION IN PHP/MYSQL:
1.
/* get all persons */
$result = mysql_query("SELECT id, fullName FROM persons");
$personsArray = array(); // final result
// loop over all persons
while ($row = mysql_fetch_assoc($result))
{
    // add new person
    $current = array();
    $current['id'] = $row['id'];
    $current['fullName'] = $row['fullName'];
    /* add all of this person's phone-numbers (one extra query per person) */
    $id = (int) $current['id'];
    $sub_result = mysql_query("SELECT phoneNumber FROM phoneNumbers WHERE personId = {$id}");
    $phoneNumbers = array();
    while ($sub_row = mysql_fetch_assoc($sub_result))
    {
        $phoneNumbers[] = $sub_row['phoneNumber'];
    }
    // add phoneNumbers array to person
    $current['phoneNumbers'] = $phoneNumbers;
    // add person to final result array
    $personsArray[] = $current;
}
echo json_encode($personsArray);
2.
/* get all persons and their phone-numbers in a single query */
// ORDER BY keeps each person's rows adjacent, which the loop below relies on
$sql = "SELECT persons.id, fullName, phoneNumber FROM persons
LEFT JOIN phoneNumbers ON persons.id = phoneNumbers.personId
ORDER BY persons.id";
$result = mysql_query($sql);
$personsArray = array();
/* init temp vars to save current person's data */
$current = null;
$previousId = null;
$phoneNumbers = array();
while ($row = mysql_fetch_assoc($result))
{
    /*
    if the current id is different from the previous id:
    you've got to a new person.
    save the previous person (if such exists),
    and create a new one
    */
    if ($row['id'] != $previousId)
    {
        // in the first iteration,
        // current (previous person) is null,
        // don't add it
        if (!is_null($current))
        {
            $current['phoneNumbers'] = $phoneNumbers;
            $personsArray[] = $current;
            $phoneNumbers = array();
        }
        // create a new person
        $current = array();
        $current['id'] = $row['id'];
        $current['fullName'] = $row['fullName'];
        // set current as previous id
        $previousId = $current['id'];
    }
    // add the phone-number to the current phone-number list;
    // the LEFT JOIN yields NULL for a person without phone-numbers
    if (!is_null($row['phoneNumber']))
    {
        $phoneNumbers[] = $row['phoneNumber'];
    }
}
// don't forget to add the last person (saved in "current")
if (!is_null($current))
{
    $current['phoneNumbers'] = $phoneNumbers;
    $personsArray[] = $current;
}
echo json_encode($personsArray);
P.S.
this link is an example of a different question here, where i tried to suggest the second way: tables to single json
Preliminary
First, thank you for putting that much effort into explaining the problem, and for the formatting. It is great to see someone who is clear about what they are doing, and what they are asking.
But it must be noted that that, in itself, forms a limitation: you are fixed on the notion that this is the correct solution, and that with some small correction or guidance, this will work. That is incorrect. So I must ask you to give that notion up, to take a big step back, and to view (a) the whole problem and (b) my answer without that notion.
The context of this answer is:
all the explicit considerations you have given, which are very important, which I will not repeat
the two most important of which are: what is best practice, and what I would do in real life
This answer is rooted in Standards, the higher order of, or frame of reference for, best practice. This is what the commercial Client/Server world does, or should be doing.
This issue, this whole problem space, is becoming a common problem. I will give a full consideration here, and thus answer another SO question as well. Therefore it might contain a tiny bit more detail than you require. If it does, please forgive this.
Consideration
The database is a server-based resource, shared by many users. In an online system, the database is constantly changing. It contains that One Version of the Truth (as distinct from One Fact in One Place, which is a separate, Normalisation issue) of each Fact.
the fact that some database systems do not have a server architecture, and that therefore the notion of server in such software is false and misleading, are separate but noted points.
As I understand it, JSON and JSON-like structures are required for "performance reasons", precisely because the "server" doesn't, cannot, perform as a server. The concept is to cache the data on each (every) client, such that you are not fetching it from the "server" all the time.
This opens up a can of worms. If you do not design and implement this properly, the worms will overrun the app.
Such an implementation is a gross violation of the Client/Server Architecture, which allows simple code on both sides, and appropriate deployment of software and data components, such that implementation times are small, and efficiency is high.
Further, such an implementation requires a substantial implementation effort, and it is complex, consisting of many parts. Each of those parts must be appropriately designed.
The web, and the many books written in this subject area, provide a confusing mix of methods, marketed on the basis of supposed simplicity; ease; anyone-can-do-anything; freeware-can-do-anything; etc. There is no scientific basis for any of those proposals.
Non-architecture & Sub-standard
As evidenced, you have learned that some approaches to database design are incorrect. You have encountered one problem, one instance in which that advice is false. As soon as you solve this one problem, the next problem, which is not apparent to you right now, will be exposed. The notions are a never-ending set of problems.
I will not enumerate all the false notions that are sometimes advocated. I trust that as you progress through my answer, you will notice that one after the other marketed notion is false.
The two bottom lines are:
The notions violate Architecture and Design Standards, namely Client/Server Architecture; Open Architecture; Engineering Principles; and to a lesser extent in this particular problem, Database Design Principles.
Which leads to people like you, who are trying to do an honest job, being tricked into implementing simple notions, which turn into massive implementations. Implementations that will never quite work, so they require substantial ongoing maintenance, and will eventually be replaced, wholesale.
Architecture
The central principle being violated is, never duplicate anything. The moment you have a location where data is duplicated (due to caching or replication or two separate monolithic apps, etc), you create a duplicate that will go out of synch in an online situation. So the principle is to avoid doing that.
Sure, for serious third-party software, such as a gruntly report tool, by design, they may well cache server-based data in the client. But note that they have put hundreds of man-years into implementing it correctly, with due consideration to the above. Yours is not such a piece of software.
Rather than providing a lecture on the principles that must be understood, or the evils and costs of each error, the rest of this answer provides the requested what would you do in real life, using the correct architectural method (a step above best practice).
Architecture 1
Do not confuse
the data which must be Normalised
with
the result set, which, by definition, is the flattened ("de-normalised" is not quite correct) view of the data.
The data, given that it is Normalised, will not contain duplicate values; repeating groups. The result set will contain duplicate values; repeating groups. That is pedestrian.
Note that the notion of Nested Sets (or Nested Relations), which is in my view not good advice, is based on precisely this confusion.
For forty-five years since the advent of the RM, they have been unable to differentiate base relations (for which Normalisation does apply) from derived relations (for which Normalisation does not apply).
Two of these proponents are currently questioning the definition of First Normal Form. 1NF is the foundation of the other NFs, if the new definition is accepted, all the NFs will be rendered value-less. The result would be that Normalisation itself (sparsely defined in mathematical terms, but clearly understood as a science by professionals) will be severely damaged, if not destroyed.
Architecture 2
There is a centuries-old scientific or engineering principle, that content (data) must be separated from control (program elements). This is because the analysis, design, and implementation of the two are completely different. This principle is no less important in the software sciences, where it has specific articulation.
In order to keep this brief (ha ha), instead of a discourse, I will assume that you understand:
That there is a scientifically demanded boundary between data and program elements. Mixing them up results in complex objects that are error-prone and hard to maintain.
The confusion of this principle has reached epidemic proportions in the OO/ORM world, the consequences reach far and wide.
Only professionals avoid this. For the rest, the great majority, they accept the new definition as "normal", and they spend their lives fixing problems that we simply do not have.
The architectural superiority, the great value, of data being both stored and presented in Tabular Form per Dr E F Codd's Relational Model. That there are specific rules for Normalisation of data.
And importantly, you can determine when the people, who write and market books, advise non-relational or anti-relational methods.
Architecture 3
If you cache data on the client:
Cache the absolute minimum.
That means cache only the data that does not change in the online environment. That means Reference and Lookup tables only, the tables that populate the higher level classifiers, the drop-downs, etc.
Currency
For every table that you do cache, you must have a method of (a) determining that the cached data has become stale, compared to the One Version of the Truth which exists on the server, and (b) refreshing it from the server, (c) on a table-by-table basis.
Typically, this involves a background process that executes every (d) five minutes, that queries the MAX updated DateTime for each cached table on the client vs the DateTime on the server, and if changed, refreshes the table, and all its child tables, those that depend on the changed table.
That, of course, requires that you have an UpdatedDateTime column on every table. That is not a burden, because you need that for OLTP ACID Transactions anyway (if you have a real database, instead of a bunch of sub-standard files).
Which really means, never replicate, the coding burden is prohibitive.
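The currency check described above can be sketched as a comparison of per-table timestamps. The table names, the timestamp values, and the helper staleTables are assumptions made for illustration; in practice the server-side values would come from something like SELECT MAX(UpdatedDateTime) per table, and the refresh of dependent child tables is not handled here.

```php
<?php
// Last-refresh stamps held by the client cache (illustrative values).
$cachedStamps = array(
    'flavors'   => '2024-01-01 10:00:00',
    'countries' => '2024-01-01 10:00:00',
);

// Latest UpdatedDateTime per table as reported by the server
// (hard-coded assumption; normally queried from the database).
$serverStamps = array(
    'flavors'   => '2024-01-01 10:00:00',
    'countries' => '2024-01-02 08:30:00',
);

// Hypothetical helper: returns the cached tables that need a refresh.
function staleTables(array $cachedStamps, array $serverStamps)
{
    $stale = array();
    foreach ($cachedStamps as $table => $cachedAt) {
        // A table is stale if the server holds a newer row than the cache.
        if (isset($serverStamps[$table])
            && strtotime($serverStamps[$table]) > strtotime($cachedAt)) {
            $stale[] = $table;
        }
    }
    return $stale;
}

print_r(staleTables($cachedStamps, $serverStamps));
```

Even this small check has to run for every cached table on a schedule, which gives a sense of the coding burden the answer refers to.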
Architecture 4
In the sub-commercial, non-server world, I understand that some people advise the reverse caching of "everything".
That is the only way that programs like PostgreSQL can be used in a multi-user system.
You always get what you pay for: you pay peanuts, you get monkeys; you pay zero, you get zero.
The corollary to Architecture 3 is, if you do cache data on the client, do not cache tables that change frequently. These are the transaction and history tables. The notion of caching such tables, or all tables, on the client is completely bankrupt.
In a genuine Client/Server deployment, due to use of applicable standards, for each data window, the app should query only the rows that are required, for that particular need, at that particular time, based on context or filter values, etc. The app should never load the entire table.
If the same user using the same window inspected its contents, 15 minutes after the first inspection, the data would be 15 mins out of date.
For freeware/shareware/vapourware platforms, which define themselves by the absence of a server architecture, and thus by the result, that performance is non-existent, sure, you have to cache more than the minimum tables on the client.
If you do that, you must take all the above into account, and implement it correctly, otherwise your app will be broken, and the ramifications will drive the users to seek your termination. If there is more than one user, they will have the same cause, and soon form an army.
Architecture 5
Now we get to how you cache those carefully chosen tables on the client.
Note that databases grow, they are extended.
If the system is broken, a failure, it will grow in small increments, and require a lot of effort.
If the system is even a small success, it will grow exponentially.
If the system (each of the database, and the app, separately) is designed and implemented well, the changes will be easy, the bugs will be few.
Therefore, all the components in the app must be designed properly, to comply with applicable standards, and the database must be fully Normalised. This in turn minimises the effect of changes in the database, on the app, and vice versa.
The app will consist of simple, not complex, objects, which are easy to maintain and change.
For the data that you do cache on the client, you will use arrays of some form: multiple instances of a class in an OO platform; DataWindows (TM, google for it) or similar in a 4GL; simple arrays in PHP.
(Aside. Note that what people in situations such as yours produce in one year, professional providers produce in far less time, using a commercial SQL platform, a commercial 4GL, and complying with Architecture and Standards.)
Architecture 6
So let's assume that you understand all the above, and appreciate its value, particularly Architecture 1 & 2.
If you don't, please stop here and ask questions, do not proceed to the below.
Now that we have established the full context, we can address the crux of your problem.
In those arrays in the app, why on Earth would you store flattened views of data ?
and consequently mess with, and agonise over, the problems
instead of storing copies of the Normalised tables ?
Answer
Never duplicate anything that can be derived. That is an Architectural Principle, not limited to Normalisation in a database.
Never merge anything.
If you do, you will be creating:
data duplication, and masses of it, on the client. The client will not only be fat and slow, it will be anchored to the floor with the ballast of duplicated data.
additional code, which is completely unnecessary
complexity in that code
code that is fragile, that will constantly have to change.
That is the precise problem you are suffering, a consequence of the method, which you know intuitively is wrong, that there must be a better way. You know it is a generic and common problem.
Note also that method, that code, constitutes a mental anchor for you. Look at the way that you have formatted it and presented it so beautifully: it is of importance to you. I am reluctant to inform you of all this.
Which reluctance is easily overcome, due to your earnest and forthright attitude, and the knowledge that you did not invent this method
In each code segment, at presentation time, as and when required:
a. In the commercial Client/Server context
Execute a query that joins the simple, Normalised, unduplicated tables, and retrieves only the qualifying rows. Thereby obtaining current data values. The user never sees stale data. Here, Views (flattened views of Normalised data) are often used.
b. In the sub-commercial non-server context
Create a temporary result-set array, and join the simple, unduplicated, arrays (copies of tables that are cached), and populate it with only the qualifying rows, from the source arrays. The currency of which is maintained by the background process.
Use the Keys to form the joins between the arrays, in exactly the same way that Keys are used to form the joins in the Relational tables in the database.
Destroy those components when the user closes the window.
A clever version would eliminate the result-set array, and join the source arrays via the Keys, and limit the result to the qualifying rows.
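As a rough illustration of option b, the sketch below joins two cached arrays via their keys and keeps only the qualifying rows. The sample values come from the question; the helper joinPersonPhones and the idea of filtering by a list of person ids are assumptions, and persons without phone numbers are left out (an inner join).

```php
<?php
// Cached copies of the Normalised tables (sample values from the question).
$persons = array(
    1 => array('id' => 1, 'fullName' => 'Alice'),
    2 => array('id' => 2, 'fullName' => 'Bob'),
);
$phoneNumbers = array(
    array('personId' => 1, 'phoneNumber' => '123-456'),
    array('personId' => 1, 'phoneNumber' => '234-567'),
    array('personId' => 2, 'phoneNumber' => '456-789'),
);

// Hypothetical helper: joins the two arrays on the key and keeps only
// the rows for the requested person ids (the "qualifying rows").
function joinPersonPhones(array $persons, array $phoneNumbers, array $wantedIds)
{
    $resultSet = array();
    foreach ($phoneNumbers as $phone) {
        $pid = $phone['personId'];
        if (in_array($pid, $wantedIds, true) && isset($persons[$pid])) {
            // One flattened row per phone number; person details repeat.
            $resultSet[] = array(
                'id'          => $pid,
                'fullName'    => $persons[$pid]['fullName'],
                'phoneNumber' => $phone['phoneNumber'],
            );
        }
    }
    return $resultSet;
}

// The temporary result set lives only as long as this loop needs it.
foreach (joinPersonPhones($persons, $phoneNumbers, array(1)) as $row) {
    echo $row['id'] . ' ' . $row['fullName'] . ' ' . $row['phoneNumber'] . "\n";
}
```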
Separate to being architecturally incorrect, Nested Arrays or Nested Sets or JSON or JSON-like structures are simply not required. This is the consequence of confusing the Architecture 1 Principle.
If you do choose to use such structures, then use them only for the temporary result-set arrays.
Last, I trust this discourse demonstrates that n tables is a non-issue. More important, that m levels deep in the data hierarchy, the "nesting", is a non-issue.
Answer 2
Now that I have given the full context (and not before), the implications in your question are removed, and it becomes a generic, kernel one.
The question is about ANY server-side/relational-db. [Which is better]:
2 loops, 5 simple "SELECT" queries
1 loop, 1 "JOIN" query
The detailed examples you have given are not accurately described above. The accurate description is:
Your Option 1
2 loops, each loop for loading each array
1 single-table SELECT query per loop
(executed n x m times ... the outermost loop, only, is a single execution)
Your Option 2
1 Joined SELECT query executed once
followed by 2 loops, each loop for loading each array
For the commercial SQL platforms, neither, because it does not apply.
The commercial SQL server is a set-processing engine. Use one query with whatever joins are required, that returns a result set. Never step through the rows using a loop, that reduces the set-processing engine to a pre-1970's ISAM system. Use a View, in the server, since it affords the highest performance and the code is in one place.
However, for the non-commercial, non-server platforms, where:
your "server" is not a set-processing engine ie. it returns single rows, therefore you have to fetch each row and fill the array, manually or
your "server" does not provide Client/Server binding, ie. it does not provide facilities on the client to bind the incoming result set to a receiving array, and therefore you have to step through the returned result set, row by row, and fill the array, manually,
as per your example then, the answer is, by a large margin, your option 2.
Please consider carefully, and comment or ask questions.
Response to Comment
Say I need to print this json (or other html page) to some STDOUT (example: an http response to: GET /allUsersPhoneNumbers. It's just an example to clarify what I'm expecting to get), should return this json. I have a php function that got this 2 result sets (1). now it should print this json - how should I do that? this report could be an employee month salary for a whole year, and so on. one way or another, I need to gather this information and represent it in a "JOIN"ed representation
Perhaps I was not clear enough.
Basically, do not use JSON unless you absolutely have to. Which means sending to some system that requires it, which means that the receiving system, and that demand, is stupid.
Make sure that your system doesn't make such demands on others.
Keep your data Normalised. Both in the database, and in whatever program elements that you write. That means (in this example) use one SELECT per table or array. That is for loading purposes, so that you can refer to and inspect them at any point in the program.
When you need a join, understand that it is:
a result-set; a derived relation; a view
therefore temporary, it exists for the duration of the execution of that element, only
a. For tables, join them in the usual manner, via Keys. One query, joining two (or more) tables.
b. For arrays, join arrays in the program, the same way you join tables in the database, via Keys.
For the example you have given, which is a response to some request, first understand that it is the category [4], and then fulfil it.
Why even consider JSON?
What has JSON got to do with this?
JSON is misunderstood and people are interested in the wow factor. It is a solution looking for a problem. Unless you have that problem it has no value.
Check these two links:
Copter - What is JSON
Stack Overflow - What is JSON
Now if you understand that, it is mostly for incoming feeds. Never for outgoing. Further, it requires parsing, deconstructing, etc, before it can be used.
Recall:
I need to gather this information and represent it in a "JOIN"ed representation
Yes. That is pedestrian. Joined does not mean JSONed.
In your example, the receiver is expecting a flattened view (eg. spreadsheet), with all the cells filled, and yes, for Users with more than one PhoneNumber, their User details will be repeated on the second and subsequent result-set rows. For any kind of print, eg. for debugging, I want a flattened view. It is just a:
SELECT ... FROM Person JOIN PhoneNumber
And return that. Or if you fulfil the request from arrays, join the Person and PhoneNumber Arrays, which may require a temporary result-set array, and return that.
please don't tell me you should get only 1 user at a time, etc. etc.
Correct. If someone tells you to regress to procedural processing (ie. row by row, in a WHILE loop), where the engine or your program has set processing (ie. processes an entire set in one command), that marks them as someone who should not be listened to.
I have already stated, your Option 2 is correct, Option 1 is incorrect. That is as far as the GET or SELECT is concerned.
On the other hand, for programming languages that do not have set-processing capability (ie. cannot print/set/inspect an array in a single command), or "servers" that do not provide client-side array binding, you do have to write loops, one loop per depth of the data hierarchy (in your example, two loops, one for Person, and one for PhoneNumber per User).
You have to do that to parse an incoming JSON object.
You have to do that to load each array from the result set that is returned in your Option 2.
You have to do that to print each array from the result set that is returned in your Option 2.
Response to Comment 2
I meant that I have to return a result represented in a nested version (let's say I'm printing the report to the page), json was just an example for such representation.
I don't think you understand the reasoning and the conclusions I have provided in this answer.
For printing and displaying, never nest. Print a flattened view, the rows returned from the SELECT per Option 2. That is what we have been doing, when printing or displaying data Relationally, for 31 years. It is easier to read, debug, search, find, fold, staple, mutilate. You cannot do anything with a nested array, except look at it, and say gee that is interesting.
Code
Caveat
I would prefer to take your code and modify it, but actually, looking at your code, it is not well written or structured, it cannot be reasonably modified. Second, if I use that, it would be a bad teaching tool. So I will have to give you fresh, clean code, otherwise you will not learn the correct methods.
These code examples follow my advice, so I am not going to repeat it. And this is way beyond the original question.
Query & Print
Your request, using your Option 2. One SELECT executed once. Followed by one loop. Which you can "pretty up" if you like.
In general it is a best practice to grab the data you need in as few trips to the database as possible then map the data into the appropriate objects. (Option 2)
But, to answer your question I would ask yourself what the use case for your data is. If you know for sure that you will be needing your person and your phone number data then I would say the second method is your best option.
However, option one can also have its use case when the joined data is optional. One example of this could be that on the UI you have a table of all your people, and if a user wants to see the phone numbers for a particular person they have to click on that person. Then it would be acceptable to "lazy-load" all of the phone numbers.
This is a common problem, especially if you are creating Web APIs; converting those table sets to nested arrays is a big deal.
I always go for the second option (with a slightly different method, though), because the first is the worst possible way to do it. One thing I learned from experience is never to query inside a loop; that is a waste of DB calls. Well, you know what I am trying to say.
Although I don't accept all the things PerformanceDBA said, there are two major points I need to address:
1. Don't have duplicate data
2. Fetch only the data you want
The only problem I see in joining the tables is that we end up duplicating lots of data. Take your data for example: joining the persons and phoneNumbers tables, we end up duplicating every person for each of their phone numbers. For two tables with a few hundred rows that is fine, but imagine we need to merge 5 tables with thousands of rows: it is huge.
So here's my solution:
Query:
SELECT id, fullName From Person;
SELECT personId, phoneNumber FROM phoneNumbers
WHERE personId IN (SELECT id From Person);
So I get two tables in my result set. Now I assign Table[0] to my Person list,
and use 2 loops to place the right phoneNumbers into the right person...
Code:
personList = ConvertToEntity<List<Person>>(dataset.Table[0]);
pnoList = ConvertToEntity<List<PhoneNumber>>(dataset.Table[1]);
foreach (person in personList) {
    foreach (pno in pnoList) {
        if (pno.PersonId == person.Id)
            person.PhoneNumbers.Add(pno);
    }
}
I think the above method reduces a lot of duplication and only gets me what I wanted. If there is any downside to the above method, please let me know... and thanks for asking this kind of question.
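One possible downside of the nested loops above is that every person is compared with every phone number. A PHP sketch of the same two-result-set idea, indexing the persons by id so that each phone number is placed with a single lookup, might look like this. The rows are hard-coded stand-ins for the two query results, and nestPhoneNumbers is a hypothetical helper name.

```php
<?php
// Stand-ins for the two result sets (sample values from the question).
$personRows = array(
    array('id' => 1, 'fullName' => 'Alice'),
    array('id' => 2, 'fullName' => 'Bob'),
    array('id' => 3, 'fullName' => 'Carl'),
);
$phoneRows = array(
    array('personId' => 1, 'phoneNumber' => '123-456'),
    array('personId' => 1, 'phoneNumber' => '234-567'),
    array('personId' => 3, 'phoneNumber' => '678-901'),
);

// Hypothetical helper: builds the nested structure in two linear passes.
function nestPhoneNumbers(array $personRows, array $phoneRows)
{
    // Pass 1: index persons by id, each with an empty phone list.
    $byId = array();
    foreach ($personRows as $row) {
        $byId[$row['id']] = array(
            'id'           => $row['id'],
            'fullName'     => $row['fullName'],
            'phoneNumbers' => array(),
        );
    }
    // Pass 2: drop each phone number into its owner's list via the key.
    foreach ($phoneRows as $row) {
        if (isset($byId[$row['personId']])) {
            $byId[$row['personId']]['phoneNumbers'][] = $row['phoneNumber'];
        }
    }
    // Re-index from 0 so json_encode() emits a list, not an object.
    return array_values($byId);
}

echo json_encode(nestPhoneNumbers($personRows, $phoneRows)), "\n";
```

No person data is duplicated in transit, and a person without phone numbers still appears with an empty list.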

Loop through MySQL database until field = 'specified value'

I need some help please! Basically I have a system that has an unlimited amount of categories and the way in which it works is through unique IDs. So basically the system will find the root folder and match all subfolders based on its parent's UID. An endless loop...
But now I want to do the opposite of that in a single MySQL statement (if possible).
Basically I want it to do this.. (By the way this isn't my actual code, it's just how I want it to work)
SELECT UID FROM Table
WHERE UID = 'value'
--AND ALSO:
SELECT * FROM SameTable
WHERE UID = The Parent UID just fetched...
And do this until the UID = 'Specified Value'.
I seriously hope that makes sense!
Is it even possible? I could do it using multiple queries in a PHP loop I know, but that just feels like a long way around, and bad practice.
What you have is called "hierarchical data". It is worth reading up on. In short, there are three main ways to represent it in a 2-dimensional table:
Adjacency list (what you have). You can scarcely do it with a single query.
Materialized path (my favorite). Natural and readable, though not so efficient.
Nested set. The most complicated, yet the most powerful.
You can choose any system you like or stick to your current one. A single query is not a Holy Grail to pursue at any cost.
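For what it is worth, on engines that support recursive common table expressions (SQLite does, and so does MySQL from 8.0 on) the upward walk over an adjacency list can be written as one statement. The sketch below uses Python with SQLite only so that it is self-contained; the table `categories` and the columns `uid` and `parent_uid` are assumptions, since the question does not show the real schema.

```python
import sqlite3

# Climb from a starting row to its parent, then that row's parent, and so on,
# stopping once the row whose uid equals the "specified value" is reached.
WALK_UP_SQL = """
WITH RECURSIVE chain(uid, parent_uid, depth) AS (
    SELECT uid, parent_uid, 0 FROM categories WHERE uid = ?
    UNION ALL
    SELECT c.uid, c.parent_uid, chain.depth + 1
    FROM categories AS c
    JOIN chain ON c.uid = chain.parent_uid
    WHERE chain.uid <> ?
)
SELECT uid FROM chain ORDER BY depth
"""


def walk_up(conn, start_uid, stop_uid):
    """Return the uids from start_uid up to stop_uid (or the root if not found)."""
    return [uid for (uid,) in conn.execute(WALK_UP_SQL, (start_uid, stop_uid))]


def demo_connection():
    """Hypothetical four-level category chain: root > a > b > c."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE categories (uid TEXT PRIMARY KEY, parent_uid TEXT);
        INSERT INTO categories VALUES
            ('root', NULL), ('a', 'root'), ('b', 'a'), ('c', 'b');
        """
    )
    return conn
```

On older MySQL versions without recursive CTEs, the PHP loop from the question or one of the other two representations remains the practical choice.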

MySQL database optimization for 20.000 users or more

I have been looking for some optimization tips, since I'm building an RPG modification that uses MySQL to store data via PHP.
I'm using a single table to store all user information in columns keyed by each user's unique ID, and I have to store a lot of data for each user: weapons and other information.
I'm using explode and implode to store the weapons, for example, in one column of type 'text'. I don't know if that's good practice, and I don't know if I will have performance problems if I get thousands of players doing tons of UPDATE, SELECT, etc. requests.
I read that a junction table may be a better way to store the weapons and all that information, but I don't know if it would retrieve the information any better than the explode method.
I mean, I would store all the weapons in a different table, each weapon with its information (each weapon has several pieces of information, like different columns; I use multiple explodes for that inside the main explode) and the user who owns it, rather than just having them all in one column.
There can be at least 100 items to store per user. I don't know if making 100 records per user in a different table and fetching all of them all the time is better than just fetching the column and using explode.
I also want to improve my skills and knowledge to build the best-performing MySQL database I can.
I hope somebody can tell me something.
Thanks, and sorry for my stupid english grammar.
It is almost always best practice to normalize your table data. There are some exceptions to this rule (especially in very high volume databases), but you probably do not need to worry about those exceptions until you get to the point of first understanding how to properly normalize and index your tables.
Typically, try to arrange your tables in a way that mimics real-world objects and their relations to each other.
So, in your case you have users - that is one table. Each user might have multiple weapons. So, you now have a weapons table. Since multiple different users might have the same weapon and each user might have multiple weapons, you have a many-to-many relationship between them, so you should have a table "users_weapons" or similar that does nothing but relate user id's to weapon id's.
Now say the users can all have armor. So now you add an armor table and a users_armor table (as this is likely many-to-many as well).
Just think through the different aspects of your game and try to understand the relationships between them. Make sure you can model these relationships in database tables before you even bother writing any code to actually implement the functionality.
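The users / weapons / users_weapons arrangement described above can be sketched as follows. This is an illustration in Python with SQLite rather than PHP with MySQL, purely so that it is self-contained; the column names and the `weapons_of` helper are assumptions, not something taken from the asker's game.

```python
import sqlite3

# One table per real-world object, plus a junction table for the
# many-to-many relationship between users and weapons.
SCHEMA = """
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL
);
CREATE TABLE weapons (
    weapon_id   INTEGER PRIMARY KEY,
    weapon_name TEXT NOT NULL
);
CREATE TABLE users_weapons (
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    weapon_id INTEGER NOT NULL REFERENCES weapons(weapon_id),
    PRIMARY KEY (user_id, weapon_id)
);
"""


def weapons_of(conn, user_id):
    """List the weapon names owned by one user via the junction table."""
    rows = conn.execute(
        """
        SELECT w.weapon_name
        FROM users_weapons AS uw
        JOIN weapons AS w ON w.weapon_id = uw.weapon_id
        WHERE uw.user_id = ?
        ORDER BY w.weapon_name
        """,
        (user_id,),
    )
    return [name for (name,) in rows]
```

The composite primary key on users_weapons doubles as the index that makes the per-user lookup cheap.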
Yes, it is better to use several tables instead of one. It's better for DB performance, easier to understand, easier to maintain and simpler to use as well.
Let's suppose that one user has several weapons with multiple features (not unique among all weapons), and that in one place in your game you just need to know the value of one specific feature:
Doing it your way, you'll need to find the user row in the users table, fetch one column and explode it several times before you have your value, and it gets even more complicated if you want to change it and save it back.
A better way is to have one table for user details (login, password, email etc.), another table that keeps user weapons (name of the weapon, maybe an image) and a table holding all the features and special powers of the weapons. You could keep all possible features of all weapons in an extra table as well. This way, if you already know the user id from the users table, you only have to join 2 tables in your SQL query to get the value of a feature of a specific weapon of the user.
Example pseudo schema of tables:
users
    user_id
    user_name
    password
    email
weapons
    weapon_id
    user_id
    weapon_name
    image
weapons_features
    feature_id
    weapon_id
    feature_name
    feature_value
And if you really want to keep some ordered data in a text field in the database, encode it as JSON or serialize it. This way you don't have to explode and implode it!
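A small sketch of why JSON is safer than a hand-rolled delimiter, written in Python for brevity (PHP's json_encode and json_decode play the same role). The weapon fields shown are made up for the example.

```python
import json


def encode_weapons(weapons):
    """Serialize a list of weapon dicts into one string for a text column."""
    return json.dumps(weapons)


def decode_weapons(text):
    """Restore the list of weapon dicts from the stored string."""
    return json.loads(text)


def naive_split(text):
    """What an explode on ',' would do: it breaks on commas inside values."""
    return text.split(",")
```

A weapon called "Sword, long" survives the JSON round trip unchanged, while splitting on commas cuts it into two pieces.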
As everyone has said, you should typically start from a normalized database structure.
If performance is OK, then great, nothing to do.
If not, you can try many different things:
Find and optimize the queries that run slowly.
Denormalize selectively; sometimes joins kill performance.
Change the data access pattern used in the application.
Store data in the file system or use a NoSQL/polyglot persistence solution.

When to use comma-separated values in a DB Column?

OK, I know the technical answer is NEVER.
BUT, there are times when it seems to make things SO much easier with less code and seemingly few downsides, so please hear me out.
I need to build a Table called Restrictions to keep track of what type of users people want to be contacted by and that will contain the following 3 columns (for the sake of simplicity):
minAge
lookingFor
drugs
lookingFor and drugs can contain multiple values.
Database theory tells me I should use a join table to keep track of the multiple values a user might have selected for either of those columns.
But it seems that using comma-separated values makes things so much easier to implement and execute. Here's an example:
Let's say User 1 has the following Restrictions:
minAge => 18
lookingFor => 'Hang Out','Friendship'
drugs => 'Marijuana','Acid'
Now let's say User 2 wants to contact User 1. Well, first we need to see if he fits User 1's Restrictions, but that's easy enough EVEN WITH the comma-separated columns, as such:
First I'd get the Target's (User 1) Restrictions:
SELECT * FROM Restrictions WHERE UserID = 1
Now I just put those into respective variables as-is into PHP:
$targetMinAge = $row['minAge'];
$targetLookingFor = $row['lookingFor'];
$targetDrugs = $row['drugs'];
Now we just check if the SENDER (User 2) fits that simple Criteria:
SELECT COUNT(*)
FROM Users
WHERE
Users.UserID = 2 AND
Users.minAge >= $targetMinAge AND
Users.lookingFor IN ($targetLookingFor) AND
Users.drugs IN ($targetDrugs)
Finally, if COUNT == 1, User 2 can contact User 1, else they cannot.
How simple was THAT? It just seems really easy and straightforward, so what is the REAL problem with doing it this way as long as I sanitize all inputs to the DB every time a user updates their contact restrictions? Being able to use MySQL's IN function and already storing the multiple values in a format it will understand (e.g. comma-separated values) seems to make things so much easier than having to create join tables for every multiple-choice column. And I gave a simplified example, but what if there are 10 multiple choice columns? Then things start getting messy with so many join tables, whereas the CSV method stays simple.
So, in this case, is it really THAT bad if I use comma-separated values?
****ducks****
You already know the answer.
First off, your PHP code isn't even close to working, because it only works if User 2 has a single value in lookingFor or drugs. If either of these columns contains multiple comma-separated values, then IN won't work even if those values are in the exact same order as User 1's values. What do you expect IN to do if the right-hand side has one or more commas?
Therefore, it's not "easy" to do what you want in PHP. It's actually quite a pain and would involve splitting user 2's fields into single values, writing dynamic SQL with many ORs to do the comparison, and then doing an extremely inefficient query to get the results.
Furthermore, the fact that you even need to write PHP code to answer such a relatively simple question about the intersection of two sets means that your design is badly flawed. This is exactly the kind of problem (relational algebra) that SQL exists to solve. A correct design allows you to solve the problem in the database and then simply implement a presentation layer on top in PHP or some other technology.
Do it correctly and you'll have a much easier time.
Suppose User 1 is looking for 'Hang Out','Friendship' and User 2 is looking for 'Friendship','Hang Out'
Your code would not match them up, because 'Friendship','Hang Out' is not in ('Hang Out','Friendship')
That's the real problem here.
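To make the point concrete, here is a minimal sketch in Python with SQLite (chosen only so it runs without a server). It first shows that IN compares the whole stored string against each listed value, and then shows the join-table alternative, where the overlap between two users is a plain self-join. The table `user_looking_for` and both helper functions are hypothetical, and "shares at least one value" is an assumed matching rule, since the question does not define it precisely.

```python
import sqlite3


def csv_in_matches(conn, stored_value):
    """Evaluate: stored_value IN ('Hang Out', 'Friendship')."""
    row = conn.execute(
        "SELECT ? IN ('Hang Out', 'Friendship')", (stored_value,)
    ).fetchone()
    return bool(row[0])


def shared_looking_for(conn, sender_id, target_id):
    """Values both users selected, using one row per (user, value)."""
    rows = conn.execute(
        """
        SELECT s.looking_for
        FROM user_looking_for AS s
        JOIN user_looking_for AS t ON t.looking_for = s.looking_for
        WHERE s.user_id = ? AND t.user_id = ?
        ORDER BY s.looking_for
        """,
        (sender_id, target_id),
    )
    return [value for (value,) in rows]


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE user_looking_for (
            user_id     INTEGER NOT NULL,
            looking_for TEXT NOT NULL,
            PRIMARY KEY (user_id, looking_for)
        );
        INSERT INTO user_looking_for VALUES
            (1, 'Hang Out'), (1, 'Friendship'),
            (2, 'Friendship'), (2, 'Hang Out'), (2, 'Dating');
        """
    )
    return conn
```

A single stored value such as 'Friendship' matches the IN list, but the comma-separated string 'Friendship,Hang Out' does not, whereas the join finds the overlap regardless of the order in which the values were saved.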

How to handle Tree structures returned from SQL query using PHP?

This is a "theoretical" question.
I'm having trouble defining the question so please bear with me.
When you have several related tables in a database, for example a table that holds "users" and a table that holds "phones"
both "phones" and "users" have a column called "user_id"
select users.user_id,name,phone from users left outer join phones on phones.user_id = users.user_id;
the query will provide me with rows of all the users whether they have a phone or not.
If a user has several phones, his name will be returned in 2 rows as expected.
columns => | user_id | name | phone |
row0    => | 0       | fred | NULL  |
row1    => | 1       | paul | tlf1  |
row2    => | 1       | paul | tlf2  |
The name "paul" in the case above is a necessary duplicate, which in the RDBMS's eyes is not a duplicate at all!
It will then be handled by some server-side scripting language, for example PHP.
How are these "necessary duplicates" actually handled in real websites or applications?
That is, how are the rows "mapped" into some usable object model?
p.s. if you decide to post examples, post them for php,mysql,sqlite if possible.
edit:
Thank you for providing answers. Each answer has interpreted the question differently, and as such each is different and correct in its own way.
I have come to the conclusion that if round trips are expensive, this will be the best way, along with Jakob Nilsson-Ehle's solution, which was fitting for the theoretical question.
If round trips are cheap, I will do separate selects for phones and users as 9000 suggests. If I need to show a single phone for every user, I will give the phones a primary column and join it with the user select, as Ollie Jones correctly suggests.
Even though I'm using 9000's answer for real-life applications, I think that for this unrealistic question Jakob Nilsson-Ehle's solution is the most appropriate.
What I would probably do in this case in PHP is use the userId as the key of a PHP array and then use that to continuously update the users.
A very simple example would be:
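// mysql_* is the legacy API (removed in PHP 7); the same loop works with mysqli or PDO
$result = mysql_query('select users.user_id, name, phone from users left outer join phones on phones.user_id = users.user_id;');
$users = array();
while ($row = mysql_fetch_assoc($result)) {
    $uid = $row['user_id'];
    // first row seen for this user: create the entry
    if (!array_key_exists($uid, $users)) {
        $users[$uid] = array('name' => $row['name'], 'phones' => array());
    }
    // a user without phones yields one row with a NULL phone; skip it
    if ($row['phone'] !== null) {
        $users[$uid]['phones'][] = $row['phone'];
    }
}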
Of course, depending on your programming style and the complexity of the user data, you might define a User class or something and populate the data, but that is fundamentally how I would do it.
Your data model inherently allows a user to have 0, 1, or more phones.
You could get your database to return either 0 or 1 phone items for each user by employing a nasty hack, like choosing the numerically smallest phone number. (MIN(phone) ... GROUP BY user). But numerically comparing phone numbers makes very little sense.
Your problem of ambiguity (which of several phone numbers) points to a problem in your data model design. Take a look, if you will, at some common telephone-directory apps. A speed-dial app on a mobile phone is a good example. Mostly they offer ways to put in multiple phone numbers, but they always have the concept of a primary phone number.
If you add a column to your phone table indicating number priority, and make it part of your primary (unique) key, and declare that priority=1 means the user's primary number, your app will not have this ambiguous issue any more.
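A minimal sketch of the priority-column idea, again in Python with SQLite so that it is self-contained. The sample rows reuse fred and paul from the question; the column name `priority`, the helper names and the exact key layout are assumptions about how this could be set up.

```python
import sqlite3


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE phones (
            user_id  INTEGER NOT NULL REFERENCES users(user_id),
            priority INTEGER NOT NULL,   -- 1 means the primary number
            phone    TEXT NOT NULL,
            PRIMARY KEY (user_id, priority)
        );
        INSERT INTO users VALUES (0, 'fred'), (1, 'paul');
        INSERT INTO phones VALUES (1, 1, 'tlf1'), (1, 2, 'tlf2');
        """
    )
    return conn


def users_with_primary_phone(conn):
    """Exactly one row per user: the primary phone, or None if there is none."""
    return conn.execute(
        """
        SELECT u.user_id, u.name, p.phone
        FROM users AS u
        LEFT JOIN phones AS p
               ON p.user_id = u.user_id AND p.priority = 1
        ORDER BY u.user_id
        """
    ).fetchall()
```

Because (user_id, priority) is unique, the join can never return more than one row per user, so no MIN() or GROUP BY trick is needed.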
You can't easily get a tree structure from an RDBMS, only a table structure. And you want a tree: [(user1, (phone1, phone2)), (user2, (phone2, phone3))...]. You can optimize towards different goals, though.
Round-trips are more expensive than sending extra info: go with your current solution. It fetches username multiple times, but you only have one round-trip per entire phone book. May make sense if your overburdened MySQL host is 1000 miles away.
Sending extra info is more expensive than round-trips, or you want more clarity: as #martinho-fernandes suggests, only fetch user IDs with phones, then fetch user details in another query. I'd stick with this approach unless your entire user details is a short username. With SQLite I'd stick with it at all times just for the sake of clarity.
Sounds like you're confusing the object data model with the relational data model. Understanding how they differ, both in general and in the specifics of your application, is essential to writing OO code on top of a relational database.
Trivial ORM is not the solution.
There are ORM mapping technologies such as Hibernate; however, these do not scale well. IME, the best solution is using a factory pattern to manage the mapping properly.
