Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
What's the best way of storing options in MySQL: as descriptive strings, or as integers associated with each string?
Let's say I have this question in my UI:
What's your favorite ice cream flavor?
Vanilla
Chocolate
Strawberry
Is it better to store those in the DB as 1, 2, 3 in an INT(1) field, or as the strings vanilla, chocolate, or strawberry in a CHAR field? I know the INT field will be faster, but probably not drastically so unless there are tens of thousands of rows.
If they're stored as strings then I wouldn't need any extra PHP code, whereas if they're stored as numbers, I'd have to define that 1 = vanilla, etc.
What's the general consensus on this?
The usual approach with relational databases is to make a new table called icecream_flavor or whatever. Then you can add new flavours at a later date, and your program can offer all the current flavour choices when it asks. You can store choices by table ID (i.e. an integer).
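A minimal sketch of that approach (table and column names are illustrative, not from the answer):

```sql
-- Lookup table of flavours; new rows can be added at any time.
CREATE TABLE icecream_flavor (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50)  NOT NULL UNIQUE
);

-- Each user's choice stores the flavour's ID, not the string.
CREATE TABLE user_choice (
  user_id   INT UNSIGNED NOT NULL,
  flavor_id INT UNSIGNED NOT NULL,
  FOREIGN KEY (flavor_id) REFERENCES icecream_flavor (id)
);

INSERT INTO icecream_flavor (name)
VALUES ('Vanilla'), ('Chocolate'), ('Strawberry');
```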
If paddy's answer isn't an option, then you should store the values as an ENUM.
ENUM is roughly equivalent to TINYINT(1) in storage terms.
ENUM is only the answer if the values the user can choose from are fixed in advance; otherwise you would have to alter the table. But if you use ENUM, MySQL's engine is optimized for inserting into and selecting from ENUM columns. It's the obvious choice for, for example, (Male/Female).
Otherwise the answer to your question is TINYINT(1), which is faster than both CHAR and INT(1).
If you have a constant set of values, and you're not going to relate those values to other data in your database (e.g. information related to each type of ice cream), you can use MySQL's special field type called ENUM.
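A sketch of what that might look like (column names are just for illustration):

```sql
CREATE TABLE favorite_flavor (
  user_id INT UNSIGNED NOT NULL PRIMARY KEY,
  flavor  ENUM('vanilla', 'chocolate', 'strawberry') NOT NULL
);

-- Internally MySQL stores the ENUM as a small integer,
-- but you insert and select it as a string:
INSERT INTO favorite_flavor (user_id, flavor) VALUES (1, 'vanilla');
```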
If each user is identifiable by some unique key (say an email address) then you may find you do not need a numerical id.
Keeping the flavour as a flavour, when YOU control the options (so you don't get Vanilla, vanilla, vanila and so on), suggests to me that you store the real value (Vanilla).
This avoids a join and makes your database data meaningful when you browse it.
You can add an index to the flavour column of the database, so you can "show all users who prefer Vanilla" very cheaply.
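Assuming a users table with a flavour column (names are illustrative), that could look like:

```sql
-- The index makes the equality lookup cheap.
ALTER TABLE users ADD INDEX idx_flavour (flavour);

-- "Show all users who prefer Vanilla"
SELECT * FROM users WHERE flavour = 'Vanilla';
```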
(If you potentially have infinite options to store, then maybe you should be investigating using a NoSQL database.)
I would suggest that you keep the primary index as an int. The simplest reason for doing this is to allow for what you don't currently know and to let your data model evolve.
Assuming your example, if you go with the actual word, and in six months you decide to also put in a size - single scoop, double scoop etc, you suddenly have a problem as your primary key that you thought was unique suddenly has multiple entries for each flavour. If on the other hand you go with a numeric format, you can easily have as many variants of Vanilla as you like and everything is happy.
In this case, I would also suggest keeping the actual primary key as a key and nothing more. Add another column that then has the flavour, or create a lookup table that stores the actual properties of each item. If you keep a numeric format but have ID 1 as Vanilla, you basically run into the same problem as before; keep the ID as the ID and your details separate.
To keep data about your items, you can use something like this:
Master Table

    ID   Name
    1    Ice Cream

Properties Table

    ID   Type
    1    Flavour
    2    Size

Property Table

    ID   PropID   Detail
    1    1        Vanilla
    2    1        Strawberry
    3    2        Single Scoop
    4    2        Double Scoop

MasterToProperty

    MasterID   Property   PropID
    1          1          1
    1          2          4
This basically gives you an unlimited number of options, and adding anything you want (an entry for choc chips, for example) is simply a matter of adding a few rows of data, not table changes.
Set up your database like this:
Questions
id, question_text
ex. (1, "What is your favorite ice cream?")
Options
id, question_id, option_text
ex. (1, 1, "Vanilla")
Responses
id, user_id, question_id, option_id
ex. (1, 421, 1, 1)
You should also throw some created and modified fields into each table for keeping track of changes.
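In MySQL terms, that layout might be sketched as follows (types and the created/modified columns are assumptions, not from the answer):

```sql
CREATE TABLE questions (
  id            INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  question_text VARCHAR(255) NOT NULL,
  created       DATETIME NOT NULL,
  modified      DATETIME NOT NULL
);

CREATE TABLE options (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  question_id INT UNSIGNED NOT NULL,
  option_text VARCHAR(255) NOT NULL,
  created     DATETIME NOT NULL,
  modified    DATETIME NOT NULL,
  FOREIGN KEY (question_id) REFERENCES questions (id)
);

CREATE TABLE responses (
  id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id     INT UNSIGNED NOT NULL,
  question_id INT UNSIGNED NOT NULL,
  option_id   INT UNSIGNED NOT NULL,
  created     DATETIME NOT NULL,
  modified    DATETIME NOT NULL,
  FOREIGN KEY (question_id) REFERENCES questions (id),
  FOREIGN KEY (option_id)   REFERENCES options (id)
);
```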
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
What I Wish To Implement
My site does a nightly API data fetch, inserting 100,000+ new entries each night. To save space, each field name is in a separate table with an allocated ID, saving around 1,027 bytes per data set, approximately 2.5675 MB per night, and just under a gigabyte over the course of a year; however, this is set to increase.
For each user, a JSON file is requested containing the 112 entries to be added. Instead of checking my table for each name's ID, I feel that, to save time, it would be best to create an array where the position in the array is the ID. So let's use some random vegetable names:
Random List Of Vegetables
"Broccoli", "Brussels sprouts", "Cabbage", "Calabrese", "Carrots", "Cauliflower", "Celery", "Chard", "Collard greens", "Corn salad", "Endive", "Fiddleheads (young coiled fern leaves)", "Frisee", "Fennel"
When I create the insert via my PHP classes, I use the following;
$database->bind(':veg_name', VALUE);
Question
What would be the best method to quickly check which position $x occupies within the array?
As an alternative solution to matching the entries in PHP (which might at some point run into time and/or memory problems):
The general idea is to let the database do the work. It is already optimized (index structures) to match entries to one another.
So following your example, the database probably has a dimension table fields for the field names:
ID | Name
---------------------------------
0 | "Broccoli"
1 | "Brussels sprouts"
2 | "Cabbage"
Then there is the "final" table facts, which has a structure like this:
User_ID | Field_ID | Timestamp
Now a new batch of entries should be inserted. For this, we first create a temporary table temp with the following format and insert all raw entries. The last column Field_ID will stay empty for now.
User_ID | Field_Name | Timestamp | Field_ID
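The temp table itself might be created like this (column types are assumptions based on the description):

```sql
CREATE TEMPORARY TABLE temp (
  User_ID    INT UNSIGNED NOT NULL,
  Field_Name VARCHAR(100) NOT NULL,
  Timestamp  DATETIME     NOT NULL,
  Field_ID   INT UNSIGNED NULL   -- stays empty until the matching step
);
```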
In a next step we match each field name with its ID using a simple SQL query:
UPDATE `temp` t
SET Field_ID=(SELECT Field_ID FROM fields f WHERE f.Name=t.Field_Name)
So now the database has done our required mapping and we can issue another query to insert the rows into our fact table:
INSERT INTO facts
SELECT User_ID, Field_ID, Timestamp FROM temp WHERE Field_ID IS NOT NULL
A small side-effect here: all rows in our temp table that could not be matched (we didn't have the field name in our fields table) are still available there. So we could write some logic to send an error report somewhere and have someone add the field names or otherwise fix the issue.
After we are done, we should remove or at least truncate the temp table to be ready for the next night's iteration.
Small remark: The queries here are just examples. You could do the mapping and insertion into your facts table in one query, but then you'd lose the "unmatched" entries or have to redo the work.
Redoing the work might not be an issue now, but you said the number of entries will increase in the future, so this might become an issue.
If you're only doing 2.5 megs/night, that's almost nothing. If you gzipped that before dragging it across, it would reduce it a lot more.
Using array positions could get tricky if you're trying to use that to match something in some other table.
That being said, every array has a numeric index as well, so you can find out what that is at any point.
Try this and you'll see:
$array = array("Broccoli", "Brussels sprouts", "Cabbage", "Calabrese", "Carrots", "Cauliflower", "Celery", "Chard", "Collard greens", "Corn salad", "Endive", "Fiddleheads (young coiled fern leaves)", "Frisee", "Fennel");
var_dump(array_keys($array));
On the array, you can also do this:
$currentKey = array_search("Carrots", $array);
That will return the key for a given value (or false if the value is not found). So if you're looping through an array, you can output the key (index) and go do something else with it.
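If the same array is probed many times, it can be cheaper to flip it once so each lookup is a hash access instead of a linear array_search() scan. A small sketch (the vegetable list is shortened here):

```php
<?php
// Build a value => position map once; each lookup is then O(1).
$vegetables = ["Broccoli", "Brussels sprouts", "Cabbage", "Calabrese", "Carrots"];
$positions  = array_flip($vegetables);

$id      = $positions["Cabbage"] ?? null;  // 2
$missing = $positions["Kale"] ?? null;     // null: not in the list
```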
Also, gzip is a form of compression that makes your data much smaller.
If you have a list of items, e.g. an array containing only strings that represent your values, you can use foreach in its key-value form ($users as $index => $user) instead of just $users as $user, like the following:
$users = ["Broccoli", "Brussels sprouts", "Cabbage", "Calabrese", "Carrots", "Cauliflower", "Celery", "Chard", "Collard greens", "Corn salad", "Endive", "Fiddleheads (young coiled fern leaves)", "Frisee", "Fennel"];
foreach( $users as $index => $name ) {
echo "about to insert $name which is the #$index..." . PHP_EOL;
}
Which will echo :
about to insert Broccoli which is the #0...
about to insert Brussels sprouts which is the #1...
about to insert Cabbage which is the #2...
about to insert Calabrese which is the #3...
about to insert Carrots which is the #4...
about to insert Cauliflower which is the #5...
about to insert Celery which is the #6...
about to insert Chard which is the #7...
about to insert Collard greens which is the #8...
about to insert Corn salad which is the #9...
about to insert Endive which is the #10...
about to insert Fiddleheads (young coiled fern leaves) which is the #11...
about to insert Frisee which is the #12...
about to insert Fennel which is the #13...
Live-example available here : https://repl.it/Jpwk
Like #m13r asked, how would an index be useful in your case ?
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Can someone offer a solution for implementing flags in PHP and MySQL? I have a large number of flags that represent owning items. There are over 200 different items, but over time this will grow to as many as 500-600.
My initial thought was to store this information in a data blob, and update it in a trigger in mysql. But it appears that bit operations are limited to 64 bits.
The basic operation is to give an item by ID (say item 156), which would set the 156th bit in the blob.
If you store 200 "items" as bit flags, that will occupy 25 bytes per user. Regardless of the number of users.
If instead you have a UserItems table with two columns, UserID and ItemID, that is 8 bytes per pair. If users have, on average, 3 items or fewer, then the normalized approach is actually smaller than the bit-packing approach.
It also offers several advantages. The normalized approach would naturally have an items table with descriptive information about the items. This could be easily joined in, so you would know which items are red, or in German, or size 16, or take diesel fuel -- whatever the appropriate attributes are for your items. And these could have item hierarchies with important category information as well.
In addition, the basic UserItems table might be too small. Perhaps you want other information about the acquisition of an item -- such as when it was acquired, or the quantity. Well, you can add columns to the UserItem table. The bit-packing approach is a bit less flexible.
The advice is to use a standard database approach. This has worked on many different applications, some bigger than the one you are contemplating. If you really do understand the problem and understand the performance implications of different approaches, there are some circumstances where bit-packing could be the right solution. But it is not the way to start the design.
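A minimal version of that normalized UserItems table (names and types are illustrative):

```sql
CREATE TABLE user_items (
  user_id INT UNSIGNED NOT NULL,
  item_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, item_id)  -- one row per owned item; 8 bytes per pair
);

-- "Give item 156 to user 42" is just an insert, no bit arithmetic:
INSERT INTO user_items (user_id, item_id) VALUES (42, 156);
```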
As I see it:
you have a list of objects and a list of potential owners
you want to implement an "owner owns object" relation
The simplest solution would be to have a table that associates an owner and an object.
A record in this table would be equivalent to a cell in the matrix representing the ownership relation.
If this matrix is populated sparsely enough, then the table is a good enough solution.
If you know that an average owner will own 50 objects or so, then you might want to organize your objects in groups and have a slightly more complex relation: "owner owns objects in object group", where you store the owner id, the group id and a bitmap specifying which objects are owned within the group.
Provided your owners don't pick items at random (i.e. you can guess reasonably well which objects will be likely to be owned together), this approach would be more flexible than a huge bitmap for each owner, that you would have to update each time you add a new set of objects to your database, and probably more size-efficient.
There are several answers here that should be considered prior to using this solution. In my particular case I analyzed the use case and it was necessary to use binary data and bit twiddle the results. So to the solution....
First, the MySQL data type was VARBINARY({maxLen}), where maxLen is the number of bytes I would probably need. In my case I chose 64, which provides me with 512 bits. I chose this because it doesn't pad out the data: I wanted to store the smallest amount of data, and it wasn't until later that users would begin needing higher bits flipped.
Second, the PHP... (WHAT CRAP, but it works). I select from the DB, convert, bit-twiddle and update. But there is a bit more to it than that.
$q = $sqli->query("SELECT binaryFlags FROM Table WHERE id = $id");
$r = $q->fetch_assoc();
$c = $r['binaryFlags'];   // holds the binary raw data
$c = bin2hex($c);         // convert to hex - there is no direct binary-to-decimal conversion
$c = hexdec($c);          // now convert the hex string to decimal
                          // (NB: hexdec() silently loses precision once the value exceeds PHP_INT_MAX)
$c |= 1 << ($flag - 1);   // now I have a decimal I can OR; $flag is the bit to flip (1-based)
$c = '0x' . dechex($c);   // convert back to hex and add the '0x' prefix so SQL treats it as binary
$q = $sqli->query("UPDATE Table SET binaryFlags = $c WHERE id = $id"); // update only this row's flags
Hope this can help someone else. And if anyone has a better way, or reasons why I had to jump through hoops, please leave a comment.
I have a game. In the game, people make many choices out of 2 options.
The choice can be either right or wrong, and I am storing the result of their run through the game (which can be very long) as a string, with 1 for a right answer and 0 for a wrong answer.
So for example, player 128937 will have stored in his run column the string 00010101010010001010111 as a varchar(5000).
Is there a better way I can store this information in MySQL? (I am using PHP too, if that helps.)
I would create a new table (say it's called 'answers') with three columns:
question_id, user_id and answer (which will hold values of 0/1).
Every time the player answers a question, you INSERT a new entry into this table.
This way it'll be easier to maintain the sum of right/wrong answers.
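A sketch of that answers table (types are assumptions):

```sql
CREATE TABLE answers (
  question_id INT UNSIGNED NOT NULL,
  user_id     INT UNSIGNED NOT NULL,
  answer      TINYINT(1)   NOT NULL,  -- 1 = right, 0 = wrong
  PRIMARY KEY (question_id, user_id)
);

-- Count a player's right answers without any string parsing:
SELECT SUM(answer) AS right_answers FROM answers WHERE user_id = 128937;
```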
Why not use a tinyint(1) for each option rather than using strings?
I would make multiple tables
choices
id
scenario (or other title)
options
id
choice_id
title (example: "go left" or "turn around and go home")
correct (0 or 1)
user_choices
user_id
option_id
choice_id (optional since choice_id is already in options table)
I have a questionnaire for users to be matched by similar interests: 40 categories, each with 3 to 10 subcategories. Each of the subcategories has a 0 - 5 value related to how interested they are in that subcategory (0 being not even remotely interested, 5 being a die-hard fan). Let's take an example for a category, sports:
<input type="radio" name="int_sports_football" value="0"> 0
<input type="radio" name="int_sports_football" value="1"> 1
<input type="radio" name="int_sports_football" value="2"> 2
<input type="radio" name="int_sports_football" value="3"> 3
<input type="radio" name="int_sports_football" value="4"> 4
<input type="radio" name="int_sports_football" value="5"> 5
With so many of these, I have a table with the interest categories, but due to the size, have been using CSV format for the subcategory values (Bad practice for numerous reasons, I know).
Right now, I don't have the resources to create an entire database devoted to interests, and having 40 tables of data in the profiles database is messy. I've been pulling the CSV out (Which looks like 0,2,4,1,5,1), exploding them, and using the numbers as I desire, which seems really inefficient.
If it were simply yes/no I could see doing bit masking (which I do in another spot – maybe there's a way to make this work with 6-ary values?). Is there another way to store this sort of categorized data efficiently?
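For what it's worth, a 0-5 value fits in three bits, so a bit-packing variant is possible. This is a purely illustrative sketch (the function names are made up), and with PHP's 64-bit integers one packed integer only covers about 21 values; longer lists would need chunking:

```php
<?php
// Pack an array of 0-5 ratings into one integer, 3 bits per rating.
function packRatings(array $ratings): int {
    $packed = 0;
    foreach ($ratings as $i => $r) {
        $packed |= ($r & 0x7) << ($i * 3);
    }
    return $packed;
}

// Reverse the shifts to recover the original list.
function unpackRatings(int $packed, int $count): array {
    $ratings = [];
    for ($i = 0; $i < $count; $i++) {
        $ratings[] = ($packed >> ($i * 3)) & 0x7;
    }
    return $ratings;
}

$original = [0, 2, 4, 1, 5, 1];          // the CSV example from the question
$packed   = packRatings($original);
$restored = unpackRatings($packed, count($original));
```

The round trip restores the original list, so the packed integer could be stored in a single column instead of CSV.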
You do not do this by adding an extra field per question to the user table, but rather you create a table of answers where each answer record stores a unique identifier for the user record. You can then query the two tables together using joins in order to isolate only those answers for a specific user. In addition, you want to create a questions table so you can link the answer to a specific question.
table 1) user: (uniqueID, identifying info)
table 2) answers: (uniqueID, userID, questionID, text) links to unique userID and unique questionID
table 3) question: (uniqueID, subcategoryID, text) links to uniqueID of a subcategory (e.g. football)
table 4) subcategories: (uniqueID, mainCategoryID, text) links to uniqueID of a mainCategory (e.g. sports)
table 5) maincategories: (uniqueID, text)
An individual user has one user record, but MANY answer records. As the user answers a question, a new record is created in the answers table, storing the uniqueID of the user, the uniqueID of the question, and the value of their answer.
An answer record is linked to a single user record (by referencing the user's uniqueID field) and a single question record (via uniqueID of question).
A question record is linked to a single subcategory record.
A subcategory record is linked to a single category record.
Note this scheme only handles two levels of categories: sports->football. If you have 3 levels, then add another level in the same manner. If your levels are arbitrary, there may be some other scheme more suited.
Okay, so, given that you have 40 categories and let's assume 10 subcategories each, that leaves us with 400 question-answer pairs per user.
Now, in order to design the best intermediary data storage, I would suggest starting out with a few questions:
1) what type of analysis will I need
2) what resources do I have
3) is this one time solution or should it be reused in future
Well, if I were you, I would stick to a very simple database structure, e.g.:
question_id | user_id | answer
If I would foresee more polls of this kind going on with the same questions and probably the same respondents, I would further extend the structure with a campaign_id. This would work as raw data storage that allows quick and easy statistics of any kind.
Now, you said a database is not an option. Well, you can mimic this very same structure using arrays and create your own statistical interface that works on top of the array storage, BUT you would save their time and yours if you could get SQL. As others suggest, there is always SQLite (a file-based database engine), which is easy to use and set up.
Now, if all that does not make you happy, there is another interesting approach. If the data set is fixed, meaning there are pretty much no conditional questions, then, given that you could create a question index, you could further create a 400-byte answer chunk, where each byte represents an answer with any of the given values. Then what you do is create statistical methods that, based on the question ID, can easily operate with the $answer[$user][$nth] byte (or $answer[$nth][$user], again based on the type of statistics you need).
this should help you get your mind set on the goal you want to achieve.
I know you said you don't have the resources to create a database, but I disagree. Using SQL seems like your best bet and PHP includes SQLite (http://us2.php.net/manual/en/book.sqlite.php) which means you wouldn't need to set up a MySQL database if that were a problem.
There are also tools for both MySQL and SQLite which would allow you to create tables and import your data from the CSV files without any effort.
Maybe I am confused, but it seems like you need a well-designed relational database.
for example:
tblCategories (pkCategoryID, fldCategoryName)
tblSubCategory (pkSubCategoryID, fldSubCategoryName)
tblCategorySubCategory(fkCategoryID,fkSubCategoryID)
Then use inner joins to populate the pages. Hopefully this helps you :)
I consider a NoSQL architecture as a solution for scaling this kind of MySQL field in agile solutions.
To get it done ASAP, I'd create a class for the "interest" category that constructs sub-category instances extending the category parent class, carrying the answer properties, which would be stored as a JSON object in that field. Example:
{
    "music": {                  // category
        "instruments": {        // sub category
            "guitar": 5,        // interest answers
            "piano": 2,
            "violin": 0,
            "drums": 4
        },
        "fav artist": {
            "lady gaga": 1,
            "kate perry": 2,
            "Joe satriani": 5
        }
    },
    "sports": {
        "fav sport": {
            "soccer": 5,
            "hockey": 2
        },
        "fav player": {
            "messi": 5,
            "Jordan": 5
        }
    }
}
NOTE that you need to use "abstraction" for the "category" class to keep the object architecture right
I was wondering if anyone can help me with my PHP-MySQL design.
My current app (which is more or less a survey app) lets users store questions targeting specific features in other products, which are also saved in another table in the database.
For example, a user can post a car and then ask users about their opinion of the safety elements of his car.
car db: ID, brand, safety
brand = Fast
safety = ABS=ABS (Anti lock braking System), DriverAirBag=Air bags
questions db: ID, Question, Answer, Target, Type
eg of data:
Question: safety options you like
Answer: ABS=ABS (Anti lock braking System), DriverAirBag=Air bags
Target: safety
Type: checkbox
The problem is that to display the stored questions, I have to:
1) loop through all questions, echo Question, and echo Target in a hidden input,
2) explode the Answer field twice (first with "," to get each answer, and again with "=" to separate what's stored in the database [0] from the user-friendly text [1]),
3) check Type to choose the display type (3 options: checkbox, select, text),
4) set this display type with [0] as the value and show [1] to the user!!! (stupid, I know :()
eg:
<input type=checkbox value=$explode[0]>$explode[1]
All these steps make it very hard to maintain and not flexible by any means, because the display is embedded in the code :(
Any ideas :) ?
I would separate the tables into a one-to-many type design like:
CarTable
ID
Brand
Model
CarInfo
CarID # Foreign key to Car Table
Category # Optional: Safety, Performance, Looks, etc...
Value # Specific Info Value: ABS, Air Bags, etc...
In this design you can have 0 to many CarInfo records for each Car making it easier to add/remove info records for a car without having to parse a potentially complex field like in your original design.
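In SQL, that one-to-many design might be sketched like so (types are assumptions):

```sql
CREATE TABLE car (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  brand VARCHAR(50) NOT NULL,
  model VARCHAR(50) NOT NULL
);

CREATE TABLE car_info (
  car_id   INT UNSIGNED NOT NULL,   -- foreign key to car
  category VARCHAR(50)  NOT NULL,   -- e.g. 'Safety'
  value    VARCHAR(100) NOT NULL,   -- e.g. 'ABS', 'Air Bags'
  FOREIGN KEY (car_id) REFERENCES car (id)
);
```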
Your question table design could be similar depending on what your ultimate goal is:
Question
ID
Description
QuestionInfo
QuestionID
Category
Value
Some other things you should be considering and questions you should be asking yourself:
How are you handling custom user inputs? If user1 enters "Air Bags" and user2 requests "Driver Side AirBag" how are you going to match the two?
Make sure you understand the problem before you attempt to solve it. It was not clear to me from your question what you are trying to do (which could be just me or limited size of the question here).
Be careful when outputting raw database values (like the type field in your question table). This is fine as long as the database values cannot be input by the user or are properly sanitized. Search for "SQL Injection" if you are not familiar with it.
If you want a survey PHP application, I suppose, to be clear, that you need something where:
one user can add a subject (a car in your example).
there can be an arbitrary number of questions attached to a subject by that user
each question can accept several types of answers: yes/no (checkbox input), a number (text input, or say 10 radio buttons with values 1 to 10 attached, etc), single or multiple choice (select with or without the multiple attribute), arbitrary data (textarea). Moreover, some questions may accept comments / an "other, please explain" field.
any other user can answer all the questions, and all of them are stored.
A more sophisticated version will require different sets of questions based on what was replied previously, but it's out of the scope of this question, I hope.
For that I think you need several tables:
Subjects
id int pri_key and anything that can come to mind: brand, type etc.
Questions
id int pri_key, text varchar, subject int f_key, type int/enum/?
QuestionOptions
id int pri_key, question int f_key, option varchar
Users
id int pri_key + whatever your authentication structure is
UserReplies
user int f_key, question int f_key, answer varchar, comments varchar
The user-creator sets up a subject and attaches several questions to it. Each question knows which type it is - the field 'type' may be an integer, or an enum value, I myself prefer storing such data as integer and defining constants in php, using something like QTYPE_MULTISELECT or QTYPE_BOOLEAN for readability.
For single/multiselect questions a user-creator also populates the QuestionOptions table, where the options for select-tag are stored.
To display all the questions there'll be something like
SELECT Questions.id, Questions.text, Questions.type,
       GROUP_CONCAT(CONCAT(QuestionOptions.id, ':', QuestionOptions.option) SEPARATOR ';') AS options
FROM Questions
LEFT JOIN QuestionOptions ON (Questions.type = $select AND Questions.id = QuestionOptions.question)
WHERE Questions.subject = $subject
GROUP BY Questions.id
The CONCAT and GROUP_CONCAT here return something like 5:Option_One;6:Option_Two etc, so that exploding the data won't be much hassle.
I realize this is not the cleanest approach in terms of performance and optimization, but It should do for a non-large-scale project.
There is also a drawback in the above design, in that the answers to the "multiple answer question" are stored imploded in the answer field of the UserReplies table. Better to add another table, where every record holds one option value the user selected for this or that question. That way there will be no unnecessary denormalization in the database, and queries for statistics will be much easier (e.g. querying which options were most popular in a single question).