Saving search POST variables in MySQL

I am developing a site where users choose their city (from a long list), an activity (again a long list) and then a level of expertise. After that, the site shows them local events based upon their choices. I have over 100 POST variables (and more being added). How can this be done in MySQL?

You only have a few variables (city, activity, expertise). Use a session.
See the PHP documentation on sessions.

Yes, you have the right idea. Create a table and store each variable you would like to save.
Post the variables from whatever scripting language you use, and the database statement would look something like this:
INSERT INTO `your_table_name` (`field_1`, `field_2`) VALUES ('$post1', '$post2');
Escape the values, or better, bind them through a prepared statement, before they reach the query, since POST data comes straight from the user. You can add more fields as your information grows. That would be the best way to start getting into MySQL. Of course, this answer is the simplest solution. I recommend you read up on all the fun and much more complex things you can do with MySQL.
A more efficient approach would be this:
Categorize your POST variables and create multiple tables, grouping similar things or data from a specific page. Create an auto-increment primary ID for the rows in the master table of each group, and then join your tables as needed. That's a little more advanced, so I suggest you add the few fields you are currently using to a table now, and when you need to start adding more, look into these topics:
Normalization:
http://en.wikipedia.org/wiki/Database_normalization
Primary keys, joins, stored procedures, and the list goes on.
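As a minimal sketch of the single-table starting point described above: the example below uses Python with an in-memory SQLite database as a stand-in for PHP and MySQL, purely so that it runs on its own. The table name saved_search, its columns, and the save_search helper are assumptions made for illustration; the point it shows is passing the posted values as bound parameters instead of pasting them into the SQL string.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE saved_search (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           user_id INTEGER NOT NULL,
           city TEXT NOT NULL,
           activity TEXT NOT NULL,
           expertise TEXT NOT NULL
       )"""
)


def save_search(conn, user_id, post):
    """Store one submitted search; placeholders keep user input out of the SQL text."""
    cur = conn.execute(
        "INSERT INTO saved_search (user_id, city, activity, expertise) "
        "VALUES (?, ?, ?, ?)",
        (user_id, post["city"], post["activity"], post["expertise"]),
    )
    conn.commit()
    return cur.lastrowid


# Simulated POST data, including a quote that would break naive string interpolation.
post = {"city": "St. John's", "activity": "climbing", "expertise": "beginner"}
row_id = save_search(conn, 1, post)
saved = conn.execute(
    "SELECT city, activity, expertise FROM saved_search WHERE id = ?", (row_id,)
).fetchone()
print(saved)
```

In PHP the same idea would typically go through prepared statements (PDO or mysqli) rather than string interpolation.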

Related

How to structure MySQL db for storing multiple checkbox form data and later do statistics in php?

I am working on a form for storing information about the themes of the queries we receive in our company.
I am using php/MySQL to store radio button data such as:
Name of employee
Medium of query (e-mail/phone/in person)
With radio button data such as this, I can easily use PHP to query the database and e.g. count the numbers of queries answered by e-mail by each employee.
The trouble I am having is with data regarding the theme of the query. This is checkbox data with a lot of different checkboxes (50+). We would like to be able to add or remove checkboxes from time to time, though not very often.
I used to store this data just as comma-separated values in a single cell in the database and then export to excel to work with the data, but now I'd like to use another PHP form to generate statistics on the themes.
My research has led me to two ways of doing this that may be possible:
1. Creating a separate table in my database for my themes with one column for each possible theme, so that I'd have as many columns in my database as the number of checkboxes in my form.
2. Using the PHP function serialize to store the checkbox data in one cell in my database and then later using unserialize to work with the data in PHP.
I am an absolute beginner, so with both options I'm unsure how I'd actually implement it:
With option 1, I am unsure whether my MySQL columns should just be "theme 1", "theme 2", etc., or whether they should have the names of my checkbox values, e.g. "money", "personal problems", "practical issues", etc. I have not been able to find a good resource on how to store the checkbox data in the right way, when the user may sometimes have clicked just 1 theme, and in other instances may have clicked 10 themes.
With option 2, I am unsure how I could populate a dropdown with unique values, and how I could later count instances of a unique value across the rows in my database.
Any help you can give me on this, including links to tutorials or questions similar to this on stackoverflow, would be much appreciated. I haven't been able to find anything about this that I could understand, other than I am thinking option 1 is probably the right way to go.
EDIT: After having received an idea about how to do this from David, I am updating my post with my attempt to understand how I would go about this.

What you're describing can be thought of as a many-to-many relationship. You have:
A form record, which can relate to many themes
A theme, which can relate to many form records
In a situation like this, the relationship itself is a database record. Consider this table structure:
FormRecords
----------
ID
SomeTitle
UserIDWhoFilledOutForm
etc.
Themes
----------
ID
ThemeName
etc.
FormRecordThemes
----------
FormRecordID
ThemeID
Each "primary entity" has an identifier and information about that entity. Then there's a "linking table" which has information about the relationship between those two entities.
Any time you present a form, you simply select from the Themes to populate the check boxes. You can add new ones as you see fit. You probably shouldn't remove any, though you can "soft delete" them by setting some flag on the record to indicate not to display them on the form.
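The structure above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python with in-memory SQLite instead of PHP and MySQL so it can run standalone, the Employee and Medium columns are borrowed from the question, and the IsActive flag is one hypothetical way to do the "soft delete". The last query shows the kind of per-theme statistic the question asks for.

```python
import sqlite3

# SQLite stands in for MySQL; Employee, Medium and IsActive are assumed columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE FormRecords (ID INTEGER PRIMARY KEY, Employee TEXT, Medium TEXT);
    CREATE TABLE Themes (
        ID INTEGER PRIMARY KEY,
        ThemeName TEXT UNIQUE,
        IsActive INTEGER NOT NULL DEFAULT 1  -- 0 = soft-deleted, hidden on the form
    );
    CREATE TABLE FormRecordThemes (
        FormRecordID INTEGER REFERENCES FormRecords(ID),
        ThemeID INTEGER REFERENCES Themes(ID),
        PRIMARY KEY (FormRecordID, ThemeID)
    );
    """
)
conn.executemany(
    "INSERT INTO Themes (ThemeName) VALUES (?)",
    [("money",), ("personal problems",), ("practical issues",)],
)


def save_form(conn, employee, medium, theme_ids):
    """Insert one form record plus one linking row per ticked checkbox."""
    cur = conn.execute(
        "INSERT INTO FormRecords (Employee, Medium) VALUES (?, ?)", (employee, medium)
    )
    conn.executemany(
        "INSERT INTO FormRecordThemes (FormRecordID, ThemeID) VALUES (?, ?)",
        [(cur.lastrowid, theme_id) for theme_id in theme_ids],
    )
    return cur.lastrowid


save_form(conn, "Ann", "e-mail", [1, 2])  # two themes ticked
save_form(conn, "Bob", "phone", [1])      # one theme ticked
save_form(conn, "Ann", "phone", [1, 3])

# Rows used to render the checkboxes on the form.
checkboxes = conn.execute(
    "SELECT ID, ThemeName FROM Themes WHERE IsActive = 1 ORDER BY ID"
).fetchall()

# Statistics: how many form records used each theme.
theme_counts = conn.execute(
    """
    SELECT t.ThemeName, COUNT(frt.FormRecordID)
    FROM Themes t
    LEFT JOIN FormRecordThemes frt ON frt.ThemeID = t.ID
    GROUP BY t.ID
    ORDER BY t.ID
    """
).fetchall()
print(theme_counts)
```

Adding a checkbox is then a single INSERT into Themes, with no schema change.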
If you ever want to edit the Themes in any significant way (a way which would somehow invalidate previous uses of that record, such as completely changing its name/title), then keep in mind that you'd be modifying the entire history of its use. I don't know if this is a risk in your domain, but in cases like that it might help to de-normalize a little bit by storing "Theme at that time" values in the relationship table. Like, the name of the Theme at the time that relationship was created. It's best to avoid this scenario entirely if possible, mostly by making key Theme values immutable in the domain.
Don't store delimited lists, don't store serialized data (unless the entire object really is a single data point)... Keep values separated into their own actual values in the database. Relational databases are really good at querying relational data.

You can put all the checkbox values received from the user into one CSV-format string and store it in one cell. Later on you can split the string to get the values back.

MySQL but don't know the column names beforehand

I am building a PHP/MySQL app and I am allowing users to create their own custom profile data, as much as they want (i.e. they can add any amount of info to their profile with additional textboxes, but there is a "CORE" set of user profile fields).
For example, they can create a new textbox on the form and call it "my pet" and/or "my favorite color". We need to store this data in a database and obviously cannot create columns for each of their choices, since we don't know what their additional info is beforehand.
One way we think we could store all the "additional info" they provide is as JSON in a MySQL text field ( I love MySQL :) )
I've seen WordPress form builder plugins where you can create your own fields, so I'm thinking they must store the data in MySQL somehow, as NoSQL solutions are beyond the scope of these plugins.
I would love to stick with MySQL, but do you guys think NoSQL solutions like MongoDB/Redis would be a better fit for this?
Thanks

One way to approach this is to use a single table using the EAV paradigm, or Entity-Attribute-Value. See the Wikipedia article. That would be far tidier in most respects than letting users choose a database schema.

You could create a table of key value pairs where anything not in core would be stored. The table would look like: user_id, name_of_user_specified_field, user_specified_value;
Any name_of_user_specified_field that starts showing up a lot you could then add to the core table. This is referred to as Entity-Attribute-Value. Please note, some people consider this an anti-pattern.
If you do this, please add controls to limit the number of new entries a user can create or you might find someone stuffing your db with lots of fields :)
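A rough sketch of such a key-value table, including the suggested cap on entries per user. It is written in Python against in-memory SQLite rather than PHP and MySQL so that it is runnable as-is; the table name user_custom_field, the helper names, and the limit of 3 are all assumptions chosen for the example.

```python
import sqlite3

MAX_CUSTOM_FIELDS = 3  # hypothetical cap per user

# SQLite stands in for MySQL; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE user_custom_field (
           user_id INTEGER NOT NULL,
           field_name TEXT NOT NULL,
           field_value TEXT,
           PRIMARY KEY (user_id, field_name)
       )"""
)


def set_custom_field(conn, user_id, name, value):
    """Create or update one user-defined field, refusing new fields past the cap."""
    exists = conn.execute(
        "SELECT 1 FROM user_custom_field WHERE user_id = ? AND field_name = ?",
        (user_id, name),
    ).fetchone()
    if exists is None:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM user_custom_field WHERE user_id = ?", (user_id,)
        ).fetchone()
        if count >= MAX_CUSTOM_FIELDS:
            raise ValueError("custom field limit reached")
        conn.execute(
            "INSERT INTO user_custom_field (user_id, field_name, field_value) "
            "VALUES (?, ?, ?)",
            (user_id, name, value),
        )
    else:
        conn.execute(
            "UPDATE user_custom_field SET field_value = ? "
            "WHERE user_id = ? AND field_name = ?",
            (value, user_id, name),
        )


def get_custom_fields(conn, user_id):
    """Return all of a user's custom fields as a name -> value dict."""
    rows = conn.execute(
        "SELECT field_name, field_value FROM user_custom_field WHERE user_id = ?",
        (user_id,),
    ).fetchall()
    return dict(rows)


set_custom_field(conn, 1, "my pet", "cat")
set_custom_field(conn, 1, "my favorite color", "green")
set_custom_field(conn, 1, "my pet", "dog")  # updates the existing field
profile = get_custom_fields(conn, 1)
print(profile)
```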

MySQL can handle this just fine. If the additional data is always going to be pulled out all together (i.e. you will never need to get just the pet field without any other additional fields), then you can store it serialized in a column on the users table. However, if you want a more relational model, you can store the extra data in a separate table linked by the user ID. The additional table would have a column for the user ID, the additional field name, the additional field value, and whatever else you might want with it. Then you just run a JOIN query when getting the profile to get all of the extra fields.
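For the first variant (everything serialized into one column), a minimal sketch might look like this. It assumes JSON as the serialization format, a hypothetical users table with an extra column, and Python with in-memory SQLite in place of PHP and MySQL so the example is self-contained.

```python
import json
import sqlite3

# SQLite stands in for MySQL; the users table and its extra column are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, extra TEXT NOT NULL DEFAULT '{}')"
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")


def save_extra(conn, user_id, extra):
    """Serialize all additional fields into the single extra column."""
    conn.execute(
        "UPDATE users SET extra = ? WHERE id = ?", (json.dumps(extra), user_id)
    )


def load_profile(conn, user_id):
    """Load the core fields and decode the additional ones in one query."""
    name, extra = conn.execute(
        "SELECT name, extra FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return {"name": name, "extra": json.loads(extra)}


save_extra(conn, 1, {"my pet": "cat", "my favorite color": "green"})
profile = load_profile(conn, 1)
print(profile)
```

The trade-off is the one described above: this is convenient when the extras are always read together, but the database cannot easily filter or count by an individual extra field.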

SELECT DISTINCT or separate normalized table?

We're making plans now, so before I start I want to make sure I'm handling things in the best way.
We have a products table to which we're adding a new field called 'format', which is going to be the structure of the product (bag, box, etc.). There are no set values for this; users can enter whatever they like into that field. However, we want to show a drop-down list of all formats that the user has already entered.
There are two ways I can think of to do that: either a basic SELECT DISTINCT on the products table to get all formats the user already filled in; or a separate table that stores the formats and is linked to by the product.
Instinctively I'd like to use SELECT DISTINCT, since it would make my life easier. However, assuming a table of a billion products, which would be the best way to go?

I think I would opt for the second option (an additional table, plus a foreign key if you want to add a constraint), both because of the volume and because it lets you build management features, for example one that merges similar product formats.

If you decide to keep everything in one table, then build an index on the column. This should speed the processing for creating the list in the user application.
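The one-table variant could be sketched like this, with an index on the column and a SELECT DISTINCT feeding the drop-down. Python with in-memory SQLite is used in place of PHP and MySQL so the snippet runs on its own; the index name and the sample rows are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL; index name and sample data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL, format TEXT)"
)
# The index lets the database read distinct formats without scanning full rows.
conn.execute("CREATE INDEX idx_products_format ON products (format)")
conn.executemany(
    "INSERT INTO products (name, format) VALUES (?, ?)",
    [("Rice", "bag"), ("Cereal", "box"), ("Flour", "bag"), ("Tea", "box"), ("Misc", None)],
)

# Values for the drop-down list, skipping products with no format set.
formats = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT format FROM products WHERE format IS NOT NULL ORDER BY format"
    )
]
print(formats)
```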
I'm somewhat agnostic about which is the best approach. Often, when designing user interfaces, you want to try out different things. Having to make database changes impedes the creative process of building the application.
On the other hand, generally when users pick things from a drop down box in the application, these "things" are excellent examples of "entities" -- and that is what tables are intended to store.
In the end, I would say do what is most convenient while developing the application. As you get closer to finalizing it, consider whether it would be better to store these things in a separate table. One of the big questions is whether you want to know all formats that have ever been used, even if no user currently has them defined.

Since you are letting users enter whatever they want, I would go with the second option.
Create a new table, insert all the new 'formats' there, and link it to the products table.
When you write the code that adds the format the user typed in, be sure to check whether an equal value already exists in the database, so you won't need to de-duplicate them later.
Also, keep it consistent, for example by uppercasing only the first letter of each word.
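One possible shape for that "check before insert" logic is sketched below. It is an illustration only: Python with in-memory SQLite replaces PHP and MySQL so it can be run directly, and the formats table, the normalize_format rule, and the helper names are assumptions rather than anything prescribed by the answer.

```python
import sqlite3

# SQLite stands in for MySQL; table, column and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE formats (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        format_id INTEGER REFERENCES formats(id)
    );
    """
)


def normalize_format(raw):
    """Collapse whitespace and capitalize the first letter of each word."""
    return " ".join(word.capitalize() for word in raw.split())


def get_or_create_format(conn, raw):
    """Reuse an existing format row when the normalized name already exists."""
    name = normalize_format(raw)
    row = conn.execute("SELECT id FROM formats WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]
    return conn.execute("INSERT INTO formats (name) VALUES (?)", (name,)).lastrowid


def add_product(conn, name, raw_format):
    """Insert a product linked to its (possibly new) format."""
    format_id = get_or_create_format(conn, raw_format)
    conn.execute(
        "INSERT INTO products (name, format_id) VALUES (?, ?)", (name, format_id)
    )
    return format_id


first = add_product(conn, "Rice", "paper bag")
second = add_product(conn, "Flour", "  PAPER   bag ")  # same format, typed differently
add_product(conn, "Cereal", "box")

# The drop-down reads the small formats table instead of scanning products.
dropdown = [row[0] for row in conn.execute("SELECT name FROM formats ORDER BY name")]
print(dropdown)
```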

PHP mySQL best practices regarding lookups

My question:
I have a mysql database that consists of something like a fact table (although not every field is a lookup) and a variety of other tables. When I want to display data from that "fact" table, is it necessary to run a query to each individual lookup or is there a way to make a temporary table that has already done the "looking up"?
Example:
Table structure -
unique_id(auto increment int),
model(int, lookup to table #2),
type(int, lookup from table #2 to table #3)
employee(int, lookup to table #4)
notes(text)
cost(float)
hours(float)
-
So for instance when I want to make a php page to enter this data it seems like a lot more "work" than it needs to be:
unique_id (not shown as a data entry field, increments automatically on submit)
model (drop down box. population requires query to table #2 where status = X)
type (read-only text box shows type of model. Requires query to table #3 based on column from table #2)
employee (drop down box. population requires query to table #4 where employee_status = "Active")
notes (text box, user inputs related notes to submission)
cost (text box, user enters costs related to submission)
hours (text box, user enters hours related to submission)
Just to get a simple form populated with valid data requires what seems to me like A LOT of queries/lookups.
Is this the best way? Is there a better way?
Aside: I have control over the data structure, so if the problem is the database design, then those suggestions would be helpful as well.

Dimension tables typically don't change very often, at least relative to the number of inserts to the fact table. Dimension tables are also individually much smaller than the fact table. This makes dimension tables good candidates for caching.
What some people do to good effect is to render the partial HTML output for the form, with all the data populated as dropdowns, radiobuttons, etc. Then store that partial HTML under a memcached key so you don't have to do any of the database queries or the HTML render for most PHP requests -- you just fetch the pre-populated HTML fragment out of memcached and echo it verbatim. I think of this like the "Ikea" of database-driven output.
Of course if you ever do change data in a dimension table, you'd want to invalidate the cached HTML, or even better re-generate it and store a new version of the HTML in memcached.
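The caching idea can be sketched as follows. To keep the example self-contained, a plain Python dict stands in for memcached and in-memory SQLite stands in for MySQL; the cache key, the employees table layout, and the function names are all assumptions. The render_count variable exists only to make the saved database work visible.

```python
import html
import sqlite3

# SQLite stands in for MySQL; a dict stands in for memcached. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, employee_status TEXT)"
)
conn.executemany(
    "INSERT INTO employees (name, employee_status) VALUES (?, ?)",
    [("Ann", "Active"), ("Bob", "Inactive"), ("Cy", "Active")],
)

cache = {}
CACHE_KEY = "form:employee_dropdown"
render_count = 0  # counts how often the database is actually queried


def render_employee_dropdown(conn):
    """Query the dimension table and build the HTML fragment."""
    global render_count
    render_count += 1
    rows = conn.execute(
        "SELECT id, name FROM employees WHERE employee_status = 'Active' ORDER BY name"
    ).fetchall()
    options = "".join(
        f'<option value="{emp_id}">{html.escape(name)}</option>' for emp_id, name in rows
    )
    return f'<select name="employee">{options}</select>'


def get_employee_dropdown(conn):
    """Serve the fragment from the cache, rendering only on a miss."""
    if CACHE_KEY not in cache:
        cache[CACHE_KEY] = render_employee_dropdown(conn)
    return cache[CACHE_KEY]


def add_employee(conn, name):
    """Change the dimension table, then regenerate the cached fragment."""
    conn.execute(
        "INSERT INTO employees (name, employee_status) VALUES (?, 'Active')", (name,)
    )
    cache[CACHE_KEY] = render_employee_dropdown(conn)


first = get_employee_dropdown(conn)
second = get_employee_dropdown(conn)  # served from the cache, no query
add_employee(conn, "Dee")
third = get_employee_dropdown(conn)
print(render_count)
```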
Regarding doing all the lookups, I'll point out that there's no requirement to use pseudokeys in a fact table. You can use the natural values, and make them reference the primary key of the dimension table, which also can be a natural key instead of a pseudokey. It might take a bit more space in some cases, but it eliminates the lookups. Of course it may make sense to continue using pseudokeys for dimensions that are long varchars.
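To illustrate the natural-key point: in the sketch below the fact table stores the model name itself, still constrained by a foreign key, so listing the rows needs no join. The schema is a simplified, assumed version of the one in the question, and Python with in-memory SQLite replaces PHP and MySQL so the example runs standalone.

```python
import sqlite3

# SQLite stands in for MySQL; the simplified schema and sample values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE models (model_name TEXT PRIMARY KEY, type TEXT NOT NULL);
    CREATE TABLE work_log (
        unique_id INTEGER PRIMARY KEY AUTOINCREMENT,
        model TEXT NOT NULL REFERENCES models(model_name),  -- natural key, no pseudokey
        hours REAL,
        notes TEXT
    );
    """
)
conn.execute("INSERT INTO models (model_name, type) VALUES ('X100', 'printer')")
conn.execute("INSERT INTO work_log (model, hours, notes) VALUES ('X100', 2.5, 'repair')")

# The readable value is already in the fact table, so no lookup query is needed.
rows = conn.execute("SELECT model, hours FROM work_log").fetchall()
print(rows)

# The foreign key still rejects values that are not in the dimension table.
try:
    conn.execute("INSERT INTO work_log (model, hours) VALUES ('NoSuchModel', 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```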

I'm not quite sure what you mean by " a query to each individual lookup". Do you mean a way to save your entire table in your php script? Or do you mean a way to cache on the mysql server to eliminate process resources on the database node?
MySQL includes a built-in caching system (the query cache) that saves a lot of server cycles for repeated queries; note that it was removed in MySQL 8.0. You can find more in the MySQL documentation on caching.
As far as your database structure, you're going to have to provide a little bit more detail about your schema (What your database is meant to do) if you would like some suggestions. It's hard to know what kind of structure works and is effective without knowing what it's supposed to do. (Are there multiple notes per employee, what are costs? Are they per employee? etc)

Personalized Search Results based on History

What are some of the techniques for providing personalized search results to a logged in user? One way I can think of will be by analyzing the user's browsing history.
Tracking: A log of a user's activities, like pages viewed and 'like' buttons clicked, can be used to bias search results.
Question 1: How do you track a user's browsing history? A table with columns user_id, number_of_hits, page_id? If I have 1000 daily visitors, each browsing 10 pages on average, won't there be a large number of records to select each time a personalized recommendation is required? The table will grow at 300K rows a month! It will take longer and longer to select the rows each time a search is made. I guess the table for recording 'likes' will take the same table design.
Question 2: How do you bias the results of a search? For example, if a user has been searching for Apple products, how does the search engine realise that the user likes Apple products and subsequently bias the search towards them? Tag the pages and accumulate a record of tags on the pages visited?

You probably don't want to use a relational database for this type of thing; take a look at MongoDB or Cassandra. That's because you basically want to add a new column to the user's history, so a schema-flexible store (document-oriented like MongoDB, or wide-column like Cassandra) makes more sense.

300k rows per month is not really that much; in fact, that's almost nothing. It doesn't matter whether you use a relational or non-relational database for this.
A straightforward approach is the following:
Put entries into the table/collection like this:
timestamp, user, action, misc information
(make sure that you put in as much information as possible, so that you don't need to join this data warehousing table with any other table)
Partition by timestamp (one partition per month).
Never query this table directly; instead have, say, daily report jobs run over all the data, compute the necessary statistics, and write them to a summary table.
Review your report queries and add appropriate partition-local indexes.
Only query the summary table from your web frontend.
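The log-plus-summary approach above might be sketched like this. Partitioning is left out because SQLite, used here in place of MySQL so the example is self-contained, does not support it; the table names, the page_tag column, and the rebuild_summary job are assumptions made for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Raw, denormalized log: written on every page view, never read by the frontend.
    CREATE TABLE activity_log (
        ts TEXT, user_id INTEGER, action TEXT, page_id INTEGER, page_tag TEXT
    );
    -- Small summary table: the only thing the web frontend queries.
    CREATE TABLE user_tag_summary (
        user_id INTEGER, page_tag TEXT, hits INTEGER,
        PRIMARY KEY (user_id, page_tag)
    );
    """
)
conn.executemany(
    "INSERT INTO activity_log VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-01-03 10:00", 1, "view", 10, "apple"),
        ("2024-01-03 10:05", 1, "like", 10, "apple"),
        ("2024-01-04 09:00", 1, "view", 11, "apple"),
        ("2024-01-04 09:30", 1, "view", 20, "android"),
        ("2024-01-05 12:00", 2, "view", 20, "android"),
    ],
)


def rebuild_summary(conn):
    """Stand-in for the daily report job: recompute per-user tag counts."""
    conn.execute("DELETE FROM user_tag_summary")
    conn.execute(
        "INSERT INTO user_tag_summary (user_id, page_tag, hits) "
        "SELECT user_id, page_tag, COUNT(*) FROM activity_log "
        "GROUP BY user_id, page_tag"
    )


def top_tags(conn, user_id, limit=3):
    """What the frontend would call when biasing a search for this user."""
    return conn.execute(
        "SELECT page_tag, hits FROM user_tag_summary WHERE user_id = ? "
        "ORDER BY hits DESC, page_tag LIMIT ?",
        (user_id, limit),
    ).fetchall()


rebuild_summary(conn)
print(top_tags(conn, 1))
```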

If you stored only the last X results as opposed to everything, it would probably be doable. It might slow things down, but it'd work. Any time you're writing more data and reading more data, there's going to be an impact. Proper DBA methods such as indexing and query optimization can help, but no matter what you use there's going to be an effect.
I'd personally look at storing just a default view for the user in a DB and use the session to keep track of the rest. Sure, when you login there'd be no history. But you could take advantage of that to highlight a set of special pages that you think are important or relevant to steer the user to. A highlight system of sorts. Faster, easier, and more user-friendly.
As for bias, you could write a set of keywords for each record and array sort them accordingly. Wouldn't be terribly difficult using PHP.
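A small sketch of keyword-based biasing, written in Python rather than PHP so it matches the other runnable examples here. The result structure, the keyword weights, and the personalize function are hypothetical; the idea is only to score each result by how well its keywords match what the user has shown interest in, then sort.

```python
def personalize(results, user_keywords):
    """Sort results so those matching the user's weighted keywords come first."""

    def score(result):
        # Sum the user's interest weight for every keyword on the result.
        return sum(user_keywords.get(keyword, 0) for keyword in result["keywords"])

    # sorted() is stable, so results with equal scores keep their original order.
    return sorted(results, key=score, reverse=True)


# Hypothetical search results, each with a hand-written keyword list.
results = [
    {"title": "Banana bread", "keywords": ["food"]},
    {"title": "iPhone case", "keywords": ["apple", "accessory"]},
    {"title": "MacBook review", "keywords": ["apple", "laptop"]},
]
# Hypothetical interest weights accumulated from the user's history.
user_keywords = {"apple": 5, "laptop": 2}

ranked = [r["title"] for r in personalize(results, user_keywords)]
print(ranked)
```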

I use MySQL with over 2M records (page views) a month, and we run reports on that table daily and often.
The table is partitioned by month (as already suggested) and indexed where needed.
I also clear data older than 6 months out of the table by creating a new table for it called "page_view_YYMM" (YY=year, MM=month) and using UNIONs when necessary.
For the second question, the way I would approach it is by creating a table listing your products that is simply:
url, description
The description will be the tag-stripped content of your page or item (depending on how you want to influence the search). Then add a full-text index on description and search that table, adding any extra terms you have collected while the user was browsing your site that you think are relevant (for example, category name or brand).
