I have a system where people upload a file; a single link then identifies the uploaded file and displays it to other users. Each file has its own page with a comment box. I have the general idea of the PHP script and the SQL; I just have one problem. What would be better: storing all of the comments for each file in a single row, adding a delimiter for each new comment, or creating a new row for each comment with a reference back to the file? I would prefer each comment to have its own row in SQL, but then the table could easily grow by thousands of entries per day, and I'm not sure whether that's efficient.
Which of the two approaches is more efficient?
Adding a new row for each comment makes more sense. You could have a FileID column in the comment table and easily filter the results by it. Yes, every comment adds a row, but that will still be much more efficient than a single row, especially since a single row is unlikely to have enough space for thousands of comments anyway!
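For illustration, here is a minimal sketch of that design; the table name, column names, and connection details are assumptions, not something from the question:

    <?php
    // Assumed table:
    //   CREATE TABLE comments (
    //       CommentID INT AUTO_INCREMENT PRIMARY KEY,
    //       FileID    INT NOT NULL,
    //       Author    VARCHAR(64) NOT NULL,
    //       Body      VARCHAR(1000) NOT NULL,
    //       PostedAt  DATETIME NOT NULL,
    //       INDEX idx_file (FileID)
    //   );
    $pdo = new PDO('mysql:host=localhost;dbname=uploads', 'user', 'pass');

    $fileId = 42;                 // sample values
    $author = 'alice';
    $body   = 'Nice upload!';

    // One row per comment, keyed back to the file it belongs to.
    $stmt = $pdo->prepare(
        'INSERT INTO comments (FileID, Author, Body, PostedAt) VALUES (?, ?, ?, NOW())'
    );
    $stmt->execute([$fileId, $author, $body]);

    // The file page just filters by FileID; no delimiter parsing needed.
    $stmt = $pdo->prepare(
        'SELECT Author, Body, PostedAt FROM comments WHERE FileID = ? ORDER BY PostedAt'
    );
    $stmt->execute([$fileId]);
    $comments = $stmt->fetchAll(PDO::FETCH_ASSOC);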
If you're really concerned about having too many comments, you could look into a NoSQL solution like MongoDB or Cassandra. These systems are built specifically for handling large volumes of inserts (i.e., new comments).
If each comment's length is limited, go for separate rows with a VARCHAR column. If you choose a single row for all comments pertaining to each file, you cannot estimate the length of the comments beforehand and will have to declare the column as TEXT, which (depending on the storage engine) may be stored outside the row and can slow retrieval. Moreover, advanced querying and data mining will be easier with separate rows.
Separate rows will also be consistent with the requirements of higher-order normalisation.
Make sure a comment has a set maximum length and use a VARCHAR column for it in the database. Saving data in one field using delimiters is not recommended; it would be a real hassle if you wanted to save information about the comments at a later stage (date, posted by, etc.), or even when querying the information back.
Save them as separate entries in a separate table with a reference to the uploadId.
I am making a website that interacts with an offline project through JSON files sent from the offline project to the site.
The site will need to load these files and manipulate the data.
Is it feasible with modern computing power to simply load these files into the database as a single serialized field, which can then be loaded and decoded for every use?
Or would it save significant overhead to properly store the JSON as tables and fields and refer to those for every use?
Without knowing more about the project, a table with multiple fields is probably the better solution.
There will be more options for the data in the long run: for example, indexing fields, searching through fields, and many other MySQL features that would not be possible if it were all stored in a single field.
Consider future versions of the project too; for example, adding another field to a table is easy, but adding another field to a block of JSON would be more difficult.
Consider project growth as well: if you experience 100x or 1000x growth, will the table handle the extra load?
500 KB is a relatively small data block, and there shouldn't be any issue with computing power regardless of which method is used, although more information would be handy here: for example, is it 500 KB per user or per upload, how many stores are there a day, and how often is it accessed?
Debugging will also be easier.
The new MySQL Shell has a bulk JSON loader that is not only very quick but gives you a lot of control over how the data is handled. See https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-json.html
Load it as JSON.
Think about what queries you need to perform.
Copy selected fields into MySQL columns so that those queries can use WHERE, GROUP BY, ORDER BY of MySQL instead of having to deal with the processing in the client.
A database table contains a bunch of similarly structured rows, and each row has a fixed set of columns (NULLs can be used to indicate missing values for a given row). JSON complicates things by providing a complex column. My advice above is a compromise between the open-ended flexibility of JSON and the need to use the database server to process lots of data.
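As a rough sketch of that compromise (table and field names are invented, and this assumes MySQL 5.7+ for the JSON column type and generated columns):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=project', 'user', 'pass');

    // Keep the raw JSON, but promote a field you query on into a real,
    // indexable column via a stored generated column.
    $pdo->exec("
        CREATE TABLE IF NOT EXISTS uploads (
            id      INT AUTO_INCREMENT PRIMARY KEY,
            payload JSON NOT NULL,
            user_id INT AS (payload->>'$.user_id') STORED,
            INDEX   idx_user (user_id)
        )
    ");

    // Clients insert only the JSON; MySQL maintains the extracted column.
    $stmt = $pdo->prepare('INSERT INTO uploads (payload) VALUES (?)');
    $stmt->execute([json_encode(['user_id' => 7, 'score' => 99])]);

    // This WHERE clause now uses the index instead of parsing JSON per row.
    $stmt = $pdo->prepare('SELECT payload FROM uploads WHERE user_id = ?');
    $stmt->execute([7]);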
I'm trying to optimize my PHP and MySQL, but my understanding of SQL databases is shoddy at best. I'm creating a website (mostly for learning purposes) which allows users to make different kinds of posts (image/video/text/link).
Here are the basics of what I'm storing:
Auto - int (key index)
User ID - varchar
Post id - varchar
Post Type - varchar (YouTube, vimeo, image, text, link)
File Name - varchar (original image name or link title)
Source - varchar (external link or name of file + ext)
Title - varchar (post title picked by user)
Message - text (user's actual post)
Date - int (unix timestamp)
I have other data relevant to the post stored in other tables, which I grab with the post id (like user information), but I'm really doubting whether this is the right way to store the information. I do use PDO, but I'm afraid this format might just be extremely slow.
Would there be any sense in storing the post information in another format? I don't want excessively large tables, so from a performance standpoint should I store some information as a blob/binary/xml/json?
I can't seem to find any good resources on PHP/MySQL optimization. Most information I come across tends to be 5-10 years old, content you have to pay for, too low-level, or just straight documentation which can't hold my attention for more than half an hour.
Databases are made to store data and are fast at retrieving it. Do not switch to anything else; stick with a database.
Try not to store pictures and videos in a database. Store them on disk and keep a reference to them in a database table.
Finally, catch up on database normalization, it will help you in getting your database in optimal condition.
What you have seems okay, but you have missed the important bit about indexes and keys.
Firstly, I am assuming that your primary key will be field 1. Okay, no problems there, but make sure that you also stick an index on UserID, PostID, and Date, and probably a composite index on (UserID, Date).
Secondly, are you planning on having search functions on these? In that case you may need to enable full-text search.
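For instance, the indexes could be added like this (the table and exact column names are assumptions based on the question's schema):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

    // Single-column indexes for the common lookups.
    $pdo->exec('ALTER TABLE posts ADD INDEX idx_user (UserID)');
    $pdo->exec('ALTER TABLE posts ADD INDEX idx_post (PostID)');
    $pdo->exec('ALTER TABLE posts ADD INDEX idx_date (`Date`)');

    // Composite index for "this user's posts, newest first" style queries.
    $pdo->exec('ALTER TABLE posts ADD INDEX idx_user_date (UserID, `Date`)');

    // Full-text index so searching Title/Message does not scan every row
    // (MyISAM, or InnoDB as of MySQL 5.6).
    $pdo->exec('ALTER TABLE posts ADD FULLTEXT idx_search (Title, Message)');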
Don't muck around trying to store data as JSON or other such things. Store it plain and simple. The last thing you want to be doing is extracting a field from the database just to see what is inside. If your database can't work it out, it is a bad design.
On that note, there isn't anything wrong with large tables. As long as they are indexed nicely, a small table or a large table makes very little difference in terms of access time (short of huge, badly written SQL joins), so worry about keeping things simple enough to get the data back out.
Edit: A primary key is a lovely way to identify a row by a unique column of some sort. So, if you want to delete a row in your example, you might specify delete from yourTable where ID=6 and know that this will only delete one row, as only one row can have ID=6.
An index, on the other hand, is different from a key in that it is like a cheat sheet the database uses to know where certain information is inside the table. For example, if you have an index on the UserID column, then when you pass a UserID in a query, the database won't have to look through the entire table; it looks at the index and knows the location of all the rows for that user.
A composite index takes this one step further: if you know you will constantly query data by both UserID and ContentType, you can add a composite index (meaning an index on BOTH fields in one index), which allows the database to return only the data you specify in a query using both those columns without sifting through the entire table, or even through all of a user's posts to find the right content type.
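For example, with such a composite index in place (again, the table and column names are assumptions):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');
    $pdo->exec('ALTER TABLE posts ADD INDEX idx_user_type (UserID, PostType)');

    $userId = 123;

    // The database can satisfy this straight from the composite index's
    // cheat sheet: it jumps to one user's rows of one type, with no
    // full-table scan and no sifting through the user's other posts.
    $stmt = $pdo->prepare(
        'SELECT PostID, Title FROM posts WHERE UserID = ? AND PostType = ?'
    );
    $stmt->execute([$userId, 'image']);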
Now, indexes take up some extra space on the server, so keep that in mind, but if your tables grow to be larger (which is perfectly fine) the improved efficiency is staggering.
Stick with an RDBMS for now. Once you are comfortable with PHP and MySQL there will be more to learn later on, like NoSQL and MongoDB, but for your current purpose this is quite right and will not slow you down. Your table schema seems right, with a few modifications.
User id and Post id should be integers, and since this table holds posts, the post id should be auto-incremented and be the primary key.
Another thing: you are using two fields, filename and source. Note that filename will be the name of the uploaded file, but if by source you mean the file's complete path, then the DB is not the place for storing it. Generate the path from a PHP function every time you need it instead of keeping it in the DB; otherwise, if you ever need to change the path, it will be a lot of overhead.
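A tiny sketch of that idea (the directory constant and helper name are hypothetical):

    <?php
    // Store only the bare file name in the DB; build the full path in one
    // place so that moving the upload directory is a one-line change.
    const UPLOAD_DIR = '/var/www/uploads';

    function uploadPath(string $fileName): string
    {
        // basename() strips any directory parts, so a stored name like
        // '../../etc/passwd' cannot escape the upload directory.
        return UPLOAD_DIR . '/' . basename($fileName);
    }

    echo uploadPath('photo.jpg'); // /var/www/uploads/photo.jpg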
You also asked about BLOBs. Note that it is better to store files in the file system, not in the DB; fields like BLOB are for when one wants to store a file in a DB table, which I don't recommend here.
I am doing a project (PHP) where I need to store about 4 different pieces of text about a person, each containing about 250 characters. There is currently no limit to the number of times this must be done.
Would you suggest I store the 4 pieces of text in a database table and pull the text out of it whenever a user enters the given person's page/profile, or should I rather turn them into files?
Which method would be the best in terms of speed, scalability, etc.?
Thanks
Databases are the perfect solution for what you want to do, and PHP has plenty of functions to work with them, so you don't have to reinvent the wheel to store data in flat files.
Think, for instance, of the pain you'll have in 6 months' time when you'll have to take all those files and add a column to each one of them...
With a DB you'd just have to run one very simple query.
So, essentially, use a DB.
I would do this in a database. File operations are (as I recall) slower than database queries. The fact that you'll potentially have ~1 KB of data for each person, with a potentially unlimited number of persons, suggests that a DB is a better fit than text files. Define your table, then insert/select. The records are always guaranteed to have a consistent structure, and you won't have to worry about tripping over a delimiter character between fields.
My question:
I have a MySQL database that consists of something like a fact table (although not every field is a lookup) and a variety of other tables. When I want to display data from that "fact" table, is it necessary to run a query for each individual lookup, or is there a way to make a temporary table that has already done the "looking up"?
Example:
Table structure:
unique_id (auto-increment int),
model (int, lookup to table #2),
type (int, lookup from table #2 to table #3),
employee (int, lookup to table #4),
notes (text),
cost (float),
hours (float)
So, for instance, when I want to make a PHP page to enter this data, it seems like a lot more "work" than it needs to be:
unique_id (not shown as a data entry field; increments automatically on submit)
model (drop down box. population requires query to table #2 where status = X)
type (read-only text box shows type of model. Requires query to table #3 based on column from table #2)
employee (drop down box. population requires query to table #4 where employee_status = "Active")
notes (text box, user inputs related notes to submission)
cost (text box, user enters costs related to submission)
hours (text box, user enters hours related to submission)
Just to get a simple form populated with valid data requires what seems to me like A LOT of queries/lookups.
Is this the best way? Is there a better way?
Aside: I have control over the data structure, so if the problem is the database design, then those suggestions would be helpful as well.
Dimension tables typically don't change very often, at least relative to the number of inserts to the fact table. Dimension tables are also individually much smaller than the fact table. This makes dimension tables good candidates for caching.
What some people do to good effect is to render the partial HTML output for the form, with all the data populated as dropdowns, radiobuttons, etc. Then store that partial HTML under a memcached key so you don't have to do any of the database queries or the HTML render for most PHP requests -- you just fetch the pre-populated HTML fragment out of memcached and echo it verbatim. I think of this like the "Ikea" of database-driven output.
Of course if you ever do change data in a dimension table, you'd want to invalidate the cached HTML, or even better re-generate it and store a new version of the HTML in memcached.
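A hedged sketch of that pattern in PHP, using the Memcached extension (the key name, TTL, and the models dimension table are all assumptions):

    <?php
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    $key  = 'form:model_dropdown:v1';   // bump the suffix to invalidate
    $html = $mc->get($key);

    if ($html === false) {
        // Cache miss: query the dimension table and render the HTML once.
        $pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
        $rows = $pdo->query("SELECT id, name FROM models WHERE status = 'X'")
                    ->fetchAll(PDO::FETCH_ASSOC);

        $html = '<select name="model">';
        foreach ($rows as $row) {
            $html .= sprintf(
                '<option value="%d">%s</option>',
                $row['id'],
                htmlspecialchars($row['name'])
            );
        }
        $html .= '</select>';

        $mc->set($key, $html, 600);     // re-render at most every 10 minutes
    }

    echo $html; // most requests reach this line without touching MySQL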
Regarding doing all the lookups, I'll point out that there's no requirement to use pseudokeys in a fact table. You can use the natural values, and make them reference the primary key of the dimension table, which also can be a natural key instead of a pseudokey. It might take a bit more space in some cases, but it eliminates the lookups. Of course it may make sense to continue using pseudokeys for dimensions that are long varchars.
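In schema terms, the natural-key variant looks something like this (names invented for illustration):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    // Dimension keyed by its natural value rather than a pseudokey.
    $pdo->exec('CREATE TABLE models (
        model_code VARCHAR(20) PRIMARY KEY,
        model_name VARCHAR(100) NOT NULL
    )');

    // The fact table stores the code itself, so displaying it requires no
    // join; a lookup is only needed for extra attributes like model_name.
    $pdo->exec('CREATE TABLE facts (
        unique_id  INT AUTO_INCREMENT PRIMARY KEY,
        model_code VARCHAR(20) NOT NULL,
        cost       FLOAT,
        hours      FLOAT,
        FOREIGN KEY (model_code) REFERENCES models (model_code)
    )');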
I'm not quite sure what you mean by "a query to each individual lookup". Do you mean a way to save your entire table in your PHP script? Or a way to cache on the MySQL server to save processing resources on the database node?
MySQL includes a built-in caching system (the query cache) that eliminates a lot of server cycles for similar queries. You can find more in the MySQL documentation on caching.
As far as your database structure goes, you're going to have to provide a little more detail about your schema (what your database is meant to do) if you would like suggestions. It's hard to know what kind of structure works and is effective without knowing what it's supposed to do. (Are there multiple notes per employee? What are costs, and are they per employee? Etc.)
I have a pretty large social-network-type site that I have been working on for about 2 years (high traffic and hundreds of files). For the last couple of years I have been experimenting with tweaking things for maximum performance under the traffic, and I have learned a lot. Now I have a huge task: I am planning to completely re-code my social network, so I am re-designing the MySQL DBs and everything.
Below is a photo I made of a couple of MySQL tables that I have a question about. I currently have the login table, which is used in the login process; once a user is logged into the site they very rarely need to hit that table again unless they edit an email or password. I then have a user table, which is basically the user's settings and profile data for the site. This is where I have questions: would it give better performance to split the user table into smaller tables? For example, if you view the user table you will see several fields that I have marked as "setting_"; should I just create a separate settings table? I also have fields marked with "count", which could be the total count of comments, photos, friends, mail messages, etc. So should I create another table to store just the total count of things?
The reason I have them all in one table now is that I was thinking it might cut down on MySQL queries: instead of hitting 3 tables to get information on every page load, I could hit 1.
Sorry if this is confusing, and thanks for any tips.
(Table diagram: http://img2.pict.com/b0/57/63/2281110/0/800/dbtable.jpg)
As long as you don't SELECT * FROM your tables, having 2 or 100 fields won't affect performance.
Just SELECT only the fields you're going to use and you'll be fine with your current structure.
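For instance (column names invented, since the real table is only visible in the screenshot):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=social', 'user', 'pass');

    // Ask only for what the page actually renders; never SELECT *.
    $stmt = $pdo->prepare(
        'SELECT username, avatar, setting_timezone FROM user WHERE user_id = ?'
    );
    $stmt->execute([42]);
    $profile = $stmt->fetch(PDO::FETCH_ASSOC);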
should I just create a separate settings table?
So should I create another table to store just the total count of things?
There is not a single correct answer for this; it depends on what your application is doing.
What you can do is measure in a dev environment and extrapolate the results.
On the one hand, using separate tables will save you some space and make the code easier to modify.
On the other hand, you may lose some performance (as you already suspect) by having to join information from different tables.
About the counts: I think it's fine to have them there. Although it is often said that it's better to calculate this kind of thing, I don't think it hurts you at all in this situation.
But again, the only way to know what's better for you and your specific app is to measure, profile, and find out the benefit of doing so. You might only gain a 2% improvement.
You'll need to compare performance testing results between the following:
Leaving it alone
Breaking it up into two tables
Using different queries to retrieve the login data and profile data (if you're not doing this already) with all the data in the same table
Also, you could implement some kind of caching strategy on the profile data if the usage data suggests this would be advantageous.
You should consider putting the counter columns and frequently updated timestamps in their own table; every time you bump them, the entire row is rewritten.
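Roughly like this (a sketch; table and column names are invented):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=social', 'user', 'pass');

    // A narrow counters table: bumping a count rewrites a few ints
    // instead of the whole wide user row.
    $pdo->exec('CREATE TABLE user_counts (
        user_id       INT PRIMARY KEY,
        comment_count INT NOT NULL DEFAULT 0,
        photo_count   INT NOT NULL DEFAULT 0,
        friend_count  INT NOT NULL DEFAULT 0
    )');

    $stmt = $pdo->prepare(
        'UPDATE user_counts SET comment_count = comment_count + 1 WHERE user_id = ?'
    );
    $stmt->execute([42]);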
I wouldn't consider your user table terribly large in number of columns; just my opinion. I also wouldn't break that table into multiple tables unless you can find a case for removing redundancy. Perhaps you have a lot of users who share the same settings; that would be a case for breaking the table out.
You should take into account the average size of a single row, in order to find out whether retrieval is expensive. Also, try to use indexes when looking for data.
The most important thing is to design properly, not just to split because "it looks large". Maybe the IP or IPs could go somewhere else; it depends on the data saved there.
Also, since the social network site using this data presumably also handles authentication and authorization, the separation between the login and user tables should offer good performance, because the login data is short enough, while the profile might be accessed only once, immediately after a successful login. Just apply the right tricks to improve DB performance and it's done.
(Remember to visualize tables as entities and to name each one as an entity, not as a collection of them.)
Two things you will want to consider when deciding whether or not to break up a single table into multiple tables are:
MySQL likes small, consistent datasets. If you can structure your tables so that they have fixed row lengths, that will help performance, at the potential cost of disk space. A common approach is to keep fixed-length data in its own table and put the variable-length data somewhere else, as in the sketch below.
Joins are in most cases less performant than not joining. If the data currently in your table will normally be accessed all at the same time, it may not be worth splitting it up, as you will slow down inserts and quite possibly reads. However, if some data in that table is not accessed as often, it would be a good candidate for moving out of the table for performance reasons.
I can't find a resource online to substantiate this next statement, but I recall that in a MySQL performance talk, Jay Pipes said the MySQL optimizer has issues once you get more than 8 joins in a single query (MySQL 5.0.*). I am not sure how accurate that magic number is, but regardless, joins will usually take longer than queries against a single table.
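Here is the fixed-vs-variable split from the first point as a sketch (names are illustrative; the fixed-row-size benefit applies mainly to MyISAM tables):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

    // Hot table: only fixed-length columns, so every row is the same size
    // and the engine can locate row N with simple arithmetic.
    $pdo->exec('CREATE TABLE post_meta (
        post_id   INT PRIMARY KEY,
        user_id   INT NOT NULL,
        posted_at INT NOT NULL,
        post_type CHAR(8) NOT NULL
    )');

    // Cold table: the variable-length text, fetched only when a post is
    // actually displayed.
    $pdo->exec('CREATE TABLE post_body (
        post_id INT PRIMARY KEY,
        title   VARCHAR(255),
        message TEXT
    )');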