I am doing a project (PHP) where I need to store about 4 different pieces of text about a person, each containing about 250 characters. There is currently no limit to the number of times this must be done.
Would you suggest I store the 4 pieces of text in a database table and pull the text out of it whenever a user enters the given person's page/profile, or should I rather make files out of them?
Which method would be the best in terms of speed, scalability, etc.?
Thanks
Databases are the perfect solution for what you want to do, and PHP has plenty of functions to work with them, so you don't have to reinvent the wheel to store data in flat files.
Think, for instance, of the pain you'll have in six months' time when you have to take all those files and add a new field to each one of them...
With a DB you'd just have to run one very simple query.
So, essentially, use a DB.
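For instance, a minimal sketch of that one simple query (the `profiles` table and the new column are hypothetical names):

```php
<?php
// Adding a field to every stored record is one DDL statement,
// versus editing thousands of flat files by hand.
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');
$pdo->exec("ALTER TABLE profiles ADD COLUMN nickname VARCHAR(100) NULL");
```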
I would do this in a database. File operations are (as I recall) slower than a database query. The fact that you'll potentially have ~1 KB of data for each person, with a potentially unlimited number of persons, suggests that it would be better done in a DB than as text files. Define your table and then insert/select. The records are always guaranteed to have a consistent structure, and you'll not have to worry about tripping over the delimiter character for fields.
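A rough sketch of what that table and the insert/select could look like with PDO (all names are assumptions; VARCHAR(255) comfortably holds ~250 characters):

```php
<?php
// Connect once; credentials are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// One row per person, one column per piece of text.
$pdo->exec("
    CREATE TABLE IF NOT EXISTS profiles (
        person_id INT UNSIGNED PRIMARY KEY,
        bio       VARCHAR(255) NOT NULL,
        hobbies   VARCHAR(255) NOT NULL,
        work      VARCHAR(255) NOT NULL,
        quote     VARCHAR(255) NOT NULL
    )
");

// Store the four pieces of text for one person.
$stmt = $pdo->prepare(
    "INSERT INTO profiles (person_id, bio, hobbies, work, quote)
     VALUES (?, ?, ?, ?, ?)"
);
$stmt->execute([42, 'Bio text...', 'Hobbies text...', 'Work text...', 'Quote text...']);

// Pull them back whenever someone views the profile page.
$stmt = $pdo->prepare("SELECT bio, hobbies, work, quote FROM profiles WHERE person_id = ?");
$stmt->execute([42]);
$profile = $stmt->fetch(PDO::FETCH_ASSOC);
```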
I am making a website that interacts with an offline project through JSON files sent from the offline project to the site.
The site will need to load these files and manipulate the data.
Is it feasible with modern computing power to simply load these files into the database as a single serialized field, which can then be loaded and decoded for every use?
Or would it save significant overhead to properly store the JSON as tables and fields and refer to those for every use?
Without knowing more about the project, a table with multiple fields is probably the better solution.
There will be more options for the data in the long run: for example, indexing and searching on individual fields, and many other MySQL operations that would not be possible if it were all stored in a single serialized value.
Consider future versions of the project too: for example, adding another field to a table is easy, but adding another field to a block of JSON would be more difficult.
Consider project growth as well: if you experience 100x or 1000x growth, will the table handle the extra load?
500 KB is a relatively small data block, so there shouldn't be any issue with computing power regardless of which method is used, although more information would be handy here: for example, is that 500 KB per user or per upload, how many stores happen per day, and how often is the data accessed?
Debugging will also be easier.
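As a rough illustration of the two options (all table and field names are assumptions, since the actual JSON structure isn't given):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// Option 1: one opaque serialized field per upload. MySQL can store it,
// but cannot index, filter, or sort on anything inside it.
$pdo->exec("
    CREATE TABLE uploads_blob (
        id      INT AUTO_INCREMENT PRIMARY KEY,
        payload MEDIUMTEXT NOT NULL
    )
");

// Option 2: promote the values you query on to real, indexable columns.
$pdo->exec("
    CREATE TABLE uploads (
        id         INT AUTO_INCREMENT PRIMARY KEY,
        user_id    INT NOT NULL,
        created_at DATETIME NOT NULL,
        score      INT NOT NULL,
        INDEX idx_user  (user_id),
        INDEX idx_score (score)
    )
");

// With option 2, questions like 'top scores this week' stay inside MySQL:
$top = $pdo->query("
    SELECT user_id, score
    FROM uploads
    WHERE created_at >= NOW() - INTERVAL 7 DAY
    ORDER BY score DESC
    LIMIT 10
")->fetchAll(PDO::FETCH_ASSOC);
// With option 1, the same question means decoding every payload in PHP.
```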
The new MySQL Shell has a bulk JSON loader that is not only very quick but also gives you a lot of control over how the data is handled. See https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-json.html
Load it as JSON.
Think about what queries you need to perform.
Copy selected fields into MySQL columns so that those queries can use MySQL's WHERE, GROUP BY, and ORDER BY instead of having to do the processing in the client.
A database table contains a bunch of similarly structured rows. Each row has a constant set of columns. (NULLs can be used to indicate missing columns for a given row.) JSON complicates things by providing a complex column. My advice above is a compromise between the open-ended flexibility of JSON and the need to use the database server to process lots of data.
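One way to implement that compromise, sketched here with the JSON column type and generated columns available in MySQL 5.7+ (the document structure and field names are assumptions):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// Keep the full document, but materialize the fields you filter on as
// indexed generated columns so WHERE / GROUP BY / ORDER BY run in MySQL.
$pdo->exec("
    CREATE TABLE events (
        id      INT AUTO_INCREMENT PRIMARY KEY,
        doc     JSON NOT NULL,
        user_id INT GENERATED ALWAYS AS (CAST(doc->>'$.user_id' AS UNSIGNED)) STORED,
        kind    VARCHAR(32) GENERATED ALWAYS AS (doc->>'$.kind') STORED,
        INDEX idx_user_kind (user_id, kind)
    )
");

// Insert only the raw document; the generated columns fill themselves in.
$stmt = $pdo->prepare("INSERT INTO events (doc) VALUES (?)");
$stmt->execute([json_encode(['user_id' => 7, 'kind' => 'login', 'detail' => '...'])]);

// Query on the extracted, indexed columns instead of re-parsing JSON in PHP.
$stmt = $pdo->prepare("SELECT doc FROM events WHERE user_id = ? AND kind = 'login'");
$stmt->execute([7]);
$docs = $stmt->fetchAll(PDO::FETCH_COLUMN);
```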
I have a data structure type question that I don't really know the answer to. Essentially I have four permission controls (isSecret, canEdit, isActive and hasPage) that I need to store for a number of different tables.
I have two solutions in mind, but I'm not sure which is the best performance-wise:
Store each permission as a separate row on each table. To me this seems to be the fastest way to access the data when querying, but because PHP will handle permissions 90% of the time, it seems inefficient.
Have a single permissions column where the permission name (sec,edt,act,has) is stored as a comma-separated string. This gives me flexibility in the future to introduce new/different permissions, looks neat in my database, and is easy to use in both PHP and MySQL (I can use the IN lookup for queries, and explode the string and work with it as an array in PHP). This column would be a varchar of 40 characters, allowing me to store up to 10 different permissions (3 letters plus a comma each).
Option 2 was my preferred solution until I realised that the IN command might be resource-intensive. I thought it might take a performance hit using an IN command on every row in my table trying to look for inactive pages to filter out. To solve this, I could just fetch every single row in my table and then filter the rows out with PHP, but again, I'm not sure how effective this will be.
Ideally I think sub-columns would be the best solution (a main permissions column and under it 4 sub-columns, one for each of my permissions) that could then be queried easily (i.e. WHERE permission.canEdit = 1).
Results will eventually be cached using memcache (when I am able to figure out how to use it and an effective method for clearing it), but I don't want to have to rely on this.
I think MySQL SETs would be what you need.
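A sketch of how that might look (the permission names come from the question; the `pages` table is assumed). MySQL stores a SET compactly as a bitmask, and FIND_IN_SET() tests a single flag on the SQL side:

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// A SET column holds any combination of the listed values.
$pdo->exec("
    CREATE TABLE pages (
        id          INT AUTO_INCREMENT PRIMARY KEY,
        title       VARCHAR(255) NOT NULL,
        permissions SET('isSecret','canEdit','isActive','hasPage') NOT NULL DEFAULT ''
    )
");

// Store a combination as a comma-separated string, much like option 2.
$stmt = $pdo->prepare("INSERT INTO pages (title, permissions) VALUES (?, ?)");
$stmt->execute(['Home', 'canEdit,isActive']);

// Filter on one flag on the MySQL side...
$active = $pdo->query("
    SELECT id, title, permissions
    FROM pages
    WHERE FIND_IN_SET('isActive', permissions)
")->fetchAll(PDO::FETCH_ASSOC);

// ...and explode into an array on the PHP side, as the question suggests.
foreach ($active as $row) {
    $perms = explode(',', $row['permissions']);
}
```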
I have a system where people upload something, then a single link identifies the uploaded file and displays it for other users. Each file has its own page with a comment box. I have the general idea of how to create the PHP script and the SQL; I just have one problem. What would be better: storing all of the comments for each file in a single row, adding a delimiter for each new comment, or creating a new row for each comment but with the same reference to the file? I would prefer each comment to have its own row in SQL, but then it can easily grow to thousands of entries per day, and I'm not sure if that's efficient.
Which approach from the two is the most efficient?
Adding a new row for each comment would make more sense. You could have a column in the comment table which would be FileID and easily filter the results by that. While you will have people commenting on it and that will add rows, it will be much more efficient than having a single row. Especially considering that you will likely not have enough space for thousands of comments in a single row!
If you're really concerned about having too many comments, you should look into a NoSql solution like MongoDB or Cassandra. These systems are made specifically for handling large amounts of inserts (ie: new comments).
If (each) comment's length is limited, then go for separate rows with the VARCHAR type. If you choose a single row for all comments pertaining to each file, then you cannot estimate the length of the comments beforehand and will have to declare the column as TEXT, which may be stored outside the row and can impact retrieval time negatively. Moreover, advanced querying and data mining will be easier with separate rows.
Also, separate rows are consistent with the requirements of higher-order normalisation.
Make sure a comment has a set max length, and use a VARCHAR column in the database. Saving data in one field using delimiters is not recommended; it would be a real hassle if you want to store information about the comments at a later stage (date, posted by, etc.) or even when querying the information back.
Save them as separate entries on a separate table with a reference to the uploadId.
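A minimal sketch of that design (table and column names are assumptions):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// One row per comment, indexed by the file it belongs to.
$pdo->exec("
    CREATE TABLE comments (
        id         INT AUTO_INCREMENT PRIMARY KEY,
        file_id    INT NOT NULL,
        author     VARCHAR(100) NOT NULL,
        body       VARCHAR(1000) NOT NULL,
        created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
        INDEX idx_file (file_id)
    )
");

// Adding a comment is one insert; no delimiter games.
$stmt = $pdo->prepare("INSERT INTO comments (file_id, author, body) VALUES (?, ?, ?)");
$stmt->execute([123, 'alice', 'Nice upload!']);

// The file's page pulls only its own comments via the index.
$stmt = $pdo->prepare(
    "SELECT author, body, created_at FROM comments WHERE file_id = ? ORDER BY created_at"
);
$stmt->execute([123]);
$comments = $stmt->fetchAll(PDO::FETCH_ASSOC);
```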
I'm struggling with a philosophical question on database programming in PHP. In particular, I'm trying to decide when it's best to read in an entire table into an object, vs. querying MySQL directly whenever I need data.
Is there ever a situation where you'd want to just read in the entire database into an object? Where do you draw the line?
For example, if I had a table full of names and phone numbers, and I need to get the phone number for one individual, that's a simple one-time MySQL query. Reading in an entire table into an associative array just to get one phone number sounds ridiculous... But:
(1) what if I need to get the names and phone numbers of 50 individuals? 100? 1000?
(2) When is it more efficient (if ever) to read in the entire table into an object? Is performing 1000 MySQL queries on 1000 names always going to be more efficient than reading in the entire table?
(2a) Obviously it would depend on the total number of records in the table. Would it be better to do 1000 queries for 1000 phone numbers, or read in a table of 2000 total records from a MySQL into an associative array? What if it was 5000 total records, and I needed 1000? What if it was 10k? Etc. etc.
(3) What if I need to do something a little more complex, like return all phone numbers in a certain area code? Obviously in that case I could use a regexp SQL query, but I'm sure I could come up with a more complex case where a simple query doesn't give me exactly what I want.
I guess what I'm getting at is, as a developer, you have several knobs you can turn to optimize your application. Obviously you want to think about the data you're using and optimize the database model to match the types of data requests you'll be doing. But sometimes you get into a mutually exclusive case where you're forced to pick optimizing your data model for one scenario, at the expense of another, competing scenario.
Any thoughts?
Databases are designed to be efficient at locating and returning exactly the data that you need to work with for a particular operation.
Transferring data over a network connection is orders of magnitude slower than processing it on the machine where it resides. Use databases for what they're good at... holding lots of information and allowing application code to query and work with exactly the subset of that data it needs to at a given point in time.
If you find that you need to frequently access the same data over and over, caching it at the application layer or in a dedicated caching solution like memcached does make sense, but I cannot imagine a scenario where it makes sense just to read in a whole table because my application logic needs to process a subset of the rows and/or columns in the table.
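When that frequent-access case does arise, a hedged sketch of the cache-aside pattern with the PHP Memcached extension might look like this (table, key scheme, and TTL are all assumptions):

```php
<?php
// Cache hot lookups at the application layer instead of reading
// whole tables into PHP.
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');
$mc  = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function getPhoneNumber(PDO $pdo, Memcached $mc, int $personId): ?string
{
    $key    = "phone:$personId";
    $cached = $mc->get($key);
    if ($cached !== false) {           // note: get() returns false on a miss
        return $cached;
    }
    $stmt = $pdo->prepare("SELECT phone FROM people WHERE id = ?");
    $stmt->execute([$personId]);
    $phone = $stmt->fetchColumn();
    if ($phone !== false) {
        $mc->set($key, $phone, 300);   // cache for 5 minutes
        return $phone;
    }
    return null;
}
```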
(3) but I'm sure I could come up with a more complex case where a simple query doesn't give me exactly what I want.
This is usually an indication that your database hasn't been properly normalized and/or has design flaws.
(2) When is it more efficient (if ever) to read in the entire table into an object? Is performing 1000 MySQL queries on 1000 names always
Neither is a good choice. SQL is intended for set-based operations. You really need to use the system correctly for it to work well, but to do this you have to have properly designed your database. The best thing would be to write one query that returns exactly the records you want, no more and no less.
what if I need to get the names and phone numbers of 50 individuals
Maybe use something like SELECT * FROM yourtable WHERE id IN (1,2,3,...,50). If you have a larger number of users, maybe create a temporary table with the list of users you want and join on that. With a properly designed database there is usually a good way to retrieve a set of data with a single query.
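For instance, a sketch of the IN-list approach with PDO (table and column names are assumptions; the placeholder list is built to match the number of IDs):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// Fetch exactly the 50 people you need in one round trip.
$ids = range(1, 50);
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT id, name, phone FROM people WHERE id IN ($placeholders)");
$stmt->execute($ids);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```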
I have a pretty large social network type site I have been working on for about 2 years (high traffic and hundreds of files). I have been experimenting for the last couple of years with tweaking things for maximum performance under that traffic, and I have learned a lot. Now I have a huge task: I am planning to completely re-code my social network, so I am re-designing the MySQL DBs and everything.
Below is a photo I made up of a couple of MySQL tables that I have a question about. I currently have the login table, which is used in the login process; once a user is logged into the site they very rarely need to hit that table again unless editing an email or password. I then have a user table, which is basically the user's settings and profile data for the site. This is where I have questions: would it be better for performance to split the user table into smaller tables? For example, if you view the user table you will see several fields that I have marked as "setting_"; should I just create a separate settings table? I also have fields marked with "count", which could be total counts of comments, photos, friends, mail messages, etc. So should I create another table to store just the total count of things?
The reason I have them all in 1 table now is that I was thinking it might be better if I could cut down on MySQL queries: instead of hitting 3 tables to get information on every page load, I could hit 1.
Sorry if this is confusing, and thanks for any tips.
(Image: the login and user tables described above: http://img2.pict.com/b0/57/63/2281110/0/800/dbtable.jpg)
As long as you don't SELECT * FROM your tables, having 2 or 100 fields won't affect performance.
Just SELECT only the fields you're going to use and you'll be fine with your current structure.
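For example (column names assumed):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// Fetch only the columns the page actually renders, never SELECT *.
$stmt = $pdo->prepare("SELECT username, setting_timezone, friend_count FROM user WHERE id = ?");
$stmt->execute([42]);
$user = $stmt->fetch(PDO::FETCH_ASSOC);
```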
should I just create a seperate setting table?
So should I create another table to store just the total count of things?
There is no single correct answer for this; it depends on what your application is doing.
What you can do is measure and extrapolate the results in a dev environment.
On the one hand, using a separate table will save you some space and the code will be easier to modify.
On the other hand, you may lose some performance (as you already suspect) by having to join information from different tables.
About the counts, I think it's fine to have them there. Although it is often said that it's better to calculate this kind of thing on the fly, I don't think it would hurt you at all in this situation.
But again, the only way to know what's better for you and your specific app is to measure, profile, and find out what the benefit of doing so would be. You would probably only gain a 2% improvement.
You'll need to compare performance testing results between the following:
Leaving it alone
Breaking it up into two tables
Using different queries to retrieve the login data and profile data (if you're not doing this already) with all the data in the same table
Also, you could implement some kind of caching strategy on the profile data if the usage data suggests this would be advantageous.
You should consider putting the counter columns and frequently updated timestamps in their own table, because every time you bump them the entire row is rewritten.
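A sketch of what hoisting those hot counters into a narrow table could look like (names are assumptions):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// A small, fixed-length row per user holds only the volatile counters.
$pdo->exec("
    CREATE TABLE user_counts (
        user_id       INT PRIMARY KEY,
        comment_count INT NOT NULL DEFAULT 0,
        photo_count   INT NOT NULL DEFAULT 0,
        friend_count  INT NOT NULL DEFAULT 0,
        last_active   DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
");

// Bumping a counter now rewrites this tiny row, not the wide user row.
$pdo->prepare("UPDATE user_counts SET comment_count = comment_count + 1 WHERE user_id = ?")
    ->execute([42]);
```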
I wouldn't consider your user table terribly large in number of columns; that's just my opinion. I also wouldn't break that table into multiple tables unless you can find a case for removal of redundancy. Perhaps you have a lot of users who share the same settings; that would be a case for breaking the table out.
You should take into account the average size of a single row in order to find out whether retrieval is expensive. Also, try to use indexes when looking for data.
The most important thing is to design properly, not just to split because "it looks large". Maybe the IP or IPs could go somewhere else; it depends on the data saved there.
Also, since the social network site using this data presumably also handles authentication and authorization processes, the separation between the login and user tables should offer good performance, because the login data is short enough, while the profile may be accessed only once, immediately after a successful login. Just apply the right tricks to improve DB performance and it's done.
(Remember to visualize tables as entities and name each one as an entity, not as a collection of them.)
Two things you will want to consider when deciding whether or not to break up a single table into multiple tables are:
MySQL likes small, consistent datasets. If you can structure your tables so that they have fixed row lengths, that will help performance, at the potential cost of disk space. A common approach, from what I can tell, is putting fixed-length data in its own table while the variable-length data goes somewhere else (see the sketch after this list).
Joins are in most cases less performant than not joining. If the data currently in your table will normally be accessed all at the same time then it may not be worth splitting it up as you will be slowing down both inserts and quite potentially reads. However, if there is some data in that table that does not get accessed as often then that would be a good candidate for moving out of the table for performance reasons.
I can't find a resource online to substantiate this next statement, but I do recall a MySQL performance talk given by Jay Pipes in which he said the MySQL optimizer has issues once you get more than 8 joins in a single query (MySQL 5.0.*). I am not sure how accurate that magic number is, but regardless, joins will usually take longer than queries out of a single table.
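A sketch of the fixed-versus-variable split from the first point (all names are assumptions):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// Fixed-length, frequently read columns live in one narrow table...
$pdo->exec("
    CREATE TABLE user_core (
        user_id   INT PRIMARY KEY,
        age       TINYINT UNSIGNED,
        country   CHAR(2),
        last_seen DATETIME
    )
");

// ...while long variable-length text lives elsewhere.
$pdo->exec("
    CREATE TABLE user_text (
        user_id   INT PRIMARY KEY,
        about_me  TEXT,
        signature TEXT
    )
");

// Most page loads touch only the small fixed-width table.
$stmt = $pdo->prepare("SELECT country, last_seen FROM user_core WHERE user_id = ?");
$stmt->execute([42]);

// The join is paid only where the long text is actually shown.
$stmt = $pdo->prepare("
    SELECT c.user_id, c.country, t.about_me
    FROM user_core c
    JOIN user_text t USING (user_id)
    WHERE c.user_id = ?
");
$stmt->execute([42]);
$profile = $stmt->fetch(PDO::FETCH_ASSOC);
```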