What will be faster?
SELECT * FROM
or
SELECT specified FROM
Background: the table has one field (specified), which is at the same time the primary index.
In your particular case it may very well be the same, but as a matter of good practice, you should always specify the columns you want.
In addition to the various good reasons Dark Falcon put in a comment, it also creates a form of self-documentation in your application code, since each field you're expecting is right there in the query.
As a matter of good practice, it's usually better to explicitly specify the columns you want, regardless of the performance implications you're concerned about in this question.
But in general, the answer will depend heavily on your version of MySQL. Profile it and see:
explain select * from ...;
explain select specified from ...;
I suspect strongly that this is a case of premature optimization, and that you don't really need to know which is faster.
IMHO the explicit version will be faster, because MySQL doesn't need to look up which fields the table contains.
Depending on table structure (including indexes) it may not make a difference -- running some benchmarks and using EXPLAIN SELECT to see where things can be improved will help you along the way. But in general, if you know you only want n fields, only select n fields.
Just specify it, in case more columns are added in the future that you don't want to retrieve. In any case, it is better to be specific.
Write a console app with two functions, one for each method, loop 1000 times on each, and print out the average time each took. That would be the fastest way to test the performance.
Generally it's better, and I think faster, to specify the columns in your SQL query, to avoid fetching data you don't need.
The "select * from" format imho is just a fast way when quering as dba you just want a quick glance at the table. Even though that will work in programming I wouldn't recommend it, by listing the columns as a programmer it keeps you from having to go back and forth to your db and see what column you want to use or query for. It keeps you in one spot..that's just me though..this really is up to you and how you want to program.
You are looking at this bass-ackwards. Do you need the content of the column or not? If you don't and you fetch it anyway, that will take longer than not fetching it.
Parsing the SQL to check the column names, order them, and potentially alias them is trivial compared to flooding the network with loads of stuff you don't need.
Related
Basically, I have tons of files with some data. Each differs, some lack certain variables (null), etc.; classic stuff.
The part where it gets somewhat interesting is that, since each file can have up to 1000 variables, with at least ~800 values that are not null, I thought: "Hey, I need 1000 columns." Another thing to mention: they are integers, bools, text, everything; they differ in size and type. Each variable is under 100 bytes in all files, although they vary.
I found this question Work around SQL Server maximum columns limit 1024 and 8kb record size
I'm unfamiliar with the capacities of SQL servers and with table design, but the thing is: the people who answered that question say the design should be reconsidered, and I can't do that. I can, however, convert what I already have, as long as I still keep those 1000 variables.
I'm willing to use any SQL server, but I don't know which suits my requirements best. If doing something else is better, please say so.
What I need to do with this data is look at it, compare it, and search within it. I don't need the ability to modify it. I thought of just keeping them as plain text files and reading from those, but that takes "seconds" of PHP runtime just to view data from a "few" of these files, and that is too much; and that's not even considering that I need to check about 1000 or more of these files to do any search.
So the question is: what is the fastest way of storing 1000+ entities with 1000 variables each, and searching/comparing any variable I wish within them? And if it's SQL, which SQL server works best for this sort of thing?
Sounds like you need a different kind of database for what you're doing. Consider a document database, such as MongoDB, or one of the other not-only-SQL database flavors that allow for manipulating data in different ways than a traditional table structure.
I just saw the note mentioning that you're only reading as well. I've had good luck with Solr on a similar dataset.
You want to use an EAV (entity-attribute-value) model. This is pretty common.
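If a sketch helps, a minimal EAV layout could look like this (all names here are hypothetical, and a real design would probably also type the values):

CREATE TABLE entity (
    entity_id INT PRIMARY KEY
);

CREATE TABLE entity_attr (
    entity_id  INT          NOT NULL,
    attr_name  VARCHAR(64)  NOT NULL,
    attr_value VARCHAR(255) NOT NULL,  -- everything stored as text; cast on read
    PRIMARY KEY (entity_id, attr_name),
    FOREIGN KEY (entity_id) REFERENCES entity (entity_id)
);

Each file becomes one row in entity, and each non-null variable becomes one row in entity_attr, so a file with ~800 non-null variables yields ~800 narrow rows instead of one 1000-column row.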
You are asking for the best; I can give an answer (how I solved it), but I can't say whether it is the 'best' way in your environment. I had the problem of collecting inventory data from many thousands of PCs (no, not the NSA; kidding).
My solution was:
one table per PC (a File, in your case?)
Table File:
one row per file, PK FILE_ID
Table File_data:
one row per column in the file; PK (FILE_ID, ATTR_ID), plus ATTR_NAME, ATTR_VALUE, (ATTR_TYPE)
The File_data table was rather big (>1e6 rows), but the DB handled that fast.
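For illustration, a search against this layout could look like the following (the attribute name and value are made up):

SELECT f.FILE_ID
FROM File f
JOIN File_data d ON d.FILE_ID = f.FILE_ID
WHERE d.ATTR_NAME = 'usb_device'  -- hypothetical attribute
  AND d.ATTR_VALUE = 'yes';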
HTH
EDIT:
My answer was pretty short; I want to add some more information about my (still working) solution:
The table 'per info source' has more fields than just the PK FILE_ID, i.e. ISOURCE and ITYPE, where ISOURCE and ITYPE describe where the data came from (I had many sources) and what basic information type it is / was. This helps to bring structure into queries: I did not need to include data from 'switches' or 'monitors' when searching for USB devices (edit: today, probably yes).
The attributes table had more fields, too. I mention two of them here: ISOURCE and ITYPE; yes, the same as above, but with a slightly different meaning, the same idea behind them.
What you would have to put into these fields definitely depends on your data.
I am sure that if you take a closer look at what information you have to collect, you will find some 'key values' for that.
For storage, XML is probably the best way to go. There is really good support for XML in SQL Server.
For queries, if they are direct SQL queries, 1000+ rows isn't a lot and XML will be plenty fast. If you're moving towards a million+ rows, you're probably going to want to take the data that is most selective out of the XML and index that separately.
Link: http://technet.microsoft.com/en-us/library/hh403385.aspx
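As a rough sketch of what that can look like in SQL Server (table, column, and element names here are all made up):

CREATE TABLE file_store (
    file_id INT PRIMARY KEY,
    doc     XML NOT NULL
);

-- index the XML column to speed up queries into it
CREATE PRIMARY XML INDEX ix_file_store_doc ON file_store (doc);

-- find files where a given variable has a given value
SELECT file_id
FROM file_store
WHERE doc.exist('/file/var[@name="color"][.="red"]') = 1;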
I'd like to know if this:
$column_family->get('row_key', $columns=array('name1', 'name2'));
is faster than the more flexible get I use now:
$column_family->get('row_key');
The first method is harder to implement, of course, but will it give less load/bandwidth/delay?
Cassandra is not MySQL, so it will come as no surprise that some things are different there. :)
In this case, Cassandra's sparse-row storage model means that for small numbers of columns the full-row version will be faster because Cassandra doesn't need to deserialize and check its row-level column entries.
Of course for larger numbers of columns the extra work of deserializing more than you need will dominate again.
Bottom line: worrying about this is almost certainly premature optimization. When it's not, test.
The first one is faster, especially if you work with large tables that contain plenty of columns.
Even if you have just two columns called name1 and name2, specifying their names avoids extracting the column names from the table structure on the MySQL side, so it should be faster than using the * selector.
However, test your results using microtime() in PHP against large tables and you'll see what I'm talking about. Of course, if you have 20+ columns in a table and you want to extract them all, it's easier to write * than to list all those column names, but in terms of speed, listing the columns is a bit quicker.
The best way to check this conclusion is to test it yourself.
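For example, on the MySQL side you can time the two forms with the session profiler (deprecated in recent MySQL versions; the table here is made up):

SET profiling = 1;
SELECT * FROM users WHERE id = 1;
SELECT name1, name2 FROM users WHERE id = 1;
SHOW PROFILES;  -- lists the duration of each statement above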
Users can do advanced searches (there are many possible parameters):
/search/?query=toto&topic=12&minimumPrice=0&maximumPrice=1000
I would like to store the search parameters (after the /search/?) for an email alert.
I have two possibilities:
Storing the raw request (query=toto&topicId=12&minimumPrice=0&maximumPrice=1000) in a table with a structure like id, parameters.
Storing the request in a structured table id, query, topicId, minimumPrice, maximumPrice, etc.
Each solution has its pros and cons. Of course, solution 2 is the cleaner one, but is it really worth the extra effort?
If you have already implemented such a solution and have experience maintaining it, what is the best solution?
The better solution should be the best along each of these dimensions:
Rigidity
Fragility
Viscosity
Performance
Daniel's solution is likely to be the cleanest, but I get your point about performance. I'm not very familiar with PHP, but there should be some DB abstraction library that takes care of relations and multiple inserts so that you get the best performance, right? I only mention it because there may not be a real performance issue. Do you have load tests that point to an issue, perhaps?
Anyway, if it is between your original two solutions, I would have to select the first. Having a table with a column per parameter (like your solution #2) is just asking for trouble. If you add new params, you have to modify the table columns. And there is the ever-present issue of "what do we store to indicate 'not selected' vs. 'left empty'?"
So I don't agree that solution 2 is cleaner.
You could have a table consisting of three columns: search_id, key, value, with the first two forming the primary key. This way you can reconstruct a particular search if you have the ID of a saved search. This also allows you to expand with additional search keywords without having to actually modify your table.
If you wish, you can also make key a foreign key to another table containing valid search terms, to ensure integrity. Whether you want to do that depends on your specific needs, though.
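A sketch of that layout (names are hypothetical; the key column is called param_key here because KEY is a reserved word in MySQL):

CREATE TABLE saved_search (
    search_id INT PRIMARY KEY
);

CREATE TABLE saved_search_param (
    search_id   INT          NOT NULL,
    param_key   VARCHAR(64)  NOT NULL,
    param_value VARCHAR(255) NOT NULL,
    PRIMARY KEY (search_id, param_key),
    FOREIGN KEY (search_id) REFERENCES saved_search (search_id)
);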
Well, that depends completely on what you want to do with the data. For the PHP part, you need to process it anyway, either at insertion or at selection time.
For a really large number of parameters, you may save some time with the first option on database management/maintenance, since you don't need to change anything about your database schema.
Daniel's answer is a generic solution, but if you consider performance an issue, you may end up doing too many inserts on the database side for a single search (one for each parameter). Too many inserts is a common source of performance problems.
You know your resources.
Let's say you've got a table with a timestamp column, and you want to parse that column into two arrays - $date and $time.
Do you, personally:
a) query like this: DATE(timestamp), TIME(timestamp), or perhaps even go as far as HOUR(timestamp), MINUTE(timestamp)
b) grab the timestamp column and parse it out as needed with a loop in PHP
I feel like (a) is easier... but I know that I don't know anything. And it feels a little naughty to make my query hit the same column 2 or 3 times for output...
Is there a best-practice for this?
(a) is probably fine, if it is easier for your code base. I am a big fan of not writing extra code that is not needed, and I love optimizing only when necessary. To me, pulling the whole timestamp and then parsing it seems like premature optimization.
Always remember that SQL servers have a whole lot of smarts in them for optimizing queries, so you don't have to.
So go with a); if you find it is dog slow or causes problems, then go to b). I suspect that a) will do all you want and you will never think about it again.
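For reference, approach a) just means listing the derived values in the SELECT; the table and column names here are made up:

SELECT DATE(created_at) AS d,
       TIME(created_at) AS t
FROM events;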
I would personally do (b). You're going to be looping the rows anyway, and PHP's strtotime() and date() functions are so flexible that they will handle most of the date/time formatting issues you run into.
I tend to try to keep my database result sets as small as possible, just so I don't have to deal with lots of array indexes after a database fetch. I'd much rather take a single timestamp out of a result row and turn it into several values in PHP than have to deal with multiple representations of the same data in a result row, or edit my SQL queries to get specific formatting.
b) is what I follow, and I use it every time. It also gives you the flexibility of controlling how you want the value to appear in your front end. Think about this: if you are following a) and you want to make a change, you will need to change all the queries manually. But if you are using b), you can just call a function on this value (from the DB) and you are good to go. If you ever need to change anything, just change it within this function and voilà! Doesn't that sound like a time saver?
Hope that helps.
I would also use b). It matters to me that, if at some point I need the names of the days or months in another language, I can use PHP's locale support to translate them into the given language; that wouldn't be the case with a).
If you need it in the SQL query itself (e.g. in a WHERE, GROUP BY, ORDER BY, etc.), then approach a) is preferred. If you instead need it in the code logic (PHP or whatever), then approach b) is preferred.
If your PHP code does a task that can be done just as well in SQL, then I'd go for SQL as well. In other words, approach b) is preferred only if you are going to format the date purely for display purposes.
I think it boils down to this: do you feel more at home writing PHP code or MySQL queries?
I think this is more a question of coding style than technical feasibility, and you get to choose your style.
Which is better: an extra query, or an extra column in the database, for data that will be needed only rarely?
Example: in the case of sub-user management, either I add one extra column super_user_id to the main users table, and fill it in when the user's type is sub_user (with a default column value of -1), or I create a new table and manage sub-users in that table.
But in the case of login, I then have to search in two tables, and so I have to make one more query.
Thanks
There is no general answer; you'll have to be more specific. All I can provide are general principles.
All else being equal, you'll be better off with a well-normalized database without redundant information, for a number of reasons. But there are situations where redundant information could save your program a lot of time. One example is text formatted with Markdown: you need to store the original markup to allow for editing, but formatting the source every time you need the output may be extremely taxing on the system. Therefore, you might add a redundant column to store the formatted output and assume the additional responsibility of ensuring that that column is kept up-to-date.
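A sketch of that Markdown example (hypothetical names):

CREATE TABLE post (
    post_id       INT  PRIMARY KEY,
    body_markdown TEXT NOT NULL,  -- canonical source, needed for editing
    body_html     TEXT NOT NULL   -- redundant cache, regenerated on every edit
);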
All I know about your situation is that the postulated extra column would save a query. The only correct answer to that is that you should probably keep your table clean and minimal unless you know that the performance benefit of saving one query will make up for it. Remember, premature optimization is the root of all evil; you may find that your application runs more than fast enough anyway. If you find while profiling that the extra query is a significant bottleneck, then you might consider adding the column.
Again, without more knowledge of your situation, it is impossible to provide a specific or concrete recommendation, but I hope that I've at least helped you to come to a decision.
Do you mean calculating a value in your query versus storing a calculated value?
This depends on how often it will be updated, how big the data will be, how often it is needed. There may be no theoretical best answer, you will need to test and profile.
It depends on the amount of redundancy you would add to the table by adding a column.
With proper indexing and design, joins work well, so there is no need to be afraid of normalizing where required.
Use the second table. It will not require you to issue two queries. Instead, you will issue a single query JOINing the two tables together or, better yet, create a VIEW that does the JOIN for you:
SELECT usertable.col1, usertable.col2, superusertable.superuserid
FROM usertable LEFT OUTER JOIN superusertable
ON usertable.userid = superusertable.userid
This allows you to maintain proper normalized structure, helps you in certain queries (like figuring out who is a super_user), and allows the database to optimize the search issues.
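For example, the same query wrapped in a view (the view name is made up; everything else follows the query above):

CREATE VIEW user_with_superuser AS
SELECT usertable.col1, usertable.col2, superusertable.superuserid
FROM usertable LEFT OUTER JOIN superusertable
ON usertable.userid = superusertable.userid;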
Doing an additional query will always take more time.
Adding an extra column to the DB will not have any significant impact, even if you have thousands of rows.
Ergo, add the extra column and save DB traffic :)