I am using the commonly known reddit 'hot' algorithm on my table 'posts'. Now this hot column is a decimal number like this: 'XXXXX,XXXXXXXX'
I want this column to be an index, because when I order by 'hot', I want the query to be as fast as possible. However, I am kind of new to indexes. Does an index need to be unique?
If it has to be unique, would this work and be efficient?
$table->unique('id', 'hot');
If it does not have to be unique, would this be the right approach?
$table->index('hot');
Last question: would the following query be taking advantage of the index?
Post::orderBy('hot', 'desc')->get()
If not, how should I modify it?
Thank you very much!
Do not make it UNIQUE unless you need the constraint that you cannot insert duplicates.
Phrased differently, a UNIQUE key is two things: an INDEX (for speedy searching) and a "constraint" (to give an error when trying to insert a dup).
ORDER BY hot DESC can use INDEX(hot) (or UNIQUE(hot)). I say "can", not "will", because there are other issues where the Optimizer may decide not to use the index. (Without seeing the actual query and knowing more about the the dataset, I can't be more specific.)
If id is the PRIMARY KEY, then neither of these is of any use: INDEX(id, hot); UNIQUE(id, hot). Swapping the order of the columns makes sense. Or simply INDEX(hot).
A caveat: EXPLAIN does not say whether the index is used for ORDER BY, only for WHERE. On the other hand, EXPLAIN FORMAT=JSON does give more details. Try that.
(Yes, DECIMAL columns can be indexed.)
Related
I am trying to make my DB more optimized and are in the beginning of indexing it but not sure how to do it right.
I have this query:
$year = date("Y");
$thisYear = $year;
//$nextYear = $thisYear + 1;
$sql = mysql_query("SELECT SUM(points) as userpoints
FROM ".$prefix."_publicpoints
WHERE date BETWEEN '$thisYear" . "-01-01' AND '$thisYear" . "-12-31' AND fk_player_id = $playerid");
$row = mysql_fetch_assoc($sql);
$userPoints = $row['userpoints'];
$sql = mysql_query("SELECT
fk_player_id
FROM ".$prefix."_publicpoints
WHERE date BETWEEN '$thisYear" . "-01-01' AND '$thisYear" . "-12-31'
GROUP BY fk_player_id
HAVING SUM(points) > $userPoints");
$row = mysql_fetch_assoc($sql);
$userWrank = mysql_num_rows($sql)+1;
I am not sure how to index this? I have tried indexing the fk_player_id but it still looks through all the rows (287937).
I have indexed the date field which gives me this back in EXPLAIN:
1
SIMPLE
nf_publicpoints
range
IDXdate
IDXdate
3
NULL
143969
Using where with pushed condition; Using temporary...
I also have 2 calls to the same table... Could that be done in one?
How do I index this and/or could it be done smarter?
You should definitely spend some time reading up on indexing, there's a lot written about it, and it's important to understand what's going on.
Broadly speaking, and index imposes an ordering on the rows of a table.
For simplicity's sake, imagine a table is just a big CSV file. Whenever a row is inserted, it's inserted at the end. So the "natural" ordering of the table is just the order in which rows were inserted.
Imagine you've got that CSV file loaded up in a very rudimentary spreadsheet application. All this spreadsheet does is display the data, and numbers the rows in sequential order.
Now imagine that you need to find all the rows that has some value "M" in the third column. Given what you have available, you have only one option. You scan the table checking the value of the third column for each row. If you've got a lot of rows, this method (a "table scan") can take a long time!
Now imagine that in addition to this table, you've got an index. This particular index is the index of values in the third column. The index lists all of the values from the third column, in some meaningful order (say, alphabetically) and for each of them, provides a list of row numbers where that value appears.
Now you have a good strategy for finding all the rows where the value of the third column is M! For instance, you can perform a binary search! Whereas the table scan requires you to look N rows (where N is the number of rows), the binary search only requires that you look at log-n index entries, in the very worst case. Wow, that's sure a lot easier!
Of course, if you have this index, and you're adding rows to the table (at the end, since that's how our conceptual table works), you need need to update the index each and every time. So you do a little more work while you're writing new rows, but you save a ton of time when you're searching for something.
So, in general, indexing creates a tradeoff between read efficiency and write efficiency. With no indexes, inserts can be very fast -- the database engine just adds a row to the table. As you add indexes, the engine must update each index while performing the insert.
On the other hand, reads become a lot faster.
Hopefully that covers your first two questions (as others have answered -- you need to find the right balance).
Your third scenario is a little more complicated. If you're using LIKE, indexing engines will typically help with your read speed up to the first "%". In other words, if you're SELECTing WHERE column LIKE 'foo%bar%', the database will use the index to find all the rows where column starts with "foo", and then need to scan that intermediate rowset to find the subset that contains "bar". SELECT ... WHERE column LIKE '%bar%' can't use the index. I hope you can see why.
Finally, you need to start thinking about indexes on more than one column. The concept is the same, and behaves similarly to the LIKE stuff -- essentialy, if you have an index on (a,b,c), the engine will continue using the index from left to right as best it can. So a search on column a might use the (a,b,c) index, as would one on (a,b). However, the engine would need to do a full table scan if you were searching WHERE b=5 AND c=1)
Hopefully this helps shed a little light, but I must reiterate that you're best off spending a few hours digging around for good articles that explain these things in depth. It's also a good idea to read your particular database server's documentation. The way indices are implemented and used by query planners can vary pretty widely.
More information and example visit here : http://blog.sqlauthority.com/category/sql-index/
Try create index on date column, indexing fk_payer_id will not help with this query. If does not work - paste explain...
For more information about indexes in Mysql look here: http://hackmysql.com/case1
Why not index the date column, seeing how that's the main criterion that will be evaluated in the lookup?
I have this query:
SELECT * FROM tablename WHERE
((samaccountname_s='".$sender."' AND samaccountname_r='".$contact."') OR
(samaccountname_s='".$contact."' AND samaccountname_r='".$sender."'))
ORDER BY timestamp ASC
And it is seriously slow, I think some of the latency may be down to the server, but I can't help but think there is a more efficient query to get the result that I'm looking for above.
In short: The query checks to see if there are messages between two people ($sender and $contact)
Could this be done any more efficiently/quickly?
put indices on samaccountname_s and samaccountname_r. Since you're checking both in one query, put them in the same index. Like this
ALTER TABLE mytable add INDEX sam_index (samaccountname_s, samaccountname_r);
or like this
CREATE INDEX sam_index ON mytable (samaccountname_s, samaccountname_r);
You should add some indexes to the table. At the very least you'll want to index the samaccountname_s and samaccountname_r fields. Like so:
ALTER TABLE tablename ADD INDEX IDX_tablename_samaccountname_s (samaccountname_s);
ALTER TABLE tablename ADD INDEX IDX_tablename_samaccountname_r (samaccountname_r);
Indexing the timestamp column might help as well since you are using it in your order by:
ALTER TABLE tablename ADD INDEX IDX_tablename_timestamp (timestamp);
You are probably missing indices.
Basically, indices make lots of searching for data feasible from a performance standpoint. The only reason that columns are not always indexed, is because indices also slow down inserts, although not by much. A good general rule is to keep indices on fields that you normally search by, for example foreign keys and such.
Here is the MySQL article on how to create an index:
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
The columns you should have indexed in your query are:
samaccountname_s
samaccountname_r
timestamp
Hope it helps!
P.S.: You really should escape the values in that query. Consider switching to PDO (http://php.net/manual/en/ref.pdo-mysql.php), or at the very least, use manual escaping: http://php.net/manual/en/function.mysql-real-escape-string.php
From someone with more experience than myself, would it be a better idea to simply count the number of items in a table (such as counting the number of topics in a category) or to keep a variable that holds that value and just increment and call it (an extra field in the category table)?
Is there a significant difference between the two or is it just very slight, and even if it is slight, would one method still be better than the other? It's not for any one particular project, so please answer generally (if that makes sense) rather than based on something like the number of users.
Thank you.
To get the number of items (rows in a table), you'd use standard SQL and do it on demand
SELECT COUNT(*) FROM MyTable
Note, in case I've missed something, each item (row) in the table has some unique identifier, whether it's a part number, some code, or an auto-increment. So adding a new row could trigger the "auto-increment" of a column.
This is unrelated to "counting rows". Because of DELETEs or ROLLBACK, numbers may not be contiguous.
Trying to maintain row counts separately will end in tears and/or disaster. Trying to use COUNT(*)+1 or MAX(id)+1 to generate a new row identifier is even worse
I think there is some confusion about your question. My interpretation is whether you want to do a select count(*) or a column where you track your actual count.
I would not add such a column, if you don't have reasons to do so. This is premature optimization and you complicate your software design.
Also, you want to avoid having the same information stored in different places. Counting is a trivial task, so you actually duplicating information, which is a bad idea.
I'd go with just counting. If you notice a performance issue, you can consider other options, but as soon as you keep a value that's separate, you have to do some work to make sure it's always correct. Using COUNT() you always get the actual number "straight from the horse's mouth" so to speak.
Basically, don't start optimizing until you have to. If everything works fine and fast using COUNT(), then do that. Otherwise, store the count somewhere, but rather than adding/subtracting to update the stored value, run COUNT() when needed to get the new number of items
In my forum I count the sub-threads in a forum like this:
SELECT COUNT(forumid) AS count FROM forumtable
As long as you're using an identifier that is the same to specify what forum and/or sub-section, and the column has an index key, it's very fast. So there's no reason to add more columns than you need to.
I am building an inventory tracking system for internal use at my company. I am working on the database structure and want to get some feedback on which design is better*.
I need a recursive(i might be using this term wrong...) system where a part could be made up of zero or more parts. I though of two ways to do this but am not sure which one to use. I am not an expert in database design so maybe there is a their option that i haven't thought of.
Option 1:
Two tables one with the part_id and the other with part_id, sub_part_id (which refers to another part_id) and quantity. so one table part_id would be unique and the other table there could be zero or more rows showing all the parts that make up a certain part.
Option 2:
One table with part_id and assembly. assembly would be a text field that looks something like this, part_id,quantity;part_id,quanity;.... I would then use the PHP explode() function to separate by semi-colon and again by comma to get an array of the sub parts.
I hope this all makes sense. I am using PHP/MySQL.
*community wiki because this may be subjective.
Generally, option 1 is preferable to option 2, not least because some of the part IDs in the assembly would themselves be assemblies.
You do have to deal with recursive or tree-structured queries. That is not particularly easy in any dialect of SQL. Some systems have better support for them than others. Oracle has its CONNECT BY PRIOR system (weird, but it sort of works), and DB2 has recursive WITH clauses, and ...
NEVER, never ever use procedural languages like PHP or C# to process data structures when you have a database engine for that. Relational data structures are much more faster and flexible, and surer, than storing text. Forget about Option 2.
You could use recursive UDFs to retrieve the whole tree with no big fuss about it.
How about a nullable foreign key on the same table? Something like:
CREATE TABLE part (
part_id int not null auto_increment primary key,
parent_part_id int null,
constraint fk_parent_part foreign key (parent_part_id) references part (part_id)
)
Definitely not option 2. That is a recipe for trouble. The correct answer depends on how many potential levels of assemblies are possible, and how you think of the assemblies. Do you think of an assembly (a composite onject consisting of 2 or more atomic parts) as a part in it's own right, that can itself be used as a subpart in anothe assmebly? Or are assemblies a fundementally differrent kind of thing froma an atomic part?
If the former is the case, then put all assemblies and parts in one table, with a PartID, and add a second table that just has the construction details for those parts that are composed of multiple other parts (which themseleves may be assemblies of yet more atomic parts). This second table would look like this:
ConstructionDetails
PartId, SubPartId, QuantityRequired
If you think of things more like the second way, then only put the atomic parts in the first table, and put the assemblies in the second table
Assemblies
AssemblyId, PartId, QuantityRequired
I have a database with over 10,000,000 rows. Querying it right now can take a few seconds just to find some basic information. This isn't preferable, I know that the best way to optimize is to minimize the number of rows which is possible, but right now I don't have the time to do this.
What's the easiest way to optimize a MySQL database so that when querying it, the time taken is short?
I don't mind about the size of the database, that doesn't really matter so any optimizations that increase the size are fine. I'm not very good with optimization, right now I have indexes set up, but I'm not sure how much better I can get from there.
I'll eventually trim down the database properly, but is there a quick temporary solution?
Besides indexing which has already been suggested, you may want to also look into partitioning tables if they are large.
Partitioning in MySQL
It's tough to be specific here, because we have very limited information, but proper indexing along with partitioning can go a very long way. Indexing properly can be a long subject, but in a very general sense you'll want to index columns you query against.
For example, say you have a table of employees, and you have your usual columns of SSN, FNAME, LNAME. In addition to those columns, we'll say that you have an additional 10 columns in the table as well.
Now you have this query:
SELECT FNAME, LNAME FROM EMPLOYEES WHERE SSN = 'blah';
Ignoring the fact that the SSN could likely be the primary key here and may already have a unique index on it, you would likely see a performance benefit by creating another composite index containing the columns (SSN, FNAME, LNAME). The reason this is beneficial is because the database can satisfy this query by simply looking at the composite index because it contains all the values needed in a sorted and compact space. (that is, less I/O). Even though the index on SSN only is a better access method to doing a full table scan, the database still has to read the data blocks for the index (I/O), find the value(s) which will contain pointers to the records needed to satisfy the query, then will need to read different data blocks (read: more random I/O) in order to retrieve the actual values for fname and lname.
This is obviously very simplified, but using indexes in this way can drastically reduce I/O and increase performance of your database.
Some other links here you may find helpful:
MySQL indexes - how many are enough?
When should I use a composite index?
MySQL Query Optimization (Particularly the section on "Choosing Indexes")
As I can see you request 40k rows from the database, this load of data needs time just to be transferred.
Also, never ask "how to improve in general". There is no way of "general" optimization. Optimization is always result of profiling and research of your particular case.
Use indexes on columns you search on very often.
In your example, 'WHERE x=y', if y is column name, create an index with y also.
The key with index is the # of result from your select query should be around 3% ~ 5% comparing entire table and it will be faster.
Also archieving table helps. I do not know how to do this, mostly DBA task.
For DBA it is simple task if they have been doing this.
If you're doing ordering or complex queries you may need to use multi-column indexes. For example if you're searching where x.name = 'y' OR x.phone = 'z' it might be worth putting an index on name,phone. Simplified example, but if you need to do this you'll need to research it further anyway :)
Are your queries using your indexes? What does running an EXPLAIN on your select queries tell you?
The first (and easiest) step will be making sure your queries are optimized.