I have to get all entries in the database that have a publish_date between two dates. All dates are stored as integers because they are in UNIX timestamp format...
The following query works perfectly but it takes "too long". It returns all entries made between 10 and 20 days ago.
SELECT * FROM tbl_post WHERE published < (UNIX_TIMESTAMP(NOW())-864000)
AND published > (UNIX_TIMESTAMP(NOW())-1728000)
Is there any way to optimize this query? If I am not mistaken it is calling NOW() and UNIX_TIMESTAMP() for every entry. I thought that saving the result of these two repeated functions into MySQL @vars would make the comparison much faster, but it didn't. The second query I ran was:
SET @TenDaysAgo = UNIX_TIMESTAMP(NOW())-864000;
SET @TwentyDaysAgo = UNIX_TIMESTAMP(NOW())-1728000;
SELECT * FROM tbl_post WHERE fecha_publicado < @TenDaysAgo
AND fecha_publicado > @TwentyDaysAgo;
Another confusing thing was that PHP can't run the above queries through mysql_query()?!
Please, if you have any comments on this problem they will be more than welcome :)
Luka
Be sure to have an index on published, and make sure it is being used.
EXPLAIN SELECT * FROM tbl_post WHERE published < (UNIX_TIMESTAMP(NOW())-864000) AND published > (UNIX_TIMESTAMP(NOW())-1728000)
should be a good start to see what's going on with the query. To add an index:
ALTER TABLE tbl_post ADD INDEX (published)
PHP's mysql_query function (assuming that's what you're using) can only accept one statement per string, so it can't execute the three statements that make up your second attempt.
I'd suggest moving that stuff into a stored procedure and calling that from PHP instead.
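For illustration, a minimal sketch of such a procedure, reusing the table and column names from the question (the procedure name itself is made up):

DELIMITER //
CREATE PROCEDURE get_posts_10_to_20_days_old()
BEGIN
  -- Compute each bound once, then reuse it in the range comparison
  SET @TenDaysAgo = UNIX_TIMESTAMP(NOW()) - 864000;
  SET @TwentyDaysAgo = UNIX_TIMESTAMP(NOW()) - 1728000;
  SELECT * FROM tbl_post
  WHERE published < @TenDaysAgo
    AND published > @TwentyDaysAgo;
END //
DELIMITER ;

PHP then only has to send the single statement CALL get_posts_10_to_20_days_old(), which mysql_query() can handle.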
As for the optimization, setting those variables is about as optimized as you're going to get for your query. You need to make the comparison for every row, and setting a variable provides the quickest access time to the lower and upper bounds.
One improvement in the indexing of the table, rather than the query itself, would be to cluster the index around fecha_publicado, allowing MySQL to handle the query for that range of values intelligently. You could do this easily by making fecha_publicado the PRIMARY KEY of the table.
The obvious things to check are: is there an index on the published date, and is it being used?
The way to optimize would be to partition the table tbl_post on the published key according to date ranges (weekly seems appropriate to your query). This is a feature that is available for MySQL, PostgreSQL, Oracle, Greenplum, and so on.
This will allow the query optimizer to restrict the query to a much narrower dataset.
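A rough sketch of what weekly range partitioning on the UNIX-timestamp column could look like (the boundary values below are invented week boundaries; note that MySQL also requires the partitioning column to be part of every unique key on the table):

ALTER TABLE tbl_post
PARTITION BY RANGE (published) (
    PARTITION p2011w01 VALUES LESS THAN (1294531200),  -- midnight UTC, 2011-01-09
    PARTITION p2011w02 VALUES LESS THAN (1295136000),  -- midnight UTC, 2011-01-16
    PARTITION p2011w03 VALUES LESS THAN (1295740800),  -- midnight UTC, 2011-01-23
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);

With that in place, a query for the 10-to-20-day window (especially with the bounds pre-computed as constants) only has to touch the few partitions that cover it.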
I agree with BraedenP that a stored procedure would be appropriate here. If you can't use one, or really don't want to, you can always generate the dates on the PHP side, but they might not match the database exactly unless the two are synced.
You can also likely do it more quickly as three separate queries: query for the begin date, query for the end date, then use those values as input into your target query.
I have a very long table that has 10000 rows.
9500 rows have a column set to 1 and the remaining 500 have that column set to 0.
I want to update the table so all rows have that column set to 1.
Would it be faster to use WHERE column = 0, or is it better to skip the WHERE and just UPDATE all rows?
I'm using a prepared statement.
thanks
This is an interesting question. In general, I would say it is better to have where column = 0. It should never be detectably worse, but in MySQL it might not make a difference.
With no index on the column, then the query will need to read all the rows anyway in order to identify the ones that need to be updated. So, the query is going to do a full table scan regardless. Even with an index, the optimizer may still choose a full table scan, unless you have a clustered index on column.
Then, the overhead on doing an update is maintaining the log. I'm pretty sure that MySQL only logs actual changes to the database. In other words, it is doing the comparison anyway. So, MySQL is not going to "re-update" values to the same value. Note: not all databases behave this way.
All that said, I would always put the where column = 0 if that is your intention. On 10,000 rows, performance isn't the big issue. Clarity of code ranks higher, in my opinion. Also, I write code for multiple databases, so I prefer to write code that will work well across all of them.
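For illustration, the two variants being compared would look something like this (the table and column names are placeholders):

-- Explicit filter: only rows still set to 0 are updated
-- (though without an index they are all still scanned)
UPDATE my_table SET flag = 1 WHERE flag = 0;

-- No filter: all 10,000 rows are visited, including the 9,500 already set to 1
UPDATE my_table SET flag = 1;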
Measuring script execution times on MariaDB, I get pretty much the same result with both queries (both the output and the time).
I will try it on MySQL too.
My situation: my website will look at a cookie for a remember-me token and a user ID. If the cookie exists, it will unhash it, look up the user ID, and compare the token with a "WHERE userid = '' AND rememberme = ''".
My question is: will MySQL optimize this query on the unique userid so that it does not scan the entire table for this 20+ character token? Or should I instead just select the token from the database and then use a PHP if comparison to check whether the tokens are the same?
In short (tl;dr): would it be better to check whether a token matches with a MySQL SELECT query, or to grab all the tokens from the database and compare the values with a PHP if conditional?
Thanks!
Simple answer:
YES, the database will definitely optimize your search AS LONG AS the column you are searching on in the WHERE ... clause is indexed! You definitely should not retrieve all the information via SQL and then do a PHP conditional if you are worried about performance.
So if the id column in your table is not indexed, you should index it. If you have, let's say, 1 million rows already in your table and run a command like SELECT * FROM user WHERE id = 994321, you would see a definite increase in performance.
Elaborating:
A database (like MySQL) is built to execute queries/commands much faster than you could do the equivalent in PHP, for instance. In your specific situation, let's say you are executing this SQL statement:
$sql = "SELECT * FROM users WHERE id = 4";
If you have 1 million users and the id column is not indexed, MySQL will look through all 1 million rows to find the ones with id = 4. However, if it is indexed, there is something called a B-tree that MySQL builds behind the scenes, which works similarly to how the indexing of a dictionary works.
If you try to find the word "slowly" in a dictionary, you might open the book in the middle, find words that start with the letter M, and then look in the middle again of the pages on your right side, hoping to find a letter closer to S. This method of looking for a word is much faster than looking at every single page from the beginning, one by one.
For that very reason, MySQL has created indexes to help performance and this feature should definitely be taken advantage of to help increase the speed of your queries.
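As a concrete sketch of that advice (the table and column names below are placeholders, not the asker's actual schema): if userid is already the primary key it is indexed automatically, and adding the token column to an index lets the whole comparison be resolved from the index.

-- Composite index covering both columns used in the WHERE clause
ALTER TABLE users ADD INDEX idx_users_userid_token (userid, rememberme);

SELECT userid
FROM users
WHERE userid = 994321
  AND rememberme = 'hashed-token-from-the-cookie';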
Comparing it on the MySQL side should be fast. It should find the corresponding row by ID first (fast) and then compare the hash (also fast, since there will be only one row to check).
Try analyzing the query with EXPLAIN to find out the actual execution plan.
In my opinion it will always be faster to use a WHERE clause, no matter which (real) database server is used. Database engines have strong algorithms for searching data, written in languages compiled to low-level code for the platform, so they cannot even be compared with some loop written in interpreted PHP.
And remember that for the PHP loop you would have to send all the records from the DB to PHP.
If your database is on a separate server from your Apache/PHP server, there is no doubt it would be faster to write the query in MySQL.
If your PHP and MySQL server are on the same physical machine, PHP might be faster because the comparison would be made in RAM. But holding the whole array of user IDs in RAM would be a waste of memory, so you can use indexes instead, which will speed up your query:
ALTER TABLE table ADD INDEX idx__tableName__fieldName (field)
I am working on converting a prototype web application into something that can be deployed. There are some locations where the prototype has queries that select all the fields from a table although only one field is needed, or where the query is just being used to check the existence of a record. Most of the cases are single-row queries.
I'm considering changing these queries to queries that only get what is really relevant, i.e.:
select * from users_table where <some condition>
vs
select name from users_table where <some condition>
I have a few questions:
Is this a worthy optimization in general?
In which kind of queries might this change be particularly good? For example, would this improve queries where joins are involved?
Besides the SQL impact, would this change be good at the PHP level? For example, the returned array will be smaller (a single column vs multiple columns with data).
Thanks for your comments.
If I were to answer all of your three questions in a single word, I would definitely say YES.
You probably wanted more than just "Yes"...
SELECT * is "bad practice": if you read the results into a non-associative PHP array and a column is later added to the table, the array subscripts may shift.
If the WHERE is complex enough, or you have GROUP BY or ORDER BY, and the optimizer decides to build a tmp table, then * may lead to several inefficiencies: having to use MyISAM instead of MEMORY; the tmp table will be bulkier; etc.
EXISTS ( SELECT * FROM ... ) comes back with 0 or 1 -- even simpler.
You may be able to combine EXISTS (or a suitable equivalent JOIN) to other queries, thereby avoiding an extra roundtrip to the server.
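A minimal sketch of that existence check, reusing the users_table example from the question (the name column and value are just illustrative):

-- Returns 1 if at least one matching row exists, 0 otherwise;
-- MySQL can stop at the first match instead of reading every qualifying row.
SELECT EXISTS (SELECT * FROM users_table WHERE name = 'alice') AS user_exists;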
Working in Drupal 6, PHP 5.3, and MySQL, I'm building a query that looks roughly like this:
SELECT val from table [and some other tables joined in below]
where [a bunch of clauses, including getting all the tables joined up]
and ('foo' not in (select ...))
and (('bar' in (select...) and x = y)
or ('baz' in (select ...) and p = q))
That's not a great representation of what I'm trying to do, but hopefully it will be enough. The point is that, in the middle of the query there is an embedded SELECT that is used a number of times. It's always the same. It's not completely self-contained -- it relies on a value pulled from one of the tables at the top level of the query.
I'm feeling a little guilty/unclean for just repeating the query every time it's needed, but I don't see any other way to compute the value once and reuse it as needed. Since it refers to the value from a top level table, I can't compute it once outside the query and just insert the value into the query, either through a MySQL variable or by monkeying around with the query string. Or, so I think, anyway.
Is there anything I can do about this? Or maybe it's a non-issue from a performance perspective: the code might be nasty, but perhaps MySQL is smart enough to cache the value itself and avoid executing the subquery over and over again? Any advice? Thanks!
You should be able to alias the result by doing SELECT ... AS alias, and then use the alias in the rest of the query, since a SELECT is really just a table.
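One concrete way to read that advice is to compute the shared subquery once as a derived table and join it in at the top level, turning each repeated "X in (select ...)" test into a flag aggregated per correlation key. The sketch below is entirely hypothetical -- the table, column, and tag names are invented, not the asker's Drupal schema:

SELECT t.val
FROM main_table AS t
JOIN (SELECT nid,
             MAX(tag = 'foo') AS has_foo,   -- 1 if 'foo' is in the set for this nid, else 0
             MAX(tag = 'bar') AS has_bar,
             MAX(tag = 'baz') AS has_baz
      FROM tag_table
      GROUP BY nid) AS flags ON flags.nid = t.nid
WHERE flags.has_foo = 0
  AND ((flags.has_bar = 1 AND t.x = t.y)
    OR (flags.has_baz = 1 AND t.p = t.q));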
The query I'd like to speed up (or replace with another process):
UPDATE en_pages, keywords
SET en_pages.keyword = keywords.keyword
WHERE en_pages.keyword_id = keywords.id
Table en_pages has the proper structure but only has non-unique page_ids and keyword_ids in it. I'm trying to add the actual keywords (strings) to this table where they match the keyword_ids. There are 25 million rows in en_pages that need updating.
I'm adding the keywords so that this one table can be queried in real time and return keywords (the join is obviously too slow for "real time").
We apply this query (and some others) to subunits of our larger dataset. We do this frequently to create custom interfaces for specific subunits of our data for different user groups (sorry if that's confusing).
This all works fine if you give it an hour to run, but I'm trying to speed it up.
Is there a better way to do this that would be faster using PHP and/or MySQL?
I actually don't think you can speed up the process.
You can still add brute force to your database by clustering new servers.
Maybe I'm wrong or misunderstood the question, but...
Couldn't you use TRIGGERS?
Like... when a new INSERT is detected on "en_pages", doing an UPDATE afterwards on that same row?
(I don't know how frequent INSERTS are in that table)
This is just an idea.
How often do "en_pages.keyword" and "en_pages.keyword_id" change after being inserted?!
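A rough sketch of such a trigger, reusing the table and column names from the original UPDATE (the trigger name is made up; it is written as BEFORE INSERT and sets NEW.keyword directly, since a MySQL trigger cannot UPDATE the same table it fires on):

DELIMITER //
CREATE TRIGGER trg_en_pages_set_keyword
BEFORE INSERT ON en_pages
FOR EACH ROW
BEGIN
  -- Copy the keyword string in at insert time, so no mass UPDATE is needed later
  SET NEW.keyword = (SELECT keyword FROM keywords WHERE id = NEW.keyword_id);
END //
DELIMITER ;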
I don't know about MySQL, but usually this sort of thing runs faster in SQL Server if you process the records in limited-size batches (say 1,000 at a time) in a loop.
You might also consider a WHERE clause (I don't know what MySQL uses for "not equal to", so I used the SQL Server version):
WHERE en_pages.keyword <> keywords.keyword
That way you are only updating records that have a difference in the field you are updating, not all of them.
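Put together with the original statement, the filtered update might look like this; MySQL accepts <> as well, and the extra IS NULL test is an assumption about rows whose keyword has not been filled in yet (a NULL never compares as unequal with <> alone):

UPDATE en_pages, keywords
SET en_pages.keyword = keywords.keyword
WHERE en_pages.keyword_id = keywords.id
  AND (en_pages.keyword IS NULL OR en_pages.keyword <> keywords.keyword);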