So I have a table with a column called id, and in some rare cases a lot of the IDs listed (between 20 and 140 different IDs) don't need to be, or can't be, shown to the user. It's all based on different permissions.
SELECT * FROM `table` WHERE (`id` != 21474 OR 26243 OR 78634) AND `checked` = 5
Unfortunately there is no additional grouping anywhere else in the DB that allows me to call out this set of IDs at once. So I'm wondering whether there is a better way of going about this, or whether I should skip doing it in the MySQL SELECT statement and instead filter in PHP after everything is pulled. The problem with a PHP foreach afterwards is that the data pulled can be hundreds upon hundreds of rows, so it will really slow down the page. As you can see in the query above, I was thinking of just listing out the IDs in one huge statement, but I figured maybe there is a better way. 100 ORs just doesn't sound like the best possible solution.
Use the IN keyword (here, NOT IN):
SELECT * FROM `table` WHERE `id` NOT IN (21474, 26243, 78634) AND `checked` = 5
The IN statement is good, but be careful not to exceed the server's max_allowed_packet setting if your query turns out to be very long.
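When the excluded IDs come out of a permissions check at run time, the NOT IN list can be built with placeholders instead of concatenating values into the SQL string. This is a minimal sketch, assuming PDO-style positional placeholders; the helper name buildNotInQuery is hypothetical and not part of the original answers.

```php
<?php
// Hypothetical helper: builds a parameterized NOT IN query for a list of IDs.
// Returns [sql, params] suitable for PDO::prepare() and PDOStatement::execute().
function buildNotInQuery(array $excludedIds, int $checked): array
{
    // Cast every value to int and drop duplicates.
    $ids = array_values(array_unique(array_map('intval', $excludedIds)));

    $sql = 'SELECT * FROM `table` WHERE `checked` = ?';
    $params = [$checked];

    // "NOT IN ()" is a syntax error, so only add the clause when there are IDs.
    if ($ids !== []) {
        $placeholders = implode(', ', array_fill(0, count($ids), '?'));
        $sql .= " AND `id` NOT IN ($placeholders)";
        $params = array_merge($params, $ids);
    }

    return [$sql, $params];
}

[$sql, $params] = buildNotInQuery([21474, 26243, 78634], 5);
// With PDO: $stmt = $pdo->prepare($sql); $stmt->execute($params);
```

With 20 to 140 IDs the resulting statement stays far below typical packet limits, and the empty-list case falls back to the plain `checked` filter.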
I am working on converting a prototype web application into something that can be deployed. There are some locations where the prototype has queries that select all the fields from a table although only one field is needed or the query is just being used for checking the existence of the record. Most of the cases are single row queries.
I'm considering changing these queries to queries that only get what is really relevant, i.e.:
select * from users_table where <some condition>
vs
select name from users_table where <some condition>
I have a few questions:
Is this a worthy optimization in general?
In which kind of queries might this change be particularly good? For example, would this improve queries where joins are involved?
Besides the SQL impact, would this change be good at the PHP level? For example, the returned array will be smaller (a single column vs multiple columns with data).
Thanks for your comments.
If I were to answer all of your three questions in a single word, I would definitely say YES.
You probably wanted more than just "Yes"...
SELECT * is "bad practice": If you read the results into a PHP non-associative array; then add a column; now the array subscripts are possibly changed.
If the WHERE is complex enough, or you have GROUP BY or ORDER BY, and the optimizer decides to build a tmp table, then * may lead to several inefficiencies: having to use MyISAM instead of MEMORY; the tmp table will be bulkier; etc.
SELECT EXISTS (SELECT * FROM ...) comes back with 0 or 1 -- even simpler.
You may be able to combine EXISTS (or a suitable equivalent JOIN) with other queries, thereby avoiding an extra roundtrip to the server.
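The existence check can be sketched with PDO as follows. The table and column names (users_table, name) follow the question; the function name userExists, the bio column, and the in-memory SQLite connection are assumptions made only so the snippet runs without a server. The SELECT EXISTS statement itself is also valid in MySQL.

```php
<?php
// Returns true when at least one row matches, without fetching any columns.
function userExists(PDO $pdo, string $name): bool
{
    $stmt = $pdo->prepare(
        'SELECT EXISTS (SELECT 1 FROM users_table WHERE name = ?)'
    );
    $stmt->execute([$name]);

    // The result is a single row holding a single 0/1 value.
    return (bool) $stmt->fetchColumn();
}

// Demo setup (assumption: in-memory SQLite stands in for the real database).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users_table (id INTEGER PRIMARY KEY, name TEXT, bio TEXT)');
$pdo->exec("INSERT INTO users_table (name, bio) VALUES ('alice', 'long text')");

$aliceExists = userExists($pdo, 'alice');
$bobExists = userExists($pdo, 'bob');
```

Only a single 0/1 value crosses the wire, regardless of how wide the table is.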
I have a script that adds about 100,000 entries to a SQL table if they don't already exist. But it normally takes about 30 hours to check each row and add it if it doesn't exist. Is there an easier way to do this?
My code currently uses a for loop; inside the loop is this:
$query = mysql_query("SELECT EXISTS (SELECT * FROM linkdb WHERE link='$currentlink')");
if (mysql_result($query, 0) == 1){
}else{
$qry = "INSERT INTO linkdb(link,title) VALUES('$link','$title')";
$result = @mysql_query($qry);
}
The code above takes a very long time because it normally has to go through thousands of entries. If I skip the SELECT EXISTS check and use only INSERT INTO, 90,000 entries are added within 1 minute. But that adds duplicate entries of the same row.
Please give me some advice on what I could do. These rows need to be updated almost everyday.
You're looking for ON DUPLICATE KEY UPDATE. Add a unique index on link (the clause only fires on a UNIQUE or PRIMARY KEY conflict) and then:
INSERT INTO linkdb(link,title) VALUES('$link','$title') ON DUPLICATE KEY UPDATE link=link;
With that said, you should not be using ext/mysql since it is deprecated. Instead look into PDO or mysqli. It would be much better to use parametrized queries for this to prevent SQL injection.
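A rough sketch of the same upsert with PDO and a prepared statement, wrapped in one transaction so the 100,000 inserts are not committed one by one. It assumes a UNIQUE index on linkdb.link; the names UPSERT_SQL and insertLinks are hypothetical, and the optional $sql parameter is only there so the statement can be swapped for another database dialect.

```php
<?php
// MySQL syntax from the answer above; assumes a UNIQUE index on linkdb.link.
const UPSERT_SQL = 'INSERT INTO linkdb (link, title) VALUES (:link, :title)
                    ON DUPLICATE KEY UPDATE link = link';

// Hypothetical helper: inserts every row, letting the database skip duplicates.
// Each $row is expected to be ['link' => ..., 'title' => ...].
function insertLinks(PDO $pdo, array $rows, string $sql = UPSERT_SQL): void
{
    $pdo->beginTransaction();
    try {
        // Prepare once, execute many times with different values.
        $stmt = $pdo->prepare($sql);
        foreach ($rows as $row) {
            $stmt->execute([':link' => $row['link'], ':title' => $row['title']]);
        }
        $pdo->commit();
    } catch (Throwable $e) {
        // Undo the partial batch and let the caller see the error.
        $pdo->rollBack();
        throw $e;
    }
}

// Usage against MySQL (connection details are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=mydb', $user, $password);
// insertLinks($pdo, [['link' => 'http://example.com', 'title' => 'Example']]);
```

Because values are bound rather than interpolated, a link containing a quote character cannot break the statement.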
Perhaps
INSERT INTO ... ON DUPLICATE KEY UPDATE
can solve your problem.
If you don't want to update the value when there is a duplicate, you can combine the two queries into one:
INSERT INTO linkdb(link,title)
SELECT '$link','$title' FROM DUAL
WHERE NOT EXISTS (SELECT * FROM linkdb WHERE link='$currentlink')
In practice, you can speed up any of these queries by creating an index on linkdb(link).
I have to get all entries in database that have a publish_date between two dates. All dates are stored as integers because dates are in UNIX TIMESTAMP format...
The following query works perfectly but it takes "too long". It returns all entries made between 10 and 20 days ago.
SELECT * FROM tbl_post WHERE published < (UNIX_TIMESTAMP(NOW())-864000)
AND published> (UNIX_TIMESTAMP(NOW())-1728000)
Is there any way to optimize this query? If I am not mistaken, it is calling NOW() and UNIX_TIMESTAMP() on every entry. I thought that saving the results of these two repeated function calls into MySQL @variables would make the comparison much faster, but it didn't. The second version I ran was:
SET @TenDaysAgo = UNIX_TIMESTAMP(NOW())-864000;
SET @TwentyDaysAgo = UNIX_TIMESTAMP(NOW())-1728000;
SELECT * FROM tbl_post WHERE fecha_publicado < @TenDaysAgo
AND fecha_publicado > @TwentyDaysAgo;
Another confusing thing was that PHP can't run the above query through mysql_query().
Please, if you have any comments on this problem it will be more than welcome :)
Luka
Be sure to have an index on published, and make sure it is being used.
EXPLAIN SELECT * FROM tbl_post WHERE published < (UNIX_TIMESTAMP(NOW())-864000) AND published> (UNIX_TIMESTAMP(NOW())-1728000)
should be a good start to see what's going on on the query. To add an index:
ALTER TABLE tbl_post ADD INDEX (published)
PHP's mysql_query function (assuming that's what you're using) can only accept one query per string, so it can't execute the three queries that you have in your second query.
I'd suggest moving that stuff into a stored procedure and calling that from PHP instead.
As for the optimization, setting those variables is about as optimized as you're going to get for your query. You need to make the comparison for every row, and setting a variable provides the quickest access time to the lower and upper bounds.
One improvement, in the indexing of the table rather than the query itself, would be to cluster the index around fecha_publicado to allow MySQL to intelligently handle the query for that range of values. You could do this by setting fecha_publicado as the PRIMARY KEY of the table, provided its values are unique (a primary key cannot contain duplicates).
The obvious things to check are, is there an index on the published date, and is it being used?
The way to optimize would be to partition the table tbl_post on the published key according to date ranges (weekly seems appropriate to your query). This is a feature that is available for MySQL, PostgreSQL, Oracle, Greenplum, and so on.
This will allow the query optimizer to restrict the query to a much narrower dataset.
I agree with BraedenP that a stored procedure would be appropriate here. If you can't use one or really don't want to, you can always generate the dates on the PHP side, but they might not match the database exactly unless the two clocks are synced.
You can also likely do it more quickly as 3 separate queries: query for the begin date, query for the end date, then use those values as input into your target query.
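Generating the bounds on the PHP side could look roughly like this. The helper name publishedWindow is hypothetical, and the sketch assumes the published column from the first query in the question; as noted above, PHP's clock and the database's clock may differ slightly.

```php
<?php
// Hypothetical helper: returns [lower, upper] UNIX timestamps for the window
// between $olderDays and $newerDays before $now.
function publishedWindow(int $now, int $newerDays, int $olderDays): array
{
    $secondsPerDay = 86400;

    return [
        $now - $olderDays * $secondsPerDay, // lower bound (further in the past)
        $now - $newerDays * $secondsPerDay, // upper bound (closer to now)
    ];
}

[$lower, $upper] = publishedWindow(time(), 10, 20);

// Both bounds are plain integers, so the query itself contains no function calls.
$sql = 'SELECT * FROM tbl_post WHERE published > ? AND published < ?';
$params = [$lower, $upper];
// With PDO: $stmt = $pdo->prepare($sql); $stmt->execute($params);
```

This is a single statement, so it also sidesteps the one-query-per-call limit of mysql_query().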
Consider "Query1", which is quite time-consuming. "Query1" is not static; it depends on the $language_id parameter, which is why I cannot save it on the server.
I would like to query the result of "Query1" with another query statement. I expect that this should be fast. I see perhaps 2 ways:
$result = mysql_query("SELECT * FROM raw_data_tbl WHERE ((ID=$language_id) AND (age>13))");
then what? here I want to take result and requery it with something like:
$result2 = mysql_query('SELECT * FROM $result WHERE (Salary>1000)');
Is it possible to create something like "on variable based" MYSQL query directly on the server side and pass somehow variable $language_id to it? The second query would query that query :-)
Thanks...
No, there is no such thing as your second idea.
For the first idea, though, I would go with a single query :
select *
from raw_data_tbl
where id = $language_id
and age > 13
and Salary > 1000
Provided you have set the right indexes on your table, this query should be pretty fast.
Here, considering the where clause of that query, I would at least go with an index on these three columns :
id
age
Salary
This should speed things up quite a bit.
For more information on indexes and query optimization, take a look at:
Chapter 7. Optimization
7.3.1. How MySQL Uses Indexes
12.1.11. CREATE INDEX Syntax
With the use of sub queries you can take advantage of MySQL's caching facilities.
SELECT * FROM raw_data_tbl WHERE (ID='eng') AND (age>13);
... and after this:
SELECT * FROM (SELECT * FROM raw_data_tbl WHERE (ID='eng') AND (age>13)) AS t WHERE salary > 1000;
But this is only beneficial in some very rare circumstances.
With the right indexes your query will run fast enough without the need of trickery. In your case:
CREATE INDEX filter1 ON raw_data_tbl (ID, age, salary);
Although the best solution would be to just add conditions from your second query to the first one, you can use temporary tables to store temporary results. But it would still be better if you put that in a single query.
You could also use subqueries, like SELECT * FROM (SELECT * FROM table WHERE ...) AS t WHERE .... (MySQL requires an alias on every derived table.)
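The single-query approach with $language_id passed as a bound parameter might be sketched like this. The function name findRows is hypothetical, the column types are guesses based on the question, and the in-memory SQLite connection is an assumption so the snippet runs without a server; the SELECT is plain SQL that MySQL accepts as well.

```php
<?php
// Runs the combined filter as one statement with bound parameters.
function findRows(PDO $pdo, string $languageId, int $minAge, int $minSalary): array
{
    $stmt = $pdo->prepare(
        'SELECT * FROM raw_data_tbl WHERE ID = ? AND age > ? AND Salary > ?'
    );
    $stmt->execute([$languageId, $minAge, $minSalary]);

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Demo setup (assumption: in-memory SQLite; made-up sample rows).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE raw_data_tbl (ID TEXT, age INTEGER, Salary INTEGER)');
// Composite index covering the three filtered columns, as suggested above.
$pdo->exec('CREATE INDEX filter1 ON raw_data_tbl (ID, age, Salary)');
$pdo->exec(
    "INSERT INTO raw_data_tbl VALUES
     ('eng', 30, 2000), ('eng', 12, 5000), ('eng', 40, 500), ('fra', 30, 2000)"
);

$rows = findRows($pdo, 'eng', 13, 1000);
```

The variable part of "Query1" becomes an ordinary parameter, so nothing has to be stored on the server between requests.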
I'm building a webpage in PHP using MySQL as my database.
Which way is faster?
2 requests to MySQL with the following queries.
SELECT points FROM data;
SELECT sum(points) FROM data;
1 request to MySQL. Hold the result in a temporary array and calculate the sum in PHP.
$data = SELECT points FROM data;
EDIT -- the data is about 200-500 rows
It's really going to depend on a lot of different factors. I would recommend trying both methods and seeing which one is faster.
Since Phill and Kibbee have answered this pretty effectively, I'd like to point out that premature optimization is a Bad Thing (TM). Write what's simplest for you and profile, profile, profile.
How much data are we talking about? I'd say MySQL is probably faster at doing those kind of operations in the majority of cases.
Edit: with the kind of data that you're talking about, it probably won't make masses of difference. But databases tend to be optimised for those kind of queries, whereas PHP isn't. I think the second DB query is probably worth it.
If you want to do it in one line, use a running total like this:
SET @total=0;
SELECT points, @total:=@total+points AS RunningTotal FROM data;
I wouldn't worry about it until I had an issue with performance.
If you go with two separate queries, you need to watch out for the possibility of the data changing between getting the rows & getting their sum. Until there's an observable performance problem, I'd stick to doing my own summation to keep the page consistent.
The general rule of thumb for efficiency with MySQL is to try to minimize the number of SQL requests. Every call to the database adds overhead and is "expensive" in terms of time required.
The optimization done by MySQL is quite good. It can take very complex requests with many joins, nestings and computations, and make them run efficiently.
But it can only optimize individual requests. It cannot check the relationship between two different SQL statements and optimize between them.
In your example 1, the two statements will make two requests to the database and the table will be scanned twice.
Your example 2 where you save the result and compute the sum yourself would be faster than 1. This would only be one database call, and looping through the data in PHP to get the sum is faster than a second call to the database.
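Summing on the PHP side takes only a couple of built-in calls. In this sketch $rows stands in for the fetched result of SELECT points FROM data; the literal values are made up for illustration.

```php
<?php
// $rows stands for the result of "SELECT points FROM data" fetched as
// associative arrays (for example via PDOStatement::fetchAll(PDO::FETCH_ASSOC)).
// The literal values here are made up for illustration.
$rows = [
    ['points' => 2],
    ['points' => 5],
    ['points' => 10],
];

// Extract the points column, then add the values up.
$total = array_sum(array_column($rows, 'points'));

foreach ($rows as $row) {
    echo $row['points'], "\n";
}
echo 'Total is: ', $total, "\n";
```

For 200 to 500 rows this loop is negligible next to the cost of a database round trip, and the total is guaranteed to match the rows actually displayed.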
Just for the fun of it.
SELECT SUM(points) FROM `data`
UNION ALL
SELECT points FROM `data`
The first row will typically be the total, followed by the data rows; without an ORDER BY the order is not guaranteed.
NOTE: UNION can be slow, but it's an option.
For more fun, you can label the rows, which also lets you sort them.
SELECT 'total' AS name, SUM(points) AS points FROM `data`
UNION ALL
SELECT 'points' AS name, points FROM `data`
Then read the rows in PHP:
while ($row = mysql_fetch_assoc($query))
{
    // "name" tells us whether this row is a data row or the total row.
    if ($row["name"] == "points")
    {
        echo $row["points"];
    }
    if ($row["name"] == "total")
    {
        echo "Total is: ".$row["points"];
    }
}
You can use union like this:
(select points, null as total from data) union all (select null, sum(points) from data);
The result will look something like this:
points total
2 null
5 null
...
null 7
you can figure out how to handle it.
Do it the MySQL way. Let the database manager do its work.
MySQL is optimized for such tasks.