I have a table where I log members.
There are 1,486,044 records here.
SELECT * FROM `user_log` WHERE user = '1554143' order by id desc
However, this query takes 5 seconds. What do you recommend ?
Table construction below;
CREATE TABLE IF NOT EXISTS `user_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user` int(11) NOT NULL,
`operation_detail` varchar(100) NOT NULL,
`ip_adress` varchar(50) NOT NULL,
`l_date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
COMMIT;
For this query:
SELECT * FROM `user_log` WHERE user = 1554143 order by id desc
You want an index on (user, id desc).
Note that I removed the single quotes around the filtering value for user, since this column is a number. This does not necessarily speeds things up, but is cleaner.
Also: select * is not a good practice, and not good for performance. You should enumerate the columns you want in the resultset (if you don't need them all, do not select them all). If you want all columns, since your table has not a lot of columns, you might want to try a covering index on all 5 columns, like: (user, id desc, operation_detail, ip_adress, l_date).
In addition to the option of creating an index on (user, id), which has already been mentioned, a likely better option is to convert the table to InnoDB as create an index only on (user).
I have a simple, location based, key-value array (PHP), which changes throughout the day. I intend to capture variation in this array.
I can calculate the difference between previous array and current array values. I could, then save them in SQL DB as:
Location, Date, Key, NewValue
How will the schema look like for this. My newbie attempt is as follows:
CREATE TABLE `Variations` (
`Location` TEXT(128),
`Date` DATETIME,
`Key` TEXT(64),
`Value` TEXT(256),
`ID` INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`ID`)
);
How would I know all the (latest) key value pairs at a given date ?
Looking for guidance on SQL query for data retrieval.
An autoincrementing ID must be of type INTEGER, not INT, and AUTOINCREMENT (not AUTO_INCREMENT) has a meaning different from what you think it has.
To get the latest values for a date, you need rows for which no other row with a later date exists:
SELECT *
FROM Variations
WHERE date(Date) <= 'xxxx-xx-xx'
AND NOT EXISTS (SELECT 1
FROM Variations AS V2
WHERE V2.Location = Variations.Location
AND V2.Key = Variations.Key
AND V2.Date <= 'xxxx-xx-xx'
AND V2.Date > Variations.Date)
I have a table that its structure is as like as follow:
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`ttype` int(1) DEFAULT '19',
`title` mediumtext,
`tcode` char(2) DEFAULT NULL,
`tdate` int(11) DEFAULT NULL,
`visit` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `tcode` (`tcode`),
KEY `ttype` (`ttype`),
KEY `tdate` (`tdate`)
ENGINE=MyISAM
I have two query on x.php same as:
SELECT * FROM table_name WHERE id='10' LIMIT 1
UPDATE table_name SET visit=visit+1 WHERE id='10' LIMIT 1
My first problem is that whether updating 'visit' in table cause reindexing and decreasing performance or not? Note to this point that 'visit' is not key.
Second method may be creating new table that contain 'visit' like as follow:
'newid' int(10) unsigned NOT NULL ,
`visit` int(11) DEFAULT '0',
PRIMARY KEY (`newid`),
ENGINE=MyISAM
So selecting by
SELECT w.*,q.visit FROM table_name w LEFT JOIN table_name2 q
ON (w.id=q.newid) WHERE w.id='10' LIMIT 1
UPDATE table_name2 SET visit=visit+1 WHERE newid='10' LIMIT 1
Is second method prefered rescpect to first method? Which one would have better performance and would be quick?
Note: all sql queries would be run by PHP (mysql_query command). Also I need first table indexes for other queries on other pages.
I'd say your first method is the best, and simplest. Updating visit will be very fast and no updating of indexes needs to be performed.
I'd prefer the first, and have used that for similar things in the past with no problems. You can remove the limit clause since id is your primary key you will never have more than 1 result, although the query optimizer probably does this for you.
There was a question someone asked earlier to which I responded with a solution you may want to consider as well. When you do 'count' columns you lose the ability to mine the data later. With a transaction table not only can you get 'views' counts, but you can also query for date ranges etc. Sure you will carry the weight of storing potentially hundreds of thousands of rows, but the table is narrow and indices numeric.
I cannot see a solution on the database side... Perhaps you can do it in PHP: If the user has a PHP session, you could, for example, only update the visitor count each 10th time, like:
<?php
session_start();
$_SESSION['count']+=1;
if ($_SESSION['count'] > 10) {
do_the_function_that_updates_the_count_plus_10();
$_SESSION['count'] = 0;
}
Of course you loose some counts, this way, but perhaps this is not that important?
I have this query:
SELECT ROUND(AVG(temp)*multT + conT,2) as temp,
FLOOR(timestamp/$secondInterval) as meh
FROM sensor_locass
LEFT JOIN sensor_data USING(sensor_id)
WHERE sensor_id = '$id'
AND project_id = '$project'
GROUP BY meh
ORDER BY timestamp ASC
The purpose is to select data for drawing a graph, I use the average over a pixels worth of data to make the graph faithful to the data.
So far optimization has included adding indexes, switching between MyISAM and InnoDB but no luck.
Since the time interval changes with graph zoom and period of data collection I cannot make a seperate column for the GROUP BY statement, the query however is slow. Does anyone have ideas for optimizing this query or the table to make this grouping faster, I currently have an index on the timestamp, sensor_id and project_id columns, the timestamp index is not used however.
When running explain extended with the query I get the following:
1 SIMPLE sensor_locass ref sensor_id_lookup,project_id_lookup sensor_id_lookup 4 const 2 100.00 Using where; Using temporary; Using filesort
1 SIMPLE sensor_data ref idsensor_lookup idsensor_lookup 4 webstech.sensor_locass.sensor_id 66857 100.00
The sensor_data table contains at the moment 2.7 million datapoints which is only a small fraction of the amount of data i will end up having to work with. Any helpful ideas, comments or solution would be most welcome
EDIT table definitions:
CREATE TABLE `sensor_data` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`gateway_id` int(11) NOT NULL,
`timestamp` int(10) NOT NULL,
`v1` int(11) NOT NULL,
`v2` int(11) NOT NULL,
`v3` int(11) NOT NULL,
`sensor_id` int(11) NOT NULL,
`temp` decimal(5,3) NOT NULL,
`oxygen` decimal(5,3) NOT NULL,
`batVol` decimal(4,3) NOT NULL,
PRIMARY KEY (`id`),
KEY `gateway_id` (`gateway_id`),
KEY `time_lookup` (`timestamp`),
KEY `idsensor_lookup` (`sensor_id`)
) ENGINE=MyISAM AUTO_INCREMENT=2741126 DEFAULT CHARSET=latin1
CREATE TABLE `sensor_locass` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`project_id` int(11) NOT NULL,
`sensor_id` int(11) NOT NULL,
`start` date NOT NULL,
`end` date NOT NULL,
`multT` decimal(6,3) NOT NULL,
`conT` decimal(6,3) NOT NULL,
`multO` decimal(6,3) NOT NULL,
`conO` decimal(6,3) NOT NULL,
`xpos` decimal(4,2) NOT NULL,
`ypos` decimal(4,2) NOT NULL,
`lat` decimal(9,6) NOT NULL,
`lon` decimal(9,6) NOT NULL,
`isRef` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `sensor_id_lookup` (`sensor_id`),
KEY `project_id_lookup` (`project_id`)
) ENGINE=MyISAM AUTO_INCREMENT=238 DEFAULT CHARSET=latin1
Despite everyone's answers, changing the primary key to optimize the search on the table with 238 rows isn't gonna change anything, especially when the EXPLAIN shows a single key narrowing the search to two rows. And adding timestamp to the primary key on sensor_data won't work either since nothing is querying the timestamp, just calculating on it (unless you can restrict on the timestamp values as galymzhan suggests).
Oh, and you can drop the LEFT in your query, since matching on project_id makes it irrelevant anyway (but doesn't slow anything down). And please don't interpolate variables directly into a query if those variables come from customer input to avoid $project_id = "'; DROP TABLES; --" type sql injection exploits.
Adjusting your heap sizes could work for a while but you'll have to continue adjusting it if you need to scale.
The answer vdrmrt suggests might work but then you'd need to populate your aggregate table with every single possible value for $secondInterval which I'm assuming isn't very plausible given the flexibility that you said you needed. In the same vein, you could consider rrdtool, either using it directly or modifying your data in the same way that it does. What I'm referring to specifically is that it keeps the raw data for a given period of time (usually a few days), then averages the data points together over larger and larger periods of time. The end result is that you can zoom in to high detail for recent periods of time but if you look back further, the data has been effectively lossy-compressed to averages over large periods of time (e.g. one data point per second for a day, one data point per minute for a week, one data point per hour for a month, etc). You could customize those averages initially but unless you kept both the raw data and the summarized data, you wouldn't be able to go back and adjust. In particular, you could not dynamically zoom in to high detail on some older arbitrary point (such as looking at the per second data for a 1 hour of time occuring six months ago).
So you'll have to decide whether such restrictions are reasonable given your requirements.
If not, I would then argue that you are trying to do something in MySQL that it was not designed for. I would suggest pulling the raw data you need and taking the averages in php, rather than in your query. As has already been pointed out, the main reason your query takes a long time is because the GROUP BY clause is forcing mysql to crunch all the data in memory but since its too much data its actually writing that data temporarily to disk. (Hence the using filesort). However, you have much more flexibility in terms of how much memory you can use in php. Furthermore, since you are combining nearby rows, you could pull the data out row by row, combining it on the fly and thereby never needing to keep all the rows in memory in your php process. You could then drop the GROUP BY and avoid the filesort. Use an ORDER BY timestamp instead and if mysql doesn't optimize it correctly, then make sure you use FORCE INDEX FOR ORDER BY (timestamp)
I'd suggest that you find a natural primary key to your tables and switch to InnoDB. This a guess at what your data looks like:
sensor_data:
PRIMARY KEY (sensor_id, timestamp)
sensor_locass:
PRIMARY KEY (sensor_id, project_id)
InnoDB will order all the data in this way so rows you're likely to SELECT together will be together on disk. I think you're group by will always cause some trouble. If you can keep it below the size where it switches over to a file sort (tmp_table_size and max_heap_table_size), it'll be much faster.
How many rows are you generally returning? How long is it taking now?
As Joshua suggested, you should define (sensor_id, project_id) as a primary key for sensor_locass table, because at the moment table has 2 separate indexes on each of the columns. According to mysql docs, SELECT will choose only one index from them (most restrictive, which finds fewer rows), while primary key allows to use both columns for indexing data.
However, EXPLAIN shows that MySQL examined 66857 rows on a joined table, so you should somehow optimize that too. Maybe you could query sensor data for a given interval of time, like timestamp BETWEEN (begin, end) ?
I agree that the first step should be to define sensor_id, project_id as primary key for sensor_locass.
If that is not enough and your data is relative static you can create an aggregated table that you can refresh for example everyday and than query from there.
What you still have to do is to define a range for secondInterval, store that in new table and add that field to the primary key of your aggregated table.
The query to populate the aggregated table will be something like this:
INSERT INTO aggregated_sensor_data (sensor_id,project_id,secondInterval,timestamp,temp,meh)
SELECT
sensor_locass.sensor_id,
sensor_locass.project_id,
secondInterval,
timestamp,
ROUND(AVG(temp)*multT + conT,2) as temp,
FLOOR(timestamp/secondInterval) as meh
FROM
sensor_locass
LEFT JOIN sensor_data
USING(sensor_id)
LEFT JOIN secondIntervalRange
ON 1 = 1
WHERE
sensor_id = '$id'
AND
project_id = '$project'
GROUP BY
sensor_locass.sensor_id,
sensor_locass.project_id,
meh
ORDER BY
timestamp ASC
And you can use this query to extract the aggregated data:
SELECT
temp,
meh
FROM
aggregated_sensor_data
WHERE
sensor_id = '$id'
AND project_id = '$project'
AND secondInterval = $secondInterval
ORDER BY
timestamp ASC
If you want to use timestamp index, you will have to tell explicitly to use that index. MySQL 5.1 supports USE INDEX FOR ORDER BY/FORCE INDEX FOR ORDER BY. Have a look at it here http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
I'm trying to count how many times an article has been rated by my members buy using PHP to count a certain articles total entered ratings that have been stored in my MySQL database.
I really want to use PHP and not MySQL to do this and was wondering how I can do this?
I hope I explained this right?
An example would be very helpful my MySQL database that holds the ratings are listed below.
Here is the MySQL database.
CREATE TABLE articles_ratings (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
ratings_id INT UNSIGNED NOT NULL,
users_articles_id INT UNSIGNED NOT NULL,
user_id INT UNSIGNED NOT NULL,
date_created DATETIME NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE ratings (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
points FLOAT UNSIGNED NOT NULL DEFAULT 0,
PRIMARY KEY (id)
);
It's much easier just to do it with SQL:
select count(*) from articles_ratings where id = (id value)
You could of course just select * from articles_ratings where id = (id value), then loop through all the rows to count them -- but if the database can do all this work for you, then it's usually best to use it!
If that's really what you want, you could SELECT the ratings and then use http://php.net/manual/en/function.mysql-num-rows.php to count them. Is this what you had in mind?
"I really want to use PHP" - this will mean you will retrieve all rows from MySQL server and count them using PHP loop?
This is wrong - use SQL to aggregate information, then retrieve it from database.