Best way to count millions of rows between certain dates in MySQL - php

Here is my SQL for creating a table:
$sql_create_table = "CREATE TABLE {$table_name} (
hit_id bigint(20) unsigned NOT NULL auto_increment,
user_id int(7) unsigned NOT NULL default '0',
optin_id int(8) unsigned NOT NULL default '0',
hit_date datetime NOT NULL default '0000-00-00 00:00:00',
hit_type varchar(10) NOT NULL default '',
PRIMARY KEY (hit_id),
KEY user_id (user_id)
) $charset_collate; ";
I need the fastest way to count the rows matched by a query. My current query is too slow once it has to go through millions of rows.
$sql = "SELECT hit_id FROM $table_name WHERE user_id = %d AND hit_type = %s AND hit_date >= FROM_UNIXTIME(%d) AND hit_date <= FROM_UNIXTIME(%d)";
I've tried this with no luck (not returning the proper results):
$sql = "SELECT COUNT(*) FROM $table_name WHERE user_id = %d AND hit_type = %s AND hit_date >= FROM_UNIXTIME(%d) AND hit_date <= FROM_UNIXTIME(%d)";
What do I need to do to make this query efficient enough that it doesn't time out on millions of rows? I simply want to count the rows that match the given parameters.

I'm not sure of the performance of the FROM_UNIXTIME function, but the first thing I would do is create an index on hit_date.
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
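A sketch of one option, assuming you are free to add an index. The query filters on user_id and hit_type by equality and on hit_date by range, so a composite index with the equality columns first and hit_date last (rather than hit_date alone) lets MySQL count the matching entries from the index without reading the table rows. FROM_UNIXTIME() only receives constants here, so it should be evaluated once per query rather than once per row. The table name `hits` and index name `user_type_date` are hypothetical.

```php
<?php
// Hypothetical helpers; the index name `user_type_date` is made up for illustration.
function hit_index_sql(string $table): string
{
    // Equality columns first (user_id, hit_type), range column last (hit_date),
    // so the whole WHERE clause can be resolved inside this one index.
    return "ALTER TABLE {$table} ADD INDEX user_type_date (user_id, hit_type, hit_date)";
}

function hit_count_sql(string $table): string
{
    // Same filters as the question; the %d / %s placeholders are left for the
    // caller's prepare step, exactly as in the original query.
    return "SELECT COUNT(*) FROM {$table}"
        . " WHERE user_id = %d AND hit_type = %s"
        . " AND hit_date >= FROM_UNIXTIME(%d) AND hit_date <= FROM_UNIXTIME(%d)";
}

echo hit_index_sql('hits'), PHP_EOL;
echo hit_count_sql('hits'), PHP_EOL;
```

With such an index in place, EXPLAIN on the COUNT(*) query should show the index being used rather than a full table scan.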

Related

How to Run Timely Automated Query on MySQL

I have a field called pending, declared as a boolean with a default value of 0:
`pending` tinyint(1) NOT NULL DEFAULT '0'
I am running an UPDATE query against the database to change pending to 1, like below:
$sql = "UPDATE `appointments` SET `pending` = '1' WHERE `appointments`.`id` = 124";
Now my question is: is there any way to automatically reset pending to 0 after 30 minutes, using a conditional clause like
// After 30 minutes of update!
if (!$confirmed) {
$sql = "UPDATE `appointments` SET `pending` = '0' WHERE `appointments`.`id` = 124";
}
Table Schema
CREATE TABLE IF NOT EXISTS `appointments` (
`id` int(6) NOT NULL AUTO_INCREMENT,
`date` varchar(100) CHARACTER SET utf8 NOT NULL,
`available` tinyint(1) NOT NULL DEFAULT '1',
`pending` tinyint(1) NOT NULL DEFAULT '0',
`confirmed` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
You are doing it wrong.
Make your pending field a DATETIME instead of a boolean, holding the time the hold expires. Then in your SELECT queries just compare that value with the current time. So instead of
SELECT * FROM appointments WHERE pending = 1
just use
SELECT * FROM appointments WHERE pending > NOW()
This solution is both simpler and more flexible.
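A minimal sketch of the expiry-time idea in PHP, assuming `pending` becomes a nullable DATETIME holding the moment the hold expires (the function names are hypothetical). In MySQL, placing the hold could be `UPDATE appointments SET pending = NOW() + INTERVAL 30 MINUTE WHERE id = 124`. A row counts as still pending only while that stored time lies in the future, so no scheduled job has to reset anything.

```php
<?php
// Hypothetical helpers for the "store the expiry time" approach.
function pending_until(DateTimeImmutable $now, int $minutes = 30): DateTimeImmutable
{
    // Value to store in `pending` when the appointment is put on hold.
    return $now->modify("+{$minutes} minutes");
}

function is_still_pending(?DateTimeImmutable $pendingUntil, DateTimeImmutable $now): bool
{
    // NULL means the appointment was never held; otherwise the hold lasts
    // until the stored time, so nothing has to reset it afterwards.
    return $pendingUntil !== null && $pendingUntil > $now;
}

$until = pending_until(new DateTimeImmutable('2014-02-09 10:00:00'));
echo $until->format('Y-m-d H:i:s'), PHP_EOL;                                        // 2014-02-09 10:30:00
var_dump(is_still_pending($until, new DateTimeImmutable('2014-02-09 10:15:00'))); // bool(true)
var_dump(is_still_pending($until, new DateTimeImmutable('2014-02-09 10:45:00'))); // bool(false)
```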

Correct mysql query to query one table and have that provide information for another table query

I have 2 separate tables, both of which I need to query simultaneously to get the correct information to display. The tables are members and posts. Through an html form, a user enters criteria for the members table, and then I need to use the primary index of all those specific members to find all the posts submitted by those members and then do a sort on the posts table results. The results will be a mixture of rows from the two tables. Both tables have a primary index of the name 'id'. So far what I've come up with is:
$sql_get_posts = mysqli_query($link, "(SELECT id, username FROM members WHERE active='y' AND gender='M' AND city='Yuma' AND state='Arizona') UNION (SELECT * FROM posts WHERE member_id='id' AND active='y' ORDER BY list_weight DESC)") or die(mysqli_error($link));
The error I'm getting is "The used SELECT statements have a different number of columns".
I need to then cycle through the returned results from both tables to populate the content seen by the user:
<?php
while ($row = mysqli_fetch_array($sql_get_posts)) {
$post_id = $row['id']; //This should be the post primary key named 'id', not the member primary key also named 'id'
$member_id = $row['member_id']; //This is the member_id column in the posts table, referencing the member who wrote this post
$member_username = $row['username']; //This is a column stored in the members table
$title = $row['title']; //This is a column stored in the posts table
//...and so on, reading the remaining columns from the posts table
}
Edit: my SQL tables:
CREATE TABLE IF NOT EXISTS `members` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(20) NOT NULL,
`age` varchar(3) NOT NULL,
`gender` varchar(1) NOT NULL,
`city` varchar(20) NOT NULL,
`state` varchar(50) NOT NULL,
`active` enum('y','n') NOT NULL DEFAULT 'y',
`created_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;
CREATE TABLE IF NOT EXISTS `posts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`member_id` int(11) NOT NULL,
`title` text NOT NULL,
`comments` enum('y','n') NOT NULL DEFAULT 'y',
`post_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`list_weight` double NOT NULL,
`active` enum('y','n') NOT NULL DEFAULT 'y',
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=47 ;
Use JOIN instead of UNION. UNION stacks the rows of queries that return matching columns, whereas JOIN combines the columns of two tables into one row.
Something like:
SELECT members.id, members.username, posts.*
FROM members
INNER JOIN posts
ON members.id = posts.member_id
WHERE members.active='y' AND members.gender='M' AND members.city='Yuma' AND members.state='Arizona'
ORDER BY posts.list_weight DESC
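To make the shape of the JOIN result concrete, here is a plain-PHP sketch of what the query above returns, run on hypothetical sample arrays rather than the real tables. It also touches the `id` clash from the question: selecting `members.id` together with `posts.*` yields two `id` columns, and mysqli_fetch_array keeps only the last one under the 'id' key, so an alias such as `posts.id AS post_id` is safer than relying on which duplicate wins.

```php
<?php
// Sketch of INNER JOIN + ORDER BY in plain PHP; sample data is invented.
function join_posts_to_members(array $members, array $posts): array
{
    // Index the (already filtered) members by id.
    $byId = [];
    foreach ($members as $m) {
        $byId[$m['id']] = $m;
    }

    $rows = [];
    foreach ($posts as $p) {
        // INNER JOIN: keep a post only if its member_id matches a filtered member.
        if (!isset($byId[$p['member_id']])) {
            continue;
        }
        // Distinct keys, so the post id and the member id cannot overwrite each other.
        $rows[] = [
            'post_id'     => $p['id'],
            'member_id'   => $p['member_id'],
            'username'    => $byId[$p['member_id']]['username'],
            'title'       => $p['title'],
            'list_weight' => $p['list_weight'],
        ];
    }

    // ORDER BY posts.list_weight DESC
    usort($rows, fn($a, $b) => $b['list_weight'] <=> $a['list_weight']);
    return $rows;
}

$members = [['id' => 1, 'username' => 'alice'], ['id' => 2, 'username' => 'bob']];
$posts = [
    ['id' => 10, 'member_id' => 1, 'title' => 'First',  'list_weight' => 0.5],
    ['id' => 11, 'member_id' => 3, 'title' => 'Orphan', 'list_weight' => 9.0],
    ['id' => 12, 'member_id' => 2, 'title' => 'Second', 'list_weight' => 2.0],
];
foreach (join_posts_to_members($members, $posts) as $row) {
    echo "{$row['post_id']} {$row['username']} {$row['title']}", PHP_EOL;
}
```

The post by member 3 is dropped because no filtered member matches it, which is exactly what an INNER JOIN does.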
Each SELECT statement within a UNION must have the same number of columns. The columns must also have similar data types and be in the same order.
But you selected only 2 columns in the first query and "*" in the second.
Use joins:
SELECT m.id, m.username, p.* FROM members m JOIN posts p ON p.member_id = m.id WHERE m.active='y' AND m.gender='M' AND m.city='Yuma' AND m.state='Arizona' AND p.active='y' ORDER BY p.list_weight DESC
or you can use an implicit join:
SELECT m.id, m.username, p.* FROM members m, posts p WHERE p.member_id = m.id AND m.active='y' AND m.gender='M' AND m.city='Yuma' AND m.state='Arizona' AND p.active='y' ORDER BY p.list_weight DESC

Get big result set from mysql

I have a big table with about 20 million rows that grows every day, and a form that runs a query against it. Unfortunately the query returns hundreds of thousands of rows.
The query is based on time, and I need all of the records so I can classify them by 'clid' based on some rules and then process them to build a result table.
This is my table :
CREATE TABLE IF NOT EXISTS `cdr` (
`gid` bigint(20) NOT NULL AUTO_INCREMENT,
`prefix` varchar(20) NOT NULL DEFAULT '',
`id` bigint(20) NOT NULL,
`start` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`clid` varchar(80) NOT NULL DEFAULT '',
`duration` int(11) NOT NULL DEFAULT '0',
`service` varchar(20) NOT NULL DEFAULT '',
PRIMARY KEY (`gid`),
UNIQUE KEY `id` (`id`,`prefix`),
KEY `start` (`start`),
KEY `clid` (`clid`),
KEY `service` (`service`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
and this is my query :
SELECT * FROM `cdr`
WHERE
service = 'test' AND
`start` >= '2014-02-09 00:00:00' AND
`start` < '2014-02-10 00:00:00' AND
`duration` >= 10
The date period can vary from 1 hour to maybe 60 days or even more (for example:
DATE(start) BETWEEN '2013-02-02 00:00:00' AND '2014-02-03 00:00:00'
)
The result set has about 150,000 rows per day. When I try to get results for a longer period, or even for a single day, the database crashes.
Does anybody have any idea ?
I don't know how to prevent it from crashing, but one thing that I did with my large tables was partition them by date.
Here, I partition the rows by date, twice a month. As long as your query filters on the partitioning column, MySQL only searches the partitions that can contain matching rows (partition pruning) instead of scanning the whole table.
CREATE TABLE `identity` (
`Reference` int(9) unsigned NOT NULL AUTO_INCREMENT,
...
`Reg_Date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`Reference`,`Reg_Date`),
KEY `Reg_Date` (`Reg_Date`)
) ENGINE=InnoDB AUTO_INCREMENT=28424336 DEFAULT CHARSET=latin1
PARTITION BY RANGE COLUMNS (Reg_Date) (
PARTITION p20140201 VALUES LESS THAN ('2014-02-01'),
PARTITION p20140214 VALUES LESS THAN ('2014-02-14'),
PARTITION p20140301 VALUES LESS THAN ('2014-03-01'),
PARTITION p20140315 VALUES LESS THAN ('2014-03-15'),
PARTITION p20140715 VALUES LESS THAN (MAXVALUE)
);
So basically, you just do a dump of the table, create it with partitions and then import the data into it.
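Writing the partition list by hand gets tedious as the table grows. The sketch below generates a two-per-month PARTITION BY RANGE COLUMNS clause, assuming boundaries on the 1st and the 15th of each month; the helper name is hypothetical. Keep in mind that MySQL requires every unique key of a partitioned table, including the primary key, to contain the partitioning column.

```php
<?php
// Hypothetical helper: PARTITION BY RANGE COLUMNS clause with two partitions
// per month. $from is used as the first boundary.
function half_month_partitions(string $column, string $from, string $to): string
{
    $d     = new DateTimeImmutable($from);
    $end   = new DateTimeImmutable($to);
    $parts = [];
    while ($d <= $end) {
        // Each partition holds rows strictly earlier than its boundary date.
        $parts[] = sprintf(
            "PARTITION p%s VALUES LESS THAN ('%s')",
            $d->format('Ymd'),
            $d->format('Y-m-d')
        );
        // Next boundary: the 15th of this month, or the 1st of next month.
        $d = ((int) $d->format('j') < 15)
            ? $d->setDate((int) $d->format('Y'), (int) $d->format('n'), 15)
            : $d->modify('first day of next month');
    }
    // Catch-all for rows newer than the last boundary.
    $parts[] = 'PARTITION pmax VALUES LESS THAN (MAXVALUE)';
    return "PARTITION BY RANGE COLUMNS ({$column}) (\n  " . implode(",\n  ", $parts) . "\n)";
}

echo half_month_partitions('Reg_Date', '2014-02-01', '2014-03-15'), PHP_EOL;
```

The generated clause can be appended to the CREATE TABLE statement before re-importing the dump.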

MySQL "NOT IN" Query Optimization

Optimizing MySQL queries isn't my expertise, so I was wondering if someone could help me formulate the most efficient query here (and indexes).
As background, I'm trying to find a distinct visitor id within a table of transactions with certain where criteria (date range, not a certain product, etc. as you see in the query below). Transactions and visitors have a one-to-many relationship, so there can be many transactions for a single visitor.
Another requirement for the results is that if a visitor_id is found in the result, it must be the first instance of a visitor_id (by date_time) in the entire table. In other words, the visitor_id should only exist in the date range set in the primary query and at no time beforehand.
Here's what I've put together so far. It uses NOT IN and a subquery, but this doesn't seem ideal because the query takes 2-3 seconds, given that the table has over 500k records. I've tried a few variations of indexes, but nothing seems to really help.
Here's the query.
SELECT DISTINCT visitor_id, date_time
FROM pt_transactions
WHERE visitor_id NOT IN (SELECT visitor_id FROM pt_transactions WHERE date_time < '$this->_date_time_start')
AND campaign_id = $this->_campaign_id
AND a_aid = '$a_aid'
AND date_time >= '$this->_date_time_start'
AND date_time <= '$this->_date_time_end'
AND product_id != 65
And here's the complete table structure.
CREATE TABLE IF NOT EXISTS `pt_transactions` (
`id` int(32) NOT NULL AUTO_INCREMENT,
`type` varchar(2) NOT NULL COMMENT 'New Lead (NL), Raw Optin (RO), Base Sale (BS), Upsell Sale (US), Recurring Sale (RS), Base Refund (BR), Upsell Refund (UR), Recurring Refund (RR), Unknown Refund (XR), or Chargeback (C)',
`date_time` datetime NOT NULL,
`amount` varchar(255) NOT NULL,
`a_aid` varchar(255) NOT NULL,
`subid1` varchar(255) NOT NULL,
`subid2` varchar(255) NOT NULL,
`subid3` varchar(255) NOT NULL,
`product_id` int(16) NOT NULL,
`visitor_id` int(32) NOT NULL,
`campaign_id` int(16) NOT NULL,
`last_click_id` int(16) NOT NULL,
`trackback_type` varchar(255) NOT NULL COMMENT 'Shows if the transaction is tracked back to the original visitor via cookie or via IP. Usually only applies to sales via pixel.',
`original_transaction_id` int(32) NOT NULL COMMENT 'Reference to original transaction id, in this table, if type is RS, R, or C',
`recurring_transaction_id` varchar(32) NOT NULL COMMENT 'Reference to existing RecurringTransaction if type is RS',
PRIMARY KEY (`id`),
KEY `visitor_id` (`visitor_id`),
KEY `campaign_id` (`visitor_id`,`campaign_id`,`amount`,`product_id`),
KEY `transaction_retrieval_group` (`campaign_id`,`date_time`,`a_aid`),
KEY `type` (`type`),
KEY `date_time` (`date_time`),
KEY `original_source` (`campaign_id`,`a_aid`,`date_time`,`product_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=574636
You can try NOT EXISTS
SELECT DISTINCT visitor_id, date_time
FROM pt_transactions t
WHERE campaign_id = $this->_campaign_id
AND a_aid = '$a_aid'
AND date_time >= '$this->_date_time_start'
AND date_time <= '$this->_date_time_end'
AND product_id != 65
AND NOT EXISTS
(
SELECT *
FROM pt_transactions
WHERE visitor_id = t.visitor_id
AND date_time < '$this->_date_time_start'
)
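The NOT EXISTS query above keeps a row only if the same visitor has no transaction at all before the window start. As a sanity check of that anti-join logic, here is a plain-PHP sketch over sample rows (the function name and data are hypothetical). On the SQL side, a composite index on (visitor_id, date_time), which is not in the posted schema, would let each correlated subquery probe be a short index range read.

```php
<?php
// Hypothetical helper mirroring the NOT EXISTS query: a visitor qualifies only
// if it has a matching row inside [start, end] and no row at all before start.
// Dates are 'Y-m-d H:i:s' strings, which compare correctly as plain strings.
function first_time_visitors(array $rows, int $campaignId, string $aAid, string $start, string $end): array
{
    // Visitors with any earlier transaction: what the correlated subquery finds.
    $seenBefore = [];
    foreach ($rows as $r) {
        if ($r['date_time'] < $start) {
            $seenBefore[$r['visitor_id']] = true;
        }
    }

    $out = [];
    foreach ($rows as $r) {
        if ($r['campaign_id'] === $campaignId
            && $r['a_aid'] === $aAid
            && $r['date_time'] >= $start
            && $r['date_time'] <= $end
            && $r['product_id'] !== 65
            && !isset($seenBefore[$r['visitor_id']]) // NOT EXISTS
        ) {
            // Keyed array gives DISTINCT (visitor_id, date_time) pairs.
            $out[$r['visitor_id'] . '|' . $r['date_time']] = [$r['visitor_id'], $r['date_time']];
        }
    }
    return array_values($out);
}

$rows = [
    ['visitor_id' => 1, 'date_time' => '2014-01-05 10:00:00', 'campaign_id' => 7, 'a_aid' => 'x', 'product_id' => 1],
    ['visitor_id' => 1, 'date_time' => '2014-02-10 10:00:00', 'campaign_id' => 7, 'a_aid' => 'x', 'product_id' => 1],
    ['visitor_id' => 2, 'date_time' => '2014-02-11 09:00:00', 'campaign_id' => 7, 'a_aid' => 'x', 'product_id' => 1],
    ['visitor_id' => 3, 'date_time' => '2014-02-12 09:00:00', 'campaign_id' => 7, 'a_aid' => 'x', 'product_id' => 65],
];
print_r(first_time_visitors($rows, 7, 'x', '2014-02-01 00:00:00', '2014-02-28 23:59:59'));
```

Visitor 1 is excluded because of its January transaction, even though it also has a row inside the window.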
Run EXPLAIN <query> and see how your indexes are used. If you want, you can post the results in your question as text.
From your query, what I understand is that
there is no need to write the NOT IN statement,
because you are already checking
date_time >= '$this->_date_time_start'
so there is no need to check date_time < '$this->_date_time_start' in the NOT IN statement.
Just the query below should work fine :)
SELECT DISTINCT visitor_id, date_time
FROM pt_transactions
WHERE
campaign_id = $this->_campaign_id
AND a_aid = '$a_aid'
AND date_time >= '$this->_date_time_start'
AND date_time <= '$this->_date_time_end'
AND product_id != 65

PHP and SQL Social Operations

I've got a table created using this query.
CREATE TABLE `tbl_friends` (
`friend_id` int(11) NOT NULL auto_increment,
`friend_status` tinyint(4) NOT NULL default '0',
`friend_id1` int(11) NOT NULL default '0',
`friend_id2` int(11) NOT NULL default '0',
`friend_date` datetime NOT NULL default '0000-00-00 00:00:00',
`friend_confirm` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY (`friend_id`)
) ENGINE=MyISAM AUTO_INCREMENT=10 ;
I want to view my list of friends, and for that I only need the other user's id, a numeric value stored in either the friend_id1 or the friend_id2 column. My problem is that I don't know whether friend_id1 or friend_id2 holds the value I need.
Please help me write a query that returns the other user's id when my id is $_SESSION['id'].
Would be like so:
SELECT * FROM tbl_friends WHERE (friend_id2 = %1$d OR friend_id1 = %1$d) AND friend_status != 0 ORDER BY friend_id
A more readable layout:
SELECT
*
FROM
tbl_friends
WHERE
(
friend_id2 = %1$d
OR
friend_id1 = %1$d
)
AND
friend_status != 0
ORDER BY
friend_id
DESC
Then just run it through sprintf() and you're ready (keep $sql in single quotes so PHP does not try to interpolate $d):
$sql = sprintf($sql,$_SESSION['id']);
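PHP's sprintf() supports argument numbering: `%1$d` means "the first argument, formatted as an integer", so a single session id can fill both placeholders. A small sketch, with the id 42 made up to stand in for $_SESSION['id']:

```php
<?php
// Single quotes stop PHP from treating "$d" as a variable inside the string.
$sql = 'SELECT * FROM tbl_friends'
    . ' WHERE (friend_id1 = %1$d OR friend_id2 = %1$d) AND friend_status != 0'
    . ' ORDER BY friend_id DESC';

$myId = 42; // stands in for $_SESSION['id']
// %1$d is filled twice from the single argument and formatted as an integer.
echo sprintf($sql, $myId), PHP_EOL;
```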
I'm not sure exactly what you mean, but I think you're looking for something like:
select distinct * from tbl_friends where friend_id1 = $_SESSION["id"] OR friend_id2 = $_SESSION["id"];
If this is not what you mean, please add some additional information.
Something like:
select distinct id from (
select friend_id1 as id from tbl_friends where friend_id2 = :my_id
union
select friend_id2 as id from tbl_friends where friend_id1 = :my_id
) as f
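The UNION above simply picks whichever column is not yours. The same choice can be sketched in PHP over rows you have already fetched (the function name and sample rows are hypothetical); in SQL, a CASE expression such as `CASE WHEN friend_id1 = :my_id THEN friend_id2 ELSE friend_id1 END` expresses it in a single pass.

```php
<?php
// Hypothetical helper: for each friendship row involving $myId, return the other user's id.
function friend_ids(array $rows, int $myId): array
{
    $ids = [];
    foreach ($rows as $r) {
        if ((int) $r['friend_id1'] === $myId) {
            $ids[] = (int) $r['friend_id2'];
        } elseif ((int) $r['friend_id2'] === $myId) {
            $ids[] = (int) $r['friend_id1'];
        }
    }
    // DISTINCT, like the outer SELECT
    return array_values(array_unique($ids));
}

$rows = [
    ['friend_id1' => 1, 'friend_id2' => 2],
    ['friend_id1' => 3, 'friend_id2' => 1],
    ['friend_id1' => 4, 'friend_id2' => 5],
    ['friend_id1' => 1, 'friend_id2' => 2],
];
print_r(friend_ids($rows, 1));
```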
The real problem here is keeping inconsistent data out of the table. You are better off always inserting two records for each reciprocal relationship, one in each direction. Then you only need to say:
select * from tbl_friends where friend_id1 = :my_id and friend_status = :whatever
If you do it this way, the data about when the friendship happened (friend_date, friend_confirm) might need to move to another table.
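A sketch of the two-records-per-friendship idea, assuming you insert both directions whenever a friendship is created (the helper names are hypothetical). Once both rows exist, finding someone's friends needs only a filter on friend_id1.

```php
<?php
// Hypothetical helper: the two rows to insert for one friendship, one per direction.
function reciprocal_rows(int $userA, int $userB, int $status): array
{
    return [
        ['friend_id1' => $userA, 'friend_id2' => $userB, 'friend_status' => $status],
        ['friend_id1' => $userB, 'friend_id2' => $userA, 'friend_status' => $status],
    ];
}

// Lookup that only ever filters on friend_id1, like the single-column query above.
function friends_of(array $rows, int $myId): array
{
    $ids = [];
    foreach ($rows as $r) {
        if ($r['friend_id1'] === $myId) {
            $ids[] = $r['friend_id2'];
        }
    }
    return $ids;
}

$rows = array_merge(reciprocal_rows(1, 2, 1), reciprocal_rows(3, 1, 1));
$ids = friends_of($rows, 1); // [2, 3]
print_r($ids);
```

Because both directions are stored, no query ever has to guess which column holds the other user's id.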
