How to track a secondary index id - php

I have a table that will be shared by multiple users. The basic table structure will be:
unique_id | user_id | users_index_id | data_1 | data_2 etc etc
With the id fields being type INT and unique_id being a primary key with AUTO_INCREMENT.
The data will be something like:
unique_id | user_id | users_index_id
1 | 234 | 1
2 | 234 | 2
3 | 234 | 3
4 | 234 | 4
5 | 732 | 1
6 | 732 | 2
7 | 234 | 5
8 | 732 | 3
How do I keep track of 'users_index_id' so that it 'auto increments' specifically for a user_id ?
Any help would be greatly appreciated; I've searched for an answer but am not sure I'm using the correct terminology to find what I need.

The only way to do this consistently is with a "before insert" (and "before update") trigger, since MySQL has no built-in syntax for a per-group auto-increment. You could wrap all changes to the table in a stored procedure and put the logic there, or use very careful logic when doing an insert:
-- param_user_id stands for the id of the user being inserted;
-- `table` must be backticked because TABLE is a reserved word
INSERT INTO `table` (user_id, users_index_id)
SELECT param_user_id, COUNT(*) + 1
FROM `table`
WHERE user_id = param_user_id;
However, this won't keep the numbering consistent if you delete rows or update user_id values.
You might find it more convenient to calculate the users_index_id when you query rather than in the database. You can do this using either subqueries (which are probably ok with the right indexes on the table) or using variables (which might be faster but can't be put into a view).
If you have an index on table(user_id, unique_id), then the following query should work pretty well:
SELECT t.*,
       (SELECT COUNT(*)
        FROM `table` t2
        WHERE t2.user_id = t.user_id
          AND t2.unique_id <= t.unique_id) AS users_index_id
FROM `table` t;
You will need the index for non-abysmal performance.
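For reference, a minimal sketch of the trigger approach mentioned at the start of this answer. It assumes the table is literally named `table`, as in the question; MySQL allows a BEFORE INSERT trigger to read its own table, though this is still not concurrency-safe under heavy parallel inserts without locking:
CREATE TRIGGER table_before_insert
BEFORE INSERT ON `table`
FOR EACH ROW
    -- take the current per-user maximum and add one
    SET NEW.users_index_id = (
        SELECT IFNULL(MAX(users_index_id), 0) + 1
        FROM `table`
        WHERE user_id = NEW.user_id
    );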

You need to find the MAX(users_index_id) and increment it by one. To avoid having to manually lock the table to ensure a unique key you will want to perform the SELECT within your INSERT statement. However, MySQL does not allow you to reference the target table when performing an INSERT or UPDATE statement unless it's wrapped in a subquery:
INSERT INTO users (user_id, users_index_id)
VALUES (234, (SELECT IFNULL(id, 0) + 1
              FROM (SELECT MAX(users_index_id) id
                    FROM users
                    WHERE user_id = 234) dt))
Query without subselect (thanks Gordon Linoff):
INSERT INTO users (user_id, users_index_id)
SELECT 234, IFNULL((SELECT MAX(users_index_id)
                    FROM users
                    WHERE user_id = 234), 0) + 1;
http://sqlfiddle.com/#!2/eaea9a/1/0

Related

Find unmatched results obtained by two different queries on two different tables [duplicate]

I've got the following two tables (in MySQL):
Phone_book
+----+------+--------------+
| id | name | phone_number |
+----+------+--------------+
| 1 | John | 111111111111 |
+----+------+--------------+
| 2 | Jane | 222222222222 |
+----+------+--------------+
Call
+----+------+--------------+
| id | date | phone_number |
+----+------+--------------+
| 1 | 0945 | 111111111111 |
+----+------+--------------+
| 2 | 0950 | 222222222222 |
+----+------+--------------+
| 3 | 1045 | 333333333333 |
+----+------+--------------+
How do I find out which calls were made by people whose phone_number is not in the Phone_book? The desired output would be:
Call
+----+------+--------------+
| id | date | phone_number |
+----+------+--------------+
| 3 | 1045 | 333333333333 |
+----+------+--------------+
There are several different ways of doing this, with varying efficiency, depending on how good your query optimiser is and the relative size of your two tables:
This is the shortest statement, and may be quickest if your phone book is very short (beware that NOT IN returns no rows at all if the subquery can produce a NULL phone_number):
SELECT *
FROM Call
WHERE phone_number NOT IN (SELECT phone_number FROM Phone_book)
alternatively (thanks to Alterlife)
SELECT *
FROM Call
WHERE NOT EXISTS
(SELECT *
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number)
or (thanks to WOPR)
SELECT *
FROM Call
LEFT OUTER JOIN Phone_Book
ON (Call.phone_number = Phone_book.phone_number)
WHERE Phone_book.phone_number IS NULL
(ignoring that, as others have said, it's normally best to select just the columns you want, not '*')
SELECT Call.ID, Call.date, Call.phone_number
FROM Call
LEFT OUTER JOIN Phone_Book
ON (Call.phone_number=Phone_book.phone_number)
WHERE Phone_book.phone_number IS NULL
This removes the subquery, allowing the query optimiser to work its magic.
Also, avoid "SELECT *" because it can break your code if someone alters the underlying tables or views (and it's inefficient).
The code below would be a bit more efficient than the answers presented above when dealing with larger datasets.
SELECT *
FROM Call
WHERE NOT EXISTS (
SELECT 'x'
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number
);
SELECT DISTINCT Call.id
FROM Call
LEFT OUTER JOIN Phone_book USING (phone_number)
WHERE Phone_book.id IS NULL
This will return the ids of the calls whose phone_number is missing from your Phone_book table.
I think
SELECT Call.* FROM Call LEFT JOIN Phone_book ON
Call.phone_number = Phone_book.phone_number WHERE Phone_book.name IS NULL
SELECT t1.ColumnID,
       CASE
           WHEN NOT EXISTS (SELECT t3.FieldText
                            FROM Table2 t3
                            WHERE t3.ColumnID = t1.ColumnID)
           THEN t1.FieldText
           ELSE t2.FieldText
       END AS FieldText
FROM Table1 t1
LEFT JOIN Table2 t2 ON t2.ColumnID = t1.ColumnID
SELECT a.id, a.date, a.phone_number FROM Call a
WHERE a.phone_number NOT IN (SELECT b.phone_number FROM Phone_book b)
Alternatively,
SELECT phone_number FROM Call
EXCEPT
SELECT phone_number FROM Phone_book
(Note that MINUS is Oracle syntax; MySQL uses EXCEPT, available from 8.0.31. Comparing phone_number rather than id is what this question needs.)
Don't forget to check your indexes!
If your tables are quite large you'll need to make sure the phone book has an index on the phone_number field; without one, the database will most likely have to scan both tables.
SELECT *
FROM Call
WHERE NOT EXISTS
(SELECT *
FROM Phone_book
WHERE Phone_book.phone_number = Call.phone_number)
You should create indexes on both Phone_book and Call containing the phone_number. If performance is becoming an issue, try a lean index like the following, with only the phone number; the fewer fields the index contains, the better, since the database has to load it in its entirety. You'll need an index on both tables.
ALTER TABLE [dbo].Phone_Book ADD CONSTRAINT [IX_Unique_PhoneNumber] UNIQUE NONCLUSTERED
(
Phone_Number
)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ONLINE = ON) ON [PRIMARY]
GO
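The statement above is SQL Server syntax; a rough MySQL equivalent (the index name is illustrative) would be:
ALTER TABLE Phone_book ADD UNIQUE INDEX IX_Unique_PhoneNumber (phone_number);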
If you look at the query plan you can confirm your new index is actually being used. Note this is for SQL Server but should be similar for MySQL.
With the query I showed there's literally no other way for the database to produce a result other than scanning every record in both tables.
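In MySQL you can inspect the plan with EXPLAIN; the new index showing up in the key column of the Phone_book row confirms it is being used:
EXPLAIN
SELECT Call.id, Call.date, Call.phone_number
FROM Call
LEFT OUTER JOIN Phone_book ON Call.phone_number = Phone_book.phone_number
WHERE Phone_book.phone_number IS NULL;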

If I have a MySQL table with multiple column values the same, how do I delete all but two of the most recent entries?

I know this sounds like a duplicate of a few questions, and it may well be, but I've searched through and tried my own implementation of several possible solutions, and all of them seem to result in some form of infinite recursion that just chews 100% CPU and does nothing. That could be because I'm doing it wrong, or they aren't appropriate for me; I don't know.
I have a MySQL table structured as follows :
+--------+------+-----+-------+--------+--------+-------+--------+
| id     | fid  | bid | dec_a | varc_a | varc_b | dec_b | varc_c |
+--------+------+-----+-------+--------+--------+-------+--------+
| 106861 | 4192 | 22  | 1.40  | blah   | blahbr | 0.2   | blahca |
| 108620 | 4192 | 22  | 1.55  | blah   | blahbe | 0.2   | blahca |
| 108621 | 4192 | 22  | 1.55  | blah   | blahbq | 0.2   | blahca |
| 108622 | 4192 | 22  | 1.55  | blah   | blahbw | 0.2   | blahca |
| 108623 | 4192 | 22  | 1.55  | blah   | blahbe | 0.2   | blahca |
| 108624 | 4192 | 22  | 1.55  | blah   | blahbf | 0.2   | blahca |
| 106863 | 4192 | 33  | 1.40  | blah   | blahba | 0.2   | blahca |
+--------+------+-----+-------+--------+--------+-------+--------+
The "id" value is a BIGINT auto-incrementing value and the data is added in proper chronological order from the source, so I am viewing this as the timestamp.
To establish which data is duplicated I am using the "fid", "bid", "varc_a", "dec_b" and "varc_c" columns. From the example above you can see that there are six duplicates based on those columns: the first six rows. The seventh row shows variation in the "bid" column; obviously, any variation in any of those columns excludes the row as a duplicate.
I can easily visualise what I want to do: there are potentially millions of entries in the database; I want to keep the 2 most recent rows (based on the entry id) in each group where the "fid", "bid", "varc_a", "dec_b" and "varc_c" column values are the same, and then sweep away what's left.
For the life of me I can't figure out how to do that using just MySQL and, as I say, all of the questions and answers I've looked at don't seem to be doing what I want to do or I'm not understanding what's proposed.
I know I can do this with PHP+MySQL by trawling through the data and removing the duplicates, but considering I can do it quite easily in such a horribly inefficient way, I'm thinking I'm missing something obvious and should be able to do it with MySQL alone?
Note:
Mike's answer is excellent and did precisely what I needed, with a little tweaking given the context of my question. What I ended up using was this:
DROP TEMPORARY TABLE IF EXISTS keepers1, keepers2, keepers_all;
CREATE TEMPORARY TABLE keepers1 (KEY(id)) ENGINE=MEMORY AS
SELECT fid, bid, varc_a, dec_b, varc_c, MAX(id) AS id
FROM market_prices
GROUP BY fid, bid, varc_a, dec_b, varc_c;
CREATE TEMPORARY TABLE keepers2 AS
SELECT fid, bid, varc_a, dec_b, varc_c, MAX(id) AS id
FROM market_prices AS k
WHERE NOT EXISTS (SELECT 1 FROM keepers1 WHERE id = k.id)
GROUP BY fid, bid, varc_a, dec_b, varc_c;
CREATE TEMPORARY TABLE keepers_all (KEY(id)) ENGINE=MEMORY AS
SELECT id FROM keepers1
UNION ALL
SELECT id FROM keepers2;
DELETE k.* FROM market_prices AS k WHERE NOT EXISTS (SELECT 2 FROM keepers_all WHERE id = k.id);
When grouping, be sure to use just the columns that define a duplicate. (Note: the constant in the SELECT inside NOT EXISTS is arbitrary; SELECT 1 and SELECT 2 behave identically. What keeps two records per group is the pair of keeper tables, each contributing one MAX(id) per group.)
Time to raise a glass to the man of the hour!
This may be a solution for your problem.
However, since there is no date-time column, I am assuming that the id column is the primary key and is AUTO_INCREMENT, so the larger the number, the newer the record. (This should be true unless you had some old data dumped into the table.)
Make sure you back up your data before you delete, as this will cause permanent data loss. Even better, you can make a copy of the current table into a different table and work on the new table to make sure the logic below is correct. Then change the queries below to read from tbl_new instead of tbl.
You can duplicate your table via something like
CREATE TABLE tbl_new LIKE tbl;
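Note that LIKE copies only the table structure; to also copy the data into the new table, you would follow it with something like:
INSERT INTO tbl_new SELECT * FROM tbl;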
I have left comments for every query
DROP TEMPORARY TABLE IF EXISTS keepers1, keepers2, keepers_all;
-- get the #1 top records
CREATE TEMPORARY TABLE keepers1 (KEY(id)) ENGINE=MEMORY AS
SELECT fid, bid, dec_a, varc_a, varc_b, dec_b, varc_c, MAX(id) AS id
FROM tbl
GROUP BY fid, bid, dec_a, varc_a, varc_b, dec_b, varc_c;
-- get the #2 top records
CREATE TEMPORARY TABLE keepers2 AS
SELECT fid, bid, dec_a, varc_a, varc_b, dec_b, varc_c, MAX(id) AS id
FROM tbl AS k
WHERE NOT EXISTS (SELECT 1 FROM keepers1 WHERE id = k.id)
GROUP BY fid, bid, dec_a, varc_a, varc_b, dec_b, varc_c;
-- create a temp table where you have all the ids that you want to keep
CREATE TEMPORARY TABLE keepers_all (KEY(id)) ENGINE=MEMORY AS
SELECT id FROM keepers1
UNION ALL
SELECT id FROM keepers2;
-- delete all records that you don't want to keep
DELETE k.* FROM tbl AS k WHERE NOT EXISTS (SELECT 1 FROM keepers_all WHERE id = k.id);
If this is a one-time clean-up job then you should be able to execute the queries from the console, but if this is a recurring job then you should probably take this code and put it in a procedure.
Note: here I am using MEMORY temporary tables for better performance. You may run into an error that says "Table is full"; this happens when you have too many records, in which case you can increase the value of max_heap_table_size (and tmp_table_size) for the session,
something like
SET SESSION tmp_table_size = 1024 * 1024 * 1024 * 2; -- this will set it to 2G
SET SESSION max_heap_table_size = 1024 * 1024 * 1024 * 2; -- this will set it to 2G
These will show you your current values:
SHOW SESSION VARIABLES LIKE 'max_heap_table_size';
SHOW SESSION VARIABLES LIKE 'tmp_table_size';
You will need to write a stored procedure. You can create the stored procedure either via PHP, or MySQL directly:
Creating via PHP
$createProc = "DROP PROCEDURE IF EXISTS `remove_dups`;
CREATE DEFINER=`root`@`localhost` PROCEDURE `remove_dups`( IN id VARCHAR(255))
BEGIN
...my code...
END;";
$conn = new PDO("mysql:host=$host;dbname=$dbname", $username, $password);
//create the stored procedure
$stmt = $conn->prepare($createProc);
$stmt->execute();
Create via MySQL GUI
Simply put the create statement in the text box and run it (against the proper DB):
CREATE DEFINER=`root`@`localhost` PROCEDURE `remove_dups`( IN id VARCHAR(255))
BEGIN
...my code...
END;";
Then you can call this procedure either from PHP or MySQL.
In your stored proc, you'll want to declare some variables to store the values in, do a check to find rows with the same values (using a cursor), and then check each id against the previous row's. If all the values are the same, delete the one with the lower id.
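For reference, a minimal sketch of such a cursor-based procedure, keeping the two most recent rows per duplicate group. It assumes the tbl table and duplicate-defining columns from the question; the column types are guesses:
DELIMITER //
CREATE PROCEDURE remove_dups()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE v_id BIGINT;
    DECLARE v_fid, v_bid INT;
    DECLARE v_varc_a, v_varc_c VARCHAR(64);
    DECLARE v_dec_b DECIMAL(10,2);
    DECLARE p_fid, p_bid INT;
    DECLARE p_varc_a, p_varc_c VARCHAR(64);
    DECLARE p_dec_b DECIMAL(10,2);
    DECLARE seen INT DEFAULT 0;
    -- newest rows first within each group of duplicate-defining columns
    DECLARE cur CURSOR FOR
        SELECT id, fid, bid, varc_a, dec_b, varc_c
        FROM tbl
        ORDER BY fid, bid, varc_a, dec_b, varc_c, id DESC;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    -- collect the ids to delete first, then delete them in one statement
    CREATE TEMPORARY TABLE doomed (id BIGINT PRIMARY KEY);

    OPEN cur;
    read_loop: LOOP
        FETCH cur INTO v_id, v_fid, v_bid, v_varc_a, v_dec_b, v_varc_c;
        IF done THEN LEAVE read_loop; END IF;
        IF v_fid <=> p_fid AND v_bid <=> p_bid AND v_varc_a <=> p_varc_a
           AND v_dec_b <=> p_dec_b AND v_varc_c <=> p_varc_c THEN
            SET seen = seen + 1;
        ELSE
            SET seen = 1;
            SET p_fid = v_fid, p_bid = v_bid, p_varc_a = v_varc_a,
                p_dec_b = v_dec_b, p_varc_c = v_varc_c;
        END IF;
        IF seen > 2 THEN
            INSERT INTO doomed VALUES (v_id); -- older than the two most recent
        END IF;
    END LOOP;
    CLOSE cur;

    DELETE t FROM tbl t JOIN doomed d ON d.id = t.id;
    DROP TEMPORARY TABLE doomed;
END//
DELIMITER ;
Collecting the ids into a temporary table and deleting once at the end avoids deleting from the table while the cursor is still open on it.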

Safely auto increment MySQL field based on MAX() subquery upon insert

I have a table which contains a standard auto-incrementing ID, a type identifier, a number, and some other irrelevant fields. When I insert a new object into this table, the number should auto-increment based on the type identifier.
Here is an example of how the output should look:
+----+---------+--------+
| id | type_id | number |
+----+---------+--------+
|  1 |       1 |      1 |
|  2 |       1 |      2 |
|  3 |       2 |      1 |
|  4 |       1 |      3 |
|  5 |       3 |      1 |
|  6 |       3 |      2 |
|  7 |       1 |      4 |
|  8 |       2 |      2 |
+----+---------+--------+
As you can see, every time I insert a new object, the number increments according to the type_id (i.e. if I insert an object with type_id of 1 and there are 5 objects matching this type_id already, the number on the new object should be 6).
I'm trying to find a performant way of doing this with huge concurrency. For example, there might be 300 inserts within the same second for the same type_id and they need to be handled sequentially.
Methods I've tried already:
PHP
This was a bad idea but I've added it for completeness. A request was made to get the MAX() number for the item type and then add the number + 1 as part of an insert. This is quick but doesn't work concurrently as there could be 200 inserts between the request for MAX() and that particular insert leading to multiple objects with the same number and type_id.
Locking
Manually locking and unlocking the table before and after each insert in order to maintain the increment. This caused performance issues due to the number of concurrent inserts and because the table is constantly read from throughout the app.
Transaction with Subquery
This is how I'm currently doing it but it still causes massive performance issues:
START TRANSACTION;
INSERT INTO objects (type_id, number)
VALUES ($type_id, (SELECT COALESCE(MAX(number), 0) + 1
                   FROM objects
                   WHERE type_id = $type_id
                   FOR UPDATE));
COMMIT;
Another negative thing about this approach is that I need to do a follow-up query in order to get the number that was added (i.e. searching for an object with the $type_id ordered by number desc, so I can see the number that was created; this is done per $user_id so it works, but it adds an extra query which I'd like to avoid).
Triggers
I looked into using a trigger in order to dynamically add the number upon insert but this wasn't performant as I need to perform a query on the table I'm inserting into (which isn't allowed so has to be within a subquery causing performance issues).
Grouped Auto-Increment
I've had a look at grouped auto-increment (so that the number would auto-increment based on type_id) but then I lose my auto-increment ID.
Does anybody have any ideas on how I can make this performant at the level of concurrent inserts that I need? My table is currently InnoDB on MySQL 5.5
Appreciate any help!
Update: Just in case it is relevant, the objects table has several million objects in it. Some of the type_id can have around 500,000 objects assigned to them.
Use a transaction and SELECT ... FOR UPDATE. This will solve the concurrency conflicts.
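A sketch of that pattern against the asker's objects table (type_id 1 is illustrative); it also returns the assigned number without the follow-up query mentioned in the question:
START TRANSACTION;
-- lock the index entries for this type so concurrent inserts serialize per type_id
SELECT COALESCE(MAX(number), 0) + 1 INTO @next
FROM objects
WHERE type_id = 1
FOR UPDATE;
INSERT INTO objects (type_id, number) VALUES (1, @next);
COMMIT;
SELECT @next; -- the number that was just assigned
Under InnoDB the FOR UPDATE takes next-key/gap locks, which is what serializes concurrent inserts for the same type_id; an index on (type_id, number) keeps both the MAX() lookup and the lock footprint small.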
In your "Transaction with Subquery" approach, try creating an index on the type_id column.
I think an index on type_id will speed up your subquery.
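For example (a composite index also lets MAX(number) for one type be read straight off the index; the index name is illustrative):
ALTER TABLE objects ADD INDEX idx_type_number (type_id, number);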
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,type_id INT NOT NULL
);
INSERT INTO my_table VALUES
(1,1),(2,1),(3,2),(4,1),(5,3),(6,3),(7,1),(8,2);
SELECT x.*
     , COUNT(*) AS rank
FROM my_table x
JOIN my_table y
    ON y.type_id = x.type_id
   AND y.id <= x.id
GROUP BY id
ORDER BY type_id, rank;
+----+---------+------+
| id | type_id | rank |
+----+---------+------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 4 | 1 | 3 |
| 7 | 1 | 4 |
| 3 | 2 | 1 |
| 8 | 2 | 2 |
| 5 | 3 | 1 |
| 6 | 3 | 2 |
+----+---------+------+
or, if performance is an issue, just do the same thing with a couple of @variables.
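A sketch of the @variables version (it relies on user-variable evaluation order, which MySQL does not guarantee; in MySQL 8+ ROW_NUMBER() is the supported replacement):
SELECT x.id, x.type_id,
       @rank := IF(@prev = x.type_id, @rank + 1, 1) AS rank,
       @prev := x.type_id AS prev_type
FROM (SELECT id, type_id FROM my_table ORDER BY type_id, id) x
CROSS JOIN (SELECT @rank := 0, @prev := NULL) vars;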
Perhaps an idea: create a (temporary) table for all rows with a common "type_id".
In that table you can use auto-increment for your num column,
so your num should be fully trustworthy.
Then you can select your data and update your first table.
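A rough sketch of that idea against the question's objects table, renumbering one type at a time (names are illustrative):
-- temp table whose AUTO_INCREMENT hands out gapless numbers
CREATE TEMPORARY TABLE renumber (
    num INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    obj_id INT NOT NULL
);
INSERT INTO renumber (obj_id)
SELECT id FROM objects WHERE type_id = 1 ORDER BY id;
-- copy the generated numbers back to the main table
UPDATE objects o
JOIN renumber r ON r.obj_id = o.id
SET o.number = r.num;
DROP TEMPORARY TABLE renumber;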

WHERE vs HAVING in generated queries

I know that this title is overused, but it seems that my kind of question is not answered yet.
So, the problem is like this:
I have a table structure made of four tables (tables, rows, cols, values) that I use to recreate the behavior of the information_schema (in a way).
In php I am generating queries to retrieve the data, and the result would still look like a normal table:
SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')
HAVING (col2 LIKE "%4%")
OR
SELECT * FROM
(SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')) d
WHERE col2 LIKE "%4%"
Note that the part where I define the columns of the result is generated by a PHP script. It is less important why I am doing this, but I want to extend this algorithm that generates the queries for broader use.
And here we get to the core problem: I have to decide whether to generate a WHERE or a HAVING part for the query. I know when to use each, but my algorithm doesn't, and I have to make a few extra checks for this. The two queries above are equivalent: I can always put any query in a subquery, give it an alias, and use WHERE on the new derived table. But I wonder if I will have problems with performance, or if this will turn back on me in some unexpected way.
I know how they both work, and how WHERE is supposed to be faster, but this is why I came here to ask. Hopefully I made myself understood; please excuse my English and the long, useless turns of phrase.
EDIT 1
I already know the difference between the two and all that implies. My only dilemma is that using custom columns from other tables, with variable numbers and sizes, while trying to achieve the same result as a normally created table, means I must use HAVING to filter on the derived table's columns, while also having the option to wrap it all in a subquery and use WHERE normally; this will probably create a temporary table that is filtered afterwards. Will this affect performance for a large database? Unfortunately I cannot test this right now, as I cannot afford to fill the database with over 1 billion entries (that would be something like this: 1 billion in the rows table, 5 billion in the values table since every row has 5 columns, 5 rows in the cols table and 1 row in the tables table = 6,000,000,006 entries in total).
Right now my database looks like this:
The `tables` table:
+----+--------+-----------+------+
| id | name | title | dets |
+----+--------+-----------+------+
| 1 | table1 | Table One | |
+----+--------+-----------+------+
The `cols` table:
+----+-------+------+
| id | table | name |
+----+-------+------+
| 3 | 1 | col1 |
| 4 | 1 | col2 |
+----+-------+------+
where `table` is a foreign key from table `tables`
The `rows` table:
+----+-------+-------+
| id | table | extra |
+----+-------+-------+
| 1 | 1 | |
| 2 | 1 | |
+----+-------+-------+
where `table` is a foreign key from table `tables`
The `values` table:
+----+-----+-----+----------+
| id | row | col | value |
+----+-----+-----+----------+
| 1 | 1 | 3 | 13 |
| 2 | 1 | 4 | 14 |
| 6 | 2 | 4 | 24 |
| 9 | 2 | 3 | asdfghjk |
+----+-----+-----+----------+
where `row` is a foreign key from table `rows`
where `col` is a foreign key from table `cols`
EDIT 2
The conditions are there just for demonstration purposes!
EDIT 3
For only two rows, it seems there is a difference between the two: the one using HAVING takes 0.0008s and the one using WHERE takes 0.0014-0.0019s. I wonder if this will affect performance for large numbers of rows and columns.
EDIT 4
The result of the two queries is identical, and that is:
+----------+------+
| col1 | col2 |
+----------+------+
| 13 | 14 |
| asdfghjk | 24 |
+----------+------+
In standard SQL, HAVING is specifically for GROUP BY, while WHERE provides row-level conditions. MySQL additionally allows HAVING to reference select-list aliases even without GROUP BY, which is what the first query above relies on. See also WHERE vs HAVING
I believe the having clause would be faster in this case, as you're defining specific values, as opposed to reading through the values and looking for a match.
See: http://database-programmer.blogspot.com/2008/04/group-by-having-sum-avg-and-count.html
Basically, WHERE filters out rows before they are passed to an aggregate function, while HAVING filters the aggregate function's results.
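A small illustration of the difference, using a hypothetical orders table (not from the question; it just shows where each filter applies):
SELECT user_id, COUNT(*) AS cnt
FROM orders
WHERE status = 'paid'    -- filters individual rows before grouping
GROUP BY user_id
HAVING COUNT(*) > 5;     -- filters whole groups after aggregation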
You could do it like this:
WHERE col2 IN (14, 24)
Your code WHERE col2 LIKE "%4%" is a bad idea: what about col2 = 34? It would also be selected.

delete duplicate rows that have blob text / mediumtext mysql

I have seen lots of posts on deleting rows using SQL commands, but I need to filter out duplicate rows based on a mediumtext column.
I keep getting Error Code: 1170. BLOB/TEXT column used in key specification without a key length from solutions such as:
ALTER IGNORE TABLE foobar ADD UNIQUE (title, SID)
My table is simple: I need to check for duplicates in mytext; id is unique and AUTO_INCREMENT.
As a note, the table has about a million rows, and all attempts keep timing out. I would need a solution that performs actions in batches, such as WHERE id>0 AND id<100.
Also, I am using MySQL Workbench on Amazon's RDS.
From a table like this
+----+-------+-------+--------+---------+
| id | fname | lname | mytext | morevar |
+----+-------+-------+--------+---------+
|  1 | joe   | min   | abc    |     123 |
|  2 | joe   | min   | abc    |     123 |
|  3 | mar   | kam   | def    |     789 |
|  4 | kel   | smi   | ghi    |     456 |
+----+-------+-------+--------+---------+
I would like to end up with a table like this
+----+-------+-------+--------+---------+
| id | fname | lname | mytext | morevar |
+----+-------+-------+--------+---------+
|  1 | joe   | min   | abc    |     123 |
|  3 | mar   | kam   | def    |     789 |
|  4 | kel   | smi   | ghi    |     456 |
+----+-------+-------+--------+---------+
Update: forgot to mention this is on Amazon RDS using MySQL Workbench.
My table is very large and I keep getting Error Code: 1205. Lock wait timeout exceeded from this SQL command:
DELETE n1 FROM names n1, names n2 WHERE n1.id > n2.id AND n1.name = n2.name
Also, if anyone else is having issues with MySQL Workbench timing out, the fix is:
Go to Preferences -> SQL Editor and set this parameter to a bigger value:
DBMS connection read time out (in seconds)
OPTION #1: Delete all duplicate records, leaving one of each (e.g. the one with MAX(id))
DELETE
FROM yourTable
WHERE id NOT IN
(
SELECT MAX(id)
FROM yourTable
GROUP BY mytext
)
You could use MIN(id) instead.
Depending on the engine used, this may not work, instead giving you Error Code: 1093. You can't specify target table 'yourTable' for update in FROM clause. Why? Because deleting one record may change the result of the WHERE condition, i.e. MAX(id) changes value as rows are deleted.
In this case, you could try using another subquery as a temporary table:
DELETE
FROM yourTable
WHERE id NOT IN
(
SELECT MAXID FROM
(
SELECT MAX(id) as MAXID
FROM yourTable
GROUP BY mytext
) as temp_table
)
OPTION #2: Use a temporary table, for example:
First, create a temp table with the max ids:
CREATE TEMPORARY TABLE tmpTable AS
SELECT MAX(id) AS MAXID
FROM yourTable
GROUP BY mytext;
Then execute the delete:
DELETE
FROM yourTable
WHERE id NOT IN
(
SELECT MAXID FROM tmpTable
);
How about this: it will delete all the duplicate records from the table, keeping the one with the lowest id in each group:
DELETE t1 FROM foobar t1, foobar t2
WHERE t1.id > t2.id AND t1.mytext = t2.mytext
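To work in batches as the question asks, and to reduce the lock wait timeouts mentioned above, you can restrict each run to a window of ids and advance the window between runs; a sketch:
DELETE n1 FROM foobar n1
JOIN foobar n2
    ON n1.mytext = n2.mytext
   AND n1.id > n2.id
WHERE n1.id > 0 AND n1.id < 100;  -- then 100-200, and so on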
