MySQL/PHP: Finding IDs that are NOT stored in Database [duplicate]

This question already has answers here:
MySQL get missing IDs from table
Closed 4 months ago.
I am using a MySQL database with a table containing random records. The only column that matters for my use case is a BIGINT column called "ID". It is also the primary key of that table, but it is not an AUTO_INCREMENT column, and since the data is fetched from external sources, the IDs are not contiguous.
Sample data from that table:
[ID]
201101
201504
201641
201755
...
I need an efficient way to find all the IDs within a specific range that are NOT yet stored in the database. For instance (pseudocode):
GetUnusedIDs(RangeStart = 201100; RangeEnd = 201600);
-> 201100
-> 201102
-> 201103
-> ...
-> 201503
-> 201505
-> ...
-> 201600
What I have done so far is fetch all values within that range into a PHP array, then loop from RangeStart to RangeEnd, check for each number whether it is contained in that array, and, if not, add it to a new array containing only the numbers that don't yet exist in the database.
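If the membership check in that loop uses in_array(), every check scans the whole array, so the total work grows with (range size) × (number of stored IDs). Here is a minimal Python sketch of the same client-side approach using a hash set, where each lookup is constant time on average. In PHP the analogous change would be array_diff(range($start, $end), $ids), or building a lookup with array_flip() and testing it with isset(). The function name get_unused_ids and the sample IDs are illustrative only.

```python
def get_unused_ids(stored_ids, range_start, range_end):
    """Return the IDs in [range_start, range_end] that are not in stored_ids.

    Converting the fetched IDs to a set makes each membership test O(1)
    on average, instead of scanning a list for every candidate number.
    """
    stored = set(stored_ids)
    return [n for n in range(range_start, range_end + 1) if n not in stored]


# IDs as they might come back from a query such as
# SELECT ID FROM your_table WHERE ID BETWEEN 201100 AND 201600
fetched = [201101, 201504]
unused = get_unused_ids(fetched, 201100, 201600)
print(unused[:4])   # [201100, 201102, 201103, 201104]
print(len(unused))  # 499
```

Because the range is only generated once and each check is a hash lookup, the cost is roughly proportional to the size of the range plus the number of fetched IDs.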
I think there must be a better (more efficient) way to do this.
Thank you in advance!

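You can do this within MySQL (8.0 or later, since it relies on CTEs and ROW_NUMBER()): generate a sequence of integers from a small seed table and check which of them don't exist in your main table. Filter the generated numbers by the minimum and maximum of your range to improve performance, and adjust the number of cross joins to fit the size of the range: the seed has 5 rows, so each additional cross join multiplies the number of generated rows by 5.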
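-- Range to search (example values from the question)
SET @min_id = 201100, @max_id = 201600;

with seed as (
select null as n
union all select null
union all select null
union all select null
union all select null
),
numbers as (
-- 5^7 = 78125 rows; add or remove cross joins to fit the size of your range
select row_number() OVER ( ORDER BY a.n ) + @min_id - 1 n
from seed a,
seed b,
seed c,
seed d,
seed e,
seed f,
seed g
)
-- replace your_table with the name of your table
select n from numbers
where n <= @max_id
and not exists ( select null from your_table t where t.ID = numbers.n )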
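To try the anti-join idea without a MySQL server, here is a sketch using Python's built-in sqlite3 module. It swaps the cross-joined seed table for a recursive CTE (WITH RECURSIVE, which MySQL also supports from 8.0) to generate the range. The table name ids and the function unused_ids_sql are hypothetical stand-ins.

```python
import sqlite3


def unused_ids_sql(conn, range_start, range_end):
    # Generate every integer in [range_start, range_end] with a recursive CTE,
    # then keep only those with no matching row in the ids table (anti-join).
    sql = """
        WITH RECURSIVE numbers(n) AS (
            SELECT ?
            UNION ALL
            SELECT n + 1 FROM numbers WHERE n < ?
        )
        SELECT n FROM numbers
        WHERE NOT EXISTS (SELECT 1 FROM ids WHERE ids.ID = numbers.n)
        ORDER BY n
    """
    return [row[0] for row in conn.execute(sql, (range_start, range_end))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ids (ID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO ids (ID) VALUES (?)",
                 [(201101,), (201504,), (201641,), (201755,)])

unused = unused_ids_sql(conn, 201100, 201600)
print(unused[:4])   # [201100, 201102, 201103, 201104]
print(len(unused))  # 499
```

Since ID is the primary key, each NOT EXISTS probe is an index lookup, so the work is driven by the size of the generated range rather than the size of the table.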

Related

Speed-up/Optimise MySQL statement - finding a new row that hasn't been selected before

First a bit of background about the tables & DB.
I have a MySQL db with a few tables:
films:
Contains all film/series info, with netflixid as a unique primary key.
users:
Contains user info; "ratingid" is a unique primary key.
rating:
Contains ALL user rating info, including netflixid, and a unique primary key made of the compound "netflixid-userid".
This statement works:
SELECT *
FROM films
WHERE
INSTR(countrylist, 'GB')
AND films.netflixid NOT IN (SELECT netflixid FROM rating WHERE rating.userid = 1)
LIMIT 1
but it takes longer and longer to retrieve a new film record that you haven't rated (currently 6.8 seconds for around 2,400 user ratings on an 8,000-row film table).
First I thought it was the INSTR(countrylist, 'GB'), so I split them out into their own tinyint columns - made no difference.
I have tried NOT EXISTS as well, but the times are similar.
Any thoughts/ideas on how to select a new "unrated" row from films quickly?
Thanks!
Try just joining?
SELECT *
FROM films
LEFT JOIN rating on rating.ratingid=CONCAT(films.netflixid,'-',1)
WHERE
INSTR(countrylist, 'GB')
AND rating.ratingid IS NULL
LIMIT 1
Or doing the equivalent NOT EXISTS.
I would recommend not exists:
select *
from films f
where
instr(countrylist, 'GB')
and not exists (
select 1 from rating r where r.userid = 1 and f.netflixid = r.netflixid
)
This should take advantage of the primary key index of the rating table, so the subquery executes quickly.
That said, the instr() function in the outer query also represents a bottleneck. The database cannot take advantage of an index here because of the function call: it has to apply the computation to every row of the table before it can filter. To avoid this, you would probably need to review your design: that is, have a separate table to represent the relationship between movies and countries, with each tuple on a separate row; then you could use another exists subquery to filter on the country.
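A small runnable sketch of both suggestions, using Python's sqlite3 module so no server is needed: a NOT EXISTS anti-join against rating, plus a hypothetical film_country table in place of the comma-separated countrylist. The composite primary key (userid, netflixid) on rating is an assumption about your schema; what matters is that some index covers both columns, so each subquery probe is an index lookup. The ORDER BY is only there to make the LIMIT 1 result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE films (netflixid INTEGER PRIMARY KEY, title TEXT);
    -- One row per (film, country) instead of a comma-separated countrylist
    CREATE TABLE film_country (netflixid INTEGER, country TEXT,
                               PRIMARY KEY (netflixid, country));
    -- Composite key so the NOT EXISTS lookup can use an index
    CREATE TABLE rating (userid INTEGER, netflixid INTEGER, rating INTEGER,
                         PRIMARY KEY (userid, netflixid));
    INSERT INTO films VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO film_country VALUES
        (1, 'GB'), (2, 'GB'), (2, 'US'), (3, 'US'), (4, 'GB');
    INSERT INTO rating VALUES (1, 1, 5), (1, 2, 3), (2, 4, 4);
""")

# First film available in GB that user 1 has not rated yet
row = conn.execute("""
    SELECT f.netflixid, f.title
    FROM films f
    WHERE EXISTS (SELECT 1 FROM film_country c
                  WHERE c.netflixid = f.netflixid AND c.country = 'GB')
      AND NOT EXISTS (SELECT 1 FROM rating r
                      WHERE r.userid = ? AND r.netflixid = f.netflixid)
    ORDER BY f.netflixid
    LIMIT 1
""", (1,)).fetchone()
print(row)  # (4, 'D')
```

Film 3 is excluded because it is not available in GB, and films 1 and 2 because user 1 has already rated them.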
The INSTR(countrylist, 'GB') could be replaced with countrylist = 'GB', or with countrylist LIKE '%GB%' if countrylist contains more than one country.
Also, don't use SELECT * if you only need some of the columns. Depending on the number of columns, the query could be noticeably slower.

Select *, count(*) in one Query [duplicate]

This question already has an answer here:
SQL counting all rows instead of counting individual rows
Closed 5 years ago.
I've got a DB with a few columns and I'm trying to populate an HTML table with it.
Everything's going fine, but I've encountered the following problem:
Since I'm filling filtered results into different columns, I came up with an SQL query that needs both SELECT * and COUNT(*):
$query = "SELECT *, COUNT(example_A) AS total_example_A FROM test WHERE example_A = 'certain_result' AND date(start_date) = '$current_date_proof' ORDER BY start_date ASC";
It does work, but I'm only getting the first result. I guess I cannot combine Select with Count?
You can do it with a correlated subquery. COUNT is an aggregate function, so it combines all the matching rows into a single result row, which is why you only get the first result when you mix it with SELECT *:
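// The subquery repeats the outer filter, so it counts every matching row, not just the current one
$query = "
SELECT
t1.*,
( SELECT COUNT(t0.id) FROM test AS t0
WHERE t0.example_A = t1.example_A
AND date(t0.start_date) = date(t1.start_date) ) AS total_example_A
FROM
test AS t1
WHERE
t1.example_A = 'certain_result'
AND
date(t1.start_date) = '$current_date_proof'
ORDER BY t1.start_date ASC
";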
This assumes that your table test has a primary key named id. I would also count on the primary key, COUNT(t0.id), rather than on example_A.
In my world, a table has either an auto-increment INT as its primary key or a compound primary key consisting of two or more foreign keys, which are themselves auto-increment INT fields. It's vital (IMO) to always have a surrogate key in your table, that is, a key that has no direct relationship to the data itself. But that is just me...
You could just count the returned rows within your application, but barring that, the correlated subquery should give you the best performance, and certainly much better performance than a separate database call.
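A runnable sketch of the correlated-subquery approach, using Python's sqlite3 module and made-up rows in a test table. Here the subquery repeats the outer filter (same example_A, same calendar day), so every returned row carries the total number of matching rows; a subquery correlated only on id would count just the row itself. The date is passed as a bound parameter rather than interpolated into the SQL string, which is what you would also do with a prepared statement in PHP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (id INTEGER PRIMARY KEY, example_A TEXT, start_date TEXT);
    INSERT INTO test (example_A, start_date) VALUES
        ('certain_result', '2020-05-01 09:00:00'),
        ('certain_result', '2020-05-01 13:30:00'),
        ('other',          '2020-05-01 10:00:00'),
        ('certain_result', '2020-05-02 08:00:00');
""")

current_date_proof = "2020-05-01"
rows = conn.execute("""
    SELECT t1.id, t1.start_date,
           (SELECT COUNT(t0.id) FROM test AS t0
            WHERE t0.example_A = t1.example_A
              AND date(t0.start_date) = date(t1.start_date)) AS total_example_A
    FROM test AS t1
    WHERE t1.example_A = 'certain_result'
      AND date(t1.start_date) = ?
    ORDER BY t1.start_date ASC
""", (current_date_proof,)).fetchall()

for r in rows:
    print(r)
# (1, '2020-05-01 09:00:00', 2)
# (2, '2020-05-01 13:30:00', 2)
```

Both matching rows are returned, and each one carries the same total, which is what the HTML table needs.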

Given MySQL DB with duplicate rows based on criteria, list all the OLDER rows, for external processing (php)

Situation:
Old scripts added rows to a table without deleting existing rows.
Need to discover "duplicate" rows (based on matching two fields).
For each set of duplicate rows, sort by ids and return all but the newest one (highest id).
Each row has an associated external file, so can't simply delete the older rows - need to return a list of all the older rows, which will then be processed by a php script.
Example:
TABLE mytable:
ID A B Filename
1 10 abc aa.png
2 11 dddd bb.xml
3 10 abc cc.png
4 10 dddd dd.png
5 10 abc ee.xml
6 11 dddd ff.xml
Rows with IDs 1 & 3 & 5 are duplicates (both A and B match).
Similarly, 2 & 6 are duplicates. Return list (1, 2, 3) - these are the "older" rows that need to be processed.
Even better: return a set of records, containing 'ID' and 'Filename' for those rows.
My primary question is an SQL query that does this, though it would also be useful to me to see how to use the result of that query in php.
There are existing Stack Overflow posts related to deleting duplicate rows, but the ones I found delete the rows directly. This won't work for me, as I need the external PHP script to delete the corresponding external files:
Deleting Duplicate Rows from MySql Table
How to delete duplicate records in mysql database?
How to delete all the duplicate records in PHP/Mysql
IMPORTANT: The other posts which I quote don't bother to distinguish newer from older; they are about removing fully duplicate records, but that is not my situation. I have records which are partially duplicates; that is, several records match the specified criteria, but there is important information in other fields, hence I have to know which is newest (highest id) for each value of criteria; those are the ones to keep.
I would try the following (make sure you test the code before applying it to production data).
Assuming you have lots of data, I would create a temporary table of the data that you want to keep, so the operation can be performed quickly.
-- Generate a list of the IDs to keep (the newest row, i.e. the highest ID, per A/B pair)
CREATE TEMPORARY TABLE keepers (KEY(ID)) ENGINE = MEMORY
SELECT A, B, MAX(ID) AS ID
FROM mytable
GROUP BY A, B;
-- Delete the records that you do not wish to keep
DELETE FROM mytable
WHERE NOT EXISTS (SELECT 1 FROM keepers WHERE keepers.ID = mytable.ID);
If the DELETE query returns an error about the subquery, you can try the following instead of the DELETE query:
CREATE TEMPORARY TABLE deleteme (KEY(ID)) ENGINE = MEMORY
SELECT ID FROM mytable
WHERE NOT EXISTS (SELECT 1 FROM keepers WHERE keepers.ID = mytable.ID);
DELETE t.* FROM mytable AS t
INNER JOIN deleteme AS d ON d.ID = t.ID;
To get the data instead of deleting it:
Select the records you want to keep (inner query) and join the table back onto them (outer query), keeping all records. Rows where dummyfield is NULL are not the newest in their group, so they are the ones to be processed and deleted.
CREATE TEMPORARY TABLE delete_these AS
SELECT a.*
FROM mytable a
LEFT JOIN (SELECT MAX(ID) AS non_deletion_id, 1 AS dummyfield
FROM mytable
GROUP BY A, B) b ON b.non_deletion_id = a.ID
WHERE b.dummyfield IS NULL;
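To check the logic on the example data from the question, here is a sketch using Python's sqlite3 module. It lists the ID and Filename of every row that is not the newest (highest ID) in its (A, B) group, using the same LEFT JOIN against MAX(ID) per group, but it tests the joined key for NULL instead of a dummy field. The loop stands in for the external processing (for example, deleting each file before deleting its row).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (ID INTEGER PRIMARY KEY, A INTEGER, B TEXT, Filename TEXT);
    INSERT INTO mytable VALUES
        (1, 10, 'abc',  'aa.png'),
        (2, 11, 'dddd', 'bb.xml'),
        (3, 10, 'abc',  'cc.png'),
        (4, 10, 'dddd', 'dd.png'),
        (5, 10, 'abc',  'ee.xml'),
        (6, 11, 'dddd', 'ff.xml');
""")

# Every row whose ID is not the highest ID within its (A, B) group
older = conn.execute("""
    SELECT a.ID, a.Filename
    FROM mytable a
    LEFT JOIN (SELECT MAX(ID) AS non_deletion_id
               FROM mytable
               GROUP BY A, B) b ON b.non_deletion_id = a.ID
    WHERE b.non_deletion_id IS NULL
    ORDER BY a.ID
""").fetchall()

for row_id, filename in older:
    # Placeholder for the external processing, e.g. deleting the file
    print(row_id, filename)
# 1 aa.png
# 2 bb.xml
# 3 cc.png
```

Row 4 is kept because it is the only row in its (10, 'dddd') group, and rows 5 and 6 are kept because they are the newest in their groups.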

Repeated Insert copies on ID

We have records with a count field for each unique id.
The columns are:
mainId = unique
mainIdCount = 1320 (this 'views' field gets a + 1 when the page is visited)
How can we insert each of these mainIdCount values as that many separate records in another table IN ANOTHER DATABASE, in one query?
Yes, I do mean 1320 times an insert with the same mainId! :-)
We actually have records that go over 10,000 times an id. It just has to be like this.
This is a weird one, but we really do need copies of just these counts, like this.
The most straightforward way to do this is with a JOIN operation between your table and another row source that provides a set of integers. We'd match each row from our original table to as many rows from the set of integers as needed to satisfy the desired result.
As a brief example of the pattern:
INSERT INTO newtable (mainId,n)
SELECT t.mainId
, r.n
FROM mytable t
JOIN ( SELECT 1 AS n
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
) r
WHERE r.n <= t.mainIdCount
If mytable contains row mainId=5 mainIdCount=4, we'd get back rows (5,1),(5,2),(5,3),(5,4)
Obviously, the rowsource r needs to be of sufficient size. The inline view I've demonstrated here would return a maximum of five rows. For larger sets, it would be beneficial to use a table rather than an inline view.
This leads to the followup question, "How do I generate a set of integers in MySQL",
e.g. Generating a range of numbers in MySQL
And getting that done is a bit tedious. We're looking forward to an eventual feature in MySQL that will make it much easier to return a bounded set of integer values; until then, having a pre-populated table is the most efficient approach.
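As a sketch of that last point: since MySQL 8.0, a recursive CTE (WITH RECURSIVE) can serve as the integer row source, so no pre-populated table is strictly needed. The example below uses Python's sqlite3 module, which supports the same construct; the table names and sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (mainId INTEGER PRIMARY KEY, mainIdCount INTEGER);
    CREATE TABLE newtable (mainId INTEGER, n INTEGER);
    INSERT INTO mytable VALUES (5, 4), (7, 2);
""")

# r generates 1..MAX(mainIdCount); each source row joins to r.n <= its count,
# so a row with mainIdCount = 4 produces exactly four copies.
conn.execute("""
    WITH RECURSIVE r(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM r WHERE n < (SELECT MAX(mainIdCount) FROM mytable)
    )
    INSERT INTO newtable (mainId, n)
    SELECT t.mainId, r.n
    FROM mytable t
    JOIN r ON r.n <= t.mainIdCount
""")

print(conn.execute("SELECT * FROM newtable ORDER BY mainId, n").fetchall())
# [(5, 1), (5, 2), (5, 3), (5, 4), (7, 1), (7, 2)]
```

In MySQL, recursion depth is limited by the cte_max_recursion_depth system variable (1000 by default), so counts above 10,000 would require raising it; a pre-populated integers table avoids that limit.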

Mysql intersect two strings

I have the following tables:
TableFinal
column id, with first row having value 1
column numbers, with first row having value `1,5,6,33,2,12,3,4,9,13,26,41,59,61,10,7,28`
And
TablePick
column id, with first row having value 1
column selected, with first row having value 2,12,26,33
I want to check if the numbers from TablePick, column "selected" are contained in the column "numbers" of TableFinal.
I have to mention that in TablePick, the numbers in column "selected" are ordered ASC, while in TableFinal, the numbers in column "numbers" are shuffled.
Usually I would put each of these in an array using PHP, then intersect the two arrays and count the resulting array. But in MySQL it is not that simple, so I have practically no idea where to start.
Maybe I should create an ARRAY_INTERSECT function? Or do we have a simpler solution?
SELECT * FROM TablePick p RIGHT JOIN TableFinal f ON f.id=p.id WHERE ARRAY_INTERSECT(p.selected,f.numbers)
Sorry to say so, but your schema needs some serious maintenance: NEVER EVER store more than one piece of information in one field if you need to access the pieces separately.
You need a pair of join tables, where instead of the first row (1, "1,5,6,33,2,12,3,4,9,13,26,41,59,61,10,7,28") you have the rows
(1,1)
(1,5)
(1,6)
(1,33)
...
and instead of the row (1, "2,12,26,33") you have the rows
(1,2)
(1,12)
(1,26)
(1,33)
Now your query is simply:
SELECT ... FROM TableFinal
INNER JOIN TablePick ON TableFinal.number=TablePick.number
WHERE TableFinal.id=1
AND TablePick.id=1
EDIT
Please understand that even if this were possible without abusing MySQL, it would be a performance killer once the number of rows starts to rise: we are talking about n*m array intersections if the tables have n and m rows, respectively.
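A sketch of the normalized design, using Python's sqlite3 module: the comma-separated strings are split once into (id, number) rows (a one-time migration), and the "array intersect" becomes a plain join. The table names final_numbers and pick_numbers and the helper to_rows are hypothetical.

```python
import sqlite3


def to_rows(row_id, csv):
    # Split a comma-separated list like "2,12,26,33" into (id, number) tuples
    return [(row_id, int(x)) for x in csv.split(",") if x.strip()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_numbers (id INTEGER, number INTEGER)")
conn.execute("CREATE TABLE pick_numbers (id INTEGER, number INTEGER)")
conn.executemany("INSERT INTO final_numbers VALUES (?, ?)",
                 to_rows(1, "1,5,6,33,2,12,3,4,9,13,26,41,59,61,10,7,28"))
conn.executemany("INSERT INTO pick_numbers VALUES (?, ?)",
                 to_rows(1, "2,12,26,33"))

# Numbers of pick 1 that also appear in final 1 (the "array intersect")
matches = conn.execute("""
    SELECT p.number
    FROM pick_numbers p
    JOIN final_numbers f ON f.number = p.number
    WHERE p.id = 1 AND f.id = 1
    ORDER BY p.number
""").fetchall()
print([m[0] for m in matches])  # [2, 12, 26, 33]
```

Replacing the select list with COUNT(*) gives the size of the intersection directly, and the shuffled order of the original list no longer matters.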
