Mark duplicates in MySql with php (without deleting)

So, I'm having some problems with a MySQL query (see other question), and decided to try a different approach.
I have a database table with some duplicate rows, which I might need for future reference, so I don't want to remove them. What I'm looking for is a way to display the data without those duplicates, but without deleting them. I can't use a simple SELECT query (as described in the other question).
So what I need to do is write code that does the following:
1. Go through my db Table.
2. Spot duplicates in the "ip" column.
3. Mark the first instance of each duplicate with "0" (in a column named "duplicate") and the rest with "1".
This way I can later SELECT only the rows WHERE duplicate=0.
NOTE: If your solution is related to the SELECT query, please read this other question first - there's a reason I'm not just using GROUP BY / DISTINCT.
Thanks in advance.

MySQL versions before 8.0 don't have any ranking/analytical/windowing functionality, but you can use a user variable instead:
SELECT t.ip,
       CASE
         -- @ip starts as NULL, so the first row always falls through to ELSE
         WHEN @ip = t.ip THEN @rank := @rank + 1
         ELSE @rank := 0
       END AS duplicate,
       @ip := t.ip
FROM your_table t
JOIN (SELECT @rank := 0, @ip := NULL) r
ORDER BY t.ip
The first occurrence of each ip value will have the value zero in the duplicate column; each subsequent record will have a value incremented by one. If you don't want the incrementing number, use:
SELECT t.ip,
       CASE
         -- @ip starts as NULL, so the first row always falls through to ELSE
         WHEN @ip = t.ip THEN 1
         ELSE 0
       END AS duplicate,
       @ip := t.ip
FROM your_table t
JOIN (SELECT @ip := NULL) r
ORDER BY t.ip
You can get a list of unique IP rows from that by using it in a subquery:
SELECT x.ip
FROM (paste either query in here) x
WHERE x.duplicate = 0
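For the stored flag the question asks for (a duplicate column holding 0 for the first row per ip and 1 for the rest), the following is a minimal sketch rather than a tested MySQL solution. It uses Python's built-in sqlite3 module as a stand-in database so it can run anywhere; the visits table and its id column are assumptions, and the row with the lowest id is treated as the "first" instance. MySQL rejects a subquery on the table being updated in this form, so there the same logic would have to go through a joined derived table.

```python
import sqlite3


def mark_duplicates(conn):
    """Set duplicate = 0 on the lowest-id row of each ip and 1 on the others."""
    conn.execute(
        """
        UPDATE visits
        SET duplicate = CASE
            WHEN id = (SELECT MIN(v2.id) FROM visits AS v2 WHERE v2.ip = visits.ip)
            THEN 0
            ELSE 1
        END
        """
    )
    conn.commit()


# Hypothetical table and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (id INTEGER PRIMARY KEY, ip TEXT, duplicate INTEGER)"
)
conn.executemany(
    "INSERT INTO visits (ip) VALUES (?)",
    [("10.0.0.1",), ("10.0.0.2",), ("10.0.0.1",), ("10.0.0.1",)],
)
mark_duplicates(conn)

flags = [row[0] for row in conn.execute("SELECT duplicate FROM visits ORDER BY id")]
unique_ips = [
    row[0]
    for row in conn.execute("SELECT ip FROM visits WHERE duplicate = 0 ORDER BY id")
]
print(flags, unique_ips)
```

After this runs, SELECT ... WHERE duplicate = 0 returns one row per ip while every original row stays in the table.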

Related

MySql function and session / user variables inside the Mysql function

I would like to create a MySQL function that returns an incremental row count as long as the given id stays the same; if the id changes, the function should reset the count and start again from 1.
Below is the result I am looking for: as long as the itemId (left column) remains the same, the Count in the right column increments, and when itemId changes the Count restarts from 1.
In my mind, a MySQL function like the one below would do the incremental counting and resetting, but unfortunately it returns 1 for each row. My thought was to pass the current itemId to the function, which would compare it to the one saved in the @saved_id session variable from the previous row; as long as the ids are the same the function would return the incremented row count stored in @n, otherwise it would reset to 1.
Can anybody guide me as to why this function is not working? Or is there a better way to achieve the result I am looking for?
CREATE FUNCTION `nth`(id int) RETURNS tinyint(4)
BEGIN
  declare ln tinyint;
  if @saved_id = id then
    set @n := @n+1;
    set ln = @n;
  else
    set @saved_id := id;
    set @n := 1;
    set ln = @n;
  end if;
  RETURN ln;
END
The MySQL version I am using is 5.7.
Here is the example query I am using; itemId is a foreign key:
select id, itemId, started_at 'Start', stopped_at Stop, nth(started_at) 'Count'
from events
order by itemId, stopped_at

You don't need to define a stored function for this. You can achieve this within the SELECT query itself. In newer versions of MySQL (8.0.2 and above), it is achievable using ROW_NUMBER() OVER (PARTITION BY itemId ORDER BY id).
In older versions, we can use user-defined variables. In a derived table (a subquery inside the FROM clause), we order our data so that all the rows having the same itemId value come together, with further sorting among them based on id.
Now, we take this result set and use a conditional CASE..WHEN expression to evaluate the numbering ("Count"). It works like a looping technique (the kind we use in application code, e.g. PHP): we store the previous row's values in user-defined variables, then check the current row's value(s) against the previous row's and assign the row number ("Count") accordingly.
SELECT
  dt.id,
  dt.Start,
  dt.Stop,
  @rn := CASE WHEN dt.itemId = @itm THEN @rn + 1
              ELSE 1
         END AS Count,
  @itm := dt.itemId AS itemId
FROM
(
  SELECT
    id,
    itemId,
    started_at AS Start,
    stopped_at AS Stop
  FROM events
  ORDER BY itemID, id
) AS dt
CROSS JOIN (SELECT @itm := 0, @rn := 0) AS user_init_vars
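The "looping technique" that the user variables imitate can be sketched in application code. The example below is illustrative only and is written in Python rather than PHP; the function name and the sample rows are made up, and the rows are assumed to arrive already sorted by itemId and then id, exactly as the derived table delivers them.

```python
from typing import Iterable, Iterator, Optional, Tuple


def number_within_groups(
    rows: Iterable[Tuple[int, int]]
) -> Iterator[Tuple[int, int, int]]:
    """Yield (id, item_id, count) for rows pre-sorted by item_id, then id."""
    previous_item: Optional[int] = None  # plays the role of @itm
    count = 0                            # plays the role of @rn
    for row_id, item_id in rows:
        # Same item as the previous row: keep counting. New item: restart at 1.
        count = count + 1 if item_id == previous_item else 1
        previous_item = item_id
        yield row_id, item_id, count


# Hypothetical (id, itemId) pairs, sorted by itemId and then id.
events = [(1, 7), (2, 7), (5, 7), (3, 9), (4, 9)]
numbered = list(number_within_groups(events))
print(numbered)
```

The two local variables carry state from one row to the next in the same way @itm and @rn do, which is also why the sort order of the input matters.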

Update Current Row in MySQL Loop

I have a MySQL table with over 16 million rows and there is no primary key. Whenever I try to add one, my connection crashes. I have tried adding one as an auto increment in PHPMyAdmin and in shell but the connection is always lost after about 10 minutes.
What I would like to do is loop through the table's rows in PHP, limiting the number of rows fetched each time, and give each returned row an auto-incremented ID number. Because each query would touch fewer rows and put less load on MySQL, I shouldn't lose my connection.
I want to do something like
SELECT * FROM MYTABLE LIMIT 1000001, 2000000;
Then, in the loop, update the current row
UPDATE (current row) SET ID='$i++'
How do I do this?
Note: the original data was given to me as a txt file. I don't know if there are duplicates but I cannot eliminate any rows. Also, no rows will be added. This table is going to be used only for querying purposes. When I have added indexes, however, there were no problems.

I suspect you are trying to use phpMyAdmin to add the index. As handy as it is, it is a PHP script and is limited to the same resources as any PHP script on your server: typically 30-60 seconds of run time and a limited amount of RAM.
I suggest you get the MySQL query you need to add the index, then use SSH to shell in and run it with the command-line MySQL client.

If you don't have duplicate rows, then the following approach might shed some light.
Suppose you want to assign the auto-incremented value to the first 10000 rows:
UPDATE
  MYTABLE
INNER JOIN
  (SELECT
     *,
     @rn := @rn + 1 AS row_number
   FROM MYTABLE, (SELECT @rn := 0) var
   ORDER BY SOME_OF_YOUR_FIELD
   LIMIT 0,10000 ) t
ON t.field1 = MYTABLE.field1 AND t.field2 = MYTABLE.field2 AND .... t.fieldN = MYTABLE.fieldN
SET MYTABLE.ID = t.row_number;
For the next 10000 rows you just need to change two things:
(SELECT @rn := 10000) var
LIMIT 10000,10000
Repeat for each following batch.
Note: ORDER BY SOME_OF_YOUR_FIELD is important; without it the order of the rows is not guaranteed to be the same from one batch to the next. Since you need to repeat the process, it is better to create a function that takes the limit and offset as parameters and does this job.
Explanation:
The idea is to create a derived table (t) of N rows and assign a unique row number to each row. Then make an inner join between your main table MYTABLE and this derived table t, matching on all the fields, and update the ID field of the corresponding row (in MYTABLE) with the incremented value (in this case row_number).
Another idea:
You may use multithreading in PHP to do this job.
Create N threads.
Assign each thread a non-overlapping region (1 to 10000, 10001 to 20000, etc.) like the above query.
Caution: The query gets slower at higher offsets.
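The per-batch values that change between runs (the LIMIT offset, the LIMIT row count, and the starting value of @rn) can be generated instead of edited by hand. This is only a sketch of that bookkeeping in Python; the function name is hypothetical and it does not talk to MySQL.

```python
from typing import List, Tuple


def plan_batches(total_rows: int, batch_size: int) -> List[Tuple[int, int, int]]:
    """Return (offset, limit, rn_start) tuples covering total_rows once each.

    offset and limit go into "LIMIT offset,limit"; rn_start is the value to
    initialise @rn with, so the first ID assigned in a batch is rn_start + 1.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = []
    for offset in range(0, total_rows, batch_size):
        limit = min(batch_size, total_rows - offset)  # last batch may be short
        batches.append((offset, limit, offset))
    return batches


batches = plan_batches(25000, 10000)
for offset, limit, rn_start in batches:
    print(f"(SELECT @rn := {rn_start}) var ... LIMIT {offset},{limit}")
```

Each tuple describes one non-overlapping region, so the same list could also be split among several workers.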

Insert Records All At Once

I have a table that has been in use, and I added a column to it. After adding the column, I want to write the result of a query (the query is the same for all rows, but the results differ) into that column all at once instead of one row at a time, which would be time consuming. How can I achieve that? After running the update below, I get the same single result in every row of the column, and I cannot use a WHERE clause because that would require doing it one row at a time.
$stmt = $pdo->prepare("UPDATE table SET my_value = '$myValue' ");
$stmt->execute();

UPDATE table
SET my_value = (select col from some_table where ...)

If the value is the same for all rows, I would advise using cross join:
update table t cross join
(select newval . . .) x
set t.col = x.newval;
Note: this is better than a subquery in the SET clause, because the derived table is guaranteed to be evaluated only once.
If you are trying to say that the value is the same for groups of rows, then extend this to a join:
update table t join
(select grp, newval . . .) x
on t.grp = x.grp
set t.col = x.newval;

"After adding the column I want to add the result of a query (query result is same for all) into that column all at once instead of one at a time which will be time consuming."
The solution depends on what you mean by "is the same for all the rows".
If you have one value that is exactly the same for all rows, you can just fetch it first and then run the update. This is usually faster (and easier to debug) than doing everything in pure SQL.
If, on the other hand, you mean the values of that column are retrieved by the same query, but will be different for different rows, then a subquery or a join as Gordon suggested will do the trick.

Seek a specific record in MySQL paged results

I've a classic pagination system using LIMIT startrecord, endrecord and I want to figure out in what page number an X record is located.
The only idea I've right now is to seek recursively all the records to find it out. But I'm looking for a much more "economic" method!
Any ideas ?

You could use a sub query to create a table with the results and their position, then query that for the specific entry you are looking at:
SET @rank=0;
SELECT rank, record
FROM (
    SELECT
        @rank:=@rank+1 AS rank,
        record
    FROM table
) AS subquery
WHERE record = x;
The result shows the record and the rank at which it appeared in the original query. You can then divide the rank by the number of results per page and round up to get the page number, or build that into the query. Hope this helps.

Count the number of records that come before the one you are looking for. This requires you to assume an order for your query, which is natural for paginated results.
SELECT COUNT(id) AS c
FROM tbl
WHERE sort_field < ((SELECT sort_field FROM tbl WHERE id = 18))
OR (sort_field = ((SELECT sort_field FROM tbl WHERE id = 18)) AND id < 18);
Then just retrieve c and calculate FLOOR(c / page_size) + 1. This gives you the page number that your record falls on (c is the number of records before it, so the record itself is at position c + 1). The only important thing to remember is that you need to sort the records in the same order as you would in your query with LIMIT.
To describe what the query does: it counts the number of records that stand before the record with id 18. The only tricky part is records that have the same sort_field value, where the id is used as a tie-breaker (so the paginated query should sort by sort_field, then id); that's why we have the OR part in our condition. In my answer I'm assuming you are sorting your original query (the one with the LIMIT clause) ascending; if you are sorting descending, you need to change every < to >.
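To make the arithmetic concrete: with c rows ahead of it, a record sits at zero-based index c, so it lands on page c // page_size + 1. The sketch below runs the counting query against an in-memory SQLite database through Python's sqlite3 module, purely as a stand-in for MySQL; the table contents and the page_of helper are assumptions for illustration.

```python
import sqlite3


def page_of(conn, record_id, page_size):
    """1-based page of record_id when rows are sorted by (sort_field, id) ascending."""
    (before,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM tbl
        WHERE sort_field < (SELECT sort_field FROM tbl WHERE id = :id)
           OR (sort_field = (SELECT sort_field FROM tbl WHERE id = :id)
               AND id < :id)
        """,
        {"id": record_id},
    ).fetchone()
    # "before" rows precede the record, so integer division gives the
    # zero-based page index; add 1 for a 1-based page number.
    return before // page_size + 1


# Hypothetical data: 25 rows whose sort_field repeats, so ties occur.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, sort_field INTEGER)")
conn.executemany(
    "INSERT INTO tbl (id, sort_field) VALUES (?, ?)",
    [(i, i % 5) for i in range(1, 26)],
)

print(page_of(conn, 18, 10))
```

Note that a record with nothing before it (c = 0) must land on page 1, which is why the calculation adds 1 after the integer division instead of rounding c / page_size up.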

Use something like this with your query as part of the s subselect
SELECT s.row, s.RECORD, YOUR_OTHER_FIELDS...
FROM (SELECT @row := 0) cnt
JOIN (SELECT @row := @row + 1 row, RECORD, ...YOUR QUERY WITH ORDER BY ...) s
WHERE s.RECORD = <desired record number>
and divide row by the pagesize from your pagination.
Concrete but nonsensical example:
SELECT p.row, p.id
FROM (SELECT @row := 0) cnt
JOIN (SELECT @row := @row + 1 row, id FROM products ORDER BY id desc) p
WHERE p.id = 485166
As intended, the value of row changes with the order you use in the subselect.
It folds the variable initialization into the query so this is only one statement.
It also does not depend on a natural order or distribution of rows - as long as the order they ARE returned in stays the same for whatever ORDER you specify (or leave out).

If this is something that you will use often, I think it is a good idea to create a stored procedure or a function. We can use a cursor inside it to iterate through the results and get the position of the desired item. I think this will be faster: it doesn't have to iterate over all the records and doesn't need a subquery (which is why I would call it more economical), and you can use ORDER BY, JOIN, and whatever else you need.
DELIMITER $$
-- named record_position because POSITION is already a built-in MySQL function
CREATE FUNCTION record_position ( looking_for INT )
RETURNS INT
READS SQL DATA
BEGIN
  -- Declarations must come in this order: variables, cursors, handlers.
  -- The variable is not called id, because a local variable named id
  -- would shadow the id column in the cursor's SELECT.
  DECLARE current_id INT;
  DECLARE pos INT DEFAULT 0;
  -- flag which will be set to true when the cursor reaches the end of the table
  DECLARE exit_loop BOOLEAN DEFAULT FALSE;
  -- Declare the sql for the cursor
  DECLARE pos_cursor CURSOR FOR
    SELECT id
    FROM your_table
    -- you can use where, join, group by, order and whatever you need
    -- end of query
    ;
  -- Let mysql set exit_loop to true if there are no more rows to iterate
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET exit_loop = TRUE;
  -- open the cursor
  OPEN pos_cursor;
  -- marks the beginning of the loop
  pos_loop: LOOP
    -- read the id from the next row into the variable current_id
    FETCH pos_cursor INTO current_id;
    -- check if the exit_loop flag has been set by mysql;
    -- if so, the item was not found: return 0 and exit the loop
    IF exit_loop THEN
      SET pos = 0;
      LEAVE pos_loop;
    END IF;
    -- increment the pos var
    SET pos = pos + 1;
    -- check if we found the desired item; if so, exit the loop
    IF current_id = looking_for THEN
      LEAVE pos_loop;
    END IF;
  END LOOP pos_loop;
  CLOSE pos_cursor;
  RETURN pos;
END $$
DELIMITER ;
You create the function just once; to use it, you just need this SQL:
SELECT record_position(ID_OF_THE_ITEM_YOU_ARE_LOOKING_FOR);
and it returns the position of the item at position [0][0] of the returned rowset.
Of course, instead of the id you can create a function that compares the name, or any other field, or even more than one.
If the query is always different, then you cannot use a function, but you can still use the cursor (the syntax will be the same). You can build the cursor in your PHP, let pos be a user variable (using @pos), and in each case just add the specific SQL of the query (the part between DECLARE pos_cursor CURSOR FOR and -- end of query).

You can't really create an "economic" way. You have to get the full list of records from the DB since there is no way to know the position of a record from MySQL.
Depending on your sorting, the frequency at which the data changes, you could assign the record its position in a column: add column position to the table you are querying. That might not be feasible in all cases.

Deleting rows not returning to original numbers

Just working with a database and some tests were done recently which checked the integrity of the setup.
As a result, a lot of test entries were added which were then deleted. However, when new entries are added, the ID number value continues from after the entries added.
What I want:
ID increases by one from where it left off before the additional rows were added:
4203, 4204, 4205, 4206 etc.
What is happening:
ID increases by one from after the additional rows ID:
4203, 4204, 6207, 6208 6209 etc.
Not sure where to fix this...whether in phpmyadmin or in the PHP code. Any help would be appreciated. Thanks!

I have run into this before and I solve it easily with phpMyAdmin. Select the database, select the table, open the Operations tab, and in the Table Options set AUTO_INCREMENT to 1, then click Go. This forces MySQL to look for the last auto-incremented value and set the counter to the value directly after it. I do this manually, so I know that when an ID is skipped it was caused by a deletion and not by testing, because whenever I test and delete the rows I fix the AUTO_INCREMENT value.

I don't think there's a way to do this with an auto-incrementing ID key.
You could probably do it by assigning the ID to (select max(id) + 1 from the_table)

You could drop the primary key then recreate it, but this would reassign all the existing primary keys so could cause issues with relationships (although if you don't have any gaps in your primary key you may get away with it).
I would however say that you should accept (and your app should reflect) the possibility of missing IDs. For example in a web app if someone links to a missing ID you would want a 404 returned not a different record.

There should be no need to "reset" the id values; I concur with the other comments concerning this issue.
The behavior you observe with AUTO_INCREMENT is by design; it is described in the MySQL documentation.
With all that said, I will describe an approach you can use to change the id values of those rows "downwards", and make them all contiguous:
As a "stepping stone" first step, we will create a query that gets a list of the id values that we need changed, along with a proposed new id value we are going to change it to. This query makes use of a MySQL user variable.
Assuming that 4203 is the id value you want to leave as is, and you want the next higher id value to be reset to 4204, the next higher id to be reset to 4205, etc.
SELECT s.id
     , @i := @i + 1 AS new_id
  FROM mytable s
  JOIN (SELECT @i := 4203) i
 WHERE s.id > 4203
 ORDER BY s.id
(Note: the constant value 4203 appears twice in the query above.)
Once we're satisfied that this query is working, and returning the old and new id values, we can use this query as an inline view (MySQL calls it a derived table) in a multi-table UPDATE statement. We just wrap that query in a set of parentheses and assign it an alias, so we can reference it like a regular table. (In an inline view, MySQL actually materializes the resultset returned by the query into a MyISAM table, which probably explains why MySQL refers to it as a "derived table".)
Here's an example UPDATE statement that references the derived table:
UPDATE ( SELECT s.id
              , @i := @i + 1 AS new_id
           FROM mytable s
           JOIN (SELECT @i := 4203) i
          WHERE s.id > 4203
          ORDER BY s.id
       ) n
  JOIN mytable t
    ON t.id = n.id
   SET t.id = n.new_id
-- no trailing ORDER BY here: MySQL does not allow ORDER BY or LIMIT in a multi-table UPDATE
Note that the old id value from the inline view is matched to the id value in the existing table (the ON clause), and the "new_id" value generated by the inline view is assigned to the id column (the SET clause.)
Once the id values are assigned, we can reset the AUTO_INCREMENT value on the table:
ALTER TABLE mytable AUTO_INCREMENT = 1;
NOTE: this is just an example, and is provided with the caveat that this should not be necessary to reassign id values. Ideally, primary key values should be IMMUTABLE i.e. they should not change once they have been assigned.
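The old-to-new mapping produced by the "stepping stone" query can be previewed outside the database before anything is updated. The following Python sketch mirrors that query's logic on a plain list of ids; the function name and sample ids are hypothetical, and 4203 is the last id to be left as is, as in the example above.

```python
from typing import Dict, Iterable


def plan_renumbering(ids: Iterable[int], keep_up_to: int) -> Dict[int, int]:
    """Map each id above keep_up_to to the next contiguous value."""
    mapping = {}
    next_id = keep_up_to  # plays the role of @i := 4203
    for old_id in sorted(i for i in ids if i > keep_up_to):
        next_id += 1      # plays the role of @i := @i + 1
        mapping[old_id] = next_id
    return mapping


# Hypothetical ids: a gap between 4204 and 6207 left by deleted test rows.
existing = [4202, 4203, 4204, 6207, 6208, 6209]
plan = plan_renumbering(existing, 4203)
print(plan)
```

Ids that are already contiguous map to themselves, so only the rows after the gap would actually change.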
