removing duplicate row from mysql where value equals something - php

I've been all the way to the end of the internet and I'm proper stuck. Whilst I can find partial answers, I'm unable to modify them to make them work.
I have a table named myfetcher like:
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| fid_id | int(11) | NO | PRI | NULL | auto_increment |
| linksetid | varchar(200) | NO | | NULL | |
| url | varchar(200) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
The url field sometimes contains dupes, but rather than remove all duplicates in the table, I need to remove only those where the field linksetid is equal to X.
The SQL below removes all duplicates in the table (which is not what I want). What I want is to remove only the duplicates within a given linksetid. I know I'm doing something wrong, I'm just not sure what it is.
DELETE FROM myfetcher USING myfetcher, myfetcher as vtable
WHERE (myfetcher.fid>vtable.fid)
AND (myfetcher.url=vtable.url)
AND (myfetcher.linksetid='$linkuniq')

This deletes only records with linksetid=X. The first EXISTS covers the case where the duplicates all have linksetid=X: only the one with min(fid) remains. The second EXISTS covers the case where a record with the same url exists with linksetid<>X: then all records with that url and linksetid=X are removed.
NOTE: this query works in Oracle or MSSQL. For MySQL, use the workaround given after it:
DELETE FROM myfetcher
where (myfetcher.linksetid='$linkuniq')
and
(
exists
(select t.fid from myfetcher t where
t.fid<myfetcher.fid
and
t.url=myfetcher.url
and
t.linksetid='$linkuniq')
or
exists
(select t.fid from myfetcher t where
t.url=myfetcher.url
and
t.linksetid<>'$linkuniq')
)
In MySQL you can't run an UPDATE or DELETE whose subquery selects from the target table itself. So for MySQL you can use the following script, which first collects the ids to delete in a scratch table:
create table to_delete_tmp as
select fid from myfetcher as tmain
where (tmain.linksetid='$linkuniq')
and
(
exists
(select t.fid from myfetcher t where
t.fid<tmain.fid
and
t.url=tmain.url
and
t.linksetid='$linkuniq')
or
exists
(select t.fid from myfetcher t where
t.url=tmain.url
and
t.linksetid<>'$linkuniq')
) ;
delete from myfetcher where myfetcher.fid in (select fid from to_delete_tmp);
drop table to_delete_tmp;
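To check the two EXISTS conditions on a small data set, here is a minimal sketch of the same scratch-table script. It makes several assumptions: Python's built-in sqlite3 module with an in-memory database stands in for MySQL, the sample rows are invented, the key column is called fid as in the queries above, and dedupe_linkset is a hypothetical helper name. The linksetid value is passed as a bound parameter instead of being spliced into the SQL string.

```python
import sqlite3

# Same selection logic as the script above, with :x bound instead of '$linkuniq'.
COLLECT_SQL = """
INSERT INTO to_delete_tmp
SELECT fid FROM myfetcher AS tmain
WHERE tmain.linksetid = :x
  AND (EXISTS (SELECT 1 FROM myfetcher AS t
               WHERE t.fid < tmain.fid AND t.url = tmain.url
                 AND t.linksetid = :x)
       OR EXISTS (SELECT 1 FROM myfetcher AS t
                  WHERE t.url = tmain.url AND t.linksetid <> :x))
"""


def dedupe_linkset(conn, linkset_id):
    """Delete duplicate urls, touching only rows in the given linkset."""
    cur = conn.cursor()
    # Step 1: collect the ids to delete in a scratch table.
    cur.execute("CREATE TEMP TABLE to_delete_tmp (fid INTEGER PRIMARY KEY)")
    cur.execute(COLLECT_SQL, {"x": linkset_id})
    # Step 2: delete by id, then discard the scratch table.
    cur.execute("DELETE FROM myfetcher WHERE fid IN (SELECT fid FROM to_delete_tmp)")
    deleted = cur.rowcount
    cur.execute("DROP TABLE to_delete_tmp")
    conn.commit()
    return deleted


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myfetcher ("
    "fid INTEGER PRIMARY KEY, linksetid TEXT NOT NULL, url TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO myfetcher (fid, linksetid, url) VALUES (?, ?, ?)",
    [
        (1, "A", "http://example.com/1"),  # kept: lowest fid for this url in A
        (2, "A", "http://example.com/1"),  # deleted: duplicate inside A
        (3, "A", "http://example.com/2"),  # kept: unique
        (4, "B", "http://example.com/3"),  # kept: not in linkset A
        (5, "A", "http://example.com/3"),  # deleted: url already exists in B
        (6, "B", "http://example.com/3"),  # kept: duplicates in B are not touched
    ],
)

deleted = dedupe_linkset(conn, "A")
remaining = [row[0] for row in conn.execute("SELECT fid FROM myfetcher ORDER BY fid")]
print(deleted, remaining)
```

On the PHP side the same idea applies: binding the linksetid through a prepared statement avoids quoting problems with whatever ends up in $linkuniq.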

Related

Know the last modified row or id's row in a mysql table

I'm using MySQL 5.5 and, for example, I have a table like this:
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| idgroups | int(11) | NO | PRI | NULL | auto_increment |
| group_id | int(11) | YES | | NULL | |
| group_name | varchar(45) | YES | | NULL | |
+------------+-------------+------+-----+---------+----------------+
Some people are allowed to do inserts, updates and deletes, but I want to know which row (or row id) was the last one modified at a given time.
Any ideas?
Thanks in advance
My suggestion would be to create a second table, something like edit_history, for recording modifications. You can put triggers on your groups table above that say "Any time a record is inserted, deleted, or updated, create a record in my edit_history table".
A trigger can be created as follows:
CREATE TRIGGER trigger_name
AFTER INSERT
ON table_name FOR EACH ROW
BEGIN
-- For each row inserted
-- do something...
END;
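As an illustration of the edit_history idea, here is a rough sketch with one trigger per operation. It assumes Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for MySQL (the trigger bodies use the same NEW/OLD row references). The table name user_groups, the edit_history columns and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_groups (
    idgroups   INTEGER PRIMARY KEY AUTOINCREMENT,
    group_id   INTEGER,
    group_name TEXT
);

-- Hypothetical audit table: one row per change made to user_groups.
CREATE TABLE edit_history (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    idgroups   INTEGER NOT NULL,
    action     TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER user_groups_ai AFTER INSERT ON user_groups FOR EACH ROW
BEGIN
    INSERT INTO edit_history (idgroups, action) VALUES (NEW.idgroups, 'insert');
END;

CREATE TRIGGER user_groups_au AFTER UPDATE ON user_groups FOR EACH ROW
BEGIN
    INSERT INTO edit_history (idgroups, action) VALUES (NEW.idgroups, 'update');
END;

-- Deleted rows are gone from user_groups, so their id is taken from OLD.
CREATE TRIGGER user_groups_ad AFTER DELETE ON user_groups FOR EACH ROW
BEGIN
    INSERT INTO edit_history (idgroups, action) VALUES (OLD.idgroups, 'delete');
END;
""")

conn.execute("INSERT INTO user_groups (group_id, group_name) VALUES (10, 'admins')")
conn.execute("INSERT INTO user_groups (group_id, group_name) VALUES (20, 'editors')")
conn.execute("UPDATE user_groups SET group_name = 'administrators' WHERE idgroups = 1")
conn.execute("DELETE FROM user_groups WHERE idgroups = 2")

# The most recent change is the newest row in the audit table.
last_change = conn.execute(
    "SELECT idgroups, action FROM edit_history ORDER BY id DESC LIMIT 1"
).fetchone()
history_count = conn.execute("SELECT COUNT(*) FROM edit_history").fetchone()[0]
print(last_change, history_count)
```

Adding a condition on changed_at to the final query would answer the "in a given time" part of the question.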
Since your field is auto_increment, you can just select the maximum value of idgroups to get the most recently inserted value:
select max(idgroups) from tbl
Getting the last modified row in general will require additional structure in your table. In particular, if you are deleting, you will need to store what you have most recently deleted somewhere.

How can i update the Records included in another query using SUM and GROUP By in mysql

I have a MySQL table, content_votes_tmp:
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| up | int(11) | NO | MUL | 0 | |
| down | int(11) | NO | | 0 | |
| ip | int(10) unsigned | NO | | NULL | |
| content | int(11) | NO | | NULL | |
| datetime | datetime | NO | | NULL | |
| is_updated | tinyint(2) | NO | | 0 | |
| record_num | int(11) | NO | PRI | NULL | auto_increment |
+------------+------------------+------+-----+---------+----------------+
Surfers can vote up or vote down on posts (i.e. content). Every time a vote is given, a record is inserted into the table, much like a rating, along with other data such as the IP and the content id.
Now I am trying to create a cron job script in PHP which will compute SUM(up) and SUM(down) of the votes,
like this:
mysqli_query($con, "SELECT SUM(up) as up_count, SUM(down) as down_count, content FROM `content_votes_tmp` WHERE is_updated = 0 GROUP by content")
and then, using a while loop in PHP, I can update the main table for each content id.
But I would like the records that were part of the SUM to be marked as updated, i.e. SET is_updated = 1, so the same values won't get summed again and again.
How can I achieve this with a MySQL query, working on the same data set, given that records are being inserted into the table every second/millisecond?
Another way I can think of is fetching all the non-updated records, doing the sum in PHP, and then updating every record.
The simplest way would probably be a temporary table. Create one with the record_num values you want to select from:
CREATE TEMPORARY TABLE temp_table AS
SELECT record_num FROM `content_votes_tmp` WHERE is_updated = 0;
Then do your calculation using the temp table:
SELECT SUM(up) as up_count, SUM(down) as down_count, content
FROM `content_votes_tmp`
WHERE record_num IN (SELECT record_num FROM temp_table)
GROUP by content
Once you've received your result, you can set is_updated on the values you just calculated over:
UPDATE `content_votes_tmp`
SET is_updated = 1
WHERE record_num IN (SELECT record_num FROM temp_table)
If you want to reuse the connection to do the same thing again, you'll need to drop the temporary table before creating it again, but if you just want to do it a single time in a page, it will disappear automatically when the database is disconnected at the end of the page.
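The three steps can be tried end to end with the sketch below. It assumes Python's built-in sqlite3 module and an in-memory SQLite database in place of MySQL, a trimmed-down content_votes_tmp with only the columns the queries use, and invented vote rows. One extra vote is inserted after the snapshot to show that it is neither summed nor marked as updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content_votes_tmp (
    record_num INTEGER PRIMARY KEY AUTOINCREMENT,
    up         INTEGER NOT NULL DEFAULT 0,
    down       INTEGER NOT NULL DEFAULT 0,
    content    INTEGER NOT NULL,
    is_updated INTEGER NOT NULL DEFAULT 0
);
INSERT INTO content_votes_tmp (up, down, content) VALUES
    (1, 0, 7), (1, 0, 7), (0, 1, 7), (1, 0, 9);
""")

# Step 1: freeze the set of rows this run is responsible for.
conn.execute(
    "CREATE TEMP TABLE temp_table AS "
    "SELECT record_num FROM content_votes_tmp WHERE is_updated = 0"
)

# A vote that arrives after the snapshot; it must wait for the next run.
conn.execute("INSERT INTO content_votes_tmp (up, down, content) VALUES (1, 0, 9)")

# Step 2: aggregate only the frozen rows.
totals = conn.execute("""
    SELECT content, SUM(up) AS up_count, SUM(down) AS down_count
    FROM content_votes_tmp
    WHERE record_num IN (SELECT record_num FROM temp_table)
    GROUP BY content
    ORDER BY content
""").fetchall()

# Step 3: mark exactly those rows as processed, then drop the snapshot.
conn.execute(
    "UPDATE content_votes_tmp SET is_updated = 1 "
    "WHERE record_num IN (SELECT record_num FROM temp_table)"
)
conn.execute("DROP TABLE temp_table")
conn.commit()

pending = conn.execute(
    "SELECT COUNT(*) FROM content_votes_tmp WHERE is_updated = 0"
).fetchone()[0]
print(totals, pending)
```

Because both the SUM and the UPDATE filter on the same frozen list of record_num values, rows inserted in between cannot be marked as updated without having been counted.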

Speed Up MySQL (MyISAM) COUNTs with WHERE Clauses

We are implementing a system that analyses books. The system is written in PHP, and for each book loops through the words and analyses each of them, setting certain flags (that translate to database fields) from various regular expressions and other tests.
This results in a matches table, similar to the example below:
+------------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+--------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| regex | varchar(250) | YES | | NULL | |
| description | varchar(250) | NO | | NULL | |
| phonic_description | varchar(255) | NO | | NULL | |
| is_high_frequency | tinyint(1) | NO | | NULL | |
| is_readable | tinyint(1) | NO | | NULL | |
| book_id | bigint(20) | YES | | NULL | |
| matched_regex | varchar(255) | YES | | NULL | |
| [...] | | | | | |
+------------------------+--------------+------+-----+---------+----------------+
Most of the omitted fields are tinyint, either 0 or 1. There are currently 25 fields in the matches table.
There are ~2,000,000 rows in the matches table, the output of analyzing ~500 books.
Currently, there is a "reports" area of the site which queries the matches table like this:
SELECT COUNT(*)
FROM matches
WHERE is_readable = 1
AND other_flag = 0
AND another_flag = 1
However, at present it takes over a minute to fetch the main index report as each query takes about 0.7 seconds. I am caching this at a query level, but it still takes too long for the initial page load.
As I am not very experienced in how to manage datasets such as this, can anyone advise me of a better way to store or query this data? Are there any optimisations I can use with MySQL to improve the performance of these COUNTs, or am I better off using another database or data structure?
We are currently using MySQL with MyISAM tables and a VPS for this, so switching to a new database system altogether isn't out of the question.
You need to use indexes. Create them on the columns you use in WHERE clauses most frequently.
ALTER TABLE `matches` ADD INDEX ( `is_readable` )
etc.
You can also create indexes on multiple columns; if you're running the same type of query over and over, it's useful. phpMyAdmin has the index option at the bottom of the table's structure page.
Add a multi-column index to this table, as you are selecting by more than one field. The index below should help a lot. This type of index works very well for boolean / int columns. For indexes on varchar values, read more here: http://dev.mysql.com/doc/refman/5.0/en/create-index.html
ALTER TABLE `matches` ADD INDEX ( `is_readable`, `other_flag`, `another_flag` )
One more thing is to check your queries by using EXPLAIN {YOUR WHOLE SQL STATEMENT} to see which index the DB uses. So in this example you should run the query:
EXPLAIN SELECT COUNT(*) FROM matches WHERE is_readable = 1 AND other_flag = 0 AND another_flag = 1
More info on EXPLAIN: http://dev.mysql.com/doc/refman/5.0/en/explain.html
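The index-then-verify workflow can be sketched as follows. This assumes Python's built-in sqlite3 module and an in-memory SQLite database rather than MySQL/MyISAM, so the plan comes from SQLite's EXPLAIN QUERY PLAN and its wording differs from MySQL's EXPLAIN output. The index name idx_matches_flags and the generated rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matches (
    id           INTEGER PRIMARY KEY,
    is_readable  INTEGER NOT NULL,
    other_flag   INTEGER NOT NULL,
    another_flag INTEGER NOT NULL
);
-- One composite index covering every column in the WHERE clause.
CREATE INDEX idx_matches_flags
    ON matches (is_readable, other_flag, another_flag);
""")

# Invented sample rows with a mix of flag combinations.
rows = [(i % 2, int(i % 3 == 0), int(i % 5 == 0)) for i in range(1000)]
conn.executemany(
    "INSERT INTO matches (is_readable, other_flag, another_flag) VALUES (?, ?, ?)",
    rows,
)

query = (
    "SELECT COUNT(*) FROM matches "
    "WHERE is_readable = 1 AND other_flag = 0 AND another_flag = 1"
)

# Ask the planner how it intends to run the query; the last column of each
# plan row is a human-readable description that names any index it will use.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_text = " ".join(row[3] for row in plan)
count = conn.execute(query).fetchone()[0]
print(plan_text)
print(count)
```

If the plan describes a full table scan instead of naming the index, the index does not match the query's WHERE columns.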

Recursive-ish query for tags?

I have a table of tags that can be linked to other tags and I want to "recursively" select the tags in order of arrangement, so that when a search is made, we get the immediate (1-level) results and then carry on down to, say, 5 levels. That way we always have a list of tags even if there weren't enough exact matches on level 1.
I can manage this fine with making multiple queries until I get enough results, but surely there is a better, optimized, way via a one-trip query?
Any tips will be appreciated.
Thanks!
Results:
tagId, tagWord, child, child tagId
'513', 'Slap', 'Hog Slapper', '1518'
'513', 'Slap', 'Corporal Punishment', '147'
'513', 'Slap', 'Impact Play', '1394'
Query:
SELECT t.tagId, t.tagWord as tag, tt.tagWord as child, tt.tagId as childId
FROM platform.tagWords t
INNER JOIN platform.tagsLinks l ON l.parentId = t.tagId
INNER JOIN platform.tagWords tt ON tt.tagId = l.tagId
WHERE t.tagWord = 'slap'
Table Layouts:
mysql> explain tagWords;
+---------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+---------------------+------+-----+---------+----------------+
| tagId | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| tagWord | varchar(45) | YES | UNI | NULL | |
+---------+---------------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> explain tagsLinks;
+----------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------------------+------+-----+---------+-------+
| tagId | bigint(20) unsigned | NO | | NULL | |
| parentId | bigint(20) | YES | | NULL | |
+----------+---------------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
AFAIK MySQL doesn't have any mechanism for querying data recursively (that holds for MySQL 5.x; recursive CTEs arrived in MySQL 8.0).
Oracle has the CONNECT BY construct and SQL Server has CTEs (Common Table Expressions).
Here are the options that I consider each time I find myself in a situation when I need to query hierarchical data.
Nested Sets
Path enumeration
Explicit joins (when the maximum level is known)
Vendor Extensions (SQL Server CTE, Oracle Connect by etc)
Stored Procedures
Suck it up
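The "explicit joins" option can be sketched for two levels like this. It assumes Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for MySQL, the tagWords/tagsLinks layout from the question, and invented tag rows. Each further level would add one more join on tagsLinks and one more UNION ALL branch, up to the maximum depth you need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tagWords (tagId INTEGER PRIMARY KEY, tagWord TEXT UNIQUE);
CREATE TABLE tagsLinks (tagId INTEGER NOT NULL, parentId INTEGER);

-- Invented sample data: music -> rock, jazz; rock -> punk
INSERT INTO tagWords (tagId, tagWord) VALUES
    (1, 'music'), (2, 'rock'), (3, 'jazz'), (4, 'punk');
INSERT INTO tagsLinks (tagId, parentId) VALUES (2, 1), (3, 1), (4, 2);
""")

# One SELECT per level, glued together with UNION ALL. The level column lets
# the caller take closer matches first and fall back to deeper ones.
TWO_LEVEL_SQL = """
SELECT 1 AS level, c1.tagId AS tagId, c1.tagWord AS tagWord
FROM tagWords t
JOIN tagsLinks l1 ON l1.parentId = t.tagId
JOIN tagWords c1 ON c1.tagId = l1.tagId
WHERE t.tagWord = :word
UNION ALL
SELECT 2, c2.tagId, c2.tagWord
FROM tagWords t
JOIN tagsLinks l1 ON l1.parentId = t.tagId
JOIN tagsLinks l2 ON l2.parentId = l1.tagId
JOIN tagWords c2 ON c2.tagId = l2.tagId
WHERE t.tagWord = :word
ORDER BY level, tagId
"""

descendants = conn.execute(TWO_LEVEL_SQL, {"word": "music"}).fetchall()
for level, tag_id, tag_word in descendants:
    print(level, tag_id, tag_word)
```

This is a single round trip, at the cost of a query that grows with the maximum depth.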

Finding mySQL duplicates, then merging data

I have a mySQL database with a tad under 2 million rows. The database is non-interactive, so efficiency isn't key.
The (simplified) structure I have is:
`id` int(11) NOT NULL auto_increment
`category` varchar(64) NOT NULL
`productListing` varchar(256) NOT NULL
Now the problem I would like to solve: I want to find duplicates on the productListing field and merge the data from the category field into a single row, deleting the duplicates.
So given the following data:
+----+-----------+---------------------------+
| id | category | productListing |
+----+-----------+---------------------------+
| 1 | Category1 | productGroup1 |
| 2 | Category2 | productGroup1 |
| 3 | Category3 | anotherGroup9 |
+----+-----------+---------------------------+
What I want to end up with is:
+----+----------------------+---------------------------+
| id | category | productListing |
+----+----------------------+---------------------------+
| 1 | Category1,Category2 | productGroup1 |
| 3 | Category3 | anotherGroup9 |
+----+----------------------+---------------------------+
What's the most efficient way to do this either in pure mySQL query or php?
I think you're looking for GROUP_CONCAT:
SELECT GROUP_CONCAT(category), productListing
FROM YourTable
GROUP BY productListing
I would create a new table, inserting the updated values, delete the old one and rename the new table to the old one's name:
CREATE TABLE new_YourTable SELECT GROUP_CONCAT(...;
DROP TABLE YourTable;
RENAME TABLE new_YourTable TO YourTable;
-- don't forget to add triggers, indexes, foreign keys, etc. to new table
SELECT MIN(id), GROUP_CONCAT(category SEPARATOR ',' ORDER BY id), productListing
FROM mytable
GROUP BY
productListing
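Putting the pieces from the answers above together (MIN(id), GROUP_CONCAT and the create/drop/rename swap), a minimal sketch on the question's sample data could look like this. It assumes Python's built-in sqlite3 module and an in-memory SQLite database instead of MySQL. SQLite spells the separator as a second argument to GROUP_CONCAT and renames with ALTER TABLE ... RENAME TO, and this form does not guarantee the order of the concatenated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE YourTable (
    id             INTEGER PRIMARY KEY,
    category       TEXT NOT NULL,
    productListing TEXT NOT NULL
);
INSERT INTO YourTable (id, category, productListing) VALUES
    (1, 'Category1', 'productGroup1'),
    (2, 'Category2', 'productGroup1'),
    (3, 'Category3', 'anotherGroup9');

-- Build the merged copy: one row per productListing, keeping the lowest id.
CREATE TABLE new_YourTable AS
SELECT MIN(id) AS id,
       GROUP_CONCAT(category, ',') AS category,
       productListing
FROM YourTable
GROUP BY productListing;

-- Swap the merged copy in place of the original.
DROP TABLE YourTable;
ALTER TABLE new_YourTable RENAME TO YourTable;
""")

merged = conn.execute(
    "SELECT id, category, productListing FROM YourTable ORDER BY id"
).fetchall()
for row in merged:
    print(row)
```

As noted in the answer above, the swapped-in table does not carry over keys or indexes, so they have to be recreated afterwards.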
