MySQL SELECT with an OR across 2 columns - php

I'm creating a 'similar items' link table.
I have a two-column table; both columns contain product ids.
CREATE TABLE IF NOT EXISTS `prod_similar` (
`id_a` int(11) NOT NULL,
`id_b` int(11) NOT NULL
);
INSERT INTO `prod_similar` (`id_a`, `id_b`) VALUES
(5, 10),
(5, 15),
(10, 13),
(10, 14),
(14, 5),
(14, 13);
I want to select 3 similar products, favouring products where the id is in the first column, 'id_a':
SELECT * FROM prod_similar WHERE id_a={$id} OR id_b={$id}
ORDER BY column(?)
LIMIT 3

Don't know, maybe this?
SELECT *
FROM similar_items
WHERE col_1={$id} OR col_2={$id}
ORDER BY CASE WHEN col_1={$id} THEN 0 WHEN col_2={$id} THEN 1 END
LIMIT 3
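The CASE ordering can be tried against the sample rows from the question. A minimal sketch, with some assumptions: it is Python with the built-in sqlite3 module standing in for MySQL (only so the example is self-contained), it uses the question's prod_similar / id_a / id_b names rather than similar_items / col_1 / col_2, it binds the id as a parameter instead of interpolating {$id}, and the similar_id column and the similar_products helper are hypothetical names added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prod_similar (id_a INTEGER NOT NULL, id_b INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO prod_similar (id_a, id_b) VALUES (?, ?)",
    [(5, 10), (5, 15), (10, 13), (10, 14), (14, 5), (14, 13)],
)


def similar_products(conn, product_id, limit=3):
    """Return up to `limit` similar ids, rows matching on id_a first."""
    sql = """
        SELECT CASE WHEN id_a = :id THEN id_b ELSE id_a END AS similar_id
        FROM prod_similar
        WHERE id_a = :id OR id_b = :id
        -- 0 for an id_a match, 1 for an id_b match; similar_id breaks ties
        ORDER BY CASE WHEN id_a = :id THEN 0 ELSE 1 END, similar_id
        LIMIT :limit
    """
    return [row[0] for row in conn.execute(sql, {"id": product_id, "limit": limit})]


print(similar_products(conn, 5))  # [10, 15, 14]
```

The extra similar_id sort key is only there to make the order of ties predictable; without it the database is free to return rows of equal rank in any order.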

I assume you have other columns as well
(SELECT 1 favouring, id_a id, [other columns]
FROM prod_similar
WHERE id_a = {$id})
UNION
(SELECT 2 favouring, id_b id, [other columns]
FROM prod_similar
WHERE id_b = {$id})
ORDER BY favouring, id
LIMIT 3;
If you don't mind duplicates, or there are none between id_a and id_b, you can use UNION ALL instead, which is faster because it skips the duplicate-removal step.
Unions are an indication of denormalized data; denormalized data improves the speed of certain queries and reduces the speed of others (such as this one).

An easy way to do this is this:
ORDER BY NULLIF(col_1, {$id}) LIMIT 3
The CASE WHEN works as well, but this is a bit simpler.
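The reason NULLIF works here: it returns NULL when the column equals the id, and NULL sorts first in ascending order in MySQL. SQLite orders NULLs first too, so the idea can be sketched with Python's built-in sqlite3 module as a stand-in (an assumption for the sake of a runnable example). The sort_key alias and the id_b tie-breaker are additions for illustration, and the question's column names are used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prod_similar (id_a INTEGER NOT NULL, id_b INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO prod_similar (id_a, id_b) VALUES (?, ?)",
    [(5, 10), (5, 15), (10, 13), (10, 14), (14, 5), (14, 13)],
)

product_id = 5
rows = conn.execute(
    """
    SELECT id_a, id_b, NULLIF(id_a, ?) AS sort_key
    FROM prod_similar
    WHERE id_a = ? OR id_b = ?
    ORDER BY sort_key, id_b   -- NULL (id_a matched) sorts before any number
    LIMIT 3
    """,
    (product_id, product_id, product_id),
).fetchall()

print(rows)  # [(5, 10, None), (5, 15, None), (14, 5, 14)]
```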

I am not sure I understand the question. Could you post example data for the source table and show what the result should look like?
If I understood you correctly, I would try something like
SELECT (CASE
WHEN col_1={$id} THEN col_1
WHEN col_2={$id} THEN col_2
END) AS id FROM similar_items WHERE col_1={$id} OR col_2={$id}
LIMIT 3

Related

Return top 10 most frequently occurring values that share the same id as a variable. 3m+ rows

EDIT:: SOLVED I was using a for loop when a while loop was the correct option to print the results. Many thanks to all for contributing below.. I have left all steps below for reference but here is the solution and working code. Now to clean up my data and see how this runs with my 'not so big' data hehe!
$db = new PDO($dsn, $db_user, $db_pass);
$query = $db->prepare("SELECT brand
FROM transactions
WHERE
id IN (SELECT id FROM transactions WHERE brand = :brand1)
AND brand <> :brand1
GROUP BY brand
ORDER BY COUNT(*) DESC
LIMIT 10");
$query->bindparam(":brand1", $brand);
$query->execute();
echo "<table>";
while($row = $query->fetch(PDO::FETCH_ASSOC)) {
echo "<tr><td>".$row['brand']."</td></tr>";
}
echo "</table>";
To put into better context, I have transaction level sales data for which I want to do a very simple brand level basket analysis/affinity analysis.
EDIT:: actual schema and example working data below.
On my page I will have a dropdown box which will select a brand. For the purposes of this question 'Brand1'. And then execute a query which lists the top 10 most occurring brands which also appear in the table with the same id as the one selected in the dropdown.
The output based on the data would be
brand2
brand4
brand3
brand5
The table consists of 3 million rows, so I don't think I can load the lot into memory. I would know quite easily how to retrieve the top 10 most frequent values in a table, but doing it based on whether a row shares an id with a variable is beyond my current level of skill.
So I call on you experts to help me take my next step towards handling big data with php/mysql. How could I word such a query?
EDIT:: Attempt 1
$brand = "Brand1";
$db = new PDO($dsn, $db_user, $db_pass);
$query = $db->prepare("SELECT brand
FROM brand
WHERE
id IN (SELECT id FROM brand WHERE brand = :brand1)
AND brand <> :brand1
GROUP BY brand
ORDER BY COUNT(*) DESC
LIMIT 10");
$query->bindparam(":brand1", $brand);
$query->execute();
$row = $query->fetch(PDO::FETCH_ASSOC);
echo "<table>";
for($i=0;$i<10;$i++) {
echo "<tr><td>".$row['brand']."</td></tr>";
$i++;
}
echo "</table>";
The above returns "Brand2" 5 times. (I'm only using small sample data like in my OP.) Is it my loop that's the issue? It did something similar with both types of query suggested. Here is the schema for reference:
--
-- Database: `transactions`
--
-- --------------------------------------------------------
--
-- Table structure for table `brand`
--
CREATE TABLE `brand` (
`id` int(11) NOT NULL,
`brand` varchar(25) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Dumping data for table `brand`
--
INSERT INTO `brand` (`id`, `brand`) VALUES
(1, 'Brand1'),
(1, 'Brand1'),
(1, 'Brand2'),
(1, 'Brand3'),
(1, 'Brand4'),
(2, 'Brand1'),
(2, 'Brand2'),
(2, 'Brand3'),
(3, 'Brand1'),
(3, 'Brand2'),
(4, 'Brand1'),
(4, 'Brand2'),
(5, 'Brand1'),
(5, 'Brand2'),
(5, 'Brand4'),
(5, 'Brand5'),
(6, 'Brand2'),
(6, 'Brand3'),
(7, 'Brand1'),
(7, 'Brand2'),
(7, 'Brand3');
--
-- Indexes for dumped tables
--
--
-- Indexes for table `brand`
--
ALTER TABLE `brand`
ADD KEY `brand` (`id`,`brand`) USING BTREE;
I would express it as
SELECT brand
FROM brand
WHERE
id IN (SELECT id FROM brand WHERE brand = 'brand1')
AND brand <> 'brand1'
GROUP BY brand
ORDER BY COUNT(*) DESC
LIMIT 10;
This avoids the cost of a JOIN and removes the user selected brand that does not appear in your example result set.
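For anyone who wants to check the subquery version against the sample data from the question, here is a small sketch. Assumptions: Python with the built-in sqlite3 module stands in for PHP/PDO and MySQL, the hits column and the co_occurring helper are hypothetical names, and a brand tie-breaker is added so equal counts come back in a predictable order. Note that SQLite compares strings case-sensitively, while MySQL's default utf8 collation does not, so the brand is spelled exactly as stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE brand (id INTEGER NOT NULL, brand TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO brand (id, brand) VALUES (?, ?)",
    [
        (1, "Brand1"), (1, "Brand1"), (1, "Brand2"), (1, "Brand3"), (1, "Brand4"),
        (2, "Brand1"), (2, "Brand2"), (2, "Brand3"),
        (3, "Brand1"), (3, "Brand2"),
        (4, "Brand1"), (4, "Brand2"),
        (5, "Brand1"), (5, "Brand2"), (5, "Brand4"), (5, "Brand5"),
        (6, "Brand2"), (6, "Brand3"),
        (7, "Brand1"), (7, "Brand2"), (7, "Brand3"),
    ],
)


def co_occurring(conn, selected, limit=10):
    """Brands sharing an id with `selected`, most frequent first."""
    sql = """
        SELECT brand, COUNT(*) AS hits
        FROM brand
        WHERE id IN (SELECT id FROM brand WHERE brand = :b)
          AND brand <> :b
        GROUP BY brand
        ORDER BY hits DESC, brand
        LIMIT :n
    """
    return conn.execute(sql, {"b": selected, "n": limit}).fetchall()


# Iterate over every returned row, not just the first one fetched.
for name, hits in co_occurring(conn, "Brand1"):
    print(name, hits)
```

With this data the loop prints Brand2 6, Brand3 3, Brand4 2, Brand5 1. Id 6 contributes nothing because it has no Brand1 row, and the duplicate Brand1 row on id 1 does not inflate the counts because IN only tests membership.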
As mentioned by Gordon Linoff, indexes might improve performance greatly.
In SQL, you can express this as:
select b.brand
from brand b join
brand b1
on b.id = b1.id and b1.brand = 'Brand1' and b1.brand <> b.brand
group by b.brand
order by count(*) desc
limit 10;
You'll get some benefit in performance from an index on brand(brand, id) as well as brand(id).
Depending on the data and user requirements, I'm not sure that you'll get the performance that you want from this query. But, first get the logic to work, then work on performance.
The SQL query below says "return only 10 records, start on record 16 (OFFSET 15)":
SELECT * FROM <YOURTABLE> LIMIT 10 OFFSET 15

Where in on an index

I have a MySQL table:
id INT(10),
property_id INT(10),
value_id INT(10),
..
There's an index 'combination' on property_id + value_id
I have an array containing for example [1 => 68, 4 => 8, 9 => 15, ...]
Instead of this query:
SELECT * FROM table
WHERE (property_id = 1 && value_id = 68)
|| (property_id = 4 && value_id = 8)
|| (property_id = 9 && value_id = 15)
|| ...
I hoped something like this would work:
SELECT * FROM table WHERE combination IN ('1_68', '4_8', '9_15', ...)
I now know this does not work, but is there another way I can accomplish this?
In MySQL you can use tuples for conditions:
SELECT * FROM table
WHERE (property_id, value_id) IN (
(1, 68),
(4, 8),
(9, 15)
);
This is how it should work. But yes - MySQL doesn't use the index properly. We can just hope it will do some day (AFAIK it works for PostgreSQL).
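Building the tuple list from the array in the question could look roughly like this. It is a sketch under assumptions: Python with the built-in sqlite3 module instead of PHP/MySQL, a hypothetical table name props, and made-up extra rows. SQLite needs the VALUES keyword inside IN (...) for row values, whereas MySQL takes the bare list of tuples shown above; the placeholder-building idea is the same either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE props (id INTEGER PRIMARY KEY, property_id INTEGER, value_id INTEGER)"
)
conn.execute("CREATE INDEX combination ON props (property_id, value_id)")
conn.executemany(
    "INSERT INTO props (property_id, value_id) VALUES (?, ?)",
    [(1, 68), (1, 70), (4, 8), (9, 15), (9, 16)],
)

# property_id => value_id pairs, like the array in the question
wanted = {1: 68, 4: 8, 9: 15}

# One "(?, ?)" group per pair, so every value stays a bound parameter.
groups = ", ".join(["(?, ?)"] * len(wanted))
params = [n for pair in wanted.items() for n in pair]

sql = (
    "SELECT property_id, value_id FROM props "
    f"WHERE (property_id, value_id) IN (VALUES {groups}) "
    "ORDER BY property_id"
)
matches = conn.execute(sql, params).fetchall()
print(matches)  # [(1, 68), (4, 8), (9, 15)]
```

An empty array would produce invalid SQL here, so real code should skip the query in that case.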
If it is about performance and you need it now, then you might consider using an indexed virtual (generated) column (available since MySQL 5.7.8).
ALTER TABLE `locations`
ADD COLUMN `combination` VARCHAR(21) GENERATED ALWAYS AS (CONCAT(property_id, '_', value_id)),
ADD INDEX `combination` (`combination`);
And now you can use your query
SELECT * FROM table WHERE combination IN ('1_68', '4_8', '9_15', ...)
If you want to save some memory you can combine the two INTs into one BIGINT:
ADD COLUMN `combination` BIGINT GENERATED ALWAYS AS ((property_id << 32) + value_id)
You can also just use UNION ALL
SELECT * FROM table WHERE (property_id, value_id) = (1,68)
UNION ALL
SELECT * FROM table WHERE (property_id, value_id) = (4,8)
UNION ALL
SELECT * FROM table WHERE (property_id, value_id) = (9,15)
This will be fast. It's a shame that MySQL isn't doing that trivial optimisation.
Can you use this :
SELECT * FROM table WHERE concat(property_id,'_',value_id) IN ('1_68', '4_8', '9_15', ...)

Remove duplicates in MySQL table - set group_id when city_id is the same

I have a table units in my database. The schema has the fields id, unit_id, group_id, city_id.
For simplicity, I have 3 units:
(1, 1, 1, 1)
(2, 1, 2, 1)
(3, 1, 3, 2)
How can I remove the useless group ids when the city id is the same? I want this result:
(1, 1, 1, 1)
(2, 1, 1, 1)
(3, 1, 3, 2)
I know how to do this in PHP, but I thought maybe MySQL has built-in functions that I don't know about ;)
Regards
If I understand your question correctly, you want all rows with the same city_id to have the same group_id. Basically, the first table in your question is what you have and the second one is the desired result. If that's the case, your query could look like this:
UPDATE table1
INNER JOIN (SELECT * FROM table1 GROUP BY city_id) AS tx
ON table1.city_id = tx.city_id
SET table1.group_id = tx.group_id;
Here is the SQL Fiddle to see how it works.
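The same collapse can be reproduced on the three sample units. This is a sketch with a swapped-in technique, not the query above: it uses Python's built-in sqlite3 module, the question's units table name, and a correlated subquery that picks MIN(group_id) per city so the surviving group is deterministic. MySQL rejects a subquery on the table being updated (error 1093), which is one reason to join against a derived table there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE units ("
    "id INTEGER PRIMARY KEY, unit_id INTEGER, group_id INTEGER, city_id INTEGER)"
)
conn.executemany(
    "INSERT INTO units (id, unit_id, group_id, city_id) VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 1, 2, 1), (3, 1, 3, 2)],
)

# Give every unit the lowest group_id found in its city.
conn.execute(
    """
    UPDATE units
    SET group_id = (
        SELECT MIN(u2.group_id) FROM units AS u2 WHERE u2.city_id = units.city_id
    )
    """
)

result = conn.execute(
    "SELECT id, unit_id, group_id, city_id FROM units ORDER BY id"
).fetchall()
print(result)  # [(1, 1, 1, 1), (2, 1, 1, 1), (3, 1, 3, 2)]
```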
If you want to remove rows completely and keep only one row per city_id, you can do that with a query like this:
DELETE table1 FROM table1
INNER JOIN (SELECT * FROM table1 GROUP BY city_id) AS tx
ON table1.city_id = tx.city_id
WHERE table1.group_id != tx.group_id;
Here is the SQL Fiddle for that!
In this case your result table will no longer contain the row with id 2.
GL!
If I understand correctly, you want to delete rows where group_id and city_id are equal? If so, it's very simple:
DELETE FROM units WHERE group_id = city_id
Okay, my solution:
UPDATE `ingame_units` INNER JOIN `ingame_groups` g1 ON `ingame_units`.`group_id`=g1.`id` LEFT JOIN `ingame_groups` g2 ON `ingame_units`.`group_id`<>g2.`id` AND g1.`city_id`=g2.`city_id` AND g1.`id`>g2.`id` AND g1.`game_id`=g2.`game_id` SET `ingame_units`.`group_id`=IFNULL(g2.`id`,g1.`id`)
Thanks one man to minus my post and don't try to help me. Regards :)

Mysql query "WHERE 2 IN (`column`)" not working

I want to execute a query that finds one ID in a list of IDs.
table user
id_user | name | id_site
-------------------------
1 | james | 1, 2, 3
1 | brad | 1, 3
1 | suko | 4, 5
and my query (doesn't work)
SELECT * FROM `user` WHERE 3 IN (`id_site`)
This query works (but doesn't do the job)
SELECT * FROM `user` WHERE 3 IN (1, 2, 3, 4, 6)
That's not how IN works. I can't be bothered to explain why, just read the docs
Try this:
SELECT * FROM `user` WHERE FIND_IN_SET(3,`id_site`)
Note that this requires your data to be stored as '1,2,3', '1,3' and '4,5' (i.e. no spaces). If this is not an option, try:
SELECT * FROM `user` WHERE FIND_IN_SET(3,REPLACE(`id_site`,' ',''))
Alternatively, consider restructuring your database. Namely:
CREATE TABLE `user_site_links` (
`id_user` INT UNSIGNED NOT NULL,
`id_site` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id_user`,`id_site`)
);
INSERT INTO `user_site_links` VALUES
(1,1), (1,2), (1,3),
(2,1), (2,3),
(3,4), (3,5);
SELECT * FROM `user` JOIN `user_site_links` USING (`id_user`) WHERE `id_site` = 3;
Try this: FIND_IN_SET(str,strlist)
NO! Not in a relational database.
Your table doesn't conform to the first normal form of a database ("each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain") because you:
use string field to contain numbers
store multiple values in one field
To work with a field like this you would have to use FIND_IN_SET(), or store the data like ,1,2,3, (note the commas, or whatever other separator you use, at the beginning and at the end) and use LIKE "%,7,%" so that it works in every case. Either way it's not possible to use indexes[1][2].
Use a relation table to do this:
CREATE TABLE user_on_sites(
user_id INT,
site_id INT,
PRIMARY KEY (user_id, site_id),
INDEX (user_id),
INDEX (site_id)
);
And join tables:
SELECT u.id, u.name, uos.site_id
FROM user_on_sites AS uos
INNER JOIN user AS u ON uos.user_id = u.id
WHERE uos.site_id = 3;
This way you can search efficiently using indexes.
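Moving the existing comma-separated lists into such a relation table is mostly a matter of splitting each string. A rough sketch, assuming Python with the built-in sqlite3 module in place of MySQL, the user_on_sites layout from above, distinct user ids 1 to 3 (the question's sample shows id 1 on every row, which looks like a typo), and hypothetical names legacy, idx_site and users_on_site:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE user (id_user INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_on_sites (
        user_id INTEGER NOT NULL,
        site_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, site_id)
    );
    CREATE INDEX idx_site ON user_on_sites (site_id);
    """
)

# Rows as they look in the old table: id, name, comma-separated site ids.
legacy = [(1, "james", "1, 2, 3"), (2, "brad", "1, 3"), (3, "suko", "4, 5")]

for user_id, name, id_site in legacy:
    conn.execute("INSERT INTO user (id_user, name) VALUES (?, ?)", (user_id, name))
    # int() ignores the surrounding spaces, so "1, 2, 3" splits cleanly.
    conn.executemany(
        "INSERT INTO user_on_sites (user_id, site_id) VALUES (?, ?)",
        [(user_id, int(part)) for part in id_site.split(",")],
    )


def users_on_site(conn, site_id):
    """Names of users linked to `site_id`, via a plain indexed equality join."""
    sql = """
        SELECT u.name
        FROM user_on_sites AS uos
        INNER JOIN user AS u ON uos.user_id = u.id_user
        WHERE uos.site_id = ?
        ORDER BY u.id_user
    """
    return [row[0] for row in conn.execute(sql, (site_id,))]


print(users_on_site(conn, 3))  # ['james', 'brad']
```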
The problem is that you are searching within several lists.
You need something more like:
SELECT * FROM `user` WHERE id_site LIKE '%3%';
However, that will also match 33, 333 and 345, so you need some more advanced text parsing.
The WHERE IN clause is useful to replace many OR conditions.
For example
SELECT * FROM `user` WHERE id IN (1,2,3,4)
is cleaner than
SELECT * FROM `user` WHERE id=1 OR id=2 OR id=3 OR id=4
You're just trying to use it in a wrong way.
Correct way :
WHERE `field` IN (list_item1, list_item2 [, list_itemX])

What mysql query would let me get aggregated stats, by month from this table?

I have a points system setup on my site, where every single point accumulated is logged in the points table. The structure is simple, p_userid, p_points (how many points accumulated during this action), and p_timestamp.
I wanna display top 3 point accumulating users, for each month. So essentially, it should sum the p_points table for the month, for each user id, and display the top 3 users, grouped into months. The user ids will be joined to a users table, to get actual user names.
What would be the best way to do it? I use php/mysql.
EDIT:
As a possible solution, I could create another column, and log YYYY-MM into it, and simply group it based on that, but thats more data I gotta log, for an already huge table.
EDIT 2:
Data stored as such
INSERT INTO `points` (`point_userid`, `point_points`, `point_code`, `point_date`) VALUES
(8465, 20, 3, 1237337627),
(46745, 20, 3, 1237337678),
(7435, 20, 3, 1237337733),
(46565, 20, 3, 1237337802),
(4466, 20, 3, 1237337836),
(34685, 20, 3, 1237337885),
(8544, 20, 3, 1237337908),
(6454, 20, 3, 1237337998),
(45765, 20, 3, 1237338008),
(3476, 20, 3, 1237338076);
This isn't easy in MySQL.
First you need to create a table of variables, one for storing the current group, and one for storing the current row number in the group. Initialize them both to NULL.
Then iterate over the rows grouped by month and ordered by score, selecting the current row number and incrementing it. If the group changes, reset the row number to one.
Then put all this in a subselect and in the outer select, select all rows with rownumber <= 3.
You could use this query:
SELECT month, p_userid, points FROM (
SELECT
*,
(@rn := CASE WHEN month = @last_month THEN @rn + 1 ELSE 1 END) AS rn,
(@last_month := month)
FROM (
SELECT p_userid, month(p_timestamp) AS month, SUM(p_points) AS points
FROM Table1, (SELECT @last_month := NULL, @rn := 0) AS vars
GROUP BY p_userid, month(p_timestamp)
ORDER BY month, points DESC
) AS T1
) AS T2
WHERE rn <= 3
Result:
Month User Score
1 4 7
1 3 5
1 2 4
2 4 17
2 5 10
2 3 6
Test data:
CREATE TABLE Table1 (p_userid INT NOT NULL,
p_points INT NOT NULL,
p_timestamp TIMESTAMP NOT NULL);
INSERT INTO Table1 (p_userid, p_points, p_timestamp) VALUES
(1, 1, '2010-01-01'),
(1, 2, '2010-01-02'),
(1, 3, '2010-02-01'),
(2, 4, '2010-01-01'),
(3, 5, '2010-01-01'),
(3, 6, '2010-02-01'),
(4, 7, '2010-01-01'),
(4, 8, '2010-02-01'),
(4, 9, '2010-02-02'),
(5, 10, '2010-02-02');
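On MySQL 8.0 and later, the per-month numbering can be done with the ROW_NUMBER() window function instead of user variables. That is a swapped-in technique, not the approach described above. A minimal sketch against the same test data, assuming Python's built-in sqlite3 module as a stand-in (window functions need SQLite 3.25 or newer) and strftime in place of MySQL's MONTH():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 ("
    "p_userid INTEGER NOT NULL, p_points INTEGER NOT NULL, p_timestamp TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO Table1 (p_userid, p_points, p_timestamp) VALUES (?, ?, ?)",
    [
        (1, 1, "2010-01-01"), (1, 2, "2010-01-02"), (1, 3, "2010-02-01"),
        (2, 4, "2010-01-01"), (3, 5, "2010-01-01"), (3, 6, "2010-02-01"),
        (4, 7, "2010-01-01"), (4, 8, "2010-02-01"), (4, 9, "2010-02-02"),
        (5, 10, "2010-02-02"),
    ],
)

sql = """
    SELECT month, p_userid, points FROM (
        SELECT month, p_userid, points,
               -- restart the numbering at 1 for every month
               ROW_NUMBER() OVER (PARTITION BY month ORDER BY points DESC) AS rn
        FROM (
            SELECT CAST(strftime('%m', p_timestamp) AS INTEGER) AS month,
                   p_userid,
                   SUM(p_points) AS points
            FROM Table1
            GROUP BY month, p_userid
        )
    )
    WHERE rn <= 3
    ORDER BY month, points DESC
"""
top3 = conn.execute(sql).fetchall()
for month, user, points in top3:
    print(month, user, points)
```

Like the original query, this groups by month number only, so data spanning several years would need the year in the grouping as well.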
Hm,
Too simple?
SELECT SUM(tb1.p_points) as total_points, tb1.p_userid, tb1.p_timestamp, tb2.username
FROM tb1, tb2
WHERE tb1.p_userid = tb2.username AND p_timestamp BETWEEN 'start_of_month' AND 'end_of_month'
GROUP BY p_userid
ORDER BY total_points DESC LIMIT 3
Syntax might be a little bit out (relatively new to SQL) - wouldn't iterating through a query like this get the result you're looking for? Must admit that Mark's response makes me think this definitely is too simple but figured I'd let you see it anyway.
I'm a plpgsql addict, so I don't know whether something similar can work in MySQL, or how PHP will receive the results (I don't know if multiple queries will be treated as a UNION), but a few tests were promising.
CREATE PROCEDURE topusers(OUT query TEXT) BEGIN
DECLARE time TIMESTAMP;
SELECT MIN(CONCAT(EXTRACT(YEAR_MONTH FROM FROM_UNIXTIME(p_timestamp)), '01')) INTO time FROM t;
SET @query = '';
REPEAT
SET @query = CONCAT(@query, '(SELECT SUM(p_points) as total_points, p_userid, ', UNIX_TIMESTAMP(time), '
FROM t
WHERE p_timestamp BETWEEN ', UNIX_TIMESTAMP(time), ' AND ', UNIX_TIMESTAMP(ADDDATE(time, INTERVAL 1 MONTH)), '
GROUP BY p_userid
ORDER BY total_points DESC LIMIT 3)');
SELECT ADDDATE(time, INTERVAL 1 MONTH) INTO time;
IF time < NOW() THEN
SET @query=CONCAT(@query, ' UNION ');
END IF;
UNTIL time > NOW() END REPEAT;
SELECT @query INTO query;
END//
And query
CALL topusers(@query); PREPARE stmt1 FROM @query; EXECUTE stmt1;
and at the end
DEALLOCATE PREPARE stmt1;
