I'm trying to perform a subquery in Laravel to get some relevant user data, but the data may be in one of two tables. I have two tables, test_activity and live_activity, which both have a created_at column. My goal is to get the oldest record from the combined data of the two tables.
Sample data:
CREATE TABLE users (
id INTEGER,
first_name TEXT,
last_name TEXT
);
INSERT INTO users (id, first_name, last_name)
VALUES (1, 'Craig', 'Smith'), (2, 'Bill', 'Nye'), (3, 'Bloop', 'Blop');
CREATE TABLE test_activity (
id INTEGER,
user_id INTEGER,
created_at DATE
);
INSERT INTO test_activity (id, user_id, created_at)
VALUES (1, 1, '2019-04-29'), (2, 2, '2019-03-28'), (3, 3, '2019-04-28');
CREATE TABLE live_activity (
id INTEGER,
user_id INTEGER,
created_at DATE
);
INSERT INTO live_activity (id, user_id, created_at)
VALUES (1, 1, '2019-04-27'), (2, 2, '2019-03-29'), (3, 3, '2019-04-27');
Here is how I am trying the query with Laravel:
$firstActivity = TestActivity::select('created_at')
->whereColumn('user_id', 'id');
$firstActivity = LiveActivity::select('created_at')
->whereColumn('user_id', 'id')
->union($firstActivity)
->limit(1)
->getQuery();
$users = Users::select(['id', 'first_name', 'last_name'])
->whereIn('id', $arrayOfIds)
->selectSub($firstActivity, 'start_date')
->paginate(25);
Here is the query as it's being executed and throwing an error:
select
`id`,
`first_name`,
`last_name`,
((select `created_at`
from `live_activity`
where `user_id` = `id`)
union
(select `created_at`
from `test_activity`
where `user_id` = `id`
limit 1) as `start_date`
from `users`
where `id` in (....)
limit 25
offset 0)
The error I get is this
Query 1 ERROR: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'union
How can I make this work? And, is there a better, more efficient way to perform this kind of query with Laravel? Thank you.
UPDATE:
I've provided sample data for anyone willing to help. Here is the query I am now trying to run:
select
id,
first_name,
last_name,
(select created_at
from test_activity
where user_id = users.id
union
select created_at
from live_activity
where user_id = users.id
order by created_at asc
limit 1) as start_date
from users;
How can I convert this to eloquent?
In a lot of cases, joins are faster than subqueries. I recommend working in plain SQL first and testing which version is faster before writing it in Laravel, since you aren't using the ORM here anyway, and then rewriting the best query with the query builder.
The join approach would join all three tables: join users to the MIN(created_at) per user from both test_activity and live_activity, and then use select LEAST(test_created_at, live_created_at) as start_date. Note that in MySQL LEAST returns NULL if either argument is NULL, so a user with rows in only one of the tables needs a COALESCE fallback.
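Here is a rough sketch of a join-based version, run against the sample data from the question. It is a variant of the idea rather than the exact query described: instead of LEAST over two per-table aggregates, it joins users to a UNION ALL of both activity tables and takes MIN(created_at) per user, which also copes with users that appear in only one table. Python's built-in sqlite3 is used purely as a stand-in for MySQL so the example is runnable; that choice and the variable names are my assumptions.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, first_name TEXT, last_name TEXT);
INSERT INTO users (id, first_name, last_name)
VALUES (1, 'Craig', 'Smith'), (2, 'Bill', 'Nye'), (3, 'Bloop', 'Blop');
CREATE TABLE test_activity (id INTEGER, user_id INTEGER, created_at DATE);
INSERT INTO test_activity (id, user_id, created_at)
VALUES (1, 1, '2019-04-29'), (2, 2, '2019-03-28'), (3, 3, '2019-04-28');
CREATE TABLE live_activity (id INTEGER, user_id INTEGER, created_at DATE);
INSERT INTO live_activity (id, user_id, created_at)
VALUES (1, 1, '2019-04-27'), (2, 2, '2019-03-29'), (3, 3, '2019-04-27');
""")

# Join users to the combined activity rows and keep the oldest date per user.
start_dates = conn.execute("""
SELECT u.id, u.first_name, u.last_name, MIN(a.created_at) AS start_date
FROM users u
LEFT JOIN (
    SELECT user_id, created_at FROM test_activity
    UNION ALL
    SELECT user_id, created_at FROM live_activity
) a ON a.user_id = u.id
GROUP BY u.id, u.first_name, u.last_name
ORDER BY u.id
""").fetchall()

for row in start_dates:
    print(row)
```

The same SQL should be a reasonable starting point for timing against the correlated subquery in MySQL before porting either one to the query builder.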
Related
I'm trying to achieve something in Laravel/MySQL and cannot seem to be pointed in the right direction for a solution. I can achieve what I am looking for with subqueries, but I have been told they are not as efficient as joins. And, I'm going to have to convert the solution for this into Eloquent/Query Builder, and the way I have it working with subqueries and unions doesn't seem to convert easily.
What I am trying to do is select one row from two possible tables, based on the created_at date of the row. I want to join this created_at value with my users table as a new column called started_at. Here is some sample data and how I can achieve the query with a subquery/union of the two possible tables that I can get the data from:
CREATE TABLE users (
id INTEGER,
first_name TEXT,
last_name TEXT
);
INSERT INTO users (id, first_name, last_name)
VALUES
(1, 'Craig', 'Smith'),
(2, 'Bill', 'Nye'),
(3, 'Bloop', 'Blop');
CREATE TABLE old_activity (
id INTEGER,
user_id INTEGER,
firm_id INTEGER,
amount INTEGER,
created_at DATE
);
INSERT INTO old_activity (id, user_id, firm_id, amount, created_at)
VALUES
(1, 1, 3, 5.24, '2019-04-29'),
(2, 2, 7, 4, '2019-03-28'),
(3, 3, 4, 6.99, '2019-04-28');
CREATE TABLE new_activity (
id INTEGER,
user_id INTEGER,
firm_id INTEGER,
plays INTEGER,
saves INTEGER,
created_at DATE
);
INSERT INTO new_activity (id, user_id, firm_id, plays, saves, created_at)
VALUES
(1, 1, 3, 10, 1, '2019-04-27'),
(2, 2, 3, 12, 2, '2019-03-29'),
(3, 3, 3, 6, 3, '2019-04-27');
CREATE TABLE firms (
id INTEGER,
name TEXT
);
INSERT INTO firms (id, name)
VALUES
(1, 'apple'),
(2, 'banana'),
(3, 'orange');
select
id,
first_name,
last_name,
(select created_at from old_activity
where user_id = users.id
union
select created_at from new_activity
where user_id = users.id
order by created_at asc
limit 1) as started_at
from users
The query should only return the oldest created_at for a particular user in one of the two activity tables.
How can I achieve this with a join? Any help with this would be greatly appreciated.
Hmmm . . . You could always use window functions:
select u.*, a.created_at as started_at
from users u left join
(select a.*,
row_number() over (partition by a.user_id order by a.created_at asc) as seqnum
from ((select oa.user_id, oa.created_at from old_activity oa) union all
(select na.user_id, na.created_at from new_activity na)
) a
) a
on a.user_id = u.id and a.seqnum = 1
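To try the window function approach on the sample data, here is a small sketch using Python's sqlite3 as a stand-in for MySQL 8 (window functions need SQLite 3.25 or newer). It selects only the columns the two activity tables share, orders ascending so that row 1 is the oldest activity, and adds a hypothetical source column (my addition, not in the original schema) to show which table the row came from.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8 (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, first_name TEXT, last_name TEXT);
INSERT INTO users (id, first_name, last_name)
VALUES (1, 'Craig', 'Smith'), (2, 'Bill', 'Nye'), (3, 'Bloop', 'Blop');
CREATE TABLE old_activity (id INTEGER, user_id INTEGER, firm_id INTEGER,
                           amount INTEGER, created_at DATE);
INSERT INTO old_activity (id, user_id, firm_id, amount, created_at)
VALUES (1, 1, 3, 5.24, '2019-04-29'), (2, 2, 7, 4, '2019-03-28'),
       (3, 3, 4, 6.99, '2019-04-28');
CREATE TABLE new_activity (id INTEGER, user_id INTEGER, firm_id INTEGER,
                           plays INTEGER, saves INTEGER, created_at DATE);
INSERT INTO new_activity (id, user_id, firm_id, plays, saves, created_at)
VALUES (1, 1, 3, 10, 1, '2019-04-27'), (2, 2, 3, 12, 2, '2019-03-29'),
       (3, 3, 3, 6, 3, '2019-04-27');
""")

# Number each user's combined activity rows from oldest to newest, keep row 1.
first_activity = conn.execute("""
SELECT u.id, u.first_name, u.last_name, a.source, a.created_at AS started_at
FROM users u
LEFT JOIN (
    SELECT act.*,
           ROW_NUMBER() OVER (PARTITION BY act.user_id
                              ORDER BY act.created_at ASC) AS seqnum
    FROM (
        SELECT 'old' AS source, user_id, created_at FROM old_activity
        UNION ALL
        SELECT 'new' AS source, user_id, created_at FROM new_activity
    ) act
) a ON a.user_id = u.id AND a.seqnum = 1
ORDER BY u.id
""").fetchall()

for row in first_activity:
    print(row)
```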
Schemas
-- First table
CREATE TABLE assignments (
id int,
uid int,
comments varchar(255),
assignmentdate date,
status int
);
INSERT INTO assignments (id, uid, comments, assignmentdate, status)
values (1, 6, 'a', '2019-07-15', 0), (2, 6, 'ab', '2019-07-15', 0),
(3, 6, 'abc', '2019-07-14', 0), (4, 6, 'abc', '2019-07-14', 1)
, (5, 7, 'xyz', '2019-07-14', 1), (6, 7, 'zyx', '2019-07-14', 1);
-- Second table
CREATE TABLE users (
id int,
username varchar(255),
status int
);
INSERT INTO users (id, username, status)
values (6, 'user1', 0), (7, 'user2', 0),
(8, 'user3', 1);
-- Third table
CREATE TABLE user_images (
id int,
uid int,
imagename varchar(255),
status int
);
INSERT INTO user_images (id, uid, imagename, status)
values (1, 6, 'abc.jpeg', 0), (2, 6, 'def.jpeg', 0), (3, 8, 'ghi.png', 1);
What I'm looking for is a query that:
1) gets the distinct, latest row of the assignments table,
2) joins the users table to get the matching row, and then
3) joins the distinct, latest row of the user_images table.
So far I have gone through this answer.
My trial query:
SELECT
p.*,
u.username,
groupedpi.*
FROM
assignments p
INNER JOIN(
SELECT
comments,
MAX(id) AS latest
FROM
assignments
WHERE
STATUS
= 0
GROUP BY
uid
) AS groupedp
ON
groupedp.latest = p.id
LEFT JOIN users u ON
p.uid = u.id AND u.status = 0
LEFT JOIN(
SELECT
uid,
MAX(id) AS latesti,
imagename
FROM
user_images us
WHERE
STATUS = 0
GROUP BY
uid
order by id desc LIMIT 1
) AS groupedpi
ON
groupedpi.uid = p.uid
Output:
The 3rd result I'm not getting, i.e I'm not getting the distinct and latest record of the third table while joining.
Instead of abc.jpeg, I want to get def.jpeg.
MySQL is tripping you up here: with ONLY_FULL_GROUP_BY disabled, it lets you select columns that are neither aggregated nor listed in GROUP BY, and for those it returns a value from an arbitrary row in the group. That is why the groupedpi subquery pairs MAX(id) with an imagename from a different row. Remove the imagename column from the subquery (the order by ... LIMIT 1 is not needed either, and LIMIT 1 would restrict the subquery to a single user) and have it output just the uid and the max image id.
If you want the image name, join the user_images table in again on user_images.id = groupedpi.latesti (in the main query, not in the subquery that finds the latest image id).
(Note that your screenshot says latesti 2 but imagename abc.jpeg, which is not the right pairing: ID 2 is def.jpeg. When you want the latest id but also other data from the same row, you can't do it in one hit unless you use an analytic function (MySQL 8+); you have to write a subquery that finds the max id and then join it back to the same table to get the rest of the data off that row.)
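A minimal sketch of that "find the max id, then join back" pattern on the question's schema, using Python's sqlite3 as a stand-in for MySQL. The aliases gp, gi and ui are names I picked for illustration; the subqueries return only uid and the max id, and the row data comes from joining back on that id.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignments (id int, uid int, comments varchar(255),
                          assignmentdate date, status int);
INSERT INTO assignments (id, uid, comments, assignmentdate, status)
VALUES (1, 6, 'a', '2019-07-15', 0), (2, 6, 'ab', '2019-07-15', 0),
       (3, 6, 'abc', '2019-07-14', 0), (4, 6, 'abc', '2019-07-14', 1),
       (5, 7, 'xyz', '2019-07-14', 1), (6, 7, 'zyx', '2019-07-14', 1);
CREATE TABLE users (id int, username varchar(255), status int);
INSERT INTO users (id, username, status)
VALUES (6, 'user1', 0), (7, 'user2', 0), (8, 'user3', 1);
CREATE TABLE user_images (id int, uid int, imagename varchar(255), status int);
INSERT INTO user_images (id, uid, imagename, status)
VALUES (1, 6, 'abc.jpeg', 0), (2, 6, 'def.jpeg', 0), (3, 8, 'ghi.png', 1);
""")

latest = conn.execute("""
SELECT p.id, p.comments, u.username, ui.imagename
FROM assignments p
-- latest active assignment id per user
JOIN (SELECT uid, MAX(id) AS latest
      FROM assignments WHERE status = 0 GROUP BY uid) gp
  ON gp.latest = p.id
LEFT JOIN users u ON u.id = p.uid AND u.status = 0
-- latest active image id per user (ids only, no imagename here)
LEFT JOIN (SELECT uid, MAX(id) AS latesti
           FROM user_images WHERE status = 0 GROUP BY uid) gi
  ON gi.uid = p.uid
-- join back to user_images to read the rest of that row
LEFT JOIN user_images ui ON ui.id = gi.latesti
ORDER BY p.id
""").fetchall()

print(latest)
```

With the sample data this pairs the latest assignment for user 6 with def.jpeg rather than abc.jpeg.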
EDIT:: SOLVED I was using a for loop when a while loop was the correct option to print the results. Many thanks to all for contributing below.. I have left all steps below for reference but here is the solution and working code. Now to clean up my data and see how this runs with my 'not so big' data hehe!
$db = new PDO($dsn, $db_user, $db_pass);
$query = $db->prepare("SELECT brand
FROM transactions
WHERE
id IN (SELECT id FROM transactions WHERE brand = :brand1)
AND brand <> :brand1
GROUP BY brand
ORDER BY COUNT(*) DESC
LIMIT 10");
$query->bindparam(":brand1", $brand);
$query->execute();
echo "<table>";
while($row = $query->fetch(PDO::FETCH_ASSOC)) {
echo "<tr><td>".$row['brand']."</td></tr>";
}
echo "</table>";
To put into better context, I have transaction level sales data for which I want to do a very simple brand level basket analysis/affinity analysis.
EDIT:: actual schema and example working data below.
On my page I will have a dropdown box which will select a brand. For the purposes of this question 'Brand1'. And then execute a query which lists the top 10 most occurring brands which also appear in the table with the same id as the one selected in the dropdown.
The output based on the data would be
brand2
brand4
brand3
brand5
The table consists of 3 million rows, so I don't think I can load the lot into memory. I would know quite easily how to retrieve the top 10 most frequent values in a table, but doing it based on whether a row shares an id with a selected value is beyond my current level of skill.
So I call on you experts to help me take my next step towards handling big data with PHP/MySQL. How could I word such a query?
EDIT:: Attempt 1
$brand = "Brand1";
$db = new PDO($dsn, $db_user, $db_pass);
$query = $db->prepare("SELECT brand
FROM brand
WHERE
id IN (SELECT id FROM brand WHERE brand = :brand1)
AND brand <> :brand1
GROUP BY brand
ORDER BY COUNT(*) DESC
LIMIT 10");
$query->bindparam(":brand1", $brand);
$query->execute();
$row = $query->fetch(PDO::FETCH_ASSOC);
echo "<table>";
for($i=0;$i<10;$i++) {
echo "<tr><td>".$row['brand']."</td</tr>";
$i++;
}
echo "</table>";
The above returns, "Brand2" 5 times. (I'm only using small sample data like in my OP). Is it my loop that's the issue, because it did similar with both types of query suggested. Here is the schema for reference:
--
-- Database: `transactions`
--
-- --------------------------------------------------------
--
-- Table structure for table `brand`
--
CREATE TABLE `brand` (
`id` int(11) NOT NULL,
`brand` varchar(25) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--
-- Dumping data for table `brand`
--
INSERT INTO `brand` (`id`, `brand`) VALUES
(1, 'Brand1'),
(1, 'Brand1'),
(1, 'Brand2'),
(1, 'Brand3'),
(1, 'Brand4'),
(2, 'Brand1'),
(2, 'Brand2'),
(2, 'Brand3'),
(3, 'Brand1'),
(3, 'Brand2'),
(4, 'Brand1'),
(4, 'Brand2'),
(5, 'Brand1'),
(5, 'Brand2'),
(5, 'Brand4'),
(5, 'Brand5'),
(6, 'Brand2'),
(6, 'Brand3'),
(7, 'Brand1'),
(7, 'Brand2'),
(7, 'Brand3');
--
-- Indexes for dumped tables
--
--
-- Indexes for table `brand`
--
ALTER TABLE `brand`
ADD KEY `brand` (`id`,`brand`) USING BTREE;
I would express it as
SELECT brand
FROM brand
WHERE
id IN (SELECT id FROM brand WHERE brand = 'brand1')
AND brand <> 'brand1'
GROUP BY brand
ORDER BY COUNT(*) DESC
LIMIT 10;
This avoids the cost of a JOIN and removes the user selected brand that does not appear in your example result set.
As mentioned by Gordon Linoff, indexes might improve performance greatly.
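For anyone who wants to check the IN-subquery version against the sample data, here is a small sketch using Python's sqlite3 as a stand-in for MySQL. The function name top_co_occurring and the extra COUNT(*) column are my additions for illustration. Note that SQLite compares strings case-sensitively, so the brand is passed as 'Brand1'. The loop over the cursor reads every row, which is the same point as the while loop fix in the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE brand (id INTEGER NOT NULL, brand TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO brand (id, brand) VALUES (?, ?)",
    [
        (1, "Brand1"), (1, "Brand1"), (1, "Brand2"), (1, "Brand3"), (1, "Brand4"),
        (2, "Brand1"), (2, "Brand2"), (2, "Brand3"),
        (3, "Brand1"), (3, "Brand2"),
        (4, "Brand1"), (4, "Brand2"),
        (5, "Brand1"), (5, "Brand2"), (5, "Brand4"), (5, "Brand5"),
        (6, "Brand2"), (6, "Brand3"),
        (7, "Brand1"), (7, "Brand2"), (7, "Brand3"),
    ],
)


def top_co_occurring(conn, selected, max_rows=10):
    """Return (brand, count) pairs for brands sharing an id with `selected`."""
    cursor = conn.execute(
        """
        SELECT brand, COUNT(*) AS n
        FROM brand
        WHERE id IN (SELECT id FROM brand WHERE brand = :selected)
          AND brand <> :selected
        GROUP BY brand
        ORDER BY n DESC, brand
        LIMIT :max_rows
        """,
        {"selected": selected, "max_rows": max_rows},
    )
    # Iterate until the cursor is exhausted instead of fetching a single row.
    return [row for row in cursor]


result = top_co_occurring(conn, "Brand1")
for brand_name, n in result:
    print(brand_name, n)
```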
In SQL, you can express this as:
select b.brand
from brand b join
brand b1
on b.id = b1.id and b1.brand = 'Brand1' and b1.brand <> b.brand
group by b.brand
order by count(*) desc
limit 10;
You'll get some benefit in performance from an index on brand(brand, id) as well as brand(id).
Depending on the data and user requirements, I'm not sure that you'll get the performance that you want from this query. But, first get the logic to work, then work on performance.
The SQL query below says "return only 10 records, start on record 16 (OFFSET 15)":
SELECT * FROM <YOURTABLE> LIMIT 10 OFFSET 15
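A quick way to convince yourself of the numbering, sketched with Python's sqlite3 and a hypothetical items table holding the numbers 1 to 30 (the table and column names are made up for the example). An ORDER BY is added because without one the database is free to return rows in any order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with the values 1..30, one per row.
conn.execute("CREATE TABLE items (n INTEGER)")
conn.executemany("INSERT INTO items (n) VALUES (?)", [(i,) for i in range(1, 31)])

# Skip the first 15 rows, then return the next 10 (rows 16 through 25).
page = [n for (n,) in conn.execute(
    "SELECT n FROM items ORDER BY n LIMIT 10 OFFSET 15"
)]
print(page)
```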
I have the following tables:
Users {id, name}
Messages {id, user_id, cache_user_name}
What I want is to do a JOIN only when cache_user_name is NULL for performance reasons.
For example:
SELECT Messages.*, Users.name FROM Messages INNER JOIN Users ON (Messages.user_id = Users.id)
// ON (ISNULL(Messages.cache_user_name) AND ...
Is the best way to do 2 queries? One for rows without a cached name (with a join) and the other for cached rows without a join?
[EDIT]
The result I need is:
Users
ID: 1, NAME: Wiliam
Messages
ID: 1, USER_ID: 1, CACHE_USER_NAME: Wiliam
ID: 2, USER_ID: 1, CACHE_USER_NAME: null
Result
ID: 1, USER_ID: 1, CACHE_USER_NAME: Wiliam, USERS.NAME: null // No join, faster
ID: 2, USER_ID: 1, CACHE_USER_NAME: null, USERS.NAME: Wiliam // Join
You can add a WHERE ... IS NULL clause.
The optimizer will (try to) use the best performing plan.
SELECT Messages.*
, Users.name
FROM Messages
INNER JOIN Users ON (Messages.user_id = Users.id)
WHERE Messages.cache_user_name IS NULL
Edit
Given following data, what would you expect as output?
DECLARE @Users TABLE (ID INTEGER, Name VARCHAR(32))
DECLARE @Messages TABLE (ID INTEGER, User_ID INTEGER, Cache_User_Name VARCHAR(32))
INSERT INTO @Users VALUES (1, 'Wiliam')
INSERT INTO @Users VALUES (2, 'Lieven')
INSERT INTO @Users VALUES (3, 'Alexander')
INSERT INTO @Messages VALUES (1, 1, NULL)
INSERT INTO @Messages VALUES (2, 1, 'Cached Wiliam')
INSERT INTO @Messages VALUES (3, 2, NULL)
INSERT INTO @Messages VALUES (4, 3, 'Cached Alexander')
SELECT *
FROM @Users u
INNER JOIN @Messages m ON m.User_ID = u.ID
WHERE m.Cache_User_name IS NULL
SELECT m.Id, m.user_id, CACHE_USER_NAME user_name
FROM messages m
WHERE CACHE_USER_NAME IS NOT NULL
UNION ALL
SELECT m.Id, m.user_id, u.name user_name
FROM (Select * from messages Where cache_user_name IS NULL) m
JOIN users u ON (u.id = m.user_id)
Anyway, the best approach is to store cache_user_name in the messages table when the message is created. Then you won't need a join at all.
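To see what the UNION ALL split returns, here is a sketch that runs it on the small Users/Messages data set from the earlier answer, using Python's sqlite3 as a stand-in for MySQL. Column names follow the question (users.id, users.name, messages.cache_user_name); the user_name alias and the ORDER BY are only there to make the output easy to read.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, name TEXT);
INSERT INTO users VALUES (1, 'Wiliam'), (2, 'Lieven'), (3, 'Alexander');
CREATE TABLE messages (id INTEGER, user_id INTEGER, cache_user_name TEXT);
INSERT INTO messages VALUES (1, 1, NULL), (2, 1, 'Cached Wiliam'),
                            (3, 2, NULL), (4, 3, 'Cached Alexander');
""")

resolved = conn.execute("""
-- cached rows: no join needed
SELECT m.id, m.user_id, m.cache_user_name AS user_name
FROM messages m
WHERE m.cache_user_name IS NOT NULL
UNION ALL
-- uncached rows: look the name up in users
SELECT m.id, m.user_id, u.name AS user_name
FROM messages m
JOIN users u ON u.id = m.user_id
WHERE m.cache_user_name IS NULL
ORDER BY 1
""").fetchall()

for row in resolved:
    print(row)
```

Only the rows with a NULL cache take part in the join; whether that is measurably faster than a plain join is something to test on real data.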
I think the joins in the previous answers with an IS NULL where clause should work fine, but maybe we're not following your inefficiency problem. As long as users.id and messages.user_id are indexed and of the same type, that join shouldn't be slow unless you have a huge user database and lots of messages. Throw more hardware at it if it is; likely you are running a lot of traffic and can afford it. :)
Alternatively, you could handle it like this: query Messages where the cached name is null, run through the results, collect the user id for each message (and put them in an array), then query the Users table for just those ids. Then as you loop over the Messages results you can display the proper name from the array you saved. You'll have two queries, but they'll be fast.
// Note: the mysql_* functions were removed in PHP 7; the same flow works with mysqli or PDO.
$users = $messages = $user_ids = array ();
// Messages without a cached name are the ones that need a lookup
$r = mysql_query('select * from Messages where cache_user_name is null');
while ( $rs = mysql_fetch_array($r, MYSQL_ASSOC) )
{
$user_ids[] = (int) $rs['user_id'];
$messages[] = $rs;
}
// Fetch the names for just those users (assumes at least one id was collected)
$id_list = implode ( ',', array_unique($user_ids) );
$u = mysql_query("select * from Users where id in ($id_list)");
while ( $rs = mysql_fetch_array($u, MYSQL_ASSOC) )
{
$users[$rs['id']] = $rs['name'];
}
foreach ( $messages as $message )
{
echo "message {$message['id']} authored by " . $users[$message['user_id']] . "<br />\n";
}
I'm trying to join the NAME and PHOTO from the USERS table to the TRANSACTIONS table based on who is the payer or payee. It keeps telling me it can't find the table "this". What am I doing wrong?
SELECT `name`,`photo`,`amount`,`comment`,
(
CASE `payer_id`
WHEN 72823 THEN `payee_id`
ELSE `payer_id`
END
) AS `this`
FROM `transactions`
RIGHT JOIN `users` ON (`users`.`id`=`this`)
WHERE `payee_id`=72823 OR `payer_id`=72823
From the documentation about aliases:
The alias is used as the expression's column name and can be used in GROUP BY, ORDER BY, or HAVING clauses.
You can't use an alias in a join. You can use it only in the places listed above. The reason is that the alias is on a field in the result of the join. If the join were allowed to use these aliases in its definition, it would (or could) result in recursive definitions.
To solve your problem you could repeat the CASE clause in both places:
SELECT `name`,`photo`,`amount`,`comment`,
(
CASE `payer_id`
WHEN 72823 THEN `payee_id`
ELSE `payer_id`
END
) AS `this`
FROM `transactions`
RIGHT JOIN `users` ON `users`.`id`= (
CASE `payer_id`
WHEN 72823 THEN `payee_id`
ELSE `payer_id`
END
)
WHERE `payee_id`=72823 OR `payer_id`=72823
However I would probably rewrite this query as two selects and UNION them:
SELECT name, photo, amount, comment, payer_id AS this
FROM transactions
JOIN users ON users.id = payer_id
WHERE payee_id = 72823
UNION ALL
SELECT name, photo, amount, comment, payee_id AS this
FROM transactions
JOIN users ON users.id = payee_id
WHERE payer_id = 72823
Result:
'name3', 'photo3', 30, 'comment3', 3
'name1', 'photo1', 10, 'comment1', 1
'name2', 'photo2', 20, 'comment2', 2
Test data:
CREATE TABLE users (id INT NOT NULL, name NVARCHAR(100) NOT NULL, photo NVARCHAR(100) NOT NULL);
INSERT INTO users (id, name, photo) VALUES
(1, 'name1', 'photo1'),
(2, 'name2', 'photo2'),
(3, 'name3', 'photo3'),
(4, 'name4', 'photo4');
CREATE TABLE transactions (amount INT NOT NULL, comment NVARCHAR(100) NOT NULL, payer_id INT NOT NULL, payee_id INT NOT NULL);
INSERT INTO transactions (amount, comment, payer_id, payee_id) VALUES
(10, 'comment1', 72823, 1),
(20, 'comment2', 72823, 2),
(30, 'comment3', 3, 72823),
(40, 'comment4', 4, 5);
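As a sanity check, the sketch below loads the test data into Python's sqlite3 (a stand-in for MySQL) and runs both the repeated-CASE version and the UNION ALL version. One assumption to flag: it uses a plain JOIN instead of RIGHT JOIN, because older SQLite versions lack RIGHT JOIN, and the WHERE clause on the transactions columns discards the unmatched users rows anyway. An ORDER BY is added so the two results can be compared directly.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INT NOT NULL, name NVARCHAR(100) NOT NULL,
                    photo NVARCHAR(100) NOT NULL);
INSERT INTO users (id, name, photo) VALUES
(1, 'name1', 'photo1'), (2, 'name2', 'photo2'),
(3, 'name3', 'photo3'), (4, 'name4', 'photo4');
CREATE TABLE transactions (amount INT NOT NULL, comment NVARCHAR(100) NOT NULL,
                           payer_id INT NOT NULL, payee_id INT NOT NULL);
INSERT INTO transactions (amount, comment, payer_id, payee_id) VALUES
(10, 'comment1', 72823, 1), (20, 'comment2', 72823, 2),
(30, 'comment3', 3, 72823), (40, 'comment4', 4, 5);
""")

# Version 1: repeat the CASE expression in the ON clause instead of the alias.
case_rows = conn.execute("""
SELECT name, photo, amount, comment,
       CASE payer_id WHEN 72823 THEN payee_id ELSE payer_id END AS this
FROM transactions
JOIN users ON users.id = CASE payer_id WHEN 72823 THEN payee_id ELSE payer_id END
WHERE payee_id = 72823 OR payer_id = 72823
ORDER BY this
""").fetchall()

# Version 2: one SELECT per role, combined with UNION ALL.
union_rows = conn.execute("""
SELECT name, photo, amount, comment, payer_id AS this
FROM transactions
JOIN users ON users.id = payer_id
WHERE payee_id = 72823
UNION ALL
SELECT name, photo, amount, comment, payee_id AS this
FROM transactions
JOIN users ON users.id = payee_id
WHERE payer_id = 72823
ORDER BY 5
""").fetchall()

print(case_rows)
print(union_rows)
```

Both versions return the same three rows for this data; the fourth transaction does not involve user 72823 and is filtered out.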
SELECT
th.id, th.coin_id, th.coin_family, cm.coin_id, cm.current_price
FROM
trnx_history th
JOIN
fmi_coins.coins_markets cm
ON
cm.coin_id=(CASE th.coin_family WHEN 1 THEN 1 ELSE 2 END)