Fastest way to join to an IP2C table? - php

I have three tables overall, one with player names and their last login, and another table with the player name and their IP. These are from a game server, but it's two separate "plugins" of the server, so I cannot merge these into one table.
I successfully join these two on the playername column like so:
SELECT
u.`user` as `ign`,
lb.`lastlogin` as `date`,
lb.`ip`
FROM `mcmmo_users` u
LEFT JOIN `lb-players` lb
ON u.`user`=lb.`playername`
This produces rows of the form Array(ign, date, ip).
I also have an IP2C (IP-to-country) table and would like to fetch the country in the same query. That table is extremely large, though, and a standard LEFT JOIN against it would slow the query down heavily.
Is there a quicker way to join this? I would prefer to not query on every PHP loop of the data.
I am using MySQL and PHP
The IP2C database is laid out as follows:
begin_ip | end_ip | begin_ip_num | end_ip_num | country_code | country_name
And is queried as follows:
$IPNUM = sprintf("%u",ip2long($ip));
SELECT `country_code`
FROM `cpanel_ip2c`
WHERE $IPNUM BETWEEN `begin_ip_num` AND `end_ip_num`

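A BETWEEN condition on two different columns is hard for a database to optimize. Instead, consider looking up the last IP block whose begin_ip_num is less than or equal to the user's IP, and then checking that the IP does not exceed that block's end_ip_num: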
select *
from mcmmo_users u
left join
`lb-players` lb
on u.user = lb.playername
left join
cpanel_ip2c ip
on ip.begin_ip_num =
(
select begin_ip_num
from cpanel_ip2c ip
where ip.begin_ip_num <= inet_aton(lb.ip)
order by
ip.begin_ip_num desc
limit 1
)
and inet_aton(lb.ip) <= ip.end_ip_num
With an index on cpanel_ip2c(begin_ip_num), the country can be resolved with an index seek.
Here's an example on SQL Fiddle, with the mcmmo_users table omitted for simplicity.
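The lookup can also be tried locally. The sketch below is not the original fiddle: it uses Python's sqlite3 module as a stand-in for MySQL, registers a small inet_aton helper to imitate MySQL's INET_ATON(), and uses made-up table contents (players, ip2c and the IP ranges are assumptions for illustration only).

```python
import ipaddress
import sqlite3


def inet_aton(ip):
    """Rough stand-in for MySQL's INET_ATON(): dotted IPv4 text -> integer."""
    return None if ip is None else int(ipaddress.IPv4Address(ip))


conn = sqlite3.connect(":memory:")
conn.create_function("inet_aton", 1, inet_aton)
conn.executescript("""
    CREATE TABLE players (playername TEXT, ip TEXT);
    -- The primary key doubles as the index on begin_ip_num.
    CREATE TABLE ip2c (
        begin_ip_num INTEGER PRIMARY KEY,
        end_ip_num   INTEGER,
        country_code TEXT
    );
""")

# Made-up ranges and players, for illustration only.
ranges = [("1.0.0.0", "1.0.0.255", "AU"), ("8.8.8.0", "8.8.8.255", "US")]
conn.executemany(
    "INSERT INTO ip2c VALUES (?, ?, ?)",
    [(inet_aton(b), inet_aton(e), code) for b, e, code in ranges],
)
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [("alice", "8.8.8.8"), ("bob", "1.0.0.77"), ("carol", "5.5.5.5")],
)

# For each player: find the last block starting at or before the IP,
# then keep it only if the IP is not past the end of that block.
rows = conn.execute("""
    SELECT p.playername, c.country_code
    FROM players p
    LEFT JOIN ip2c c
      ON c.begin_ip_num = (
            SELECT r.begin_ip_num
            FROM ip2c r
            WHERE r.begin_ip_num <= inet_aton(p.ip)
            ORDER BY r.begin_ip_num DESC
            LIMIT 1
         )
     AND inet_aton(p.ip) <= c.end_ip_num
    ORDER BY p.playername
""").fetchall()

print(rows)
```

An IP that falls in a gap between blocks (carol here) gets a NULL country, because the second join condition rejects the block found by the subquery.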

Related

Optimal joins in MySQL or offloading to application layer

I have 3 tables in a MySQL database: courses, users and participants, which contain about 30mil, 30k and 3k entries respectively.
My goal is to (efficiently) figure out the number of users that have been assigned to courses that match our criteria. The criteria are a little more complex, but for this example we only care about users where deleted_at is null and courses where deleted_at is null and active is 1.
Simplified, these are the columns:
users:
id | deleted_at
1 | null
2 | 2022-01-01
courses:
id | active | deleted_at
1 | 1 | null
1 | 1 | 2020-01-01
2 | 0 | 2020-01-01
participants:
id | participant_id | course_id
1 | 1 | 1
2 | 1 | 2
3 | 2 | 2
Based on the data above, the number we would get is 1: only user 1 is not deleted, and that user is assigned to a course (id 1) that is active and not deleted.
Here is a list of what I've tried.
Joining all the tables and applying simple WHERE conditions.
Joining using subqueries.
Pulling the correct courses and users out to the application layer (PHP), and querying participants using WHERE IN.
Pulling everything out and doing the filtering in the application layer.
Using EXPLAIN to add better indexes. Admittedly, I do not do this often and may not have done it well enough.
A combination of all the above.
An example of a query would be:
SELECT COUNT(DISTINCT participant_id)
FROM `participants`
INNER JOIN
(SELECT `courses`.`id`
FROM `courses`
WHERE (`active` = '1')
AND `deleted_at` IS NULL) AS `tempCourses` ON `tempCourses`.`id` = `participants`.`course_id`
WHERE `participant_type` = 'Eloomi\\Models\\User'
AND `participant_id` in
(SELECT `users`.`id`
FROM `users`
WHERE `users`.`deleted_at` IS NULL)
From what I can gather, doing this will create a massive intermediate table, and only then will the WHERE conditions be applied. In my mind it should be possible to short-circuit a lot of that, because once we get a match for a user we can disregard that user going forward. That is how I would handle it in the application layer.
We could do this on a per-user basis in the application layer, but the number of requests to the database would make this a bad solution.
I have tagged it as PHP as well as MySQL, not because it has to be PHP but because I do not mind offloading some parts to the application layer if that is required. It's my experience that joins do not always use indexes optimally.
Edit:
To specify my question: can someone help me find an efficient way to pull out the number of non-deleted users that have been assigned to active, non-deleted courses?
I would write it this way:
SELECT COUNT(DISTINCT p.participant_id)
FROM courses AS c
INNER JOIN participants AS p
ON c.id = p.course_id
INNER JOIN users AS u
ON p.participant_id = u.id
WHERE u.deleted_at IS NULL
AND c.active = 1 AND c.deleted_at IS NULL
AND p.participant_type = 'Eloomi\\Models\\User';
MySQL may join the tables in another order, not the order you list the tables in the query.
I hope that courses is the first table MySQL accesses, because it's probably the smallest table, especially after filtering by active and deleted_at. The following index will help to narrow down that filtering, so only matching rows are examined:
ALTER TABLE courses ADD KEY (active, deleted_at);
In InnoDB, every secondary index implicitly has the table's primary key (e.g. id) appended as the last column. Because that column is part of the index, it can be used in the join to participants. So you need an index in participants that the join can use to find the corresponding rows in that table. The order of columns in the index is important.
ALTER TABLE participants ADD KEY (course_id, participant_type, participant_id);
The participant_id is used to join to the users table. MySQL's optimizer will probably prefer to join to users by its primary key, but you also want to restrict that by deleted_at, so you might need this index:
ALTER TABLE users ADD KEY id_deleted_at (id, deleted_at);
And you might need to use an index hint, which refers to the index by its name, to coax the optimizer to prefer this secondary index over the primary key index.
SELECT COUNT(DISTINCT p.participant_id)
FROM courses AS c
INNER JOIN participants AS p
ON c.id = p.course_id
INNER JOIN users AS u USE INDEX (id_deleted_at)
ON p.participant_id = u.id
WHERE u.deleted_at IS NULL
AND c.active = 1 AND c.deleted_at IS NULL
AND p.participant_type = 'Eloomi\\Models\\User';
MySQL knows how to use compound indexes even if some conditions are in join clauses and other conditions are in the WHERE clause.
Caveat: I have not tested this. Choosing indexes may take several tries, and testing the EXPLAIN after each try.
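To check the query logic (not its performance), here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The rows mirror the sample data in the question, except that the course ids are made unique and a participant_type column with the placeholder value 'User' is added; both are assumptions, and the index names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, deleted_at TEXT);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, active INTEGER, deleted_at TEXT);
    CREATE TABLE participants (
        id INTEGER PRIMARY KEY,
        participant_id INTEGER,
        participant_type TEXT,
        course_id INTEGER
    );
    -- Same column orders as the suggested MySQL indexes (names are hypothetical).
    CREATE INDEX courses_active_deleted ON courses (active, deleted_at);
    CREATE INDEX participants_course_type_user
        ON participants (course_id, participant_type, participant_id);

    INSERT INTO users VALUES (1, NULL), (2, '2022-01-01');
    INSERT INTO courses VALUES
        (1, 1, NULL), (2, 1, '2020-01-01'), (3, 0, '2020-01-01');
    INSERT INTO participants VALUES
        (1, 1, 'User', 1), (2, 1, 'User', 2), (3, 2, 'User', 2);
""")

# Count distinct non-deleted users assigned to active, non-deleted courses.
count = conn.execute("""
    SELECT COUNT(DISTINCT p.participant_id)
    FROM courses AS c
    INNER JOIN participants AS p ON c.id = p.course_id
    INNER JOIN users AS u ON p.participant_id = u.id
    WHERE u.deleted_at IS NULL
      AND c.active = 1 AND c.deleted_at IS NULL
      AND p.participant_type = ?
""", ("User",)).fetchone()[0]

print(count)
```

SQLite's planner differs from MySQL's, so this only confirms that the joins and filters return the expected number; index choice still has to be verified with EXPLAIN on the real database.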

MySQL join query duplicates users in output

I have the following tables
ea_users
id
first_name
last_name
email
password
id_roles
ea_user_cfields
id
c_id = custom field ID
u_id = user ID
data
ea_customfields
id
name = name of custom field
description
I want to get all users which have a certain role, but I also want to retrieve all the custom fields per user. This is for the backend of my software where all the ea_users and custom fields should be shown.
I tried the following, but for each custom field, it duplicates the same user
$this->db->join('(SELECT GROUP_CONCAT(data) AS custom_data, id AS dataid, u_id, c_id
FROM ea_user_cfields userc
GROUP BY id) AS tt', 'tt.u_id = ea.id','left');
$this->db->join('(SELECT GROUP_CONCAT(name) AS custom_name, id AS customid
FROM ea_customfields AS cf
GROUP BY id) AS te', 'tt.c_id = te.customid','left');
$this->db->where('id_roles', $customers_role_id);
return $this->db->get('ea_users ea')->result_array();
The problem is that you have not fully understood how joins work.
It is normal to get duplicates in the result when the tables have a one-to-many relation.
In short, your case: the engine fetches data from table "A" (ea_users) and then joins another table "B" (ea_user_cfields) according to the join conditions. If there is a one-to-many relation between the tables (meaning one record in table "A", call it A1, can have several related rows in table "B", call them B1.1, B1.2, B1.3 and B1.4), it joins these records and puts the join result in memory. So in memory you would see something like
| FromTable A | FromTableB |
| A1 | B1.1 |
| A1 | B1.2 |
| A1 | B1.3 |
| A1 | B1.4 |
If you have 10 records in table "B" that are related to one record in table "A", it puts 10 copies of the data from table "A" in memory while fetching, and then returns them to you.
Depending on the join type, rows with no related records can be skipped entirely (INNER JOIN) or filled with NULLs (LEFT JOIN or RIGHT JOIN), etc.
When you think about JOINs, imagine joining a few big tables on paper. You would always need to mark which data comes from which table in order to work with it later, so it is quite logical to write row "A1" from table "A" as many times as needed to fill the empty spaces when you find a matching record in table "B". Otherwise you would have something like this on your paper:
| FromTable A | FromTableB |
| A1 | B1.1 |
| | B1.2 |
| | B1.3 |
| | B1.4 |
Yes, it looks fine even when the "FromTable A" column contains empty cells, as long as you have 5-10 records and can easily work with them (for example, you can sort them in your head; you just need to imagine what belongs in each empty space, but for that you have to remember the order in which you wrote the data on the paper). But let's assume you have 100-1000 records. If you can still sort that easily, let's make things more complicated and say that values in table "A" can themselves be empty, and so on. That's why it is simpler for the MySQL engine to repeat the data from the table many times.
Basically, I always fall back on this kind of example: imagine how you would join huge tables on paper, or select something from them and then sort it, how you would look through the tables, etc.
GROUP_CONCAT, grouping
The next mistake is that you have not understood how GROUP_CONCAT works:
The MySQL engine first builds the result set in memory, applying all WHERE conditions, evaluating subqueries and appending all joins. Once that structure is loaded, it performs the grouping: it selects from the temporary table all rows related to "A1" and then applies the aggregate function to the selected rows. GROUP_CONCAT means we want to concatenate the values in the selected group, so we would see something like "B1.1, B1.2, B1.3, B1.4". That's it in a few words, but I hope it helps a little.
I found a sample table structure online so you can try some queries against it.
http://www.mysqltutorial.org/tryit/query/mysql-left-join/#1
And here is an example of how GROUP_CONCAT works; try executing this query there:
SELECT
c.customerNumber, c.customerName, GROUP_CONCAT(orderNumber) AS allOrders
FROM customers c
LEFT JOIN orders o ON (c.customerNumber = o.customerNumber)
GROUP BY 1,2
;
You can compare the results with the previous one.
The power of GROUP BY lies in the aggregate functions you can use with it, for example COUNT(), MAX(), GROUP_CONCAT() and many others.
Or an example of fetching a count (try executing it; counting o.orderNumber rather than * keeps customers without orders at 0):
SELECT c.customerName, COUNT(o.orderNumber) AS ordersCount
FROM customers AS c
LEFT JOIN orders AS o ON (c.customerNumber = o.customerNumber)
GROUP BY 1
;
So my opinion:
It is simpler and better to solve this on the client side or in the backend, after fetching, because from the MySQL engine's point of view a response with duplicated column values is absolutely correct. Of course, you can also solve it using grouping with concatenation, for example, but I have a feeling that would overcomplicate the logic for your task.
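As a rough illustration of solving it after fetching: the sketch below folds the duplicated join rows into one entry per user. It is written in Python rather than PHP, and the row layout (id, first_name, field_name, data) and the sample values are assumptions, not the asker's actual result set.

```python
def group_custom_fields(rows):
    """Collapse one-row-per-custom-field join output into one dict per user."""
    users = {}
    for row in rows:
        # Create the user entry the first time this id is seen.
        user = users.setdefault(
            row["id"],
            {"id": row["id"], "first_name": row["first_name"], "custom_fields": {}},
        )
        # A LEFT JOIN yields NULLs for users that have no custom fields.
        if row["field_name"] is not None:
            user["custom_fields"][row["field_name"]] = row["data"]
    return list(users.values())


# Hypothetical rows, shaped like the output of a LEFT JOIN across the
# users, user-fields and field-definition tables.
rows = [
    {"id": 1, "first_name": "Ann", "field_name": "Phone", "data": "555-0100"},
    {"id": 1, "first_name": "Ann", "field_name": "City", "data": "Oslo"},
    {"id": 2, "first_name": "Bob", "field_name": None, "data": None},
]

users = group_custom_fields(rows)
print(users)
```

Each user then appears once, with the custom fields collected in a nested mapping, which is usually the shape a backend view needs anyway.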
PS.
"GROUP BY 1" means group by the first column of the SELECT list: after selecting the data into memory, MySQL groups all rows by that column. It is better not to use this shorthand in production. It is the same as naming the column explicitly, e.g. "GROUP BY c.customerNumber" when that is the first selected column.
PPS. I also read comments like "use DISTINCT", etc.
To use DISTINCT or ordering functions you need to understand how they work, because incorrect usage can remove data from your selection (the same goes for GROUP BY, INNER JOIN, etc.). At first glance your code might work fine, but it can introduce logic bugs, which are the hardest to track down later.
Moreover, DISTINCT will not help you when you have a one-to-many relation (as in your particular case). You can try executing these queries:
SELECT
c.customerName, orderNumber AS nr
FROM customers c
INNER JOIN orders o ON (c.customerNumber = o.customerNumber)
WHERE c.customerName='Alpha Cognac'
;
SELECT
DISTINCT(c.customerName), orderNumber AS nr
FROM customers c
INNER JOIN orders o ON (c.customerNumber = o.customerNumber)
WHERE c.customerName='Alpha Cognac'
;
The result should be the same: duplicated values in the customer name column, each with a different order number.
And an example of how to lose data with an incorrect query ;):
SELECT
c.customerName, orderNumber AS nr
FROM customers c
INNER JOIN orders o ON (c.customerNumber = o.customerNumber)
WHERE c.customerName='Alpha Cognac'
GROUP BY 1
;

Retrieve values from multiple related tables

So, I have a table named clients, another one named orders, and two others, orders_type_a and orders_type_b.
What I'm trying to do is write a query that returns the list of all clients and, for each client, the number of orders for that client's id and the amount of money the client has already spent.
And... I have no idea how to do that. I know the logic behind it, but can't figure out how to translate it into a MySQL query.
I have a basic-to-I-think-I'm-good-but-I'm-not knowledge of MySQL, but this situation has really confused me.
Here is an image to better illustrate the process I'm trying to implement:
Useful extra information:
Each orders row has only one type (which is A or B)
Each orders row can have multiple orders_type_X rows (where X is A or B)
orders relates to clients through the column client_id
orders_type_X relates to orders through the column order_id
Today this is done by running one query to retrieve the clients, and then, for each row returned, the PHP code runs another query to retrieve the orders and yet another to retrieve the values. So for each row returned by the first query there are two more queries inside it. Needless to say, this is a horrible approach, the performance sucks, and that's the reason I want to change it.
UPDATE with table columns:
clients:
id | name | phone
orders:
id | client_id | date
orders_type_a:
id | order_id | number_of_items | price_of_single_item
orders_type_b:
id | order_id | number_of_shoes_11 | number_of_shoes_12 | number_of_shoes_13 | price_of_single_shoe
For any extra info needed, just ask.
If I understand you correctly, you are looking for something like this?
select c.*, SUM(oa.value) + SUM(ob.value) as total
from clients c
inner join orders o on c.order_id = o.id
inner join orders_type_a oa on oa.id = o.order_type_id AND o.type = 'A'
inner join orders_type_b ob on ob.id = o.order_type_id AND o.type = 'B'
group by c.id
I do not know your actual field names, but this returns the information on each customer plus a single field 'total' that contains the sum of the values of all the orders of both type A and type B. You might have to tweak the various names to get it to work, but does this get you in the right direction?
Erik's answer is on the right track. However, since there could be multiple orders_type_a and orders_type_b records for each order, it is a little more complex:
SELECT c.id, c.name, c.phone, SUM(x.total) as total
FROM clients c
INNER JOIN orders o
ON o.client_id = c.id
INNER JOIN (
SELECT order_id, SUM(number_of_items * price_of_single_item) as total
FROM orders_type_a
GROUP BY order_id
UNION ALL
SELECT order_id, SUM((number_of_shoes_11 + number_of_shoes_12 + number_of_shoes_13) * price_of_single_shoe) as total
FROM orders_type_b
GROUP BY order_id
) x
ON x.order_id = o.id
GROUP BY c.id
;
I'm making a few assumptions about how to calculate the total based on the columns in the orders_type_x tables.
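A small runnable sketch of this approach, using Python's sqlite3 module as a stand-in for MySQL and made-up rows. It groups each branch of the UNION ALL by order_id so every order gets its own subtotal, and it adds a num_orders column (an addition not in the answer above) because the question also asks for the order count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, phone TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER, date TEXT);
    CREATE TABLE orders_type_a (
        id INTEGER PRIMARY KEY, order_id INTEGER,
        number_of_items INTEGER, price_of_single_item INTEGER
    );
    CREATE TABLE orders_type_b (
        id INTEGER PRIMARY KEY, order_id INTEGER,
        number_of_shoes_11 INTEGER, number_of_shoes_12 INTEGER,
        number_of_shoes_13 INTEGER, price_of_single_shoe INTEGER
    );
    -- Made-up sample data: Ann has a type A and a type B order, Bob one type B.
    INSERT INTO clients VALUES (1, 'Ann', '111'), (2, 'Bob', '222');
    INSERT INTO orders VALUES
        (1, 1, '2020-01-01'), (2, 1, '2020-01-02'), (3, 2, '2020-01-03');
    INSERT INTO orders_type_a VALUES (1, 1, 2, 10), (2, 1, 1, 5);
    INSERT INTO orders_type_b VALUES (1, 2, 1, 0, 2, 20), (2, 3, 1, 1, 1, 10);
""")

rows = conn.execute("""
    SELECT c.id, c.name,
           COUNT(DISTINCT o.id) AS num_orders,
           SUM(x.total)         AS total
    FROM clients c
    INNER JOIN orders o ON o.client_id = c.id
    INNER JOIN (
        -- One subtotal row per order, from whichever type table holds it.
        SELECT order_id, SUM(number_of_items * price_of_single_item) AS total
        FROM orders_type_a
        GROUP BY order_id
        UNION ALL
        SELECT order_id,
               SUM((number_of_shoes_11 + number_of_shoes_12 + number_of_shoes_13)
                   * price_of_single_shoe) AS total
        FROM orders_type_b
        GROUP BY order_id
    ) x ON x.order_id = o.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

print(rows)
```

Because of the inner joins, clients without any orders are left out; switching both joins to LEFT JOIN would keep them, with a NULL total.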

Complicated MySQL Database Query

I have the following database structure:
Sites table
id | name | other_fields
Backups table
id | site_id | initiated_on(unix timestamp) | size(float) | status
So the Backups table has a many-to-one relationship with the Sites table, connected via site_id.
And I would like to output the data in the following format
name | Latest initiated_on | status of the latest initiated_on row
And I have the following SQL query
SELECT *, `sites`.`id` as sid, SUM(`backups`.`size`) AS size
FROM (`sites`)
LEFT JOIN `backups` ON `sites`.`id` = `backups`.`site_id`
WHERE `sites`.`id` = '1'
GROUP BY `sites`.`id`
ORDER BY `backups`.`initiated_on` desc
The thing is, with the above query I can achieve what I am looking for, but the only problem is I don't get the latest initiated_on values.
So if I had 3 rows in backups with site_id=1, the query does not pick out the row with the highest value in initiated_on. It just picks out any row.
Please help, and
thanks in advance.
You should try:
SELECT sites.name, FROM_UNIXTIME(b.latest) as latest, b.size, b.status
FROM sites
LEFT JOIN
( SELECT bg.site_id, bg.latest, bg.sizesum AS size, bu.status
FROM
( SELECT site_id, MAX(initiated_on) as latest, SUM(size) as sizesum
FROM backups
GROUP BY site_id ) bg
JOIN backups bu
ON bu.initiated_on = bg.latest AND bu.site_id = bg.site_id
) b
ON sites.id = b.site_id
In the GROUP BY subquery (bg here), the only columns you can use in the SELECT list are columns that are either aggregated by a function or listed in the GROUP BY clause.
http://dev.mysql.com/doc/refman/5.5/en/group-by-hidden-columns.html
Once you have all the aggregate values, you need to join the result back to backups to find the other values of the row with the latest timestamp (b).
Finally, join the result to the sites table to get the names, using a left join if you want to list all sites, even those without a backup.
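The three steps above (aggregate per site, join back to backups, then join to sites) can be sketched with Python's sqlite3 module standing in for MySQL. The rows are made up, and FROM_UNIXTIME is left out because SQLite does not have it, so the raw timestamp is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE backups (
        id INTEGER PRIMARY KEY, site_id INTEGER,
        initiated_on INTEGER, size REAL, status TEXT
    );
    -- Made-up data: 'gamma' has no backups at all.
    INSERT INTO sites VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    INSERT INTO backups VALUES
        (1, 1, 100, 1.5, 'ok'), (2, 1, 300, 2.5, 'failed'),
        (3, 1, 200, 1.0, 'ok'), (4, 2, 150, 4.0, 'ok');
""")

rows = conn.execute("""
    SELECT s.name, b.latest, b.size, b.status
    FROM sites s
    LEFT JOIN (
        SELECT bg.site_id, bg.latest, bg.sizesum AS size, bu.status
        FROM (
            -- Step 1: per-site aggregates only.
            SELECT site_id, MAX(initiated_on) AS latest, SUM(size) AS sizesum
            FROM backups
            GROUP BY site_id
        ) bg
        -- Step 2: join back to pick up the status of the latest row.
        JOIN backups bu
          ON bu.site_id = bg.site_id AND bu.initiated_on = bg.latest
    ) b ON s.id = b.site_id   -- Step 3: attach the site name.
    ORDER BY s.id
""").fetchall()

print(rows)
```

If two backups of the same site share the same latest timestamp, the join in step 2 returns both rows, so a tie-breaker (for example the highest backup id) would be needed in that case.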
Try with this:
select S.name, B.initiated_on, B.status
from sites as S left join backups as B on S.id = B.site_id
and B.initiated_on =
(select max(initiated_on)
from backups
where site_id = S.id)
To get the latest time, you need to make a subquery like this:
SELECT sites.id as sid,
SUM(backups.size) AS size,
latest.time AS latesttime
FROM sites AS sites
LEFT JOIN (SELECT site_id,
MAX(initiated_on) AS time
FROM backups
GROUP BY site_id) AS latest
ON latest.site_id = sites.id
LEFT JOIN backups
ON sites.id = backups.site_id
WHERE sites.id = 1
GROUP BY sites.id
ORDER BY backups.initiated_on desc
I have removed the SELECT * because combining it with GROUP BY only works in MySQL and is generally bad practice anyway. Non-MySQL RDBMSs will throw an error if you include the other fields, even individually, so there you would need to turn this query into a subquery and INNER JOIN it to the sites table to get the rest of the fields. This is because they would require all of those fields in the GROUP BY clause, which fails (or is at least very slow) if you have long text fields.

Complex SQL query, need to sort via count based upon time constraints

Hi guys I have the following three tables here.
COUNTRIES
ID | Name | Details
Airports
ID | NAME | CountryID
Trips
ID | AirportID | Date
I have to retrieve a list showing the following:
AirportID | Airport Name | Country Name | Number of Trips Made Between Date1 and Date2
I need this to be really efficient. What kind of indexes do I need to set up, and how would I formulate the SQL query here? I would be displaying this using PHP. Note that I need to be able to sort by the number of trips made.
EDIT ==
Oops forgot to mention my sql:
I've tried the following:
SELECT `c`.*, `t`.`country` AS `country_name`, COUNT(f.`id`) AS `num_trips` FROM `airports` AS `c`
LEFT JOIN `countries` AS `t` ON t.`id` = c.`country_id`
LEFT JOIN `trips` AS `f` ON f.`airportid` = c.`id` GROUP BY `c`.`id` ORDER BY `num_trips` ASC LIMIT 10
It works but takes a really long time to execute. Also consider that my airports table has over 30,000 entries and the size of the trips table varies.
I'm just taking the name of the country from the countries table. Would it be better to leave the countries table out of the SQL join and instead retrieve the country name from an array where the index is the ID and the values are the country names?
I'm not sure why you're using left joins. If every trip has an airport and every airport has a country, an inner join would give you accurate results.
I would do this:
select a.ID as AirportID, a.Name as AirportName, c.Name as CountryName, count(t.id) as NumTrips
from Trips t
inner join Airports a on t.AirportID = a.ID
inner join Countries c on a.CountryID = c.ID
where t.Date >= #StartDate
and t.Date <= #EndDate
group by AirportID, AirportName, CountryName
order by NumTrips
limit 10
Replace the #StartDate and #EndDate with your appropriate values.
Not sure what you're looking for in results, but I would expect you want the most trips. In that case you would want to do "order by NumTrips desc". This will show the highest values first, especially since you're limiting it to 10.
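Here is a small sketch of that query with the dates passed as bound parameters instead of being pasted into the SQL. It uses Python's sqlite3 module as a stand-in for MySQL; the snake_case column names, the trip_date column, the index name and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE airports (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    CREATE TABLE trips (id INTEGER PRIMARY KEY, airport_id INTEGER, trip_date TEXT);
    -- Hypothetical index covering the date filter and the join column.
    CREATE INDEX trips_date_airport ON trips (trip_date, airport_id);

    INSERT INTO countries VALUES (1, 'Norway'), (2, 'Japan');
    INSERT INTO airports VALUES (1, 'OSL', 1), (2, 'BGO', 1), (3, 'HND', 2);
    INSERT INTO trips VALUES
        (1, 1, '2023-01-05'), (2, 1, '2023-01-20'), (3, 3, '2023-01-10'),
        (4, 3, '2023-02-01'), (5, 2, '2022-12-31'), (6, 1, '2023-01-31');
""")


def top_airports(conn, start_date, end_date, limit=10):
    """Airports with the most trips in [start_date, end_date], busiest first."""
    return conn.execute("""
        SELECT a.id, a.name, c.name, COUNT(t.id) AS num_trips
        FROM trips t
        INNER JOIN airports a ON t.airport_id = a.id
        INNER JOIN countries c ON a.country_id = c.id
        WHERE t.trip_date >= ? AND t.trip_date <= ?
        GROUP BY a.id, a.name, c.name
        ORDER BY num_trips DESC
        LIMIT ?
    """, (start_date, end_date, limit)).fetchall()


rows = top_airports(conn, "2023-01-01", "2023-01-31")
print(rows)
```

Airports with no trips in the range (BGO here) do not appear, which is the effect of the inner joins; a left join from airports would list them with a count of 0.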
Also, I suggest you rename your "Date" column to something that won't collide with reserved SQL words. I usually use "DateCreated" or "DateOfTravel" or something like that.
If I made any poor assumptions let me know and I can re-write this.
Edit:
For indexes, create them on fields you will be looking up on. In other words, primary keys (which should always be indexed), foreign keys, and in this case it looks like the Date column would be the other important index. However, if you plan on searching by "Airport Name", then add an index there. I think you see where this is headed, etc.
Indexes on airports(countryid, id) and trips(airportid) would seem the most important.
Instead of count(f.id) try count(f.airportid), so MySQL doesn't have to check the trips.id column.
