Comparison time for 2 large MySQL database tables - php

I have imported 2 .csv files that I want to compare into MySQL tables. Now I want to compare them using a join.
However, whenever I include both tables in a query, I get no response from phpMyAdmin (sometimes it shows 'max execution time exceeded').
The record count in both tables is 73k max. I don't think that's huge as data goes. Even a simple query like
SELECT *
FROM abc456, xyz456
seems to hang. I ran an EXPLAIN and got the output below. I don't know what to take from this.
id  select_type  table   type  possible_keys  key   key_len  ref   rows   Extra
1   SIMPLE       abc456  ALL   NULL           NULL  NULL     NULL  73017
1   SIMPLE       xyz456  ALL   NULL           NULL  NULL     NULL  73403  Using join buffer
Can someone please help?
UPDATE: added the structure of the table with the composite key. Around 100,000+ records will be inserted into this table.
CREATE TABLE IF NOT EXISTS `abc456` (
`Col1` varchar(4) DEFAULT NULL,
`Col2` varchar(12) DEFAULT NULL,
`Col3` varchar(9) DEFAULT NULL,
`Col4` varchar(3) DEFAULT NULL,
`Col5` varchar(3) DEFAULT NULL,
`Col6` varchar(40) DEFAULT NULL,
`Col7` varchar(200) DEFAULT NULL,
`Col8` varchar(40) DEFAULT NULL,
`Col9` varchar(40) DEFAULT NULL,
`Col10` varchar(40) DEFAULT NULL,
`Col11` varchar(40) DEFAULT NULL,
`Col12` varchar(40) DEFAULT NULL,
`Col13` varchar(40) DEFAULT NULL,
`Col14` varchar(20) DEFAULT NULL,
KEY `Col1` (`Col1`,`Col2`,`Col3`,`Col4`,`Col5`,`Col6`,`Col7`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

It looks like you are doing a pure Cartesian join in your query.
Shouldn't you be joining the tables on certain fields? If you do that and the query still takes a long time to execute, you should add appropriate indexes to speed it up.
The reason it takes so long is that the query is trying to join every single row of the first table to every single row of the second table.

You need a join condition, some way of identifying which rows should be matched up:
SELECT * FROM abc456, xyz456 WHERE abc456.id = xyz456.id
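The posted abc456 structure has no id column, but it does have a composite key whose prefix is (Col1, Col2). As a sketch, assuming xyz456 mirrors abc456 and that Col1 plus Col2 identify matching rows (the question doesn't confirm either), a keyed join could look like:
-- Hypothetical: index the join columns on the second table;
-- abc456's existing composite key already covers this prefix.
ALTER TABLE xyz456 ADD INDEX idx_match (Col1, Col2);
SELECT a.*
FROM abc456 AS a
INNER JOIN xyz456 AS x
    ON a.Col1 = x.Col1
   AND a.Col2 = x.Col2;
With indexes on the join columns, MySQL can look up matches instead of scanning all 73k x 73k row combinations.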

Add indexes on the joining columns. That should help with performance.
Use MySQL Workbench or the MySQL client (console) for long queries. phpMyAdmin is not designed to display queries that return 100k rows :)
If you REALLY have to use phpMyAdmin and you need to run long queries, you can use a Firefox extension that prevents the phpMyAdmin timeout: phpMyAdmin Timeout Preventer (direct link!)
It is a direct link because I couldn't find an English description.

Related

Mysql 5.7 Innodb Delete query very slow randomly

I have a table with 5 simple fields. The total number of rows in the table is approx. 250.
When I run the DELETE query on its own from phpMyAdmin, it is processed in 0.05 sec. (always).
The problem is that my PHP application (PDO connection) runs the same query among other queries, and there it is extremely slow (approx. 10 sec.). The same goes for a SELECT query on a table with 5 rows (approx. 1 sec.). It happens only sometimes!
The other queries (approx. 100) always come back with normal response times.
What could the problem be, or how can I find out what it is?
Table:
CREATE TABLE `list_ip` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`type` CHAR(20) NOT NULL DEFAULT '',
`address` CHAR(50) NOT NULL DEFAULT '',
`description` VARCHAR(50) NOT NULL DEFAULT '',
`datetime` DATETIME NOT NULL DEFAULT '1000-01-01 00:00:00',
PRIMARY KEY (`id`),
INDEX `address` (`address`),
INDEX `type` (`type`),
INDEX `datetime` (`datetime`) ) COLLATE='utf8_general_ci' ENGINE=InnoDB;
Query:
DELETE FROM list_ip WHERE address='1.2.3.4' AND type='INT' AND datetime<='2017-12-06 08:04:30';
As I said before, the table has only 250 rows. The size of the table is 96 KiB.
I also tested with an empty table and it's slow too.
Wrap your query in EXPLAIN and see if it's doing a sequential scan rather than using indexes. EXPLAIN would be my first stop in determining whether I have a data model problem (bad / missing indexes would be one such issue).
About EXPLAIN: https://dev.mysql.com/doc/refman/5.7/en/explain.html
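For this case, EXPLAIN can be applied to the DELETE statement directly (supported since MySQL 5.6.3), e.g.:
EXPLAIN DELETE FROM list_ip
WHERE address='1.2.3.4' AND type='INT' AND datetime<='2017-12-06 08:04:30';
A type of ALL with no key chosen would point at an index problem; given the posted indexes on address, type and datetime, you would expect one of them to be picked.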
Another tool I'd recommend is running 'mytop' and looking at the server activity/load during those times when it's bogging down. http://jeremy.zawodny.com/mysql/mytop/
It turned out to be a network problem. I uninstalled a Docker app together with some network peripherals and it looks much better now.

MySQL - Ranking with millions of entries

I'm working on a project and right now I'm implementing a leaderboard. Before I start working on it, I need some advice on better practices for my leaderboard's structure.
First of all, the leaderboard will be displayed on two pages: one is on each player's home page, which will contain the first 10 teams (the same 10 teams for all players), and the other leaderboard will be on the leaderboard page, which will have all the teams with sorting functionality.
The structure of the leaderboard of each row is the following:
• ranking position
• team name
• team value
• total of the games the team won
• total of the games the team lost
• total of the games the team drew
• sum of the goals the team has made
• sum of the goals the team has conceded
• the last 4 game results of the team
Below is my database's tables
challenges table
CREATE TABLE `challenges` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`challenge_date` datetime NOT NULL,
`status` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `challanges_id_index` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
challenges results
CREATE TABLE `challenges_results` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`challenge_id` int(11) NOT NULL,
`team_id` int(11) NOT NULL,
`goals` int(11) NOT NULL,
`result` char(1) DEFAULT NULL,
`challenge_date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
In challenges_results, the result column can be W for a win, D for a draw and L for a defeat.
team values
CREATE TABLE `team_values` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`team_id` int(11) DEFAULT NULL,
`value` double(15,8) DEFAULT '1500.00000000',
`created_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
team
CREATE TABLE `teams` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`avatar` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`founded` date NOT NULL,
`residense_city_id` int(10) unsigned NOT NULL,
`slug` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`primary_color` char(10) COLLATE utf8_unicode_ci NOT NULL,
`secondary_color` char(10) COLLATE utf8_unicode_ci NOT NULL,
`status` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `teams_slug_unique` (`slug`),
KEY `teams_id_index` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
One team can have many values (team_values) but only the most recent will be displayed.
One team can be in many challenges.
One team can have many results from different challenges.
The leaderboard will work as follows. The teams will be sorted by the highest value from the team_values table. That value is calculated and stored every time the team plays a challenge.
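Picking each team's latest value from team_values is a greatest-n-per-group query; as a sketch against the posted table (assuming created_at orders the rows):
SELECT tv.team_id, tv.value
FROM team_values AS tv
INNER JOIN (
    -- one row per team: the timestamp of its newest value
    SELECT team_id, MAX(created_at) AS latest
    FROM team_values
    GROUP BY team_id
) AS last_val
    ON last_val.team_id = tv.team_id
   AND last_val.latest = tv.created_at;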
In case two or more teams have the same value, we need to apply the following three rules. The rules also need to be applied one by one: for example, if I apply the first rule and some teams are still equal on both value and goals scored, then I apply the second rule, and so on.
• Best offense (higher number of goals scored)
• Best defense (lower number of goals conceded)
• The team with the most wins in the games between the tied teams
So I came up with three solutions, and I still don't know which one is better, or whether there is a better one than these three.
The first option I thought of is to use inner joins, unions etc. to collect the information from the tables and apply the rules in the same SQL query. So every time I want to view the leaderboard, I will execute this SQL. The problem with this solution is that I don't know how effective it will be if we want the leaderboard to always be up to date with the latest results. Imagine having 10k visitors per day, every one of them executing this query.
The second option is to collect the information and, in case of duplicate values, use PHP to get the duplicate teams, apply the rules and then, based on the results of the rules, swap the teams in the array. From a performance standpoint I don't know how effective this option is.
The third solution is to create another table called leaderboard in which I will store all this information if the team doesn't exist yet, or update the record if it exists, based on the results of the latest challenge, e.g. increasing the goals if the team scored. Then I will use only the leaderboard table for filtering the data and printing the ranking of the teams. I believe this option is better because I need to deal with only one table, and I will update the record only when a team finishes a challenge.
We will use a cache, but for now we are thinking that the leaderboard should always be up to date rather than updated once a day.
Which one is better solution and why and in case of a better solution I'm open for suggestions. Thanks
Since you're running on a shared account on a virtual private server, the chances are very good you're going to run into cases where you contend for the use of server resources: disk usage, memory usage, CPU processing power.
First and foremost, try to do all your database calculations in MySQL, and only return the data to PHP once you've completed all operations on it. MySQL is optimised for the job, whereas PHP is better at general computing problems.
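As a sketch of what keeping the work in MySQL can look like, assuming the denormalized leaderboard table from your third option (the column names here are illustrative, not from the posted schemas), the first two tie-break rules map directly onto ORDER BY:
-- Hypothetical leaderboard table from option three; goals_for and
-- goals_against are assumed running totals kept after each challenge.
SELECT team_id, team_name, value, goals_for, goals_against
FROM leaderboard
ORDER BY
    value DESC,         -- primary sort: team value
    goals_for DESC,     -- rule 1: best offense
    goals_against ASC   -- rule 2: best defense
LIMIT 10;               -- the home-page top 10
The third rule (head-to-head wins among the tied teams) needs the challenge history, so it can't be expressed as a simple ORDER BY column; it would have to be resolved with an extra query or in PHP for whatever ties remain.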
One option to take some load off the server would be to use PHP to generate a static webpage, viewable in the browser, every time the leaderboard is updated. That way, you run through the calculations only once per leaderboard update.
If I were building the system and it was never going to reach enterprise-grade level, but instead remain small and functional, I would write such a PHP script early on, because you can save a lot of processing power that way alone.
For what it's worth, if the server is well configured, you shouldn't be worried about 10k user requests a day, unless your code is really terribly written.
EDIT: As an afterthought, you can install a program like https://memcached.org/, which caches your SQL data in RAM. Sites like LiveJournal and Wordpress use it, but you'd need to configure it in a way that works for the rest of the VPS users unless the box is really high spec.

Dynamically update sql columns based on number of entries

I want to create a table like below:
id| timestamp | neighbour1_id | neighbour1_email | neighbour2_id | neighbour2_email
and so on, up to a maximum of neighbour 20.
I have two questions:
Should I create the columns statically, or is there a way to create columns dynamically using PHP based on the count of the JSON array?
In either case, how would I refer to the columns dynamically and assign values to them based on the JSON array?
My jsonArray would look something like:
{id:123, email_id:abc, neighbours: [{neighbour1_id:234, neighbour1_email: bcd}, {neighbour2_id:345, neighbour2_email:dsf}, {}, {}...]}
Please advise. Thanks.
It looks like you need to rethink your database structure a bit. To me it looks like you need a single users (or whatever they are) table:
CREATE TABLE `users` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
`created_at` timestamp NOT NULL,
PRIMARY KEY (`id`)
);
And another table that defines relations between those users:
CREATE TABLE `neighbors` (
`parent` int(11) unsigned NOT NULL,
`child` int(11) unsigned NOT NULL,
PRIMARY KEY (`parent`,`child`)
);
Now you can add as many neighbors to each user as you want. Fetching them is as easy as:
SELECT * FROM `users`
LEFT JOIN `neighbors` ON `users`.`id` = `neighbors`.`child`
WHERE `neighbors`.`parent` = ?
Where that question mark would become the id of the user whose neighbours you are fetching, preferably by using a prepared statement.
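For completeness, here is how the example JSON from the question would map onto these two tables (the ids and emails are taken from that JSON; the neighbours get users rows of their own):
INSERT INTO `users` (`id`, `email`, `created_at`) VALUES
    (123, 'abc', NOW()),
    (234, 'bcd', NOW()),
    (345, 'dsf', NOW());
INSERT INTO `neighbors` (`parent`, `child`) VALUES
    (123, 234),
    (123, 345);
Running the SELECT above with 123 bound to the placeholder then returns the two neighbour rows.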
If it is all JSON you will be working with, and querying isn't much of an issue, you could consider working with a NoSQL database or document store (like Redis or MongoDB), but that is an entirely different story.
Just repeating a bunch of columns x times is definitely not the way to go. Vertical size (# rows) of tables in relational databases is no big issue; they are designed for that. Horizontal size (# columns) however is something to be careful with, as it may make your db unnecessarily large and decrease performance.
Just consider what you would have to do if you wanted to find a user that has a neighbour with email address [x]. You would have to repeat your WHERE clause 20 times, once for each possible email column. And that is just one example...
Well, the answer I was working on while pevara was posting theirs is almost the same...
CREATE TABLE `neighbours` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`neighbour_email` char(64) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;
CREATE TABLE `neighbour_email_collections` (
`id` int(10) unsigned NOT NULL,
`email_id` char(64) NOT NULL,
`neighbour_id` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`,`neighbour_id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;
insert into neighbours values (234, 'bcd');
insert into neighbours values (345, 'dsf');
insert into neighbour_email_collections values (123, 'abc', 234);
insert into neighbour_email_collections values (123, 'abc', 345);
select *
from neighbours
left join neighbour_email_collections
on neighbour_email_collections.neighbour_id=neighbours.id
where neighbour_email_collections.id=123;

How to speed up the SELECT query of the following table?

I have a MySQL table whose create code is as follows:
CREATE TABLE image_ref (
region VARCHAR(50) NULL DEFAULT NULL,
district VARCHAR(50) NULL DEFAULT NULL,
district_name VARCHAR(100) NULL DEFAULT NULL,
lot_no VARCHAR(10) NULL DEFAULT NULL,
sp_no VARCHAR(10) NULL DEFAULT NULL,
name VARCHAR(200) NULL DEFAULT NULL,
form_no VARCHAR(50) NOT NULL DEFAULT '',
imagename VARCHAR(50) NULL DEFAULT NULL,
updated_by VARCHAR(50) NULL DEFAULT NULL,
update_log DATETIME NULL DEFAULT NULL,
ip VARCHAR(50) NULL DEFAULT NULL,
imgfetchstat VARCHAR(1) NULL DEFAULT NULL,
PRIMARY KEY (form_no)
)
COLLATE='latin1_swedish_ci'
ENGINE=MyISAM;
This table contains approximately 700,000 rows. I have an application developed using PHP. Somewhere in it I need to run the following query:
SELECT
min(imagename) imagename
FROM
image_ref
WHERE
district_name = '$sess_district'
AND
lot_no = '$sess_lotno'
AND
imgfetchstat = '0';
which takes 1.560 sec on average. The form_no field only has unique values. After some job is done with the fetched result set, imgfetchstat needs to be updated to 1. Now my question is: should I use InnoDB or MyISAM? Also, the application is accessed by around 50 users on a LAN. Is there any way to make the above query run a little bit faster? The fetched imagename is used to load a 500 x 498 image into the browser, and the image is taking a long time to load. Thanks in advance.
You can add indexes to your table (be aware that this will make the storage larger), and given your query you should be able to use something of the following form:
ALTER TABLE `table` ADD INDEX `product_id` (`product_id`)
For more information see http://dev.mysql.com/doc/refman/5.0/en/create-index.html
You can add an index on a single column (which makes things nice for the DB, even if it uses only a few of them in a query), but if you have a specific query that needs to REALLY run fast, you can add a multi-column index specific to your query:
ALTER TABLE image_ref ADD INDEX `someName`
(`district_name`, `lot_no`, `imgfetchstat`)
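One possible refinement (my assumption, not part of the original answer): appending imagename to that index makes it covering for this exact query, so MySQL can compute MIN(imagename) from the index alone without touching the table rows:
-- Covering index: the WHERE columns first, then the selected column.
ALTER TABLE image_ref ADD INDEX `img_lookup`
    (`district_name`, `lot_no`, `imgfetchstat`, `imagename`);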

MySQL + PHP: select multiple rows on a join, then update those rows/insert new ones

I want to do the following:
Select multiple rows on an INNER JOIN between two tables.
Using the primary keys of the returned rows, either:
Update those rows, or
Insert rows into a different table with the returned primary key as a foreign key.
In PHP, echo the results of step #1 out, ideally with results of #2 included (to be consumed by a client).
I've written the join, but not much else. I tried using a user-defined variable to store the primary keys from step #1 to use in step #2, but as I understand it user-defined variables are single-valued, and my SELECT can return multiple rows. Is there a way to do this in a single MySQL transaction? If not, is there a way to do this with some modicum of efficiency?
Update: Here are the schemas of the tables I'm concerned with (names changed, 'natch):
CREATE TABLE IF NOT EXISTS `widgets` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`author` varchar(75) COLLATE utf8_unicode_ci NOT NULL,
`text` varchar(500) COLLATE utf8_unicode_ci NOT NULL,
`created` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated` timestamp
NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
);
CREATE TABLE IF NOT EXISTS `downloads` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`widget_id` int(11) unsigned NOT NULL,
`lat` float NOT NULL,
`lon` float NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
);
I'm currently doing a join to get all widgets paired with their downloads. Assuming $author and $batchSize are PHP vars:
SELECT w.id, w.author, w.text, w.created, d.lat, d.lon, d.date
FROM widgets AS w
INNER JOIN downloads AS d
ON w.id = d.widget_id
WHERE w.author NOT LIKE '$author'
ORDER BY w.updated ASC
LIMIT $batchSize;
Ideally my query would get a bunch of widgets, update their updated field OR insert a new download referencing each widget (I'd love to see answers for both approaches; I haven't decided on one yet), and then allow the joined widgets and downloads to be echoed. Bonus points if the newly inserted downloads or updated widgets are included in the echo.
Since you asked whether you can do this in a single MySQL transaction, I'll mention cursors. Cursors allow you to do a select and loop through each row, doing the insert or anything else you want, all within the db. So you could create a stored procedure that does all the logic behind the scenes, which you can call via PHP.
Based on your update I wanted to mention that you can have the stored procedure return the new recordset or an id, anything you want. For more info on creating stored procedures that return a recordset with PHP you can check out this post: http://www.joeyrivera.com/2009/using-mysql-stored-procedure-inout-and-recordset-w-php/
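As a minimal sketch of that approach against the posted widgets/downloads schema (the procedure name and the temporary table are invented for illustration; it stages the batch in a temporary table instead of walking an explicit cursor, and only the update-the-updated-field variant is shown):
DELIMITER //
CREATE PROCEDURE touch_widget_batch(IN p_author VARCHAR(75), IN p_batch INT)
BEGIN
    -- Stage the batch once so the UPDATE and the final SELECT agree on rows.
    DROP TEMPORARY TABLE IF EXISTS batch_ids;
    CREATE TEMPORARY TABLE batch_ids AS
        SELECT w.id
        FROM widgets AS w
        INNER JOIN downloads AS d ON w.id = d.widget_id
        WHERE w.author NOT LIKE p_author
        GROUP BY w.id
        ORDER BY w.updated ASC
        LIMIT p_batch;   -- routine parameters are allowed in LIMIT since MySQL 5.5.6
    -- Step 2: touch the updated field of the selected widgets.
    UPDATE widgets
    SET updated = CURRENT_TIMESTAMP
    WHERE id IN (SELECT id FROM batch_ids);
    -- Step 3: return the joined rows, now carrying the new updated values.
    SELECT w.id, w.author, w.text, w.created, w.updated, d.lat, d.lon, d.date
    FROM widgets AS w
    INNER JOIN downloads AS d ON w.id = d.widget_id
    WHERE w.id IN (SELECT id FROM batch_ids);
    DROP TEMPORARY TABLE batch_ids;
END //
DELIMITER ;
PHP would then run CALL touch_widget_batch('someauthor', 50) and read the final result set, which gives you the echoed join from step #1 with the step #2 updates already applied.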
