I'm currently trying to create a log parser for Call of Duty 4. The parser itself is written in PHP; it reads through every line of the logfile for a specific server and writes all the statistics to a database with mysqli. The databases are already in place, and I'm fairly certain (with my limited experience) that they're well-organized. However, I'm not sure in what way I should send the update/insert queries to the database, or rather, which way is optimal.
My databases are structured as follows:
-- --------------------------------------------------------
--
-- Table structure for table `servers`
--
CREATE TABLE IF NOT EXISTS `servers` (
`server_id` tinyint(3) unsigned NOT NULL auto_increment,
`servernr` smallint(1) unsigned NOT NULL default '0',
`name` varchar(30) NOT NULL default '',
`gametype` varchar(8) NOT NULL default '',
PRIMARY KEY (`server_id`),
UNIQUE KEY (`servernr`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
-- --------------------------------------------------------
--
-- Table structure for table `players`
--
CREATE TABLE IF NOT EXISTS `players` (
`player_id` tinyint(3) unsigned NOT NULL auto_increment,
`guid` varchar(8) NOT NULL default '0',
`fixed_name` varchar(30) NOT NULL default '',
`hide` smallint(1) NOT NULL default '0',
PRIMARY KEY (`player_id`),
UNIQUE KEY (`guid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
-- --------------------------------------------------------
--
-- Table structure for table `playerstats`
--
CREATE TABLE IF NOT EXISTS `playerstats` (
`pid` mediumint(9) unsigned NOT NULL auto_increment,
`guid` varchar(8) NOT NULL default '0',
`servernr` smallint(1) unsigned NOT NULL default '0',
`kills` mediumint(8) unsigned NOT NULL default '0',
`deaths` mediumint(8) unsigned NOT NULL default '0',
# And more stats...
PRIMARY KEY (`pid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
In short, servers and players contain unique entities, and they are combined in playerstats (i.e. statistics for a player on a specific server). In addition to the stats, each combination is also given a player id (pid) for use in other tables. Similarly, the database contains the tables weapons (unique weapons) and weaponstats (statistics for a weapon on a server), attachments and attachstats, and maps and mapstats. Once I get all of this working, I would like to implement more relations between these stats (e.g. a player's stats for a specific weapon on a specific server, using pid and wid).
The PHP parser copies the log of each server (there are six at the moment) over HTTP and then reads through them every 5 minutes (I'm not too sure on that yet). One can assume that during this parsing, every table has to be queried (either with UPDATE or INSERT) at least once (and probably a lot more). Right now, I have a number of options (that I know of) for sending queries:
1: Use regular queries, i.e.
$statdb = new mysqli($sqlserver, $user, $pw, $db);
foreach ($playerlist as $guid => $data) {
    $query = "INSERT INTO `playerstats`
              VALUES (NULL, '$guid', $servernr, $data[0], $data[1])";
    $statdb->query($query);
}
2: Use multi query
$statdb = new mysqli($sqlserver, $user, $pw, $db);
$totalquery = '';
foreach ($playerlist as $guid => $data) {
    $query = "INSERT INTO `playerstats`
              VALUES (NULL, '$guid', $servernr, $data[0], $data[1]);";
    $totalquery .= $query;
}
$statdb->multi_query($totalquery);
3: Use prepared statements; I haven't actually tried this yet. It seems like a good idea, but then I have to make a prepared statement for every table (I think). Will that even be possible, and if so, will it be efficient?
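For what it's worth, the same mechanism can be sketched at the SQL level (mysqli's prepare/bind_param wraps this server-side feature). The column list and values here are assumptions based on the schema above:

```sql
-- Prepared once, then executed repeatedly with different values;
-- the server parses the statement only once.
PREPARE ins_playerstats FROM
  'INSERT INTO playerstats (guid, servernr, kills, deaths) VALUES (?, ?, ?, ?)';

SET @guid = 'abc12345', @servernr = 1, @kills = 10, @deaths = 4;
EXECUTE ins_playerstats USING @guid, @servernr, @kills, @deaths;

DEALLOCATE PREPARE ins_playerstats;
```

One prepared statement per table is normal and cheap; only the bound values change per execution.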
4: As you might be able to see from the aforementioned code, I initially count all the statistics for each player, weapon, map, etc. into an array. Once the parser has read through the entire file, it sends a query with those accumulated stats to the MySQL server. However, I have also seen (more often than not) in other log parsers that queries are sent whenever a new line of the logfile has been parsed, so something like:
UPDATE playerstats
SET kills = kills+1
WHERE guid = $guid
It doesn't seem very efficient to me, but then again I'm just starting out with both PHP and SQL, so what do I know :>
So, in short: what would be the most efficient way to query the database, considering that the log parser reads through every line one by one? Of course, any other advice or suggestions are always welcome.
5: Create a single multi-insert query using MySQL's support for queries like
INSERT INTO table (fields) VALUES (data), (data), ...
It seems the most efficient of them all, even compared to prepared statements.
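Assuming a column list based on the schema above (guids and numbers invented for illustration), such a multi-row statement might look like this; the ON DUPLICATE KEY UPDATE clause, which would need a UNIQUE key on (guid, servernr) that the schema above does not declare yet, folds the insert-or-update decision into the same statement:

```sql
INSERT INTO playerstats (guid, servernr, kills, deaths) VALUES
  ('abc12345', 1, 10, 4),
  ('def67890', 1,  3, 7)
ON DUPLICATE KEY UPDATE
  kills  = kills  + VALUES(kills),
  deaths = deaths + VALUES(deaths);
```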
The most efficient way to me would be to scan the server every so often, once every 5 minutes or so, then scan the list of stats into an array (e.g. in 5 mins 38 people have been on the server so you have an array of 38 IDs, each with the accumulated stats changes of those 38 IDs that need to be updated in the server). Run one query to check to see if a user has an existing ID in stats, and then 2 more queries, one to create new users (multi query insert) and one to update users (single query with CASE update). That limits you to 3 queries every 5 minutes.
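The single CASE-based update mentioned above could be sketched like this (guids and stat deltas invented for illustration):

```sql
-- One UPDATE applies each player's accumulated deltas in a single statement.
UPDATE playerstats
SET kills = kills + CASE guid
      WHEN 'abc12345' THEN 10
      WHEN 'def67890' THEN 3
      ELSE 0
    END,
    deaths = deaths + CASE guid
      WHEN 'abc12345' THEN 4
      WHEN 'def67890' THEN 7
      ELSE 0
    END
WHERE servernr = 1
  AND guid IN ('abc12345', 'def67890');
```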
I have a table with 5 simple fields. The table has roughly 250 rows in total.
When I run a single DELETE query from phpMyAdmin, it is processed in 0.05 sec. (always).
The problem is that when my PHP application (PDO connection) runs the same query among other queries, this query is extremely slow (roughly 10 sec.), and so is another SELECT query on a table with 5 rows (roughly 1 sec.). It happens only sometimes!
The other queries (roughly 100) always respond in normal time.
What could the problem be, or how can I find out what the problem is?
Table:
CREATE TABLE `list_ip` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`type` CHAR(20) NOT NULL DEFAULT '',
`address` CHAR(50) NOT NULL DEFAULT '',
`description` VARCHAR(50) NOT NULL DEFAULT '',
`datetime` DATETIME NOT NULL DEFAULT '1000-01-01 00:00:00',
PRIMARY KEY (`id`),
INDEX `address` (`address`),
INDEX `type` (`type`),
INDEX `datetime` (`datetime`) ) COLLATE='utf8_general_ci' ENGINE=InnoDB;
Query:
DELETE FROM list_ip WHERE address='1.2.3.4' AND type='INT' AND datetime<='2017-12-06 08:04:30';
As I said before, the table has only 250 rows. The size of the table is 96 KiB.
I also tested with an empty table, and it's slow too.
Wrap your query in EXPLAIN and see if it's doing a sequential scan rather than using indexes. EXPLAIN would be my first stop in determining whether I have a data-model problem (bad or missing indexes would be one such issue).
About EXPLAIN: https://dev.mysql.com/doc/refman/5.7/en/explain.html
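Since EXPLAIN accepted only SELECT before MySQL 5.6, one way to inspect the DELETE above is to EXPLAIN the equivalent SELECT:

```sql
EXPLAIN SELECT * FROM list_ip
WHERE address = '1.2.3.4'
  AND type = 'INT'
  AND datetime <= '2017-12-06 08:04:30';

-- From MySQL 5.6 on, EXPLAIN also accepts the DELETE directly:
EXPLAIN DELETE FROM list_ip
WHERE address = '1.2.3.4'
  AND type = 'INT'
  AND datetime <= '2017-12-06 08:04:30';
```

The `key` and `rows` columns of the output show which index (if any) was chosen and how many rows were examined.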
Another tool I'd recommend is running 'mytop' and looking at the server activity/load during those times when it's bogging down. http://jeremy.zawodny.com/mysql/mytop/
It turned out to be a network problem. I uninstalled a Docker app along with some of its network interfaces, and it looks much better now.
I need to update and read data from one table at the same time from different PHP scripts.
I create $sess as a script session identifier (around 20 scripts run at the same time) and set that session identifier on 100 rows of the table. Afterwards, the script SELECTs the rows reserved for it by session identifier. In the while loop, the script does some work with the data and updates the reserved rows.
But the scripts don't work at the same time: the first script works fine, but the others don't execute their first query while the first script is running. I can see this in my database management app.
$sess = intval(str_replace(".", "", microtime(TRUE)));
sql_query("UPDATE locations SET sess='$sess' WHERE sess='0' LIMIT 100");
$r = sql_query("SELECT * FROM locations WHERE sess='$sess'");
while ($q = sql_row($r))
{
    // ... do some work with the data, then UPDATE the reserved row ...
}
Create table syntax
CREATE TABLE `locations` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`country_id` int(11) unsigned DEFAULT NULL,
`area_id` int(11) unsigned DEFAULT NULL,
`timeZone` int(2) DEFAULT NULL,
`lat` double DEFAULT NULL,
`lon` double DEFAULT NULL,
`locationKey` int(11) unsigned DEFAULT NULL,
`cityId` int(11) unsigned DEFAULT NULL,
`sess` bigint(100) unsigned DEFAULT '0',
PRIMARY KEY (`id`),
KEY `lat` (`lat`),
KEY `lon` (`lon`),
KEY `country_id` (`country_id`),
KEY `cityId` (`cityId`),
KEY `area_id` (`area_id`),
KEY `sess` (`sess`)
) ENGINE=InnoDB AUTO_INCREMENT=3369269 DEFAULT CHARSET=utf8;
Are you inside a transaction (BEGIN...COMMIT)? If so, you have locked those 100 rows. Ditto for autocommit=0.
Instead, be sure the UPDATE (which is used to assign 100 items from your 'queue' to your process) is in a transaction by itself.
Then, assuming the other threads cannot call microtime in the same microsecond (a dubious assumption), the 100 items are safely assigned to you. Then you can start another transaction (if needed) and process them.
However, that 100-row SELECT will unnecessarily put a read lock on those rows.
So...
START TRANSACTION;
UPDATE ... LIMIT 100;
SELECT ...
COMMIT;
foreach ...
START TRANSACTION;
work on one item
COMMIT;
end-for
It is unclear whether the second START-COMMIT should be inside the for loop or outside. Inside will be slower, but may be 'correct', depending on what "work" you are doing and how long it could take. (You don't want to ROLLBACK 99 successful 'works' because the 100th took too long.)
Are you later doing DELETE ... WHERE sess = $sess? If you are mass-deleting like that, then do it in a separate transaction after the for loop.
Goal: segregate the queuing transactions from the application transactions.
Note that the segregation will make it easier to code for errors/deadlocks/etc. (You are checking, correct?)
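Filled in against the locations table, the outline above might be sketched like this (the final DELETE is only an example of the mass-delete mentioned, and @sess stands in for the PHP-generated identifier):

```sql
-- Queueing transaction: claim 100 rows, then read them back.
START TRANSACTION;
UPDATE locations SET sess = @sess WHERE sess = '0' LIMIT 100;
SELECT * FROM locations WHERE sess = @sess;
COMMIT;

-- Application transaction(s): one per item, or one around them all,
-- depending on what the "work" is.
START TRANSACTION;
-- ... work on one item ...
COMMIT;

-- Cleanup in its own transaction, after the loop.
START TRANSACTION;
DELETE FROM locations WHERE sess = @sess;
COMMIT;
```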
I am a bit stumped by this weirdness.
I have a gps tracking app that logs gps points into a track_log table.
When I do a basic query on the running log table it takes about 50 seconds to complete:
SELECT * FROM track_log WHERE node_id = '26' ORDER BY time_stamp DESC LIMIT 1
Then I ran the exact same query on the archive table, to which I copied most of the logs in order to reduce the running table to about 1.2 million records.
The archive table is 7.5 million records big.
The exact same query on the archive table completes in 0.1 seconds on the same server, even though that table is six times bigger!
What's going on?
Here's the full Create Table schema:
CREATE TABLE `track_log` (
`id_track_log` INT(11) NOT NULL AUTO_INCREMENT,
`node_id` INT(11) DEFAULT NULL,
`client_id` INT(11) DEFAULT NULL,
`time_stamp` DATETIME NOT NULL,
`latitude` DOUBLE DEFAULT NULL,
`longitude` DOUBLE DEFAULT NULL,
`altitude` DOUBLE DEFAULT NULL,
`direction` DOUBLE DEFAULT NULL,
`speed` DOUBLE DEFAULT NULL,
`event_code` INT(11) DEFAULT NULL,
`event_description` VARCHAR(255) DEFAULT NULL,
`street_address` VARCHAR(255) DEFAULT NULL,
`mileage` INT(11) DEFAULT NULL,
`run_time` INT(11) DEFAULT NULL,
`satellites` INT(11) DEFAULT NULL,
`gsm_signal_status` DOUBLE DEFAULT NULL,
`hor_pos_accuracy` double DEFAULT NULL,
`positioning_status` char(1) DEFAULT NULL,
`io_port_status` char(16) DEFAULT NULL,
`AD1` decimal(10,2) DEFAULT NULL,
`AD2` decimal(10,2) DEFAULT NULL,
`AD3` decimal(10,2) DEFAULT NULL,
`battery_voltage` decimal(10,2) DEFAULT NULL,
`ext_power_voltage` decimal(10,2) DEFAULT NULL,
`rfid` char(8) DEFAULT NULL,
`pic_name` varchar(255) DEFAULT NULL,
`temp_sensor_no` char(2) DEFAULT NULL,
PRIMARY KEY (`id_track_log`),
UNIQUE KEY `id_track_log_UNIQUE` (`id_track_log`),
KEY `client_id_fk_idx` (`client_id`),
KEY `track_log_node_id_fk_idx` (`node_id`),
KEY `track_log_event_code_fk_idx` (`event_code`),
KEY `track_log_time_stamp_index` (`time_stamp`),
CONSTRAINT `track_log_client_id` FOREIGN KEY (`client_id`) REFERENCES `clients` (`client_id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `track_log_event_code_fk` FOREIGN KEY (`event_code`) REFERENCES `event_codes` (`event_code`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `track_log_node_id_fk` FOREIGN KEY (`node_id`) REFERENCES `nodes` (`id_nodes`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=8632967 DEFAULT CHARSET=utf8
TL;DR
Make sure the indexes are defined in both tables; for this query, node_id and time_stamp are good indexes.
Defragment your table: https://dev.mysql.com/doc/refman/5.5/en/innodb-file-defragmenting.html (This could help, but should not make this much of a difference).
Make sure your query is not being blocked by other queries. If data is being inserted into the track_log table continuously, those inserts might block your query. You can prevent this by changing the transaction isolation level; see https://dev.mysql.com/doc/refman/5.5/en/set-transaction.html for more information. Caution: be careful with this!
Indexes
I'm guessing this has something to do with the indexes you defined on the tables. Could you post the SHOW CREATE TABLE track_log output, and the output for your archive table as well? The query you are executing would require an index on node_id and time_stamp for optimal performance.
Defragmentation
Besides the indexes you defined on the table, this might have something to do with data fragmentation. I'm assuming you are using InnoDB as your table engine now. Depending on your settings, every table in a database is stored in a separate file, or every table in the database is stored in a single file (the innodb_file_per_table variable). Those files will never shrink in size. If your track_log table has grown to 8.7 million records, on disk it still takes up space for all those 8.7 million records.
If you have moved records from your track_log table to your archive table, the data might still be spread between the beginning and the end of the physical file for track_log. If no index is defined on time_stamp, a full table scan is still required to order by the timestamp. This means: reading the complete file from disk. Because the records you deleted still take up space in the file, this could make a difference.
Edit:
Transactions
Other transactions might be blocking your SELECT query. This can happen with the InnoDB engine. If you continuously insert a lot of data into your track_log table, those inserts might block your query. It will have to wait until no other transactions are being performed on this table.
There is a way around this, but you should be careful with it. You are able to change the transaction isolation level of your query. By setting the transaction isolation level to READ UNCOMMITTED you will be able to read data while the other inserts are running, but it might not always give you the latest data. Whether you want to make this sacrifice depends on your situation. If you are going to alter the data and update it later, you generally do not want to change the transaction isolation level. But, for example, when showing statistics that need not always be perfectly accurate and up to date, this could be something that really speeds up your query.
I use this myself sometimes when I need to show statistics from large tables which are updated regularly.
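As a sketch, the isolation-level change described above could look like this (set this way, it affects only the next transaction; session- and global-scope variants exist, see the MySQL manual):

```sql
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM track_log
WHERE node_id = 26
ORDER BY time_stamp DESC
LIMIT 1;
```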
This is almost certainly because your archive table has superior indexing to your track_log table.
To satisfy this query efficiently you need a compound index on (node_id, time_stamp) Why does this work? Because InnoDB and MyISAM indexes are so-called BTREE indexes, which means our intuition about searching them in order will work. Your query looks for a specific value of node_id, which means it can jump to that value in the index efficiently. The query then calls for the highest possible value of time_stamp related to that node_id value. Now that's in the same index, and in the right order to access it quickly too. So the row you need can be random-accessed, and MySQL doesn't have to hunt for it by scanning the table row by row. That scanning is almost certainly what's taking the time in your query.
Three things to keep in mind:
One: lots of indexes on single columns can't help a query as much as well-chosen compound indexes. Read this http://use-the-index-luke.com/
Two: SELECT * is usually harmful on a table with as many columns as the one you have shown. Instead, you should enumerate the columns you actually need in your SELECT query. That way MySQL doesn't have to sling as much data.
Three: The DOUBLE datatype is overkill for commercial-grade GPS data. FLOAT is plenty of precision.
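As a concrete sketch of the compound index suggested above (the index name is arbitrary), together with a narrowed-down version of the query:

```sql
ALTER TABLE track_log ADD INDEX node_time (node_id, time_stamp);

-- With that index, the newest row for a node comes straight off the
-- index instead of a row-by-row scan:
SELECT id_track_log, time_stamp, latitude, longitude
FROM track_log
WHERE node_id = 26
ORDER BY time_stamp DESC
LIMIT 1;
```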
Let us analyze your query:
SELECT * FROM track_log WHERE node_id = '26' ORDER BY time_stamp DESC LIMIT 1
The above-mentioned query first sorts all the data present in the table based on time_stamp and then returns the top row.
But when this query is executed on the archived table, the ORDER BY clause might be ignored (based on compression and system settings), and hence it returns the first row it encounters in the table.
You may verify the output of the archived table by comparing the result with the actual latest row.
I'm adding an "activity log" to a busy website, which should show the user the last N actions relevant to him and allow going to a dedicated page to view all the actions, search them, etc.
The DB used is MySQL, and I'm wondering how the log should be stored. I've started with a single MyISAM table used for FULLTEXT searches and, to avoid extra SELECT queries on every action: 1) an INSERT into that table happens, 2) the APC cache for each user is updated, so on the next page request MySQL is not used. The cache has a long lifetime, and if it's missing, the first AJAX request from the user recreates it.
I'm caching the 3 last events for each user, so when a new event happens, I grab the current cache, add the new event to the beginning, and remove the oldest event, so there are always three of them in the cache. Every page of the site has a small box displaying them.
Is this a proper setup? How would you recommend implementing this sort of feature?
The schema I have is:
CREATE DATABASE `audit`;
CREATE TABLE `event` (
`eventid` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`userid` INT UNSIGNED NOT NULL ,
`createdat` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ,
`message` VARCHAR( 255 ) NOT NULL ,
`comment` TEXT NOT NULL
) ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER DATABASE `audit` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE `audit`.`event` ADD FULLTEXT `search` (
`message` ( 255 ) ,
`comment` ( 255 )
);
Based on your schema, I'm guessing that (caching aside) you'll be inserting many records per second and running fairly infrequent queries along the lines of select * from event where userid = ? order by createdat desc, probably with a paging strategy (thus requiring "limit x" at the end of the query) to show the user their history.
You probably also want to find all users affected by a particular type of event, though more likely in an off-line process (e.g. a nightly mail to all users who have updated their password); that might require a query along the lines of select userid from event where message like 'password_updated'.
Are there likely to be many cases where you want to search the body text of the comment?
You should definitely read the MySQL manual on tuning for inserts; if you don't need to search the freetext "comment", I'd leave that index off. I'd also consider a regular index on the "message" column.
It might also make sense to introduce the concept of "message_type" so you can introduce relational consistency (rather than relying on your code to correctly spell "password_updat3"). For instance, you might have an "event_type" table, with a foreign key relationship to your event table.
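A minimal sketch of that event_type table (names invented for illustration); note that foreign-key constraints are only enforced by InnoDB, while the event table above is MyISAM:

```sql
CREATE TABLE `event_type` (
  `event_type_id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `name` VARCHAR(50) NOT NULL,
  UNIQUE KEY (`name`)
) ENGINE = InnoDB;

-- The event table would then reference the type instead of a free-text message
-- (requires converting `event` to InnoDB for the constraint to be enforced):
ALTER TABLE `event`
  ADD COLUMN `event_type_id` INT UNSIGNED NOT NULL,
  ADD FOREIGN KEY (`event_type_id`) REFERENCES `event_type` (`event_type_id`);
```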
As for caching, I'm guessing users would only visit their history page infrequently. Populating the cache when they visit the site, on the off-chance they might visit their history (if I've understood your design), immediately limits the scalability of your solution to how many history records you can fit into your cache; as the history table will grow very quickly for your users, this could quickly become a significant factor.
For data like this, which moves quickly and is rarely visited, caching may not be the right solution.
This is how Prestashop does it:
CREATE TABLE IF NOT EXISTS `ps_log` (
`id_log` int(10) unsigned NOT NULL AUTO_INCREMENT,
`severity` tinyint(1) NOT NULL,
`error_code` int(11) DEFAULT NULL,
`message` text NOT NULL,
`object_type` varchar(32) DEFAULT NULL,
`object_id` int(10) unsigned DEFAULT NULL,
`id_employee` int(10) unsigned DEFAULT NULL,
`date_add` datetime NOT NULL,
`date_upd` datetime NOT NULL,
PRIMARY KEY (`id_log`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=6 ;
My advice would be to use a schema-less storage system; they perform better for high-volume logging data.
Try to consider
Redis
MongoDB
Riak
Or any other NoSQL system
I have a table whose structure is as follows:
CREATE TABLE `table_name` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `ttype` int(1) DEFAULT '19',
  `title` mediumtext,
  `tcode` char(2) DEFAULT NULL,
  `tdate` int(11) DEFAULT NULL,
  `visit` int(11) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `tcode` (`tcode`),
  KEY `ttype` (`ttype`),
  KEY `tdate` (`tdate`)
) ENGINE=MyISAM;
I have two query on x.php same as:
SELECT * FROM table_name WHERE id='10' LIMIT 1
UPDATE table_name SET visit=visit+1 WHERE id='10' LIMIT 1
My first question is whether updating 'visit' in the table causes reindexing and decreased performance or not. Note that 'visit' is not a key.
A second method would be to create a new table containing 'visit', like the following:
CREATE TABLE `table_name2` (
  `newid` int(10) unsigned NOT NULL,
  `visit` int(11) DEFAULT '0',
  PRIMARY KEY (`newid`)
) ENGINE=MyISAM;
and then selecting with:
SELECT w.*,q.visit FROM table_name w LEFT JOIN table_name2 q
ON (w.id=q.newid) WHERE w.id='10' LIMIT 1
UPDATE table_name2 SET visit=visit+1 WHERE newid='10' LIMIT 1
Is the second method preferable to the first? Which one would have better performance and be quicker?
Note: all SQL queries are run from PHP (the mysql_query command). Also, I need the first table's indexes for other queries on other pages.
I'd say your first method is the best, and simplest. Updating visit will be very fast and no updating of indexes needs to be performed.
I'd prefer the first, and have used that for similar things in the past with no problems. You can remove the LIMIT clause: since id is your primary key, you will never have more than one result, although the query optimizer probably does this for you.
There was a question someone asked earlier to which I responded with a solution you may want to consider as well. When you use 'count' columns, you lose the ability to mine the data later. With a transaction table, not only can you get 'view' counts, but you can also query for date ranges, etc. Sure, you will carry the weight of storing potentially hundreds of thousands of rows, but the table is narrow and the indexes are numeric.
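A sketch of such a per-view transaction table (all names invented for illustration):

```sql
CREATE TABLE `visit_log` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `item_id` INT UNSIGNED NOT NULL,
  `visited` DATETIME NOT NULL,
  PRIMARY KEY (`id`),
  KEY `item_time` (`item_id`, `visited`)
) ENGINE = MyISAM;

-- The count becomes an aggregation, and date-range mining stays possible:
SELECT COUNT(*) FROM visit_log
WHERE item_id = 10
  AND visited >= '2013-01-01';
```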
I cannot see a solution on the database side... Perhaps you can do it in PHP: if the user has a PHP session, you could, for example, only update the visit count every 10th view, like:
<?php
session_start();
// Initialize the counter on the first request of the session.
if (!isset($_SESSION['count'])) {
    $_SESSION['count'] = 0;
}
$_SESSION['count'] += 1;
if ($_SESSION['count'] >= 10) {
    do_the_function_that_updates_the_count_plus_10();
    $_SESSION['count'] = 0;
}
Of course you lose some counts this way, but perhaps that is not so important?