SQL Index Performance? - php

I have a small table called "DataVisitorActivity" with these fields:
id int auto_increment primary key,
vID int null,
category varchar(128) null,
timestamp timestamp default CURRENT_TIMESTAMP not null,
value text null,
handle text null
It has two indexes:
handle_index(handle)
DataVisitorActivity_vID_index(vID)
Until now I had no performance problems; everything ran in around 0.01 seconds.
Currently the table has around 2 million entries and it gets bigger every day (we save every website the user visits in this list).
The only thing I had to change the last time I edited the table was to set "handle" to "text", because we have really long strings that get saved in that field.
With that change, the query I use
SELECT COUNT(*) AS `blog_count`, handle FROM DataVisitorActivity WHERE value = "blog" GROUP BY handle ORDER BY blog_count DESC LIMIT 5
now needs 0.1-0.3 seconds, which is still fine for me.
I now see that the query sometimes (seemingly at random) needs around 5-15 seconds to execute.
I wrote a while loop and let it run 10x10 times, 100 times in total.
Around 60 runs were under 1 second, 20 were under 5 seconds, and all the others took longer than 5 seconds.
So my question is: is this query taking so long because the table is getting bigger and bigger? And why does the execution time vary so much?
Edit: In phpMyAdmin this query executes in under 0.001 seconds every time.

I would think that your GROUP BY handle is the problem. How big can the field get, and do you have an index on it? See the MySQL documentation on indexes on text columns: https://dev.mysql.com/doc/refman/5.5/en/column-indexes.html.
A possible solution would be to add a column where you store, for example, a SHA1 hash of the handle column. That has a fixed width, so you can easily add an index - and GROUP BY - on it. Then use EXPLAIN to see where you can improve further.
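A rough PHP/PDO sketch of that idea (the handle_hash column name, the index name, and the $pdo connection are illustrative, not from your setup):

// One-time schema change: store a fixed-width SHA1 of handle and index it.
$pdo->exec("ALTER TABLE DataVisitorActivity ADD COLUMN handle_hash CHAR(40) NULL");
$pdo->exec("UPDATE DataVisitorActivity SET handle_hash = SHA1(handle)");
$pdo->exec("CREATE INDEX handle_hash_index ON DataVisitorActivity (handle_hash)");

// Keep the hash in sync when inserting new activity rows.
$stmt = $pdo->prepare(
    "INSERT INTO DataVisitorActivity (vID, category, value, handle, handle_hash)
     VALUES (?, ?, ?, ?, SHA1(?))"
);
$stmt->execute([$vID, $category, $value, $handle, $handle]);

// Group on the fixed-width, indexed hash instead of the TEXT column.
$top = $pdo->query(
    "SELECT COUNT(*) AS blog_count, MIN(handle) AS handle
     FROM DataVisitorActivity
     WHERE value = 'blog'
     GROUP BY handle_hash
     ORDER BY blog_count DESC
     LIMIT 5"
)->fetchAll(PDO::FETCH_ASSOC);

Note that the one-time UPDATE rewrites all ~2 million rows, so run it during a quiet period.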

Related

Purge old data in DynamoDB by rate limiting in PHP

I have a dataset in DynamoDB whose primary key is the user ID, and a timestamp is one of the data attributes. I want to run a purge on this table for items whose timestamp is older than 1 week.
I do not want to eat up all the write capacity units. I would ideally want a rate-limited delete operation (in PHP). Otherwise, for a dataset that is tens of GBs in size, it will block other writes.
I was wondering whether a global secondary index on timestamp (+ user ID) would help reduce the rows to be scanned. But again, I would not want to thrash the table so much that other writes start failing.
Can someone provide rate-limited insert/delete example code and references for this in PHP?
You can create a global secondary index:
timestampHash (number, between 1 and 100)
timestamp (number)
Whenever you create/update your timestamp, also set the timestampHash attribute to a random number between 1 and 100. This will distribute the items in your index evenly. You need this hash because to do a range query on a GSI, you need a hash key. Querying by user ID and timestamp doesn't seem to make sense, because that would only return one item each time and you would have to loop over all your users (assuming there is one item per user ID).
Then you can run a purger that queries each of the 100 timestampHash values for items with a timestamp older than 1 week. Between each run you can wait 5 minutes, or however long you think is appropriate, depending on the number of items you need to purge.
You can use BatchWriteItem to leverage the API's multithreading to delete concurrently.
In pseudocode it looks like this:
while (true) {
    for (int i = 1; i <= 100; i++) {
        records = dynamo.query(timestampHash = i, timestamp < Date.now() - 1 week);
        dynamo.batchWriteItem(records, DELETE);
    }
    sleep(5 minutes);
}
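A rough PHP sketch of that loop using the AWS SDK for PHP v3; the table name, GSI name, and the userId key attribute are assumptions here, not taken from your setup (pagination via LastEvaluatedKey is also omitted for brevity):

<?php
require 'vendor/autoload.php';

use Aws\DynamoDb\DynamoDbClient;

$client = new DynamoDbClient(['region' => 'us-east-1', 'version' => 'latest']);
$cutoff = time() - 7 * 24 * 3600;   // one week ago, as a unix timestamp

while (true) {
    for ($hash = 1; $hash <= 100; $hash++) {
        // Query the assumed GSI (timestampHash, timestamp) for expired items.
        $result = $client->query([
            'TableName'                 => 'VisitorData',                      // assumed table name
            'IndexName'                 => 'timestampHash-timestamp-index',    // assumed GSI name
            'KeyConditionExpression'    => 'timestampHash = :h AND #ts < :cutoff',
            'ExpressionAttributeNames'  => ['#ts' => 'timestamp'],             // timestamp is a reserved word
            'ExpressionAttributeValues' => [
                ':h'      => ['N' => (string) $hash],
                ':cutoff' => ['N' => (string) $cutoff],
            ],
            'ProjectionExpression'      => 'userId',                           // base-table key only
        ]);

        // BatchWriteItem accepts at most 25 requests per call.
        foreach (array_chunk($result['Items'], 25) as $chunk) {
            $requests = [];
            foreach ($chunk as $item) {
                $requests[] = ['DeleteRequest' => ['Key' => ['userId' => $item['userId']]]];
            }
            $client->batchWriteItem(['RequestItems' => ['VisitorData' => $requests]]);
            // Real code should also retry any UnprocessedItems from the response.
        }
    }
    sleep(300);   // wait 5 minutes between sweeps to limit consumed write capacity
}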
You can also catch ProvisionedThroughputExceededException and do an exponential back off so that if you do exceed the throughput, you will reasonably stop and wait until your throughput recovers.
Another way is to structure your tables by time.
TABLE_08292016
TABLE_09052016
TABLE_09122016
All your data for the week of 08/28/2016 will go into TABLE_08292016. Then at the end of every week you can just drop the table.

Finding the interval of data present on the latest 2 dates

I'm developing a web-based tool that can help analyze the number intervals that occur in a 6-digit lottery.
Let us focus on a certain number first. Say 7
The sql query I've done so far:
SELECT * FROM `l642` WHERE `1d`=7 OR `2d`=7 OR `3d`=7 OR `4d`=7 OR `5d`=7
OR `6d`=7 ORDER BY `draw_date` DESC LIMIT 2
This will pull the two latest dates where the number 7 is present.
I'm thinking of using DATEDIFF, but I'm confused about how to get the previous value to subtract from the latest draw_date.
My goal is to list the intervals of the numbers 1-42, and I plan to accomplish it using PHP.
Looking forward to your help
A few ideas spring to mind.
(1) First, since your result set is already ordered, loop over the two rows in PHP, setting $date1 = $row['draw_date'] from the first row. Then fetch the next/last row and set $date2 = $row['draw_date']. With these two you have
$diff = date_diff(new DateTime($date1), new DateTime($date2));
as the difference in days.
(2)
A second way is to have MySQL return the DATEDIFF by including a row number in the result set and doing a self-join with aliases, say alias a for row 1 and alias b for row 2:
datediff(a.draw_date, b.draw_date).
How one goes about getting the row number could be either:
(2a) the row-number technique found here: With MySQL, how can I generate a column containing the record index in a table?
(2b) a work table with an id int auto_increment primary key column, filled with an INSERT ... SELECT from your LIMIT 2 query shown above (and a TRUNCATE TABLE of the work table between iterations 1 to 42 to reset auto_increment to 0).
The entire thing could be wrapped with an outer table of 1 to 42, so that 42 rows come back with 2 columns (num, number_of_days), but that wasn't your question.
So considering how infrequently you are probably doing this, I would recommend not over-engineering it and would shoot for #1.
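A minimal sketch of #1 in PHP, looping over all 42 numbers; it assumes an existing mysqli connection in $db and the mysqlnd driver (needed for get_result()):

<?php
$stmt = $db->prepare(
    "SELECT `draw_date` FROM `l642`
     WHERE `1d` = ? OR `2d` = ? OR `3d` = ? OR `4d` = ? OR `5d` = ? OR `6d` = ?
     ORDER BY `draw_date` DESC LIMIT 2"
);
$num = 0;
$stmt->bind_param('iiiiii', $num, $num, $num, $num, $num, $num);

$intervals = [];
for ($num = 1; $num <= 42; $num++) {
    $stmt->execute();
    $rows = $stmt->get_result()->fetch_all(MYSQLI_ASSOC);
    if (count($rows) === 2) {
        // date_diff() expects DateTime objects, not the raw date strings from MySQL.
        $diff = date_diff(new DateTime($rows[1]['draw_date']), new DateTime($rows[0]['draw_date']));
        $intervals[$num] = $diff->days;   // interval in whole days
    } else {
        $intervals[$num] = null;          // number drawn fewer than twice so far
    }
}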

MySQL: storing a very large amount of data - UUID as primary key vs any other logic [duplicate]

This question already has answers here:
MySQL PRIMARY KEYs: UUID / GUID vs BIGINT (timestamp+random)
(4 answers)
Closed 8 years ago.
I am developing a GPS device based application in CakePHP 2.0 and MySQL (InnoDB). Each device sends data to the DB every minute, and I need it to scale to a very large number of devices writing to the server simultaneously.
I have not used BIGINT auto_increment as the primary key because there is a maximum value at which the BIGINT limit will be reached and the system will fall apart, even if that point is far away.
I created the primary key as char(36), generated UUIDs from PHP, and started storing the data, primarily because the primary key will never run out of unique values and the design will never fail.
Uniqueness is the only reason for me and nothing else.
Problems:
The system is in pilot testing mode and the time to insert data has increased enormously. Refer to http://kccoder.com/mysql/uuid-vs-int-insert-performance/ - this is exactly what is happening in my case: over time the insert time keeps increasing, and I expect it might get worse in the coming days as the amount of data grows. There are around 200,000 rows in the table now.
The primary key being char(36), I assume the performance of inserts, select statements and joins is being affected.
My idea is to replace the UUID primary key column with a varchar(50) column and store Device ID / IMEI number + timestamp as the primary key; those will always be unique. But on the downside, it's again a varchar field, with performance issues in the long run.
What is the best option for me in the long run?
Just use BIGINT UNSIGNED.
An unsigned INT64 goes up to something like 4,000,000,000 times 4,000,000,000. So assuming you have one device for each of the fewer than 8 billion people on the planet, logging once per second, that still leaves you with over 2 billion seconds, or more than 63 years. I assume that in 63 years an INT128 will be normal (or small).
I am also quite sure that you will run into very different classes of trouble long before you reach 2^64 rows in a single table.
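For illustration, a minimal PHP/PDO sketch of such a table with a BIGINT UNSIGNED key; the connection details, table and column names are made up here, not taken from your schema:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=tracker;charset=utf8', 'user', 'pass');
$pdo->exec("
    CREATE TABLE IF NOT EXISTS device_readings (
        id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        device_id   INT UNSIGNED    NOT NULL,
        recorded_at TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
        lat         DECIMAL(10,7)   NOT NULL,
        lng         DECIMAL(10,7)   NOT NULL,
        PRIMARY KEY (id),
        KEY device_time (device_id, recorded_at)
    ) ENGINE=InnoDB
");
// Max id is 2^64 - 1 = 18,446,744,073,709,551,615; even at 8 billion inserts
// per second that is roughly 73 years before the counter runs out.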

MySQL - Return login attempts by intervals

I'm trying to block a user's access after a given number of attempts. Before you stop reading because this question has been asked before, let me explain my case (and if it has already been asked, please point out the duplicate question).
What I'd like
What I'd like to implement is a block time per interval. So, if there have been:
3 attempts in 5 minutes: the user is blocked for 5 minutes
5 attempts in 10 minutes: the user is blocked for 10 minutes
10 attempts in 15 minutes: the user is blocked for 15 minutes
This takes into account which e-mail has tried to log in and from which IP. So, if the user tries to access the same e-mail account from a different IP, they are able to do so (and if the same IP tries another user, that works too).
Please, I know it's maybe not the best solution security-wise, but solving the problem will also help my MySQL learning, so I'd like to skip security recommendations and focus on how to achieve this in a MySQL query, if possible.
What I have
I've got a table in my database with Login Attempts:
CREATE TABLE IF NOT EXISTS `er_login_attempts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`target_mail` varchar(128) COLLATE utf8_unicode_ci NOT NULL,
`attempt_date` datetime NOT NULL,
`attempt_success` tinyint(4) DEFAULT '1',
`ip_from` varchar(36) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;
So, I've got a record of the dates of the login attempts. Each attempt creates a new row. This is done this way because of the different intervals (so I cannot use an "attempts" counter field, as I wouldn't know how much time has passed since the last attempt).
If the attempt finally succeeds, the "attempt_success" of this entry is set to 1. If not, it's set to 0.
What I've tried
What I've tried to do is write a MySQL query that returns the number of wrong attempts per interval. Then, I'd take this in PHP and use some IFs to check whether the limit has been reached for each interval.
So, the query I've tried (please, do not laugh):
SELECT count(cinc.id) AS cinc_minuts, count(deu.id) AS deu_minuts, count(quinze.id) AS quinze_minuts
FROM er_login_attempts cinc, er_login_attempts deu, er_login_attempts quinze
WHERE 'mail#mail.com' IN (cinc.target_mail, deu.target_mail, quinze.target_mail)
AND '1.11.11.111' IN (cinc.ip_from, deu.ip_from, quinze.ip_from)
AND cinc.attempt_date BETWEEN MAX(cinc.attempt_date) AND DATE_SUB(MAX(cinc.attempt_date), INTERVAL 5 minutes)
AND deu.attempt_date BETWEEN MAX(deu.attempt_date) AND DATE_SUB(MAX(deu.attempt_date), INTERVAL 10 minutes)
AND quinze.attempt_date BETWEEN MAX(quinze.attempt_date) AND DATE_SUB(MAX(quinze.attempt_date), INTERVAL 5 minutes)
But I get a "Group By" error. I've tried to add a "Group By" expression to group this by target_mail, by IP, by intervals, and I keep getting the same error again and again.
However, as you can see, this query does not take into account that a successful login attempt may have been made, and it would be wonderful if, once there has been a successful attempt, all older wrong attempts were ignored - but not deleted.
With all this
I hope you can get an idea of what I'm trying to achieve. Again, if it's been asked before I'd thank those who point me in the right direction and - if it's been so - sorry for repeating the question.
If you need further information or I've skipped something important, please, let me know and I'll try to write down the missing information.
Thanks to everybody for your time!
First thing of note: DO NOT STORE IP ADDRESSES AS STRINGS - strings are a convenient way to make large numbers readable by human beings, but if you want to do further processing on the address (scan for logins from the same subnet, apply country-specific validation...) then you need to use the IP number.
SELECT
    SUM(IF(secs_ago <= 300, 1, 0)) AS attempts_in_last_5,
    SUM(IF(secs_ago <= 600, 1, 0)) AS attempts_in_last_10,
    SUM(IF(secs_ago <= 900, 1, 0)) AS attempts_in_last_15
FROM
(
    SELECT
        UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(attempt_date) AS secs_ago
    FROM er_login_attempts
    WHERE target_mail = 'mail#mail.com'
      AND ip_from = INET_ATON('1.11.11.111')
      AND attempt_date > NOW() - INTERVAL 15 MINUTE
) ilv;
....and your primary key is pretty much useless for the purpose you have described - but you do need a composite index on target_mail, ip_from and attempt_date (most likely in that order).
This needs some work to accommodate the blocking period - the simplest solution would be to add a second table.
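For the PHP side, a hedged sketch of how the counts could drive the block logic; the $pdo connection, the attempt_success = 0 filter, and a numeric ip_from column (per the advice above) are assumptions layered on top of the query:

<?php
$stmt = $pdo->prepare("
    SELECT
        SUM(IF(secs_ago <= 300, 1, 0)) AS attempts_in_last_5,
        SUM(IF(secs_ago <= 600, 1, 0)) AS attempts_in_last_10,
        SUM(IF(secs_ago <= 900, 1, 0)) AS attempts_in_last_15
    FROM (
        SELECT UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(attempt_date) AS secs_ago
        FROM er_login_attempts
        WHERE target_mail = :mail
          AND ip_from = INET_ATON(:ip)
          AND attempt_success = 0
          AND attempt_date > NOW() - INTERVAL 15 MINUTE
    ) ilv
");
$stmt->execute([':mail' => $email, ':ip' => $ip]);
$counts = $stmt->fetch(PDO::FETCH_ASSOC);

// Apply the 3 / 5 / 10 thresholds, widest interval first.
if ($counts['attempts_in_last_15'] >= 10) {
    $blockMinutes = 15;
} elseif ($counts['attempts_in_last_10'] >= 5) {
    $blockMinutes = 10;
} elseif ($counts['attempts_in_last_5'] >= 3) {
    $blockMinutes = 5;
} else {
    $blockMinutes = 0;   // no block
}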

Subset-sum problem in PHP with MySQL

Following problem:
I have a MySQL database with songs in it. The database has the following structure:
id INT(11)(PRIMARY)
title VARCHAR(255)
album VARCHAR(255)
track INT(11)
duration INT(11)
The user should be able to enter a specific time into a PHP form, and the PHP function should give him a list of all possible combinations of songs which add up to the given time ±X min.
So if the user wants to listen to 1 hour of music ±5 minutes, he would enter 60 minutes and a 5-minute threshold into the form and would receive all possible song sets which add up to a total of 55 to 65 minutes. It should not print out duplicates.
I already found several approaches to this problem, but they only gave me back the durations which add up to X, not the song names etc. So my problem is how to solve this in a way that gives me back the IDs of the songs which add up to the desired time, or prints out the list with the corresponding song names.
This seems to be one of the best answers I found, but I am not able to adapt it to my database.
What you are describing is the knapsack problem (more precisely, subset sum). When I was in college, we used dynamic programming to attack problems like this.
The brute-force method (just try every combination until one (or more) works) has complexity O(2^n) - an exponential-time problem, extremely lengthy to iterate through and calculate!
If your track times are stored as an int (seconds seems the easiest math to me), then your knapsack size is 3300-3900 (3600 seconds == 1 hour).
Is your goal to always return the first set that matches, or to always return a random set?
Note - by bounding your sack size, you greatly expand the number of possible answers.
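A small PHP sketch of the brute-force direction, just to show how the song IDs (not only the durations) can be carried along; the songs table name and the PDO connection are assumptions, and for a large library you would want the dynamic-programming approach instead:

<?php
// Depth-first search over the songs, collecting every subset whose total
// duration lands inside [min, max] seconds. Indices are taken in increasing
// order, so the same set is never produced twice.
function findPlaylists(array $songs, int $min, int $max): array
{
    $results = [];
    $search = function (int $start, int $sum, array $picked) use (&$search, &$results, $songs, $min, $max) {
        if ($sum >= $min && $sum <= $max) {
            $results[] = $picked;          // one valid set of song ids
        }
        if ($sum >= $max) {
            return;                        // adding more songs can only overshoot
        }
        for ($i = $start; $i < count($songs); $i++) {
            $search($i + 1, $sum + (int) $songs[$i]['duration'], array_merge($picked, [$songs[$i]['id']]));
        }
    };
    $search(0, 0, []);
    return $results;                       // array of arrays of song ids
}

// Usage, assuming durations are stored in seconds and a PDO connection in $pdo:
$songs = $pdo->query('SELECT id, duration FROM songs')->fetchAll(PDO::FETCH_ASSOC);
$sets  = findPlaylists($songs, 55 * 60, 65 * 60);   // 55 to 65 minutes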
