I'm in the process of churning through some raw data that I have. The data are in a MySQL database and record, millisecond by millisecond, which of a number of possible 'events' is currently happening. The table has only a few columns:
id - unique identifier for the row
event - indicates which event is currently occurring
What I would like to do is get some basic information regarding these data. Specifically, I'd like to create a table that has:
The id at which an event starts
The id at which an event ends
A new id indexing the events and their occurrences, as well as a column detailing which event is occurring.
I know that this would be easy to deal with using PHP, just using a simple loop through all the records, but I'm trying to push the boundaries of my MySQL knowledge for a bit here (it may be dangerous, I know!!).
So, my question is this: would a cursor be the best thing to use for this? I ask because events can occur multiple times, so doing something like grouping by the event type won't work - or will it? I'm just wondering if there is a clever way of dealing with this I have missed, without needing to go through each row sequentially.
Thanks!
To demonstrate what I commented on earlier, say you have the following table:
event_log
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
start DATETIME
event VARCHAR(255) # or whatever you want for datatype
Gathering this information is as simple as:
SELECT el.*,
       (SELECT el_j.start      # -
        FROM event_log el_j    # |
        WHERE el_j.id > el.id  # |- Grab the next row's start, based on the next ID
        ORDER BY el_j.id       # |  (ORDER BY makes LIMIT 1 deterministic)
        LIMIT 1) AS `end`      # -  (`end` is a reserved word, so quote it)
FROM event_log el
ORDER BY el.start;
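Note that the query above only pairs each row with the next row's start. If what you're really after is one row per contiguous run of the same event (the start id, the end id, a new id indexing the occurrences, and the event itself, as described in the question), the usual set-based trick is "gaps and islands". A sketch, assuming MySQL 8+ for window functions and the id/event columns from the question:

SELECT MIN(id) AS start_id,
       MAX(id) AS end_id,
       ROW_NUMBER() OVER (ORDER BY MIN(id)) AS occurrence_id,
       event
FROM (
    # rows in the same consecutive run of one event share the same grp value
    SELECT id, event,
           ROW_NUMBER() OVER (ORDER BY id)
         - ROW_NUMBER() OVER (PARTITION BY event ORDER BY id) AS grp
    FROM event_log
) AS runs
GROUP BY event, grp
ORDER BY start_id;

No cursor or per-row loop needed; grouping by event alone would indeed merge separate occurrences, but grouping by (event, grp) keeps each run distinct.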
I have a PHP/MySQL setup. I have a table devicevalue whose structure is like this:
devId | vals | date | time
xysz | 23 | 2020.02.17 | 22.06
abcs | 44 | 2020.02.31 | 22.07
The vals column holds temperature values.
Any user logging in to my web app has access only to certain devices.
Here are the steps:
On my website, a user selects the from and to dates for which they want to see data and submits the form.
These dates are then passed to a page "getrecords.php", where there are a lot of SELECT queries (many of them in loops) to fetch the filtered data in the required format.
The problem is that this table holds almost 2-3 million records, and in every WHERE clause I have to add the to and from conditions, which causes a search of the entire table.
My question: is there any way I can get a temporary table at step 1 that holds only the rows matching the two given dates, so that all my queries on the other page run against that temporary table?
Edit: If your date column is a text string, you must convert it to a column of type DATE or TIMESTAMP, or you will never get good performance from this table. A vast amount of optimization code is in the MySQL server to make handling of time/date data types efficient. If you store dates or times as strings, you defeat all that optimization code.
Then, put an index on your date column like this.
CREATE INDEX date_from_to ON devicevalue (`date`, devId, vals, `time` );
It's called a covering index because the entire query can be satisfied from the index alone.
Then, in your queries use
WHERE date >= <<<fromdate>>>
  AND date < <<<todate>>> + INTERVAL 1 DAY
Doing this indexing correctly gets rid of the need to create temp tables.
If your query has something like `WHERE devId = <<<devid>>>` in it, you need this index instead (or in addition).
CREATE INDEX date_id_from_to ON devicevalue (devId, `date`, vals, `time` );
If you get a chance to change this table's layout, combine the date and time columns into a single column with TIMESTAMP data type. The WHERE clauses I showed you above will still work correctly if you do that. And everything will be just as fast.
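A rough migration sketch for that (recorded_at is an invented name, and the STR_TO_DATE format assumes the strings really look like '2020.02.17' and '22.06' as in the sample; verify against your real data first):

ALTER TABLE devicevalue ADD COLUMN recorded_at TIMESTAMP NULL;

# parse the existing string columns into the new TIMESTAMP column
UPDATE devicevalue
SET recorded_at = STR_TO_DATE(CONCAT(`date`, ' ', `time`), '%Y.%m.%d %H.%i');

CREATE INDEX device_recorded ON devicevalue (devId, recorded_at, vals);

Rows with impossible dates (like the 2020.02.31 in the sample) will come back NULL, which doubles as a data-quality check.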
SQL is made to solve your kind of problem simply and fast. With good data-type choices and proper indexing, a few million records is a modestly sized table.
Short answer: No. Don't design temp tables that need to live between sessions.
Longer answer:
Build into your app that the date range will be passed from one page to the next, then use those as initial values in the <form> <input type=text...>
Then make sure you have a good composite index for the likely queries. But, to do that, you must get a feel for what might be requested. You will probably need a small number of multi-column indexes.
You can probably build a SELECT from the form entries. I rarely need to use more than one query, but it is mostly "constructed" on the fly based on the form.
It is rarely a good idea to have separate columns for date and time. It makes it very difficult, for example, to query from noon one day to noon the next. Combine them into a single DATETIME or TIMESTAMP column.
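For instance, with a combined column (recorded_at here, an invented name), noon-to-noon becomes a single range predicate that an index can satisfy:

SELECT devId, vals
FROM devicevalue
WHERE recorded_at >= '2020-02-17 12:00:00'
  AND recorded_at <  '2020-02-18 12:00:00';

With separate date and time string columns, the same request needs awkward OR logic spanning two days.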
O.Jones has said a lot of things that I would normally add here.
I'm setting up to gather long-term statistics. They will be recorded in little blocks that I'm planning to stick all into one TEXT field, latest first... sorta like this:
[date:03.01.2016,data][date:02.01.2016,data][date:01.01.2016,data]...
It will be more frequent than that (this is just a sample), but it should remain small enough to keep recording for decades, yet big enough to make me want to optimize it.
I'm looking for 2 things:
Can you append to the front of a field in mysql?
Can you read the field partially, just the first 100 characters for example?
The blocks will be fixed length so I can accurately estimate how many characters I need to download to display statistics for X time period.
The answer to your two questions is "yes":
update t
set field = concat($newval, field)
where id = $id;
And:
select left(field, 100)
from t
where id = $id;
(These assume that you have multiple rows in the table.)
That said, your method of storing the data is absolutely not the right thing to do in a relational database.
Presumably, you want a table that looks something like this:
create table t (
tId int auto_increment primary key,
creationDate date,
data <something>
);
(This may be more complicated if data should be multiple columns.)
Then you insert into the table:
insert into t(creationDate, data)
select $date, $data;
And you can fetch the most recent row:
select t.*
from t
order by tId desc
limit 1;
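With one row per block, the "read only the first 100 characters" idea also becomes unnecessary; a time window is just a range query (a sketch against the table above):

select creationDate, data
from t
where creationDate >= curdate() - interval 30 day
order by creationDate desc;

An index on creationDate (or just the primary key, if you fetch by recency) keeps this fast no matter how many decades of blocks accumulate.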
All of these are just examples, because your question doesn't give a complete picture of the data.
My database has a table messages with 3 fields: messageid, messagetext (varchar), and dateposted (datetime).
I want to store a bunch of messages in the field messagetext along with their respective date of posting in the field dateposted. A lot of these messages will have hashtags in them.
Then, using PHP and MySQL I want to find out which hashtags are the top 5 most frequently mentioned hashtags in messages posted in the past week.
How can I do this? I'd really appreciate any help. Many thanks in advance.
Don't take this the wrong way, but you've set yourself up for a world of hurt. The best way to proceed would be to follow lonesomeday's advice and parse hashtags at insert time. This also greatly reduces processing time, as well as making it more deterministic (the workload is "spread" across inserts).
If you want to proceed anyway, you need to tackle several problems.
1) Recognize the tags.
2) Multiple-select the tags. If you have a message saying that "#MySQL splitting is #cool", you want to get two rows from that one message, one saying 'MySQL', the other 'cool'.
3) Selecting the appropriate messages.
4) Performance.
You can approach this in at least two ways. You can use a stored function, which you can find here on SO - you will have to modify it, though.
This syntax will get you the first occurrence of #hashtag in value plus all the text following it:
select substring(value, LENGTH(substring_index(value, '#', 1))+1);
You will then need to decide where, for each #hashtag, it #stops (and it could be #parenthesized). At this point you need a regexp, or to search for a sequence of at least one alphanumeric character - in regexp parlance, [a-zA-Z0-9]+ - either by specifying all possible characters or by using a loop: "#" is OK, "#t" is OK, "#ta" is OK, "#tag" is OK, "#tag," is not, and so your hashtag is '#tag' (or 'tag').
Another, more promising approach is to use a user-defined function to capture the hashtags; you can use PREG_CAPTURE (from the lib_mysqludf_preg UDF library).
You will probably have to merge both approaches: modifying the stored function's setup and inner loop to read
DECLARE cur1 CURSOR FOR SELECT messages.messagetext
    FROM messages
    WHERE messages.messagetext LIKE '%#%';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

DROP TEMPORARY TABLE IF EXISTS table2;
CREATE TEMPORARY TABLE table2 (
    `hashtag` VARCHAR(255) NOT NULL
) ENGINE=Memory;
...
# msgtext is the variable FETCHed from cur1;
# count how many '#' characters the current message contains
SET occurrence = LENGTH(msgtext) - LENGTH(REPLACE(msgtext, '#', ''));
SET i = 1;
WHILE i <= occurrence DO
    # grab the i-th hashtag and stash it
    INSERT INTO table2
        SELECT PREG_CAPTURE('/#([a-z0-9]+)/i', msgtext, i);
    SET i = i + 1;
END WHILE;
...
This will return a list of message-ids and hashtags. You then need to GROUP them BY hashtag, count them, and ORDER BY that count DESC, finally adding LIMIT 5 to get only the five most popular.
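A sketch of that final step, assuming table2 ended up holding one row per hashtag occurrence (the past-week restriction is best added to the cursor's SELECT, e.g. AND dateposted >= NOW() - INTERVAL 7 DAY):

SELECT hashtag, COUNT(*) AS mentions
FROM table2
GROUP BY hashtag
ORDER BY mentions DESC
LIMIT 5;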
I wrote a small script which uses the concept of long polling.
It works as follows:
jQuery sends the request with some parameters (say lastId) to PHP.
PHP gets the latest id from the database and compares it with the lastId.
If the lastId is smaller than the newly fetched id, it ends the script and echoes the new records.
From jQuery, I display this output.
I have taken care of all security checks. The problem is that when a record is deleted or updated, there is no way to know.
The nearest solution I can come up with is to count the number of rows and match it against some saved row-count variable. But then, if I have 1000 records, I would have to echo out all 1000 records, which can be a big performance issue.
The CRUD functionality of this application is completely separate and runs on a different server, so I don't get to know which record was deleted.
I don't need any help coding-wise, but I am looking for suggestions on how to make this work for updates and deletes.
Please note, WebSockets (my fav) and Node.js are not an option for me.
Instead of using a certain ID from your table, you could also check when the table itself was last modified.
SQL:
SELECT UPDATE_TIME
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'yourdb'
AND TABLE_NAME = 'yourtable';
If successful, the statement should return something like
UPDATE_TIME
2014-04-02 11:12:15
Then use the resulting timestamp instead of the lastid. I am using a very similar technique to display and auto-refresh logs, and it works like a charm.
You have to adjust the statement to your needs, and replace yourdb and yourtable with the values needed for your application. It also requires you to have access to information_schema.tables, so check if this is available, too.
Two alternative solutions:
If the solution described above is too imprecise for your purpose (it might lead to issues when the table is changed multiple times per second), you might combine that timestamp with your current lastid mechanism to cover new inserts.
Another way would be to implement a table in which the current state is logged; this is where your AJAX requests check the current state. Then create triggers on your data tables which update this state table, as sketched below.
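A minimal sketch of that state table and one trigger (all names invented; you would want matching AFTER UPDATE and AFTER DELETE triggers as well):

CREATE TABLE table_state (
    table_name  VARCHAR(64) PRIMARY KEY,
    last_change DATETIME NOT NULL
);

CREATE TRIGGER yourtable_after_insert
AFTER INSERT ON yourtable
FOR EACH ROW
    REPLACE INTO table_state VALUES ('yourtable', NOW());

Your polling script then only ever reads the one-row state table, which is far cheaper than scanning the data tables.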
You can get the highest ID by
SELECT id FROM table ORDER BY id DESC LIMIT 1
but this is not reliable in my opinion, because you can have IDs of 1, 2, 3, 7 and then insert a new row having the ID 5.
Keep in mind: the highest ID, is not necessarily the most recent row.
The current auto increment value can be obtained by
SELECT AUTO_INCREMENT FROM information_schema.tables
WHERE TABLE_SCHEMA = 'yourdb'
AND TABLE_NAME = 'yourtable';
Maybe a timestamp + microtime is an option for you?
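If you control the schema, one way to act on that idea (a sketch; last_modified is an invented name, and fractional-second timestamps need MySQL 5.6.4+):

ALTER TABLE yourtable
    ADD COLUMN last_modified TIMESTAMP(6) NOT NULL
        DEFAULT CURRENT_TIMESTAMP(6)
        ON UPDATE CURRENT_TIMESTAMP(6);

# poll for anything newer than the client's last-seen value
SELECT MAX(last_modified) FROM yourtable;

This still won't catch deletes by itself; for those you'd need triggers like the ones sketched above.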
I'm building a like system for a website and I'm facing a dilemma.
I have a table where all the items which can be liked are stored. Call it the "item table".
In order to preserve the speed of the server, should I:
Add a column to the item table.
This means that each time a user likes an item, I have to search (with a regex in my PHP) inside a string in which the IDs of all the users who have liked the item are stored, in order to verify whether the user in question has already liked it. Depending on the result, I show a different button in my HTML.
Problem > If I have (by chance) 3000 likes on an item, I fear the string will get very big and heavy to regex each time there is a like on it...
Or add a specific new table (LikedBy) and record each like separately, with the ID of the liker, the name of the item, and the state of the like (liked or not).
Problem > In this case, I fear for the MySQL server, with thousands of rows to analyze each time a new user likes a popular item...
Server version: 5.5.36-cll-lve MySQL Community Server (GPL) by Atomicorp
Should I put the load on the PHP script or the MySQL database? Which is more performant (and scalable)?
If, for some reason, my question does not make sense, could anyone tell me the right way to do the trick?
thx.
You have to create another table, call it likes_table, containing id_user INT and id_item INT. That's how it should be done. If you go with your proposed first solution, your database won't be normalized, and you'll face too many issues in the future.
To get the like count, you just have to:
SELECT COUNT(*) FROM likes_table WHERE id_item='id_item_you_are_looking_for';
To get who liked what:
SELECT id_item FROM likes_table WHERE id_user='id_user_you_are_looking_for';
No regex needed, and your database is well normalized, so data can be found easily. You can tell MySQL to index id_user and id_item, making them unique as a pair in likes_table; this way all your queries will run much faster.
With MySQL you can set the user ID and the item ID as a unique pair. This should improve performance a lot.
Your table would have these 2 columns: item id and user id. Every row would be a like, as sketched below.
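A sketch of that layout (names invented for illustration); the composite primary key is exactly the unique pair, and it also serves the per-user lookup shown above:

CREATE TABLE likes_table (
    id_user INT NOT NULL,
    id_item INT NOT NULL,
    PRIMARY KEY (id_user, id_item),
    KEY idx_item (id_item)   # speeds up the per-item COUNT(*)
);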