Updating Database From Static File JSON Feed - php

I have a PHP script that pulls a static JSON file which updates every 10 seconds. It contains details about events that happen, and new events are simply added to the top of the file. I then insert them into a MySQL database.
Because I get every event each time I pull the file, I only want to insert the new ones. The easy way would be to search for each event in the database (the primary keys are not the same), but I am talking about ~4000 events every day, and I do not want that many queries just to check whether a row already exists.
I am aware of INSERT IGNORE, but it looks like it only uses the PRIMARY KEY to do this.
What can I do (preferably easily) to prevent duplicates on two keys?
Example:
I have a table events with the following columns:
ID (irrelevant, really)
event_id (that I need to store from the source)
action_id (many action_ids belong to one event_id)
timestamp
whatever...
And the data in my JSON comes out on the first pull like this:
event_id|action_id|...
1 | 1
1 | 2
1 | 3
2 | 1
2 | 2
2 | 3
Then the next pull is this:
event_id|action_id|...
1 | 1
1 | 2
1 | 3
1** | 4**
1** | 5**
2 | 1
2 | 2
2 | 3
2** | 4**
I only want the rows marked with asterisks to be inserted and the others to be ignored. Remember, the primary key column id is completely irrelevant in this table; I just keep it for consistency.
What command can I use to INSERT every event I pull, but only add those that aren't duplicates by way of the two columns event_id and action_id?
Thanks.

Create a unique index on both columns:
CREATE UNIQUE INDEX event_action
ON tablename (event_id, action_id);
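With that index in place, INSERT IGNORE will respect it: it skips rows that violate any UNIQUE index, not just the primary key. A minimal sketch, assuming the table from the question is named events:

-- Sketch only: assumes a table `events` with the columns described above.
-- Rows that collide on the unique (event_id, action_id) index are skipped.
INSERT IGNORE INTO events (event_id, action_id, `timestamp`)
VALUES
    (1, 4, NOW()),
    (1, 5, NOW()),
    (2, 4, NOW());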

Related

Second Column with sequence numbers in database

I am designing a system with multiple shops, where each shop should have its own set of sequential numbers for its invoices. Obviously my primary ID column will be sequential across all invoices in the system, so I will need another column to store this "shop specific" invoice number. What is the best way to store and get the next ID for this shop-specific number?

For example, would it be safe to simply get it right from the invoices table by doing something like SELECT MAX(INV_NUM) FROM INVOICES WHERE SHOP_ID = #, adding one, and then creating the new invoice record? Or would there be timing issues if two instances of my script were to run at the same time? For example, the first script fetches the next ID, but before it gets the chance to create the new invoice record, the second instance requests the next ID and gets the same one as the first script...

I was then thinking about just storing the last used number in a separate table, and as soon as I request the next invoice number, immediately writing the new number and persisting it, so that the window between fetching the next number and creating the record that the next request relies on is kept to an absolute minimum... literally 3 lines of code:
$nextId = $shop->getLastId() + 1;
$shop->setLastId($nextId);
$em->persist($shop);
Invoices
--------------------------
| ID | INV_NUM | SHOP_ID |
--------------------------
|  1 |      99 |       1 |
|  2 |     100 |       2 |
|  3 |     100 |       1 |
Shops
----------------
| ID | LAST_ID |
----------------
|  1 |     100 |
|  2 |     100 |
If you're using Doctrine, which I assume you are since you're using Symfony, then you can use lifecycle events to listen for changes in your entities. Before saving, you can then update your second column to the incremented value.
Regarding race conditions, to be sure you don't have bad data in your database you can put a unique constraint on your shop ID and invoice number.
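For instance, a minimal sketch of such a constraint against the INVOICES table shown above (the constraint name is made up):

-- Sketch only: a duplicate (SHOP_ID, INV_NUM) pair is now rejected, so a race
-- can at worst produce an error to retry, never a duplicate invoice number.
ALTER TABLE INVOICES
    ADD CONSTRAINT uq_shop_invoice UNIQUE (SHOP_ID, INV_NUM);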

MySql db structure to store a list of items in sequence

I need to store and retrieve items of a course plan in sequence. I also need to be able to add or remove items at any point.
The data looks like this:
-- chapter 1
--- section 1
----- lesson a
----- lesson b
----- drill b
...
I need to be able to identify the sequence so that when the student completes lesson a, I know that he needs to move to lesson b. I also need to be able to insert items in the sequence, like say drill a, and of course now the student goes from lesson a to drill a instead of going to lesson b.
I understand relational databases are not intended for sequences. Originally, I thought about using a simple auto-increment column and using that to handle the sequence, but the insert requirement makes it unworkable.
I have seen this question and the first answer is interesting:
items table
item_id | item
1 | section 1
2 | lesson a
3 | lesson b
4 | drill a
sequence table
item_id | sequence
1 | 1
2 | 2
3 | 4
4 | 3
That way, I would keep adding items in the items table with whatever id and work out the sequence in the sequence table. The only problem with that system is that I need to change the sequence numbers for all items in the sequence table after an insertion. For instance, if I want to insert quiz a before drill a I need to update the sequence numbers.
Not a huge deal, but the solution seems a little overcomplicated. Is there an easier, smarter way to handle this?
Just relate records to the parent and use a sequence column. You will still need to update the later records when you insert in the middle, but I can't really think of a simple way around that without leaving yourself space to begin with (see the sketch after the queries below).
items table:
id | name | parent_id | sequence
--------------------------------------
1 | chapter 1 | null | 1
2 | section 1 | 1 | 2
3 | lesson a | 2 | 3
4 | lesson b | 2 | 5
5 | drill a | 2 | 4
When you need to insert a record in the middle a query like this will work:
UPDATE items SET sequence = sequence + 1 WHERE sequence > 3;
INSERT INTO items (name, parent_id, sequence) VALUES ('quiz a', 2, 4);
To select the data in order your query will look like:
SELECT * FROM items ORDER BY sequence;
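If the renumbering bothers you, a sketch of the "leave yourself space" idea mentioned above is to space the sequence values out so that most inserts need no UPDATE at all. The gap size of 10 is arbitrary, and the parent_id values assume the auto-increment ids from the table above, starting from an empty table:

-- Sketch only: spaced sequence values leave room for later inserts.
INSERT INTO items (name, parent_id, sequence) VALUES
    ('chapter 1', NULL, 10),
    ('section 1', 1,    20),
    ('lesson a',  2,    30),
    ('lesson b',  2,    40);

-- Inserting 'drill a' between lesson a and lesson b needs no renumbering:
INSERT INTO items (name, parent_id, sequence) VALUES ('drill a', 2, 35);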

Safely auto increment MySQL field based on MAX() subquery upon insert

I have a table which contains a standard auto-incrementing ID, a type identifier, a number, and some other irrelevant fields. When I insert a new object into this table, the number should auto-increment based on the type identifier.
Here is an example of how the output should look:
id  type_id  number
 1        1       1
 2        1       2
 3        2       1
 4        1       3
 5        3       1
 6        3       2
 7        1       4
 8        2       2
As you can see, every time I insert a new object, the number increments according to the type_id (i.e. if I insert an object with type_id of 1 and there are 5 objects matching this type_id already, the number on the new object should be 6).
I'm trying to find a performant way of doing this with huge concurrency. For example, there might be 300 inserts within the same second for the same type_id and they need to be handled sequentially.
Methods I've tried already:
PHP
This was a bad idea but I've added it for completeness. A request was made to get the MAX() number for the item type and then add the number + 1 as part of an insert. This is quick but doesn't work concurrently as there could be 200 inserts between the request for MAX() and that particular insert leading to multiple objects with the same number and type_id.
Locking
Manually locking and unlocking the table before and after each insert in order to maintain the increment. This caused performance issues due to the number of concurrent inserts and because the table is constantly read from throughout the app.
Transaction with Subquery
This is how I'm currently doing it but it still causes massive performance issues:
START TRANSACTION;
INSERT INTO objects (type_id,number) VALUES ($type_id, (SELECT COALESCE(MAX(number),0)+1 FROM objects WHERE type_id = $type_id FOR UPDATE));
COMMIT;
Another negative thing about this approach is that I need to do a follow up query in order to get the number that was added (i.e. searching for an object with the $type_id ordered by number desc so I can see the number that was created - this is done based on a $user_id so it works but adds an extra query which I'd like to avoid)
Triggers
I looked into using a trigger in order to dynamically add the number upon insert but this wasn't performant as I need to perform a query on the table I'm inserting into (which isn't allowed so has to be within a subquery causing performance issues).
Grouped Auto-Increment
I've had a look at grouped auto-increment (so that the number would auto-increment based on type_id) but then I lose my auto-increment ID.
Does anybody have any ideas on how I can make this performant at the level of concurrent inserts that I need? My table is currently InnoDB on MySQL 5.5
Appreciate any help!
Update: Just in case it is relevant, the objects table has several million objects in it. Some of the type_id can have around 500,000 objects assigned to them.
Use a transaction and SELECT ... FOR UPDATE. That will resolve the concurrency conflicts.
For the "Transaction with Subquery" approach, try adding an index on the type_id column; it should speed up your subquery considerably.
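A sketch of that index, assuming the table from the question is named objects; making it a composite index on (type_id, number) additionally lets MAX(number) be read straight from the index:

-- Assumption: the table is called `objects` with columns type_id and number.
CREATE INDEX idx_objects_type_number ON objects (type_id, number);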
DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, type_id INT NOT NULL
);

INSERT INTO my_table VALUES
(1,1),(2,1),(3,2),(4,1),(5,3),(6,3),(7,1),(8,2);

SELECT x.*
     , COUNT(*) AS rank
  FROM my_table x
  JOIN my_table y
    ON y.type_id = x.type_id
   AND y.id <= x.id
 GROUP BY x.id
 ORDER BY x.type_id, rank;
+----+---------+------+
| id | type_id | rank |
+----+---------+------+
|  1 |       1 |    1 |
|  2 |       1 |    2 |
|  4 |       1 |    3 |
|  7 |       1 |    4 |
|  3 |       2 |    1 |
|  8 |       2 |    2 |
|  5 |       3 |    1 |
|  6 |       3 |    2 |
+----+---------+------+
Or, if performance is an issue, just do the same thing with a couple of @variables.
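A sketch of that variable-based variant against the same my_table; this relies on MySQL 5.x user-variable behaviour, and the ORDER BY in the derived table is what makes the running counter meaningful:

-- Sketch only: per-type running counter with user variables.
SELECT id,
       type_id,
       @rank := IF(@prev_type = type_id, @rank + 1, 1) AS rank,
       @prev_type := type_id AS prev_type
  FROM (SELECT id, type_id FROM my_table ORDER BY type_id, id) AS ordered
  JOIN (SELECT @rank := 0, @prev_type := NULL) AS init;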
Perhaps an idea would be to create a (temporary) table for all rows with a common type_id.
In that table you can use auto-increment for your num column.
Then your num should be fully trustworthy.
Then you can select your data and update your first table.
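A rough sketch of that idea, assuming the objects table from the question and handling one type_id at a time (the temporary table and its columns are made-up names):

-- Sketch only: number all rows of one type via a temporary auto-increment table.
CREATE TEMPORARY TABLE tmp_numbering (
    num       INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    object_id INT NOT NULL
);

INSERT INTO tmp_numbering (object_id)
SELECT id FROM objects WHERE type_id = 1 ORDER BY id;

UPDATE objects o
JOIN tmp_numbering t ON t.object_id = o.id
SET o.number = t.num;

DROP TEMPORARY TABLE tmp_numbering;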

MySQL: SELECT conditional statement with grouping based off of per record parameters possible?

I have a report I'm rewriting for an application that uses MySQL as the database. Currently, the report relies on a lot of grunt work in PHP, which builds arrays, re-stores them into a temporary database, and then generates results from that temp DB.
One of the main goals of rewriting the bulk of this code is to simplify and clean up a lot of my old code, and I am wondering whether the process below can be simplified, or better yet done solely in MySQL, so that PHP just handles the distribution of the data to the client.
I will use a made up scenario to describe what I am attempting to do:
Let's assume the following table (note that in the real app this table's information is actually pulled from several tables, but this should get the point across for clarity):
+----+----------+--------------+--------------+
| id | location | date_visited | time_visited |
+----+----------+--------------+--------------+
|  1 | place 1  | 2012-04-20   | 11:00:00     |
|  2 | place 2  | 2012-04-20   | 11:06:00     |
|  3 | place 1  | 2012-04-20   | 11:06:00     |
|  4 | place 3  | 2012-04-20   | 11:20:00     |
|  5 | place 2  | 2012-04-20   | 11:21:00     |
|  6 | place 1  | 2012-04-20   | 11:22:00     |
|  7 | place 3  | 2012-04-20   | 11:23:00     |
+----+----------+--------------+--------------+
The report I need requires me to first list each location and then the number of visits made to that place. However, the caveat, and what makes the query difficult for me, is that a minimum time interval has to have passed for a visit to count within this report.
For example: let's say the interval between visits to any given place is 10 minutes.
The first entry is locked in automatically because there are no previous entries, and so is the second since there are no other entries for place 2 yet. However, on the third entry, place 1 is checked for the last time it was visited, which is within the defined interval (10 minutes), so the report ignores this entry and moves along to the next one.
In essence, we are checking case by case: the time interval is measured not from the last entry overall, but from the last entry for the same location.
The results from the report should look something like this in the end:
+----+----------+--------+
| id | location | visits |
+----+----------+--------+
|  1 | place 1  |      2 |
|  2 | place 2  |      2 |
|  3 | place 3  |      1 |
+----+----------+--------+
My current implementation on a basic level goes through the following steps to acquire the above result set:
MySQL query creates one temp table with a list of all the required locations and their ID.
MySQL query selects all the visit data within the specified time frame and passes it to PHP.
PHP & MySQL populate the temporary table with the visits data, PHP does the grunt work here.
MySQL selects data from temporary table and returns it to client for display.
My question is: is there a way to do most of this with MySQL alone? What I've been trying to find is a way to write a MySQL query that selects only the visits which meet the above criteria, then groups them by location and gives me a COUNT(*) for each group.
I really don't know if it's possible and am in hopes that one of the database gurus out there might be able to shed some light on how to do this.
Suppose you have a table (probably temporary) of a slightly different structure:
CREATE TABLE `visits` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`location` varchar(45) NOT NULL,
`visited` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `loc_vis` (`location`,`visited`)
) ENGINE=InnoDB;
INSERT INTO visits (location, visited) VALUES
('place 1', '2012-04-20 11:00:00'),
('place 2', '2012-04-20 11:06:00'),
('place 1', '2012-04-20 11:06:00'),
('place 3', '2012-04-20 11:20:00'),
('place 2', '2012-04-20 11:21:00'),
('place 1', '2012-04-20 11:22:00'),
('place 1', '2012-04-20 11:23:00');
which, as you can see, has an index on (location, visited). The following query will use the index, that is, read the data in the order of the index, and return the results you expected:
SELECT
    location,
    COUNT(IF(@loc <> @loc:=location,
             @vis:=visited,
             IF(@vis + INTERVAL 10 MINUTE < @vis:=visited,
                visited,
                NULL))) AS visit_count
FROM visits,
     (SELECT @loc:='', @vis:=FROM_UNIXTIME(0)) AS init
GROUP BY location;
Result:
+----------+-------------+
| location | visit_count |
+----------+-------------+
| place 1  |           2 |
| place 2  |           2 |
| place 3  |           1 |
+----------+-------------+
3 rows in set (0.00 sec)
Some explanation:
The key to the solution is that it sets aside the declarative nature of SQL and relies on MySQL implementation specifics (they say it is bad, never do it again!!!).
If a table has an index (an ordered representation of column values) and the index is used in a query, that means that the data from the table is read in the order of the index.
GROUP BY operation will benefit from an index (since the data is already grouped there) and will choose it if it is applicable.
All aggregating functions in SQL (except for COUNT(*) which has a special meaning) check each row, and use the value only if it is not NULL (the expression within COUNT above returns NULL for wrong conditions)
The rest is just a hacky representation of procedural iteration over a list of rows (read in the order of the index, that is, ordered by location asc, visited asc): I initialize some variables; if the location differs from the previous row, I count it; if not, I check the interval and return NULL if it fails.
You can populate the temporary table using a INSERT / SELECT statement.
See manual. http://dev.mysql.com/doc/refman/5.0/en/insert-select.html
I'd use the GROUP BY in the SELECT statement to narrow down the places.
The visits column can be populated as a COUNT operation, and I think it might be possible to perform that as part of the INSERT / SELECT too.
See manual. http://dev.mysql.com/doc/refman/5.1/en/counting-rows.html
So your SQL might look something like this.
INSERT INTO temp
SELECT * FROM (
    SELECT location, COUNT(*) AS visits
    FROM source AS table1
    WHERE date_visited > xxxx AND date_visited < xxxx
    GROUP BY location
) AS table2
Seriously, that is off the top of my head but it should give you some ideas on how SQL can be structured. But you likely can do the report using just one good query.

comparing values between different rows of database and getting maximum count

I have a table in which a row contains the following data, and I need to compare the values among themselves and show which value has the maximum count. For example, my table has the following fruit names, so I need to compare these fruits among themselves and show the one with the highest count first.
s.no | field1
-----+-----------------------------------
   1 | apple,orange,pineapple
   2 | apple,pineapple,strawberry,grapes
   3 | apple,grapes,
   4 | orange,mango
That is, apple comes first, grapes second, pineapple third, and so on. These values are entered dynamically, so whatever values are entered need to be compared among themselves to get the maximum count.
Great question.
This is a classic consequence of not having the data normalized.
I recommend you read about database normalization and normalize your tables; after that you will see how easy this is to do with simple SQL queries.
If you need to run queries on column field1, why not consider normalization? Otherwise it will keep getting more complex and dirty in the future.
Your current table would look like this (for serial number 1 only); Pk can be an auto-increment primary key.
Pk | s.no | fruitId
---+------+--------
 1 |    1 |       1
 2 |    1 |       2
 3 |    1 |       3
Your New Table of Fruits
PK | fruitName
---+-----------
 1 | Apple
 2 | Orange
 3 | Pineapple
This also helps you to avoid redundancy.
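Once the data is normalized like this, the "which fruit has the highest count" question becomes a plain GROUP BY. A sketch with assumed table names (item_fruits for the link table shown first, fruits for the lookup table):

-- Sketch only: table and column names are assumptions based on the layout above.
SELECT f.fruitName, COUNT(*) AS occurrences
  FROM item_fruits AS i
  JOIN fruits      AS f ON f.PK = i.fruitId
 GROUP BY f.fruitName
 ORDER BY occurrences DESC;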
A quick solution would be to count the number of fruits when you insert/update the row and store it in a fruitCount column. You can then use that column to order by.
Zohaib has the correct solution though, if you have the time and the possibility to make such changes. And I definitely suggest you read Tudor's link!
