I am currently evaluating Aerospike as a replacement for my company's MySQL database. In MySQL, we have a table that stores the transaction data. The table looks like this:
+--------+------------+-----------+------------+-----+--------+
| trx_id | trx_date | client_id | product_id | qty | total |
+--------+------------+-----------+------------+-----+--------+
| 1 | 2015-01-01 | 1 | 1 | 100 | 100000 |
| 2 | 2015-01-02 | 2 | 2 | 200 | 200000 |
| 3 | 2015-01-03 | 3 | 3 | 300 | 300000 |
+--------+------------+-----------+------------+-----+--------+
For reporting, we usually do something like :
SELECT MONTH(trx_date), SUM(qty), SUM(total) FROM transaction WHERE client_id = 1 AND product_id = 1 GROUP BY MONTH(trx_date)
to get the monthly transaction data for a client.
I've read the documentation for the Aerospike PHP client and I don't seem to find anything similar to AND, GROUP BY, or MONTH.
So, in Aerospike PHP client, what is the recommended way to achieve something like that?
Thanks.
Aerospike is a NoSQL key-value store, and as such you can't expect to use SQL with it. However, using Lua as the User-Defined Function (UDF) language, you can extend the basic functionality.
What you are looking for is an aggregation, applying a stream UDF to the results of a query.
There is an example of implementing a GROUP BY x HAVING in the PHP client's documentation for the aggregate() method. The thing to remember is that you want the secondary index query to eliminate as many records as it can, so that predicate should be used for the 'WHERE', and the secondary filtering for the 'AND' should happen inside the stream UDF's filter, on the smallest possible data set.
Reading the UDF Development Guide would also help.
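To make the shape of that aggregation concrete, here is a minimal sketch in plain Python of what the stream pipeline does: filter on product_id, then group by month and sum. It is not Aerospike code (a real stream UDF is written in Lua and run through aggregate()); the records list and the monthly_totals function are made up for illustration and stand in for the records returned by a secondary-index query on client_id = 1.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records, standing in for the result of a
# secondary-index query on client_id = 1.
records = [
    {"trx_date": date(2015, 1, 1), "client_id": 1, "product_id": 1, "qty": 100, "total": 100000},
    {"trx_date": date(2015, 1, 15), "client_id": 1, "product_id": 1, "qty": 50, "total": 50000},
    {"trx_date": date(2015, 2, 3), "client_id": 1, "product_id": 2, "qty": 200, "total": 200000},
    {"trx_date": date(2015, 2, 9), "client_id": 1, "product_id": 1, "qty": 10, "total": 10000},
]


def monthly_totals(stream, product_id):
    """Filter on product_id (the 'AND'), then group by month and sum."""
    totals = defaultdict(lambda: {"qty": 0, "total": 0})
    for rec in stream:
        if rec["product_id"] != product_id:      # filter step
            continue
        month = rec["trx_date"].month            # MONTH(trx_date)
        totals[month]["qty"] += rec["qty"]       # SUM(qty)
        totals[month]["total"] += rec["total"]   # SUM(total)
    return dict(totals)


result = monthly_totals(records, product_id=1)
```

In a stream UDF the same three steps would typically be expressed as a filter, a map that emits one small per-month record, and a reduce that merges them.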
I've been facing a problem for ages. I've got a large data set of 500,000 rows and I want to print them to a CSV.
In the first case, I need to do calculations on the columns of a row to determine a "result" column. I've found a solution to that: I'm using "fetch", doing the calculation row by row and writing gradually to my CSV.
Simplified example :
|---------------------------------------|
| ID | type | var1 | var2 | var1 * var2 |
|---------------------------------------|
| 0 | car | 2 | 5 | 10 |
| 1 | moto | 4 | 8 | 32 |
| 2 | car | 0 | 2 | 0 |
|---------------------------------------|
In the second case, I need to print calculations that each involve a large number of rows out of these 500k (a group can contain 10k+ rows).
Simplified example :
|-----------------------------|
| type | sum var1 * var2 |
|-----------------------------|
| car | 87677670 |
| moto | 3232435 |
|-----------------------------|
BUT the calculations are far more complicated than just a sum of a multiplication. In other words: they can't be done directly in SQL.
My problem is that if I fetch all my cars from the database into my PHP app to loop over them, PHP's memory_limit will be reached. How can I do this gradually, like in the first example?
Note: I'm using Oracle 12c and PHP 5.3.5
You can reduce the memory used by doing your calculations in blocks. With SQL you can loop over multiple requests, each returning only a fixed number of rows.
With Oracle 12c this can be done with the OFFSET ... ROWS FETCH NEXT ... ROWS ONLY clause, combined with an ORDER BY so that the blocks are stable from one request to the next. Here is a piece of documentation where you will find everything you need.
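As a rough illustration of the block-by-block idea, here is a sketch in Python using SQLite as a stand-in for Oracle. SQLite spells the paging clause LIMIT ? OFFSET ?; on Oracle 12c the equivalent would be OFFSET :n ROWS FETCH NEXT :m ROWS ONLY. The items table, its columns and the aggregate_in_chunks function are assumptions made up for this example, and the running sum is only a placeholder for whatever per-group accumulator your real calculation needs.

```python
import sqlite3


def aggregate_in_chunks(conn, chunk_size=2):
    """Accumulate sum(var1 * var2) per type, reading chunk_size rows at a time."""
    totals = {}
    offset = 0
    while True:
        # ORDER BY keeps the paging stable between requests.
        rows = conn.execute(
            "SELECT type, var1, var2 FROM items ORDER BY id LIMIT ? OFFSET ?",
            (chunk_size, offset),
        ).fetchall()
        if not rows:
            break
        for type_, var1, var2 in rows:
            # Only the per-type accumulator stays in memory, never the full set.
            totals[type_] = totals.get(type_, 0) + var1 * var2
        offset += len(rows)
    return totals


# Hypothetical table mirroring the simplified example in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, type TEXT, var1 INTEGER, var2 INTEGER)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [(0, "car", 2, 5), (1, "moto", 4, 8), (2, "car", 0, 2)],
)
totals = aggregate_in_chunks(conn)
```

Memory use is bounded by the chunk size plus one accumulator per type, which is the point of the approach. For very large offsets, paging on the last seen id instead of OFFSET is usually cheaper, but that is a separate optimization.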
I need help finding the product of the values in a MySQL result array, grouped by Group_ID.
So what I need is 1.2*1.5 and 1.1*1.6, each stored in a variable.
----------------
|Group_ID|Value|
----------------
| 1 | 1.2 |
----------------
| 1 | 1.5 |
----------------
| 2 | 1.1 |
----------------
| 2 | 1.6 |
----------------
As Mat said above, your data model is not good. You have two choices:
Alter your data model so that all calculations can be done by the DBMS (this is the better choice)
Fetch your data and process it on the application side (this may be slow if you're not familiar with code tuning and algorithms)
Edit : if your groups are always composed of two rows you can use the solution just above
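For the second choice (processing on the application side), a minimal sketch could look like the following. It is written in Python rather than PHP, and the rows list is a hard-coded stand-in for whatever your query returns; product_by_group is a name invented for this example.

```python
from collections import defaultdict

# Hypothetical (Group_ID, Value) rows, as fetched from the table in the question.
rows = [(1, 1.2), (1, 1.5), (2, 1.1), (2, 1.6)]


def product_by_group(rows):
    """Multiply all values that share the same group id."""
    products = defaultdict(lambda: 1.0)  # 1.0 is the neutral element for products
    for group_id, value in rows:
        products[group_id] *= value
    return dict(products)


products = product_by_group(rows)
```

This works for groups of any size, not only groups of two rows. Because the values are floats, expect results that are close to 1.8 and 1.76 rather than exactly equal to them.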
I would like to be able to track the lifetime events of a certain item and to reconstruct its state at any time in the past for visualization purposes. "State" here means a snapshot of several parameters, e.g. location, temperature and being alive/dead. Raw parameter values are recorded/entered only "on change" and independently of each other.
How should I store the parameter change events to be able to reconstruct the state later?
I can think of two possible solutions:
Solution 1: "Snapshot" table
+----------+-------------+------+------+
| Location | Temperature | Dead | Time |
+----------+-------------+------+------+
| A | + | 0 | 001 |
+----------+-------------+------+------+
| A | - | 0 | 002 |
+----------+-------------+------+------+
| B | + | 0 | 005 |
+----------+-------------+------+------+
On parameter change the state itself is updated and stored. To get a state of an item at a certain point is as simple as fetching one row.
This is exactly what I need, except:
Redundant data, all parameters are recorded even if only one has changed at the time
Table has to be altered if attribute set changes in the future
Knowing when a certain parameter changed is impossible without row comparison
Solution 2: Recording events
The table stores individual parameter changes rather than a complete snapshot.
+----+-----------+------------+------+
| ID | EventType | EventValue | Time |
+----+-----------+------------+------+
| 1 | loc | A | 001 |
+----+-----------+------------+------+
| 2 | temp | + | 001 |
+----+-----------+------------+------+
| 3 | temp | - | 002 |
+----+-----------+------------+------+
| 4 | loc | B | 005 |
+----+-----------+------------+------+
| 5 | temp | + | 005 |
+----+-----------+------------+------+
While this solution is more flexible than the first, it is problematic to reconstruct the snapshot. For example, how to efficiently check what is the temperature, location and viability at a time 004 in as few DB queries as possible?
Are there other solutions for this problem?
(P.S. This is for a biology experiment web app using php+Doctrine2+MySQL)
Using your Solution 2 you can very easily get everything you need:
SELECT t1.eventType, t1.eventValue, t1.time
FROM `events` AS t1
INNER JOIN
(SELECT eventType, MAX(time) AS time
FROM `events`
WHERE time <= '004'
GROUP BY eventType) AS t2
ON t1.eventType = t2.eventType
AND t1.time = t2.time
This query returns every attribute with the value that was valid at time 004, and it also shows when each attribute was set.
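Here is a small runnable sketch of that "latest event per type" lookup, using Python with an in-memory SQLite database in place of MySQL. The events table name, the zero-padded text timestamps and the state_at function are assumptions for the example; the rows are the ones from the question.

```python
import sqlite3


def state_at(conn, when):
    """Return {EventType: EventValue} as it was valid at time `when`."""
    rows = conn.execute(
        """
        SELECT e.EventType, e.EventValue
        FROM events AS e
        JOIN (SELECT EventType, MAX(Time) AS Time
              FROM events
              WHERE Time <= ?
              GROUP BY EventType) AS latest
          ON e.EventType = latest.EventType
         AND e.Time = latest.Time
        """,
        (when,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (ID INTEGER PRIMARY KEY, EventType TEXT, EventValue TEXT, Time TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, "loc", "A", "001"),
        (2, "temp", "+", "001"),
        (3, "temp", "-", "002"),
        (4, "loc", "B", "005"),
        (5, "temp", "+", "005"),
    ],
)
snapshot = state_at(conn, "004")
```

The whole snapshot comes back from a single query. With a real table, an index on (EventType, Time) would probably be worth adding so the inner MAX lookup stays cheap.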
Your second solution is looking pretty solid. There are other ways to organize the data, such as a field-level revision table, which has a touch more structure than what you currently have.
Using the second solution you could get a snapshot in one query with a sub-query. I assume this is something that "just needs to be done" and doesn't need the most efficient query.
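-- Relies on MySQL's non-standard GROUP BY returning the first row per group;
-- this is rejected under ONLY_FULL_GROUP_BY and the row chosen is not guaranteed.
SELECT * FROM (
SELECT * FROM event
WHERE time <= '004'
ORDER BY Time DESC) AS temp
GROUP BY EventType;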
I have a comma-delimited list that I'm storing in a varchar field in a MySQL table.
Is it possible to add and remove values from the list directly using sql queries? Or do I have to take the data out of the table, manipulate in PHP and replace it back into mysql?
There is no clean way to do it in MySQL's InnoDB and MyISAM engines. It might be possible in other engines (check the CSV engine).
You can do it in a stored procedure, but that is not recommended.
What you should do to solve this kind of issue is refactor your code and normalize your DB:
Original table:
T1: id | data | some_other_data
1 | gg,jj,ss,ee,tt,hh | abanibi
To become:
T1: id | some_other_data
1 | abanibi
T2: id | t1_id | data_piece
1 | 1 | gg
2 | 1 | jj
3 | 1 | ss
4 | 1 | ee
5 | 1 | tt
6 | 1 | hh
If data_piece is a constant value that is reused a lot in the system, you should add a lookup table for it too.
I know it looks like more work, but it will save you from issues like the one you have now, which take much more time to solve.
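With the normalized layout, adding or removing a value becomes a plain INSERT or DELETE. A minimal sketch, using Python with in-memory SQLite instead of MySQL: the UNIQUE constraint and the helper names (add_piece, remove_piece, pieces) are my own additions for the example, and INSERT OR IGNORE is SQLite syntax (MySQL spells it INSERT IGNORE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, some_other_data TEXT);
    CREATE TABLE t2 (
        id INTEGER PRIMARY KEY,
        t1_id INTEGER NOT NULL REFERENCES t1(id),
        data_piece TEXT NOT NULL,
        UNIQUE (t1_id, data_piece)   -- assumption: no duplicate pieces per row
    );
    INSERT INTO t1 VALUES (1, 'abanibi');
    """
)
conn.executemany(
    "INSERT INTO t2 (t1_id, data_piece) VALUES (1, ?)",
    [(piece,) for piece in "gg,jj,ss,ee,tt,hh".split(",")],
)


def add_piece(conn, t1_id, piece):
    """Add one value to the 'list'; a duplicate is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO t2 (t1_id, data_piece) VALUES (?, ?)", (t1_id, piece)
    )


def remove_piece(conn, t1_id, piece):
    """Remove one value from the 'list'."""
    conn.execute(
        "DELETE FROM t2 WHERE t1_id = ? AND data_piece = ?", (t1_id, piece)
    )


def pieces(conn, t1_id):
    """Return the values for one t1 row, in insertion order."""
    cur = conn.execute(
        "SELECT data_piece FROM t2 WHERE t1_id = ? ORDER BY id", (t1_id,)
    )
    return [row[0] for row in cur]


add_piece(conn, 1, "zz")
remove_piece(conn, 1, "jj")
```

If the application still needs the comma-separated form for display, it can join the result of pieces() on the way out, without ever storing the list as a string.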
I'm creating a table for allowing website users to become friends. I'm trying to determine which is the best table design to store and return a user's friends. The goal is to have fast queries and not use up a lot of db space.
I have two options:
Have individual rows for each friendship.
+----+-------------+-------------------+
| ID | User_ID | Friend_ID |
+----+-------------+-------------------+
| 1 | 102 | 213 |
| 2 | 64 | 23 |
| 3 | 4 | 344 |
| 4 | 102 | 2 |
| 5 | 102 | 90 |
| 6 | 64 | 88 |
+----+-------------+-------------------+
Or store all friends in one row as CSV
+----+-------------+-------------------+
| ID | User_ID | Friend_ID |
+----+-------------+-------------------+
| 1 | 102 | 213,44,34,67,8 |
| 2 | 64 | 23,33,45,105 |
+----+-------------+-------------------+
When retrieving friends I can create an array using explode(); however, deleting a user would be trickier.
Edit: For the second method I would split the ids into an array in PHP for operations such as counting.
Which method do you think is better?
First method is definitely better. It's what makes relational databases great :)
It will allow you to search for and group by much more specific criteria than the 2nd method.
Say you wanted to write a query so users could see who had them as a friend. The 2nd method would require string matching such as FIND_IN_SET() or LIKE, which cannot use an index and would be much slower than simply using JOINs.
The first method is better in just about every way. Not only will you use your DB's indexes to find records faster, it will also make modifications far easier.
Breaking from 1st normal form is usually not desirable because:
It is easy to end up with orphaned ids
It is easy to insert invalid data types
Updates can require full table scans
It increases concurrency issues
There is no way to create the key (user_id, friend_id)
Use the power of the relational database. Definitely go with the first approach. MySQL is faster than you think, and it regularly deals with VERY large datasets.
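As a concrete sketch of the first approach, here is the one-row-per-friendship table with the (user_id, friend_id) key, again using Python with in-memory SQLite as a stand-in for MySQL. The table name friendships, the extra index and the helper functions are assumptions for the example; the rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE friendships (
        user_id   INTEGER NOT NULL,
        friend_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, friend_id)   -- rejects duplicate friendships
    )
    """
)
# Second index so the reverse lookup ("who has me as a friend") is also indexed.
conn.execute("CREATE INDEX idx_friend ON friendships (friend_id)")
conn.executemany(
    "INSERT INTO friendships VALUES (?, ?)",
    [(102, 213), (64, 23), (4, 344), (102, 2), (102, 90), (64, 88)],
)


def friends_of(conn, user_id):
    cur = conn.execute(
        "SELECT friend_id FROM friendships WHERE user_id = ? ORDER BY friend_id",
        (user_id,),
    )
    return [row[0] for row in cur]


def befriended_by(conn, friend_id):
    cur = conn.execute(
        "SELECT user_id FROM friendships WHERE friend_id = ? ORDER BY user_id",
        (friend_id,),
    )
    return [row[0] for row in cur]


def delete_user(conn, user_id):
    """Remove every friendship that involves the user, in either direction."""
    conn.execute(
        "DELETE FROM friendships WHERE user_id = ? OR friend_id = ?",
        (user_id, user_id),
    )
```

Deleting a user, the case that was awkward with the CSV column, is a single DELETE here, and counting friends is a COUNT(*) instead of an explode() in PHP.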