I've been struggling with this problem for ages. I have a large data set of 500,000 rows that I want to write to a CSV file.
In the first case, I need to do calculations on the columns of each row to determine a "result" column. I've found a solution for that: I use "fetch", do the calculation row by row, and write to my CSV gradually.
Simplified example:
|---------------------------------------|
| ID | type | var1 | var2 | var1 * var2 |
|---------------------------------------|
| 0 | car | 2 | 5 | 10 |
| 1 | moto | 4 | 8 | 32 |
| 2 | car | 0 | 2 | 0 |
|---------------------------------------|
In the second case, I need to output calculations that each involve a large number of those 500k rows (a single group can contain 10k+ rows).
Simplified example:
|-----------------------------|
| type | sum var1 * var2 |
|-----------------------------|
| car | 87677670 |
| moto | 3232435 |
|-----------------------------|
BUT the calculations are far more complicated than just a sum of products. In other words, they can't be done directly in SQL.
My problem is that if I fetch all my cars from the database into my PHP app to loop over them, PHP's memory_limit will be reached. How can I do this gradually, like in the first example?
Note: I'm using Oracle 12c and PHP 5.3.5.
You can reduce memory usage by doing your calculations in blocks. In SQL, you can loop over several queries, each returning only a fixed number of rows.
With Oracle 12c this can be done with the OFFSET and FETCH clauses. Here is a piece of documentation where you will find everything you need.
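The block-by-block idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database as a stand-in for PHP and Oracle, the table name `vehicles` is hypothetical, and SQLite's `LIMIT ? OFFSET ?` takes the place of Oracle 12c's `OFFSET :n ROWS FETCH NEXT :m ROWS ONLY`. Rows are ordered by type, so each group can be reduced and released before the next one starts.

```python
import sqlite3
from itertools import groupby


def iter_rows_in_blocks(conn, block_size):
    """Yield (type, var1, var2) rows ordered by type, one block per query."""
    offset = 0
    while True:
        # On Oracle 12c the tail of this query would be
        # "OFFSET :offset ROWS FETCH NEXT :block_size ROWS ONLY".
        block = conn.execute(
            "SELECT type, var1, var2 FROM vehicles "
            "ORDER BY type, id LIMIT ? OFFSET ?",
            (block_size, offset),
        ).fetchall()
        if not block:
            return
        yield from block
        offset += len(block)


def group_results(conn, block_size=2):
    """Reduce each type group while holding at most one block in memory."""
    results = []
    rows = iter_rows_in_blocks(conn, block_size)
    for vtype, group in groupby(rows, key=lambda row: row[0]):
        total = 0
        for _, var1, var2 in group:
            total += var1 * var2  # placeholder for the real, more complex calculation
        results.append((vtype, total))  # in practice: write one CSV line here
    return results


# Hypothetical sample data modelled on the simplified example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vehicles (id INTEGER PRIMARY KEY, type TEXT, var1 INTEGER, var2 INTEGER)"
)
conn.executemany(
    "INSERT INTO vehicles VALUES (?, ?, ?, ?)",
    [(0, "car", 2, 5), (1, "moto", 4, 8), (2, "car", 0, 2)],
)
chunked_totals = group_results(conn)
```

A stable ORDER BY matters here: without it, consecutive blocks are not guaranteed to be disjoint. Large offsets can also become slow, in which case paging on the last seen key is a common alternative.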
I am currently evaluating Aerospike as a replacement for my company's MySQL database. Currently, in MySQL, we have a table that stores the transaction data. The table looks like this:
+--------+------------+-----------+------------+-----+--------+
| trx_id | trx_date | client_id | product_id | qty | total |
+--------+------------+-----------+------------+-----+--------+
| 1 | 2015-01-01 | 1 | 1 | 100 | 100000 |
| 2 | 2015-01-02 | 2 | 2 | 200 | 200000 |
| 3 | 2015-01-03 | 3 | 3 | 300 | 300000 |
+--------+------------+-----------+------------+-----+--------+
For reporting, we usually do something like :
SELECT MONTH(trx_date), SUM(qty), SUM(total) FROM transaction WHERE client_id = 1 AND product_id = 1 GROUP BY MONTH(trx_date)
to get the monthly transaction data for a client.
I've read the documentation for the Aerospike PHP client and I don't seem to find anything similar to AND, GROUP BY, or MONTH.
So, in Aerospike PHP client, what is the recommended way to achieve something like that?
Thanks.
Aerospike is a NoSQL key-value store, and as such you can't expect to use SQL with it. However, using Lua as the User-Defined Function (UDF) language, you can extend the basic functionality.
What you are looking for is an aggregation, applying a stream UDF to the results of a query.
There is an example of implementing a GROUP BY x HAVING in the PHP client's documentation for the aggregate() method. The thing to remember is that you want the secondary index query to eliminate as many records as it can, so that predicate should be used for the 'WHERE', and the secondary filtering for the 'AND' should happen inside the stream UDF's filter, on the smallest possible data set.
Reading the UDF Development Guide would also help.
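To make the shape of that pipeline concrete, here is a conceptual sketch in plain Python. It does not use the Aerospike client or Lua at all: `indexed_query` and `monthly_totals` are hypothetical names that only mimic the stages (index predicate, then filter, then group and reduce), and records 4 and 5 are invented to make the grouping visible.

```python
from collections import defaultdict
from datetime import date

# Records 1-3 mirror the question's table; records 4-5 are hypothetical extras.
transactions = [
    {"trx_id": 1, "trx_date": date(2015, 1, 1), "client_id": 1, "product_id": 1, "qty": 100, "total": 100000},
    {"trx_id": 2, "trx_date": date(2015, 1, 2), "client_id": 2, "product_id": 2, "qty": 200, "total": 200000},
    {"trx_id": 3, "trx_date": date(2015, 1, 3), "client_id": 3, "product_id": 3, "qty": 300, "total": 300000},
    {"trx_id": 4, "trx_date": date(2015, 2, 10), "client_id": 1, "product_id": 1, "qty": 50, "total": 50000},
    {"trx_id": 5, "trx_date": date(2015, 1, 20), "client_id": 1, "product_id": 2, "qty": 70, "total": 70000},
]


def indexed_query(records, client_id):
    """Stand-in for the secondary index query (the 'WHERE' predicate)."""
    return (rec for rec in records if rec["client_id"] == client_id)


def monthly_totals(stream, product_id):
    """Stand-in for the stream UDF: filter (the 'AND'), group by month, sum."""
    totals = defaultdict(lambda: [0, 0])
    for rec in stream:
        if rec["product_id"] != product_id:
            continue  # secondary filtering happens inside the stream
        month = rec["trx_date"].month  # plays the role of MONTH(trx_date)
        totals[month][0] += rec["qty"]
        totals[month][1] += rec["total"]
    return {month: tuple(sums) for month, sums in sorted(totals.items())}


report = monthly_totals(indexed_query(transactions, client_id=1), product_id=1)
```

In a real deployment the filter and reduce steps would live in a Lua module registered on the server, and the client would only receive the reduced result.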
Say I wanted to add logging of user actions to a web application. My table schema would look similar to the following:
tbl_history:
+----+---------+--+-----------+
| id | user_id | | action_id |
+----+---------+--+-----------+
| 1 | 1 | | 1 |
| 1 | 1 | | 2 |
| 1 | 2 | | 2 |
+----+---------+--+-----------+
A user can generate many actions, so I will need to paginate this history. To do this, I need to work out the total number of rows for the user and then calculate how many pages of data there should be.
Which method would be the most efficient if I had hundreds of users generating thousands of rows of data each day?
A)
Using MySQL's COUNT() function to query the number of rows in the tbl_history table for a particular user.
B)
Having another table which would keep a count of history for the user within the tbl_history table.
+---------+--+---------------+
| user_id | | history_count |
+---------+--+---------------+
| 1 | | 2 |
| 2 | | 1 |
+---------+--+---------------+
This will allow me to instantly get the total count of rows with a simple query in less than 1ms.
The tradeoff is that I will need to perform more queries updating the count for each user and also again on page load.
Which method is more efficient to use? Or is there any other better method? Any technical explanation would be great.
Thanks in advance.
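For comparison, both options can be sketched side by side. This is an illustration only, assuming Python with an in-memory SQLite database in place of MySQL; the counter table name `tbl_history_count`, the index name, and the page size of 25 are hypothetical.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_history (id INTEGER PRIMARY KEY, user_id INTEGER, action_id INTEGER);
    -- Option A relies on this index so COUNT(*) does not scan the whole table.
    CREATE INDEX idx_history_user ON tbl_history (user_id);
    -- Option B: one counter row per user (hypothetical table name).
    CREATE TABLE tbl_history_count (user_id INTEGER PRIMARY KEY, history_count INTEGER NOT NULL);
    """
)


def log_action(conn, user_id, action_id):
    """Insert a history row and bump the user's counter in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO tbl_history (user_id, action_id) VALUES (?, ?)",
            (user_id, action_id),
        )
        conn.execute(
            "INSERT OR IGNORE INTO tbl_history_count (user_id, history_count) VALUES (?, 0)",
            (user_id,),
        )
        conn.execute(
            "UPDATE tbl_history_count SET history_count = history_count + 1 WHERE user_id = ?",
            (user_id,),
        )


def count_with_query(conn, user_id):
    """Option A: count the rows on demand."""
    return conn.execute(
        "SELECT COUNT(*) FROM tbl_history WHERE user_id = ?", (user_id,)
    ).fetchone()[0]


def count_with_counter(conn, user_id):
    """Option B: read the maintained counter."""
    row = conn.execute(
        "SELECT history_count FROM tbl_history_count WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0


for user_id, action_id in [(1, 1), (1, 2), (2, 2)]:
    log_action(conn, user_id, action_id)

count_a = count_with_query(conn, 1)
count_b = count_with_counter(conn, 1)
page_size = 25  # hypothetical page size
page_count = math.ceil(count_a / page_size)
```

Updating the counter in the same transaction as the insert keeps the two numbers from drifting apart; whether the extra write is worth it depends on how often pages are viewed relative to how often actions are logged.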
I'm displaying a record set using Datatables pulling records from two tables.
Table A
sno | item_id | start_date | end_date | created_on |
===========================================================
10523563 | 2 | 2013-10-24 | 2013-10-27 | 2013-01-22 |
10535677 | 25 | 2013-11-18 | 2013-11-29 | 2013-01-22 |
10587723 | 11 | 2013-05-04 | 2013-05-24 | 2013-01-22 |
10598734 | 5 | 2013-06-14 | 2013-06-22 | 2013-01-22 |
Table B
id | item_name |
=====================================
2 | Timesheet testing |
25 | Vigour |
11 | Fabwash |
5 | Cruise |
Now since the number of records returned is going to turn into a big number in near future, I want the processing to be done serverside. I've successfully managed to achieve that but it came at a cost. I'm running into a problem while dealing with filters.
From the figure above, (1) is the column whose value will be in int (item_id), but using some small modifications inside the while loop of the mysql resource, I'm displaying the corresponding string using Table B.
Now if I use the filter (2), it is working fine since those values come from Table A
The Problem
When I try to filter from the field (3), if I enter a string value such as fab it says no record found. But if I enter an int such as 11 I get a single row which contains Fabwash as the item name.
So while filtering I'm required to use the direct value used in Table A and not its corresponding string value stored in Table B. I hope the point that I'm putting across is understandable because it is hard to explain it in words.
I'm clueless on how to solve the issue.
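One common way to make a name filter work is to join the lookup table in the server-side query and apply the filter to the name column, instead of translating the id to a name afterwards in the loop. The sketch below is an assumption-laden illustration: it uses Python with in-memory SQLite rather than PHP and MySQL, and the table names `table_a` and `table_b` and the function `search` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (sno INTEGER PRIMARY KEY, item_id INTEGER,
                          start_date TEXT, end_date TEXT, created_on TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, item_name TEXT);
    """
)
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?, ?, ?, ?)",
    [
        (10523563, 2, "2013-10-24", "2013-10-27", "2013-01-22"),
        (10535677, 25, "2013-11-18", "2013-11-29", "2013-01-22"),
        (10587723, 11, "2013-05-04", "2013-05-24", "2013-01-22"),
        (10598734, 5, "2013-06-14", "2013-06-22", "2013-01-22"),
    ],
)
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?)",
    [(2, "Timesheet testing"), (25, "Vigour"), (11, "Fabwash"), (5, "Cruise")],
)


def search(conn, term):
    """Filter on the item name from Table B rather than the id in Table A."""
    return conn.execute(
        """
        SELECT a.sno, b.item_name
        FROM table_a AS a
        JOIN table_b AS b ON b.id = a.item_id
        WHERE b.item_name LIKE ?
        ORDER BY a.sno
        """,
        ("%" + term + "%",),  # bound parameter, so the user's text is never spliced into the SQL
    ).fetchall()


matches = search(conn, "fab")
```

With the join in place, the same query can also sort and paginate on the name column, because the string value is available to the database and not only to the display loop.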
I have a MySQL lookup table that returns the points received for a certain place (P) among a number of finishers (N), for a variety of formats (points_id). Different point structures are used for different events. Sometimes the points awarded depend on the number of finishers (N), and sometimes they don't.
Here is a short version of the table, with two sample structures.
For points_id 1 the points depend on N; for points_id 3 they don't.
points
points_id | P | N | points |
1 | 1 | 3 | 90 |
1 | 1 | 2 | 85 |
1 | 1 | 1 | 80 |
1 | 2 | 3 | 60 |
1 | 2 | 2 | 50 |
1 | 3 | 3 | 30 |
3 | 1 | | 100 |
3 | 2 | | 90 |
3 | 3 | | 80 |
3 | 3 | | 70 |
So my question:
1) Is there a way to put a wildcard in the table data?
For example, if the N column that is shown blank had a % in it
and I ran this query:
SELECT points from t1 WHERE points_id=3 and P=3 and N=2
it would return 96?
PS: I know this doesn't work, but it shows my idea.
2) I want it to be fast, and I may put it in a procedure for use in larger queries. I am guessing that, unless there is a very simple way to do what I show above, the fastest method will be to have rows for all of the different values of N in the points_id = 3 case. Is that true?
You might consider UNION ALL:
SELECT points from t1 WHERE points_id=3 AND P=3
UNION ALL
SELECT points from t1 WHERE points_id=3 AND N=2
This returns the rows that match P=3 together with the rows that match N=2. I copied your database schema and tried this, and it produced:
points
------
80
70
If you want this to be fast with a large amount of data, you'll really want an index and/or a primary key.
Try this:
-- if N is a varchar column
SELECT points FROM t1 WHERE points_id=3 AND P=3 AND (N=2 OR IFNULL(N,'')='')
-- if N is a numeric column
SELECT points FROM t1 WHERE points_id=3 AND P=3 AND (N=2 OR IFNULL(N,0)=0)
Let me know if anything needs changing or if I've misunderstood you.
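The "blank N acts as a wildcard" idea can be checked with a small sketch. The assumptions here are that the blank cells are stored as NULL, and that Python with in-memory SQLite is an acceptable stand-in for MySQL; the helper name `lookup_points` is hypothetical. `N IS NULL` is used as the wildcard test, which behaves the same way as the IFNULL form for NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (points_id INTEGER, P INTEGER, N INTEGER, points INTEGER)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        (1, 1, 3, 90), (1, 1, 2, 85), (1, 1, 1, 80),
        (1, 2, 3, 60), (1, 2, 2, 50), (1, 3, 3, 30),
        # Blank N in the question's table is stored as NULL (assumption).
        (3, 1, None, 100), (3, 2, None, 90), (3, 3, None, 80), (3, 3, None, 70),
    ],
)


def lookup_points(conn, points_id, place, finishers):
    """Match an exact N, or any row whose N is NULL (the 'wildcard')."""
    rows = conn.execute(
        """
        SELECT points FROM t1
        WHERE points_id = ? AND P = ? AND (N = ? OR N IS NULL)
        ORDER BY points
        """,
        (points_id, place, finishers),
    ).fetchall()
    return [row[0] for row in rows]


depends_on_n = lookup_points(conn, 1, 1, 2)
ignores_n = lookup_points(conn, 3, 2, 2)
```

A composite index on (points_id, P, N) would let both branches of the OR be answered from the index, which matters if this lookup is embedded in larger queries.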
I have a comma-delimited list that I'm storing in a varchar field in a MySQL table.
Is it possible to add and remove values from the list directly using SQL queries? Or do I have to take the data out of the table, manipulate it in PHP, and write it back to MySQL?
There is no way to do it in the InnoDB and MyISAM engines in MySQL. It might be possible in other engines (check the CSV engine).
You can do it in a stored procedure, but that's not recommended.
What you should do to solve this kind of issue is refactor your code and normalize your DB:
Original table:
T1: id | data | some_other_data
1 | gg,jj,ss,ee,tt,hh | abanibi
To become:
T1: id | some_other_data
1 | abanibi
T2: id | t1_id | data_piece
1 | 1 | gg
2 | 1 | jj
3 | 1 | ss
4 | 1 | ee
5 | 1 | tt
6 | 1 | hh
And if data_piece is a constant value that is reused a lot in the system, you should add a lookup table for it too.
I know it looks like more work, but it will save you from issues like the one you have now, which take much more time to solve.
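With the normalized layout, adding or removing a value becomes a single INSERT or DELETE. A minimal sketch, assuming Python with in-memory SQLite in place of MySQL; the table names `t1` and `t2` follow the layout above, and the helper names and the new value "zz" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, some_other_data TEXT);
    CREATE TABLE t2 (id INTEGER PRIMARY KEY,
                     t1_id INTEGER REFERENCES t1 (id),
                     data_piece TEXT);
    INSERT INTO t1 VALUES (1, 'abanibi');
    """
)
conn.executemany(
    "INSERT INTO t2 (t1_id, data_piece) VALUES (1, ?)",
    [(piece,) for piece in "gg,jj,ss,ee,tt,hh".split(",")],
)


def add_piece(conn, t1_id, piece):
    """Adding a value to the 'list' is one INSERT."""
    conn.execute("INSERT INTO t2 (t1_id, data_piece) VALUES (?, ?)", (t1_id, piece))


def remove_piece(conn, t1_id, piece):
    """Removing a value from the 'list' is one DELETE."""
    conn.execute("DELETE FROM t2 WHERE t1_id = ? AND data_piece = ?", (t1_id, piece))


def pieces_as_csv(conn, t1_id):
    """Rebuild the comma-delimited form only when it is needed for display."""
    rows = conn.execute(
        "SELECT data_piece FROM t2 WHERE t1_id = ? ORDER BY id", (t1_id,)
    ).fetchall()
    return ",".join(row[0] for row in rows)


add_piece(conn, 1, "zz")
remove_piece(conn, 1, "jj")
current_list = pieces_as_csv(conn, 1)
```

The comma-delimited string can still be produced for display, but it is derived from the rows rather than stored, so no string manipulation is needed to change it.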