Archiving MySQL data throwing memory limit issue - PHP

I have multiple tables, like table1, table2, table3, etc.
What is required:
1. Fetch a specific row from table1 (for example, id = 203).
2. Fetch all rows related to id 203 from table2 (e.g. ids 1, 2, 3, 4, 5, 6, 7 .... 500).
3. Again fetch all rows matching the ids from step 2 from table3, table4, etc., which have a foreign key relation to table2 (millions of rows).
4. Build insert statements from the results of the above three steps.
5. Run the insert queries from step 4 against the corresponding tables (same table names) in the archive DB. In short, archive part of the data to the archive DB.
How I am doing:
For each table, as I fetch the rows I build insert statements and store them in a dedicated array per table. Once everything up to step 3 has been fetched, I loop over each array and execute those queries against the archive DB. Once the queries have executed successfully, I delete the fetched rows from the main DB and then commit the transaction.
Result:
So far the above approach has worked very well with a small DB of around 10-20 MB of data.
Issue:
For a larger number of rows (say more than 5 GB), PHP throws a memory-exhausted error while fetching the rows, so this does not work in production. I have already increased the memory limit to 3 GB and I don't want to increase it further.
The alternative I am considering is to store these queries in files instead of arrays, and then internally use an infile command to execute the queries against the archive DB.
Please suggest how to solve this. Once the data is moved to the archive DB, there is also a requirement to move it back to the main DB with similar functionality.

There are two keys to handling large result sets.
The first is to stream the result set row by row. Unless you specify this explicitly, the PHP APIs for MySQL immediately attempt to read the entire result set from the MySQL server into client memory, then navigate through that row by row. If your result set has tens or hundreds of thousands of rows, this can make PHP run out of memory.
If you're using the mysql_ interface, use mysql_unbuffered_query(). You should not be using that interface, though. It's deprecated because, well, it sucks.
If you're using the mysqli_ interface, call mysqli_real_query() instead of mysqli_query(). Then call mysqli_use_result() to initiate retrieval of the result set. You can then fetch each row with one of the fetch() variants. Don't forget to use mysqli_free_result() to close the result set when you have fetched all its rows. mysqli_ has object-oriented methods; you can use those as well.
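As a minimal sketch of that pattern (connection details and the query itself are placeholders, not taken from the question):

$db = mysqli_connect('localhost', 'user', 'pass', 'main_db');

// Unbuffered query: rows stay on the MySQL server until fetched one by one.
mysqli_real_query($db, 'SELECT * FROM table2 WHERE table1_id = 203');
$result = mysqli_use_result($db);

while ($row = mysqli_fetch_assoc($result)) {
    // process one row at a time; nothing accumulates in PHP memory
}
mysqli_free_result($result);   // required before running another query on $db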
PDO has a similar way of streaming result sets from server to client.
The second key to handling large result sets is to use a second connection to your MySQL server to perform the INSERT and UPDATE operations so you don't have to accumulate them in memory. The same goes if you choose to write information to a file in the file system: write it out a row at a time so you don't have to hold it in RAM.
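Putting the two ideas together, a rough sketch of the archive copy (table and column names are assumptions; error handling omitted):

$src = mysqli_connect('localhost', 'user', 'pass', 'main_db');     // streams rows out
$dst = mysqli_connect('localhost', 'user', 'pass', 'archive_db');  // receives inserts

$ins = mysqli_prepare($dst, 'INSERT INTO table2 (id, table1_id, payload) VALUES (?, ?, ?)');

mysqli_real_query($src, 'SELECT id, table1_id, payload FROM table2 WHERE table1_id = 203');
$rows = mysqli_use_result($src);

while ($row = mysqli_fetch_assoc($rows)) {
    mysqli_stmt_bind_param($ins, 'iis', $row['id'], $row['table1_id'], $row['payload']);
    mysqli_stmt_execute($ins);   // each row is written immediately, not buffered in PHP
}
mysqli_free_result($rows);
mysqli_stmt_close($ins);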
The trick is to handle one or a few rows at a time, not tens of thousands.
It has to be said: many people prefer to use command-line programs written in a number-crunching language like Java, C#, or Perl for this kind of database maintenance.

Related

Better performance - MySQL temp table or reading CSV file directly or something else?

On a daily basis, I get a source csv file that has 250k rows and 40 columns. It’s 290MB. I will need to filter it because it has more rows than I need and more columns than I need.
For each row that meets the filtering criteria, we want to update that into the destination system 1 record at a time using its PHP API.
What will be the best approach for everything up until the API call (the reading / filtering / loading) for the fastest performance?
Iterating through each row of the file, deciding if it’s a row I want, grabbing only the columns I need, and then passing it to the API?
Loading ALL records into a temporary MySQL table using LOAD DATA INFILE. Then querying the table for the rows and fields I want, and iterating through the resultset passing each record to the API?
Is there a better option?
Thanks!
I need to make an assumption first: the majority of the 250K rows will go to the database. If only a very small percentage will, then iterating over the file and sending the rows in batches is definitely faster.
Different configurations could affect both approaches, but generally speaking, the second approach performs better with less scripting effort.
Approach 1: the worst option is to send each row to the server individually. More round trips and more small commits.
What you can improve here is to send rows in batches, maybe a few hundred together. You will see a much better result.
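For example, if the rows are headed into a MySQL table, a batched insert might look roughly like this (table, column, and variable names are placeholders):

$batch = array();
foreach ($filteredRows as $row) {                    // rows that passed your filter
    $batch[] = sprintf("('%s','%s')",
        mysqli_real_escape_string($link, $row['col1']),
        mysqli_real_escape_string($link, $row['col2']));

    if (count($batch) >= 500) {                      // a few hundred rows per statement
        mysqli_query($link, 'INSERT INTO destination (col1, col2) VALUES ' . implode(',', $batch));
        $batch = array();
    }
}
if ($batch) {                                        // flush the remainder
    mysqli_query($link, 'INSERT INTO destination (col1, col2) VALUES ' . implode(',', $batch));
}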
Approach 2: MyISAM will be faster than InnoDB because of all the overheads and complexity of ACID. If MyISAM is acceptable to you, try it first.
For InnoDB, there is a better Approach 3 (which is actually a mix of approach 1 and approach 2).
Because InnoDB doesn't take table-level locks, you can try to import multiple files concurrently, i.e., split the CSV into multiple files and execute LOAD DATA from your scripts. Don't add an auto_increment key to the table at first, to avoid the auto-inc lock.
LOAD DATA, but use @dummy1, @dummy2, etc. for columns that you don't need to keep. That gets rid of the extra columns. Load into a temp table. (1 SQL statement.)
Do any cleansing of the data. (Some number of SQL statements, but no loop, if possible.)
Do one INSERT INTO real_table SELECT ... FROM tmp_table WHERE ... to both filter out unnecessary rows and copy the desired ones into the real table. (1 SQL statement.)
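A hedged sketch of steps 1 and 3 run from PHP (file path, table, and column names are all assumptions; LOAD DATA LOCAL requires local_infile to be enabled):

$link = mysqli_init();
mysqli_options($link, MYSQLI_OPT_LOCAL_INFILE, true);
mysqli_real_connect($link, 'localhost', 'user', 'pass', 'mydb');

// Step 1: keep only the wanted columns, routing the rest into @dummy variables.
mysqli_query($link, "
    LOAD DATA LOCAL INFILE '/path/daily.csv'
    INTO TABLE tmp_import
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
    (col_a, @dummy1, col_b, @dummy2, col_c)
");

// Step 2 (optional cleansing statements) would go here.

// Step 3: filter out unwanted rows and copy the keepers in one statement.
mysqli_query($link, "
    INSERT INTO real_table (col_a, col_b, col_c)
    SELECT col_a, col_b, col_c FROM tmp_import WHERE col_c > 0
");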
You did not mention any need for step 2. Some things you might need:
Computing one column from other(s).
Normalization.
Parsing dates into the right format.
In one project I did:
1GB of data came in every hour.
Load into a temp table.
Normalize several columns (2 SQLs per column)
Some other processing.
Summarize the data into about 6 summary tables. Sample: INSERT INTO summary SELECT a, b, COUNT(*), SUM(c) FROM tmp GROUP BY a, b; or an INSERT ... ON DUPLICATE KEY UPDATE to deal with rows already existing in summary.
Finally copy the normalized, cleaned, data into the 'real' table.
The entire process took 7 minutes.
Whether to use MyISAM, InnoDB, or MEMORY -- You would need to benchmark your own case. No index is needed on the tmp table since each processing step does a full table scan.
So, 0.3GB each 24 hours -- Should be no sweat.

Handling large data in php

I am trying to insert a large amount of data into the DB after an MLS property search (PHRETS).
The result object has around 4500 to 5000 records, each having 450-475 keys.
It gives "HTTP Error 500 Internal Server Error" while inserting data into the DB after some time (generally after 6-7 minutes), because of the server's time limit I guess. I asked the server guys to increase the execution time limit, but it still gives the error.
Here is my process:
1) I run an MLS search for properties.
2) I try to insert all records at once, using implode to save execution time:
$qry = mysqli_query($link, "INSERT INTO `rets_property_res` VALUES " . implode(',', $sql));
I also tried using prepared statements.
Can we store this data somewhere and process it later all at once, or can we speed up the process so that everything works within the given timeframe?
The fastest way to accomplish this is to grab a large chunk (many thousands of rows), blast it out to a CSV, and perform a LOAD DATA INFILE. Some environments lack the ability to use the LOCAL part of LOAD DATA LOCAL INFILE ... due to server settings. The MySQL manual states:
When loading a table from a text file, use LOAD DATA INFILE. This is
usually 20 times faster than using INSERT statements. See Section
14.2.6, “LOAD DATA INFILE Syntax”.
I have found that to be easily true.
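A rough sketch of that route for the PHRETS result (assuming the CSV column order matches rets_property_res, and that local_infile is enabled on both the client and the server):

$csv = tempnam(sys_get_temp_dir(), 'rets');
$fh  = fopen($csv, 'w');
foreach ($results as $record) {          // $results from the PHRETS search
    fputcsv($fh, $record);               // one row at a time, nothing held in memory
}
fclose($fh);

$link = mysqli_init();
mysqli_options($link, MYSQLI_OPT_LOCAL_INFILE, true);
mysqli_real_connect($link, 'localhost', 'user', 'pass', 'mls');

mysqli_query($link, "
    LOAD DATA LOCAL INFILE '" . mysqli_real_escape_string($link, $csv) . "'
    INTO TABLE rets_property_res
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
");
unlink($csv);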
Certainly slower than that, but far better than individual inserts, is to combine multiple rows into one statement:
INSERT INTO myTable (col1, col2) VALUES ('a1','b1'), ('a2','b2'), ('a3','b3');
So in the above example, 3 rows are inserted with one statement. Typically for speed it is best to play around with 500 to 1000 rows at a time (not 3). That all depends on your string size, based on your schema, for that insert statement.
Security concerns: you need to be alert to the possibility of second-order SQL injection attacks, as far-fetched as that may seem. But it is possible.
All of the above may seem trivial, but for an anecdotal example of the UX pain I offer the following. I have a C# application that grabs questions off of Stack Overflow and houses their metrics - not the body or answers, but the title and many counts and datetimes. I insert 1000 questions at a time into my DB (or do an INSERT IGNORE or an INSERT ... ON DUPLICATE KEY UPDATE). Before converting that to LOAD DATA INFILE it would take about 50 seconds per 1000 rows using C#/MySQL bindings with a re-used prepared statement. After converting it to LOAD DATA INFILE (including the truncate of the work table, the CSV write, and the insert statement), it takes about 1.5 seconds per 1000 rows.
Generally you won't be able to insert 5000 rows at once due to MySQL's max_allowed_packet limit (it depends on how much data there is in each insert).
Try inserting them in smaller groups, e.g. 100 at once or less.

How to insert more than 10000 rows to MSSQL Table

I have a PHP project where I have to insert more than 10,000 rows into a SQL table. The data is taken from a table, checked against some simple conditions, and inserted into a second table at the end of every month.
How should I do this?
I think I need to clarify: I currently transfer in small batches (250 inserts) using a PHP cron job and it works fine, but I want to use the most appropriate method.
Which will be the most appropriate:
A cron job with PHP, as I currently use
Exporting to a file and a BULK import method
Some sort of stored procedure to transfer directly
or something else?
Use insert SQL statement. :^ )
The INSERT statement adds one or more rows to a table or a view in SQL Server 2012.
Example using the mssql_* extension:
$server = 'KALLESPC\SQLEXPRESS';
$link = mssql_connect($server, 'sa', 'phpfi');
// intval() keeps the interpolated values numeric and safe to embed in the SQL string
mssql_query("INSERT INTO STUFF(id, value) VALUES (".intval($id).", ".intval($value).")", $link);
Since the data is large, process it in batches of 500 records.
Check the conditions for one batch of 500 while preparing the next batch of 500, then insert the first batch, and so on.
This keeps the load on your SQL Server down.
I process 40k records daily this way.
Use BULK INSERT - it is designed for exactly what you are asking and significantly increases the speed of inserts.
Also (just in case you really do have no indexes), you may want to consider adding indexes - some indexes (most importantly one on the primary key) may improve the performance of inserts.
The actual rate at which you should be able to insert records will depend on the exact data, the table structure and also on the hardware / configuration of the SQL server itself, so I can't really give you any numbers.
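A hedged sketch of the BULK INSERT route using the mssql_ extension seen elsewhere in this thread (file path, table name, and CSV layout are assumptions, and the file must be readable by the SQL Server service account):

$link = mssql_connect('KALLESPC\SQLEXPRESS', 'sa', 'phpfi');

mssql_query("
    BULK INSERT dbo.SecondTable
    FROM 'C:\\exports\\monthly_rows.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 2, TABLOCK)
", $link);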
SQL Server does not accept more than 1000 rows in a single INSERT ... VALUES statement, so you have to create separate batches for insertion. Here I am suggesting some alternatives which may help you.
Create one stored procedure. Create two temporary tables, one for valid data and the other for invalid data. Check all your rules and validations one by one and, based on that, insert the data into these two tables.
If a row is valid, insert it into the valid temp table; otherwise insert it into the invalid temp table.
Next, using a MERGE statement, you can insert all that data into your destination table as per your requirements (see the sketch below).
You can transfer any number of records between tables this way, so I hope this will be fine for you.
Thanks.
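A minimal sketch of the MERGE step from that suggestion, assuming a #ValidRows temp table was created and filled earlier on the same connection (table and column names are made up):

// $link is the connection on which #ValidRows was created and populated
mssql_query("
    MERGE INTO dbo.TargetTable AS t
    USING #ValidRows AS s
        ON t.Id = s.Id
    WHEN MATCHED THEN
        UPDATE SET t.Value = s.Value
    WHEN NOT MATCHED THEN
        INSERT (Id, Value) VALUES (s.Id, s.Value);
", $link);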
It's simple: you can do it with a while loop, since 10,000 rows is not a huge amount of data.
$query1 = mssql_query("SELECT TOP 10000 * FROM tblSource");
while ($sourcerow = mssql_fetch_object($query1)) {
    // note: non-numeric values would need quoting/escaping before being embedded
    mssql_query("INSERT INTO tblTarget (field1, field2, fieldn) VALUES ($sourcerow->field1, $sourcerow->field2, $sourcerow->fieldn)");
}
This should work fine.

Tricky MySQL Batch Design

I have a scraper which visits many sites and finds upcoming events and another script which is actually supposed to put them in the database. Currently the inserting into the database is my bottleneck and I need a faster way to batch the queries than what I have now.
What makes this tricky is that a single event has data across three tables which have keys to each other. To insert a single event I insert the location or get the already existing id of that location, then insert the actual event text and other data or get the event id if it already exists (some are repeating weekly etc.), and finally insert the date with the location and event ids.
I can't use a REPLACE INTO because it would orphan older data with those same keys. I asked about this in Tricky MySQL Batch Query; the TL;DR outcome was that I have to check which keys already exist, preallocate those that don't, and then make a single insert into each of the tables (i.e. do most of the work in PHP). That's great, but the problem is that if more than one batch is processing at a time, they could both choose to preallocate the same keys and then overwrite each other. Is there any way around this? Then I could go back to that solution. The batches have to be able to work in parallel.
What I have right now is that I simply turn off the indexing for the duration of the batch and insert each of the events separately but I need something faster. Any ideas would be helpful on this rather tricky problem. (The tables are InnoDB now... could transactions help solve any of this?)
I'd recommend starting with MySQL LOCK TABLES, which you can use to prevent other sessions from writing to the tables whilst you insert your data.
For example, you might do something similar to this:
mysql_connect("localhost","root","password");
mysql_select_db("EventsDB");
mysql_query("LOCK TABLE events WRITE");
$firstEntryIndex = mysql_insert_id() + 1;
/*Do stuff*/
...
mysql_query("UNLOCK TABLES);
The above does two things. Firstly, it locks the table, preventing other sessions from writing to it until the point where you're finished and the unlock statement is run. Secondly, $firstEntryIndex is the first key value that will be used in any subsequent insert queries.

PHP problem with selecting from Oracle global temporary table

I have an Oracle global temporary table which is "ON COMMIT DELETE ROWS".
I have a loop in which I:
Insert to global temporary table
Select from global temporary table (post-processing)
Commit, so that the table is purged before next iteration of the loop
Insertion is done with a call to oci_execute($stmt, OCI_DEFAULT). Retrieval is made through a call to oci_fetch_all($stmt, $result, 0, -1, OCI_FETCHSTATEMENT_BY_ROW | OCI_ASSOC). After that, a commit is made: oci_commit().
The problem is that retrieval sometimes works, and sometimes I get one of the following errors:
ORA-08103: object no longer exists
ORA-01410: invalid ROWID
As if the session cannot "see" the records that it previously inserted.
Do you have any idea what could be causing this?
Thanks.
Are you using connection pooling? If so then it could be that different calls are executing in separate sessions.
A better solution would be to have a single PL/SQL procedure which populates the temporary table and returns a result set in a single call. Which then suggests an even better solution: do away with the temporary table altogether.
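For example, with the oci8 functions already in use here you can bind a ref cursor returned by a (hypothetical) procedure and fetch from it directly, with no temporary table involved:

$conn = oci_connect('user', 'password', '//dbhost/ORCL');

$id     = 42;   // placeholder key for the hypothetical procedure
$stmt   = oci_parse($conn, 'BEGIN my_pkg.get_processed_rows(:p_id, :rc); END;');
$cursor = oci_new_cursor($conn);

oci_bind_by_name($stmt, ':p_id', $id);
oci_bind_by_name($stmt, ':rc', $cursor, -1, OCI_B_CURSOR);

oci_execute($stmt);
oci_execute($cursor);   // the returned cursor must be executed before fetching
oci_fetch_all($cursor, $rows, 0, -1, OCI_FETCHSTATEMENT_BY_ROW | OCI_ASSOC);

oci_free_statement($cursor);
oci_free_statement($stmt);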
There are few situations in Oracle which demand the use of temporary tables. Most things are solvable with pure SQL or perhaps bulk collecting into nested tables. What actual manipulation of the data in the temporary table do you undertake between the insert and the subsequent select?
edit
Temporary tables have a performance hit - the rows are written to disk. PL/SQL collections remain in (session) memory and so are faster. Of course, because they are in session memory they won't solve the problem you have with connection pooling.
Is the reason you need to chunk up the data because you don't want to pass 200,000 rows to your PHP in one fell swoop? I think I need a little more context if I am to help you any further.
