I have a MySQL database that I am trying to migrate into another database. They have different schemas, and I have written a PHP script for each table of the old database in order to populate its data into the new one. The script works just fine, but the problem is that it does not move all the data. For example, if I have a table whose rows are all selected, only half of them end up inserted into the new table. The way I am doing it: I open a connection to the first database, SELECT * and put the result into an associative array; then I close that connection, connect to the other database, go through each element of the array and insert it into the new table. Is there a limit to how big an array can be? What is wrong here?
You should read the rows from the first database in chunks (of 1,000 rows, for example), write those rows to the second database, clear the array (with unset() or by assigning an empty array) and repeat the process until you have read all the rows.
This overcomes the memory limitations.
Another problem might be that the script is running for too long (if the table is large), so try using the function set_time_limit(). This function resets the timeout after which a script is terminated; I suggest calling it after processing each chunk.
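Here is a minimal sketch of that chunked approach; the table names, columns and credentials below are placeholders, not your actual schema:
<?php
// Migrate old_users -> new_users in chunks of 1000 rows (names are examples).
$source = new PDO('mysql:host=localhost;dbname=old_db', 'user', 'pass');
$target = new PDO('mysql:host=localhost;dbname=new_db', 'user', 'pass');

$insert = $target->prepare('INSERT INTO new_users (id, full_name, email) VALUES (?, ?, ?)');

$chunkSize = 1000;
$offset = 0;

do {
    // Read one chunk from the source table.
    $rows = $source->query("SELECT id, name, email FROM old_users LIMIT $offset, $chunkSize")
                   ->fetchAll(PDO::FETCH_ASSOC);

    foreach ($rows as $row) {
        $insert->execute([$row['id'], $row['name'], $row['email']]);
    }

    $fetched = count($rows);
    unset($rows);          // free the chunk before reading the next one
    set_time_limit(30);    // reset the execution timer after each chunk
    $offset += $chunkSize;
} while ($fetched === $chunkSize);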
First of all, I don't see the point in writing a script to do this. Why don't you just get a SQL dump from phpMyAdmin and edit it so that it fits the other database? Or are they that different?
But to reply to your question: my first thought, like other people have already said, is that the problem is the time limit. Before you try to do something about it, check the value of max_execution_time in php.ini (usually about 30 seconds) and how long the script actually takes to execute. If it terminates after roughly 30 seconds (or the value of max_execution_time, if it's different), then that's likely the problem, although PHP should throw an error (or at least a warning).
I don't think there's a limit on the size of an array in PHP. However, there is a directive in php.ini, namely memory_limit, that defines the amount of memory a script can use.
If you have access to your php.ini file, I suggest setting both max_execution_time and memory_limit to a higher value. If you don't have access to php.ini, you won't be able to change the memory_limit directive. You will have to work your way around this, for example by using LIMIT in your SQL. Be sure to unset your used variables, or you could run into the same problem.
You may have constraints in the target database that are rejecting some of your attempted inserts.
Why not do this via sql scripts?
If you prefer to do it via PHP, then you could open connections to both databases and insert into the target as you read from the source. That way you avoid using too much memory.
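A rough sketch of that read-as-you-write idea with PDO, using an unbuffered query so rows are streamed instead of collected into one big array (database names, credentials and columns are placeholders):
<?php
$source = new PDO('mysql:host=localhost;dbname=old_db', 'user', 'pass');
$source->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); // stream rows from the source

$target = new PDO('mysql:host=localhost;dbname=new_db', 'user', 'pass');
$insert = $target->prepare('INSERT INTO new_table (col_a, col_b) VALUES (?, ?)');

$select = $source->query('SELECT col_a, col_b FROM old_table');
while ($row = $select->fetch(PDO::FETCH_ASSOC)) {
    $insert->execute([$row['col_a'], $row['col_b']]); // one row in memory at a time
}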
Using PHP for the transform/convert logic is a possibility. I would do it if you are doing complex transformations and if your PHP skills are much better than your MySQL skills.
If you need more memory in your PHP script, raise these settings in php.ini:
memory_limit = 2048M
max_execution_time = 3600
This will give you 2 GB of possible space for the array and about an hour for processing. But if your database is really that big, it would be much (really a lot) faster to:
1. Use mysqldump to make a dump of your source server. Check it here: http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html
2. Upload the dump file and import it. There are a bunch of examples on the MySQL documentation page (look also in the comments).
After this you can transform your database through CREATE ... SELECT statements.
CREATE TABLE one SELECT * FROM two;
As an alternative you can use UPDATE statements. Which is best depends heavily on the kind of job you are doing.
Good luck!
It would be preferable to do a mysql dump at the command line:
mysqldump -a -u USER_NAME -p SOURCE_DATABASE_NAME > DATA.mysql
You can also gzip the file to make it smaller for transfer to another server:
gzip DATA.mysql
After transfer, unzip the file:
gunzip -f DATA.mysql.gz
And import it:
mysql -u USER_NAME -p TARGET_DATABASE_NAME < DATA.mysql
Your server (as all servers do) will have a memory limit for PHP - if you use more than the assigned limit, then the script will fail.
Is it possible to just dump the current MySQL database into text files, perform find-and-replace or regexp-based replacements to change the schemas within the text files, and then reload the amended text files into MySQL to complete the change? If this is a one-off migration, then it may be a better way to do it.
You may be running into PHP's execution time or memory limits. Make sure the appropriate settings in php.ini are high enough to allow the script to finish executing.
I have a problem exporting a large amount of data to a CSV file using PHP.
Information:
I need to export 700,000 addresses from a database table (address).
The server times out or runs out of memory.
The project I'm working on runs on multiple servers.
My solution (what I have tried):
Get the data part by part (from the database), process it (fputcsv), write that part to a temporary file, and send information to the user via Ajax (show them the percentage processed). After the last part of the data has been processed, just give the user a link to download the file. This all works fine on my local environment, but
the problem is that the project runs on multiple servers, so I ran into the problem that the temporary file can end up stored on different servers.
For Example:
I have 3 servers: Server1, Server2 and Server3.
The first time I read data from the DB with LIMIT 0, 50000, process it and save it to File.csv on Server1; the next iteration (LIMIT 50000, 50000) can be handled and saved on another server, Server2 - this is the problem.
So my question is:
Where can I store my processed temporary CSV data? Or maybe I am missing something; I am stuck here and looking for advice.
Every suggestion or solution will be appreciated! Thanks.
UPDATE
PROBLEM IS SOLVED
I will post my solution later.
You can use a MySQL query to export the records directly from the database into a CSV file:
SELECT id, name, email INTO OUTFILE '/tmp/result.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
FROM users WHERE 1
It would really be helpful if you posted your code. The reason I'm saying that is because it doesn't sound like you're looping row after row, which would save you heaps of memory - no huge array to keep in RAM. If you're not looping row by row and committing to the CSV file as you go, then I suggest you modify your code to do just that; it might solve the issue altogether.
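For illustration, a minimal sketch of looping row by row and committing to the CSV as you go (credentials, table, columns and the output path are placeholders):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); // don't buffer 700,000 rows

$out  = fopen('/tmp/addresses.csv', 'w');
$stmt = $pdo->query('SELECT id, street, city, zip FROM address');

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    fputcsv($out, $row); // each row goes straight to disk, no huge array in RAM
}
fclose($out);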
If even committing to the CSV row by row is not enough, then the issue you're running into is that your server setup relies on the code being stateless, but your code isn't.
You can solve this issue using either of the following ways:
Make user sessions server-specific. If you're routing requests via a load balancer, then you can probably control that setting there ("sticky sessions"). If not, you'll have to set up custom session handling and configure your environment accordingly. I don't like this method, but if you can control it via the load balancer, it might be the quickest way to get the problem solved.
Save the temporary files to the shared DB all the servers have access to with a simple transaction ID or some other identifier. Make the server handling this last portion of the export aggregate the data and prepare the file for download.
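For the second option, here is a rough sketch; the export_chunks table, its columns and the sample values are assumptions for illustration, not an existing schema:
<?php
$pdo = new PDO('mysql:host=shared-db-host;dbname=app', 'user', 'pass');

$exportId = 'export-42';                // e.g. generated when the export starts
$chunkNo  = 1;                          // which 50,000-row slice this request handled
$csvChunk = "1,Main St,Springfield\n";  // CSV text produced for that slice

// Whichever server handled this request stores its chunk centrally.
$store = $pdo->prepare('INSERT INTO export_chunks (export_id, chunk_no, payload) VALUES (?, ?, ?)');
$store->execute([$exportId, $chunkNo, $csvChunk]);

// The server that serves the download stitches the chunks together in order.
$fetch = $pdo->prepare('SELECT payload FROM export_chunks WHERE export_id = ? ORDER BY chunk_no');
$fetch->execute([$exportId]);

$out = fopen('/tmp/' . $exportId . '.csv', 'w');
while (($payload = $fetch->fetchColumn()) !== false) {
    fwrite($out, $payload);
}
fclose($out);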
Potentially, you could run into another memory limit or max run time issue with method #2 as well. In that case, if you cannot raise the servers' RAM, configure PHP to use more of it, or extend the script's maximum run time, my suggestion would be to let the user download the file portion by portion: export the CSV up to the limit your server supports, let the user download that part, then let them download the next file, and so on.
You should probably try that approach before any of the others. But perhaps the question we should really be asking is why use PHP to convert database entries into CSV in the first place? A lot of DBs have a built-in CSV export which is almost guaranteed to take less memory and time. If you're using MySQL, for example, see: How to output MySQL query results in CSV format?
Hope this helps.
You can increase the execution time of your PHP code using ini_set('max_execution_time', $seconds), where $seconds is the number of seconds to allow.
I have already followed the question Data Limit on MySQL DB Insert, but I was unable to solve this with the limited info there.
I am using WAMP.
I have numerous rich text editors and 4 images which are being sent over to another page via a POST request. After a certain threshold, the query fails. Is there a way around this?
EDIT: when displaying the query string it seems that I am able to retrieve every bit of data that was sent via POST, so I am quite sure that the problem is DB related.
Images are being stored as a BLOB.
EDIT #2: Error showing is "MySQL server has gone away".
You may be exceeding the max_allowed_packet setting. See here for more details.
Quote,
If you are using the mysql client program, its default
max_allowed_packet variable is 16MB.
If you are uploading uncompressed images, this value is fairly easy to reach.
Also, it would be great if you could name the specific database interface class that you use (PDO? mysql_? mysqli_?), as different classes handle errors differently. It could just as well not handle an oversized packet situation at all.
P.S.: You should really be checking your logs for the specific error you encounter. The first place to look for would be /var/log/mysql/error.log (could vary depending on your env)
Update:
mysql_error() returned "MySQL server has gone away"
From the manual pages for the error: "You can also get these errors if you send a query to the server that is incorrect or too large. If mysqld receives a packet that is too large or out of order, it assumes that something has gone wrong with the client and closes the connection. If you need big queries (for example, if you are working with big BLOB columns), you can increase the query limit by setting the server's max_allowed_packet variable, which has a default value of 1MB. You may also need to increase the maximum packet size on the client end..."
(quote courtesy of #Colin Morelli)
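If you want to check this from PHP before sending the query, here is a hedged sketch; the credentials, table and example statement are placeholders:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$row = $pdo->query("SHOW VARIABLES LIKE 'max_allowed_packet'")->fetch(PDO::FETCH_ASSOC);
$maxPacket = (int) $row['Value'];

$sql = "INSERT INTO posts (body, image) VALUES ('...', '...')"; // your actual INSERT with the BLOBs
if (strlen($sql) >= $maxPacket) {
    // The whole statement has to fit in a single packet; otherwise the server drops
    // the connection and you see "MySQL server has gone away". Raise max_allowed_packet
    // in my.cnf (or SET GLOBAL max_allowed_packet = ...) and on the client if needed.
    die('Query of ' . strlen($sql) . " bytes exceeds max_allowed_packet ($maxPacket bytes).");
}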
Sometimes PHP hits the memory limit if the uploaded file is too large. Depending on your config, this might help:
set_time_limit(0);
ini_set('memory_limit', '-1');
EDIT:
if it is not the memory allocation thing we all rushed to answer,
then it could be a MySQL memory/storage engine setting, so you could probably check that.
Comment:
In my experience, it is most likely a memory issue, since it only occurs when you try bigger imports
(it happens to my application when I try to return a 20MB result set from a single query).
I was wondering if there is a (free) tool for MySQL/PHP benchmarking.
In particular, I would like to insert thousands of rows into the MySQL database and test the application with concurrent queries to see if it holds up - that is, test the application in the worst case.
I saw some paid tools, but no free or customizable ones.
Any suggestion? Or any script?
Thnx
Insert one record into the table.
Then do:
INSERT IGNORE INTO table SELECT FLOOR(RAND()*100000) FROM table;
Then run that line several times. Each time you will double the number of rows in the table (and doubling grows VERY fast). This is a LOT faster than generating the data in PHP or other code. You can modify which columns you select RAND() from, and what the range of the numbers is. It's possible to randomly generate text too, but more work.
You can run this code from several terminals at once to test concurrent inserts. The IGNORE will ignore any primary key collisions.
Make a loop (probably infinite) that would keep inserting data into the database and test going from there.
for ($i = 1; $i <= 1000; $i++) {
    mysql_query("INSERT INTO testing VALUES ('" . $i . "')");
    // do some other testing
}
for($i=1;$i<5000;$i++){
$query = mysql_query("INSERT INTO something VALUES ($i)");
}
replace something with your table ;D
If you want to test concurrency you will have to run your insert/update statements in parallel.
An easy and very simple way (without going into fork/threads and all that jazz) would be to do it in bash, as follows:
1. Create an executable PHP script
#!/usr/bin/php -q
<?php
/*your php code to insert/update/whatever you want to test for concurrency*/
?>
2. Call it within a for loop by appending & so it goes in the background.
#!/bin/bash
for((i=0; i<100; i++))
do
/path/to/my/php/script.sh &
done
wait;
You can always extend this by creating multiple PHP scripts with various insert/update/select queries and running them through the for loop (remember to change i<100 to a higher number if you want more load). Just don't forget to add the & after you call your script. (Of course, you will also need to chmod +x script.sh.)
Edit: Added the wait statement; below it you can write other commands/stuff you may want to do after flooding your MySQL DB.
I did a quick search and found the following page at MySQL documentation => http://dev.mysql.com/doc/refman/5.0/en/custom-benchmarks.html. This page contains the following interesting links:
the Open Source Database Benchmark, available at http://osdb.sourceforge.net/.
For example, you can try benchmarking packages such as SysBench and DBT2, available at http://sourceforge.net/projects/sysbench/ and http://osdldbt.sourceforge.net/#dbt2. These packages can bring a system to its knees, so be sure to use them only on your development systems.
For MySQL to be fast you should look into Memcached or Redis to cache your queries. I like Redis a lot, and you can get a free (small) instance thanks to http://redistogo.com. Most of the time it is the reads that are killing your server, not the writes, which are usually less frequent. And when writes are frequent, losing some data is often not a big deal: sites with big write rates are, for example, Twitter or Facebook, and I don't think it is the end of the world if a tweet or a Facebook wall post gets lost. As I pointed out, you can fix the read load easily by using Memcached or Redis.
If the writes are killing you, you could look into bulk inserts (if possible), transactional inserts, delayed inserts (when not using InnoDB), or partitioning. If the data is not really critical, you could keep the queries in memory first and then do a bulk insert periodically; this way, reads from MySQL may return stale data (which could be a problem). Then again, with Redis you could easily store all your data in memory, but when your server crashes you can lose data, which could be a big problem.
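To make the bulk-insert idea concrete, here is a rough sketch; the events table, its columns and the sample rows are placeholders:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$incomingRows = [
    ['user_id' => 1, 'message' => 'hello'],
    ['user_id' => 2, 'message' => 'world'],
    // ... thousands more queued writes
];

$buffer = [];
foreach ($incomingRows as $row) {
    $buffer[] = $row;
    if (count($buffer) >= 500) {   // flush every 500 rows as one statement
        flushBuffer($pdo, $buffer);
        $buffer = [];
    }
}
if ($buffer) {
    flushBuffer($pdo, $buffer);    // flush the remainder
}

function flushBuffer(PDO $pdo, array $rows)
{
    // Build "(?, ?), (?, ?), ..." for a single multi-row INSERT.
    $placeholders = implode(', ', array_fill(0, count($rows), '(?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO events (user_id, message) VALUES $placeholders");

    $params = [];
    foreach ($rows as $r) {
        $params[] = $r['user_id'];
        $params[] = $r['message'];
    }
    $stmt->execute($params);
}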
I'm converting a forum from myBB to IPBoard (the conversion is done through a PHP script); however, I have over 4 million posts that need to be converted, and it will take about 10 hours at the current rate. I basically have unlimited RAM and CPU. What I want to know is how I can speed this process up. Is there a way I can allocate a huge amount of memory to this one process?
Thanks for any help!
You're not going to get a script to run any faster. By giving it more memory, you might be able to have it do more posts at one time, though. Change memory_limit in your php.ini file to change how much memory it can use.
You might be able to tell the script to do one forum at a time. Then you could run several copies of the script at once. This will be limited by how it talks to the database table and whether the script has been written to allow this -- it might do daft things like lock the target table or do an insanely long read on the source table. In any case, you would be unlikely to get more than three or four running at once without everything slowing down, anyway.
It might be possible to improve the script, but that would be several days' hard work learning the insides of both forums' database formats. Have you asked on the forums for IPBoard? Maybe someone there has experience at what you're trying to do.
Not sure how the conversion is done, but if you are importing an SQL file, you could split it up into multiple files and import them at the same time. Hope that helps :)
If you are saying that you already have the file(s) converted, you should look into MySQL's LOAD DATA INFILE for importing, given you have access to the MySQL console. This will load data considerably faster than executing the SQL statements via the source command.
If you do not have them in files and you are converting on the fly, then I would suggest having the conversion script write the data to a file (set the time limit to 0 to allow it to run) and then using that LOAD DATA command to insert/update the data.
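If it helps, here is a rough sketch of running LOAD DATA LOCAL INFILE from PHP; the file path, table name and column list are made-up placeholders, and the same statement can be run straight from the MySQL console (drop LOCAL if the file already sits on the database server):
<?php
$db = mysqli_init();
$db->options(MYSQLI_OPT_LOCAL_INFILE, true);       // LOCAL must be allowed on both client and server
$db->real_connect('localhost', 'user', 'pass', 'ipboard');

$sql = <<<'SQL'
LOAD DATA LOCAL INFILE '/tmp/converted_posts.csv'
INTO TABLE posts
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(topic_id, author_id, post_date, post)
SQL;

$db->query($sql) or die($db->error);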
I am importing a CSV file with more than 5,000 records in it. What I am currently doing is getting the entire file content as an array and saving the records to the database one by one. But in case of script failure, the whole process will run again, and if I then start checking the records one by one against the database it will use lots of queries, so I thought to keep the imported values in the session temporarily.
Is it good practice to keep that many records in the session? Or is there another way to do this?
Thank you.
If you have to do this task in stages (and there's a couple of suggestions here to improve the way you do things in a single pass), don't hold the csv file in $_SESSION... that's pointless overhead, because you already have the csv file on disk anyway, and it's just adding a lot of serialization/unserialization overhead to the process as the session data is written.
You're processing the CSV records one at a time, so keep a count of how many you've successfully processed in $_SESSION. If the script times out or barfs, restart it and read how many you've already processed so you know where in the file to resume.
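A minimal sketch of that counter idea, with placeholder credentials, table and columns:
<?php
session_start();
$startAt = isset($_SESSION['csv_rows_done']) ? $_SESSION['csv_rows_done'] : 0;

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO records (col_a, col_b) VALUES (?, ?)');

$fh   = fopen('/path/to/import.csv', 'r');
$line = 0;
while (($row = fgetcsv($fh)) !== false) {
    if ($line++ < $startAt) {
        continue;                        // already imported on a previous run
    }
    $stmt->execute([$row[0], $row[1]]);
    $_SESSION['csv_rows_done'] = $line;  // remember progress in case the script dies
}
fclose($fh);
unset($_SESSION['csv_rows_done']);       // finished: clear the counter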
What can be the maximum size of $_SESSION?
The session is loaded into memory at run time - so it's limited by the memory_limit in php.ini
Is it good practice to keep that much of records in the session
No - for the reasons you describe - it will also have a big impact on performance.
Or is there any other way to do this ?
It depends what you are trying to achieve. Most databases can import CSV files directly or come with tools which will do it faster and more efficiently than PHP code.
C.
It's not a good idea IMHO, since session data will be serialized/unserialized for every page request, even for requests unrelated to the action you are performing.
I suggest using the following solution:
Keep the CSV file lying around somewhere
begin a transaction
run the inserts
commit after all inserts are done
end of transaction
Link: MySQL Transaction Syntax
If something fails, the inserts will be rolled back, so you know you can safely redo them without having to worry about duplicate data.
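A short sketch of that transaction wrapper with PDO (table, columns and the CSV path are placeholders):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $pdo->prepare('INSERT INTO records (col_a, col_b) VALUES (?, ?)');
$fh   = fopen('/path/to/import.csv', 'r');

try {
    $pdo->beginTransaction();
    while (($row = fgetcsv($fh)) !== false) {
        $stmt->execute([$row[0], $row[1]]);
    }
    $pdo->commit();      // all rows land at once
} catch (Exception $e) {
    $pdo->rollBack();    // nothing was written, so the import can simply be rerun
    throw $e;
} finally {
    fclose($fh);
}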
To answer the actual question (Somebody just asked a duplicate, but deleted it in favour of this question)
The default session data handler stores its data in temporary files. In theory, those files can be as large as the file system allows.
However, as #symcbean points out, session data is auto-loaded into the script's memory when the session is initialized. This limits the maximum size you should store in session data severely. Also, loading lots of data has a massive impact on performance.
If you have huge amounts of data you need to store connected to a session, I would recommend using temporary files that you name by the current session ID. You can then deal with those files as needed, and as possible within the limits of the script's memory_limit.
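A small sketch of that idea; the directory and the sample chunk are assumptions:
<?php
session_start();

$tmpFile  = sys_get_temp_dir() . '/import_' . session_id() . '.csv';
$csvChunk = "1,Main St,Springfield\n";   // whatever data this request produced

// Append to the per-session file instead of stuffing the data into $_SESSION.
file_put_contents($tmpFile, $csvChunk, FILE_APPEND | LOCK_EX);

// A later request from the same session can stream it back out line by line,
// staying well inside memory_limit.
$fh = fopen($tmpFile, 'r');
while (($row = fgetcsv($fh)) !== false) {
    // process $row ...
}
fclose($fh);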
If you are using PostgreSQL, you can use a single query to insert them all using pg_copy_from, or you can use pg_put_line as shown in the example (COPY FROM stdin), which I found very useful when importing tons of data.
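A hedged sketch of the pg_copy_from route (connection string, table and sample rows are placeholders):
<?php
$conn = pg_connect('host=localhost dbname=app user=me password=secret');

$parsed = [
    ['1', 'Main St', 'Springfield'],
    ['2', 'High St', 'Shelbyville'],
    // ... the rest of your parsed CSV rows
];

$lines = [];
foreach ($parsed as $record) {
    $lines[] = implode("\t", $record) . "\n"; // COPY's default field delimiter is a tab
}

// One COPY instead of thousands of individual INSERTs.
pg_copy_from($conn, 'address', $lines);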
If you use MySQL, you'll have to do multiple inserts. Remember to use transactions, so that if a query fails everything is rolled back and you can start over. Note that 5,000 rows is not that large! You should, however, be aware of the max_execution_time constraint, which will kill your script after a number of seconds.
As far as the session is concerned, I believe that you are limited by the maximum amount of memory a script can use (memory_limit in php.ini). Session data is saved in files, so you should also consider disk space usage if many clients are connected.
It depends on the operating system's maximum file size. Whatever the session size, the default memory limit per page is 128 MB.