Speed up forum conversion - php

I'm converting a forum from myBB to IPBoard (the conversion is done through a PHP script), but I have over 4 million posts that need to be converted, and it will take about 10 hours at the current rate. I basically have unlimited RAM and CPU; what I want to know is how I can speed this process up. Is there a way I can allocate a huge amount of memory to this one process?
Thanks for any help!

You're not going to get the script itself to run any faster. By giving it more memory, though, you might be able to have it process more posts at a time. Change memory_limit in your php.ini file to control how much memory it can use.
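For reference, a minimal sketch of lifting those limits for this one run from inside the script (the 4096M figure is purely illustrative, not a recommendation):

ini_set('memory_limit', '4096M');  // illustrative value; pick whatever your server can spare
set_time_limit(0);                 // long imports tend to hit the execution time limit as well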
You might be able to tell the script to do one forum at a time. Then you could run several copies of the script at once. This will be limited by how it talks to the database and whether the script has been written to allow it -- it might do daft things like lock the target table or run an insanely long read on the source table. In any case, you would be unlikely to get more than three or four copies running at once before everything slows down.
It might be possible to improve the script, but that would be several days' hard work learning the insides of both forums' database formats. Have you asked on the IPBoard forums? Maybe someone there has experience with what you're trying to do.

Not sure how the conversion is done, but if you are importing a SQL file, you could split it up into multiple files and import them at the same time. Hope that helps :)

If you are saying that you already have the file(s) converted, you should look into MySQL's LOAD DATA INFILE for importing, given that you have access to the MySQL console. This will load data considerably faster than executing the SQL statements via the source command.
If you do not have the files and you are doing the conversion on the fly, then I would suggest having the conversion script write the data to a file (set the time limit to 0 to allow it to run) and then using that LOAD DATA command to insert/update the data.
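As a rough sketch of that approach (the table and column names below are made up, $convertedPosts stands in for whatever your converter produces, and LOCAL must be enabled on both client and server):

// Dump the converted rows to a CSV file, then bulk-load them in one statement.
$pdo = new PDO('mysql:host=localhost;dbname=forum', 'user', 'pass', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,
]);

$fh = fopen('/tmp/posts.csv', 'w');
foreach ($convertedPosts as $post) {   // $convertedPosts: rows built by your conversion step
    fputcsv($fh, [$post['id'], $post['author_id'], $post['body']]);
}
fclose($fh);

$pdo->exec("LOAD DATA LOCAL INFILE '/tmp/posts.csv'
            INTO TABLE posts
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
            (pid, author_id, post)");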

Related

Optimize huge file CSV treatment

I know this question may be too broad, but I need to find a way to optimize the processing of a CSV file which contains 10,000 rows.
Each row must be parsed, and for every row I need to call a Google API and do calculations, then write a CSV file with the new information.
Right now I am using PHP and the processing takes around half an hour.
Is there a way to optimize this? I thought about using Node.js to parallelize the processing of the rows.
You can use curl_multi_select to parallelize the Google API requests: load the input into a queue, run the queries in parallel, and write the output and load more as each result finishes. Something like the TCP sliding window algorithm.
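A minimal sketch of that idea with PHP's curl_multi functions (the URL list and batch handling are placeholders; you would load one batch from your queue at a time):

// Fire one batch of API requests in parallel instead of one by one.
$urls = [/* one Google API URL per row in the current batch */];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// Run all handles until every request has finished.
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh);   // wait for activity instead of busy-looping
    }
} while ($running && $status === CURLM_OK);

$responses = [];
foreach ($handles as $i => $ch) {
    $responses[$i] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);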
Alternatively, you can load all data into a (SQLite) database (10 000 rows is not much) and then run the calculations in parallel. The database will be easier to implement than creating the sliding window.
I don't think Node.js would be much faster. Certainly not by enough to be worth rewriting the existing code you already have.
You can debug the code by checking how long it takes to read the 10K rows and update them with some random extra columns or extra info. This will give you a sense of how long reading and writing a CSV with 10K rows takes. I believe this shouldn't take long.
The Google API calls might be the culprit. If you know Node.js it is a good option, but if that is too much of a pain, you can use PHP cURL to send multiple requests at once without waiting for the response to each request. This might help speed up the process. You can refer to this site for more info: http://bytes.schibsted.com/php-perform-requests-in-parallel/
10,000 rows should be no problem, but when opening the file in Python 3.6, make sure you use readlines and read everything at once. Using the csv reader should also help with any separator issues and quote characters such as '"'. I've been reading 1.3 million rows and it's not an issue. Mine takes about 6-8 minutes to process, so yours should be on the order of a few seconds.
Are you using a machine with enough memory? If you are using a Raspberry Pi, a small virtual machine, or a really old laptop, I could imagine that this would greatly hamper your processing time. Otherwise, you should have no issues at all with Python.

Limit of exec() command in PHP?

I am using the exec command in PHP to execute C++ code, where record_generate.cpp is code which generates output (100 to millions of records) based on hard-coded parameters.
exec('./record_generater 2>&1', $output);
print_r($output);
When the number of output lines is limited to a few thousand it gives output, but when it reaches the hundreds of thousands to millions it seems to crash. How can I avoid such problems?
The first thing you should do is to see if running the C++ program from a shell causes a similar problem.
If so, it's a problem with the C++ code itself and nothing to do with PHP exec.
If it works okay standalone, then it's probably going to be related to storing millions of records into the $output variable.
While a string in PHP can be pretty big (2G from memory), there's a limited total space available to scripts, specified by memory_limit in the php.ini file.
Even at 128M (8M prior to 5.2), this may not be enough to hold millions of lines.
You could try increasing that variable to something larger and see if it helps.
However, you will probably still be better off finding a different way to get the information from your C++ executable into your PHP code, such as writing it to a file/database and processing it in PHP a bit at a time, rather than trying to store the lot in memory at once.
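For instance, one way to avoid holding every line in $output is to stream the program's output instead of collecting it -- a sketch only, with handle_record() standing in for your own processing:

// Read the C++ program's output line by line instead of buffering it all in $output.
$proc = popen('./record_generater 2>&1', 'r');
if ($proc === false) {
    die('could not start record_generater');
}

while (($line = fgets($proc)) !== false) {
    handle_record($line);   // placeholder: write to a file/database or aggregate here
}
pclose($proc);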
In any case, given that it's not really a good user experience to have to look through millions of rows anyway, it might be worthwhile examining what you really need from this data. For example, it may be possible to aggregate or partition it in some manner before outputting.
Any advice we give on that front will need substantially more information than we currently have.

Good idea to run a PHP file for a few hours as cronjob?

I would like to run a PHP script as a cronjob every night. The PHP script will import an XML file with about 145,000 products. Each product contains a link to an image which will be downloaded and saved on the server as well. I can imagine that this may cause some overload. So my question is: is it a better idea to split the PHP file? And if so, what would be a better solution? More cronjobs with several minutes' pause between each other? Running another PHP file using exec (I guess not, because I can't imagine that would make much of a difference), or something else...? Or just use one script to import all products at once?
Thanks in advance.
It depends a lot on how you've written it, in terms of whether it leaks open files or database connections. It also depends on which version of PHP you're using. In PHP 5.3 a lot was done to improve garbage collection:
http://www.php.net/manual/en/features.gc.performance-considerations.php
If it's not important that the operation is transactional, i.e. all or nothing (for example, if it fails half way through), then I would be tempted to tackle this in chunks, where each run of the script processes the next x items, with x a variable depending on how long each run takes. What you'll need to do then is keep repeating the script until nothing is left to do.
To do this, I'd recommend using a tool called the Fat Controller:
http://fat-controller.sourceforge.net
It can keep on repeating the script and then stop once everything is done. You can tell the Fat Controller that there's more to do, or that everything is done, using exit statuses from the PHP script. There are some use cases on the Fat Controller website, for example: http://fat-controller.sourceforge.net/use-cases.html#generating-newsletters
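A minimal sketch of such a chunked script, using the exit status to tell the caller whether another run is needed (the batch size, table layout and 'processed' flag are assumptions about your setup, and the exact status convention should follow whatever the Fat Controller expects):

// Process the next batch of products, then exit with a status the caller can inspect.
$batchSize = 500;   // tune to whatever one run can comfortably handle
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

$rows = $pdo->query("SELECT id, image_url FROM products WHERE processed = 0 LIMIT $batchSize")
            ->fetchAll(PDO::FETCH_ASSOC);

foreach ($rows as $row) {
    // ...download the image, save it, import the product data...
    $pdo->prepare("UPDATE products SET processed = 1 WHERE id = ?")->execute([$row['id']]);
}

// A full batch suggests there is more to do; an incomplete one means we are finished.
exit(count($rows) === $batchSize ? 1 : 0);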
You can also use the Fat Controller to run processes in parallel to speed things up; just be careful you don't run too many in parallel and slow things down. If you're writing to a database, then ultimately you'll be limited by the hard disk, which, unless you have something fancy, will mean your optimum concurrency will be 1.
The final question would be how to trigger this - and you're probably best off triggering the Fat Controller from CRON.
There's plenty of documentation and examples on the Fat Controller website, but if you need any specific guidance then I'd be happy to help.
To complete the previous answer, the best solution is to optimize your script:
Prefer JSON to XML; parsing JSON is vastly faster.
Use one or only a few concurrent connections to the database.
Alter multiple rows at a time (insert 10-30 rows in one query, select 100 rows at a time, delete in batches; not more, to avoid overloading memory, and not fewer, so each round trip stays worthwhile).
Minimize the number of queries (following the previous point).
Skip rows that are already up to date, using dates (timestamp, datetime) to detect them.
You can also let the process breathe with a usleep(30) call.
To run multiple PHP processes, use popen() (see the sketch below).
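As a rough illustration of the popen() point above (worker.php and the chunk argument are hypothetical; each worker would process its own slice of the import):

// Start a handful of worker processes and wait for them to finish.
$workers = [];
for ($i = 0; $i < 4; $i++) {
    $workers[$i] = popen("php worker.php $i", 'r');   // worker.php: hypothetical per-chunk script
}

foreach ($workers as $i => $handle) {
    while (($line = fgets($handle)) !== false) {
        echo "[worker $i] $line";    // log each worker's output
    }
    pclose($handle);                 // pclose() waits for the process to exit
}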

performance issue in website based on LAMP architecture

I have an internal website using the LAMP architecture. My main page takes around 10 seconds to load its data. There isn't a lot of data, around 4-5k records. I don't have any complex MySQL queries, but I have a lot of them, i.e. around 10-15 queries. Basically I'm extracting metadata to display on the page. These are very simple queries. I have a lot of PHP and JavaScript logic of medium complexity. I can't remove any of that. I have around 1800 lines of code in that page and I'm using DataTables to display the data.
The datatable contains 25 columns and a lot of HTML select elements.
So how will I know what is causing the performance bottleneck on this page? I tried to be as clear as possible, but please let me know if you have any questions.
Appreciate your time and help.
Use any PHP profiler tool, like Zend Profiler, to see which section is taking the most time:
http://erichogue.ca/2011/03/linux/profiling-a-php-application/
Try to get all the data at once. I've had issues in the past where using multiple selects actually takes longer than a single, more efficient query (also make sure you reuse a SQL connection for as long as possible). Other than that, profiling is the way to go in case your other code needs optimization.
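If setting up a full profiler is too much right away, a cruder alternative is to time the suspect sections by hand with microtime() (the query below is a made-up stand-in for one of your metadata queries):

// Quick-and-dirty timing around a block you suspect.
$start = microtime(true);

$result = $mysqli->query('SELECT id, label FROM some_meta_table');  // hypothetical query

error_log(sprintf('metadata query took %.3f s', microtime(true) - $start));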

how to handle large sets of data with PHP?

My web application lets users import an Excel file and writes the data from the file into the MySQL database.
The problem is, when the Excel file has lots of entries, even 1000 rows, I get an error saying PHP ran out of memory. This occurs while reading the file.
I have assigned 1024MB to PHP in the php.ini file.
My question is: how should I go about importing such large data in PHP?
I am using CodeIgniter.
For reading the Excel file, I am using this library.
SOLVED: I used CSV instead of XLS, and I could import 10,000 rows of data within seconds.
Thank you all for your help.
As others have said, 1000 records is not much. Make sure you process the records one at a time, or a few at a time, and that the variables you use for each iteration go out of scope after you're finished with that row, or that you're reusing the variables.
If you can avoid the need to process Excel files by exporting them to CSV, that's even better, because then you wouldn't need such a library (which may or may not have its own memory issues).
Don't be afraid of increasing memory usage if you need to and that solves the problem; buying memory is sometimes the cheapest option. And don't let the 1 GB scare you: it is a lot for such a simple task, but if you have the memory and that's all you need to do, then it's good enough for the moment.
And as a plus, if you are using an old version of PHP, try updating to PHP 5.4, which handles memory much better than its predecessors.
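For the CSV route the asker settled on, a minimal streaming sketch looks like this (the file name and the CodeIgniter-style insert are assumptions; the point is that only one row is ever held in memory):

// Stream the CSV row by row instead of loading the whole file at once.
$fh = fopen('import.csv', 'r');
if ($fh === false) {
    die('could not open import.csv');
}

$header = fgetcsv($fh);                         // first row: column names
while (($row = fgetcsv($fh)) !== false) {
    // Insert or queue this single row; nothing else stays in memory.
    $this->db->insert('products', array_combine($header, $row));   // assumes a CodeIgniter model/controller
}
fclose($fh);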
Instead of inserting one row at a time in a loop, insert 100 rows at a time.
You can always run
INSERT INTO myTable (col1, col2, col3) VALUES
(val1, val2, val3), (val4, val5, val6), ...
This way the number of network round trips is reduced, which in turn reduces resource usage.
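A sketch of building such a multi-row insert in PHP with bound placeholders ($rows and the table/column names are illustrative, and $pdo is assumed to be an existing PDO connection):

// Insert rows in batches of 100 instead of one query per row.
$batchSize = 100;
foreach (array_chunk($rows, $batchSize) as $chunk) {
    $placeholders = [];
    $values = [];
    foreach ($chunk as $row) {
        $placeholders[] = '(?, ?, ?)';
        array_push($values, $row['col1'], $row['col2'], $row['col3']);
    }
    $sql = 'INSERT INTO myTable (col1, col2, col3) VALUES ' . implode(', ', $placeholders);
    $pdo->prepare($sql)->execute($values);
}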
