Export BIG mysql table to JSON - php

I have a MySQL table with 2.8 million records and I want to convert all of them to JSON. I wrote a script for the conversion, but it stops with a memory warning.
Then I tried to create smaller files (file 1 covers records 0 to 100000, file 2 covers 100000 to 1000000, etc.) and combine them with the Windows copy command. It works, but each file is a JSON array (like [{...}]), so the merged file contains separate arrays like [{}][{}] where I want a single array like [{...}].
Is there a better way to do this?

I would suggest increasing 'memory_limit' in your php.ini configuration. If the export takes a long time, you could also run it as a cron job (if possible).
OR
you can decode all of your JSON files, merge them into a single array, and then encode that array back to JSON and write it to one file.
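A minimal sketch of that merge, assuming the partial files follow a hypothetical export_part_*.json naming scheme (note that for millions of rows this still needs a very large memory_limit):
<?php
// Decode each partial file, collect the rows in one array, then re-encode.
// The file names here are hypothetical; adjust the glob pattern to your files.
$merged = array();
foreach (glob('export_part_*.json') as $file) {
    $part = json_decode(file_get_contents($file), true);
    if (is_array($part)) {
        $merged = array_merge($merged, $part);
    }
}
file_put_contents('export_full.json', json_encode($merged));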

Finally I did this. Please see the steps below (I am not sure this is the best approach, but it works).
In total I have 2.6 million records in my table. I created a script that selects MySQL rows, converts them to JSON, and writes the result to a file.
Select records from 0 to 1 million and create file 1. Repeat for 1 to 2 million and 2 to 2.6 million to create file 2 and file 3.
Combine these files using jq (http://stedolan.github.io/jq/) to create a single JSON file.
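A minimal sketch of such a chunked export, assuming a PDO connection and a hypothetical table name my_table; adjust the chunk size and credentials to your setup, and the jq invocation in the trailing comment is one way to concatenate the resulting arrays:
<?php
// Export the table in fixed-size chunks, one JSON array file per chunk.
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'pass');
$chunkSize = 1000000;
$chunk = 0;
while (true) {
    $offset = $chunk * $chunkSize;
    $rows = $pdo->query("SELECT * FROM my_table LIMIT $chunkSize OFFSET $offset")
                ->fetchAll(PDO::FETCH_ASSOC);
    if (!$rows) {
        break;
    }
    file_put_contents("export_part_$chunk.json", json_encode($rows));
    $chunk++;
}
// The partial arrays can then be concatenated into one array, e.g. with jq:
//   jq -s 'add' export_part_*.json > export_full.json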

Related

Bulk-update a DB table using values from a JSON object

I have a PHP program which gets from an API the weather forecast data for the following 240 hours, for 100 different cities (for a total of 24,000 records; I save them in a single table). The program gets, for every city and every hour, the temperature, humidity, probability of precipitation, sky cover and wind speed. This data is in JSON format, and I have to store all of it in a database, preferably MySQL. It is important that this operation is done in a single pass for all the cities.
Since I would like to update the values every 10 minutes or so, performance is very important. If someone can tell me the most efficient way to update my table with the values from the JSON, it would be of great help.
So far I have tried the following strategies:
1) decode the JSON and use a loop with a prepared statement to update one value at a time (too slow);
2) use a stored procedure (I do not know how to pass a whole JSON object to the procedure, and I know there is a limit on the number of individual parameters I can pass);
3) use LOAD DATA INFILE (the generation of the CSV file is too slow);
4) use UPDATE with CASE, generating the SQL dynamically (the string gets so long that execution is too slow).
I will be happy to provide additional information if needed.
You have a single table with about a dozen columns, correct? And you need to insert 100 rows every 10 minutes, correct?
Inserting 100 rows like that every second would be only slightly challenging. Please show us the SQL code; something must be miserably wrong with it. I can't imagine how any of your options would take more than a few seconds. Is "a few seconds" too slow?
Or does the table have only 100 rows? And you are issuing 100 updates every 10 minutes? Still, no sweat.
Rebuild technique:
If practical, I would build a new table with the new data, then swap tables:
CREATE TABLE new LIKE real;
Load the data (LOAD DATA INFILE is good if you have a .csv)
RENAME TABLE real TO old, new TO real;
DROP TABLE old;
There is no downtime; the real table is always available, regardless of how long the load takes.
(Doing a massive update is much more "effort" inside the database; reloading should be faster.)
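A minimal sketch of that rebuild-and-swap from PHP, assuming PDO; the table names (forecast, forecast_new, forecast_old) and the CSV path are hypothetical:
<?php
// Build a fresh copy of the table, bulk-load it, then swap it in.
// Table names and the CSV path are hypothetical; adjust to your schema.
$pdo = new PDO('mysql:host=localhost;dbname=weather;charset=utf8', 'user', 'pass',
               array(PDO::MYSQL_ATTR_LOCAL_INFILE => true));
$pdo->exec("CREATE TABLE forecast_new LIKE forecast");
$pdo->exec("LOAD DATA LOCAL INFILE '/tmp/forecast.csv'
            INTO TABLE forecast_new
            FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
// RENAME TABLE is atomic, so readers always see a complete table.
$pdo->exec("RENAME TABLE forecast TO forecast_old, forecast_new TO forecast");
$pdo->exec("DROP TABLE forecast_old");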

PHP / MYSQL Not Returning Entire BLOB At Random

We have a large table (10M+ rows) with two BLOB columns. When checking certain entries from the command line, the full contents are present; however, when PHP outputs the results, one of the BLOB fields is cut short by about 200 characters or so, at random. I already changed the memory limit in php.ini to 1024MB. Is there any other cap that I should be checking?
The exact same script pulling the exact same row outputs a different truncated length each time. It uses a simple mysql_fetch_row() call.

How to get max in and max out values from rrd files in single rrd command

We have MRTG set up to monitor the network, and we use RRDtool to fetch and plot the graph data. I have created a script which fetches data from the RRD files, and from the fetched data I need the max in and max out over 24 hours. With these max values I calculate the bandwidth utilization for each customer/link.
My question: is there a single rrd command to fetch the max in, max out, min in and min out values from the RRD files?
Since I am a newbie to RRD, I would appreciate it if a command is also provided with your solution.
Please help.
With MRTG-created RRD files, the 'in' and 'out' data sources are named 'ds0' and 'ds1' respectively. There are 8 RRAs; these correspond to granularities of 5min, 30min, 2hr and 1day, with both AVG and MAX rollups. By default, these will be of length 400 (older versions of MRTG) or length 800 (newer versions of MRTG), which means that you are likely to have a time window of 2 days, 2 weeks, 2 months and 2 years respectively for these RRAs. (Note that RRDTool 1.5 may omit the 1pdp MAX RRA as it is functionally identical to the 1pdp AVG RRA.)
What this means for you is the following:
You do not have a MIN type RRA. If working over the most recent 2 days, the minima can be calculated from the highest-granularity AVG RRA. Otherwise, your data will be increasingly inaccurate.
Your lowest-granularity RRA holds MAX values on a per-day basis. However, these days are split at midnight UTC rather than midnight local time. You do not specify which 24-hour windows you need to calculate for.
If you are only interested in calculating for the most recent 24-hour period, then all calculations can use the highest-granularity RRA.
Note that, because step boundaries are all calculated using UTC, unless you live in that timezone you cannot use FETCH or XPORT to obtain the data you need, since you need to summarise over a general time window.
To retrieve the data you can use something like this:
rrdtool graph /dev/null -e 00:00 -s "end-1day" --step 300
DEF:inrmax=target.rrd:ds0:AVERAGE:step=300:reduce=MAXIMUM
DEF:outrmax=target.rrd:ds1:AVERAGE:step=300:reduce=MAXIMUM
DEF:inrmin=target.rrd:ds0:AVERAGE:step=300:reduce=MINIMUM
DEF:outrmin=target.rrd:ds1:AVERAGE:step=300:reduce=MINIMUM
VDEF:inmax=inrmax,MAXIMUM
VDEF:inmin=inrmin,MINIMUM
VDEF:outmax=outrmax,MAXIMUM
VDEF:outmin=outrmin,MINIMUM
LINE:inrmax
PRINT:inmax:"In Max=%lf"
PRINT:inmin:"In Min=%lf"
PRINT:outmax:"Out Max=%lf"
PRINT:outmin:"Out Min=%lf"
A few notes on this:
We are using 'graph' so that we can use a generic time window, not dependent on a step boundary; fetch and xport only work on step boundaries.
We are summarising the highest-granularity RRA on the fly
We use /dev/null as we don't actually want the graph image
We have to define a dummy LINE in the graph, otherwise we get no output
The DEF lines specify the highest-granularity step and a reduction CF. You might be able to skip this part if you're using a 5min step
We calculate the summary values using VDEF and then print them on stdout using PRINT
The first line of the output will be the graph size; you can discard this
When you call rrdtool::graph from your PHP script, simply pass it the parameters in the same way as you would for command-line operation. If you're not using Linux, you might need to use something other than /dev/null.
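A minimal sketch of shelling out to rrdtool from PHP and parsing the PRINT output, assuming rrdtool is on the PATH and target.rrd is readable; adjust the file name and time window to your setup:
<?php
// Build the same rrdtool graph command as above and capture its output.
$args = array(
    'rrdtool graph /dev/null -e 00:00 -s end-1day --step 300',
    'DEF:inrmax=target.rrd:ds0:AVERAGE:step=300:reduce=MAXIMUM',
    'DEF:outrmax=target.rrd:ds1:AVERAGE:step=300:reduce=MAXIMUM',
    'DEF:inrmin=target.rrd:ds0:AVERAGE:step=300:reduce=MINIMUM',
    'DEF:outrmin=target.rrd:ds1:AVERAGE:step=300:reduce=MINIMUM',
    'VDEF:inmax=inrmax,MAXIMUM',
    'VDEF:inmin=inrmin,MINIMUM',
    'VDEF:outmax=outrmax,MAXIMUM',
    'VDEF:outmin=outrmin,MINIMUM',
    'LINE:inrmax',
    'PRINT:inmax:"In Max=%lf"',
    'PRINT:inmin:"In Min=%lf"',
    'PRINT:outmax:"Out Max=%lf"',
    'PRINT:outmin:"Out Min=%lf"',
);
exec(implode(' ', $args), $output, $status);
if ($status !== 0) {
    die("rrdtool failed:\n" . implode("\n", $output));
}

// The first output line is the graph size; the PRINT lines follow,
// e.g. "In Max=1234.567890".
$values = array();
foreach (array_slice($output, 1) as $line) {
    if (strpos($line, '=') === false) {
        continue;
    }
    list($label, $value) = explode('=', trim($line), 2);
    $values[$label] = (float) $value;
}
print_r($values);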

Insertion efficiency of a large amount of data with SQL

I have a program that I use to read a CSV file and insert the data into a database. I am having trouble with it because it needs to be able to insert big batches (up to 10,000 rows) of data at a time. At first I had it looping through and inserting each record one at a time. That is slow because it calls an insert function 10,000 times. Next I tried to group the rows so it inserted 50 at a time. I figured this way it would have to connect to the database less often, but it is still too slow. What is an efficient way to insert many rows of a CSV file into a database? Also, I have to edit some data (such as adding a 1 to a username if two are the same) before it goes into the database.
For a text file you can use the LOAD DATA INFILE command, which is designed to do exactly this. It handles CSV files by default, but has extensive options for handling other text formats, including re-ordering columns, ignoring input rows, and reformatting data as it loads.
So I ended up using fputcsv to write the data I changed to a new CSV file, then I used the LOAD DATA INFILE command to load the data from the new CSV file into the table. This changed it from timing out at 120 seconds for 1,000 entries to taking about 10 seconds for 10,000 entries. Thank you to everyone that replied.
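A minimal sketch of that fputcsv + LOAD DATA approach, assuming PDO with LOCAL INFILE enabled; the file paths and the table name my_table are hypothetical:
<?php
// Rewrite the edited rows with fputcsv, then bulk-load the new file.
$in  = fopen('original.csv', 'r');
$out = fopen('/tmp/cleaned.csv', 'w');
while (($row = fgetcsv($in)) !== false) {
    // ... apply your edits here (e.g. de-duplicating usernames) ...
    fputcsv($out, $row);
}
fclose($in);
fclose($out);

$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'pass',
               array(PDO::MYSQL_ATTR_LOCAL_INFILE => true));
$pdo->exec("LOAD DATA LOCAL INFILE '/tmp/cleaned.csv'
            INTO TABLE my_table
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'");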
I have this crazy idea: could you run multiple parallel scripts, each one taking care of a bunch of rows from your CSV?
Something like this:
<?php
// this tells linux to run import.php in the background,
// and releases your caller script.
//
// do this several times (with different ranges) and you can reduce the overall time
$cmd = "nohup php import.php [start] [end] > /dev/null 2>&1 &";
exec($cmd);
Also, have you tried increasing that limit of 50 rows per bulk insert to 100 or 500, for example?
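A minimal sketch of larger multi-row INSERT batches with PDO; the table and column names (csv_import, username, score) are hypothetical and assume a two-column CSV:
<?php
// Collect CSV rows into batches and send each batch as one multi-row INSERT.
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'pass');
$batchSize = 500;
$rows = array();

$fh = fopen('data.csv', 'r');
while (($row = fgetcsv($fh)) !== false) {
    $rows[] = $row;
    if (count($rows) >= $batchSize) {
        insertBatch($pdo, $rows);
        $rows = array();
    }
}
if ($rows) {
    insertBatch($pdo, $rows);   // flush the final partial batch
}
fclose($fh);

function insertBatch(PDO $pdo, array $rows)
{
    // One placeholder group per row: (?, ?), (?, ?), ...
    $placeholders = implode(',', array_fill(0, count($rows), '(?, ?)'));
    $stmt = $pdo->prepare("INSERT INTO csv_import (username, score) VALUES $placeholders");
    $params = array();
    foreach ($rows as $r) {
        $params[] = $r[0];
        $params[] = $r[1];
    }
    $stmt->execute($params);
}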

php process - one after another - simulating threading

I have a report generation feature.
Export to CSV or txt.
For each month it will be 25,000 records, each row with 55 columns.
For a full year it will be more than 300,000!
I tried increasing the memory limit, but I don't think that is a good solution! Anyway, it is now 128M.
My expectation
I will split the date range selected by the user into ranges of 25 or 30 days.
I will fetch the data for 25 days, then write the CSV.
Then fetch the next 25,000 records, write those, and so on.
How can I attain this?
For fetching I am using a function: $result = fetchRecords();
For writing the CSV, I pass this $result array to the view page and print it by looping and separating values with commas.
So in the controller it will be $template->records = $result;
If I do this in a for loop:
for () {
    $result = fetchRecords();
    $template->records = $result;
}
I don't think this will work.
How do I do this? Fetch, then write, then fetch, then write.
Can you please suggest a better way to implement this in PHP while keeping it within the memory limit?
I was fetching a huge amount of data from the DB into an array, then looping over that array again to write it into a CSV or text file. That was really killing the memory.
Now, as I fetch data from the DB, I write it to the file in that same loop.
No use of arrays. A huge difference.
PHP can export to CSV or text very well. PHP is really a great language.
Now I am downloading a 25 MB CSV. :) I break Excel, as it exceeds 65550 records. :)
More than enough for me.
So the lesson learned is: don't use arrays for storing this type of huge data.
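A minimal sketch of this fetch-and-write-in-the-same-loop approach, assuming PDO and a hypothetical table report_data; adjust the query and column handling to your report:
<?php
// Stream rows from the database and write each one to the CSV immediately,
// so no full result array is ever held in memory.
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8', 'user', 'pass');
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); // unbuffered fetch

$stmt = $pdo->query("SELECT * FROM report_data
                     WHERE created_at BETWEEN '2013-01-01' AND '2013-12-31'");

$fh = fopen('report.csv', 'w');
$headerWritten = false;
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    if (!$headerWritten) {
        fputcsv($fh, array_keys($row)); // column names as the first line
        $headerWritten = true;
    }
    fputcsv($fh, $row);                 // write immediately; keep no array of rows
}
fclose($fh);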
