Import JSON data into a database (MySQL) using PHP

I am writing a script in PHP to import data from a JSON file directly into a MySQL database. My sample data at the moment is pretty small. I have a few questions about it. I would like to do this without external packages. If you know of or have a code sample I could look at, that would be very helpful.
How can I make it work with very large data sets? Break it up into chunks? Import it line by line? Use fopen?
If the script gets interrupted for some reason during its run, is there a way to make sure it continues where it left off? How would you do this?
If I want to remove duplicates with the same email, should I do it during the import, or afterwards with an SQL query?
Would this work with XML and CSV as well?
DATA
[
    {
        "id": 1,
        "name": "Mike",
        "email": "mike#businessweek.com"
    },
    {
        "id": 2,
        "name": "Nick",
        "email": "nick#businessweek.com"
    }
]
$jsondata = file_get_contents('data.json');
$data = json_decode($jsondata, true);

$sql = 'INSERT INTO `member` (`name`, `email`) VALUES (?, ?)';
$statement = $pdo->prepare($sql); // prepare once, reuse for every row
foreach ($data as $row) {
    $statement->execute([$row['name'], $row['email']]);
}

This will probably be closed, but to give you some information:
The only way to break it up is if you have individual JSON objects per line, or grab every X lines (5 in your example). Then you could just fopen the file and read it line by line (see the sketch after this list). You could also read from { to } (not recommended).
The only way I can think of is to use transactions, and/or track the total number of rows and decrement a counter in a separate DB table after each insert.
If you import the entire file at once you can do it then, maybe with array_column to reindex on email, which will remove duplicates. If not, it would have to be done afterwards with SQL.
Yes, you would do it the same way; only parsing the data is different. CSV would be fgetcsv or str_getcsv, and XML would be SimpleXML or something similar. Once the data is in an array of arrays, the inserting is the same.
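For the line-by-line approach (and to sketch the resume and de-duplication points), here is a minimal example. It assumes the data is rewritten as one JSON object per line and that member.email has a UNIQUE index so duplicates can be skipped with INSERT IGNORE; both assumptions are mine, not from the question.

// data.ndjson is assumed to hold one JSON object per line, e.g.
// {"id":1,"name":"Mike","email":"mike#businessweek.com"}
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// INSERT IGNORE skips rows whose email already exists (needs a UNIQUE index on email)
$statement = $pdo->prepare('INSERT IGNORE INTO `member` (`name`, `email`) VALUES (?, ?)');

$handle = fopen('data.ndjson', 'r');
$lineNumber = 0;
while (($line = fgets($handle)) !== false) {
    $lineNumber++;
    $row = json_decode($line, true);
    if ($row === null) {
        continue; // skip blank or malformed lines
    }
    $statement->execute([$row['name'], $row['email']]);
    // Persisting $lineNumber (in a file or a progress table) after each insert
    // lets a restarted run skip ahead to where the previous one stopped.
}
fclose($handle);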

Related

Store PHP-generated JSON string in a CSV file

I am trying to generate a JSON string from an array using PHP, and I would like to store that string in a CSV file, also with PHP. I want to do this because I am working with quite a large amount of data, and I would like to use MySQL's LOAD DATA LOCAL INFILE to populate and update my database table.
This is the code I have:
$tmpFileProducts = 'path/to/file';
$tmpFileProductsHandler = fopen($tmpFileProducts, 'w');
foreach ($attributeBatch as $productId => $batch) {
    fputcsv($tmpFileProductsHandler, array($productId, $batch['title'], $batch['parsed'], json_encode($batch['attributes'])), "|", "\"");
}
My problem is that when I am creating the CSV file, the JSON double quotes are not escaped, so I end up with lines similar to this in my CSV file:
43541|"telefon mobil 3l 2020 4g "|"2020-12-05 17:38:19"|"{""color"":""dark chrome"",""memory_value"":4294967296,""storage_value"":68719476736,""sim_slot"":""dual""}"
My first possible solution would be to change the string enclosure of my CSV file, but what enclosure should I use to ensure no conflicts can arise with the inner JSON column? I am in complete control of the array that is stringified; it will only contain ASCII characters.
Would there be a way to keep the current string enclosure and instead escape the JSON string somehow? Later on, I will need to fetch the data that was loaded into the database and convert it to an array again.
DISCLAIMER: I am well aware that instead of storing the data as a JSON string I could store it in a separate relational table (which I am also doing), but I need quick access to this data for a background script that is running, and I would like to save the time of querying the relational table, since the background script doesn't need to search within this data when it uses it.
Follow-up question: as I am explicitly telling the fputcsv function what to use as the string enclosure, shouldn't it automatically escape all similar inner strings?
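For what it's worth, MySQL's LOAD DATA treats a doubled ENCLOSED BY character inside an enclosed field as a single literal character, so the doubled quotes that fputcsv writes should load back correctly as long as the enclosure is declared. A minimal sketch, assuming a products table with matching columns and a PDO connection with LOCAL INFILE enabled (both are assumptions, not from the question):

// Assumes a table `products` (product_id, title, parsed, attributes) exists and
// that local_infile is allowed on both the server and the client.
$pdo = new PDO(
    'mysql:host=localhost;dbname=shop;charset=utf8mb4',
    'user',
    'pass',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true]
);

$sql = "LOAD DATA LOCAL INFILE " . $pdo->quote($tmpFileProducts) . "
        INTO TABLE products
        FIELDS TERMINATED BY '|' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\n'
        (product_id, title, parsed, attributes)";
$pdo->exec($sql);

The JSON column can then be decoded with json_decode when it is read back.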

Is it possible to setCellValue directly without the need of a multidimensional array?

I recently started working on a script that had already been written. It works great, but it takes way too long, and I believe I know the cause, but I have had no success in improving it.
The case is as follows: the script reads an XML file with a lot of info regarding temperatures, all of which is inside the various <Previsao> tags in the XML.
$l = 3;
$q = $CON->Query("SELECT cod_cidade, cidade, cidcompleta
                  FROM listabrasil
                  WHERE cidade LIKE '%Aj%'
                  ORDER BY cidade ASC");

while ($x = $CON->Fetch($q))
{
    $XML = simplexml_load_file('http://ws.somarmeteorologia.com.br/previsao.php?Cidade='.$x['cidade'].'&DiasPrevisao=15');
    print $x['cidade']."\n";

    foreach ($XML->Previsao as $P)
    {
        $Previsao[$x['cidade']]['data'][]     = (string) $P->Data;
        $Previsao[$x['cidade']]['tmin'][]     = (float)  $P->Tmin;
        $Previsao[$x['cidade']]['tmax'][]     = (float)  $P->Tmax;
        $Previsao[$x['cidade']]['prec'][]     = (float)  $P->Prec;
        $Previsao[$x['cidade']]['velvento'][] = (float)  $P->VelVento;
        $Previsao[$x['cidade']]['dirvento'][] = (string) $P->DirVento;
    }
}
foreach ($Previsao as $Cid => $Dados)
{
    $col = 1;
    for ($dias = 0; $dias < 15; $dias++)
    {
        $PlanilhaBloomberg->setCellValue($colunas[$col+0].'2', $Dados['data'][$dias]);
        $PlanilhaBloomberg->setCellValue($colunas[$col+0].$l, $Dados['tmin'][$dias].'C');
        $PlanilhaBloomberg->setCellValue($colunas[$col+1].$l, $Dados['tmax'][$dias].'C');
        $PlanilhaBloomberg->setCellValue($colunas[$col+2].$l, $Dados['prec'][$dias].'mm');
        $PlanilhaBloomberg->setCellValue($colunas[$col+3].$l, $Dados['velvento'][$dias].'km/h');
        $PlanilhaBloomberg->setCellValue($colunas[$col+4].$l, $Dados['dirvento'][$dias]);

        print $Dados['data'][$dias]."\n";
        print $Dados['tmin'][$dias]."\n";
        print $Dados['tmax'][$dias]."\n";
        print $Dados['prec'][$dias]."\n";
        print $Dados['velvento'][$dias]."\n";
        print $Dados['dirvento'][$dias]."\n";

        $col = $col + 5;
    }
    $l++;
}
Don't worry about setCellValue, it's just from the PHPExcel library. From what I could gather, it's taking so long to execute not only because of the large amount of data it gathers from the XML, but also because it keeps filling the multidimensional array $Previsao ... What I am hoping to achieve (with no success, might I add) is to fill the cells with setCellValue directly, without the need for a multidimensional array. Do you think this is possible, and if so, would it reduce the execution time of the script?
Thank you all in advance for the help, and please forgive me if this question is too focused; I'm not sure whether that could cause problems.
PHPExcel usually does take a long time to set cell values. There might be a more optimal way to structure your code, but I don't think it would make a huge difference in runtime. Try setPreCalculateFormulas(false) before saving your file so that PHPExcel doesn't calculate values on save; that might save some time. Second, assuming $PlanilhaBloomberg is $object->getActiveSheet(), you can chain the calls like this:
$PlanilhaBloomberg->setCellValue($colunas[$col+0].'2', $Dados['data'][$dias])
                  ->setCellValue($colunas[$col+0].$l, $Dados['tmin'][$dias].'C')
                  ->setCellValue($colunas[$col+1].$l, $Dados['tmax'][$dias].'C')
                  ->setCellValue($colunas[$col+2].$l, $Dados['prec'][$dias].'mm')
                  ->setCellValue($colunas[$col+3].$l, $Dados['velvento'][$dias].'km/h')
                  ->setCellValue($colunas[$col+4].$l, $Dados['dirvento'][$dias]);
That might help.
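For the save-time part, a small sketch of what that could look like (assuming the workbook is written with a standard PHPExcel writer; the variable and file names are placeholders):

// Skip formula recalculation when the workbook is written out
$writer = PHPExcel_IOFactory::createWriter($objPHPExcel, 'Excel2007');
$writer->setPreCalculateFormulas(false);
$writer->save('previsao.xlsx');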
Well, you are iterating over the database result, iterating over the XML for each database row to build a big array in memory, and then iterating over that array to build the Excel sheet... Surely it should be possible to avoid building the big array; arrays are memory-expensive, and a lot of time overhead is spent allocating the memory for that array as well.
It should be possible to populate the Excel sheet directly from the XML loop, avoiding the array building entirely (which saves memory and memory-allocation time) and saving an entire loop as well.
It's really just a case of identifying which cell needs populating from which XML value, but I can't envisage the array structure well enough to work that out.
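A rough sketch of that idea, writing each forecast straight into the sheet inside the XML loop (it reuses $CON, $q, $colunas and $PlanilhaBloomberg exactly as they appear in the question, so treat it as an outline rather than tested code):

$l = 3;
while ($x = $CON->Fetch($q))
{
    $XML = simplexml_load_file('http://ws.somarmeteorologia.com.br/previsao.php?Cidade='.$x['cidade'].'&DiasPrevisao=15');
    $col = 1;
    foreach ($XML->Previsao as $P)
    {
        // Write the cells for this forecast day immediately; no intermediate array needed
        $PlanilhaBloomberg->setCellValue($colunas[$col+0].'2', (string) $P->Data);
        $PlanilhaBloomberg->setCellValue($colunas[$col+0].$l, (float) $P->Tmin.'C');
        $PlanilhaBloomberg->setCellValue($colunas[$col+1].$l, (float) $P->Tmax.'C');
        $PlanilhaBloomberg->setCellValue($colunas[$col+2].$l, (float) $P->Prec.'mm');
        $PlanilhaBloomberg->setCellValue($colunas[$col+3].$l, (float) $P->VelVento.'km/h');
        $PlanilhaBloomberg->setCellValue($colunas[$col+4].$l, (string) $P->DirVento);
        $col += 5;
    }
    $l++;
}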

Parse CSV string in Laravel

My situation is this: the first ~500kb of a CSV file is encoded as a string and sent as a request to my server, running Laravel. How can I make Laravel parse that string (which is, by itself, not a complete CSV file) to get the column headers and first few rows of data?
I've looked into Goodby/CSV, but it looks as though that can only take a file as input, and not a string. Is there a CSV interpreter plugin that can handle this scenario, or should I expect to have to write a parser of my own?
Edit:
Looks like I can do something like this:
$Data = str_getcsv($CsvString, "\n"); //parse the rows
foreach($Data as &$Row) $Row = str_getcsv($Row, ","); //parse the items in rows
I was overthinking it, of course.
I've looked into Goodby/CSV, but it looks as though that can only take a file as input, and not a string
If you have fopen wrappers enabled, you can take advantage of the fact that GoodbyCSV uses the SplFileObject class internally, and pass the string as a data protocol:
$lexer->parse('data://text/plain;base64,' . base64_encode($csv_string), $interpreter);
I found this useful when I needed to handle large files as well as simple strings.
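A fuller sketch of that approach, using Goodby/CSV's documented Lexer/Interpreter pattern (the request key and the way the rows are collected are just illustrations):

use Goodby\CSV\Import\Standard\Lexer;
use Goodby\CSV\Import\Standard\Interpreter;
use Goodby\CSV\Import\Standard\LexerConfig;

$csvString = $request->input('csv_chunk'); // the ~500kb chunk sent to the server

$lexer = new Lexer(new LexerConfig());
$interpreter = new Interpreter();

$rows = [];
$interpreter->addObserver(function (array $row) use (&$rows) {
    $rows[] = $row;
});

// Feed the raw string to the lexer through a data:// stream instead of a file path
$lexer->parse('data://text/plain;base64,' . base64_encode($csvString), $interpreter);

$headers = array_shift($rows); // column headers from the first line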

Single-column CSV to two-column MySQL table data store

I have a CSV file which contains 2000 entries in a single column. The data format is like the following:
name 100/1
name 100/2
name 105/6
...
So the general format is 'text integer_one/integer_two'. Now I want to store it in a MySQL database table with 2 columns: column_1 should hold integer_one, and column_2 should hold integer_two.
How can I do this using PHP to push the data into MySQL?
First, start by extracting your values. If all your data is formatted as in your example, you can use a simple explode.
$lines = explode("\n", $data); // $data should be your file read into a string
foreach ($lines as $line) {
    $temp = explode(" ", $line);
    $name = $temp[0];
    $temp_2 = explode("/", $temp[1]);
    $integer_1 = $temp_2[0];
    $integer_2 = $temp_2[1];
    $query = "INSERT INTO table_two_columns SET name = '{$name}', integer_1 = '{$integer_1}', integer_2 = '{$integer_2}'";
    // run $query against your MySQL connection here
}
Well that's my take on your problem.
Use the PHP function fgetcsv to read the CSV file (an example is given on its manual page), then take each line in a loop, parse it using explode(), build a query from it, and save it in the database.
Take a look at preg_match; you should be able to get these values using a fairly trivial regular expression, like:
preg_match('/\w+\s*(\d+)\/(\d+)/', $row, $matches);
var_dump($matches);
To pull the data from a file, take a look at fgets; there are examples of reading the file line by line to get the $row variable in the code above.
You can then write each line to the database using an INSERT statement.
Some things to consider:
How large is the file?
How often do you need to insert this data - is it a one-off?
These things will influence how you build your code; there are multiple ways to bulk-insert data. If this is just a one-off run, then I would suggest manipulating the text data using sed or similar, and then bulk inserting using the native methods of your database of choice (for MySQL, see LOAD DATA INFILE, or the \. option in the client).
If the file is large and you need to insert things regularly, look at using prepared statements, and/or the multiple insert syntax:
INSERT INTO mytable (name, number, anothernumber) VALUES('paul', 1, 2),('bob',2,1)
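Putting those pieces together, a minimal sketch with a prepared statement, using a variation of the regular expression above that also captures the name (the table and column names come from the INSERT example above and are otherwise assumptions):

$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$statement = $pdo->prepare('INSERT INTO mytable (name, number, anothernumber) VALUES (?, ?, ?)');

$handle = fopen('data.csv', 'r');
while (($row = fgets($handle)) !== false) {
    // Matches lines of the form "name 100/1"
    if (preg_match('/(\w+)\s+(\d+)\/(\d+)/', $row, $matches)) {
        $statement->execute([$matches[1], $matches[2], $matches[3]]);
    }
}
fclose($handle);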
Have a look at the fopen() and fread() or fgetcsv() PHP functions to handle the files; with them you should be able to loop through the lines of the CSV file. If you have control over your server configuration, I would suggest you have a look into SplFileObject instead, because you can handle files more efficiently/elegantly and in an object-oriented way, but it requires you to enable SPL. This last one is my recommendation, as you could read the file like this:
function random_name(){
    $file_name = "names.csv";
    if (is_readable($file_name)) {
        $file = new SplFileObject($file_name);
        $file->seek($file->getSize());
        $linesTotal = $file->key();
        $file->seek(rand(1, $linesTotal - 1));
        $line = $file->fgetcsv();
        return $line[4];
    }
}
This was a sample script of mine to get a random name from a 1000-name CSV file.
As you loop and read the file, you can then use explode(); this function splits your string by the separator you define, which in your case would be:
$array = explode("/", $myLine);
And you would access it like this:
$array[0] or $array[1] to get the value

How would I go about creating a parser which can import txt into MySQL?

http://pastebin.com/raw.php?i=7NTGXU5R
I have about a hundred of those listed in the same file.
I tried working on a PHP solution, but I wasn't sure how to parse on the spaces; I could only find fgetcsv, which does commas.
What direction should I head in to make sense of this?
I remember some C++ from years ago; I was thinking I'd do something like a getline, then store the line (or row, in our case) into an array.
Once that is done, just write a bunch of if statements to go through each line and classify the first element (column) in each array as the designated 'header'.
Tasks like that always boil down to a large amount of custom string-munching code. Your best weapons of choice will be regular expressions. Forget about fgetcsv if files look like your file does.
The basic logic might look something like this:
Fetch all rows of the file via the file function.
Save each table area to its own array containing its rows:
foreach ($filelines as $line)
{
    $leftreportlines[] = trim(substr($line, 0, 93));
    $middlereportlines[] = trim(substr($line, 67, 135));
    ...
}
When you're done, start processing each report as it deserves. For example, the leftmost report might simply be parsed with preg_split('/\s+/', $line);
Either way, you'll have a lot of work to do. Good luck!
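As a starting point, a hedged sketch of that outline; the substr offsets are copied from the loop above, and the file, table, and column names are placeholders, since the real layout is only in the pastebin:

$filelines = file('report.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');
$statement = $pdo->prepare('INSERT INTO report_left (col1, col2, col3) VALUES (?, ?, ?)');

foreach ($filelines as $line) {
    $left = trim(substr($line, 0, 93));   // leftmost report area
    $fields = preg_split('/\s+/', $left); // split on runs of whitespace
    if (count($fields) >= 3) {
        $statement->execute([$fields[0], $fields[1], $fields[2]]);
    }
}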
