Have any of you tried importing a CSV through a live form (multipart/form-data)? Mine works; the only thing I hate about it is that it takes so much time that it hits the maximum execution timeout. My quick fix was to raise the maximum execution time in php.ini (or via set_time_limit()), but it really annoys me to wait half an hour to import the whole file even though it's not more than 100 KB. Am I just overreacting, or is something wrong here?
This is the code:
function upload($id, $old_eid)
{
    $filename = $_FILES['event_file']['tmp_name'];
    $handle = fopen($filename, "r");
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $id = $id;
        $id2 = $data[2];
        $ckr = $this->Manager_model->check_if_record_exists($id, $id2);
        if (count($ckr) > 0):
            $this->session->set_flashdata('err', '<div class="error">Duplicated record</div>');
            redirect("manager/csver/$id");
        else:
            $data['col1'] = $data[0];
            $data['col2'] = $id;
            $data['col3'] = $data[3].' '.$data[4];
            $data['col4'] = $data[2];
            $data['col5'] = $data[6];
            $data['col6'] = $data[1];
            $data['col7'] = $data[7];
            $data['col8'] = mt_rand(11111, 99999);
            $data['col9'] = $old_eid;
            $this->Manager_model->add_csv($data);
            $this->Manager_model->add_csv_to_photo($data);
        endif;
    }
    fclose($handle);
    $this->session->set_flashdata('success', '<div class="success">CSV successfully uploaded</div>');
    redirect("manager/records/$id");
    //$this->session->set_flashdata('msg', '<div class="success">Records successfully uploaded</div>');
}
My Manager_model:
function add_csv($data)
{
    $src = array(
        'col1' => $data['col1'],
        'col2' => $data['col2'],
        'col3' => $data['col3'],
        'col4' => $data['col4'],
        'col5' => $data['col5'],
        'col6' => $data['col6'],
        'col7' => $data['col7'],
        'col8' => $data['col8'],
    );
    $this->db->insert('e_records2', $src);
    if ($this->db->affected_rows() == '1'):
        return TRUE;
    endif;
    return FALSE;
}

function add_csv_to_photo($data)
{
    $src = array(
        'col1' => $data['col1'],
        'col2' => $data['col2'],
        'col3' => $data['col3'],
        'col4' => $data['col4'],
        'col5' => $data['col5'],
        'col6' => $data['col6'],
    );
    $this->db->insert('e_records', $src);
    if ($this->db->affected_rows() == '1'):
        return TRUE;
    endif;
    return FALSE;
}

function check_if_record_exists($id, $id2)
{
    $eid = $id;
    $id2 = $id2;
    $query = $this->db->query("select * from races_results where eid = $eid AND id2 = $id2");
    return $query->result();
}
P.S.
I'm not talking about phpMyAdmin here, because I know how its CSV import works. Besides, migrating the data from a file by hand that way would create a lot of trivial extra tasks.
Why not run the profiler to optimize your code? CodeIgniter includes this useful tool for problems like this: http://codeigniter.com/user_guide/general/profiling.html
It will give you a breakdown of your SQL queries, showing what is taking long and where.
$this->output->enable_profiler(TRUE);
The problem, as I see it, is that you are querying the DB once (or twice?) per line in your CSV file.
Of course you're going to get horrible performance.
You can do the whole thing in one query and have the DB make the CSV for you in no time.
SELECT DISTINCT f1, f2, f3, ... FROM tablex WHERE ...
INTO OUTFILE 'c:/dir/ca.csv'
FIELDS TERMINATED BY ';' ESCAPED BY '"'
LINES TERMINATED BY '\n';
-- note the use of forward slashes, even on Windows
See: http://dev.mysql.com/doc/refman/5.0/en/select-into.html
The speed of the select itself is the limiting factor here.
Make sure you have write permissions on the directory and note that MySQL will never overwrite files.
This command is very fast on MySQL.
$id = $id;
really?
$ckr = $this->Manager_model->check_if_record_exists($id, $id2);
One obvious way to make it go faster would be to have a unique index on (eid, id2) and ignore duplicate-row errors on the INSERT.
But really, if you want it to go much faster, just tell MySQL to parse and load the data itself.
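For illustration only, here is a rough sketch of that approach with LOAD DATA INFILE against the code from the question. It assumes a UNIQUE KEY on (col2, col4) in e_records2 (so IGNORE silently skips duplicates), that local_infile is enabled, and that the user-variable-to-column mapping mirrors the fgetcsv() code above; all of that would need checking against the real schema.
// Sketch only (untested): let MySQL parse and bulk-load the uploaded CSV.
$sql = "LOAD DATA LOCAL INFILE " . $this->db->escape($_FILES['event_file']['tmp_name']) . "
        IGNORE INTO TABLE e_records2
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\\n'
        (@c0, @c1, @c2, @c3, @c4, @c5, @c6, @c7)
        SET col1 = @c0,
            col6 = @c1,
            col4 = @c2,
            col3 = CONCAT(@c3, ' ', @c4),
            col5 = @c6,
            col7 = @c7,
            col2 = " . (int)$id . ",
            col8 = FLOOR(11111 + RAND() * 88889)";
$this->db->query($sql);
One pass like this replaces the per-row check/insert loop entirely, which is where the half hour is going.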
Related question:
Code:
while (($row= fgetcsv($file_data, 10000, ",")) !== FALSE)
{
$product_id = date('mdHis');
$data[] = array(
'product_id' => $product_id
);
}
In this code I am importing a CSV file, which works perfectly. When I insert the CSV data into my database I also insert an id, i.e. product_id. The problem is that on submit it stores the same value for every row, but I want a different product_id for each row. How can I do this? Please help me.
Thank you.
You may want to just use an auto-increment column in the database; you can additionally use a date_created column that stores the time. A loop is too fast for date() (its smallest unit is seconds!), and even microtime() would not really make much sense.
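A minimal sketch of the auto-increment route, with made-up table and column names purely for illustration:
// Sketch only: product_id is an AUTO_INCREMENT column, so MySQL hands out
// a unique id per inserted row; date_created records when it happened.
$name  = '';
$price = 0.0;
$stmt  = $mysqli->prepare(
    "INSERT INTO products (name, price, date_created) VALUES (?, ?, NOW())"
);
$stmt->bind_param('sd', $name, $price);
while (($row = fgetcsv($file_data, 10000, ",")) !== FALSE) {
    $name  = $row[0];
    $price = (float)$row[1];
    $stmt->execute();
    // $mysqli->insert_id now holds the product_id MySQL just generated
}
$stmt->close();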
If you really want to do this in PHP anyway, for whatever reason:
function generateTimeID($start, $format_string) {
    while (true) {
        yield date($format_string) . $start;
        $start++;
    }
}

$time_generator = generateTimeID($last_id_from_database, 'mdHis-');

while (($row = fgetcsv($file_data, 10000, ",")) !== FALSE)
{
    $product_id = $time_generator->current(); // generators expose current()/next()
    $time_generator->next();
    $data[] = array(
        'product_id' => $product_id
    );
}
I have a CSV file with more than 100,000 lines; each line has 3 values separated by a semicolon. The total file size is approx. 5 MB.
CSV file is in this format:
stock_id;product_id;amount
==========================
1;1234;0
1;1235;1
1;1236;0
...
2;1234;3
2;1235;2
2;1236;13
...
3;1234;0
3;1235;2
3;1236;0
...
We have 10 stocks, which are indexed 1-10 in the CSV. In the database they are saved as 22-31.
The CSV is sorted by stock_id and product_id, but I don't think that matters.
What I have
<?php
session_start();
require_once ('db.php');
echo '<meta charset="iso-8859-2">';
// convert table: `CSV stock id => DB stock id`
$stocks = array(
1 => 22,
2 => 23,
3 => 24,
4 => 25,
5 => 26,
6 => 27,
7 => 28,
8 => 29,
9 => 30,
10 => 31
);
$sql = $mysqli->query("SELECT product_id FROM table WHERE fielddef_id = 1");
while ($row = $sql->fetch_assoc()) {
$products[$row['product_id']] = 1;
}
$csv = file('export.csv');
// go thru CSV file and prepare SQL UPDATE query
foreach ($csv as $row) {
$data = explode(';', $row);
// $data[0] - stock_id
// $data[1] - product_id
// $data[2] - amount
if (isset($products[$data[1]])) {
// the CSV contains products that aren't in the database
// this echo is just meant to show me the queries
echo " UPDATE t
SET value = " . (int)$data[2] . "
WHERE fielddef_id = " . (int)$stocks[$data[0]] . " AND
product_id = '" . $data[1] . "' -- product_id isn't just numeric
LIMIT 1<br>";
}
}
The problem is that writing out 100k lines with echo is sooo slow; it takes long minutes. I'm not sure what MySQL will do, whether it will be faster or take roughly the same time. I have no testing machine here, so I'm worried about testing it on the production server.
My idea was to load the CSV file into several variables (or better, arrays) like below, but I'm not sure whether that would help.
$csv[0] = lines 0 - 10.000;
$csv[1] = lines 10.001 - 20.000;
$csv[2] = lines 20.001 - 30.000;
$csv[3] = lines 30.001 - 40.000;
etc.
I found e.g. Efficiently counting the number of lines of a text file. (200mb+), but I'm not sure how it can help me.
When I replace the foreach with print_r, I get the dump in under 1 second. The task is to make the foreach loop with the database updates faster.
Any ideas how to update so many records in the database?
Thanks.
Something like this (please note this is 100% untested and off the top of my head, so it may need some tweaking to actually work :) )
//define the stock mapping array (there are probably better ways of doing this)
$stocks = array(
1 => 22,
2 => 23,
3 => 24,
4 => 25,
5 => 26,
6 => 27,
7 => 28,
8 => 29,
9 => 30,
10 => 31
);
$handle = fopen("file.csv", "r")); //open file
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
//loop through csv
$updatesql = "UPDATE t SET `value` = ".$data[2]." WHERE fielddef_id = ".$stocks[$data[0]]." AND product_id = ".$data[1];
echo "$updatesql<br>";//for debug only comment out on live
}
There is no need to do your initial SELECT, since you only ever set your product data to 1 anyway, and from your description your product IDs are always correct; it's just your fielddef column that needs the mapping.
Also, for live use, don't forget to actually execute $updatesql with mysqli rather than only echoing it.
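For live use, that last part might look something like the sketch below. $mysqli is assumed to be your open connection, and since the question notes product_id isn't purely numeric, it is quoted and escaped here:
// Sketch: actually run each generated statement instead of only echoing it.
$updatesql = "UPDATE t SET `value` = " . (int)$data[2] . "
              WHERE fielddef_id = " . (int)$stocks[$data[0]] . "
              AND product_id = '" . $mysqli->real_escape_string($data[1]) . "'";
if (!$mysqli->query($updatesql)) {
    echo "Query failed: " . $mysqli->error . "<br>";
}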
To give you a comparison with actual production code (which I can benchmark against!):
This is some code I use for an importer of an uploaded file (it's not perfect, but it does its job).
if (isset($_POST['action']) && $_POST['action']=="beginimport") {
echo "<h4>Starting Import</h4><br />";
// Ignore user abort and expand time limit
//ignore_user_abort(true);
set_time_limit(60);
if (($handle = fopen($_FILES['clientimport']['tmp_name'], "r")) !== FALSE) {
$row = 0;
//defaults
$sitetype = 3;
$sitestatus = 1;
$startdate = "2013-01-01 00:00:00";
$enddate = "2013-12-31 23:59:59";
$createdby = 1;
//loop and insert
while (($data = fgetcsv($handle, 10000, ",")) !== FALSE) { // loop through each line of CSV. Returns array of that line each time so we can hard reference it if we want.
if ($row>0) {
if (strlen($data[1])>0) {
$clientshortcode = mysqli_real_escape_string($db->mysqli,trim(stripslashes($data[0])));
$sitename = mysqli_real_escape_string($db->mysqli,trim(stripslashes($data[0]))." ".trim(stripslashes($data[1])));
$address = mysqli_real_escape_string($db->mysqli,trim(stripslashes($data[1])).",".trim(stripslashes($data[2])).",".trim(stripslashes($data[3])));
$postcode = mysqli_real_escape_string($db->mysqli,trim(stripslashes($data[4])));
//look up client ID
$client = $db->queryUniqueObject("SELECT ID FROM tblclients WHERE ShortCode='$clientshortcode'",ENABLE_DEBUG);
if ($client->ID>0 && is_numeric($client->ID)) {
//got client ID so now check if site already exists we can trust the site name here since we only care about double matching against already imported sites.
$sitecount = $db->countOf("tblsites","SiteName='$sitename'");
if ($sitecount>0) {
//site exists
echo "<strong style=\"color:orange;\">SITE $sitename ALREADY EXISTS SKIPPING</strong><br />";
} else {
//site doesn't exist so do import
$db->execute("INSERT INTO tblsites (SiteName,SiteAddress,SitePostcode,SiteType,SiteStatus,CreatedBy,StartDate,EndDate,CompanyID) VALUES
('$sitename','$address','$postcode',$sitetype,$sitestatus,$createdby,'$startdate','$enddate',".$client->ID.")",ENABLE_DEBUG);
echo "IMPORTED - ".$data[0]." - ".$data[1]."<br />";
}
} else {
echo "<strong style=\"color:red;\">CLIENT $clientshortcode NOT FOUND PLEASE ENTER AND RE-IMPORT</strong><br />";
}
flush(); // flush output so progress appears in the browser as it runs
set_time_limit(60); // reset timer on loop
}
} else {
$row++;
}
}
echo "<br />COMPLETED<br />";
}
fclose($handle);
unlink($_FILES['clientimport']['tmp_name']);
echo "All Imports finished do not reload this page";
}
That imported 150k rows in about 10 seconds
Thanks to the answers and comments on the question, I have a solution. The basis comes from @Dave; I've only updated it to fit the question better.
<?php
require_once 'include.php';
// stock convert table (key is ID in CSV, value ID in database)
$stocks = array(
1 => 22,
2 => 23,
3 => 24,
4 => 25,
5 => 26,
6 => 27,
7 => 28,
8 => 29,
9 => 30,
10 => 31
);
// product IDs in CSV (value) and Database (product_id) are different. We need to take both IDs from database and create an array of e-shop products
$sql = mysql_query("SELECT product_id, value FROM cms_module_products_fieldvals WHERE fielddef_id = 1") or die(mysql_error());
while ($row = mysql_fetch_assoc($sql)) {
$products[$row['value']] = $row['product_id'];
}
$handle = fopen('import.csv', 'r');
$i = 1;
while (($data = fgetcsv($handle, 1000, ';')) !== FALSE) {
$p_id = (int)$products[$data[1]];
if ($p_id > 0) {
// if the product exists in the database, continue; without this condition it still works, but we send many useless queries (... WHERE product_id = 0 updates nothing, yet still takes time)
if ($i % 300 === 0) {
// optional, we'll see what it does with the real traffic
sleep(1);
}
$updatesql = "UPDATE table SET value = " . (int)$data[2] . " WHERE fielddef_id = " . $stocks[$data[0]] . " AND product_id = " . (int)$p_id . " LIMIT 1";
echo "$updatesql<br>";//for debug only comment out on live
$i++;
}
}
// approx. 1.5 sec to process 100,000+ records
fclose($handle);
Like I said in the comment, use SplFileObject to iterate over the CSV file, and use prepared statements to reduce the overhead of calling UPDATE in each loop iteration. Also, merge your two queries: there is no reason to pull all of the product rows first and check them against the CSV. You can use a JOIN so that only those stocks in the second table that are related to a product in the first table and to the current CSV row get updated:
/* First the CSV is pulled in */
$export_csv = new SplFileObject('export.csv');
$export_csv->setFlags(SplFileObject::READ_CSV | SplFileObject::DROP_NEW_LINE | SplFileObject::READ_AHEAD);
$export_csv->setCsvControl(';');
/* Next you prepare your statement object */
$stmt = $mysqli->prepare("
UPDATE stocks, products
SET value = ?
WHERE
stocks.fielddef_id = ? AND
product_id = ? AND
products.fielddef_id = 1
LIMIT 1
");
$stmt->bind_param('iis', $amount, $fielddef_id, $product_id);
/* Now you can loop through the CSV and set the fields to match the integers bound to the prepared statement and execute the update on each loop. */
foreach ($export_csv as $csv_row) {
list($stock_id, $product_id, $amount) = $csv_row;
$fielddef_id = $stock_id + 21;
if(!empty($stock_id)) {
$stmt->execute();
}
}
$stmt->close();
Make the query bigger, i.e. use the loop to compile a larger query. You may need to split it up into chunks (e.g. process 100 rows at a time), but certainly don't do one query at a time (this applies to any kind of query: insert, update, even select where possible). This should greatly increase the performance.
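As a sketch of what "compile a larger query" could look like here, assuming a unique key on (fielddef_id, product_id) in the target table so that INSERT ... ON DUPLICATE KEY UPDATE behaves as a batched update ($mysqli, $handle, $stocks and $products are taken from the examples above):
// Sketch only: collect rows and send one statement per 100 CSV lines.
$values = array();
while (($data = fgetcsv($handle, 1000, ';')) !== FALSE) {
    if (!isset($products[$data[1]])) {
        continue; // skip products that aren't in the database
    }
    $values[] = "(" . (int)$stocks[$data[0]] . ", '"
              . $mysqli->real_escape_string($data[1]) . "', "
              . (int)$data[2] . ")";
    if (count($values) >= 100) {
        $mysqli->query(
            "INSERT INTO t (fielddef_id, product_id, value) VALUES "
            . implode(',', $values)
            . " ON DUPLICATE KEY UPDATE value = VALUES(value)"
        );
        $values = array();
    }
}
if (count($values)) { // flush the remainder
    $mysqli->query(
        "INSERT INTO t (fielddef_id, product_id, value) VALUES "
        . implode(',', $values)
        . " ON DUPLICATE KEY UPDATE value = VALUES(value)"
    );
}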
It's generally recommended that you don't query in a loop.
Updating every record every time will be too expensive (mostly due to seeks, but also from writing).
You should TRUNCATE the table first and then insert all the records again (assuming you won't have external foreign keys linking to this table).
To make it even faster, you should lock the table before the insert and unlock it afterwards. This will prevent the indexing from happening at every insert.
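In rough outline, that could look like the sketch below; the table name, columns and connection are assumed from the earlier examples, and it only makes sense if nothing references the table through foreign keys:
// Sketch only: reload the table from the CSV instead of updating row by row.
$mysqli->query("TRUNCATE TABLE t");
$mysqli->query("LOCK TABLES t WRITE");
while (($data = fgetcsv($handle, 1000, ';')) !== FALSE) {
    $mysqli->query(
        "INSERT INTO t (fielddef_id, product_id, value) VALUES ("
        . (int)$stocks[$data[0]] . ", '"
        . $mysqli->real_escape_string($data[1]) . "', "
        . (int)$data[2] . ")"
    );
}
$mysqli->query("UNLOCK TABLES"); // keys are flushed once here, not per insert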
I'm making a while loop in PHP and it all goes well, but the problem is that I don't only want to get the user's id but also some other data that is in another table. When I run a query inside this while loop and select everything from that second table (where the id equals the id from the first query's result), it only returns 1 result...
So this is the code that I currently have:
public function getFriends($id)
{
global $params;
$get = $this->db->select("{$this->DB['data']['friends']['tbl']}", "*",
array(
"{$this->DB['data']['friends']['one']}" => $id
)
);
if($get)
{
while($key = $get->fetch())
{
$query = $this->db->query("SELECT * FROM {$this->DB['data']['users']['tbl']}
WHERE {$this->DB['data']['users']['id']} = :id",
array(
"id" => $key->{$this->DB['data']['friends']['two']}
)
);
while($row = $query->fetch())
{
$params["user_friends"][] = [
"id" => $key->{$this->DB['data']['friends']['two']},
"name" => $row->{$this->DB['data']['users']['username']},
"look" => $row->{$this->DB['data']['users']['figure']}
];
}
}
}
else
{
$params["update_error"] = $params["lang_no_friends"];
}
}
Thanks in advance!
Please help me out!
In the absence of more details, I don't know which DB framework you are using behind the scenes: PDO, mysqli_, or (hopefully not) mysql_. In any case, the problem might be that your second query stops the first one from continuing. I would use PDO->fetchAll() to get them all, but you say you can't do that, so looping over the first result and loading those rows into an array is the first thing I would try to see if this is the problem:
public function getFriends($id)
{
global $params;
$get = $this->db->select("{$this->DB['data']['friends']['tbl']}", "*",
array(
"{$this->DB['data']['friends']['one']}" => $id
)
);
$firstResults = array();
if( $get ) {
while( $key = $get->fetch() ) {
$firstResults[] = $key;
}
}
else
{
$params["update_error"] = $params["lang_no_friends"];
}
foreach( $firstResults AS $key )
{
$query = $this->db->query("SELECT * FROM {$this->DB['data']['users']['tbl']}
WHERE {$this->DB['data']['users']['id']} = :id",
array(
"id" => $key->{$this->DB['data']['friends']['two']}
)
);
while($row = $query->fetch())
{
$params["user_friends"][] = [
"id" => $key->{$this->DB['data']['friends']['two']},
"name" => $row->{$this->DB['data']['users']['username']},
"look" => $row->{$this->DB['data']['users']['figure']}
];
}
}
}
If this doesn't work, then we need more data: e.g. what query is generated? When you run it manually, does it return more than one result? If you get rid of the inner query, does that fix it? etc.
The first step when diagnosing PHP and MySQL issues is to add lines to your code that tell you what each part is doing (log each time a loop is entered; when each MySQL query is run, print the query string) so you can narrow down where the problem is. Often this makes you feel stupid in retrospect: "Duh, this query didn't return anything because I formatted the record ID wrong", and so forth.
The code snippet you've provided above isn't super helpful to me on its own. I'm a troubleshooter (not a parser), so I need diagnostic data (not just code) to be of any more help than this.
In order to avoid overloading the server, I run the query in a loop, fetching the 150k members in batches and storing them in an array. This works fine, but when the loop has finished its job the array has to be printed out, and that takes so long that it ends with the page crashing.
$development = array(
'testing' => false,
'testing_loops' => 1
);
$settings = array(
'times_looped' => 0,
'members_at_a_time' => 2000,
'print_settings' => true,
'members_looped' => 0,
'test' => 0,
);
function outputCSV($data)
{
$outstream = fopen("php://output", 'w');
array_walk($data, '__outputCSV', $outstream);
fclose($outstream);
}
function __outputCSV(&$vals, $key, $filehandler)
{
fwrite($filehandler, implode(',',$vals). "\n");
}
function getMembers(&$settings, $ee)
{
// SQL FROM
$sql_from = $settings['times_looped'] * $settings['members_at_a_time'];
// SQL LIMIT
$sql_limit = $sql_from . ', ' . $settings['members_at_a_time'];
$settings['test'] = $sql_limit;
// GET MEMBERS
$query = $ee->EE->db->query("SELECT m.email,
cr.near_rest_1_id, cr.near_rest_1_distance,
cr.near_rest_2_id, cr.near_rest_2_distance,
cr.near_rest_3_id, cr.near_rest_3_distance
from exp_members m
left join
exp_menucard_closest_restaurants cr
on m.member_id = cr.member_id
where group_id IN (8, 14) limit ".$sql_limit."");
// Check if members found
if($query->num_rows() == 0)
{
return $query->num_rows();
}
// Update number of members
$settings['members_looped'] = $settings['members_looped'] + $query->num_rows();
// Loop members
foreach($query->result_array() as $row) {
if($row['near_rest_1_distance'] > 1.0)
{$near_rest_1_distance= number_format($row['near_rest_1_distance'], 2, ',', ',') ." ". 'km';}
else
{$near_rest_1_distance= number_format($row['near_rest_1_distance'], 3, ',', '')*1000 ." ". 'meter';}
if($row['near_rest_2_distance'] > 1.0)
{$near_rest_2_distance= number_format($row['near_rest_2_distance'], 2, ',', ',') ." ". 'km';}
else
{$near_rest_2_distance= number_format($row['near_rest_2_distance'], 3, ',', '')*1000 ." ". 'meter';}
if($row['near_rest_3_distance'] > 1.0)
{$near_rest_3_distance= number_format($row['near_rest_3_distance'], 2, ',', ',') ." ". 'km';}
else
{$near_rest_3_distance= number_format($row['near_rest_3_distance'], 3, ',', '')*1000 ." ". 'meter';}
$nearest_rest_result_array[] = array(
'email' => $row['email'],
'near_rest_1_id' => $row['near_rest_1_id'],
'near_rest_1_distance' => $near_rest_1_distance,
'near_rest_2_id' => $row['near_rest_2_id'],
'near_rest_2_distance' => $near_rest_2_distance,
'near_rest_3_id' => $row['near_rest_3_id'],
'near_rest_3_distance' => $near_rest_3_distance
);
}
// Loop again
return $query->num_rows();
}
// Loop
$more_rows = true;
while($more_rows == true || $more_rows > 0)
{
// Test
if($settings['times_looped'] >= $development['testing_loops'] && $development['testing'] == true){
break;
}
// get members
$more_rows = getMembers($settings, $this);
$settings['members_looped'] = $settings['members_looped'] + $more_rows;
$settings['times_looped']++;
// Got last bunch of members
if($settings['members_looped'] < $settings['members_at_a_time'])
{
break;
}
}
When the loop has finished, the whole array is printed out:
// Write to CSV
outputCSV($nearest_rest_result_array);
Don't use a foreach loop over a prebuilt array. Use a while loop that reads a row from the database and writes it to the CSV file immediately. That way you operate line by line and don't use nearly as much memory.
If you're working with large data sets it's usually better to work with iterators or streams rather than trying to process the whole thing in one big operation.
The mistake starts early: use an iterator instead of the array you currently build here:
foreach($query->result_array() as $row)
PDO and mysqli both let you iterate over a result set. Create the output on the fly and stream it to the client; your webserver will normally chunk it for you, and if it doesn't, set your PHP output buffer to 4096k or similar.
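A sketch of that idea, assuming a plain $mysqli connection and the same SELECT as in the question (called $members_sql here): iterate an unbuffered result and write each line straight to php://output instead of collecting 150k rows in $nearest_rest_result_array first.
// Sketch only: stream rows to the client one at a time, in constant memory.
$out    = fopen('php://output', 'w');
$result = $mysqli->query($members_sql, MYSQLI_USE_RESULT); // unbuffered
while ($row = $result->fetch_assoc()) {
    fputcsv($out, $row);  // one CSV line per database row
    if (ob_get_level()) {
        ob_flush();       // push buffered output out as we go
    }
    flush();
}
$result->free();
fclose($out);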
Consider implementing pagination in your web page.
Let's take an example. Suppose your database has 10,000 rows. There may be no need to display all 10,000 rows at once; instead we can display 100 records per page and provide links to 100 such pages.
The best example is https://www.google.co.in/?gws_rd=cr&ei=-HggUuXWBMj4rQeNr4CADw#q=pagination+in+php
Of 6,190,000 results they show only 11 per page.
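A minimal pagination sketch, with the table and column names assumed from the question (100 rows per page, page number taken from the query string):
// Sketch only: fetch and display one page at a time with LIMIT/OFFSET.
$per_page = 100;
$page     = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$offset   = ($page - 1) * $per_page;

$stmt = $mysqli->prepare(
    "SELECT member_id, email FROM exp_members ORDER BY member_id LIMIT ? OFFSET ?"
);
$stmt->bind_param('ii', $per_page, $offset);
$stmt->execute();
$result = $stmt->get_result(); // requires mysqlnd
while ($row = $result->fetch_assoc()) {
    echo htmlspecialchars($row['email']) . "<br>";
}
$stmt->close();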
Long before I knew anything (not that I know much even now) I designed a web app in PHP which inserted data into my MySQL database after running the values through htmlentities(). I eventually came to my senses, removed this step, stuck the encoding in the output rather than the input, and went on my merry way.
However, I've since had to revisit some of this old data, and unfortunately I have an issue: when it's displayed on the screen I'm getting values that have effectively been htmlentitied twice.
So, is there a MySQL or phpMyAdmin way of changing all the older, affected rows back into their proper characters, or will I have to write a script to read, decode and update each of the 17 million rows across 12 tables?
EDIT:
Thanks for the help everyone. I wrote my own answer below with some code in it; it's not pretty, but it worked on the test data earlier, so barring someone pointing out a glaring error in my code while I'm in bed, I'll be running it on a backup DB tomorrow and then on the live one if that works out all right.
I ended up using this. It's not pretty, but I'm tired, it's 2am, and it did its job! (Edit: on test data)
$tables = array('users', 'users_more', 'users_extra', 'forum_posts', 'posts_edits', 'forum_threads', 'orders', 'product_comments', 'products', 'favourites', 'blocked', 'notes');
foreach($tables as $table)
{
$sql = "SELECT * FROM {$table} WHERE data_date_ts < '{$encode_cutoff}'";
$rows = $database->query($sql);
while($row = mysql_fetch_assoc($rows))
{
$new = array();
foreach($row as $key => $data)
{
$new[$key] = $database->escape_value(html_entity_decode($data, ENT_QUOTES, 'UTF-8'));
}
array_shift($new);
$new_string = "";
$i = 0;
foreach($new as $new_key => $new_data)
{
if($i > 0) { $new_string.= ", "; }
$new_string.= $new_key . "='" . $new_data . "'";
$i++;
}
$sql = "UPDATE {$table} SET " . $new_string . " WHERE id='" . $row['id'] . "'";
$database->query($sql);
// plus some code to check that all out
}
}
Since PHP was the method of encoding, you'll want to use it to decode. You can use html_entity_decode to convert the values back to their original characters. You'll have to loop!
Just be careful not to decode rows that don't need it; I'm not sure how you'll determine that.
I think writing a PHP script is the right thing to do in this situation. As Dave said, you can use the html_entity_decode() function to convert your text back.
Try your script on a table with only a few entries first; this will save you a lot of testing time. And of course, remember to back up your table(s) before running the script.
I'm afraid there is no shorter way. The computation for millions of rows remains quite expensive no matter how you convert the data back, so go for a PHP script... it's the easiest way.
This is my bullet-proof version. It iterates over all tables and string columns in a database, determines the primary key(s), and performs the updates.
It is intended to be run from the command line so you get progress information.
<?php
$DBC = new mysqli("localhost", "user", "dbpass", "dbname");
$DBC->set_charset("utf8");
$tables = $DBC->query("SHOW FULL TABLES WHERE Table_type='BASE TABLE'");
while($table = $tables->fetch_array()) {
$table = $table[0];
$columns = $DBC->query("DESCRIBE `{$table}`");
$textFields = array();
$primaryKeys = array();
while($column = $columns->fetch_assoc()) {
// check for char, varchar, text, mediumtext and so on
if ($column["Key"] == "PRI") {
$primaryKeys[] = $column['Field'];
} else if (strpos( $column["Type"], "char") !== false || strpos($column["Type"], "text") !== false ) {
$textFields[] = $column['Field'];
}
}
if (!count($primaryKeys)) {
echo "Cannot convert table without primary key: '$table'\n";
continue;
}
foreach ($textFields as $textField) {
$sql = "SELECT `".implode("`,`", $primaryKeys)."`,`$textField` from `$table` WHERE `$textField` like '%&%'";
$candidates = $DBC->query($sql);
$tmp = $DBC->query("SELECT FOUND_ROWS()");
$rowCount = $tmp->fetch_array()[0];
$tmp->free();
echo "Updating $rowCount in $table.$textField\n";
$count=0;
while($candidate = $candidates->fetch_assoc()) {
$oldValue = $candidate[$textField];
$newValue = html_entity_decode($candidate[$textField], ENT_QUOTES | ENT_XML1, 'UTF-8');
if ($oldValue != $newValue) {
$sql = "UPDATE `$table` SET `$textField` = '"
. $DBC->real_escape_string($newValue)
. "' WHERE ";
foreach ($primaryKeys as $pk) {
$sql .= "`$pk` = '" . $DBC->real_escape_string($candidate[$pk]) . "' AND ";
}
$sql .= "1";
$DBC->query($sql);
}
$count++;
echo "$count / $rowCount\r";
}
}
}
?>
cheers
Roland
It's a bit kludgy but I think the mass update is the only way to go...
$Query = "SELECT row_id, html_entitied_column FROM table";
$result = mysql_query($Query, $connection);
while($row = mysql_fetch_array($result)){
$updatedValue = html_entity_decode($row['html_entitied_column']);
$Query = "UPDATE table SET html_entitied_column = '" . $updatedValue . "' ";
$Query .= "WHERE row_id = " . $row['row_id'];
mysql_query($Query, $connection);
}
This is simplified; there's no error handling, etc.
I'm not sure what the processing time would be on millions of rows, so you might need to break it up into chunks to avoid script timeouts.
I had the exact same problem. Since I had multiple clients running the application in production, I wanted to avoid running a PHP script to clean the database for every one of them.
I came up with a solution that is far from perfect, but does the job painlessly.
Track down all the spots in your code where you use htmlentities() before inserting data, and remove them.
Change your "display data as HTML" method to something like this:
return html_entity_decode(htmlentities($chaine, ENT_NOQUOTES), ENT_NOQUOTES);
The undo-redo process is kind of ridiculous, but it does the job. And your database will slowly clean itself every time users update the incorrect data.