How to process CSV with 100k+ lines in PHP?

I have a CSV file with more than 100,000 lines; each line has 3 values separated by semicolons. The total file size is approx. 5 MB.
CSV file is in this format:
stock_id;product_id;amount
==========================
1;1234;0
1;1235;1
1;1236;0
...
2;1234;3
2;1235;2
2;1236;13
...
3;1234;0
3;1235;2
3;1236;0
...
We have 10 stocks, indexed 1-10 in the CSV. In the database they are saved as 22-31.
The CSV is sorted by stock_id and product_id, but I don't think that matters.
What I have
<?php
session_start();
require_once ('db.php');
echo '<meta charset="iso-8859-2">';

// convert table: `CSV stock id => DB stock id`
$stocks = array(
    1 => 22,
    2 => 23,
    3 => 24,
    4 => 25,
    5 => 26,
    6 => 27,
    7 => 28,
    8 => 29,
    9 => 30,
    10 => 31
);

$products = array(); // initialize so the lookup below doesn't raise notices
$sql = $mysqli->query("SELECT product_id FROM table WHERE fielddef_id = 1");
while ($row = $sql->fetch_assoc()) {
    $products[$row['product_id']] = 1;
}

$csv = file('export.csv');

// go through the CSV file and prepare the SQL UPDATE queries
foreach ($csv as $row) {
    $data = explode(';', $row);
    // $data[0] - stock_id
    // $data[1] - product_id
    // $data[2] - amount
    if (isset($products[$data[1]])) {
        // the CSV contains products which aren't in the database;
        // the echo below should show me the queries
        echo " UPDATE t
               SET value = " . (int)$data[2] . "
               WHERE fielddef_id = " . (int)$stocks[$data[0]] . " AND
               product_id = '" . $data[1] . "' -- product_id isn't just numeric
               LIMIT 1<br>";
    }
}
The problem is that printing 100k lines with echo is very slow; it takes long minutes. I'm not sure what MySQL will do, whether it will be faster or take about the same time. I have no testing machine here, so I'm worried about testing it on the production server.
My idea was to load the CSV file into several variables (or better, an array) like below, but I'm not sure whether that would help.
$csv[0] = lines 0 - 10,000;
$csv[1] = lines 10,001 - 20,000;
$csv[2] = lines 20,001 - 30,000;
$csv[3] = lines 30,001 - 40,000;
etc.
I found e.g. Efficiently counting the number of lines of a text file (200mb+), but I'm not sure how it can help me.
When I replace the foreach with print_r, I get the dump in under 1 second. The task is to make the foreach loop with the database update faster.
Any ideas how to update so many records in the database?
Thanks.

Something like this (please note this is 100% untested and off the top of my head, so it may need some tweaking to actually work :) )
// define the stock map array (there are probably better ways of doing this)
$stocks = array(
    1 => 22,
    2 => 23,
    3 => 24,
    4 => 25,
    5 => 26,
    6 => 27,
    7 => 28,
    8 => 29,
    9 => 30,
    10 => 31
);

$handle = fopen("file.csv", "r"); // open file
while (($data = fgetcsv($handle, 1000, ";")) !== FALSE) {
    // loop through csv
    $updatesql = "UPDATE t SET `value` = ".$data[2]." WHERE fielddef_id = ".$stocks[$data[0]]." AND product_id = ".$data[1];
    echo "$updatesql<br>"; // for debug only, comment out on live
}
There is no need to do your initial SELECT, since you only ever set your product data to 1 anyway, and from your description your product IDs are always correct; it's just your fielddef column which holds the map.
Also, for live use, don't forget to put your actual mysqli execute command in for your $updatesql.
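For example, a minimal sketch (assuming the $mysqli connection from the question is available):
$result = $mysqli->query($updatesql); // run the UPDATE built above
if (!$result) {
    echo "Query failed: " . $mysqli->error . "<br>";
}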
To give you a comparison against actual usage code (that I can benchmark against!):
This is some code I use for an importer of an uploaded file (it's not perfect, but it does its job):
if (isset($_POST['action']) && $_POST['action'] == "beginimport") {
    echo "<h4>Starting Import</h4><br />";
    // Ignore user abort and expand time limit
    //ignore_user_abort(true);
    set_time_limit(60);
    if (($handle = fopen($_FILES['clientimport']['tmp_name'], "r")) !== FALSE) {
        $row = 0;
        // defaults
        $sitetype = 3;
        $sitestatus = 1;
        $startdate = "2013-01-01 00:00:00";
        $enddate = "2013-12-31 23:59:59";
        $createdby = 1;
        // loop and insert
        while (($data = fgetcsv($handle, 10000, ",")) !== FALSE) { // loop through each line of the CSV; returns an array of that line each time so we can hard-reference it if we want
            if ($row > 0) {
                if (strlen($data[1]) > 0) {
                    $clientshortcode = mysqli_real_escape_string($db->mysqli, trim(stripslashes($data[0])));
                    $sitename = mysqli_real_escape_string($db->mysqli, trim(stripslashes($data[0]))." ".trim(stripslashes($data[1])));
                    $address = mysqli_real_escape_string($db->mysqli, trim(stripslashes($data[1])).",".trim(stripslashes($data[2])).",".trim(stripslashes($data[3])));
                    $postcode = mysqli_real_escape_string($db->mysqli, trim(stripslashes($data[4])));
                    // look up client ID
                    $client = $db->queryUniqueObject("SELECT ID FROM tblclients WHERE ShortCode='$clientshortcode'", ENABLE_DEBUG);
                    if ($client->ID > 0 && is_numeric($client->ID)) {
                        // got client ID, so now check if the site already exists; we can trust the site name here since we only care about double-matching against already imported sites
                        $sitecount = $db->countOf("tblsites", "SiteName='$sitename'");
                        if ($sitecount > 0) {
                            // site exists
                            echo "<strong style=\"color:orange;\">SITE $sitename ALREADY EXISTS SKIPPING</strong><br />";
                        } else {
                            // site doesn't exist so do import
                            $db->execute("INSERT INTO tblsites (SiteName,SiteAddress,SitePostcode,SiteType,SiteStatus,CreatedBy,StartDate,EndDate,CompanyID) VALUES
                                ('$sitename','$address','$postcode',$sitetype,$sitestatus,$createdby,'$startdate','$enddate',".$client->ID.")", ENABLE_DEBUG);
                            echo "IMPORTED - ".$data[0]." - ".$data[1]."<br />";
                        }
                    } else {
                        echo "<strong style=\"color:red;\">CLIENT $clientshortcode NOT FOUND PLEASE ENTER AND RE-IMPORT</strong><br />";
                    }
                    fcflush(); // presumably a project helper that flushes output to the browser
                    set_time_limit(60); // reset timer on loop
                }
            } else {
                $row++;
            }
        }
        echo "<br />COMPLETED<br />";
    }
    fclose($handle);
    unlink($_FILES['clientimport']['tmp_name']);
    echo "All Imports finished do not reload this page";
}
That imported 150k rows in about 10 seconds

Thanks to the answers and comments on the question, I have a solution. The base for it comes from @Dave; I've only updated it to fit the question better.
<?php
require_once 'include.php';

// stock convert table (key is ID in CSV, value is ID in database)
$stocks = array(
    1 => 22,
    2 => 23,
    3 => 24,
    4 => 25,
    5 => 26,
    6 => 27,
    7 => 28,
    8 => 29,
    9 => 30,
    10 => 31
);

// product IDs in the CSV (value) and in the database (product_id) are different.
// We need to take both IDs from the database and build an array of e-shop products.
$products = array();
$sql = mysql_query("SELECT product_id, value FROM cms_module_products_fieldvals WHERE fielddef_id = 1") or die(mysql_error());
while ($row = mysql_fetch_assoc($sql)) {
    $products[$row['value']] = $row['product_id'];
}

$handle = fopen('import.csv', 'r');
$i = 1;
while (($data = fgetcsv($handle, 1000, ';')) !== FALSE) {
    $p_id = (int)$products[$data[1]];
    if ($p_id > 0) {
        // only continue if the product exists in the database. Without this condition it
        // works, but we'd send many useless queries (... WHERE product_id = 0 updates
        // nothing, but still takes time)
        if ($i % 300 === 0) {
            // optional; we'll see what it does with real traffic
            sleep(1);
        }
        $updatesql = "UPDATE table SET value = " . (int)$data[2] . " WHERE fielddef_id = " . $stocks[$data[0]] . " AND product_id = " . (int)$p_id . " LIMIT 1";
        echo "$updatesql<br>"; // for debug only, comment out on live
        $i++;
    }
}
// approx. 1.5 sec to process 100,000+ records
fclose($handle);
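To go live, the debug echo above would be replaced with an actual query call; wrapping the whole loop in a transaction is a common way to speed up many small UPDATEs on InnoDB. An untested sketch using the same legacy mysql_* API as the code above:
mysql_query('START TRANSACTION');
while (($data = fgetcsv($handle, 1000, ';')) !== FALSE) {
    // ... build $updatesql as above, then:
    mysql_query($updatesql) or die(mysql_error());
}
mysql_query('COMMIT'); // one commit instead of 100k implicit ones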

Like I said in the comment, use SplFileObject to iterate over the CSV file. Use prepared statements to reduce the performance overhead of calling UPDATE in each loop iteration. Also, merge your two queries together; there isn't any reason to pull all of the product rows first and check them against the CSV. You can use a JOIN to ensure that only those stocks in the second table that are related to the product in the first table and match the current CSV row get updated:
/* First the CSV is pulled in */
$export_csv = new SplFileObject('export.csv');
$export_csv->setFlags(SplFileObject::READ_CSV | SplFileObject::DROP_NEW_LINE | SplFileObject::READ_AHEAD);
$export_csv->setCsvControl(';');

/* Next you prepare your statement object */
$stmt = $mysqli->prepare("
    UPDATE stocks, products
    SET value = ?
    WHERE
        stocks.fielddef_id = ? AND
        product_id = ? AND
        products.fielddef_id = 1
    LIMIT 1
");
$stmt->bind_param('iis', $amount, $fielddef_id, $product_id);

/* Now you can loop through the CSV, set the fields to match the values bound
   to the prepared statement, and execute the update on each loop. */
foreach ($export_csv as $csv_row) {
    list($stock_id, $product_id, $amount) = $csv_row;
    $fielddef_id = $stock_id + 21;
    if (!empty($stock_id)) {
        $stmt->execute();
    }
}

$stmt->close();

Make the query bigger, i.e. use the loop to compile a larger query. You may need to split it up into chunks (e.g. process 100 rows at a time), but certainly don't do one query at a time (this applies to any kind: INSERT, UPDATE, even SELECT where possible). This should greatly increase performance.
It's generally recommended that you don't query in a loop.
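A sketch of what the compiled query could look like for this question's data (untested; it assumes a UNIQUE key covering (fielddef_id, product_id), which the single-row UPDATE ... LIMIT 1 pattern above implies, and reuses $stocks, $mysqli, and an open CSV $handle from the earlier answers):
$batch = array();
while (($data = fgetcsv($handle, 1000, ';')) !== FALSE) {
    $batch[] = '(' . (int)$stocks[$data[0]] . ",'"
             . $mysqli->real_escape_string($data[1]) . "',"
             . (int)$data[2] . ')';
    if (count($batch) >= 100) { // flush every 100 rows
        $mysqli->query('INSERT INTO t (fielddef_id, product_id, value) VALUES '
            . implode(',', $batch)
            . ' ON DUPLICATE KEY UPDATE value = VALUES(value)');
        $batch = array();
    }
}
if ($batch) { // flush the remainder
    $mysqli->query('INSERT INTO t (fielddef_id, product_id, value) VALUES '
        . implode(',', $batch)
        . ' ON DUPLICATE KEY UPDATE value = VALUES(value)');
}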

Updating every record every time will be too expensive (mostly due to seeks, but also from writing).
You should TRUNCATE the table first and then insert all the records again (assuming you won't have external foreign keys linking to this table).
To make it even faster, you should lock the table before the insert and unlock it afterwards. This will prevent the indexing from happening at every insert.
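A rough sketch of that sequence (assuming, as stated, that no foreign keys reference the table; $chunks stands for the CSV rows pre-grouped into multi-row VALUES lists):
$mysqli->query('TRUNCATE TABLE t');    // empty the table first
$mysqli->query('LOCK TABLES t WRITE'); // hold a write lock for the whole bulk insert
foreach ($chunks as $chunk) {          // $chunk = a few hundred '(fielddef_id, product_id, value)' tuples
    $mysqli->query('INSERT INTO t (fielddef_id, product_id, value) VALUES ' . implode(',', $chunk));
}
$mysqli->query('UNLOCK TABLES');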

Related

Compare MySQL with CSV and find differences

I have a CSV file with 1 column named EAN, and a MySQL table with a column named EAN too.
This is what I want to get when comparing both columns:
CSV | MySQL | STATUS
----+-------+-----------------
123 | 123   | OK
321 | 321   | OK
444 |       | MISSING IN MySQL
    | 111   | MISSING IN CSV
Any ideas how to realize this with PHP?
One way to do it:
(Assuming you already know how to open a file and execute a query.)
First read rows from your CSV and assume the data is missing in SQL.
while (($row = fgetcsv($file)) !== FALSE) {
    $num = $row[0]; // or whatever CSV column the value you want is in
    $result[$num] = ['csv' => $num, 'sql' => '', 'status' => 'MISSING IN SQL'];
}
Then fetch rows from your query and fill the array you created from the CSV accordingly.
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $num = $row['EAN']; // or whatever your column is named
    if (isset($result[$num])) {
        // This has a value from the CSV, so update the array
        $result[$num]['sql'] = $num;
        $result[$num]['status'] = 'OK';
    } else {
        // This doesn't have a value from the CSV, so insert a new row
        $result[$num] = ['csv' => '', 'sql' => $num, 'status' => 'MISSING IN CSV'];
    }
}
You could change the order of this and process the query results first. Either order will work, just as long as you do the update/insert logic with the second data source.
You can ksort($result); if you want the merged values to be in order, then output $result however you need to.
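For instance, a minimal sketch of that output step as a plain-text table:
ksort($result); // order the merged rows by EAN
echo str_pad('CSV', 8) . str_pad('MySQL', 8) . "STATUS\n";
foreach ($result as $row) {
    echo str_pad($row['csv'], 8) . str_pad($row['sql'], 8) . $row['status'] . "\n";
}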

PHP & MySQL: While Loop

First, I am pretty new to PHP and MySQL, so I still code procedurally.
I am working on an application that takes transactions and pays out a due amount at a certain maturity date to users who have previously made a donation. I have a function knapSolveFast2 that solves the knapsack problem (finding a set of transaction amounts in the database that adds up to the due amount of a user whose maturity date is up). Currently, my demo database looks like this:
If my current date (now) = 2017-04-03 11:36:03 = CAST(NOW() AS DATETIME), my application is meant to loop through the database and fetch users whose maturity_date is >= 1 month from tran_date (i.e. WHERE maturity_date <= CAST(NOW() AS DATETIME)), then take each user found and, in a while loop, pair them for payment with other users whose tran_amt sums up to the found user's due_amount, using the knapsack function knapSolveFast2.
Question:
After finding the users whose maturity date is due for payment (2 users) with the first while loop, I am trying to run an inner while loop to pair each user with other users whose tran_amt sums up to the fetched user's due amount. The problem is that the inner while loop only runs for the first user found and not for the second user.
The code
<?php
$servername = "localhost";
$username = "root";
$password = "";
$dbname = "test";

$connect = @mysqli_connect($servername, $username, $password, $dbname);
if (mysqli_connect_errno()) {
    die("<pre><h1>Sorry, we are experiencing a little Downtime!</h1></pre>");
}

// include the match controller containing the knapSolveFast2 function
include('controller/match.php');

//UPDATE `pendingpair` SET `maturity_date` = DATE_ADD(`tran_date`, INTERVAL 1 MONTH)

// select user to be paid
$sql = "SELECT `user_id`, `due_payment` FROM `pendingpair` WHERE `maturity_date` <= CAST(NOW() AS DATETIME) ORDER BY `id` ASC";
$queryRun = mysqli_query($connect, $sql);
$num_rows = mysqli_num_rows($queryRun);

if ($num_rows > 0) {
    while ($row = mysqli_fetch_assoc($queryRun)) {
        $user_id_due = $row['user_id'];
        $user_amt_due = $row['due_payment'];
        print_r($row);

        /* Perform queries to select users to pay $user_id_due the sum of $user_amt_due, where:
           - the user to be paid, $user_id_due, is not included in the pairing logic
           - the transaction payment to be chosen has been confirmed, ph_conf = 1
           - the transaction has not yet been paired for payment, tran_paired_status = 0
           - transactions have not been flagged for fake POP (proof of payment), ph_denied_fpop = 0
        */
        $fetchQuery = "SELECT `tran_inv`, `tran_amt`, `user_id` FROM `pendingpair` WHERE `tran_amt` <= {$user_amt_due} && `user_id` != {$user_id_due} && `ph_conf`=1 && `tran_paired_status` = 0 && `ph_denied_fpop`=0 ORDER BY `id`";

        $m = array();            // Match Memo items array
        $picked_trans = array();
        $numcalls = 0;           // number of calls made to get Match
        $tran_inv = array();
        $tran_amt = array();
        $user_id = array();

        // run query and throw users that fit the criteria into an array
        if ($queryRun = mysqli_query($connect, $fetchQuery)) {
            // check if data was pulled
            if (mysqli_num_rows($queryRun) != NULL) {
                // grab data and insert it into arrays
                while ($row = mysqli_fetch_assoc($queryRun)) {
                    // populate the arrays to be used
                    $tran_amt[] = $row['tran_amt'];
                    $tran_inv[] = $row['tran_inv'];
                    $user_id[] = $row['user_id'];
                }
            }
        }

        ## Solve
        list($m4, $pickedItems) = knapSolveFast2($tran_amt, $tran_amt, sizeof($tran_amt) - 1, $user_amt_due, $m);

        # Display Result
        echo "<b><br><br>Invoice:</b><br>" . join(", ", $tran_inv) . "<br>";
        echo "<b>Tran Amt:</b><br>" . join(", ", $tran_amt) . "<br>";
        echo "<b>User_id:</b><br>" . join(", ", $user_id) . "<br>";
        echo "<b>Max Value Found:</b><br>$m4 (in $numcalls calls)<br>";
    }
}
?>
The result of the first while loop, which finds the users matching the maturity-date criteria, is:
Array
(
    [user_id] => 9
    [due_payment] => 150
)
Array
(
    [user_id] => 2
    [due_payment] => 150
)
This means 2 users are due. But when looping over these users, the match for the second user is never found; only that of the first user is.
Array
(
    [user_id] => 9
    [due_payment] => 150
)
Invoice:
1102, 9022, 9113, 9029, 9116
Tran Amt:
100, 50, 100, 50, 50
User_id:
2, 5, 8, 5, 7
Max Value Found:
150 (in 19 calls)
Please help me figure out what I am missing. Thaaaaank you :)
Your problem is that you call the variables the same thing.
If you look at:
while ($row = mysqli_fetch_assoc($queryRun)) // external loop
Inside that loop you have another:
while ($row = mysqli_fetch_assoc($queryRun)) // internal loop
So the variables you use inside the internal loop are overwriting the external loop's variables, and when it is time for the second run of your external loop, the code thinks it is done, since it is referring to the internal loop's (already exhausted) variable.
To fix this, you must rename the variables you use for the internal loop.
Note the SECOND_ prefix on both the queryRun and the row variables.
Try this:
if ($SECOND_queryRun = mysqli_query($connect, $fetchQuery)) {
    // check if data was pulled
    if (mysqli_num_rows($SECOND_queryRun) != NULL) {
        // grab data and insert it into arrays
        while ($SECOND_row = mysqli_fetch_assoc($SECOND_queryRun)) {
            // populate the arrays to be used
            $tran_amt[] = $SECOND_row['tran_amt'];
            $tran_inv[] = $SECOND_row['tran_inv'];
            $user_id[] = $SECOND_row['user_id'];
        }
    }
}

How to print out 150k record in CSV, without overloading the website

In order to avoid overloading the server, I made a loop around the query; I fetch the 150k members in chunks and store them in an array. This works fine, but when the loop has finished its job, the array has to be printed out, and this takes so long that the page ends up crashing.
$development = array(
    'testing' => false,
    'testing_loops' => 1
);

$settings = array(
    'times_looped' => 0,
    'members_at_a_time' => 2000,
    'print_settings' => true,
    'members_looped' => 0,
    'test' => 0,
);

function outputCSV($data)
{
    $outstream = fopen("php://output", 'w');
    array_walk($data, '__outputCSV', $outstream);
    fclose($outstream);
}

function __outputCSV(&$vals, $key, $filehandler)
{
    fwrite($filehandler, implode(',', $vals) . "\n");
}

function getMembers(&$settings, $ee)
{
    // SQL FROM
    $sql_from = $settings['times_looped'] * $settings['members_at_a_time'];
    // SQL LIMIT
    $sql_limit = $sql_from . ', ' . $settings['members_at_a_time'];
    $settings['test'] = $sql_limit;

    // GET MEMBERS
    // (note: `where group_id = 8 or 14` was almost certainly meant to be
    // `where group_id IN (8, 14)`; as written, the `or 14` is always truthy)
    $query = $ee->EE->db->query("SELECT m.email,
            cr.near_rest_1_id, cr.near_rest_1_distance,
            cr.near_rest_2_id, cr.near_rest_2_distance,
            cr.near_rest_3_id, cr.near_rest_3_distance
        from exp_members m
        left join exp_menucard_closest_restaurants cr
            on m.member_id = cr.member_id
        where group_id = 8 or 14 limit " . $sql_limit);

    // Check if members found
    if ($query->num_rows() == 0) {
        return $query->num_rows();
    }

    // Update number of members
    $settings['members_looped'] = $settings['members_looped'] + $query->num_rows();

    // Loop members
    foreach ($query->result_array() as $row) {
        if ($row['near_rest_1_distance'] > 1.0) {
            $near_rest_1_distance = number_format($row['near_rest_1_distance'], 2, ',', ',') . " " . 'km';
        } else {
            $near_rest_1_distance = number_format($row['near_rest_1_distance'], 3, ',', '') * 1000 . " " . 'meter';
        }
        if ($row['near_rest_2_distance'] > 1.0) {
            $near_rest_2_distance = number_format($row['near_rest_2_distance'], 2, ',', ',') . " " . 'km';
        } else {
            $near_rest_2_distance = number_format($row['near_rest_2_distance'], 3, ',', '') * 1000 . " " . 'meter';
        }
        if ($row['near_rest_3_distance'] > 1.0) {
            $near_rest_3_distance = number_format($row['near_rest_3_distance'], 2, ',', ',') . " " . 'km';
        } else {
            $near_rest_3_distance = number_format($row['near_rest_3_distance'], 3, ',', '') * 1000 . " " . 'meter';
        }

        $nearest_rest_result_array[] = array(
            'email' => $row['email'],
            'near_rest_1_id' => $row['near_rest_1_id'],
            'near_rest_1_distance' => $near_rest_1_distance,
            'near_rest_2_id' => $row['near_rest_2_id'],
            'near_rest_2_distance' => $near_rest_2_distance,
            'near_rest_3_id' => $row['near_rest_3_id'],
            'near_rest_3_distance' => $near_rest_3_distance
        );
    }

    // Loop again
    return $query->num_rows();
}

// Loop
$more_rows = true;
while ($more_rows == true || $more_rows > 0) {
    // Test
    if ($settings['times_looped'] >= $development['testing_loops'] && $development['testing'] == true) {
        break;
    }

    // get members
    $more_rows = getMembers($settings, $this);
    $settings['members_looped'] = $settings['members_looped'] + $more_rows;
    $settings['times_looped']++;

    // Got last bunch of members
    if ($settings['members_looped'] < $settings['members_at_a_time']) {
        break;
    }
}
When the loop has finished its job, it prints the whole array out:
// Write to CSV
outputCSV($nearest_rest_result_array);
Don't use a foreach loop. Use a while loop that reads a row from the database and writes it to the CSV file. This way you're operating line by line, which doesn't use as much memory.
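A minimal sketch of that approach, assuming a plain mysqli connection in $db rather than the ExpressionEngine wrapper used in the question; the unbuffered query keeps the result set on the server instead of in PHP's memory:
$out = fopen('php://output', 'w');
$result = $db->query($sql, MYSQLI_USE_RESULT); // unbuffered: rows stream from the server
while ($row = $result->fetch_assoc()) {
    fputcsv($out, $row); // quote/escape and write one line at a time
}
$result->free();
fclose($out);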
If you're working with large data sets, it's usually better to have some concept of iterators or streams, rather than trying to process the whole thing in one big operation.
The mistake starts early; use an iterator instead of the array you currently build with:
foreach($query->result_array() as $row)
PDO and mysqli allow you to iterate over the result. Create the output on the fly and stream it to the client; your web server will normally chunk it, and if not, set your PHP output buffer to 4096k or similar.
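A sketch of that iterator idea using a generator (PHP 5.5+; rows() is a hypothetical helper, and $db is again assumed to be a plain mysqli connection):
function rows(mysqli $db, $sql)
{
    $result = $db->query($sql, MYSQLI_USE_RESULT); // unbuffered
    while ($row = $result->fetch_assoc()) {
        yield $row; // hand out one row at a time
    }
    $result->free();
}

foreach (rows($db, $membersSql) as $row) {
    // format and echo one CSV line, then let the row be garbage-collected
}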
Consider implementing pagination in your webpage.
Let's take an example. Suppose your database has 10,000 rows. There may be no need for those 10,000 rows to be displayed at once. Instead we can display 100 records per page and have links to 100 such pages.
The best example is https://www.google.co.in/?gws_rd=cr&ei=-HggUuXWBMj4rQeNr4CADw#q=pagination+in+php — of 6,190,000 results they show only 11 per page.
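A bare-bones pagination sketch (all names are illustrative; assumes a PDO connection in $pdo):
$perPage = 100;
$page = isset($_GET['page']) ? max(1, (int)$_GET['page']) : 1;
$stmt = $pdo->prepare('SELECT email FROM exp_members ORDER BY member_id LIMIT :offset, :limit');
$stmt->bindValue(':offset', ($page - 1) * $perPage, PDO::PARAM_INT);
$stmt->bindValue(':limit', $perPage, PDO::PARAM_INT);
$stmt->execute();
// render these 100 rows plus previous/next links that change ?page=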

Optimizing setCellValueExplicit() in PHPExcel

I am dealing with 700 rows of data in my Excel file,
and in one column I add this entry:
foreach ($data as $k => $v) {
    $users->getCell('A'.$k)->setValue($v['Username']);
    $users->setCellValueExplicit('B'.$k,
        '=INDEX(\'Feed\'!H2:H'.$lastRow.',MATCH(A'.$k.',\'Feed\'!G2:G'.$lastRow.',0))',
        PHPExcel_Cell_DataType::TYPE_FORMULA);
}
$users stands for a spreadsheet.
I see that writing 700 cells with the above setCellValueExplicit() takes more than 2 minutes to process. If I omit that line, the same machine processes it in 4 seconds.
2 minutes can be OK, but what if I have 2000 cells? Is there any way this can be speed-optimized?
P.S. =VLOOKUP is just as slow as the above function.
Update
The whole idea of the script:
read a CSV file (13 columns and at least 100 rows), write it into a spreadsheet, create a new spreadsheet ($users), read two columns, sort them based on one column, and write the result to the $users spreadsheet.
Read the columns:
$data = array();
for ($i = 1; $i <= $lastRow; $i++) {
    $user = $Feed->getCell('G'.$i)->getValue();
    $number = $Feed->getCell('H'.$i)->getValue();
    $row = array('User' => $user, 'Number' => $number);
    array_push($data, $row);
}
Sort the data
function cmpb($a, $b)
{
    // get which number is less, or 0 if both are the same
    if ($a['Number'] > $b['Number']) {
        $cmpb = -1;
    } elseif ($a['Number'] < $b['Number']) {
        $cmpb = 1;
    } else {
        $cmpb = 0;
    }
    // if the numbers are the same, check the names
    if ($cmpb == 0) {
        // compare the names
        $cmpb = strcasecmp($a['User'], $b['User']);
    }
    return $cmpb;
}
usort($data, 'cmpb');
Write data
foreach ($data as $k => $v) {
    $users->getCell('A'.$k)->setValue($v['Username']); // note: the array built above uses the key 'User', not 'Username'
    $users->getCell("B{$k}")->setValueExplicit("=INDEX('Feed'!H2:H{$lastRow},MATCH(A{$k},'Feed'!G2:G{$lastRow},0))",
        PHPExcel_Cell_DataType::TYPE_FORMULA);
}
and also unset the data for memory:
unset($data);
So if I comment out the line with setValueExplicit, everything becomes smoother.
Looking at PHPExcel's source code, this is the PHPExcel_Worksheet::setCellValueExplicitByColumnAndRow function:
public function setCellValueExplicitByColumnAndRow($pColumn = 0, $pRow = 1, $pValue = null, $pDataType = PHPExcel_Cell_DataType::TYPE_STRING)
{
    return $this->getCell(PHPExcel_Cell::stringFromColumnIndex($pColumn) . $pRow)->setValueExplicit($pValue, $pDataType);
}
For the data type you're using, PHPExcel_Cell_DataType::TYPE_FORMULA, the PHPExcel_Cell::setValueExplicit function just executes:
case PHPExcel_Cell_DataType::TYPE_FORMULA:
    $this->_value = (string)$pValue;
    break;
I can't find a logical explanation for the holdup in the execution of that particular instruction. Try replacing it with the following and let me know if there was any improvement:
$users ->getCell("B{$k}")->setValueExplicit("=INDEX('Feed'!H2:H{$lastRow},MATCH(A{$k},'Feed'!G2:G{$lastRow},0))", PHPExcel_Cell_DataType::TYPE_FORMULA);
As a last resort, my advice would be to time-track the execution of the instruction to find the bottleneck.
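A simple way to do that with microtime(); this is only a sketch, where $formula stands for the INDEX/MATCH string built above:
$start = microtime(true);
$users->getCell("B{$k}")->setValueExplicit($formula, PHPExcel_Cell_DataType::TYPE_FORMULA);
printf("setValueExplicit took %.4f s<br>", microtime(true) - $start);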

How can I copy a database table to an array while accounting for skipped IDs?

I previously designed the website I'm working on to just query the database for the information I needed per page, but after implementing a feature that required every cell from every table on every page (oh boy), I realized that for optimization purposes I should combine it into a single large database query and throw each table into an array, thus cutting down on SQL calls.
The problem comes in when I want this array to account for skipped IDs (the primary key) in the database. I'll try to avoid having missing rows/IDs of course, but I won't be managing this data, and I want the system to be smart enough to account for any problems like this.
My method starts off simple enough:
//Run query
$localityResult = mysql_query("SELECT id,name FROM localities");
$localityMax = mysql_fetch_array(mysql_query("SELECT max(id) FROM localities"));
$localityMax = $localityMax[0];

//Assign table to array
for ($i = 1; $i < $localityMax + 1; $i++) {
    $row = mysql_fetch_assoc($localityResult);
    $localityData["id"][$i] = $row["id"];
    $localityData["name"][$i] = $row["name"];
}

//Output
for ($i = 1; $i < $localityMax + 1; $i++) {
    echo $i . ". ";
    echo $localityData["id"][$i] . " - ";
    echo $localityData["name"][$i];
    echo "<br />\n";
}
Two notes:
Yes, I should probably move that $localityMax check to a PHP loop.
I'm intentionally skipping the first array key.
The problem here is that any missed key in the database isn't accounted for, so it ends up outputting like this (sample table):
1 - Tok
2 - Juneau
3 - Anchorage
4 - Nashville
7 - Chattanooga
8 - Memphis
-
-
I want to write "Error" or NULL or something when the row isn't found, then continue on without interrupting things. I've found I can check if $i is less than $row[$i] to see if the row was skipped, but I'm not sure how to correct it at that point.
I can provide more information or a sample database dump if needed. I've just been stuck on this problem for hours and hours, nothing I've tried is working. I would really appreciate your assistance, and general feedback if I'm making any terrible mistakes. Thank you!
Edit: I've solved it! First, iterate through the array to set a NULL value or an "Error" message. Then, in the assignments, set $i to $row["id"] right after the mysql_fetch_assoc() call. The full code looks like this:
//Run query
$localityResult = mysql_query("SELECT id,name FROM localities");
$localityMax = mysql_fetch_array(mysql_query("SELECT max(id) FROM localities"));
$localityMax = $localityMax[0];

//Reset
for ($i = 1; $i < $localityMax + 1; $i++) {
    $localityData["id"][$i] = NULL;
    $localityData["name"][$i] = "Error";
}

//Assign table to array
for ($i = 1; $i < $localityMax + 1; $i++) {
    $row = mysql_fetch_assoc($localityResult);
    $i = $row["id"];
    $localityData["id"][$i] = $row["id"];
    $localityData["name"][$i] = $row["name"];
}

//Output
for ($i = 1; $i < $localityMax + 1; $i++) {
    echo $i . ". ";
    echo $localityData["id"][$i] . " - ";
    echo $localityData["name"][$i];
    echo "<br />\n";
}
Thanks for the help all!
Primary keys must be unique in MySQL, so you would get at most one blank ID, since MySQL would not allow duplicate values to be inserted.
If you were working with a column that is not a primary or unique key, your query would need to be the only thing that would change:
SELECT id, name FROM localities WHERE id != "";
or
SELECT id, name FROM localities WHERE NOT ISNULL(id);
EDIT: Created a new answer based on clarification from OP.
If you have a numeric sequence that you want to keep unbroken, and there may be missing rows from the database table, you can use the following (simple) code to give you what you need. Using the same method, your $i = ... could actually be set to the first ID in the sequence from the DB if you don't want to start at ID: 1.
$result = mysql_query('SELECT id, name FROM localities ORDER BY id');
$data = array();
while ($row = mysql_fetch_assoc($result)) {
    $data[(int) $row['id']] = array(
        'id' => $row['id'],
        'name' => $row['name'],
    );
}

// This saves a query to the database and a second for loop.
end($data);        // move the internal pointer to the end of the array
$max = key($data); // fetch the key of the item the internal pointer is set to

for ($i = 1; $i < $max + 1; $i++) {
    if (!isset($data[$i])) {
        $data[$i] = array(
            'id' => NULL,
            'name' => 'Error: Missing',
        );
    }
    echo "$i. {$data[$i]['id']} - {$data[$i]['name']}<br />\n";
}
After you've gotten your $localityResult, you could put all of the IDs in an array, then before you echo your locality data, check:
if (in_array($i, $your_locality_id_array)) {
    // do your echoing
} else {
    // echo your not-found message
}
To make $your_locality_id_array (a result resource can't be iterated with foreach in the old mysql extension, so fetch row by row):
$locality_id_array = array();
while ($locality = mysql_fetch_assoc($localityResult)) {
    $locality_id_array[] = $locality['id'];
}
