Finding matches between two tables, each with 160k+ rows - php

I have two tables, each with 160k+ rows. Some UUIDs are shared between the two. I'm using a foreach loop over the "new" table, with an embedded foreach searching the "old" table. When a UUID match is found, the "old" table is updated with data from the "new" table.
Both tables have an index on the ID.
My problem is that this operation is extremely time-intensive; does anyone know a more efficient way to search for matching UUIDs? Side note: we are using the MySQLi extension for PHP 5.3.
Example code:
$newCounter = 0; // counts rows from the "new" table
$oldCounter = 1; // counts rows from the "old" table
//loop
foreach( $accounts as $accKey => $accValue )
{
    echo( "New ID - " . $newCounter++ . ": " . $accValue['id'] . "\n" );
    foreach( $accountContactData as $acdKey => $acdValue )
    {
        echo( "Old ID - " . $oldCounter++ . ": " . $acdValue['id'] . "\n" );
        if( $accValue['id'] == $acdValue['id'] && (
            $accValue['phone_office'] == "" || $accValue['phone_office'] == NULL || $accValue['phone_office'] == 0 )
        ){
            echo( "ID match found\n" );
            //when match found update accounts with accountsContact info
            $query = '
                UPDATE `accounts`
                SET
                    `phone_fax` = "' . $acdValue['fax'] . '",
                    `phone_office` = "' . $acdValue['telephone1'] . '",
                    `phone_alternate` = "' . $acdValue['telephone2'] . '"
                WHERE
                    `id` = "' . $acdValue['id'] . '"
            ';
            echo $query . "\n\n";
            $DB->query($query);
            break;
        }
    }
}
unset($newCounter);
unset($oldCounter);
Thank you in advance.

Do this all in SQL.
There is nothing that I see in your code that requires PHP.
UPDATE allows JOIN. JOIN the new and old tables and have your WHERE conditions match those of your description. It should be pretty straightforward and significantly faster.
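For example, something like this (a sketch only; the contact-data table name `accounts_contact` is an assumption, so adjust table and column names to your schema):
UPDATE `accounts` a
INNER JOIN `accounts_contact` ac ON ac.`id` = a.`id`
SET
    a.`phone_fax` = ac.`fax`,
    a.`phone_office` = ac.`telephone1`,
    a.`phone_alternate` = ac.`telephone2`
WHERE
    a.`phone_office` = '' OR a.`phone_office` IS NULL OR a.`phone_office` = 0;
One statement like this replaces the 160k+ round trips, and MySQL can actually use the index on `id` for the join.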

I wrote a function some months ago. You can modify it as you want. It may improve the speed of your search:
public static function search($query)
{
    $result = array();
    $all = custom_query::getNumRows("bar");
    $quarter = floor(0.25 * $all) + 1;
    $all = 0;
    for ($i = 0; $i < 4; $i++)
    {
        custom_query::condition("", "limit $all, $quarter");
        $data = custom_query::FetchAll("bar");
        foreach ($data as $v)
            foreach ($v as $_v)
                // count(explode(...)) > 1 is a crude substring test:
                // true whenever $query occurs somewhere in the field value
                if (count(explode($query, $_v)) > 1)
                    $result[] = $v["bar_id"];
        $all += $quarter;
    }
    return $result;
}
It returns the IDs of the records that the search matched.
This method divides the table into 4 parts, so each iteration fetches only a quarter of it.
You can change this number to, for example, 10 or 20 for more speed.
Some of the helper methods live in the class, and you can easily write them yourself.

Related

PHP MySQL UPDATE query with array and IN statement

I have this query which works successfully in one of my PHP scripts:
$sql = "UPDATE Units SET move_end = $currentTime, map_ID = $mapID, attacking = $attackStartTime, unit_ID_affected = $enemy, updated = now() WHERE unit_ID IN ($attackingUnits);";
$attackingUnits is an imploded array of anywhere between 1 and 100 integers.
What I'd like to do is also add arrays with different values for $currentTime and $mapID which correspond with the values for $attackingUnits. Something like this:
$sql = "UPDATE Units SET move_end = " . $attackingUnits['move_end'] . ", map_ID = " . $attackingUnits['map_ID'] . ", attacking = $attackStartTime, unit_ID_affected = $enemy, updated = now() WHERE unit_ID IN ($attackingUnits);";
Obviously that won't work the way I want it to, because $attackingUnits['move_end'] and $attackingUnits['map_ID'] are just single values, not arrays, and I'm stumped as to how I can write this query. I know I can run one query for each element of $attackingUnits, but this is precisely what I'm trying to avoid, as I'd like to be able to use one UPDATE for as many elements as required.
How would I write this query?
The key parts of the PHP script are:
$attackStartTime = time(); // the time the units started attacking the enemy (i.e. the current time)
// create a proper multi-dimensional array as the client only sends a string of comma-delimited unitID values
$data = array();
// add the enemy unit ID to the start of the selectedUnits CSV and call it allUnits. we then run the same query for all units in the selectedUnits array. this avoids two separate queries for pretty much the same columns
$allUnits = $enemy . "," . $selectedUnits;
// get the current enemy and unit data from the database
$sql = "SELECT user_ID, unit_ID, type, map_ID, moving, move_end, destination, attacking, unit_ID_affected, current_health FROM Units WHERE unit_ID IN ($allUnits);";
$result = mysqli_query($conn, $sql);
// convert the CSV strings to arrays for processing in this script
$selectedUnits = explode(',', $selectedUnits);
$allUnits = explode(',', $allUnits);
while ($row = mysqli_fetch_assoc($result)) {
    $data[] = $row;
}
$result->close();
$increment = 0; // set an increment value outside of the foreach loop so that we can use the pointer value at each loop
// check each selected unit to see if it can validly attack the enemy unit, otherwise remove them from selected units and send an error back for that specific unit
foreach ($data as &$unit) {
    // do a whole bunch of checking stuff here
}
// convert the attacking units (i.e. the unit ids from selected units which have passed the attacking tests) to a CSV string for processing on the database
$attackingUnits = implode(',', $selectedUnits);
// update each attacking unit with the start time of the attack and the unit id we are attacking, as well as any change in movement data
// HERE IS MY PROBLEMATIC QUERY
$sql = "UPDATE Units SET moving = " . $unit['moving'] . ", move_end = " . $unit['move_end'] . ", map_ID = " . $unit['map_ID'] . ", attacking = $attackStartTime, unit_ID_affected = $enemy, updated = now() WHERE unit_ID IN ($attackingUnits);";
$result = mysqli_query($conn, $sql);
// send back the full data array - should only be used for testing and not in production!
echo json_encode($data);
mysqli_close($conn);
OK, after some more web research I found a link that helped me out:
https://stuporglue.org/update-multiple-rows-at-once-with-different-values-in-mysql/
I updated his code to mysqli and after a lot of testing, it works! I can now successfully UPDATE hundreds of rows with one query, rather than sending hundreds of small updates via PHP. Here are the relevant parts of my code for anyone who's interested:
$updateValues = array(); // the array we are going to build out of all the unit values that need to be updated
// build up the query string; in the full script this assignment runs once per
// unit, so it is shown here inside a loop (the loop variable is assumed)
foreach ($data as $unit) {
    $updateValues[$unit["unit_ID"]] = array(
        "moving" => $unit["new_start_time"],
        "move_end" => $unit["new_end_time"],
        "map_ID" => "`destination`",
        "destination" => $unit["new_destination"],
        "attacking" => 0,
        "unit_ID_affected" => 0,
        "updated" => "now()"
    );
}
// start of the query
$updateQuery = "UPDATE Units SET ";
// columns we will be updating
$columns = array("moving" => "`moving` = CASE ",
"move_end" => "`move_end` = CASE ",
"map_ID" => "`map_ID` = CASE ",
"destination" => "`destination` = CASE ",
"attacking" => "`attacking` = CASE ",
"unit_ID_affected" => "`unit_ID_affected` = CASE ",
"updated" => "`updated` = CASE ");
// build up each column's CASE statement
foreach ($updateValues as $id => $values) {
    $columns['moving'] .= "WHEN `unit_ID` = " . mysqli_real_escape_string($conn, $id) . " THEN " . mysqli_real_escape_string($conn, $values['moving']) . " ";
    $columns['move_end'] .= "WHEN `unit_ID` = " . mysqli_real_escape_string($conn, $id) . " THEN " . mysqli_real_escape_string($conn, $values['move_end']) . " ";
    $columns['map_ID'] .= "WHEN `unit_ID` = " . mysqli_real_escape_string($conn, $id) . " THEN " . mysqli_real_escape_string($conn, $values['map_ID']) . " ";
    $columns['destination'] .= "WHEN `unit_ID` = " . mysqli_real_escape_string($conn, $id) . " THEN " . mysqli_real_escape_string($conn, $values['destination']) . " ";
    $columns['attacking'] .= "WHEN `unit_ID` = " . mysqli_real_escape_string($conn, $id) . " THEN " . mysqli_real_escape_string($conn, $values['attacking']) . " ";
    $columns['unit_ID_affected'] .= "WHEN `unit_ID` = " . mysqli_real_escape_string($conn, $id) . " THEN " . mysqli_real_escape_string($conn, $values['unit_ID_affected']) . " ";
    $columns['updated'] .= "WHEN `unit_ID` = " . mysqli_real_escape_string($conn, $id) . " THEN " . mysqli_real_escape_string($conn, $values['updated']) . " ";
}
// add a default case, here we are going to use whatever value was already in the field
foreach ($columns as $columnName => $queryPart) {
    $columns[$columnName] .= " ELSE `$columnName` END ";
}
// build the WHERE part. since we keyed our updateValues off the database keys, this is pretty easy
// $where = " WHERE `unit_ID` = '" . implode("' OR `unit_ID` = '", array_keys($updateValues)) . "'";
$where = " WHERE unit_ID IN ($unitIDs);";
// join the statements with commas, then run the query
$updateQuery .= implode(', ', $columns) . $where;
$result = mysqli_query($conn, $updateQuery);
This will significantly reduce the load on my database as these events can happen every second (think of hundreds of players at once, attacking hundreds of enemy units with hundreds of their own units). I hope this helps someone out.

Best way to organize query results into arrays?

I have the following code in one of my pages. Prior to this, I execute a query that returns multiple rows keyed off of alias_code. This code creates an array of string arrays to be echoed into a javascript function for populating points on a graph. I've profiled this multiple times, but I still have the feeling that there's a more efficient way to do this. I do realize that I run the risk of running out of memory if my strings are too big, but I'll constrain this in my query, since I'd like to avoid an additional sub-array or the use of implode/join. Does anyone have any thoughts on speeding this up?
$detailArray = array();
$prevAliasCode = '';
$valuesStr = '';
while ($detailRow = mysqli_fetch_array($detailResult)) {
    $aliasCode = $detailRow['alias_code'];
    if ($aliasCode <> $prevAliasCode) {
        if ($valuesStr <> '')
            $detailArray[$prevAliasCode] = $valuesStr;
        $valuesStr = '';
    }
    if ($valuesStr <> '')
        $valuesStr = $valuesStr . ', ';
    $valuesStr = $valuesStr .
        "['" .
        $detailRow['as_of_date'] . "', " .
        $detailRow['difficulty'] . ", " .
        $detailRow['price_usd'] . "]";
    $prevAliasCode = $aliasCode;
}
$detailArray[$prevAliasCode] = $valuesStr;

parsing a very big file into mysql

I have a task where I need to parse an extremely big file and write the results into a MySQL database. "Extremely big" means we are talking about 1.4GB of sort-of-CSV data, totalling approx 10 million lines of text.
The question is not "HOW" to do it, but how to do it FAST. My first approach was to just do it in PHP without any speed optimization and then let it run for a few days until it's done. Unfortunately, it's been running for 48 hours straight right now and has processed only 2% of the total file. Therefore, that's not an option.
the file format is as follows:
A:1,2
where the number of comma-separated values following the ":" can be 0-1000. The example dataset has to go into a table as follows:
| A | 1 |
| A | 2 |
So right now, I did it like this:
$fh = fopen("file.txt", "r");
$line = ""; // buffer for the data
$i = 0; // line counter
$start = time(); // benchmark
while($line = fgets($fh))
{
$i++;
echo "line " . $i . ": ";
//echo $i . ": " . $line . "<br>\n";
$line = explode(":", $line);
if(count($line) != 2 || !is_numeric(trim($line[0])))
{
echo "error: source id [" . trim($line[0]) . "]<br>\n";
continue;
}
$targets = explode(",", $line[1]);
echo "node " . $line[0] . " has " . count($targets) . " links<br>\n";
// insert links in link table
foreach($targets as $target)
{
if(!is_numeric(trim($target)))
{
echo "line " . $i . " has malformed target [" . trim($target) . "]<br>\n";
continue;
}
$sql = "INSERT INTO link (source_id, target_id) VALUES ('" . trim($line[0]) . "', '" . trim($target) . "')";
mysql_query($sql) or die("insert failed for SQL: ". mysql_error());
}
}
echo "<br>\n--<br>\n<br>\nseconds wasted: " . (time() - $start);
This is obviously not optimized for speed in ANY way. Any hints for a fresh start? Should I switch to another language?
The first optimization would be to insert within a transaction: commit every 100 or 1000 lines and begin a new transaction. Obviously you'd have to use a storage engine that supports transactions.
Then observe the CPU usage with the top command. If you have multiple cores and the mysql process does not do much while the PHP process does most of the work, rewrite the script to accept a parameter that skips n lines from the beginning and only imports 10000 lines or so. Then start multiple instances of the script, each with a different starting point.
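A minimal sketch of that chunking idea (the file name and argument handling are illustrative only):
// usage: php import.php <skip> <count>; run several in parallel, e.g.
// php import.php 0 2500000 & php import.php 2500000 2500000 & ...
list(, $offset, $count) = $argv + array(null, 0, 10000);
$fh = fopen("file.txt", "r");
for ($i = 0; $i < $offset && fgets($fh); $i++); // skip the first $offset lines
for ($i = 0; $i < $count && ($line = fgets($fh)); $i++)
{
    // ... parse the line and INSERT exactly as in the original loop ...
}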
The third solution would be to convert the file into a CSV with PHP (no INSERTs at all, just writing to a file) and then use LOAD DATA INFILE, as m4t1t0 suggested.
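And a sketch of the CSV conversion for that third option (paths, table and column names are assumptions):
$in = fopen("file.txt", "r");
$out = fopen("/tmp/links.csv", "w");
while ($line = fgets($in))
{
    $parts = explode(":", $line, 2);
    if (count($parts) != 2)
        continue;
    foreach (explode(",", trim($parts[1])) as $target)
        fputcsv($out, array(trim($parts[0]), trim($target))); // one "source,target" row per link
}
fclose($in);
fclose($out);
// then load everything in one statement (the file must be readable by the MySQL server):
// LOAD DATA INFILE '/tmp/links.csv' INTO TABLE link
//     FIELDS TERMINATED BY ',' (source_id, target_id);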
As promised, attached you'll find the solution I went for in this post. I benchmarked it, and it turned out that it is 40 times (!) faster than the old one :)
Sure, there's still much room for optimization, but it's fast enough for me right now :)
$db = mysqli_connect(/*...*/) or die("could not connect to database");
$fh = fopen("data", "r");
$line = ""; // buffer for the data
$i = 0; // line counter
$start = time(); // benchmark timer
$node_ids = array(); // all (source) node ids
mysqli_autocommit($db, false);
while($line = fgets($fh))
{
    $i++;
    echo "line " . $i . ": ";
    $line = explode(":", $line);
    $line[0] = trim($line[0]);
    if(count($line) != 2 || !is_numeric($line[0]))
    {
        echo "error: source node id [" . $line[0] . "] - skipping...\n";
        continue;
    }
    else
    {
        $node_ids[] = $line[0];
    }
    $targets = explode(",", $line[1]);
    echo "node " . $line[0] . " has " . count($targets) . " links\n";
    // insert links in link table
    foreach($targets as $target)
    {
        $target = trim($target); // trim first: the last target on each line still carries the newline
        if(!is_numeric($target))
        {
            echo "line " . $i . " has malformed target [" . $target . "]\n";
            continue;
        }
        $sql = "INSERT INTO link (source_id, target_id) VALUES ('" . $line[0] . "', '" . $target . "')";
        mysqli_query($db, $sql) or die("insert failed for SQL: " . mysqli_error($db));
    }
    if($i % 1000 == 0)
    {
        $node_ids = array_unique($node_ids);
        foreach($node_ids as $node)
        {
            $sql = "INSERT INTO node (node_id) VALUES ('" . $node . "')";
            mysqli_query($db, $sql);
        }
        $node_ids = array();
        mysqli_commit($db);
        echo "committed to database\n\n";
    }
}
echo "<br>\n--<br>\n<br>\nseconds wasted: " . (time() - $start);
I find your description rather confusing - and it doesn't match up with the code you've provided.
if(count($line) != 2 || !is_numeric(trim($line[0])))
the trim here is redundant - whitespace doesn't change the behaviour of is_numeric. But you've said elsewhere that the start of the line is a letter - therefore this will always fail.
If you want to speed it up, then switch to using stream processing of the input rather than message processing (PHP arrays can be very slow), or use a different language and aggregate the insert statements into multi-line inserts.
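To illustrate the multi-line inserts, a sketch that collects the value tuples from the inner loop and flushes them in batches (the batch size is arbitrary, and with batching across lines the final flush belongs after the outer read loop):
$values = array();
foreach ($targets as $target)
{
    $target = trim($target);
    if (!is_numeric($target))
        continue;
    $values[] = "('" . trim($line[0]) . "', '" . $target . "')";
    if (count($values) >= 1000) // flush 1000 rows as one statement
    {
        mysql_query("INSERT INTO link (source_id, target_id) VALUES " . implode(",", $values)) or die(mysql_error());
        $values = array();
    }
}
if ($values) // flush the remainder
    mysql_query("INSERT INTO link (source_id, target_id) VALUES " . implode(",", $values)) or die(mysql_error());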
I would first just use the script to create an SQL file. Then lock the table using this http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html by placing the appropriate commands at the start/end of the SQL file (you could get your script to do this).
Then just use the command-line tool to inject the SQL into the database (preferably on the machine where the database resides).
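The generated SQL file might look something like this (a sketch; see the linked LOCK TABLES page):
LOCK TABLES link WRITE;
INSERT INTO link (source_id, target_id) VALUES (1,2),(1,3);
-- ... the rest of the generated INSERT statements ...
UNLOCK TABLES;
It can then be injected with the stock client, e.g. mysql -u user -p dbname < links.sql.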

MySQL/PHP mysql_fetch_array() keeps missing first row

Good eve everyone!
For some reason Database::fetchArray() is skipping the first $row of the query result set.
It prints all rows properly, it just keeps missing out the first one for some reason; I assume there's something wrong with my fetchArray() function?
I ran the query in phpMyAdmin and it returned 4 rows; when I tried it on my localhost with the PHP file (code below), it only printed 3 rows, using the same 'WHERE tunes.riddim' value of course. Most similar topics on Google show that a common mistake is to call mysql_fetch_array() before the while(), which advances the pointer and causes the first row to be missed; unfortunately I only have one mysql_fetch_array() call (the one within the while() head).
<?php
$db->query("SELECT " .
"riddims.riddim AS riddim, " .
"riddims.image AS image, " .
"riddims.genre AS genre, " .
"tunes.label AS label, " .
"tunes.artist AS artist, " .
"tunes.tune AS tune, " .
"tunes.year AS year," .
"tunes.producer AS producer " .
"FROM tunes " .
"INNER JOIN riddims ON tunes.riddim = riddims.riddim " .
"WHERE tunes.riddim = '" . mysql_real_escape_string(String::plus2ws($_GET['riddim'])) . "'" .
"ORDER BY tunes.year ASC");
$ar = $db->fetchArray();
for($i = 0; $i < count($ar) - 1; $i++)
{
    echo $ar[$i]['riddim'] . " - " . $ar[$i]['artist'] . " - " . $ar[$i]['tune'] . " - " . $ar[$i]['label'] . " - " . $ar[$i]['year'] . "<br>";
}
?>
Database::fetchArray() looks like:
public function fetchArray()
{
    $ar = array();
    while(($row = mysql_fetch_array($this->result)) != NULL)
        $ar[] = $row;
    return $ar;
}
Any suggestions appreciated!
You should remove the -1 from the for loop.
The problem's in your for loop:
for($i = 0; $i < count($ar) - 1; $i++)
if count($ar) is 1, because there's one entry, the loop body will never run; try tweaking the check part:
for($i = 0; $i < count($ar) ; $i++)
You can also use a simple foreach:
foreach($db->fetchArray() as $row)
{
    echo $row['riddim']; # ...
}
It'll make your code more readable too.

Problems using GROUP BY in MySQL in a query that does a JOIN on two tables

I have two MySQL tables, $database1 and $database2. Both have a field in them called ID. I am passing the name of a town to the file using GET (i.e. it's in the URL of the PHP file that holds this code).
I can run this query...
$PlaceName = $_GET['townName'];
$PlaceName = mysql_real_escape_string($PlaceName);
$sql="SELECT * from $database1 LEFT JOIN $database2 on $database1.ID = $database2.ID WHERE PlaceName='$PlaceName'";
$query = mysql_query($sql);
echo '<h1>People who are searching for '.$PlaceName.':</h1>';
echo '<ul>';
while ($row = mysql_fetch_array($query)) {
echo "<li>ID #",$row['ID'],": ",$row['MemberPersonalName']," ",$row['MemberSurname']," -- searching for ",$row['SurnameBeingSearched'],"</li>";
}
echo '</ul>';
...and it works and all is well. Right now the output looks like this...
People who are searching for Hogwarts:
ID #137: Hermione Granger -- searching for Stern
ID #137: Hermione Granger -- searching for Engelberg
ID #503: Harry Potter -- searching for Kreindler
ID #549: Ron Weasley -- searching for Kreindler
ID #1062: Draco Malfoy -- searching for Engelberg
ID #1155: Ginny Weasley -- searching for Kreindler
ID #1155: Ginny Weasley -- searching for Streisand
But the output needs tweaking, and I'm having trouble writing my SQL query statement to reflect the changes. What I really want is for the output to look like this...
People who are searching for Hogwarts:
Engelberg is being searched by Hermione Granger (id #137) and Draco Malfoy (id #1062)
Kreindler is being searched by Harry Potter (id #503), Ron Weasley (id #549), and Ginny Weasley (id #1155)
Stern is being searched by Hermione Granger (id #137)
Streisand is being searched by Ginny Weasley (id #1155)
In other words, I need to group the output together by the field 'SurnameBeingSearched', I need to list the names of the people doing the searching in an "X, Y, and Z" output format (where it knows where to add a comma, if necessary, depending on the number of results), and I need to order the results by the 'SurnameBeingSearched' field.
Help? Thanks!
You need to list the names so this isn't an aggregation (in the SQL sense) problem. Keep your current query. You're going to have to do the grouping in code.
So something like:
$rows = array();
while ($row = mysql_fetch_array($query)) {
    $surname = $row['SurnameBeingSearched'];
    $id = $row['ID'];
    $name = $row['MemberPersonalName'];
    if (!isset($rows[$surname])) {
        $rows[$surname] = array(); // key each group by surname so it is still available when displaying
    }
    $rows[$surname][$id] = $name;
}
foreach ($rows as $surname => $group) {
    // now display each group of names
}
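For the display loop, a minimal sketch of the "X, Y, and Z" formatting the question asks for, assuming the $rows structure built above:
foreach ($rows as $surname => $group) {
    $names = array();
    foreach ($group as $id => $name) {
        $names[] = $name . " (id #" . $id . ")"; // e.g. "Hermione Granger (id #137)"
    }
    if (count($names) > 2) {
        $lastOne = array_pop($names);
        $list = implode(', ', $names) . ', and ' . $lastOne; // "X, Y, and Z"
    } elseif (count($names) == 2) {
        $list = $names[0] . ' and ' . $names[1];
    } else {
        $list = $names[0];
    }
    echo "<li>" . $surname . " is being searched by " . $list . "</li>";
}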
You might also be able to use the MySQL GROUP_CONCAT() function.
It would look something like this...
SELECT places_tbl.name, GROUP_CONCAT(people_tbl.name)
FROM places_tbl
LEFT JOIN people_tbl ON (places_tbl.id = people_tbl.id)
GROUP BY places_tbl.id
GROUP_CONCAT() by default returns the values comma-delimited. You can probably split them up to get the formatting you need, or use the SEPARATOR keyword: GROUP_CONCAT(fieldname SEPARATOR '-').
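Applied to the columns from the question, it might look something like this (an untested sketch; the table placeholders and the column-to-table mapping are taken from the question's query):
SELECT SurnameBeingSearched,
       GROUP_CONCAT(CONCAT(MemberPersonalName, ' ', MemberSurname, ' (id #', $database1.ID, ')')
                    ORDER BY MemberSurname SEPARATOR ', ') AS searchers
FROM $database1
LEFT JOIN $database2 ON $database1.ID = $database2.ID
WHERE PlaceName = '$PlaceName'
GROUP BY SurnameBeingSearched
ORDER BY SurnameBeingSearched
The "X, Y, and Z" wording (the final "and") would still need a touch-up in PHP.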
$PlaceName = $_GET['townName'];
$PlaceName = mysql_real_escape_string($PlaceName);
// note - added ORDER BY to the query
$sql = "SELECT * from $database1 LEFT JOIN $database2 on $database1.ID = $database2.ID WHERE PlaceName='$PlaceName'
        ORDER BY SurnameBeingSearched, MemberSurname, MemberPersonalName";
$query = mysql_query($sql);
echo '<h1>People who are searching for '.$PlaceName.':</h1>';
echo '<ul>';
$cntr = mysql_num_rows($query);
if ($cntr > 0) {
    $i = 0;
    $srchd = mysql_result($query, $i, 'SurnameBeingSearched');
    $mbr = mysql_result($query, $i, 'MemberPersonalName');
    $mbr = $mbr . " " . mysql_result($query, $i, 'MemberSurname');
    $mbr = $mbr . " (id #" . mysql_result($query, $i, 'ID') . ")";
    $lin = $srchd . " is being searched by " . $mbr;
    $prev = $srchd;
    if ($cntr == 1) {
        echo "<li>" . $lin . "</li>";
    } else {
        for ($i = 1; $i < $cntr; $i++) {
            $srchd = mysql_result($query, $i, 'SurnameBeingSearched');
            $mbr = mysql_result($query, $i, 'MemberPersonalName');
            $mbr = $mbr . " " . mysql_result($query, $i, 'MemberSurname');
            $mbr = $mbr . " (id #" . mysql_result($query, $i, 'ID') . ")";
            if ($srchd == $prev) { // common search
                $j = $i + 1;
                if ($j < $cntr) { // still have data
                    $nxt = mysql_result($query, $j, 'SurnameBeingSearched');
                    if ($prev == $nxt) { // another one coming -- use the comma
                        $lin = $lin . ", " . $mbr;
                    } else {
                        $lin = $lin . ", and " . $mbr; // last member, add the 'and' - line is done
                        echo "<li>" . $lin . "</li>";
                    }
                    $prev = $srchd;
                } else { // ran out of data - need to finish the line
                    $lin = $lin . ", and " . $mbr; // last member, add the 'and' - line is done
                    echo "<li>" . $lin . "</li>";
                }
            } else { // new search - need to print this line and start a new one
                echo "<li>" . $lin . "</li>";
                $lin = $srchd . " is being searched by " . $mbr;
                $prev = $srchd;
            } // test searched = previous
        } // next i
    } // only one row
} // cntr > 0
echo '</ul>';
/* note: this is not tested
I would recommend using table1 and table2 instead of database1 and database2,
or better, giving the tables meaningful names.
I would also use active voice instead of passive voice in the output text.
*/
