I have a strange situation.
Suppose I have a very simple function in PHP (I used Yii, but the problem is general) which is called inside a transaction:
public function checkAndInsert($someKey)
{
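$data = MyModel::model()->findByAttributes(array('someKey' => $someKey)); // search for a record in the DB; if it does not exist, insert one
if ($data === null)
{
$data = new MyModel(); // $data is null here, so create the new record first
$data->someKey = $someKey;
$data->someCol = 'newOne';
$data->save();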
}
else
{
$data->someCol = 'test';
$data->save();
}
}
...
// $db is the instance variable used for operation on the DB
$transaction = $db->beginTransaction();
$this->checkAndInsert($someKey);
$transaction->commit();
That said, if I run the script containing this function in several processes at once, I get duplicate rows in the DB. For example, with $someKey='pippo' and 2 processes, I end up with two (or more) records with column "someCol" = "newOne". This happens randomly, not always.
Is the code wrong? Should I add a constraint to the DB in the form of a KEY?
I also read this post about adding UNIQUE indexes to TokuDB, which says that a UNIQUE KEY "kills" write performance...
The approach you have is wrong, because you delegate the authority for the integrity/uniqueness check to PHP, when it's the database that's responsible for it.
In other words, you shouldn't check whether something exists and then insert. There is always a gap between PHP's SELECT and its INSERT, and, as you already saw, another process can insert the same row in that gap, so your check gives false results. Wrapping both statements in a transaction doesn't close the gap: in InnoDB's default mode a plain SELECT takes no locks, so two processes can both see "no row" and both insert.
If you need unique values for a certain column or combination of columns, add a UNIQUE constraint. After that, you simply insert. If the record already exists, the insert fails and you can deal with it by catching the exception. Not only is this faster, it's also easier for you, because your code can become a one-liner that is much easier to maintain and understand.
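A minimal sketch of that insert-first approach for the question's checkAndInsert(), written with plain PDO rather than Yii. The table my_table and its columns are hypothetical stand-ins for MyModel, and the code assumes a UNIQUE index on someKey and PDO::ERRMODE_EXCEPTION. It tries the INSERT and, only if the database reports an integrity violation (SQLSTATE 23000), updates the existing row. On MySQL the same logic fits in a single statement: INSERT INTO my_table (someKey, someCol) VALUES (?, 'newOne') ON DUPLICATE KEY UPDATE someCol = 'test'.

```php
<?php
// Sketch: let a UNIQUE index enforce uniqueness instead of SELECT-then-INSERT.
// Hypothetical table mirroring the question:
//   CREATE TABLE my_table (someKey VARCHAR(64) NOT NULL UNIQUE, someCol VARCHAR(64))
// Assumes $pdo uses PDO::ERRMODE_EXCEPTION.

function checkAndInsert(PDO $pdo, string $someKey): void
{
    try {
        // Insert first: if two processes race, the UNIQUE index lets only one row in.
        $insert = $pdo->prepare('INSERT INTO my_table (someKey, someCol) VALUES (?, ?)');
        $insert->execute([$someKey, 'newOne']);
    } catch (PDOException $e) {
        // SQLSTATE 23000 = integrity constraint violation; MySQL reports a
        // duplicate key as 23000 with driver error 1062.
        if ((string) $e->getCode() !== '23000') {
            throw $e; // a different error: don't swallow it
        }
        // The key already exists, so update the existing row instead.
        $update = $pdo->prepare('UPDATE my_table SET someCol = ? WHERE someKey = ?');
        $update->execute(['test', $someKey]);
    }
}
```

Running it twice for 'pippo' leaves exactly one row, with someCol = 'test', no matter how the two calls interleave.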
I have a function which checks whether a value exists, in this case an API key. Before creating a new API key for each account registration, I want to loop, generating a new key whenever the current one already exists in the database. The key is a simple string generated using:
$apiKey = bin2hex(random_bytes(16));
My function:
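function apiCheckKey($apiKey) {
global $conn;
// Use a prepared statement instead of concatenating the key into the SQL
$stmt = mysqli_prepare($conn, "SELECT 1 FROM `api` WHERE `key` = ? LIMIT 1");
mysqli_stmt_bind_param($stmt, 's', $apiKey);
mysqli_stmt_execute($stmt);
mysqli_stmt_store_result($stmt);
$exists = mysqli_stmt_num_rows($stmt) > 0; // true if the key is already taken
mysqli_stmt_close($stmt);
return $exists;
}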
My check:
if (apiCheckKey($apiKey)) {
// key already exists: loop and generate a new one
}
How can I run the loop efficiently to generate a new key and eliminate duplicates? Please keep in mind that the database will contain 100 000+ records...
There are a few things to keep in mind when doing this:
Ensure you have a UNIQUE constraint on this column so it's impossible to add duplicate values. You can't rely on a SELECT COUNT(*) FROM x WHERE key=? test before inserting, as that's vulnerable to race conditions.
Generate an API key that's sufficiently random that collisions are unlikely. A random string of 20 or more characters drawn from upper- and lower-case letters plus digits (62 symbols) has at least 62^20 = 704,423,425,546,998,022,968,330,264,616,370,176 possible values, so a collision is astronomically unlikely. With shorter keys, collisions become a lot more probable because of effects like the pigeonhole principle and the birthday paradox.
In the unlikely event a collision does occur, make your code generate a new key and retry the insert. A UNIQUE constraint violation is a very specific MySQL error (code 1062, ER_DUP_ENTRY) that you can handle: check the error code if/when the INSERT fails and act accordingly.
Test your code by generating a few million keys to be sure it's operating properly.
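A sketch of that insert-and-retry loop with PDO, assuming a UNIQUE index on api.`key` and PDO::ERRMODE_EXCEPTION; insertNewApiKey() and collisionProbability() are hypothetical helpers. Instead of SELECTing first, it inserts a fresh bin2hex(random_bytes(16)) key and only generates another one if the database rejects it as a duplicate. The question's keys carry 128 random bits (2^128 ≈ 3.4 × 10^38 values), and the birthday approximation p ≈ 1 - exp(-n(n-1)/(2N)) puts the chance of any collision among 100,000 such keys at roughly 1.5 × 10^-29.

```php
<?php
// Sketch: insert a random API key, retrying only when the UNIQUE index reports a duplicate.
// Assumes table `api` has a UNIQUE index on `key` and $pdo uses PDO::ERRMODE_EXCEPTION.
function insertNewApiKey(PDO $pdo, int $maxAttempts = 5): string
{
    $stmt = $pdo->prepare('INSERT INTO api (`key`) VALUES (?)');
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $apiKey = bin2hex(random_bytes(16)); // 128 random bits as 32 hex characters
        try {
            $stmt->execute([$apiKey]);
            return $apiKey; // the insert succeeded, so the key is unique
        } catch (PDOException $e) {
            if ((string) $e->getCode() !== '23000') {
                throw $e; // not a constraint violation: re-throw
            }
            // Duplicate key (astronomically unlikely): loop and try a fresh key.
        }
    }
    throw new RuntimeException("Could not insert a unique API key after $maxAttempts attempts");
}

// Birthday-bound estimate of the chance that any two of $n random keys collide
// in a space of $space equally likely values: p ≈ 1 - exp(-n(n-1) / (2 * space)).
function collisionProbability(float $n, float $space): float
{
    return -expm1(-$n * ($n - 1) / (2 * $space)); // expm1 keeps precision for tiny values
}
```

For example, collisionProbability(100000, 2.0 ** 128) is on the order of 10^-29, which is why the retry branch should essentially never run; it exists so the code stays correct if it ever does.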
// each element of $rows is one row: [$name, $age]
$connection->createCommand()->batchInsert('user', ['name', 'age'], $rows)->execute();
I know that I can do a batch insert in Yii 2 using the code above. But how can I prevent duplicate entries with batchInsert? For example, if I have a duplicate name, I don't want to insert it into the DB.
Two possible options:
1) Use a unique constraint on the name column in your database table.
That way you just try to execute the query and catch the exception:
try {
// Your query goes here
} catch (\yii\db\Exception $e) {
// Handle error
}
\yii\db\Exception is the general exception for database operations; you can catch the more specific \yii\db\IntegrityException instead. Note that with InnoDB, if one row of the batch violates the constraint, the whole INSERT statement is rolled back, so none of the rows are inserted.
2) Exclude duplicates from the array in PHP before passing the data and executing the query. Depending on how the array is constructed, you can do it:
while building the array, for example in a foreach loop: check whether an item with the same name already exists in the array being built; if not, append the element, otherwise skip it;
or afterwards, for example with the array_filter function.
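A sketch of option 2, deduplicating while building the array. uniqueByName() is a hypothetical helper, and rows are assumed to be [name, age] in the same order as the batchInsert() columns. It keeps the first row for each name and stores the names as array keys, so each check is a constant-time isset() rather than a scan of the array built so far.

```php
<?php
// Sketch: drop rows whose name was already seen, keeping the first occurrence.
// Each row is [name, age], matching the column order passed to batchInsert().
function uniqueByName(array $rows): array
{
    $seen = [];   // names already kept, stored as keys for O(1) isset() lookups
    $unique = [];
    foreach ($rows as $row) {
        $name = $row[0];
        if (isset($seen[$name])) {
            continue; // duplicate name: skip this row
        }
        $seen[$name] = true;
        $unique[] = $row;
    }
    return $unique;
}
```

Assuming $rows holds rows of that shape, the result can be passed straight to batchInsert('user', ['name', 'age'], uniqueByName($rows)). This only removes duplicates within one batch; names already stored in the table still need the unique constraint from option 1.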
I recommend the first approach, because even if you decide to handle duplicates in PHP, a matching data structure with additional protection in the database (the unique index) won't be superfluous.
Someone in our group retired, and I'm trying to figure out what his MERGE statement (and the associated code) does, so I can determine how to convert some (not all) values to integers before sending them up. I am an absolute newbie with Microsoft SQL and took a class in PHP a few years ago, but don't have much experience. I've tried googling the MERGE command, but I'm having trouble with a couple of parts of it. My questions are in the comments in the code below.
I've looked at:
http://php.net/manual/en/pdo.query.php
http://stackoverflow.com/questions/4336573/merge-to-target-columns-using-source-rows
http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/index.jsp?topic=%2Fsqlp%2Frbafymerge.htm
I realize these are basic questions, but I'm trying to figure it out and nobody around here knows.
function storeData ($form)
{
global $ms_conn, $QEDnamespace;
//I'm not sure what this is doing?? I thought this was where it was sending data up??
$qry = "MERGE INTO visEData AS Target
USING (VALUES (?,?,?,?,?,?,?,?,?,?))
AS Source (TestGUID,pqID, TestUnitID, TestUnitCountID,
ColorID, MeasurementID, ParameterValue,
Comments, EvaluatorID, EvaluationDate)
ON Target.pqID = Source.pqID
AND Target.MeasurementID=Source.MeasurementID --what is this doing?
AND Target.ColorID=Source.ColorID --what is target and source?
WHEN MATCHED THEN
UPDATE SET ParameterValue = Source.ParameterValue,
EvaluatorID = Source.EvaluatorID, --where are EvaluatorID and Source? In my table or the table we're sending it to?
EvaluationDate = Source.EvaluationDate,
Comments = Source.Comments
WHEN NOT MATCHED BY TARGET THEN
INSERT (TestGUID,
pqID, TestUnitID, TestUnitCountID,
ColorID, MeasurementID,
ParameterValue, Comments,
EvaluatorID, EvaluationDate, TestIndex, TestNumber)
VALUES (Source.TestGUID, Source.pqID,
Source.TestUnitID,
Source.TestUnitCountID,
Source.ColorID, Source.MeasurementID, Source.ParameterValue,
Source.Comments, Source.EvaluatorID, Source.EvaluationDate,?,?);";
$pqID = coverSheetData($form);
$tid = getBaseTest($form['TextField6']);
$testGUID = getTestGUID($tid);
$testIndex = getTestIndex ($testGUID);
foreach ($form['visE']['parameters'] as $parameter=>$element)
{
foreach ($element as $key=>$data)
{
if ( mb_ereg_match('.+evaluation', $key) === true )
{
$testUnitData = getTestUnitData ($form, $key, $tid, $testGUID);
try
{
//I'm not sure if this is where it's sent up??
//Maybe I could add the integer conversion here??
$ms_conn->query ($qry, array(
$testGUID, $pqID,
$testUnitData[0], $testUnitData[1], $testUnitData[2], $element['parameterID'], $data, $element['comments'], $QEDnamespace->userid, date ('Y-m-d'), $testIndex, $tid));
}
catch (Zend_Db_Statement_Sqlsrv_Exception $e)
{
dataLog($e->getMessage());
returnStatus ("Failed at: " . $key);
}
}
}
}
}
This is a bit long for a comment. If you are using SQL Server, look at the SQL Server documentation on MERGE. All of the SQL Server documentation is online, and it is very easy to find via Google (and perhaps even easier using Bing).
The purpose of the MERGE command is to do both inserts and updates in one step. Basically, you have a table with new data (the "source") and a table to be updated (the "target"). When a source record matches a target record on the ON condition, the existing target record is updated with the values from the source record. When a source record doesn't match any target record, it is inserted into the target.
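To make Target and Source concrete, here is a rough PHP model of this particular MERGE, a sketch only: arrays stand in for tables, and mergeRow() is a hypothetical name. Target is visEData, the table being written to. Source is not a table in your database but a single row built from the first ten ? placeholders, filled in order from the array passed to $ms_conn->query(): for example $element['parameterID'] becomes Source.MeasurementID, $data becomes Source.ParameterValue and $QEDnamespace->userid becomes Source.EvaluatorID. The last two placeholders ($testIndex and $tid) supply TestIndex and TestNumber, which are only used when a new row is inserted.

```php
<?php
// Rough PHP model of this MERGE (sketch only): arrays stand in for tables.
// $target = rows of visEData, $source = the row built from the first ten ? values,
// $insertOnly = TestIndex/TestNumber from the last two ? values.
function mergeRow(array $target, array $source, array $insertOnly): array
{
    $matched = false;
    foreach ($target as $i => $row) {
        // ON clause: a target row matches when all three columns are equal
        if ($row['pqID'] === $source['pqID']
            && $row['MeasurementID'] === $source['MeasurementID']
            && $row['ColorID'] === $source['ColorID']) {
            // WHEN MATCHED THEN UPDATE SET ...: only these four columns change
            foreach (['ParameterValue', 'EvaluatorID', 'EvaluationDate', 'Comments'] as $col) {
                $target[$i][$col] = $source[$col];
            }
            $matched = true;
        }
    }
    if (!$matched) {
        // WHEN NOT MATCHED BY TARGET THEN INSERT: a new row with every source
        // column plus TestIndex and TestNumber
        $target[] = $source + $insertOnly;
    }
    return $target;
}
```

Seen this way, any integer conversion would apply to the values in the array passed to $ms_conn->query(), since those are what end up in Source.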
The main advantage of MERGE over two statements is not necessarily the elegant and intuitively obvious syntax. The main advantage is that all the operations occur in a single transaction, so either they all succeed or all fail as one.
The syntax actually isn't that bad. I would recommend that you set up a test database and try a few examples on your own, so you at least understand the syntax. Then return to this code. When doing so, print out the resulting MERGE statement and paste it into SQL Server Management Studio, where you will get nice color-coded keywords for the statement. Then go through it step by step, and you'll probably find that it makes a lot of sense.
I am developing an application in PHP for which I need to implement a handler for big files.
Reading and writing the file is not a problem, but checking its content is.
I built a recursive function which checks whether or not an id is already used in the same document.
private function val_id($id){
if(!isset($this->id)){
$this->id = array();
}
if(in_array($id, $this->id)){
return $this->val_id($id+1);
}else{
$this->id[] = $id;
return $id;
}
}
When in_array($id, $this->id) returns FALSE, $id is added to $this->id (the array containing all used ids) and returned as a valid id.
When it returns TRUE, the function calls itself with $id+1.
Since we are talking about over 300,000 records at a time, PHP doesn't seem able to store such a big array: it seems to stop writing lines to the documents I generate when this array gets too big, but I don't receive any error messages.
Since the generated documents are SQL files with multi-row INSERTs, another solution could be to check whether the id already exists in the database. Can MySQL catch these errors and retry those entries with 1 added to the id? How?
How do you think I should solve this problem?
Kind regards,
Wouter
Make error messages appear (for example with error_reporting(E_ALL) and display_errors enabled).
Increase memory_limit.
Instead of storing the ids as values, store them as keys, so you'll be able to use isset($this->id[$id]) instead of in_array().
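A sketch of val_id() rewritten along those lines: used ids are stored as array keys, so the check is an isset() hash lookup instead of an in_array() scan, and a while loop replaces the recursion so a long run of already-taken ids can't build a deep call stack. The IdRegistry class name is hypothetical, and the typed property needs PHP 7.4 or later.

```php
<?php
// Sketch of val_id() using array keys instead of values, and a loop instead of recursion.
class IdRegistry
{
    private array $used = []; // used ids stored as keys: isset() is a hash lookup

    public function valId(int $id): int
    {
        // Walk forward until a free id is found
        while (isset($this->used[$id])) {
            $id++;
        }
        $this->used[$id] = true; // mark it as used
        return $id;
    }
}
```

Calling valId(5) three times returns 5, 6 and 7, exactly like the recursive version, but each lookup stays fast as the number of used ids grows.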
Use INSERT IGNORE so that MySQL skips rows that would violate a duplicate key (the error becomes a warning), and remove your key check in PHP. Your statement could look like this:
INSERT IGNORE INTO tbl_name SET key1 = 1, col1 = 'value1'
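If you instead want to change the existing row when the key is already taken, use ON DUPLICATE KEY UPDATE. Note that it updates the existing row rather than inserting the new row under id+1; in this example, column c of the existing row is incremented: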
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
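Why should 300,000 records be a problem? Each element of a standard PHP array takes roughly 144 bytes, so 300,000 of them need about 43.2 MB (300,000 × 144 = 43,200,000 bytes). That still fits within PHP's default memory_limit of 128M.
Otherwise, Your Common Sense's idea of using the ids as array keys is worth a thought, because it's faster: isset() on a key is a hash lookup, while in_array() has to scan the whole array.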
I'm using Zend_Cache_Core with Zend_Cache_Backend_File to cache the results of queries executed by a model class that accesses the database.
Basically, the queries themselves should form the id under which the results are cached; the only problem is that they are too long. Zend_Cache_Backend_File doesn't throw an exception and PHP doesn't complain, but the cache file isn't created.
I've come up with a solution that is not efficient at all: storing every executed query along with an auto-incrementing id in a separate file, like so:
0->>SELECT * FROM table
1->>SELECT * FROM table1,table2
2->>SELECT * FROM table WHERE foo = bar
You get the idea; this way I have a unique id for every query. I clear the cache whenever an insert, delete, or update is done.
Now I'm sure you see the potential bottleneck here: for every test, save or fetch from the cache, two (or three, when a new id has to be added) requests are made to the file system. This may even defeat the purpose of caching altogether. So is there a way to generate a unique id, i.e. a much shorter representation, of the queries in PHP without having to store them on the file system or in a database?
Strings are arbitrarily long, so it's obviously impossible to create a fixed-size identifier that represents every possible input string without duplicates. However, for the purposes of caching, you can usually get away with a solution that's simply "good enough" and reduces collisions to an acceptable level.
For example, you can simply use MD5, for which two different queries collide with a probability of only about 1 in 2^128. If you're still worried about collisions (and you probably should be, just to be safe), you can store the query along with the result in the "value" of the cache, and check when you get the value back that it's actually the query you were looking for.
As a quick example (my PHP is kind of rusty, but hopefully you get the idea):
$query = "SELECT * FROM ...";
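// "." concatenates strings in PHP; Zend_Cache ids may only use [a-zA-Z0-9_], hence "_" rather than "-"
$key = "hash_" . hash("md5", $query);
$result = $cache->load($key); // load() returns false on a cache miss
if ($result === false || $result[0] !== $query) {
// object wasn't in cache (or another query hashed to the same key): do the real fetch and store it
$result = $db->fetchAll($query); // assuming $db is a Zend_Db adapter; etc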
$result = array($query, $result);
$cache->save($result, $key);
}
// the result is now in $result[1] (the original query is in $result[0])
MD5!!
MD5 generates a string of length 32, which seems to work fine: the cache files are created (with filenames of about 47 characters), so the operating system doesn't seem to reject them.
//returns id for a given query
function getCacheId($query) {
return md5($query);
}
And that's it! But there's still the issue of collisions, and I think salting the MD5 hash (maybe with the name of the table) should make it more robust.
//returns id for a given query
function getCacheId($query, $table) {
return md5($table . $query);
}
If anyone wants the full code for how I've implemented the result caching, just leave a comment and I'll be happy to post it.