I'm using Zend_Cache_Core with Zend_Cache_Backend_File to cache results of queries executed for a model class that accesses the database.
Basically the queries themselves should form the id by which to cache the obtained results, only problem is, they are too long. Zend_Cache_Backend_File doesn't throw an exception, PHP doesn't complain but the cache file isn't created.
I've come up with a solution that is not efficient at all, storing any executed query along with an autoincrementing id in a separate file like so:
0->>SELECT * FROM table
1->>SELECT * FROM table1,table2
2->>SELECT * FROM table WHERE foo = bar
You get the idea; this way i have a unique id for every query. I clean out the cache whenever an insert, delete, or update is done.
Now i'm sure you see the potential bottleneck here, for any test, save or fetch from cache two (or three, where we need to add a new id) requests are made to the file system. This may even defeat the need to cache alltogether. So is there a way i can generate a unique id, ie a much shorter representation, of the queries in php without having to store them on the file system or in a database?
Strings are arbitrarily long, so obviously it's impossible to create a fixed-size identifier that can represent any arbitrary input string without duplication. However, for the purposes of caching, you can usually get away with a solution that's simple "good enough" and reduces collisions to an acceptable level.
For example, you can simply use MD5, which will only produce a collision in 1 in 2128 cases. If you're still worried about collisions (and you probably should be, just to be safe) you can store the query and the result in the "value" of the cache, and check when you get the value back that it's actually the query you were looking for.
As a quick example (my PHP is kind of rusty, but hopefully you get the idea):
$query = "SELECT * FROM ...";
$key = "hash-" + hash("md5", $query);
$result = $cache->load($key);
if ($result == null || $result[0] != $query) {
// object wasn't in cache, do the real fetch and store it
$result = $db->execute($query); // etc
$result = array($query, $result);
$cache->save($result, $key);
}
// the result is now in $result[1] (the original query is in $result[0])
MD5!!
Md5 generates a string of length 32 that seems to be working fine, the cache files are created (with filenames about of length 47) so it seems as though the operating system doesn't reject them.
//returns id for a given query
function getCacheId($query) {
return md5($query);
}
And that's it! But there's that issuse of collisions and i think salting the md5 hash (maybe with the name of the table) should make it more robust.
//returns id for a given query
function getCacheId($query, $table) {
return md5($table . $query);
}
If anyone wants the full code for how i've implemented the results caching, just leave a comment and i'll be happy to post it.
Related
I have function which checks if value exists, in this case it is API key. What I am trying to achieve is, before creating new api key for each account registration, I want to loop my function to generate new key if existing already in database. Key is simple string generated using:
$apiKey = bin2hex(random_bytes(16));
My function:
function apiCheckKey($apiKey) {
global $conn;
$sql = "SELECT * FROM `api` WHERE `key` = '".$apiKey."'";
$result = mysqli_query($conn, $sql);
if (mysqli_num_rows($result)) {
return true;
} else {
return false;
}
}
My check:
if(!apiCheckKey($apiKey)) {
// loop
}
How can I run loop efficiently to generate new key and eliminate duplicates? Please keep in mind, database will contain 100 000+ records...
There's a few things to keep in mind when doing this:
Ensure you have a UNIQUE constraint on this column so it's impossible to add duplicate values. You can't rely on a SELECT COUNT(*) FROM x WHERE key=? test before inserting as that's vulnerable to race conditions.
Generate an API key that's sufficiently random that collisions are unlikely. A >=20 character random string using all letters, both upper and lower case, plus numbers will have 704,423,425,546,998,022,968,330,264,616,370,176 possible forms so a collision is astronomically unlikely. If you have shorter keys collisions become a lot more probable due to effects like the pigeonhole principle and the birthday paradox.
In the unlikely event a collision does occur, make your code generate a new key and retry the insert. A UNIQUE constraint violation is a very specific MySQL error code you can handle. Check your error value if/when the INSERT fails and dispatch accordingly.
Test your code by generating a few million keys to be sure it's operating properly.
im working on PHP + MySQL application, which will crawl HDD/shared drive and index all files and directories into database, to provide "fulltext" search on it. So far im doing well, but im stuck on question, if i chosed good way how to store data into database.
On picture below, you can see part schema of my database. Thought is, that i'm saving domain (which represents part of disk which i wana to index) then there are some link(s) (which represents files and folder (with content, filepath, etc) then i have table to store sole (uniq) keywords, which i find in file/folder name or content.
And finaly, i have 16 tables linkkeyword to store relations between links and keywords. I have 16 of them because i thought it might be good to make something like hashtable, because im expecting high number of relations between link <-> keyword. (so far for 15k links and 400k keywords i have about 2.5milion of linkkeyword records). So to avoid storing so much data into one table (and later search above them) i thought that this hastable can be faster. It works like i wana to search for word, i compute it md5 and look at first character of md5 and then i know to which linkkeyword table i should use. So there is only about 150~200k records in each linkkeyword table (against 2.5milions)
So there im curious, if this approach can be of any use, or if will be better to store all linkkeyword information to single table and mysql will take care of it (and to how much link<->keyword it can work?)
So far this was great solution to me, but i crushed hard when i tried to implement regular-expression search. So user can use e.g. "tem*" which can result in temp, temporary, temple etc... In normal way when searching for word, i will conpute in md5 hash and then i know to which linkkeyword table i need to look. But for regular expression i need to get all keywords from keywords table (which matches regular expression) and then process them one by one.
Im also attaching part of code for normal keyword search
private function searchKeywords($selectedDomains) {
$searchValues = $this->searchValue;
$this->resultData = array();
foreach (explode(" ", $searchValues) as $keywordName) {
$keywordName = strtolower($keywordName);
$keywordMd5 = md5($keywordName);
$selection = $this->database->table('link');
$results = $selection->where('domain.id', $selectedDomains)->where('domain.searchable = ?', '1')->where(':linkkeyword' . $keywordMd5[0] . '.keyword.keyword LIKE ?', $keywordName)
->select('link.*,:linkkeyword' . $keywordMd5[0] . '.weight,:linkkeyword' . $keywordMd5[0] . '.keyword.keyword');
foreach ($results as $result) {
$keyExists = array_key_exists($result->linkId, $this->resultData);
if ($keyExists) {
$this->resultData[$result->linkId]->updateWeight($result->weight);
$this->resultData[$result->linkId]->addKeyword($result->keyword);
} else {
$domain = $result->ref('domain');
$linkClass = new search\linkClass($result, $domain);
$linkClass->updateWeight($result->weight);
$linkClass->addKeyword($result->keyword);
$this->resultData[$result->linkId] = $linkClass;
}
}
}
}
and regular expression search function
private function searchRegexp($selectedDomains) {
//get stored search value
$searchValues = $this->searchValue;
//replace astering and exclamation mark (counted as characters for regular expression) and replace them by their mysql equivalent
$searchValues = str_replace("*", "%", $searchValues);
$searchValues = str_replace("!", "_", $searchValues);
// empty result array to prevent previous results to interfere
$this->resultData = array();
//searched phrase can be multiple keywords, so split it by space and get results for each keyword
foreach (explode(" ", $searchValues) as $keywordName) {
//set default link result weight to -1 (default value)
$weight = -1;
//select all keywords, which match searched keyword (or its regular expression)
$keywords = $this->database->table('keyword')->where('keyword LIKE ?', $keywordName);
foreach ($keywords as $keyword) {
//count keyword md5 sum to determine which table should be use to match it links
$md5 = md5($keyword->keyword);
//get all link ids from linkkeyword relation table
$keywordJoinLink = $keyword->related('linkkeyword' . $md5[0])->where('link.domain.searchable','1');
//loop found links
foreach ($keywordJoinLink as $link) {
//store link weight, for later result sort
$weight = $link->weight;
//get link ID
$linkId = $link->linkId;
//check if link already exists in results, to prevent duplicity
$keyExists = array_key_exists($linkId, $this->resultData);
//if link already exists in result set, just update its weight and insert matching keyword for later keyword tag specification
if ($keyExists) {
$this->resultData[$linkId]->updateWeight($weight);
$this->resultData[$linkId]->addKeyword($keyword->keyword);
//if link isnt in result yet, insert it
} else {
//get link reference
$linkData = $link->ref('link', 'linkId');
//get information about domain, to which link belongs (location, flagPath,...)
$domainData = $linkData->ref('domain', 'domainId');
//if is domain searchable and was selected before search, add link to result set. Otherwise ignore it
if ($domainData->searchable == 1 && in_array($domainData->id, $selectedDomains)) {
//create new link instance
$linkClass = new search\linkClass($linkData, $domainData);
//insert matching keyword to links keyword set
$linkClass->addKeyword($keyword->keyword);
//set links weight
$linkClass->updateWeight($weight);
//insert link into result set
$this->resultData[$linkId] = $linkClass;
}
}
}
}
}
}
Your question is mostly one of opinion, so you may want to include the criteria that allow us to answer "worth it' more objectively.
It appears you've re-invented the concept of database sharding (though without distributing your data across multiple servers).
I assume you are trying to optimize search time; if that's the case, I'd suggest that 2.5 million records on a modern hardware is not a particularly big performance challenge, as long as your queries can use an index. If you can't use an index (e.g. because you're doing a regular expression search), sharding will probably not help at all.
My general recommendation with database performance tuning is to start with the simplest possible relational solution, keep tuning that until it breaks your performance goals, then add more hardware, and only once you've done that should you go for "exotic" solutions like sharding.
This doesn't mean using prayer as a strategy. For performance-critical application, I typically build a test database, where I can experiment with solutions. In your case, I'd build a database with your schema without the "sharding" tables, and then populate it with test data (either write your own population routines, or use a tool like DBMonster). Typically, I'd go for at least double the size I expect in production. You can then run and tune queries to prove, one way or another, whether your schema is good enough. It sounds like a lot of work, but it's much less work than your sharding solution is likely to bring along.
There are (as #danFromGermany comments) solutions that are optimized for text serach, and you could use MySQL fulltext search features rather than regular expressions.
I have a strange situation.
Suppose I have a very simple function in php (I used Yii but the problem is general) which is called inside a transaction statement:
public function checkAndInsert($someKey)
{
$data = MyModel::model()->find(array('someKey'=>$someKey)); // search a record in the DB.If it does not exist, insert
if ( $data == null)
{
$data->someCol = 'newOne';
$data->save();
}
else
{
$data->someCol = 'test';
$data->save();
}
}
...
// $db is the instance variable used for operation on the DB
$db->transaction();
$this->checkAdnInsert();
$db->commit();
That said, if I run the script containing this function by staring many processes, I will have duplicate values in the DB. For example, if I have $someKey='pippo', and I run the script by starting 2 processes, I will have two (or more) records with column "someCol" = "newOne". This happens randomly, not always.
Is the code wrong? Should I put some constraint in DB in form of KEYs?
I also read this post about adding UNIQUE indexes to TokuDB which says that UNIQUE KEY "kills" write performance...
The approach you have is wrong. It's wrong because you delegate the authority for integrity/uniqueness check to PHP, but it's the database that's responsible for that.
In other words, you don't have to check whether something exists and then insert. That's bad because there's always some slight ping involved between PHP and MySQL and as you already saw - you can get false results for your checks.
If you need unique values for certain column or combination of columns, you add a UNIQUE constraint. After that you simply insert. If the record exists, insert fails and you can deal with it via Exception. Not only is it faster, it's also easier for you because your code can become a one-liner which is much easier to maintain or understand.
Someone retired in our group and I'm trying to figure out what his merge statement (and associated code) does so I can determine how to convert some (not all) values to integer before sending up. See comments below for questions. I am an absolute newbie with Microsoft SQL and took a class in php a few years ago, but don't have much experience. I've tried googling the merge command but I'm having trouble with a couple parts in it. See my questions below. (// ?)
I've looked at:
http://php.net/manual/en/pdo.query.php
http://stackoverflow.com/questions/4336573/merge-to-target-columns-using-source-rows
http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/index.jsp?topic=%2Fsqlp%2Frbafymerge.htm
I realize these are basic questions but I'm trying to figure it out and nobody around here knows.
function storeData ($form)
{
global $ms_conn, $QEDnamespace;
//I'm not sure what this is doing?? I thought this was where it was sending data up??
$qry = "MERGE INTO visEData AS Target
USING (VALUES (?,?,?,?,?,?,?,?,?,?))
AS Source (TestGUID,pqID, TestUnitID, TestUnitCountID,
ColorID, MeasurementID, ParameterValue,
Comments, EvaluatorID, EvaluationDate)
ON Target.pqID = Source.pqID
AND Target.MeasurementID=Source.MeasurementID //what is this doing?
AND Target.ColorID=Source.ColorID //what is target and source?
WHEN MATCHED THEN
UPDATE SET ParameterValue = Source.ParameterValue,
EvaluatorID = Source.EvaluatorID, //where is evaluatorID and source? My table or table we're send it to?
EvaluationDate = Source.EvaluationDate,
Comments = Source.Comments
WHEN NOT MATCHED BY TARGET THEN
INSERT (TestGUID,
pqID, TestUnitID, TestUnitCountID,
ColorID, MeasurementID,
ParameterValue, Comments,
EvaluatorID, EvaluationDate, TestIndex, TestNumber)
VALUES (Source.TestGUID, Source.pqID,
Source.TestUnitID,
Source.TestUnitCountID,
Source.ColorID, Source.MeasurementID, Source.ParameterValue,
Source.Comments, Source.EvaluatorID, Source.EvaluationDate,?,?);";
$pqID = coverSheetData($form);
$tid = getBaseTest($form['TextField6']);
$testGUID = getTestGUID($tid);
$testIndex = getTestIndex ($testGUID);
foreach ($form['visE']['parameters'] as $parameter=>$element)
{
foreach ($element as $key=>$data)
{
if ( mb_ereg_match('.+evaluation', $key) === true )
{
$testUnitData = getTestUnitData ($form, $key, $tid, $testGUID);
try
{
//I'm not sure if this is where it's sent up??
//Maybe I could add the integer conversion here??
$ms_conn->query ($qry, array(
$testGUID, $pqID,
$testUnitData[0], $testUnitData[1], $testUnitData[2],$element['parameterID'], $data, $element['comments'] $QEDnamespace->userid, date ('Y-m-d'), $testIndex, $tid));
}
catch (Zend_Db_Statement_Sqlsrv_Exception $e)
{
dataLog($e->getMessage());
returnStatus ("Failed at: " . $key);
}
}
}
}
}
This is a bit long for a comment. If you are using SQL Server, then look at the SQL Server documentation on merge. All the SQL Server documentation is on line, and it is very easy to find via Google (and perhaps even easier using Bing).
The purpose of the MERGE command is to do both inserts and updates in one step. Basically, you have a table that has new data ("source") and a table to be updated ("target"). When a record matches, then update the existing record in the target with matching record in source. When a record doesn't match, then insert it into target.
The main advantage of MERGE over two statements is not necessarily the elegant and intuitively obvious syntax. The main advantage is that all the operations occur in a single transaction, so either they all succeed or all fail as one.
The syntax actually isn't that bad. I would recommend that you set up a test database and try a few examples on your own, so you at least understand the syntax. Then, return to this code. When doing so, print out the resulting merge statement and put it in SQL Server Management Studio, where you will have nice color coded key words for the statement. Then go through it step by step, and you'll probably find that it makes lots of sense.
I am developing a mysql database.
I "need" a unique id for each user but it must not auto increment! It is vital it is not auto increment.
So I was thinking of inserting a random number something like mt_rand(5000, 1000000) into my mysql table when a user signs up for my web site to be. This is where I am stuck?!
The id is a unique key on my mysql table specific to each user, as I can not 100% guarantee that inserting mt_rand(5000, 1000000) for the user id will not incoherently clash with another user's id.
Is there a way in which I can use mt_rand(5000, 1000000) and scan the mysql database, and if it returns true that it is unique, then insert it as the user's new ID, upon returning false (somebody already has that id) generate a new id until it becomes unique and then insert it into the mysql database.
I know this is possible I have seen it many times, I have tried with while loops and all sorts, so this place is my last resort.
Thanks
You're better off using this: http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_uuid
Or using this: http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
But if you actually want to do what you are saying, you can just do something like:
$x;
do {
$x = random_number();
"SELECT count(*) FROM table WHERE id = $x"
} while (count != 0);
// $x is now a value that's not in the db
You could use a guid. That's what I've seen done when you can't use an auto number.
http://php.net/manual/en/function.com-create-guid.php
Doesn't this function do what you want (without verification): http://www.php.net/manual/en/function.uniqid.php?
I think you need to approach the problem from a different direction, specifically why a sequence of incrementing numbers is not desired.
If it needs to be an 'opaque' identifier, you can do something like start with a simple incrementing number and then add something around it to make it look like it's not, such as three random numbers on the end. You could go further than that and put some generated letters in front (either random or based on some other algorithm, such as the day of the month they first registered, or which server they hit), then do a simple checksuming algorithm to make another letter for the end. Now someone can't easily guess an ID and you have a way of rejecting one sort of ID before it hits the database. You will need to store the additional data around the ID somewhere, too.
If it needs to be a number that is random and unique, then you need to check the database with the generated ID before you tell the new user. This is where you will run into problems of scale as too small a number space and you will get too many collisions before the check lucks upon an unallocated one. If that is likely, then you will need to divide your ID generation into two parts: the first part is going to be used to find all IDs with that prefix, then you can generate a new one that doesn't exist in the set you got from the DB.
Random string generation... letters, numbers, there are 218 340 105 584 896 combinations for 8 chars.
function randr($j = 8){
$string = "";
for($i=0;$i < $j;$i++){
srand((double)microtime()*1234567);
$x = mt_rand(0,2);
switch($x){
case 0:$string.= chr(mt_rand(97,122));break;
case 1:$string.= chr(mt_rand(65,90));break;
case 2:$string.= chr(mt_rand(48,57));break;
}
}
return $string;
}
Loop...
do{
$id = randr();
$sql = mysql_query("SELECT COUNT(0) FROM table WHERE id = '$id'");
$sql = mysql_fetch_array($sql);
$count = $sql[0];
}while($count != 0);
For starters I always prefer to do all the randomization in php.
function gencode(){
$tempid=mt_rand(5000, 1000000);
$check=mysql_fetch_assoc(mysql_query("SELECT FROM users WHERE id =$tempid",$link));
if($check)gencode();
$reg=mysql_query("INSERT INTO users id VALUES ('$tempid')",$link);
//of course u can check for if $reg then insert successfull