I have developed a fairly ramshackle PHP/JavaScript Reward system for my school's VLE.
The main bulk of the work is done on a transactions table which has the following fields:
Transaction_ID, Datetime, Giver_ID, Recipient_ID, Points, Category_ID, Reason
The idea is that if I, as a member of staff, give a student some Reward Points, an entry such as this is inserted into the database:
INSERT INTO `transactions` (`Transaction_ID`, `Datetime`, `Giver_ID`, `Recipient_ID`, `Points`, `Category_ID`, `Reason`) VALUES
(60332, '2012-02-22', 34985, 137426, 5, 5, 'Excellent volcano homework.');
This worked fine - but I didn't really consider just how much the system would be used. I now have over 72,000 transactions in this table.
As such, a few of my pages are starting to slow down. For instance, when staff try to allocate points, my system runs a command to get the member of staff's total point allocation and other snippets of information. This appears to be displaying rather slowly, and looking at the MySQL statement/accompanying PHP code, I think this could be much more efficient.
function getLimitsAndTotals($User_ID) {
    $return["SpentTotal"] = 0;
    $return["SpentWeekly"] = 0;

    $sql = "SELECT *
            FROM `transactions`
            WHERE `Giver_ID` = $User_ID";
    $res = mysql_query($sql);

    if (mysql_num_rows($res) > 0) {
        while ($row = mysql_fetch_assoc($res)) {
            $return["SpentTotal"] += $row["Points"];
            $transaction_date = strtotime($row["Datetime"]);
            if ($transaction_date > strtotime("last sunday")) {
                $return["SpentWeekly"] += $row["Points"];
            }
        }
    }

    return $return;
}
As such, my question is twofold.
Can this specific code be optimised?
Can I employ any database techniques - full text indexing or the like - to further optimise my system?
EDIT: RE Indexing
I don't know anything about indexing, but it looks like my transactions table does actually have an index in place?
Is this the correct type of index?
Here is the code for table-creation:
CREATE TABLE IF NOT EXISTS `transactions` (
  `Transaction_ID` int(9) NOT NULL auto_increment,
  `Datetime` date NOT NULL,
  `Giver_ID` int(9) NOT NULL,
  `Recipient_ID` int(9) NOT NULL,
  `Points` int(4) NOT NULL,
  `Category_ID` int(3) NOT NULL,
  `Reason` text NOT NULL,
  PRIMARY KEY (`Transaction_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=74927 ;
Thanks in advance,
Make sure Giver_ID is indexed (a sketch of adding the index follows below). Also try running the strtotime outside of your while loop, as in the loop shown after that, since I imagine it's an expensive operation to be running 72,000 times.
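A single-column index on Giver_ID could be added with something like this (the index name is just a suggestion):

ALTER TABLE `transactions` ADD INDEX `idx_giver` (`Giver_ID`);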
if (mysql_num_rows($res) > 0) {
    // assign the unix timestamp of last sunday here
    $last_sunday = strtotime('last sunday');
    while ($row = mysql_fetch_assoc($res)) {
        $return["SpentTotal"] += $row["Points"];
        $transaction_date = strtotime($row["Datetime"]);
        if ($transaction_date > $last_sunday) {
            $return["SpentWeekly"] += $row["Points"];
        }
    }
}
Also consider running UNIX_TIMESTAMP(Datetime) AS datetime_timestamp in your SQL instead of getting it out as a string and running another expensive strtotime operation. You can then simply run:
if (mysql_num_rows($res) > 0) {
    // assign the unix timestamp of last sunday here
    $last_sunday = strtotime('last sunday');
    while ($row = mysql_fetch_assoc($res)) {
        $return["SpentTotal"] += $row["Points"];
        if ($row['datetime_timestamp'] > $last_sunday) {
            $return["SpentWeekly"] += $row["Points"];
        }
    }
}
(if your column is of type DATE, of course!)
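Putting both suggestions together, the query might look roughly like this (a sketch; it also selects only the columns the loop actually uses, and the int cast on $User_ID is just a small safety measure):

$sql = "SELECT `Points`, UNIX_TIMESTAMP(`Datetime`) AS datetime_timestamp
        FROM `transactions`
        WHERE `Giver_ID` = " . (int)$User_ID;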
72k rows is nothing... However, there could be other things here causing slowdowns, like your MySQL configuration (how much memory you have allocated to the service) and a few others (do a search for optimizing MySQL).
Also look at how many INSERTs you run on a given page; that will typically slow down your page load. If you are doing an insert for every user action, you are making a costly transaction each time.
Typically a SELECT is much less expensive than an INSERT.
From your SQL code I can also assume you didn't optimize your queries, something like:
$sql = "SELECT *
FROM `transactions`
WHERE `Giver_ID` =$User_ID";
Should be:
$sql = "SELECT Points, DateTime FROM ..."
(not a big issue with your data, but it only requests WHAT YOU NEED rather than everything, which would otherwise use more memory while you iterate over the results).
Also not sure how your schema is designed, but are you using indexing on Transaction_ID?
Can this specific code be optimized?
Yes.
But definitely not by "looking at the code".
Can I employ any database techniques - full text indexing or the like - to further optimise my system?
Nope, you can't.
Simply because you have no idea what exactly is slow in your system.
This is mere common sense, widely used in real life by everyone, but for some reason it gets completely forgotten as soon as someone starts to program a website.
Your question is like "my car is getting slow. How do I speed it up?". WHO ON EARTH CAN ANSWER THAT, knowing nothing about the reason for the slowness? Is it the tyres? Or the gasoline? Or is there no road at all, just an open field? Or did someone simply forget to release the handbrake?
You have to determine the reason for the slowness first.
Before asking a question you have to measure your query runtime.
And if it turns out to be fast, you should not blame it for the slowness of your site; start profiling instead.
First of all you have to determine which part of the whole page makes it load slowly. It could be just some JavaScript loading from a third-party server.
So, start with the "Net" tab in Firebug and find the slowest part.
Then drill into it.
If it's your PHP code, use microtime(true) to measure the different parts and spot the problem one.
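For example, a minimal sketch of timing one suspect block (here the function from the question):

$t0 = microtime(true);
$totals = getLimitsAndTotals($User_ID); // the block you suspect is slow
echo 'getLimitsAndTotals took ' . (microtime(true) - $t0) . ' seconds';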
Then come back with the question.
However, if I take your question not as a request for speed optimisation but as a request for more sanity in the SQL, there are some improvements that can be made.
Both numbers can be retrieved as single values.
SELECT SUM(Points) AS Total FROM `transactions` WHERE Giver_ID = $User_ID
will give you the total points,
as well as
$sunday = date("Y-m-d", strtotime('last sunday'));
SELECT SUM(Points) AS Spent FROM `transactions`
WHERE Giver_ID = $User_ID AND Datetime > '$sunday'
will give you the weekly points.
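If you like, both figures can even come back in one round trip using a conditional sum, reusing the $sunday variable from above (a sketch):

SELECT SUM(Points) AS Total,
       SUM(IF(Datetime > '$sunday', Points, 0)) AS Spent
FROM `transactions`
WHERE Giver_ID = $User_ID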
Related
After running a few tests I realized that my search method does not perform very well if some words of the query are short (2~3 letters).
The way I built the search is by making a MySQL query for every word in the string the visitor entered, and then filtering the results from each word to see if every word returned that result. Once a result has been returned for all words, it's a match and I show that result to the visitor.
But I was wondering if that's an effective way to do it. Is there a better way while keeping the same functionality?
Currently the code I have takes about 0.7 seconds making MySQL queries. The rest of the stuff is under 0.1 seconds.
Normally I would not care much about my search taking 0.7 seconds, but I'd like to create a "LiveSearch" and it is critical that it loads faster than that.
Here is my code
public static function Search($Query) {
    $Querys = explode(' ', $Query);
    foreach ($Querys as $Query) {
        $MatchingRow = \Database\dbCon::$dbCon->prepare("
            SELECT
                `Id`
            FROM
                `product_products` as pp
            WHERE
                CONCAT(
                    `Id`,
                    ' ',
                    (SELECT `Name` FROM `product_brands` WHERE `Id` = pp.BrandId),
                    ' ',
                    `ModelNumber`,
                    ' ',
                    `Title`,
                    IF(`isFreeShipping` = 1 OR `isFreeShippingOnOrder` = 1, ' FreeShipping', '')
                )
                LIKE :Title;
        ");
        $MatchingRow->bindValue(':Title', '%' . $Query . '%');

        try {
            $MatchingRow->execute();
            foreach ($MatchingRow->fetchAll(\PDO::FETCH_ASSOC) as $QueryInfo)
                $Matchings[$Query][$QueryInfo['Id']] = $QueryInfo['Id'];
        } catch (\PDOException $e) {
            echo 'Error MySQL: ' . $e->getMessage();
        }
    }

    $TmpMatch = $Matchings;
    $Matches = array_shift(array_values($TmpMatch));
    foreach ($TmpMatch as $Query) {
        $Matches = array_intersect($Matches, $Query);
    }

    foreach ($Matches as $Match) {
        $Products[] = new Product($Match);
    }
    return $Products;
}
As others have already suggested, fulltext search is your friend.
The logic should go more or less like this.
1. Add a text column to your "product_products" table called, say, "FtSearch".
2. Write a small script that will run only once, in which you write, for each existing product, the text that has to be searched for into the "FtSearch" column. This is, of course, the text you compose in your query (id + brand name + title and so forth, including the FreeShipping part). You can probably do this with a single SQL statement, but I have no mysql at hand and can't provide you the code for that... you might be able to find it out by yourself.
3. Create a fulltext index on the "FtSearch" column (doing this after having populated the FtSearch column saves you a little execution time).
4. Now you have to add the code necessary to ensure that every time any of the fields involved in the search string is inserted/updated, you insert/update the search string as well. Pay attention here, since this includes not only the "Title", "ModelNumber" and "FreeShipping" of the "product_product", but also the "Name" of the "product_brand". This means that if the name of a product_brand is updated, you will have to regenerate the search strings of all products having that brand. This might be a little slow, depending on how many products are bound to that brand. However I assume it does not happen too often that a brand changes its name, and if it does, it certainly happens in some sort of administration interface where such things are usually acceptable.
At this point you can query the table using the MATCH construct, which is way, way faster than anything you could ever get with your current approach.
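A sketch of what the per-word query could then look like, assuming the FtSearch column and fulltext index described above (and reusing the PDO connection from the question):

$MatchingRow = \Database\dbCon::$dbCon->prepare("
    SELECT `Id`
    FROM `product_products`
    WHERE MATCH(`FtSearch`) AGAINST (:Query IN BOOLEAN MODE)
");
$MatchingRow->bindValue(':Query', $Query . '*'); // trailing * makes it a prefix match in boolean mode
$MatchingRow->execute();

Note that for 2-3 letter words to match at all, you may also need to lower the minimum indexed word length (ft_min_word_len for MyISAM, innodb_ft_min_token_size for InnoDB) and rebuild the index.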
I'm working on creating a forum (just to test) and I've reached the point where I sync the thread lists and the posts inside. I've relied on MySQL's AUTO_INCREMENT to sync them, but I understand that it won't be useful in the future.
My question is: how would I generate a random number that stacks up just like the MySQL auto_increment does?
For viewing the thread list, it's currently
$sql = "SELECT * FROM threads WHERE th_unique='$section';
$result = mysqli_query($db,$sql);
and then I just fetch the data and output the threads in the list.
Basically, how would I generate a number just like AUTO_INCREMENT does when an insert query is sent?
I am aware of rand(), but I don't find it effective in the end because it might overlap and reuse a number that already exists.
Actually, you can use AUTO_INCREMENT with replication under certain conditions.
Statement-based replication of AUTO_INCREMENT, LAST_INSERT_ID(), and
TIMESTAMP values is done correctly, subject to the following
exceptions:
When using statement-based replication prior to MySQL 5.7.1,
AUTO_INCREMENT columns in tables on the slave must match the same
columns on the master; that is, AUTO_INCREMENT columns must be
replicated to AUTO_INCREMENT columns. ...
And the list goes on. If your situation is one of the conditions where AUTO_INCREMENT doesn't work, a UUID is an option.
Also take a look at this answer: https://stackoverflow.com/a/37605582/267540 (it's for Python/Django). Here's what it looks like translated to PHP:
define('START_TIME', 1470825057000);

function make_id() {
    /**
     * Inspired by http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
     * Generates a unique identifier that isn't too complex and has a sequential nature (sort of).
     * Note: this relies on 64-bit integers, so use a 64-bit PHP build.
     */
    $t = microtime(true) * 1000 - START_TIME; // milliseconds elapsed since START_TIME
    $rnd = random_int(0, 8388607);            // 23 random bits to avoid collisions
    return ((int)$t << 23) | $rnd;
}

function reverse_id($id) {
    // Recover the (approximate) creation time in milliseconds from an id
    $id = ($id >> 23) + START_TIME;
    return $id;
}

for ($counter = 0; $counter < 100; $counter++) {
    $id = make_id();
    $time = reverse_id($id);
    print "$id $time \n";
}
print 'Ending time ' . microtime(true) * 1000;
As you can see the numbers look sequential, but they are still safe for replication.
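Usage would then be a matter of generating the id in PHP before the INSERT. A sketch only: th_id and th_title are made-up column names, and $title is assumed to hold the new thread's title.

$id = make_id(); // generate the id in PHP instead of relying on AUTO_INCREMENT
$stmt = mysqli_prepare($db, "INSERT INTO threads (th_id, th_unique, th_title) VALUES (?, ?, ?)");
mysqli_stmt_bind_param($stmt, 'iss', $id, $section, $title);
mysqli_stmt_execute($stmt);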
I have five different queries running on my about page showing basic data like the number of news stories we have on the site. I am using queries like this:
$sql4 = "SELECT `ride_id` FROM `tpf_rides` WHERE `type` LIKE '%Roller Coaster%'" ;
$result4 = $pdo->query($sql4);
$coasters = $result4->rowCount();
but wonder if there is a more efficient way. I've tried to minimize the load by only pulling IDs, but since I only need the count, can the load be lightened even more?
Also these queries only really need to run once or twice per day, not every time the page is loaded. Can someone point me in the direction of setting this up? I've never had to do this before. Thanks.
Yes there is a more efficient way. Let the database do the counting for you:
SELECT count(*) as cnt
FROM `tpf_rides`
WHERE `type` LIKE '%Roller Coaster%';
If all the counts you are looking for are from the tpf_rides table, then you can do them in one query:
SELECT sum(`type` LIKE '%Roller Coaster%') as RollerCoaster,
sum(`type` LIKE '%Haunted House%') as HauntedHouse,
sum(`type` LIKE '%Ferris Wheel%') as FerrisWheel
FROM `tpf_rides`;
That would be even faster than running three different queries.
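In PHP, that single query might be consumed like this (a sketch using the $pdo connection from the question):

$sql = "SELECT sum(`type` LIKE '%Roller Coaster%') as RollerCoaster,
               sum(`type` LIKE '%Haunted House%') as HauntedHouse,
               sum(`type` LIKE '%Ferris Wheel%') as FerrisWheel
        FROM `tpf_rides`";
$counts = $pdo->query($sql)->fetch(PDO::FETCH_ASSOC);
$coasters = $counts['RollerCoaster']; // and so on for the other counts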
If you want to run those queries only every now and then, you need to keep the result stored somewhere. This can take the form of a pre-calculated sum you manage yourself or a simple cache.
Below is a very simple and naive cache implementation that should work reliably on Linux. Many things can be improved here, but maybe this will give you an idea of what you could do.
The below is not compatible with the query suggested by Gordon Linoff, which returns multiple counts.
The code has not been tested.
$cache_directory = "/tmp/";
$cache_lifetime = 86400; // time to keep cache in seconds. 24 hours = 86400sec
$sql4 = "SELECT count(*) FROM `tpf_rides` WHERE `type` LIKE '%Roller Coaster%'";
$cache_key = md5($sql4); //generate a semi-unique identifier for the query
$cache_file = $cache_directory . $cache_key; // generate full cache file path
if (!file_exists($cache_file) || time() <= strtotime(filemtime($cache)) + $cache_lifetime)
{
// cache file doesn't exist or has expired
$result4 = $pdo->query($sql4);
$coasters = $result4->fetchColumn();
file_put_contents($cache_file, $coasters); // store the result in a cache file
} else {
// file exists and data is up to date
$coasters = file_get_contents($cache_file);
}
I would strongly suggest you break this down into functions that take care of different aspects of the problem.
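For instance, the cache logic above could be wrapped in a small helper so each page only makes one call. A rough, untested sketch:

function cached_count(PDO $pdo, $sql, $lifetime = 86400, $dir = '/tmp/') {
    $file = $dir . md5($sql);
    if (file_exists($file) && time() < filemtime($file) + $lifetime) {
        return (int) file_get_contents($file); // fresh cache hit
    }
    $count = (int) $pdo->query($sql)->fetchColumn(); // cache miss: query the database
    file_put_contents($file, $count);
    return $count;
}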
function generateRandomData() {
    $db = new mysqli('localhost', 'XXX', 'XXX', 'scores');
    if (mysqli_connect_errno()) {
        echo 'Failed to connect to database. Please try again later.';
        exit;
    }

    $query = "insert into scoretable values(?,?,?)";
    for ($a = 0; $a < 1000000; $a++) {
        $stmt = $db->prepare($query);
        $id = rand(1, 75000);
        $score = rand(1, 100000);
        $time = rand(1367038800, 1369630800);
        $stmt->bind_param("iii", $id, $score, $time);
        $stmt->execute();
    }
}
I am trying to populate a data table in mysql with a million rows of data. However, this process is extremely slow. Is there anything obvious I'm doing wrong that I could fix in order to make it run faster?
As hinted in the comments, you need to reduce the number of queries by concatenating as many inserts as possible together. In PHP, it is easy to achieve that:
$query = "insert into scoretable values";
for($a = 0; $a < 1000000; $a++) {
$id = rand(1,75000);
$score = rand(1,100000);
$time = rand(1367038800 ,1369630800);
$query .= "($id, $score, $time),";
}
$query[strlen($query)-1]= ' ';
There is a limit on the maximum size of queries you can execute, which is directly related to the max_allowed_packet server setting (This page of the mysql documentation describes how to tune that setting to your advantage).
Therefore, you will have to reduce the loop count above to keep each query at an appropriate size, and repeat the process by wrapping that code in another loop until you reach the total number of rows you want to insert, roughly as sketched below.
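A rough sketch of that outer loop, assuming $db is the mysqli connection from the question (untested; the batch size of 10,000 is arbitrary, tune it to your max_allowed_packet):

$total = 1000000;
$batch = 10000;
for ($done = 0; $done < $total; $done += $batch) {
    $query = "insert into scoretable values";
    for ($a = 0; $a < $batch; $a++) {
        $id    = rand(1, 75000);
        $score = rand(1, 100000);
        $time  = rand(1367038800, 1369630800);
        $query .= "($id, $score, $time),";
    }
    $db->query(rtrim($query, ',')); // one multi-row INSERT per batch
}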
Another practice is to disable check constraints on the table on which you wish to do the bulk insert:
ALTER TABLE yourtablename DISABLE KEYS;
SET FOREIGN_KEY_CHECKS=0;
-- bulk insert comes here
SET FOREIGN_KEY_CHECKS=1;
ALTER TABLE yourtablename ENABLE KEYS;
This practice however must be done carefully, especially in your case since you generate the values randomly. If you have any unique key within the columns you generate, you cannot use that technique with your query as it is, as it may generate a duplicate key insert. You probably want to add an IGNORE clause to it:
$query = "insert IGNORE into scoretable values";
This will cause the server to silently ignore duplicate entries on unique keys. To reach the total number of required inserts, just loop as many times as needed to fill up the remaining missing lines.
I suppose that the only place where you could have a unique key constraint is on the id column. In that case, you will never be able to reach the number of lines you wish to have, since it is way above the range of random values you generate for that field. Consider raising that limit, or better yet, generate your ids differently (perhaps simply by using a counter, which will make sure every record is using a different key).
You are doing several things wrong. The first thing you have to take into account is which MySQL engine you're using.
The default one is InnoDB; previously the default engine was MyISAM.
I'll write this answer under the assumption you're using InnoDB, which you should be using for a plethora of reasons.
InnoDB operates in something called autocommit mode. That means that every query you make is wrapped in a transaction.
To translate that into a language us mere mortals can understand: every query you run without an explicit BEGIN WORK; block is its own transaction, so MySQL will wait until the hard drive confirms the data has been written.
Knowing that hard drives are slow (mechanical ones are still the most widely used), your inserts will only be as fast as the hard drive. Usually, mechanical hard drives can perform about 300 input/output operations per second; assuming you can do 300 inserts a second, yes, you'll wait quite a bit to insert 1 million records.
So, knowing how things work - you can use them to your advantage.
The amount of data that the HDD will write per transaction will be generally very small (4KB or even less), and knowing today's HDDs can write over 100MB/sec - that indicates that we should wrap several queries into a single transaction.
That way MySQL will send quite a bit of data and wait for the HDD to confirm it wrote everything and that the whole world is fine and dandy.
So, assuming you have 1M rows you want to populate - you'll execute 1M queries. If your transactions commit 1000 queries at a time, you should perform only about 1000 write operations.
That way, your code becomes something like this:
(I am not familiar with mysqli interface so function names might be wrong, and seeing I'm typing without actually running the code - the example might not work so use it at your own risk)
function generateRandomData()
{
    $db = new mysqli('localhost', 'XXX', 'XXX', 'scores');

    if (mysqli_connect_errno()) {
        echo 'Failed to connect to database. Please try again later.';
        exit;
    }

    $query = "insert into scoretable values(?,?,?)";

    // We prepare ONCE, that's the point of prepared statements
    $stmt = $db->prepare($query);

    $start = 0;
    $top = 1000000;

    for ($a = $start; $a < $top; $a++) {
        // If this is the very first iteration, start the transaction
        if ($a == 0) {
            $db->begin_transaction();
        }

        $id = rand(1, 75000);
        $score = rand(1, 100000);
        $time = rand(1367038800, 1369630800);

        $stmt->bind_param("iii", $id, $score, $time);
        $stmt->execute();

        // Commit on every thousandth query
        if (($a % 1000) == 0 && $a != ($top - 1)) {
            $db->commit();
            $db->begin_transaction();
        }

        // If this is the very last query, then we just need to commit and end
        if ($a == ($top - 1)) {
            $db->commit();
        }
    }
}
DB querying involves many interrelated tasks. As a result it is an 'expensive' process. It is even more 'expensive' when it comes to insertion/update.
Running query once is the best way to enhance performance.
You can build up the whole statement in the loop and run it once, e.g.:
$query = "insert into scoretable values ";
for($a = 0; $a < 1000000; $a++)
{
$values = " ('".$?."','".$?."','".$?."'), ";
$query.=$values;
...
}
...
//remove the last comma
...
$stmt = $db->prepare($query);
...
$stmt->execute();
Have a look at this gist I've created. It takes about 5 minutes to insert a million rows on my laptop.
I'm having problems debugging a failing mysql 5.1 insert under PHP 5.3.4. I can't seem to see anything in the mysql error log or php error logs.
Based on a Yahoo presentation on efficient pagination, I was adding order numbers to posters on my site (order rank, not order sales).
I wrote a quick test app and asked it to create the order numbers on one category. There are 32,233 rows in that category and each and every time I run it I get 23,304 rows updated. Each and every time. I've increased memory usage, I've put ini settings in the script, I've run it from the PHP CLI and PHP-FPM. Each time it doesn't get past 23,304 rows updated.
Here's my script, which I've added massive timeouts to.
include 'common.inc'; // database connection stuff

ini_set("memory_limit", "300M");
ini_set("max_execution_time", "3600");
ini_set('mysql.connect_timeout', '3600');
ini_set('mysql.trace_mode', 'On');
ini_set('max_input_time', '3600');

$sql1 = "SELECT apcatnum FROM poster_categories_inno LIMIT 1";
$result1 = mysql_query($sql1);

while ($cats = mysql_fetch_array($result1)) {
    $sql2 = "SELECT poster_data_inno.apnumber, poster_data_inno.aptitle
             FROM poster_prodcat_inno, poster_data_inno
             WHERE poster_prodcat_inno.apcatnum = '$cats[apcatnum]'
               AND poster_data_inno.apnumber = poster_prodcat_inno.apnumber
             ORDER BY aptitle ASC";
    $result2 = mysql_query($sql2);

    $ordernum = 1;
    while ($order = mysql_fetch_array($result2)) {
        $sql3 = "UPDATE poster_prodcat_inno SET catorder='$ordernum'
                 WHERE apnumber='$order[apnumber]' AND apcatnum='$cats[apcatnum]'";
        $result3 = mysql_query($sql3);
        $ordernum++;
    } // end of 2nd while
}
I'm at a head-scratching loss. Just did a test on a smaller category and only 13,199 out of 17,662 rows were updated. For the two experiments only 72-74% of the rows are getting updated.
I'd say your problem lies with your 2nd query. Have you done an EXPLAIN on it? Because of the ORDER BY clause a filesort will be required. If you don't have appropriate indices that can slow things down further. Try this syntax and sub in a valid integer for your apcatnum variable during testing.
SELECT d.apnumber, d.aptitle
FROM poster_prodcat_inno p JOIN poster_data_inno d
  ON d.apnumber = p.apnumber
WHERE p.apcatnum = '{$cats['apcatnum']}'
ORDER BY d.aptitle ASC;
Secondly, since catorder is just an integer version of the combination of apcatnum and aptitle, it's a denormalization for convenience's sake. This isn't necessarily bad, but it does mean that you have to update it every time you add a new title or category. Perhaps it might be better to partition your poster_prodcat_inno table by apcatnum and just do the JOIN with poster_data_inno when you actually need the catorder.
Please escape your query input, even if it does come from your own database (quotes and other characters will get you every time). Your SQL statement is also incorrect because you're not using the variables correctly; use hints, such as:
while ($order = mysql_fetch_array($result2)) {
    $order = array_map('mysql_real_escape_string', $order);
    $sql3 = "UPDATE poster_prodcat_inno SET catorder='$ordernum' WHERE apnumber='{$order['apnumber']}' AND apcatnum='{$cats['apcatnum']}'";
}