I am using the following code:
for ($x = 0; $x < count($response[0]); $x++) { //approx count 50-60
$t = $response[0][$x];
$query = "INSERT INTO tableX (time,title) VALUES ('$date','$t')";
if ($query_run = mysqli_query($mysql_connect, $query)) {
//Call some functions if the insertion happens
}
}
mysqli_close($mysql_connect);
Title in the table is a primary key. I will call some functions only when the insertion is successful, i.e., when no existing title is provided. The title and date are derived from a CSV file.
How can I improve the performance of this code? When should I use unset to save memory and CPU cycles?
unset will not improve performance by any meaningful metric here.
What will improve performance is inserting more than one row per query.
Let's take your code as an example. This is just one example of how it could look. If you need to run other functions after each insert, then you might want to insert, say, 10 or 100 rows at a time instead of ALL rows at once (a chunked variant is sketched after the example below).
$query = "INSERT INTO tableX (time,title) VALUES ";
$valueQuery = array();
for ($x = 0; $x < count($response[0]); $x++) { //approx count 50-60
$t = $response[0][$x];
$valueQuery[] = "('$date','$t')";
}
$query .= implode(", ",$valueQuery);
if ($query_run = mysqli_query($mysql_connect, $query)) {
//Call some functions if the insertion happens
}
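If you go the batching route, here is a rough sketch of what chunks of 100 rows could look like, reusing the $response, $date and $mysql_connect variables from the question (the chunk size is arbitrary, and each value is escaped so a stray quote doesn't break the query):
$chunks = array_chunk($response[0], 100); // insert 100 rows per query
foreach ($chunks as $chunk) {
    $values = array();
    foreach ($chunk as $t) {
        $t = mysqli_real_escape_string($mysql_connect, $t); // keep the SQL valid
        $values[] = "('$date','$t')";
    }
    $query = "INSERT INTO tableX (time,title) VALUES " . implode(", ", $values);
    if (mysqli_query($mysql_connect, $query)) {
        //Call the follow-up functions for this batch of rows
    }
}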
This question is entirely based on the wrong premises.
First of all, using unset will nowhere save you a CPU cycle; it will rather consume them.
Besides, there is not much place to put unset anyway.
Finally, there are no real performance issues with this code. If you are experiencing any, you should measure them, detect the real bottleneck, and then fix it, instead of poking around, barking up random trees.
What should be your actual concern, instead of fictional performance issues, is that your code is wide open to SQL injection and syntax errors of all sorts.
You could improve the shown code by using prepared statements for your insert (1). As it is, in every iteration you first have to build the $query string and then execute it. On each execution the database also has to parse your statement first.
By preparing the statement before entering the loop you only define it once and can simply execute it with the different values. Also, the statement is only parsed when it gets prepared; it does not get parsed on each execute.
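As a rough sketch of what that could look like with the code from the question (same $response, $date and $mysql_connect variables), the statement is prepared once and only the values change per iteration; the placeholders also remove the SQL injection problem mentioned above:
$stmt = mysqli_prepare($mysql_connect, "INSERT INTO tableX (time,title) VALUES (?,?)");
mysqli_stmt_bind_param($stmt, 'ss', $date, $t); // bound by reference, so $t can change on each pass
for ($x = 0; $x < count($response[0]); $x++) {
    $t = $response[0][$x];
    if (mysqli_stmt_execute($stmt)) {
        //Call some functions if the insertion happens
    }
}
mysqli_stmt_close($stmt);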
Here is a small article on how the garbage collector in PHP works: http://www.hackingwithphp.com/18/1/10, found in this question: Is there garbage collection in PHP?
PHP performs garbage collection at three primary junctures:
When you tell it to
When you leave a function
When the script ends
So maybe you should check how much of your program you can encapsulate into functions, though there is not much to see in your snippet.
(1)
I suggest prepared statements to improve efficiency. In my opinion it is best practice to reduce workload, here the unnecessary re-parsing of the query.
That said, I have no estimate of the actual performance improvement; since this case involves only 50 to 60 iterations, it would be minimal, if even noticeable.
Related:
Regarding what is more efficient for executing multiple queries:
this, with the nextRowset() function to move over the query results:
$stmt = $db->query("SELECT 1; SELECT 2;");
$info1 = $stmt->fetchAll();
$stmt->nextRowset();
$info2 = $stmt->fetchAll();
or multiple separate executions, which are a lot easier to manage?
$info1 = $db->query("SELECT 1;")->fetchAll();
$info2 = $db->query("SELECT 2;")->fetchAll();
Performance of the code is likely to be similar.
The code at the bottom, to me, is more efficient for your software design because:
it is more readable
it can be changed with less chance of error since each of them addresses 1 query only
individual query and its interaction can be moved to a different function easily and can be tested individually
That's why I feel that overall efficiency (not just how fast data comes back from DB to PHP to the user, but also maintainability/refactoring of code) will be better with the code at the bottom.
"SQL injection" by a hacker is easier when you issue multiple statements at once. So, don't do it.
If you do need it regularly, write a Stored Procedure to perform all the steps via one CALL statement. That will return multiple "rowsets", so similar code will be needed.
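A minimal sketch of that pattern with PDO, assuming a hypothetical stored procedure get_both_results() that contains the two SELECT statements:
$stmt = $db->query("CALL get_both_results()"); // get_both_results() is hypothetical
$info1 = $stmt->fetchAll();
$stmt->nextRowset();
$info2 = $stmt->fetchAll();
$stmt->closeCursor(); // free the remaining rowsets before running further queries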
This is an optimisation question regarding first principles. Imagine I am doing a big, heavy-lifting comparison: 30k files vs 30k database entries. Is it more efficient to run one big MySQL query into an array and then loop through the physical files checking against the array, or is it better to loop through the files and do one small MySQL query at a time?
Here is some pseudo code to help explain:
//is this faster?
foreach($recursiveFileList as $fullpath){
$Record = $db->queryrow("SELECT * FROM files WHERE fullpath='".$fullpath."'");
//do some $Record logic
}
//or is this faster
$BigList = array();
$db->query("SELECT * FROM files");
while($Record = $db->rows()){
$BigList[$Record['fullpath']] = $Record;
}
foreach($recursiveFileList as $fullpath){
if (isset($BigList[$fullpath])){
$Record = $BigList[$fullpath];
//do some $Record logic
}
}
Update: if you always know that your $recursiveFileList is 100% of the table, then doing one query per row would be needless overhead. In that case, just use SELECT * FROM files.
I wouldn't use either of the two styles you show.
The first style runs one separate SQL query for each individual fullpath. This causes some overhead of SQL parsing, optimization, etc. Keep in mind that MySQL does not have the capability of remembering the query optimization from one invocation of a similar query to the next; it analyzes and optimizes the query every time. The overhead is relatively small, but it adds up.
The second style shows fetching all rows from the table, and sorting it out in the application layer. This has a lot of overhead, because typically your $recursiveFileList might match only 1% or 0.1% or an even smaller portion of the rows in the table. I have seen cases where transferring excessive amounts of data over the network literally exhausted a 1Gbps network switch, and this put a ceiling on the requests per second for the application.
Use query conditions and indexes wisely to let the RDBMS examine and return only the matching rows.
The two styles you show are not the only options. What I would suggest is to use a single query with an IN() predicate to match multiple fullpath values at once.
$sql = "SELECT * FROM files WHERE fullpath IN ("
. array_fill(0, count($recursiveFileList), "?") . ")";
$stmt = $pdo->prepare($sql);
$stmt->execute($recursiveFileList);
while ($row = $stmt->fetch()) {
//do some $Record logic
}
Note I also use a prepared query with ? parameter placeholders, and then pass the array of fullpath values separately when I call execute(). PDO is nice for this, because you can just pass an array, and the array elements get matched up to the parameter placeholders.
This also solves the risk of SQL injection in this case.
I often run into the situation where I want to determine whether a value is in a table. Queries often happen in a short time period, with similar values being searched, therefore I want to do this in the most efficient way. What I have now is
if($statement = mysqli_prepare($link, 'SELECT name FROM inventory WHERE name = ? LIMIT 1'))//name and inventory are arbitrarily chosen for this example
{
mysqli_stmt_bind_param($statement, 's', $_POST['check']);
mysqli_stmt_execute($statement);
mysqli_stmt_bind_result($statement, $result);
mysqli_stmt_store_result($statement);//needed for mysqli_stmt_num_rows
mysqli_stmt_fetch($statement);
}
if(mysqli_stmt_num_rows($statement) == 0)
//value in table
else
//value not in table
Is it necessary to call all the mysqli_stmt_* functions? As discussed in this question, for mysqli_stmt_num_rows() to work the entire result set must be downloaded from the database server. I'm worried this is a waste and takes too long, as I know there will be 1 or 0 rows. Would it be more efficient to use the SQL count() function and not bother with mysqli_stmt_store_result()? Any other ideas?
I noticed the prepared statement manual says "A prepared statement or a parametrized statement is used to execute the same statement repeatedly with high efficiency". What is highly efficient about it, and what does "same statement" mean? For example, if two separate prepared statements evaluated to be the same, would it still be more efficient?
By the way I'm using MySQL but didn't want to add the tag as a solution may be non-MySQL specific.
if($statement = mysqli_prepare($link, 'SELECT name FROM inventory WHERE name = ? LIMIT 1'))//name and inventory are arbitrarily chosen for this example
{
mysqli_stmt_bind_param($statement, 's', $_POST['check']);
mysqli_stmt_execute($statement);
mysqli_stmt_store_result($statement);
}
if(mysqli_stmt_num_rows($statement) == 0)
//value not in table
else
//value in table
I believe this would be sufficient. Note that I switched the //value not in table and //value in table comments.
It really depends on the type of the field you are searching. Make sure you have an index on that field and that the index fits in memory. If it does, use SELECT COUNT(*) FROM <your_table> WHERE <cond_which_uses_index> LIMIT 1. The important part is the LIMIT 1, which prevents unnecessary lookups. You can run EXPLAIN SELECT ... to see which indexes are used, and possibly hint or ban some of them; that's up to you. COUNT(*) works very fast; it is optimized by design to return a result very quickly (MyISAM only; for InnoDB the whole story is a bit different due to ACID). The main difference between COUNT(*) and SELECT <some_field(s)> is that the count doesn't perform any data reading, and with (*) it doesn't care whether some field is NULL or not; it just counts rows by the most suitable index (chosen internally). Actually, I would suggest that even for InnoDB this is the fastest technique.
The use case also matters. If you want to insert a unique value, add a constraint on that field and use INSERT IGNORE; if you want to delete a value which may not be in the table, run DELETE IGNORE; and the same goes for UPDATE IGNORE.
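For example, a sketch using the inventory/name table from the question, assuming a UNIQUE constraint already exists on name ('some value' is just a placeholder):
mysqli_query($link, "INSERT IGNORE INTO inventory (name) VALUES ('some value')");
echo mysqli_affected_rows($link); // 1 if the row was inserted, 0 if the value already existed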
The query analyzer determines by itself whether two queries are the same or not and manages the query cache; you don't have to worry about it.
The difference between a prepared and a regular query is that the former keeps the statement structure and the data separate, so the analyzer knows which data is dynamic and can handle and optimize it better. It can do the same for a regular query, but for a prepared one we declare that we will reuse it later and give a hint about which parts are variable and which are fixed. I'm not very well versed in MySQL internals, so you could ask such questions on more specific sites to understand the details.
P.S.: Prepared statements in MySQL are global to the session, so once the session in which they are defined ends, they are deallocated. The exact behavior and possible internal MySQL caching are a subject for additional investigation.
This is the kind of thing in-memory caches are really good at. Something like this should work better than most micro-optimization attempts (pseudocode!):
function check_if_value_is_in_table($value) {
if ($cache->contains_key($value)) {
return $cache->get($value);
}
// run the SQL query here, put result in $result
// note: I'd benchmark if using mysqli_prepare actually helps
// performance-wise
$cache->put($value, $result);
return $result;
}
Have a look at memcache or the various alternatives.
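A rough sketch of the same idea with PHP's Memcached extension, reusing the inventory/name example from the question (the key prefix and the five-minute TTL are arbitrary choices):
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);
function check_if_value_is_in_table(mysqli $link, Memcached $cache, $value) {
    $key = 'inventory:' . md5($value);
    $cached = $cache->get($key);
    if ($cached !== false || $cache->getResultCode() === Memcached::RES_SUCCESS) {
        return $cached; // cache hit, no database round trip
    }
    $stmt = mysqli_prepare($link, 'SELECT 1 FROM inventory WHERE name = ? LIMIT 1');
    mysqli_stmt_bind_param($stmt, 's', $value);
    mysqli_stmt_execute($stmt);
    mysqli_stmt_store_result($stmt);
    $result = mysqli_stmt_num_rows($stmt) > 0;
    mysqli_stmt_close($stmt);
    $cache->set($key, $result, 300); // cache the answer for five minutes
    return $result;
}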
There is one PHP file governing rotating ads that is causing serious server performance issues and "too many connections" SQL errors on the site. Here is the PHP script. Can anyone give me some insight into how to correct this, as I am a novice at PHP?
<?
require("../../admin/lib/config.php");
// connect to database
mysql_pconnect(DB_HOST,DB_USER,DB_PASS);
mysql_select_db(DB_NAME);
$i = 1;
function grab()
{
$getBanner = mysql_query("SELECT * FROM sponsor WHERE active='Y' AND ID != 999 AND bannerRotation = '0' ORDER BY RAND() LIMIT 1");
$banner = mysql_fetch_array($getBanner);
if ($banner['ID'] == ''){
mysql_query("UPDATE sponsor SET bannerRotation = '0'");
}
if (file_exists(AD_PATH . $banner['ID'] . ".jpg")){
$hasAd = 1;
}
if (file_exists(BANNER_PATH . $banner['ID'] . ".jpg")){
return "$banner[ID],$hasAd";
} else {
return 0;
}
}
while ($i <= 3){
$banner = grab();
if ($banner != 0){
$banner = explode(",",$banner);
mysql_query("UPDATE sponsor SET bannerView = bannerView + 1 WHERE ID='$banner[0]'");
mysql_query("UPDATE sponsor SET bannerRotation = '1' WHERE ID = '$banner[0]'");
echo "banner$i=$banner[0]&hasAd$i=$banner[1]&";
$i++;
}
}
?>
I see you are not using mysqli.
The problem is that mysql_pconnect() opens a persistent connection to the database that is not closed at the end of execution, and as you are not calling mysql_close() anywhere, the connection never gets closed.
It's all in the manual: http://php.net/manual/en/function.mysql-pconnect.php
Well, the good news for your client is that the previous developer abandoned the project. He could only have done more damage if he had continued working on it.
This script is using ext/mysql, not ext/mysqli. It would be better to use mysqli or PDO_mysql, since ext/mysql is deprecated.
It's recommended to use the full PHP open tag syntax (<?php), not the short-tags syntax (<?). The reason is that not every PHP environment enables the use of short tags, and if you deploy code into such an environment, your code will be viewable by anyone browsing to the page.
This script does no error checking. You should always check for errors after attempting to connect to a database or submitting a query.
The method of using ORDER BY RAND() LIMIT 1 to choose a random row from a database is well known to be inefficient, and it cannot be optimized. As the table grows to have more than a trivial number of rows, this query is likely to be your bottleneck. See some of my past answers about optimizing ORDER BY RAND queries, or a great blog by Jan Kneschke on selecting random rows.
Even if you are stuck using ORDER BY RAND(), there's no need to call it three times to get three distinct random sponsors. Just use ORDER BY RAND() LIMIT 3. Then you don't need the complex and error-prone update against bannerRotation to ensure that you get sponsors that haven't been chosen before.
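For instance, a sketch of that single query (using mysqli, as recommended above; $mysqli is assumed to be an open connection):
$result = $mysqli->query("SELECT ID FROM sponsor WHERE active = 'Y' AND ID != 999 ORDER BY RAND() LIMIT 3");
$banners = array();
while ($row = $result->fetch_assoc()) {
    $banners[] = $row['ID']; // three distinct sponsors from one query
}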
Using SELECT * fetches all the columns, even though they aren't needed for this function.
If a sponsor isn't eligible for random selection, i.e. if it has active!='Y' or if its ID=999, then I would move it to a different table. This will simplify your queries, and make the table of sponsors smaller and quicker to query.
The UPDATE in the grab() function has no WHERE clause, so it applies to all rows in the sponsor table. I don't believe this is intentional. I assume it should apply only to the single row WHERE ID=$banner['ID'].
This code has two consecutive UPDATE statements against the same row of the same table. Combine these into a single UPDATE statement that modifies two columns.
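That is, something along these lines (a sketch; $mysqli is assumed, and $id must already be validated or escaped):
$mysqli->query("UPDATE sponsor SET bannerView = bannerView + 1, bannerRotation = '1' WHERE ID = '$id'");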
The grab() function appends values together separated by commas, and then explodes that string into an array as soon as it returns. As if the programmer doesn't know that a function can return an array.
Putting the $i++ inside a conditional block makes it possible for this code to run in an infinite loop. That means this script can run forever. Once a few dozen of these are running concurrently, you'll run out of connections.
This code uses no caching. Any script that serves ad banners must be quick, and doing multiple updates to the database is not going to be quick enough. You need to use some in-memory caching for reads and writes. A popular choice is memcached.
Why is this client coding their own ad-banner server so inexpertly? Just use Google DFP Small Business.
Yikes!
grab() is being called from within a loop, but is not parameterized. Nor does there seem to be any rationale for repeatedly calling it.
A 200% speedup is easily realizable.
I'm creating a PHP script that imports some data from text files into a MySQL database. These text files are pretty large, an average file will have 10,000 lines in it each of which corresponds to a new item I want in my database. (I won't be importing files very often)
I'm worried that reading a line from the file and then doing an INSERT query, 10,000 times in a row, might cause some issues. Is there a better way for me to do this? Should I perform one INSERT query with all 10,000 values? Or would that be just as bad?
Maybe I can reach a medium, and perform something like 10 or 100 entries at once. Really my problem is that I don't know what is good practice. Maybe 10,000 queries in a row is fine and I'm just worrying for nothing.
Any suggestions?
Yes, it is better to perform one INSERT query with all the values:
<?php
$lines = file('file.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES); // strip trailing newlines so they don't end up inside the values
// note: each $line should also be escaped (e.g. with mysqli_real_escape_string) before running this against a real database
$count = count($lines);
$i = 0;
$query = "INSERT INTO table VALUES ";
foreach($lines as $line){
$i++;
if ($count == $i) {
$query .= "('".$line."')";
}
else{
$query .= "('".$line."'),";
}
}
echo $query;
http://sandbox.phpcode.eu/g/5ade4.php
This will build one single query, which is many times faster than the one-line-one-query style!
Use prepared statements, as suggested by the authors of High Performance MySQL. They save a lot of time by sparing the wasteful, repeated transmission and parsing of the SQL text.
I would do it in one large query with all the values at once. Just to be sure, though, make sure you run START TRANSACTION; before and COMMIT; afterwards, so that if something goes wrong during the execution of the query (which is possible, since it will most likely run for a fairly long time), the database will not be affected.
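A minimal sketch of that with mysqli (the items table and name column are placeholders; $mysqli is an open connection and $lines holds the rows read from the file):
$mysqli->begin_transaction();
$values = array();
foreach ($lines as $line) {
    $values[] = "('" . $mysqli->real_escape_string(trim($line)) . "')";
}
if ($mysqli->query("INSERT INTO items (name) VALUES " . implode(",", $values))) {
    $mysqli->commit();
} else {
    $mysqli->rollback(); // nothing is written if the big INSERT fails
}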