Most efficient way of determining if a value is in a table - PHP

I often run into the situation where I want to determine whether a value is in a table. These queries happen frequently within a short time period, often with similar values being searched, so I want to do this in the most efficient way. What I have now is:
if($statement = mysqli_prepare($link, 'SELECT name FROM inventory WHERE name = ? LIMIT 1'))//name and inventory are arbitrarily chosen for this example
{
    mysqli_stmt_bind_param($statement, 's', $_POST['check']);
    mysqli_stmt_execute($statement);
    mysqli_stmt_bind_result($statement, $result);
    mysqli_stmt_store_result($statement);//needed for mysqli_stmt_num_rows
    mysqli_stmt_fetch($statement);
}
if(mysqli_stmt_num_rows($statement) == 0) {
    //value in table
} else {
    //value not in table
}
Is it necessary to call all the mysqli_stmt_* functions? As discussed in this question, for mysqli_stmt_num_rows() to work the entire result set must be downloaded from the database server. I'm worried this is wasteful and takes too long, since I know there will be 1 or 0 rows. Would it be more efficient to use the SQL COUNT() function and not bother with mysqli_stmt_store_result()? Any other ideas?
I noticed the prepared statement manual says "A prepared statement or a parametrized statement is used to execute the same statement repeatedly with high efficiency". What is highly efficient about it, and what does "same statement" mean? For example, if two separately prepared statements evaluated to the same text, would it still be more efficient?
By the way, I'm using MySQL but didn't want to add the tag, as a solution may not be MySQL-specific.

if($statement = mysqli_prepare($link, 'SELECT name FROM inventory WHERE name = ? LIMIT 1'))//name and inventory are arbitrarily chosen for this example
{
    mysqli_stmt_bind_param($statement, 's', $_POST['check']);
    mysqli_stmt_execute($statement);
    mysqli_stmt_store_result($statement);
}
if(mysqli_stmt_num_rows($statement) == 0) {
    //value not in table
} else {
    //value in table
}
I believe this would be sufficient. Note that I switched //value not in table
and //value in table.

It really depends on the type of the field you are searching on. Make sure you have an index on that field and that the index fits in memory. If it does, use SELECT COUNT(*) FROM <your_table> WHERE <cond_which_uses_index> LIMIT 1. The important part is LIMIT 1, which prevents unnecessary lookups. You can run EXPLAIN SELECT ... to see which indexes are used, and possibly add a hint or ban some of them; that's up to you.
COUNT(*) is very fast: it is optimized by design to return a result quickly (for MyISAM at least; for InnoDB things are a bit different because of ACID). The main difference between COUNT(*) and SELECT <some_field(s)> is that COUNT doesn't read any row data, and with (*) it doesn't care whether some field is NULL or not; it just counts rows using the most suitable index (chosen internally). I would suggest that even for InnoDB this is the fastest technique.
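For example, a minimal sketch of that COUNT(*) check with mysqli, reusing the inventory/name example from the question (assuming $link is an open connection and name is indexed):
if($stmt = mysqli_prepare($link, 'SELECT COUNT(*) FROM inventory WHERE name = ?')) {
    mysqli_stmt_bind_param($stmt, 's', $_POST['check']);
    mysqli_stmt_execute($stmt);
    mysqli_stmt_bind_result($stmt, $count);
    mysqli_stmt_fetch($stmt);    // COUNT(*) always returns exactly one row, so no store_result needed
    mysqli_stmt_close($stmt);
}
if ($count > 0) {
    // value in table
} else {
    // value not in table
}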
Your use case also matters. If you want to insert a unique value, put a UNIQUE constraint on that field and use INSERT IGNORE; if you want to delete a value which may not be in the table, run DELETE IGNORE, and the same goes for UPDATE IGNORE.
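A rough sketch of that idea in SQL, again using the inventory/name example (the uniq_name index name and the 'widget' value are made up for this sketch):
ALTER TABLE inventory ADD UNIQUE KEY uniq_name (name);

-- with the unique key in place, duplicate inserts are silently skipped instead of raising an error
INSERT IGNORE INTO inventory (name) VALUES ('widget');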
The query analyzer determines by itself whether two queries are the same or not and manages the query cache; you don't have to worry about it.
The difference between a prepared and a regular query is that the first one keeps the statement and the data separate, so the analyzer knows which parts are dynamic and can handle and optimize them better. It can do much of the same for a regular query, but with a prepared statement we say up front that we will reuse it later and give a hint about which data is variable and which is fixed. I'm not an expert on MySQL internals, so you may want to ask on a more specialized site to understand the details.
P.S.: Prepared statements in MySQL are session-scoped, so they are deallocated when the session in which they were defined ends. Exact behavior and possible internal MySQL caching are a subject for additional investigation.
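For reference, this is roughly what a server-side prepared statement looks like at the SQL level; the statement name, the user variable, and the value are made up for this sketch:
PREPARE check_name FROM 'SELECT name FROM inventory WHERE name = ? LIMIT 1';
SET @needle = 'widget';
EXECUTE check_name USING @needle;
DEALLOCATE PREPARE check_name;  -- also happens automatically when the session ends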

This is the kind of thing in-memory caches are really good at. Something like this should work better than most micro-optimization attempts (pseudocode!):
function check_if_value_is_in_table($value) {
    if ($cache->contains_key($value)) {
        return $cache->get($value);
    }
    // run the SQL query here, put result in $result
    // note: I'd benchmark if using mysqli_prepare actually helps
    // performance-wise
    $cache->put($value, $result);
    return $result;
}
Have a look at memcache or the various alternatives.
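For illustration, here is a rough sketch with the PHP Memcached extension; it assumes a memcached server on localhost, a PDO connection in $pdo, and the inventory/name table from the question, and the key prefix and 60-second TTL are arbitrary:
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

function check_if_value_is_in_table(PDO $pdo, Memcached $cache, $value)
{
    $key = 'inventory_has_' . md5($value);
    $cached = $cache->get($key);
    if ($cache->getResultCode() === Memcached::RES_SUCCESS) {
        return (bool)$cached;   // cache hit, skip the database entirely
    }

    $stmt = $pdo->prepare('SELECT 1 FROM inventory WHERE name = ? LIMIT 1');
    $stmt->execute(array($value));
    $result = (bool)$stmt->fetchColumn();

    $cache->set($key, $result ? 1 : 0, 60);   // remember the answer for 60 seconds
    return $result;
}
Called as check_if_value_is_in_table($pdo, $cache, $_POST['check']), it only hits MySQL when the cache misses.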

Related

Is there a better way to combat SQL Injection?

I've watched Computerphile's video on this subject many times (for any of you who are interested, this is the link: https://www.youtube.com/watch?v=_jKylhJtPmI). He provides some really good advice on how to combat SQL injection and make your app more effective. These are the key points from his video:
Don't use raw, unprotected SQL commands, because this is how hackers can perform an SQL injection, stealing, modifying, or even deleting your data.
A good approach is to use the mysql_real_escape_string() function. This basically places a backslash (\) before every dangerous character (quotes, backslashes, and so on), so the quote or backslash inside the string becomes harmless.
The best thing to do is to use prepared statements. So, you basically say:
SELECT * FROM USERS WHERE username = ?
Later you replace the question mark with the string you want to use as the user name. This has the advantage of keeping things simple: you just say "replace this placeholder with the string" and tell the database that what is given is only a string value and nothing more than that.
That is all good, but this video is really outdated. It was made back in 2013, and a lot of new technology has emerged since then. So I searched the internet to find out whether there are any newer approaches or whether this is still the one, but either I couldn't find anything or I found something that was super confusing.
So, my question is: Has a better, more modern way to combat SQL injection been introduced, or are prepared statements still the norm, and are they vulnerable to any kind of attack?
Parameter binding is still the best solution in most cases where you combine dynamic data with an SQL query.
You should understand why. It's NOT just doing a string substitution for you. You could do that yourself.
It works because it separates the dynamic value from the SQL-parsing step. The RDBMS parses the SQL syntax during prepare():
$stmt = $pdo->prepare("SELECT * FROM USERS WHERE username = ?");
After this point, the RDBMS knows that the ? must only be a single scalar value. Not anything else. Not a list of values, not a column name, not an expression, not a subquery, not a UNION to a second SELECT query, etc.
Then you send the value to be bound to that placeholder in the execute step.
$stmt->execute( [ "taraiordanov" ] );
The value is sent to the RDBMS server and takes its place in the query, but only as a value, and then the query can be executed.
This allows you to execute the query multiple times with different values plugged in, even though the SQL parser only needed to parse the query once. The server remembers how to plug a new value into the original prepared SQL query, so you can execute() as many times as you want:
$stmt->execute( [ "hpotter" ] );
$stmt->execute( [ "hgranger" ] );
$stmt->execute( [ "rweasley" ] );
...
Are prepared statements the best? Yes, they are. It doesn't matter that the advice comes from 2013; it's still true. Actually, this feature of SQL dates back a lot further than that.
So are query parameters the foolproof way of defending against SQL injection? Yes they are, if you need to combine a variable as a value in SQL. That is, you intend for the parameter to substitute in your query where you would otherwise use a quoted string literal, a quoted date literal, or a numeric literal.
But there are other things you might need to do with queries too. Sometimes you need to build an SQL query piece by piece based on conditions in your application. For example, what if you want to search by username but sometimes also add a term to your search on the last_login date? A parameter can't add a whole new term to the search.
This isn't allowed:
$OTHER_TERMS = "and last_login > '2019-04-01'";
$stmt = $pdo->prepare("SELECT * FROM USERS WHERE username = ? ?");
$stmt->execute( [ "taraiordanov", $OTHER_TERMS ] ); // DOES NOT WORK
What if you want to allow the user to request sorting a result, and you want to let the user choose which column to sort by, and whether to sort ascending or descending?
$stmt = $pdo->prepare("SELECT * FROM USERS WHERE username = ? ORDER BY ? ?");
$stmt->execute( [ "taraiordanov", "last_login", "DESC" ] ); // DOES NOT WORK
In these cases, you must put the column names and the syntax for the query terms into your SQL string before prepare(). You just have to be extra careful not to let untrusted input contaminate the dynamic parts you put in the query. That is, make sure they're based on string values you have complete control over in your code, not anything from outside the app, like user input, a file, or the result of calling an API.
Re comments:
The idea Martin is adding is sometimes called whitelisting. I'll write out Martin's example in a more readable manner:
switch ($_GET['order']) {
    case "desc":
        $sqlOrder = "DESC";
        break;
    default:
        $sqlOrder = "ASC";
        break;
}
I replaced Martin's case "asc" with default so that if the user input is anything else -- even something malicious -- the only thing that can happen is that the input falls back to SQL order ASC.
This means there are only two possible outcomes, ASC or DESC. Once your code has complete control over the possible values, and you know both values are safe, then you can interpolate the value into your SQL query.
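Putting the two techniques together might look roughly like this; the USERS table and last_login column are the ones from the examples above, while reading the username from $_GET is just an assumption for this sketch:
$sqlOrder = (isset($_GET['order']) && $_GET['order'] === 'desc') ? 'DESC' : 'ASC';   // whitelisted, safe to interpolate

$stmt = $pdo->prepare("SELECT * FROM USERS WHERE username = ? ORDER BY last_login $sqlOrder");
$stmt->execute(array($_GET['username']));   // the value itself still goes through a placeholder
$rows = $stmt->fetchAll();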
In short: always keep in mind the assumption that $_GET and $_POST may contain malicious content. It's easy for a client to put anything they want into the request; they are not limited to the values in your HTML form.
Write your code defensively with that assumption in mind.
Another tip: Many people think that client input in $_GET and $_POST are the only inputs you need to protect against. This is not true! Any source of input can contain problematic content. Reading a file and using that in your SQL query, or calling an API, for example.
Even data that has previously been inserted in your database safely can introduce SQL injection if you use it wrong.
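For instance, a contrived sketch: even a username read back from your own database should still go through a placeholder when it is reused in another query (the orders table and placed_by column are invented for this example):
// $username was fetched from the database, but it originally came from a user,
// so treat it as untrusted input here as well
$stmt = $pdo->prepare('SELECT * FROM orders WHERE placed_by = ?');
$stmt->execute(array($username));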

PDO rowCount() used often

I often use the function rowCount of PDO like this for example:
$sql = $dataBase->prepare('SELECT email, firstname, lastname
                           FROM pr__user
                           WHERE id = :id');
$sql->execute(array('id' => $_SESSION['user_id']));
$rowCount = $sql->rowCount();
It has worked fine all the time, but I saw this in the PHP manual:
If the last SQL statement executed by the associated PDOStatement was
a SELECT statement, some databases may return the number of rows
returned by that statement. However, this behaviour is not guaranteed
for all databases and should not be relied on for portable
applications.
http://php.net/manual/en/pdostatement.rowcount.php
It works fine with MySQL and MariaDB, so I kept on using it. But as I use it in an application I want to be portable, should I modify my code?
I never ask for the row count. Querying always returns an array of result rows (in some format), so I can simply ask how many rows are in the array -- for example with PHP's count() function.
What you're missing is that PDO is an interface to many different databases, not just MySQL. They make no guarantees that the function will return the same sort of values on completely different back-ends.
This is what "for portable applications" means: If you want your code to run universally on an arbitrary database you may need to avoid using that function. If that's not the case, you're not writing generic library code, you can depend on MySQL's particular behaviour.
Just be sure to test whatever you're doing to ensure that assumption is reasonable.
It's not so much a portability problem; in a case like yours, rowCount() is simply pointless and superfluous. You are bloating your code for no reason.
From your example it is evident that you are going to use the selected data (otherwise there would be no point in selecting email). That means you can use the data itself instead of the row count all the way through. Assuming the next call would be fetch(), you can omit rowCount():
$sql = $dataBase->prepare('SELECT email, firstname, lastname
                           FROM pr__user
                           WHERE id = :id');
$sql->execute(array('id' => $_SESSION['user_id']));
$user = $sql->fetch();
if ($user) { ...
and from then on you can use $user in every condition where $rowCount was used, simply because you don't actually need a count here but rather a boolean flag, and for that purpose an array serves just as well as an integer.
Even in the case where you don't need email but only want to know whether the user exists, you can simply select a single scalar value and then fetch it, so your code remains uniform.
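For example, a small sketch of that existence check against the same pr__user table:
$sql = $dataBase->prepare('SELECT 1 FROM pr__user WHERE id = :id');
$sql->execute(array('id' => $_SESSION['user_id']));
$userExists = (bool)$sql->fetchColumn();   // false when no row matched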

SQL query dos and don'ts: WHERE and AND

I have a query with 4 conditions:
$query = "
    SELECT ROW, col
    FROM mytable
    WHERE ROW = $num
      AND col = '$col'
      AND loc = '80'
      AND user_id = $_SESSION[id]
";
Should I keep only the first condition in the SQL and then filter the rows in PHP to apply the other AND conditions?
I'm looking for the fastest way to retrieve and parse the data.
There is technically nothing wrong with the methodology, other than the fact that you may want to escape the data you interpolate (or, better yet, use prepared statements).
Also, what will be crucial is whether or not you have a proper index on the columns named in your conditions. That way your database engine doesn't have to scan the whole table; instead it can use an index (the performance benefit will be very noticeable with a bigger table).
As in the first comment: always use the database to filter data (and mind your indexes for faster searching). It's good to run EXPLAIN on certain queries to check whether they can be optimised. And try to use parameterized queries, e.g. with PDO.
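A parameterized sketch of the query from the question, assuming a PDO connection in $pdo (and that loc really is the fixed value '80'):
$stmt = $pdo->prepare('SELECT ROW, col
                       FROM mytable
                       WHERE ROW = :num
                         AND col = :col
                         AND loc = :loc
                         AND user_id = :uid');
$stmt->execute(array(
    'num' => $num,
    'col' => $col,
    'loc' => '80',
    'uid' => $_SESSION['id'],
));
$rows = $stmt->fetchAll();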

Can I use the result of a previous MySQL query in the From section of another MySQL query?

I'm using a PHP web service where I have performed a simple SELECT query and stored the result:
$result = run_query($get_query);
I now need to perform further querying on the data based on different parameters, which I know is possible in MySQL in this form:
SELECT *
FROM (SELECT *
      FROM customers
      WHERE CompanyName > 'g') AS t
WHERE ContactName < 'g'
I do know that this performs two SELECTs on the table. However, what I would like to know is whether I can simply use my previously saved result in the FROM section of the second query, like this, and whether my belief that it helps performance by not querying the entire table again is true:
SELECT *
FROM ($result)
WHERE ContactName < 'g'
You can create a temporary table to hold the initial results and then use it to select the data in the second query. This will be faster only if your first query is slow.
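A sketch of that idea in MySQL, using the customers table from the question (tmp_customers is just a made-up name):
CREATE TEMPORARY TABLE tmp_customers AS
    SELECT * FROM customers WHERE CompanyName > 'g';

SELECT * FROM tmp_customers WHERE ContactName < 'g';

DROP TEMPORARY TABLE tmp_customers;   -- also dropped automatically when the session ends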
PHP and SQL are different languages and very different platforms; they often don't even run on the same computer. Your PHP variables won't interact with the MySQL server at all. You use PHP to create a string that happens to contain SQL code, but that's all. In the end, the only thing that counts is the SQL code you send to the server; how you manage to generate it is irrelevant.
Additionally, you can't really say how MySQL will run a query unless you obtain an explain plan:
EXPLAIN EXTENDED
SELECT *
FROM (SELECT *
      FROM customers
      WHERE CompanyName > 'g') AS t
WHERE ContactName < 'g'
... but I doubt it'll read the table twice for your query. Memory is much faster than disk.
Thanks for the responses, everyone. Turns out what I was looking for was a "query of query", which isn't supported directly by PHP but I found a function over here which provides the functionality: http://www.tom-muck.com/blog/index.cfm?newsid=37
That was found from this other SO question: Can php query the results from a previous query?
I still need to do comparisons to determine whether it improves speed.
If I understand your question correctly, you want to know whether saving the "FROM" part of your SQL query in a PHP variable improves the performance of querying your SQL server. The answer is NO, simply because the variable's value is inserted into the query string anyway.
As to whether any performance is gained on the PHP side, the answer is most probably yes, but whether it is noticeable depends on the length of the variable's value and on how often you reuse the variable instead of building a new complete query.
Why not just get this data in a single query like this?
SELECT *
FROM customers
WHERE CompanyName > 'g'
AND ContactName < 'g'

Is my method of fetching MySQL data using prepared statements inefficient and taxing on my server?

I was informed by someone senior in our company today that the PHP code I have written for performing prepared statements on a MySQL database is "inefficient" and "too taxing on our server". Since then I find myself in the difficult position of trying to understand what he meant and how to fix it. I cannot contact said person for four days, so I am asking other developers what they think of my code and whether there are any areas that might be causing bottlenecks or issues with server performance.
My code works and returns the results of my query in the variable $data, so technically it works. The question, though, is whether it is efficient and well written. Any advice as to what that senior employee meant or was referring to? Here is the method I use to connect to and query our databases.
(Please note that when I use the word method I do not mean a method inside a class; I mean the way I write/structure my code when I connect to and query our databases.)
<?php
// Create database object and connect to database
$mysqli = new mysqli();
$mysqli->real_connect($hostname, $username, $password, $database);

// Create statement object
$stmt = $mysqli->stmt_init();

// Prepare the query and bind params
$stmt->prepare('SELECT `col` FROM `table` WHERE `col` > ?');
$stmt->bind_param('i', $var1);

// Execute the query
$stmt->execute();

// Store result
$stmt->store_result();

// Prepare for fetching result
$rslt = array();
$stmt->bind_result($rslt['col']);

// Fetch result and save to array
$data = array();
while ($stmt->fetch()) {
    $row = array();
    foreach ($rslt as $key => $value) {
        $row[$key] = $value;
    }
    $data[] = $row;
}

// Free result
$stmt->free_result();

// Close connections
$stmt->close();
$mysqli->close();
?>
Any advice or suggestions are useful, please do contribute and help out even if you are only guessing. Thanks in advance :)
There are two types of code that may be inefficient: the PHP code and the SQL code, or both.
For example, the SQL is a problem if the `col` column isn't indexed in the database. This puts lots of load on the database because the database has to scan very many rows to answer queries. If `col` isn't indexed in the given query, then all of the rows in the table would be scanned. Also, if the value passed in isn't very selective, then many rows will have to be examined, perhaps all of the rows, as MySQL will choose a table scan over an index scan when many rows will be examined. You will need to become familiar with the MySQL EXPLAIN plan feature to fix your queries, or add indexes to the database to support your queries.
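Concretely, that could mean adding an index and checking the plan; the table and column names below are just the placeholders from the code in the question, and 123 stands in for the bound value:
ALTER TABLE `table` ADD INDEX idx_col (`col`);

EXPLAIN SELECT `col` FROM `table` WHERE `col` > 123;
-- a useful index shows up in the "key" column, and the estimated "rows"
-- value should stay far below the total size of the table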
The PHP would be a problem if you followed something like the pattern:
select invoice_id from invoices where customer_id = ?
for each invoice_id
select * from line_items where invoice_id = ?
That kind of pattern will lead to "over querying" the database, which puts extra load on it. Instead use a join:
select li.* from invoices i join line_items li using (invoice_id)
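In PHP that means one prepared query with the join instead of a query per invoice; a rough sketch, assuming a PDO connection in $pdo and a $customerId variable:
$stmt = $pdo->prepare('SELECT li.*
                       FROM invoices i
                       JOIN line_items li USING (invoice_id)
                       WHERE i.customer_id = ?');
$stmt->execute(array($customerId));
$lineItems = $stmt->fetchAll();   // every line item for the customer, in one round trip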
Ask your database administrator to turn on the slow query log and then process it with pt-query-digest.
You can use pt-query-digest to report on queries that are expensive (take a long time to execute), and also to report by frequency to detect over-querying.
