Why should SELECT * be avoided in SQL? [duplicate]

Why should SELECT * be avoided in SQL? [duplicate] - php

This question already has answers here:
Why is SELECT * considered harmful?
(16 answers)
Closed 9 years ago.
I am developing an application and I was reading about how queries work. I read somewhere that you should avoid SELECT * FROM... where blah = blah
Why is that? And what's the workaround if you're trying to select pretty much everything?

Initially need to know what data you will need. Although, you can select all at once, if such requests will not be much. The difference in performance, you'll see only in heavy projects.

This is not really a direct answer to your question "Why is that?" (so downvote the answer if you need to.) It's an answer to the "what's a workaround if you need to" question.
The only workaround to avoid SELECT *, when I need all of the columns in the table, is to get a list of all the columns. And that's just extra busy I work I don't need when I'm already busy.
To put a backwards twist on a line from Office Space charaacter Peter Gibbons: "The thing is, Bob, it's not that I don't care, it's just that I'm lazy."
With MySQL I make for less busy work by using the SQLyog right click menu option to generate a skeleton SELECT statement that contains all the columns.
For a SQL statement that references multiple tables, I want every column reference to be qualified with a table alias, so I'll just use a SQL statement to retrieve a ready-to-use list of columns for me:
SELECT GROUP_CONCAT(CONCAT('t.',c.column_name)
ORDER BY c.ordinal_position
) AS col_list
FROM information_schema.columns c
WHERE c.table_schema = 'mydatabase'
AND c.table_name = 'mytable'
if I only need a few out a long list, it's easier for me to get them out of a vertical list
SELECT CONCAT(',s.`',c.column_name,'`') AS col_names
FROM information_schema.columns c
WHERE c.table_schema = 'mydatabase'
AND c.table_name = 'mytable'
ORDER BY c.ordinal_position
When the column references are qualified, the backticks are only needed for "special" characters in column names (or maybe some weird case sensitive setting.)
I can start with that list, and whittle out the columns I know I don't need.
Again, I apologize that this doesn't answer the question "Why?" There's several good reasons, given in answers to similar questions. For me, a big reason is that a future reader of the statement isn't going to have to go look somewhere else to find out what columns are being returned. Sure they can copy the statement, and go run it in a different environment, to see the list. But if the statement has variable substitutions, bind variables, and the dots and double quotes and calls to mysql_real_escape_string (in the case of mysql_ interface), that's a bigger hassle than it needs to be. Sure, the code can be modified to echo out the SQL text before its executed, and the reader may need to do that. But someone just reviewing the code shouldn't have to do that. And having the list of columns and expressions being returned by the statement, in an order more appropriate than the ordinal position of the columns in the table, I think that just makes for more readable code. (If it's important for the column to be returned by the statement, then I think it's reasonable that the column name is shown in the query.)
(This was in terms of application code, statements that are going to be included in an application. For ad hoc queries and development and such, I use the SELECT c.* freely. But when a statement is going into an application, that * gets replaced.

Related

PHP PDO ODBC unexpected empty result set

I am trying to track down a problem with using PDO via an ODBC connection to a SQL Server database where I am getting an empty result set for a known good query. I would appreciate any guidance from the community. This is part of a large system that I have been working on for about five years; it takes an XML representation of a report, generates SQL from it, runs the query, formats the result set as requested, and generates a web page for presentation. More than you probably needed to know, but I am trying to convey that I understand much of how this is supposed to work and in most cases it works reliably. But I have a customer who wanted something new, and it broke my system.
I refer to this as a known good query in the sense that I can copy and paste the query from my log file into SSMS (SQL Server console) and run it. It yields 62 rows of results. But when I run the same query through PDO, I get a PDOStatement back, no errorInfo(), no exceptions thrown, and so on. But fetchAll() returns an empty array. I was originally using query(), but it seemed safer to use prepare() and execute() in case there was something I was missing in the query. It made no difference.
I realize that there can be type conversion issues, but in the example below, the two retrieved fields are of type nvarchar(128) and nvarchar(32), respectively, which return successfully with other queries.
I should mention that the query is executed exactly once in the app, so it's not a matter of some previous execution interfering with the next one, as far as I can tell. Also, the PDO object has setAttribute(PDO::ATTR_ERRMODE,PDO::ERRMODE_EXCEPTION);
Here's the PDOStatement returned by execute():
Result Set PDOStatement Object
(
[queryString] => SELECT [dbo].[Supplier].[SupplierName] AS suppliername,[dbo].[Item].[ItemLookupCode] AS itemlookupcode FROM [dbo].[Order] LEFT JOIN [dbo].[OrderEntry] ON [dbo].[Order].ID=[dbo].[OrderEntry].OrderID LEFT JOIN [dbo].[Item] ON [dbo].[Item].ID=[dbo].[OrderEntry].ItemID,[dbo].[Supplier] WHERE ([dbo].[Order].Time >= '2015-01-01 00:00:00') AND ([dbo].[Order].Time <= '2015-03-31 23:59:59') AND ([dbo].[Item].SupplierID=[dbo].[Supplier].ID) ORDER BY [dbo].[Supplier].[SupplierName]
)
It's not that complex, and other SQL queries work fine against this database. There's just something about this one that fails via PDO, but works inside SSMS.
Any ideas? Has anyone seen this behavior before? Is there some other way to see what's going on here? I have looked at several questions on this theme, but they all seemed to have something wrong that I am not doing.
PHP 5.4.22, by the way.

After breaking your query down into a format such as I am using below, I noticed that you were mixing explicit joins (LEFT JOIN, INNER JOIN, etc) and implicit joins (FROM table1, table2). This is not only considered a very bad practice, but has been known to cause unusual and unexpected query responses on occasion. So, looking at the implicit logic of your joins, I have rewritten the query as below:
SELECT
[dbo].[Supplier].[SupplierName] AS suppliername,
[dbo].[Item].[ItemLookupCode] AS itemlookupcode
FROM [dbo].[Order]
INNER JOIN [dbo].[OrderEntry]
ON [dbo].[Order].ID=[dbo].[OrderEntry].OrderID
INNER JOIN [dbo].[Item]
ON [dbo].[Item].ID=[dbo].[OrderEntry].ItemID
INNER JOIN [dbo].[Supplier]
ON [dbo].[Item].SupplierID=[dbo].[Supplier].ID
WHERE ([dbo].[Order].Time >= '2015-01-01 00:00:00')
AND ([dbo].[Order].Time <= '2015-03-31 23:59:59')
ORDER BY [dbo].[Supplier].[SupplierName]
I changed the LEFT JOINs in your query to INNER JOINs, because the [Item].SupplierID and [Supplier].ID had to match in your original query (and thus, to exist, since equals will not return a TRUE value if either or both values are NULL.) Thus, the OrderEntry value also had to exist for a valid response to return. If the row must exist for valid data to be returned, you should always use an INNER JOIN - it simplifies internal logic and can often result in faster query responses.
I realize this is an old question at this point, but a good answer is never wasted.

I don't disagree with your point. If I was hand-crafting SQL queries, they would come out much like yours.
But the context here is diffferent. In this system (http://www.calast.com/DynaCRUX.html), there is an abstraction of the database tables and their relationships, expressed in XML. And, there is an abstraction of the desired report, also in XML. The app that creates SQL has to deal with the inputs it is given. Sometimes there is enough information to generate "good" SQL like you and I would write. And sometimes there isn't, leaving the system to do the best it can. The fallback is at times pre-ANSI join syntax.
Also, I did point out that (ugly as we might think it), the generated query (1) is legal T-SQL, and (2) produced the output the customer wanted, when run inside SSMS. The problem was never in the query, as it turns out. It was just a configuration bug in my system, so I need to close this question out.
That said, I have recently rewritten the SQL generation engine to use a different approach that produces queries that are much more reasonable. That is, ones that look as you and I would write them.
Your answer is a good one, and I suspect it will help others write better queries.

Getting a Mysql Results without knowing a column name [duplicate]

This question already has answers here:
Get table column names in MySQL?
(19 answers)
Closed 9 years ago.
As I am still learning PHP and MySQL, I would like to know if it is possible to query a table without knowing it's name. Knowing the table I am querying and the column I would like to retrieve, I can write something like
$book_title=$row['book_title'];
Then I can use the resulting variable later in the script. In each case, each book category table will have a column of interest with a different name. I have several tables on which I am running queries. I am able to query any table by using a variable that always evaluates to the correct table name, because all the input from users corresponds to the tables in the database, so the $_POST super global will always carry a correct table name. The problem is for me to have a
$variable=$row['column'];
in cases where I do not know a column name before hand even though I know the table name.
My queries are simple, and look like
query="select * FROM $book_categories WHERE id=$id";
$row = mysqli_fetch_array ($result);
$variable=$row['?'];
The question mark say, I do not know what column to expect, as it's name could be anything from the tables in the database!
Since I have several tables, the query will zero on a table, but the column names in each table varies so I would like to be able to use one query that can give me an entry from such a column.
I hope my question is clear and that I am not asking for the impossible. If it's ambiguous, I care to elucidate (hope so).

I'm not sure what you mean, but it is possible to reference specifc columns by typing index (starting with 0) something like this: $row[0], $row[1] where 0 indicates the first column, and 1 indicates the second column from the returned recordset.
Example:
If you have a select-statement like this:
SELECT title, author FROM books
You could reference these two columns with $row[0], $row[1]
If you try to get the value of $row[2] you will get an unassigned value because there are only two columns (0 and 1) from the recordset.
If you have a select-statement like this:
SELECT * FROM book_categories
and the recordset returns three columns, then you could access these with $row[0], $row[1] and $row[2]. $row[3] does not exist because there are only three columns (0,1 and 2)

Since you are learning maybe we could take some time to explain why this is possible but many people (including myself) would say this is bad -- or at least dangerous
Why you can
Your SQL query is basically a text string you send to the DB server, which decode that string trying to interpret it as SQL in order to execute the query.
Since all you send to the DB server is text string, you could build that string however you want. Such as using string interpolation as you did:
select * FROM $book_categories WHERE id=$id
That way, you could replace any part of your query by the content of a variable. You could even go further:
$query FROM $book_categories WHERE id=$id
Where $query could by SELECT * or DELETE.
And, why not initializing all those variables from a form:
$book_categories = $_POST['book_categories'];
$id = $_POST['id'];
$query = $_POST['query'];
Great, no? Well, no...
Why you shouldn't
The problem here is "could you trust those variables to only contain acceptable values?". That is, what would append if $book_categories somehow resolve to one table you didn't want to (say myTableContainigSecretData)? And what if $id resolve to some specially crafted value like 1; DELETE * FROM myImportantTable;?
In these conditions, your query:
select * FROM $book_categories WHERE id=$id
Will become as received by the DB server:
select * FROM myTableContainigSecretData WHERE id=1; DELETE * FROM myImportantTable;
Probably not what you want.
What I've tried to demonstrate here is called SQL injection. This is a very common bug in web application.
How to prevent that?
The best way to prevent SQL injection is to use prepared statement to replace some placeholders in your query by values properly shielded against SQL injection. There was an example posted a few minutes ago as a response to an other question: https://stackoverflow.com/a/18035404/2363712
The "problem" regarding your initial question is that will replace values not table or columns identifiers.
If you really want to replace table/columns identifiers (or other non-value part of your query) by variables contents, you will have to check yourself the content of each of these variables in order to prevent SQL injection. This is quite feasible. But that's some work...

When writing a query in mysql [duplicate]

This question already has answers here:
Which is faster/best? SELECT * or SELECT column1, colum2, column3, etc
(49 answers)
select * vs select column
(12 answers)
Closed 8 years ago.
Lets say I have a table named users with the fields id, name, username, and password. When i write the query for getting all of the fields, is it better to write it this way
$sql = mysqli_query($con,"SELECT * FROM users LIMIT 50");
or this way:
$sql = mysqli_query($con,"SELECT id, name, username, password FROM users LIMIT 50");
for this code:
while($me = mysqli_fetch_array($sql)){
$id = $me['id'];
$name = $me['name'];
$user = $me['username'];
$pass = $me['password'];
}
Is one better then the other. Is one more finely tuned then the other. Or are they both the exact same performance wise

There is no difference in performance at all. I recommend being explicit about the columns that you want to select in production code, however. If you add a column to the table it could have an effect on the query. It's also clearer to anyone maintaining the code what the query is doing and what values are needed.
In point of fact, using SELECT * is less efficient because MySQL will have to look up the column names and it's less memory efficient because the constructs that PHP creates will have more values than are used (possibly) which costs more memory. However, I think this is negligible enough for my "no difference" comment to stand.

I dont know about performance, but when doing joins this can get you into trouble if you have the same column in more than one table. In this case you need to manually alias which will generally require listing all the columns anyhow. Given that i prefer to remain consitent, and as Explosion Pills mentions its easier to tell whats going on/maintain, than using *.

Never use * to return all columns in a table–it’s lazy. You should only extract the data you need. Even if you require every field, your tables will inevitably change.
Fore more detail see this

Can I use the result of a previous MySQL query in the From section of another MySQL query?

I'm using a PHP webservice where I have performed a simple SELECT query, and stored it
$result = run_query($get_query);
I now need to perform further querying on the data based on different parameters, which I know is possible via MySQL in the form:
SELECT *
FROM (SELECT *
FROM customers
WHERE CompanyName > 'g')
WHERE ContactName < 'g'
I do know that this performs two Select queries on the table. However, what I would like to know is if I can simply use my previously saved query in the FROM section of the second section, such as this, and if my belief that it helps performance by not querying the entire database again is true:
SELECT *
FROM ($result)
WHERE ContactName < 'g'

You can make a temp table to put the initial results and then use it to select the data and in the second query. This will work faster only if your 1-st query is slow.

PHP and SQL are different languages and very different platforms. They often don't even run in the same computer. Your PHP variables won't interact at all with the MySQL server. You use PHP to create a string that happens to contain SQL code but that's all. In the end, the only thing that counts is the SQL code you sent to the server—how you manage to generate it is irrelevant.
Additionally, you can't really say how MySQL will run a query unless you obtain an explain plan:
EXPLAIN EXTENDED
SELECT *
FROM (SELECT *
FROM customers
WHERE CompanyName > 'g')
WHERE ContactName < 'g'
... but I doubt it'll read the table twice for your query. Memory is much faster than disk.

Thanks for the responses, everyone. Turns out what I was looking for was a "query of query", which isn't supported directly by PHP but I found a function over here which provides the functionality: http://www.tom-muck.com/blog/index.cfm?newsid=37
That was found from this other SO question: Can php query the results from a previous query?
I still need to do comparisons to determine whether it improves speed.

If I understand your question correctly you want to know whether saving the "from" part of your SQL query in a php variable improves the performance of you querying your SQL server, then the answer is NO. Simply because the variable keeping the value is inserted into the query.
Whether performance is gained in PHP, the answer is most probable yes; but depends on the length of the variable value (and how often you repeat using the variable instead of building a new complete query) whether the performance will be notable.

Why not just get this data in a single query like this?
SELECT *
FROM customers
WHERE CompanyName > 'g'
AND ContactName < 'g'

Query a Query - MySQL and PHP

I was recently trying to do a project*, which caused me to ask this question. Although since then I've found an alternative solution, I am still curious if what I envisioned doing is, in any way, possible.
Essentially, I am wondering if there is anyway to perform a MySQL query on a MySQL query result in php. For example:
$result = mysql_query("SELECT * FROM foo WHERE bar=".$barValue);
AND THEN, be able to perform multiple queries on $result:
$newResult = mysql_query("SELECT * FROM $result WHERE otherBar=".$barValue);
OR
$otherNewResult = mysql_query("SELECT * FROM $result WHERE otherOtherBar=".$barValue." ORDER BY foobar ASC");
AND so on and so forth...
I realize that I could append the original query with my new WHERE statements and ORDER BYs, but that causes my to query the database unnecessarily and it prevents me from writing more objected oriented code (because I can't pass around a result to be queried, but rather have to requery the database in every function...)
Any advice, pieces of code, frameworks, or ramblings appreciated.
*BTW, my project was having to query a large database of people for people born in certain age groups and then query those age groups for different demographics.
Edit
No, writing a custom function to query the database is not worth the object-orientation (and modifiability) it would give me

You could do a nested query in the same SQL query and keep PHP out of it:
'SELECT * FROM (SELECT * FROM foo WHERE bar="something") AS q1 WHERE q1.bar2 = "something else"'

The question has already been answered. However following explanation will help someone who might be interested in knowing the details of it.
What are Nested query / subquery:
Subqueries are also known as nested queries. A subquery is a SELECT statement within another statement. MySQL supports all SQL standards and additionally provides MySQL specific features.
Why should I use Subquery:
Subquery is structured and it is possible to isolate each parts of statement
Subquery is more readable that complex joins and unions
Subquery provides alternative means to perform action which otherwise would require complex joins and unions
What Subquery returns:
A subquery can return a single value, a single row, a single column, or a table. These are called scalar, column, row, and table subqueries.
Reference: http://dev.mysql.com/doc/refman/5.7/en/subqueries.html
http://www.w3resource.com/sql/subqueries/nested-subqueries.php

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.