PHP/SQL: framework query building vs string concatenation


Consider 2 ways of querying the database:
With a framework (Yii):
$user = Yii::app()->db->createCommand()
->select('id, username, profile')
->from('tbl_user u')
->join('tbl_profile p', 'u.id=p.user_id')
->where('id=:id', array(':id'=>$id))
->queryRow();
With string concatenation (separating individual parts of a SQL statement):
$columns = "id,username,profile"; // or =implode(",",$column_array);
//you can always use string functions to wrap quotes around each columns/tables
$join = "INNER JOIN tbl_profile p ON u.id=p.user_id";
$restraint = "WHERE id=$id"; //$id cleaned with intval()
$query="SELECT $columns FROM tbl_user u {$join} {$restraint}"; // JOIN must come before WHERE
//use PDO to execute query... and loop through records...
Example with string concatenation for pagination:
$records_per_page=20;
$offset = 0;
if (isset($_GET['p'])) $offset = intval($_GET['p'])*$records_per_page;
$query="SELECT * FROM table LIMIT $offset,$records_per_page";
Which method has better performance?
PHP's PDO allows code to be portable to different databases
2nd method can be wrapped in a function so no code is ever repeated.
String concatenation allows building complex SQL statements programmatically (by manipulating strings)
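As a rough illustration of wrapping the second method in a function, here is a minimal sketch. The helper name `buildSelect` and its signature are made up for this example and are not part of Yii or PDO. It assumes column names, table names and join clauses come from trusted code, and it only turns the values into named placeholders.

```php
<?php
/**
 * Hypothetical helper: assembles a SELECT statement from its parts and
 * returns the SQL together with the parameters to bind.
 * Identifiers and join clauses are assumed to be trusted, not user input.
 */
function buildSelect(array $columns, string $table, array $joins = [], array $where = []): array
{
    $sql = 'SELECT ' . implode(', ', $columns) . ' FROM ' . $table;

    // Joins are appended before the WHERE clause.
    foreach ($joins as $join) {
        $sql .= ' ' . $join;
    }

    $conditions = [];
    $params = [];
    foreach ($where as $column => $value) {
        $name = ':w' . count($params);   // :w0, :w1, ...
        $conditions[] = "$column = $name";
        $params[$name] = $value;
    }
    if ($conditions) {
        $sql .= ' WHERE ' . implode(' AND ', $conditions);
    }

    return [$sql, $params];
}

[$sql, $params] = buildSelect(
    ['u.id', 'u.username', 'p.profile'],
    'tbl_user u',
    ['INNER JOIN tbl_profile p ON u.id = p.user_id'],
    ['u.id' => 42]
);
echo $sql, "\n";
```

The returned pair can then be passed to PDO, for example `$stmt = $pdo->prepare($sql); $stmt->execute($params);`.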

Use whichever is right for you and your project team. Frameworks are written for a reason, so use them if they suit, but if they don't (and there are reasons why they might not), fall back to plain SQL.
I don't know Yii, but many frameworks ultimately just build a query string from the parts, hopefully taking advantage of parameterization, but not always. So, regarding speed, string concatenation is probably "fastest", but you're unlikely to see the difference with a stopwatch (you could benchmark with 1000 queries if you needed to, but other features such as better error checking or caching may unfairly slow down or speed up the results).
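If you do want numbers, a small timing helper is enough to get started. The sketch below is only an assumption of how such a benchmark could look: `averageSeconds` is a made-up name, and the two closures time string assembly only, not a round trip to the database, so a real comparison would put the actual query calls inside them.

```php
<?php
/**
 * Hypothetical timing helper: runs $fn $iterations times and returns the
 * average wall-clock time per call in seconds.
 */
function averageSeconds(callable $fn, int $iterations = 1000): float
{
    $start = hrtime(true);               // monotonic clock in nanoseconds (PHP 7.3+)
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9 / $iterations;
}

$id = 42;

// Variant 1: plain concatenation.
$concat = averageSeconds(function () use ($id) {
    return 'SELECT id, username FROM tbl_user WHERE id = ' . intval($id);
});

// Variant 2: assembling the statement from an array of parts.
$joined = averageSeconds(function () use ($id) {
    return implode(' ', ['SELECT', 'id, username', 'FROM tbl_user', 'WHERE id = ' . intval($id)]);
});

printf("concat: %.9f s, implode: %.9f s\n", $concat, $joined);
```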
But one advantage frameworks have is that they can add context-sensitive caching: they know that when you update table X, your query caches for A, D and F need to be deleted, but queries B, C and E are still good.
You also have "easy to read" and "debug" and "functionality" to worry about. The top example is much easier to read, which is important in a shared project.
You also need to consider prepared statements - does the framework use them? If so, does it allow you to re-use them (as opposed to merely using them for syntax purposes).
But can the framework do sub-selects? Can it do parameterization inside the "JOIN ON"? If not, string concatenation with PDO may be more appropriate.
It's not a hard and fast answer - but hopefully provides all the points you need to consider.
Recommendation: use framework unless you really notice it being slow, using too much memory or there is some other good reason not to.

Related

Is there a way to add a LIMIT to an UPDATE query in Doctrine ORM?

I am using Doctrine 2.5.x and I am having problems with getting the LIMIT clause to work for UPDATE queries. It always updates all matched records (i.e. it seems to ignore the LIMIT clause).
setMaxResults() seems to have no effect when used together with UPDATE queries.
As a quick workaround I am using a native MySQL query but that cannot be the best solution.
I tried these examples but none are working:
Doctrine update query with LIMIT
https://recalll.co/app/?q=doctrine2%20-%20Doctrine%20update%20query%20with%20LIMIT
QueryBuilder with setMaxResults() (does not work):
$qb = $em->createQueryBuilder();
$query = $qb->update('\Task\Entity', 't')
->set('t.ClaimedBy', 1)
->where('t.Claimed IS NULL')
->getQuery();
$query->setMaxResults(20);
$this->log($query->getSQL());
Hope someone can help in finding a better solution than a native query. It takes away the whole benefit of the ORM.
Is it even possible to use a LIMIT clause in an UPDATE statement?

In short, no, because the SQL specification does not support UPDATE ... LIMIT ..., so no ORM that aims for portability should allow you to do it.
Please also have a look at MySQL Reference Manual itself stating that UPDATE ... LIMIT ... is not a standard SQL construction:
MySQL Server supports some extensions that you probably will not find in other SQL DBMSs. Be warned that if you use them, your code will not be portable to other SQL servers. In some cases, you can write code that includes MySQL extensions, but is still portable, by using comments of the following form:
SQL statement syntax
The ORDER BY and LIMIT clauses of the UPDATE and DELETE statements.
So, because what you are trying to achieve is not standard SQL, the ORM has no portable way to implement it and will probably not implement it at all.
Sorry, but what you are trying to achieve is not possible through DQL, because:
Ocramius commented on Sep 2, 2014
DQL doesn't allow limit on UPDATE queries, as it is not portable.
As stated in this issue of the DoctrineBundle repository by its owner, Marco Pivetta (who also happens to be the owner of the ORM repository).
Further information, although it would need a good link to the right ISO specification document, which is sadly not freely available:
The ISO standard for the UPDATE statement does not allow LIMIT in an UPDATE, whereas SELECT is, of course, a statement that can be limited.
As you pointed out yourself, the purpose of an ORM is to avoid writing raw SQL so that the code stays compatible across DBMSs. If there is no way to do something portably, it makes sense that the ORM does not implement it.
Also note that in SQL dialects other than MySQL, row limiting is expressed differently, even for SELECT:
select * from demo limit 10
would translate in SQL Server to
select top 10 * from demo
or in Oracle to
select * from demo where rownum <= 10
Also see: https://stackoverflow.com/a/1063937/2123530

As b.enoit.be already stated in his answer, this is not possible in Doctrine because using LIMIT in an UPDATE statement is not portable (it is a MySQL extension).
Hope someone can help in finding a better solution than a native query. It takes away the whole benefit of the ORM.
I would argue that you are mixing business rules with persistence (and the ORM does not play well with that, luckily).
Let me explain:
Updating an entity's state is not necessarily a business rule. Updating max. 20 entities is (where does that 20 come from?).
In order to fix this, you should properly separate your business rules and persistence by separating it into a service.
class TaskService
{
    private $taskRepository;

    public function __construct(TaskRepository $taskRepository)
    {
        $this->taskRepository = $taskRepository;
    }

    public function updateClaimedBy()
    {
        $criteria = ['Claimed' => null];
        $orderBy = null;
        // Only update the first 20 because XYZ
        $limit = 20;
        $tasks = $this->taskRepository->findBy($criteria, $orderBy, $limit);
        foreach ($tasks as $task) {
            $task->setClaimedBy(1);
        }
        // The changes still have to be flushed through the entity manager to be persisted.
    }
}

Which way will have better performance - Filter/Order results with sql query or php?

Recently I tried a new way to filter/order SQL results: instead of filtering and ordering in the SQL query, I pull all the data I need "as is" and then do the filtering and ordering in PHP code.
For example: I want only events with a name like "test", ordered by date.
Table struct :
id eventDate eventName
New way :
$query = mysqli_query($GLOBALS["link"], "SELECT `eventDate`, `eventName` FROM `tablename` WHERE `id`='X' ");
while($data = mysqli_fetch_array($query))array_push($this->events,$data);
Then I'm using array_filter and usort / array_multisort and array_values in PHP...
Old way :
$query = mysqli_query($GLOBALS["link"], "SELECT `eventDate`, `eventName` FROM `tablename` WHERE `id`='X' AND (`eventName` LIKE '%test%') ORDER by `eventDate` DESC ") ;
while($data = mysqli_fetch_array($query))array_push($this->events,$data);
So what is better ? complex sql queries or pull all the data "as is" then make the filters and orders in php ?
The above example is very simple. I'm talking about much more complex filtering...
Please answer if you are absolutely sure !
Thanks :)

The answer will depend on a number of factors, including:
The size of your database
How the database is indexed
The size of the result row
Pros for filtering with PHP:
Less Complex SQL Queries
Potentially Simpler Code
Cons for filtering with PHP:
Higher RAM usage, and with a large dataset, this can be a real deal breaker
Slower (Unless you have a table which is not indexed properly)
Pros for filtering with SQL:
Typically much faster, especially on properly indexed tables
Less RAM Usage
Less Data to Parse
Cons for filtering with SQL:
SQL queries can become unreadable if taken too far
Moving more logic into the query can make database interoperability more challenging (MySQL, SQLite, etc.)
Your mileage may vary. Everyone has their own opinions, but in my personal experience, I've found that using native SQL filtering is typically the better choice. Remember the ultimate goal should be clean, maintainable code. And using an SQL formatter goes a long way in making SQL more readable.
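For comparison, the PHP-side filtering described in the question might look roughly like the sketch below. The sample rows are made up and stand in for the unfiltered result set. The matching via `stripos` is only an approximation of `LIKE '%test%'`, whose case sensitivity depends on the column's collation.

```php
<?php
// Made-up rows standing in for the result of the unfiltered query.
$events = [
    ['eventDate' => '2014-03-01', 'eventName' => 'Load test'],
    ['eventDate' => '2014-05-12', 'eventName' => 'Launch party'],
    ['eventDate' => '2014-04-20', 'eventName' => 'Test drive'],
];

// Rough equivalent of: WHERE eventName LIKE '%test%'
$filtered = array_filter($events, function (array $row) {
    return stripos($row['eventName'], 'test') !== false;
});

// Rough equivalent of: ORDER BY eventDate DESC
// (ISO-formatted dates compare correctly as plain strings.)
usort($filtered, function (array $a, array $b) {
    return strcmp($b['eventDate'], $a['eventDate']);
});

print_r(array_column($filtered, 'eventName'));
```

Every row has to travel from the database to PHP before the first one is discarded, which is where the extra RAM and transfer cost listed above comes from.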

Logically, MySQL will be faster in all cases (it avoids transferring data to PHP, and MySQL is written in C++), but this should be tested.
The true power shows when indexes are used. In your case you use "LIKE '%value%'", which prevents index use. That is easily solved with a fulltext index; as of MySQL 5.6, fulltext indexes are supported by the InnoDB engine as well.

Reason to separate query and result?

Is there any reason why you should separate query and result when writing code, other than maybe readability?
Example separate:
$query = "SELECT * FROM foo ORDER BY foo2 DESC LIMIT 10";
$result = mysqli_query($dbconnect,$query);
compared to single line:
$result = mysqli_query($dbconnect,"SELECT * FROM foo ORDER BY foo2 DESC LIMIT 10");
I usually use the 1st example but have found myself using the 2nd single line example more and more of late as its quicker and easier to write so thought I'd ask before it becomes 2nd nature and then find out its really bad and may blow up the world or something 0.o

This is more of a preference, but readability is certainly a strong justification. I would also argue that the scalability and maintainability of the query are fitting arguments. Suppose you have a complex query with multiple variables being sanitized against SQL injection, with joins and so forth. In other words, a long query:
$result = mysqli_query($dbconnect,"SELECT * FROM foo, bar Where col1.bar = (Select col1 From someTable where {$variable} = ...) Group By ... ORDER BY foo2 DESC LIMIT 10");
Stuffing all of that into the function call makes it difficult to read and annoying to maintain.

To answer the question: do as you like (or almost).
As I said, the first way may be a holdover from the days when we were limited to 80 characters per line. That restriction doesn't exist anymore (and we have bigger screens). That said, I'm not telling you to put 300 characters on a line.
Use the one you find more readable and maintainable. The only drawback can be your coworkers, who may dictate which one you must use.

Actually you could write your whole source in one line.
(Minimize source = Delete all spaces and linebreaks)
of course not in strings :)
Effect is: (1 Positive and 3 Negative to do this)
Bad readability (-)
Faster parsing (+)
Slow editing source (-)
No good maintainability (-)
Not writing everything to one line
Effect is: (3 Positive and 1 Negative to do this)
Good readability (+)
Little bit slower parsing (-)
Fast editing source (+)
Better maintainability (+)
(Guess you only feel the parsingtime difference in very large codefiles)
This is how I decide to write my source.
Sometimes (for e.g. Plugins) I use the minimized version.

For me, it's all about readability. I write a ton of code and release an app. Six months later, when I go back to fix a bug, I need to find things quickly. I separate out as much as I can. Parsing times are minimal unless you are writing incredibly large volumes of code. I go one step further to make it as pretty as possible.
$select = "SELECT x,y,z";
$from = " FROM foo,bar,table3";
$order = " ORDER BY foo2 DESC ";
$limit = " LIMIT 10";
$query = $select . $from . $order . $limit;
$result = mysqli_query($dbconnect,$query);

I think it's just a preference too. I usually use the short way for short and "unvarying" requests.
But for longer requests, it's easier to read and maintain taken separately.
Especially for "dynamic requests" (partly depending on external conditions): it's sometimes easier to work on a string, for example to concatenate a where clause, or another, or none at all, depending on a condition.
An example I have in mind: a "list.php" script displaying all articles if called with no parameter, and filtering articles of a specific category 'foo' if called with get-parameter ?cat=foo given.
I personally find it easier to read and maintain like this:
$query="select name, description, price from articles";
if(isset($_GET['cat'])) $query.=" where cat={$_GET['cat']}";
$result = mysqli_query($dbconnect,$query);
Of course it could also be done directly in the one-line version using the ternary (?:) operator, but then readability suffers a bit:
$result = mysqli_query($dbconnect,"select name, description, price from articles".(isset($_GET['cat'])?" where cat={$_GET['cat']}":""));
It becomes much more unreadable and unmaintainable if you add other conditions. For instance, a second filter on the get-parameter ?supplier=bar...
[EDIT:] I just had the case: when your request fails, it's easier to debug when the query stands in a string. A simple echo $query; allows you to see exactly what you're sending to the DB server...
In the following PHP code: $query="select * from $userTable where id=:uid";, you might overlook that $userTable wasn't defined...
but after an echo $query;, you can't overlook that something is missing here: $query="select * from where id=:uid";...
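Note that the snippets above put `$_GET['cat']` straight into the SQL string, which is open to SQL injection. The same "build the query as a string" style also works with placeholders. The following is a minimal sketch under some assumptions: the helper name `buildArticleQuery` is made up, and `supplier` is assumed to be a column name matching the `?supplier=bar` parameter mentioned above.

```php
<?php
/**
 * Hypothetical helper: builds the article query from request parameters,
 * adding one placeholder per active filter instead of interpolating values.
 */
function buildArticleQuery(array $get): array
{
    $sql = 'SELECT name, description, price FROM articles';
    $conditions = [];
    $params = [];

    // Whitelist of filterable columns; only these names ever reach the SQL text.
    foreach (['cat', 'supplier'] as $filter) {
        if (isset($get[$filter]) && is_string($get[$filter]) && $get[$filter] !== '') {
            $conditions[] = "$filter = ?";
            $params[] = $get[$filter];
        }
    }

    if ($conditions) {
        $sql .= ' WHERE ' . implode(' AND ', $conditions);
    }

    return [$sql, $params];
}

[$query, $params] = buildArticleQuery(['cat' => 'foo', 'supplier' => 'bar']);
echo $query, "\n";   // the query string is still easy to inspect when debugging
```

The query and its parameters would then go through a prepared statement, for example `$stmt = $pdo->prepare($query); $stmt->execute($params);` with PDO.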

security difference between (double) $user_input and bind_param('d', $user_input)

Let's say I were to perform a prepared statement like this:
$qry->prepare('UPDATE table_name SET column1 = ?, string_column = ? WHERE column3 = ? AND column4 = ?');
$qry->bind_param('sbid', $string, $blob, $int, $double);
$int = 'non int value'; /* gives 0 in the database */
$blob = 'some string';
$string = 'another string';
$double = $double;
$qry->execute();
$qry->close();
Let's just say I only wanted to perform the query once, and I used the prepared statement purely in the name of security. From what I've been reading, it's more overhead to use a prepared query only once, which amounts to trading performance for the security benefits. That being said, what would be the performance/security difference in doing the same query one time like this?
$int = (int) $int;
$blob = "'" .mysql_real_escape_string($blob) ."'";
$string = "'" .mysql_real_escape_string($string) ."'";
$double = (double) $double;
$db->query("UPDATE table_name SET column1 = $int, column2 = $blob WHERE column3 = $string AND column4 = $double ");
PS. I am not interested on how Prepared statements improve the performance but about the security and speed difference for a single query.

There is quite a lot to that. Some random points
Single use prepared statements do impose a (more than theoretical) performance penalty, which is higher, if a lot of connections exist to the MySQL server. (Think: Context switches)
But you should not run a DB server so close to its limits, that this makes the difference.
But you not always have the choice (Think: shared hosting)
or:
There are some (or even many) cases, where prepared statements do not offer a security benefit - there is a lot of business logic, where no user-generated data is involved (Think: Jointables, that only carry IDs) or where the user-generated data has to be validated beforehand for other reasons (Think: Price calculations, memcached lookups, ...)
But selecting one of many styles for each single query results in unmaintainable code.
But it is sometimes unavoidable (Think: There is no prepared query support for the IN ( ) construct)
often overlooked:
Prepared queries sometimes make it harder to be RDBMS-agnostic
But prepared queries offer the best known protection against SQL injection.
My favorite:
it is common advice to simply always use prepared queries
But the majority of living things on this planet would advise you to eat feces or rotting organic substance.
So the choice of style often has to be made on a case-by-case basis. We have adopted the way of encapsulating all DB access including parameter management in a standardized library, that is simply require()ed, so you can drop-in replace with prepared queries, escaping or whatever you want and your RDBMS supports.
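On the IN ( ) point above: a common workaround is to generate one placeholder per value and bind the values as usual. This is a minimal sketch, not a feature of any driver. The helper name `inPlaceholders` is made up, and the `tags` table is borrowed from other examples on this page purely for illustration.

```php
<?php
/**
 * Hypothetical helper: returns "?, ?, ?" with one positional placeholder
 * per value, for use inside an IN ( ) list.
 */
function inPlaceholders(array $values): string
{
    if ($values === []) {
        // "IN ()" is a syntax error, so refuse to build it.
        throw new InvalidArgumentException('IN () needs at least one value');
    }
    return implode(', ', array_fill(0, count($values), '?'));
}

$ids = [3, 7, 11];
$sql = 'SELECT id, tagname FROM tags WHERE id IN (' . inPlaceholders($ids) . ')';
echo $sql, "\n";
```

With PDO the values are then bound positionally, for example `$stmt = $pdo->prepare($sql); $stmt->execute($ids);`.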

Thank you for the great question.
As a matter of fact, you could use both methods at once.
Most people do confuse the idea of a prepared statement in general with [very limited] implementation offered by major DBMS. While the latter can be questioned, the former is indeed the only way.
Take a look at the example. Let's run your query using safeMysql:
$sql = "UPDATE table_name SET column1 = ?i, column2 = ?s WHERE column3 = ?s AND column4 = ?s";
$db->query($sql, $string, $blob, $int, $double);
It performs the very same string formatting as your code does, but does it internally. Why? Because it doesn't matter how it's implemented internally (by means of a native prepared statement or manual formatting), but it is essential to use a prepared statement to assemble your query either way.
There are some essential points about prepared statements that most people overlook:
They make the formatting always complete. In your example you do the right thing and format completely, but it is still very easy to slip into an incomplete version, like this:
$colname = "`$colname`";
They make the formatting always the right one. They won't let you do something like
$colname = "`" .mysql_real_escape_string($colname) ."`";
which would be useless and leave you open to injection.
They make formatting obligatory. When assembling a query your current way, it is very easy to overlook a variable or two.
They do the formatting as close to query execution as possible. That point is of great importance, because:
it does not spoil your source variable (what if the query fails and you want to echo the value back?)
it won't let you move the formatting code somewhere away from the query, which may lead to fatal consequences.
After all, they make your code dramatically shorter, without all that boring manual formatting!
Those are the real benefits of prepared statements, which guarantee safety and thus made them so popular. Server-side preparation, although quite smart, is just one particular implementation.
Also, taking the idea of a prepared statement as a guide, one can create a placeholder for everything that may be added to a query (an identifier or an array, for example), making it really safe and convenient to use.
Keeping all of this in mind, one has to implement the very idea of a prepared statement in one's DB access library to make the code safe and short.
Just a couple of examples from safeMysql:
$name = $db->getOne('SELECT name FROM table WHERE id = ?i',$_GET['id']);
$data = $db->getInd('id','SELECT * FROM ?n WHERE id IN ?a','table', array(1,2));
$data = $db->getAll("SELECT * FROM ?n WHERE mod=?s LIMIT ?i",$table,$mod,$limit);
$ids = $db->getCol("SELECT id FROM tags WHERE tagname = ?s",$tag);
$data = $db->getAll("SELECT * FROM table WHERE category IN (?a)",$ids);
$data = array('offers_in' => $in, 'offers_out' => $out);
$sql = "INSERT INTO stats SET pid=?i,dt=CURDATE(),?u ON DUPLICATE KEY UPDATE ?u";
$db->query($sql,$pid,$data,$data);
Just try the same with conventional mysql(i) and see the amount of code it takes you.
You may note that with usable prepared statements you have to mark placeholders with a type, because there are more types than just a simple string, and it's the only reliable way to tell the driver how to format your variable.
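To make the idea of type-marked placeholders concrete, here is a toy formatter. It is not safeMysql's actual code: the function name `formatQuery`, its signature and the injected `$quoteString` callable are assumptions made for this sketch, and it supports only `?i`, `?s` and `?n`. It also does not skip placeholders that happen to appear inside SQL string literals.

```php
<?php
/**
 * Toy illustration of type-marked placeholders (NOT safeMysql's implementation).
 * ?i = integer, ?s = string, ?n = identifier.
 * String quoting is delegated to $quoteString, e.g. [$pdo, 'quote'].
 */
function formatQuery(string $sql, array $args, callable $quoteString): string
{
    $args = array_values($args);
    $index = 0;

    return preg_replace_callback('/\?([isn])/', function (array $m) use (&$index, $args, $quoteString) {
        if (!array_key_exists($index, $args)) {
            throw new InvalidArgumentException('Fewer arguments than placeholders');
        }
        $value = $args[$index++];

        switch ($m[1]) {
            case 'i': // integer: reject anything that is not a whole number
                $int = filter_var($value, FILTER_VALIDATE_INT);
                if ($int === false) {
                    throw new InvalidArgumentException('Expected an integer');
                }
                return (string) $int;
            case 'n': // identifier: backticks, with embedded backticks doubled (MySQL style)
                return '`' . str_replace('`', '``', (string) $value) . '`';
            default:  // 's': let the database driver quote the string
                return $quoteString((string) $value);
        }
    }, $sql);
}

// Stand-in quoter for demonstration only; real code should pass the driver's own quoting.
$demoQuote = function (string $s): string {
    return "'" . str_replace("'", "''", $s) . "'";
};

$formatted = formatQuery(
    'SELECT * FROM ?n WHERE id = ?i AND name = ?s',
    ['users', '42', "O'Brien"],
    $demoQuote
);
echo $formatted, "\n";
```

The point of the design is that the placeholder type, not the caller, decides how each value is formatted, so no variable can be forgotten or formatted the wrong way.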

I believe they are equally secure from a security point of view, but using prepare not only makes your SQL secure, it also makes you FEEL secure. You cannot trust yourself to manually escape and convert to the proper type all the time. If you write 10,000 different SQL queries, you will tend to forget to escape one or two.
So in conclusion, prepare is a better habit for fighting SQL injection. Putting a PHP variable directly into the SQL query would make me uneasy when trying to sleep at night.
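The casts from the question can be checked in isolation. The sketch below uses made-up input strings to show that a cast always yields a number, so the value cannot break out of the SQL statement, but also that invalid input is silently coerced rather than rejected.

```php
<?php
// Made-up inputs: a clean number, plain text, an injection attempt, a malformed decimal.
$inputs = ['42', 'non int value', '1 OR 1=1', '3.5abc'];

$asInt = array_map('intval', $inputs);
$asFloat = array_map('floatval', $inputs);

foreach ($inputs as $i => $input) {
    printf(
        "%-18s int: %s  double: %s\n",
        var_export($input, true),
        var_export($asInt[$i], true),
        var_export($asFloat[$i], true)
    );
}
```

Whether a silently coerced 0 or 1 is acceptable is a validation question, separate from injection safety.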

How to improve doctrine performance (plain sql is 4x faster)?

The problem is this: I want to execute a simple query (e.g. 10 rows from one table).
In Doctrine this operation takes 0.013752s
Here is DQL:
$q = Doctrine_Query::create()
->update('TABLE')
->set('FIELD', 1)
->where('ID = ?', $id);
$rows = $q->execute();
But when I use plain SQL and mysql_query() it takes only 0.003298s
What's wrong? Is Doctrine really 4x slower?

John,
Nothing is wrong. Doctrine introduces considerable overhead compared to a straight SQL query. But you gain the convenience of a nice object oriented interface to the database as well as many other benefits. If raw performance is really important then you might not want to use Doctrine.
For queries where I need performance over convenience (hundreds of thousands of inserts for example) I use PDO to avoid the overhead that gets introduced by the ORM.
