Storing DB results locally/cache for future operations - PHP

I want to access the DB only once and store the results in local memory/cache for future actions.
For example, I want to execute this query first:
$query = "select * from songs";
and store the results locally so that I can use them later to execute queries like these:
$query1 = "select * from songs order by song_id asc";
$query2 = "select song_name from songs where song_id='5'";
I am using PHP to access the DB.

Here are some ideas:
Use a "local" in-process SQL-database such as SQLite. SQLite is very fast (in many [trivial] cases it performs faster than MySQL, PostgreSQL or Firebird) and supports a good subset of SQL-92 syntax, but there is extra overhead to populate the "data-set" (SQLite DB) and building any indices, etc. This approach allows for great flexibility and allows persistence.
Use memcached or similar. However, memcached won't fit the model envisioned. In the models where memcached does work, an approach like this can work really well.
One could "just" store the objects/rows and then perform the query manually over the in-memory array (without the SQL syntax).
There may be some transparent caching systems such as the MTCache research project. I have no experience with these and do not know if there is a viable solution [for PHP].
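To illustrate the in-memory-array idea, here is a minimal sketch (the DSN and credentials are made up; the table and column names come from the question): query once, keep the rows in a PHP array, and answer the later "queries" with array functions.
// Fetch once and keep the rows for the rest of the request.
$pdo = new PDO('mysql:host=localhost;dbname=music', 'user', 'pass'); // hypothetical DSN
$songs = $pdo->query('SELECT * FROM songs')->fetchAll(PDO::FETCH_ASSOC);
// Equivalent of: SELECT * FROM songs ORDER BY song_id ASC
$ordered = $songs;
usort($ordered, function ($a, $b) {
    return $a['song_id'] - $b['song_id'];
});
// Equivalent of: SELECT song_name FROM songs WHERE song_id = '5'
$names = array();
foreach ($songs as $row) {
    if ($row['song_id'] == 5) {
        $names[] = $row['song_name'];
    }
}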
All of the above suggestions depend heavily upon the exact context/situation and may actually interfere with performance and/or 'correct results'. Remember that MySQL already performs a good bit of internal work to ensure the data pages/cache are 'hot' and may perform other caching as it sees fit; coupled with good indices, I could not actually recommend any of the above except for "offline" operations or cases where a performance analysis indicated it was required to meet functional requirements.
Happy coding.

Related

When to hardcode data inside the source code, when to use the database, and when to use a web service?

Consider the class below where some data related to the product and its components is hardcoded into the source code.
class ProductCharacteristics
{
    private $model;
    private $length = array();
    private $weight = array();
    private $description = array();

    function __construct($model)
    {
        $this->model = $model;
        //Since there are several product models,
        //we hardcode each model separately.
        //models are 50, 100, 200
        //lengths
        $this->length[ 50] = array(5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5);
        $this->length[100] = array(5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5);
        $this->length[200] = array(5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5);
        //weights
        $this->weight[ 50] = array(20, 114, 50);
        $this->weight[100] = array(68, 192, 68);
        $this->weight[200] = array(68, 192, 68);
        //descriptions
        $this->description[ 50] = array('3"', '3"', 6.50);
        $this->description[100] = array('6"', '6"', 6.50);
        $this->description[200] = array('6"', '6"', 6.50);
    }

    public function getLengths()
    {
        return $this->length[$this->model];
    }

    public function getWeights()
    {
        return $this->weight[$this->model];
    }

    public function getDescriptions()
    {
        return $this->description[$this->model];
    }
}
//instantiate:
$pc = new ProductCharacteristics(50);
$weight = $pc->getWeights();
print 'weight of component 1 is ' . $weight[0];
print 'weight of component 2 is ' . $weight[1];
Question 1:
Should data of this type (small, rarely changes) be encoded (placed) into the database instead? Why or why not? I am looking for more than just a yes/no; I am looking for a bit of explanation/history/rationale.
Question 2:
The reason I chose to hardcode it instead of putting it into the database is that I have the impression that "a call to the database for such a small set of data is expensive and prohibitive". Had I had 2 MiB of such data, I would not put it into the source code, of course. But since the set was small, I put it into the source code, with the added benefit that if any of the data changes, the change is tracked in my source control repository. I wouldn't be able to know about the change if it happened at the database level.
I therefore see that hardcoding it into the code is "not a big deal". I already run code, so having an extra file with just data in it is readily accessible.
Question: is it a "big deal" or comparatively "not a big deal" if I instead encode that data in the database? That is, if hardcoding data in the source code is O(1), what is the big O of placing it into the database instead?
Is it similar in {access time, overhead} to hardcoding data in the source code?
I at least see using the database as O(2) because we have to engage an outside program, the database system, to get the data.
I could make a case that I can also get the data using a web service, but I would put that at O(3) because it is an outside system, we have to make a call to it, and we also have to wait for network latency.
Answer 1
Data that never changes can be hardcoded, obviously.
Data that occasionally/rarely changes is data that still needs to be configurable at some point. Therefore, it should not be hardcoded because it is much easier to re-configure software than to update the source code/compile/re-deploy.
Answer 2
For 99% of cases, it is not a big deal to store data in a database. Otherwise, why would they exist? For database access it is about latency/overhead. If your database server resides on the same OS instance as your program, then there is no network latency, and the overhead will depend on a combination of your database design and the underlying storage architecture (RAM/HDD/SSD). For most projects that do not involve scale in the millions/billions, using any generic database deployment will be fine.
Re question 1:
Put everything you can that will be used by a query into the database/DBMS. Then the DBMS can use it for optimization, integrity and clarity.
The DBMS can optimize all queries.
E.g. if you use ORM data structure code in combination with a database query, then the DBMS might have to loop through a cross product of two tables checking the weight from $pc->getWeights(), whereas it could have avoided that cross product by joining with a ProductCharacteristics table earlier. E.g. some always-true things you can tell the DBMS that help it optimize queries are UNIQUE NOT NULL (including PRIMARY KEY) and FOREIGN KEY constraints.
You can query all the database directly via SQL.
Otherwise the DBMS has most of the data and a generic optimized interface yet you can't query involving your ORM data structure without compiling application code.
You can simplify ORM code.
Since ORM code is translated to SQL queries, when the data is all in the database there is functionality available that otherwise wouldn't be. E.g. calculating cumulative functions of weights via SQL window functions.
You can simply query application relationships differing from your ORM data structure.
Eg: It's easy to find a certain component's weight with your ORM data structure, but not easy to find a certain weight's components. But this is equally simple via a DBMS.
You can better maintain integrity.
E.g. the DBMS table format and/or integrity constraints can enforce the equivalent of keeping the lengths of your arrays consistent (see the sketch below).
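Purely as an illustration, here is one way the hardcoded arrays might map onto tables (hypothetical names and types; the real split would depend on what each array actually represents). The composite primary key plus NOT NULL columns does the work of keeping positions and lengths consistent.
$pdo = new PDO('mysql:host=localhost;dbname=products', 'user', 'pass'); // hypothetical DSN
$pdo->exec('CREATE TABLE product_model (
    model_id INT NOT NULL PRIMARY KEY
)');
$pdo->exec('CREATE TABLE product_component (
    model_id    INT          NOT NULL,
    position    INT          NOT NULL,
    length      DECIMAL(5,2) NOT NULL,
    weight      DECIMAL(6,2) NOT NULL,
    description VARCHAR(32)  NOT NULL,
    PRIMARY KEY (model_id, position),
    FOREIGN KEY (model_id) REFERENCES product_model (model_id)
)');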
The relational model was designed to solve these sorts of problems with data structures and hierarchical databases. (Read about that.) Use its power.
Re question 2:
It's a big deal. (See Re question 1.)
is it a "big deal" or comparatively "not a big deal" if instead encode
that data in the database?
Your benefits are limited and restrictive.
You are thinking of particular small queries in isolation, whereas the DBMS exists for arbitrary queries with automatic implementation and optimization.
I have the impression that "a call to the database for such a small set of data is expensive and prohibitive".
You are slowing down non-trivial queries.
You are saving a small (in DBMS terms) constant communication & evaluation cost on small queries for large evaluation costs on large queries due to impeded DBMS optimization. The DBMS knows the table is small via statistics. Given a small table and query, all the DBMS is doing is just looping through an array in memory. (And read about SARGability.)
with the added benefit that if any of the data changes, the change is tracked in my source control repository
You are introducing an exception.
You are reusing code but, given that all the other data has to be logged/tracked, needlessly. Indeed your code and database should be tracked together. A good DBMS has both update logging and version tracking (including code). Use it. Anyway, you can always track a DBMS UPDATE script in your source control repository.
That is, if hardcoding data in the source code is O(1), what is the big O of placing it into the database instead?
I at least see using the database as O(2) because we have to engage an outside program, the database system, to get the data.
Learn about big-O.
O(1) is O(2) is O(3) is constant. You mean O(1) with different constant factors. Extra levels of implementation are generally at worst constant but at best far better because of optimization using information from a larger scope.
Considering ORM data structure now is "premature optimization" ("the root of all evil"). This sort of engineering tradeoff follows empirical suspicion, investigation and demonstration followed by cost-benefit analysis (including opportunity cost).
Step 0.
Most things have already been said. Just for clarification:
Wikipedia says:
Database is an organized collection of data.
So a text file, a relational database, or even your plain old paper notebook are all databases.
All kinds of databases have their pros and cons.
A paper notebook works for a long time without power, is more flexible (you may write text in different directions, draw pictures, etc.) and is easier to learn (only writing and reading skills are required). But it is hardly readable by computers.
Text config files provide human-readable syntax, and their primary goal is simply to separate configuration from implementation (the logic of your code).
A relational database is used for better concurrent access; it provides good write and read speed and helps to organize data in terms of tables and the relations between them.
Answers.
1. If you don't plan to change the data (i.e. replace values, add new settings, etc.) in this application or in future applications based on this class, just hardcode it. That's not bad if your team is rather small (or you are a standalone developer). It's simpler.
If you decide to make a standalone config for your data, I suggest a plain PHP file. It's fast and easy to parse (no special class or caching required) and adds no overhead to your app's performance. It gives you the ability to share settings across different classes, and your code becomes better structured.
PHP configs are used by Zend Framework and Yii. Symfony prefers to store configs in YAML, but also supports PHP, XML and annotations (special kinds of comments used to store configuration).
To prevent warnings and to specify default values I use this class.
If you plan to build a frontend to edit settings (for example, through an HTML form in the admin area), use a relational database. It's much better for concurrent writes than a plain file. Database config is also useful if you have a fat database layer (triggers, for example).
2.
Premature optimization is the root of all evil. [Donald Knuth]
For small, static sets of data, the difference between storing it and hardcoding it is negligible. You're talking one DB hit to fetch the data plus parsing time versus having it coded. The main performance gain would be that hardcoded data gets saved by the opcache instead of hitting the DB every time. Unless we're talking about an application getting hundreds of thousands of views, you're talking less than a second of processing on a query that most RDBMSs (like MySQL) will cache for ready returns (writes cost more than reads in system resources).
I would say, given the small size, hard coding is perfectly acceptable here.
I strongly recommend having all of the data in some sort of config file or database. However, there is no rule against hardcoding small data sets; here's how I'd explain it.
The reason I am saying this is that no matter how much data there is, small or big, you will end up editing it.
If the data is hard-coded into the code, it's much more likely that you will end up with bad code quality.
My best suggestion is to do something like the following, if not a database:
Create a data file such as "data.lengths.php":
<?php
return array(
    50 => array(
        5.5, 5.5, // can have as many as you want..
    ),
);
You can prepare similar data files for the others as well.
Then you can simply use it wherever you would like to use it:
<?php
$data['length'] = require __DIR__ . '/data.lengths.php'; // assuming both files are in the same directory
This way you will have good code quality and, at the same time, you are not forcing yourself to take the long path.
My 2 cents, hope this helps.

Can memcached be used to reduce the load on these SELECT * queries

I have many users polling my PHP script on an Apache server, and the MySQL query they run is:
SELECT * FROM `table` WHERE `id` > 'example number'
where 'example number' can vary from user to user, but has a known lower bound which is updated every 10 minutes.
The server is getting polled twice a second by each user.
Can memcached be used? It's not crucial that the user is shown the most up-to-date information; if it's a second or so behind, that is fine.
The site has 200 concurrent users at peak times. It's hugely inefficient and costing a lot of resources.
To give an accurate answer, I would need more information:
Whether the query is pulling personalised information.
Whether the 'example number' comes along with the polling request.
Looking at the way you have structured your question, it doesn't seem like the user is polling for any personalised information. So I assume the 'example number' is also coming as part of the polling request.
I agree with @roberttstephens and @Filippos Karapetis that you can use the ideal solutions:
Redis
NoSQL
Tune the MySQL
Memcache
But as you already have the application out in the wild, implementing the above solutions will have a cost, so here are the practical solutions I would recommend.
Add indexes to your table on the relevant columns. [This is the first thing to check/do.]
Enable MySQL query caching.
Use a reverse proxy, e.g. Varnish. [Assumption: the 'example number' comes as part of the request.]
This intercepts requests before they even hit your application server, so the MySQL query or Memcache/Redis lookup doesn't happen at all.
Make sure you are setting specific cache headers on the response so that Varnish caches it (see the sketch below).
So, of the 200 concurrent requests, if 100 of them are querying for the same number, Varnish takes the hit. [This is the same advantage that memcached can offer.]
Implementation-wise it doesn't cost much in terms of development/testing effort.
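For the cache-header point above, a minimal sketch of what the PHP endpoint might send so that Varnish (or any reverse proxy) can serve the same response to everyone for a second or two:
header('Cache-Control: public, max-age=2');                        // the poll tolerates ~1-2 seconds of staleness
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 2) . ' GMT');
echo json_encode($poll_results);                                    // whatever the query/cache lookup produced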
I understand this is not the answer to the exact question, but I am sure this could solve your problem.
If the 'example number' doesn't come as part of the request and you have to fetch it from the DB [by looking at the user table, maybe], then @roberttstephens' approach is the way to go. Just to give you the exact picture, I have refactored the code a little.
$m = new Memcached();
$m->addServer('localhost', 11211);
$inputNumber = 12345;
$cacheKey = "poll:" . $inputNumber;
$result = $m->get($cacheKey);
if ($result) {
    return unserialize($result);
}
$sth = $dbh->prepare("SELECT column1, column2 FROM poll WHERE id = ?");
$sth->execute(array($inputNumber));
$poll_results = $sth->fetch(PDO::FETCH_OBJ);
$m->set($cacheKey, serialize($poll_results));
return $poll_results;
In my opinion, you're trying to use the wrong tool for the job here.
memcached is a key/value storage, so you can make it store and retrieve several values with a given set of keys very quickly. However, you don't seem to know the keys you want in advance, since you're looking for all records where the id is GREATER THAN a number, rather than a collection of IDs. So, in my opinion, memcached won't be appropriate to use in your scenario.
Here are your options:
Option 1: keep using MySQL and tune it properly
MySQL is quite fast if you tune it properly. You can:
add the appropriate indexes to each table
use prepared statements, which can help performance-wise in your case, as users are doing the same query over and over with different parameters
use query caching
Here's a guide with some hints on MySQL tuning, and mysqltuner, a Perl script that can guide you through the options needed to optimize your MySQL database.
Option 2: Use a more advanced key-value storage
There are alternatives to memcached, the best-known being Redis. Redis allows more flexibility, but it's more complex than memcached. For your scenario, you could use the Redis ZRANGE family of commands to retrieve the results you want - have a look at the available Redis commands for more information.
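As a rough sketch of that idea with the phpredis extension (the key name and the mirroring strategy are made up): keep the rows in a sorted set scored by id, then ZRANGEBYSCORE answers the "id greater than" lookup.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
// Whenever a row is written to MySQL, also mirror it into the sorted set:
// $redis->zAdd('poll', $rowId, serialize($row));
// Equivalent of: SELECT * FROM `table` WHERE `id` > 12345
$exampleNumber = 12345;
$members = $redis->zRangeByScore('poll', '(' . $exampleNumber, '+inf');
foreach ($members as $serialized) {
    $row = unserialize($serialized);
    // use $row ...
}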
Option 3: Use a document storage NoSQL database
You can use a document storage NoSQL database, with the most known example being MongoDB.
You can use more complex queries in MongoDB (e.g. use operators, like "greater than", which you require) than you can do in memcached. Here's an example of how to search through results in a mongo collection using operators (check example 2).
Have a look at the PHP MongoDB manual for more information.
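For instance, with the legacy mongo extension the "greater than" query looks roughly like this (database, collection and field names are made up):
$m = new MongoClient();
$collection = $m->selectDB('mydb')->selectCollection('poll');
// Equivalent of: SELECT * FROM `table` WHERE `id` > 12345
$cursor = $collection->find(array('id' => array('$gt' => 12345)));
foreach ($cursor as $doc) {
    // use $doc ...
}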
Also, this SO question is an interesting read regarding document storage NoSQL databases.
You can absolutely use memcached to cache the results. You could instead create a cache table in mysql with less effort.
In either case, you would need to create an id for the cache, and retrieve the results based on that id. You could use something like entity_name:entity_id, or namespace:entity_name:entity_id, or whatever works for you.
Keep in mind, memcached is another service running on the server. You have to install it, set it up to start on reboot (or you should at least), allocate memory, etc. You'll also need php-memcached.
With that said, please view the php documentation on memcached. http://php.net/manual/en/memcached.set.php . Assuming your poll id is 12345, you could use memcached like so.
<?php
// Get your results however you normally would.
$sth = $dbh->prepare("SELECT column1, column2 FROM poll WHERE id = 12345");
$sth->execute();
$poll_results = $sth->fetch(PDO::FETCH_OBJ);
// Set up memcached. It should obviously be installed, configured, and running by now.
$m = new Memcached();
$m->addServer('localhost', 11211);
$m->set('poll:12345', serialize($poll_results));
This example doesn't have any error checking or anything, but this should explain how to do it. I also don't have a php, mysql, or memcached instance running right now, so the above hasn't been tested.

MySQL optimization: Perform Maths operation inside or outside of a query?

I have a strong feeling that all mathematical operations unnecessary to the query itself ought to be performed outside of the query. For example:
$result = mysql_query("SELECT a, a*b/c AS score FROM `table`");
while ($row = mysql_fetch_assoc($result))
{
    echo $row['a'].' score: '.$row['score'].'<br>';
}
vs:
$result = mysql_query("SELECT a, b, c FROM `table`");
while ($row = mysql_fetch_assoc($result))
{
    echo $row['a'].' score: '.($row['a']*$row['b']/$row['c']).'<br>';
}
The second option would usually be better, especially with complex table joins and such. This is my suspicion; I only lack confirmation...
Faster depends on the machines involved, if you're talking about faster for one user. If you're talking about faster for a million users hitting a website, then it's more efficient to do these calculations in PHP.
The load of a webserver running PHP is very easily distributed over a large number of machines. These machines can run in parallel, handling requests from visitors and fetching necessary information from the database. The database, however, is not easy to run in parallel. Issues such as replication or sharding are complex and can require specialty software and properly organized data to function well. These are expensive solutions compared to adding another PHP installation to a server array.
Because of this, the value of a CPU cycle on the database machine is far more valuable than one on the webserver. So you should perform these math functions on the webserver, where CPU cycles are cheaper and significantly easier to parallelize.
This also assumes that the database isn't holding open any sort of data lock while performing the calculation. If so, then you're not just using precious CPU cycles, you're locking data from other users directly.
My feeling would be that doing the maths in the database would be slightly more efficient in the long run, given your query setup. With the select a,b,c version, PHP has to create 3 elements and populate them for each row fetched.
With the in-database version, only 2 elements are created, so you've cut creation time by 33%. Either way, the calculation has to be done, so there's not much in the way of savings there.
Now, if you actually needed the b and c values to be exposed to your code, then there'd be no point in doing the calculation in the database, you'd be adding more fields to the result set with their attendant creation/processing/populating overhead.
Regardless, though, you should benchmark both versions. What works in one situation may be worse than useless in another, and only some testing will show which is better.
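A quick-and-dirty way to do that benchmark from PHP (the DSN and table name are placeholders; run enough iterations to get past noise):
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass'); // hypothetical DSN

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $rows = $pdo->query('SELECT a, a*b/c AS score FROM `table`')->fetchAll(PDO::FETCH_ASSOC);
}
$inDb = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    $rows = $pdo->query('SELECT a, b, c FROM `table`')->fetchAll(PDO::FETCH_ASSOC);
    foreach ($rows as $row) {
        $score = $row['a'] * $row['b'] / $row['c'];
    }
}
$inPhp = microtime(true) - $start;

printf("in the query: %.4fs, in PHP: %.4fs\n", $inDb, $inPhp);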
I'd agree in general. Pull data from source in your query, manipulate data in the calling/scripting environment.
I wouldn't worry too much about efficiency/speed unless your queries get really complex, but it still seems like the right thing to do.
Math in the query is generally not a problem, UNLESS it is in the WHERE clause. Example:
SELECT a, b, c FROM table WHERE a*b=c
This makes it rather impossible to use an index.
SELECT a*b/c FROM table
Is fine.
If there is any performance advantage of one way over the other it is likely going to be very negligible making it more a matter of preference than optimization.
I prefer it in the query, personally because I feel it encapsulates the calculation in the data tier.
Also, although it doesn't apply to your specific example, the more information you give the DB engine about what you are ultimately trying to do, the more information it has to feed the query optimizer. It seems theoretically possible that the query might actually run faster if you put the calculation in the SQL.
Doing it in the database is better because you can run the application on one machine and the database on another; that way it balances your overall performance. Especially with cheap hosting services, they generally do that: application on one machine, database on another.
I doubt it could be a bottleneck.
Especially with complex table joins and such, one filesort will outweigh these maths by a factor of thousands.
However, you can always take some measurements with MySQL's BENCHMARK() function (note that it times an expression, not a whole result set):
SELECT BENCHMARK(1000000, 2*3/4);

What should I do to make mysql 100% optimal?

Recently I've been doing quite a big project with PHP + MySQL, and now I'm concerned about my MySQL. What should I do to make my MySQL setup as optimal as possible? Tell me everything you know; I'll be very grateful.
Second question: I use one MySQL query per page load, which fetches information from MySQL. It's quite a big query, because I take information from a few tables with a join. Maybe I should do something else?
Thank you.
Some top tips from MySQL Performance tips forge
Specific Query Performance:
Use EXPLAIN to profile the query execution plan
Use the Slow Query Log (always have it on!)
Don't use DISTINCT when you have or could use GROUP BY
Insert performance:
Batch INSERT and REPLACE
Use LOAD DATA instead of INSERT
LIMIT m,n may not be as fast as it sounds
Don't use ORDER BY RAND() if you have > ~2K records
Use SQL_NO_CACHE when you are SELECTing frequently updated data or large sets of data
Avoid wildcards at the start of LIKE queries
Avoid correlated subqueries in SELECT and WHERE clauses (try to avoid IN)
Scaling Performance Tips:
Use benchmarking
Isolate workloads: don't let administrative work (e.g. backups) interfere with customer performance.
Debugging sucks, testing rocks!
As your data grows, indexing may change (cardinality and selectivity change). Structuring may want to change. Make your schema as modular as your code. Make your code able to scale. Plan and embrace change, and get developers to do the same.
Network Performance Tips:
Minimize traffic by fetching only what you need.
1. Paging/chunked data retrieval to limit
2. Don't use SELECT *
3. Be wary of lots of small quick queries if a longer query can be more efficient
Use multi_query if appropriate to reduce round-trips
Use stored procedures to avoid bandwidth wastage
OS Performance Tips:
Use proper data partitions
1. For Cluster: start thinking about Cluster before you need it
Keep the database host as clean as possible. Do you really need a windowing system on that server?
Utilize the strengths of the OS
pare down cron scripts
create a test environment
Learn to use the explain tool.
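For example, you can look at a query plan straight from PHP (hypothetical DSN and query); the "key" and "rows" columns tell you whether an index is being used and how many rows will be examined:
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$plan = $pdo->query('EXPLAIN SELECT * FROM songs WHERE song_id = 5')
            ->fetchAll(PDO::FETCH_ASSOC);
print_r($plan);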
Three things:
Joins are not necessarily suboptimal. Oftentimes schemata that use joins will be faster than those that achieve the same but avoid table joins. The important thing is to know that your joins are optimal. EXPLAIN is very helpful but you also need to know how indexes work.
If you're grabbing data from the DB on every page hit, consider whether a caching system would work for you. If so, check out PHP memcache and memcached. It's easy to use in PHP and very fast. It's popular for a reason.
Back to MySQL: make sure your key buffer is sized correctly. You can also think about using dedicated key buffers for critical indices that should remain in cache. Read about CACHE INDEX and LOAD INDEX INTO CACHE. See also here.
"...because I take information from a few tables with a join"
Joins, even "big" joins aren't bad. Just be sure that you have good indexes.
Also note that performance with a couple of records is a lot different than performance with hundreds of thousands of records, so test accordingly.
For performance, this book is good: High Performance MySQL. The associated blog is good too.
My 2 cents: set your log_slow_queries threshold to <2 sec and use mysqlsla (get it from hackmysql.com) to analyse the 'slow' queries... This way you can drill down into the slower queries as they come along...
(mysqlsla can also benefit from the log-queries-not-using-indexes option)
On hackmysql.com there's also a script called 'mysqlreport' that reports on how your installation is running (once it has been running a while) and gives pointers as to where to tune your setup more precisely...
Being perfect is a bit of a challenge and not the first target to set yourself.
Enable mysql logging of all queries, and write some code which parses the log files and removes any literal values from the SQL statements.
e.g. changes
SELECT * FROM atable WHERE something=5 AND other='splodgy';
and
SELECT * FROM atable WHERE something=1 AND other='zippy';
to something like:
SELECT * FROM atable WHERE something=:1 AND other=:2;
(Sorry, I've not got my code which does this to hand - but it's not rocket science)
Then shove the re-written log into a table so you can prioritize your performance fixes based on length and frequency of execution.
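A rough sketch of that literal-stripping step (the regexes are simplistic and flatten every literal to a generic marker rather than numbering them):
function normalizeSql($sql)
{
    $sql = preg_replace("/'(?:[^'\\\\]|\\\\.)*'/", ':s', $sql); // quoted string literals
    $sql = preg_replace('/\b\d+\b/', ':n', $sql);               // bare numeric literals
    return $sql;
}

echo normalizeSql("SELECT * FROM atable WHERE something=5 AND other='splodgy';");
// SELECT * FROM atable WHERE something=:n AND other=:s;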

How do you manage SQL Queries

At the moment my code (PHP) has too many SQL queries in it. eg...
// not a real example, but you get the idea...
$results = $db->GetResults("SELECT * FROM sometable WHERE iUser=$userid");
if ($results) {
// Do something
}
I am looking into using stored procedures to reduce this and make things a little more robust, but I have some concerns..
I have hundreds of different queries in use around the web site, and many of them are quite similar. How should I manage all these queries when they are removed from their context (the code that uses the results) and placed in a stored procedure on the database?
The best course of action for you will depend on how you are approaching your data access. There are three approaches you can take:
Use stored procedures
Keep the queries in the code (but put all your queries into functions and fix everything to use PDO for parameters, as mentioned earlier)
Use an ORM tool
If you want to pass your own raw SQL to the database engine then stored procedures would be the way to go if all you want to do is get the raw SQL out of your PHP code but keep it relatively unchanged. The stored procedures vs raw SQL debate is a bit of a holy war, but K. Scott Allen makes an excellent point - albeit a throwaway one - in an article about versioning databases:
Secondly, stored procedures have fallen out of favor in my eyes. I came from the WinDNA school of indoctrination that said stored procedures should be used all the time. Today, I see stored procedures as an API layer for the database. This is good if you need an API layer at the database level, but I see lots of applications incurring the overhead of creating and maintaining an extra API layer they don't need. In those applications stored procedures are more of a burden than a benefit.
I tend to lean towards not using stored procedures. I've worked on projects where the DB has an API exposed through stored procedures, but stored procedures can impose some limitations of their own, and those projects have all, to varying degrees, used dynamically generated raw SQL in code to access the DB.
Having an API layer on the DB gives better delineation of responsibilities between the DB team and the Dev team at the expense of some of the flexibility you'd have if the query was kept in the code, however PHP projects are less likely to have sizable enough teams to benefit from this delineation.
Conceptually, you should probably have your database versioned. Practically speaking, however, you're far more likely to have just your code versioned than you are to have your database versioned. You are likely to be changing your queries when you are making changes to your code, but if you are changing the queries in stored procedures stored against the database then you probably won't be checking those in when you check the code in and you lose many of the benefits of versioning for a significant area of your application.
Regardless of whether or not you elect not to use stored procedures though, you should at the very least ensure that each database operation is stored in an independent function rather than being embedded into each of your page's scripts - essentially an API layer for your DB which is maintained and versioned with your code. If you're using stored procedures, this will effectively mean you have two API layers for your DB, one with the code and one with the DB, which you may feel unnecessarily complicates things if your project does not have separate teams. I certainly do.
If the issue is one of code neatness, there are ways to make code with SQL jammed in it more presentable, and the UserManager class shown below is a good way to start - the class only contains queries which relate to the 'user' table, each query has its own method in the class and the queries are indented into the prepare statements and formatted as you would format them in a stored procedure.
// UserManager.php:
class UserManager
{
    function getUsers()
    {
        $pdo = new PDO(...);
        $stmt = $pdo->prepare('
            SELECT u.userId as id,
                   u.userName,
                   g.groupId,
                   g.groupName
            FROM user u
            INNER JOIN group g
                ON u.groupId = g.groupId
            ORDER BY u.userName, g.groupName
        ');
        // iterate over result and prepare return value
    }

    function getUser($id) {
        // db code here
    }
}
// index.php:
require_once("UserManager.php");
$um = new UserManager;
$users = $um->getUsers();
foreach ($users as $user) echo $user['name'];
However, if your queries are quite similar but you have huge numbers of permutations in your query conditions like complicated paging, sorting, filtering, etc, an Object/Relational mapper tool is probably the way to go, although the process of overhauling your existing code to make use of the tool could be quite complicated.
If you decide to investigate ORM tools, you should look at Propel, the ActiveRecord component of Yii, or the king-daddy PHP ORM, Doctrine. Each of these gives you the ability to programmatically build queries to your database with all manner of complicated logic. Doctrine is the most fully featured, allowing you to template your database with things like the Nested Set tree pattern out of the box.
In terms of performance, stored procedures are the fastest, but generally not by much over raw sql. ORM tools can have a significant performance impact in a number of ways - inefficient or redundant querying, huge file IO while loading the ORM libraries on each request, dynamic SQL generation on each query... all of these things can have an impact, but the use of an ORM tool can drastically increase the power available to you with a much smaller amount of code than creating your own DB layer with manual queries.
Gary Richardson is absolutely right though, if you're going to continue to use SQL in your code you should always be using PDO's prepared statements to handle the parameters regardless of whether you're using a query or a stored procedure. The sanitisation of input is performed for you by PDO.
// optional
$attrs = array(PDO::ATTR_PERSISTENT => true);
// create the PDO object
$pdo = new PDO("mysql:host=localhost;dbname=test", "user", "pass", $attrs);
// also optional, but it makes PDO raise exceptions instead of
// PHP errors which are far more useful for debugging
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$stmt = $pdo->prepare('INSERT INTO venue(venueName, regionId) VALUES(:venueName, :regionId)');
$stmt->bindValue(":venueName", "test");
$stmt->bindValue(":regionId", 1);
$stmt->execute();
$lastInsertId = $pdo->lastInsertId();
var_dump($lastInsertId);
Caveat: assuming that the ID is 1, the above script will output string(1) "1". PDO->lastInsertId() returns the ID as a string regardless of whether the actual column is an integer or not. This will probably never be a problem for you as PHP performs casting of strings to integers automatically.
The following will output bool(true):
// regular equality test
var_dump($lastInsertId == 1);
but if you have code that is expecting the value to be an integer, like is_int or PHP's "is really, truly, 100% equal to" operator:
var_dump(is_int($lastInsertId));
var_dump($lastInsertId === 1);
you could run into some issues.
Edit: Some good discussion on stored procedures here
First up, you should use placeholders in your query instead of interpolating the variables directly. PDO/MySQLi allow you to write your queries like:
SELECT * FROM sometable WHERE iUser = ?
The API will safely substitute the values into the query.
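For example, with PDO (a minimal sketch; the DSN is made up):
$db = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$stmt = $db->prepare('SELECT * FROM sometable WHERE iUser = ?');
$stmt->execute(array($userid));   // the value is sent separately and never interpolated
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);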
I also prefer to have my queries in the code instead of the database. It's a lot easier to work with an RCS when the queries are with your code.
I have a rule of thumb when working with ORM's: if I'm working with one entity at a time, I'll use the interface. If I'm reporting/working with records in aggregate, I typically write SQL queries to do it. This means there's very few queries in my code.
I had to clean up a project with many (duplicate/similar) queries, riddled with injection vulnerabilities.
The first steps I took were using placeholders and labelling every query with the object/method and source line where the query was created.
(Insert the PHP constants __METHOD__ and __LINE__ into an SQL comment line.)
It looked something like this:
-- #Line:151 UserClass::getuser():
SELECT * FROM USERS;
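A sketch of how such a tag can be built at the call site (inside a class method, __METHOD__ expands to something like UserClass::getuser):
$sql = sprintf("-- #Line:%d %s:\n%s", __LINE__, __METHOD__, 'SELECT * FROM USERS');
$result = $pdo->query($sql); // $pdo assumed to be an existing PDO connection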
Logging all queries for a short time supplied me with some starting points on which queries to merge. (And where!)
I'd move all the SQL to a separate Perl module (.pm). Many queries could reuse the same functions, with slightly different parameters.
A common mistake for developers is to dive into ORM libraries, parametrized queries and stored procedures. We then work for months in a row to make the code "better", but it's only "better" in a development kind of way. You're not making any new features!
Use complexity in your code only to address customer needs.
Use an ORM package; any half-decent package will allow you to:
Get simple result sets
Keep your complex SQL close to the data model
If you have very complex SQL, then views are also nice to making it more presentable to different layers of your application.
We were in a similar predicament at one time. We queried a specific table in a variety of ways, over 50+.
What we ended up doing was creating a single Fetch stored procedure that includes a parameter value for the WhereClause. The WhereClause was constructed in a Provider object (we employed the Facade design pattern), where we could scrub it for any SQL injection attacks.
So as far as maintenance goes, it is easy to modify. SQL Server is also quite the chum and caches the execution plans of dynamic queries, so the overall performance is pretty good.
You'll have to determine the performance drawbacks based on your own system and needs, but all and all, this works very well for us.
There are some libraries, such as MDB2 in PEAR that make querying a bit easier and safer.
Unfortunately, they can be a bit wordy to set up, and you sometimes have to pass them the same info twice. I've used MDB2 in a couple of projects, and I tended to write a thin veneer around it, especially for specifying the types of fields. I generally make an object that knows about a particular table and its columns, and then a helper function in that object fills in the field types for me when I call an MDB2 query function.
For instance:
function MakeTableTypes($TableName, $FieldNames)
{
    $Types = array();
    foreach ($FieldNames as $FieldName => $FieldValue)
    {
        $Types[] = $this->Tables[$TableName]['schema'][$FieldName]['type'];
    }
    return $Types;
}
Obviously this object has a map of table names -> schemas that it knows about, and just extracts the types of the fields you specify, returning a matching type array suitable for use with an MDB2 query.
MDB2 (and similar libraries) then handle the parameter substitution for you, so for update/insert queries, you just build a hash/map from column name to value, and use the 'autoExecute' functions to build and execute the relevant query.
For example:
function UpdateArticle($Article)
{
    $Types = $this->MakeTableTypes($table_name, $Article);
    $res = $this->MDB2->extended->autoExecute($table_name,
                                              $Article,
                                              MDB2_AUTOQUERY_UPDATE,
                                              'id = '.$this->MDB2->quote($Article['id'], 'integer'),
                                              $Types);
}
and MDB2 will build the query, escaping everything properly, etc.
I'd recommend measuring performance with MDB2 though, as it pulls in a fair bit of code that might cause you problems if you're not running a PHP accelerator.
As I say, the setup overhead seems daunting at first, but once it's done the queries can be simpler/more symbolic to write and (especially) modify. I think MDB2 should know a bit more about your schema, which would simplify some of the commonly used API calls, but you can reduce the annoyance of this by encapsulating the schema yourself, as I mentioned above, and providing simple accessor functions that generate the arrays MDB2 needs to perform these queries.
Of course you can just do flat SQL queries as a string using the query() function if you want, so you're not forced to switch over to the full 'MDB2 way' - you can try it out piecemeal, and see if you hate it or not.
This other question also has some useful links in it...
Use an ORM framework like QCodo - you can easily map your existing database.
I try to use fairly generic functions and just pass the differences in them. This way you only have one function to handle most of your database SELECT's. Obviously you can create another function to handle all your INSERTS.
eg.
function getFromDB($table, $wherefield=null, $whereval=null, $orderby=null) {
    if ($wherefield != null) {
        $q = "SELECT * FROM $table WHERE $wherefield = '$whereval'";
    } else {
        $q = "SELECT * FROM $table";
    }
    if ($orderby != null) {
        $q .= " ORDER BY ".$orderby;
    }
    $records = array();
    $result = mysql_query($q) or die("ERROR: ".mysql_error());
    while ($row = mysql_fetch_assoc($result)) {
        $records[] = $row;
    }
    return $records;
}
This is just off the top of my head, but you get the idea. To use it just pass the function the necessary parameters:
eg.
$blogposts = getFromDB('myblog', 'author', 'Lewis', 'date DESC');
In this case $blogposts will be an array of arrays which represent each row of the table. Then you can just use a foreach or refer to the array directly:
echo $blogposts[0]['title'];
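Note that interpolating $wherefield and $whereval straight into the SQL leaves this open to injection; a parameterized sketch of the same idea with PDO might look like this (the ORDER BY parameter is left out because identifiers can't be bound, only values):
function getFromDbSafe(PDO $pdo, $table, $wherefield = null, $whereval = null)
{
    // Table/column names cannot be bound as parameters, so never take them from user input.
    $q = "SELECT * FROM `$table`";
    $params = array();
    if ($wherefield !== null) {
        $q .= " WHERE `$wherefield` = ?";
        $params[] = $whereval;
    }
    $stmt = $pdo->prepare($q);
    $stmt->execute($params);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}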
