Six years ago I started a new PHP OOP project without any prior experience, so I just made it up as I went along. I've noticed that my rather powerful MySQL server sometimes gets bogged down far too easily, and I was wondering about the best way to limit some db activity when I came up with this, as an example...
private $q_enghours;

public function gEngHours() {
    if ( !isset($this->q_enghours) ) {
        $q = "SELECT q_eh FROM " . quQUOTES . " WHERE id = " . $this->id;
        if ($r = $this->_dblink->query($q)) {
            $row = $r->fetch_row();
            $r->free();
            $this->q_enghours = $row[0];
        } else {
            $this->q_enghours = 0;
        }
    }
    return $this->q_enghours;
}
This seems like it should be effective in greatly reducing the number of reads from the db. If the object property is already populated, there's no need to access the db. Note that there are almost two dozen classes, all with the same db access routine for the "getter". I've only implemented this change in one place and was wondering if there is a "best practice" I may have missed, before I rewrite all the classes.
I'd say this question is based on a wrong premise.
If you want to deal with an "easily bogged down" database, then you have to dig up the particular reason instead of just making guesses. These trifling reads you are so concerned with won't make any real difference. You have to profile your whole application and find the actual cause.
If you want to reduce the number of reads, then make your object map a particular database record by reading that record and populating all the properties once, at object creation. Constructors are made for this.
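For illustration, a minimal sketch of that idea, assuming a hypothetical quotes table and a mysqli connection (all names here are made up):

class Quote {
    private $id;
    private $q_enghours;
    private $q_total;

    public function __construct(mysqli $dblink, $id) {
        // One read at construction populates every property of this record.
        $q = "SELECT q_eh, q_total FROM quotes WHERE id = " . (int)$id;
        if ($r = $dblink->query($q)) {
            $row = $r->fetch_assoc();
            $r->free();
            $this->id         = $id;
            $this->q_enghours = $row['q_eh'];
            $this->q_total    = $row['q_total'];
        }
    }

    public function gEngHours() {
        return $this->q_enghours;
    }
}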
As a side note, you really need a good database wrapper, if only to reduce the amount of code you have to write for each database call. With one, this code could be written as
public function gEngHours() {
    if ( !isset($this->q_enghours) ) {
        $this->q_enghours = $this->db->getOne("SELECT q_eh FROM ?n WHERE id = ?", quQUOTES, $this->id);
    }
    return $this->q_enghours;
}
where the getOne() method does all the work of running the query, fetching the row, getting the first column from it, and many other things like proper error handling and making the query safe.
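A minimal sketch of such a helper, built on PDO (the ?n identifier placeholder above is specific to the author's wrapper; plain PDO has no identifier placeholder, so this sketch assumes the table name is already trusted):

class Db {
    private $pdo;

    public function __construct(PDO $pdo) {
        $this->pdo = $pdo;
    }

    // Run a query with bound values and return the first column of the
    // first row, or false if the query produced no rows.
    public function getOne($sql, ...$params) {
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute($params);
        return $stmt->fetchColumn();
    }
}

// e.g. $hours = $db->getOne("SELECT q_eh FROM quotes WHERE id = ?", $this->id);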
Related
I've been working on a PHP project for about 6 months, using a multitier architecture (Data Access Layer, Business Logic Layer, ...).
One of the pages of my website has a really long loading time, so I investigated and realised it was a function from a BLL file that took a long time to process.
Here's an example of a function in my DAL files (here called DAL_Releve):
public static function getReleveByAffectationID($id){
    $conn = MgtConnexion::getConnexion();
    $sql = "SELECT * "
         . "FROM releve "
         . "WHERE Affectation_ID = :id "
         . "AND Date_Releve >= :date ;";
    $req = $conn->prepare($sql);
    $req->bindValue(":id", $id);
    $req->bindValue(":date", ($_SESSION["year"] - 1) . "-01-01");
    $req->execute();
    MgtConnexion::closeConnexion($conn);
    return $req;
}
Here's an example of a function in my BLL files (here called MgtReleve):
public static function getReleveByAffectationID($id){
    $req = DAL_Releve::getReleveByAffectationID($id);
    $releveArray = array();
    while($row = $req->fetch(PDO::FETCH_ASSOC)){
        $releveArray[] = new releve($row);
    }
    return $releveArray;
}
As you can see, I return the PDOStatement object $req from the DAL and then fetch it in the BLL.
My problem is here: in my interface file, I'm looping through an array of Affectation, and each of them has Releves (so each Releve has an Affectation_ID attribute), like this:
$lesAff = MgtAffectation::getAllAffectation();
// This doesn't return an array of objects, just an array of arrays,
// like: $array[<number>]["key"] = value;
foreach ($lesAff as $affect){
    $lesReleves = MgtReleve::getReleveByAffectationID($affect["Affectation_ID"]);
    foreach ($lesReleves as $rel){
        // DO STUFF HERE
    }
}
FYI: the page loads in 6.622 seconds with the code above, and in 0.014 seconds if I remove everything inside the first foreach (except for the HTML code, which is not shown here).
Now I'm starting to question my understanding of the multitier architecture:
If I decide not to fetch $req in my BLL file but in my interface (to loop over each query's results only once instead of twice), then what's the purpose of the BLL, if it only returns the PDOStatement object? Is there another way to return the data more efficiently?
There are many areas of optimization:
Technological optimization: instead of connecting to the database every time you run a query, connect only once and use that connection throughout the entire application. I wrote an article on the common errors of database wrappers; you may find it interesting to read, especially the part on the database connection.
Architectural optimization. It seems that you could benefit from loading in batches: instead of selecting database rows one by one, you could select them all at once. Check this answer on how to run a PDO query with multiple ids at once (see the sketch after this list). That way you would run only 68 queries instead of 1200.
Database optimization. Make sure that all queries involved are running fast.
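A minimal sketch of that batching idea with PDO (table and column names come from the question; the grouping step is my assumption about what the inner loop needs, and $lesAff is assumed non-empty):

// Collect all the ids first, then fetch every releve in one query.
$ids = array_column($lesAff, 'Affectation_ID');

$placeholders = implode(',', array_fill(0, count($ids), '?'));
$req = $conn->prepare(
    "SELECT * FROM releve WHERE Affectation_ID IN ($placeholders) AND Date_Releve >= ?"
);
$req->execute(array_merge($ids, array(($_SESSION["year"] - 1) . "-01-01")));

// Group the rows by Affectation_ID so the interface loop stays simple.
$relevesByAffectation = array();
while ($row = $req->fetch(PDO::FETCH_ASSOC)) {
    $relevesByAffectation[$row['Affectation_ID']][] = new releve($row);
}

foreach ($lesAff as $affect) {
    $lesReleves = isset($relevesByAffectation[$affect["Affectation_ID"]])
        ? $relevesByAffectation[$affect["Affectation_ID"]]
        : array();
    foreach ($lesReleves as $rel) {
        // DO STUFF HERE
    }
}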
I have a website with lots of PHP files (really a lot...) which use the pg_query and pg_exec functions, and these do not escape apostrophes in PostgreSQL queries. However, for security reasons, and to be able to store names containing apostrophes in my database, I want to add an escaping mechanism for my database input. A possible solution is to go through every PHP file and change pg_query and pg_exec to pg_query_params, but that is both time-consuming and error-prone. A better idea would be to somehow override pg_query and pg_exec with wrapper functions that do the escaping without having to change any PHP file, but in that case I guess I would have to change the PHP function definitions and recompile PHP, which is not very ideal.
So the question is open, and any ideas that would let me do what I want with minimum time consumption are very welcome.
You post no code but I guess you have this:
$name = "O'Brian";
$result = pg_query($conn, "SELECT id FROM customer WHERE name='{$name}'");
... and you'd need to have this:
$name = "O'Brian";
$result = pg_query_params($conn, 'SELECT id FROM customer WHERE name=$1', array($name));
... but you think the task will consume an unreasonable amount of time.
While it's certainly complex, what alternatives do you have? You cannot override pg_query(), but it'd be extremely simple to search and replace it with my_pg_query(). And now what? Your custom function will just see strings:
SELECT id FROM customer WHERE name='O'Brian'
SELECT id FROM customer WHERE name='foo' OR '1'='1'
Even if you manage to implement a bug-free SQL parser:
It won't work reliably with invalid SQL.
It won't be able to determine whether the query is the product of intentional SQL injection.
Just take it easy and fix queries one by one. It'll take time but possibly not as much as you think. Your app will be increasingly better as you progress.
This is a perfect example of when a database layer and associated API will save you loads of time. A good solution would be to make a DB class a singleton, which you can obtain from anywhere in your app. A simple set of wrapper functions allows you to route all queries to the DB through one point, so you can then alter the way they work very easily. You can also change from one DB to another, or from one DB vendor to another, without touching the rest of the app.
The escaping problem you are having is properly solved by using the PDO interface with bound parameters, instead of functions like pg_query(); this makes manual escaping unnecessary. Seeing as you'll have to alter every place in your app that uses the DB anyway, you may as well refactor to this pattern at the same time, as it'll be much the same amount of work.
class db_wrapper {
    // Singleton stuff
    private static $instance;

    private function __construct() {
        // Connect to DB (e.g. via PDO) and store the connection
    }

    public static function get_db() {
        if (isset(self::$instance)) {
            return self::$instance;
        }
        return self::$instance = new db_wrapper();
    }

    // Public API
    public function query($sql, array $vars) {
        // Use PDO to prepare the query, bind $vars and execute it
    }
}
// Other parts of your app look like this:
function do_something() {
    $db = db_wrapper::get_db();
    $sql = "SELECT * FROM table1 WHERE column = :name";
    $params = array('name' => 'valuename');
    $result = $db->query($sql, $params);
    // Use $result for something.
}
I have recently begun working on a pre-existing PHP application, and am having a bit of trouble figuring out exactly what I should do to improve this application's scalability. I am from a C#.NET background, so the idea of separation of concerns is not new to me.
First off, I want to tell you exactly what I am looking for. Currently this application is fairly large, consisting of about 70 database tables and ~200 pages. For the most part these pages simply do CRUD operations on the database, and rarely anything programmatically intensive. Almost all of the pages currently use code akin to this:
$x = $db->query("SELECT ID, name FROM table ORDER BY name");
if ($x->num_rows > 0) {
    while ($y = $x->fetch_object()) {
        foreach ($y as $key => $value)
            $$key = $value; // each column becomes a variable ($ID, $name, ...)
    }
}
Obviously, this is not great for an application as large as the one I am supposed to be working on. Now I, being from a C#.NET background, am used to building small-scale applications, and I generally build a small Business Object for each object in my application (almost a 1-to-1 match with my database tables). They follow this form:
namespace Application
{
    public class Person
    {
        private int _id = -1;
        private string _name = "";
        private bool _new = false;

        public int ID
        {
            get { return _id; }
        }

        public string Name
        {
            get { return _name; }
            set { _name = value; }
        }

        public Person(int id)
        {
            DataSet ds = SqlConn.doQuery("SELECT * FROM PERSON WHERE ID = " + id + ";");
            if (ds.Tables[0].Rows.Count > 0)
            {
                DataRow row = ds.Tables[0].Rows[0];
                _id = id;
                _name = row["Name"].ToString();
            }
            else
                throw new ArgumentOutOfRangeException("Invalid PERSON ID passed, passed ID was " + id);
        }

        public Person()
        {
            _new = true;
        }

        public void save()
        {
            if (_new)
            {
                SqlConn.doNonQuery(" INSERT INTO PERSON " +
                                   " ([Name]) " +
                                   " VALUES " +
                                   " ('" + Name.Replace("'", "''") + "'); ");
            }
            else
            {
                SqlConn.doNonQuery(" UPDATE PERSON " +
                                   " SET " +
                                   " [Name] = '" + _name.Replace("'", "''") + "' WHERE " +
                                   " ID = " + _id);
            }
        }
    }
}
In short, they simply have fields that mimic the table columns, properties, a constructor that pulls the information, and a save method that commits any changes made to the object.
On to my actual question: first of all, is this a good way of doing things? It has always worked well for me. Second, and more importantly, is this also a good way to do things in PHP? I've already noticed that PHP's loose typing causes me issues with the way I do things, and the lack of method overloading is entirely new to me. If this isn't good practice, are there any good examples of how I should adapt this project to 3 tiers? Or possibly I should not attempt this for some reason.
I am sorry for the length of the question, but I have been digging around the internet for a week or so to no avail; all I ever seem to find are syntax standards, not structure standards.
Thanks in advance.
There are several patterns one can follow. I like to use the MVC pattern myself. MVC is about splitting user interface interaction into three distinct roles. There are other patterns too, but for clarity (and the length of this post) I will only go into MVC. It doesn't really matter which pattern you follow as long as you use the SOLID principles.
I am from a C#.NET background, so the idea of separation of concerns is not new to me.
That's great. Trying to learn PHP will only be an implementation detail at this point.
Obviously, this is not great for an application that is as large as the one I am supposed to be working on.
Glad that we agree :-)
In an MVC pattern you would have:
Model: which does 'all the work'
View: which takes care of the presentation
Controller: which handles requests
When a page is requested, the request is handled by the Controller. If the controller needs some work done (e.g. getting a user) it asks the model to do it. When the controller has all the needed info it renders a view, and the view only renders that info.
That person object you were talking about would be a model in the MVC pattern, as in the sketch below.
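A minimal sketch of that flow (all class, method, and template names here are made up for illustration):

class PersonModel {
    public function getPerson($id) {
        // Does 'all the work': talks to the database, applies business
        // rules and returns a plain data structure.
        return array('id' => $id, 'name' => 'Alice');
    }
}

class PersonController {
    private $model;

    public function __construct(PersonModel $model) {
        $this->model = $model;
    }

    public function show($id) {
        // Handle the request: ask the model for the data...
        $person = $this->model->getPerson($id);
        // ...then hand it to the view for presentation.
        return $this->render('person.php', array('person' => $person));
    }

    private function render($template, array $vars) {
        extract($vars);
        ob_start();
        include $template; // the view only renders the info it is given
        return ob_get_clean();
    }
}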
I've noticed already that PHP's loose typing has caused me issues in the way I do things
PHP's loose typing is pretty sweet (if you know what you are doing) and at the same time can suck. Remember that you can always do strict comparisons by using three = signs:
if (1 == true) // truthy
if (1 === true) // falsy
also no method overloading is entirely new
PHP indeed doesn't really support method overloading, but it can be mimicked. Consider the following:
function doSomething($var1, $var2 = null)
{
    var_dump($var1, $var2);
}
doSomething('yay!', 'woo!'); // will dump yay! and woo!
doSomething('yay!'); // will dump yay! and null
Another way would be to do:
function doSomething()
{
    $numargs = func_num_args();
    $arg_list = func_get_args();
    for ($i = 0; $i < $numargs; $i++) {
        echo "Argument $i is: " . $arg_list[$i] . "<br />\n";
    }
}
doSomething('now I can add any number of args', array('yay!', 'woo!', false));
Or, possibly I should not attempt to do this for some reason.
Of course you should attempt it. When you already have programming experience it shouldn't be too hard.
If you have any more questions you can find me idling in the PHP chat tomorrow (I really need to get some sleep now :-) ).
is this a good way of doing things?
There is no ideal way; there is only good and bad for a given situation.
About typing: PHP doesn't lose it; we call that dynamic typing.
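For instance, the type travels with the value rather than the variable (var_dump output shown in the comments):

$x = "five";           // $x currently holds a string
var_dump(gettype($x)); // string(6) "string"

$x = 5;                // the same variable now holds an integer
var_dump(gettype($x)); // string(7) "integer"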
About the architecture, when we develop enterprise level web applications, from medium to large scale, we tend to talk about performance on a daily basis.
A good point to start would be reading about some of the most known PHP frameworks, like:
Zend
Symfony
Kohana
From that you can start thinking about architecture.
I realize the knee-jerk response to this question is "you don't", but hear me out.
Basically, I am running an active-record system on top of SQL, and in order to prevent duplicate objects for the same database row I keep an array in the factory with each currently loaded object (using the autoincrement id as the key).
The problem is that when, on occasion, I try to process 90,000+ rows through this system, PHP hits memory issues. This would very easily be solved by running a garbage collection every few hundred rows, but unfortunately, since the factory stores a copy of each object, PHP's garbage collection won't free any of these nodes.
The only solution I can think of is to check whether the reference count of an object stored in the factory is equal to one (i.e. nothing else references it), and if so, free it. This would solve my issue; however, PHP doesn't have a reference-count method (besides debug_zval_dump, but that's barely usable).
Sean's debug_zval_dump function looks like it will do the job of telling you the refcount, but really, the refcount doesn't help you in the long run.
You should consider using a bounded array to act as a cache; something like this:
<?php
class object_cache {
    var $objs = array();
    var $max_objs = 1024; // adjust to fit your use case

    function add($obj) {
        $key = $obj->getKey();
        // remove it from its old position
        unset($this->objs[$key]);
        // If the cache is full, retire the eldest from the front
        if (count($this->objs) > $this->max_objs) {
            $dead = array_shift($this->objs);
            // commit any pending changes to db/disk
            $dead->flushToStorage();
        }
        // (re-)add this item to the end
        $this->objs[$key] = $obj;
    }

    function get($key) {
        if (isset($this->objs[$key])) {
            $obj = $this->objs[$key];
            // promote to most-recently-used
            unset($this->objs[$key]);
            $this->objs[$key] = $obj;
            return $obj;
        }
        // Not cached; go and get it
        $obj = $this->loadFromStorage($key);
        if ($obj) {
            $this->objs[$key] = $obj;
        }
        return $obj;
    }
}
Here, getKey() returns some unique id for the object that you want to store.
This relies on the fact that PHP remembers the order of insertion into its hash tables; each time you add a new element, it is logically appended to the array.
The get() function makes sure that the objects you access are kept at the end of the array, so the front of the array is going to be least recently used element, and this is the one that we want to dispose of when we decide that space is low; array_shift() does this for us.
This approach is a least-recently-used (LRU) cache: it keeps the most recently used items and evicts the least recently used ones first. The idea is that you are more likely to access the items you have accessed most recently, so you keep them around.
What you get here is the ability to control the maximum number of objects that you keep around, and you don't have to poke around at the php implementation details that are deliberately difficult to access.
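For illustration, usage might look like this (assuming your row objects provide the getKey()/flushToStorage() hooks the cache expects, and that loadFromStorage() is implemented for your storage layer):

$cache = new object_cache();
$cache->max_objs = 500; // keep at most ~500 live objects

foreach ($row_ids as $id) {
    $obj = $cache->get($id); // loads from storage on a miss, promotes on a hit
    $obj->process();         // hypothetical per-record work
}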
It seems the best answer was still to get the reference count, although debug_zval_dump with ob_start was too ugly a hack to include in my application.
Instead I coded up a simple PHP module with a refcount() function, available at: http://github.com/qix/php_refcount
Yes, you can definitely get the refcount from PHP. Unfortunately, the refcount isn't easy to get at, as it has no accessor built into PHP. That's OK, because we have PREG!
<?php
function refcount($var)
{
    ob_start();
    debug_zval_dump($var);
    $dump = ob_get_clean();

    $matches = array();
    preg_match('/refcount\(([0-9]+)/', $dump, $matches);

    $count = $matches[1];
    // 3 references are added, including when calling debug_zval_dump()
    return $count - 3;
}
?>
Source: PHP.net
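For illustration, usage would look roughly like this (the exact counts depend on the PHP 5-era engine this trick targets; debug_zval_dump's output changed in PHP 7, so treat the numbers as indicative only):

$a = new stdClass();
echo refcount($a); // 1 -- only $a points at the value

$b = $a;           // a second variable now shares the same zval
echo refcount($a); // 2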
I know this is a very old issue, but it still came up as a top result in a search so I thought I'd give you the "correct" answer to your problem.
Unfortunately, getting the reference count, as you've found, is a minefield, but in reality you don't need it for 99% of the problems that might want it.
What you really want is the WeakRef class. Quite simply, it holds a weak reference to an object, which expires if there are no other references to the object, allowing it to be cleaned up by the garbage collector. It needs to be installed via PECL, but it really is something you want in every PHP installation.
You would use it like so (please forgive any typos):
class Cache {
    private $max_size;
    private $cache = [];
    private $expired = 0;

    public function __construct(int $max_size = 1024) { $this->max_size = $max_size; }

    public function add(int $id, object $value) {
        unset($this->cache[$id]);
        $this->cache[$id] = new WeakRef($value);
        if ($this->max_size > 0 && count($this->cache) > $this->max_size) {
            $this->prune();
            if (count($this->cache) > $this->max_size) {
                array_shift($this->cache);
            }
        }
    }

    public function get(int $id) { // ?object
        if (isset($this->cache[$id])) {
            $ref = $this->cache[$id];
            $result = $ref->get();
            if ($result === null) {
                // Prune if the cache gets too empty
                if (++$this->expired > count($this->cache) / 4) {
                    $this->prune();
                }
            } else {
                // Move the WeakRef to the end so it is culled last if non-empty
                unset($this->cache[$id]);
                $this->cache[$id] = $ref;
            }
            return $result;
        }
        return null;
    }

    protected function prune() {
        $this->cache = array_filter($this->cache, function($value) {
            return $value->valid();
        });
    }
}
This is the overkill version that uses both weak references and a max size (set it to -1 to disable that). Basically, if the cache gets too full or too many results have expired, it will prune any dead references to make space, and only drop live references if it has to.
PHP 7.4 now has WeakReference
To know if $obj is referenced by something else or not, you could use:
// 1: create a weak reference to the object
$wr = WeakReference::create($obj);

// 2: unset our reference
unset($obj);

// 3: test if the weak reference is still valid
$res = $wr->get();
if (!is_null($res)) {
    // the object is still referenced somewhere else
    $obj = $res;
    unset($res);
}
I had a similar problem with the Incredibly Flexible Data Storage (IFDS) file format when trying to keep track of references to objects in an in-memory data cache. I solved it by creating a ref-counting class that wraps a reference to the underlying array. I generally prefer arrays over objects, as PHP has traditionally handled arrays better than objects with regard to unfortunate things like memory leaks.
class IFDS_RefCountObj
{
    public $data;

    public function __construct(&$data)
    {
        $this->data = &$data;
        $this->data["refs"]++;
    }

    public function __destruct()
    {
        $this->data["refs"]--;
    }
}
Since 'refs' is tracked as a regular value in the data, it is possible to know when the last reference to the data has gone away. Regardless of whether multiple variables reference the refcounting object or it is cloned, the refcount will always be non-zero until all references are gone. I don't need to care how many actual references there are internally in PHP as long as the value is correctly zero vs. non-zero. The IFDS implementation also tracks an estimated amount of RAM being used by each object (again, being exact isn't super important as long as it is in the ballpark), allowing it to prioritize writing and releasing unused objects that are occupying system resources first and then writing and releasing portions of still-referenced objects that are caching large quantities of DATA chunk information.
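For illustration, the zero vs. non-zero behavior looks like this:

$record = array("refs" => 0, "payload" => "...");

// Two separate handles to the same underlying record.
$a = new IFDS_RefCountObj($record);
$b = new IFDS_RefCountObj($record);
echo $record["refs"]; // 2

unset($a);
echo $record["refs"]; // 1 -- still referenced somewhere

unset($b);
echo $record["refs"]; // 0 -- now safe to evict from the cache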
To get back to the topic/question: with this ref-counting class-based approach it is, for example, mostly straightforward to prune a cache down to ~5,000 records upon hitting 10,000. The general strategy is to keep records that are still referenced, plus the most recently used unreferenced records, since they are likely to be referenced again. On each new reference, calling unset() and then setting the item again moves it to the end of the array, so the oldest, probably unreferenced items appear first and the newest, probably still-referenced items appear last.
Weak references, as several people have suggested, won't solve every caching issue. They don't work in scenarios where you don't want an item removed from the cache until the application is done with it, but also want it kept around as long as RAM permits, even if the application stops referencing it temporarily and might need it again in a moment. Weak references are also incapable of handling items that hold unwritten data, which may need to survive even when nothing in the application references them. In short, when there is a balancing act to maintain, weak references cannot be used.
In the xdebug code coverage, it shows the line "return false;" (below "!$r") as not covered by my tests. But, the $sql is basically hard-coded. How do I get coverage on that? Do I overwrite "$table" somehow? Or kill the database server for this part of the test?
I guess this is probably telling me I'm not writing my model very well, right? Because I can't test it well. How can I write this better?
Since this one line is not covered, the whole method is not covered and the reports are off.
I'm fairly new to phpunit. Thanks.
public function retrieve_all()
{
    $table = $this->tablename();
    $sql = "SELECT t.* FROM `{$table}` as t";
    $r = dbq($sql, 'read');
    if (!$r) {
        return false;
    }
    $ret = array();
    while ($rs = mysql_fetch_array($r, MYSQL_ASSOC)) {
        $ret[] = $rs;
    }
    return $ret;
}
In theory, you would be better off separating the model from all the database-related code.
For example, in Zend Framework, the quickstart guide advises you to have:
your model classes
your data mappers, whose role is to "translate" the model into the database model.
your DAOs (or table data gateway) that do the direct access to the tables
This is a really interesting model; you should have a look at it. It enables you to really separate the model from the data, and thus to run tests only on the model part (without caring about any database problems), as in the sketch below.
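A minimal sketch of that layering (names are made up; this is not Zend's actual API):

// The model knows nothing about the database.
class Person {
    public $id;
    public $name;
}

// The mapper translates between the model and the table.
class PersonMapper {
    private $pdo;

    public function __construct(PDO $pdo) { $this->pdo = $pdo; }

    public function find($id) {
        $stmt = $this->pdo->prepare("SELECT id, name FROM person WHERE id = ?");
        $stmt->execute(array($id));
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        if (!$row) {
            return null;
        }
        $person = new Person();
        $person->id = $row['id'];
        $person->name = $row['name'];
        return $person;
    }
}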
But for your code as it is, I suggest writing a test in which the dbq() function returns false (perhaps by making the database connection fail on purpose), in order to get full code coverage.
I often have these kinds of situations, where testing all the "error cases" takes too much time, so I give up on having 100% code coverage.
I guess the function dbq() performs a database query. Just unplug your database connection and re-run the test.
The reason you have trouble testing this is that you are using a global resource: your database connection. You would normally avoid this problem by providing the connection object to the method (or class) under test.
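A minimal sketch of that idea, assuming we may change the class to accept the query function as a dependency (the closure stands in for the global dbq(); all names are illustrative):

class Model
{
    private $query;

    // Inject the query function; production code passes a wrapper
    // around dbq(), tests pass a stub.
    public function __construct(callable $query)
    {
        $this->query = $query;
    }

    public function retrieve_all()
    {
        $table = $this->tablename();
        $r = call_user_func($this->query, "SELECT t.* FROM `{$table}` as t");
        if (!$r) {
            return false;
        }
        return $r; // assume the injected function already returns the rows
    }

    protected function tablename() { return 'mytable'; }
}

// In a PHPUnit test, covering the failure branch is now trivial:
$model = new Model(function ($sql) { return false; });
assert($model->retrieve_all() === false);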