I'm querying big chunks of data with cachephp's find. I use recursive 2. (I really need that much recursion sadly.) I want to cache the result from associations, but I don't know where to return them. For example I have a Card table and card belongs to Artist. When I query something from Card, the find method runs in the Card table, but not in the Artist table, but I get the Artist value for the Card's artist_id field and I see a query in the query log like this:
`Artist`.`id`, `Artist`.`name` FROM `swords`.`artists` AS `Artist` WHERE `Artist`.`id` = 93
My question is how can I cache this type of queries?
Thanks!
1. Where does Cake "do" this?
CakePHP does this really cool but - as you have discovered yourself - sometimes expensive operation in its different DataSource::read() Method implementations. For example in the Dbo Datasources its here. As you can see, you have no direct 'hooks' (= callbacks) at the point where Cake determines the value of the $recursive option and may decides to query your associations. BUT we have before and after callbacks.
2. Where to Cache the associated Data?
Such an operation is in my opinion best suited in the beforeFind and afterFind callback method of your Model classes OR equivalent with Model.beforeFind and Model.afterFind event listeners attached to the models event manager.
The general idea is to check your Cache in the beforeFind method. If you have some data cached, change the $recursive option to a lower value (e.g. -1, 0 or 1) and do the normal query. In the afterFind method, you merge your cached data with the newly fetched data from your database.
Note that beforeFind is only called on the Model from which you are actually fetching the data, whereas afterFind is also called on every associated Model, thus the $primary parameter.
3. An Example?
// We are in a Model.
protected $cacheKey;
public function beforeFind($query) {
if (isset($query["recursive"]) && $query["recursive"] == 2) {
$this->cacheKey = genereate_my_unique_query_cache_key($query); // Todo
if (Cache::read($this->cacheKey) !== false) {
$query["recursive"] = 0; // 1, -1, ...
return $query;
}
}
return parent::beforeFind($query);
}
public function afterFind($results, $primary = false) {
if ($primary && $this->cacheKey) {
if (($cachedData = Cache::read($this->cacheKey)) !== false) {
$results = array_merge($results, $cachedData);
// Maybe use Hash::merge() instead of array_merge
// or something completely different.
} else {
$data = ...;
// Extract your data from $results here,
// Hash::extract() is your friend!
// But use debug($results) if you have no clue :)
Cache::write($this->cacheKey, $data);
}
$this->cacheKey = null;
}
return parent::afterFind($results, $primary);
}
4. What else?
If you are having trouble with deep / high values of $recursion, have a look into Cake's Containable Behavior. This allows you to filter even the deepest recursions.
As another tip: sometimes such deep recursions can be a sign of a general bad or suboptimal design (Database Schema, general Software Architecture, Process and Functional flow of the Appliaction, and so on). Maybe there is an easier way to achieve your desired result?
The easiest way to do this is to install the CakePHP Autocache Plugin.
I've been using this (with several custom modifications) for the last 6 months, and it works extremely well. It will not only cache the recursive data as you want, but also any other model query. It can bring the number of queries per request to zero, and still be able to invalidate its cache when the data changes. It's the holy grail of caching... ad-hoc solutions aren't anywhere near as good as this plugin.
Write query result like following
Cache::write("cache_name",$result);
When you want to retrieve data from cache then write like
$results = Cache::read("cache_name");
Related
I'm working on an application written in PHP. I decided to follow the MVC architecture.
However, as the code gets bigger and bigger, I realized that some code gets duplicated in some cases. Also, I'm still confused whether I should use static functions when quering the database or not.
Let's take an example on how I do it :
class User {
private id;
private name;
private age;
}
Now, inside this class I will write methods that operate on a single user instance (CRUD operations). On the other hand, I added general static functions to deal with multiple users like :
public static function getUsers()
The main problem that I'm facing is that I have to access fields through the results when I need to loop through users in my views. for example :
$users = User::getUsers();
// View
foreach($users as $user) {
echo $user['firstname'];
echo $user['lastname'];
}
I decided to do this because I didn't feel it's necessary to create a single user instance for all the users just to do some simple data processing like displaying their informations. But, what if I change the table fields names ? I have to go through all the code and change those fields, and this is what bothers me.
So my question is, how do you deal with database queries like that, and is it fine to use static functions when querying the database. And finally, where is it logical to store those "displaying" functions like the one I talked about ?
Your approach seems fine, howerver I would still use caching like memcached to cache values and then you can remove static.
public function getUsers() {
$users = $cacheObj->get('all_users');
if ($users === false) {
//use your query to grab users and set it to cache
$users = "FROM QUERY";
$cacheObj->set('all_users', $users);
}
return $users;
}
(M)odel (V)iew (C)ontroller is a great choice choice, but my advice is look at using a framework. The con is they can have a step learning curve, pro is it does a lot of heavy lifting. But if you want to proceed on your own fair play, it can be tough to do it yourself.
Location wise you have a choice because the model is not clearly define:
You'll hear the term "business logic" used, basically Model has everything baring views and the controllers. The controllers should be lean only moving data then returning it to the view.
You model houses DB interaction, data conversions, timezone changes, general day to day functions.
Moudle
/User
/Model
/DB or (Entities and Mapper)
/Utils
I use Zend and it uses table gateways for standard CRUD to avoid repetition.
Where you have the getUsers() method you just pass a array to it, and it becomes really reusable and you'd just have different arrays in various controller actions and it builds the queries for you from the array info.
Example:
$data = array ('id' => 26)
$userMapper->getUsers($data);
to get user 26
enter code here
$data = array ('active' => 1, 'name' => 'Darren')
$userMapper->getUsers($data);`
to get active users named Darren
I hope this help.
I am using fatfree framework with the cortex ORM plugin. I am trying to get the number of records matching a specific criteria. My code:
$group_qry = new \models\system\UserGroup;
$group_qry->load(array('type=?',$name));
echo $group_qry->count(); //always returns 3, i.e total number of records in table
Initially I was thinking that this maybe because the filtering wasn't working and it always fetched everything, but that is not the case, cause I verified it with
while(!$group_qry->dry()){
echo '<br/>'.$group_qry->type;
$group_qry->next();
}
So how do I get the number of records actually loaded after filtering?
Yeah this part is confusing: the count() method actually executes a SELECT COUNT(*) statement. It takes the same arguments as the load() method, so in your case :
$group_qry->count(array('type=?',$name));
It is not exactly what you need, since it will execute a second SELECT, which will reduce performance.
What you need is to count the number of rows in the result array. Since this array is a protected variable, you'll need to create a dedicated function for that in the UserGroup class:
class UserGroup extends \DB\SQL\Mapper {
function countResults() {
return count($this->query);
}
}
If you feel that it's a bit of overkill for such a simple need, you can file a request to ask for the framework to handle it. Sounds like a reasonable demand.
UPDATE:
It's now part of the framework. So calling $group_qry->loaded() will return the number of loaded records.
xfra35 is right, there was currently no method for just returning the loaded items, when using the mappers as active record. alternatively you can just count the collection:
$result = $group_qry->find(array('type=?',$name));
echo count($result); // gives you the number of all found results
in addition, i've just added the countResults method, like suggested. thx xfra35.
https://github.com/ikkez/F3-Sugar/commit/92fa18130892ab7d2303edc31d2b1f4bba70e881
Let's keep it simple.
A project has just two models:
User (hasMany Project)
Project (belongsTo User)
Users are only allowed to perform actions on the projects which they own. No one else's.
I know how to manually check who the logged in user is and whether or not he/she owns a specific project, but is there a better, more global way to do this? I'm looking for a more D.R.Y. way that doesn't require repeating the same validation inside multiple actions. For example, is there a config setting like maybe...
Configure::write('Enforce_belongs_to', true);
...or maybe a setting/option on the Auth component.
Maybe this is crazy, but I figured I'd ask.
Adding to Nunser's answer, here would be a general concept of how the behavior would be. You can then attach it to the applicable model.
class StrongBelongBehavior extends ModelBehavior
{
public function beforeFind( Model $Model, $query = array() ) {
$query['conditions'] = array_merge( (array)$query['conditions'], array( $Model->alias.'.user_id' => CakeSession::read("Auth.User.id" ) );
return $query;
}
public function beforeSave( Model $Model ) {
$projectId = Hash::get( $Model->data, 'Poject.id' );
if( $projectId ) {
$Model->loadModel('UserProject'); // UserProject is a custom model
$canEdit = $Model->UserProject->projectIDExists( $projectId ); // returns true if projectId belongs to the current user
if ( ! $canEdit ) {
return false;
}
}
return true;
}
}
I'm not sure if what I'm answering is the best-utermost-dry-it's-almost-dehydrating approach, but is the simplest thing I could think of.
In the Project model, create a function that return an array of project ids associated to an user.
class Project extends AppModel {
public function getByUserId($userId) {
$projectsArray = array();
if ($userId != "valid")
//do all the checks, if it's not null, numeric, if the id exists, etc
$projects = $this->Project->find('all', array('conditions'=>
array('user_id'=>$userId)));
if (!empty($projects)) {
foreach($projects as $i => $project)
$projectsArray[] = $project['Project']['id'];
}
return $projectsArray;
}
}
You mention a find('first') in your comment, but I'm assuming you want all the projects related to the user instead of just the first. If not, it's a simple modification of that function. Also, I'm just getting the ids, but you may want an $id=>$name_project array, up to you.
Now, I don't know what you mean by "only allowed to perform actions", is it just edits that are restricted? Or lists or views should be restricted and not even shown to the user if the project is not his/hers?
For the first case, restrict editing, modify beforeSave.
public function beforeSave($options = array()) {
if(!$this->id && !isset($this->data[$this->alias][$this->primaryKey])) {
//INSERT
//not doing anything
} else {
//UPDATE
//check if project inside allowed projects array
$allowed = $this->getByUserId(CakeSession::read("Auth.User.id"));
if (!in_array($this->id, $allowed))
return false; //or throw error and catch it in the controller
}
return true;
}
The code is untested, but in general terms, you prevent the edit of a project that is not "the user's" just before the update of the record. I assume the insert of new projects is free for everyone. According to this post, all saving functions except saveAll pass through this filter first, so you will need to overwrite the saveAll function and add a validation similar to the one in beforeSave (as explained in the answer there).
And for the second part, filtering results so the user isn't even aware that there are other projects instead of his/hers, change beforeFind. The docs talk about restricting results based on user's roles, so I guess we're on the right track.
public function beforeFind($queryData) {
//force the condition
$allowed = $this->getByUserId(CakeSession::read("Auth.User.id"));
$queryData['conditions'][$this->alias.'.user_id'] = $allowed;
return $queryData;
}
Since the $allowed array has just id values, it'll work like an IN clause, but if you change that array structure, be sure to modify these functions accordingly.
And that's it. I'm thinking about the more basic cases here, edit, view, delete... Ups, delete... change the beforeDelete function also, to avoid any evil users who want to delete others property. The logic remains the same (check if project id is in allowed array, if not, return false or throw error), so I won't add the example of that function here. But that's the basic stuff. If for some reason you want to have the allowed projects in the controller, call the getByUserId function in beforeFilter and handle that ids array there. You can even store it in session, but you'll have to have in mind maintaining that session when adding or deleting projects.
If you want a superadmin that can see and edit everything, just add a condition in getByUserId that checks the role of the user, and if it is an admin, return all projects.
Also, keep in mind: maybe Project has many... subprojects (not much imagination), and so, the user related to the project can add subprojects, but the same evil user as before modifies the hidden project_id that subproject has and edits it. In that case, I recommend you also add a validation in Subproject to avoid actions on models related to Project that are not his. If you have the Security component in place and the edit and delete actions can just be reached by forms, this is a minor thing because Security Component well used avoids form tampering. But give it a thought to see if you need to validate "Subproject" instances also.
As Ayo Akinyemi mentioned, you can use all this as a behavior. I haven't personally done so, but it meets the requirements, all the callbacks modified here are what you modify in a behaviour. You'll have to encapsulate the logic, column names (need to be variable an not set hardcoded, like user_id), etc, but it will be reusable in any other cake project you'll have. Something like StrongBelongBehavior or MoreDRYBehavior. And share it if you do it :)
I'm not sure if Auth component has some way of doing what you want, but that would be the best option I guess. Until some enlightens me (I haven't investigate much this issue), this is the solution I'd use.
I have a Propel 1.6 generated class Group that has Inits related to it, and Inits have Resps related to them. Pretty straightforward.
I don't understand the difference between these two pieces of Propel code. Here in the first one, I re-create the $notDeleted criteria on every loop. This code does what I want -- it gets all the Resps into the $data array.
foreach ($group->getInits() as $init) {
$notDeleted = RespQuery::create()->filterByIsDeleted(false);
foreach ($init->getResps($notDeleted) as $resp) {
$data[] = $resp;
}
}
Here in the second code, I had the $notDeleted criteria pulled out of the loop, for (what I thought were) obvious efficiency reasons. This code does not work the way I want -- it only gets the Resps from one of the Inits.
$notDeleted = RespQuery::create()->filterByIsDeleted(false);
foreach ($group->getInits() as $init) {
foreach ($init->getResps($notDeleted) as $resp) {
$data[] = $resp;
}
}
I thought it must be something to do with how the getResps() method caches the results, but that's not how the docs or the code reads in that method. The docs and the code say that if the criteria passed in to getResps() is not null, it will always get the results from the database. Maybe some other Propel cache?
(First off, I'm guessing you meant to use $init versus $initiative in your loops. That or there's some other code we're not seeing here.)
Here's my guess: In your second example you pull out the $notDeleted Criteria object, but each time through the inner foreach the call to getResps($notDeleted) is going to make Propel do a filterByInit() on the Criteria instance with the current Init instance. This will add a new WHERE condition to the SQL, but obviously a Resp can only have one Init.Id value, hence the lone result.
I don't think there is a good reason to pull that out though, under the covers Propel is just creating a new Criteria object, cloning the one you pass in - thus no real memory saved.
I realize the knee-jerk response to this question is that "you dont.", but hear me out.
Basically I am running on an active-record system on a SQL, and in order to prevent duplicate objects for the same database row I keep an 'array' in the factory with each currently loaded object (using an autoincrement 'id' as the key).
The problem is that when I try to process 90,000+ rows through this system on the odd occasion, PHP hits memory issues. This would very easily be solved by running a garbage collect every few hundred rows, but unfortunately since the factory stores a copy of each object - PHP's garbage collection won't free any of these nodes.
The only solution I can think of, is to check if the reference count of the objects stored in the factory is equal to one (i.e. nothing is referencing that class), and if so free them. This would solve my issue, however PHP doesn't have a reference count method? (besides debug_zval_dump, but thats barely usable).
Sean's debug_zval_dump function looks like it will do the job of telling you the refcount, but really, the refcount doesn't help you in the long run.
You should consider using a bounded array to act as a cache; something like this:
<?php
class object_cache {
var $objs = array();
var $max_objs = 1024; // adjust to fit your use case
function add($obj) {
$key = $obj->getKey();
// remove it from its old position
unset($this->objs[$key]);
// If the cache is full, retire the eldest from the front
if (count($this->objs) > $this->max_objs) {
$dead = array_shift($this->objs);
// commit any pending changes to db/disk
$dead->flushToStorage();
}
// (re-)add this item to the end
$this->objs[$key] = $obj;
}
function get($key) {
if (isset($this->objs[$key])) {
$obj = $this->objs[$key];
// promote to most-recently-used
unset($this->objs[$key]);
$this->objs[$key] = $obj;
return $obj;
}
// Not cached; go and get it
$obj = $this->loadFromStorage($key);
if ($obj) {
$this->objs[$key] = $obj;
}
return $obj;
}
}
Here, getKey() returns some unique id for the object that you want to store.
This relies on the fact that PHP remembers the order of insertion into its hash tables; each time you add a new element, it is logically appended to the array.
The get() function makes sure that the objects you access are kept at the end of the array, so the front of the array is going to be least recently used element, and this is the one that we want to dispose of when we decide that space is low; array_shift() does this for us.
This approach is also known as a most-recently-used, or MRU cache, because it caches the most recently used items. The idea is that you are more likely to access the items that you have accessed most recently, so you keep them around.
What you get here is the ability to control the maximum number of objects that you keep around, and you don't have to poke around at the php implementation details that are deliberately difficult to access.
It seems like the best answer was still getting the reference count, although debug_zval_dump and ob_start was too ugly a hack to include in my application.
Instead I coded up a simple PHP module with a refcount() function, available at: http://github.com/qix/php_refcount
Yes, you can definitely get the refcount from PHP. Unfortunately, the refcount isn't easily gotten for it doesn't have an accessor built into PHP. That's ok, because we have PREG!
<?php
function refcount($var)
{
ob_start();
debug_zval_dump($var);
$dump = ob_get_clean();
$matches = array();
preg_match('/refcount\(([0-9]+)/', $dump, $matches);
$count = $matches[1];
//3 references are added, including when calling debug_zval_dump()
return $count - 3;
}
?>
Source: PHP.net
I know this is a very old issue, but it still came up as a top result in a search so I thought I'd give you the "correct" answer to your problem.
Unfortunately getting the reference count as you've found is a minefield, but in reality you don't need it for 99% of problems that might want it.
What you really want to use is the WeakRef class, quite simply it holds a weak reference to an object, which will expire if there are no other references to the object, allowing it to be cleaned up by the garbage collector. It needs to be installed via PECL, but it really is something you want in every PHP installation.
You would use it like so (please forgive any typos):
class Cache {
private $max_size;
private $cache = [];
private $expired = 0;
public function __construct(int $max_size = 1024) { $this->max_size = $max_size; }
public function add(int $id, object $value) {
unset($this->cache[$id]);
$this->cache[$id] = new WeakRef($value);
if ($this->max_size > 0) && ((count($this->cache) > $this->max_size)) {
$this->prune();
if (count($this->cache) > $this->max_size) {
array_shift($this->cache);
}
}
}
public function get(int $id) { // ?object
if (isset($this->cache[$id])) {
$result = $this->cache[$id]->get();
if ($result === null) {
// Prune if the cache gets too empty
if (++$this->expired > count($this->cache) / 4) {
$this->prune();
}
} else {
// Move to the end so it is culled last if non-empty
unset($this->cache[$id]);
$this->cache[$id] = $result;
}
return $result;
}
return null;
}
protected function prune() {
$this->cache = array_filter($this->cache, function($value) {
return $value->valid();
});
}
}
This is the overkill version that uses both weak references and a max size (set it to -1 to disable that). Basically if it gets too full or too many results were expired, then it will prune the cache of any empty references to make space, and only drop non-empty references if it has to for sanity.
PHP 7.4 now has WeakReference
To know if $obj is referenced by something else or not, you could use:
// 1: create a weak reference to the object
$wr = WeakReference::create($obj);
// 2: unset our reference
unset($obj);
// 3: test if the weak reference is still valid
$res = $wr->get();
if (!is_null($res)) {
// a handle to the object is still held somewhere else in addition to $obj
$obj = $res;
unset($res);
}
I had a similar problem with the Incredibly Flexible Data Storage (IFDS) file format with trying to keep track of references to objects in an in-memory data cache. How I solved it was to create a ref-counting class that wrapped a reference to the underlying array. I generally prefer arrays over objects as PHP has traditionally tended to handle arrays better than objects with regards to unfortunate things like memory leaks.
class IFDS_RefCountObj
{
public $data;
public function __construct(&$data)
{
$this->data = &$data;
$this->data["refs"]++;
}
public function __destruct()
{
$this->data["refs"]--;
}
}
Since 'refs' is tracked as a regular value in the data, it is possible to know when the last reference to the data has gone away. Regardless of whether multiple variables reference the refcounting object or it is cloned, the refcount will always be non-zero until all references are gone. I don't need to care how many actual references there are internally in PHP as long as the value is correctly zero vs. non-zero. The IFDS implementation also tracks an estimated amount of RAM being used by each object (again, being exact isn't super important as long as it is in the ballpark), allowing it to prioritize writing and releasing unused objects that are occupying system resources first and then writing and releasing portions of still-referenced objects that are caching large quantities of DATA chunk information.
To get back to the topic/question, with this ref-counting class-based approach, it is, for example, mostly straightforward to prune to ~5,000 records in a cache upon hitting 10,000 records in the cache. General strategy is to not get rid of records still being referenced plus keep the most recently requested/used records that aren't being referenced because they are likely to be referenced again. Upon every new reference, unset() and then setting the item again will move the item to the end of the array so that the oldest probably unreferenced items appear first and the newest probably still referenced items appear last.
Weak references, as several people have suggested, won't solve every caching issue. They don't work in caching scenarios where you don't want to remove an item from the cache until the application is done working with it (i.e. deleting an item that the application later attempts to use) but also want to keep it around as long as RAM overhead permits even if the application stops referencing it temporarily but might need it again in a moment. Weak references are also incapable of working in scenarios where the item in the cache is holding onto unwritten data that may or may not be fine with staying unwritten even if there are no references to it in the application. In short, when there is a balancing act to maintain, weak references cannot be used.