There are, of course, many many ways to store a base of data. A database being the most obvious of them. But others include JSON, XML, and so on.
The probem I have with the project I'm working on right now is that the data being stored includes callback functions as part of the objects. Functions cannot be serialised, to my knowledge. So, what should I do?
Is it acceptable to store this data as a PHP file to be included? If so, should I create one big file with everything in, or divide it into separate files for each "row" of the database?
Are there any alternatives that may be better?
Depending on how elaborate you callbacks are, for serializing you could wrap them in all in a class which utilizes some __sleep (create callback representation) & __wakeup (restore callback) voodoo, with an __invoke() method calling the actual callback. Assuming you can reverse engineer / recreate those callbacks (i.e. object to point to is clear).... If not, you are probably out of luck.
Well if they are created by the developer it should be easy to come up with a format... for example with JSON:
{
"callback" : {
"contextType": "instance", // or "static"
"callable" : "phpFunctionName",
"arguments" : []
}
}
So then on models the would be able to use this feature you might do something like:
protected function invokeCallback($json, $context = null) {
$data = json_decode($json, true);
if(isset($data['callaback'])) {
if($data['contextType'] == 'instance') {
$context = is_object($context) ? $context : $this;
$callable = array($context, $data['callable']);
} else {
// data[callable] is already the string function name or a array('class', 'staticMethod')
$callable = $data['callable'];
}
if(is_callable($callable) {
return call_user_func_array($callable, $data['arguments'];
} else {
throw new Exception('Illegal callable');
}
}
return false;
}
Theres some more error handling that needs to happen in there as well as some screening of the callables you want to allow but you get the idea.
Store names of callbacks instead, have a lookup table from names to functions, use table.call(obj, name, ...) or table.apply(obj, name, ...) to invoke. Do not have any code in the serialisable objects themselves. Since they are developer-created, there should be a limited inventory of them, they should be in code, and should not be serialised.
Alternately, make a prototype implementing the callbacks, and assign it to obj.__proto__. Kill the obj.__proto__ just before you serialise - all the functions disappear. Restore it afterwards. Not sure if cross-browser compatible; I seem to have read something about __proto__ not being accessible in some browsers khminternetexploreroperakhm
Related
I am currently working on a mashup that incorporates many data feeds. In order to display ALL of the feeds that the user wants on one page, I am currently using if statements to cross-check with the MySQL database like this:
if($var["type"]=="weather")
$var being the result of a call to mysqli_fetch_array
and then including code relevant to the function (e.g. weather) underneath, and then another "if" statement for another feed, so on so on. The problem is that there will be many feeds, and having all these "if" statements will be slow and redundant.
Is there any way to optimize this PHP code?
Another solution might be to map the "type" to a custom function using associative arrays.
e.g. (pseudo code)
function handle_wheater_logic() {
// ... your code goes here
}
function handle_news_logic() {
// .. your code goes here
}
$customFunctions = array("wheater" => "handle_wheater_logic", "news" => "handle_news_logic");
while ($row = mysql_fetch_...) {
call_user_func ($customFunctions[$row["type"]])
}
This would eliminate the need to use a lot of if statements. You might as well do the "type to function" mapping in a configuration file or maybe just store the name of the custom function to call for each "type" in a database table - that's up to you.
You can, of course also pass parameters to custom function. Just checkout the documentation for call_user_func[_array].
Try this:
$methods = array(
"weather" => function() {
// code
},
"otheroption" => function() {
}
);
Just use then $var["type"] as a index in the array to get the function:
$methods[$var["type"]]();
You can obviuosly, for better readbility do something similar:
$methods = array(
"weather" => "wheater_function",
"otheroption" => "other_function"
);
and then call the functions this way:
call_user_func($methods[$var["type"]]);
To be even more object oriented we can obviously store in the array objects implementing a particular interface, or store object redifining the __call() magic method and use it like functions.
You can use a switch statement.
A good solution for eliminating a lot of if statements and a huge switch statement just checking for one condition, would be to implement a design pattern such as the Strategy pattern.
This way you will have the code for each type separated, which makes it easier to overview and manage.
Here's an example of an implementation http://blogs.microsoft.co.il/blogs/gilf/archive/2009/11/22/applying-strategy-pattern-instead-of-using-switch-statements.aspx
Even if you won't implement this strictly it will give you some ideas on how to solve this elegantly.
Create an array that associates a function to each feed type:
$actions = array("weather" => "getWeather",
"news" => "getNews");
Then use call_user_func to call the correct one:
call_user_func($actions[$var["type"]]);
Polymorphysm for the rescue.
inteface FeedInterface {
public function retrieve($params);
}
class FeedWeather implements FeedInterface {
public function retrieve($params) {
//retrieve logic for weather feed
}
}
class FeedSports implements FeedInterface {
public function retrieve($params) {
//retrieve logic for sports feed
}
}
With use of PHP class autoloading, each of above declarations can be in a separate file, possibly namespaced as well. Then your feed retrieval code could look like this:
$class = 'Feed'.$var["type"];
$feed = new $class;
$feed->retrieve($params);
That's overly simplified and would need some additional code for error handling, discovery of non-existing classes and such, but the idea should be clear.
Using either an If Statement or a Switch statement will be faster than you care about. It might look ugly and be cumbersome to maintain but it will be fast.
I was previously mostly scripting in PHP and now considering getting "more serious" about it :)
I am working on a hiking website, and I needed to put some values into an object that I then try to pass back to the calling code.
I tried doing this:
$trailhead = new Object ();
But the system sort of barfed at me.
Then I didn't declare the object at all, and started using it like this:
$trailhead->trailhead_name = $row['trailhead_name'];
$trailhead->park_id = $row['park_id'];
That seemed to work reasonably ok. But there are at least 3 problems with this:
Since that code gets executed when getting things from the database, what do I do in case there is more than one row?
When I passed the $trailhead back to the calling code, the variables were empty
I actually am maybe better off making a real object for Trailhead like this:
class Trailhead
{
private $trailhead_name;
private $park_id;
private $trailhead_id;
public function __construct()
{
$this->trailhead_name = NULL;
$this->park_id = NULL;
$this->trailhead_id = NULL;
}
}
What do people generally do in these situations and where am I going wrong in my approach? I know its more than one place :)
$trailheads[] = $trailhead;
I'd do a print_r() of $trailhead to check that it's what you expect it to be. The default object type in PHP is going to be stdClass.
Yes, that's going to be better, as it'll allow your Trailhead objects to have functions. The way you're currently doing it is basically taking advantage of none of PHP's object functionality - it's essentially an array with slightly different syntax.
I think you should get in "contact" with some of the basics first. Objects in PHP have sort of a history. They are a relative to the array and there are two sorts of objects:
data-objects
class objects
data objects are called stdClass and that's actually what you were initially looking for:
$trailhead = new Object();
in PHP is written as:
$trailhead = new stdClass;
(with or without brackets at the end, both works)
You then can dynamically add members to it, like you did in the second part without declaring the variable (that works, too in PHP):
$trailhead->trailhead_name = $row['trailhead_name'];
$trailhead->park_id = $row['park_id'];
If you want to more quickly turn $row into an object, just cast it:
$trailhead = (object) $row;
That works the other way, too:
$array = (array) $trailhead;
As you can see, those basic data based objects in PHP do not hide their relationship to the array data type. In fact you can even iterator over them:
foreach($trailhead as $key=>$value) {
echo $key, ' is ', $value, "<br>\n";
}
You can find lots of information about this in the PHP manual, it's a bit spread around, but worth to know the little basics before repeating a lot of names only to pass along the data that belongs together.
Next to these more or less stupid data objects, you can code complete classes that - next to what every stdClass can do - can have code, visibility, inheritance and all the things you can build nice stuff from - but this can be pretty complex as the topic is larger.
So it always depends on what you need. However, both type of objects are "real objects" and both should work.
class myClass {
function importArray(array $array) {
foreach($array as $key=>$value) {
if(!is_numeric($key)) $this->$key=$value;
}
}
function listMembers() {
foreach($this as $key=>$value) {
echo $key, ' is ', $value, "<br>\n";
}
}
}
$trailhead = new myClass();
$trailhead->importArray($row);
echo $trailhead->park_id;
Keep in mind that instead of creating a set of objects that merely does the same in each object (store data), you should just take one flexible class that is handling the job flexible (e.g. stdClass) because that will keep your code more clean.
Instead of coding a getter/setter orgy you can then spend the time thinking about how you can make the database layer more modular etc. .
Just pass back an array:
$trailhead = array(
'trailhead_name' => $row['trailhead_name'],
'park_id' => $row['park_id'],
'trailhead_id' => $row['trailhead_id'],
)
Then either access it like:
$trailhead['park_id'];
or use the nifty list() to read it into variables:
list($trailhead_name, $park_id, $trailhead_id) = $trailhead;
I needed to create dynamic breadCrumbs that must be realized automatically by the application. So I have the following structure in the URL for navagation:
nav=user.listPMs.readPM&args=5
then i could have a function-file whose sole purpose would be to define the user.listPMs.readPM function itself:
file: nav/user.listPMs.readPM.php
function readPM($msgId)
{
/*code here*/
}
Of course this ends up cluttering the global scope since i'm not wrapping the function withing a class or using namespaces. The best solution here seems to be namespacing it, no doubt right? But I also thought of another one:
file: nav/user.listPMs.readPM.php
return function($msgId)
{
/*code here*/
};
Yep, that simple, the file is simply returning an anonymous function. I think this is amazing because i don't need to care about naming it - since i've already properly named the file that contains it, creating a user function and yet having to name it would seem just redundant. Then in the index I would have this little dirty trick:
file: index.php
if($closure = #(include 'nav/'.$_GET['nav']))
{
if($closure instanceof Closure)
{
$obj = new ReflectionFunction($closure);
$args = explode(',',#$_GET['args']);
if($obj->getNumberOfParameters($obj)<=count($args))
call_user_func_array($closure,$args);
else
die('Arguments not matching or something...');
} else {
die('Bad call or something...');
}
} else {
die('Bad request etc.');
}
Don't even need to mention that the breadCrumbs can be nicely built latter just by parsing the value within the $_GET['nav'] variable.
So, what do you think, is there a better solution to this problem? Have you found another way to explore Closures and/or Reflection?
I like the basic idea. But the implementation is pretty much terrible. Imagine that I set nav=../../../../../../etc/passwd. That would (depending on your server configuration) allow me to access your password file, which certainly is no good.
This isn't a real fluent interface. I have an object which builds up a method stack. Which gets executed by a single function call. But now I might add another virtual method, which "takes over" that method stack.
Use case: I'm wrapping my superglobals into objects. This allows me to "enforce" input filtering. $_GET and co provide simple sanitizing methods. And my new version now allows chaining of atomic filters. As example:
$_GET->ascii->nocontrol->text["field"]
This is a method call. It uses angle brackets. But that's just a nice trick which eases rewriting any occourence of $_GET["field"]. Anyway.
Now there are also occasionally forms with enumerated fields, as in field[0],field[1],field[2]. That's why I've added a virtual ->array filter method. It hijacks the collected method stack, and iterates the remaining filters on e.g. a $_POST array value. For example $_POST->array->int["list"].
Somewhat shortened implementation:
function exec_chain ($data) {
...
while ($filtername = array_pop($this->__filter)) {
...
$data = $this->{"_$filtername"} ($data);
...
}
function _array($data) {
list($multiplex, $this->__filter) = array($this->__filter, array());
$data = (array) $data;
foreach (array_keys($data) as $i) {
$this->__filter = $multiplex;
$data[$i] = $this->exec_chain($data[$i]);
}
return $data;
}
The method stack gets assembled in the $this->__filter list. Above exec_chain() just loops over it, each time removing the first method name. The virtual _array handler is usually the first method. And it simply steals that method stack, and reexecutes the remainder on each array element. Not exactly like in above example code, but it just repeatedly repopulates the original method stack.
It works. But it feels kind of unclean. And I'm thinking of adding another virtual method ->xor. (YAGNI?) Which would not just iterate over fields, but rather evaluate if alternate filters were successful. For example $_REQUEST->array->xor->email->url["fields"]. And I'm wondering if there is a better pattern for hijacking a function list. My current hook list ($this->__filter) swapping doesn't lend itself to chaining. Hmm well actually, the ->xor example wouldn't need to iterate / behave exactly like ->array.
So specifically, I'm interested in finding an alternative to my $this->__filter list usage with array_pop() and the sneaky swapping it out. This is bad. Is there a better implementation scheme to executing a method list half part me -> half part you?
I've made a similar chaining interface before, I like your idea of using it on GET/POST vars.
I think you will be better off doing something like $var->array->email_XOR_url; rather than $var->array->email->XOR->url;. That way you can catch the various combinations with your __get/__call magic.
I realize the knee-jerk response to this question is that "you dont.", but hear me out.
Basically I am running on an active-record system on a SQL, and in order to prevent duplicate objects for the same database row I keep an 'array' in the factory with each currently loaded object (using an autoincrement 'id' as the key).
The problem is that when I try to process 90,000+ rows through this system on the odd occasion, PHP hits memory issues. This would very easily be solved by running a garbage collect every few hundred rows, but unfortunately since the factory stores a copy of each object - PHP's garbage collection won't free any of these nodes.
The only solution I can think of, is to check if the reference count of the objects stored in the factory is equal to one (i.e. nothing is referencing that class), and if so free them. This would solve my issue, however PHP doesn't have a reference count method? (besides debug_zval_dump, but thats barely usable).
Sean's debug_zval_dump function looks like it will do the job of telling you the refcount, but really, the refcount doesn't help you in the long run.
You should consider using a bounded array to act as a cache; something like this:
<?php
class object_cache {
var $objs = array();
var $max_objs = 1024; // adjust to fit your use case
function add($obj) {
$key = $obj->getKey();
// remove it from its old position
unset($this->objs[$key]);
// If the cache is full, retire the eldest from the front
if (count($this->objs) > $this->max_objs) {
$dead = array_shift($this->objs);
// commit any pending changes to db/disk
$dead->flushToStorage();
}
// (re-)add this item to the end
$this->objs[$key] = $obj;
}
function get($key) {
if (isset($this->objs[$key])) {
$obj = $this->objs[$key];
// promote to most-recently-used
unset($this->objs[$key]);
$this->objs[$key] = $obj;
return $obj;
}
// Not cached; go and get it
$obj = $this->loadFromStorage($key);
if ($obj) {
$this->objs[$key] = $obj;
}
return $obj;
}
}
Here, getKey() returns some unique id for the object that you want to store.
This relies on the fact that PHP remembers the order of insertion into its hash tables; each time you add a new element, it is logically appended to the array.
The get() function makes sure that the objects you access are kept at the end of the array, so the front of the array is going to be least recently used element, and this is the one that we want to dispose of when we decide that space is low; array_shift() does this for us.
This approach is also known as a most-recently-used, or MRU cache, because it caches the most recently used items. The idea is that you are more likely to access the items that you have accessed most recently, so you keep them around.
What you get here is the ability to control the maximum number of objects that you keep around, and you don't have to poke around at the php implementation details that are deliberately difficult to access.
It seems like the best answer was still getting the reference count, although debug_zval_dump and ob_start was too ugly a hack to include in my application.
Instead I coded up a simple PHP module with a refcount() function, available at: http://github.com/qix/php_refcount
Yes, you can definitely get the refcount from PHP. Unfortunately, the refcount isn't easily gotten for it doesn't have an accessor built into PHP. That's ok, because we have PREG!
<?php
function refcount($var)
{
ob_start();
debug_zval_dump($var);
$dump = ob_get_clean();
$matches = array();
preg_match('/refcount\(([0-9]+)/', $dump, $matches);
$count = $matches[1];
//3 references are added, including when calling debug_zval_dump()
return $count - 3;
}
?>
Source: PHP.net
I know this is a very old issue, but it still came up as a top result in a search so I thought I'd give you the "correct" answer to your problem.
Unfortunately getting the reference count as you've found is a minefield, but in reality you don't need it for 99% of problems that might want it.
What you really want to use is the WeakRef class, quite simply it holds a weak reference to an object, which will expire if there are no other references to the object, allowing it to be cleaned up by the garbage collector. It needs to be installed via PECL, but it really is something you want in every PHP installation.
You would use it like so (please forgive any typos):
class Cache {
private $max_size;
private $cache = [];
private $expired = 0;
public function __construct(int $max_size = 1024) { $this->max_size = $max_size; }
public function add(int $id, object $value) {
unset($this->cache[$id]);
$this->cache[$id] = new WeakRef($value);
if ($this->max_size > 0) && ((count($this->cache) > $this->max_size)) {
$this->prune();
if (count($this->cache) > $this->max_size) {
array_shift($this->cache);
}
}
}
public function get(int $id) { // ?object
if (isset($this->cache[$id])) {
$result = $this->cache[$id]->get();
if ($result === null) {
// Prune if the cache gets too empty
if (++$this->expired > count($this->cache) / 4) {
$this->prune();
}
} else {
// Move to the end so it is culled last if non-empty
unset($this->cache[$id]);
$this->cache[$id] = $result;
}
return $result;
}
return null;
}
protected function prune() {
$this->cache = array_filter($this->cache, function($value) {
return $value->valid();
});
}
}
This is the overkill version that uses both weak references and a max size (set it to -1 to disable that). Basically if it gets too full or too many results were expired, then it will prune the cache of any empty references to make space, and only drop non-empty references if it has to for sanity.
PHP 7.4 now has WeakReference
To know if $obj is referenced by something else or not, you could use:
// 1: create a weak reference to the object
$wr = WeakReference::create($obj);
// 2: unset our reference
unset($obj);
// 3: test if the weak reference is still valid
$res = $wr->get();
if (!is_null($res)) {
// a handle to the object is still held somewhere else in addition to $obj
$obj = $res;
unset($res);
}
I had a similar problem with the Incredibly Flexible Data Storage (IFDS) file format with trying to keep track of references to objects in an in-memory data cache. How I solved it was to create a ref-counting class that wrapped a reference to the underlying array. I generally prefer arrays over objects as PHP has traditionally tended to handle arrays better than objects with regards to unfortunate things like memory leaks.
class IFDS_RefCountObj
{
public $data;
public function __construct(&$data)
{
$this->data = &$data;
$this->data["refs"]++;
}
public function __destruct()
{
$this->data["refs"]--;
}
}
Since 'refs' is tracked as a regular value in the data, it is possible to know when the last reference to the data has gone away. Regardless of whether multiple variables reference the refcounting object or it is cloned, the refcount will always be non-zero until all references are gone. I don't need to care how many actual references there are internally in PHP as long as the value is correctly zero vs. non-zero. The IFDS implementation also tracks an estimated amount of RAM being used by each object (again, being exact isn't super important as long as it is in the ballpark), allowing it to prioritize writing and releasing unused objects that are occupying system resources first and then writing and releasing portions of still-referenced objects that are caching large quantities of DATA chunk information.
To get back to the topic/question, with this ref-counting class-based approach, it is, for example, mostly straightforward to prune to ~5,000 records in a cache upon hitting 10,000 records in the cache. General strategy is to not get rid of records still being referenced plus keep the most recently requested/used records that aren't being referenced because they are likely to be referenced again. Upon every new reference, unset() and then setting the item again will move the item to the end of the array so that the oldest probably unreferenced items appear first and the newest probably still referenced items appear last.
Weak references, as several people have suggested, won't solve every caching issue. They don't work in caching scenarios where you don't want to remove an item from the cache until the application is done working with it (i.e. deleting an item that the application later attempts to use) but also want to keep it around as long as RAM overhead permits even if the application stops referencing it temporarily but might need it again in a moment. Weak references are also incapable of working in scenarios where the item in the cache is holding onto unwritten data that may or may not be fine with staying unwritten even if there are no references to it in the application. In short, when there is a balancing act to maintain, weak references cannot be used.