Lookup table: array vs. switch/case performance - php

In a PHP application, I have a very large constant (literal) lookup table in the form of a three dimensional array with string keys in the first two dimensions, and the third dimension being non-associative, with many repetitions in the second level (some whole third-dimension arrays appear repeated).
Coming from compiled languages like C++, I prefer such lookup tables because they produce almost no runtime cost (only cache-misses in the code are a problem). So I defined it in PHP like this:
$table = array(
'182-0' => array(
'221-0' => array()
),
'184-0' => array(
'197-0' => array('1.9','1.10','1.11','1.12','1.13','1.14','1.15','1.16','1.17','1.18'),
'201-0' => array('1.9','1.10','1.11','1.12','1.13','1.14','1.15','1.16','1.17','1.18'),
'221-0' => array('1.19','1.20','1.21','1.22','1.23','1.24','1.25','1.26','1.27','1.28'),
'221-1' => ...,
...
),
...
);
This array consumes about 1 megabyte in source code and is fully static. All of this code is executed unconditionally in order to create the complete lookup table, then one third-dimension array is only used once per request:
// once per request, for given $a and $b:
foreach ($table[$a][$b] as $item) {
...
}
I guess that there might be better choices for code like this. I guess the core problem of the above code is that the array is built up on every request dynamically, but since it is only used once this effort is pretty much useless.
What is a good alternative to the above code? I'd like to save both source code size as well as runtime cost.
Should I rewrite this as branches? Are there even better options, specifically suited for constant lookup tables?
I'm thinking of a "branch"-based solution like this (note the case-fall-through to avoid duplication of the third-level-arrays):
function table($a, $b) {
switch ($a) {
case '182-0':
switch ($b) {
case '221-0':
return array();
default:
return null;
}
case '184-0':
switch ($b) {
case '197-0':
case '201-0':
return array('1.9','1.10','1.11','1.12','1.13','1.14','1.15','1.16','1.17','1.18');
case '221-0':
return array('1.19','1.20','1.21','1.22','1.23','1.24','1.25','1.26','1.27','1.28');
...
default:
return null;
}
}
...
return null;
}
Then use it with:
// once per request, for given $a and $b:
foreach (table($a, $b) as $item) {
...
}
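One way to shrink the source without rewriting everything as branches is to write each repeated third-level list once and reuse it by name. The following is only a sketch under assumptions: the variable names `$listA` and `$listB` and the helper `lookup()` are made up for illustration, and whether this is faster than the switch version would have to be measured.

```php
<?php
// Repeated third-level lists are written once and reused by name.
// $listA, $listB and lookup() are hypothetical names for this sketch.
$listA = array('1.9','1.10','1.11','1.12','1.13','1.14','1.15','1.16','1.17','1.18');
$listB = array('1.19','1.20','1.21','1.22','1.23','1.24','1.25','1.26','1.27','1.28');

$table = array(
    '182-0' => array(
        '221-0' => array(),
    ),
    '184-0' => array(
        '197-0' => $listA,
        '201-0' => $listA,
        '221-0' => $listB,
    ),
);

// Returns the third-level list, or an empty array for unknown keys,
// so the result can always be passed straight to foreach.
function lookup(array $table, $a, $b)
{
    return isset($table[$a][$b]) ? $table[$a][$b] : array();
}

foreach (lookup($table, '184-0', '201-0') as $item) {
    // ... use $item
}
```

Returning an empty array instead of null for unknown keys avoids the warning that foreach raises when it is handed null.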

Related

Recall chained methods on PHP

I call an object that returns an array given certain chained methods:
Songs::duration('>', 2)->artist('Unknown')->genre('Metal')->stars(5)->getAllAsArray();
The problem is that every time I want to get this array, for example in another script, I have to chain everything again. Now imagine that in over 10 scripts.
Is there a way to recall the chained methods for later use?
Since you can't cache the result, you could cache the structure of the call chain in an array.
$chain = [
'duration' => ['>', 2],
'artist' => 'Unknown',
'genre' => 'Metal',
'stars' => 5,
'getAllAsArray' => null
];
You could use that with a function that emulates the chained call using the cached array:
function callChain($object, $chain) {
foreach ($chain as $method => $params) {
$params = is_array($params) ? $params : (array) $params;
$object = call_user_func_array([$object, $method], $params);
}
return $object;
}
$result = callChain('Songs', $chain);
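To see the replayed chain run end to end, here is a self-contained sketch. The `SongQuery` class is a hypothetical stand-in for `Songs` (the real class is not shown in the question); it simply records every filter it receives so the result can be inspected.

```php
<?php
// Hypothetical stand-in for the Songs class: it records each filter call.
class SongQuery
{
    private $filters = array();

    // Static entry point, e.g. SongQuery::duration('>', 2) starts a chain.
    public static function __callStatic($name, $args)
    {
        return (new self())->$name(...$args);
    }

    // Instance calls such as ->artist('Unknown') land here.
    public function __call($name, $args)
    {
        $this->filters[$name] = $args;
        return $this;
    }

    public function getAllAsArray()
    {
        return $this->filters;
    }
}

// Replays a stored chain: the first call is static, the rest are on the returned object.
function callChain($object, array $chain)
{
    foreach ($chain as $method => $params) {
        // (array) turns a scalar into a one-element list and null into an empty list
        $object = call_user_func_array(array($object, $method), (array) $params);
    }
    return $object;
}

$chain = array(
    'duration' => array('>', 2),
    'artist' => 'Unknown',
    'genre' => 'Metal',
    'stars' => 5,
    'getAllAsArray' => null,
);

$result = callChain('SongQuery', $chain);
```

Because the chain is plain data, it can live in one shared file and be replayed from any script that needs it.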
If you cannot cache your results, as I suggested in my comment, here are a couple of ideas. If your application allows mixing functions and classes (that is, your company's development rules permit it), you can use a function wrapper:
// The function can be as complex as you want
// You can make '>', 2 args too if they are going to be different all the time
function getArtists($array)
{
return \Songs::duration('>', 2)->artist($array[0])->genre($array[1])->stars($array[2])->getAllAsArray();
}
print_r(getArtists(array('Unknown','Metal',5)));
If you are only allowed to use classes and __callStatic() is not forbidden in your development and is also available in the version of PHP you are using, you might try that:
// If you have access to the Songs class
public static function __callStatic($name, $args)
{
// This should explode your method name
// so you have two important elements of your chain
// Unknown_Metal() should produce "Unknown" and "Metal" as key 0 and 1
$settings = explode("_",$name);
// Args should be in an array, so if you have 1 value, should be in key 0
$stars = (isset($args[0]))? $args[0] : 5;
// return the contents
return self::duration('>', 2)->artist($settings[0])->genre($settings[1])->stars($stars)->getAllAsArray();
}
This should return the same as your chain:
print_r(\Songs::Unknown_Metal(5));
It should be noted that overloading is hard to follow because there is no concrete method called Unknown_Metal so it's harder to debug. Also note I have not tested this particular set-up out locally, but I have notated what should happen where.
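Since the snippet above was not tested, here is a small runnable sketch of the same name-parsing idea. `SongFinder` is a hypothetical class that returns the parsed settings instead of running a real query, so the behaviour of `__callStatic()` can be checked in isolation.

```php
<?php
// Hypothetical class: it only shows how __callStatic() receives the method name.
class SongFinder
{
    public static function __callStatic($name, $args)
    {
        // "Unknown_Metal" becomes array("Unknown", "Metal")
        $settings = explode('_', $name);
        if (count($settings) !== 2) {
            // Fail loudly, because overloaded calls are hard to debug otherwise
            throw new BadMethodCallException("Unsupported method: $name");
        }
        // Arguments always arrive as a list; default to 5 stars when none is given
        $stars = isset($args[0]) ? $args[0] : 5;

        return array('artist' => $settings[0], 'genre' => $settings[1], 'stars' => $stars);
    }
}

$withStars = SongFinder::Unknown_Metal(4);
$defaultStars = SongFinder::Unknown_Metal();
```

Throwing on names that do not match the expected pattern is an addition in this sketch; it keeps a typo in the method name from silently building a wrong query.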
If those are not allowed, I would then make a method to shorten that chain:
public static function getArtists($array)
{
// Note, '>', 2 can be args too, I just didn't add them
return self::duration('>', 2)->artist($array[0])->genre($array[1])->stars($array[2])->getAllAsArray();
}
print_r(\Songs::getArtists(array('Unknown','Metal',5)));
I wrote a lib doing exactly what you're looking for, implementing the principle suggested by Don't Panic in a high quality way: https://packagist.org/packages/jclaveau/php-deferred-callchain
In your case you would code
$search = DeferredCallChain::new_(Songs::class) // or shorter: later(Songs::class)
->duration('>',2) // static syntax "::" cannot handle chaining sadly
->artist('Unknown')
->genre('Metal')
->stars(5)
->getAllAsArray();
print_r( $search($myFirstDBSongs) );
print_r( $search($mySecondDBSongs) );
Hoping it will match your needs!

PHP Optimizing a Very Long Switch Case Statement

Please have a look at the code below:
function GetAreaName($AreaCode)
{
switch ($AreaCode)
{
case 201: return 'New Jersey';
case 202: return 'Washington';
// this goes on till
case 999: return '';
}
}
Let's say if the AreaCode is 998 then it would have to go through so many cases!
How could we optimize this function? (No using databases.)
I'm thinking of building an array and doing a binary search over it. But does this mean the array will be rebuilt every time the function is called? How do we build the array once, cache it, and reuse it every time this function is called?
Why not just use a hash table?
class Area {
private $areaCodes = array(
201 => 'New Jersey',
202 => 'Washington',
// this goes on till
999 => '',
);
function getStateByAreaCode ($areaCode) {
if (array_key_exists($areaCode, $this->areaCodes)) {
return $this->areaCodes[$areaCode];
} else {
return false;
}
}
}
Call it like this:
$area = new Area();
$city = $area->getStateByAreaCode(303);
Just save your class in a file and include it when you need it.
You Asked How to Prevent the Array From Being Created Every Request:
By putting this in a class you at least keep it clean. It technically still gets created each request, but unless your array is enormous (WAY bigger than the area codes in the U.S.) it shouldn't pose a performance issue. If you are worried about building the array every time you have a request, then take a look at a code optimizer like APC or the Zend Optimizer. This essentially takes the byte code that PHP generates at run time and caches it.
Sounds like you should just store it in your database.
But if you can't do that, either abstract it into a config file of some kind and store it in some kind of persisted object, or just use a static variable:
function foo($key) {
static $cache = array(1 => 'abc', 2 => 'def', 3 => 'ghi');
if (array_key_exists($key, $cache)) {
return $cache[$key];
} else {
//Somehow signal an error (throw an exception, return boolean false, or something)
}
}
In the above, $cache would only exist once. (If you knew that the values would never be null, you could use isset instead of array_key_exists.)
This isn't very flexible though since changing the data requires you to edit your code. You typically want your data and your code to be decoupled.
That could mean storing it in some kind of file (json, xml, php, whatever), and loading it into some kind of structure that you only create once. You would then pass that object or array around wherever it was needed. (Or, if you wanted to be hacky, you could use a static class. I suggest against this though.)
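A rough sketch of the "load it once" idea follows. The `loadAreaCodes()` function is a hypothetical placeholder for reading a JSON, XML or PHP data file, and the counter in `$GLOBALS['loadCount']` exists only to show how often loading actually happens.

```php
<?php
$GLOBALS['loadCount'] = 0;

// Hypothetical loader: stands in for reading a JSON, XML or PHP data file.
function loadAreaCodes()
{
    $GLOBALS['loadCount']++; // only here to show how often loading happens
    return array(201 => 'New Jersey', 202 => 'Washington');
}

function areaName($code)
{
    static $areas = null; // keeps its value between calls within one request
    if ($areas === null) {
        $areas = loadAreaCodes();
    }
    return array_key_exists($code, $areas) ? $areas[$code] : false;
}

$first = areaName(201);
$second = areaName(202);
$missing = areaName(555);
```

The data is loaded on the first call only; later calls in the same request reuse the static variable.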
Switch condition is evaluated once only:
In a switch statement, the condition is evaluated only once and the result is compared to each case statement. In an elseif statement, the condition is evaluated again. If your condition is more complicated than a simple compare and/or is in a tight loop, a switch may be faster.
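The difference can be made visible with a counter. In this sketch the closure `$areaCode` is a made-up stand-in for any condition that is expensive to compute.

```php
<?php
// Counts how often the condition expression is evaluated.
$calls = 0;
$areaCode = function () use (&$calls) {
    $calls++;
    return 202;
};

// switch: $areaCode() runs once, then its result is compared to each case.
switch ($areaCode()) {
    case 201:
        $name = 'New Jersey';
        break;
    case 202:
        $name = 'Washington';
        break;
    default:
        $name = '';
}
$switchCalls = $calls;

// elseif: the expression is evaluated again in every branch that is reached.
$calls = 0;
if ($areaCode() === 201) {
    $name = 'New Jersey';
} elseif ($areaCode() === 202) {
    $name = 'Washington';
} else {
    $name = '';
}
$elseifCalls = $calls;
```

Storing the value in a variable before the if/elseif chain removes the repeated evaluation just as well.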
There is no optimization required. However, read:
In PHP what's faster, big Switch statement, or Array key lookup
If you want to build a config file, you can consider something like:
$areas = array
(
1 => 'abc',
2 => 'def',
..
);
Then simply compare:
if (!isset($areas[$some_code]))
{
// do something
}
else
{
// ok
}
Try the pseudo code below:
$areas = array('201' => 'New Jersey',
'202' => 'Washington',
......
........
'999' => '');
function GetAreaName($AreaCode)
{
// $areas lives in the global scope, so it has to be imported into the function
global $areas;
if(isset($areas[$AreaCode])) {
return $areas[$AreaCode];
} else {
// do something
}
}

CakePHP - a code sample seems strange to me, what am i missing?

Attached code taken from cakephp bakery, where someone uploaded a sample about custom validation rules.
class Contact extends AppModel
{
var $name = 'Contact';
var $validate = array(
'email' => array(
'identicalFieldValues' => array(
'rule' => array('identicalFieldValues', 'confirm_email' ),
'message' => 'Please re-enter your password twice so that the values match'
)
)
);
function identicalFieldValues( $field=array(), $compare_field=null )
{
foreach( $field as $key => $value ){
$v1 = $value;
$v2 = $this->data[$this->name][ $compare_field ];
if($v1 !== $v2) {
return FALSE;
} else {
continue;
}
}
return TRUE;
}
}
In the code, the guy used a foreach to access an array member which he had its name already!
As far as I understand - it's a waste of resources and a bad(even strange) practice.
One more thing about the code:
I don't understand the usage of the continue there. it's a single field array, isn't it? the comparison should happen once and the loop will be over.
Please enlighten me.
In the code, the guy used a foreach to access an array member which he had its name already! As far as I understand - it's a waste of resources and a bad(even strange) practice.
The first parameter is always an array on one key and its value, the second parameter comes from the call to that function, in a block named as the key... So, all you need is to send the key and no need to iterate
The code uses foreach to iterate through $field, which is an array of one key value pair. It all starts when the validation routine invokes identicalFieldValues, passing it two values - $field, which would be an array that looks like:
array (
[email] => 'user entered value 1'
)
The second parameter $compare_field would be set to the string confirm_email.
In this particular case, it doesn't look like it makes a lot of sense to use foreach since your array only has one key-value pair. But you must write code this way because CakePHP will pass an array to the method.
I believe the reason why CakePHP does this is because an array is the only way to pass both the field name and its value. While in this case the field name (email) is irrelevant, you might need it in other cases.
What you are seeing here is one of the caveats of using frameworks. Most of the time, they simplify your code. But sometimes you have to write code that you wouldn't write normally just so the framework is happy.
One more thing about the code: I don't understand the usage of the continue there. it's a single field array, isn't it? the comparison should happen once and the loop will be over. Please enlighten me.
Indeed. And since there are no statements in the foreach loop following continue, the whole else block could also be omitted.
A simplified version of this would be:
function identicalFieldValues($field=array(), $compare_field=null)
{
foreach ($field as $value) {
$compare = $this->data[$this->name][$compare_field];
if ($value !== $compare) {
return FALSE;
}
}
return TRUE;
}
And I agree with you: the loop only goes through one iteration, regardless of which field is being validated. You still need the foreach, though, because you are getting an array.
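If you are willing to rely on the array holding exactly one pair, the loop can be replaced by reading the first value directly. This is a standalone sketch, not CakePHP code: the extra `$data` parameter is an assumption that stands in for `$this->data[$this->name]` so the function can run outside a model.

```php
<?php
// Standalone sketch: $data stands in for $this->data[$this->name] in the model.
function identicalFieldValues(array $field, $compareField, array $data)
{
    // $field holds one pair, e.g. array('email' => 'a@example.com');
    // reset() returns its first (and only) value.
    $value = reset($field);

    return isset($data[$compareField]) && $value === $data[$compareField];
}

$data = array('email' => 'a@example.com', 'confirm_email' => 'a@example.com');
$same = identicalFieldValues(array('email' => $data['email']), 'confirm_email', $data);

$data['confirm_email'] = 'b@example.com';
$different = identicalFieldValues(array('email' => $data['email']), 'confirm_email', $data);
```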

array_filter filtering out entire array

I have an array of arrays, each array containing details of a scan by a medical device. I'm getting this data from text logs that are dumped nightly. The format of which is this:
$this->scans = array(
array(
'patientid' => (int),
'patientname' => 'John Skeet',
'reviewed' => 0 or 1
//plus more irrelevant
),
array(
//same as above
), //etc
)
The important array key here is reviewed, as each scan may be reviewed if it is of high enough quality. However, the text logs dump out EVERY scan that is acquired, then goes back through and re-lists the ones that are reviewed.
Now, in order to prevent duplicates, I figured I could just use array_filter to filter out scans that have been both acquired and reviewed (keeping the reviewed version). However, the filter function is filtering out the entire array (except in some rare cases). If someone could take a look and let me know why they think it's happening, that would be much appreciated.
$this->scans = array_filter($this->scans, array($this, "scan_cleanup"));
private function scan_cleanup($scan) {
//only if the scan was not reviewed
if ($scan['reviewed'] == 0) {
//change reviewed status to see if there is a duplicate
$scan['reviewed'] == 1;
//return false to remove this copy (and keep reviewed)
if (in_array($scan, $this->scans)) {
return false;
}
}
return true;
}
$scan['reviewed'] == 1;
vs
$scan['reviewed'] = 1;
The first is a comparison, which does nothing in this context; the second is the assignment you meant to write.
You are also not running the return false very often. I'd change the logic a little to make it a little clearer, and simpler by a little refactoring (pulling out a condition-check).
if (!$scan['reviewed'] and hasDupe($scan)) {
return false; // filter out
}
return true; // it is passed back, and is output
hasDupe() does the best checks you know for a duplicate record and returns true/false.
Simple case of "==" vs. "=" as far as I can see.
$scan['reviewed'] = 1;
That oughta do the trick. Sometimes the simplest problems are the hardest to spot ;-)
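Here is a self-contained sketch of the corrected filter with made-up sample data (the second patient name is invented). With `==`, the scan is never modified, so in_array() finds the scan itself and every unreviewed scan is dropped; with `=`, only unreviewed scans that have a reviewed twin are dropped.

```php
<?php
// Sample data: the first scan has a reviewed duplicate, the second does not.
$scans = array(
    array('patientid' => 1, 'patientname' => 'John Skeet', 'reviewed' => 0),
    array('patientid' => 2, 'patientname' => 'Jane Doe', 'reviewed' => 0),
    array('patientid' => 1, 'patientname' => 'John Skeet', 'reviewed' => 1),
);

$cleaned = array_filter($scans, function ($scan) use ($scans) {
    // only if the scan was not reviewed
    if ($scan['reviewed'] == 0) {
        // assignment (=), not comparison (==): build the reviewed twin
        $scan['reviewed'] = 1;
        // drop this copy when the reviewed version is also in the list
        if (in_array($scan, $scans)) {
            return false;
        }
    }
    return true;
});

// array_filter() preserves keys, so reindex for a clean list
$cleaned = array_values($cleaned);
```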

PHP architecture, and pass-by-reference vs pass-by-value

Seeking suggestions from PHP architects!
I'm not terribly familiar with PHP but have taken over maintenance of a large analytics package written in the language. The architecture is designed to read reported data into large key/value arrays, which are passed through various parsing modules to extract those report parameters known to each of those modules. Known parameters are removed from the master array, and any leftovers which were not recognized by any of the modules, are dumped into a kind of catch-all report showing the "unknown" data points.
There are a few different methods being used to call these parser modules, and I would like to know which if any are considered to be "proper" PHP structure. Some are using pass-by-reference, others pass-by-value, some are functions, some are objects. All of them modify the input parameter in some way.
A super-simplified example follows:
#!/usr/bin/php
<?php
$values = Array("a"=>1, "b"=>2, "c"=>3, "d"=>4 );
class ParserA {
private $a = null;
public function __construct(&$myvalues) {
$this->a = $myvalues["a"];
unset($myvalues["a"]);
}
public function toString() { return $this->a; }
}
// pass-by-value
function parse_b($myvalues) {
$b = $myvalues["b"];
unset($myvalues["b"]);
return Array($b, $myvalues);
}
// pass-by-reference
function parse_c(&$myvalues) {
echo "c=".$myvalues["c"]."\n";
unset($myvalues["c"]);
}
// Show beginning state
print_r($values);
// will echo "1" and remove "a" from $values
$a = new ParserA($values);
echo "a=".$a->toString()."\n";
print_r($values);
// will echo "2" and remove "b" from $values
list($b, $values) = parse_b($values);
echo "b=".$b."\n";
print_r($values);
// will echo "3" and remove "c" from $values
parse_c($values);
print_r($values);
?>
The output will be:
Array
(
[a] => 1
[b] => 2
[c] => 3
[d] => 4
)
a=1
Array
(
[b] => 2
[c] => 3
[d] => 4
)
b=2
Array
(
[c] => 3
[d] => 4
)
c=3
Array
(
[d] => 4
)
I'm really uncomfortable having so many different call methods in use, some of which have hidden effects on the call function parameters using "&pointer"-style functions, some requiring the main body to write their output, and some writing their output independently.
I would prefer to choose a single methodology and stick with it. In order to do so, I would also like to know which is most efficient; my reading of the PHP documentation indicates that since it uses copy-on-write, there shouldn't be much performance difference between using pointers to vs passing the object directly and re-reading a return value. I would also prefer to use the object-oriented structure, but am uncomfortable with the hidden changes being made to the input parameter on the constructor.
Of the three calling methods, ParserA(), parse_b(), and parse_c(), which if any is the most appropriate style?
I'm not really an expert in PHP, but in my experience passing by value is better. That way the code has no side effects, which means it is easier to understand and maintain, and you can do all sorts of crazy things with it, like using it as a callback for a map function. So I'm all for the parse_b way of doing things.
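FYI: In PHP 5 and later, an object variable holds a handle to the object, and that handle is copied when it is passed, so a function can modify the caller's object without using &. Likewise, if you have an array with objects and scalar values in it, the scalar values are copied, while the objects are still shared through their handles.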
As a general rule in PHP, do not use references unless you really have to.
References in PHP are also not what most people expect them to be:
"References in PHP are a means to access the same variable content by different names. They are not like C pointers; instead, they are symbol table aliases."
see also: php.net: What References Are
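A short sketch of how arrays, references and objects behave when handed to a function may help here. All function names below are made up for illustration.

```php
<?php
function addByValue(array $values)
{
    $values['added'] = true; // changes the function's own copy only
}

function addByReference(array &$values)
{
    $values['added'] = true; // $values is an alias of the caller's variable
}

function modifyObject(stdClass $obj)
{
    $obj->added = true; // same object as the one the caller holds
}

function replaceObject(stdClass $obj)
{
    $obj = new stdClass(); // rebinds the local variable only
    $obj->replaced = true;
}

$a = array('a' => 1);
addByValue($a);
$afterValue = $a;       // still array('a' => 1)

addByReference($a);
$afterReference = $a;   // now also contains 'added'

$o = new stdClass();
modifyObject($o);       // visible to the caller
replaceObject($o);      // not visible to the caller
```

The last call shows that an object argument is not an alias for the caller's variable: the function can change the object, but it cannot make the caller's variable point to a different one unless the parameter is declared with &.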
So in short:
The proper way of handling this in PHP is to create an object that passes the variables around by value, or to manipulate the array with array_map (array_map lets you apply a callback function to the elements of an array).
I would vote against the methods proposed in general, but of them, I think parse_b has the best idea.
I think it would be better design to wrap the "data" array in a class that could let you "pop" a key out of it easily. So the parser ends up looking like:
class ParserA {
private $a = null;
public function __construct(My_Data_Class $data) {
$this->a = $data->popValue("a");
}
public function toString() { return $this->a; }
}
And a sample implementation
class My_Data_Class {
protected $_data;
public function __construct(array $data) {
$this->_data = $data;
}
public function popValue($key) {
if (isset($this->_data[$key])) {
$value = $this->_data[$key];
unset($this->_data[$key]);
return $value;
}
}
}
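To show how the leftover "unknown" values could be collected with this design, here is a self-contained sketch. `ReportData` is a hypothetical variant of the wrapper above, and the `remaining()` method is an assumption added for the example; it is not part of the class as posted.

```php
<?php
// Hypothetical variant of the data wrapper, with a remaining() accessor added.
class ReportData
{
    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    // Returns the value for $key and removes it, or null if the key is unknown.
    public function popValue($key)
    {
        if (!array_key_exists($key, $this->data)) {
            return null;
        }
        $value = $this->data[$key];
        unset($this->data[$key]);
        return $value;
    }

    // Whatever no parser has claimed yet.
    public function remaining()
    {
        return $this->data;
    }
}

class ParserA
{
    private $a = null;

    public function __construct(ReportData $data)
    {
        $this->a = $data->popValue("a");
    }

    public function toString()
    {
        return $this->a;
    }
}

$data = new ReportData(array("a" => 1, "b" => 2, "c" => 3, "d" => 4));
$parser = new ParserA($data);
$b = $data->popValue("b");
$c = $data->popValue("c");
$unknown = $data->remaining(); // feeds the catch-all report
```

The removal is still a side effect, but it now happens through an explicitly named method on an object rather than through a hidden & in a constructor signature.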
