CakePHP overriding identical "field" in search conditions - php

I've come across an odd problem using CakePHP 1.3 to find information. Let's use this dbschema as an example:
id is int(11) PRIMARY auto_increment
amount is float(10,2) NULL
status is ENUM(Completed, Removed, Pending)
id amount status
1 100.00 Completed
2 100.00 Removed
3 100.00 Completed
4 100.00 Completed
5 100.00 Pending
When using Cake's find to retrieve data from this table, I use this query:
$this->Testtable->find('all', array(
'conditions' => array(
'status LIKE ' => 'Removed',
'status LIKE ' => 'Pending',
'status LIKE ' => 'Completed'
)
))
Looking at this query, I would assume that Cake would return all rows that match all of those conditions (which is totally acceptable in SQL), however it only uses the last condition and returns WHERE status LIKE 'Completed'.
I ran a test doing this, which returned all rows correctly (I know it's a more "correct" way to do the query anyway):
'conditions' => array(
'status' => array('Removed', 'Pending', 'Completed')
)
Take the opposite for an example, I want to return all rows that aren't Removed or Pending:
'conditions' => array(
'status !=' => 'Removed',
'status !=' => 'Pending'
)
This query returns all rows with Completed and Removed, as it only listens to the last statement. I assume that this is happening because instead of concatenating these search conditions into the query, Cake is overwriting the conditions based on the "field" being status !=. I can prove this theory by adding a space after the != in either of those conditions, creating the desired result of only Confirmed records.
Can anybody tell me why Cake would do this? As this is a legitimate thing to do in SQL, I see no reason that Cake wouldn't allow you to do it. Does anybody know if this issue is fixed up in newer versions of Cake?

I suppose that this comes down to the fact that at the end of the day, I am reassigning the array value based on that key, and it's not actually CakePHP's fault at all. I had a look into Cake's model.php and found this:
$query = array_merge(compact('conditions', 'fields', 'order', 'recursive'), array('limit' => 1));
I ran a test:
$array = array(
'conditions' => array(
'test' => 'yes',
'test' => 'no'
)
);
$var = 'hello';
$c = compact('array', 'var');
print_r($c);
As mentioned above, compact is only receiving the value no from the test key. I assumed that the use of compact to merge variables/arrays into the query would recursively merge similar keys from the conditions array into the last specified, but it turns out that yes isn't even making it that far as it's being redefined on the spot.
I suppose this is a limitation of PHP rather than Cake, but it is still something that didn't occur to me and should be done differently somehow (if it hasn't already) in future.
Edit
I ran a couple more tests. I wrapped identical conditions in their own arrays, then compared them the way Cake's find functions would. Using compact (which Cake does) the arrays containing identical keys remain intact, however using array_merge, the first key is overwritten by the second. I guess in this case it's a very, very good thing that Cake uses compact instead of array_merge to merge its query criteria.
$array = array(
array('test' => 'yes'),
array('test' => 'no')
);
$m = array_merge($array[0], $array[1]);
$c = compact('array');
print_r($c);
print_r($m);
Result:
Array
(
[array] => Array
(
[0] => Array
(
[test] => yes
)
[1] => Array
(
[test] => no
)
)
)
Array
(
[test] => no
)
While this is obviously a simple problem in the way you fundamentally write PHP code, it wasn't inherently obvious while writing in Cake syntax that conditions would overwrite each other...

Basic PHP: Don't use the same array key twice
'conditions' => array(
'status LIKE ' => 'Removed',
'status LIKE ' => 'Pending',
'status LIKE ' => 'Completed'
)
should be
'conditions' => array(
'status LIKE' => array('Removed', 'Pending', 'Completed'),
)
Same for any other array key.
Note that some quick debugging of the array reveals this.
Please also see the tons of other stackoverflow questions with the same issue or other areas where basic research could have pointed you in this direction. Taking a look there first can help to resolve the issue in less time.

Related

Returning array is faster than returning an element of the array

I just conducted this very interesting experiment and the results came out quite surprising.
The purpose of the test was to determine the best way, performance-wise, to get an element of an array. The reason is that I have a configuration class which holds settings in an associative mufti-dimensional array and I was not quite sure I was getting these values the best way possible.
Data (it is not really needed for the question, but I just decided to include it so you see it is quite a reasonable amount of data to run tests with)
$data = array( 'system' => [
'GUI' =>
array(
'enabled' => true,
'password' => '',
),
'Constants' => array(
'URL_QUERYSTRING' => true,
'ERRORS_TO_EXCEPTIONS' => true,
'DEBUG_MODE' => true,
),
'Benchmark' =>
array(
'enabled' => false,
),
'Translations' =>
array(
'db_connection' => 'Default',
'table_name' =>
array(
'languages' => 'languages',
'translations' => 'translations',
),
),
'Spam' =>
array(
'honeypot_names' =>
array(
0 => 'name1',
1 => 'name2',
2 => 'name3',
3 => 'name4',
4 => 'name5',
5 => 'name6',
),
),
'Response' =>
array(
'supported' =>
array(
0 => 'text/html',
1 => 'application/json',
),
),]
);
Methods
function plain($file, $setting, $key, $sub){
global $data;
return $data[$file][$setting][$key][$sub];
}
function loop(...$args){
global $data;
$value = $data[array_shift($args)];
foreach($args as $arg){
$value = $value[$arg];
}
return $value;
}
function arr(){
global $data;
return $data;
}
Parameters (when calling the functions)
loop('system', 'Translations', 'table_name', 'languages');
plain('system', 'Translations', 'table_name', 'languages');
arr()['system']['Translations']['table_name']['languages'];
Leaving aside any other possible flaws and focusing on performance only, I ran 50 tests with 10000 loops. Each function has been called 500000 times in total. The results are in average seconds per 10000 loops:
loop: 100% - 0.0381 sec. Returns: languages
plain: 38% - 0.0146 sec. Returns: languages
arr: 23% - 0.0088 sec. Returns: languages
I was expecting loop to be quite slow because there is logic inside, but looking at the results of the other two I was pretty surprised. I was expecting plain to be the fastest because I'm returning an element from the array and for the opposite reason arr to be the slowest because it returns the whole array.
Given the outcome of the experiment I have 2 questions.
Why is arr almost 2 times faster than plain?
Are there any other methods I have missed that can outperform arr?
I said this in the comment, but I decided it's pretty close to an answer. Your question basically boils down to why is 2+2; not faster than just plain 2;
Arrays are just objects stored in memory. To return an object from a function, you return a memory address (32 or 64 bit unsigned integer), which implies nothing more than pushing a single integer onto the stack.
In the case of returning an index of an array, that index really just represents an offset from the base address of the array, so everytime you see a square bracket-style array access, internally PHP (rather the internal C implementation of PHP) is converting the 'index' in the array into an integer that it adds to the memory address of the array to get the memory address of the stored value at that index.
So when you see this kind of code:
return $data[$file][$setting][$key][$sub];
That says:
Find me the address of $data. Then calculate the offset that the string stored in $file is (which involves looking up what $file is in memory). Then do the same for $setting, $key, and $sub. Finally, add all of those offsets together to get the address (in the case of objects) or the value (in the case of native data types) to push on to the stack as a return value.
It should be no surprise then that returning a simple array is quicker.
That's the way PHP works. You expect, that a copy of $data is returned here. It is not.
What you acutaly have, is a pointer (Something like an ID to a specific place in the memory), which reference the data.
What you return, is the reference to the data, not the data them self.
In the plain method, you search for the value first. This cost time. Take a look at this great article, which show, how arrays are working internal.
Sometimes Code say's more then words. What you assume is:
function arr(){
global $data;
//create a copy
$newData = $data;
//return reference to $newData
return $newData;
}
Also you should not use global, that is bad practise. You can give your array as a parameter.
//give a copy of the data, slow
function arr($data) {
return $data;
}
//give the reference, fast
function arr(&$data) {
return $data;
}

How is it possible to correctly nest custom fields in a CakePHP find query?

we're often dealing with a code that looks like the following:
return $this->find(
'all', [
'fields' => ['DISTINCT Tag.tag', 'COUNT(Tag.tag) as count'],
'group' => ['Tag.tag'],
'order' => ['count' => 'DESC']
]);
This query leads us to the following output:
[0] => Array
(
[Tag] => Array
(
[tag] => walls.io
)
[0] => Array
(
[count] => 15
)
)
As you can see, the query returns the results in a "somehow wrong" nesting. The "count" field is unfortunately put into a pseudo [0]-array.
IIRC, CakePHP uses internally a syntax like Tag__field for correctly nesting virtual fields.
When changing the code to the Model__-syntax, the problem stays the same:
return $this->find(
'all', [
'fields' => ['DISTINCT Tag.tag', 'COUNT(Tag.tag) as Tag__count'],
'group' => ['Tag.tag'],
'order' => ['COUNT(Tag.tag)' => 'DESC']
]);
Output:
[0] => Array
(
[Tag] => Array
(
[tag] => walls.io
)
[0] => Array
(
[Tag__count] => 15
)
)
Workaround 1: array_map
CakePHP pros: Is there a better/more elegant solution than manually mapping the array after the select statement?
$tags = array_map(function($tag) {
$tag['Tag']['count'] = $tag[0]['count'];
unset($tag[0]);
return $tag;
}, $tags);
Workaround 2: virtual field
As described above, the usage of a virtual field might solve this problem:
$this->virtualFields = ['count' => 'COUNT(Tag.Tag)'];
return $this->find(
'all', [
'group' => ['Tag.tag'],
'order' => [$this->getVirtualField('count') => 'DESC']
]);
Unfortunately, with this solution, it's not possible to specify ANY fields at all. only by completely leaving the "fields"-key, the nesting of the array works as expected. when selecting $fields = ['Tag.tag', $this->getVirtualField('count')] the nesting is wrong again.
CakePHP pros: Do you know a method, where the nesting is done right, even if you specify your own fields?
Looking at the CakePHP code, such a method does not exist.
Have a look at the file lib/Cake/Model/Datasource/Database/Mysql.php.
Find the method called: Mysql::resultSet( $results ); (around line 240).
That method maps the result to an array. To determine if a column is part of a table or not it uses PDOStatement::getColumnMeta(). For your "virtual column" that method will return an empty table and so the CakePHP code will put it separately, see the else branch
$this->map[$index++] = array(0, $column['name'], $type);
In order to avoid that else branch you would have to use Virtual fields, but then you run into the other problems that you have noticed.
So you are left with the array_map solution or you could try to overload that Mysql class and add your custom logic at how to identify where a column fits.

PHP Multi-Dimensional Array: Finding a multi-element sub-array match ... is there an alternate option to looping?

We have some customer data which started in a separate data-store. I have a consolidation script to standardize and migrate it into our core DB. There are somewhere around 60,000-70,000 records being migrated.
Naturally, there was a little bug, and it failed around row 9k.
My next trick is to make the script able to pick up where it left off when it is run again.
FYI:
The source records are pretty icky, and split over 5 tables by what brand they purchased ... IE:
create TABLE `brand1_custs` (`id` int(9), `company_name` varchar(112), etc...)
create TABLE `brand2_custs` (`id` int(9), `company_name` varchar(112), etc...)
Of course, a given company name can (and does) exist in multiple source tables.
Anyhow ... I used the ParseCSV lib for logging, and each row gets logged if successfully migrated (some rows get skipped if they are just too ugly to parse programatically). When opening the log back up with ParseCSV, it comes in looking like:
array(
0 => array( 'row_id' => '1',
'company_name' => 'Cust A',
'blah' => 'blah',
'source_tbl' => 'brand1_cust'
),
1 => array( 'row_id' => '2',
'company_name' => 'customer B',
'blah' => 'blah',
'source_tbl' => 'brand1_cust'
),
2 => array( 'row_id' => '1',
'company_name' => 'Cust A',
'blah' => 'blah',
'source_tbl' => 'brand2_cust'
),
etc...
)
My current workflow is along the lines of:
foreach( $source_table AS $src){
$results = // get all rows from $src
foreach($results AS $row){
// heavy lifting
{
}
My Plan is to check the
$row->id and $src->tbl combination
for a match in the
$log[?x?]['row_id'] and $log[?x?]['source_tbl'] combination.
In order to achieve that, I would have to do a foreach($log AS $xyz) loop inside the foreach($results AS $row) loop, and skip any rows which are found to have already been migrated (otherwise, they would get duplicated).
That seems like a LOT of of looping to me.
What about when we get up around record # 40 or 50 thousand?
That would be 50k x 50k loops!!
Question:
Is there a better way for me to check if a sub-array has a "row_id" and "source_tbl" match other than looping each time?
NOTE: as always, if there's a completely different way I should be thinking about this, I'm open to any and all suggestions :)
I think that you should do a preprocessing on the log doing a hash (or composed key) of row_id and source_tbl and store it in an hashmap then for each row just construct the hash of the key and check if it is already defined in the hashmap.
I am telling you to use hashed set because you can search in it with O(k) time otherwise it would be the same as you are proposing only that it would be a cleaner code.

CakePHP models findFirst method issue

I'm not familiar with CakePHP too close. Recently I've ran into problem using Model. I need to get exaclty one row from database, modify some columns values and save it back. Pretty simple, right?
What do I try to do:
$condition = array('some_id_column' => $another_models_id);
$model = $this->MyModel->findFirst($condition);
BUT i get FALSE in $model variable. At hte same time
$condition = array('some_id_column' => $another_models_id);
$model = $this->MyModel->findAll($condition);
returns array. Its structure is something like:
array (
0 =>
array (
'MyModel' =>
array (
'id' => '1',
'some_id_column' => '123456',
'some_field' => 'some text',
...
),
),
I'd go with findAll if it did not return an array of arrays, but array of models (in my case - of one model). What do I want to achieve:
$condition = array('some_id_column' => $another_models_id);
$model = $this->MyModel->findFirst($condition);
$model->some_field = 'some another text';
$model->save();
Could you help me out to understand how it's usually done in CakePHP?
I'd also like to hear why findAll finds row and findFirst fails to find it... It just does not make sense to me... They should work in almost the same way and use the same database APIs...
If I can not do what I want in CakePHP, would you write a receipt how it is usually done there ?
There is no such method as findFirst.
You're probably looking for find('first', array('conditions' => array(...))).

Mongodb and PHP: how to query nested arrays without using the key name

I'm probably missing something simple here but I can't seem to find a way to build a query that will allow me to update a match in a group of nested values.
I have a document like this for a blog app I've been working on (currently uses MySQL):
array (
'_id' => new MongoId("4bc8dcee8ba936a8101a0000"),
'created' => '20100418-201312 +0000',
'post-title' => 'Some Post Title',
'post-body' => 'Blah Blah Blah Blah.',
'post-blog-name' => 'default',
'post-comments' =>
array (
0 =>
array (
'comment-title' => 'Test1',
'comment-body' => 'asdf1',
'created' => '20100418-214512 +0000',
'owner' => 'User1',
),
1 =>
array (
'comment-title' => 'Test2',
'comment-body' => 'asdf2',
'created' => '20100418-214512 +0000',
'owner' => 'User2',
),
),
'owner' => 'zach',
'updated' => '20100418-201312 +0000',
)
I'd like to be able to build a query that can search 'comment-title' for a match and then allow me to update/change/delete data as needed.
Obviously I can perform an update using a query which includes the key value. Something like this works:
$collection->update(
array("post-comments.0.comment-title" => $_POST['comment-title']),
array('$set' => array('entries.0' => array('comment-title' => $_POST['comment-title'], 'comment-body' => $_POST['comment-body'], 'owner' => $_SESSION['username'], 'updated' => gmdate('Ymd\-His O')))));
But I expect I'm missing something that would allow me to leave out the key and still be able to match one of the nested arrays based on a value (in this example the 'comment-title').
Anyway, sorry, this probably isn't the best example and I probably will end up using the keys in comments to identify them (comment #) but since nesting and creating rather complex objects seem to be a few of Mongodbs strong points I'm just hoping someone can point out what it is I might be missing.
A query to remove or update all comments by a specific user (say a user the blog author just black-listed) might be a better example. I'm not sure how I'd do this short of pulling out the entire document and then iterating through the nested arrays using PHP.
try ... notice I removed the "key"
$collection->update(array("post-comments.comment-title" ...
Cheers!

Categories