PHP::MongoCollection->aggregate() Failures

I have a MongoDB query that I have verified works (using MongoHub, I connected to the replica set, ran the query, and received results). But when I convert this query to PHP and run it through MongoCollection->aggregate(), I get no return value or result of any kind, not even an error.
Here is the query as a PHP array (as MongoCollection requires):
$query = array(
    '$match' => array(
        '$and' => array(
            'make' => $props[0],
            'model' => $props[1],
            'makeYear' => (integer)$props[2],
            'status' => 'Active'
        )
    ),
    '$group' => array(
        '_id' => null,
        'marketTotal' => array('$sum' => '$price'),
        'count' => array('$sum' => 1)
    )
);
The code to run the query is a simple one-liner calling aggregate.
Since I get no errors, and no log shows any sort of error, I'm at a loss here. Can anyone familiar with using PHP with MongoDB see what I might be doing wrong?

It turns out I was missing a layer of arrays: each condition inside the '$and' array has to be wrapped in its own array, so array('make' => $props[0]), and so on. That made it work.
Fun stuff. MongoDB queries are easy. Translating them into PHP arrays is apparently much harder and takes some guesswork, because the mapping is not 1-to-1.
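For anyone comparing shapes, here is a minimal sketch of the wrapped '$and' conditions. The sample values in $props are made up for illustration. The sketch also puts each stage in its own array, the way the shell writes a pipeline; the post above does not say whether that was part of the fix, so treat it as an assumption. It only builds and prints the arrays and never touches the driver.

```php
<?php
// Hypothetical sample values standing in for $props from the question.
$props = array('Ford', 'Focus', '2012');

// '$and' expects a list of condition documents, so each condition
// is wrapped in its own array instead of sharing one associative array.
$match = array(
    '$and' => array(
        array('make' => $props[0]),
        array('model' => $props[1]),
        array('makeYear' => (int) $props[2]),
        array('status' => 'Active'),
    ),
);

$group = array(
    '_id' => null,
    'marketTotal' => array('$sum' => '$price'),
    'count' => array('$sum' => 1),
);

// One array per stage, mirroring the shell's [ {...}, {...} ] form.
$pipeline = array(
    array('$match' => $match),
    array('$group' => $group),
);

// Printing the JSON makes it easy to compare against the shell query.
echo json_encode($pipeline), "\n";
```

Comparing the printed JSON with the query that works in MongoHub is a quick way to spot a missing or extra array layer before calling aggregate().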

Related

How to update many fields at once in mongoDB PHP? [duplicate]

I think the following code should work, but it doesn't. I may have missed something; for now I use two separate update statements, and I decided to ask here why this single statement isn't working.
$this->db->settings->update(array('_id' => $mongoID),
    array(
        '$set' => array('about' => $about),
        '$set' => array('avatar' => $avatar)
    )
);
Did I miss something when reading guides or is it only possible to do with separate update statements?
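A PHP array cannot hold the same key twice, so the second '$set' overwrites the first and only avatar is updated. The second '$set' cannot go in a third argument either: the third argument to MongoCollection::update is an array of options for the update operation. Put both fields in a single '$set':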
$this->db->settings->update(
    array('_id' => $mongoID),
    array('$set' => array('about' => $about, 'avatar' => $avatar))
);
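To see why the two-'$set' version only updates one field, here is a small sketch that just builds the two update documents and inspects them. The values of $about and $avatar are made up for illustration, and nothing here talks to MongoDB.

```php
<?php
// Hypothetical values, only for illustration.
$about = 'About text';
$avatar = 'avatar.png';

// Duplicate keys in an array literal: the later entry silently replaces the earlier one.
$broken = array(
    '$set' => array('about' => $about),
    '$set' => array('avatar' => $avatar),
);

// A single '$set' carrying both fields.
$fixed = array(
    '$set' => array('about' => $about, 'avatar' => $avatar),
);

// Show which fields each update document would actually set.
print_r(array_keys($broken['$set']));
print_r(array_keys($fixed['$set']));
```

PHP raises no warning for the duplicate key, which is why the original statement appears to run fine while dropping the 'about' change.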

How are filters applied in Elastic Search?

In ES, are filters applied before the query?
Say, for example, I am doing a really slow fuzzy search, but only on a small date range. Here is an example (PHP):
$res = $client->search(array('index' => 'main', 'body' => array(
    'query' => array(
        'bool' => array(
            'should' => array(
                array('wildcard' => array('title' => '*123*')),
            )
        )
    ),
    'filter' => array(
        'and' => array(
            array('range' => array('created' => array('gte' => date('c', time() - 3600), 'lte' => date('c', time() + 3600))))
        )
    ),
    'sort' => array()
)));
Will the filter be applied before trying that slower search?
Logic would dictate that filters are run and then the query but I would like to be sure.
If you use the filtered query, filters are applied before documents are scored.
This generally speeds things up considerably. However, a fuzzy query still expands its input into a larger query regardless of the filters.
When you put the filter directly on the search object, as in your example, the query runs first without regard to the filter, and the filter then removes documents from the hits, while facets remain unfiltered.
Therefore, you should almost always use the filtered query, at least when you are not using facets.
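As a rough sketch, the search body from the question could be rearranged into a filtered query like this. It assumes an Elasticsearch version that still supports the filtered query (it was removed in later releases), and it only builds and prints the parameter array; passing $params to $client->search() is left out.

```php
<?php
date_default_timezone_set('UTC');

// One hour either side of "now", as in the question.
$now = time();
$range = array('range' => array('created' => array(
    'gte' => date('c', $now - 3600),
    'lte' => date('c', $now + 3600),
)));

// The filter sits inside the query, next to the slow wildcard query,
// instead of at the top level of the search body.
$body = array(
    'query' => array(
        'filtered' => array(
            'query' => array('wildcard' => array('title' => '*123*')),
            'filter' => $range,
        ),
    ),
);

$params = array('index' => 'main', 'body' => $body);
echo json_encode($params), "\n";
```

The only structural change from the question is where 'filter' lives: under 'query' => 'filtered' rather than beside 'query'.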

understanding ElasticSearch routing

I am trying to use the Elasticsearch routing mapping to speed up some queries, but I am not getting the expected result set (I'm not worried about query performance just yet).
I am using Elastica to set up my mapping:
$index->create(array(
    'number_of_shards' => 4,
    'number_of_replicas' => 1,
    'mappings' => array("country" => array("_routing" => array("path" => "countrycode"))),
    'analysis' => array(
        'analyzer' => array(
            'indexAnalyzer' => array(
                'type' => 'keyword',
                'tokenizer' => 'nGram',
                'filter' => array('shingle')
            ),
            'searchAnalyzer' => array(
                'type' => 'keyword',
                'tokenizer' => 'nGram',
                'filter' => array('shingle')
            )
        )
    )
), true);
If I understand correctly, what should happen is that each result should now have a field called "countrycode" with the value of "country" in it.
The results of _mapping look like this:
{
  "postcode": {
    "postcode": {
      "properties": {
        "area1": {"type": "string"},
        "area2": {"type": "string"},
        "city": {"type": "string", "include_in_all": true},
        "country": {"type": "string"},
        "country_iso": {"type": "string"},
        "country_name": {"type": "string"},
        "id": {"type": "string"},
        "lat": {"type": "string"},
        "lng": {"type": "string"},
        "location": {"type": "geo_point"},
        "region1": {"type": "string"},
        "region2": {"type": "string"},
        "region3": {"type": "string"},
        "region4": {"type": "string"},
        "state_abr": {"type": "string"},
        "zip": {"type": "string", "include_in_all": true}
      }
    },
    "country": {
      "_routing": {"path": "countrycode"},
      "properties": {}
    }
  }
}
Once all the data is in the index, I run this query:
http://localhost:9200/postcode/_search?pretty=true&q=country:au
It responds with 15740 total items.
I expected the same count from a routed query like this:
http://localhost:9200/postcode/_search?routing=au&pretty=true
I expected it to respond with 15740 results.
Instead it returns 120617 results, including documents where country != au.
I did note that the number of shards in the response went from 4 to 1, so something is working.
I also expected each item in the result set to contain a field called "countrycode" (from the routing mapping), but there isn't one.
So at this point I thought my understanding of routing was wrong. Perhaps all routing does is tell Elasticsearch which shard to look in, but not what to look for? In other words, if other country codes happen to land in that shard too, will these queries just bring back every record in that shard?
So I tried the query again, this time adding some info to it.
http://localhost:9200/postcode/_search?routing=AU&pretty=true&q=country:AU
I thought this would force the query to give me just the AU place names, but this time it returned only 3936 results.
So I am not quite sure what I have done wrong. The examples I have read show the queries changing from needing a filter to just using match_all{}, which I would have thought would only bring back documents matching the au country code.
Thanks for your help in getting this to work correctly.
I almost have this working. It now gives me the correct number of results from a single shard; however, the index creation is not quite right: it ignores my number_of_shards setting, and possibly other settings too.
$index = $client->getIndex($indexname);
$index->create(array(
    'mappings' => array("$indexname" => array("_routing" => array("required" => true))),
    'number_of_shards' => 6,
    'number_of_replicas' => 1,
    'analysis' => array(
        'analyzer' => array(
            'indexAnalyzer' => array(
                'type' => 'keyword',
                'tokenizer' => 'nGram',
                'filter' => array('shingle')
            ),
            'searchAnalyzer' => array(
                'type' => 'keyword',
                'tokenizer' => 'nGram',
                'filter' => array('shingle')
            )
        )
    )
), true);
I can at least help you with more info on where to look:
http://localhost:9200/postcode/_search?routing=au&pretty=true
That query does indeed translate into "give me all documents on the shard where documents for country:AU should be sent."
Routing is just that, routing: it doesn't filter your results for you.
I also noticed you're mixing "au" and "AU", which might confuse things too.
You should try setting required to true on your _routing element, to make sure your documents are actually stored with routing information when they are indexed.
In fact, to make sure your documents are indexed with proper routing, explicitly set the routing value to lowercase(countrycode) when indexing documents. See if that helps.
For more information try reading this blog post:
http://www.elasticsearch.org/blog/customizing-your-document-routing/
Hope this helps :)
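Putting the two points together (routing picks the shard, the query still has to filter, and the casing should be consistent), a search URL could be built roughly like this. The helper name routedSearchUrl is made up for this sketch, and it assumes the routing value used at index time was the lowercased country code.

```php
<?php
// Hypothetical helper: builds a search URL that both routes to the
// shard for a country code and filters on that country code.
function routedSearchUrl($baseUrl, $index, $countryCode)
{
    // Use one casing everywhere so the routing value matches what was indexed.
    $code = strtolower($countryCode);

    $params = array(
        'routing' => $code,           // which shard to look in
        'q' => 'country:' . $code,    // what to look for on that shard
        'pretty' => 'true',
    );

    return $baseUrl . '/' . $index . '/_search?' . http_build_query($params, '', '&');
}

echo routedSearchUrl('http://localhost:9200', 'postcode', 'AU'), "\n";
```

Without the q parameter, the same URL returns every document on the target shard, which matches the 120617 result seen in the question.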

MongoDB aggregation does not work (or is very slow) with PHP and works perfectly in shell?

I'm trying to use the aggregate method on my collection (containing more than 20M documents).
I first tried it in the Windows shell :
db.data.aggregate([
{$match: {firstname: "Roger"}},
{$group:{"_id":"$id_car",count:{$sum: 1}}},
{$sort: {count: -1}},
{$limit: 50}])
And it works perfectly, returning the results after a few seconds.
When I "translate" it into PHP:
$data = $db->data;
$ops = array(
    array(
        '$match' => array(
            'firstname' => 'Roger'
        )
    ),
    array(
        '$group' => array(
            '_id' => '$id_car',
            'count' => array(
                '$sum' => 1
            )
        )
    ),
    array(
        '$sort' => array(
            'count' => -1
        )
    ),
    array(
        '$limit' => 4
    )
);
$res = $data->aggregate($ops);
I get a PHP fatal error caused by a timeout:
Uncaught exception 'MongoCursorTimeoutException' with message 'localhost:27017: cursor timed out (timeout: 30000, time left: 30:0, status: 0)'
I don't know whether I've made a mistake in my PHP code, or whether aggregate is supposed to be much slower in PHP than in the shell.
I have also added an index on the "firstname" field to make the query faster.
By the way, is there any way to set the timeout to infinity for this kind of call?
Thanks a lot for your help !
Joe
I don't really know about your issue (PHP being slower than the mongo shell), but changing the way I invoked the aggregation allowed me to run it in PHP despite the timeout problems.
Hope this helps someone who reaches this page because of timeout problems, like I did!
Instead of $data->aggregate($ops), I ran the following, which is the equivalent for your case:
$db->command(
    array('aggregate' => 'data', 'pipeline' => $ops),
    array('timeout' => 100000000)
);
Note that you must run the command on $db, not on your collection.

CakePHP findList doesn't return aggregated values

The following query returns an array containing the proper ids, but null for all values.
If I remove the aggregation function (AVG()), it returns values (not the averaged ones, of course). If I use find('all'), for example, it returns the average, but not in the list format I want (I could work with that, but I want to try to do it with 'list' first).
$progress = $this->Trial->find('list', array(
    'fields' => array(
        'Trial.session_id',
        'AVG(Trial.first_reaction_time_since_probe_shown) AS average_reaction_time'
    ),
    'group' => 'Trial.session_id',
    'conditions' => array(
        'Trial.first_valid_response = Trial.probe_on_top',
        'TrainingSession.user_id IS NOT NULL'
    ),
    'contain' => array(
        'TrainingSession' => array(
            'conditions' => array(
                'TrainingSession.user_id' => $this->Auth->user('id')
            )
        )
    ),
    'recursive' => 1,
));
The generated SQL query returns exactly the result I want, when I send it to the DB via PhpMyAdmin.
SELECT
`Trial`.`session_id`,
AVG(`Trial`.`first_reaction_time_since_probe_shown`) AS average_reaction_time
FROM
`zwang`.`trials` AS `Trial`
LEFT JOIN
`zwang`.`training_sessions` AS `TrainingSession` ON (
`Trial`.`session_id` = `TrainingSession`.`id` AND
`TrainingSession`.`user_id` = 1
)
WHERE
`Trial`.`first_valid_response` = `Trial`.`probe_on_top`
GROUP BY
`Trial`.`session_id`
I've examined the source for find('list'). I think the "array path" used to build the list gets broken when the query uses functions, but I couldn't fix it yet (or recognise where I'm abusing CakePHP's logic).
Once I posted the question, Stack Overflow started suggesting the correct answers to me.
Apparently, it can't be done with 'list' without virtualFields.
I didn't expect that, because it worked with the other find types.
$this->Trial->virtualFields = array(
    'average_reaction_time' => 'AVG(Trial.first_reaction_time_since_probe_shown)'
);
$progress = $this->Trial->find('list', array(
    'fields' => array('Trial.session_id', 'average_reaction_time')
    /* etc... */
));
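For comparison, the find('all') route mentioned in the question could be turned into the same id => average list by hand. The sketch below is plain PHP with made-up rows; it assumes the usual CakePHP 2.x result shape, where a calculated column that is not a virtual field ends up under key 0 of each row rather than under 'Trial'. The helper name rowsToList is hypothetical.

```php
<?php
// Made-up rows in the shape find('all') is assumed to return for an
// aliased AVG(...) column that is not declared as a virtual field.
$rows = array(
    array('Trial' => array('session_id' => '3'), 0 => array('average_reaction_time' => '512.5')),
    array('Trial' => array('session_id' => '4'), 0 => array('average_reaction_time' => '430.0')),
);

// Hypothetical helper: builds the session_id => average list manually.
function rowsToList(array $rows)
{
    $list = array();
    foreach ($rows as $row) {
        // The average lives under key 0, not under 'Trial'.
        $list[$row['Trial']['session_id']] = (float) $row[0]['average_reaction_time'];
    }
    return $list;
}

$progress = rowsToList($rows);
print_r($progress);
```

If that assumption about the row shape holds, it would also explain the nulls: a list lookup under 'Trial' finds no average there unless a virtual field maps the expression onto the model.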
