I have a Mongo collection that I'm trying to aggregate, and I need to filter the results based on a sum of values from a subdocument. Each of my entries has a subdocument that looks like this:
{
    "_id": <MongoId>,
    "clientID": "some ID",
    <Other fields I can filter on normally>
    "bidCompData": [
        {
            "lineItemID": "210217",
            "qtyBid": 3,
            "priceBid": 10.25,
            "qtyComp": 0,
            "description": "Lawn Mowed",
            "invoiceID": 23
        },
        {
            <More similar entries>
        }
    ]
}
What I'm trying to do is filter on the sum of qtyBid in a given record. For example, my user could specify that they only want records that have a total qtyBid across all of the bidCompData that's greater than 5. My research shows that I can't use $sum outside of the $group stage in the pipeline but I need to be able to sum just the qtyBid values for each individual record. Presently my pipeline looks like this.
array(
    array('$project' => $basicProjection), // fields to project, calculated earlier from the input parameters
    array('$match' => $query),
    array('$group' => array(
        '_id' => array('clientID' => '$clientID'),
        'count' => array('$sum' => 1)
    ))
)
I tried having another group and an unwind before the group I presently have in my pipeline so that I could get the sum there but it doesn't let me keep my fields besides the id and the sum field. Is there a way to do this without using $where? My database is large and I can't afford the speed hit from the JS execution.
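One way to keep other fields through the extra $unwind/$group pass is to carry them along with the $first accumulator and then $match on the computed total. The following is only a sketch of that idea: `buildQtyBidPipeline` and `totalQtyBid` are hypothetical names, only `clientID` is carried through, and the function merely builds the pipeline array without talking to a database.

```php
<?php
// Hypothetical helper: builds pipeline stages that keep only documents whose
// bidCompData.qtyBid values sum to more than $minTotal.
function buildQtyBidPipeline(int $minTotal): array
{
    return [
        // One output document per bidCompData entry.
        ['$unwind' => '$bidCompData'],
        // Regroup by the original _id, summing qtyBid and carrying clientID along.
        ['$group' => [
            '_id' => '$_id',
            'clientID' => ['$first' => '$clientID'],
            'totalQtyBid' => ['$sum' => '$bidCompData.qtyBid'],
        ]],
        // Filter on the per-document total.
        ['$match' => ['totalQtyBid' => ['$gt' => $minTotal]]],
        // Same final grouping as in the question's pipeline.
        ['$group' => [
            '_id' => ['clientID' => '$clientID'],
            'count' => ['$sum' => 1],
        ]],
    ];
}

$pipeline = buildQtyBidPipeline(5);
```

The existing $project and $match stages would go in front of these stages. Any further field needed after the regrouping has to be listed with its own $first entry.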
I have this query which I run in PHP:
$result = $client->executeStatement([
'Limit' => 1,
'Statement' => "SELECT * FROM transactions WHERE completed = 0",
]);
I have also tried the query function, but it too only supports Limit, which is not actually a limit on the number of rows returned.
$result = $db->query(array(
'TableName' => 'transactions',
'IndexName' => 'completed-index',
'Count' => 1,
'Limit' => 1,
'ScannedCount' => 1,
'KeyConditions' => array(
'completed' => array(
'AttributeValueList' => array(
array('N' => '1')
),
'ComparisonOperator' => 'EQ'
),
),
));
According to their documentation, Limit doesn't necessarily mean the number of matching items:
The maximum number of items to evaluate (not necessarily the number of
matching items). If DynamoDB processes the number of items up to the
limit while processing the results, it stops the operation and returns
the matching values up to that point
Can anyone tell me if there is an actual way to limit the number of rows returned just like we do in SQL databases?
You can only limit how much data is read from disk (pre-filter), not how much is returned (post-filter).
DynamoDB never allows you to request unbounded work. If DynamoDB allowed you to ask for just 1 row with a filter condition that never matched anything, it would potentially need to read the full database trying to find that 1 row. Behavior like that is what causes relational databases to have issues at scale.
Now, if you're not specifying a filter condition and your query is fully indexed, the amount read will match the amount returned, so the limit should act pretty much like you'd want.
Otherwise you might have to make repeated calls to page the results until you get as many rows as you want.
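The paging approach can be sketched as a loop that keeps requesting pages until enough matching rows have been collected or the result set is exhausted. This is a minimal sketch: `collectItems` and `$fetchPage` are hypothetical names, and the callable stands in for your actual Query/Scan call. It relies only on the `Items` and `LastEvaluatedKey` fields of the response; in a real call, the previous `LastEvaluatedKey` is passed back as `ExclusiveStartKey`.

```php
<?php
// Collect up to $wanted matching items by paging.
// $fetchPage is a hypothetical callable: it receives the previous
// LastEvaluatedKey (or null for the first page) and returns an array with
// 'Items' and, if more data remains, 'LastEvaluatedKey'.
function collectItems(callable $fetchPage, int $wanted): array
{
    $items = [];
    $startKey = null;
    do {
        $page = $fetchPage($startKey);
        foreach ($page['Items'] as $item) {
            $items[] = $item;
            if (count($items) >= $wanted) {
                return $items; // stop as soon as we have enough rows
            }
        }
        $startKey = $page['LastEvaluatedKey'] ?? null;
    } while ($startKey !== null);

    return $items;
}

// Stand-in for DynamoDB: three "pages", the first of which matched nothing.
$pages = [
    ['Items' => [], 'LastEvaluatedKey' => 1],
    ['Items' => ['tx-a', 'tx-b'], 'LastEvaluatedKey' => 2],
    ['Items' => ['tx-c']],
];
$fakeFetch = function ($startKey) use ($pages) {
    return $pages[$startKey ?? 0];
};

$result = collectItems($fakeFetch, 1);
```

Note that the first page returning no items does not mean there are no matches, which is why the loop checks `LastEvaluatedKey` rather than the item count.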
Writing in PHP, I have 2 arrays, each created from SQL queries.
The first query runs through a table that has multiple pieces of data that correspond to various quiz attempts. The table has a column for the user's Email, the activity ID (which represents a quiz attempt) and another 2 columns for data relating to the attempt (for example 'percentage achieved' or 'quiz ID'):
UserEmail ActID ActKey ActMeta
joB#gm.com 2354 Percentage 98
joB#gm.com 2354 Quiz ID 4
boM#hm.com 4567 Percentage 65
boM#hm.com 4567 Quiz ID 7
Once queried, this first array ($student_quiz_list) stores the selected data in the form of
[[UserEmail, ActID, ActKey, ActMeta], [UserEmail, ActID, ActKey, ActMeta], [UserEmail, ActID, ActKey, ActMeta]...]
where each pair of sub-arrays corresponds to a single quiz attempt.
The second table that is queried has two columns that relate to the quizzes themselves. The first column is the Quiz ID and the second is the Quiz name.
Quiz ID Quiz Name
4 Hardware
7 Logic
Once queried, this second array ($quiz_list) stores the selected data in the form of
[[ID, Name], [ID, Name]...]
What I need to do is create a 3rd array (from the 2 above) which holds the user's email and percentage score
[[email, percentage], [email, percentage]...]
but with each sub-array corresponding to a unique actID (so basically the user's percentage in each quiz they attempted without duplicates) and (this is the challenging bit) only for quizzes with certain ID values, in this case, let's say quiz ID 4.
In PHP, what would be the most efficient solution to this? I continually create arrays with duplicates and cannot find a neat solution which provides the outcome desired.
Any help would be greatly received.
Try this code as the example and let me know.
$student_quiz_list = array(
    array(
        'UserEmail' => 'joB#gm.com', 'ActID' => '2354', 'ActKey' => 'Percentage', 'ActMeta' => '90',
    ),
    array(
        'UserEmail' => 'joB#gm.com', 'ActID' => '2354', 'ActKey' => 'QuizID', 'ActMeta' => '4',
    ),
    array(
        'UserEmail' => 'boM#hm.com', 'ActID' => '4567', 'ActKey' => 'Percentage', 'ActMeta' => '98',
    ),
    array(
        'UserEmail' => 'boM#hm.com', 'ActID' => '4567', 'ActKey' => 'QuizID', 'ActMeta' => '7',
    ),
);
$final_array = array();
foreach ($student_quiz_list as $row) {
    if ($row['ActKey'] == 'Percentage') {
        $final_array[] = array(
            'UserEmail' => $row['UserEmail'],
            'ActMeta' => $row['ActMeta']
        );
    }
}
echo "<pre>"; print_r($final_array); echo "</pre>";
As commenter @Nico Haase suggested, you can do most of the logic in SQL. You didn't respond to my comment, so I suppose a user can have multiple attempts per quiz ID:
SELECT
UserEmail,
ActMeta
FROM
your_table # replace with your table name
WHERE
ActKey = 'Percentage'
AND ActID IN (
# subselection with table alias
SELECT
t2.ActID
FROM
your_table t2 # replace with your table name
WHERE
t2.ActKey = 'Quiz ID'
AND t2.ActMeta = 2 # insert your desired quiz ID here
AND t2.ActID = ActID
)
(Query tested with MySQL/MariaDB)
For the case that you cannot change the SQL part, here is how you can process your data in PHP. But consider that a large dataset could exceed your server capabilities, so I would definitely recommend the solution above:
// Your sample data
$raw = [
['UserEmail' => 'joB#gm.com', 'ActID' => 2354, 'ActKey' => 'Percentage' , 'ActMeta' => 98],
['UserEmail' => 'joB#gm.com', 'ActID' => 2354, 'ActKey' => 'Quiz ID', 'ActMeta' => 4],
['UserEmail' => 'joB#gm.com', 'ActID' => 4567, 'ActKey' => 'Percentage' , 'ActMeta' => 65],
['UserEmail' => 'joB#gm.com', 'ActID' => 4567, 'ActKey' => 'Quiz ID', 'ActMeta' => 7],
];
// Extract the corresponding ActIDs for a QuizID
$quiz_id = 4;
$act_ids = array_column(
array_filter(
$raw,
function($item) use ($quiz_id) {
return $item['ActKey'] === 'Quiz ID' && $item['ActMeta'] == $quiz_id;
}
),
'ActID'
);
// Get the entries with ActKey 'Percentage' and an ActID present in the previously extracted set
$percentage_entries = array_filter(
$raw,
function($item) use ($act_ids) {
return $item['ActKey'] === 'Percentage' && in_array($item['ActID'], $act_ids);
}
);
// Map over the previous set to get the array into the final form
$final = array_map(
function($item) {
return [$item['UserEmail'], $item['ActMeta']];
},
$percentage_entries
);
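An alternative to the filter/map chain is to pivot the key/value rows by ActID first, so each attempt becomes one record, and then select by quiz ID. The following is a sketch under the assumption that the rows have the shape shown in the sample data; `percentagesForQuiz` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: pivot the key/value rows by ActID, then keep the
// attempts whose 'Quiz ID' entry matches $quizId.
function percentagesForQuiz(array $rows, int $quizId): array
{
    // Step 1: one record per attempt, e.g.
    // 2354 => ['UserEmail' => ..., 'Percentage' => 98, 'Quiz ID' => 4]
    $attempts = [];
    foreach ($rows as $row) {
        $id = $row['ActID'];
        $attempts[$id]['UserEmail'] = $row['UserEmail'];
        $attempts[$id][$row['ActKey']] = $row['ActMeta'];
    }

    // Step 2: keep [email, percentage] for attempts on the requested quiz.
    $result = [];
    foreach ($attempts as $attempt) {
        if (isset($attempt['Quiz ID'], $attempt['Percentage'])
            && (int) $attempt['Quiz ID'] === $quizId) {
            $result[] = [$attempt['UserEmail'], $attempt['Percentage']];
        }
    }
    return $result;
}

$rows = [
    ['UserEmail' => 'joB#gm.com', 'ActID' => 2354, 'ActKey' => 'Percentage', 'ActMeta' => 98],
    ['UserEmail' => 'joB#gm.com', 'ActID' => 2354, 'ActKey' => 'Quiz ID', 'ActMeta' => 4],
    ['UserEmail' => 'boM#hm.com', 'ActID' => 4567, 'ActKey' => 'Percentage', 'ActMeta' => 65],
    ['UserEmail' => 'boM#hm.com', 'ActID' => 4567, 'ActKey' => 'Quiz ID', 'ActMeta' => 7],
];
$quiz4 = percentagesForQuiz($rows, 4);
```

Because the rows are keyed by ActID, each attempt appears at most once in the output, and the data is traversed only twice regardless of how many quiz IDs exist.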
I'm trying to write a query using CakePHP 3.7 ORM where it needs to add a column to the result set. I know in MySQL this sort of thing is possible: MySQL: Dynamically add columns to query results
So far I've implemented 2 custom finders. The first is as follows:
// src/Model/Table/SubstancesTable.php
public function findDistinctSubstancesByOrganisation(Query $query, array $options)
{
$o_id = $options['o_id'];
$query = $this
->find()
->select('id')
->distinct('id')
->contain('TblOrganisationSubstances')
->where([
'TblOrganisationSubstances.o_id' => $o_id,
'TblOrganisationSubstances.app_id IS NOT' => null
])
->orderAsc('Substances.app_id')
->enableHydration(false);
return $query;
}
The second custom finder:
// src/Model/Table/RevisionSubstancesTable.php
public function findProductNotifications(Query $query, array $options)
{
$date_start = $options['date_start'];
$date_end = $options['date_end'];
$query = $this
->find()
->where([
'RevisionSubstances.date >= ' => $date_start,
'RevisionSubstances.date <= ' => $date_end
])
->contain('Substances')
->enableHydration(false);
return $query;
}
I'm using the finders inside a Controller to test it out:
$Substances = TableRegistry::getTableLocator()->get('Substances');
$RevisionSubstances = TableRegistry::getTableLocator()->get('RevisionSubstances');
$dates = // method to get an array which has keys 'date_start' and 'date_end' used later.
$org_substances = $Substances->find('distinctSubstancesByOrganisation', ['o_id' => 123]);
if (!$org_substances->isEmpty()) {
$data = $RevisionSubstances
->find('productNotifications', [
'date_start' => $dates['date_start'],
'date_end' => $dates['date_end']
])
->where([
'RevisionSubstances.substance_id IN' => $org_substances
])
->orderDesc('RevisionSubstances.date');
debug($data->toArray());
}
The logic behind this is that I'm using the first custom finder to produce a Query Object which contains unique (DISTINCT in SQL) id fields from the substances table, based on a particular company (denoted by the o_id field). These are then fed into the second custom finder by implementing where(['RevisionSubstances.substance_id IN' ....
This works and gives me all the correct data. An example of the output from the debug() statement is as follows:
(int) 0 => [
'id' => (int) 281369,
'substance_id' => (int) 1,
'date' => object(Cake\I18n\FrozenDate) {
'time' => '2019-09-02T00:00:00+00:00',
'timezone' => 'UTC',
'fixedNowTime' => false
},
'comment' => 'foo',
'substance' => [
'id' => (int) 1,
'app_id' => 'ID000001',
'name' => 'bar',
'date' => object(Cake\I18n\FrozenDate) {
'time' => '2019-07-19T00:00:00+00:00',
'timezone' => 'UTC',
'fixedNowTime' => false
}
]
],
The problem I'm having is as follows: each of the results returned contains an app_id field (['substance']['app_id'] in the array above). What I need to do is perform a count (COUNT() in MySQL) on another table based on this, and then add that to the result set.
I'm unsure how to do this for a couple of reasons. Firstly, my understanding is that custom finders return Query Objects, but the query is not executed at this point. Because I haven't executed the query - until calling $data->toArray() - I'm unsure how I would refer to the app_id in a way where it could be referenced per row?
The equivalent SQL that would give me the required results is this:
SELECT COUNT(myalias.app_id) FROM (
SELECT
DISTINCT (tbl_item.i_id),
tbl_item.i_name,
tbl_item.i_code,
tbl_organisation_substances.o_id,
tbl_organisation_substances.o_sub_id,
tbl_organisation_substances.app_id,
tbl_organisation_substances.os_name
FROM
tbl_organisation_substances
JOIN tbl_item_substances
ON tbl_organisation_substances.o_sub_id = tbl_item_substances.o_sub_id
JOIN tbl_item
ON tbl_item.i_id = tbl_item_substances.i_id
WHERE
tbl_item.o_id = 1
AND
tbl_item.date_valid_to IS NULL
AND
tbl_organisation_substances.app_id IS NOT NULL
ORDER BY
tbl_organisation_substances.app_id ASC
) AS myalias
WHERE myalias.app_id = 'ID000001'
This does a COUNT() where the app_id is ID000001.
So in the array I've given previously I need to add something to the array to hold this, e.g.
'substance' => [
// ...
],
'count_app_ids' => 5
(Assuming there were 5 rows returned by the query above).
I have Table classes for all of the tables referred to in the above query.
So my question is, how do you write this using the ORM, and add the result back to the result set before the query is executed?
Is this even possible? The only other solution I can think of is to write the data (from the query I have that works) to a temporary table and then perform successive queries which UPDATE with the count figure based on the app_id. But I'm really not keen on that solution because there are potentially huge performance problems of doing this. Furthermore I'd like to be able to paginate my query so ideally need everything confined to 1 SQL statement, even if it's done across multiple finders.
I've tagged this with MySQL as well as CakePHP because I'm not even sure if this is achievable from a MySQL perspective although it does look on the linked SO post like it can be done? This has the added complexity of having to write the equivalent query using Cake's ORM.
This is the Mongo collection I'm trying to test this on:
{"_id":{"$id":"54d5002adc533bf41000002c"},"tasks":[{"taskID":1,"taskName":"Task 1 Name Here","subTasks":[1],"coords":{"gantt":{"x":10,"y":30},"pert":{"x":90,"y":100}}},{"taskID":2,"taskName":"Task 2 Name Here","participators":[1,2],"startDate":"5-12-2014","endDate":"5-21-2014"},{"taskID":3,"taskName":"Task 3 Name Here","subTasks":[3],"participators":[1]}],"participators":[{"participatorID":1,"participatorName":"Participator 1 Name Here"},{"participatorID":2,"participatorName":"Participator 2 Name Here"}]}
I'm trying to filter this data based on the ID and then only return back a certain set of tasks, filtering using the taskID.
Here's the code I'm using:
$cursor = $this->mongo->findOne(['_id' => $mongoID, 'tasks.taskID' => 2], ['_id' => false, 'tasks.taskID' => true, 'tasks.coords.gantt' => true]);
This should only return taskID 2's data; as you can see, I'm trying to filter so that only the task with a taskID of 2 is displayed. But instead it returns:
{"tasks":[{"taskID":1,"coords":{"gantt":{"x":10,"y":30}}},{"taskID":2},{"taskID":3}]}
I don't know how I'm meant to filter the results so only the specified taskID's data is returned.
Thank you.
-- UPDATE --
Fixed using the following code:
$cursor = $this->mongo->findOne(['_id' => $mongoID], ['_id' => false, 'tasks' => ['$elemMatch' => ['taskID' => 1]], 'tasks.coords.gantt' => true]);
If you want to find an element in an array you need to use the $elemMatch projection.
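The $elemMatch projection returns only the first array element that matches the condition. If the document has already been fetched in full, the same selection can be done client-side. The following is a sketch of that fallback, not a replacement for the projection; `findTask` is a hypothetical helper and the sample document is a trimmed copy of the one in the question.

```php
<?php
// Hypothetical helper: return the first task with the given taskID, or null
// if the document has no such task.
function findTask(array $document, int $taskID): ?array
{
    foreach ($document['tasks'] ?? [] as $task) {
        if (($task['taskID'] ?? null) === $taskID) {
            return $task;
        }
    }
    return null;
}

$document = [
    'tasks' => [
        ['taskID' => 1, 'taskName' => 'Task 1 Name Here', 'subTasks' => [1]],
        ['taskID' => 2, 'taskName' => 'Task 2 Name Here', 'participators' => [1, 2]],
        ['taskID' => 3, 'taskName' => 'Task 3 Name Here', 'subTasks' => [3]],
    ],
];
$task = findTask($document, 2);
```

Filtering in the projection is still preferable when the tasks array is large, since it avoids transferring the unwanted elements.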
For example, I have 2 tables:
Product ( id, name )
Photo ( id, name, photo_id )
And I need to get result in array like this:
array(
    'id' => 1,
    'name' => 'product',
    'photos' => array(
        array('id' => 1, 'name' => 'photo1'),
        array('id' => 2, 'name' => 'photo2')
    )
)
Is it possible in PHP using clear SQL?
I know it is possible to fetch 2 arrays and combine them, but I have many records and I don't want to waste time on extra queries.
You have to add a foreign key "product_id" to your photo table.
Then create a method getPhotos() in your Product class which will collect all photos for your product.
Is it possible in PHP using clear SQL?
Not in a single SQL call. With a single call, this is the closest you can get:
array(
    'id' => 1,
    'name' => 'product',
    'photo_id' => 1,
    'photo_name' => 'photo1'
),
array(
    'id' => 1,
    'name' => 'product',
    'photo_id' => 2,
    'photo_name' => 'photo2'
)
To get the format you want, your only choices are to run the queries separately or to post-process the flat result into the data structure you want.
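Combining the flat joined rows into the nested structure can be done in one pass over the result set. The following is a sketch assuming the rows come back with the column aliases `id`, `name`, `photo_id` and `photo_name` shown above; `nestPhotos` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: fold flat joined rows (one per product/photo pair)
// into one entry per product with a nested 'photos' list.
function nestPhotos(array $rows): array
{
    $products = [];
    foreach ($rows as $row) {
        $id = $row['id'];
        if (!isset($products[$id])) {
            $products[$id] = ['id' => $id, 'name' => $row['name'], 'photos' => []];
        }
        // With a LEFT JOIN, products without photos have NULL photo columns.
        if ($row['photo_id'] !== null) {
            $products[$id]['photos'][] = [
                'id' => $row['photo_id'],
                'name' => $row['photo_name'],
            ];
        }
    }
    return array_values($products);
}

$rows = [
    ['id' => 1, 'name' => 'product', 'photo_id' => 1, 'photo_name' => 'photo1'],
    ['id' => 1, 'name' => 'product', 'photo_id' => 2, 'photo_name' => 'photo2'],
];
$nested = nestPhotos($rows);
```

This keeps the database work to a single query; the grouping cost in PHP is linear in the number of joined rows.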
As mentioned, this is not possible with SQL. SQL is based on the relational model, which is a first normal form (1NF) data model. That means the result relation is also flat (no nested relations within a relation).
However, there are good frameworks which generate intermediary models in your corresponding target language (e.g. Python, Java, ...) that circumvent the impression of a flat data model. Check for example Django.
https://docs.djangoproject.com/en/1.8/topics/db/models/