Most efficient way to query with two keys in either order? - php

I have some data I receive of the following format:
{ gameId: 1, playerId: "john", score: .12 }
{ gameId: 1, playerId: "mary", score: .75 }
{ gameId: 1, playerId: "jane", score: .32 }
{ gameId: 2, playerId: "john", score: .89 }
{ gameId: 2, playerId: "mary", score: .91 }
{ gameId: 2, playerId: "jane", score: .99 }
And I want to expose these endpoints:
GET games => get a list of all games
GET games/{id} => list all the scores for the gameId and the average score among all players
GET players => get a list of all players
GET players/{id} => list all the scores for the player and their average score across all games
So setting DBs aside for a moment, I thought of a hash map approach where I basically have two maps:
gameScores[gameId][playerId] = score
playerScores[playerId][gameId] = score
This way I could very efficiently return results as the following:
GET games => array_keys(gameScores)
GET games/{id} => ['average' => avg(gameScores[id]), 'games' => gameScores[id]]
GET players => array_keys(playerScores)
GET players/{id} => ['average' => avg(playerScores[id]), 'games' => playerScores[id]]
This seems like a very efficient way to return results (each lookup is near O(1), plus one pass over the matching entries to compute the average), but is it too great a drawback to store the dataset twice? Imagine if score was some very large object instead.
I'm doing this in PHP, so not using something like a Python tuple solution here, but I feel like this problem is very generalizable (hash map with 2 keys) and I'm wondering if there's a better way to approach this rather than duplicating the hash map for both key orders.
Is using a database the only more optimal approach here? If so, would it be best to enter the data into a single table with those 3 columns, or should I be splitting the tables?

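You only need one array, which you can search with
array_search($search, array_column($array, $field));
Note that array_search() returns only the key of the first match and scans the whole column, so each lookup is O(n).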
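On the memory concern raised in the question: PHP objects are held by handle, so placing the same object under both key orders does not copy it. The following is only a sketch of that idea, assuming PHP 7.4 or later; the class name ScoreIndex and its method names are hypothetical and do not come from the question.

```php
<?php
// Hypothetical helper: each record is stored once as an object and
// referenced from two indexes, one per key order.
final class ScoreIndex
{
    /** gameId => playerId => record */
    private array $byGame = [];
    /** playerId => gameId => record */
    private array $byPlayer = [];

    public function add(int $gameId, string $playerId, float $score): void
    {
        $record = (object) ['gameId' => $gameId, 'playerId' => $playerId, 'score' => $score];
        $this->byGame[$gameId][$playerId] = $record;   // same object handle,
        $this->byPlayer[$playerId][$gameId] = $record; // not a second copy
    }

    public function games(): array
    {
        return array_keys($this->byGame);
    }

    public function players(): array
    {
        return array_keys($this->byPlayer);
    }

    public function game(int $gameId): array
    {
        return self::summarize($this->byGame[$gameId] ?? []);
    }

    public function player(string $playerId): array
    {
        return self::summarize($this->byPlayer[$playerId] ?? []);
    }

    // Computing the average is one pass over the matching records.
    private static function summarize(array $records): array
    {
        $scores = array_map(fn (object $r): float => $r->score, $records);
        $average = $scores === [] ? null : array_sum($scores) / count($scores);
        return ['average' => $average, 'scores' => $scores];
    }
}

$index = new ScoreIndex();
$index->add(1, 'john', 0.12);
$index->add(1, 'mary', 0.75);
$index->add(2, 'john', 0.89);
print_r($index->player('john'));
```

With this layout only the two index arrays are duplicated; a large score object would exist once in memory.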

Related

Efficient solution to generating an array in PHP, which extracts unique data from one array, based on data from another

Writing in PHP, I have 2 arrays, each created from SQL queries.
The first query runs through a table that has multiple pieces of data that correspond to various quiz attempts. The table has a column for the user's Email, the activity ID (which represents a quiz attempt) and another 2 columns for data relating to the attempt (for example 'percentage achieved' or 'quiz ID'):
UserEmail ActID ActKey ActMeta
joB#gm.com 2354 Percentage 98
joB#gm.com 2354 Quiz ID 4
boM#hm.com 4567 Percentage 65
boM#hm.com 4567 Quiz ID 7
Once queried, this first array ($student_quiz_list) stores the selected data in the form of
[[UserEmail, ActID, ActKey, ActMeta], [UserEmail, ActID, ActKey, ActMeta], [UserEmail, ActID, ActKey, ActMeta]...]
where each pair of sub-arrays corresponds to a single quiz attempt.
The second table that is queried has two columns that relate to the quizzes themselves. The first column is the Quiz ID and the second is the Quiz name.
Quiz ID Quiz Name
4 Hardware
7 Logic
Once queried, this second array ($quiz_list) stores the selected data in the form of
[[ID, Name], [ID, Name]...]
What I need to do is create a 3rd array (from the 2 above) which holds the user's email and percentage score
[[email, percentage], [email, percentage]...]
but with each sub-array corresponding to a unique actID (so basically the user's percentage in each quiz they attempted without duplicates) and (this is the challenging bit) only for quizzes with certain ID values, in this case, let's say quiz ID 4.
In PHP, what would be the most efficient solution to this? I continually create arrays with duplicates and cannot find a neat solution which provides the outcome desired.
Any help would be greatly received.
Try this code as the example and let me know.
$student_quiz_list = array(
    array(
        'UserEmail' => 'joB#gm.com', 'ActID' => '2354', 'ActKey' => 'Percentage', 'ActMeta' => '90',
    ),
    array(
        'UserEmail' => 'joB#gm.com', 'ActID' => '2354', 'ActKey' => 'QuizID', 'ActMeta' => '4',
    ),
    array(
        'UserEmail' => 'boM#hm.com', 'ActID' => '4567', 'ActKey' => 'Percentage', 'ActMeta' => '98',
    ),
    array(
        'UserEmail' => 'boM#hm.com', 'ActID' => '4567', 'ActKey' => 'QuizID', 'ActMeta' => '7',
    ),
);
$final_array = array();
foreach ($student_quiz_list as $row) {
    // Keep only the rows that carry a percentage
    if ($row['ActKey'] == 'Percentage') {
        $final_array[] = array(
            'UserEmail' => $row['UserEmail'],
            'ActMeta' => $row['ActMeta'],
        );
    }
}
echo "<pre>"; print_r($final_array); echo "</pre>";
As commenter @Nico Haase suggested, you can do most of the logic in SQL. You didn't respond to my comment, so I suppose a user can have multiple attempts per quiz ID:
SELECT
    t1.UserEmail,
    t1.ActMeta
FROM
    your_table t1 # replace with your table name
WHERE
    t1.ActKey = 'Percentage'
    AND t1.ActID IN (
        # subselect with its own table alias
        SELECT
            t2.ActID
        FROM
            your_table t2 # replace with your table name
        WHERE
            t2.ActKey = 'Quiz ID'
            AND t2.ActMeta = 4 # insert your desired quiz ID here
    )
(Query tested with MySQL/MariaDB)
For the case that you cannot change the SQL part, here is how you can process your data in PHP. But consider that a large dataset could exceed your server capabilities, so I would definitely recommend the solution above:
// Your sample data
$raw = [
    ['UserEmail' => 'joB#gm.com', 'ActID' => 2354, 'ActKey' => 'Percentage', 'ActMeta' => 98],
    ['UserEmail' => 'joB#gm.com', 'ActID' => 2354, 'ActKey' => 'Quiz ID', 'ActMeta' => 4],
    ['UserEmail' => 'boM#hm.com', 'ActID' => 4567, 'ActKey' => 'Percentage', 'ActMeta' => 65],
    ['UserEmail' => 'boM#hm.com', 'ActID' => 4567, 'ActKey' => 'Quiz ID', 'ActMeta' => 7],
];
// Extract the corresponding ActIDs for a QuizID
$quiz_id = 4;
$act_ids = array_column(
    array_filter(
        $raw,
        function($item) use ($quiz_id) {
            // Check ActKey too, so a percentage that happens to equal the quiz ID is not matched
            return $item['ActKey'] === 'Quiz ID' && $item['ActMeta'] == $quiz_id;
        }
    ),
    'ActID'
);
// Get the entries with ActKey 'Percentage' and an ActID present in the previously extracted set
$percentage_entries = array_filter(
    $raw,
    function($item) use ($act_ids) {
        return $item['ActKey'] === 'Percentage' && in_array($item['ActID'], $act_ids);
    }
);
// Map over the previous set to get the array into the final form;
// array_values() reindexes it, since array_filter() preserves the original keys
$final = array_values(array_map(
    function($item) {
        return [$item['UserEmail'], $item['ActMeta']];
    },
    $percentage_entries
));

How to Update Rows (with different data) in one DB Query Laravel

I'm making a gallery with sortable photos with Laravel and jQuery UI Sortable.
My function in the controller gets a nice array:
$items = [0 => 22, 1 => 25, 2 => 45];
But there will be approx 150 - 200 photos in one gallery. Is there any way to do this with one DB query instead of 150 - 200? Because my controller does this at the moment...
<?php
foreach($photos['item'] as $position => $id){
Photo::where('id', $id)->update(['position' => $position]);
}
But it creates approx 150 - 200 DB queries, which is awful.
Edit #1
Basically I need something like this (two corresponding arrays with ids and positions):
$ids = [22, 24, 25, 34];
$positions = [0, 1, 2, 3];
Photo::where('id', $ids)->update(['position' => $positions]);
But I can't find anything about this approach.
Take a look here: Eloquent model mass update.
Basically, you are looking for a mass or bulk update.
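One common way to do such a bulk update is a single UPDATE statement with a CASE expression. The sketch below only builds the SQL string and its bindings; the function name buildPositionUpdate, the table name photos and the column names id and position are assumptions for illustration, not something taken from the linked answer.

```php
<?php
// Hypothetical helper: builds one UPDATE ... CASE statement from a
// position => id map. The table and column names are assumptions.
function buildPositionUpdate(array $items, string $table = 'photos'): array
{
    if ($items === []) {
        throw new InvalidArgumentException('No items to update.');
    }
    $cases = [];
    $caseBindings = [];
    $ids = [];
    foreach ($items as $position => $id) {
        $cases[] = 'WHEN ? THEN ?';
        $caseBindings[] = (int) $id;       // value compared against the id column
        $caseBindings[] = (int) $position; // new position for that id
        $ids[] = (int) $id;
    }
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    $sql = "UPDATE {$table} SET position = CASE id "
        . implode(' ', $cases)
        . " END WHERE id IN ({$placeholders})";

    // CASE bindings come first, then the ids for the IN (...) list
    return [$sql, array_merge($caseBindings, $ids)];
}

[$sql, $bindings] = buildPositionUpdate([0 => 22, 1 => 25, 2 => 45]);
echo $sql, PHP_EOL;
```

In Laravel the resulting pair could then be passed to DB::update($sql, $bindings), which runs it as a single query. For galleries much larger than 200 photos it may be worth splitting the map into chunks.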

project the sum of values in a mongo subdocument

I have a Mongo Collection that I'm trying to aggregate in which I need to be able to filter the results based on a sum of values from a subdocument. Each of my entries has a subdocument that looks like this
{
    "_id": <MongoId>,
    'clientID': 'some ID',
    <Other fields I can filter on normally>
    "bidCompData": [
        {
            "lineItemID": "210217",
            "qtyBid": 3,
            "priceBid": 10.25,
            "qtyComp": 0,
            "description": "Lawn Mowed",
            "invoiceID": 23
        },
        {
            <More similar entries>
        }
    ]
}
What I'm trying to do is filter on the sum of qtyBid in a given record. For example, my user could specify that they only want records that have a total qtyBid across all of the bidCompData that's greater than 5. My research shows that I can't use $sum outside of the $group stage in the pipeline but I need to be able to sum just the qtyBid values for each individual record. Presently my pipeline looks like this.
array(
    array('$project' => $basicProjection), // fields to project, calculated earlier from the input parameters
    array('$match' => $query),
    array('$group' => array(
        '_id' => array('clientID' => '$clientID'),
        'count' => array('$sum' => 1)
    ))
)
I tried having another group and an unwind before the group I presently have in my pipeline so that I could get the sum there but it doesn't let me keep my fields besides the id and the sum field. Is there a way to do this without using $where? My database is large and I can't afford the speed hit from the JS execution.
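For what it's worth, MongoDB 3.2 and later accept $sum inside $project, where it adds up the elements of an array expression. The following is a sketch under that assumption only; the field name totalQtyBid and the variable $minTotalQtyBid are made up here, and the fields from $basicProjection would still have to be merged into the projection.

```php
<?php
// Assumes MongoDB 3.2+, where $sum is allowed in $project and sums the
// elements of an array expression such as '$bidCompData.qtyBid'.
$minTotalQtyBid = 5; // hypothetical user-supplied threshold

$pipeline = [
    ['$project' => [
        'clientID'    => 1,
        'totalQtyBid' => ['$sum' => '$bidCompData.qtyBid'], // hypothetical field name
    ]],
    // Filter on the per-document total before grouping
    ['$match' => ['totalQtyBid' => ['$gt' => $minTotalQtyBid]]],
    ['$group' => [
        '_id'   => ['clientID' => '$clientID'],
        'count' => ['$sum' => 1],
    ]],
];

echo json_encode($pipeline, JSON_PRETTY_PRINT), PHP_EOL;
```

Because the total is computed per document, no $unwind is needed and the other projected fields stay available to the later stages.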

MongoDB - Aggregation Framework, PHP and averages

First time here - please go easy… ;)
I'm starting off with MongoDB for the first time - using the official PHP driver to interact with an application.
Here's the first problem I've run into with regard to the aggregation framework.
I have a collection of documents, all of which contain an array of numbers, like in the following shortened example...
{
"_id": ObjectId("51c42c1218ef9de420000002"),
"my_id": 1,
"numbers": [
482,
49,
382,
290,
31,
126,
997,
20,
145
],
}
{
"_id": ObjectId("51c42c1218ef9de420000006"),
"my_id": 2,
"numbers": [
19,
234,
28,
962,
24,
12,
8,
643,
145
],
}
{
"_id": ObjectId("51c42c1218ef9de420000008"),
"my_id": 3,
"numbers": [
912,
18,
456,
34,
284,
556,
95,
125,
579
],
}
{
"_id": ObjectId("51c42c1218ef9de420000012"),
"my_id": 4,
"numbers": [
12,
97,
227,
872,
103,
78,
16,
377,
20
],
}
{
"_id": ObjectId("51c42c1218ef9de420000016"),
"my_id": 5,
"numbers": [
212,
237,
103,
93,
55,
183,
193,
17,
346
],
}
Using the aggregation framework and PHP (which I think is the correct way), I'm trying to work out the average number of consecutive documents in which a number doesn't appear (within the numbers array) before it appears again.
For example, the average gap for the number 20 in the above example is 1.5 (there's a gap of 2 documents, followed by a gap of 1 - add these values together, divide by the number of gaps).
I can get as far as working out if the number 20 is within the results array, and then using the $cond operator, passing a value based on the result. Here’s my PHP…
$unwind_results = array(
    '$unwind' => '$numbers'
);
$project = array(
    '$project' => array(
        'my_id' => '$my_id',
        'numbers' => '$numbers',
        // 0 when the unwound number is 20, otherwise 1
        'hit' => array('$cond' => array(
            array(
                '$eq' => array('$numbers', 20)
            ),
            0,
            1
        ))
    )
);
$group = array(
    '$group' => array(
        '_id' => '$my_id',
        // the minimum is 0 if any number in the document was 20
        'hit' => array('$min' => '$hit'),
    )
);
$sort = array(
    '$sort' => array('_id' => 1),
);
$avg = $c->aggregate(array($unwind_results, $project, $group, $sort));
What I was trying to achieve was to set up some kind of incremental counter that reset every time the number 20 appeared in the numbers array, and then grab all of those numbers and work out the average from there… But I'm truly stumped.
I know I could work out the average from a collection of documents on the application side, but ideally I’d like Mongo to give me the result I want so it’s more portable.
Would Map/Reduce need to get involved somewhere?
Any help/advice/pointers greatly received!
As Asya said, the aggregation framework isn't usable for the last part of your problem (averaging gaps in "hits" between documents in the pipeline). Map/reduce also doesn't seem well-suited to this task, since you need to process the documents serially (and in a sorted order) for this computation and MR emphasizes parallel processing.
Given that the aggregation framework does process documents in a sorted order, I was brainstorming yesterday about how it might support your use case. If $group exposed access to its accumulator values during the projection (in addition to the document being processed), we might be able to use $push to collect previous values in a projected array and then inspect them during a projection to compute these "hit" gaps. Alternatively, if there was some facility to access the previous document encountered by a $group for our bucket (i.e. group key), this could allow us to determine diffs and compute the gap span as well.
I shared those thoughts with Mathias, who works on the framework, and he explained that while all of this might be possible for a single server (were the functionality implemented), it would not work at all on a sharded infrastructure, where $group and $sort operations are distributed. It would not be a portable solution.
I think your best option is to run the aggregation with the $project you have, and then process those results in your application language.
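The application-side step could look roughly like the sketch below. It assumes the aggregation returns rows sorted by _id, each with a hit field that is 0 when the number was found and 1 otherwise, as in the pipeline from the question. The function name averageGap is hypothetical. It also assumes that the run before the first hit is ignored, that back-to-back hits count as a gap of 0, and that a non-empty trailing run counts as a gap, which matches the question's example of 1.5.

```php
<?php
// Hypothetical helper: averages the gaps between documents that contain
// the number, given rows sorted by _id with 'hit' => 0 (found) or 1 (not found).
function averageGap(array $rows): ?float
{
    $gaps = [];
    $current = null; // stays null until the first hit is seen

    foreach ($rows as $row) {
        if ((int) $row['hit'] === 0) {
            if ($current !== null) {
                $gaps[] = $current; // close the gap since the previous hit
            }
            $current = 0;
        } elseif ($current !== null) {
            $current++; // one more document without the number
        }
    }

    // Count a non-empty trailing run as a gap (assumption, see above)
    if ($current !== null && $current > 0) {
        $gaps[] = $current;
    }

    return $gaps === [] ? null : array_sum($gaps) / count($gaps);
}

$rows = [
    ['_id' => 1, 'hit' => 0],
    ['_id' => 2, 'hit' => 1],
    ['_id' => 3, 'hit' => 1],
    ['_id' => 4, 'hit' => 0],
    ['_id' => 5, 'hit' => 1],
];
var_dump(averageGap($rows));
```

The function returns null when the number never appears or no gap can be measured, so the caller can distinguish "no data" from an average of zero.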

Updating multiple embedded docs in mongoDB

I need to update multiple embedded docs in mongo using PHP. My layout looks like this:
{
    _id: id,
    visits : {
        visitID: 12,
        categories: [{
            catagory_id: 1,
            name: somename,
            count: 11,
            duration: 122
        },
        {
            catagory_id: 1,
            name: some other name,
            count: 11,
            duration: 122
        },
        {
            catagory_id: 2,
            name: yet another name,
            count: 11,
            duration: 122
        }]
    }
}
The document can have more than one visit too.
Now I want to update 2 categories, one with id=1 and name=somename and the other with id=1 and name=some_new_name. Both of them should inc "count" by 1 and "duration" by 45.
The first document exists, but the second does not.
I'm thinking of having a function like this:
function updateCategory($id, $visitID, $category_name, $category_id) {
    $this->profiles->update(
        array(
            '_id' => $id,
            'visits.visitID' => $visitID,
            'visits.categories.name' => $category_name,
            'visits.categories.catagory_id' => $category_id, // field name as spelled in the layout above
        ),
        array(
            '$inc' => array(
                'visits.categories.$.count' => 1,
                'visits.categories.$.duration' => 45,
            ),
        ),
        array("upsert" => true)
    );
}
But with this I need to call the function once for each category I want to update. Is there any way to do this in one call?
EDIT:
Changed the layout a bit and made "categories" an object instead of array. Then used a combination of "category_id" and "category_name" as property name. Like:
categories: {
1_somename : {
count: 11,
duration: 122
},
1_some other name : {
count: 11,
duration: 122
},
2_yet another name : {
count: 11,
duration: 122
},
}
Then with upsert and something like
$inc: {
"visits.$.categories.1_somename.d": 100,
"visits.$.categories.2_yet another name.c": 1
}
I can update several "objects" at a time.
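Building that $inc document for an arbitrary list of categories could be sketched as follows. The function name buildCategoryInc is hypothetical, and the sketch uses the full count and duration property names from the layout rather than the abbreviated ones in the snippet above. It assumes the query part of the update matches a visit, so that the positional operator in "visits.$" resolves, and that category names contain no dots, since a dot would be read as a path separator.

```php
<?php
// Hypothetical helper: builds one $inc document covering several
// categories, using "<id>_<name>" property names as in the edit above.
function buildCategoryInc(array $categories, int $count, int $duration): array
{
    $inc = [];
    foreach ($categories as [$categoryId, $categoryName]) {
        $prefix = "visits.\$.categories.{$categoryId}_{$categoryName}";
        $inc["{$prefix}.count"] = $count;
        $inc["{$prefix}.duration"] = $duration;
    }
    return ['$inc' => $inc];
}

$update = buildCategoryInc([[1, 'somename'], [1, 'some_new_name']], 1, 45);
print_r($update);
```

Since $inc creates a missing field with the given value, one update document can cover both the existing and the new category.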
MongoDB does not currently support updating arrays nested multiple levels deep (see Jira).
So the following code will not work:
'$inc' => array(
'visits.categories.$.count' => 1,
'visits.categories.$.duration' => 123,
),
There are a few ways around this:
1. Load document => update => save (possible concurrency issues).
2. Reorganize your document structure like this (and update using a single positional operator):
{
    _id: id,
    visits: [{
        visitID: 12
    }],
    categories: [{
        catagory_id: 1,
        name: somename,
        count: 11,
        duration: 122,
        visitID: 12
    }]
}
3. Wait for support for multiple positional operators (planned for version 2.1 of MongoDB).
4. Reorganize your document structure in some other way that avoids nesting arrays multiple levels deep.
