I have a database table similar to the one in the following picture. Every time an emission is reported, a row is inserted into the table. The total column is simply the sum of the CO and CO2 columns. There can be multiple rows for any city on a particular date. The following table represents data for only two dates and three different cities.
For each date, I need to compute the sums of the CO, CO2, and total columns for each city, then calculate the subtotal across those cities for that date. Finally, I need to combine those subtotals into a grand total. The following picture represents the result I am trying to achieve:
In addition, I need to support a couple of filters, such as results for a certain date range or a certain country. I am planning to query the rows and loop over them to compute the sums on the PHP side. This approach is questionable, as it does not seem very efficient for a large volume of data. Is there a more efficient way to do this type of computation, perhaps using a more complex GROUP BY or a temporary table/view on the MySQL side? Please suggest a better way to do it for a table that may grow to a few million rows. Here is an array representation of the sample data for convenience:
$sample = [
['id', 'date', 'country', 'city', 'CO', 'CO2', 'total'],
[1, '2020-07-13', 'US', 'Scarsdale', 5, 10, 15],
[2, '2020-07-13', 'US', 'Scarsdale', 10, 10, 20],
[3, '2020-07-13', 'US', 'SF', 5, 15, 20],
[4, '2020-07-13', 'US', 'SF', 15, 25, 40],
[5, '2020-07-13', 'UK', 'London', 10, 15, 25],
[6, '2020-07-12', 'UK', 'London', 10, 20, 30],
[7, '2020-07-12', 'UK', 'London', 5, 5, 10],
[8, '2020-07-12', 'US', 'SF', 10, 20, 30],
[9, '2020-07-12', 'US', 'Scarsdale', 5, 5, 10],
];
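As a point of reference, here is a minimal sketch of the single-pass, PHP-side summation described above. The function name summarizeEmissions and the shape of the returned array are hypothetical, chosen only for illustration; the rows are the sample data without the header row.

```php
<?php
// Sample rows from above, without the header row:
// [id, date, country, city, CO, CO2, total]
$sample = [
    [1, '2020-07-13', 'US', 'Scarsdale', 5, 10, 15],
    [2, '2020-07-13', 'US', 'Scarsdale', 10, 10, 20],
    [3, '2020-07-13', 'US', 'SF', 5, 15, 20],
    [4, '2020-07-13', 'US', 'SF', 15, 25, 40],
    [5, '2020-07-13', 'UK', 'London', 10, 15, 25],
    [6, '2020-07-12', 'UK', 'London', 10, 20, 30],
    [7, '2020-07-12', 'UK', 'London', 5, 5, 10],
    [8, '2020-07-12', 'US', 'SF', 10, 20, 30],
    [9, '2020-07-12', 'US', 'Scarsdale', 5, 5, 10],
];

/**
 * Hypothetical helper: builds per-city totals, per-date subtotals
 * and a grand total in one pass over the rows.
 */
function summarizeEmissions(array $rows): array
{
    $zero = ['CO' => 0, 'CO2' => 0, 'total' => 0];
    $summary = ['dates' => [], 'grand' => $zero];

    foreach ($rows as [$id, $date, $country, $city, $co, $co2]) {
        // Create the date and city buckets the first time they are seen.
        $summary['dates'][$date] ??= ['cities' => [], 'subtotal' => $zero];
        $summary['dates'][$date]['cities'][$city] ??= $zero;

        // Add the same amounts to the city, the date subtotal and the grand total.
        foreach (['CO' => $co, 'CO2' => $co2, 'total' => $co + $co2] as $key => $value) {
            $summary['dates'][$date]['cities'][$city][$key] += $value;
            $summary['dates'][$date]['subtotal'][$key] += $value;
            $summary['grand'][$key] += $value;
        }
    }

    return $summary;
}

$summary = summarizeEmissions($sample);
echo json_encode($summary['grand']), "\n"; // {"CO":75,"CO2":125,"total":200}
```

This only illustrates the target numbers; it still reads every row into PHP. On the database side, MySQL's GROUP BY ... WITH ROLLUP modifier is one way to have the server produce subtotal and grand-total rows itself.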
Thanks in advance.
I want to calculate and store the dense rank and gapped rank for all entries in an array using PHP.
I want to do this in PHP, not MySQL, because I am dealing with dynamic combinations (100,000 to 900 combinations per week), so I cannot use MySQL to make that many tables.
My code to find the dense ranks is working, but the gapped ranks are not correct.
PHP code
$members = [
['num' => 2, 'rank' => 0, 'dense_rank' => 0],
['num' => 2, 'rank' => 0, 'dense_rank' => 0],
['num' => 3, 'rank' => 0, 'dense_rank' => 0],
['num' => 3, 'rank' => 0, 'dense_rank' => 0],
['num' => 3, 'rank' => 0, 'dense_rank' => 0],
['num' => 3, 'rank' => 0, 'dense_rank' => 0],
['num' => 3, 'rank' => 0, 'dense_rank' => 0],
['num' => 5, 'rank' => 0, 'dense_rank' => 0],
['num' => 9, 'rank' => 0, 'dense_rank' => 0],
['num' => 9, 'rank' => 0, 'dense_rank' => 0],
['num' => 9, 'rank' => 0, 'dense_rank' => 0]
];
$rank=0;
$previous_rank=0;
$dense_rank=0;
$previous_dense_rank=0;
foreach($members as &$var){
//start of rank
if($var['num']==$previous_rank){
$var['rank']=$rank;
}else{
$var['rank']=++$rank;
$previous_rank=$var['num'];
}//end of rank
//start of rank_dense
if($var['num']===$previous_dense_rank){
$var['dense_rank']=$dense_rank;
++$dense_rank;
}else{
$var['dense_rank']=++$dense_rank;
$previous_dense_rank=$var['num'];
}
//end of rank_dense
echo $var['num'].' - '.$var['rank'].' - '.$var['dense_rank'].'<br>';
}
?>
My flawed output is:
num   rank   dynamic rank
2     1      1
2     1      1
3     2      3
3     2      3
3     2      4
3     2      5
3     2      6
5     3      8
9     4      9
9     4      9
9     4      10
Notice that once the error has occurred, a higher value in the num column corrects it for that row; see where num goes from 3 to 5.
Given that your results are already sorted in an ascending fashion...
For dense ranking, you need to only increment your counter when a new score is encountered.
For gapped ranking, you need to unconditionally increment your counter and use the counter value for all members with the same score.
??= is the "null coalescing assignment" operator (a breed of "combined operator"). It only allows the right side operand to be executed/used if the left side operand is not declared or is null. This is a technique of performing conditional assignments without needing to write a classic if condition.
Code:
$denseRank = 0;
$gappedRank = 0;
foreach ($members as &$row) {
$denseRanks[$row['num']] ??= ++$denseRank;
$row['dense_rank'] = $denseRanks[$row['num']];
++$gappedRank;
$gappedRanks[$row['num']] ??= $gappedRank;
$row['rank'] = $gappedRanks[$row['num']];
// for better presentation:
echo json_encode($row) . "\n";
}
Output:
{"num":2,"rank":1,"dense_rank":1}
{"num":2,"rank":1,"dense_rank":1}
{"num":3,"rank":3,"dense_rank":2}
{"num":3,"rank":3,"dense_rank":2}
{"num":3,"rank":3,"dense_rank":2}
{"num":3,"rank":3,"dense_rank":2}
{"num":3,"rank":3,"dense_rank":2}
{"num":5,"rank":8,"dense_rank":3}
{"num":9,"rank":9,"dense_rank":4}
{"num":9,"rank":9,"dense_rank":4}
{"num":9,"rank":9,"dense_rank":4}
For the record, if you are dealing with huge volumes of data, I would be using SQL instead of PHP for this task.
It seems like you want the dynamic rank to be sequential?
Your sample data appears to be sorted. If this remains true for your real data, you can remove the conditional and just increment the variable as you assign it:
//start of rank_dense
$var['dense_rank']=++$dense_rank;
//end of rank_dense
It sounds like you're saying you won't be implementing a database.
Databases like MySQL can easily handle the workload numbers you outlined and they can sort your data as well. You may want to reconsider.
I have a table with a json column cast as array. The Schema creation has the column defined like this:
$table->json('row_ids');
In my class:
protected $casts = [
'row_ids' => 'array',
];
I generate the data in this array / column by getting every row id from another table like this:
$id_array = DB::table($table->name)->pluck('api_id')->toArray();
TableChannelRow::create([
'table_api_id' => $table->api_id,
'channel_api_id' => $channel->api_id,
'row_ids' => $id_array,
]);
When I dd a collection record I can see the columns in the target table OK, with one of the columns containing an array as expected:
#attributes: array:4 [▼
"api_id" => 2
"table_api_id" => 1
"channel_api_id" => 6
"row_ids" => "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, ▶"
]
When I check in MySQLWorkbench the data looks like this:
In another controller I want to add or remove entries from the array in this column like this:
$table_channel_row = TableChannelRow::where('table_api_id', '=', $rowId)
->where('channel_api_id', '=', $channelId)
->first();
$row_ids = $table_channel_row->row_ids;
if ($is_checked == 'yes') {
// Add value to array if it does not already exist
if (!in_array($row_id, $row_ids)) {
array_push($row_ids, $row_id);
}
} else {
// Remove value from array
$row_id_array[] = $row_id;
$row_ids = array_diff($row_ids, $row_id_array);
}
$table_channel_row->update([
'row_ids' => $row_ids,
]);
Now the data in MySQLWorkbench looks like this:
Why is it stored looking like a PHP array in the first instance, but stored as a JSON object after the update?
Additionally, the PHP code that removes a value works, yet the code that adds one does not, though it does not trigger an exception. (You can see that the first value is removed in the second image, but I cannot find it in the object in MySQL after I trigger the code to add it.)
What did I miss? Thanks!
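The reason it is saved differently is that array_diff preserves the keys of the first array, so once an element has been removed the result no longer has sequential keys starting at 0, whereas your initial data is a sequentially indexed array. An array with gaps in its keys is JSON-encoded as an object rather than as an array. Take the following for example: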
$ids = [1, 2, 3, 4, 5];
$ids2 = [1, 2, 3];
Then if you perform an array_diff($ids, $ids2), it would return the following:
[
3 => 4,
4 => 5
]
So, if you want to save it in the same format as your initial data, re-index the result with array_values:
$row_ids = array_values(array_diff($row_ids, $row_id_array));
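To make the difference visible, here is a small sketch (with made-up ids, not the data from the question) that shows how json_encode treats the result of array_diff before and after re-indexing:

```php
<?php
$row_ids = [1, 2, 3, 4, 5];

// array_diff() keeps the original keys, so removing the first value
// leaves keys 1..4 and no key 0.
$removed = array_diff($row_ids, [1]);
echo json_encode($removed), "\n";               // {"1":2,"2":3,"3":4,"4":5}

// Re-indexing from 0 makes json_encode() emit a JSON array again.
$reindexed = array_values($removed);
echo json_encode($reindexed), "\n";             // [2,3,4,5]
```

json_encode only produces a JSON array when the keys are exactly 0, 1, 2, ... in order; anything else is written as an object.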
First time here - please go easy… ;)
I'm starting off with MongoDB for the first time, using the official PHP driver to interact with an application.
Here's the first problem I've run into with regard to the aggregation framework.
I have a collection of documents, all of which contain an array of numbers, like in the following shortened example...
{
"_id": ObjectId("51c42c1218ef9de420000002"),
"my_id": 1,
"numbers": [
482,
49,
382,
290,
31,
126,
997,
20,
145
],
}
{
"_id": ObjectId("51c42c1218ef9de420000006"),
"my_id": 2,
"numbers": [
19,
234,
28,
962,
24,
12,
8,
643,
145
],
}
{
"_id": ObjectId("51c42c1218ef9de420000008"),
"my_id": 3,
"numbers": [
912,
18,
456,
34,
284,
556,
95,
125,
579
],
}
{
"_id": ObjectId("51c42c1218ef9de420000012"),
"my_id": 4,
"numbers": [
12,
97,
227,
872,
103,
78,
16,
377,
20
],
}
{
"_id": ObjectId("51c42c1218ef9de420000016"),
"my_id": 5,
"numbers": [
212,
237,
103,
93,
55,
183,
193,
17,
346
],
}
Using the aggregation framework and PHP (which I think is the correct way), I'm trying to work out the average number of consecutive documents in which a number doesn't appear (within the numbers array) before it appears again.
For example, the average gap for the number 20 in the above example is 1.5 (there's a gap of 2 documents, followed by a gap of 1; add these values together and divide by the number of gaps).
I can get as far as working out whether the number 20 is within the numbers array, and then using the $cond operator to pass a value based on the result. Here's my PHP:
$unwind_results = array(
'$unwind' => '$numbers'
);
$project = array (
'$project' => array(
'my_id' => '$my_id',
'numbers' => '$numbers',
'hit' => array('$cond' => array(
array(
'$eq' => array('$numbers',20)
),
0,
1
)
)
)
);
$group = array (
'$group' => array(
'_id' => '$my_id',
'hit' => array('$min'=>'$hit'),
)
);
$sort = array(
'$sort' => array( '_id' => 1 ),
);
$avg = $c->aggregate(array($unwind_results,$project, $group, $sort));
What I was trying to achieve was to set up some kind of incremental counter that resets every time the number 20 appears in the numbers array, and then grab all of those counts and work out the average from there. But I'm truly stumped.
I know I could work out the average from a collection of documents on the application side, but ideally I'd like Mongo to give me the result I want so it's more portable.
Would Map/Reduce need to get involved somewhere?
Any help/advice/pointers gratefully received!
As Asya said, the aggregation framework isn't usable for the last part of your problem (averaging gaps in "hits" between documents in the pipeline). Map/reduce also doesn't seem well-suited to this task, since you need to process the documents serially (and in a sorted order) for this computation and MR emphasizes parallel processing.
Given that the aggregation framework does process documents in a sorted order, I was brainstorming yesterday about how it might support your use case. If $group exposed access to its accumulator values during the projection (in addition to the document being processed), we might be able to use $push to collect previous values in a projected array and then inspect them during a projection to compute these "hit" gaps. Alternatively, if there was some facility to access the previous document encountered by a $group for our bucket (i.e. group key), this could allow us to determine diffs and compute the gap span as well.
I shared those thoughts with Mathias, who works on the framework, and he explained that while all of this might be possible for a single server (were the functionality implemented), it would not work at all on a sharded infrastructure, where $group and $sort operations are distributed. It would not be a portable solution.
I think your best option is to run the aggregation with the $project you have, and then process those results in your application language.
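A minimal sketch of that application-side step might look like the following. It assumes the 'hit' values have already been pulled out of the aggregation result into a plain array ordered by my_id (0 = the number is present, 1 = it is absent, as in the $cond above). The function name averageGap is hypothetical, and counting a run of misses before the first hit as a gap is an assumption the question does not settle.

```php
<?php
/**
 * Hypothetical helper: average length of the runs of consecutive
 * documents in which the number is absent.
 *
 * @param array $hits 'hit' values ordered by my_id (0 = present, 1 = absent)
 * @return float|null null when there are no gaps at all
 */
function averageGap(array $hits): ?float
{
    $gaps = [];
    $run = 0;

    foreach ($hits as $hit) {
        if ((int) $hit === 1) {
            // Still inside a gap: extend the current run of misses.
            $run++;
        } elseif ($run > 0) {
            // The number appeared again: close the current gap.
            $gaps[] = $run;
            $run = 0;
        }
    }

    // A run of misses at the end of the data also counts as a gap.
    if ($run > 0) {
        $gaps[] = $run;
    }

    return $gaps ? array_sum($gaps) / count($gaps) : null;
}

// The five sample documents: 20 is present in my_id 1 and 4 only.
echo averageGap([0, 1, 1, 0, 1]), "\n"; // 1.5
```

How the 'hit' values are read from the driver's result is left out here, since that depends on the driver version in use.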
I'm wondering if there is a way to filter an array the way you can in SQL queries, with WHERE, OR, LIMIT, ORDER BY, and so on. I'm caching my tables in JSON files to avoid unnecessary MySQL connections.
Let me explain
This is my example array
$array = array(
0 => array('value1', 'value2', 'value3', 'value4', 'value5', 'value6', 'value7'),
1 => array(1, '-1', 1, 'Google.com', 'http://google.com/', 'about:blank', 1),
2 => array(2, '-1', 2, 'Yahoo.com', 'http://yahoo.com/', 'about:blank', 1),
3 => array(1, '-1', 1, 'Bing.com', 'http://bing.com/', 'about:blank', 3),
4 => array(1, '-1', 1, 'Youtube.com', 'http://youtube.com/', 'about:blank', 3),
5 => array(1, '-1', 1, 'Facebook.com', 'http://facebook.com/', 'about:blank', 4),
6 => array(1, '-1', 1, 'Stackoverflow.com', 'http://stackoverflow.com/', 'about:blank', 3),
);
Now I want to filter my array.
Let's say I just want records where value7 is 3 or value3 is Bing.com, or value2 is -1.
Is this possible to do without hundreds of loops and checks?
Is it worth it, or will a SQL server cost less?
You might be able to write your own filters if you only have a couple. Otherwise, SQL might be a better way to store/query this data:
http://php.net/manual/en/function.array-filter.php
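As a rough sketch of what such hand-written filters could look like, the following uses array_filter for the WHERE/OR part, usort for ORDER BY and array_slice for LIMIT. The rows are the sample data without the header row, and the particular conditions are chosen only for illustration.

```php
<?php
$rows = [
    [1, '-1', 1, 'Google.com', 'http://google.com/', 'about:blank', 1],
    [2, '-1', 2, 'Yahoo.com', 'http://yahoo.com/', 'about:blank', 1],
    [1, '-1', 1, 'Bing.com', 'http://bing.com/', 'about:blank', 3],
    [1, '-1', 1, 'Youtube.com', 'http://youtube.com/', 'about:blank', 3],
    [1, '-1', 1, 'Facebook.com', 'http://facebook.com/', 'about:blank', 4],
    [1, '-1', 1, 'Stackoverflow.com', 'http://stackoverflow.com/', 'about:blank', 3],
];

// WHERE value7 = 3 OR value4 = 'Facebook.com'
$matches = array_filter($rows, function (array $row): bool {
    return $row[6] === 3 || $row[3] === 'Facebook.com';
});

// ORDER BY value4 ASC (usort also re-indexes the array from 0)
usort($matches, function (array $a, array $b): int {
    return strcmp($a[3], $b[3]);
});

// LIMIT 2
$page = array_slice($matches, 0, 2);

echo implode(', ', array_column($page, 3)), "\n"; // Bing.com, Facebook.com
```

Each step walks the whole array, so this stays practical only while the cached data is small.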
You could use a flavour of LINQ for PHP. LINQ originated in Microsoft's .NET framework, and its purpose is to query datasets using a syntax similar to SQL. A quick Google search gave me this: http://phplinq.codeplex.com/ , but it seems it has not been updated since 2009. Perhaps there are other implementations around.
This is better suited for a database.
You might be able to avoid setting up a SQL server by using a file-based database such as SQLite.
Currently I am storing adjacencies in a php file in an array. Here's a sample of it:
$my_neighbor_lists = array(
1=> array(3351=> array (2, 3, 5 , 6, 10)),
2=> array(3264=> array (322, 12, 54 , 6, 10), 3471=>array (122, 233, 35 , 476, 210)),
3=> array(3309=> array (52, 32, 54 , 36, 210), 3469=>array (152, 32, 15 , 836, 10)),
etc
I would basically like to migrate this into a database. Any suggestions on how many tables I should have? I am looking at three tables here.
two tables:
1. vertices (id)
2. edgecost (idfrom, idto, time, cost)