I need to store some data that is essentially just an array of key-value pairs of date/ints, where the dates will always be unique.
I'd like to be able to store it like an associative array:
array(
"2012-02-26" => 5,
"2012-02-27" => 2,
"2012-02-28" => 17,
"2012-02-29" => 4
)
but I also need to be able to query the dates (i.e. get everything where date > 2012-02-27), and so I suspect that I'll need to use a schema more like:
array(
array("date"=>"2012-02-26", "value"=>5),
array("date"=>"2012-02-27", "value"=>2),
array("date"=>"2012-02-28", "value"=>17),
array("date"=>"2012-02-29", "value"=>4),
)
Obviously the former is much cleaner and more concise, but will I be able to query it in the way that I am wanting, and if not are there any other schemas that may be more suitable?
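If the data already exists in the compact associative form, converting it to the array-of-objects form only needs a simple loop. Below is a minimal PHP sketch of that conversion. The function name toRecords is hypothetical, and the field names date/value follow the second schema above.

```php
<?php
// Hypothetical helper: turn ["2012-02-26" => 5, ...] into
// [["date" => "2012-02-26", "value" => 5], ...], keeping the original order.
function toRecords(array $byDate): array
{
    $records = [];
    foreach ($byDate as $date => $value) {
        // (string) guards against purely numeric keys, which PHP stores as ints
        $records[] = ['date' => (string) $date, 'value' => $value];
    }
    return $records;
}

$byDate = [
    '2012-02-26' => 5,
    '2012-02-27' => 2,
    '2012-02-28' => 17,
    '2012-02-29' => 4,
];
$records = toRecords($byDate);

echo json_encode($records[2]), "\n"; // {"date":"2012-02-28","value":17}
```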
You've described two methods; let me break them down.
Method #1 - Associative Array
The key tool for querying the "associative array" version is the $exists operator.
So you can definitely run a query like the following:
db.coll.find( { 'field.2012-02-27': { $exists: true } } );
Based on your description, you are looking for range queries, which do not match up well with the $exists operator. The "associative array" version is also difficult to index.
Method #2 - Array of objects
This definitely has better querying functionality:
db.coll.find( { 'field.date': { $gt: '2012-02-27' } } );
It can also be indexed:
db.coll.ensureIndex( { 'field.date': 1 } );
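The $gt comparison on 'field.date' works with plain strings because zero-padded YYYY-MM-DD dates sort lexicographically in chronological order. Here is a small PHP sketch of the same filter applied in memory to the array-of-objects form. It only illustrates the comparison and is not a MongoDB call. The function name datesAfter is hypothetical.

```php
<?php
// Hypothetical helper mirroring { 'field.date': { $gt: $after } } in memory.
// Relies on zero-padded "YYYY-MM-DD" strings, which compare in date order.
function datesAfter(array $records, string $after): array
{
    return array_values(array_filter(
        $records,
        fn (array $r): bool => strcmp($r['date'], $after) > 0
    ));
}

$records = [
    ['date' => '2012-02-26', 'value' => 5],
    ['date' => '2012-02-27', 'value' => 2],
    ['date' => '2012-02-28', 'value' => 17],
    ['date' => '2012-02-29', 'value' => 4],
];

foreach (datesAfter($records, '2012-02-27') as $r) {
    echo $r['date'], ' => ', $r['value'], "\n";
}
// 2012-02-28 => 17
// 2012-02-29 => 4
```

The trick relies on zero padding. A date stored as "2012-2-9" would compare incorrectly against "2012-02-27".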
However, there is a trade-off on updating. If you want to increment the value for a specific date you have to use this unwieldy $ positional operator. This works for an array of objects, but it fails for anything with further nesting.
Other issues
One issue with either of these methods is the long-term growth of data. As the object grows, it takes more space on disk and in memory. If you have an object with two years' worth of data, that entire array of roughly 700 items will need to be in memory for you to update today's data. This may not be an issue for your specific data, but it should be considered.
In the same vein, MongoDB queries always return the top-level object. Again, if you have an array of 700 items, you will get all of them for each document that matches. There are ways to filter out the fields that are returned, but they don't work for "arrays of objects".
Related
Given I have an array, say:
$myArray=['12','AB','3C']
I want to return the value 2 (which is the length of each of the array elements individually).
But in case I have something like
$myArray=['12','AB2','3C']
I want to stop the calculation/loop right after the second element of the array 'AB2' and let my function return null.
What is the most efficient way to do this in terms of performance and speed, since such an array can get long?
Casual way
I think you are trying to stop the array loop the moment you get two different lengths in an element?
In that case, in the worst case you need O(n) runtime, since you have to verify every element. The exception is if you track this information as you build the array (for example, storing it in an object property or detecting a mismatch on the fly while pushing items into the array). In that case the check can be O(1).
Because we can stop the moment we find an element with a different length, we simply store the length of the first element. As soon as we see any other length, we can immediately return null.
function linear_loop($array) {
    if (empty($array)) {
        return null; // no elements, so there is no common length
    }
    $len_of_first = strlen($array[0]);
    foreach ($array as $val) {
        if (strlen($val) != $len_of_first) {
            return null;
        }
    }
    // Function still running: every element had the same length, so return it
    return $len_of_first;
}
This function is O(n), and each operation inside the loop takes constant time, since strlen is O(1) (see the related question "Algorithmic complexity of PHP function strlen()").
Most "performance-fastest"
Since you said that the array can get quite long: if you are not generating the array all at once but pushing items into it over time, your push operation can check, before pushing, whether the new item has the same strlen (or whatever property you are comparing) as a stored reference value. That reference value can be taken from any element, since the array must be uniform in that property anyway.
In this case, you could have some object with a property uniform_length and store that. Then whenever you push into your array, you check the new item against uniform_length. If it isn't the same length, you set another property, uniform, to false. (By default uniform is true, since an array with only one element must be uniform.)
This would be an O(1) calculation since it is stored as an attribute. But you probably don't need an object for something as simple as this, and you can just store it as some variable.
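A minimal sketch of the push-time tracking idea described above. The class name UniformLengthList and its methods are hypothetical. It only handles appends, because removing items would require recomputing the flag.

```php
<?php
// Hypothetical wrapper: tracks whether every pushed string has the same
// length, so the "uniform length" answer is available in O(1).
final class UniformLengthList
{
    private array $items = [];
    private ?int $uniformLength = null; // length of the first item pushed
    private bool $uniform = true;       // an empty or one-item list is uniform

    public function push(string $item): void
    {
        $len = strlen($item);
        if ($this->uniformLength === null) {
            $this->uniformLength = $len;
        } elseif ($len !== $this->uniformLength) {
            $this->uniform = false;
        }
        $this->items[] = $item;
    }

    // Returns the shared length, or null if lengths differ or the list is empty.
    public function uniformLength(): ?int
    {
        return $this->uniform ? $this->uniformLength : null;
    }
}

$list = new UniformLengthList();
foreach (['12', 'AB', '3C'] as $s) {
    $list->push($s);
}
var_dump($list->uniformLength()); // int(2)
$list->push('AB2');
var_dump($list->uniformLength()); // NULL
```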
O(1) vs O(n) Runtime and why it is more performance effective
Since not everyone knows Big O, here is a quick explanation of what I said. O(1) runtime is "infinitely" better than O(n) runtime because the runtime of the function does not grow with the input (processing 1 million items requires the same number of steps as processing 1 item).
Just loop through and return early when you find something that isn't correct. Don't worry about micro-optimizations until you have profiled and found that this function really is your bottleneck.
For example:
function isCorrect($arr) {
    $len = strlen($arr[0]);
    foreach ($arr as $val) {
        if (strlen($val) != $len) {
            return false; // stop at the first mismatch
        }
    }
    return true;
}
Just my two cents. You could also use array_map for this:
$myArray = ['12','AB','3CC'];
$lengths = array_map('strlen', $myArray);
print_r($lengths);
// Output:
Array
(
[0] => 2
[1] => 2
[2] => 3
)
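You can then compare max($lengths) with min($lengths) (or check count(array_unique($lengths)) === 1) and return true or false. Note that array_map always computes every length, so unlike the loop above it can't stop early.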
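A sketch of the array_map idea that returns the common length or null, matching what the question asks for. The function name uniformLengthViaMap is hypothetical.

```php
<?php
// Sketch of the array_map approach: compute every length, then compare the
// smallest and largest. Unlike the early-return loop, this always visits
// every element, so it cannot stop at 'AB2'.
function uniformLengthViaMap(array $arr): ?int
{
    if ($arr === []) {
        return null; // max()/min() throw on an empty array in PHP 8
    }
    $lengths = array_map('strlen', $arr);
    $max = max($lengths);
    return $max === min($lengths) ? $max : null;
}

var_dump(uniformLengthViaMap(['12', 'AB', '3C']));  // int(2)
var_dump(uniformLengthViaMap(['12', 'AB2', '3C'])); // NULL
```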
-Cheers
In Perl, I think of a hash as a mapping between one thing and another, essentially a list of elements. As stated in the documentation:
A hash is a basic data type. It uses keys to access its contents.
So basically a hash is close to an array. Their initializations even look very similar.
If I were to create a mapping in Perl, I could do something like the following to compare the lists:
my %map = (
    A => [qw(a b c d)],
    B => [qw(c d f a)],
    C => [qw(b d a e)],
);
my @keys = keys %map;
my %matches;
for my $k ( 1 .. @keys ) {
    $matches{$_} |= 2**$k for @{ $map{ $keys[$k-1] } };
}
for ( sort keys %matches ) {
    my @found;
    for my $k ( 1 .. @keys ) {
        push @found, $keys[$k-1] if $matches{$_} & 2**$k;
    }
    print "$_ found in ", ( @found ? join(',', @found) : 0 ), "\n";
}
Output:
a found in A,C,B
b found in A,C
c found in A,B
d found in A,C,B
e found in C
f found in B
I would like to find out the best way to do this in PHP, in terms of performance and efficiency.
If I understand correctly, you are looking to apply your knowledge of Perl hashes to PHP. If I'm correct, then...
In PHP, a "Perl hash" is generally called an "associative array". PHP implements it as an ordinary array whose indexes happen to be keys, and its values work just like those of a regular array. Check out the PHP Array docs for lots of examples of how PHP lets you work with arrays of this (and other) types.
The nice thing about PHP is it is very flexible as to how you can deal with arrays. You can define an array as having key-value pairs then treat it like a regular array and ignore the keys, and that works just fine. You can mix and match...it doesn't complain much.
Philosophically, a hash or map is just a way to keep discrete pieces of related information together. That's all most non-primitive data structures are, and PHP is not very opinionated about how you go about things; it has lots of built-in optimizations, and does a pretty solid job of doing these types of things efficiently.
To answer your questions related to your example:
1) As for simplicity (I think you mean) and maintainability, I don't think there's anything wrong with your use of an associative array. If a data set is in pairs, then key-value pairs is a natural way to express this type of data.
2) As for being most efficient in terms of lines of code and script execution overhead... well, using such a mapping is a vanishingly small task for PHP. I don't think any other way of handling it would matter much; PHP can handle thousands of these without complaint. Now, if you could avoid using a regular expression, on the other hand...
3) You're using it, really. Don't overthink it: in PHP this is just an "array", and that's it. It's a variable that holds an arbitrary number of elements, and PHP handles multiple dimensions and associativity pretty darn well. It handles them well enough that they are almost never going to be the cause of any problem you have.
PHP handles things like hashes/maps behind the scenes logically and efficiently; part of the point of the language is that you don't have to think about such things. If you have related pieces of data in chunks, use an array; if the pieces of data come in pairs, use key-value pairs; if they come by the dozen, use an "array of arrays" (a multidimensional array where some, or all, of its elements are arrays).
PHP doesn't do anything stupid like create massive overhead just because you wanted to use key-value pairs, and it has lots of built-in features like foreach ($yourArray as $key => $value) and the functions you used, such as array_keys() and array_values(). Feel free to use them; as core features, they are generally pretty well optimized!
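As a rough illustration of the answer's point, here is the Perl example translated into plain PHP associative arrays. This is a sketch, not the asker's or answerer's code. Instead of the bitmask bookkeeping, it appends each key directly to a per-letter list. Because PHP arrays preserve insertion order, the keys come out as A,B,C rather than in Perl's hash order.

```php
<?php
// Same data as the Perl %map: each key maps to a list of letters.
$map = [
    'A' => ['a', 'b', 'c', 'd'],
    'B' => ['c', 'd', 'f', 'a'],
    'C' => ['b', 'd', 'a', 'e'],
];

// Invert the mapping: letter => list of keys it appears under.
// $matches[$letter][] creates the inner array on first use.
$matches = [];
foreach ($map as $key => $letters) {
    foreach ($letters as $letter) {
        $matches[$letter][] = $key;
    }
}

ksort($matches); // like Perl's `sort keys %matches`
foreach ($matches as $letter => $found) {
    echo "$letter found in ", implode(',', $found), "\n";
}
// a found in A,B,C
// b found in A,C
// c found in A,B
// d found in A,B,C
// e found in C
// f found in B
```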
For what you are doing, I would rather use the sprintf family of functions (here printf, which prints the formatted string directly):
$format = 'Hello %s how are you. Hey %s, hi %s!';
printf($format, 'foo', 'bar', 'baz');
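The original question for this answer isn't included, so this is only a general illustration of the same family of functions. sprintf returns the formatted string instead of printing it. vsprintf takes the arguments as an array, which may help if the values are already collected in one.

```php
<?php
$format = 'Hello %s how are you. Hey %s, hi %s!';

// sprintf returns the formatted string instead of printing it.
$message = sprintf($format, 'foo', 'bar', 'baz');

// vsprintf takes the arguments as a single array.
$names = ['foo', 'bar', 'baz'];
$same = vsprintf($format, $names);

echo $message, "\n"; // Hello foo how are you. Hey bar, hi baz!
```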
Background:
I have a class in this application I'm building whose job is:
__construct: $this->data = $this->mongoDB->collection->findOne();
Intermediate functions manipulate the data in tens of different ways each request. One manipulation could trigger another, which could trigger yet another. This allows me to make unlimited updates to the Mongo document with just one query, as long as $this->data['_id'] remains the same. This is the only place where data in this specific collection may be manipulated.
__destruct: $this->mongoDB->collection->save($this->data)
Data is then read back, json_encode'd and sent to Javascript to draw the page
Intention:
I intended to delete a member of an array by looping through said array, matching a value within it, and unsetting that. Example:
foreach ($this->data['documents'] as $key => $val) {
    if ($val == $toBeDeleted) {
        unset($this->data['documents'][$key]);
    }
}
Then, this would be saved to the DB when the script finishes.
Problem:
When JavaScript reads back the data, rather than getting ['a', 'b', 'd'], I got {'0': 'a', '1': 'b', '3': 'd'}. This can't be treated like an array and pretty much breaks things.
I had this question half typed out before my a-hah! moment, so I figured I'd post my own answer to it too for future reference.
In PHP, an associative array and an array are the same thing. You can have out-of-order keys, nonconsecutive keys, and almost any key you'd like to use to access your array members. Most, if not all, PHP array functions work with any array key. Objects are a totally different thing.
That being said, JavaScript doesn't share the same rules for arrays. A JavaScript array is expected to have consecutive keys starting at zero; otherwise it is treated as an object. MongoDB is similar to JavaScript in this way.
When PHP converts an array for use in MongoDB or JavaScript, if the PHP array doesn't follow that rule, it becomes an object.
The problem was that unsetting an array index left a gap. The nonconsecutive array keys caused the array to become an object. A simple fix is either array_splice($array, $key, 1) instead of unset(), which removes the element and re-indexes the array, or $array = array_values($array) after the loop. If you delete more than one element inside the loop, the later keys shift after each array_splice(), so re-indexing once with array_values() after the loop is the safer option.
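A small sketch of the behavior described above. json_encode stands in for the conversion here. The answer states that the MongoDB conversion follows the same consecutive-keys rule.

```php
<?php
$documents = ['a', 'b', 'c', 'd'];

// unset() leaves a gap in the keys (0, 1, 3) ...
unset($documents[2]);
echo json_encode($documents), "\n"; // {"0":"a","1":"b","3":"d"}

// ... so re-index with array_values() before encoding or saving.
$documents = array_values($documents);
echo json_encode($documents), "\n"; // ["a","b","d"]

// array_splice() removes the element and re-indexes in one step.
$other = ['a', 'b', 'c', 'd'];
array_splice($other, 2, 1);
echo json_encode($other), "\n"; // ["a","b","d"]
```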
I am trying to save some DB work by replacing a looped bit of code with a single query. Before, I was simply adding to the LIKE statements in a loop before firing off the query, but I can't get the same idea working in Mongo. I'd appreciate any ideas...
I am basically trying to do a like, but with the value as an array
('app' replaces 'mongodb' due to my CI setup.)
Here's how I was doing it pre mongofication:
foreach ($workids as $workid):
$this->ci->app->or_like('work',$workid) ;
endforeach;
$query = $this->ci->db->get("who_users");
$results = $query->result();
print_r($results);
And this is how I was hoping I could get it to work, but no joy here: that function is only designed to accept strings.
$query = $this->ci->app->like('work',$workids,'.',TRUE,TRUE)->get("who_users");
print_r($query);
If anyone can think of a cunning way to get my results array with a single call again, it would be great. I've not found any documentation on this sort of query. The only way I can think of is to loop over the query and push the results into a new results array... but that is really going to hurt if my app scales up.
Are you using codeigniter-mongodb-library? Based on the existing or_like() documentation, it looks like CI wraps each match with % wildcards. The equivalent query in Mongo would be a series of regex matches in an $or clause:
db.who_users.find({
$or: [
{ work: /.*workIdA.*/ },
{ work: /.*workIdB.*/ },
...
]});
Unfortunately, this is going to be quite inefficient unless (1) the work field is indexed and (2) your regexes are anchored with some constant value (e.g. /^workId.*/). This is described in more detail in Mongo's regex documentation.
Based on your comments to the OP, it looks like you're storing multiple IDs in the work field as a comma-delimited string. To take advantage of Mongo's schema, you should model this as an array of strings. Thereafter, when you query on the work field, Mongo will consider all values in the array (as described in the MongoDB documentation).
db.who_users.find({
work: "workIdA"
});
This query would match a record whose work value was ["workIdA", "workIdB"]. And if we need to search for any one of a set of IDs (taking this back to your OR query), we can extend this example with the $in operator:
db.who_users.find({
work: { $in: ["workIdA", "workIdB", ...] }
});
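From PHP, the same $in filter is normally written as a nested array. The sketch below only builds the filter and prints its JSON form to show that it matches the shell query above. The exact method you pass it to depends on your driver or CodeIgniter library, so that part is left out.

```php
<?php
$workids = ['workIdA', 'workIdB'];

// Single quotes around '$in' stop PHP from treating it as a variable.
// array_values() makes sure the ID list encodes as a list, not an object.
$filter = ['work' => ['$in' => array_values($workids)]];

echo json_encode($filter), "\n"; // {"work":{"$in":["workIdA","workIdB"]}}
```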
If that meets your needs, be sure to index the work field as well.
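Moving from a comma-delimited string to an array of strings could start with a conversion like the following. This is a minimal sketch. The helper name splitWorkIds is hypothetical, and it assumes the IDs themselves never contain commas.

```php
<?php
// Hypothetical migration helper: turn a comma-delimited "work" string
// such as "workIdA, workIdB" into an array of trimmed, non-empty IDs.
function splitWorkIds(string $work): array
{
    $ids = array_map('trim', explode(',', $work));
    return array_values(array_filter($ids, fn (string $id): bool => $id !== ''));
}

echo json_encode(splitWorkIds('workIdA, workIdB,,workIdC')), "\n";
// ["workIdA","workIdB","workIdC"]
```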
You can pass a boolean to json_decode to return an array instead of an object:
json_decode('{"0": "foo", "1": "bar", "2": "baz"}', true); // array(0 => 'foo', 1 => 'bar', 2 => 'baz')
My question is this. When parsing object literals, does this guarantee that the ordering of the items will be preserved? I know JSON object properties aren't ordered, but PHP arrays are. I can't find anywhere in the PHP manual where this is addressed explicitly. It probably pays to err on the side of caution, but I would like to avoid including some kind of "index" sub-property if possible.
Wouldn't it make more sense in this case to use an array when you pass the JSON to PHP? If you don't have any object keys in the JSON (which become associative array keys in PHP), just send it as an array. That way the elements are guaranteed to be in the same order in PHP as in JavaScript.
json_decode('["foo", "bar", "baz"]'); // a JSON array decodes to array('foo', 'bar', 'baz'), in order
If you need associative arrays (which is why you are passing true as the second argument), you will have to come up with some way to maintain their order when passing them. You will probably have to do some post-processing on the resulting array after you decode it, to get it into the format you want.
$json = '[ {"key" : "val"}, {"key" : "val"} ]';
json_decode($json, true);
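One possible post-processing step is sketched below. It assumes each element of the JSON array is a one-pair object and uses distinct hypothetical keys, because the example above repeats "key" and flattening would overwrite it. The outer JSON array fixes the order, and the loop rebuilds an ordered associative array from it.

```php
<?php
// The outer JSON array fixes the order; each element is a one-pair object.
$json = '[ {"first": "val1"}, {"second": "val2"}, {"third": "val3"} ]';

$ordered = [];
foreach (json_decode($json, true) as $pair) {
    foreach ($pair as $key => $value) {
        $ordered[$key] = $value; // appended in JSON array order
    }
}

echo implode(',', array_keys($ordered)), "\n"; // first,second,third
```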
Personally, I've never trusted any system to return an exact order unless that order is specifically defined. If you really need an order, then use a dictionary (a two-dimensional array) and assign a place value (0, 1, 2, 3, ...) to each value in the list.
If you apply this rule to everything, you'll never have to worry about the delivery/storage of that array, be it XML, JSON or a database.
Remember, just because something happens to work a certain way, doesn't mean it does so intentionally. It's akin to thinking rows in a database have an order, when in fact they don't unless you use an ORDER BY clause. It's unsafe to think ID 1 always comes before ID 2 in a SELECT.
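A minimal sketch of the place-value approach in PHP. The field names position and value are hypothetical. Each entry carries its own position, so the code sorts on that field explicitly instead of relying on decode order.

```php
<?php
// Each entry carries an explicit "position" (hypothetical field name),
// so the order survives any transport or storage that might reorder it.
$json = '[{"position": 2, "value": "baz"}, {"position": 0, "value": "foo"}, {"position": 1, "value": "bar"}]';

$items = json_decode($json, true);
usort($items, fn (array $a, array $b): int => $a['position'] <=> $b['position']);

$values = array_column($items, 'value');
echo implode(',', $values), "\n"; // foo,bar,baz
```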
I've used json_decode() a few times, and the order of the results was kept intact in PHP client apps. Other languages may not preserve it, though. For instance, Python's json module returned plain dicts with no guaranteed order before Python 3.7.
One way to gain some reassurance is to test it with multiple examples.
Lacking an explicit statement I'd say, by definition, no explicit order will be preserved.
My primary line of thought is: what order, exactly, would this be preserving? The json_decode function takes a string representation of a JavaScript object literal as its argument and returns either an object or an array. The function's input (an object literal as a string) has no explicit ordering, which means there's no clear order for json_decode to maintain.