I have a very large dataset in MongoDB in which there are documents with numeric fields. Due to some issue in the data import, some of these fields ended up as the int32 datatype while others are int64.
I need to convert all of them to int32. Since many of the fields are in nested documents/arrays, I cannot use MongoChef or RoboMongo to edit the field and do a collection-wide replace.
What is my next best option? Would I need to write a script that loops through each document/field and explicitly typecasts them to NumberInt()? I could do this in PHP or Python, but I was wondering if there is a way to do this without writing extra code.
Is there any mongo shell magic that can be done? I would appreciate it if any Mongo masters could give me some insight.
To anyone looking to do this and coming here: you can run
db.foo.find().forEach(doc => {
    // valueOf() turns the NumberLong (int64) value into a plain JS number
    const newBar = doc.bar.valueOf()
    db.foo.update({
        "_id" : doc._id
    }, {
        "$set" : {
            "bar" : newBar
        }
    })
})
in the mongo shell. This might not be feasible for large collections. The key is to use .valueOf() on the Int64 value. You might also want to check that the value doesn't overflow an int32.
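If the collection is too large for the shell approach, the scripted route mentioned in the question is another option. A rough PHP sketch of the same idea, assuming the legacy mongo driver and placeholder database/collection/field names (mydb, foo, bar):
$collection = (new MongoClient())->mydb->foo;
// BSON $type 18 matches fields currently stored as 64-bit integers
foreach ($collection->find(array('bar' => array('$type' => 18))) as $doc) {
    $collection->update(
        array('_id' => $doc['_id']),
        // MongoInt32 forces 32-bit storage; check for overflow before converting
        array('$set' => array('bar' => new MongoInt32((string) $doc['bar'])))
    );
}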
Related
I've recently updated my server to a newer version of MySQL and PHP 7 for various reasons. On my previous instance, running PHP 5.5, Laravel's response()->json() always converted tinyints into strings. Now, running newer server software, it's returning me ints, as it should...
I'd have to change a lot of my codebase to either cast types or convert them into strings manually, which I'm trying to avoid at the moment.
Is there a way to somehow force response()->json() to return ints as strings?
Is there a way to somehow force response()->json() to return ints as strings
I don't want to change the code base - do not want to cast types or convert them
No. There's no option for that. You need to do that yourself if needed.
There is a way to cast an integer to a string in Laravel.
In your model you can cast id to a string, as follows:
protected $casts = [ 'id' => 'string' ];
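For context, a minimal sketch of where that cast lives; the model name here is just a placeholder:
class Product extends \Illuminate\Database\Eloquent\Model
{
    // Eloquent casts the id attribute to a string when the model is serialized
    protected $casts = [
        'id' => 'string',
    ];
}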
But the downside is that you would have to do that for all Models.
If you don't want to modify a lot of code, you could run the response data through a quick and dirty function. Instead of going directly to JSON, grab the data as a nested array first. Then put it through a function like this:
function convertIntToString ($myArray) {
    foreach ($myArray as $thisKey => $thisValue) {
        if (is_array($thisValue)) {
            // recurse to handle a nested array
            $myArray[$thisKey] = convertIntToString($thisValue);
        } elseif (is_integer($thisValue)) {
            // convert any integers to a string
            $myArray[$thisKey] = (string) $thisValue;
        }
    }
    return $myArray;
}
The function will convert integers to strings and use recursion to handle nested arrays. Take the output from that and then convert it to JSON.
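For illustration, usage in a controller might look something like this; $users and its toArray() call are just placeholders for whatever nested array of response data you already have:
// grab the data as a nested array, convert, then hand it to response()->json()
$data = convertIntToString($users->toArray());
return response()->json($data);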
The best solution for me is to use attribute casting and Fractal transformers.
Fractal transformers are extremely useful when you have complex responses with multiple relations included.
You can typecast it to string:
return response()->json(["data" => (string)1]);
I have the following structure within a MongoDB collection:
{
    "_id" : ObjectId("5301d337fa46346a048b4567"),
    "delivery_attempts" : {
        "0" : {
            "live_feed_id" : 107,
            "remaining_attempts" : 2,
            "delivered" : false,
            "determined_status" : null,
            "date" : 1392628536
        }
    }
}
// > db.lead.find({}, {delivery_attempts:1}).pretty();
I'm trying to select any data from that collection where remaining_attempts is greater than 0 and live_feed_id is equal to 107. Note that the "delivery_attempts" field is mapped as a Hash type.
I've tried using an addAnd within an elemMatch (not sure if this is the correct way to achieve this).
$qb = $this->dm->createQueryBuilder($this->getDocumentName());
$qb->expr()->field('delivery_attempts')
    ->elemMatch(
        $qb->expr()
            ->field('remaining_attempts')->gt(0)
            ->addAnd($qb->expr()->field('live_feed_id')->equals(107))
    );
I do appear to be getting the record detailed above. However, changing the greater than
test to 3
->field('remaining_attempts')->gt(3)
still returns the record (which is incorrect). Is there a way to achieve this?
EDIT: I've updated the delivery_attempts field type from a "Hash" to a "Collection". This shows the data being stored as an array rather than an object:
"delivery_attempts" : [
{
"live_feed_id" : 107,
"remaining_attempts" : 2,
"delivered" : false,
"determined_status" : null,
"date" : 1392648433
}
]
However, the original issue still applies.
You can use dot notation to reference elements within a collection.
$qb->field('delivery_attempts.remaining_attempts')->gt(0)
->field('delivery_attempts.live_feed_id')->equals(107);
It works fine for me if I run the query on mongo.
db.testQ.find({"delivery_attempts.remaining_attempts" : {"$gt" : 0}, "delivery_attempts.live_feed_id" : 107}).pretty()
so it seems something is wrong with your PHP query. I suggest running the profiler to see which query is actually run against mongo:
db.setProfilingLevel(2)
This will log all operations from the moment you enable profiling. Then you can query the log to see the actual queries:
db.system.profile.find().pretty()
This might help you to find the culprit.
It sounds like you solved your first problem, which was using the Hash type mapping (intended for storing BSON objects, or associative arrays in PHP) instead of the Collection mapping (intended for real arrays); however, the query criteria in the answer you submitted still seem incorrect.
$qb->field('delivery_attempts.remaining_attempts')->gt(0)
->field('delivery_attempts.live_feed_id')->equals(107);
You said in your original question:
I'm trying to select any data from that collection where remaining_attempts are greater than 0 and a live_feed_id is equal to 107.
I assume you'd like that criteria to be satisfied by a single element within the delivery_attempts array. If that's correct, the criteria you specified above may match more than you expect, since delivery_attempts.remaining_attempts can refer to any element in the array, as can the live_feed_id criteria. You'll want to use $elemMatch to restrict the field criteria to a single array element.
I see you were using elemMatch() in your original question, but the syntax looked a bit odd. There should be no need to use addAnd() (i.e. an $and operator) unless you were attempting to apply two query operators to the same field name. Simply add extra field() calls to the same query expression you're using for the elemMatch() method. One example of this from ODM's test suite is QueryTest::testElemMatch(). You can also use the debug() method on the query to see the raw MongoDB query object created by ODM's query builder.
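For what it's worth, a sketch of that approach with the field names from the question (untested, and assuming the same query builder setup as above):
$qb = $this->dm->createQueryBuilder($this->getDocumentName());
$qb->field('delivery_attempts')->elemMatch(
    $qb->expr()
        ->field('remaining_attempts')->gt(0)
        ->field('live_feed_id')->equals(107)
);
// $qb->getQuery()->debug() will show the raw query object ODM builds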
I'm using jQuery to post ajax requests, and PHP to construct XML responses. Everything works fine, but I wonder about the method I've used for data typing, and whether there's a more standard way, or a more correct way. My XML generally looks like this, with some attributes representing text and other attributes representing numeric data:
<UnitConversions>
<UnitConversion basicUnitName="mile" conversionFactor="5280" conversionUnit="foot"/>
<UnitConversion basicUnitName="mile" conversionFactor="1760" conversionUnit="yard"/>
</UnitConversions>
I have a lot of different objects, not just this one type, so in my constructors, rather than initializing every property explicitly, I just copy the attributes over from the XML node:
var UnitConverter = function(inUnitConversionNode) {
    var that = this;
    $.each(inUnitConversionNode.attributes, function(i, attribute) {
        that[attribute.name] = attribute.value;
    });
};
I had trouble early on when I checked for numeric values, as in if(someValueFromTheXML === 1) -- this would always evaluate to false because the value from the XML was a string, "1". So I added nodes in key places in the XML to tell my client-side code what to interpret as numeric and what to leave as text:
<UnitConversions>
<NumericFields>
<NumericField fieldName="conversionFactor"/>
</NumericFields>
<UnitConversion basicUnitName="mile" conversionFactor="5280" conversionUnit="foot"/>
<UnitConversion basicUnitName="mile" conversionFactor="1760" conversionUnit="yard"/>
</UnitConversions>
So now I pass the NumericFields node into the constructor so it will know which fields to store as actual numbers.
This all works great, but it seems like a bit of a naive solution, maybe even a hack. Seems like there would be something more sophisticated out there. It seems like this issue is related to XML schemas, but my googling seems to suggest that schemas are more about validation, rather than typing, and they seem to be geared toward server-side processing anyway.
What's the standard/correct way for js to know which fields in the XML are numeric?
You can use isNaN() to detect whether a string is a number; isNaN("5043") returns false, indicating that "5043" can be parsed as a number. Then just use parseInt() to compare the value. For example:
if (parseInt(someValueFromTheXML, 10) === 1) {
...
}
Another way is to use loose comparison with the == operator so that "1" == 1 evaluates to true. However, it would be better practice to use the first suggestion instead. There is really no other way to go about this since XML/HTML attributes are always strings.
I'm looking to see if anyone can shed some light on a problem I'm having.
In my collection Y, I have a field called ADJU, which stores a serialised PHP array of MongoIds.
One example field is
"a:1:{i:0;a:1:{s:4:\"MBID\";C:7:\"MongoId\":24:{4f2c5b9bb9a21d5010000005}}}"
The parameter I'm passing in is
"4f2c5b9bb9a21d5010000005"
public function read_adjudicating(MongoID $account_identifier){
    $regexObj = new MongoRegex("/".$account_identifier->__toString()."/");
    var_dump($regexObj);
    $result = $this->connection->X->Y->find(array('ADJU' => $regexObj), array('__id'));
    var_dump($result);
}
Can anyone work out why it is giving me 0 records when, as you can see, one example definitely has it?
Thanks for your help!
Well, it's not the query:
db.illogical.insert({'ADJU': "a:1:{i:0;a:1:{s:4:\"MBID\";C:7:\"MongoId\":24:{4f2c5b9bb9a21d5010000005}}}"})
db.illogical.find({'ADJU': /4f2c5b9bb9a21d5010000005/})
{ "_id" : ObjectId("4f605b9e5d2b96c06d2adb27"), "ADJU" : "a:1:{i:0;a:1:{s:4:\"MBID\";C:7:\"MongoId\":24:{4f2c5b9bb9a21d5010000005}}}" }
Which means the PHP code you've written doesn't correspond to the query you expect, or the data isn't in the format you expect.
Rather than investigate why, though, you'd be better off IMO either updating the script you used to import the data from MySQL so that it deserializes before inserting into Mongo, or writing a (PHP) script to read the already-serialized-in-Mongo data, deserialize it, and save it again.
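A rough sketch of that second option, assuming the legacy driver used in the question and that every ADJU value is currently a PHP-serialized string:
$collection = (new MongoClient())->X->Y;
// BSON type 2 = string, i.e. the documents whose ADJU is still serialized
foreach ($collection->find(array('ADJU' => array('$type' => 2))) as $doc) {
    $ids = unserialize($doc['ADJU']); // back to a real array of MongoId objects
    $collection->update(
        array('_id' => $doc['_id']),
        array('$set' => array('ADJU' => $ids))
    );
}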
I am considering using MongoDB to store documents that include a list of key/value pairs. The safe but ugly and bloated way to store this is as
[ {'k1' : 'v1'}, {'k2' : 'v2'}, ... ]
But document elements are inherently ordered within the underlying BSON data structure, so in principle:
{k1 : 'v1',
k2 : 'v2', ...}
should be enough. However I expect most language bindings will interpret these as associative arrays, and thus potentially scramble the ordering. So what I need to know is:
Does MongoDB itself promise to preserve item ordering of the second form?
Do language bindings have some API which can extract it in ordered form -- even if the usual "convenient" API returns an associative array?
I am mostly interested in Javascript and PHP here, but I would also like to know about other languages. Any help is appreciated, or just a link to some documentation where I can go RTM.
From version 2.6 on, MongoDB preserves the order of fields where possible. However, the _id field always comes first, and renaming fields can lead to re-ordering. That said, I'd generally try not to rely on details like this. As the original question mentions, there are also additional layers to consider, each of which must provide some sort of guarantee for the stability of the order...
Original Answer:
No, MongoDB does not make guarantees about the ordering of fields:
"There is no guarantee that the field order will be consistent, or the same, after an update."
In particular, in-place updates that change the document size will usually change the ordering of fields. For example, if you $set a field whose old value was of type number and the new value is NumberLong, fields usually get re-ordered.
However, arrays preserve ordering correctly:
[ {'key1' : 'value1'}, {'key2' : 'value2'}, ... ]
I don't see why this is "ugly" and "bloated" at all. Storing a list of complex objects couldn't be easier. However, abusing objects as lists is definitely ugly: Objects have associative array semantics (i.e. there can only be one field of a given name), while lists/arrays don't:
// not ok:
db.foo2.insert({"foo" : "bar", "foo" : "lala" });
db.foo2.find();
{ "_id" : ObjectId("4ef09cd9b37bc3cdb0e7fb26"), "foo" : "lala" }
// a list can do that
db.foo2.insert({ 'array' : [ {'foo' : 'bar'}, { 'foo' : 'lala' } ]});
db.foo2.find();
{ "_id" : ObjectId("4ef09e01b37bc3cdb0e7fb27"), "array" :
[ { "foo" : "bar" }, { "foo" : "lala" } ] }
Keep in mind that MongoDB is an object database, not a key/value store.
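Since the question asks about PHP specifically: with the legacy PHP driver, the list-of-pairs form above is just a numerically indexed array of single-field associative arrays. The collection and field names here are made up:
$collection = (new MongoClient())->mydb->foo;
$collection->insert(array(
    'pairs' => array(
        array('key1' => 'value1'),  // becomes the first element of a BSON array
        array('key2' => 'value2'),
    ),
));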
As of Mongo 2.6.1, it DOES keep the order of your fields:
MongoDB preserves the order of the document fields following write operations except for the following cases:
The _id field is always the first field in the document.
Updates that include renaming of field names may result in the reordering of fields in the document.
http://docs.mongodb.org/manual/release-notes/2.6/#insert-and-update-improvements
One of the pain points of this is comparing documents to one another in the shell.
I've created a project that creates a custom mongorc.js which sorts the document keys by default for you when they are printed out so at least you can see what is going on clearly in the shell. It's called Mongo Hacker if you want to give it a whirl.
Though it's true that, as of Mongo 2.6.1, it does preserve order, one should still be careful with update operations.
mattwad makes the point that updates can reorder things, but there's at least one other concern I can think of.
For example $addToSet:
https://docs.mongodb.com/manual/reference/operator/update/addToSet/
$addToSet when used on embedded documents in an array is discussed / exemplified here:
https://stackoverflow.com/a/21578556/3643190
In the post, mnemosyn explains how $addToSet disregards the order when matching elements in its deep value by value comparison.
($addToSet only adds records when they're unique)
This is relevant if one decided to structure data like this:
[{key1: v1, key2: v2}, {key1: v3, key2: v4}]
With an update like this (notice the different order on the embedded doc):
db.collection.update({_id: "id"}, {$addToSet: {field: {key2: v2, key1: v1}}});
Mongo will see this as a duplicate and will NOT add this object to the array.