I am (like most people) coming from a MySQL background and trying to switch over to NoSQL and MongoDB. Since denormalization is part of NoSQL, because joins are impossible, here's how I would design a simple blog:
array(
    'blog_title' => 'my blogpost',
    'date' => '2010-09-05',
    'comments' => array(
        '1' => 'Such a good post!!! You deserve a Nobel prize'
    )
);
If I want to update the comments by adding a new element to that array, how can I make sure the comment is appended rather than the whole comments array being overwritten when multiple users write comments at the same time?
Is it the $push function I am looking for in MongoDB?
Correct, the $push operator allows you to update an existing array. You can use the $pushAll operator to add multiple values in a single query.
To add a comment to your example document, the query would be:
db.posts.update({blog_title: "my blogpost"}, {$push: {comments: "New comment"}})
These operators are atomic, so you won't run into any problems if multiple users add comments simultaneously.
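For completeness, here is the same update through the PHP Mongo driver; this is a minimal sketch assuming $posts is a MongoCollection handle for the posts collection:
// Same atomic $push via the PHP Mongo driver; assumes $posts is a
// MongoCollection handle for the posts collection.
$posts->update(
    array('blog_title' => 'my blogpost'),
    array('$push' => array('comments' => 'New comment'))
);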
Let's say that I have an array like the one posted below and that I need to store it in my MySQL database:
Array(
"Weight" => "10",
"Height" => "17",
"Usage" => "35"
);
Preamble:
I will never update these values
I will never perform a query based on these values
Long story short, I only need to store and display this array as it is; actually I need to use these values to generate graphs. Now I see two possible options.
Option 1: even if I will never use a WHERE, ORDER BY, HAVING (...) condition on these values, I store each value separately in a dedicated column (weight, height, usage).
Option 2: I create a single column (stats) where I store a serialized version of the array; then, in order to generate my graphs, I unserialize each row before using it.
The question is: what's the best approach to store this array in terms of effectiveness and performance?
In my opinion the second approach is the best, but say there are many rows and elements involved in the process: I can't tell whether it's faster and lighter to unserialize an array of 20 elements for 100 rows in PHP, or to read plain values stored in 20 columns, considering that I need to save a lot of them very frequently and simultaneously.
I will never update these values
I will never perform a query based on these values
The second you finalise your code having stored them as serialised values, you'll be asked to perform a query to update anything with a weight above ten.
Just store them in their own columns - not only will this future-proof the code, but it is easier to work with and will take up less drive space in the long run.
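As a minimal sketch of the dedicated-column approach, assuming a PDO connection in $pdo; the table name stats and the column usage_pct are assumptions for illustration (USAGE is a reserved word in MySQL, so that column is renamed):
// Input array as posted in the question.
$data = array("Weight" => "10", "Height" => "17", "Usage" => "35");

// One dedicated column per value.
$stmt = $pdo->prepare(
    'INSERT INTO stats (weight, height, usage_pct) VALUES (?, ?, ?)'
);
$stmt->execute(array($data['Weight'], $data['Height'], $data['Usage']));

// Reading back for the graphs is then plain column access, with no
// unserialize() step per row:
foreach ($pdo->query('SELECT weight, height, usage_pct FROM stats') as $row) {
    // feed $row['weight'], $row['height'], $row['usage_pct'] into the graphing code
}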
I have two big two-dimensional arrays (pulled from some XML data). One (list A) is ~1,000 items of 5 fields each; the other (list B) varies dynamically between 10,000 and 12,000 items of 5 fields each.
My idea was to compare each ID key of list A against each ID key of list B and, on a match, compose a new array of the combined fields, or of just the fields from array A if there is no match.
I used nested foreach loops and ended up with millions of iterations taking a long time to process. Needless to say, that is not a solution.
The form of these two structures and the result I need reminded me straight away of an SQL join.
The questions are:
1.) Should I try SQL, or is nested foreach simply not the best PHP approach?
2.) Will a relational query be much faster than the iterations?
EDIT:
I pull data only periodically from an XML file (in a separate process) which contains 10+ fields for each node. Then I store the 5 fields I need in a CSV file, to later compare with list A, which I pull out of a MySQL database. It's basically much like a catalog update of attributes from a fresh feed.
I'm afraid the original idea of storing to CSV was an error and I should just save the feed updates into a database too.
EDIT 2
List B looks like this:
Array
(
    [0] => Array
        (
            [code] => HTS541010A9E680
            [name] => HDD Mobile HGST Travelstar 5K100 (2.5", 1TB, 8MB, SATA III-600)
            [price] => 385.21
            [avail] => 0
            [retail] => asbis
        )
    ...
)
while list A is similar in all but the 'code' field, which is the only one useful for comparison:
Array
(
    [0] => Array
        (
            [code] => ASD-HTS541010A
            [name] => HDD Mobile HGST Travelstar 5K100 (2.5", 1TB, 8MB, SATA III-600)
            [price] => 385.21
            [avail] => 0
            [retail] => asbis
        )
)
As you can see, each feed has the universal code BUT with different random data as a prefix or suffix, so in each loop I have to do a couple of string operations (stripos() and the like) to compare it against the other feed's code for a match or close match.
Pseudo code:
$mylist = loadfromDB();
$whslist = loadfromCSV();
foreach ($mylist as $myl) {
    foreach ($whslist as $whl) {
        $code_a = $myl['code'];
        $code_b = $whl['code'];
        if (stripos($code_a, $code_b) !== false || stripos($code_b, $code_a) !== false) {
            // ...
        } elseif (stripos(substr(strstr($code_a, '-'), 1), $code_b) !== false) {
            // compare with the prefix up to the first '-' stripped from code A
            // ...
        } elseif (stripos(substr($code_a, 0, -5), $code_b) !== false) {
            // compare with the last 5 characters stripped from code A
            // ...
        }
    }
}
Using SQL will be faster, because most SQL engines are optimized for joins while your method is brute force. However, inserting all that data into MySQL tables is quite a heavy task, so it's still not the best solution.
I suggest you do the join in PHP, but use a smarter algorithm. Start by sorting the two arrays by the field you want to match on. Then iterate over both sorted arrays together, using two iterators (or pointers, or indices): say a iterates over A and b iterates over B. On each iteration of the loop, compare the match fields of the elements pointed to by a and b. If a's is smaller, advance a. If b's is smaller, advance b. If they are equal, you have a match, which you store in a new list, and then you advance both a and b (assuming the relation is one-to-one; if it's one-to-many you only advance the "many" iterator, and if it's many-to-many you need a somewhat more complex solution). A sketch follows below.
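A minimal sketch of that sort-merge join, assuming both lists have already been normalized to a comparable 'code' field (prefixes/suffixes stripped, as in the pseudocode above) and the relation is one-to-one:
// Sort both lists on the match field.
usort($mylist, function ($x, $y) { return strcmp($x['code'], $y['code']); });
usort($whslist, function ($x, $y) { return strcmp($x['code'], $y['code']); });

$result = array();
$a = 0;
$b = 0;
$na = count($mylist);
$nb = count($whslist);
while ($a < $na && $b < $nb) {
    $cmp = strcmp($mylist[$a]['code'], $whslist[$b]['code']);
    if ($cmp < 0) {
        $result[] = $mylist[$a++];          // no match: keep the A fields as-is
    } elseif ($cmp > 0) {
        $b++;                               // B row without an A partner: skip
    } else {
        $result[] = array_merge($mylist[$a++], $whslist[$b++]); // match: merge fields
    }
}
while ($a < $na) {
    $result[] = $mylist[$a++];              // remaining unmatched A rows
}
This costs two O(n log n) sorts plus a single linear pass, instead of the roughly ten million iterations of the nested loops.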
problem
I have two data tables, SEQUENCES and ORGANISMS, whose many-to-many relationship is mapped in the table SOURCES. There is also a 1:m relationship between SOURCES and ENTRIES. I will append the detailed structure below.
What I want to achieve is a display of all sequences with all associated organisms and entries, where a condition on the sequences table is met. I have some ideas on how to achieve this, but I need the solution with the best performance, as each of these tables contains 50k+ rows.
idea one
Select all organisms that belong to the same sequence as a concatenated string in SQL, and split it in PHP. I have no idea, though, how to do the concatenation in SQL.
idea two
Select the same sequence with different organisms as distinct records, order by organism, and join them later in PHP, though this somehow feels just wrong.
idea three
Use views. ANY idea on this one is appreciated.
structure
SEQUENCES
SEQUENCE_ID
DESCRIPTION
ORGANISMS
ORGANISM_ID
NAME
SOURCES
SOURCE_ID
SEQUENCE_ID FK to SEQUENCES.SEQUENCE_ID
ORGANISM_ID FK to ORGANISMS.ORGANISM_ID
ENTRIES
SOURCE_ID FK to SOURCES.SOURCE_ID
ENTRY_VALUE
desired outcome
array(
    array(
        "SEQUENCE_ID" => 4,
        "DESCRIPTION" => "Some sequence",
        "SOURCES" => array(
            array(
                "ORGANISM_ID" => 562,
                "ORGANISM_NAME" => "Escherichia coli",
                "ENTRIES" => array(
                    "some entry",
                    "some other entry"
                )
            ),
            array(
                "ORGANISM_ID" => 402764,
                "ORGANISM_NAME" => "Aranicola sp. EP18",
                "ENTRIES" => array()
            )
        )
    ),
    array(
        "SEQUENCE_ID" => 5,
        .....
    )
)
PHP 5 and Firebird 2.5.1
You can't fetch a nested array like that directly from a flat table structure. But if I get you right, what you want to do is not that hard to achieve.
I don't understand why you would concatenate things and then split them again, that's hard to maintain and probably slow.
I see two approaches here:
Fetch everything at once as flat table using JOIN and loop through it in PHP. This approach creates a lot of duplication but it's fast because you can fetch all data in one query and then process it with PHP.
Fetch every entity separately, looping and fetching the next hierarchy level as you go. This approach will be slower. It takes complexity away from the SQL query and doesn't fetch redundant data. It also gives you more freedom in how you loop through your data and what you do with it.
Alternatively, you might want to actually store hierarchical data in a NoSQL way, where you could store the array structure you mentioned directly.
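A minimal sketch of the first approach, assuming a PDO connection in $pdo (PDO has a Firebird driver) and the structure above; the WHERE condition is a placeholder:
$sql = "
    SELECT s.SEQUENCE_ID, s.DESCRIPTION,
           o.ORGANISM_ID, o.NAME AS ORGANISM_NAME,
           e.ENTRY_VALUE
    FROM SEQUENCES s
    JOIN SOURCES src ON src.SEQUENCE_ID = s.SEQUENCE_ID
    JOIN ORGANISMS o ON o.ORGANISM_ID = src.ORGANISM_ID
    LEFT JOIN ENTRIES e ON e.SOURCE_ID = src.SOURCE_ID
    WHERE s.DESCRIPTION LIKE ?";
$stmt = $pdo->prepare($sql);
$stmt->execute(array('%some condition%'));

// Group the flat rows into the nested structure from the question.
$sequences = array();
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $sid = $row['SEQUENCE_ID'];
    if (!isset($sequences[$sid])) {
        $sequences[$sid] = array(
            'SEQUENCE_ID' => $sid,
            'DESCRIPTION' => $row['DESCRIPTION'],
            'SOURCES'     => array(),
        );
    }
    $oid = $row['ORGANISM_ID'];
    if (!isset($sequences[$sid]['SOURCES'][$oid])) {
        $sequences[$sid]['SOURCES'][$oid] = array(
            'ORGANISM_ID'   => $oid,
            'ORGANISM_NAME' => $row['ORGANISM_NAME'],
            'ENTRIES'       => array(),
        );
    }
    if ($row['ENTRY_VALUE'] !== null) {
        $sequences[$sid]['SOURCES'][$oid]['ENTRIES'][] = $row['ENTRY_VALUE'];
    }
}

// Drop the keys that were only used for grouping.
$sequences = array_values($sequences);
foreach ($sequences as &$seq) {
    $seq['SOURCES'] = array_values($seq['SOURCES']);
}
unset($seq);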
My question is quite simple but I can't manage to find an answer.
When I execute a query like:
$query->select('t2.name as t2_name, t1.name as t1_name')
->from('table1 t1')
->leftJoin('t1.table2 t2')
->execute(array(), Doctrine_Core::HYDRATE_ARRAY);
Doctrine returns me an array like:
array(
[0] => array(
't1_name' => 'foo',
't2_name' => 'bar'
)
)
where I expected the field t2_name to be set in the array before t1_name.
Is there any way to keep the order of the selected fields in Doctrine?
Doctrine will automatically include the primary (root) table's key field and automatically make it the first column in any query, in almost all hydration types.
Since table1 is the root table in your query, it moves that to the beginning for its own internal processing benefits.
I find this behavior annoying and somewhat unpredictable at times also, but have found great relief by creating custom hydrators.
There's a good example of creating a key/value hydrator which I have used beneficially many times in our code.
You could do something similar to rearrange the fields in the order you want.
Also I have posted an explanation to a very similar question here which may be beneficial.
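As a rough sketch only (the class skeleton and registration follow the custom-hydrator example in the Doctrine 1.2 documentation; the hydrator name and the pass-through column handling are assumptions), a custom hydrator looks like this:
// Hypothetical hydrator that returns raw rows in the statement's
// column order. Note the column aliases at this level are Doctrine's
// internal ones, so you may still need to map them back to your own.
class ColumnOrderHydrator extends Doctrine_Hydrator_Abstract
{
    public function hydrateResultSet($stmt)
    {
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}

Doctrine_Manager::getInstance()
    ->registerHydrator('column_order', 'ColumnOrderHydrator');

$result = $query->execute(array(), 'column_order');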
First, let me say that I'm new to MongoDB and document-oriented DBs in general.
After some trouble with embedded documents in MongoDB (being unable to select only a nested document, for example a single comment in a blog post), I redesigned the DB. Now I have two collections, posts and comments (not the real deal; I'm using the blog example for convenience's sake).
Example - posts collection document:
Array {
    '_id' : MongoId,
    'title' : 'Something',
    'body' : 'Something awesome'
}
Example - comments document:
Array {
    '_id' : MongoId,
    'postId' : MongoId,
    'userId' : MongoId,
    'commentId' : 33,
    'comment' : 'Punch the punch line!'
}
As you can see, I have multiple comment documents (as I said before, I want to be able to select a single comment, and not an array of them).
My plan is this: I want to select a single comment from the collection using postId and commentId (commentId is unique only among comments with the same postId).
Oh, and commentId needs to be an int, so that I can use that value for calculating the next and previous documents, as a sort of "orderWith" number.
Now I can get a comment like this:
URI: mongo.php?post=4de526b67cdfa94f0f000000&comment=4
Code: $comment = $collection->findOne(array("postId" => $theObjId, "commentId" => (int)$commentId));
I have a few questions.
Am I doing it right?
What is the best way to generate that kind of commentId?
What is the best way to ensure that commentId is unique among comments with the same postId (upsert?)?
How to deal with concurrent queries?
Am I doing it right?
This is a really difficult question. Does it work? Does it meet your performance needs? Are you comfortable maintaining it?
MongoDB doesn't have any notion of "normalization" or the "the one true way". You model your data in a way that works for you.
What is the best way to generate that kind of commentId?
What is the best way to ensure that commentId is unique among comments with the same postId (upsert?)?
This is really a complex problem. If you want to generate monotonically increasing integer IDs (like auto-increment), then you need a central authority for generating these integers, and that doesn't tend to scale very well.
The commonly suggested method is to use the ObjectId/MongoId. That will give you a unique ID.
However, you really want an integer. So take a look at findAndModify. You can keep a "last_comment_id" on your post and then update it when creating a new comment.
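A minimal sketch with the PHP driver (requires a driver version that supports findAndModify; the field last_comment_id and the surrounding variables are assumptions):
// Atomically increment a per-post counter and use the result as the
// new commentId; 'new' => true returns the post-update document.
$post = $posts->findAndModify(
    array('_id' => $postId),
    array('$inc' => array('last_comment_id' => 1)),
    null,
    array('new' => true)
);

$comments->insert(array(
    'postId'    => $postId,
    'userId'    => $userId,
    'commentId' => $post['last_comment_id'],
    'comment'   => $commentText,
));

// A unique compound index is a cheap safety net against duplicates
// of the same (postId, commentId) pair:
$comments->ensureIndex(
    array('postId' => 1, 'commentId' => 1),
    array('unique' => true)
);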
How to deal with concurrent queries?
Why would concurrent queries be a problem? Two readers should be able to access the same data.
Are you worried about concurrent comments being created? Then see the findAndModify docs.
I don't know if The Big Picture will allow you to do this, but here is how I'd do it.
I'd have an array of comments contained inside each post. This means no joins are needed, and in your case normalization of comments doesn't give any benefit. I'd replace commentId with a createdAt timestamp recording the time of creation.
This will let you have an easy data model to work with, as well as the ability to sort it.
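A minimal sketch of that model, again assuming a MongoCollection handle $posts:
// Embed the comment in the post document; createdAt replaces the
// numeric commentId and gives you a natural sort key.
$posts->update(
    array('_id' => $postId),
    array('$push' => array('comments' => array(
        'userId'    => $userId,
        'comment'   => 'Punch the punch line!',
        'createdAt' => new MongoDate(),
    )))
);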