I have a database that contains 4 relationship tables used to construct the content of a page:
content, rel, theme, and theme_meta
The rel table matches the contentID from the content table to the corresponding rel field of the theme table. theme_meta has a field called themeID that links it to the theme table.
So, when constructing a page at the moment, I JOIN the content table to the rel table, join that to the theme table, and that to the theme_meta table.
It gives me around 24 rows for each matched row of the content table.
I then use some PHP foreach loops to restructure the results into multidimensional arrays, one per content row.
Is that efficient? Would it be faster and more efficient to make 2 calls to the database, one for content and one for theme? This would produce far fewer rows and be easier to work with, but would require a second call to the database.
As mentioned above, an approach that uses a single query is usually the best way (since database queries incur a lot of overhead).
Indeed, it sounds as though your alternative approach would loop over the results of one query (on the content table) each time calling some other query (on the other tables) to fetch the joined data: such an approach will prove very costly in the long-term and will not scale well.
Therefore, to assemble a multi-dimensional array from the data, you merely need to sort the joined results accordingly and keep track of the last-seen identifier as you loop over the resultset (in order to detect when you need to move up a level within the resulting array):
$qry = $dbh->query('
    SELECT *
    FROM   content
    JOIN   rel        USING (contentID)
    JOIN   theme      USING (rel)
    JOIN   theme_meta USING (themeID)
    ORDER BY contentID
');

$arr = array();
$row = $qry->fetch();

while ($row) {
    $cid   = $row['contentID'];
    $group = array();   // collect every joined row for this contentID
    do {
        $group[] = $row;
    } while ($row = $qry->fetch() and $row['contentID'] == $cid);
    $arr[] = $group;    // one sub-array per content row
}

var_export($arr);
I would, however, caution that it is often unnecessarily costly to build such a PHP data structure from the results of a database query, as one might be able to build and dispatch the requisite output whilst reading the resultset.
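For example, a minimal sketch of that streaming approach, reusing the query above (the markup and the choice of themeID as the displayed column are illustrative assumptions, not part of the original schema):

$lastCid = null;
while ($row = $qry->fetch()) {
    if ($row['contentID'] !== $lastCid) {
        // a new content row has begun: open a new section
        echo '<h2>Content #' . htmlspecialchars($row['contentID']) . '</h2>';
        $lastCid = $row['contentID'];
    }
    // dispatch each joined row as it is read, with no intermediate array
    echo '<p>Theme ' . htmlspecialchars($row['themeID']) . '</p>';
}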
Related
In a site I maintain, I have a need to query the same table (articles) twice, once for each category of article. AFAICT there are basically two ways of doing this (maybe someone can suggest a better, third way?):
Perform the db query twice, meaning the db server has to sort through the entire table twice. After each query, I iterate over the cursor to generate HTML for a list entry on the page.
Perform the query just once and pull out all the records, then sort them into two separate arrays. After this, I have to iterate over each array separately in order to generate the HTML.
So it's this:
$newsQuery = $mysqli->query("SELECT * FROM articles WHERE type='news'");
while ($newRow = $newsQuery->fetch_assoc()) {
    // generate article summary in HTML
}
// repeat for informational articles
vs this:
$query = $mysqli->query("SELECT * FROM articles");
$news = array();
$info = array();
while ($row = $query->fetch_assoc()) {
    if ($row['type'] == "news") {
        $news[] = $row;
    } else {
        $info[] = $row;
    }
}
// iterate over each array separately to generate article summaries
The recordset is not very large, currently <200, and will probably grow to 1000-2000. Is there a significant difference in the times between the two approaches, and if so, which one is faster?
(I know this whole thing seems awfully inefficient, but it's a poorly coded site I inherited and have to take care of without a budget for refactoring the whole thing...)
I'm writing in PHP, no framework :( , on a MySql db.
Edit
I just realized I left out one major detail. On a given page in the site, we will display (and thus retrieve from the db) no more than 30 records at once - but here's the catch: 15 info articles, and 15 news articles. On each page we pull the next 15 of each kind.
You know you can sort in the DB right?
SELECT * FROM articles ORDER BY type
EDIT
Due to the change made to the question, I'm updating my answer to address the newly revealed requirement: 15 rows for 'news' and 15 rows for not-'news'.
The gist of the question is the same "which is faster... one query or two separate queries". The gist of the answer remains the same: each database roundtrip incurs overhead (extra time, especially over a network connection to a separate database server), so with all else being equal, reducing the number of database roundtrips can improve performance.
The new requirement really doesn't impact that. What the newly revealed requirement really impacts is the actual query to return the specified resultset.
For example:
( SELECT n.*
    FROM articles n
   WHERE n.type = 'news'
   LIMIT 15
)
UNION ALL
( SELECT o.*
    FROM articles o
   WHERE NOT (o.type <=> 'news')
   LIMIT 15
)
Running that statement as a single query is going to require fewer database resources and be faster than running two separate statements and retrieving two disparate resultsets.
We weren't provided any indication of what the other values for type can be, so the statement offered here simply addresses two general categories of rows: rows that have type='news', and all other rows that have some other value for type.
That query assumes that type allows for NULL values, and we want to return rows that have a NULL for type. If that's not the case, we can adjust the predicate to be just
WHERE o.type <> 'news'
Or, if there are specific values for type we're interested in, we can specify that in the predicate instead
WHERE o.type IN ('alert','info','weather')
If "paging" is a requirement... "next 15"... then the typical pattern we see applied, LIMIT 30,15, can be inefficient. But this question isn't asking about improving the efficiency of "paging" queries; it's asking whether running a single statement or two separate statements is faster.
And the answer to that question is still the same.
ORIGINAL ANSWER below
There's overhead for every database roundtrip. In terms of database performance, for small sets (like you describe) you're better off with a single database query.
The downside is that you're fetching all of those rows and materializing an array. (But, that looks like that's the approach you're using in either case.)
Given the choice between the two options you've shown, go with the single query. That's going to be faster.
As far as a different approach, it really depends on what you are doing with those arrays.
You could actually have the database return the rows in a specified sequence, using an ORDER BY clause.
To get all of the 'news' rows first, followed by everything that isn't 'news', you could
ORDER BY type<=>'news' DESC
That's MySQL shorthand for the more ANSI-standards-compliant:
ORDER BY CASE WHEN type = 'news' THEN 1 ELSE 0 END DESC
Rather than fetch every single row and store it in an array, you could just fetch from the cursor as you output each row, e.g.
while ($row = $query->fetch_assoc()) {
    echo "<br>Title: " . htmlspecialchars($row['title']);
    echo "<br>byline: " . htmlspecialchars($row['byline']);
    echo "<hr>";
}
The best way of dealing with a situation like this is to test it for yourself. It doesn't matter how many records you have at the moment; you can simulate whatever amount you'd like, that's never a problem. Also, 1000-2000 is really a small set of data.
I don't quite understand why you'd have to iterate over all the records twice. You should never retrieve all the records in a query either way, only the small subset you need to be working with. On a typical site where you manage articles, it's usually about 10 records per page MAX. No user will ever page through 2000 articles in a way that would force you to pull all the records at once. Utilize paging and smart querying.
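As a sketch of what that paging could look like, using the 15-per-kind page size mentioned in the question (the ORDER BY column is an assumption):

SELECT * FROM articles
 WHERE type = 'news'
 ORDER BY id
 LIMIT 15 OFFSET 30;  -- third page of news articles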
// iterate over each array separately to generate article summaries
Not really sure what you mean by this, but something tells me this data should be stored in the database as well. I really hope you're not generating article excerpts on the fly for every page hit.
It all sounds to me more like a bad architecture design than anything else...
PS: I believe sorting/ordering/filtering of database data should be done on the database server, not in the application itself. You may save some traffic by doing a single query, but it won't help much if you transfer too much data at once that you won't be using anyway.
I have a database design here that looks like this in simplified version:
Table building:
id
attribute1
attribute2
Data in there is like:
(1, 1, 1)
(2, 1, 2)
(3, 5, 4)
And the tables attribute1_values and attribute2_values are structured as:
id
value
Which contains information like:
(1, "Textual description of option 1")
(2, "Textual description of option 2")
...
(6, "Textual description of option 6")
I am unsure whether this is the best setup or not, but it was done this way per the requirements of my project manager. It definitely has some truth in it, as you can now modify the text easily without messing up the IDs.
However, now I have come to a page where I need to list the attributes, so how do I go about it? I see two major options:
1) Make one big query which gathers all values from building and at the same time picks the correct textual representation from each attribute{x}_values table.
2) Make a small query that gathers all values from the building table, and after that get the textual representation of each attribute one at a time.
What is the best option to pick? Is option 1 even faster than option 2 at all? If so, is it worth the extra trouble in terms of maintenance?
Another suggestion would be to create a view on the server with only the data you need and query from that. That would keep the work on the server end, and you can pull just what you need each time.
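A minimal sketch of such a view, assuming the table and column names from the question (the join conditions are inferred from the sample data):

CREATE VIEW building_attributes AS
SELECT b.id,
       a1.value AS attribute1_text,
       a2.value AS attribute2_text
  FROM building b
  JOIN attribute1_values a1 ON a1.id = b.attribute1
  JOIN attribute2_values a2 ON a2.id = b.attribute2;

-- the page then needs only:
SELECT * FROM building_attributes;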
If you have a small number of rows in the attribute tables, then I suggest you fetch them first (all of them!) and store them in an array, using the id as the index key.
Then you can proceed with the building data; now you just have to use the respective array to look up each attribute value.
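A rough sketch of that lookup-array idea, assuming a mysqli connection in $mysqli:

// cache each attribute table once, keyed by id
$attr1 = array();
$res = $mysqli->query("SELECT id, value FROM attribute1_values");
while ($row = $res->fetch_assoc()) {
    $attr1[$row['id']] = $row['value'];
}
// ...repeat for attribute2_values...

// then resolve each building's attributes with plain array lookups
$buildings = $mysqli->query("SELECT * FROM building");
while ($b = $buildings->fetch_assoc()) {
    echo $attr1[$b['attribute1']];
}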
I would recommend something in between. Parse the result from the first table in PHP, and figure out which attribute values you need to select from each attribute[x]_values table.
You can then select those attributes in bulk, using one query per table rather than one query per attribute or one query per building.
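For instance, a sketch of that bulk lookup, assuming the building rows are already in $buildingRows and a mysqli connection in $mysqli:

// collect the distinct attribute1 ids the buildings actually reference
$ids = array_unique(array_column($buildingRows, 'attribute1'));
$in  = implode(',', array_map('intval', $ids));

// one query per attribute table fetches every value we need
$attr1 = array();
$res = $mysqli->query("SELECT id, value FROM attribute1_values WHERE id IN ($in)");
while ($row = $res->fetch_assoc()) {
    $attr1[$row['id']] = $row['value'];
}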
Here is a PHP solution:
$query = "SELECT * FROM building";
$result = mysqli_query($connection, $query);

$query = "SELECT * FROM attribute1_values";
$result2 = mysqli_query($connection, $query);

$query = "SELECT * FROM attribute2_values";
$result3 = mysqli_query($connection, $query);

$n = mysqli_num_rows($result);
for ($i = 0; $i < $n; $i++) {
    $row = mysqli_fetch_array($result);

    mysqli_data_seek($result2, $row['attribute1'] - 1);
    $row2 = mysqli_fetch_array($result2);
    $attribute1 = $row2['value']; // use this as the value for attribute1 of this building

    mysqli_data_seek($result3, $row['attribute2'] - 1);
    $row3 = mysqli_fetch_array($result3);
    $attribute2 = $row3['value']; // use this as the value for attribute2 of this building
}
Keep in mind that this solution requires that the ids in attribute1_values and attribute2_values start at 1 and increase by exactly 1 on every row.
Oracle / Postgres / MySql DBA here:
Running a query many times has quite a bit of overhead. There are multiple round trips to the db, and if it's on a remote server, this can add up. The DB will likely have to parse the same query multiple times, which in MySQL will be terribly inefficient if there are tons of rows (and hence tons of queries). Now, one thing your PHP method (multiple queries) has as an advantage is that it'll use less memory, as it releases results once they're no longer needed (if you run the query as a nested loop, that is; if you query all the results up front, you'll have a lot of memory overhead, depending on the table sizes).
The optimal result would be to run it as one query, fetching the results one at a time, displaying each as needed and then discarding it. This can wreak havoc with MVC frameworks unless you're comfortable either running model code in your view or rendering small view fragments.
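A sketch of that fetch-and-discard pattern with mysqli, where MYSQLI_USE_RESULT tells the driver not to buffer the whole resultset in PHP memory (the query itself is illustrative):

$result = $mysqli->query("SELECT * FROM articles", MYSQLI_USE_RESULT);
while ($row = $result->fetch_assoc()) {
    // render this row immediately; nothing accumulates in PHP
}
$result->close(); // free the unbuffered result before issuing another query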
Your question is very generic, and I think that to get an answer you should give more hints about what this page will look like and how big the dataset is.
Will you get all the buildings with their attributes, or just one at a time?
Your data structure looks very simple, and anything more powerful than a Raspberry Pi can handle it very well.
If you need one record at a time, you don't need any special technique; just JOIN the tables.
If you need to list all the buildings and you want to save database time, you have to measure your data.
If you have more attributes than buildings you have to choose one way; if you have 8 attributes and 2000 buildings, you can think of caching the attributes in an array, with one SELECT per table, and then just print them using the array. I don't think you will see any speed drop or improvement with such simple tables on a modern computer.
$att1[1] = 'description1';
$att1[2] = 'description2';
....
Never do one-at-a-time queries; try to combine them into a single one.
MySQL will cache your query and it will run much faster. PHP loops are faster than making many requests to the database.
The query cache stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again.
http://dev.mysql.com/doc/refman/5.1/en/query-cache.html
I feel a little embarrassed as there is probably an easy solution, but I don't know enough MySQL to do it. How do I use one query to get data from each of these tables, and then return an array as illustrated below? Every attempt I make ends up returning either one tag, or returning multiple arrays of the same task, each with a different tag.
What should my query structure look like?
Thanks!
http://i.stack.imgur.com/ViqEs.png
The image's array actually shows how the data would look after two queries, not one. To do it in a single query (which is possible because the data is not too complex), you could use GROUP_CONCAT() to get all of the tags for a task, and then use post-query logic to split the data into separate arrays.
The SQL query to get all of the requested data would be:
SELECT tasks.*, GROUP_CONCAT(tag_name) AS tags
FROM tasks
LEFT JOIN tags ON tags.task_id = tasks.id
WHERE tasks.id = 2
This query will return a single record; in that record, the column tags will hold a comma-separated list of all of the tags that belong to the task. You can split the data in that column into an array to build your desired structure.
An example, with PHP:
$result = mysql_query("SELECT tasks.*, GROUP_CONCAT(tag_name) AS tags FROM tasks LEFT JOIN tags ON tags.task_id=tasks.id WHERE tasks.id=2");
// create the "$task" array that has a "task" and "tags" index
$task = array('task' => array(), 'tags' => array());
$task['task'] = mysql_fetch_assoc($result);
// split the comma-separated list of tags into an array
$task['tags'] = explode(',', $task['task']['tags']);
// delete the original "tags" entry that's returned by the sql query
unset($task['task']['tags']);
Please note that this example is void of any data validation, connection information, or other logic, and should just be used as a rough idea of how you could split the data into your desired structure.
I'm making an app where multiple users can post comments above or below other comments. This is not a thread-type structure; it's more like collaborating on a Word document. I'm having trouble designing the method by which these entries are sorted.
Using MySQL and PHP, sorting by time of entry doesn't work, and neither does sorting by comment position, because the position changes if a user posts in between other comments.
I don't want to have to re-serialize comment positions for every new entry (what if there are thousands of entries and dozens of users doing the same thing?).
What is the best way to design this?
What you are describing is a linked list. The problem is that they are usually hard to retrieve using just SQL. My solution is to use PHP to do the sorting upon retrieval.
Your table would look something like this:
CREATE TABLE page (
    page_id INT,
    first_comment_id INT
);
CREATE TABLE comment (
    comment_id INT AUTO_INCREMENT PRIMARY KEY,
    page_id INT,
    next_comment_id INT
);
Your query is simple:
SELECT comment_id, next_comment_id
FROM comment
WHERE page_id = $page_id
ORDER BY comment_id DESC
The important step is to massage the results from mysql_fetch_assoc() into an array that is indexed according to comment_id:
$result = mysql_query($sql);
$indexed_list = array();
while ($row = mysql_fetch_assoc($result))
{
$indexed_list[$row['comment_id']] = $row;
}
Resulting in an array similar to this one:
$indexed_list = array(
1 => array("comment_id"=>1, "next_comment_id"=>2),
2 => array("comment_id"=>2, "next_comment_id"=>5),
3 => array("comment_id"=>3, "next_comment_id"=>4),
4 => array("comment_id"=>4, "next_comment_id"=>0),
5 => array("comment_id"=>5, "next_comment_id"=>3));
The PHP function to sort them into displayable order is simple:
function llsort($indexed_list, $first_comment_id)
{
$sorted_list = array();
$node = $indexed_list[$first_comment_id];
array_push($sorted_list, $node);
do
{
$node = $indexed_list[$node['next_comment_id']];
array_push($sorted_list, $node);
} while ($node['next_comment_id'] != 0
AND isset($indexed_list[$node['next_comment_id']]) );
return $sorted_list;
}
You get first_comment_id from the page table. Of course, you still have to implement functions to insert a node and delete a node, but those are left as exercises for the reader. Don't forget to use transactions for inserting and deleting nodes.
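As a hedged illustration of one such exercise, here is a sketch of inserting a new comment after an existing one (:prev_id is a placeholder for the predecessor's comment_id):

START TRANSACTION;
-- the new comment inherits the predecessor's successor
INSERT INTO comment (page_id, next_comment_id)
SELECT page_id, next_comment_id FROM comment WHERE comment_id = :prev_id;
-- re-point the predecessor at the freshly inserted comment
UPDATE comment SET next_comment_id = LAST_INSERT_ID() WHERE comment_id = :prev_id;
COMMIT;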
More information on linked lists in MySQL:
Fetching linked list in MySQL database
Creating a linked list or similar queue in MySQL?
Managing Hierarchical Data in MySQL
Trees and Other Hierarchies in MySQL
This sounds like a good time to use MPTT, Modified Preorder Tree Traversal. It's often used for threaded comment boards and things of that nature. Of all the ways to keep hierarchical structures in an RDBMS, it has low overhead when pruning or adding nodes to the tree.
Here is a good intro, and another. Googling around for it should get you some more info. It's not hard to implement at all once you understand the concept.
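A minimal nested-set sketch of the concept (table and column names are hypothetical): each node stores the left/right bounds of its subtree, so an entire subtree comes back in a single query.

CREATE TABLE comment_tree (
    comment_id INT AUTO_INCREMENT PRIMARY KEY,
    lft INT NOT NULL,
    rgt INT NOT NULL
);

-- everything under a given node, in display order:
SELECT * FROM comment_tree
 WHERE lft BETWEEN :node_lft AND :node_rgt
 ORDER BY lft;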
I would definitely go with ordering by position. When inserting, it's just a question of incrementing all entries below it: a single UPDATE query. One important feature of that implementation is that it handles concurrency very well; if there are two concurrent inserts, you don't care in which order the incrementing is done (but you do need a non-positional primary key so there's no upset when an insert happens above you).
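A sketch of that single-update insert (table and column names assumed; :pos and :body are placeholders):

-- make room at the insertion point
UPDATE comments SET position = position + 1 WHERE position >= :pos;
-- then place the new comment there
INSERT INTO comments (position, body) VALUES (:pos, :body);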
An alternative is to model it as a tree, which means you only need to update entries below you in the branch. But it's going to be a rare situation where the maintenance overhead is justifiable. (A compromise is to model it as a weeping willow: you divide the total into chunks which form branches, but you don't allow branches from branches. That avoids ever having to update every single record; however, I'm still guessing it's not worth the overhead in comparison to the first approach.)
There seems to be no shortage of hierarchical data questions in MySQL on SO, however it seems they are mostly talking about managing such data in the database or actually retrieving recursively hierarchical data. My situation is neither. I have a grid of items I need to display. Each item can also have 0 or more comments associated with it. Right now, both the item, along with its data, are displayed in the grid as well as any comments belonging to that item. Usually there is some sort of drill down, dialog, or other user action required to see child data for a grid item but in this case we display both parent and child data in the same grid. Might not fit the de facto standards but it is what it is.
Right now the comments are retrieved by a separate MySQL query for every single parent item in the grid. I immediately cringe at this, being aware of all the completely separate database queries that have to be run for a single page load. I haven't profiled, but I wouldn't be too surprised if this is part of the slow page loads we sometimes see. I'd like ideally to bring this down to a single query, or perhaps two. However, I'm having difficulty coming up with a solution that sounds any better than what is currently being done.
My first thought was to flatten the comment children for each row with some sort of separator like '|' and then explode them back apart in PHP when rendering the page. The issue with this is it gets increasingly complicated with having to separate each field in a comment, and then each comment, and then account for the possibility of separator characters in the data. Just feels like a mess to maintain and debug.
My next thought was to LEFT OUTER JOIN the comments to the items and just account for the item duplicates in PHP. I'm working with CodeIgniter's database library, which returns database data as a PHP array. This sounds like potentially a lot of duplicated data in the resulting array, which could tax the system for larger result sets, though I'm thinking in most cases it wouldn't be too bad, so this option is currently at the top of my list.
Ideally, if I understand MVC correctly, I should keep my database, business logic, and view/display as separate as possible. So, again ideally, there should not be any database "quirks" (for lack of a better word) apparent in the data returned by the model; whatever calls for data from this model method shouldn't be concerned with duplicate data like this. So I'd have to add an additional loop to eliminate the duplicate item array entries, but only after I have retrieved all the child comments and placed them into their own array.
Two queries is another idea but then I have to pass numerous item IDs in the SQL statement for the comments and then go through and zip all the data together manually in PHP.
My goal isn't to get out of doing work here but I am hoping there is some more optimal (less resource intensive and less confusing to the coder) method I haven't thought of yet.
As you state in your question, using a join will bring back a lot of duplicate information. It should be simple enough to remove in PHP, but why bring it back in the first place?
Compiling a SQL statement with a list of IDs retrieved from the query for your list of items shouldn't be a problem (see cwallenpoole's answer). Alternatively, you could use a sub-query so that MySQL recreates the list of IDs for you; it depends on how intensive the sub-query is.
Select your items:
SELECT * FROM item WHERE description = 'Item 1';
Then select the comments for those items:
SELECT * FROM comment WHERE item_id IN (
SELECT id FROM item WHERE description = 'Item 1'
);
For the most part, I solve this type of problem using some sort of ORM lazy-loading system, but it does not look like you have that as an option.
Have you considered:
Select all top-level items.
Select all second-level items by the ID's in the top-level set.
Associate the objects retrieved in 2 with the items found in 1 in PHP.
Basically (in pseudo-code)
$stmt = $pdo->query("SELECT id /*columns*/ FROM entries");
$entries = array();
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $row['child-entities'] = array();
    $entries[$row['id']] = $row;
}

$ids = implode(',', array_keys($entries));
$stmt = $pdo->query("SELECT parent_id /*columns*/ FROM children WHERE parent_id IN ($ids)");
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $entries[$row['parent_id']]['child-entities'][] = $row;
}
$entries will now be an associative array with parent items directly associated with child items. Unless recursion is needed, that should be everything in two queries.