How to massage the result set from a query with joins? - php

I have a domain model and a data mapper built with PHP and MySql. Some domain objects appear only in the context of others and in these cases I am using dependent mappings, i.e. the dependent objects do not have their own mapper, but are instead persisted by their owner's mapper.
For performance reasons I am joining multiple tables and issuing one SQL query (rather than one query per table), and this is where the difficulty arises: the result of my query is effectively a cartesian product of the child rows (of course) and will require a fair amount of array sifting to get sensible data with which to populate domain objects.
For example, I have three tables: one parent (P1) and two children (C1 and C2). If a single record in P1 has two records in both C1 and C2, I have four records in my result set. I can cycle through the result set array looking for unique values and create a fresh array from these, but that seems like a lot of work.
This is bound to be a common problem. What is the typical way to solve it? I looked through the SPL data structures and iterators but didn't find anything useful. I don't want to split into multiple queries unless I really have to.
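For illustration, here is a rough sketch of the sifting I have in mind, assuming the result set is an array of associative rows and that the columns are aliased as p1_id, c1_id, c1_value, c2_id and c2_value (made-up names):
$parents = array();
foreach ($rows as $row) {
    $pid = $row['p1_id'];
    if (!isset($parents[$pid])) {
        $parents[$pid] = array('id' => $pid, 'c1' => array(), 'c2' => array());
    }
    if ($row['c1_id'] !== null) {
        // keying by child id de-duplicates the repeated C1 rows from the join
        $parents[$pid]['c1'][$row['c1_id']] = array('value' => $row['c1_value']);
    }
    if ($row['c2_id'] !== null) {
        $parents[$pid]['c2'][$row['c2_id']] = array('value' => $row['c2_value']);
    }
}
// $parents now holds one entry per P1 record with unique C1 and C2 children.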
Thanks!

Related

Many types of data - One table vs multiple tables

I am working on a web application whose main functionality will be to present some data to the user. However, there are several types of data, and each of them has to be presented in a different way.
For example, I have to list 9 results: 3 books, 3 authors and 3 files.
Book is described with (char)TITLE, (text)DESCRIPTION.
Author is described with (char)TITLE, (char)DESCRIPTION.
File is described with (char)URL.
Moreover, every type has fields like ID, DATE, VIEWS etc.
Book and Author are presented with simple HTML code; File uses an external reader embedded on the website.
Should I build three different tables and use JOIN when getting this data, or build one table and store all types in there? Which approach is more efficient?
Additional info - there is going to be a really huge number of records.
The logical way of doing this is to keep things separate, following the 3NF rules of database design. This gives more flexibility when retrieving different kinds of results, especially when there is a huge amount of data. Putting everything in a single table is absolutely bad DB practice.
That depends on the structure of your data.
If you have 1:1 relationships, say one book has one author, you can put the records in one row. If one book has several authors or one author has several books, you should set up separate tables books and authors and link them with a table author_has_books that holds both foreign keys. This way you won't store duplicate data, and you avoid inconsistencies.
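For example, a minimal sketch of reading through such a link table from PHP (this assumes PDO and that author_has_books has book_id and author_id columns; adjust the names to your schema):
$sql = 'SELECT b.id AS book_id, b.title, a.id AS author_id, a.title AS author_title
        FROM books b
        JOIN author_has_books ab ON ab.book_id = b.id
        JOIN authors a ON a.id = ab.author_id';
$books = array();
foreach ($pdo->query($sql) as $row) {
    // group authors under each book so the join duplicates disappear
    $books[$row['book_id']]['title'] = $row['title'];
    $books[$row['book_id']]['authors'][$row['author_id']] = $row['author_title'];
}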
More information about db normalization here:
http://en.wikipedia.org/wiki/Database_normalization
Separate them and create a relationship. That way, when you start to get a lot of data, you'll notice a performance boost because you are only fetching 3 fields at a time (i.e. when you are just looking at a book) instead of 7.

how do I get a recursive result by querying a self referencing table in mysql?

I have a self-referencing table 'comments' where comments.replyToId REFERENCES comments.ID.
My question is, how do I query a database with a self-referencing table to get a result that is properly ordered so that I can represent the result as a tree in PHP?
I've tried
select * from comments as comments_1
left join comments as comments_2
on comments_1.id = comments_2.replyToId
I'm trying to use the result of this in PHP.
You're not going to get a recursive result out of MySQL directly. There was a similar discussion recently - it may be possible with some RDBMSs using stored procedures etc., but not with a plain out-of-the-box MySQL query (see How can I get ancestor ids for arbitrary recursion depth in one SQL query?).
What I do instead in similar cases: Get all comments without parents. Then, for each comment, get its children (if you store the "depth" of each comment you may get all these children and all children of the next layers with one SQL query). Store the children in the appropriate place in your tree structure, repeat.
If you need something more low-level, you'll probably need to share some code, explain your data structure, what you've tried so far, etc.; this is just the general approach.
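A rough sketch of that approach (assuming PDO; id and replyToId are the columns from the question, the rest is illustrative):
function fetchReplies(PDO $pdo, $parentId = null)
{
    if ($parentId === null) {
        // top-level comments have no parent
        $stmt = $pdo->query('SELECT * FROM comments WHERE replyToId IS NULL');
    } else {
        $stmt = $pdo->prepare('SELECT * FROM comments WHERE replyToId = ?');
        $stmt->execute(array($parentId));
    }
    $nodes = array();
    foreach ($stmt as $comment) {
        // recurse to collect this comment's children, then their children, etc.
        $comment['children'] = fetchReplies($pdo, $comment['id']);
        $nodes[] = $comment;
    }
    return $nodes;
}

$tree = fetchReplies($pdo); // nested arrays: one query for the roots, then one per comment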

mysql: use SET or lots of columns?

I'm using PHP and MySQL. I have records for:
events with various "event types" that are hierarchical (events can have multiple categories and subcategories, but there is a fixed number of such categories and subcategories) (timestamped)
What is the best way to set up the table? Should I have a bunch of columns (30 or so) with enums for yes or no indicating membership in that category, or should I use the MySQL SET datatype?
http://dev.mysql.com/tech-resources/articles/mysql-set-datatype.html
Basically I have performance in mind and I want to be able to retrieve all of the ids of the events for a given category. Just looking for some insight on the most efficient way to do this.
It sounds like you're chiefly concerned with performance.
A couple people have suggested splitting into 3 tables (category table plus either simple cross-reference table or a more sophisticated way of modeling the tree hierarchy, like nested set or materialized path), which is the first thing I thought when I read your question.
With indexes, a fully normalized approach like that (which adds two JOINs) will still have "pretty good" read performance. One issue is that an INSERT or UPDATE to an event may now also include one or more INSERT/UPDATE/DELETEs to the cross-reference table, which on MyISAM means the cross-reference table is locked and on InnoDB means the rows are locked, so if your database is busy with a significant number of writes you're going to have larger contention problems than if just the event rows were locked.
Personally, I would try out this fully normalized approach before optimizing. But, I'll assume you know what you're doing, that your assumptions are correct (categories never change) and you have a usage pattern (lots of writes) that calls for a less-normalized, flat structure. That's totally fine and is part of what NoSQL is about.
SET vs. "lots of columns"
So, as to your actual question "SET vs. lots of columns", I can say that I've worked with two companies with smart engineers (whose products were CRM web applications ... one was actually events management), and they both used the "lots of columns" approach for this kind of static set data.
My advice would be to think about all of the queries you will be doing on this table (weighted by their frequency) and how the indexes would work.
First, with the "lots of columns" approach you are going to need indexes on each of these columns so that you can do SELECT FROM events WHERE CategoryX = TRUE. With the indexes, that is a super-fast query.
Versus with SET, you must use bitwise AND (&), LIKE, or FIND_IN_SET() to do this query. That means the query can't use an index and must do a linear search of all rows (you can use EXPLAIN to verify this). Slow query!
That's the main reason SET is a bad idea -- its index is only useful if you're selecting by exact groups of categories. SET works great if you'd be selecting categories by event, but not the other way around.
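To make that concrete, the two variants look roughly like this (assuming PDO, a boolean CategoryX column in the "lots of columns" design and a categories SET column in the other; both are illustrative names):
// "lots of columns": with an index on CategoryX, MySQL can use the index
$fast = $pdo->query('SELECT id FROM events WHERE CategoryX = TRUE');

// SET column: FIND_IN_SET() cannot use an index, so every row is scanned
$slow = $pdo->query("SELECT id FROM events WHERE FIND_IN_SET('CategoryX', categories) > 0");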
The primary problem with the less-normalized "lots of columns" approach (versus fully normalized) is that it doesn't scale. If you have 5 categories and they never change, fine, but if you have 500 and are changing them, it's a big problem. In your scenario, with around 30 that never change, the primary issue is that there's an index on every column, so if you're doing frequent writes, those queries become slower because of the number of indexes that have to be updated. If you choose this approach, you might want to check the MySQL slow query log to make sure there aren't outlier slow queries because of contention at busy times of day.
In your case, if yours is a typical read-heavy web app, I think going with the "lots of columns" approach (as the two CRM products did, for the same reason) is probably sane. It is definitely faster than SET for that SELECT query.
TL;DR Don't use SET because the "select events by category" query will be slow.
It's good that the number of categories is fixed. If it wasn't you couldn't use either approach.
Check the "Why You Shouldn't Use SET" section on the page you linked. I think that should give you a comprehensive guide.
I think the most important one is about indexes. Also, modifying a SET is slightly more complex.
The relationship between events and event types/categories is a many to many relationship, as echo says, but a simple xref table will leave you with a problem: If you want to query for all descendants of any given node, then you must make multiple recursive queries. On a deep tree, that will be very inefficient.
So when you say "retrieve all ids for a given category", if you do mean all descendants, then you want to use a Nested Set Model:
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
The Nested Set model makes writes and updates a bit slower, but makes it very easy to retrieve subtrees:
To get the Televisions sub tree, you query for all categories left >= 2 and right <= 9.
Leaf nodes always have left = right - 1
You can find the count of descendants without pulling those rows: (right - left - 1)/2
Finding inheritance paths and depth is also very easy (single query stuff). See the article for full details.
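For the original "all event ids for a given category, including descendants" requirement, the nested set category table can be combined with a cross-reference table like the one described in the other answer. A sketch, assuming PDO; the event_category table and its lft/rgt columns follow the naming in the linked article and are assumptions:
$sql = '
    SELECT DISTINCT x.event_id
    FROM event_category AS parent
    JOIN event_category AS child
      ON child.lft BETWEEN parent.lft AND parent.rgt
    JOIN event_category_event_xref AS x
      ON x.event_category_id = child.id
    WHERE parent.id = ?';
$stmt = $pdo->prepare($sql);
$stmt->execute(array($categoryId));
$eventIds = $stmt->fetchAll(PDO::FETCH_COLUMN); // events in the category or any descendant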
You might try using a cross-reference (Xref) table, to create a many-to-many relationship between your events and their types.
create table event_category_event_xref
(
event_id int,
event_category_id int,
foreign key(event_id) references event(id),
foreign key (event_category_id) references event_category(id)
);
Event / category membership is defined by records in this table. So if you have a record with {event_id = 3, event_category_id = 52}, it means event #3 is in category #52. Similarly you can have records for {event_id = 3, event_category_id = 27}, and so on.
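With that table in place, the "all event ids for a given category" query from the question is a single indexed lookup (sketch, assuming PDO):
$stmt = $pdo->prepare('SELECT event_id FROM event_category_event_xref WHERE event_category_id = ?');
$stmt->execute(array(52)); // all events in category #52, as in the example above
$eventIds = $stmt->fetchAll(PDO::FETCH_COLUMN);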

Object oriented representation of multi-table query

Suppose we have two related tables, for example one representing a person:
PERSON
name
age
...
current_status_id
and one representing a status update at a specific time for this person:
STATUS_HISTORY
recorded_on
status_id
blood_pressure
length
...
I have built an application in PHP using Zend Framework, and tried to retain 'object orientedness' by using a class to represent a person and a class to represent the status of a person. I also tried to use ORM principles where possible, such as using a data mapper to separate the domain model from the data layer.
What would be a nice (and object oriented) way of returning a list of persons from a data mapper, where in the list I sometimes want to know the last measured blood_pressure of the person, and sometimes not (depending on the requirements of the report/view in which the list is used)? The same holds for different fields, e.g. values computed at the data layer (sums, counts, etc.).
My first thought was using a rowset (e.g. Zend_Db_Rowset), but this introduces high coupling between my view and data layer. Another way might be to return a list of persons and then query the latest status for each person, using a data mapper for requesting the status of a specific person. However, this will result in (at least) one additional query for each person record, and does not allow me to use JOINs at the data layer.
Any suggestions?
We have this same issue because of our ORM where I work. If you are worried enough about the performance hit of having to first get a list of your persons, then query for their statuses individually, you really have no other choice but to couple your data a little bit.
In my opinion, this is okay. You can either create a class that will hold the single "person" data and an array containing "status_history" records, or suffer the performance hit of making another query per "person". You COULD reduce your query overhead by doing data caching locally (your controller would have to decide that if a request for a set of data is made before a certain time threshold, it just returns its own data instead of querying the db server).
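A bare-bones sketch of that first option (the class and method names here are made up for illustration, not part of any framework):
class PersonWithStatusHistory
{
    public $person;        // the "person" data (array or domain object)
    public $statusHistory; // array of status_history records, newest first

    public function __construct($person, array $statusHistory)
    {
        $this->person        = $person;
        $this->statusHistory = $statusHistory;
    }

    // convenience accessor for views that need the last measured value
    public function latestBloodPressure()
    {
        return isset($this->statusHistory[0]['blood_pressure'])
            ? $this->statusHistory[0]['blood_pressure']
            : null;
    }
}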
Having a pure OO view is nice, but sometimes impractical.
Try to use "stdclass" class which is PHP's inbuild class, You can get the object of stdclass which will be created automatically by PHP and its member variable will be column name. So u can get object and get the values by column name. For example.
Query is
SELECT a.dept_id,a.dept_name,a.e_id,b.emp_name,b.emp_id from DEPT a,EMP b where b.emp_id=a.e_id;
The result will be an array of stdClass objects; each row is represented by one stdClass object that looks like:
stdClass {
    dept_id;
    dept_name;
    e_id;
    emp_id;
    emp_name;
}
You can access the values like this:
foreach ($resultset as $row)
{
    $d_id  = $row->dept_id;
    $d_nam = $row->dept_name;
    $e_id  = $row->e_id;
    $em_id = $row->emp_id;
    $e_nam = $row->emp_name;
}
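For reference, one way to produce such a result set (this assumes PDO; mysqli has an equivalent object-fetching mode):
$sql = 'SELECT a.dept_id, a.dept_name, a.e_id, b.emp_name, b.emp_id
        FROM DEPT a, EMP b
        WHERE b.emp_id = a.e_id';
$resultset = $pdo->query($sql)->fetchAll(PDO::FETCH_OBJ); // array of stdClass objects, one per row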
But I am not sure about the performance.

Returning multiple rows per row (in Zend Framework)

I have a MySQL database containing these tables:
sessions
--------
sessionid (INT)
[courseid (INT)]
[locationid (INT)]
[comment (TEXT)]
dates
-----
dateid (INT)
sessionid (INT)
date (DATE)
courses
-------
...
locations
---------
...
Each session has a unique sessionid, and each date has a unique dateid. But dates don't necessarily have a unique sessionid, as a session can span over a variable number of dates (not necessarily consecutive).
Selecting each full row is simply a matter of joining the tables on the sessionid. However, I'm looking for a way to return a rowset for a particular courseid, where each row in that rowset represents a location and contains another rowset; each row of that rowset contains a single session, which in turn contains another rowset containing all of the dates for that session:
course
    location
        session
            date
            date
        session
            date
            date
            date
    location
        ...
This is because I'm querying this database from PHP using Zend Framework, which has a great interface for manipulating rows and rowsets in an object-oriented manner.
Ultimately, I'm trying to output a 'schedule' to the view, organized first by course, then location, then date. Ideally, I'd be able to iterate over each row as a location, then for each location iterate over each session, and then for each session iterate over each date.
I'm thinking of doing this by querying for all the locations, sessions, and dates separately. Then I'd convert each rowset into an array, add each session's array as a member of its location's array, and add each date's array as a member of its session's array.
This, however, feels very kludgy, and doesn't provide me with the ability to handle the rows in an object-oriented manner.
I was wondering if there was either:
a) a better table schema for representing this data;
b) an SQL query which I'm not aware of;
c) a method in Zend_Db that allows me to assign a rowset to a rowset.
Please let me know if I haven't been clear anywhere, and thanks in advance.
(Crossing my fingers that this doesn't end up on the daily wtf...)
I've run into lots of issues with using Zend Framework's database abstraction classes when I have to deal with data from multiple tables. The number of queries that run and the overhead of all of the objects generated has brought my hosting server to its knees. I've since reverted to writing queries to gather all of my data and then walking the data to build my display. It's not as pretty or OO as using the abstraction layers, but it's also not making my PHP scripts page to disk just to display a table full of data.
As Steve mentions, benchmark whatever solution you end up with; I'd also profile your memory usage.
You could handle this scenario using the relationship features of Zend_Db_Table. You'd need to create table wrapper classes for sessions, dates, courses, etc. if you're currently using Zend_Db_Adapter for your queries.
http://framework.zend.com/manual/en/zend.db.table.relationships.html
It's not too different from the approach you described of querying for each dataset separately, but it gives you a straightforward OO interface for retrieving the appropriate related data for a given record.
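A rough sketch of what those wrapper classes might look like in ZF1 (class names and reference rules follow the schema above and are illustrative, not drop-in code):
class Sessions extends Zend_Db_Table_Abstract
{
    protected $_name = 'sessions';
    protected $_referenceMap = array(
        'Course' => array(
            'columns'       => 'courseid',
            'refTableClass' => 'Courses',
            'refColumns'    => 'courseid',
        ),
        'Location' => array(
            'columns'       => 'locationid',
            'refTableClass' => 'Locations',
            'refColumns'    => 'locationid',
        ),
    );
}

class Dates extends Zend_Db_Table_Abstract
{
    protected $_name = 'dates';
    protected $_referenceMap = array(
        'Session' => array(
            'columns'       => 'sessionid',
            'refTableClass' => 'Sessions',
            'refColumns'    => 'sessionid',
        ),
    );
}

// Courses and Locations would be defined the same way. Then, for a given session row:
$sessions = new Sessions();
$session  = $sessions->fetchRow(array('sessionid = ?' => $sessionId));
$dates    = $session->findDependentRowset('Dates'); // one extra query
$location = $session->findParentRow('Locations');   // one extra query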
You'll want to do some benchmarking if you go this route, as it could potentially execute a lot of queries.
