Doctrine Paginator selects entire table (very slow)? - php

This is related to a previous question here: Doctrine/Symfony query builder add select on left join
I want to perform a complex join query using Doctrine ORM. I want to select 10 paginated blog posts, left joining a single author, like value for current user, and hashtags on the post. My query builder looks like this:
$query = $em->createQueryBuilder()
->select('p')
->from('Post', 'p')
->leftJoin('p.author', 'a')
->leftJoin('p.hashtags', 'h')
->leftJoin('p.likes', 'l', 'WITH', 'l.post_id = p.id AND l.user_id = 10')
->where("p.foo = bar")
->addSelect('a AS post_author')
->addSelect('l AS post_liked')
->addSelect('h AS post_hashtags')
->orderBy('p.time', 'DESC')
->setFirstResult(0)
->setMaxResults(10);
// FAILS - because left joined hashtag collection breaks LIMITS
$result = $query->getQuery()->getResult();
// WORKS - but is extremely slow (count($result) shows over 80,000 rows)
$result = new \Doctrine\ORM\Tools\Pagination\Paginator($query, true);
Strangely, count($result) on the paginator shows the total number of rows in my table (over 80,000) but traversing the $result with foreach outputs 10 Post entities, as expected. Do I need to do some additional configuration to properly limit my paginator?
If this is a limitation of the paginator class what other options do I have? Writing custom paginator code or other paginator libraries?
(bonus): How can I hydrate an array, like $query->getQuery()->getArrayResult();?
EDIT: I left out a stray orderBy in my function. It looks like including both groupBy and orderBy causes the slowdown (using groupBy rather than the paginator). If I omit one or the other, the query is fast. I tried adding an index on the "time" column in my table, but didn't see any improvement.
Things I Tried
// works, but makes the query about 50x slower
$query->groupBy('p.id');
$result = $query->getQuery()->getArrayResult();
// adding an index on the time column (no improvement)
indexes:
time_idx:
columns: [ time ]
// the two attempts above don't help because MySQL's ORDER BY
// cannot use an index when GROUP BY is applied to a different column,
// e.g. "GROUP BY p.id ORDER BY p.time" is slow

You should simplify your query; that would shave off some execution time. I can't test your query, but here are a few pointers:
don't sort while executing count()
you could sort with orderBy('p.id', 'DESC'), so that an index would be used
instead of leftJoin() you could use join() if at least one record always exists in the joined table; otherwise the post is dropped from the result
KNP/Paginator uses DISTINCT to read only distinct records, but that can lead to a temporary table on disk
$query->getArrayResult() uses array hydration mode, which returns a multidimensional array and is much faster than object hydration for large result sets
you could use a partial select, select('partial p.{id, other used fields}'), to load only the fields you need and perhaps skip unneeded relations when using object hydration
check the EXPLAIN output for the query in the Doctrine section of the Symfony profiler; maybe indexes are not being used
do p.hashtags and p.likes return only one row, or are they oneToMany relations, which multiply the result rows?
maybe some design changes to Post would remove some joins:
have a p.hashtags field defined as @ORM\Column(type="array") that stores the tag strings; later you could run a full-text search on the serialized array
have a p.likesCount field defined as @ORM\Column(type="integer") that holds the number of likes
I use KnpLabs/KnpPaginatorBundle and also run into speed issues with complex queries.
Paginating with LIMIT x,z is usually slow for the DB because the paginator also runs a COUNT over the whole dataset. If indexes are not used, it is painfully slow.
You could use a different approach and do custom pagination by advancing an ID, but that would complicate your code. I have used this with large datasets such as SYSLOG tables. But you lose sorting and the total record count.
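As a rough illustration of pagination by ID advancing, here is a minimal sketch in plain PHP that only builds the SQL and its parameters. The table name post, the columns id and text, and the function name buildKeysetPageQuery are assumptions made for the example, not something taken from the question.

```php
<?php
// Sketch of "pagination by ID advancing" (keyset pagination).
// Table/column names (post, id, text) and the function name are hypothetical.
function buildKeysetPageQuery(?int $lastSeenId, int $pageSize): array
{
    $sql = 'SELECT id, text FROM post';
    $params = [];

    // After the first page, continue below the last ID already shown
    // instead of using OFFSET, so the primary key index can be used.
    if ($lastSeenId !== null) {
        $sql .= ' WHERE id < :last_id';
        $params[':last_id'] = $lastSeenId;
    }

    // $pageSize is typed as int, so concatenating it is safe.
    $sql .= ' ORDER BY id DESC LIMIT ' . $pageSize;

    return [$sql, $params];
}
```

The pair could then be passed to something like PDO's prepare() and execute(); for the next page, pass the id of the last row of the current page as $lastSeenId. As noted above, this fixes the sort order to the ID and gives no total count.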

At the end of the day, many of the queries used in my application are too complex to make proper use of the Paginator, and I wasn't able to use array hydration mode with the Paginator.
According to MySQL documentation, ORDER BY cannot be resolved by indexes if GROUP BY is used on a different column. Thus, I ended up using a couple post-processing queries to populate my base results (ORDERed and LIMITed) with one-to-many relations (like hashtags).
For joins that load a single row from the joined table, I was able to join the desired values in the base ordered query. For example, when loading the "like status" for a current user, only one like from the set of likes needs to be loaded to indicate whether or not the current post has been liked. Similarly, the presence of only one author for a given post produces a single joined author row. e.g.
$query = $em->createQueryBuilder()
->select('p')
->from('Post', 'p')
->leftJoin('p.author', 'a')
->leftJoin('p.likes', 'l', 'WITH', 'l.post_id = p.id AND l.user_id = 10')
->where("p.foo = bar")
->addSelect('a AS post_author')
->addSelect('l AS post_liked')
->orderBy('p.time', 'DESC')
->setFirstResult(0)
->setMaxResults(10);
// SUCCEEDS - because the joins add only a single author and a single like
// no collections are joined, so LIMIT applies only to the posts, as intended
$result = $query->getQuery()->getArrayResult();
This produces a result in the form:
[
[0] => [
['id'] => 1
['text'] => 'foo',
['author'] => [
['id'] => 10,
['username'] => 'username',
],
['likes'] => [
[0] => [
['post_id'] => 1,
['user_id'] => 10,
]
],
],
[1] => [...],
...
[9] => [...]
]
Then in a second query I load the hashtags for posts loaded in the previous query. e.g.
// we don't care about order or limits here, we just want all the hashtags
// $pids holds the post IDs returned by the previous query
$query = $em->createQueryBuilder()
->select('p, h')
->from('Post', 'p')
->leftJoin('p.hashtags', 'h')
->where("p.id IN (:post_ids)")
->setParameter('post_ids', $pids);
Which produces the following:
[
[0] => [
['id'] => 1
['text'] => 'foo',
['hashtags'] => [
[0] => [
['id'] => 1,
['name'] => '#foo',
],
[1] => [
['id'] => 2,
['name'] => '#bar',
],
...
],
],
...
]
Then I just traverse the results containing hashtags and append them to the original (ordered and limited) results. This approach ends up being much faster (even though it uses more queries), as it avoids GROUP BY and COUNT, fully leverages MySQL indexes, and allows for more complex queries, such as the one I posted here.
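The traversal step can be sketched roughly as follows, assuming both queries were run with array hydration and have the shapes shown above. The function name attachHashtags and the variable names are hypothetical.

```php
<?php
// Sketch: merge the hashtags from the second query into the ordered,
// limited posts from the first query. Names are hypothetical.
function attachHashtags(array $posts, array $postsWithHashtags): array
{
    // Index the hashtag lists by post id for direct lookup.
    $hashtagsByPostId = [];
    foreach ($postsWithHashtags as $row) {
        $hashtagsByPostId[$row['id']] = $row['hashtags'] ?? [];
    }

    // Walk the original results so their order and limit are preserved.
    foreach ($posts as $i => $post) {
        $posts[$i]['hashtags'] = $hashtagsByPostId[$post['id']] ?? [];
    }

    return $posts;
}

// The IDs for the second query can be taken from the first result:
// $pids = array_column($posts, 'id');
```

Posts without any hashtags simply end up with an empty hashtags array.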

You can configure the paginator to use a simpler 'count' sql strategy by doing one or more of the optimizations below.
$paginator = new Paginator($query, false);
$paginator->setUseOutputWalkers(false);
If results are unexpected you may want to do a DISTINCT select (select('DISTINCT p'))
For us it made massive improvements and we had no need to write or use a custom paginator.
More details can be found on this site. Note that I am owner of that website.

Related

Laravel: How to load joined tables into the subarrays?

I have 2 tables:
User:
id
schedule_id
name
Schedules:
id
date
I execute a query using eager loading:
User::with (['schedules'])->get()->first();
and get a result like this:
[
'id' => '3',
'schedule' => [
'id' => '13',
'date' => '20.11.2020',
],
'name' => 'John',
];
But when I execute a similar query using a join,
User::join ('schedules', 'user.schedule_id', '=', 'schedules.id')->get()->first();
I get a result like this, with the arrays merged:
[
'id' => '13',
'date' => '20.11.2020',
'name' => 'John',
];
How can I get a result with separate arrays, using join in Eloquent?
Note: in raw PHP and PDO I always got separate arrays for any query with a join.
Data coming from a database query will always be formatted as a flat array; that's just how databases work. Eloquent has a lot of magic going on behind the scenes that maps the related values to the correct models in the collection.
When using joins, there is no way for Eloquent to know which data should be mapped to which relation or property.
If you want to use join queries, you will either have to use aliases for these properties on the models, or manually map the properties the way you want them before using the data.
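A minimal sketch of the manual mapping, in plain PHP: it assumes the joined columns were selected with an alias prefix such as schedule__ (for example 'schedules.date as schedule__date'). The prefix convention and the function name nestJoinedRow are assumptions for this example, not part of Eloquent.

```php
<?php
// Sketch: move all columns that start with $prefix into a sub-array under $key.
// The "schedule__" alias prefix is a convention assumed for this example.
function nestJoinedRow(array $row, string $prefix, string $key): array
{
    $result = [];
    $nested = [];

    foreach ($row as $column => $value) {
        if (strpos($column, $prefix) === 0) {
            // Strip the prefix: "schedule__date" becomes "date".
            $nested[substr($column, strlen($prefix))] = $value;
        } else {
            $result[$column] = $value;
        }
    }

    $result[$key] = $nested;

    return $result;
}

$flat = ['id' => '3', 'name' => 'John', 'schedule__id' => '13', 'schedule__date' => '20.11.2020'];
$user = nestJoinedRow($flat, 'schedule__', 'schedule');
```

Here $user holds id and name at the top level and a schedule sub-array with its own id and date, which is close to the shape the eager-loading query returns.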
OK, I can suggest a more suitable way to get results ordered by a relation's column in Laravel's Eloquent.
You should use 2 queries.
First, fetch only the IDs, ordered by the relation's column.
(You can't order results by a relation's column using eager loading.)
$idsRaw = User::join('schedules', 'users.schedule_id', '=', 'schedules.id')
->select('users.id')
->orderBy('schedules.date')
->get()->toArray();
$ids = \array_column($idsRaw, 'id');
Second, load the full results, with subarrays for the relations, using eager loading:
User::with(['schedules'])
->whereIn('users.id', $ids) // whereIn() expects an array, not a comma-separated string
->orderByRaw('FIELD(users.id,' . \implode(',', $ids) . ')') // keep the order from the first query
->get()->toArray();
Advantages:
You don't need to manually map columns and aliases into subarrays (which may be very difficult if you have many relations).
Disadvantages:
This method gets slower the more IDs you have.
Yes, if you have thousands of IDs in the query, you should use one complex query with joins for all relations and manually map the aliases into subarrays for best performance.

Cakephp3 case mysql statement is not creating the correct query

I'm trying to create a query that returns the sum of a column using a case (it has logged time and the format in either minutes or hours, if it's in hours, multiply by 60 to convert to minutes). I'm very close, however the query is not populating the ELSE part of the CASE.
The finder method is:
public function findWithTotalTime(Query $query, array $options)
{
$conversionCase = $query->newExpr()
->addCase(
$query->newExpr()->add(['Times.time' => 'hours']),
['Times.time*60', 'Times.time'],
['integer', 'integer']
);
return $query->join([
'table' => 'times',
'alias' => 'Times',
'type' => 'LEFT',
'conditions' => 'Times.category_id = Categories.id'
])->select([
'Categories.name',
'total' => $query->func()->sum($conversionCase)
])->group('Categories.name');
}
The resulting query is:
SELECT Categories.name AS `Categories__name`, (SUM((CASE WHEN
Times.time = :c0 THEN :c1 END))) AS `total` FROM categories Categories
LEFT JOIN times Times ON Times.category_id = Categories.id GROUP BY
Categories.name
It's missing the ELSE statement before the CASE end, which according to the API docs:
...the last $value is used as the ELSE value...
https://api.cakephp.org/3.3/class-Cake.Database.Expression.QueryExpression.html
I know there might be a better way to do this, but at this point I'd like to at least know how to do CASE statements properly using the built in QueryBuilder.
Both arguments must be arrays
It looks like there are some documentation issues in the Cookbook, and the API docs could be a little clearer on that subject too. Both the $conditions argument and the $values argument must be arrays in order for this to work.
Enforcing types ends up with casting values
You're also passing the SQL expression the wrong way, with the wrong types: defining the types as integer causes the data passed in $values to be cast to integers, which means that you will be left with 0s.
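To see why the values end up as 0, here is a plain PHP illustration of what an integer cast does to these strings. It only mimics the effect and is not CakePHP's actual binding code.

```php
<?php
// Plain PHP illustration (not CakePHP internals): casting non-numeric
// strings to integer yields 0, so the SQL fragments are lost.
$values = ['Times.time*60', 'Times.time'];
$cast = array_map('intval', $values);

var_dump($cast); // both entries are int(0)
```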
The syntax that you're using is useful when dealing with user input, which needs to be passed safely. In your case however you want to pass hardcoded identifiers, so what you have to do is to use the key => value syntax to pass the values as literals or identifiers. That would look something like:
'Times.time' => 'identifier'
However, unfortunately there seems to be a bug (or at least an undocumented limitation) which causes the else part to not recognize this syntax properly, so for now you'd have to do it the manual way, that is, by passing proper expression objects, which, by the way, you probably should have done for Times.time*60 anyway, as it would otherwise break if automatic identifier quoting is applied or required.
tl;dr, Example time
Here's a complete example with all the aforementioned techniques:
use Cake\Database\Expression\IdentifierExpression;
// ...
$conversionCase = $query
->newExpr()
->addCase(
[
$query->newExpr()->add(['Times.time' => 'hours'])
],
[
$query
->newExpr(new IdentifierExpression('Times.time'))
->add('60')
->tieWith('*'), // setConjunction() as of 3.4.0
new IdentifierExpression('Times.time')
],
);
If you were sure that you'd never make use of automatic identifier quoting, you could just pass the multiplication fragment as:
'Times.time * 60' => 'literal'
or:
$query->newExpr('Times.time * 60')
See also
Cookbook > Database Access & ORM > Query Builder > Case statements
Cookbook > Database Access & ORM > Query Builder > Using SQL Functions
API > \Cake\Database\Expression\QueryExpression::add()
API > \Cake\Database\Expression\QueryExpression::tieWith()

Doctrine 2 conditional multiple row update with QueryBuilder

The question has some answers on SO, but none of them seems to help accomplish what is actually a simple task.
I need to update multiple rows, each based on its own condition, in one query, using the Doctrine2 QueryBuilder. The most obvious way is apparently the wrong one:
$userAgeList = [
'user_name_a' => 30,
'user_name_b' => 40,
'user_name_c' => 50,
];
//Array of found `User` objects from database
$usersList = $this->getUsersList();
foreach($usersList as $user)
{
$userName = $user->getName();
$user->setAge($userAgeList[$userName]);
$this->_manager->persist($user);
}
$this->_manager->flush();
It will create an update query for each User object in the transaction, but I need only one query. This source suggests relying on an UPDATE query instead, because then only one SQL UPDATE statement is executed, so I did it like this:
$userAgeList = [
'user_name_a' => 30,
'user_name_b' => 40,
'user_name_c' => 50,
];
$builder = $this->_manager->getRepository(self::REPOSITORY_USER)
->createQueryBuilder(self::REPOSITORY_USER);
foreach($userAgeList as $userName => $age)
{
$builder->update(self::REPOSITORY_USER, 'user')
->set('user.age', $builder->expr()->literal($age))
->where('user.name = :name')
->setParameter('name', $userName)
->getQuery()->execute();
}
But that also makes (obviously) a bunch of updates instead of one. If I assign result of getQuery() to a variable, and try to execute() it after foreach loop, I see that $query understands and accumulates a set(), but it does no such thing for WHERE condition.
Is there any way to accomplish such task in QueryBuilder?
UPDATE - similar questions:
Multiple update queries in doctrine and symfony2 - this one does not assume UPDATE in one query;
Symfony - update multiple records - this also says, quote, 'one select - one update';
Update multiple columns with Doctrine in Symfony - WHERE condition in this query is always the same, not my case;
Doctrine 2: Update query with query builder - same thing as previous, only one WHERE clause;
http://doctrine-orm.readthedocs.org/en/latest/reference/batch-processing.html - doctrine batch processing does not mention conditions at all...
In MySQL I used to do it using CASE-THEN, but that's not supported by Doctrine.
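For reference, the CASE approach mentioned above can be sketched in plain SQL built from PHP, outside the QueryBuilder. The table name user, the columns name and age, and the function name buildAgeUpdate are assumptions for this example.

```php
<?php
// Sketch: build one UPDATE ... CASE statement for a name => age map.
// Table/column names (user, name, age) and the function name are hypothetical.
function buildAgeUpdate(array $ageByName): array
{
    if ($ageByName === []) {
        throw new InvalidArgumentException('Nothing to update.');
    }

    $whens = [];
    $params = [];
    foreach ($ageByName as $name => $age) {
        // One WHEN/THEN pair per user, with positional placeholders.
        $whens[] = 'WHEN ? THEN ?';
        $params[] = (string) $name;
        $params[] = (int) $age;
    }

    // Restrict the UPDATE to the listed names so other rows are untouched.
    $names = array_map('strval', array_keys($ageByName));
    $placeholders = implode(', ', array_fill(0, count($names), '?'));

    $sql = 'UPDATE user SET age = CASE name ' . implode(' ', $whens)
        . ' END WHERE name IN (' . $placeholders . ')';

    return [$sql, array_merge($params, $names)];
}
```

The statement and its parameters could then be run through the underlying database connection rather than the ORM, so entities already loaded in memory would not reflect the change until they are refreshed.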

Codeigniter, join of two tables with a WHERE clause

I've this code:
public function getAllAccess(){
$this->db->select('accesscode');
$this->db->where(array('chain_code' => '123'));
$this->db->order_by('dateandtime', 'desc');
$this->db->limit($this->config->item('access_limit'));
return $this->db->get('accesstable')->result();
}
I need to join it with another table (the codenames table). This is not really a literal query, just what I want to achieve:
SELECT * accesscode, dateandtime FROM access table WHERE chain_code = '123' AND codenames.accselect_lista != 0
So basically accesstable has a column code which holds a number, let us say 33, and this number is also present in the codenames table; that table also has a field accselect_lista.
So I have to select only the rows with accselect_lista != 0 and from there get the corresponding accesstable rows whose codes are the ones selected in codenames.
Looking for this?
SELECT *
FROM access_table a INNER JOIN codenames c ON
a.chain_code = c.chain_code
WHERE a.chain_code = '123' AND
c.accselect_lista != 0
It will bring up all columns from both tables for the specified criteria. The table and column names need to be exact, obviously.
Good start! But I think you might be getting a few techniques mixed up here.
Firstly, there are two main ways to run multiple where queries. You can use an associative array (like you've started to do there).
$this->db->where(array('accesstable.chain_code' => '123', 'codenames.accselect_lista !=' => 0));
Note that I've prefixed each column with its table name. Also notice that you can use other operators if you include them in the same string as the column name.
Alternatively you can give each condition its own line. I prefer this method because I think it's a bit easier to read. Both will accomplish the same thing.
$this->db->where('accesstable.chain_code', '123');
$this->db->where('codenames.accselect_lista !=', 0);
Active record will format the query with 'and' etc on its own.
The easiest way to add the join is to use from with join.
$this->db->from('accesstable');
$this->db->join('codenames', 'codenames.accselect_lista = accesstable.code');
When using from, you don't need to include the table name in get, so to run the query you can now just use something like:
$query = $this->db->get();
return $query->result();
Check out Codeigniter's Active Record documentation if you haven't already, it goes into a lot more detail with lots of examples.

How to get the column value from the particular join table using YII framework?

I am having the query like this
$criteria = new CDbCriteria(array(
'distinct' => true,
'select' => array('assets_id'),
'condition' => 'assets_id in (159)',
'with' => array('tbl_asset_mappings'=>array('select'=>array('catid')), 'tbl_assets_details'=>array('select'=>array('filetype','original_filename'))),
'together' => true
));
$result=TblAssets::model()->findAll($criteria);
But I am getting the column values from the first table only. I don't get the column values from the other tables. Why?
My aim is to get assets_id from tblasset, plus tbl_asset_mappings.catid, tbl_assets_details.filetype and tbl_assets_details.original_filename.
How can I achieve that?
You are querying for objects, so you will get the relations as child objects, accessed like $post->author->name.
You need to do a join instead of a with. In this situation it is easier to write the join as a raw query.
Maybe it would be easier to just write your own query rather than constructing it through Yii.
You can access related object like $model->relatedModel->attribute.
Set a breakpoint after model->findAll() and look at the $model->_related property. You should have a collection of related models there.
