How expensive is find('count') in CakePHP?

I have a CakePHP application where I am trying to get a count of how many of each type of post a user has. I am also using the paginate helper to display all the user's posts on their profile.
I could:
1) Make a find('all', $conditions) request for all the user's posts, then parse the result for the counts I need (by going post by post and checking whether each matches what I'm looking for).
2) Make multiple find('count', $conditions) requests, getting all the counts I need individually.
3) Maintain new columns in the users table to track these numbers when posts are created (but that would involve writing to the user account whenever a new post is made).
Which one would be the best choice?

Fetching all records from the database and counting them in PHP is hugely wasteful; counting is exactly what databases are for.
A find('count') just issues an SQL query like SELECT COUNT(*) FROM … WHERE …, which is about the fastest way to get a count. Add appropriate conditions to get the count you want; you may need to brush up on your SQL to work out what those conditions are.
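For example, counting one post type for a user might look like this (a minimal sketch; Post.user_id and Post.type are assumed columns):
// Issues SELECT COUNT(*) FROM posts WHERE user_id = ... AND type = 'article'
$articleCount = $this->Post->find('count', array(
    'conditions' => array(
        'Post.user_id' => $userId,
        'Post.type'    => 'article',
    ),
));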
There's also the special case of counter caching, which you might want to look into if you're counting hasMany relations.
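As a rough sketch of counter caching: add an integer post_count column to the users table and flag the belongsTo association, and Cake keeps the column up to date on every save and delete:
// Assumes a `post_count` INT column exists on the users table.
class Post extends AppModel {
    public $belongsTo = array(
        'User' => array('counterCache' => true), // maintains User.post_count
    );
}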

You should modify your find call to include a group by condition to aggregate the count by type. See here for more info: http://debuggable.com/posts/how-to-group-by-in-cakephps-new-release-part-2:4850d2c9-edbc-4f0e-9e0e-07d64834cda3
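In effect this collapses the multiple find('count') calls into a single query. A hedged sketch, assuming a Post.type column (syntax as in the linked CakePHP 1.2-era post):
// One query; each result row carries the type and its count.
$counts = $this->Post->find('all', array(
    'fields'     => array('Post.type', 'COUNT(*) AS type_count'),
    'conditions' => array('Post.user_id' => $userId),
    'group'      => array('Post.type'),
));
// Typically read back as $counts[$i]['Post']['type'] and $counts[$i][0]['type_count'].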

Related

Moodle join tables without raw SQL query

I am new to the Moodle world. I want to make an inner join of 2 or more tables with PDO. I checked the documentation but can't find anything helpful; maybe I am missing some part of Moodle. Let's say I want to get all users enrolled in a specified course.
Is there any way to make something similar to this:
$users = get_records(['course', 'user'], 'course.id = user.course_id');
I got this reply from the Moodle forum:
No.
The DB API is there to simplify many common situations (e.g. getting
one or more records from a single table), but if you need to gather
data from more than one table at once, then you need to use
$DB->get_records_sql() (or similar).
However, if you want a list of users in a particular course, then use
the get_enrolled_users() function, rather than trying to manually
write the SQL query for it.
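For reference, a minimal sketch of both suggestions using the standard Moodle DB API ($courseid is assumed; the join through {user_enrolments} and {enrol} is one common way to express enrolment):
// Raw SQL through the DB API:
$sql = "SELECT u.*
          FROM {user} u
          JOIN {user_enrolments} ue ON ue.userid = u.id
          JOIN {enrol} e ON e.id = ue.enrolid
         WHERE e.courseid = :courseid";
$users = $DB->get_records_sql($sql, array('courseid' => $courseid));

// Or the purpose-built helper for exactly this case:
$context = context_course::instance($courseid);
$users = get_enrolled_users($context);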

Database design for hashtags in MySQL

I'm currently working on a system that will enable the use of hashtags on our site, and I'm having some trouble deciding how best and most efficiently to store the hashtags in the database. The design needs to make it relatively simple to retrieve posts that match search terms (like on Twitter, where clicking a hashtag link shows all the tweets with that hashtag).
The hashtags will be stored in the database by extracting the terms from the content of created posts (also comparable to Twitter) and inserting them. How to insert them, of course, is the problem at hand:
At the moment I'm torn between 2 possible designs:
1) My first design idea (and perhaps the more conventional one) is a 3-table design:
the first table simply stores the post content and other data related to the post itself (I'm already using a table like this).
the second table simply stores new hashtags being used, basically functioning as a look-up for all hashtags that have been used.
the third table defines the relationships between hashtags and posts. Basically it would be a simple table with one column for the ID of a post and another column for the ID of a single hashtag stored in the previous table. So a post that has, for example, 3 hashtags would have 3 rows in this table, one for each hashtag it is associated with.
2) The second design is a 2-table design:
the same table with the post data stored in it, as above.
the 2nd table is a mix of the 2nd and 3rd tables in the first design: it holds the relationships between hashtags and posts, but instead of storing each new hashtag in a look-up table and assigning it an ID, it simply stores the actual hashtag itself (so for example "#test") along with the ID of the post. The same concept applies here: if a post has 3 hashtags in it, 3 individual rows go into the table.
The reason I'm torn between the ideas is that the first option seems to be the more standard way to do it, and there seems to be more "structure" to it. Since they are hashtags, however, I don't see a lot of purpose in actually assigning a unique ID to each hashtag, since hashtags aren't true classifications like a category or genre.
Also, with the second design, when I make a search page for hashtags I would have to use fewer JOINs, since I wouldn't need to look up the ID of the searched term and then go to another table to find the posts associated with that ID.
Additionally, when trying to simply list the hashtags of a post, one thing that would be kind of annoying is that the hashtags may print out differently than a user may have stylized them in their post. So for example if a user adds #testing, but another user had previously entered a post with #TeStIng, the hashtag for the post would then print out #TeStIng, since that's how it would have been saved in the database lookup table. Of course you could make it case-sensitive but in searches #testing and #TeStIng should be considered the same hashtag so that could get messy. Or am I wrong about this? Does anyone have a suggestion about how to avoid this scenario?
On the other hand my concern with the 2nd table design is that I fear it could become inefficient if the table grows to be huge, since looking up strings is slower than searching for integers (which I would be doing with the first design). However, since I would have to use more JOINs in the 1st design, would there actually be a performance difference? Just to be clear, when searching for strings themselves I would be using the = operator and not LIKE.
Similarly, I would imagine that the first design is more efficient if I wanted to make queries about the hashtags themselves, for example how many posts are using a certain hashtag and things like that. It would not be very difficult with the 2nd design either; I just wonder about the efficiency again.
Any thoughts on what may work better? The most important thing is that it is efficient to search by hashtag, so for example finding posts that have #test associated with them. Ideally, I would also like to be able to retrieve a post's hashtags from the database as the user stylized them in the post content. All other queries and functions around analyzing hashtags are secondary at this point.
Purely from a database normalization perspective, your second design would not be in 3NF. There's a reason you rely on "the key, the whole key, and nothing but the key": when the same fact is repeated across rows, a change to one copy creates a logical inconsistency. For example, suppose the stored hashtags include two spellings: #politics and #politic. The person who created the post with the second one decides to edit their post and changes the hashtag to #politics (perhaps because they made a typo). Which rows do you update?
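For concreteness, a sketch of the normalized 3-table design being defended here (MySQL DDL via PDO; all table and column names are illustrative):
$pdo->exec('CREATE TABLE posts (
    id      INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    content TEXT NOT NULL
)');
$pdo->exec('CREATE TABLE hashtags (
    id  INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    tag VARCHAR(140) NOT NULL UNIQUE
)');
$pdo->exec('CREATE TABLE hashtag_post (
    hashtag_id INT UNSIGNED NOT NULL,
    post_id    INT UNSIGNED NOT NULL,
    PRIMARY KEY (hashtag_id, post_id)
)');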
As for performance, I wouldn't worry about it in the least with the first design. Your database (like almost every major relational DBMS out there today) relies on B-tree indexes to keep insertion/deletion/search in your tables cheap, provided you properly index the columns you search on. Some engines can push lookups to O(1) with hash indexes in certain use cases, and you could even do that yourself down the road in a key/value cache store like Memcached/Redis. For the most part, indexing the hashtags to speed up the search for posts that use them is definitely the design you want, because the biggest cost isn't looking up a single hashtag (most searches will involve a single hashtag, I'm assuming, in this use case) but retrieving all of the posts that contain it.
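To make the lookup cost concrete, here is a hedged sketch of the "#test" search under each design (PDO; the post_hashtags table for design 2 and all other names are assumptions):
// Design 1: two joins through the lookup and link tables.
$stmt = $pdo->prepare(
    'SELECT p.* FROM posts p
      JOIN hashtag_post hp ON hp.post_id = p.id
      JOIN hashtags h      ON h.id = hp.hashtag_id
     WHERE h.tag = ?'
);
$stmt->execute(array('#test'));

// Design 2: one join, comparing the string itself.
$stmt = $pdo->prepare(
    'SELECT p.* FROM posts p
      JOIN post_hashtags ph ON ph.post_id = p.id
     WHERE ph.tag = ?'
);
$stmt->execute(array('#test'));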
As for the case-insensitive portion of your question, your DBMS most likely has a collation option you can specify in your schema (like utf8_general_ci, where the ci suffix stands for case-insensitive comparison). The data will be stored as-is, but when compared to another value in a query, MySQL does the comparison in a case-insensitive manner.
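A minimal sketch of the collation idea, reusing the hashtags table from above (MySQL; utf8_general_ci is one common choice):
// Store as-is, compare case-insensitively.
$pdo->exec('ALTER TABLE hashtags
    MODIFY tag VARCHAR(140) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL');
// A row stored as '#TeStIng' still matches a search for '#testing':
$stmt = $pdo->prepare('SELECT id, tag FROM hashtags WHERE tag = ?');
$stmt->execute(array('#testing'));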

Which is the best way to fetch 1-M relationship data?

I am using the Yii framework and I have a Post that has many Comments. I need to get a list of posts, and each post should carry its latest 5 comments with it.
First option: make a join between the posts and comments tables, then in the PHP code normalize the posts list to remove the duplicates and put the comments inside each related post.
Second option: load the posts list, then in PHP code iterate over the posts and load the comments for each post with a separate SQL hit.
Which one has the best performance and is there any better way to do it?
You should never let the number of database hits grow with the number of rows in your data (the classic N+1 query problem). Therefore, the first option is the wiser one. If you do a join and then filter away the stuff you do not need, your program will be a lot faster than if you do one more database lookup for each row the previous query returned.
For more information, have a look at lazy and eager loading here: http://www.yiiframework.com/forum/index.php/topic/34412-eager-loading-vs-lazy-loading/
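In Yii 1.x terms, the first option is eager loading via with(). A hedged sketch, assuming a Post model with a comments HAS_MANY relation and a created_at column; note that a per-post "latest 5" cap cannot be expressed as a plain LIMIT on the joined query, so the trimming happens in PHP:
// One joined query; Yii de-duplicates the posts and groups the comments.
$posts = Post::model()->with(array(
    'comments' => array('order' => 'comments.created_at DESC'),
))->findAll();

foreach ($posts as $post) {
    $latest = array_slice($post->comments, 0, 5); // keep the newest 5
}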

Where to store search matches in cakephp?

I am writing an app in CakePHP that will perform scheduled searches for users and store the search results in a matches table. My question is: do I really need this Matches model in CakePHP to store the results? If the answer is no, how should I store the results?
Happy new year.
There are many ways to store data and the one you choose will depend on the data itself and the use to which it will be put (and when it will be used). Because you are doing scheduled searches, I assume that the user may not be around when the search is done, in which case the result needs to be stored.
In this case, I'd use the database. If you need to keep historical results this is definitely the way to go. If the results can be overwritten, you could use a text file per user, but that might get messy.
You don't need to use the main database - you could have another MySQL database, for example, or even a totally different kind of store such as a flat-file DB.
What would I do? I'd use a table in the main database and get on with something else.

What is the most efficient way to paginate my site when querying with SQL?

I am trying to paginate the results of an SQL query for use on a web page. The language and the database backend are PHP and SQLite.
The code I'm using works something like this (page numbering starts at 0):
http://example.com/table?page=0
$page   = isset($_GET['page']) ? max(0, (int) $_GET['page']) : 0;
$per    = 10;            // results per page
$offset = $page * $per;
// take one extra record so we know if a next link is needed
$stmt = $pdo->prepare('SELECT columns FROM table WHERE conditions LIMIT :per OFFSET :offset');
$stmt->bindValue(':per',    $per + 1, PDO::PARAM_INT);
$stmt->bindValue(':offset', $offset,  PDO::PARAM_INT);
$stmt->execute();
$resultset = $stmt->fetchAll(PDO::FETCH_ASSOC);
if ($page > 0)                { /* show a previous link */ }
if (count($resultset) > $per) { /* show a next link */ }
unset($resultset[$per]);
// display the results
Are there more efficient ways to do pagination than this?
One problem that I can see with my current method is that I must store all 10 (or however many) results in memory before I start displaying them. I do this because PDO does not guarantee that the row count will be available.
Is it more efficient to issue a COUNT(*) query to learn how many rows exist, then stream the results to the browser?
Is this one of those "it depends on the size of your table, and whether the count(*) query requires a full table scan in the database backend", "do some profiling yourself" kind of questions?
I've opted to go with the two-query COUNT(*) method, because it allows me to create a link directly to the last page, which the other method does not. Performing the count first also allows me to stream the results, so it should work well with higher numbers of records while using less memory.
Consistency between pages is not an issue for me. Thank you for your help.
There are several cases where I have a fairly complex (9-12 table join) query, returning many thousands of rows, which I need to paginate. Obviously to paginate nicely, you need to know the total size of the result. With MySQL databases, using the SQL_CALC_FOUND_ROWS directive in the SELECT can help you achieve this easily, although the jury is out on whether that will be more efficient for you to do.
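For MySQL readers, a minimal sketch of that directive (it does not exist in SQLite, and recent MySQL versions have deprecated it in favour of a separate COUNT(*)):
// Page query and total row count from one statement pair:
$rows  = $pdo->query('SELECT SQL_CALC_FOUND_ROWS * FROM table LIMIT 10 OFFSET 0')->fetchAll();
$total = (int) $pdo->query('SELECT FOUND_ROWS()')->fetchColumn();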
However, since you are using SQLite, I recommend sticking with the 2 query approach. Here is a very concise thread on the matter.
I'd suggest just doing the count first. A COUNT(primary_key) is a very efficient query.
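As a sketch, the two-query version of the pagination code above (placeholders kept from the question):
$total    = (int) $pdo->query('SELECT COUNT(*) FROM table /* WHERE conditions */')->fetchColumn();
$lastPage = max(0, (int) ceil($total / $per) - 1); // enables a direct link to the last page
// then run the LIMIT/OFFSET query and stream the rows as they arrive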
I doubt that it will be a problem for your users to wait for the backend to return ten rows. (You can make it up to them by specifying image dimensions, making the webserver negotiate compressed data transfers when possible, etc.)
I don't think that it will be very useful for you to do a count(*) initially.
If you are up to some complicated coding: When the user is looking at page x, use ajax-like magic to pre-load page x+1 for improved user experience.
A general note about pagination:
If the data changes while the user browses through your pages, it may be a problem if your solution demands a very high level of consistency. I've written a note about that elsewhere.
