Points system like stackoverflow - php

I am trying to create a point system in my program similar to stack overflow i.e. when the user does some good deed (activity) his/her points are increased. I am wondering what is the best way to go about implementing this in terms of db schema + logic.
I can think of three options:
1. Add an extra field called points to the users table and, every time a user does something, add to that field (but this cannot show any activity history).
2. Create a function that runs every time the user does a good deed, recalculates the value from scratch, and updates the points field.
3. Calculate the value every time using a function, without any points field.
What is the best way to go about this? Thank you for your time.

Personally, I would use the second option to approach this problem.
The first option limits functionality, so I eliminate that right away.
The third option is inefficient in terms of performance - it is likely that you will be fetching that number a lot, and, if your program is anything like stackoverflow, perhaps showing (calculating) that number many times per pageview/action.
To me, the second option is a decent hybrid solution. Normally, I hate having duplicated data in my system (actions and points, rather than one or the other), but in this case, an integer field is a rather small amount of space per user that saves you a lot of time in recalculating the values unnecessarily.
We must, at times, trade data storage space for performance or vice versa, and I would say that #2 is a trade-off that greatly benefits the application.

This depends very much on the number of expected computations you'll face. In fact, SO apparently uses a method which is similar to your 1) approach, for performance reasons I assume.
This also prevents jumps in the numbers if factors change (such as deleted items which awarded points, or here on SO replies which become community wiki, changes in the point rules, external actions such as joining another account here on SO etc.)
If a recalc solution (2) is what you want, you can implement "smart" caching: clear the value (set it to NULL, meaning "dirty") each time a point modification may take place, recompute it when it is NULL, and use the cached value otherwise. As a self-correcting measure for changes that were not flagged explicitly, you could also clear the values after an hour, a day, or whatever interval you think fits, so that a recalc is forced after a certain time regardless of the "dirty" state.
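The NULL-as-dirty idea can be sketched roughly as follows. This is only an illustration in Python with SQLite (the thread is about PHP and MySQL, but the SQL carries over); the users and actions tables and the function names are assumptions, not anything from the question.

```python
import sqlite3

# Hypothetical schema: users.points is a cache, NULL means "dirty".
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, points INTEGER);
CREATE TABLE actions (user_id INTEGER NOT NULL, points INTEGER NOT NULL);
"""

def record_action(conn, user_id, points):
    """Log the action and invalidate the cached total for that user."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT INTO actions (user_id, points) VALUES (?, ?)",
            (user_id, points),
        )
        conn.execute("UPDATE users SET points = NULL WHERE id = ?", (user_id,))

def get_points(conn, user_id):
    """Return the cached total, recomputing it only when it is dirty."""
    row = conn.execute(
        "SELECT points FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        raise KeyError(user_id)
    if row[0] is not None:
        return row[0]  # cache hit
    total = conn.execute(
        "SELECT COALESCE(SUM(points), 0) FROM actions WHERE user_id = ?",
        (user_id,),
    ).fetchone()[0]
    with conn:
        conn.execute("UPDATE users SET points = ? WHERE id = ?", (total, user_id))
    return total
```

The time-based reset mentioned above would just be a periodic `UPDATE users SET points = NULL`, which forces the next read to recompute.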

I would go for 1 and 2 (run in cron on every minute or so).
So that:
- The extra field would act as a cache for the number of points.
- The function that calculates the points could be a single SQL query that recalculates the points for all users at once, to gain some speed.
I think that recalculating the field each time a point is received would be overkill.
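A single-statement recalculation for all users might look like the sketch below. It assumes a hypothetical actions table with one row per awarded action; the cron scheduling itself is left out, and Python with SQLite is used only so the example is runnable.

```python
import sqlite3

# Correlated subquery: every user's cached total is rebuilt from the
# (assumed) actions table in one UPDATE statement.
RECALC_ALL = """
UPDATE users
SET points = (
    SELECT COALESCE(SUM(a.points), 0)
    FROM actions AS a
    WHERE a.user_id = users.id
)
"""

def recalc_all_points(conn):
    """Intended to be called from a scheduled job, e.g. once a minute."""
    with conn:
        conn.execute(RECALC_ALL)
```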

Personally, I'd go with the first option, and add an "Actions" table to keep track of your activity history.
When a user does something good, they get an entry in the "Actions" table, with the action and some point value. The point value can come from another table, or some config file. That same value gets added to the user record.
At any point in time, you could sum up the actions and get the user total, but for performance, simply updating when you add the action record would be simple enough.
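As a rough sketch of this "Actions table plus running total" design: the table layout, the action names, and the point values below are all made up for illustration, and SQLite stands in for whatever database the application uses.

```python
import sqlite3

# Hypothetical point values; the answer suggests a table or a config file.
POINT_VALUES = {"question_upvoted": 5, "answer_upvoted": 10, "answer_accepted": 15}

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, points INTEGER NOT NULL DEFAULT 0);
CREATE TABLE actions (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    action TEXT NOT NULL,
    points INTEGER NOT NULL
);
"""

def award(conn, user_id, action):
    """Record the action and bump the running total in one transaction."""
    value = POINT_VALUES[action]
    with conn:
        conn.execute(
            "INSERT INTO actions (user_id, action, points) VALUES (?, ?, ?)",
            (user_id, action, value),
        )
        conn.execute(
            "UPDATE users SET points = points + ? WHERE id = ?", (value, user_id)
        )
    return value

def audit_total(conn, user_id):
    """Recompute the total from the history, e.g. to verify the stored field."""
    return conn.execute(
        "SELECT COALESCE(SUM(points), 0) FROM actions WHERE user_id = ?",
        (user_id,),
    ).fetchone()[0]
```

Storing the point value on each action row means later rule changes do not silently alter the history; `audit_total` can be compared against `users.points` if the two are ever suspected to have drifted apart.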

How simple is your points system going to be?
I reckon some kind of logging/journaling is good, so that you can track activity on a daily/weekly/monthly basis across all users.

Check out http://code.google.com/p/userinfuser/
It's open source and lets you add points and badges to your application. It has Java, Python, PHP, and Ruby bindings.

Related

Collecting database changes for later

I'm currently facing a problem with my game, namely that it's gained a lot of activity lately.
The premise of the game involves making interactions with other players to give their characters EXP (in the hopes they will return the favour so you both get rewarded). However, this means there are a LOT of UPDATE x SET `exp` = ? WHERE `id` = ?-type queries.
In addition to this, a random user is chosen to receive double EXP, and this effect is "contagious" in that it passes to another based on activity during its time with the current host. This makes that particular user very desirable and likely to receive EXP, resulting in a concentration of activity there. (Race conditions aren't really a concern here, any EXP lost in a race condition won't end the world)
As a test I have temporarily disabled EXP gain and the server lag was greatly reduced as a result (still collecting results on that fact). This makes me think that the EXP gain is indeed the culprit here.
I already have Memcached saving the data, so for the most part the database is only being hit with UPDATE queries, very few SELECTs. What I would like to do is accumulate these EXP gains and apply them every so often via a Cron script to try and reduce the activity on that table. Essentially, rather than every single user's interaction modifying the table, it would be stored and then the updates applied by a single process.
However, the problem I then face is "how do I store these TODO EXP gains"?
While I can use Memcached to store them, Memcached isn't very good for arbitrary key/value pairs (eg. key=ID of thing to apply EXP to, value=EXP gained) and I'm unsure how I might retrieve that list.
The other option is to use a separate table that just lists the EXP gains. It would literally be CREATE TABLE `expgains` (`apply_to` INT..., `deltaexp` INT...) with the idea being it will only contain EXP changes (maybe a hundred thousand rows instead of ~20M) and no other data. The Cron script could then atomically read and wipe the table and apply those EXP changes at once.
However, I don't know if moving the problem will really fix it.
Any other ideas how to go about doing this? Or opinions on the above ideas?
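For illustration, the separate-table idea could be sketched like this. The expgains table follows the definition given above; the characters table name and the function names are assumptions. Python with SQLite is used only to keep the example runnable. On MySQL with many concurrent writers, the read-and-wipe step would need more care (for example swapping in a fresh table before processing the old one), which this sketch does not attempt.

```python
import sqlite3

SCHEMA = """
CREATE TABLE characters (id INTEGER PRIMARY KEY, exp INTEGER NOT NULL DEFAULT 0);
CREATE TABLE expgains (apply_to INTEGER NOT NULL, deltaexp INTEGER NOT NULL);
"""

def queue_exp(conn, character_id, delta):
    """Cheap append-only write performed on every interaction."""
    with conn:
        conn.execute(
            "INSERT INTO expgains (apply_to, deltaexp) VALUES (?, ?)",
            (character_id, delta),
        )

def flush_exp(conn):
    """Run from a scheduled job: apply one summed UPDATE per character."""
    with conn:
        totals = conn.execute(
            "SELECT apply_to, SUM(deltaexp) FROM expgains GROUP BY apply_to"
        ).fetchall()
        conn.executemany(
            "UPDATE characters SET exp = exp + ? WHERE id = ?",
            [(total, character_id) for character_id, total in totals],
        )
        conn.execute("DELETE FROM expgains")
    return len(totals)  # number of characters that were updated
```

The gain is that a "hot" character receiving hundreds of interactions between runs costs one UPDATE on the big table instead of hundreds.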

What are the number of ways in which my approach to a news-feed is wrong?

This question has been asked a THOUSAND times... so it's not unfair if you decide to skip reading/answering it, but I still thought people would like to see and comment on my approach...
I'm building a site which requires an activity feed, like FourSquare.
But my site has this feature for the eye-candy's sake, and doesn't need the stuff to be saved forever.
So, I write the event_type and user_id to a MySQL table. Before writing new events to the table, I delete all the older, unnecessary rows (by counting the total number of rows, getting the event_id lesser than which everything is redundant, and deleting those rows). I prune the table, and write a new row every time an event happens. There's another user_text column which is NULL if there is no user-generated text...
In the front-end, I have jQuery that checks with a PHP file via GET every x seconds the user has the site open. The jQuery sends a request with the last update "id" it received. The <div> tags generated by my backend have the "id" attribute set as the MySQL row id. This way, I don't have to save the last_received_id in memory, though I guess there's absolutely no performance impact from storing one variable with a very small int value in memory...
I have a function that generates an "update text" depending on the event_type and user_id I pass it from the jQuery, and whether the user_text column is empty. The update text is passed back to jQuery, which appends the freshly received event <div> to the feed with some effects, while simultaneously getting rid of the "tail end" event <div> with an effect.
If I (more importantly, the client) want to, I can have an "event archive" table in my database (or a different one) that saves up all those redundant rows before deleting. This way, event information will be saved forever, while not impacting the performance of the live site...
I'm using CodeIgniter, so there's no question of repeated code anywhere. All the pertinent functions go into a LiveUpdates class in the library and model respectively.
I'm rather happy with the way I'm doing it because it solves the problem at hand while sticking to the KISS ideology... but still, can anyone please point me to some resources that show a better way to do it? A Google search on this subject reveals too many articles/SO questions, and I would like to benefit from the experience of any other developer who has already trawled through them and found the best approach...
If you use proper indexes there's no reason you couldn't keep all the events in one table without affecting performance.
If you craft your polling correctly to return nothing when there is nothing new you can minimize the load each client has on the server. If you also look into push notification (the hybrid delayed-connection-closing method) this will further help you scale big successfully.
Finally, it is completely unnecessary to worry about variable storage in the client. This is premature optimization. The performance issues are going to be in the avalanche of connections to the web server from many users, and in the DB, tables without proper indexes.
About indexes: An index is "proper" when the most common query against a table can be performed with a seek and a minimal number of reads (like 1-5). In your case, this could be an incrementing id or a date (if it has enough precision). If you design it right, the operation to find the most recent update_id should be a single read. Then when your client submits its ajax request to see if there is updated content, first do a query to see if the value submitted (id or time) is less than the current value. If so, respond immediately with the new content via a second query. Keeping the "ping" action as lightweight as possible is your goal, even if this incurs a slightly greater cost for when there is new content.
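A minimal sketch of that lightweight ping, assuming an events table whose auto-incrementing primary key is event_id (column names are taken loosely from the question; the function name is made up). Python with SQLite is used here only for a runnable illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE events (
    event_id INTEGER PRIMARY KEY,   -- indexed, so MAX() is a single seek
    event_type TEXT NOT NULL,
    user_id INTEGER NOT NULL,
    user_text TEXT
);
"""

def poll(conn, last_seen_id):
    """Return events newer than last_seen_id, or [] after one cheap check."""
    latest = conn.execute("SELECT MAX(event_id) FROM events").fetchone()[0]
    if latest is None or latest <= last_seen_id:
        return []  # the common case: nothing new, no second query
    return conn.execute(
        "SELECT event_id, event_type, user_id, user_text FROM events "
        "WHERE event_id > ? ORDER BY event_id",
        (last_seen_id,),
    ).fetchall()
```

The client sends the highest id it has rendered and gets back either nothing or the new rows in the same response.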
Using a push would be far better, though, so please explore Comet.
If you don't know how many reads are going on with your queries then I encourage you to explore this aspect of the database so you can find it out and assess it properly.
Update: offering the idea of clients getting a "yes there's new content" answer and then actually requesting the content was perhaps not the best. Please see Why the Fat Pings Win for some very interesting related material.

Optimizing queries for the next and previous element

I am looking for the best way to retrieve the next and previous records of a record without running a full query. I have a fully implemented solution in place, and would like to know whether there are any better approaches to do this out there.
Let's say we are building a web site for a fictitious greengrocer. In addition to his HTML pages, every week, he wants to publish a list of special offers on his site. He wants those offers to reside in an actual database table, and users have to be able to sort the offers in three ways.
Every item also has to have a detail page with more, textual information on the offer and "previous" and "next" buttons. The "previous" and "next" buttons need to point to the neighboring entries depending on the sorting the user had chosen for the list.
(source: pekkagaiser.com)
Obviously, the "next" button for "Tomatoes, Class I" has to be "Apples, class 1" in the first example, "Pears, class I" in the second, and none in the third.
The task in the detail view is to determine the next and previous items without running a query every time, with the sort order of the list as the only available information (Let's say we get that through a GET parameter ?sort=offeroftheweek_price, and ignore the security implications).
Obviously, simply passing the IDs of the next and previous elements as a parameter is the first solution that comes to mind. After all, we already know the ID's at this point. But, this is not an option here - it would work in this simplified example, but not in many of my real world use cases.
My current approach in my CMS is using something I have named "sorting cache". When a list is loaded, I store the item positions in records in a table named sortingcache.
name (VARCHAR)            items (TEXT)
offeroftheweek_unsorted   Lettuce; Tomatoes; Apples I; Apples II; Pears
offeroftheweek_price      Tomatoes; Pears; Apples I; Apples II; Lettuce
offeroftheweek_class_asc  Apples II; Lettuce; Apples; Pears; Tomatoes
Obviously, the items column is really populated with numeric IDs.
In the detail page, I now access the appropriate sortingcache record, fetch the items column, explode it, search for the current item ID, and return the previous and next neighbour.
array("current" => "Tomatoes",
"next" => "Pears",
"previous" => null
);
This is obviously expensive, works for a limited number of records only and creates redundant data, but let's assume that in the real world, the query to create the lists is very expensive (it is), running it in every detail view is out of the question, and some caching is needed.
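The lookup described above (fetch the items column, split it, find the current item, return its neighbours) can be sketched as follows. The function name is made up, and Python is used purely for illustration; in PHP the same steps would use explode() and array_search().

```python
def neighbours(items, current):
    """Find the previous/next entries of `current` in a ';'-separated list."""
    ids = [part.strip() for part in items.split(";") if part.strip()]
    pos = ids.index(current)  # raises ValueError if the item is not cached
    return {
        "current": current,
        "next": ids[pos + 1] if pos + 1 < len(ids) else None,
        "previous": ids[pos - 1] if pos > 0 else None,
    }
```

With the offeroftheweek_price row shown earlier, `neighbours("Tomatoes;Pears;Apples I; Apples II; Lettuce", "Tomatoes")` has "Pears" as next and no previous item.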
My questions:
Do you think this is a good practice to find out the neighbouring records for varying query orders?
Do you know better practices in terms of performance and simplicity? Do you know something that makes this completely obsolete?
In programming theory, is there a name for this problem?
Is the name "Sorting cache" appropriate and understandable for this technique?
Are there any recognized, common patterns to solve this problem? What are they called?
Note: My question is not about building the list, or how to display the detail view. Those are just examples. My question is the basic functionality of determining the neighbors of a record when a re-query is impossible, and the fastest and cheapest way to get there.
If something is unclear, please leave a comment and I will clarify.
Starting a bounty - maybe there is some more info on this out there.
Here is an idea. You could offload the expensive operations to an update when the grocer inserts/updates new offers rather than when the end user selects the data to view. This may seem like a non-dynamic way to handle the sort data, but it may increase speed. And, as we know, there is always a trade off between performance and other coding factors.
Create a table to hold next and previous for each offer and each sort option. (Alternatively, you could store this in the offer table if you will always have three sort options -- query speed is a good reason to denormalize your database)
So you would have these columns:
Sort Type (Unsorted, Price, Class and Price Desc)
Offer ID
Prev ID
Next ID
When the detail information for the offer detail page is queried from the database, the NextID and PrevID would be part of the results. So you would only need one query for each detail page.
Each time an offer is inserted, updated or deleted, you would need to run a process which validates the integrity/accuracy of the sorttype table.
I have an idea somewhat similar to Jessica's. However, instead of storing links to the next and previous sort items, you store the sort position for each sort type. To find the next or previous record, just get the row with SortX = currentSort + 1 or SortX = currentSort - 1.
Example:
Type      Class  Price  Sort1  Sort2  Sort3
Lettuce   2      0.89   0      4      0
Tomatoes  1      1.50   1      0      4
Apples    1      1.10   2      2      2
Apples    2      0.95   3      3      1
Pears     1      1.25   4      1      3
This solution would yield very short query times, and would take up less disk space than Jessica's idea. However, as I'm sure you realize, the cost of updating one row of data is notably higher, since you have to recalculate and store all sort orders. But still, depending on your situation, if data updates are rare and especially if they always happen in bulk, then this solution might be the best.
i.e.
once_per_day
add/delete/update all records
recalculate sort orders
Hope this is useful.
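A sketch of the neighbour lookup with stored sort positions, under the assumption that the offers table has an integer id plus the three position columns from the example (the lowercase column names and the mapping from sort key to column are made up):

```python
import sqlite3

# Whitelist mapping the user-supplied sort key to a column name, so the
# column can be interpolated into the SQL without risking injection.
SORT_COLUMNS = {"unsorted": "sort1", "price": "sort2", "class": "sort3"}

def neighbours_by_position(conn, sort_key, offer_id):
    """Return (previous_id, next_id) for an offer under the given sort."""
    col = SORT_COLUMNS[sort_key]
    row = conn.execute(
        f"SELECT {col} FROM offers WHERE id = ?", (offer_id,)
    ).fetchone()
    if row is None:
        return None, None
    pos = row[0]
    found = dict(
        conn.execute(
            f"SELECT {col}, id FROM offers WHERE {col} IN (?, ?)",
            (pos - 1, pos + 1),
        ).fetchall()
    )
    return found.get(pos - 1), found.get(pos + 1)
```

With an index on each position column, both queries are simple seeks, which is where the very short query times come from.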
I've had nightmares with this one as well. Your current approach seems to be the best solution, even for lists of 10k items: cache the IDs of the list view in the HTTP session, then use them to display the previous/next items (personalized to the current user). This works especially well when there are too many ways to filter and sort the initial list of items, instead of just 3.
Also, by storing the whole IDs list you get to display a "you are at X out of Y" usability enhancing text.
By the way, this is what JIRA does as well.
To directly answer your questions:
Yes, it's good practice because it scales without any added code complexity when your filter/sorting and item types grow more complex. I'm using it in a production system with 250k articles with "infinite" filter/sort variations. Trimming the cacheable IDs to 1000 is also a possibility, since the user will most probably never click on prev or next more than 500 times (he'll most probably go back and refine the search or paginate).
I don't know of a better way. But if the sorts were limited and this was a public site (with no HTTP session), then I'd most probably denormalize.
Dunno.
Yes, sorting cache sounds good. In my project I call it "previous/next on search results" or "navigation on search results".
Dunno.
In general, I denormalize the data from the indexes. They may be stored in the same rows, but I almost always retrieve my result IDs, then make a separate trip for the data. This makes caching the data very simple. It's not so important in PHP where the latency is low and the bandwidth high, but such a strategy is very useful when you have a high latency, low bandwidth application, such as an AJAX website where much of the site is rendered in JavaScript.
I always cache the lists of results, and the results themselves separately. If anything affects the results of a list query, the cache of the list results is refreshed. If anything affects the results themselves, those particular results are refreshed. This allows me to update either one without having to regenerate everything, resulting in effective caching.
Since my lists of results rarely change, I generate all the lists at the same time. This may make the initial response slightly slower, but it simplifies cache refreshing (all the lists get stored in a single cache entry).
Because I have the entire list cached, it's trivial to find neighbouring items without revisiting the database. With luck, the data for those items will also be cached. This is especially handy when sorting data in JavaScript. If I already have a copy cached on the client, I can resort instantly.
To answer your questions specifically:
Yes, it's a fantastic idea to find out the neighbours ahead of time, or whatever information the client is likely to access next, especially if the cost is low now and the cost to recalculate is high. Then it's simply a trade off of extra pre-calculation and storage versus speed.
In terms of performance and simplicity, avoid tying things together that are logically different things. Indexes and data are different, are likely to be changed at different times (e.g. adding a new datum will affect the indexes, but not the existing data), and thus should be accessed separately. This may be slightly less efficient from a single-threaded standpoint, but every time you tie something together, you lose caching effectiveness and asynchronicity (the key to scaling is asynchronicity).
The term for getting data ahead of time is pre-fetching. Pre-fetching can happen at the time of access or in the background, but before the pre-fetched data is actually needed. Likewise with pre-calculation. It's a trade-off of cost now, storage cost, and cost to get when needed.
"Sorting cache" is an apt name.
I don't know.
Also, when you cache things, cache them at the most generic level possible. Some stuff might be user specific (such as results for a search query), where others might be user agnostic, such as browsing a catalog. Both can benefit from caching. The catalog query might be frequent and save a little each time, and the search query may be expensive and save a lot a few times.
I'm not sure whether I understood right, so if not, just tell me ;)
Let's say, that the givens are the query for the sorted list and the current offset in that list, i.e. we have a $query and an $n.
A very obvious solution to minimize the queries, would be to fetch all the data at once:
list($prev, $current, $next) = DB::q($query . ' LIMIT ?i, 3', $n - 1)->fetchAll(PDO::FETCH_NUM);
That statement fetches the previous, the current and the next elements from the database in the current sorting order and puts the associated information into the corresponding variables.
But as this solution is too simple, I assume I misunderstood something.
There are as many ways to do this as to skin the proverbial cat. So here are a couple of mine.
If your original query is expensive, which you say it is, then create another table, possibly a memory table, and populate it with the results of your expensive and seldom-run main query.
This second table could then be queried on every view, and the sorting is as simple as setting the appropriate sort order.
As required, repopulate the second table with results from the first table, thus keeping the data fresh while minimising the use of the expensive query.
Alternatively, if you want to avoid even connecting to the db, you could keep all the data in a PHP array and store it using memcached. This would be very fast and, provided your lists weren't too huge, would be resource efficient. It can also be sorted easily.
DC
Basic assumptions:
Specials are weekly
We can expect the site to change infrequently... probably daily?
We can control updates to the database with either an API or respond via triggers
If the site changes on a daily basis, I suggest that all the pages are statically generated overnight. One query for each sort-order iterates through and makes all the related pages. Even if there are dynamic elements, odds are that you can address them by including the static page elements. This would provide optimal page service and no database load. In fact, you could possibly generate separate pages and prev / next elements that are included into the pages. This may be crazier with 200 ways to sort, but with 3 I'm a big fan of it.
?sort=price
include(/sorts/$sort/tomatoes_class_1)
/*tomatoes_class_1 is probably a numeric id; sanitize your sort key... use numerics?*/
If for some reason this isn't feasible, I'd resort to memoization. Memcache is popular for this sort of thing (pun!). When something is pushed to the database, you can issue a trigger to update your cache with the correct values. Do this in the same way you would if your updated item existed in 3 linked lists: relink as appropriate (this.next.prev = this.prev, etc). From that, as long as your cache doesn't overfill, you'll be pulling simple values from memory in a primary key fashion.
This method will take some extra coding on the select and update / insert methods, but it should be fairly minimal. In the end, you'll be looking up [id of tomatoes class 1].price.next. If that key is in your cache, golden. If not, insert into cache and display.
Do you think this is a good practice to find out the neighboring records for varying query orders? Yes. It is wise to perform look-aheads on expected upcoming requests.
Do you know better practices in terms of performance and simplicity? Do you know something that makes this completely obsolete? Hopefully the above
In programming theory, is there a name for this problem? Optimization?
Is the name "Sorting cache" appropriate and understandable for this technique? I'm not sure of a specific appropriate name. It is caching, it is a cache of sorts, but I'm not sure that telling me you have a "sorting cache" would convey instant understanding.
Are there any recognized, common patterns to solve this problem? What are they called? Caching?
Sorry my trailing answers are kind of useless, but I think my narrative solutions should be quite useful.
You could save the row numbers of the ordered lists into views, and you could reach the previous and next items in the list under (current_rownum-1) and (current_rownum+1) row numbers.
The problem / data structure is called a bi-directional graph, or you could say you've got several linked lists.
If you think of it as a linked list, you could just add fields to the items table for every sorting and prev / next key. But the DB Person will kill you for that, it's like GOTO.
If you think of it as a (bi-)directional graph, you go with Jessica's answer. The main problem there is that order updates are expensive operations.
Item  Next  Prev
A     B     -
B     C     A
C     D     B
...
If you change one item's position to the new order A, C, B, D, you will have to update 4 rows.
Apologies if I have misunderstood, but I think you want to retain the ordered list between user accesses to the server. If so, your answer may well lie in your caching strategy and technologies rather than in database query/ schema optimization.
My approach would be to serialize() the array once it's first retrieved, and then cache that in a separate storage area, whether that's memcached/APC/hard drive/MongoDB/etc., and retain its cache location details for each user individually through their session data. The actual storage backend would naturally depend on the size of the array, which you don't go into much detail about, but memcached scales great over multiple servers, and mongo even further at a slightly greater latency cost.
You also don't indicate how many sort permutations there are in the real world; e.g. do you need to cache separate lists per user, or can you globally cache per sort permutation and then filter out what you don't need via PHP? In the example you give, I'd simply cache both permutations and store which of the two I needed to unserialize() in the session data.
When the user returns to the site, check the Time To Live value of the cached data and re-use it if still valid. I'd also have a trigger running on INSERT/ UPDATE/ DELETE for the special offers that simply sets a timestamp field in a separate table. This would immediately indicate whether the cache was stale and the query needed to be re-run for a very low query cost. The great thing about only using the trigger to set a single field is that there's no need to worry about pruning old/ redundant values out of that table.
Whether this is suitable would depend upon the size of the data being returned, how frequently it was modified, and what caching technologies are available on your server.
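The trigger-based staleness check could be sketched like this. Note one deliberate substitution: instead of a timestamp field, the sketch bumps an integer version counter, which avoids clock-resolution issues but serves the same purpose. The table, trigger, and function names are assumptions, and SQLite via Python is used only to make the example runnable.

```python
import sqlite3

SCHEMA = """
CREATE TABLE offers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL);

-- Single-row table; nothing ever accumulates here, so no pruning is needed.
CREATE TABLE cache_stamp (id INTEGER PRIMARY KEY CHECK (id = 1), version INTEGER NOT NULL);
INSERT INTO cache_stamp (id, version) VALUES (1, 0);

CREATE TRIGGER offers_after_insert AFTER INSERT ON offers
BEGIN
    UPDATE cache_stamp SET version = version + 1 WHERE id = 1;
END;
CREATE TRIGGER offers_after_update AFTER UPDATE ON offers
BEGIN
    UPDATE cache_stamp SET version = version + 1 WHERE id = 1;
END;
CREATE TRIGGER offers_after_delete AFTER DELETE ON offers
BEGIN
    UPDATE cache_stamp SET version = version + 1 WHERE id = 1;
END;
"""

def current_version(conn):
    """One primary-key read; store the result alongside the cached list."""
    return conn.execute("SELECT version FROM cache_stamp WHERE id = 1").fetchone()[0]

def is_stale(conn, cached_version):
    """True if any offer changed since the cached list was built."""
    return current_version(conn) != cached_version
```

The cached list would be stored together with the version it was built from; on each request, `is_stale` decides whether the expensive query has to be re-run.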
So you have two tasks:
build sorted list of items (SELECTs with different ORDER BY)
show details about each item (SELECT details from database with possible caching).
What is the problem?
PS: if the ordered list may be too big, you just need pager functionality. There are different implementations; e.g. you could add "LIMIT 5" to the query and provide a "Show next 5" button. When this button is pressed, a condition like "WHERE price < 0.89 LIMIT 5" is added.
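A sketch of that pager, assuming the list is sorted by price in descending order. The id tie-breaker is an addition not mentioned above: without it, offers sharing the same price as the last row of a page could be skipped. Table and function names are illustrative.

```python
import sqlite3

def next_page(conn, last_price=None, last_id=None, page_size=5):
    """Fetch the page that follows the row (last_price, last_id)."""
    if last_price is None:  # first page: no condition yet
        return conn.execute(
            "SELECT id, name, price FROM offers "
            "ORDER BY price DESC, id ASC LIMIT ?",
            (page_size,),
        ).fetchall()
    return conn.execute(
        "SELECT id, name, price FROM offers "
        "WHERE price < ? OR (price = ? AND id > ?) "
        "ORDER BY price DESC, id ASC LIMIT ?",
        (last_price, last_price, last_id, page_size),
    ).fetchall()
```

The "Show next 5" button would pass along the price and id of the last row currently displayed.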

Need some suggestion for a database schema design

I'm designing a very simple (in terms of functionality) but difficult (in terms of scalability) system where users can message each other. Think of it as a very simple chatting service. A user can insert a message through a php page. The message is short and has a recipient name.
On another php page, the user can view all the messages that were sent to him all at once and then deletes them on the database. That's it. That's all the functionality needed for this system. How should I go about designing this (from a database/php point of view)?
So far I have the table like this:
field1 -> message (varchar)
field2 -> recipient (varchar)
Now for sql insert, I find that the time it takes is constant regardless of number of rows in the database. So my send.php will have a guaranteed return time which is good.
But for pulling down messages, my pull.php will take longer as the number of rows increase! I find the sql select (and delete) will take longer as the rows grow and this is true even after I have added an index for the recipient field.
Now, if it was simply the case that users will have to wait a longer time before their messages are pulled on the php then it would have been OK. But what I am worried is that when each pull.php service time takes really long, the php server will start to refuse connections to some request. Or worse the server might just die.
So the question is, how to design this such that it scales? Any tips/hints?
PS. Some estimates of the numbers:
number of users starts with 50,000 and goes up.
each user on average has around 10 messages stored before the other end pulls them down.
each user sends around 10-20 messages a day.
UPDATE from reading the answers so far:
I just want to clarify that pulling down fewer messages in pull.php does not help. Even pulling just one message will take a long time when the table is huge. This is because the table has all the messages, so you have to do a select like this:
select message from DB where recipient = 'John'
even if you change it to this it doesn't help much
select top 1 message from DB where recipient = 'John'
So far from the answers it seems like the longer the table the slower the select will be O(n) or slightly better, no way around it. If that is the case, how should I handle this from the php side? I don't want the php page to fail on the http because the user will be confused and end up refreshing like mad which makes it even worse.
The database design for this is as simple as you suggest. As for it taking longer once the user has more messages, you can just paginate the results: show the first 10/50/100 or whatever makes sense, and pull only those records. Generally speaking, your times shouldn't increase very much unless the volume of messages increases by an order of magnitude or more. You should be able to pull back 1000 short messages in way less than a second. It may take more time for the page to display at that point, but that's where the pagination should help.
I would suggest though going through and thinking of future features and building your database out a little more based on that. Adding more features to the software is easy, changing the database is comparatively harder.
Follow the rules of normalization. Try to reach 3rd normal form. To go further for this type of application probably isn’t worth it. Keep your tables thin.
Don’t actually delete rows; just mark them as deleted with a bit flag. If you really need to remove them for maintenance or cleanup to reduce size, mark them as deleted and then run a cleanup process that archives or removes the records during low-usage hours.
Integer values are easier for a SQL server to deal with than character values. So instead of WHERE recipient = 'John', use WHERE Recipient_ID = 23. You will gain this type of behavior when you normalize your database.
Don't use VARCHAR for your recipient. It's best to make a Recipient table with a primary key that is an integer (or bigint if you are expecting extremely large quantities of people).
Then when you do your select statements:
SELECT message FROM DB WHERE recipient = 52;
The speed retrieving rows will be much faster.
Plus, I believe MySQL indexes are B-Trees, which is O(log n) for most cases.
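To illustrate the integer-key suggestion, here is a small sketch. The schema, index name, and function names are assumptions, and it runs on SQLite via Python rather than MySQL, so the query-plan text is SQLite's; on MySQL the equivalent check would be EXPLAIN.

```python
import sqlite3

SCHEMA = """
CREATE TABLE recipients (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    recipient_id INTEGER NOT NULL REFERENCES recipients(id),
    message TEXT NOT NULL
);
CREATE INDEX idx_messages_recipient ON messages (recipient_id);
"""

def pull_query_plan(conn, recipient_id):
    """Ask the database how it would run the pull query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT message FROM messages WHERE recipient_id = ?",
        (recipient_id,),
    ).fetchall()
    return " ".join(row[-1] for row in rows)  # last column is the description

def pull_messages(conn, recipient_id):
    """Read and delete a recipient's messages in one transaction."""
    with conn:
        rows = conn.execute(
            "SELECT id, message FROM messages WHERE recipient_id = ? ORDER BY id",
            (recipient_id,),
        ).fetchall()
        conn.executemany(
            "DELETE FROM messages WHERE id = ?", [(row[0],) for row in rows]
        )
    return [row[1] for row in rows]
```

If the plan mentions the index rather than a full table scan, lookup cost should grow roughly with the logarithm of the table size plus the number of messages returned, not with the total number of rows.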
A database table without an index is called a heap. Querying a heap results in each row of the table being evaluated, even with a WHERE clause; the big-O complexity for a heap is O(n), with n being the number of rows in the table. Adding an index (and this really depends on the underlying aspects of your database engine) results in a complexity of O(log(n)) to find the matching row in the table. This is because the index most certainly is implemented as some kind of B-tree. Adding rows to the table, even with an index present, is an O(1) operation.
> But for pulling down messages, my pull.php will take longer as the number of rows increase! I find the sql select (and delete) will take longer as the rows grow and this is true even after I have added an index for the recipient field.
UNLESS you are inserting into the middle of an index, at which point the database engine will need to shift rows down to accommodate. The same occurs when you delete from the index. Remember there is more than one kind of index. Be sure that the index you are using is not a clustered index as more data must be sifted through and moved with inserts and deletes.
FlySwat has given the best option available to you... do not use an RDBMS because your messages are not relational in a formal sense. You will get much better performance from a file system.
dbarker has also given correct answers. I do not know why he has been voted down 3 times, but I will vote him up at the risk that I may lose points. dbarker is referring to "Vertical Partitioning" and his suggestion is both acceptable and good. This isn't rocket surgery people.
My suggestion is to not implement this kind of functionality in your RDBMS. If you do, remember that select, update, insert, and delete all place locks on pages in your table. If you do go forward with putting this functionality into a database, then run your selects with a nolock locking hint, if it is available on your platform, to increase concurrency. Additionally, if you have that many concurrent users, partition your tables vertically as dbarker suggested and place these database files on separate drives (not just volumes but separate hardware) to increase I/O concurrency.
So the question is, how to design this such that it scales? Any tips/hints?
Yes, you don't want to use a relational database for message queuing. What you are trying to do is not what a relational database is best designed for, and while you can do it, it's kind of like driving in a nail with a screwdriver.
Instead, look at one of the many open source message queues out there. The guys at SecondLife have a neat wiki where they reviewed a lot of them.
http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes
This is an unavoidable problem - more messages, more time to find the requested ones. The only thing you can do is what you already did - add an index and turn O(n) look up time for a complete table scan into O(log(u) + m) for a clustered index look up where n is the number of total messages, u the number of users, and m the number of messages per user.
Limit the number of rows that your pull.php will display at any one time.
The more data you transfer, the longer it will take to display the page, regardless of how great your DB is.
You must limit your data in the SQL: return only the most recent N rows.
EDIT
Put an index on Recipient and it will speed it up. You'll need another column to distinguish rows if you want to take the top 50 or something, possibly SendDate or an auto incrementing field. A Clustered index will slow down inserts, so use a regular index there.
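A minimal sketch of "index plus top N", assuming an auto-incrementing `id` as the column that distinguishes rows and a hypothetical composite index on `(recipient_id, id)`. It is written against SQLite via Python's sqlite3 module so it can be run as-is; MySQL accepts the same `ORDER BY ... DESC LIMIT` shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,  -- orders rows by arrival
    recipient_id INTEGER NOT NULL,
    body         TEXT NOT NULL
);
-- Regular (non-clustered) index: find the recipient, then walk ids in order.
CREATE INDEX idx_messages_recipient_id ON messages (recipient_id, id);
""")
conn.executemany(
    "INSERT INTO messages (recipient_id, body) VALUES (?, ?)",
    [(23, f"msg {i}") for i in range(1, 201)],
)

def latest_messages(conn, recipient_id, limit=50):
    """Return at most `limit` of the newest messages for one recipient."""
    rows = conn.execute(
        "SELECT body FROM messages"
        " WHERE recipient_id = ?"
        " ORDER BY id DESC"
        " LIMIT ?",
        (recipient_id, limit),
    )
    return [row[0] for row in rows]
```

However large the table grows, each call transfers at most `limit` rows to the page.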
You could always have only one row per user and just concatenate messages together into one long record. If you're keeping messages for a long period of time, that isn't the best way to go, but it reduces your problem to a single find and concatenate at storage time and a single find at retrieve time. It's hard to say without more detail - part of what makes DB design hard is meeting all the goals of the system in a well-compromised way. Without all the details, it's hard to give advice on the best compromise.
EDIT: I thought I was fairly clear on this, but evidently not: You would not do this unless you were blanking a reader's queue when he reads it. This is why I prompted for clarification.

How to increase performance for MySQL database

How can I improve the performance of my MySQL database? My website is hosted on a shared server, and the host has suspended my account because of "too many queries".
The staff told me to "index", "cache", or trim my database.
I don't know what "index" and "cache" mean or how to do them in PHP.
Thanks.
What an index is:
Think of a database table as a library - you have a big collection of books (records), each with associated data (author name, publisher, publication date, ISBN, content). Also assume that this is a very naive library, where all the books are shelved in order by ISBN (primary key). Just as the books can only have one physical ordering, a database table can only have one primary key index.
Now imagine someone comes to the librarian (database program) and says, "I would like to know how many Nora Roberts books are in the library". To answer this question, the librarian has to walk the aisles and look at every book in the library, which is very slow. If the librarian gets many requests like this, it is worth his time to set up a card catalog by author name (index on name) - then he can answer such questions much more quickly by referring to the catalog instead of walking the shelves. Essentially, the index sets up an 'alternative ordering' of the books - it treats them as if they were sorted alphabetically by author.
Notice that 1) it takes time to set up the catalog, 2) the catalog takes up extra space in the library, and 3) it complicates the process of adding a book to the library - instead of just sticking a book on the shelf in order, the librarian also has to fill out an index card and add it to the catalog. In just the same way, adding an index on a database field can speed up your queries, but the index itself takes storage space and slows down inserts. For this reason, you should only create indexes in response to need - there is no point in indexing a field you rarely search on.
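To make the card catalog concrete, here is a small sketch that builds a hypothetical `books` table, asks the engine how it plans to answer the "how many Nora Roberts books" question, then adds an index on `author` and asks again. It uses SQLite through Python's sqlite3 module, where the command is `EXPLAIN QUERY PLAN`; MySQL has its own `EXPLAIN` with different output, so treat the plan text here as SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (isbn TEXT PRIMARY KEY, author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?)",
    [
        (f"isbn-{i}", "Nora Roberts" if i % 10 == 0 else f"Author {i}", f"Title {i}")
        for i in range(1000)
    ],
)

QUERY = "SELECT COUNT(*) FROM books WHERE author = ?"

def query_plan(conn):
    """Return SQLite's description of how it would run QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Nora Roberts",)).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the plan text

plan_before = query_plan(conn)  # no catalog yet: every book is checked

# The "card catalog by author name".
conn.execute("CREATE INDEX idx_books_author ON books (author)")

plan_after = query_plan(conn)   # the plan should now mention idx_books_author
count = conn.execute(QUERY, ("Nora Roberts",)).fetchone()[0]
```

Printing `plan_before` and `plan_after` shows the librarian switching from walking the shelves to consulting the catalog; the answer to the query itself does not change.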
What caching is:
If the librarian has many people coming in and asking the same questions over and over, it may be worth his time to write the answer down at the front desk. Instead of checking the stacks or the catalog, he can simply say, "here is the answer I gave to the last person who asked that question".
In your script, this may apply in different ways. You can store the results of a database query or a calculation or part of a rendered web page; you can store it to a secondary database table or a file or a session variable or to a memory service like memcached. You can store a pre-parsed database query, ready to run. Some libraries like Smarty will automatically store part or all of a page for you. By storing the result and reusing it you can avoid doing the same work many times.
In every case, you have to worry about how long the answer will remain valid. What if the library got a new book in? Is it OK to use an answer that may be five minutes out of date? What about a day out of date?
Caching is very application-specific; you will have to think about what your data means, how often it changes, how expensive the calculation is, how often the result is needed. If the data changes slowly, it may be best to recalculate and store the result every time a change is made; if it changes often but is not crucial, it may be sufficient to update only if the cached value is more than a certain age.
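A minimal sketch of the "update only if the cached value is more than a certain age" strategy, plus explicit invalidation for the "recalculate when a change is made" case. The class `TTLCache` and its methods are hypothetical names for illustration; a real site might keep the values in memcached or a file instead of an in-process dictionary.

```python
import time

class TTLCache:
    """Keep computed answers and reuse them until they are too old."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the age check can be tested
        self._store = {}            # key -> (time stored, value)

    def get_or_compute(self, key, compute):
        """Return the cached value if it is fresh, otherwise recompute it."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]         # "the answer I gave the last person"
        value = compute()           # the expensive query or calculation
        self._store[key] = (now, value)
        return value

    def invalidate(self, key):
        """Forget an answer, e.g. right after the underlying data changes."""
        self._store.pop(key, None)
```

Usage would look like `cache.get_or_compute("front_page", run_front_page_query)`, where `run_front_page_query` is whatever function currently hits the database; the choice of `ttl_seconds` is exactly the "how out of date is acceptable" question above.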
Set up a copy of your application locally, enable the MySQL query log, and set up xdebug or some other profiler. Then start collecting data and testing your application. There are lots of guides and books available about how to optimize things. It is important that you spend time testing and collecting data first, so you optimize the right things.
Using the data you have collected, try to reduce the number of queries per page view. Ideally, you should be able to get everything you need in fewer than 5-10 queries.
Look at the logs and see if you are asking for the same thing twice. It is a bad idea to request a record in one portion of your code, and then request it again from the database a few lines later unless you are sure the value is likely to have changed.
Look for queries embedded in loops, and try to refactor them so you make a single query and simply loop over the results.
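To show what that refactoring looks like, here is a small sketch with hypothetical `users` and `posts` tables. The first function runs one extra query per post inside the loop; the second gets the same result with a single JOIN. It runs on SQLite via Python's sqlite3 module for convenience, but the pattern is the same from PHP against MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, title TEXT NOT NULL);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO posts VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
""")

def titles_with_authors_slow(conn):
    """Anti-pattern: one query for the posts plus one query per post."""
    result = []
    posts = conn.execute("SELECT title, user_id FROM posts ORDER BY id").fetchall()
    for title, user_id in posts:
        name = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()[0]
        result.append((title, name))
    return result

def titles_with_authors_fast(conn):
    """Refactored: a single query, then simply loop over the rows."""
    return conn.execute(
        "SELECT p.title, u.name"
        " FROM posts p JOIN users u ON u.id = p.user_id"
        " ORDER BY p.id"
    ).fetchall()
```

With three posts the slow version issues four queries and the fast one issues one; the gap grows with every row the page displays.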
The select * you mention using is an indication you may be doing something wrong. You probably should be listing fields you explicitly need. Check this site or google for lots of good arguments about why select * is evil.
Start looking at your queries and then using explain on them. For queries that are frequently used make sure they are using a good index and not doing a full table scan. Tweak indexes on your development database and test.
There are a couple things you can look into:
Query Design - look into more advanced and faster solutions
Hardware - throw better and faster hardware at the problem
Database Design - use indexes and practice good database design
All of these are easier said than done, but it is a start.
Firstly, sack your host, get off shared hosting into an environment you have full control over and stand a chance of being able to tune decently.
Replicate that environment in your lab, ideally with the same hardware as production; this includes things like RAID controller.
Did I mention that you need a RAID controller? Yes, you do. You can't achieve decent write performance without one with a battery-backed cache. If you don't have one, each write needs to physically hit the disc, which is ruinous for performance.
Anyway, back to read performance, once you've got the machine with the same spec RAID controller (and same discs, obviously) as production in your lab, you can try to tune stuff up.
More RAM is usually the cheapest way of achieving better performance - make sure that you've got MySQL configured to use it - which means tuning storage-engine specific parameters.
I am assuming here that you have at least 100G of data; if not, just buy enough RAM that your entire DB fits in RAM, and then read performance is essentially solved.
Software changes that others have mentioned such as optimising queries and adding indexes are helpful too, but only once you've got a development hardware environment that enables you to usefully do performance work - i.e. measure performance of your application meaningfully - which means real hardware (not VMs), which is consistent with the hardware environment used in production.
Oh yes - one more thing - don't even THINK about deploying a database server on a 32-bit OS; it's a ruinous waste of good RAM.
Indexing is done on the database tables in order to speed up queries. If you don't know what it means, you have none. At a minimum you should have indexes on every foreign key and on most fields that are used frequently in the where clauses of your queries. Primary keys should have indexes automatically, assuming you set them up to begin with, which I would find unlikely in someone who doesn't know what an index is. Are your tables normalized?
BTW, since you are doing a division in your math (why, I haven't a clue), you should Google integer math. You may not be getting correct results.
You should not select * ever. Instead, select only the data you need for that particular call. And what is your intention here?
order by votes*1000+((1440 - ($server_date - date))/60)*2+visites*600 desc
You may have poorly written queries, and/or poorly written pages that run too many queries. Could you give us specific examples of queries you're using that are run on a regular basis?
sure
this query to fetch the last 3 posts
select * from posts where visible = 1 and date > ($server_date - 86400) and dont_show_in_frontpage = 0 order by votes*1000+((1440 - ($server_date - date))/60)*2+visites*600 desc limit 3
what do you think?
