I have a user table with the InnoDB engine which has about a million drivers:
CREATE TABLE user (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`Column2` varchar(14) NOT NULL,
`Column3` varchar(14) NOT NULL,
`lat` double NOT NULL,
`lng` double NOT NULL,
PRIMARY KEY (`Id`)
) ENGINE=InnoDB
And I have a mobile application that tracks the locations of users and sends them to the server, where they are saved.
Now I am sure that when we go live and have millions of drivers sending their locations ... the database will go down or become very slow.
How can I avoid slow MySQL performance while normal users are using the application (reading/writing records)?
I was thinking about creating a new database just to track driver locations, and then having a main database that is updated via a cronjob, for example, to update the users table with lat/lng at specific intervals.
I have a limitation here ... I cannot switch to a NoSQL database at this stage.
3333 rows inserted per second. Be sure to "batch" the inserts in some way. For even higher insertion rates, see http://mysql.rjweb.org/doc.php/staging_table
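A minimal sketch of what "batching" can look like at the SQL level; the staging table name and sample values below are purely illustrative:
-- Hypothetical staging table that absorbs raw location pings
CREATE TABLE location_staging (
  user_id INT NOT NULL,
  lat FLOAT NOT NULL,
  lng FLOAT NOT NULL,
  recorded_at DATETIME NOT NULL
) ENGINE=InnoDB;

-- One round trip carries many pings instead of one INSERT per request
INSERT INTO location_staging (user_id, lat, lng, recorded_at) VALUES
  (101, 30.0444, 31.2357, '2023-01-01 10:00:00'),
  (102, 30.0561, 31.2394, '2023-01-01 10:00:01'),
  (103, 29.9792, 31.1342, '2023-01-01 10:00:01');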
DOUBLE is overkill for lat/lng, and wastes space. The size of the table could lead to performance problems (when the table gets to be "huge"). For locating a vehicle, FLOAT is probably better -- 8 bytes for 2 floats vs 16 bytes for 2 doubles. The resolution is 1.7 m (5.6 ft). Ref:
http://mysql.rjweb.org/doc.php/latlng#representation_choices
On the other hand, if there is only one lat/lng per user, a million rows would be less than 100MB, not a very big table.
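If FLOAT's ~1.7 m resolution is acceptable for your use case, the column change on the table above is a one-liner (a sketch only; do it before go-live, since ALTERing a huge table is slow):
-- 4 bytes per FLOAT instead of 8 per DOUBLE; resolution ~1.7 m
ALTER TABLE user
  MODIFY lat FLOAT NOT NULL,
  MODIFY lng FLOAT NOT NULL;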
What queries are to be performed? A million rows against a table can be costly. "Find all users within 10 miles (or km)" would require a table scan. Recommend looking into a bounding box, plus a couple of secondary indexes.
More
The calls to update location should connect, update, disconnect. This will take a fraction of a second, and may not overload max_connections. That setting should not be too high; it could invite trouble. Also set back_log to about the same value.
Consider "connection pooling", the details of which depend on your app language, web server, version of MySQL, etc.
Together with the "bounding box" in the WHERE, have INDEX(lat), INDEX(lng); the Optimizer will pick between them.
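A sketch of that pattern (the coordinates and the degrees-per-km conversion below are illustrative approximations):
ALTER TABLE user ADD INDEX idx_lat (lat), ADD INDEX idx_lng (lng);

-- Roughly "within 10 km of (@my_lat, @my_lng)":
-- 1 degree of latitude is ~111 km; longitude shrinks by COS(latitude)
SET @my_lat = 30.0444, @my_lng = 31.2357, @radius_km = 10;
SELECT Id, lat, lng
FROM user
WHERE lat BETWEEN @my_lat - @radius_km / 111.0
              AND @my_lat + @radius_km / 111.0
  AND lng BETWEEN @my_lng - @radius_km / (111.0 * COS(RADIANS(@my_lat)))
              AND @my_lng + @radius_km / (111.0 * COS(RADIANS(@my_lat)));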
How many CPU cores are in your server? Limit the number of webserver threads to about twice that. This provides another throttling mechanism to avoid the "thundering herd" syndrome.
Turn off the Query cache by having both query_cache_size=0 and query_cache_type=0. Otherwise the QC costs some overhead while essentially never providing any benefit.
Batching INSERTs is feasible. But you need to batch UPDATEs. This is trickier. It should be practical by gathering updates in a table, then doing a single, multi-table, UPDATE to copy from that table into the main table. This extra table would work something like the ping-pong I discuss in my "staging_table" link. But... First let's see if the other fixes are sufficient.
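A sketch of that ping-pong idea, assuming the same hypothetical location_staging table used above for batching the incoming pings; the whole flush becomes one multi-table UPDATE:
-- Flush the latest staged positions into the main table in one statement
-- (if staging can hold several pings per user, keep only the newest first)
UPDATE user u
JOIN location_staging s ON s.user_id = u.Id
SET u.lat = s.lat,
    u.lng = s.lng;

-- Then empty the staging table (or swap in a second one, ping-pong style)
TRUNCATE TABLE location_staging;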
Use innodb_flush_log_at_trx_commit = 2. Otherwise, the bottleneck will be logging transactions. The downside (of losing 1 second's worth of updates) is probably not an issue for your app -- since you will get another lat/lng soon.
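For reference, a sketch of those knobs in my.cnf; the numbers are placeholders to tune for your hardware, and the query_cache_* settings only apply to MySQL 5.7 and earlier (the query cache was removed in 8.0):
[mysqld]
max_connections                = 300   # keep modest; see note above
back_log                       = 300   # roughly the same as max_connections
query_cache_size               = 0     # query cache off (pre-8.0 only)
query_cache_type               = 0
innodb_flush_log_at_trx_commit = 2     # trade ~1s of durability for throughput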
Finding nearby vehicles -- This is even better than a bounding box, but it is more complex: http://mysql.rjweb.org/doc.php/latlng . How often do you look for "nearbys"? I hope it is not 3333/sec; that is not practical on a single server. (Multiple Slaves could provide a solution.) Anyway, the resultset does not change very fast.
There's a lot to unpick here...
Firstly, consider using the spatial data types for storing lat and long. That, in turn, will allow you to use spatial indexes, which are optimized for finding people in bounding boxes.
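A sketch of what that could look like; whether you put latitude or longitude first inside the POINT is just a convention you must keep consistent, and SPATIAL indexes on InnoDB need MySQL 5.7+ (table and column names here are illustrative):
-- Store the position as a POINT and index it spatially
CREATE TABLE user_location (
  user_id  INT NOT NULL PRIMARY KEY,
  position POINT NOT NULL,
  SPATIAL INDEX idx_position (position)
) ENGINE=InnoDB;

-- Bounding-box lookup that can use the spatial index
SELECT user_id
FROM user_location
WHERE MBRContains(
        ST_GeomFromText('POLYGON((30.0 31.2, 30.1 31.2, 30.1 31.3, 30.0 31.3, 30.0 31.2))'),
        position);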
Secondly, if you expect such high traffic, you may need some exotic solutions.
Firstly - set up a test rig, as similar to the production hardware as possible, so you can hunt for bottlenecks. If you expect 100K inserts over a 5 minute period, you're looking at an average of 100,000 / 5 / 60 = 333 inserts per second. But scaling for the average is usually a bad idea - you need to scale for peaks. My rule of thumb is that you need to be able to handle 10 times the average if the average is in the 1 - 10 minute range, so you're looking at around 3,000 inserts per second.
I'd use a load testing tool (JMeter is great) and ensure that the bottleneck is in the target server, not in the load-testing infrastructure. Work out at which load your target system starts to reach the acceptable response time boundaries - for a simple insert statement, I'd set that at 1 second. If you are using modern hardware, with no triggers and a well-designed table, I'd expect to reach at least 500 inserts per second (my MacBook gets close to that).
Use this test rig to optimize your database schema and indexes - you can get a LOT of performance out of MySQL!
The next step is the painful one - there is very little you can do to increase the raw performance of MySQL inserts (lots of memory, a fast SSD drive, fast CPU; you may be able to use a staging table with no indexes to get another couple of percent improvement). If you cannot hit your target performance goal with "vanilla" MySQL, you now need to look at more exotic solutions.
The first is the easiest - make your apps less chatty. This will help the entire solution's scalability (I presume you have web/application servers between the apps and the database - they will need scaling too). For instance, rather than sending real-time updates, perhaps the apps can store 1, 5, 10, 60, 2400 minutes' worth of data and send it as a batch. If you have 1 million daily active users, with peaks of 100,000 active users, it's much easier to scale to 1 million transactions per day than to 100,000 transactions every 5 minutes.
The second option is to put a message queuing server in front of your database. Message queueing systems scale much more easily than databases, but you're adding significant additional complexity to the architecture.
The third option is clustering. This allows the load to be spread over multiple physical database servers - but again introduces additional complexity and cost.
Related
I'm developing a custom tracking tool for marketing campaigns. This tool is in the middle between the ads and the landing pages. It takes care of saving all data from the user, such as the info in the user-agent, the IP, the clicks on the landing page and the geocoding data of the IPs of the users (country, ISP, etc).
At the moment I have some design issues:
The traffic on these campaigns is very, very high, so potentially I have millions of rows inserted per day. This system can have more than one user, so I can't store all this data in a single table because it would become a mess. Maybe I can split the data into more tables, one table per user, but I'm not sure about this solution.
The data saving process must be done as quickly as possible (a few milliseconds), so I think that NodeJS is much better than PHP for doing this, especially with regard to speed and server resources. I do not want the server to crash from lack of RAM.
I need to group these data for statistical purposes. For example, I have one row for every user that visits my landing page, but I need to group these data to show the number of impressions on this specific landing page. So all these queries need to be executed as fast as possible over this large amount of rows.
I need to geocode the IP addresses, so I need accurate information like the country, the ISP, the type of connection, etc., but this can slow down the data saving process if I call an API service. And this must be done in real time and can't be done later.
After the saving process, the system should do a redirect to the landing page. Time is important for not losing any possible lead.
Basically, I'm finding the best solutions for:
Efficiently manage a very large database
Saving data from the users in the shortest time possible (ms)
If possible, geocode an IP in the shortest time possible, without blocking execution
Optimize the schema and the queries for generating statistics
Do you have any suggestion? Thanks in advance.
One table per user is a worse mess; don't do that.
Millions of rows a day -- dozens, maybe hundreds, per second? That probably requires some form of 'staging' -- collecting multiple rows, then batch-inserting them. Before discussing further, please elaborate on the data flow: Single vs. multiple clients. UI vs. batch processes. Tentative CREATE TABLE. Etc.
Statistical -- Plan on creating and incrementally maintaining "Summary tables".
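As a sketch of what an incrementally maintained summary table could look like here (the visits_staging table and its columns are assumptions, not your schema): one row per landing page per day, bumped by a periodic job over the newly staged rows.
CREATE TABLE daily_impressions (
  landing_page_id INT NOT NULL,
  `day`           DATE NOT NULL,
  impressions     INT UNSIGNED NOT NULL,
  PRIMARY KEY (landing_page_id, `day`)
) ENGINE=InnoDB;

-- Run every few minutes against the rows staged since the last run
INSERT INTO daily_impressions (landing_page_id, `day`, impressions)
SELECT landing_page_id, DATE(visited_at), COUNT(*)
FROM visits_staging
GROUP BY landing_page_id, DATE(visited_at)
ON DUPLICATE KEY UPDATE impressions = impressions + VALUES(impressions);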
Are you trying to map user IP addresses to Country? That is a separate question, and it has been answered.
"Must" "real-time" "milliseconds". Face it, you will have to make some trade-offs.
More info: Go to http://mysql.rjweb.org/ ; from there, see the three blogs on Data Warehouse Techniques.
How to store by day
InnoDB stores data in PRIMARY KEY order. So, to get all the rows for one day adjacent to each other, one must start the PK with the datetime. For huge databases, this may improve certain queries significantly by allowing the query to scan the data sequentially, thereby minimizing disk I/O.
If you already have id AUTO_INCREMENT (and if you continue to need it), then do this:
PRIMARY KEY(datetime, id), -- to get clustering, and be UNIQUE
INDEX(id) -- to keep AUTO_INCREMENT happy
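Put together, a sketch of a table built around that ordering (the table and column names are illustrative):
CREATE TABLE events (
  id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `datetime` DATETIME NOT NULL,
  payload    VARCHAR(255) NOT NULL,
  PRIMARY KEY (`datetime`, id),  -- clusters rows by time, and is UNIQUE
  INDEX (id)                     -- keeps AUTO_INCREMENT happy
) ENGINE=InnoDB;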
If you have a year's worth of data, and the data won't fit in RAM, then this technique is very effective for small time ranges. But if your time range is bigger than the cache, you will be at the mercy of I/O speed.
Maintaining summary tables with changing data
This may be possible; I need to better understand the data and the changes.
You cannot scan a million rows in sub-second time, regardless of caching, tuning, and other optimizations. You can get the desired data from a Summary table much faster.
Shrink the data
Don't use BIGINT (8 bytes) if INT (4 bytes) will suffice; don't use INT if MEDIUMINT (3 bytes) will do. Etc.
Use UNSIGNED where appropriate.
Normalize repeated strings.
Smaller data will make it more cacheable, hence run faster when you do have to hit the disk.
Let's pretend with me here:
PHP/MySQL web-application. Assume a single server and a single MySQL DB.
I have 1,000 bosses. Every boss has 10 workers under them. These 10 workers (times 1k, totaling 10,000 workers) each have at least 5 database entries (call them work orders for this purpose) in the WebApplication every work day. That's 50k entries a day in this work orders table.
Server issues aside, I see two main ways to handle the basic logic of the database here:
Each Boss has an ID. There is one table called workorders and it has a column named BossID to associate every work order with a boss. This leaves you with approximately 1 million entries a month in a single table, and to me that seems to add up fast.
Each Boss has its own table that is created when that Boss signs up, i.e. work_bossID where bossID = the boss's unique ID. This leaves you with 1,000 tables, but these tables are much more manageable.
Is there a third option that I'm overlooking?
Which method would be the better-functioning method?
How big is too big for number of entries in a table (let's assume a small number of columns: less than 10)? (this can include: it's time to get a second server when...)
How big is too big for number of tables in a database? (this can include: it's time to get a second server when...)
I know that at some point we have to bring in talks of multiple servers and databases linked together... but again, let's focus on a single server here with a single MySQL DB.
If you use a single server, I don't think there is a problem with how big the table gets. It isn't just the number of records in a table, but how frequently it is accessed.
To manage large datasets, you can use multiple servers. In this case:
You can keep all workorders in a single table, and mirror them across different servers (so that you have slave servers)
You can shard the workorders table by boss (in this case you access the server depending on where the workorder belongs) - search for database sharding for more information
Which option you choose depends on how you will use your database.
Mirrors (master/slave)
Keeping all workorders in a single table is good for querying when you don't know which boss a workorder belongs to, eg. if you are searching by product type, but any boss can have orders in any product type.
However, you have to store a copy of everything on every mirror. In addition only one server (the master) can deal with update (or adding workorder) SQL requests. This is fine if most of your SQL queries are SELECT queries.
Sharding
The advantage of sharding is that you don't have to store a copy of the record on every mirror server.
However, if you are searching workorders by some attribute for any boss, you would have to query every server to check every shard.
How to choose
In summary, use a single table if you can have all sorts of queries, including browsing workorders by an attribute (other than which boss it belongs to), and you are likely to have more SELECT (read) queries than write queries.
Use shards if you can have write queries on the same order of magnitude as read queries, and/or you want to save memory, and queries searching by other attributes (not boss) are rare.
Keeping queries fast
Large databases are not really a big problem, if they are not overwhelmed by queries, because they can keep most of the database on hard disk, and only keep what was accessed recently in cache (on memory).
The other important thing to prevent any single query from running slowly is to make sure you add the right index for each query you might perform to avoid linear searches. This is to allow the database to binary search for the record(s) required.
If you need to maintain a count of records, whether of the whole table, or by attribute (category or boss), then keep counter caches.
When to get a new server
There isn't really a single number you can assign to determine when a new server is needed because there are too many variables. This decision can be made by looking at how fast queries are performing, and the CPU/memory usage of your server.
Scaling is often a case of experimentation as it's not always clear from the outset where the bottlenecks will be. Since you seem to have a pretty good idea of the kind of load the system will be under, one of the first things to do is capture this in a spreadsheet so you can work out some hypotheticals. This allows you to do a lot of quick "what if" scenarios and come up with a reasonable upper end for how far you have to scale with your first build.
For collecting large numbers of records there's some straight-forward rules:
Use the most efficient data type to represent what you're describing. Don't worry about using smaller integer types to shave off a few bytes, or shrinking varchars. What's important here is using integers for numbers, date fields for dates, and so on. Don't use a varchar for data that already has a proper type.
Don't over-index your table, add only what is strictly necessary. The larger the number of indexes you have, the slower your inserts will get as the table grows.
Purge data that's no longer necessary. Where practical delete it. Where it needs to be retained for an extended period of time, make alternate tables you can dump it into. For instance, you may be able to rotate out your main orders table every quarter or fiscal year to keep it running quickly. You can always adjust your queries to run against the other tables if required for reporting. Keep your working data set as small as practical.
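One way to do that rotation, as a sketch (table names are illustrative); RENAME TABLE swaps both tables atomically, so writers never see a missing orders table:
CREATE TABLE orders_new LIKE orders;
RENAME TABLE orders TO orders_2023_q4, orders_new TO orders;
-- orders_2023_q4 now holds the archived quarter and stays queryable for reports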
Tune your MySQL server by benchmarking, tinkering, researching, and experimenting. There's no magic bullet here. There's many variables that may work for some people but might slow down your application. They're also highly dependent on OS, hardware, and the structure and size of your data. You can easily double or quadruple performance by allocating more memory to your database engine, for instance, either InnoDB or MyISAM.
Try using other MySQL forks if you think they might help significantly. There are a few that offer improved performance over the regular MySQL, Percona in particular.
If you query large tables often and aggressively, it may make sense to de-normalize some of your data to reduce the number of expensive joins that have to be done. For instance, on a message board you might include the user's name in every message even though that seems like a waste of data, but it makes displaying large lists of messages very, very fast.
With all that in mind, the best thing to do is design your schema, build your tables, and then exercise them. Simulate loading in 6-12 months of data and see how well it performs once really loaded down. You'll find all kinds of issues if you use EXPLAIN on your slower queries. It's even better to do this on a development system that's slower than your production database server so you won't have any surprises when you deploy.
The golden rule of scaling is only optimize what's actually a problem and avoid tuning things just because it seems like a good idea. It's very easy to over-engineer a solution that will later do the opposite of what you intend or prove to be extremely difficult to un-do.
MySQL can handle millions if not billions of rows without too much trouble if you're careful to experiment and prove it works in some capacity before rolling it out.
I had a database size problem as well in one of my networks; the table was so big that it used to slow the server down whenever I ran a query on it.
In my opinion, divide your database by dates. Decide what table size would be too big for you - let's say 1 million entries - then calculate how long it will take you to reach that amount, and have a script run at that interval to either create a new table named with the date and move all current data over, or just back that table up and empty it.
It's like putting outdated material in archives.
If you choose the first option, you'll be able to access that data easily by referring to the table for that date.
Hope that idea helps.
Just create a workers table, a bosses table, a relationships table for the two, and then all of your other tables. With a relationship structure like this, it's very dynamic, because if it ever got large enough you could create another relationship table between the work orders and the bosses or the workers.
You might want to look into BIGINTs, but I doubt you'll need that. I know that the relationships table will get massive, but that's good DB design.
For reference, BIGINT in MySQL ranges from -9223372036854775808 to 9223372036854775807 signed, or 0 to 18446744073709551615 unsigned.
Let's say you have a search form with multiple select fields. A user selects an option from a dropdown, but before they submit the data I need to display the count of matching rows in the database.
So let's say the site has at least 300k (300,000) visitors a day, and a user selects options from the form at least 40 times per visit; that would mean 12M AJAX requests + 12M count queries on the database, which seems a bit too much.
The question is how one can implement a fast count (using PHP (Zend Framework) and MySQL) so that the additional 12M queries on the database won't affect the load of the site.
One solution would be to have a table that stores all combinations of select fields and their respective counts (when a product is added or deleted from the products table, the count table would be updated). Although this is not such a good idea when, for 8 filters (select options) out of 43, there would be 8M+ rows inserted that need to be managed.
Any other thoughts on how to achieve this?
p.s. I don't need code examples but the idea itself that would work in this scenario.
I would probably have a pre-calculated table - as you suggest yourself. Important is that you have a smart mechanism for 2 things:
Easily query which entries are affected by which change.
Have a unique lookup field for an entire form request.
The 8M entries wouldn't be very significant if you have solid keys, as you would only require a direct lookup.
I would go through the trouble of writing specific updates for this table in all the places where it is necessary. Even with the high amount of changes, this is still efficient. If done correctly, you will know which rows you need to update or invalidate when inserting/updating/deleting a product.
Sidenote:
Based on your comment: if you need to add code in eight places to cover all the spots where a product can be deleted, it might be a good time to refactor and centralize some code.
There are a few scenarios:
MySQL has the query cache; you don't have to bother with caching if the table is not updated frequently.
99% of users won't care how many results matched; they just need the top few records.
Use EXPLAIN - EXPLAIN reports how many rows the query is estimated to match. It is not 100% precise, but it should be good enough to act as a rough row count.
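For example (the table and columns here are hypothetical), the estimate comes straight out of the EXPLAIN output:
EXPLAIN SELECT * FROM products WHERE category_id = 7 AND price < 100;
-- The "rows" column of the result is the optimizer's estimate of how many
-- rows will be examined; it can stand in as a rough count.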
Not really what you asked for, but since you have a lot of options and want to count the items available based on the options you should take a look at Lucene and its faceted search. It was made to solve problems like this.
If you do not have the need to have up to date information from the search you can use a queue system to push updates and inserts to Lucene every now and then (so you don't have to bother Lucene with couple of thousand of updates and inserts every day).
You really only have three options, and no amount of searching is likely to reveal a fourth:
Count the results manually. O(n) with the total number of the results at query-time.
Store and maintain counts for every combination of filters. O(1) to retrieve the count, but requires O(2^n) storage and O(2^n) time to update all the counts when records change.
Cache counts, only calculating them (per #1) when they're not found in the cache. O(1) when data is in the cache, O(n) otherwise.
It's for this reason that systems that have to scale beyond the trivial - that is, most of them - either cap the number of results they'll count (eg, items in your GMail inbox or unread in Google Reader), estimate the count based on statistics (eg, Google search result counts), or both.
I suppose it's possible you might actually require an exact count for your users, with no limitation, but it's hard to envisage a scenario where that might actually be necessary.
I would suggest a separate table that caches the counts, combined with triggers.
In order for it to be fast you make it a memory table and you update it using triggers on the inserts, deletes and updates.
pseudo code:
CREATE TABLE counts (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `option` INT NOT NULL,
  user_id INT NOT NULL,
  rowcount INT UNSIGNED NOT NULL DEFAULT 0,
  KEY idx_option (`option`) USING HASH,
  KEY idx_user (user_id) USING HASH,
  UNIQUE KEY user_option (user_id, `option`)
) ENGINE = MEMORY;

DELIMITER $$
CREATE TRIGGER ai_tablex_each AFTER UPDATE ON tablex FOR EACH ROW
BEGIN
  IF (old.`option` <> new.`option`) OR (old.user_id <> new.user_id) THEN
    UPDATE counts SET rowcount = rowcount - 1
      WHERE user_id = old.user_id AND `option` = old.`option`;
    INSERT INTO counts (rowcount, user_id, `option`)
      VALUES (1, new.user_id, new.`option`)
      ON DUPLICATE KEY UPDATE rowcount = rowcount + 1;
  END IF;
END $$
DELIMITER ;
Selection of the counts will be instant, and the updates in the trigger should not take very long either because you're using a memory table with hash indexes which have O(1) lookup time.
Links:
Memory engine: http://dev.mysql.com/doc/refman/5.5/en/memory-storage-engine.html
Triggers: http://dev.mysql.com/doc/refman/5.5/en/triggers.html
A few things you can easily optimise:
Cache all you can allow yourself to cache. The options for your dropdowns, for example, do they need to be fetched by ajax calls? This page answered many of my questions when I implemented memcache, and of course memcached.org has great documentation available too.
Serve anything that can be served statically. I.e., options that don't change frequently could be stored in a flat file as an array via cron every hour, for example, and included by the script at runtime.
MySQL with default configuration settings is often sub-optimal for any serious application load and should be tweaked to fit the needs of the task at hand. Maybe look into the MEMORY engine for high-performance read access.
You can have a look at these 3 great-but-very-technical posts on materialized views; as a matter of fact, that whole blog is truly a goldmine of performance tips for MySQL.
Good luck!
Presumably you're using AJAX to make the call to the back end that you're talking about. Use some kind of cached flat file as an intermediate store for the data. Set an expiry time of 5 seconds or whatever is appropriate. Name the data file after the query key=value string. In the AJAX request, if the data file is older than your cooldown time, refresh it; if not, use the value stored in the data file.
Also, you might be underestimating the strength of the mysql query cache mechanism. If you're using mysql query cache, I doubt there would be any significant performance dip over doing it the way I just described. If the query was being query cached by mysql then virtually the only slowdown effect would be from the network layer between your application and mysql.
Consider what role replication can play in your architecture. If you need to scale out, you might consider replicating your tables from InnoDB to MyISAM. The MyISAM engine automatically maintains a table count if you are doing count(*) queries. If you are doing count(col) where queries, then you need to rely heavily on well designed indices. In that case your count queries might take shape like so:
alter table A add index ixA (a, b);
select count(a) from A use index(ixA) where a=1 and b=2;
I feel crazy for suggesting this as it seems that no-one else has, but have you considered client-side caching? JavaScript isn't terrible at dealing with large lists, especially if they're relatively simple lists.
I know that your ideal is that you have a desire to make the numbers completely accurate, but heuristics are your friend here, especially since synchronization will never be 100% -- a slow connection or high latency due to server-side traffic will make the AJAX request out of date, especially if that data is not a constant. IF THE DATA CAN BE EDITED BY OTHER USERS, SYNCHRONICITY IS IMPOSSIBLE USING AJAX. IF IT CANNOT BE EDITED BY ANYONE ELSE, THEN CLIENT-SIDE CACHING WILL WORK AND IS LIKELY YOUR BEST OPTION. Oh, and if you're using some sort of port connection, then whatever is pushing to the server can simply update all of the other clients until a sync can be accomplished.
If you're willing to do that form of caching, you can also cache the results on the server too and simply refresh the query periodically.
As others have suggested, you really need some sort of caching mechanism on the server side. Whether it's a MySQL table or memcache, either would work. But to reduce the number of calls to the server, retrieve the full list of cached counts in one request and cache that locally in javascript. That's a pretty simple way to eliminate almost 12M server hits.
You could probably even store the count information in a cookie which expires in an hour, so subsequent page loads don't need to query again. That's if you don't need real time numbers.
Many of the latest browser also support local storage, which doesn't get passed to the server with every request like cookies do.
You can fit a lot of data into a 1-2K json data structure. So even if you have thousands of possible count options, that is still smaller than your typical image. Just keep in mind maximum cookie sizes if you use cookie caching.
I made a social network. Testing it in WAMP shows almost 1500 SQL queries for a single person over a session of about 30 minutes and 50 page views!
[I'm not using Zend, APC, or memcached. The heaviest page loads within 0.25 seconds. Config: 512 MB RAM, AMD 1.81 GHz]
Q-> Is this OK, or do I need to reduce the number of SQL queries?
There are 2 tables, PARENT and CHILD.
Structure of PARENT table
PID [primary key]
...
...
Structure of CHILD table
ID [primary key]
PID
...
...
I've not used a FOREIGN KEY, but deleting from PARENT also deletes from CHILD,
and I implemented this in PHP/SQL.
Q-> Is this OK, or should I go for a FOREIGN KEY for better performance?
In PHP I can configure how much memory PHP is going to use.
Q-> Can I also do that with MySQL?
[I am using WAMP, and need to monitor the social network's performance under bottleneck conditions!]
No-one can say whether an arbitrary number of SQL queries is OK:
It depends on the complexity of those queries
It depends on your database's structure (indexes, for instance, play a big role)
It depends on the amount of data you have
It depends on how many concurrent users you plan to have (with one user at a time, your application will probably be way faster than with 100 users at a given instant)
...
Basically: do some benchmarks, using tools such as ab / siege / JMeter, and see if your server can handle the load you expect to have in the next few weeks.
Using foreign keys generally doesn't help with performance (except if they force you to create indexes you'd need but wouldn't have created by yourself): they add some extra work on the DB side.
But using foreign keys helps with data integrity -- and having data that's OK is probably more important than a couple of milliseconds, especially if you are just launching your application (which means there could be quite a few bugs).
30 SQL queries per page is reasonable in general (actually it's quite low considering what some CMS do). On the other hand, with the information given, it is not possible to determine whether it is reasonable in your case.
Foreign keys do not improve performance. Foreign key constraints might. But they also put business logic into the persistence layer. It's an optimization.
Information about configuring the memory usage of MySQL can be found in the handbook section 7.11.4.1, How MySQL Uses Memory.
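As a rough sketch, the memory-related settings live in my.cnf; the values below are placeholders to tune for your machine (on a 512 MB WAMP box they have to stay small):
[mysqld]
innodb_buffer_pool_size = 128M   # main InnoDB cache; the biggest single knob
key_buffer_size         = 16M    # MyISAM index cache (only matters if you use MyISAM)
tmp_table_size          = 32M    # in-memory temp tables before spilling to disk
max_heap_table_size     = 32M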
I'd agree with Pascal and Oswald - esp. on testing with JMeter or similar to see if you really do have a problem.
I would also load up the database with a few million test profiles to see whether your queries slow down over time. This should help with optimizing query performance.
If your goal for tweaking MySQL is to introduce an artificial bottleneck to test the application, I'd be careful to extrapolate from those tests. What you see with bottlenecks is that they tend to be non-linear - everything is fine until you hit a bottleneck moment, and then everything becomes highly unpredictable. You may not recreate this simply by reducing the memory of the database server.
If there's any low-hanging fruit, I would reduce the number of SQL queries, but 30 queries per page is not excessive. If you want to be prepared to scale to Facebook levels, I don't think reducing the queries per page from 30 to 28 will help much - you need to be ready to partition the application across multiple databases, introduce caching, and buy more powerful hardware.
I was wondering if it's faster to process data in MySQL or in a server language like PHP or Python. I'm sure native functions like ORDER BY will be faster in MySQL due to indexing, caching, etc., but what about actually calculating the rank (including ties returning multiple entries as having the same rank):
Sample SQL
SELECT t.TORCH_ID,
       t.distance AS thisscore,
       (SELECT COUNT(DISTINCT t2.distance) + 1
          FROM torch_info t2
         WHERE t2.distance > t.distance) AS `rank`
FROM torch_info t ORDER BY `rank`
Server
...as opposed to just doing a SELECT TORCH_ID FROM torch_info ORDER BY score DESC and then figure out rank in PHP on the web server.
Edit: Since posting this, my answer has changed completely, partly due to the experience I've gained since then and partly because relational database systems have gotten significantly better since 2009. Today, 9 times out of 10, I would recommend doing as much of your data crunching in-database as possible. There are three reasons for this:
Databases are highly optimized for crunching data—that's their entire job! With few exceptions, replicating what the database is doing at the application level is going to be slower unless you invest a lot of engineering effort into implementing the same optimizations that the DB provides to you for free—especially with a relatively slow language like PHP, Python, or Ruby.
As the size of your table grows, pulling it into the application layer and operating on it there becomes prohibitively expensive simply due to the sheer amount of data transferred. Many applications will never reach this scale, but if you do, it's best to reduce the transfer overhead and keep the data operations as close to the DB as possible.
In my experience, you're far more likely to introduce consistency bugs in your application than in your RDBMS, since the DB can enforce consistency on your data at a low level but the application cannot. If you don't have that safety net built in, you have to be more careful not to make mistakes.
Original answer: MySQL will probably be faster with most non-complex calculations. However, 90% of the time database server is the bottleneck, so do you really want to add to that by bogging down your database with these calculations? I myself would rather put them on the web/application server to even out the load, but that's your decision.
In general, the answer to the "Should I process data in the database, or on the web server question" is, "It depends".
It's easy to add another web server. It's harder to add another database server. If you can take load off the database, that can be good.
If the output of your data processing is much smaller than the required input, you may be able to avoid a lot of data transfer overhead by doing the processing in the database. As a simple example, it'd be foolish to SELECT *, retrieve every row in the table, and iterate through them on the web server to pick the ones where x = 3, when you can just add WHERE x = 3 to the query and let the database do the filtering.
As you pointed out, the database is optimized for operation on its data, using indexes, etc.
The speed of the count is going to depend on which DB storage engine you are using and the size of the table. Though I suspect that nearly every count and rank done in mySQL would be faster than pulling that same data into PHP memory and doing the same operation.
Ranking is based on count, order. So if you can do those functions faster, then rank will obviously be faster.
A large part of your question is dependent on the primary keys and indexes you have set up.
Assuming that torchID is indexed properly...
You will find that mySQL is faster than server side code.
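For what it's worth, a sketch of computing the rank entirely in SQL: on MySQL 8.0+ a window function handles ties directly (older versions would fall back to the correlated-subquery form from the question):
-- MySQL 8.0+: tied distances share the same rank
SELECT TORCH_ID,
       distance,
       RANK() OVER (ORDER BY distance DESC) AS `rank`
FROM torch_info;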
Another consideration you might want to make is how often this SQL will be called. You may find it easier to create a rank column and update that as each track record comes in. This will result in a lot of minor hits to your database, versus a number of "heavier" hits to your database.
So let's say you have 10,000 records, 1,000 users who hit this query once a day, and 100 users who put in a new track record each day. I'd rather have the DB doing 100 updates in which 10% of them hit every record (9,999) than have the ranking query get hit 1,000 times a day.
My two cents.
If your test is running individual queries instead of posting transactions, then I would recommend using a JDBC driver over the ODBC DSN, because you'll get 2-3 times faster performance. (I'm assuming you're using an ODBC DSN here in your tests.)