Closed 9 years ago as a tool/library recommendation question.
We have a tracking system running in PHP + MySQL.
We receive about 8 to 10 million entries per day, which averages out to about 100 insertions per second across 3 tables linked by a clickid key.
In parallel, we select on those tables to search for a clickid, or update a row after a conversion, etc.
We are looking for a better solution so we can use the back office and get statistics in real time, because right now it takes about 150 seconds to display a result.
We use cron jobs to fill a stats table and work with that, which gives us very quick results, but the cron job only runs twice per hour, so we are far away from real-time stats...
So we are thinking of switching to a NoSQL solution, but we are not sure which NoSQL database is best suited to our specific case.
We should be able to filter and retrieve statistics by about 8 different keys, like campaignid, publisherid, advertiserid, date, ...
We were thinking of testing MongoDB and Redis. Which one do you think would be the most appropriate, and why? We currently have about 500,000,000 entries that we would need to insert as documents, and about 100 documents are inserted every second... so it will grow quite fast and we will need to keep the data.
How long do you think it will take to display results with this quantity of data?
Also, do you think it is better to split the data into different collections, or to keep everything in a single big collection?
I don't have extensive experience with Redis, but I can tell you something about MongoDB.
The NoSQL movement is mostly about scalability, so there would be very limited options if you want to keep everything in one collection. Most NoSQL databases would break that up into sharded replica sets; you can read about it here. If you are planning on MongoDB, the writes can be quick, since it is sharded and replicated. If you don't mind the data being a bit stale (depending on the latency between the primary and the secondary in the shard), MongoDB can be a good alternative.
Typically, you would write to the primary and read from the secondaries, as opposed to your current scenario, where I guess everything happens on one DB. This should be a significant performance boost for your ops, but exactly how much depends on the details.
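To make the primary/secondary point concrete, here is a minimal sketch (assuming the official mongodb/mongo-php-library; the host names, database and field names are made up, not the poster's setup) of writing to the primary while allowing reads to be served from secondaries:

```php
<?php
// Hypothetical sketch: connect to a replica set and prefer secondaries for reads.
require 'vendor/autoload.php';

$client = new MongoDB\Client(
    'mongodb://mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=rs0&readPreference=secondaryPreferred'
);

$clicks = $client->tracking->clicks;

// Writes always go to the primary.
$clicks->insertOne([
    'clickid'      => 'abc123',
    'campaignid'   => 42,
    'publisherid'  => 7,
    'advertiserid' => 3,
    'ts'           => new MongoDB\BSON\UTCDateTime(),
]);

// Reads may be served by a secondary (so the data can be slightly stale).
$doc = $clicks->findOne(['clickid' => 'abc123']);
```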
You can actually use either, both, or neither :) I still don't understand what your requirements are, most importantly in terms of your current data volume and expected growth pattern, nor what you want to get out of it apart from getting statistics in real time. I also didn't quite get whether you're planning to replace MySQL entirely or whether you're going to build on top of / beside it.
I definitely agree that 150 seconds is not an acceptable response time for your dashboard, but before diving head-first into a forklift operation, I suggest you consider a simpler approach, like just keeping your realtime statistics counters in a suitable datastore (e.g. Redis ;)).
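For illustration, a minimal sketch of such realtime counters with the phpredis extension (key names and dimensions are invented): each incoming click increments per-dimension counters, so the dashboard reads pre-aggregated numbers instead of scanning hundreds of millions of rows.

```php
<?php
// Sketch only: one hash per dimension and day, one field per id.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

function trackClick(Redis $redis, array $click): void
{
    $day = date('Y-m-d');
    $redis->hIncrBy("stats:{$day}:campaign",   (string) $click['campaignid'],   1);
    $redis->hIncrBy("stats:{$day}:publisher",  (string) $click['publisherid'],  1);
    $redis->hIncrBy("stats:{$day}:advertiser", (string) $click['advertiserid'], 1);
}

// Dashboard side: clicks per campaign for today, no table scan needed.
$perCampaign = $redis->hGetAll('stats:' . date('Y-m-d') . ':campaign');
```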
Closed last year as opinion-based.
Currently, I am working on a website and have just started studying backend development. I wonder why nobody uses JSON as a database. Also, I don't quite get the utility of PHP and SQL: since I could easily get data from a JSON file and use it, why do I need PHP and SQL?
OK! Let's assume you put the data in a JSON structure and store it in a file for all your projects.
Obviously, you need a subsystem for backups, so you will write one.
You must also improve performance for handling very large amounts of data, which means indexing, hashing algorithms, and so on; let's assume you handle that too.
If you need APIs for working with and connecting from a variety of programming languages, you need to write them.
What about functionality? What if you need triggers, stored procedures, views, full-text search, etc.? OK, you will pay with your time and add them.
Good job, but your system will grow and you will need to scale it. Can you? You will have to write support for clustering across servers, sharding, and so on.
Now you need to guarantee that your system complies with the ACID rules: Atomicity, Consistency, Isolation, and Durability.
Can you always handle all querying techniques (e.g. Map/Reduce) and respond quickly with a standard structure?
Now it's time to offer very fast write speeds, which brings serious issues of its own.
Next, work out your solutions for race conditions, isolation levels, locking, relations, and so on.
After you do all this work, plus thousands of other things, you will probably end up with a DBMS a little bit like MongoDB or one of the other relational and non-relational databases!
So it's better to use them. That said, you can of course choose not to; I admit that sometimes saving data in a single file gives better performance, but only sometimes, in some cases, with some data, for some purposes! If you know exactly what you are doing, then it's OK to save data in a JSON file.
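To make the contrast concrete, here is a small sketch (file, table and column names are invented) of what a lookup costs with a plain JSON file versus an indexed SQL table queried through PDO:

```php
<?php
// "JSON as a database": every lookup loads and decodes the whole file,
// and concurrent writers can easily corrupt it.
$students = json_decode(file_get_contents('students.json'), true);
$match = array_filter($students, fn ($s) => $s['email'] === 'foo@example.com');

// A real database: the server uses an index on email and returns only the
// matching row; locking, durability, backups, etc. are handled for you.
$pdo  = new PDO('mysql:host=localhost;dbname=school', 'user', 'pass');
$stmt = $pdo->prepare('SELECT * FROM students WHERE email = ?');
$stmt->execute(['foo@example.com']);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
```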
Closed 4 years ago as opinion-based.
I've built a school management system for my own needs. The size of the tables ranges from ~200 rows to ~30,000 rows (at the end of the school year).
Some of my friends have seen the system and they urge me to make it available to other schools. I'd like to give it a try, at least with a few schools for now. Given my current architecture and shared hosting, I'd have to store all schools in a single db, so two questions bother me:
Can MySQL easily handle tables with >300,000 rows?
The system is based on Yii2 at the moment, and I've optimized it for maximum performance - do you think it's wise to try, or better to work towards a solution with a dedicated server and a separate db for each school?
I don't know if it's wise to store all the students, attendance, payments, etc. from my school and 10 others across shared tables in a single db. I'd better ask than cause trouble for myself...
Any advice is more than welcome:)
premature optimization is the root of all evil (or at least most of it) in programming
You should not worry about this at the moment. Start running your application and as you scale, identify the bottlenecks and then try to figure out a solution for it.
Can MySQL easily handle tables with >300,000 rows?
First things first, use sound normalization principles to structure your tables and relations. MySQL is pretty good at handling up to around 10,000,000 rows, but it also depends on how you are indexing and querying the data. Use proper database indexes on the columns you frequently use for lookups. A big no to LIKE '%...%' queries, but if you must have that kind of search, use a search engine (Elasticsearch, Solr).
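As an illustration (table and column names are invented, not taken from the question), an index on the columns you filter by is usually the difference between a full table scan and an instant lookup:

```php
<?php
// Hypothetical attendance table: add a composite index on the lookup
// columns, then query through a prepared statement.
$pdo = new PDO('mysql:host=localhost;dbname=school', 'user', 'pass');

// One-time schema change (would normally live in a migration).
$pdo->exec('CREATE INDEX idx_attendance_student_date
            ON attendance (student_id, attended_on)');

// This query can now use the composite index instead of scanning the table.
$stmt = $pdo->prepare(
    'SELECT attended_on, status
       FROM attendance
      WHERE student_id = ?
        AND attended_on BETWEEN ? AND ?'
);
$stmt->execute([123, '2016-09-01', '2017-06-30']);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
```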
The system is based on Yii2 at the moment, and I've optimized it for maximum performance - do you think it's wise to try, or better to work towards a solution with a dedicated server and a separate db for each school?
I have very little experience with Yii2, but there are certainly much better frameworks available in PHP that you can try, e.g. Laravel (this will give you a better idea). Of course, it would be best to host this application on a dedicated server. Why waste money when you can get a private VPS for just $5 from DigitalOcean?
I don't know if it's wise to store all the students, attendance, payments, etc. from my school and 10 others across shared tables in a single db. I'd better ask than cause trouble for myself...
There is absolutely no problem storing students, attendance, payments info in the same database, just structure your tables properly.
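For example (a sketch only, with invented table and column names), the usual multi-tenant pattern is a school_id column on every shared table, included in every query and index:

```php
<?php
// Multi-tenant sketch: every row records which school it belongs to,
// and every query filters on that column.
$pdo = new PDO('mysql:host=localhost;dbname=school_saas', 'user', 'pass');

$pdo->exec('CREATE TABLE IF NOT EXISTS students (
    id        INT AUTO_INCREMENT PRIMARY KEY,
    school_id INT NOT NULL,
    name      VARCHAR(255) NOT NULL,
    INDEX idx_students_school (school_id)
)');

// Application code always scopes by the current school.
function studentsForSchool(PDO $pdo, int $schoolId): array
{
    $stmt = $pdo->prepare('SELECT id, name FROM students WHERE school_id = ?');
    $stmt->execute([$schoolId]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```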
Closed 8 years ago as needing more focus.
I am about to create a PHP web project that will involve a large database. The database will be MySQL and will store more than 30,000 records per day. To optimize the DB, I was thinking of using the Memcached library with it. Am I going in the right direction, or is there some other alternative to overcome the data optimization problem? I just want to provide fast retrieval and insertion. Can somebody advise me on which tool I should use and how, as the data will gradually increase at a higher rate? Should I use an object-relational mapping (ORM) too?
You can use the master & slave (replication) technique for this purpose. Basically it would be a combination of two databases: one for read operations and the other for write operations.
I'd side with @halfer and say he's right about the test data. At least you'll know that you're not trying to optimize something that doesn't need optimizing.
On top of test data you'll also need some test scenarios to mimic the traffic patterns of your production environment. That's the hard part, and it really depends on the exact application patterns: how many reads versus writes versus updates per second.
Given your number (30k records per day), you'd average out at well under one insert per second, which I'd assume even the cheapest machines could handle with ease. As for reads, a year's worth of data would be just under 11M records. You may want to partition the data (at the MySQL level or the application level) if lookups become slow, but I doubt you'd need to with such relatively small volumes. The real difference maker would be if the number of reads is 1000x the number of inserts; then you could look into what @ram sharma suggested and set up a replicated master-slave model, where the master takes all the writes and the slaves are read-only.
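At the application level, the read/write split could look roughly like this sketch (hostnames and table names are placeholders; real deployments often put a proxy such as ProxySQL in front instead):

```php
<?php
// Naive read/write split: writes go to the master, reads to a replica.
$master  = new PDO('mysql:host=db-master;dbname=app',  'user', 'pass');
$replica = new PDO('mysql:host=db-replica;dbname=app', 'user', 'pass');

// Write path.
$stmt = $master->prepare('INSERT INTO events (type, payload) VALUES (?, ?)');
$stmt->execute(['signup', json_encode(['plan' => 'free'])]);

// Read path (may lag slightly behind the master).
$stmt = $replica->prepare('SELECT COUNT(*) FROM events WHERE type = ?');
$stmt->execute(['signup']);
$count = (int) $stmt->fetchColumn();
```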
Memcached is a powerful beast when used correctly and can turn a slow DB disk read into a blazing fast memory read. I'd still only suggest you look into it IF the DB is too slow. Adding moving parts to any application also adds potential failure points and increases the overall complexity.
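For reference, "used correctly" usually means the cache-aside pattern; a sketch with the php-memcached extension (key names and TTL are arbitrary) looks roughly like this:

```php
<?php
// Cache-aside sketch: try Memcached first, fall back to MySQL, then
// populate the cache so the next read is served from memory.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

function getReport(Memcached $cache, PDO $pdo, int $userId): array
{
    $key    = "report:user:{$userId}";
    $report = $cache->get($key);
    if ($report !== false) {
        return $report;               // cache hit: no DB disk read
    }

    $stmt = $pdo->prepare('SELECT * FROM reports WHERE user_id = ?');
    $stmt->execute([$userId]);
    $report = $stmt->fetchAll(PDO::FETCH_ASSOC);

    $cache->set($key, $report, 300);  // keep for 5 minutes
    return $report;
}
```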
EDIT: as for the use of an ORM, that's your choice, and it really won't change a thing concerning the DB's speed, although it may add fractions of a millisecond for the end user... usually worth it, in my experience.
Cheers --
Closed 9 years ago as a tool/library recommendation question.
I am starting a project for an analytics platform using PHP (just like Kissmetrics, Appanie) for our product.
So our database will be updated with tons of data daily. Using this data, we display reports to the users (there are many reports for a single user).
I was wondering whether MySQL is capable of handling this many tasks. If MySQL is not able to handle this, please suggest an alternative database.
MySQL can handle large amounts of data; you can use it. Make sure you implement proper indexing on tables for faster data retrieval. We have successfully used MySQL for lakhs (hundreds of thousands) of records in a table. The only important part is careful design of the database.
MySQL is capable. An alternative could be PostgreSQL.
The physical database size doesn't matter. The number of records doesn't matter.
In my experience, the biggest problem that you are going to run into is not size, but the number of queries you can handle at a time. Most likely you will have to move to a master/slave configuration so that the read queries can run against the slaves and the write queries run against the master. However, if you are not ready for this yet, you can always tweak your indexes for the queries you are running to speed up response times. There is also a lot of tweaking you can do to the network stack and kernel in Linux that will help.
For more details, go to this link.
Also, this data is originally taken from here.
MySQL is not very suitable for this. It has quite an inefficient query cache that is almost entirely flushed as soon as you edit any data in any of the tables used in the query. So when you do many updates, as you describe, querying will slow down considerably.
That said, for reports you may get away with a bit slower performance than for Web pages, for which MySQL is regularly used, so it might just work.
Many other databases, like Oracle and MS SQL Server, are quite a bit more solid, but they also cost quite a lot more.
Anyway, it would be wise to make your application as independent of the database as possible. Make sure you build a good database abstraction layer and don't use many (or any) triggers and stored procedures. That will help you switch databases if you later decide you need to.
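One way to keep that independence (just a sketch; the interface, class and table names are made up) is to hide the database behind a small interface that the rest of the application codes against:

```php
<?php
// A thin abstraction layer: the application depends on this interface,
// so swapping MySQL for another backend later only touches one class.
interface ReportStore
{
    /** @return array<int, array<string, mixed>> */
    public function dailyTotals(string $date): array;
}

class MySqlReportStore implements ReportStore
{
    public function __construct(private PDO $pdo) {}

    public function dailyTotals(string $date): array
    {
        $stmt = $this->pdo->prepare(
            'SELECT metric, SUM(value) AS total
               FROM events
              WHERE event_date = ?
              GROUP BY metric'
        );
        $stmt->execute([$date]);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}
// A PostgresReportStore (or any other backend) would implement the same
// interface without the rest of the application changing.
```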
Closed 1 year ago as needing more focus.
I'm currently developing a system for bus ticket reservation. The provider has many routes and different trips. I've set up a rather comprehensive database that maps all of this together, but I'm having trouble getting the pathfinding algorithm to work when it comes to cross-route reservations.
For example, if the user wants to go from Montreal to Sherbrooke, he'll only take what we call Route #47 here. But if he goes to Sutton instead of Sherbrooke, he now has to transfer to Route #53 at some point.
Now, it isn't too hard to detect one and only one transfer. But when it comes to detecting the options he has for crossing multiple routes, I'm kind of scared. I've devised a cute and relatively efficient way to do it for 1-3 hops using only SQL, but I'm wondering how I should organize all this more broadly, since the client will probably not stay with 2 routes for the rest of his life.
Example of what I've thought of so far:
StartingStop
joins to Route
joins to StopsOfTheRoute
joins to TransfersOnThatStop
joins to TargetStopOfThatTransfer
joins to RouteOfThatStop
joins to StopsOfThatNewRoute
[wash, rinse, repeat for more hops]
where StopsOfThatNewRoute = EndingStop
The problem is, if I have more than 3 hops, I'm sure my SQL server will choke rather quickly under the pressure. Even if I index my database correctly, I can easily predict a major failure eventually...
Thank you
My understanding of your problem: You are looking for an algorithm that will help you identify a suitable path (covering segments of one or more routes).
This is, as Jason correctly points out, a pathfinding problem. For this kind of problem, you should probably start by having a look at Wikipedia's article on pathfinding, and then dig into the details of Dijkstra's algorithm. That will probably get you started.
You will, however, most likely soon realise that your data model might pose a problem, both from a structure and a performance perspective. A typical example would be if you need to manage time constraints: which path has the shortest travel time, assuming you find several? One path might be the shortest, but only provide one ride per day, while another might be longer but provide several rides per day.
A possible way of handling this is to create a graph where each node corresponds to a particular stop at a particular time. An edge would connect this stop in space-time both to other geographical stops and to itself at the next point in time.
My suggestion would be to start by reading up on the pathfinding algorithms, then revisit your data model with regard to any constraints you might have. Then focus on the structure for storing the data, and the structure for searching for paths.
Suggestion (not efficient, but it could work if you have a sufficient amount of RAM to spare): use the relational database server for storing the basics: stops, which routes are connected to which stops, and so on. It seems you have this covered already. Then build an in-memory representation of a graph given the constraints that you have. You could probably build your own library for this pretty quickly (I am not aware of any existing ones for PHP); a rough sketch follows below.
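To show the shape of that in-memory approach, here is a plain Dijkstra sketch over stop-to-stop edges weighted by travel time. The adjacency list is hard-coded with invented stops and times purely for illustration; in practice you would build it from your stops, routes and transfers tables.

```php
<?php
// Dijkstra sketch over an in-memory graph. Keys are stop names, values are
// [neighbour => travel time in minutes]. Edge data is illustrative only.
$graph = [
    'Montreal'     => ['Granby' => 60],
    'Granby'       => ['Sherbrooke' => 55, 'TransferStop' => 20],
    'TransferStop' => ['Sutton' => 35],
    'Sherbrooke'   => [],
    'Sutton'       => [],
];

function shortestPath(array $graph, string $from, string $to): array
{
    $dist = array_fill_keys(array_keys($graph), INF);
    $prev = [];
    $dist[$from] = 0;

    // SplPriorityQueue is a max-heap, so distances are inserted negated.
    $queue = new SplPriorityQueue();
    $queue->insert($from, 0);

    while (!$queue->isEmpty()) {
        $node = $queue->extract();
        if ($node === $to) {
            break;
        }
        foreach ($graph[$node] as $neighbour => $cost) {
            $alt = $dist[$node] + $cost;
            if ($alt < $dist[$neighbour]) {
                $dist[$neighbour] = $alt;
                $prev[$neighbour] = $node;
                $queue->insert($neighbour, -$alt);
            }
        }
    }

    // Rebuild the path by walking the predecessor chain backwards.
    $path = [$to];
    while (isset($prev[end($path)])) {
        $path[] = $prev[end($path)];
    }
    return array_reverse($path);
}

print_r(shortestPath($graph, 'Montreal', 'Sutton'));
// Montreal -> Granby -> TransferStop -> Sutton
```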
Another alternative could be to use a graph database such as Neo4j and its REST interface. I guess this would require a significant redesign of your application.
Hope this gives you some helpful pointers.