Optimize large and gradually increasing database [closed] - php

I am about to create a PHP web project that will involve a large database. The database will be MySQL and will store more than 30,000 records per day. To optimize the DB I thought of using the Memcached library with it. Am I going about this the right way, or is there some other alternative to overcome the data optimization problem? I just want to provide fast retrieval and insertion. Can somebody advise me which tool I should use, and how, given that the data will gradually increase at a higher rate? Should I use an object-relational mapping concept too?

You can use a master-slave (replication) setup for this purpose. Basically it is a combination of two databases: one serving the read operations and the other handling the writes.
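As a rough illustration of what that looks like from PHP, here is a minimal sketch with two PDO connections, routing writes to the master and reads to the replica (host names, credentials, and the records table are placeholders):

    <?php
    // Hypothetical read/write split: writes go to the master, reads go to a replica.
    // Host names, credentials, and the `records` table are placeholders.
    $master  = new PDO('mysql:host=db-master;dbname=app', 'app_user', 'secret');
    $replica = new PDO('mysql:host=db-replica;dbname=app', 'app_user', 'secret');

    // Write path: always the master.
    $insert = $master->prepare('INSERT INTO records (payload, created_at) VALUES (?, ?)');
    $insert->execute(['some payload', date('Y-m-d H:i:s')]);

    // Read path: the replica (which may lag slightly behind the master).
    $select = $replica->prepare('SELECT * FROM records WHERE id = ?');
    $select->execute([1]);
    $row = $select->fetch(PDO::FETCH_ASSOC);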

I'd side with @halfer and say he's right about the test data. At least you'll know that you're not trying to optimize something that doesn't need optimizing.
On top of test data you'll also need some test scenarios to mimic the traffic patterns of your production environment. That's the hard part, and it really depends on the exact application patterns: how many reads versus writes versus updates per second.
Given your number (30k per day) you'd average out at well under one insert per second (roughly one every three seconds), which I'd assume even the cheapest machine could handle with ease. As for reads, a year's worth of data would be just under 11M records. You may want to partition the data (at the MySQL level or the application level) if lookups become slow, but I doubt you'd need to with such relatively small volumes. The real difference-maker would be if the number of reads were 1000x the number of inserts; then you could look into what @ram sharma suggested and set up a replicated master-slave model where the master takes all the writes and the slaves are read-only.
Memcached is a powerful beast when used correctly and can turn a slow DB disk read into a blazing fast memory read. I'd still only suggest you look into it IF the DB is too slow. Adding moving parts to any application also adds potential failure points and increases the overall complexity.
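If you do get to that point, the usual pattern is cache-aside: try Memcached first, fall back to the database on a miss, then store the result with a TTL. A minimal sketch using PHP's Memcached extension (the key naming and the 60-second TTL are arbitrary choices, and the records table is again a placeholder):

    <?php
    // Cache-aside read: hypothetical `records` table, 60-second TTL.
    $cache = new Memcached();
    $cache->addServer('127.0.0.1', 11211);
    $db = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret');

    function findRecord(PDO $db, Memcached $cache, int $id): ?array
    {
        $key = "record:$id";
        $row = $cache->get($key);
        if ($row !== false) {
            return $row;                     // cache hit: no DB round trip
        }

        $stmt = $db->prepare('SELECT * FROM records WHERE id = ?');
        $stmt->execute([$id]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC) ?: null;

        if ($row !== null) {
            $cache->set($key, $row, 60);     // cache miss: store for the next readers
        }
        return $row;
    }

    $record = findRecord($db, $cache, 1);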
EDIT: as for the use of an ORM, that's your choice and it really won't change a thing concerning the DB's speed, although it may add fractions of a millisecond for the end user. Usually worth it, in my experience.
Cheers --

Related

Can I use JSON as a database? [closed]

Currently, I am working on a website and have just started studying backend development. I wonder why nobody uses JSON as a database. Also, I don't quite get the utility of PHP and SQL: since I could easily get data from a JSON file and use it, why do I need PHP and SQL?
OK! Let's assume you put the data in a JSON variable and store it in a file for all your projects.
Obviously, you need to add a subsystem for taking backups, so you write one.
You have to improve performance for handling very large amounts of data, which means indexing, hash algorithms, and so on; assume you handle that too.
If you need APIs for working with and connecting from a variety of programming languages, you need to write them.
What about functionality? What if you need triggers, stored procedures, views, full-text search and so on? OK, you spend your time and add them.
Good job, but your system will grow and you will need to scale it. Can you? You write the machinery for clustering across servers, sharding, and so on.
Now you need to guarantee that your system complies with the ACID rules, to keep Atomicity, Consistency, Isolation, and Durability.
Can you always handle all the querying techniques (Map/Reduce) and respond quickly with a standard result structure?
Now it's time to offer very fast write speeds, which brings serious issues of its own.
OK, now work out your solutions for race conditions, isolation levels, locking, relations, and so on.
After you do all this work, plus thousands of other things, you will probably end up with a DBMS a little bit like MongoDB or some other relational or non-relational database!
So it's better to use them. Obviously, you can still choose not to; I admit that sometimes saving data in a single file has better performance, but only sometimes, in some cases, with some data, for some purposes. If you know exactly what you are doing, then it's OK to save data in a JSON file.
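For those narrow cases, even a minimal JSON-file store already forces you to think about locking and concurrent writes; a rough sketch (the file name and item shape are made up):

    <?php
    // Naive JSON "database": one file, exclusive lock while writing.
    // Fine for tiny, low-traffic data; everything listed above (indexes,
    // concurrency, ACID, scaling) is still your problem.
    function loadItems(string $file): array
    {
        if (!is_file($file)) {
            return [];
        }
        return json_decode(file_get_contents($file), true) ?: [];
    }

    function addItem(string $file, array $item): void
    {
        $items   = loadItems($file);
        $items[] = $item;
        // LOCK_EX stops two requests writing at the same time, but the
        // read-modify-write above is still a race: lost updates are possible.
        file_put_contents($file, json_encode($items, JSON_PRETTY_PRINT), LOCK_EX);
    }

    addItem('data.json', ['name' => 'example', 'created' => date('c')]);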

Which is better in terms of speed php or plpgsql? [closed]

I am using PHP 5.3 and PostgreSQL 9.1.
Presently, I am doing DB work "outside" the DB, in PHP: fetching data from the DB, processing it, and finally inserting/updating/deleting in the DB. But as I am getting comfortable working with PostgreSQL functions, I have started coding in PL/pgSQL.
Now I would like to know: is there any speed difference between the two, or can I use whichever I am comfortable with?
Also, will the answer be the same for higher versions, i.e. PHP 5.5 and PostgreSQL 9.3?
It depends on what you do. PL/pgSQL is optimized for data manipulation; PHP is optimized for producing HTML pages. Some of the underlying technology is similar, and the speed of basic constructs is similar. PHP is significantly faster at string manipulation, but PL/pgSQL runs in the same address space as the PostgreSQL database engine and uses the same data types, so there is zero overhead from data type conversions and interprocess communication.
Stored procedures have strong opponents and strong defenders. Like any other technology, if you can use it well, it can serve small and large projects perfectly. It is good for decomposition: it naturally divides an application into a presentation (interactive) layer and a data manipulation layer. That is important for data-centric applications and less important for presentation-centric applications. Even opponents agree that stored procedures are sometimes necessary for performance reasons.
I disagree with kafsoksilo: debugging, unit testing and maintenance are not a problem once you know the technology; you can use almost all the tools you already know. And PL/pgSQL is a pretty powerful language (for the data manipulation area): well documented, with good diagnostics, clean and readable error messages, and few surprises.
PL/pgSQL is faster, as you don't have to fetch the data, process it and then submit a new query. The whole process is done inside the database, and the function is also precompiled, which boosts performance further.
Moreover, when the database is on a remote server rather than local, you will have network round-trip delay. Sometimes the round-trip delay is longer than the time your whole script needs to run.
For example, if you need to execute 10 queries over a slow network, using PL/pgSQL to execute only one would be a great improvement.
If the processing you are going to perform involves fetching large chunks of data and outputting only a true or false, the PL/pgSQL gain will be even greater.
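To make the round-trip point concrete, here is a hedged sketch: a small PL/pgSQL function created and called from PHP via PDO, so the per-row work and the single result stay inside PostgreSQL (the orders table, its columns and the apply_discount function are invented for the example):

    <?php
    // One round trip instead of many: the per-row work happens inside PostgreSQL.
    // The `orders` table, its columns and apply_discount() are illustrative only.
    $db = new PDO('pgsql:host=localhost;dbname=app', 'app_user', 'secret');

    $ddl = '
    CREATE OR REPLACE FUNCTION apply_discount(p_percent numeric)
    RETURNS integer LANGUAGE plpgsql AS $fn$
    DECLARE
        updated integer;
    BEGIN
        UPDATE orders SET total = total * (1 - p_percent / 100) WHERE paid = false;
        GET DIAGNOSTICS updated = ROW_COUNT;
        RETURN updated;
    END;
    $fn$;
    ';
    $db->exec($ddl);

    // A single query and a single network round trip from PHP.
    $stmt = $db->prepare('SELECT apply_discount(:pct)');
    $stmt->execute([':pct' => 10]);
    echo $stmt->fetchColumn() . " rows updated\n";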
On the other hand, using PL/pgSQL and putting logic in the database makes your project a lot more difficult to debug, to fix, and to unit test. It also makes it a lot more difficult to change the RDBMS in the future.
My suggestion would be to manipulate the data in PHP, and use a little PL/pgSQL only when you want to isolate some logic for security or data-integrity reasons, or when you want to tune your project for the best possible performance (which should be a concern after the first release).

Looking for which NoSQL DB is appropriate in our case [closed]

We have a tracking system running in PHP + MySQL.
We receive about 8 to 10 million entries per day, which represents an average of about 100 insertions per second across 3 tables linked by a clickid key.
In parallel we select on those tables to search for a clickid, or update a row after a conversion, etc.
We are looking for a better solution so we can use the back office and get statistics in real time, because right now it takes about 150 seconds to display a result.
We use cron jobs to fill a stats table and work with it, which gives us very quick results, but this cron job runs twice per hour, so we are far away from real-time stats...
So we are thinking of switching to a NoSQL solution, but we are not sure which NoSQL DB would be best adapted to our specific case.
We should be able to filter and retrieve statistics by about 8 different keys, such as campaignid, publisherid, advertiserid, date, ...
We were thinking of testing MongoDB and Redis; which one do you think would be the most appropriate, and why? We currently have about 500,000,000 entries we would need to insert as documents, and about 100 new documents will be inserted every second, so it will grow quite fast and we'll need to keep the data.
How long do you think it will take to display results with this quantity of data?
Also, do you think it is better to split the data into different collections, or to keep everything in a single big collection?
I don't have extensive experience with Redis, but I can tell you something about MongoDB.
The NoSQL movement is mostly about scalability, so there would be very limited options if you want to keep everything in one collection. Most NoSQL DBs would break that up into sharded replica sets; you can read about sharding and replica sets in the MongoDB documentation. If you are planning on MongoDB, the writes can be quick, since it is sharded and replicated. If you don't mind the data being a bit stale (depending on the latency between the primary and the secondaries in the shard), MongoDB can be a good alternative.
Typically, you could write to the primary and read from the secondaries, as opposed to your current scenario, where I guess everything happens on one DB. This should be a significant performance boost for your operations, but exactly how much would depend on the details.
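With the official PHP library, that write-to-primary / read-from-secondary split is mostly a matter of the connection string; a sketch, with the hosts, replica set name, database and collection names invented for the example:

    <?php
    // Writes go to the replica set primary; reads may be served by secondaries.
    // Hosts, replica set name, and the tracking.clicks namespace are placeholders.
    require 'vendor/autoload.php'; // mongodb/mongodb library

    $client = new MongoDB\Client(
        'mongodb://db1.example,db2.example,db3.example/?replicaSet=rs0&readPreference=secondaryPreferred'
    );
    $clicks = $client->tracking->clicks;

    // One tracking event per click (always written to the primary).
    $clicks->insertOne([
        'clickid'      => 'abc123',
        'campaignid'   => 42,
        'publisherid'  => 7,
        'advertiserid' => 3,
        'date'         => new MongoDB\BSON\UTCDateTime(),
    ]);

    // Stats query (may hit a secondary, so results can lag slightly).
    $cursor = $clicks->aggregate([
        ['$match' => ['campaignid' => 42]],
        ['$group' => ['_id' => '$publisherid', 'clicks' => ['$sum' => 1]]],
    ]);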
You could actually use either, both, or neither :) I still don't understand what your requirements are, most importantly in terms of your current data volume and expected growth pattern, nor what you want to get out of it apart from getting statistics in real time. I also didn't quite get whether you're planning to replace MySQL entirely or whether you're going to build on top of / beside it.
I definitely agree that 150s is not an acceptable response time for your dashboard, but before diving head-on into a forklift operation, I suggest you consider a simpler approach, like just keeping your real-time statistics counters in a suitable datastore (e.g. Redis ;)).
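For the counter idea, here is a rough sketch with the phpredis extension: bump a few counters at insert time, keyed by dimension and day, and let the dashboard read them back directly (the key layout is just one possible scheme):

    <?php
    // Increment per-dimension counters as each click is recorded.
    // Key layout stats:<dimension>:<id>:<yyyymmdd> is an invented convention.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    function trackClick(Redis $redis, array $click): void
    {
        $day = date('Ymd');
        $redis->incr("stats:campaign:{$click['campaignid']}:$day");
        $redis->incr("stats:publisher:{$click['publisherid']}:$day");
        $redis->incr("stats:advertiser:{$click['advertiserid']}:$day");
    }

    trackClick($redis, ['campaignid' => 42, 'publisherid' => 7, 'advertiserid' => 3]);

    // Dashboard read: O(1) per counter instead of a 150-second aggregation query.
    $today       = date('Ymd');
    $clicksToday = (int) $redis->get("stats:campaign:42:$today");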

Logistics and transportation planning techniques [closed]

I'm currently developing a system for bus ticket reservation. The provider has many routes and different trips. I've set up a rather comprehensive database that maps all this together, but I'm having trouble getting the pathing algorithm to work when it comes to cross-route reservations.
For example, if the user wants to go from Montreal to Sherbrooke, he'll only take what we call Route #47 here. But if he goes to Sutton instead of Sherbrooke, he now has to transfer to Route #53 at some point.
Now, it isn't too hard to detect one and only one transfer. But when it comes to detecting what options he has to cross multiple routes, I'm kind of scared. I've devised a cute and relatively efficient way to do this for 1-3 hops using only SQL, but I'm wondering how I should organize all this in a much broader spectrum, as the client will probably not stay with 2 routes for the rest of his life.
Example of what i've thought of so far:
StartingStop
joins to Route
joins to StopsOfTheRoute
joins to TransfersOnThatStop
joins to TargetStopOfThatTransfer
joins to RouteOfThatStop
joins to StopsOfThatNewRoute
[wash, rinse, repeat for more hops]
where StopsOfThatNewRoute = EndingStop
The problem is, if I have more than 3 hops, I'm sure my SQL server will choke rather fast under the pressure. Even if I index my database correctly, I can easily predict a major failure eventually...
Thank you
My understanding of your problem: you are looking for an algorithm that will help you identify a suitable path (covering segments of one or more routes).
This is, as Jason correctly points out, a pathfinding problem. For this kind of problem, you should probably start by having a look at Wikipedia's article on pathfinding, and then dig into the details of Dijkstra's algorithm. That will probably get you started.
You will however most likely soon realise that your data model might pose a problem, both from a structure and a performance perspective. A typical example would be if you need to manage time constraints: which path has the shortest travel time, assuming you find several? One path might be shortest but only provide one ride per day, while another path might be longer but provide several rides per day.
A possible way of handling this is to create a graph where each node corresponds to a particular stop at a particular time. An edge would connect this stop in space-time both to other geographical stops and to itself at the next point in time.
My suggestion would be to start by reading up on the pathfinding algorithms, then revisit your data model with regard to any constraints you might have. Then focus on the structure for storing the data, and the structure for searching for paths.
Suggestion (not efficient, but it could work if you have a sufficient amount of RAM to spare): use the relational database server for storing the basics: stops, which routes are connected to which stops, and so on. It seems you have this covered already. Then build an in-memory representation of the graph given the constraints that you have. You could probably build your own library for this pretty fast (I am not aware of any such libraries for PHP); see the sketch below.
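As an illustration of that in-memory approach, here is a hedged Dijkstra sketch over an adjacency list of stops using SplPriorityQueue. The intermediate stop and travel times are invented, and a real version would use time-expanded nodes (stop + departure time) as described above:

    <?php
    // Minimal Dijkstra over an in-memory stop graph.
    // $graph[$stop] = [neighbourStop => travelMinutes, ...]; the data is illustrative.
    function shortestPath(array $graph, string $from, string $to): ?array
    {
        $dist  = [$from => 0];
        $prev  = [];
        $queue = new SplPriorityQueue();      // max-heap by default...
        $queue->insert($from, 0);             // ...so priorities are negated costs

        while (!$queue->isEmpty()) {
            $stop = $queue->extract();
            if ($stop === $to) {
                break;
            }
            foreach ($graph[$stop] ?? [] as $next => $minutes) {
                $alt = $dist[$stop] + $minutes;
                if (!isset($dist[$next]) || $alt < $dist[$next]) {
                    $dist[$next] = $alt;
                    $prev[$next] = $stop;
                    $queue->insert($next, -$alt);
                }
            }
        }

        if (!isset($dist[$to])) {
            return null;                      // no connection found
        }
        $path = [$to];
        while (isset($prev[$path[0]])) {
            array_unshift($path, $prev[$path[0]]);
        }
        return $path;
    }

    $graph = [
        'Montreal'   => ['Granby' => 60],
        'Granby'     => ['Sherbrooke' => 50, 'Sutton' => 45],
        'Sherbrooke' => [],
        'Sutton'     => [],
    ];
    print_r(shortestPath($graph, 'Montreal', 'Sutton')); // Montreal -> Granby -> Sutton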
Another alternative could be to use a graph database such as Neo4j and its REST interface. I guess this would require some significant redesign of your application.
Hope this gives you some helpful pointers.

Logging in a PHP webapp [closed]

I want to keep logs of some of the things that people do in my app, in some cases so that they can be undone if needed.
Is it best to store such logs in a file or a database? I'm completely at a loss as to what the pros and cons are, except that it's another table to set up.
Is there a third (or fourth etc) option that I'm not aware of that I should look into and learn about?
There is at least one definite reason to go for storing in the database: you can use INSERT DELAYED in MySQL (or similar constructs in other databases), which returns immediately. You won't get any return data from the database with these kinds of queries, and they are not guaranteed to be applied.
By using INSERT DELAYED, you won't slow down your app too much because of the logging. The database is free to write the INSERTs to disk at any time, so it can bundle a bunch of inserts together.
You need to watch out for MySQL's built-in timestamp functions (like CURRENT_TIMESTAMP or CURDATE()), because they are evaluated whenever the query is actually executed. So you should make sure that any time data is generated in your programming language, and not by the database. (This paragraph might be MySQL-specific.)
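A sketch of the idea, with the timestamp generated in PHP as suggested. Note that DELAYED only ever applied to a few storage engines (MyISAM, MEMORY, ARCHIVE) and newer MySQL versions deprecate it and treat it as a plain INSERT, so this reflects the state of things at the time; the table and column names are made up:

    <?php
    // Fire-and-forget log insert; the timestamp comes from PHP, not the database,
    // so it records when the event happened rather than when the row is written.
    // The action_log table is illustrative; DELAYED assumes an older MySQL + MyISAM.
    $db = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret');

    $userId = 42; // example value
    $stmt = $db->prepare(
        'INSERT DELAYED INTO action_log (user_id, action, created_at) VALUES (?, ?, ?)'
    );
    $stmt->execute([$userId, 'deleted_post', date('Y-m-d H:i:s')]);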
You will almost certainly want to use a database for flexible, record-based access and to take advantage of the database's ability to handle concurrent access. If you need to track information that may need to be undone, having it in a structured format is a benefit, as is the ability to update a row indicating when and by whom a given transaction was undone.
You would likely only want to write to a file if very high performance is an issue, or if you have very unstructured or large amounts of data per record that might be unwieldy to store in a database. Note that unless your application has a very large number of transactions, database speed is unlikely to be an issue. Also note that if you are working with a file you'll need to handle concurrent access (read / write / locking) very carefully, which is likely not something you want to have to deal with.
I'm a big fan of log4php. It gives you a standard interface for logging actions. It's based on log4j. The library loads a central config file, so you never need to change your code to change logging. It also offers several log targets, like files, syslog, databases, etc.
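A rough sketch of what that looks like in practice; the include path and the config file name are placeholders that depend on how log4php is installed and configured:

    <?php
    // Minimal log4php usage: the config file decides where entries go
    // (file, syslog, database appender, ...), so the calling code never changes.
    require_once 'log4php/Logger.php';   // path depends on your installation
    Logger::configure('logging.xml');    // appenders/targets defined in the config

    $log = Logger::getLogger('app');
    $log->info('User 42 deleted post 1337');
    $log->warn('Undo requested for post 1337');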
I'd use a database, simply for maintainability; also, multiple concurrent edits on a file may cause some entries to be missed.
I will second both of the above suggestions and add that file locking on a flat file log may cause issues when there are a lot of users.
