I'm trying to finish up a long-term project, and I'm going over my code looking for inefficiencies and tidying them up.
The data structure in MySQL is an undirected graph. On the whole I'm quite happy with the performance, though I am sticking to old habits such as caching results in flat files where the data does not change often, despite being dynamic. I'm also using flat files as my site configuration database: approximately 40 lines, each with the structure ConfVar=ConfVarValue.
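As an illustration of the ConfVar=ConfVarValue format described above, here is a minimal Python sketch of a parser for such a file. The function name `load_config`, the `#` comment syntax and the sample keys are assumptions for illustration, not part of the original setup.

```python
def load_config(text: str) -> dict:
    """Parse lines of the form ConfVar=ConfVarValue into a dict."""
    config = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and (assumed) '#' comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {lineno}: expected ConfVar=ConfVarValue, got {raw!r}")
        # Split on the first '=' only, so values may themselves contain '='.
        config[key.strip()] = value.strip()
    return config


sample = """
# site configuration (hypothetical values)
SiteName=Example
CacheDir=/var/app/cache
DbDsn=mysql:host=localhost;dbname=app
"""
settings = load_config(sample)
print(settings["DbDsn"])  # mysql:host=localhost;dbname=app
```

Splitting on the first `=` only keeps values such as DSNs intact.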
Is this an efficient hybrid use of flat files and SQL? Throughout the design I've constantly questioned whether the flat files are secure enough (they are all stored below the document root), and whether they are giving me the efficiency I was ultimately aiming for, in a scalable way.
Any guidance, thoughts or observations from anyone who has designed similar data models would be invaluable. Thanks in advance.
I am constructing a PHP framework from scratch (unfortunately I don't have any choice in this matter). The framework is required to rely heavily on object-oriented data, and therefore needs to have the ability to store large amounts of object-oriented data efficiently.
I am struggling with the second part.
I've been working on this for a few months. I was initially introduced to the idea of an ORM, and after trying a few pre-built libraries (Doctrine 2, Redbean, etc.) I liked the idea, but none of the ones I could find functioned the way I needed, so I set out to create my own ORM, which turned out quite well. The only real issue is that it suffers in performance, and after spending some time trying to optimize it, I am now convinced that an ORM is not quite the solution to the problem. Although close, it just doesn't quite cut it.
I have briefly looked into other solutions, but due to my lack of experience in this area I am struggling to pinpoint the right one.
Here are the requirements of the data storage engine:
Ultimately, it needs to be able to store key-value pairs
The "value" part can be a simple data type, but can also be an object, or an array of the same type of object.
The application defines the structure of each object (its SCHEMA), somewhat like a .wsdl file does, so the engine needs to support strict formats.
Objects may or may not have their instances re-used. If an object is re-used and exists as a child object in multiple locations (across many objects), its values are the same everywhere it appears. Otherwise, each parent object gets its own new instance (not re-used).
There needs to be a way to query the data efficiently, making comparisons on any part of an object to find it. For example: find a customer where customer.address.postcode LIKE ('%XXX%')
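The last two requirements (re-used versus non-re-used child objects, and querying on any part of an object) can be illustrated with a small in-memory Python sketch. Plain dicts stand in for schema-defined objects, and `get_path` and `find_like` are hypothetical helpers; a real engine would need indexes, whereas this is a linear scan.

```python
from typing import Any


def get_path(obj: Any, path: str) -> Any:
    """Follow a dotted path such as 'address.postcode' through nested dicts."""
    for part in path.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None  # path does not exist on this object
        obj = obj[part]
    return obj


def find_like(objects: list, path: str, needle: str) -> list:
    """Linear-scan equivalent of: WHERE <path> LIKE '%needle%' (case-sensitive)."""
    matches = []
    for obj in objects:
        value = get_path(obj, path)
        if isinstance(value, str) and needle in value:
            matches.append(obj)
    return matches


# Re-used child: one address instance shared by two customers.
shared_address = {"street": "1 High St", "postcode": "AB1 2CD"}
customers = [
    {"name": "Ann", "address": shared_address},
    {"name": "Bob", "address": shared_address},
    # Not re-used: this customer owns a separate address instance.
    {"name": "Cy", "address": {"street": "9 Low Rd", "postcode": "ZZ9 9ZZ"}},
]

# Updating the shared instance changes it everywhere it is referenced.
shared_address["postcode"] = "AB1 2XX"
matches = find_like(customers, "address.postcode", "2XX")
print([c["name"] for c in matches])  # ['Ann', 'Bob']
```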
Any suggestions would be greatly appreciated.
EDIT
Thanks to those that have attempted to aid me so far in my somewhat crazy endeavour. To answer some questions that have so far been asked:
What solutions have you tried, and why did they not work?
ORM systems
I had tried a small number of pre-built ORM libraries for PHP, including Doctrine 2 and Redbean. With Doctrine, the problem was mostly how you specify the SCHEMA of a model: you are required to do so in docblocks. I found this particularly awkward given my requirements, especially because I knew of several ways this could be avoided. I did eventually get Doctrine to work the way I wanted, but only after hacking away at its code. Again, this was fun, but it wasn't right.
Redbean actively required me to change the property names of objects. One of my requirements was to be able to plug in any sort of document-oriented object and store it, so having to name properties specifically for the storage layer was counter-intuitive. Again, I did play with Redbean for a while to get it to work, which wasn't right.
It was after trying a few more ORM systems that I felt I had the knowledge to make my own. Again, the ORM I built was good, in that it met the requirements precisely. It was badly let down by poor performance, specifically with large sets of data, and more so with highly complex models.
Storing objects in XML files
For a very short time I considered this, thinking that maybe my requirements meant performance was always going to be a problem anyway. So I set out to design text-based storage and ended up creating a whole SCHEMA engine and a bunch of other interesting things. In the end it was just a fun project; I couldn't get it to perform at all.
NoSQL
My most recent endeavours have pushed me towards systems such as MongoDB, plus a few other NoSQL systems, like Cassandra, that I didn't get very far into.
MongoDB comes very close to being a tool I could use; however, it would require an additional layer, because I do in fact require a SCHEMA, since my objects always conform to a specific structure. I am slowly coming to terms with MongoDB possibly being the solution, but I want to make sure before I spend more time on it.
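The "additional layer" mentioned above could be as thin as a validator that checks each document against an application-defined schema before it is handed to the database. Below is a minimal Python sketch under that assumption; the schema notation and `validate` function are hypothetical, and no MongoDB driver is used. MongoDB's own server-side collection validators (the `$jsonSchema` operator, available since 3.6) may also cover part of this.

```python
from typing import Any

# Hypothetical schema notation: a type for leaves, a dict for child objects,
# and a one-element list for "array of the same type of object".
CUSTOMER_SCHEMA = {
    "name": str,
    "address": {"street": str, "postcode": str},
    "orders": [{"sku": str, "qty": int}],
}


def validate(doc: Any, schema: Any, path: str = "") -> list:
    """Return error messages; an empty list means the document conforms."""
    where = path.rstrip(".") or "<root>"
    if isinstance(schema, dict):
        if not isinstance(doc, dict):
            return [f"{where}: expected object"]
        errors = [f"{path}{key}: unexpected field"
                  for key in sorted(doc.keys() - schema.keys())]
        for key, sub_schema in schema.items():
            if key not in doc:
                errors.append(f"{path}{key}: missing")
            else:
                errors.extend(validate(doc[key], sub_schema, f"{path}{key}."))
        return errors
    if isinstance(schema, list):
        if not isinstance(doc, list):
            return [f"{where}: expected array"]
        errors = []
        for i, item in enumerate(doc):
            errors.extend(validate(item, schema[0], f"{path}{i}."))
        return errors
    # Leaf: schema is a plain type such as str or int.
    return [] if isinstance(doc, schema) else [f"{where}: expected {schema.__name__}"]


good = {"name": "Ann", "address": {"street": "1 High St", "postcode": "AB1 2CD"},
        "orders": [{"sku": "X1", "qty": 2}]}
bad = {"name": "Bob", "address": {"street": "2 Low Rd"},
       "orders": [{"sku": "X2", "qty": "two"}]}
print(validate(good, CUSTOMER_SCHEMA))  # []
print(validate(bad, CUSTOMER_SCHEMA))   # ['address.postcode: missing', 'orders.0.qty: expected int']
```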
What exactly do you mean by efficient?
I'm not talking purely about performance when I mention efficiency. Performance is certainly an important factor in weighing my options, but I understand that if I go down this route rather than using something like a relational database, performance is naturally going to be a challenge.
I am more talking about using the right tools. I never like having to hack away at someone else's code to get things to work. It feels as if I am pushing the system down a road it wasn't designed for, and at some point in the future it will bite me in the a**.
So really, when I say I am looking for something "efficient", I mean tools that match the requirements as closely as possible, so that I am only using/extending the functionality, rather than re-writing it.
Here are some routes to look into. Your requirement for storing "objects" (quite a broad term when it comes to databases) makes me think of:
Storing data in databases in a serialised format, e.g. JSON. PostgreSQL these days has ways to reach into such a column to run search operations on it, so it is not as unsearchable as was previously thought (though I would expect it to be slower than querying correctly normalised data).
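To make "reaching into a serialised column" concrete without a PostgreSQL server, here is a sketch that swaps in SQLite's `json_extract` function via Python's built-in `sqlite3` module (this assumes the SQLite build includes JSON support, which is standard in recent versions). The table and field names are hypothetical. In PostgreSQL the equivalent expression would be along the lines of `data->'address'->>'postcode'` on a `jsonb` column.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Each customer object is stored whole, serialised as JSON text.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
rows = [
    {"name": "Ann", "address": {"postcode": "AB1 2CD"}},
    {"name": "Bob", "address": {"postcode": "ZZ9 9ZZ"}},
]
conn.executemany("INSERT INTO customer (data) VALUES (?)",
                 [(json.dumps(r),) for r in rows])

# Reach into the serialised column: roughly customer.address.postcode LIKE '%2CD%'.
found = conn.execute(
    "SELECT json_extract(data, '$.name') FROM customer "
    "WHERE json_extract(data, '$.address.postcode') LIKE ?",
    ("%2CD%",),
).fetchall()
print(found)  # [('Ann',)]
```

Without an index this is a full table scan, which is consistent with the expectation above that it will be slower than querying normalised data.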
The requirement to query customer.address.postcode makes me think you could store your data as a hierarchy, in which case there are several algorithms available to you. Look into nested sets, which are designed to work well with relational databases without resorting to recursive SQL.
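A minimal sketch of the nested-set idea, using Python's `sqlite3` so it is self-contained: a depth-first walk assigns each node a `lft`/`rgt` pair, and every descendant of a node then falls strictly between its parent's two numbers, so a single non-recursive query finds a whole subtree. The tree contents and the `number_nested_sets` helper are hypothetical.

```python
import sqlite3

# Hypothetical hierarchy: node -> list of children.
tree = {"customer": ["address", "orders"], "address": ["postcode"],
        "orders": [], "postcode": []}


def number_nested_sets(tree, root):
    """Assign (lft, rgt) by a depth-first walk; returns {node: (lft, rgt)}."""
    bounds, counter = {}, 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter
        for child in tree[node]:
            visit(child)
        counter += 1
        bounds[node] = (left, counter)

    visit(root)
    return bounds


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (name TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.executemany("INSERT INTO node VALUES (?, ?, ?)",
                 [(n, l, r) for n, (l, r) in number_nested_sets(tree, "customer").items()])

# All descendants of 'customer' in one non-recursive query.
descendants = conn.execute(
    "SELECT c.name FROM node p JOIN node c ON c.lft > p.lft AND c.rgt < p.rgt "
    "WHERE p.name = ? ORDER BY c.lft", ("customer",)).fetchall()
print([d[0] for d in descendants])  # ['address', 'postcode', 'orders']
```

The trade-off is that reads are cheap while inserting or moving a node means renumbering the `lft`/`rgt` values to its right.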
It's not an area of my expertise, but graph databases may be worth looking into.
On a side note, Doctrine is a great library from what I hear, but I suspect you need to work out what technology to use first. It is designed broadly to map onto a relational database, so if you can't express your problem cleanly in a raw RDBMS, Doctrine may not help.
(This could be an XY question, it's hard to tell. You've said you need Y, but if you can tell us that you want to achieve X, maybe the feedback you're getting would be more concrete - and take you in a better direction).
I am writing a website that indexes large amounts of data into databases (about 800 tables per database), and the website lets you search the databases for various items. Should I use something like Lucene or just write my own search algorithm? I am using PHP and MySQL. Although I can filter my SELECT queries and create a search algorithm, I wanted to know whether I should use Lucene, since I am only indexing things in a database. Please also suggest anything else that might help. I forgot to mention that although I have 800 tables, they are all fairly small.
Lucene is a mature, tested, open source library.
I would definitely say: use it as much as possible. It will probably be better, and take less time, than implementing your own library.
If there is some functionality that Lucene does not provide, you can always create your own variation of Lucene to take care of it.
Do not underestimate the importance of the community when using products such as Lucene: help is almost always available in Lucene's forums [and on SO], and the library is constantly tested and maintained because of its large number of users!
Without seeing your data this question is very hard to answer; however, I can say from personal experience that writing a search of any kind quickly becomes very complex. You have to worry about weighting the various columns you are searching, and search in SQL is almost never as fast as search in a dedicated search engine. At work we are switching from an in-house SQL-based search to Sphinx Search for our product catalog for exactly this reason.
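To show why "weighting the various columns" quickly adds complexity, here is a toy Python sketch of an inverted index with per-field weights. The field names, weights and tokenizer are hypothetical. Lucene and Sphinx are also built around inverted indexes, but with far more sophisticated tokenisation, scoring and storage than this.

```python
import re
from collections import defaultdict

FIELD_WEIGHTS = {"title": 3.0, "description": 1.0}  # hypothetical weights


def tokenize(text):
    """Lower-case and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map term -> {doc_id: weighted term count}."""
    index = defaultdict(lambda: defaultdict(float))
    for doc_id, doc in docs.items():
        for field, weight in FIELD_WEIGHTS.items():
            for term in tokenize(doc.get(field, "")):
                index[term][doc_id] += weight
    return index


def search(index, query):
    """Sum the weighted scores of every query term; best match first."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for doc_id, score in index.get(term, {}).items():
            scores[doc_id] += score
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    1: {"title": "Red widget", "description": "A small widget"},
    2: {"title": "Blue gadget", "description": "Works with any red widget"},
}
idx = build_index(docs)
print(search(idx, "red widget"))  # [1, 2]
```

Even at this size you are already tuning weights and tokenisation by hand, which is the work a dedicated engine takes off your plate.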
I am using a commercial PHP web application that stores information in a MySQL database, and I need to create some custom reports on that data, ideally presented via the web with the ability to export the reports to PDF or some other external format as well.
I could just slap together some PHP to query the DB and show the results, but I was hoping there might be a more intelligent framework I could use to generate these reports faster and more easily, now and in the future. CodeIgniter looks like it may be a good starting point, but I'm not in love with it. What do people use when they need to work with data in an existing SQL database but don't want to roll everything from scratch?
Edit: I know PHP/Python/Ruby well enough to get by, but I'm rusty, so starting from scratch will make the process longer than it needs to be. I'm looking to leverage quality frameworks, if they exist, to get the best results in the long run.
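For comparison, the "slap together" baseline is small: run a report query and write the rows to an export format. The Python sketch below uses an in-memory `sqlite3` database as a stand-in for the application's MySQL database, and the `orders` table, its columns and the helper names are hypothetical. CSV is shown because it needs only the standard library; PDF output would need a third-party library and is not shown.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the application's MySQL database
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders (customer, total) VALUES ('Ann', 10.0), ('Bob', 5.5), ('Ann', 2.5);
""")


def run_report(conn, sql, params=()):
    """Run a report query and return (column names, rows)."""
    cur = conn.execute(sql, params)
    return [c[0] for c in cur.description], cur.fetchall()


def to_csv(columns, rows):
    """Render a header row plus data rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


columns, rows = run_report(
    conn,
    "SELECT customer, SUM(total) AS revenue FROM orders GROUP BY customer ORDER BY customer",
)
print(to_csv(columns, rows))
```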
I would recommend Django. It has a management command, inspectdb, that can automatically generate models from an existing database. You could leverage that to get going quickly and then use Django's powerful ORM to build your reports.
I'm rewriting a big website that needs a very solid architecture. Here are my questions; pardon me for mixing apples and oranges (and probably kiwis too :). I did a lot of research and ended up totally confused.
Main question: Which approach would you take in building a big website expected to grow in every way?
Single entry point, pages data in the database, pulled by associating GET variable with database entry (?pageid=whatever)
Single entry point, pages data in separate files, included based on GET variable (?pageid=whatever would include whatever.php)
MVC (Alright guys, I'm all for it, but I can't grasp the concept despite checking all the tutorials and frameworks out there. Do they store the "view" in the database? From the examples, it seems that if you have 1000 pages of the same kind they can be shaped by one model, but will I still need 1000 "view" files?)
PAC - this sounds even more logical to me, but I didn't find many resources. If this is a good way to go, can you recommend any books or links?
DAL/DAO/DDD - I learned about these terms by diligently reading through Stack Overflow before posting this question. Not sure whether they belong on this list
Sit down and create my own architecture (likely to do if nobody enlightens me here:)
Something not mentioned...
Thanks.
Scalability/availability (i.e. high traffic) for websites is best addressed by none of the items you mention, especially the first two options; storing the page definitions in a database is an absolute no-no. MVC and similar patterns are about code clarity and maintenance, not scalability.
An important missing piece of information is how many concurrent hits per second you are expecting. People who haven't built high-traffic websites are sometimes surprised at the hit rates that actually constitute a "scalability nightmare".
There are whole books on designing scalable architectures, so an SO post will not be able to do the topic justice, but some very top-level concepts, in no particular order, are:
Scalability is best handled first by looking at hardware-based solutions. A beefy server with an array of SSD disks can go a long way.
Make static anything that can be static. Serve as much as you can from the web server, not the DB. For example, many websites dynamically generate lists from the database on every page view even though the underlying data very rarely, or never, changes.
Cache output that changes infrequently, and tune the cache refresh.
Build dynamic pages to be stateless or asynchronous. Look into CQRS and Event Sourcing for patterns that favor/facilitate scaling.
Tune your queries. The DB is usually the big bottleneck since it is a shared resource. Lots of web app builders use ORMs that create poor queries.
Tune your database engine. Backups, replication, sweeping, logging, all of these require just a little bit of resource from your engine. Tuning it can lead to a faster DB that buys you time from a scale-out.
Reduce the number of HTTP requests from clients. Each HTTP connection has overhead. Check your pages and see if you can increase the payload of each request so as to reduce the overall number of individual requests.
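The "cache output that changes infrequently, and tune the cache refresh" point can be sketched as a tiny time-to-live cache in Python, where the TTL is the refresh knob. The `TTLCache` class and the injectable clock are hypothetical, chosen so the example is deterministic; production setups typically use a shared cache or HTTP-level caching rather than an in-process dict.

```python
import time


class TTLCache:
    """Tiny output cache: entries expire after ttl seconds (the 'refresh' knob)."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}

    def get_or_render(self, key, render):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]            # fresh: serve cached output
        output = render()            # stale or missing: regenerate
        self._store[key] = (now, output)
        return output


# Fake clock so the example is deterministic.
fake_now = [0.0]
cache = TTLCache(ttl=60, clock=lambda: fake_now[0])
renders = []


def render_home():
    renders.append(1)
    return f"<html>render #{len(renders)}</html>"


cache.get_or_render("home", render_home)         # miss: renders
cache.get_or_render("home", render_home)         # hit: served from cache
fake_now[0] = 61.0
page = cache.get_or_render("home", render_home)  # expired: re-renders
print(page)  # <html>render #2</html>
```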
At this point you've optimized the behavior on one server, and you have to "scale out". Now things get very complicated very fast: load-balancing scenarios of various types (sharding, DNS-driven, dumb balancing, etc.), separating read data from write data on different DBs, moving to a virtualization solution like Google Apps, offloading static content to a big CDN service, using a language like Erlang or Scala and parallelizing your app, and so on.
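Two of the scale-out ideas above, sharding and separating reads from writes, can be sketched together as a small routing layer. In the Python sketch below, the `Router` class and the shard/replica names are hypothetical: writes go to the shard chosen by a stable hash of the key, and reads rotate round-robin over that shard's replicas.

```python
import hashlib
import itertools


class Router:
    """Hypothetical router: hash-shard writes by key, round-robin reads over replicas."""

    def __init__(self, shards, replicas):
        self.shards = shards        # e.g. ["shard-0", "shard-1"]
        self.replicas = replicas    # shard -> list of read replicas
        self._rr = {s: itertools.cycle(r) for s, r in replicas.items()}

    def shard_for(self, key):
        # Stable hash: Python's built-in hash() of a str is salted per process.
        digest = hashlib.sha1(str(key).encode()).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def for_write(self, key):
        return self.shard_for(key)

    def for_read(self, key):
        return next(self._rr[self.shard_for(key)])


router = Router(
    shards=["shard-0", "shard-1"],
    replicas={"shard-0": ["s0-r1", "s0-r2"], "shard-1": ["s1-r1", "s1-r2"]},
)
shard = router.for_write("user:42")
print(shard, router.for_read("user:42"), router.for_read("user:42"))
```

Plain modulo sharding remaps most keys when the number of shards changes; consistent hashing is the usual answer to that, and is one more example of how quickly things get complicated at this stage.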
Single entry point, pages data in the database, pulled by associating GET variable with database entry (?pageid=whatever)
A potential maintenance nightmare, and also a development one if you have a team of more than 2-3 people. You would need to create a set of strict rules for everyone to adhere to, effort that would be much better spent on MVC. The same goes for the second option (page data in separate files).
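For reference, the single-entry-point mechanism itself is small. The Python sketch below maps `?pageid=...` to a handler through an explicit whitelist instead of turning the GET value into a file path to include; the handler names and the `PAGES` table are hypothetical. Keeping that whitelist in sync by hand is exactly the kind of rule-keeping described above.

```python
from urllib.parse import parse_qs


def render_home():
    return "<h1>Home</h1>"


def render_about():
    return "<h1>About</h1>"


# Explicit whitelist: only these page ids can ever be dispatched to.
PAGES = {"home": render_home, "about": render_about}


def front_controller(query_string):
    """Single entry point: map ?pageid=... to a handler, never to a raw file path."""
    params = parse_qs(query_string)
    page_id = params.get("pageid", ["home"])[0]
    handler = PAGES.get(page_id)
    if handler is None:
        return 404, "<h1>Not found</h1>"
    return 200, handler()


print(front_controller("pageid=about"))             # (200, '<h1>About</h1>')
print(front_controller("pageid=../../etc/passwd"))  # (404, '<h1>Not found</h1>')
```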
MVC (Alright guys, I'm all for it, but can't grasp the concept besides checking all tutorials and frameworks out there, do they store "view" in database? Seems to me from examples that if you have 1000 pages of same kind they can be shaped by 1 model, but I'll still need to have 1000 "views" files?)
It depends on how many page layouts there are. Most MVC frameworks let you work with structured views (e.g. main page views and sub-views). Think of a view as an HTML template for a web page: the number of templates and sub-templates you need is exactly the number of views you'll have. I believe most websites can get away with up to 50 main views and up to 100 sub-views, and those are very large sites. Looking at some sites I run, it's more like 50 views in total.
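To illustrate that 1000 pages of the same kind do not need 1000 view files, here is a Python sketch using the standard library's `string.Template` as a stand-in for a framework's template engine: one layout plus one sub-view renders every product page from data. The template markup and product data are hypothetical, and unlike real template engines this does no HTML escaping.

```python
from string import Template

# One "view": a layout template with a slot for a sub-view.
LAYOUT = Template("<html><body><nav>$nav</nav>$content</body></html>")
# One sub-view shared by every product page.
PRODUCT_VIEW = Template("<h1>$name</h1><p>$price</p>")


def render_product(product):
    """Fill the sub-view with one product's data, then wrap it in the layout."""
    content = PRODUCT_VIEW.substitute(name=product["name"], price=product["price"])
    return LAYOUT.substitute(nav="Home | Products", content=content)


# 1000 pages of the same kind: 1000 data rows, still one template pair.
products = [{"name": f"Item {i}", "price": f"{i}.00"} for i in range(1000)]
pages = [render_product(p) for p in products]
print(len(pages), pages[7])
```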
DAL/DAO/DDD - i learned about these terms by diligently reading through stack overflow before posting question. Not sure if it belongs to this list
It does. DDD is great if you need meta-views or meta-models: say, if all your models are quite similar in structure but differ only in the database tables used, and your views map almost 1:1 to models. In that case it is a good time for DDD. A good example is ERP software, where you don't need a separate design for every database table and can use a uniform way to do all the CRUD operations. There you could probably get away with one model and a couple of views, all generated dynamically at run time from a meta-model that maps database columns, types and rules to the logic of the programming language. But note that it takes time and effort to build a quality DDD engine so that your application doesn't look like a hacked-up MS Access program.
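A minimal Python sketch of the meta-model idea described in this answer: one generic model class whose table creation, required-field rules, inserts and reads are all driven at run time by a metadata table mapping columns, types and rules. `METAMODEL`, `GenericModel` and the entity definitions are hypothetical, and `sqlite3` stands in for the real database.

```python
import sqlite3

# Hypothetical metamodel: entity -> (column, SQL type, required?) triples.
METAMODEL = {
    "customer": [("id", "INTEGER PRIMARY KEY", False), ("name", "TEXT", True),
                 ("email", "TEXT", False)],
    "product": [("id", "INTEGER PRIMARY KEY", False), ("sku", "TEXT", True),
                ("price", "REAL", True)],
}


class GenericModel:
    """One model class for every entity, driven entirely by METAMODEL."""

    def __init__(self, conn, entity):
        self.conn = conn
        self.entity = entity
        self.columns = METAMODEL[entity]
        self.names = [name for name, _, _ in self.columns]

    def create_table(self):
        # Identifiers come from the trusted metamodel, never from user input.
        cols = ", ".join(f"{name} {sqltype}" for name, sqltype, _ in self.columns)
        self.conn.execute(f"CREATE TABLE {self.entity} ({cols})")

    def insert(self, **values):
        unknown = set(values) - set(self.names)
        if unknown:
            raise ValueError(f"unknown fields for {self.entity}: {sorted(unknown)}")
        for name, _, required in self.columns:
            if required and values.get(name) is None:
                raise ValueError(f"{self.entity}.{name} is required")
        names = [n for n in self.names if n in values]
        placeholders = ", ".join("?" for _ in names)
        cur = self.conn.execute(
            f"INSERT INTO {self.entity} ({', '.join(names)}) VALUES ({placeholders})",
            [values[n] for n in names],
        )
        return cur.lastrowid

    def all(self):
        rows = self.conn.execute(
            f"SELECT {', '.join(self.names)} FROM {self.entity}").fetchall()
        return [dict(zip(self.names, row)) for row in rows]


conn = sqlite3.connect(":memory:")
products = GenericModel(conn, "product")
products.create_table()
products.insert(sku="X1", price=9.5)
print(products.all())  # [{'id': 1, 'sku': 'X1', 'price': 9.5}]
```

Adding a new entity means adding a metamodel entry rather than a new model class, which is the appeal; the cost is that every rule the generic engine must honour has to be expressible in that metadata.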
Sit down and create my own architecture (likely to do if nobody enlightens me here:)
If you're building a public-facing website, you'll most likely do well with MVC. A very good starting point is the CodeIgniter video tutorials. They helped me understand what MVC really is, and how to use it, far better than any HOWTO or manual I read. And they only take 29 minutes altogether:
http://codeigniter.com/tutorials/
Enjoy.
I'm a fan of MVC because I've found it easier to scale your team when everything has a place and is nice and compartmentalized. It takes some getting used to, but the easiest way to get a handle on it is to dive in.
That said, definitely check your local library to see if they have the O'Reilly book on scaling: http://oreilly.com/catalog/9780596102357, which is a good place to start.
If you're creating a "big" website and don't fully grasp MVC or web frameworks, a CMS might be a better route, since you can extend it with plugins as you see fit. That way you can focus on the content and page structure rather than the platform, as long as you pick an appropriate CMS.
I would suggest building a mock app with a few of the web MVC frameworks out there and picking the one with which development went most smoothly. Establishing your code on a solid basis is fundamental if you want to grasp the concepts of MVC and be ready to add new functionality to your website easily.
I'm trying to learn good relational database design (using MySQL and PHP, if that makes any difference). I've already done some database work, so I'm not totally clueless, but I suspect my solutions may not have followed best practices for efficient searching, optimization, etc.
Can someone suggest a good set of videos on the topic? If you know something is superb, or it really made a difference in your own learning, please post your suggestion. I prefer videos, but books (as long as they're not too huge) are OK too.
Thank you
Well, I would be sooo happy if I could recommend some fine videos, but I can't think of any. Especially for database design, which is quite a complicated topic, I would not recommend videos. The fastest way isn't always the shortest.
If you would like a theoretical introduction to the topic, I can recommend the classic Ullman-Widom book, A First Course in Database Systems. It's quite huge, and it contains a lot of information you won't use, but it sums up the theory of database design in the first 130 pages and it's really nice to read. It will help take your SQL to a higher level, too. You can also find some information on using databases with PHP, XML and so on.
Have a look at www.vtc.com - they have training videos on just about everything, including database design, modelling and optimisation for most platforms. Some of them are free to view online as well.