I'm not a "die hard" coder and I need some advice.
I'm developing a website where users may search for a store or a brand.
I've created two classes, Search and Store.
There are two ways search is executed: "jQuery Live Search" and "normal search".
Live search is triggered for each character entered after the second. So if you enter 5 characters, a search is performed 3 times. If the store you are looking for is in the dropdown list, you can click it and the store page will be loaded.
The other search is when you click the search button after entering 3 or more characters.
Every time a search is performed, the following code is executed:
$search = new Search();
$result = $search->search($_GET);
Each time a store page is loaded a $store = new Store() is executed.
My question is this:
Let's assume the website becomes very successful and I have around 100 users per hour. Each user searches at least 3 times and looks at at least 5 stores.
That means between 300 and 900 search objects are created every hour and 500 store objects.
Is it bad or good to create so many new objects?
I've read a bit about Singleton, but many advise against it.
How should I do this to achieve best performance? Any specific design pattern I should use?
I don't think that creating the classes will become a bottleneck for your site. Look at an MVC framework like Zend Framework, and examine how many instances of classes are generated for every call. The overhead of creating an instance of a class is almost nothing; the search itself will put the heat on your db (assuming you are using a db like MySQL).
I suggest using a timer for your jQuery live search so that the search only runs once the user has stopped typing: restart the timer every time a character is entered, and run the search when the timer fires.
I think one of the bigger problems will be your database. If you have many read requests, a good caching layer like memcache may take a good share of the load off your DB.
Optimizing your db for searches should be a good measure to keep performance high. There are many tweaks and best practices to follow to get the most out of the db you are using.
As prodigitalson suggested in a comment, diving into full-text search with Lucene could be even more efficient than tuning the db.
If Lucene is a bit too much overhead for you, you may want to look at the Zend_Search_Lucene component, which does the same job and is written in PHP.
Don't overcomplicate your design by guessing at performance bottlenecks. The number of objects created will rarely be an issue.
If you need to optimize at a later point, a memcached layer could help you.
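To make the caching suggestion concrete, here is a minimal cache-aside sketch in PHP. It is only an illustration: ArrayCache is a hypothetical in-process stand-in for a memcached client, cachedSearch and the $runQuery callable are invented names, and the key normalisation assumes searches are case-insensitive. The point is simply that the expensive query runs once per distinct term while the entry is fresh.

```php
<?php
// Hypothetical stand-in for a memcached client: a plain array with expiry times.
// A real deployment would talk to a cache server instead.
final class ArrayCache
{
    private $items = [];

    public function get(string $key)
    {
        if (!isset($this->items[$key]) || $this->items[$key]['expires'] < time()) {
            return null; // miss or expired
        }
        return $this->items[$key]['value'];
    }

    public function set(string $key, $value, int $ttlSeconds): void
    {
        $this->items[$key] = ['value' => $value, 'expires' => time() + $ttlSeconds];
    }
}

// Cache-aside lookup: return the cached result for a term, or run the query
// once and remember its result. $runQuery stands in for the real DB search.
function cachedSearch(ArrayCache $cache, string $term, callable $runQuery, int $ttlSeconds = 60): array
{
    // Assumption: searches are case-insensitive, so "Nike" and " nike " share a key.
    $key = 'search:' . md5(strtolower(trim($term)));

    $hit = $cache->get($key);
    if ($hit !== null) {
        return $hit;
    }

    $result = $runQuery($term);
    $cache->set($key, $result, $ttlSeconds);
    return $result;
}
```

With live search firing on every keystroke, repeated prefixes would then tend to be answered from the cache instead of the database.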
Creating a high number of objects shouldn't be a performance problem in your application, although you should pay some attention to the size of these objects.
Don't overcomplicate your design, but I don't think the singleton pattern is a complication, and it isn't difficult to implement.
So if the same object instance can be reused across different searches by the same user (or even by different users, if your application logic allows it), then don't be afraid of using a singleton. It saves memory and protects you from errors caused by having multiple instances of an object that perform the same task and possibly share resources.
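For reference, a singleton in PHP can be sketched roughly as below. SearchService is a hypothetical class name, not something from the question. One caveat worth keeping in mind: under the usual PHP request model every HTTP request starts with fresh state, so the single instance is shared within one request only, not between users or between requests.

```php
<?php
// Minimal singleton sketch; SearchService is a hypothetical name.
final class SearchService
{
    private static $instance = null;
    private $queriesRun = 0;

    // Private constructor and clone block direct instantiation and copying.
    private function __construct() {}
    private function __clone() {}

    public static function getInstance(): self
    {
        if (self::$instance === null) {
            self::$instance = new self();
        }
        return self::$instance;
    }

    // Placeholder search that only counts calls, to show the state is shared.
    public function search(string $term): array
    {
        $this->queriesRun++;
        return ['term' => $term, 'queriesRun' => $this->queriesRun];
    }
}
```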
I asked myself what would be the best way to create a history table for a website. I'm only aware of two choices:
Use triggers
Add an extra insert statement to the code when an insert/update/delete statement related to it is used
I was told that triggers would be the better way, since the load would be on the database and not in the program. But since my website has multiple admins, I also need to track who modified the content. I think that is only possible by adding a modified_by column to the history table and filling it manually with the session user, using the second option, together with the time_modified, modified_from and modified_to values.
Would this be an acceptable reason to use the second option? Are there any more options? Will the second option create any problems in the future?
There is a Rails gem, https://github.com/airblade/paper_trail, that implements the functionality you describe comprehensively. I would assume it does this purely at the application code level and not at the database (trigger) level, which could further indicate that option 2 is better. My thoughts are:
If you use triggers, you will face a serious performance trade-off as your site traffic grows and CRUD operations go up, because a lot of triggers will fire.
Implementing some part of the business logic at the database level might be tempting, but I would rather keep it all in one place, namely my application code.
You will have some serious thinking to do if you wish to keep your application database agnostic. You will need to re-implement the triggers if you use a different database server.
History keeping increases table size and might become a bottleneck for database performance. You have the option of keeping history for a limited time interval and then archiving it to keep the database table nice and clean. Also, you can have a separate database server responsible for history-related tables only. These things will be complex to do with triggers.
I suggest solution #2, because I like having all business logic in one place (the PHP backend). That code will be easier to maintain and reuse. You can also save only the changed fields, in a format such as JSON, which saves disk space and is fast. I think triggers are a bad idea, because you can't see what will happen in the DB the next time you touch it (you must always remember all your triggers) :) So I don't use them.
The only problem you will have is a large number of records in the history table, so I recommend removing records older than 3 months.
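A rough sketch of the "save only changed fields as JSON" idea, assuming the old and new rows are available as associative arrays. The function name, the table_name/row_id/changes keys and the UTC timestamp are assumptions made for illustration; modified_by and time_modified follow the question.

```php
<?php
// Build one history row for an update, keeping only the fields that changed.
// Returns null when nothing changed, so no history row needs to be written.
function buildHistoryEntry(string $table, int $rowId, array $before, array $after, string $modifiedBy, int $now): ?array
{
    $changes = [];
    foreach ($after as $field => $newValue) {
        $oldValue = $before[$field] ?? null;
        if ($oldValue !== $newValue) {
            $changes[$field] = ['from' => $oldValue, 'to' => $newValue];
        }
    }

    if ($changes === []) {
        return null;
    }

    return [
        'table_name'    => $table,
        'row_id'        => $rowId,
        'modified_by'   => $modifiedBy,                    // e.g. the session user
        'time_modified' => gmdate('Y-m-d H:i:s', $now),    // UTC, an assumption
        'changes'       => json_encode($changes),
    ];
}
```

The returned array could then be passed to whatever insert routine the application already uses, right next to the update it describes.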
I'd go with option 2. If you are comfortable with ORMs, I'd recommend using one here: a good deal of the history collection code is already written and tested for you.
For example, if you were to use Propel, it comes with a versionable behaviour, which manages a separate versions table, and version numbers per row. (Aside: I believe version 2 hasn't been released as stable yet, though it is linked from the project home page. I use version 1.7, which is still excellent. Both versions have this feature as far as I know).
Doctrine has a similar feature, Versionable, though it looks like that is deprecated in favour of EntityAudit.
I recently started working with the Yii PHP MVC framework. I'm looking for advice on how I should work with the database through the framework: should I use the framework's base class CActiveRecord, which deals with the DB, or should I go with classic SQL query functions (in my case mssql)?
Obviously or not, for me it seems easier to deal with the DB through classic SQL queries, but, at some point, I imagine there has to be an advantage in using framework's way.
Some SQL queries will get pretty complex pretty often. I just can't comprehend how the framework could help me and not make things more complicated than they actually are.
A very general rule from my experience with Yii and massive databases:
Use Yii Active Record when:
You want to read or write a single row or a few rows in the database (e.g. a user changing his/her settings, updating a user's balance, adding a vote, getting a count of users online, getting the number of posts under a topic, checking if a model exists).
You want to rapidly design a hierarchical model structure between your tables (e.g. $user->info->email, $user->settings->currency), allowing you to quickly adjust the displayed currency/settings per user.
Stay away from Yii Active Record when:
You want to update several hundred records at a time (too much overhead for the model). Yii::app()->db->createCommand() allows you to avoid the heavy objects and retrieves data in simple arrays.
You want to do advanced joins and queries that involve multiple tables.
Any batch job!! (e.g. checking a payments table to see which customers are overdue on their payments, updating database values etc.)
I love Yii Active Record, but I interchange between the Active Record Model and plain SQL (using Yii::app()->db) based on the requirement in the application.
In the end I can choose whether to update a single user's currency:
$user->info->currency = 'USD';
$user->info->save();
or if I want to update all users currencies:
Yii::app()->db->createCommand("UPDATE ..... SET Currency='USD' where ...")->execute();
In any language, when dealing with a database, a framework can help you by providing an abstraction over it.
Here is a scenario I know I found myself in many times during my earlier development days:
I have an application that needs a database.
I write a ton of code.
I put the SQL statements in the code along with everything else.
The database changes somehow.
I'm stuck with having to go back and make 100 changes to all my SQL statements.
It's very frustrating.
Another scenario I found:
I write a ton of code against a database.
Bugs come in. Lots of bugs. I can't figure them all out.
I'm asked to write tests for my code.
This is impossible because all my code relies on a direct implementation of the database. How do you test SQL statements when they're with the actual code?
So my advice is to use the framework because it can provide an abstraction over the database. This gives you two really big advantages:
You can potentially swap out the database later and your code stays the same! If you're using interfaces/some framework, then most likely you're dealing with objects and not SQL statements directly. A given implementation might know how to write to MySQL or SQL Server, but in general your code just says "Write this object", "Read that list."
You can test your code! A good framework that deals with data will let you mock the database so you can test it easily.
Try to avoid writing SQL statements directly in the application. It'll save you pain later.
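As a small sketch of what such an abstraction could look like in PHP (all names are hypothetical and not tied to Yii or any other framework): the application code depends on an interface, and a test swaps in an in-memory implementation instead of a real database.

```php
<?php
// Hypothetical abstraction over user storage; a real implementation would
// wrap the SQL (or the framework's ORM) behind these two methods.
interface UserRepository
{
    public function find(int $id): ?array;
    public function save(array $user): void;
}

// Test double: keeps users in an array instead of a database.
final class InMemoryUserRepository implements UserRepository
{
    private $users = [];

    public function find(int $id): ?array
    {
        return $this->users[$id] ?? null;
    }

    public function save(array $user): void
    {
        $this->users[$user['id']] = $user;
    }
}

// Application code depends on the interface only, so it contains no SQL.
function changeCurrency(UserRepository $users, int $userId, string $currency): bool
{
    $user = $users->find($userId);
    if ($user === null) {
        return false;
    }
    $user['currency'] = $currency;
    $users->save($user);
    return true;
}
```

If the schema or the database vendor changes, only the real implementation of the interface would need to change; changeCurrency and its tests stay as they are.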
I'm unfamiliar with the database system bundled with Yii, but would advise you to use it a little bit to start with. My experience is with Propel, a popular PHP ORM. In general, ORM systems have a class per table (Propel has three per table).
Now, there'll probably be a syntax to do lookups and joins etc, but the first thing to do is to work out how to use raw SQL in your queries (for any of the CRUD operations). Put methods to do these queries in your model classes, so at least you will be benefitting from centralisation of code.
Once you've got that working, you can migrate to the recommended approach at a later time, without getting overwhelmed with the amount of material you have to learn in one go. Learning Yii (especially how to share code amongst controllers, and to write maintainable view templates) takes a while, so it may be sensible not to over-complicate it with many other things as well.
Why use Yii:
Just imagine that you have many modules and have to write pagination code for each of them; written the old-fashioned way, that takes a lot of time.
Why not use Yii's CListView widget? This widget comes with a bonus: the data provider, and an automatic check that the article about to be printed actually exists.
When using CListView with results from, say, the Sphinx search engine, the widget will check that the article really exists, because the search result may not be correct.
How long would it take you to write code that detects non-existent records?
And when you have different types of projects, will you adapt those methods each time?
NO! Yii does this for you.
How long would it take you to write CRUD code (create, read, update, delete)?
Are you going to adapt the old code from another project?
Yii has a miracle module called Gii that generates models, modules, forms, controllers, CRUD code ... and much more.
At first it might seem hard, but once you get some experience, it's easy.
I would suggest you use CActiveRecord. It gives you many advantages:
You can use many of Yii's widgets directly, as mentioned above (for pagination, grids etc.).
The queries generated by the Yii ORM are highly optimized.
You don't need to copy the results of SQL queries into your VO objects.
If the tables are modified for some reason (columns added or deleted, data types changed), you just need to regenerate the models using the tool provided by Yii. Just try to avoid making code changes in the generated models; that will save you merging effort.
If you plan to change the DB from MySQL to another vendor in the future, it would be just a config change for you.
You and your team would also save precious development time.
I've been reading up a bit on how multi-tiered commenting systems are built:
http://articles.sitepoint.com/article/hierarchical-data-database/2
I understand the two methods talked about in that article. In fact I went down the recursive path myself, and I can see how the "Modified Preorder Tree Traversal" method is very useful as well, but I have a few questions:
How well do these two methods perform in a large environment like Reddit's, where you can have thousands and thousands of multi-tiered comments?
Which method does Reddit use? It simply seems very costly, to me, to have to update thousands of rows if they use the MPTT method. I'm not deluding myself into thinking I am building a system to handle Reddit's traffic, this is simply curiosity.
There's another way of retrieving comments like this ... JOINs via SQL that return the rows with IDs defining their parents. How much slower/faster/better/worse would it be to simply take these unformatted results, loop through them and add them into a formatted array using my language of choice (PHP)?
After reading that sitepoint article, I believe I understand that Oracle offers this functionality in a much simpler, easier to use way, and MySQL does not. Are there any free databases that offer something similar to Oracle?
On a side note, how is SQL pronounced? I'm getting the feeling I've been wrong for the past several years by saying 'sequel' instead of 's - q - l', although "My Sequel" rolls easier off the tongue than "My S Q L"!
MPTT is easier to fetch (a single SQL query), but more expensive to update. Simply delegate the update to a background process (that's what queue managers are for). Also note that most of that update is a single SQL UPDATE command. It might take long to process, but a smart RDBMS could make the transaction visible (in cache) to new (read-only) queries before it's committed to disk.
I'd bet Reddit uses MPTT, not only doing the 'hard' update in the background but also, quite likely, doing a simple rendering to an in-memory cache. This way, the posting user can see their post immediately, without having to wait for so many rows to be updated. Also, SSDs do help in getting high transaction rates.
That's called the adjacency model (or sometimes adjacency list). It's a more obvious way to do it, and simpler to update (it doesn't modify existing records), but FAR more inefficient to read. You have to do a recursive walk of the tree, with an SQL query at each node. That's what kills you: the number of small queries.
PostgreSQL has recursive SELECTs, which do in the server what you envision in PHP. It's better than PHP because it's closer to the data; but it still has the same (huge) number of random-access disk seeks.
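Regarding question 3: if all comments of a thread are fetched with a single query, assembling them in PHP does not need one query per node. A minimal sketch, assuming rows with id, parent_id and body columns (illustrative names), positive ids, and NULL as the parent of top-level comments:

```php
<?php
// Assemble a nested comment tree from flat rows fetched in one query.
// Rows whose parent is missing from the result set are silently dropped.
function buildCommentTree(array $rows): array
{
    // Group rows by parent id; 0 is used as the bucket for top-level comments.
    $childrenOf = [];
    foreach ($rows as $row) {
        $childrenOf[$row['parent_id'] ?? 0][] = $row;
    }

    // Recursively attach each node's children (recursion depth = thread depth).
    $attach = function ($parentId) use (&$attach, $childrenOf) {
        $nodes = [];
        foreach ($childrenOf[$parentId] ?? [] as $row) {
            $row['children'] = $attach($row['id']);
            $nodes[] = $row;
        }
        return $nodes;
    };

    return $attach(0);
}
```

Each row is touched a constant number of times, so the PHP side is linear in the number of comments; the cost that remains is fetching the rows themselves.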
You should have a closer look at the links under Further reading at the end of the article. The Four ways to work with hierarchical data article on evolt linked there provides another way to approach this problem (the Flat table). Since that approach is extremely easy to implement for a threaded discussion board, I wouldn't be surprised if reddit uses it (or a variation on the theme).
I do like MPTT (aka nested set) though, and have used it for hierarchies that are (almost) static.
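To illustrate why nested sets read cheaply but update expensively, here is a small sketch that assigns lft/rgt bounds by walking a tree depth-first. The input format (id => list of child ids) and the function names are assumptions chosen for brevity.

```php
<?php
// Assign nested-set (MPTT) bounds to a tree given as id => list of child ids.
// Returns id => ['lft' => int, 'rgt' => int].
function numberTree(array $childrenOf, int $rootId): array
{
    $bounds = [];
    $counter = 1;

    $visit = function (int $id) use (&$visit, &$bounds, &$counter, $childrenOf): void {
        $bounds[$id]['lft'] = $counter++;   // numbered on the way down
        foreach ($childrenOf[$id] ?? [] as $childId) {
            $visit($childId);
        }
        $bounds[$id]['rgt'] = $counter++;   // numbered on the way back up
    };
    $visit($rootId);

    return $bounds;
}

// A node is a descendant of another if its bounds lie strictly inside the other's.
function isDescendant(array $bounds, int $nodeId, int $ancestorId): bool
{
    return $bounds[$nodeId]['lft'] > $bounds[$ancestorId]['lft']
        && $bounds[$nodeId]['rgt'] < $bounds[$ancestorId]['rgt'];
}
```

With these bounds, fetching a whole subtree is a single range condition on lft, whereas inserting a node means shifting the bounds of every node to its right, which is why the scheme suits hierarchies that rarely change.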
I am working on a web application which involves creating lists of restaurants, like "Joe's must visit places". For each restaurant and list, the website displays calculated values for:
Popularity of a restaurant
Popularity of a list
Number of lists a restaurant is present in
Currently I am using MySQL statements in PHP for this, but I am planning to switch to MySQL views and do a simple SELECT statement in PHP.
My question is:
What are the advantages/disadvantages of using views over writing SQL queries in PHP?
Using views adds a level of abstraction: you may later change the structure of your tables, and you will not have to change the code that displays the information about the lists, because you will still be querying the view (the view definition may change, though).
The main difference is that views are updated after each insertion, such that the data is "ready" whenever you query the view, whereas using your custom query will have MySQL compute everything each time (there is some caching, of course).
The bottom line is that if your lists are updated less frequently than they are viewed, you will see some performance gains from using views.
My complete answer would depend upon several things (from your application's perspective):
do you plan to allow users to create and share such lists?
can users create lists of any kind, or just by plugging values into existing query templates?
Assuming you have a couple of pre-defined lists to display:
Use of views offers a couple of advantages:
your code will be cleaner
the query to generate the views will not have to be parsed each time by MySQL.
I'm not sure about this: I don't think MySQL caches views as Tomasz suggests, and I don't think views contain "already prepared data".
One disadvantage is that the logic involved in creating the list goes into the database instead of living in your PHP code - something I'm very averse to. In my world databases are for data, and code is for logic.
Cheers
The original question was about pros and cons, but I'm not seeing much about disadvantages in the answers so far.
Isn't one disadvantage of views that they can give you the false comfort of running a simple query?
For instance, SELECT username FROM myview WHERE id='1'
That looks simple, but what if "myview" is a really complex SELECT... Perhaps even built on other views? You end up having a simple-looking query that, in the background, takes a whole lot more work than if you had written your query from the ground up.
I've been experimenting with views, and despite the benefits, have not yet been fully sold.
I'd be interested in hearing what others perceive about the disadvantages of Views, rather than just the party line about why views are so great. Might still make the switch, but would like to understand more about performance.
If the tables you are building the view from are not subject to frequent change, you definitely gain performance, as you are only doing a simple select from already prepared data. But be aware that a view is not something that is made "once and forever": every change to the content of one of the tables will make the database engine refresh the view, so another query (the query the view is built from) must be run to take the changes into account. To sum up:
Infrequent changes? Performance. Frequent / constant changes (community adding, commenting, rating your restaurants) - better go with SQL queries.
Disadvantages:
In my opinion databases are for the data layer, and it is not proper to put business code inside them. It both reduces maintainability and contradicts a clean separation of layers. The same applies to including business code and calculations in the JavaScript of web pages. For JavaScript it is even more serious, since it creates security threats. Source control for the code inside the database is another issue.
Once code is inside the database, security and access complications (for views and stored procedures) are added as well.
Migrating an application from one database engine to another will be much more difficult (since in addition to simple queries, the stored procedures, views etc. may be different too). If the database is only about data, then an abstraction layer could allow changing the database engine (at least to some extent).
Advantages:
Slight performance gains (since data is not coming out of the database for processing, it is processed right inside the database).
Code will seem cleaner (since the dirtiness is hidden inside the database views, stored procedures etc.).
I'm rewriting a big website that needs a very solid architecture. Here are my few questions, and pardon me for mixing apples and oranges and probably kiwi too :) I did a lot of research and ended up totally confused.
Main question: Which approach would you take in building a big website expected to grow in every way?
Single entry point, pages data in the database, pulled by associating GET variable with database entry (?pageid=whatever)
Single entry point, pages data in separate files, included based on GET variable (?pageid=whatever would include whatever.php)
MVC (Alright guys, I'm all for it, but can't grasp the concept besides checking all tutorials and frameworks out there, do they store "view" in database? Seems to me from examples that if you have 1000 pages of same kind they can be shaped by 1 model, but I'll still need to have 1000 "views" files?)
PAC - this sounds even more logical to me, but didn't find much resources - if this is a good way to go, can you recommend any books or links?
DAL/DAO/DDD - i learned about these terms by diligently reading through stack overflow before posting question. Not sure if it belongs to this list
Sit down and create my own architecture (likely to do if nobody enlightens me here:)
Something not mentioned...
Thanks.
Scalability/availability (iow. high-traffic) for websites is best addressed by none of the items you mention. Especially points 1 and 2; storing the page definitions in a database is an absolute no-no. MVC and other similar patterns are more for code clarity and maintenance, not for scalability.
An important piece of missing information is what kind of concurrent hits/sec are you expecting? Sometimes, people who haven't built high-traffic websites are surprised at the hit rates that actually constitute a "scalability nightmare".
There are books on how to design scalable architectures, so an SO post will not be able to do the topic justice, but some very top-level concepts, in no particular order, are:
Scalability is best handled first by looking at hardware-based solutions. A beefy server with an array of SSD disks can go a long way.
Make static anything that can be static. Serve as much as you can from the web server, not the DB. For example, a lot of pages on websites dynamically generate lists from data stores that rarely or never change.
Cache output that changes infrequently, and tune the cache refresh.
Build dynamic pages to be stateless or asynchronous. Look into CQRS and Event Sourcing for patterns that favor/facilitate scaling.
Tune your queries. The DB is usually the big bottleneck since it is a shared resource. Lots of web app builders use ORMs that create poor queries.
Tune your database engine. Backups, replication, sweeping, logging, all of these require just a little bit of resource from your engine. Tuning it can lead to a faster DB that buys you time from a scale-out.
Reduce the number of HTTP requests from clients. Each HTTP connect has overhead. Check your pages and see if you can increase the payload in each request so as to reduce the overall number of individual requests.
At this point, you've optimized the behavior on one server, and you have to "scale out". Now, things get very complicated very fast: load-balancing scenarios of various types (sharding, DNS-driven, dumb balancing, etc.), separating read data from write data on different DBs, going to a virtualization solution like Google Apps, offloading static content to a big CDN service, using a language like Erlang or Scala and parallelizing your app, etc.
Single entry point, pages data in the database, pulled by associating GET variable with database entry (?pageid=whatever)
Potential nightmare for maintenance, and also for development if you have a team of more than 2-3 people. You would need to create a set of strict rules for everyone to adhere to, effort that would be much better spent on MVC. Same goes for 2.
MVC (Alright guys, I'm all for it, but can't grasp the concept besides checking all tutorials and frameworks out there, do they store "view" in database? Seems to me from examples that if you have 1000 pages of same kind they can be shaped by 1 model, but I'll still need to have 1000 "views" files?)
It depends on how many page layouts there are. Most MVC frameworks allow you to work with structured views (i.e. main page views, sub-views). Think of a view as an HTML template for the web page. However many templates and sub-templates you need is exactly how many views you'll have. I believe most websites can get away with up to 50 main views and up to 100 subviews, but those are very large sites. Looking at some sites I run, it's more like 50 views in total.
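To make the "view as template" point concrete, here is a minimal sketch with one view shared by every page of the same kind. The template is a closure only to keep the example self-contained (frameworks typically load a template file instead), and render, $articleView and the page data are hypothetical.

```php
<?php
// Render a page by running a template callable and capturing its output.
function render(callable $template, array $data): string
{
    ob_start();
    $template($data);
    return ob_get_clean();
}

// One "article" view, reused for every article page.
$articleView = function (array $page): void {
    echo '<h1>' . htmlspecialchars($page['title']) . '</h1>';
    echo '<p>' . htmlspecialchars($page['body']) . '</p>';
};

// Hypothetical page records, as they might come back from a model.
$pages = [
    ['title' => 'About us', 'body' => 'Who we are.'],
    ['title' => 'Contact', 'body' => 'How to reach us & when.'],
];

// Many pages, one view: only the data differs.
$html = array_map(function (array $page) use ($articleView): string {
    return render($articleView, $page);
}, $pages);
```

So 1000 pages of the same kind would need 1000 data records but only one view; a new view is needed only when the layout itself differs.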
DAL/DAO/DDD - i learned about these terms by diligently reading through stack overflow before posting question. Not sure if it belongs to this list
It does. DDD is great if you need meta-views or meta-models. Say all your models are quite similar in structure and differ only in the database tables used, and your views map almost 1:1 to models. In that case, it is a good time for DDD. A good example is some ERP software where you don't need a separate design for every database table and can use some uniform way to do all the CRUD operations. In this case you could probably get away with one model and a couple of views, all generated dynamically at run-time using a meta-model that maps database columns, types and rules to the logic of the programming language. But please note that it does take some time and effort to build a quality DDD engine, so that your application doesn't look like a hacked-up MS Access program.
Sit down and create my own architecture (likely to do if nobody enlightens me here:)
If you're building a public-facing website, you will most likely do well with MVC. A very good starting point is to look at the CodeIgniter video tutorials. They helped me understand what MVC really is and how to use it way better than any HOWTO or manual I read. And they only take 29 minutes altogether:
http://codeigniter.com/tutorials/
Enjoy.
I'm a fan of MVC because I've found it easier to scale your team when everything has a place and is nice and compartmentalized. It takes some getting used to, but the easiest way to get a handle on it is to dive in.
That said, definitely check your local library to see if they have the O'Reilly book on scaling, http://oreilly.com/catalog/9780596102357, which is a good place to start.
If you're creating a "big" website and don't fully grasp MVC or a web framework, then a CMS might be a better route, since you can expand it with plugins as you see fit. With this route you can worry more about the content and page structure than about the platform, as long as you pick an appropriate CMS.
I would suggest creating a mock app with a few of the web MVC frameworks in the wild and picking the one with which development went most smoothly. Building your code on a solid basis is fundamental if you want to grasp the concepts of MVC and be ready to add new functionality to your website easily.