I'm new to web development, and my first project needs 4 cascading drop-down menus. I've been reading as much as I can about the best way to tackle this, and there is some discussion about whether to preload these lists or to grab them from the database. Mine is a LAMP project and I have about 5000 items that need to go into these drop-down menus. The discussion I've seen says that if you don't have a large number of drop-down items you are much better off preloading them and pulling them with JavaScript. My question is: what is considered a large number? With 5000 items, am I much better off going the MySQL route? On some level I prefer that, but all of the tutorials out there seem to deal with either ASP or preloaded items. So my questions are:
Is 5000 items too many to preload for a dropdown list? What would the cutoff be?
And do you know of any good tutorials to get me started if MySQL is the way to go? I've seen a couple, but they are not very detailed, and as a newbie I'm getting wrapped around the axle. Thanks so much!
I would look more at the size in KB than at how many items. I'm guessing even 5k entries is probably only going to be a few tens of KB after gzip, which is generally very little compared to the rest of the external objects a typical webpage loads (compare it to the size of the images + CSS + JS).
Having to do multiple separate HTTP requests is expensive from a responsiveness standpoint, especially on high-latency connections like mobile devices (web autocomplete is generally pretty lame on my phone due to latency).
So anyway, personally I lean heavily towards just loading it all upfront unless the amount of data gets huge, or the chance of the data being used at all is very low.
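If you want to sanity-check the payload for your own list before deciding, a quick way is to serialize it and gzip it in PHP and look at the resulting size. A rough sketch, assuming your 5000 items are already in an array called $items (the variable name and sample data are placeholders):

```php
<?php
// Rough payload-size check for a preloaded dropdown list.
// Assumes $items is an array of option strings pulled from your table;
// the variable name and the sample data are placeholders for this sketch.
$items = array('Option one', 'Option two', 'Option three'); // ... your 5000 items

$json    = json_encode($items);
$gzipped = gzencode($json, 9); // roughly what the web server would send with gzip on

printf("Raw JSON: %.1f KB\n", strlen($json) / 1024);
printf("Gzipped:  %.1f KB\n", strlen($gzipped) / 1024);
```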
First off, 5K items in dropdowns? No way! The same goes for caching it all in JS. My advice would be to look into PHP, AJAX and MySQL, and then build an autocomplete system that gives people a sane way to get at the information.
If you still need a good tutorial on PHP/MySQL dropdowns:
http://www.yourwebskills.com/mysqldropdown.php
And then for my advice:
http://beski.wordpress.com/2009/11/20/jquery-php-mysql-ajax-autocomplete/
http://www.webinone.net/tutorials/php/auto-complete-text-box-with-php-jquery-and-mysql/
Please read around, and if you can't figure stuff out, please come back and ask!
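To give a sense of what the PHP/AJAX/MySQL approach looks like on the server side, here is a minimal sketch of an endpoint that returns the child options for a selected parent. The DSN, credentials, and the table and column names (categories, parent_id, name) are placeholders, not something taken from the tutorials above:

```php
<?php
// Minimal AJAX endpoint for a cascading dropdown.
// The browser calls e.g. options.php?parent_id=42 whenever the previous
// dropdown changes, and fills the next dropdown with the returned JSON.
// Table/column names and credentials are placeholders for this sketch.
$pdo = new PDO('mysql:host=localhost;dbname=myapp;charset=utf8', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$parentId = isset($_GET['parent_id']) ? (int) $_GET['parent_id'] : 0;

$stmt = $pdo->prepare('SELECT id, name FROM categories WHERE parent_id = ? ORDER BY name');
$stmt->execute(array($parentId));

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
```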
I have an internal website using the LAMP architecture. My main page takes around 10 seconds to load the data. There isn't a lot of data, around 4-5k records. I don't have any complex MySQL queries, but I have a lot of them, i.e. around 10-15 queries. Basically I'm extracting metadata to display on the page. These are very simple queries. I have a lot of PHP and JavaScript logic of medium complexity, and I can't remove any of it. I have around 1800 lines of code in that page and I'm using DataTables to display the data.
The DataTable contains 25 columns and a lot of HTML select elements.
So how can I find out what is causing the performance bottleneck on this page? I tried to be as clear as possible, but please let me know if you have any questions.
Appreciate your time and help.
Use any PHP profiler tool, such as the Zend profiler, to see which section is taking the most time:
http://erichogue.ca/2011/03/linux/profiling-a-php-application/
Try to get all the data at once. I've had cases in the past where running multiple separate SELECTs actually takes longer than one more efficient query (also make sure you reuse the same SQL connection for as long as possible). Other than that, profiling is the way to go in case your other code needs optimization.
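If a full profiler feels like overkill at first, even a crude microtime() wrapper around each of your 10-15 queries will usually point at the slow spot. A minimal sketch (the connection details and query strings are placeholders):

```php
<?php
// Crude per-query timing to narrow down a slow page before reaching
// for a full profiler. Connection details and queries are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=myapp;charset=utf8', 'user', 'password');

$queries = array(
    'page_meta'   => 'SELECT id, title, updated_at FROM page_meta',
    'select_opts' => 'SELECT id, label FROM select_options',
    // ... the rest of the 10-15 queries
);

foreach ($queries as $label => $sql) {
    $start = microtime(true);
    $rows  = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
    error_log(sprintf('%s: %.3fs, %d rows', $label, microtime(true) - $start, count($rows)));
}
```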
I've been given a big project by a big client and I've been working on it for 2 months now. I'm getting closer and closer to a solution but it's just so insanely complex that I can't quite get there, and so I need ideas.
The project is quite simple: There is a 1mil+ database of lat/lng coordinates with lots of additional data for each record. A user will visit a page and enter some search terms which will filter out quite a lot of the records. All of the records that match the filter are displayed (often clustered) on a Google Maps.
The problem with this is that the client demands that it be fast, lean, and low-bandwidth. Hence, I'm stuck. What I'm currently doing is: present the first clusters, and when the user hovers over a cluster, begin loading in the data for that cluster's children.
However, I've upped it to 30,000 of the millions of listings and it's starting to drag a little. I've made as many optimizations as I possibly can. When the filter is changed, I AJAX a query to the DB and return all the IDs of the matches, then update the map to reflect this.
So further optimization is not the answer; I need an entirely new conceptual model for this. Any input at all would be highly appreciated, as this is an incredibly complex project and I can't find anything even remotely close to it. I even looked at MMORPGs, which have a lot of similar problems (and I have made a few), but the concept of having a million players in one room is still something MMORPG makers cringe at. People keep suggesting there may be bottlenecks to optimize away, but it's not a case of optimizing this way. I need a new model in which the huge database stays on the server but is displayed fluidly to the user.
I'll be awarding 500 rep as soon as it becomes available for anything that solves this.
Thanks- Daniel.
I think there are a number of possible answers to your question depending on where it is slowing down, so here are a few thoughts.
A wider table can affect the speed with which a query is returned. Longer records mean that more disk is being accessed to get the right data, so you might want to think about limiting your initial table to hold only the information that can be filtered on. Having said that, it will also depend on the DB engine you are using; some suffer more than others.
Ensuring that your tables are correctly indexed makes a HUGE difference in performance. You need to make sure that the query is using the indexes to quickly get to the records that it needs.
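One quick way to check that in MySQL is to run EXPLAIN on the exact query your filter generates and look at which index (if any) it uses. A minimal sketch with made-up table and column names (listings, category, lat, lng); substitute your real schema:

```php
<?php
// Check whether the map-filter query actually uses an index.
// Table and column names (listings, category, lat, lng) are made up
// for this sketch; substitute your real schema.
$pdo = new PDO('mysql:host=localhost;dbname=maps;charset=utf8', 'user', 'password');

// One-off step: a composite index covering the filter column plus the coordinates.
// $pdo->exec('CREATE INDEX idx_cat_coords ON listings (category, lat, lng)');

$sql = 'EXPLAIN SELECT id, lat, lng FROM listings
        WHERE category = ? AND lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?';
$stmt = $pdo->prepare($sql);
$stmt->execute(array('hotel', 50.0, 51.0, -1.0, 1.0));

// The "key" column of the output shows which index MySQL chose;
// NULL plus a large "rows" estimate means a full table scan.
print_r($stmt->fetchAll(PDO::FETCH_ASSOC));
```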
A friend was working with Google Maps and said that the API really suffered if too much was displayed on the maps. This might just be totally out of your control.
Having worked for Epic Games in the past, the reason that "millions of players in a room" is something to cringe at is more often hardware driven. In a game, having that number of players would grind the graphics card to a halt as it tries to render all the polygons of the models. Secondly (and likely more importantly) the problem would be that you have to send each client information about what each item/player is doing. This means that your bandwidth use will spike very heavily. Your server might handle the load, but the players internet connection might not.
I do think that you need to edit your question though with some extra information on WHAT is slowing down. Your database? Your query? Google API? The transfer of data between server and client machine?
Let's be honest here; a DB with 1 million records being accessed by presumably a large number of users is not going to run very well unless you put some extremely powerful hardware behind it.
In this type of case, I would suggest using several different database servers and setting up some decent load-balancing regimes in order to keep them running as smoothly as possible. First and foremost, you will need to find out the "average" load you can place on a DB server before it starts to lag; let's say, for example, this is 50,000 records. Setting a low MaxClients per server may help with server performance and guard against crashes, but it might aggravate your users when they can't execute any queries due to high load. Still, it's something to keep in mind if your budget doesn't allow for much wiggle room hardware-wise.
On the topic of hardware, however, that's something you really need to take a look at. Databases typically don't use a huge amount of CPU/RAM, but they can be quite taxing on your HDD. I would recommend going for SAS or SSD before looking at other components of your setup; these will make a world of difference for you.
As far as load balancing goes, a very common technique for content providers is that when one query or particular content item (such as a popular video on YouTube) is pulling in an above-average amount of traffic, you cache its result. A quick and dirty approach is an if statement in your search handler that serves a static HTML page instead of actually running the query.
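As a rough illustration of that quick-and-dirty approach, here is a sketch of a file-based cache keyed on the search parameters. The cache directory and the five-minute lifetime are arbitrary choices for the example, not a recommendation:

```php
<?php
// Quick-and-dirty result caching for popular searches.
// Cache directory, key scheme and the 5-minute lifetime are arbitrary
// choices for this sketch.
$cacheDir = __DIR__ . '/cache';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0755, true);
}
$cacheFile = $cacheDir . '/' . md5(serialize($_GET)) . '.html'; // one file per distinct filter set

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < 300) {
    readfile($cacheFile);   // serve the cached page and skip the database entirely
    exit;
}

ob_start();
// ... run the real query and render the results page here ...
$html = ob_get_clean();

file_put_contents($cacheFile, $html);
echo $html;
```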
Another approach is to have a separate standalone DB server used only for running the queries that are pulling in an excessive amount of traffic.
With that said, never underestimate code optimisation. While the differences may seem subtle to you, when run across millions of queries by thousands of users, those tiny differences really do add up.
Best of luck with it - let me know if you need any further assistance.
Eoghan
Google has a service named BigQuery. It is an SQL database in the cloud: it runs your SQL on Google's fast servers and can search millions of rows quickly. Unfortunately it is not free, but maybe it will help you out:
https://developers.google.com/bigquery/
I'm currently developing a Zend Framework project, using Doctrine as ORM.
I ran into the typical situation where you have to show a list of items (around 400) in a table, and of course, I don't want to show them all at once.
I've already used Zend_Paginator before (only some basic usage), but I always used to get all the items from the DB and then paginate them, and now that doesn't feel quite right.
My question is this: is it better to get all items from the DB first and then "paginate" them, or to get "pages" of items as they are requested? Which would have a larger impact on performance?
For me, it is better to fetch just a page of data at a time and paginate through it.
If you get all the data from the DB and paginate with JavaScript, the first load of the page will take a long time (for 400 records it's still OK). The browser also has limited memory: if a user opens a lot of tabs and your page is holding all that data, it will slow down both the browser and your application. You have only 400 records now, but data tends to grow, and at worst the page could become unusable once it gets large. And what if the browser doesn't support JS at all?
If you fetch only part of the data from the DB, the only drawback is for users on a very slow Internet connection (but the first option has the same problem on the initial page load), and moving to another page takes a little longer than it would with JavaScript.
The second option is better (for me) in the long run, because if it works now it will keep working for years.
The database engines are usually best suited to do the retrieval for you. So, in general, if you can delegate a data-retrieval task to the DB engine instead of doing it in-memory and using your programming language, the best bet for performance is to let the DB engine do it for you.
But also remember that if you don't configure the indices correctly or don't run a good query, you won't get the best result out of your DB engine.
However, most DB engines nowadays are capable of optimizing your queries for you and running them in a near-optimal form.
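To make that concrete, here is a minimal sketch of letting MySQL do the paging with LIMIT/OFFSET. It is shown with plain PDO rather than Doctrine, just to illustrate the idea, and the table and column names are made up:

```php
<?php
// Server-side pagination: fetch only the requested page plus a total count,
// instead of loading all 400 rows and slicing them in PHP.
// Table/column names and credentials are made up for this sketch.
$pdo = new PDO('mysql:host=localhost;dbname=myapp;charset=utf8', 'user', 'password');

$perPage = 20;
$page    = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$offset  = ($page - 1) * $perPage;

$stmt = $pdo->prepare('SELECT id, title, created_at FROM items ORDER BY created_at DESC LIMIT ? OFFSET ?');
$stmt->bindValue(1, $perPage, PDO::PARAM_INT);
$stmt->bindValue(2, $offset, PDO::PARAM_INT);
$stmt->execute();
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

$total = (int) $pdo->query('SELECT COUNT(*) FROM items')->fetchColumn();

header('Content-Type: application/json');
echo json_encode(array(
    'rows'        => $rows,
    'total_pages' => (int) ceil($total / $perPage),
));
```

If I remember correctly, Zend_Paginator can be driven the same way through an adapter that only fetches the rows for the current page, so you don't have to hand-roll the LIMIT/OFFSET yourself.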
I've been working on a new site of mine for a couple of days now which will be retrieving almost all of its most-used content from a MySQL database. Seeing as the database and website are still under development, the tables are really small at the moment and speed is of no concern yet.
But you know what they say, a little bit of hard work now saves you a headache later on.
Now I'm only 17, the only database I've ever been taught was through Microsoft Access, and we were practically given the database completed - we learned up to 3NF, but that was about it.
I remember reading once, when I was looking into pulling data (randomly) out of a database, that large databases could take several seconds or even minutes to complete a single query, and that got me thinking. In a fraction of a second I can submit a search to Google; Google processes the query and returns the result, and then my browser renders it, all done in the blink of an eye. And Google has billions of records to search through, and it's doing this for millions of users simultaneously.
I'm thinking, how do they do it? I know that they have huge data centers, but still.
I realize that it probably comes down to the design of the database, how it's been optimized, and obviously the configuration. And I guess that's my question really. Could someone please tell me how to design high performance databases for millions/billions of rows (yes, I'm being optimistic), and possibly point me towards some good reading material to help me learn further?
Also, all my queries are done via PHP, if that's at all relevant to any answers.
The blog http://highscalability.com/ has some good articles and pointers to how companies handle large problems.
Specifically related to MySQL, you can Google for craigslist.org's use of MySQL.
http://www.slideshare.net/jzawodn/mysql-and-search-at-craigslist
First the good news... MySQL scales well (depending on the hardware) to at least hundreds of millions of rows.
Once you get to a certain point, a single database server will have trouble managing the load. That's when you get into the realm of partitioning or sharding: spreading the load across multiple database servers using any one of a number of different schemes (e.g. putting unrelated tables on different servers, or spreading a single table across multiple servers by using the ID or a date range as a partitioning key).
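As a toy illustration of sharding by ID, the application picks which database server to talk to before running the query. The two DSNs and the users table are invented for this sketch, and a real setup also has to deal with cross-shard queries, rebalancing and so on:

```php
<?php
// Toy example of sharding a single table by ID.
// The two DSNs and the `users` table are invented for this sketch.
$shardDsns = array(
    0 => 'mysql:host=db-shard-0;dbname=app;charset=utf8',
    1 => 'mysql:host=db-shard-1;dbname=app;charset=utf8',
);

function shardFor($userId, array $shardDsns) {
    // Simple modulo routing: even IDs go to shard 0, odd IDs to shard 1.
    $index = $userId % count($shardDsns);
    return new PDO($shardDsns[$index], 'user', 'password');
}

$userId = 123456;
$pdo    = shardFor($userId, $shardDsns);

$stmt = $pdo->prepare('SELECT id, name, email FROM users WHERE id = ?');
$stmt->execute(array($userId));
$user = $stmt->fetch(PDO::FETCH_ASSOC);
```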
SQL databases can be sharded, but they are not fundamentally designed to shard well. There's a whole category of storage alternatives, collectively referred to as NoSQL, that are designed to solve that very problem (MongoDB, Cassandra and HBase are a few).
When you use SQL at very large scale, you run into any number of issues such as making data model changes across a DB server farm, trouble keeping up with data backups, etc. That's a very complex topic, and people that solve it well are rare. For a glimpse at the issues, have a look at http://gigaom.com/cloud/facebook-trapped-in-mysql-fate-worse-than-death/
When selecting a database platform for a specific project, benchmark the solution early and often to understand whether or not it will meet the performance requirements you envision. Having a framework to do that will teach you about scalability, help you decide whether to invest effort in improving the data-storage part of your solution, and show you where your time is best spent.
No one can simply tell you how to design databases. It comes from a lot of reading and many hours working on them, and a really good design is the product of many years of doing it. Since you've only seen Access, you have very little database background yet. Search Amazon.com and you'll find tons of titles; for someone who's starting out, almost any of them will do.
I mean no disrespect. I've been there, and I'm also a tutor for some people learning programming and database design. I know there's no silver bullet or shortcut for the work you have ahead.
If you intend to work with high-performance databases, keep one thing in mind: their design is per application. A good design depends on learning more and more about how the app's users interact with the system, the usage patterns, and so on. The things you'll learn from books will give you options; using them will depend heavily on the scenario.
Good luck!
It doesn't all come down to the design of the database, though that is indeed a big part of it. The guys who made Google are geniuses, and if I'm not completely wrong about Google, you won't be able to find out exactly how they do what they do. Also, I know that years back they had more than 10,000 computers processing queries, and today they probably have many more. I also suspect they cache most of the recent/popular keywords. And all the websites have been indexed and analyzed using an unknown algorithm which makes sure the computers don't have to look through all the words on every page.
In fact, Google crawls the entire internet around every 14 days, so when you do a search you do not search the entire internet. Your search gets broken down into keywords and then these keywords are used to narrow the number of relevant pages - and I'm pretty sure all pages have already been analyzed for important and/or relevant keywords before you even thought of visiting google.com.
Have a look at this question.
Have a look into Sphinx server.
http://sphinxsearch.com/
Craigslist uses it for their search engine. Basically, you give it a source and it indexes whatever you want (a MySQL database/table, text files, etc.). If it works for Craigslist, it should work for you.
I'm a hobbyist, and started learning PHP last September solely to build a hobby website that I had always wished and dreamed another more competent person might make.
I enjoy programming, but I have little free time and enjoy a wide range of other interests and activities.
I feel learning PHP alone can probably allow me to create 98% of the desired features for my site, but that last 2% is awfully appealing:
The most powerful tool of the site is an advanced search page that picks through a 1000+ record game scenario database. Users can data-mine to tremendous depths - this advanced page has upwards of 50 different potential variables. It's designed to allow the hardcore user to search on almost any possible combination of data in our database and it works well. Those who aren't interested in wading through the sea of options may use the Basic Search, which is comprised of the most popular parts of the Advanced search.
Because the advanced search is so comprehensive, and because the database is rather small (less than 1,200 potential hits maximum), with each variable you choose to include the likelihood of getting any qualifying results at all drops dramatically.
In my fantasy land where I can wield AJAX as if it were Excalibur, my users would have a realtime Total Results counter in the corner of their screen as they used this page, which would automatically update its query structure and report how many results will be displayed with the addition of each variable. In this way it would be effortless to know just how many variables are enough, and when you've gone and added one that zeroes out the results set.
A somewhat similar implementation, at least visually, would be the Subtotal sidebar when building a new custom computer on IBuyPower.com
For those of you actually still reading this, my question is really rather simple:
Given the time and ability constraints outlined above, would I be able to learn just enough AJAX (or whatever) to pull this one feature off without too much trouble? Would I be able to more or less drop in a pre-written code snippet and tweak it to fit? Or should I consider opening my code up to a trusted and capable individual in the future for this implementation? (Assuming I can find one...)
Thank you.
This is a great project for a beginner to tackle.
First I'd say look into using a library like jQuery (jquery.com). It will simplify the JavaScript part of this, and the manual is very good.
What you're looking to do can be broken down into a few steps:
1. The user changes a field on the advanced search page.
2. The user's browser collects all the field values and sends them back to the server.
3. The server performs a search with the values and returns the number of results.
4. The user's browser receives the number of results and updates the display.
Now for implementation details:
1. This can be accomplished with JavaScript events such as onchange and onfocus.
2. You could collect the field values into a JavaScript object, serialize it to JSON and send it via AJAX to a PHP page on your server.
3. The server page (in PHP) will read the JSON, use the values to run the search, and send back the result as markup or text (see the sketch below).
4. You can then display the result directly in the browser.
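To show what step 3 might look like on the server, here is a minimal sketch of a PHP page that counts matching rows for whichever filters were sent. The table name (scenarios), the column whitelist and the JSON field names are all invented for the example, and it assumes the browser posts the JSON in the request body:

```php
<?php
// count.php - returns the number of scenarios matching the submitted filters.
// Table name, column whitelist and field names are invented for this sketch.
$pdo = new PDO('mysql:host=localhost;dbname=hobby;charset=utf8', 'user', 'password');

// Decode the JSON object the browser sent via AJAX.
$filters = json_decode(file_get_contents('php://input'), true);
if (!is_array($filters)) {
    $filters = array();
}

// Only allow known columns, so user input never becomes part of the SQL identifiers.
$allowed = array('year', 'theatre', 'side', 'map_size');

$where  = array();
$params = array();
foreach ($filters as $column => $value) {
    if (in_array($column, $allowed, true) && $value !== '') {
        $where[]  = "`$column` = ?";
        $params[] = $value;
    }
}

$sql = 'SELECT COUNT(*) FROM scenarios';
if ($where) {
    $sql .= ' WHERE ' . implode(' AND ', $where);
}

$stmt = $pdo->prepare($sql);
$stmt->execute($params);

header('Content-Type: application/json');
echo json_encode(array('total' => (int) $stmt->fetchColumn()));
```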
This may seem like a lot to take in but you can break each step down further and learn about the details bit by bit.
Hard to answer your question without knowing your level of expertise, but check out this short description of AJAX: http://blog.coderlab.us/rasmus-30-second-ajax-tutorial
If this makes some sense then your feature may be within reach "without too much trouble". If it seems impenetrable, then probably not.