I have an application that fetches several e-commerce websites using curl, looking for the best price.
This process returns a table comparing the prices of all the searched websites.
But now we have a problem: the number of stores is starting to increase, and the loading time has become unacceptable from a user-experience point of view (currently about a 10-second page load).
So we decided to create a database and start injecting all the filtered curl results into it, in order to reduce the DNS calls and improve page load time.
I want to know: despite all our efforts, is there still an advantage to implementing a Memcache module?
I mean, will it help even more, or is it just a waste of time?
The Memcache idea was inspired by this topic, from someone who had a similar problem: Memcache to deal with high latency web services APIs - good idea?
Memcache could be helpful, but (in my opinion) it's kind of a weird way to approach the issue. If it was me, I'd go about it this way:
Firstly, I would indeed cache everything I could in my database. When the user searches, or whatever interaction triggers this, I'd show them a "searching" page with whatever results the server currently has, and a progress bar that fills up as the asynchronous searches complete.
I'd use AJAX to add additional results as they become available. I'm imagining that the search takes about ten seconds - it might take longer, and that's fine. As long as you've got a progress bar, your users will appreciate and understand that Stuff Is Going On.
Obviously, the more searches go through your system, the more up-to-date data you'll have in your database. I'd use cached results that are under a half-hour old, and I'd also record search terms and make sure I kept the top 100 (or so) searches cached at all times.
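To make the half-hour rule concrete, here is a minimal sketch; the price_cache table, its columns, and the render_* helpers are all assumptions, not part of your actual setup:

    <?php
    // Sketch: serve prices from the database cache when they are fresh enough,
    // otherwise show the "searching" page while the live curl fetches run.
    $searchTerm = isset($_GET['q']) ? $_GET['q'] : '';

    $pdo  = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
    $stmt = $pdo->prepare(
        'SELECT store, price, fetched_at
           FROM price_cache
          WHERE search_term = :term
            AND fetched_at > (NOW() - INTERVAL 30 MINUTE)'
    );
    $stmt->execute(array(':term' => $searchTerm));
    $fresh = $stmt->fetchAll(PDO::FETCH_ASSOC);

    if (count($fresh) > 0) {
        render_price_table($fresh);         // assumed existing view helper
    } else {
        render_searching_page($searchTerm); // "searching" page + progress bar,
                                            // filled in by AJAX as results arrive
    }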
Know your customers and have what they want available. This doesn't have much to do with any specific technology; it's all about your ability to predict what they want (or to write software that predicts it for you!).
Oh, and there's absolutely no reason why PHP can't handle the job. Tying together a bunch of unrelated interfaces is one of the things PHP is best at.
Your solution lies outside the bounds of PHP alone. Don't bother hacking together a result purely in PHP when a cronjob could easily populate your database and your PHP script can simply query that database.
If you plan to stick with PHP only, then I suggest you change your script to read from the database you have populated. To populate the results, have a cronjob ping a PHP script that is not accessible to users and that performs all of your curl functionality.
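A minimal sketch of that cron-driven fetcher, assuming a stores table holding the URLs to scrape, a simplified price_cache table, and a hypothetical parse_price() wrapper around your existing filtering logic:

    <?php
    // fetch_prices.php - run from cron, outside the web root, e.g.:
    //   */15 * * * * php /var/www/private/fetch_prices.php
    $pdo    = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
    $stores = $pdo->query('SELECT id, search_url FROM stores')->fetchAll(PDO::FETCH_ASSOC);

    $save = $pdo->prepare(
        'REPLACE INTO price_cache (store_id, price, fetched_at) VALUES (:id, :price, NOW())'
    );

    foreach ($stores as $store) {
        $ch = curl_init($store['search_url']);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        $html = curl_exec($ch);
        curl_close($ch);

        if ($html === false) {
            continue; // store is down: keep the previously cached price
        }

        $price = parse_price($html); // your existing curl-filtering logic
        $save->execute(array(':id' => $store['id'], ':price' => $price));
    }

The user-facing page then only ever touches your own database, no matter how many stores you add.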
I'm hoping to develop a LAMP application that will centre around a small table, probably less than 100 rows, with maybe 5 fields per row. The data in this table will need to be accessed rapidly, maybe up to once a second per user (though this is the 'ideal'; in practice it could probably drop slightly). There will be a number of updates made to this table, but SELECTs will far outstrip UPDATEs.
The available hardware isn't massively powerful (it'll be launched on a VPS with perhaps 512 MB of RAM) and it needs to be scalable - there may only be 10 concurrent users at launch, but this could rise to the thousands (and, as we all hope with these things, maybe tens of thousands, though at that level more powerful hardware will be available).
As such, I was wondering if anyone could point me in the right direction for a starting point. All the data retrieved will be the same for all users, so I'm trying to find out whether there is any way of sharing this data across all users, rather than performing 10,000 identical SELECTs a second. Soooo:
1) Would the MySQL query cache cache these results and allow access to the data WITHOUT requiring a re-select for each user?
2) (Apologies for how broad this question is; I'd appreciate even the briefest of responses!) I've been looking into APC, as we already use it as an opcode cache. Is there a way of caching the data in the APC cache, doing just one MySQL SELECT per second to refresh that cache, and then serving each user from APC? Or perhaps an alternative cache?
Failing all of this, I may look into having a separate script that handles the queries and outputs the data, and somehow just piping this one script's data to all users. This isn't a fully formed thought and I'm not sure of the implementation, but perhaps a combo of AJAX to pull the outputted data from... "Somewhere"... :)
Once again, apologies for the breadth of these questions - a couple of brief pointers from anyone would be very, very greatly appreciated.
Thanks again in advance
If you're doing something like an AJAX chat that polls the server constantly, you may want to look at node.js instead, which keeps an open connection between server and browser. That way, changes can be pushed to the user when they happen, and you won't need to do all that redundant checking once per second. This can scale very well to thousands of users, and since it's written in JavaScript on the server side, it's not too difficult.
The problem with using the MySQL query cache is that every cached result for a table gets invalidated on any write to that table. You're better off using a caching solution like memcached or APC if you want to control that behaviour more precisely. And yes, APC would be able to cache that information.
One other thing to keep in mind is that you need to know when to invalidate the cache as well, so you don't have stale data.
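For the pattern described in the question (one real query per second, everything else served from memory), here is a hedged sketch using APC's user cache; the key name and table are made up, and on newer PHP installs the same calls exist as apcu_fetch()/apcu_store():

    <?php
    // Sketch: serve the small table from APC, refreshing from MySQL at most
    // once per second via the one-second TTL.
    function get_shared_rows(PDO $pdo) {
        $rows = apc_fetch('shared_table_rows', $hit);
        if ($hit) {
            return $rows; // no MySQL query for this request
        }

        $rows = $pdo->query('SELECT * FROM shared_table')->fetchAll(PDO::FETCH_ASSOC);
        apc_store('shared_table_rows', $rows, 1); // TTL of 1 second
        return $rows;
    }

    // On a write, drop the entry explicitly so readers never see stale data
    // for longer than necessary:
    // apc_delete('shared_table_rows');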
You can use APC, XCache, or memcache for database query caching, or you can use Varnish or Squid for gateway caching...
I'm currently developing a Zend Framework project, using Doctrine as ORM.
I ran into the typical situation where you have to show a list of items (around 400) in a table, and of course, I don't want to show them all at once.
I've already used Zend_Paginator before (only some basic usage), but I always used to get all the items from the DB and then paginate them, and now that doesn't feel quite right.
My question is this: is it better to get all the items from the DB first and then "paginate" them, or to fetch "pages" of items as they are requested? Which would have a larger impact on performance?
For me, it is better to fetch part of the data and paginate on the server side.
If you fetch all the data from the DB, you end up paginating it with the help of JavaScript. The first load of the page will take a long time (for 400 records it's still OK), and browsers have limited memory: if a user has a lot of tabs open and your page is holding a lot of data, that will slow down both the browser and your application. You only have 400 records now, but data tends to grow quickly, and at worst the whole page may become unusable when it opens. And what if the browser doesn't support JS at all?
If you fetch only part of the data from the DB, the only drawback is for a user with a very slow Internet connection (though that is also a drawback of the first option, on the first load of the page), and moving to another page will take a little longer than it would with JavaScript.
The second option is better (for me) in the long run, because if it works now, it will keep working for years.
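To make the second option concrete, here is a rough sketch using Zend_Paginator's DbSelect adapter, which pushes the LIMIT/OFFSET into the query so only one page of rows ever leaves the database; table and column names are placeholders, and with Doctrine 1 you would reach for its own pager instead:

    <?php
    // Inside a Zend_Controller_Action method.
    $db     = Zend_Db_Table::getDefaultAdapter();
    $select = $db->select()->from('items')->order('name ASC');

    $paginator = new Zend_Paginator(new Zend_Paginator_Adapter_DbSelect($select));
    $paginator->setItemCountPerPage(25);
    $paginator->setCurrentPageNumber($this->_getParam('page', 1));

    // The adapter issues a COUNT(*) plus one LIMIT/OFFSET query; the other
    // ~375 rows are never fetched or hydrated.
    $this->view->items = $paginator;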
The database engines are usually best suited to do the retrieval for you. So, in general, if you can delegate a data-retrieval task to the DB engine instead of doing it in-memory and using your programming language, the best bet for performance is to let the DB engine do it for you.
But also remember that if you don't configure the indices correctly or don't run a good query, you won't get the best result out of your DB engine.
However, most DB engines nowadays are capable of optimizing your queries for you, rewriting them into a more efficient form before running them.
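As a concrete illustration of letting the engine do the work (table, column, and index names are made up), paging at the SQL level with a supporting index looks roughly like this:

    <?php
    // Sketch: MySQL does the paging via LIMIT/OFFSET; an index on the ORDER BY
    // column (e.g. CREATE INDEX idx_items_name ON items (name)) keeps it cheap.
    $pdo     = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $page    = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
    $perPage = 25;

    $stmt = $pdo->prepare('SELECT id, name FROM items ORDER BY name LIMIT :lim OFFSET :off');
    $stmt->bindValue(':lim', $perPage, PDO::PARAM_INT);
    $stmt->bindValue(':off', ($page - 1) * $perPage, PDO::PARAM_INT);
    $stmt->execute();
    $items = $stmt->fetchAll(PDO::FETCH_ASSOC);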
I need help finding the right caching solution for a client's site. The current site is CentOS, PHP, MySQL, and Apache, using Smarty templates (I know they suck, but it was built by someone else). The current models/methods use a fairly good OO structure, but there are WAY too many queries being done for some of the simple page functions. I'm looking to find some sort of caching solution, but I'm a noob when it comes to this and don't know what is available that would fit the current site setup.
It is an auction-type site with, say, 10 auctions displayed on one page at a time -- the time and current bid on each auction being updated via an AJAX call returning JSON every 1 second (it's a penny auction site like beezid.com, so updates every second are necessary). As you can see, if the site gets any sort of traffic, the number of simultaneous requests could be huge. Obviously this data changes every second, because the JSON data returned has the updated time left in the auction, and possibly updated bid amounts and bidding users for each auction.
What I want is the ability to cache certain pages for a given amount of time or based on some other changing variable. For example, memory-caching the page that displays the 10 auctions and only updating that cached copy when one of the auctions ends. Or even the script above that returns the JSON data every second: if I was able to cache the first request to this page in memory, serve the following requests from memory, and then re-cache it after 1 second, that could potentially reduce the server load a lot. But I don't know if this is even possible, or if the overhead of doing something like this outweighs any request-load savings.
I looked into XCache a bit, but I couldn't find a way to set a particular cache time on a specific page, or based on other variables. Maybe I missed something, but does anyone have a recommendation for a caching scheme that would work for these requirements?
Mucho thanks for any input you might have...
Caching can be done using many methods. Memcached springs to mind as being suited to your task, but if the site is ultra busy you may run out of RAM.
When I do caching I often use a simple file cache. While it does involve at least one stat call to determine the freshness of the cached content, it is still fast and marginally better than calling an SQL server.
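A bare-bones version of that file cache; the path, the one-second TTL, and the build_auction_json() helper are all assumptions:

    <?php
    // Simple file cache: one filemtime() stat to check freshness, then either
    // serve the cached copy or rebuild it and overwrite the file.
    function cached_output($cacheFile, $ttlSeconds, $rebuild) {
        if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttlSeconds) {
            return file_get_contents($cacheFile);   // cache hit
        }
        $content = $rebuild();                      // cache miss: regenerate
        file_put_contents($cacheFile, $content, LOCK_EX);
        return $content;
    }

    // Usage sketch: cache the per-second auction JSON for one second, so at
    // most one request per second pays the full query cost.
    header('Content-Type: application/json');
    echo cached_output('/tmp/auctions.json', 1, function () {
        return build_auction_json();                // hypothetical existing builder
    });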
If you must call an SQL server, then it may pay to use a MEMORY (HEAP) table to store much of the precomputed data. This technique is no more efficient than memcached (probably less so), but it saves you installing memcached.
DC
Zend_Cache can do what you want, and a lot more. It supports a lot of backends, including XCache and memcache, and it allows you to cache data, full pages, partial pages, and, well, just about anything you can imagine :p.
And in case you are wondering: you can use the Zend_Cache component by itself; you don't have to use the complete Zend Framework for your application.
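For example, a hedged sketch of a one-second data cache with Zend_Cache; the backend choice, cache directory, and build_auction_json() helper are assumptions:

    <?php
    // Standalone Zend_Cache usage: a Core frontend over a File backend,
    // caching the per-second auction JSON.
    require_once 'Zend/Cache.php';

    $cache = Zend_Cache::factory(
        'Core',
        'File',
        array('lifetime' => 1, 'automatic_serialization' => true), // frontend options
        array('cache_dir' => '/tmp/zend_cache/')                   // backend options
    );

    if (($json = $cache->load('auction_feed')) === false) {
        $json = build_auction_json();   // hypothetical: your existing query + JSON code
        $cache->save($json, 'auction_feed');
    }

    header('Content-Type: application/json');
    echo $json;

Swapping 'File' for 'Memcached' or 'Apc' later is a one-line change, which is the main reason to go through Zend_Cache rather than coding against a specific backend directly.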
Currently, I have a website that people can open during a certain team's hockey games. When the team scores, a designated person clicks a button in a secure location. This updates a single entry in a MySQL database with the current timestamp.
On the front end of the website, an asynchronous call runs every 15 seconds to a PHP script that queries the database for that timestamp. The script then compares the current time to the timestamp pulled, and if it's within 15 seconds of the current time, it triggers an event on the webpage that includes playing the sound of an air horn and a short clip of the team's goal song.
I usually get a good amount of traffic to the site during the team's games; however, many people complain about the (up to) 15-second delay between the goal being scored and the sound being triggered. I'd like to find a way to remedy that.
Obviously, I don't think querying the database every single second for every single user who is on the page (think 100+) is going to work; I'll likely kill my database. So, is there another way I can achieve my result? Would it be possible to place a PHP variable into the server's memory that could be pulled by each session, without the negative consequences of using a database or file-system read?
EDIT: My host doesn't have memcached available for me to use, and I cannot install it. That's disappointing, because it sounds like it would have been the optimal solution. Does anyone have an alternative idea I could look into that doesn't use memcached?
In this situation, something like memcached (also available in an object-oriented form as memcache) is most likely the perfect solution; one of its design goals is "to decrease database load in dynamic web applications".
You can read more about memcached at its main web site, or simply use the links above to investigate PHP's modules.
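For completeness, the pattern would look roughly like this with the PHP memcache extension (the key name is made up); the EDIT above means this particular host can't use it, but it shows how little code is involved:

    <?php
    // Writer (the secure "goal" button script): record the goal time in
    // memcache as well as in MySQL.
    $mc = new Memcache();
    $mc->connect('127.0.0.1', 11211);
    $mc->set('last_goal_ts', time(), 0, 3600);   // flags = 0, expire after an hour

    // Reader (the script polled by the front end): no database hit at all.
    $lastGoal = $mc->get('last_goal_ts');
    header('Content-Type: application/json');
    echo json_encode(array('goal' => $lastGoal !== false && (time() - $lastGoal) <= 15));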
For something like this you want to use a technique known as Comet. It's not particularly difficult, but it requires a bit of effort.
Basically, you'll keep a live connection open to each of the browsers instead of having them re-open the connection every 15 seconds. This allows you to write to the connection immediately.
Google for "Comet" and "PHP" and you should find some good resources. http://www.zeitoun.net/articles/comet_and_php/start looks thorough.
http://memcached.org/ is what you're looking for. It is built specifically for fast RAM-based access to objects, arrays, and variables in your system. It cuts out the load on MySQL, as long as you update the cache alongside your database writes.
If you're not locking, just reading, "100+" requests isn't even that heavy. Have you considered just doing a stress test?
I'm developing a web app that will access and work with large amounts of data in a MySQL database, something like a dictionary/thesaurus. I need to test the performance of the DB as its size increases, so I know how slow each request will be in the future.
Any ideas? Like are there specific tools to check DB performance for a particular query, etc?
Do you know what, specifically, you're testing? Measuring "performance" is almost always useless unless you know exactly what it is you want.
For example, are you looking for low latency on query-result retrieval? Perhaps high throughput on data retrieval? Perhaps you care more about fast insertions into the database and less about fast query results? Perhaps you care about different things on different tables (in fact, that's almost always the case).
My advice will probably be ignored, but I'll say it anyway:
Don't optimise before you know what you want.
Don't optimise as you write the code.
When you do get around to optimising your database, make sure you optimise for the right things. Use realistic data - if you're testing dictionary-sized hunks of text, don't test with binary data (for example).
Anyway, I realise you were probably looking for a more technical answer, but hey...
You can use Maatkit's query profiler to measure the impact of data volume on MySQL performance.
And generatedata.com to generate the data you need to test your app.
You can also test your application's responsiveness using HTTP testing tools like:
Apache's bundled 'ab' tool (Apache Bench)
JMeter
Selenium
A good tool to use is Apache's ab, which comes standard with the Apache httpd server. This tool can make multiple connections to a web server and benchmark its performance. While Firebug is a good way to see in what order things load, how long each item takes to load, etc., you're only seeing one user's experience; against an unloaded test server, that information can only take you so far. ab simulates multiple users connecting and will give a more realistic picture of how a particular page handles concurrent users.
Which leads me to a limitation of ab: it only tests one URL. I often get around this by whipping up a simple test webpage that makes a random selection from a list of pre-defined URLs that I want to test, for example: the login page, a search result, posting a comment, and so on. ab hits the test page, and the test page simply calls one of the test URLs (possibly with a randomized parameter) and returns that page. In this manner, you get a better idea of how your whole site handles concurrent users.
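The random test page is only a few lines; something like this sketch, where the URL list is obviously yours to fill in:

    <?php
    // abtest.php - point ab at this page; each hit exercises one of the real
    // pages you care about, so the benchmark mixes page types like real traffic.
    $urls = array(
        'http://example.com/login',
        'http://example.com/search?q=' . rand(1, 1000), // randomized parameter
        'http://example.com/comment/post',
    );

    $target = $urls[array_rand($urls)];

    $ch = curl_init($target);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($code !== 200) {
        header('HTTP/1.1 502 Bad Gateway'); // surface upstream failures in ab's output
    }
    echo $body;

Then something like "ab -n 1000 -c 50 http://yourserver/abtest.php" gives you a blended picture of how the site copes under concurrency.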
PS: Your question about the OS is unanswerable; you'll have to figure that out yourself based on how your application is written, the layout of your data, the configuration of the web server and the database server, etc.