Cache vs storing "similar" results in database - php

I am in the process of developing a video sharing site. On the video page I display "similar videos" using just a database query (based on tags/category), and I haven't run into any problems with this. However, I was debating running my custom search function instead, to match similar videos even more closely (so it's based not only on similar categories, but also on tags, similar words, etc.). My fear is that running this on every video view would be too much in terms of resources, and not worth it since it's not a major part of the site.
So I was debating doing that, but storing the results (maybe store 50 and pull 6 from that 50 by id). I can update them maybe once a week or whenever (again, since it's not a major part of the site, I don't need live searching). My question is: is there any downside or upside to this?
I'm looking specifically at caching the similar video results, or simply saying "never mind" and keeping it based on tags. Does anyone have any experience/knowledge of how sites deal with offering similar options for something like this?
(I'm using php, mysql, built using laravel framework, search is custom class built on the back of laravel scout)

Every decision you make as a developer is a tradeoff. If you cache the results, you get speed on display, but you take on more complexity in cache management (and probably bugs). You have to decide whether it is worth it, since we do not know your page load time requirements (or other KPIs), user load, hardware, etc.
But in general I would cache this data.
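As a rough illustration of the "store 50, show 6, refresh weekly" idea from the question, here is a minimal sketch in plain PHP. The function names, the constant and the one-week age limit are all assumptions made for the example, not part of Laravel or Scout; how the ID list is stored (a table column, a cache entry, etc.) is left open.

```php
<?php
// Assumption: the precomputed list of similar video IDs is stored together
// with the Unix time it was computed. All names here are hypothetical.
const SIMILAR_MAX_AGE = 7 * 24 * 3600; // refresh roughly once a week

// True when the stored list is old enough to be recomputed.
function similarNeedsRefresh(int $computedAt, int $now, int $maxAge = SIMILAR_MAX_AGE): bool
{
    return ($now - $computedAt) >= $maxAge;
}

// Pick up to $count IDs at random from the stored list (e.g. 6 out of 50).
function pickSimilarIds(array $storedIds, int $count = 6): array
{
    $storedIds = array_values($storedIds);
    if ($count <= 0 || $storedIds === []) {
        return [];
    }
    if (count($storedIds) <= $count) {
        return $storedIds;
    }
    // array_rand returns a single key when asked for one, hence the cast.
    $keys = (array) array_rand($storedIds, $count);
    $picked = [];
    foreach ($keys as $key) {
        $picked[] = $storedIds[$key];
    }
    return $picked;
}
```

The expensive search would then run only when similarNeedsRefresh() returns true (for instance from a scheduled job), while every page view just calls pickSimilarIds() on the stored list. In a Laravel app, the Cache facade's remember() method could play the storage role instead of a hand-rolled timestamp.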

Related

Is WordPress suitable for querying a large database?

Wonder if anyone has any advice on a project I might be picking up.
It's a bit vague at the mo, but I think it's basically a scientific app: you supply a product, which is then checked against a database of proteins to see if it has a reaction.
From the information I did get - this could really be any scenario, and sounds quite simple dev wise - the user on the site fills out a form which then checks in some way against a db of other similar attributes to see if it's suitable, it then gets also added to that db at the end (so the db potentially grows as each user does a check).
The part I'm wondering about is that this could potentially be huge (e.g. 10,000+, unlimited), so would anything be gained from building a custom PHP app to handle all this? In other words, would WordPress not be suitable for the backend, and should I be catering for time-intensive queries?
Thanks for looking
No.
See ticket http://core.trac.wordpress.org/ticket/9864 (Performance issues with large number of pages). I filed this ticket 3+ years ago, and there is still no resolution. Since then, the code in question has become even more complicated and has started using a heavier version of the internal query library.
Those severe issues are with pages, but posts are equally taxing in queries. And if you have metadata, it taxes the server even further. To top it off, most leading caching plugins for WP exclude web robots, so every few weeks, when Google, Baidu and Yandex start hammering your site for the 10k pages, they'll bring your machine down.
This just means that you can't use it natively for large content sets, but you can still use most of WordPress with additional code customizations for parts that'll be outside of the WP database/construct.
Edit: to clarify, the inefficiency I was referring to comes from relying solely on WP's native database structure, as defined by this schema in the codex, and on query_posts() / WP_Query() to perform the queries. The native WP storage/query system doesn't handle large volumes of pages / posts very efficiently. However, bypassing some of the native functionality will likely work fine.

How to deal with External API latency

I have an application that is fetching several e-commerce websites using Curl, looking for the best price.
This process returns a table comparing the prices of all searched websites.
But now we have a problem: the number of stores is starting to increase, and the loading time is now unacceptable from a user experience standpoint (currently a 10s page load).
So we decided to create a database and store all the filtered Curl results in it, in order to reduce the DNS calls and improve page load time.
I want to know whether, on top of all these efforts, there is still an advantage in implementing a Memcache module.
I mean, will it help even more, or is it just a waste of time?
The Memcache idea was inspired by this topic, from someone who had a similar problem: Memcache to deal with high latency web services APIs - good idea?
Memcache could be helpful, but (in my opinion) it's kind of a weird way to approach the issue. If it were me, I'd go about it this way:
Firstly, I would indeed cache everything I could in my database. When the user searches, or whatever interaction triggers this, I'd show them a "searching" page with whatever results the server currently has, and a progress bar that fills up as the asynchronous searches complete.
I'd use AJAX to add additional results as they become available. I'm imagining that the search takes about ten seconds - it might take longer, and that's fine. As long as you've got a progress bar, your users will appreciate and understand that Stuff Is Going On.
Obviously, the more searches go through your system, the more up-to-date data you'll have in your database. I'd use cached results that are under a half-hour old, and I'd also record search terms and make sure I kept the top 100 (or so) searches cached at all times.
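To make the "under a half-hour old" rule concrete, here is a small sketch of how the server might split stores into those it can answer from the database right away and those it still has to fetch in the background. The row layout (a fetched_at Unix timestamp per store) and the function names are assumptions for illustration only.

```php
<?php
// Assumption: each cached price row stores the Unix time it was fetched.
const PRICE_MAX_AGE = 1800; // "under a half-hour old"

// True when a cached row is still recent enough to show as-is.
function isPriceFresh(int $fetchedAt, int $now, int $maxAge = PRICE_MAX_AGE): bool
{
    return ($now - $fetchedAt) < $maxAge;
}

// Split stores into those served from cache and those to re-fetch.
function partitionStores(array $cachedRows, array $allStores, int $now): array
{
    $fresh = [];
    $stale = [];
    foreach ($allStores as $store) {
        $row = $cachedRows[$store] ?? null;
        if ($row !== null && isPriceFresh($row['fetched_at'], $now)) {
            $fresh[$store] = $row;   // show immediately
        } else {
            $stale[] = $store;       // fetch asynchronously, add via AJAX
        }
    }
    return ['fresh' => $fresh, 'stale' => $stale];
}
```

The 'fresh' rows would fill the initial results page, and the length of the 'stale' list gives the progress bar something to count down from.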
Know your customers and have what they want available. This doesn't have much to do with any specific technology, but it is all about your ability to predict what they want (or write software that predicts for you!)
Oh, and there's absolutely no reason why PHP can't handle the job. Tying together a bunch of unrelated interfaces is one of the things PHP is best at.
The solution lies outside the bounds of PHP alone. Don't bother hacking together a result in PHP at request time when a cronjob can easily populate your database and your PHP script can simply query it.
If you plan to stick with PHP only, then I suggest you change your script to read from the database you have populated. To populate it, have a cronjob ping a PHP script, not accessible to users, that performs all of your curl functionality.

Pagination - Get all items from DB and then paginate, or get "pages" of items?

I'm currently developing a Zend Framework project, using Doctrine as ORM.
I ran into the typical situation where you have to show a list of items (around 400) in a table, and of course, I don't want to show them all at once.
I've already used Zend_Paginator before (only some basic usage), but I always used to get all the items from the DB and then paginate them, and now it doesn't feel quite right.
My question is this: is it better to get all items from the DB first and then "paginate" them, or to get "pages" of items as they are requested? Which would have a larger impact on performance?
For me, it is better to fetch a part of the data at a time and paginate through it.
If you get all the data from the DB, you paginate with the help of JavaScript. The first load of the page will take a long time (for 400 records it is OK). The browser has limited memory: if a user opens a lot of tabs and your data takes up a lot of memory, this will slow down the browser and your application. You have only 400 records now, but data tends to grow. At worst, the whole browser may crash when the page is opened. And what if the browser doesn't support JS?
If you get part of the data from the DB, the only drawback shows up when a user has a very slow Internet connection (but the first option has the same drawback on the first load of the page). Moving to another page will take a little longer than with JavaScript.
The second option is better (for me) in the long run, because if it works, it will work for years.
The database engines are usually best suited to do the retrieval for you. So, in general, if you can delegate a data-retrieval task to the DB engine instead of doing it in-memory and using your programming language, the best bet for performance is to let the DB engine do it for you.
But also remember that if you don't configure the indices correctly or don't run a good query, you won't get the best result out of your DB engine.
However, most DB engines nowadays are capable of optimizing your queries for you and running them in an efficient form.
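Fetching "pages" from the database usually comes down to computing a LIMIT and OFFSET for the requested page. A minimal sketch of that arithmetic follows; the function name and the returned array keys are made up for the example, and the total item count is assumed to come from a separate COUNT(*) query.

```php
<?php
// Compute LIMIT/OFFSET values for one page of results.
// $page is 1-based and is clamped into the valid range.
function pageWindow(int $page, int $perPage, int $totalItems): array
{
    $perPage = max(1, $perPage);
    $totalPages = max(1, (int) ceil($totalItems / $perPage));
    $page = min(max(1, $page), $totalPages);

    return [
        'page' => $page,
        'total_pages' => $totalPages,
        'limit' => $perPage,
        'offset' => ($page - 1) * $perPage,
    ];
}

// The values would then feed a query along the lines of:
//   SELECT ... FROM items ORDER BY id LIMIT :limit OFFSET :offset
// ("items" is a placeholder table name).
```

With 400 items and 25 per page, page 3 maps to LIMIT 25 OFFSET 50, so the database returns only the 25 rows that are actually shown. An ORDER BY matters here: without a stable order, rows can shift between pages.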

Need a php caching recommendation

I need help finding the right caching solution for a client's site. The current site runs on CentOS, PHP, MySQL and Apache, using Smarty templates (I know they suck, but it was built by someone else). The current models/methods use a fairly good OO structure, but there are WAY too many queries being done for some of the simple page functions. I'm trying to find some sort of caching solution, but I'm a noob when it comes to this and don't know what is available that would fit the current site setup.
It is an auction type site with say 10 auctions displayed on one page at one time -- the time and current bid on each auction being updated via an ajax call returning json every 1 second (it's a penny auction site like beezid.com so updates every second are necessary). As you can see, if the site gets any sort of traffic the number of simultaneous requests could be huge. Obviously this data changes every second because the json data returned has the updated time left in the auction, and possibly updated bid amounts and bid users for each auction.
What I want is the ability to cache certain pages for a given amount of time or based on some other changed variable. For example, memory caching the page that displays 10 auctions and only updating that cached copy when one of the auctions ends. Or even the script above that returns JSON string data every second. If I were able to cache the first request to this page in memory, serve the following requests from memory and then re-cache it after 1 second, that could potentially reduce the server load a lot. But I don't know if this is even possible, or if the overhead of doing something like this outweighs any request load savings.
I looked into xcache a bit, but I couldn't find a way to set a particular cache time on a specific page or based on other variables. Maybe I've missed something, but does anyone have a recommendation on a caching scheme that would work for these requirements?
Mucho thanks for any input you might have...
Caching can be done using many methods. Memcached springs to mind as being suited to your task, but if the site is ultra busy you may run out of RAM.
When I do caching I often use a simple file cache. While it does involve at least one stat call to determine the freshness of the cached content, it is still fast and marginally better than calling a SQL server.
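A simple file cache of the kind described above could look roughly like this. It is only a sketch: the function name, the cache path and the idea of passing a callable that produces the content are assumptions for the example. Freshness is judged from the file's modification time, which is the "stat call" mentioned above.

```php
<?php
// Minimal file cache sketch: freshness is judged from the file's mtime.
// $produce is any callable that builds the content when the cache is stale.
function fileCacheGet(string $path, int $ttlSeconds, callable $produce): string
{
    clearstatcache(true, $path); // avoid PHP's own stat cache for this path
    if (is_file($path) && (time() - filemtime($path)) < $ttlSeconds) {
        $cached = file_get_contents($path);
        if ($cached !== false) {
            return $cached;
        }
    }

    $fresh = (string) $produce();
    // Write to a temp file, then rename, so readers never see a partial file.
    $tmp = $path . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $fresh);
    rename($tmp, $path);

    return $fresh;
}
```

For the auction JSON in the question, the call might wrap the existing query code with a TTL of 1 second, so that all requests within the same second share one database hit. Note that several requests arriving at the exact moment the file expires may each regenerate it; this sketch does no locking.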
If you must call a SQL server, then it may pay to use a memory (heap) table to store much of the precomputed data. This technique is no more efficient than memcached, probably less so, but it saves you installing memcached.
DC
Zend_Cache can do what you want, and a lot more. It supports a lot of backends, including xcache and memcache, and allows you to cache data, full pages, partial pages, and well, just about anything you can imagine :p.
And in case you are wondering: you can use the Zend_Cache component by itself; you don't have to use the complete Zend Framework for your application.

Realtime MySQL search results on an advanced search page

I'm a hobbyist, and started learning PHP last September solely to build a hobby website that I had always wished and dreamed another more competent person might make.
I enjoy programming, but I have little free time and enjoy a wide range of other interests and activities.
I feel learning PHP alone can probably allow me to create 98% of the desired features for my site, but that last 2% is awfully appealing:
The most powerful tool of the site is an advanced search page that picks through a 1000+ record game scenario database. Users can data-mine to tremendous depths - this advanced page has upwards of 50 different potential variables. It's designed to allow the hardcore user to search on almost any possible combination of data in our database and it works well. Those who aren't interested in wading through the sea of options may use the Basic Search, which is comprised of the most popular parts of the Advanced search.
Because the advanced search is so comprehensive, and because the database is rather small (less than 1,200 potential hits maximum), with each variable you choose to include the likelihood of getting any qualifying results at all drops dramatically.
In my fantasy land where I can wield AJAX as if it were Excalibur, my users would have a realtime Total Results counter in the corner of their screen as they used this page, which would automatically update its query structure and report how many results will be displayed with the addition of each variable. In this way it would be effortless to know just how many variables are enough, and when you've gone and added one that zeroes out the results set.
A somewhat similar implementation, at least visually, would be the Subtotal sidebar when building a new custom computer on IBuyPower.com
For those of you actually still reading this, my question is really rather simple:
Given the time & ability constraints outlined above, would I be able to learn just enough AJAX (or whatever) needed to pull this one feature off without too much trouble? Would I be able to more or less drop in a pre-written code snippet and tweak it to fit? Or should I consider opening my code up to a trusted & capable individual in the future for this implementation (assuming I can find one...)?
Thank you.
This is a great project for a beginner to tackle.
First, I'd say look into using a library like jQuery (jquery.com). It will simplify the JavaScript part of this, and the manual is very good.
What you're looking to do can be broken down into a few steps:
1. The user changes a field on the advanced search page.
2. The user's browser collects all the field values and sends them back to the server.
3. The server performs a search with the values and returns the number of results
4. The user's browser receives the number of results and updates the display.
Now for implementation details:
1. This can be accomplished with javascript events such as onchange and onfocus.
2. You could collect the field values into a javascript object, serialize the object to json and send it using ajax to a php page on your server.
3. The server page (in php) will read the json object and use the data to search, then send back the result as markup or text.
4. You can then display the result directly in the browser.
This may seem like a lot to take in but you can break each step down further and learn about the details bit by bit.
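For the server-side step (search with the submitted values and return only the number of results), here is a minimal sketch of how the PHP page might turn the decoded JSON fields into a parameterized COUNT query. The table name, column names and the field whitelist are invented for the example; the real search has around 50 fields and its own schema.

```php
<?php
// Hypothetical whitelist: request field name => column and comparison.
// Only fields listed here can ever reach the SQL string.
const SEARCH_FIELDS = [
    'year_min' => ['column' => 'year',     'op' => '>='],
    'players'  => ['column' => 'players',  'op' => '='],
    'designer' => ['column' => 'designer', 'op' => '='],
];

// Build "SELECT COUNT(*) ..." plus its bound parameters from user input.
function buildCountQuery(array $input, array $fields = SEARCH_FIELDS): array
{
    $where = [];
    $params = [];
    foreach ($fields as $name => $rule) {
        if (!isset($input[$name]) || $input[$name] === '') {
            continue; // field not filled in, so it does not narrow the search
        }
        $where[] = "{$rule['column']} {$rule['op']} :$name";
        $params[":$name"] = $input[$name];
    }

    $sql = 'SELECT COUNT(*) FROM scenarios';
    if ($where) {
        $sql .= ' WHERE ' . implode(' AND ', $where);
    }
    return [$sql, $params];
}
```

The endpoint itself would decode the request body with json_decode(), run the returned SQL through a prepared statement (for example with PDO), and echo the count back with json_encode() for the browser to display. Because user values only ever appear as bound parameters, unknown or malicious field names are simply ignored.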
Hard to answer your question without knowing your level of expertise, but check out this short description of AJAX: http://blog.coderlab.us/rasmus-30-second-ajax-tutorial
If this makes some sense then your feature may be within reach "without too much trouble". If it seems impenetrable, then probably not.
