Getting all data once for future use

This is a question about how to design a website that uses fewer resources than a typical website and is also optimized for mobile.
Here it goes: I want to display an overview of, say, 5 posts (from a blog, for example). If I click on the first post, that post is loaded in a new window. But instead of connecting to the database again and fetching that specific post by its id, I would just look it up (in PHP) in the array of 5 posts that I created earlier, when the page was first requested.
Would this reduce the amount of data to download? I'm not sure, because PHP works server-side as well.
OK, I'll explain again:
Method 1:
User connects to my website
5 posts are displayed and saved to an array (with all their data)
User clicks on the first post and expects more information about it.
My program looks up the post in my array and displays it.
Method 2:
User connects to my website
5 posts are displayed
User clicks on the first post and expects more information about it.
My program connects to MySQL again and fetches the post from the server.

First off, this sounds like a case of premature optimization. I would not start caching anything outside of the database until measurements prove that it's a wise thing to do. Caching takes your focus away from the core task at hand, and introduces complexity.
If you do want to keep DB results in memory, just using an array allocated in a PHP-processed HTTP request will not be sufficient. Once the page is processed, memory allocated at that scope is no longer available.
You could certainly put the results in SESSION scope. The advantage of saving some DB results in the SESSION is that you avoid DB round trips. Disadvantages include the increased complexity of programming the solution, the use of memory in the web server for data that may never be accessed, and the increased initial load on the DB to retrieve extra pages that may or may not ever be requested by the user.
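A minimal sketch of the SESSION approach, written for illustration rather than taken from the question. It assumes each post is an associative array with an integer `id`, and `$loadPostFromDb` is a hypothetical callable standing in for the real MySQL query. The session array is passed by reference so the functions can be tried outside a web server; in a real request you would call `session_start()` and pass `$_SESSION`.

```php
<?php
// Overview page: remember the posts that were just rendered, keyed by id.
function rememberPosts(array &$session, array $posts): void
{
    foreach ($posts as $post) {
        $session['posts'][$post['id']] = $post;
    }
}

// Detail page: use the session copy if there is one, otherwise query the database.
// $loadPostFromDb is hypothetical: fn(int $id): ?array
function findPost(array &$session, int $id, callable $loadPostFromDb): ?array
{
    if (isset($session['posts'][$id])) {
        return $session['posts'][$id];          // no database round trip
    }
    $post = $loadPostFromDb($id);               // null if the post does not exist
    if ($post !== null) {
        $session['posts'][$id] = $post;         // keep it for the next request
    }
    return $post;
}

// In a real page:
// session_start();
// rememberPosts($_SESSION, $latestPosts);
// $post = findPost($_SESSION, (int) $_GET['id'], $loadPostFromDb);
```

Every cached post now occupies session storage on the web server for as long as the session lives, which is the memory cost mentioned above.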
If DB performance, after measurement, really is causing you to miss your performance objectives you can use a well-proven caching system such as memcached to keep frequently accessed data in the web server's (or dedicated cache server's) memory.
Final note: You say
PHP works server-side as well
That's not accurate. PHP works server-side only.

Have you thought about outputting all the posts in divs up front and only making each one visible when the user clicks on it?
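A minimal PHP sketch of that idea, assuming each post has `id`, `title` and `body` keys (hypothetical field names). Every body is sent with the first page and revealed client-side through the HTML `hidden` attribute, so clicking a title needs no second request. The trade-off is that all five bodies are downloaded even if the reader never opens them.

```php
<?php
// Render all posts at once; each body is in the page but hidden until its title is clicked.
function renderPosts(array $posts): string
{
    $html = '';
    foreach ($posts as $post) {
        $id    = (int) $post['id'];
        $title = htmlspecialchars($post['title'], ENT_QUOTES, 'UTF-8');
        $body  = htmlspecialchars($post['body'], ENT_QUOTES, 'UTF-8');
        $html .= "<article>\n"
            . "  <button type=\"button\" onclick=\"document.getElementById('post-$id').hidden = false\">$title</button>\n"
            . "  <div id=\"post-$id\" hidden>$body</div>\n"
            . "</article>\n";
    }
    return $html;
}
```

Escaping with `htmlspecialchars()` matters here because the bodies are now embedded in the page verbatim.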

Put some sort of cache between your code and the database.
So your code will look like
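if (isPostInCache($id)) {
    $post = loadPostFromCache($id);
} else {
    $post = loadPostFromDatabase($id);
    savePostToCache($id, $post); // store it so the next request is a cache hit
}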
Use an existing caching system; the web is full of them. You can use memcached, or a static cache you build yourself (e.g. saving each post as a text file on the server).
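A self-contained sketch of the "static cache you build yourself" option: the same cache-aside pattern, with one serialized file per post. The directory layout, the `post:<id>` key format and `$loadPostFromDatabase` (a stand-in for the real query) are assumptions for illustration. With memcached, the shape of `getPost()` would stay the same and mostly `cacheGet()`/`cacheSet()` would change.

```php
<?php
// Minimal file-based cache: one serialized file per key in $dir.
function cachePath(string $dir, string $key): string
{
    return $dir . '/' . md5($key) . '.cache';
}

function cacheGet(string $dir, string $key)
{
    $file = cachePath($dir, $key);
    if (!is_file($file)) {
        return null;                                            // cache miss
    }
    return unserialize((string) file_get_contents($file), ['allowed_classes' => false]);
}

function cacheSet(string $dir, string $key, $value): void
{
    // Write to a temporary file, then rename, so readers never see a half-written file.
    $file = cachePath($dir, $key);
    $tmp  = $file . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, serialize($value));
    rename($tmp, $file);
}

// Cache-aside lookup. $loadPostFromDatabase is hypothetical: fn(int $id): ?array
function getPost(int $id, string $cacheDir, callable $loadPostFromDatabase): ?array
{
    $post = cacheGet($cacheDir, "post:$id");
    if (!is_array($post)) {
        $post = $loadPostFromDatabase($id);
        if ($post !== null) {
            cacheSet($cacheDir, "post:$id", $post);
        }
    }
    return $post;
}

// When a post is edited, drop its entry: unlink(cachePath($cacheDir, "post:$id"));
```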

To me, this is less efficient than making a second call to the database, and here is why.
The first query should only pull the fields you need, like title, author and date. The content of the post may be heavy to query, so I'd exclude it (you can pull a teaser if you'd like).
Then, if the user wants the details of the post, I would query for the content using an indexed key column.
That way you're not pulling the content of 5 posts that may never be seen.
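A sketch of the two-query approach with PDO. The `posts` table, its columns and the 200-character teaser are assumptions; `SUBSTR` and `LIMIT` are accepted by both MySQL and SQLite, so the same SQL runs on either.

```php
<?php
// List view: fetch only the light columns plus a short teaser, never the full body.
function fetchPostSummaries(PDO $db, int $limit = 5): array
{
    $stmt = $db->prepare(
        'SELECT id, title, author, created_at, SUBSTR(body, 1, 200) AS teaser
           FROM posts
          ORDER BY created_at DESC
          LIMIT :limit'
    );
    $stmt->bindValue(':limit', $limit, PDO::PARAM_INT);   // LIMIT needs an integer binding
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Detail view: fetch the full body of one post through its primary key (an indexed column).
function fetchPostBody(PDO $db, int $id): ?string
{
    $stmt = $db->prepare('SELECT body FROM posts WHERE id = :id');
    $stmt->execute([':id' => $id]);
    $body = $stmt->fetchColumn();
    return $body === false ? null : $body;               // false means no such row
}
```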

If your PHP code is constantly re-connecting to the database you've configured it wrong and aren't using connection pooling properly. The execution time of a query should be a few milliseconds at most if you've got your stack properly tuned. Do not cache unless you absolutely have to.
What you're advocating here is side-stepping a serious problem. Database queries should be effortless provided your database is properly configured. Fix that issue and you won't need to go down the caching road.
Saving data from one request to the other is a broken design and if not done perfectly could lead to embarrassing data bleed situations where one user is seeing content intended for another. This is why caching is an option usually pursued after all other avenues have been exhausted.
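For context on the connection point: PHP has no connection pool of its own. The closest built-in mechanism is a persistent connection, which PDO enables when `PDO::ATTR_PERSISTENT` is passed in the constructor's options; the PHP worker process then keeps the connection open and reuses it for later requests. A minimal sketch, where the `db()` helper and the DSN are placeholders. Persistent connections carry connection-level state (such as temporary tables or session variables) from one request to the next, so they have caveats of their own.

```php
<?php
// Return one shared PDO handle for the whole request; PDO::ATTR_PERSISTENT asks PHP
// to keep the underlying connection open in this worker process for later requests.
// Note: the first DSN wins; later calls with a different DSN get the same handle.
function db(string $dsn, ?string $user = null, ?string $password = null): PDO
{
    static $pdo = null;
    if ($pdo === null) {
        $pdo = new PDO($dsn, $user, $password, [
            PDO::ATTR_PERSISTENT => true,                 // only honored in the constructor
            PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
        ]);
    }
    return $pdo;
}

// Example DSN for MySQL (placeholders): db('mysql:host=localhost;dbname=blog;charset=utf8mb4', 'user', 'secret');
```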

Related

PHP - MySQL call or JSON static file for unfrequently updated information

I've got a heavy-read website backed by a MySQL database. I also have some small "auxiliary" information (it currently fits in an array of 30-40 elements), organized hierarchically, that is updated only occasionally, 4-5 times per year. It's not a configuration file, since this information is about the subject of the website and not about how it works, but it is still something like one. Until now I just used a static PHP file containing an array of the info, but now I need a way to update it through a backend CMS in my admin panel.
I thought of a simple CMS that lets the admin create/edit/delete entries (a rare, periodic job) and then generates a static JSON file for the page-building scripts to use instead of pulling this information from the database.
The question is: given the heavy-read nature of the website, is it better to read a rarely updated JSON file on the server when building pages or just retrieve raw info from the database for every request?

I just used a static PHP
This sounds like a contradiction to me: either static, or PHP.
given the heavy-read nature of the website, is it better to read a rarely updated JSON file on the server when building pages or just retrieve raw info from the database for every request?
Cache was invented for a reason :) The same goes for your case: it all depends on how often the data changes vs. how often it is read. If the data changes once a day and stays the same for 100k downloads during the day, then not caching it or not serving it from a flat file would simply be stupid. If the data changes once a day and you have 20 reads per day on average, then perhaps returning the data from code on each request would be less stupid, but on the other hand, 19 of those 20 requests could be served from cache anyway, so... If you can, serve from a flat file.
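A minimal sketch of the flat-file plan from the question: the admin panel rewrites a JSON file after every change, and page scripts read the file instead of querying MySQL. The file path and function names are assumptions. Writing to a temporary file and then calling `rename()` means a page request never reads a half-written file, as long as both files are on the same filesystem.

```php
<?php
// Admin side: regenerate the JSON file after an entry is created, edited or deleted.
function publishAuxInfo(string $file, array $entries): void
{
    $json = json_encode($entries, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
    $tmp  = $file . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, $json);
    rename($tmp, $file);   // swap in the new version in one step
}

// Page side: read the file instead of querying the database.
function loadAuxInfo(string $file): array
{
    return json_decode((string) file_get_contents($file), true, 512, JSON_THROW_ON_ERROR);
}
```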

Caching is your best option; Redis and Memcached are common, excellent choices. As for flat file versus database, it's hard to say without knowing the SQL schema you're using (how many columns, what the data type definitions are, how many foreign keys and indexes, etc.).
SQL is about relational data, if you have non-relational data, you don't really have a reason to use SQL. Most people are now switching to NoSQL databases to handle this since modifying SQL databases after the fact is a huge pain.

What PHP function should I use to pull data from MySQL periodically and display it on a website?

I'm running a server and website for an online game.
I need to pull information from the MySQL database containing player names and messages and display it on our website.
I managed to do that with a PHP script that pulls the info directly from the MySQL database whenever someone opens the page.
However, this puts unnecessary load on the server.
I need the function to pull the data only at certain intervals, for example every hour.
I have no idea how to do this and what to even look for in terms of functions.
Could anyone show me the right direction?

If you're worried about the load caused by the query that runs every time a user opens the site, you should first look into caching with something like APC or memcached. That way there is no actual DB lookup; the cache just returns the same data it fetched last time. You set how long the cache should hold the data, so that it fetches a fresh copy once that period expires; in your case, that would be an hour.
There are plenty of questions about APC, memcached, etc. on Stack Overflow that should help you out, e.g. "The best way of PHP Caching".
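A minimal sketch of the hourly refresh the question asks for, using a file instead of APC so it runs without extensions. `remember()` and `$rebuild` (a stand-in for the real player/message query) are hypothetical names. With the APCu extension installed, `apcu_fetch()` and `apcu_store($key, $value, 3600)` would play the same role in shared memory.

```php
<?php
// Return cached data if the cache file is younger than $ttl seconds; otherwise rebuild it.
// $rebuild is hypothetical: a callable that runs the real MySQL query and returns its data.
function remember(string $file, int $ttl, callable $rebuild)
{
    clearstatcache(true, $file);                       // PHP caches filemtime() results
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return unserialize((string) file_get_contents($file), ['allowed_classes' => false]);
    }
    $data = $rebuild();                                // missing or expired: run the real query
    $tmp  = $file . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, serialize($data));
    rename($tmp, $file);                               // replace the old copy in one step
    return $data;
}

// Usage (hypothetical names):
// $rows = remember('/tmp/players.cache', 3600, function () use ($db) { return fetchPlayerMessages($db); });
```

If several requests arrive just as the hour expires, each may run the query once; for a page like this that is usually acceptable.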

Caching debate/forum entries in PHP

Just looking for a piece of advice. On one of our websites we have a debate/forum section. Every time a user requests the debate page, he/she gets a list of all topics (with their answer counts, etc.).
Likewise, when the user requests a specific topic/thread, all answers in the thread are shown along with the username, user picture, age and total number of forum posts of each answer's poster.
All content is currently retrieved with a MySQL query every time the page is accessed. However, this is starting to get painfully slow (especially with large threads of 3000+ answers).
I would like to cache the debate entries somehow to speed up this process. The problem is that if I cache the entries themselves, the number of posts etc. (which is dynamic, of course) will not always be up to date.
Is there any smart way of caching the pages/recaching them when stuff like this is updated? :)
Thanks in advance,
fischer

You should create a tag or a name for the cache based on its data.
For example for the post named Jake's Post you could create an md5 of the name, this would give you the tag 49fec15add24931728652baacc08b8ee.
Now cache the contents and everything to do with this post against the tag 49fec15add24931728652baacc08b8ee. When the post is updated or a comment is added go to the cache and delete everything associated with 49fec15add24931728652baacc08b8ee.
Now there is no cache, and it will be rebuilt when the next visitor arrives to view the post.
You could break this down further by having multiple tags per post. For example, you could have one tag for comments and one for answers; when a comment is added, delete the comments tag but not the answers tag. This reduces the work the server has to do when rebuilding the cache, as only the comments are missing.
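A minimal in-memory sketch of tag-based invalidation, building the tags from the post name with `md5()` as described. The `TaggedCache` class and `postTag()` helper are hypothetical; a real deployment would keep the same bookkeeping in memcached, Redis or files, or use a library that supports tags.

```php
<?php
// In-memory stand-in for a cache backend that supports tags.
final class TaggedCache
{
    private array $items = [];   // key => cached value
    private array $tags  = [];   // tag => [key => true]

    public function set(string $key, $value, array $tags): void
    {
        $this->items[$key] = $value;
        foreach ($tags as $tag) {
            $this->tags[$tag][$key] = true;   // remember which keys belong to this tag
        }
    }

    public function get(string $key)
    {
        return $this->items[$key] ?? null;    // null means "not cached, rebuild it"
    }

    // Drop every entry that was stored under $tag.
    public function invalidateTag(string $tag): void
    {
        foreach (array_keys($this->tags[$tag] ?? []) as $key) {
            unset($this->items[$key]);
        }
        unset($this->tags[$tag]);
    }
}

// One tag per post and part, e.g. md5("Jake's Post") . ':comments'
function postTag(string $postName, string $part): string
{
    return md5($postName) . ':' . $part;
}
```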
There are a number of libraries and frameworks that can help you do this.
Jake
EDIT
I'd use files to store the data, more specifically the HTML output of the page. You can then do something like:
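$cacheFile = __DIR__ . '/cache/' . $tag . '.html';
if (file_exists($cacheFile)) {
    // Cache hit: send the stored HTML without touching the database
    readfile($cacheFile);
} else {
    // Cache miss: do the complex database lookup, capture the HTML and cache it for later
    ob_start();
    renderDebatePage(); // hypothetical: your existing page-building code
    $html = ob_get_clean();
    file_put_contents($cacheFile, $html);
    echo $html;
}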
Remember that frameworks like Zend have this sort of thing built in. I would seriously consider using a framework.

Interesting topic!
The first thing I'd look at is optimizing your database - even if you have to spend money upgrading the hardware, it will be significantly easier and cheaper than introducing a cache - fewer moving parts, fewer things that can go wrong...
If you can't squeeze more performance out of your database, the next thing I'd consider is de-normalizing the data a little. For instance, maintain a "reply_count" column, rather than counting the replies against each topic. This is ugly, but introduces fewer opportunities for things to go wrong - with a bit of luck, you can localize all the logic in your data access layer.
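A sketch of keeping a denormalized `reply_count` in step with the replies table by updating both in one transaction inside the data access layer. Table and column names are assumptions, and the code expects PDO to throw exceptions on errors (the default since PHP 8; otherwise set `PDO::ATTR_ERRMODE` explicitly).

```php
<?php
// Add a reply and bump the topic's counter atomically, so topic lists can read
// topics.reply_count instead of running COUNT(*) over the replies every time.
function addReply(PDO $db, int $topicId, int $userId, string $body): void
{
    $db->beginTransaction();
    try {
        $db->prepare('INSERT INTO replies (topic_id, user_id, body) VALUES (?, ?, ?)')
           ->execute([$topicId, $userId, $body]);
        $db->prepare('UPDATE topics SET reply_count = reply_count + 1 WHERE id = ?')
           ->execute([$topicId]);
        $db->commit();
    } catch (Throwable $e) {
        $db->rollBack();   // neither change is kept if either statement fails
        throw $e;
    }
}
```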
The next option I'd consider is to cache pages. For instance, just caching the "debate page" for 30 seconds should dramatically reduce the load on your database if you've got reasonable levels of traffic, and even if it all goes wrong, because you're caching the entire page, it will sort itself out the next time the page goes stale. In most situations, caching an entire page is okay - it's not the end of the world if a new post has appeared in the last 30 seconds and you don't see it on your page.
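A minimal sketch of the 30-second whole-page cache using output buffering, so existing page code that `echo`es HTML does not need to change. The cache file path and `cachedPage()` are assumptions; the page renderer is passed in as a callable.

```php
<?php
// Serve a whole rendered page from disk for $ttl seconds.
// $render is hypothetical: a callable that echoes the page (your existing template code).
function cachedPage(string $file, int $ttl, callable $render): string
{
    clearstatcache(true, $file);
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return (string) file_get_contents($file);   // fresh enough: no database work
    }
    ob_start();                                      // capture everything $render echoes
    try {
        $render();
        $html = (string) ob_get_contents();
    } finally {
        ob_end_clean();                              // always close the buffer, even on errors
    }
    $tmp = $file . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, $html);
    rename($tmp, $file);                             // replace the stale copy in one step
    return $html;
}

// Usage (hypothetical): echo cachedPage(__DIR__ . '/cache/debate.html', 30, 'renderDebatePage');
```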
If you really have to provide more "up to date" content on the page, you might introduce caching at the database access level. I have, in the past, built a database access layer which cached the results of SQL queries based on hard-wired logic about how long to cache the results. In our case, we built a function to call the database which allowed you to specify the query (e.g. get posts for user), an array of parameters (e.g. username, date-from), and the cache duration. The database access function would cache results for the cache duration based on the query and the parameters; if the cache duration had expired, it would refresh the cache.
This scheme was fairly bug-proof - as an end user, you'd rarely notice weirdness due to caching, and because we kept the cache period fairly short, it all sorted itself out very quickly.
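A rough sketch of a data access function like the one described: results are keyed by the query plus its parameters and kept for a caller-chosen duration. The `CachingDb` class is hypothetical, and its array store only lasts as long as the current PHP request, so for caching across requests you would back it with memcached, Redis or APCu.

```php
<?php
// Query results cached by (SQL + parameters) for a caller-chosen number of seconds.
final class CachingDb
{
    private PDO $pdo;
    private array $store = [];   // key => [expiresAt, rows]; lives only for this request

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function query(string $sql, array $params, int $ttlSeconds): array
    {
        $key = md5($sql . '|' . serialize($params));
        $hit = $this->store[$key] ?? null;
        if ($hit !== null && $hit[0] > time()) {
            return $hit[1];                                     // still fresh
        }
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute($params);
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
        $this->store[$key] = [time() + $ttlSeconds, $rows];     // (re)fill the entry
        return $rows;
    }
}
```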
Building up your page by caching snippets of content is possible, but very quickly becomes horribly complex. It's very easy to create a page that makes no sense to the end user due to the different caching policies - "unread posts" doesn't add up to the number of posts in the breakdown because of different caching policies between "summary" and "detail".

What are the number of ways in which my approach to a news-feed is wrong?

This question has been asked a THOUSAND times... so it's not unfair if you decide to skip reading/answering it, but I still thought people would like to see and comment on my approach...
I'm building a site which requires an activity feed, like FourSquare.
But my site has this feature for the eye-candy's sake, and doesn't need the stuff to be saved forever.
So, I write the event_type and user_id to a MySQL table. Before writing new events to the table, I delete all the older, unnecessary rows (by counting the total number of rows, getting the event_id lesser than which everything is redundant, and deleting those rows). I prune the table, and write a new row every time an event happens. There's another user_text column which is NULL if there is no user-generated text...
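The pruning step described in the question can be done with two queries: find the id of the newest row that falls outside the window, then delete everything up to it. A sketch assuming an `events` table with an auto-increment `event_id` (names from the question, schema assumed); for the archive idea, an `INSERT ... SELECT` of the same rows into an archive table could run just before the `DELETE`.

```php
<?php
// Keep only the newest $keep events; returns the number of rows deleted.
function pruneEvents(PDO $db, int $keep): int
{
    // The first row past the newest $keep rows is the newest one that must go.
    $stmt = $db->prepare('SELECT event_id FROM events ORDER BY event_id DESC LIMIT 1 OFFSET :keep');
    $stmt->bindValue(':keep', $keep, PDO::PARAM_INT);
    $stmt->execute();
    $cutoff = $stmt->fetchColumn();
    if ($cutoff === false) {
        return 0;                                  // $keep rows or fewer: nothing to prune
    }
    $del = $db->prepare('DELETE FROM events WHERE event_id <= ?');
    $del->execute([$cutoff]);
    return $del->rowCount();
}
```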
In the front end, jQuery polls a PHP file via GET every x seconds while the user has the site open. Each request sends the id of the last update it received. The <div> tags generated by my backend have their "id" attribute set to the MySQL row id, so I don't have to keep last_received_id in memory, though I guess there's absolutely no performance impact from storing one variable with a very small int value in memory...
I have a function that generates an "update text" depending on the event_type and user_id I pass it from the jQuery, and whether the user_text column is empty. The update text is passed back to jQuery, which appends the freshly received event <div> to the feed with some effects, while simultaneously getting rid of the "tail end" event <div> with an effect.
If I (more importantly, the client) want to, I can have an "event archive" table in my database (or a different one) that saves up all those redundant rows before deleting. This way, event information will be saved forever, while not impacting the performance of the live site...
I'm using CodeIgniter, so there's no question of repeated code anywhere. All the pertinent functions go into a LiveUpdates class in the library and model respectively.
I'm rather happy with the way I'm doing it because it solves the problem at hand while sticking to the KISS principle... but still, can anyone point me to some resources that show a better way to do it? A Google search on this subject turns up too many articles/SO questions, and I would like to benefit from the experience of any developer who has already trawled through them and found the best approach...

If you use proper indexes there's no reason you couldn't keep all the events in one table without affecting performance.
If you craft your polling correctly to return nothing when there is nothing new, you can minimize the load each client puts on the server. If you also look into push notification (the hybrid delayed-connection-closing method), this will further help you scale up successfully.
Finally, it is completely unnecessary to worry about variable storage in the client. This is premature optimization. The performance issues are going to be in the avalanche of connections to the web server from many users, and in the DB, tables without proper indexes.
About indexes: An index is "proper" when the most common query against a table can be performed with a seek and a minimal number of reads (like 1-5). In your case, this could be an incrementing id or a date (if it has enough precision). If you design it right, the operation to find the most recent update_id should be a single read. Then when your client submits its ajax request to see if there is updated content, first do a query to see if the value submitted (id or time) is less than the current value. If so, respond immediately with the new content via a second query. Keeping the "ping" action as lightweight as possible is your goal, even if this incurs a slightly greater cost for when there is new content.
Using a push would be far better, though, so please explore Comet.
If you don't know how many reads are going on with your queries then I encourage you to explore this aspect of the database so you can find it out and assess it properly.
Update: offering the idea of clients getting a "yes there's new content" answer and then actually requesting the content was perhaps not the best. Please see Why the Fat Pings Win for some very interesting related material.
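A sketch of a lightweight poll in the spirit of the "fat ping" update: one query on the indexed id returns any events newer than the client's last id, and an empty array when nothing is new, so there is no separate "is there anything?" round trip. The `events` columns follow the question; `eventsSince()` and the row limit are assumptions.

```php
<?php
// Return events newer than $lastId (oldest first); an empty array means "nothing new".
function eventsSince(PDO $db, int $lastId, int $max = 20): array
{
    $stmt = $db->prepare(
        'SELECT event_id, event_type, user_id, user_text
           FROM events
          WHERE event_id > :last
          ORDER BY event_id
          LIMIT :max'
    );
    $stmt->bindValue(':last', $lastId, PDO::PARAM_INT);
    $stmt->bindValue(':max', $max, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// In the endpoint the jQuery code polls (hypothetical):
// header('Content-Type: application/json');
// echo json_encode(eventsSince($db, (int) ($_GET['last_id'] ?? 0)));
```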

Static web page vs MySql generated

I have a website that lets each user create a webpage (to advertise his product). Once the page is created, it will never be modified again.
Now, my question: is it better to keep the page content (only a few parts are editable) in a MySQL database and generate the page with queries every time it is accessed, or to create a static webpage containing all the info and store it on the server?
If I store every page on disk, I may reach around 200,000 files.
If I store each page in the MySQL database, I would have to make a query each time the page is requested, and with around 200,000 entries and 5-6 queries/second I think the website will be slow...
So what's better?

MySQL will be able to handle the load if you create the tables properly (normalized and indexed). But if the content of the page doesn't change after creation, it's better if you cache the page statically. You can organize the files into buckets (folders) so that one folder doesn't have too many files in it.
Remember to cache only the content areas and not the templates, unless each user has complete control over how his/her page shows up.
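One way to bucket the files, assuming integer page ids: hash the id and use the first two pairs of hex digits as nested directories. With 256 × 256 = 65,536 buckets, 200,000 pages average about three files per directory. `pagePath()` and `publishPage()` are hypothetical helpers.

```php
<?php
// Map a page id to a two-level bucket, e.g. <root>/3f/a1/123.html
function pagePath(string $root, int $pageId): string
{
    $hash = md5((string) $pageId);
    return sprintf('%s/%s/%s/%d.html', $root, substr($hash, 0, 2), substr($hash, 2, 2), $pageId);
}

// Write a page once, creating the bucket directories on demand.
function publishPage(string $root, int $pageId, string $html): string
{
    $path = pagePath($root, $pageId);
    $dir  = dirname($path);
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create $dir");
    }
    file_put_contents($path, $html, LOCK_EX);
    return $path;
}
```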

200,000 files writable by the Apache process is not a good idea.
I recommend using a database.
Database imports/exports are easier, not to mention the difference in maintenance costs.
(Retracted: "Databases use caching, and if nothing has changed, they will return the last result without running the query again." This doesn't hold, thanks JohnP.)

If you may want to redesign your pages later, you should store them in MySQL, since you can't really change them (unless you dig into regular expressions) once they are static.
As for the time issue, it's not an issue if you set up your indexes correctly.

If the data is small to moderate, prefer static hardcoding, i.e. putting the data in the HTML; but if it is huge, computed, or dynamic and changing, you have no option but to connect to the database.

I believe that a proper caching technique with suitable settings (a long expiration time) would be better than static pages or retrieving everything from MySQL every time.

Static content is usually a good thing if you have a lot of traffic, but 5-6 queries a second is not hard for the database at all, so with your current load it doesn't matter.
You can spread the static files to different directories by file name and set up rewrite rules in your web server (mod_rewrite on Apache, basic location matching with regexp on Nginx and similar on other web servers). That way you won't even have to invoke the PHP interpreter.

A database and proper caching. 200,000 pages times, what? 5 KB? That's 1 GB. Easy to keep in RAM. Besides, 5-6 queries per second is easy on a database. Program first, then benchmark.
// insert quip about premature optimisation
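The back-of-envelope numbers, assuming the answer's 5 KB per page and the question's upper figure of 6 queries per second:

```latex
\begin{aligned}
200\,000 \times 5\ \text{KB} &= 1\,000\,000\ \text{KB} \approx 1\ \text{GB} \\
6\ \text{queries/s} \times 86\,400\ \text{s/day} &= 518\,400\ \text{queries/day}
\end{aligned}
```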
