I am creating websites via Zend Framework and I've been wondering about this for quite some time now...
Imagine that you have created the best articles module and you usually add meta_keywords/meta_description database entries per article, so that when the view renders it populates the meta fields in question with the data entered in the administration back-end.
However, imagine the case where you have less dynamic content, and so far you are using the HeadMeta() view helper to add meta keywords/meta description to the page.
I would like to have a way to configure meta keywords/meta description or some other meta elements per page in the database. I've been scratching my head on what is the best solution over this, without adding an excessive overhead (since you would be needing to perform a query for every action that takes place in your website)
My initial thought was to save the meta attributes on a index in the database so that for each action you could quickly retrieve all relevant data. I quickly realized that parameters passed via GET/POST could change the result provided making the result-set irrelevant.
So maybe we could add the parameters as well, but you might want to ignore a few at the same time (since there is no need to take into consideration a ?page=12 parameter...). Maybe adding another varchar column in the index with the serialized parameters?
Or adding the entire URL and perform a REGEXP select instead of a regular select? (I am guessing that this would probably be the slowest solution...)
Also take note that mySQL has a limit on UTF-8 Indexes of about varchar(200) (thus one huge URL could not be saved)
Has anyone thought of a good way of solving this?
It sounds to me like the data structure is right: meta keywords/description need to be per-article.
Ultimately, you need to get articles from the db - whether it is a single article or a group of "best" articles. So seems to me that the key is to keep that maximally performant with some kind of server-side caching.
Typically I use a service class (see sample) that is configured with the db connection and also caches the data it retrieves. Cache lifetime can be set short enough so that even cached-data is "fresh enough". Alternatively, you clear/populate the cache via cron so that front-end requests always get a cache hit. Or you can set the cache lifetime to be forever and only clear/populate the cache on update in the admin side; the viability of this approach depends upon the frequency of update.
After you've got the data, it's jamming the values into the HeadMeta view-helper.
Related
im building an app in laravel (not relavant to this query i dont think).
there are a few user provided settings saved in a db table "settings".
for ease of use these values are being merged into the default config for the app (a simple php array) on boot, meaning i can just use 1 interface for accessing the values be they from the default config array, or overwritten by values in the db.
This currently means a SELECT key,value FROM settings; query is being used on boot before anything else.
im just wondering if anybody knows if this is sub optimal?
I dont anticipate more than 100 or so rows in this table, with around 60% ish being needed on every request.
Considering connection/lookup/return of a query would it be optimal to:
leave it as is and simply load all on boot
add a autoload column and only load settings with this true on boot and load the rest "on demand" (when there requested)
load every one "on demand"
as there are so few rows, and the data is two simple strings per row im guesssing just loading them all is overall as fast as loading a few and then the rest on demand which involves multiple connection requests.
Am i right or should i consider optimising via option 2/3 or another method?
For simplicity (less complicated code), I would just load them all. So what if 40%-ish aren't going to be needed for a particular request.
It's a simple SQL statement. There aren't other variations of it (with different WHERE clauses) that need to be tested.
Overall, seems like it would be less code to test. I don't see the potential savings of the other alternatives. I can't justify the additional complexity.
Well this is kind of a question of how to design a website which uses less resources than normal websites. Mobile optimized as well.
Here it goes: I was about to display a specific overview of e.g. 5 posts (from e.g. a blog). Then if I'd click for example on the first post, I'd load this post in a new window. But instead of connecting to the Database again and getting this specific post with the specific id, I'd just look up that post (in PHP) in my array of 5 posts, that I've created earlier, when I fetched the website for the first time.
Would it save data to download? Because PHP works server-side as well, so that's why I'm not sure.
Ok, I'll explain again:
Method 1:
User connects to my website
5 Posts become displayed & saved to an array (with all its data)
User clicks on the first Post and expects more Information about this post.
My program looks up the post in my array and displays it.
Method 2:
User connects to my website
5 Posts become displayed
User clicks on the first Post and expects more Information about this post.
My program connects to MySQL again and fetches the post from the server.
First off, this sounds like a case of premature optimization. I would not start caching anything outside of the database until measurements prove that it's a wise thing to do. Caching takes your focus away from the core task at hand, and introduces complexity.
If you do want to keep DB results in memory, just using an array allocated in a PHP-processed HTTP request will not be sufficient. Once the page is processed, memory allocated at that scope is no longer available.
You could certainly put the results in SESSION scope. The advantage of saving some DB results in the SESSION is that you avoid DB round trips. Disadvantages include the increased complexity to program the solution, use of memory in the web server for data that may never be accessed, and increased initial load in the DB to retrieve the extra pages that may or may not every be requested by the user.
If DB performance, after measurement, really is causing you to miss your performance objectives you can use a well-proven caching system such as memcached to keep frequently accessed data in the web server's (or dedicated cache server's) memory.
Final note: You say
PHP works server-side as well
That's not accurate. PHP works server-side only.
Have you think in saving the posts in divs, and only make it visible when the user click somewhere? Here how to do that.
Put some sort of cache between your code and the database.
So your code will look like
if(isPostInCache()) {
loadPostFromCache();
} else {
loadPostFromDatabase();
}
Go for some caching system, the web is full of them. You can use memcached or a static caching you can made by yourself (i.e. save post in txt files on the server)
To me, this is a little more inefficient than making a 2nd call to the database and here is why.
The first query should only be pulling the fields you want like: title, author, date. The content of the post maybe a heavy query, so I'd exclude that (you can pull a teaser if you'd like).
Then if the user wants the details of the post, i would then query for the content with an indexed key column.
That way you're not pulling content for 5 posts that may never been seen.
If your PHP code is constantly re-connecting to the database you've configured it wrong and aren't using connection pooling properly. The execution time of a query should be a few milliseconds at most if you've got your stack properly tuned. Do not cache unless you absolutely have to.
What you're advocating here is side-stepping a serious problem. Database queries should be effortless provided your database is properly configured. Fix that issue and you won't need to go down the caching road.
Saving data from one request to the other is a broken design and if not done perfectly could lead to embarrassing data bleed situations where one user is seeing content intended for another. This is why caching is an option usually pursued after all other avenues have been exhausted.
I want to add some static information associated with string keys to all of my pages. The individual PHP pages use some of that information filtered by a query string. Which is the better approach to add this information? Generate a 100K (or larger if more info is needed later) PHP file with an associated array or add an other DB table with this info and query that?
The first solution involves loading the 100K file every time even if I use only some of the information on the current page. The second on the other hand adds an extra database call to the rendering of every page.
Which is the less costly if there are a large number of pages? Loading a PHP file or making an extra db call?
Unless it is shown to really be a bottleneck (be it including the php file or querying the database), you should choose the option that is best maintainable.
My guess is that it is the second option. Store it in a database.
Storing it in a database is a much better plan. With the database you can provide better data constraints, more easily cross reference with other data and create strong relationships. You may or may not need that at this time, but it's a much more flexible solution in the end.
What is the data used for? I'm wondering if the data you need could be stored in a session variable/cookie once it is pulled from the database which would allow you to not query the db on the rendering of every page.
If you were to leverage a PHP file then utilizing APC or some other opcode cache will mitigate performance concerns as your PHP files will only be loaded each time the file changes.
However, as others have noted, a database is the best place to store this stuff as it is much easier to maintain (this should be your priority to begin with).
Having ensured ease of maintenance and a working application, should you require a performance boost then generally accepted practice would be to cache this static data in an in-memory key/value store such as memcached. This will give you rapid access to your static values (for most requests).
I wouldn't call this information "static".
To me, it's just a routine call to get dome information from the database, among other calls being made to assemble whole page. What I am missing?
And I do agree with Dennis, all optimizations should be based on real needs and profiling. Otherwise it's effect could be opposite.
If you want to utilize some caching, consider to implement Conditional GET for the whole page.
I'm currently developing the foundation of a an application, and looking for ways to optimize performance. My setup is based on the CakePHP framework, but I believe my question is relevant to any technology stack, as it relates to data caching.
Let's take a typical post-author relation, which is represented by 2 tables in my db. When I query the database for a specific blog post, at the same time the built-in ORM functionality in CakePHP also fetches the author of the post, comments on the post, etc. All of this is returned as one big-ass nested array, which I store in cache using a unique identifier for the concerned blog post.
When updating the blog post, it is child play to destroy the cache for the post, and have it regenerated with the next request.
But what happens when not the main entity (in this case the blog post) gets updated, but rather some of the related data? For example, a comment could be deleted, or the author could update his avatar. Are there any approaches (patterns) which I could consider for tracking updates to related data, and applying updates to my cache accordingly?
I'm curious to hear whether you've also run into similar challenges, and how you have managed to potentially overcome the hurdle. Feel free to provide an abstract perspective, if you're using another stack on your end. Your views are anyhow much appreciated, many thanks!
It is rather simple, cache entries can be
added
destroyed
You should take care of destroying cache entries when related data change (so in application layer in addition to updating the data you should destroy certain types of cached entries when you update certain tables; you keep track of dependencies by hard-coding it).
If you'd like to be smart about it you could have your cache object state their dependencies and cache the last update times for your DB tables as well.
Then you could
fetch cached data, examine dependencies,
get update times for relevant DB tables and
in case the record is stale (update time of a table that your big ass cache entry depends on is later then the time of the cache entry) drop it and get fresh data from the database.
You could even integrate the above into your persistence layer.
EDIT:
Of course the above is for when you want to have consistent cache. Sometimes, and for some data, you can relax the consistency requirements and there are scenarios where simple TTL will be good enough (for a trivial example, if you have ttl of 1 sec, you should mostly be out of trouble with users and can help data processing; and with higher times you might still be ok - for example let's say you are caching the list of country ISO codes; your application might be perfectly ok if you say let's cache this for 86400 sec).
Furthermore, you could also track the times of information presented to user, for example
let's say user has seen data A from cache and that we know that this data was created/modified at time t1
user makes changes to the data A (and makes it data B) and commits the change
the application layer can then examine if the data A is still as in DB (if the cached data upon which the user made decisions and/or changes was indeed fresh)
if it was not fresh then there is a conflict and user should confirm the changes
This has a cost of extra read of data A from DB, but it occurs only on writes.
Also, the conflict can occur not only because of the cache, but also because of multiple users trying to change the data (i.e. it is related to locking strategies).
One Approach for memcached is to use tags ( http://code.google.com/p/memcached-tag/ ). For Example, you have your Post "big-ass nested array" lets say, it inclused the autors information, the post itself and is shown on the frontpage and in some box in the sidebar. So it gets the tags: frontpage, {auhothor-id}, sidebar, {post-id} - now if someone changes the Author Information you flush every cache entry with the tag {author-id}. But thats only one Solution, and only for Cache Backends that support Tags, for example not APC (afaik). Hope That gave you an example.
To store multi language content, there is lots of content, should they be stored in the database or file? And what is the basic way to approach this, we have page content, reference tables, page title bars, metadata, etc. So will every table have additional columns for each language? So if there are 50 languages (number will keep growing as this is a woldwide social site, so eventual goal is to have as many languages as possible) then 50 extra columns per table? Or is there a better way?
There is a mixture of dynamic system and user content + static content.
Scalability and performance are important. Being developed in PHP and MySQL.
User will be able to change language on any page from the footer. Language can be either session based or preference based. Not sure what is a better route?
If you have a variable, essentially unknown today number of languages, than this definately should NOT be multiple columns in a record. Basically the search key on this table should be something like message id plus language id, or maybe screen id plus message id plus language id. Then you have a separate record for each language for each message.
If you try to cram all the languages into one record, your maintenance will become a nightmare. Every time you add another language to the app, you will have to go through every program to add "else if language=='Tagalog' then text=column62" or whatever. Make it part of the search key and then you're just reading "where messageId='Foobar' and language=current_language", and you pass the current language around. If you have a new language, nothing should have to change except adding the new language to the list of valid language codes some place.
So really the question is:
blah blah blah. Should I keep my data in flat files or a database?
Short answer is whichever you find easier to work with. Depending on how you structure it, the file based approach can be faster than the database approach. OTOH, get it wrong and performance impact will be huge. The database approach enforces more consistent structure from the start. So if you make it up as you go along, then the database approach will probably pay off in the long run.
eventual goal is to have as many languages as possible) then 50 extra columns per table?
No.
If you need to change your database schema (or the file structure) every time you add a new language (or new content) then your schema is wrong. If you don't understand how to model data properly then I'd strongly recommend the database approach for the reasons given.
You should also learn how to normalize your data - even if you ultimately choose to use a non-relational database for keeping the data in.
You may find this useful:
PHP UTF-8 cheatsheet
The article describes how to design the database for multi-lingual website and the php functions to be used.
Definitely start with a well defined model so your design doesn't care whether data comes from a file, db or even memCache or something like that. Probably best to do a single call per page to get an object that contains all the fields for that single page, rather than multiple calls. The you can just reference that single returned object to get each localised field. Behind the scenes you could then code the respository access and test. Personally I'd probably go the DB approach over a file - you don't have to worry about concurrent file access and it's probably easier to deploy changes - again you don't have to worry about files being locked by reads when you're deploying new files - just a db update.
See this link about php ioc, that might help you as that would allow you to abstract from your code what type of respository is used to hold the data. That way if you go one approach and later you want to change it - you won't have to do so much rework.
There's no reason you need to stick with one data source for all "content". There is dynamic content that will be regularly added to or updated, and then there is relatively static content that only rarely gets modified. Then there is peripheral content, like system messages and menu text, vs. primary content—what users are actually here to see. You will rarely need to search or index your peripheral content, whereas you probably do want to be able to run queries on your primary content.
Dynamic content and primary content should be placed in the database in most cases. Static peripheral content can be placed in the database or not. There's no point in putting it in the database if the site is being maintained by a professional web developer who will likely find it more convenient to just edit a .pot or .po file directly using command-line tools.
Search SO for the tags i18n and l10n for more info on implementing internationalization/localization. As for how to design a database schema, that is a subject deserving of its own question. I would search for questions on normalization as suggested by symcbean as well as look up some tutorials on database design.