Is WordPress suitable for large database querying? - php

Wonder if anyone has any advice on a project I might be picking up.
It's a bit vague at the moment, but I think it's basically a scientific app: you supply a product, which is then checked against a database of proteins to see if it has a reaction.
From the information I did get (this could really be any scenario, and it sounds quite simple dev-wise), the user fills out a form on the site, which is checked in some way against a database of other similar attributes to see if it's suitable; it then also gets added to that database at the end, so the database potentially grows with every check a user does.
The part I'm wondering about is that this could potentially be huge (e.g. 10,000+ records with no upper limit). Would anything be gained from building a custom PHP app to handle all this? In other words, would WordPress not be suitable as a backend, and should I be catering for time-intensive queries?
Thanks for looking

No.
See ticket http://core.trac.wordpress.org/ticket/9864 (Performance issues with large number of pages). I filed this ticket 3+ years ago, and there is still no resolution. Since that time, the code in question has become even more complicated and started using a heavier version of the internal query library.
Those severe issues are with pages, but posts are equally taxing in queries, and if you have meta-data, it taxes the server even further. To top this off, most leading caching plugins for WP exclude web robots, so every few weeks when Google, Baidu and Yandex start hammering your machine for the 10k pages, it'll bring your machine down.
This just means that you can't use it natively for large content sets, but you can still use most of WordPress with additional code customizations for parts that'll be outside of the WP database/construct.
Edit: to clarify - the inefficiency I was referring to is relying solely on WP's native database structure, as defined by this schema in the Codex, and on query_posts() / WP_Query() to perform the queries. The native WP storage/query system doesn't handle large volumes of pages/posts very efficiently. However, bypassing some of the native functionality will likely work fine.
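To make that concrete, one common pattern for the "outside the WP construct" part is a dedicated custom table queried directly through $wpdb rather than through WP_Query. A minimal sketch, assuming a hypothetical protein_checks table and column names:

<?php
// Sketch: keep the large data set in a custom table and query it with $wpdb directly,
// so the posts/meta tables never have to carry the 10k+ records.
global $wpdb;

$table = $wpdb->prefix . 'protein_checks'; // assumed custom table, created on plugin activation

// Look up existing records matching the submitted attributes.
$matches = $wpdb->get_results(
    $wpdb->prepare(
        "SELECT id, protein_name, reaction FROM {$table} WHERE attribute_hash = %s LIMIT 50",
        $attribute_hash
    )
);

// Add the new submission, so the data set grows with each check.
$wpdb->insert(
    $table,
    array(
        'attribute_hash' => $attribute_hash,
        'protein_name'   => $protein_name,
        'reaction'       => $has_reaction ? 1 : 0,
    ),
    array( '%s', '%s', '%d' )
);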

Related

Cache vs storing "similar" results in database

I am in the process of developing a video sharing site. On the video page I display "similar videos" using just a database query (based on tags/category), and I haven't run into any problems with this. However, I was debating using my custom search function to match similar videos even more closely (so it's not only based on similar categories, but also tags, similar words, etc.). My fear is that running this on every video view would be too much in terms of resources, and just not worth it since it's not a major part of the site.
So I was debating doing that, but storing the results (maybe store 50 and pull 6 from that 50 by ID). I could update them maybe once a week or whenever (again, since it's not a major part of the site, I don't need live searching). But my question is: is there any downside or upside to this?
I'm looking specifically at caching the similar video results, or simply saying "never mind" and keeping it based on tags. Does anyone have any experience/knowledge of how sites deal with offering similar items for something like this?
(I'm using PHP and MySQL, built on the Laravel framework; the search is a custom class built on top of Laravel Scout.)
Every decision you make as a developer is a tradeoff. If you cache results, you get speed on display, but more complexity during cache management (and probably bugs). You should decide whether it is worth it, as we do not know your page load time requirements (or other KPIs), user load, hardware, etc.
But in general I would cache this data.
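For what it's worth, a minimal sketch of the "store 50, pull 6" idea using Laravel's cache - Video::findSimilar() stands in for the custom Scout-backed search class and is an assumption here:

<?php
// Sketch: cache the 50 closest matches per video for a week, then show 6 of them.
// Video::findSimilar() is a placeholder for the custom Scout-based search.
use Illuminate\Support\Facades\Cache;
use App\Models\Video;

$similarIds = Cache::remember("similar_videos_{$video->id}", now()->addWeek(), function () use ($video) {
    return $video->findSimilar(50)->pluck('id')->all(); // store only the IDs
});

// Pull 6 at random from the cached 50 for this page view.
$similarVideos = Video::whereIn('id', collect($similarIds)->shuffle()->take(6)->all())->get();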

PHP application / game database - SQL vs Text files - Speed / User connections

I've just finished a basic PHP file that lets indie game/application developers store user data, handle user logins, self-deleting variables, etc. It all revolves around storage.
I've made systems like this before, but always hit the max_user_connections issue - which I personally can't currently change, as I use a friend's hosting - and free hosting providers often limit max_user_connections anyway. This time, I've made the system fully text-file based (each file holding JSON structures).
The system works fine currently, as it's being tested by only me and another 4-5 users per second. The PHP script basically opens a text file (based upon query arguments), uses json_decode to convert the contents into the relevant PHP structures, then alters the data and writes it back to the file. Again, this works fine at the moment, as there are few users on the system - but I believe that if two users attempted to alter a single file at the same time, the user who writes to it last would overwrite the data the previous user wrote.
SQL databases have always seemed to handle queries quite slowly for me - even basic queries. Should I try to implement some form of server-side caching system, or possibly a file write stacking system? Or should I just attempt to bump up max_user_connections and make it fully SQL based?
Are there limits to the number of users that can READ text files per second?
I know game / application / web developers must create optimized PHP storage solutions all the time, but what are the best practices in dealing with traffic?
It seems most hosting companies set the max_user_connections to a fairly low number to begin with - is there any way to alter this within the PHP file?
Here's the current PHP file, if you wish to view it:
https://www.dropbox.com/s/rr5ua4175w3rhw0/storage.php
And here's a forum topic showing the queries:
http://gmc.yoyogames.com/index.php?showtopic=623357
I did plan to release the PHP file, so developers could host it on their own site, but I would like to make it work as well as possible, before doing this.
Many thanks for any help provided.
Dan.
I strongly suggest you not re-invent the wheel. There are many options available for persistent storage. If you don't want to use SQL consider trying out any of the popular "NoSQL" options like MongoDB, Redis, CouchDB, etc. Many smart people have spent many hours solving the problems you are mentioning already, and they are hard at work improving and supporting their software.
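If Redis appeals, a minimal sketch using the phpredis extension - the key layout is just an assumption for illustration:

<?php
// Sketch: keep each player's save data as a JSON blob in Redis instead of a flat file,
// so concurrent reads/writes are handled by the store rather than by hand.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$key = 'player:' . $playerId;              // hypothetical key scheme
$redis->set($key, json_encode($saveData));

$loaded = json_decode($redis->get($key), true);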
Scaling a MySQL database service is outside the scope of this answer, but if you want to throttle up what your database service can handle you need to move out of a shared hosting environment in any case.
"but I believe if two users attempted to alter a single file at the same time, the person who writes to it last will overwrite the data that the previous user wrote to it."
- That is certainly the case; you may even get an error if the second user tries to save while the first still has the file open.
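If the flat-file approach stays, wrapping the read-modify-write cycle in an exclusive flock() is one way to avoid that last-write-wins problem. A minimal sketch, with the 'score' field purely illustrative:

<?php
// Sketch: hold an exclusive lock for the whole read-modify-write cycle so two
// concurrent requests cannot silently overwrite each other's changes.
$fp = fopen($path, 'c+');                 // read/write, create if missing
if ($fp && flock($fp, LOCK_EX)) {         // block until the exclusive lock is held
    $raw  = stream_get_contents($fp);
    $data = $raw !== '' ? json_decode($raw, true) : [];

    $data['score'] = $newScore;           // illustrative update

    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, json_encode($data));
    fflush($fp);
    flock($fp, LOCK_UN);
}
if ($fp) {
    fclose($fp);
}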
"Are there limits to the number of users that can READ text files per second?"
- No, but it is pointless to open the same file just to read it multiple times; that file's contents should be cached (in memory or through a content delivery network) rather than re-read on every request.
"I know game / application / web developers must create optimized PHP storage solutions all the time, but what are the best practices in dealing with traffic?"
- Usually a database will do a better job than files, starting with the fact that the most frequently selected data is kept in RAM, while the most frequently read .txt files are not. As @oliakaoil said, read about the differences between databases and see what you need.

Optimising a big WordPress site

I'm looking at optimising a rather large site I've been adding to and adding to. The database has become pretty big (maybe 100,000 posts) and it has started slowing down somewhat and giving me "MySQL server has gone away" errors. I've been reading about database optimisation and have read some people saying you should only be looking to use 1-15 queries on a page.
Do people agree with the suggestion that only a handful of queries should be used on any page?
Am I correct in thinking that every time I use a WordPress function such as get_permalink() I am creating a new query and a new connection to the database?
I have some loops in there that literally loop through 100+ users at a time and use functions such as get_user_meta() in these loops - so would this mean I am literally making 100+ database queries, or are they somehow cached in WordPress?
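For reference, each get_user_meta() call in such a loop can hit the database separately unless the user meta cache has been primed first. A minimal sketch of priming it in one go - $user_ids and the 'points' meta key are assumptions:

<?php
// Sketch: load all the meta rows for these users in one query before the loop,
// so the get_user_meta() calls below are served from the primed cache.
update_meta_cache( 'user', $user_ids );

foreach ( $user_ids as $user_id ) {
    $points = get_user_meta( $user_id, 'points', true ); // read from cache, no extra query
    // ... existing per-user work ...
}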
With issues like this, the thing to do is take the caching out of the hands of WordPress and make the server do the work.
Software like Wordpress and Drupal do have their own caching systems, and you should enable them, but even with them in use, there's still a certain amount of overhead for the software to load and serve the page.
So I suggest you investigate a server caching engine such as Varnish.
This will dramatically reduce the server load for most sites like yours; if you have a lot of requests for the same page over and over, Varnish will take over the caching and WordPress will never even have to know that the page is being requested. No more loading PHP and the WordPress core for every request, no more database session with every request.
If your back-end CMS software is starting to go slowly, this is the single most effective way of speeding it up.

Average query count for expression engine site

I just took over development of an existing EE website and am new to the CMS and to blog development as well. The first thing I noticed was that the site performed really poorly, so I started doing some profiling using XDebug. I noticed that the query count is around 550. Is this normal? I know that it all comes down to what kind of queries are being run, etc., but I'm used to much lower numbers with other frameworks - but like I said, I'm new to blog development.
TLDR: What is the average ballpark query count for an EE homepage?
Thanks!
On my test install of EE2, an empty template pulls 13 queries (these have to do with sessions, tracking, grabbing the template, etc). Beyond that, there's no "average", as the amount of content can vary so widely from site-to-site.
550 queries is certainly outlandish. My guess would be that there are multiple embeds, several Channel Entries loops, and perhaps some Playa fields within those (Playa is a bit of a query monster).
I'd suggest turning on the Output Profiler to see where the load is coming from (Admin → System Administration → Output and Debugging).
Then, make sure you're making use of tag caching on your Channel Entries and other tags, and consider looking at a third-party caching solution such as CE Cache.
You can also disable some of the default tracking to save on queries (Admin → Security and Privacy → Tracking Preferences).
I've built a ton of EE sites and 500 is crazy, crazy high. With a complex build-out of Structure/Matrix/Playa, even pretty complex pages only run 200-300 queries. And when I say "only" I mean that's still way too high.
I do think it's important to find a balance between making something delightful for your client to use and still not too processor intensive. If you are using a single template for this page (i.e. the template won't be used for a bunch of other entries) you can turn on caching and it will help substantially.
The biggest question is - what are you doing on this page? What kinds of tags/addons, etc... that would help us track it down.

Millions of Listings Mapped, 100GB of Data Smoothly Displayed, Advice

I've been given a big project by a big client and I've been working on it for 2 months now. I'm getting closer and closer to a solution but it's just so insanely complex that I can't quite get there, and so I need ideas.
The project is quite simple: there is a 1mil+ database of lat/lng coordinates with lots of additional data for each record. A user visits a page and enters some search terms, which filter out quite a lot of the records. All of the records that match the filter are displayed (often clustered) on a Google Map.
The problem with this is that the client demands it be fast, lean, and low-bandwidth. Hence, I'm stuck. What I'm currently doing is: present the first clusters, and when the user hovers over a cluster, begin loading in the data for that cluster's children.
However, I've upped it to 30,000 of the millions of listings and it's starting to drag a little. I've made as many optimizations as I possibly can. When the filter is changed, I AJAX a query to the DB and return all the IDs of the matches, then update the map to reflect this.
So, further optimization of the current approach is not an option; I need an entirely new conceptual model for this. Any input at all would be highly appreciated, as this is an incredibly complex project and I can't find anything even remotely close to it. I even looked at MMORPGs, which have a lot of similar problems (and I have made a few), but the concept of having a million players in one room is still something MMORPG makers cringe at. People commonly assume there must be bottlenecks to remove, but it's not a case of optimizing this way. I need a new model in which a huge database stays on the server but is displayed fluidly to the user.
I'll be awarding 500 rep as soon as it becomes available for anything that solves this.
Thanks- Daniel.
I think there are a number of possible answers to your question depending on where it is slowing down, so here goes a few thoughts.
A wider table can affect the speed with which a query is returned. Longer records mean that more disk is accessed to get the right data, so you might want to think about limiting your initial table to hold only the information that can be filtered on. Having said that, it will also depend on the db engine you are using; some suffer more than others.
Ensuring that your tables are correctly indexed makes a HUGE difference in performance. You need to make sure that the query is using the indexes to quickly get to the records that it needs.
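As one example of what indexing buys here, a bounding-box filter over lat/lng can use a composite index instead of scanning the full table. A minimal sketch, with table and column names assumed for illustration:

<?php
// Sketch: assumes an index such as  CREATE INDEX idx_lat_lng ON listings (lat, lng);
// The query then narrows the scan to rows inside the visible map bounds.
$pdo = new PDO('mysql:host=localhost;dbname=map_demo;charset=utf8mb4', $dbUser, $dbPass);

$stmt = $pdo->prepare(
    'SELECT id, lat, lng
       FROM listings
      WHERE lat BETWEEN :south AND :north
        AND lng BETWEEN :west  AND :east
      LIMIT 1000'
);
$stmt->execute([
    ':south' => $bounds['south'],
    ':north' => $bounds['north'],
    ':west'  => $bounds['west'],
    ':east'  => $bounds['east'],
]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);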
A friend was working with Google Maps and said that the API really suffered if too much was displayed on the maps. This might just be totally out of your control.
Having worked for Epic Games in the past, the reason that "millions of players in a room" is something to cringe at is more often hardware driven. In a game, having that number of players would grind the graphics card to a halt as it tries to render all the polygons of the models. Secondly (and likely more importantly) the problem would be that you have to send each client information about what each item/player is doing. This means that your bandwidth use will spike very heavily. Your server might handle the load, but the players internet connection might not.
I do think that you need to edit your question though with some extra information on WHAT is slowing down. Your database? Your query? Google API? The transfer of data between server and client machine?
Let's be honest here; a db with 1 million records being accessed by presumably a large number of users is not going to run very well unless you put some extremely powerful hardware behind it.
In this type of case, I would suggest using several different database servers and setting up some decent load balancing in order to keep them running as smoothly as possible. First and foremost, you will need to find out the "average" load you can place on a db server before it starts to lag; let's say, for example, this is 50,000 records. Setting a low MaxClients per server may help with server performance and prevent crashes, but it might aggravate your users when they can't execute any queries due to high load - still, it's something to keep in mind if your budget doesn't allow for much wiggle room hardware-wise.
On the topic of hardware, however, that's something you really need to take a look at. Databases typically don't use a huge amount of CPU/RAM, but they can be quite taxing on your HDD. I would recommend going for SAS or SSD drives before looking at other components in your setup; these will make a world of difference for you.
As far as load balancing goes, a very common technique used by most content providers is that when one query or particular content item (such as a popular video on YouTube) is pulling in an above-average amount of traffic, you cache its result. A quick and dirty approach to this is to use an if statement in your search handler which grabs a static HTML page instead of actually running the query.
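A minimal sketch of that quick-and-dirty idea - the hot-term list and cache path are assumptions:

<?php
// Sketch: serve a pre-rendered HTML file for a few known-hot search terms
// instead of running the query each time.
$hotSearches = ['popular-term-1', 'popular-term-2'];   // hypothetical hot terms
$term = $_GET['q'] ?? '';

if (in_array($term, $hotSearches, true)) {
    readfile(__DIR__ . '/cache/' . $term . '.html');   // pre-generated page
    exit;
}

// ...otherwise fall through to the normal database-backed search...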
Another approach is to have a separate standalone db server used only for running queries which are taking an excessive amount of traffic.
With that, never underestimate your code optimisation. While the differences may seem subtle to you, when run across millions of queries by thousands of users, those tiny differences really do add up.
Best of luck with it - let me know if you need any further assistance.
Eoghan
Google has a service named "BigQuery". It is a SQL query service in the cloud. It uses Google's fast servers and can search millions of data rows quickly. Unfortunately it is not free, but maybe it will help you out:
https://developers.google.com/bigquery/
