I am in the process of writing an API for my website, along with a pretty large class to process all of the API requests.
Most pages, if not all of the pages on the website, will send at least one request to the API on load. The most important priority for this website is efficiency and, consequently, very quick server processing.
I am therefore seeking a little bit of advice when it comes to classes and certain PHP functions.
Firstly, it looks like the class I am writing will probably end up being around 3000 lines of code. If this is initialised on each page, ignoring the fact that only one or two of the functions within the class will be used per page, will this make the API much slower? Should I be looking at separate files with extensions to the class for each class method?
Secondly, I currently have all of my connections to the various databases in separate files within a directory. Within each connection is the function mysql_pconnect(). At the moment, I only require these files when needed. So if the method needs a connection to the x database, then I simply place require(connection...) into the method. Is it bad to include files within a class?
I am asking because the only other solution is to require all of the connections at the top of the page so that the class can access them without requiring them in each method. This is why I would like to know whether having several connections open at once, even if they are not used, is taxing on the server.
So three questions really:
Does instantiating a large class at the beginning of each page slow down the script runtime, even if only one class method is being used? Is this why people use 'class extends class'?
Is it bad to 'require()' files within a class?
Do unused connections to a mysql database slow down the runtime of a script?
No, an unused MySQL connection won't consume much (if any) CPU time, though it will occupy a bit of memory to hold the various bits of 'state' that have to be maintained on a per-connection basis.
However, note that MySQL's connection protocol is actually fairly light-weight. Maintaining a pool of persistent connections sounds attractive, but the cost of establishing a new connection is already very low anyway.
Persistent connections are a quick fix for connection overhead, but they bring their own issues. The worst is that abandoned connections can be left in an indeterminate state (in-progress transactions, changed server variables/configuration, etc.), and you can quite easily create inadvertent deadlocks unless you're very careful.
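For illustration, here is a minimal sketch of a persistent connection using mysqli's "p:" host prefix (host, credentials and database name are placeholders), together with the sort of cleanup that matters precisely because the connection may be handed to a later request:

    <?php
    // Minimal sketch: mysqli reuses an existing connection when the host
    // is prefixed with "p:". All credentials here are placeholders.
    $db = new mysqli('p:localhost', 'user', 'password', 'mydb');
    if ($db->connect_error) {
        die('Connect failed: ' . $db->connect_error);
    }

    // Because a later request may inherit this connection, don't leave
    // state behind: finish transactions and reset session variables.
    $db->rollback();
    $db->query('SET SESSION sql_mode = DEFAULT');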
If you are writing an API to serve data from your DB to multiple third parties, you are better off not letting them query the DB at all, not even through a web service (unless you have second-to-second DB mutations that these third parties need immediately).
A better way would be to write XML file(s) to a protected location and use a web service to authenticate a third party and serve the XML file(s).
Then update the XML periodically.
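A rough sketch of that idea, assuming a hypothetical products table and an invented export path (a cron job would run this periodically):

    <?php
    // Hypothetical export script, run from cron: dump the data third
    // parties need into an XML file outside the web root. The web
    // service then only authenticates the caller and streams this file.
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'password');

    $xml = new SimpleXMLElement('<products/>');
    foreach ($pdo->query('SELECT id, name, price FROM products') as $row) {
        $item = $xml->addChild('product');
        $item->addChild('id', $row['id']);
        $item->addChild('name', htmlspecialchars($row['name']));
        $item->addChild('price', $row['price']);
    }

    $xml->asXML('/var/data/exports/products.xml');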
On my website, when a user opens his profile or any other page (almost all pages use data from MySQL), the site makes around 50 connections to MySQL while loading the page. This is the case on my development server, running elementary OS.
When I came to know about persistent connections in MySQL, I was puzzled. If I am to run this website on a VPS (with low RAM to start with), and considering the overhead produced by a large number of MySQL connections, will using persistent connections improve my website's performance?
Currently, I start and end a connection in every function. Is there a better way to connect to mysql?
And if 100 users are using my website simultaneously, what will the performance be like when each page makes around 50-60 connections?
You asked so I will answer. You are doing this incorrectly. You should start the processing on each page request (each externally-accessible .php file) by using a common function to establish a single database connection, then you should reuse that connection.
You are getting away with it so far because you're probably relying on an automatic connection pool built into your PHP database-access library, and because you have not yet scaled up your application.
You won't be able to scale very far with this multi-connection strategy, because it will perform very badly indeed as you add users.
There are lots of examples of working open-source php-based web app systems you can look at. WordPress is an example. You'll find that most of them start by opening a database connection and storing its handle in a global variable.
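A minimal sketch of that pattern (credentials are placeholders); a static variable inside a common function does the same job as storing the handle in a global:

    <?php
    // One connection per request: the first call opens it, every later
    // call gets the same handle back. Credentials are placeholders.
    function db() {
        static $conn = null;
        if ($conn === null) {
            $conn = new mysqli('localhost', 'user', 'password', 'mydb');
            if ($conn->connect_error) {
                die('Connect failed: ' . $conn->connect_error);
            }
        }
        return $conn;
    }

    // Anywhere in the request:
    $result = db()->query('SELECT name FROM users WHERE id = 1');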
You asked:
According to CBroe's comment, I changed my strategy. Actually, I am using multiple database connections, but the functions are the same (don't ask why lol). So if I open connections on start, and then pass the handler to the function, will that be an improvement?
Yes, that will be fine. You need to avoid churning connections to get best performance.
If you need connections to more than one different database, you can open them all. But it sounds like you need to hit just one database.
PHP doesn't have significant overhead when passing a handler to a function, so don't worry about that.
As explained wonderfully by Ollie Jones, I opened the connection on start, and my connections dropped from 50-60 per page to 1 per page. Although I don't see any change in performance on my local development server, this will surely be a great improvement once it's on a live server. There is no need for me to use persistent connections yet.
I have been asked to re-develop an old php web app which currently uses mysql_query functions to access a replicated database (4 slaves, 1 master).
Part of this redevelopment will move some of the database into a MySQL cluster. I usually use PDO to access databases these days, and I am trying to find out whether or not PDO will play nicely with a cluster, but I can't find much useful information on the web.
Does anyone have any experience with this? I have never worked with a cluster before ...
I've done this a couple different ways with different levels of success. The short answer is that your PDO connections should work fine. The options, as I see them, are as follows:
If you are using replication, then either write a class that handles connections to various servers or use a proxy. The proxy may be a hardware or a software. MySQL Proxy (http://docs.oracle.com/cd/E17952_01/refman-5.5-en/mysql-proxy.html) is the software load balancer I used to use and for the most part it did the trick. It automatically routes traffic between your readers and writers, and handles failover like a champ. Every now and then we'd write a query that would throw it off and have to tweak things, but that was years ago. It may be in better shape now.
Another option is to use a standard load balancer and create two connections - one for the writer and the other for the readers. Your app can decide which connection to use based on the function it's trying to perform.
Finally, you could consider using the max db cluster available from MySQL. In this setup, the MySQL servers are all readers AND writers. You only need one connection, but you'll need a load balancer to route all of the traffic. Max db cluster can be tricky if the indexes become out of sync, so tread lightly if you go with this option.
Clarification: When I refer to connections what I mean is an address and port to connect to MySQL on - not to be confused with concurrent connections running on the same port.
Good luck!
Have you considered hiding the cluster behind a hardware or software load balancer (e.g. HAProxy)? This way, the client code doesn't need to deal with the cluster at all, it sees the cluster as just one virtual server.
You still need to distinguish applications that write from those that read. In our system, we put the slave servers behind the load balancer, and read-only applications use this cluster, while writing applications access the master server directly. We don't try to make this happen automatically; applications that need to update the database simply use a different server hostname and username.
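In code, that arrangement can be as simple as two PDO handles, one pointing at the load balancer's address for reads and one at the master for writes (the hostnames and credentials below are examples):

    <?php
    // Reads go through the load balancer in front of the slaves;
    // writes go straight to the master. All names are placeholders.
    $read  = new PDO('mysql:host=db-read.example.com;dbname=mydb', 'reader', 'secret');
    $write = new PDO('mysql:host=db-master.example.com;dbname=mydb', 'writer', 'secret');

    $stmt = $read->query('SELECT id, title FROM articles LIMIT 10');

    $upd = $write->prepare('UPDATE articles SET views = views + 1 WHERE id = ?');
    $upd->execute(array(1));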
Write a wrapper class for the DB that has your connect and query functions in it...
The query function needs to look at the very first word to detect whether it's a SELECT and use the slave DB connection; anything else (INSERT, UPDATE, RENAME, CREATE, etc.) needs to go to the MASTER server.
The connect() function would look at the array of slaves and pick a random one to use.
You should only connect to the MASTER server when you need to do an update. (Most web pages shouldn't be updating the DB, only reading data... make sure you don't waste time connecting to the MASTER DB when you won't use it.)
You can also use a static variable in your class to hold your DB connections; that way connections are shared between instances of your DB class (i.e. you only have to open the DB connection once instead of every time you call '$db = new DB()').
Abstracting the database functions into a class like this also makes it easier to debug or add features.
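A rough sketch of such a wrapper, assuming made-up hostnames and credentials (a real version would also need to handle leading comments, SHOW/EXPLAIN statements and so on):

    <?php
    class DB {
        private static $master = null;
        private static $slave  = null;
        private static $slaves = array('slave1.example.com', 'slave2.example.com');

        private static function connect($host) {
            return new mysqli($host, 'user', 'password', 'mydb');
        }

        public function query($sql) {
            // Route on the first word: SELECT goes to a random slave,
            // everything else goes to the master.
            $first = strtoupper(strtok(trim($sql), " \t\r\n"));
            if ($first === 'SELECT') {
                if (self::$slave === null) {
                    self::$slave = self::connect(self::$slaves[array_rand(self::$slaves)]);
                }
                return self::$slave->query($sql);
            }
            // Connect to the master lazily, only when a write happens.
            if (self::$master === null) {
                self::$master = self::connect('master.example.com');
            }
            return self::$master->query($sql);
        }
    }

    $db   = new DB();
    $rows = $db->query('SELECT * FROM users'); // served by a slave

Because the connection properties are static, every instance of DB shares the same two handles for the life of the request.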
I'm assuming that for every page request, the web server (e.g. Apache) creates a new instance of the script in memory. Can these instances communicate with each other while running, and pass data too?
If you want to pass data between scripts in PHP I suggest using either memcached or a database. Or possibly APC.
If the scripts belong to the same session, they could theoretically communicate via the session but this would be effectively a one-way communication in most cases because only one script can access the session at any one time (session_start() locks the session until that script ends the session implicitly or explicitly).
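As a concrete example of the memcached suggestion, assuming the pecl/memcached extension and a memcached daemon on localhost (the key name is arbitrary):

    <?php
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    // Script A stores a value with a 5-minute expiry...
    $mc->set('shared_counter', 42, 300);

    // ...and script B, running as a separate request, reads it back.
    $value = $mc->get('shared_counter');
    if ($value === false && $mc->getResultCode() === Memcached::RES_NOTFOUND) {
        $value = 0; // nothing stored yet
    }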
I believe Martin and Cletus' suggestions are valid. My choice would be a function of the script's end goal.
How much data will you be throwing around? Can you handle the overhead of an external process?
What kind of data are you exchanging? Is it normalized? Or is it not worth normalizing?
Will you need to refer to that data later on? Or can it be discarded after being processed?
Will those scripts ever run on different servers?
Depending on your answers, the options include:
Flat files, with a locking mechanism
Relational DB
Document DB (key/value store, whether persistent or not)
Shared memory (APC, or core functions)
Message queues (ActiveMQ and company)
I think you'll get the most value by externalizing the process as you can have more than one machine managing the messages/data and more than one producing/consuming them.
The model that PHP scripts operate under doesn't really include any notion of in-memory persistence, since they're generally designed to run only for the minimum time required to serve the requested page. That makes it hard to find any meaningful use for stateful communication between scripts: typically, once the page is served, there's nothing more for the script to do. As a result, any communication between PHP scripts is usually done through manipulation of database entries and the like.
If you have some sort of continual processing that should be happening for which you'd want to be passing data around, you might want to look into other web application models such as servlets.
You should be able to do this with some shared memory, as described here: http://blog.taragana.com/index.php/archive/how-to-use-shared-memory-in-php/ (assuming you're not running on Windows)
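A bare-bones version of that shared-memory approach, using the shmop extension (the key derivation and segment size are arbitrary choices), looks roughly like this:

    <?php
    // Derive a System V IPC key and create/open a 1 KB segment.
    $key = ftok(__FILE__, 'a');
    $shm = shmop_open($key, 'c', 0644, 1024);

    // One request writes (padded to the fixed segment length)...
    shmop_write($shm, str_pad('hello from another request', 1024), 0);

    // ...and another request opening the same key reads it back.
    $data = rtrim(shmop_read($shm, 0, 1024));

    shmop_close($shm);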
I'm trying to figure out the best way to minimize resource utilization when I have PHP talking to various backend services (e.g. Amazon S3 or any other random web services -- I'd like a general solution). Ideally, I'd like to have a single persistent connection to the backend (or maybe a small pool of persistent connections) with some caching, and then have all of the PHP tasks share it. We can consider it all read-only for the purposes of this question. It's not obvious to me how to do this in PHP. There's the database-specific stuff like mysql_pconnect(), but that doesn't really do it for me.
One idea I've had, which seems somewhat suboptimal (but is still better than having every single request create and destroy a new connection), is to use a local caching proxy (in a separate process) that would effectively do the pooling and caching. PHP would still be opening and closing a connection for every request, but at least it would be to a local process, so it should be a little faster (and it would reduce load on the backends). But it doesn't seem like this kind of craziness should be necessary. There's gotta be a better way. This is easy in other languages. Please tell me what I'm missing!
There's a large ideological disconnect between the various web technologies. Some are essentially daemons that run full-time in the background, and handle requests passed in on their own. Because there's a process always running, you can have a pool of already open existing working connections.
PHP (and normal CGI scripts) does not have a daemon behind the scenes. Every time a request comes in, the PHP interpreter is started up with a clean slate, compiles the scripts, and runs the bytecode. There's no persistence. The PHP database functions that support persistent connections establish the connection at the web server child level (i.e. mod_php attached to an Apache process). This isn't exactly a connection pool, as you can only ever see the persistent connection attached to your own process.
Without having a daemon or similar process sitting behind the scenes to hand out resources, you won't get real connection pooling.
Keep in mind that most new connections to most services are not heavy-weight, and non-database connections that are heavy-weight might not be friendly to the concept of a connection pool.
Before you think about writing your own PHP-based daemon to handle stuff like this, keep in mind that it may already be a solved problem. Python came up with something called WSGI, with a similar implementation in Ruby called Rack. Perl also has something remarkably similar but I can't remember the name of it off the top of my head. A quick look at Google didn't show any PHP implementations of WSGI, but that doesn't mean they don't exist...
Because S3 and other webservices use HTTP as their transport, you won't get a significant benefit from caching the connection.
Although you may be using an API that appears to authenticate as a first step, a look at the S3 documentation shows that authentication happens with every request, so there is no benefit in authenticating once and reusing a connection.
Web service requests over HTTP are lightweight and typically stateless. Once your request has been answered, no resources (connection or session state) are consumed on the server. This allows the web service implementer to use many machines to answer your requests without tying up resources on any particular server.
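To make that concrete, here is a sketch of a GET against S3 using the old signature-v2 scheme: every single request carries its own freshly computed signature, so there is nothing session-like to reuse (the credentials, bucket and key are placeholders):

    <?php
    $accessKey = 'YOUR-ACCESS-KEY';
    $secretKey = 'YOUR-SECRET-KEY';
    $bucket    = 'my-bucket';
    $object    = 'path/to/file.txt';
    $date      = gmdate('D, d M Y H:i:s \G\M\T');

    // Each request is signed independently; nothing carries over.
    $stringToSign = "GET\n\n\n{$date}\n/{$bucket}/{$object}";
    $signature    = base64_encode(hash_hmac('sha1', $stringToSign, $secretKey, true));

    $ch = curl_init("https://{$bucket}.s3.amazonaws.com/{$object}");
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        "Date: {$date}",
        "Authorization: AWS {$accessKey}:{$signature}",
    ));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    curl_close($ch);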
I am new to PHP, but in other web technologies you can share objects between page instances. For example, in Java JSP pages you can easily have one class that exists as a static class for the whole server instance. How do you do this in PHP?
I am not referring to session variables (at least I don't think so). This is more for the purpose of resource pooling (perhaps a socket to share, or database connections, etc.). So a whole class needs to be shared between subsequent loads, not just some primitive variables that I can store in the session.
I have also looked into PHP singleton classes, but I believe such a class is only shared within the same page request, not across pages.
To make things even more clear: I'm looking for something that can help me share, say, a socket connected to a server for a connectSocket.php page, such that all users who load that page use the same socket and do not open a new one.
This is a bit of a difficult answer, and might not be exactly what you are looking for.
PHP is built upon a 'shared-nothing' architecture. If you require some type of state across your application, you must do this through other means.
First I would recommend looking at the core of the problem: do you really need it? If you assume the PHP application could die (and lose state), is it OK to lose the data?
If you must maintain the state, even after the application dies or restarts, then the best place to put the data is probably MySQL. PHP is intended as a thin layer around your business logic, so I can highly recommend this.
If you don't care about losing the data after a restart, the problem domain you're looking at is probably caching. I would recommend looking into memcached or, if you're on a single machine, APC. APC will definitely work for you with Apache on a single machine, but you will still have to code your application assuming you might lose the data.
If you're worried your underlying datastore (MySQL) is too slow, but you still need to maintain the data after a restart, you should look into a combination of these two systems. You can always push and pull your data from the cache, and only send it over to MySQL when it updates.
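A small sketch of that cache-plus-database combination, with invented table, key and function names:

    <?php
    function get_user($pdo, $id) {
        $user = apc_fetch("user_{$id}", $hit);
        if ($hit) {
            return $user; // served from the cache
        }
        $stmt = $pdo->prepare('SELECT * FROM users WHERE id = ?');
        $stmt->execute(array($id));
        $user = $stmt->fetch(PDO::FETCH_ASSOC);
        apc_store("user_{$id}", $user, 600); // cache for 10 minutes
        return $user;
    }

    function set_user_email($pdo, $id, $email) {
        // Write through: update MySQL first, then invalidate the cache.
        $stmt = $pdo->prepare('UPDATE users SET email = ? WHERE id = ?');
        $stmt->execute(array($email, $id));
        apc_delete("user_{$id}");
    }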
If the data is purely user- or session-bound, you probably just want to look into the sessions system.
I've personally developed a reasonably large multi-tenant application, and although it's a pretty complex application, I've never needed the true shared state you're looking for.
Update: Sorry, I did not read your note about sharing a socket. You will need a separate daemon to handle this; if you can explain your problem further, there might be other approaches. What type of socket is this?
There's a fundamental difference between web-served Java and web-served interpreted languages like PHP and Perl. In Java, your web server will have an operating environment that maintains state (i.e. Tomcat). With interpreted languages, a request to your web server will generally spawn a new web server thread, which in turn loads a fresh operating environment for that thread, in this case the PHP environment.
Therefore, in PHP, there is no concept of page instances. Every request to the web server is a fresh start. All the classes are re-loaded, so there is no concept of class sharing, nor is there a concept of resource pooling, unless it is implemented externally.
Sharing sockets between web requests therefore isn't really possible.
This is likely a partial answer, but you can save an instance of a class into a session variable and access it at another time.
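For example (the Cart class is invented; note that the class definition must be loaded before session_start() so the object can be unserialized, and resource handles such as sockets cannot be stored this way):

    <?php
    class Cart {
        public $items = array();
        public function add($item) { $this->items[] = $item; }
    }

    session_start();

    if (!isset($_SESSION['cart'])) {
        $_SESSION['cart'] = new Cart();
    }
    $_SESSION['cart']->add('widget');
    // On the same user's next request, $_SESSION['cart'] is restored.
    // This is per-user state, though, not shared across all visitors.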
Most of the PHP database libraries use connection pooling already. You call, for example, pg_connect as if you were requesting a new connection, but if the connection string is the same as a connection that already exists, you will get the established connection back instead. If you only care about pooling for database access, then you can just confirm that it exists in the db library you're using.
Another horrible solution may be to load the object's data into a $_SESSION variable and then use it to rebuild the object on the other page.
In fact, this is the solution I'm going to follow in my project, until I find a better one.
Regards!