PDO transaction across multiple databases, is it possible?


I have the following problem with one of our users: they have two stores in two different locations, and each place has its own database. However, they need to share the client base and the list of materials registered for sale. At the moment, when the user registers a new client, a copy is made in the other location's database. Problems quickly arise because their internet connection is unstable. If the internet is down at the time of the registration, the copy fails and the application carries on with inconsistent databases.
I considered making the updates via PDO transactions that would manage the two databases, but it seems that you need a new instance of PDO, $dbh1 = new PDO('mysql:host=xxxx;dbname=test', $user, $pass);, for each database, and I don't see a way to commit both updates together. Looking at the related question "What is the best way to do distributed transactions across multiple databases", it seems that I need some form of transaction management. Can this be achieved with PDO?

No, PDO cannot do anything remotely resembling distributed transactions (which in any case are a very thorny issue where no silver bullets exist).
In general, in the presence of network partitions (i.e. actors falling off the network) it can be proved that you cannot achieve consistency and availability (guaranteed response to your queries) at the same time -- see CAP theorem.
It seems that you need to re-evaluate your requirements and design a solution based on the results of this analysis; in order to scale data processing or storage horizontally you have to take the scaling into account from day one and plan accordingly.
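One possible outcome of such a re-evaluation is to accept that the two stores are briefly out of sync and to queue the copy instead of attempting it inline. The following is only a sketch under assumptions, not something the answer above prescribes: the tables clients and pending_sync, and the functions registerClient() and flushPendingSync(), are hypothetical names, and the connections are assumed to use PDO::ERRMODE_EXCEPTION.

```php
<?php
// Hypothetical schema: clients(id, name) in both stores, pending_sync(id, client_id) locally.
// Assumes every PDO connection uses PDO::ERRMODE_EXCEPTION.

// Store the client locally and queue its copy in the SAME local transaction,
// so a registration can never exist without its pending copy.
function registerClient(PDO $local, string $name): int
{
    $local->beginTransaction();
    try {
        $local->prepare('INSERT INTO clients (name) VALUES (?)')->execute([$name]);
        $clientId = (int) $local->lastInsertId();
        $local->prepare('INSERT INTO pending_sync (client_id) VALUES (?)')->execute([$clientId]);
        $local->commit();
        return $clientId;
    } catch (Throwable $e) {
        $local->rollBack();
        throw $e;
    }
}

// Push queued clients to the other store; returns how many were copied.
// $remoteFactory is a callable that returns a PDO for the other store
// and is expected to throw PDOException while the link is down.
function flushPendingSync(PDO $local, callable $remoteFactory): int
{
    try {
        $remote = $remoteFactory();
    } catch (PDOException $e) {
        return 0; // Offline: leave the queue untouched and retry later.
    }

    $rows = $local->query(
        'SELECT p.id AS sync_id, c.id, c.name
         FROM pending_sync p JOIN clients c ON c.id = p.client_id
         ORDER BY p.id'
    )->fetchAll(PDO::FETCH_ASSOC);

    $copied = 0;
    foreach ($rows as $row) {
        try {
            $remote->prepare('INSERT INTO clients (id, name) VALUES (?, ?)')
                   ->execute([$row['id'], $row['name']]);
        } catch (PDOException $e) {
            break; // Stop at the first failure; the remaining rows stay queued.
        }
        $local->prepare('DELETE FROM pending_sync WHERE id = ?')->execute([$row['sync_id']]);
        $copied++;
    }
    return $copied;
}
```

flushPendingSync() would be called from a cron job or after each registration. The sketch ignores two real problems: IDs generated independently in both stores can collide, and a crash between the remote insert and the local delete leaves a row queued that has already been copied.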

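A PDO object is bound to a single connection, so one instance cannot cover two servers. If both databases are reachable through the same MySQL connection, you can switch databases using a query and then perform the same queries in the second DB.
Best bet is to do a transaction, then commit that transaction (if successful). Then do something like
$dbh->exec('USE otherdb'); // switch the connection's default database
$dbh->exec($sql); // $sql: the same statement you ran against the first database
Then do a second transaction, and commit or roll back based on whether or not it worked. Note that these are two independent transactions: if the second one fails, the first is already committed.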
I'm not sure if this really answers what you are asking though.
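For comparison, here is a sketch that uses one PDO instance per database, as the question describes. It is best-effort only and not a distributed transaction: both transactions are held open and committed one after the other, so a failure between the two commits still leaves the databases inconsistent. runOnBoth() is a hypothetical helper, and both connections are assumed to use PDO::ERRMODE_EXCEPTION.

```php
<?php
// Best-effort only: runs the same statement on two connections and commits
// both only if neither statement failed. Assumes PDO::ERRMODE_EXCEPTION.
function runOnBoth(PDO $first, PDO $second, string $sql, array $params): void
{
    $first->beginTransaction();
    $second->beginTransaction();
    try {
        $first->prepare($sql)->execute($params);
        $second->prepare($sql)->execute($params);

        $first->commit();
        // NOT atomic: if the link drops right here, the first commit is
        // already permanent and the second one is lost.
        $second->commit();
    } catch (Throwable $e) {
        // Roll back whichever transaction is still open.
        if ($first->inTransaction()) {
            $first->rollBack();
        }
        if ($second->inTransaction()) {
            $second->rollBack();
        }
        throw $e;
    }
}
```

This narrows the window for inconsistency to the moment between the two commits, but it also means a registration fails entirely whenever the other store is unreachable.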

Related

Practicality of multiple databases per client vs one database

I'm going to try to make this as brief as possible while covering all points - I work as a PHP/MySQL developer currently. I have a mobile app idea with a friend and we're going to start developing it.
I'm not saying it's going to be fantastic, but if it catches on, we're going to have a LOT of data.
For example, we'd have "clients," for lack of a better term, who would have anywhere from 100-250,000 "products" listed. Assuming the best, we could have hundreds of clients.
The client would edit data through a web interface, the mobile interface would just make calls to the web server and return JSON (probably).
I'm a lowly cms-developing kinda guy, so I'm not sure how to handle this. My question is more or less about performance; the most I've ever seen in a MySQL table was 340k, and it was already sort of slow (granted it wasn't the best server either).
I just can't fathom a table with 40 million rows (and potential to continually grow) running well.
My plan was to have a "core" database that held the name of the "real" database. When a user tries to access a client's data, the application would go to the core database and figure out which database to get the information from.
I'm not concerned with data separation or data security (it's not private information).

Yes, it's possible and my company does it. I'm certainly not going to say it's smart, though. We have a SaaS marketing automation system. Some clients' databases have 1 million+ records. We deal with a second "common" database that has a "fulfillment" table tracking emails, letters, phone calls, etc. with over 4 million records, plus numerous other very large shared tables. With proper indexing, optimizing, maintaining a separate DB-only server, and possibly clustering (which we don't yet have to do) you can handle a LOT of data. In many cases, those who think it can only handle a few hundred thousand records work on a competing product for a living. If you still doubt whether it's valid, consider that per MySQL's clustering metrics, an 8-server cluster can handle 2.5 million updates PER SECOND. Not too shabby at all.
The problem with using two databases is juggling multiple connections. Is it tough? No, not really. You create different objects and reference your connection classes based on which database you want. In our case, we hit the main database's company class to deduce the client DB name and then build the second connection based on that. But when you're juggling those connections back and forth you can run into errors that require extra debugging. It's not just "Is my query valid?" but "Am I actually getting the correct database connection?" In our case, a dropped session can cause all sorts of PDO errors to fire because the system can no longer keep track of which client database to access. Plus, from a maintainability standpoint, it's a scary process trying to push table structure updates to 100 different live databases. Yes, it can be automated. But one slip-up and you've knocked a LOT of people down and made a ton of extra work for yourself. Now, calculate the extra development and testing required to juggle connections and push updates. That will be your measure of whether it's worthwhile.
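To illustrate the "Am I actually getting the correct database connection?" problem, here is a minimal sketch of a registry that hands out connections by name and fails loudly for an unknown name instead of silently falling back to another database. ConnectionRegistry and its methods are hypothetical and not what the system described above uses; in real code each factory would return a PDO instance.

```php
<?php
// Hypothetical helper: one lazily created connection per named database.
final class ConnectionRegistry
{
    /** @var array<string, callable> factories that build a connection on demand */
    private $factories = [];

    /** @var array<string, object> connections that have already been opened */
    private $connections = [];

    public function register(string $name, callable $factory): void
    {
        $this->factories[$name] = $factory;
    }

    public function get(string $name)
    {
        if (!isset($this->factories[$name])) {
            // Fail loudly rather than guess which database was meant.
            throw new InvalidArgumentException("Unknown database: $name");
        }
        if (!isset($this->connections[$name])) {
            // Open the connection on first use and reuse it afterwards.
            $this->connections[$name] = ($this->factories[$name])();
        }
        return $this->connections[$name];
    }
}
```

A caller would register 'core' once at bootstrap, look up the client's database name there, register that name with its own factory, and then always ask the registry by name.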
My recommendation? Find a host that allows you to put two machines on the same local network. We chose Linode, but who you use is irrelevant. Start out with your dedicated database server, plan ahead to do clustering when it's necessary. Keep all your content in one DB, index and optimize religiously. Finally, find a REALLY good DB guy and treat him well. With that much data, a great DBA would be a must.

Store some records in the application and some in the database?

I have an application where it seems as if it would make sense to store some records hard-coded in the application code rather than an entry in the database, and be able to merge the two for a common result set when viewing the records. Are there any pitfalls to this approach?
Firstly, it would seem to make it easier to enforce that a record is never edited/deleted, other than when the application developer wants to. Second, in some scenarios such as installing a 3rd party module, the records could be read from their configuration rather than performing an insert in the db (with the related maintenance issues).
Some common examples:
Record type                          In the application      In the database
-----------------------------------  ----------------------  ----------------------
customers                            (none)                  all customers
HTML templates                       default templates       user-defined templates
'control panel' interface languages  default language        additional languages
Online shop payment processors       all payment processors  (none)
So, I think I have three options depending on the scenario:
1. All records in the database
2. Some records in the application, some records in the database
3. All records in the application
And it seems that there are two ways to implement it:
1. All records in the database:
   - A column could be flagged as 'editable' or 'locked'
   - Negative IDs could represent locked values and positive IDs could represent editable
   - Odd IDs represent locked and even IDs represent editable...
2. Some records live in the application (as variables, arrays or objects...)
Are there any standard ways to deal with this scenario? Am I missing some really obvious solutions?
I'm using MySQL and PHP, if that changes your answer!

By "in the application", do you mean these records live in the filesystem, accessible to the application?
It all depends on the app you're building. There are a few things to consider, especially when it comes to code complexity and performance. While I don't have enough info about your project to suggest specifics, here are a few pointers to keep in mind:
Having two possible repositories for everything ramps up the complexity of your code. That means readability will go down and weird errors will start cropping up that are hard to trace. In most cases, it's in your best interest to go with the simplest solution that can possibly work. If you look at big PHP/MySQL software packages you will see that even though there are a lot of default values in the code itself, the data comes almost exclusively from the database. This is probably a reasonable policy when you can't get away with the simplest solution ever (namely storing everything in files).
The big downside of heavy database involvement is performance. You should definitely keep track of all the database calls of any typical codepath in your app. If you rely heavily on lots of queries, you have to employ a lot of caching. Track everything that happens and keep in mind what the computer has to do in order to fulfill the request. It's your job to make the computer's task as easy as possible.
If you store templates in the DB, another big performance penalty will be the lack of opcode re-use and caching. Normal web hosting environments compile a PHP file once and then keep the bytecode version of it around for a while. This saves subsequent recompiles and speeds up execution substantially. But if you fill PHP template code into an eval() statement, this code will have to be recompiled by PHP every single time it's called.
Also, if you're using eval() in this fashion and you allow users to edit templates, you have to make sure those users are trusted - because they'll have access to the entire PHP environment. If you're going the other route and are using a template engine, you'll potentially have a much bigger performance problem (but not a security problem). In any case, consider caching template outputs wherever possible.
Regarding the locking mechanism: it seems you are introducing a big architectural issue here since you now have to make each repository (file and DB) understand what records are off-limits to the other one. I'd suggest you reconsider this approach entirely, but if you must, I'd strongly urge you to flag records using a separate column for it (the ID-based stuff sounds like a nightmare).
The standard way would be to keep classical DB-shaped stuff in the DB (these would be user accounts and other stuff that fits nicely into tables) and keep the configuration, all your code and template things in the filesystem.

I think that keeping some fixed values hard-coded in the application may be a good way to deal with the problem. In most cases, it will even reduce load on the database server, because not all the values have to be retrieved via SQL.
But there are cases when it could lead to performance issues, mainly if you have to join values coming from the database with your hard-coded values. In this case, storing all the values in database may have better performance, because all values could be optimized and processed by the database server, rather than getting all the values from SQL query and joining them manually in the code.
To deal with this case, you can store the values in the database, but inserts and updates must be handled only by your maintenance or upgrade routines. If you have a bigger concern about not letting the data be modified, you can set up a maintenance routine that checks from time to time whether the values in the database are the same as those in the code. In this case, these database tables act much like a "cache" of the hard-coded values. And when you don't need to join the fixed values with the database values, you can still get them from the code, avoiding an unnecessary SQL query (because you're sure the values are the same).

In general, any time you're performing a database query and want to include something hard-coded in the workflow, no joining needs to happen. You would simply perform the action on your hard-coded data as well as on the data you pulled from the database. This is especially true if we're talking about information that is formed into an object once it is in the application. For instance, I can see this being useful if you want there to always be a dev user in the application. You could have this user hard-coded in the application and whenever you would query the database, such as when you're logging in a user, you would check your hard-coded user's values before querying the database.
For instance:
// You would place this on the login page
$DevUser = new User($info); // $info: the hard-coded dev user's details
$_SESSION['DevUser'] = $DevUser;
// This would go in the user authentication logic
$dev = $_SESSION['DevUser'];
// hash_equals() compares the two hashes in constant time
if ($dev->GetValue('Username') === $GivenUName
    && hash_equals($dev->GetValue('PassHash'), $GivenPassHash))
{
// log in user
}
else
{
// query for user that matches given username and password hash
}
This shows how there doesn't need to be any special or tricky database stuff going on. Hard-coding variables to include in your database-driven workflow is extremely simple when you don't overthink it.
There could be a case where you might have a lot of hard-coded variables/objects and/or you might want to execute a large block of logic on both sets of information. In this case it could be beneficial to have an array that holds the hard-coded information and then you could just add the queried information to that array before you perform any logic on it.
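A minimal sketch of that array approach, assuming each record is an associative array with a unique 'code' key (a hypothetical shape, as is the mergeRecords() name). Hard-coded records are marked as locked and win over a database row with the same code.

```php
<?php
// Hypothetical record shape: ['code' => string, 'label' => string].
function mergeRecords(array $hardCoded, array $fromDatabase): array
{
    $merged = [];
    foreach ($fromDatabase as $row) {
        $merged[$row['code']] = array_merge($row, ['locked' => false]);
    }
    // Hard-coded records are applied last, so they replace any database
    // row that uses the same code.
    foreach ($hardCoded as $row) {
        $merged[$row['code']] = array_merge($row, ['locked' => true]);
    }
    return array_values($merged);
}
```

Any later logic can then loop over the single merged array and check the 'locked' flag before allowing an edit or delete.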
In the case of payment processors, I would assume that you're referring to online payments using different services such as PayPal, or a credit card, or something else. This would make the most sense as a Payment class that has a separate function for each payment method. That way you can call whichever method the client chooses. I can't think of any other way you would want to handle this. If you're maybe talking about the payment options available to your customers, that would be something hard-coded on your payment page.
Hopefully this helps. Remember, don't make it more complicated than it needs to be.

Persistent transaction in a client-server approach

In my application (client-server) I need to edit some rows (from a database), and while they are being edited nobody else should be able to edit them. This is done with transactions, of course. The problem is that in a client-server environment the transaction is managed on the server side, so the client that edits the rows can't access the transaction directly. (I'm working with PHP in this situation, but I think the same approach is adopted in other technologies too.) So I need to keep the transaction open (to keep the rows locked for editing) until the client finishes the edit.
In PHP, persistent connections won't help, because they can be broken by other clients located on the same host as the aforesaid client. Do you have any ideas for my scenario?
thank you.

Usually such cases are handled through business locks that you set directly on the objects, or on the parent of the objects.
Add a column such as "inedition" that you set to true when a user claims a row for editing, and set to false when the user validates or cancels the edit.
Be aware that some user sessions are likely to be lost before the row is unlocked, so you'll probably require:
either a periodic job that unlocks stale rows,
or a functional screen from which the user or an admin can unlock the rows that remained locked.
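A sketch of such a business lock, with one variation that is an assumption and not part of the answer above: instead of a boolean "inedition" column it uses two hypothetical columns, locked_by and locked_at, so that a stale lock expires on its own after LOCK_TTL_SECONDS. The table name documents and both function names are also hypothetical.

```php
<?php
// Hypothetical columns on the edited table: locked_by (nullable), locked_at (Unix time).
const LOCK_TTL_SECONDS = 900;

// Claims the row in a single UPDATE, so two users cannot both succeed.
// Returns true if $user now holds the lock.
function acquireEditLock(PDO $pdo, int $rowId, string $user, int $now): bool
{
    $stmt = $pdo->prepare(
        'UPDATE documents SET locked_by = ?, locked_at = ?
         WHERE id = ? AND (locked_by IS NULL OR locked_by = ? OR locked_at < ?)'
    );
    $stmt->execute([$user, $now, $rowId, $user, $now - LOCK_TTL_SECONDS]);
    return $stmt->rowCount() === 1;
}

// Releases the lock, but only if $user is the one holding it.
function releaseEditLock(PDO $pdo, int $rowId, string $user): void
{
    $pdo->prepare(
        'UPDATE documents SET locked_by = NULL, locked_at = NULL
         WHERE id = ? AND locked_by = ?'
    )->execute([$rowId, $user]);
}
```

The save request should check that the user still holds the lock before writing. One caveat: on MySQL, rowCount() counts only rows whose values actually changed unless PDO::MYSQL_ATTR_FOUND_ROWS is set, so re-acquiring a lock with an identical timestamp would report failure.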
Edit:
This kind of solution is used whenever you do not want to rely on a database-specific feature, such as Oracle's "SELECT FOR UPDATE". In Java, a stateful EJB session bean can keep a reference to the transaction from the UI to the database. There might be solutions for PHP using Oracle or other database-specific transaction features, depending on the database.

One or many databases for application for many clients in PHP

I am writing a PHP application in ZF. Customers will use it to sell their products to final customers. Customers will host their application on my server or they could use their own. Most of them will host this application on my server.
I could design one database for all customers at once, so every customer will use the same database, but of course products etc. will be assigned to particular customer. Trivial.
I could use separate database for every customer, so the database structure will be simpler. I will then probably use separate subdomains and maybe even file location, but that is just a detail.
Which solution will have better performance and how big will be the difference? Which one would you choose?

I would use a separate database for each customer. It makes backup and scaling easier. If you ever get a large customer that needs some custom changes to the schema, you can do it easily.
If one customer needs you to restore their data, it is trivial when they have a database of their own. On a shared DB, it is much harder.
And if that large customer ever gets a lot of traffic, you can easily put them on another server with minimal changes.
If one site gets compromised, you don't have all of the data for everyone in one place; the damage is limited to just the site that was hacked.
I'd definitely recommend going with 1 db per customer if possible.

Personally, I would go with multiple databases - i.e. a database for each client.
As I understand it all your clients will be using just an instance of your application so these instances should have their own databases.
If you go with a single database, you are creating a great potential security risk. One client compromising the login details to the db server would automatically compromise data of all your clients.
Also a single security vulnerability (a SQL injection attack) could destroy data of all clients (with multiple dbs you could still have time to fix the security hole and release a patch before all other sites are attacked).
You don't want to have an army of 1000000 mad clients instead of just 1 angry client.
Multiple databases also give you a greater possibility of load balancing (you can have the dbs spread across more servers).

Performance-wise, you're basically starting with a 'sharding' approach. Because of this, a sharding performance strategy will be a piece of cake.
The downside is that you could argue you're losing some (undefined) bit of overhead in the duplication.
One pitfall is that you might not notice performance issues in major components as quickly. This is because they are so scattered, so they might not be visible on your radar. Load testing is the way to get ahead of this.

To some extent this is a question of personal opinion. There are pros and cons of both models.
Personally, and because of the "they could use their own" comment, I would go with a separate database per customer. This gives you:
The ability to move customer data around when necessary, for example moving a single customer onto a different server or setup depending on things like load.
If something goes wrong you only impact one customer and not everybody.
You can spread DB load across multiple DB servers if necessary.
If a customer comes to you with a specific requirement you can more easily cater for this without impacting other customers.
From a performance perspective, to be honest I don't think there is any real performance gain in either model. That said, this does of course depend on the structure of your DB and the hardware it runs on.

Don't choose a multiple-database solution if your needs can be fulfilled with one database. Multiple databases will become a big burden in the long run, and your system will become highly complicated and unmanageable as you grow.
Using proper relationships you can go a long way:
A Client model can have many Products // why multiple databases?
Performance can be achieved either way; just going with multiple DBs will NOT help in that direction.

MySQL Transaction across many PHP Requests

I would like to create an interface for manipulating invoices in a transaction-like manner.
The database consists of an invoices table, which holds billing information, and an invoice_lines table, which holds line items for the invoices. The website is a set of scripts which allow the addition, modification, and removal of invoices and their corresponding lines.
The problem I have is this: I would like the ACID properties of the database to be reflected in the web application.
Atomic: When the user hits save, either the entire invoice is modified or the entire invoice is not changed at all.
Consistent: The application code already ensures consistency, lines cannot be added to non-existent invoices. Invoice IDs cannot be duplicated.
Isolated: If a user is in the middle of a set of changes to an invoice, I would like to hide those changes from other users until the user clicks save.
Durable: If the web site dies, the data should be safe. This already works.
If I were writing a desktop application, it would maintain a connection to the MySQL database at all times, allowing me to simply use the BEGIN TRANSACTION and COMMIT at the beginning and end of the edit.
From what I understand you cannot BEGIN TRANSACTION on one PHP page and COMMIT on a different page because the connection is closed between pages.
Is there a way to make this possible without extensions? From what I have found, only SQL Relay does this (but it is an extension).

You don't want to have long-running transactions, because that'll limit concurrency. See http://en.wikipedia.org/wiki/Command_pattern

The translation on the web for this type of processing is the use of session data or data stored in the page itself. Typically what is done is that after each web page is completed the data is stored in the session (or in the page itself) and at the point in which all of the pages have been completed (via data entry) and a "Process" (or "Save") button is hit, the data is converted into the database form and saved - even with the relational aspect of data like you mentioned. There are many ways to do this but I would say that most developers have an architecture similar to what I mentioned (using session data or state within the page) to satisfy what you are talking about.
You'll get much advice here on different architectures, but I can say that the Zend Framework (http://framework.zend.com) and the use of Doctrine (http://www.doctrine-project.org/) make this fairly easy, since Zend provides much of the MVC architecture and session management and Doctrine provides the basic CRUD (create, retrieve, update, delete) you are looking for, plus all of the other aspects (uniqueness, commit, rollback, etc). Keeping the connection to MySQL open may cause timeouts and a lack of available connections.

Database transactions aren't really intended for this purpose - if you did use them, you'd probably run into other problems.
But also you can't use them as each page request uses its own connection (potentially) so cannot share a transaction with any others.
Keep the modifications to the invoice somewhere else while the user is editing them, then apply them when she hits save; you can do this final apply step in a transaction (albeit quite a short-lived one).
Long-lived transactions are usually bad.

The solution is not to open the transaction during the GET phase. Do every part of the transaction (BEGIN TRANSACTION, processing, and COMMIT) during the POST triggered by the "save" button.
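A sketch of what that POST handler could call, assuming the edited invoice arrives as a plain array (from the form or the session). The tables invoices(id, customer) and invoice_lines(invoice_id, description, amount) and the saveInvoice() function are hypothetical, and the connection is assumed to use PDO::ERRMODE_EXCEPTION. On MySQL the tables would also need a transactional engine such as InnoDB.

```php
<?php
// Applies all edits to one invoice in a single short transaction:
// either every change is stored or none of them is.
function saveInvoice(PDO $pdo, int $invoiceId, string $customer, array $lines): void
{
    $pdo->beginTransaction();
    try {
        $pdo->prepare('UPDATE invoices SET customer = ? WHERE id = ?')
            ->execute([$customer, $invoiceId]);

        // Replace the old lines with the edited set.
        $pdo->prepare('DELETE FROM invoice_lines WHERE invoice_id = ?')
            ->execute([$invoiceId]);
        $insert = $pdo->prepare(
            'INSERT INTO invoice_lines (invoice_id, description, amount) VALUES (?, ?, ?)'
        );
        foreach ($lines as $line) {
            $insert->execute([$invoiceId, $line['description'], $line['amount']]);
        }

        $pdo->commit();
    } catch (Throwable $e) {
        // Any failure leaves the invoice exactly as it was before the save.
        $pdo->rollBack();
        throw $e;
    }
}
```

Because the transaction lives entirely inside one request, other users never see a half-edited invoice and no connection has to survive between pages.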

Persistent connections may help you:
http://php.net/manual/en/features.persistent-connections.php
"Another is that when using transactions, a transaction block will also carry over to the next script which uses that connection if script execution ends before the transaction block does."
But I recommend you to find another approach to the problem.
For example: create a cache table.
When you need to "commit", transfer the records from the cache table to the "real" tables.
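A sketch of that transfer step, under assumptions: the "cache" table is a hypothetical invoice_lines_draft with the same columns as invoice_lines plus an editor column, commitDraft() is a made-up name, and the connection uses PDO::ERRMODE_EXCEPTION.

```php
<?php
// Moves one editor's draft lines into the real table in a single short
// transaction, then clears that editor's draft.
function commitDraft(PDO $pdo, int $invoiceId, string $editor): void
{
    $pdo->beginTransaction();
    try {
        $pdo->prepare('DELETE FROM invoice_lines WHERE invoice_id = ?')
            ->execute([$invoiceId]);
        $pdo->prepare(
            'INSERT INTO invoice_lines (invoice_id, description, amount)
             SELECT invoice_id, description, amount
             FROM invoice_lines_draft
             WHERE invoice_id = ? AND editor = ?'
        )->execute([$invoiceId, $editor]);
        $pdo->prepare('DELETE FROM invoice_lines_draft WHERE invoice_id = ? AND editor = ?')
            ->execute([$invoiceId, $editor]);
        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
}
```

Each page request writes only to the draft table, so other users keep reading the unchanged invoice_lines rows until the draft is committed.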

Although there are some good answers here, I think I have found a good response to your question, which I was also stuck on. I think the best approach is to use a framework like Doctrine (O/R mapping) that already implements this kind of approach in some form. Here you have a link to what I'm talking about.
