I have two databases: one is a Firebird database, the other a MySQL database.
The Firebird database is the main one, where the information changes. I have to synchronize those changes to the MySQL database.
I have no control over the Firebird one - I can just SELECT from it. I cannot add triggers, events or anything similar. I have full control over the MySQL database.
The synchronization has to be done over the internet, as the two servers are not connected in any way and are in different locations.
Synchronization has to be done in PHP on the server that also hosts the MySQL database.
Currently I just go through every record (every 15 minutes), calculate a hash of each row, compare the two hashes, and if they don't match, update the whole row. It works, but it seems very wrong and not optimized in any way.
Is there any other way to do this? Am I missing something?
Thank you.
I have done the same thing once, and I don't think there is a generally better solution.
You can only more or less optimize what you have so far. For example:
If some of the tables have a column with "latest update" information, you can select only the rows that have changed since the last sync.
You can change the comparison mechanism - instead of comparing and updating whole rows, compare individual columns and, on the MySQL side, update only the changed ones. I believe that would speed things up in the case of MyISAM tables, but probably not with InnoDB.
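A minimal sketch of that per-column comparison, written in Python rather than PHP for brevity; the row shapes and column names are invented for illustration:

```python
import hashlib

def row_hash(row):
    # Stable hash over all column values, in a fixed column order
    payload = "|".join(str(row[col]) for col in sorted(row))
    return hashlib.md5(payload.encode()).hexdigest()

def changed_columns(source_row, target_row):
    # Compare column by column; return only the columns that differ,
    # so the UPDATE touches the minimum number of fields
    return {col: source_row[col]
            for col in source_row
            if source_row[col] != target_row.get(col)}

source = {"id": 1, "name": "Alice", "city": "Oslo"}    # Firebird side
target = {"id": 1, "name": "Alice", "city": "Bergen"}  # MySQL side

if row_hash(source) != row_hash(target):
    diff = changed_columns(source, target)
    # `diff` would feed an UPDATE that sets only the changed columns
```

The hash acts as a cheap first filter; the column diff is computed only for rows whose hashes disagree.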
I've got two frameworks (Laravel for the web, CodeIgniter for the API) and two different databases. I've built a function (in the API) which detects changes in one database (in two tables) and applies the changes to the other database.
Note: there is no way to run both the web and the API on the same database - that's why I'm doing this.
Anyway, it is important that every little change is recognized. If it's a new record or a deleted record, it's simple and no problem at all. But if the record exists in both databases, I need to compare their values to detect changes, and this part becomes challenging.
I know how to do this in the slowest, heaviest way (pick each record and compare it).
My question is: how do you suggest making this work in a smart and fast way?
Thanks a lot.
As long as the MySQL user has SELECT rights on both databases, you can qualify the database in the query like so:
SELECT * FROM `db1`.`table1`;
SELECT * FROM `db2`.`table1`;
It doesn't matter which database was selected when you connected from PHP. The correct database will be used in the query.
The backticks are optional when the database/table name is alphanumeric and not an SQL keyword.
Depending on the response time of the 'slave' database, there are two options which don't increase the overhead too much:
If you can combine both databases into one by prefixing the tables of one or both, you can use FOREIGN KEYs to let the database do the tough work for you.
Use a TIMESTAMP field, which you can set to be updated by the DB itself whenever the row is updated.
Option 1 would be my best guess, but it might mean a physical change to the running system, and if FOREIGN KEYs are new to you, you might want to test first, since they can be a real PITA (IMHO).
Option 2 is easier to implement, but you still have to detect deleted rows manually.
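Option 2 might look like the following sketch; the table and column names are illustrative, and the DDL relies on MySQL's ON UPDATE CURRENT_TIMESTAMP behavior:

```python
# Illustrative DDL: MySQL refreshes `updated_at` automatically on every write.
DDL = """
CREATE TABLE table1 (
    id INT PRIMARY KEY,
    payload VARCHAR(255),
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
                         ON UPDATE CURRENT_TIMESTAMP
)
"""

def rows_changed_since(rows, last_sync):
    # Simulates: SELECT * FROM table1 WHERE updated_at > %s
    # (string comparison works because the format is fixed-width)
    return [r for r in rows if r["updated_at"] > last_sync]

rows = [
    {"id": 1, "updated_at": "2013-05-01 10:00:00"},
    {"id": 2, "updated_at": "2013-05-02 09:30:00"},
]
changed = rows_changed_since(rows, "2013-05-01 12:00:00")
```

Deleted rows never show up in this result, which is why they still need separate handling.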
I have a problem with a project I am currently working on, built in PHP & MySQL. The project itself is similar to an online bidding system. Users bid on a project, and they get a chance to win if they follow their bid by clicking again and again.
The problem is this: if 5 users, for example, enter the game at the same time, I get an 8-10 second delay in the database - I update the database using UNIX_TIMESTAMP(CURRENT_TIMESTAMP), which makes the whole bidding system useless.
I want to mention too that the project is very database intensive (around 30-40 queries per page) and I was thinking maybe the queries get delayed, but I'm not sure if that's happening. If that's the case though, any suggestions how to avoid this type of problem?
Hope I've been at least clear with this issue. It's the first time it happened to me and I would appreciate your help!
You can decide on:
Optimizing or minimizing the required queries.
Caching queries that don't need to be updated on each visit.
Using summary tables.
Updating the queries only on changes.
You have to do this cleverly. You can follow the MySQLPerformanceBlog.
I'm not clear on what you're doing, but let me elaborate on what you said. If you're using UNIX_TIMESTAMP(CURRENT_TIMESTAMP()) in your MySQL query, you have a serious problem.
The problem with your approach is that you are using MySQL functions to supply the timestamp record that will be stored in the database. This is an issue, because then you have to wait on MySQL to parse and execute your query before that timestamp is ever generated (and some MySQL engines like MyISAM use table-level locking). Other engines (like InnoDB) have slower writes due to row-level locking granularity. This means the time stored in the row will not necessarily reflect the time the request was generated to insert said row. Additionally, it can also mean that the time you're reading from the database is not necessarily the most current record (assuming you are updating records after they were inserted into the table).
What you need is for the PHP request that generates the SQL query to provide the TIMESTAMP directly in the SQL query. This means the timestamp reflects the time the request is received by PHP and not necessarily the time that the row is inserted/updated into the database.
You also have to be clear about which MySQL engine your table is using. For example, engines like InnoDB use MVCC (Multi-Version Concurrency Control). This means a row can be written while it is being read: the engine keeps the previous version of the row in its undo log, and that old version is what concurrent readers see until the write commits. That way you get row-level locking with fast, non-blocking consistent reads, but potentially slower writes.
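The fix described above amounts to generating the timestamp in the application and binding it as a query parameter. A sketch in Python (the original is PHP; the table and column names are invented):

```python
import time

def build_bid_insert(user_id, amount):
    # The timestamp is taken when the app receives the request,
    # not when MySQL finally parses and executes the statement
    placed_at = int(time.time())
    sql = "INSERT INTO bids (user_id, amount, placed_at) VALUES (%s, %s, %s)"
    params = (user_id, amount, placed_at)
    return sql, params

sql, params = build_bid_insert(42, 9.99)
# sql and params go to the driver as a parameterized query
```

Even if the INSERT sits in a queue for several seconds, the stored time still reflects when the bid arrived.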
I'm setting up a MySQL database and I'm not sure of the best method to structure it:
I am setting up a system (PHP/MySQL based) where a few hundred people will be executing SELECT/UPDATE/SET/DELETE queries to a database (probably about 50 simultaneously). I imagine there are going to be a few thousand rows if they're all using the same database and table. I could split the data across a number of tables but then I would have to make sure they're all uniform AND I, as the administrator, will be running some SELECT DISTINCT queries via cron to update an administrative interface.
What's the best way to approach this? Can I have everybody sharing one database? one table? Will there be a problem when there are a few thousand rows? I imagine there is going to be a huge performance issue over time.
Any tips or suggestions are welcome!
MySQL/PHP can easily handle this as long as your server is powerful enough. MySQL loves RAM and will use as much as it can (within the limits you give it).
If you're going to have a lot of concurrent users, then I would suggest using InnoDB tables instead of MyISAM (the default in MySQL versions < 5.5). InnoDB locks individual rows when doing INSERT/UPDATE/DELETE etc., rather than locking the whole table like MyISAM does.
We use PHP/MySQL and have 1000+ users on our site at the same time (our master DB server does about 4k queries per second).
The server is a shared Windows hosting server with Hostgator. We are allowed "unlimited" MS SQL databases and each is allowed "unlimited" space. I'm writing the website in PHP. The data (not the DB schema, but the data) needs to be versioned such that (ideally) my client can select the DB version he wants from a select box when he logs in to the website, and then (roughly once a month) tag the current data, also through a simple form on the website. I've thought of several theoretical ways to do this and I'm not excited about any of them.
1) Put a VersionNumber column on every table; have a master Version table that lists all versions for the select box at login. When tagged, every row without a version number in every table in the db would be duplicated, and the original would be given a version number.
This seems like the easiest idea for both me and my client, but I'm concerned the db would be awfully slow in just a few months, since every table will grow by (at least) its original size every month. There's not a whole lot of data, and there probably never will be, in any one version. But multiplying versions in the same table just scares me.
2) Duplicate the DB every time we tag.
It looks like this would have to be done manually by my client since the server is shared, so I already dislike the idea. But in addition, the old DBs would have to be able to work with the current website code, and as changes are made to the DB structure over time (which is inevitable) the old DBs will no longer work with the new website code.
3) Create duplicate tables (with the version in their name) inside the same database every time we tag. Like [v27_Employee].
The benefit here over idea (1) is that no table gets humongous, so queries keep their speed; over idea (2), tagging could theoretically be done through the simple website form rather than manually by my client. The problems are that the queries in my PHP code will get all discombobulated as I try to work out which Employee table joins with which Address table depending on the version selected; and also that as the code changes, the old DB tables no longer match it - the same problem as (2).
So, finally, does anyone have any good recommendations? Best practices? Things they did that worked in the past?
Thanks guys.
Option 1 is the most obvious solution because it has the lowest maintenance overhead and it's the easiest to work with: you can view any version at any time simply by filtering on VersionNumber in your queries. This also means you could implement option 3 at the same time by creating views for each version number instead of real tables. If your application only queries one version at a time, consider making VersionNumber the first column of a clustered primary key, so that all the data for one version is physically stored together.
And it isn't clear how much data you have anyway. You say it's "not a whole lot", but that means nothing. If you really have a lot of data (say, into hundreds of millions of rows) and if you have Enterprise Edition (you didn't say what edition you're using), you can use table partitioning to 'split' very large tables for better performance.
My conclusion would be to do the simplest, easiest thing to maintain right now. If it works fine then you're done. If it doesn't, you will at least be able to rework your design from a simple, stable starting point. If you do something more complicated now, you will have much more work to do if you ever need to redesign it.
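Option 1's tagging step can be sketched as pure logic: rows with no version number are the "current" set, and tagging duplicates each of them while stamping one copy with the new version. This is an illustrative Python sketch with invented row shapes, not the poster's actual code:

```python
def tag_version(rows, new_version):
    # rows: list of dicts; version None means "current, untagged"
    out = []
    for row in rows:
        if row["version"] is None:
            out.append(dict(row))                       # duplicate stays current
            out.append(dict(row, version=new_version))  # original gets the tag
        else:
            out.append(row)                             # old versions untouched
    return out

rows = [{"id": 1, "name": "Bob", "version": None},
        {"id": 2, "name": "Ann", "version": 1}]
rows = tag_version(rows, new_version=2)
current = [r for r in rows if r["version"] is None]
v2 = [r for r in rows if r["version"] == 2]
```

In SQL this would be an INSERT ... SELECT of the untagged rows plus an UPDATE, run inside one transaction per table.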
You could copy your versionable tables into a new database every month. If you need to join a versionable table with a non-versionable one, you'd need a cross-database join, which is supported in SQL Server. This approach is a bit cleaner than duplicating tables in a single schema, since your database explorer will otherwise get unwieldy with all the old tables.
What I finally wound up doing was creating a new schema for each version and duplicating the tables and triggers and keys each time the DB is versioned. So, for example, I had this table:
[dbo].[TableWithData]
And I duplicated it into this table in the same DB:
[v1].[TableWithData]
Then, when the user wants to view old tables, they select a version, and my code automatically changes every instance of [dbo] in every query to [v1]. It's conceptually fairly simple, and the user doesn't have to do anything complicated to version - just type "v1" into a form and hit a submit button. My PHP and SQL do the rest.
I did find that some tables had to remain separate - I made a different schema called [ctrl] into which I put tables that will not be versioned, like the username/password table. That way I only duplicate the [dbo] tables.
It's been operational for a year or so and seems to work well at the moment. They've only versioned maybe 4 times so far. The only problem I consistently have that I can't figure out is that triggers seem to get lost somehow. That's probably a problem with my very complex PHP rather than with the DB versioning concept itself.
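The schema swap described above can be as small as a string rewrite over the query text. A naive sketch in Python (the real code is PHP, and would also have to avoid rewriting "[dbo]" inside string literals):

```python
import re

def retarget_schema(sql, version_schema):
    # Replace every [dbo]. prefix with the requested version schema.
    # [ctrl]. references are untouched, so unversioned tables keep working.
    return re.sub(r"\[dbo\]\.", "[%s]." % version_schema, sql)

sql = "SELECT * FROM [dbo].[TableWithData] JOIN [ctrl].[Users] ON 1=1"
old = retarget_schema(sql, "v1")
```

Because the rewrite is purely textual, every query in the application automatically follows whichever version the user picked.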
Here's the situation:
I have a MySQL db on a remote server. I need data from 4 of its tables. On occasion, the schema of these tables changes (new fields are added, but none removed). At the moment, the tables have > 300,000 records.
This data needs to be imported into the localhost mySQL instance. These same 4 tables exist (with the same names), but the fields needed are a subset of the fields in the remote db tables. The data in these local tables is considered read-only and is never written to. Everything needs to be run in a transaction so there is always some data in the local tables, even if it is a day old. The localhost tables are used by an active website, so this entire process needs to complete as quickly as possible to minimize downtime.
This process runs once per day.
The options as I see them:
Get a mysqldump of the structure/data of the remote tables and save to file. Drop the localhost tables, and run the dumped sql script. Then recreate the needed indexes on the 4 tables.
Truncate the localhost tables. Run SELECT queries on the remote db in PHP and retrieve only the fields needed instead of the entire row. Then loop through the results and create INSERT statements from this data.
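The second option can at least be batched, so you emit one multi-row INSERT per chunk instead of one statement per row. An illustrative Python sketch with invented table/column names (real code would use parameter binding rather than string interpolation):

```python
def build_batched_inserts(rows, columns, table, batch_size=1000):
    # Group rows into multi-row INSERT statements to cut round trips.
    stmts = []
    for i in range(0, len(rows), batch_size):
        chunk = rows[i:i + batch_size]
        values = ", ".join(
            "(" + ", ".join(repr(r[c]) for c in columns) + ")"
            for r in chunk
        )
        stmts.append(
            "INSERT INTO %s (%s) VALUES %s"
            % (table, ", ".join(columns), values)
        )
    return stmts

rows = [{"id": n, "name": "row%d" % n} for n in range(5)]
stmts = build_batched_inserts(rows, ["id", "name"], "table1", batch_size=2)
```

All of the generated statements can then be run inside one transaction, which keeps the old data visible until the import commits.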
My questions:
Performance wise, which is my best option?
Which one will complete the fastest?
Will either one put a heavier load on the server?
Would indexing the tables take the same amount of time in both options?
If there is no good reason for the local d/b to be a subset of the remote one, make the structure the same and enable database replication on the needed tables. Replication works by the master tracking all changes made and by managing each slave d/b's pointer into the change log; each slave effectively says "give me all changes since my last request". For a sizeable database, this is far more efficient than either alternative you listed, and it comes at only modest cost.
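The mechanism described above - each slave keeping a pointer into the master's append-only change log - can be sketched as a toy model (this is an illustration of the idea, not MySQL's actual binlog protocol):

```python
class Master:
    def __init__(self):
        self.log = []  # append-only list of change events

    def record(self, change):
        self.log.append(change)

    def changes_since(self, position):
        # Return everything after the slave's last-seen position,
        # plus the new position for the next request
        return self.log[position:], len(self.log)

class Slave:
    def __init__(self, master):
        self.master = master
        self.position = 0
        self.applied = []

    def sync(self):
        changes, self.position = self.master.changes_since(self.position)
        self.applied.extend(changes)  # apply changes in order

master = Master()
slave = Slave(master)
master.record("INSERT id=1")
master.record("UPDATE id=1")
slave.sync()
master.record("DELETE id=1")
slave.sync()
```

Because only the delta since the last pointer position travels over the wire, this beats re-scanning or re-dumping 300,000 rows every day.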
As for schema changes, I think ALTER statements are logged by the master, so the slave(s) can replicate those as well. The mechanism definitely replicates DROP TABLE ... IF EXISTS and CREATE TABLE ... SELECT, so ALTER logically should follow, but I have not tried it.
Here it is: confirmation that ALTER is properly replicated.