PHP / MySQL - Compare tables from 2 different databases

I've got 2 frameworks (Laravel - web, CodeIgniter - API) and 2 different databases. I've built a function (on the API) which detects changes in one database (in 2 tables) and applies the changes to the other database.
Note: there is no way to run both web and API on the same database - so that's why I'm doing this.
Anyway, it is important that every little change is recognized. If it's a new record or a deleted record, it's simple and no problem at all. But if the record exists in both databases, I need to compare the values to detect changes, and this part becomes challenging.
I know how to do this in the slowest and heaviest way (pick each record and compare).
My question is - how do you suggest to make it work in smart and fast way?
Thanks a lot.

As long as the mysql user has select rights on both databases, you can qualify the database in the query like so:
SELECT * FROM `db1`.`table1`;
SELECT * FROM `db2`.`table1`;
It doesn't matter which database was selected when you connected from PHP. The correct database will be used in the query.
The backticks are optional when the database/table name is purely alphanumeric and not an SQL keyword.
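For illustration, a minimal PHP sketch of a cross-database comparison along these lines, assuming both databases sit on the same MySQL server and the table has an id primary key plus a name column to compare (db1, db2, table1 and the column names are placeholders, not taken from the question):
<?php
// One connection is enough; each table is qualified with its database in the SQL.
$pdo = new PDO('mysql:host=localhost;charset=utf8mb4', 'user', 'pass');

// Rows that are new in db1 (missing from db2) or whose name differs between the two.
$sql = "SELECT a.id
        FROM `db1`.`table1` a
        LEFT JOIN `db2`.`table1` b ON b.id = a.id
        WHERE b.id IS NULL OR a.name <> b.name";

foreach ($pdo->query($sql) as $row) {
    echo "Row {$row['id']} is new or changed\n";
}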

Depending on the response time of the 'slave' database, there are two options which don't increase the overhead too much:
If you can combine both databases within the same database by prefixing one or both of the tables, you can use FOREIGN KEYS to let the database do the tough work for you.
Use a TIMESTAMP field, which you can set so the DB updates it itself whenever the row gets updated.
Option 1 would be my best guess, but that might mean a physical change to the running system, and if FOREIGN KEYS are new for you, you might wanna test since they can be a real PITA (IMHO).
Option 2 is easier to implement, but you still have to detect deleted rows manually.
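A rough sketch of option 2, assuming the source table has an updated_at TIMESTAMP column declared with ON UPDATE CURRENT_TIMESTAMP and that the time of the last successful sync is stored somewhere; all names here are invented:
<?php
// $source and $target are PDO connections to the two databases.
$lastSync = '2024-01-01 00:00:00'; // load from wherever the last sync time is persisted

// Only rows touched since the last sync need to be copied across.
$changed = $source->prepare(
    "SELECT id, name, updated_at FROM table1 WHERE updated_at > :last_sync"
);
$changed->execute([':last_sync' => $lastSync]);

$upsert = $target->prepare(
    "INSERT INTO table1 (id, name) VALUES (:id, :name)
     ON DUPLICATE KEY UPDATE name = VALUES(name)"
);
foreach ($changed as $row) {
    $upsert->execute([':id' => $row['id'], ':name' => $row['name']]);
}
// Deleted rows are not caught by this; they still need a separate check, as noted above.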

Related

Simulate MySQL connection to analyze queries to rebuild table structure (reverse-engineering tables)

I have just been tasked with recovering/rebuilding an extremely large and complex website that had no backups and was fully lost. I have a complete (hopefully) copy of all the PHP files, but I have absolutely no clue what the database structure looked like (other than that it was certainly at least 50 or so tables, so fairly complex). All data has been lost and the original developer was fired about a year ago in a fiery feud (so I am told). I have been a PHP developer for quite a while and am plenty comfortable trying to sort through everything and get the application/site back up and running, but the lack of a database will be a huge struggle. So, is there any way to simulate a MySQL connection to some software that will capture all incoming queries and attempt to use the requested field and table names to rebuild the structure?
It seems to me that if I start clicking through the application and it passes a query like
SELECT name, email, phone FROM contact_table WHERE contact_id='1'
...there should be a way to capture that info and assume there was a table called "contact_table" that had at least 4 fields with those names... If I can do that repetitively, each time adding some sample data to the discovered fields and then moving on to another page, then eventually I should have a rough copy of most of the database structure (at least all public-facing parts). This would be MUCH easier than manually reading all the code and pulling out every reference, reading all the joins and subqueries, and sorting through it all manually.
Anyone ever tried this before? Any other ideas for reverse-engineering the database structure from PHP code?
mysql> SET GLOBAL general_log=1;
With this configuration enabled, the MySQL server writes every query to a log file (datadir/hostname.log by default), even those queries that have errors because the tables and columns don't exist yet.
http://dev.mysql.com/doc/refman/5.6/en/query-log.html says:
The general query log can be very useful when you suspect an error in a client and want to know exactly what the client sent to mysqld.
As you click around in the application, it should generate SQL queries, and you can have a terminal window open running tail -f on the general query log. As you see queries that reference tables or columns that don't exist yet, create those tables and columns. Then repeat clicking around in the app.
A number of things may make this task even harder:
If the queries use SELECT *, you can't infer the names of columns or even how many columns there are. You'll have to inspect the application code to see what column names are used after the query result is returned.
If INSERT statements omit the list of column names, you can't know what columns there are or how many. On the other hand, if INSERT statements do specify a list of column names, you can't know if there are more columns that were intended to take on their default values.
Data types of columns won't be apparent from their names, nor string lengths, nor character sets, nor default values.
Constraints, indexes, primary keys, foreign keys won't be apparent from the queries.
Some tables may exist (for example, lookup tables), even though they are never mentioned by name by the queries you find in the app.
Speaking of lookup tables, many databases have sets of initial values stored in tables, such as all possible user types and so on. Without the knowledge of the data for such lookup tables, it'll be hard or impossible to get the app working.
There may have been triggers and stored procedures. Procedures may be referenced by CALL statements in the app, but you can't guess what the code inside triggers or stored procedures was intended to be.
This project is bound to be very laborious, time-consuming, and involve a lot of guesswork. The fact that the employer had a big feud with the developer might be a warning flag. Be careful to set the expectations so the employer understands it will take a lot of work to do this.
PS: I'm assuming you are using a recent version of MySQL, such as 5.1 or later. If you use MySQL 5.0 or earlier, you should just add log=1 to your /etc/my.cnf and restart mysqld.
Crazy task. Is the code such that the DB queries are at all abstracted? Could you replace the query functions with something which would log the tables, columns and keys, and/or actually create the tables or alter them as needed, before firing off the real query?
Alternatively, it might be easier to do some text processing, regex matching, grep/sort/uniq on the queries in all of the PHP files. The goal would be to get it down to a manageable list of all tables and columns in those tables.
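A quick-and-dirty sketch of that text-processing idea; it only catches literal table names that follow FROM/JOIN/INTO/UPDATE inside the PHP sources and will miss anything built dynamically (the path is obviously a placeholder):
<?php
// Crude scan of the code base for table names mentioned in SQL string literals.
$tables = [];
$files  = new RecursiveIteratorIterator(new RecursiveDirectoryIterator('/path/to/site'));

foreach ($files as $file) {
    if (!$file->isFile() || strtolower($file->getExtension()) !== 'php') {
        continue;
    }
    $code = file_get_contents($file->getPathname());
    // Very rough heuristic: grab the word after FROM, JOIN, INSERT INTO or UPDATE.
    if (preg_match_all('/\b(?:FROM|JOIN|INTO|UPDATE)\s+`?(\w+)`?/i', $code, $m)) {
        foreach ($m[1] as $t) {
            $tables[strtolower($t)] = true;
        }
    }
}

$names = array_keys($tables);
sort($names);
echo implode("\n", $names), "\n";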
I once had a similar task, fortunately I was able to find an old backup.
If you could find a way to extract the queries (say, regex match all of the occurrences of mysql_query or whatever extension was used to query the database), you could then use something like php-sql-parser to parse the queries, and hopefully from that you would be able to get a list of most tables and columns. However, that is only half the battle. The other half is determining the data types for every single column, and that would be rather impossible to do automatically from PHP. It would basically require you to inspect the code line by line. There are best practices, but who's to say that the old dev followed them? Whether a column called "date" should be stored as DATE, DATETIME, INT, or VARCHAR(50) with some sort of ugly manual string handling can only be determined by looking at the actual code.
Good luck!
You could build some triggers with the BEFORE action time, but unfortunately this will only work for INSERT, UPDATE, or DELETE commands.
http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html

How should I version my data in an MS SQL shared server environment?

The server is a shared Windows hosting server with Hostgator. We are allowed "unlimited" MS SQL databases and each is allowed "unlimited" space. I'm writing the website in PHP. The data (not the DB schema, but the data) needs to be versioned such that (ideally) my client can select the DB version he wants from a select box when he logs in to the website, and then (roughly once a month) tag the current data, also through a simple form on the website. I've thought of several theoretical ways to do this and I'm not excited about any of them.
1) Put a VersionNumber column on every table; have a master Version table that lists all versions for the select box at login. When tagged, every row without a version number in every table in the db would be duplicated, and the original would be given a version number.
This seems like the easiest idea for both me and my client, but I'm concerned the db would be awfully slow in just a few months, since every table will grow by (at least) its original size every month. There's not a whole lot of data, and there probably never will be, in any one version. But multiplying versions in the same table just scares me.
2) Duplicate the DB every time we tag.
It looks like this would have to be done manually by my client since the server is shared, so I already dislike the idea. But in addition, the old DBs would have to be able to work with the current website code, and as changes are made to the DB structure over time (which is inevitable) the old DBs will no longer work with the new website code.
3) Create duplicate tables (with the version in their name) inside the same database every time we tag. Like [v27_Employee].
The benefit here over idea (1) would be that no table would get humongous in size, allowing the queries to keep up their speed, and over idea (2) it could theoretically be done easily through the simple website tag form rather than manually by my client. The problems are that the queries in my PHP code are going to get all discombobulated as I try to work out which Employee table joins with which Address table depending upon which version is selected, since they all have the same base name but different prefixes; and also that as the code changes, the old DB tables will no longer match, the same problem as (2).
So, finally, does anyone have any good recommendations? Best practices? Things they did that worked in the past?
Thanks guys.
Option 1 is the most obvious solution because it has the lowest maintenance overhead and it's the easiest to work with: you can view any version at any time simply by filtering on VersionNumber in your queries. If you want or need to, this means you could also implement option 3 at the same time by creating views for each version number instead of real tables. If your application only queries one version at a time, consider making the VersionNumber the first column of a clustered primary key, so that all the data for one version is physically stored together.
And it isn't clear how much data you have anyway. You say it's "not a whole lot", but that means nothing. If you really have a lot of data (say, into hundreds of millions of rows) and if you have Enterprise Edition (you didn't say what edition you're using), you can use table partitioning to 'split' very large tables for better performance.
My conclusion would be to do the simplest, easiest thing to maintain right now. If it works fine then you're done. If it doesn't, you will at least be able to rework your design from a simple, stable starting point. If you do something more complicated now, you will have much more work to do if you ever need to redesign it.
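For illustration, reading one version under option 1 might look something like this from PHP, assuming untagged rows carry a NULL VersionNumber and tagged rows carry the version they belong to (the table, columns and the PDO connection are assumptions, not from the original answer):
<?php
// $db is a PDO connection to the shared MS SQL database (e.g. via the sqlsrv driver).
$versionId = isset($_POST['version']) ? (int) $_POST['version'] : null;

if ($versionId === null) {
    // Current, untagged data.
    $stmt = $db->query("SELECT EmployeeId, Name FROM Employee WHERE VersionNumber IS NULL");
} else {
    // A tagged snapshot.
    $stmt = $db->prepare("SELECT EmployeeId, Name FROM Employee WHERE VersionNumber = :version");
    $stmt->execute([':version' => $versionId]);
}
$employees = $stmt->fetchAll(PDO::FETCH_ASSOC);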
You could copy your versionable tables into a new database every month. If you need to do a join between a versionable table and a non-versionable table, you'd need to do a cross-database join, which SQL Server supports. This approach is a bit cleaner than duplicating tables in a single schema, since otherwise your database explorer will start getting unwieldy with all the old tables.
What I finally wound up doing was creating a new schema for each version and duplicating the tables and triggers and keys each time the DB is versioned. So, for example, I had this table:
[dbo].[TableWithData]
And I duplicated it into this table in the same DB:
[v1].[TableWithData]
Then, when the user wants to view old tables, they select which version and my code automatically changes every instance of [dbo] in every query to [v1]. It's conceptually fairly simple and the user doesn't have to do anything complicated to version -- just type in "v1" to a form and hit a submit button. My PHP and SQL does the rest.
I did find that some tables had to remain separate -- I made a different schema called [ctrl] into which I put tables that will not be versioned, like the username / password table for example. That way I just duplicate the [dbo] tables.
It's been operational for a year or so and seems to work well at the moment. They've only versioned maybe 4 times so far. The only problem I seem to have consistently, and can't figure out, is that triggers seem to get lost somehow. That's probably a problem with my very complex PHP rather than the DB versioning concept itself, though.
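A minimal sketch of that schema-swapping idea; the real code is surely more careful, but the core is just a substitution on the schema prefix (the function name here is made up):
<?php
// $version comes from the version form; empty means "current", which lives in [dbo].
function versionQuery($sql, $version)
{
    if ($version === '' || $version === 'dbo') {
        return $sql;
    }
    // Point every [dbo]. reference at the versioned schema instead, e.g. [v1].
    // Tables in the non-versioned [ctrl] schema are left untouched.
    return str_replace('[dbo].', '[' . $version . '].', $sql);
}

echo versionQuery("SELECT * FROM [dbo].[TableWithData]", "v1");
// SELECT * FROM [v1].[TableWithData]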

PHP, MySQL: question on how to "sync" two databases

Note: my question is in the last paragraph.
I have multiple sources of files that get inserted into a database (call it process/database A). These files contain the same type of information but in different formats (i.e. different column headers, orders, number of columns, etc.), but process A puts them into a unified table, and it is nice and neat. I need this data from multiple sources also inserted into another database (process/database B), but I'm not sure what is the best way of doing this. DB B is part of software we use. It is not open-source, but a DB connection can be made.
We already have process A up and running for a while. Process B is something new to improve physical workflow at the warehouse. I think since the data is already unified in process A, it seems to me that I should pull this unified data and insert it into B. This will save me the repetitive work of remapping everything for process B.
My question is, if I want to "sync" these two databases, what would be the optimal approach? It's not exactly "syncing," I suppose, because the two tables (I only need to reference one table in each DB) have different columns. I see these approaches:
Check the entire DBs and pull new data from DB A to insert into DB B. However, DB B has over 50K rows. DB A is much smaller and growing slowly.
Have the user input a date from which to look for new data rows to insert from A to B.
Check the latest date (data rows are dated) in DB B, and insert accordingly (see the sketch below).
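A minimal PHP sketch of approach 3, assuming PDO connections to both databases and invented table/column names:
<?php
// $dbA and $dbB are PDO connections to the two MySQL databases.
$latest = $dbB->query("SELECT MAX(record_date) FROM target_table")->fetchColumn();
$latest = $latest ?: '1970-01-01'; // first run: B is empty, so take everything

$newRows = $dbA->prepare(
    "SELECT id, record_date, payload FROM source_table WHERE record_date > :latest"
);
$newRows->execute([':latest' => $latest]);

$insert = $dbB->prepare(
    "INSERT INTO target_table (source_id, record_date, payload) VALUES (:id, :date, :payload)"
);
foreach ($newRows as $row) {
    $insert->execute([
        ':id'      => $row['id'],
        ':date'    => $row['record_date'],
        ':payload' => $row['payload'],
    ]);
}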
Do you guys have any input? I'm not too familiar with MySQL processing speed, so I'm not sure if approach 1 is a good option. I'm also not sure what conventions (if any) exist for these types of tasks. I imagine it isn't too uncommon a thing to do, but (1) seems to be the most complete way of doing things. Any comments or alternative options are appreciated. I'd like to keep things in PHP as it will be a feature on a web application. TIA!
Use MySQL clustering.
Check it out: http://en.wikipedia.org/wiki/MySQL_Cluster

simple multiple user selections/options

I see many implementations such as the Facebook like, forum karma, mark as read on forum posts and other simple options and selections available to multiple users on a given item.
I know I can implement this in MySQL by creating a table which links, say, post IDs to liker user IDs for a like system.
My problem is, on a page with lots of posts, I will have to make a lookup for every post. I use prepared statements so that makes it faster for me.
Is there another way to implement these systems, if not, are there optimisations like database types or other tweaks that can make this faster?
Basically, is there a powerful, fast implementation of a many-to-many database interaction?
EDIT:
I'm using Opera Mini, so I have issues with the AJAX and JS for commenting.
Right now, I have a table with two columns. One for user id and the other for post id. Both are indexed and are used in foreign key constraints.
I'm thinking of making a compound primary key across the two.
My main issue is the karma. I allow users to vote on each post. The problem is that, for each post, I need to get the total votes and determine whether a user has already voted, so I can allow or prevent further voting.
My site allows many users to host their own sites and so I need to seriously optimize this.
Someone suggested I use memory tables for this.
NOTE:
I can't use memcached.
I strongly suggest using something other than a MySQL DB. I've written an OpenSocial app which had both heavy writes and reads to a database. It all started with a MySQL DB; I even switched to a dedicated master-slave replication setup, but to no avail: it was expensive and it didn't scale very well.
The final solution was to use a NoSQL DB which made the most out of RAM. My choice was MongoDB, which has an active community and solved my problem very well. MongoDB proved to be highly scalable.
Still a little hazy about what you've got so far, but I'll start it off and keep adding stuff if need be:
Make sure you're indexing.
Minimize lookups - once you have the list of posts that will appear on the page, use that list to match the likes, whether the user has viewed the article, etc.
Use numbers - make sure all your comparisons are against numeric columns.
If you're running queries, don't run a single query for each post, e.g. use
SELECT * FROM user_liked WHERE post_id IN (1,2,3)
instead of
SELECT * FROM user_liked WHERE post_id = 1
Is there a LIMIT in your query? Make sure you use it.
De-normalization is not a sin.
You can partition your databases to decrease lookups (e.g. if data is older than 60 days and barely touched, move it to a secondary database/table, so the size of your table is not huge).
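For instance, a rough PHP sketch of keeping it to a couple of queries per page, assuming a user_liked(user_id, post_id) table, a $pdo connection and the IDs of the posts already selected for the page (names are illustrative only):
<?php
// $postIds are the posts already fetched for the current page; $currentUserId is the viewer.
$postIds      = [1, 2, 3];
$placeholders = implode(',', array_fill(0, count($postIds), '?'));

// One query for all like counts on the page.
$counts = $pdo->prepare(
    "SELECT post_id, COUNT(*) FROM user_liked WHERE post_id IN ($placeholders) GROUP BY post_id"
);
$counts->execute($postIds);
$likesByPost = $counts->fetchAll(PDO::FETCH_KEY_PAIR); // post_id => like count

// One more query for which of these posts the current user has already voted on.
$mine = $pdo->prepare(
    "SELECT post_id FROM user_liked WHERE user_id = ? AND post_id IN ($placeholders)"
);
$mine->execute(array_merge([$currentUserId], $postIds));
$alreadyVoted = $mine->fetchAll(PDO::FETCH_COLUMN);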
Philipp Keller wrote a bunch of articles on tag systems based on MySQL a few years ago. Just like liking, tagging establishes a many-to-many relationship between a thing (a tag, an article being liked) and a user. The logic in his articles should be directly applicable to your problem as well.
Check out the comments as well.
http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html
Database Schemas for Tagging solutions
http://www.pui.ch/phred/archives/2005/05/tags-with-mysql-fulltext.html
Abusing the MySQL FULLTEXT indices for tagging and tag search (requires MyISAM, I'd not go there).
http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
Performance Tests of tagging systems

Synchronize Firebird with MySQL in PHP

I have two databases, one is a Firebird database, other is a MySQL database.
Firebird database is the main one where the information changes. I have to synchronize those changes to the other MySQL database.
I have no control over the Firebird one - I can just SELECT from it. I cannot add triggers, events or similar. I have all the control on the MySQL database.
The synchronization has to be done through 'internet' as these two servers are not connected in any way and are on different locations.
Synchronization has to be done in PHP on the server that also hosts the MySQL database.
Currently I just go through every record (every 15 minutes), calculate a hash of each row, compare the two hashes and, if they don't match, update the whole row. It works, but it just seems very wrong and not optimized in any way.
Is there any other way to do this? Am I missing something?
Thank you.
I have done the same thing once, and I don't think there is a generally better solution.
You can only more or less optimize what you have so far. For example:
If some of the tables have a column with the "latest update" information, you can select only those that were changed since the last sync.
You can change the comparison mechanism - instead of comparing and updating whole rows, you can compare individual columns and on the MySQL side update only the changed ones. I believe that it would speed things up in case of MyISAM tables, but probably not if you use InnoDB.
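If such a "latest update" column does exist, a minimal sketch of the first suggestion might look like this; the table, columns, credentials and the use of pdo_firebird are all assumptions, and tables without such a column still need the hash comparison:
<?php
// Pull only rows changed since the last run from Firebird and upsert them into MySQL.
$fb    = new PDO('firebird:dbname=remotehost:/path/to/db.fdb', 'sysdba', 'secret');
$mysql = new PDO('mysql:host=localhost;dbname=mirror;charset=utf8mb4', 'user', 'pass');

$lastRun = '2024-01-01 00:00:00'; // persist this between the 15-minute runs

$rows = $fb->prepare("SELECT ID, NAME, PRICE FROM ITEMS WHERE LAST_CHANGED > ?");
$rows->execute([$lastRun]);

$upsert = $mysql->prepare(
    "INSERT INTO items (id, name, price) VALUES (:id, :name, :price)
     ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)"
);
foreach ($rows as $row) {
    $upsert->execute([
        ':id'    => $row['ID'],
        ':name'  => $row['NAME'],
        ':price' => $row['PRICE'],
    ]);
}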
