Getting a MySQL database difference - php

I have a MySQL database. What I'd like to do is perform an arbitrary action on it, and then figure out what changed. Something like this:
// assume connection to db already established
before();         // saves db state
perform_action(); // does stuff to db
diff();           // prints what happened
I'd want it to output something like:
Row added in table_0 [details]
Row added in table_1 [details]
Row modified in table_5 [details]
Row deleted in table_2 [details]
Any ideas?
To further clarify: You know how on Stack Overflow, if you check a post's edits, you can see red lines/green highlights indicating what's been changed? I want something like that, but for MySQL databases.

Instead of copying your whole database in order to save the state for a later diff, you might be better off using triggers:
http://dev.mysql.com/doc/refman/5.0/en/triggers.html
When you set up appropriate triggers, you can log changes to a table. For example, you can set up a trigger that automatically logs the old values and the new values for every update. To see the changes, query the table that was filled by the trigger.
Of course, the trigger is not restricted to changes made by your application; it will also log updates done by other applications. But this is also the case if you diff the old version of the database with the new version of the database.
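As a rough illustration of the trigger approach, here is a minimal sketch in Python. It uses SQLite in place of MySQL purely so it runs without a server; MySQL triggers use the same OLD./NEW. row aliases and FOR EACH ROW clause, but the exact CREATE TRIGGER syntax differs slightly. The users and change_log tables and the trigger names are made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL so the sketch is self-contained.
# Table and trigger names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE change_log (
    table_name TEXT, action TEXT, row_id INTEGER,
    old_value TEXT, new_value TEXT
);
CREATE TRIGGER users_ins AFTER INSERT ON users FOR EACH ROW BEGIN
    INSERT INTO change_log VALUES ('users', 'added', NEW.id, NULL, NEW.email);
END;
CREATE TRIGGER users_upd AFTER UPDATE ON users FOR EACH ROW BEGIN
    INSERT INTO change_log VALUES ('users', 'modified', OLD.id, OLD.email, NEW.email);
END;
CREATE TRIGGER users_del AFTER DELETE ON users FOR EACH ROW BEGIN
    INSERT INTO change_log VALUES ('users', 'deleted', OLD.id, OLD.email, NULL);
END;
""")

# The "arbitrary action" whose effects we want to see afterwards.
conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
conn.execute("UPDATE users SET email = 'b@example.com' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")

# Reading the log table back gives the diff in the order it happened.
changes = conn.execute(
    "SELECT action, row_id, old_value, new_value FROM change_log ORDER BY rowid"
).fetchall()
for action, row_id, old, new in changes:
    print(f"Row {action} in users [id={row_id}, old={old}, new={new}]")
```

Clearing change_log before the action and reading it afterwards gives roughly the before()/diff() behaviour asked for, without copying any data up front.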

I think normally your application would log any interesting changes as it makes them. Or you would set up history tables for everything with datetimes.
To do it the way you describe, you could dump the contents of the database into a file before and after your action and do a diff on the two files. In PHP, you can check out xdiff: http://us.php.net/manual/en/book.xdiff.php
If this is something you're doing only occasionally in controlled circumstances to test some queries you're not sure about, you can dump and diff on the command line.
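The dump-and-diff idea can also be done in memory rather than on dump files. The sketch below is one possible shape, written in Python with SQLite standing in for MySQL. The helper names snapshot and diff_snapshots are hypothetical, and it assumes the first column of every table is its primary key.

```python
import sqlite3

def snapshot(conn, tables):
    """Read every row of each table into {table: {primary_key: row_dict}}.

    Assumes the first column of each table is the primary key.
    """
    state = {}
    for table in tables:
        cur = conn.execute(f"SELECT * FROM {table}")
        cols = [d[0] for d in cur.description]
        state[table] = {row[0]: dict(zip(cols, row)) for row in cur}
    return state

def diff_snapshots(before, after):
    """Compare two snapshots and describe added, deleted and modified rows."""
    report = []
    for table in sorted(set(before) | set(after)):
        old_rows = before.get(table, {})
        new_rows = after.get(table, {})
        for pk in sorted(new_rows.keys() - old_rows.keys()):
            report.append(f"Row added in {table} [{new_rows[pk]}]")
        for pk in sorted(old_rows.keys() - new_rows.keys()):
            report.append(f"Row deleted in {table} [{old_rows[pk]}]")
        for pk in sorted(old_rows.keys() & new_rows.keys()):
            if old_rows[pk] != new_rows[pk]:
                report.append(
                    f"Row modified in {table} [{old_rows[pk]} -> {new_rows[pk]}]"
                )
    return report

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_0 (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO table_0 VALUES (1, 'old'), (2, 'gone')")

before = snapshot(conn, ["table_0"])                       # before()
conn.execute("UPDATE table_0 SET name = 'new' WHERE id = 1")  # perform_action()
conn.execute("DELETE FROM table_0 WHERE id = 2")
conn.execute("INSERT INTO table_0 VALUES (3, 'fresh')")
report = diff_snapshots(before, snapshot(conn, ["table_0"]))  # diff()
print("\n".join(report))
```

This reads every row twice, so it is only practical for small databases or test fixtures.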

One way is to parse the log files, which will give you the exact SQL statements executed in your database. I'm not exactly sure how to separate SQL statements made by your application from those made by other applications (if that's the case).

The only thing I can think of is to do some combination of a few somewhat hacky things:
Save a [temporary?] table of row IDs, to check for new rows. If you need to know what was in deleted or modified rows before, you'll need to copy the whole DB, which would be rather messy.
Have each row carry a datestamp that gets modified on update; grab rows whose updated datestamp is newer than when the analysis started.
Have a layer between your application and the database (if you have something like the classic $db->query(), it would make this easy), log queries sent, which can then be looked at.
I suppose the real question is whether you want to know what queries are being executed against the DB, or whether you want to know what the queries you're running are actually doing.
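The "layer between your application and the database" option might look something like the sketch below. It is written in Python with SQLite as a stand-in for MySQL; the LoggingDB class and its query method are hypothetical names modelled on the $db->query() pattern mentioned above.

```python
import sqlite3

class LoggingDB:
    """Thin wrapper that records each statement before running it."""

    def __init__(self, conn):
        self.conn = conn
        self.log = []  # list of (sql, params) in execution order

    def query(self, sql, params=()):
        self.log.append((sql, tuple(params)))
        return self.conn.execute(sql, params)

db = LoggingDB(sqlite3.connect(":memory:"))
db.query("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
db.query("INSERT INTO notes (body) VALUES (?)", ("hello",))
db.query("UPDATE notes SET body = ? WHERE id = ?", ("bye", 1))

# Only statements that can change data are interesting for a diff.
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}
writes = [sql for sql, _ in db.log
          if sql.lstrip().split(None, 1)[0].upper() in WRITE_VERBS]
print(writes)
```

This tells you which queries were sent, not which rows they touched, so it answers the first half of the question above rather than the second.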

Related

Simulate MySQL connection to analyze queries to rebuild table structure (reverse-engineering tables)

I have just been tasked with recovering/rebuilding an extremely large and complex website that had no backups and was fully lost. I have a complete (hopefully) copy of all the PHP files; however, I have absolutely no clue what the database structure looked like (other than that it certainly had at least 50 or so tables...so fairly complex). All data has been lost and the original developer was fired about a year ago in a fiery feud (so I am told). I have been a PHP developer for quite a while and am plenty comfortable trying to sort through everything and get the application/site back up and running...but the lack of a database will be a huge struggle. So...is there any way to simulate a MySQL connection to some software that will capture all incoming queries and attempt to use the requested field and table names to rebuild the structure?
It seems to me that if I start clicking through the application and it passes a query for
SELECT name, email, phone FROM contact_table WHERE contact_id='1'
...there should be a way to capture that info and assume there was a table called "contact_table" that had at least 4 fields with those names... If I can do that repetitively, each time adding some sample data to the discovered fields and then moving on to another page, then eventually I should have a rough copy of most of the database structure (at least all public-facing parts). This would be MUCH easier than manually reading all the code and pulling out every reference, reading all the joins and subqueries, and sorting through it all manually.
Anyone ever tried this before? Any other ideas for reverse-engineering the database structure from PHP code?
mysql> SET GLOBAL general_log=1;
With this configuration enabled, the MySQL server writes every query to a log file (datadir/hostname.log by default), even those queries that have errors because the tables and columns don't exist yet.
http://dev.mysql.com/doc/refman/5.6/en/query-log.html says:
The general query log can be very useful when you suspect an error in a client and want to know exactly what the client sent to mysqld.
As you click around in the application, it should generate SQL queries, and you can have a terminal window open running tail -f on the general query log. As you see queries that reference tables or columns that don't exist yet, create those tables and columns. Then repeat clicking around in the app.
A number of things may make this task even harder:
If the queries use SELECT *, you can't infer the names of columns or even how many columns there are. You'll have to inspect the application code to see what column names are used after the query result is returned.
If INSERT statements omit the list of column names, you can't know what columns there are or how many. On the other hand, if INSERT statements do specify a list of column names, you can't know if there are more columns that were intended to take on their default values.
Data types of columns won't be apparent from their names, nor string lengths, nor character sets, nor default values.
Constraints, indexes, primary keys, foreign keys won't be apparent from the queries.
Some tables may exist (for example, lookup tables), even though they are never mentioned by name by the queries you find in the app.
Speaking of lookup tables, many databases have sets of initial values stored in tables, such as all possible user types and so on. Without the knowledge of the data for such lookup tables, it'll be hard or impossible to get the app working.
There may have been triggers and stored procedures. Procedures may be referenced by CALL statements in the app, but you can't guess what the code inside triggers or stored procedures was intended to be.
This project is bound to be very laborious, time-consuming, and involve a lot of guesswork. The fact that the employer had a big feud with the developer might be a warning flag. Be careful to set the expectations so the employer understands it will take a lot of work to do this.
PS: I'm assuming you are using a recent version of MySQL, such as 5.1 or later. If you use MySQL 5.0 or earlier, you should just add log=1 to your /etc/my.cnf and restart mysqld.
Crazy task. Is the code such that the DB queries are at all abstracted? Could you replace the query functions with something which would log the tables, columns and keys, and/or actually create the tables or alter them as needed, before firing off the real query?
Alternatively, it might be easier to do some text processing, regex matching, grep/sort/uniq on the queries in all of the PHP files. The goal would be to get it down to a manageable list of all tables and columns in those tables.
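To give a feel for the text-processing route, here is a deliberately simple Python sketch that guesses tables and columns from query strings. The collect_schema helper is hypothetical, and the regular expressions only cope with single-table SELECT statements and plain comparison operators; joins, subqueries, aliases and INSERT/UPDATE statements would need a real SQL parser.

```python
import re
from collections import defaultdict

# Handles only "SELECT a, b FROM t [WHERE c = ...]" shaped statements.
SELECT_RE = re.compile(
    r"SELECT\s+(?P<cols>.+?)\s+FROM\s+`?(?P<table>\w+)`?"
    r"(?:\s+WHERE\s+(?P<where>.+))?",
    re.IGNORECASE | re.DOTALL,
)
# A word directly followed by a comparison operator is treated as a column.
WHERE_COL_RE = re.compile(r"`?(\w+)`?\s*(?:=|<>|!=|<|>)")
QUOTED_RE = re.compile(r"'[^']*'")

def collect_schema(queries):
    """Return {table: set(columns)} guessed from simple SELECT statements."""
    schema = defaultdict(set)
    for query in queries:
        match = SELECT_RE.search(query)
        if not match:
            continue
        table = match.group("table")
        cols = [c.strip(" `") for c in match.group("cols").split(",")]
        # SELECT * reveals nothing about column names.
        schema[table].update(c for c in cols if c != "*")
        if match.group("where"):
            # Blank out string literals so their contents are not mistaken
            # for column names.
            where = QUOTED_RE.sub("''", match.group("where"))
            schema[table].update(WHERE_COL_RE.findall(where))
    return dict(schema)

schema = collect_schema([
    "SELECT name, email, phone from contact_table WHERE\ncontact_id='1'",
    "SELECT * FROM users",
])
for table, columns in sorted(schema.items()):
    print(table, sorted(columns))
```

The same function could be fed lines from the general query log or strings grepped out of the PHP source; either way the result is a starting list to check by hand, not a finished schema.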
I once had a similar task, fortunately I was able to find an old backup.
If you could find a way to extract the queries, say by regex matching all of the occurrences of mysql_query or whatever extension was used to query the database, you could then use something like php-sql-parser to parse the queries, and hopefully from that you would be able to get a list of most tables and columns. However, that is only half the battle. The other half is determining the data types for every single column, and that would be rather impossible to do automatically from PHP. It would basically require you to inspect it line by line. There are best practices, but who's to say that the old dev followed them? Whether a column called "date" should be stored as DATE, DATETIME, INT, or VARCHAR(50) with some sort of ugly manual string handling can only be determined by looking at the actual code.
Good luck!
You could build some triggers with the BEFORE action time, but unfortunately this will only work for INSERT, UPDATE, or DELETE commands.
http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html

Writing xml feed to database, How do I safely delete old records and update with new?

I am writing information from an XML feed to a database for use on our site. We have found the xml feeds can be inconsistent, so writing info to the database has been a good solution for us.
Ideally I want to cron a file once a day that parses the xml and then writes it to the database. What methodology should I use to eliminate the data from the previous day, since I no longer need it once we cron the file and update with the new daily records?
Bad:
cron file -> delete old records -> write new records
What if the xml is not quite right or there is a problem with the script? Then we blew away the data and can't get any new data at the moment.
If the XML info is bad, at least I can then write in some PHP on the front end to still display the older data but with dates modified or something.
What type of checks and fail safes would be best for my application? I need to update the records each day but only delete the old records if I know for sure we have good new data to import.
I would suggest a backup in the form of a mysql dump. Essentially, the dump is a snapshot of a database at a given time. So if you start the process and something goes wrong, you can revert it back to the point it was at before you started. The workflow would be something along the lines of:
Create dump -> try {Delete old records -> Create new records } catch (Load dump back into database)
If you are using MySQL, more information on dumps can be found at: http://dev.mysql.com/doc/refman/5.1/en/mysqldump.html
Most other databases have some form of dump as well.
Create a guid for your table by hashing a couple of the fields together - whichever ones are persistent between updates. For example, if you are updating inventory you might use the distributor and sku as the input for your guid.
Then when you update, just use a MySQL REPLACE query to exchange the old data for new data, or use INSERT ... ON DUPLICATE KEY UPDATE.
The nice thing about this is if your script fails for some reason you can safely run it again without getting extra rows pushed into your table.
If you are worried about bad XML data being pushed into your db, just validate all your data before pushing it into your table; anything that shouldn't go in gets skipped.
You might want to take a SQL backup at the beginning of the script. If somehow your table gets really messed up, you can always go back and restore from a safe backup.
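Put together, the hashed-key, REPLACE and validate-first suggestions could look roughly like this. The sketch is in Python with SQLite standing in for MySQL (SQLite accepts REPLACE INTO as well). The inventory table, the row_key and is_valid helpers and the validation rule are all assumptions for illustration, and wrapping the writes in one transaction is an extra safeguard that the answers above do not mention.

```python
import hashlib
import sqlite3

def row_key(distributor, sku):
    """Stable key built from the fields that persist between feeds."""
    return hashlib.sha1(f"{distributor}|{sku}".encode("utf-8")).hexdigest()

def is_valid(item):
    """Hypothetical rule: required fields present and price is a number >= 0."""
    try:
        return (bool(item["distributor"]) and bool(item["sku"])
                and float(item["price"]) >= 0)
    except (KeyError, TypeError, ValueError):
        return False

def import_feed(conn, items):
    """Upsert valid items; return how many were skipped as invalid."""
    good = [item for item in items if is_valid(item)]
    with conn:  # one transaction: commit on success, roll back on any error
        for item in good:
            conn.execute(
                "REPLACE INTO inventory (guid, distributor, sku, price) "
                "VALUES (?, ?, ?, ?)",
                (row_key(item["distributor"], item["sku"]),
                 item["distributor"], item["sku"], float(item["price"])),
            )
    return len(items) - len(good)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory "
    "(guid TEXT PRIMARY KEY, distributor TEXT, sku TEXT, price REAL)"
)
feed = [
    {"distributor": "acme", "sku": "A1", "price": "9.50"},
    {"distributor": "acme", "sku": "A2", "price": "oops"},  # fails validation
]
skipped = import_feed(conn, feed)
skipped_again = import_feed(conn, feed)  # safe to re-run: no duplicate rows
row_count = conn.execute("SELECT COUNT(*) FROM inventory").fetchone()[0]
print(skipped, skipped_again, row_count)
```

Because nothing is deleted up front, a bad or empty feed simply leaves yesterday's rows in place; stale rows can be removed in a separate step once the import has succeeded.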

PHP Do something on MySQL Row Added

I need a way to constantly (in a loop) check if a new MySQL row was added, and if so, do something with it, specifically send a notification to the users it pertains to, but I can handle that. I just need to know how to execute code when the number of MySQL rows changes.
You might wish to consider using a MySQL trigger on insert and/or delete:
http://dev.mysql.com/doc/refman/5.0/en/triggers.html
http://net.tutsplus.com/tutorials/databases/introduction-to-mysql-triggers/
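A trigger runs inside the database, so the application still has to notice that something happened. One simple way to do that, shown here as a different technique from the trigger suggestion, is to poll for rows whose auto-increment id is higher than the last one handled. The sketch is Python with SQLite standing in for MySQL; the messages table and the fetch_new_rows and poll_once helpers are hypothetical.

```python
import sqlite3

def fetch_new_rows(conn, last_seen_id):
    """Return rows with an id greater than last_seen_id, oldest first."""
    return conn.execute(
        "SELECT id, recipient, body FROM messages WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()

def poll_once(conn, last_seen_id, notify):
    """Handle any new rows and return the updated high-water mark."""
    for row_id, recipient, body in fetch_new_rows(conn, last_seen_id):
        notify(recipient, body)
        last_seen_id = row_id
    return last_seen_id

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, recipient TEXT, body TEXT)"
)

sent = []  # stands in for the real notification code

def record(recipient, body):
    sent.append((recipient, body))

last_id = poll_once(conn, 0, record)        # nothing to do yet
conn.execute("INSERT INTO messages (recipient, body) VALUES ('sue', 'hi')")
last_id = poll_once(conn, last_id, record)  # picks up the new row
last_id = poll_once(conn, last_id, record)  # no duplicates on the next pass
print(last_id, sent)
```

In a real loop, poll_once would be called repeatedly with a sleep in between, and last_id would be stored somewhere durable so a restart does not resend old notifications.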

Detect target write fields so that they can be backed up and potentially restored

Basically, I am trying to create an interface that will tell an administrator "Hey, we ran this query, and we weren't so sure about it, so if it broke things click here to undo it".
The easiest way I can think to do this is to somehow figure out what tables and cells an identified "risky" query writes to, and store this data along with some bookkeeping data in a "backups" table, so that if necessary the fields can be repopulated with their original contents.
How do I go about figuring out which fields get overwritten by a particular (possibly complicated) MySQL command?
Edit: "risky" in terms of completing successfully but doing unwanted things, not in terms of throwing an error or failing and leaving the system in an inconsistent state.
I suggest the following things:
- add an AFTER UPDATE trigger to every table you want to monitor
- create a copy of every table (example: [yourtable]_backup) you want to monitor
- in all AFTER UPDATE triggers, add code: INSERT INTO yourtable_backup VALUES(OLD.field1, OLD.field2..., OLD.fieldN)
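How it works: the AFTER UPDATE trigger detects an update of the table, and backs up the old values into the backup table.
Note: MySQL triggers fire on both InnoDB and MyISAM tables. InnoDB is still the safer choice here, because the update and the backup insert then happen in the same transaction; MyISAM has no transactions, so a failure partway through can leave the two tables out of step.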
You may add a timestamp field to the backup tables to know when each row was inserted.
Documentation: http://dev.mysql.com/doc/refman/5.5/en/create-trigger.html
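The backup-table steps above, plus the "click here to undo it" part of the question, might be sketched as follows. This is Python with SQLite standing in for MySQL; the pages and pages_backup tables, the trigger name and the undo_last_update helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
CREATE TABLE pages_backup (
    id INTEGER, title TEXT, body TEXT,
    backed_up_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Copy the OLD values aside every time a row in pages is updated.
CREATE TRIGGER pages_au AFTER UPDATE ON pages FOR EACH ROW BEGIN
    INSERT INTO pages_backup (id, title, body)
    VALUES (OLD.id, OLD.title, OLD.body);
END;
""")

def undo_last_update(conn, row_id):
    """Restore a row from its most recent backup; False if none exists."""
    saved = conn.execute(
        "SELECT title, body FROM pages_backup "
        "WHERE id = ? ORDER BY rowid DESC LIMIT 1",
        (row_id,),
    ).fetchone()
    if saved is None:
        return False
    # This UPDATE fires the trigger too, so the undo is itself backed up.
    conn.execute(
        "UPDATE pages SET title = ?, body = ? WHERE id = ?",
        (saved[0], saved[1], row_id),
    )
    return True

conn.execute("INSERT INTO pages VALUES (1, 'Home', 'Welcome')")
conn.execute("UPDATE pages SET body = '' WHERE id = 1")  # the "risky" query
restored = undo_last_update(conn, 1)
current = conn.execute("SELECT title, body FROM pages WHERE id = 1").fetchone()
print(restored, current)
```

Note that this only covers updates. Undoing a risky DELETE or INSERT would need matching AFTER DELETE and AFTER INSERT triggers and a way to tell the three kinds of backup row apart.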

Basic version control for MySQL table

I'm trying to set up a (I thought) fairly simple versioning system for static html pages on a site. The goal is to keep previous versions of the content, then restore to them if needed (I guess basically creating a new version that's a duplicate of an old one), and optionally to toss out data older than X versions ago.
The table's setup is fairly straightforward:
id
reference_id (string/used to determine what page the item pertains to)
content (document/html page sized amount of data)
e_user (user who changed it last)
e_timestamp (when it was changed)
I just want to have something set up to create a previous version for each edit to the content, then be able to restore to it if needed.
What's the best method for accomplishing this? Should everything be in the same table, or spread across a few different ones?
I read through a few pages on the subject, but a lot of them seemed like overkill for what I'm trying to accomplish (ex http://www.jasny.net/articles/versioning-mysql-data/ )
Are there any platforms/guides out there that will help me in this endeavor?
Ideally you would want everything in the same table, with something in your query to get the correct version. Be careful how you do this, though, as an inefficient query will put extra load on your server. If normally you would select a single item like this:
SELECT * FROM your_table WHERE id = 42
This would then become:
SELECT * FROM your_table
WHERE id = 42
AND e_timestamp < '2010-10-12 15:23:24'
ORDER BY e_timestamp DESC
LIMIT 1
Index (id, e_timestamp) to allow this to perform efficiently.
Selecting multiple rows in a single query is more tricky and requires a groupwise-maximum approach but it can be done.
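Both query shapes, the single "version as of a date" lookup and the groupwise-maximum form for many pages at once, are sketched below in Python with SQLite standing in for MySQL. The page_versions table reuses the reference_id and e_timestamp column names from the question; the table name, the index name and the two helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page_versions (
    reference_id TEXT, content TEXT, e_timestamp TEXT
);
CREATE INDEX idx_ref_time ON page_versions (reference_id, e_timestamp);
""")
conn.executemany(
    "INSERT INTO page_versions VALUES (?, ?, ?)",
    [
        ("home", "v1", "2010-10-01 09:00:00"),
        ("home", "v2", "2010-10-12 15:00:00"),
        ("home", "v3", "2010-10-20 08:00:00"),
        ("about", "a1", "2010-10-05 12:00:00"),
    ],
)

def version_as_of(conn, reference_id, timestamp):
    """Content of the newest version saved before the given timestamp."""
    row = conn.execute(
        "SELECT content FROM page_versions "
        "WHERE reference_id = ? AND e_timestamp < ? "
        "ORDER BY e_timestamp DESC LIMIT 1",
        (reference_id, timestamp),
    ).fetchone()
    return row[0] if row else None

def latest_versions(conn):
    """Groupwise maximum: the newest version of every page in one query."""
    return conn.execute(
        "SELECT v.reference_id, v.content "
        "FROM page_versions v "
        "JOIN (SELECT reference_id, MAX(e_timestamp) AS latest "
        "      FROM page_versions GROUP BY reference_id) m "
        "  ON m.reference_id = v.reference_id AND m.latest = v.e_timestamp "
        "ORDER BY v.reference_id"
    ).fetchall()

as_of = version_as_of(conn, "home", "2010-10-12 15:23:24")
latest = latest_versions(conn)
print(as_of, latest)
```

Restoring an old version then amounts to inserting a new row whose content is copied from the version returned by version_as_of.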
You can use a technique called "auditing". You would set up audit tables. Then you would either write it into your code or set up triggers on the DB side so that every time a change is made, an entry is added into the appropriate audit table. Then you can go back through the audit table and see things like:
"Oh, yesterday Sue went in and fixed a typo"
"Uh oh, Steve wiped out an entire paragraph by accident earlier today while trying to rewrite this section"
Your primary table that stores the data doesn't keep all that history, so it can stay slim. If you ever need to look at that data and, say, roll stuff back, you can go look in your audit table and do that. You can set up the audit table however you want, so each audit row can hold the entire content BEFORE the edit, and not just what was edited. That should make "rolling back" fairly easy.
Add a version column and a delete column (bool) and create some functions that compare the versions of rows with the same id. You'll definitely want to be able to easily find the current version and the previous version. To get rid of the data you'll want to write another function that sorts all of the versions of id, figures out which are old enough to be deleted, and marks them for deletion by another function. You'll probably want to have an option to make certain pages immune to deletion or postpone it.
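The pruning function described above could be sketched like this in plain Python, working on rows already fetched from the table. The mark_old_versions name, the row layout (dicts with id, version and deleted keys) and the protected_ids parameter for pages that are immune to deletion are all assumptions made for the example.

```python
from collections import defaultdict

def mark_old_versions(rows, keep, protected_ids=frozenset()):
    """Flag all but the newest `keep` versions of each id for deletion.

    rows: list of dicts with 'id', 'version' and 'deleted' keys.
    Returns the (id, version) pairs that were marked.
    """
    by_id = defaultdict(list)
    for row in rows:
        by_id[row["id"]].append(row)

    marked = []
    for page_id, versions in by_id.items():
        if page_id in protected_ids:
            continue  # this page is immune to pruning
        versions.sort(key=lambda r: r["version"], reverse=True)
        for row in versions[keep:]:
            row["deleted"] = True
            marked.append((page_id, row["version"]))
    return marked

rows = (
    [{"id": "home", "version": v, "deleted": False} for v in range(1, 6)]
    + [{"id": "about", "version": v, "deleted": False} for v in range(1, 3)]
    + [{"id": "legal", "version": v, "deleted": False} for v in range(1, 5)]
)
marked = mark_old_versions(rows, keep=2, protected_ids={"legal"})
print(sorted(marked))
```

A separate job can then delete the rows whose deleted flag is set, which keeps the decision about what to prune apart from the act of removing it.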
