Okay, I know this is kind of a vague question, but I have a requirement where I need to list all the tables and columns that are not being used in the application. I could do a manual code search, but that would take a lot of time. There are 140 tables to check, and the largest table has 90 fields.
Right now I have started searching the code for table names. I have created an Excel sheet with all the table names, and when I find a table in the code I highlight it in green. So the tables are the easier part.
My question really is: is there a way to speed up this process, or are there methods/techniques that can be applied?
Thank you!
It all depends on your app's size and coding technique.
If I had a large application, I would enable the full MySQL log (or hook into whatever database wrapper the app uses, to log the queries), run the application, and then extract the information from the log. However, by doing so you are just moving the problem: instead of worrying about capturing all the queries, you now need to make sure you exercise each and every line of your application code (so you can be sure that nothing escaped and that you have analyzed all the possibilities).
This is in fact called "code coverage analysis" and there are tools which will help you with that.
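For PHP, one such tool is the Xdebug extension's coverage API. A minimal sketch, assuming Xdebug is installed (and, for Xdebug 3, that it runs with xdebug.mode=coverage):

<?php
// Sketch: record which lines of the app actually executed, so you know how
// much of the code your query-logging session really exercised.
// Requires the Xdebug extension (xdebug.mode=coverage under Xdebug 3).
xdebug_start_code_coverage();

require 'index.php';   // run the app's entry point (placeholder name)

foreach (xdebug_get_code_coverage() as $file => $lines) {
    printf("%s: %d lines executed\n", $file, count($lines));
}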
This said, I believe that the manual analysis may be quicker for small applications.
I suggest you build a script that performs the job. For example, you can obtain the table list of a database with a query like this:
show tables from YOUR_DATABASE;
More info: http://dev.mysql.com/doc/refman/5.0/en/show-tables.html
And then you can loop over your tables and check their fields using:
show columns from YOUR_TABLE;
More info: http://dev.mysql.com/doc/refman/5.0/en/show-columns.html
Finally, you can search your code for those tables and fields (with grep, for example) and write a log or something similar.
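Putting those pieces together, a minimal sketch of such a script (credentials and the source path are placeholders):

<?php
// Sketch: list every table and column, grep the source tree for each name,
// and report the ones that never appear. Credentials and the source path
// are placeholders.
$db  = new mysqli('localhost', 'user', 'pass', 'YOUR_DATABASE');
$src = '/path/to/app';

foreach ($db->query('SHOW TABLES') as $t) {
    $table = array_values($t)[0];
    $names = [$table];
    foreach ($db->query("SHOW COLUMNS FROM `$table`") as $col) {
        $names[] = $col['Field'];   // SHOW COLUMNS returns the name in 'Field'
    }
    foreach ($names as $name) {
        $hits = [];
        // grep -rlw: recursive, list matching files, whole-word match
        exec('grep -rlw ' . escapeshellarg($name) . ' ' . escapeshellarg($src), $hits);
        if (!$hits) {
            echo "possibly unused: $name (table $table)\n";
        }
    }
}

Generic column names like id or date will match almost everywhere, so treat the output as a shortlist to verify by hand rather than a final answer.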
Related
My problem is I'm working with a HUGE web application (a school system) with no documentation of the internal logic. I need to make a bulk update of a particular value, but I don't know which tables in the MySQL database contain the relevant data to update. The app itself runs on PHP. Is there an easy way to compare the database before and after an operation so I can see which tables are affected? I tried using a diff tool on SQL dumps taken before and after, but the database is so huge that it's really impractical. I'm wondering if there is something better, or if I can configure PHP somehow to log any MySQL operations from whatever file happens to trigger them.
You may want to run the performance tool from MySQL Workbench and look at the performance reports / statement analysis. This will work if you pick a time when the system is not being used, then run some function in the web app that updates the tables with the values you need to change. Look at the performance tables before and after you run your experiment, and look for the SQL statements that show new activity. It's not perfect, but it will at least help you begin to home in on the data you're looking for. The big 'gotcha' here is if the value you want to change is dynamically derived during the query process; then you'll have to understand how the derivation works and find the source columns. But, again, this gives you a brute-force starting place.
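If you'd rather script the same before/after idea than click through Workbench, here is a rough sketch against performance_schema (my assumption: MySQL 5.6+ with performance_schema enabled; credentials are placeholders):

<?php
// Sketch: snapshot the statement digests before and after performing the
// update in the web app, then diff the counters to see which statements
// (and therefore which tables) were touched.
$db = new mysqli('localhost', 'user', 'pass');

function snapshot(mysqli $db): array {
    $rows = [];
    $res  = $db->query("SELECT DIGEST, DIGEST_TEXT, COUNT_STAR
                        FROM performance_schema.events_statements_summary_by_digest");
    foreach ($res as $r) {
        $rows[$r['DIGEST']] = $r;
    }
    return $rows;
}

$before = snapshot($db);
fwrite(STDOUT, "Perform the update in the web app, then press Enter...");
fgets(STDIN);

foreach (snapshot($db) as $digest => $row) {
    $prev = $before[$digest]['COUNT_STAR'] ?? 0;
    if ($row['COUNT_STAR'] > $prev) {
        echo $row['DIGEST_TEXT'], "\n";   // statements that ran during the experiment
    }
}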
So I have an old website which was coded over an extended period of time but has been inactive for three or so years. I have the full PHP source of the site; the problem is that I no longer have a backup of the database. I'm wondering what the best solution for recreating the database would be? It is a large site, so manually going through each PHP file and trying to keep track of which tables are referenced is no small task. I've tried googling for the answer but have had no luck. Does anyone know of any tools that can help extract this information from the PHP and at least give me the basis of a database skeleton? Otherwise, has anyone ever had to do this? Any tips to help me along and possibly speed up the process? It is a MySQL database I'm trying to recreate.
The way I would do it:
Write a stub of mysqli (or whatever interface was used to access the DB) that intercepts all DB accesses.
Replace all DB accesses with your dummy version.
The basic idea is to emulate the DB so that the PHP code runs long enough to trigger the various DB accesses, which in turn will let you analyze the way the DB is built and used (a minimal sketch follows the list below).
From within these dummy functions:
print the SQL code used
regenerate just enough dummy results to let the rest of the code run, based on the tables and fields mentioned in the query parameters and the PHP code that retrieves them (you won't learn much from a SELECT *, but you can see what fields the PHP code expects to get from it)
once you have understood enough of the DB structure, recreate the tables and let the original code work on them little by little
have the previous designer flogged to death for not having provided a way to recreate the DB programmatically
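A minimal sketch of such a dummy layer, assuming the app funnels its queries through a single wrapper function (the names here are hypothetical):

<?php
// Sketch: a stub that stands in for the real query function. It logs every
// SQL statement the app tries to run and returns an empty result set so the
// surrounding PHP keeps executing. Names are hypothetical.
function db_query($sql) {
    file_put_contents('captured_queries.log', $sql . "\n", FILE_APPEND);
    return [];   // empty "result set"; extend with fake rows as you learn the schema
}

// Later, once you know a query's shape, fake just enough of a row:
// if (stripos($sql, 'FROM contact_table') !== false) {
//     return [['contact_id' => 1, 'name' => '', 'email' => '', 'phone' => '']];
// }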
Based on the information you provided, there are currently two possible answers.
1) You can't do this.
PHP is a typeless language. You could scan your SQL statements for field and table names, but the result will not be complete. If there is a SELECT * FROM table, you can't see the fields, so you need to check where the PHP accesses the fields, which may be by name or by index. You can consider yourself lucky if this is done by name, because then you can extract the names of the fields. Finally, the data types will be missing, and so will the information about what the indexes are on, what the primary keys are, the constraints, etc.
2) Easy, yes you can!
That is, if your PHP uses a modern framework that contains an ORM which created the database for you. In that case the meta information is included in the PHP classes/design.
Just check the framework's manual for how to recreate the database.
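For example, if the app happens to use Doctrine (an assumption on my part; other ORMs have equivalent tools), the schema can be rebuilt from the entity metadata:

<?php
// Sketch: Doctrine's SchemaTool can recreate all tables from the entity
// metadata. Assumes $entityManager is the app's configured EntityManager.
use Doctrine\ORM\Tools\SchemaTool;

$tool     = new SchemaTool($entityManager);
$metadata = $entityManager->getMetadataFactory()->getAllMetadata();
$tool->createSchema($metadata);   // issues the CREATE TABLE statements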
I have just been tasked with recovering/rebuilding an extremely large and complex website that had no backups and was fully lost. I have a complete (hopefully) copy of all the PHP files; however, I have absolutely no clue what the database structure looked like (other than that it certainly had at least 50 or so tables, so it's fairly complex). All data has been lost, and the original developer was fired about a year ago in a fiery feud (so I am told). I have been a PHP developer for quite a while and am plenty comfortable trying to sort through everything and get the application/site back up and running, but the lack of a database will be a huge struggle. So: is there any way to simulate a MySQL connection with some software that will capture all incoming queries and attempt to use the requested field and table names to rebuild the structure?
It seems to me that if I start clicking through the application and it issues a query like
SELECT name, email, phone FROM contact_table WHERE contact_id = '1'
...there should be a way to capture that info and assume there was a table called "contact_table" that had at least four fields with those names... If I can do that repetitively, each time adding some sample data to the discovered fields and then moving on to another page, then eventually I should have a rough copy of most of the database structure (at least all the public-facing parts). This would be MUCH easier than manually reading all the code, pulling out every reference, reading all the joins and subqueries, and sorting through it all by hand.
Anyone ever tried this before? Any other ideas for reverse-engineering the database structure from PHP code?
mysql> SET GLOBAL general_log=1;
With this configuration enabled, the MySQL server writes every query to a log file (datadir/hostname.log by default), even those queries that have errors because the tables and columns don't exist yet.
http://dev.mysql.com/doc/refman/5.6/en/query-log.html says:
The general query log can be very useful when you suspect an error in a client and want to know exactly what the client sent to mysqld.
As you click around in the application, it should generate SQL queries, and you can keep a terminal window open running tail -f on the general query log. As you see queries run by the app that reference tables or columns that don't exist yet, create those tables and columns. Then repeat clicking around in the app.
A number of things may make this task even harder:
If the queries use SELECT *, you can't infer the names of columns or even how many columns there are. You'll have to inspect the application code to see what column names are used after the query result is returned.
If INSERT statements omit the list of column names, you can't know what columns there are or how many. On the other hand, if INSERT statements do specify a list of column names, you can't know if there are more columns that were intended to take on their default values.
Data types of columns won't be apparent from their names, nor will string lengths, character sets, or default values.
Constraints, indexes, primary keys, and foreign keys won't be apparent from the queries.
Some tables may exist (for example, lookup tables) even though they are never mentioned by name in the queries you find in the app.
Speaking of lookup tables, many databases have sets of initial values stored in tables, such as all possible user types and so on. Without the knowledge of the data for such lookup tables, it'll be hard or impossible to get the app working.
There may have been triggers and stored procedures. Procedures may be referenced by CALL statements in the app, but you can't guess what the code inside triggers or stored procedures was intended to be.
This project is bound to be very laborious, time-consuming, and involve a lot of guesswork. The fact that the employer had a big feud with the developer might be a warning flag. Be careful to set the expectations so the employer understands it will take a lot of work to do this.
PS: I'm assuming you are using a recent version of MySQL, such as 5.1 or later. If you use MySQL 5.0 or earlier, you should just add log=1 to your /etc/my.cnf and restart mysqld.
Crazy task. Is the code such that the DB queries are at all abstracted? Could you replace the query functions with something that would log the tables, columns, and keys, and/or actually create the tables or alter them as needed, before firing off the real query?
Alternatively, it might be easier to do some text processing (regex matching, grep/sort/uniq) on the queries in all of the PHP files. The goal would be to get it down to a manageable list of all tables and the columns in those tables.
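A rough sketch of that text-processing approach, assuming the queries appear as string literals in the source (dynamically concatenated SQL will slip through):

<?php
// Sketch: scan every PHP file for SQL-looking clauses and collect table
// names. The path is a placeholder, and the regex is deliberately naive:
// it misses subqueries, aliases, and multi-table clauses.
$it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('/path/to/site')
);
$tables = [];
foreach ($it as $file) {
    if ($file->getExtension() !== 'php') continue;
    $code = file_get_contents($file->getPathname());
    if (preg_match_all('/\b(?:FROM|JOIN|INTO|UPDATE)\s+`?(\w+)`?/i', $code, $m)) {
        $tables = array_merge($tables, $m[1]);
    }
}
$tables = array_unique($tables);
sort($tables);
echo implode("\n", $tables), "\n";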
I once had a similar task; fortunately, I was able to find an old backup.
If you could find a way to extract the queries (say, by regex-matching all occurrences of mysql_query or whatever extension was used to query the database), you could then use something like php-sql-parser to parse the queries, and hopefully from that you would be able to get a list of most tables and columns. However, that is only half the battle. The other half is determining the data types for every single column, and that would be rather impossible to do automatically from PHP. It would basically require you to inspect it line by line. There are best practices, but who's to say that the old dev followed them? Determining whether a column called "date" should be stored as DATE, DATETIME, INT, or VARCHAR(50) with some sort of ugly manual string handling can only be done by looking at the actual code.
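A minimal sketch using the greenlion/PHP-SQL-Parser library (assuming it is installed via Composer; the exact parse-tree layout may vary between versions):

<?php
// Sketch: parse an extracted query and pull out the referenced tables and
// columns from the parse tree.
require 'vendor/autoload.php';

use PHPSQLParser\PHPSQLParser;

$parser = new PHPSQLParser();
$tree   = $parser->parse("SELECT name, email FROM contact_table WHERE contact_id = 1");

foreach ($tree['FROM'] as $from) {
    echo "table: ", $from['table'], "\n";
}
foreach ($tree['SELECT'] as $col) {
    echo "column: ", $col['base_expr'], "\n";
}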
Good luck!
You could build some triggers with the BEFORE action time, but unfortunately this will only work for INSERT, UPDATE, or DELETE commands.
http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html
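For example (the table and audit-log names here are hypothetical, and you would need one trigger per table per action):

<?php
// Sketch: a BEFORE UPDATE trigger that records each write into an audit
// table. Table and column names are hypothetical; INSERT and DELETE need
// their own triggers.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$db->query("CREATE TABLE IF NOT EXISTS write_audit (
    tbl VARCHAR(64) NOT NULL,
    at  TIMESTAMP   NOT NULL DEFAULT CURRENT_TIMESTAMP
)");
$db->query("CREATE TRIGGER contact_bu BEFORE UPDATE ON contact_table
            FOR EACH ROW INSERT INTO write_audit (tbl) VALUES ('contact_table')");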
We have an existing PHP/MySQL app which doesn't have indexes configured correctly (monitoring shows that we do 85% table scans, ouch!)
What is a good process to follow to identify where we should be putting our indexes?
We're using PHP (Kohana using ORM for the DB access), and MySQL.
The answer likely depends on many things. For example, your strategy might be different if you want to optimize SELECTs at all costs, or whether INSERTs are important to you as well. You would do well to read a MySQL performance tuning book or website; there are several decent-to-great ones.
If you have a Slow Query Log, check it to see if there are particular queries that are causing problems. http://dev.mysql.com/doc/refman/5.5/en/slow-query-log.html
If you know the types of queries you'll be running or have identified problematic queries via the Slow Query Log or other mechanisms, you can then use the EXPLAIN command to get some stats on those queries. http://dev.mysql.com/doc/refman/5.5/en/explain.html
Once you have the output from EXPLAIN, you can use it to optimize. See http://dev.mysql.com/doc/refman/5.5/en/using-explain.html
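As a starting point, a small sketch that runs EXPLAIN over candidate queries and flags full table scans (credentials and the query list are placeholders):

<?php
// Sketch: run EXPLAIN on candidate queries and flag full table scans
// (type = ALL, no key used).
$db      = new mysqli('localhost', 'user', 'pass', 'mydb');
$queries = [
    "SELECT * FROM users WHERE email = 'x@example.com'",
];
foreach ($queries as $sql) {
    foreach ($db->query("EXPLAIN $sql") as $row) {
        if ($row['type'] === 'ALL' && empty($row['key'])) {
            echo "full scan on {$row['table']}: $sql\n";
            // Likely fix: ALTER TABLE users ADD INDEX idx_email (email);
        }
    }
}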
Indexes are not just for primary keys or unique keys. If there are any columns in your table that you search by, you should almost always index them.
I think this will help you with your database problems.
http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
I have a pretty large social-network-type site that I have been working on for about 2 years (high traffic and hundreds of files). I have been experimenting for the last couple of years with tweaking things for maximum performance under that traffic, and I have learned a lot. Now I have a huge task: I am planning to completely re-code my social network, so I am re-designing the MySQL DBs and everything.
Below is a photo I made up of a couple of the MySQL tables that I have a question about. I currently have the login table, which is used in the login process; once a user is logged into the site, they very rarely need to hit that table again unless they're editing an email or password. I then have a user table, which holds basically the user's settings and profile data for the site. This is where I have questions: would it give better performance to split the user table into smaller tables? For example, if you view the user table you will see several fields that I have marked as "setting_"; should I just create a separate settings table? I also have fields marked with "count", which hold the total counts of comments, photos, friends, mail messages, etc. So should I create another table to store just the total counts of things?
The reason I have them all in one table now is that I was thinking it might be better to cut down on MySQL queries: instead of hitting 3 tables to get information on every page load, I could hit 1.
Sorry if this is confusing, and thanks for any tips.
(Table diagram: http://img2.pict.com/b0/57/63/2281110/0/800/dbtable.jpg)
As long as you don't SELECT * FROM your tables, having 2 or 100 fields won't affect performance.
Just SELECT only the fields you're going to use and you'll be fine with your current structure.
should I just create a seperate setting table?
So should I create another table to store just the total count of things?
There is not a single correct answer for this; it depends on what your application is doing.
What you can do is measure, and extrapolate the results, in a dev environment.
On the one hand, using a separate table will save you some space and the code will be easier to modify.
On the other hand, you may lose some performance (as you already suspect) by having to join information from different tables.
About the counts, I think it's fine to have them there; although it is often said that it's better to calculate this kind of thing, I don't think in this situation it hurts you at all.
But again, the only way to know what's better for you and your specific app is to measure, profile, and find out what the benefit of doing so is. You might only gain a 2% improvement.
You'll need to compare performance testing results between the following:
Leaving it alone
Breaking it up into two tables
Using different queries to retrieve the login data and profile data (if you're not doing this already) with all the data in the same table
Also, you could implement some kind of caching strategy on the profile data if the usage data suggests this would be advantageous.
You should consider putting the counter columns and frequently updated timestamps in their own table: every time you bump them, the entire row is rewritten.
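A sketch of that split (the table and column names are hypothetical):

<?php
// Sketch: move hot counters into their own narrow table so bumping a counter
// rewrites a tiny row instead of the whole wide user row.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$db->query("CREATE TABLE user_counts (
    user_id        INT UNSIGNED NOT NULL PRIMARY KEY,
    friend_count   INT UNSIGNED NOT NULL DEFAULT 0,
    comment_count  INT UNSIGNED NOT NULL DEFAULT 0,
    last_active_at TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
                                ON UPDATE CURRENT_TIMESTAMP
)");
// Bumping a counter now touches only this small fixed-length row:
$db->query("UPDATE user_counts SET comment_count = comment_count + 1 WHERE user_id = 42");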
I wouldn't consider your user table terribly large in number of columns; just my opinion. I also wouldn't break that table into multiple tables unless you can find a case for removing redundancy. Perhaps you have a lot of users who share the same settings; that would be a case for breaking the table out.
You should take into account the average size of a single row, in order to find out whether retrieval is expensive. Also, you should try to use indexes when looking up data...
The most important thing is to design properly, not just to split because "it looks large". Maybe the IP or IPs could go somewhere else; it depends on the data saved there.
Also, since the social network site using this data also handles authentication and authorization processes (I'd guess so), the separation between the login and user tables should offer good performance, because the data in login is "short enough", while the profile can be accessed just once, immediately after a successful login. Just apply the right tricks to improve DB performance and it's done.
(Remember to visualize tables as entities, and to name each one as an entity, not as a collection of them.)
Two things you will want to consider when deciding whether or not to break up a single table into multiple tables are:
MySQL likes small, consistent datasets. If you can structure your tables so that they have fixed row lengths, that will help performance, at the potential cost of disk space. One common approach, from what I can tell, is to keep fixed-length data in its own table while the variable-length data goes somewhere else.
Joins are in most cases less performant than not joining. If the data currently in your table will normally be accessed all at the same time, then it may not be worth splitting it up, as you will be slowing down both inserts and quite possibly reads. However, if there is some data in that table that does not get accessed as often, then that would be a good candidate for moving out of the table for performance reasons.
I can't find a resource online to substantiate this next statement, but I do recall that in a MySQL performance talk given by Jay Pipes, he said the MySQL optimizer has issues once you get more than 8 joins in a single query (MySQL 5.0.*). I am not sure how accurate that magic number is, but regardless, joins will usually take longer than queries against a single table.