Let's say I have a general DB class with a query method that is used all over my source code, and I want to find out which table names are used in my MySQL queries.
The limitation is that I'm using MySQL 5.5.36.
Also, let's assume that we are talking about millions of tables, so querying the MySQL information schema is not an option.
What I would like to know: is there an easy way to get the table names used in a query?
EXPLAIN is obviously good for SELECT statements, but since this is MySQL 5.5.36 I can't use it on REPLACE, UPDATE, INSERT, etc.
PDOStatement::getColumnMeta might help with getting a table name, but it only works with queries that return a result set.
Some kind of regexp might be possible, but I very much doubt it is a good solution: my queries are big, have multiple JOINs, etc., so the regexp would be very complicated and would probably fail a fair percentage of the time.
Any other ideas?
Related
So, the context is: I have a site in which many pages may need the information about one table, say for instance, 'films'. This table has many fields, like title, language, year, description, director... And perhaps in one page I need only the title and the id of some rows and in another I also need the description.
So the question is: should I code a database manager (I am using MySQL) that retrieves all the fields of the rows that satisfy a condition (I guess the WHERE clause should be passed as a parameter)? Or should I be able to specify which fields are needed? I think this cannot be done easily with mysqli (because prepared statements require specifying the number of fetched fields beforehand), so for this to work I would need to use PDO instead, which I haven't used yet. Is this last approach worth it? Or is there not really a big difference in performance if I retrieve the whole information about those rows?
Thank you in advance.
Based upon the comments above, my answer to your question(s) is:
Retrieving some fields vs all fields isn't a real performance consideration until you are dealing with one or more CLOB/TEXT columns which have a lot of text in them. Good database practice indicates you should always specify which fields are returned from a query.
Any query against any table should have a where clause to restrict the number of rows returned. Especially if you are looking to query exactly one row.
Your question implies you are writing a wrapper layer around the queries to hide this complexity. Don't do this. Get an existing PHP library that does this work for you. See, for example, "Good PHP ORM Library?". There are a number of subtle issues, such as security, which you are likely to overlook.
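To illustrate only the first two points (an existing library is still the better choice), here is a minimal sketch of a query that names just the fields a page needs. The helper buildSelect, the `id` key column and the column whitelist are assumptions made up for this example; they are not from the question.

```php
<?php
// Hypothetical helper: builds a SELECT that lists only the requested columns.
// $allowed is a whitelist, so column names are never taken from user input.
function buildSelect(string $table, array $columns, array $allowed): string
{
    $unknown = array_diff($columns, $allowed);
    if ($columns === [] || $unknown !== []) {
        throw new InvalidArgumentException('Unknown or empty column list');
    }
    $quoted = array_map(function ($c) { return "`$c`"; }, $columns);
    // Assumption: rows are looked up by an `id` column, bound later as a parameter.
    return 'SELECT ' . implode(', ', $quoted) . " FROM `$table` WHERE `id` = ?";
}

// Column names taken from the 'films' example in the question.
$filmColumns = ['id', 'title', 'language', 'year', 'description', 'director'];
$sql = buildSelect('films', ['id', 'title'], $filmColumns);
```

The resulting string can be handed to a prepared statement (PDO::prepare or mysqli::prepare), with the id bound to the placeholder.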
I have just been tasked with recovering/rebuilding an extremely large and complex website that had no backups and was fully lost. I have a complete (hopefully) copy of all the PHP files; however, I have absolutely no clue what the database structure looked like (other than that it certainly had at least 50 or so tables, so it was fairly complex). All data has been lost, and the original developer was fired about a year ago in a fiery feud (so I am told). I have been a PHP developer for quite a while and am plenty comfortable trying to sort through everything and get the application/site back up and running, but the lack of a database will be a huge struggle. So, is there any way to simulate a MySQL connection to some software that will capture all incoming queries and attempt to use the requested field and table names to rebuild the structure?
It seems to me that if I start clicking through the application and it passes a query for
SELECT name, email, phone FROM contact_table WHERE contact_id='1'
...there should be a way to capture that info and assume there was a table called "contact_table" that had at least 4 fields with those names... If I can do that repetitively, each time adding some sample data to the discovered fields and then moving on to another page, then eventually I should have a rough copy of most of the database structure (at least all public-facing parts). This would be MUCH easier than manually reading all the code and pulling out every reference, reading all the joins and subqueries, and sorting through it all manually.
Anyone ever tried this before? Any other ideas for reverse-engineering the database structure from PHP code?
mysql> SET GLOBAL general_log=1;
With this configuration enabled, the MySQL server writes every query to a log file (datadir/hostname.log by default), even those queries that have errors because the tables and columns don't exist yet.
http://dev.mysql.com/doc/refman/5.6/en/query-log.html says:
The general query log can be very useful when you suspect an error in a client and want to know exactly what the client sent to mysqld.
As you click around in the application, it should generate SQL queries, and you can have a terminal window open running tail -f on the general query log. As you see queries that reference tables or columns that don't exist yet, create those tables and columns. Then repeat clicking around in the app.
A number of things may make this task even harder:
If the queries use SELECT *, you can't infer the names of columns or even how many columns there are. You'll have to inspect the application code to see what column names are used after the query result is returned.
If INSERT statements omit the list of column names, you can't know what columns there are or how many. On the other hand, if INSERT statements do specify a list of column names, you can't know if there are more columns that were intended to take on their default values.
Data types of columns won't be apparent from their names, nor string lengths, nor character sets, nor default values.
Constraints, indexes, primary keys, foreign keys won't be apparent from the queries.
Some tables may exist (for example, lookup tables), even though they are never mentioned by name by the queries you find in the app.
Speaking of lookup tables, many databases have sets of initial values stored in tables, such as all possible user types and so on. Without the knowledge of the data for such lookup tables, it'll be hard or impossible to get the app working.
There may have been triggers and stored procedures. Procedures may be referenced by CALL statements in the app, but you can't guess what the code inside triggers or stored procedures was intended to be.
This project is bound to be very laborious, time-consuming, and involve a lot of guesswork. The fact that the employer had a big feud with the developer might be a warning flag. Be careful to set the expectations so the employer understands it will take a lot of work to do this.
PS: I'm assuming you are using a recent version of MySQL, such as 5.1 or later. If you use MySQL 5.0 or earlier, you should just add log=1 to your /etc/my.cnf and restart mysqld.
Crazy task. Is the code such that the DB queries are at all abstracted? Could you replace the query functions with something which would log the tables, columns and keys, and/or actually create the tables or alter them as needed, before firing off the real query?
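A rough sketch of that idea, assuming the application funnels its queries through one function. The names loggedQuery and captured_queries.log are invented for this example, and the closure stands in for whatever the real code calls (mysql_query or similar).

```php
<?php
// Hypothetical wrapper: records each SQL string, then hands it to the real query function.
function loggedQuery(string $sql, callable $realQuery, string $logFile)
{
    // Collapse whitespace so a multi-line query is stored as a single log line.
    $line = preg_replace('/\s+/', ' ', trim($sql));
    file_put_contents($logFile, $line . PHP_EOL, FILE_APPEND | LOCK_EX);
    return $realQuery($sql);
}

// Usage with a stand-in for the real query function.
$logFile = sys_get_temp_dir() . '/captured_queries.log';
$fakeQuery = function ($sql) { return false; };
loggedQuery("SELECT name, email, phone\n FROM contact_table WHERE contact_id='1'", $fakeQuery, $logFile);
```

Because the query is logged before it runs, queries against tables that do not exist yet are captured too.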
Alternatively, it might be easier to do some text processing, regex matching, grep/sort/uniq on the queries in all of the PHP files. The goal would be to get it down to a manageable list of all tables and columns in those tables.
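The text-processing route could start from something like the sketch below. It is deliberately naive: the function name guessTableNames and the sample source are invented for illustration, and the pattern misses comma-separated table lists, subqueries and table names built at runtime, so treat its output as a first draft of the table list.

```php
<?php
// Naive sketch: collect identifiers that follow FROM, JOIN, INTO or UPDATE.
function guessTableNames(string $text): array
{
    preg_match_all('/\b(?:FROM|JOIN|INTO|UPDATE)\s+`?([A-Za-z_][A-Za-z0-9_]*)`?/i', $text, $m);
    // Lower-case and de-duplicate, then sort for a stable list.
    $tables = array_unique(array_map('strtolower', $m[1]));
    sort($tables);
    return $tables;
}

// Invented sample standing in for the contents of the PHP files.
$source = <<<'PHP_SOURCE'
$r = mysql_query("SELECT name, email FROM contact_table WHERE contact_id='1'");
$r = mysql_query("UPDATE orders SET status='x' WHERE id=1");
$r = mysql_query("SELECT * FROM orders JOIN contact_table ON orders.cid = contact_table.contact_id");
PHP_SOURCE;

$tables = guessTableNames($source);
```

In practice $source would come from file_get_contents() on each PHP file, and ordinary English text containing "from" or "update" will produce false positives that need manual pruning.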
I once had a similar task; fortunately, I was able to find an old backup.
If you could find a way to extract the queries, say by regex-matching all of the occurrences of mysql_query or whatever extension was used to query the database, you could then use something like php-sql-parser to parse the queries, and hopefully from that you would be able to get a list of most tables and columns. However, that is only half the battle. The other half is determining the data types for every single column, and that would be nearly impossible to do automatically from PHP. It would basically require you to inspect the code line by line. There are best practices, but who's to say that the old dev followed them? Whether a column called "date" should be stored as DATE, DATETIME, INT, or VARCHAR(50) with some sort of ugly manual string handling can only be determined by looking at the actual code.
Good luck!
You could build some triggers with the BEFORE action time, but unfortunately this will only work for INSERT, UPDATE, or DELETE commands.
http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html
Simplified scenario:
I have a table with about 100,000 rows.
I will need to pick about 300-400 rows, based on certain criteria, to display them on a web page.
Considering the above scenario, which one of the below approaches will you recommend?
Approach 1: Use just one database query to select the entire table into one big array of 100,000 rows. Using loops, pick the required 300-400 rows from the array and pass them on to the front-end. Minimum load on the database server, as it's just one query. Puts more load on PHP, as it has to store and search through an array of 100,000 rows.
Approach 2: Using a loop, PHP will generate a new query for each row of required data. Collecting all the data will require 300-400 independent queries. More load on the database server. Compared to approach 1, less load on PHP.
Opinions / thoughts will be appreciated!
100,000 rows is a small amount for an RDBMS like MySQL.
You would do better to fine-tune the DB server.
So I recommend neither 1 nor 2.
Just:
SELECT * FROM `your_table` WHERE `any_field` = 'YOUR CRITERIA' LIMIT 300;
When your data grows beyond 1,000,000 rows, you should think about serious index optimization, and maybe you'll have to create a stored procedure for complicated selects. I assure you it's not PHP's job in any case.
As your question is asked from a performance perspective: both of your approaches would consume some resources. I would still go for approach 1 in this case, as it doesn't query the database again and again, which is what happens if you generate a query for each row (i.e. 300-400 queries). When it comes to designing a huge project, the database always ends up being the bottleneck.
To be honest, neither approach is good. It's good practice to have a good database design and well-chosen queries. What you are trying to achieve could be done with a suitable query.
Using PHP to loop through the data is really a bad idea; after all, a database is designed to perform queries. PHP will need to loop through all the records and can't use an index to speed things up; this is roughly equivalent to a 'table scan' in the database.
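As one possible shape for such a query, assuming the 300-400 rows are identified by a known list of key values (if the criteria fit in an ordinary WHERE condition, a plain filtered query is enough), the rows can be fetched in a single round trip with an IN list. The helper name buildInQuery and the table and column names are placeholders.

```php
<?php
// Hypothetical helper: one query with an IN list instead of one query per row.
function buildInQuery(string $table, string $column, array $ids): array
{
    if ($ids === []) {
        throw new InvalidArgumentException('Need at least one id');
    }
    // One "?" placeholder per value, so the values are bound rather than interpolated.
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    $sql = "SELECT * FROM `$table` WHERE `$column` IN ($placeholders)";
    return [$sql, array_values($ids)];
}

list($sql, $params) = buildInQuery('your_table', 'id', [3, 17, 42]);
// With PDO this would be: $stmt = $pdo->prepare($sql); $stmt->execute($params);
```

This keeps the filtering in the database, where an index on the column can be used, and avoids both the 100,000-row transfer and the 300-400 separate queries.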
In order to get the most performance out of your database, it's important to have a good design and (for example) create indexes on the right columns.
Also, if you haven't decided yet which RDBMS you're going to use: depending on your usage, some databases have more advanced options that can help performance (e.g. PostgreSQL has support for geographical information).
Please provide some actual data (what kind of data will be stored, what kind of fields) and samples of the kind of queries / filters that will need to be performed, so that people will be able to give you an actual answer, not a hypothetical one.
I thought this would be a common problem, but a medium amount of searching has returned nothing. I have several MySQL servers with different tables on them, and each is either a master or a read slave. I would like to route all SELECT and other non-table-modifying queries through a read slave and all others (INSERT, UPDATE, ALTER, etc.) to a master, as well as make sure that the right master/slave combo actually has the tables I am asking about.
TLDR: based on the tables and the query type (read or write) I want to use a different mysql connection.
In order to do this in a clean way I am making a wrapper function which will examine the query and decide which connection to use. This function will contain the gory details about which tables are on which servers and which servers are read slaves or masters. All that I will pass in is a string containing the SQL statement. I know how I can implement this using regular expressions, but I would much prefer to use a built-in function, if one exists, to account for any variability in the SQL syntax or the possibility of statement types I don't know about.
TLDR: is there any way better than regex to get the table name(s) and the query type from a string of an sql query? What is an exhaustive list of the read type sql operations? Does anyone have a good regex already to find the parts of the sql statement for all variations of SQL queries?
I would imagine that many people have faced a similar problem in setting up their systems and at least some of them solved it in a similar way. Help on any of these directions would be greatly appreciated. Thanks for your time.
Use a SQL parser, such as:
http://pear.php.net/package/SQL_Parser
http://sourceforge.net/projects/txtsql/
http://sourceforge.net/projects/osqlp/
http://code.google.com/p/php-sql-parser/
http://www.phpclasses.org/package/5007
http://www.phpclasses.org/package/4916
The way an ORM like Doctrine or Propel would address a situation like this would be to associate each table with a connection and then use a query object to build the resulting SQL.
For example, to build a SELECT query in Doctrine, you might do something like this:
$users = UserTable::getInstance()
    ->createQuery() // SELECT is the default operation.
    ->andWhere('is_active = ?', $active)
    ->execute();
If you use a class to build the query for you in a piecewise method this way, it is fairly trivial to determine the nature of the operation and the primary table for the query.
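To make that concrete without depending on any ORM, here is a minimal sketch of the idea. The class RoutedQuery, the server names and the shape of the $map array are all invented for this example, and the list of read-only statement types is an assumption, not an exhaustive one.

```php
<?php
// Hypothetical minimal query object: the table and operation are set explicitly,
// so no SQL parsing is needed to choose a connection.
class RoutedQuery
{
    public $operation;
    public $table;

    public function __construct(string $operation, string $table)
    {
        $this->operation = strtoupper($operation);
        $this->table = $table;
    }

    public function isRead(): bool
    {
        // Assumption: only these statement types may go to a read slave.
        return in_array($this->operation, ['SELECT', 'SHOW', 'DESCRIBE', 'EXPLAIN'], true);
    }

    // $map is a hypothetical lookup: table => ['master' => name, 'slave' => name].
    public function connectionName(array $map): string
    {
        if (!isset($map[$this->table])) {
            throw new RuntimeException("No server configured for {$this->table}");
        }
        return $map[$this->table][$this->isRead() ? 'slave' : 'master'];
    }
}

$map = ['users' => ['master' => 'db1-master', 'slave' => 'db1-slave']];
$read = (new RoutedQuery('select', 'users'))->connectionName($map);
$write = (new RoutedQuery('UPDATE', 'users'))->connectionName($map);
```

A real implementation would need more care, for example a SELECT ... FOR UPDATE belongs on the master even though it starts with SELECT.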
I have several pieces of PHP code that do jobs MySQL could do,
such as sorting, merging data from different MySQL tables, etc.
Lately, I found out that I can do all of this with one MySQL query.
I am wondering whether it is better to give the jobs MySQL is capable of to MySQL or to PHP,
in terms of efficiency, speed, etc.
Thank You,
If you do it in PHP you are just re-implementing the features that MySQL already has. It's far from the most optimized solution and therefore it is much slower.
You should definitely do it in the SQL query.
Your performance will increase if you let MySQL handle that work.
It will perform better if you do this in MySQL. Firstly, it has optimized sorting algorithms for the data and can utilize the indexes that have been created. Furthermore, if it is merging and filtering, you will end up transferring less data from the database.
Databases are optimized to carry out these functions while retrieving the data. Sorting at the database level is also much easier to read than writing tens of lines of PHP code over lists or collections.
There are ready-made string functions available in MySQL to merge the data while retrieving it from the database.
I definitely would suggest MySQL.
Do it in MySQL. There's no question that it is more efficient. PHP will use much more memory, for one.
No question: MySQL is built for this.
To add something, maybe you'd be interested in building join queries (queries over multiple tables). They are very helpful and really very simple. For instance:
$query = "SELECT DISTINCT post.title as title, post.id as id,
product.imageURL as imageURL, product.dueDate as dueDate
FROM post, product
WHERE post.status='saved'
AND post.productURL=product.linkURL
AND post.userEmail='$session[userEmail]'
AND NOT EXISTS(
SELECT publication.postId FROM publication
WHERE publication.postId=post.id
)
ORDER BY post.id";
This is a simple example from some code I built.
The thing is, it joins 2 different tables on the condition post.productURL=product.linkURL. It also uses negation (NOT EXISTS), which is pretty useful when the set you are looking for is defined not by a condition but by the absence of one.
You can avoid this by building views in MySQL as well.
I'm a newbie myself, so I hope it helps. Cheers.