DATABASE
I have a normalized Postgres 9.1 database in which I have written some functions. One function in particular, "fn_SuperQuery"(param, param, ...), returns SETOF RECORD and should be thought of as a view that accepts parameters. This function has a lot of overhead because it creates several temporary tables while calculating its results, in order to perform well with large data sets.
On a side note, I used to write this query exclusively with WITH queries (CTEs), but I needed the ability to add indexes on some columns for more efficient joins.
PHP
I use PHP strictly to connect to the database, run a query, and return the results as JSON. Each script opens its own connection with a connection string and finishes with a call to pg_close.
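A minimal sketch of one of these gateway scripts looks roughly like this; the connection string, parameters, and the column definition list are placeholders rather than the real ones:

<?php
// gateway.php -- minimal sketch; connection details, parameters, and the
// column definition list are placeholders, not the real ones.
header('Content-Type: application/json');

$conn = pg_connect('host=localhost dbname=mydb user=web password=secret');
if (!$conn) {
    echo json_encode(['error' => 'connection failed']);
    exit;
}

// A SETOF RECORD function needs a column definition list at the call site;
// parameters are passed safely via pg_query_params.
$result = pg_query_params(
    $conn,
    'SELECT * FROM "fn_SuperQuery"($1, $2) AS t(id int, label text)',
    [isset($_GET['p1']) ? $_GET['p1'] : null,
     isset($_GET['p2']) ? $_GET['p2'] : null]
);

$rows = $result ? pg_fetch_all($result) : false;
echo json_encode($rows === false ? [] : $rows);
pg_close($conn);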
FRONTEND
I am using jQuery's .ajax function to call the PHP file and accept the results.
My problem is this:
"fn_SuperQuery"(param,param, ...)" is actually the foundation for several other queries. There are some parts of this application that need to run several queries at once to generate all the necessary information for the end user. Many of these queries rely on the output of "fn_SuperQuery"(param,param, ...)" The overhead in running this query is pretty steep, and the fact that it would return the same data if given the same parameters makes me think that it's dumb to make the user wait for it to run twice.
What I want to do is put the results of "fn_SuperQuery"(param, param, ...) into a temporary table, then run the other queries that require its data, then discard the temporary table.
I understand that PostgreSQL temporary tables are private to the session that creates them, so each session has to issue its own CREATE TEMPORARY TABLE command for each temporary table it uses. If I could get two PHP files to connect to the same database session, then they should both be able to see the temporary table.
Any idea how to do this, or maybe a different approach I have yet to consider?
It may be better to use normal tables; there won't be much difference. You can speed them up by making them unlogged tables.
In 9.3, materialized views would probably be the better choice.
Temporary tables are session-private. If you want to share across different sessions, use normal tables (probably unlogged).
If you are worried about denormalization, the first thing I would look at doing is storing these "temporary" normal tables ;-) in a separate schema. This lets you keep the denormalized (and working-set) data separate for analysis and such, and avoids polluting the rest of your dataset with the denormalized tables.
Alternatively, you could look at other options short of denormalization. For example, if data isn't going to change after a while, you could periodically write summary entries for that unchangeable data. This is not denormalization, since it allows you to purge old detail records down the line if you need to, while keeping certain forms of reporting open.
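As a hedged sketch of that approach (the scratch schema, hash-based table name, and column list below are assumptions, not taken from your application), you could cache each parameter set's output of "fn_SuperQuery" in an unlogged table that any later request can read:

<?php
// cache_superquery.php -- sketch only; the scratch schema, column list, and
// parameter handling are assumptions, not taken from the real application.
$conn = pg_connect('host=localhost dbname=mydb user=web password=secret');

$params = [$_GET['p1'], $_GET['p2']];

// One cache table per distinct parameter set, kept in a separate scratch schema.
$tableName  = 'superquery_' . md5(implode('|', $params));
$cacheTable = 'scratch.' . $tableName;

$exists = pg_query_params(
    $conn,
    "SELECT 1 FROM pg_tables WHERE schemaname = 'scratch' AND tablename = $1",
    [$tableName]
);

if (pg_num_rows($exists) === 0) {
    // UNLOGGED skips WAL, which is fine for disposable working-set data.
    pg_query($conn, "CREATE UNLOGGED TABLE $cacheTable (id int, label text)");
    pg_query_params(
        $conn,
        "INSERT INTO $cacheTable
         SELECT * FROM \"fn_SuperQuery\"($1, $2) AS t(id int, label text)",
        $params
    );
    pg_query($conn, "CREATE INDEX ON $cacheTable (id)");  // for the follow-up joins
}

// Any later request (same or different PHP script) can now read the cached rows.
$rows = pg_fetch_all(pg_query($conn, "SELECT * FROM $cacheTable"));
echo json_encode($rows === false ? [] : $rows);
pg_close($conn);

// A periodic job can DROP the scratch.* tables once they go stale.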
Related
My problem is that I'm using a HUGE web application (a school system) with no documentation of its internal logic. I need to make a bulk update of a particular value, but I don't know which tables in the MySQL database contain the relevant data to update. The app itself runs on PHP. Is there an easy way to compare the database before and after I perform an operation, so I can see which tables are affected? I tried running a diff tool on SQL dumps taken before and after, but the database is so huge that this is really impractical. I'm wondering if there is something better, or whether I can configure PHP to log any MySQL operations along with whatever file triggers them.
You may want to run the performance tool in MySQL Workbench and look at the performance reports / statement analysis. This will work if you pick a time when the system is not being used and then run some function in the web app that updates the tables with the values you need to change. Look at the performance table before and after you run your experiment and look for the SQL statements whose counters changed. It's not perfect, but it will at least help you begin to home in on the data you're looking for. The big 'gotcha' here is if the value you want to change is dynamically derived during the query process; then you'll have to understand how the derivation works and which source columns it uses. But, again, this will give you a brute-force starting place.
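Workbench's statement-analysis reports are built on top of MySQL's performance_schema, so as a rough alternative sketch (assuming MySQL 5.6+ with performance_schema enabled, and placeholder credentials) you can snapshot the statement digests directly before and after the operation and compare:

<?php
// digest_snapshot.php -- sketch; assumes MySQL 5.6+ with performance_schema enabled.
// Run it once before and once after the operation, then compare the two outputs.
$db = new mysqli('localhost', 'root', 'secret', 'performance_schema');

$sql = "SELECT DIGEST_TEXT, COUNT_STAR
        FROM events_statements_summary_by_digest
        WHERE DIGEST_TEXT LIKE 'UPDATE%' OR DIGEST_TEXT LIKE 'INSERT%'
        ORDER BY LAST_SEEN DESC
        LIMIT 50";

foreach ($db->query($sql) as $row) {
    // Statements whose COUNT_STAR grew between the two snapshots touched your data.
    printf("%8d  %s\n", $row['COUNT_STAR'], $row['DIGEST_TEXT']);
}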
I've got a situation where I'm pulling a big data set that needs post-processing on the PHP side for sorting and filtering (it can't all be done in SQL). This gets extremely slow once a user actually starts to sort or filter the data, because we need to run through our nasty SQL and post-processing logic on each subsequent request. For argument's sake, let's just say the SQL and post-processing are 100% needed as-is. What I'm thinking of doing is creating a 'temporary table' (or something similar) with the data from the first run and then querying that temporary table for all successive sorts/filters. I'm wondering what people's thoughts are on this type of situation. I don't think a strict MySQL temp table will work, because those tables are dropped when the connection is dropped and we are not using persistent connections. I looked into Redis, but it didn't look like I could do the type of sorting/filtering needed on a set of objects (sorting an associative array on a specific key that could be a number, a string, etc.). Does anybody have any advice on potential solutions for this type of scenario?
I'm using MySQL, PHP, and Galera Cluster.
Just CREATE TABLE with a "temporary" name, rather than literally CREATE TEMPORARY TABLE.
If you name them something like __tmp_x, it will be obvious what they are and they will be easy to clean up later.
What you're describing sounds like a use-case for a data warehouse, though: a deliberately duplicated copy of your database that you can thrash around in as aggressively as you want without impacting production. You can export from there as required.
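A rough sketch of the cached-table idea, with an invented table layout, a hypothetical build_expensive_result_set() helper standing in for your SQL plus post-processing, and the session user id as the cache key:

<?php
// cached_result.php -- sketch only; the table layout, the build_expensive_result_set()
// helper, and the session key are all hypothetical.
session_start();
$db = new mysqli('localhost', 'app', 'secret', 'appdb');

// One cache table per user; created as a NORMAL table (not TEMPORARY),
// so it survives across requests and non-persistent connections.
$cacheTable = '__tmp_report_' . (int) $_SESSION['user_id'];

if ($db->query("SHOW TABLES LIKE '$cacheTable'")->num_rows === 0) {
    // First request: run the expensive SQL + post-processing once and materialize it.
    $db->query("CREATE TABLE `$cacheTable` (
                    id INT PRIMARY KEY,
                    name VARCHAR(100),
                    amount DECIMAL(10,2),
                    INDEX (name), INDEX (amount)
                )");
    $stmt = $db->prepare("INSERT INTO `$cacheTable` (id, name, amount) VALUES (?, ?, ?)");
    foreach (build_expensive_result_set($db) as $row) {   // hypothetical helper
        $stmt->bind_param('isd', $row['id'], $row['name'], $row['amount']);
        $stmt->execute();
    }
}

// All later sorts/filters hit the small indexed cache table instead.
$order = (isset($_GET['sort']) && $_GET['sort'] === 'amount') ? 'amount' : 'name'; // whitelist
$rows  = $db->query("SELECT * FROM `$cacheTable` ORDER BY `$order`")->fetch_all(MYSQLI_ASSOC);
echo json_encode($rows);

// A cron job (or a logout hook) can DROP TABLE the stale __tmp_report_* tables later.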
I have just been tasked with recovering/rebuilding an extremely large and complex website that had no backups and was completely lost. I have a (hopefully) complete copy of all the PHP files, but I have absolutely no clue what the database structure looked like (other than that it certainly had at least 50 or so tables, so it was fairly complex). All data has been lost, and the original developer was fired about a year ago in a fiery feud (so I am told). I have been a PHP developer for quite a while and am plenty comfortable trying to sort through everything and get the application/site back up and running, but the lack of a database will be a huge struggle. So: is there any way to simulate a MySQL connection with some software that will capture all incoming queries and attempt to use the requested field and table names to rebuild the structure?
It seems to me that if I start clicking through the application and it passes a query like
SELECT name, email, phone FROM contact_table WHERE contact_id = '1'
...there should be a way to capture that info and assume there was a table called "contact_table" that had at least 4 fields with those names... If I can do that repetitively, each time adding some sample data to the discovered fields and then moving on to another page, then eventually I should have a rough copy of most of the database structure (at least all public-facing parts). This would be MUCH easier than manually reading all the code and pulling out every reference, reading all the joins and subqueries, and sorting through it all manually.
Anyone ever tried this before? Any other ideas for reverse-engineering the database structure from PHP code?
mysql> SET GLOBAL general_log=1;
With this configuration enabled, the MySQL server writes every query to a log file (datadir/hostname.log by default), even those queries that have errors because the tables and columns don't exist yet.
http://dev.mysql.com/doc/refman/5.6/en/query-log.html says:
The general query log can be very useful when you suspect an error in a client and want to know exactly what the client sent to mysqld.
As you click around in the application, it should generate SQL queries, and you can keep a terminal window open running tail -f on the general query log. As you see queries run that reference tables or columns that don't exist yet, create those tables and columns. Then repeat clicking around in the app.
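If tailing a file is awkward, a variation is to send the general log to the mysql.general_log table (SET GLOBAL log_output='TABLE') and review it with a small script; this is only a sketch with placeholder credentials:

<?php
// review_general_log.php -- sketch; assumes SET GLOBAL log_output='TABLE' and general_log=1.
$db = new mysqli('localhost', 'root', 'secret', 'mysql');

// Pull the most recent application queries captured by the general log.
$res = $db->query("SELECT event_time, argument
                   FROM mysql.general_log
                   WHERE command_type = 'Query'
                   ORDER BY event_time DESC
                   LIMIT 100");

foreach ($res as $row) {
    // Queries that failed against your skeleton schema point at missing tables/columns.
    echo $row['event_time'] . '  ' . $row['argument'] . "\n";
}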
A number of things may make this task even harder:
If the queries use SELECT *, you can't infer the names of columns or even how many columns there are. You'll have to inspect the application code to see what column names are used after the query result is returned.
If INSERT statements omit the list of column names, you can't know what columns there are or how many. On the other hand, if INSERT statements do specify a list of column names, you can't know if there are more columns that were intended to take on their default values.
Data types of columns won't be apparent from their names, nor string lengths, nor character sets, nor default values.
Constraints, indexes, primary keys, foreign keys won't be apparent from the queries.
Some tables may exist (for example, lookup tables), even though they are never mentioned by name by the queries you find in the app.
Speaking of lookup tables, many databases have sets of initial values stored in tables, such as all possible user types and so on. Without the knowledge of the data for such lookup tables, it'll be hard or impossible to get the app working.
There may have been triggers and stored procedures. Procedures may be referenced by CALL statements in the app, but you can't guess what the code inside triggers or stored procedures was intended to be.
This project is bound to be very laborious, time-consuming, and involve a lot of guesswork. The fact that the employer had a big feud with the developer might be a warning flag. Be careful to set the expectations so the employer understands it will take a lot of work to do this.
PS: I'm assuming you are using a recent version of MySQL, such as 5.1 or later. If you use MySQL 5.0 or earlier, you should just add log=1 to your /etc/my.cnf and restart mysqld.
Crazy task. Is the code such that the DB queries are at all abstracted? Could you replace the query functions with something which would log the tables, columns and keys, and/or actually create the tables or alter them as needed, before firing off the real query?
Alternatively, it might be easier to do some text processing, regex matching, grep/sort/uniq on the queries in all of the PHP files. The goal would be to get it down to a manageable list of all tables and columns in those tables.
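A rough sketch of that text-processing route, assuming the app passes literal SQL strings to mysql_query() (the path is made up and the regexes will miss dynamically built queries):

<?php
// harvest_tables.php -- rough sketch; the path is made up and the regexes
// will miss dynamically assembled SQL, but it gives a first table list.
$tables = [];

$it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator('/path/to/recovered/site')
);

foreach ($it as $file) {
    if ($file->getExtension() !== 'php') continue;
    $src = file_get_contents($file->getPathname());

    // Grab literal SQL strings passed to mysql_query(...).
    if (preg_match_all('/mysql_query\s*\(\s*["\'](.+?)["\']\s*[,)]/s', $src, $m)) {
        foreach ($m[1] as $sql) {
            // Table names tend to follow FROM / JOIN / INTO / UPDATE.
            if (preg_match_all('/\b(?:FROM|JOIN|INTO|UPDATE)\s+`?(\w+)`?/i', $sql, $t)) {
                foreach ($t[1] as $name) $tables[$name] = true;
            }
        }
    }
}

ksort($tables);
print_r(array_keys($tables));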
I once had a similar task; fortunately, I was able to find an old backup.
If you could find a way to extract the queries, say by regex-matching all of the occurrences of mysql_query or whatever extension was used to query the database, you could then use something like php-sql-parser to parse the queries, and hopefully from that you would be able to get a list of most tables and columns. However, that is only half the battle. The other half is determining the data type of every single column, and that would be rather impossible to do automatically from PHP. It would basically require you to inspect the code line by line. There are best practices, but who's to say the old dev followed them? Whether a column called "date" should be stored as DATE, DATETIME, INT, or VARCHAR(50) holding some ugly hand-rolled string format can only be determined by looking at the actual code.
Good luck!
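For the parsing half, a hedged sketch using the greenlion/php-sql-parser Composer package; the exact layout of the returned array can vary between versions:

<?php
// parse_queries.php -- sketch; assumes "composer require greenlion/php-sql-parser".
// The exact shape of the parse array can vary between versions of the library.
require 'vendor/autoload.php';

use PHPSQLParser\PHPSQLParser;

$parser = new PHPSQLParser();
$parsed = $parser->parse("SELECT name, email, phone FROM contact_table WHERE contact_id = '1'");

// Tables referenced in the FROM clause.
foreach ($parsed['FROM'] as $from) {
    echo 'table:  ' . $from['table'] . "\n";
}

// Columns selected -- candidates for columns of that table.
foreach ($parsed['SELECT'] as $col) {
    if ($col['expr_type'] === 'colref') {
        echo 'column: ' . $col['base_expr'] . "\n";
    }
}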
You could build some triggers with the BEFORE action time, but unfortunately this will only work for INSERT, UPDATE, or DELETE commands.
http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html
I have a MySQL table with about 9.5K rows; these won't change much, but I may slowly add to them.
I have a process where, if someone scans a barcode, I have to check whether that barcode matches a value in this table. What would be the fastest way to accomplish this? I should mention there is no pattern to these values.
Here Are Some Thoughts
AJAX call to a PHP file that queries the MySQL table (my thought is this would be the slowest).
Load this MySQL table into an array at login. Then, when scanning, make an AJAX call to a PHP file to check the array.
Load this table into an array at login. When viewing the scanning page, somehow load that array into a JavaScript array and check with JavaScript. (This seems to me to be the fastest because it eliminates the AJAX call and the MySQL query. Would it be efficient to split it into smaller arrays so I don't lag the server and browser?)
Honestly, I'd never load the entire table for anything. All I'd do is make an AJAX request back to a PHP gateway that then queries the database, and returns the result (or nothing). It can be very fast (as it only depends on the latency) and you can cache that result heavily (via memcached, or something like it).
There's really no reason to ever load the entire array for "validation"...
It's much faster to use a well-indexed MySQL table than to look through an array for something.
But in the end it all depends on what you really want to do with the data.
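A hedged sketch of such a gateway, with memcached in front of an indexed lookup (the table, column, and cache settings are placeholders):

<?php
// check_barcode.php -- sketch; the table/column names and cache settings are placeholders.
header('Content-Type: application/json');

$barcode = isset($_GET['barcode']) ? trim($_GET['barcode']) : '';

$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$hit = $cache->get('barcode:' . $barcode);
if ($hit === false) {                       // cache miss: ask the database once
    $db   = new mysqli('localhost', 'app', 'secret', 'appdb');
    $stmt = $db->prepare('SELECT 1 FROM barcodes WHERE code = ? LIMIT 1');  // code is indexed
    $stmt->bind_param('s', $barcode);
    $stmt->execute();
    $stmt->store_result();

    $hit = $stmt->num_rows > 0 ? 'yes' : 'no';
    $cache->set('barcode:' . $barcode, $hit, 300);   // remember the answer for 5 minutes
}

echo json_encode(['valid' => $hit === 'yes']);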
As you mention, your table contains around 9.5K rows. There is no point in loading that data at login or on the scanning page.
Better to index your table and make an AJAX call whenever required.
Best of Luck!!
While 9.5K rows is not that much data, transferring it all would still take some time.
Therefore, and in general, I'd propose running validation of values on the server side. AJAX is the right technology to do this quite easily.
Loading all 9.5K rows only to find one specific row is definitely a waste of resources. Run a SELECT query for the single value.
Exposing PHP functionality at the client side / AJAX
Have a look at the xajax project, which allows you to expose whole PHP classes or single methods as AJAX methods on the client side. Moreover, xajax helps with the exchange of parameters between client and server.
Indexing the searched attributes
Please ensure that the column which holds the barcode value is indexed. If the verification process tends to be slow, look out for MySQL table scans.
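A hedged sketch of checking this, assuming a table and column named barcodes.code (adjust to your schema):

<?php
// check_index.php -- sketch; "barcodes" and "code" are assumed names, and the
// ALTER TABLE should only be run once (or skipped if the index already exists).
$db = new mysqli('localhost', 'app', 'secret', 'appdb');

// Add an index on the barcode column.
$db->query('ALTER TABLE barcodes ADD INDEX idx_code (code)');

// EXPLAIN shows whether a lookup uses the index or falls back to a full table scan.
$plan = $db->query("EXPLAIN SELECT 1 FROM barcodes WHERE code = '4006381333931'")
           ->fetch_assoc();

echo $plan['key'] !== null
    ? "Lookup uses index: {$plan['key']}\n"
    : "Full table scan: type={$plan['type']}, ~{$plan['rows']} rows examined\n";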
Avoiding table scans
To avoid table scans and keep your queries running fast, use fixed-size fields. VARCHAR(), among other types, makes queries slower, since rows no longer have a fixed size. Without fixed-size rows, the database cannot easily predict the location of the next row of the result set. Therefore, use e.g. CHAR(20) instead of VARCHAR().
Finally: Security!
Don't forget that any data transferred to the client side may expose sensitive information. While your 9.5K rows may not be rendered by the client's browser, the rows would exist in the generated HTML page. Using "View source", any user would be able to figure out all valid numbers.
Exposing valid barcode values may or may not be a security problem in your project context.
PS: While not related to your question, I'd suggest using PHPExcel for reading or writing spreadsheet data. Unlike some other solutions, e.g. PEAR-based frameworks, PHPExcel has no dependencies.
I am working on a project that is being built with a standard LAMP stack. Currently, I merely output the results of the query onto the page - the results are not being stored in objects at all.
In the future, I would like to edit the results of the query. I imagine that this would be much easier if the results were stored in PHP objects.
Would it be more beneficial to store the objects themselves in a DB (via serialization/deserialization), or to create the objects when need be (after executing the query) and then destroying them when they are no longer needed?
You'd be better off storing a copy of the results directly in your object, rather than a serialized result handle. Serializing the result handle will NOT preserve locks, server-side variables, table state, transactions, or the data in the result set. MySQL has no provision for storing a connection handle in this fashion, so it would be seen as a regular disconnect, resulting in outstanding queries being cleaned up, variables destroyed, transactions rolled back (or committed), and so on.
As well, the data retrieved by the query is not actually fetched across the connection until you do a fetch_row()-type call, so you'd not even have that in your serialized handle.
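A hedged sketch of that approach, with invented class, query, and column names: fetch the rows once, keep them as plain data inside the object, and the object then serializes cleanly if you ever need it to.

<?php
// ResultSnapshot.php -- sketch; the query and column names are placeholders.
class ResultSnapshot
{
    /** @var array plain rows; safe to serialize, cache, or json_encode */
    private $rows = [];

    public function __construct(mysqli $db, $sql)
    {
        // Copy the data out of the result handle immediately; the handle
        // itself (and the connection) is never stored on the object.
        $result = $db->query($sql);
        while ($row = $result->fetch_assoc()) {
            $this->rows[] = $row;
        }
        $result->free();
    }

    public function rows()
    {
        return $this->rows;
    }
}

// Usage: the snapshot survives serialization because it holds only plain arrays.
$db       = new mysqli('localhost', 'app', 'secret', 'appdb');
$snapshot = new ResultSnapshot($db, 'SELECT id, title FROM articles');
$frozen   = serialize($snapshot);           // or json_encode($snapshot->rows())
print_r(unserialize($frozen)->rows());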
Always create the objects in PHP and destroy them later. To store serialized objects you would need a LONGTEXT or similar field, which is known to be slow and cannot be indexed. If you are always selecting everything, then go ahead, but if you ever use conditions or more advanced queries, you should keep the data in separate columns.
It depends on many factors. If you are running the exact same queries again and again, then yes, store the results in your database. But why serialise them? If you tried object-relational mapping, you could have a much easier-to-maintain query object that you could store in a well-organised relational database.
If you are not running the same queries very often, I would recommend caching the output in another way.
Would it be more beneficial to store the objects themselves in a DB (via serialization/deserialization), or to create the objects when need be (after executing the query) and then destroying them when they are no longer needed?
No. Somebody somewhere has done this for you. What would be beneficial is for you to use an existing ORM. It doesn't matter which one; just pick one and use it. You'll be light-years ahead and get your project out the door in a fraction of the time.
While you're at it, you should use a PHP framework; many of them come coupled with an ORM.