Comparing data to a table in the database - PHP

I receive raw data in CSVs, and upload it to a table in a MySQL database (upon which my website functions). I want to compare a newer CSV to the data I uploaded from an older CSV, and I want to see the differences between the two (basically I want to diff the raw data with the table).
I have PHP, MySQL, and my desktop apps (e.g. Excel) at my disposal. What's the best way to go about this? Possible ways I can think of:
1. Inserting the newer data into a Table_Copy, then somehow diffing the two tables in MySQL.
2. Somehow querying the database against the raw data without having to upload it.
3. Downloading the data from the database into raw CSV format, and then comparing the two raw CSVs using a desktop program.

Why don't you use the WHERE clause to pull only the data that is new? For instance:
select * from table where dateadded > '2011-01-01 18:18:00'
This depends on your table having a dateadded column populated with the date and time the data is added.
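If the table doesn't have such a column yet, one way to add it (a sketch; the table and column names are just examples) is to let MySQL populate it automatically:

ALTER TABLE `table`
  ADD COLUMN dateadded TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP;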

diff <(mysqldump test old_csv --skip-extended-insert) <(mysqldump test new_csv --skip-extended-insert) --side-by-side --suppress-common-lines --width=690 | more

You can use the following approaches:
1) Database table comparison - create a copy of the table and then compare the data.
You can use proprietary tools to do it easily (e.g. EMS Data Comparer).
You can also write some simple queries to achieve this (e.g. select id from table_copy where id not in (select id from `table`))
2) Use a file comparer like WinMerge.
Take a dump of both tables using the same method, and then compare the dumps.
I use both approaches depending on my data size. For smaller data the 2nd approach is good.
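To also catch rows that exist in both tables but differ in content (not just missing ids), a join-based diff along these lines can work; the column names here are only placeholders for your actual columns:

-- rows in table_copy that are new or changed relative to `table`
SELECT c.*
FROM table_copy c
LEFT JOIN `table` t
  ON t.id = c.id
 AND t.col1 = c.col1
 AND t.col2 = c.col2
WHERE t.id IS NULL;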

Related

MySQL database normalization (taken from Excel)

I have imported an SQL database from an Excel sheet, so it's a little bit messy. For example, there's a product field with VARCHAR values such as product8. I would like to grep through this data using some regex, capture the id (8 in this example), and alter the column data types. As of now I would start preg_matching the long and hard PHP way, and I'm curious how database normalization is done right using SQL commands. Thanks for your support in advance.
You can use string functions to pull the IDs:
select right(product, length(product) - 7) as productID from `table`
This will pull the numbers; then you can do whatever you need with them.
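If you want to actually normalize the column rather than compute the id on every read, a rough sketch (assuming every value carries the 7-character product prefix) would be:

ALTER TABLE `table` ADD COLUMN productID INT;

UPDATE `table`
   SET productID = CAST(RIGHT(product, LENGTH(product) - 7) AS UNSIGNED);

-- once the values are verified, the old varchar column can go:
ALTER TABLE `table` DROP COLUMN product;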

How to customize exporting a query result (PHP MySQL) to an Excel file?

I have some questions about customizing an export result (Excel) with PHP. The idea is I want to export a query of my raw data (MySQL table) to an Excel file, but with some customization of the result.
For example, I want the result to be a summary of the table, like the table below:
The 3rd through 7th columns are named based on the last 5 days of my report date.
My idea is:
1. Create a temporary table in the format of the result table I want to generate.
2. Insert my raw data into that table.
3. Delete the table afterwards.
Is that effective? Or is there any better idea?
You can always use a view, which is essentially a stored select statement with your data in there, and which will be updated whenever your tables are updated. Then you can just do a select * from view_name and export that into your Excel file.
Unless the data is very large, there is no need to think about performance.
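A sketch of that approach (the view, table, and column names are placeholders for your own):

CREATE VIEW export_summary AS
SELECT name,
       SUM(amount) AS total
  FROM raw_data
 GROUP BY name;

SELECT * FROM export_summary;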
Edit the data before
You can use a temp table. Depending on the data, this is very fast if you can select and insert the data based on indexes. Then you run a SELECT * FROM tmp_table; and you have all your data.
Edit the data after
You can just join over the different tables, fetch the data, then loop (read as foreach) over the result array, change the data, and export it afterwards.
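A minimal sketch of the "edit the data after" route, writing CSV that Excel can open (connection details, query, and filename are placeholders):

<?php
$db = new mysqli('localhost', 'user', 'pass', 'dbname');
$result = $db->query('SELECT * FROM export_summary');

header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="report.csv"');

$out = fopen('php://output', 'w');
while ($row = $result->fetch_assoc()) {
    // change/transform the row here before writing it out
    fputcsv($out, $row);
}
fclose($out);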

Create a database field vs generating data by coding

I was wondering where the cut-off point is between creating a field in the database to hold your data and generating the data in your code. For example, I need to know a certain value that is generated from two different columns in the database: Column1 - Column2 = Column3. So here is the question: is it better to generate that data in the code, or should I create a Column3, put the data there while populating the DB, and then retrieve it later? In my case the data is a two-digit integer or a single-character string; basically, small data.
I am using the latest MySQL; the programming language is PHP with the mysqli library. Also, this website should not get too much traffic, and the size of the DB will be 200k rows at the most.
This type of attribute (column) is called a derived attribute. You should not put it in the database, as it will increase redundancy. Just store column1 and column2 and calculate the value while fetching, for example like this:
`Column1` - `Column2` as `Column3`
If you don't want to write the query that way every time, create a view with the derived attribute added.
Note: if the calculation is CPU-intensive, you should consider using a cache, and then you must decide how and when that cache will be invalidated.
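A sketch of such a view (the names are placeholders):

CREATE VIEW table_with_diff AS
SELECT Column1,
       Column2,
       Column1 - Column2 AS Column3
  FROM `table`;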
It depends on how resource-hungry the calculation is and how often it's going to be done. In your case it's pretty simple, so storing the difference in a separate column would be overkill. You can do your calculation in the SQL query like this:
SELECT col1, col2, col1-col2 AS col3...

Archive MySQL data using PHP every week

I have a MySQL DB that receives a lot of data from a source once every week, on a certain day of the week at a given time (about 1.2 million rows), and stores it in, let's call it, the "live" table.
I want to copy all the data from "live" table into an archive and truncate the live table to make space for the next "current data" that will come in the following week.
Can anyone suggest an efficient way of doing this? I am really trying to avoid insert into archive_table select * from live. I would like the ability to run this archiver using PHP, so I can't use Maatkit. Any suggestions?
EDIT: Also, the archived data needs to be readily accessible. Since every insert is timestamped, if I want to look for the data from last month, I can just search for it in the archives
The sneaky way:
Don't copy records over. That takes too long.
Instead, just rename the live table out of the way, and recreate:
RENAME TABLE live_table TO archive_table;
CREATE TABLE live_table (...);
It should be quite fast and painless.
EDIT: The method I described works best if you want an archive table per rotation period. If you want to maintain a single archive table, you might need to get trickier. However, if you're just wanting to do ad-hoc queries on historical data, you can probably just use UNION.
If you only wanted to save a few periods worth of data, you could do the rename thing a few times, in a manner similar to log rotation. You could then define a view that UNIONs the archive tables into one big honkin' table.
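A sketch of such a view over a few rotated archive tables (the table names are placeholders):

CREATE VIEW archive_all AS
SELECT * FROM archive_week1
UNION ALL
SELECT * FROM archive_week2
UNION ALL
SELECT * FROM archive_week3;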
EDIT2: If you want to maintain auto-increment stuff, you might hope to try:
RENAME TABLE live TO archive1;
CREATE TABLE live (...);
ALTER TABLE LIVE AUTO_INCREMENT = (SELECT MAX(id) FROM archive1);
but sadly, that won't work, because MySQL doesn't accept a subquery there. However, if you're driving the process with PHP, that's pretty easy to work around, for instance:
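A sketch of that workaround (connection details are placeholders):

<?php
$db = new mysqli('localhost', 'user', 'pass', 'dbname');

// MySQL won't accept a subquery here, so fetch the value first...
$row = $db->query('SELECT MAX(id) AS max_id FROM archive1')->fetch_assoc();
$next = (int)$row['max_id'] + 1;

// ...then splice it into the ALTER TABLE as a literal.
$db->query("ALTER TABLE live AUTO_INCREMENT = $next");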
Write a script to run as a cron job (a sketch follows the steps) to:
Dump the archive data from the "live" table (this is probably more efficient using mysqldump from a shell script)
Truncate the live table
Modify the INSERT statements in the dump file so that the table name references the archive table instead of the live table
Append the archive data to the archive table (again, could just import from dump file via shell script, e.g. mysql dbname < dumpfile.sql)
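A rough sketch of such a script in PHP (credentials, database, and file names are placeholders; note this version truncates only after the archive import has succeeded, which is a little safer than truncating up front):

<?php
$dump = '/tmp/live_dump.sql';

// 1. Dump only the data from the live table (no CREATE TABLE statement).
exec("mysqldump --no-create-info -u user -ppass dbname live_table > $dump");

// 2. Point the INSERT statements at the archive table instead.
exec("sed -i 's/INSERT INTO `live_table`/INSERT INTO `archive_table`/' $dump");

// 3. Append the rows to the archive table.
exec("mysql -u user -ppass dbname < $dump");

// 4. Empty the live table for next week's load.
$db = new mysqli('localhost', 'user', 'pass', 'dbname');
$db->query('TRUNCATE TABLE live_table');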
This would depend on what you're doing with the data once you've archived it, but have you considered using MySQL replication?
You could set up another server as a replication slave, and once all the data has replicated, do your delete or truncate with a SET sql_log_bin = 0 before it, to avoid that statement also being replicated.

How can I search all of the databases on my MySQL server for a single string of information

I have around 150 different databases, with dozens of tables each, on one of my servers. I am looking to see which database contains a specific person's name. Right now, I'm using phpMyAdmin to search each database individually, but I would really like to be able to search all databases and all tables at once. Is this possible? How would I go about doing this?
A solution would be to use the information_schema database to list all databases, all tables, and all fields, and loop over all that...
There is this script that could help with at least part of the work: anywhereindb (quoting):
This code searches all the tables and all the rows and columns in a MySQL database. The code is written in PHP. For faster results, we are only searching in the varchar fields.
But, as Harmen noted, this only works with one database -- which means you'd have to wrap something around it, to loop over each database on your server.
For more information about that, take a look at Chapter 19, INFORMATION_SCHEMA Tables; especially the SCHEMATA table, which contains the names of all databases on the server.
Here's another solution, based on a stored procedure -- which means fewer client/server calls, which might make it faster: http://kedar.nitty-witty.com/miscpages/mysql-search-through-all-database-tables-columns-stored-procedure.php
The right way to go about it would be to NORMALIZE your data in the first place!!!
You say name - but most people have at least two names (a surname and a forename). Are these split up or in the same field? If they are in the same field, what order do they appear in? How are they capitalized?
The most efficient way to try to identify where the data might be would be to write a program in C which sifts the raw data files (while the DBMS is shut down) looking for the data - but that will only tell you what table they appear in.
Failing that, you need to write some PHP which iterates through each database ('SHOW DATABASES' works much like a select statement), then iterates through each table in the database, then generates a SELECT statement filtering on each CHAR or VARCHAR column large enough to hold the name you are looking for (try running 'DESC $table') - along the lines of the sketch below.
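A rough sketch of that (using information_schema.COLUMNS as a shortcut for the SHOW/DESC iteration; credentials and the search term are placeholders):

<?php
$needle = 'John Smith';
$db = new mysqli('localhost', 'user', 'pass');

// every CHAR/VARCHAR column on the server that is wide enough
$cols = $db->query(
    "SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
       FROM information_schema.COLUMNS
      WHERE DATA_TYPE IN ('char', 'varchar')
        AND CHARACTER_MAXIMUM_LENGTH >= " . strlen($needle) . "
        AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema')"
);

while ($col = $cols->fetch_assoc()) {
    $sql = sprintf(
        "SELECT COUNT(*) AS hits FROM `%s`.`%s` WHERE `%s` LIKE '%%%s%%'",
        $col['TABLE_SCHEMA'], $col['TABLE_NAME'], $col['COLUMN_NAME'],
        $db->real_escape_string($needle)
    );
    $hits = $db->query($sql)->fetch_assoc();
    if ($hits['hits'] > 0) {
        echo "{$col['TABLE_SCHEMA']}.{$col['TABLE_NAME']}.{$col['COLUMN_NAME']}: {$hits['hits']} match(es)\n";
    }
}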
Good luck.
C.
The best answer probably depends on how often you want to do this. If it is ad-hoc, once-a-week type stuff, then the above answers are good.
If you want to do this kind of search once a second, consider creating a "data warehouse" database that contains just the tables/columns you want to search (heavily indexed, with a reference back to the source database if needed), populated by a cron job or by stored procedures driven by changes in the 150 databases.
