Prepare and import data into existing database - php

I maintain a PHP application with SQL Server backend. The DB structure is roughly this:
lot
===
lot_id (pk, identity)
lot_code
building
========
building_id (pk, identity)
lot_id (fk)
inspection
==========
inspection_id (pk, identity)
building_id (fk)
date
inspector
result
The database already has lots and buildings and I need to import some inspections. Key points are:
It's a one-time initial load.
Data comes in an Excel file.
The Excel data is unaware of DB autogenerated IDs: inspections must be linked to buildings through their lot_code
What are my options for doing such a data load? The Excel data looks like this:
date       inspector   result   lot_code
========== =========== ======== ========
31/12/2009 John Smith  Pass     987654X
28/02/2010 Bill Jones  Fail     123456B

1) Get the Excel file into a CSV.
2) Import the CSV file into a holding table: see "SQL SERVER – Import CSV File Into SQL Server Using Bulk Insert – Load Comma Delimited File Into SQL Server".
3) Write a stored procedure/script where you declare local variables and loop through each row in the holding table, building out the proper rows in the actual tables. Since this is a one-time load, there is no shame in looping, and you'll have complete control over all the logic.
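A rough T-SQL sketch of steps 2 and 3; the holding table name, file path and column sizes are assumptions, and it assumes one building per lot:
-- Step 2: load the CSV into a holding table
CREATE TABLE holding_inspection (
    [date]    VARCHAR(10),
    inspector VARCHAR(100),
    result    VARCHAR(20),
    lot_code  VARCHAR(20)
);

BULK INSERT holding_inspection
FROM 'C:\import\inspections.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- Step 3: loop over the holding table and build rows in the real tables
DECLARE @date VARCHAR(10), @inspector VARCHAR(100),
        @result VARCHAR(20), @lot_code VARCHAR(20), @building_id INT;

DECLARE holding_cursor CURSOR FOR
    SELECT [date], inspector, result, lot_code FROM holding_inspection;

OPEN holding_cursor;
FETCH NEXT FROM holding_cursor INTO @date, @inspector, @result, @lot_code;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- resolve lot_code to a building (assumes one building per lot)
    SELECT @building_id = b.building_id
    FROM building b
    INNER JOIN lot l ON l.lot_id = b.lot_id
    WHERE l.lot_code = @lot_code;

    INSERT INTO inspection (building_id, [date], inspector, result)
    VALUES (@building_id, CONVERT(DATE, @date, 103), @inspector, @result);

    FETCH NEXT FROM holding_cursor INTO @date, @inspector, @result, @lot_code;
END

CLOSE holding_cursor;
DEALLOCATE holding_cursor;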

Your data would have to have natural primary keys in the data file. It looks like lot_code may be one, but I don't see one for the building table.
Also, you say that inspections are to be related to buildings through lot code, yet the relationship in the tables is between building and inspection.
If the data is modeled correctly, you can import to temp tables and then insert/update the target tables using the natural keys.

I originally added this answer to the question itself. I'm moving it to a proper answer because that's the right place. Please note anyway that the whole business is from 2010, so the information may no longer be relevant.
In case someone else has to do a similar task, these are the steps the data load finally required:
Prepare the Excel file: remove unwanted columns, give proper names to sheets and column headers, etc.
With the SQL Server Import / Export Wizard (32-bit version; the 64-bit version lacks this feature), load each sheet into a (new) database table. The wizard takes care of (most of) the dirty details, including creating the appropriate DB structure.
Log into the database with your favourite client. To make SQL coding easier, I created some extra fields in the new tables.
Start a transaction.
BEGIN TRANSACTION;
Update the auxiliary columns in the newly created tables:
UPDATE excel_inspection$
SET building_id = bu.building_id
FROM building bu
INNER JOIN ....
Insert data in the destination tables:
INSERT INTO inspection (...)
SELECT ...
FROM excel_inspection$
WHERE ....
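For reference, a fleshed-out version of those two statements might look like this (staging column names are assumptions, and it assumes each lot has a single building):
UPDATE e
SET building_id = bu.building_id
FROM excel_inspection$ e
INNER JOIN lot lo ON lo.lot_code = e.lot_code
INNER JOIN building bu ON bu.lot_id = lo.lot_id;

INSERT INTO inspection (building_id, [date], inspector, result)
SELECT e.building_id, e.[date], e.inspector, e.result   -- a CONVERT may be needed if the wizard imported the date as text
FROM excel_inspection$ e
WHERE e.building_id IS NOT NULL;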
Review the results and commit the transaction if everything's fine:
COMMIT;
In my case, SQL Server complained about collation conflicts when joining the new tables with the existing ones. It was fixed by setting an appropriate collation on the new tables, but the method differs by version: in SQL Server 2005 I could simply change the collation from the SQL Server Manager (Click, Click, Save and Done), but in SQL Server 2008 I had to set the collation manually in the Import Wizard ("Edit SQL" button).
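If changing the collation on the imported tables is awkward, a common alternative (sketch only; column names assumed) is to force a collation in the join itself:
SELECT e.lot_code, lo.lot_id
FROM excel_inspection$ e
INNER JOIN lot lo
    ON lo.lot_code = e.lot_code COLLATE DATABASE_DEFAULT;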

Related

What's Better? Multiple Tables having same Entities vs Few Relation Tables having more Records

I'm creating a database in MySQL for a small app.
The problem is that there are many fields that are identical across different tables, like:
Table 1: Municipal Issues
ID,
UserID,
Title,
Location,
Description,
ImageURL
Table 2: Harassment Issues
ID,
UserID,
Title,
Location,
Description,
ImageURL
Table 3: same as above
Both tables have almost the same columns.
I want to ask whether it's better to use relations and create a table for handling IDs and link it with the other details, or whether it's better to create a single table with an extra column for the issue type.
On one hand there'll be too many tables with identical columns.
On the other hand there'll be few tables with too many rows in them.
What will be best for performance: more rows or more tables?
I'm using MySQL.
Firstly, unless you expect millions of records, don't care that much about performance; care more about the structure of your data and how easy it will be to access it. Literally write down a list of the data you plan to extract in your app, e.g. "find all issues today", "find all unresolved issues older than 6 months", and then try to build real SQL queries against your expected structure. If they turn out to be hard to write, try to change the structure.
To answer your question: it depends. The current structure has the following benefits:
It's easy to query a certain type of issues
It's easy to build a PHP application - just make one template form (or model) and then copy-paste it with slight changes for the other tables
In case of performance problems it may be easier to create a cluster by simply putting each table on a different DB server.
and the following downsides:
It's inflexible. Adding a new field that you forgot to add at the beginning will be painful, since you'll have to change 3 (or more) tables and then the same number of places in your app.
Adding new types of issues will be painful and will require creating a new table.
Writing SQL for data like "all unresolved issues (regardless of type)" will require complicated UNIONs. Moreover, these UNIONs will require creating a virtual field with the issue type, otherwise you can't tell which table a given id came from.
The classical DB approach recommends using one table for the common fields and creating derived tables for the fields that differ. So:
issues table should have all common fields and is identified by PK issue_id
municipal_issues uses the foreign key to issues.issue_id and has only the specific fields
harassment_issues uses the foreign key to issues.issue_id and has only the specific fields
Also, the issues table has an issue_type field that takes values such as "harassment", "municipal", etc., and helps in finding the table where the additional data is stored.
This pattern is called "Class Table Inheritance"; you may check out the SQL Antipatterns presentation for more info and other approaches. It solves the flexibility issue and still allows re-creating each of the original tables with a single simple JOIN, which is fast.
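A minimal MySQL sketch of that layout (column names beyond those in the question are assumptions):
CREATE TABLE issues (
    issue_id    INT AUTO_INCREMENT PRIMARY KEY,
    issue_type  ENUM('municipal', 'harassment') NOT NULL,
    user_id     INT NOT NULL,
    title       VARCHAR(255),
    location    VARCHAR(255),
    description TEXT,
    image_url   VARCHAR(255)
) ENGINE=InnoDB;

CREATE TABLE municipal_issues (
    issue_id INT PRIMARY KEY,
    -- fields specific to municipal issues go here
    FOREIGN KEY (issue_id) REFERENCES issues (issue_id)
) ENGINE=InnoDB;

CREATE TABLE harassment_issues (
    issue_id INT PRIMARY KEY,
    -- fields specific to harassment issues go here
    FOREIGN KEY (issue_id) REFERENCES issues (issue_id)
) ENGINE=InnoDB;

-- re-creating the original "municipal issues" view takes one JOIN
SELECT i.*, m.*
FROM issues i
JOIN municipal_issues m ON m.issue_id = i.issue_id;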
Also as a side note you may look into the db schema of bug-trackers like Mantis since this looks like the same domain.

MySQL table from phpMyAdmin's tracking reports

Using phpMyAdmin I can track a certain table's transactions (new inserts, deletes, etc.), but is it possible to change or export it to an SQL table that can be imported into my site using PHP?
While it is not exactly what I was looking for, I found a way to do it for the time being. One of the tables in phpMyAdmin's own database is called pma__tracking; it contains a record of all tables being tracked. One of its columns, the data_sql longtext column, stores each report (in ascending order, which is a bit annoying) in the following format:
# log date username
data definition statement
Just adding it here for future reference.
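For example, pulling the logged statements for one tracked table can look roughly like this (the configuration storage database is usually named phpmyadmin, and the exact columns may vary by phpMyAdmin version):
SELECT version, data_sql
FROM phpmyadmin.pma__tracking
WHERE db_name = 'my_database'
  AND table_name = 'my_table'
ORDER BY version DESC;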

Codeigniter 2.1 - multi language insert

I need to insert data into the DB in two languages, and I am having a bit of a dilemma (the data needs to exist in both languages). Is it better to make the user insert data in both languages at once, or is it better for the user to first insert it in one language and then in the second one? And if the latter is better, what is the most efficient way to do this? How can I list all articles that are not yet inserted in both languages?
DB structure for the articles:
Common table for all articles (same data):
article -> id_article | image | date_created | category_id | subcategory_id
Table where data is different:
article_info -> article_id | name | text | lang_id
If the data must exist in both languages - i.e., the application assumes that if an item exists in one language, then it must exist in the other - then you should design your application so that the user must add them both at once.
When you perform the database writes, you should also be using transactions. This will ensure that either all of your writes succeed, or none of them do. It prevents the database from being left in an indeterminate state with a record for one language but not the other.
Have a look at this CodeIgniter manual page on transactions to get an idea on how they work.
You can also use the insert_batch method in the database class to insert both records at once. I don't know how it works with all database drivers, but the mysqli driver will generate a single query when you use insert_batch, so the entire insert will succeed or the entire insert will fail, similar to what happens with transactions. That said, I would still wrap the call to insert_batch in a transaction block just to be a bit paranoid and future-proof.
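In SQL terms, the combination boils down to roughly this, using the article_info columns from the question (the IDs and values are placeholders):
START TRANSACTION;

-- roughly what a single insert_batch() call issues: one multi-row INSERT covering both languages
INSERT INTO article_info (article_id, `name`, `text`, lang_id)
VALUES (42, 'English name', 'English text', 1),
       (42, 'Second-language name', 'Second-language text', 2);

COMMIT;   -- or ROLLBACK; if anything failed

-- and, for the last part of the question, articles not yet present in both languages
SELECT a.id_article
FROM article a
LEFT JOIN article_info ai ON ai.article_id = a.id_article
GROUP BY a.id_article
HAVING COUNT(DISTINCT ai.lang_id) < 2;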

Which database table schema for storing survey data?

I'm developing software for conducting online surveys. When a lot of users are filling in a survey simultaneously, I'm experiencing trouble handling the high database write load. My current table (MySQL, InnoDB) for storing survey data has the following columns: dataID, userID, item_1 .. item_n. The item_* columns have different data types corresponding to the type of data acquired with the specific items. Most item columns are TINYINT(1), but there are also some TEXT item columns. Large surveys can have more than a hundred items, leading to a table with more than a hundred columns. The user answers around 20 items in one HTTP POST, and the corresponding row has to be updated accordingly. The user may skip a lot of items, leading to a lot of NULL values in the row.
I'm considering the following solution to my write load problem. Instead of having a single table with many columns, I set up several tables corresponding to the data types used, e.g.: data_tinyint_1, data_smallint_6, data_text. Each of these tables would have only the following columns: userID, itemID, value (the value column has the data type corresponding to its table). For one HTTP POST with e.g. 20 items, I then might have to create 19 rows in data_tinyint_1 and one row in data_text (instead of updating one large row with many columns). However, for every item, I need to determine its data type (via two table joins) so I know in which table to create the new row. My Zend Framework based application code will get more complicated with this approach.
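As a sketch, the proposed per-type tables would look something like this (table and column names taken from the description above; the key choice is an assumption):
CREATE TABLE data_tinyint_1 (
    userID INT NOT NULL,
    itemID INT NOT NULL,
    value  TINYINT(1),
    PRIMARY KEY (userID, itemID)
) ENGINE=InnoDB;

CREATE TABLE data_text (
    userID INT NOT NULL,
    itemID INT NOT NULL,
    value  TEXT,
    PRIMARY KEY (userID, itemID)
) ENGINE=InnoDB;

-- answering 20 items then becomes ~20 small inserts instead of one wide UPDATE
INSERT INTO data_tinyint_1 (userID, itemID, value) VALUES (7, 12, 1), (7, 13, 0);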
My questions:
Will my solution be better for heavy write load?
Do you have a better solution?
Since you're getting to the point of abstracting this schema to mimic actual data types, it might stand to reason that you should simply create new table sets per survey instead. The benefit will be that locking will lessen, and you could isolate heavy loads to outside machines if the load becomes unbearable.
The single-survey database structure can then more accurately reflect your real-world conditions and data input handlers. It ought to make your abstraction headaches go away.
There's nothing wrong with creating tables on the fly. In some configurations, soft sharding is preferable.
The obvious solution here would be to use a document database for fast writes and then bulk-insert the answers into MySQL asynchronously, using cron or something like that. You can create a view in the document database for quick statistics, but allow filtering and other complicated stuff only in MySQL if you're not a fan of document DBMSs.

PHP MySQL inserting data to multiple tables

I'm trying to make an experimental web application which minimises redundant data.
I have three example tables set up like so:
Table one
ID | created_at (unix timestamp) | updated_at (unix timestamp)
Table two
ID | Foreign Key to table one | Title
Table three (pages)
ID | Foreign Keys to both table one and two | Content | Metadata
The idea being that everything created in the application will have a creation/edit time.
Many (but not all) things will have a title (For example a page or a section for a page to go into).
Finally, some things will have attributes specific to themselves, eg content and metadata for a page.
I'm trying to work out the best way to enter data into multiple tables. I know I could do multiple insert queries from PHP, keep track of rows created in the current transaction and delete those rows should a later part of the transaction fail. However, if the PHP script dies completely, it may stop before all of the deletions can be completed.
Does MySQL have any inbuilt logic which would allow the insert query to be split up? Would a trigger be able to handle this type of transaction or is it beyond its capabilities?
Any advice, thoughts or ideas would be greatly appreciated.
Thanks!
A solution would be to use transactions, which give you "all or nothing" behaviour.
The idea is the following:
you start a transaction
you do your inserts/updates
if everything is OK, you commit the transaction, which will save everything you did during this transaction
if not, you roll back the transaction, and everything you did in it will be cancelled
If you don't commit and get disconnected (if your PHP script dies, for instance), nothing will be committed, and what you did during the uncommitted transaction will automatically be rolled back.
For more information, you can take a look at 12.4.1. START TRANSACTION, COMMIT, and ROLLBACK Syntax in the MySQL manual.
Note that transactions are only available for some DB engines:
MyISAM doesn't support transactions
InnoDB does (it also supports foreign keys, for instance; it's far more advanced than MyISAM).
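A bare-SQL sketch of the idea against the three example tables from the question (column names are assumptions), using LAST_INSERT_ID() to wire up the foreign keys:
START TRANSACTION;

INSERT INTO table_one (created_at, updated_at)
VALUES (UNIX_TIMESTAMP(), UNIX_TIMESTAMP());
SET @one_id = LAST_INSERT_ID();

INSERT INTO table_two (table_one_id, title)
VALUES (@one_id, 'Example page');
SET @two_id = LAST_INSERT_ID();

INSERT INTO table_three (table_one_id, table_two_id, content, metadata)
VALUES (@one_id, @two_id, 'Page body', 'meta');

COMMIT;
-- on failure (or if the script dies before COMMIT), nothing above is kept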
For multiple inserts you can also create a stored procedure and call that procedure from PHP.
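A sketch of such a procedure, reusing the same assumed tables and column names as above; from PHP you would then simply run CALL create_page(...):
DELIMITER //
CREATE PROCEDURE create_page(IN p_title VARCHAR(255), IN p_content TEXT, IN p_metadata TEXT)
BEGIN
    START TRANSACTION;
    INSERT INTO table_one (created_at, updated_at)
    VALUES (UNIX_TIMESTAMP(), UNIX_TIMESTAMP());
    SET @one_id = LAST_INSERT_ID();
    INSERT INTO table_two (table_one_id, title)
    VALUES (@one_id, p_title);
    INSERT INTO table_three (table_one_id, table_two_id, content, metadata)
    VALUES (@one_id, LAST_INSERT_ID(), p_content, p_metadata);
    COMMIT;
END //
DELIMITER ;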
