I'm creating a MySQL database for a small app.
The problem is that many fields are identical across different tables, for example:
Table 1: Municipal Issues: ID, UserID, Title, Location, Description, ImageURL
Table 2: Harassment Issues: ID, UserID, Title, Location, Description, ImageURL
Table 3: same as above
All of these tables have almost the same columns.
I want to ask whether it's better to use relations (a table that handles the IDs, linked to tables with the other details) or to create a single table with an extra column for the issue type.
On one hand, there will be many tables with identical columns.
On the other hand, there will be few tables with many rows in them.
Which is better for performance: more rows or more tables?
I'm using MySQL.
Firstly, unless you expect millions of records, don't worry that much about performance; care more about the structure of your data and how easy it will be to access it. Literally write down a list of the data you plan to extract in your app, e.g. "find all issues today" or "find all unresolved issues older than 6 months", and then try to write real SQL queries against your expected structure. If they turn out to be hard to write, change the structure.
To answer your question: it depends. The current structure has the following benefits:
It's easy to query a certain type of issue
It's easy to build a PHP application - just make one template form (or model) and then copy and paste it with slight changes for the other tables
In case of performance problems, it may be easier to create a cluster by simply putting each table on a different DB server.
and the following downsides:
It's inflexible. Adding a new field that you forgot at the beginning will be painful, since you'll have to change 3 (or more) tables and then the same number of pieces in your app.
Adding new types of issues will be painful and will require creating a new table.
Writing SQL for data like "all unresolved issues (regardless of type)" will require complicated UNIONs. Moreover, these UNIONs require a virtual field with the issue type, otherwise you can't tell which table a given id came from.
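To make the last point concrete, here is a minimal sketch of such a UNION, using Python's sqlite3 with an in-memory SQLite database standing in for MySQL. The third table name (other_issues) and the resolved column are hypothetical, since the question doesn't show how resolution is tracked. Note the literal issue_type column that every branch has to add.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for MySQL; UNION ALL works the same

# Table name -> literal issue type. "other_issues" and the "resolved" column
# are hypothetical; the question only shows the shared columns.
TABLES = {
    "municipal_issues": "municipal",
    "harassment_issues": "harassment",
    "other_issues": "other",
}
for table in TABLES:
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, "
        "location TEXT, description TEXT, image_url TEXT, "
        "resolved INTEGER NOT NULL DEFAULT 0)"
    )

conn.execute("INSERT INTO municipal_issues (id, user_id, title) VALUES (1, 10, 'Pothole')")
conn.execute("INSERT INTO harassment_issues (id, user_id, title) VALUES (1, 11, 'Verbal abuse')")
conn.execute("INSERT INTO other_issues (id, user_id, title, resolved) VALUES (1, 12, 'Noise', 1)")

# Every branch adds a literal issue_type column; without it, id 1 from
# municipal_issues and id 1 from harassment_issues could not be told apart.
union_sql = " UNION ALL ".join(
    f"SELECT '{issue_type}' AS issue_type, id, title FROM {table} WHERE resolved = 0"
    for table, issue_type in TABLES.items()
)
unresolved = conn.execute(union_sql + " ORDER BY issue_type, id").fetchall()
print(unresolved)
```

Every new issue type means another branch in every such query, which is the maintenance cost described above.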
The classical DB approach recommends using one table for the common fields and creating derived tables for the fields that differ. So:
the issues table holds all the common fields and is identified by the PK issue_id
municipal_issues uses a foreign key to issues.issue_id and has only its specific fields
harassment_issues uses a foreign key to issues.issue_id and has only its specific fields
The issues table also has an issue_type field that takes values such as "harassment", "municipal", etc., and tells you which table stores the additional data.
This pattern is called "Class Table Inheritance"; you may check out the SQL Antipatterns presentation for more info and other approaches. It solves the flexibility issue and still lets you re-create each of the original tables with a single simple JOIN, which is pretty fast.
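A minimal sketch of Class Table Inheritance under the same assumptions (Python's sqlite3, SQLite standing in for MySQL). The department and anonymous columns are hypothetical stand-ins for type-specific fields; the common columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE issues (
        issue_id    INTEGER PRIMARY KEY,
        issue_type  TEXT NOT NULL,      -- 'municipal', 'harassment', ...
        user_id     INTEGER,
        title       TEXT,
        location    TEXT,
        description TEXT,
        image_url   TEXT
    );
    -- Subtype tables reuse the parent's key and hold only type-specific columns.
    CREATE TABLE municipal_issues (
        issue_id   INTEGER PRIMARY KEY REFERENCES issues(issue_id),
        department TEXT                 -- hypothetical type-specific column
    );
    CREATE TABLE harassment_issues (
        issue_id  INTEGER PRIMARY KEY REFERENCES issues(issue_id),
        anonymous INTEGER               -- hypothetical type-specific column
    );
    INSERT INTO issues (issue_id, issue_type, user_id, title)
        VALUES (1, 'municipal', 10, 'Pothole'), (2, 'harassment', 11, 'Verbal abuse');
    INSERT INTO municipal_issues VALUES (1, 'Roads');
    INSERT INTO harassment_issues VALUES (2, 1);
""")

# Re-creating the original "Municipal Issues" view of the data takes one JOIN.
municipal = conn.execute("""
    SELECT i.issue_id, i.title, m.department
    FROM issues AS i
    JOIN municipal_issues AS m ON m.issue_id = i.issue_id
""").fetchall()

# Cross-type queries need no UNION: they read the parent table directly.
all_titles = conn.execute(
    "SELECT issue_type, title FROM issues ORDER BY issue_id").fetchall()
print(municipal, all_titles)
```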
Also, as a side note, you may want to look at the DB schema of bug trackers such as Mantis, since this looks like the same domain.
For the past couple of years I've been working on my own lightweight PHP CMS that I use for my personal projects. The one thing it's missing is an easy database solution.
I am looking to create a simple content-type database framework in which I can specify a new type (user, book, event, etc.) and then load everything related to it automatically.
For some content types, there could be fields that can only have one value and some that can have zero to many values, so I will use a separate table for the latter. Take this example:
table: event
columns: id, name, description, date
table: event_people
columns: id_event, id_user
table: event_pictures:
columns: id_event, picture
Events will have a bunch of fields that contain a value such as the description, but there could also be a bunch of pictures and people going to it.
I want to be able to create a generic PHP class that will load all the information for a content type. My current thought is to make an entity loader function that takes an id and a type:
Entity::load($id, "event");
From this, I was going to get all of the tables with the prefix "event", load all of the data with the passed-in ID, and then store it in a multidimensional array. However, I feel like there is probably a more efficient way to do this. I'd like to stay away from having a config file someplace that specifies all of the content types and their child tables, because I want to be able to add a new child table and have it picked up automatically.
Is there any way to store this relationship directly within the MySQL table? I don't do a lot of database work and I've only recently started to use foreign keys (what a life saver). Would it be more efficient to check which tables have a foreign key referencing the id column of the event table, and if so, how would this be done? I'm also open to different ways of storing this information.
Note: I'm doing this just for fun so please don't refer me to use any premade frameworks. I'd like to create this myself.
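For the foreign-key route the question asks about, here is a minimal sketch in Python with SQLite standing in for MySQL. SQLite exposes declared foreign keys through PRAGMA foreign_key_list; in MySQL the same information lives in information_schema.KEY_COLUMN_USAGE (shown in the comment). The load() function is a hypothetical stand-in for Entity::load, and the sample rows are invented. Table names are interpolated into SQL, so they must come from code or the catalog, never from user input.

```python
import sqlite3

# MySQL equivalent of the lookup below (information_schema is MySQL's catalog):
#   SELECT TABLE_NAME, COLUMN_NAME FROM information_schema.KEY_COLUMN_USAGE
#   WHERE TABLE_SCHEMA = DATABASE() AND REFERENCED_TABLE_NAME = 'event';
# SQLite has no information_schema, so this sketch uses PRAGMA foreign_key_list.


def child_tables(conn, parent):
    """Return {child_table: fk_column} for every table with an FK to parent."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    children = {}
    for name in names:
        # Each row: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f"PRAGMA foreign_key_list({name})"):
            if fk[2] == parent:
                children[name] = fk[3]
    return children


def load(conn, entity_id, entity_type):
    """Hypothetical Entity::load: the parent row plus every child table's rows."""
    row = conn.execute(
        f"SELECT * FROM {entity_type} WHERE id = ?", (entity_id,)).fetchone()
    data = {"fields": dict(row) if row else None}
    for child, fk_column in child_tables(conn, entity_type).items():
        data[child] = [dict(r) for r in conn.execute(
            f"SELECT * FROM {child} WHERE {fk_column} = ?", (entity_id,))]
    return data


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support both index and column-name access
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT, description TEXT, date TEXT);
    CREATE TABLE event_people (id_event INTEGER REFERENCES event(id), id_user INTEGER);
    CREATE TABLE event_pictures (id_event INTEGER REFERENCES event(id), picture TEXT);
    INSERT INTO event VALUES (1, 'Launch', 'Release party', '2024-01-01');
    INSERT INTO event_people VALUES (1, 7), (1, 8);
    INSERT INTO event_pictures VALUES (1, 'cake.jpg');
""")
event = load(conn, 1, "event")
print(event)
```

A new child table is picked up as soon as it declares a foreign key to event, with no config file and no reliance on the table-name prefix.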
I think your approach of searching for all tables with the prefix "event" is sensible. The only way I can think of to be more efficient is to have an "entity_relationship" table that you could query. It would give you flexibility in your naming convention, avoid naming conflicts, and the lookup should be more efficient than a pattern-match search.
Then, whenever a new object type with its own table is added, you could make an entry in the relationship table:
INSERT INTO entity_relationship VALUES
('event','event_people'),
('event','event_pictures'),
('event','event_documents'),
('event','event_performers');
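A sketch of how a loader might use the entity_relationship table, under stated assumptions: SQLite stands in for MySQL, the two column names (entity, child_table) are hypothetical because the INSERT above doesn't name them, and every child table is assumed to use the id_<entity> foreign-key naming from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    -- Column names are hypothetical; the INSERT in the answer only fixes their order.
    CREATE TABLE entity_relationship (entity TEXT, child_table TEXT);
    INSERT INTO entity_relationship VALUES
        ('event', 'event_people'),
        ('event', 'event_pictures');
    CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event_people (id_event INTEGER, id_user INTEGER);
    CREATE TABLE event_pictures (id_event INTEGER, picture TEXT);
    INSERT INTO event VALUES (3, 'Meetup');
    INSERT INTO event_people VALUES (3, 42);
    INSERT INTO event_pictures VALUES (3, 'a.png'), (3, 'b.png');
""")


def load(conn, entity_id, entity_type):
    """Load one entity plus the rows of every child table registered for it."""
    # One query against the relationship table replaces a table-name prefix scan.
    children = [r["child_table"] for r in conn.execute(
        "SELECT child_table FROM entity_relationship WHERE entity = ?", (entity_type,))]
    row = conn.execute(
        f"SELECT * FROM {entity_type} WHERE id = ?", (entity_id,)).fetchone()
    data = {"fields": dict(row) if row else None}
    fk = f"id_{entity_type}"  # assumes the id_<entity> naming used in the question
    for child in children:
        data[child] = [dict(r) for r in conn.execute(
            f"SELECT * FROM {child} WHERE {fk} = ?", (entity_id,))]
    return data


meetup = load(conn, 3, "event")
print(meetup)
```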
I'm creating a simple data mapping system with PHP, PDO and MySQL.
One of its premises is to map arrays to entities by creating tertiary tables (not sure if that's the correct name).
So, when I map an Array, I create the table with a works-for-all statement that uses the class name and method name passed, something like this:
"create table if not exists ".$tablename." (id_".$firstName." int unsigned not null, ".$secondName." ".$type.", constraint fk_".$tablename." foreign key (id_".$firstName.") references ".$firstName."(id) ".$secondReference.");"
The code is not the problem here.
What I want to know is whether it's a bad idea to TRY to create a table (if not exists) in every iteration (it only actually creates it in the first iteration for each element).
EDIT (explaining): As stated, creating innumerable tables is not the worry (it won't happen), since this process is automated according to the classes (models) I'm using. The worry is whether it is too costly, memory- and traffic-wise, to check if the table exists at every iteration (this way, for each item I would access the database twice: once to check whether the table exists and again to insert the new element into the table).
Another option would be to check whether the table exists with a SELECT statement first, but that doesn't seem much better.
One important detail is that these mapping methods will only be accessed through each entity's DAO.
Edit: The link to the project on GitHub is https://github.com/Sirsirious/SPDMap
To me it doesn't sound ideal to create a table each time. It might be better to reuse the same table, with an additional column identifying which of your current 'tables' each row belongs to.
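One way to read the "reuse the same table" suggestion, sketched in Python with SQLite standing in for MySQL: a single table created once, with a mapping column replacing the per-array table name. The mapped_arrays table, its columns and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Created once, up front, so no per-item existence check is needed.
conn.execute("""
    CREATE TABLE mapped_arrays (
        mapping  TEXT    NOT NULL,  -- e.g. 'user.tags'; replaces the per-array table name
        owner_id INTEGER NOT NULL,  -- id of the entity that owns the array
        value    TEXT
    )
""")
conn.execute("CREATE INDEX idx_mapped_arrays ON mapped_arrays (mapping, owner_id)")


def save_array(conn, mapping, owner_id, values):
    """Insert every element of one array in a single batched call."""
    conn.executemany(
        "INSERT INTO mapped_arrays (mapping, owner_id, value) VALUES (?, ?, ?)",
        [(mapping, owner_id, v) for v in values],
    )


def load_array(conn, mapping, owner_id):
    """Return one entity's array, in insertion order."""
    return [r[0] for r in conn.execute(
        "SELECT value FROM mapped_arrays WHERE mapping = ? AND owner_id = ? ORDER BY rowid",
        (mapping, owner_id))]


save_array(conn, "user.tags", 1, ["admin", "editor"])
save_array(conn, "post.categories", 1, ["news"])
print(load_array(conn, "user.tags", 1))
```

The trade-off is that every value shares one column type (TEXT here), whereas the per-array tables in the question can give each array its own $type.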
If you do create the table, I don't see anything wrong with CREATE TABLE IF NOT EXISTS. It's safe, good programming practice.
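If you keep per-array tables, one way to avoid the extra round-trip on every iteration is to remember which tables this process has already ensured. This is a sketch rather than something the answer proposes: the _ensured set, ensure_table() and the user_tags table are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
_ensured = set()  # table names this process has already created or confirmed


def ensure_table(conn, name, ddl):
    """Run the CREATE TABLE IF NOT EXISTS statement at most once per table."""
    if name in _ensured:
        return False  # skip the extra round-trip to the database
    conn.execute(ddl)  # harmless if the table already exists
    _ensured.add(name)
    return True


ddl = "CREATE TABLE IF NOT EXISTS user_tags (id_user INTEGER NOT NULL, tag TEXT)"
# Simulates the per-item loop from the question: only the first call hits the DB.
calls = [ensure_table(conn, "user_tags", ddl) for _ in range(3)]

# Without the cache, repeating the statement is still safe: no error is raised.
conn.execute(ddl)
conn.execute(ddl)
print(calls)
```

The cache only lives as long as the process (one PHP request, in the question's setting) and goes stale if a table is dropped behind its back; the IF NOT EXISTS clause is what keeps the statement safe either way.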
I'd also consider using temporary tables for this. If you create the table each time, it sounds like the tables are one-time use as well, so if you don't need the data forever, temporary tables can be a good way to go.
When generating a table from this model:
function init()
{
parent::init();
$this->addField('person_id')->refModel('Model_Person')->mandatory(true);
$this->addField('username')->mandatory(true);
$this->addField('password')->mandatory(true);
}
I get this SQL statement:
create table users (
id int auto_increment not null primary key,
person_id varchar(255),
person int(11),
username varchar(255),
password varchar(255));
In this SQL statement I get the opposite of what the tutorial says:
Calling refModel with a field name ending in "_id" will actually create 2 field definitions. "publisher_id", for instance, will be defined as integer and will have type "reference", and a field "publisher" will also be added, with exactly same properties - but it will be a calculated field and will use sub-select to determine the value.
I want to know:
Is the generated SQL statement correct?
What does this additional generated VARCHAR field do? (I made a CRUD, and when I added new records, the value of this field was saved as NULL.)
When using refModel(), if I used the model name only ('Person') I got an error (Unable to include Person.php); I had to use the complete class name ('Model_Person'). Is this OK? Shouldn't I be able to use the model name only?
mandatory() doesn't produce NOT NULL; is there a way to do this?
The generated SQL statement is not correct; it's a bug in the generator. You need just one field of type "int" ending with _id.
It does this because refModel() actually creates two fields in the model: one used for editing (the _id field) and another used for listing data (as a sub-query).
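For comparison, here is a hand-written sketch of what the corrected users table might look like, with SQLite standing in for MySQL: a single integer person_id column, NOT NULL added by hand for the mandatory fields, and the listing-only "person" value computed with a sub-select instead of being stored. The person table and its name column are assumptions, since Model_Person's fields aren't shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);  -- name is assumed
    -- One INTEGER reference column instead of person_id VARCHAR + person INT.
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        person_id INTEGER NOT NULL REFERENCES person(id),
        username  TEXT NOT NULL,
        password  TEXT NOT NULL
    );
    INSERT INTO person VALUES (1, 'Ada');
    INSERT INTO users (person_id, username, password) VALUES (1, 'ada', 'x');
""")

# "person" is not stored: it is computed per row with a sub-select,
# which is the role the listing field plays.
rows = conn.execute("""
    SELECT u.id, u.username, u.person_id,
           (SELECT p.name FROM person AS p WHERE p.id = u.person_id) AS person
    FROM users AS u
""").fetchall()
print(rows)
```

The NOT NULL constraints here are a schema-level choice; they hold regardless of whether an inheriting model keeps mandatory().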
When you use refModel, you should use "Model_Person". Consistency between refModel, setModel and other fields will be improved in 4.2; it hasn't been done yet for compatibility reasons.
The SQL generator is by its nature incomplete and can't be complete, so it's better to review the schema anyway. For instance, you might have some fields that are not defined in the Model. I also prefer that developers pay attention to the SQL, as it might not reflect the models precisely: one model may use multiple tables through a join, or models may inherit from each other and add more field definitions.
mandatory() is a model-level requirement which works similarly to other validations. While MySQL could handle the "mandatory" condition, it wouldn't be able to handle others. Besides, you may remove "mandatory" when you inherit models.
I'll try to add a guide on effective use of Models in Agile Toolkit.
Here's a tricky one - how do I programmatically create and interrogate a database whose contents I can't really foresee?
I am implementing a generic input form system. The user can create PHP forms with a WYSIWYG layout and use them for any purpose he wishes. He can also query the input.
So, we have three stages:
a form is designed and generated. This is a one-off procedure, although the form can be edited later. This designs the database.
someone or several people make use of the form - say for daily sales reports, stock keeping, payroll, etc. Their input to the forms is written to the database.
others, maybe management, can query the database and generate reports.
Since these forms are generic, I can't predict the database structure, other than to say that it will reflect HTML form fields and consist of the data input from a collection of edit boxes, memos, radio buttons and the like.
Questions and remarks:
A) how can I best structure the database, in terms of tables and columns? What about primary keys? My first thought was to use the control name to identify each column, then I realized that the user can edit the form and rename, so that maybe "name" becomes "employee" or "wages" becomes ":salary". I am leaning towards a unique number for each.
B) how best to key the rows? I was thinking of a timestamp to allow me to query and a column for the row Id from A)
C) I have to handle column rename/insert/delete. For deletion, I am unsure whether to delete the data from the database. Even if the user is not inputting it from the form any more, he may wish to query what was previously entered. Or there may be some legal requirements to retain the data. Any gotchas in column rename/insert/delete?
D) For the querying, I can have my PHP interrogate the database to get column names and generate a form with a list where each entry has a database column name, a checkbox to say if it should be used in the query and, based on column type, some selection criteria. That ought to be enough to build searches like "position = 'senior salesman' and salary > 50k".
E) I probably have to generate some fancy charts - graphs, histograms, pie charts, etc for query results of numerical data over time. I need to find some good FOSS PHP for this.
F) What else have I forgotten?
This all seems very tricky to me, but I am database n00b - maybe it is simple to you gurus?
Edit: please don't tell me not to do it. I don't have any choice :-(
Edit: in real life I don't expect column rename/insert/delete to be frequent. However it is possible that after running for a few months a change to the database might be required. I am sure this happens regularly. I fear that I have worded this question badly and that people think that changes will be made willy-nilly every 10 minutes or so.
Realistically, my users will define a database when they lay out the form. They might get it right first time and never change it - especially if they are converting from paper forms. Even if they do decide to change, this might only happen once or twice ever, after months or years - and that can happen in any database.
I don't think that I have a special case here, nor that we should be concentrating on change. Perhaps better to concentrate on linkage - what's a good primary key scheme? Say, perhaps, for one text input, one numerical and a memo?
"This all seems very tricky to me, but I am database n00b - maybe it is simple to you gurus?"
Nope, it really is tricky. Fundamentally what you're describing is not a database application, it is a database application builder. In fact, it sounds as if you want to code something like Google App Engine or a web version of MS Access. Writing such a tool will take a lot of time and expertise.
Google has implemented flexible schemas by using its BigTable platform. It allows you to flex the schema pretty much at will. The catch is, this flexibility makes it very hard to write queries like "position = 'senior salesman' and salary > 50k".
So I don't think the NoSQL approach is what you need. You want to build an application which generates and maintains RDBMS schemas. This means you need to design a metadata repository from which you can generate dynamic SQL to build and change the users' schemas and also generate the front end.
Things your metadata schema needs to store
For schema generation:
foreign key relationships (an EMPLOYEE works in a DEPARTMENT)
unique business keys (there can be only one DEPARTMENT called "Sales")
reference data (permitted values of EMPLOYEE.POSITION)
column data type, size, etc
whether column is optional (i.e. NULL or NOT NULL)
complex business rules (employee bonuses cannot exceed 15% of their salary)
default value for columns
For front-end generation
display names or labels ("Wages", "Salary")
widget (drop down list, pop-up calendar)
hidden fields
derived fields
help text, tips
client-side validation (associated JavaScript, etc)
That last points to the potential complexity in your proposal: a regular form designer like Joe Soap is not going to be able to formulate the JS to (say) validate that an input value is between X and Y, so you're going to have to derive it using templated rules.
These are by no means exhaustive lists, it's just off the top of my head.
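A minimal sketch of generating DDL from column metadata, covering a few of the schema-generation items above (data type, NULL/NOT NULL, default value, foreign key, plus a unique business key on the referenced table). The ColumnMeta structure and create_table_sql() are hypothetical, and SQLite stands in for the target RDBMS.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnMeta:
    """One row of a hypothetical metadata repository describing a column."""
    name: str                         # real schema name, e.g. "salary"
    sql_type: str                     # e.g. "INTEGER", "TEXT"
    nullable: bool = True
    default: Optional[str] = None     # SQL literal, e.g. "'salesman'" or "0"
    references: Optional[str] = None  # e.g. "department(id)"


def create_table_sql(table: str, columns: List[ColumnMeta]) -> str:
    """Build CREATE TABLE DDL from column metadata (surrogate id key added)."""
    parts = ["id INTEGER PRIMARY KEY"]
    for c in columns:
        col = f"{c.name} {c.sql_type}"
        if not c.nullable:
            col += " NOT NULL"
        if c.default is not None:
            col += f" DEFAULT {c.default}"
        if c.references:
            col += f" REFERENCES {c.references}"
        parts.append(col)
    return f"CREATE TABLE {table} (" + ", ".join(parts) + ")"


employee_meta = [
    ColumnMeta("name", "TEXT", nullable=False),
    ColumnMeta("position", "TEXT", default="'salesman'"),
    ColumnMeta("salary", "INTEGER"),
    ColumnMeta("department_id", "INTEGER", references="department(id)"),
]
ddl = create_table_sql("employee", employee_meta)

conn = sqlite3.connect(":memory:")
# UNIQUE business key: there can be only one department with a given name.
conn.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute(ddl)
conn.execute("INSERT INTO employee (name, salary) VALUES ('Fred', 55000)")
print(conn.execute("SELECT name, position, salary FROM employee").fetchall())
```

Because names end up interpolated into SQL, a real generator would need to validate them against a strict identifier pattern before use.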
For primary keys I suggest you use a column of GUID datatype. Timestamps aren't guaranteed to be unique, although if you run your database on an OS whose timestamps go to six decimal places (i.e. not Windows), it's unlikely you'll get clashes.
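A minimal sketch of the GUID suggestion using Python's uuid module, with SQLite standing in for the real database. Neither SQLite nor MySQL has a dedicated GUID column type, so the key is stored here as 36-character text (in MySQL, CHAR(36) or BINARY(16) are common choices). The form_entry table and its columns are hypothetical; the timestamp is kept as an ordinary, queryable column rather than the key.

```python
import sqlite3
import uuid
from datetime import datetime

conn = sqlite3.connect(":memory:")
# No native GUID type here, so the canonical 36-character text form is stored.
conn.execute(
    "CREATE TABLE form_entry (entry_id TEXT PRIMARY KEY, created_at TEXT, payload TEXT)"
)


def insert_entry(conn, payload):
    """Insert one submission keyed by a random UUID; return the new key."""
    entry_id = str(uuid.uuid4())             # unique without coordination
    created_at = datetime.now().isoformat()  # stored for querying, not used as key
    conn.execute(
        "INSERT INTO form_entry VALUES (?, ?, ?)", (entry_id, created_at, payload)
    )
    return entry_id


# A tight loop like this can produce repeated timestamps; the UUID primary
# key stays unique regardless.
ids = [insert_entry(conn, f"row {i}") for i in range(1000)]
print(len(set(ids)))
```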
Last word
'My first thought was to use the control name to identify each column, then I realized that the user can edit the form and rename, so that maybe "name" becomes "employee" or "wages" becomes ":salary". I am leaning towards a unique number for each.'
I have built database schema generators before. They are hard going. One thing that can be tough is debugging the dynamic SQL, so make it easier on yourself: use real names for tables and columns. Just because the app user now wants to see a form titled HEADCOUNT doesn't mean you have to rename the EMPLOYEES table. Hence the need to separate the displayed label from the schema object name. Otherwise you'll find yourself trying to figure out why this generated SQL statement failed:
update table_11123
set col_55542 = 'HERRING'
where col_55569 = 'Bootle'
/
That way madness lies.
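A minimal sketch of keeping the displayed label separate from the schema name, so relabelling "Wages" as "Salary" in the form designer only updates metadata. The form_labels table, relabel() and report() are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Real, readable schema names that never change when the form is relabelled.
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, wages INTEGER);
    INSERT INTO employees (name, wages) VALUES ('Fred', 40000), ('Bob', 60000);

    -- Hypothetical metadata: what the form shows vs. what the schema calls it.
    CREATE TABLE form_labels (table_name TEXT, column_name TEXT, label TEXT,
                              PRIMARY KEY (table_name, column_name));
    INSERT INTO form_labels VALUES ('employees', 'name',  'Name'),
                                   ('employees', 'wages', 'Wages');
""")


def relabel(conn, table, column, new_label):
    """A rename in the form designer touches only metadata, never the schema."""
    conn.execute(
        "UPDATE form_labels SET label = ? WHERE table_name = ? AND column_name = ?",
        (new_label, table, column))


def report(conn, table):
    """Header row of labels, then data fetched with the real column names."""
    cols = conn.execute(
        "SELECT column_name, label FROM form_labels WHERE table_name = ? ORDER BY rowid",
        (table,)).fetchall()
    sql = f"SELECT {', '.join(c for c, _ in cols)} FROM {table} ORDER BY id"
    return [tuple(label for _, label in cols)] + conn.execute(sql).fetchall()


relabel(conn, "employees", "wages", "Salary")
print(report(conn, "employees"))
```

The statement generated here is SELECT name, wages FROM employees ORDER BY id, which is far easier to debug than one built from numbered identifiers.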
In essence, you are asking how to build an application without specifications. Relational databases were not designed so that you can do this effectively. The common approach to this problem is an Entity-Attribute-Value design and for the type of system in which you want to use it, the odds of failure are nearly 100%.
It makes no sense for example, that the column called "Name" could become "Salary". How would a report where you want the total salary work if the salary values could have "Fred", "Bob", 100K, 1000, "a lot"? Databases were not designed to let anyone put anything anywhere. Successful database schemas require structure which means effort with respect to specifications on what needs to be stored and why.
Therefore, to answer your question, I would rethink the problem. The entire approach of trying to make an app that can store anything in the universe is not a recipe for success.
Like Thomas said, a relational database is not a good fit for your problem. However, you may want to take a look at NoSQL databases like MongoDB.
See this article:
http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/
for someone else's experience of your problem.
This is for A) & B). It's not something I have done, but I thought it was an interesting idea that Reddit put to use; see this link (look at Lesson 3):
http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html
Not sure about the database, but for charts, instead of using PHP, I recommend looking into JavaScript (http://www.reynoldsftw.com/2009/02/6-jquery-chart-plugins-reviewed/). The advantages are that some of the processing for chart display is offloaded to the client side, and the charts can be interactive.
The other respondents are correct that you should be very cautious with this approach because it is more complex and less performant than the traditional relational model - but I've done this type of thing to accommodate departmental differences at work, and it worked fine for the amount of use it got.
Basically I set it up like this, first - a table to store some information about the Form the user wants to create (obviously, adjust as you need):
--************************************************************************
-- Create the User_forms table
--************************************************************************
create table User_forms
(
form_id integer identity,
name varchar(200),
status varchar(1),
author varchar(50),
last_modifiedby varchar(50),
create_date datetime,
modified_date datetime
)
Then a table to define the fields to be presented on the form, including any limits and the order and page they are to be presented on (my app presented the fields as a multi-page wizard type of flow).
--************************************************************************
-- Create the field configuration table to hold the entry field configuration
--************************************************************************
create table field_configuration
(
field_id integer identity,
form_id SMALLINT,
status varchar(1),
fieldgroup varchar(20),
fieldpage integer,
fieldseq integer,
fieldname varchar(40),
fieldwidth integer,
description varchar(50),
minlength integer,
maxlength integer,
maxval varchar(13),
minval varchar(13),
valid_varchars varchar(20),
empty_ok varchar(1),
all_caps varchar(1),
value_list varchar(200),
ddl_queryfile varchar(100),
allownewentry varchar(1),
query_params varchar(50),
value_default varchar(20)
);
Then my Perl code would loop through the fields in order for page 1 and put them on the "wizard form"... and the "next" button would present the page 2 fields in order, etc.
I had JavaScript functions to enforce the limits specified for each field as well...
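A rough Python equivalent of that page-by-page loop, assuming SQLite in place of the answer's database and a trimmed field_configuration with only the columns used here. The sample rows are invented, and validate() is a hypothetical server-side repeat of the per-field limits the answer enforces in JavaScript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down field_configuration: only the columns this sketch uses.
conn.executescript("""
    CREATE TABLE field_configuration (
        field_id INTEGER PRIMARY KEY, form_id INTEGER,
        fieldpage INTEGER, fieldseq INTEGER, fieldname TEXT,
        minlength INTEGER, maxlength INTEGER, empty_ok TEXT
    );
    INSERT INTO field_configuration
        (form_id, fieldpage, fieldseq, fieldname, minlength, maxlength, empty_ok)
    VALUES
        (1, 1, 2, 'region',   0, 20, 'Y'),
        (1, 1, 1, 'rep_name', 2, 40, 'N'),
        (1, 2, 1, 'amount',   1, 10, 'N');
""")


def fields_for_page(conn, form_id, page):
    """Fields to render on one wizard page, in display order."""
    return conn.execute(
        "SELECT fieldname, minlength, maxlength, empty_ok FROM field_configuration "
        "WHERE form_id = ? AND fieldpage = ? ORDER BY fieldseq",
        (form_id, page)).fetchall()


def validate(field, value):
    """Check one submitted value against the field's configured limits."""
    _name, min_len, max_len, empty_ok = field
    if value == "":
        return empty_ok == "Y"
    return min_len <= len(value) <= max_len


page1 = fields_for_page(conn, 1, 1)
print([f[0] for f in page1])
```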
Then a table to hold the values entered by the users:
--************************************************************************
-- Field to contain the values
--************************************************************************
create table form_field_values
(
session_Id integer identity,
form_id integer,
field_id integer,
value varchar(MAX)
);
That would be a good starting point for what you want to do, but keep an eye on performance as it can really slow down any reports if they add 1000 custom fields. :-)
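To see where the reporting cost comes from, here is a sketch of the question's example search ("position = 'senior salesman' and salary > 50k") against this layout, with SQLite standing in for the answer's database and trimmed versions of the tables above. Each field has to be pivoted back into a column with its own CASE expression, and numeric fields need a CAST because every value is stored as text. The sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE field_configuration (field_id INTEGER PRIMARY KEY, form_id INTEGER, fieldname TEXT);
    -- session_id is a plain integer here, identifying one form submission.
    CREATE TABLE form_field_values (session_id INTEGER, form_id INTEGER, field_id INTEGER, value TEXT);
    INSERT INTO field_configuration VALUES (1, 1, 'position'), (2, 1, 'salary');
    INSERT INTO form_field_values VALUES
        (100, 1, 1, 'senior salesman'), (100, 1, 2, '60000'),
        (101, 1, 1, 'senior salesman'), (101, 1, 2, '45000'),
        (102, 1, 1, 'clerk'),           (102, 1, 2, '52000');
""")

# One CASE per field turns rows back into columns; the CAST turns the stored
# text back into a number so the salary filter compares numerically.
REPORT_SQL = """
    SELECT session_id, position, salary FROM (
        SELECT v.session_id,
               MAX(CASE WHEN f.fieldname = 'position' THEN v.value END) AS position,
               CAST(MAX(CASE WHEN f.fieldname = 'salary' THEN v.value END) AS INTEGER) AS salary
        FROM form_field_values AS v
        JOIN field_configuration AS f ON f.field_id = v.field_id
        WHERE v.form_id = ?
        GROUP BY v.session_id
    )
    WHERE position = 'senior salesman' AND salary > 50000
    ORDER BY session_id
"""
print(conn.execute(REPORT_SQL, (1,)).fetchall())
```

With many custom fields, this query grows one CASE expression per field and has to scan and group every value row, which is the slowdown warned about above.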