I'm trying to figure out a way to display the number of grandchildren, great-grandchildren, etc. on a website focusing on animals. Someone told me about a really cool technique, "Hierarchical queries in MySQL."
Below is my adaptation.
$stm = $pdo->prepare("SELECT COUNT(@id := (
SELECT `Taxon`
FROM gz_life_mammals
WHERE `Parent` = @id
)) AS numDescendants
FROM (
SELECT @id := :MyURL
) vars
STRAIGHT_JOIN gz_life_mammals
WHERE @id IS NOT NULL");
$stm->execute(array(
'MyURL'=>$MyURL
));
while ($row = $stm->fetch())
{
$ChildrenCount = $row['numDescendants'];
}
echo $ChildrenCount;
I think I have it set up to count children, actually, but I'll work on grandchildren next. Anyway, when I navigate to a species page, it correctly displays a count of 0. But when I navigate to a parent page, I get this error message:
Cardinality violation: 1242 Subquery returns more than 1 row
Can anyone tell me what's going on and how I can fix that?
My database table features animal taxa in a parent-child relationship in the field Taxon, like this:
Taxon | Parent
Mammalia | Chordata
Carnivora | Mammalia
Canidae | Carnivora
Canis | Canidae
Canis-lupus | Canis
To see information about the wolf (Canis lupus), I would navigate to MySite/life/canis-lupus
ON EDIT
Here's the table schema. I can't make it work with SQLFiddle, though; one error after another.
CREATE TABLE t (
N INT(6) NOT NULL auto_increment,
Taxon varchar(50) default NULL,
Parent varchar(25) default NULL,
NameCommon varchar(50) default NULL,
Rank smallint(2) default 0,
PRIMARY KEY (N)
) ENGINE=MyISAM;
Hopefully one would agree that this is not a code-only answer without explanation, since the code is documented throughout.
Basically, it is a self-join table, with each row holding a reference to its parent. The stored proc uses a worktable to find children, children-of-children, etc., and maintains a level as it goes.
For instance, level=1 represents children, level=2 represents grandchildren, etc.
At the end, the counts are retrieved. Since the ids are in the worktable, you can expand on it however you wish.
Schema
create schema TaxonSandbox; -- create a separate database so it does not mess up your stuff
use TaxonSandbox; -- use that db just created above (stored proc created in it)
-- drop table t;
CREATE TABLE t (
N int auto_increment primary key,
Taxon varchar(50) not null,
Parent int not null, -- 0 can mean top-most for that branch, or NULL if made nullable
NameCommon varchar(50) not null,
Rank int not null,
key(parent)
);
-- truncate table t;
insert t(taxon,parent,NameCommon,rank) values ('FrogGrandpa',0,'',0); -- N=1
insert t(taxon,parent,NameCommon,rank) values ('FrogDad',1,'',0); -- N=2 (my parent is N=1)
insert t(taxon,parent,NameCommon,rank) values ('FrogMe',2,'',0); -- N=3 (my parent is N=2)
insert t(taxon,parent,NameCommon,rank) values ('t4',1,'',0); -- N=4 (my parent is N=1)
insert t(taxon,parent,NameCommon,rank) values
('t5',4,'',0),('t6',4,'',0),('t7',5,'',0),('t8',5,'',0),('t9',7,'',0),('t10',7,'',0),('t11',7,'',0),('t12',11,'',0);
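Before moving on to the proc, a quick sanity check of the seed data: the direct (level=1) children of FrogGrandpa (N=1) should be FrogDad (N=2) and t4 (N=4):
select N, Taxon, Parent
from t
where Parent = 1;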
Stored Procedure
use TaxonSandbox;
drop procedure if exists showHierarchyUnder;
DELIMITER $$ -- will be discussed separately at bottom of answer
create procedure showHierarchyUnder
(
theId int -- the id of the Taxon whose descendants we want
)
BEGIN
-- the theId parameter can be anywhere in the Taxon hierarchy,
-- and we want all of its descendant Taxons
declare bDoneYet boolean default false;
declare working_on int;
declare next_level int; -- parent's level value + 1
declare theCount int;
CREATE temporary TABLE xxFindChildenxx
( -- A Helper table to mimic a recursive-like fetch
N int not null, -- from OP's table called 't'
processed int not null, -- 0 for not processed, 1 for processed
level int not null, -- 0 is the id passed in, -1=trying to figure out, 1=children, 2=grandchildren, etc
parent int not null -- helps clue us in to figure out level
-- NOTE: we don't care about level or parent when N=parameter theId passed into stored proc
-- in fact we will be deleting that row near the bottom of the proc
);
set bDoneYet=false;
insert into xxFindChildenxx (N,processed,level,parent) select theId,0,0,0; -- prime the pump, get sp parameter in here
-- stay inside below while til all retrieved children/children of children are retrieved
while (!bDoneYet) do
-- see if there are any more to process for children
-- simply look in worktable for ones where processed=0;
select count(*) into theCount from xxFindChildenxx where processed=0;
if (theCount=0) then
-- found em all, we are done inside this while loop
set bDoneYet=true;
else
-- one not processed yet, insert its children for processing
SELECT N,level+1 INTO working_on,next_level FROM xxFindChildenxx where processed=0 limit 1; -- order does not matter, just get one
-- insert the rows where the parent=the one we are processing (working_on)
insert into xxFindChildenxx (N,processed,level,parent)
select N,0,next_level,parent
from t
where parent=working_on;
-- mark the one we "processed for children" as processed
-- so we processed a row, but its children rows are yet to be processed
update xxFindChildenxx set processed=1 where N=working_on;
end if;
end while;
delete from xxFindChildenxx where N=theId; -- don't really need the top level row now (stored proc parameter value)
select level,count(*) as lvlCount from xxFindChildenxx group by level;
drop table xxFindChildenxx;
END
$$ -- tell mysql that it has reached the end of my block (this is important)
DELIMITER ; -- sets the default delimiter back to a semi-colon
Test Stored Proc
use TaxonSandbox; -- switch to the sandbox db created above
call showHierarchyUnder(1);
+-------+----------+
| level | lvlCount |
+-------+----------+
| 1 | 2 |
| 2 | 3 |
| 3 | 2 |
| 4 | 3 |
| 5 | 1 |
+-------+----------+
So there are 2 children, 3 grandchildren, 2 great-grandchildren, 3 great-great, and 1 great-great-great
Were one to pass an id to the stored proc that does not exist, or one that has no children, no result set rows are returned.
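Since the ids are sitting in the worktable, expanding the output is just a matter of changing the final SELECT. As a sketch (untested), joining back to t in place of the "select level,count(*) ..." line, before the DROP TABLE, would list the descendants by name instead of only counting them:
select w.level, t.N, t.Taxon
from xxFindChildenxx w
join t on t.N = w.N
order by w.level, t.N;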
Edit:
Some further notes, since I believe the OP was left hanging on understanding his first stored proc creation, and other questions point back here.
Delimiters
Delimiters are important for wrapping the block of the stored proc creation. The reason is so that MySQL understands that the sequence of statements that follows is still part of the stored proc, until it reaches the specified delimiter. In the case above, I made up one called $$ that differs from the default delimiter of a semi-colon that we are all used to. This way, when a semi-colon is encountered inside the stored proc during creation, the db engine just considers it one of the many statements inside, instead of terminating the stored proc creation. Without this delimiter wrapping, one can waste hours trying to create their first stored proc, hitting Error 1064 syntax errors. At the end of the create block I merely have a line
$$
which tells MySQL that it has reached the end of my creation block, and then the default delimiter of a semi-colon is set back with the call to
DELIMITER ;
See the MySQL manual page Using Delimiters with MySqlScript. Not a great manual page, imo, but trust me on this one. The same issue arises when creating Triggers and Events.
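To make the mechanics concrete, here is a minimal sketch of the pattern, independent of the proc above (demoProc is just a made-up name):
DELIMITER $$ -- any token not appearing in the body works; $$ is conventional
create procedure demoProc()
BEGIN
select 1; -- this semi-colon is now just a statement separator inside the body
select 2;
END$$ -- the made-up delimiter ends the CREATE block
DELIMITER ; -- restore the default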
PHP
To call this stored proc from PHP, it is just a string: "call showHierarchyUnder(1)". It returns a result set as described above, which can contain zero rows.
Remember that the 1 is a parameter to the stored proc, and that all of this lives in a database called TaxonSandbox if you followed the above.
Related
I am currently having problems with a primary key ID which is set to auto increment. It keeps incrementing ON DUPLICATE KEY.
For Example:
ID | field1 | field2
1 | user | value
5 | secondUser | value
86 | thirdUser | value
From the description above, you'll notice that I have 3 rows in that table, but due to the auto increment on each upsert, the ID is 86 for the third row.
Is there any way to avoid this?
Here's what my mySQL query looks like:
INSERT INTO table ( field1, field2 ) VALUES (:value1, :value2)
ON DUPLICATE KEY
UPDATE field1 = :value1, field2 = :value2
And here's what my table looks like;
CREATE TABLE IF NOT EXISTS `table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`field1` varchar(200) NOT NULL,
`field2` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `field1` (`field1`),
KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
You could set the innodb_autoinc_lock_mode config option to "0" for "traditional" auto-increment lock mode, which guarantees that all INSERT statements will assign consecutive values for AUTO_INCREMENT columns.
That said, you shouldn't depend on the auto-increment IDs being consecutive in your application. Their purpose is to provide unique identifiers.
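Note that innodb_autoinc_lock_mode is a startup option, not a dynamic variable, so it has to go into the server config and takes effect after a restart; a sketch of the my.cnf entry:
[mysqld]
innodb_autoinc_lock_mode = 0  # "traditional" mode: consecutive values, at the cost of a table-level AUTO-INC lock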
This behavior is easily seen below with the default setting innodb_autoinc_lock_mode = 1 (“consecutive” lock mode). Please also reference the fine manual page entitled AUTO_INCREMENT Handling in InnoDB. Changing the setting to 0 (“traditional” lock mode) will lower concurrency and performance, as it uses a table-level AUTO-INC lock.
That said, the below is with the default setting = 1.
I am about to show you four examples of how easy it is to create gaps.
Example 1:
create table x
( id int auto_increment primary key,
someOtherUniqueKey varchar(50) not null,
touched int not null,
unique key(someOtherUniqueKey)
);
insert x(touched,someOtherUniqueKey) values (1,'dog') on duplicate key update touched=touched+1;
insert x(touched,someOtherUniqueKey) values (1,'dog') on duplicate key update touched=touched+1;
insert x(touched,someOtherUniqueKey) values (1,'cat') on duplicate key update touched=touched+1;
select * from x;
+----+--------------------+---------+
| id | someOtherUniqueKey | touched |
+----+--------------------+---------+
| 1 | dog | 2 |
| 3 | cat | 1 |
+----+--------------------+---------+
The gap (id=2 is skipped) is due to one of a handful of operations, quirks, and nervous twitches of the InnoDB engine. In its default high-performance concurrency mode, it performs range gap allocations for various queries sent to it. One had better have a good reason to change this setting, because doing so impacts performance. These are the sorts of things later versions of MySQL deliver to you, and that get turned off from hyper-focusing on gaps in printout sheets (and bosses that ask "Why do we have gaps?").
In the case of an Insert on Duplicate Key Update (IODKU), the engine assumes 1 new row and allocates a slot for it. Remember: concurrency, and your peers doing the same operations, perhaps hundreds concurrently. When the IODKU turns into an Update, that abandoned, never-inserted row with id=2 is gone for your connection and everyone else's.
Example 2:
The same happens during Insert ... Select From, as seen in this answer of mine. In it I purposely used MyISAM, because the answer reported on counts, min, and max; otherwise the range-gap quirk would have allocated ids without filling them all, and the numbers would have looked weird, as that answer dealt with actual numbers. So the older engine (MyISAM) worked fine for tight, gap-free ranges. Note that in that answer I was trying to do something fast and safe, and that table could be converted to InnoDB with ALTER TABLE after the fact. Had I done that example in InnoDB to begin with, there would have been plenty of gaps (in the default mode). The reason Insert ... Select From would have created gaps there is the uncertainty of the row count: the engine knows it has to carve out a safe pool of AUTO_INCREMENT ids, has concurrency (other users) to think about, and so gaps flourish. It's a fact. Try it with the InnoDB engine and see what you come up with for min, max, and count: max won't equal count.
Examples 3 and 4:
There are various situations that cause InnoDB gaps, documented on the Percona website as they stumble into (and write up) more of them. For instance, gaps occur during inserts that fail due to foreign key constraints, as seen in this 1452 Error image, or due to a primary key violation, as in this 1062 Error image.
Remember that InnoDB gaps are a side-effect of system performance and a safe engine. Is that something one really wants to turn off (performance, higher user satisfaction, higher concurrency, lack of table locks) for the sake of tighter id ranges? Ranges that end up with holes on deletes anyway. I would suggest not, for my implementations; the default, with its performance, is just fine.
I am currently having problems with a primary key ID which is set to
auto increment. It keeps incrementing ON DUPLICATE KEY
One of us must be misunderstanding the problem, or you're misrepresenting it. ON DUPLICATE KEY UPDATE never creates a new row, so it cannot be incrementing. From the docs:
If you specify ON DUPLICATE KEY UPDATE, and a row is inserted that
would cause a duplicate value in a UNIQUE index or PRIMARY KEY, MySQL
performs an UPDATE of the old row.
Now it's probably the case that auto-increment occurs when you insert and no duplicate key is found. If I assume that this is what's happening, my question would be: why is that a problem?
If you absolutely want to control the value of your primary key, change your table structure to remove the auto-increment flag, but keep it a required, non-null field. It will force you to provide the keys yourself, but I would bet that this will become a bigger headache for you.
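For illustration, removing the flag from the table in the question would be a single ALTER (a sketch; the PRIMARY KEY itself stays in place):
ALTER TABLE `table` MODIFY `id` int(11) NOT NULL;
-- every INSERT must now supply an explicit, unique id value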
I really am curious though: why do you need to plug all the holes in the ID values?
I answered it here:
To solve the auto-incrementing problem, use the following code before the insert/on-duplicate-update part, and execute it all together:
SET @NEW_AI = (SELECT MAX(`the_id`)+1 FROM `table_blah`);
SET @ALTER_SQL = CONCAT('ALTER TABLE `table_blah` AUTO_INCREMENT =', @NEW_AI);
PREPARE NEWSQL FROM @ALTER_SQL;
EXECUTE NEWSQL;
Together, in one batch, it should look something like this:
SET @NEW_AI = (SELECT MAX(`the_id`)+1 FROM `table_blah`);
SET @ALTER_SQL = CONCAT('ALTER TABLE `table_blah` AUTO_INCREMENT =', @NEW_AI);
PREPARE NEWSQL FROM @ALTER_SQL;
EXECUTE NEWSQL;
INSERT INTO `table_blah` (`the_col`) VALUES("the_value")
ON DUPLICATE KEY UPDATE `the_col` = "the_value";
You can change your query from
INSERT INTO table ( f1, f2 ) VALUES (:v1, :v2) ON DUPLICATE KEY UPDATE f1 = :v1, f2 = :v2
to
insert ignore into table select (select max(id)+1 from table), :v1, :v2 ;
This will try to:
insert the new data with the last unused id (not auto-increment)
if a duplicate entry is found in the unique fields, ignore it
else insert the new data normally
(But this method does not support updating fields when a duplicate entry is found.)
For a bit of background, we use Zend Framework 2 and Doctrine at work. Doctrine will always insert NULL for values we do not populate ourselves. Usually this is okay because, if the field has a default value, then it SHOULD populate the field with this default value.
For one of our servers running MySQL 5.6.16 a query such as the one below runs and executes fine. Although NULL is being inserted into a field which is not nullable, MySQL populates the field with its default value on insert.
On another of our servers running MySQL 5.6.20, we run the query below and it falls over because it complains that 'field_with_default_value' CANNOT be null.
INSERT INTO table_name(id, field, field_with_default_value)
VALUES(id_value, field_value, NULL);
Doctrine itself does not support passing through "DEFAULT" into the queries it builds so that is not an option. I figure this must be a MySQL server thing of some kind seeing as though it works okay in one version but not another, but unfortunately I have no idea what this could be. Our SQL Mode is also identical on both servers ('NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION').
I should probably mention that if I actually run the above SQL in Workbench, it fails in the same way. So it's not really a Doctrine issue but definitely a MySQL issue of some sort.
Any help on this would be greatly appreciated.
I came across the same problem after a MySQL upgrade. Turns out there is a setting to allow NULL inserts against NOT NULL timestamp fields and get the default value.
explicit_defaults_for_timestamp=0
This is documented at https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_explicit_defaults_for_timestamp
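You can check what a server is currently running with the query below (in 5.6 the variable is read-only at runtime, so changing it means editing the config and restarting):
SHOW VARIABLES LIKE 'explicit_defaults_for_timestamp';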
Based on my research, I would say it could both be a "you" thing and a "MySQL" thing. Check your table definitions with SHOW CREATE TABLE table_name;. Take note of any fields defined with NOT NULL.
The MySQL 5.6 Reference Manual: 13.2.5 INSERT syntax states:
Inserting NULL into a column that has been declared NOT NULL. For
multiple-row INSERT statements or INSERT INTO ... SELECT statements,
the column is set to the implicit default value for the column data
type. This is 0 for numeric types, the empty string ('') for string
types, and the “zero” value for date and time types. INSERT INTO ...
SELECT statements are handled the same way as multiple-row inserts
because the server does not examine the result set from the SELECT to
see whether it returns a single row. (For a single-row INSERT, no
warning occurs when NULL is inserted into a NOT NULL column. Instead,
the statement fails with an error.)
This would imply that it does not matter which SQL mode you are using. If you are doing a single row INSERT (as per your sample code) and inserting a NULL value into a column defined with NOT NULL, it is not supposed to work.
In the same breath, ironically, if you were to simply omit the value from the values list, the MySQL manual says the following, and the SQL mode does matter in this case:
If you are not running in strict SQL mode, any column not explicitly
given a value is set to its default (explicit or implicit) value. For
example, if you specify a column list that does not name all the
columns in the table, unnamed columns are set to their default values.
Default value assignment is described in Section 11.6, “Data Type
Default Values”. See also Section 1.7.3.3, “Constraints on Invalid
Data”.
Thus, you can't win! ;-) Kidding. The thing to do is to accept that NOT NULL on a MySQL table field really means "I will not accept a NULL value for this field while performing a single-row INSERT, regardless of SQL mode."
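To make that concrete against the table_name schema from the question, compare the explicit NULL with simply omitting the column (a sketch):
-- fails: explicit NULL into a NOT NULL column (single-row INSERT, any SQL mode)
INSERT INTO table_name (id, field, field_with_default_value) VALUES (1, 'Value', NULL);
-- works: column omitted, so the declared default 'myDefault' is applied
INSERT INTO table_name (id, field) VALUES (2, 'Value');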
All that being said, the following from the manual is also true:
For data entry into a NOT NULL column that has no explicit DEFAULT
clause, if an INSERT or REPLACE statement includes no value for the
column, or an UPDATE statement sets the column to NULL, MySQL handles
the column according to the SQL mode in effect at the time:
If strict SQL mode is enabled, an error occurs for transactional
tables and the statement is rolled back. For nontransactional tables,
an error occurs, but if this happens for the second or subsequent row
of a multiple-row statement, the preceding rows will have been
inserted.
If strict mode is not enabled, MySQL sets the column to the implicit
default value for the column data type.
So, take heart. Set your defaults in the business logic (objects), and let the data layer take direction from that. Database defaults seem like a good idea, but if they did not exist, would you miss them? If a tree falls in the forest...
According to the documentation, everything works as expected.
Test case:
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 5.6.16 |
+-----------+
1 row in set (0.00 sec)
mysql> SELECT @@GLOBAL.sql_mode 'sql_mode::GLOBAL',
@@SESSION.sql_mode 'sql_mode::SESSION';
+------------------------+------------------------+
| sql_mode::GLOBAL | sql_mode::SESSION |
+------------------------+------------------------+
| NO_ENGINE_SUBSTITUTION | NO_ENGINE_SUBSTITUTION |
+------------------------+------------------------+
1 row in set (0.00 sec)
mysql> SET SESSION sql_mode := 'NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT @@GLOBAL.sql_mode 'sql_mode::GLOBAL',
@@SESSION.sql_mode 'sql_mode::SESSION';
+------------------------+-----------------------------------------------------------------------------------------------------------------+
| sql_mode::GLOBAL | sql_mode::SESSION |
+------------------------+-----------------------------------------------------------------------------------------------------------------+
| NO_ENGINE_SUBSTITUTION | NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+------------------------+-----------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SHOW CREATE TABLE `table_name`;
+------------+----------------------------------------------------------------------------+
| Table | Create Table |
+------------+----------------------------------------------------------------------------+
| table_name | CREATE TABLE `table_name` ( |
| | `id` INT(11) UNSIGNED NOT NULL, |
| | `field` VARCHAR(20) DEFAULT NULL, |
| | `field_with_default_value` VARCHAR(20) NOT NULL DEFAULT 'myDefault' |
| | ) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+------------+----------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> INSERT INTO `table_name`(`id`, `field`, `field_with_default_value`)
VALUES
(1, 'Value', NULL);
ERROR 1048 (23000): Column 'field_with_default_value' cannot be null
Could you post the relevant part of your table's structure, so we can see how to help?
UPDATE
MySQL 5.7, using triggers, can provide a possible solution to the problem:
Changes in MySQL 5.7.1 (2013-04-23, Milestone 11)
...
If a column is declared as NOT NULL, it is not permitted to insert
NULL into the column or update it to NULL. However, this constraint
was enforced even if there was a BEFORE INSERT (or BEFORE UPDATE
trigger) that set the column to a non-NULL value. Now the constraint
is checked at the end of the statement, per the SQL standard. (Bug
#6295, Bug#11744964).
...
Possible solution:
mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SELECT VERSION();
+-----------+
| VERSION() |
+-----------+
| 5.7.4-m14 |
+-----------+
1 row in set (0.00 sec)
mysql> DELIMITER $$
mysql> CREATE TRIGGER `trg_bi_set_default_value` BEFORE INSERT ON `table_name`
FOR EACH ROW
BEGIN
IF (NEW.`field_with_default_value` IS NULL) THEN
SET NEW.`field_with_default_value` :=
(SELECT `COLUMN_DEFAULT`
FROM `information_schema`.`COLUMNS`
WHERE `TABLE_SCHEMA` = DATABASE() AND
`TABLE_NAME` = 'table_name' AND
`COLUMN_NAME` = 'field_with_default_value');
END IF;
END$$
mysql> DELIMITER ;
mysql> INSERT INTO `table_name`(`id`, `field`, `field_with_default_value`)
VALUES
(1, 'Value', NULL);
Query OK, 1 row affected (0.00 sec)
mysql> SELECT `id`, `field`, `field_with_default_value` FROM `table_name`;
+----+-------+--------------------------+
| id | field | field_with_default_value |
+----+-------+--------------------------+
| 1 | Value | myDefault |
+----+-------+--------------------------+
1 row in set (0.00 sec)
MySQL actually works as intended, and that behavior seems to be there to stay. MariaDB also works the same way now.
Removing "strict mode" (STRICT_TRANS_TABLES & STRICT_ALL_TABLES) is supposed to revert to the previous behavior, but I personally haven't had any luck with it (maybe I'm doing something wrong, but both my ##GLOBAL.sql_mode & ##SESSION.sql_mode do not contain strict mode).
I think the best solution to this problem is to rely on default values at the PHP level, instead of relying on the Database to provide them. There is an existing answer that explains it pretty well. The comments are also helpful.
That way, you also gain the added benefit that your models/entities will have the default value upon instantiation instead of upon insert in the database. Also, if you want to surface those values to the user after insertion, you can do so without having to do an extra SELECT query after your INSERT.
Another alternative to surface the default values would be to use a RETURNING clause, as is available in PostgreSQL, but not in MySQL (yet). It might be added at some point in the future, but for now MariaDB only has it for DELETE statements. However, I believe that having the default values at the PHP level is still superior; even if you never insert the record, it'll still contain the default values. I've never turned back and used a database default value since putting this into practice.
If you leave out the column (both name and value) from the statement, then the default value will be used.
Some related advice:
Don't have any "non-empty" defaults in your tables, and don't have non-null defaults for nullable columns. Let all values be set from the application.
Don't put business logic on the database-side.
Only define a default if really needed, and only for non-nullable columns; remove the default when no longer needed. (Defaults come in handy with ALTER TABLE runs, to set the value of a new column; then immediately run a new, cheap ALTER to remove the default, as sketched below.)
The "empty" mentioned above, is related to the type:
- 0 for numerical columns,
- '' for varchar/varbinary columns,
- '1970-01-01 12:34:56' for timestamps,
- etc.
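The alter-table pattern mentioned in the advice above might look like this, sketched on a hypothetical table t and column flags:
ALTER TABLE t ADD COLUMN flags int NOT NULL DEFAULT 0; -- backfills existing rows with 0
ALTER TABLE t ALTER COLUMN flags DROP DEFAULT; -- cheap follow-up; new inserts must now supply flags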
That saves the application many round trips to the database. If a created row is fully predictable, then the application doesn't need to read it back after creating it, to find out what it has become. (This assumes no triggers and no cascading.)
With MySQL we make only a few specific exceptions to those strict rules:
Columns called mysql_row_foo are only set by the database. Examples:
mysql_row_created_at timestamp(6) not null default '1970-01-01 12:34:56.000000',
mysql_row_updated_at timestamp(6) null default null on update current_timestamp,
Unique indexes on not-null columns are welcome, to prevent duplicate data. For example on lookup.brand.name in a table lookup.brand that looks like (id++, name).
The mysql_row_foo columns are like column attributes. They are used by data sync tools, for example. General applications don't read them, and they store their application-side timestamps as epoch values. Examples:
valid_until_epoch int unsigned not null default 0,
last_seen_epoch_ms bigint not null default 0,
I have a very large (2.7 MB) XML file with the following structure:
<?xml version="1.0"?>
<Destinations>
<Destination>
<DestinationId>W4R1FG</DestinationId>
<Country>Pakistan</Country>
<City>Karachi</City>
<State>Sindh</State>
</Destination>
<Destination>
<DestinationId>D2C2FV</DestinationId>
<Country>Turkey</Country>
<City>Istanbul</City>
<State>Istanbul</State>
</Destination>
<Destination>
<DestinationId>5TFV3E</DestinationId>
<Country>Canada</Country>
<City>Toronto</City>
<State>Ontario</State>
</Destination>
... ... ...
</Destinations>
And a MySQL table "destinations" like this:
+---+--------------+----------+---------+----------+
|id |DestinationId |Country |City |State |
+---+--------------+----------+---------+----------+
|1 |W4R1FG |Pakistan |Karachi |Sindh |
+---+--------------+----------+---------+----------+
|2 |D2C2FV |Turkey |Istanbul |Istanbul |
+---+--------------+----------+---------+----------+
|3 |5TFV3E |Canada |Toronto |Ontario |
+---+--------------+----------+---------+----------+
|. |...... |...... |....... |....... |
+---+--------------+----------+---------+----------+
Now I want to process my XML and check each destination record against the MySQL table. I have to compare only DestinationId against each record and check whether it exists in my DB table or not. If it does exist, leave that record and move on; if it doesn't, execute an INSERT query to insert that record into the table.
I first tried to accomplish this using a PHP foreach loop, but since the data is so huge, it caused serious performance and speed issues. Then I came up with a MySQL procedure approach like this:
DELIMITER $$
USE `destinations`$$
DROP PROCEDURE IF EXISTS `p_import_destinations`$$
CREATE DEFINER=`root`@`localhost` PROCEDURE `p_import_destinations`(
p_xml TEXT
)
BEGIN
DECLARE v_row_index INT UNSIGNED DEFAULT 0;
DECLARE v_row_count INT UNSIGNED;
DECLARE v_xpath_row VARCHAR(255);
-- calculate the number of row elements.
SET v_row_count := extractValue(p_xml,'count(/Destinations/Destination)');
-- loop through all the row elements
WHILE v_row_index < v_row_count DO
SET v_row_index := v_row_index + 1;
SET v_xpath_row := CONCAT('/Destinations/Destination[',v_row_index,']');
INSERT IGNORE INTO destinations VALUES (
NULL,
extractValue(p_xml,CONCAT(v_xpath_row, '/child::DestinationId')),
extractValue(p_xml,CONCAT(v_xpath_row, '/child::Country')),
extractValue(p_xml,CONCAT(v_xpath_row, '/child::City')),
extractValue(p_xml,CONCAT(v_xpath_row, '/child::State'))
);
END WHILE;
END$$
DELIMITER ;
Query to call this procedure:
SET @xml := LOAD_FILE('C:/Users/Muhammad Ali/Desktop/dest.xml');
CALL p_import_destinations(@xml);
This worked perfectly, but I am still not sure about this approach's scalability, performance, and speed. Also, the IGNORE clause used in this procedure skips duplicate records but still consumes auto-increment key values. For example, if it is checking a row at id 3306 and that record is a duplicate, it will not insert it into the table (which is a good thing) but will burn the auto-increment key 3307, and the next time it inserts a NON-DUPLICATE record, it will insert it at 3308. That doesn't seem good.
Any other approaches to meet this requirement would be much appreciated. And please tell me whether I am okay to go on with this solution; if not, why?
Just remember, I am dealing with a huge amount of data.
This worked perfectly, but I am still not sure about this approach's scalability, performance, and speed.
Measure the speed and test how it scales; then you're sure. Ask again if you find a problem that would hurt you in your scenario, but make the performance/scalability problem concrete. Most likely that part has been Q&A'ed already, if not here on Stack Overflow then on the DBA site: https://dba.stackexchange.com/
Also, the IGNORE clause used in this procedure skips duplicate records but still consumes auto-increment key values
This is similar. If those gaps are a problem for you, that normally points to a flaw in your database design, because gaps are normally meaningless (compare: How to fill in the "holes" in auto-increment fields?).
That doesn't mean others haven't had the same problem; you can find a lot of material on it, including "tricks" for preventing it in specific versions of your database server. But honestly, I wouldn't care about gaps. The contract is that the identity column has a unique value, and that's all.
In any case, both for performance and for the IDs: why don't you take the processing apart? First import from the XML into an import table; then you can easily remove every row you don't want from that import table, and then insert into the destination table as needed.
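As a sketch of that first step: on MySQL 5.5+, LOAD XML can populate the import table directly, matching <DestinationId>, <Country>, etc. to same-named columns, which may be simpler than walking the document with extractValue (the import table name here is hypothetical):
-- hypothetical import table with the same layout as the target
CREATE TABLE destinations_import LIKE destinations;
LOAD XML LOCAL INFILE 'C:/Users/Muhammad Ali/Desktop/dest.xml'
INTO TABLE destinations_import
ROWS IDENTIFIED BY '<Destination>';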
Solved this using the alternative logic described below.
DELIMITER $$
USE `test`$$
DROP PROCEDURE IF EXISTS `import_destinations_xml`$$
CREATE DEFINER=`root`@`localhost` PROCEDURE `import_destinations_xml`(
path VARCHAR(255),
node VARCHAR(255)
)
BEGIN
DECLARE xml_content TEXT;
DECLARE v_row_index INT UNSIGNED DEFAULT 0;
DECLARE v_row_count INT UNSIGNED;
DECLARE v_xpath_row VARCHAR(255);
-- set xml content.
SET xml_content = LOAD_FILE(path);
-- calculate the number of row elements.
SET v_row_count = extractValue(xml_content, CONCAT('count(', node, ')'));
-- create a temporary destinations table
DROP TABLE IF EXISTS `destinations_temp`;
CREATE TABLE `destinations_temp` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`DestinationId` VARCHAR(32) DEFAULT NULL,
`Country` VARCHAR(255) DEFAULT NULL,
`City` VARCHAR(255) DEFAULT NULL,
`State` VARCHAR(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
-- loop through all the row elements
WHILE v_row_index < v_row_count DO
SET v_row_index = v_row_index + 1;
SET v_xpath_row = CONCAT(node, '[', v_row_index, ']');
INSERT INTO destinations_temp VALUES (
NULL,
extractValue(xml_content, CONCAT(v_xpath_row, '/child::DestinationId')),
extractValue(xml_content, CONCAT(v_xpath_row, '/child::Country')),
extractValue(xml_content, CONCAT(v_xpath_row, '/child::City')),
extractValue(xml_content, CONCAT(v_xpath_row, '/child::State'))
);
END WHILE;
-- delete existing records from temporary destinations table
DELETE FROM destinations_temp WHERE DestinationId IN (SELECT DestinationId FROM destinations);
-- insert remaining (unmatched) records from temporary destinations table to destinations table
INSERT INTO destinations (DestinationId, Country, City, State)
SELECT DestinationId, Country, City, State
FROM destinations_temp;
-- creating a log file
SELECT *
INTO OUTFILE 'C:/Users/Muhammad Ali/Desktop/Destination_Import_Procedure/log/destinations_log.csv'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n'
FROM `destinations_temp`;
-- removing temporary destinations table
DROP TABLE destinations_temp;
END$$
DELIMITER ;
Query to call this procedure:
CALL import_destinations_xml('C:/Users/Muhammad Ali/Desktop/Destination_Import_Procedure/dest.xml', '/Destinations/Destination');
Consider 2 tables:
-- Table apples
id | randomstring | type TINYINT(1) DEFAULT 0
-- Table pears
id | randomstring | is_complete TINYINT(1) DEFAULT 0
Much like PHP events, is it possible to detect that when the field pears.is_complete is updated to a 1 from a 0, MySQL automatically copies the contents of pears.randomstring to apples.randomstring and sets the value of apples.type to 1?
If this is not possible in MySQL, can it be done in PHP where a php file is called when the value of a field changes?
If you're running MySQL 5.0.2 or greater, then using a trigger might be your best bet. Just create an update trigger that runs when the is_complete value changes to 1.
Trigger code:
DELIMITER $$
CREATE TRIGGER `{database name}`.`{name of trigger}`
AFTER UPDATE ON `{database name}`.`pears`
FOR EACH ROW BEGIN
IF(NEW.is_complete = 1) THEN
# Add your random string to the apple table code
INSERT INTO `{database name}`.`apples` (randomstring, type)
VALUES (NEW.randomstring, 1);
END IF;
END $$
DELIMITER ;
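To exercise the trigger, flip a row's flag and check the apples table (a sketch; note that as written the trigger fires on any update that leaves is_complete at 1, so comparing NEW.is_complete against OLD.is_complete inside the trigger would restrict it to the actual 0-to-1 transition):
UPDATE pears SET is_complete = 1 WHERE id = 1;
-- the AFTER UPDATE trigger should now have copied that row's randomstring
SELECT * FROM apples WHERE type = 1;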
I need to make the database structure for a real estate website where users can create properties of many types, with many features related to each property.
The main categories will be:
1. House (subtype apartment, house, loft)
2. Commercial (subtype: hotels, buildings, offices, factory)
3. Terrains (subtype: urban, agricultural, industrial, for sports)
All of the above can have many features defined; for example, an apartment: light, gas, number of rooms, bathrooms, floor number, balcony, and so on. These features differ from one property type to another.
At the moment I have one master table named property containing the basic info like address and price, and three subtables property_house, property_commercial, and property_terrain, each with as many fields as features that property type can have.
Is this structure okay? I need to handle creation and modification of all the property types in one form, maybe with 3-4 steps, which will differ from one property type to another. Would it be easier to have just one master table property and a second table property_features storing property_id, feature_name, and feature_value? What's best for performance and maintainability? What would you vote for?
Thank you! :)
I have experience with both ways you have mentioned. (I'm a co-developer of iRealty, http://www.irealtysoft.com/; ver 3 and ver 4 use two different storage methods.) After several years of dealing with both ways, I recommend creating a single table for all properties. This pattern is called Single Table Inheritance (http://martinfowler.com/eaaCatalog/singleTableInheritance.html, by Martin Fowler).
I see only two disadvantages of this method:
field names should be unique within all property types
a lot of records will have NULL in about half of their columns, which wastes a little disk space
At the same time, with this database structure all CRUD routines are very simple and straightforward. You will save a lot of time building the queries/ORM layer. With this structure you are free to create indexes and use arithmetic and other database functions in WHERE clauses, and you avoid costly JOINs.
The disk space is cheap, the development time is expensive.
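For reference, a Single Table Inheritance sketch for this domain (column names are hypothetical; type-specific columns are nullable and simply stay NULL for the other types):
CREATE TABLE property (
id int not null auto_increment,
property_type varchar(20) not null, -- 'house', 'commercial', 'terrain'
address varchar(250) not null,
price decimal(12,2) not null,
rooms int null, -- house-specific
floor_number int null, -- house-specific
has_balcony tinyint(1) null, -- house-specific
office_area int null, -- commercial-specific
primary key(id)
);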
The | property_id | feature_name | feature_value | approach lets you keep the same database structure when changing fields and property types, which is good when you have complex upgrade/update routines. If you are going to build a single (production) instance application, upgrades should not be an issue. However, this method makes the CRUD model complex, and hence more expensive and bug-prone. (More code --- more bugs.)
Well, are these three main categories set in stone? Is there a possibility of a fourth one cropping up in the future? I would probably go with something like this:
CREATE TABLE property (
id int not null auto_increment,
name varchar(250) not null,
property_type int not null,
property_subtype int not null,
primary key(id)
);
CREATE TABLE property_type (
id int not null auto_increment,
name varchar(250) not null,
primary key(id)
);
CREATE TABLE property_subtype (
id int not null auto_increment,
type int not null,
name varchar(250) not null,
primary key(id)
);
CREATE TABLE property_feature (
id int not null auto_increment,
property int not null,
feature int not null,
value varchar(250) not null,
primary key(id)
);
CREATE TABLE subtype_feature (
id int not null auto_increment,
subtype int not null,
name varchar(250) not null,
primary key(id)
);
I think this would be the most effective in the long run, and the most flexible if (or when) the time comes.
With this structure, you can then add the data like this:
mysql> INSERT INTO property_type (name) VALUES ('House'),('Commercial'),('Terrains');
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> INSERT INTO property_subtype (type, name) VALUES (1, 'Apartment'),(1, 'House'), (1,'Loft');
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> INSERT INTO subtype_feature (subtype, name) VALUES (1, 'Light'),(1, 'Floor #');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> INSERT INTO property (name, property_type, property_subtype) VALUES ('Some Apartment', 1, 1);
Query OK, 1 row affected (0.01 sec)
mysql> INSERT INTO property_feature (property, feature, value) VALUES (1, 1, 'Yes'),(1, 2, '5th');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
You can then get all the features of a particular property pretty easily:
mysql> SELECT s.name, f.value FROM property_feature f INNER JOIN subtype_feature s ON f.feature = s.id WHERE f.property = 1;
+---------+-------+
| name | value |
+---------+-------+
| Light | Yes |
| Floor # | 5th |
+---------+-------+
2 rows in set (0.00 sec)