I am building a color search function using PHP and MySQL. The requirements are that the search must be fast, must not use joins, and must accept 1-5 hex color inputs that query the database and return the "most accurate" results, meaning results that reflect the search input. I have a few pieces of data to help with that, such as the distance between the mapped color value (mapped against an array of pre-defined colors) and the original search input hex value (e.g. ff0000).
The way the color search engine works is that you input 1-5 hex values (e.g. #ff0000, #000000, #9ef855, etc.), click search, and it searches the database for images that contain the highest percentage of those colors. See this color search for a reference to how a color search engine works. Note: I built that one, but it has a completely different schema, which has scaling problems and can't use indexes because the number of colors is directly tied to the number of table columns, which is 120. Suggesting I use what I have already built is out of the question for right now.
The data in the database comes from measurements taken on images. Up to 5 colors are extracted from an image, and then each hex color value (hex) is mapped to the closest predefined hex value (map_hex). Both of these pieces of data as well as the following are stored in the database:
media_id
hex (actual true value from image measurement)
map_hex (mapped value of the previous hex value)
percentage (the amount of this color found in the image)
distance (the distance between the true hex value and the mapped hex value)
sequence (unix timestamp, for ordering)
Before a color search query gets sent to the database, the input is mapped to the same set of predefined colors, so the query can do a direct lookup on map_hex. This seemed faster to me than trying to do a range-type query.
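The mapping step itself looks roughly like this (a minimal sketch, assuming plain RGB Euclidean distance; the function name and palette format are illustrative):
// Map an input hex to the closest predefined palette color.
function map_to_palette($hex, array $palette) {
    $r = hexdec(substr($hex, 0, 2));
    $g = hexdec(substr($hex, 2, 2));
    $b = hexdec(substr($hex, 4, 2));
    $best = null;
    $bestDist = PHP_INT_MAX;
    foreach ($palette as $pHex) {
        $dist = sqrt(
            pow($r - hexdec(substr($pHex, 0, 2)), 2) +
            pow($g - hexdec(substr($pHex, 2, 2)), 2) +
            pow($b - hexdec(substr($pHex, 4, 2)), 2)
        );
        if ($dist < $bestDist) {
            $bestDist = $dist;
            $best = $pHex;
        }
    }
    return array($best, $bestDist); // the map_hex and distance stored per color
}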
As of right now I am experimenting with two database design schemas but both seem to have their own problems.
Schema 1
CREATE TABLE `media_has_colors` (
`media_id` int(9) unsigned NOT NULL,
`hex` varchar(6) NOT NULL DEFAULT '',
`map_hex` varchar(6) NOT NULL,
`percentage` double unsigned NOT NULL,
`distance` double unsigned NOT NULL,
`sequence` int(11) unsigned NOT NULL,
PRIMARY KEY (`media_id`,`hex`),
KEY `index_on_hex` (`hex`),
KEY `index_on_percentage` (`percentage`),
KEY `index_on_timestamp` (`sequence`),
KEY `index_on_media_id` (`media_id`),
KEY `index_on_mapping_distance` (`distance`),
KEY `index_on_mapping_hex` (`map_hex`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Sample query:
SELECT SQL_NO_CACHE media_id, hex, map_hex, distance,
       AVG(percentage) AS percentage,
       SUM(IF(map_hex = '61615a',1,0) + IF(map_hex = '34362d',1,0) + IF(map_hex = 'dbd5dd',1,0)) AS matchCount
FROM media_has_colors
WHERE map_hex = '61615a' OR map_hex = '34362d' OR map_hex = 'dbd5dd'
GROUP BY media_id
ORDER BY matchCount DESC, distance, percentage DESC
LIMIT 100;
The first problem I see with schema 1 is that I am forced to use GROUP BY and SUM. I'll admit I have not tested with a ton of records yet, but it seems like it could get slow. On top of that, I can't tell which map_hex values are matching (which is what I'm trying to get at with matchCount).
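One simplification I'm considering: since the WHERE clause already restricts rows to the searched map_hex values, the SUM of IF() terms should reduce to a plain COUNT(*), assuming each media_id stores a given map_hex at most once:
SELECT SQL_NO_CACHE media_id,
       AVG(percentage) AS percentage,
       COUNT(*) AS matchCount
FROM media_has_colors
WHERE map_hex IN ('61615a','34362d','dbd5dd')
GROUP BY media_id
ORDER BY matchCount DESC, AVG(distance), percentage DESC
LIMIT 100;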
Schema 2
CREATE TABLE `media_has_colors` (
`media_id` int(9) unsigned NOT NULL,
`color_1_hex` varchar(6) NOT NULL DEFAULT '',
`color_2_hex` varchar(6) NOT NULL DEFAULT '',
`color_3_hex` varchar(6) NOT NULL DEFAULT '',
`color_4_hex` varchar(6) NOT NULL DEFAULT '',
`color_5_hex` varchar(6) NOT NULL DEFAULT '',
`color_1_map_hex` varchar(6) NOT NULL DEFAULT '',
`color_2_map_hex` varchar(6) NOT NULL DEFAULT '',
`color_3_map_hex` varchar(6) NOT NULL DEFAULT '',
`color_4_map_hex` varchar(6) NOT NULL DEFAULT '',
`color_5_map_hex` varchar(6) NOT NULL DEFAULT '',
`color_1_percent` double unsigned NOT NULL DEFAULT '0',
`color_2_percent` double unsigned NOT NULL DEFAULT '0',
`color_3_percent` double unsigned NOT NULL DEFAULT '0',
`color_4_percent` double unsigned NOT NULL DEFAULT '0',
`color_5_percent` double unsigned NOT NULL DEFAULT '0',
`color_1_distance` double unsigned NOT NULL DEFAULT '0',
`color_2_distance` double unsigned NOT NULL DEFAULT '0',
`color_3_distance` double unsigned NOT NULL DEFAULT '0',
`color_4_distance` double unsigned NOT NULL DEFAULT '0',
`color_5_distance` double unsigned NOT NULL DEFAULT '0',
`sequence` int(11) unsigned NOT NULL,
PRIMARY KEY (`media_id`),
KEY `index_on_timestamp` (`sequence`),
KEY `index_on_map_hex` (`color_1_map_hex`,`color_2_map_hex`,`color_3_map_hex`,`color_4_map_hex`,`color_5_map_hex`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This second schema is not as simple, but it does avoid GROUP BY by allowing only one row per media. However, it seems to have the same problem of figuring out which map_hex values are matching. Here is a sample query:
SELECT SQL_NO_CACHE media_id,
(IF(color_1_map_hex = '61615a',color_1_percent,1)) *
(IF(color_2_map_hex = '34362d',color_2_percent,1)) *
(IF(color_3_map_hex = 'dbd5dd',color_3_percent,1)) as percentage,
(IF(color_1_map_hex = '61615a',color_1_distance,1)) +
(IF(color_2_map_hex = '34362d',color_2_distance,1)) +
(IF(color_3_map_hex = 'dbd5dd',color_3_distance,1)) as distance,
color_1_map_hex, color_2_map_hex, color_3_map_hex, color_4_map_hex, color_5_map_hex,
(IF(color_1_map_hex = '61615a',1,0)) +
(IF(color_2_map_hex = '34362d',1,0)) +
(IF(color_3_map_hex = 'dbd5dd',1,0)) as matchCount
FROM media_has_colors
WHERE color_1_map_hex IN ('61615a','34362d','dbd5dd') OR
color_2_map_hex IN ('61615a','34362d','dbd5dd') OR
color_3_map_hex IN ('61615a','34362d','dbd5dd')
ORDER BY matchCount DESC, distance, percentage DESC
LIMIT 100;
You can see that there is a problem with calculating percentage and distance because the actual map_hex value may not appear in those specific columns.
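To score this correctly, each searched value would have to be checked against all five column positions, which multiplies the number of IF() terms. For example, the matchCount contribution for just one searched color would be:
(IF(color_1_map_hex = '61615a',1,0) +
 IF(color_2_map_hex = '61615a',1,0) +
 IF(color_3_map_hex = '61615a',1,0) +
 IF(color_4_map_hex = '61615a',1,0) +
 IF(color_5_map_hex = '61615a',1,0))
With 5 search colors that is 25 such terms for matchCount alone, and the same again for the percentage and distance expressions.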
Update:
I don't need to know specifically which colors matched in the query, but I do need to sort by which media has the most matches.
So my question is: how can the schema or queries be fixed? If they can't, is there a better solution?
Related
I have the following 2 tables, api_analytics_data, and telecordia.
CREATE TABLE `api_analytics_data` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`upload_file_id` bigint(20) NOT NULL,
`partNumber` varchar(100) DEFAULT NULL,
`clei` varchar(45) DEFAULT NULL,
`description` varchar(150) DEFAULT NULL,
`processed` tinyint(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_aad_clei` (`clei`),
KEY `idx_aad_pn` (`partNumber`),
KEY `id_aad_processed` (`processed`),
KEY `idx_combo1` (`partNumber`,`clei`,`upload_file_id`)
) ENGINE=InnoDB CHARSET=latin1;
CREATE TABLE `telecordia` (
`tid` int(11) NOT NULL AUTO_INCREMENT,
`ProdID` varchar(50) DEFAULT NULL,
`Mfg` varchar(20) DEFAULT NULL,
`Pn` varchar(50) DEFAULT NULL,
`Clei` varchar(50) DEFAULT NULL,
`Series` varchar(50) DEFAULT NULL,
`Dsc` varchar(50) DEFAULT NULL,
`Eci` varchar(50) DEFAULT NULL,
`AddDate` date DEFAULT NULL,
`ChangeDate` date DEFAULT NULL,
`Cost` float DEFAULT NULL,
PRIMARY KEY (`tid`),
KEY `telecordia.ProdID` (`ProdID`) USING BTREE,
KEY `telecordia.clei` (`Clei`),
KEY `telecordia.pn` (`Pn`),
KEY `telecordia.eci` (`Eci`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Users upload data via a web interface using Excel/CSV files into api_analytics_data. The data contains EITHER partNumbers or CLEIs. I then update the api_analytics_data table by joining it to the telecordia table, which is the master list of partNumbers and CLEIs.
So if a user uploads a file of CLEIs, the update/join I use is:
update api_analytics_data aad
inner join telecordia t on aad.clei = t.Clei
set aad.partNumber = t.Pn
where aad.partNumber is null
and aad.upload_file_id = 5;
It works quickly, but not very thoroughly. The problem I have is that the CLEI uploaded may only be a substring of the CLEI in the telecordia table.
For example, the uploaded CLEI may be "5SC1DX0". In the telecordia table, the correct matching row is:
tid: 184324
ProdID: 472467
Mfg: PLSE
Pn: AUA58-2-REV-E
Clei: 5SC1DX04AA
Series: null
Dsc: DL SGL-PTY POTS CU RT
Eci: 205756
AddDate: 1994-03-18
ChangeDate: 1998-04-13
Cost: null
So obviously my update doesn't work in this case, even though 5SC1DX0 and 5SC1DX04AA are the same part.
What I need is a wildcard search. However, when I try this, it is crazy slow: with about 4500 rows uploaded into the api_analytics_data table, it runs for about 10 minutes and then loses the connection to the server.
update api_analytics_data aad
inner join telecordia t on aad.clei like concat(t.Clei,'%')
set aad.partNumber = t.Pn
where aad.partNumber is null
and aad.upload_file_id = 5;
Is there a way to optimize this so that it runs quickly?
The correct answer is "no". The better course of action is to create a new column in telecordia with the correct Clei value in it, one that can be used for joining the tables. In the most recent versions of MySQL, this can even be a generated (computed) column, and it can be indexed.
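For example, on MySQL 5.7 or later, something like this (the 7-character prefix length is an assumption based on the sample data):
ALTER TABLE telecordia
  ADD COLUMN clei_prefix varchar(7) GENERATED ALWAYS AS (LEFT(Clei, 7)) STORED,
  ADD KEY idx_clei_prefix (clei_prefix);
-- then join on it: ... inner join telecordia t on aad.clei = t.clei_prefix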
That said, you might be able to do something if the matching portion is always the same length. If so, try this:
update api_analytics_data aad inner join
telecordia t
on t.Clei = left(aad.clei, 7)
set aad.partNumber = t.Pn
where aad.partNumber is null and aad.upload_file_id = 5;
For this query, you want an index on api_analytics_data(upload_file_id, partNumber, clei) and on telecordia(Clei, Pn).
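In SQL, those would be roughly the following (the index names are illustrative):
ALTER TABLE api_analytics_data
  ADD KEY idx_aad_upload_pn_clei (upload_file_id, partNumber, clei);
ALTER TABLE telecordia
  ADD KEY idx_t_clei_pn (Clei, Pn);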
I looked around Google for some samples/solutions for this, for example Creating Discount Code System (MySQL/php), but I haven't found a good solution.
My situation is that I have a platform where the user has a balance in a virtual currency and can buy virtual items with it. Now there's a wish to implement vouchers and discounts. There would be different kinds of codes, for example one that gives a 50% discount on purchased items, one that gives x extra items (with or without a minimum item amount), a code that simply grants some currency, or a reference code that gives the referrer something.
I have implemented it as Campaign and CampaignType, where the first holds the campaign info and the second holds the action info.
Here's the structure:
-- Table structure for table `cake_campaigns`
CREATE TABLE IF NOT EXISTS `cake_campaigns` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(50) CHARACTER SET utf8 NOT NULL,
`code` varchar(100) COLLATE utf8_bin NOT NULL,
`type_id` varchar(50) CHARACTER SET utf16 COLLATE utf16_bin NOT NULL DEFAULT '1',
`value` int(10) unsigned NOT NULL DEFAULT '5' COMMENT 'Percentage or amount',
`min_amount` bigint(20) unsigned NOT NULL DEFAULT '0',
`owner_id` bigint(20) unsigned NOT NULL,
`created` datetime NOT NULL,
`active` tinyint(1) unsigned NOT NULL DEFAULT '1',
`single_use` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `code` (`code`),
KEY `owner_id` (`owner_id`),
FULLTEXT KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=4 ;
-- Table structure for table `cake_campaign_types`
CREATE TABLE IF NOT EXISTS `cake_campaign_types` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(50) CHARACTER SET utf8 NOT NULL,
`unit` varchar(10) CHARACTER SET utf16 NOT NULL DEFAULT '%',
`multiplier` double(10,8) NOT NULL DEFAULT '0.01000000',
`type` varchar(50) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=7 ;
Currently my logic is that when a campaign is used, the action taken depends on the CampaignType's name. For example, in the purchase logic:
if (isset($this->request->data['Purchase']['code'])) {
$code = $this->request->data['Purchase']['code'];
$campaign = $this->Campaign->findByCode($code);
$this->Campaign->id = $campaign['Campaign']['id'];
// campaign no longer active
if ($this->Campaign->field('active') == 0) $code = false;
if ($this->CampaignLog->find('first', array('conditions' => array(
'user_id' => $this->User->field('id'),
'campaign_id' => $this->Campaign->field('id'),
'activated' => 1,
)))) $code = false; // code has already been used
unset($this->request->data['Purchase']['code']);
} else $code = false;
// Some purchasing logic here
if ($code) {
$this->CampaignLog->create();
$this->CampaignLog->save(array(
'campaign_id' => $this->Campaign->field('id'),
'user_id' => $this->User->field('id'),
'activated' => 1,
'source' => $this->Session->read('referrer'),
'earnings' => $earned,
'created' => strftime('%Y-%m-%d %H:%M:%S'),
));
if ($this->Campaign->field('single_use') == 1) {
$this->Campaign->saveField("active", 0);
}
// Apply code here
}
Now, my question is:
What would be the best course of action for applying those codes? I'm a bit uneasy about going through all the possible code types with if-then-else or switch-case, but right now, since so many things can differ (e.g. a discount can be a percentage or a set amount), that seems to be the only option.
Maybe the structure/logic of the codes should be different?
From my point of view it's already fairly straightforward; integrating it with the purchase logic would be the best way to surface any further problems. Assume we have $this->request->data['price'] for the price, and an example type_id of 1 that represents a discount.
All we have to do is get the value and apply the percentage, something like:
$discount = $this->Campaign->field('value') / 100; // value holds the percentage
$finalPrice = $this->request->data['price'] * (1 - $discount);
It's better to implement this in a switch-case to isolate each type's logic. The details depend on how you implement it, but that's the gist of the concept.
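A sketch of that switch (the type ids and the surrounding $price and $balance variables are assumptions):
switch ($campaign['Campaign']['type_id']) {
    case 1: // percentage discount
        $finalPrice = $price * (1 - $campaign['Campaign']['value'] / 100);
        break;
    case 2: // fixed-amount discount
        $finalPrice = max(0, $price - $campaign['Campaign']['value']);
        break;
    case 3: // extra currency for the buyer
        $balance += $campaign['Campaign']['value'];
        break;
    // ... one case per row in cake_campaign_types
}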
Check out the cart plugin: it uses events for everything and nothing is hardcoded, so it is pretty flexible.
There is one method that is fired whenever the cart needs to be recalculated. Inside, it calls other methods to calculate taxes and discounts.
Implementing this through events has the advantage that it is very easy to extend later with additional discount or tax calculations.
Feel free to review the whole code of the plugin. It is not a simple implementation: it covers cookie, session, and database storage for the cart data, and has events for a lot of things.
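In CakePHP 2 terms, a discount listener would look roughly like this (the event name and data keys here are made up for illustration; check the plugin for the real ones):
App::uses('CakeEventListener', 'Event');

class CampaignDiscountListener implements CakeEventListener {

    public function implementedEvents() {
        // hypothetical event name; the plugin defines its own
        return array('Cart.beforeCalculateTotal' => 'applyCampaign');
    }

    public function applyCampaign(CakeEvent $event) {
        // adjust the running total based on the active campaign, e.g.
        // $event->data['total'] -= $discountAmount;
    }
}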
So I know that there are many posts on this topic on this website, and the closest one I could find was:
Can I take the results from two rows and combine them into one?
I am working on a project that involves 'accounts receivable' and 'accounts payable', and both of those need to appear in a single list:
date | description | reference | debit | credit
I have read about the MySQL UNION statement being used to combine two result sets into one; however, the two result sets must match in column count and type, according to the site below:
http://www.w3schools.com/sql/sql_union.asp
The problem I'm facing is that the two result sets don't have the same column count, since the information for one doesn't directly correlate to the other (which would seem to rule out UNION). What would be the best practice for acquiring the data from the two tables and sorting it by date? I'll include my SQL calls below for reference:
Accounts Receivable:
SELECT tblARP.*,tblAR.invoiceID,tblAR.ledgerID
FROM Accounting_ReceivablesPayments tblARP
INNER JOIN Accounting_Receivables tblAR ON tblARP.invoiceID = tblAR.invoiceID
ORDER BY deposited
Accounts Payable:
SELECT tblAPP.*,tblAP.id,tblAP.ledgerID,tblAP.tblName,tblAP.rowID,tblAP.invoice
FROM Accounting_PayablesPayments tblAPP
INNER JOIN Accounting_Payables tblAP ON tblAPP.payablesID = tblAP.id
ORDER BY deposited
UPDATE
Per the requests in the comments, here are the columns for the tables:
Accounting_Receivables
id BIGINT PRIMARY KEY NOT NULL AUTO_INCREMENT UNIQUE,
invoiceID BIGINT NOT NULL,
amount DECIMAL(9,2) NOT NULL DEFAULT '1.00',
ledgerID BIGINT NOT NULL,
note TEXT
Accounting_ReceivablesPayments
id BIGINT PRIMARY KEY NOT NULL AUTO_INCREMENT UNIQUE,
invoiceID BIGINT NOT NULL,
received DATE NOT NULL,
type VARCHAR(10) NOT NULL,
amount DECIMAL(9,2) NOT NULL DEFAULT '1.00',
deposited DATE,
tag VARCHAR(32) NOT NULL
Accounting_Payables
id BIGINT PRIMARY KEY NOT NULL AUTO_INCREMENT UNIQUE,
paid TINYINT(1) UNSIGNED NOT NULL DEFAULT '0',
invoice BIGINT NOT NULL,
amount DECIMAL(9,2) NOT NULL DEFAULT '1.00',
terms VARCHAR(3) NOT NULL DEFAULT 'net',
due DATE,
tblName VARCHAR(48) NOT NULL,
rowID BIGINT NOT NULL,
ledgerID BIGINT NOT NULL,
note TEXT
Accounting_PayablesPayments
id BIGINT PRIMARY KEY NOT NULL AUTO_INCREMENT UNIQUE,
payablesID BIGINT NOT NULL,
created DATE NOT NULL,
type VARCHAR(10) NOT NULL,
amount DECIMAL(9,2) NOT NULL DEFAULT '1.00',
deposited DATE,
tag VARCHAR(32) NOT NULL
Following up on what I was saying in the comments, you should do this:
( SELECT
    tblARP.*,
    tblAR.invoiceID,
    tblAR.ledgerID,
    NULL, -- # -- null placeholders so the column counts match
    NULL,
    NULL
  FROM `Accounting_ReceivablesPayments` tblARP
  INNER JOIN `Accounting_Receivables` tblAR ON tblARP.invoiceID = tblAR.invoiceID
)
UNION ALL -- # -- union all to include everything, even duplicate rows
( SELECT
    tblAPP.*,
    tblAP.id,
    tblAP.ledgerID,
    tblAP.tblName,
    tblAP.rowID,
    tblAP.invoice
  FROM `Accounting_PayablesPayments` tblAPP
  INNER JOIN `Accounting_Payables` tblAP ON tblAPP.payablesID = tblAP.id
)
ORDER BY deposited
Note that the ORDER BY has to go after the UNION: an ORDER BY inside a parenthesized part of a UNION is ignored by MySQL unless it is paired with a LIMIT, and the final ORDER BY sorts the combined result.
I am writing a PDO prepared statement, but I want to know if there is a better way to write it so it runs faster.
Here is the code:
function comp_post_code($cat, $comp_post_code){
    global $DBH;
    // Reusing the :cat placeholder several times relies on PDO's emulated
    // prepares, which are enabled by default for the MySQL driver.
    $STH = $DBH->prepare("SELECT * FROM uk_data WHERE
        cat10 LIKE :comp_post_code AND (
            cat1 LIKE :cat OR
            cat2 LIKE :cat OR
            cat3 LIKE :cat OR
            cat4 LIKE :cat OR
            cat5 LIKE :cat OR
            cat6 LIKE :cat OR
            cat7 LIKE :cat
        )");
    $STH->bindValue(':cat', "%$cat%", PDO::PARAM_STR);
    $STH->bindValue(':comp_post_code', "$comp_post_code%", PDO::PARAM_STR);
    $STH->execute();
    $STH->setFetchMode(PDO::FETCH_ASSOC);
    return $STH;
}
I want the full data from the table, so I am using SELECT *. Thanks.
Edit: cat10 is a postcode, and cat1 to cat7 are categories. I need to search the categories within a given postcode.
Here is the table format:
CREATE TABLE IF NOT EXISTS `uk_data` (
`slno` int(10) NOT NULL AUTO_INCREMENT,
`comp_name` varchar(150) DEFAULT NULL,
`comp_no` varchar(50) DEFAULT NULL,
`comp_street` varchar(100) DEFAULT NULL,
`comp_area` varchar(100) DEFAULT NULL,
`comp_post_code` varchar(15) DEFAULT NULL,
`comp_phone` varchar(100) DEFAULT NULL,
`comp_phone1` varchar(100) DEFAULT NULL,
`cat1` varchar(100) DEFAULT NULL,
`cat2` varchar(100) DEFAULT NULL,
`cat3` varchar(100) DEFAULT NULL,
`cat4` varchar(100) DEFAULT NULL,
`cat5` varchar(100) DEFAULT NULL,
`cat6` varchar(100) DEFAULT NULL,
`cat7` varchar(100) DEFAULT NULL,
`cat8` decimal(9,6) DEFAULT NULL,
`cat9` decimal(9,6) DEFAULT NULL,
`cat10` varchar(15) DEFAULT NULL,
PRIMARY KEY (`slno`),
UNIQUE KEY `Phone` (`comp_phone`),
KEY `cat10` (`cat10`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=74504 ;
The output I am using:
$uk_data_radious = comp_post_code($q,$pc_o);
while (($row = $uk_data_radious->fetch()) !== false) {
$comp_name[] = $row['comp_name'];
$comp_phone[] = $row['comp_phone'];
$comp_phone1[] = $row['comp_phone1'];
$post_code[] = $row['cat10'];
$post_code1[] = $row['comp_post_code'];
$comp_no[] = $row['comp_no'];
$comp_street[] = $row['comp_street'];
$comp_area[] = $row['comp_area'];
$cat1[] = $row['cat1'];
$cat2[] = $row['cat2'];
$cat3[] = $row['cat3'];
$cat4[] = $row['cat4'];
$cat5[] = $row['cat5'];
$cat6[] = $row['cat6'];
$cat7[] = $row['cat7'];
$distance_m[] = distance($Latitude[0],$Longitude[0],$row['cat8'],$row['cat9'],"M");
}
Full-text search works better on long strings for matching anywhere inside them, but it requires creating a special index and changing the queries (and, before MySQL 5.6, it is not available on InnoDB tables).
This is quite some work, but if the LIKE queries are really slow, a full-text index (FTI) is a fast alternative.
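Since uk_data is MyISAM, a full-text index is available today. The changed pieces would look roughly like this (index name and search values illustrative; note that full-text matches whole words, not arbitrary substrings):
ALTER TABLE uk_data
  ADD FULLTEXT KEY ft_cats (cat1, cat2, cat3, cat4, cat5, cat6, cat7);

SELECT * FROM uk_data
WHERE cat10 LIKE 'AB1%'
  AND MATCH (cat1, cat2, cat3, cat4, cat5, cat6, cat7) AGAINST ('plumber');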
--
If you keep the LIKE queries, there is not much you can do, unless of course you change the way you organize the data within the columns in order to avoid the LIKE.
To optimize the LIKE queries somewhat, you could merge all your cats into a single column (i.e. add a new column catx), separated by a special character that doesn't occur in your cats, such as ':', and do only one LIKE (that should speed things up a bit). E.g.
cat1: alpha
cat2: beta
cat3: gamma
...
gives
catx = ':alpha:beta:gamma:'
and the LIKE is done on catx:
catx LIKE :cat
to search for a specific cat, search for ':mycat:'
to search for the start of a cat, search for ':start'
to search for the end of a cat, search for 'end:'
The LIKE search algorithm is more efficient on one long string than when run several times over smaller strings. MySQL uses the very efficient Turbo Boyer-Moore algorithm when the search string is longer than 3 characters.
But I have to warn you: there are several constraints attached to this strategy:
the special separator character must not appear within the cats
any update to cat1-cat7 requires adjusting catx, so if the cats are likely to change a lot, maybe this isn't a good solution
this strategy usually works for well-known data, like identifiers, whose format rarely changes
Empirically, I don't think you can expect more than a 25% gain with this strategy.
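Concretely, building catx once could look like this (the column size and separator are assumptions; CONCAT_WS skips NULL cats):
ALTER TABLE uk_data ADD COLUMN catx varchar(720) DEFAULT NULL;

UPDATE uk_data
SET catx = CONCAT(':', CONCAT_WS(':', cat1, cat2, cat3, cat4, cat5, cat6, cat7), ':');

-- the search then becomes a single LIKE, e.g. for a whole cat:
-- WHERE cat10 LIKE 'AB1%' AND catx LIKE '%:plumber:%'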
My table structure:
CREATE TABLE IF NOT EXISTS `patients` (
`patient_id` int(8) unsigned zerofill NOT NULL AUTO_INCREMENT,
`pin` int(4) unsigned zerofill NOT NULL,
`patient_name` varchar(100) COLLATE utf8_bin NOT NULL,
`patient_global_id` int(50) NOT NULL,
`patient_dob` date NOT NULL,
PRIMARY KEY (`patient_id`),
UNIQUE KEY `pt_global_id` (`patient_global_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=1 ;
What I want is to make the id column contain data like this:
id: 4395 0001
username: fool
pin: 4395
The next user would then have id = PIN#0002, where the pin column is a random 4-digit number.
Is this doable within MySQL, or do I need to do it in PHP?
You can use the RAND() function to get a random number:
As stated in the manual, to get a random number within a specific range you can use:
FLOOR(i + RAND() * (j - i))
where i is the minimum number and j is the maximum number + 1. So to create a number with four digits you can use:
FLOOR(1000 + RAND() * (10000 - 1000))
If you want to have leading zeros as well you can combine this with LPAD():
LPAD(FLOOR(RAND() * 10000), 4, '0')
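Putting it together, generating the pin at insert time and building the "4395 0001" display id on the fly could look like this (a sketch; the sample values are made up, the modulo keeps the sequence part at four digits, and random pins can collide, so enforce any uniqueness you need in application logic):
INSERT INTO patients (pin, patient_name, patient_global_id, patient_dob)
VALUES (FLOOR(RAND() * 10000), 'fool', 12345, '1990-01-01');

-- pin followed by a 4-digit sequence derived from the auto-increment id
SELECT CONCAT(LPAD(pin, 4, '0'), ' ', LPAD(patient_id % 10000, 4, '0')) AS display_id
FROM patients;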