I'm struggling a bit on the best way to do this with as little performance hit as possible.
Here's the setup...
Search results page with search refining filters that make an AJAX call to a PHP handler which returns a new (refined) set of results.
I have 4 tables that contain all of the data I need to connect to in the PHP handler code.
Table 1 - Main table of records with main details
Table 2 - Ratings for each product from professional rating company #1
Table 3 - Ratings for each product from professional rating company #2
Table 4 - Ratings for each product from professional rating company #3
The refiners on the search results page are jQuery sliders with ranges from the lowest allowed rating to the highest for each.
When a slider handle is moved, a new AJAX call is made with the new value(s) and the database query will run to create a fresh set of refined results.
Getting the data I need from Table 1 is the easy part. What I'm struggling with is how to efficiently join the other 3 tables and only pick up rows that match the refining values/ranges. Tables 2, 3, and 4 all have multiple columns for year (2004-2012), and when I made an initial attempt to put it all into one query, it bogged down.
Table 2, 3, and 4 hold the various ratings for each record in Table 1.
The columns in Table 2, 3, and 4 are...
id - productID - y2004 - y2005 - y2006 - y2007 - ... you get the idea.
Each year column has a numeric value for each record (default is 0).
What I need to do is efficiently select records that match the refiner ranges selected by the user across all 4 tables at once.
An example refiner search would be...get all records from Table 1 where price is between $25 and $50 AND where Table 2 records have a rating (from any year/column) between 1 - 4 AND where Table 3 records have a rating (from any year/column) between 80 - 100 AND where Table 4 records have a rating (from any year/column) between 80 - 100.
Any advice on how to set this up with as much performance as possible?
My suggestion would be to use a different table structure. You should merge Table 2, 3 and 4 into a single ratings table with the following structure:
id | productID | companyID | year | rating
Then you could rewrite your query as:
SELECT *
FROM products p
JOIN ratings r ON p.id = r.productID
WHERE p.price BETWEEN 25 AND 50
AND (
( r.companyID = 1 AND r.rating BETWEEN 1 AND 4 )
OR ( r.companyID = 2 AND r.rating BETWEEN 80 AND 100 )
OR ( r.companyID = 3 AND r.rating BETWEEN 80 AND 100 )
)
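This should perform better, because each rating becomes a row that can be indexed instead of a value spread across nine year columns in three tables. Note that, as written, the OR conditions return a product as soon as any one company's rating falls in range; to require all three, group by product and keep only those with three distinct matching companyID values. Your tables will also be more scalable, both with the years and the number of companies.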
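A minimal sketch of the merged ratings table, run here against an in-memory SQLite database from Python so it is self-contained. The sample rows, the index, and the GROUP BY / HAVING step are assumptions added for illustration; the HAVING clause is what turns "any company matches" into "all three companies match", as the original question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE ratings (
    id INTEGER PRIMARY KEY,
    productID INTEGER, companyID INTEGER, year INTEGER, rating REAL
);
-- Assumed index: lets each company/rating range be looked up directly.
CREATE INDEX idx_ratings_company_rating ON ratings (companyID, rating, productID);

-- Made-up sample data.
INSERT INTO products VALUES (1, 'A', 30), (2, 'B', 40), (3, 'C', 60);
INSERT INTO ratings (productID, companyID, year, rating) VALUES
    (1, 1, 2010, 3), (1, 2, 2011, 85), (1, 3, 2012, 90),   -- matches all three
    (2, 1, 2010, 2), (2, 2, 2011, 50), (2, 3, 2012, 95),   -- fails company 2
    (3, 1, 2010, 3), (3, 2, 2011, 85), (3, 3, 2012, 90);   -- fails on price
""")

sql = """
SELECT p.id, p.name
FROM products p
JOIN ratings r ON p.id = r.productID
WHERE p.price BETWEEN ? AND ?
  AND ( (r.companyID = 1 AND r.rating BETWEEN ? AND ?)
     OR (r.companyID = 2 AND r.rating BETWEEN ? AND ?)
     OR (r.companyID = 3 AND r.rating BETWEEN ? AND ?) )
GROUP BY p.id, p.name
-- Keep only products that matched for every one of the three companies.
HAVING COUNT(DISTINCT r.companyID) = 3
ORDER BY p.id
"""
# Slider values would arrive from the AJAX request; they are hard-coded here.
matching = conn.execute(sql, (25, 50, 1, 4, 80, 100, 80, 100)).fetchall()
print(matching)
```

Passing the slider values as bound parameters rather than concatenating them into the SQL string also keeps the handler safe from injection.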
One more thing: if you have a lot of fields in your products table, it might be more useful to execute 2 queries instead of joining. The reason for this is that you are fetching redundant data - every joined row will have the columns for product, even though you only need it once. This is a side-effect of joins, and there is probably a performance threshold where it will be more useful to query twice than to join. It is up to you to decide if/when that is the case.
Related
I have this very specific problem which I can't even decide how to approach. So I have 3 tables in MySQL.
Table recipe: id_recipe| name | text | picture
Table ingredients_recipe: id_rs | id_recipe| id_ingredients
Table ingredients: id_ingredient | name | picutre
This is a site, where you select ingredients(so the input is 1 or more id_ingredient) and it should display three categories:
All recipes you can make right now (you have all the ingredients required for it)
All recipes where you are missing only 1 or 2 ingredients
All recipes where you are missing only 3 or 4 ingredients.
Can you help me with these 3 SQL selects? I'm pretty deadlocked right now. Thanks.
SAMPLE DATA: http://pastebin.com/aTC5kQJi
I think your basic statement is already on the right track. You just need to do a little trick. You cannot compare them directly, but you can compare the count of ingredients:
SELECT id_recipe, count(id_rs) as ingredient_count
FROM ingredients_recipe
WHERE id_ingredient IN ( 2, 5)
GROUP BY id_recipe
This will give you the number of selected ingredients that each recipe uses. Now get the total number of ingredients for each recipe:
SELECT id_recipe, count(id_rs) as ingredient_count
FROM ingredients_recipe
GROUP BY id_recipe
and compare them, taking the first query as a basis. A recipe whose two counts are equal can be made right now; the difference between the counts is the number of missing ingredients, which gives you the other two categories.
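The count comparison can be sketched in a single pass with conditional aggregation. This is an illustration run against in-memory SQLite from Python, not the answer's exact method: the sample rows are invented, the column is assumed to be named id_ingredient as in the queries above, and the `categorize` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ingredients_recipe (
    id_rs INTEGER PRIMARY KEY, id_recipe INTEGER, id_ingredient INTEGER
);
-- Made-up data: recipe 1 needs {2,5}, recipe 2 needs {2,5,7},
-- recipe 3 needs {1,2,3,4}, recipe 4 needs {1,3,4,6,7}.
INSERT INTO ingredients_recipe (id_recipe, id_ingredient) VALUES
    (1, 2), (1, 5),
    (2, 2), (2, 5), (2, 7),
    (3, 1), (3, 2), (3, 3), (3, 4),
    (4, 1), (4, 3), (4, 4), (4, 6), (4, 7);
""")

def categorize(conn, owned_ids):
    """Bucket recipes by how many of their ingredients are missing."""
    placeholders = ",".join("?" for _ in owned_ids)
    sql = f"""
        SELECT id_recipe,
               COUNT(*) - SUM(CASE WHEN id_ingredient IN ({placeholders})
                                   THEN 1 ELSE 0 END) AS missing
        FROM ingredients_recipe
        GROUP BY id_recipe
        ORDER BY id_recipe
    """
    buckets = {"ready": [], "missing_1_2": [], "missing_3_4": []}
    for recipe_id, missing in conn.execute(sql, list(owned_ids)):
        if missing == 0:
            buckets["ready"].append(recipe_id)
        elif missing <= 2:
            buckets["missing_1_2"].append(recipe_id)
        elif missing <= 4:
            buckets["missing_3_4"].append(recipe_id)
        # Recipes missing 5 or more ingredients are not shown at all.
    return buckets

buckets = categorize(conn, [2, 5])
print(buckets)
```

Here `COUNT(*)` is the second query's total and the `SUM(CASE ...)` is the first query's count, so the subtraction is exactly the comparison described above.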
I want to code a basic search box for my PHP-based online shopping website. The problem is that the data is in 50 tables, categorised by product type.
ie
table 1 - Mobile Phones
table 2 - Laptops
table 50 - Air Conditioners
I can code it using queries like
select * from table 1
if 0 rows returned
select * from table 2
if 0 rows returned
next
till table 50
But this code can slow down the website, as each keypress will lead to 100 queries being executed. Is there anything else I can do about it?
Options:
1) Normalise your tables so there's just one table to search (may be difficult if different products have different fields)
2) Use SQL Unions (can be very slow):
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
3) Query each table, store results in array, use usort to sort the array and then output them.
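As a rough sketch of option 2, the per-table SELECTs can be generated from a list of table names and glued together with UNION ALL, so one keypress costs one round trip instead of up to 50. The table names, the shared name column, and the `search` helper are all assumptions for illustration; SQLite is used here only so the example runs on its own.

```python
import sqlite3

# Hypothetical category tables; only three of the 50 are shown.
TABLES = ["mobile_phones", "laptops", "air_conditioners"]

conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO mobile_phones (name) VALUES ('Nova Phone'), ('Basic Phone')")
conn.execute("INSERT INTO laptops (name) VALUES ('Nova Book')")
conn.execute("INSERT INTO air_conditioners (name) VALUES ('Cool Breeze')")

def search(conn, term):
    """Search every category table with a single UNION ALL query."""
    # Table names come from the fixed list above, never from user input;
    # the search term itself is always passed as a bound parameter.
    parts = [
        f"SELECT '{table}' AS category, id, name FROM {table} WHERE name LIKE ?"
        for table in TABLES
    ]
    sql = " UNION ALL ".join(parts) + " ORDER BY category, name"
    return conn.execute(sql, [f"%{term}%"] * len(TABLES)).fetchall()

results = search(conn, "Nova")
print(results)
```

UNION ALL skips the duplicate-elimination step that plain UNION performs, which is usually unnecessary when each row already carries its category. A leading-wildcard LIKE still scans every table, so this reduces round trips rather than work per table.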
I'm trying to create a web app based on a few mysql tables and forms.
I have the ADD Shipment page in which the user adds a list of products and an invoice no., so that in my Shipment_Products table I have
id invoice_no product_code qty
1 34 HP222 4
2 34 HL234 1
I also have a Sold page in which the user adds a list of products sold from his stock, so the table Sold_Products gets filled like this
id invoice_no product_code qty
1 1 HP222 2
2 34 HL234 1
I need to have a third table called Stock in which I have to get the total number of items in stock, but I'm stuck on how to auto_generate it based on these two existing tables and then keep it updated.
Any suggestions ?
I don't think what you want is as simple as the shortcut you might be trying to take. First, if your company accounts for inventory as FIFO or LIFO, you draw down inventory counts in the order the goods became available, because COGS (cost of goods sold) depends on the cost from a given vendor at a given purchase time.
You might just have a purchases table showing every item and count received; then, as goods are sold, have another table recording how much each sale takes out of the available quantity of each purchase. It is somewhat tricky to deal with, even in such a simple scenario as...
Purchase 10 of product "X" from a vendor, then another 4 of product "X" shortly after. Total of 14.
Now, for sale activity, you have one order selling 3, then 2, then 4, then 3. Total of 12 of the 14 sold, but whichever way it is computed, FIFO or LIFO, the quantities have to be split according to which originating purchase they came from... Ex: FIFO would draw from the original 10 and then the 4 purchases.
10 purchase
10 - 3 = 7
7 - 2 = 5
5 - 4 = 1
1 - 3 = -2 -- WRONG... Only 1 of the 3 against the original 10 purchase
1 - (1 of 3) = 0
4 purchase
4 - (remaining 2 of 3) = 2 -- here are the remaining 2 units from the last sale, leaving 2 in inventory.
So there is no single correct table structure, but I think you should check what your inventory management requirements actually need.
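The FIFO walk-through above can be sketched as a small allocation routine. This is an illustrative sketch only: the `allocate_fifo` function and its tuple layout are hypothetical, and a real system would also carry the unit cost of each purchase.

```python
def allocate_fifo(purchases, sales):
    """Consume purchase lots oldest-first.

    purchases: quantities received, in order of receipt.
    sales: quantities sold, in order of sale.
    Returns (remaining quantity per lot, list of (sale_no, lot_no, qty) splits).
    """
    remaining = list(purchases)
    allocations = []
    lot = 0
    for sale_no, qty in enumerate(sales):
        while qty > 0:
            # Skip lots that are already used up.
            while lot < len(remaining) and remaining[lot] == 0:
                lot += 1
            if lot == len(remaining):
                raise ValueError("sale exceeds stock on hand")
            take = min(qty, remaining[lot])
            remaining[lot] -= take
            allocations.append((sale_no, lot, take))
            qty -= take
    return remaining, allocations

# The scenario from the text: purchases of 10 and 4, sales of 3, 2, 4 and 3.
remaining, allocations = allocate_fifo([10, 4], [3, 2, 4, 3])
print(remaining)
print(allocations)
```

The last sale is split across both lots (1 unit from the first purchase, 2 from the second), which is the split the arithmetic above works through by hand.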
I'm going to try and explain it better.
Sorry for my bad English!
I now have a table received in which every item in a particular invoice is added when it arrives. So let's say today we receive 3 items, and the unique invoice_id or shipment_id (doesn't really matter) is S1. In the table received I have
id shipment_id product_code qty
1 S1 HL223 2
2 S1 XLS21 1
3 S1 BenqWHL 1
I have another similar table called sold which works the same way: I add a list of items by their product_code and give the sale an ID (for other purposes). So the sold table looks like this:
id sold_id product_code qty
1 B1 HL223 1
2 B1 XLS21 1
Now, I just need to see the items left in stock. Should I create a table in which I store the items grouped by their unique product_code with the total as qty, and then, when doing a sale, subtract the qty sold from this stock table?
I don't care about invoice numbers or IDs, users, etc.
This is a crude approach, but at least it is a way to begin.
Make a table called inventory, Stock, or similar, with the following structure: ID, Item_id, item_quantity.
When purchase/receive, add a row to this table with a positive quantity.
When Sales/output, add a row to this table with a negative quantity.
To calculate the stock perform a query with the sum() of a particular item_id.
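A minimal sketch of that ledger idea, using the received and sold rows from the question as sample data. The table name stock_ledger is made up, and SQLite is used from Python only so the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_ledger ("
    " id INTEGER PRIMARY KEY, item_id TEXT, item_quantity INTEGER)"
)

def record_movement(conn, item_id, quantity):
    """Positive quantity for goods received, negative for goods sold."""
    conn.execute(
        "INSERT INTO stock_ledger (item_id, item_quantity) VALUES (?, ?)",
        (item_id, quantity),
    )

# Shipment S1 arrives.
record_movement(conn, "HL223", 2)
record_movement(conn, "XLS21", 1)
record_movement(conn, "BenqWHL", 1)
# Sale B1 goes out.
record_movement(conn, "HL223", -1)
record_movement(conn, "XLS21", -1)

# Current stock is simply the sum of all movements per item.
stock = conn.execute(
    "SELECT item_id, SUM(item_quantity) FROM stock_ledger"
    " GROUP BY item_id ORDER BY item_id"
).fetchall()
print(stock)
```

Because stock is derived from the movements rather than stored, there is no separate total to keep in sync; the trade-off is that the SUM gets slower as the ledger grows.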
I've got myself into a bit of a tiss over averaging and joining tables.
Essentially I want to display the average heights of different plant species using Highcharts, pulling the data from a MySQL database. Unfortunately the height data and the species names were setup to be added in different tables.
I've got it working, however when I download the data and find the averages in Excel the figures are different to those being displayed - so I'm obviously not doing it right. I've double checked I'm doing it right in Excel so almost certain it's my MySQL query that's stuffing up.
There's loads of entries in the actual tables, so I've just put an example below.
The query I have at the moment is:
<?php
$result = mysql_query("
SELECT DISTINCT(plant_records.plant_id), ROUND(AVG(plant_records.height),2) as plant_average, plant_list.id, plant_list.plant_species
FROM plant_records
INNER JOIN plant_list
ON plant_records.plant_id=plant_list.id
GROUP BY plant_list.plant_species
") or die(mysql_error());
while ($row = mysql_fetch_array($result)) {
$xAxisValues[] = "'" . $row['plant_species'] . "'";
$AseriesValues[] = $row['plant_average'];
}
?>
Am I doing it right? I found some nice tutorials explaining joins, like this one, but I'm still confused. I'm wondering if I'm averaging before I've joined them, or something??
"plant_id" in the Records table corresponds with "id" in the List table
plant_records:
id plant_id date_recorded height
1 3 01/01/2013 0.2523123
2 1 02/01/2013 0.123
3 3 03/02/2013 0.446
4 3 04/03/2013 0.52
5 1 05/03/2013 0.3
6 2 06/03/2013 0.111
7 2 07/05/2013 0.30
8 4 08/05/2013 0.22564
9 1 09/05/2013 1.27
10 3 10/05/2013 1.8
plant_list:
id registration_date contact_name plant_species plant_parent
1 01/01/2013 Dave ilex_prinos London_Holly
2 02/01/2013 Bill acer_saccharum Brighton_Maple
3 01/01/2013 Bob ilex_prinos London_Holly
4 04/01/2013 Bruno junip_communis Park_Juniper
EDIT:
I've tried every possible way of finding the data using Excel (e.g. deliberately not filtering unique IDs, different average types, selecting multiple species, etc) to find the calculation my query is using, but I can't get the same results.
I notice two issues with your query at the moment.
Selecting plant_list.id while having a GROUP BY plant_list.plant_species will not yield anything of interest, due to the fact that MySQL will return an arbitrary id from any of the plants that match each species.
You state that you are only interested in the most recent recording, but nothing in your query reflects that fact.
Given that information, try this query:
SELECT ROUND(AVG(pr.height),2) as plant_average, plant_list.plant_species
FROM plant_records pr
INNER JOIN plant_list
ON pr.plant_id=plant_list.id
WHERE pr.date_recorded = (
SELECT MAX(pri.date_recorded) FROM plant_records pri
WHERE pri.plant_id = pr.plant_id
)
GROUP BY plant_list.plant_species
Alternately, if you want just the average heights for a specific date, simply pass that directly into the query, instead of using the subquery.
If we assume that each plant_id identifies one individual plant of a given species, and you want to know the average height of each species across all of its recorded plants, you can do this:
SELECT PL.plant_species, ROUND(AVG(PR.height),2) as plant_average
FROM plant_records AS PR
JOIN plant_list AS PL
ON PR.plant_id=PL.id
GROUP BY PL.plant_species
This will return something like:
plant_species plant_average
acer_saccharum 0.2100000
ilex_prinos 0.6700000
junip_communis 0.2300000
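To check the query against the sample rows from the question, here is a small reproduction using in-memory SQLite from Python. Only the columns the query needs are created, which is a simplification; the rounding is done in SQL exactly as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plant_list (id INTEGER PRIMARY KEY, plant_species TEXT);
CREATE TABLE plant_records (id INTEGER PRIMARY KEY, plant_id INTEGER, height REAL);

INSERT INTO plant_list VALUES
    (1, 'ilex_prinos'), (2, 'acer_saccharum'),
    (3, 'ilex_prinos'), (4, 'junip_communis');
INSERT INTO plant_records VALUES
    (1, 3, 0.2523123), (2, 1, 0.123), (3, 3, 0.446), (4, 3, 0.52),
    (5, 1, 0.3), (6, 2, 0.111), (7, 2, 0.30), (8, 4, 0.22564),
    (9, 1, 1.27), (10, 3, 1.8);
""")

rows = conn.execute("""
    SELECT PL.plant_species, ROUND(AVG(PR.height), 2) AS plant_average
    FROM plant_records AS PR
    JOIN plant_list AS PL ON PR.plant_id = PL.id
    GROUP BY PL.plant_species
    ORDER BY PL.plant_species
""").fetchall()
averages = dict(rows)
for species, average in rows:
    print(species, average)
```

Each record joins to exactly one plant_list row, so the join does not duplicate heights. The ilex_prinos figure averages all seven records from plants 1 and 3 together, which is not the same as averaging the two per-plant averages; that distinction is worth checking when comparing against a spreadsheet.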
I always struggle with dealing with normalised data, and how I display it. Maybe it's because I don't fully understand the normalisation rules, like how to get it fully into Boyce-Codd. Performance is not really an issue at this stage, though maintainability of the schema is.
user
ID Name
1 Alice
2 Bob
3 Charlie
skills
ID Name
1 Karate
2 Marksmen
3 Cook
event
ID Name
1 Island
2 Volcano
user-m2m-skill
MemberID SkillID
1 1
1 2
2 1
2 3
3 1
user-m2m-event
MemberID EventID
1 1
1 2
2 1
3 2
How do I get this information out of the database? I'd like to display a table like this, where I've got the total count of each skill:
Skills at event
Event Karate Marksmen Cook
Island 2 1 1
Volcano 2 1 0
It is unlikely that the skills table will change very much. This means I could do a set of subqueries like this (obviously shortened and incorrect syntax)
SELECT event.name,
(SELECT COUNT(*) FROM ... WHERE skill = 'Karate'),
(SELECT COUNT(*) FROM ... WHERE skill = 'Marksmen') FROM event
And that's what I've been doing, putting it into a view. But it's a bit horrible, no? I have to edit the view every time I add a new skill.
The other way is to process it client side. So I just get back something like this:
Event Skill Count
Island Karate 2
Island Marksmen 1
Island Cook 1
Volcano Karate 2
Volcano Marksmen 1
And I loop through the results, reformatting it. But I hate that even more. Isn't the database supposed to do data?
So: what am I doing wrong? Am I expecting too much? Which is the lesser evil?
(As b3ta would say, apologies for length of post and for bad markup. :( )
This is a typical pivot query, because you are looking to convert data in rows into columns.
SELECT e.name,
MAX(CASE WHEN x.skill_name = 'Karate' THEN x.num_skill ELSE 0 END) AS Karate,
MAX(CASE WHEN x.skill_name = 'Marksmen' THEN x.num_skill ELSE 0 END) AS Marksmen
FROM EVENT e
LEFT JOIN (SELECT um.eventid,
s.name AS skill_name,
COUNT(*) 'num_skill'
FROM SKILLS s
JOIN USER-M2M-SKILL us ON us.skillid = s.id
JOIN USER-M2M-EVENT um ON um.memberid = us.memberid
GROUP BY um.eventid, s.name) x ON x.eventid = e.id
GROUP BY e.name
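For reference, here is the same pivot run end to end on the sample data from the question, with a Cook column added. It is a sketch under a few assumptions: the junction tables are renamed with underscores (user_m2m_skill, user_m2m_event) because hyphenated names would need quoting, and SQLite is driven from Python so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_m2m_skill (memberid INTEGER, skillid INTEGER);
CREATE TABLE user_m2m_event (memberid INTEGER, eventid INTEGER);

INSERT INTO skills VALUES (1, 'Karate'), (2, 'Marksmen'), (3, 'Cook');
INSERT INTO event VALUES (1, 'Island'), (2, 'Volcano');
INSERT INTO user_m2m_skill VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 1);
INSERT INTO user_m2m_event VALUES (1, 1), (1, 2), (2, 1), (3, 2);
""")

pivot_rows = conn.execute("""
    SELECT e.name,
           MAX(CASE WHEN x.skill_name = 'Karate'   THEN x.num_skill ELSE 0 END) AS Karate,
           MAX(CASE WHEN x.skill_name = 'Marksmen' THEN x.num_skill ELSE 0 END) AS Marksmen,
           MAX(CASE WHEN x.skill_name = 'Cook'     THEN x.num_skill ELSE 0 END) AS Cook
    FROM event e
    -- The derived table counts members per (event, skill) exactly once.
    LEFT JOIN (SELECT um.eventid, s.name AS skill_name, COUNT(*) AS num_skill
               FROM skills s
               JOIN user_m2m_skill us ON us.skillid = s.id
               JOIN user_m2m_event um ON um.memberid = us.memberid
               GROUP BY um.eventid, s.name) x ON x.eventid = e.id
    GROUP BY e.name
    ORDER BY e.name
""").fetchall()
print(pivot_rows)
```

The result lines up with the "Skills at event" table in the question, including the 0 for Cook at Volcano, which comes from the ELSE 0 branch.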
Followup question:
...what does this have that a load of sub queries doesn't?
SELECTs as statements within the SELECT clause. IE:
SELECT x.name,
(SELECT COUNT(*) FROM TABLE)
...means that a separate query is run for every skill. If the queries were correlated (that is, tied to the outer row by an ID so that the counts stay in sync with each event), then each count would also run once for every event.
Conclusion to Followup
The subquery approach is inefficient, because the same tables are re-read for every skill column. It is better to fetch the necessary values once, as in the query in my answer.
Addendum
Regarding updating the query - it is possible to minimize the maintenance by implementing the query with dynamic SQL.
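One way to read "dynamic SQL" here is to generate the CASE columns from the skills table at query time, so adding a skill needs no change to the query or view. The sketch below does this in application code with SQLite from Python; the `skills_at_events` helper and the underscore table names are assumptions, and a stored procedure with a prepared statement would be the in-database equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skills (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_m2m_skill (memberid INTEGER, skillid INTEGER);
CREATE TABLE user_m2m_event (memberid INTEGER, eventid INTEGER);
INSERT INTO skills VALUES (1, 'Karate'), (2, 'Marksmen'), (3, 'Cook');
INSERT INTO event VALUES (1, 'Island'), (2, 'Volcano');
INSERT INTO user_m2m_skill VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 1);
INSERT INTO user_m2m_event VALUES (1, 1), (1, 2), (2, 1), (3, 2);
""")

def skills_at_events(conn):
    """Build one pivot column per row currently in the skills table."""
    skill_names = [row[0] for row in conn.execute("SELECT name FROM skills ORDER BY id")]
    columns = ["e.name"]
    for name in skill_names:
        # The skill name is compared via a bound parameter; the alias is an
        # identifier, so it is double-quoted with embedded quotes escaped.
        alias = '"' + name.replace('"', '""') + '"'
        columns.append(
            f"MAX(CASE WHEN x.skill_name = ? THEN x.num_skill ELSE 0 END) AS {alias}"
        )
    select_list = ", ".join(columns)
    sql = f"""
        SELECT {select_list}
        FROM event e
        LEFT JOIN (SELECT um.eventid, s.name AS skill_name, COUNT(*) AS num_skill
                   FROM skills s
                   JOIN user_m2m_skill us ON us.skillid = s.id
                   JOIN user_m2m_event um ON um.memberid = us.memberid
                   GROUP BY um.eventid, s.name) x ON x.eventid = e.id
        GROUP BY e.name
        ORDER BY e.name
    """
    cursor = conn.execute(sql, skill_names)
    header = [description[0] for description in cursor.description]
    return header, cursor.fetchall()

header, rows = skills_at_events(conn)

# Adding a skill changes the output without touching the query text.
conn.execute("INSERT INTO skills VALUES (4, 'Archery')")
conn.execute("INSERT INTO user_m2m_skill VALUES (3, 4)")
header2, rows2 = skills_at_events(conn)
print(header2, rows2)
```

The cost of this approach is that the result's column list is no longer fixed, so whatever renders the table has to read the header from the result rather than hard-coding it.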
Your second example, with "Event", "Skill", and "Count" as headers, is what you should expect from dynamically generated results from normalized data. Databases are not designed to format data for display (this isn't an Excel spreadsheet), they're designed to store data and return the meaning of that data. It's up to your code to display it in a nice fashion.
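If the reshaping is done in application code, as suggested above, it amounts to a short loop. The following is a sketch only, with a hypothetical `pivot` helper and the five rows from the question's second example as input.

```python
# Rows as the database would return them: (event, skill, count).
rows = [
    ("Island", "Karate", 2),
    ("Island", "Marksmen", 1),
    ("Island", "Cook", 1),
    ("Volcano", "Karate", 2),
    ("Volcano", "Marksmen", 1),
]

def pivot(rows):
    """Turn (event, skill, count) rows into a header and one line per event."""
    skills = []   # column order = order in which skills first appear
    table = {}    # event -> {skill: count}
    for event, skill, count in rows:
        if skill not in skills:
            skills.append(skill)
        table.setdefault(event, {})[skill] = count
    header = ["Event"] + skills
    # Combinations absent from the result (Cook at Volcano) become 0.
    lines = [
        [event] + [counts.get(skill, 0) for skill in skills]
        for event, counts in table.items()
    ]
    return header, lines

header, lines = pivot(rows)
print(header)
for line in lines:
    print(line)
```

New skills show up as new columns automatically, since the header is derived from the data rather than from a fixed list.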
Who's b3ta?
As for the database "doing data", that doesn't mean that client code will be free of all parsing and processing. And normalization in practice shouldn't be a goal in itself. You should also take into account ease of querying and performance.