Display rows based on $array - php

I have a table in Postgres called workorders. The columns I am interested in are labor, date_out and ident. This table ties up with wo_parts (workorder parts), where the columns I am interested in are part and workorder. Both are integers (part is an auto number). The final table is part2vendor, where the columns are retail and cost.
Basically, I create a workorder (invoice). This calls a part from part2vendor. I enter it and invoice it off. A row is created in workorders and given an ident. In wo_parts, the part I used is recorded along with the workorder number and qty used.
What I want to do is create a report in PHP that pulls all this info onto one page. For example, if I choose dates 2009-10-01 to 2009-10-31, it should pull all workorders in this range and tell me the total labour sold and the profit (retail less cost) of the parts I sold, using these 3 tables.
I hope I have explained this as clearly as possible. If you have any questions, please ask. Thank you very much for your time.

You will want to read up on SQL; keywords to look for include "aggregate", "SUM" and "GROUP BY".
Your query will look something like this (it will certainly need correcting):
SELECT
SUM(wo.labor) AS tot_labor,
SUM(p2v.retail - p2v.cost) AS tot_profit
FROM
workorders AS wo
JOIN wo_parts AS wp ON wo.ident = wp.workorder
JOIN part2vendor AS p2v ON ...something...
WHERE
wo.date_out BETWEEN '2009-10-01'::date AND '2009-10-31'::date;
Note that the join repeats each workorder's labor once per matching part row, so tot_labor is overstated for any workorder with more than one part. Computing labor in a separate query avoids that.
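A minimal sketch of the report, written in Python with an in-memory SQLite database so it runs on its own (the original setup is PHP with Postgres). It computes labor and parts profit in two separate queries so that labor is not repeated per part row. The `part` column on part2vendor and the sample rows are assumptions for illustration; the real join key between wo_parts and part2vendor is not given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workorders (ident INTEGER PRIMARY KEY, labor REAL, date_out TEXT);
CREATE TABLE wo_parts (part INTEGER, workorder INTEGER, qty INTEGER);
-- Assumption: part2vendor has a part id that wo_parts.part refers to.
CREATE TABLE part2vendor (part INTEGER PRIMARY KEY, retail REAL, cost REAL);

INSERT INTO workorders VALUES
  (1, 100.0, '2009-10-05'), (2, 50.0, '2009-10-20'), (3, 75.0, '2009-11-02');
INSERT INTO part2vendor VALUES (10, 30.0, 20.0), (11, 15.0, 10.0);
INSERT INTO wo_parts VALUES (10, 1, 2), (11, 1, 1), (10, 2, 1), (11, 3, 4);
""")


def report(conn, start, end):
    """Return (total labor, total parts profit) for workorders in the date range."""
    # Labor is summed straight from workorders, one row per workorder.
    labor = conn.execute(
        "SELECT COALESCE(SUM(labor), 0) FROM workorders "
        "WHERE date_out BETWEEN ? AND ?",
        (start, end),
    ).fetchone()[0]
    # Profit is (retail - cost) per unit, multiplied by the quantity used.
    profit = conn.execute(
        """
        SELECT COALESCE(SUM((p2v.retail - p2v.cost) * wp.qty), 0)
        FROM workorders AS wo
        JOIN wo_parts AS wp ON wp.workorder = wo.ident
        JOIN part2vendor AS p2v ON p2v.part = wp.part
        WHERE wo.date_out BETWEEN ? AND ?
        """,
        (start, end),
    ).fetchone()[0]
    return labor, profit


tot_labor, tot_profit = report(conn, "2009-10-01", "2009-10-31")
print(tot_labor, tot_profit)
```

The same two statements should carry over to Postgres largely unchanged; only the placeholder syntax depends on the driver.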

How to analyze item's path through system

I'm looking for a little bit of direction for how to analyze a problem. I work for a small manufacturing company. We paint about 150 items per day. Those items then go to Quality Control. About 70% pass QC. The remaining 30% have to be repaired in some way.
We have 5 different repair categories: Repaint, Reclear, Remake, Reglaze, Fix.
Every time an order gets QC'd, my system inserts a row into a "Repairs" MySQL table. If it passes QC, it's given a category of Great. Its structure is like this:
id | Repair | Date
5 | repaint| 2013-01-01
6 | reclear| 2013-01-01
5 | great | 2013-01-02 ...etc
I need to be able to perform analysis on what actions are happening. I'd like to know what 'paths' items are going down.
For example: what percentage of items have the categories Reclear->Repaint->Great? What percentage have Repaint->Repaint->Remake->Great? (Every item should eventually end with 'Great'.)
I'm kind of stuck on where to start in figuring out how to analyze this.
Should I be keeping track of the repair number in the table? If I did that, then maybe I could use a self join to select orders where repairnum=1 AND repair='Repaint' joined with repairnum=2 AND repair='Great'. This would tell me which orders went down the path Repaint->Great. I'm a little hesitant to go this route because 1) I don't want to have to do a query and get the repair number before I insert a new row into the table and 2) it seems like I'd need some pretty nasty queries to analyze items that have 5 or 6 (or more) repairs.
Perhaps someone can point me in the right direction?
My app is in php and mysql.
You don't need a separate "repair number", because you have the date when each repair was made, so can order by that (assuming you store time as well if more than one repair can be made in a day).
The "path" for an item is the list of its repairs, in order of date. If you just say SELECT repair FROM repairs WHERE id=5 ORDER BY date ASC you'll get them as rows.
The trick is to turn these into a single value representing the whole path, using GROUP_CONCAT - SELECT GROUP_CONCAT(repair ORDER BY date ASC SEPARATOR '->') FROM repairs WHERE id=5
Once you have that, you can run that for all products in the DB using a GROUP BY, and then look for patterns in it with HAVING:
SELECT
id,
GROUP_CONCAT(repair ORDER BY date ASC SEPARATOR '->') as path
FROM
repairs
GROUP BY
id
HAVING
path = 'Repaint->Repaint->Remake->Great'
Note that I don't have a copy of MySQL to try this out with, so I may have made a mistake, but the manual suggests that the above should work.
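To get the percentage of items per path, the paths can also be assembled in application code. The following is a minimal sketch in Python with an in-memory SQLite database (the original app is PHP with MySQL); the extra sample rows and the function name `path_percentages` are made up for illustration. It assumes no two repairs for the same item share the same date value, otherwise the order within a path is ambiguous.

```python
import sqlite3
from collections import Counter, defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repairs (id INTEGER, repair TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO repairs VALUES (?, ?, ?)",
    [
        (5, "repaint", "2013-01-01"),
        (6, "reclear", "2013-01-01"),
        (5, "great", "2013-01-02"),
        (6, "repaint", "2013-01-02"),
        (6, "great", "2013-01-03"),
        (7, "repaint", "2013-01-01"),
        (7, "great", "2013-01-03"),
        (8, "great", "2013-01-02"),
    ],
)


def path_percentages(conn):
    """Map each distinct repair path to the percentage of items that followed it."""
    steps = defaultdict(list)
    # Rows arrive grouped by item and in date order, so appending builds the path.
    query = "SELECT id, repair FROM repairs ORDER BY id, date"
    for item_id, repair in conn.execute(query):
        steps[item_id].append(repair)
    counts = Counter("->".join(path) for path in steps.values())
    total = sum(counts.values())
    return {path: 100.0 * n / total for path, n in counts.items()}


percentages = path_percentages(conn)
for path, pct in sorted(percentages.items()):
    print(f"{path}: {pct:.1f}%")
```

In MySQL the same counts could probably be obtained by wrapping the GROUP_CONCAT query in an outer SELECT that groups by path, which keeps the work in the database.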

Tracking user activities to build individual user profile & suggestions

I am about to build a web shop and need to come up with a solution of tracking user information, and based upon that suggest the users products they may like too and so build an individual user profile (what they like).
Information to be tracked/used for the algorithm, I thought should include:
past orders
wish list/bookmarks/favourites...
search terms entered
products viewed (and here also track and consider the "drop-off" rate, meaning whether a user closes the site or goes back immediately, or looks at more pictures, scrolls down (viewport), etc.)
Products are assigned to categories as well as different attributes such as colors, tags etc. The table product has relations with color, category, etc.
product
id_product
price
timestamp_added
color
id_color
...
product_color
id_product_color
id_product
id_color
The questions are:
1) How would you structure a database to track e.g. products viewed? Should it be just like this?:
product_viewed
id_product_viewed
id_product
id_user
timestamp
2) If I want to calculate e.g. the user's top 3 favourite colors based on the colors of products the user bought, put on their wish list, bookmarked or viewed: is it feasible, from a performance point of view, to calculate which products should be recommended to this user by querying the database every single time? Or do you update a user profile from time to time, storing only the already calculated favourite colors based on the tracked data, and use the stored data to find products that match?
How do big sites like facebook, amazon or pinterest do this? On pinterest you get suggestions for items you may like based on what items you clicked on before. How do they handle this?
Yes, your schema for product_viewed is OK.
As for their three favorite colors, try this untested code (it assumes the color table has a name column):
select c.name, count(*) as view_count
from product_viewed pv
JOIN product_color pc on pc.id_product = pv.id_product
JOIN color c on pc.id_color = c.id_color
where pv.id_user = 1
group by c.name
order by view_count desc
limit 3
Given indexes on the ids used to join the tables and a reasonable limit on the number of items viewed, this should have decent performance. Down the road, you might only look at their most recent 100 products, etc., just to keep it from growing forever. (Or, as you suggest, caching).
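A runnable sketch of that query, including the "most recent N products" limit, using Python with an in-memory SQLite database rather than PHP and MySQL. The `name` column on color, the index, the sample rows and the function name `top_colors` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE color (id_color INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_color (id_product INTEGER, id_color INTEGER);
CREATE TABLE product_viewed (
  id_product_viewed INTEGER PRIMARY KEY,
  id_product INTEGER,
  id_user INTEGER,
  timestamp TEXT
);
-- Index so one user's recent views can be found without a full scan.
CREATE INDEX idx_viewed_user ON product_viewed (id_user, timestamp);

INSERT INTO color VALUES (1, 'red'), (2, 'blue'), (3, 'green');
INSERT INTO product_color VALUES (100, 1), (101, 1), (102, 2), (103, 3), (103, 2);
INSERT INTO product_viewed (id_product, id_user, timestamp) VALUES
  (100, 1, '2024-01-01'), (101, 1, '2024-01-02'), (102, 1, '2024-01-03'),
  (100, 1, '2024-01-04'), (103, 2, '2024-01-05');
""")


def top_colors(conn, user_id, limit=3, recent=100):
    """Return (color name, view count) pairs for the user's most viewed colors.

    Only the user's `recent` latest views are counted, which bounds the work.
    """
    return conn.execute(
        """
        SELECT c.name, COUNT(*) AS view_count
        FROM (SELECT id_product FROM product_viewed
              WHERE id_user = ?
              ORDER BY timestamp DESC
              LIMIT ?) AS pv
        JOIN product_color AS pc ON pc.id_product = pv.id_product
        JOIN color AS c ON c.id_color = pc.id_color
        GROUP BY c.name
        ORDER BY view_count DESC, c.name
        LIMIT ?
        """,
        (user_id, recent, limit),
    ).fetchall()


favourites = top_colors(conn, 1)
print(favourites)
```

If this turns out to be too slow to run on every page load, the returned list is small enough to store per user and refresh periodically, which is the caching approach mentioned in the question.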
There's no magic to this, so it's probably similar to what those other sites are doing.
Doing it with tables like the ones you just described is a good way.
Facebook and similar sites are likely doing it that way as well.
For efficiency, they rely on indexes, which are typically stored as B-trees.

Temporary Table and Left Joins not showing results as expected

I'm really hoping someone can help me with this. I have a number of product attribute types that users can select from to refine the products returned to them on screen. For each product attribute type, I want to list all attributes that relate to either the selected category or search term. Once the user has made their selections, I still want to display each of those attributes, but only as a clickable link if the product count for that particular attribute is greater than zero; those with a product count of zero should be listed but unclickable. An example of what I'm trying to achieve can be found on the ASOS website, in the left-hand menu:
http://www.asos.com/Women/Dresses/Cat/pgecategory.aspx?cid=8799#state=Rf961%3D3340%2C3341%40Rf-200%3D20&parentID=Rf-300&pge=0&pgeSize=20&sort=-1
Initially I tried using just joins to achieve this, but I wasn't able to do it successfully. So I decided to create a temporary table for each attribute type, holding a list of all the attributes that relate to the main query, and then created a refined query with a left join. Here's my code:
// First query: temporary table of every type that relates to the category.
$query = "CREATE TEMPORARY TABLE temp_table
    SELECT su_types.id, type AS item FROM su_types
    INNER JOIN su_typerefs ON su_types.id = su_typerefs.id
    INNER JOIN su_pref ON su_typerefs.mykey = su_pref.mykey
    WHERE wp_category_id = 40 GROUP BY su_typerefs.id";
$sudb->query($query);
if ($sudb->affected_rows > 0) {
    // Second query: product count per type after the user's refinements.
    $query = "SELECT temp_table.id, item, COUNT(su_typerefs.mykey) AS product_count FROM temp_table
        LEFT JOIN su_typerefs ON temp_table.id = su_typerefs.id
        LEFT JOIN su_pref ON su_typerefs.mykey = su_pref.mykey
        LEFT JOIN su_stylerefs ON su_pref.mykey = su_stylerefs.mykey
        LEFT JOIN su_productrefs ON su_pref.mykey = su_productrefs.mykey
        WHERE wp_category_id = 40 AND su_stylerefs.id IN (91) AND su_productrefs.id IN (54) AND su_typerefs.id IN (159) GROUP BY su_typerefs.id";
    if ($itemresults = $sudb->query($query)) {
        while ($itemresult = $itemresults->fetch_array(MYSQLI_ASSOC)) {
            $id = $itemresult['id'];
            $item = $itemresult['item'];
            $product_count = $itemresult['product_count'];
            build_link($list_type, $item, $product_count, $id);
        }
    }
}
In the above example the first query selects all the product types that relate to a particular category, say dresses. And the second query is based on the refinements the user has made on the category, in this example this is product, product type and style. A user can also refine their search by colour, fit, fabric and design.
There are a couple of issues with this:
1) The number of results returned by the second query does not match the results of the first. Using the above as an example, I wish to list all product types that relate to the chosen category, then use the second query to return the product count for each of them as I described above. So if the temporary table returns trousers, jeans and skirts, I expect these three items to be displayed on screen based on the conditions applied in the second query. However, my results may only show trousers and jeans if there is no match for skirts in the second query. I thought that using a left join would mean that all the results of the temporary table would be displayed.
2) I also wonder if I'm doing this in the most efficient way. I have a total of 8 attribute groups, and therefore need to do the above 8 times. If the user chooses to refine the results using all 8 attribute groups then, in addition to the temp table join, there will be a total of 9 joins for each type. It's taking a while to execute; is there a better way to do this? There are approximately half a million products in the table, and this will probably be 5 times as many once my site goes live.
I really hope all that I have written makes sense and I'd really appreciate the stackoverflow community's help with this, if anyone can help. I apologise for the essay ;). Thanks in advance
To answer your first question; yes, a LEFT JOIN will indeed keep all data from the initial table. That, however, isn't the problem.
The reason why you lose empty categories, is most likely (I say this because I don't fully know your db structure) because of the where condition filtering out all results based on the data in the joined tables.
If for a category all items get filtered out (possibly including the NULL joined values), you will not get this category back from that query anymore. Also the GROUP BY is done on a joined column, that might also effectively wipe out your other categories.
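The effect described here can be reproduced on a cut-down schema. The sketch below uses Python with an in-memory SQLite database instead of PHP and MySQL; the tables `types`, `typerefs` and `stylerefs` are simplified, hypothetical stand-ins for the su_* tables. It compares a filter placed in WHERE with the same filter moved into the join's ON clause, grouping by the left table's column in both cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE types (id INTEGER PRIMARY KEY, item TEXT);
CREATE TABLE typerefs (id INTEGER, mykey INTEGER);   -- type id -> product key
CREATE TABLE stylerefs (id INTEGER, mykey INTEGER);  -- style id -> product key

INSERT INTO types VALUES (1, 'trousers'), (2, 'jeans'), (3, 'skirts');
INSERT INTO typerefs VALUES (1, 10), (1, 11), (2, 12), (3, 13);
INSERT INTO stylerefs VALUES (91, 10), (91, 11), (91, 12), (92, 13);
""")

# Filter in WHERE: rows whose joined style is not 91 (or is NULL) are removed
# after the join, so 'skirts' disappears from the result entirely.
filter_in_where = conn.execute("""
    SELECT t.item, COUNT(s.mykey)
    FROM types AS t
    LEFT JOIN typerefs AS r ON r.id = t.id
    LEFT JOIN stylerefs AS s ON s.mykey = r.mykey
    WHERE s.id IN (91)
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()

# Filter in ON: non-matching rows are kept with NULLs on the right-hand side,
# and COUNT(s.mykey) ignores NULLs, so 'skirts' is listed with a count of 0.
filter_in_on = conn.execute("""
    SELECT t.item, COUNT(s.mykey)
    FROM types AS t
    LEFT JOIN typerefs AS r ON r.id = t.id
    LEFT JOIN stylerefs AS s ON s.mykey = r.mykey AND s.id IN (91)
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

Whether moving the conditions into ON fits the real schema depends on how the su_* tables relate, so treat this as an illustration of the mechanism rather than a drop-in fix.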
As for the second question, you already state it's taking long, so it's probably not the way to go if you want things to work fast ;) (okay, obvious answer, low-hanging fruit, etc.). What you might want to do is get a collection of keys from the filterable categories first, and use that data to select items.
This saves you from having to join your entire products table in a temp table (at least, that's what I think you're doing), which of course will take long with the given number of entries. Selecting a list of matching IDs from the given attributes also gives you the advantage of making (more) use of your indexes, which a temp table probably won't have. Whether this is possible and feasible mainly depends on your schema's structure, but I hope it leads you in the direction you want to go :)

How should I design the database structure for this problem?

I am rebuilding the background system of a site with a lot of traffic.
This is the core of the application and the way I build this part of the database is critical for a big chunk of code and upcoming work. The system described below will have to run millions of times each day. I would appreciate any input on the issue.
The background is that a user can add what he or she has been eating during the day.
Simplified, the process is more or less this:
The user arrives to the site and the site lists his/her choices for the day (if entered before as the steps below describes).
The user can add a meal (consisting of 1 to unlimited different items of food and their quantity). The meal is added through a search field and is organized in different types (like 'Breakfast', 'Lunch').
During the meal building process a list of the most commonly used food items (primarily by this user, but secondly also by all users) will be shown for quick selection.
The meals will be stored in a FoodLog table that consists of something like this: id, user_id, date, type, food_data.
What I currently have is a huge database with food items from which the search will be performed. The food items are stored with information on both the common name (like "pork cutlets") and on producer (like "coca cola"), along with other detailed information needed.
Question summary:
My problem is that I do not know the best way to store the data for it to be easily accessible in the way I need it and without the database going out of hand.
Consider 1 million users adding 1 to 7 meals each day. To store each food item for each meal, each day and each user would potentially create (1*avg_num_meals*avg_num_food_items) million rows each day.
Storing the data in some compressed way (e.g. food_data as a json_encoded string) would lessen the number of rows significantly, but at the same time make it hard to create the 'most used food items' list and other statistics on the fly.
Should the table be split into several tables? If this is the case, how would they interact?
The site is currently hosted on a mid-range CDN and is using a LAMP (Linux, Apache, MySQL, PHP) backbone.
Roughly, you want a fully normalized data structure for this. You want to have one table for Users, one table for Meals (one entry per meal, with a reference to User; you probably also want to have a time / date of the meal in this table), and a table for MealItems, which is simply an association table between Meal and the Food Items table.
So when a User comes in and creates an account, you make an entry in the Users table. When a user reports a Meal they've eaten, you create a record in the Meals table, and a record in the MealItems table for every item they reported.
This structure makes it straightforward to have a variable number of items with every meal, without wasting a lot of space. You can determine the representation of items in meals with a relatively simple query, as well as determining just what the total set of items any one user has consumed in any given timespan.
This normalized table structure will support a VERY large number of records and support a large number of queries against the database.
First, "storing the data in some compressed way (like the food_data is an json_encoded string)" is not a recommended idea. This will cause you countless headaches in the future as new requirements are added.
You should definitely have a few tables here.
Users: id, etc
Food Items: id, name, description, etc
Meals: id, user_id, category, etc
Meal Items: id, food_item_id, meal_id
The Meal Items would tie the Meals to the Food Items using ids. The Meals would be tied to Users using ids. This makes it simple to use joins in order to get detailed lists of data- totals, averages, etc. If the fields are properly indexed, this should be a great model to support a large number of records.
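A minimal sketch of this four-table layout and of the "most used food items" query it enables, in Python with an in-memory SQLite database rather than PHP and MySQL. The column names beyond those listed above (such as `quantity` and `date`), the indexes, the sample data and the function name `most_used` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE food_items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE meals (
  id INTEGER PRIMARY KEY,
  user_id INTEGER REFERENCES users(id),
  date TEXT,
  category TEXT
);
CREATE TABLE meal_items (
  id INTEGER PRIMARY KEY,
  meal_id INTEGER REFERENCES meals(id),
  food_item_id INTEGER REFERENCES food_items(id),
  quantity REAL
);
-- Indexes on the columns used in joins and WHERE clauses.
CREATE INDEX idx_meals_user ON meals (user_id, date);
CREATE INDEX idx_meal_items_meal ON meal_items (meal_id);

INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO food_items VALUES (1, 'oatmeal'), (2, 'coffee'), (3, 'pork cutlets');
INSERT INTO meals VALUES
  (1, 1, '2024-01-01', 'Breakfast'),
  (2, 1, '2024-01-01', 'Lunch'),
  (3, 2, '2024-01-01', 'Breakfast');
INSERT INTO meal_items (meal_id, food_item_id, quantity) VALUES
  (1, 1, 1), (1, 2, 1), (2, 3, 2), (2, 2, 1), (3, 2, 1), (3, 1, 1);
""")


def most_used(conn, user_id=None, limit=5):
    """Most frequently logged food items for one user, or for all users if None."""
    return conn.execute(
        """
        SELECT f.name, COUNT(*) AS uses
        FROM meal_items AS mi
        JOIN meals AS m ON m.id = mi.meal_id
        JOIN food_items AS f ON f.id = mi.food_item_id
        WHERE ? IS NULL OR m.user_id = ?
        GROUP BY f.id
        ORDER BY uses DESC, f.name
        LIMIT ?
        """,
        (user_id, user_id, limit),
    ).fetchall()


print(most_used(conn, user_id=1))
print(most_used(conn))
```

At the volumes described in the question, the per-user list would probably be worth caching or precomputing, since the all-users variant has to scan every logged item.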
In addition to what's been said:
Be judicious in your use of indexes. Properly applying these to your database could significantly speed up read access to your tables.
Consider using database-specific features to minimize space. You mention that you're using MySQL; consider using ENUM when appropriate (food types, meal types) to minimize database size and to simplify management.
I would split your meal table into two tables: one stores a single row for each meal, and the second stores one row for each food item used in a meal, with a foreign key reference to the meal it was used in.
After that, just make sure you have indices on any table columns used in joins or WHERE clauses.

SQL database issue

I have a select statement showing the following results:
On_loan barcode
Y 12345
Y 12345
N 12345
N 12344
Y 12344
Each barcode for a book can have more than one copy. Users can place a book on hold. E.g. user '1' has reserved books 12345 and 12344. The above results show that of the two copies with barcode 12344, one is available and the other is unavailable. I want to show two regions in PHP: the top showing books that were on hold and are now ready to take out, and the other showing books that have been placed on hold but are unavailable. I now want my select to check, for each of the barcodes 12345 and 12344, whether a copy has been returned. If it has, I will then use the hold_date to see if it is the earliest hold for that specific book.
I understand on_loan tells me whether a book has been returned, but how can I use the 'N' from on_loan for each book? I believe DISTINCT will not work.
How can I go about doing this?
My Hold table
has the following fields:
user
isbn
hold_date
I think you are asking for a way to check if a recently returned book is on hold for another customer, correct?
The book should actually have a unique barcode per each physical copy of a book in the library, as well as an ISBN for the book in general.
Holds would be placed by ISBN.
When a book is checked in, enter that copy's barcode, then look up its ISBN and see if another customer is waiting for it.
If so, set the status for that copy to 'hold' and create a record relating the copy to the hold.
Otherwise, set the copy's status to checked in.
This assumes there is a table 'copy' that has a record for each physical copy with a unique barcode and relates to a table called 'book' that has info about a book such as ISBN and author, plus a table called 'hold' that stores the hold info by ISBN (or better, book.id).
Here are all the copies that are checked in and have a hold on them (get_book_status is assumed to be a helper function that returns the id for a status name):
select * from copy join book on book.id = copy.book_id where copy.status_id = get_book_status('in') and book.isbn in (select isbn from hold);
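A small sketch of splitting one user's holds into "ready" and "still waiting", using Python with an in-memory SQLite database in place of PHP and MySQL. It follows the copy/book/hold layout assumed in this answer, but uses the on_loan flag from the question instead of a status id. The sample data, ISBN strings and the function name `holds_for_user` are made up, and the rule "ready means at least one copy is in and this user holds the earliest hold_date" is a simplification that ignores multiple free copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (id INTEGER PRIMARY KEY, isbn TEXT UNIQUE, title TEXT);
CREATE TABLE copy (
  barcode INTEGER PRIMARY KEY,
  book_id INTEGER REFERENCES book(id),
  on_loan TEXT
);
CREATE TABLE hold (user INTEGER, isbn TEXT, hold_date TEXT);

INSERT INTO book VALUES (1, 'isbn-a', 'Book A'), (2, 'isbn-b', 'Book B');
INSERT INTO copy VALUES
  (1001, 1, 'Y'), (1002, 1, 'Y'), (1003, 1, 'N'),
  (2001, 2, 'Y'), (2002, 2, 'Y');
INSERT INTO hold VALUES
  (1, 'isbn-a', '2024-03-01'), (2, 'isbn-a', '2024-03-05'),
  (1, 'isbn-b', '2024-03-02');
""")


def holds_for_user(conn, user):
    """Return (ready, waiting) lists of ISBNs for the given user's holds."""
    rows = conn.execute(
        """
        SELECT h.isbn,
               -- Is at least one copy of this book currently checked in?
               EXISTS (SELECT 1 FROM copy AS c
                       JOIN book AS b ON b.id = c.book_id
                       WHERE b.isbn = h.isbn AND c.on_loan = 'N') AS copy_in,
               -- Is this the earliest hold placed on the book?
               h.hold_date = (SELECT MIN(h2.hold_date) FROM hold AS h2
                              WHERE h2.isbn = h.isbn) AS is_first
        FROM hold AS h
        WHERE h.user = ?
        ORDER BY h.isbn
        """,
        (user,),
    ).fetchall()
    ready = [isbn for isbn, copy_in, is_first in rows if copy_in and is_first]
    waiting = [isbn for isbn, copy_in, is_first in rows if not (copy_in and is_first)]
    return ready, waiting


ready, waiting = holds_for_user(conn, 1)
print("ready:", ready, "waiting:", waiting)
```

The two lists map directly onto the two regions the question wants to display on the page.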
Maybe you should have a bookTitle table with ID, Barcode, link to barcode tables and then you could do a query to return all copies of a bookTitle and use queries to return barcodes that are on loan and not.
The ID makes it unique.
If you are running into this kind of question, the database design is probably not quite right.
First of all, you should normalize the database to third normal form.
It will then consist of three tables: books (name, barcode, available_count), users (user_id, name) and a relationship table users_to_books (user_id, book_barcode, state), where state can be an enum with the values "on hold" and "on hands".
After that you can do all kind of stuff with counting and checking.
