I hope I can explain this well enough. I have 3 tables: wo_parts, workorders and part2vendor. I am trying to get the cost price of all parts sold in a month. I have this script.
$scoreCostQuery = "SELECT SUM(part2vendor.cost*wo_parts.qty) as total_score
FROM part2vendor
INNER JOIN wo_parts
ON (wo_parts.pn=part2vendor.pn)
WHERE workorder=$workorder";
Each part is in wo_parts (under part number, pn). The cost of that item is in part2vendor (also under pn). I need each part's price in part2vendor to be multiplied by the quantity sold in wo_parts. The way all three tie up is workorders.ident = wo_parts.workorder and part2vendor.pn = wo_parts.pn. I hope someone can assist: the above script does not give me the same total as when I add it up by calculator.
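(For reference, a hedged sketch of the relationships described above written out as one query; 1234 stands in for the $workorder value, and the join through workorders is only needed if you filter or select on that table:)

SELECT SUM(part2vendor.cost * wo_parts.qty) AS total_score
FROM workorders
INNER JOIN wo_parts    ON wo_parts.workorder = workorders.ident
INNER JOIN part2vendor ON part2vendor.pn     = wo_parts.pn
WHERE workorders.ident = 1234;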
This is not an answer, just a comment.
Why don't you take the sum/multiply operation outside the SQL statement? I know, that seems stupid because it will increase the lines of code and the complexity of the script, but, imho, it is always a good thing to keep code and SQL statements as far away as possible.
The key cause I could see for something like this would be a type issue. For example, if you are using FLOATs instead of NUMERICs, you might get a slightly different answer. That is a mistake that is way too common, btw.
I would recommend double checking your schema to make sure you are using NUMERICs across the board here. NUMERIC is crazy-powerful on PostgreSQL, it performs very well, and it properly supports arbitrary precision operations. If you can't change the data type, cast your fields to numeric in your query.
FLOAT types (including DOUBLE) are fixed-size binary floating-point numbers, and they cannot always represent base-10 fractions exactly. NUMERICs are stored internally in a decimal-based format (PostgreSQL uses base 10000, packing four decimal digits into each 16-bit word), which converts cheaply to and from the decimal text representation. The precision is also arbitrary, although it does have a maximum; for financial work, the maximum values and precision are not an issue. You should use numeric types liberally.
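As a minimal sketch of the casting suggestion, reusing the question's tables and columns (the DECIMAL(12,2) precision is my assumption; on PostgreSQL, DECIMAL is simply an alias for NUMERIC):

SELECT SUM(CAST(part2vendor.cost AS DECIMAL(12,2)) * wo_parts.qty) AS total_score
FROM part2vendor
INNER JOIN wo_parts ON wo_parts.pn = part2vendor.pn
WHERE wo_parts.workorder = 1234;   -- 1234 stands in for $workorder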
Related
I have a MySQL table with thousands of data points stored in 3 columns, R, G, and B. How can I find which data point is closest to a given point (a, b, c) using Euclidean distance?
I'm saving the RGB values of colors separately in a table, so the values are limited to 0-255 in each column. What I'm trying to do is find the closest color match by finding the color with the smallest Euclidean distance.
I could obviously run through every point in the table to calculate the distance but that wouldn't be efficient enough to scale. Any ideas?
I think the above comments are all true, but they are - in my humble opinion - not answering the original question. (Correct me if I'm wrong.) So, let me add my 50 cents here:
You are asking for a SELECT statement. Given that your table is called 'colors', your columns are called r, g and b, they are integers in the range 0..255, and you are looking for the row in your table closest to a given value, say rr, gg, bb, then I would dare to try the following:
select min(sqrt((rr-r)*(rr-r)+(gg-g)*(gg-g)+(bb-b)*(bb-b))) from colors;
Now, this answer is given with a lot of caveats, as I am not sure I got your question right, so please confirm whether it is right, or correct me so that I can be of assistance.
Since you're looking for the minimum distance and not the exact distance, you can skip the square root. I think squared Euclidean distance applies here.
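A hedged sketch of that, using the same assumed table and columns; the literals 180, 50 and 70 stand in for the target colour (rr, gg, bb):

SELECT r, g, b,
       (180 - r) * (180 - r) + (50 - g) * (50 - g) + (70 - b) * (70 - b) AS squared_dist
FROM colors
ORDER BY squared_dist
LIMIT 1;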
You've said the values are bounded to 0-255, so you can make an indexed lookup table covering every possible channel difference (-255 to 255) and its square.
Here is what I'm thinking in terms of SQL. r0, g0, and b0 represent the target color, and the vector table holds the squared differences described above; a sketch of building it follows the query. This solution visits all the records, but the result set can be limited to one row by sorting on the squared distance and selecting only the first row.
select
  c.r, c.g, c.b,
  mR.dist + mG.dist + mB.dist as squared_dist
from colors c
  inner join vector mR on c.r - r0 = mR.point   -- squared red difference, looked up
  inner join vector mG on c.g - g0 = mG.point   -- squared green difference, looked up
  inner join vector mB on c.b - b0 = mB.point   -- squared blue difference, looked up
group by
  c.r, c.g, c.b
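A sketch of how the assumed vector lookup table might be built; the names match the query above, and the recursive CTE (MySQL 8+) is just one convenient way to generate the range:

CREATE TABLE vector (
  point INT PRIMARY KEY,   -- signed difference between two channel values (-255..255)
  dist  INT NOT NULL       -- point * point, pre-computed
);

INSERT INTO vector (point, dist)
WITH RECURSIVE n (point) AS (
  SELECT -255
  UNION ALL
  SELECT point + 1 FROM n WHERE point < 255
)
SELECT point, point * point FROM n;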
The first level of optimization I can see is to square the distance you want to limit the query to, so that you don't need to take a square root for each row.
The second level of optimization I would encourage is some preprocessing to avoid the extra squaring on each query (which could add some run time for large tables of RGBs). You'd have to do some benchmarking to see, but by substituting in the values for a, b, c, and d before running the query, you could take some stress off MySQL.
Note that the performance difference between the last two lines may be negligible. You'll have to use test queries on your system to determine which is faster.
I just re-read the question and noticed that you are ordering by distance. In that case, the d should be removed and everything moved to one side of the comparison. You can still plug in the constants to avoid extra processing on MySQL's end.
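A hedged sketch of that combination, with made-up constants already folded in: the target (a, b, c) = (180, 50, 70) and a radius d of 20, so the filter compares against 20 * 20 = 400 and nothing ever needs SQRT:

SELECT r, g, b,
       (r - 180) * (r - 180) + (g - 50) * (g - 50) + (b - 70) * (b - 70) AS squared_dist
FROM colors
WHERE (r - 180) * (r - 180) + (g - 50) * (g - 50) + (b - 70) * (b - 70) <= 400
ORDER BY squared_dist
LIMIT 1;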
I believe there are two options.
You either have to, as you say, iterate across the entire set, comparing each point against the best distance found so far (initialised to a sentinel value such as -1, meaning "nothing found yet"). This runs in linear time: since you're only comparing one point to every point in the set, it scales linearly.
I'm still thinking of another option... something along the lines of doing a breadth-first search outward from the input point until a point from the set is found. This requires a bit more thought, though (I imagine the 3D space would have to be pretty heavily populated for it to be more efficient on average).
If you run through every point and calculate the distance, don't use the square root function; it isn't necessary. The smallest sum of squares will be enough.
This is the nearest-neighbour search problem you are trying to solve. (In the planar case, you can select all points sorted by an x, y, or z axis and then use PHP to process them.)
MySQL also has spatial extensions, which may offer something like this as a built-in function. I'm not positive, though.
I have a table with a current structure as follows:
Currently this is populated as follows:
The data stored for the product value is a decimal value, and the end digits are cut off once it is inserted into the database.
I have tried changing the table structure as follows:
However this only leads to the following:
As you can see, all values have .00 appended if no decimal part exists. However, I want to store all of these values with no decimal places, except the product value. How can I do this?
The trouble is that you are converting a decimal (float/double) value to an integer, so the value is simply truncated (the digits after the decimal point are chopped off).
If you really don't want to use floats (decimal values) in the database, you can use this hack of a workaround:
Multiply the number by 100 before inserting it, and then be sure to divide it by 100 when you use the data. This will allow you to maintain 2 decimal places while using integer storage.
Thus, 2.4 would be stored as 240, 53 as 5300, 20.74 as 2074, and so on.
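A minimal sketch of the workaround; the table and column names here are made up:

CREATE TABLE product_values (id INT PRIMARY KEY, product_value INT);   -- value stored in hundredths

INSERT INTO product_values VALUES (1, 240), (2, 5300), (3, 2074);      -- 2.40, 53.00, 20.74

SELECT id, product_value / 100 AS product_value                        -- divide back out when reading
FROM product_values;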
I want to note that this is not an ideal solution, but rather a hack.
I highly recommend what the other users suggested in the comments: storing the decimal value (as you have) and formatting it when presenting it.
--- In addition ---
Your real problem appears to be the way the database is set up.
Each of those values should have its own field, since they will be repeated for each product.
I'm working on a program where you can choose up to 3 things you want to divvy points amongst.
Say for example that an action gains you 4 points, and those 4 points are divvied amongst the 3 things you selected.
In this case, those 3 things each get 1.33333... points.
In my database, they are stored as 1.33.
However when I bring them out, it tallies up to 3.99.
Understandable.
But how can I avoid this without giving one of the things 1.34 points?
Store the full float/double value in your database rather than truncating to 2 decimal places. The time to truncate is when displaying the value to the user, and even then, only truncate the displayed string, not the actual value.
Floating point values are the annoying drunk uncle of computing. Just let them be what they are, and then clean them up when presenting to the public eye.
Floating-point numbers will be lossy in this case. If you are dealing with integer numerators and denominators, why not store the numbers as fractions? You can make use of PEAR's Math Fraction library or write something yourself.
Use a third decimal place - not for display, but only for tracking precision. If someone divides 4 points among three, store it as 1.333. When you calculate back, you get 3.999 which you round up to 4. On the other hand, if someone divides 3.99 among three objects, store it as 1.33, so when you calculate back, you get 3.99 (and not 3.999) and thus you know not to round up.
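A small sketch of that idea; the table and column names are made up, and DECIMAL(6,3) keeps the third, precision-tracking decimal:

CREATE TABLE allocations (thing_id INT PRIMARY KEY, points DECIMAL(6,3));

INSERT INTO allocations VALUES (1, 1.333), (2, 1.333), (3, 1.333);   -- 4 points split three ways

-- summing the stored values gives 3.999, which rounds back up to the original 4.00
SELECT ROUND(SUM(points), 2) AS total FROM allocations;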
I'm thinking of using INT as the datatype of the Price column in my database table. Then, when showing the price on the website, I'd just use a PHP function that formats the integer and displays the commas and dots in the appropriate places. Is that wrong? Am I going to run into problems if I do it this way?
This is how our site handles pricing. In the long run it will be easier to calculate the final prices; adding/subtracting ints is easier than floats or decimals (in my opinion). Another benefit is that the locale can be changed more easily, since you're applying the locale-specific formatting at the client instead of in the database. So yes, I would go with an INT, or a BIGINT if you're working with larger dollar values.
Yes - but they are easily solvable. You also need to store a scaling factor to be applied to the integer, or a denominator. The scaling factor's implementation is up to you (it could be a list of fixed scales, powers of 10, or powers of 7.36 (ugh)).
So, for example, if you store 1 as the price and -2 as the scale, the real value is 1 x 10^-2, i.e. 1 * 0.01 = 0.01.
We use scaling all the time to transmit real-time market data over the wire, since it gives a compression factor. It is much cheaper to send 1 and know the scale is 10^6 than to send 1,000,000.
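A rough sketch of the integer-plus-scale idea; the table and column names are made up, and the real value is price_value x 10^price_scale:

CREATE TABLE prices (
  product_id  INT PRIMARY KEY,
  price_value INT,    -- the scaled integer
  price_scale INT     -- power-of-ten exponent to apply when reading
);

INSERT INTO prices VALUES (1, 1099, -2), (2, 1, 6);   -- 10.99, and 1 scaled by 10^6

SELECT product_id, price_value * POW(10, price_scale) AS price
FROM prices;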
I have a relatively large database (130,000+ rows) of weather data, which is accumulating very fast (a new row is added every 5 minutes). On my website I publish min/max data per day, and for the entire existence of my weather station (which is around 1 year).
Now I would like to know whether I would benefit from creating additional tables where these min/max data would be stored, rather than letting PHP run a MySQL query that searches for the day's min/max and for the min/max over the station's entire existence. Would a query using MAX(), MIN() or SUM() (I need SUM() to total the rain accumulation for months) take that much longer than a simple query against a table that already holds those min, max and sum values?
That depends on whether your columns are indexed or not. In the case of MIN() and MAX(), you can read the following in the MySQL manual:
MySQL uses indexes for these operations: To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor that checks whether you are using WHERE key_part_N = constant on all key parts that occur before key_col in the index. In this case, MySQL does a single key lookup for each MIN() or MAX() expression and replaces it with a constant.
In other words, if your columns are indexed, you are unlikely to gain much from denormalization. If they are NOT, you will definitely gain performance.
As for SUM() it is likely to be faster on an indexed column but I'm not really confident about the performance gains here.
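For illustration of the MIN()/MAX() point, a hedged sketch assuming a readings table with recorded_at and temperature columns (the names are mine, not the poster's):

-- without an index, MIN()/MAX() over the whole table scans every row;
-- with this index, MySQL can answer them by looking at the two ends of the index
CREATE INDEX idx_readings_temperature ON readings (temperature);

SELECT MIN(temperature) AS all_time_min,
       MAX(temperature) AS all_time_max
FROM readings;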
Please note that you should not be tempted to index all of your columns after reading this post: if you add indexes, your update queries will slow down!
Yes, denormalization should help performance a lot in this case.
There is nothing wrong with storing calculations for historical data that will not change in order to gain performance benefits.
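A sketch of what that could look like, reusing the assumed readings table from above (recorded_at, temperature and rainfall are my column names): one summary row per day, refreshed after the day closes:

CREATE TABLE daily_summary (
  day        DATE PRIMARY KEY,
  min_temp   DECIMAL(5,2),
  max_temp   DECIMAL(5,2),
  rain_total DECIMAL(7,2)
);

-- run once per day (e.g. from cron) for the day that just finished
INSERT INTO daily_summary (day, min_temp, max_temp, rain_total)
SELECT DATE(recorded_at), MIN(temperature), MAX(temperature), SUM(rainfall)
FROM readings
WHERE recorded_at >= CURDATE() - INTERVAL 1 DAY
  AND recorded_at <  CURDATE()
GROUP BY DATE(recorded_at);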
While I agree with RedFilter that there is nothing wrong with storing historical data, I don't agree that you will see much of a performance boost. Your database is not what I would consider a heavy-use database.
One of the major advantages of databases is indexes. They use advanced data structures to make data access lightning fast. Just think: every primary key you have is an index. You shouldn't be afraid of them. Of course, it would probably be counterproductive to make every field an index, but that should never really be necessary. I would suggest researching indexes more to find the right balance.
As for the work done when a change happens, it is not that bad. An index is a tree-like representation of your field data, built to reduce a search down to a small number of near-binary decisions.
For example, think of finding a number between 1 and 100. Normally you would randomly stab at numbers, or just start at 1 and count up. This is slow. It would be much faster if, each time you guessed, you could ask whether your guess is over or under the target. You would start at 50 and ask; if you are under, you choose 75, and so on until you find the number. Instead of possibly going through 100 numbers, you only have to go through around seven.
The problem comes when you add 50 numbers and the range becomes 1 to 150. If you start at 50 again, your search is less optimized, as there are now 100 numbers above you: your binary search is out of balance. So what you do is rebalance your search by starting at the new mid-point, namely 75.
So the work a database does when data changes is just that adjustment: rebalancing the mid-points of its index. It isn't actually a lot of work. If you are working on a large database that takes many changes a second, you would definitely need a strong strategy for your indexes. In a small database that gets very few changes, like yours, it's not a problem.