I need help with a query. I am taking input from a user where they enter a range between 1-100. So it could be like 30-40 or 66-99. Then I need a query to pull data from a table that has a high_range and a low_range to find a match to any number in their range.
So if a user did 30-40 and the table had entries for 1-80, 21-33, 32-40, 40-41, 66-99, and 1-29 it would find all but the last two in the table.
What is the easiest way to do this?
Thanks
If I understood correctly (i.e. you want any range that overlaps the one entered by the user), I'd say:
SELECT * FROM table WHERE low <= $high AND high >= $low
What I understood is that the range is stored in this format low-high. If that is the case, then this is a poor design. I suggest splitting the values into two columns: low, and high.
If you already have the values split, you can use some statement like:
SELECT * FROM myTable WHERE low <= $needleHigherBound AND high >= $needleLowerBound
If you have the values stored in one column and insist they stay that way, you might find MySQL's SUBSTRING_INDEX function useful. But in that case you'll have to write a complicated query that parses the values in every row and then compares them to your search values. It seems like a lot of effort to cover up a design flaw.
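To make the overlap test concrete, here is a minimal sketch that runs the same WHERE clause against the ranges from the question. It assumes the values are already split into two columns; the table name `ranges`, the column names and the helper `find_overlaps` are made up for illustration, and SQLite (via Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

# Hypothetical table and column names; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ranges (low INTEGER, high INTEGER)")
conn.executemany(
    "INSERT INTO ranges (low, high) VALUES (?, ?)",
    [(1, 80), (21, 33), (32, 40), (40, 41), (66, 99), (1, 29)],
)


def find_overlaps(conn, user_low, user_high):
    """Return stored ranges that share at least one number with the user's range."""
    # Two ranges overlap when each one starts no later than the other ends.
    # Placeholders keep the user's input out of the SQL string itself.
    cur = conn.execute(
        "SELECT low, high FROM ranges WHERE low <= ? AND high >= ? ORDER BY rowid",
        (user_high, user_low),
    )
    return cur.fetchall()


matches = find_overlaps(conn, 30, 40)
print(matches)  # [(1, 80), (21, 33), (32, 40), (40, 41)]
```

For the 30-40 example this returns every row except 66-99 and 1-29, which matches what the question expects.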
Let me start off by stating that I'm just a self-taught hobbyist at this, so I'm sure I'm doing some things wrong or inefficiently, and any feedback is appreciated. If this question is moot because I've made fundamental errors and need to start from scratch, I guess I need to know so I can get better.
With that, here's the problem:
I have a database of birth names in MySQL that is intended to let you find the frequency of those names within a given year range. My only table has a lot of columns:
Name     Begins   Popularity   1800   1801   1802
Aaron    A        500          6      7      4
Amy      A        100          10     2      12
Ashley   A        250          2      5      7
...and so forth until 2013.
Right now I've written a PHP page that can call up a list of names based on the start letter over the entire year range (1800-2013). That works, but what I'd like to do is to let the user specify a custom year range from the dropdowns I put on the home page and use that to calculate the frequency of each name for the custom year range only. I'd also like to be able to sort the resulting list based on those frequency values, not the all-time frequency stored in 'Popularity'.
From what I've looked at, I'm thinking part of the solution might lie in using custom views but I just can't seem to put the pieces all together. Or should I somehow pre-calculate all possible combinations?
Here is the working query code I'm using right now:
{
    $query = "SELECT Name
              FROM nametable
              WHERE Gender = '$genselect'
              AND (BeginsWith = '$begins')
              ORDER BY $sortcolumn $sortorder";
    goto resultspage;
}
resultspage:
$result = mysqli_query($dbcnx, $query)
    or die("Error in query: $query." . mysqli_error($dbcnx));
$rows = $result->num_rows;
echo "<br>You found $rows names!<br>";
while ($row = mysqli_fetch_assoc($result)) {
    echo '<br>' . $row['Name'];
}
I think you're going to have to consider structuring your data in a different way to make the most of using an RDBMS.
If it were me, I'd be looking at normalising data into different tables in the first instance and disposing of unnecessary fields such as "Begins" and "Popularity". That kind of information can easily be reproduced or sought out in PHP or within a query itself. The advantage here is that you also reduce the number of columns that actually need to be maintained.
I haven't worked out a silver bullet schema but, roughly, I'd start with something along these lines and expand/modify where appropriate:
Names
- id
- name
- genderID
Genders
- id
- code
Years
- id
Frequencies
- id
- nameID
- yearID
- number
So, for example, a segment of your data may take the following shape:
Names (1, Aaron, 1)
Genders (1, Male)
Years (1987)
Frequencies (1, 1, 1987, 6), (2, 1, 1988, 19)
The beauty of having your data separated out like this is that it becomes much easier to query it. So, if you wanted the frequency of occurrences of the name Aaron between 1987 and 1988 you could do something like the following:
SELECT SUM(frequencies.number) FROM frequencies WHERE frequencies.yearID
BETWEEN 1987 AND 1988
AND frequencies.nameID = 1
Furthermore, doing away with the "Begins" column would mean you can structure a query to use "LIKE"
SELECT * FROM names WHERE name LIKE "A%"
My examples are perhaps a bit contrived but hopefully they illustrate what I'm getting at.
One thing I haven't touched upon is how you might go about physically entering the data. What happens when a new name is added? Does a corresponding entry get made in the frequencies table automatically? Is a check performed in the frequencies table first and, if an entry exists, does it automatically increment the number?
These are important problems to consider but probably best left until after a schema is settled upon.
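As a rough sketch of how the custom year range and the sort by range frequency from the question could work on a normalised layout, here is a cut-down version with only two tables. It is an illustration under assumptions, not a finished schema: the table and column names and the helper `names_by_frequency` are made up, the Genders and Years tables are left out for brevity, the counts are the three sample rows from the question, and SQLite (via Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

# Simplified, hypothetical schema: names plus one frequency row per name and year.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE frequencies (
        name_id INTEGER REFERENCES names(id),
        year    INTEGER,
        number  INTEGER
    );
""")
conn.executemany(
    "INSERT INTO names (id, name) VALUES (?, ?)",
    [(1, "Aaron"), (2, "Amy"), (3, "Ashley")],
)
# Counts for 1800-1802 taken from the sample table in the question.
conn.executemany(
    "INSERT INTO frequencies (name_id, year, number) VALUES (?, ?, ?)",
    [
        (1, 1800, 6), (1, 1801, 7), (1, 1802, 4),
        (2, 1800, 10), (2, 1801, 2), (2, 1802, 12),
        (3, 1800, 2), (3, 1801, 5), (3, 1802, 7),
    ],
)


def names_by_frequency(conn, first_letter, year_from, year_to):
    """Names starting with first_letter, most frequent in the year range first."""
    sql = """
        SELECT n.name, SUM(f.number) AS total
        FROM names AS n
        JOIN frequencies AS f ON f.name_id = n.id
        WHERE n.name LIKE ? AND f.year BETWEEN ? AND ?
        GROUP BY n.id, n.name
        ORDER BY total DESC, n.name
    """
    return conn.execute(sql, (first_letter + "%", year_from, year_to)).fetchall()


print(names_by_frequency(conn, "A", 1801, 1802))
# [('Amy', 14), ('Ashley', 12), ('Aaron', 11)]
```

Because the total is computed per query, neither a stored "Popularity" column nor pre-calculated combinations of year ranges are needed.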
I have a slight problem. I have a dataset, which contains values measured by a weather station, which I want to analyze further using MySQL database and PHP.
Basically, the first column of the db contains the date and the other columns temperature, humidity, pressure etc.
Calculating the mean, st. dev., max, min etc. is quite simple. However, there are no built-in functions for the other parameters I need, such as kurtosis.
What I need is for example to calculate the skewness, mean, stdev etc. for the individual months, then days etc.
For the built-in functions it is easy. For example, finding some of the parameters for the individual months would be:
SELECT AVG(Temp), STD(Temp), MAX(Temp)
FROM database
GROUP BY YEAR(Date), MONTH(Date)
Obviously I cannot use this for the more advanced parameters. I thought about ways of achieving this and could only think of one solution. I manually wrote a function which processes the values and calculates things such as kurtosis using the particular formulae. But that means I would need to create arrays of data for each month, day, etc., depending on what I am currently calculating. So, for example, I would first need to take the data and split it into arrays, let's say Jan11, Feb11, Mar11 and so on, with each array containing the data for that month. Then I would apply the function to those arrays and create new variables with the results (let's say kurtosis_jan11, kurtosis_feb11 etc.)
Now to my question. I need help with the splitting of the data. The problem is that I don't know in advance in which month the data starts and in which it ends, so I cannot set fixed variables for this. The program first has to check the first month and then create a new array for each month, day etc. until it reaches the last record.
That of course would be maybe one solution but if anyone has any other ideas about how to go around this problem I would very much appreciate your help.
You can do more complex queries to achieve this. Here are some examples http://users.drew.edu/skass/sql/ , including Skew
SELECT AVG(Temp), STD(Temp), MAX(Temp)
FROM database
WHERE Date BETWEEN date_from AND date_to
GROUP BY YEAR(Date), MONTH(Date)
I think you want the data grouped within a date range.
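If the statistics end up being computed in application code instead, the months do not need to be known in advance: a dictionary keyed by (year, month) creates each group the first time it is seen. The sketch below shows that idea in Python rather than PHP, with made-up readings. The helper names `moments` and `stats_by_month` are hypothetical, and the formulas are the population (biased) skewness and excess kurtosis, which is an assumption about which definition is wanted.

```python
from collections import defaultdict
from datetime import date
from math import sqrt


def moments(values):
    """Mean, population std. dev., skewness and excess kurtosis of a list."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        # Every value is identical, so skewness and kurtosis are undefined.
        return {"mean": mean, "stdev": 0.0, "skewness": None, "kurtosis": None}
    return {
        "mean": mean,
        "stdev": sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,
    }


def stats_by_month(rows):
    """Group (date, value) rows by (year, month) and compute moments per group."""
    groups = defaultdict(list)
    for day, value in rows:
        # The key is created on first use, so no month has to be known up front.
        groups[(day.year, day.month)].append(value)
    return {key: moments(vals) for key, vals in sorted(groups.items())}


# Made-up readings, for illustration only.
rows = [
    (date(2011, 1, 5), 1.0), (date(2011, 1, 6), 2.0), (date(2011, 1, 7), 3.0),
    (date(2011, 2, 1), 1.0), (date(2011, 2, 2), 1.0), (date(2011, 2, 3), 4.0),
]
result = stats_by_month(rows)
for month, stats in result.items():
    print(month, stats)
```

Grouping by day works the same way with a `(year, month, day)` key.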
I have a PHP form that grabs user-entered data and then posts it to a MySQL database. I'd like to know, how I can take the mathematical difference between two fields and post it to a third field in the database?
For example, I'd like to subtract "travel_costs" from "show_1_price" and write the difference to the "total_cost" variable. What's the best way to do this? Thanks so much.
You can compute the difference later with a select query: SELECT show_1_price - travel_costs AS pricediff FROM my_table; then grab the value in PHP and run another insert query...
It should be simple to do on the PHP side of things. How about:
$query = sprintf("INSERT INTO my_table VALUES(%d, %d, %d)", $travel_costs,
    $show_1_price, $show_1_price - $travel_costs);
Generally, though, it is bad form to store a value in a database that can be calculated from other values. The reason is that you may never access this value again, yet you are using storage for it. CPU cycles are much more abundant today, so calculate the value when needed. This is not a golden rule, though: there are times when it is more efficient to store the calculated value, although this is not usually the case.
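A small sketch of the calculate-when-needed approach: only the two entered values are stored, and the difference is produced by the query itself. The table name `shows` and the amounts are made up for illustration, and SQLite (via Python's sqlite3 module) stands in for MySQL and the PHP form handler.

```python
import sqlite3

# Hypothetical table; only the user-entered values are stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shows ("
    "id INTEGER PRIMARY KEY, show_1_price REAL, travel_costs REAL)"
)
# Placeholders keep the form input out of the SQL string.
conn.execute(
    "INSERT INTO shows (show_1_price, travel_costs) VALUES (?, ?)",
    (500.0, 120.0),
)

# total_cost is derived at read time instead of being written to a third column.
row = conn.execute(
    "SELECT show_1_price, travel_costs, "
    "show_1_price - travel_costs AS total_cost FROM shows"
).fetchone()
print(row)  # (500.0, 120.0, 380.0)
```

If either stored value is corrected later, the derived total can never go stale, which is the main practical benefit besides the saved storage.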
I have a table with columns ID(int), Number(decimal), and Date(int only timestamp). There are millions of rows. There are indexes on ID and Date.
On many of my pages I am querying this four or five times for a list of Numbers in a specified date range (the range being different each query).
Like:
select number, date from my_table where date > 111111100 and date < 111111111
I'm querying these sets of data to be placed on several different charts. "Today vs Yesterday", "This Month vs Last Month", "This Year vs Last Year".
Would querying the largest possible result set with the sql statement and then using my programming language to filter down the query via a sorted and spliced array be better than waiting for each of these 0.3 second queries to finish?
Is there something else that can be done to speed this up?
It depends on the result set and the executing speed of your queries. There is no ultimate answer to this question.
You should benchmark and calculate the results if you really need to speed up things.
But keep in mind that premature optimization should be avoided. Besides, you would be re-implementing in your own code logic that the database already implements, and your version can contain bugs.
While it may make the query itself finish quicker, you have to ask yourself about the potential impact on memory if you load the entire range of records and then aggregate it programmatically.
Chances are that MySQL's index-based optimizations will perform better than anything you could come up with anyway, so it sounds like a bad idea.
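One way to settle this for a particular data set is to time both approaches and check that they return the same rows. The sketch below does that on generated data; it is only an illustration, since the table name `readings`, the ranges and the row count are invented, SQLite (via Python's sqlite3 module) stands in for MySQL, and timings measured this way will not carry over to a real server.

```python
import sqlite3
import time
from bisect import bisect_left

# Hypothetical table with an index on the integer timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, number REAL, date INTEGER)")
conn.execute("CREATE INDEX idx_readings_date ON readings (date)")
conn.executemany(
    "INSERT INTO readings (number, date) VALUES (?, ?)",
    ((i * 0.5, 1_000_000 + i * 60) for i in range(50_000)),
)

# Half-open [start, end) timestamp ranges, one per chart.
RANGES = [(1_000_000, 1_600_000), (1_600_000, 2_200_000), (2_200_000, 4_000_000)]
SQL = "SELECT number, date FROM readings WHERE date >= ? AND date < ? ORDER BY date"


def per_range_queries(conn, ranges):
    """Approach 1: one indexed query per range."""
    return [conn.execute(SQL, r).fetchall() for r in ranges]


def one_query_then_slice(conn, ranges):
    """Approach 2: fetch the widest range once, then slice it in code."""
    lo = min(start for start, _ in ranges)
    hi = max(end for _, end in ranges)
    rows = conn.execute(SQL, (lo, hi)).fetchall()
    dates = [d for _, d in rows]  # already sorted by the ORDER BY
    return [rows[bisect_left(dates, s):bisect_left(dates, e)] for s, e in ranges]


def timed(fn):
    """Run fn once and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(conn, RANGES)
    return result, time.perf_counter() - start


a, t_a = timed(per_range_queries)
b, t_b = timed(one_query_then_slice)
print(f"per-range queries: {t_a:.4f}s, single query + slicing: {t_b:.4f}s")
print("same rows:", a == b)
```

Note that the second approach holds the whole superset in memory at once, which is exactly the cost the answers above warn about.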
I have a relatively large database (130,000+ rows) of weather data, which is accumulating very fast (a new row is added every 5 minutes). On my website I publish min/max data for the day and for the entire existence of my weather station (which is around 1 year).
Now I would like to know whether I would benefit from creating additional tables where these min/max data would be stored, rather than letting PHP run a MySQL query that searches for the day's min/max data and the min/max data for the entire existence of my weather station. Would a query for max(), min() or sum() (I need sum() to total rain accumulation for months) take that much longer than a simple query to a table that already holds those min, max and sum values?
That depends on whether your columns are indexed or not. In the case of MIN() and MAX(), you can read the following in the MySQL manual:
MySQL uses indexes for these operations:
To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor that checks whether you are using WHERE key_part_N = constant on all key parts that occur before key_col in the index. In this case, MySQL does a single key lookup for each MIN() or MAX() expression and replaces it with a constant.
In other words, if your columns are indexed you are unlikely to gain much performance from denormalization. If they are NOT, you will definitely gain performance.
As for SUM(), it is likely to be faster on an indexed column, but I'm not really confident about the performance gains here.
Please note that you should not rush to index your columns after reading this post. If you add indices, your update queries will slow down!
Yes, denormalization should help performance a lot in this case.
There is nothing wrong with storing calculations for historical data that will not change in order to gain performance benefits.
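For what the denormalized variant could look like, here is a rough sketch of a per-day summary table that is refreshed from the raw readings. All table names, column names and readings are made up for illustration, and SQLite (via Python's sqlite3 module) stands in for MySQL; `INSERT OR REPLACE` is SQLite syntax, so on MySQL something like `REPLACE INTO` would take its place.

```python
import sqlite3

# Hypothetical raw table plus a per-day summary table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE weather (measured_at TEXT, temp REAL, rain REAL);
    CREATE TABLE daily_summary (
        day TEXT PRIMARY KEY, min_temp REAL, max_temp REAL, rain_sum REAL
    );
""")
# Made-up readings, for illustration only.
conn.executemany(
    "INSERT INTO weather (measured_at, temp, rain) VALUES (?, ?, ?)",
    [
        ("2011-01-01 00:00:00", -2.0, 0.0),
        ("2011-01-01 00:05:00", -1.5, 0.2),
        ("2011-01-02 00:00:00", 3.0, 1.0),
        ("2011-01-02 00:05:00", 4.5, 0.5),
    ],
)


def refresh_daily_summary(conn):
    """Recompute one summary row per day from the raw readings."""
    conn.execute("""
        INSERT OR REPLACE INTO daily_summary (day, min_temp, max_temp, rain_sum)
        SELECT date(measured_at), MIN(temp), MAX(temp), SUM(rain)
        FROM weather
        GROUP BY date(measured_at)
    """)


refresh_daily_summary(conn)
daily = conn.execute("SELECT * FROM daily_summary ORDER BY day").fetchall()
# All-time extremes now come from one row per day instead of every reading.
all_time = conn.execute(
    "SELECT MIN(min_temp), MAX(max_temp) FROM daily_summary"
).fetchone()
print(daily)
print(all_time)  # (-2.0, 4.5)
```

Days that are already over never change, so in practice only the current day's row would need recomputing.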
While I agree with RedFilter that there is nothing wrong with storing historical data, I don't agree with the performance boost you will get. Your database is not what I would consider a heavy use database.
One of the major advantages of databases is indexes. They use advanced data structures to make data access lightning fast. Just think, every primary key you have is an index. You shouldn't be afraid of them. Of course, it would probably be counterproductive to index all your fields, but that should never really be necessary. I would suggest researching indexes more to find the right balance.
As for the work done when a change happens, it is not that bad. An index is a tree-like representation of your field data. This is done to reduce a search to a small number of near-binary decisions.
For example, think of finding a number between 1 and 100. Normally you would randomly stab at numbers, or you would just start at 1 and count up. This is slow. Instead, it would be much faster if you set it up so that you could ask if you were over or under when you choose a number. Then you would start at 50 and ask if you are over or under. Under, then choose 75, and so on till you found the number. Instead of possibly going through 100 numbers, you would only have to go through around 6 numbers to find the correct one.
The problem here is when you add 50 numbers and make it out of 1 to 150. If you start at 50 again, your search is less optimized as there are 100 numbers above you. Your binary search is out of balance. So, what you do is rebalance your search by starting at the mid-point again, namely 75.
So the work a database does is just an adjustment to rebalance the mid-point of its index. It isn't actually a lot of work. If you are working on a database that is large and requires many changes a second, you would definitely need a strong strategy for your indexes. In a small database that gets very few changes, like yours, it's not a problem.
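The number-guessing analogy can be checked with a few lines of code. This is a sketch of plain binary search over a range of integers, not of how MySQL actually implements its indexes; the function name `guesses_needed` is made up for illustration.

```python
def guesses_needed(target, low=1, high=100):
    """Count the over/under guesses needed to find target in low..high."""
    guesses = 0
    while low <= high:
        mid = (low + high) // 2  # always guess the mid-point of what is left
        guesses += 1
        if mid == target:
            return guesses
        if mid < target:
            low = mid + 1   # guess was under: discard the lower half
        else:
            high = mid - 1  # guess was over: discard the upper half
    raise ValueError("target is outside the searched range")


counts_100 = [guesses_needed(n) for n in range(1, 101)]
counts_150 = [guesses_needed(n, 1, 150) for n in range(1, 151)]
print(max(counts_100))  # 7
print(max(counts_150))  # 8
```

For 1 to 100 the first guess is 50 and the worst case is 7 guesses, with the average just under 6. Growing the range to 150 and searching from its new mid-point adds only one more guess to the worst case.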