How to query an MSSQL database using a concatenated field - PHP

Is there anybody who can give advice on solving this issue that I am having? I am running PHP 5.3 with MSSQL. Just to explain what happens and what I need to be able to do...
The user selects a specific run (row) from a table on the home page. The columns in the table are:
date
division
start mile/yard (in the format 565.1211 i.e. mile 565 and yard 1211) [this column is made up of a concatenation of two separate columns "mile" and "yard" from my database]
end mile/yard
start lat
start long
end lat
end long
total
report available (yes or no)
The user can select a row by clicking on a cell in the report column where report=yes. This data is posted onto the next page. The next page allows the user to change the start and end mile/yard data so that they can see a specific section of that run.
For example, the user has selected on the home page to view data from start mile/yard 565.1211 to end mile/yard 593.4321. The user can change the section that they want to see by typing into two text boxes. One box is "start mile/yard" and the second box is "end mile/yard". So the user may type 570.2345 into the "start mile/yard" text box and 580.6543 into "end mile/yard". What I want is to query the data between the values the user has input...
SELECT id, CAST(mile AS varchar(6)) + '.' + CAST(yard AS varchar(6)) AS Mile, gps_lat, gps_long, rotten, split, broken, quality
FROM table
WHERE mile/yard BETWEEN start mile ??? AND end mile ???
ORDER BY Mile
My problem is, how would I go about querying this information from my database when the user types in a combination of mile/yard (580.6543)? I assume that I will have to split the data into mile and yard again (how would I do this?)... also, how do I retrieve the information? It would be simple to search just by mile (e.g. WHERE mile BETWEEN 570 AND 580), but how do I search by yard and mile at the same time?
Unfortunately I will not be able to change the database structure as this is what I have to work with... If anyone can think of a better way of doing what I am doing... I am all ears!!
I understand this is a long question, so anything that is unclear, please let me know!
Cheers,
Neil

You shouldn't cast those mile/yard numbers to varchars. You lose the ability to compare them AS NUMBERS, which is what they were to start with.
Convert the user's mile/yard values to a number, then compare against those numbers in the database.
You're trying to force apples and grapes to be oranges, and comparing them to pears and plums... just make everything a pineapple, so to speak.
Besides, by doing the cast and concatenation you lose any chance of the query ever using indexes on those columns. If your table grows "large", you'll kill performance by forcing full-table scans for every query.
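For example, a rough sketch of that numeric comparison, assuming the SQLSRV driver and that the two text boxes post to start and end: split the input, then compare mile and yard as plain numbers so the columns stay untouched and indexable.
// Split the user's "mile.yard" input back into its two parts.
list($startMile, $startYard) = explode('.', $_POST['start']);  // e.g. "570.2345"
list($endMile, $endYard)     = explode('.', $_POST['end']);    // e.g. "580.6543"

$sql = "SELECT id, mile, yard, gps_lat, gps_long, rotten, split, broken, quality
        FROM table
        WHERE (mile > ? OR (mile = ? AND yard >= ?))
          AND (mile < ? OR (mile = ? AND yard <= ?))
        ORDER BY mile, yard";

$params = array((int) $startMile, (int) $startMile, (int) $startYard,
                (int) $endMile,   (int) $endMile,   (int) $endYard);

$stmt = sqlsrv_query($conn, $sql, $params);
while ($row = sqlsrv_fetch_array($stmt, SQLSRV_FETCH_ASSOC)) {
    // $row['mile'], $row['yard'], $row['gps_lat'], ...
}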

As Marc B said, I was "trying to force apples and grapes to be oranges, and comparing them to pears and plums... just make everything a pineapple, so to speak." I have changed the structure of my script now.
Cheers,
Neil

Related

Fuzzy date match

I have a MySQL db of clients and crawled a website retrieving all the reviews for the past few years. Now I am trying to match those reviews up with the clients so I can email them. The problem is that the review site allowed them to enter anything they wanted for the name, so in some cases I have a full first name and last initial, and in some cases a first initial and full last name. It also gives an approximate time the review was posted, such as "1 week ago" or "6 months ago", which we have already converted to an approximate date.
Now I need to try matching those up to the clients. It seems the best way would be to do a fuzzy search on the names, and then once I find all the John B% matches, look for the one with a job completion date nearest the posting of the review, naturally eliminating anything that was posted before the job was completed.
I put together a small sample dataset where table1 is the clients, table2 is the review to match on here:
http://sqlfiddle.com/#!9/23928c/6/0
I was initially thinking of doing a date_diff, but then I need to sort by the lowest number. Before I tackle this on my own, I thought I would ask if anyone has any tricks they want to share.
I am using PHP / Laravel to query MySQL.
You can use DATEDIFF with absolute values:
ORDER BY ABS(DATEDIFF(`date`, $calculatedDate)) ASC
This finds the records closest to your estimate, whether the difference is positive or negative.
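For what it's worth, a rough Laravel sketch combining the fuzzy name match with that ordering (the clients table and its name/completed_at columns are made-up names; adjust to your schema):
use Illuminate\Support\Facades\DB;

// $calculatedDate is the approximate date you already derived from "6 months ago" etc.,
// $namePattern is something like "John B%".
$client = DB::table('clients')
    ->where('name', 'LIKE', $namePattern)                           // fuzzy name match
    ->where('completed_at', '<=', $calculatedDate)                  // job finished before the review
    ->orderByRaw('ABS(DATEDIFF(completed_at, ?)) ASC', [$calculatedDate])
    ->first();                                                      // nearest completion date wins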

MySQL - building a working-schedule / how to fill DB?

I'm about to build a database for a bunch of employees (around 90) to manage their working schedule more easily. Let's say I have 3 tables which all look pretty much like this:
date / agent1 / agent2 / agent3 / etc.
01.01.2015 / Max / Gitti / Heinz / etc.
One of the tables is for "work starts at 8am, ends at 2pm",
the other is for "work starts at 2pm, ends at 9pm",
the third is for "work starts at 3pm, ends at 9pm".
I can fill the database and manage this all by myself but that wouldn't be much of an improvement.
Is there a way to fill those tables with random names from our "employee" table while also checking for employee preferences (for example Gitti doesn't like to work on Thursday afternoons)?
I'd appreciate every single hint :)
Yes.
Find out how many spots to fill.
Build a function to randomly fill an array with that number of employees.
Make sure your function checks for preferences and doesn't add an employee to an unpreferred day.
Use that array to build a query for your database.
Run the query.
As for specifics on how to do these things, please consult php.net as well as your search engine of choice.
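Still, a rough sketch of the "fill randomly while checking preferences" part in plain PHP (the preference structure here is made up; adapt it to however you end up storing preferences):
// Pick $spots agents at random for a given day/shift, skipping anyone whose
// preferences rule that slot out (e.g. Gitti and Thursday afternoons).
function pickAgents(array $employees, array $unpreferred, $day, $shift, $spots)
{
    $available = array();
    foreach ($employees as $name) {
        // $unpreferred is assumed to look like array('Gitti' => array('Thursday' => array('14-21')))
        if (empty($unpreferred[$name][$day]) || !in_array($shift, $unpreferred[$name][$day])) {
            $available[] = $name;
        }
    }

    shuffle($available);                        // randomise the remaining pool
    return array_slice($available, 0, $spots);  // take only as many as you need
}

// $agents = pickAgents($employees, $unpreferred, 'Thursday', '14-21', 3);
// ...then build an INSERT for the matching schedule table from $agents and run it.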

MySQL: Calculating totals for user-selectable year ranges

Let me start off by stating that I'm just a self-taught hobbyist at this, so I'm sure I'm doing some things wrong or inefficiently, and any feedback is appreciated. If this question is moot because I've made fundamental errors and need to start from scratch, I guess I need to know so I'll become better.
With that, here's the problem:
I have a database of birth names in MySQL that is intended to let you find the frequency of those names within a given year range. My only table has a lot of columns:
Name     Begins   Popularity   1800   1801   1802
Aaron    A        500          6      7      4
Amy      A        100          10     2      12
Ashley   A        250          2      5      7
...and so forth until 2013.
Right now I've written a PHP page that can call up a list of names based on the start letter over the entire year range (1800-2013). That works, but what I'd like to do is to let the user specify a custom year range from the dropdowns I put on the home page and use that to calculate the frequency of each name for the custom year range only. I'd also like to be able to sort the resulting list based on those frequency values, not the all-time frequency stored in 'Popularity'.
From what I've looked at, I'm thinking part of the solution might lie in using custom views but I just can't seem to put the pieces all together. Or should I somehow pre-calculate all possible combinations?
Here is the working query code I'm using right now:
{$query = "SELECT Name
FROM nametable
WHERE Gender = '$genselect'
AND
(BeginsWith = '$begins')
ORDER BY $sortcolumn $sortorder";
goto resultspage;
}
resultspage:
$result = mysqli_query($dbcnx, $query)
or die ("Error in query: $query.".mysqli_error($dbcnx));
$rows = $result->num_rows;
echo "<br>You found $rows names!<br>";
while($row=mysqli_fetch_assoc($result))
{
echo '<br>'.$row['Name'];
}
I think you're going to have to consider structuring your data in a different way to make the most of using an RDBMS.
If it were me, I'd be looking at normalising data into different tables in the first instance and disposing of unnecessary fields such as "Begins" and "Popularity". That kind of information can easily be reproduced or sought out in PHP or within a query itself. The advantage here is that you also reduce the number of columns that actually need to be maintained.
I haven't worked out a silver bullet schema but, roughly, I'd start with something along these lines and expand/modify where appropriate:
Names
- id
- name
- genderID
Genders
- id
- code
Years
- id
Frequencies
- id
- nameID
- yearID
- number
So, for example, a segment of your data may take the following shape:
Names (1, Aaron, 1)
Genders (1, Male)
Years (1987)
Frequencies (1, 1, 1987, 6), (2, 1, 1988, 19)
The beauty of having your data separated out like this is that it becomes much easier to query it. So, if you wanted the frequency of occurrences of the name Aaron between 1987 and 1988 you could do something like the following:
SELECT SUM(frequencies.number)
FROM frequencies
WHERE frequencies.yearID BETWEEN 1987 AND 1988
AND frequencies.nameID = 1
Furthermore, doing away with the "Begins" column would mean you can structure a query to use "LIKE"
SELECT * FROM names WHERE name LIKE "A%"
My examples are perhaps a bit contrived but hopefully they illustrate what I'm getting at.
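To tie this back to your original question, here is a rough sketch of the custom year range plus a frequency-based sort against a schema like that, using a prepared mysqli statement ($startYear and $endYear are assumed to come from your two dropdowns; $begins and $dbcnx are the variables you already use):
$stmt = $dbcnx->prepare(
    "SELECT names.name, SUM(frequencies.number) AS total
     FROM names
     JOIN frequencies ON frequencies.nameID = names.id
     WHERE names.name LIKE CONCAT(?, '%')
       AND frequencies.yearID BETWEEN ? AND ?
     GROUP BY names.name
     ORDER BY total DESC"
);
$stmt->bind_param('sii', $begins, $startYear, $endYear);
$stmt->execute();

// get_result() needs the mysqlnd driver; use bind_result() otherwise.
$result = $stmt->get_result();
while ($row = $result->fetch_assoc()) {
    echo '<br>' . $row['name'] . ' (' . $row['total'] . ')';
}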
One thing I haven't touched upon is how you might go about physically entering the data. What happens when a new name is added? Does a corresponding entry get made in the frequencies table automatically? Is a check performed in the frequencies table first and, if an entry exists, does it automatically increment the number?
These are important problems to consider but probably best left until after a schema is settled upon.

php - how do I display 5 results from possible 50 randomly but ensure all results are displayed equal amount

In PHP, how do I display 5 results from a possible 50 at random, but ensure all results are displayed an equal number of times?
For example table has 50 entries.
I wish to show 5 of these at random on every page load, but I also need to ensure all results are displayed in rotation an equal number of times.
I've spent hours googling for this but can't work it out - would very much like your help please.
Please scroll down to "biased randomness" if you don't want to read all of this.
In MySQL you can just use SELECT * FROM table ORDER BY RAND() LIMIT 5.
What you want just does not work. It is logically contradictory.
You have to understand that complete randomness by definition means equal distribution after an infinite period of time.
The longer the interval of selection, the more even the distribution.
If you MUST have an even distribution of selections within, for example, every 24-hour interval, you cannot use a random algorithm. It is contradictory by definition.
It really depends on what your goal is.
You could, for example, pick an element at random and then lower the probability of that same element being chosen on the next run. This way you get a heuristic that gives you a more even distribution after a shorter amount of time. But it's not random. Well, certain parts are.
You could also randomly select from your database, mark the elements as selected, and now select only from those not yet selected. When no element is left, reset all.
Very trivial but might do your job.
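A rough sketch of that mark-and-reset idea (the entries table and its shown flag column are made up; this assumes a mysqli connection in $db):
// If fewer than 5 unshown rows are left, reset the pool and start a new round.
$res   = $db->query("SELECT COUNT(*) AS c FROM entries WHERE shown = 0");
$count = $res->fetch_assoc();
if ($count['c'] < 5) {
    $db->query("UPDATE entries SET shown = 0");
}

// Pick 5 random rows that have not been shown in the current round...
$result = $db->query("SELECT id, title FROM entries WHERE shown = 0 ORDER BY RAND() LIMIT 5");
$ids = array();
while ($row = $result->fetch_assoc()) {
    $ids[] = (int) $row['id'];
    echo $row['title'] . '<br>';
}

// ...and mark them so they sit out until the next reset.
if ($ids) {
    $db->query("UPDATE entries SET shown = 1 WHERE id IN (" . implode(',', $ids) . ")");
}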
You can also do something like that with timestamps to make the distribution a bit more elegant.
This could probably look like ORDER BY RAND()*((timestamp - MIN(timestamp))/(MAX(timestamp) - MIN(timestamp))) DESC or something like that. Basically you could normalize the timestamp at which an entry was last selected over that time window so it ends up between 0 and 1, and then multiply it by RAND(); then fresher (more recently selected) entries are less likely to be selected - 50% freshness, 50% randomness. I am not sure about the formula above, I just typed it down and it is probably wrong, but the principle works.
I think what you want is generally referred to as "biased randomness". There are a lot of papers on that and some articles on SO, for example here:
Biased random in SQL?
Copy the 50 results to some temporary place (file, database, whatever you use). Then every time you need random values, select 5 random values from the 50 and delete them from your temporary data set.
Once your temporary data set is empty, create a new one copying the original again.

Tricky file parsing. Inconsistent Delimiters

I need to parse a file with the following format.
0000000 ...ISBN.. ..Author.. ..Title.. ..Edit.. ..Year.. ..Pub.. ..Comments.. NrtlExt Nrtl Next Navg NQoH UrtlExt Urtl Uext Uavg UQoH ABS NEB MBS FOL
ABE0001 0-679-73378-7 ABE WOMAN IN THE DUNES (INT'L ED) 1st 64 RANDOM 0.00 13.90 0.00 10.43 0 21.00 10.50 6.44 3.22 2 2.00 0.50 2.00 2.00 ABS
The ID and ISBN are not a problem; the title is. There is no set length for these fields, and there are no solid delimiters - a space separates fields throughout most of the file.
Another issue is that there is not always an entry in the comments field. When there is, there are spaces within the content.
So I can get the first two, and the last fourteen. I need some help figuring out how to parse the middle six fields.
This file was generated by an older program that I cannot change. I am using php to parse this file.
I would also ask myself 'How good does this have to be?' and 'How many records are there?'
If, for example, you are parsing this list to put up a catalog of books to sell on a website, you probably want to be as good as you can, but expect that you will miss some titles, and build in a feedback mechanism so your users can help you fix the issues (and make it easy for you to fix them in your new format).
On the other hand, if you absolutely have to get it right because you will lose lots of money for each wrong parse, and there are only a few thousand books, you should plan on getting close, and then doing a human review of the entire file.
(In my first job, we spent six weeks on a data conversion project to convert 150 records - not a good use of time.)
Find the title and publisher of the book by ISBN (in some on-line database) and parse only the rest :)
BTW, are you sure that what looks like a space actually is a space? There are other "invisible" characters (like a non-breaking space). I know, not a good idea, but apparently the author of that format was pretty creative...
You need to analyze your data by hand and find out what the year, edition and publisher look like. For example, if you find that the year is always two digits and the publisher always comes from some limited list, that is something you can start with.
While I don't see any way other than guessing a bit, I'd go about it something like this:
I'd strip off what I know I can parse out reliably, leaving you with ABE WOMAN IN THE DUNES (INT'L ED) 1st 64 RANDOM.
From there I'd try to locate the Edition and split the string in two at that position, after storing and removing the Edition, leaving you with ABE WOMAN IN THE DUNES (INT'L ED) and 64 RANDOM. Another option is to try the same with the year, but of course titles such as 1984 might present a problem. (Guessing the edition of course assumes it's of the form 7th, 51st, etc. for all editions.)
Finally I'd assume I could somewhat reliably guess the year 64 at the start of the second string and further narrow down the Publisher(/Comments) part.
The rest is pure guesswork unless you have a list of authors/publishers somewhere to match against, as I'd assume there are not only comments with spaces but also publishers with spaces in their names. But at least you should be down to two strings, with Author/Title in one and Publisher(/Comments) in the other.
All in all it should limit the manual part a bit.
Once done I'd also save it in a better format somewhere so I don't have to go about parsing it again ;)
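A rough PHP sketch of that "strip the reliable ends, then split at the edition" approach (the trailing field count and the edition pattern are guesses based on the sample line):
$fields = preg_split('/\s+/', trim($line));   // $line is one record from the file

$id   = array_shift($fields);                 // first field: internal ID
$isbn = array_shift($fields);                 // second field: ISBN
$tail = array_splice($fields, -14);           // the last fourteen fields you can already get

// What's left is author+title, edition, year and publisher(/comments).
$middle = implode(' ', $fields);
if (preg_match('/^(.*?)\s+(\d+(?:st|nd|rd|th))\s+(\d{2})\s+(.*)$/', $middle, $m)) {
    list(, $authorTitle, $edition, $year, $publisherAndComments) = $m;
}
// If the edition guess misfires (e.g. a title containing "1st"), flag the line for manual review.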
I don't know if the PCRE engine allows multiple groups from within a selection, therefore:
([A-Z0-9]{7})\ (\d-\d{3}-\d{5}-\d)\
(.+)\ (\d+(?:st|nd|rd|th))\ \d{2}\
([^\d.]+)\ (\d+\.\d{2})\ (\d+\.\d{2})\
(\d+\.\d{2})\ (\d+\.\d{2})\ (\d{1})\
(\d+\.\d{2})\ (\d+\.\d{2})\ (\d+\.\d{2})\
(\d+\.\d{2})\ (\d)\ (\d+\.\d{2})\
(\d+\.\d{2})\ (\d+\.\d{2})\ (\d+\.\d{2})\
(\w{3})
It does look quite ugly and doesn't fix your author/title problem, but it matches quite well for the rest of it.
Concerning that problem, I don't see any solution other than having a lookup table for authors or using another service to look up the title and author via the ISBN.
That's assuming that, unlike in your example above, the authors are not just represented by their first name.
Also double-check any exceptions that might occur with the above regex, as titles may contain 1st or the like.
