I am a field service technician and I have an inventory of parts that are either issued to me by the company I work for or ordered for specific jobs. I am trying to design a website to manage my parts, both on-hand inventory and parts that have been returned or transferred to someone else. Here is the information I need to track:
part number(10 digit)
req number(8 digit, unique)
description(up to 50 characters)
location(Van or shed)
WorkOrder("w"+9 digits ex: 'W212141234')
BOL(15 digit bill of lading #)
TransferDate(date I get rid of part)
TransferMethod(enum 'DEF','RTS','OBF')
I will probably use PHP to make a website and interact with the MySQL database.
What is the best design? A multi-table approach, or one table with web pages that display queries of only certain fields? I need an "on hand" list showing part number, req number, description, and location. I will also need a "defective returns" view that lists the parts I returned as DEF, with all the remaining fields filled in.
Besides the "on hand" fields, the rest of the fields won't have data until they are no longer "on hand".
I really appreciate any help because I am new to both SQL and PHP. I have experimented with Ruby on Rails and Django, but I am not sure if I need to tackle all that at this point.
Even though you give some information about your issue, it is hard to approach it directly, because the question "what is the best design" is itself vague.
What I would do is this:
MYSQL TABLE DESIGN
Table parts
req number(int(8), unique, KEY)
part number(int(10))
description(varchar(50))
location(enum 'Van','shed')
WorkOrder(varchar(10))
BOL(varchar(15))
TransferDate(date)
TransferMethod(enum 'DEF','RTS','OBF')
onhand (boolean)
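A minimal CREATE TABLE sketch of that layout follows. The underscored column names, the CHAR(10) type for the part number (so leading zeros survive; a plain INT cannot hold every 10-digit value anyway), and InnoDB are my assumptions; adjust them to your data.

CREATE TABLE parts (
    req_number      INT UNSIGNED NOT NULL,                -- 8-digit req number, unique
    part_number     CHAR(10)     NOT NULL,                -- CHAR keeps leading zeros
    description     VARCHAR(50)  NOT NULL,
    location        ENUM('Van','shed') NOT NULL,
    work_order      VARCHAR(10)  NULL,                    -- 'W' + 9 digits, empty while on hand
    bol             VARCHAR(15)  NULL,                    -- bill of lading number
    transfer_date   DATE         NULL,
    transfer_method ENUM('DEF','RTS','OBF') NULL,
    onhand          TINYINT(1)   NOT NULL DEFAULT 1,      -- 1 = still on hand
    PRIMARY KEY (req_number)
) ENGINE=InnoDB;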
PHP SCRIPTS
Then I would make two PHP scripts, each with a single query and a table displaying the info:
onhand.php
SELECT part_number, req_number, description, location FROM parts WHERE onhand = 1
notonhand.php
SELECT part_number, req_number, description, work_order, bol, transfer_date, transfer_method FROM parts WHERE onhand = 0
Related
I'm working on an application which is a large database of chemical substances (approx 250,000 but rising) and associated data. I'm looking at ways to optimise the way searching is performed.
The application is running under PHP 7.0.27, MariaDB 5.5.56, and Apache 2.4.6
The application allows searching by chemical name and various chemical codes (such as EC number and CAS number). The schema is such that there are separate tables to hold the data, and the relationships of which codes apply to which chemicals.
These tables are in the database:
substances - unique ID and name for each chemical substance.
ecs - a list of EC Numbers
ecs_substances - which EC Number(s) apply to which substances
cas - a list of CAS Numbers
cas_substances - which CAS Number(s) apply to which substances
Note: there are other tables than the ones above where similar logic will apply, but for now I want to focus on these for this example.
It is possible for a substance to have multiple EC/CAS numbers, and a small number do not have them - i.e. it's not a simple 1:1 relationship.
The application has search fields for the substance name (substances.name), EC number (ecs.value), and CAS number (cas.value). These can be used on their own or in conjunction with each other. For example: find a substance by name, or find a substance by name and CAS number.
I believe the "quickest" way of performing a search for any given value would be to use a LIKE condition on the specific table required. So if I want to look up substances which have "acids" as part of the name:
SELECT id FROM substances WHERE name LIKE '%acids%' LIMIT 0,250
However the results that the application gives are shown in a table which includes headings for substance name, CAS number, EC number. It also allows the results to be ordered on a column (e.g. order by substance name, CAS, EC, etc). This requires JOIN conditions.
I'm doing it like this:
$sql = 'SELECT
DISTINCT(substances.`id`),
substances.`name`,
"" AS cas_number,
"" AS ec_number
FROM
substances ';
// Search - EC Number, or if trying to order by EC column (JOIN has to occur to make that possible)
if ( (isset($search['ecNumber'])) || (isset($order['column']) && ($order['column'] == 'ec_number')) ) {
$sql .= ' LEFT JOIN ecs_substances ON substances.id = ecs_substances.substance_id LEFT JOIN ecs ON ecs_substances.ec_id = ecs.id ';
}
// Search - CAS Number, or if trying to order by CAS column (JOIN has to occur to make that possible)
if ( (isset($search['casNumber'])) || (isset($order['column']) && ($order['column'] == 'cas_number')) ) {
$sql .= ' LEFT JOIN cas_substances ON cas_substances.substance_id = substances.id LEFT JOIN cas ON cas_substances.cas_id = cas.id ';
}
The problem is that because of all the JOINs that are occurring it's slowing down how quickly the results can be obtained.
Benchmark: The first query I posted which just uses a LIKE condition on 1 table will execute in 140ms, whereas it's taking 506ms for the same search criteria with all of the JOIN statements in the second block of code.
I'd like to know if there are ways to optimise this such that the time taken to present results to the user decreases.
It's worth mentioning that the results are displayed in DataTables and PHP is producing a JSON feed of the results. The LIMIT 0,250 is something the end user can override by setting results per page, but I'm happy to limit them to say no more than 500 per page.
Some things I've looked into are:
Caching the JSON. Not a big fan of this because the data is updated quite regularly. The data presented must always be what is in the database, not some cached copy.
Do a search on the required table as in the first code sample. Update the other columns with ajax. This would "appear" to give instant results on the column the user has searched and then quickly thereafter populate the other columns required by the DataTable. This seems incredibly fiddly to do and I don't know whether it's really a good idea.
Consider FULLTEXT, because it allows much faster searching than LIKE with a leading wildcard %, e.g. `MATCH(col) AGAINST('+acid' IN BOOLEAN MODE)`.
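A sketch of that FULLTEXT idea (note the caveat: on MariaDB 5.5 a FULLTEXT index requires a MyISAM table, since InnoDB fulltext support arrived only in later versions):

-- Add a fulltext index on the name column
ALTER TABLE substances ADD FULLTEXT INDEX ft_name (name);

-- Boolean-mode search; the trailing * makes 'acid' also match 'acids', 'acidic', ...
SELECT id, name
FROM substances
WHERE MATCH(name) AGAINST('+acid*' IN BOOLEAN MODE)
LIMIT 0, 250;

Unlike LIKE '%acids%', this matches on word prefixes rather than arbitrary substrings, so verify it still returns the results your users expect.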
Sounds like you need a "many:many" mapping table. Tips on efficiency in such: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
Consider using GROUP_CONCAT(cas) for providing a comma-separated list of CAS numbers (see the sketch below).
JSON does not seem practical. And even less so since you are using only MySQL 5.5.
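For the GROUP_CONCAT suggestion, a sketch using the table and column names from the question; DISTINCT is needed because joining both many:many tables at once multiplies the rows per substance:

SELECT s.id,
       s.name,
       GROUP_CONCAT(DISTINCT c.value ORDER BY c.value) AS cas_numbers,
       GROUP_CONCAT(DISTINCT e.value ORDER BY e.value) AS ec_numbers
FROM substances s
LEFT JOIN cas_substances cs ON cs.substance_id = s.id
LEFT JOIN cas c             ON c.id = cs.cas_id
LEFT JOIN ecs_substances es ON es.substance_id = s.id
LEFT JOIN ecs e             ON e.id = es.ec_id
WHERE s.name LIKE '%acids%'
GROUP BY s.id, s.name
LIMIT 0, 250;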
I think a response time of half a second is quite good, given what you want to do. I assume you have already done all the necessary database optimizations (storage engine, indexes, etc.)?
There are several things you could explore:
Prepare all possible searches and store them in the database for quick access. This may sound stupid, but it is how I often achieve fast searches. It's difficult for me to judge what the best way to do this with your data would be. You could start by adding a TEXT column to your substances table and store all the information about the substance in it: its name and all EC/CAS numbers. Separate the items with something like '|', or any other character not used in searches. I would call that the 'search' column. Alternatively you could make a new table just for searching, with that column and the id of the substance in it. Now you can make one search input field for all three types of data and search in one column only (see the sketch at the end of this answer). Would that work for you? Would it be faster? Possibly, but I cannot guarantee it; it's quite easy to try. There is a disadvantage: you would have to update that column with every change in the database.
Use a proper search engine. Several are available for MariaDB. Start at: https://mariadb.com/kb/en/library/about-sphinxse It basically does something far more advanced than what I described under point 1: prepare a database with data optimized for searching.
Still, a response of half a second would be something I could live with.
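As a rough sketch of point 1, assuming a new search column on substances (all names here are illustrative, and the UPDATE has to be rerun whenever the underlying data changes):

ALTER TABLE substances ADD COLUMN search TEXT;

-- Rebuild the search column: name plus all CAS and EC numbers, pipe-separated
UPDATE substances s
SET s.search = CONCAT_WS('|',
    s.name,
    (SELECT GROUP_CONCAT(c.value SEPARATOR '|')
       FROM cas_substances cs JOIN cas c ON c.id = cs.cas_id
      WHERE cs.substance_id = s.id),
    (SELECT GROUP_CONCAT(e.value SEPARATOR '|')
       FROM ecs_substances es JOIN ecs e ON e.id = es.ec_id
      WHERE es.substance_id = s.id));

-- One input field, one column to search, no JOINs at query time
SELECT id, name FROM substances WHERE search LIKE '%acids%' LIMIT 0, 250;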
The system is used by many shops, and each shop has its own invoice number, e.g.
SHOPA-0001
SHOPA-0002
SHOPA-0003
SHOPB-0001
SHOPB-0002
...
What I did was select the last ID from the MySQL table and add 1 to get the next invoice number. My problem is that one shop has multiple PCs running this system; if two cashiers submit the form at the same time, there will be duplicates.
Any suggestion to this problem?
Utilize safe intention locks as in my answer here, but where I have sections like Chassis and Brakes, yours would be SHOPA, SHOPB, etc. You can decide whether you want the client side to handle the left-padding of zeros, or whether you want MySQL to handle it with a zero-filled int(4) column or with LPAD() inside a CONCAT().
As mentioned in that answer, it is the safe way to do it for concurrency, and the shops are segmented off from one another. The lock is very fast (momentary) if done correctly.
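I can't reproduce the linked answer here, but the general shape of a per-shop counter protected by a row lock looks something like this (table and column names are only illustrative, and it assumes InnoDB):

CREATE TABLE shop_counters (
    shop_code VARCHAR(10) NOT NULL PRIMARY KEY,
    next_num  INT UNSIGNED NOT NULL
) ENGINE=InnoDB;

START TRANSACTION;
-- The row lock makes concurrent cashiers for the same shop queue up here
SELECT next_num FROM shop_counters WHERE shop_code = 'SHOPA' FOR UPDATE;
-- Use the value just read to build the invoice number,
-- e.g. CONCAT('SHOPA-', LPAD(7, 4, '0')) = 'SHOPA-0007', and INSERT the invoice row
UPDATE shop_counters SET next_num = next_num + 1 WHERE shop_code = 'SHOPA';
COMMIT;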
I have a directory of companies provided to me they want stored and updated in a MySQL database. There is no unique identifier such as company #1234 for each company record.
The fields are typical for a mailing list, contact name, company name, street address, city, state, zip code, phone number and type of company. Updates will be sent to me as a CSV file, again, with no company unique identifier number.
How do I go about matching up the stored record in the db to the new one so it can be updated? In this industry the contact name can change, and even the company name because they add and subtract partners. Their street address can change because when they move the business, and they can even change their phone number. The majority of the companies have a website URL, so hopefully that won't change often but it easily could as well.
I've seen that MySQL has a wildcard match using % (LIKE '%...%'); would this be the answer for matching records with the new information?
I work in PHP, if there is a PHP solution. Thanks in advance to the kind soul who helps me out with this!
Without a primary key, it is always tricky.
One-line solution: decide on rules that best suit your requirements.
If I were you, I would first go to the client and agree on some rules for identifying similar records. This step is necessary because, without a primary key, there is always a chance of creating a duplicate entry or updating the wrong record.
Rules could be simple like:
1. Available fields:
contact name,
company name,
street address,
city,
state,
zip code,
phone number and
type of company (I hope this is the industry)
2. We will first match the company name for similarity, like:
select * from table_name where company_name like '%$company_name%'
3. For all found records, match the zip code and phone number. If they match, stop; the record needs to be updated.
4. If no match is found in step 3, match the street address. If it matches, stop; the record needs to be updated.
5. And so on.
Your client is the best person to decide these rules as he is the owner of the product.
On the other side, agreeing on rules with the client is also important to protect you: in the absence of a primary key, even with all possible care, there is always a chance of duplicating records and/or updating the wrong record. Good rules only minimize those chances.
Since you have said that all the fields of the table can change, I think there is no simple way to correctly update the table every time, whatever algorithm you choose.
One of the way to achieve this could be to ask the people/system (which sends you the updated records) to also include the old values of the updated fields in the csv file. If you have the old values you can easily match them with the present records and update it with the new values.
This is a rather general question, and the solution itself is somewhat unique from project to project.
I would iterate over all records ordered by the time of their change (a creation date, update timestamp, or similar). Next I'd match all entries whose major fields are similar: company name, address (though that might be risky), telephone, or URL (parsing domains only). Then I would recursively iterate over all found entries until no more results are found (see the sketch after the example below).
This algorithm will help you find identical entries as long as they do not have all major columns changed at once. If they do, there is no way to tell programmatically that it's the same firm.
It will also link rows with seemingly no direct connection (rows 1 and 3 in the example).
Example:
2001/01/01 Awesome firm, awesome.com
2002/02/02 Awesome firm, newaddress.com // linked with the first row over company name
2010/12/05 Ohsome inc, newaddress.com // linked over url
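To make the linking step concrete, here is a rough sketch, assuming you add an internal auto-increment id and keep the URL in a url column (both are assumptions on my part):

-- For one base row, find other rows that share any "major" field:
-- company name, phone, or the domain part of the URL
SELECT other.*
FROM companies AS base
JOIN companies AS other
  ON other.id <> base.id
 AND ( other.company_name = base.company_name
    OR other.phone_number = base.phone_number
    OR SUBSTRING_INDEX(SUBSTRING_INDEX(other.url, '//', -1), '/', 1)
     = SUBSTRING_INDEX(SUBSTRING_INDEX(base.url, '//', -1), '/', 1) )
WHERE base.id = 1;

Each hit is fed back in as a new base row until no further matches turn up.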
I came across a fairly similar scenario in one of my earlier projects, in SQL Server. I did the following things to handle it.
1. Usually there will be two types of files:
a) Full feed (weekly): this will have all the companies from the provider's database.
b) Incremental feed (daily): this will have only the new records which are not in the full feed, plus updates (flagged I for inserts and U for updates).
2. Once I receive the full feed, I refresh my database table with it once a week. I also assign my own internal IDs to each company record (these IDs are for internal purposes).
3. On a daily basis I process the incremental feeds based on the flags (I = insert, U = update).
4. One very important thing here is to manage the mapping table. When a new record arrives in the feed, just assign a new internal ID to it.
5. To compare the data and avoid duplicates, I used a fuzzy-matching algorithm to get all the potential matches, and then wildcard characters to filter and identify which records were new and which were duplicates.
Have a look at the Damerau-Levenshtein distance algorithm. It calculates the "distance" between two strings by determining how many steps it takes to transform one string into the other. The fewer the steps, the closer the two strings are.
This article shows the algorithm implemented as a MySQL stored function. Here's the PHP version.
The algorithm is so much better than LIKE or SOUNDEX.
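Assuming such a stored function is installed (the name damlev() below is purely hypothetical; use whatever name the article's implementation defines), a fuzzy lookup could look like:

-- Small distances mean near-identical names; the threshold of 3 is arbitrary
SELECT company_name,
       damlev(company_name, 'Acme Partners LLC') AS dist
FROM companies
WHERE damlev(company_name, 'Acme Partners LLC') <= 3
ORDER BY dist
LIMIT 10;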
Background
I am creating a MySQL database to store items such as courses where there may be many attributes to a single course. For example:
A single course may have any or all of the following attributes:
Title (varchar)
Secondary Title (varchar)
Description (text)
Date
Time
Specific Location (varchar; eg. White Hall Room 7)
General Location (varchar; eg. Las Vegas, NV)
Location Coords (floats; eg. lat, long)
etc.
The database is set up as follows:
A table storing specific course info:
courses table:
Course_ID (a Primary Key unique ID for each course)
Creator_ID (a unique ID for the creator)
Creation_Date (datetime of course creation)
Modified_Date (where this is the most recent timestamp the course was modified)
The table storing each courses multiple attributes is set up as follows:
course_attributes table:
Attribute_ID (a unique ID for each attribute)
Course_ID (reference to the specific course attribute is for)
Attribute (varchar defining the attribute; eg. 'title')
Value (text containing value of specified attribute; eg. 'Title Of My Course')
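For reference, that layout spelled out in SQL (the exact types are my assumption from the description):

CREATE TABLE courses (
    Course_ID     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Creator_ID    INT UNSIGNED NOT NULL,
    Creation_Date DATETIME     NOT NULL,
    Modified_Date DATETIME     NOT NULL
) ENGINE=InnoDB;

CREATE TABLE course_attributes (
    Attribute_ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Course_ID    INT UNSIGNED NOT NULL,
    Attribute    VARCHAR(50)  NOT NULL,   -- e.g. 'title', 'description'
    Value        TEXT         NOT NULL,   -- e.g. 'Title Of My Course'
    FOREIGN KEY (Course_ID) REFERENCES courses (Course_ID)
) ENGINE=InnoDB;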
Desire
I would like to search this database using sphinx search. With this search, I have different fields weighing different amounts, for example: 'title' would be more important than 'description'.
Specific search fields that I wish to have are:
Title
Date
Location (string)
Location (geo - lat/long)
The Question
Should I define a View in MySQL to organize the attributes according to 'title', 'description', etc., or is there a way to define my sphinx.conf file to understand specific attributes?
I am open to all suggestions to solving this problem, whether it be rearrangement of the database/tables or the way in which I search.
Let me know if you need any additional details to help me find a solution.
Thanks in advance for the help
Update
OK, so after reading some of the answers, I feel that I should provide some additional information.
Latitude / Longitude
The latitude/longitude attributes are created by me internally after receiving the general location string. I can generate the values in any way I wish, meaning that I can store them together in a single lat/long attribute as 'float lat, float long' values or any other desired format. This is done only after they have been generated from the initial location string and verified. This is to guard against malformed data as #X-Zero and #Cody have suggested.
Keep in mind that the latitude and longitude was merely illustrating the need to have that field be searchable as opposed to anything more than that. It is simply another attribute; one of many.
Weighting Search Results
I know how to add weights to results in a Sphinx search query:
$cl->setFieldWeights( array('title'=>1000, 'description'=>500) );
This causes the title column to have a higher weight than the description column if the structure was as #X-Zero suggested. My question was more directed to how one would apply the above logic with the current table definition.
Database Structure, Views, and Efficiency
Using my introductory knowledge of Views, I was thinking that I could possibly create something that displays a row for each course where each attribute is its own column. I don't know how to accomplish this or if it's even possible.
I am not the most confident with database structures, but the reason I set my tables up as described is that there are many cases where not all of the fields will be completed for every course, and I was attempting to be efficient [yes, it seems as though I've failed].
I was thinking that using my current structure, each attribute would contain a value and would therefore cause no wasted space in the table. Alternatively, if I had a table with tons of potential attributes, I would think there would be wasted space. If I am incorrect, I am happy to learn why my understanding is wrong.
Let me preface this by saying that I've never even heard of Sphinx, nor (obviously) used it. However, from a database perspective...
Doing multi-domain columns like this is a terrible (I will hunt you down and kill you) idea. For one thing, it's impossible to index or sort meaningfully, period. You also have to pray that you don't get a latitude attribute with textual data (and because this can only be enforced programmatically, I'm going to guarantee it will happen) - doing so will cause all distance-based formulas to crash. And speaking of location, what happens if somebody stores a latitude without a longitude (note that this is possible regardless of whether you are storing a single GeoLocation attribute, or the pair)?
Your best bet is to do the following:
Figure out which attributes will always be required. These belong in the course table (...mostly).
For each related set of optional attributes, create a table. For example, location (although this should probably be required...), which would contain Latitude/Longitude, City, State, Address, Room, etc. Allow the columns to be nullable (in sets - add constraints so users can't add just longitude and not latitude).
For every set of common queries, add a view. Even (perhaps especially) if you persist in using your current design, use a view. This promotes separation between the logical and physical implementations of the database. (This assumes searching by SQL.) You will then be able to search by specifying view_column IS NULL or view_column = input_parameter, or whichever.
For weighted searching (assuming dynamic weighting) your query will need to use left joins (inside the view as well - please document this), and use prepared-statement host-parameters (just save yourself the trouble of trying to escape things yourself). Check each set of parameters (both lat and long, for example), and assign the input weighting to a new column (per attribute), which can be summed up into a 'total' column (which must be over some threshold).
EDIT:
Using views:
For your structure, what you would normally do is left join to the attributes table multiple times (one for each attribute needed), keying off of the attribute (which should really be an int FK to a table; you don't want both 'title' and 'Title' in there) and joining on course_id - the value would be included as part of the select. Using this technique, it would be simple to then get the list of columns, which you can then apparently weight in Sphinx.
The problem with this is if you need to do any data conversion - you are betting that you'll be able to find all conversions if the type ever changes. When using strongly typed columns, this is somewhere between trivial (the likelihood is that you end up with a uniquely named column) and unnecessary (views usually take their datatype definitions from the fields in the query); with your architecture, you'll likely end up looking through too many false positives.
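A sketch of such a view against the current structure (the attribute strings 'title', 'description' and 'general_location' are assumed to match whatever is actually stored in the Attribute column):

-- One row per course with selected attributes pivoted into columns
CREATE VIEW course_search AS
SELECT c.Course_ID,
       t.Value AS title,
       d.Value AS description,
       l.Value AS general_location
FROM courses c
LEFT JOIN course_attributes t ON t.Course_ID = c.Course_ID AND t.Attribute = 'title'
LEFT JOIN course_attributes d ON d.Course_ID = c.Course_ID AND d.Attribute = 'description'
LEFT JOIN course_attributes l ON l.Course_ID = c.Course_ID AND l.Attribute = 'general_location';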
Database efficiency:
You're right, unfilled columns are wasted space. Usually, when something is optional(ish), that means you may need an additional table. Which is why I suggested splitting off location into its own table: this prevents events which don't need a location (... what?) from 'wasting' the space, but then forces any event that defines a location to specify all required information. There's an additional benefit to splitting it off this way: if multiple events all use the same location (... not at the same time, we hope), a cross-reference table will save you a lot of space. Way more than your attributes table ever could (you're still having to store the complete location per event, after all). If you still have a lot of 'optional' attributes, I hear that NoSQL is made for these kinds of things (but I haven't really looked into it). Other than that, the cost of an additional table is trivial; the cost of the data inside may not be, but the space required is weighed against the perceived value of the data stored. Remember that disk space is relatively cheap - it's developer/maintainer time that is expensive.
Side note for addresses:
You are probably going to want to create an address table. This would be completely divorced from the event information, and would include (among other things) the precomputed latitude/longitude (in the recommended datatype - I don't know what it is, but it's for sure not a comma-separated string). You would then have an event_address table that would be the cross-reference between the events and where they take place - if there is additional information (such as room), that should be kept in a location table that is referenced (instead of referencing address directly). Once a lat/long value is computed, you should never need to change it.
Thoughts on later updates for lat/long:
While specifying the lat/long values yourself is better, you're going to want to make them a required part of the address table (or part of/in addition to a purely lat/long only table). Frankly, multi-value columns (delimited lists) of any sort are just begging for trouble - you keep having to parse them every time you search on them (among other related issues). And the moment you make them separate rows, one of the pair will eventually get dropped - Murphy himself will personally intervene, if necessary. Additionally, updating them at different times from the addresses will result in an address having a lat/long pair that does not match; your best bet is to compute this at insertion time (there are a number of webservices to find this information for you).
Multi-domain tables:
With a multi-domain table, you're basically betting that the domain key (attribute) will never become out-of-sync with the value (err, value). I don't care how good you are; somewhere, somehow, it's going to happen: at my company, we had one of these in our legacy application (it stored FK links and which files the FKs referred to, along with an attribute). At one point an application was installed in production which promptly began storing the correct file links, but the FK links to a different file, for a given class of attribute. Thankfully, there were audit records in another file which allowed this to be reversed (... as near as they were able to tell).
In summary:
Revisit your required/optional data. Don't be afraid to create additional tables, each for a single entity, with every column for a single domain; you will also need relationship tables. You may also wish to place your audit data (last_updated_time) in a set of separate tables (single-domain tables will help immensely in this regard).
In the sphinx config you define your index and the SQL queries that populate it. You can define basic attributes, see Sphinx Attributes
Sphinx also supports geo searches on lat/long, but they need to be expressed in radians, definitely not as text columns like you have. I agree with X-Zero that storing lat/lng values as strings is a bad idea.
OK, I know the technical answer is NEVER.
BUT, there are times when it seems to make things SO much easier with less code and seemingly few downsides, so please hear me out.
I need to build a Table called Restrictions to keep track of what type of users people want to be contacted by and that will contain the following 3 columns (for the sake of simplicity):
minAge
lookingFor
drugs
lookingFor and drugs can contain multiple values.
Database theory tells me I should use a join table to keep track of the multiple values a user might have selected for either of those columns.
But it seems that using comma-separated values makes things so much easier to implement and execute. Here's an example:
Let's say User 1 has the following Restrictions:
minAge => 18
lookingFor => 'Hang Out','Friendship'
drugs => 'Marijuana','Acid'
Now let's say User 2 wants to contact User 1. Well, first we need to see if he fits User 1's Restrictions, but that's easy enough EVEN WITH the comma-separated columns, as such:
First I'd get the Target's (User 1) Restrictions:
SELECT * FROM Restrictions WHERE UserID = 1
Now I just put those into respective variables as-is into PHP:
$targetMinAge = $row['minAge'];
$targetLookingFor = $row['lookingFor'];
$targetDrugs = $row['drugs'];
Now we just check if the SENDER (User 2) fits that simple Criteria:
SELECT COUNT(*)
FROM Users
WHERE
Users.UserID = 2 AND
Users.minAge >= $targetMinAge AND
Users.lookingFor IN ($targetLookingFor) AND
Users.drugs IN ($targetDrugs)
Finally, if COUNT == 1, User 2 can contact User 1, else they cannot.
How simple was THAT? It just seems really easy and straightforward, so what is the REAL problem with doing it this way as long as I sanitize all inputs to the DB every time a user updates their contact restrictions? Being able to use MySQL's IN function and already storing the multiple values in a format it will understand (e.g. comma-separated values) seems to make things so much easier than having to create join tables for every multiple-choice column. And I gave a simplified example, but what if there are 10 multiple choice columns? Then things start getting messy with so many join tables, whereas the CSV method stays simple.
So, in this case, is it really THAT bad if I use comma-separated values?
****ducks****
You already know the answer.
First off, your PHP code isn't even close to working, because it only works if user 2 has only a single value in lookingFor or drugs. If either of these columns contains multiple comma-separated values, then IN won't work, even if those values are in the exact same order as User 1's values. What do you expect IN to do if the right-hand side has one or more commas?
Therefore, it's not "easy" to do what you want in PHP. It's actually quite a pain and would involve splitting user 2's fields into single values, writing dynamic SQL with many ORs to do the comparison, and then doing an extremely inefficient query to get the results.
Furthermore, the fact that you even need to write PHP code to answer such a relatively simple question about the intersection of two sets means that your design is badly flawed. This is exactly the kind of problem (relational algebra) that SQL exists to solve. A correct design allows you to solve the problem in the database and then simply implement a presentation layer on top in PHP or some other technology.
Do it correctly and you'll have a much easier time.
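For illustration, a sketch of "doing it correctly" (table and column names are just one possible layout): lookingFor and drugs move out of Restrictions into their own rows, Restrictions keeps only UserID and minAge, the sender's own multi-valued selections live in matching user_looking_for and user_drugs tables (not shown), and the check becomes a single query with no string handling. The minAge comparison is kept exactly as in the original.

-- Hypothetical layout: one row per allowed value on the restriction side;
-- user_looking_for and user_drugs follow the same pattern for each user's own values
CREATE TABLE restriction_looking_for (
    UserID     INT NOT NULL,
    lookingFor VARCHAR(30) NOT NULL,
    PRIMARY KEY (UserID, lookingFor)
);
CREATE TABLE restriction_drugs (
    UserID INT NOT NULL,
    drug   VARCHAR(30) NOT NULL,
    PRIMARY KEY (UserID, drug)
);

-- Can user 2 contact user 1?  Returns 1 if user 2 meets the minimum age and at
-- least one of his lookingFor values and one of his drugs values are allowed by user 1.
SELECT COUNT(*) AS can_contact
FROM Users u
JOIN Restrictions r ON r.UserID = 1      -- user 1's remaining restriction: minAge
WHERE u.UserID = 2
  AND u.minAge >= r.minAge
  AND EXISTS (SELECT 1
              FROM restriction_looking_for t
              JOIN user_looking_for s ON s.lookingFor = t.lookingFor
              WHERE t.UserID = r.UserID AND s.UserID = u.UserID)
  AND EXISTS (SELECT 1
              FROM restriction_drugs t
              JOIN user_drugs s ON s.drug = t.drug
              WHERE t.UserID = r.UserID AND s.UserID = u.UserID);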
Suppose User 1 is looking for 'Hang Out','Friendship' and User 2 is looking for 'Friendship','Hang Out'
Your code would not match them up, because 'Friendship','Hang Out' is not in ('Hang Out','Friendship')
That's the real problem here.