How to store a data point that is unique but related to the database - PHP

I'm working on a small PHP/Javascript map application that is basically just pulling a bunch of location names and coordinate points from a MySQL database table and adding them to an HTML canvas. It's supposed to represent locations visited by a character in an ongoing collaborative story, so I'd like to be able to also have the map retrieve the character's position on the map at any given time and display this with a different icon.
The most obvious solution -- since the character, like the map locations, has a name and coordinates -- seems to be just including the character as their own row in the locations table as well, and having the map code recognize their unique information and display them differently. But since the character is not, in and of themselves, a location, storing them in the "locations" table seems weird. Creating a whole new table, "character", just for this one row seems like overkill though.
So I guess my more general question is: what is a good way for PHP/MySQL to deal with unique data like this, which is related to existing tables but doesn't quite belong in them? Do I keep this data in a text file and update it with PHP?

There's nothing particularly "weird" about having a table which is intended to have only a single row. Indeed, using a database for some of your data and a text file for other data would probably be a bit more weird.
Having a table also gives you the possibility of tracking character locations over time, if you ever need that information for any reason. (Better to track it and not use it than need it and not have been tracking it.)
If the position of the character is calculated on the spot and doesn't need to be persisted then you can simply add it programmatically to the results from the database and it would be entirely transparent to both the database and the view. But if it does need to be persisted, a table is probably the way to go.
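If you go that route, a minimal sketch in PHP might look like the following; the locations table, its columns, and the $character array are placeholders rather than the asker's actual schema:

```php
<?php
// A minimal sketch of the "add it programmatically" idea. Table, column,
// and variable names here are placeholders, not the asker's real schema.
$pdo = new PDO('mysql:host=localhost;dbname=map;charset=utf8', 'user', 'pass');

$points = $pdo->query('SELECT name, x, y FROM locations')
              ->fetchAll(PDO::FETCH_ASSOC);

// Append the character's current position (computed elsewhere) and flag
// it so the JavaScript map code can draw a different icon.
$points[] = [
    'name' => $character['name'],
    'x'    => $character['x'],
    'y'    => $character['y'],
    'type' => 'character',
];

header('Content-Type: application/json');
echo json_encode($points);
```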

You should have a separate table for the character positions over time. It would have columns such as:
Character id
Location
Date/time stamp
Eventually, you may want more than one character whose location can be shown over time. You may also have non-character entities to track; in that case, you'll want to change the name of the table.
There is a big difference between the locations and the character positions. The locations are static, at least once they are defined. The character positions are time dependent. They are a separate entity, and are best served by having their own table.
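As a hedged sketch, such a table and the query the map would run against it might look like this; every table, column, and index name here is a placeholder:

```php
<?php
// $pdo is an existing PDO connection (assumed).
// A hypothetical character_positions table; names and types are guesses.
$pdo->exec('
    CREATE TABLE character_positions (
        id           INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        character_id INT UNSIGNED NOT NULL,
        location_id  INT UNSIGNED NOT NULL,   -- or raw x/y coordinates
        recorded_at  DATETIME NOT NULL,
        KEY idx_char_time (character_id, recorded_at)
    )
');

// The map would then fetch each character's most recent position.
$stmt = $pdo->query('
    SELECT p.character_id, p.location_id, p.recorded_at
    FROM character_positions p
    JOIN (
        SELECT character_id, MAX(recorded_at) AS latest
        FROM character_positions
        GROUP BY character_id
    ) m ON m.character_id = p.character_id AND m.latest = p.recorded_at
');
$current = $stmt->fetchAll(PDO::FETCH_ASSOC);
```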

Related

Storing text in db: how to choose varchar size (considering formatting), storing formatting separately?

How do you best choose a size for a varchar/text/... column in a (MySQL) database (let's assume the text the user can type into a text area should be at most 500 characters), considering that the user might also use formatting (HTML/BB code/...), which is not visible to the user and should not count toward the 500-character limit?
1) Theoretically, to prevent any error, the varchar size would have to be almost unlimited, e.g. if the user uses 20 links like this (http://[huge number of chars]) or whatever... - or not?
2) Should/could you save the formatting in a separate column, e.g. so as not to give an index (like FULLTEXT) wrong values (words that are contained in the formatting but not in the real text)?
If yes, how do you best do this? Do you remember at which point the formatting was used, save that point and the formatting, and put this information back together when outputting?
(PHP/MySQL, JavaScript, jQuery)
Thank you very much in advance!
A good approach is to account for the number of formatting characters.
If you do not, then to avoid data loss you need to allow much more space for the text column in the database and check the length of the record before saving, or use TEXT.
Keeping the same data twice in one table is not a good solution. It all depends on your project, but it is usually better to filter the formatting out in PHP.
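A minimal sketch of that "filter the formatting in PHP" idea, assuming HTML formatting, the 500-character visible limit from the question, and a hypothetical posts table:

```php
<?php
// $pdo is an existing PDO connection (assumed).
// Validate the visible length against the 500-character limit, but
// store the raw (formatted) input.
$raw = $_POST['text'] ?? '';

// For HTML formatting; BB code would need its own stripping step.
$visible = strip_tags($raw);

if (mb_strlen($visible, 'UTF-8') > 500) {
    die('Text is longer than 500 characters.');
}

// Store the raw text in a column sized generously enough to absorb the
// formatting overhead, e.g. TEXT or a large VARCHAR.
$stmt = $pdo->prepare('INSERT INTO posts (body) VALUES (?)');
$stmt->execute([$raw]);
```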

PHP MySQL Name Score separation system

I've got a website which lists sports scores. It currently works like this:
Firstname Lastname 1-0 Firstname Lastname
It explodes this based on spaces, then explodes the third element (containing the scores) based on the "-".
The problem with this is that it does not support names with more than 2 words. If I explode using - first, it would not support names with - in there. The results are added in a textarea, because I have many thousands that need to be added, so I don't want to make multiple fields to input data into, as I can currently add matches quickly listing one result per line. Does anyone have advice on how to make the system both multi-word, and special character-insensitive? Is there maybe a way to split when it encounters a number, then select the first chunk as the first name, the last as that players score, and the rest as the last name?
I don't know if there's any way to teach a simple parsing command, or even a regular expression, to do what you want. Some cases will always be ambiguous. For example, if you have the names "Mary Ann Steiner" and "Constantin Van Dyke" the patterns are exactly the same, but one needs to be split (2/1) and the other needs to be split (1/2).
You could possibly find a library that knows how to make educated guesses based on a huge dictionary of known names, but failing that...
I think in this case you need the human brain inputting the data to make some of the decisions, and indicate them during data entry. In my experience using multiple fields isn't that slow if you navigate using the tab key instead of mousing around. You could also enter the data using a delimiter of your own, like:
Mary Ann,Steiner,2-3
Constantin,Van Dyke,4-2
Then you'd run something that explodes those lines based on "," and enters the elements into your db.
If you're copy/pasting or scraping the data from an external site, another option would be to just explode every line using the method you're currently using. This should work for most records, and when it doesn't work, it will be obvious -- the resulting record will have too many elements. You can have your script flag just those records for human intervention.
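A rough sketch combining both suggestions; the comma delimiter, the column order, and the "results" field name are assumptions taken from the examples above:

```php
<?php
// Explode each line on a delimiter of your own and flag any line that
// doesn't produce the expected pieces, so a human can review those.
$lines   = preg_split('/\r\n|\r|\n/', trim($_POST['results'] ?? ''));
$flagged = [];

foreach ($lines as $n => $line) {
    $parts = array_map('trim', explode(',', $line));

    if (count($parts) !== 3 || substr_count($parts[2], '-') !== 1) {
        // Wrong shape: hand this line back for human review.
        $flagged[] = ['line' => $n + 1, 'text' => $line];
        continue;
    }

    list($firstName, $lastName, $score) = $parts;
    list($scoreFor, $scoreAgainst)      = explode('-', $score);

    // ... insert the parsed result into the database here ...
}
```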

MySQL VARCHAR vs TEXT for various tables of user inputs

All,
I'm writing a web app that will receive user-generated text content. Some of those inputs will be a few words, some will be several sentences long. In more than 90% of cases, the inputs will be less than 800 characters. Inputs need to be searchable. Inputs will be in various character sets, including Asian. The site and the db are based on utf8.
I understand roughly the tradeoffs between VARCHAR and TEXT. What I am envisioning is to have both a VARCHAR and a TEXT table, and to store inputs in one or the other depending on their size (this should be doable by the PHP script).
What do you think of having several tables for data based on its size? Also, would it make any sense to create several VARCHAR tables for various size ranges? My guess is that I will get a large number of user inputs clustered around a few key sizes.
Thanks,
JDelage
Storing values in one column vs another depending on size of input is going to add a heck of a lot more complexity to the application than it'll be worth.
As for VARCHAR vs TEXT in MySQL, here's a good discussion about that, MySQL: Large VARCHAR vs TEXT.
The "tricky" part is doing a full-text search on this field which requires the use of MyISAM storage engine as it's the only one that supports full-text indexes. Also of note is that sometimes at the cost of complicating the system architecture, it might be worthwhile to use something like Apache Solr as it perform full-text search much more efficiently. A lot of people have most of the data in their MySQL database and use something like Solr just for full-text indexing that text column and later doing fancy searches with that index.
Re: Unicode. I've used Solr for full-text indexing of text with Unicode characters just fine.
Comments are correct. You are only adding 1 byte by using the TEXT datatype over VARCHAR.
Storage Requirements:
VARCHAR: Length of string + 1 byte
TEXT: Length of string + 2 bytes
The way I see it is you have two options:
Hold it in TEXT; it will waste a single additional byte of storage and some additional processing power on search.
Hold it in VARCHAR and create an additional table named A_LOT_OF_TEXT with the structure (int row_id_of_varchar_table, TEXT). If the data is small enough, put it in the varchar; otherwise put a predefined value instead of the data, for example 'THE_DATA_YOU_ARE_LOOKING_FOR_IS_IN_TABLE_NAMED_A_LOT_OF_TEXT' or simply NULL, and put the real data into the A_LOT_OF_TEXT table.
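A hedged sketch of that second option in PHP, with placeholder table and column names following the answer's example:

```php
<?php
// $pdo is an existing PDO connection (assumed); entries is assumed to
// have an AUTO_INCREMENT id and a nullable VARCHAR content column.
$maxVarchar = 800;

if (mb_strlen($input, 'UTF-8') <= $maxVarchar) {
    $stmt = $pdo->prepare('INSERT INTO entries (content) VALUES (?)');
    $stmt->execute([$input]);
} else {
    // Store a marker (here NULL) in the VARCHAR column...
    $pdo->prepare('INSERT INTO entries (content) VALUES (NULL)')->execute();
    $rowId = (int) $pdo->lastInsertId();

    // ...and the real data in the overflow TEXT table.
    $stmt = $pdo->prepare(
        'INSERT INTO A_LOT_OF_TEXT (row_id_of_varchar_table, content) VALUES (?, ?)'
    );
    $stmt->execute([$rowId, $input]);
}
```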

storing/generating barcode string in database (mysql)

I have never worked with barcodes and now I must design a whole app with barcode support. I was wondering what type of barcode I can use, how I can make sure that the barcode string is unique, and how I would store it in MySQL.
I was thinking about generating some barcode strings and printing them on stickers so my clients can use them. I was thinking of doing the generating part in PHP/MySQL and then preparing them for printing (rendering to PDF). Let's say I generate 100 strings and store them in the database, and next time I want to generate another 200 that must be unique.
I don't even know where to begin with the string. What information can I store in a barcode string?
Can I do this: XXX-ZZZZZ-YYYY-autoincrementID?
Where XXX is a country ID, ZZZZZ is a client ID, and YYYY is a barcode string ID. Should I use a surrogate key for my primary key, or should I split those into multiple tables?
Did I mention that all autoincrement IDs should start from 1 for each client? :) I am so confused about all this.
Thanks
First decide on the barcode format you want to use.
Then check if there is a PHP implementation out there (there will be for most - if not all - barcode formats).
A basic example (using PEAR Image_Barcode) can be found at Using barcodes in your web application.
You just store the text in the DB and can generate the corresponding image using the Image_Barcode class (it supports Code 39, Code 128, EAN 13, INT 25, PostNet and UPCA).
I once wrote an app creating EAN 13 barcodes, don't remember which lib I used though (I'll check at home if I can find the source).
We need to separate some concerns.
First is the action of printing any given string as a barcode. The other answers talk about how to do that.
The other action has nothing to do with barcodes and is about database design. Your example suggests the barcode will be a combination of values. However, I get the idea (correct me if I am wrong) that the larger application is not yet clearly spelled out. Therefore it does not matter what kind of "play" table you create for unique codes right now -- create whatever you want. When you know what values must be printed as barcodes, then we are into a database design question.
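For what it's worth, a rough sketch of such a "play" table and a code generator, assuming the XXX-ZZZZZ-YYYY layout from the question; every table, column, and function name here is a placeholder:

```php
<?php
// $pdo is an existing PDO connection (assumed).
$pdo->exec('
    CREATE TABLE IF NOT EXISTS barcodes (
        country_id INT UNSIGNED NOT NULL,
        client_id  INT UNSIGNED NOT NULL,
        seq        INT UNSIGNED NOT NULL,   -- restarts at 1 per client
        code       VARCHAR(32)  NOT NULL,
        PRIMARY KEY (country_id, client_id, seq),
        UNIQUE KEY uniq_code (code)
    )
');

function nextBarcode(PDO $pdo, $countryId, $clientId) {
    // Next per-client sequence number; the composite primary key will
    // reject a duplicate if two requests race, so callers could retry.
    $stmt = $pdo->prepare(
        'SELECT COALESCE(MAX(seq), 0) + 1 FROM barcodes
         WHERE country_id = ? AND client_id = ?'
    );
    $stmt->execute([$countryId, $clientId]);
    $seq = (int) $stmt->fetchColumn();

    $code = sprintf('%03d-%05d-%04d', $countryId, $clientId, $seq);

    $pdo->prepare(
        'INSERT INTO barcodes (country_id, client_id, seq, code)
         VALUES (?, ?, ?, ?)'
    )->execute([$countryId, $clientId, $seq, $code]);

    return $code;
}
```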
A barcode is just a way to print and/or read a string. It involves
special fonts,
some calculation (for check digits)
Your first step should be to identify which barcode format you need to support. Many companies manufacturing barcode printers and readers also provide some help with that.
I found some great help here, including free fonts. It's a French site, but a few things are available in English.

Autodetect Presence of CSV Headers in a File

Short question: How do I automatically detect whether a CSV file has headers in the first row?
Details: I've written a small CSV parsing engine that places the data into an object that I can access as (approximately) an in-memory database. The original code was written to parse third-party CSV with a predictable format, but I'd like to be able to use this code more generally.
I'm trying to figure out a reliable way to automatically detect the presence of CSV headers, so the script can decide whether to use the first row of the CSV file as keys / column names or start parsing data immediately. Since all I need is a boolean test, I could easily specify an argument after inspecting the CSV file myself, but I'd rather not have to (go go automation).
I imagine I'd have to parse the first 3 to ? rows of the CSV file and look for a pattern of some sort to compare against the headers. I'm having nightmares of three particularly bad cases in which:
The headers include numeric data for some reason
The first few rows (or large portions of the CSV) are null
The headers and data look too similar to tell them apart
If I can get a "best guess" and have the parser fail with an error or spit out a warning if it can't decide, that's OK. If this is something that's going to be tremendously expensive in terms of time or computation (and take more time than it's supposed to save me) I'll happily scrap the idea and go back to working on "important things".
I'm working with PHP, but this strikes me as more of an algorithmic / computational question than something that's implementation-specific. If there's a simple algorithm I can use, great. If you can point me to some relevant theory / discussion, that'd be great, too. If there's a giant library that does natural language processing or 300 different kinds of parsing, I'm not interested.
As others have pointed out, you can't do this with 100% reliability. There are cases where getting it 'mostly right' is useful, however - for example, spreadsheet tools with CSV import functionality often try to figure this out on their own. Here are a few heuristics that would tend to indicate the first line isn't a header (a rough sketch in PHP follows the list):
The first row has columns that are not strings or are empty
The first row's columns are not all unique
The first row appears to contain dates or other common data formats (eg, xx-xx-xx)
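A rough sketch of those heuristics, assuming $rows is an array of already-parsed CSV rows (arrays of strings); it only makes a best guess:

```php
<?php
// Returns true if the first row looks like a header, false otherwise.
function looksLikeHeaderRow(array $rows) {
    if (count($rows) < 2) {
        return false; // not enough data to compare against
    }
    $first = $rows[0];

    foreach ($first as $cell) {
        // Empty or purely numeric cells suggest data, not a header.
        if ($cell === '' || is_numeric($cell)) {
            return false;
        }
        // Dates and similar common data formats also suggest data.
        if (preg_match('/^\d{1,4}[-\/]\d{1,2}[-\/]\d{1,4}$/', $cell)) {
            return false;
        }
    }

    // Header labels are normally unique.
    if (count(array_unique($first)) !== count($first)) {
        return false;
    }

    return true;
}
```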
In the most general sense, this is impossible. This is a valid csv file:
Name
Jim
Tom
Bill
Most csv readers will just take hasHeader as an option, and allow you to pass in your own header if you want. Even in the case you think you can detect - character headers over numeric data - you can run into a catastrophic failure. What if your column is a list of BMW series?
M
3
5
7
You will process this incorrectly. Worst of all, you will lose the best car!
In the purely abstract sense, I don't think there is a foolproof algorithmic answer to your question, since it boils down to: "How do I distinguish dataA from dataB if I know nothing about either of them?". There will always be the potential for dataA to be indistinguishable from dataB.

That said, I would start with the simple and only add complexity as needed. For example, if, examining the first five rows, the datatype in rows 2-5 of a given column (or columns) is the same but differs from the datatype in row 1, there's a good chance that a header row is present (increased sample sizes reduce the possibility of error). This would (sorta) solve #1/#3 - perhaps throw an exception if the rows are all populated but the data is indistinguishable, to allow the calling program to decide what to do next.

For #2, simply don't count a row as a row unless and until it pulls non-null data... that would work in all but an empty file (in which case you'd hit EOF). It would never be foolproof, but it might be "close enough".
It really depends on just how "general" you want your tool to be. If the data will always be numeric, you have it easy as long as you assume non-numeric headers (which seems like a pretty fair assumption).
But beyond that, if you don't already know what patterns are present in the data, then you can't really test for them ahead of time.
FWIW, I actually just wrote a script for parsing out some stuff from TSVs, all from the same source. The source's approach to headers/formatting was so scattered that it made sense to just make the script ask me questions from the command line while executing. (Is this a header? Which columns are important?). So no automation, but it lets me fly through the data sets I'm working on, instead of trying to anticipate each funny formatting case. Also, my answers are saved in a file, so I only have to be involved once per file. Not ideal, but efficient.
This article provides some good guidance: http://penndsg.com/blog/detect-headers/
Basically, you do statistical analysis on the columns, based on whether the first row contains a string while the rest of the rows contain numbers, or something like that.
If your CSV has a header like this:
ID, Name, Email, Date
1, john, john@john.com, 12 jan 2020
then doing a filter_var($str, FILTER_VALIDATE_EMAIL) on the header row will fail, since an email address only appears in the data rows. So check the header row for an email address (assuming your CSV has email addresses in it).
Second idea:
http://php.net/manual/en/function.is-numeric.php
Check the header row with is_numeric(); a header row most likely does not contain numeric data, but a data row most likely would.
If you know you have dates in your columns, then checking the header row for a date would also work.
Obviously you need to know what type of data you are expecting. I am "expecting" email addresses.
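A minimal sketch of those two checks, assuming $header is the first parsed row and that data rows contain email addresses or numbers:

```php
<?php
// Returns true if the supposed header row looks more like a data row.
function headerLooksLikeData(array $header) {
    foreach ($header as $cell) {
        // An email address in the "header" strongly suggests a data row.
        if (filter_var($cell, FILTER_VALIDATE_EMAIL) !== false) {
            return true;
        }
        // Numeric cells are also far more likely in data than in headers.
        if (is_numeric($cell)) {
            return true;
        }
    }
    return false;
}
```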
