I have a database with many columns, all with year names. Every row holds numbers of type integer in those columns. I want them all to have thousands separators (a dollar sign would be nice, but I can add that easily with PHP).
What I have now is the following:
SELECT *, format(`2015`, 0) AS `15` FROM `FullList`
and that gives me the separators, like 1,000,000. The problem is that I would have to do that for every column, which seems wrong.
In my PHP I simply use this:
<div class=\"example\">$".$row['15']."</div>
giving me $1,000,000.
I'm hoping to find a good way of doing this in SQL or maybe even PHP so that I don't need to use format on every column.
Databases should not contain formatting, because you are storing information in a standard form which can be read by any application regardless of language. You say the columns are integers, which is fine for whole amounts; if you ever need cents, a fixed-point type such as decimal(11,2) is a safer choice than float for money.
Though it may seem tedious to add a dollar sign in front of every value, displaying a value should be the job of the templating language (in this case PHP) and not the database.
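You might want to use PHP's money_format() instead (note that it is deprecated as of PHP 7.4 and removed in PHP 8.0; number_format() covers plain thousands separators):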
http://php.net/money_format
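To avoid calling a formatter once per column, the formatting can be applied to every numeric value in a fetched row in one pass. This is only a minimal sketch: the helper name formatRow() and the sample column names are made up for illustration, and it assumes each row arrives as an associative array.

```php
<?php
// Hypothetical helper: add thousands separators (and a currency prefix)
// to every numeric value in a row, leaving other values untouched.
function formatRow(array $row, string $prefix = '$'): array
{
    $out = [];
    foreach ($row as $column => $value) {
        $out[$column] = is_numeric($value)
            ? $prefix . number_format((float) $value, 0, '.', ',')
            : $value;
    }
    return $out;
}

// Example row as it might come back from SELECT * FROM `FullList`.
$row = formatRow(['2014' => '1000000', '2015' => 2500, 'name' => 'Acme']);
```

The query can then stay a plain SELECT *, and the template only echoes the already formatted values.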
My goal and assumption is that a built-in function would generally be faster than a loop that compares two names in a user defined function, as I describe below.
I'm creating an AtoZ set of HTML 'jump-to' buttons using a function that I wrote with server-side PHP code that labels the buttons based on the actual names in a list. To be more helpful, but still keep the list as compact as possible, I show abbreviated partial name ranges like: A-Big, Bil-D, ... where the list starts with a name beginning with A, say Apple, and the first break is between Bigly and Billows. The names in the list are made up of multibyte characters, so any suggested functions need to support multibyte character strings.
The code that I'm using already determines the breaks in the name list based on the number of names on each 'page' of the list, and then loops through the characters in the names before and after each break, comparing each of the characters in the names to find the left-most character position where the difference occurs.
Is there a built-in PHP string compare function that would do this without having to loop through the characters in the two names at each break to determine what to put in the button labels?
If there isn't such a function, is there one that tells me the number of matching characters starting from the beginning of each name?
I looked through the PHP String functions and most of the compare functions only returned less-than, equals, greater-than indicators, not positions/lengths of the difference. The strspn() function comes closer, except that the character order in the two names is important, so using the 2nd name as the mask doesn't seem like it really works.
My looping code works for now, and I'm really looking to make the PHP engine do more of the work for me by 'internalizing' the comparisons, so at this point I'm not posting my code.
Thank you.
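For what it's worth, one way to push most of the comparison into built-ins is to XOR the two strings and count the leading NUL bytes with strspn(). The sketch below assumes both names are valid UTF-8 and that the mbstring extension is available; the function names are made up for illustration.

```php
<?php
// Number of characters (not bytes) shared at the start of two UTF-8 strings.
function commonPrefixLength(string $a, string $b): int
{
    // XOR gives "\0" wherever the bytes are equal; strspn counts the leading run.
    $bytes = strspn($a ^ $b, "\0");
    // If the first differing byte sits inside a multibyte character,
    // step back to the start of that character.
    while ($bytes > 0 && $bytes < strlen($a) && (ord($a[$bytes]) & 0xC0) === 0x80) {
        $bytes--;
    }
    return mb_strlen(substr($a, 0, $bytes), 'UTF-8');
}

// Shortest prefix of $name that distinguishes it from $neighbour.
function rangeLabelPart(string $name, string $neighbour): string
{
    return mb_substr($name, 0, commonPrefixLength($name, $neighbour) + 1, 'UTF-8');
}
```

With the names from the example, rangeLabelPart('Bigly', 'Billows') gives 'Big' and rangeLabelPart('Billows', 'Bigly') gives 'Bil'. The short while loop only ever steps back over the continuation bytes of a single character.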
Background: I have a large database of people, and I want to look for duplicates, which is more difficult than it seems. I already do a lot of comparison between the names (which are often spelled in different ways), dates of birth and so on. When two profiles appear to be similar enough to the matching algorithm, they are presented to an operator who will judge.
Most profiles have more than one phone number attached, so I would like to use them to find duplicates. They can be entered as "001-555-123456", but also as "555-123456", "555-123456-7-8", "555-123456 call me in the evening" or anything you might imagine.
My first idea is to strip all non-numeric characters and get the "longest common substring".
There are a lot of algorithms around to find the longest common substring inside a set.
But whenever I compare two profiles A and B, I have two sets of phone numbers. I would like to find the longest common substring between a string in the set A and a string in a set B.
Can you please help me in finding such an algorithm?
I normally program in PHP, a SQL-only solution would be even better, but any other language would go.
As Voitcus said before, you have to clean your data first before you start comparing or looking for duplicates. A phone number should follow a strict pattern. For the numbers which do not match the pattern try to adjust them to it. Then you have the ability to look for duplicates.
Moreover, you should clean the data before persisting it, maybe into a separate column. You then don't have to worry about it when looking for duplicates, which avoids performance peaks.
Algorithms like levenshtein() or similar_text() in PHP don't fit this use case very well.
In my opinion the best way is to strip all non-numeric characters from the texts containing phone numbers. You can do this in many ways, some regular expression would be the best, but see below.
Then, if possible, you can work out the country calling code from the user's country, if their location is known. If it is not, assume a default and prepend it to the string. The same probably goes for city (area) codes. You can also look at the place the person lives, their zip code, etc.
At the end of this you should have uniform phone numbers which can be easily compared.
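A rough sketch of that clean-up step follows. The rules in it are assumptions for illustration only: a leading "00" is treated as an international prefix, and anything else gets a default country code prepended. Real numbering plans need more care than this, and normalizePhone() is a made-up name.

```php
<?php
// Hypothetical normaliser: digits only, with a country code in front.
function normalizePhone(string $raw, string $defaultCountryCode = '1'): string
{
    // Drop everything that is not a digit.
    $digits = preg_replace('/\D+/', '', $raw);
    if ($digits === '') {
        return '';
    }
    // Assumption: "00" introduces an international number.
    if (strncmp($digits, '00', 2) === 0) {
        return substr($digits, 2);
    }
    // Assumption: everything else is a national number.
    return $defaultCountryCode . $digits;
}
```

Under these assumptions "001-555-123456" and "555-123456 call me in the evening" both come out as 1555123456, so they can be compared with a plain equality test.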
The other way is to compare strings with the country (and city) code removed.
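One cheap approximation of "country code removed" is to compare only the last few digits of each number. The sketch below assumes 9 trailing digits are enough to identify a subscriber, which is a guess that depends on your data; sharePhone() is a hypothetical name.

```php
<?php
// True if any number in $setA shares its last $tail digits with a number in $setB.
function sharePhone(array $setA, array $setB, int $tail = 9): bool
{
    $keys = function (array $set) use ($tail): array {
        $out = [];
        foreach ($set as $raw) {
            $digits = preg_replace('/\D+/', '', $raw);
            // Skip entries too short to give a meaningful key.
            if (strlen($digits) >= $tail) {
                $out[] = substr($digits, -$tail);
            }
        }
        return $out;
    };
    return count(array_intersect($keys($setA), $keys($setB))) > 0;
}
```

Trailing extras such as "555-123456-7-8" defeat this check, so it is only useful once the obvious noise has been cleaned away.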
About searching for "the longest common substring": once the numbers are filtered this way, matching numbers are simply equal. You might still need it, though, e.g. if someone typed "call me after 6 p.m.", because the stray digit survives the filtering. If you're sure that the phone number is always at the beginning, and nobody typed something like 555-SUPERMAN (which translates to 555-78737626), there is also the possibility of removing everything from the first alphabetic character onwards.
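If you do still want the longest common substring between a number in set A and a number in set B, a plain dynamic-programming version run over every pair is probably enough for a handful of numbers per profile. This is a sketch with made-up function names, not a tuned implementation.

```php
<?php
// Classic O(n*m) longest common substring of two byte strings.
function longestCommonSubstring(string $a, string $b): string
{
    $best = '';
    $prev = array_fill(0, strlen($b) + 1, 0);
    for ($i = 1; $i <= strlen($a); $i++) {
        $curr = array_fill(0, strlen($b) + 1, 0);
        for ($j = 1; $j <= strlen($b); $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Extend the run that ended at the previous pair of positions.
                $curr[$j] = $prev[$j - 1] + 1;
                if ($curr[$j] > strlen($best)) {
                    $best = substr($a, $i - $curr[$j], $curr[$j]);
                }
            }
        }
        $prev = $curr;
    }
    return $best;
}

// Longest run of digits shared by any number in $setA and any number in $setB.
function bestSharedRun(array $setA, array $setB): string
{
    $best = '';
    foreach ($setA as $a) {
        foreach ($setB as $b) {
            $run = longestCommonSubstring(
                preg_replace('/\D+/', '', $a),
                preg_replace('/\D+/', '', $b)
            );
            if (strlen($run) > strlen($best)) {
                $best = $run;
            }
        }
    }
    return $best;
}
```

The length of the returned run can then feed into the matching score, for example by treating anything shorter than 7 digits as no match (that threshold is a guess).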
It is also possible to filter such data in the SQL statement. Consider something like SELECT ..., [your trimming function(phone_number)] AS trimmed_phone WHERE (trimmed_phone is not numerical characters only) GROUP BY trimmed_phone. If the trimming function removes only whitespace and special dividers such as -, +, . (commonly in use in Germany) and perhaps a few others, this query leaves you with all the phone numbers that have been trimmed but still contain non-numeric characters. Take a look at the results, which will probably be mostly digits and letters. How many of them are there? Maybe they have something in common? Maybe there are some typical phrases you can filter out too?
If such a query does not return very many rows, maybe it's easier just to fix them by hand?
I'm not sure if this is a question specific to Cassandra or whether it also belongs under PHP, so I'm sorry for tagging PHP.
Basically, I'm ordering the columns of some long rows by their column names, which look like this:
2012-01-01_aa_99999 | 2012-01-01_aaa | 2012-01-12_aaaaa
This is working the way I want it to, but I don't understand how it actually orders those strings.
What is not clear to me is that the first string, 2012-01-01_aa_99999, seems to be much bigger than the other two, and I'm concerned that at some point it might ignore the first part of the string, which is a date, and put some strings where they don't belong.
In my case those strings consist of quite a few parts, so I'm really concerned about this. Basically, I need an explanation of how this ordering happens internally.
i don't understand how does it actually order those string.
The strings you provided appear to be lexicographically ordered.
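To see why the longer string comes first, here is a small sketch that sorts the three example names with a plain byte-wise comparison (strcmp). It only imitates lexicographic ordering in PHP and says nothing about Cassandra's internals; sortColumnNames() is a made-up helper.

```php
<?php
// Sort names byte by byte, left to right; length only matters when
// one name is a prefix of the other.
function sortColumnNames(array $names): array
{
    usort($names, 'strcmp');
    return $names;
}

$sorted = sortColumnNames([
    '2012-01-12_aaaaa',
    '2012-01-01_aaa',
    '2012-01-01_aa_99999',
]);
// The first two names agree up to "2012-01-01_aa"; the next bytes are
// "_" (0x5F) and "a" (0x61), so the name containing "_" sorts first.
```

The date prefix is compared before anything after it, so a later date can never sort ahead of an earlier one as long as the dates are zero-padded.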
I had the same question, as I want to construct a composite primary key index with well-understood sorting abilities. It turns out Cassandra appears to compare UTF-8 strings using a byte-by-byte binary comparison. For valid UTF-8 this amounts to ordering by Unicode code point, which is deterministic but is not a locale-aware collation, so from a linguistic perspective it is a poor sort function. If you had mixed ASCII and Kanji characters in your string, for example, the order would not match what a human reader expects. However, as long as this sort order is known, one can design usage patterns around it.
This could be easily fixed, of course, and it would be nearly a single-line change of code to patch in a "real" sort function. This would require a bit extra CPU time, of course.
I've got a website which lists sports scores. It currently works like this:
Firstname Lastname 1-0 Firstname Lastname
It explodes this based on spaces, then explodes the third one (containing the scores) based on the - .
The problem with this is that it does not support names with more than 2 words. If I explode using - first, it would not support names with - in there. The results are added in a textarea, because I have many thousands that need to be added, so I don't want to make multiple fields to input data into, as I can currently add matches quickly listing one result per line. Does anyone have advice on how to make the system both multi-word, and special character-insensitive? Is there maybe a way to split when it encounters a number, then select the first chunk as the first name, the last as that players score, and the rest as the last name?
I don't know if there's any way to teach a simple parsing command, or even a regular expression, to do what you want. Some cases will always be ambiguous. For example, if you have the names "Mary Ann Steiner" and "Constantin Van Dyke", the patterns are exactly the same, but one needs to be split (2/1) and the other needs to be split (1/2).
You could possibly find a library that knows how to make educated guesses based on a huge dictionary of known names, but failing that...
I think in this case you need the human brain inputting the data to make some of the decisions, and indicate them during data entry. In my experience using multiple fields isn't that slow if you navigate using the tab key instead of mousing around. You could also enter the data using a delimiter of your own, like:
Mary Ann,Steiner,2-3
Constantin,Van Dyke,4-2
Then you'd run something that explodes those lines based on "," and enters the elements into your db.
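A minimal sketch of that explode step, assuming exactly three comma-separated fields per line with the score last. The function name and array keys are made up for illustration.

```php
<?php
// Parse one "First,Last,A-B" line; returns null if the line does not fit.
function parseDelimitedLine(string $line): ?array
{
    $parts = array_map('trim', explode(',', $line));
    if (count($parts) !== 3 || !preg_match('/^(\d+)-(\d+)$/', $parts[2], $m)) {
        return null;
    }
    return [
        'first'   => $parts[0],
        'last'    => $parts[1],
        'score_a' => (int) $m[1],
        'score_b' => (int) $m[2],
    ];
}
```

Because the person typing chooses where the comma goes, "Mary Ann,Steiner" and "Constantin,Van Dyke" both split correctly without any guessing.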
If you're copy/pasting or scraping the data from an external site, another option would be to just explode every line using the method you're currently using. This should work for most records, and when it doesn't work, it will be obvious -- the resulting record will have too many elements. You can have your script flag just those records for human intervention.
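The flagging idea could look roughly like this, assuming the "Firstname Lastname 1-0 Firstname Lastname" layout from the question and one result per line of the textarea. parseMatches() and its array keys are hypothetical.

```php
<?php
// Split every line on whitespace; anything that is not exactly
// "first last score first last" is set aside for a human to check.
function parseMatches(string $text): array
{
    $parsed = [];
    $flagged = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        $parts = preg_split('/\s+/', $line);
        if (count($parts) === 5 && preg_match('/^(\d+)-(\d+)$/', $parts[2], $m)) {
            $parsed[] = [
                'player_a' => [$parts[0], $parts[1]],
                'score_a'  => (int) $m[1],
                'score_b'  => (int) $m[2],
                'player_b' => [$parts[3], $parts[4]],
            ];
        } else {
            $flagged[] = $line;
        }
    }
    return ['parsed' => $parsed, 'flagged' => $flagged];
}
```

Lines such as "Mary Ann Steiner 2-3 Constantin Van Dyke" end up in the flagged list rather than being split wrongly.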
I need to query my database to find records that fall between 2 dates (this is simple enough); however, I need to refine the search so that it only finds records where the email matches a certain pattern. Basically, I need to delete any row that falls between 2 dates and has an email of the format
x.xxxxxXXXXX#xxxxxxxx.xxx
Basically, I need to look for email addresses that start with a letter followed by a full stop and have 5 numbers before the # sign. Is this possible with MySQL, and if so, how? If not, how could I search for these email addresses with PHP?
You need to use regular expressions. MySQL 5.1 supports these: documentation page. This can also be done in PHP using preg_match.
Your regular expression could look like: [a-zA-Z]\.[a-zA-Z]+[0-9]{5}#.+
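On the PHP side, a check along these lines might do. It is a sketch that keeps the # from the question as the separator and adds ^ and $ anchors so the whole address has to fit the pattern; isTargetAddress() is a made-up name.

```php
<?php
// True for addresses like "j.smith12345#example.com":
// one letter, a full stop, letters, exactly five digits, then "#".
function isTargetAddress(string $email): bool
{
    return preg_match('/^[a-zA-Z]\.[a-zA-Z]+[0-9]{5}#.+$/', $email) === 1;
}
```

You could run this over the rows selected by the date range and delete only the ones for which it returns true.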
You could also use LIKE, if you find it useful. Example:
LIKE '_.__________#______.___'
However, LIKE cannot require that particular positions are digits.
In that case you have to use a regex.
docs
The regex should be like this (changed from the example above):
[a-zA-Z]\.[a-zA-Z]+[0-9]{5}#([_a-z0-9\-]+)\.[a-zA-Z]+