Detect multiple zeroes in a string - php

I have a list of numbers coming in mostly looking like this: 1234567912345.
But some of them have a big number of zeroes both before and after the usual number.
Making them look like this: "000000001765032019308000000".
For the moment I will call them special numbers.
My initial try was using strpos to check if it contains "00000" which should be enough to confirm that this is indeed one of the special numbers but that doesn't work.
It's also not possible to just check the length of the number.
So my question is, how do I detect if a number is one of the special numbers?

You can use preg_match to determine whether there are 6-8 0's at the beginning and end of the number:
$numbers = array('1234567912345', '000000001765032019308000000',
    '000000000000445000000', '0000004000000');
foreach ($numbers as $num) {
    echo "$num is " . (preg_match('/^0{6,8}[1-9](\d*[1-9])?0{6,8}$/', $num) ? '' : 'not ') . "special\n";
}
Output:
1234567912345 is not special
000000001765032019308000000 is special
000000000000445000000 is not special
0000004000000 is special
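If the check is needed in more than one place, the same pattern can be wrapped in a small helper. This is only a sketch: the name is_special_number is made up for illustration, and it assumes the values arrive as strings (an integer or float would already have lost its leading zeros).

```php
<?php
// Hypothetical helper; the pattern is the one used in the answer above.
function is_special_number(string $num): bool
{
    // 6-8 leading zeros, a core that starts and ends with a non-zero digit,
    // then 6-8 trailing zeros.
    return preg_match('/^0{6,8}[1-9](\d*[1-9])?0{6,8}$/', $num) === 1;
}

var_dump(is_special_number('000000001765032019308000000')); // bool(true)
var_dump(is_special_number('1234567912345'));               // bool(false)
```

Comparing the result with === 1 keeps a regex error (preg_match returning false) from being mistaken for "not special" silently elsewhere in the code; here it simply yields false.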

Related

PHP detect variable length string contains any character other than 1

Using PHP I sometimes have strings that look like the following:
111
110
011
1111
0110012
What is the most efficient way (preferably without regex) to determine if a string contains any character other than the character 1?
Here's a one-line code solution that can be put into a conditional etc.:
strlen(str_replace('1','',$mystring))==0
It strips out the "1"s and sees if there's anything left.
User Don't Panic commented that str_replace could be replaced by trim:
strlen(trim($mystring, '1'))==0
which removes leading and trailing 1s and sees if there's anything left. This would work for the particular case in OP's request but the first option will also tell you how many non-"1" characters you have (if that information matters). Depending on implementation, trim might run slightly faster because PHP doesn't have to check any characters between the first and last non-"1" characters.
You could also use a string like a character array and iterate through from the beginning until you find a character which is not =='1' (in which case, return true) or reach the end of the array (in which case, return false).
Finally, though OP here said "preferably without regex," others open to regexes might use one:
preg_match("/[^1]/", $mystring)==1
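For comparison, the three checks above can be put side by side. The sketch below wraps each one in a function that returns true when the string contains something other than "1"; the function names are invented for this example. Note that all three treat an empty string as "only 1s".

```php
<?php
// Hypothetical wrappers around the three checks above; each returns true
// when $s contains at least one character other than "1".
function has_non_one_replace(string $s): bool
{
    // Remove every "1" and see whether anything is left.
    return strlen(str_replace('1', '', $s)) > 0;
}

function has_non_one_trim(string $s): bool
{
    // Remove leading and trailing "1"s and see whether anything is left.
    return strlen(trim($s, '1')) > 0;
}

function has_non_one_regex(string $s): bool
{
    // Match any single character that is not "1".
    return preg_match('/[^1]/', $s) === 1;
}

foreach (['111', '110', '011', '1111', '0110012'] as $s) {
    printf("%s: %s\n", $s, has_non_one_regex($s) ? 'other characters' : 'only 1s');
}
```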
Another way to do it:
if (base_convert($string, 2, 2) === $string) {
    // $string has only 0 and 1 characters.
}
since your $string is basically a binary number, you can check it with base_convert.
How it works:
var_dump(base_convert('110', 2, 2)); // 110
var_dump(base_convert('11503', 2, 2)); // 110
var_dump(base_convert('9111111111111111111110009', 2, 2)); // 11111111111111111111000
If the value returned by base_convert differs from the input, the string contains characters other than 0 and 1. Be aware that base_convert also drops leading zeros, so an input such as '011' comes back as '11' and fails this check.
If you want to check whether the string contains only 1 characters:
if (array_sum(str_split($string)) === strlen($string)) {
    // $string has only 1 characters.
}
You split the string into single digits with str_split and sum them with array_sum. If the result differs from the length of the string, the string contains a digit other than 1. The reverse does not hold in general: a string such as '02' also sums to its own length, so this check is only reliable when the input consists of 0s and 1s.
Another option is to treat the string as an array of characters and look for one that is not 1. As soon as one is found, break out of the for loop:
for ($i = 0; $i < strlen($mystring); $i++) {
    if ($mystring[$i] != '1') {
        echo 'FOUND!';
        break;
    }
}

PHP money_format padded number grouping

Basically, I'm trying to output a bunch of numbers in a nicely aligned column formatted to look like so:
$##,###.##
$##,###.##
With the following code
setlocale(LC_MONETARY, 'en_US.utf8');
money_format('%=0(#5.2n',$sum)
it works EXCEPT when $sum is less than 1000. For example, with
$sum=1550.00;
(as expected) the above code outputs
$01,550.00
However, with
$sum=167.00;
the above snippet outputs
$000167.00
which, obviously, is not what I need. According to the documentation
Grouping separators will not be applied to fill characters, even if the fill character is a digit.
So, this is the expected behaviour of the function. Doesn't seem to make sense, but that's how it works.
Any suggestions on how to get proper formatting for padded numbers would be appreciated.
Thanks!
You could build your own formatting function. localeconv() will give you data like separators and where the currency symbol should sit. This is one I used, which fit my needs:
function my_number_format($amount) {
    // Get separators
    $l = localeconv();
    $str = number_format($amount, 2, $l['decimal_point'], $l['thousands_sep']);
    // manual padding: prepend zeros (and separators) until there are 10 digits
    while (strlen(preg_replace('/[^0-9]/', '', $str)) < 10) {
        $init = preg_replace('/[^0-9].*/', '', $str); // initial all-digit sequence
        $str = '0' . (strlen($init) == 3 ? $l['thousands_sep'] : '') . $str;
    }
    return $str;
}
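For the exact $##,###.## layout from the question, another way to get there is to pad the integer part first and apply the grouping afterwards. The following is a minimal sketch under some assumptions: the function name is invented, the separators and the "$" sign are hard-coded in en_US style rather than taken from the locale, and the amount is assumed to be non-negative.

```php
<?php
// Hypothetical helper: zero-pad the integer part to $intDigits digits,
// then insert a comma every three digits counting from the right.
function format_padded_money(float $amount, int $intDigits = 5): string
{
    // "167.00" (no grouping yet, always two decimals)
    $plain = number_format($amount, 2, '.', '');
    [$int, $frac] = explode('.', $plain);

    // "00167"
    $int = str_pad($int, $intDigits, '0', STR_PAD_LEFT);

    // Group in threes from the right: reverse, split, join, reverse back.
    $grouped = strrev(implode(',', str_split(strrev($int), 3)));

    return '$' . $grouped . '.' . $frac;
}

echo format_padded_money(1550.00), "\n"; // $01,550.00
echo format_padded_money(167.00), "\n";  // $00,167.00
```

Because the padding happens before the grouping, the fill zeros are grouped like any other digit, which is the behaviour money_format does not provide.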

Using regex to fix phone numbers in a CSV with PHP

My new phone does not recognize a phone number unless its area code matches the incoming call. Since I live in Idaho where an area code is not needed for in-state calls, many of my contacts were saved without an area code. Since I have thousands of contacts stored in my phone, it would not be practical to manually update them. I decided to write the following PHP script to handle the problem. It seems to work well, except that I'm finding duplicate area codes at the beginning of random contacts.
<?php
//the script can take a while to complete
set_time_limit(200);
function validate_area_code($number) {
    //digits are taken one by one out of $number, and inserted into $numString
    $numString = "";
    for ($i = 0; $i < strlen($number); $i++) {
        $curr = substr($number,$i,1);
        //only copy from $number to $numString when the character is numeric
        if (is_numeric($curr)) {
            $numString = $numString . $curr;
        }
    }
    //add area code "208" to the beginning of any phone number of length 7
    if (strlen($numString) == 7) {
        return "208" . $numString;
    //remove country code (none of the contacts are outside the U.S.)
    } else if (strlen($numString) == 11) {
        return preg_replace("/^1/","",$numString);
    } else {
        return $numString;
    }
}
//matches any phone number in the csv
$pattern = "/((1? ?\(?[2-9]\d\d\)? *)? ?\d\d\d-?\d\d\d\d)/";
$csv = file_get_contents("contacts2.CSV");
preg_match_all($pattern,$csv,$matches);
foreach ($matches[0] as $key1 => $value) {
    /*create a pattern that matches the specific phone number by adding slashes before possible special characters*/
    $pattern = preg_replace("/\(|\)|\-/","\\\\$0",$value);
    //create the replacement phone number
    $replacement = validate_area_code($value);
    //add delimiters
    $pattern = "/" . $pattern . "/";
    $csv = preg_replace($pattern,$replacement,$csv);
}
echo $csv;
?>
Is there a better approach to modifying the CSV? Also, is there a way to minimize the number of passes over the CSV? In the script above, preg_replace is called thousands of times on a very large String.
If I understand you correctly, you just need to prepend the area code to any 7-digit phone number anywhere in this file, right? I have no idea what kind of system you're on, but if you have some decent tools, here are a couple options. And of course, the approaches they take can presumably be implemented in PHP; that's just not one of my languages.
So, how about a sed one-liner? Just look for 7-digit phone numbers, bounded by either beginning of line or comma on the left, and comma or end of line on the right.
sed -r 's/(^|,)([0-9]{3}-[0-9]{4})(,|$)/\1208-\2\3/g' contacts.csv
Or if you want to only apply it to certain fields, perl (or awk) would be easier. Suppose it's the second field:
perl -F, -ane '$"=","; $F[1]=~s/^[0-9]{3}-[0-9]{4}$/208-$&/; print "@F";' contacts.csv
The -F, sets the field separator; $" is the separator used when an array is interpolated into a string, so "@F" prints the fields joined by commas (yes, it gets assigned once per loop, oh well); the arrays are zero-indexed, so the second field is $F[1]; there's a run-of-the-mill substitution; and you print the results.
Ah programs... sometimes a 10-min hack is better.
If it were me... I'd import the CSV into Excel, sort it by something - maybe the length of the phone number or something. Make a new col for the fixed phone number. When you have a group of similarly-fouled numbers, make a formula to fix. Same for the next group. Should be pretty quick, no? Then export to .csv again, omitting the bad col.
A little more digging on my own revealed the issues with the regex in my question. The problem is with duplicate contacts in the csv.
Example:
(208) 555-5555, 555-5555
After the first pass becomes:
2085555555, 2085555555
and after the second pass becomes:
2082085555555, 2082085555555
I worked around this by changing the replacement regex to:
//add escapes for special characters
$pattern = preg_replace("/\(|\)|\-|\./","\\\\$0",$value);
//add delimiters, and optional area code
$pattern = "/(\(?[0-9]{3}\)?)? ?" . $pattern . "/";
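On the question of minimizing passes: one possible approach is to let preg_replace_callback do the whole job in a single pass, so that replaced text is never scanned again and an area code cannot be prepended twice. The sketch below is not the code from the question. The function names and the regular expression are assumptions made for illustration, and the "208" area code is taken from the question.

```php
<?php
// Reduce a matched phone number to digits and normalise it to 10 digits.
function normalize_phone(string $raw, string $areaCode = '208'): string
{
    $digits = preg_replace('/\D/', '', $raw);
    if (strlen($digits) === 7) {
        return $areaCode . $digits;      // add the missing area code
    }
    if (strlen($digits) === 11 && $digits[0] === '1') {
        return substr($digits, 1);       // drop the U.S. country code
    }
    return $digits;
}

function fix_phone_numbers(string $csv): string
{
    // Assumed pattern: optional "1", optional area code, then 3 + 4 digits,
    // not embedded in a longer run of digits.
    $pattern = '/(?<!\d)(?:1[ -]?)?(?:\(?[2-9]\d\d\)?[ -]?)?\d{3}-?\d{4}(?!\d)/';

    // Each match is handed to the callback exactly once.
    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            return normalize_phone($m[0]);
        },
        $csv
    );
}

echo fix_phone_numbers("(208) 555-5555, 555-5555"), "\n"; // 2085555555, 2085555555
```

As with the original script, any other field that happens to look like a 7-digit or 10-digit phone number would be rewritten too, so the pattern may need tightening for a real contacts file.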

PHP - smart, error tolerating string comparison

I'm looking for a routine or technique for error-tolerant string comparison.
Let's say, we have test string Čakánka - yes, it contains CE characters.
Now, I want to accept any of following strings as OK:
cakanka
cákanká
ČaKaNKA
CAKANKA
CAAKNKA
CKAANKA
cakakNa
The problem is, that I often switch letters in word, and I want to minimize user's frustration with not being able (i.e. you're in rush) to write one word right.
So, I know how to make ci comparison (just make it lowercase :]), I can delete CE characters, I just can't wrap my head around tolerating few switched characters.
Also, you often put one character not only in wrong place (character=>cahracter), but sometimes shift it by multiple places (character=>carahcter), just because one finger was lazy during writing.
Thank you :]
Not sure (especially about the accents / special characters stuff, which you might have to deal with first), but for characters that are in the wrong place or missing, the levenshtein function, that calculates Levenshtein distance between two strings, might help you (quoting) :
int levenshtein ( string $str1 , string $str2 )
int levenshtein ( string $str1 , string $str2 , int $cost_ins , int $cost_rep , int $cost_del )
The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform str1 into str2
Other possibly useful functions could be soundex, similar_text, or metaphone.
And some of the user notes on the manual pages of those functions, especially the manual page of levenshtein might bring you some useful stuff too ;-)
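As a rough illustration of the levenshtein idea, the sketch below normalises both strings and accepts the input when the edit distance is at most 2. Everything outside the built-in functions is an assumption: the helper names, the tiny hand-written accent map (a real application would need a fuller one) and the threshold. Plain Levenshtein distance counts a swap of two neighbouring letters as two edits, which is why the threshold is 2 rather than 1.

```php
<?php
// Assumed normalisation: map the accented letters we expect to ASCII, then
// lowercase. The map only covers the characters used in this example.
function normalize_word(string $word): string
{
    $map = ['Č' => 'C', 'č' => 'c', 'Á' => 'A', 'á' => 'a'];
    return strtolower(strtr($word, $map));
}

// True when the edit distance between the normalised strings is small enough.
function is_close_match(string $input, string $target, int $maxDistance = 2): bool
{
    return levenshtein(normalize_word($input), normalize_word($target)) <= $maxDistance;
}

$target = 'Čakánka';
foreach (['cakanka', 'cákanká', 'CAAKNKA', 'cakakNa', 'xyz'] as $candidate) {
    printf("%s: %s\n", $candidate, is_close_match($candidate, $target) ? 'accepted' : 'rejected');
}
```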
You could transliterate the words to Latin characters and use a phonetic algorithm like Soundex to get the essence of your word and compare it to the ones you have. In your case that would be C252 for most of your words. Soundex does not absorb every transposition, though: CKAANKA comes out as C520 and cakakNa as C225.
Edit: The problem with comparative functions like levenshtein or similar_text is that you need to call them for each pair of input value and possible matching value. That means if you have a database with 1 million entries, you will need to call these functions 1 million times.
But functions like soundex or metaphone, which calculate some kind of digest, can help to reduce the number of actual comparisons. If you store the soundex or metaphone value for each known word in your database, you can reduce the number of possible matches very quickly. Later, when the set of possible matching values is reduced, you can use the comparative functions to get the best match.
Here’s an example:
// building the index that represents your database
$knownWords = array('Čakánka', 'Cakaka');
$index = array();
foreach ($knownWords as $key => $word) {
    $code = soundex(iconv('utf-8', 'us-ascii//TRANSLIT', $word));
    if (!isset($index[$code])) {
        $index[$code] = array();
    }
    $index[$code][] = $key;
}
// test words
$testWords = array('cakanka', 'cákanká', 'ČaKaNKA', 'CAKANKA', 'CAAKNKA', 'CKAANKA', 'cakakNa');
echo '<ul>';
foreach ($testWords as $word) {
    $code = soundex(iconv('utf-8', 'us-ascii//TRANSLIT', $word));
    if (isset($index[$code])) {
        echo '<li> '.$word.' is similar to: ';
        $matches = array();
        foreach ($index[$code] as $key) {
            similar_text(strtolower($word), strtolower($knownWords[$key]), $percentage);
            $matches[$knownWords[$key]] = $percentage;
        }
        arsort($matches);
        echo '<ul>';
        foreach ($matches as $match => $percentage) {
            echo '<li>'.$match.' ('.$percentage.'%)</li>';
        }
        echo '</ul></li>';
    } else {
        echo '<li>no match found for '.$word.'</li>';
    }
}
echo '</ul>';
Spelling checkers do something like fuzzy string comparison. Perhaps you can adapt an algorithm based on that reference. Or grab the spell checker guessing code from an open source project like Firefox.

How to add currency strings (non-standardized input) together in PHP?

I have a form in which people will be entering dollar values.
Possible inputs:
$999,999,999.99
999,999,999.99
999999999
99,999
$99,999
The user can enter a dollar value however they wish. I want to read the inputs as doubles so I can total them.
I tried just typecasting the strings to doubles but that didn't work. Total just equals 50 when it is output:
$string1 = "$50,000";
$string2 = "$50000";
$string3 = "50,000";
$total = (double)$string1 + (double)$string2 + (double)$string3;
echo $total;
A regex won't convert your string into a number. I would suggest that you use a regex to validate the field (confirm that it fits one of your allowed formats), and then just loop over the string, discarding all non-digit and non-period characters. If you don't care about validation, you could skip the first step. The second step will still strip it down to digits and periods only.
By the way, you cannot safely use floats when calculating currency values. You will lose precision, and very possibly end up with totals that do not exactly match the inputs.
Update: Here are two functions you could use to verify your input and to convert it into a decimal-point representation.
function validateCurrency($string)
{
    // Grouped form ("$99,999.00") or ungrouped form ("$99999.00");
    // the decimal point is escaped so that it only matches a literal ".".
    return preg_match('/^\$?(\d{1,3})(,\d{3})*(\.\d{2})?$/', $string) ||
        preg_match('/^\$?\d+(\.\d{2})?$/', $string);
}
function makeCurrency($string)
{
    $newstring = "";
    $array = str_split($string);
    foreach($array as $char)
    {
        // keep digits and decimal points only
        if (($char >= '0' && $char <= '9') || $char == '.')
        {
            $newstring .= $char;
        }
    }
    return $newstring;
}
The first function will match the bulk of currency formats you can expect "$99", "99,999.00", etc. It will not match ".00" or "99.", nor will it match most European-style numbers (99.999,00). Use this on your original string to verify that it is a valid currency string.
The second function will just strip out everything except digits and decimal points. Note that by itself it may still return invalid strings (e.g. "", "....", and "abc" come out as "", "....", and ""). Use this to eliminate extraneous commas once the string is validated, or possibly use this by itself if you want to skip validation.
You don't ever want to represent monetary values as floats!
For example, take the following (seemingly straightforward) code:
$x = 1.0;
for ($ii=0; $ii < 10; $ii++) {
    $x = $x - .1;
}
var_dump($x);
You might assume that it would produce the value zero, but that is not the case. Since $x is a float, it actually ends up being a tiny bit more than zero (1.38777878078E-16), which isn't a big deal in itself, but it means that comparing the value with another value isn't guaranteed to be correct. For example, $x == 0 would produce false.
http://p2p.wrox.com/topic.asp?TOPIC_ID=3099
goes through it step by step
[edit] typical...the site seems to be down now... :(
Not a one-liner, but if you strip out the commas you can do this (pseudocode):
m/^\$?(\d+)(?:\.(\d\d))?$/
$value = $1 + $2/100;
That allows $9.99 but not $9. or $9.9, and it fails to complain about misplaced thousands separators (bug or feature?).
There is a potential locale issue here, because you are assuming that thousands are separated with ',' and cents with '.', but in Europe it is the opposite (e.g. 1.000,99).
I recommend not to use a float for storing currency values. You can get rounding errors if the sum gets large. (Ok, if it gets very large.)
Better use an integer variable with a large enough range, and store the input in cents, not dollars.
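A minimal sketch of the integer-cents idea follows. The helper name to_cents is hypothetical, and the parsing rules are assumptions: US-style input only, "$" and "," are simply discarded (so misplaced commas are not detected), and the amount must have either no decimals or exactly two.

```php
<?php
// Hypothetical helper: parse a dollar string into an integer number of cents.
// Returns null when the input is not in one of the expected formats.
function to_cents(string $input): ?int
{
    $clean = str_replace(['$', ','], '', trim($input));
    if (!preg_match('/^(\d+)(?:\.(\d{2}))?$/', $clean, $m)) {
        return null;
    }
    // $m[2] is only set when the input had a ".dd" part.
    $cents = isset($m[2]) ? (int) $m[2] : 0;
    return (int) $m[1] * 100 + $cents;
}

$total = to_cents('$50,000') + to_cents('$50000') + to_cents('50,000');
echo $total, "\n";                          // 15000000 (cents)
echo number_format($total / 100, 2), "\n";  // 150,000.00
```

All arithmetic stays in integers; the division by 100 happens only when the total is formatted for display.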
I believe you can accomplish this with printf, which is similar to the C function of the same name. Its parameters can be somewhat esoteric, though. You can also use PHP's number_format function.
Assuming that you are getting real money values, you could simply strip characters that are not digits or the decimal point:
(pseudocode)
newnumber = replace(oldnumber, /[^0-9.]/, //)
Now you can convert using something like
double(newnumber)
However, this will not take care of strings such as "5.6.3" and other such non-money strings. Which raises the question, "Do you need to handle badly formatted strings?"
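In PHP the pseudocode above could look roughly like the sketch below. The function name parse_money is made up, and rejecting anything that is not a plain decimal number after stripping is just one possible answer to the "badly formatted strings" question.

```php
<?php
// Hypothetical helper: strip everything except digits and dots, then accept
// the result only if it still looks like a plain decimal number.
function parse_money(string $input): ?float
{
    $clean = preg_replace('/[^0-9.]/', '', $input);
    if (!preg_match('/^\d+(\.\d+)?$/', $clean)) {
        return null; // e.g. "5.6.3", "" or "...."
    }
    return (float) $clean;
}

var_dump(parse_money('$50,000')); // float(50000)
var_dump(parse_money('5.6.3'));   // NULL
```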
