As designing a new platform we tried to integrate the IBAN numbers. We have to make sure that the IBAN is validated and the IBAN stored to the database is always correct. So what would be a proper way to validate the number?
As the logic was explained in my other question, I've created a function myself. Based on the logic explained in the Wikipedia article find a proper function below. Country specific validation.
Algorithm and character lengths per country at https://en.wikipedia.org/wiki/International_Bank_Account_Number#Validating_the_IBAN.
function checkIBAN($iban)
{
if(strlen($iban) < 5) return false;
$iban = strtolower(str_replace(' ','',$iban));
$Countries = array('al'=>28,'ad'=>24,'at'=>20,'az'=>28,'bh'=>22,'be'=>16,'ba'=>20,'br'=>29,'bg'=>22,'cr'=>21,'hr'=>21,'cy'=>28,'cz'=>24,'dk'=>18,'do'=>28,'ee'=>20,'fo'=>18,'fi'=>18,'fr'=>27,'ge'=>22,'de'=>22,'gi'=>23,'gr'=>27,'gl'=>18,'gt'=>28,'hu'=>28,'is'=>26,'ie'=>22,'il'=>23,'it'=>27,'jo'=>30,'kz'=>20,'kw'=>30,'lv'=>21,'lb'=>28,'li'=>21,'lt'=>20,'lu'=>20,'mk'=>19,'mt'=>31,'mr'=>27,'mu'=>30,'mc'=>27,'md'=>24,'me'=>22,'nl'=>18,'no'=>15,'pk'=>24,'ps'=>29,'pl'=>28,'pt'=>25,'qa'=>29,'ro'=>24,'sm'=>27,'sa'=>24,'rs'=>22,'sk'=>24,'si'=>19,'es'=>24,'se'=>24,'ch'=>21,'tn'=>24,'tr'=>26,'ae'=>23,'gb'=>22,'vg'=>24);
$Chars = array('a'=>10,'b'=>11,'c'=>12,'d'=>13,'e'=>14,'f'=>15,'g'=>16,'h'=>17,'i'=>18,'j'=>19,'k'=>20,'l'=>21,'m'=>22,'n'=>23,'o'=>24,'p'=>25,'q'=>26,'r'=>27,'s'=>28,'t'=>29,'u'=>30,'v'=>31,'w'=>32,'x'=>33,'y'=>34,'z'=>35);
if(array_key_exists(substr($iban,0,2), $Countries) && strlen($iban) == $Countries[substr($iban,0,2)]){
$MovedChar = substr($iban, 4).substr($iban,0,4);
$MovedCharArray = str_split($MovedChar);
$NewString = "";
foreach($MovedCharArray AS $key => $value){
if(!is_numeric($MovedCharArray[$key])){
if(!isset($Chars[$MovedCharArray[$key]])) return false;
$MovedCharArray[$key] = $Chars[$MovedCharArray[$key]];
}
$NewString .= $MovedCharArray[$key];
}
if(bcmod($NewString, '97') == 1)
{
return true;
}
}
return false;
}
Slight modification of #PeterFox answer including support for bcmod() when bcmath is not available,
<?php
function isValidIBAN ($iban) {
$iban = strtolower($iban);
$Countries = array(
'al'=>28,'ad'=>24,'at'=>20,'az'=>28,'bh'=>22,'be'=>16,'ba'=>20,'br'=>29,'bg'=>22,'cr'=>21,'hr'=>21,'cy'=>28,'cz'=>24,
'dk'=>18,'do'=>28,'ee'=>20,'fo'=>18,'fi'=>18,'fr'=>27,'ge'=>22,'de'=>22,'gi'=>23,'gr'=>27,'gl'=>18,'gt'=>28,'hu'=>28,
'is'=>26,'ie'=>22,'il'=>23,'it'=>27,'jo'=>30,'kz'=>20,'kw'=>30,'lv'=>21,'lb'=>28,'li'=>21,'lt'=>20,'lu'=>20,'mk'=>19,
'mt'=>31,'mr'=>27,'mu'=>30,'mc'=>27,'md'=>24,'me'=>22,'nl'=>18,'no'=>15,'pk'=>24,'ps'=>29,'pl'=>28,'pt'=>25,'qa'=>29,
'ro'=>24,'sm'=>27,'sa'=>24,'rs'=>22,'sk'=>24,'si'=>19,'es'=>24,'se'=>24,'ch'=>21,'tn'=>24,'tr'=>26,'ae'=>23,'gb'=>22,'vg'=>24
);
$Chars = array(
'a'=>10,'b'=>11,'c'=>12,'d'=>13,'e'=>14,'f'=>15,'g'=>16,'h'=>17,'i'=>18,'j'=>19,'k'=>20,'l'=>21,'m'=>22,
'n'=>23,'o'=>24,'p'=>25,'q'=>26,'r'=>27,'s'=>28,'t'=>29,'u'=>30,'v'=>31,'w'=>32,'x'=>33,'y'=>34,'z'=>35
);
if (strlen($iban) != $Countries[ substr($iban,0,2) ]) { return false; }
$MovedChar = substr($iban, 4) . substr($iban,0,4);
$MovedCharArray = str_split($MovedChar);
$NewString = "";
foreach ($MovedCharArray as $k => $v) {
if ( !is_numeric($MovedCharArray[$k]) ) {
$MovedCharArray[$k] = $Chars[$MovedCharArray[$k]];
}
$NewString .= $MovedCharArray[$k];
}
if (function_exists("bcmod")) { return bcmod($NewString, '97') == 1; }
// http://au2.php.net/manual/en/function.bcmod.php#38474
$x = $NewString; $y = "97";
$take = 5; $mod = "";
do {
$a = (int)$mod . substr($x, 0, $take);
$x = substr($x, $take);
$mod = $a % $y;
}
while (strlen($x));
return (int)$mod == 1;
}
The accepted answer is not the preferred way of validation. The specification dictates the following:
Check that the total IBAN length is correct as per the country. If not, the IBAN is invalid
Replace the two check digits by 00 (e.g. GB00 for the UK)
Move the four initial characters to the end of the string
Replace the letters in the string with digits, expanding the string as necessary, such that A or a = 10, B or b = 11, and Z or z = 35. Each alphabetic character is therefore replaced by 2 digits
Convert the string to an integer (i.e. ignore leading zeroes)
Calculate mod-97 of the new number, which results in the remainder
Subtract the remainder from 98, and use the result for the two check digits. If the result is a single digit number, pad it with a leading 0 to make a two-digit number
I've written a class that validates, formats and parses strings according to the spec. Hope this helps some save the time required to roll their own.
The code can be found on GitHub here.
top rated function does NOT work.
Just try a string with '%' in it...
I'm using this one :
function checkIBAN($iban) {
// Normalize input (remove spaces and make upcase)
$iban = strtoupper(str_replace(' ', '', $iban));
if (preg_match('/^[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}$/', $iban)) {
$country = substr($iban, 0, 2);
$check = intval(substr($iban, 2, 2));
$account = substr($iban, 4);
// To numeric representation
$search = range('A','Z');
foreach (range(10,35) as $tmp)
$replace[]=strval($tmp);
$numstr=str_replace($search, $replace, $account.$country.'00');
// Calculate checksum
$checksum = intval(substr($numstr, 0, 1));
for ($pos = 1; $pos < strlen($numstr); $pos++) {
$checksum *= 10;
$checksum += intval(substr($numstr, $pos,1));
$checksum %= 97;
}
return ((98-$checksum) == $check);
} else
return false;
}
I found this solution in cakephp 3.7 validation class. Plain beautiful php realization.
/**
* Check that the input value has a valid International Bank Account Number IBAN syntax
* Requirements are uppercase, no whitespaces, max length 34, country code and checksum exist at right spots,
* body matches against checksum via Mod97-10 algorithm
*
* #param string $check The value to check
*
* #return bool Success
*/
public static function iban($check)
{
if (!preg_match('/^[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}$/', $check)) {
return false;
}
$country = substr($check, 0, 2);
$checkInt = intval(substr($check, 2, 2));
$account = substr($check, 4);
$search = range('A', 'Z');
$replace = [];
foreach (range(10, 35) as $tmp) {
$replace[] = strval($tmp);
}
$numStr = str_replace($search, $replace, $account . $country . '00');
$checksum = intval(substr($numStr, 0, 1));
$numStrLength = strlen($numStr);
for ($pos = 1; $pos < $numStrLength; $pos++) {
$checksum *= 10;
$checksum += intval(substr($numStr, $pos, 1));
$checksum %= 97;
}
return ((98 - $checksum) === $checkInt);
}
This function check the IBAN and need GMP activate http://php.net/manual/en/book.gmp.php.
function checkIban($string){
$to_check = substr($string, 4).substr($string, 0,4);
$converted = '';
for ($i = 0; $i < strlen($to_check); $i++){
$char = strtoupper($to_check[$i]);
if(preg_match('/[0-9A-Z]/',$char)){
if(!preg_match('/\d/',$char)){
$char = ord($char)-55;
}
$converted .= $char;
}
}
// prevent: "gmp_mod() $num1 is not an integer string" error
$converted = ltrim($converted, '0');
return strlen($converted) && gmp_strval(gmp_mod($converted, '97')) == 1;
}
enjoy !
The function levenshtein in PHP works on strings with maximum length 255. What are good alternatives to compute a similarity score of sentences in PHP.
Basically I have a database of sentences, and I want to find approximate duplicates.
similar_text function is not giving me expected results. What is the easiest way for me to detect similar sentences like below:
$ss="Jack is a very nice boy, isn't he?";
$pp="jack is a very nice boy is he";
$ss=strtolower($ss); // convert to lower case as we dont care about case
$pp=strtolower($pp);
$score=similar_text($ss, $pp);
echo "$score %\n"; // Outputs just 29 %
$score=levenshtein ( $ss, $pp );
echo "$score\n"; // Outputs '5', which indicates they are very similar. But, it does not work for more than 255 chars :(
The levenshtein algorithm has a time complexity of O(n*m), where n and m are the lengths of the two input strings. This is pretty expensive and computing such a distance for long strings will take a long time.
For whole sentences, you might want to use a diff algorithm instead, see for example: Highlight the difference between two strings in PHP
Having said this, PHP also provides the similar_text function which has an even worse complexity (O(max(n,m)**3)) but seems to work on longer strings.
I've found the Smith Waterman Gotoh to be the best algorithm for comparing sentences. More info in this answer. Here is the PHP code example:
class SmithWatermanGotoh
{
private $gapValue;
private $substitution;
/**
* Constructs a new Smith Waterman metric.
*
* #param gapValue
* a non-positive gap penalty
* #param substitution
* a substitution function
*/
public function __construct($gapValue=-0.5,
$substitution=null)
{
if($gapValue > 0.0) throw new Exception("gapValue must be <= 0");
//if(empty($substitution)) throw new Exception("substitution is required");
if (empty($substitution)) $this->substitution = new SmithWatermanMatchMismatch(1.0, -2.0);
else $this->substitution = $substitution;
$this->gapValue = $gapValue;
}
public function compare($a, $b)
{
if (empty($a) && empty($b)) {
return 1.0;
}
if (empty($a) || empty($b)) {
return 0.0;
}
$maxDistance = min(mb_strlen($a), mb_strlen($b))
* max($this->substitution->max(), $this->gapValue);
return $this->smithWatermanGotoh($a, $b) / $maxDistance;
}
private function smithWatermanGotoh($s, $t)
{
$v0 = [];
$v1 = [];
$t_len = mb_strlen($t);
$max = $v0[0] = max(0, $this->gapValue, $this->substitution->compare($s, 0, $t, 0));
for ($j = 1; $j < $t_len; $j++) {
$v0[$j] = max(0, $v0[$j - 1] + $this->gapValue,
$this->substitution->compare($s, 0, $t, $j));
$max = max($max, $v0[$j]);
}
// Find max
for ($i = 1; $i < mb_strlen($s); $i++) {
$v1[0] = max(0, $v0[0] + $this->gapValue, $this->substitution->compare($s, $i, $t, 0));
$max = max($max, $v1[0]);
for ($j = 1; $j < $t_len; $j++) {
$v1[$j] = max(0, $v0[$j] + $this->gapValue, $v1[$j - 1] + $this->gapValue,
$v0[$j - 1] + $this->substitution->compare($s, $i, $t, $j));
$max = max($max, $v1[$j]);
}
for ($j = 0; $j < $t_len; $j++) {
$v0[$j] = $v1[$j];
}
}
return $max;
}
}
class SmithWatermanMatchMismatch
{
private $matchValue;
private $mismatchValue;
/**
* Constructs a new match-mismatch substitution function. When two
* characters are equal a score of <code>matchValue</code> is assigned. In
* case of a mismatch a score of <code>mismatchValue</code>. The
* <code>matchValue</code> must be strictly greater then
* <code>mismatchValue</code>
*
* #param matchValue
* value when characters are equal
* #param mismatchValue
* value when characters are not equal
*/
public function __construct($matchValue, $mismatchValue) {
if($matchValue <= $mismatchValue) throw new Exception("matchValue must be > matchValue");
$this->matchValue = $matchValue;
$this->mismatchValue = $mismatchValue;
}
public function compare($a, $aIndex, $b, $bIndex) {
return ($a[$aIndex] === $b[$bIndex] ? $this->matchValue
: $this->mismatchValue);
}
public function max() {
return $this->matchValue;
}
public function min() {
return $this->mismatchValue;
}
}
$str1 = "Jack is a very nice boy, isn't he?";
$str2 = "jack is a very nice boy is he";
$o = new SmithWatermanGotoh();
echo $o->compare($str1, $str2);
You could try using similar_text.
It can get quite slow with 20,000+ characters (3-5 seconds) but your example you mention using only sentences, this will work just fine for that usage.
One thing to note is when comparing string of different sizes you will not get 100%. For example if you compare "he" with "head" you would only get a 50% match.
I found Marcel Jackwerth's response to How to code a URL shortener? to be a good answer for the problem, however my question is how it'll look in PHP? Here's Marcel's answer:
You need a Bijective Function f (there must be no x1 != x2, that will make f(x1) = f(x2); and for every y you will find a x so that f(x)=y). This is necessary so that you can find a inverse function g('abc') = 123 for your f(123)='abc' function.
I would continue your "convert number to string" approach (however you will realize that your proposed algorithm fails if your id is a prime and greater than 52).
How to convert the id to a shortened url:
Think of an alphabet you want to use. In your case that's [a-zA-Z0-9]. It contains 62 letters.
Take the auto-generated unique numerical key (auto-incremented id): for example 125 (a decimal number)
Now you have to convert the 125 (base 10) to X (base 62). This will then be {2}{1} (2×62+1=125).
Now map the symbols {2} and {1} to your alphabet. Say {0} = 'a', {25} = 'z' and so on. We will have {2} = 'c' and {1} = 'b'. So '/cb' will be your shortened url.
How to resolve a shortened url abc to the initial id:
If you want to do this in reverse, it's not quite diffcult. 'e9a' will be resolved to "4th,61st,0th letter in alphabet" = {4}{61}{0}, which is 4×62×62 + 61×62 + 0 = 19158. You will then just have to find your database-record with id 19158.
function convert($src, $srcAlphabet, $dstAlphabet) {
$srcBase = strlen($srcAlphabet);
$dstBase = strlen($dstAlphabet);
$wet = $src;
$val = 0;
$mlt = 1;
while ($l = strlen($wet)) {
$digit = $wet[$l - 1];
$val += $mlt * strpos($srcAlphabet, $digit);
$wet = substr($wet, 0, $l - 1);
$mlt *= $srcBase;
}
$wet = $val;
$dst = '';
while ($wet >= $dstBase) {
$digitVal = $wet % $dstBase;
$digit = $dstAlphabet[$digitVal];
$dst = $digit . $dst;
$wet /= $dstBase;
}
$digit = $dstAlphabet[$wet];
$dst = $digit . $dst;
return $dst;
}
// prints cb
print convert('125', '0123456789', 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789');
// prints 19158
print convert('e9a', 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789', '0123456789');
I like this PHP function which allows you to customise the alphabet (and remove confusing 0/O's etc.)
// From http://snipplr.com/view/22246/base62-encode--decode/
private function base_encode($val, $base=62, $chars='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ') {
$str = '';
do {
$i = fmod($val, $base);
$str = $chars[$i] . $str;
$val = ($val - $i) / $base;
} while($val > 0);
return $str;
}
Follow the URL to find the reverse 'decode' function too.
The main problem with Marcel's solution is that it uses a zero digit as a placeholder. By converting between bases, inevitably the numeral chosen to represent 0 can't appear at the front of the converted number.
For example, if you convert base 10 integers to base 4 using "ABCD" using the provided mechanism, there is no way to obtain output that starts with the letter "A", since that represents a zero in the new base and won't prefix the number. You might expect 5 to be "AA", but instead, it is "BA". There is no way to coerce that algorithm into producing "AA", because it would be like writing "00" in decimal, which has the same value as "0".
Here's an alternate solution in PHP that uses the entire gamut:
function encode($n, $alphabet = 'ABCD') {
$output = '';
if($n == 0) {
$output = $alphabet[0];
}
else {
$digits = floor(log($n, strlen($alphabet))) + 1;
for($z = 0; $z < $digits; $z++) {
$digit = $n % 4;
$output = $alphabet[$digit] . $output;
$n = floor($n / 4) - 1;
}
}
return $output;
}
function decode($code, $alphabet = 'ABCD') {
$n = 0;
$code = str_split($code);
$unit = 1;
while($letter = array_pop($code)) {
$n += (strpos($alphabet, $letter) + 1) * $unit;
$unit = $unit * strlen($alphabet);
}
return $n - 1;
}
echo encode(25); // should output "ABB"
echo decode('ABB'); // should output 25
Change/pass the second parameter to a list of characters to use instead of the short 4-character dictionary of "ABCD".
all you need to do is convert between different base systems base 10 to base 62
https://github.com/infinitas/infinitas/blob/dev/core/short_urls/models/short_url.php