How can I encrypt a PHP variable (a link)? - php

I need help encoding a link. Basically, upon completion of an event, I run a function that redirects the user to a link. The link is taken directly from a PHP variable.
<?php
$url = "http://google.com/";
$mylink = "<a href=\"" . $url . "\">";
echo $mylink;
?>
My question is, how can I echo $mylink, without having the $url shown in the source code. I want the output of my link to still go to $url, but not show the value of $url ANYWHERE in my source code.
How can I achieve this?

Here is one sure way to do this: you could store in your database (or files on the server) the link and an ID. Instead of the actual url, print a link to a php script you write which passes in that ID. This page you write simply looks up the associated ID and uses a header to redirect to the link.
For Example: you write a script redirector.php then the links in the page source point to "redirector.php?id=10293". The redirector script looks up what is id 10293 and sees http://www.example.com then calls
header('Location: http://www.example.com/');
This way the links exist only on the server side and never show up in your page source. As user pst suggested in the comments, you could also use something like tinyurl, which operates on the same principle.
Any other methods will rely on some sort of encryption which could be decrypted because the actual data (link url) is in the page source albeit obscured.
EDIT: here is an example of how you could write your two scripts: the one that prints the urls and the one that redirects. Assuming a table url_table exists in your MySQL db, add a column called 'hash' or 'id' or something, and in the script that prints the urls add the lines:
$hash = sha1($url);
mysql_query("UPDATE url_table SET hash = '$hash' WHERE url = '$url'");
$printURL = "../redirect.php?id=$hash";
print "<a href='$printURL'>click me to go somewhere you don't know yet</a>";
now in another file named redirect.php put the following code:
<?php
//connect to db or include files
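// read the ID from the request and escape it before using it in SQL
$givenHash = mysql_real_escape_string($_REQUEST['id']);
// default target: the page the user came from
$realURL = $_SERVER['HTTP_REFERER'];
$result = mysql_query("SELECT url FROM url_table WHERE hash = '$givenHash'");
while($row=mysql_fetch_assoc($result)) {
$realURL = $row['url'];
}
header("Location: $realURL");
exit;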
?>
This will either send them to the url you want, if it is found in the db, or drop them back to the page they were on before they clicked the link. If you only have a few links that are known in advance, you can do this trick without a database by just using a lookup array. Hope this helps.
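The lookup-array variant mentioned above could be sketched roughly like this. The IDs, the array contents and the resolveLink() helper are made up for illustration; only the idea of keeping the real URLs on the server comes from the answer.

```php
<?php
// Hypothetical map of opaque IDs to real destinations; it never leaves the server.
$links = [
    '10293' => 'http://www.example.com/',
    '10294' => 'http://google.com/',
];

// Returns the destination for an ID, or the fallback when the ID is unknown.
function resolveLink(array $links, string $id, string $fallback = '/'): string
{
    return $links[$id] ?? $fallback;
}

// In redirect.php you would then do something like:
// header('Location: ' . resolveLink($links, $_GET['id'] ?? ''));
// exit;
```

The page itself only ever prints redirect.php?id=10293, so the real URL is not part of the HTML source.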

You could do it with javascript, and some encryption like TEA.
Something like...
Include TEA function in javascript
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
/* Block TEA (xxtea) Tiny Encryption Algorithm implementation in JavaScript */
/* (c) Chris Veness 2002-2010: www.movable-type.co.uk/tea-block.html */
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
/* Algorithm: David Wheeler & Roger Needham, Cambridge University Computer Lab */
/* http://www.cl.cam.ac.uk/ftp/papers/djw-rmn/djw-rmn-tea.html (1994) */
/* http://www.cl.cam.ac.uk/ftp/users/djw3/xtea.ps (1997) */
/* http://www.cl.cam.ac.uk/ftp/users/djw3/xxtea.ps (1998) */
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
var Tea = {}; // Tea namespace
/*
* encrypt text using Corrected Block TEA (xxtea) algorithm
*
* @param {string} plaintext String to be encrypted (multi-byte safe)
* @param {string} password Password to be used for encryption (1st 16 chars)
* @returns {string} encrypted text
*/
Tea.encrypt = function(plaintext, password) {
if (plaintext.length == 0) return(''); // nothing to encrypt
// convert string to array of longs after converting any multi-byte chars to UTF-8
var v = Tea.strToLongs(Utf8.encode(plaintext));
if (v.length <= 1) v[1] = 0; // algorithm doesn't work for n<2 so fudge by adding a null
// simply convert first 16 chars of password as key
var k = Tea.strToLongs(Utf8.encode(password).slice(0,16));
var n = v.length;
// ---- <TEA coding> ----
var z = v[n-1], y = v[0], delta = 0x9E3779B9;
var mx, e, q = Math.floor(6 + 52/n), sum = 0;
while (q-- > 0) { // 6 + 52/n operations gives between 6 & 32 mixes on each word
sum += delta;
e = sum>>>2 & 3;
for (var p = 0; p < n; p++) {
y = v[(p+1)%n];
mx = (z>>>5 ^ y<<2) + (y>>>3 ^ z<<4) ^ (sum^y) + (k[p&3 ^ e] ^ z);
z = v[p] += mx;
}
}
// ---- </TEA> ----
var ciphertext = Tea.longsToStr(v);
return Base64.encode(ciphertext);
}
/*
* decrypt text using Corrected Block TEA (xxtea) algorithm
*
* @param {string} ciphertext String to be decrypted
* @param {string} password Password to be used for decryption (1st 16 chars)
* @returns {string} decrypted text
*/
Tea.decrypt = function(ciphertext, password) {
if (ciphertext.length == 0) return('');
var v = Tea.strToLongs(Base64.decode(ciphertext));
var k = Tea.strToLongs(Utf8.encode(password).slice(0,16));
var n = v.length;
// ---- <TEA decoding> ----
var z = v[n-1], y = v[0], delta = 0x9E3779B9;
var mx, e, q = Math.floor(6 + 52/n), sum = q*delta;
while (sum != 0) {
e = sum>>>2 & 3;
for (var p = n-1; p >= 0; p--) {
z = v[p>0 ? p-1 : n-1];
mx = (z>>>5 ^ y<<2) + (y>>>3 ^ z<<4) ^ (sum^y) + (k[p&3 ^ e] ^ z);
y = v[p] -= mx;
}
sum -= delta;
}
// ---- </TEA> ----
var plaintext = Tea.longsToStr(v);
// strip trailing null chars resulting from filling 4-char blocks:
plaintext = plaintext.replace(/\0+$/,'');
return Utf8.decode(plaintext);
}
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
// supporting functions
Tea.strToLongs = function(s) { // convert string to array of longs, each containing 4 chars
// note chars must be within ISO-8859-1 (with Unicode code-point < 256) to fit 4/long
var l = new Array(Math.ceil(s.length/4));
for (var i=0; i<l.length; i++) {
// note little-endian encoding - endianness is irrelevant as long as
// it is the same in longsToStr()
l[i] = s.charCodeAt(i*4) + (s.charCodeAt(i*4+1)<<8) +
(s.charCodeAt(i*4+2)<<16) + (s.charCodeAt(i*4+3)<<24);
}
return l; // note running off the end of the string generates nulls since
} // bitwise operators treat NaN as 0
Tea.longsToStr = function(l) { // convert array of longs back to string
var a = new Array(l.length);
for (var i=0; i<l.length; i++) {
a[i] = String.fromCharCode(l[i] & 0xFF, l[i]>>>8 & 0xFF,
l[i]>>>16 & 0xFF, l[i]>>>24 & 0xFF);
}
return a.join(''); // use Array.join() rather than repeated string appends for efficiency in IE
}
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
/* Base64 class: Base 64 encoding / decoding (c) Chris Veness 2002-2010 */
/* note: depends on Utf8 class */
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
var Base64 = {}; // Base64 namespace
Base64.code = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
/**
* Encode string into Base64, as defined by RFC 4648 [http://tools.ietf.org/html/rfc4648]
* (instance method extending String object). As per RFC 4648, no newlines are added.
*
* @param {String} str The string to be encoded as base-64
* @param {Boolean} [utf8encode=false] Flag to indicate whether str is Unicode string to be encoded
* to UTF8 before conversion to base64; otherwise string is assumed to be 8-bit characters
* @returns {String} Base64-encoded string
*/
Base64.encode = function(str, utf8encode) { // http://tools.ietf.org/html/rfc4648
utf8encode = (typeof utf8encode == 'undefined') ? false : utf8encode;
var o1, o2, o3, bits, h1, h2, h3, h4, e=[], pad = '', c, plain, coded;
var b64 = Base64.code;
plain = utf8encode ? Utf8.encode(str) : str;
c = plain.length % 3; // pad string to length of multiple of 3
if (c > 0) { while (c++ < 3) { pad += '='; plain += '\0'; } }
// note: doing padding here saves us doing special-case packing for trailing 1 or 2 chars
for (c=0; c<plain.length; c+=3) { // pack three octets into four hexets
o1 = plain.charCodeAt(c);
o2 = plain.charCodeAt(c+1);
o3 = plain.charCodeAt(c+2);
bits = o1<<16 | o2<<8 | o3;
h1 = bits>>18 & 0x3f;
h2 = bits>>12 & 0x3f;
h3 = bits>>6 & 0x3f;
h4 = bits & 0x3f;
// use hextets to index into code string
e[c/3] = b64.charAt(h1) + b64.charAt(h2) + b64.charAt(h3) + b64.charAt(h4);
}
coded = e.join(''); // join() is far faster than repeated string concatenation in IE
// replace 'A's from padded nulls with '='s
coded = coded.slice(0, coded.length-pad.length) + pad;
return coded;
}
/**
* Decode string from Base64, as defined by RFC 4648 [http://tools.ietf.org/html/rfc4648]
* (instance method extending String object). As per RFC 4648, newlines are not catered for.
*
* @param {String} str The string to be decoded from base-64
* @param {Boolean} [utf8decode=false] Flag to indicate whether str is Unicode string to be decoded
* from UTF8 after conversion from base64
* @returns {String} decoded string
*/
Base64.decode = function(str, utf8decode) {
utf8decode = (typeof utf8decode == 'undefined') ? false : utf8decode;
var o1, o2, o3, h1, h2, h3, h4, bits, d=[], plain, coded;
var b64 = Base64.code;
coded = utf8decode ? Utf8.decode(str) : str;
for (var c=0; c<coded.length; c+=4) { // unpack four hexets into three octets
h1 = b64.indexOf(coded.charAt(c));
h2 = b64.indexOf(coded.charAt(c+1));
h3 = b64.indexOf(coded.charAt(c+2));
h4 = b64.indexOf(coded.charAt(c+3));
bits = h1<<18 | h2<<12 | h3<<6 | h4;
o1 = bits>>>16 & 0xff;
o2 = bits>>>8 & 0xff;
o3 = bits & 0xff;
d[c/4] = String.fromCharCode(o1, o2, o3);
// check for padding
if (h4 == 0x40) d[c/4] = String.fromCharCode(o1, o2);
if (h3 == 0x40) d[c/4] = String.fromCharCode(o1);
}
plain = d.join(''); // join() is far faster than repeated string concatenation in IE
return utf8decode ? Utf8.decode(plain) : plain;
}
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
/* Utf8 class: encode / decode between multi-byte Unicode characters and UTF-8 multiple */
/* single-byte character encoding (c) Chris Veness 2002-2010 */
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
var Utf8 = {}; // Utf8 namespace
/**
* Encode multi-byte Unicode string into utf-8 multiple single-byte characters
* (BMP / basic multilingual plane only)
*
* Chars in range U+0080 - U+07FF are encoded in 2 chars, U+0800 - U+FFFF in 3 chars
*
* @param {String} strUni Unicode string to be encoded as UTF-8
* @returns {String} encoded string
*/
Utf8.encode = function(strUni) {
// use regular expressions & String.replace callback function for better efficiency
// than procedural approaches
var strUtf = strUni.replace(
/[\u0080-\u07ff]/g, // U+0080 - U+07FF => 2 bytes 110yyyyy, 10zzzzzz
function(c) {
var cc = c.charCodeAt(0);
return String.fromCharCode(0xc0 | cc>>6, 0x80 | cc&0x3f); }
);
strUtf = strUtf.replace(
/[\u0800-\uffff]/g, // U+0800 - U+FFFF => 3 bytes 1110xxxx, 10yyyyyy, 10zzzzzz
function(c) {
var cc = c.charCodeAt(0);
return String.fromCharCode(0xe0 | cc>>12, 0x80 | cc>>6&0x3F, 0x80 | cc&0x3f); }
);
return strUtf;
}
/**
* Decode utf-8 encoded string back into multi-byte Unicode characters
*
* @param {String} strUtf UTF-8 string to be decoded back to Unicode
* @returns {String} decoded string
*/
Utf8.decode = function(strUtf) {
// note: decode 3-byte chars first as decoded 2-byte strings could appear to be 3-byte char!
var strUni = strUtf.replace(
/[\u00e0-\u00ef][\u0080-\u00bf][\u0080-\u00bf]/g, // 3-byte chars
function(c) { // (note parentheses for precedence)
var cc = ((c.charCodeAt(0)&0x0f)<<12) | ((c.charCodeAt(1)&0x3f)<<6) | ( c.charCodeAt(2)&0x3f);
return String.fromCharCode(cc); }
);
strUni = strUni.replace(
/[\u00c0-\u00df][\u0080-\u00bf]/g, // 2-byte chars
function(c) { // (note parentheses for precedence)
var cc = (c.charCodeAt(0)&0x1f)<<6 | c.charCodeAt(1)&0x3f;
return String.fromCharCode(cc); }
);
return strUni;
}
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
Then, for testing, create an html anchor element and set its value by decrypting a pre-calculated encrypted string:
document.body.innerHTML = "<a href='no_access' id='targetElement'>the link</a>"
// pre-calculated result: AAvpH77xTCdBO/qAb5yHOFVF3vlbi1XS6Dd5eA==
// By doing: Tea.encrypt('http://secret.encrypted.url/','some_password')
document.getElementById('targetElement').href=Tea.decrypt('AAvpH77xTCdBO/qAb5yHOFVF3vlbi1XS6Dd5eA==','some_password')

Related

Percentage increase or decrease between two values

How do I calculate the percentage of increase or decrease of two numbers in PHP?
For example: (increase)100, (decrease)1 = -99%
(increase)1, (decrease)100 = +99%
Before anything else you need to have a solid understanding of the meaning of percentages and how they are computed.
The meaning of "x is 15% of y" is:
x = (15 * y) / 100
The arithmetic operations with percentages are similar. If a increases by 12% (of its current value) then:
a = a + (12 * a) / 100
Which is the same as:
a = 112 * a / 100
Subtracting 9% (of its current value) from b is:
b = b - (9 * b) / 100
or
b = b * 91 / 100
which actually is 91% of the value of b (100% - 9% of b).
Turn the above a, b, x, y into PHP variables (by placing $ in front of them), terminate the statements with semicolons (;) and you get valid PHP code that performs percentage operations.
PHP doesn't provide any particular function that helps working with percentages. As you can see above, there is no need for them.
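As a small sketch of the formulas above turned into PHP, the following helpers could be used. The function names percentOf, increaseBy and decreaseBy are invented here for illustration and are not part of PHP.

```php
<?php
// x = (percent * y) / 100
function percentOf(float $percent, float $value): float
{
    return $percent * $value / 100;
}

// a = a + (percent * a) / 100
function increaseBy(float $value, float $percent): float
{
    return $value + percentOf($percent, $value);
}

// b = b - (percent * b) / 100
function decreaseBy(float $value, float $percent): float
{
    return $value - percentOf($percent, $value);
}

echo percentOf(15, 200), "\n";  // 30
echo increaseBy(50, 12), "\n";  // 56
echo decreaseBy(200, 9), "\n";  // 182
```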
My 2 cents ;)
Using PHP
function pctDiff($x1, $x2) {
$diff = ($x2 - $x1) / $x1;
return round($diff * 100, 2);
}
Usage:
$oldValue = 1000;
$newValue = 203.5;
$diff = pctDiff($oldValue, $newValue);
echo $diff . '%'; // -79.65%
Using Swift 3
func pctDiff(x1: CGFloat, x2: CGFloat) -> Double {
let diff = (x2 - x1) / x1
return Double(round(100 * (diff * 100)) / 100)
}
let oldValue: CGFloat = 1000
let newValue: CGFloat = 203.5
print("\(pctDiff(x1: oldValue, x2: newValue))%") // -79.65%

converting decimal to hexadecimal value

I need to convert the "age" of an item (in days) into a hexadecimal value, where the oldest item = max color = D9D8C9, and the most recent = min color = FFFFFF.
Items beyond age 365 get color D9D8C9.
Items beneath age 7 get color FFFFFF.
Given these min and max colors, how can I find the color of any item younger than 365 days and older than 7 days?
Eventually, I'll do it in PHP but pseudocode example is fine.
Essentially, you're looking for a way to arbitrarily map one range onto another (7-365 should be mapped onto FFFFFF - D9D8D9).
First things first: converting decimal to hex is quite easy:
$age = mt_rand(1,600);
if ($age > 365) $hex = 'D9D8D9';
elseif ($age < 7) $hex = 'FFFFFF';
else $hex = str_pad(dechex($age), 6, '0', STR_PAD_LEFT);
What I do is simply check whether $age is greater than 365; if so, I assign the D9D8D9 constant, and if it's less than 7, I assign FFFFFF. In all other cases, I simply convert the number to hexadecimal and pad the resulting string to 6 chars with 0, so 255 would become FF, which is padded to 0000FF.
But to map a range on a range, we'll have to find out how a step of 1 in the smallest range scales to the larger one(s). It's a simple "rule of three": calculate the equivalent of 1 in both scales, and multiply. Then apply the same range-bounds and you're there.
The colour-range you're using is FFFFFF through D9D8D9, or to put it in decimals: 16777215 through 14276809. This leaves us with a range of 2500406, versus 365-7 (or 358) days. Each single day, therefore is "worth" 6984.374301676 ((D9D8D9-FFFFFF)/(365-7)) in our colour range.
Put it all together and you have 2 options: calculate the distance from FFFFFF or D9D8D9, but as far as the result is concerned, it doesn't matter which one you choose.
<CrLowBound> - (<value>-<VLowBound>)*<CrStep>
//or
<CrHighBound> - (<VHighBound> - <value>)*<CrStep>
Both simply compute the value in the colour range that corresponds to the given value. CrLowBound and CrHighBound are FFFFFF and D9D8D9 respectively; in much the same way, VLowBound and VHighBound are 6 and 366. CrStep is 6984.374301676. I've explained above how I got these values.
$age = mt_rand(1,600);
if ($age > 365) $hex = 'D9D8D9';
elseif ($age < 7) $hex = 'FFFFFF';
else $hex = str_pad(dechex(14276809-(round((366-$age)*6984.374301676))), 6, '0', STR_PAD_LEFT);
//Or:
//else $hex = str_pad(dechex(16777215-(round(($age-6)*6984.374301676))), 6, '0', STR_PAD_LEFT);
This will more evenly spread the range of colours within the D9D8D9 to FFFFFF range: the older the item, the closer the color will be to D9D8D9, the more recent, the closer it is to FFFFFF.
For example, if the age is 117, the scaled offset (365 - 117) * 6984.374301676 rounds to 1732125, which is 1A6E1D in hex:
//age (max-age)*worth hex
117 == 1732125 == 1a6e1d
Tested with the following code:
function getColour($dec)
{
if ($dec > 365) return 'D9D8D9';
if ($dec < 7) return 'FFFFFF';
return strtoupper(
str_pad(
dechex(14276809-(round((366-$dec)*6984.374301676))),
6,
'0',
STR_PAD_LEFT
)
);
}
$days = range(6,366);
$colours = array();
foreach($days as $day) $colours[$day] = getColour($day);
$out = array_chunk($colours, 8);
foreach($out as $k => $chunk) $out[$k] = implode(' - ', $chunk);
echo implode('<br>', $out);
And got this as output:
FFFFFF - B3964B - B3B193 - B3CCDB - B3E824 - B4036C - B41EB4 - B439FD
B45545 - B4708E - B48BD6 - B4A71E - B4C267 - B4DDAF - B4F8F7 - B51440
B52F88 - B54AD1 - B56619 - B58161 - B59CAA - B5B7F2 - B5D33A - B5EE83
B609CB - B62514 - B6405C - B65BA4 - B676ED - B69235 - B6AD7D - B6C8C6
B6E40E - B6FF57 - B71A9F - B735E7 - B75130 - B76C78 - B787C0 - B7A309
B7BE51 - B7D99A - B7F4E2 - B8102A - B82B73 - B846BB - B86203 - B87D4C
B89894 - B8B3DD - B8CF25 - B8EA6D - B905B6 - B920FE - B93C46 - B9578F
B972D7 - B98E20 - B9A968 - B9C4B0 - B9DFF9 - B9FB41 - BA1689 - BA31D2
BA4D1A - BA6863 - BA83AB - BA9EF3 - BABA3C - BAD584 - BAF0CC - BB0C15
BB275D - BB42A6 - BB5DEE - BB7936 - BB947F - BBAFC7 - BBCB0F - BBE658
BC01A0 - BC1CE9 - BC3831 - BC5379 - BC6EC2 - BC8A0A - BCA552 - BCC09B
BCDBE3 - BCF72C - BD1274 - BD2DBC - BD4905 - BD644D - BD7F95 - BD9ADE
BDB626 - BDD16F - BDECB7 - BE07FF - BE2348 - BE3E90 - BE59D8 - BE7521
BE9069 - BEABB2 - BEC6FA - BEE242 - BEFD8B - BF18D3 - BF341B - BF4F64
BF6AAC - BF85F5 - BFA13D - BFBC85 - BFD7CE - BFF316 - C00E5E - C029A7
C044EF - C06038 - C07B80 - C096C8 - C0B211 - C0CD59 - C0E8A1 - C103EA
C11F32 - C13A7B - C155C3 - C1710B - C18C54 - C1A79C - C1C2E4 - C1DE2D
C1F975 - C214BE - C23006 - C24B4E - C26697 - C281DF - C29D27 - C2B870
C2D3B8 - C2EF01 - C30A49 - C32591 - C340DA - C35C22 - C3776A - C392B3
C3ADFB - C3C944 - C3E48C - C3FFD4 - C41B1D - C43665 - C451AD - C46CF6
C4883E - C4A387 - C4BECF - C4DA17 - C4F560 - C510A8 - C52BF0 - C54739
C56281 - C57DCA - C59912 - C5B45A - C5CFA3 - C5EAEB - C60633 - C6217C
C63CC4 - C6580D - C67355 - C68E9D - C6A9E6 - C6C52E - C6E076 - C6FBBF
C71707 - C7324F - C74D98 - C768E0 - C78429 - C79F71 - C7BAB9 - C7D602
C7F14A - C80C92 - C827DB - C84323 - C85E6C - C879B4 - C894FC - C8B045
C8CB8D - C8E6D5 - C9021E - C91D66 - C938AF - C953F7 - C96F3F - C98A88
C9A5D0 - C9C118 - C9DC61 - C9F7A9 - CA12F2 - CA2E3A - CA4982 - CA64CB
CA8013 - CA9B5B - CAB6A4 - CAD1EC - CAED35 - CB087D - CB23C5 - CB3F0E
CB5A56 - CB759E - CB90E7 - CBAC2F - CBC778 - CBE2C0 - CBFE08 - CC1951
CC3499 - CC4FE1 - CC6B2A - CC8672 - CCA1BB - CCBD03 - CCD84B - CCF394
CD0EDC - CD2A24 - CD456D - CD60B5 - CD7BFE - CD9746 - CDB28E - CDCDD7
CDE91F - CE0467 - CE1FB0 - CE3AF8 - CE5641 - CE7189 - CE8CD1 - CEA81A
CEC362 - CEDEAA - CEF9F3 - CF153B - CF3084 - CF4BCC - CF6714 - CF825D
CF9DA5 - CFB8ED - CFD436 - CFEF7E - D00AC7 - D0260F - D04157 - D05CA0
D077E8 - D09330 - D0AE79 - D0C9C1 - D0E50A - D10052 - D11B9A - D136E3
D1522B - D16D73 - D188BC - D1A404 - D1BF4D - D1DA95 - D1F5DD - D21126
D22C6E - D247B6 - D262FF - D27E47 - D29990 - D2B4D8 - D2D020 - D2EB69
D306B1 - D321F9 - D33D42 - D3588A - D373D3 - D38F1B - D3AA63 - D3C5AC
D3E0F4 - D3FC3C - D41785 - D432CD - D44E16 - D4695E - D484A6 - D49FEF
D4BB37 - D4D67F - D4F1C8 - D50D10 - D52859 - D543A1 - D55EE9 - D57A32
D5957A - D5B0C2 - D5CC0B - D5E753 - D6029C - D61DE4 - D6392C - D65475
D66FBD - D68B05 - D6A64E - D6C196 - D6DCDF - D6F827 - D7136F - D72EB8
D74A00 - D76548 - D78091 - D79BD9 - D7B722 - D7D26A - D7EDB2 - D808FB
D82443 - D83F8B - D85AD4 - D8761C - D89165 - D8ACAD - D8C7F5 - D8E33E
D8FE86 - D919CE - D93517 - D9505F - D96BA8 - D986F0 - D9A238 - D9BD81
D9D8D9
Codepad with 2 versions of this code
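A different technique from the single-integer scaling used above is to interpolate each RGB channel separately, which keeps every intermediate colour between the two end colours. The sketch below is only an illustration of that alternative; the function name ageToColour and its parameters are invented here, and the end colours FFFFFF and D9D8C9 are the ones given in the question.

```php
<?php
// Hypothetical helper: maps an age in days to a hex colour by
// linearly interpolating the R, G and B channels one at a time.
function ageToColour(
    int $age,
    int $minAge = 7,
    int $maxAge = 365,
    string $newest = 'FFFFFF',
    string $oldest = 'D9D8C9'
): string {
    // Clamp the age so out-of-range items get the boundary colours.
    $age = max($minAge, min($maxAge, $age));
    // 0.0 means "newest colour", 1.0 means "oldest colour".
    $t = ($age - $minAge) / ($maxAge - $minAge);

    $hex = '';
    for ($i = 0; $i < 3; $i++) {
        $from = hexdec(substr($newest, $i * 2, 2));
        $to = hexdec(substr($oldest, $i * 2, 2));
        $channel = (int) round($from + ($to - $from) * $t);
        $hex .= str_pad(strtoupper(dechex($channel)), 2, '0', STR_PAD_LEFT);
    }
    return $hex;
}

echo ageToColour(3), "\n";   // FFFFFF
echo ageToColour(400), "\n"; // D9D8C9
echo ageToColour(186), "\n"; // ECECE4 (halfway between the two colours)
```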

How does similar_text work?

I just found the similar_text function and was playing around with it, but the percentage output always surprises me. See the examples below.
I tried to find information on the algorithm used, as mentioned in the PHP similar_text() docs:
<?php
$p = 0;
similar_text('aaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//66.666666666667
//Since 5 out of 10 chars match, I would expect a 50% match
similar_text('aaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//40
//5 out of 20 > not 25% ?
similar_text('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//9.5238095238095
//5 out of 100 > not 5% ?
//Example from PHP.net
//Why is turning the strings around changing the result?
similar_text('PHP IS GREAT', 'WITH MYSQL', $p);
echo $p . "<hr>"; //27.272727272727
similar_text('WITH MYSQL', 'PHP IS GREAT', $p);
echo $p . "<hr>"; //18.181818181818
?>
Can anybody explain how this actually works?
Update:
Thanks to the comments I found that the percentage is actually calculated as the number of similar characters * 200 / (length1 + length2):
Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);
So that explains why the percentages are higher than expected. With a string with 5 out of 95 it turns out to be 10, so that I can use.
similar_text('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa', 'aaaaa', $p);
echo $p . "<hr>";
//10
//5 out of 95 = 5 * 200 / (5 + 95) = 10
But I still can't figure out why PHP returns a different result when the strings are swapped. The JS code provided by dfsq doesn't do this. Looking at the PHP source code I can only find a difference in the following line, but I'm not a C programmer. Some insight into what the difference is would be appreciated.
In JS:
for (l = 0;(p + l < firstLength) && (q + l < secondLength) && (first.charAt(p + l) === second.charAt(q + l)); l++);
In PHP: (php_similar_str function)
for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);
Source:
/* {{{ proto int similar_text(string str1, string str2 [, float percent])
Calculates the similarity between two strings */
PHP_FUNCTION(similar_text)
{
char *t1, *t2;
zval **percent = NULL;
int ac = ZEND_NUM_ARGS();
int sim;
int t1_len, t2_len;
if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "ss|Z", &t1, &t1_len, &t2, &t2_len, &percent) == FAILURE) {
return;
}
if (ac > 2) {
convert_to_double_ex(percent);
}
if (t1_len + t2_len == 0) {
if (ac > 2) {
Z_DVAL_PP(percent) = 0;
}
RETURN_LONG(0);
}
sim = php_similar_char(t1, t1_len, t2, t2_len);
if (ac > 2) {
Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);
}
RETURN_LONG(sim);
}
/* }}} */
/* {{{ php_similar_str
*/
static void php_similar_str(const char *txt1, int len1, const char *txt2, int len2, int *pos1, int *pos2, int *max)
{
char *p, *q;
char *end1 = (char *) txt1 + len1;
char *end2 = (char *) txt2 + len2;
int l;
*max = 0;
for (p = (char *) txt1; p < end1; p++) {
for (q = (char *) txt2; q < end2; q++) {
for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);
if (l > *max) {
*max = l;
*pos1 = p - txt1;
*pos2 = q - txt2;
}
}
}
}
/* }}} */
/* {{{ php_similar_char
*/
static int php_similar_char(const char *txt1, int len1, const char *txt2, int len2)
{
int sum;
int pos1, pos2, max;
php_similar_str(txt1, len1, txt2, len2, &pos1, &pos2, &max);
if ((sum = max)) {
if (pos1 && pos2) {
sum += php_similar_char(txt1, pos1,
txt2, pos2);
}
if ((pos1 + max < len1) && (pos2 + max < len2)) {
sum += php_similar_char(txt1 + pos1 + max, len1 - pos1 - max,
txt2 + pos2 + max, len2 - pos2 - max);
}
}
return sum;
}
/* }}} */
Source in Javascript: similar text port to javascript
This was actually a very interesting question, thank you for giving me a puzzle that turned out to be very rewarding.
Let me start out by explaining how similar_text actually works.
Similar Text: The Algorithm
It's a recursion-based divide-and-conquer algorithm. It works by first finding the longest common substring between the two inputs and then breaking the problem into subsets around that substring.
The examples you have used in your question actually all perform only one iteration of the algorithm. The only ones that use more than one iteration, and the ones giving different results, are from the php.net comments.
Here is a simple example to understand the main issue behind similar_text and hopefully give some insight into how it works.
Similar Text: The Flaw
eeeefaaaaafddddd
ddddgaaaaagbeeee
Iteration 1:
Max = 5
String = aaaaa
Left : eeeef and ddddg
Right: fddddd and gbeeee
I hope the flaw is already apparent. It will only check directly to the left and to the right of the longest matched string in both input strings. For example:
$s1='eeeefaaaaafddddd';
$s2='ddddgaaaaagbeeee';
echo similar_text($s1, $s2).'|'.similar_text($s2, $s1);
// outputs 5|5, this is due to Iteration 2 of the algorithm
// it will fail to find a matching string in both left and right subsets
To be honest, I'm uncertain how this case should be treated. It can be seen that only 2 characters differ between the strings.
But eeee and dddd are on opposite ends of the two strings, and I'm uncertain what NLP enthusiasts or other literary experts have to say about this specific situation.
Similar Text: Inconsistent results on argument swapping
The different results you were experiencing based on input order are due to the way the algorithm actually behaves (as mentioned above).
I'll give a final explanation of what's going on.
echo similar_text('test','wert'); // 1
echo similar_text('wert','test'); // 2
On the first case, there's only one Iteration:
test
wert
Iteration 1:
Max = 1
String = t
Left : (empty) and wer
Right: est and (empty)
We only have one iteration because empty/null strings return 0 on recursion. So this ends the algorithm and we have our result: 1
On the second case, however, we are faced with multiple Iterations:
wert
test
Iteration 1:
Max = 1
String = e
Left : w and t
Right: rt and st
We already have a common string of length 1. The algorithm on the left subset will end in 0 matches, but on the right:
rt
st
Iteration 1:
Max = 1
String = t
Left : r and s
Right: (empty) and (empty)
This will lead to our new and final result: 2
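The walk-through above can be reproduced with a small PHP port of the two C routines quoted in the question. This is only a sketch for experimenting: the names longestCommon and similarChars are invented here, and the code follows the quoted source rather than any particular PHP version's implementation.

```php
<?php
// Finds the first longest common substring of $a and $b.
// Returns [length, position in $a, position in $b].
function longestCommon(string $a, string $b): array
{
    $max = 0;
    $pos1 = 0;
    $pos2 = 0;
    $lenA = strlen($a);
    $lenB = strlen($b);
    for ($p = 0; $p < $lenA; $p++) {
        for ($q = 0; $q < $lenB; $q++) {
            $l = 0;
            while ($p + $l < $lenA && $q + $l < $lenB && $a[$p + $l] === $b[$q + $l]) {
                $l++;
            }
            // Strict ">" keeps the first longest match, which is why
            // the argument order can change the result.
            if ($l > $max) {
                [$max, $pos1, $pos2] = [$l, $p, $q];
            }
        }
    }
    return [$max, $pos1, $pos2];
}

// Counts matching characters by recursing to the left and to the
// right of the longest common substring.
function similarChars(string $a, string $b): int
{
    [$max, $pos1, $pos2] = longestCommon($a, $b);
    if ($max === 0) {
        return 0;
    }
    $sum = $max;
    if ($pos1 > 0 && $pos2 > 0) {
        $sum += similarChars(substr($a, 0, $pos1), substr($b, 0, $pos2));
    }
    if ($pos1 + $max < strlen($a) && $pos2 + $max < strlen($b)) {
        $sum += similarChars(substr($a, $pos1 + $max), substr($b, $pos2 + $max));
    }
    return $sum;
}

echo similarChars('test', 'wert'), "\n"; // 1
echo similarChars('wert', 'test'), "\n"; // 2
```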
I thank you for this very informative question and the opportunity to dabble in C again.
Similar Text: JavaScript Edition
The short answer is: The javascript code is not implementing the correct algorithm
sum += this.similar_text(first.substr(0, pos2), second.substr(0, pos2));
Obviously it should be first.substr(0,pos1)
Note: The JavaScript code has been fixed by eis in a previous commit. Thanks @eis
Demystified!
It would indeed seem the function uses different logic depending on the parameter order. I think there are two things at play.
First, see this example:
echo similar_text('test','wert'); // 1
echo similar_text('wert','test'); // 2
It seems that it is testing "how many times any distinct char in param1 is found in param2", and thus the result would be different if you swap the params around. It has been reported as a bug, which has been closed as "working as expected".
Now, the above is the same for both the PHP and javascript implementations: parameter order has an impact, so saying that the JS code wouldn't do this is wrong. This is argued in the bug entry to be intended behaviour.
Second, what doesn't seem correct is the MYSQL/PHP word example. With that, the javascript version gives 3 irrespective of the order of the params, whereas PHP gives 2 and 3 (and due to that, the percentage is equally different). Now, the phrases "PHP IS GREAT" and "WITH MYSQL" should have 5 characters in common, irrespective of which way you compare: H, I, S and T, one each, plus one for the space. In order they have 3 characters, 'H', ' ' and 'S', so if you look at the ordering, the correct answer should be 3 both ways. I modified the C code to a runnable version, and added some output, so one can see what is happening there (codepad link):
#include<stdio.h>
/* {{{ php_similar_str
*/
static void php_similar_str(const char *txt1, int len1, const char *txt2, int len2, int *pos1, int *pos2, int *max)
{
char *p, *q;
char *end1 = (char *) txt1 + len1;
char *end2 = (char *) txt2 + len2;
int l;
*max = 0;
for (p = (char *) txt1; p < end1; p++) {
for (q = (char *) txt2; q < end2; q++) {
for (l = 0; (p + l < end1) && (q + l < end2) && (p[l] == q[l]); l++);
if (l > *max) {
*max = l;
*pos1 = p - txt1;
*pos2 = q - txt2;
}
}
}
}
/* }}} */
/* {{{ php_similar_char
*/
static int php_similar_char(const char *txt1, int len1, const char *txt2, int len2)
{
int sum;
int pos1, pos2, max;
php_similar_str(txt1, len1, txt2, len2, &pos1, &pos2, &max);
if ((sum = max)) {
if (pos1 && pos2) {
printf("txt here %s,%s\n", txt1, txt2);
sum += php_similar_char(txt1, pos1,
txt2, pos2);
}
if ((pos1 + max < len1) && (pos2 + max < len2)) {
printf("txt here %s,%s\n", txt1+ pos1 + max, txt2+ pos2 + max);
sum += php_similar_char(txt1 + pos1 + max, len1 - pos1 - max,
txt2 + pos2 + max, len2 - pos2 - max);
}
}
return sum;
}
/* }}} */
int main(void)
{
printf("Found %d similar chars\n",
php_similar_char("PHP IS GREAT", 12, "WITH MYSQL", 10));
printf("Found %d similar chars\n",
php_similar_char("WITH MYSQL", 10,"PHP IS GREAT", 12));
return 0;
}
The output is:
txt here PHP IS GREAT,WITH MYSQL
txt here P IS GREAT, MYSQL
txt here IS GREAT,MYSQL
txt here IS GREAT,MYSQL
txt here GREAT,QL
Found 3 similar chars
txt here WITH MYSQL,PHP IS GREAT
txt here TH MYSQL,S GREAT
Found 2 similar chars
So one can see that on the first comparison, the function found 'H', ' ' and 'S', but not 'T', and got the result of 3. The second comparison found 'I' and 'T' but not 'H', ' ' or 'S', and thus got the result of 2.
The reason for these results can be seen from the output: algorithm takes the first letter in the first string that second string contains, counts that, and throws away the chars before that from the second string. That is why it misses the characters in-between, and that's the thing causing the difference when you change the character order.
What happens there might be intentional or it might not. However, that's not how javascript version works. If you print out the same things in the javascript version, you get this:
txt here: PHP, WIT
txt here: P IS GREAT, MYSQL
txt here: IS GREAT, MYSQL
txt here: IS, MY
txt here: GREAT, QL
Found 3 similar chars
txt here: WITH, PHP
txt here: W, P
txt here: TH MYSQL, S GREAT
Found 3 similar chars
showing that javascript version does it in a different way. What the javascript version does is that it finds 'H', ' ' and 'S' being in the same order in the first comparison, and the same 'H', ' ' and 'S' also on the second one - so in this case the order of params doesn't matter.
As the javascript is meant to duplicate the code of the PHP function, it needs to behave identically, so I submitted a bug report based on the analysis by @Khez, and the fix has now been merged.
first String = aaaaaaaaaa = 10 letters
second String = aaaaa = 5 letters
first five letters are similar
a+a
a+a
a+a
a+a
a+a
a
a
a
a
a
( <similar_letters> * 200 ) / (<letter_count_first_string> + <letter_count_second_string>)
( 5 * 200 ) / (10 + 5);
= 66.6666666667
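The same calculation can be checked in PHP against the percentage that similar_text() reports itself. The helper name similarityPercent is made up for this sketch; similar_text() and strlen() are the only built-ins it relies on.

```php
<?php
// Hypothetical helper that recomputes the percentage from the match count:
// (similar characters * 200) / (length of first + length of second).
function similarityPercent(string $first, string $second): float
{
    $total = strlen($first) + strlen($second);
    if ($total === 0) {
        return 0.0; // two empty strings: avoid dividing by zero
    }
    return similar_text($first, $second) * 200.0 / $total;
}

// Both lines should print the same value for this pair of strings.
similar_text('aaaaaaaaaa', 'aaaaa', $builtIn);
echo similarityPercent('aaaaaaaaaa', 'aaaaa'), "\n";
echo $builtIn, "\n";
```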
Description
int similar_text ( string $first , string $second [, float &$percent ] )
This calculates the similarity between two strings as described in Oliver [1993]. Note that this implementation does not use a stack as in Oliver's pseudo code, but recursive calls which may or may not speed up the whole process. Note also that the complexity of this algorithm is O(N**3) where N is the length of the longest string.
Parameters
first
The first string.
second
The second string.
percent
By passing a reference as third argument, similar_text() will calculate the similarity in percent for you.

To select a specific column from a table using php postgres

I have a table of 5000+ rows and 8+ columns, like this:
Station Lat Long Date Rainfall Temp Humidity Windspeed
Abcd - - 09/09/1996 - - - -
Abcd - - 10/09/1996 - - - -
Abcd - - 11/09/1996 - - - -
Abcd - - 12/09/1996 - - - -
Efgh - - 09/09/1996 - - - -
Efgh - - 10/09/1996 - - - -
Efgh - - 11/09/1996 - - - -
Efgh - - 12/09/1996 - - - -
I am developing a web application in which the user will select a column such as rainfall/temp/humidity for a particular date.
Can anyone guide me on how to query for this in php-postgres? (database:postgres, table:weatherdata, user:user, password:password)
Thanks in advance.
You can use some code like this:
public function getData ($date, $columnsToShow = null) {
/* You could check the parameters here:
* $date is string and not empty
* $columnsToShow is an array or null.
*/
if (isset ($columnsToShow))
$columnsToShow = implode (',', $columnsToShow);
else $columnsToShow = "*";
$query = "select {$columnsToShow}
from weatherdata
where date = '{$date}'";
$result = array();
$conex = pg_connect ("host=yourHost user=yourUser password=yourPassword dbname=yourDatabase");
// pg_connect returns false on failure
if ($conex) {
$rows = pg_query ($conex, $query);
if ($rows) {
// PGSQL_ASSOC is a constant, not a string
while ($data = pg_fetch_array ($rows, null, PGSQL_ASSOC))
$result[] = $data;
}
}
return (empty ($result) ? null : $result);
}
Now you can invoke, for example, like this:
getData ('2012-03-21', array ('Station', 'Rainfall'));
I hope this helps.
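Because the column names and the date come from the user, the query above is open to SQL injection if they are interpolated as-is. One possible way to tighten it is sketched below: column names are checked against a whitelist and the date is passed as a parameter. The constant ALLOWED_COLUMNS and the function buildWeatherQuery are invented for this sketch; the column list and the table name weatherdata are taken from the question, and pg_query_params() is the standard pgsql function for parameterized queries.

```php
<?php
// Hypothetical whitelist, taken from the column names in the question.
const ALLOWED_COLUMNS = ['station', 'lat', 'long', 'date', 'rainfall', 'temp', 'humidity', 'windspeed'];

// Builds the SQL text and the parameter list.
// Unknown column names are rejected instead of being put into the query.
function buildWeatherQuery(string $date, array $columns): array
{
    $columns = array_map('strtolower', $columns);
    $unknown = array_diff($columns, ALLOWED_COLUMNS);
    if ($columns === [] || $unknown !== []) {
        throw new InvalidArgumentException('Unknown or missing column name');
    }
    // $1 is a placeholder that pg_query_params() fills in safely.
    $sql = 'SELECT ' . implode(', ', $columns) . ' FROM weatherdata WHERE date = $1';
    return [$sql, [$date]];
}

// Usage with an open connection $conn (not executed here):
// [$sql, $params] = buildWeatherQuery('1996-09-09', ['Station', 'Rainfall']);
// $rows = pg_fetch_all(pg_query_params($conn, $sql, $params));
```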

Coordinates system (PHP an array)

I have points A(x,y) and B(x,y). They are listed as an array (a => array(x,y), b => array(x,y)).
How do I get the length between points A and B? Please help in PHP. :)
Well, remember your high-school geometry.
r = square_root((x2 - x1)^2 + (y2 - y1)^2)
So in php:
$dx = $points['b'][0] - $points['a'][0];
$dy = $points['b'][1] - $points['a'][1];
$r = sqrt(pow($dx, 2) + pow($dy, 2));
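The same calculation can be wrapped in a small function. PHP's built-in hypot() computes sqrt(x*x + y*y) directly; the function name distance and the sample points are only illustrative.

```php
<?php
// Hypothetical helper; points are [x, y] arrays as in the question.
function distance(array $a, array $b): float
{
    // hypot() returns the length of the hypotenuse for the two differences
    return hypot($b[0] - $a[0], $b[1] - $a[1]);
}

$points = ['a' => [1, 2], 'b' => [4, 6]];
echo distance($points['a'], $points['b']); // 5
```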
