Get offset in combinations without generating all possible combinations - php

I'm trying to get a specific combination from all possible combinations of a 6 character combination with any characters a-z,0-9. There are a few hundred million possible combinations and I can find the offset when generating the combinations through recursion but I would much rather use an algorithm that doesn't need to store all preceding combinations to arrive at the given offset. I'm not a math guy and I figure this is pretty mathie. Any advice would be greatly appreciated... Thanks
I found this question: Find the index of a specific combination without generating all ncr combinations
.. but I'm not adept at C# and not sure if I can correctly translate it to PHP.

I'm not a PHP-guy but hopefully a JavaScript code with no advanced magic could be our common ground. So first here is some code:
const defaultChars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
function indexToPermutation(index, targetLength, chars) {
if (arguments.length < 3)
chars = defaultChars;
const charsLen = chars.length;
let combinationChars = [];
for (let i = 0; i < targetLength; i++) {
let ch = chars[index % charsLen];
combinationChars.push(ch);
index = Math.floor(index / charsLen); //integer division
}
return combinationChars.reverse().join("")
}
function permutationToIndex(combination, chars) {
if (arguments.length < 2)
chars = defaultChars;
const charsLen = chars.length;
let index = 0;
let combinationChars = combination.split("");
for (let i = 0; i < combination.length; i++) {
let digit = chars.indexOf(combinationChars[i]);
index = index * charsLen + digit;
}
return index;
}
function indexToCombination(index, targetLength, chars) {
if (arguments.length < 3)
chars = defaultChars;
let base = chars.length - targetLength + 1;
let combinationIndices = [];
for (let i = 0; i < targetLength; i++) {
let digit = index % base;
combinationIndices.push(digit);
index = Math.floor(index / base); //integer division
base++;
}
let combinationChars = [];
for (let i = targetLength - 1; i >= 0; i--) {
let ch = chars[combinationIndices[i]];
combinationChars.push(ch);
// here it is important that chars is only local copy rather than global variable
chars = chars.slice(0, combinationIndices[i]) + chars.slice(combinationIndices[i] + 1); // effectively chars.removeAt(combinationIndices[i])
}
return combinationChars.join("")
}
function combinationToIndex(combination, chars) {
if (arguments.length < 2)
chars = defaultChars;
const charLength = chars.length; // full length before removing!
let combinationChars = combination.split("");
let digits = [];
for (let i = 0; i < combination.length; i++) {
let digit = chars.indexOf(combinationChars[i]);
// here it is important that chars is only local copy rather than global variable
chars = chars.slice(0, digit) + chars.slice(digit + 1); // effectively chars.removeAt(digit)
digits.push(digit);
}
let index = 0;
let base = charLength;
for (let i = 0; i < combination.length; i++) {
index = index * base + digits[i];
base--;
}
return index;
}
and here are some results
indexToPermutation(0, 6) => "000000"
indexToPermutation(1, 6) => "000001"
indexToPermutation(123, 6) => "00003F"
permutationToIndex("00003F") => 123
permutationToIndex("000123") => 1371
indexToCombination(0,6) => "012345"
indexToCombination(1,6) => "012346"
indexToCombination(123,6) => "01237Z"
combinationToIndex("01237Z") => 123
combinationToIndex("543210") => 199331519
Obviously if order of digits and letters is different for you, you can change it by either passing explicitly chars (last argument) or changing defaultChars.
Some math behind it
For permutations you may see a string as a number written in a 36-base numeral system (36 = 10 digits + 26 letters) similar to how hexadecimal works. So converting index <=> permutation is actually a simple job of converting to and from that numeral system.
For combinations idea is similar but the numeral system is what I call a "variable-base" numeral system with base changing from 36 to 36-n+1 (where n is the target length of combination). An example of a more familiar "variable-base" numeral system is clock. If you want to convert milliseconds into time, you first divide by 1000 milliseconds, then by 60 seconds, then by 60 minutes, and then by 24 hours. The only really trick part here is that which "digits" are allowed for each position depends on which digits are already used by the previous positions (size of the digits set is always the same nevertheless). This tricky part is the reason chars argument is modified to remove one char after each iteration in both combinationToIndex and indexToCombination.

Related

Convert string to consistent but random 1 of 10 options

I have many strings. Each string something like:
"i_love_pizza_123"
"whatever_this_is_now_later"
"programming_is_awesome"
"stack_overflow_ftw"
...etc
I need to be able to convert each string to a random number, 1-10. Each time that string gets converted, it should consistently be the same number. A sampling of strings, even with similar text should result in a fairly even spread of values 1-10.
My first thought was to do something like md5($string), then break down a-f,0-9 into ten roughly-equal groups, determine where the first character of the hash falls, and put it in that group. But doing so seems to have issues when converting 16 down to 10 by multiplying by 0.625, but that causes the spread to be uneven.
Thoughts on a good method to consistently convert a string to a random/repeatable number, 1-10? There has to be an easier way.
Here's a quick demo how you can do it.
function getOneToTenHash($str) {
$hash = hash('sha256', $str, true);
$unpacked = unpack("L", $hash); // convert first 4 bytes of hash to 32-bit unsigned int
$val = $unpacked[1];
return ($val % 10) + 1; // get 1 - 10 value
}
for ($i = 0; $i < 100; $i++) {
echo getOneToTenHash('str' . $i) . "\n";
}
How it works:
Basically you get the output of a hash function and downscale it to desired range (1..10 in this case).
In the example above, I used sha256 hash function which returns 32 bytes of arbitrary binary data. Then I extract just first 4 bytes as integer value (unpack()).
At this point I have a 4 bytes integer value (0..4294967295 range). In order to downscale it to 1..10 range I just take the remainder of division by 10 (0..9) and add 1.
It's not the only way to downscale the range but an easy one.
So, the above example consists of 3 steps:
get the hash value
convert the hash value to integer
downscale integer range
A much shorter example with crc32() function which returns integer value right away thus allowing us to omit step 2:
function getOneToTenHash($str) {
$int = crc32($str); // 0..4294967295
return ($int % 10) + 1; // 1..10
}
below maybe what u want
$inStr = "hello world";
$md5Str = md5($inStr);
$len = strlen($md5Str);
$out = 0;
for($i=0; $i<$len; $i++) {
$out = 7*$out + intval($md5Str[$i]); // if you want more random, can and random() here
}
$out = ($out % 10 + 9)%10; // scope= [1,10]

How to make shorten URL like bit.ly [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I want to create a URL shortener service where you can write a long URL into an input field and the service shortens the URL to "http://www.example.org/abcdef".
Instead of "abcdef" there can be any other string with six characters containing a-z, A-Z and 0-9. That makes 56~57 billion possible strings.
My approach:
I have a database table with three columns:
id, integer, auto-increment
long, string, the long URL the user entered
short, string, the shortened URL (or just the six characters)
I would then insert the long URL into the table. Then I would select the auto-increment value for "id" and build a hash of it. This hash should then be inserted as "short". But what sort of hash should I build? Hash algorithms like MD5 create too long strings. I don't use these algorithms, I think. A self-built algorithm will work, too.
My idea:
For "http://www.google.de/" I get the auto-increment id 239472. Then I do the following steps:
short = '';
if divisible by 2, add "a"+the result to short
if divisible by 3, add "b"+the result to short
... until I have divisors for a-z and A-Z.
That could be repeated until the number isn't divisible any more. Do you think this is a good approach? Do you have a better idea?
Due to the ongoing interest in this topic, I've published an efficient solution to GitHub, with implementations for JavaScript, PHP, Python and Java. Add your solutions if you like :)
I would continue your "convert number to string" approach. However, you will realize that your proposed algorithm fails if your ID is a prime and greater than 52.
Theoretical background
You need a Bijective Function f. This is necessary so that you can find a inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:
There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
and for every y you must be able to find an x so that f(x) = y.
How to convert the ID to a shortened URL
Think of an alphabet we want to use. In your case, that's [a-zA-Z0-9]. It contains 62 letters.
Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table for example).
For this example, I will use 12510 (125 with a base of 10).
Now you have to convert 12510 to X62 (base 62).
12510 = 2×621 + 1×620 = [2,1]
This requires the use of integer division and modulo. A pseudo-code example:
digits = []
while num > 0
remainder = modulo(num, 62)
digits.push(remainder)
num = divide(num, 62)
digits = digits.reverse
Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array for example) could look like:
0 → a
1 → b
...
25 → z
...
52 → 0
61 → 9
With 2 → c and 1 → b, you will receive cb62 as the shortened URL.
http://shor.ty/cb
How to resolve a shortened URL to the initial ID
The reverse is even easier. You just do a reverse lookup in your alphabet.
e9a62 will be resolved to "4th, 61st, and 0th letter in the alphabet".
e9a62 = [4,61,0] = 4×622 + 61×621 + 0×620 = 1915810
Now find your database-record with WHERE id = 19158 and do the redirect.
Example implementations (provided by commenters)
C++
Python
Ruby
Haskell
C#
CoffeeScript
Perl
Why would you want to use a hash?
You can just use a simple translation of your auto-increment value to an alphanumeric value. You can do that easily by using some base conversion. Say you character space (A-Z, a-z, 0-9, etc.) has 62 characters, convert the id to a base-40 number and use the characters as the digits.
public class UrlShortener {
private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private static final int BASE = ALPHABET.length();
public static String encode(int num) {
StringBuilder sb = new StringBuilder();
while ( num > 0 ) {
sb.append( ALPHABET.charAt( num % BASE ) );
num /= BASE;
}
return sb.reverse().toString();
}
public static int decode(String str) {
int num = 0;
for ( int i = 0; i < str.length(); i++ )
num = num * BASE + ALPHABET.indexOf(str.charAt(i));
return num;
}
}
Not an answer to your question, but I wouldn't use case-sensitive shortened URLs. They are hard to remember, usually unreadable (many fonts render 1 and l, 0 and O and other characters very very similar that they are near impossible to tell the difference) and downright error prone. Try to use lower or upper case only.
Also, try to have a format where you mix the numbers and characters in a predefined form. There are studies that show that people tend to remember one form better than others (think phone numbers, where the numbers are grouped in a specific form). Try something like num-char-char-num-char-char. I know this will lower the combinations, especially if you don't have upper and lower case, but it would be more usable and therefore useful.
My approach: Take the Database ID, then Base36 Encode it. I would NOT use both Upper AND Lowercase letters, because that makes transmitting those URLs over the telephone a nightmare, but you could of course easily extend the function to be a base 62 en/decoder.
Here is my PHP 5 class.
<?php
class Bijective
{
public $dictionary = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
public function __construct()
{
$this->dictionary = str_split($this->dictionary);
}
public function encode($i)
{
if ($i == 0)
return $this->dictionary[0];
$result = '';
$base = count($this->dictionary);
while ($i > 0)
{
$result[] = $this->dictionary[($i % $base)];
$i = floor($i / $base);
}
$result = array_reverse($result);
return join("", $result);
}
public function decode($input)
{
$i = 0;
$base = count($this->dictionary);
$input = str_split($input);
foreach($input as $char)
{
$pos = array_search($char, $this->dictionary);
$i = $i * $base + $pos;
}
return $i;
}
}
A Node.js and MongoDB solution
Since we know the format that MongoDB uses to create a new ObjectId with 12 bytes.
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id
a 3-byte counter (in your machine), starting with a random value.
Example (I choose a random sequence)
a1b2c3d4e5f6g7h8i9j1k2l3
a1b2c3d4 represents the seconds since the Unix epoch,
4e5f6g7 represents machine identifier,
h8i9 represents process id
j1k2l3 represents the counter, starting with a random value.
Since the counter will be unique if we are storing the data in the same machine we can get it with no doubts that it will be duplicate.
So the short URL will be the counter and here is a code snippet assuming that your server is running properly.
const mongoose = require('mongoose');
const Schema = mongoose.Schema;
// Create a schema
const shortUrl = new Schema({
long_url: { type: String, required: true },
short_url: { type: String, required: true, unique: true },
});
const ShortUrl = mongoose.model('ShortUrl', shortUrl);
// The user can request to get a short URL by providing a long URL using a form
app.post('/shorten', function(req ,res){
// Create a new shortUrl */
// The submit form has an input with longURL as its name attribute.
const longUrl = req.body["longURL"];
const newUrl = ShortUrl({
long_url : longUrl,
short_url : "",
});
const shortUrl = newUrl._id.toString().slice(-6);
newUrl.short_url = shortUrl;
console.log(newUrl);
newUrl.save(function(err){
console.log("the new URL is added");
})
});
I keep incrementing an integer sequence per domain in the database and use Hashids to encode the integer into a URL path.
static hashids = Hashids(salt = "my app rocks", minSize = 6)
I ran a script to see how long it takes until it exhausts the character length. For six characters it can do 164,916,224 links and then goes up to seven characters. Bitly uses seven characters. Under five characters looks weird to me.
Hashids can decode the URL path back to a integer but a simpler solution is to use the entire short link sho.rt/ka8ds3 as a primary key.
Here is the full concept:
function addDomain(domain) {
table("domains").insert("domain", domain, "seq", 0)
}
function addURL(domain, longURL) {
seq = table("domains").where("domain = ?", domain).increment("seq")
shortURL = domain + "/" + hashids.encode(seq)
table("links").insert("short", shortURL, "long", longURL)
return shortURL
}
// GET /:hashcode
function handleRequest(req, res) {
shortURL = req.host + "/" + req.param("hashcode")
longURL = table("links").where("short = ?", shortURL).get("long")
res.redirect(301, longURL)
}
C# version:
public class UrlShortener
{
private static String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private static int BASE = 62;
public static String encode(int num)
{
StringBuilder sb = new StringBuilder();
while ( num > 0 )
{
sb.Append( ALPHABET[( num % BASE )] );
num /= BASE;
}
StringBuilder builder = new StringBuilder();
for (int i = sb.Length - 1; i >= 0; i--)
{
builder.Append(sb[i]);
}
return builder.ToString();
}
public static int decode(String str)
{
int num = 0;
for ( int i = 0, len = str.Length; i < len; i++ )
{
num = num * BASE + ALPHABET.IndexOf( str[(i)] );
}
return num;
}
}
You could hash the entire URL, but if you just want to shorten the id, do as marcel suggested. I wrote this Python implementation:
https://gist.github.com/778542
Take a look at https://hashids.org/ it is open source and in many languages.
Their page outlines some of the pitfalls of other approaches.
If you don't want re-invent the wheel ... http://lilurl.sourceforge.net/
// simple approach
$original_id = 56789;
$shortened_id = base_convert($original_id, 10, 36);
$un_shortened_id = base_convert($shortened_id, 36, 10);
alphabet = map(chr, range(97,123)+range(65,91)) + map(str,range(0,10))
def lookup(k, a=alphabet):
if type(k) == int:
return a[k]
elif type(k) == str:
return a.index(k)
def encode(i, a=alphabet):
'''Takes an integer and returns it in the given base with mappings for upper/lower case letters and numbers 0-9.'''
try:
i = int(i)
except Exception:
raise TypeError("Input must be an integer.")
def incode(i=i, p=1, a=a):
# Here to protect p.
if i <= 61:
return lookup(i)
else:
pval = pow(62,p)
nval = i/pval
remainder = i % pval
if nval <= 61:
return lookup(nval) + incode(i % pval)
else:
return incode(i, p+1)
return incode()
def decode(s, a=alphabet):
'''Takes a base 62 string in our alphabet and returns it in base10.'''
try:
s = str(s)
except Exception:
raise TypeError("Input must be a string.")
return sum([lookup(i) * pow(62,p) for p,i in enumerate(list(reversed(s)))])a
Here's my version for whomever needs it.
Why not just translate your id to a string? You just need a function that maps a digit between, say, 0 and 61 to a single letter (upper/lower case) or digit. Then apply this to create, say, 4-letter codes, and you've got 14.7 million URLs covered.
Here is a decent URL encoding function for PHP...
// From http://snipplr.com/view/22246/base62-encode--decode/
private function base_encode($val, $base=62, $chars='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ') {
$str = '';
do {
$i = fmod($val, $base);
$str = $chars[$i] . $str;
$val = ($val - $i) / $base;
} while($val > 0);
return $str;
}
Don't know if anyone will find this useful - it is more of a 'hack n slash' method, yet is simple and works nicely if you want only specific chars.
$dictionary = "abcdfghjklmnpqrstvwxyz23456789";
$dictionary = str_split($dictionary);
// Encode
$str_id = '';
$base = count($dictionary);
while($id > 0) {
$rem = $id % $base;
$id = ($id - $rem) / $base;
$str_id .= $dictionary[$rem];
}
// Decode
$id_ar = str_split($str_id);
$id = 0;
for($i = count($id_ar); $i > 0; $i--) {
$id += array_search($id_ar[$i-1], $dictionary) * pow($base, $i - 1);
}
Did you omit O, 0, and i on purpose?
I just created a PHP class based on Ryan's solution.
<?php
$shorty = new App_Shorty();
echo 'ID: ' . 1000;
echo '<br/> Short link: ' . $shorty->encode(1000);
echo '<br/> Decoded Short Link: ' . $shorty->decode($shorty->encode(1000));
/**
* A nice shorting class based on Ryan Charmley's suggestion see the link on Stack Overflow below.
* #author Svetoslav Marinov (Slavi) | http://WebWeb.ca
* #see http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener/10386945#10386945
*/
class App_Shorty {
/**
* Explicitly omitted: i, o, 1, 0 because they are confusing. Also use only lowercase ... as
* dictating this over the phone might be tough.
* #var string
*/
private $dictionary = "abcdfghjklmnpqrstvwxyz23456789";
private $dictionary_array = array();
public function __construct() {
$this->dictionary_array = str_split($this->dictionary);
}
/**
* Gets ID and converts it into a string.
* #param int $id
*/
public function encode($id) {
$str_id = '';
$base = count($this->dictionary_array);
while ($id > 0) {
$rem = $id % $base;
$id = ($id - $rem) / $base;
$str_id .= $this->dictionary_array[$rem];
}
return $str_id;
}
/**
* Converts /abc into an integer ID
* #param string
* #return int $id
*/
public function decode($str_id) {
$id = 0;
$id_ar = str_split($str_id);
$base = count($this->dictionary_array);
for ($i = count($id_ar); $i > 0; $i--) {
$id += array_search($id_ar[$i - 1], $this->dictionary_array) * pow($base, $i - 1);
}
return $id;
}
}
?>
public class TinyUrl {
private final String characterMap = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private final int charBase = characterMap.length();
public String covertToCharacter(int num){
StringBuilder sb = new StringBuilder();
while (num > 0){
sb.append(characterMap.charAt(num % charBase));
num /= charBase;
}
return sb.reverse().toString();
}
public int covertToInteger(String str){
int num = 0;
for(int i = 0 ; i< str.length(); i++)
num += characterMap.indexOf(str.charAt(i)) * Math.pow(charBase , (str.length() - (i + 1)));
return num;
}
}
class TinyUrlTest{
public static void main(String[] args) {
TinyUrl tinyUrl = new TinyUrl();
int num = 122312215;
String url = tinyUrl.covertToCharacter(num);
System.out.println("Tiny url: " + url);
System.out.println("Id: " + tinyUrl.covertToInteger(url));
}
}
This is what I use:
# Generate a [0-9a-zA-Z] string
ALPHABET = map(str,range(0, 10)) + map(chr, range(97, 123) + range(65, 91))
def encode_id(id_number, alphabet=ALPHABET):
"""Convert an integer to a string."""
if id_number == 0:
return alphabet[0]
alphabet_len = len(alphabet) # Cache
result = ''
while id_number > 0:
id_number, mod = divmod(id_number, alphabet_len)
result = alphabet[mod] + result
return result
def decode_id(id_string, alphabet=ALPHABET):
"""Convert a string to an integer."""
alphabet_len = len(alphabet) # Cache
return sum([alphabet.index(char) * pow(alphabet_len, power) for power, char in enumerate(reversed(id_string))])
It's very fast and can take long integers.
For a similar project, to get a new key, I make a wrapper function around a random string generator that calls the generator until I get a string that hasn't already been used in my hashtable. This method will slow down once your name space starts to get full, but as you have said, even with only 6 characters, you have plenty of namespace to work with.
I have a variant of the problem, in that I store web pages from many different authors and need to prevent discovery of pages by guesswork. So my short URLs add a couple of extra digits to the Base-62 string for the page number. These extra digits are generated from information in the page record itself and they ensure that only 1 in 3844 URLs are valid (assuming 2-digit Base-62). You can see an outline description at http://mgscan.com/MBWL.
Very good answer, I have created a Golang implementation of the bjf:
package bjf
import (
"math"
"strings"
"strconv"
)
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
func Encode(num string) string {
n, _ := strconv.ParseUint(num, 10, 64)
t := make([]byte, 0)
/* Special case */
if n == 0 {
return string(alphabet[0])
}
/* Map */
for n > 0 {
r := n % uint64(len(alphabet))
t = append(t, alphabet[r])
n = n / uint64(len(alphabet))
}
/* Reverse */
for i, j := 0, len(t) - 1; i < j; i, j = i + 1, j - 1 {
t[i], t[j] = t[j], t[i]
}
return string(t)
}
func Decode(token string) int {
r := int(0)
p := float64(len(token)) - 1
for i := 0; i < len(token); i++ {
r += strings.Index(alphabet, string(token[i])) * int(math.Pow(float64(len(alphabet)), p))
p--
}
return r
}
Hosted at github: https://github.com/xor-gate/go-bjf
Implementation in Scala:
class Encoder(alphabet: String) extends (Long => String) {
val Base = alphabet.size
override def apply(number: Long) = {
def encode(current: Long): List[Int] = {
if (current == 0) Nil
else (current % Base).toInt :: encode(current / Base)
}
encode(number).reverse
.map(current => alphabet.charAt(current)).mkString
}
}
class Decoder(alphabet: String) extends (String => Long) {
val Base = alphabet.size
override def apply(string: String) = {
def decode(current: Long, encodedPart: String): Long = {
if (encodedPart.size == 0) current
else decode(current * Base + alphabet.indexOf(encodedPart.head),encodedPart.tail)
}
decode(0,string)
}
}
Test example with Scala test:
import org.scalatest.{FlatSpec, Matchers}
class DecoderAndEncoderTest extends FlatSpec with Matchers {
val Alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
"A number with base 10" should "be correctly encoded into base 62 string" in {
val encoder = new Encoder(Alphabet)
encoder(127) should be ("cd")
encoder(543513414) should be ("KWGPy")
}
"A base 62 string" should "be correctly decoded into a number with base 10" in {
val decoder = new Decoder(Alphabet)
decoder("cd") should be (127)
decoder("KWGPy") should be (543513414)
}
}
Function based in Xeoncross Class
function shortly($input){
$dictionary = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','0','1','2','3','4','5','6','7','8','9'];
if($input===0)
return $dictionary[0];
$base = count($dictionary);
if(is_numeric($input)){
$result = [];
while($input > 0){
$result[] = $dictionary[($input % $base)];
$input = floor($input / $base);
}
return join("", array_reverse($result));
}
$i = 0;
$input = str_split($input);
foreach($input as $char){
$pos = array_search($char, $dictionary);
$i = $i * $base + $pos;
}
return $i;
}
Here is a Node.js implementation that is likely to bit.ly. generate a highly random seven-character string.
It uses Node.js crypto to generate a highly random 25 charset rather than randomly selecting seven characters.
var crypto = require("crypto");
exports.shortURL = new function () {
this.getShortURL = function () {
var sURL = '',
_rand = crypto.randomBytes(25).toString('hex'),
_base = _rand.length;
for (var i = 0; i < 7; i++)
sURL += _rand.charAt(Math.floor(Math.random() * _rand.length));
return sURL;
};
}
My Python 3 version
base_list = list("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
base = len(base_list)
def encode(num: int):
result = []
if num == 0:
result.append(base_list[0])
while num > 0:
result.append(base_list[num % base])
num //= base
print("".join(reversed(result)))
def decode(code: str):
num = 0
code_list = list(code)
for index, code in enumerate(reversed(code_list)):
num += base_list.index(code) * base ** index
print(num)
if __name__ == '__main__':
encode(341413134141)
decode("60FoItT")
For a quality Node.js / JavaScript solution, see the id-shortener module, which is thoroughly tested and has been used in production for months.
It provides an efficient id / URL shortener backed by pluggable storage defaulting to Redis, and you can even customize your short id character set and whether or not shortening is idempotent. This is an important distinction that not all URL shorteners take into account.
In relation to other answers here, this module implements the Marcel Jackwerth's excellent accepted answer above.
The core of the solution is provided by the following Redis Lua snippet:
local sequence = redis.call('incr', KEYS[1])
local chars = '0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz'
local remaining = sequence
local slug = ''
while (remaining > 0) do
local d = (remaining % 60)
local character = string.sub(chars, d + 1, d + 1)
slug = character .. slug
remaining = (remaining - d) / 60
end
redis.call('hset', KEYS[2], slug, ARGV[1])
return slug
Why not just generate a random string and append it to the base URL? This is a very simplified version of doing this in C#.
static string chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
static string baseUrl = "https://google.com/";
private static string RandomString(int length)
{
char[] s = new char[length];
Random rnd = new Random();
for (int x = 0; x < length; x++)
{
s[x] = chars[rnd.Next(chars.Length)];
}
Thread.Sleep(10);
return new String(s);
}
Then just add the append the random string to the baseURL:
string tinyURL = baseUrl + RandomString(5);
Remember this is a very simplified version of doing this and it's possible the RandomString method could create duplicate strings. In production you would want to take in account for duplicate strings to ensure you will always have a unique URL. I have some code that takes account for duplicate strings by querying a database table I could share if anyone is interested.
This is my initial thoughts, and more thinking can be done, or some simulation can be made to see if it works well or any improvement is needed:
My answer is to remember the long URL in the database, and use the ID 0 to 9999999999999999 (or however large the number is needed).
But the ID 0 to 9999999999999999 can be an issue, because
it can be shorter if we use hexadecimal, or even base62 or base64. (base64 just like YouTube using A-Z a-z 0-9 _ and -)
if it increases from 0 to 9999999999999999 uniformly, then hackers can visit them in that order and know what URLs people are sending each other, so it can be a privacy issue
We can do this:
have one server allocate 0 to 999 to one server, Server A, so now Server A has 1000 of such IDs. So if there are 20 or 200 servers constantly wanting new IDs, it doesn't have to keep asking for each new ID, but rather asking once for 1000 IDs
for the ID 1, for example, reverse the bits. So 000...00000001 becomes 10000...000, so that when converted to base64, it will be non-uniformly increasing IDs each time.
use XOR to flip the bits for the final IDs. For example, XOR with 0xD5AA96...2373 (like a secret key), and the some bits will be flipped. (whenever the secret key has the 1 bit on, it will flip the bit of the ID). This will make the IDs even harder to guess and appear more random
Following this scheme, the single server that allocates the IDs can form the IDs, and so can the 20 or 200 servers requesting the allocation of IDs. The allocating server has to use a lock / semaphore to prevent two requesting servers from getting the same batch (or if it is accepting one connection at a time, this already solves the problem). So we don't want the line (queue) to be too long for waiting to get an allocation. So that's why allocating 1000 or 10000 at a time can solve the issue.

Store a playing cards deck in MySQL (single column)

I'm doing a game with playing cards and have to store shuffled decks in MySQL.
What is the most efficient way to store a deck of 52 cards in a single column? And save/retrieve those using PHP.
I need 6 bits to represent a number from 0 to 52 and thus thought of saving the deck as binary data but I've tried using PHP's pack function without much luck. My best shot is saving a string of 104 characters (52 zero-padded integers) but that's far from optimal.
Thanks.
I do agree that it's not necessary or rather impractical to do any of this, but for fun, why not ;)
From your question I'm not sure if you aim to have all cards encoded as one value and stored accordingly or whether you want to encode cards individually. So I assume the former.
Further I assume you have a set 52 cards (or items) that you represent with an integer value between 1 and 52.
I see some approaches, outlined as follows, not all actually for the better of using less space, but included for the sake of being complete:
using a comma separated list (CSV), total length of 9+2*42+51 = 144 characters
turning each card into a character ie a card is represented with 8 bits, total length of 52 characters
encoding each card with those necessary 6 bits and concatenating just the bits without the otherwise lost 2 bits (as in the 2nd approach), total length of 39 characters
treat the card-ids as coefficients in a polynomial of form p(cards) = cards(1)*52^51 + cards(2)*52^50 + cards(3)*52^49 + ... + cards(52)*52^0 which we use to identify the card-set. Roughly speaking p(cards) must lie in the value range of [0,52^52], which means that the value can be represented with log(52^52)/log(2) = 296.422865343.. bits or with a byte sequence of length 37.052 respectively 38.
There naturally are further approaches, taking into account mere practical, technical or theoretical aspects, as is also visible through the listed approaches.
For the theoretical approaches (which I consider the most interesting) it is helpful to know a bit about information theory and entropy. Essentially, depending on what is known about a problem, no further information is required, respectively only the information to clarify all remaining uncertainty is needed.
As we are working with bits and bytes, it mostly is interesting for us in terms of memory usage which practically speaking is bit- or byte-based (if you consider only bit-sequences without the underlying technologies and hardware); that is, bits represent one of two states, ergo allow the differentiation of two states. This is trivial but important, actually :/
then, if you want to represent N states in a base B, you will need log(N) / log(B) digits in that base, or in your example log(52) / log(2) = 5.70.. -> 6 bits. you will notice that actually only 5.70.. bits would be required, which means with 6 bits we actually have a loss.
Which is the moment the problem transformation comes in: instead of representing 52 states individually, the card set as a whole can be represented. The polynomial approach is a way to do this. Essentially it works as it assumes a base of 52, ie where the card set is represented as 1'4'29'31.... or mathematically speaking: 52 + 1 = 1'1 == 53 decimal, 1'1 + 1 = 1'2 == 54 decimal, 1'52'52 + 1 = 2'1'1 == 5408 decimal,
But if you further look at the polynomial-approach you will notice that there is a total of 52^52 possible values whereas we would only ever use 52! = 1*2*3*...*52 because once a card is fixed the remaining possibilities decrease, respectively the uncertainty or entropy decreases. (please note that 52! / 52^52 = 4.7257911e-22 ! which means the polynomial is a total waste of space).
If we now were to use a value in [1,52!] which is pretty much the theoretical minimum, we could represent the card set with log(52!) / log(2) = 225.581003124.. bits = 28.1976.. bytes. Problem with that is, that any of the values represented as such does not contain any structure from which we can derive its semantics, which means that for each of the 52! possible values (well 52! - 1, if you consider the principle of exclusion) we need a reference of its meaning, ie a lookup table of 52! values and that would certainly be a memory overkill.
Although we can make a compromise with the knowledge of the decreasing entropy of an encoded ordered set. As an example: we sequentially encode each card with the minimum number of bits required at that point in the sequence. So assume N<=52 cards remain, then in each step a card can be represented in log(N)/log(2) bits, meaning that the number of required bits decreases, until for the last card, you don't need a bit in the first place. This would give about (please correct)..
20 * 6 bits + 16 * 5 bits + 8 * 4 bits + 4 * 3 bits + 2 * 2 bits + 1 bit = 249 bits = 31.125.. bytes
But still there would be a loss because of the partial bits used unnecessarily, but the structure in the data totally makes up for that.
So a question might be, hej can we combine the polynomial with this??!?11?! Actually, I have to think about that, I'm getting tired.
Generally speaking, knowing about the structure of a problem drastically helps decreasing the necessary memory space. Practically speaking, in this day and age, for your average high-level developer such low level considerations are not so important (hej, 100kByte of wasted space, so what!) and other considerations are weighted higher; also because the underlying technologies are often reducing memory usage by themselves, be it your filesystem or gzip-ed web-server responses, etc. The general knowledge of these kind of things though is still helpful in creating your services and datastructures.
But these latter approaches are very problem-specific "compression procedures"; general compression works differently, where as an example approach the procedures sequencially run through the bytes of the data and for any unseen bit sequences add these to a lookup table and represent the actual sequence with an index (as a sketch).
Well enough of funny talk, let's get technical!
1st approach "csv"
// your unpacked card set
$cards = range(1,52);
$coded = implode(',',$cards);
$decoded = explode(',',$coded);
2nd approach: 1 card = 1 character
// just a helper
// (not really necessary, but using this we can pretty print the resulting string)
function chr2($i)
{
return chr($i + 32);
}
function myPack($cards)
{
$ar = array_map('chr2',$cards);
return implode('',$ar);
}
function myUnpack($str)
{
$set = array();
$len = strlen($str);
for($i=0; $i<$len; $i++)
$set[] = ord($str[$i]) - 32; // adjust this shift along with the helper
return $set;
}
$str = myPack($cards);
$other_cards = myUnpack($str);
3rd approach, 1 card = 6 bits
$set = ''; // target string
$offset = 0;
$carry = 0;
for($i=0; $i < 52; $i++)
{
$c = $cards[$i];
switch($offset)
{
case 0:
$carry = ($c << 2);
$next = null;
break;
case 2:
$next = $carry + $c;
$carry = 0;
break;
case 4:
$next = $carry + ($c>>2);
$carry = ($c << 6) & 0xff;
break;
case 6:
$next = $carry + ($c>>4);
$carry = ($c << 4) & 0xff;
break;
}
if ($next !== null)
{
$set .= chr($next);
}
$offset = ($offset + 6) % 8;
}
// and $set it is!
$new_cards = array(); // the target array for cards to be unpacked
$offset = 0;
$carry = 0;
for($i=0; $i < 39; $i++)
{
$o = ord(substr($set,$i,1));
$new = array();
switch($offset)
{
case 0:
$new[] = ($o >> 2) & 0x3f;
$carry = ($o << 4) & 0x3f;
break;
case 4:
$new[] = (($o >> 6) & 3) + $carry;
$new[] = $o & 0x3f;
$carry = 0;
$offset += 6;
break;
case 6:
$new[] = (($o >> 4) & 0xf) + $carry;
$carry = ($o & 0xf) << 2;
break;
}
$new_cards = array_merge($new_cards,$new);
$offset = ($offset + 6) % 8;
}
4th approach, the polynomial, just outlined (please consider using bigints because of the integer overflow)
$encoded = 0;
$base = 52;
foreach($cards as $c)
{
$encoded = $encoded*$base + $c;
}
// and now save the binary representation
$decoded = array();
for($i=0; $i < 52; $i++)
{
$v = $encoded % $base;
$encoded = ($encoded - $v) / $base;
array_shift($v, $decoded);
}

php How to make a proper hash function that will handle given strings

I want to create a hash function that will receive strings and output the corresponding value in an array that has predefined "proportions". For instance, if my array holds the values:
[0] => "output number 1"
[1] => "output number 2"
[2] => "output number 3"
Then the hash function int H(string) should return only values in the range 0 and 2 for any given string (an input string will always return the same key).
The thing is that i want it to also make judgement by predefined proportions so, for instance
85% of given strings will hash out as 0, 10% as 1 and 5% as 2. If there are functions that can emulate normal distribution that will be even better.
I also want it to be fast as it will run frequently. Can someone point me to the right direction on how to approach this in php? I believe I'm not the first one that asked this but I came short digging on SO for an hour.
EDIT:
What i did until now is built a hash function in c. It does the above hashing without proportions (still not comfortable with php):
int StringFcn (const void *key, size_t arrSize)
{
char *str = key;
int totalAsciiVal = 0;
while(*str)
{
totalAsciiVal += *str++;
}
return totalAsciiVal % arrSize;
}
What about doing something like this:
// Hash the string so you can pretty much guarantee it will have a number in it and it is relatively "random"
$hash = sha1($string);
// Iterate through string and get ASCII values
$chars = str_split($hash);
$num = 0;
foreach ($chars as $char) {
$num += ord($int);
}
// Get compare modulo 100 of the number
if ($num % 100 < 85) {
return 0;
}
if ($num % 100 < 95) {
return 1;
}
return 2;
Edit:
Instead of hashing with sha1, you can get a sufficiently large integer directly using crc32 (thanks to #nivrig in the comments).
// Convert string to integer
$num = crc32($string);
// Get compare modulo 100 of the number
if ($num % 100 < 85) {
return 0;
}
if ($num % 100 < 95) {
return 1;
}
return 2;

PHP: How to output list like this: AA, AB, AC, all the way to ZZZY, ZZZZ, ZZZZA etc

I'm trying to write a function that'll convert an integer to a string like this, but I can't figure out the logic... :(
1 = a
5 = e
27 = aa
28 = ab
etc...
Can anyone help? I'm really niffed that I can't wrap my head around how to write this... :(
Long list of them here:
/*
* Convert an integer to a string of uppercase letters (A-Z, AA-ZZ, AAA-ZZZ, etc.)
*/
function num2alpha($n)
{
for($r = ""; $n >= 0; $n = intval($n / 26) - 1)
$r = chr($n%26 + 0x41) . $r;
return $r;
}
/*
* Convert a string of uppercase letters to an integer.
*/
function alpha2num($a)
{
$l = strlen($a);
$n = 0;
for($i = 0; $i < $l; $i++)
$n = $n*26 + ord($a[$i]) - 0x40;
return $n-1;
}
I'll add this answer to sum up the comments regarding the misuse of base-26.
A common first reaction when confronted with this problem is to think "There are 26 letters, so this must be base-26! All I need to do is map each letter to its corresponding number".
But this is not base-26. It's easy to see why: there is no zero!
In base-26, the number twenty-six is the first number with two digits, and is written "10". In this counting system, twenty-six has a single digit, "Z", and the first two-digit number is twenty-seven.
But what if we make A=0, ..., Z=25? This way we have a zero and the first two-digit number becomes twenty-six. So far so good. How do we write twenty-six now? That's "AA". But... isn't A=0? Ooops! A = AA = AAA = "0" = "00" = "000".
You will have to use base_convert to convert your numbers to a 26 base:
base_convert(35, 10, 26);
That gives you the individual components in numbers from 1 - p, so 35 becomes 19 (1 * 26 + 9). Then you have to map the individual components to your desired set, so 1 => a, 9 => i, a => j, etc. and 19 becomes ai.
Well, you're pretty much converting from base 10 to base 26. Base 10 has digits 0-9, whereas base 26 can be expressed with "digits" A-Z. Conversion from base-10 is easy - see e.g. this: http://www.mathsisfun.com/base-conversion-method.html
Edit: actually, base-26 fails to account for multiple equivalent ways to write 0 ( 0 = 00 = 000).
void convert(int number)
{
string str = "";
while(number)
{
char ch;
ch = (number - 1) % 26 + 65;
str = ch + str;
number = (number-1) / 26;
}
cout << str << endl;
}

Categories