Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I want to create a URL shortener service where you can write a long URL into an input field and the service shortens the URL to "http://www.example.org/abcdef".
Instead of "abcdef" there can be any other string with six characters containing a-z, A-Z and 0-9. That makes 56~57 billion possible strings.
My approach:
I have a database table with three columns:
id, integer, auto-increment
long, string, the long URL the user entered
short, string, the shortened URL (or just the six characters)
I would then insert the long URL into the table. Then I would select the auto-increment value for "id" and build a hash of it. This hash should then be inserted as "short". But what sort of hash should I build? Hash algorithms like MD5 create too long strings. I don't use these algorithms, I think. A self-built algorithm will work, too.
My idea:
For "http://www.google.de/" I get the auto-increment id 239472. Then I do the following steps:
short = '';
if divisible by 2, add "a"+the result to short
if divisible by 3, add "b"+the result to short
... until I have divisors for a-z and A-Z.
That could be repeated until the number isn't divisible any more. Do you think this is a good approach? Do you have a better idea?
Due to the ongoing interest in this topic, I've published an efficient solution to GitHub, with implementations for JavaScript, PHP, Python and Java. Add your solutions if you like :)
I would continue your "convert number to string" approach. However, you will realize that your proposed algorithm fails if your ID is a prime and greater than 52.
Theoretical background
You need a Bijective Function f. This is necessary so that you can find a inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:
There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
and for every y you must be able to find an x so that f(x) = y.
How to convert the ID to a shortened URL
Think of an alphabet we want to use. In your case, that's [a-zA-Z0-9]. It contains 62 letters.
Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table for example).
For this example, I will use 12510 (125 with a base of 10).
Now you have to convert 12510 to X62 (base 62).
12510 = 2×621 + 1×620 = [2,1]
This requires the use of integer division and modulo. A pseudo-code example:
digits = []
while num > 0
remainder = modulo(num, 62)
digits.push(remainder)
num = divide(num, 62)
digits = digits.reverse
Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array for example) could look like:
0 → a
1 → b
...
25 → z
...
52 → 0
61 → 9
With 2 → c and 1 → b, you will receive cb62 as the shortened URL.
http://shor.ty/cb
How to resolve a shortened URL to the initial ID
The reverse is even easier. You just do a reverse lookup in your alphabet.
e9a62 will be resolved to "4th, 61st, and 0th letter in the alphabet".
e9a62 = [4,61,0] = 4×622 + 61×621 + 0×620 = 1915810
Now find your database-record with WHERE id = 19158 and do the redirect.
Example implementations (provided by commenters)
C++
Python
Ruby
Haskell
C#
CoffeeScript
Perl
Why would you want to use a hash?
You can just use a simple translation of your auto-increment value to an alphanumeric value. You can do that easily by using some base conversion. Say you character space (A-Z, a-z, 0-9, etc.) has 62 characters, convert the id to a base-40 number and use the characters as the digits.
public class UrlShortener {
private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private static final int BASE = ALPHABET.length();
public static String encode(int num) {
StringBuilder sb = new StringBuilder();
while ( num > 0 ) {
sb.append( ALPHABET.charAt( num % BASE ) );
num /= BASE;
}
return sb.reverse().toString();
}
public static int decode(String str) {
int num = 0;
for ( int i = 0; i < str.length(); i++ )
num = num * BASE + ALPHABET.indexOf(str.charAt(i));
return num;
}
}
Not an answer to your question, but I wouldn't use case-sensitive shortened URLs. They are hard to remember, usually unreadable (many fonts render 1 and l, 0 and O and other characters very very similar that they are near impossible to tell the difference) and downright error prone. Try to use lower or upper case only.
Also, try to have a format where you mix the numbers and characters in a predefined form. There are studies that show that people tend to remember one form better than others (think phone numbers, where the numbers are grouped in a specific form). Try something like num-char-char-num-char-char. I know this will lower the combinations, especially if you don't have upper and lower case, but it would be more usable and therefore useful.
My approach: Take the Database ID, then Base36 Encode it. I would NOT use both Upper AND Lowercase letters, because that makes transmitting those URLs over the telephone a nightmare, but you could of course easily extend the function to be a base 62 en/decoder.
Here is my PHP 5 class.
<?php
class Bijective
{
public $dictionary = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
public function __construct()
{
$this->dictionary = str_split($this->dictionary);
}
public function encode($i)
{
if ($i == 0)
return $this->dictionary[0];
$result = '';
$base = count($this->dictionary);
while ($i > 0)
{
$result[] = $this->dictionary[($i % $base)];
$i = floor($i / $base);
}
$result = array_reverse($result);
return join("", $result);
}
public function decode($input)
{
$i = 0;
$base = count($this->dictionary);
$input = str_split($input);
foreach($input as $char)
{
$pos = array_search($char, $this->dictionary);
$i = $i * $base + $pos;
}
return $i;
}
}
A Node.js and MongoDB solution
Since we know the format that MongoDB uses to create a new ObjectId with 12 bytes.
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id
a 3-byte counter (in your machine), starting with a random value.
Example (I choose a random sequence)
a1b2c3d4e5f6g7h8i9j1k2l3
a1b2c3d4 represents the seconds since the Unix epoch,
4e5f6g7 represents machine identifier,
h8i9 represents process id
j1k2l3 represents the counter, starting with a random value.
Since the counter will be unique if we are storing the data in the same machine we can get it with no doubts that it will be duplicate.
So the short URL will be the counter and here is a code snippet assuming that your server is running properly.
const mongoose = require('mongoose');
const Schema = mongoose.Schema;
// Create a schema
const shortUrl = new Schema({
long_url: { type: String, required: true },
short_url: { type: String, required: true, unique: true },
});
const ShortUrl = mongoose.model('ShortUrl', shortUrl);
// The user can request to get a short URL by providing a long URL using a form
app.post('/shorten', function(req ,res){
// Create a new shortUrl */
// The submit form has an input with longURL as its name attribute.
const longUrl = req.body["longURL"];
const newUrl = ShortUrl({
long_url : longUrl,
short_url : "",
});
const shortUrl = newUrl._id.toString().slice(-6);
newUrl.short_url = shortUrl;
console.log(newUrl);
newUrl.save(function(err){
console.log("the new URL is added");
})
});
I keep incrementing an integer sequence per domain in the database and use Hashids to encode the integer into a URL path.
static hashids = Hashids(salt = "my app rocks", minSize = 6)
I ran a script to see how long it takes until it exhausts the character length. For six characters it can do 164,916,224 links and then goes up to seven characters. Bitly uses seven characters. Under five characters looks weird to me.
Hashids can decode the URL path back to a integer but a simpler solution is to use the entire short link sho.rt/ka8ds3 as a primary key.
Here is the full concept:
function addDomain(domain) {
table("domains").insert("domain", domain, "seq", 0)
}
function addURL(domain, longURL) {
seq = table("domains").where("domain = ?", domain).increment("seq")
shortURL = domain + "/" + hashids.encode(seq)
table("links").insert("short", shortURL, "long", longURL)
return shortURL
}
// GET /:hashcode
function handleRequest(req, res) {
shortURL = req.host + "/" + req.param("hashcode")
longURL = table("links").where("short = ?", shortURL).get("long")
res.redirect(301, longURL)
}
C# version:
public class UrlShortener
{
private static String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private static int BASE = 62;
public static String encode(int num)
{
StringBuilder sb = new StringBuilder();
while ( num > 0 )
{
sb.Append( ALPHABET[( num % BASE )] );
num /= BASE;
}
StringBuilder builder = new StringBuilder();
for (int i = sb.Length - 1; i >= 0; i--)
{
builder.Append(sb[i]);
}
return builder.ToString();
}
public static int decode(String str)
{
int num = 0;
for ( int i = 0, len = str.Length; i < len; i++ )
{
num = num * BASE + ALPHABET.IndexOf( str[(i)] );
}
return num;
}
}
You could hash the entire URL, but if you just want to shorten the id, do as marcel suggested. I wrote this Python implementation:
https://gist.github.com/778542
Take a look at https://hashids.org/ it is open source and in many languages.
Their page outlines some of the pitfalls of other approaches.
If you don't want re-invent the wheel ... http://lilurl.sourceforge.net/
// simple approach
$original_id = 56789;
$shortened_id = base_convert($original_id, 10, 36);
$un_shortened_id = base_convert($shortened_id, 36, 10);
alphabet = map(chr, range(97,123)+range(65,91)) + map(str,range(0,10))
def lookup(k, a=alphabet):
if type(k) == int:
return a[k]
elif type(k) == str:
return a.index(k)
def encode(i, a=alphabet):
'''Takes an integer and returns it in the given base with mappings for upper/lower case letters and numbers 0-9.'''
try:
i = int(i)
except Exception:
raise TypeError("Input must be an integer.")
def incode(i=i, p=1, a=a):
# Here to protect p.
if i <= 61:
return lookup(i)
else:
pval = pow(62,p)
nval = i/pval
remainder = i % pval
if nval <= 61:
return lookup(nval) + incode(i % pval)
else:
return incode(i, p+1)
return incode()
def decode(s, a=alphabet):
'''Takes a base 62 string in our alphabet and returns it in base10.'''
try:
s = str(s)
except Exception:
raise TypeError("Input must be a string.")
return sum([lookup(i) * pow(62,p) for p,i in enumerate(list(reversed(s)))])a
Here's my version for whomever needs it.
Why not just translate your id to a string? You just need a function that maps a digit between, say, 0 and 61 to a single letter (upper/lower case) or digit. Then apply this to create, say, 4-letter codes, and you've got 14.7 million URLs covered.
Here is a decent URL encoding function for PHP...
// From http://snipplr.com/view/22246/base62-encode--decode/
private function base_encode($val, $base=62, $chars='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ') {
$str = '';
do {
$i = fmod($val, $base);
$str = $chars[$i] . $str;
$val = ($val - $i) / $base;
} while($val > 0);
return $str;
}
Don't know if anyone will find this useful - it is more of a 'hack n slash' method, yet is simple and works nicely if you want only specific chars.
$dictionary = "abcdfghjklmnpqrstvwxyz23456789";
$dictionary = str_split($dictionary);
// Encode
$str_id = '';
$base = count($dictionary);
while($id > 0) {
$rem = $id % $base;
$id = ($id - $rem) / $base;
$str_id .= $dictionary[$rem];
}
// Decode
$id_ar = str_split($str_id);
$id = 0;
for($i = count($id_ar); $i > 0; $i--) {
$id += array_search($id_ar[$i-1], $dictionary) * pow($base, $i - 1);
}
Did you omit O, 0, and i on purpose?
I just created a PHP class based on Ryan's solution.
<?php
$shorty = new App_Shorty();
echo 'ID: ' . 1000;
echo '<br/> Short link: ' . $shorty->encode(1000);
echo '<br/> Decoded Short Link: ' . $shorty->decode($shorty->encode(1000));
/**
* A nice shorting class based on Ryan Charmley's suggestion see the link on Stack Overflow below.
* #author Svetoslav Marinov (Slavi) | http://WebWeb.ca
* #see http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener/10386945#10386945
*/
class App_Shorty {
/**
* Explicitly omitted: i, o, 1, 0 because they are confusing. Also use only lowercase ... as
* dictating this over the phone might be tough.
* #var string
*/
private $dictionary = "abcdfghjklmnpqrstvwxyz23456789";
private $dictionary_array = array();
public function __construct() {
$this->dictionary_array = str_split($this->dictionary);
}
/**
* Gets ID and converts it into a string.
* #param int $id
*/
public function encode($id) {
$str_id = '';
$base = count($this->dictionary_array);
while ($id > 0) {
$rem = $id % $base;
$id = ($id - $rem) / $base;
$str_id .= $this->dictionary_array[$rem];
}
return $str_id;
}
/**
* Converts /abc into an integer ID
* #param string
* #return int $id
*/
public function decode($str_id) {
$id = 0;
$id_ar = str_split($str_id);
$base = count($this->dictionary_array);
for ($i = count($id_ar); $i > 0; $i--) {
$id += array_search($id_ar[$i - 1], $this->dictionary_array) * pow($base, $i - 1);
}
return $id;
}
}
?>
public class TinyUrl {
private final String characterMap = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private final int charBase = characterMap.length();
public String covertToCharacter(int num){
StringBuilder sb = new StringBuilder();
while (num > 0){
sb.append(characterMap.charAt(num % charBase));
num /= charBase;
}
return sb.reverse().toString();
}
public int covertToInteger(String str){
int num = 0;
for(int i = 0 ; i< str.length(); i++)
num += characterMap.indexOf(str.charAt(i)) * Math.pow(charBase , (str.length() - (i + 1)));
return num;
}
}
class TinyUrlTest{
public static void main(String[] args) {
TinyUrl tinyUrl = new TinyUrl();
int num = 122312215;
String url = tinyUrl.covertToCharacter(num);
System.out.println("Tiny url: " + url);
System.out.println("Id: " + tinyUrl.covertToInteger(url));
}
}
This is what I use:
# Generate a [0-9a-zA-Z] string
ALPHABET = map(str,range(0, 10)) + map(chr, range(97, 123) + range(65, 91))
def encode_id(id_number, alphabet=ALPHABET):
"""Convert an integer to a string."""
if id_number == 0:
return alphabet[0]
alphabet_len = len(alphabet) # Cache
result = ''
while id_number > 0:
id_number, mod = divmod(id_number, alphabet_len)
result = alphabet[mod] + result
return result
def decode_id(id_string, alphabet=ALPHABET):
"""Convert a string to an integer."""
alphabet_len = len(alphabet) # Cache
return sum([alphabet.index(char) * pow(alphabet_len, power) for power, char in enumerate(reversed(id_string))])
It's very fast and can take long integers.
For a similar project, to get a new key, I make a wrapper function around a random string generator that calls the generator until I get a string that hasn't already been used in my hashtable. This method will slow down once your name space starts to get full, but as you have said, even with only 6 characters, you have plenty of namespace to work with.
I have a variant of the problem, in that I store web pages from many different authors and need to prevent discovery of pages by guesswork. So my short URLs add a couple of extra digits to the Base-62 string for the page number. These extra digits are generated from information in the page record itself and they ensure that only 1 in 3844 URLs are valid (assuming 2-digit Base-62). You can see an outline description at http://mgscan.com/MBWL.
Very good answer, I have created a Golang implementation of the bjf:
package bjf
import (
"math"
"strings"
"strconv"
)
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
func Encode(num string) string {
n, _ := strconv.ParseUint(num, 10, 64)
t := make([]byte, 0)
/* Special case */
if n == 0 {
return string(alphabet[0])
}
/* Map */
for n > 0 {
r := n % uint64(len(alphabet))
t = append(t, alphabet[r])
n = n / uint64(len(alphabet))
}
/* Reverse */
for i, j := 0, len(t) - 1; i < j; i, j = i + 1, j - 1 {
t[i], t[j] = t[j], t[i]
}
return string(t)
}
func Decode(token string) int {
r := int(0)
p := float64(len(token)) - 1
for i := 0; i < len(token); i++ {
r += strings.Index(alphabet, string(token[i])) * int(math.Pow(float64(len(alphabet)), p))
p--
}
return r
}
Hosted at github: https://github.com/xor-gate/go-bjf
Implementation in Scala:
class Encoder(alphabet: String) extends (Long => String) {
val Base = alphabet.size
override def apply(number: Long) = {
def encode(current: Long): List[Int] = {
if (current == 0) Nil
else (current % Base).toInt :: encode(current / Base)
}
encode(number).reverse
.map(current => alphabet.charAt(current)).mkString
}
}
class Decoder(alphabet: String) extends (String => Long) {
val Base = alphabet.size
override def apply(string: String) = {
def decode(current: Long, encodedPart: String): Long = {
if (encodedPart.size == 0) current
else decode(current * Base + alphabet.indexOf(encodedPart.head),encodedPart.tail)
}
decode(0,string)
}
}
Test example with Scala test:
import org.scalatest.{FlatSpec, Matchers}
class DecoderAndEncoderTest extends FlatSpec with Matchers {
val Alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
"A number with base 10" should "be correctly encoded into base 62 string" in {
val encoder = new Encoder(Alphabet)
encoder(127) should be ("cd")
encoder(543513414) should be ("KWGPy")
}
"A base 62 string" should "be correctly decoded into a number with base 10" in {
val decoder = new Decoder(Alphabet)
decoder("cd") should be (127)
decoder("KWGPy") should be (543513414)
}
}
Function based in Xeoncross Class
function shortly($input){
$dictionary = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','0','1','2','3','4','5','6','7','8','9'];
if($input===0)
return $dictionary[0];
$base = count($dictionary);
if(is_numeric($input)){
$result = [];
while($input > 0){
$result[] = $dictionary[($input % $base)];
$input = floor($input / $base);
}
return join("", array_reverse($result));
}
$i = 0;
$input = str_split($input);
foreach($input as $char){
$pos = array_search($char, $dictionary);
$i = $i * $base + $pos;
}
return $i;
}
Here is a Node.js implementation that is likely to bit.ly. generate a highly random seven-character string.
It uses Node.js crypto to generate a highly random 25 charset rather than randomly selecting seven characters.
var crypto = require("crypto");
exports.shortURL = new function () {
this.getShortURL = function () {
var sURL = '',
_rand = crypto.randomBytes(25).toString('hex'),
_base = _rand.length;
for (var i = 0; i < 7; i++)
sURL += _rand.charAt(Math.floor(Math.random() * _rand.length));
return sURL;
};
}
My Python 3 version
base_list = list("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
base = len(base_list)
def encode(num: int):
result = []
if num == 0:
result.append(base_list[0])
while num > 0:
result.append(base_list[num % base])
num //= base
print("".join(reversed(result)))
def decode(code: str):
num = 0
code_list = list(code)
for index, code in enumerate(reversed(code_list)):
num += base_list.index(code) * base ** index
print(num)
if __name__ == '__main__':
encode(341413134141)
decode("60FoItT")
For a quality Node.js / JavaScript solution, see the id-shortener module, which is thoroughly tested and has been used in production for months.
It provides an efficient id / URL shortener backed by pluggable storage defaulting to Redis, and you can even customize your short id character set and whether or not shortening is idempotent. This is an important distinction that not all URL shorteners take into account.
In relation to other answers here, this module implements the Marcel Jackwerth's excellent accepted answer above.
The core of the solution is provided by the following Redis Lua snippet:
local sequence = redis.call('incr', KEYS[1])
local chars = '0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz'
local remaining = sequence
local slug = ''
while (remaining > 0) do
local d = (remaining % 60)
local character = string.sub(chars, d + 1, d + 1)
slug = character .. slug
remaining = (remaining - d) / 60
end
redis.call('hset', KEYS[2], slug, ARGV[1])
return slug
Why not just generate a random string and append it to the base URL? This is a very simplified version of doing this in C#.
static string chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
static string baseUrl = "https://google.com/";
private static string RandomString(int length)
{
char[] s = new char[length];
Random rnd = new Random();
for (int x = 0; x < length; x++)
{
s[x] = chars[rnd.Next(chars.Length)];
}
Thread.Sleep(10);
return new String(s);
}
Then just add the append the random string to the baseURL:
string tinyURL = baseUrl + RandomString(5);
Remember this is a very simplified version of doing this and it's possible the RandomString method could create duplicate strings. In production you would want to take in account for duplicate strings to ensure you will always have a unique URL. I have some code that takes account for duplicate strings by querying a database table I could share if anyone is interested.
This is my initial thoughts, and more thinking can be done, or some simulation can be made to see if it works well or any improvement is needed:
My answer is to remember the long URL in the database, and use the ID 0 to 9999999999999999 (or however large the number is needed).
But the ID 0 to 9999999999999999 can be an issue, because
it can be shorter if we use hexadecimal, or even base62 or base64. (base64 just like YouTube using A-Z a-z 0-9 _ and -)
if it increases from 0 to 9999999999999999 uniformly, then hackers can visit them in that order and know what URLs people are sending each other, so it can be a privacy issue
We can do this:
have one server allocate 0 to 999 to one server, Server A, so now Server A has 1000 of such IDs. So if there are 20 or 200 servers constantly wanting new IDs, it doesn't have to keep asking for each new ID, but rather asking once for 1000 IDs
for the ID 1, for example, reverse the bits. So 000...00000001 becomes 10000...000, so that when converted to base64, it will be non-uniformly increasing IDs each time.
use XOR to flip the bits for the final IDs. For example, XOR with 0xD5AA96...2373 (like a secret key), and the some bits will be flipped. (whenever the secret key has the 1 bit on, it will flip the bit of the ID). This will make the IDs even harder to guess and appear more random
Following this scheme, the single server that allocates the IDs can form the IDs, and so can the 20 or 200 servers requesting the allocation of IDs. The allocating server has to use a lock / semaphore to prevent two requesting servers from getting the same batch (or if it is accepting one connection at a time, this already solves the problem). So we don't want the line (queue) to be too long for waiting to get an allocation. So that's why allocating 1000 or 10000 at a time can solve the issue.
I wonder if is there a good way to get the number of digits in right/left side of a decimal number PHP. For example:
12345.789 -> RIGHT SIDE LENGTH IS 3 / LEFT SIDE LENGTH IS 5
I know it is readily attainable by helping string functions and exploding the number. I mean is there a mathematically or programmatically way to perform it better than string manipulations.
Your answers would be greatly appreciated.
Update
The best solution for left side till now was:
$left = floor(log10($x))+1;
but still no sufficient for right side.
Still waiting ...
To get the digits on the left side you can do this:
$left = floor(log10($x))+1;
This uses the base 10 logarithm to get the number of digits.
The right side is harder. A simple approach would look like this, but due to floating point numbers, it would often fail:
$decimal = $x - floor($x);
$right = 0;
while (floor($decimal) != $decimal) {
$right++;
$decimal *= 10; //will bring in floating point 'noise' over time
}
This will loop through multiplying by 10 until there are no digits past the decimal. That is tested with floor($decimal) != $decimal.
However, as Ali points out, giving it the number 155.11 (a hard to represent digit in binary) results in a answer of 14. This is because as the number is stored as something like 155.11000000000001 with the 32 bits of floating precision we have.
So instead, a more robust solution is needed. (PoPoFibo's solutions above is particularly elegant, and uses PHPs inherit float comparison functions well).
The fact is, we can never distinguish between input of 155.11 and 155.11000000000001. We will never know which number was originally given. They will both be represented the same. However, if we define the number of zeroes that we can see in a row before we just decide the decimal is 'done' than we can come up with a solution:
$x = 155.11; //the number we are testing
$LIMIT = 10; //number of zeroes in a row until we say 'enough'
$right = 0; //number of digits we've checked
$empty = 0; //number of zeroes we've seen in a row
while (floor($x) != $x) {
$right++;
$base = floor($x); //so we can see what the next digit is;
$x *= 10;
$base *= 10;
$digit = floor($x) - $base; //the digit we are dealing with
if ($digit == 0) {
$empty += 1;
if ($empty == $LIMIT) {
$right -= $empty; //don't count all those zeroes
break; // exit the loop, we're done
}
} else {
$zeros = 0;
}
}
This should find the solution given the reasonable assumption that 10 zeroes in a row means any other digits just don't matter.
However, I still like PopoFibo's solution better, as without any multiplication, PHPs default comparison functions effectively do the same thing, without the messiness.
I am lost on PHP semantics big time but I guess the following would serve your purpose without the String usage (that is at least how I would do in Java but hopefully cleaner):
Working code here: http://ideone.com/7BnsR3
Non-string solution (only Math)
Left side is resolved hence taking the cue from your question update:
$value = 12343525.34541;
$left = floor(log10($value))+1;
echo($left);
$num = floatval($value);
$right = 0;
while($num != round($num, $right)) {
$right++;
}
echo($right);
Prints
85
8 for the LHS and 5 for the RHS.
Since I'm taking a floatval that would make 155.0 as 0 RHS which I think is valid and can be resolved by String functions.
php > $num = 12345.789;
php > $left = strlen(floor($num));
php > $right = strlen($num - floor($num));
php > echo "$left / $right\n";
5 / 16 <--- 16 digits, huh?
php > $parts = explode('.', $num);
php > var_dump($parts);
array(2) {
[0]=>
string(5) "12345"
[1]=>
string(3) "789"
As you can see, floats aren't the easiest to deal with... Doing it "mathematically" leads to bad results. Doing it by strings works, but makes you feel dirty.
$number = 12345.789;
list($whole, $fraction) = sscanf($number, "%d.%d");
This will always work, even if $number is an integer and you’ll get two real integers returned. Length is best done with strlen() even for integer values. The proposed log10() approach won't work for 10, 100, 1000, … as you might expect.
// 5 - 3
echo strlen($whole) , " - " , strlen($fraction);
If you really, really want to get the length without calling any string function here you go. But it's totally not efficient at all compared to strlen().
/**
* Get integer length.
*
* #param integer $integer
* The integer to count.
* #param boolean $count_zero [optional]
* Whether 0 is to be counted or not, defaults to FALSE.
* #return integer
* The integer's length.
*/
function get_int_length($integer, $count_zero = false) {
// 0 would be 1 in string mode! Highly depends on use case.
if ($count_zero === false && $integer === 0) {
return 0;
}
return floor(log10(abs($integer))) + 1;
}
// 5 - 3
echo get_int_length($whole) , " - " , get_int_length($fraction);
The above will correctly count the result of 1 / 3, but be aware that the precision is important.
$number = 1 / 3;
// Above code outputs
// string : 1 - 10
// math : 0 - 10
$number = bcdiv(1, 3);
// Above code outputs
// string : 1 - 0 <-- oops
// math : 0 - INF <-- 8-)
No problem there.
I would like to apply a simple logic.
<?php
$num=12345.789;
$num_str="".$num; // Converting number to string
$array=explode('.',$num_str); //Explode number (String) with .
echo "Left side length : ".intval(strlen($array[0])); // $array[0] contains left hand side then check the string length
echo "<br>";
if(sizeof($array)>1)
{
echo "Left side length : ".intval(strlen($array[1]));// $array[1] contains left hand check the string length side
}
?>
I have a system of equations of grade 1 to resolve in PHP.
There are more equations than variables but there aren't less equations than variables.
The system would look like bellow. n equations, m variables, variables are x[i] where 'i' takes values from 1 to m. The system may have a solution or not.
m may be maximum 100 and n maximum ~5000 (thousands).
I will have to resolve like a few thousands of these systems of equations. Speed may be a problem but I'm looking for an algorithm written in PHP for now.
a[1][1] * x[1] + a[1][2] * x[2] + ... + a[1][m] * x[m] = number 1
a[2][1] * x[1] + a[2][2] * x[2] + ... + a[2][m] * x[m] = number 2
...
a[n][1] * x[1] + a[n][2] * x[2] + ... + a[n][m] * x[m] = number n
There is Cramer Rule which may do it. I could make 1 square matrix of coefficients, resolve the system with Cramer Rule (by calculating matrices' determinants) and than I should check the values in the unused equations.
I believe I could try Cramer by myself but I'm looking for a better solution.
This is a problem of Computational Science,
http://en.wikipedia.org/wiki/Computational_science#Numerical_simulations
I know there are some complex algorithms to solve my problem but I can't tell which one would do it and which is the best for my case. An algorithm would use me better than just the theory with the demonstration.
My question is, does anybody know a class, script, code of some sort written in PHP to resolve a system of linear equations of grade 1 ?
Alternatively I could try an API or a Web Service, best to be free, a paid one would do it too.
Thank you
I needed exactly this, but I couldn't find determinant function, so I made one myself. And the Cramer rule function too. Maybe it'll help someone.
/**
* $matrix must be 2-dimensional n x n array in following format
* $matrix = array(array(1,2,3),array(1,2,3),array(1,2,3))
*/
function determinant($matrix = array()) {
// dimension control - n x n
foreach ($matrix as $row) {
if (sizeof($matrix) != sizeof($row)) {
return false;
}
}
// count 1x1 and 2x2 manually - rest by recursive function
$dimension = sizeof($matrix);
if ($dimension == 1) {
return $matrix[0][0];
}
if ($dimension == 2) {
return ($matrix[0][0] * $matrix[1][1] - $matrix[0][1] * $matrix[1][0]);
}
// cycles for submatrixes calculations
$sum = 0;
for ($i = 0; $i < $dimension; $i++) {
// for each "$i", you will create a smaller matrix based on the original matrix
// by removing the first row and the "i"th column.
$smallMatrix = array();
for ($j = 0; $j < $dimension - 1; $j++) {
$smallMatrix[$j] = array();
for ($k = 0; $k < $dimension; $k++) {
if ($k < $i) $smallMatrix[$j][$k] = $matrix[$j + 1][$k];
if ($k > $i) $smallMatrix[$j][$k - 1] = $matrix[$j + 1][$k];
}
}
// after creating the smaller matrix, multiply the "i"th element in the first
// row by the determinant of the smaller matrix.
// odd position is plus, even is minus - the index from 0 so it's oppositely
if ($i % 2 == 0){
$sum += $matrix[0][$i] * determinant($smallMatrix);
} else {
$sum -= $matrix[0][$i] * determinant($smallMatrix);
}
}
return $sum;
}
/**
* left side of equations - parameters:
* $leftMatrix must be 2-dimensional n x n array in following format
* $leftMatrix = array(array(1,2,3),array(1,2,3),array(1,2,3))
* right side of equations - results:
* $rightMatrix must be in format
* $rightMatrix = array(1,2,3);
*/
function equationSystem($leftMatrix = array(), $rightMatrix = array()) {
// matrixes and dimension check
if (!is_array($leftMatrix) || !is_array($rightMatrix)) {
return false;
}
if (sizeof($leftMatrix) != sizeof($rightMatrix)) {
return false;
}
$M = determinant($leftMatrix);
if (!$M) {
return false;
}
$x = array();
foreach ($rightMatrix as $rk => $rv) {
$xMatrix = $leftMatrix;
foreach ($rightMatrix as $rMk => $rMv) {
$xMatrix[$rMk][$rk] = $rMv;
}
$x[$rk] = determinant($xMatrix) / $M;
}
return $x;
}
Wikipedia should have pseudocode for reducing the matrix representing your equations to reduced row echelon form. Once the matrix is in that form, you can walk through the rows to find a solution.
There's an unmaintained PEAR package which may save you the effort of writing the code.
Another question is whether you are looking mostly at "wide" systems (more variables than equations, which usually have many possible solutions) or "narrow" systems (more equations than variables, which usually have no solutions), since the best strategy depends on which case you are in — and narrow systems may benefit from using a linear regression technique such as least squares instead.
This package uses Gaussian Elimination. I found that it executes fast for larger matrices (i.e. more variables/equations).
There is a truly excellent package based on JAMA here: http://www.phpmath.com/build02/JAMA/docs/index.php
I've used it for simple linear right the way to highly complex Multiple Linear Regression (writing my own Backwards Stepwise MLR functions on top of that). Very comprehensive and will hopefully do what you need.
Speed could be considered an issue, for sure. But works a treat and matched SPSS when I cross referenced results on the BSMLR calculations.
I have these possible bit flags.
1, 2, 4, 8, 16, 64, 128, 256, 512, 2048, 4096, 16384, 32768, 65536
So each number is like a true/false statement on the server side. So if the first 3 items, and only the first 3 items are marked "true" on the server side, the web service will return a 7. Or if all 14 items above are true, I would still get a single number back from the web service which is is the sum of all those numbers.
What is the best way to handle the number I get back to find out which items are marked as "true"?
Use a bit masking operator. In the C language:
X & 8
is true, if the "8"s bit is set.
You can enumerate the bit masks, and count how many are set.
If it really is the case that the entire word contains bits, and you want to simply
compute how many bits are set, you want in essence a "population count". The absolute
fastest way to get a population count is to execute a native "popcnt" usually
available in your machine's instruction set.
If you don't care about space, you can set up a array countedbits[...] indexed by your value with precomputed bit counts. Then a single memory access computes your bit count.
Often used is just plain "bit twiddling code" that computes bit counts:
(Kernigan's method):
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
v &= v - 1; // clear the least significant bit set
}
(parallel bit summming, 32 bits)
v = v - ((v >> 1) & 0x55555555); // reuse input as temporary
v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // temp
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count
If you haven't seen the bit twiddling hacks before, you're in for a treat.
PHP, being funny, may do funny things with some of this arithmetic.
if (7 & 1) { // if bit 1 is set in returned number (7)
}
Thought the question is old might help someone else. I am putting the numbers in binary as its clearer to understand. The code had not been tested but hope the logic is clear. The code is PHP specific.
define('FLAG_A', 0b10000000000000);
define('FLAG_B', 0b01000000000000);
define('FLAG_C', 0b00100000000000);
define('FLAG_D', 0b00010000000000);
define('FLAG_E', 0b00001000000000);
define('FLAG_F', 0b00000100000000);
define('FLAG_G', 0b00000010000000);
define('FLAG_H', 0b00000001000000);
define('FLAG_I', 0b00000000100000);
define('FLAG_J', 0b00000000010000);
define('FLAG_K', 0b00000000001000);
define('FLAG_L', 0b00000000000100);
define('FLAG_M', 0b00000000000010);
define('FLAG_N', 0b00000000000001);
function isFlagSet($Flag,$Setting,$All=false){
$setFlags = $Flag & $Setting;
if($setFlags and !$All) // at least one of the flags passed is set
return true;
else if($All and ($setFlags == $Flag)) // to check that all flags are set
return true;
else
return false;
}
Usage:
if(isFlagSet(FLAG_A,someSettingsVariable)) // eg: someSettingsVariable = 0b01100000000010
if(isFlagSet(FLAG_A | FLAG_F | FLAG_L,someSettingsVariable)) // to check if atleast one flag is set
if(isFlagSet(FLAG_A | FLAG_J | FLAG_M | FLAG_D,someSettingsVariable, TRUE)) // to check if all flags are set
One way would be to loop through your number, left-shifting it (ie divide by 2) and compare the first bit with 1 using the & operand.
As there is no definite answer with php code, I add this working example:
// returns array of numbers, so for 7 returns array(1,2,4), etc..
function get_bits($decimal) {
$scan = 1;
$result = array();
while ($decimal >= $scan){
if ($decimal & $scan) $result[] = $scan;
$scan<<=1;
}
return $result;
}
I am working on a user-role / permission system in PHP for a script.
Below is a code using a bitmask method for permissions that I found on phpbuilder.com.
Below that part is a much simpler version w3hich could do basicly the same thing without the bit part.
Many people have recommended using bit operators and such for settings and other things in PHP, I have never understood why though. In the code below is there ANY benefit from using the first code instead of the second?
<?php
/**
* Correct the variables stored in array.
* #param integer $mask Integer of the bit
* #return array
*/
function bitMask($mask = 0) {
$return = array();
while ($mask > 0) {
for($i = 0, $n = 0; $i <= $mask; $i = 1 * pow(2, $n), $n++) {
$end = $i;
}
$return[] = $end;
$mask = $mask - $end;
}
sort($return);
return $return;
}
define('PERMISSION_DENIED', 0);
define('PERMISSION_READ', 1);
define('PERMISSION_ADD', 2);
define('PERMISSION_UPDATE', 4);
define('PERMISSION_DELETE', 8);
//run function
// this value would be pulled from a user's setting mysql table
$_ARR_permission = bitMask('5');
if(in_array(PERMISSION_READ, $_ARR_permission)) {
echo 'Access granted.';
}else {
echo 'Access denied.';
}
?>
non-bit version
<?PHP
/*
NON bitwise method
*/
// this value would be pulled from a user's setting mysql table
$user_permission_level = 4;
if($user_permission_level === 4) {
echo 'Access granted.';
}else {
echo 'Access denied.';
}
?>
Why not just do this...
define('PERMISSION_DENIED', 0);
define('PERMISSION_READ', 1);
define('PERMISSION_ADD', 2);
define('PERMISSION_UPDATE', 4);
define('PERMISSION_DELETE', 8);
//run function
// this value would be pulled from a user's setting mysql table
$_ARR_permission = 5;
if($_ARR_permission & PERMISSION_READ) {
echo 'Access granted.';
}else {
echo 'Access denied.';
}
You can also create lots of arbitrary combinations of permissions if you use bits...
$read_only = PERMISSION_READ;
$read_delete = PERMISSION_READ | PERMISSION_DELETE;
$full_rights = PERMISSION_DENIED | PERMISSION_READ | PERMISSION_ADD | PERMISSION_UPDATE | PERMISSION_DELETE;
//manipulating permissions is easy...
$myrights = PERMISSION_READ;
$myrights |= PERMISSION_UPDATE; // add Update permission to my rights
The first allows people to have lots of permissions - read/add/update for example. The second example, the user has just PERMISSION_UPDATE.
Bitwise testing works by testing bits for truth values.
For example, the binary sequence 10010 would identify a user with PERMISSION_DELETE and PERMISSION_READ (the bit identifying PERMISSION_READ is the column for 2, the bit identifying PERMISSION_DELETE is the column for 16), 10010 in binary is 18 in decimal (16 + 2 = 18). Your second code sample doesn't allow you to do that sort of testing. You could do greater-than style checks, but that assumes everyone with PERMISSION_DELETE should also have PERMISSION_UPDATE, which may not be a valid assumption.
Maybe it's just because I don't use bitmasks very often, but I find that in a language like PHP where developer productivity and code readability are more important than speed or memory usage (within limits, obviously), there's no real reason to use bitmasking.
Why not instead create a class that tracks things like permissions, and logged in users, and so on? Let's call it Auth. Then, if you want to check that a user has a permission, you can create a method HasPermission.
e.g.,
if(Auth::logged_in() && Auth::currentUser()->hasPermission('read'))
//user can read
then if you want to check whether they have some combination of permissions:
if(Auth::logged_in() && Auth::currentUser()->hasAllPermissions('read', 'write'))
//user can read, and write
or if you want to check whether they have any of a certain group of permissions:
if(Auth::logged_in() && Auth::currentUser()->hasAnyPermissions('read', 'write'))
//user can read, or write
Of course, it may not be a bad idea to define constants, such as PERMISSION_READ, which you can just define to be the string 'read', and so on.
I find this approach easier to read than bitmasks because the method names tell you exactly what it is you're looking for.
Edit: rereading the question, it looks like the user's permissions are coming back from your database in a bitfield. If that's the case, you are going to have to use bitwise operators. A user who's permission in the database is 5 has PERMISSION_READ and PERMISSION_DENIED because (PERMISSION_READ & 5) != 0, and (PERMISSION_DENIED & 5) != 0. He wouldn't have PERMISSION_ADD, because (PERMISSION_ADD & 5) == 0
Does that make sense? All the complex stuff in your bitwise example looks unnecessary.
If you don't fully understand bitwise operations, then don't use them. It will only lead to lots of headaches. If you are comfortable with them, then use them where you feel they are appropriate. You (or whoever wrote the bitwise code) doesn't seem to fully grasp bitwise operations. There are several problems with it, like the fact that the pow() function is used, that would negate any kind of performance benefit. (Instead of pow(2, $n), you should use the bitwise 1 << $n, for example.)
That said, the two pieces of code do not seem to do the same things.
Try using what is in the bit.class.php at http://code.google.com/p/samstyle-php-framework/source/browse/trunk/class/bit.class.php
Checking against a specific bit:
<?php
define('PERMISSION_DENIED', 1);
define('PERMISSION_READ', 2);
define('PERMISSION_ADD', 3);
define('PERMISSION_UPDATE', 4);
define('PERMISSION_DELETE', 5);
if(bit::query($permission,PERMISSION_DENIED)){
echo 'Your permission is denied';
exit();
}else{
// so on
}
?>
And to turn on and off:
<?php
$permissions = 8;
bit::toggle(&$permissions,PERMISSION_DENIED);
var_dump($permissions); // outputs int(9)
?>
problem of this is if PERMISSION_READ is a mask itself
if($ARR_permission & PERMISSION_READ) {
echo 'Access granted.';
}else {
echo 'Access denied.';
then for
0101 - $rightWeHave
0011 - $rightWeRequire
it is access granted, which we probably do not want so it should be
if (($rightWeHave & $rightWeRequire) == $rightWeRequire) {
echo 'access granted';
}
so now for
0101
0011
result is
0001 so access is not granted because it is not equal to 0011
but for
1101
0101
it is ok as the result is 0101
Script checks which mask has been set in decimal. Maybe someone will need it:
<?php
$max = 1073741824;
$series = array(0);
$x = 1;
$input = $argv[1]; # from command line eg.'12345': php script.php 12345
$sum = 0;
# generates all bitmasks (with $max)
while ($x <= $max) {
$series[] = $x;
$x = $x * 2;
}
# show what bitmask has been set in '$argv[1]'
foreach ($series as $value) {
if ($value & $input) {
$sum += $value;
echo "$value - SET,\n";
} else {
echo "$value\n";
}
}
# sum of set masks
echo "\nSum of set masks: $sum\n\n";
Output (php maskChecker.php 123):
0
1 - SET,
2 - SET,
4
8 - SET,
16 - SET,
32 - SET,
64 - SET,
128
256
512
1024
2048
4096
8192
(...)
Sum of set mask: 123
I guess the first example gives you more control of exactly what permissions a user has. In the second you just have a user 'level'; presumably higher levels inherit all the permissions granted to a lower 'level' user, so you don't have such fine control.
Also, if I have understood correctly, the line
if($user_permission_level === 4)
means that only users with exactly permission level 4 have access to the action - surely you would want to check that users have at least that level?