Regex matching and encoding duplicate characters in a string

My problem is that I've got URL access keys that look like "Bd333333d". I need the string length to be no longer than the original, but may be shorter. I want to convert/obfuscate the duplicate characters in the string and be able to convert them back to the original.

PHP can already do string compression, so why would you want to come up with your own algorithm? See this post for some excellent suggestions of combining gzip compression with urlencoding.
You don't say whether you're storing these strings internally or using them as part of a URL. If it's the former, then this is even easier because you can just store it as the much more compact binary.

This is a good task for preg_replace_callback:
$str = 'Bd333333dddd';
function shorten( $str ) {
    return preg_replace_callback(
        '~(.)\1+~',
        function( $matches ) {
            return sprintf( '%s.%s', $matches[1], strlen( $matches[0] ) );
        },
        $str
    );
}
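The question also asks for the way back and for output that is never longer than the input. A rough sketch of one way to get both, assuming that "." never occurs in the keys: only rewrite runs of four or more characters and close each count with a second ".", so the expansion is unambiguous even when the next character is a digit. The names shorten_runs and expand_runs are made up for this illustration.

```php
<?php
// Hypothetical helpers; they assume "." never appears in the original key.

function shorten_runs(string $str): string {
    // A run of N >= 4 equal characters becomes "c.N.", which is never longer than the run.
    return preg_replace_callback('~(.)\1{3,}~', function (array $m): string {
        return $m[1] . '.' . strlen($m[0]) . '.';
    }, $str);
}

function expand_runs(string $str): string {
    // Reverse step: "c.N." becomes the character c repeated N times.
    return preg_replace_callback('~(.)\.(\d+)\.~', function (array $m): string {
        return str_repeat($m[1], (int) $m[2]);
    }, $str);
}

$short = shorten_runs('Bd333333dddd');
echo $short, PHP_EOL;              // Bd3.6.d.4.
echo expand_runs($short), PHP_EOL; // Bd333333dddd
```

Runs of two or three characters are left alone on purpose, because "c.2." or "c.3." would be longer than the run itself.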

UPDATE: Thanks for your help! After doing some work on the hybrid ROT13 concept, I came up with something that works for me. Sorry to be lame and post my own solution, but here it is:
function ROT_by_strpos($s,$type='in'){
$index = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
for ($n = 0; $n<strlen($index); $n++){
$k[] = substr( $index,$n ,1);
}
if($type == 'out'){
$k = array_reverse($k);
}
$rot = '';
$count = 1;
$len = strlen($s);
for ($n = 0; $n<strlen($s); $n++){
$key_in[] = substr( $s,$n ,1);
}
for ( $i = 0; $i < $len; $i++ ){
$key = array_search($key_in[$i], $k)+1;
if($type == 'in'){
if($key+$i >= count($k)){ // >= so that $key+$i == count($k) wraps to index 0 instead of reading $k[count($k)]
$rev = $key+$i - count($k);
$new_key = $rev;
}else{
$new_key = $key+$i;
}
}else{
if($key+$i >= count($k)){
$adv = $key+$i - count($k);
$new_key = $adv;
}else{
$new_key = $key+$i;
}
}
$rot .= $k[$new_key];
}
return $rot;
}
This assumes that all characters come from $index and that the key is at most 10 characters long.
Usage:
$key = "Bd333333d";
$in = ROT_by_strpos($key,'in');
$out = ROT_by_strpos($in,'out');
echo "$key - $in - $out"; //Bd333333d - Cf6789ABm - Bd333333d
There's probably a more elegant way to do this, but it does work. Any feedback or improvements would be appreciated if you want to add something. :)
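One possibly tidier way to express the same position-dependent shift is plain modular arithmetic on the character's index. This is only a sketch under the same assumption (every character is in the 62-character alphabet); the function name rot_by_position and the boolean $decode flag are invented for the example.

```php
<?php
// Hypothetical rewrite: the character at position $i is shifted by $i + 1 places.
function rot_by_position(string $s, bool $decode = false): string {
    $alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $size = strlen($alphabet);
    $out = '';
    for ($i = 0, $len = strlen($s); $i < $len; $i++) {
        $pos = strpos($alphabet, $s[$i]);
        if ($pos === false) {
            throw new InvalidArgumentException("Character not in alphabet: {$s[$i]}");
        }
        $shift = $i + 1;
        $new = $decode ? $pos - $shift : $pos + $shift;
        // PHP's % keeps the sign of the left operand, so normalise into 0..$size-1.
        $out .= $alphabet[(($new % $size) + $size) % $size];
    }
    return $out;
}

$in  = rot_by_position('Bd333333d');
$out = rot_by_position($in, true);
echo "$in - $out", PHP_EOL; // Cf6789ABm - Bd333333d
```

Because the wrap-around is handled by the modulo, there is no separate branch for keys that run past the end of the alphabet.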

Related

PHP Loop Stuck on one character

I have a problem.
I just want my loop to run, but when I try it, it fails. It is supposed to repeat each letter an increasing number of times, but it doesn't pick up any new letters at all. Why is this happening? In C++ such code would work.
function accum('ZpglnRxqenU') {
// your code
$result = '';
$letters_result = '';
$letter_original = '';
$num_if_str = strlen($s);
$j = 0;
for ( $i=0; $i <= $num_if_str; $i++ )
{
$letter_original = substr($s, $i, $i+1);
$j = 0;
while($j == $i)
{
$letters_result = $letters_result . $letter_original;
$j++;
}
if($i != strlen($s))
{
$letters_result = $letters_result . '-';
}
}
return $letters_result;
}
It returns
- Expected: 'Z-Pp-Ggg-Llll-Nnnnn-Rrrrrr-Xxxxxxx-Qqqqqqqq-Eeeeeeeee-Nnnnnnnnnn-Uuuuuuuuuuu'
Actual : 'Z-----------'
What is the problem with this PHP code?

There are a number of problems here:
You're using $s but never initialise it.
Your call to substr() uses an incorrect value for the length of the substring to return: the third argument is a length, not an end position.
Your inner loop only runs while $j == $i, but you initialise $j to 0, so it only runs when $i is zero, i.e. for the first letter of the string.
There is a simpler way to do this. In PHP you can address individual characters in a string as if they were array elements, so no need for substr()
Further, you can use str_repeat() to generate the repeating strings, and if you store the expanded strings in an array you can join them all with implode().
Lastly, combining ucwords() and strtolower() returns the required case.
Putting it all together we get
<?php
$str = "ZpglnRxqenU";
$output = [];
for ($i = 0;$i<strlen($str);$i++) {
$output[] = str_repeat($str[$i], $i+1);
}
$output = ucwords(strtolower(implode('-',$output)),"-");
echo $output; // Z-Pp-Ggg-Llll-Nnnnn-Rrrrrr-Xxxxxxx-Qqqqqqqq-Eeeeeeeee-Nnnnnnnnnn-Uuuuuuuuuuu
Demo:https://3v4l.org/OoukZ

I don't have much more to add to @TangentiallyPerpendicular's answer as far as critique, other than that you've made the classic off-by-one blunder with $i <= strlen($s). The string bar has a length of 3, but strings are zero-indexed like arrays [e.g. [ 0 => 'b', 1 => 'a', 2 => 'r' ]], so when you hit $i == strlen($s) at 3, you are reading past the end of the string.
Aside from that your approach, when corrected and made concise, would look like:
function accum($input) {
$result = '';
for ( $i=0, $len=strlen($input); $i < $len; $i++ ) {
$letter = substr($input, $i, 1);
for( $j=0; $j<=$i; $j++ ) {
$result .= $letter;
}
if($i != $len-1) {
$result .= '-';
}
}
return $result;
}
var_dump(accum('ZpglnRxqenU'));
Output:
string(76) "Z-pp-ggg-llll-nnnnn-RRRRRR-xxxxxxx-qqqqqqqq-eeeeeeeee-nnnnnnnnnn-UUUUUUUUUUU"
Also keep in mind that functions have their own isolated variable scope, so you don't need to namespace variables like $letters_foo which can make your code a bit confusing to the eye.

An encoding scheme to shorten a string of numbers that is url safe

I have a string that looks like this (contains numbers, periods and dashes):
1372137673.276886940002-19690324617-19694854617-18953258947
Since I only have numbers, periods and dashes, I would like to use an url-safe (only numbers and letters) encoding scheme to shorten it. I also need to be able to reverse the encoded string to its original form.
I have had a look at base64, but it increases the size of the string by a fair bit, which is not what I want.
I plan to have this implemented in PHP and Javascript.
Are there any existing schemes that can do this? My main motivation is to make the above string shorter and the result should be URL safe.

Convert the numbers into their binary form, then Base64 encode that.

One reasonable attempt would be:
break down the string into tokens separated by dashes and dots
convert each token to a higher base (36 is a base which you can convert to and from easily from both JS and PHP)
join the tokens back together -- both dashes and dots are valid in a URL
However, the "have to do this in JS also" requirement seems a bit suspect -- why does your client-side code have to extract information from URLs which are ultimately under the server's authority? In general URLs should be opaque and when that's not true alarm bells should start ringing.
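The three steps above could be sketched roughly like this. It assumes every numeric token fits in a native integer and has no leading zeros (base_convert() would silently drop them); shorten_tokens and restore_tokens are hypothetical names used only for this example.

```php
<?php
// Hypothetical helpers: re-encode each run of digits in base 36, leaving "." and "-" in place.

function shorten_tokens(string $s): string {
    return preg_replace_callback('~\d+~', function (array $m): string {
        return base_convert($m[0], 10, 36);
    }, $s);
}

function restore_tokens(string $s): string {
    // Base-36 tokens consist of the digits 0-9 and the lowercase letters a-z.
    return preg_replace_callback('~[0-9a-z]+~', function (array $m): string {
        return base_convert($m[0], 36, 10);
    }, $s);
}

$original = '1372137673.276886940002-19690324617-19694854617-18953258947';
$short    = shorten_tokens($original);
echo $short, PHP_EOL;
echo restore_tokens($short), PHP_EOL;
```

On the JavaScript side, parseInt(token, 36) and Number.prototype.toString(36) cover the same conversion for numbers in the safe integer range.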

To do it in JavaScript as well, you need the help of http://phpjs.org/ :)
I think all the PHP functions I used in this script are available there, like bccomp.
You can tweak this code to get an even smaller string. I'm a bit busy now; if I get time, I will update this answer :)
<?php
/**
* This function will encode a larger number to small string
**/
function encode( $int = null ) {
$chars = 'kwn7uh2qifbj8te9vp64zxcmayrg50ds31';
$uid = '';
while( bccomp( $int, 0, 0) != 0 ) {
$rem = bcmod( $int, 34 );
$int = bcdiv( bcsub( $int, $rem, 0 ), 34, 0 );
$uid = $chars[$rem].$uid;
}
return $uid;
}
/**
* This function will decode a string encoded with above function to its original state
**/
function decode( $uid = null ) {
$chars = 'kwn7uh2qifbj8te9vp64zxcmayrg50ds31';
$id = '0'; // start from '0' so bcadd() always receives a well-formed number
$len = strlen( $uid );
for( $i = $len - 1; $i >= 0; $i-- ) {
$value = strpos( $chars, $uid[$i] );
$id = bcadd( $id, bcmul( $value, bcpow( 34, ( $len - $i - 1 ) ) ) );
}
return $id;
}
/**
* Below function is only for your needs
**/
function int_to_str( $str = null, $decode = false ) {
//$len = array(); // reserved for further updates :)
$numbers1 = explode( "-", $str );
foreach( $numbers1 as &$num1 ) {
$func = ( $decode ) ? "decode" : "encode";
$num1 = implode( ".", array_map( $func, explode( ".", $num1 ) ) );
}
$numbers1 = implode( "-", $numbers1 );
return $numbers1;
}
// Encode your numbers to short strings
$str = int_to_str( "1372137673.276886940002-19690324617-19694854617-18953258947" );
// Decode your encoded string to its original state
$int = int_to_str( $str, true );
echo $str."<br />";
echo $int;
?>

Just because it was fun to do... it encodes the string on a custom base 64 keeping the separators intact:
function encode_token($digit) {
if ($digit < 10)
return (string) $digit;
if ($digit < 36)
return chr(ord('A') + ($digit - 10));
if ($digit < 62)
return chr(ord('a') + ($digit - 36));
if ($digit == 62) return ',';
return '+';
}
function encode_value($value) {
if (in_array($value, array('.', '-'))) return $value;
$int = (int) $value;
$encoded = '';
while($int) {
$encoded .= encode_token($int & 0x3F);
$int >>= 6;
}
return $encoded;
}
function encode($string) {
$values = preg_split(',([\.-]),', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
$encoded = '';
foreach($values as $value)
$encoded .= encode_value($value);
return $encoded;
}
function decode_token($token) {
// ',' and '+' sort before '0' in ASCII, so test them before the range checks
if ($token == ',') return 62;
if ($token == '+') return 63;
if ($token <= '9') return (int) $token;
if ($token <= 'Z') return 10 + ord($token) - ord('A');
return 36 + ord($token) - ord('a');
}
function decode_value($value) {
if (in_array($value, array('.', '-'))) return $value;
$decoded = 0;
for($i = strlen($value) - 1;$i >= 0;$i--) {
$decoded <<= 6;
$decoded |= decode_token($value[$i]);
}
return $decoded;
}
function decode($string) {
$values = preg_split(',([\.-]),', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
$decoded = '';
foreach($values as $value)
$decoded .= decode_value($value);
return $decoded;
}
$string = '1372137673.276886940002-19690324617-19694854617-18953258947';
echo $string . PHP_EOL;
$encoded = encode($string);
echo $encoded . PHP_EOL;
$decoded = decode($encoded);
echo $decoded . PHP_EOL;

PHP: String to full ASCII

I'm using an RTF converter and I need 240 as &#U050&#U052&#U048, but I'm not too sure how to do this.
I have tried using the following function:
function string_to_ascii($string) {
$ascii = NULL;
for ($i = 0; $i < strlen($string); $i++) {
$ascii += "&#U"+str_pad(ord($string[$i]),3,"0",STR_PAD_LEFT);
}
return($ascii);
}
But it still outputs just the number (e.g. 2 = 50), and ord just makes it go mad.
I've tried echo "-&#U"+ord("2")+"-"; and I get 50416 !?!?
I have a feeling it might have something to do with encoding.

I think you're over thinking this. Convert the string to an array with str_split, map ord to all of it, then if you want to format each one, use sprintf (or str_pad if you'd like), like this:
function string_to_ascii($string) {
$array = array_map( 'ord', str_split( $string));
// Optional formatting:
foreach( $array as $k => &$v) {
$v = sprintf( "%03d", $v);
}
return "&#U" . implode( "&#U", $array);
}
Now, when you pass string_to_ascii( '240'), you get back string(18) "&#U050&#U052&#U048".

Just found this:
function to_ascii($string) {
$ascii_string = '';
foreach (str_split($string) as $char) {
$ascii_string .= '&#' . ord($char) . ';';
}
return $ascii_string;
}
here

Generate random 5 characters string

I want to create exact 5 random characters string with least possibility of getting duplicated. What would be the best way to do it? Thanks.

$rand = substr(md5(microtime()),rand(0,26),5);
Would be my best guess--Unless you're looking for special characters, too:
$seed = str_split('abcdefghijklmnopqrstuvwxyz'
.'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
.'0123456789!@#$%^&*()'); // and any other characters
shuffle($seed); // not redundant: array_rand() returns the picked keys in their original order
$rand = '';
foreach (array_rand($seed, 5) as $k) $rand .= $seed[$k];
Example
And, for one based on the clock (fewer collisions since it's incremental):
function incrementalHash($len = 5){
$charset = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
$base = strlen($charset);
$result = '';
$now = (int) explode(' ', microtime())[1]; // whole seconds since the Unix epoch
while ($now > 0){
$i = $now % $base;
$result = $charset[$i] . $result;
$now = (int) ($now / $base); // truncate so $now stays an integer
}
return substr($result, -$len);
}
Note: incremental means easier to guess. If you're using this as a salt or a verification token, don't. A value of "WCWyb" now means that 5 seconds from now it's "WCWyg".

If for loops are in short supply, here's what I like to use:
$s = substr(str_shuffle(str_repeat("0123456789abcdefghijklmnopqrstuvwxyz", 5)), 0, 5);

You can try it simply like this:
$length = 5;
$randomletter = substr(str_shuffle("abcdefghijklmnopqrstuvwxyz"), 0, $length);
more details: http://forum.arnlweb.com/viewtopic.php?f=7&t=25

A speedy way is to use the most volatile characters of the uniqid function.
For example:
$rand = substr(uniqid('', true), -5);

The following should provide the least chance of duplication (you might want to replace mt_rand() with a better random number source e.g. from /dev/*random or from GUIDs):
<?php
$characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
$result = '';
for ($i = 0; $i < 5; $i++)
$result .= $characters[mt_rand(0, 61)];
?>
EDIT:
If you are concerned about security, really, do not use rand() or mt_rand(), and verify that your random data device is actually a device generating random data, not a regular file or something predictable like /dev/zero. mt_rand() considered harmful:
https://spideroak.com/blog/20121205114003-exploit-information-leaks-in-random-numbers-from-python-ruby-and-php
EDIT:
If you have OpenSSL support in PHP, you could use openssl_random_pseudo_bytes():
<?php
$length = 5;
$randomBytes = openssl_random_pseudo_bytes($length);
$characters = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
$charactersLength = strlen($characters);
$result = '';
for ($i = 0; $i < $length; $i++)
$result .= $characters[ord($randomBytes[$i]) % $charactersLength];
?>
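On PHP 7 or later, random_int() is another option: it draws from the operating system's CSPRNG and returns uniformly distributed integers, which avoids the slight bias that ord(...) % 62 introduces (256 is not a multiple of 62). A minimal sketch, where the function name random_code is just an illustration:

```php
<?php
// Hypothetical helper built on random_int() (available since PHP 7.0).
function random_code(
    int $length = 5,
    string $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
): string {
    $max = strlen($alphabet) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // Each index is drawn independently and uniformly from 0..$max.
        $code .= $alphabet[random_int(0, $max)];
    }
    return $code;
}

echo random_code(), PHP_EOL;
```

With 62 characters and a length of 5 there are still only 62^5 (about 916 million) possible values, so duplicates remain possible; if uniqueness matters, check against the codes already stored.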

I always use the same function for this, usually to generate passwords. It's easy to use and useful.
function randPass($length, $strength=8) {
$vowels = 'aeuy';
$consonants = 'bdghjmnpqrstvz';
if ($strength >= 1) {
$consonants .= 'BDGHJLMNPQRSTVWXZ';
}
if ($strength >= 2) {
$vowels .= "AEUY";
}
if ($strength >= 4) {
$consonants .= '23456789';
}
if ($strength >= 8) {
$consonants .= '@#$%';
}
$password = '';
$alt = time() % 2;
for ($i = 0; $i < $length; $i++) {
if ($alt == 1) {
$password .= $consonants[(rand() % strlen($consonants))];
$alt = 0;
} else {
$password .= $vowels[(rand() % strlen($vowels))];
$alt = 1;
}
}
return $password;
}

It seems like str_shuffle would be a good use for this.
Seed the shuffle with whichever characters you want.
$my_rand_strng = substr(str_shuffle("ABCDEFGHIJKLMNOPQRSTUVWXYZ"), -5);

I also did not know how to do this until I thought of using PHP arrays. I am pretty sure this is the simplest way of generating a random string or number with arrays. The code:
function randstr ($len=10, $abc="aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ0123456789") {
    $letters = str_split($abc);
    $str = "";
    for ($i=0; $i<$len; $i++) { // < rather than <=, so exactly $len characters are produced
        $str .= $letters[rand(0, count($letters)-1)];
    }
    return $str;
}
You can use this function like this
randstr(20); // returns a random 20 letter string
// Or like this
randstr(5, "abc"); // returns a random 5 letter string using the letters "abc"

$str = '';
$str_len = 8;
for($i = 0; $i < $str_len; $i++){
    //97 is ascii code for 'a' and 122 is ascii code for z
    $str .= chr(rand(97, 122));
}
return $str;

Similar to Brad Christie's answer, but using the sha1 algorithm (hexadecimal characters 0-9a-f) and prefixed with a random value:
$str = substr(sha1(mt_rand() . microtime()), mt_rand(0,35), 5);
But if you have a defined set of allowed characters:
$validChars = array('0','1','2' /*...*/,'?','-','_','a','b','c' /*...*/);
$validCharsCount = count($validChars);
$str = '';
for ($i=0; $i<5; $i++) {
$str .= $validChars[rand(0,$validCharsCount - 1)];
}
** UPDATE **
As Archimedix pointed out, this does not guarantee the "least possibility of getting duplicated", as the number of combinations is low for the given character range. You will either need to increase the number of characters, or allow extra (special) characters in the string. The first solution would be preferable in your case, I think.

If it's fine that you'll get only hexadecimal digits (0-9 and a-f), then here's my solution:
str_pad(dechex(mt_rand(0, 0xFFFFF)), 5, '0', STR_PAD_LEFT);
I believe that using hash functions is overkill for such a simple task as generating a sequence of random hexadecimal digits. dechex + mt_rand will do the same job, but without unnecessary cryptographic work. str_pad guarantees a 5-character output string even if the random number is less than 0x10000.
Duplicate probability depends on mt_rand's reliability. Mersenne Twister is known for high-quality randomness, so it should fit the task well.

works fine in PHP (php 5.4.4)
$seed = str_split('abcdefghijklmnopqrstuvwxyz');
$rand = array_rand($seed, 5);
$convert = array_map(function($n){
global $seed;
return $seed[$n];
},$rand);
$var = implode('',$convert);
echo $var;
Live Demo

Source: PHP Function that Generates Random Characters
This simple PHP function worked for me:
function cvf_ps_generate_random_code($length=10) {
$string = '';
// You can define your own characters here.
$characters = "23456789ABCDEFHJKLMNPRTVWXYZabcdefghijklmnopqrstuvwxyz";
for ($p = 0; $p < $length; $p++) {
$string .= $characters[mt_rand(0, strlen($characters)-1)];
}
return $string;
}
Usage:
echo cvf_ps_generate_random_code(5);

Here are my random 5 cents ...
$random=function($a, $b) {
return(
substr(str_shuffle(('\\`)/|#'.
password_hash(mt_rand(0,999999),
PASSWORD_DEFAULT).'!*^&~(')),
$a, $b)
);
};
echo($random(0,5));
PHP's password_hash() function (PHP >= 5.5) does the job of generating a decently long set of uppercase and lowercase characters and numbers.
The two strings concatenated before and after password_hash within the $random function can be changed as you like.
The parameters of $random() ($a, $b) are actually the substr() parameters. :)
NOTE: this doesn't need to be a function, it can be a normal variable as well, as one nasty one-liner, like this:
$random=(substr(str_shuffle(('\\`)/|#'.password_hash(mt_rand(0,999999), PASSWORD_DEFAULT).'!*^&~(')), 0, 5));
echo($random);

function CaracteresAleatorios( $Tamanno, $Opciones) {
$Opciones = empty($Opciones) ? array(0, 1, 2) : $Opciones;
$Tamanno = empty($Tamanno) ? 16 : $Tamanno;
$Caracteres=array("0123456789","abcdefghijklmnopqrstuvwxyz","ABCDEFGHIJKLMNOPQRSTUVWXYZ");
$Caracteres= implode("",array_intersect_key($Caracteres, array_flip($Opciones)));
$CantidadCaracteres=strlen($Caracteres)-1;
$CaracteresAleatorios='';
for ($k = 0; $k < $Tamanno; $k++) {
$CaracteresAleatorios.=$Caracteres[rand(0, $CantidadCaracteres)];
}
return $CaracteresAleatorios;
}

I've always used this:
<?php function fRand($len) {
$str = '';
$a = "abcdefghijklmnopqrstuvwxyz0123456789";
$b = str_split($a);
for ($i=1; $i <= $len ; $i++) {
$str .= $b[rand(0,strlen($a)-1)];
}
return $str;
} ?>
When you call it, set the length of the string.
<?php echo fRand([LENGTH]); ?>
You can also change the possible characters in the string $a.

Simple one liner which includes special characters:
echo implode("", array_map(function() {return chr(mt_rand(33,126));}, array_fill(0,5,null)));
Basically, it fills an array of length 5 with null values, replaces each value with a random symbol from the ASCII range, and finally joins them together into a string.
Use the 2nd array_fill parameter to control the length.
It uses the ASCII table range of 33 to 126, which includes the following characters:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

php's preg_replace() versus(vs.) ord()

Which is quicker for converting camelCase to underscores:
using preg_replace() or using ord()?
My guess is that the method using ord() will be quicker,
since preg_replace() can do much more than needed.
<?php
function __autoload($class_name){
$name = strtolower(preg_replace('/([a-z])([A-Z])/', '$1_$2', $class_name));
require_once("some_dir/".$name.".php");
}
?>
OR
<?php
function __autoload($class_name){
// lowercase first letter
$class_name[0] = strtolower($class_name[0]);
$len = strlen($class_name);
for ($i = 0; $i < $len; ++$i) {
// see if we have an uppercase character and replace
if (ord($class_name[$i]) > ord('A') && ord($class_name[$i]) < ord('Z')) {
$class_name[$i] = '_' . strtolower($class_name[$i]);
// increase length of class and position
++$len;
++$i;
}
}
return $class_name;
}
?>
disclaimer -- code examples taken from StackOverflowQuestion 1589468.
edit, after jensgram's array-suggestion and finding array_splice i have come up with the following :
<?php
function __autoload ($string)// actually, function camel2underscore
{
$string = str_split($string);
$pos = count( $string );
while ( --$pos > 0 )
{
$lower = strtolower( $string[ $pos ] );
if ( $string[ $pos ] === $lower )
{
// assuming most letters will be lowercase, this should be an improvement
continue;
}
unset( $string[ $pos ] );
array_splice( $string , $pos , 0 , array( '_' , $lower ) );
}
$string = implode( '' , $string );
return $string;
}
// $pos could be avoided by using the array key, something i might look into later on.
?>
When I test these methods I will add this one,
but feel free to tell me your results at any time ;p

I think (and I'm pretty sure) that the preg_replace method will be faster, but if you want to know, why don't you do a little benchmark, calling both functions 100000 times and measuring the time?
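Such a benchmark could look roughly like the sketch below. The helper time_calls and the two closures are stand-ins written for this illustration (they are not the exact functions from the question), and the timings will differ between machines and PHP versions.

```php
<?php
// Hypothetical timing helper: calls $fn($input) $iterations times and returns the elapsed seconds.
function time_calls(callable $fn, string $input, int $iterations = 100000): float {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

// Regex variant, using the pattern from the question.
$withRegex = function (string $name): string {
    return strtolower(preg_replace('/([a-z])([A-Z])/', '$1_$2', $name));
};

// ord() variant that builds a new string instead of editing in place.
$withOrd = function (string $name): string {
    $out = '';
    for ($i = 0, $len = strlen($name); $i < $len; $i++) {
        $code = ord($name[$i]);
        if ($i > 0 && $code >= 65 && $code <= 90) { // 65..90 is 'A'..'Z'
            $out .= '_';
        }
        $out .= strtolower($name[$i]);
    }
    return $out;
};

printf("preg_replace: %.4f s\n", time_calls($withRegex, 'getExternalPrefix'));
printf("ord loop:     %.4f s\n", time_calls($withOrd, 'getExternalPrefix'));
```

Check first that both variants return the same string for your real class names; otherwise the comparison is not measuring the same work.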

(Not an answer but too long to be a comment - will CW)
If you're going to compare, you should at least optimize a little on the ord() version.
$len = strlen($class_name);
$ordCurr = null;
$ordA = ord('A');
$ordZ = ord('Z');
for ($i = 0; $i < $len; ++$i) {
$ordCurr = ord($class_name[$i]);
// see if we have an uppercase character and replace
if ($ordCurr >= $ordA && $ordCurr <= $ordZ) {
$class_name[$i] = '_' . strtolower($class_name[$i]);
// increase length of class and position
++$len;
++$i;
}
}
Also, pushing the name onto a stack (an array) and joining at the end might prove more efficient than string concatenation.
BUT Is this worth the optimization / profiling in the first place?

My usecase was slightly different than the OP's, but I think it's still illustrative of the difference between preg_replace and manual string manipulation.
$a = "16 East, 95 Street";
echo "preg: ".test_preg_replace($a)."\n";
echo "ord: ".test_ord($a)."\n";
$t = microtime(true);
for ($i = 0; $i < 100000; $i++) test_preg_replace($a);
echo (microtime(true) - $t)."\n";
$t = microtime(true);
for ($i = 0; $i < 100000; $i++) test_ord($a);
echo (microtime(true) - $t)."\n";
function test_preg_replace($s) {
return preg_replace('/[^a-z0-9_-]/', '-', strtolower($s));
}
function test_ord($s) {
$a = ord('a');
$z = ord('z');
$aa = ord('A');
$zz = ord('Z');
$zero = ord('0');
$nine = ord('9');
$us = ord('_');
$ds = ord('-');
$toret = '';
for ($i = 0, $len = strlen($s); $i < $len; $i++) {
$c = ord($s[$i]);
if (($c >= $a && $c <= $z)
|| ($c >= $zero && $c <= $nine)
|| $c == $us
|| $c == $ds)
{
$toret .= $s[$i];
}
elseif ($c >= $aa && $c <= $zz)
{
$toret .= chr($c + $a - $aa); // strtolower
}
else
{
$toret .= '-';
}
}
return $toret;
}
The results are
0.42064881324768
2.4904868602753
so the preg_replace method is vastly superior. Also, string concatenation is slightly faster than inserting into an array and imploding it.

If all you want to do is convert camel case to underscores, you can probably write a more efficient function to do so than either ord or preg_replace in less time than it takes to profile them.

I've written a benchmark using the following four functions and I figured out that the one implemented in Magento is the fastest one (it's Test4):
Test1:
/**
* @see: http://www.paulferrett.com/2009/php-camel-case-functions/
*/
function fromCamelCase_1($str)
{
$str[0] = strtolower($str[0]);
return preg_replace('/([A-Z])/e', "'_' . strtolower('\\1')", $str);
}
Test2:
/**
* @see: http://stackoverflow.com/questions/3995338/phps-preg-replace-versusvs-ord#answer-3995435
*/
function fromCamelCase_2($str)
{
// lowercase first letter
$str[0] = strtolower($str[0]);
$newFieldName = '';
$len = strlen($str);
for ($i = 0; $i < $len; ++$i) {
$ord = ord($str[$i]);
// see if we have an uppercase character and replace
if ($ord > 64 && $ord < 91) {
$newFieldName .= '_';
}
$newFieldName .= strtolower($str[$i]);
}
return $newFieldName;
}
Test3:
/**
* @see: http://www.paulferrett.com/2009/php-camel-case-functions/#div-comment-133
*/
function fromCamelCase_3($str) {
$str[0] = strtolower($str[0]);
$func = create_function('$c', 'return "_" . strtolower($c[1]);');
return preg_replace_callback('/([A-Z])/', $func, $str);
}
Test4:
/**
* @see: http://svn.magentocommerce.com/source/branches/1.6-trunk/lib/Varien/Object.php :: function _underscore($name)
*/
function fromCamelCase_4($name) {
return strtolower(preg_replace('/(.)([A-Z])/', "$1_$2", $name));
}
Result using the string "getExternalPrefix" 1000 times:
fromCamelCase_1: 0.48158717155457
fromCamelCase_2: 2.3211658000946
fromCamelCase_3: 0.63665509223938
fromCamelCase_4: 0.18188905715942
Result using random strings like "WAytGLPqZltMfHBQXClrjpTYWaEEkyyu" 1000 times:
fromCamelCase_1: 2.3300149440765
fromCamelCase_2: 4.0111720561981
fromCamelCase_3: 2.2800230979919
fromCamelCase_4: 0.18472790718079
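With the random test strings I got different output from the last test (Test4), because its pattern consumes two characters per match, so in a run of consecutive capitals only every other one gets an underscore. This should not matter for ordinary camelCase names: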
original:
MmrcgUmNfCCTOMwwgaPuGegEGHPzvUim
last test:
mmrcg_um_nf_cc_to_mwwga_pu_geg_eg_hpzv_uim
other tests:
mmrcg_um_nf_c_c_t_o_mwwga_pu_geg_e_g_h_pzv_uim
As you can see from the timings, the last test takes about the same time in both runs :)
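If consecutive capitals need to be split consistently, one option is a lookbehind, so that each match consumes only the capital itself. This is just a sketch: the function name is made up for the illustration and it has not been benchmarked against the four tests above.

```php
<?php
// Hypothetical helper: inserts "_" before every capital except one at the very start.
function from_camel_case_lookbehind(string $name): string {
    // (?<!^) is a zero-width check that the capital is not at the start of the string.
    return strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $name));
}

echo from_camel_case_lookbehind('getExternalPrefix'), PHP_EOL; // get_external_prefix
echo from_camel_case_lookbehind('MmrcgUmNfCCTOMwwgaPuGegEGHPzvUim'), PHP_EOL;
```

For the random test string this produces the same result as the "other tests" output listed above.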
