PHP problem with Bulgarian-MIK character set, can't get one letter

I know that text in the Bulgarian-MIK character set can be converted by taking each character's ord() value and adding 64, and that the Bulgarian-MIK letters are in the range 127 to 191, but I can't get the letter "а" (ord 127). I tried a lot of ways, but it seems that PHP treats "а" as a blank symbol and I can't get it.
define("PHP_NL", "<br>");
$string = '-------------- 1 --------------'.PHP_NL;
$string .= '413 …±Ї°Ґ±® €¶® X1.000'.PHP_NL;
$string .= '358 ЊЁ ‚®¤  0.5 X1.000'.PHP_NL;
$string .= '--------------------------------'.PHP_NL;
$string .= '1 -Ђ¤°Ё   - ЊЂ‘Ђ: 1 - 6'.PHP_NL;
$string .= '17-08-2018 09:05:32'.PHP_NL;
$string .= '--------------------------------';
That is my string in Bulgarian-MIK encoding. I tried to convert it and every letter is converted fine; only "а" I can't get.
My function:
function ConvertDosToWin($string) {
$chr = null;
for ($i = 1;$i<strlen($string);$i++) {
$chr = mb_convert_encoding($string[$i],'utf-8','windows-1251');
if((ord($chr) >= 127) && (ord($chr)<=(127+64)) ) {
echo 'inside if';
$string[$i] = chr(ord($chr)+64);
}
}
return $string;
}

I fixed the problem using iconv.
function ConvertWinToDos($string) {
$chr = null;
for ($i = 1;$i<strlen($string);$i++) {
$string = iconv(mb_detect_encoding($string,mb_detect_order(),true),'windows-1251',$string);
$chr = $string[$i];
if ((ord($chr) >= 192) && (ord($chr) <= 255)) {
$string[$i] = chr(ord($chr) - 64);
}
}
return $string;
}

I think this approach may help. I used it in an old project; a working example follows. The PHP file is Windows-1251 encoded. If your text is in a different encoding, you need to convert it first using mb_convert_encoding() or iconv(), because ord() returns the value of the first byte of the string as an unsigned integer between 0 and 255.
Test.php:
<?php
// Functions
function ConvertDosToWin($string) {
$chr = null;
for ($i = 0; $i<strlen($string); $i++) {
$chr = $string[$i];
if ((ord($chr) >= 128) && (ord($chr) <= 191)) {
$string[$i] = chr(ord($chr) + 64);
}
}
return $string;
}
function ConvertWinToDos($string) {
$chr = null;
for ($i = 0; $i<strlen($string); $i++) {
$chr = $string[$i];
if ((ord($chr) >= 192) && (ord($chr) <= 255)) {
$string[$i] = chr(ord($chr) - 64);
}
}
return $string;
}
// Output
$text = 'АБВГДЕЖЗИЙ';
$text = ConvertWinToDos($text);
file_put_contents('dos.txt', $text);
?>
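For reference, here is a minimal sketch of going straight from MIK bytes to UTF-8, assuming the mbstring extension is available. The function name mik_to_utf8 is made up for this example. One thing worth noting: the MIK byte for lowercase "а" is 0xA0, which Windows-1251 displays as a no-break space, and that may be why it looks blank before the shift is applied.

```php
<?php
// Sketch only: shift MIK letters (bytes 128-191) up by 64 so they land on the
// Windows-1251 letters (bytes 192-255), then convert the result to UTF-8.
function mik_to_utf8(string $mik): string {
    $cp1251 = '';
    for ($i = 0, $len = strlen($mik); $i < $len; $i++) {
        $byte = ord($mik[$i]);
        if ($byte >= 128 && $byte <= 191) {
            $byte += 64;
        }
        $cp1251 .= chr($byte);
    }
    return mb_convert_encoding($cp1251, 'UTF-8', 'Windows-1251');
}

// 0x80 0x81 0x82 are the first three uppercase letters in MIK, 0xA0 is lowercase "а".
echo mik_to_utf8("\x80\x81\x82\xA0"), "\n";
```

Bytes outside 128-191 are passed through untouched, so MIK box-drawing characters (192 and up) will not come out right; this only covers the letters.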

Related

convert emoji to their hex code

I'm trying to detect the emoji that I get through e.g. a POST (the source is not important).
As an example I'm using this emoji: ✊🏾 (I hope it's visible).
The code for it is U+270A U+1F3FE (I'm using http://unicode.org/emoji/charts/full-emoji-list.html for the codes).
Now I converted the emoji with json_encode and I get: \u270a\ud83c\udffe
Here the only part that matches is 270a. \ud83c\udffe is not equal to U+1F3FE, not even if I add the two values together (1B83A).
How do I get from ✊🏾 to U+270A U+1F3FE with e.g. PHP?

Use mb_convert_encoding and convert from UTF-8 to UTF-32. Then do some additional formatting:
// Strips leading zeros
// And returns str in UPPERCASE letters with a U+ prefix
function format($str) {
$copy = false;
$len = strlen($str);
$res = '';
for ($i = 0; $i < $len; ++$i) {
$ch = $str[$i];
if (!$copy) {
if ($ch != '0') {
$copy = true;
}
// Prevent format("0") from returning ""
else if (($i + 1) == $len) {
$res = '0';
}
}
if ($copy) {
$res .= $ch;
}
}
return 'U+'.strtoupper($res);
}
function convert_emoji($emoji) {
// ✊🏾 --> 0000270a0001f3fe
$emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
$hex = bin2hex($emoji);
// Split the UTF-32 hex representation into chunks
$hex_len = strlen($hex) / 8;
$chunks = array();
for ($i = 0; $i < $hex_len; ++$i) {
$tmp = substr($hex, $i * 8, 8);
// Format each chunk
$chunks[$i] = format($tmp);
}
// Convert chunks array back to a string
return implode(' ', $chunks);
}
echo convert_emoji('✊🏾'); // U+270A U+1F3FE
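As for why json_encode gave \ud83c\udffe in the question: JSON escapes are UTF-16 code units, and code points above U+FFFF are written as a surrogate pair. Adding the two halves does not work; the pair has to be combined as in this small sketch (the function name is hypothetical):

```php
<?php
// Combine a UTF-16 surrogate pair (high, low) into a single code point.
function surrogate_pair_to_code_point(int $high, int $low): int {
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

printf("U+%X\n", surrogate_pair_to_code_point(0xD83C, 0xDFFE)); // U+1F3FE
```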

Simple function, inspired by @d3L's answer above:
function emoji_to_unicode($emoji) {
$emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
$unicode = strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emoji)));
return $unicode;
}
Example:
emoji_to_unicode("💵"); // returns U+1F4B5

You can do like this, consider the emoji a normal character.
$emoji = "✊🏾";
$str = str_replace('"', "", json_encode($emoji, JSON_HEX_APOS));
$myInput = $str;
$myHexString = str_replace('\\u', '', $myInput);
$myBinString = hex2bin($myHexString);
print iconv("UTF-16BE", "UTF-8", $myBinString);

Encoding/decoding string in hexadecimal and back

Given a string that may contain any character (including Unicode characters), how can I convert it into a hexadecimal representation, and then reverse the process to obtain the original string from the hexadecimal?

Use pack() and unpack():
function hex2str( $hex ) {
return pack('H*', $hex);
}
function str2hex( $str ) {
return unpack('H*', $str)[1]; // unpack() returns a 1-indexed array
}
$txt = 'This is test';
$hex = str2hex( $txt );
$str = hex2str( $hex );
echo "{$txt} => {$hex} => {$str}\n";
would produce
This is test => 546869732069732074657374 => This is test

Use a function like this:
<?php
function bin2hex($str) {
$hex = "";
$i = 0;
do {
$hex .= dechex(ord($str[$i]));
$i++;
} while ($i < strlen($str));
return $hex;
}
// Look what happens when ord($str{$i}) is 0...15
// you get a single digit hexadecimal value 0...F
// bin2hex($str) could return something like 4a3,
// decimals(74, 3), whatever the binary value is of those.
function hex2bin($str) {
$bin = "";
$i = 0;
do {
$bin .= chr(hexdec($str[$i].$str[$i + 1]));
$i += 2;
} while ($i < strlen($str));
return $bin;
}
// hex2bin("4a3") just broke. Now what?
// Using sprintf() to get it right.
function bin2hex($str) {
$hex = "";
$i = 0;
do {
$hex .= sprintf("%02x", ord($str[$i]));
$i++;
} while ($i < strlen($str));
return $hex;
}
// now using whatever the binary value of decimals(74, 3)
// and this bin2hex() you get a hexadecimal value you can
// then run the hex2bin function on. 4a03 instead of 4a3.
?>
Source: http://php.net/manual/en/function.bin2hex.php
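For comparison, here is a short sketch of the same round trip with PHP's built-in bin2hex() and hex2bin() (hex2bin() needs PHP 5.4 or later). Both work on raw bytes, so a UTF-8 string survives unchanged. The "\u{...}" escape used for the test character assumes PHP 7.

```php
<?php
$txt = "This is test \u{00E9}";   // ends with é, which is two bytes in UTF-8
$hex = bin2hex($txt);             // every byte becomes exactly two hex digits
$str = hex2bin($hex);             // back to the original bytes

echo $hex, "\n";
var_dump($str === $txt);          // bool(true)
```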

Optimal function to create a random UTF-8 string in PHP? (letter characters only)

I wrote this function that creates a random string of UTF-8 characters. It works well, but the regular expression [^\p{L}] does not seem to filter out all non-letter characters. I can't think of a better way to generate the full range of Unicode without non-letter characters, short of manually searching for and defining the decimal letter ranges between 65 and 65533.
function rand_str($max_length, $min_length = 1, $utf8 = true) {
static $utf8_chars = array();
if ($utf8 && !$utf8_chars) {
for ($i = 1; $i <= 65533; $i++) {
$utf8_chars[] = mb_convert_encoding("&#$i;", 'UTF-8', 'HTML-ENTITIES');
}
$utf8_chars = preg_replace('/[^\p{L}]/u', '', $utf8_chars);
foreach ($utf8_chars as $i => $char) {
if (trim($utf8_chars[$i])) {
$chars[] = $char;
}
}
$utf8_chars = $chars;
}
$chars = $utf8 ? $utf8_chars : str_split('abcdefghijklmnopqrstuvwxyz');
$num_chars = count($chars);
$string = '';
$length = mt_rand($min_length, $max_length);
for ($i = 0; $i < $length; $i++) {
$string .= $chars[mt_rand(1, $num_chars) - 1];
}
return $string;
}

\p{L} might be matching too much. Try limiting it to \p{Ll} and \p{Lu}; \p{L} also includes \p{Lo} (other letters).
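A rough sketch of that idea, assuming mbstring and PHP 7.2 or later for mb_chr(). It draws random code points from the Basic Multilingual Plane and keeps only those matching the lowercase or uppercase letter classes. The function name rand_cased_letters is made up for the example.

```php
<?php
// Sketch only: rejection sampling over code points 0x41-0xFFFF,
// keeping characters in the Ll (lowercase) or Lu (uppercase) categories.
function rand_cased_letters(int $length): string {
    $out = '';
    $count = 0;
    while ($count < $length) {
        $cp = mt_rand(0x41, 0xFFFF);
        if ($cp >= 0xD800 && $cp <= 0xDFFF) {
            continue; // surrogates are not characters
        }
        $ch = mb_chr($cp, 'UTF-8');
        if ($ch !== false && preg_match('/^[\p{Ll}\p{Lu}]$/u', $ch) === 1) {
            $out .= $ch;
            $count++;
        }
    }
    return $out;
}

echo rand_cased_letters(12), "\n";
```

Most draws are rejected, so this is slower per character than indexing into a prebuilt array, but it avoids building the 65533-entry table up front.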

With PHP7 and IntlChar there is now a better way:
function utf8_random_string(int $length) : string {
$r = "";
for ($i = 0; $i < $length; $i++) {
$codePoint = mt_rand(0x80, 0xffff);
$char = \IntlChar::chr($codePoint);
if ($char !== null && \IntlChar::isprint($char)) {
$r .= $char;
} else {
$i--;
}
}
return $r;
}

php's preg_replace() versus(vs.) ord()

Which is quicker for converting camelCase to underscores:
using preg_replace() or using ord()?
My guess is that the method using ord() will be quicker,
since preg_replace() can do much more than needed.
<?php
function __autoload($class_name){
$name = strtolower(preg_replace('/([a-z])([A-Z])/', '$1_$2', $class_name));
require_once("some_dir/".$name.".php");
}
?>
OR
<?php
function __autoload($class_name){
// lowercase first letter
$class_name[0] = strtolower($class_name[0]);
$len = strlen($class_name);
for ($i = 0; $i < $len; ++$i) {
// see if we have an uppercase character and replace
if (ord($class_name[$i]) > ord('A') && ord($class_name[$i]) < ord('Z')) {
$class_name[$i] = '_' . strtolower($class_name[$i]);
// increase length of class and position
++$len;
++$i;
}
}
return $class_name;
}
?>
Disclaimer: code examples taken from Stack Overflow question 1589468.
Edit: after jensgram's array suggestion and finding array_splice(), I have come up with the following:
<?php
function __autoload ($string)// actually, function camel2underscore
{
$string = str_split($string);
$pos = count( $string );
while ( --$pos > 0 )
{
$lower = strtolower( $string[ $pos ] );
if ( $string[ $pos ] === $lower )
{
// assuming most letters will be lowercase, this should be an improvement
continue;
}
unset( $string[ $pos ] );
array_splice( $string , $pos , 0 , array( '_' , $lower ) );
}
$string = implode( '' , $string );
return $string;
}
// $pos could be avoided by using the array key, something i might look into later on.
?>
When I test these methods I will add this one,
but feel free to tell me your results at any time ;p

I think (and I'm pretty sure) that the preg_replace method will be faster, but if you want to know, why don't you run a little benchmark calling both functions 100000 times and measure the time?
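Something along these lines would do as a starting point. It is only a sketch: the helper names camel_preg, camel_loop and bench are made up here, and the two converters only agree on simple camelCase input such as "getExternalPrefix".

```php
<?php
// Regex version, same pattern as in the question.
function camel_preg(string $s): string {
    return strtolower(preg_replace('/([a-z])([A-Z])/', '$1_$2', $s));
}

// Manual version: walk the bytes and prefix each capital (except the first char).
function camel_loop(string $s): string {
    $out = '';
    for ($i = 0, $len = strlen($s); $i < $len; $i++) {
        $o = ord($s[$i]);
        if ($i > 0 && $o >= 65 && $o <= 90) {
            $out .= '_';
        }
        $out .= strtolower($s[$i]);
    }
    return $out;
}

// Call $fn on $input $runs times and return the elapsed seconds.
function bench(callable $fn, string $input, int $runs = 100000): float {
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

printf("preg: %.4f s\n", bench('camel_preg', 'getExternalPrefix'));
printf("loop: %.4f s\n", bench('camel_loop', 'getExternalPrefix'));
```

The absolute numbers depend on the PHP version and machine, so compare the two lines against each other only.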

(Not an answer but too long to be a comment - will CW)
If you're going to compare, you should at least optimize a little on the ord() version.
$len = strlen($class_name);
$ordCurr = null;
$ordA = ord('A');
$ordZ = ord('Z');
for ($i = 0; $i < $len; ++$i) {
$ordCurr = ord($class_name[$i]);
// see if we have an uppercase character and replace
if ($ordCurr >= $ordA && $ordCurr <= $ordZ) {
$class_name[$i] = '_' . strtolower($class_name[$i]);
// increase length of class and position
++$len;
++$i;
}
}
Also, pushing the name onto a stack (an array) and joining at the end might prove more efficient than string concatenation.
BUT Is this worth the optimization / profiling in the first place?

My use case was slightly different from the OP's, but I think it's still illustrative of the difference between preg_replace and manual string manipulation.
$a = "16 East, 95 Street";
echo "preg: ".test_preg_replace($a)."\n";
echo "ord: ".test_ord($a)."\n";
$t = microtime(true);
for ($i = 0; $i < 100000; $i++) test_preg_replace($a);
echo (microtime(true) - $t)."\n";
$t = microtime(true);
for ($i = 0; $i < 100000; $i++) test_ord($a);
echo (microtime(true) - $t)."\n";
function test_preg_replace($s) {
return preg_replace('/[^a-z0-9_-]/', '-', strtolower($s));
}
function test_ord($s) {
$a = ord('a');
$z = ord('z');
$aa = ord('A');
$zz = ord('Z');
$zero = ord('0');
$nine = ord('9');
$us = ord('_');
$ds = ord('-');
$toret = '';
for ($i = 0, $len = strlen($s); $i < $len; $i++) {
$c = ord($s[$i]);
if (($c >= $a && $c <= $z)
|| ($c >= $zero && $c <= $nine)
|| $c == $us
|| $c == $ds)
{
$toret .= $s[$i];
}
elseif ($c >= $aa && $c <= $zz)
{
$toret .= chr($c + $a - $aa); // strtolower
}
else
{
$toret .= '-';
}
}
return $toret;
}
The results are
0.42064881324768
2.4904868602753
so the preg_replace method is vastly superior. Also, string concatenation is slightly faster than inserting into an array and imploding it.

If all you want to do is convert camel case to underscores, you can probably write a more efficient function to do so than either ord or preg_replace in less time than it takes to profile them.

I've written a benchmark using the following four functions and I figured out that the one implemented in Magento is the fastest one (it's Test4):
Test1:
/**
* @see: http://www.paulferrett.com/2009/php-camel-case-functions/
*/
function fromCamelCase_1($str)
{
$str[0] = strtolower($str[0]);
return preg_replace('/([A-Z])/e', "'_' . strtolower('\\1')", $str);
}
Test2:
/**
* @see: http://stackoverflow.com/questions/3995338/phps-preg-replace-versusvs-ord#answer-3995435
*/
function fromCamelCase_2($str)
{
// lowercase first letter
$str[0] = strtolower($str[0]);
$newFieldName = '';
$len = strlen($str);
for ($i = 0; $i < $len; ++$i) {
$ord = ord($str[$i]);
// see if we have an uppercase character and replace
if ($ord > 64 && $ord < 91) {
$newFieldName .= '_';
}
$newFieldName .= strtolower($str[$i]);
}
return $newFieldName;
}
Test3:
/**
* @see: http://www.paulferrett.com/2009/php-camel-case-functions/#div-comment-133
*/
function fromCamelCase_3($str) {
$str[0] = strtolower($str[0]);
$func = create_function('$c', 'return "_" . strtolower($c[1]);');
return preg_replace_callback('/([A-Z])/', $func, $str);
}
Test4:
/**
* @see: http://svn.magentocommerce.com/source/branches/1.6-trunk/lib/Varien/Object.php :: function _underscore($name)
*/
function fromCamelCase_4($name) {
return strtolower(preg_replace('/(.)([A-Z])/', "$1_$2", $name));
}
Result using the string "getExternalPrefix" 1000 times:
fromCamelCase_1: 0.48158717155457
fromCamelCase_2: 2.3211658000946
fromCamelCase_3: 0.63665509223938
fromCamelCase_4: 0.18188905715942
Result using random strings like "WAytGLPqZltMfHBQXClrjpTYWaEEkyyu" 1000 times:
fromCamelCase_1: 2.3300149440765
fromCamelCase_2: 4.0111720561981
fromCamelCase_3: 2.2800230979919
fromCamelCase_4: 0.18472790718079
With the random test strings, the last test gave different output from the others, but this should not appear in your system:
original:
MmrcgUmNfCCTOMwwgaPuGegEGHPzvUim
last test:
mmrcg_um_nf_cc_to_mwwga_pu_geg_eg_hpzv_uim
other tests:
mmrcg_um_nf_c_c_t_o_mwwga_pu_geg_e_g_h_pzv_uim
As you can see from the timings, the last test takes about the same time in both runs :)
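The difference in output comes from the pattern in Test4: /(.)([A-Z])/ consumes the preceding character along with the capital, so in a run of capitals only every second one gets an underscore. Below is a sketch of a variant that marks every capital, using a lookbehind so that nothing else is consumed. The function names are made up here, and this variant has not been benchmarked.

```php
<?php
// Same pattern as Test4: each match swallows the character before the capital.
function underscore_pairs(string $name): string {
    return strtolower(preg_replace('/(.)([A-Z])/', '$1_$2', $name));
}

// Variant: match only the capital itself, unless it is the first character.
function underscore_each(string $name): string {
    return strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $name));
}

echo underscore_pairs('nfCCTOm'), "\n"; // nf_cc_tom
echo underscore_each('nfCCTOm'), "\n";  // nf_c_c_t_om
```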

Reverse letters in each word of a string without using native splitting or reversing functions [duplicate]

This task has already been asked/answered, but I recently had a job interview that imposed some additional challenges to demonstrate my ability to manipulate strings.
Problem: How to reverse words in a string? You can use strpos(), strlen() and substr(), but not other very useful functions such as explode(), strrev(), etc.
Example:
$string = "I am a boy"
Answer:
I ma a yob
Below is my working coding attempt that took me 2 days [sigh], but there must be a more elegant and concise solution.
Intention:
1. get number of words
2. based on word count, grab each word and store into array
3. loop through array and output each word in reverse order
Code:
$str = "I am a boy";
echo reverse_word($str) . "\n";
function reverse_word($input) {
//first find how many words in the string based on whitespace
$num_ws = 0;
$p = 0;
while(strpos($input, " ", $p) !== false) {
$num_ws ++;
$p = strpos($input, ' ', $p) + 1;
}
echo "num ws is $num_ws\n";
//now start grabbing word and store into array
$p = 0;
for($i=0; $i<$num_ws + 1; $i++) {
$ws_index = strpos($input, " ", $p);
//if no more ws, grab the rest
if($ws_index === false) {
$word = substr($input, $p);
}
else {
$length = $ws_index - $p;
$word = substr($input, $p, $length);
}
$result[] = $word;
$p = $ws_index + 1; //move onto first char of next word
}
print_r($result);
//append reversed words
$str = '';
for($i=0; $i<count($result); $i++) {
$str .= reverse($result[$i]) . " ";
}
return $str;
}
function reverse($str) {
$a = 0;
$b = strlen($str)-1;
while($a < $b) {
swap($str, $a, $b);
$a ++;
$b --;
}
return $str;
}
function swap(&$str, $i1, $i2) {
$tmp = $str[$i1];
$str[$i1] = $str[$i2];
$str[$i2] = $tmp;
}

$string = "I am a boy";
$reversed = "";
$tmp = "";
for($i = 0; $i < strlen($string); $i++) {
if($string[$i] == " ") {
$reversed .= $tmp . " ";
$tmp = "";
continue;
}
$tmp = $string[$i] . $tmp;
}
$reversed .= $tmp;
print $reversed . PHP_EOL;
>> I ma a yob

Whoops! Mis-read the question. Here you go (Note that this will split on all non-letter boundaries, not just space. If you want a character not to be split upon, just add it to $wordChars):
function revWords($string) {
//We need to find word boundaries
$wordChars = 'abcdefghijklmnopqrstuvwxyz';
$buffer = '';
$return = '';
$len = strlen($string);
$i = 0;
while ($i < $len) {
$chr = $string[$i];
if ((ord($chr) & 0xC0) == 0xC0) {
//UTF-8 character: test the lead byte's numeric value
if ((ord($chr) & 0xF0) == 0xF0) {
//4 Byte Sequence
$chr .= substr($string, $i + 1, 3);
$i += 3;
} elseif ((ord($chr) & 0xE0) == 0xE0) {
//3 Byte Sequence
$chr .= substr($string, $i + 1, 2);
$i += 2;
} else {
//2 Byte Sequence
$i++;
$chr .= $string[$i];
}
}
if (stripos($wordChars, $chr) !== false) {
$buffer = $chr . $buffer;
} else {
$return .= $buffer . $chr;
$buffer = '';
}
$i++;
}
return $return . $buffer;
}
Edit: Now it's a single function, and stores the buffer naively in reversed notation.
Edit2: Now handles UTF8 characters (just add "word" characters to the $wordChars string)...

My answer is to count the string length, split the letters into an array, and then loop over it backwards. This is also a good way to check whether a word is a palindrome. It only works for plain single-byte strings and numbers.
preg_split() can be replaced with str_split() as well.
/**
* Code snippet to reverse a string (LM)
*/
$words = array('one', 'only', 'apple', 'jobs');
foreach ($words as $d) {
$strlen = strlen($d);
$splits = preg_split('//', $d, -1, PREG_SPLIT_NO_EMPTY);
$reverse = '';
// walk the letters from the last index down to the first
for ($i = $strlen - 1; $i >= 0; $i--) {
$reverse .= $splits[$i];
}
echo "Regular: {$d}".PHP_EOL;
echo "Reverse: {$reverse}".PHP_EOL;
echo "-----".PHP_EOL;
unset($reverse);
}

Without using any function.
$string = 'I am a boy';
$newString = '';
$temp = '';
$i = 0;
while (isset($string[$i]))
{
if($string[$i] == ' ') {
$newString .= $temp . ' ';
$temp = '';
}
else {
$temp = $string[$i] . $temp;
}
$i++;
}
$newString .= $temp . ' ';
echo $newString;
Output: I ma a yob
