How to cut a string in UTF-8?
I found this function on the web:
function cutString($str, $lenght = 100, $end = ' …', $charset = 'UTF-8', $token = '~') {
$str = strip_tags($str);
if (mb_strlen($str, $charset) >= $lenght) {
$wrap = wordwrap($str, $lenght, $token);
$str_cut = mb_substr($wrap, 0, mb_strpos($wrap, $token, 0, $charset), $charset);
return $str_cut .= $end;
} else {
return $str;
}
}
But the result of this function isn't good enough: if I ask it to cut at 200 letters, it returns about 110, and I need about 200.
I have just tested it and it works fine. If you run it with
echo cutString($mystring, 200);
It returns 201 characters from the string I gave it.
I think wordwrap() is the problem in this case: it counts bytes, not characters, so multibyte UTF-8 text gets cut short.
Cut the string manually. I use a function like this (just add 'mb_' and $charset to the string functions):
function str_cut_end_by_word($s, $max_len, $trailer = "...")
{
if (strlen($s) <= $max_len)
return $s;
$s = trim($s);
$s = substr($s, 0, $max_len);
for ($i = strlen($s) - 1; $i >= 0; $i--)
{
if (in_array($s[$i], array(" ", "\t", "\r", "\n")))
{
return rtrim(substr($s, 0, $i)).$trailer;
}
}
return $s;
}
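A minimal sketch of what the multibyte variant could look like, assuming the mbstring extension is available and the input is valid UTF-8. The name cutAtWordBoundary and the regex-based search for the last whitespace are illustrative choices, not part of the answer above:

```php
<?php
// Hypothetical multibyte variant of str_cut_end_by_word(); requires mbstring.
function cutAtWordBoundary($s, $max_len, $trailer = '...', $charset = 'UTF-8')
{
    // Lengths are counted in characters, not bytes.
    if (mb_strlen($s, $charset) <= $max_len) {
        return $s;
    }
    $s = mb_substr(trim($s), 0, $max_len, $charset);
    // The greedy (.*) captures everything before the LAST whitespace character.
    if (preg_match('/^(.*)\s/us', $s, $m)) {
        return rtrim($m[1]) . $trailer;
    }
    // No whitespace found: return the hard cut, as the original does.
    return $s;
}

echo cutAtWordBoundary('Привет прекрасный мир', 12); // Привет...
```

Like the byte-based original, this returns the hard-cut string without a trailer when the cut-off part contains no whitespace.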
I know that the Bulgarian-MIK character set can be converted by taking ord() of each byte and adding 64, and that the Bulgarian-MIK characters are from 127 to 191, but I can't get the letter "а" (ord - 127). I tried a lot of ways, but it seems that PHP treats "а" as a blank symbol, so I can't get it.
define("PHP_NL", "<br>");
$string = '-------------- 1 --------------'.PHP_NL;
$string .= '413 …±Ї°Ґ±® €¶® X1.000'.PHP_NL;
$string .= '358 ЊЁ ‚®¤ 0.5 X1.000'.PHP_NL;
$string .= '--------------------------------'.PHP_NL;
$string .= '1 -Ђ¤°Ё - ЊЂ‘Ђ: 1 - 6'.PHP_NL;
$string .= '17-08-2018 09:05:32'.PHP_NL;
$string .= '--------------------------------';
That is my string in Bulgarian-MIK encoding. I tried to convert it and every letter is converted fine; only "а" I can't get.
My function
function ConvertDosToWin($string) {
$chr = null;
for ($i = 1;$i<strlen($string);$i++) {
$chr = mb_convert_encoding($string[$i],'utf-8','windows-1251');
if((ord($chr) >= 127) && (ord($chr)<=(127+64)) ) {
echo 'inside if';
$string[$i] = chr(ord($chr)+64);
}
}
return $string;
}
I fixed the problem using iconv.
function ConvertWinToDos($string) {
$chr = null;
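// Convert to Windows-1251 once, before the loop, and start at index 0 so the first byte is not skipped.
$string = iconv(mb_detect_encoding($string,mb_detect_order(),true),'windows-1251',$string);
for ($i = 0;$i<strlen($string);$i++) {
$chr = $string[$i];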
if ((ord($chr) >= 192) && (ord($chr) <= 255)) {
$string[$i] = chr(ord($chr) - 64);
}
}
return $string;
}
I think this approach may help. I used it in an old project, and a working example follows. The PHP file is Windows-1251 encoded. If your text is in a different encoding, you need to convert it first using mb_convert_encoding() or iconv(), because ord() returns the value of the first byte of the string as an unsigned integer between 0 and 255.
Test.php:
<?php
// Functions
function ConvertDosToWin($string) {
$chr = null;
for ($i = 0; $i<strlen($string); $i++) {
$chr = $string[$i];
if ((ord($chr) >= 128) && (ord($chr) <= 191)) {
$string[$i] = chr(ord($chr) + 64);
}
}
return $string;
}
function ConvertWinToDos($string) {
$chr = null;
for ($i = 0; $i<strlen($string); $i++) {
$chr = $string[$i];
if ((ord($chr) >= 192) && (ord($chr) <= 255)) {
$string[$i] = chr(ord($chr) - 64);
}
}
return $string;
}
// Output
$text = 'АБВГДЕЖЗИЙ';
$text = ConvertWinToDos($text);
file_put_contents('dos.txt', $text);
?>
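For the original direction (MIK text to something displayable), a sketch under stated assumptions: the input contains only ASCII and the MIK Cyrillic letters in bytes 128 to 191, and the iconv extension is available. The helper name mikToUtf8 is hypothetical. Note that byte 160, the MIK code for "а", is the non-breaking space in Windows-1251, which may be why it looked blank when the raw bytes were viewed as Windows-1251.

```php
<?php
// Hypothetical helper: decode Bulgarian-MIK bytes to UTF-8.
// Assumes only ASCII and the Cyrillic letters (bytes 128-191) occur;
// MIK box-drawing and other symbols above 191 are NOT handled here.
function mikToUtf8(string $mik): string
{
    $win = $mik;
    for ($i = 0, $n = strlen($win); $i < $n; $i++) {
        $code = ord($win[$i]);
        if ($code >= 128 && $code <= 191) {
            // Shift the MIK Cyrillic block onto the Windows-1251 block (192-255).
            $win[$i] = chr($code + 64);
        }
    }
    // Windows-1251 is a standard iconv encoding, so let iconv do the rest.
    return iconv('Windows-1251', 'UTF-8', $win);
}

echo mikToUtf8("\x80\xA0"); // Аа
```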
I'm trying to detect the emoji that I get through e.g. a POST (the source is not important).
As an example I'm using this emoji: ✊🏾 (I hope it's visible)
The code for it is U+270A U+1F3FE (I'm using http://unicode.org/emoji/charts/full-emoji-list.html for the codes)
Now I converted the emoji with json_encode and I get: \u270a\ud83c\udffe
Here the only part that matches is 270a. \ud83c\udffe is not equal to U+1F3FE, not even if I add the two values together (1B83A).
How do I get from ✊🏾 to U+270A U+1F3FE with, e.g., PHP?
Use mb_convert_encoding and convert from UTF-8 to UTF-32. Then do some additional formatting:
// Strips leading zeros
// And returns str in UPPERCASE letters with a U+ prefix
function format($str) {
$copy = false;
$len = strlen($str);
$res = '';
for ($i = 0; $i < $len; ++$i) {
$ch = $str[$i];
if (!$copy) {
if ($ch != '0') {
$copy = true;
}
// Prevent format("0") from returning ""
else if (($i + 1) == $len) {
$res = '0';
}
}
if ($copy) {
$res .= $ch;
}
}
return 'U+'.strtoupper($res);
}
function convert_emoji($emoji) {
// ✊🏾 --> 0000270a0001f3fe
$emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
$hex = bin2hex($emoji);
// Split the UTF-32 hex representation into chunks
$hex_len = strlen($hex) / 8;
$chunks = array();
for ($i = 0; $i < $hex_len; ++$i) {
$tmp = substr($hex, $i * 8, 8);
// Format each chunk
$chunks[$i] = format($tmp);
}
// Convert chunks array back to a string
return implode(' ', $chunks);
}
echo convert_emoji('✊🏾'); // U+270A U+1F3FE
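As background on why json_encode shows \ud83c\udffe: code points above U+FFFF are written in JSON as a UTF-16 surrogate pair, and the pair is combined with bit arithmetic rather than plain addition. A small sketch of that arithmetic (the function name is hypothetical):

```php
<?php
// Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
function surrogatePairToCodePoint(int $high, int $low): int
{
    // The high surrogate carries the top 10 bits, the low surrogate the bottom 10,
    // and the result is offset by 0x10000.
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

printf("U+%X\n", surrogatePairToCodePoint(0xD83C, 0xDFFE)); // U+1F3FE
```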
A simpler function, inspired by @d3L's answer above:
function emoji_to_unicode($emoji) {
$emoji = mb_convert_encoding($emoji, 'UTF-32', 'UTF-8');
$unicode = strtoupper(preg_replace("/^[0]+/","U+",bin2hex($emoji)));
return $unicode;
}
Example
emoji_to_unicode("💵");//returns U+1F4B5
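The one-liner above only strips the leading zeros of the whole hex string, so it fits a single code point. For sequences such as ✊🏾, a sketch that walks the string one code point at a time might look like this, assuming PHP 7.2 or later for mb_ord(); the helper name codepoints is made up for illustration:

```php
<?php
// Hypothetical helper: list the code points of a UTF-8 string as "U+XXXX" tokens.
function codepoints(string $text): string
{
    // The /u modifier makes PCRE treat the input as UTF-8, so the empty
    // pattern splits between code points rather than between bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $out = [];
    foreach ($chars as $ch) {
        $out[] = sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }
    return implode(' ', $out);
}

echo codepoints('✊🏾'); // U+270A U+1F3FE
```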
You can do it like this, treating the emoji as a normal character.
$emoji = "✊🏾";
$str = str_replace('"', "", json_encode($emoji, JSON_HEX_APOS));
$myInput = $str;
$myHexString = str_replace('\\u', '', $myInput);
$myBinString = hex2bin($myHexString);
print iconv("UTF-16BE", "UTF-8", $myBinString);
I have this function to put some random characters into a string:
function random($string) {
$chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
$shuffle_start = substr(str_shuffle($chars), 0, 6);
$shuffle_end = substr(str_shuffle($chars), 0, 6);
$letters = str_split($string);
$str = '';
$count = count($letters);
foreach($letters AS $l) {
$count--;
$str .= $l;
if($count) {
$str .= substr(str_shuffle($chars), 0, 5);
}
}
return $shuffle_start . $str . $shuffle_end;
}
For the string "hello", this function returns something like aApi3VhKJrDjeAbCkalprX7ll7N0Qjo3qymiw. Now I want to remove the random characters from the string so that the word "hello" can be clearly seen.
How can I do this?
Just work backwards: strip 6 characters from the start and the end, and then take every sixth character.
function unrandom($str){
$base = substr($str, 6, strlen($str)-12);
$ret = '';
for($i=0;$i < strlen($base); $i+=6) {
$ret .= substr($base, $i,1);
}
return $ret;
}
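The same idea can also be written by splitting the payload into 6-character groups. This is only an alternative sketch (unrandomChunks is a hypothetical name), checked against the sample string from the question:

```php
<?php
// Hypothetical alternative to unrandom(): split the payload into 6-character
// groups and keep the first character of each group.
function unrandomChunks(string $str): string
{
    // Drop the 6-character prefix and suffix added by random().
    $base = (string) substr($str, 6, -6);
    $out = '';
    foreach (str_split($base, 6) as $group) {
        $out .= substr($group, 0, 1);
    }
    return $out;
}

echo unrandomChunks('aApi3VhKJrDjeAbCkalprX7ll7N0Qjo3qymiw'); // hello
```

Both versions work on bytes, so they assume the original string was single-byte text, as random() itself does with str_split().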
I need a good fast function that shortens strings to a set length with UTF8 support. Adding trailing '...' at ends is a plus. Can anyone help?
Assuming the mb_* functions (the mbstring extension) are installed.
function truncate($str, $length, $append = '…') {
$strLength = mb_strlen($str);
if ($strLength <= $length) {
return $str;
}
return mb_substr($str, 0, $length) . $append;
}
Keep in mind this will add one character (the ellipsis). If you want $append included in the truncated length, subtract mb_strlen($append) from the length at which you chop the string.
Obviously, this will also chop in the middle of words.
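A sketch of the variant that counts $append toward the limit, assuming UTF-8 input and mbstring. The name truncateInclusive is hypothetical, and the encoding is passed explicitly rather than relying on the configured default:

```php
<?php
// Hypothetical variant of truncate(): the result, including $append,
// is never longer than $length characters (as long as $append itself fits).
function truncateInclusive(string $str, int $length, string $append = '…'): string
{
    if (mb_strlen($str, 'UTF-8') <= $length) {
        return $str;
    }
    // Reserve room for the appended marker.
    $keep = max(0, $length - mb_strlen($append, 'UTF-8'));
    return mb_substr($str, 0, $keep, 'UTF-8') . $append;
}

echo truncateInclusive('Привет, мир', 8); // Привет,…
```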
Update
Here is a version that can optionally preserve whole words...
function truncate($str, $length, $breakWords = TRUE, $append = '…') {
$strLength = mb_strlen($str);
if ($strLength <= $length) {
return $str;
}
if ( ! $breakWords) {
while ($length < $strLength AND preg_match('/^\pL$/u', mb_substr($str, $length, 1))) {
$length++;
}
}
return mb_substr($str, 0, $length) . $append;
}
If the third argument is FALSE, it will extend the cut past any letter characters, up to the first non-letter character, so the last word stays whole.
I guess you need to truncate text, so this may be helpful:
if (!function_exists('truncate_string')) {
function truncate_string($string, $max_length) {
if (mb_strlen($string, 'UTF-8') > $max_length){
$string = mb_substr($string, 0, $max_length, 'UTF-8');
$pos = mb_strrpos($string, ' ', 0, 'UTF-8');
if($pos === false) {
return mb_substr($string, 0, $max_length, 'UTF-8').'…';
}
return mb_substr($string, 0, $pos, 'UTF-8').'…';
}else{
return $string;
}
}
}
This is similar to what @alex just posted, but it does not break words.
Try this:
$length = 100;
if(mb_strlen($text, "utf-8") > $length){
// mb_strrpos() takes the offset as its third argument and the encoding as its fourth
$last_space = mb_strrpos(mb_substr($text, 0, $length, "utf-8"), " ", 0, "utf-8");
$text = mb_substr($text, 0, $last_space, "utf-8")." ...";
}
Cheers...
I am working on something where I need to generate the sequence 1,2,3...a,b,c,d...z,11,12,13...aa,ab,ac...zzzzzzzz, using PHP. This will only ever have to happen once, so it doesn't need to be very fast.
Thanks!
function incrementAlphanumeric($number) {
return base_convert(base_convert($number, 36, 10) + 1, 10, 36);
}
echo incrementAlphanumeric(9); // outputs "a"
To populate an array:
$number = 1;
$numbers = array();
while ($number != 'zzzzzzzz') {
$numbers[] = $number;
$number = incrementAlphanumeric($number);
}
http://php.net/base-convert
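Since the full range up to zzzzzzzz holds roughly 2.8 trillion values, collecting it in an array is unlikely to fit in memory. A sketch using a generator instead, so values are produced one at a time; the function name is hypothetical, and note that plain base 36 goes from z to 10, not to 11:

```php
<?php
// Hypothetical generator: yields base-36 values from "1" up to and including $last.
function alphanumericSequence(string $last): Generator
{
    $end = (int) base_convert($last, 36, 10);
    for ($n = 1; $n <= $end; $n++) {
        // base_convert() returns lowercase digits for bases above 10.
        yield base_convert((string) $n, 10, 36);
    }
}

foreach (alphanumericSequence('12') as $value) {
    echo $value, ' ';
}
```

The loop above prints the values from 1 through z, followed by 10, 11 and 12.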
I recently had to do this with a non-standard set of characters (they left out certain characters).
I put together a few functions I found on the net and got:
// this array misses a few letters due to the special naming convention
private $alphabet = array('0', '1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','J','K','L','M','N','P','Q','R','S','T','U','V','W','X','Y','Z');
private function createDecimalFromCode($string){
$decimal = 0;
$base = count($this->alphabet);
$charset = implode('', $this->alphabet);
$charset = substr($charset, 0, $base);
do {
$char = substr($string, 0, 1);
$string = substr($string, 1);
$pos = strpos($charset, $char);
if ($pos === false) {
$error[] = "Illegal character ($char) in INPUT string";
return false;
} // if
$decimal = ($decimal * $base) + $pos;
} while($string <> null);
return $decimal;
}
private function createCodeFromDecimal($decimal){
$s = '';
while($decimal > 0) {
$s = $this->alphabet[$decimal%sizeof($this->alphabet)] . $s;
$decimal = floor($decimal/sizeof($this->alphabet));
}
return $s == '' ? '0' : $s;
}
Essentially I take my last created code, convert it to a decimal, add 1 and then convert that back to the next alphanumeric code.
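The round trip described above can be sketched as one standalone function. The name nextCode and the constant CODE_ALPHABET are hypothetical; the alphabet string mirrors the array above (digits plus letters without I and O):

```php
<?php
// Hypothetical standalone helper; the alphabet mirrors the array above (no I or O).
const CODE_ALPHABET = '0123456789ABCDEFGHJKLMNPQRSTUVWXYZ';

function nextCode(string $code): string
{
    $base = strlen(CODE_ALPHABET);

    // Decode: treat each character as a digit in base 34.
    $decimal = 0;
    foreach (str_split($code) as $char) {
        $pos = strpos(CODE_ALPHABET, $char);
        if ($pos === false) {
            throw new InvalidArgumentException("Illegal character ($char) in code");
        }
        $decimal = $decimal * $base + $pos;
    }

    // Increment, then encode back into the custom alphabet.
    $decimal++;
    $next = '';
    while ($decimal > 0) {
        $next = CODE_ALPHABET[$decimal % $base] . $next;
        $decimal = intdiv($decimal, $base);
    }
    return $next;
}

echo nextCode('H'), ' ', nextCode('Z'); // J 10
```

Integer arithmetic with intdiv() is used here instead of floor() so that the index stays an int; for very long codes the value would overflow PHP's integer range, which this sketch does not handle.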