How to get substring of unicode characters from mysql using php - php

The Unicode characters are stored in mysql database in this format
یہاں تو
My database holds not only Unicode characters but also HTML and English characters, all mixed together.
The problem is that I want to get part of the string from the database field 'post_body'.
I used the following SQL query:
"SELECT SUBSTRING(post_body,1,120) as pst_body from mytable";
This query returns exactly 120 characters. The problem is that when the field contains Unicode symbols, ی should count as 1 Unicode character, so the query does not meet my requirement.
Is there a function that returns the specified number of characters regardless of whether they are Unicode or English characters? That is, if there is Unicode data, it should count ی as one character.

I do not think there is an option for this in MySQL. You can fetch the data from MySQL and then do the work in PHP.
function getSubstring($string, $number){
    // Split on "&" so that each piece holds one HTML entity.
    $keywords = preg_split("/([&])+/", htmlentities($string));
    // Drop whatever came before the first "&".
    unset($keywords[0]);
    // array_slice avoids reading undefined offsets when fewer than $number pieces exist.
    $finalArray = array_slice($keywords, 0, $number);
    return str_replace('amp;', '&', implode('', $finalArray));
}
//$string = یہاں تو
//$number = 10;// number of character to be fetch
echo getSubstring($string,10);
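If the field really holds numeric HTML entities (an assumption, based on the answer splitting on "&"), another option is to decode them into real UTF-8 first and then cut by character with mb_substr. This is only a minimal sketch: the helper name entitySubstring and the sample value are hypothetical and do not come from the question.

```php
<?php
// Hypothetical helper: decode HTML entities, then take $length characters.
function entitySubstring(string $text, int $length): string
{
    // Turns an entity such as "&#1740;" into the single UTF-8 character it stands for.
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // mb_substr counts characters, not bytes.
    return mb_substr($decoded, 0, $length, 'UTF-8');
}

// Assumed stored form of the sample text: numeric entities.
$sample = '&#1740;&#1729;&#1575;&#1722; &#1578;&#1608;';
echo entitySubstring($sample, 4), PHP_EOL;
```

HTML tags in post_body would still be counted as characters here; running strip_tags on the decoded text first may be appropriate if only the visible text matters.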

Related

Encode cyrillic UTF-8 to Unicode symbols

In a project's code there's this part:
$companies = $db->fetchAll("SELECT id FROM companies where contact_person like '%num%\"". $num. "\"%'");
For example, $num = 'ЦВ123456' (ЦВ - cyrillic symbols). But in the db ЦВ is stored in unicode - \u0426\u0412. So there are no hits. So how can I convert $num to unicode so the query becomes ...contact_person like '\u0426\u0412123456 ?
You can use PHP's built-in mb_convert_encoding: convert your $string to the encoding used in the database, and query with the encoded value.
https://www.php.net/manual/en/function.mb-convert-encoding.php
Something like this:
$string = "ЦВ123456";
$unicode = mb_convert_encoding($string, "utf-8", "unicode");
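The \u0426\u0412 form in the question looks like JSON-style escaping, so if the column was filled with json_encode output (an assumption, the question does not say so), json_encode can also produce the search term. A minimal sketch with hypothetical variable names:

```php
<?php
$num = "ЦВ123456";

// json_encode escapes non-ASCII characters as \uXXXX by default;
// substr drops the surrounding double quotes.
$escaped = substr(json_encode($num), 1, -1);

// In a MySQL LIKE pattern the backslash is itself an escape character,
// so each literal backslash is doubled before the value is bound.
$likeValue = '%' . str_replace('\\', '\\\\', $escaped) . '%';

echo $escaped, PHP_EOL;
echo $likeValue, PHP_EOL;
```

Binding $likeValue through a prepared statement, rather than concatenating it into the SQL text, avoids a further layer of string-literal escaping.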

how to change ascii alphabet to utf-8 in php

I have an ASCII string and would like to change its encoding to UTF-8.
But I haven't found a simple function to change ASCII to UTF-8 in PHP.
And vice versa: I would like to change a UTF-8 alphabet to ASCII.
Please advise.
I have tried:
<?php
// utf-8
$str = "ＣＨＯＮＫＩＯＫ";
// I don't even know how to type these UTF-8 characters in PHP. I just copied and pasted the string.
// strlen($str) => 24 bytes
// mb_detect_encoding($str) => utf-8
$str2 = "CHONKIOK";
// strlen($str2) => 8 bytes
// mb_detect_encoding($str2) => ascii
// change ascii to utf-8
$str = mb_convert_encoding($str2, "UTF-8");
echo mb_detect_encoding($str);
// returns ascii
What you are doing is correct.
The documentation for mb_detect_encoding states that it detects the most likely character encoding.
As the entire ASCII set is contained within UTF-8 at the exact same character positions, this function is telling you that it's an ASCII string because it technically is. The bytes of this string are identical whether it is encoded in ASCII or UTF-8.
As you've found, when you include some characters outside of the ASCII set, it gives you the next most probable encoding.
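The point about identical bytes can be checked directly. The following is a small sketch, not part of the original answer; the fullwidth sample string is written with escape sequences so that it does not depend on the file's encoding.

```php
<?php
$ascii = "CHONKIOK";

// Converting ASCII to UTF-8 changes nothing: the bytes are the same.
$converted = mb_convert_encoding($ascii, 'UTF-8', 'ASCII');
var_dump($converted === $ascii);                   // bool(true)

// The same bytes are valid in both encodings.
var_dump(mb_check_encoding($ascii, 'ASCII'));      // bool(true)
var_dump(mb_check_encoding($ascii, 'UTF-8'));      // bool(true)

// A string with characters outside ASCII is no longer valid ASCII.
$fullwidth = "\u{FF23}\u{FF28}";                   // fullwidth "CH"
var_dump(mb_check_encoding($fullwidth, 'ASCII'));  // bool(false)
var_dump(strlen($fullwidth));                      // int(6), 3 bytes per character
```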
What exactly should I do to obtain this string: "ＣＨＯＮＫＩＯＫ" from "CHONKIOK"?
The characters you're after are called "Fullwidth Latin" characters.
Given that the Ｃ character provided is code point 65,315 and a regular C is code point 67, you could possibly obtain the strings you're after by adding the difference of 65,248. This is only possible because the alphabet tends to repeat in the same order throughout different parts of the character charts.
You can get the code point of a character using mb_ord and convert it back to a character using mb_chr, after adding 65,248.
That might look something like:
$str_input = "ABC abc 123";
$convertable = "ABCDEFG12349abcdefg";
$str_output = "";
// The input is plain ASCII, so indexing it byte by byte is safe here.
for ($i = 0; $i < strlen($str_input); $i++) {
    $char = mb_ord($str_input[$i], "UTF-8");
    // Shift only the characters listed in $convertable into the fullwidth block.
    if (str_contains($convertable, $str_input[$i])) $char += 65248;
    $str_output .= mb_chr($char, "UTF-8");
}
echo $str_output; // outputs "ＡＢＣ ａｂｃ １２３"
Just be sure to include the whole alphabet in $convertable.
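Instead of listing every convertible character, a range check can do the same job, since the fullwidth forms U+FF01 to U+FF5E mirror ASCII 0x21 to 0x7E in the same order. The sketch below is a variation on the answer, not the answerer's code; the function names and the constant are hypothetical, and mb_str_split requires PHP 7.4 or later.

```php
<?php
const FULLWIDTH_OFFSET = 0xFEE0; // 65,248, the difference used in the answer

// Hypothetical helper: shift printable ASCII into the fullwidth block.
function toFullwidth(string $text): string
{
    $out = '';
    foreach (mb_str_split($text, 1, 'UTF-8') as $ch) {
        $cp = mb_ord($ch, 'UTF-8');
        // Only 0x21..0x7E have fullwidth twins; the space is left untouched.
        $out .= ($cp >= 0x21 && $cp <= 0x7E) ? mb_chr($cp + FULLWIDTH_OFFSET, 'UTF-8') : $ch;
    }
    return $out;
}

// Hypothetical helper: the reverse direction, fullwidth back to ASCII.
function toHalfwidth(string $text): string
{
    $out = '';
    foreach (mb_str_split($text, 1, 'UTF-8') as $ch) {
        $cp = mb_ord($ch, 'UTF-8');
        $out .= ($cp >= 0xFF01 && $cp <= 0xFF5E) ? mb_chr($cp - FULLWIDTH_OFFSET, 'UTF-8') : $ch;
    }
    return $out;
}

echo toFullwidth("CHONKIOK"), PHP_EOL;
echo toHalfwidth(toFullwidth("CHONKIOK")), PHP_EOL;
```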
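Try this to convert to UTF-8 (note that it converts from ISO-8859-1, of which ASCII is a subset, and is deprecated as of PHP 8.2):
utf8_encode(string $string): string
Try this to convert back (it converts UTF-8 to ISO-8859-1, not strictly ASCII, and is likewise deprecated):
utf8_decode(string $string): string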

Displaying diamond question mark character in Korean characters using str_replace

I'm replacing the second character of the variable named $author with '*', but the result comes out broken.
NOTE: I already added <meta charset="utf-8">, but the error is still the same.
Here's the example code:
$str_to_replace = "*";
$author_second_char = $row['author'][1]; // value: �
$author_display = $row['author']; //value: 제드
$author = str_replace($author_second_char, "*" ,$author_display );
//example output = �*�드
Korean uses multibyte characters, so you cannot index the string like an array, because each position holds only one byte of a Korean character. Instead, you'll need to split the string into an array based on the number of bytes used to store each character. Trial and error yielded a byte length of 3 for Korean characters, which matches how UTF-8 encodes Hangul syllables; any 1-byte characters (such as Latin letters) mixed into the name would throw this split off.
Here's a code snippet showing how to implement it. I simplified it to do a straight replacement once the correct position was identified.
$a = '제드';
$str_to_replace = "*";
$author_array = str_split( $a, 3 ); // necessary because korean uses multibyte characters
$author_array[1] = '*';
$author = implode( '', $author_array);
echo("<br>$a<br>$author");
Output:
제드
제*
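A character-based variant avoids hard-coding the byte length. This is a minimal sketch using mb_substr and mb_strlen rather than the str_split approach above; the function name maskSecondChar is hypothetical.

```php
<?php
// Hypothetical helper: replace the second character of a UTF-8 string.
function maskSecondChar(string $name, string $mask = '*'): string
{
    // Nothing to mask if there is no second character.
    if (mb_strlen($name, 'UTF-8') < 2) {
        return $name;
    }
    // First character + mask + everything from the third character on.
    return mb_substr($name, 0, 1, 'UTF-8')
        . $mask
        . mb_substr($name, 2, null, 'UTF-8');
}

echo maskSecondChar('제드'), PHP_EOL; // prints "제*"
echo maskSecondChar('Jed'), PHP_EOL;  // prints "J*d"
```

Because the positions are counted in characters, the same call works for names that mix Korean and Latin letters.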

Normalize Name-Surname strings: PHP+REGEX (Spanish chars- UTF8)

I have strings with a name and surname that I need to normalize with a function so they look like:
Name Surname (I can receive strings like NAME SURNAME, Name SURNAME, etc.)
I've found this snippet:
echo nameize("HÉCTOR MAÑAÇ");
function nameize($str,$a_char = array("'","-"," ")){
//$str contains the complete raw name string
//$a_char is an array containing the characters we use as separators for capitalization. If you don't pass anything, there are three in there as default.
$string = strtolower($str);
foreach ($a_char as $temp){
$pos = strpos($string,$temp);
if ($pos){
//we are in the loop because we found one of the special characters in the array, so lets split it up into chunks and capitalize each one.
$mend = '';
$a_split = explode($temp,$string);
foreach ($a_split as $temp2){
//capitalize each portion of the string which was separated at a special character
$mend .= ucfirst($temp2).$temp;
}
$string = substr($mend,0,-1);
}
}
return ucfirst($string);
}
It works pretty well but, as you can see by testing this exact example, it doesn't handle Spanish characters (UTF-8). I've tried mb_regex_encoding("UTF-8"); mb_internal_encoding("UTF-8");, UTF-8 headers, etc., but I can't make it work with "special" Spanish characters.
Any suggestions?
I can't see where you use the Multibyte String Functions.
Maybe this would suit your needs:
echo mb_convert_case("HÉCTOR MAÑAÇ", MB_CASE_TITLE, "UTF-8");
output:
Héctor Mañaç
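If the separator list from the question's nameize function needs to be kept, a multibyte-aware version could look roughly like the sketch below. It is an illustration, not the asker's or answerer's code: the name nameizeMb is hypothetical, and it assumes UTF-8 input and a non-empty separator list.

```php
<?php
// Hypothetical multibyte-aware take on the question's nameize().
function nameizeMb(string $str, array $separators = ["'", "-", " "]): string
{
    // Lowercase everything with a function that understands UTF-8.
    $lower = mb_strtolower($str, 'UTF-8');
    // Build a character class from the separators, escaped for the regex.
    $class = preg_quote(implode('', $separators), '/');
    // Uppercase the first letter of the string and any letter after a separator.
    return preg_replace_callback(
        '/(^|[' . $class . '])(\p{L})/u',
        function (array $m): string {
            return $m[1] . mb_strtoupper($m[2], 'UTF-8');
        },
        $lower
    );
}

echo nameizeMb("HÉCTOR MAÑAÇ"), PHP_EOL;     // prints "Héctor Mañaç"
echo nameizeMb("JEAN-PAUL O'NEIL"), PHP_EOL; // prints "Jean-Paul O'Neil"
```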
Your function also works fine for the given example. Please check your file's encoding; it must be UTF-8. You can check it in Notepad++.

MySQL insert String error - UTF-8?

I am inserting into a mysql database. I get the following error when trying to do the insert
Incorrect string value: '\xF0\x9F\x87\xB7\xF0\x9F...' for column 'field_4' at row 1
I thought I had fixed this error by simply changing the column encoding to utf8mb4, and I had tested it, but recently the error appeared again. I am using PHP to parse the string and run the following function before inserting:
function strip_emoji($subject) {
if (is_array($subject)) {
// Recursive strip for multidimensional array
foreach ($subject as &$value) $value = $this->strip_emoji($value);
return $subject;
} else {
// Match Emoticons
$regexEmoticons = '/[\x{1F600}-\x{1F64F}]/u';
$clean_text = preg_replace($regexEmoticons, '', $subject);
// Match Miscellaneous Symbols and Pictographs
$regexSymbols = '/[\x{1F300}-\x{1F5FF}]/u';
$clean_text = preg_replace($regexSymbols, '', $clean_text);
// Match Transport And Map Symbols
$regexTransport = '/[\x{1F680}-\x{1F6FF}]/u';
$clean_text = preg_replace($regexTransport, '', $clean_text);
return $clean_text;
}
}
There are several similar questions to this but I still have these errors. Any further advice on how to prevent this error? I realize that it is an emoji unicode character / sprite but not sure how to deal with it.
You are trying to insert a character that spans 4 bytes, so you have to convert the column to the utf8mb4 character set.
The utf8 character set is limited to characters that span at most 3 bytes (the Unicode characters U+0000 through U+FFFF).
Is the connection using a matching charset as well? A utf8mb4 column is not enough if the connection itself is still utf8.
Try adding ";charset=utf8mb4" to the PDO connection string, or executing the query "SET NAMES utf8mb4".
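The bytes in the error message decode to U+1F1F7, a regional indicator symbol (the building block of flag emoji). Regional indicators occupy U+1F1E6 to U+1F1FF, which none of the three ranges in the question's strip_emoji function covers, so such characters pass through untouched. If stripping is still wanted as a fallback, a broader filter is one option. This is a minimal sketch with a hypothetical helper name; it removes every code point above U+FFFF, not just emoji.

```php
<?php
// The first four bytes quoted in the error message.
$char = "\xF0\x9F\x87\xB7";
var_dump(mb_ord($char, 'UTF-8') === 0x1F1F7); // bool(true)
var_dump(strlen($char));                      // int(4)

// Hypothetical helper: remove every character that needs 4 bytes in UTF-8.
function stripFourByteChars(string $text): string
{
    $clean = preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $text);
    // preg_replace returns null on invalid UTF-8; fall back to the input.
    return $clean ?? $text;
}

echo stripFourByteChars("Hi \u{1F1F7}\u{1F1FA}!"), PHP_EOL; // prints "Hi !"
```

With the column and the connection both on utf8mb4, no stripping should be needed at all.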
