I send the Russian alphabet with an inline keyboard, and in callback_data I pass the letter the user selected. It looks like this:
But Telegram returns the letter to me like this: \xd0\xb3.
I also save the word to compare against in a MySQL database. It comes back like this: \u0438\\u043c\\u043f\\u0435\\u0440\\u0430\\u0442\\u0438\\u0432. The encoding in the database is utf8_general_ci.
As a result, I need to check whether the selected letter is in the word from the database. How can I do that?
MySQL never generates \u0438-style Unicode escapes. It returns the 2-byte UTF-8 character whose hex is D0B3 (which might show as \xd0\xb3), specifically a Cyrillic character. And that is the format you should provide when INSERTing into a MySQL table.
PHP's json_encode will generate either the \uXXXX Unicode form or raw UTF-8, depending on the absence or presence of JSON_UNESCAPED_UNICODE in the second argument.
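For example (a quick sketch; the array key is just for illustration):

    // Without the flag, non-ASCII characters come out as \uXXXX escapes.
    echo json_encode(['letter' => 'г']);                            // {"letter":"\u0433"}

    // With JSON_UNESCAPED_UNICODE, the raw UTF-8 bytes are kept.
    echo json_encode(['letter' => 'г'], JSON_UNESCAPED_UNICODE);    // {"letter":"г"}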
To check the database, do something like:
SELECT col, HEX(col) ...
If "correct" you should get something like
г D0B3
(That's a Cyrillic GHE, not a latin r.)
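As for the actual check (is the selected letter in the word?), here is a minimal PHP sketch, assuming both values end up as plain UTF-8 strings; $word and $callbackLetter are placeholder names:

    // json_decode turns the \uXXXX escapes coming back from the database/API
    // into a real UTF-8 string ("императив").
    $word = json_decode('"\u0438\u043c\u043f\u0435\u0440\u0430\u0442\u0438\u0432"');

    // The raw UTF-8 bytes Telegram sends back for "и".
    $callbackLetter = "\xd0\xb8";

    // mb_strpos counts characters, not bytes, so it is safe for Cyrillic text.
    if (mb_strpos($word, $callbackLetter, 0, 'UTF-8') !== false) {
        echo 'The selected letter is in the word';
    }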
Who knows what Telegram is doing to the data. There are over a hundred packages that use MySQL under the covers; I don't know anything about this one.
Terminology: The encoding is utf8 (or could be utf8mb4). The collation, according to what you say, is utf8_general_ci. Encoding is relevant to the question; collation has to do with the ordering of strings in comparisons and sorting.
Another example: Cyrillic small letter I и = utf8 hex D0B8 = Unicode codepoint U+0438
HTML is quite happy with Unicode codepoints; it will show и when given &#x438;. Perhaps Telegram is converting to codepoints as it builds the web page?
I am using the MySQL query below to check which records differ from their trimmed value:
SELECT id, BINARY(username) as binary_username, TRIM(username) as trim_username FROM table.
The above query returns the binary value and the trimmed value as shown below.
Result of the MySQL query:
The highlighted values in the image above show that the binary value differs from the trimmed value.
I tried the following 2 things:
calculating the length of both the binary and trimmed columns, but it is the same: LENGTH(binary_username) != LENGTH(trim_username);
comparing them directly: binary_username != trim_username;
but both of them return empty results.
How can I fetch these highlighted entries using MySQL?
Edit 1: I have added the HEX value to the query result:
SELECT id, BINARY(username) as binary_username, TRIM(username) as trim_username, HEX(username) as hex_username FROM table
Thanks in advance...
To avoid storing, trimming, etc, the trailing zeros, use VARBINARY instead of BINARY. Why, pray tell, are you using BINARY for text strings??
Please do SELECT HEX(username) FROM ... so we can further diagnose the problem. That screenshot is suspect -- we don't know what the client did to "fix" the output.
Well, none of those are encoded in UTF-8, nor anything else that I recognize. The 'bad' characters (02, 04, 0c, 17) are all "control codes" in virtually all encodings. ("Unicode" is not an encoding method, so it is not relevant.)
Would you like a REGEXP that tests for control codes?
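Something along these lines, as a sketch; the table name users and the connection details are assumptions, and [[:cntrl:]] is MySQL's POSIX class for control codes:

    $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');

    // CONVERT() gives a binary column a character set so REGEXP can be applied
    // (not needed for a text column); [[:cntrl:]] matches control codes such as 02, 04, 0c, 17.
    $stmt = $pdo->query(
        "SELECT id, username, HEX(username) AS hex_username
           FROM users
          WHERE CONVERT(username USING utf8mb4) REGEXP '[[:cntrl:]]'"
    );

    foreach ($stmt as $row) {
        echo $row['id'] . ': ' . $row['hex_username'] . PHP_EOL;
    }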
In PHP, json_encode has an option, JSON_UNESCAPED_UNICODE. See https://www.php.net/manual/en/function.json-encode.php
Without that option, json_encode generates \u1234-style text.
When storing binary data into MySQL, use the binding or escaping mechanism in PDO or mysqli.
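A minimal sketch of the PDO route, assuming a users table with a VARBINARY username column (the names are illustrative):

    $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');

    $rawBytes = "\x02\x41\x42";   // example: arbitrary bytes, including a control code

    // A prepared statement keeps the bytes intact; no manual escaping needed.
    $stmt = $pdo->prepare('INSERT INTO users (username) VALUES (?)');
    $stmt->bindValue(1, $rawBytes, PDO::PARAM_LOB);
    $stmt->execute();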
I'm trying to figure out the collation I should use for simple user tables that only contain two columns, email and password, whereby the input for password will be the output of password_hash($str, PASSWORD_DEFAULT).
What is the lightest-weight collation necessary for password_hash? Is it ascii_bin? latin1_bin?
Collation performance...
..._bin collations have the least to do, so they are the fastest.
ascii_... checks to see if you are using only 7 bits; so quite fast.
..._general_ci compares one character at a time, never combinations of characters. Example: German ß <> 'ss', unlike most other collations.
utf8_... and utf8mb4_... check the bytes for valid encodings.
Meanwhile, MySQL 8.0 has made the utf8mb4_... collations "orders of magnitude faster" than 5.7.
But I usually find that other considerations are more important in any operation in MySQL.
Another example of that... SELECT ... function(foo) ... -- The cost of evaluating the function is usually insignificant relative to the cost of fetching the row. So, I focus on how to optimize fetching the row(s).
As for hashes, ... It depends on whether the function returns a hexadecimal string or a bunch of bytes...
Hex: Use CHARACTER SET ascii COLLATE ascii_bin (or ascii_general_ci). The ..._ci will do case folding, and thereby be more forgiving; this is probably the 'right' collation for this case.
Bytes: Use the datatype BINARY; which is roughly equivalent to CHAR CHARACTER SET binary.
As for whether to use BINARY versus VARBINARY or CHAR versus VARCHAR, that should be controlled by whether the function returns a fixed length result. For example:
MD5('asdfb') --> '23c42e11237c24b5b4e01513916dab4a' returns exactly 32 hex characters, so CHAR(32) COLLATE ascii_general_ci is 'best'.
But, you can save space by using BINARY(16) (no collation) and put UNHEX(MD5('asdfb')) into it.
UUID() --> '161b6a10-e17f-11e8-bcc6-80fa5b3669ce', which has some dashes to get rid of. Otherwise, it is CHAR(36) or BINARY(16).
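A sketch of the "bytes" route for MD5, assuming a throwaway table t (the names are made up); UNHEX() packs the 32 hex characters into 16 raw bytes:

    $pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass');

    // BINARY(16) holds the raw digest; no character set or collation applies.
    $pdo->exec('CREATE TABLE t (id INT PRIMARY KEY, md5 BINARY(16))');

    $stmt = $pdo->prepare('INSERT INTO t (id, md5) VALUES (?, UNHEX(?))');
    $stmt->execute([1, md5('asdfb')]);   // PHP md5() returns the 32-char hex string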
I have set my charset to "utf8 turkish ci" in my MySQL database, because I will store some Turkish characters in my project. I can enter Turkish characters properly and see them. But my problem is this:
For example, I define "username" as varchar(20) and the max length of the input box is 20. That means the user can't write any username longer than 20 characters. But when the user uses Turkish Unicode characters (like ş, i, ü, ğ), I get a "Data too long for column 'username'" error, because Unicode characters are 2 bytes long!
I tried to update my database with phpMyAdmin, but updating the length brings some more errors. So do I have to drop all the tables and recreate them with double the length? (I mean, if the data will be 20 characters, I define it as varchar(40).) I have 30 tables and it is a nightmare. Is there any way I can do this?
MySQL needs up to 3 bytes to store a character in a VARCHAR declared as utf8 (or up to 4 bytes for utf8mb4).
VARCHAR(10) actually does mean 10 characters, up to 30 bytes. It doesn't mean 10 bytes.
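To see the character-versus-byte distinction for yourself, a quick sketch (the Turkish literal is just an example):

    $pdo = new PDO('mysql:host=localhost;charset=utf8mb4', 'user', 'pass');

    // CHAR_LENGTH counts characters, LENGTH counts bytes.
    $row = $pdo->query("SELECT CHAR_LENGTH('şğü') AS chars, LENGTH('şğü') AS bytes")->fetch();
    echo $row['chars'];   // 3
    echo $row['bytes'];   // 6 -- ş, ğ and ü each take 2 bytes in UTF-8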
I suspect your <form> needs to include the charset: <form accept-charset="UTF-8">
I'm using PDO to connect to a MySQL database. In my connection string I have already added charset=utf8mb4, and all of my databases and tables are utf8mb4_unicode_ci, but I'm facing a problem.
In order to search for entries based on their title in the content table, I'm using the code below:
SELECT * FROM content WHERE title LIKE '%سيگنالها%'
The keyword is a Persian word. The above code returns 1 result, which is correct and as expected.
But if I make a form in my PHP app and enter the SAME word, either using a macOS/Windows PC or an Android phone, I get 0 results.
I tracked this issue down, and it seems that even though the word entered by the user looks exactly the same as the one already in the database, they are in fact NOT the same.
According to this online tool, the decimal character codes
for سيگنالها are: 1587, 1610, 1711, 1606, 1575, 1604, 1607, 1575
while
for سیگنالها they are: 1587, 1740, 1711, 1606, 1575, 1604, 1607, 1575
Did you spot the difference? It's the second code (1610 vs 1740). In fact, if you copy both values and paste them in here you will see the difference for yourself.
What can I do to solve this annoying problem? I'm using PHP 7 and MariaDB 10.1.
The "ي" in your first word "سيگنالها" is a different character from the "ی" in the second word "سیگنالها".
The first, ي, is ARABIC LETTER YEH (U+064A).
The second, ی, is ARABIC LETTER FARSI YEH (U+06CC).
They are different Unicode code points, so they do not match.
Please see https://www.key-shortcut.com/en/writing-systems/%EF%BA%95%EF%BA%8F%D8%A2-arabic-alphabet/ for more information.
They are not the same character, even though they look the same when strung together and might even have the same meaning.
The first one (1610) is ARABIC LETTER YEH[1], while the other (1740) is ARABIC LETTER FARSI YEH[2].
[1] https://en.wiktionary.org/wiki/%D9%8A
[2] https://en.wiktionary.org/wiki/%DB%8C
I also created a simple PHP form and tested both strings to see if the value sent through $_POST is kept as-is. Result: the value isn't converted.
So what's probably going on is that you're using an Arabic keyboard to produce Farsi text. The recommended solution is some kind of normalization of the input; a sketch follows the links below.
See these discussions:
1) https://groups.google.com/forum/embed/?place=forum/persian-computing#!topic/persian-computing/xS-G0qIGS8A
2) https://github.com/Samsung/KnowledgeSharingPlatform/blob/master/sameas/lib/lucene-analyzers-common-5.0.0/org/apache/lucene/analysis/fa/PersianNormalizer.java
3) can't search in farsi text with arabic keyboard on iphone
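A minimal normalization sketch in PHP, along the lines of what those links describe; the character list is deliberately tiny (a real normalizer such as Lucene's PersianNormalizer covers more cases), and the same function has to be applied both when storing titles and when building the search keyword so both sides agree:

    // Map Arabic-keyboard variants to their Farsi counterparts.
    function normalize_fa(string $text): string {
        return str_replace(
            ["\u{064A}", "\u{0643}"],   // ARABIC LETTER YEH, ARABIC LETTER KAF
            ["\u{06CC}", "\u{06A9}"],   // ARABIC LETTER FARSI YEH, ARABIC LETTER KEHEH
            $text
        );
    }

    // $pdo is your existing utf8mb4 PDO connection.
    $keyword = normalize_fa($_POST['keyword'] ?? '');
    $stmt = $pdo->prepare('SELECT * FROM content WHERE title LIKE ?');
    $stmt->execute(['%' . $keyword . '%']);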
Can we declare a variable with a fixed length in PHP?
I'm not asking about trimming or doing a substring with a condition.
Can we declare a variable just like a database char(10)?
The reason I'm asking: I'm doing an export process where PHP exports data to the DB.
In the DB I have a field with size 100, and I'm passing a field with length 25 from PHP.
When I look in the DB, it shows some extra space for that field.
Maybe it's your database that is the problem.
The CHAR datatype will always fill up the remaining unused characters when storing data. If you have CHAR(3) and pass 'hi', it will store it as 'hi '. This is true for a lot of relational database engines (MySQL, Postgres, SQLite, etc.).
This is why some database engines also have the VARCHAR datatype (which is variable, like the name says). This one doesn't pad the content with spaces if the data stored in isn't long enough.
In most cases, you are looking for the VARCHAR datatype. CHAR is mostly useful when you store codes, etc. that always have the same length (e.g.: a CHAR(3) field for storing codes like ADD, DEL, CHG, FIX, etc.).
No, a string in PHP is always variable length. You could trim the string to see if extra space is still passed to your DB.
Nope. PHP has no provision to limit string size.
You could simulate something in an object using setter and getter methods, though, throwing an error (or cutting off the data) if the incoming value is larger than allowed.
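A rough sketch of that idea (the class and method names are made up):

    class FixedLengthString {
        private $maxLength;
        private $value = '';

        public function __construct($maxLength) {
            $this->maxLength = $maxLength;
        }

        public function set($value) {
            // Throw (or, alternatively, substr()) when the value is longer than allowed.
            if (strlen($value) > $this->maxLength) {
                throw new InvalidArgumentException("Value longer than {$this->maxLength} bytes");
            }
            $this->value = $value;
        }

        public function get() {
            return $this->value;
        }
    }

    $username = new FixedLengthString(100);
    $username->set('exported value');   // fine: 14 bytes
    echo $username->get();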
No, but I really don't think you're having a problem with PHP. I think you should check your DB2 configuration; perhaps it automatically pads strings with spaces... How many spaces are added? Are they added before? After?
As others have said: No.
I don't understand how it would help anyway. I'm not familiar with DB2, but it sounds like the extra spaces are either coming in with the variable (and thus it should be trimmed) or DB2 is space-padding the value out to 100 characters. If your input is only 25 characters long and the column does space padding, the padding will happen regardless.
If you want to store variable-length strings in DB2, go with VARCHAR; if you always want the same length for each string in the column, define the exact length using CHAR (for postal codes, for instance).
Details on character strings are available here: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0008470.html with a good summary:
Fixed-length character string (CHAR)
All values in a fixed-length string column have the same length, which is determined by the length attribute of the column. The length attribute must be between 1 and 254, inclusive.
Varying-length character strings
There are two types of varying-length character strings:
A VARCHAR value can be up to 32,672 bytes long.
A CLOB (character large object) value can be up to 2 gigabytes minus 1 byte (2,147,483,647 bytes) long.
Of course it then gets more detailed, depending on what sort of encoding you're using, etc... ( like UTF-16 or UTF-32 )