PHP Convert win-1251 to UTF-8 is not working

I have a source string (received from a mail body):
=C7=E4=F0=E0=E2=F1=F2=E2=F3=E9=F2=E5
An online decoder identifies it as Windows-1251 and successfully converts it to UTF-8, while mb_detect_encoding reports ASCII.
I need to do the conversion in PHP. I tried mb_convert_encoding and iconv, solutions from Stack Overflow, and many others, but with no result: the source string is not changed.
Do you know a working solution? Thank you.

Yes, you could apply iconv() in this case, after first turning the =XX escapes back into raw bytes:
header('Content-Type: text/html; charset=utf-8');
$string = '=C7=E4=F0=E0=E2=F1=F2=E2=F3=E9=F2=E5';
$string = str_replace('=', '%', $string);
$string = rawurldecode($string);
$string = iconv('Windows-1251', 'UTF-8', $string);
echo $string; // Здравствуйте
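
The =XX sequences in the question look like quoted-printable, the transfer encoding commonly used in mail bodies (that the mail actually declares this encoding is an assumption here). If so, PHP's built-in quoted_printable_decode() undoes it directly, without the str_replace()/rawurldecode() detour. A minimal sketch:

```php
<?php
// Assumption: the mail part is quoted-printable encoded and its charset is Windows-1251.
$encoded = '=C7=E4=F0=E0=E2=F1=F2=E2=F3=E9=F2=E5';

// Step 1: undo the transfer encoding; the result is raw Windows-1251 bytes.
$bytes = quoted_printable_decode($encoded);

// Step 2: convert those bytes to UTF-8.
$utf8 = iconv('Windows-1251', 'UTF-8', $bytes);

echo $utf8, PHP_EOL; // Здравствуйте
```

This also suggests why mb_detect_encoding reported ASCII and why the direct conversions changed nothing: before the first step, the string contains only ASCII characters (equals signs, digits and letters).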

Related

PHP - Encode base64 with UTF-16 Schema

I'm trying to encode strings in Base64 with the UTF-16 schema.
For testing, I'm using this online tool, which works perfectly, but I need the same thing in PHP code:
https://8gwifi.org/Base64Functions.jsp
with the configuration set to:
Encode - Schema UTF-16
So far I have tried this:
$str = '/v8AVABoAGkAcwAgAGkAcwAgAGEAbgAgAGUAbgBjAG8AZABlAGQAIABzAHQAcgBpAG4AZw==';
$str = base64_decode($str);
$str = iconv("UTF-16", "ISO-8859-1", $str);
echo $str.PHP_EOL.PHP_EOL;
//$str = 'This is an encoded string';
$str = iconv("ISO-8859-1", "UTF-16", $str);
$str = base64_encode($str);
echo $str;
The first conversion works well: it takes the encoded string and decodes it correctly.
But the reverse conversion outputs this:
//5UAGgAaQBzACAAaQBzACAAYQBuACAAZQBuAGMAbwBkAGUAZAAgAHMAdAByAGkAbgBnAA==
which is not the same.
Thanks a lot.

//$str = 'This is an encoded string';
Encoding and decoding work fine without any external support in PHP version 7.2.5.
Please try it.
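
A likely explanation for the mismatch in the question: the original Base64 decodes to bytes that start with FE FF (a big-endian byte order mark), while the re-encoded output starts with FF FE, which suggests that iconv's generic "UTF-16" target produced little-endian data on that machine. Naming the byte order explicitly and prepending the BOM by hand should reproduce the original string. The sketch below uses mb_convert_encoding and assumes the mbstring extension is available; the helper names are made up for illustration.

```php
<?php
// Hypothetical helpers, assuming the mbstring extension is loaded.

// Encode text as Base64 over UTF-16BE with a leading FE FF byte order mark.
function base64Utf16Encode(string $text, string $from = 'ISO-8859-1'): string
{
    $utf16be = mb_convert_encoding($text, 'UTF-16BE', $from);
    return base64_encode("\xFE\xFF" . $utf16be);
}

// Reverse: Base64-decode, strip a leading big-endian BOM if present, convert back.
function base64Utf16Decode(string $b64, string $to = 'ISO-8859-1'): string
{
    $bytes = base64_decode($b64, true);
    if ($bytes === false) {
        throw new InvalidArgumentException('Invalid Base64 input');
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        $bytes = substr($bytes, 2);
    }
    return mb_convert_encoding($bytes, $to, 'UTF-16BE');
}

$original = '/v8AVABoAGkAcwAgAGkAcwAgAGEAbgAgAGUAbgBjAG8AZABlAGQAIABzAHQAcgBpAG4AZw==';
$plain = base64Utf16Decode($original);

echo $plain, PHP_EOL;                               // This is an encoded string
var_dump(base64Utf16Encode($plain) === $original);  // bool(true)
```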

Converting to UTF-8 in PHP

I'm calling the Google Translate API and I need to send UTF-8 as input.
I have a piece of code to convert a string to UTF-8, but no matter what I try, when I check the encoding right after the conversion I get ASCII as the encoding of the string.
Here is the most popular answer I could find:
iconv(mb_detect_encoding($text, mb_detect_order(), true), "UTF-8", $text);
The other way I tried was like this:
$text = utf8_encode($text);
As soon as I check the encoding again (in both cases) I get ASCII as the result:
echo mb_detect_encoding($text);
What am I missing here?
Thanks for any tips.
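
A note on why the check can keep reporting ASCII: every pure-ASCII string is already valid UTF-8, byte for byte, so a conversion to UTF-8 has nothing to change and a detector can legitimately answer "ASCII" afterwards. Validating with mb_check_encoding() is usually a more direct test than detection. A small sketch, assuming mbstring is available and using sample strings that are not from the question:

```php
<?php
$ascii  = 'Hello, world';   // only bytes below 0x80
$latin1 = "caf\xE9";        // "café" in ISO-8859-1, which is not valid UTF-8

// Pure ASCII is already valid UTF-8; converting it changes nothing.
var_dump(mb_check_encoding($ascii, 'UTF-8'));                        // bool(true)
var_dump(mb_convert_encoding($ascii, 'UTF-8', 'ASCII') === $ascii);  // bool(true)

// A string with non-ASCII bytes does need a real conversion.
var_dump(mb_check_encoding($latin1, 'UTF-8'));                       // bool(false)
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump(bin2hex($utf8));                                            // string(10) "636166c3a9"
```

If the text sent to the API passes mb_check_encoding($text, 'UTF-8'), it is already acceptable as UTF-8 input, whatever mb_detect_encoding() calls it.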

Convert UTF-8 to WINDOWS-1252 using PHP

I need to convert UTF-8 text to Windows-1252 using PHP, and I'm not having much luck so far. My aim is to transfer text to a third-party system and exclude any characters not in the Windows-1252 character set.
I've tried both iconv and mb_convert_encoding but both give unexpected results.
$text = 'KØBENHAVN Ø ô& üü þþ';
echo iconv("UTF-8", "WINDOWS-1252", $text);
echo mb_convert_encoding($text, "WINDOWS-1252");
Output for both is 'K?BENHAVN ? ?& ?? ??'
I would not have expected the ?'s as these characters are in the WINDOWS-1252 character set.
Can anyone help shed some light on this for me, please?

I ended up converting the text from UTF-8 to WINDOWS-1252 and then back from WINDOWS-1252 to UTF-8. This gave the desired output.
$text = "Ѭjanky";
// //IGNORE on the target encoding asks iconv to drop characters it cannot represent
$converted = iconv("UTF-8", "WINDOWS-1252//IGNORE", $text);
$converted = iconv("WINDOWS-1252", "UTF-8//IGNORE", $converted);
echo $converted; // outputs "janky"
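
If the goal is simply to drop characters that Windows-1252 cannot represent, one alternative to the round trip is mbstring's substitute-character setting. This is a sketch under the assumptions that the mbstring extension is available and that the input really is UTF-8; toWindows1252 is a hypothetical helper name, and the sample strings are written with byte escapes so they do not depend on how the source file is saved.

```php
<?php
// Hypothetical helper: convert UTF-8 to Windows-1252, silently dropping
// any character that has no Windows-1252 equivalent.
function toWindows1252(string $utf8): string
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character('none');         // drop unconvertible characters
    try {
        return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
    } finally {
        mb_substitute_character($previous);  // restore the global setting
    }
}

// "\xD1\xAC" is U+046C in UTF-8, which Windows-1252 cannot represent.
echo toWindows1252("\xD1\xACjanky"), PHP_EOL;           // janky

// "\xC3\x98" is U+00D8 in UTF-8; it maps to the single byte 0xD8.
echo bin2hex(toWindows1252("K\xC3\x98BENHAVN")), PHP_EOL; // 4bd842454e4841564e
```

Restoring the previous substitute character matters because the setting is global to the request.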

html_entity_decode in FPDF(using tFPDF extension)

I am using tFPDF to generate a PDF. The php file is UTF-8 encoded.
I want &copy;, for example, to be output in the PDF as the copyright symbol.
I have tried iconv, html_entity_decode, and htmlspecialchars_decode. When I take the string I am trying to decode, hard-code it into a different file, and decode it there, it works as expected. So for some reason it is not being output in the PDF. I have tried output buffering. I am using DejaVuSansCondensed.ttf (a TrueType font).
Link to tFPDF: http://fpdf.org/en/script/script92.php
I am out of ideas. I tried double decoding, I checked everywhere to make sure it was not being encoded anywhere else.

You need this:
iconv('UTF-8', 'windows-1252', html_entity_decode($str));
html_entity_decode() decodes the HTML entities, but for some reason you must then convert the result from UTF-8 to Windows-1252 with iconv(). I suppose this is an FPDF secret, because in a normal browser view it is displayed correctly.

Actually, the FPDF project FAQ has an explanation for it:
http://www.fpdf.org/~~V/en/FAQ.php#q7
Don't use UTF-8 encoding. Standard FPDF fonts use ISO-8859-1 or
Windows-1252. It is possible to perform a conversion to ISO-8859-1
with utf8_decode():
$str = utf8_decode($str);
But some characters such as Euro won't be translated correctly. If the
iconv extension is available, the right way to do it is the following:
$str = iconv('UTF-8', 'windows-1252', $str);
So, as emfi suggests, a combination of iconv() and html_entity_decode() PHP functions is the solution to your question:
$str = iconv('UTF-8', 'windows-1252', html_entity_decode("&copy;"));
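
Put together, the approach from the two answers above could be wrapped in a small helper. This is only a sketch: pdfText is a made-up name, and it assumes the text is drawn with one of FPDF's standard (Windows-1252) fonts. A Unicode font loaded through tFPDF would presumably want the decoded UTF-8 string left as it is, without the iconv() step.

```php
<?php
// Hypothetical helper: turn HTML with entities into a Windows-1252 string
// suitable for FPDF's standard fonts.
function pdfText(string $html): string
{
    // Decode named and numeric entities into UTF-8 characters first...
    $utf8 = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // ...then convert to the single-byte encoding the standard fonts expect.
    $converted = iconv('UTF-8', 'Windows-1252', $utf8);

    // iconv() returns false when a character cannot be converted.
    return $converted === false ? '' : $converted;
}

echo bin2hex(pdfText('&copy;')), PHP_EOL; // a9
echo bin2hex(pdfText('&euro;')), PHP_EOL; // 80
```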

I'm pretty sure there is no automatic conversion available from HTML entity codes to their UTF-8 equivalents. In cases like this I have resorted to manual string replacement, e.g.:
$strOut = str_replace( "&copy;", "\xc2\xa9", $strIn );

I fixed the problem with this code:
$str = utf8_decode($str);
$str = html_entity_decode($str);
$str = iconv('UTF-8', 'windows-1252',$str);

You can also use SetFont('Symbol') or SetFont('ZapfDingbats') to select the special characters that you want to print.
define('TICK', chr(214)); # in font 'Symbol' -> print a tick symbol
...
$this->SetFont('Symbol', 'B', 8);
$this->Cell(5, 5, TICK, 0, 0, 'L'); # will output the symbol to the PDF (arguments: w, h, txt, border, ln, align)
Output: √
This way, you won't need to convert to ISO-8859-1 or Windows-1252, or use another library such as tFPDF, for special characters :)
Refer to http://www.fpdf.org/en/script/script4.php for the font and character list.

Converting HTML Entities in UTF-8 to SHIFT_JIS

I am working with a website that needs to target old Japanese mobile phones that are not Unicode-enabled. The problem is that the text for the site is saved in the database as HTML entities (i.e., &#1234;). This database absolutely cannot be changed, as it is used for several hundred websites.
What I need to do is convert these entities to actual characters, and then convert the string encoding before sending it out, as the phones render the entities without converting them first.
I've tried both mb_convert_encoding and iconv, but all they are doing is converting the encoding of the entities, but not creating the text.
Thanks in advance
EDIT:
I have also tried html_entity_decode. It is producing the same results - an unconverted string.
Here is the sample data I am working with.
The desired result: シェラトン・ヌーサリゾート＆スパ
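The HTML codes: &#12471;&#12455;&#12521;&#12488;&#12531;&#12539;&#12492;&#12540;&#12469;&#12522;&#12478;&#12540;&#12488;&#65286;&#12473;&#12497;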
The output of html_entity_decode([the string above],ENT_COMPAT,'SHIFT_JIS'); is identical to the input string.

Just take care you're creating the right codepoints out of the entities. If the original encoding is UTF-8 for example:
$originalEncoding = 'UTF-8'; // that's only assumed, you have not shared the info so far
$targetEncoding = 'SHIFT_JIS';
$string = '... whatever you have ... ';
// superfluous, but to get the picture:
$string = mb_convert_encoding($string, 'UTF-8', $originalEncoding);
$string = html_entity_decode($string, ENT_COMPAT, 'UTF-8');
$stringTarget = mb_convert_encoding($string, $targetEncoding, 'UTF-8');
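
The same two steps can be sketched as a self-contained function. The function name is hypothetical, the sample input is written as decimal numeric character references, and the sketch assumes the mbstring extension with its 'SJIS' encoding. The key point is that html_entity_decode() is asked for UTF-8 and the conversion to Shift_JIS happens only afterwards.

```php
<?php
// Hypothetical helper: decode HTML entities, then re-encode as Shift_JIS.
function entitiesToShiftJis(string $html): string
{
    // Step 1: entities -> real characters, using UTF-8 as the working encoding.
    $utf8 = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Step 2: UTF-8 -> Shift_JIS for the target devices.
    return mb_convert_encoding($utf8, 'SJIS', 'UTF-8');
}

// Five katakana characters written as decimal numeric character references.
$input = '&#12471;&#12455;&#12521;&#12488;&#12531;';
$sjis  = entitiesToShiftJis($input);

// Each of these characters takes two bytes in Shift_JIS.
var_dump(strlen($sjis)); // int(10)

// Converting back to UTF-8 shows the characters survived the trip.
echo mb_convert_encoding($sjis, 'UTF-8', 'SJIS'), PHP_EOL; // シェラトン
```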

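I found this function on php.net; it works for me with your example:
function unhtmlentities($string) {
    // The /e modifier was removed in PHP 7, so callbacks are used instead.
    // mb_chr() (PHP 7.2+, mbstring) is used because chr() only covers single bytes.
    // replace hexadecimal numeric entities
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) {
        return mb_chr(hexdec($m[1]), 'UTF-8');
    }, $string);
    // replace decimal numeric entities
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) {
        return mb_chr((int) $m[1], 'UTF-8');
    }, $string);
    // replace named entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}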

I think you just need html_entity_decode.
Edit: Based on your edit:
// create_function() was removed in PHP 8, so a closure is used here.
// The "HTML-ENTITIES" pseudo-encoding is deprecated as of PHP 8.2.
$output = preg_replace_callback("/(&#[0-9]+;)/", function ($m) {
    return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES");
}, $original_string);
Note that this is just the first step: converting your entities to the actual characters.

Just to contribute, since I ran into a similar encoding bug while coding, I would suggest this snippet:
$string_to_encode = " your string ";
if (mb_detect_encoding($string_to_encode) !== false) {
    $converted_string = mb_convert_encoding($string_to_encode, 'UTF-8');
}
It may not be the best option for a large amount of data, but it still works.
