I'm trying to encode strings in base64 using the UTF-16 scheme.
For testing, I'm using this online tool, which works perfectly, but I need the same thing in PHP code:
https://8gwifi.org/Base64Functions.jsp
With the configuration set to:
Encode - Schema UTF-16
So far I have tried this:
$str = '/v8AVABoAGkAcwAgAGkAcwAgAGEAbgAgAGUAbgBjAG8AZABlAGQAIABzAHQAcgBpAG4AZw==';
$str = base64_decode($str);
$str = iconv("UTF-16", "ISO-8859-1", $str);
echo $str.PHP_EOL.PHP_EOL;
//$str = 'This is an encoded string';
$str = iconv("ISO-8859-1", "UTF-16", $str);
$str = base64_encode($str);
echo $str;
The first conversion works well: it takes the encoded string and decodes it correctly.
But the reverse conversion outputs this:
//5UAGgAaQBzACAAaQBzACAAYQBuACAAZQBuAGMAbwBkAGUAZAAgAHMAdAByAGkAbgBnAA==
Which is not the same.
Thanks a lot.
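One likely explanation for the mismatch above: the expected string starts with /v8, which decodes to the bytes FE FF (a big-endian byte order mark), while the PHP output starts with //5U, which decodes to FF FE 54 (a little-endian mark followed by "T"). With the plain name "UTF-16", iconv chooses the byte order itself, and that choice depends on the implementation. A minimal sketch, assuming the online tool emits big-endian UTF-16 with a BOM; the helper name base64_utf16be_bom is hypothetical and not part of the original post:

```php
<?php
// Hypothetical helper (not from the original post): base64 over
// big-endian UTF-16 with a leading byte order mark.
function base64_utf16be_bom(string $text, string $sourceEncoding = 'ISO-8859-1'): string
{
    // "UTF-16BE" fixes the byte order, and iconv then adds no BOM of its own.
    $utf16 = iconv($sourceEncoding, 'UTF-16BE', $text);
    if ($utf16 === false) {
        throw new RuntimeException('iconv conversion failed');
    }
    // Prepend the big-endian byte order mark (FE FF) by hand.
    return base64_encode("\xFE\xFF" . $utf16);
}

echo base64_utf16be_bom('This is an encoded string'), PHP_EOL;
// /v8AVABoAGkAcwAgAGkAcwAgAGEAbgAgAGUAbgBjAG8AZABlAGQAIABzAHQAcgBpAG4AZw==
```

Decoding needs no change, because the BOM in the input tells iconv which byte order to read.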
//$str = 'This is an encoded string';
Encoding and decoding work fine without any external support in PHP version 7.2.5.
Please try it.
Hi, I'm having a problem converting special characters to upper case.
With regular strtoupper I get something like DANIëL, and when applying mb_strtoupper I get DANI?L.
Here's the code:
mb_strtoupper(rtrim($pieces[1], ","), 'UTF-8')
Mind you, I already have this running on the input:
iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $tr->TD[0])
Could this be the reason? Or is there something else?
This is a typical issue of trying to uppercase a Latin1 string when the converter expects UTF-8.
Be sure to check your string source. This sample will work if your text editor saves files in the Latin1 code page and not in UTF-8:
$str = "daniël"; //or your rtrim($pieces[1],",")
$str = mb_convert_encoding($str,'UTF-8','Latin1');
echo mb_strtoupper($str, 'UTF-8');
//will echo DANIËL
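If the source encoding is not known in advance, one option is to convert only when the bytes are not already valid UTF-8. A minimal sketch under that assumption; the helper name upper_utf8 and the ISO-8859-1 fallback are illustrative choices, not part of the original answer:

```php
<?php
// Hypothetical helper: normalise to UTF-8 only when needed, then uppercase.
function upper_utf8(string $s, string $fallbackEncoding = 'ISO-8859-1'): string
{
    // Bytes that are not valid UTF-8 are assumed to be in the fallback encoding.
    if (!mb_check_encoding($s, 'UTF-8')) {
        $s = mb_convert_encoding($s, 'UTF-8', $fallbackEncoding);
    }
    return mb_strtoupper($s, 'UTF-8');
}

$latin1 = "dani\xEBl";     // "daniël" as ISO-8859-1 bytes
$utf8   = "dani\xC3\xABl"; // "daniël" as UTF-8 bytes

echo upper_utf8($latin1), PHP_EOL; // DANIËL
echo upper_utf8($utf8), PHP_EOL;   // DANIËL
```

This check is a heuristic: some Latin1 byte sequences also happen to be valid UTF-8, so an explicit, known source encoding is preferable where available.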
My boss is forcing me to use an access mdb database (yes, I'm serious) in a php server.
I can connect it and retrieve data from it, but as you could imagine, I have problems with encodings because I want to work using utf8.
The thing is that now I have two "solutions" to translate Windows-1252 to UTF-8
This is the first way:
mb_convert_encoding($string, "UTF-8", "Windows-1252").
It works, but the problem is that special chars are not properly converted, for example char º is converted to \u00ba and char Ó is converted to \u00d3.
My second way is doing this:
mb_convert_encoding(mb_convert_encoding($string, "UTF-8", "Windows-1252"), "HTML-ENTITIES", "UTF-8")
It works too, but the same thing happens: special chars are not correctly converted. Char º is converted to &ordm;
Does anybody know how to properly change encoding including special chars?
Or does anybody know how to convert from &ordm; and \u00ba to something readable?
I did a simple test to convert code points to letters:
<?php
function codepoint_decode($str) {
return json_decode(sprintf('"%s"', $str));
}
$string_with_codepoint = "Ahed \u00d3\u00ba\u00d3";
// $string_with_codepoint = mb_convert_encoding($string, "UTF-8", "Windows-1252");
$output = codepoint_decode($string_with_codepoint);
echo $output; // Ahed ÓºÓ
Credit goes to this answer.
I finally found the solution.
I had the solution from the beginning but I was doing my tests wrong.
My bad.
The right way to do it for me is mb_convert_encoding($string, "UTF-8", "Windows-1252")
But I was checking the result like this:
$stringUTF8 = mb_convert_encoding($string, "UTF-8", "Windows-1252");
echo json_encode($stringUTF8);
That's why it was returning Unicode escapes like \u20ac. If I had done this instead:
$stringUTF8 = mb_convert_encoding($string, "UTF-8", "Windows-1252");
echo $stringUTF8;
I would have seen the solution from the beginning. It was json_encode() that was turning special chars into Unicode escape sequences.
Thanks everybody for your help!!
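To make the difference visible, here is a small sketch that prints the converted string three ways. The sample bytes are an assumption chosen to match the characters mentioned in the question (º, Ó and €), and JSON_UNESCAPED_UNICODE is shown as one way to keep the characters literal inside JSON:

```php
<?php
// Assumed sample input: "º Ó €" as Windows-1252 bytes.
$cp1252 = "\xBA \xD3 \x80";

$utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');

echo $utf8, PHP_EOL;                                      // º Ó €
echo json_encode($utf8), PHP_EOL;                         // "\u00ba \u00d3 \u20ac"
echo json_encode($utf8, JSON_UNESCAPED_UNICODE), PHP_EOL; // "º Ó €"
```

Both JSON forms decode to the same string, so the escaped output is not a conversion error, only a different representation.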
I have a source string (received from a mail body):
=C7=E4=F0=E0=E2=F1=F2=E2=F3=E9=F2=E5
An online decoder says it is Windows-1251 encoded and successfully converts it to UTF-8. mb_detect_encoding says it is ASCII.
I need to convert it via PHP. I tried mb_convert_encoding and iconv, solutions from Stack Overflow (for example and one more) and many others, but with no result: the source string is not changed.
Maybe you know a working solution? Thank you.
Yes, you could try applying iconv() in this case:
header('Content-Type: text/html; charset=utf-8');
$string = '=C7=E4=F0=E0=E2=F1=F2=E2=F3=E9=F2=E5';
$string = str_replace('=', '%', $string);
$string = rawurldecode($string);
$string = iconv('Windows-1251', 'UTF-8', $string);
echo $string; // Здравствуйте
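The =XX notation in the source string is the quoted-printable transfer encoding used in mail bodies, which also explains why mb_detect_encoding reports ASCII. PHP has a built-in quoted_printable_decode() for it, so the str_replace/rawurldecode step can be replaced. A minimal sketch; the helper name decode_qp_cp1251 is hypothetical:

```php
<?php
// Hypothetical helper: quoted-printable mail text in Windows-1251 -> UTF-8.
function decode_qp_cp1251(string $qp): string
{
    // Step 1: undo the quoted-printable transfer encoding (=C7 becomes byte 0xC7).
    $bytes = quoted_printable_decode($qp);
    // Step 2: reinterpret the raw bytes as Windows-1251 and convert to UTF-8.
    $utf8 = iconv('Windows-1251', 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException('iconv conversion failed');
    }
    return $utf8;
}

echo decode_qp_cp1251('=C7=E4=F0=E0=E2=F1=F2=E2=F3=E9=F2=E5'), PHP_EOL; // Здравствуйте
```

In a real mail parser the charset should be read from the message's Content-Type header rather than hard-coded.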
I am working with a website that needs to target old Japanese mobile phones that are not Unicode enabled. The problem is that the text for the site is saved in the database as HTML entities (i.e., &#1234;). This database absolutely cannot be changed, as it is used for several hundred websites.
What I need to do is convert these entities to actual characters, and then convert the string encoding before sending it out, as the phones render the entities without converting them first.
I've tried both mb_convert_encoding and iconv, but all they are doing is converting the encoding of the entities, but not creating the text.
Thanks in advance
EDIT:
I have also tried html_entity_decode. It is producing the same results - an unconverted string.
Here is the sample data I am working with.
The desired result: シェラトン・ヌーサリゾート&スパ
The HTML Codes: シェラトン・ヌーサリゾート&スパ
The output of html_entity_decode([the string above],ENT_COMPAT,'SHIFT_JIS'); is identical to the input string.
Just take care you're creating the right codepoints out of the entities. If the original encoding is UTF-8 for example:
$originalEncoding = 'UTF-8'; // that's only assumed, you have not shared the info so far
$targetEncoding = 'SHIFT_JIS';
$string = '... whatever you have ... ';
// superfluous, but to get the picture:
$string = mb_convert_encoding($string, 'UTF-8', $originalEncoding);
$string = html_entity_decode($string, ENT_COMPAT, 'UTF-8');
$stringTarget = mb_convert_encoding($string, $targetEncoding, 'UTF-8');
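The two steps can be wrapped in one function and checked on a short input. This is a sketch under the assumption that the stored text uses decimal numeric references; the helper name entities_to_sjis and the three-character sample are illustrative only:

```php
<?php
// Hypothetical helper, not from the original thread.
function entities_to_sjis(string $html): string
{
    // Step 1: turn the entities into real characters, using UTF-8 as the
    // intermediate encoding so every code point can be represented.
    $utf8 = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
    // Step 2: re-encode the UTF-8 text for the target device.
    return mb_convert_encoding($utf8, 'SJIS', 'UTF-8');
}

// Assumed sample input: decimal references for シ, ェ and ラ.
$sjis = entities_to_sjis('&#12471;&#12455;&#12521;');

echo bin2hex($sjis), PHP_EOL;                             // hex dump of the Shift_JIS bytes
echo mb_convert_encoding($sjis, 'UTF-8', 'SJIS'), PHP_EOL; // シェラ
```

Decoding directly with 'SHIFT_JIS' as the third argument of html_entity_decode is what left the string unchanged in the question, so going through UTF-8 first is the safer route.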
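I found this function on php.net; it works for me with your example (shown here with preg_replace_callback, because the /e modifier was removed in PHP 7, and with mb_chr so that code points above 255 survive):
function unhtmlentities($string) {
    // replace numeric entities (hexadecimal first, then decimal) with UTF-8 characters
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) { return mb_chr(hexdec($m[1]), 'UTF-8'); }, $string);
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) { return mb_chr((int) $m[1], 'UTF-8'); }, $string);
    // replace named entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}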
I think you just need html_entity_decode.
Edit: Based on your edit:
// create_function() was removed in PHP 8, so an anonymous function is used here
$output = preg_replace_callback("/(&#[0-9]+;)/", function ($m) { return mb_convert_encoding($m[1], "UTF-8", "HTML-ENTITIES"); }, $original_string);
Note that this is just your first step, to convert your entities to the actual characters.
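Just to participate, as I encountered a similar encoding bug while coding, I would suggest this snippet:
$string_to_encode = " your string ";
// strict detection; returns false when no candidate encoding matches
$encoding = mb_detect_encoding($string_to_encode, mb_detect_order(), true);
if ($encoding !== false) {
    // pass the detected encoding as the source; otherwise the internal encoding is assumed
    $converted_string = mb_convert_encoding($string_to_encode, 'UTF-8', $encoding);
}
Maybe not the best for a large amount of data, but it still works.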
I'm working on an IMDb data scraper for a site, and they seem to encode everything in a weird encoding I have never seen before.
Exploding Ship
A Bug's Life
Is there a php function that will convert these to regular characters?
This is not an encoding; these are HTML entities written as hexadecimal codes.
Try:
$converted = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
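A quick check of that call on a title containing a hexadecimal apostrophe reference. The input string below is an assumed reconstruction of what the scraper receives, not data taken from the question:

```php
<?php
// Assumed sample input: an apostrophe written as the hexadecimal reference &#x27;
$scraped = 'A Bug&#x27;s Life';

// ENT_QUOTES makes html_entity_decode() convert single-quote references too.
$converted = html_entity_decode($scraped, ENT_QUOTES, 'UTF-8');

echo $converted, PHP_EOL; // A Bug's Life
```

The ENT_QUOTES flag matters for this particular character: with ENT_COMPAT, references to the single quote are left undecoded.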
Those are SGML character escapes. They can be either decimal (e.g. &#39;) or hexadecimal (e.g. &#x27;) and refer directly to a Unicode code point.
html_entity_decode() should work in PHP 5, though I can't test it at the moment.
In the first comment on that reference page, the following code is given for older PHP versions (it relies on the /e modifier, which was removed in PHP 7):
// For users prior to PHP 4.3.0 you may do this:
function unhtmlentities($string)
{
// replace numeric entities
$string = preg_replace('~&#x([0-9a-f]+);~ei', 'chr(hexdec("\\1"))', $string);
$string = preg_replace('~&#([0-9]+);~e', 'chr("\\1")', $string);
// replace literal entities
$trans_tbl = get_html_translation_table(HTML_ENTITIES);
$trans_tbl = array_flip($trans_tbl);
return strtr($string, $trans_tbl);
}