PHP: need to decode a string with utf chars embedded - php
I have a string which is encoded as base36, i.e. 0-9a-z.
Any other character was encoded as follows: its Unicode character code, converted to base36, preceded by a capital letter 'A' and followed by the letter 'B'.
If multiple such characters appear in a row, only the last one is followed by 'B'.
Example:
zergme#wtfd-婴儿服饰.com
converted as:
zergmeA1sBwtfdA19Ahv8Ag1rAkctAub4A1aBcom
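For reference, the encoding described above could be sketched roughly like this. The function names are made up for illustration, and the input is assumed to be valid UTF-8. Note that `1s` is 64 in base36, which is the `@` character, so the sketch uses `@` where the example text shows `#`.

```php
<?php
// Hypothetical helper: code point of a single UTF-8 encoded character.
function utf8_ord_sketch(string $ch): int
{
    $bytes = array_values(unpack('C*', $ch));
    $n = count($bytes);
    if ($n === 1) {
        return $bytes[0];
    }
    // The lead byte keeps fewer payload bits the longer the sequence is.
    $code = $bytes[0] & (0xFF >> ($n + 1));
    for ($i = 1; $i < $n; $i++) {
        $code = ($code << 6) | ($bytes[$i] & 0x3F);
    }
    return $code;
}

// Hypothetical encoder: 0-9a-z pass through, everything else becomes
// 'A' + base36 code, and a run of escapes is closed by a single 'B'.
function encode_base36_escapes(string $text): string
{
    $out = '';
    $inRun = false;
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        if (preg_match('/^[0-9a-z]$/', $ch)) {
            if ($inRun) {
                $out .= 'B';
                $inRun = false;
            }
            $out .= $ch;
        } else {
            $out .= 'A' . base_convert((string) utf8_ord_sketch($ch), 10, 36);
            $inRun = true;
        }
    }
    return $inRun ? $out . 'B' : $out;
}

echo encode_base36_escapes('zergme@wtfd-婴儿服饰.com'), "\n";
```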
It was convenient to convert the data that way, but now I'm bashing my head over how to write an algorithm that decodes it back.
I already have a function that converts character codes to Unicode characters; let's call it 'unichr($code)';
...but I can't think of a good way finding these chars.
I was trying to use regexp first, something like:
preg_replace('/A.*?B?(?=[AB])/',"$1",$mail);
But it didn't work the way I wanted... I also couldn't work out how to apply my custom conversion function, 'unichr()', to the matches.
Then I was also thinking about finding the chars manually with strpos(), but that turned out to be messy too.
Could you advise a pattern? Should I elaborate on the regexp, or try some loop? I'm kind of blank... Thanks :)
That's it. Looks like I figured it out, thanks to your contribution:
'/A(.*?)((?=A)|B)/'
Have you looked into using preg_replace_callback() instead? It takes a function instead of a string as the replace value, and will pass the matches to the function and use the function's return value as the replace string.
Loose example, you'll have to play around a bit
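<?php
$str = 'zergmeA1sBwtfdA19Ahv8Ag1rAkctAub4A1aBcom';
function convert_to_unicode_cb( $match )
{
    // $match[1] would be 1s, 19, hv8, etc. (base36), so convert it to an integer first.
    // unichr() is the asker's own function from the question.
    return unichr( intval( $match[1], 36 ) );
}
// The lookahead stops at the next 'A'; the alternative consumes the closing 'B'.
$decoded = preg_replace_callback( '/A(.*?)(?:(?=A)|B)/', 'convert_to_unicode_cb', $str );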
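The callback approach above can be sketched end to end as follows. This is only one way to do it: `unichr_sketch()` is a stand-in I made up for the asker's `unichr()`, and the pattern assumes escapes contain only 0-9a-z between the 'A' and the optional 'B'.

```php
<?php
// Stand-in for the asker's unichr(): encodes a code point as UTF-8.
function unichr_sketch(int $code): string
{
    if ($code < 0x80) {
        return chr($code);
    }
    if ($code < 0x800) {
        return chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
    }
    if ($code < 0x10000) {
        return chr(0xE0 | ($code >> 12))
            . chr(0x80 | (($code >> 6) & 0x3F))
            . chr(0x80 | ($code & 0x3F));
    }
    return chr(0xF0 | ($code >> 18))
        . chr(0x80 | (($code >> 12) & 0x3F))
        . chr(0x80 | (($code >> 6) & 0x3F))
        . chr(0x80 | ($code & 0x3F));
}

// Each escape is 'A' + base36 digits; a trailing 'B' (if present) is dropped.
function decode_base36_escapes(string $encoded): string
{
    return preg_replace_callback(
        '/A([0-9a-z]+)B?/',
        function (array $m): string {
            return unichr_sketch(intval($m[1], 36));
        },
        $encoded
    );
}

echo decode_base36_escapes('zergmeA1sBwtfdA19Ahv8Ag1rAkctAub4A1aBcom'), "\n";
```

Since `1s` is 64 in base36, the first escape decodes to `@`.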
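How about compressing with gzcompress() and restoring with gzuncompress()? Note that this is zlib compression rather than Base64, despite the labels in the script below, and the compressed output is binary.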
Save the following with the name "testBase64.php":
<?php
if(isset($_POST['text'])){
echo("<b>input:</b> ".$_POST['text']."<br/>");
$c = gzcompress($_POST['text']);
echo("<b>base64 encoding:</b> .".$c."<br/>");
echo("<b>base64 decoding:</b> " .gzuncompress($c));
exit;
}
?>
<html>
<body>
<form method=post action=testBase64.php>
<input type=text name=text />
<input type=submit />
</form>
</body>
</html>
Run and enter "zergme#wtfd-婴儿服饰.com" in the text field.
Output:
input: zergme#wtfd-婴儿服饰.com
base64 encoding: .xœ«J-JÏMu(/IKÑUS62645³Òæ–– ÚÌØÂH[YXë%ççG°#
base64 decoding: zergme#wtfd-婴儿服饰.com
Hope this helps.
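If the compressed value has to be stored or sent as text, one option is to wrap it in real Base64. A minimal sketch, assuming the zlib extension is available; `pack_text()` and `unpack_text()` are names invented for this example:

```php
<?php
// Compress, then Base64-encode so the result is plain ASCII.
function pack_text(string $text): string
{
    return base64_encode(gzcompress($text));
}

// Reverse the two steps in the opposite order.
function unpack_text(string $packed): string
{
    return gzuncompress(base64_decode($packed));
}

$text = 'zergme#wtfd-婴儿服饰.com';
$packed = pack_text($text);
echo $packed, "\n";                       // ASCII only, safe to store as text
var_dump(unpack_text($packed) === $text); // round trip restores the input
```

For short strings like this one the packed form is usually longer than the input, so this mainly helps with transport safety, not size.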
Related
Cookie and string comparison won't match
I've got a problem. I store a string in $_COOKIE['restaurant_name'], for example "MMM skanu". When I compare it, the strings seem to be different:
if ($_COOKIE['restaurant_name'] == "MMM skanu") {
    // always false
}
But when I print it with
echo $_COOKIE['restaurant_name'];
I see the same string, "MMM skanu". I tried using the strval() function, but the result is the same. How do I parse or convert this cookie to a string? I can also see in my Google Chrome cookies that restaurant_name = %20MMM%20skanu%20. Does that have anything to do with it?
Here I'm decoding any encoding like '%20' using the built-in function urldecode. This function decodes percent-encoded characters; for example, "what%20" becomes "what " after decoding. Then trim removes the surrounding whitespace; for example, "%20what" becomes " what" after decoding, and trim removes the leading space.
$restaurant_name = trim(urldecode($_COOKIE['restaurant_name']));
if ($restaurant_name == "MMM skanu") {
    // do something
}
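A small sketch of the same idea as a reusable check. `cookie_equals()` is a hypothetical helper name. PHP normally percent-decodes cookie values before they reach $_COOKIE, so in practice the surrounding spaces are what break the comparison; the sketch handles both the raw and the already decoded form.

```php
<?php
// Hypothetical helper: compare a cookie value after decoding and trimming.
function cookie_equals(string $rawCookie, string $expected): bool
{
    return trim(urldecode($rawCookie)) === $expected;
}

var_dump(cookie_equals('%20MMM%20skanu%20', 'MMM skanu')); // bool(true)
var_dump(cookie_equals(' MMM skanu ', 'MMM skanu'));       // bool(true)
var_dump(' MMM skanu ' == 'MMM skanu');                    // bool(false)
```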
$_GET does not print the # symbol and the text after it
I have a string sent as $_GET['foo'] = "C# Programmer". When I echo $_GET['foo'], it only prints C. Is there any way to solve this when a string sent via $_GET contains # or other symbols?
A hash (#) starts the URL fragment, which is not sent to the server, and spaces are not valid in HTTP URLs, so both need to be encoded if you want to use them in parameter values. You can use urlencode() to create URL-safe parameter values. The following should work:
foo=C%23%20Programmer
If you're trying to send the GET request from a different page, you'd want something like this:
<?php $var = 'C# Programmer'; ?>
<a href="select.php?foo=<?php echo urlencode($var); ?>">Go</a>
Now, in select.php, if you echo $_GET['foo'];, it'll display C# Programmer.
Use functions like urlencode when passing this kind of value, and urldecode when you have to decode such a string yourself (PHP already decodes the values in $_GET). Alternatively, use the POST method to avoid this kind of problem.
You should use the function named urlencode, which returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits, and spaces are encoded as plus (+) signs.
<?php $var = 'C# Programmer'; ?> <!-- end of PHP code -->
<a href="select.php?foo=<?php echo urlencode($var) ?>">Go</a>
For more information, read about urlencode.
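To make the difference between the encoders concrete, here is a short sketch. The parameter name `foo` comes from the question; everything else is just illustration.

```php
<?php
$var = 'C# Programmer';

echo urlencode($var), "\n";    // C%23+Programmer
echo rawurlencode($var), "\n"; // C%23%20Programmer

// http_build_query() encodes every value of an array in one go.
$query = http_build_query(['foo' => $var]);
echo $query, "\n";             // foo=C%23+Programmer

// parse_str() decodes a query string roughly the way PHP fills $_GET.
parse_str($query, $params);
echo $params['foo'], "\n";     // C# Programmer
```

Both `+` and `%20` decode back to a space in a query string, so either encoder works for the `select.php?foo=...` link.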
PHP function question
I don't know if this is the place to ask this kind of question, so I will give it a try. I was wondering what the following user-defined PHP function does. If someone could explain it to me in detail, thanks.
function decode_characters($info) {
    $info = mb_convert_encoding($info, "HTML-ENTITIES", "UTF-8");
    $info = preg_replace('~^(&([a-zA-Z0-9]);)~',htmlentities('${1}'),$info);
    return($info);
}
The function is a little odd.
The first function call transforms a string encoded in UTF-8 to an ASCII encoded string where the non-mapped characters are converted to HTML entities (named entities if they exist in HTML 4, otherwise numeric entities). For instance:
echo mb_convert_encoding("foo\"é⌑'&", "HTML-ENTITIES", "UTF-8");
yields
foo"&eacute;&#8977;'&
So this differs from htmlentities in that 1) numeric entities are used in the circumstances given and 2) special characters such as &, " or < are not touched.
The second function call, however, is stranger. It checks whether a named entity with only one ASCII alphanumeric character starts the input and, if so, calls htmlentities on this input (actually it doesn't, because the e modifier is not used and the function name is not in a string, so it's executed when the arguments are evaluated). This call has no effect because htmlentities('${1}') is '${1}' and the backreference 1 encompasses the whole match, so, even if the expression matches, there's no substitution.
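The claim that the second call changes nothing can be checked with a small sketch. Only the preg_replace step of the function is reproduced here, under a made-up name; the mb_convert_encoding step is left out.

```php
<?php
// htmlentities() runs first, on the literal string '${1}', and leaves it as is.
var_dump(htmlentities('${1}') === '${1}'); // bool(true)

// Hypothetical name: only the regex step of decode_characters().
function decode_characters_regex_step(string $info): string
{
    return preg_replace('~^(&([a-zA-Z0-9]);)~', htmlentities('${1}'), $info);
}

// Whether the pattern matches ('&a;bc') or not, the string comes back unchanged.
foreach (['&a;bc', '&eacute;x', 'plain text'] as $sample) {
    var_dump(decode_characters_regex_step($sample) === $sample); // bool(true)
}
```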
Help with replacing characters
Hopefully someone can help out here; I am trying to write a function which replaces special characters and returns the correct one. This is what I have so far: function convertlatin($output){ $latinchar = array("€", "‚","Æ'","„","…","‡","ˆ","‰","Å","‹","Å'",'Ž','‘','’','“','â€','•','â€"','â€"','Ëœ','â"¢','Å¡','›','Å"',"ž",'Ÿ','¡','¢','£','¤','Â¥','¦','§','¨','©','ª','«','¬','®','¯','°','±','²','³','´','µ','¶','·','¸','¹','º','»','¼',"½",'¾','¿','À','Ã','Â','Ã','Ã"','Ã…','Æ','Ç','È','É','Ê','Ë','ÃŒ ','Ã','ÃŽ','ß','Ã',"Ã'","Ã'",'Ã"','Ã"','Õ','Ö','×','Ø','Ù','Ú','Û','Ãœ','Ã','Þ','ß','Ã','á','â','ã','ä','Ã¥','æ','ç','è','é','ê','ë','ì','Ã','î','ï','ð','ñ','ò','ó','ô','õ','ö','÷','ø','ù','ú','û','ü','ý',"þ","ÿ"); $correctchar = array("€", "‚","ƒ",'"','…','‡','ˆ','‰',"Š",'‹','Œ','Ž',"'","'",'"','"','•','–','—','˜','™','š','›','œ','ž','Ÿ','¡','¢','£','¤','¥','¦','§','¨','©','ª','«','¬','®','¯','°','±','²','³','´','µ','¶','·','¸','¹','º','»','¼','½','¾','¿','À','Á','Â','Ã','Ä','Å','Æ','Ç','È','É','Ê','Ë','Ì','Í','Î','Ï','Ð','Ñ','Ò','Ó','Ô','Õ','Ö','×','Ø','Ù','Ú','Û','Ü','Ý','Þ','ß','à','á','â','ã','ä','å','æ','ç','è','é','ê','ë','ì','í','î','ï','ð','ñ','ò','ó','ô','õ','ö','÷','ø','ù','ú','û','ü','ý','þ',"ÿ"); $returnval = str_replace($latinchar, $correctchar, $output); echo($returnval); return $returnval; } The problem I have is I thought it was working but it has random results, such as if it finds a match on just one of the characters it replaces a different one in that array. What I would like to do is find and replace an exact match of latin char within a supplied string eg "testingÿ" with "testingÿ" - at the mo it replaces ÿ with testingá¿ It just seems to replace one character in some occasions, when I would like it to match and replace both parameters. I also tried strcmp with not much success. Any ideas ?
It looks like your problem is not wrong characters but a wrong encoding. You'd better try to change the encoding of $output. utf8_encode will not help you; the "wrong" characters look like UTF-8 text that was misread as Windows-1252. Try:
echo mb_convert_encoding('testingÃ¿','CP1252','UTF-8');
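A minimal sketch of that repair, assuming the mbstring extension is loaded and that the whole string was garbled the same way. `fix_cp1252_mojibake()` is a name invented for this example. Running it on text that is already correct would damage it, so it should only be applied to strings known to be garbled.

```php
<?php
// Hypothetical helper: undo "UTF-8 bytes read as Windows-1252" garbling.
// Mapping each character back to its Windows-1252 byte restores the
// original UTF-8 byte sequence.
function fix_cp1252_mojibake(string $garbled): string
{
    return mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
}

$garbled = "testing\u{00C3}\u{00BF}"; // displays as "testingÃ¿"
echo fix_cp1252_mojibake($garbled), "\n"; // testingÿ
```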
How to Convert Arabic Characters to Unicode Using PHP
I want to know how I can convert a word into Unicode exactly like http://www.arabunic.free.fr/ does. Does anyone know how to do that using PHP, considering that Arabic text may contain ligatures? Thanks.
Edit: I'm not sure what that "unicode" is, but I need each Arabic character as its equivalent machine number, considering that Arabic characters have different contextual forms depending on their position; see here: http://en.wikipedia.org/wiki/Arabic_alphabet#Table_of_basic_letters
The same character in different positions: ب | ـب | ـبـ | بـ
I think there must be a way to convert each Arabic character into its equivalent number, but how?
Edit: I still believe there's a way to convert each character to its form depending on position. Any idea is appreciated.
All you need is a function called utf8Glyphs, which you can find in ArGlyphs.class.php. Download it from ar-php, and visit Ar-PHP for more information about the project and classes. This will reverse the word with the same characters (glyphs). Example of usage:
<?php
include('Arabic.php');
$Arabic = new Arabic('ArGlyphs');
$text = 'بسم الله الرحمن الرحيم';
$text = $Arabic->utf8Glyphs($text);
echo $text;
?>
I assume you want to convert بهروز to \u0628\u0647\u0631\u0648\u0632. Take a look at http://hsivonen.iki.fi/php-utf8/. All you have to do after calling utf8ToUnicode('بهروز') is convert the integers you got in the array to hex, make sure they have 4 digits, and prefix them with \u, and you're done. You can also get the same result using json_encode:
json_encode('بهروز') // returns "\u0628\u0647\u0631\u0648\u0632"
EDIT: It seems you want to get the character codes of بب, where the first one differs from the second one. All you have to do is apply the bidi algorithm to your text using fribidi_log2vis and then get the character codes in one of the ways I described above. Here's an example:
$string = 'بب'; // \u0628\u0628
$bidiString = fribidi_log2vis($string, FRIBIDI_LTR, FRIBIDI_CHARSET_UTF8);
json_encode($bidiString); // \ufe90\ufe91
EDIT: I just remembered that tcpdf has a bidi algorithm implemented in pure PHP, so if you cannot get the fribidi extension of PHP to work, you can use tcpdf (utf8Bidi is protected by default, so you need to make it public):
require_once('utf8.inc'); // http://hsivonen.iki.fi/php-utf8/
require_once('tcpdf.php'); // http://www.tcpdf.org/
$t = new TCPDF();
$text = 'بب';
$t->utf8Bidi(utf8ToUnicode($text)); // will return an array like array(0 => 65168, 1 => 65169)
Just set the element containing the Arabic text to "rtl" (right to left), then input correctly spelled Arabic and the text will flow with all the expected ligatures.
div { direction:rtl; }
On a side note, don't forget to read "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". Think about this: the "ba" (ب) Arabic letter is a "ba" no matter where it appears in the sentence.
Try this:
<?php
$string = 'a';
// Name the byte order explicitly so iconv adds no BOM and unpack reads it portably.
$expanded = iconv('UTF-8', 'UTF-32BE', $string);
$arr = unpack('N*', $expanded);
print_r($arr);
?>
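Building on the same idea, the code points can be turned into \u escapes. This is only a sketch, assuming the iconv extension and valid UTF-8 input; the function names are made up, and characters above U+FFFF are not split into surrogate pairs the way json_encode would do it.

```php
<?php
// List the Unicode code points of a UTF-8 string.
function utf8_code_points(string $text): array
{
    // UTF-32BE is a fixed 4 bytes per character, with no BOM added.
    $utf32 = iconv('UTF-8', 'UTF-32BE', $text);
    return array_values(unpack('N*', $utf32));
}

// Format each code point as \uXXXX (at least four hex digits).
function to_u_escapes(string $text): string
{
    $parts = array_map(
        fn (int $cp): string => sprintf('\u%04x', $cp),
        utf8_code_points($text)
    );
    return implode('', $parts);
}

echo to_u_escapes('بهروز'), "\n"; // \u0628\u0647\u0631\u0648\u0632
```

Note that this reports the stored characters only. Contextual forms (initial, medial, final) are chosen at display time, so بب yields the same code point twice.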
I totally agree with FloatBird about using arabic.php, which you will find at ar-php, as he said. The thing is that they changed the class name after version 4 from Arabic to I18N_Arabic, so in order for the code to work using arabic.php ver 4.0 you need to change the code to:
<?php
include('Arabic.php');
$Arabic = new I18N_Arabic('ArGlyphs');
$text = 'بسم الله الرحمن الرحيم';
$text = $Arabic->utf8Glyphs($text);
echo $text;
?>
Also notice that you need to put the PHP code file inside the I18N folder. Anyway, it works fantastically. Thanks again, FloatBird.
I had a similar problem when I wanted to store an object that had values in Arabic: the Arabic text was stored as Unicode escape sequences. The solution was as follows:
$detailsLog = $product->only(['name', 'unit', 'quantity']);
$detailsLog = json_encode($detailsLog, JSON_UNESCAPED_UNICODE);
$log->details = $detailsLog;
$log->save();
When you pass JSON_UNESCAPED_UNICODE as the second parameter of json_encode, the Arabic words are returned without escaping.
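The effect of the flag can be seen in isolation with a plain array (the array contents here are invented for the example):

```php
<?php
$details = ['name' => 'بهروز', 'quantity' => 3];

// Default: non-ASCII characters are written as \uXXXX escapes.
echo json_encode($details), "\n";
// {"name":"\u0628\u0647\u0631\u0648\u0632","quantity":3}

// With the flag: the characters are kept as UTF-8 text.
echo json_encode($details, JSON_UNESCAPED_UNICODE), "\n";
// {"name":"بهروز","quantity":3}
```

Both forms decode to the same value with json_decode, so the flag mainly matters for readability and storage size.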
I think you could try: <meta charset="utf-8" />. If this does not work, use FloatBird's answer.