Encoding of Danish letters - php

I get a city name as a GET variable. I need to make the first letter capitalized.
If the GET variable is "herning" I can capitalize it to H without any problem, but if the variable is "ølstykke" I only get a lowercase ø, not an uppercase Ø.
header('Content-type: text/html; charset=utf-8');
print strToUpper(mb_substr($_GET["city"], 0, 1));
If I do not set the header to utf-8, I just get strange characters.
Any ideas?
Updated code
<?php
header('Content-type: text/html; charset=utf-8');
$city = mb_convert_case($_GET["city"], MB_CASE_TITLE, "UTF-8");//Ølstykke
print $city;
$section = file_get_contents('https://api.dataforsyningen.dk/steder?hovedtype=Bebyggelse&undertype=by&prim%C3%A6rtnavn='.$city);
$section = json_decode($section);
print '<pre>';
print_r($section);
print '</pre>';
Solution: urlencode() around $city when sending to dataforsyningen did the job.
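The fix mentioned above can be sketched roughly as follows: the non-ASCII city name is percent-encoded before it is appended to the query string. The variable names $base and $url are illustrative, the city value is hard-coded instead of read from $_GET, and the HTTP request itself is left out so the snippet runs offline.

```php
<?php
// Illustrative input; in the question this comes from $_GET["city"].
// "\u{F8}" is "ø", written as an escape so the file encoding does not matter.
$city = mb_convert_case("\u{F8}lstykke", MB_CASE_TITLE, 'UTF-8');

// Base URL copied from the question (the parameter name is already percent-encoded).
$base = 'https://api.dataforsyningen.dk/steder?hovedtype=Bebyggelse&undertype=by&prim%C3%A6rtnavn=';

// urlencode() turns each UTF-8 byte of "Ø" into a %XX escape.
$url = $base . urlencode($city);

echo $url, "\n"; // ends with prim%C3%A6rtnavn=%C3%98lstykke
```

The resulting $url can then be passed to file_get_contents() as in the code above.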

Use mb_strtoupper and specify the character-encoding in mb_substr
echo mb_strtoupper(mb_substr('ølstykke', 0, 1,'utf-8'));//Ø
In your case you may want to handle the rest of the string as well, not only the first character,
so the mb_convert_case function may help:
echo mb_convert_case('ølstykke', MB_CASE_TITLE, "UTF-8");//Ølstykke
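Note that MB_CASE_TITLE capitalizes every word and lowercases the remaining letters. If only the first letter should change, the two calls above can be combined into a small helper. This is a minimal sketch; ucfirst_utf8 is a made-up name, not a built-in function.

```php
<?php
// Hypothetical helper: uppercase only the first character of a UTF-8 string
// and leave the rest untouched.
function ucfirst_utf8(string $s): string
{
    $first = mb_substr($s, 0, 1, 'UTF-8');
    $rest  = mb_substr($s, 1, null, 'UTF-8');
    return mb_strtoupper($first, 'UTF-8') . $rest;
}

// "\u{F8}" is "ø"; the escape keeps the example independent of the file encoding.
echo ucfirst_utf8("\u{F8}lstykke"), "\n";   // Ølstykke
echo ucfirst_utf8("n\u{F8}rre aaby"), "\n"; // Nørre aaby (second word unchanged)
```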

PHP UTF-8 mb_convert_encode and Internet-Explorer

I have been reading about character encoding for a few days, and I want to serve all my pages as UTF-8 for compatibility. But I got stuck when trying to convert user input to UTF-8: it works in all browsers except Internet Explorer (as always).
I don't know what's wrong with my code; it seems fine to me.
I set the header with char encoding
I saved the file in UTF-8 (No BOM)
This happens only when the page is accessed with a $_GET parameter in Internet Explorer: myscript.php?c=äüöß
When I write special characters directly in the page, they are displayed correctly.
This is my code:
// User Input
$_GET['c'] = "äüöß"; // Access URL ?c=äüöß
//--------
header("Content-Type: text/html; charset=utf-8");
mb_internal_encoding('UTF-8');
$_GET = userToUtf8($_GET);
function userToUtf8($string) {
if(is_array($string)) {
$tmp = array();
foreach($string as $key => $value) {
$tmp[$key] = userToUtf8($value);
}
return $tmp;
}
return userDataUtf8($string);
}
function userDataUtf8($string) {
print("1: " . mb_detect_encoding($string) . "<br>"); // Shows: 1: UTF-8
$string = mb_convert_encoding($string, 'UTF-8', mb_detect_encoding($string)); // Convert non UTF-8 String to UTF-8
print("2: " . mb_detect_encoding($string) . "<br>"); // Shows: 2: ASCII
$string = preg_replace('/[\xF0-\xF7].../s', '', $string);
print("3: " . mb_detect_encoding($string) . "<br>"); // Shows: 3: ASCII
return $string;
}
echo $_GET['c']; // Shows nothing
echo mb_detect_encoding($_GET['c']); // ASCII
echo "äöü+#"; // Shows "äöü+#"
The most confusing part is that it reports the string as converted from UTF-8 to ASCII. Can someone tell me why the special characters aren't displayed correctly and what's wrong here? Or is this a bug in Internet Explorer?
Edit:
If I disable the conversion, it reports everything as UTF-8, but the characters still aren't displayed; they show up as "????".
Note: This happens ONLY in Internet Explorer!
Although I prefer using URL-encoded strings in the address bar, in your case you can try encoding $_GET['c'] to UTF-8, e.g.:
$_GET['c'] = utf8_encode($_GET['c']);
An approach that worked for displaying the characters in IE 11.0.18:
Retrieve the Unicode code point of your character, for example 'ü' = 'U+00FC'
Convert it to a UTF-8 entity, as described in this post
Decode it using utf8_decode before dumping
The line of code illustrating this with the 'ü' character is:
var_dump(utf8_decode(html_entity_decode(preg_replace("/U\+([0-9A-F]{4})/", "&#x\\1;", 'U+00FC'), ENT_NOQUOTES, 'UTF-8')));
To summarize: for display purposes, go from the Unicode code point to UTF-8, then decode it before displaying it.
Other resources:
a post to retrieve characters' unicode
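Another way to approach the conversion step in the question is to validate instead of guessing: mb_detect_encoding only makes a best guess, whereas mb_check_encoding answers whether the bytes are valid UTF-8. The sketch below rests on an assumption that is not stated in the question, namely that input which is not valid UTF-8 arrives as Windows-1252. The function name to_utf8 and its $fallback parameter are hypothetical.

```php
<?php
// Hypothetical helper: keep valid UTF-8 as-is, otherwise convert from an
// assumed fallback encoding (Windows-1252 here, which is only an assumption).
function to_utf8(string $value, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

// "äüöß" as single-byte Windows-1252 input, written as raw bytes.
$legacy = "\xE4\xFC\xF6\xDF";
// The same text already encoded as UTF-8.
$utf8 = "\u{E4}\u{FC}\u{F6}\u{DF}";

var_dump(to_utf8($legacy) === $utf8); // bool(true)
var_dump(to_utf8($utf8) === $utf8);   // bool(true), valid UTF-8 is left alone
```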

check if the string begin with euro/pound symbol

I'm trying to check whether a string starts with '€' or '£' in PHP.
Here is the code:
$text = "€123";
if($text[0] == "€"){
echo "true";
}
else{
echo "false";
}
//output false
If I only compare a single character, it works fine:
$symbol = "€";
if($symbol == "€"){
echo "true";
}
else{
echo "false";
}
// output true
I have also tried printing the string in the browser.
$text = "€123";
echo $text; // displays the euro symbol correctly
echo $text[0]; // displays a question mark
I have tried to use substr(), but the same problem occurred.
Characters such as '€' or '£' are multi-byte characters in UTF-8. There is an excellent article that you can read here. According to the PHP docs, PHP strings are byte arrays. As a result, accessing or modifying a string using array brackets is not multi-byte safe, and should only be done with strings that are in a single-byte encoding such as ISO-8859-1.
Also make sure your file is encoded as UTF-8: you can use a text editor such as Notepad++ to convert it.
If I reduce the PHP to this, it works, the key being to use mb_substr:
<?php
header ('Content-type: text/html; charset=utf-8');
$text = "€123";
echo mb_substr($text,0,1,'UTF-8');
?>
Finally, it would be a good idea to add the UTF-8 meta-tag in your head tag:
<meta charset="utf-8">
I suggest this as the easiest solution. Convert the symbols to their HTML entities using htmlentities():
$text = htmlentities($text, ENT_QUOTES, "UTF-8");
This gives you either &pound; or &euro; at the start of the string. That allows you to run a switch statement (or your if statements) to check:
$symbols = explode(";", $text);
switch($symbols[0]) {
case "&pound":
echo "It's Pounds";
break;
case "&euro":
echo "It's Euros";
break;
}
This happens because you’re using a multi-byte character encoding (probably UTF-8) in which both € and £ are recorded using multiple bytes. That means that "€" is a string of three bytes, not just one.
When you use $text[0] you're getting only the first byte of the first character, and so it doesn't match the three bytes of "€". You need to get the first three bytes instead, to check whether one string starts with another.
Here’s the function I use to do that:
function string_starts_with($string, $prefix) {
return substr($string, 0, strlen($prefix)) == $prefix;
}
The question mark appears because the first byte of "€" isn’t enough to encode a whole character: the error is indicated by ‘�’ when available, otherwise ‘?’.
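The byte-level behaviour described above can be checked directly. The following is a small sketch rather than a definitive recipe; it writes the euro sign as the escape "\u{20AC}" (PHP 7+) so that it does not depend on how the file itself is saved.

```php
<?php
$euro = "\u{20AC}";   // "€" encoded as UTF-8
$text = $euro . '123';

// strlen() counts bytes, mb_strlen() counts characters.
var_dump(strlen($euro));             // int(3)
var_dump(mb_strlen($euro, 'UTF-8')); // int(1)

// $text[0] is only the first of the three bytes.
var_dump(bin2hex($text[0]));         // string(2) "e2"
var_dump(bin2hex($euro));            // string(6) "e282ac"

// Two ways to test the prefix: compare as many bytes as the prefix has,
// or extract one whole character with mb_substr().
var_dump(strncmp($text, $euro, strlen($euro)) === 0);  // bool(true)
var_dump(mb_substr($text, 0, 1, 'UTF-8') === $euro);   // bool(true)
```

On PHP 8.0 and later, the built-in str_starts_with($text, $euro) performs the same byte-wise prefix comparison.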

How to echo string with special characters?

I have the below string (in Turkish):
$string = "Otpor lideri Maroviç: Gezi eylemcileriyle temasımız olmadı";
However, when I attempt to echo the string, I get the below result:
Otpor lideri Maroviç: Gezi eylemcileriyle temas?m?z olmad?
How can I solve this problem?
First of all:
$string = "Otpor lideri Maroviç: Gezi eylemcileriyle temasımız olmadı";
echo htmlentities($string);
And make sure to add this to your <head>:
<meta charset="UTF-8">
header('Content-type: text/plain; charset=utf-8');
$string = "Otpor lideri Maroviç: Gezi eylemcileriyle temasımız olmadı";
echo $string;
Open your file in a code editor such as Notepad++ and use its "Convert to UTF-8" function.
This should help. Here in Poland we also use special characters, and this is a common problem.

Weird character after UTF8_encode

When I try to convert text from windows-1256 to UTF-8, it comes out like this:
ÇáÑßä ÇáÚÇã ááãæÇÖíÚ ÇáÚÇãÉ
I'm trying to change the encoding of webpage I grabbed using file_get_contents.
header('Content-Type: text/html; charset=utf-8');
This sounds like a job for iconv
$output = iconv("ISO-8859-1", "UTF-8", file_get_contents($url));
Since I can't know what your content is, you might have to try UTF-8//TRANSLIT and UTF-8//IGNORE
Although I don't know Arabic, this might point you in the right direction:
$str = 'ÇáÑßä ÇáÚÇã ááãæÇÖíÚ ÇáÚÇãÉ';
$str = iconv("windows-1256", "utf-8//TRANSLIT//IGNORE", $str);
echo $str;
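For illustration, here is what the conversion looks like on the raw bytes. The sketch assumes the fetched page really is windows-1256 and that the local iconv build recognizes the name WINDOWS-1256; the five bytes are hand-written and correspond to the first word of the garbled sample ("ÇáÑßä" when misread as Latin-1).

```php
<?php
// Raw windows-1256 bytes as they would arrive from file_get_contents().
$raw = "\xC7\xE1\xD1\xDF\xE4";

// Name the real source encoding; passing ISO-8859-1 here would reproduce
// the garbled Latin letters instead of Arabic.
$utf8 = iconv('WINDOWS-1256', 'UTF-8', $raw);

if ($utf8 === false) {
    // iconv() returns false when the conversion fails.
    echo "conversion failed\n";
} else {
    echo $utf8, "\n"; // five Arabic letters, each two bytes long in UTF-8
    var_dump(mb_strlen($utf8, 'UTF-8')); // int(5)
}
```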

Problem in UTF Encoding in PHP

I use the following lines of code:
$revTerm = "". strrev($limitAry["term"]);
$revTerm = utf8_encode($revTerm);
$revTerm contains Norwegian characters such as ø, æ and å, and they are displayed correctly as-is. However, I need to reverse the string before displaying it, so I use the first line.
When I display the result this way, I get a "bad XML format" error; the data is used to fill a grid.
When I try to use the second line, I don't get an error but the characters are not shown correctly. Could there be any other way to solve that?
In case it helps, I use jqGrid to display the data.
strrev, like most PHP string functions, is not safe for multi-byte encodings.
Try this example:
$test = 'А роза упала на лапу Азора ウィキ';
$test = iconv('utf-8', 'utf-16le', $test);
$test = strrev($test);
// キィウ арозА упал ан алапу азор А
echo iconv('utf-16be', 'utf-8', $test);
(russian)
http://bolknote.ru/2012/04/02/~3625#56
Try this:
$revTerm = utf8_decode($limitAry["term"]);
$revTerm = strrev($revTerm);
$revTerm = utf8_encode($revTerm);
To use strrev, you first have to decode your string to a single-byte encoding.
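A different approach from the two above, which does not go through ISO-8859-1 or UTF-16, is to split the string into characters and reverse the resulting array. This is a minimal sketch assuming the input is valid UTF-8; strrev_utf8 is a hypothetical name, not a built-in.

```php
<?php
// Hypothetical helper: reverse a UTF-8 string character by character.
function strrev_utf8(string $s): string
{
    // The /u modifier makes the empty pattern split between code points.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // preg_split() fails on invalid UTF-8; return the input unchanged.
        return $s;
    }
    return implode('', array_reverse($chars));
}

// "øæå" written with escapes so the file encoding does not matter.
echo strrev_utf8("\u{F8}\u{E6}\u{E5}"), "\n"; // åæø
```

This reverses code points, so a base letter followed by a separate combining mark would end up with the mark in front of it; for precomposed letters such as ø, æ and å that is not an issue.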
