PHP greek url convert - php

I have a URL like: domain.tld/Σχετικά_με_μας
[edit]
Reading the $_SERVER['REQUEST_URI'] I get to work with:
%CE%A3%CF%87%CE%B5%CF%84%CE%B9%CE%BA%CE%AC_%CE%BC%CE%B5_%CE%BC%CE%B1%CF%82
[/edit]
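In PHP I need to convert it to HTML entities. I get pretty far with:
htmlentities(urldecode($navstring), ENT_QUOTES, 'UTF-8');
It results in:
&Sigma;&chi;&epsilon;&tau;&iota;&kappa;ά_&mu;&epsilon;_&mu;&alpha;&sigmaf;
but the 'ά' stays a literal 'ά', and I need it converted to an entity as well:
&#940;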
I'd really appreciate help. I need a universal solution, not a "string replace".

I have been playing around a little, and the following worked. Use mb_convert_encoding() instead of htmlentities():
mb_convert_encoding(urldecode($navstring),'HTML-ENTITIES','UTF-8');
//string(90) "domain.tld/&Sigma;&chi;&epsilon;&tau;&iota;&kappa;&#940;_&mu;&epsilon;_&mu;&alpha;&sigmaf;"
See the mb_convert_encoding() documentation.
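A hedged sketch of a more future-proof variant: the 'HTML-ENTITIES' target of mb_convert_encoding() is deprecated as of PHP 8.2, so the example below uses mb_encode_numericentity() instead. It emits numeric entities (&#931; rather than &Sigma;) for every non-ASCII character, which browsers render identically. The helper name to_html_entities() and the decision to run htmlspecialchars() first are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: escape HTML metacharacters first, then turn every
// non-ASCII character into a numeric entity.
function to_html_entities(string $utf8): string
{
    $escaped = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');
    // Conversion map: [first code point, last code point, offset, mask]
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($escaped, $map, 'UTF-8');
}

// The percent-encoded path from the question.
$navstring = '%CE%A3%CF%87%CE%B5%CF%84%CE%B9%CE%BA%CE%AC_%CE%BC%CE%B5_%CE%BC%CE%B1%CF%82';
$html = to_html_entities(rawurldecode($navstring));
echo $html, PHP_EOL;
```

Because the map starts at 0x80, plain ASCII such as the underscores is left untouched, and no per-character "string replace" table is needed.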

Information
All modern web browsers understand UTF-8 character encoding.
My advice would be:
Always know the character encoding of the data you are using.
Store your data as UTF-8.
Output your data as UTF-8.
The mbstring PHP extension doesn't just manipulate Unicode strings. It also converts multibyte strings between various character encodings.
Use the mb_detect_encoding() and mb_convert_encoding() functions to convert strings from one character encoding to another.
PHP needs to know!
You also need to tell PHP that you are working with UTF-8. You can set the default value in your php.ini file:
default_charset = "UTF-8"
That default value is added to the default Content-Type header returned by PHP unless you specify one yourself with the header() function:
header('Content-Type: application/json;charset=utf-8');
Keep in mind
The default character set is used by a lot of functions in PHP, such as:
htmlentities()
htmlspecialchars()
all the mbstring functions
...
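As a small illustration of the point above, the sketch below sets default_charset at runtime and compares htmlspecialchars() with and without an explicit encoding argument. It assumes PHP 5.6 or later, where a missing encoding argument falls back to default_charset; the sample string is made up for the example.

```php
<?php
// Runtime equivalent of the php.ini setting.
ini_set('default_charset', 'UTF-8');

$text = '<b>Σχετικά</b>';

// No encoding argument: PHP falls back to default_charset.
$implicit = htmlspecialchars($text);

// Encoding passed explicitly: independent of the configuration.
$explicit = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

var_dump(ini_get('default_charset'));
var_dump($implicit === $explicit);
```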

Related

Problems with php xml characters

Hello friends, I have a problem with some characters when reading an XML file from PHP. I am using this source code:
$file = 'test.xml';
$xml_1 = simplexml_load_file($file);
echo ($xml_1->content);
It works OK, but when the content contains a special character like ñ or ó, it shows a strange character like this: ñ. I tried declaring the UTF-8 charset in the HTML head, but the result is the same.
SimpleXML emits UTF-8 output by design. If your application does not support UTF-8, you'll have to convert with the usual tools (e.g. mb_convert_encoding()), but you need to take this into account:
You need to know for sure the encoding your app is using.
UTF-8 can hold the complete Unicode catalogue, so some characters may not have an equivalent in your target encoding.
In any case, in 2016 there's no reason to use anything other than UTF-8 unless you're maintaining legacy code.
Finally I found the solution: I must use the utf8_decode() PHP function to convert the characters. It is not enough to put the UTF-8 charset in the page head; you must convert the text in PHP first.
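A minimal sketch of the conversion described above, assuming the page really is served as ISO-8859-1. It uses mb_convert_encoding() rather than utf8_decode(), since utf8_decode() is deprecated as of PHP 8.2. The inline XML string stands in for test.xml and is invented for the example.

```php
<?php
// Stand-in for simplexml_load_file('test.xml').
$xml = simplexml_load_string(
    '<?xml version="1.0" encoding="UTF-8"?><root><content>ñ ó</content></root>'
);

// SimpleXML always hands back UTF-8, whatever the source file used.
$utf8 = (string) $xml->content;

// Convert only if the surrounding page is not UTF-8 (ISO-8859-1 assumed here).
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

var_dump(strlen($utf8));   // byte length of the UTF-8 string
var_dump(strlen($latin1)); // byte length after conversion
```

If the page can be served as UTF-8 instead, no conversion is needed and the value can be echoed as is.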

PHP and UTF-8 String functions WITHOUT MB-Functions?

I am trying to use UTF-8 with PHP. The output on my site seems okay (äöüß etc. display correctly when testing), but there is a simple problem: when I use echo strlen("Ä"); it shows me "2". I read this topic: strlen() and UTF-8 encoding
In the answer I read this:
The replacement character often gets inserted when a UTF-8 decoder reads data that's not valid UTF-8 data.
I wonder why my data is not valid UTF-8, because:
I saved all my files as "UTF-8 no BOM"
I send a UTF-8 header on the first line
My browser also says "Encoding: UTF-8"
This is my code:
<?php
header("Content-Type: text/html; charset=utf-8");
$test = 'Ä';
echo strlen($test);
var_dump($test);
?>
My question: can I use the normal PHP functions with UTF-8, or must I use the "mb" functions?
If it's possible to use the normal PHP functions, why does strlen() show 2 in my code instead of 1?
strlen() returns the length of the string in bytes by default, not characters. You can change this by setting the mbstring.func_overload ini setting to tell PHP to return characters from a strlen() call instead, but this is global and affects a number of other functions as well, like strpos() and substr() (the full list is in the documentation).
This can have serious adverse effects elsewhere in your code, particularly if you're using third-party libraries that aren't aware of it, so it isn't recommended.
It's better to use the mb_* functions if you know that you're working with UTF-8 strings. Setting mbstring.func_overload simply tells PHP to use the mb_* functions "under the hood" in place of the normal string functions.
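To make the byte/character distinction concrete, here is a short sketch comparing the plain string functions with their mb_* counterparts. Note that mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, so calling the mb_* functions explicitly is the only option on current versions. The variable names are illustrative.

```php
<?php
$test = 'Ä';

// strlen() counts bytes; 'Ä' occupies two bytes in UTF-8.
$bytes = strlen($test);

// mb_strlen() counts characters when told the string is UTF-8.
$chars = mb_strlen($test, 'UTF-8');

$word = 'äöüß';

// substr() cuts in the middle of a character and leaves invalid UTF-8 behind.
$brokenFirst = substr($word, 0, 1);

// mb_substr() cuts on character boundaries.
$first = mb_substr($word, 0, 1, 'UTF-8');

var_dump($bytes, $chars, $first, mb_check_encoding($brokenFirst, 'UTF-8'));
```

The data in the question was valid UTF-8 all along; strlen() simply answers a different question (how many bytes) than the one being asked (how many characters).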

get_meta_tags and persian phrases

I used this function,
$code = get_meta_tags('http://www.narenji.ir/');
and I've seen this
'مکانی برای آشنایی با ابزارها Ùˆ اخبار داغ دنیای Ùناوری'
How can I fix this issue?
Can I fix it without using JSON?
You must be missing something here; your code just works:
Example
The key point is that you preserve the UTF-8 encoding so that Persian is supported. Otherwise you would need some other encoding (one that I do not yet know) that supports Persian and a library that is able to re-encode that.
Which encoding do you want to use for Persian output?
If you are executing your script from a browser, make sure you are sending UTF-8 as your content encoding. Add a Content-Type header before echoing anything.
header('Content-Type:text/html; charset=utf-8');
utf8_decode() is built specifically for converting from UTF-8 to ISO-8859-1 (Latin-1). Persian characters are not in Latin-1, so why would you feel it's necessary here?
working example: http://codepad.viper-7.com/tEjZAz
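The garbled output in the question is what intact UTF-8 bytes look like when something downstream reads them as a single-byte encoding. The sketch below reproduces that effect with ISO-8859-1 as the assumed wrong encoding (the actual one on the asker's setup is not known) and shows that the original bytes can be recovered. The sample word is taken from the end of the quoted meta description.

```php
<?php
$persian = 'فناوری';

// The data itself is valid UTF-8: 6 characters, 2 bytes each.
var_dump(mb_check_encoding($persian, 'UTF-8'), strlen($persian));

// Simulate a consumer that wrongly treats the UTF-8 bytes as ISO-8859-1
// and re-encodes them: every original byte becomes its own character.
$garbled = mb_convert_encoding($persian, 'UTF-8', 'ISO-8859-1');
var_dump(mb_strlen($garbled, 'UTF-8'));

// Undoing the mistaken step gives the original text back.
$restored = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $persian);
```

This is why declaring charset=utf-8 in the Content-Type header is the fix: the bytes returned by get_meta_tags() are fine, and only their interpretation needs to change.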

let htmlspecialchars use UTF-8 as default charset?

Is there a way to tell PHP to use UTF-8 as default for functions like htmlspecialchars ?
I have already set this:
ini_set('mbstring.internal_encoding','UTF-8');
ini_set('mbstring.func_overload',7);
If not, please can you post a list of all functions where I need to specify the charset?
(I need this because I am refactoring my whole framework to work with UTF-8.)
Just use htmlspecialchars() instead of htmlentities(). Because it doesn't touch the non-ASCII characters, it doesn't matter whether you use 'utf8' charset or the default 'latin1'(*), the results are the same. As a bonus your output is smaller. (Though it does mean you have to ensure you're actually serving your page with the correct encoding.)
(*: there are a few East Asian multibyte charsets which can differ in their use of ASCII code points, so if you're using those you would still need to pass a $charset argument to htmlspecialchars(). But certainly no such problem for UTF-8.)
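The difference between the two functions can be seen side by side. This is an illustrative sketch with a made-up sample string; the encoding is passed explicitly so the result does not depend on configuration.

```php
<?php
$text = 'Crème <brûlée> & "more"';

// Escapes only the HTML metacharacters; non-ASCII characters pass through.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Additionally replaces every character that has a named HTML entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, PHP_EOL;
echo $entities, PHP_EOL;
```

The htmlspecialchars() output is shorter and stays readable in the page source, at the cost of requiring the page to be served with the correct charset.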
Is there a way to tell PHP to use UTF-8 as default for functions like htmlspecialchars ?
Nope, not as far as I know. mbstring.internal_encoding will define a default encoding for the mb_* family of functions only.
If not, please can you post a list of all functions where I need to specify the charset?
I'm not sure whether such a list exists - if in doubt, just walk through the manual and look out for any charset parameters.
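One way to avoid hunting for every charset parameter is to route all escaping through a single wrapper that always passes the encoding. The function name e() below is hypothetical, and adding ENT_SUBSTITUTE is an extra assumption on my part: without it, htmlspecialchars() returns an empty string when the input contains invalid UTF-8.

```php
<?php
// Hypothetical project-wide escaping helper with the charset fixed in one place.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo e('<a href="x">ñ</a>'), PHP_EOL;

// Invalid UTF-8 (a stray 0xFF byte): the plain call yields '',
// the wrapper substitutes U+FFFD and keeps the rest of the text.
$invalid = "a\xFFb";
var_dump(htmlspecialchars($invalid, ENT_QUOTES, 'UTF-8'));
var_dump(e($invalid));
```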

Read ansi file and convert to UTF-8 string

Is there any way to do that with PHP?
The data to be inserted looks fine when I print it out.
But when I insert it in the database the field becomes empty.
$tmp = iconv('YOUR CURRENT CHARSET', 'UTF-8', $string);
or
$tmp = utf8_encode($string);
The strange thing is that you end up with an empty string in your DB. I could understand ending up with some garbage in your DB, but nothing at all (an empty string) is strange.
I just typed this in my console:
iconv -l | grep -i ansi
It showed me:
ANSI_X3.4-1968
ANSI_X3.4-1986
ANSI_X3.4
ANSI_X3.110-1983
ANSI_X3.110
MS-ANSI
These are possible values for YOUR CURRENT CHARSET
As pointed out before, when your input string contains only characters that are already valid UTF-8, you don't need to convert anything.
Change UTF-8 to UTF-8//TRANSLIT when you don't want to omit characters but replace them with a lookalike (when they cannot be represented in the target character set).
"ANSI" is not really a charset. It's a short way of saying "whatever charset is the default in the computer that creates the data". So you have a double task:
Find out which charset the data is using.
Use an appropriate function to convert into UTF-8.
For #2, I'm normally happy with iconv() but utf8_encode() can also do the job if source data happens to use ISO-8859-1.
Update
It looks like you don't know what charset your data is using. In some cases, you can figure it out if you know the country and language of the user (e.g., Spain/Spanish) through the default encoding used by Microsoft Windows in such territory.
Be careful: iconv() can return false if the conversion fails.
I am also having a somewhat similar problem: some characters from the Chinese alphabet are mistaken for \n if the file is encoded in UNICODE, but not if it is UTF-8.
To get back to your problem, make sure the encoding of your file is the same as the one of your database. Also, using utf8_encode() on text that is already UTF-8 can have unpleasant results. Try using mb_detect_encoding() to see the encoding of the file, but unfortunately this doesn't always work. There is no easy fix for character encoding from what I can see :(
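Pulling the advice from these answers together, a conversion step might look like the sketch below. It assumes the "ANSI" data is Windows-1252 (written as CP1252 for iconv), which has to be confirmed for the real file. The function name to_utf8() is hypothetical. It skips input that is already valid UTF-8 to avoid double encoding, and it checks the false return value of iconv().

```php
<?php
// Hypothetical helper: convert legacy single-byte text to UTF-8 exactly once.
function to_utf8(string $bytes, string $sourceCharset = 'CP1252'): string
{
    // Already valid UTF-8 (this includes plain ASCII): nothing to do.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    $converted = iconv($sourceCharset, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceCharset to UTF-8");
    }
    return $converted;
}

// "España" as it would be stored in a Windows-1252 file (ñ is byte 0xF1).
$ansi = "Espa\xF1a";
$utf8 = to_utf8($ansi);

var_dump($utf8, strlen($ansi), strlen($utf8));
```

An empty field in the database is consistent with invalid UTF-8 being rejected or truncated somewhere along the way, so validating with mb_check_encoding() before the insert is a cheap first diagnostic.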
