Charset issue from local to remote server

I have a charset problem.
On localhost everything works fine, but on the remote server I see strange characters in place of letters such as à or è. I have read that it's a charset issue, and I think the problem may be my php.ini (which I can't edit).
To solve it I've tried many things.
I've set
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
in the HTML,
ini_set('default_charset', 'UTF-8');
in PHP,
AddDefaultCharset utf-8
in my .htaccess file.
If I use utf8_encode on the strings, the letters are replaced by ã or similar; if I leave them untouched, the letters show up as �.
Is there another way to solve this problem that I have not found yet?
Sorry, I forgot to mention: the strings are retrieved from another site with file_get_contents (I'm using a Yandex API).
Here's some code:
$yandex = 'https://dictionary.yandex.net/api/v1/dicservice.json/lookup?key=my_api_key&lang=it-it&text=attualità';
// get json from this page
$object = json_decode(file_get_contents($yandex));
$syns_array = array();
$type = '';
// if the word exists
if (!empty($object->def) && $object->def != FALSE && $object->def != NULL)
{
$type = $object->def[0]->tr[0]->pos;
$rows = $object->def[0]->tr;
// if there're synonyms
if (!empty($rows) && $rows != FALSE && $rows != NULL)
{
foreach ($rows as $row)
{
array_push($syns_array, $row->text);
// if there're more rows with syns
if (!empty($row->syn) && $row->syn !== FALSE && $row->syn !== NULL)
{
foreach ($row->syn as $syns_obj)
{
array_push($syns_array, $syns_obj->text);
}
}
}
}
}
// I echo my synonyms from the array
foreach($syns_array as $syn) {
echo $syn;
}

I forgot to say I was using mb_strtolower on those strings. Replacing it with strtolower solved the problem... Sorry
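A possible explanation, offered as an assumption rather than something confirmed in this thread: when no encoding is passed, mb_strtolower falls back to the internal encoding, and that setting can differ between a local and a remote php.ini. A minimal sketch that passes the encoding explicitly, so the result no longer depends on server configuration (the helper name lower_utf8 is made up for illustration):

```php
<?php
// Hypothetical helper: lower-case a UTF-8 string without relying on
// mb_internal_encoding(), which may differ between servers.
function lower_utf8(string $text): string
{
    return mb_strtolower($text, 'UTF-8');
}

echo lower_utf8('ATTUALITÀ'), "\n";
```

This keeps multibyte letters such as À intact, which a plain strtolower does not lower-case at all.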

Related

Russian Characters in URL PHP

I have scoured the internet and I'm either not finding the answer or I'm unable to implement it correctly.
Basically I am trying to implement a website in Russian on PHP. I have already got the website in English and French but bringing a whole new range of characters in has sort of broken the test site. My actual goal is to have the Cyrillic characters in the URL, similar to how Wikipedia are able to do it > https://ru.wikipedia.org/wiki/Компьютер and also still find this in the database to show the correct location information.
In my SQL database I have a range of locations, countries such as France, Germany, Australia etc. I have set it up so that the location page is generated dynamically from those entries using the $data['Name'] variable. Now... in the header.php file I use this to generate the location names from the database for the navigation:
<li class="dropdown">
Места
<ul>
<?php if (is_array($locations)) {
foreach ($locations as $key => $location) {
$name = strtolower(str_ireplace(" ","-", $location['name']));
if ($location['top_location'] == 1)
echo '<li>'.$location['name'].'</li>';
}
}
?>
</ul>
</li>
Where $name is replaced by database entries. If I change one of the database entries to Russian (Australia for example - Австралия) then the location page throws a 404 error as it's actually trying to find location/%D0%90%D0%B2%D1%81%D1%82%D1%80%D0%B0%D0%BB%D0%B8%D1%8F rather than location/Австралия.
My location page has the following code to get information from the database:
<?php
include './inc/utils.php';
if (isset($_GET['id'])) {
$name = str_ireplace("-"," ", $_GET['id']);
$result = get_data("Locations", array("name" => $name))[0];
}
else
$result = null;
if ($result != null) {
$data['Name'] = $result['name'];
$data['Url_Name'] = $_GET['id'];
$data['Image'] = $result['image'];
$data['Slider_Text'] = $result['slider_text'];
$data['Description'] = $result['description'];
$data['Country'] = $result['top_location'] != 0 ? true : false;
$data['Cars_In_Location'] = get_cars_in_location($result['id']);
$img_url = $MASTER['car_img_url'];
$link_url = $MASTER['base_url'].'car/';
$cities_id = explode(",", $result['related']);
foreach ($cities_id as $value) {
$data['Related'][] = get_data("Locations", array("id" => $value))[0];
}
}
else {
$data['Name'] = "";
$data['Url_Name'] = "";
$data['Image'] = "";
$data['Slider_Text'] = "";
$data['Description'] = "";
$data['Country'] = false;
$data['Related'] = "";
$data['Cars_In_Location'] = "";
}
if (empty($data['Name']) == true) {
header('HTTP/1.1 404 Not Found');
header('Location: '.$MASTER['base_url'].'404.php');
}
include 'header.php';
?>
I have tried using urldecode to no avail. I think I am missing something either on the SQL side or in one of the function files.
My header.php file contains both
<meta http-equiv="Content-type" content="text/html; charset=utf-8" /> and
header('Content-Type: text/html; charset=utf-8');
as well as my location page containing
header('Content-Type: text/html; charset=utf-8');
my .htaccess file has
AddDefaultCharset UTF-8
I don't know what else I'm missing.
You can see the page here:
https://redfoxluxurycarhire.com/ru/location/Австралия
I am using print_r($host_url) to correctly print the URL despite what it shows so you can see my issue. I am also able to echo Австралия onto the location pages with no problems or encoding.
Any help would be much appreciated, as I'm racking my brain over how to get this to work!

I would start by checking whether you have a file encoding problem. Check that your scripts are saved as UTF-8; I believe your MySQL database is OK. You should be able to decode the requests with urldecode.
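To illustrate the urldecode suggestion: the percent-encoded path quoted in the question is just the UTF-8 bytes of the Cyrillic name, so decoding it should give back the string stored in the database. A small sketch, assuming the value arrives still encoded (PHP normally decodes $_GET values already, whereas a path taken from REQUEST_URI is not decoded); the helper name slug_to_name is hypothetical:

```php
<?php
// Hypothetical helper: turn a URL segment back into a location name.
function slug_to_name(string $segment): string
{
    // Decode %XX sequences into raw bytes.
    $name = rawurldecode($segment);
    // Reject anything that is not valid UTF-8 before it reaches the database.
    if (!mb_check_encoding($name, 'UTF-8')) {
        throw new InvalidArgumentException('Segment is not valid UTF-8');
    }
    // Same dash-for-space convention as the question's code.
    return str_replace('-', ' ', $name);
}

$encoded = '%D0%90%D0%B2%D1%81%D1%82%D1%80%D0%B0%D0%BB%D0%B8%D1%8F';
echo slug_to_name($encoded), "\n";
echo rawurlencode('Австралия'), "\n";
```

Browsers display the decoded form in the address bar while sending the encoded form, so a request for the percent-encoded path is expected and not an error in itself.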

What does "ÿþ" in the content of an URL mean? [duplicate]

I'm trying to read ID3 data in bulk. On some of the tracks, ÿþ appears. I can remove the first 2 characters, but that hurts the tracks that don't have it.
This is what I currently have:
$trackartist=str_replace("\0", "", $trackartist1);
Any suggestions would be greatly appreciated, thanks!

ÿþ is the byte sequence 0xFF 0xFE displayed as ISO-8859-1 (Latin-1) characters; this is the byte order mark of little-endian UTF-16.
You can convert your string to UTF-8 with iconv or mb_convert_encoding():
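$trackartist1 = iconv('UTF-16LE', 'UTF-8', $trackartist1);
# Alternative to the line above (use one or the other, not both);
# note the argument order: string, target encoding, source encoding
# $trackartist1 = mb_convert_encoding($trackartist1, 'UTF-8', 'UTF-16LE');
# After conversion the BOM is the UTF-8 byte sequence EF BB BF; str_replace() can now remove it
$trackartist1 = str_replace("\xEF\xBB\xBF", '', $trackartist1);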
This assumes $trackartist1 is always in UTF-16LE; check the documentation of your ID3 tag library on how to get the encoding of the tags, since this may be different for different files. You usually want to convert everything to UTF-8, since this is what PHP uses by default.
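As a rough sketch of handling tracks both with and without ÿþ, the BOM can be checked and removed at the byte level before converting. This assumes the tag text really is UTF-16 and falls back to little-endian when no BOM is present; utf16_to_utf8 is a made-up name, not part of any ID3 library:

```php
<?php
// Hypothetical helper: convert UTF-16 text (with or without BOM) to UTF-8.
function utf16_to_utf8(string $raw): string
{
    $encoding = 'UTF-16LE';                          // assumed default when no BOM is found
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {        // little-endian BOM ("ÿþ")
        $raw = substr($raw, 2);
    } elseif (strncmp($raw, "\xFE\xFF", 2) === 0) {  // big-endian BOM ("þÿ")
        $encoding = 'UTF-16BE';
        $raw = substr($raw, 2);
    }
    return mb_convert_encoding($raw, 'UTF-8', $encoding);
}

$withBom    = "\xFF\xFE" . "A\x00B\x00";
$withoutBom = "A\x00B\x00";
echo utf16_to_utf8($withBom), "\n";
echo utf16_to_utf8($withoutBom), "\n";
```

Stripping the BOM before the conversion avoids depending on whether a given converter keeps or drops it.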

I had a similar problem but was not able to force UTF-16LE as the input charset could change. Finally I detect UTF-8 as follows:
if (!preg_match('~~u', $html)) {
If this check fails, I obtain the correct encoding from the BOM:
function detect_bom_encoding($str) {
// Compare the leading bytes with each byte order mark.
// strncmp() is binary safe and does not fail on strings shorter than the BOM.
if (strncmp($str, "\xEF\xBB\xBF", 3) === 0) {
return 'UTF-8';
}
if (strncmp($str, "\x00\x00\xFE\xFF", 4) === 0) {
return 'UTF-32BE';
}
// UTF-32LE must be tested before UTF-16LE: both start with FF FE.
if (strncmp($str, "\xFF\xFE\x00\x00", 4) === 0) {
return 'UTF-32LE';
}
if (strncmp($str, "\xFF\xFE", 2) === 0) {
return 'UTF-16LE';
}
if (strncmp($str, "\xFE\xFF", 2) === 0) {
return 'UTF-16BE';
}
return null; // no BOM found
}
And now I'm able to use iconv() as shown in @carpetsmoker's answer:
iconv(detect_bom_encoding($html), 'UTF-8', $html);
I did not use mb_convert_encoding() as it did not remove the BOM (and did not convert the linebreaks as iconv() does).

Use regex replacement:
$trackartist1 = preg_replace('/\x00/', '', $trackartist1);
The regex above matches every "\x00" (NUL) byte in the string and replaces it with nothing. Single quotes are used so that the regex engine, not PHP, interprets the \x00 escape.

Issues with showing french accents from my Database

I am connecting to a FileMaker DB through ODBC, and some data contains accents such as é or è. These characters appear as "?" right now, which is a bit of a problem. Here is what my code looks like:
$connection = odbc_connect($dsn, $username, $password, SQL_CUR_USE_ODBC);
$sql = "SELECT * FROM Table1";
$res = odbc_exec($connection,$sql);
$x = 0; // row counter, initialized before use
while ($row = odbc_fetch_array($res)){
$x++;
$values= ($x . ": Customer:". $row['Customer'] . "\n");
print($values);
}
odbc_free_result($res);
odbc_close($connection);
I tried a few things, such as adding 'charset=utf-8' in the header, but nothing seems to work so far. I'm pretty sure I need to include utf-8 somewhere, I just haven't found examples with odbc similar to my code online. Thanks!

You will need to connect using the correct encoding. You can determine the correct encoding with the following query:
SELECT hex(Customer) FROM Table1;
Match the hex code of the offending character with the target encodings, most likely latin1 and UTF-8. If you cannot identify the hex codes, then paste the output here and I will identify it for you.
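As an illustration of what to look for in the hex output, the same letter has different bytes in the two candidate encodings. The sketch below assumes this script file itself is saved as UTF-8:

```php
<?php
$utf8   = 'é';                                                // two bytes in UTF-8
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');  // one byte in latin1

echo bin2hex($utf8), "\n";    // c3a9
echo bin2hex($latin1), "\n";  // e9
```

So a value containing E9 where an é is expected points to latin1 (or Windows-1252), while C3A9 points to UTF-8.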

The ODBC driver is probably returning the data in Windows-1252 (WIN1252).
Try this:
mb_convert_encoding($value,'UTF-8','Windows-1252');
I've used it to do the opposite; converting from Windows-1252 to UTF-8 this way should work too. Let me know.
You can also try this:
Use the function mb_detect_encoding(). If the function doesn't exist, try this code:
if ( !function_exists('mb_detect_encoding') ) {
function mb_detect_encoding ($string, $enc=null, $ret=null) {
static $enclist = array(
'UTF-8', 'ASCII',
'ISO-8859-1', 'ISO-8859-2', 'ISO-8859-3', 'ISO-8859-4', 'ISO-8859-5',
'ISO-8859-6', 'ISO-8859-7', 'ISO-8859-8', 'ISO-8859-9', 'ISO-8859-10',
'ISO-8859-13', 'ISO-8859-14', 'ISO-8859-15', 'ISO-8859-16',
'Windows-1251', 'Windows-1252', 'Windows-1254',
);
$result = false;
foreach ($enclist as $item) {
$sample = iconv($item, $item, $string);
if (md5($sample) == md5($string)) {
if ($ret === NULL) { $result = $item; } else { $result = true; }
break;
}
}
return $result;
}
}
Source:
PHP
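Applied to the loop from the question, the conversion could be wrapped in a small helper that is called on each fetched row. This is only a sketch under the assumption that the driver really returns Windows-1252; row_to_utf8 and the sample row are invented for illustration:

```php
<?php
// Hypothetical helper: convert every string column of a fetched row to UTF-8.
function row_to_utf8(array $row, string $from = 'Windows-1252'): array
{
    return array_map(
        function ($value) use ($from) {
            // Leave non-string values (integers, null) untouched.
            return is_string($value)
                ? mb_convert_encoding($value, 'UTF-8', $from)
                : $value;
        },
        $row
    );
}

// Sample row standing in for the result of odbc_fetch_array().
$row   = ['Customer' => "Ren\xE9e", 'Id' => 7];
$clean = row_to_utf8($row);
echo $clean['Customer'], "\n";
```

Inside the while loop this would be used as $row = row_to_utf8($row); before printing.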

Removing empty lines with notepad++ causes T_Variable Syntax Error

Removing empty lines in Notepad++
As described in the thread above, you can remove empty lines with Notepad++. I tried those methods, but I always get weird T_VARIABLE syntax errors (mostly in lines 1-6). There is definitely no error in those lines, and I cannot see one anywhere else.
It also happens when I manually delete the empty lines in some areas of the code (the first 5 lines, for example). I am guessing this is an encoding problem, but re-encoding as UTF-8, ASCII etc. did not help.
Those lines were added when I used an online editor from a webhoster a couple of months ago (1 empty line between the lines that were there before).
I do not get it, maybe you do (thanks in advance!). The file is at http://lightningsoul.com/index.php
And here is the first block of code:
<?php
include("db/lg_db_login.php");
//require 'fb/facebook.php';
if (isset($_GET['c'])) { $content = $_GET['c']; }
else { $content = ""; }
if (isset($_GET['sc'])) { $subcontent = $_GET['sc']; }
else { $subcontent = ""; }
if (isset($_GET['setlang'])) { $setlang = $_GET['setlang']; }
else { $setlang = "eng"; }
$cat = $_GET['cat'];
// Check if Lightningsoul.de or .com
$findme = '.de';
$posde = strpos($thisurl, $findme);
// Note our use of ===. Simply == would not work as expected
// because the position of 'a' was the 0th (first) character.
if ($posde === false) {
$lang = "en";
} else {
$lang = "de";
}
include("db/pageturn_class.php");
$findStr = '/lightningsoulcom';
$isApp = strpos($thisurl, $findStr);
// Note the use of ===. A simple comparison (==) does not give
// the expected result, because the position of 'a' is the zeroth position
// (i.e. the first character)
/*if ($isApp == false) {
$getStyle = "css/get_style.php";
} else {
$getStyle = "css/get_style_small.php";
} */
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

The problem was in line 181 where something (notepad++?!) must have changed
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
to
<meta http-equiv="content-type" content="text/html; charset=Windows-1250" />
That produced lots of errors as one would expect. I only found it by removing the head part by part.

Avoid re-conversion of a UTF-8 String PHP

Well, I have one DB with a lot of ISO strings and another with UTF-8 (yes, I ruined everything), and now I'm writing a custom function that rewrites the whole DB so that everything is in UTF-8. The problem is the conversion of strings that are already UTF-8: a ? appears:
$field = $fila['Field'];
$acon = mysql_fetch_array(mysql_query("SELECT `$field` as content FROM `$curfila` WHERE id='$i'"));
$content = $acon['content'];
if(!is_numeric($content)) {
if($content != null) {
if(ip2long($content) === false) {
mb_internal_encoding('UTF-8');
if(mb_detect_encoding($content) === "UTF-8") {
$sanitized = utf8_decode($content);
if($sanitized != $content) {
echo 'Fila [ID ('.$i.')] <b>'.$field.'</b> => '.$sanitized.'<br>';
//mysql_query("UPDATE `$curfila` SET `$field`='$sanitized' WHERE id='$i'");
}
}
}
}
}
PS: I check all the columns and rows of all the tables of the DB. (I print everything before changing anything.)
So, how can I detect that?
I tried mb_detect_encoding, but it reports all the strings as UTF-8... So, which function can I use now?
Thanks in advance.
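One possible approach, sketched under the assumption that the legacy rows are ISO-8859-1: check whether the bytes are already valid UTF-8 and convert only when they are not. The function name ensure_utf8 is invented for this example, and the check cannot tell apart a legacy string that happens to be valid UTF-8, so printing the results before updating (as the question already does) remains a good idea.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not valid UTF-8 yet.
function ensure_utf8(string $content, string $legacy = 'ISO-8859-1'): string
{
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;   // already valid UTF-8 (this includes plain ASCII)
    }
    return mb_convert_encoding($content, 'UTF-8', $legacy);
}

echo ensure_utf8('café'), "\n";      // UTF-8 input is returned unchanged
echo ensure_utf8("caf\xE9"), "\n";   // ISO-8859-1 input is converted
```

Unlike utf8_decode, this never converts in the wrong direction, so strings that are already UTF-8 do not end up with ? in place of accented letters.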
