UTF-8 encoding and SESSION inconsistency?

UTF-8 encoding and SESSION inconsistency? - php

QUESTION:
Hello stackoverflow!
So this encoding stuff is getting on my last nerve. Not enough that it is difficult to figure out what the best combination of encodings needs to be when sending stuff forth and back using AJAX and PHP and SQL etc.. But it also causing problems with SESSION???!
So basically I already found a hot-fix solution no-thanks to google, partly the reason I'm writing this now. But I would also like to see if anyone of you actually have any more practical solution.
PROBLEM:
For example if I want my PHP file to have UTF-8 encoding, it then adds hidden characters in the file which then can only be viewed and deleted in a hex-editor. For those that don't know, YES any extra characters that aren't commented out will cause problems with SESSION and give you header error. So when I delete them, and re-upload the file, it falls back to ANSI encoding. Maybe there are different editors that can encode files more properly into UTF-8? I don't know, I'm using Notepad++ at the moment and am perfectly happy with it and it is hard to believe it should cause problems with encoding. I have also tried to change my default encoding in .htaccess file and no difference for the index file anyways.

It seems, although we get WARNING: session_start(): Cannot send session cache limiter - headers already sent ... the sessions are still set perfectly fine and all we could do at this point is simply turning off warning errors by placing this on top of our php file: error_reporting(~E_NOTICE & ~E_WARNING); although this doesn't really solve our problem and simply hiding it from public eye.

Page open Notepad2 or Sublime Text -> Save with Encoding -> UTF-8
index.php
<?php
session_start();
header('Content-type: text/html; charset=utf-8');
echo 'Hello ÇÖİŞÜĞüğışçö'; // bla bla
?>

SOLUTION:
I had to therefore make a fix by making 2 separate files for one and the same index, just with different encoding. Like my main file is ANSI encoded and called
index.php
that will have session_start(); line in it and beneath it we include our main scripts that were originally supposed to be there, but instead now included with this include('index_.php'); ................ Also I found out that this problem will NOT occur on all hosting servers, but only some. So the real solution may be found trough somewhere in the server settings.

Related

encoding issues in drupal when importing from wordpress

I am currently moving blog posts from wordpress to drupal. however after moving it
some of the text is not being displayed correctly.
wordpress is displaying :
When it hasn’t (html code is <h2>When it hasn’t</h2>)
Drupal is displaying :
When it hasnâ€™t (html code is <h2>When it hasnâ€™t</h2>)
In the wordpress and drupal db the value is correct. The source is the same.
<h2>When it hasnâ€™t</h2>
I did a search and found many options. None of them helped.
Below are the ones I have done and checked.
1) I double checked that utf-8 is the character encoing in drupal and wp.
I also made a simple test.php file to check nothing else was coming in the way
and it still did not display correctly.
2) I made sure when we take a mysqldump and upload to drupal utf-8
is used.
3) I also made sure the .php file is in utf-8 when saved.
4) I changed the encoding type in chrome for every option available and nothing
displayed it correctly.
5) I also used php functions to recode it but they did not work.
$value2="<h2>When it hasnâ€™t</h2>";
$out = recode_string('..utf-8', $value2);
//output - When it hasnt
$out2= mb_convert_encoding($value2,'UTF-8', "UTF-8");
// output - When it hasnÃ¢â‚¬â„¢t
$out3= #iconv('UTF-8', 'utf-8', $value2);
// output - When it hasnÃ¢â‚¬â„¢t
I have ran out of options now and I am stuck. Please help

You say the text in both databases is correct, but actually this doesn't mean too much: to viewing the content of a record you must use some client, and quite a few transformations may happen depending on how the text is rendered so you can read it.
So only two things matters:
the encoding of the column
the encoding of the HTML page returned by Drupal
Since your page outputs â€™ (in CP1252 is xE2x80x99) for ’ (Unicode U+2019, UTF-8 is 0xE28099) I guess the column is indeed UTF-8, however there's someone between the database and the browser who thinks the text is CP1252. This is what you have to check:
If using MySQL, the connection encoding must be UTF-8 so that what you have in your PHP script is UTF-8 text. You can use SET NAMES 'UTF-8'. Note that if you don't need the Unicode set, you can even use CP1252: the only important thing is that you know the encoding, since PHP strings are just byte arrays.
Explicitely define the response encoding in the HTTP Content-Type header. I mean, configure Drupal to call header('Content-Type: text/html; charset=utf-8');
If the HTTP response encoding is different than the one used for the text retrieved from the db, transcode the query result accordingly

PHPExcel outputs garbled text

Like many others out there i have had my fair share of issues trying to download an Excel file output by PHPExcel.
What happened in my case was whenever I wanted to download a file using
$obj->save('php://output')
i always used to get garbled text in my excel file with a warning saying my file was corrupt. Eventually i resolved the issue. The problem being i had a
require('dbcon.php')
at the top of my php script. I just replaced that with whatever was there inside dbcon.php and it worked fine again.
Though the problem is solved i would really like to know what caused the problem. It would be great if anyone out there could help me out with this one.
Thanks.

If you get that error - you should follow the advice we always give in that situation: you use a text editor to look in the generated file for leading or trailing whitespace, or plaintext error messages - and then in your own scripts for anything that might generate that such as echo statements, blank lines outside ?> <?php, etc.
Another way of testing for this is to save to the filesystem rather than php://output and see if you get the same problem: if that works, then the problem is always something that your own script is sending to php://output as well.
Clearly you had a problem along those lines in your dbcon.php file. This can be as simple as a trailing newline after a closing ?> in the file...

Tanmay.
In situations like your's, there can be couple of reasons for broken output:
in file dbcon.php can be a whitespace before opening or ending php
tag, so that produce some chars to output and can make file broken
(this is reason for using only opening tag in php 5.3+);
maybe file dbcon.php wasn't found by require, so you got error message in otput;
any other errors or notices or warnings in dbcon.php, because presence of global vars from current file..

How does the PHP code execute even without closing the ?> PHP tag?

Following is the code I come to notice from a PHP file:
<?php
# Should log to the same directory as this file
$log = KLogger::instance(dirname(__FILE__), KLogger::DEBUG);
$args1 = array('a' => array('b' => 'c'), 'd');
$args2 = NULL;
$log->logInfo('Info Test');
$log->logNotice('Notice Test');
$log->logWarn('Warn Test');
$log->logError('Error Test');
$log->logFatal('Fatal Test');
$log->logAlert('Alert Test');
$log->logCrit('Crit test');
$log->logEmerg('Emerg Test');
$log->logInfo('Testing passing an array or object', $args1);
$log->logWarn('Testing passing a NULL value', $args2);
You can notice that the closing PHP tag(?>) is not present there but still all the statements within code are working perfect. I'm not getting how this could be possible to execute the code without completion of PHP tag(?>). I researched but didn't get any satisfatory explanation. Can anyone guide me in this regard? Thanks in advance.

The closing tag exists to tell the interpretter that it should stop executing the text and just output it verbatim. Unlike XML, which requires openning and closing tags to match to be valid, the PHP interpretter simply uses the tags to delimit where execution should start and stop.
Just like a PHP file could have no opening tag - meaining that the entire contents would be output, no closing tag is necessary as once the end-of-file is reached execution ends.

While I can't remember any other reason, sending headers earlier than the normal course may have far reaching consequences. Below are just a few of them that happened to come to my mind at the moment:
While current PHP releases may have output buffering on, the actual production servers you will be deploying your code on are far more important than any development or testing machines. And they do not always tend to follow latest PHP trends immediately.
By sending headers inadvertently, you might have introduced a security vulnerability: say, you are doing a redirection, but hence the headers are already sent, the redirection does not work and the rest of the page might be output, thus the visitor may see what she was not supposed to see. While this can be mitigated by using exit, you know the story, only if every one of us utilize good programming habits every time.
Even if letting the visitor stay in the wrong page does not have a security implication, by breaking a session behavior, or in some other ways I've encountered over years, the security and/or session cycle might have taken some sort of blow in the end.
If not security, you may have headaches over inexplicable functionality loss. Say, you are implementing some kind payment gateway, and redirect user to a specific URL after successful confirmation by the payment processor. If some kind of PHP error, even a warning, or an excess line ending happens, the payment may remain unprocessed and the user may still seem unbilled. This is also one of the reasons why needless redirection is evil and if redirection is to be used, it must be used with caution.
You may get "Page loading canceled" type of errors in Internet Explorer, even in the most recent versions. This is because an AJAX response/json include contains something that it shouldn't contain, because of the excess line endings in some PHP files, just as I've encountered a few days ago.
If you have some file downloads in your app, they can break too, because of this. And you may not notice it, even after years, since the specific breaking habit of a download depends on the server, the browser, the type and content of the file (and possibly some other factors I don't want to bore you with).
Bonus: a few gotchas (actually currently one) related to these 2 characters:
Even some well-known libraries may contain excess line endings after ?>. An example is Smarty, even the most recent versions of both 2.* and 3.* branch have this. So, as always, watch for third party code. Bonus in bonus: A regex for deleting needless PHP endings: replace (\s*\?>\s*)$ with empty text in all files that contain PHP code.

From the PHP Manual:
The closing tag of a PHP block at the end of a file is optional, and in some cases omitting it is helpful when using include or require, so unwanted whitespace will not occur at the end of files, and you will still be able to add headers to the response later. It is also handy if you use output buffering, and would not like to see added unwanted whitespace at the end of the parts generated by the included files.

Sending UTF8 in GET parameter

When navigating to a URL like this:
http://example.com/user?u=ヴィックサ
I notice that Chrome encodes the characters as:
http://example.com/user?u=%E3%83%B4%E3%82%A3%E3%83%83%E3%82%AF%E3%82%B5
And everything works serer-side.
However, in IE I get this error from my code:
The user you are trying to find (?????) does not exist.
Note the five question marks. For some reason the PHP never gets to see the parameter.
What could be causing this, and is there any way to fix it?

Sadly it seems what you want to do is not going to work for the current generation of IE
The accepted answer for this question UTF-8 Encoding issue in IE query parameters says that you need to encode the characters yourself rather than relying on the browser as support varies from browser to browser, and maybe even device to device
<a href='/path/to/page/?u=<?=urlencode('ヴィックサ')?>'>View User</a>
Also I presume you are setting utf8 headers from the webserver? you didn't say, if not, in php
header('Content-Type: text/html; charset=utf-8');

Why doesn't jQuery.parseJSON() work on all servers?

Hey there, I have an Arabic contact script that uses Ajax to retrieve a response from the server after filling the form.
On some apache servers, jQuery.parseJSON() throws an invalid json excepion for the same json it parses perfectly on other servers. This exception is thrown only on chrome and IE.
The json content gets encoded using php's json_encode() function. I tried sending the correct header with the json data and setting the unicode to utf-8, but that didn't help.
This is one of the json responses I try to parse (removed the second part of if because it's long):
{"pageTitle":"\u062e\u0637\u0623 \u0639\u0646\u062f \u0627\u0644\u0625\u0631\u0633\u0627\u0644 !"}
Note: This language of this data is Arabic, that's why it looks like this after being parsed with php's json_encode().
You can try to make a request in the examples given down and look at the full response data using firebug or webkit developer tools. The response passes jsonlint!
Finally, I have two urls using the same version of the script, try to browse them using chrome or IE to see the error in the broken example.
The working example : http://namodg.com/n/
The broken example: http://www.mt-is.co.cc/my/call-me/
Updated: To clarify more, I would like to note that I manged to fix this by using the old eval() to parse the content, I released another version with this fix, it was like this:
// Parse the JSON data
try
{
// Use jquery's default parser
data = $.parseJSON(data);
}
catch(e)
{
/*
* Fix a bug where strange unicode chars in the json data makes the jQuery
* parseJSON() throw an error (only on some servers), by using the old eval() - slower though!
*/
data = eval( "(" + data + ")" );
}
I still want to know if this is a bug in jquery's parseJSON() method, so that I can report it to them.

Found the problem! It was very hard to notice, but I saw something funny about that opening brace... there seemed to be a couple of little dots near it. I used this JavaScript bookmarklet to find out what it was:
javascript:window.location='http://www.google.com/search?q=u+'+('000'+prompt('String?').charCodeAt(prompt('Index?')).toString(16)).slice(-4)
I got the results page. Guess what the problem is! There is an invisible character, repeated twice actually, at the beginning of your output. The zero width non-breaking space is also called the Unicode byte order mark (BOM). It is the reason why jQuery is rejecting your otherwise valid JSON and why pasting the JSON into JSONLint mysteriously works (depending on how you do it).
One way to get this unwanted character into your output is to save your PHP files using Windows Notepad in UTF-8 mode! If this is what you are doing, get another text editor such as Notepad++. Resave all your PHP files without the BOM to fix your problem.
Step 1: Set up Notepad++ to encode files in UTF-8 without BOM by default.
Step 2: Open each existing PHP file, change the Encoding setting, and resave it.

You should try using json2.js (it's on https://github.com/douglascrockford/JSON-js)
Even John Resig (creator of jQuery) says you should:
This version of JSON.js is highly recommended. If you're still using the old version, please please upgrade (this one, undoubtedly, cause less issues than the previous one).
http://ejohn.org/blog/the-state-of-json/

I don't see anything related to parseJSON()
The only difference I see is that in the working example a session-cookie is set(guess it is needed for the "captcha", the mathematical calculation), in the other example no session-cookie is set. So maybe the comparision of the calculation-result fails without the session-cookie.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.