I haven't got clue if this is a normal issue or not but I have a small flash application that handles management for my company. It's a small company, so its not a big deal, its just a bunch of INSERTs, SELECTs, UPDATEs and other stuff to manage their clients, address, phone numbers, etc.
The flash (in AS3) sends the variables through a URLRequest to several php pages and the php handles the request to mySQL.
My problem is that, sometimes, instead of inserting the String I sent, it instead gets a weird string, made mostly, but not only, of numbers (and it happens like 1 column out of about 10 per INSERT, so its fairly common).
Is this a known issue? Could it be because of the encoding (I used UTF-8, which I believe is the one that we use here in portugal, due to special characters, like ã, à, á, etc)?
Thank you for your time.
Marco Fox.
After connecting to the DB, try the following query "SET CHARACTER SET utf8;".
Make sure every PHP page are in utf-8.
To do that, open the file in Notepad++ and use the menu Encoding -> Convert to UTF-8 without BOM, or open the file in notepad and ask to save as and look at encoding dropdown bellow name (this will save the BOM, which is not good).
Some IDE have the ability to save in ANSI, UTF-8 and more, or have the conversion option.
In Flash, use encodeURI() in your URLLoader data if you are passing it by GET.
Hopes that this solves your problem (if it is, in fact, encoding issues).
Related
We are having a PHP5.6 website project and we are about to re-launch it on PHP7.4.
Let's call them old environment and new environment. Old one still intact. Both are on different server machines.
Charsets (html meta tags) are set to utf-8.
Zend Framework 1 is involved in both. The database is on an SQL Server, shared by both environments. We use the SqlSrv driver to connect to the database (new environment), the old environment has PDO-Sql.
The encoding of the database is set to Latin1_General_CI_AS.
Information is getting inserted and selected into/from many tables (INSERT, SELECT). Html textfields and
textareas are in use.
In the old environment, any text written in textfields/-areas with special characters, such as umlauts, is being saved in the database in a corrupt form, like instead of ö there is ö in the database table. On the screen, after a select-statement however, it is shown as ö (clean!).
That was all okay until now, but now we have the new environment.
Let's say there is are old entries saved during the old environment era and we open the website on the new environment. The content is shown 1:1 as seen in the database table, in other words: corrupted. Which explains why anything that is saved with the help of the new environment is shown correctly on screen since special characters and umlauts are saved without any changes in the database table.
But the entries made with the new environment cannot be seen on the old environment website.
Using utf8_encode or utf8_decode didn't help much, either it looked even worse, or there is no text on screen to be seen neither.
Writing some script that changes the encoding in the table would cause mayhem, because since the old environment still in use, it can't be done that easy.
There are no encoding options mentioned in the options, that are used on the class called Zend_Db_Adapter_Sqlsrv.
Well, I don't trust mb_detect_encoding and yet we tried that, but it returned UTF-8 on the returned values from the tables.
So what would people recommend? I might have missed some facts, but I'll provide you with more information if needed.
This sounds very similar to a problem I've solved in the past. Unfortunately I solved it in ASP.NET, so I can only describe what I did and let you translate it into PHP.
So the issue probably arises because your old system is using a non-UTF-8 codepage, in my case the codepage was windows-1252 which was fairly common at the time. The codepage determines the character coding that your code uses.
So on my more modern system what I had to do was force the codepage back to windows-1252 while I was reading from the database. And then before rendering the page, set the content encoding to UTF-8.
So unless you are able to fix the problem at source, you basically have to hack your new system to continue operating the same way - which is a unfortunate but sometimes necessary.
The ASP.NET code looks like this:
protected void Page_Load(object Sender, EventArgs Args)
{
// Set the encoding for building and rendering, then switch later to display as utf-8
Response.Charset = "windows-1252"; // Hmmm... double check this
Response.ContentEncoding = System.Text.Encoding.GetEncoding("windows-1252");
}
protected void Render(HtmlTextWriter writer)
{
// Now that all the character encoding has taken place, switch to utf-8 to force it to display this way...
Response.Charset = "utf-8";
Response.ContentEncoding = Encoding.UTF8;
}
Hopefully that gives you enough to go on... its been a long time since I did this, but the pain still sticks in my mind!
Since it was a project someone else did, we decided to update all tables and correct the values.
Example :
$fire = '🔥';
I know PHP 5+ supports this functionality natively but is it best practice or should I be storing them using their codepoints instead and if so, why?
As far as your editor and the PHP compiler are concerned, it's all just text, and '🔥' is no different from 'fire' or 'Φωτιά'.
When PHP runs, it will read the bytes in from the file and put them in memory, without caring what they mean. This leads to the most likely problem you'll have: if you save the file in your text editor as UTF-16, and then echo the string to a browser telling it that it's UTF-8, the browser won't show the right thing. But that's easily avoided by making sure your editor always uses UTF-8, and your output headers tell the browser that's what you're using.
If you don't trust your editor to do that, and you're running PHP7, you could write it in the escaped notation "\u{1f525}", but when it runs, the same bytes will end up in memory.
You might have similar problems if you send the text elsewhere - to a database, for instance - and that somewhere else doesn't know to handle it as UTF-8. How you write the string in your source file won't make any difference to that, though, that's just a case of making sure everything is configured to match.
Note: you don't actually have to use UTF-8 for this, you could use UTF-16, or some other encoding, as long as you're consistent; but UTF-8 is by far the most common these days, particularly on the web.
I'm doing a kind of roundabout experiment thing where I'm pulling data from tables in a remote page to turn it into an ICS so that I can find out when this sports team is playing (because I can't find anywhere that the information is more readily available than in this table), but that's just to give you some context.
I pull this data using cURL and parse it using domDocument. Then I take it and parse it for the info I need. What's giving me trouble is the opposing team. When I display the data on the initial PHP page, it's correct. But when I write to an ICS file, special UTF-8 characters get messed up. I thought utf8_encode would solve that problem, but it actually seems to have the opposite effect: when I run the function on my data, even the stuff displayed on the page (which had been displaying correctly), not in the separate ICS file (which was writing incorrectly), is incorrect. As an example: it turns "Inđija" to "InÄija."
Any tips or resources as far as dealing with UTF-8 strings in PHP? My server (a remote host) doesn't have mbstring installed either, which is a pain.
utf8_encode encodes a string in ISO 8859-1 as UTF-8. If you put UTF-8 into it, it's going to interpret it as if it was ISO 8859-1, and hence produce mojibake.
To help with your first problem, before this, I'd want to know what sort of "special" characters are being messed up in the original problem, and what way are they being messed up?
I have a php script which accesses a MSSQL2005 database, reads some data from it and sends the results in a mail.
There are special characters in both some column names and in the fields itself.
When I access the script through my browser (webserver iis), the query is executed correctly and the contents of the mail are correctly (for my audience) encoded.
However, when I execute php from the console, the query fails (due to the special characters in the column names). If I replace the special characters in the query with calls to chr() and the character code in latin-1, the query gets executed correctly, but the results are also encoded in latin-1 and therefore not displayed correctly in the mail.
Why is PHP/the MSSQL driver/… using a different encoding in the two scenarios? Is there a way around it?
If you wonder, I need the console because I want to schedule the script using SQLAgent (or taskmanager or whatever).
Depending on the type of characters you have in your database, it might be a console limitation I guess. If you type chcp in the console, you'll see what is the active code page, which might something like CP437 also known as Extended ASCII. If you have characters out of this code page, like in UTF8, you might run into problems. You can change the current active code page by typing chcp 65001 to switch to UTF8.
You might also want to change the default Raster font to Lucida Console depending on the required characters as not all fonts support extended characters (right click on command prompt window's title, properties, font).
As already said, PHP's unicode support is not ideal, but you can manage to do it in PHP5 with a few well placed function call of utf8_decode. The secret of character encoding is to understand well what is the current encoding of all the tools you are using: database, database connection, current bytes in your PHP variable, your output to the console screen, your email's body encoding, your email client, and so on...
For everything that have special characters, in our modern days, something like UTF8 is often recommended. Make sure everything along the way is set to UTF8 and convert only where necessary.
PHP's poor support for the non English world is well known. I've never used a database with characters outside the basic ASCII realm, but obviously you already have a work around and it seems you just have to live with it.
If you wanted to take it a step further, you could:
1. Write an array that contains all the special chars and their CHR equivalents
2. foreach the array and str_replace on the query
But if the query is hardcoded, I guess what you have is fine. Also, make sure you are using the latest PHP, at least 4.4.x, there's always a change this was fixed but I skimmed the 4.x.x release notes and I don't see anything that relates to your problem.
The thing to remember about PHP strings is that they are streams of bytes. If you want to get the data in the correct character set (for whatever you are doing), you have to do this explicitly through some kind of function or filter. It's all pretty low-level.
Depending on your setup, you may need to know the internal character set of the strings in the database, but at the very least you need to know what character set the database is sending to PHP (because, remember, to PHP it's just a stream of bytes).
Then you have to know the target character set (and possibly specify it, which you really should anyway). For example, say that you are getting utf-8 from the database, but wish to send a latin-1 (and therefore base64 or q-printable encoded as 'Content-transfer-encoding'):
$send_string = base64_encode(utf8_decode($database_string));
Of course in this case, you'd have to know that all the utf-8 characters exist in the latin-1 character set, and you probably wouldn't really want base64 (PHP unfortunately does not have a good q-printable encoding function, though curiously, it does for decoding), and if you aren't talking about utf-8 <=> latin-1 you'll want to whip out the mbstring functions instead.
As far as the console, you'd have to know what PHP is getting when you are typing in special characters from the console, which probably depends on the shell and/or PHP settings. But remember that PHP only understands strings as byte byte byte and you should be able to work it out.
Okay, so emoji basically shows the above on a computer. Is that another programming language? So how do I put those little boxes into a php file? When I put it into a php file, it turns into question marks and what not. Also, how can I store these in a MySQL without it turning into question marks and other weird things?
how do I put those little boxes into a php file?
Same way as any other Unicode character. Just paste them and make sure you're saving the PHP file and serving the PHP page as UTF-8.
When I put it into a php file, it turns into question marks and what not
Then you have an encoding problem. Work it out with Unicode characters you can actually see properly first, for example ąαд™日本, before worrying about the emoji.
Your PHP file should be saved as UTF-8; the page it produces should be served as Content-Type: text/html;charset:UTF-8 (or with similar meta tag); the MySQL database should be using a UTF-8 collation to store data and PHP should be talking to MySQL using UTF-8.
However. Even handling everything correctly like this, PCs will still not show the emoji. That's because:
they don't have fonts that include shapes for those characters, and
emoji are still completely unstandardised. Those characters you posted are in the Unicode Private Use Area, which means they don't have any official meaning at all.
Each network in Japan uses different character codes for their emoji, mapped to different areas in the PUA. So even on another mobile phone, it probably won't display the correct character, unless you spend ages manually converting emoji codes for different networks. I'm guessing the ones you posted above are from SoftBank (iPhone?).
There is an ongoing proposal led by Google and Apple to collate the different networks' emoji and give them a proper standardised place in Unicode. Until then, getting emoji to display consistently across networks is an exercise in unhappiness. See the character overview from the standardisation work to see how much converting you would have to do.
God, I hate emoji. All that pain for such a load of useless twee rubbish.
This has nothing to do with programming languages, just with encoding and fonts. As a very brief overview: Every character is stored by its character code (e.g.: 0x41 = A, 0x42 = B, etc), which is rendered as a meaningful character on your screen using a font (which says "the character with the code 0x41 should look like this ...").
These emoji occupy the "private use area" of the Unicode table, which is a range of codes that are undefined and free for anyone to use. That makes them perfectly valid character codes, it's just that no standard font has an appropriate character to display for them, since they are undefined. Only the iPhone and other handhelds, mostly in Japan, have appropriate icons for these codes. This is done to save bandwidth; instead of transmitting relatively large image files back and forth, emoji can be transmitted using a single character code.
As for how to store them: They should be storable as is, as long as you don't try to convert them to another encoding, in which case they may get lost. Just be aware that they only make sense on the iPhone and other SoftBank phones in Japan.
Character Viewer http://img.skitch.com/20091110-e7nkuqbjrisabrdipk96p4yt59.png
If you're on OSX you can copy and paste the character into the Character Viewer to find out what it is. I think there's a similar Character Map on Windows (albeit inferior ;-P). You could put it through PHP's ord(), but that only works on ASCII characters. See the discussion on the ord page for UTF8 functions.
BTW, just for the fun of it, these characters display fine on the iPhone as is, because the iPhone has a font which has icons for them:
iPhone http://img.skitch.com/20091110-bjt3tutjxad1kw4p9uhem5jhnk.png
I'm using FF3.5 and WinXP. I see little boxes in my browser, too.
This tells me the string requires a character set not installed on my computer.
When you put the string into a PHP file, the question marks tell you the same thing: your computer doesn't know how to display the characters.
You could store these emoji characters in MySQL if you encoded them differently, probably using UTF-8.
Do a web search for character encoding, as it relates to MySQL.