I have created a PHP page with UTF-8-BOM encoding. I want to use this encoding because some of my content is in my regional language, and to display it properly I need to use UTF-8-BOM encoding.
Now I want to use sessions with this page, but it throws a "headers already sent" error.
Is there any way I can use both together?
If I use UTF-8 only, I do not get a problem displaying data in the regional format.
The UTF-8 "Byte Order Mark" (BOM) is a sequence of 3 bytes (EF BB BF) that a file begins with, making it pretty much incompatible with PHP, because a script that is supposed to contain only PHP code must start with the <?php tag instead. Anything before that tag, the BOM included, is sent to the browser as output.
Obviously, it's not like the whole thing doesn't work at all, but HTTP headers have to be sent before any output, so anything that involves sending headers (which is A LOT) automatically gets broken.
Sessions use cookies - transferred via headers - won't work.
Redirecting to another page - the Location header - won't work.
Dynamically generated downloads - the downloaded file itself will be broken.
etc.
Sorry, but you'll have to give up on BOM and figure another way to handle your locale-specific data (which I can only assume is using another charset for whatever reason).
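For what it's worth, here is a minimal sketch of checking file contents for the BOM and stripping it. The helper names has_utf8_bom and strip_utf8_bom are made up for this example; the only fact it relies on is that the UTF-8 BOM is the three bytes EF BB BF. It assumes the regional text itself is valid UTF-8, in which case the file holds exactly the same characters without the BOM.

```php
<?php
// The three bytes a UTF-8 BOM consists of.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true if the given bytes start with a UTF-8 BOM.
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: returns the bytes without a leading BOM, if there was one.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}
```

You could run this over a script with something like file_put_contents($path, strip_utf8_bom(file_get_contents($path))), where $path is whichever file your editor saved with a BOM. Most editors can also just save as "UTF-8 without BOM".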
When I try that line:
echo "<script >alert(' مشترك بالفعل!'); location.href='index.php';</script>";
In Chrome it displays garbage, but Firefox displays it correctly. What's wrong with Chrome?
Any help will be appreciated. Thanks!
That is because, firefox rockssss – The COMPLETE PHP Newbie 47 secs ago
That comment is probably right (although maybe accidentally :). Firefox may be sniffing the document's encoding with more flexibility than Chrome is.
The most likely explanation is that your HTML document's encoding is not defined, and the PHP source file (where you store the text in) is stored in a different encoding than you are outputting.
Make sure the encoding of the PHP file, and the HTML document you're outputting, match.
The encoding of the PHP source file can probably be set in your IDE.
The encoding of the HTML page is defined by the Content-Type header your web server sends, and/or the Content-Type META tag.
This question gives a complete overview: UTF-8 all the way through
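To make the "everything must match" point concrete, here is a rough sketch, assuming the PHP file itself is saved as UTF-8. The function name render_alert_page is made up for this example. It refuses text that is not valid UTF-8, declares UTF-8 in both the Content-Type header and the META tag, and uses json_encode to produce the JavaScript string literal (my own addition, so quotes in the message cannot break the script).

```php
<?php
// Hypothetical helper: builds the page as a string so it can be inspected or echoed.
function render_alert_page(string $message): string
{
    // If the source file was saved in another encoding, this check fails early.
    if (!mb_check_encoding($message, 'UTF-8')) {
        throw new InvalidArgumentException('Message is not valid UTF-8; check the file encoding.');
    }

    // Declare the encoding in the HTTP header (skipped if output has already started).
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=UTF-8');
    }

    // JSON_UNESCAPED_UNICODE keeps the text readable; JSON_HEX_TAG escapes < and >
    // so the message cannot close the <script> element early.
    $js = json_encode($message, JSON_UNESCAPED_UNICODE | JSON_HEX_TAG);

    return '<html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
        . '</head><body>'
        . '<script>alert(' . $js . "); location.href='index.php';</script>"
        . '</body></html>';
}
```

Usage would be echo render_alert_page('مشترك بالفعل!'); as the first output of the script.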
Using ARC2, textual data gets corrupted.
My RDF input file is in UTF-8. It gets loaded in ARC2, which uses a MySQL backend, through a LOAD <path/to/file.rdf> query. The MySQL database is in UTF-8 too, as a check with PHPMyAdmin makes sure.
However, the textual data gets corrupted. After several conversion checks, the problem seems to be that the original UTF-8 file is believed to be in ISO-8859-1, and converted to UTF-8 once again.
Example: "surmonté" → "surmonteÌ".
This "surmonteÌ" is actually stored as UTF-8 in the database.
Is this related to the way ARC2 opens files (digging through the code, not exhaustively but quite deep, did not show anything suspicious), or could this be a more general case with PHP and MySQL?
How can I make sure the imported data is not wrongly re-encoded but taken as the original?
ARC2 uses two functions: $store->setUp(), which creates the database and tables if need be; and query(LOAD…, as detailed in the question.
It turns out, the setUp() part must not be called in the same script as the load part. At least, not during the same execution. The solution I took was to make two separate scripts, one to init the database, another to load the data, but simply commenting out the init part once it is done also works. In any case, the trick is to make sure the loading won't take place right after the initialization.
This happens because the SET NAMES utf8 statement is issued on DB connection only after collation detection, and that detection does not seem to work properly when the database has just been created. I made a pull request with a fix.
As a side note, it is not efficient to use the LOAD <path/to/file.rdf> construct of the question: the path is resolved as a relative web address, so the server calls itself to download the file over the network. It is much more efficient to use a construct such as:
$store->query('LOAD <file://' . dirname(__FILE__) . '/path/to/file.rdf>')
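The double conversion described in the question (UTF-8 bytes taken for ISO-8859-1 and converted to UTF-8 once more) can be reproduced in a few lines. This is only an illustration with mbstring, not ARC2 code, and it assumes a precomposed "é" for simplicity; the sample in the question may use a different normalization.

```php
<?php
// "surmonté" as UTF-8 bytes, with a precomposed é (C3 A9).
$original = "surmont\xC3\xA9";

// The faulty step: read the UTF-8 bytes as ISO-8859-1 and convert to UTF-8 again.
$doubleEncoded = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// The reverse step recovers the original bytes, as long as nothing was lost in between.
$repaired = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');

// Hex dumps make the difference visible regardless of terminal encoding.
echo bin2hex($original), "\n", bin2hex($doubleEncoded), "\n", bin2hex($repaired), "\n";
```

Each non-ASCII byte becomes two bytes in the double-encoded string, which is why the corrupted text is still valid UTF-8 in the database.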
I scrape some sites that occasionally have UTF-8 characters in the title, but that don't specify UTF-8 as the charset (qq.com is an example). When I look at the website in my browser, the data I want to copy (i.e. the title) looks correct (Japanese or Chinese, not too sure). I can copy the title and paste it into the terminal and it looks exactly the same. I can even write it to the DB and when I retrieve it from the DB it still looks the same, and correct.
However, when I use cURL, the data that gets printed is wrong. I can run cURL from the command line or use PHP .. when it's printed to the terminal it's clearly incorrect, and it remains that way when I store it to the DB (remember: the terminal can display these characters properly). I've tried all eligible combinations of the following:
Setting CURLOPT_BINARYTRANSFER to true
mb_convert_encoding($html, 'UTF-8')
utf8_encode($html)
utf8_decode($html)
None of these display the characters as expected. This is very frustrating since I can get the right characters so easily just by visiting the site, but cURL can't. I've read a lot of suggestions such as this one: How to get web-page-title with CURL in PHP from web-sites of different CHARSET?
The solution in general seems to be "convert the data to UTF-8." To be honest, I don't actually know what that means. Don't the above functions convert the data to UTF-8? Why isn't it already UTF-8? What is it, and why does it display properly in some circumstances, but not for cURL?
Have you tried:
$html = iconv("gb2312","utf-8",$html);
The gb2312 was taken from the qq.com headers, so the page is not UTF-8 to begin with; it has to be converted from GB2312.
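Rather than hard-coding gb2312, the charset can be read from the Content-Type header of each response. Below is a rough sketch of that idea; charset_from_content_type and to_utf8 are names I made up, and it assumes the server declares its charset in the header at all (some sites only declare it in a META tag, which this does not handle).

```php
<?php
// Hypothetical helper: extracts the charset from a Content-Type header value, or null.
function charset_from_content_type(string $contentType): ?string
{
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Hypothetical helper: converts a response body to UTF-8 based on the declared charset.
function to_utf8(string $body, ?string $charset): string
{
    // Nothing declared, or already UTF-8: leave the bytes alone.
    if ($charset === null || $charset === 'UTF-8') {
        return $body;
    }
    $converted = iconv($charset, 'UTF-8', $body);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $charset to UTF-8");
    }
    return $converted;
}
```

With cURL, the header value is available after the request via curl_getinfo($ch, CURLINFO_CONTENT_TYPE), so the call would look like to_utf8($html, charset_from_content_type($contentType)).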
I'm trying to figure this out but I'm quite puzzled at the moment.
I have a directory on my website containing PDF files with Greek filenames (e.g. ΤΙΜΟΚΑΤΑΛΟΓΟΣ.pdf).
I want to have links for the files on a web page so that users can open or save the files.
So far I can list the files ok but if I click on them I get a 404 error. It's as if the server thinks they're not there although they are.
I understand it's probably an encoding issue, but beyond that I'm not sure what to look for. The website encoding is utf-8, and in order to display the filenames correctly I had to use mb_convert_encoding($file->filename, 'utf8', 'iso-8859-7').
This is the url: http://www.med4u.gr/timokatalogoi/
This is the directory listing: http://www.med4u.gr/pricelists/
The site is based on Joomla and it's hosted on a linux server.
Any ideas?
ISO-8859-* MUST DIE! (That's not personal!) Do everything in UTF-8. Everything. With good reason, some of us get upset when we see them being used, especially Latin-1 (8859-1) which bites a lot of people. I think you would find it very helpful to just dump them and move on to UTF-8.
Things to check:
Store your files encoded in UTF-8: Usually no difficulties with that.
Make sure your server is sending the files with UTF-8 charset: add header('Content-Type: text/html;charset=UTF-8'); near the top of your PHP.
Just in case someone saves your page, it's helpful in that case to put the same thing in a <meta> tag in the head.
Check it all in your browser: right click, view page info, and make sure the encoding is right.
cPanel is very flexible, so that's all doable without much fuss. Feel free to comment if you want more detail.
If you have a database, there are a few more hoops to jump through, but it's worth it. With UTF-8 you never have to worry, and it's the definitive, future-proof way of doing things.
Let's suppose for the sake of argument that the file name on disk is aa.pdf but your conversion displays it as ab.pdf. You need either to revert the conversion so it points back to aa.pdf, or teach the server to remap or redirect requests for ab.pdf to this file. Or if you prefer, rename the file to ab.pdf instead, if your file system can handle this name.
It's definitely an encoding problem. You'll need to escape the URL, or convert it to whatever character set your server recognises.
e.g. 'ΤΙΜΟΚΑΤΑΛΟΓΟΣ LASER.pdf' in iso-8859-7 = 'ÔÉÌÏÊÁÔÁËÏÃÏÓ LASER.pdf' in iso-8859-1
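A minimal sketch of the "escape the URL" option, under the assumption (taken from the question) that the names on disk are ISO-8859-7 bytes. The function name pdf_link and the /pricelists/ base path are only for illustration. The link target is built from the raw on-disk bytes, percent-encoded, while the converted UTF-8 text is used only as the visible label.

```php
<?php
// Hypothetical helper: builds a link to a file whose on-disk name is ISO-8859-7.
function pdf_link(string $diskName, string $baseUrl = '/pricelists/'): string
{
    // Percent-encode the bytes exactly as they are stored on disk.
    $href = $baseUrl . rawurlencode($diskName);

    // Convert to UTF-8 only for display on the UTF-8 page.
    $label = mb_convert_encoding($diskName, 'UTF-8', 'ISO-8859-7');

    return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
        . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';
}
```

Whether the server then finds the file depends on it mapping the URL bytes straight to the filename bytes, which I can't verify for your host. Renaming the files to UTF-8 names would avoid the conversion altogether.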
I have content stored in a Postgres DB. Every time I fetch the content and display it using PHP, I get funny squares in IE and funny square-type question marks in Firefox.
Example below
* - March � May 2009
How do I remove this?
I do not have access to the server, so I can't adjust the encoding there. I only have the Postgres DB details and FTP access to upload my files.
I would also recommend The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky. I've read it only recently myself, and it will definitely help you sort out your problems.
You need to make sure that Postgres, PHP, and your browser all agree on the content encoding, and that you have an appropriate font selected in your browser. The simplest way to do that is to choose UTF8 for everything.
I don't know about PHP, but I do know about databases and browsers. First you need to find out if the database is UTF8. (From psql, I would do a "\l" and look at the encoding.) Then you need to find out if PHP supports UTF8 (I have no idea how you do that). Then you need to see how those characters are being stored in the database by the PHP app. Then you need to figure out if the web server is correctly reporting the content encoding. (On Linux/Unix, I'd use the program "HEAD" (not "head") to see the headers it's returning.) And then you need to figure out if your browser is using a font that supports UTF8.
Or, you could just make sure you only store ASCII and forget the rest of the world exists. Not recommended.
Wrong charset somewhere. The characters could already be stored wrongly in the database, or you have the wrong charset in the meta tags on the page (try manually changing the charset in the browser), or the wrong encoding could be used when the page communicates with the database.
Check this page http://www.postgresql.org/docs/8.2/static/multibyte.html for more information.
Try to have the same encoding in all places, preferably UTF-8.
You have encoding issues. Make sure the encoding is set right in the database and in the HTML markup, and make sure the files themselves are saved in the proper encoding.
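If the database side cannot be changed, one stopgap is to convert values on the PHP side before printing them on a UTF-8 page. The sketch below is a guess at the cause, not a diagnosis: it assumes the stray character in "March � May 2009" is a Windows-1252 dash, and ensure_utf8 is a name I made up. If the connection can be controlled, asking Postgres for UTF8 with pg_set_client_encoding is the cleaner route.

```php
<?php
// Hypothetical helper: returns the value as UTF-8.
// ASSUMPTION: anything that is not valid UTF-8 is Windows-1252 text.
function ensure_utf8(string $value, string $assumedLegacy = 'Windows-1252'): string
{
    // Already valid UTF-8 (this includes plain ASCII): leave it untouched.
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $assumedLegacy);
}
```

Each value fetched from the DB would go through ensure_utf8() before being echoed, with the page itself declared as UTF-8.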