I need to be able to produce an arbitrary number of Word documents, so I'm using PHPWord with a template.
I finally solved the value-replacement issue, but I'm stuck inserting UTF-8 (Greek) strings into the document.
echo mb_detect_encoding($date); right before the insert shows UTF-8.
Then I found the function that replaces them, and it contains this:
if(!is_array($replace)) {
$replace = utf8_encode($replace);
}
So I thought double encoding might be messing it up. I removed that, but then the document I just produced won't open;
there is some kind of error with the XML.
Does anyone know how I can get around this? Or what encoding should I use for the values I insert into Word?
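utf8_encode() assumes ISO-8859-1 input, so running it on a string that is already UTF-8 re-encodes every byte and garbles the Greek text. An XML error after removing it may instead come from characters such as & or < in the value, which must be escaped inside the document XML. The following is a minimal sketch under that assumption. prepare_docx_value() is a hypothetical helper, not part of PHPWord, and it assumes the template splices the value into the XML verbatim:

```php
<?php
// Hypothetical helper (not part of PHPWord): prepare a replacement value,
// assuming the template splices it into the document XML verbatim.
function prepare_docx_value(string $value): string
{
    // The value should already be UTF-8; do NOT run utf8_encode() on it,
    // because that would re-encode each UTF-8 byte as if it were Latin-1.
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Value is not valid UTF-8');
    }
    // Escape &, <, >, " and ' so the value cannot break the surrounding XML.
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

echo prepare_docx_value('Μακάριος & Σία'), "\n"; // Μακάριος &amp; Σία
```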
Related
I'm grabbing a bunch of data from a database and putting it into a PHP array, which I then encode with $output = json_encode($out);.
My issue is that from time to time, something in the array can't be handled by json_encode and the whole thing fails. If I use print_r($out) to have a look, I can clearly see where it's failing, because the offending character always appears as a question mark inside a black diamond �.
First - what are these characters?
Second - Is there a function I can pass the elements through prior to adding them to the array that would strip these out, or replace 'them' with blanks?
I found the answer to this. Since the data coming FROM the database already contained the "black diamond" character, I needed to fix it AFTER fetching it from the database.
$x[4] = utf8_encode(odbc_result($query, 'B'));
By passing the result through utf8_encode, the string is converted from ISO-8859-1 to UTF-8, so the byte that json_encode rejected becomes a valid character.
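utf8_encode() is deprecated as of PHP 8.2. mb_convert_encoding() with 'ISO-8859-1' as the source encoding performs the same conversion. The following is a minimal sketch, assuming the database really returns ISO-8859-1 bytes. clean_row() is a hypothetical name. The sketch also shows the JSON_INVALID_UTF8_IGNORE flag (PHP 7.2+), which answers the "strip them out" part of the question by dropping bad bytes instead of failing:

```php
<?php
// Hypothetical helper: convert every string in a row from ISO-8859-1
// (assumed to be what the database returns) to UTF-8 before json_encode().
function clean_row(array $row): array
{
    return array_map(
        fn($v) => is_string($v) ? mb_convert_encoding($v, 'UTF-8', 'ISO-8859-1') : $v,
        $row
    );
}

$out = [clean_row(['id' => 1, 'name' => "Caf\xE9"])];

// Without conversion, json_encode() returns false on invalid UTF-8.
var_dump(json_encode(["Caf\xE9"]));   // bool(false)
echo json_last_error_msg(), "\n";     // Malformed UTF-8 characters, possibly incorrectly encoded

echo json_encode($out), "\n";         // [{"id":1,"name":"Caf\u00e9"}]

// Alternative (PHP 7.2+): silently drop invalid bytes instead of failing.
echo json_encode(["Caf\xE9"], JSON_INVALID_UTF8_IGNORE), "\n"; // ["Caf"]
```

Converting keeps the accented letter, while the flag throws it away. Conversion is usually preferable when the source encoding is known.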
Use echo json_encode($out);
This should solve your issue.
The black diamonds come from the browser; the database itself shows plain question marks.
It seems you are already getting bad data from the database, although it's quite tricky to end up with incorrect UTF-8 given your settings. You need to check everything:
whether your table is marked with the utf8 charset
whether your data is actually encoded in UTF-8 (not just marked as such, but actually encoded)
whether your server sends the correct charset in the Content-Type header.
It is also useful to view the page with different charsets chosen from your browser's menu.
But first of all, you have to wipe out every trace of the random things you tried: all the various encode and decode calls and so on. Output the data plainly and directly from the database; otherwise you will never get to the bottom of the problem.
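To check the second point without any encode or decode calls getting in the way, you can dump the raw bytes and test whether they form valid UTF-8. The following is a small sketch; inspect() is a hypothetical name. On a real page you would also call header('Content-Type: text/html; charset=UTF-8') before any output, to cover the third point:

```php
<?php
// Hypothetical diagnostic: print the raw bytes of a value and whether
// they form valid UTF-8, with no conversion in between.
function inspect(string $value): string
{
    $valid = mb_check_encoding($value, 'UTF-8') ? 'valid UTF-8' : 'NOT valid UTF-8';
    return bin2hex($value) . ' (' . $valid . ')';
}

echo inspect("caf\xC3\xA9"), "\n"; // 636166c3a9 (valid UTF-8)
echo inspect("caf\xE9"), "\n";     // 636166e9 (NOT valid UTF-8)
```

A value that comes back "NOT valid UTF-8" straight from the database means the problem is in storage or in the connection charset, not in the page.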
In my db I have a field value looking like this:
ΜΑΚΑΡΙΟΥ Γ\'
I think these must be Greek characters inserted before I had set UTF-8 for my database (I think I was using the default, Latin-1).
Is there a way to get the actual characters?
Thank you
Not sure, but try this:
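$str = "ΜΑΚΑΡΙΟΥ Γ\'";
// mb_detect_encoding() returns false when it cannot detect the encoding
$from = mb_detect_encoding($str);
$val = ($from !== false) ? iconv($from, "UTF-8", $str) : $str;
echo $val;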
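The stored value may be mojibake: UTF-8 bytes that went over a latin1 connection and were stored again as UTF-8. If so, converting back to the single-byte encoding recovers the original bytes. The following is a sketch under that assumption. MySQL's latin1 is effectively Windows-1252, so that encoding is used here. fix_mojibake() is a hypothetical name:

```php
<?php
// Hypothetical helper: undo one round of double encoding, assuming the text
// is UTF-8 that was misread as Windows-1252 (MySQL's "latin1").
function fix_mojibake(string $s): string
{
    $bytes = mb_convert_encoding($s, 'Windows-1252', 'UTF-8');
    // Accept the result only if the conversion was lossless and yields valid
    // UTF-8; otherwise the input probably wasn't mojibake, so keep it as-is.
    $lossless = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252') === $s;
    return ($lossless && mb_check_encoding($bytes, 'UTF-8')) ? $bytes : $s;
}

// Simulate the damage by reading Greek UTF-8 bytes as Windows-1252.
$greek    = 'ΜΑΚΑΡΙΟΥ Γ';
$mojibake = mb_convert_encoding($greek, 'UTF-8', 'Windows-1252');
echo $mojibake, "\n";
echo fix_mojibake($mojibake), "\n"; // ΜΑΚΑΡΙΟΥ Γ
```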
Try saving the data into a text file and opening the text file in a hex editor (there are a bunch of good free ones). That could show you the underlying code values of the letters, which you could then match against published encodings.
For example, this page lists Unicode values for Polytonic Greek characters (not sure you were using Polytonic, though): http://leb.net/reader/text/standards/unicode/old/MappingTables/NewTables/Polytonic_Greek.txt.
Looking at the text with a hex editor will help you to get code values to look up in lookup tables like this.
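If a hex editor isn't handy, PHP can show the same byte values directly, along with the Unicode code point to look up in a table like the one above. The following is a small sketch; hex_bytes() is a hypothetical name. For comparison, the Greek capital alpha used below is the two bytes CE 91 in UTF-8 but the single byte C1 in ISO-8859-7:

```php
<?php
// Hypothetical helper: show each byte of a string as two hex digits,
// like the byte view of a hex editor.
function hex_bytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

$alpha = "\u{0391}"; // GREEK CAPITAL LETTER ALPHA
echo hex_bytes($alpha), "\n";                // ce 91
printf("U+%04X\n", mb_ord($alpha, 'UTF-8')); // U+0391
```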
I have had the problem a few times now while working on projects and I would like to know if there's an elegant solution.
Problem
I am pulling tweets from Twitter as XML and storing them in my DB; however, when I output them to the screen I get these characters:
"moved to dusseldorf.�"
OR
también
And if there are Russian characters, I get lots of ugly boxes in their place.
What I would like is for the correct native accents to display under a single encoding. I thought that was possible with UTF-8.
What I am using
PHP, MYSQL
After reading in the XML file I am doing the following to cleanse the data:
$data = trim($data);
$data = htmlentities($data);
$data = mysql_real_escape_string($data);
My Database Collation is: utf8_general_ci
Web page character set is: charset=UTF-8
I think it could have something to do with HTML entities, but I would really appreciate a solution that works across the board on all my projects.
Thanks in advance.
Replace this line:
$data = htmlentities($data);
With this:
$data = htmlentities($data, ENT_QUOTES, "UTF-8");
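That way, htmlentities() reads the input as UTF-8 rather than ISO-8859-1 (its default before PHP 5.4), so multi-byte characters are converted correctly instead of being mangled. For more information see the documentation for htmlentities().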
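The charset argument changes how the input bytes are read. The following quick comparison assumes the input is UTF-8, as the tweets are. Reading the same bytes as ISO-8859-1 produces the familiar "Ã©" style output:

```php
<?php
$tweet = "también"; // UTF-8: é is the two bytes C3 A9

// Read as UTF-8: é becomes its single named entity.
echo htmlentities($tweet, ENT_QUOTES, 'UTF-8'), "\n";      // tambi&eacute;n

// Read as ISO-8859-1: each of the two bytes becomes its own entity (Ã and ©).
echo htmlentities($tweet, ENT_QUOTES, 'ISO-8859-1'), "\n"; // tambi&Atilde;&copy;n
```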
You need to change your connection's encoding to UTF-8 (it often defaults to latin1, i.e. ISO-8859-1). See here: How can I store the '€' symbol in MySQL using PHP?
Calling htmlentities() is unnecessary when you get the encodings right; I would remove it completely. You'll just have to be careful to use htmlspecialchars() when outputting the data in an HTML context.
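The following is a minimal sketch of that approach, assuming mysqli: set the connection charset once, store the text without htmlentities(), and escape only at output time. connect_utf8() and e() are hypothetical helpers. The sketch uses utf8mb4 because MySQL's older utf8 charset cannot store 4-byte characters such as emoji, which tweets often contain:

```php
<?php
// Hypothetical helper: open a mysqli connection that talks UTF-8.
// set_charset() also makes real_escape_string() aware of the encoding.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $conn = new mysqli($host, $user, $pass, $db);
    $conn->set_charset('utf8mb4');
    return $conn;
}

// Hypothetical helper: escape a stored value only when writing it into HTML.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo e('Grüße aus Düsseldorf <3 & привет'), "\n";
// Grüße aus Düsseldorf &lt;3 &amp; привет
```

htmlspecialchars() only touches &, <, >, and quotes, so accented and Cyrillic letters pass through untouched as UTF-8.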
Make sure that you set PHP's internal encoding to UTF-8 using iconv_set_encoding() (on PHP 5.6 and later, set default_charset instead, since iconv's internal_encoding setting is deprecated), and that you call htmlentities() with the encoding argument as EdoDodo said. Also make sure that your database stores text as UTF-8, though you say that's already the case.
You can't use htmlentities() in its default state for XML data, because this function produces HTML entities, not XML entities.
The difference is that the HTML DTD defines a bunch of entity codes which web browsers are programmed to interpret. But most XML DTDs don't define them (if the XML even has a DTD).
The only entity codes available by default in XML are &lt;, &gt;, &amp;, &quot; and &apos;. All other characters need to be written as numeric character references (for example &#233; for é).
PHP doesn't have an xmlentities() function, but if you read the manual page for htmlentities(), you'll see in the comments that plenty of people have had this same issue and have posted their solutions. After a quick browse through them, I'd suggest looking at the one named philsXMLClean().
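PHP does have the building blocks for this. htmlspecialchars() with ENT_XML1 (PHP 5.4+) escapes markup characters using XML's predefined entities. mb_encode_numericentity() can turn everything outside ASCII into numeric references. The following sketch combines the two; xml_entities() is a hypothetical name, not a built-in:

```php
<?php
// Hypothetical helper: XML-safe escaping for UTF-8 text.
function xml_entities(string $text): string
{
    // &, <, >, " and ' become &amp; &lt; &gt; &quot; &apos;
    $escaped = htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    // Every code point from U+0080 upward becomes a decimal reference (é -> &#233;).
    return mb_encode_numericentity($escaped, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

echo xml_entities('café & <tea>'), "\n"; // caf&#233; &amp; &lt;tea&gt;
```

The numeric step is optional when the XML file is declared as UTF-8; it only matters for consumers that expect pure ASCII.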
Hope that helps.
I seem to be completely unable to get around utf-8 character encoding.
So I'm exporting content from a database as a UTF-8 XML file.
The software I am importing into is quite strict about character encoding, so I can't just put everything in CDATA tags.
There's a whole bunch of HTML entities already in the data, e.g. &rsquo;, &mdash; and &hellip;.
These don't work in the XML and need to be replaced (normally with just a plain ' quote).
Ideally, I'd like to decode all the characters, and then use htmlspecialchars($text, ENT_COMPAT, 'UTF-8', FALSE) to encode them back again. But I can't seem to find a function that will decode them. Is there one?
I've started to manually go through each entity with a str_replace() but it's turning into a much bigger job than I anticipated.
Any help would be a lifesaver.
Thanks
html_entity_decode() perhaps?
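The following sketch shows the decode-then-re-encode round trip the question describes, assuming the data contains named HTML entities such as &rsquo;. recode_for_xml() is a hypothetical name. ENT_HTML5 makes html_entity_decode() recognise the HTML5 named entities, and the second call is the question's own htmlspecialchars() line:

```php
<?php
// Hypothetical helper: turn named/numeric HTML entities into real UTF-8
// characters, then escape only what XML requires.
function recode_for_xml(string $text): string
{
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // double_encode = false: any entity that survived decoding is not escaped twice.
    return htmlspecialchars($decoded, ENT_COMPAT, 'UTF-8', false);
}

echo recode_for_xml('Don&rsquo;t wait&hellip; tea &amp; <cake>'), "\n";
// Don’t wait… tea &amp; &lt;cake&gt;
```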
In some character-conversion issues in PHP, it is important to have a locale set. It doesn't matter much which one, e.g.:
setlocale(LC_CTYPE,'en_US.utf8');
But I would advise that any time invested in getting the encoding right from the beginning, without resorting to entities, is worth it if at all possible.
I sometimes import data into a MySQL table from CSV files that were provided to me.
In the last one I did, some of the entries had a weird bad character in front of the actual data, and it got imported into my database. Now I'm looking for a way to clean it up.
The bad data is in the mysql column 'email', it seems to be always right in front of the actual data. When trying to print it on my screen using PHP, it shows up as �. When exporting it to a CSV file, it looks like  , and if I SET CHARACTER SET utf8 before printing it on the screen using PHP, it looks like a normal space ' '.
I was thinking of writing a PHP script that goes over all my rows one at a time, fixes the email address field, and updates the row. However, I'm not quite sure about the "fix the email" part!
I was thinking maybe of using explode() with the bad character as the delimiter, but I don't know how to type that character into my code.
Is there maybe a way to find the underlying value/utf8/hex or whatever of that character, then find it in the string?
I hope it's clear enough.
Thanks
EDIT:
In Hex, it looks like it's A0. What can I do to search and delete a character by its hex value? Either in PHP or directly in MySQL I guess ...
SELECT HEX(field) FROM table; should help determine the character.
As an alternative solution, it might actually be easier to fix the issue at the source. I've encountered similar problems with CSV files exported from Excel and have generally found that using something along the lines of...
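// mb_convert_variables() converts by reference and returns the source encoding, so use mb_convert_encoding() to get the converted string
$correctedLine = mb_convert_encoding($sourceLine, 'UTF-8', 'Windows-1252');
...tends to rectify the issue. (That said, you'll need to ensure that you have the multibyte string (mbstring) extension compiled in/enabled.)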
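One caution: converting every line unconditionally would double-encode any line that is already UTF-8. The following sketch converts only when the bytes are not valid UTF-8, still assuming Windows-1252 as the fallback source. This is a heuristic, and to_utf8() is a hypothetical name:

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, otherwise assume Windows-1252
// (common for Excel CSV exports) and convert.
function to_utf8(string $line): string
{
    if (mb_check_encoding($line, 'UTF-8')) {
        return $line;
    }
    return mb_convert_encoding($line, 'UTF-8', 'Windows-1252');
}

// 0xA0 is a non-breaking space in Windows-1252; it becomes U+00A0 (bytes C2 A0).
echo bin2hex(to_utf8("\xA0user@example.com")), "\n";
```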
You can trim any leading unprintable ASCII character with something like:
update t set email = substr(email, 2) where ascii(email) not between 32 and 126;
You can get the ASCII value of the offending character with this:
select ascii(email) as first_char from t;
I think I found a PHP answer that seems to work more reliably:
$newemail = preg_replace('/\xA0/', '', $row['oldemail']);
And then I'm going to update the row with the new email.
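That pattern matches the single byte A0. This is right while the data comes back as Latin-1, as it does here without SET CHARACTER SET utf8. If the connection is switched to UTF-8, the same character arrives as the two bytes C2 A0, and removing only A0 would leave an invalid C2 byte behind. The following sketch handles both cases; strip_nbsp() is a hypothetical name:

```php
<?php
// Hypothetical helper: remove non-breaking spaces (and surrounding whitespace)
// from an email value, whether it arrives as Latin-1 or as UTF-8.
function strip_nbsp(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        // UTF-8: U+00A0 is the byte pair C2 A0; the /u flag matches it as one character.
        $s = preg_replace('/\x{00A0}/u', '', $s);
    } else {
        // Latin-1 / Windows-1252: the non-breaking space is the single byte A0.
        $s = str_replace("\xA0", '', $s);
    }
    return trim($s);
}

echo strip_nbsp("\xA0user@example.com"), "\n";     // user@example.com
echo strip_nbsp("\xC2\xA0user@example.com"), "\n"; // user@example.com
```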