I sometimes import data from CSV files that were provided to me into a MySQL table.
In the last one I did, some of the entries have a strange bad character in front of the actual data, and it was imported into my database. Now I'm looking for a way to clean it up.
The bad data is in the MySQL column 'email', and it always seems to be right in front of the actual data. When I print it to the screen using PHP, it shows up as �. When I export it to a CSV file, it looks like  , and if I run SET CHARACTER SET utf8 before printing it with PHP, it looks like a normal space ' '.
I was thinking of writing a PHP script that goes over all my rows one at a time, fixes the email address field, and updates the row. However, I'm not quite sure about the "fix the email" part!
I was thinking of doing an explode() using the bad character as the delimiter, but I don't know how to type that character into my code.
Is there a way to find the underlying value (hex, UTF-8 code, or whatever) of that character, and then find it in the string?
I hope it's clear enough.
Thanks
EDIT:
In hex, it looks like it's A0. How can I search for and delete a character by its hex value, either in PHP or directly in MySQL?
SELECT HEX(field) FROM table; should help determine the character.
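In PHP, a rough equivalent of SELECT HEX(field) is bin2hex(), which prints every byte of a string in hex, while ord() gives the value of just the first byte. A minimal sketch; the sample address is made up, and the stray character is assumed to be a single Latin-1 byte A0:

```php
<?php
// Hypothetical sample value: a stray 0xA0 byte (a non-breaking space in
// ISO-8859-1) in front of the address, as described in the question.
$email = "\xA0john@example.com";

// bin2hex() shows every byte, so an invisible leading character stands out.
echo bin2hex(substr($email, 0, 4)), PHP_EOL; // a06a6f68

// ord() returns the numeric value of the first byte only.
printf("first byte: 0x%02X (%d)\n", ord($email), ord($email)); // first byte: 0xA0 (160)
```

If the column held UTF-8 instead of Latin-1, the same non-breaking space would show up as the two bytes c2a0.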
As an alternative solution, it might actually be easier to fix the issue at the source. I've encountered similar problems with CSV files exported from Excel and have generally found that using something along the lines of...
$correctedLine = mb_convert_encoding($sourceLine, 'UTF-8', 'Windows-1252');
...tends to rectify the issue. (That said, you'll need to ensure that you have the multibyte string (mbstring) extension compiled in or enabled.)
You can trim any leading character outside the printable ASCII range with something like:
UPDATE t SET email = SUBSTR(email, 2) WHERE ASCII(email) NOT BETWEEN 32 AND 126;
You can get the ASCII value of the offending character with this:
SELECT ASCII(email) AS first_char FROM t;
I think I found a PHP answer that seems to work more reliably:
$newemail = preg_replace('/\xA0/', '', $row['oldemail']);
Then I'm going to update the row with the new email.
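Building on that preg_replace(), here is a sketch of the whole clean-up script. It assumes a table t with columns id and email (hypothetical names; adjust to your schema) and strips only leading junk: a Latin-1 non-breaking space (byte A0), a UTF-8 one (bytes C2 A0), or ordinary whitespace. Unlike '/\xA0/', it leaves an A0 byte in the middle of a value alone.

```php
<?php
// Hypothetical helper: remove leading Latin-1 NBSPs (0xA0), UTF-8 NBSPs
// (0xC2 0xA0) and ordinary whitespace from an e-mail value.
function clean_email(string $value): string
{
    // Without the /u modifier PCRE matches raw bytes, so \xC2\xA0 matches
    // the two-byte UTF-8 NBSP and \xA0 matches a lone Latin-1 NBSP.
    return preg_replace('/^(?:\xC2\xA0|\xA0|\s)+/', '', $value);
}

// Hypothetical schema: table `t` with columns `id` (int) and `email`.
function clean_all_emails(mysqli $db): void
{
    $update = $db->prepare('UPDATE t SET email = ? WHERE id = ?');
    // query() buffers the whole result, so running UPDATEs inside the loop is fine.
    $result = $db->query('SELECT id, email FROM t');
    while ($row = $result->fetch_assoc()) {
        $clean = clean_email($row['email']);
        if ($clean !== $row['email']) { // only touch rows that actually change
            $update->bind_param('si', $clean, $row['id']);
            $update->execute();
        }
    }
}
```

Call clean_all_emails($db) with an open mysqli connection. Rows that are already clean are skipped, so running it twice is harmless.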
Related
I'm grabbing a bunch of data from a database and putting it into a PHP array. I'm then looking to json_encode that array using $output = json_encode($out).
My issue is that from time to time, something in the array cannot be handled by json_encode and the whole thing fails. If I use print_r($out) to have a look, I can clearly see where it's failing, because the character that is causing the trouble always appears as a question mark inside a black diamond: �.
First, what are these characters?
Second, is there a function I can pass the elements through before adding them to the array that would strip these out, or replace them with blanks?
I found the answer to this. Since the data coming FROM the database was stored with the "black diamond" character, I needed to fix it AFTER fetching it from the database.
$x[4] = utf8_encode(odbc_result($query, 'B'));
By passing the result through utf8_encode, the string is converted from ISO-8859-1 to UTF-8, so the byte that was illegal in UTF-8 becomes a valid character and json_encode no longer fails.
Use echo json_encode($out);
This will solve your issue.
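Since PHP 7.2, json_encode() also accepts flags that handle invalid UTF-8 directly, without a separate pass over the array. A sketch with a made-up value: JSON_INVALID_UTF8_SUBSTITUTE swaps bad bytes for U+FFFD (the character browsers draw as the black diamond), and JSON_INVALID_UTF8_IGNORE drops them. Both lose the original character, so when you know the source encoding, converting first is preferable, e.g. mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1'), which also replaces utf8_encode() (deprecated as of PHP 8.2).

```php
<?php
// Hypothetical row: "caf\xE9" is "café" in ISO-8859-1, which is not valid
// UTF-8, so json_encode() refuses the whole array by default.
$out = ['name' => "caf\xE9"];

var_dump(json_encode($out));         // bool(false)
echo json_last_error_msg(), PHP_EOL; // describes the malformed UTF-8

// Replace each invalid byte with U+FFFD (escaped as \ufffd in the JSON) ...
echo json_encode($out, JSON_INVALID_UTF8_SUBSTITUTE), PHP_EOL; // {"name":"caf\ufffd"}

// ... or drop invalid bytes entirely.
echo json_encode($out, JSON_INVALID_UTF8_IGNORE), PHP_EOL;     // {"name":"caf"}
```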
The black diamonds are a browser issue: the browser draws � for bytes that are invalid in the page's character set. The database itself uses plain question marks for characters it cannot convert.
It seems you are already getting bad data from the database, although it is quite tricky to end up with incorrect UTF-8 given your settings. You need to check everything:
whether your table is marked with the utf8 charset
whether your data is actually encoded in UTF-8 (not just marked as such, but actually encoded)
whether your server sends the correct charset in the Content-Type header
It is also useful to view the page with different charsets chosen from your browser's menu.
But first of all, you have to remove every trace of the random things you tried, all the various encode and decode calls. Use plain, direct output from the database; otherwise you will never get to the root of the problem.
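For the second check in the list above (is the data really UTF-8?), one way to test the raw values you fetch, before any encode or decode calls, is PCRE's /u mode: preg_match() returns false when the subject is not well-formed UTF-8. A small sketch with made-up sample strings; mb_check_encoding($s, 'UTF-8') does the same job if mbstring is available.

```php
<?php
// True when $s is well-formed UTF-8: with /u, PCRE validates the subject
// first and preg_match() returns false on malformed byte sequences.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Show the exact bytes the database returned, two hex digits per byte.
function dump_bytes(string $s): string
{
    return implode(' ', str_split(bin2hex($s), 2));
}

$utf8   = "caf\xC3\xA9"; // "café" encoded as UTF-8
$latin1 = "caf\xE9";     // "café" encoded as ISO-8859-1

var_dump(is_valid_utf8($utf8));    // bool(true)
var_dump(is_valid_utf8($latin1));  // bool(false)
echo dump_bytes($latin1), PHP_EOL; // 63 61 66 e9
```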
I need to produce a number of Word documents, so I'm using PHPWord with a template.
I finally solved the issue with replacing values, but I'm stuck at inserting UTF-8 (Greek) strings into the document.
Running echo mb_detect_encoding($date); right before the insert prints UTF-8.
Then I found the function that does the replacement, and it contains this:
if(!is_array($replace)) {
$replace = utf8_encode($replace);
}
So I thought double encoding might be messing it up. I removed that, but now the document I produce won't open; there is some kind of error with the XML.
Does anyone know how I can work around this? Or in what encoding should I insert the values into Word?
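A sketch, under assumptions, of one way to avoid the double encoding: utf8_encode() assumes ISO-8859-1 input, so running it on text that is already UTF-8 (like the Greek values here) encodes it a second time. The hypothetical helper below converts only when the input is not valid UTF-8. It is an assumption, not something the question confirms, that the XML error after removing the call comes from some other replaced value that is not UTF-8; if so, converting just those values would keep the document XML valid.

```php
<?php
// Hypothetical helper: leave valid UTF-8 untouched, otherwise treat the
// bytes as ISO-8859-1 and convert them (what utf8_encode() did).
function to_utf8_once(string $s): string
{
    if (preg_match('//u', $s) === 1) {
        return $s; // already valid UTF-8: converting again would double-encode
    }
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i]; // ASCII byte: identical in both encodings
        } else {
            // Latin-1 code points 0x80-0xFF become two UTF-8 bytes.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}
```

Inside the replace function, it would take the place of the original call, e.g. $replace = to_utf8_once($replace);.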
I'm using PHP and mysqli, and I've run into a problem with an insert query that looks like this:
SET NAMES 'utf8'
$text = mysqli_real_escape_string($connection, $text);
insert into table values('', '".$text."');
My pages are encoded as UTF-8 without BOM, and the MySQL collation is utf8_general_ci.
The problem is that the query works fine when I run it in phpMyAdmin, but when I use the website interface and type text containing the "+" character, it is replaced with a space " " in MySQL. All other characters, such as ', ", accented letters, \, /, and %, are inserted correctly...
It worked before, so I probably made a mistake somewhere.
Thank you in advance, and sorry for my poor English.
It is not MySQL, mysqli, or PHP.
None of them attaches any special meaning to this character.
If you verify your inserts by simply echoing $text before the insert, you will see that it has already been stripped of the + sign. So you have to find the code that strips that symbol out.
A program is not a "black box" that you feed data into and that returns some unexpected output,
but rather a set of statements, each performing some data manipulation.
So you have to debug your code, which means echoing your $text variable at various points to see where it gets changed. Most likely it is getting some unnecessary treatment. Once you find that code, you can either remove it or ask here whether it is OK.
The only case in which the + character would be replaced automatically is if you type your text directly into the browser's address bar (or otherwise put it into a URL without encoding it). In that case, + is turned into a space, because PHP decodes URL-encoded text and + stands for a space in URL-encoded data.
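The + to space behaviour is easy to reproduce in PHP itself. A sketch with made-up values: parse_str() decodes a query string the same way PHP fills $_GET, and encoding the value first (with urlencode() or http_build_query()) turns + into %2B so it survives the round trip.

```php
<?php
// In URL-encoded data "+" stands for a space, so it decodes to " ".
parse_str('text=a+b', $raw);
var_dump($raw['text']);         // string(3) "a b"

// Encoding the value first turns "+" into "%2B", which decodes back to "+".
$query = http_build_query(['text' => 'a+b']);
echo $query, PHP_EOL;           // text=a%2Bb
parse_str($query, $encoded);
var_dump($encoded['text']);     // string(3) "a+b"

echo urlencode('a+b'), PHP_EOL; // a%2Bb
```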
I am trying to replace Â£ with £ and it did not work.
I've tried:
echo str_replace("Â£", "£", "£3 Discount Discount");
I have also tried html_entity_decode which also did not work.
This is an issue with trying to display UTF-8–encoded data as non–UTF-8. You need to make sure that all character encodings are consistent and, where they are not, that you convert between them appropriately. The easiest way is to ensure that absolutely everything is in UTF-8. This includes:
The data that's saved in the database (MySQL's character set / collation)
The client connection to the database (using SET NAMES utf8)
The output to the browser (header('Content-Type: text/html; charset=utf-8');)
The PHP script containing the code (yes, this sometimes has an impact)
I would first suggest checking that there isn't any mojibake in your database (e.g. using phpMyAdmin or the command-line client) before checking the character sets above. If you find that the database actually contains Â£, then I would suggest applying the same logic to any input mechanism that writes to the database (including the character encoding of HTML forms).
(Note: I've assumed MySQL throughout this answer.)
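To see where Â£ comes from, it helps to look at the bytes. £ in UTF-8 is C2 A3; if those two bytes are read as ISO-8859-1 and encoded to UTF-8 again, they become Â (C3 82) followed by £ (C2 A3). A sketch of a targeted repair for data that is already damaged; writing the search string as byte escapes keeps it independent of the encoding the .php file itself is saved in. The helper name is hypothetical, and fixing the encoding chain as described above remains the real solution.

```php
<?php
// "£" (U+00A3) encoded as UTF-8 is the two bytes C2 A3.
$pound = "\xC2\xA3";
// Those bytes misread as ISO-8859-1 and re-encoded as UTF-8 become "Â£":
// C3 82 ("Â") followed by C2 A3 ("£").
$mojibake = "\xC3\x82\xC2\xA3";

// Hypothetical helper: undo this one specific double encoding.
function fix_pound_mojibake(string $s): string
{
    // Byte escapes avoid depending on the encoding of this source file.
    return str_replace("\xC3\x82\xC2\xA3", "\xC2\xA3", $s);
}

echo bin2hex($pound), PHP_EOL;    // c2a3
echo bin2hex($mojibake), PHP_EOL; // c382c2a3
echo fix_pound_mojibake($mojibake . '3 Discount'), PHP_EOL;
```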
If you're able to, try using the HTML entity &pound; instead of the £ character and save yourself the trouble.
You can try cleaning it up in the DB instead. Adapt this query to suit your needs.
UPDATE YOUR_TABLE_NAME SET YOUR_COLUMN_NAME = REPLACE(YOUR_COLUMN_NAME, 'Â£', '£');
Try utf8_decode() instead.
I have a .txt file with a list of countries. For my form, I read all the data into a select list line by line with fgets(). That works fine, except for a few problems.
1) When a country has a letter with ¨ on it, it shows up in the list as a blank.
2) When I put the data into an XML file at the end, there seems to be a return at the end of each value, in the form of '&#13;'.
So my question: is there a way to fix these problems, or is there a better way to read data from a file? Or should I use another file type than .txt?
It sounds like a problem with the text encoding. You could try running htmlentities on the text before echoing it. Another solution is to use utf8_encode or utf8_decode (depending on which encoding your pages are served in, and on the encoding of the file).
In XML character data, the carriage-return (#xD) character is represented by a character reference such as &#13;.
Just make sure that after you've read each line, you remove the carriage return from its end, e.g. with str_replace("\r", '', $line) (double quotes matter: '\r' in single quotes is a literal backslash followed by r), or with rtrim($line, "\r\n"), which also removes the newline that fgets() keeps.
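A sketch of the reading side, assuming one country per line and that both the file and the page are UTF-8 (function names are hypothetical). Each line loses its trailing "\r\n" or "\n", and each value is escaped for HTML. Note that htmlspecialchars() with 'UTF-8' returns an empty string for input that is not valid UTF-8 (unless ENT_SUBSTITUTE or ENT_IGNORE is given), so a file saved as ISO-8859-1 could be one reason accented names come out blank.

```php
<?php
// Read one country per line, dropping the line ending fgets() keeps.
function read_countries(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $countries = [];
    while (($line = fgets($handle)) !== false) {
        $line = rtrim($line, "\r\n"); // strips "\r\n" (Windows) or "\n" (Unix)
        if ($line !== '') {
            $countries[] = $line;
        }
    }
    fclose($handle);
    return $countries;
}

// Build <option> elements, escaping each value for HTML (assumes UTF-8 input).
function country_options(array $countries): string
{
    $html = '';
    foreach ($countries as $c) {
        $v = htmlspecialchars($c, ENT_QUOTES, 'UTF-8');
        $html .= "<option value=\"$v\">$v</option>\n";
    }
    return $html;
}
```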