I have an ODS spreadsheet (managed with OpenOffice). Several cells contain multiple lines. The table contents are displayed on a website.
When I import the file with phpMyAdmin, these cells are truncated at the first newline character.
In the ODS file, the newline character is char(10). In my case it has to be replaced with the string <br/>, the HTML line-break tag. Writing a PHP program that does the replacement makes no sense, since everything after the newline character is already cut off by the import. For the moment I run a PC program that patches each char(10) in the ODS file with the '|' character. After the import, I replace the '|' with <br/> using PHP. Terrible! Is there a way to prevent phpMyAdmin from truncating at char(10) during the import?
Thanks, Chris.
I had the same problem. My solution is not perfect, but it did the job for me.
What I did was replace the newline character in the ODS file with a placeholder, so that I could replace it back in PHP.
Open the ODS file, open the search & replace box, then search for \n and replace it with some unique string that you can locate in PHP.
In my case I used something like -EOL-.
In my PHP script I then replaced -EOL- with <br/>.
I know it's not a shortcut, but it is a solution...
Hope it works for you as well.
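The PHP side of this placeholder round trip could look roughly like the sketch below. The function name restoreLineBreaks is made up for illustration, -EOL- is the placeholder from this answer, and <br/> is the tag the question asks for. Escaping the cell text with htmlspecialchars first is an added assumption (it expects UTF-8 input), so that the inserted tag is the only markup in the output.

```php
<?php
// Hypothetical helper: turns the -EOL- placeholder back into an HTML line break.
function restoreLineBreaks(string $cell, string $placeholder = '-EOL-'): string
{
    // Escape the imported text first (assumes UTF-8), so only our own tag is markup.
    $escaped = htmlspecialchars($cell, ENT_QUOTES, 'UTF-8');

    // The placeholder contains no characters that htmlspecialchars touches,
    // so it is still intact and can be swapped for the tag now.
    return str_replace($placeholder, '<br/>', $escaped);
}

echo restoreLineBreaks('First line-EOL-Second line'), "\n";
// prints: First line<br/>Second line
```

Choose a placeholder that cannot occur in the real data, otherwise genuine text would be turned into line breaks as well.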
First I wanted to convert a PDF file to HTML, but the API can't do that.
So I tried to convert the PDF to TXT. I had a lot of problems with multiple spaces and line breaks...
So I tried (again) to convert the PDF to Word. The Word file is perfect.
Unfortunately, ConvertApi can't convert Word to HTML... and I can't find a free library to convert Word to HTML.
So I tried (again and again) to convert Word to TXT.
Now I have problems with accents in the TXT file:
régime becomes r‚gime
matière becomes matiŠres
contrôle becomes contr“le
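These three substitutions are what you get when bytes from a DOS code page such as CP850 (or CP437) are displayed as Windows-1252: é is byte 0x82, è is 0x8A and ô is 0x93 in those code pages. Assuming that really is the encoding of the TXT file (the post does not say), a minimal sketch of the conversion with iconv could look like this. The function name dosToUtf8 is hypothetical, and the 'CP850' encoding name depends on the iconv implementation installed with PHP.

```php
<?php
// Hypothetical helper. Assumption: the TXT export is encoded in CP850.
// Returns the UTF-8 text, or false if iconv cannot convert the input.
function dosToUtf8(string $bytes)
{
    return iconv('CP850', 'UTF-8', $bytes);
}

// Raw bytes as they would sit in the file under that assumption.
$raw = "r\x82gime, mati\x8Are, contr\x93le";

echo dosToUtf8($raw), "\n";
```

If the output still looks wrong, the source encoding is a different one, and it is worth inspecting the raw bytes before guessing again.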
I am trying to parse a CSV file in PHP via SplFileObject. Sadly, SplFileObject sometimes gets stuck if there are erroneous invisible characters in the text. While iterating over the lines of the CSV file, it detects a quote instead of skipping the character or reading it as a normal character.
The screenshot below is from Textwrangler:
I also copied it from Textwrangler here (invisible char should be between "forgé." and "Circa"):
Fer forgé.� Circa
My code (SplFileObject part):
$splFile = new \SplFileObject($file);
$splFile->setFlags(\SplFileObject::DROP_NEW_LINE | \SplFileObject::SKIP_EMPTY | \SplFileObject::READ_AHEAD | \SplFileObject::READ_CSV);
$splFile->setCsvControl(",", '"', '"');
I tried to figure out which charset the CSV file has via file -I my.csv. Output: my.csv: application/octet-stream; charset=binary. That is a weird result, as the file is readable in Textwrangler and is therefore NOT binary. I also checked another CSV generated in the same way, and the output is as expected: second.csv: text/plain; charset=utf-8. The tool used to generate the CSV files is called Visual Web Ripper (a tool for crawling web pages).
How can I determine which character this upside-down question mark is (it does not seem to be the Spanish upside-down question mark; maybe it is just a placeholder inserted by Textwrangler)?
How can I delete this character and all other "invalid" characters in my CSV file? Is there a regular expression that matches every letter, number and sign (punctuation and other textual symbols) that is in fact a real character, and leaves out something like the example above? I am looking for a Unicode-safe regular expression (I need to preserve German umlauts as well as French, Russian, Chinese, Japanese and Korean characters). Alternatively: how can I convert a CSV file with charset=binary to UTF-8?
Edit:
If I open it via nano editor it shows forgé.^# Circa. After a quick search it seems to be a NUL character or \u0000 (see comments and https://en.wikipedia.org/wiki/Null_character for reference).
Edit 2:
I dug a little deeper: there seems to be a problem with the $splFile->current() function, which reads the line at the current file pointer. The line gets truncated after the NUL character (no matter whether I read it with SplFileObject::READ_CSV or as a plain string without the SplFileObject::READ_CSV flag).
The solution was to omit the SplFileObject::DROP_NEW_LINE flag. I also checked whether the NUL character is still present: it is, but it is now treated as part of the text value of that column in the CSV and is NOT detected as a quote or column enclosure.
Of course, you now have to filter out empty lines yourself, for example with something like:
$splFileObject = new \SplFileObject($file); // $file: path to the CSV, as in the snippet above
$splFileObject->setFlags(\SplFileObject::SKIP_EMPTY | \SplFileObject::READ_AHEAD | \SplFileObject::READ_CSV);
$columns = $splFileObject->current();
// With READ_CSV, a blank line is returned as an array containing a single null
if (count($columns) === 1 && array_key_exists(0, $columns) && $columns[0] === null) {
// empty csv line
}
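For the original question of deleting the NUL character (and similar invisible characters) while keeping umlauts, Cyrillic, CJK and so on, a minimal sketch could look like the following. It is not part of the solution above. The function name stripControlChars is made up, and the choice to keep tab, LF and CR is an assumption about what counts as "valid" in the CSV.

```php
<?php
// Hypothetical helper: removes ASCII control characters (including NUL) and DEL
// from a UTF-8 string, but keeps tab (0x09), LF (0x0A) and CR (0x0D).
function stripControlChars(string $text): string
{
    // The /u modifier makes PCRE treat the subject as UTF-8, so multi-byte
    // characters are matched as whole code points and never split.
    $clean = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/u', '', $text);

    // preg_replace returns null when the input is not valid UTF-8;
    // in that case the text is returned unchanged.
    return $clean ?? $text;
}

$value = "Fer forg\u{00E9}.\x00 Circa";
echo stripControlChars($value), "\n";
// prints: Fer forgé. Circa
```

It can be applied to each column after reading a row, for example with array_map('stripControlChars', $columns) once the empty-line check has passed.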
I've got an Excel importer for my website, which seemed to be working fine until I found a row that contains apostrophes; it tries to save the information into the database using �.
Example:
Branches in Vava’u, Haapai, ‘Eua and Niuatoputapu
Changes to:
Branches in Vava�u, Haapai, �Eua and Niuatoputapu
Is there any way I can fix this easily within PHP?
Try replacing the � with ' before saving to the database. MS Excel sometimes uses typographic characters with different codes (outside the printable ASCII range) in place of plain ASCII ones.
Vava’u - contains ’, code point U+2019 (low byte 0x19); use ' (0x27) instead
‘Eua - contains ‘, code point U+2018 (low byte 0x18); use ' (0x27) instead
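The replacement suggested here can be sketched in PHP as follows, assuming the imported text has already reached PHP as valid UTF-8 (if it arrives in another encoding, it has to be converted first). The function name normalizeApostrophes is hypothetical.

```php
<?php
// Hypothetical helper. Assumption: $utf8Text is valid UTF-8.
// Replaces the typographic single quotes U+2018 and U+2019 with a plain
// ASCII apostrophe (0x27).
function normalizeApostrophes(string $utf8Text): string
{
    // "\u{...}" escapes (PHP 7+) produce the UTF-8 bytes of each code point.
    return str_replace(["\u{2018}", "\u{2019}"], "'", $utf8Text);
}

echo normalizeApostrophes("Branches in Vava\u{2019}u, Haapai, \u{2018}Eua and Niuatoputapu"), "\n";
// prints: Branches in Vava'u, Haapai, 'Eua and Niuatoputapu
```

If the database connection and table use UTF-8 throughout, the original characters can also be stored as they are, so the replacement is only needed when plain ASCII is actually wanted.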
I have a TXT file with a list of countries. For my form I just read all the data into a select list, line by line, with fgets(). That works fine except for a couple of problems.
1) When a country name has a ¨ (diaeresis) on a letter, that letter shows up in the list as a blank.
2) When I put the data into XML at the end, it seems there is a return at the end of each value, in the form of '
'.
So my question: is there a way to fix these problems, or is there a better way to read data from a file? Or should I use a file type other than TXT?
It sounds like a problem with the text encoding. You could try running htmlentities on the text before echoing it. Another solution is to use utf8_encode or utf8_decode (depending on which encoding your pages are served in and on the encoding of the file); note that both functions are deprecated as of PHP 8.2, where mb_convert_encoding is the usual replacement.
In character data, the carriage-return (#xD) character is represented by &#xD;
Just make sure that after you've read each line, you call str_replace("\r", '', $line) to remove the carriage-return character from the end of the line. The double quotes matter: in a single-quoted PHP string, '\r' is a literal backslash followed by r, not a carriage return.
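Put together, reading the list with fgets() and dropping the line endings could look like this minimal sketch. The function name readCountries and the choice to skip blank lines are assumptions for illustration; rtrim with "\r\n" removes both the carriage return and the newline that fgets() leaves at the end of each line.

```php
<?php
// Hypothetical helper: reads one country per line and strips the trailing
// line ending (LF or CR+LF) from every entry.
function readCountries(string $path): array
{
    $countries = [];
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return $countries; // file could not be opened
    }

    while (($line = fgets($handle)) !== false) {
        // fgets() keeps the line ending, so remove "\r" and "\n" here.
        $name = rtrim($line, "\r\n");
        if ($name !== '') {
            $countries[] = $name; // skip blank lines (an assumption)
        }
    }

    fclose($handle);
    return $countries;
}
```

When the names are written into the select list, passing each one through htmlspecialchars($name, ENT_QUOTES, 'UTF-8') keeps accented letters intact as long as the file and the page are both UTF-8.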
I sometimes import data from CSV files that were provided to me into a MySQL table.
In the last one I did, some of the entries have a weird bad character in front of the actual data, and it got imported into my database. Now I'm looking for a way to clean it up.
The bad data is in the mysql column 'email', it seems to be always right in front of the actual data. When trying to print it on my screen using PHP, it shows up as �. When exporting it to a CSV file, it looks like  , and if I SET CHARACTER SET utf8 before printing it on the screen using PHP, it looks like a normal space ' '.
I was thinking of writing a PHP script that goes over all my rows one at a time, fixes the email address field, and updates the row. However, I'm not quite sure about the "fix the email" part!
I was thinking of maybe doing an "explode" with the bad character as the delimiter, but I don't know how to type that character into my code.
Is there maybe a way to find the underlying value (UTF-8, hex or whatever) of that character, and then find it in the string?
I hope it's clear enough.
Thanks
EDIT:
In Hex, it looks like it's A0. What can I do to search and delete a character by its hex value? Either in PHP or directly in MySQL I guess ...
SELECT HEX(field) FROM table; should help determine the character.
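The same inspection can be done on the PHP side with bin2hex, which is handy when the value is already in a variable. This is a small sketch; the function name hexDump and the sample address are made up for illustration.

```php
<?php
// Hypothetical helper: shows the bytes of a string as space-separated hex pairs.
function hexDump(string $value): string
{
    // bin2hex gives two hex digits per byte; split them into pairs for readability.
    return implode(' ', str_split(bin2hex($value), 2));
}

echo hexDump("\xA0ab"), "\n";
// prints: a0 61 62
```

A leading a0 on its own points to a single-byte encoding such as Latin-1 or Windows-1252, while c2 a0 would be the same no-break space stored as UTF-8.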
As an alternative solution, it might actually be easier to fix the issue at the source. I've encountered similar problems with CSV files exported from Excel and have generally found that using something along the lines of...
$correctedLine = mb_convert_encoding($sourceLine, 'UTF-8', 'Windows-1252');
...tends to rectify the issue. (That said, you'll need to ensure that you have the multi byte string extension compiled in/enabled.)
You can trim any leading unprintable ASCII char with something like:
update t set email = substr(email, 2) where ascii(email) not between 32 and 126
You can get the ASCII value of the offending char with this:
select ascii(email) as first_char from t
I think I found a PHP answer that seems to work more reliably:
$newemail = preg_replace('/\xA0/', '', $row['oldemail']);
And then I'm going to update the row with the new email
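One caveat with matching the single byte A0: if the column actually holds UTF-8, the no-break space is stored as the two bytes C2 A0, and removing only A0 would leave a stray C2 behind. A minimal sketch that handles both cases is shown below. The function name stripNbsp is hypothetical, and treating any string that is not valid UTF-8 as a single-byte encoding is an assumption.

```php
<?php
// Hypothetical helper: removes no-break spaces from a value that is either
// UTF-8 (NBSP = C2 A0) or a single-byte encoding such as Latin-1 (NBSP = A0).
function stripNbsp(string $value): string
{
    // An empty pattern with /u matches only if the subject is valid UTF-8.
    $isUtf8 = preg_match('//u', $value) === 1;

    // Pick the byte sequence that represents NBSP in the detected encoding.
    $nbsp = $isUtf8 ? "\xC2\xA0" : "\xA0";

    return str_replace($nbsp, '', $value);
}

echo stripNbsp("\xA0user@example.com"), "\n";
// prints: user@example.com
```

The cleaned value can then be written back with the per-row update described above.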