First I wanted to convert the PDF file to HTML, but the API can't do that.
So I tried to convert the PDF to TXT, but I had a lot of problems with multiple spaces and line breaks...
So I tried (again) to convert the PDF to Word, and the Word file is perfect.
Unfortunately, ConvertApi can't convert Word to HTML... and I can't find a free library to convert Word to HTML.
So I tried (again and again) to convert the Word file to TXT.
Now I have accent problems in the TXT file:
régime becomes r‚gime
matière becomes matiŠres
contrôle becomes contr“le
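Those substitutions (é → ‚, è → Š, ô → “) look like DOS code page 850 output being displayed as Windows-1252/Latin-1. If that guess is right, a minimal sketch that re-encodes the TXT file to UTF-8 could look like this (the file name is just an assumption, and the source code page may need to be adjusted):

// Hypothetical sketch: the extracted TXT is assumed to be in DOS CP850.
$rawText = file_get_contents('output.txt');

// Re-interpret the bytes as CP850 and transcode them to UTF-8.
$utf8Text = iconv('CP850', 'UTF-8//TRANSLIT', $rawText);

if ($utf8Text === false) {
    // The source-encoding guess was wrong; try the older CP437 code page instead.
    $utf8Text = iconv('CP437', 'UTF-8//TRANSLIT', $rawText);
}

echo $utf8Text; // "régime", "matières", "contrôle" should now display correctly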
I generate a .doc HTML-formatted file from a PHP script. Everything works fine, my file is well generated, but if I try to open it with LibreOffice (v4.2.8.2), the file is silently truncated at the 65533rd character when displayed.
Is there a workaround? Is it a bug? Do you have any information about that?
I found the problem. All my text was inside the <body> tag. I broke my text up into <div> elements and it worked (it won't work with HTML5 tags such as <article>).
I think that LibreOffice can't handle more than 65533 characters per tag.
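For illustration, a rough sketch of that workaround (the function name and chunk size here are just assumptions, not taken from the original script):

// Hypothetical sketch: split long text into <div> chunks so no single
// element exceeds LibreOffice's apparent 65533-character limit.
function wrapInDivs(string $text, int $chunkSize = 60000): string {
    // Note: str_split() splits by bytes; for multibyte text, mb_str_split() is safer.
    $html = '';
    foreach (str_split($text, $chunkSize) as $chunk) {
        $html .= '<div>' . $chunk . '</div>';
    }
    return $html;
}

// Usage when building the .doc file:
// fwrite($fp, '<html><body>' . wrapInDivs($longText) . '</body></html>');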
In addition, I noticed the "same" problem in LibreOffice Calc: if you open an HTML-formatted .xls file, it will not display more than 65533 non-empty cells (there I didn't find or search for a workaround).
I think it's a BIG bug in this software (I didn't test with others such as OpenOffice.org or MS Office). At least a warning message should be displayed.
I'm on a Linux server and I need to convert MS Word 97-2003 .doc files to plain text .txt files using PHP.
I already tried these solutions:
How to extract text from word file .doc,docx,.xlsx,.pptx php
Extract text from doc and docx
But both only work for the .docx format.
The issue is that when I convert files, I get garbage characters at the end of the text.
The number of unwanted characters varies depending on the length of the file.
Also, if the file is a bit long, it may get truncated.
Is there any simple way to get this converted?
I eventually came to use the following solution, launching Antiword:
private function doc() {
    // Escape the file name before passing it to the shell.
    $file = escapeshellarg($this->filename);
    // -w 0 tells Antiword not to wrap output lines.
    $text = `/usr/sbin/antiword -w 0 $file`;
    // utf8_encode() treats Antiword's Latin-1 output as such and converts it to UTF-8;
    // html_entity_decode() then resolves any HTML entities.
    return html_entity_decode(utf8_encode(trim($text)));
}
I have an ODS spreadsheet (managed with OpenOffice). Several cells contain multiple lines. The table contents are used for display on a website.
When I import the file with phpMyAdmin, these cells are truncated at the first newline character.
In the ODS file, the newline character is char(10). In my case it has to be replaced with the string <br/>, the HTML line-break tag. Writing a PHP program that does the replacement makes no sense, since the newline character is already cut off after the import. For the moment I run a PC program that patches char(10) to the '|' character in the ODS file; after the import, I replace '|' with <br/> using PHP. Terrible! Is there a way to prevent the phpMyAdmin import from truncating on char(10)?
Thanks, Chris.
I had the same problem. My solution is not a perfect one, but it did the job for me.
What I did was replace the newline character in the ODS file so I could replace it back in PHP.
Open the ODS file, open the search & replace box, then search for \n and replace it with some unique string that you can locate in PHP.
In my case I used something like -EOL-.
In my PHP script I replaced -EOL- with <br/>.
I know it's not a shortcut, but it is a solution...
Hope it works for you as well.
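A minimal sketch of the PHP side of that workaround (the variable names and the -EOL- placeholder are just the ones used above; adapt them to your data):

// Hypothetical sketch: after importing, turn the -EOL- placeholder back into
// an HTML line break before displaying the cell on the website.
$cellText = 'first line-EOL-second line';           // value read from the database
$htmlText = str_replace('-EOL-', '<br/>', $cellText);
echo $htmlText;                                      // "first line<br/>second line"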
I've been trying to convert some special, non-Unicode characters back to the original ones, but I didn't get what I needed.
Let's take an example. I have a file with a Persian name and I want to upload it. When I upload it, the name changes to some strange characters like Ø! I don't know why (and I don't want to know), but when I start to download it with IDM, the strange name changes back to its original.
Original File Name: ترمینال_گنبدکاووس.rar
Strange File Name: ترمینال_گنبدکاووس.rar
And this is not just my problem: a file name in any language gets converted to its own strange characters. In another example, when I want to convert Babylon dictionaries (bgl) to StarDict (mdx) format, the phonetics are converted to strange characters again.
I want to know: is there a reliable way to convert all these characters back to their originals using PHP?
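If the "strange" name is classic mojibake (UTF-8 bytes that were re-read as ISO-8859-1/Windows-1252 and then encoded to UTF-8 again), it can often be reversed as sketched below. This is only a guess at what happened on upload, not a universal fix, and the $_FILES key is just an assumption:

// Hypothetical sketch: reverse double-encoded UTF-8 (mojibake).
$strangeName = $_FILES['upload']['name'];

// Interpreting the garbled string as UTF-8 and writing it back out as
// ISO-8859-1 bytes yields the original UTF-8 byte sequence again.
$originalName = mb_convert_encoding($strangeName, 'ISO-8859-1', 'UTF-8');

// Sanity check: only keep the result if it is valid UTF-8.
if (!mb_check_encoding($originalName, 'UTF-8')) {
    $originalName = $strangeName; // the guess was wrong, keep the raw name
}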
I have a PHP script with UTF-8 encoding. In it I have an array with special characters (like ñ, an n with a ~ on top). It looks just fine in my editor. The PHP matches the array against text coming in from an HTML form and writes a CSV file. When I write the file I do it like this:
fwrite($fp,utf8_encode($data),strlen($data)+100);
When I open the file, it says it is UTF-8 encoded but the characters are all messed up.
Have you tried without using utf8_encode() on the data?
It seems that you are re-encoding something that is already UTF-8 encoded.
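A minimal sketch of that suggestion (assuming $data already holds UTF-8 text, as the script's own encoding implies; the file name is just an example):

// Hypothetical sketch: write the UTF-8 data as-is, without re-encoding it.
// The length argument to fwrite() is optional, so it can simply be omitted.
$fp = fopen('export.csv', 'w');
fwrite($fp, $data);
fclose($fp);

// If the CSV is later opened in Excel and still looks wrong, writing a UTF-8
// BOM first sometimes helps: fwrite($fp, "\xEF\xBB\xBF"); before the data.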