I have some XML files that contain text, which is displayed on my website. I want to extract the text from these XML files and convert it to a PDF document that users can download.
How can I extract this text from the XML documents? (libraries etc.?)
How can I use this text to create a PDF document?
I am working in a PHP environment; however, if PHP is not a suitable language, I could switch.
There are many ways to parse an XML file, and many ways to output a PDF file.
I suggest you start with the XML functions within PHP: http://php.net/manual/en/book.xml.php
There are also various classes for writing PDF files, such as FPDF and TCPDF; try searching for them.
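As a rough sketch of the XML half, PHP's built-in DOM extension can pull the text out of chosen elements. The helper name extractText and the sample element names (article, title, para) are made up for illustration; your own files will use different tags. The resulting strings could then be handed to whichever PDF class you pick.

```php
<?php
// Hypothetical helper: collect the trimmed text of every element
// with the given tag name. Not tied to any particular XML schema.
function extractText(string $xml, string $tagName): array
{
    $doc = new DOMDocument();
    // loadXML() returns false when the input is not well-formed XML.
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Could not parse XML');
    }

    $texts = [];
    foreach ($doc->getElementsByTagName($tagName) as $node) {
        // textContent is the concatenated text of the node and its children.
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Sample document (assumed structure, for illustration only).
$xml = '<article><title>Hello</title>'
     . '<para>First paragraph.</para>'
     . '<para>Second paragraph.</para></article>';

$paragraphs = extractText($xml, 'para');
echo implode("\n", $paragraphs), "\n";
```

For real files you would pass file_get_contents($path) as the first argument instead of an inline string.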
Related
I have generated an e-bill using HTML, CSS, Bootstrap and PHP. Is there any way I can get it directly into PDF format (an easily downloadable format)?
All I could find was how to create a PDF from code (FPDF etc.), but that's not what I want. I already have the e-slip structure ready.
Thanks
Is there a way in PHP to open a remote PDF file, search for specific text, replace it with new text, and save the modified PDF file on the server?
Thanks
You can use a combination of tools or just one of them, depending on how complex the file is and how much noise each tool generates.
DomPDF - converts HTML to PDF.
TCPDF - creates PDF files.
FPDI - imports pages from already existing PDF files.
My personal recommendation would be to go for FPDI.
After many days of research, I found no plugin that can extract data from an 'extraction protected' PDF.
I have uploaded .doc files to my server and now I have to show their content in a web page. I can't find a way to show that content, as a doc may contain images, tables, and other kinds of text.
I tried file_get_contents and then sending headers, but that does not work in all cases.
Firstly, a .doc/.docx file is not a text file. I don't think file_get_contents will give you normal text characters as output.
You can do two things:
Convert the doc/docx to pdf and show in your web page [Try: OfficeToPdf]
Convert the doc/docx to html compatible format using library [Try: PhpWord]
Links
PhpWord: http://phpword.codeplex.com/
OfficeToPdf: http://officetopdf.codeplex.com/
The libraries I mentioned are just examples; there are plenty of free libraries available. Just Google to find them. Good luck :)
Docx is a zipped file format, so file_get_contents only returns the compressed bytes, not readable text.
Doc is not zipped, but it is not XML either: it is a binary (OLE compound file) format, so file_get_contents returns mostly unreadable binary data.
If you want to convert the files for viewing on the web, you could use one of the libraries mentioned by #programmer:
PhpWord: http://phpword.codeplex.com/
OfficeToPdf: http://officetopdf.codeplex.com/
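If you are not sure which of the two formats an upload really is (file extensions can lie), one option is to look at the first few bytes. This is only a sketch: sniffWordFormat is a made-up name, and a zip signature only tells you the file is some zip archive, not that it is a valid .docx.

```php
<?php
// Hypothetical helper: guess the container format from the leading bytes.
// 'zip' = zip archive (what .docx uses), 'ole' = OLE compound file
// (what legacy .doc uses), 'unknown' = anything else.
function sniffWordFormat(string $bytes): string
{
    // Local file header signature of a zip archive.
    if (strncmp($bytes, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }
    // Signature of an OLE compound file.
    if (strncmp($bytes, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return 'ole';
    }
    return 'unknown';
}

// Usage with an uploaded file (the path is an assumption):
// $head = file_get_contents('/path/to/upload.doc', false, null, 0, 8);
// echo sniffWordFormat($head);
```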
I'd like to be able to parse a PDF file using PHP, specifically search the large PDF file using certain keywords and split it into multiple PDF files at the keywords searched for.
I did some research and I found a lot of ways to write PDF files using PHP but very few to parse and split the PDF file.
Are there any websites I could go to, or libraries I could use?
I am trying to parse a Word document file. I upload the file using PHP and then try to get its contents using the file_get_contents() function, but when the result is displayed on the front end there is a lot of garbage in it, like:
Æ�Ѐ¤d�¤d�[$\$gd®l±����„h¤d�¤d�[$\$^„hgd®l±���
&�F�¤d�¤d�[$\$gd3¡���gd3¡����„,¤d�¤d�[$\$^„,gd(E����¤d�¤d�[$\$gdÿ/��<��C��D��I��Å������O��P��‚��¡��¢��¬����®��Ù��ã��ó��ô�����
So my question is how can I clean up this text?
Maybe give this a shot? http://www.phpclasses.org/package/3553-PHP-Edit-Microsoft-Word-documents-using-COM-objects.html
Word documents (like docx and doc) are not plain text files: the text does not simply start at byte 0, which is how they can carry fancy formatting and fonts. .doc is a proprietary binary format, while .docx files are actually archives (.zip files) that contain a myriad of XML and styles.
Your best bet is to use a text input form, or find code online that allows you to extract just the text. Or, download the doc files to your own computer and use your own copy of MS Word to open them.
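For .docx specifically, extracting just the text can be sketched with PHP's zip and DOM extensions, since the main body is stored as XML at word/document.xml inside the archive. The function name docxToText is made up, and this is deliberately minimal: it ignores headers, footnotes, tabs and line breaks, and paragraphs nested inside text boxes may show up twice. It does nothing for legacy .doc files, which need a dedicated library or a converter.

```php
<?php
// Hypothetical helper: return the body text of a .docx file,
// one line per paragraph. Requires the zip and dom extensions.
function docxToText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a zip archive");
    }
    // The main document part of a WordprocessingML package.
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found in archive');
    }

    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('word/document.xml is not valid XML');
    }

    // Namespace used by the w:p (paragraph) and w:t (text run) elements.
    $ns = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
    $lines = [];
    foreach ($doc->getElementsByTagNameNS($ns, 'p') as $paragraph) {
        $line = '';
        foreach ($paragraph->getElementsByTagNameNS($ns, 't') as $run) {
            $line .= $run->textContent;
        }
        $lines[] = $line;
    }
    return implode("\n", $lines);
}

// Usage (the path is an assumption):
// echo htmlspecialchars(docxToText('/path/to/uploaded.docx'));
```

Escaping the result with htmlspecialchars before printing it into a page avoids treating document text as markup.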