Advanced ePub reader - php

I'm trying to build an advanced ePub reader using jQuery and PHP/Zend Framework 1.12 (for the EPUB 3.0 format). The reader should have the following features:
books should be displayed using pages (2 pages at a time)
the user should be able to navigate between pages and chapters using a slider
the user can create highlights and bookmark pages
the reader must be cross-browser (I don't care much about older versions of IE, but it must work on Safari, Mozilla, Chrome)
My idea is to make some kind of PHP parser that will handle the epub content and pass it on to the Javascript code in a more 'friendly' format, but I haven't worked with epubs before and I'm not sure where to start.
Here are a few questions that I have been struggling with:
The first problem I have encountered is how to extract the content from an .epub archive and render it in such a way that it allows a paginated view. What PHP library would you recommend for parsing epubs? I have already tested some libraries like BookGluttonEpub (seems quite old) and EPUBParser (difficult to understand since there are no examples or docs). Are there others I missed?
Should I clean the html code (like remove invalid tags for example) before passing it to the reader?
What do you consider the best way to display the pages? Should I use CSS and the 'column' property? Or should I write a more advanced script that splits the HTML content of a chapter into pages?
Thanks

First extract the .epub file. It is an ordinary ZIP archive, so you can use a PHP unzip library (for example PHP's ZipArchive class) and you don't need to parse the HTML or CSS yourself. You can build your reader using HTML5 canvas and CSS3 properties.
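As a starting point once the archive is unzipped: META-INF/container.xml points to the package document (the .opf file), and that document's manifest and spine give the chapter files in reading order. Below is a minimal sketch in JavaScript, the language of the reader itself. The function names are made up for illustration, the regex-based parsing assumes unprefixed element names, and in a browser DOMParser would be the more robust choice.

```javascript
// Parse the attributes of a single XML start tag into a plain object.
function parseAttributes(tag) {
  const attributes = {};
  const pattern = /([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let match;
  while ((match = pattern.exec(tag)) !== null) {
    attributes[match[1]] = match[2] !== undefined ? match[2] : match[3];
  }
  return attributes;
}

// META-INF/container.xml names the package document in a <rootfile> element.
// Returns its full-path attribute, or null if none is found.
function findPackagePath(containerXml) {
  const tag = containerXml.match(/<rootfile\b[^>]*>/);
  return tag ? parseAttributes(tag[0])['full-path'] || null : null;
}

// The manifest maps item ids to files; the spine lists ids in reading order.
// The returned hrefs are relative to the directory that holds the .opf file.
function readingOrder(opfXml) {
  const hrefById = {};
  for (const tag of opfXml.match(/<item\b[^>]*>/g) || []) {
    const { id, href } = parseAttributes(tag);
    if (id && href) hrefById[id] = href;
  }
  return (opfXml.match(/<itemref\b[^>]*>/g) || [])
    .map((tag) => hrefById[parseAttributes(tag).idref])
    .filter(Boolean);
}
```

The server-side PHP code could perform the same two lookups and hand the reader a ready-made list of chapter URLs.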
I think the better option is to use HTML5 and CSS3 if you are not worried about IE compatibility.
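One way to act on the CSS3 suggestion is CSS multi-column layout: give the chapter container a fixed height, a column-width and a column-gap, hide the overflow, and shift the content horizontally by two columns at a time. The paging arithmetic might look like the sketch below. All function names are hypothetical, and it assumes the browser includes the overflowing columns in the container's scrollWidth.

```javascript
// Number of columns the browser laid out, derived from the content's
// scrollWidth: n columns occupy n * columnWidth + (n - 1) * gap pixels.
function columnCount(scrollWidth, columnWidth, gap) {
  return Math.max(1, Math.round((scrollWidth + gap) / (columnWidth + gap)));
}

// Two columns are shown at a time, so a "spread" is a pair of columns.
function spreadCount(columns) {
  return Math.ceil(columns / 2);
}

// Horizontal offset (in px) that brings the given zero-based spread into view,
// e.g. for use in a CSS transform: translateX(-offset px).
function spreadOffset(spreadIndex, columnWidth, gap) {
  return spreadIndex * 2 * (columnWidth + gap);
}

// Map a slider position (0..max) to a zero-based spread index.
function sliderToSpread(value, max, spreads) {
  if (max <= 0 || spreads <= 1) return 0;
  return Math.round((value / max) * (spreads - 1));
}
```

With 400px columns and a 40px gap, a chapter whose scrollWidth is 2160px has five columns, i.e. three spreads, and the last spread sits at an offset of 1760px.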

Related

Convert PDF to HTML version 3.2 with images and html file in a folder

I hope you are doing well.
I am looking for a PHP library that converts a PDF file containing images into an HTML file, with the following features:
The HTML file needs to be HTML 3.2 compatible
The images in the PDF file are saved with a .jpg extension
The correct fonts from the PDF need to be used in the HTML file
The result is a single folder that contains both the images and the HTML file
I have tried most of the PHP libraries, but most of them do NOT do what I need.
Please let me know about a library that meets all 4 requirements above (image attached for reference).
Waiting for your kind responses.
Thanks
I am not very sure, but here is a PHP library I found.
Here
Try this:
http://www.pdfaid.com/pdf-to-html.aspx
Or this:
http://webdesign.about.com/od/pdf/tp/tools-for-converting-pdf-to-html.htm
Or this...
http://www.pdfconvertonline.com/pdf-to-html-online.html
There are plenty of options available to you; the secret is to use a newfangled thing called a Search Engine, such as a Bing or a Google.
You will also do well to research on Stack Overflow before asking your question:
1) HTML 3.2 was superseded in 1997, very nearly twenty years ago. Why on earth do you still need a comparatively ancient technology when far better alternatives are available, such as XHTML, HTML 4.01 and HTML5?
2) Please read How can I extract embedded fonts from a PDF as valid font files?
3) Also to extract images you can use:
http://www.makeuseof.com/tag/extract-images-pdf-files-save-windows/
but again, there are several options available to you if you care to look for them.
You seem to have a fundamental misunderstanding about HTML: there are several different ways of getting any desired result with HTML. You have a PDF file and you want it to look a certain way, but that look depends on the browser it is viewed in. For example, if you use a PDF to HTML converter as linked above, you will very probably find that the output looks different in Internet Explorer 7 versus Firefox versus Internet Explorer 10. There is no one way of writing output in HTML or with CSS.
If you want a custom-built library to do your specific task, then you will need to employ a professional to do it, or you will need to code it yourself. This obviously should be charged to the client for requiring a technology that is extremely outdated. You can probably search GitHub for a similar library (the one linked by CK Khan looks like what you're after) and then fork it and make your own variation for your needs. I very much doubt anyone is going to put time into developing a system to output HTML 3.2 from a PDF, and even less likely to develop this system for free and to your exact specifications.
It also appears that you cannot directly set font families with the <font> tag in HTML 3.2; you can only change the size and colour of fonts. You can use the CSS1 font-family property to set font families. See here.

Extracting portion of the HTML page

Is it possible to extract a portion of a remote HTML page and print it on another page, using PHP cURL, an HTML DOM parser, or any other method, while preserving the original formatting, styles, images, and working tabs?
For example, how to extract content of central column (with tabs and formatting, preserve the look of the original text), from http://ru.yahoo.com/?
As far as I understand, the script should process the external CSS so that the returned content has the same look as the original. What would be the most appropriate way, if that's possible? If it is, an example would be highly appreciated. I looked at several examples, but didn't find any solution for my case.
Well, if I had to do it quickly (read: very dirty), I think I would do this:
Pull the HTML from the remote server using standard PHP
Use the HTML that you took from the other site and add your own HTML to it down at the bottom.
Also add your own CSS to hide the html of the other site you don't want to be visible and style your own html.
Fiddle until it looks okay enough. However: I think this will break the loading of the external JS files because of the same-origin policy.
A nice approach would be this:
Pull the HTML from the remote server using standard PHP
Parse the HTML with some PHP HTML parser, find all references to external CSS and JS files, strip them out of the markup, and download those files as well.
Use XPath to extract the parts that you need.
Create a new HTML document with your own HTML, the parts that you need, new links to your newly downloaded CSS and JS files. Also add your own CSS and JS to style the result.
You know: RSS was invented for this, and if they don't provide an RSS feed they most likely don't want you to get the content and post it on your own site. :P
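One detail the steps above gloss over: the extracted fragment will still contain relative URLs (images, stylesheets, scripts) that only resolve on the original site. A rough sketch of rewriting them against the remote page's address, in JavaScript, is shown below. The function name is hypothetical and the regex handling of attributes is deliberately crude; with a real DOM parser you would walk the elements instead.

```javascript
// Rewrite relative href/src values so they point at the original site.
// baseUrl is the address the HTML was fetched from.
function absolutizeUrls(html, baseUrl) {
  return html.replace(/\b(href|src)\s*=\s*(["'])(.*?)\2/gi, (whole, attr, quote, value) => {
    // Leave empty values and in-page anchors (e.g. tab links) untouched.
    if (value === '' || value.startsWith('#')) return whole;
    try {
      return `${attr}=${quote}${new URL(value, baseUrl).href}${quote}`;
    } catch (error) {
      return whole; // not a parseable URL: keep the original text
    }
  });
}
```

For example, with a base of http://example.com/news/index.html, src="img/a.png" becomes src="http://example.com/news/img/a.png", while href="#tab1" stays as it is.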

Generate PDF from HTML PHP

I want to generate a PDF from a PHP file that includes HTML controls like textboxes and textareas. I attached CSS to the same file. I tried FPDF, DOMPDF and TCPDF, but I still don't get exactly what I want. How do I pass HTML controls with PHP variables and CSS to these libraries?
mPDF is another option that you could try.
EDIT :
Found another solution for it: TCPDF is a FLOSS PHP class for generating PDF documents. It looks like the more widely used library.
"PRINCEXML" is a good library (not completely free now).
Others:
If you mean creating a PDF file from PHP, PDFlib will help you (as some others suggested).
Otherwise, if you want to convert an HTML page to PDF via PHP, you'll run into a little trouble. For three years I have been trying to do it as best as I can.
So, the options I know are:
HTML2PS: the same as DOMPDF, but this one converts first to .ps (Ghostscript), then to whatever format you need (PDF, JPEG, PNG). For me it is a little better than DOMPDF, but I have the same speed problem. Oh, it has better compatibility with CSS.
Those two are PHP classes, but if you can install some software on the server and access it through passthru() or system(), have a look at these too:
wkhtmltopdf: based on WebKit (Safari's rendering engine), it is really fast and powerful. It seems to be the best one (at the moment) for converting HTML pages to PDF on the fly, taking only two seconds for a three-page XHTML document with CSS 2. It is a recent project; the Google Code page is updated often.
htmldoc: this one is a tank, it really never stops or crashes. The project seems to have died in 2007, but if you don't need CSS compatibility it can be a good fit.
** Thumbs Up For Strae.
If I understand your needs correctly I don't think any PHP-PDF class would do that.
Mostly you can only insert text and images into a PDF file, so if you want something that looks like an HTML element you would need to insert it as an image.
Usually, just putting HTML in doesn't mean all your elements will stay intact in the PDF. (Different world, after all.)
http://www.fpdf.org/ is the site of a great HTML-to-PDF class which works well. I am using it, but you have to study its functionality first before you start.

edit uploaded docx file in PHP

I need to open an uploaded .docx file and be able to change values in it. I know that a docx file consists of XML files. So the main question is: does anybody know a good web-based WYSIWYG XML editor?
I know one called XOPUS, but I have no idea how to configure it. Maybe somebody knows other alternatives for that task, or can advise how to put an XML file into a text field where I could change values.
There are a couple of PHP toolkits that you can use for this task. First off, there's an early development one on CodePlex:
http://openxmlapi.codeplex.com/
However, you may be better off with one of the more mature ones:
http://holloway.co.nz/docvert/index.html
http://www.phpdocx.com/
Both of these can convert from docx to most of the popular formats, HTML included.
Once you've converted to something like HTML, you can use an on-screen editor such as TinyMCE:
http://www.tinymce.com/
That gives you in-page rich editing, after which you can use the toolkits above to convert back to DOCX or any other applicable format.
Update February 2014
Since I first wrote this reply things have moved on. The Open XML kits I mentioned above are still valid; however, in-page editing is now more feasible than ever using the new HTML5 contenteditable attribute and the designMode property.
It's now insanely easy to add your own buttons (using something like Bootstrap) above a div that has a contenteditable attribute attached to it.
Connecting the buttons to "document.execCommand" can then send bold, italic, underline, link and image creation, list insertion and all manner of other HTML construction commands directly to this div, without needing anything like TinyMCE or another in-page editor anymore.
Full details are available on the Mozilla Developer Network, and I am planning to do a blog post on using this stuff very soon.
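A minimal sketch of that button wiring could look like the following. The function name and the data-command/data-value markup are assumptions made for illustration; document.execCommand itself is the real browser API, although MDN now lists it as deprecated, so treat this as a starting point only.

```javascript
// Forward toolbar button presses to execCommand.
// `doc` is the page's document, `toolbar` the element that holds the buttons.
// Buttons are assumed to look like: <button data-command="bold">B</button>
function wireToolbar(doc, toolbar) {
  // mousedown (rather than click) plus preventDefault keeps the text
  // selection inside the contenteditable div while the button is pressed.
  toolbar.addEventListener('mousedown', (event) => {
    const button = event.target.closest('[data-command]');
    if (!button) return;
    event.preventDefault();
    // The third argument carries a value for commands that need one,
    // e.g. a URL for "createLink" or "insertImage".
    doc.execCommand(button.dataset.command, false, button.dataset.value || null);
  });
}

// In the page, with <div id="toolbar"> placed above <div contenteditable="true">:
//   wireToolbar(document, document.getElementById('toolbar'));
```

Passing the document in as a parameter keeps the function easy to exercise outside a browser.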
Have you tried PHPWord?
One may use the DocxUtilities class of PHPDocX to do some partial editing of an existing Word document.
This class lets you:
search for and replace a particular string of text
search for a string of text and remove the containing paragraph or section
highlight predefined strings (search and highlight)
fully merge docx files (text, images, charts, footnotes, ...)
If that is not enough for your purposes, you should then prepare a PHPDocX template to fully customize an existing Word document.

Javascript based horizontal Scrolling of a multi-page PDF?

I'm wondering how I can accomplish horizontal scrolling of the pages of a PDF using JavaScript. Is it better to:
Convert the pages of the PDF into HTML files and then click left-right between iframes where src="...each page.html"?
Convert the pages of the PDF into some other HTML element besides iframe (e.g., DIV?) and then click left-right between elements containing the contents of each page.
I'd like to ensure that the PDF's text is searchable so I don't want to make its pages into images. I'm also skeptical of using iframes because of the formatting challenges of having multiple iframes in a single webpage. I've already tested this approach after converting the PDF to HTML using "PDFtoHTML" linux-based software and find that in general this is a suboptimal solution.
It seems like option 2 is the way to go, but I wouldn't know how to programmatically parse a PDF into multiple DIVs. Besides JavaScript, I'm familiar with PHP and Linux but not other languages, if that is helpful in thinking of solutions.
The PDF plugin intercepts mouse events, so there is no way to control it directly from the browser / JavaScript.
Your other method, converting to HTML, is feasible.
Converting a PDF page to an HTML file is more or less exactly the same thing as "parsing it into a <div>". If you have already found a tool that can do it for you ("PDFtoHTML"), just use that, and strip away everything except what's inside the <body> of the .html file it outputs.
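The stripping step can be sketched as below. The function names and the "page" class are made up for illustration, and the regex assumes one <body> element per file. Note that any styles the converter placed in <head> are lost this way and would need to be carried over separately.

```javascript
// Return only what sits between <body ...> and </body>.
// If no body element is found, the input is returned unchanged.
function extractBody(html) {
  const match = html.match(/<body\b[^>]*>([\s\S]*)<\/body\s*>/i);
  return match ? match[1].trim() : html;
}

// Wrap one converted page in a <div> so that pages can sit side by side
// in a single document and be scrolled or swapped left and right.
function wrapAsPage(html, pageNumber) {
  return `<div class="page" data-page="${pageNumber}">${extractBody(html)}</div>`;
}
```

Because the text stays real text inside each div, the browser's own search keeps working across pages.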
