Does anybody know how, instead of providing a link for users to download a .doc file, I can embed PART of the file in an iframe on the same page? I want to give users a teaser in the iframe but not access to the entire document. Thanks!
Browsers can't open native MS Word files.
For '.doc' or '.docx' files, you'll need to read the excerpt on the server side and convert it into HTML. For '.txt', most browsers will render those natively, but if you want to show only an excerpt, you will still need to read into the file, probably server-side.
See Convert .doc to html in php. Once you have HTML on the server, you can trim it down to make your excerpt before displaying it.
The bad news is this is probably more complicated than you thought. The good news is that you won't need iframes.
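Once the conversion has produced HTML, trimming it down to a teaser can be as simple as stripping the tags and cutting at a word boundary. A minimal sketch (the helper name `make_excerpt` is my own; swap in the `mb_*` string functions if the text is multibyte):

```php
<?php
// Make a plain-text teaser from converted HTML.
// $html is the full converted document; $limit caps the teaser length.
function make_excerpt(string $html, int $limit = 200): string {
    $text = trim(strip_tags($html));           // drop all markup
    $text = preg_replace('/\s+/', ' ', $text); // collapse runs of whitespace
    if (strlen($text) <= $limit) {
        return $text;
    }
    $cut = substr($text, 0, $limit);
    $pos = strrpos($cut, ' ');                 // cut at the last word boundary
    if ($pos !== false) {
        $cut = substr($cut, 0, $pos);
    }
    return $cut . '…';
}

echo make_excerpt('<p>Hello <b>world</b>, this is the document body.</p>', 20);
```

The trimmed string can then be echoed into the page directly, with no iframe involved.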
Hi, I'm trying to write a PHP script that downloads all images and videos from a subreddit and stores them locally.
My plan is to get all the links from a URL, then decide whether each one is an image or a video, then download it.
If someone can guide me or give me an idea of how to proceed, that would be appreciated.
My idea would be to cURL-download the link to the website so you get its HTML, then take a look at this topic. With that, you can extract all the tags you need, for example the "img" tags and their src attributes.
Then just load them into an array, and iterate over it with cURL to download them and store them locally.
Another approach would be to download the HTML and collect all links based on a filter (e.g. beginning with `"http://` and ending with a quote; also make another filter for single quotes if the HTML uses single quotes).
Then just iterate over all the links and whitelist them by extension, if that's the kind of file you are interested in. Then cURL-download and store them.
EDIT:
I forgot: also remember to fix the links in the .html, .css and .js (and probably more) files. And an off-topic side note: watch out for images with PHP code hidden in them.
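A sketch of the first approach, using PHP's built-in DOMDocument to collect the image URLs (the helper name and the naive relative-URL handling are mine; videos could be collected the same way from "video"/"source" tags):

```php
<?php
// Sketch: extract <img> src attributes from fetched HTML, then download each.
function collect_image_urls(string $html, string $baseUrl): array {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);                        // silence warnings on messy HTML
    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src === '') {
            continue;
        }
        if (strpos($src, 'http') !== 0) {          // naive relative-link resolution
            $src = rtrim($baseUrl, '/') . '/' . ltrim($src, '/');
        }
        $urls[] = $src;
    }
    return array_unique($urls);
}

$html = '<html><body><img src="/a.png"><img src="http://cdn.example/b.jpg"></body></html>';
$urls = collect_image_urls($html, 'http://example.com');
// $urls now holds absolute image URLs; fetch each with cURL (or
// file_get_contents) and file_put_contents() to store it locally.
```

In a real run, `$html` would come from a cURL GET against the subreddit URL instead of an inline string.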
Is it possible to extract a portion of a remote HTML page and print it on another page, using PHP cURL, an HTML DOM parser, or any other method, while preserving the original formatting, styles, images, and tab functionality?
For example, how would one extract the content of the central column (with tabs and formatting, preserving the look of the original text) from http://ru.yahoo.com/?
As far as I understand, the script should process the external CSS so that the returned content has the same look as the original. What would be the most appropriate way, if that's possible? If yes, an example would be highly appreciated. I looked at several examples but didn't find any solution for my case.
Well, if I had to do it quickly (read: very dirty), I think I would do this:
Pull the HTML from the remote server using standard PHP.
Use the HTML that you took from the other site and append your own HTML to it down at the bottom.
Also add your own CSS to hide the parts of the other site's HTML you don't want visible, and to style your own HTML.
Fiddle until it looks okay enough. However, I think this will break the loading of the external JS files because of the same-origin policy.
A nice approach would be this:
Pull the HTML from the remote server using standard PHP
Parse the HTML with a PHP HTML parser, strip out all references to external CSS and JS files, and pull those files as well.
Use XPath to extract the parts that you need.
Create a new HTML document with your own HTML, the parts that you need, new links to your newly downloaded CSS and JS files. Also add your own CSS and JS to style the result.
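Steps 2 and 3 above might look like this with PHP's built-in DOM extension (`extract_part` and the `//div[@id="content"]` query are illustrative placeholders; a real page like ru.yahoo.com needs an XPath query matched to its actual markup):

```php
<?php
// Sketch: pull one part of a fetched page with DOMXPath.
function extract_part(string $html, string $query): string {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);                 // tolerate real-world markup
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return '';
    }
    return $doc->saveHTML($nodes->item(0)); // serialize just that node
}

$html = '<html><body><div id="content"><p>Keep me</p></div><div>Skip me</div></body></html>';
echo extract_part($html, '//div[@id="content"]');
```

The returned HTML fragment can then be dropped into your own document alongside the downloaded CSS and JS.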
You know, RSS was invented for this, and if they don't provide an RSS feed, they most likely don't want you to take the content and post it on your own site. :P
I'm wondering how I can accomplish horizontal scrolling of the pages of a PDF using JavaScript. Is it better to:
Convert the pages of the PDF into HTML files and then click left-right between iframes where src="...each page.html"?
Convert the pages of the PDF into some other HTML element besides iframe (e.g., DIV?) and then click left-right between elements containing the contents of each page.
I'd like to ensure that the PDF's text is searchable, so I don't want to turn its pages into images. I'm also skeptical of using iframes because of the formatting challenges of having multiple iframes in a single webpage. I've already tested this approach after converting the PDF to HTML using the Linux-based "PDFtoHTML" software, and found that in general it is a suboptimal solution.
It seems like option 2 is the way to go, but I wouldn't know how to programmatically parse a PDF into multiple DIVs. Besides JavaScript, I'm familiar with PHP and Linux, but not other languages, if that would be helpful in thinking of solutions.
The PDF plugin intercepts mouse events, so there is no way to control it directly from the browser with JavaScript.
Your other method, converting to HTML, is feasible.
Converting a PDF page to an HTML file is more or less the same thing as "parsing it into a <div>". If you have already found a tool that can do it for you ("PDFtoHTML"), just use that, and strip away everything except what's inside the <body> of the .html it outputs.
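If the converter gives you one .html file per page, stripping each one down to its <body> contents can be done server-side in PHP before the pages are served as <div>s. A sketch (the helper name `body_inner_html` and the class name are my own):

```php
<?php
// Sketch: keep only what's inside <body> of a converted page.html,
// so it can drop straight into a <div> on the scrolling page.
function body_inner_html(string $html): string {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;                      // no <body>? return input unchanged
    }
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $doc->saveHTML($child);  // serialize each child of <body>
    }
    return $inner;
}

$page = '<html><head><title>p1</title></head><body><p>Page one text</p></body></html>';
echo '<div class="pdf-page">' . body_inner_html($page) . '</div>';
```

The resulting <div>s keep the text searchable, and the left/right clicking between them is then a plain scrolling problem on your own elements.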
I am working on a job site where job seekers can upload their resume. When editing their profile, I want them to be able to view their previous resume. I used an <iframe> for this purpose, but instead of displaying the .doc file, it shows an option to download it. So how can I display their resume (in .doc, .docx and .odt formats)?
You can use PHP's fopen function; however, there is a lot of unwanted binary data inside a Word .doc file. Maybe a quick search could help you with what you want:
Reading/Writing a MS Word file in PHP
It would require a lot of effort. Your best bet is to provide a WYSIWYG editor instead of an upload, so that they can copy and paste into it.
You can't display .doc, .docx and .odt files in a browser. Basically, a browser is designed to parse HTML, not native office formats. You could use Flash to display a doc inside your browser.
You need to embed the src. If you embed the source, it stops Adobe Reader (or whatever plugin) from taking over the browser entirely, and the file is actually viewed in the window. This should help:
Recommended way to embed PDF in HTML?
I know this is for PDFs, but I would assume you can use it for .doc too.
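For the PDF case, the usual pattern from that question is an <object> tag with a fallback link. A minimal sketch (the file path is a placeholder for wherever the uploads are stored):

```html
<!-- Renders resume.pdf inline instead of triggering a download or a
     full plugin takeover; the fallback link shows when the browser
     can't render PDFs inline. -->
<object data="uploads/resume.pdf" type="application/pdf" width="600" height="800">
  <a href="uploads/resume.pdf">Download the resume</a>
</object>
```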
I am looking for some way to code a function (I'm open to any language or library at this point) that takes an existing PDF file as input and returns a modified PDF that links certain words to different URLs. I know PHP and ColdFusion both have good tools for dealing with PDFs, but I haven't been able to find anything that works.
I've been doing this by going through Acrobat and linking the text by hand and was wondering if there was any way to automate the procedure.
Thanks!
With ColdFusion you can extract the text with DDX (see "Extracting text from a PDF document" on the page), modify it using search/replace, and generate a new document.
If I understand what you're trying to do, you should be able to use CFPDF (http://livedocs.adobe.com/coldfusion/8/htmldocs/Tags_p-q_02.html#2922772) to read the PDF file into a ColdFusion variable, replace whatever content you want in that variable, then save the content back to PDF.