How to extract text from the PDF document? [closed] - php

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
How can I extract text from a PDF document using PHP?
(I can't use other tools, I don't have root access)
I've found some functions that work for plain text, but they don't handle Unicode characters well:
http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-2-pdf-data-extraction-437.html

Download class.pdf2text.php from https://pastebin.com/dvwySU1a or http://www.phpclasses.org/browse/file/31030.html (registration required).
Code:
<?php
include('class.pdf2text.php');

$a = new PDF2Text();
$a->setFilename('filename.pdf'); // path to the PDF to read
$a->decodePDF();                 // parse the file and decode its text
echo $a->output();               // print the extracted text
class.pdf2text.php Project Home
class.pdf2text.php doesn't work with all the PDFs I've tested. If it doesn't work for you, try PDF Parser.

Related

Link crawler (for download or development) [closed]

Closed 9 years ago.
I have a question regarding web crawling.
What I need is a web crawler that can save all external links from a website and print them to a file (CSV).
I am in the middle of developing it myself (with PHP), but was wondering if there are any downloadable solutions already (it doesn't have to be a PHP solution).
Of course I have looked for myself, but couldn't find anything. So if anyone can help me out here, I would really appreciate it.
Also, what would be the best way to develop it?
You can use Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/).
E.g.:
<?php
include 'simple_html_dom.php';

// Fetch and parse the page
$html = file_get_html('http://google.com/');

// Collect the href of every anchor element
$link = array();
foreach ($html->find('a') as $element) {
    $link[] = $element->href;
}
// Write $link into your CSV file
?>
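The snippet above collects every link on the page, while the question asks for external links only, written to a CSV file. A minimal sketch of those two missing steps follows. It uses PHP's built-in DOMDocument rather than Simple HTML DOM so that it runs without extra files. The function name extractExternalLinks(), the sample HTML, and the output file name external-links.csv are all assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper: returns the unique hrefs in $html whose host differs
// from $siteHost. Relative links (no host) are treated as internal.
function extractExternalLinks(string $html, string $siteHost): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        if (is_string($host) && strcasecmp($host, $siteHost) !== 0) {
            $links[$href] = true; // array keys drop duplicates
        }
    }
    return array_keys($links);
}

// Sample input (made up for this example).
$html = '<a href="/about">About</a>'
      . '<a href="http://example.com/contact">Contact</a>'
      . '<a href="http://example.org/page">Elsewhere</a>';

$links = extractExternalLinks($html, 'example.com');

// Write one link per row; fputcsv() quotes values that contain commas.
$out = fopen('external-links.csv', 'w');
foreach ($links as $link) {
    fputcsv($out, [$link], ',', '"', '\\');
}
fclose($out);
```

Note that this compares hosts literally, so www.example.com would count as external to example.com. A real crawler would also need to fetch each page and follow internal links, which this sketch leaves out.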

read data from a scanned document(pdf or other format) for db persistence purposes [duplicate]

Closed 8 years ago.

How to convert HTML page to Image? [closed]

Closed 9 years ago.
I have an HTML page: http://gthost.dyndns.org/gtfb_final/cam/3097952a5c3a90d7d35.38138446.html. I want to convert the full page to an image (480 x 480). Is there any PHP code for that?
Look at this:
pear.php.net HTML renderer
And for doing it yourself:
5 Minute tutorial
For other results, search for "HTML rendering in PHP" (or similar).
You can install ImageMagick and invoke it with exec to render HTML into an image.
http://www.imagemagick.org/

Microsoft Word to HTML in PHP script [closed]

Closed 9 years ago.
I want to know the PHP code for Word to HTML conversion. When we retrieve the Word document from the database and display it in the frontend, it should display as an HTML page.
wvWare
I don't completely understand your request, but I think you can try TinyMCE for PHP. It's a web-based WYSIWYG editor where everything you type and format is saved as HTML code; you can even paste and convert Word documents.
So, as symcbean said, the best way to do it is wvWare. Otherwise, you could write a class yourself, but that will be a difficult path.

Batch code indenters and beautifiers [closed]

Closed 8 years ago.
Does anyone here know of good batch file code indenters or beautifiers?
Specifically for PHP, JS and SGML-languages.
Preferably with options as to style.
The following page has code on it to tidy JavaScript (written in JavaScript as well):
http://www.howtocreate.co.uk/tutorials/jsexamples/JSTidy.html
There are various ways to tidy SGML-based files (e.g. XML). HTML Tidy will often do the trick, and there are 'pretty print' implementations in various languages out there.
And finally, a link to a website with PHP code for pretty-printing PHP: http://tobyinkster.co.uk/blog/2007/07/17/php-pretty-printer/
For HTML/XML, HTML Tidy is the best option:
http://tidy.sourceforge.net/
