The best way to solve this would be to use "pdftotext" from the "xpdf" package, but shell_exec is disabled on every shared host I looked up. I found alternative methods that use only PHP, such as a function called pdf2string() (on php.net), but none of them worked as expected: with some PDF files they output incorrect text, with others they output nothing, and some versions of the function didn't work at all, so I ruled this option out. Is there any way to convert the open-source pdftotext into a PHP script? The source can be found at http://www.foolabs.com/xpdf/download.html and is in C++, I think. Any other solution is welcome as long as it gives me the correct text output of the PDF.
Since you have a restricted environment, you may want to look at this.
http://webcheatsheet.com/php/reading_clean_text_from_pdf.php
This uses no external library to convert PDF to text.
However, since it parses the text out of the raw PDF format, I'm not sure how stable it is.
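Before choosing between pdftotext and a pure-PHP parser, it may help to check whether the host really blocks shell_exec. The following is a minimal sketch; the helper name isFunctionUsable() is hypothetical and not part of PHP or of the linked article. It checks both function_exists() and the disable_functions ini setting, since hosts differ in how they report disabled functions.

```php
<?php
// Hypothetical helper (not from the original posts): reports whether a
// function exists and is not listed in the disable_functions ini setting.
function isFunctionUsable(string $name): bool
{
    if (!function_exists($name)) {
        return false;
    }
    // disable_functions is a comma-separated list of function names.
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));
    $disabled = array_map('strtolower', $disabled);

    return !in_array(strtolower($name), $disabled, true);
}

if (isFunctionUsable('shell_exec')) {
    echo "shell_exec is available; pdftotext can be tried\n";
} else {
    echo "shell_exec is disabled; fall back to a pure-PHP parser\n";
}
```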
I have PHP code running on Debian stretch with ImageMagick. It tries to convert SVG to another format. Here is how it starts
$im = new Imagick();
$im->readImageBlob($svg);
The variable $svg contains valid SVG code in a string. If I copy this string to a text file with a .svg extension then it opens just fine. But readImageBlob throws an exception saying no delegate for this image format.
I have seen similar questions solved by installing more packages to the system. But I've already installed libxml2-dev, librsvg2-bin, libmagickcore-6.q16-3-extra and libfreetype6-dev.
I have no idea what else I am missing.
I had to prepend <?xml version="1.0" ?> to $svg and it worked. At least it now reads in the SVG and attempts to create a PNG/JPEG. One piece of text gets misplaced during conversion, though, so the overall task still fails. But that is a separate issue; I think this question is answered.
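The fix described above can be sketched as follows. The helper name withXmlDeclaration() and the sample SVG string are made up for illustration; the Imagick part only runs if the extension is loaded, and it may still throw if no SVG delegate is installed.

```php
<?php
// Hypothetical helper: prepend an XML declaration if the SVG string lacks one.
function withXmlDeclaration(string $svg): string
{
    $trimmed = ltrim($svg);
    if (strncmp($trimmed, '<?xml', 5) === 0) {
        return $trimmed; // already has a declaration
    }
    return '<?xml version="1.0" ?>' . "\n" . $trimmed;
}

// Sample input, invented for this sketch.
$svg = '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"></svg>';
$svg = withXmlDeclaration($svg);

if (class_exists('Imagick')) {
    try {
        $im = new Imagick();
        $im->readImageBlob($svg);
        $im->setImageFormat('png');
        echo strlen($im->getImageBlob()), " bytes of PNG data\n";
    } catch (ImagickException $e) {
        echo 'Could not read the SVG: ', $e->getMessage(), "\n";
    }
}
```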
What do you get from running the following command from PHP exec()?
convert -list format
The SVG line should say RSVG or MSVG/XML. Does it show that? If you need RSVG, you will have to install that delegate and then reinstall ImageMagick so that ImageMagick can find it. Imagick uses ImageMagick; they are not the same thing. The RSVG delegate can be found by a Google search or from linuxfromscratch.org/blfs/view/svn/general/librsvg.html. Your SVG file seems to render properly for me in ImageMagick using RSVG, but I am not sure what it should look like. It appears to be just a graph made of a set of horizontal lines.
I do not know much about using readImageBlob(). Just use readImage(), where you supply the path to a saved svg image file. That should work. Try that and see what you get.
Here is what I get using RSVG 2.42.2 in ImageMagick 6.9.10.3 Q16 on Mac OS X, where I have saved your text in a file called test.svg.
convert test.svg test.png
If I force the use of the Imagemagick MSVG/XML, it does not look as good.
convert MSVG:test.svg test2.png
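The readImage() suggestion above could look roughly like this from PHP. This is a sketch under assumptions: saveTempSvg() is a hypothetical helper, the sample SVG is invented, and the output file name test.png in the temp directory is arbitrary. The Imagick calls only run if the extension is loaded.

```php
<?php
// Hypothetical helper: save an SVG string to a temporary file with a
// .svg extension and return its path.
function saveTempSvg(string $svg): string
{
    $tmp = tempnam(sys_get_temp_dir(), 'svg');
    if ($tmp === false) {
        throw new RuntimeException('Could not create a temporary file');
    }
    $path = $tmp . '.svg'; // give ImageMagick an extension to go by
    rename($tmp, $path);
    file_put_contents($path, $svg);

    return $path;
}

// Sample input, invented for this sketch.
$path = saveTempSvg('<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10"></svg>');

if (class_exists('Imagick')) {
    try {
        $im = new Imagick();
        $im->readImage($path); // read from a file path instead of a blob
        $im->setImageFormat('png');
        $im->writeImage(sys_get_temp_dir() . '/test.png');
    } catch (ImagickException $e) {
        echo 'ImageMagick could not read the SVG: ', $e->getMessage(), "\n";
    }
}
```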
I am having trouble capturing the following dynamic image to disk; all I get is a 1K file:
http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER
I have set up PHP cURL and it works just fine on static images, but it does not work for the above link. Similarly, copy() and file_put_contents(file_get_contents()) all work fine for static images. There are plenty of references on SO for the usage of these PHP functions, so I will not get into details here. Just the copy command:
copy('http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER', 'precip5.png');
The behavior is the same (precip5.png comes out at 760 bytes) on my Windows development box and my Linux staging box, so I can rule out OS issues. Again, all the PHP functions do exactly the same thing: they generate a file, but an empty one. The command-line curl program also generates that same junk 1K file.
So the issue seems to be the source, and the best I can tell is that it is a dynamic (streaming?) image.
Ideally, I would like this to be done in PHP or with a command-line utility like curl. I am trying to avoid adding a Java (imageio) dependency just for this, until I absolutely have to go there...
I am trying to understand the nature of the beast (the image) first ;-)...
The URL you are saving produces HTML output, not the image. You are missing the parameter &print=1
http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER&print=1
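A minimal sketch of the corrected download follows. The helper names withQueryParam() and looksLikePng() are hypothetical, and the signature check assumes the endpoint returns a PNG, as the asker's file name suggests. The check is there to catch the case where an HTML page is saved under an image name. The URL helper ignores ports, credentials, and fragments for brevity.

```php
<?php
// Hypothetical helper: add or overwrite one query parameter on a URL.
function withQueryParam(string $url, string $key, string $value): string
{
    $parts = parse_url($url);
    parse_str($parts['query'] ?? '', $query);
    $query[$key] = $value;
    $base = $parts['scheme'] . '://' . $parts['host'] . ($parts['path'] ?? '');

    return $base . '?' . http_build_query($query, '', '&');
}

// True if the bytes start with the 8-byte PNG file signature.
function looksLikePng(string $bytes): bool
{
    return strncmp($bytes, "\x89PNG\r\n\x1a\n", 8) === 0;
}

// Downloads $url to $dest only if the response looks like a PNG.
function savePng(string $url, string $dest): bool
{
    $bytes = @file_get_contents($url);
    if ($bytes === false || !looksLikePng($bytes)) {
        return false; // network error, or HTML/other content instead of an image
    }
    return file_put_contents($dest, $bytes) !== false;
}

$url = withQueryParam(
    'http://water.weather.gov/precip/save.php?timetype=RECENT&loctype=NWS&units=engl&timeframe=current&product=observed&loc=regionER',
    'print',
    '1'
);
echo $url, "\n";
// Usage (requires network access): savePng($url, 'precip5.png');
```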
Below is the error that I am getting while I use the code
$output = shell_exec("/usr/bin/ebook-convert test.epub mech4eck.pdf");
echo $output;
I need to run this with PHP only, and so I am trying to execute the shell commands. I am using Ubuntu 12.
No write acces to /root/.config/calibre using a temporary dir instead
/opt/lampp/lib/libgcc_s.so.1: version `GCC_4.2.0' not found (required by /usr/lib/i386-linux-gnu/libstdc++.so.6)
/opt/lampp/lib/libgcc_s.so.1: version `GCC_4.2.0' not found (required by /usr/lib/i386-linux-gnu/libstdc++.so.6)
1% Converting input to HTML...
InputFormatPlugin: EPUB Input running on /opt/lampp/htdocs/test.epub
Found HTML cover content/calibre_title_page.html
Parsing all content...
34% Running transforms on ebook...
Merging user specified metadata...
Detecting structure...
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Cleaning up manifest...
Trimming unused files from manifest...
Creating PDF Output...
67% Creating PDF Output
I know no one here helped, but I figured it out on my own.
Here is the trick:
Rename the file /opt/lampp/lib/libgcc_s.so.1 to /opt/lampp/lib/libgcc_s.so.1.bak
Ta-da, it works :)
Hi,
I have a large batch of PDF documents that I want to read using a PHP script. I searched a lot, but everything I found is about creating PDF files. I don't want to create a PDF file; I want to read one. Is there any way to read it in PHP?
-Arun
To just get the text from a PDF file, try these:
- http://davidwalsh.name/read-pdf-doc-file-php
- http://www.webcheatsheet.com/php/reading_clean_text_from_pdf.php (more in-depth)
For a more heavyweight solution, have a look at:
- http://www.setasign.de/products/pdf-php-solutions/fpdi/
You can easily read the contents of a PDF file using a command-line utility like pdftotext, which you can call through exec.
This is an example of what I mean, actually using system():
system("pdftotext your.pdf /tmp/txtfile.txt");
$text = file_get_contents("/tmp/txtfile.txt");
EDIT
I didn't know about the dash syntax; this is even better:
$content = shell_exec('pdftotext your.pdf -');
This does require pdftotext to be installed on your server though. On a CentOS server this would be:
yum install xpdf
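When the PDF path comes from user input or contains spaces, the shell_exec call above is safer with the path escaped. A minimal sketch, assuming pdftotext is on the PATH and shell_exec is enabled; the helper names pdftotextCommand() and pdfToText() are hypothetical.

```php
<?php
// Hypothetical helper: build the pdftotext command with the path escaped.
function pdftotextCommand(string $pdfPath): string
{
    // "-" as the output file makes pdftotext write the text to standard output.
    return 'pdftotext ' . escapeshellarg($pdfPath) . ' -';
}

// Returns the extracted text, or null if the command produced no output or failed.
function pdfToText(string $pdfPath): ?string
{
    $output = shell_exec(pdftotextCommand($pdfPath));

    return is_string($output) ? $output : null;
}

echo pdftotextCommand('my file.pdf'), "\n";
// Usage (requires pdftotext to be installed): $text = pdfToText('your.pdf');
```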
Is it possible to somehow use PHP to read the contents of a .pst file?
There's a standalone program to convert PST to other formats (which may be then readable using PHP extensions, e.g. php_imap): http://www.five-ten-sg.com/libpst/
However, as Microsoft keeps changing the PST format, it's not guaranteed that you'll be able to convert all PST files.
Exporting folders from MS Outlook (FILE -> OPEN -> IMPORT -> EXPORT TO A FILE) into plain-text CSV enables easy parsing, e.g. via the fgetcsv function. libpst is only supported on Linux (RPM).
The PST file format is complex, and parsing it with PHP would be a tremendous job.
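Parsing such an export with fgetcsv could look roughly like this. It is a sketch only: readCsvRows() is a hypothetical helper, and the column names and sample row are made up, since the actual columns depend on what Outlook exports. The demo reads from an in-memory stream so it runs without a real export file.

```php
<?php
// Hypothetical helper: read CSV rows from an open stream into
// associative arrays keyed by the header row.
function readCsvRows($handle): array
{
    $rows = [];
    $header = fgetcsv($handle, 0, ',', '"', '\\');
    if (!is_array($header)) {
        return $rows; // empty input
    }
    while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($fields) !== count($header)) {
            continue; // skip blank or malformed lines
        }
        $rows[] = array_combine($header, $fields);
    }
    return $rows;
}

// Demo with an in-memory stream; the columns and values are invented.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "Subject,From\n\"Hello, world\",alice@example.com\n");
rewind($handle);
$rows = readCsvRows($handle);
fclose($handle);

echo $rows[0]['Subject'], "\n";
```

For a real export, open the file with fopen('export.csv', 'r') instead of the in-memory stream; the file name here is just a placeholder.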