extract images from PDF with PHP - php

The thing is that the client wants to upload a pdf with images as a way of batch processing multiple images at once.
I already looked around, and out of the box PHP can't read PDFs.
What are my alternatives?
I already know the host has not installed ImageMagick or any PDF library, and the exec function is disabled. That's basically leaving me with nothing to work with, I guess?
Does anyone know if there is an online service that can do this, with an API of sorts?
Thanks in advance.

AFAIK, there is no PHP module to do it. There is a command line tool, pdfimages (part of xpdf). For reference, here's how that works:
pdfimages -j source.pdf image
This extracts all images from source.pdf as image-000.jpg, image-001.jpg, etc. Note that -j only writes images that are stored as JPEG in the PDF as .jpg files; other images are written as PPM or PBM files.
Possible Options
Being a command line tool, you need exec (or system, passthru, any of the command executing functions built into PHP). As your environment doesn't have that, I see four options:
Beg that exec be turned on for you (your hosting provider can limit what you can exec to a single command)
Change the design -- how about a ZIP upload?
Roll your own, using the source code of pdfimages as a model
Let pdfimages do the heavy lifting, by running it on a remote host you do control
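Regarding #3, I don't think rolling your own, to solve a very narrowly defined set of requirements, would be too difficult. I seem to recall that the image boundaries in PDF are well defined: each image is a stream object, so you read the file up to the start of a stream, cut at its end, decode the data according to the stream's /Filter entry (JPEG data stored with /DCTDecode can be written out as-is; the data is not base64-encoded, so base64_decode will not help), and write it to a file -- repeat. However, that may be too much...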
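For the narrow case of embedded JPEGs, a roll-your-own extractor could look roughly like the sketch below. It assumes the images sit in plain stream objects whose dictionary mentions /DCTDecode, and that the word "stream" does not occur inside other content. The function name extractJpegStreams is made up for illustration, and this is nowhere near a real PDF parser (it ignores object streams, encryption and chained filters).

```php
<?php
/**
 * Naive sketch: return the raw bytes of every stream whose dictionary
 * mentions /DCTDecode (i.e. an embedded JPEG). Hypothetical helper,
 * not a full PDF parser.
 */
function extractJpegStreams(string $pdf): array
{
    $images = [];
    $offset = 0;
    while (($keyword = strpos($pdf, 'stream', $offset)) !== false) {
        $dataStart = $keyword + strlen('stream');
        // The keyword is followed by CRLF or LF before the data begins.
        if (substr($pdf, $dataStart, 2) === "\r\n") {
            $dataStart += 2;
        } elseif (substr($pdf, $dataStart, 1) === "\n") {
            $dataStart += 1;
        }
        $dataEnd = strpos($pdf, 'endstream', $dataStart);
        if ($dataEnd === false) {
            break; // truncated file
        }
        // Everything between the previous stream and this one holds the dictionary.
        $header = substr($pdf, $offset, $keyword - $offset);
        if (strpos($header, '/DCTDecode') !== false) {
            // Drop the end-of-line marker that precedes "endstream".
            $images[] = rtrim(substr($pdf, $dataStart, $dataEnd - $dataStart), "\r\n");
        }
        $offset = $dataEnd + strlen('endstream');
    }
    return $images;
}
```

Each returned string can be written straight to a .jpg file with file_put_contents. Images stored with other filters (for example /FlateDecode) would need real decoding and are skipped here.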
If rolling your own is too complicated, then option #4 is kind of like what Joel Spolsky describes for working with complicated Excel objects (see the numbered list under the bold heading "Let Office do the heavy work for you").
Find a cheap hosting environment (e.g. Amazon EC2) that lets you exec and curl
Install pdfimages
Write a PHP script that takes a URL to a PDF, curl opens that PDF, writes it to disk, passes it to pdfimages, then returns the URL to the resulting images.
An example exchange could look like this:
GET http://www.cheaphost.com/pdfimages.php?extract=http://www.limitedhost.com/path/to/uploaded.pdf
Content-type: text/html
<html>
<body>
<ul>
<li>http://www.cheaphost.com/pdfimages.php?retrieve=ab9895v/image-000.jpg</li>
<li>http://www.cheaphost.com/pdfimages.php?retrieve=ab9895v/image-001.jpg</li>
</ul>
</body>
</html>
So your single pdfimages.php script (running on the host with the exec functionality) can both extract images and give you access to the extracted images. When extracting, it reads the PDF you point it at, runs pdfimages on it, and gives you back a list of URLs to call to retrieve the extracted images. When retrieving, it just gives you back a straight image.
You would need to deal with cleanup; perhaps the thing to do would be to delete each image after retrieval. You would also need to handle security -- I don't know what's in these images, but the content might need to be wrapped in SSL, with other precautions taken.
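The pieces of such a pdfimages.php script that are easiest to get wrong are the shell quoting and the retrieve parameter, so here is a minimal sketch of just those. All three function names, the job-id layout and the storage directory are assumptions for illustration; downloading the PDF, calling exec and cleaning up are left out.

```php
<?php
/** Build the pdfimages call with both paths quoted for the shell. */
function pdfimagesCommand(string $pdfPath, string $outputPrefix): string
{
    return 'pdfimages -j ' . escapeshellarg($pdfPath) . ' ' . escapeshellarg($outputPrefix);
}

/** Turn the extracted files of one job into ?retrieve=... URLs. */
function retrieveUrls(string $scriptUrl, string $jobId, array $imagePaths): array
{
    $urls = [];
    foreach ($imagePaths as $path) {
        $query  = http_build_query(['retrieve' => $jobId . '/' . basename($path)], '', '&');
        $urls[] = $scriptUrl . '?' . $query;
    }
    return $urls;
}

/**
 * Map a ?retrieve= value to a file path, or null if it does not look
 * like "<jobid>/image-NNN.<ext>". This blocks "../" style requests.
 */
function resolveRetrievePath(string $storageDir, string $requested): ?string
{
    if (!preg_match('#^[A-Za-z0-9]+/image-\d+\.(?:jpg|ppm|pbm)$#D', $requested)) {
        return null;
    }
    return rtrim($storageDir, '/') . '/' . $requested;
}
```

Note that http_build_query percent-encodes the slash (ab9895v%2Fimage-000.jpg), which PHP decodes back to the same value in $_GET['retrieve'].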

You can use pdfimages and install it this way:
apt install poppler-utils
Then use it this way to get the images as PNG files:
pdfimages -png mypdf.pdf image
Images will be placed in the current folder as image-000.png, image-001.png, etc.
There are many options available, including some to change the output format, more information here.
I hope this helps!

Related

Convert Video to .flv?

Could someone please advise on what my options are when it comes to video type conversion in PHP. I have just discovered that our system uses something called ffmpeg. This isn't a problem but when a website is transferred it does create a problem as this absolute command breaks websites.
system ('/usr/bin/ffmpeg -i '.$video.' -y -f flv -qmin 5 -qmax 9 -ar 22050 '.DATA_DIR . $new_filename);
As you can see, a transferred website would require to have this path on their host and most don't.
So the question is this. I need to replace this. Is there some sort of PHP script or API that will make this work?
Is there any option other than pinging our own servers with the video and our video sending back the video in the new format?
Thanks.
Is there some sort of PHP script or API that will make this work?
No. This is well beyond the scope of PHP. FFmpeg is indeed the household name for video conversion - the best thing is probably to stick with that.
One workaround would be to set up a conversion service script on a server that supports ffmpeg, and all the other web sites sending the material to that server (if file sizes and traffic rates allow.)
There is a PHP ffmpeg library, but you can also just install the Linux version of ffmpeg alongside your application and change the path in the command to point to it.
No, there are no native PHP alternatives to ffmpeg for transcoding videos, so you must work around that somehow.
As mentioned before, there is no PHP extension that does video conversion (the ffmpeg-php extension cannot convert videos) - you will have to call something outside PHP to get the actual video conversion done.
I see two possible problems on the "transferred websites":
If it is simply a path problem: look at this page for how to call ffmpeg - you should not have to include the "/usr/bin/" part in your command.
If the problem is that you cannot install ffmpeg on the transferred websites, you can do two things, depending on which drawback is more acceptable:
You may convert all videos to .flv beforehand, and serve them either from the transferred websites or from your own servers. Use that method for videos that will be watched often, or whose converted version will be watched often.
The transferred websites will point to a video stream from your own servers, which will handle the on-the-fly conversion. Do that for videos that will not be watched as often.
Feel free to install ffmpeg into your home directory on your hosting provider; many, if not most, hosts allow you to install programs in addition to scripts.
However, please do not place this code on a production system. Or, any computer you care about. If some smartass uploads a video named
Puppy;/bin/rm -rf /;.avi
then you can kiss all your data goodbye. If it is named:
Puppy;`nc -l 11111`;.avi
then they have a shell they can use for whatever they please.
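One way to close that particular hole is to quote every value that ends up on the command line. The sketch below rebuilds the command from the question with escapeshellarg; the function name buildFlvCommand is made up, and the bare "ffmpeg" default assumes the binary is on the PATH.

```php
<?php
/**
 * Build the conversion command with every user-influenced value quoted.
 * $ffmpeg defaults to the bare name so the shell resolves it via PATH.
 */
function buildFlvCommand(string $input, string $output, string $ffmpeg = 'ffmpeg'): string
{
    return sprintf(
        '%s -i %s -y -f flv -qmin 5 -qmax 9 -ar 22050 %s',
        escapeshellcmd($ffmpeg),
        escapeshellarg($input),
        escapeshellarg($output)
    );
}

// The hostile file name is now passed to ffmpeg as one literal argument.
$cmd = buildFlvCommand('Puppy;/bin/rm -rf /;.avi', '/tmp/out.flv');
echo $cmd, PHP_EOL;
```

Storing uploads under a server-generated name (and keeping the original name only in the database) avoids the problem at the source; quoting is still worth doing as a second line of defence.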

OpenCart mp3 preview

What is the best method and player for giving an audio preview on an OpenCart store. This would involve uploading the full track and then extracting a portion to be played
mp3splt is by far your best bet.
It can sometimes be a little dicey to install (particularly on CentOS and other RH-based distros), but it's really the only solution I've found.
I usually run a script that analyzes the mp3 with getID3 to get the length, then I calculate the halfway point of the mp3 and pass that, plus the point thirty seconds later, to mp3splt via the exec command.
It works great when you can get it to install properly. If you're on debian/ubuntu it's actually a cinch to install via aptitude.
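The halfway-point calculation described above could be sketched like this. The duration is passed in as a plain number (in practice it would come from getID3's playtime_seconds); the function names and the choice of the -d output-directory option are assumptions, and mp3splt's minutes.seconds time notation is used for the split points.

```php
<?php
/** Format a number of seconds in mp3splt's minutes.seconds notation. */
function toSplitTime(int $seconds): string
{
    return sprintf('%d.%02d', intdiv($seconds, 60), $seconds % 60);
}

/** Return [begin, end] in seconds for a preview starting at the halfway point. */
function previewWindow(float $durationSeconds, int $previewLength = 30): array
{
    $begin = (int) floor($durationSeconds / 2);
    // Never run past the end of a short track.
    $end = (int) min(floor($durationSeconds), $begin + $previewLength);
    return [$begin, $end];
}

/** Build the mp3splt call; the result is meant to be passed to exec(). */
function buildSplitCommand(string $mp3, float $durationSeconds, string $outDir): string
{
    [$begin, $end] = previewWindow($durationSeconds);
    return sprintf(
        'mp3splt -d %s %s %s %s',
        escapeshellarg($outDir),
        escapeshellarg($mp3),
        toSplitTime($begin),
        toSplitTime($end)
    );
}
```

For a 200-second track this yields the split points 1.40 and 2.10.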
The only other thing I could think to do would be to wrap your command-line Unix audio editing utilities in a PHP script to basically create a "grab 2 minute head of MP3" function, then run that on files when they are uploaded. Then yes, save them in a "previews" area of the file system and store the filename in a DB table for later reference.
I've found a PHP script that could fit your needs (please note I haven't tested it). You can find it here. The class interface seems simple and functional. Anyway, you will need to modify your OpenCart product template to expose the preview command.

Server-side conversion of raster images to vector images

I would like to convert images that have been uploaded by the user (in various formats and conditions) to a vector image format such as .eps. I'm primarily working in PHP.
What options exist?
There are a small number of autotracing software projects released under the GPL (for example, Potrace) that you could run via system commands. I can't attest to their quality. Tracing almost always requires some element of human supervision to avoid things looking like a mess of broken pottery, but you won't know until you try. Rather than triggering the tracer via PHP, I would use PHP simply to save incoming images to a temporary folder and then, through a cronjob (once or twice per minute), crank through the holding folder in batches (you could pace it that way and avoid it being used as a way to DoS your site).
I'm thinking of doing something slightly similar (though not graphic related) for an upcoming project, and I'm considering doing all my heavy lifting on a desktop machine, which would fetch all incoming files and process them before FTPing them back to the server. I'm somewhat nervous about having any complex resource-intensive script like this running on a web server.
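The holding-folder idea from this answer could be sketched as the two helpers below, meant to be called from a script that cron runs every minute. The function names, the .bmp extension and the batch size are assumptions; Potrace expects bitmap input (such as BMP or PNM), so uploads in other formats would need converting first.

```php
<?php
/** Pick at most $limit bitmaps from the holding folder, oldest first. */
function takeBatch(string $holdingDir, int $limit): array
{
    $files = glob(rtrim($holdingDir, '/') . '/*.bmp') ?: [];
    usort($files, fn(string $a, string $b): int => filemtime($a) <=> filemtime($b));
    return array_slice($files, 0, $limit);
}

/** Build the Potrace call that traces one bitmap into an EPS file. */
function potraceCommand(string $bitmap, string $eps): string
{
    return 'potrace --eps ' . escapeshellarg($bitmap) . ' -o ' . escapeshellarg($eps);
}

// In the cron script (requires exec to be available):
// foreach (takeBatch('/path/to/holding', 5) as $bitmap) {
//     exec(potraceCommand($bitmap, $bitmap . '.eps'));
//     unlink($bitmap); // remove from the queue once traced
// }
```

Capping the batch size is what keeps a flood of uploads from turning into a flood of tracer processes.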
You can definitely do this with Inkscape.
Here is the list of formats it supports: What formats can Inkscape import/export?
And it can of course be used from the command line or via exec(): Can Inkscape be used from the command line?
Imagetracer is a free and open source (Public Domain) library and application which can be used on the server side. Disclaimer: I made these.
You can use ImageTracer.jar from
https://github.com/jankovicsandras/imagetracerjava
like this with PHP:
<?php exec("java -jar ImageTracer.jar input.png outfilename output.svg"); ?>
You can also use the JavaScript version with Node.js on the server side, here's the example code:
https://github.com/jankovicsandras/imagetracerjs/tree/master/nodecli
https://github.com/jankovicsandras/imagetracerjs/blob/master/nodetest/nodetest.js
PHP is not an image editor. It is a hypertext preprocessor.
You have to move to serverfault.com, or even better on some image processing resource, and ask there for some command line utility that can be run from PHP using the system() command.

Pdf on web page: best solution

I need to include PDF files in some web pages, and I'm running into trouble.
The app is a simple newspaper archive, in which pages can be read right on the page or downloaded as PDF files, one file per page. What my customer can provide is one PDF file for each page; what my customer wants is to navigate them through indexes (with page thumbnails) and read a chosen one directly in the page. I'm using PHP/MySQL.
I started by trying the <object> tag with type="application/pdf", but I found it is deprecated because it is not cross-platform at all (there is no support in Linux browsers, and even my Windows Firefox 3.5 couldn't show me anything).
I guessed I could transform the PDF into something different (HTML or simply images would be good enough), but the only thing I found is ImageMagick, which I cannot use because it must be installed on the server and I'm not the admin of that machine.
So, I'm looking for suggestions.
Thanks
Display the PDF inline using an IFRAME. The thumbnail you can generate with ImageMagick. You should be able to use the command-line version of ImageMagick to resize and convert to JPG.
edit
Your best bet is to talk to the server admin and have them install PHP support for ImageMagick; then you can use it as a class.
If you can't get support to install on the server, you will have to use the command line version.
You might be able to Google around for a library that wraps the command line, but it would be trivial to write it yourself.
With this in place you can create a large readable black and white png for each page. It should click through to the pdf.
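Wrapping the command-line version really is only a few lines. The sketch below builds a convert call for the first page of a PDF; the function name and the chosen options (72 dpi, grayscale, width-only resize) are assumptions, and ImageMagick normally needs Ghostscript installed to read PDFs at all.

```php
<?php
/**
 * Build an ImageMagick "convert" call that renders page 1 of a PDF
 * as a grayscale PNG thumbnail of the given width.
 */
function thumbnailCommand(string $pdf, string $png, int $width = 200): string
{
    return sprintf(
        'convert -density 72 %s -colorspace Gray -resize %dx %s',
        escapeshellarg($pdf . '[0]'), // "[0]" selects the first page
        $width,
        escapeshellarg($png)
    );
}

// Usage (requires exec and ImageMagick on the server):
// exec(thumbnailCommand('archive/page1.pdf', 'thumbs/page1.png'));
```

A larger width with a higher -density value would give the big readable page image mentioned above; the thumbnail and the readable version differ only in those two numbers.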

Website screenshots

Is there any way of taking a screenshot of a website in PHP, then saving it to a file?
LAST EDIT: after 7 years I'm still getting upvotes for this answer, but I guess this one is now much more accurate.
Sure you can, but you'll need to render the page with something.
If you really want to use only PHP, I suggest HTMLTOPS, which renders the page and outputs it as a PS file (Ghostscript), which you can then convert to .jpg, .png, .pdf, etc. It can be a little slower with complex pages (and doesn't support all CSS).
Otherwise, you can use wkhtmltopdf to output an HTML page as PDF, JPG, or whatever.
It supports CSS 2.0 and uses WebKit (Safari's rendering engine) to render the page, so it should be fine.
You have to install it on your server as well.
UPDATE Now, with the new HTML5 and JS features, it is also possible to render the page into a canvas object using JavaScript. Here is a nice library to do that: Html2Canvas, and here is an implementation by the same author to collect feedback like G+ does.
Once you have rendered the dom into the canvas, you can then send to the server via ajax and save it as a jpg.
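On the PHP side, the receiving script mostly has to turn the canvas data URL back into bytes. A minimal sketch, assuming the browser posts the string produced by canvas.toDataURL() in a field called image (the field name and the function name are made up):

```php
<?php
/**
 * Decode a canvas data URL ("data:image/png;base64,....") into raw
 * image bytes. Returns null for anything that is not a PNG/JPEG data URL.
 */
function decodeCanvasDataUrl(string $dataUrl): ?string
{
    $prefixes = ['data:image/png;base64,', 'data:image/jpeg;base64,'];
    foreach ($prefixes as $prefix) {
        if (strncmp($dataUrl, $prefix, strlen($prefix)) === 0) {
            // Strict mode rejects characters outside the base64 alphabet.
            $bytes = base64_decode(substr($dataUrl, strlen($prefix)), true);
            return $bytes === false ? null : $bytes;
        }
    }
    return null;
}

// Usage inside the upload handler:
// $bytes = decodeCanvasDataUrl($_POST['image'] ?? '');
// if ($bytes !== null) {
//     file_put_contents('screenshots/' . uniqid('shot_') . '.jpg', $bytes);
// }
```

Since the data comes from the client, it is only as trustworthy as any other upload; checking the result with getimagesizefromstring before keeping it would be a sensible extra step.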
EDIT: You can use the imagemagick tool for transforming pdf to png. My version of wkhtmltopdf does not support images. E.g. convert html.pdf -append html.png.
EDIT: This small shell script gives a simple / but working usage example on linux with php5-cli and the tools mentioned above.
EDIT: I have now noticed that the wkhtmltopdf team is working on another project, wkhtmltoimage, which gives you the JPG directly.
Since PHP 5.2.2 it is possible to capture a website with PHP alone!
imagegrabscreen — Captures the whole screen
<?php
$img = imagegrabscreen();
imagepng($img, 'screenshot.png');
?>
imagegrabwindow - Grabs a window or its client area using a windows handle (HWND property in COM instance)
<?php
$Browser = new COM('InternetExplorer.Application');
$Browserhandle = $Browser->HWND;
$Browser->Visible = true;
$Browser->Fullscreen = true;
$Browser->Navigate('http://www.stackoverflow.com');
while ($Browser->Busy) {
    com_message_pump(4000);
}
$img = imagegrabwindow($Browserhandle, 0);
$Browser->Quit();
imagepng($img, 'screenshot.png');
?>
Edit: Note, these functions are available on Windows systems ONLY!
If you don't want to use any third-party tools, I have come across a simple solution that uses the Google PageSpeed Insights API.
You just need to call its API with the parameter screenshot=true.
https://www.googleapis.com/pagespeedonline/v1/runPagespeed?url=https://stackoverflow.com/&key={your_api_key}&screenshot=true
For the mobile site view, pass &strategy=mobile in the parameters:
https://www.googleapis.com/pagespeedonline/v1/runPagespeed?url=http://stackoverflow.com/&key={your_api_key}&screenshot=true&strategy=mobile
DEMO.
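When building that request in PHP, the page URL should be percent-encoded rather than pasted in raw. A small sketch, with a made-up function name; the endpoint is copied verbatim from this answer, and the sketch does not check whether that API version is still being served.

```php
<?php
/**
 * Build the PageSpeed request URL shown above with properly encoded
 * parameters. Pass 'mobile' as $strategy for the mobile view.
 */
function pagespeedScreenshotUrl(string $pageUrl, string $apiKey, ?string $strategy = null): string
{
    $params = ['url' => $pageUrl, 'key' => $apiKey, 'screenshot' => 'true'];
    if ($strategy !== null) {
        $params['strategy'] = $strategy;
    }
    return 'https://www.googleapis.com/pagespeedonline/v1/runPagespeed?'
        . http_build_query($params, '', '&');
}

echo pagespeedScreenshotUrl('https://stackoverflow.com/', 'KEY', 'mobile'), PHP_EOL;
```

The response is JSON, so the result of file_get_contents on this URL would go through json_decode before the screenshot data can be used.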
You can use simple headless browser like PhantomJS to grab the page.
Also you can use PhantomJS with PHP.
Check out this little php script that do this. Take a look here https://github.com/microweber/screen
And here is the API- http://screen.microweber.com/shot.php?url=https://stackoverflow.com/questions/757675/website-screenshots-using-php
There are a lot of options, and they all have their pros and cons. Here is a list of options ordered by implementation difficulty.
Option 1: Use an API (the easiest)
ApiFlash (based on chrome)
EvoPDF (has an option for html)
Grabzit
...
Pros
Execute Javascript
Near perfect rendering
Fast when caching options are correctly used
Scale is handled by the APIs
Precise timing, viewport, ...
Most of the time they offer a free plan
Cons
Not free if you plan to use them a lot
Option 2: Use one of the many available libraries
dom-to-image
wkhtmltoimage (included in the wkhtmltopdf tool)
phpwkhtmltopdf
...
Pros
Conversion is quite fast most of the time
Cons
Bad rendering
Does not execute javascript
No support for recent web features (FlexBox, Advanced Selectors, Webfonts, Box Sizing, Media Queries, HTML5 tags...)
Sometimes not so easy to install
Complicated to scale
Option 3: Use PhantomJs and maybe a wrapper library
PhantomJs
php-phantomjs (php wrapper library for PhantomJs)
...
Pros
Execute Javascript
Quite fast
Cons
Bad rendering
PhantomJs has been deprecated and is not maintained anymore.
No support for recent web features (FlexBox, Advanced Selectors, Webfonts, Box Sizing, Media Queries, HTML5 tags...)
Complicated to scale
Not so easy to make it work if there are images to be loaded ...
Option 4: Use Chrome Headless and maybe a wrapper library
Chrome Headless
chrome-devtools-protocol
puphpeteer
...
Pros
Execute Javascript
Near perfect rendering
Cons
Not so easy to have exactly the wanted result regarding:
page load timing
proxy integration
auto scrolling
...
Complicated to scale
Quite slow and even slower if the html contains external links
Disclaimer: I'm the founder of ApiFlash. I did my best to provide an honest and useful answer.
Well, PhantomJS is a browser that can be easily put on a server and integrate it to php. You can find the code in WDudes. They have included lot more features like specifying the image size, cache, download as a file or display in img src etc.
<img src="screenshot.php?url=google.com" />
URL Parameters
Width and Height: screenshot.php?url=google.com&w=1000&h=800
With cropping:
screenshot.php?url=google.com&w=1000&h=800&clipw=800&cliph=600
Disable cache and load a fresh screenshot:
screenshot.php?url=google.com&cache=0
To download the image: screenshot.php?url=google.com&download=true
You can see the tutorial here: Capture Screenshot of a Website using PHP without API
cutycapt saves webpages to most image formats(jpg,png..) download it from your synaptic, it works much better than wkhtmltopdf
I finally settled on microweber/screen, as proposed by @boksiora.
Initially, when trying the link mentioned there, here is what I got:
Please download this script from here https://github.com/microweber/screen
I'm on Linux, so if you want to run it you may need to adjust my steps to your environment.
Here are the steps I ran in my shell in the DOCUMENT_ROOT folder:
$ sudo wget https://github.com/microweber/screen/archive/master.zip
$ sudo unzip master.zip
$ sudo mv screen-master screen
$ sudo chmod +x screen/bin/phantomjs
$ sudo yum install fontconfig
$ sudo yum install freetype*
$ cd screen
$ sudo curl -sS https://getcomposer.org/installer | php
$ sudo php composer.phar update
$ cd ..
$ sudo chown -R apache screen
$ sudo chgrp -R www screen
$ sudo service httpd restart
Point your browser to screen/demo/shot.php?url=google.com. When you see the screenshot, you are done. Discussion for more advance setting is available here and here.
There are many open source projects that can generate screenshots. For example PhantomJS, webkit2png etc
The big problem with these projects is that they are based on older browser technology and have problems rendering many sites, especially sites that use webfonts, flexbox, svg and various other additions to the HTML5 and CSS spec over the last couple of months/years.
I've tried a few of the third party services, and most are based on PhantomJS, meaning they also produce poor quality screenshots. The best third party service for generating website screenshots is urlbox.io. It is a paid service, although there is a free 7-day trial to test it out without committing to any paid plan.
Here is a link to the documentation, and below are simple steps to get it working in PHP with composer.
// 1 . Get the urlbox/screenshots composer package (on command line):
composer require urlbox/screenshots
// 2. Set up the composer package with Urlbox API credentials:
$urlbox = UrlboxRenderer::fromCredentials('API_KEY', 'API_SECRET');
// 3. Set your options (all options such as full page/full height screenshots, retina resolution, viewport dimensions, thumbnail width etc can be set here. See the docs for more.)
$options['url'] = 'example.com';
// 4. Generate the Urlbox url
$urlboxUrl = $urlbox->generateUrl($options);
// $urlboxUrl is now 'https://api.urlbox.io/v1/API_KEY/TOKEN/png?url=example.com'
// 5. Now stick it in an img tag, when the image is loaded in browser, the API call to urlbox will be triggered and a nice PNG screenshot will be generated!
<img src="$urlboxUrl" />
For e.g. here's a full height screenshot of this very page:
https://api.urlbox.io/v1/ca482d7e-9417-4569-90fe-80f7c5e1c781/8f1666d1f4195b1cb84ffa5f992ee18992a2b35e/png?url=http%3A%2F%2Fstackoverflow.com%2Fquestions%2F757675%2Fwebsite-screenshots-using-php%2F43652083%2343652083&full_page=true
I'm on Windows so I was able to use the imagegrabwindow function after reading the tip on here from stephan. I added in cropping (to get rid of the Browser header, scroll bars, etc.) and resizing to get a final image. Here's my code. Hope that helps someone.
I used bluga. The api allows you to take 100 snapshots a month without paying, but sometimes it uses more than 1 credit for a single page. I just finished upgrading a drupal module, Bluga WebThumbs to drupal 7 which allows you to print a thumbnail in a template or input filter.
The main advantage to using this api is that it allows you to specify browser dimensions in case you use adaptive css, so I am using it to get renderings for the mobile and tablet layout as well as the regular one.
There are api clients for the following languages:
PHP,
Python,
Ruby,
Java,
.Net C#,
Perl
and Bash (the shell script looks like it requires perl)
It all depends on how you wish to take the screenshot.
You could do this via PHP, using a webservice to get the image for you
grabz.it has a webservice to do just this, here's an article showing a simple example of using the service.
http://www.phpbuilder.com/articles/news-reviews/miscellaneous/capture-screenshots-in-php-with-grabzit-120524022959.html
There are some ways in which you can achieve this in PHP, but realistically it's better to delegate this to a non-PHP based API which you can build yourself, or you can pay for. Many people have already listed screenshot APIs in the answers, and you can use any of those to achieve this. My own screenshot API is extremely well tested and covers many rendering cases that most APIs don't cover, but for most people, this is overkill, honestly.
My recommendation is to build your own API using Puppeteer, which is the canonical solution nowadays to build screenshot solutions. My service is built on top of Puppeteer and it really works well for most basic use cases.
You can build a serverless Puppeteer solution on AWS or GCP using something like https://www.npmjs.com/package/chrome-aws-lambda, which is an excellent serverless package for Puppeteer that comes pre-loaded with Chromium.
You can use https://grabz.it solution.
It's got a PHP API which is very flexible and can be called in different ways such as from a cronjob or a PHP web page.
In order to implement it you will need to first get an app key and secret and download the (free) SDK.
And an example for implementation. First of all initialization:
include("GrabzItClient.class.php");
// Create the GrabzItClient class
// Replace "APPLICATION KEY", "APPLICATION SECRET" with the values from your account!
$grabzIt = new GrabzItClient("APPLICATION KEY", "APPLICATION SECRET");
And screenshoting example:
// To take a image screenshot
$grabzIt->URLToImage("http://www.google.com");
// Or to take a PDF screenshot
$grabzIt->URLToPDF("http://www.google.com");
// Or to convert online videos into animated GIF's
$grabzIt->URLToAnimation("http://www.example.com/video.avi");
// Or to capture table(s)
$grabzIt->URLToTable("http://www.google.com");
Next is the saving. You can use one of the two save methods: Save if a publicly accessible callback handler is available, and SaveTo if not. Check the documentation for details.
After a lot of surfing on the web I found this.
PPTRAAS > A free tool to capture screenshot by passing your URL as a parameter
They provide multiple options by simply hitting their URL.
Get full page screenshot
https://pptraas.com/screenshot?url={YOUR URL HERE}
Get page screenshot of specific size
https://pptraas.com/screenshot?url={YOUR URL HERE}&size=400,400
One can even convert the page to pdf
https://pptraas.com/pdf?url={YOUR URL HERE}
Not directly. Software such as Selenium has features like this and can be controlled by PHP, but has other dependencies (such as running its Java-based server on the computer with the browser you want to screenshot).
you can use cutycapt .
kwhtml is deprecated and renders pages like an old browser.
I've found this to be the best and easiest tool around: ScreenShotMachine. It's a paid service, but you get 100 free screenshots and you can buy another 2,000 for (about) $20, so it's a pretty good deal. It has a very simple usage, you just use a URL, so I wrote this little script to save a file based on it:
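<?php
// Fetch the screenshot bytes from the API ({mykey} stands for your own API key).
$image = file_get_contents("http://api.screenshotmachine.com/?key={mykey}&url=https://stackoverflow.com&size=X");
if ($image === false) {
    die("could not fetch screenshot");
}
// Write the bytes to disk; the snapshots/ directory must already exist.
file_put_contents("snapshots/stack.jpg", $image);
die("saved file!");
?>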
They have a very good documentation here, so you should definitely take a look.
