Is there any script that can automatically take a screenshot of a webpage on a server on a daily basis and store the captured images?
First of all, let's define the task and understand its boundaries, because there is no simple, easy solution to your question.
To capture a screenshot of a web page, the page first has to be rendered, and that is quite a complicated process. You tagged PHP, so in short: you cannot do this using only PHP. I would recommend reading briefly about how web page rendering works. This article is a great source: https://www.html5rocks.com/en/tutorials/internals/howbrowserswork/
Hence, you need some combination of "services" that will render the page and then capture it as a bitmap (the graphics format does not matter). You can then fetch the result from PHP (over REST or any other suitable interface). Roughly speaking, you need a browser-like system (or the browser itself) that renders the page and returns a bitmap.
If you are looking for a simple, practical solution without too much burden, you have several options:
For a screenshot of any remote page you can use the paid API at https://thumbnail.ws/ . It has a free tier with limits.
For a screenshot and other thumbnail-related data you can use Google's PageSpeed API. Example code can be found at https://gist.github.com/ajdruff/e6b69e3eb5a3bc1dc081
Use one of the available extensions for Google Chrome or Firefox (or write your own in JavaScript) and consume the data it produces.
There are many packages out there for this purpose; one of them, for example, is Screen.
Here's an example of how to use it, assuming you've already installed it:
<?php
require './vendor/autoload.php';

use Screen\Capture;

$url = 'https://example.com'; // the web page you want to capture
$screenCapture = new Capture($url);
$screenCapture->save('./test'); // 'test' is the name of the screenshot (default type is 'jpg')
In order to take screenshots of web pages correctly, you first need to make sure the page rendered correctly, and the most convenient way to do that is to use a real browser.
As others mentioned, there are different options:
You can use a paid API to do it (a waste of money, IMHO).
You can write your own code to do it (which is not necessarily easy or safe).
You can use ready-to-use tools (mostly command-line tools):
cutycapt (cutycapt --url=<URL> --out=<filename>)
firefox (firefox -screenshot <filename> <URL>)
wkhtmltoimage (part of the wkhtmltopdf toolkit)
the Chrome browser (google-chrome --headless --disable-gpu --screenshot <URL>; see the sketch after this list)
You can run your own screenshot server if you are taking screenshots in bulk (for example, https://github.com/filovirid/screenshot_server)
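As a minimal sketch of the headless Chrome option combined with the "daily" requirement from the question (assuming google-chrome is installed on the server, PHP may run shell commands, and the script is triggered by cron; the paths and output directory are hypothetical):

<?php
// Capture one page with headless Chrome and keep dated copies.
// Schedule it daily with cron, e.g.:  0 3 * * * /usr/bin/php /path/to/capture.php

$url       = 'https://example.com';              // page to capture (placeholder)
$outputDir = __DIR__ . '/screenshots';           // hypothetical storage directory
$outfile   = $outputDir . '/' . date('Y-m-d') . '.png';

if (!is_dir($outputDir)) {
    mkdir($outputDir, 0755, true);
}

$cmd = sprintf(
    'google-chrome --headless --disable-gpu --screenshot=%s --window-size=1280,1024 %s',
    escapeshellarg($outfile),
    escapeshellarg($url)
);
shell_exec($cmd);

echo file_exists($outfile) ? "Saved $outfile\n" : "Capture failed\n";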
There is no ready-made script that automatically takes the screenshot for you, unless you develop software for it yourself.
Alternatively, you can write a Python script that scrapes the data from the web page daily and stores it in a file, or you can use Selenium for this purpose.
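The answer above mentions Python, but if you prefer to stay in PHP, the Selenium route might look roughly like this with the php-webdriver bindings (a sketch only, assuming a Selenium server is listening on localhost:4444 with ChromeDriver available):

<?php
require 'vendor/autoload.php';

use Facebook\WebDriver\Remote\RemoteWebDriver;
use Facebook\WebDriver\Remote\DesiredCapabilities;

// Connect to a Selenium server (assumed to be running on localhost:4444).
$driver = RemoteWebDriver::create('http://localhost:4444/wd/hub', DesiredCapabilities::chrome());

// Load the page and save a dated screenshot; run this daily via cron.
$driver->get('https://example.com');
$driver->takeScreenshot(__DIR__ . '/screenshots/' . date('Y-m-d') . '.png');

$driver->quit();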
Related
I just need some clarity on whether this concept is possible or whether I have misunderstood what crawlers are capable of.
Say I have a list of 100 websites/blogs, and every day my program (I am assuming it's a crawler of some kind) will crawl through them. If there is a match for specified phrases like "miami heat" or "lebron james", it will download that page, convert it to a PDF with full text/images, and save that PDF.
So my questions are:
Is this type of thing possible? Please note that I don't want just text snippets; I am hoping to get the entire page, as if it had been printed out on a piece of paper.
Are these types of programs called crawlers?
I am planning to build on the code from http://phpcrawl.cuab.de/about.html
This is perfectly possible. Since you are going to use phpcrawl to crawl the web pages, use wkhtmltopdf to convert the HTML to PDF as-is.
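A rough sketch of how the two might fit together (the class/property names such as $DocInfo->url and $DocInfo->source follow the PHPCrawl examples as I recall them, so double-check them against the library's docs; the phrases, paths, and site URL are placeholders):

<?php
require 'libs/PHPCrawler.class.php';

class PhraseCrawler extends PHPCrawler
{
    public $phrases = array('miami heat', 'lebron james'); // phrases to match (placeholders)

    public function handleDocumentInfo($DocInfo)
    {
        foreach ($this->phrases as $phrase) {
            if (stripos($DocInfo->source, $phrase) !== false) {
                // Hand the matching page off to wkhtmltopdf for a full-page PDF.
                $pdf = '/tmp/' . md5($DocInfo->url) . '.pdf';
                shell_exec('wkhtmltopdf ' . escapeshellarg($DocInfo->url) . ' ' . escapeshellarg($pdf));
                break;
            }
        }
    }
}

$crawler = new PhraseCrawler();
$crawler->setURL('http://example-blog.com'); // one of your 100 sites (placeholder)
$crawler->addContentTypeReceiveRule('#text/html#');
$crawler->go();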
Yes, it is possible. By using the wkhtmltopdf tool you can convert the web page as-is. It's desktop-based software, so you can install it on your machine.
Yes, crawlers.
It's a perfect tool for building what you want to build.
Yes, it is possible.
You could call it a crawler or a scraper, since you are scraping data off the websites.
Rendering the website to a PDF will probably be the hardest part; there are web services that can do this for you.
For example:
http://pdfmyurl.com/
(I have no affiliation and have never used them; it was just the first site in the Google results when I checked.)
I want the option of converting HTML to an image and showing the result to the user. I would be creating an $html variable with PHP, and instead of displaying it with echo $html, I want to display it as an image so the user can save the file if they need to.
I was hoping there would be something as simple as $image = convertHTML2Image($html); :p Does that exist?
Thanks!!
As Pekka says, turning HTML code into an image is a job for a full-blown web browser.
If you want to do this sort of thing, you therefore need to have a script that does the following:
Opens the page in a browser.
Captures the rendered page from the browser as a graphic.
Outputs that graphic to your user.
Traditionally, this would have been a tough task, because web browsers are typically driven by the user and not easy to automate in this way.
Fortunately, there is now a solution, in the form of PhantomJS.
PhantomJS is a headless browser, designed for exactly this kind of thing -- automated tasks that require a full-blown rendering engine.
It's basically a full browser, but without the user interface. It renders the page content exactly as another browser would (it's based on WebKit, so results are similar to Chrome's), and it can be controlled by a script.
As it says on the PhantomJS homepage, one of its target use-cases is for taking screenshots or thumbnail images of websites.
(another good use for it is automated testing of your site, where it is also a great tool)
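As a rough sketch of that screenshot use-case (assuming the phantomjs binary is installed and callable from PHP; the file names are arbitrary):

<?php
// Render a page to PNG with PhantomJS and hand the file back to PHP.
$rasterize = <<<'JS'
var page = require('webpage').create();
page.viewportSize = { width: 1280, height: 1024 };
page.open('https://example.com', function (status) {
    if (status === 'success') {
        page.render('screenshot.png');   // write the rendered page to disk
    }
    phantom.exit();
});
JS;

file_put_contents('/tmp/rasterize.js', $rasterize);
shell_exec('phantomjs /tmp/rasterize.js');

// screenshot.png can now be sent to the user, e.g.:
// header('Content-Type: image/png'); readfile('screenshot.png');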
Hope that helps.
This is not possible in pure PHP.
What you call "converting" is in fact a huge, non-trivial task: the HTML page has to be rendered. To do this in PHP, you'd have to rewrite an entire web browser.
You'll either have to use an external tool (which usually taps into a browser's rendering engine) or a web service (which does the same).
It is possible to convert HTML to an image. However, first you must convert it to a PDF; see link.
You may have a look at dompdf, which is a PHP library for converting an HTML file to a PDF.
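A minimal sketch with a Composer-installed dompdf (the HTML string and output filename are just placeholders):

<?php
require 'vendor/autoload.php';

use Dompdf\Dompdf;

// Render an HTML string and save the result as a PDF.
$dompdf = new Dompdf();
$dompdf->loadHtml('<h1>Hello</h1><p>Example content.</p>');
$dompdf->setPaper('A4');
$dompdf->render();
file_put_contents('output.pdf', $dompdf->output());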
Use wkhtmltopdf. It works like a charm and converts any page to a PDF.
A JPEG can be obtained by a subsequent conversion step.
http://code.google.com/p/wkhtmltopdf/
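As a sketch of that two-step idea (assuming wkhtmltopdf and ImageMagick's convert, with Ghostscript for PDF support, are installed; the URL and filenames are arbitrary):

<?php
// Step 1: page -> PDF with wkhtmltopdf. Step 2: PDF -> JPEG with ImageMagick.
$url = 'https://example.com';

shell_exec('wkhtmltopdf ' . escapeshellarg($url) . ' page.pdf');
shell_exec('convert -density 150 page.pdf[0] page.jpg'); // [0] = first page only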
I am just stuck making a choice between a PHP chart library and a JavaScript chart library. I understand that PHP is for the server side and JavaScript for the client side. My problem is: what difference does it make when using their charting libraries? Is it a performance issue, or something else?
I want to understand the difference between using PHP chart libraries and JavaScript chart libraries. Please note that I am not looking for examples of chart libraries; I am looking for why I should choose one over the other.
I tried to Google 'php chart vs javascript chart' but didn't get any links that explain the difference.
EDIT 1
1) If this question has been answered before, then point me there.
2) I am developing the application for the internet.
EDIT 2
1) I have found out about PHPChart, which has both PHP source code and JavaScript source code. If anyone has experience with that library, does it perhaps solve the problem of server-side load (bandwidth issues), etc.? I am thinking that since it has both PHP and JavaScript source, it may be the best to use. I am just assuming. :-)
Thank you very much
Both ways of creating graphs have their own pros and cons.
If you decide to do it using PHP, you first need to make sure that you have all the required graphics libraries installed (e.g. GD, which is not always available on shared hosts).
Assuming you have them, the first negative thing, in my opinion, is that you will end up with static images. Of course, that's not always a bad thing, as it ensures compatibility with all clients, with or without JavaScript support; however, it takes away the dynamics of graphs generated on the client side with JavaScript. Your users won't be able to zoom, pan, slide, go full screen, or do anything they could with the likes of Highcharts or Flot.
Another con is that images take up more bandwidth than, say, JSON. The bigger your graph and the more colors it contains, the longer your clients will have to wait for your page to load. And because those loads are not asynchronous, they will have to wait for the images before they see the rest of the page.
With JavaScript libraries everything is different. You only request the data required for your graph, and you only request it when your page loads. The amount of data is usually smaller than an image would be, plus you can compress the output with gzip to make it even smaller. Users will see nice spinners informing them that the graph is loading instead of an incomplete web page.
Another thing to take into account: what if you decide to show a nice table with the data below each graph? If you chose to render images on the server, you would end up having to add new functionality just to get the data. With JSON, however, you make one call, render the graph, and display the table. Maybe calculate totals or do whatever else you want with it; hand it out to people as an API if you wish.
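For illustration, the kind of minimal data endpoint this implies could be as small as this (the field names and values are made up):

<?php
// Hypothetical data endpoint: the client-side chart library fetches this JSON
// and renders both the graph and the data table in the browser.
header('Content-Type: application/json');

$points = array(
    array('month' => 'Jan', 'sales' => 120),
    array('month' => 'Feb', 'sales' => 95),
    array('month' => 'Mar', 'sales' => 143),
);

echo json_encode($points);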
If you ask me, I would definitely go with client-side graphs, as most devices have decent HTML5 support nowadays, and displaying a graph on an Android phone, an iPhone, or an iPad shouldn't pose a problem. If you only need images, or you don't wish to expose the original data, go with PHP.
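For completeness, a toy example of the server-side alternative with GD mentioned earlier (the data and sizes are arbitrary):

<?php
// Render a tiny bar chart on the server with GD and output it as PNG.
$data   = array(30, 80, 55, 120, 90);
$width  = 300;
$height = 150;

$img = imagecreatetruecolor($width, $height);
$bg  = imagecolorallocate($img, 255, 255, 255);
$bar = imagecolorallocate($img, 70, 130, 180);
imagefill($img, 0, 0, $bg);

$barWidth = (int) ($width / count($data));
foreach ($data as $i => $value) {
    $x1 = $i * $barWidth + 5;
    $y1 = $height - $value;
    imagefilledrectangle($img, $x1, $y1, $x1 + $barWidth - 10, $height, $bar);
}

header('Content-Type: image/png');
imagepng($img);
imagedestroy($img);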
My opinion is that a server-side solution (i.e. PHP) takes away any browser-compatibility issues you may have with a client-side solution (i.e. JavaScript), and hence support issues.
A benefit of using JS is that it offloads work from your server to the client, because you may only have to generate some lightweight data (e.g. JSON, XML) and the rendering occurs on the client. You will have to investigate how many hits your server is likely to get, etc., to determine whether resources are an issue with PHP or JS.
However, when using PHP to create chart images, you can always get around the performance/resource issue by keeping a cache of the image files and serving from the cache (it's just a folder of images) instead of generating a new one each time. Whether you can use a cache will depend on your usage: if clients require up-to-the-second data that is always changing, a cache obviously may not be of use.
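A rough sketch of that caching idea (the filename, the one-hour lifetime, and the render_chart() helper are all hypothetical):

<?php
// Serve a cached chart image if it is fresh enough; otherwise regenerate it.
$cacheFile = __DIR__ . '/cache/sales-chart.png'; // arbitrary cache location
$maxAge    = 3600;                               // one hour, adjust to taste

if (!file_exists($cacheFile) || time() - filemtime($cacheFile) > $maxAge) {
    $image = render_chart();  // hypothetical helper that builds the chart with GD
    imagepng($image, $cacheFile);
    imagedestroy($image);
}

header('Content-Type: image/png');
readfile($cacheFile);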
Here's what I see:
Using PHP
Increases load on the server for each request
Works everywhere
Also, as someone here mentioned (which made me think of it), you can cache the image that PHP gives you, reducing bandwidth (no library to download) and reducing load (thanks to the cache)
Using JavaScript
Decreases load but increases bandwidth and adds an HTTP request (to load the JS library)
Works wherever JS is available
But remember, running PHP puts more load on the server than serving a plain HTTP request.
Also, always remember: JavaScript is made for effects and the special things you need to display.
There is one advantage of PHP rendering that no one has mentioned: sometimes you need to include a chart as an image in a PDF, DOC, or XLS file, or email it. In that case you have no choice but to render the chart on the server and store it as an image to be inserted.
For data manipulation you use PHP.
For visual and behavioral effects you use JavaScript.
For that reason, you should use JavaScript, as it is designed for visual behavior. Plus, it will put less load on your server, since all the processing happens client-side. Otherwise, as more people use your application simultaneously, it will start to slow down because your server is doing a lot more than it has to.
Hope that helps :)
I am trying to generate a PDF from a web page that has pictures and SWF files.
The final PDF should include the pictures (the SWF should be converted into an image; the last frame is sufficient).
I am able to generate the PDF when only images are present, but I am stuck on creating the PDF when the web page has SWF files.
I've used wkhtmltopdf before to render PDFs programmatically from web sites. I'm not sure if it will cope with SWF, but it may, since it uses a version of WebKit compiled into Qt.
You might be able to use wkhtmltopdf --enable-plugins. But according to this bug report it might not work with the Flash plugin (Java, however, does!): http://code.google.com/p/wkhtmltopdf/issues/detail?id=48
Another option is running a browser in headless mode, or on a virtual X server. Firefox 3 supposedly works if you use the "CommandLinePrint" extension:
Xvfb :2 -screen 0 1600x1200x24 &
firefox --display=localhost:2.0 -print http://flashgames.com -printmode pdf -printfile '/tmp/test.pdf'
Infos stolen from http://spielwiese.la-evento.com/xelasblog/archives/31-Headless-Firefox-als-HTML-to-PDF.html (in German however).
But there are a few more guides like this ("headless browser, HTML to PDF"). I would normally link to one of the dupes here on Stack Overflow, but I'm too lazy to search right now.
Since you want to output the target page as a PDF, I would look at using .rdlc (Report Definition Language, Client-side). It is part of the Microsoft.Reporting namespace, is designed to work with ASP.NET, and is freely usable and redistributable.
In many cases the layout of a web page is not "printer friendly". With this technique you can rearrange the layout and spacing of the PDF output into a presentation that prints better.
This will not "directly" convert your page to a PDF, but rather lets you adapt your page layout and data into a dataset and use that to build a report. That report can then be output programmatically at runtime using the ReportViewer control. If this approach interests you, let me know and I will be glad to provide more help getting it set up and working.
We download images to our computers when we open web pages. For example: if a web page has an image (image.jpg), our computer downloads it while we are browsing that page.
Some web pages use Ajax methods. For example: you don't see an image in the page's source code, yet your computer downloads one, because if you click a link on that page, Ajax will display that image...
Let me show an example:
<div id="ajax_will_load_image_here"></div>
Okay, how can PHP cURL see (or download) that image? cURL can't see it when I try to use the preg_match function, even though the image is actually there. I want to download that image using PHP cURL. Any advice?
If I understand the question correctly, there is no convenient way of doing that.
Your crawler/spider would have to parse the website and evaluate the JavaScript.
There are libraries for that, but support is very limited.
There are, however, approaches where an actual browser is used to evaluate the page (without displaying it, but with a proper environment set up, such as the screen resolution).
Then the generated source, including the JavaScript DOM modifications, is available.
This is, for example, how the Google search previews are generated.
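For example, a minimal sketch of that approach using PhantomJS (assumed to be installed and callable from PHP; the URL, timeout, and temp path are placeholders):

<?php
// Let PhantomJS evaluate the JavaScript, then hand the resulting DOM back to PHP.
$script = <<<'JS'
var page = require('webpage').create();
page.open('http://example.com/page-with-ajax', function () {
    // Give the Ajax call a moment to finish, then print the final DOM.
    window.setTimeout(function () {
        console.log(page.content);
        phantom.exit();
    }, 2000);
});
JS;

file_put_contents('/tmp/dump_dom.js', $script);
$html = shell_exec('phantomjs /tmp/dump_dom.js');

// $html now contains the markup *after* the JavaScript ran, so the image URL
// can be extracted (e.g. with DOMDocument) and downloaded with cURL as usual.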
But if you require user interaction, it gets pretty specific and complicated.
I am sorry to disappoint you, but using cURL and preg_match the old-school way, as we did when JavaScript was not yet so common, won't work.
However, for most legitimate use cases the plain HTML is more than sufficient, and websites today are increasingly designed to work without JavaScript, especially for content meant to be crawlable. It is a must in search engine optimization, and which website doesn't want that?