A server-side library to extract the content of web-pages [closed] - php

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
I'm looking for a server-side library (preferably in PHP) to parse and extract the content of web-pages that is free for commercial use. It should be able to extract the headline and html (including images) of the content part of a page, but filter out ads and irrelevant content.
The Readability Parser API is non-free software that does this, but I'm looking for free alternatives.
Any thoughts?

I'm using Boilerpipe. Unfortunately it's for Java, but if you don't find anything in PHP, it may be useful to you. It's not perfect, obviously, but it's worth a try. It's also open source, so you can make any changes you need.
It has several so-called 'extractors', so you can choose the one which suits your need the most.
Usage is also pretty straightforward, for example:
import java.net.URL;
import de.l3s.boilerpipe.extractors.ArticleExtractor;
URL url = new URL("http://example.com/article");
String articleText = ArticleExtractor.INSTANCE.getText(url);

Try using Simple HTML DOM.
I used it to build a scraper for a rather complex website. It works very well.
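For a feel of what DOM-based extraction looks like without a third-party dependency, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM. The sample markup, the `div class="content"` container, and the function name `extractArticle` are assumptions made for illustration. Real pages need their own selectors, and this does no ad filtering beyond picking one container.

```php
<?php
// Hypothetical helper: pull a headline and image URLs out of an HTML string.
// Uses the built-in DOM extension, not Simple HTML DOM.
function extractArticle(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Assumption: the headline is the first <h1> on the page.
    $h1 = $xpath->query('//h1')->item(0);
    $headline = $h1 ? trim($h1->textContent) : null;

    // Assumption: the article body lives in <div class="content">.
    $images = [];
    foreach ($xpath->query('//div[@class="content"]//img') as $img) {
        $images[] = $img->getAttribute('src');
    }

    return ['headline' => $headline, 'images' => $images];
}

// Made-up sample page with one ad image and one content image.
$sample = '<html><body><h1> Sample headline </h1>'
    . '<div class="ad"><img src="ad.png"></div>'
    . '<div class="content"><p>Text</p><img src="photo.jpg"></div>'
    . '</body></html>';

$article = extractArticle($sample);
echo $article['headline'], "\n";
echo implode(', ', $article['images']), "\n";
```

Only the image inside the assumed content container is returned; the one in the `ad` block is skipped because the XPath query never looks there.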

The best way to get any data from a page, like the geographic position of the Eiffel Tower from Wikipedia, is DOM traversal with jQuery.
<span class="geo-dms">
<span class="geo-lat">48°51′29″</span>
<span class="geo-lon">2°17′40″</span>
</span>
Test it in the Firebug console with jQuery('.geo-lat').text(). jQuery is a JavaScript library, so you get the best results with the server-side JavaScript platform Node.js. There are a lot of good Node.js solutions for web crawling with DOM traversal.

Related

HTML to PDF converter using PHP [closed]

Closed 3 years ago.
I'm working on a project where I used html2pdf by spipu, but I eventually ran into problems with images and rendering. What would you recommend for converting HTML that is generated automatically by a PHP file into a PDF on my own domain and site? Ideally, the generated HTML would be converted to PDF just as the browser already displays it (using html5 ...). Also, and this is probably the biggest problem, I need to get rid of the before and after element. I looked at various options, but some require an API key and registration and then convert the HTML somewhere else, which I would like to avoid.
I've used TCPDF in the past; it works great and it's open source.
Take a look at their website; they have plenty of examples that might get you going.
I wanted to post this as a comment since the question is vague, but my SO reputation doesn't allow that yet.

How do I program this thing (extract data from other website) [closed]

Closed 7 years ago.
I am a union member at an airline company, and I have elementary knowledge of HTML, PHP and MySQL. I have experience programming a library system and a personnel system.
Our union would like to create an online platform. One of its functions is to let our crews calculate their flying allowances and salary easily.
First of all, I think I need the roster data, so the platform has to log in to my company's website to extract the roster data, and then I can process it with PHP.
Therefore, I wonder if it is possible to write PHP code that will "log in to my company website and extract the data".
Or which language would you recommend for this program? I could learn a new language if PHP is not suitable.
Thanks for your attention.
Welcome to Stack Overflow.
To answer your question: yes, you can write this in PHP. And by you, I mean you! None of us know the site and there is no way for us to know how it operates.
You may want to consider if there is an API for this site. You may want to load the site via cURL or another HTTP request into PHP and parse the HTML. There are a lot of ways to do this.
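As a rough illustration of the cURL route, the sketch below logs in with a POST request, keeps the session cookie on the same handle, and then parses a table out of the returned HTML. Everything site-specific is hypothetical: the URLs you would pass in, the form field names, and the assumption that the roster is an HTML table with the id `roster`. The real site may work quite differently (JavaScript logins, CSRF tokens), and you should check that automated access is permitted.

```php
<?php
// Hypothetical: log in via POST, then fetch a page with the same session.
// Not called below, because it needs a real site and the cURL extension.
function fetchWithLogin(string $loginUrl, string $pageUrl, array $credentials): string
{
    $ch = curl_init($loginUrl);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects after login
        CURLOPT_COOKIEFILE     => '',     // empty string enables in-memory cookies
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($credentials),
    ]);
    curl_exec($ch);

    // Reuse the handle so the session cookie is sent with the second request.
    curl_setopt_array($ch, [CURLOPT_URL => $pageUrl, CURLOPT_HTTPGET => true]);
    $html = curl_exec($ch);

    return $html === false ? '' : $html;
}

// Parse the rows of an assumed <table id="roster"> into arrays of cell text.
function parseRoster(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);     // tolerate sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    // Only rows that contain <td> cells, which skips the header row.
    foreach ($xpath->query('//table[@id="roster"]//tr[td]') as $tr) {
        $cells = [];
        foreach ($xpath->query('td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Made-up sample standing in for the page the company site would return.
$sample = '<table id="roster"><tr><th>Date</th><th>Flight</th></tr>'
    . '<tr><td>2024-01-05</td><td>XX123</td></tr></table>';
$roster = parseRoster($sample);
print_r($roster);
```

In real use you would pass the result of `fetchWithLogin()` to `parseRoster()`, with the credentials array keyed by whatever field names the login form actually uses.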
PHP is not the only language to perform this in: ASP.NET, Perl, Python... in the end, it sounds like you need to collect data that you would normally see using a web browser, so you're going to mimic that behavior with whatever language you choose to use.

what PDF creator do you use with laravel? [duplicate]

Closed 7 years ago.
Can anybody advise on the best PDF generator class/library to use with PHP? Preferably one which is maintained.
I am aware that this is a duplicate of the following question, however, the accepted answer is over 3 years old and I want to know whether the answer has changed since this time.
Which one is the best PDF-API for PHP?
Thank you
Try TCPDF; it has good features:
http://www.tcpdf.org/examples.php
There is also a simple HTML to PDF converter API (PHP, C#, ASP.NET C#, ASP VB.NET, Java, ...) from PDFCrowd:
http://pdfcrowd.com/html-to-pdf-api/
It is very simple to use, but I think you may need to pay for this API, even though they provide a free test account.
Have you tried http://www.PDFnow.com?
Provides a powerful template engine, and is pretty easy to use.
It supports complex layouts, multi-page layouts, invoices spanning several pages, page numbers, headers, footers, etc. Definitely much better than FPDF.
You can simply integrate it into your PHP code by:
generatePdf(<templateName>, <ParameterArray>);
Very straightforward.
Have a look at http://wkhtmltopdf.org/. It converts HTML to PDF using the WebKit engine.
It can be used from PHP easily. For example, there is a bundle for Symfony2: http://knpbundles.com/KnpLabs/KnpSnappyBundle
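Without any bundle, one simple way to use wkhtmltopdf from PHP is to call the command-line tool, which takes an input (URL or file) followed by an output path. The sketch below only builds and prints the command. The helper name `buildWkhtmltopdfCommand`, the example URL and the output path are made up for illustration, and actually running it assumes the `wkhtmltopdf` binary is installed and on the PATH.

```php
<?php
// Hypothetical helper: build a safely quoted wkhtmltopdf command line.
function buildWkhtmltopdfCommand(string $input, string $outputPdf): string
{
    // escapeshellarg() guards against spaces and shell metacharacters.
    return 'wkhtmltopdf ' . escapeshellarg($input) . ' ' . escapeshellarg($outputPdf);
}

$command = buildWkhtmltopdfCommand('http://example.com/invoice.html', '/tmp/invoice.pdf');
echo $command, "\n";

// To actually run the conversion (requires the wkhtmltopdf binary):
// exec($command, $outputLines, $exitCode);
// if ($exitCode !== 0) {
//     // Conversion failed; inspect $outputLines for details.
// }
```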
The best one is TCPDF:
http://www.tcpdf.org/
Never use DOMPDF.

Looking for PHP/Flash graphics library for interactive networked object presentation [closed]

Closed 7 years ago.
We are looking for a PHP library, something like graphviz.org, which generates an image of a networked group of objects as shown below. Graphviz also enables you to make each of the nodes a hyperlink, but we are looking for something more interactive, e.g. even with Flash, that would be able to react to a click which deletes a node and quickly redraws that area so the other nodes fill in the space, etc.
Does anyone know of a PHP library which generates networked object maps like this but that are also interactive, so that nodes can easily be added and deleted, etc.?
You can give jsPlumb a try:
http://code.google.com/p/jsplumb/
Demo:
http://jsplumb.org/jquery/dynamicAnchorsDemo.html
jsPlumb is cool, but you have to position the nodes yourself.
Take a look at http://arborjs.org. It will take care of positioning. The problem there: when you try the examples, keep an eye on your CPU usage in Firefox :(
See also:
http://www.graphdracula.net/showcase/
http://flare.prefuse.org/demo
http://mbostock.github.com/protovis/ex/force.html
http://js-graph-it.sourceforge.net/index.html
https://github.com/jackrusher/jssvggraph
http://code.google.com/p/jsdot/
http://cytoscapeweb.cytoscape.org/demo
http://flare.prefuse.org/launch/apps/dependency_graph
http://hypertree.woot.com.ar/
I have been in your boat, bro. If you can take care of node positioning yourself, stick with jsPlumb. They just released 1.3.1 and it's a nice release.

PHP and Javascript Documentation Generator [closed]

Closed 5 years ago.
I'm in the process of generating API docs for an in-house web app that's undergoing some expansion. It's a DHTML project, with a mix of some OO and mostly procedural PHP, and purely procedural JavaScript. At the moment, it's pretty much all documented for the appropriate doc generators (phpDocumentor and JSDoc), but the two were never "connected". I could go through and add manual link statements to the doc blocks, but managing all those links (like "../jsdoc/filename.html#function") is a real pain.
Any suggestions for documentation generators that handle both PHP and JavaScript, and allow something like @see functionName between languages?
If worst comes to worst, I can hack together a script to rewrite LINK URLs from some magic syntax (i.e. js: and php:), but I'd really rather have something that will allow a unified tree view of everything.
Thanks,
Jason
After looking at a number of options, I wrote a PHP script that parses JS files, pulls out the doc blocks and function definitions, and then writes it to a file that phpdoc can process. It just needs one line added to phpDocumentor.ini so it will parse .js files.
The blog post talking about it is at:
http://blog.jasonantman.com/2010/08/documentation-generation-for-web-apps-php-and-javascript/
And the script is at:
http://svn.jasonantman.com/misc-scripts/
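The linked script itself is not reproduced here. Purely as an illustration of the general idea (pulling doc blocks and function names out of JS source), here is a minimal regex-based sketch. The function name `extractJsDocBlocks` and the sample JavaScript are made up, and a regex like this only handles plain `function name(` declarations that directly follow a `/** ... */` block.

```php
<?php
// Hypothetical helper: map JS function names to the doc block right above them.
function extractJsDocBlocks(string $js): array
{
    // (?:(?!\*/).)* matches comment text without running past the closing */,
    // so an unrelated earlier comment is never merged into the match.
    $pattern = '~(/\*\*(?:(?!\*/).)*\*/)\s*function\s+([A-Za-z_$][\w$]*)\s*\(~s';
    preg_match_all($pattern, $js, $matches, PREG_SET_ORDER);

    $blocks = [];
    foreach ($matches as $m) {
        $blocks[$m[2]] = $m[1];   // function name => doc block text
    }
    return $blocks;
}

// Made-up sample: a file header comment plus one documented function.
$js = "/** File header. */\n"
    . "var x = 1;\n\n"
    . "/**\n * Adds two numbers.\n */\n"
    . "function add(a, b) { return a + b; }\n";

$blocks = extractJsDocBlocks($js);
foreach ($blocks as $name => $doc) {
    echo $name, ":\n", $doc, "\n";
}
```

The file header comment is ignored because it is not immediately followed by a function declaration. Writing the result in a form phpDocumentor can process is left out of this sketch.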
