Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I have a question regarding web crawling.
What I need is a web crawler that can collect all external links from a website and write them to a CSV file.
I am in the middle of developing it myself (in PHP), but I was wondering whether there are any ready-made solutions already (they don't have to be in PHP).
Of course I have looked myself, but couldn't find anything. If anyone can help me out here, I would really appreciate it.
Also, what would be the best way to develop it?
You can use Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/).
For example:
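<?php
include 'simple_html_dom.php';
// Load and parse the page
$html = file_get_html('http://google.com/');
// Collect the href attribute of every <a> element
$links = [];
foreach ($html->find('a') as $element) {
    $links[] = $element->href;
}
// Write one link per row into your CSV file (the file name is just an example)
$fp = fopen('links.csv', 'w');
foreach ($links as $link) {
    fputcsv($fp, [$link]);
}
fclose($fp);
?>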
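The snippet above collects every link on the page, while the question asks only for external ones. A minimal sketch of that filtering step follows, using PHP's built-in DOM extension instead of Simple HTML DOM. It assumes a link is "external" when it is an absolute http(s) URL whose host differs from the host of the crawled page; that definition, and the names extractExternalLinks and writeLinksCsv, are illustrative assumptions rather than part of any library.

```php
<?php
// Sketch: collect the external links of one HTML page and save them as CSV.
// Assumption: "external" = absolute http(s) URL whose host differs from the
// host of the page being crawled.
function extractExternalLinks(string $html, string $pageUrl): array
{
    $baseHost = strtolower((string) parse_url($pageUrl, PHP_URL_HOST));

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on messy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $external = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        $scheme = strtolower((string) parse_url($href, PHP_URL_SCHEME));
        if ($scheme !== 'http' && $scheme !== 'https') {
            continue; // relative links, mailto:, javascript:, etc.
        }
        $host = strtolower((string) parse_url($href, PHP_URL_HOST));
        if ($host !== '' && $host !== $baseHost) {
            $external[$href] = true; // using keys removes duplicates
        }
    }
    return array_keys($external);
}

// Write one link per CSV row.
function writeLinksCsv(array $links, string $path): void
{
    $fp = fopen($path, 'w');
    foreach ($links as $link) {
        fputcsv($fp, [$link], ',', '"', '\\');
    }
    fclose($fp);
}
```

For example, writeLinksCsv(extractExternalLinks($html, 'http://google.com/'), 'links.csv') saves the external links of a page whose source is in $html. Protocol-relative links such as //example.com/ are skipped by this sketch, and a real crawler would also have to resolve relative URLs and follow internal links to visit the rest of the site.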
Closed 9 years ago.
Are there any popular PHP libraries or services that can help detect duplicate content?
I run a site that has user-generated content, and I want to detect content that is similar or duplicated. Are there any popular libraries out there that can help with this?
Text similarity / plagiarism / duplicate detection is a big topic, and there are many algorithms and solutions.
Some projects use "adaptive local alignment of keywords" (you will find information on it via Google).
Also, you can check this (check the 3 links in the answer; they are very instructive):
Cosine similarity vs Hamming distance
Hope this will help.
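To make the cosine-similarity idea concrete, here is a minimal PHP sketch. It assumes a plain bag-of-words model (ASCII-lowercased word counts, no stemming or stop-word removal); termFrequencies and cosineSimilarity are hypothetical helpers, not part of any library.

```php
<?php
// Bag-of-words vector for a text: lowercase word => number of occurrences.
function termFrequencies(string $text): array
{
    preg_match_all('/\w+/u', strtolower($text), $matches);
    return array_count_values($matches[0]);
}

// Cosine similarity of two texts' word-count vectors:
// 1.0 = same word distribution, 0.0 = no words in common.
function cosineSimilarity(string $a, string $b): float
{
    $va = termFrequencies($a);
    $vb = termFrequencies($b);

    $dot = 0;
    foreach ($va as $term => $count) {
        if (isset($vb[$term])) {
            $dot += $count * $vb[$term];
        }
    }

    $normA = sqrt(array_sum(array_map(fn($c) => $c * $c, $va)));
    $normB = sqrt(array_sum(array_map(fn($c) => $c * $c, $vb)));
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // at least one text has no words
    }
    return $dot / ($normA * $normB);
}
```

Two identical texts score 1 (up to floating-point rounding) and texts with no words in common score 0; where to draw the "duplicate" line is a threshold you would have to tune on your own content.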
Closed 8 years ago.
How can I extract text from a PDF document using PHP?
(I can't use other tools; I don't have root access.)
I've found some functions that work for plain text, but they don't handle Unicode characters well:
http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-2-pdf-data-extraction-437.html
Download class.pdf2text.php from https://pastebin.com/dvwySU1a or http://www.phpclasses.org/browse/file/31030.html (registration required).
Code:
<?php
include('class.pdf2text.php');
$a = new PDF2Text();
$a->setFilename('filename.pdf'); // path to the PDF to read
$a->decodePDF();                 // parse the PDF and extract its text
echo $a->output();               // print the extracted text
The PDF2Text class doesn't work with all the PDFs I've tested. If it doesn't work for you, try PDF Parser.
Closed 8 years ago.
I am looking for a simple, easy-to-install, easy-to-customize PHP script that can handle questions and answers stored in a database.
I have been looking for DAYS for an open-source script without any luck!
Can anyone point me to a good OPEN-SOURCE script that does what I need?
Have you tried googling for a Stack Overflow clone?
You can view a comprehensive list at:
https://meta.stackexchange.com/questions/2267/stack-overflow-clones
Have you tried this one?
Although the question is very old, try this: http://www.coordino.com/
Closed 9 years ago.
I'm looking for PHP code to convert Word documents to HTML. When we retrieve a Word document from the database and display it on the front end, it should be rendered as an HTML page.
wvWare
I don't completely understand your request, but I think you could try TinyMCE for PHP. It's a web-based WYSIWYG editor: everything you type and format is saved as HTML code, and you can even paste in and convert Word documents.
So, as symcbean said, the best way to do it is wvWare; otherwise you could write a class yourself, but that will be a difficult path.
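wvWare ships a command-line converter, wvHtml. A minimal sketch of calling it from PHP follows, assuming wvHtml is installed on the server, is on the PATH, and accepts the "wvHtml input.doc output.html" form (check the man page on your system). Because the document comes from a database, the sketch first writes the stored bytes to a temporary file. wvHtmlCommand and wordBytesToHtml are hypothetical helper names.

```php
<?php
// Hypothetical helper: shell command that runs wvWare's wvHtml converter.
// Assumes the "wvHtml input.doc output.html" form; check `man wvHtml`.
function wvHtmlCommand(string $docPath, string $htmlPath): string
{
    return 'wvHtml ' . escapeshellarg($docPath) . ' ' . escapeshellarg($htmlPath);
}

// Convert Word bytes (e.g. a BLOB read from the database) to an HTML string.
// Returns null if wvHtml fails or produces no output file.
function wordBytesToHtml(string $docBytes): ?string
{
    $docPath = tempnam(sys_get_temp_dir(), 'doc');
    $htmlPath = $docPath . '.html';
    file_put_contents($docPath, $docBytes);

    exec(wvHtmlCommand($docPath, $htmlPath), $output, $status);
    $html = ($status === 0 && is_file($htmlPath)) ? file_get_contents($htmlPath) : false;

    // Clean up the temporary files whatever the outcome
    @unlink($docPath);
    @unlink($htmlPath);
    return $html === false ? null : $html;
}
```

Echoing the returned string on the front end then displays the document as HTML. Note that this approach needs exec() to be allowed by your host.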
Closed 8 years ago.
Does anyone here know of good code indenters or beautifiers that can process files in batch?
Specifically for PHP, JS and SGML-based languages.
Preferably with options for style.
The following page has code to tidy JavaScript (written in JavaScript as well):
http://www.howtocreate.co.uk/tutorials/jsexamples/JSTidy.html
There are various ways to tidy SGML-based files (e.g. XML): HTML Tidy will often do the trick, and there are various "pretty print" implementations in different languages.
And finally, a link to a website with PHP code for pretty-printing PHP: http://tobyinkster.co.uk/blog/2007/07/17/php-pretty-printer/
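As an example of such a pretty printer for XML, PHP's built-in DOM extension can re-indent a document: setting preserveWhiteSpace to false before loading and formatOutput to true before saving makes libxml emit an indented tree. This is only a sketch (prettyPrintXml is a hypothetical name), and it handles well-formed XML only, not arbitrary HTML.

```php
<?php
// Re-indent well-formed XML with PHP's built-in DOM extension.
function prettyPrintXml(string $xml): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false; // discard old indentation before re-indenting
    $doc->formatOutput = true;        // let libxml indent the output

    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$ok) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }
    return $doc->saveXML();
}
```

For batch use, loop over the files in a directory and write each result back with file_put_contents.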
For HTML/XML HTML Tidy is the best option:
http://tidy.sourceforge.net/