How to extract text from PDFs using xpdf? - php

I have many PDFs in a folder and want to extract the text from each of them using xpdf. For example:
example1.pdf extracts to example1.txt
example2.pdf extracts to example2.txt
etc.
Here is my code:
<?php
$path = 'C:/AppServ/www/pdfs/';
$dir = opendir($path);
$f = readdir($dir);
while ($f = readdir($dir)) {
if (eregi("\.pdf",$f)){
$content = shell_exec('C:/AppServ/www/pdfs/pdftotext '.$f.' ');
$read = strtok ($f,".");
$testfile = "$read.txt";
$file = fopen($testfile,"r");
if (filesize($testfile)==0){}
else{
$text = fread($file,filesize($testfile));
fclose($file);
echo "</br>"; echo "</br>";
}
}
}
I get a blank result. What's wrong with my code?

Try this:
$path = 'C:/AppServ/www/pdfs/';
$dir = opendir($path);
while (($filename = readdir($dir)) !== false) {
    // eregi() was removed in PHP 7; preg_match() with the i flag replaces it
    if (preg_match('/\.pdf$/i', $filename)) {
        // Without an output argument, pdftotext writes <name>.txt next to the PDF
        shell_exec('C:/AppServ/www/pdfs/pdftotext ' . escapeshellarg($path . $filename));
        $testfile = $path . pathinfo($filename, PATHINFO_FILENAME) . '.txt';
        if (is_file($testfile) && filesize($testfile) > 0) {
            $text = file_get_contents($testfile);
            echo $text . "<br><br>";
        }
    }
}
closedir($dir);

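You do not have to create a temporary txt file. Passing - as the output file makes pdftotext write to stdout:
$command = '/AppServ/www/pdfs/pdftotext ' . escapeshellarg($filename) . ' -';
// exec() collects the output lines into the $text array
exec($command, $text, $retval);
echo implode("\n", $text);
If it does not work, check the server's error logs.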
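The idea above can be wrapped in a small function. This is only a minimal sketch: it assumes pdftotext is reachable under the name passed as $binary (the answers in this thread call it by full path), and the helper names buildPdftotextCommand and pdfToText are hypothetical. The file name goes through escapeshellarg() so that names containing spaces survive the shell.

```php
<?php
// Hypothetical helper: builds the command line for one PDF.
// Assumption: $binary is the name or full path of the pdftotext executable.
function buildPdftotextCommand(string $pdfPath, string $binary = 'pdftotext'): string
{
    // "-" as the output file makes pdftotext write the text to stdout.
    return $binary . ' ' . escapeshellarg($pdfPath) . ' -';
}

// Hypothetical helper: returns the extracted text, or null if the command failed.
function pdfToText(string $pdfPath, string $binary = 'pdftotext'): ?string
{
    $lines = [];
    $status = 0;
    // exec() appends one array element per line of output.
    exec(buildPdftotextCommand($pdfPath, $binary), $lines, $status);
    return $status === 0 ? implode("\n", $lines) : null;
}
```

Checking the exit status makes a missing binary or an unreadable PDF visible instead of silently producing an empty string.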

The lines
echo "</br>";
echo "</br>";
should be
echo "</br>";
echo $text."</br>";
Hope this helps

Related

Adding a string to the end of each line but not the first line

I am trying to add a string to the end of each line. So far this works. However, I don't want the string to be added to the end of the first line. How can I do this?
So far I have got:
<?php
$EOLString="string \n";
$fileName = "file.txt";
$baseFile = fopen($fileName, "r");
$newFile="";
while(!feof($baseFile)) {
$newFile.= str_replace(PHP_EOL, $EOLString, fgets($baseFile));
}
fclose($baseFile);
file_put_contents("newfile.txt", $newFile);
$bingName = "newfile.txt";
$bingFile = fopen($bingName, "a+");
fwrite($bingFile,$EOLString);
fclose($bingFile);
?>
I have also tried to loop it by doing this:
<?php
$EOLString="string \n";
$fileName = "file.txt";
$baseFile = fopen($fileName, "r");
$newFile="";
$x = 0;
while(!feof($baseFile)) {
if ($x > 0) {
$newFile.= str_replace(PHP_EOL, $EOLString, fgets($baseFile));
}
$x++;
}
fclose($baseFile);
file_put_contents("newfile.txt", $newFile);
$bingName = "newfile.txt";
$bingFile = fopen($bingName, "a+");
fwrite($bingFile,$EOLString);
fclose($bingFile);
?>
So the end result would look like:
firstonestring
secondonestring
thirdonestring
and so on.
I hope you can help me!
Ben :)
Just add a counter to your loop:
$counter = 0;
while (!feof($baseFile)) {
    $line = fgets($baseFile);
    if ($counter++ > 0) {
        $newFile .= str_replace(PHP_EOL, $EOLString, $line);
    } else {
        // fgets() keeps the trailing newline, so the first line is copied as-is
        $newFile .= $line;
    }
}
Also, you seem to be writing the new file, only to reopen it and append more data. There is no need to do that; just append to the contents before you write it the first time:
fclose($baseFile);
file_put_contents("newfile.txt", $newFile . $EOLString);
//$bingName = "newfile.txt";
//$bingFile = fopen($bingName, "a+");
//fwrite($bingFile,$EOLString);
//fclose($bingFile);
Alternatively, you can just read in the whole file, split it into lines, and rejoin them:
$EOLString="string \n";
$lines = explode("\n", file_get_contents("file.txt"));
$first = array_shift($lines);
file_put_contents("newfile.txt", $first . "\n" . implode($EOLString, $lines) . $EOLString);
//done!
By using a flag:
$first = true; // true only while handling the first line
while (!feof($baseFile)) {
    $line = fgets($baseFile);
    if ($first) {
        $newFile .= $line; // keep the first line unchanged
    } else {
        $newFile .= str_replace(PHP_EOL, $EOLString, $line);
    }
    $first = false; // every later line gets the suffix
}
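The approaches above can also be written as one pure function that works on a string, which makes the edge cases (a trailing newline, Windows line endings) easier to check. This is a minimal sketch; the name appendToAllButFirst is hypothetical, and it assumes the whole file fits in memory.

```php
<?php
// Hypothetical helper: appends $suffix to every line except the first.
function appendToAllButFirst(string $text, string $suffix): string
{
    // Strip trailing line breaks first so no empty last "line" is produced,
    // then split on either \n or \r\n.
    $lines = preg_split('/\r?\n/', rtrim($text, "\r\n"));
    foreach ($lines as $i => $line) {
        if ($i > 0) {
            $lines[$i] = $line . $suffix;
        }
    }
    // Rejoin with \n and end the output with a single newline.
    return implode("\n", $lines) . "\n";
}

// Usage with the file names from the question:
// file_put_contents('newfile.txt', appendToAllButFirst(file_get_contents('file.txt'), 'string'));
```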

PHP modify file: File is not modified, but no error messages printed to console

I have been searching the net for a while, but I cannot find anything helpful. I cannot find the errors in my code and I'm not getting any in the console. I can open my file, I change its string, and I don't get any errors writing to it, but the file does not get modified. What am I missing?
$file = $_SERVER['DOCUMENT_ROOT']."/files/wsplaetze.OLD.txt";
$newline = "";
$handle = fopen($file, "r+");
$counter = 0;
if($handle) {
while(($line = fgets($handle)) !== false) {
++$counter;
$parts = explode(' ', $line);
$to_add = $parts[0] . " " . intval($parts[1]) . '\n';
foreach($ws_array as $ws) {
if($parts[0] == $ws) {
$anzahl = intval($parts[1])-1;
$to_add = $parts[0] . " " . $anzahl . '\n';
}
}
$newline .= $to_add;
}
} else {
debug_to_console(error_get_last()."ERROR");
}
debug_to_console("length ".$counter);
debug_to_console($newline);
if(!fwrite($handle, $newline)) {
debug_to_console("Error writing to file: ".$file);
} else {
debug_to_console("Great! Everything is fine! This is the file you wrote to: ".$file);
}
fclose($handle);
The file itself consists of ids and the numbers associated with them, separated by whitespace, like this:
ab 12
abs 23
skd 12
etc.
What am I doing wrong? I really don't know where to look anymore, as I am not getting any of the error messages above!
I can read the file (correctly!), so the file name seems to be right as well. But then again, nothing changes. I am really confused!
EDIT: Using the help in the comments, the code works now! The file has to be re-opened for writing so that the old content is overwritten. I changed it to this:
$file = $_SERVER['DOCUMENT_ROOT']."files/wsplaetze.OLD.txt";
$newline = "";
$handle = fopen($file, "r+");
$counter = 0;
if($handle) {
while(($line = fgets($handle)) !== false) {
++$counter;
$parts = explode(' ', $line);
$to_add = $parts[0] . " " . intval($parts[1]) . "\n";
foreach($ws_array as $ws) {
if($parts[0] == $ws) {
$anzahl = intval($parts[1])-1;
$to_add = $parts[0] . " " . $anzahl . "\n";
}
}
$newline .= $to_add;
}
fclose($handle);
} else {
debug_to_console(error_get_last()."ERROR");
}
$filew = fopen($file, "w");
if($filew) {
fwrite($filew, $newline);
}
fclose($filew);
I'm assuming that you have already stored the content you want to write to the file in the $content variable.
$file = $_SERVER['DOCUMENT_ROOT']."/files/wsplaetze.OLD.txt";
// @ suppresses the warning; file_put_contents() returns the byte count, or false on failure
if (@file_put_contents($file, $content, LOCK_EX) !== false) {
    debug_to_console("Great! Everything is fine! This is the file you wrote to: ".$file);
} else {
    debug_to_console("Error writing to file: ".$file);
}
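The read-modify-write cycle from the question can be separated into a function that builds the new content and a single write at the end. This is a minimal sketch: the name decrementCounts is hypothetical, and it assumes each line has the "id number" shape shown in the question.

```php
<?php
// Hypothetical helper: returns the new file content with the count of
// every id listed in $ids reduced by one.
function decrementCounts(array $lines, array $ids): string
{
    $out = '';
    foreach ($lines as $line) {
        $parts = preg_split('/\s+/', trim($line));
        if (count($parts) < 2) {
            continue; // skip blank or malformed lines
        }
        $count = (int) $parts[1];
        if (in_array($parts[0], $ids, true)) {
            $count--;
        }
        // "\n" must be in double quotes; '\n' is a literal backslash and n.
        $out .= $parts[0] . ' ' . $count . "\n";
    }
    return $out;
}

// Usage with the variables from the question:
// $lines = file($file);   // read every line into an array
// file_put_contents($file, decrementCounts($lines, $ws_array), LOCK_EX);
```

Writing through file_put_contents() replaces the old content in one call, so there is no file pointer left at the end of the file as there is after reading with an "r+" handle.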

split text by characters (with file_get_contents and file_put_contents)

I have the following code which works exactly as it should:
<?php
$text = "text"; $text_ok = urlencode($text);
if(!@file_get_contents("http://site.com/t=".$text_ok))
{
echo "Error.";
}
else
{
$data = file_get_contents("http://site.com/t=".$text_ok);
$file = "texts/".md5($text).".txt";
if(!file_exists($file)) {
file_put_contents($file, $data);
}
?>
Lorem <?php echo $file; ?>"> ipsum
<?php
}
?>
The problem is that http://site.com/t=$text_ok only works if $text is less than 25 characters. I'm wondering if it is possible when $text exceeds 25 characters to split into multiple parts and create files like texts/md5($text)/1.txt, texts/md5($text)/2.txt etc. I hope you understand. Any help is appreciated. Thanks!
Try this:
<?php
$text = "text";
$split = str_split($text, 25);
$count = 1;
foreach ($split as $s) {
$text_ok = urlencode($s);
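// Fetch once; @ suppresses the warning so a failed request shows up as false
$data = @file_get_contents("http://site.com/t=" . $text_ok);
if ($data === false) {
echo "Error.";
} else {
$dir = "texts/" . md5($text);
if (!is_dir($dir)) {
mkdir($dir, 0777, true); // file_put_contents() does not create missing directories
}
$file = $dir . "/" . $count . ".txt";
if (!file_exists($file)) {
file_put_contents($file, $data);
}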
?>
Lorem <?php echo $file; ?>"> ipsum
<?php
}
$count++;
}
?>
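The splitting step can be checked on its own, without any HTTP request, by computing which chunk goes into which file. This is a minimal sketch; the name chunkFilePaths and the $baseDir parameter are hypothetical, and the 25-character limit is taken from the question.

```php
<?php
// Hypothetical helper: maps each output path to the chunk of text it belongs to.
function chunkFilePaths(string $text, int $size = 25, string $baseDir = 'texts'): array
{
    // One directory per input text, named after its MD5 hash as in the question.
    $dir = $baseDir . '/' . md5($text);
    $paths = [];
    // str_split() cuts by bytes, so a multi-byte character may be split in two.
    foreach (str_split($text, $size) as $i => $chunk) {
        $paths[$dir . '/' . ($i + 1) . '.txt'] = $chunk;
    }
    return $paths;
}
```

Each chunk would then be URL-encoded and requested separately, as in the loop above.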

Read the first 10 lines of a file and display selected fields from each line using PHP

I have the PHP code as below:
<?php
$path = "files/stats_pmta.affinitead.net.2012-12-12.txt";
$num = 10;
$fh = @fopen($path, 'r');
if ($fh){
for ($i=0;$i<$num;$i++){
$newfile = fgets($fh,1024);
$t = explode(";", $newfile);
echo $t;
echo "<br>";
}
} else {
echo 1;
}
?>
I want to read only the first 10 lines of the file stats_pmta.affinitead.net.2012-12-12.txt, whose contents are shown below:
2012-12-12-0551;affinitead.net;1221588;106346;8.70;gmail.com;123577;7780;6.29
2012-12-12-0551;affinitead.net;1221588;106346;8.70;wanadoo.fr;123562;9227;7.46
2012-12-12-0551;affinitead.net;1221588;106346;8.70;yahoo.fr;104819;1685;1.60
2012-12-12-0551;affinitead.net;1221588;106346;8.70;orange.fr;87132;7341;8.42
2012-12-12-0551;affinitead.net;1221588;106346;8.70;laposte.net;79597;1040;1.30
2012-12-12-0551;affinitead.net;1221588;106346;8.70;hotmail.fr;77601;14107;18.17
2012-12-12-0551;affinitead.net;1221588;106346;8.70;neuf.fr;67392;1793;2.66
2012-12-12-0551;affinitead.net;1221588;106346;8.70;hotmail.com;55300;10494;18.97
2012-12-12-0551;affinitead.net;1221588;106346;8.70;free.fr;43422;1706;3.92
2012-12-12-0551;affinitead.net;1221588;106346;8.70;sfr.fr;39063;251;.64
2012-12-12-0551;affinitead.net;1221588;106346;8.70;aol.com;32061;9859;30.75
2012-12-12-0551;affinitead.net;1221588;106346;8.70;club-internet.fr;22424;233;1.03
2012-12-12-0551;affinitead.net;1221588;106346;8.70;yahoo.com;18646;1365;7.32
2012-12-12-0551;affinitead.net;1221588;106346;8.70;voila.fr;18513;3650
After reading the first 10 lines, I want to display some of the fields in each line. For example, for the line:
2012-12-12-0551;affinitead.net;1221588;106346;8.70;gmail.com;123577;7780;6.29
I want to display gmail.com 123577 7780 6.29
But with the PHP code above, the output is just "Array". I don't know how to fix this. Can anyone help me, please? Thanks.
You could do something like this:
$path = 'path/to/file';
$fp = fopen($path, 'r');
$count = 0;
// fgetcsv() splits each line on ';' and returns false at end of file
while (($columns = fgetcsv($fp, 256, ';', '"')) !== false) {
    if (++$count > 10) {
        break;
    }
    echo implode("\t", $columns) . "<br>";
}
fclose($fp);
If you don't know how implode works, look here: http://de1.php.net/manual/de/function.implode.php
This is one way to do it using the PHP function explode:
$textFile = file_get_contents("test.txt");
$lineFromText = explode("\n", $textFile);
$row = 0;
foreach ($lineFromText as $line) {
    if ($row < 10) { // rows 0-9 are the first 10 lines
        $words = explode(";", $line);
        echo $words[5] . ' ' . $words[6] . ' ' . $words[7] . ' ' . $words[8] . "<br>";
        $row++;
    }
}
I edited the code so that you can drop it in place of your own. You might want to check whether the file is empty, etc., before trying to do anything, though.
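Both answers can be combined into a function that stops reading after the requested number of lines and tolerates lines with fewer fields (the last sample line in the question has only eight). This is a minimal sketch; the name firstRows is hypothetical, and the column indexes 5 to 8 come from the desired output in the question.

```php
<?php
// Hypothetical helper: reads up to $limit lines from an open stream and
// returns, for each line, the requested fields joined by spaces.
function firstRows($handle, int $limit = 10, array $columns = [5, 6, 7, 8]): array
{
    $rows = [];
    while (count($rows) < $limit && ($line = fgets($handle)) !== false) {
        $fields = explode(';', rtrim($line, "\r\n"));
        $picked = [];
        foreach ($columns as $index) {
            if (isset($fields[$index])) { // tolerate short lines
                $picked[] = $fields[$index];
            }
        }
        $rows[] = implode(' ', $picked);
    }
    return $rows;
}

// Usage with the path from the question:
// $fh = fopen('files/stats_pmta.affinitead.net.2012-12-12.txt', 'r');
// echo implode('<br>', firstRows($fh));
// fclose($fh);
```

Because the loop stops at the limit, the rest of the file is never read, which matters for large log files.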

PHP search function on website. (easy)

I have implemented a PHP search function on a client's website. I would like it to search the website directory for specific PDF files.
However, I can't seem to get it to work. If I type "pdf" into the search box, it returns all the files in the directory, but if I enter a specific file name, it returns nothing.
Below is the PHP script I am using:
<?php
$my_server = "http://www.gwent.org".":".getenv("http://www.gwent.org_80");
$my_root = getenv("docroot/");
$s_dirs = array("");
$hits = null;
$full_url = $_SERVER['PHP_SELF'];
$site_url = eregi_replace('customer_information.php', '', $full_url);
$directory_list = array('sold_msds');
$s_files = ".pdf";
foreach($directory_list as $dirlist)
{
$directory_url = $site_url.$dirlist."/";
$getDirectory = opendir($dirlist);
while($dirName = readdir($getDirectory))
$getdirArray[] = $dirName;
closedir($getDirectory);
$dirCount = count($getdirArray);
sort($getdirArray);
for($dir=0; $dir < $dirCount; $dir++)
{
if (substr($getdirArray[$dir], 0, 1) != ".")
{
$label = eregi_replace('_', ' ', $getdirArray[$dir]);
$directory = $dirlist.'/'.$getdirArray[$dir]."/";
$complete_url = $site_url.$directory;
if(is_dir($directory))
{
$myDirectory = opendir($directory);
$dirArray = null;
while($entryName = readdir($myDirectory))
$dirArray[] = $entryName;
closedir($myDirectory);
$indexCount = count($dirArray);
sort($dirArray);
}
else
{
$hits++;
if(file_exists($dirlist."/".$label))
{
$fd=fopen($dirlist."/".$label, "r");
$text=fread($fd, 50000);
$keyword_html = htmlentities($keyword);
if(!empty($keyword))
{
$do=stristr($text, $keyword) || stristr($text, $keyword_pdf);
}
if($do)
{
$strip = strip_tags($text);
$keyword = preg_quote($keyword);
$keyword = str_replace("/","\/","$keyword");
$keyword_html = preg_quote($keyword_html);
$keyword_html = str_replace("/","\/","$keyword_html");
echo "<span>";
if(preg_match_all("/((\s\S*){0,3})($keyword|$keyword_html)((\s?\S*){0,3})/i", $strip, $match, PREG_SET_ORDER));
{
$number=count($match);
if($number > 0)
{
echo "<a href='".$dirlist."/".$label."'>".$label."</a> (".$number.")";
echo "<br />";
}
for ($h=0;$h<$number;$h++)
{
if (!empty($match[$h][3]))
{
printf("<i><b>..</b> %s<b>%s</b>%s <b>..</b></i>", $match[$h][1], $match[$h][3], $match[$h][4]);
}
}
echo "</span><br /><br />";
if($number > 0):
echo "<hr />";
endif;
}
}
}
}
}
}
}
?>
Many thanks in advance.
Look up the glob function http://php.net/manual/en/function.glob.php
$found = glob("/path/to/dir/*.pdf");
Edit: Never mind, your question makes it sound completely different from what your code is doing. I'm guessing what I posted is incorrect.
Simple search; this is not recursive. Give it a directory and it will list the files it finds:
$files = glob("c:/xampp/htdocs/*.php");
if(empty($files)) {
echo "No PHP Files Found";
}
else {
foreach($files as $f) {
echo "PHP File Found: ".$f."\n";
}
}
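The script in the question compares the keyword with the contents of each file (via fread and stristr), not with its name. Since PDF files begin with a "%PDF-" header, that may be why "pdf" matched every file while a file name matched none. If the goal is to search by file name, a filter over the glob result is enough. This is a minimal sketch; the name filterPdfsByName is hypothetical, and the keyword source in the usage comment is an assumption.

```php
<?php
// Hypothetical helper: keeps the PDF paths whose file name contains $keyword.
function filterPdfsByName(array $paths, string $keyword): array
{
    $matches = [];
    foreach ($paths as $path) {
        $name = basename($path);
        $isPdf = strtolower(pathinfo($name, PATHINFO_EXTENSION)) === 'pdf';
        // stripos() is a case-insensitive substring search; an empty keyword matches all PDFs.
        if ($isPdf && ($keyword === '' || stripos($name, $keyword) !== false)) {
            $matches[] = $path;
        }
    }
    return $matches;
}

// Usage (directory name from the question; $_GET['keyword'] is an assumed input):
// $hits = filterPdfsByName(glob('sold_msds/*') ?: [], $_GET['keyword'] ?? '');
```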
