phpexcel reads long numbers as boolean

I'm using Symfony2.3.4, PHP5.6.3 and PHPExcel 1.8.0.
When I read an Excel file, it works fine for almost all cells.
But if a cell contains a very large number, the value I read and show in an HTML view comes out as false.
I tried to use a custom value binder as Mark Baker instructed here, but I couldn't make it work; the value arrives as a boolean right from the beginning.
IMPORTANT:
The Excel files I'm trying to load are downloaded (generated) from another site. When you try to open them with Microsoft Excel, it first shows a warning that the FILE EXTENSION AND THE FILE FORMAT DO NOT MATCH, although if you choose to open the file anyway, it opens fine.
I think that's what's causing the problem. I can't contact the people who implemented the other site's download function, but I'm almost sure they did something like this:
$objWriter = \PHPExcel_IOFactory::createWriter($objPHPExcel, $ext == 'xlsx' ? 'Excel5' : 'Excel2007');
when they should have done something like this:
$objWriter = \PHPExcel_IOFactory::createWriter($objPHPExcel, $ext == 'xls' ? 'Excel5' : 'Excel2007');
making the EXTENSION and the FORMAT match, as instructed in PHPExcel's docs.
If you need any specific clarification please ask.
My code to load the file into the html:
public function uploadAction() {
$request = $this->getRequest();
$form = $this->createFormBuilder()
->add('file', 'file')
->getForm();
if ($request->getMethod() == 'POST'){
$form->submit($request);
$file = $form['file'];
$file->getData()->move(
'uploads', $form['file']->getData()->getClientOriginalName());
$ext = pathinfo($file->getData()->getClientOriginalName(), PATHINFO_EXTENSION);
$name = pathinfo($file->getData()->getClientOriginalName(), PATHINFO_BASENAME);
//$objReader = \PHPExcel_IOFactory::createReader('xlsx' == $ext ? 'Excel2007' : 'Excel5');
$objReader = \PHPExcel_IOFactory::createReaderForFile('uploads/' . $name);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load('uploads/' . $name);
$activeSheet = $objPHPExcel->getActiveSheet();
$rowIter = $activeSheet->getRowIterator();
foreach ($rowIter as $key => $row) {
$columns = array();
$cellIterator = $row->getCellIterator();
$cellIterator->setIterateOnlyExistingCells(false);
foreach ($cellIterator as $cell)
$columns[] = $cell->getCalculatedValue();
}
}
}
NOTE: I really don't know the difference between:
$objReader = \PHPExcel_IOFactory::createReader('xlsx' == $ext ? 'Excel2007' : 'Excel5');
and
$objReader = \PHPExcel_IOFactory::createReaderForFile('uploads/' . $name);
I DO know I can't use the first one because of the problem described above: the files are not generated correctly. If I try to use it, the browser shows:
The filename uploads/<name>.xls is not recognised as an OLE file.
Can anyone point me to a workaround? I'm the one on the hook now and I'm supposed to make it work somehow. Maybe there's nothing wrong with the files and I'm the one doing something wrong. Please help; this is causing me problems with dates too, but one step at a time.
EDIT:
This is the read function in OLERead.php.
I was browsing it and var_dump-ing everything I could get my hands on.
As you can see, there are two var_dumps in the code below. For one of the downloaded files they output:
string '<div>
' (length=8)
string '��ࡱ�' (length=8)
Which doesn't happen when I try it with a regular .xls file created manually:
string '��ࡱ�' (length=8)
string '��ࡱ�' (length=8)
I figured you could make better use of this than I can, if it helps at all. Thanks again.
public function read($sFileName) {
// Check if file exists and is readable
if (!is_readable($sFileName)) {
throw new PHPExcel_Reader_Exception("Could not open " . $sFileName . " for reading! File does not exist, or it is not readable.");
}
// Get the file identifier
// Don't bother reading the whole file until we know it's a valid OLE file
$this->data = file_get_contents($sFileName, FALSE, NULL, 0, 8);
////VAR_DUMPSSSSSSSSSSSS
var_dump($this->data);
var_dump(self::IDENTIFIER_OLE);
die();
// Check OLE identifier
if ($this->data != self::IDENTIFIER_OLE) {
throw new PHPExcel_Reader_Exception('The filename ' . $sFileName . ' is not recognised as an OLE file');
}
// Get the file data
$this->data = file_get_contents($sFileName);
// Total number of sectors used for the SAT
$this->numBigBlockDepotBlocks = self::_GetInt4d($this->data, self::NUM_BIG_BLOCK_DEPOT_BLOCKS_POS);
// SecID of the first sector of the directory stream
$this->rootStartBlock = self::_GetInt4d($this->data, self::ROOT_START_BLOCK_POS);
// SecID of the first sector of the SSAT (or -2 if not extant)
$this->sbdStartBlock = self::_GetInt4d($this->data, self::SMALL_BLOCK_DEPOT_BLOCK_POS);
// SecID of the first sector of the MSAT (or -2 if no additional sectors are used)
$this->extensionBlock = self::_GetInt4d($this->data, self::EXTENSION_BLOCK_POS);
// Total number of sectors used by MSAT
$this->numExtensionBlocks = self::_GetInt4d($this->data, self::NUM_EXTENSION_BLOCK_POS);
$bigBlockDepotBlocks = array();
$pos = self::BIG_BLOCK_DEPOT_BLOCKS_POS;
$bbdBlocks = $this->numBigBlockDepotBlocks;
if ($this->numExtensionBlocks != 0) {
$bbdBlocks = (self::BIG_BLOCK_SIZE - self::BIG_BLOCK_DEPOT_BLOCKS_POS) / 4;
}
for ($i = 0; $i < $bbdBlocks; ++$i) {
$bigBlockDepotBlocks[$i] = self::_GetInt4d($this->data, $pos);
$pos += 4;
}
for ($j = 0; $j < $this->numExtensionBlocks; ++$j) {
$pos = ($this->extensionBlock + 1) * self::BIG_BLOCK_SIZE;
$blocksToRead = min($this->numBigBlockDepotBlocks - $bbdBlocks, self::BIG_BLOCK_SIZE / 4 - 1);
for ($i = $bbdBlocks; $i < $bbdBlocks + $blocksToRead; ++$i) {
$bigBlockDepotBlocks[$i] = self::_GetInt4d($this->data, $pos);
$pos += 4;
}
$bbdBlocks += $blocksToRead;
if ($bbdBlocks < $this->numBigBlockDepotBlocks) {
$this->extensionBlock = self::_GetInt4d($this->data, $pos);
}
}
$pos = 0;
$this->bigBlockChain = '';
$bbs = self::BIG_BLOCK_SIZE / 4;
for ($i = 0; $i < $this->numBigBlockDepotBlocks; ++$i) {
$pos = ($bigBlockDepotBlocks[$i] + 1) * self::BIG_BLOCK_SIZE;
$this->bigBlockChain .= substr($this->data, $pos, 4 * $bbs);
$pos += 4 * $bbs;
}
$pos = 0;
$sbdBlock = $this->sbdStartBlock;
$this->smallBlockChain = '';
while ($sbdBlock != -2) {
$pos = ($sbdBlock + 1) * self::BIG_BLOCK_SIZE;
$this->smallBlockChain .= substr($this->data, $pos, 4 * $bbs);
$pos += 4 * $bbs;
$sbdBlock = self::_GetInt4d($this->bigBlockChain, $sbdBlock * 4);
}
// read the directory stream
$block = $this->rootStartBlock;
$this->entry = $this->_readData($block);
$this->_readPropertySets();
}

The difference between
$objReader = \PHPExcel_IOFactory::createReader('xlsx' == $ext ? 'Excel2007' : 'Excel5');
and
$objReader = \PHPExcel_IOFactory::createReaderForFile('uploads/' . $name);
The first trusts that the extension matches the actual format of the file: that a file with an .xlsx extension really is an OfficeOpenXML-format file, or that a file with an .xls extension really is a BIFF-format file. It then tells PHPExcel to use the corresponding Reader.
This isn't normally a problem, unless the file is (for example) just HTML markup with an .xls or .xlsx extension. Then you're selecting the wrong Reader for the actual format of the file, and this is what MS Excel itself is telling you with its message that "FILE EXTENSION AND THE FILE FORMAT DO NOT MATCH".
The second uses PHPExcel's identify() method to work out what format the file really is (irrespective of what its extension claims), and then selects the appropriate Reader for that format.
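To make the idea of detecting the format from the content rather than the extension concrete, here is a minimal sketch that classifies a file by its first bytes, the same 8 bytes the question's var_dump shows. This is not PHPExcel's identify() implementation; sniff_spreadsheet_format() is a hypothetical helper, and the only facts it relies on are the well-known OLE2 and ZIP signatures.

```php
<?php
// Hypothetical helper, NOT PHPExcel code: guess a container format
// from the leading bytes of a file.
function sniff_spreadsheet_format($head) {
    $ole = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"; // OLE2 compound file signature (BIFF .xls)
    $zip = "PK\x03\x04";                        // ZIP local file header (OfficeOpenXML .xlsx)
    if (strncmp($head, $ole, 8) === 0) {
        return 'ole';
    }
    if (strncmp($head, $zip, 4) === 0) {
        return 'zip';
    }
    // Markup such as "<div>" or "<html>" starts with "<" after optional whitespace
    if (strpos(ltrim($head), '<') === 0) {
        return 'markup';
    }
    return 'unknown';
}

// In-memory samples standing in for the first 8 bytes of three files
echo sniff_spreadsheet_format("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"), "\n"; // ole
echo sniff_spreadsheet_format("PK\x03\x04\x14\x00\x06\x00"), "\n";       // zip
echo sniff_spreadsheet_format("<div>\n  "), "\n";                         // markup
```

For a real file, the leading bytes could be read the same way OLERead.php does it, with file_get_contents($path, false, null, 0, 8).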
EDIT
I'm unsure exactly how large your large numbers are, but I'll take a look at the HTML Reader and see if I can identify why it gives a boolean false instead of the actual numeric value.

Related

PDF Files from database keep getting corrupted

So I am storing my files in a database. Don't ask why, just know that I am not in control of this. I am able to store them successfully as a hexadecimal representation and then spit them back out for display with no problem. But when I attach them to an email using PHPMailer, they get sent properly with the right name and all, yet they are corrupted. I will walk you through it step by step below so that you know exactly how the files are stored, which may help debug my issue. (Please note that all code is paraphrased to save space and only shows what is needed.)
STEP 1
File is grabbed and then processed
$name = $_FILES['file_data']['name'];
$file = prepareImageDBString($_FILES['file_data']['tmp_name']);
$mime_type = $_FILES['file_data']['type'];
name, file, and mime_type are stored
here is the function prepareImageDBString()
function prepareImageDBString($filepath){
$out = 'null';
$handle = @fopen($filepath, 'r');
if($handle){
$content = @fread($handle, filesize($filepath));
$content = bin2hex($content);
@fclose($handle);
$out = $content;
}
return $out;
}
STEP 2
When the file is being viewed I show it as an embedded object. This file is small so I just posted the whole code. Do note that the file shows up with no problems here.
$q = "SELECT lease_doc_file_data FROM lease_doc_file WHERE lease_doc__id ='".$_GET['id']."'";
$file = "";
foreach($CONN->query($q) as $row){
$file = $row['lease_doc_file_data'];
}
if(!empty($file)){
header("Content-type: application/pdf");
ob_clean();
flush();
echo hextobin($file);
}
Here is the function hextobin()
function hextobin($hexstr){
$n = strlen($hexstr);
$sbin = "";
$i = 0;
while($i < $n){
$a = substr($hexstr,$i,2);
$c = pack("H*", $a);
if ( $i == 0 ){ $sbin = $c; }
else { $sbin .= $c;}
$i += 2;
}
return $sbin;
}
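One way to narrow down where the corruption happens is to check the hex round trip in isolation. The sketch below uses made-up bytes as a stand-in for an uploaded PDF (an assumption for illustration) and shows that bin2hex() followed by a single pack("H*", ...) call over the whole string restores the original bytes, which is the same result the loop in hextobin() produces.

```php
<?php
// Stand-in bytes for an uploaded PDF, including NUL and high bytes
$original = "%PDF-1.4\n\x00\xFF\x10binary\r\n%%EOF";

// What STEP 1 writes to the database
$stored = bin2hex($original);

// One call decodes the whole hex string; no per-byte loop is needed
$decoded = pack("H*", $stored);

var_dump($decoded === $original);                    // round trip is lossless
var_dump(strlen($stored) === 2 * strlen($original)); // two hex digits per byte
```

If the same comparison holds for the data read back in STEP 3 (for example by comparing md5() of the decoded string with md5_file() of the original upload), the storage steps are probably not the culprit and the attachment call is the next thing to look at. One untested guess is the 'binary' encoding argument passed to AddStringAttachment, whose default in PHPMailer is 'base64'.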
STEP 3
Finally the part where I go to send it as a mailer.
$q = "SELECT lease_doc_file_data, lease_doc_file_name, lease_doc_file_type FROM lease_doc_file WHERE lease_doc__id ='$id'";
$file_data = "";
$file_name = "";
$file_type = "";
foreach($CONN->query($q) as $row){
$file_data = $row['lease_doc_file_data'];
$file_name = $row['lease_doc_file_name'];
$file_type = $row['lease_doc_file_type'];
}
$file_data = hextobin($file_data);
$mail->AddStringAttachment($file_data, $file_name, 'binary', $file_type);
So this is the three-step process, and I'm not sure where the error is coming from. Hopefully someone can help! Thank you in advance for all help!

Continue Loop if Term is in an Array (odd result)

The script in question takes an excel file of language vocabulary (French to English, etc) and creates XML (zips and downloads) to format a crossword generator we use.
It works, but I've been asked to remove any duplicate terms as an enhancement. Below is the original code in full, and then the new code to skip duplicate terms. With the new code, everything runs, but it creates a corrupt ZIP. Please see the before and after code and tell me what is going on:
Full before working code:
<?php
/** Error reporting */
error_reporting(E_ALL);
ini_set('display_errors', TRUE);
ini_set('display_startup_errors', TRUE);
define('EOL',(PHP_SAPI == 'cli') ? PHP_EOL : '<br />');
/** PHPExcel */
require_once 'Classes/PHPExcel.php';
require_once 'Classes/PHPExcel/IOFactory.php';
/** Functions */
require_once 'zip.php';
require_once 'named_to_number.php';
if ($_FILES["file"]["error"] > 0) {
echo "Error: " . $_FILES["file"]["error"] . "<br>";
}
/** Create Excel Object using PHPExcel **/
$inputFileName = $_FILES["file"]["tmp_name"];
$objPHPExcel = PHPExcel_IOFactory::load($inputFileName);
$objWorksheet = $objPHPExcel->getActiveSheet();
/** Get how many rows **/
$highestRow = $objWorksheet->getHighestRow();
/** Keeps track of chapters to make new files on change **/
$chapter = -1;
/** For keeping track of the files created to zip & delete. **/
$files_to_zip = array();
/** Iterates through every row, writing data from cells into XML files. **/
for ($row = 2; $row <= $highestRow; ++$row) {
//remove spaces
$term = str_replace(' ', '', $objWorksheet->getCellByColumnAndRow(1, $row)->getValue());
//skip terms if they are too long or contain a non-alpha character.
if (strlen($term) >= 17 || !preg_match('/^\p{L}+$/ui', $term)) {
continue;
}
//translates accented characters to numbered HTML code
$term = named_to_number(htmlentities($term, ENT_SUBSTITUTE , 'UTF-8'));
/** Checks first column to see if the chapter has changed.
If it has, the current file will be closed and a new one opened. **/
if ($chapter != $objWorksheet->getCellByColumnAndRow(0, $row)->getValue()){
fwrite ($f, "</words>\n</content>");
fclose($f);
if (strlen($objWorksheet->getCellByColumnAndRow(0, $row)->getValue()) < 2) {
$filename = 'ch0' . $objWorksheet->getCellByColumnAndRow(0, $row)->getValue() . '.xml';
} else {
$filename = 'ch' . $objWorksheet->getCellByColumnAndRow(0, $row)->getValue() . '.xml';
}
$f = fopen($filename, 'a');
/** Add to the list of files to zip and delete **/
array_push($files_to_zip, $filename);
fwrite ($f, "<content>\n<words>\n");
/** Update chapter value **/
$chapter = $objWorksheet->getCellByColumnAndRow(0, $row)->getValue();
}
/** Write terms **/
$data =
"<word><entry>" . $term .
"</entry><clue>" . $objWorksheet->getCellByColumnAndRow(2, $row)->getValue() .
"</clue></word>\n";
fwrite ($f, $data);
}
fwrite ($f, "</words>\n</content>");
fclose($f);
/** Removes any blank ch.xml files **/
if(($key = array_search('ch.xml', $files_to_zip)) !== false) {
unset($files_to_zip[$key]);
unlink('chapter.htm');
}
$zip = create_zip($files_to_zip, 'crossword.zip');
foreach($files_to_zip as &$del){
unlink($del);
}
header("Content-disposition: attachment; filename=crossword.zip");
header("Content-type: application/zip");
readfile("crossword.zip");
unlink("crossword.zip");
?>
Relevant snippet where code was added (comments only related to new code):
//array to be used to hold terms to check against
$used = array();
for ($row = 2; $row <= $highestRow; ++$row) {
$term = str_replace(' ', '', $objWorksheet->getCellByColumnAndRow(1, $row)->getValue());
//Add term to running array to check against to see if it exists
array_push($used, $term);
//Added a third condition to check the array to see if the term exists. I have also tried this using isset with array_flip.
if (strlen($term) >= 17 || !preg_match('/^\p{L}+$/ui', $term) || in_array($term, $used)) {
continue;
}
Again, the odd thing is that the script runs, but it just produces a bad zip. It is definitely that third conditional that is tripping the script up. I have tried it in its own if statement (just to be sure the syntax was right), but the problem persists.
Please help!
Thanks,
Mike
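A possible explanation, not confirmed in the thread: the term is pushed onto $used before the in_array() check, so the check is true for every row and every row is skipped. In that case no chapter file is ever opened, and the later fwrite()/fclose() calls on an undefined $f would emit warnings into the response, which could be what corrupts the download. The sketch below shows the check-then-record order in isolation; filter_unique_terms() is a hypothetical helper, not part of the original script.

```php
<?php
// Hypothetical helper: returns the terms that pass the filters,
// keeping only the first occurrence of each.
function filter_unique_terms(array $terms) {
    $used = array(); // term => true, so lookups do not scan the whole list
    $kept = array();
    foreach ($terms as $raw) {
        $term = str_replace(' ', '', $raw);
        // Skip terms that are too long, non-alphabetic, or already seen
        if (strlen($term) >= 17 || !preg_match('/^\p{L}+$/u', $term) || isset($used[$term])) {
            continue;
        }
        // Record the term only AFTER it has passed the duplicate check
        $used[$term] = true;
        $kept[] = $term;
    }
    return $kept;
}

print_r(filter_unique_terms(array('chat', 'le chien', 'chat', 'abc1', 'lechien')));
// keeps "chat" and "lechien"
```

Using isset() on an associative array instead of in_array() is a design choice for speed on long vocabulary lists; in_array() would also work as long as the push happens after the check.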

Tailing Log File and Write results to new file

I'm not sure how to word this so I'll type it out and then edit and answer any questions that come up..
Currently on my local network device (PHP4 based) I'm using this to tail a live system log file: http://commavee.com/2007/04/13/ajax-logfile-tailer-viewer/
This works well: every second it loads an external page (logfile.php) that does a tail -n 100 logfile.log. The script doesn't do any buffering, so the results it displays onscreen are the last 100 lines from the log file.
The logfile.php contains:
<?php
// logtail.php
$cmd = "tail -10 /path/to/your/logs/some.log";
exec("$cmd 2>&1", $output);
foreach($output as $outputline) {
echo ("$outputline\n");
}
?>
This part is working well.
I have adapted the logfile.php page to write each $outputline to a new text file, simply using fwrite($fp,$outputline."\n");
Whilst this works, I am having issues with duplication in the new file that is created.
Each time tail -n 100 runs it produces results, and the next run can produce some of the same lines again. As this repeats, I end up with multiple duplicated lines in the new text file.
I can't directly compare the line I'm about to write with previous lines, as there could legitimately be identical lines.
Is there any way I can compare the current block of 100 lines with the previous block and then write only the lines that don't match? Again, a possible issue is that blocks A and B will contain identical lines that are needed...
Is it possible to update logfile.php to note the position it last looked at in my logfile, and then read only the next 100 lines from there and write those to the new file?
The log file could be up to 500MB, so I don't want to read it all in each time.
Any advice or suggestions welcome..
Thanks
UPDATE @ 16:30
I've sort of got this working using :
$file = "/logs/syst.log";
$handle = fopen($file, "r");
if(isset($_SESSION['ftell'])) {
clearstatcache();
fseek($handle, $_SESSION['ftell']);
while ($buffer = fgets($handle)) {
echo $buffer."<br/>";
@ob_flush(); @flush();
}
fclose($handle);
@$_SESSION['ftell'] = ftell($handle);
} else {
fseek($handle, -1024, SEEK_END);
fclose($handle);
@$_SESSION['ftell'] = ftell($handle);
}
This seems to work, but it loads the entire file first and then just the updates.
How would I get it to start with the last 50 lines and then show just the updates?
Thanks :)
UPDATE 04/06/2013
Whilst this works it's very slow with large files.
I've tried this code and it seems faster, but it doesn't just read from where it left off.
function last_lines($path, $line_count, $block_size = 512){
$lines = array();
// we will always have a fragment of a non-complete line
// keep this in here till we have our next entire line.
$leftover = "";
$fh = fopen($path, 'r');
// go to the end of the file
fseek($fh, 0, SEEK_END);
do{
// need to know whether we can actually go back
// $block_size bytes
$can_read = $block_size;
if(ftell($fh) < $block_size){
$can_read = ftell($fh);
}
// go back as many bytes as we can
// read them to $data and then move the file pointer
// back to where we were.
fseek($fh, -$can_read, SEEK_CUR);
$data = fread($fh, $can_read);
$data .= $leftover;
fseek($fh, -$can_read, SEEK_CUR);
// split lines by \n. Then reverse them,
// now the last line is most likely not a complete
// line which is why we do not directly add it, but
// append it to the data read the next time.
$split_data = array_reverse(explode("\n", $data));
$new_lines = array_slice($split_data, 0, -1);
$lines = array_merge($lines, $new_lines);
$leftover = $split_data[count($split_data) - 1];
}
while(count($lines) < $line_count && ftell($fh) != 0);
if(ftell($fh) == 0){
$lines[] = $leftover;
}
fclose($fh);
// Usually, we will read too many lines, correct that here.
return array_slice($lines, 0, $line_count);
}
Is there any way this can be amended so it reads from the last known position?
Thanks

Introduction
You can tail a file by tracking the last position.
Example
$file = __DIR__ . "/a.log";
$tail = new TailLog($file);
$data = $tail->tail(100) ;
// Save $data to new file
TailLog is a simple class I wrote for this task. Here is a simple example to show that it's actually tailing the file.
Simple Test
$file = __DIR__ . "/a.log";
$tail = new TailLog($file);
// Some Random Data
$data = array_chunk(range("a", "z"), 3);
// Write Log
file_put_contents($file, implode("\n", array_shift($data)));
// First Tail (2) Run
print_r($tail->tail(2));
// Run Tail (2) Again
print_r($tail->tail(2));
// Write Another data to Log
file_put_contents($file, "\n" . implode("\n", array_shift($data)), FILE_APPEND);
// Call Tail Again after writing Data
print_r($tail->tail(2));
// See the full content
print_r(file_get_contents($file));
Output
// First Tail (2) Run
Array
(
[0] => c
[1] => b
)
// Run Tail (2) Again
Array
(
)
// Call Tail Again after writing Data
Array
(
[0] => f
[1] => e
)
// See the full content
a
b
c
d
e
f
Real Time Tailing
while(true) {
$data = $tail->tail(100);
// write data to another file
sleep(5);
}
Note: Tailing 100 lines does not mean it will always return 100 lines. It returns the new lines that were added; 100 is just the maximum number of lines to return. This might not be efficient if you have heavy logging of more than 100 lines per second.
Tail Class
class TailLog {
private $file;
private $data;
private $timeout = 5;
private $lock;
function __construct($file) {
$this->file = $file;
$this->lock = new TailLock($file);
}
public function tail($lines) {
$pos = - 2;
$t = $lines;
$fp = fopen($this->file, "r");
$break = false;
$line = "";
$text = array();
while($t > 0) {
$c = "";
// Search for end of line
while($c != "\n" && $c != PHP_EOL) {
if (fseek($fp, $pos, SEEK_END) == - 1) {
$break = true;
break;
}
if (ftell($fp) < $this->lock->getPosition()) {
break;
}
$c = fgetc($fp);
$pos --;
}
if (ftell($fp) < $this->lock->getPosition()) {
break;
}
$t --;
$break && rewind($fp);
$text[$lines - $t - 1] = fgets($fp);
if ($break) {
break;
}
}
// Move to end
fseek($fp, 0, SEEK_END);
// Save Position
$this->lock->save(ftell($fp));
// Close File
fclose($fp);
return array_map("trim", $text);
}
}
Tail Lock
class TailLock {
private $file;
private $lock;
private $data;
function __construct($file) {
$this->file = $file;
$this->lock = $file . ".tail";
touch($this->lock);
if (! is_file($this->lock))
throw new Exception("can't Create Lock File");
$this->data = json_decode(file_get_contents($this->lock));
// Check if the lock file is valid JSON
// Check if data in the original file has not been deleted
// You expect the data to increase, not decrease
if (! $this->data || $this->data->size > filesize($this->file)) {
$this->reset($file);
}
}
function getPosition() {
return $this->data->position;
}
function reset() {
$this->data = new stdClass();
$this->data->size = filesize($this->file);
$this->data->modification = filemtime($this->file);
$this->data->position = 0;
$this->update();
}
function save($pos) {
$this->data = new stdClass();
$this->data->size = filesize($this->file);
$this->data->modification = filemtime($this->file);
$this->data->position = $pos;
$this->update();
}
function update() {
return file_put_contents($this->lock, json_encode($this->data, 128));
}
}

It's not really clear how you want to use the output, but would something like this work?
// position saved by the previous run (0 if there is none yet)
$dat = file_exists("tracker.dat") ? (int) file_get_contents("tracker.dat") : 0;
$fp = fopen("/logs/syst.log", "r");
fseek($fp, $dat, SEEK_SET);
ob_start();
// alternatively you can do a while fgets if you want to interpret the file or do something
fpassthru($fp);
// remember the new position before closing the handle
$pos = ftell($fp);
fclose($fp);
echo nl2br(ob_get_clean());
file_put_contents("tracker.dat", $pos);
tracker.dat is just a text file that contains the read position from the previous run. I'm just seeking to that position and piping the rest to the output buffer.

Use tail -c <number of bytes> instead of a number of lines, and then check the file size. The rough idea is:
$old_file_size = 0;
$max_bytes = 512;
function last_lines($path) {
    // the counters live outside the function, so import them explicitly
    global $old_file_size, $max_bytes;
    clearstatcache(); // filesize() results are cached between calls
    $new_file_size = filesize($path);
    $pending_bytes = $new_file_size - $old_file_size;
    if ($pending_bytes > $max_bytes) $pending_bytes = $max_bytes;
    // PHP concatenates strings with ".", not "+"
    exec("tail -c " . $pending_bytes . " " . escapeshellarg($path), $output);
    $old_file_size = $new_file_size;
    return $output;
}
The advantage is that you can do away with all the special processing and get good performance. The disadvantage is that you have to split the output into lines manually, and you could end up with unfinished lines. But this isn't a big deal: you can work around it by omitting the last line from the output (and subtracting that line's number of bytes from old_file_size).
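The workaround for unfinished lines can be sketched without calling tail at all. The following is a minimal pure-PHP variant of the same idea, written under assumptions of my own: read_new_lines() is a hypothetical helper, the offset is kept by the caller, and a partial trailing line is simply left unread until its newline arrives.

```php
<?php
// Hypothetical helper: return the complete lines appended since $offset and
// advance $offset past them. A trailing partial line stays for the next call.
// Caveat: a single line longer than $max_bytes would never be returned.
function read_new_lines($path, &$offset, $max_bytes = 8192) {
    $offset = (int) $offset;
    clearstatcache(true, $path);      // filesize() results are cached
    $size = filesize($path);
    if ($size < $offset) {
        $offset = 0;                  // file was truncated or rotated
    }
    if ($size === $offset) {
        return array();               // nothing new
    }
    $fh = fopen($path, 'rb');
    fseek($fh, $offset);
    $chunk = fread($fh, min($max_bytes, $size - $offset));
    fclose($fh);
    $last_nl = strrpos($chunk, "\n");
    if ($last_nl === false) {
        return array();               // no complete line yet
    }
    $offset += $last_nl + 1;          // skip past the last complete line
    return explode("\n", substr($chunk, 0, $last_nl));
}

// Small demonstration with a temporary file
$log = tempnam(sys_get_temp_dir(), 'log');
file_put_contents($log, "a\nb\nc");               // "c" has no newline yet
$pos = 0;
$first = read_new_lines($log, $pos);              // a, b
file_put_contents($log, "\nd\n", FILE_APPEND);
$second = read_new_lines($log, $pos);             // c, d
unlink($log);
print_r($first);
print_r($second);
```

Between requests, $pos would have to be persisted somewhere, for example in a small file or in the session.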

File manipulation search and replace csv php

I need a script that finds and then replaces a certain line in a CSV-like file.
The file looks like this:
18:110327,98414,127500,114185,121701,89379,89385,89382,92223,89388,89366,89362,89372,89369
21:82297,79292,89359,89382,83486,99100
98:110327,98414,127500,114185,121701
24:82297,79292,89359,89382,83486,99100
Now I need to change line 21.
This is what I've got so far.
The first 2 to 4 digits, followed by a colon, are a category number. Every number after that (each followed by a comma) is the id of a page.
I access the ids I want (i.e. 82297 and so on) from the database.
//test 2
$sQry = "SELECT * FROM artikelen WHERE adviesprijs <>''";
$rQuery = mysql_query ($sQry);
if ( $rQuery === false )
{
echo mysql_error ();
exit ;
}
$aResult = array ();
while ( $r = mysql_fetch_assoc ($rQuery) )
{
$aResult[] = $r['artikelid'];
}
$replace_val_dirty = join(",",$aResult);
$replace_val= "21:".$replace_val_dirty;
// file location
$file='../../data/articles/index.lst';
// read the file index.lst
$file1 = file_get_contents($file);
// strip the earlier article ids from index.lst
$file3='../../data/articles/index_grp21.lst';
$file3_contents = file_get_contents($file3);
$file2 = str_replace($file3_contents, $replace_val, $file1);
if (file_exists($file)) {
echo "The file $file exists";
} else {
echo "The file $file does not exist";
}
if (file_exists($file3)) {
echo "The file $file3 exists";
} else {
echo "The file $file3 does not exist";
}
// replace the data
$file_val = $file2;
// write the file
file_put_contents($file, $file_val);
// write index_grp21.lst
file_put_contents($file3, $replace_val);
mail('info#', 'Aanbieding catergorie geupdate', 'Aanbieding catergorie geupdate');
Can anyone point me in the right direction to do this?
Any help would be appreciated.

You need to open the original file and go through each line. When you find the line to be changed, change that line.
As you can not edit the file while you do that, you write a temporary file while doing this, so you copy over line-by-line and in case the line needs a change, you change that line.
When you're done with the whole file, you copy over the temporary file to the original file.
Example Code:
$path = 'file';
$category = 21;
$articles = [111182297, 79292, 89359, 89382, 83486, 99100];
$prefix = $category . ':';
$prefixLen = strlen($prefix);
$newLine = $prefix . implode(',', $articles);
This part is just setting up the basics: The category, the IDs of the articles and then building the related strings.
Now opening the file to change the line in:
$file = new SplFileObject($path, 'r+');
$file->setFlags(SplFileObject::DROP_NEW_LINE | SplFileObject::SKIP_EMPTY);
$file->flock(LOCK_EX);
The file is locked so that no other process can edit the file while it gets changed. Next to that file, the temporary file is needed, too:
$temp = new SplTempFileObject(4096);
After setting up the two files, let's go over each line in $file and compare if it needs to be replaced:
foreach ($file as $line) {
$isCategoryLine = substr($line, 0, $prefixLen) === $prefix;
if ($isCategoryLine) {
$line = $newLine;
}
$temp->fwrite($line."\n");
}
Now the temporary file ($temp) already contains the changed line. Note that I used the UNIX EOL (End Of Line) character (\n); depending on your concrete file type this may vary.
So now, the temporary file needs to be copied over to the original file. Let's rewind the file, truncate it and then write all lines again:
$file->seek(0);
$file->ftruncate(0);
foreach ($temp as $line) {
$file->fwrite($line);
}
And finally you need to lift the lock:
$file->flock(LOCK_UN);
And that's it, in $file, the line has been replaced.
Example at once:
$path = 'file';
$category = 21;
$articles = [111182297, 79292, 89359, 89382, 83486, 99100];
$prefix = $category . ':';
$prefixLen = strlen($prefix);
$newLine = $prefix . implode(',', $articles);
$file = new SplFileObject($path, 'r+');
$file->setFlags(SplFileObject::DROP_NEW_LINE | SplFileObject::SKIP_EMPTY);
$file->flock(LOCK_EX);
$temp = new SplTempFileObject(4096);
foreach ($file as $line) {
$isCategoryLine = substr($line, 0, $prefixLen) === $prefix;
if ($isCategoryLine) {
$line = $newLine;
}
$temp->fwrite($line."\n");
}
$file->seek(0);
$file->ftruncate(0);
foreach ($temp as $line) {
$file->fwrite($line);
}
$file->flock(LOCK_UN);
Should work with PHP 5.2 and above, I use PHP 5.4 array syntax, you can replace [111182297, ...] with array(111182297, ...) in case you're using PHP 5.2 / 5.3.

reading and counting words from pdf document

I have been working on a text extraction project covering various file formats, but PDF and PowerPoint are giving me the most pain.
Does anyone here know how to read text from existing PDF documents using any tool or library (tcpdf, xpdf or fpdfi)? I haven't seen any exact solution for reading text from PDF or PPT. Please, no Zend solutions.
Here is the code for PDF:
function pdf2txt($filename){
$data = getFileData($filename);
// grab objects and then grab their contents (chunks)
$a_obj = getDataArray($data,"obj","endobj");
foreach($a_obj as $obj){
$a_filter = getDataArray($obj,"<<",">>");
if (is_array($a_filter)){
$j++;
$a_chunks[$j]["filter"] = $a_filter[0];
$a_data = getDataArray($obj,"stream\r\n","endstream");
if (is_array($a_data)){
$a_chunks[$j]["data"] = substr($a_data[0],strlen("stream\r\n"),strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
}
}
}
// decode the chunks
foreach($a_chunks as $chunk){
// look at each chunk and decide how to decode it - by looking at the contents of the filter
$a_filter = explode("/",$chunk["filter"]);
if ($chunk["data"]!=""){
// look at the filter to find out which encoding has been used
if (strpos($chunk["filter"],"FlateDecode")!==false){
$data = @gzuncompress($chunk["data"]);
if (trim($data)!=""){
$result_data .= ps2txt($data);
} else {
//$result_data .= "x";
}
}
}
}
return $result_data;
}
// Function : ps2txt()
// Arguments : $ps_data - postscript data you want to convert to plain text
// Description : Does a very basic parse of postscript data to
// : return the plain text
// Author : Jonathan Beckett, 2005-05-02
function ps2txt($ps_data){
$result = "";
$a_data = getDataArray($ps_data,"[","]");
if (is_array($a_data)){
foreach ($a_data as $ps_text){
$a_text = getDataArray($ps_text,"(",")");
if (is_array($a_text)){
foreach ($a_text as $text){
$result .= substr($text,1,strlen($text)-2);
}
}
}
} else {
// the data may just be in raw format (outside of [] tags)
$a_text = getDataArray($ps_data,"(",")");
if (is_array($a_text)){
foreach ($a_text as $text){
$result .= substr($text,1,strlen($text)-2);
}
}
}
return $result;
}
// Function : getFileData()
// Arguments : $filename - filename you want to load
// Description : Reads data from a file into a variable
// and passes that data back
// Author : Jonathan Beckett, 2005-05-02
function getFileData($filename){
$handle = fopen($filename,"rb");
$data = fread($handle, filesize($filename));
fclose($handle);
return $data;
}
// Function : getDataArray()
// Arguments : $data - data you want to chop up
// $start_word - delimiting characters at start of each chunk
// $end_word - delimiting characters at end of each chunk
// Description : Loop through an array of data and put all chunks
// between start_word and end_word in an array
// Author : Jonathan Beckett, 2005-05-02
function getDataArray($data,$start_word,$end_word){
$start = 0;
$end = 0;
unset($a_result);
while ($start!==false && $end!==false){
$start = strpos($data,$start_word,$end);
if ($start!==false){
$end = strpos($data,$end_word,$start);
if ($end!==false){
// data is between start and end
$a_result[] = substr($data,$start,$end-$start+strlen($end_word));
}
}
}
return $a_result;
}
This one is for PowerPoint. I found it somewhere here, but it isn't working either.
function parsePPT($filename) {
// This approach uses detection of the string "chr(0f).Hex_value.chr(0x00).chr(0x00).chr(0x00)" to find text strings, which are then terminated by another NUL chr(0x00). [1] Get text between delimiters [2]
$fileHandle = fopen($filename, "r");
$line = @fread($fileHandle, filesize($filename));
$lines = explode(chr(0x0f),$line);
$outtext = '';
foreach($lines as $thisline) {
if (strpos($thisline, chr(0x00).chr(0x00).chr(0x00)) == 1) {
$text_line = substr($thisline, 4);
$end_pos = strpos($text_line, chr(0x00));
$text_line = substr($text_line, 0, $end_pos);
$text_line = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/","",$text_line);
if(substr($text_line,0,20)!="Click to edit Master")
if (strlen($text_line) > 1) {
$outtext.= substr($text_line, 0, $end_pos)."\n<br>";
}
}
}
return $outtext;
}

Why are you trying to reinvent the wheel? You could use e.g. xpdf or a similar tool to extract the text inside the PDF, and then process the resulting plain text file. The same approach could be used for virtually any file format that contains text (i.e. first convert to a plain text version, then process that).
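As a rough sketch of that convert-then-process approach: the code below assumes the pdftotext command-line tool (shipped with xpdf and poppler-utils) is installed and on the PATH, and a Unix-like shell for the 2>/dev/null redirect. pdf_to_text() and count_words() are hypothetical helper names, not part of any library.

```php
<?php
// Assumes the `pdftotext` CLI is available; returns null when it produced
// no output (missing tool, unreadable file, or a PDF without a text layer).
function pdf_to_text($pdf_path) {
    // "-" as the output file asks pdftotext to write the text to stdout
    $out = shell_exec('pdftotext ' . escapeshellarg($pdf_path) . ' - 2>/dev/null');
    return is_string($out) && $out !== '' ? $out : null;
}

// Count words by splitting on runs of whitespace
function count_words($text) {
    return count(preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY));
}

echo count_words("Hello  PDF\nworld "), "\n"; // 3
```

With a real document, the two helpers would be combined by calling pdf_to_text() first and passing its result to count_words() only when it is not null.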
Indexing PDF Documents with Zend_Search_Lucene could be an interesting read if you opt for that solution.
