Find the number of pages in a PDF using PHP

I have researched this for a few days now and finally found something that seemed to work, but I am getting the wrong result. I need to count the number of pages in a PDF file on a remote server. My code opens the PDF, but it's not finding the correct number of pages and I'm not sure why.
Here is my code so far:
$CI = &get_instance();
$CI->load->library('Awss3', null, 'S3');
$CI->load->library('Pdflib');
$data = $CI->S3->readFile('uploads/225572/filename.pdf', false, 'bucket-name');
$needle = 'Page';
$positions = array();
$lastPos = 0;
while (($lastPos = strpos($data, $needle, $lastPos))!==false) {
$positions[] = $lastPos;
$lastPos = $lastPos + strlen($needle);
}
echo count($positions);
foreach ($positions as $value) {
echo $value . '<br />';
}
$test = strpos($data, 'Page');
If I echo $data, I get lots of symbols and a few words, but $test comes out to 0 when it should be 16. Does it depend on the type of PDF, or do I need to decode it or something like that?

The simplest approach is to use ImageMagick (through PHP's Imagick extension).
Here is some sample code:
$image = new Imagick();
$image->pingImage('myPdfFile.pdf');
echo $image->getNumberImages();
Otherwise, you can also use PHP PDF libraries such as mPDF or TCPDF.
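Without Imagick, a commonly used heuristic is to count the page objects' `/Type /Page` entries in the raw bytes instead of the bare word "Page" (which also matches `/Pages` and other keys). The sketch below assumes the page dictionaries are stored uncompressed; since PDF 1.5, objects may be packed into compressed object streams, and then this undercounts (possibly returning 0), so treat it as a fallback rather than a real parser. The function name `countPdfPagesHeuristic` is hypothetical.

```php
<?php
// Heuristic page counter: counts "/Type /Page" dictionary entries in raw PDF bytes.
// Assumption: page objects are not inside compressed object streams; if they are,
// the result is an undercount.
function countPdfPagesHeuristic(string $data): int
{
    // "/Type /Page" with optional whitespace, but not "/Type /Pages" (the page-tree node):
    // the negative lookahead rejects a letter right after "Page".
    $count = preg_match_all('#/Type\s*/Page(?![a-zA-Z])#', $data);
    return $count === false ? 0 : $count;
}
```

For example, `echo countPdfPagesHeuristic($data);` with `$data` read from S3 as in the question.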


How to read a csv file with php code inside?

I searched Google but found nothing that fits my problem, or maybe I searched with the wrong words.
In many threads I read, the Smarty template engine was the solution, but I don't want to use Smarty because it's too big for this little project.
My problem:
I have a CSV file that contains only HTML and PHP code. It's a simple HTML template document; I use the PHP code to generate dynamic image links, for example.
I want to read in this file (that works), but how can I handle the PHP code inside it? The PHP code shows up as-is instead of being executed. All the variables I use in the CSV file are still valid and correct.
Short version:
How do I handle, print or echo PHP code in a CSV file?
Thanks a lot,
and sorry for my bad English
Formatting your comment above, you have the following code:
$userdatei = fopen("selltemplate/template.txt","r");
while(!feof($userdatei)) {
$zeile = fgets($userdatei);
echo $zeile;
}
fclose($userdatei);
// This reads in the CSV file; one line of its content is:
// src="<?php echo $bild1; ?>" ></a>
This assumes $bild1 is defined somewhere else, but try using these functions in your while loop to parse and output your HTML/PHP:
$userdatei = fopen("selltemplate/template.txt","r");
while (($zeile = fgets($userdatei)) !== false) {
    outputResults($zeile);
}
fclose($userdatei);
//-- $delims holds the opening and closing delimiters of the code in $string; it defaults to the standard PHP open/close tags
function parseString($string, $delims = array()) {
    //-- by default the whole line is plain (non-php) content
    $result = array('pre' => $string, 'code' => '', 'post' => '');
    //-- init delimiter vars
    if (empty($delims)) {
        $delims = array('<?php', '?>');
    }
    $start = $delims[0];
    $end = $delims[1];
    //-- where our php delimiters start; no opening tag means nothing to evaluate
    $php_start = strpos($string, $start);
    if ($php_start === false) {
        return $result;
    }
    //-- where our php CODE starts/ends
    $php_code_start = $php_start + strlen($start);
    $php_code_end = strpos($string, $end, $php_code_start);
    if ($php_code_end === false) {
        return $result;
    }
    //-- the non-php content before/after the php delimiters, and the code between them
    $result['pre'] = substr($string, 0, $php_start);
    $result['code'] = substr($string, $php_code_start, $php_code_end - $php_code_start);
    $result['post'] = substr($string, $php_code_end + strlen($end));
    return $result;
}
function outputResults($string) {
    $result = parseString($string);
    print $result['pre'];
    if (trim($result['code']) !== '') {
        //-- eval() runs in this function's scope, so variables the template uses (such as $bild1)
        //-- must be made available here, e.g. with "global $bild1;".
        //-- A closing tag implies a semicolon, so append one (a doubled ';' is harmless).
        eval($result['code'] . ';');
    }
    print $result['post'];
}
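A common alternative to splitting lines and calling eval() is to let PHP execute the template itself: include it inside an output buffer and capture the result. include parses the file regardless of its extension, so a .txt or .csv template works. This is a sketch under two assumptions: the template is trusted (its PHP runs with full privileges), and the variables it uses are passed in as an array. `renderPhpTemplate` and `$vars` are hypothetical names.

```php
<?php
// Sketch: execute a trusted HTML/PHP template file and return its output as a string.
function renderPhpTemplate(string $path, array $vars): string
{
    // The included file shares this function's local scope, so extract() turns
    // each $vars entry (e.g. 'bild1' => ...) into a variable ($bild1) it can use.
    // EXTR_SKIP keeps $path and $vars from being overwritten.
    extract($vars, EXTR_SKIP);
    ob_start();
    try {
        include $path;
    } catch (Throwable $e) {
        ob_end_clean(); // discard partial output before rethrowing
        throw $e;
    }
    return (string) ob_get_clean();
}
```

Usage: `echo renderPhpTemplate('selltemplate/template.txt', ['bild1' => $bild1]);`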
Having PHP code inside a CSV file that should be parsed and probably executed using eval sounds pretty dangerous to me.
If I understand you correctly, you just want dynamic parameters in your CSV file, right? If that's the case and you don't want to add an entire templating language (like Mustache, Twig or Smarty) to your application, you could do a simple search and replace:
$string = "<img alt='{{myImageAlt}}' src='{{myImage}}' />";
$parameters = [
'myImageAlt' => 'company logo',
'myImage' => 'assets/images/logo.png'
];
foreach( $parameters as $key => $value )
{
$string = str_replace( '{{'.$key.'}}', $value, $string );
}
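The same substitution can be done in one call: strtr() with an array replaces all placeholders in a single pass and never re-scans text it has already substituted, so a value that itself contains `{{...}}` is left alone (the str_replace() loop would expand it again). A minimal sketch assuming the same `{{name}}` placeholder syntax; `renderTemplate` is a hypothetical name.

```php
<?php
// Sketch: replace {{name}} placeholders using one strtr() call.
function renderTemplate(string $template, array $parameters): string
{
    $pairs = [];
    foreach ($parameters as $key => $value) {
        $pairs['{{' . $key . '}}'] = (string) $value;
    }
    // Single pass, longest keys first; unknown placeholders are left untouched.
    return strtr($template, $pairs);
}
```

You could call it on each line read from the template file, or on the whole file contents at once.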

PHP Get Newest File Path

I'm using the following script to convert CSV to JSON (https://gist.github.com/robflaherty/1185299). I need to modify it so that, instead of using an exact file URL, it picks up the newest file in the directory as its "source" in $feed.
Any help would be great! I've tried using the code from "PHP: Get the Latest File Addition in a Directory", but I can't figure out how to modify it so that it works.
<?php
header('Content-type: application/json');
// Set your CSV feed
$feed = 'http://myurl.com/test.csv';
// Arrays we'll use later
$keys = array();
$newArray = array();
// Function to convert CSV into associative array
function csvToArray($file, $delimiter) {
    // Initialised so an unreadable file returns an empty array instead of an undefined variable
    $arr = array();
    if (($handle = fopen($file, 'r')) !== FALSE) {
        $i = 0;
        while (($lineArray = fgetcsv($handle, 4000, $delimiter, '"')) !== FALSE) {
            for ($j = 0; $j < count($lineArray); $j++) {
                $arr[$i][$j] = $lineArray[$j];
            }
            $i++;
        }
        fclose($handle);
    }
    return $arr;
}
// Do it
$data = csvToArray($feed, ',');
// Set number of elements (minus 1 because we shift off the first row)
$count = count($data) - 1;
//Use first row for names
$labels = array_shift($data);
foreach ($labels as $label) {
$keys[] = $label;
}
// Add Ids, just in case we want them later
$keys[] = 'id';
for ($i = 0; $i < $count; $i++) {
$data[$i][] = $i;
}
// Bring it all together
for ($j = 0; $j < $count; $j++) {
$d = array_combine($keys, $data[$j]);
$newArray[$j] = $d;
}
// Print it out as JSON
echo json_encode($newArray);
?>
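For reference, the conversion in this script can be sketched more compactly by reading the header row with fgetcsv() and pairing each following row with it via array_combine(). This assumes, as the script does, that the first row holds the column names. `csvFileToAssoc` is a hypothetical name, and rows whose field count doesn't match the header are skipped, because array_combine() throws a ValueError on mismatched lengths in PHP 8. With allow_url_fopen enabled, `$path` can also be an http:// URL like the original `$feed`.

```php
<?php
// Sketch: CSV file (first row = column names) -> list of associative arrays,
// each with a zero-based 'id' appended as in the script above.
function csvFileToAssoc(string $path, string $delimiter = ','): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return [];
    }
    $rows = [];
    // '"' and '\\' are the historical default enclosure/escape characters, passed explicitly.
    $keys = fgetcsv($handle, 0, $delimiter, '"', '\\');
    if ($keys !== false) {
        while (($fields = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
            // Skip blank or malformed lines: array_combine() needs equal lengths.
            if (count($fields) !== count($keys)) {
                continue;
            }
            $row = array_combine($keys, $fields);
            $row['id'] = count($rows);
            $rows[] = $row;
        }
    }
    fclose($handle);
    return $rows;
}
```

Usage: `header('Content-type: application/json'); echo json_encode(csvFileToAssoc($feed));`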
It's a difficult question to answer because there isn't enough detail.
Here are some questions that need to be answered.
1) Are you creating the CSV files that are being read? If you are, just make sure the file you want to read is always called "latest.csv": when you create a new "latest.csv", first check for an existing one and rename/archive it. Your directory then contains archives, but the latest file always has the same name.
2) If you are not creating the CSV files, you might want to ask their provider whether there's a way to identify the latest one. Surely, if they provide the files, they expect to give everyone the latest feed and have a mechanism for doing that.
3) If you don't know the provider and want to take a guess, look at how the files are named and try to predict the latest one. E.g., if the names appear to include a month and year, do a file_exists() (if you can) on the predicted next file. Again, just a possibility.
Based on your comments, if the files reside on the same server or are accessible on a filesystem that supports the file functions, then:
array_multisort(array_map('filemtime', $files=glob('/path/to/*.csv')), SORT_DESC, $files);
$newest = $files[0];
For remote access you could look at something like this: How can I download the most recent file on FTP with PHP?
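For local files, the glob()/filemtime() one-liner above can also be wrapped in a function that copes with an empty directory, where `$files[0]` would not exist. A sketch; `newestFile` is a hypothetical name, and the pattern in the usage line is only an example.

```php
<?php
// Sketch: return the most recently modified file matching a glob pattern,
// or null if nothing matches.
function newestFile(string $pattern): ?string
{
    clearstatcache(); // make sure filemtime() sees current modification times
    $newest = null;
    $newestTime = PHP_INT_MIN;
    // glob() returns false on error and [] when nothing matches; treat both as "no files".
    foreach (glob($pattern) ?: [] as $file) {
        $mtime = filemtime($file);
        if ($mtime !== false && $mtime > $newestTime) {
            $newest = $file;
            $newestTime = $mtime;
        }
    }
    return $newest;
}
```

Usage: `$feed = newestFile('/path/to/*.csv');`, then check for `null` before reading it.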

PHP parse string error

I am extracting file URLs from a string that can be entered by a user or taken from a page's source.
I want to extract all .jpg image URLs.
I am using the following (example text shown), but a) it only returns the first one and b) it leaves off the '.jpg'.
$word1='http://';
$word2='.jpg';
$contents = 'uuuuyyyyyhttp://image.jpgandagainhereitishttp://image2.jpgxxxxcccffff';
$between=substr($contents, strpos($contents, $word1), strpos($contents, $word2) - strpos($contents, $word1));
echo $between;
Is there maybe a better way to do this?
When parsing a web page I cannot use a simple DOM approach, e.g. $images = $dom->getElementsByTagName('img');, because sometimes the image references are not in standard tags.
You can do something like this:
<?php
$contents = 'uuuuyyyyyhttp://image.jpgandagainhereitishttp://image2.jpgxxxxcccffff';
$matches = array();
preg_match_all('#(http://[^\s]*?\.jpg)#i', $contents, $matches);
print_r($matches);
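The preg_match_all() approach can be wrapped in a small function. In this sketch (`extractJpgUrls` is a hypothetical name) the pattern also accepts https://, and it assumes a URL never contains whitespace, quotes or angle brackets, so a match can't run across an HTML attribute boundary. The lazy `*?` quantifier makes each match stop at the first ".jpg" rather than the last one.

```php
<?php
// Sketch: return every http(s) URL ending in ".jpg" (case-insensitive) found in $text.
function extractJpgUrls(string $text): array
{
    // [^\s"'<>]*? : lazily consume URL characters up to the next ".jpg".
    preg_match_all('#https?://[^\s"\'<>]*?\.jpg#i', $text, $matches);
    return $matches[0];
}
```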
You can do this either with preg_match_all (as previously answered) or with the following function.
It simply explodes the original string on '.jpg', checks each part for a link and adds it to the array that gets returned.
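function getJpgLinks($string) {
    $return = array();
    $parts = explode('.jpg', $string);
    // The text after the last '.jpg' is not followed by '.jpg', so it cannot hold a complete link
    array_pop($parts);
    foreach ($parts as $value) {
        $position = strrpos($value, 'http://');
        if ($position !== false) {
            $return[] = substr($value, $position) . '.jpg';
        }
    }
    return $return;
}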

Reading and counting words from a PDF document

I have been working on a text extraction project for various file types, but PDF and PowerPoint are giving me the most pain. Here is the code for PDF.
Does anyone here know how to read text from existing PDF documents using any tool or library (TCPDF, xpdf or FPDI)? I haven't seen any exact solution for reading text from PDF or PPT. Please, no Zend solutions.
function pdf2txt($filename){
    $data = getFileData($filename);
    $a_chunks = array();
    $result_data = "";
    $j = 0;
    // grab objects and then grab their contents (chunks)
    $a_obj = getDataArray($data,"obj","endobj");
    if (is_array($a_obj)){
        foreach($a_obj as $obj){
            $a_filter = getDataArray($obj,"<<",">>");
            if (is_array($a_filter)){
                $j++;
                $a_chunks[$j]["filter"] = $a_filter[0];
                $a_chunks[$j]["data"] = "";
                $a_data = getDataArray($obj,"stream\r\n","endstream");
                if (is_array($a_data)){
                    $a_chunks[$j]["data"] = substr($a_data[0],strlen("stream\r\n"),strlen($a_data[0])-strlen("stream\r\n")-strlen("endstream"));
                }
            }
        }
    }
    // decode the chunks
    foreach($a_chunks as $chunk){
        // look at each chunk and decide how to decode it - by looking at the contents of the filter
        if ($chunk["data"]!=""){
            // look at the filter to find out which encoding has been used
            if (strpos($chunk["filter"],"FlateDecode")!==false){
                // FlateDecode streams are zlib-compressed; @ hides warnings for streams that fail to inflate
                $data = @gzuncompress($chunk["data"]);
                if ($data!==false && trim($data)!=""){
                    $result_data .= ps2txt($data);
                }
            }
        }
    }
    return $result_data;
}
// Function : ps2txt()
// Arguments : $ps_data - postscript data you want to convert to plain text
// Description : Does a very basic parse of postscript data to
// : return the plain text
// Author : Jonathan Beckett, 2005-05-02
function ps2txt($ps_data){
$result = "";
$a_data = getDataArray($ps_data,"[","]");
if (is_array($a_data)){
foreach ($a_data as $ps_text){
$a_text = getDataArray($ps_text,"(",")");
if (is_array($a_text)){
foreach ($a_text as $text){
$result .= substr($text,1,strlen($text)-2);
}
}
}
} else {
// the data may just be in raw format (outside of [] tags)
$a_text = getDataArray($ps_data,"(",")");
if (is_array($a_text)){
foreach ($a_text as $text){
$result .= substr($text,1,strlen($text)-2);
}
}
}
return $result;
}
// Function : getFileData()
// Arguments : $filename - filename you want to load
// Description : Reads data from a file into a variable
// and passes that data back
// Author : Jonathan Beckett, 2005-05-02
function getFileData($filename){
$handle = fopen($filename,"rb");
$data = fread($handle, filesize($filename));
fclose($handle);
return $data;
}
// Function : getDataArray()
// Arguments : $data - data you want to chop up
// $start_word - delimiting characters at start of each chunk
// $end_word - delimiting characters at end of each chunk
// Description : Loop through an array of data and put all chunks
// between start_word and end_word in an array
// Author : Jonathan Beckett, 2005-05-02
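function getDataArray($data,$start_word,$end_word){
    $start = 0;
    $end = 0;
    // stays null (not an empty array) when nothing is found, because callers test is_array()
    $a_result = null;
    while ($start!==false && $end!==false){
        $start = strpos($data,$start_word,$end);
        if ($start!==false){
            $end = strpos($data,$end_word,$start);
            if ($end!==false){
                // data is between start and end
                $a_result[] = substr($data,$start,$end-$start+strlen($end_word));
                // resume searching after this end_word, not inside it (e.g. "obj" inside "endobj")
                $end += strlen($end_word);
            }
        }
    }
    return $a_result;
}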
This one is for PowerPoint. I found it somewhere here, but it isn't working either:
function parsePPT($filename) {
// This approach detects the string chr(0x0f).Hex_value.chr(0x00).chr(0x00).chr(0x00) to find text strings, which are then terminated by another NUL chr(0x00).
$fileHandle = fopen($filename, "rb");
$line = fread($fileHandle, filesize($filename));
fclose($fileHandle);
$lines = explode(chr(0x0f),$line);
$outtext = '';
foreach($lines as $thisline) {
if (strpos($thisline, chr(0x00).chr(0x00).chr(0x00)) == 1) {
$text_line = substr($thisline, 4);
$end_pos = strpos($text_line, chr(0x00));
$text_line = substr($text_line, 0, $end_pos);
$text_line = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t#\/\_\(\)]/","",$text_line);
if(substr($text_line,0,20)!="Click to edit Master")
if (strlen($text_line) > 1) {
$outtext.= substr($text_line, 0, $end_pos)."\n<br>";
}
}
}
return $outtext;
}
Why are you trying to reinvent the wheel? You could use e.g. xpdf or a similar tool to extract the text data from the PDF, and afterwards process the resulting plain-text file. The same approach works for virtually any file format that contains text (i.e. first convert it to plain text, then process that).
Indexing PDF Documents with Zend_Search_Lucene could be an interesting read if you opt for that solution.
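Following that suggestion, here is a sketch that shells out to the pdftotext command-line tool (shipped with xpdf and Poppler) and counts the words in its output; passing "-" as the output file makes pdftotext write the text to standard output. It assumes pdftotext is installed and on the PATH and that shell_exec() is not disabled. `pdfToText` and `countWords` are hypothetical names.

```php
<?php
// Sketch: extract text with the external pdftotext tool, then count words.
function pdfToText(string $pdfPath): ?string
{
    // escapeshellarg() quotes the path so spaces or shell metacharacters are safe.
    $output = shell_exec('pdftotext ' . escapeshellarg($pdfPath) . ' -');
    // shell_exec() returns null (error or no output) or false (pipe failure).
    return is_string($output) ? $output : null;
}

// Counts runs of non-whitespace characters as words.
function countWords(string $text): int
{
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return $words === false ? 0 : count($words);
}
```

Usage: `$text = pdfToText('document.pdf'); if ($text !== null) { echo countWords($text); }`. Splitting on whitespace counts numbers such as "2005" as words; str_word_count() would skip them because it only counts alphabetic words.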

Parse CSV file of links to php array, feed these links to simplehtmldom

I have PHP code that reads and parses a CSV file into a multidimensional array. What I need to do next is take this array and let simplehtmldom fire off a crawler to return some company stock info.
The PHP code for the CSV parser is:
$arrCSV = array();
// Opening up the CSV file
if (($handle = fopen("NASDAQ.csv", "r")) !==FALSE) {
// Set the parent array key to 0
$key = 0;
// While there is data available loop through unlimited times (0) using separator (,)
while (($data = fgetcsv($handle, 0, ",")) !==FALSE) {
// Count the total keys in each row $data is the variable for each line of the array
$c = count($data);
//Populate the array
for ($x=0;$x<$c;$x++) {
$arrCSV[$key][$x] = $data[$x];
}
$key++;
} // end while
// Close the CSV file
fclose($handle);
} // end if
echo "<pre>";
print_r($arrCSV);
echo "</pre>";
This works great and parses the file line by line, $data being the variable for each line. What I need to do now is get this read via simplehtmldom, which is where it breaks down. I'm looking at using this code or something very similar; I'm pretty inexperienced at this, but I guess I would need a foreach statement somewhere along the line.
This is the simplehtmldom code:
$html = file_get_html($data);
$es = $html->find('div[class="detailsDataContainerLt"]');
$tickerdetails = ("$es[0]");
$FileHandle2 = fopen($data, 'w') or die("can't open file");
fwrite($FileHandle2, $tickerdetails);
fclose($FileHandle2);
fclose($handle);
So my question is: how can I get both of them working together? I have checked the simplehtmldom manual page several times and find it a little vague in this area. The simplehtmldom code above is what I use in another function, but with a direct link, so I know it works.
Regards,
Martin
Your loop could be reduced to (yes, it's the same):
while (($data = fgetcsv($handle, 0, ',')) !== false) {
$arrCSV[] = $data;
}
Using PHP's built-in DOM extension (DOMDocument/DOMXPath) instead of simplehtmldom, since it's standard PHP:
foreach ($arrCSV as $row) {
    $dom = new DOMDocument();
    // @ hides libxml warnings about malformed HTML; change 0 to the index of the URL
    if (!@$dom->loadHTMLFile($row[0])) {
        continue;
    }
    $xpath = new DOMXPath($dom);
    $result = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " detailsDataContainerLt ")]');
    if ($result !== false && $result->length > 0) {
        $file = fopen($row[1], 'w'); // Change 1 to the index of the filename you want to write to
        if ($file) {
            fwrite($file, $dom->saveHTML($result->item(0)));
            fclose($file);
        }
    }
}
That should do it, if I understood you correctly.
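To make the scraping step testable on its own, the parsing can be separated from the fetching: a function that takes an HTML string and returns the outer HTML of the first matching div. This is a sketch; `firstDivByClass` is a hypothetical name, and `$class` is assumed to be a plain class name without quotes, since it is inserted into the XPath expression as-is.

```php
<?php
// Sketch: return the outer HTML of the first <div> whose class list contains $class,
// or null if none matches.
function firstDivByClass(string $html, string $class): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects an empty string
    }
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    // Pad the class attribute and the needle with spaces so only whole class names match.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = (new DOMXPath($dom))->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $outer = $dom->saveHTML($nodes->item(0));
    return $outer === false ? null : $outer;
}
```

In the loop over $arrCSV you would fetch each page first, e.g. `$html = file_get_contents($row[0]);` (assuming the URL is in column 0 and allow_url_fopen is enabled), then call `firstDivByClass($html, 'detailsDataContainerLt')`.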
