I'm trying to create a dynamic PDF using PHP.
I can load the PDF once the form is submitted, but as soon as I try to edit the PDF it fails to load.
I shortened the code to keep it straightforward.
PDF Structure
Your Name is : <<NAME>>
PHP Dynamic Script
set_time_limit(180);
//Create Variable Names
$name = $_POST['name'];
function pdf_replace($pattern, $replacement, $string) {
$len = strlen($pattern);
$regexp = '';
for($i = 0; $i <$len; $i++) {
$regexp .= $pattern[$i];
if($i < $len - 1) {
$regexp .= "(\)\-{0,1}[0-9]*\(){0,1}";
}
}
return ereg_replace ($regexp, $replacement, $string);
}
header('Content-Disposition: filename=cert.pdf');
header('Content-type: application/pdf');
//$date = date('F d, Y');
$filename = 'testform.pdf';
$fp = fopen($filename, 'r');
$output = fread($fp, filesize($filename));
fclose($fp);
//Replace the holders
$output = pdf_replace('<<NAME>>', $name, $output);
echo $output;
If I comment out the output it loads the form fine, but as soon as I try to run the function to replace the placeholder it fails to load. Has anyone done something like this before?
I've tried your code and I assume one of the following reasons causes the problem:
When you call pdf_replace(), PHP emits a Deprecated notice for the ereg_replace() function. The notice text ends up in the output, breaks the PDF structure and causes the PDF to fail to load. This function has been deprecated since PHP 5.3.0 and was removed in PHP 7.0.0. The simple solution is to use preg_replace() instead.
function pdf_replace($pattern, $replacement, $string) {
return preg_replace('/'.preg_quote($pattern,'/').'/', $replacement, $string);
}
If you can't do that, the alternative is to edit the php.ini file and change the error_reporting parameter. You can add "^ E_DEPRECATED" to your current config value to disable Deprecated notices. Another option is to call error_reporting() at the beginning of your script with an appropriate value.
I can't see the PDF you use, but some PDF generators encode the PDF source. In that case it is hard to find text in it. For example, I tried the "Print to PDF" feature on a Mac and was not able to find plain text in the source. So either fopen or ereg_replace may complain about the file format. In this case you should use a library that handles PDF in a more clever manner. I prefer FPDF, but there are plenty of such libraries.
I'm a web designer and I use ezPDF to create PDF files for reports that can be viewed in any browser. Here's the page where I found the tutorials: http://www.weberdev.com/get_example.php3?ExampleID=4804
I hope this is helpful :)
Your code seems to have two problems.
First of all, ereg_replace() is a DEPRECATED function. Because of this the PHP script emits a notice, and that message breaks the PDF structure. Change the function to preg_replace() and it should work.
Secondly, I tried your code with a sample PDF form. It seems that ereg_replace() cannot process some characters. In the PDF form that I used, this function truncates the string (the PDF form data) when it meets a specific character, namely §. That's why the code will still not work even if you suppress the error using error_reporting(E_ALL ^ E_DEPRECATED);.
So you are better off with preg_replace():
<?php
set_time_limit(180);
//Create Variable Names
$name = $_POST['name'];
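function pdf_replace($pattern, $replacement, $string) {
$len = strlen($pattern);
$regexp = '';
for($i = 0; $i < $len; $i++) {
// Escape each character so it is matched literally
$regexp .= preg_quote($pattern[$i], '/');
if($i < $len - 1) {
// Allow optional kerning data such as ")-12(" between characters
$regexp .= "(\)\-{0,1}[0-9]*\(){0,1}";
}
}
// preg_replace() requires delimiters around the pattern
return preg_replace('/' . $regexp . '/', $replacement, $string);
}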
header('Content-Disposition: filename=cert.pdf');
header('Content-type: application/pdf');
//$date = date('F d, Y');
$filename = 'testform.pdf';
$fp = fopen($filename, 'r');
$output = fread($fp, filesize($filename));
fclose($fp);
//Replace the holders
$output = pdf_replace('<<NAME>>', $name, $output);
echo $output;
?>
It is not possible to replace a text string in a PDF unless the original PDF document meets very specific requirements.
Some comments on such a project:
A PDF document uses byte offsets to allow fast access to specific objects within the document. Changing a string invalidates these offsets, and the PDF document can then be seen as damaged.
In addition, most content streams in a PDF document are compressed. So the string you are searching for may be inside one of these streams and not "visible".
A string you "see" after a PDF is rendered need not be the same as in the PDF source.
The replaced/new string may use characters which are not available in the font used, or which are mapped to other characters by a separate encoding.
And there are many more things to consider...
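The byte-offset point can be illustrated with a small sketch. The "PDF" below is a hypothetical, heavily simplified stand-in (a real file also carries a cross-reference table, a trailer and usually compressed streams), and object_offset() is a helper name made up for this example. It only shows that a replacement of a different length moves every object that follows it:

```php
<?php
// Hypothetical, heavily simplified PDF-like text; a real PDF has more structure.
$pdf = "%PDF-1.4\n"
     . "1 0 obj\n(Your Name is : <<NAME>>)\nendobj\n"
     . "2 0 obj\n(second object)\nendobj\n";

// Byte offset of an object header, as a cross-reference table would record it.
function object_offset(string $data, string $header): int
{
    $pos = strpos($data, $header);
    return $pos === false ? -1 : $pos;
}

$before = object_offset($pdf, "2 0 obj");
$edited = str_replace('<<NAME>>', 'Alexander Hamilton', $pdf);
$after  = object_offset($edited, "2 0 obj");

// The shift equals the change in length of the replaced text.
$shift = $after - $before;
echo "offset before: $before, after: $after, shift: $shift\n";
```

Any offset stored in the file for the second object would still point at the old position, which is why a reader can treat the edited file as damaged unless the replacement has exactly the same length as the placeholder.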
Why aren't you using str_replace?
Please check whether this code works for you (assuming you can see the words you want to replace in a simple text editor):
// hide all errors so there will be no errors output which will break your PDF file
error_reporting(0);
ini_set("display_errors", 0);
//Create Variable Names
$name = $_POST['name'];
$filename = 'testform.pdf';
$fp = fopen($filename, 'r');
$output = fread($fp, filesize($filename));
fclose($fp);
//Replace the holders
$output = str_replace('<<NAME>>', $name, $output);
// I added the time just to avoid browser caching of the PDF file
header('Content-Disposition: filename=cert'.time().'.pdf');
header('Content-type: application/pdf');
echo $output;
There are many PHP projects for generating dynamic PDF files, so why not use one of them instead of this regular expression? You will probably have to add design elements such as tables and logos to the PDF in the future, and the easiest way to do that is to convert HTML to PDF.
If you are looking for performance and output that looks almost 100% like your HTML and CSS, you should use wkhtmltopdf. It is free and easy to use and it runs on your server, but it is not PHP: it is an executable file that you have to execute from your PHP code.
http://wkhtmltopdf.org/
Another alternative is TCPDF, which is pure PHP, unlike the previous one, which is an executable:
http://www.tcpdf.org/
I have used both and prefer wkhtmltopdf because it is faster and more precise with HTML and CSS.
I have tried many things, like "How to extract text from word file .doc,docx,.xlsx,.pptx php".
But that isn't a solution.
My server is Linux based, so enabling extension=php_com_dotnet.dll is not an option.
Another solution was installing LibreOffice on the server, converting the .doc file to .txt on the fly and then counting the words in that file. This is a very tedious and time-consuming job.
I just need a simple PHP script that removes the special characters from the .doc file and counts the number of words.
You can try with this PHP class that claims to be able to convert both .doc and .docx files in textual format.
http://www.phpclasses.org/package/7934-PHP-Convert-MS-Word-Docx-files-to-text.html
According to the example given, that's how you can use it:
require("doc2txt.class.php");
$docObj = new Doc2Txt("test.docx");
//$docObj = new Doc2Txt("test.doc");
$txt = $docObj->convertToText();
echo $txt;
As you pointed out, the core function of this library, as of many others, is something like this:
<?php
function read_doc($filename)
{
$fileHandle = fopen($filename, "r");
$line = @fread($fileHandle, filesize($filename));
fclose($fileHandle);
$lines = explode(chr(0x0D) , $line);
$outtext = "";
foreach($lines as $thisline)
{
$pos = strpos($thisline, chr(0x00));
if (($pos !== FALSE) || (strlen($thisline) == 0))
{
}
else
{
$outtext.= $thisline . " ";
}
}
$outtext = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/_()]/", "", $outtext);
return $outtext;
}
echo read_doc("sample.doc");
?>
I've tested this function with a .doc file and it seems to work quite well. It needs some fixes with the last part of the document (there is still some random text that is generated at the end of the output), but with some fine tuning it works reasonably.
EDIT:
You are right, this function works correctly only with .docx documents (the document I tested was probably made using the same mechanism). With a file saved with the .doc extension, this function doesn't work!
The only help I'm able to give you right now is the .doc binary specifications link (here is an even more complete file), where you can see how the binary structure is laid out and extract the information from there. I can't do it now, so I hope that somebody else can help you through this!
In the end I had to use LibreOffice, but it turned out to be very efficient. It solved all my problems.
So my advice would be to install the 'headless' package of LibreOffice on the server and use the command line conversion.
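A minimal sketch of that approach, assuming the LibreOffice binary is reachable as "soffice" on the server and that the "txt:Text" export filter is available in your installation. The function names, the file name and the output directory are made up for this example:

```php
<?php
// Build the shell command for a headless LibreOffice conversion to plain text.
// Assumes the binary is reachable as "soffice"; adjust the name for your server.
function build_convert_command(string $docPath, string $outDir): string
{
    return 'soffice --headless --convert-to txt:Text --outdir '
        . escapeshellarg($outDir) . ' ' . escapeshellarg($docPath);
}

// Count whitespace-separated words in text that has already been extracted.
function count_words(string $text): int
{
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return count($words);
}

// Hypothetical paths, for illustration only.
$cmd = build_convert_command('report.doc', '/tmp/converted');
echo $cmd, "\n";

// To run the conversion: exec($cmd, $lines, $status);
// then read the resulting .txt file from the output directory and count its words.
echo count_words("The quick  brown\nfox"), "\n";
```

Passing both paths through escapeshellarg() keeps file names with spaces or shell metacharacters from breaking the command.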
I've built a tool that incorporates various methods found around the web and on Stack Overflow that provides word, line and page counts for doc, docx, pdf and txt files. I hope it's of use to people. If anyone can get rtf working with it I'd love a pull request! https://github.com/joeblurton/doccounter
I am trying to read a bash script from a text file and print it to the screen via PHP.
I tried
$code = @file_get_contents( $myFileName );
as well as
$code = "";
$myFile = fopen($myFileName, "r");
while ($line = fgets($myFile)) {
$code .= $line;
}
However, the string I get from reading in the file doesn't contain all of the file's contents. The problem is that the text file contains the string
<<EOF
After that the string abruptly stops.
How come? It seems weird to me that PHP can't deal with those few characters and misinterprets them as the actual EOF.
Is there a way I can read in the whole file?
Thanks in advance!
When I try it, I don't experience that problem. Presumably, therefore, you are outputting the text to an HTML document and testing your code by looking at the rendering of that document in a browser (as opposed to looking at the raw output of the script, as it would appear in View > Source).
In HTML, < indicates the start of a tag. You need to escape your output with htmlspecialchars() for < to be treated as data instead of markup.
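A minimal sketch of the escaping step. The $code value here is a hypothetical stand-in for the contents read from the file, and wrapping the result in a pre element is just one way to keep the line breaks visible:

```php
<?php
// Hypothetical file contents standing in for the bash script read from disk.
$code = "cat <<EOF\nhello\nEOF\n";

// Escape <, >, & and quotes so the browser shows them as text, not markup.
$safe = htmlspecialchars($code, ENT_QUOTES, 'UTF-8');

// <pre> preserves the script's line breaks when rendered.
echo '<pre>' . $safe . '</pre>';
```

The string read from the file was complete all along; only the browser's rendering hid everything from the first < onward.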
I am trying to replace strings in a Word document by reading the file into a variable $content and then using str_ireplace() to change the string. I can read the content from the file, but str_ireplace() does not seem to be able to replace the string. I assumed it would, because the function is 'binary safe' according to the PHP documentation. Sorry, I am a beginner with PHP file manipulation, so all this is quite new to me.
This is what I have written.
copy('jack.doc' , 'newFile.doc');
$handle = fopen('newFile.doc','rb');
$content = '';
while (!feof($handle))
{
$content .= fread($handle, 1);
}
fclose($handle);
$handle = fopen('newFile.doc','wb');
$content = str_ireplace('USING_ICT_BOX', 'YOUR ICT CONTENT', $content);
fwrite($handle, $content);
fclose($handle);
When I download the new file, it opens as it should in MS Word but it shows the old string and not the one that should be replaced.
Can I fix this issue? Is there a better tool I can use for replacing strings in MS Word through PHP?
I had the same requirement to edit a .doc or .docx file using PHP, and I found a solution for it.
I have written a post about it: http://www.onlinecode.org/update-docx-file-using-php/
copy('jack.doc' , 'newFile.doc');
$full_path = 'newFile.doc';
// Note: this only works when the file is really in .docx (zip) format
$zip_val = new ZipArchive;
// open() returns true on success and an error code otherwise
if($zip_val->open($full_path) === true)
{
// In the Open XML Wordprocessing format the content is stored
// in the document.xml file located in the word directory.
$key_file_name = 'word/document.xml';
$message = $zip_val->getFromName($key_file_name);
$timestamp = date('d-M-Y H:i:s');
// Replace the placeholders with actual values
$message = str_replace("client_full_name", "onlinecode org", $message);
$message = str_replace("client_email_address", "ingo@onlinecode.org", $message);
$message = str_replace("date_today", $timestamp, $message);
$message = str_replace("client_website", "www.onlinecode.org", $message);
$message = str_replace("client_mobile_number", "+1999999999", $message);
//Replace the content with the new content created above.
$zip_val->addFromString($key_file_name, $message);
$zip_val->close();
}
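Since the zip-based approach above only applies to .docx files, it can help to check what kind of container a file really is before opening it. The sketch below is an illustration, not part of the linked post: word_container() is a made-up helper that compares the leading bytes against the zip local-file signature and the OLE compound-file signature used by legacy .doc files.

```php
<?php
// Guess the container format from the first bytes of a Word file.
function word_container(string $bytes): string
{
    if (strncmp($bytes, "PK\x03\x04", 4) === 0) {
        return 'zip';   // zip container, as used by .docx
    }
    if (strncmp($bytes, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return 'ole';   // OLE compound file, as used by legacy binary .doc
    }
    return 'unknown';
}

// Demonstration with literal header bytes instead of real files.
echo word_container("PK\x03\x04rest of archive"), "\n";
echo word_container("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1rest of file"), "\n";
```

With a real file, the first eight bytes could be read with file_get_contents($path, false, null, 0, 8) and passed to the helper. A file that reports 'ole' will not open with ZipArchive, whatever its extension.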
Maybe this would point you to the right direction: http://davidwalsh.name/read-pdf-doc-file-php
Solutions I've found so far (not tested though):
Docvert - works for Doc, free, but not directly usable
PHPWordLib - works for Doc, not free
PHPDocX - DocX only, needs Zend.
I am going to opt for PHPWord (www.phpword.codeplex.com), as I believe teachers are going to get Office 2007 next year. I will also try to find some way to convert between .docx and .doc through PHP to support them in the meantime.
If you can reach a web service, look at Docmosis Cloud services, since it can mail-merge a doc file with your data and give you back a doc/pdf/other. You can HTTPS POST to the service to make the request, so it is pretty straightforward from PHP.
There are many ways to handle Word document files on Linux:
antiword - not very effective, as it only converts to plain text.
pyODconvert
OpenOffice or LibreOffice - through UNO
unoconv utility - needs installation permission on the server
There is one Python script that is very usable for online file conversion, but you need to run the conversion through the command line.
There is no specific, satisfactory solution for handling Word files using only PHP code.
I hunted for a long time to arrive at this suggestion.
first of all: I really love this site and I think this is the best forum for programming :)
Now to my problem, which I try to display using code and comments:
$file = fopen ($URL, "r");
// $URL is a string set before, which is correct
// Also, I got the page-owners permission to acquire the page like that
if (!$file)
{
echo "<p>Could not open file.\n";
exit;
}
while (!feof ($file))
{
$buffer = fgets($file);
$buffer= strstr($buffer, "Montag</b>");
// If I don't use this line, the whole page gets displayed...
// If I use this line, only the first line after the needle gets displayed
echo $buffer;
}
fclose($file);
So basically, I'm able to display the whole page, or one line after the needle, but not everything after the needle....
I tried to find a solution using the PHP Reference, the Stackoverflow Search engine and of course google, but I couldn't find a solution, thanks for everybody willing to help me.
Greetings userrr3
Extracting text from the file
You are only grabbing one line at a time from the file using fgets(). If you want the whole file, use file_get_contents() instead:
$file = file_get_contents($URL);
// strstr() now searches the whole page, so everything from the needle onward is returned
$buffer = strstr($file, "Montag</b>");
echo $buffer;
Grabbing your text
This can also be achieved using PHP's substr() function combined with strpos():
$buffer = substr($file, strpos($file, 'Montag</b>'));
This will grab all the text from the first occurrence of the needle Montag</b> onward (the needle itself included).
Putting it all together
$file = file_get_contents($URL);
$buffer = substr($file, strpos($file, 'Montag</b>'));
echo $buffer;
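One thing the snippets above do not cover is a page that does not contain the needle at all: strpos() then returns false, which substr() treats as offset 0, so the whole page is returned. A small sketch of a guard, with from_needle() and the sample $page string both made up for this example:

```php
<?php
// Return everything from the first occurrence of $needle onward,
// or an empty string when the needle is absent.
function from_needle(string $haystack, string $needle, bool $includeNeedle = true): string
{
    $pos = strpos($haystack, $needle);
    if ($pos === false) {
        return '';
    }
    if (!$includeNeedle) {
        // Skip past the needle itself.
        $pos += strlen($needle);
    }
    return substr($haystack, $pos);
}

// Hypothetical page content, for illustration only.
$page = "<b>Sonntag</b> closed\n<b>Montag</b> 9-17\n<b>Dienstag</b> 9-17";
echo from_needle($page, 'Montag</b>', false);
```

The strict comparison with false matters because a needle found at position 0 is also falsy in a loose comparison.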
I wrote some PHP code against an XML file whose source I had copied manually. It looks like this:
<title type='text'>content I've extracted</title>
<content type='text'>content I've extracted</content>
That all worked. But when I generate the file from PHP and then try to extract the contents of the title and content tags, no output is produced. When I cross-checked, I found that the PHP-generated file (source code, RSS feed) looks like this:
<title type=\'text\'>content to be extracted </title>
<content type=\'text\'>content to be extracted</content>
Because of the backslashes it is not able to extract the content, I guess.
The sample PHP code I'm using to get the contents of those tags is:
$titles = $entry->getElementsByTagName( "title" );
$title = $titles->item(0)->nodeValue;
$descrs = $entry->getElementsByTagName( "content" );
$descr = $descrs->item(0)->nodeValue;
How can I proceed?
This is the PHP code which I used to generate XML
$url='http://gdata.youtube.com/feeds/api/playlists/12345';
$fp = fopen($url, 'r');
$buffer='';
if ($fp) {
while (!feof($fp))
$buffer .= fgets($fp, 1024);
fclose($fp);
file_put_contents('feed.xml', $buffer);
}
I found the solution
$buff=stripslashes($buffer);
file_put_contents('ka.xml', $buff);
so stripslashes() function removes backslash and it works
It looks like you have magic quotes enabled.
If magic_quotes_runtime is enabled, most functions that return data from any sort of external source including databases and text files will have quotes escaped with a backslash.
So when you use fgets to read in the file, any quotes will be escaped. Magic Quotes are deprecated as of PHP 5.3 and were removed in PHP 5.4. You should not rely on them in your script.
Also see http://www.php.net/manual/en/security.magicquotes.php
On a sidenote, your approach to copy the file is much more complicated than it needs to be. All of these would work for saving the remote XML to a file:
$src = 'http://gdata.youtube.com/feeds/api/playlists/E6DE6EC9A373AF57?v=2';
copy($src, 'dest.xml');
or
$src = 'http://gdata.youtube.com/feeds/api/playlists/E6DE6EC9A373AF57?v=2';
file_put_contents('dest.xml', file_get_contents($src));
or
$src = 'http://gdata.youtube.com/feeds/api/playlists/E6DE6EC9A373AF57?v=2';
stream_copy_to_stream(fopen($src, 'r'), fopen('dest.xml', 'w+'));