Determine if a PDF is Corrupt - php

How can I determine if a PDF file is corrupt (not openable) in PHP? I have downloaded thousands of PDFs via CURL and a small number are incomplete.

$part = 'pdffile.pdf';
$escPath = str_replace( " ", "\\ ", escapeshellcmd( $part ) );
$out = shell_exec( 'pdfinfo ' . $escPath . ' 2>&1' );
if( $out != null && !preg_match( '~Error~i', $out ) )
echo "GOOD: $part\n";
else
echo "CORRUPT: $part\n";
I can only find a way to do this via the command line. The second line is required to escape file paths.
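If shelling out to pdfinfo is not an option, a rough pure-PHP check can catch most truncated downloads. The sketch below is only a heuristic: it assumes a complete PDF starts with "%PDF-" and has an "%%EOF" marker near the end of the file. The function name and the 1024-byte window are choices made for this example, and it does not validate the internal structure the way pdfinfo does.

```php
<?php
/**
 * Heuristic check for a truncated or non-PDF download.
 * Assumption: a complete PDF starts with "%PDF-" and has an "%%EOF"
 * marker within the last 1024 bytes. This does not validate the
 * internal structure, so pdfinfo remains the stricter test.
 */
function looksLikeCompletePdf(string $path): bool
{
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }
    $size = filesize($path);
    if ($size === false || $size < 16) {
        return false;
    }
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        return false;
    }
    $head = fread($fh, 5);
    // Read at most the last 1024 bytes to look for the trailer marker.
    $tailLen = min(1024, $size);
    fseek($fh, -$tailLen, SEEK_END);
    $tail = fread($fh, $tailLen);
    fclose($fh);

    return $head === '%PDF-' && strpos($tail, '%%EOF') !== false;
}
```

A download that was cut off mid-stream will usually lack the trailing "%%EOF", so this can serve as a cheap first pass before running pdfinfo on the files that survive it.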

Related

Downloading file to server from url

I have a laravel application where I download files to my server from given URLs. I am using the following code to do this.
$file_name = $files_directory . str_replace( " ", "-", $_POST['file_name'] ) . $_POST['file_extension'];
if ( file_put_contents( $file_name, fopen( $file_url, 'r' ) ) !== false ) {
$success = true;
$msg = "File Downloaded Successfully";
}
I am using user input to create a filename and extension. Is there a way to get the filename and extension from the URL response? Or is there a better way to approach this problem?
I think you will run into problems with this solution, because it has no try/catch handling and does not validate the file extension. Both can lead to security issues later. Change your script like this:
$file_name = $files_directory . str_replace( " ", "-", $_POST['file_name'] ) . $_POST['file_extension'];
try {
if(in_array(mb_strtolower($_POST['file_extension']), ['jpg','png','...permitted_extenions.....'])){
if ( file_put_contents( $file_name, fopen( $file_url, 'r' ) ) !== false ) {
$success = true;
$msg = "File Downloaded Successfully";
}
}else throw new Exception('Errors with extension');
}catch(\Exception $e){
echo $e->getMessage();
}
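On the original question of taking the filename and extension from the URL rather than from user input, one option is to derive them from the URL path and still check the extension against a whitelist. This is a minimal sketch, not a complete solution: the helper name fileNameFromUrl, the whitelist, and the character filter are assumptions made for this example, and it ignores any Content-Disposition header the server may send.

```php
<?php
/**
 * Derive a safe local file name from a URL path.
 * Hypothetical helper: the name, the whitelist and the sanitising
 * rules are assumptions made for this example.
 *
 * @return string|null Sanitised "name.ext", or null if not permitted.
 */
function fileNameFromUrl(string $url, array $allowedExtensions): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return null;
    }
    // basename() drops directory parts, which blocks "../" traversal.
    $base = basename(rawurldecode($path));
    $ext  = strtolower(pathinfo($base, PATHINFO_EXTENSION));
    $name = pathinfo($base, PATHINFO_FILENAME);

    if ($name === '' || !in_array($ext, $allowedExtensions, true)) {
        return null;
    }
    // Keep only conservative characters in the stored name.
    $name = preg_replace('/[^A-Za-z0-9._-]+/', '-', $name);

    return $name . '.' . $ext;
}
```

A null result means the URL has no usable file name or the extension is not on the list, in which case the download can be refused before anything is written to disk.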

How to format data when writing to a txt file with PHP?

This is my code:
<?php
//check if the form allows user input to be extracted
if(isset($_POST['email']))
//if so loop begins
{
//creates a variable called $data that contains the user input for a particular input name from the form
//writes to txt file
$myfile = fopen("form.txt", "w") or die("Unable to open file!");
$email = "Email: ";
fwrite($myfile, $email);
fclose($myfile);
//appends to txt file
$data=$_POST['email'];
//creates a variable called $fp that contains the function fopen which opens a file called form.txt
$fp = fopen('form.txt', 'a');
//initiates the function fwrite which displays the user input in the txt file
fwrite($fp, $data); //when echo in html use div center tag
//closes txt file with the function fclose
fclose($fp);
}
if(isset($_POST['title']))
{
$data=$_POST['title'];
$fp = fopen('form.txt', 'a');
fwrite($fp, $data);
fclose($fp);
}
if(isset($_POST['date']))
{
$data=$_POST['date'];
$fp = fopen('form.txt', 'a');
fwrite($fp, $data);
fclose($fp);
}
if(isset($_POST['link']))
{
$data=$_POST['link'];
$fp = fopen('form.txt', 'a');
fwrite($fp, $data);
fclose($fp);
}
?>
I want to know how to write the user input to the txt file in php with a space between the header and the input and a linebreak after every one. Whenever I try to use the 'w' more than once the text from the first time it was used is not displayed.
"Whenever I try to use the 'w' more than once the text from the first time it was used is not displayed."
According to the PHP manual entry for fopen:
'w' - write mode creates the file if it does not exist; otherwise it truncates the file to zero length and places the pointer at the beginning, so the existing content is lost.
'a' - append mode creates the file if it does not exist and places the pointer at the end, so new content is added after the existing content.
That is why you are seeing that behaviour.
With regard to
"I want to know how to write the user input to the txt file in php with a space between the header and the input and a linebreak after every one."
I'm unclear from your code and comments what exactly you consider to be the header, the input and "each one". Your code comments refer to a loop that does not exist.
So, assuming that this all executes in one go, that the "header" is $email, and that the "input" and "each one" are every instance of $data, your code should look more like the following.
(Caveat: I only had a couple of minutes, so it could be improved upon with string formatting and such, and I am assuming some PHP versioning.)
Use PHP_EOL for a cross-platform end of line. See the question When do I use the PHP constant "PHP_EOL"? for more information.
Again, this assumes that this is all one POST action and that you want a clean file for each email. If so, from the code you've provided, there appears to be no need to open the file multiple times.
if ( isset( $_POST['email'] ) ) {
    $myfile = fopen( "form.txt", "w" ) or die( "Unable to open file!" );
    fwrite( $myfile,
        "Email: " . $_POST['email'] . PHP_EOL .
        ( isset( $_POST['title'] ) ? ( $_POST['title'] . PHP_EOL ) : '' ) .
        ( isset( $_POST['date'] ) ? ( $_POST['date'] . PHP_EOL ) : '' ) .
        ( isset( $_POST['link'] ) ? ( $_POST['link'] . PHP_EOL ) : '' )
    );
    fclose( $myfile );
}
However, if not all fields are captured at the same time, the order of the entries in the file does not matter, and the file does not need to be "clean", then just use append mode instead of write mode.
$myfile = fopen("form.txt", "a") or die("Unable to open file!");
fwrite($myfile,
( isset( $_POST['email'] ) ? ( "Email: " . $_POST['email'] . PHP_EOL ) : '') .
( isset( $_POST['title'] ) ? ( $_POST['title'] . PHP_EOL ) : '' ) .
( isset( $_POST['date'] ) ? ( $_POST['date'] . PHP_EOL ) : '' ) .
( isset( $_POST['link'] ) ? ($_POST['link'] . PHP_EOL ) : '' )
);
fclose($myfile);
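If every field should get its own header, followed by a space and ending in a line break, the formatting can be pulled into a small helper. This is a sketch under assumptions: the labels "Title", "Date" and "Link" and the function name formatFormEntry are invented for the example and are not part of the original code.

```php
<?php
/**
 * Build "Label: value" lines, one per submitted field.
 * The labels and the helper name are assumptions for this example.
 */
function formatFormEntry(array $post, array $labels): string
{
    $lines = '';
    foreach ($labels as $key => $label) {
        if (isset($post[$key]) && $post[$key] !== '') {
            // Strip line breaks so each field stays on one line.
            $value = str_replace(["\r", "\n"], ' ', (string) $post[$key]);
            $lines .= $label . ': ' . $value . PHP_EOL;
        }
    }
    return $lines;
}

$labels = ['email' => 'Email', 'title' => 'Title', 'date' => 'Date', 'link' => 'Link'];
// FILE_APPEND adds to the file; LOCK_EX guards against concurrent writes.
// file_put_contents('form.txt', formatFormEntry($_POST, $labels), FILE_APPEND | LOCK_EX);
```

The commented-out file_put_contents() call shows how the result could be appended in a single step, without opening and closing the file once per field.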

How do I extract from WordPress database to MS Excel?

I am trying to export data (tags) from a WordPress database into an Excel file. I wrote the code below, but when I click the generate button it produces an empty file.
Please check what I am doing wrong.
The code is below:
if (check_admin_referer('tag-export'))
{
$blogname = str_replace(" ", "", get_option('blogname'));
$date = date("m-d-Y");
$xls_file_name = $blogname."-exported-tags-".$date;
$tags = get_terms( 'post_tag' , 'hide_empty=0' );
$count = count($tags);
if ( $count > 0 )
{
echo 'name' . "\t" . 'slug' . "\n";
foreach ( $tags as $tag )
{
echo $tag->name . "\t" . $tag->slug . "\n";
}
}
ob_clean();
echo $xls_file;
header( "Content-Type: application/vnd.ms-excel" );
header( "Content-disposition: attachment; filename=$xls_file_name.xls" );
exit();
}
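The above code does not write the data into the Excel file. Please check and let me know.
Based on your existing code: collect the rows in $xls_file instead of echoing them (ob_clean() discards whatever was echoed before it, and $xls_file was never set), and send the headers before the output: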
if (check_admin_referer('tag-export'))
{
$blogname = str_replace(" ", "", get_option('blogname'));
$date = date("m-d-Y");
$xls_file_name = $blogname."-exported-tags-".$date;
$tags = get_terms( 'post_tag' , 'hide_empty=0' );
$count = count($tags);
$xls_file = '';
if ( $count > 0 )
{
$xls_file .= 'name' . "\t" . 'slug' . "\n";
foreach ( $tags as $tag )
{
$xls_file .= $tag->name . "\t" . $tag->slug . "\n";
}
}
ob_clean();
header( "Content-Type: application/vnd.ms-excel" );
header( "Content-disposition: attachment; filename=$xls_file_name.xls" );
echo $xls_file;
exit();
}
A more general suggestion, not a solution to your coding problem: generate an HTML table file from the code and then open it in Excel for conversion. Doing so gives you a better understanding of what is going on in your code: you can add var_dumps or simply debug it like a normal web page.
An HTML table is also useful because Excel converts it to XLS files quite well.
Once the HTML file works, you can apply the necessary formatting/headers to the code to create the XLS file from scratch.
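The HTML table suggestion could look roughly like this. It is a minimal sketch, assuming each row is an associative array with the same keys; the function name rowsToHtmlTable is made up for the example and nothing here is WordPress-specific.

```php
<?php
/**
 * Render rows as a minimal HTML table that Excel can open.
 * Assumption: each row is an associative array with the same keys.
 */
function rowsToHtmlTable(array $rows): string
{
    if ($rows === []) {
        return '<table></table>';
    }
    $esc = static function ($v): string {
        return htmlspecialchars((string) $v, ENT_QUOTES, 'UTF-8');
    };
    // Use the keys of the first row as column headings.
    $html = '<table><tr>';
    foreach (array_keys(reset($rows)) as $heading) {
        $html .= '<th>' . $esc($heading) . '</th>';
    }
    $html .= '</tr>';
    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($row as $cell) {
            $html .= '<td>' . $esc($cell) . '</td>';
        }
        $html .= '</tr>';
    }
    return $html . '</table>';
}
```

In the tag export, the rows could be built in the foreach loop as arrays of the form ['name' => $tag->name, 'slug' => $tag->slug], and the result viewed in a browser first to confirm that the data is there.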

Backup Entire Website Using PHP

Using PHP, I am developing a CMS. This needs to support website backups.
Musts:
Compressed ZIP Folders
Must work on at least Linux and Windows
Must work on PHP 5.0, PHP 4 would be nice
I just need a function/class; don't link me to open-source software, as I need to do this myself
CMS does not need MySQL backups as it is XML powered
I've already looked into ZipArchive in PHP. Here is all I have so far. However, when I try to open the ZIP file on the server that it says it created, I get a 404. It isn't working and I don't know why.
<?php
$filename = CONTENT_DIR . 'backups/' . date( 'm-d-Y_H-i-s' ) . '.zip';
if ( $handle = opendir( ABS_PATH ) ) {
$zip = new ZipArchive();
if ( $zip->open( $filename, ZIPARCHIVE::CREATE ) !== true ) {
exit( "cannot open <$filename>\n" );
}
$string = '';
while ( ( $file = readdir( $handle ) ) !== false ) {
$zip->addFile( $file );
$string .= "$file\n<br>";
}
closedir( $handle );
$string .= "Status of the Zip Archive: " . $zip->status;
$string .= "<br>System status of the Zip Archive: " . $zip->statusSys;
$string .= "<br>Number of files in archive: " . $zip->numFiles;
$string .= "<br>File name in the file system: " . $zip->filename;
$string .= "<br>Comment for the archive: " . $zip->comment;
$zip->close();
echo $string;
}
?>
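One likely cause of the missing archive: readdir() returns bare entry names (including "." and ".."), so addFile() resolves them against the current working directory rather than ABS_PATH, and directories cannot be added with addFile() at all. If nothing is added successfully, ZipArchive appears not to write any file on close(), which would explain the 404. The sketch below walks the tree recursively and passes full paths. It assumes the zip extension is loaded and uses PHP 7 type declarations, so it does not meet the PHP 5.0 requirement as written; the function name zipDirectory is hypothetical.

```php
<?php
/**
 * Recursively add a directory to a ZIP archive.
 * Assumptions: the zip extension (ZipArchive) is available and $zipPath
 * lies outside $sourceDir, so the archive does not try to include itself.
 *
 * @return int Number of files added.
 */
function zipDirectory(string $sourceDir, string $zipPath): int
{
    $root = realpath($sourceDir);
    if ($root === false || !is_dir($root)) {
        throw new InvalidArgumentException("Not a directory: $sourceDir");
    }
    $zip = new ZipArchive();
    if ($zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException("Cannot open $zipPath");
    }
    // SKIP_DOTS leaves out the "." and ".." entries.
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    $count = 0;
    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        // Store paths relative to the root, with forward slashes.
        $relative = str_replace('\\', '/', substr($file->getPathname(), strlen($root) + 1));
        if ($zip->addFile($file->getPathname(), $relative)) {
            $count++;
        }
    }
    $zip->close();
    return $count;
}
```

Returning the number of files added makes it easy to tell an empty backup from a successful one before linking to the archive.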

MediaWiki + Graphviz + Image maps + Pagelinks

Background: Working with MediaWiki 1.19.1, Graphviz 2.28.0, Extension:GraphViz 0.9 on WAMP stack (Server 2008, Apache 2.4.2, MySQL 5.5.27, PHP 5.4.5). Everything is working great and as expected for the basic functionality of rendering a clickable image from a Graphviz diagram using the GraphViz extension in MediaWiki.
Problem: The links in the image map are not added to the MediaWiki pagelinks table. I get why they aren't added but it becomes an issue if there is no way to follow the links back with the 'What links here' functionality.
Desired solution: During the processing of the diagram in the GraphViz extension, I would like to use the generated .map file to then create a list of wikilinks to add on the page to get picked up by MediaWiki and added to the pagelinks table.
Details:
This GraphViz extension code:
<graphviz border='frame' format='png'>
digraph example1 {
// define nodes
nodeHello [
label="I say Hello",
URL="Hello"
]
nodeWorld [
label="You say World!",
URL="World"
]
// link nodes
nodeHello -> nodeWorld
}
</graphviz>
Generates this image:
And this image map code in a corresponding .map file on the server:
<map id="example1" name="example1">
<area shape="poly" id="node1" href="Hello" title="I say Hello" alt="" coords="164,29,161,22,151,15,137,10,118,7,97,5,77,7,58,10,43,15,34,22,31,29,34,37,43,43,58,49,77,52,97,53,118,52,137,49,151,43,161,37"/>
<area shape="poly" id="node2" href="World" title="You say World!" alt="" coords="190,125,186,118,172,111,152,106,126,103,97,101,69,103,43,106,22,111,9,118,5,125,9,133,22,139,43,145,69,148,97,149,126,148,152,145,172,139,186,133"/>
</map>
From that image map file, I would like to be able to extract the href and title to build wikilinks like so:
[[Hello|I say Hello]]
[[World|You say World!]]
I'm guessing that since that .map file is essentially XML that I could just use XPATH to query the file, but that is just a guess. PHP is not my strongest area and I don't know the best approach to going about the XML/XPATH option or if that is even the best approach to pull that info from the file.
Once I got that collection/array of wikilinks from the .map file, I'm sure I can hack up the GraphViz.php extension file to add it to the contents of the page to get it added to the pagelinks table.
Progress: I had a bit of a Rubber Duck Problem Solving moment right as I submitted the question. I realized that since I had well-formed data in the image map, XPATH was probably the way to go. It was fairly trivial to pull the data I needed, especially since I found that the map file contents were still stored in a local string variable.
$xml = new SimpleXMLElement( $map );
// start with an empty string so that .= does not raise an undefined-variable notice
$links = '';
foreach($xml->area as $item) {
    $links .= "[[" . $item->attributes()->href . "|" . $item->attributes()->title . "]]";
}
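For reference, the same extraction can be written as a self-contained function that uses an XPath query, which was the original idea. This is only a sketch: the helper name mapToWikilinks is invented here, and it assumes $map holds well-formed XML with a single root element, as in the example .map file.

```php
<?php
/**
 * Turn the <area> elements of a Graphviz image map into wikilinks.
 * Hypothetical helper; assumes $map is well-formed XML with one
 * root <map> element.
 *
 * @return string[] One "[[href|title]]" string per <area>.
 */
function mapToWikilinks(string $map): array
{
    $xml = new SimpleXMLElement($map);
    $links = [];
    // Select every <area> that carries an href attribute.
    foreach ($xml->xpath('//area[@href]') as $area) {
        $links[] = '[[' . (string) $area['href'] . '|' . (string) $area['title'] . ']]';
    }
    return $links;
}
```

Returning an array instead of one concatenated string leaves it to the caller to decide how the links are joined or filtered before they are added to the page.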
Final Solution: See my accepted answer below.
Thanks for taking a look. I appreciate any assistance or direction you can offer.
I finally worked through all of the issues and now have a fairly decent solution to render the graph nicely, provide a list of links, and register the links with wiki. My solution doesn't fully support all of the capabilities of the current GraphViz extension as it is written as there is functionality we do not need and I do not want to support. Here are the assumptions / limitations of this solution:
Does not support MscGen: We only have a need for Graphviz.
Does not support imageAtrributes: We wanted to control the format and presentation and it seemed like there were inconsistencies in the imageAttributes implementation that would then cause further support issues.
Does not support wikilinks: While it would be nice to provide consistent link usage through wiki and the Graphviz extension, the reality is that Graphviz is a completely different markup environment. While the current extension 'supports' wikilinks, the implementation is a little weak and leaves areas for confusion. Example: Wikilinks support giving the link an optional description, but Graphviz already uses the node label for the description. So you end up ignoring the wikilink description and telling users 'Yes, we support wikilinks, but don't use the description part'. Since we aren't really using wikilinks correctly, we just implement a regular link implementation and try to avoid the confusion entirely.
Here is what the output looks like:
Here are the changes that were made:
Comment out this line:
// We don't want to support wikilinks so don't replace them
//$timelinesrc = rewriteWikiUrls( $timelinesrc ); // if we use wiki-links we transform them to real urls
Replace this block of code:
// clean up map-name
$map = preg_replace( '#<ma(.*)>#', ' ', $map );
$map = str_replace( '</map>', '', $map );
if ( $renderer == 'mscgen' ) {
$mapbefore = $map;
$map = preg_replace( '/(\w+)\s([_:%#/\w]+)\s(\d+,\d+)\s(\d+,\d+)/',
'<area shape="$1" href="$2" title="$2" alt="$2" coords="$3,$4" />',
$map );
}
/* Procduce html
*/
if ( $wgGraphVizSettings->imageFormatting )
{
$txt = imageAtrributes( $args, $storagename, $map, $outputType, $wgUploadPath ); // if we want borders/position/...
} else {
$txt = '<map name="' . $storagename . '">' . $map . '</map>' .
'<img src="' . $wgUploadPath . '/graphviz/' . $storagename . '.' . $outputType . '"' .
' usemap="#' . $storagename . '" />';
}
With this code:
$intHtml = '';
$extHtml = '';
$badHtml = '';
// Wrap the map/area info with top level nodes and load into xml object
$xmlObj = new SimpleXMLElement( $map );
// What does map look like before we start working with it?
wfDebugLog( 'graphviz', 'map before: ' . $map . "\n" );
// loop through each of the <area> nodes
foreach($xmlObj->area as $areaNode) {
wfDebugLog( 'graphviz', "areaNode: " . $areaNode->asXML() . "\n" );
// Get the data from the XML attributes
$hrefValue = (string)$areaNode->attributes()->href;
$textValue = (string)$areaNode->attributes()->title;
wfDebugLog( 'graphviz', '$hrefValue before: ' . $hrefValue . "\n" );
wfDebugLog( 'graphviz', '$textValue before: ' . $textValue . "\n" );
// For the text fields, multiple spaces (" ") in the Graphviz source (label)
// turns into a regular space followed by encoded representations of
// non-breaking spaces ("   ") in the .map file which then turns
// into the following in the local variables: ("   ").
// The following two options appear to convert/decode the characters
// appropriately. Leaving the lines commented out for now, as we have
// not seen a graph in the wild with multiple spaces in the label -
// just happened to stumble on the scenario.
// See http://www.php.net/manual/en/simplexmlelement.asxml.php
// and http://stackoverflow.com/questions/2050723/how-can-i-preg-replace-special-character-like-pret-a-porter
//$textValue = iconv("UTF-8", "ASCII//TRANSLIT", $textValue);
//$textValue = html_entity_decode($textValue, ENT_NOQUOTES, 'UTF-8');
// Now we need to deal with the whitespace characters like tabs and newlines
// and also deal with them correctly to replace multiple occurrences.
// Unfortunately, the \n and \t values in the variable aren't actually
// tab or newline characters but literal characters '\' + 't' or '\' + 'n'.
// So the normally recommended regex '/\s+/u' to replace the whitespace
// characters does not work.
// See http://stackoverflow.com/questions/6579636/preg-replace-n-in-string
$hrefValue = preg_replace("/( |\\\\n|\\\\t)+/", ' ', $hrefValue);
$textValue = preg_replace("/( |\\\\n|\\\\t)+/", ' ', $textValue);
// check to see if the url matches any of the
// allowed protocols for external links
if ( preg_match( '/^(?:' . wfUrlProtocols() . ')/', $hrefValue ) ) {
// external link
$parser->mOutput->addExternalLink( $hrefValue );
$extHtml .= Linker::makeExternalLink( $hrefValue, $textValue ) . ', ';
}
else {
$first = substr( $hrefValue, 0, 1 );
if ( $first == '\\' || $first == '[' || $first == '/' ) {
// potential UNC path, wikilink, absolute or relative path
$hrefValue = '#InvalidLink';
$badHtml .= Linker::makeExternalLink( $hrefValue, $textValue ) . ', ';
$textValue = 'Invalid link. Check Graphviz source.';
}
else {
$title = Title::newFromText( $hrefValue );
if ( is_null( $title ) ) {
// invalid link
$hrefValue = '#InvalidLink';
$badHtml .= Linker::makeExternalLink( $hrefValue, $textValue ) . ', ';
$textValue = 'Invalid link. Check Graphviz source.';
}
else {
// internal link
$parser->mOutput->addLink( $title );
$intHtml .= Linker::link( $title, $textValue ) . ', ';
$hrefValue = $title->getFullURL();
}
}
}
$areaNode->attributes()->href = $hrefValue;
$areaNode->attributes()->title = $textValue;
}
$map = $xmlObj->asXML();
// The contents of $map, which is now XML, gets embedded
// in the HTML sent to the browser so we need to strip
// the XML version tag and we also strip the <map> because
// it will get replaced with a new one with the correct name.
$map = str_replace( '<?xml version="1.0"?>', '', $map );
$map = preg_replace( '#<ma(.*)>#', ' ', $map );
$map = str_replace( '</map>', '', $map );
// Let's see what it looks like now that we are done with it.
wfDebugLog( 'graphviz', 'map after: ' . $map . "\n" );
$txt = '' .
'<table style="background-color:#f9f9f9;border:1px solid #ddd;">' .
'<tr>' .
'<td style="border:1px solid #ddd;text-align:center;">' .
'<map name="' . $storagename . '">' . $map . '</map>' .
'<img src="' . $wgUploadPath . '/graphviz/' . $storagename . '.' . $outputType . '"' . ' usemap="#' . $storagename . '" />' .
'</td>' .
'</tr>' .
'<tr>' .
'<td style="font:10px verdana;">' .
'This Graphviz diagram links to the following pages:' .
'<br /><strong>Internal</strong>: ' . ( $intHtml != '' ? rtrim( $intHtml, ' ,' ) : '<em>none</em>' ) .
'<br /><strong>External</strong>: ' . ( $extHtml != '' ? rtrim( $extHtml, ' ,' ) : '<em>none</em>' ) .
( $badHtml != '' ? '<br /><strong>Invalid</strong>: ' . rtrim($badHtml, ' ,') .
'<br /><em>Tip: Do not use wikilinks ([]), UNC paths (\\) or relative links (/) when creating links in Graphviz diagrams.</em>' : '' ) .
'</td>' .
'</tr>' .
'</table>';
Possible enhancements:
It would be nice if the list of links below the graph were sorted and de-duped.
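One way this enhancement might be approached is to collect the link fragments in an array keyed by href instead of appending them to a string, then sort before joining. The sketch below is an assumption about how that could be wired in, not code from the extension: renderLinkList and the sample anchors are invented for illustration.

```php
<?php
/**
 * Sort link HTML fragments by their visible text and join them.
 * Assumption: the extension would collect fragments in an array keyed
 * by href instead of appending them to a string.
 */
function renderLinkList(array $linksByHref): string
{
    if ($linksByHref === []) {
        return '<em>none</em>';
    }
    // Keys are unique, so duplicates were already collapsed on insert.
    uasort($linksByHref, static function (string $a, string $b): int {
        return strcasecmp(strip_tags($a), strip_tags($b));
    });
    return implode(', ', $linksByHref);
}

// Example data; in the extension these would come from Linker::link().
$intLinks = [];
$intLinks['World'] = '<a href="/wiki/World">You say World!</a>';
$intLinks['Hello'] = '<a href="/wiki/Hello">I say Hello</a>';
$intLinks['World'] = '<a href="/wiki/World">You say World!</a>'; // duplicate href
$internalHtml = renderLinkList($intLinks);
```

Because implode() places the separator only between items, the rtrim() calls used on $intHtml and $extHtml would no longer be needed with this approach.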
