I'm using the PDFParser library (https://github.com/smalot/pdfparser) to convert PDF files to text.
When I convert a file on a local web server, it parses OK. When I convert a file on a remote web server, it fails with the following error: TCPDF_PARSER ERROR: Invalid object reference: Array.
I couldn't find a proper solution in the library's bug tracker, although a similar issue exists there (it has been unresolved for two years).
How can I avoid this error? Or should I use another library for converting PDF to text (and if so, which one)?
I am using it exactly as described in the documentation:
use Smalot\PdfParser\Parser;
$this->parser = new Parser;
if (file_exists($full_path) && !is_dir($full_path)) {
$paper->text = $this->parser->parseFile($full_path)->getText();
}
I am using Spatie\PdfToImage in my Symfony application to change PDFs into images. Here is the function I am using:
public function savePdfPreviewImage($fullFilePath, $thumbnailPath)
{
$pdf = new Pdf($fullFilePath);
$pdf->saveImage($thumbnailPath);
return $this;
}
When given a path to a PDF, the library returns this message:
An image could not be created from the given input
How can I go about finding a solution to this?
So far I have tried verifying with an ls that the file exists in the place where the app thinks it is. I have also tried opening the file -- which has a .pdf file extension -- in a PDF reader to verify that it is not corrupt. Neither of those two actions yielded any clues.
=====
Edit 1: I traced this message back to the Imagine.php file, where I removed an error-suppression line. That gave me this slightly less opaque message:
Warning: imagecreatefromstring(): Data is not in a recognized format
====
Edit 2: I have also verified that ghostscript is installed. The gs command is available from my server environment. I have also verified that the path provided for $thumbnailPath is a valid path/filename ending in .jpg.
I figured out what the problem was.
The conversion from PDF to image was actually behaving just fine. What was misbehaving was a later call within the application to the Imagine.php library, which failed to resize the image that my code had successfully created. Here is the code that allowed me to see this:
public function savePdfPreviewImage($fullFilePath, $thumbnailPath)
{
//$pdf = new Pdf($fullFilePath);
//$pdf->saveImage($thumbnailPath); //This gives us "An image could not be created from the given input" and "Data is not in a recognized format"
//Let's try it with a manual call to GhostScript instead ...
exec(
'gs -o ' . //This creates the image successfully, but the error still shows up.
$thumbnailPath . //That means the error isn't coming from here, since we're no longer calling any external PHP libraries.
' -sDEVICE=jpeg ' .
$fullFilePath
);
return $this;
}
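A side note on the exec() call above: the two paths are concatenated into the shell command unquoted, so a path containing spaces or shell metacharacters would break it. Below is a minimal sketch of the same Ghostscript invocation with each path passed through escapeshellarg(). The helper name buildGsCommand is made up for illustration; the gs flags are the ones already used above.

```php
<?php
// Hypothetical helper: builds the same "gs -o <out> -sDEVICE=jpeg <in>"
// command as above, but quotes both paths for the shell.
function buildGsCommand(string $fullFilePath, string $thumbnailPath): string
{
    return 'gs -o ' . escapeshellarg($thumbnailPath)
        . ' -sDEVICE=jpeg '
        . escapeshellarg($fullFilePath);
}

// Usage: capture the exit code to see whether gs itself failed.
// exec(buildGsCommand($fullFilePath, $thumbnailPath), $output, $exitCode);
```

Checking $exitCode after the call is another quick way to confirm whether the error comes from Ghostscript or from a later step.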
You do not need a library for this; Imagick alone will do fine.
$pdf = new \Imagick();
$pdf->setColorspace(\Imagick::COLORSPACE_SRGB);
$pdf->readImage($srcPath);
$pdf->setIteratorIndex($pageNo);
$pdf->writeImage($outPath);
(Imagick will recognize the file extension from $outPath)
I was trying to upload a video of size close to 50 MB using Azure's PHP SDK.
I ran into this error:
PHP Fatal error: Call to undefined method MicrosoftAzure\\Storage\\Blob\\Models\\CreateBlobOptions::getUseTransactionalMD5() in /var/www/<domain>/vendor/microsoft/azure-storage-blob/src/Blob/BlobRestProxy.php on line 1941
It only happens if I pass blob options to set correct mime type (video/mp4 in this case). If I upload the same video without setting blob options then it works just fine. Many other videos work fine even with the blob options set to video/mp4. The error is throwing me off.
Any guesses why it treats getUseTransactionalMD5 as an undefined method in this case?
Here is the minimal code:
use MicrosoftAzure\Storage\Blob\BlobRestProxy;
use MicrosoftAzure\Storage\Common\Exceptions\ServiceException;
$connString = "DefaultEndpointsProtocol=http;AccountName=" . AZURE_ACCOUNTNAME . ";AccountKey=" . AZURE_KEY;
$blobRestProxy = BlobRestProxy::createBlobService($connString);
$content = fopen($file['tmp_name'], "r");
$contentType = "video/mp4";
$options = new CreateBlobOptions();
$options->setContentType($contentType);
$blobRestProxy->createBlockBlob("mycontainer", "myblob", $content, $options);
As mentioned in the comment, please change the following line of code:
$options = new CreateBlobOptions();
to
$options = new CreateBlockBlobOptions();
And that will fix the problem.
Essentially, you're getting this error because of a breaking change in the SDK that introduced transactional MD5 in all upload/download methods. However, the option was exposed through getUseTransactionalMD5() in the CreateBlockBlobOptions class and not in the CreateBlobOptions class. Because you're using the latter instead of the former, you get this error message.
I am trying to load XML content from a URL that returns about 60 MB of data. When I do that using PHP's built-in XML library, I keep getting the following error:
PHP Warning: DOMDocument::loadXML(): internal error: Huge input lookup in Entity, line: 845125
The script then stops. What's wrong? How can I deal with this?
Sample URL I use:
http://foo.com/feed.xml
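One thing that may be worth trying: libxml2 enforces size limits by default, and "Huge input lookup" is what it reports when an input exceeds them. PHP exposes the LIBXML_PARSEHUGE option to relax those limits. Below is a minimal sketch, assuming the feed comes from a trusted source (the limits exist to guard against memory-exhaustion attacks). The function name loadHugeXml is made up for illustration.

```php
<?php
// Hypothetical helper: parses an XML string with libxml2's size limits relaxed.
function loadHugeXml(string $xml): DOMDocument
{
    $doc = new DOMDocument();
    // LIBXML_PARSEHUGE lifts the hardcoded parser limits that trigger
    // the "Huge input lookup" error on very large documents.
    if (!$doc->loadXML($xml, LIBXML_PARSEHUGE)) {
        throw new RuntimeException('XML could not be parsed');
    }
    return $doc;
}

// Usage with the sample URL from the question:
// $doc = loadHugeXml(file_get_contents('http://foo.com/feed.xml'));
```

If the code uses SimpleXML instead, the same flag can be passed as the options (third) argument of simplexml_load_string().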
I'm using PDFParser by Smalot.
Some of my PDFs can be converted without any issues using PDF2Text. Unfortunately, it often returns empty content or drops line breaks, for example, while the PDFParser demo from Smalot converts all my PDFs without any problem. However, after installing PDFParser with Composer, I get a connection reset (ERR_CONNECTION_RESET) when I use it.
I've tried set_time_limit to increase my execution time and setting the memory limit up to 1024M, but both had no success.
Calling my php from the command line works fine.
I've also tried converting these PDFs on http://pdfparser.org/demo, and there they are converted without any problem.
I'm not doing anything special in my index file.
Here's its content, in case it is useful:
// Include Composer autoloader if not already done.
include 'vendor/autoload.php';
// Parse pdf file and build necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('test.pdf');
$text = $pdf->getText();
echo $text;
I've got a webservice which expects a parameter of type "xs:base64Binary" - this is a file to store in the database.
I'm trying to consume the service using PHP 5's native webservice classes. I've tried a few things:
// Get the posted file
$file = file_get_contents($_FILES['Filedata']['tmp_name']);
// Add the file, encoding it as a base64
$parameters = array("fileBytes" => base64_encode($file));
// Call the webservice
$response = $client->attachFile($parameters);
The result is an error saying "Bad Request." If the file is a text file and I don't base64_encode it, it works fine. The problem occurs when posting a binary file such as an image.
Anyone know the trick here?
EDIT 1
Also problematic: if I encode the text file, the call seems to work, but the content ends up as junk once downloaded and viewed again (i.e., the text is stored encoded and doesn't seem to get decoded by the server).
As far as I know, base64_encode() should be doing the job.
Are you 100% sure $file contains something? Have you made a dump?
OK, so it seems there is no need to use base64_encode. The raw output of file_get_contents can be passed as-is, presumably because the SOAP client encodes xs:base64Binary values itself.
Additionally, part of the problem was that the server-side maxArrayLength config setting was too low.
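The symptom from EDIT 1 is consistent with the data being base64-encoded twice: once by the explicit base64_encode call and once more when the xs:base64Binary parameter is serialized. Here is a small sketch of that effect. The helper serializeBase64Binary is a made-up stand-in for the client-side serialization (an assumption for illustration, not the SOAP extension's actual code).

```php
<?php
// Hypothetical stand-in for what the client does to an xs:base64Binary value.
function serializeBase64Binary(string $bytes): string
{
    return base64_encode($bytes);
}

$file = "hello\n"; // stands in for the result of file_get_contents(...)

// Passing raw bytes: the server decodes once and gets the original back.
$rawRoundTrip = base64_decode(serializeBase64Binary($file));

// Encoding first: the server decodes once and is left with base64 text.
$doubleRoundTrip = base64_decode(serializeBase64Binary(base64_encode($file)));

var_dump($rawRoundTrip === $file);                   // bool(true)
var_dump($doubleRoundTrip === base64_encode($file)); // bool(true)
```

Under that assumption, the stored file in the second case is the base64 text rather than the original content, which matches the "junk" seen after download.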