I am trying to use PHP to read the text from a PDF file that is stored in a MySQL database. I tried using class.pdf2text.php, which works with an actual file. I then tried passing that class the MYSQL_RESULT variable holding the PDF file contents, but it doesn't work. I've got to be missing something really easy, I just know it.
Basically, this is what I'm trying to do:
I have a database with PDF files. I need to convert a PDF from that database to text and then search on that text for certain data. Is there a way to do this without creating external files in PHP?
Related
I am pretty new to PHP and MySQL, and brand new to PHPWord. I have done multiple searches and cannot find the solution I am looking for.
I already have the PHP code to extract the plain text from an actual docx file located on the Windows file system.
My situation:
Some MS Word docx files are stored as blobs in the database using base64_encode(). I need to find a specified string in one or more of these files.
No rows were returned when I attempted to use FROM_BASE64('columnname'), because the decoded data is still in the docx format, so the search string does not match anything in the data.
Then I used a simple SELECT to get a docx file into a string variable.
I would like to avoid creating a temporary file to parse using PHPWord which would have to be done for each docx file.
My question(s):
Is there a way to use the string variable as the source for PHPWord processing? Or does PHPWord require the source to be an actual file on the file system?
I save .docx files in my database as mediumblob. When I try to display one, it looks like this:
PK!ߤ�lZ [Content_Types].xml
I already have a class that can read this, but that class needs a file and can't read from the database directly:
How to extract text from word file .doc,docx,.xlsx,.pptx php
Is there a way to directly read this from the database without needing to use file_put_contents();?
Thank you for helping.
I have 70 GB of PDF files, and I want to search inside them with PHP and some Ajax.
The code must search all the PDF files and extract the data into a table.
For example: 1547AD
When I hit Enter, the code should search all the PDF files and return every PDF file that contains "1547AD".
My problem is: putting this data into MySQL would of course be better for the server and more robust, but imagine extracting all the tables from 70 GB of PDF files! These PDF files are also updated daily, and there is a lot of traffic on this page.
My question is: Is PHP the right way to build this, or should I use another language and/or another method for this kind of heavy data?
How can I detect hidden text in a PDF file using PHP? I know that there are different ways to hide text in a PDF file, so I need to know if there is a way to check whether a PDF file contains hidden text using PHP.
I am still not familiar with the PHP libraries for PDF, so I need to know which one does the job.
Can we read the data from a PDF file in PHP?
We are able to read data from an Excel or CSV file and import it directly into a database using PHP.
Similarly, is there any way to read data from a PDF file and import it into a database using PHP?
For example:
In a PDF file I have a table of employee details.
Can we import that data into a database using PHP?
You can try something like PDF Parser - http://www.pdfparser.org/
This will allow you to extract text from a PDF. From there, you can create a script to parse the extracted text and insert it into your database.
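The second step (parsing the extracted text and inserting it into a database) could look roughly like the sketch below. It is only an illustration under several assumptions: the extracted text has one employee per line with columns separated by a tab or by two or more spaces, the columns are id, name and department, and an in-memory SQLite database stands in for MySQL so the example runs on its own. The function names parseEmployeeRows and insertEmployees and the employee table are hypothetical. The extraction call shown in the comment is, as far as I know, how PDF Parser accepts PDF bytes held in a string (for example a BLOB fetched from a database), but check it against the library's documentation.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: turn extracted text into rows.
// Assumes one employee per line, columns split by a tab or 2+ spaces.
function parseEmployeeRows(string $text): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $cols = preg_split('/\t|\s{2,}/', trim($line));
        // Keep only lines with three columns and a numeric id (this skips the header).
        if (count($cols) === 3 && ctype_digit($cols[0])) {
            $rows[] = [
                'id'         => (int) $cols[0],
                'name'       => $cols[1],
                'department' => $cols[2],
            ];
        }
    }
    return $rows;
}

// Hypothetical helper: insert the rows with one prepared statement in a transaction.
function insertEmployees(PDO $pdo, array $rows): int
{
    $stmt = $pdo->prepare(
        'INSERT INTO employee (id, name, department) VALUES (:id, :name, :department)'
    );
    $pdo->beginTransaction();
    foreach ($rows as $row) {
        $stmt->execute($row);
    }
    $pdo->commit();
    return count($rows);
}

// In real use, $text would come from the PDF library, for example:
//   $text = (new \Smalot\PdfParser\Parser())->parseContent($pdfBytes)->getText();
// Here a made-up sample stands in for the extracted text.
$text = "ID  Name  Department\n1  Alice Smith  Sales\n2  Bob Jones  Engineering\n";

// Assumption: SQLite in memory replaces MySQL so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, department TEXT)');

$rows = parseEmployeeRows($text);
$inserted = insertEmployees($pdo, $rows);
echo "Inserted $inserted rows\n";
```

For MySQL, only the PDO connection string and credentials should need to change. Real PDF tables rarely extract this cleanly, so the column-splitting rule will usually have to be adapted to the text the library actually returns.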