I'm using Laravel 7 with AWS S3 as file storage.
The files are PDFs only, and I want to add a file search feature, i.e. if a user searches for some text, I want to list all PDF files that contain the matching text.
Is this possible using Laravel alone, or using AWS S3?
I know I can extract all the text of a file on upload and store it in the database, and when a user searches for some text, I can search the database using a LIKE '%...%' query, but this will take up a huge amount of database space.
I'm looking for something better.
Related
I have 70 GB of PDF files, and I want to search inside them with PHP and some Ajax.
The code must search all the PDF files and extract the data into a table.
For example: 1547AD
When I hit enter, the code will search all the PDF files and return every file that contains "1547AD".
My problem is this: of course, putting the data into MySQL would be better and easier on the server, but imagine extracting all the tables from 70 GB of PDF files! These PDF files are also updated daily, and there is a lot of traffic on this page.
My question is: is it right to build this in PHP, or should I use another language and/or another method for this kind of heavy data?
I'm trying to build a search page that searches every PDF referenced in my database for the keyword I enter.
The problem is that I can't keep both my PDFs and their raw text in my database (I am extremely short on space).
What I am doing right now is this: when a user searches for something, a PHP script goes through all my PDFs one by one, converts each PDF to raw text, and then searches it for the keywords. But it takes a really long time to return a result, and I fear that my server won't cope when there are many users. (This is only an assumption; I've never run a server before and I don't really know what is good or bad.)
Would it be worth adding space on my server so that I can store all the raw text from my PDFs in the database as well and search it with a MySQL query? Or is there a smarter way to do it that I didn't think of?
(I don't have the PDFs inside the database, just their paths, so I can't free up space there.)
I've been tasked with creating a search system that will help users navigate through multiple PDF files of 1000+ pages each. However, these files will first have to be put into a MySQL DB. The issue I'm currently having is how to store these PDF files in the DB and assign the relevant PDF headers to them.
Example:
Adding each Part/Header/Section/Subsection individually to the DB in different tables.
Would this all have to be entered manually? Bear in mind we are talking about hundreds of thousands of pages of PDF.
Thanks
You would be better off storing some metadata in the database, along with the location of the PDF file.
For example, a table called 'documents' might have the following fields:
id, path, keywords, category
The path would be: /some/location/to/my/pdf/file.pdf
The keywords could be: 'pdf1, what is a pdf, some search text'
This lets you keep the PDF files themselves on disk and search only their metadata in the database.
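A minimal sketch of the 'documents' table suggested above, written in Python with the built-in sqlite3 module as a stand-in for MySQL. The column names come from the answer; the function names (create_documents_table, add_document, find_by_keyword) are made up for illustration, and storing keywords as one comma-separated string is an assumption based on the example keywords.

```python
import sqlite3


def escape_like(term):
    # Escape LIKE wildcards so user input is matched literally.
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def create_documents_table(conn):
    # Only metadata and the file location are stored, never the PDF itself.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               path TEXT NOT NULL,
               keywords TEXT NOT NULL,
               category TEXT
           )"""
    )


def add_document(conn, path, keywords, category):
    # Assumption: keywords are kept as a single comma-separated string.
    cur = conn.execute(
        "INSERT INTO documents (path, keywords, category) VALUES (?, ?, ?)",
        (path, ", ".join(keywords), category),
    )
    return cur.lastrowid


def find_by_keyword(conn, term):
    # Return the paths of all documents whose keywords contain the term.
    pattern = "%" + escape_like(term) + "%"
    rows = conn.execute(
        "SELECT path FROM documents WHERE keywords LIKE ? ESCAPE '\\' ORDER BY id",
        (pattern,),
    )
    return [row[0] for row in rows]
```

Usage would look roughly like this: open a connection, call create_documents_table once, call add_document for each uploaded PDF, and call find_by_keyword with the user's search term to get back the matching file paths.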
Alternatively you could use something like Google - they allow you to use their search technology. It used to be in the form of a 'google yellow box' but I believe it's now part of their cloud stuff!
HTH
I am trying to use PHP to read the text from a PDF file that is stored in a MySQL database. I tried using class.pdf2text.php, which works with an actual file on disk. I then tried passing the MYSQL_RESULT variable holding the PDF file contents to that class, but it doesn't work. I've got to be missing something really easy, I just know it.
Basically, this is what I'm trying to do:
I have a database with PDF files. I need to convert a PDF from that database to text and then search on that text for certain data. Is there a way to do this without creating external files in PHP?
I have a task where I need to upload about 50 MS Word documents into a MySQL database, which is not a problem. But in the "admin" area, how can I develop a script that can find a string both in the database and inside a file such as an MS Word document?
For example, I search for the word "programming" in the search box, and the word appears only inside an MS Word file!
So the main problem is how to develop a search script that can read data both in MySQL and in a file such as an MS Word document.
I am currently using WordPress with Contact Form 7 to build the upload form.
Can someone give me some suggestions?
You should not only save the Word files as binary data (which MySQL cannot search) but also extract the text from the Word files and save it in a TEXT column in the DB. Then you can search with wildcards (e.g. 'WHERE text LIKE "%searchterm%"') and find the words in the files.
Note however that this will become very slow with more and larger files!
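A rough sketch of the wildcard search described above, again in Python with sqlite3 standing in for MySQL. It assumes the text has already been extracted from each Word file by some other tool and is passed in as a plain string; the table name 'files' and the function names are hypothetical.

```python
import sqlite3


def escape_like(term):
    # Escape LIKE wildcards so the search term is matched literally.
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def create_files_table(conn):
    # 'body' holds the extracted plain text next to the file name.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               body TEXT NOT NULL
           )"""
    )


def store_extracted_text(conn, name, text):
    # Assumption: 'text' was extracted from the document beforehand.
    conn.execute("INSERT INTO files (name, body) VALUES (?, ?)", (name, text))


def search_files(conn, term):
    # The leading % means an ordinary index on 'body' cannot be used,
    # so every row is scanned; this is why it slows down as files grow.
    pattern = "%" + escape_like(term) + "%"
    rows = conn.execute(
        "SELECT name FROM files WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        (pattern,),
    )
    return [row[0] for row in rows]
```

The slowdown comes from the pattern starting with a wildcard: the database has to read the full text of every row for every search, so the cost grows with the total amount of stored text.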