Setting the directory to include subdirectories using html2text - php

Long-time reader, first-time poster. I know just enough about PHP to be dangerous, and this is my first BIG project using it.
Some background:
I have over 1 million (yes, million) .html files that were generated by an old news-gathering program. These .html files contain important archive information that needs to be searched on a daily basis. I have yet to get to other servers, which might very well have more, so 2-3 million+ is not out of the question.
I am taking these .html files and transferring them into a MySQL database. So far, the code has worked wonderfully with several hundred test files. I'll attach the code at the end.
The problem is that when the .html files are archived, they go into folders. This is a function of the box generating the archive and cannot be changed. They are broken down like this:
archives>year>month>file.html
so an example is
archives>2002>05may>lots and lots of files.html
archives>2002>06june>lots and lots of files.html
archives>2002>07july>lots and lots of files.html
With help and research, I wrote code that uses html2text and simple_html_dom to strip the markup from the files and put the information from each tag into the proper fields in my database, which works great. But ALL of the files need to be moved into the same directory for it to work. Again, with over a million files (and possibly more on other servers), moving them takes a REALLY long time. I am currently using a batch file to robocopy the files.
My question is this:
Can I use some sort of wildcard to include all of the subdirectories, so I don't have to move the files and they can stay in their respective directories?
Top of my code:
// Enter absolute path of folder with HTML files in it here (include trailing slash):
$directory = "C:\\wamp1\\www\\name\\search\\files\\";
The subdirectories are under the files directory.
In my searches for an answer, I have seen replies like "why would you want to do that?", or discussions about .exe or .bat files in the directories being dangerous, so don't do it. My question is only about these HTML files, so nothing is being called or run and there is no danger.
Here is my code for stripping the HTML into the database. Again, it works great, but I would like to skip the step of moving all of the files into one directory.
<?php
// Enter absolute path of folder with HTML files in it here (include trailing slash):
$directory = "C:\\wamp1\\www\\wdaf\\search\\files\\";
// Enter MySQL database variables here:
$db_hostname = "localhost";
$db_username = "root";
$db_password = "password";
$db_name = "dbname";
$db_tablename = "dbtablename";
/////////////////////////////////////////////////////////////////////////////////////
// Include these files to strip all characters that we don't want
include_once("simple_html_dom.php");
include_once("html2text.php");
//Connect to the database
mysql_connect($db_hostname, $db_username, $db_password) or trigger_error("Unable to connect to the database host: " . mysql_error());
mysql_select_db($db_name) or trigger_error("Unable to switch to the database: " . mysql_error());
//scan the directory and look for all the HTML files
$files = scandir($directory);
for ($filen = 0; $filen < count($files); $filen++) {
$html = file_get_html($directory . $files[$filen]);
// first check if $html->find exists
if (method_exists($html,"find")) {
// then check if the html element exists to avoid trying to parse non-html
if ($html->find('html')) {
//Get the filename of the file from which it will extract
$filename = $files[$filen];
//define the path of the files
$path = "./files/";
//Combine the path and filename
$fullpath = $path . $filename;
// Get our variables from the HTML: Starts with 0 as the title field so use alternate ids starting with 1 for the information
$slug = mysql_real_escape_string(convert_html_to_text($html->find('td', 8)));
$tape = mysql_real_escape_string(convert_html_to_text($html->find('td', 9)));
$format0 = mysql_real_escape_string(convert_html_to_text($html->find('td', 10)));
$time0 = mysql_real_escape_string(convert_html_to_text($html->find('td', 11)));
$writer = mysql_real_escape_string(convert_html_to_text($html->find('td', 12)));
$newscast = mysql_real_escape_string(convert_html_to_text($html->find('td', 13)));
$modified = mysql_real_escape_string(convert_html_to_text($html->find('td', 14)));
$by0 = mysql_real_escape_string(convert_html_to_text($html->find('td', 15)));
$productionCues = mysql_real_escape_string(convert_html_to_text($html->find('td', 16)));
$script = mysql_real_escape_string(convert_html_to_text($html->find('td', 18)));
// Insert variables into a row in the MySQL table:
$sql = "INSERT INTO " . $db_tablename . " (`path`, `fullpath`, `filename`, `slug`, `tape`, `format0`, `time0`, `writer`, `newscast`, `modified`, `by0`, `productionCues`, `script`) VALUES ('" . $path . "', '" . $fullpath . "', '" . $filename . "', '" . $slug . "', '" . $tape . "', '" . $format0 . "', '" . $time0 . "', '" . $writer . "', '" . $newscast . "', '" . $modified . "', '" . $by0 . "', '" . $productionCues . "', '" . $script . "');";
$sql_return = mysql_query($sql) or trigger_error("Query Failed: " . mysql_error());
}
}
}
?>
Thanks in advance,
Mike

Just wanted to update this post with an answer to my question that works quite well. With some help, we found that calling scandir recursively (descending into each subdirectory as it is found) works.
I thought I'd post this so that anyone else looking to do something similar wouldn't have to look far! I know I like to see answers!
The code is from the second user-contributed note here, with a few modifications: http://php.net/manual/en/function.scandir.php
So, in my code above, I replaced
//scan the directory and look for all the HTML files
$files = scandir($directory);
for ($filen = 0; $filen < count($files); $filen++) {
$html = file_get_html($directory . $files[$filen]);
with
function import_dir($directory, $db_tablename) {
$cdir = scandir($directory);
foreach ($cdir as $key => $value)
{
if (!in_array($value,array(".","..")))
{
if (is_dir($directory . DIRECTORY_SEPARATOR . $value))
{
// Item in this directory is sub-directory...
import_dir($directory . DIRECTORY_SEPARATOR . $value,$db_tablename);
}
else
// Item in this directory is a file...
{
$html = file_get_html($directory . DIRECTORY_SEPARATOR . $value);
And then, for the filenames, I replaced
//Get the filename of the file from which it will extract
$filename = $files[$filen];
//define the path of the files
$path = "./files/";
//Combine the path and filename
$fullpath = $path . $filename;
with
//Get the filename of the file from which it will extract
$filename = mysql_real_escape_string($value);
//define the path of the files
$path = mysql_real_escape_string($directory . DIRECTORY_SEPARATOR);
//Combine the path and filename (use the escaped $filename, not the raw $value)
$fullpath = $path . $filename;
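Because the replacement above is spliced into the original script, its open braces must still be closed after the INSERT, and the function must also be called once with the top-level folder for anything to happen. A minimal, self-contained sketch of the same recursive scandir() walk follows; the MySQL import is swapped for a callback (the `$handleFile` parameter and the `.html` extension filter are assumptions, not part of the original) so the traversal can be checked on its own.

```php
<?php
// Sketch: recursive scandir() walk. The per-file parse + INSERT from the
// original script is replaced by a callback ($handleFile is hypothetical).
function import_dir(string $directory, callable $handleFile): int
{
    $count = 0;
    foreach (scandir($directory) as $value) {
        // Skip the "." and ".." entries that scandir() always returns.
        if ($value === '.' || $value === '..') {
            continue;
        }
        $fullpath = rtrim($directory, '/\\') . DIRECTORY_SEPARATOR . $value;
        if (is_dir($fullpath)) {
            // Sub-directory such as 2002\05may: recurse into it.
            $count += import_dir($fullpath, $handleFile);
        } elseif (strtolower(pathinfo($value, PATHINFO_EXTENSION)) === 'html') {
            // HTML file: hand it to the importer (assumed extension filter).
            $handleFile($fullpath, $value);
            $count++;
        }
    }
    return $count;
}
```

In the original script it would be called once, for example `import_dir($directory, function ($fullpath, $filename) { /* file_get_html($fullpath) and the INSERT go here */ });`.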
Thanks to those who answered!
Mike

I'm not sure how long it would take before your PHP script times out, but there is a built-in SPL class, RecursiveDirectoryIterator, which sounds like it might do the trick for you.
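A rough sketch of how RecursiveDirectoryIterator could be used here, assuming only files ending in `.html` matter (the function name `html_files()` is illustrative). It is written as a generator so that a million-plus paths are yielded one at a time instead of being collected into one large array.

```php
<?php
// Sketch: yield the path of every .html file below $root, however deep.
function html_files(string $root): Generator
{
    // SKIP_DOTS drops "." and ".."; RecursiveIteratorIterator's default
    // LEAVES_ONLY mode yields files rather than the directories themselves.
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        // $file is an SplFileInfo object.
        if ($file->isFile() && strtolower($file->getExtension()) === 'html') {
            yield $file->getPathname();
        }
    }
}

// Example usage (path from the question):
// set_time_limit(0); // lift max_execution_time when run from the browser
// foreach (html_files("C:\\wamp1\\www\\wdaf\\search\\files") as $path) {
//     $html = file_get_html($path); // ...then parse and INSERT as before
// }
```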

Related

How to generate a thumbnail from a PDF upload in Laravel

I am trying to generate a thumbnail of the PDF I upload in Laravel; the thumbnail should be the first page of the PDF. Right now I am manually uploading an image to make the thumbnail, like this:
if (request()->has('pdf')) {
$pdfuploaded = request()->file('pdf');
$pdfname = $request->book_name . time() . '.' . $pdfuploaded->getClientOriginalExtension();
$pdfpath = public_path('/uploads/pdf');
$pdfuploaded->move($pdfpath, $pdfname);
$book->book_file = '/uploads/pdf/' . $pdfname;
$pdf = $book->book_file;
}
if (request()->has('cover')) {
$coveruploaded = request()->file('cover');
$covername = $request->book_name . time() . '.' . $coveruploaded->getClientOriginalExtension();
$coverpath = public_path('/uploads/cover');
$coveruploaded->move($coverpath, $covername);
$book->card_image = '/uploads/cover/' . $covername;
}
This is tedious when entering a lot of data, so I want to generate the thumbnail automatically. I searched many answers, but I could not find a Laravel-specific one. I tried to use ImageMagick and Ghostscript, but I couldn't find a solution or the proper way to implement it.
You can use spatie/pdf-to-image to render the first page as an image when the file is uploaded, store it in your storage, and save the link in your database.
First, you need to have php-imagick (the Imagick PHP extension) and Ghostscript installed and configured. For issues with the Ghostscript installation, you can refer to this. Then add the package with composer require spatie/pdf-to-image.
As per your code sample:
if (request()->has('pdf')) {
$pdfuploaded = request()->file('pdf');
$pdfname = $request->book_name . time() . '.' . $pdfuploaded->getClientOriginalExtension();
$pdfpath = public_path('/uploads/pdf');
$pdfuploaded->move($pdfpath, $pdfname);
$book->book_file = '/uploads/pdf/' . $pdfname;
$pdf = $book->book_file;
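$pdfO = new \Spatie\PdfToImage\Pdf($pdfpath . '/' . $pdfname);
// This folder must already exist
$thumbnailPath = public_path('/uploads/thumbnails');
// Name the thumbnail after the uploaded PDF, with a .png extension
$thumbnailName = pathinfo($pdfname, PATHINFO_FILENAME) . '.png';
$pdfO->setPage(1)
->setOutputFormat('png')
->saveImage($thumbnailPath . '/' . $thumbnailName);
// Save the thumbnail path to your database as the cover
$book->card_image = '/uploads/thumbnails/' . $thumbnailName;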
}

Store image in new directory in PHP

I can upload my photo from my Android app to /var/www/html/ProductPhotos just fine, but when I try to use the product name as the name of the new directory and of the image, it doesn't work. I created /var/www/html/ProductPhotos and the new directory with 777 permissions (even though that's super bad), but for now it's what I need. Here's my PHP code:
<?php
$ProductAccountName = $_POST['ProductAccountName'];
$ProductName = $_POST['ProductName'];
$ProductImage = $_POST['EncodedImage'];
$NewDirectory = "/var/www/html/ProductPhotos/" . $ProductAccountName;
mkdir($NewDirectory, 0777, true);
//$DecodedProductImage = base64_decode("$ProductImage");
//$ProductName = $ProductName .".JPG";
file_put_contents("/var/www/html/ProductPhotos/" . $ProductAccountName, $ProductName . ".JPG", $DecodedProductImage);
?>
You're using a comma instead of a period. And you're missing a slash.
file_put_contents("/var/www/html/ProductPhotos/" . $ProductAccountName . "/" . $ProductName . ".JPG", $DecodedProductImage);
See the file_put_contents docs.
You may want to put some checks in place to make sure the user doesn't use relative paths (for example, ../ as part of the ProductAccountName). Just be careful that users can't use this to do malicious things.
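One possible check, sketched under the assumption that account and product names only ever need letters, digits, "_" and "-" (a strict whitelist; names with spaces would be rejected, so adjust the pattern to your own rules). The helper name `product_photo_path()` is hypothetical.

```php
<?php
// Sketch: build the save path only from whitelisted names, so values such as
// "../../etc" cannot escape the ProductPhotos folder.
function product_photo_path(string $baseDir, string $accountName, string $productName): string
{
    // Assumed allowed characters; 1 to 64 of them.
    $pattern = '/^[A-Za-z0-9_-]{1,64}$/';
    if (!preg_match($pattern, $accountName) || !preg_match($pattern, $productName)) {
        throw new InvalidArgumentException('Invalid account or product name');
    }
    return rtrim($baseDir, '/') . '/' . $accountName . '/' . $productName . '.JPG';
}
```

Note also that the question's code has the `base64_decode()` line commented out, so `$DecodedProductImage` is undefined; it needs to be restored (`$DecodedProductImage = base64_decode($ProductImage);`) before `file_put_contents()` is called with the path returned above.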

Laravel 5.1 file upload: same file name for folder and database

Here is my controller code, where I store the file name in a database table and also move the file to a folder.
The issue is that I am storing the original name of each file in the database table, whereas I am moving the files under names built from uniqid() and time(). This will cause problems in the future, because the file name in the database table and the name of the moved file are different.
if(Input::hasFile('profile_pic')){
$pic = Input::file('profile_pic');
$mobile->photo1 = $pic[0]->getClientOriginalName();
$mobile->photo2 = $pic[1]->getClientOriginalName();
$mobile->photo3 = $pic[2]->getClientOriginalName();
$mobile->photo4 = $pic[3]->getClientOriginalName();
$mobile->photo5 = $pic[4]->getClientOriginalName();
foreach ($pic as $k=>$file){
if(!empty($file)){
$file->move(public_path() . '/uploads/', time() . uniqid() . '-' . $k . '-laptop');
}
}
}
You can try something like this:
if(Input::hasFile('profile_pic')){
$pic = Input::file('profile_pic');
foreach ($pic as $k=>$file){
if(!empty($file)){
$temp = $k+1;
// Build the property name photo1..photo5 dynamically
$mobile->{'photo' . $temp} = time() . uniqid() . '-' . $k . '-laptop';
$file->move(public_path() . '/uploads/', $mobile->{'photo' . $temp});
}
}
}
You can store both names in your database: for example, one as original_name and one as generated_name.
You can then serve the file under its original name by retrieving it from the database, if you want to let your users download it. It should look something like this:
$photo = Photo::find(1);
return response()->download(public_path('uploads/' . $photo->generated_name), $photo->original_name);
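A small sketch of generating both names in one place, assuming the `original_name` / `generated_name` columns suggested above. Unlike the loop in the earlier answer, it keeps the uploaded file's extension on the generated name; the helper name `upload_names()` is hypothetical.

```php
<?php
// Sketch: compute the stored name once, return it alongside the original.
function upload_names(string $originalName, int $index): array
{
    $extension = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    $generated = time() . uniqid() . '-' . $index . '-laptop';
    if ($extension !== '') {
        // Keep the extension so the file on disk is still recognisable.
        $generated .= '.' . $extension;
    }
    return ['original_name' => $originalName, 'generated_name' => $generated];
}
```

Inside the upload loop this could be used as `$names = upload_names($file->getClientOriginalName(), $k);` followed by `$file->move(public_path() . '/uploads/', $names['generated_name']);`, then both values saved on the model.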

Copy a file to a folder, adding +1 if the name exists
I'm new to PHP.
Every 17 seconds, my PHP code generates id1.php and id2.php in this folder: "sitename.com/"
So, every 17 seconds I've got this:
sitename.com/master.php
sitename.com/id1.php
sitename.com/id2.php
sitename.com/filebag/id1.php
sitename.com/filebag/id2.php
sitename.com/filebag/id3.php
After the script generates id1.php and id2.php, I need to copy these files to sitename.com/filebag/, and if a file with that name already exists, add +1 to the number. So, in the end, I must get this situation:
sitename.com/
sitename.com/master.php
sitename.com/filebag/id1.php
sitename.com/filebag/id2.php
sitename.com/filebag/id3.php
sitename.com/filebag/id4.php
sitename.com/filebag/id5.php and so on...
I use master.php to do the copy:
<?php
$idname = "id1";
copy ("./id1.php","./filebag/$idname.php");
?>
My question is: how can I rename the file if a file with that name already exists in the folder "sitename.com/filebag/"?
This code delivers what you want:
<?php
$stringName = "id";
$numName = "1";
$file_orig = './' . $stringName . $numName . '.php';
$file_dest = './filebag/' . $stringName . $numName . '.php';
if(file_exists($file_dest))
{
$count = (int)$numName;
while(file_exists($file_dest))
{
$count = $count + 1;
$file_dest = './filebag/' . $stringName . $count . '.php';
}
}
copy ($file_orig, $file_dest);
?>
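The same loop can be wrapped in a function so master.php copies both generated files on each run. This is a sketch; the function name and its `$prefix` / `$start` parameters are illustrative, and it assumes only one copy of master.php runs at a time (two overlapping runs could pick the same free name).

```php
<?php
// Sketch: copy $source into $destDir as id<N>.php, using the first free N.
function copy_with_next_free_name(string $source, string $destDir, string $prefix = 'id', int $start = 1): string
{
    $dir = rtrim($destDir, '/');
    $count = $start;
    $dest = $dir . '/' . $prefix . $count . '.php';
    // Advance the number until the name is not taken yet.
    while (file_exists($dest)) {
        $count++;
        $dest = $dir . '/' . $prefix . $count . '.php';
    }
    if (!copy($source, $dest)) {
        throw new RuntimeException("Could not copy $source to $dest");
    }
    return $dest;
}

// Usage in master.php (paths from the question):
// foreach (['./id1.php', './id2.php'] as $generated) {
//     copy_with_next_free_name($generated, './filebag');
// }
```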

Save 'Username' as Filename

I wonder whether someone could help me please.
I'm using Image Uploader from Aurigma, and to save the uploaded images, I've put this script together.
<?php
//This variable specifies relative path to the folder, where the gallery with uploaded files is located.
//Do not forget about the slash in the end of the folder name.
$galleryPath = 'UploadedFiles/';
require_once 'Includes/gallery_helper.php';
require_once 'ImageUploaderPHP/UploadHandler.class.php';
/**
* FileUploaded callback function
* @param $uploadedFile UploadedFile
*/
function onFileUploaded($uploadedFile) {
$packageFields = $uploadedFile->getPackage()->getPackageFields();
$userid = $packageFields["userid"];
$locationid= $packageFields["locationid"];
global $galleryPath;
$absGalleryPath = realpath($galleryPath) . DIRECTORY_SEPARATOR;
$absThumbnailsPath = $absGalleryPath . 'Thumbnails' . DIRECTORY_SEPARATOR;
if ($uploadedFile->getPackage()->getPackageIndex() == 0 && $uploadedFile->getIndex() == 0) {
initGallery($absGalleryPath, $absThumbnailsPath, FALSE);
}
$dirName = $_POST['folder'];
$dirName = preg_replace('/[^a-z0-9_\-\.()\[\]{}]/i', '_', $dirName);
if (!is_dir($absGalleryPath . $dirName)) {
mkdir($absGalleryPath . $dirName, 0777);
}
$path = rtrim($dirName, '/\\') . '/';
$originalFileName = $uploadedFile->getSourceName();
$files = $uploadedFile->getConvertedFiles();
// save converter 1
$sourceFileName = getSafeFileName($absGalleryPath, $originalFileName);
$sourceFile = $files[0];
/* @var $sourceFile ConvertedFile */
if ($sourceFile) {
$sourceFile->moveTo($absGalleryPath . $sourceFileName);
}
// save converter 2
$thumbnailFileName = getSafeFileName($absThumbnailsPath, $originalFileName);
$thumbnailFile = $files[1];
/* @var $thumbnailFile ConvertedFile */
if ($thumbnailFile) {
$thumbnailFile->moveTo($absThumbnailsPath . $thumbnailFileName);
}
//Load XML file which will keep information about files (image dimensions, description, etc).
//XML is used solely for brevity. In real-life application most likely you will use database instead.
$descriptions = new DOMDocument('1.0', 'utf-8');
$descriptions->load($absGalleryPath . 'files.xml');
//Save file info.
$xmlFile = $descriptions->createElement('file');
$xmlFile->setAttribute('name', $_POST['folder'] . '/' . $originalFileName);
$xmlFile->setAttribute('source', $sourceFileName);
$xmlFile->setAttribute('size', $uploadedFile->getSourceSize());
$xmlFile->setAttribute('originalname', $originalFileName);
$xmlFile->setAttribute('thumbnail', $thumbnailFileName);
$xmlFile->setAttribute('description', $uploadedFile->getDescription());
//Add additional fields
$xmlFile->setAttribute('userid', $userid);
$xmlFile->setAttribute('locationid', $locationid);
$xmlFile->setAttribute('folder', $dirName);
$descriptions->documentElement->appendChild($xmlFile);
$descriptions->save($absGalleryPath . 'files.xml');
}
$uh = new UploadHandler();
$uh->setFileUploadedCallback('onFileUploaded');
$uh->processRequest();
?>
What I'd like to do is replace the "files" part of the file name (files.xml) with the username, so that each saved folder and its associated files can be identified with each user.
I've added a username text field to the form that this script saves from.
I think I'm right in saying that this is the line that needs to change: $descriptions->save($absGalleryPath . 'files.xml');
So, amongst many attempts, I've tried changing this to $descriptions->save($absGalleryPath . '$username.xml and $descriptions->save($absGalleryPath . $username '.xml, but none of these have worked, so I'm not quite sure what I need to change.
I just wondered whether someone could perhaps have a look at this please and let me know where I'm going wrong.
Many thanks
'$username.xml' will be interpreted literally as the text $username.xml; you need to use "$username.xml" instead. Single quotes "disable" variable interpolation inside strings.
What you are trying to do can be a bad idea, because it means a username can't contain "special characters" such as "/". Perhaps that is not a problem if you already have a rule that stops "/" from being part of a username.
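A short sketch of the quoting difference, plus a hypothetical `user_xml_filename()` helper that reuses the character whitelist the script already applies to `$dirName`, so a "/" in a username cannot turn into a path. Two further assumptions about the original script: `$username` has to be read inside `onFileUploaded()` first (for example from the posted form; the field name `username` is a guess), and because `DOMDocument::load()` fails on a file that does not exist yet, the per-user XML file needs to be created before its first load.

```php
<?php
// Single quotes keep "$username" literal; double quotes or concatenation
// insert the variable's value.
$username = 'mike';
$literal = '$username.xml';        // the text $username.xml
$interpolated = "$username.xml";   // mike.xml
$concatenated = $username . '.xml'; // mike.xml

// Hypothetical helper: same whitelist as the script's $dirName handling,
// so "/" and "\" become "_" and cannot create a path.
function user_xml_filename(string $username): string
{
    $safe = preg_replace('/[^a-z0-9_\-\.()\[\]{}]/i', '_', $username);
    return $safe . '.xml';
}

// e.g. $descriptions->save($absGalleryPath . user_xml_filename($_POST['username']));
```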
