I am uploading files to a server in chunks, and I need a way to determine whether a file has changed between upload attempts.
Suppose I send a 5 MB file with a chunk size of 1 MB. Four chunks reach the server, but the last one does not because the connection drops.
The document is then edited at the beginning, so its first chunk no longer matches the first chunk on the server. When the upload resumes, the last chunk is sent, but the assembled file's contents are now inconsistent.
To detect that one of the chunks has changed, I would have to re-send all the chunks to the server to compute a checksum, but that defeats the whole point of chunked uploading.
How can I determine that a file has been modified without sending all the chunks to the server?
Additionally:
The file upload works as follows:
First, the browser sends a request to create a new session for uploading files.
Request params:
part_size: 4767232
files: [
    {
        "file_size": 4767232,
        "file_type": "application/msword",
        "file_name": "5 mb.doc"
    }
]
Next:
New records for the uploaded files are added to the database.
The server creates temporary folders for storing the chunks.
(Each folder is named after the GUID of the file record created in the database.)
The method returns the file GUIDs.
After receiving the GUIDs, the browser splits the files into chunks using the JavaScript Blob.slice() method and sends each chunk as a separate request, attaching the file identifier to the request.
The chunks are saved, and once the last chunk has been uploaded, the file is assembled.
Code:
/**
 * @param $binary - The binary data of the chunk.
 * @param $directory - The directory holding this file's chunks.
 */
private static function createChunk($binary, $directory)
{
    // Create a unique id for the chunk.
    $id = 'chunk_' . md5($binary);
    // Save the chunk to the folder with the rest of the file's chunks.
    Storage::put($directory . '/' . $id, $binary);
    // Get the json file with information about the upload session.
    $session = self::uploadSessionInfo($directory);
    // Increase the number of loaded chunks by 1 and add a new element to the chunks subarray.
    $session['chunks_info']['loaded_chunks'] = $session['chunks_info']['loaded_chunks'] + 1;
    $session['chunks_info']['chunks'][] = [
        'chunk_id' => $id
    ];
    // Save the modified session file.
    Storage::put($directory . '/session.json', json_encode($session));
    // If the number of loaded chunks equals the total number of chunks, assemble the final file.
    if ($session['chunks_info']['total_chunks'] === $session['chunks_info']['loaded_chunks']) {
        Storage::put($directory . '/' . $session['file_name'], '');
        foreach ($session['chunks_info']['chunks'] as $key => $value) {
            $chunkPath = storage_path() . '/app/' . $directory . '/' . $value['chunk_id'];
            $file = fopen($chunkPath, 'rb');
            // Read the whole chunk; a fixed fread() length would truncate larger chunks.
            $buff = fread($file, filesize($chunkPath));
            fclose($file);
            $final = fopen(storage_path() . '/app/' . $directory . '/' . $session['file_name'], 'ab');
            fwrite($final, $buff);
            fclose($final);
        }
    }
}
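For what it's worth, here is the kind of check I have in mind, as a minimal sketch (the staleChunks() helper is hypothetical, not part of the code above). Because every chunk is stored under 'chunk_' . md5($binary), the browser could hash its chunks locally on resume and send only the hashes; the server could then report which chunks are missing or stale without receiving their contents:
/**
 * Hypothetical sketch: given the chunk hashes the client computed locally,
 * report which chunks still need to be (re)sent. Relies on the fact that
 * createChunk() above stores every chunk as 'chunk_' . md5($binary).
 *
 * @param string[] $clientHashes - md5 hash of every chunk, in file order.
 * @param string $directory - The upload-session directory.
 * @return int[] - Indexes of chunks the client must (re)send.
 */
private static function staleChunks(array $clientHashes, $directory)
{
    $needed = [];
    foreach ($clientHashes as $index => $hash) {
        // No stored chunk carries this hash: either it was never uploaded,
        // or the file changed locally since the last attempt.
        if (!Storage::exists($directory . '/chunk_' . $hash)) {
            $needed[] = $index;
        }
    }
    return $needed;
}
If an index comes back for a chunk that was already uploaded, the file has been modified locally and the session can be restarted instead of resumed.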
Related
I've written a script to get all image records from a database, use each image's tempname to find the image on disk, copy it to a new folder, and create a tar file out of these files. To do so, I'm using PHP's PharData. The problem is that the images are TIFF files, and pretty large ones at that (the entire folder of 2000-ish images is 95 GB in size).
Initially I created one archive, looped through all database records to find the specific file, and used PharData::addFile() to add each file to the archive individually, but this eventually led to adding a single file taking 15+ seconds.
I've now switched to using PharData::buildFromDirectory in batches, which is significantly faster, but the time to create each batch increases with every batch. The first batch got done in 28 seconds, the second in 110, and the third one didn't even finish. Code:
$imageLocation = '/path/to/imagefolder';
$copyLocation = '/path/to/backupfolder';

$images = [images];

$archive = new PharData($copyLocation . '/images.tar');

// Set timeout to an hour (adding 2000 files to a tar archive takes time apparently), should be enough
set_time_limit(3600);

$time = microtime(true);

$inCurrentBatch = 0;
$perBatch = 100; // Amount of files to be included in one archive file
$archiveNumber = 1;

foreach ($images as $image) {
    $path = $imageLocation . '/' . $image->getTempname();

    // If the file exists, copy to folder with proper file name
    if (file_exists($path)) {
        $copyName = $image->getFilename() . '.tiff';
        $copyPath = $copyLocation . '/' . $copyName;

        copy($path, $copyPath);

        $inCurrentBatch++;

        // If the current batch reached the limit, add all files to the archive and remove the .tiff files
        if ($inCurrentBatch === $perBatch) {
            $archive = new PharData($copyLocation . "/images_{$archiveNumber}.tar");
            $archive->buildFromDirectory($copyLocation);
            array_map('unlink', glob("{$copyLocation}/*.tiff"));
            $inCurrentBatch = 0;
            $archiveNumber++;
        }
    }
}

// Archive any leftover files in a last archive
if (glob("{$copyLocation}/*.tiff")) {
    $archive = new PharData($copyLocation . "/images_{$archiveNumber}.tar");
    $archive->buildFromDirectory($copyLocation);
    array_map('unlink', glob("{$copyLocation}/*.tiff"));
}

$taken = microtime(true) - $time;
echo "Done in {$taken} seconds\n";
exit(0);
The copied images get removed between batches to save disk space.
We're fine with the entire script taking a while, but I don't understand why the time to create an archive increases so much between batches.
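One hedged guess (mine, not from the original post): the .tiff files are deleted between batches, but the finished images_N.tar archives stay in $copyLocation, and buildFromDirectory($copyLocation) packs everything in that directory, so each batch may be re-archiving every archive created before it. PharData::buildFromDirectory() takes an optional second $pattern argument that limits which files are included; restricting it to .tiff files would be a cheap way to test that theory:
// Sketch: only pack the freshly copied TIFFs, not the earlier
// images_N.tar files that also live in $copyLocation.
$archive = new PharData($copyLocation . "/images_{$archiveNumber}.tar");
$archive->buildFromDirectory($copyLocation, '/\.tiff$/');
array_map('unlink', glob("{$copyLocation}/*.tiff"));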
I am developing a text collection engine using fwrite() to write text, but I want to put a file size cap of 1.5 MB on the writing process, so that if the file grows larger than 1.5 MB it starts writing a new file from where it left off, and so on, until the contents of the source file have been written into multiple files. I have searched Google, but many of the tutorials and examples are too complex for me because I am a novice programmer. The code below is inside a for loop which fetches the text ($RemoveTwo). It does not work as I need. Any help would be appreciated.
switch ($FileSizeCounter) {
    case ($FileSizeCounter > 1500000):
        $myFile2 = 'C:\TextCollector/' . 'FilenameA' . '.txt';
        $fh2 = fopen($myFile2, 'a') or die("can't open file");
        fwrite($fh2, $RemoveTwo);
        fclose($fh2);
        break;
    case ($FileSizeCounter > 3000000):
        $myFile3 = 'C:\TextCollector/' . 'FilenameB' . '.txt';
        $fh3 = fopen($myFile3, 'a') or die("can't open file");
        fwrite($fh3, $RemoveTwo);
        fclose($fh3);
        break;
    default:
        echo "continue and continue until it stops by the user";
}
Try doing something like this. You need to read from the source and write piece by piece, all the while checking for end of file on the source. When the buffer reaches the max value, close the current file and open a new one with an auto-incremented numeric suffix:
/*
** @param $filename [string] This is the source file & path
** @param $toFile [string] This is the base name for the destination file & path
** @param $chunk [num] This is the max file size in MB, so 1.5 is 1.5 MB
*/
function breakDownFile($filename, $toFile, $chunk = 1)
{
    // Convert the MB value into bytes (MB -> KB -> bytes)
    $max = $chunk * 1024 * 1024;
    // Start value for naming the files incrementally
    $i = 1;
    // Open the source file for reading
    $r = fopen($filename, 'r');
    // Create a new file for writing, using the increment value
    $w = fopen($toFile . $i . '.txt', 'w');
    // Loop until the end of the source file
    while (!feof($r)) {
        // Read from the source, but never more than the max size
        $buffer = fread($r, $max);
        // Write to disk using the buffer as a guide
        fwrite($w, $buffer);
        // Check the byte size of the buffer to see if it's
        // the same as or larger than the limit
        if (strlen($buffer) >= $max) {
            // Close the current file
            fclose($w);
            // Add 1 to our $i counter
            $i++;
            // Start a new file with the new name
            $w = fopen($toFile . $i . '.txt', 'w');
        }
    }
    // When the loop is done, close the writable file
    fclose($w);
    // Close the readable file as well
    fclose($r);
}
To use:
breakDownFile(__DIR__.'/test.txt',__DIR__.'/tofile',1.5);
I'm working on a project that needs to append JSON to a file after every form submission.
My goal is to get the JSON to look like:
{
    "word0": [
        "cheese",
        "burger"
    ],
    "word1": [
        "sup",
        "boi"
    ],
    "word2": [
        "nothin'",
        "much"
    ]
}
But I'm not able to add to the JSON file afterwards.
EDIT: I'm thinking about just creating a new file for every form submission. Would this be a better option?
(Storage size isn't a problem)
Here's my current code that places JSON into a file:
$response['word' . $count] = array('word1' => $_POST['firstrow'], 'word2' => $_POST['secondrow']);
file_put_contents("query.json", file_get_contents("query.json") . json_encode($response) . "\n");
Well, if you have no problem with storage size, you can create a new file for every form submission.
But you can also keep everything in one large file by reading and rewriting it:
$data = json_decode(file_get_contents("data.json"), true);
$data["word3"] = array("i don't" , "know");
file_put_contents("data.json", json_encode($data));
If you want to save on IO, you can write at a specific position via fseek():
$file = fopen("data.json", "c");
fseek($file, -5, SEEK_END); // 5 characters back from the end of the file.
fwrite($file, $newJsonArrayElement);
fclose($file);
This is just an example snippet; you will need to calculate how many characters to seek back from the end and generate the new JSON fragment yourself.
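To make the seek-back idea concrete, here is a minimal sketch for the exact structure shown in the question. It assumes data.json already holds at least one "wordN" entry and ends with a closing } and no trailing newline (both assumptions, not guaranteed by the code above):
// Hypothetical next entry; the running index would have to be tracked elsewhere.
$newEntry = '"word3":["i don\'t","know"]';

$file = fopen('data.json', 'c');      // write without truncating, create if missing
fseek($file, -1, SEEK_END);           // position on the final "}"
fwrite($file, ',' . $newEntry . '}'); // overwrite it with ",<entry>}"
fclose($file);
For small files, though, the json_decode()/json_encode() round trip above is simpler and harder to get wrong.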
I am currently in the process of writing a mobile app with the help of PhoneGap. One of the few features that I would like this app to have is the ability to capture an image and upload it to a remote server...
I currently have the image capturing and uploading/emailing portion working fine with a compiled apk, but in my PHP I am naming the images "image[random number from 10 to 20]". The problem here is that the numbers can repeat and the images can be overwritten. I have read and thought about just using rand() and selecting a random number from 0 to getrandmax(), but I feel that would carry the same chance of a file being overwritten. I need the image to be uploaded to the server with a unique name every time, no matter what, so the PHP script would check what the server already has and write/upload the image under a unique name...
Any ideas other than "rand()"?
I was also thinking about naming each image img + date + time + 5 random characters (letters and numbers), so if an image were taken with the app at 4:37 am on March 20, 2013, it would be uploaded to the server as something like "img_03-20-13_4-37am_e4r29.jpg"... I think that might work (unless there's a better way), but I am fairly new to PHP and wouldn't know how to write something like that.
My PHP is as follows:
print_r($_FILES);
$new_image_name = "image".rand(10, 20).".jpg";
move_uploaded_file($_FILES["file"]["tmp_name"], "/home/virtual/domain.com/public_html/upload/".$new_image_name);
Any help is appreciated...
Thanks in advance!
Also, please let me know if there is any further info I may be leaving out.
You may want to consider PHP's uniqid() function.
This way the code you suggested would look like the following:
$new_image_name = 'image_' . date('Y-m-d-H-i-s') . '_' . uniqid() . '.jpg';
// do some checks to make sure the file you have is an image and if you can trust it
move_uploaded_file($_FILES["file"]["tmp_name"], "/home/virtual/domain.com/public_html/upload/".$new_image_name);
Also keep in mind that your server's random functions are pseudo-random, not truly random. Try random.org if you need something genuinely random.
UPD: In order to use random.org from within your code, you'll have to do some API requests to their servers. The documentation on that is available here: www.random.org/clients/http/.
The example of the call would be: random.org/integers/?num=1&min=1&max=1000000000&col=1&base=10&format=plain&rnd=new. Note that you can change the min, max and the other parameters, as described in the documentation.
In PHP you can do a GET request to a remote server using the file_get_contents() function, the cURL library, or even sockets. If you're on shared hosting, check that outgoing connections are available and enabled for your account.
$random_int = file_get_contents('http://www.random.org/integers/?num=1&min=1&max=1000000000&col=1&base=10&format=plain&rnd=new');
var_dump($random_int);
You should use tempnam() to generate a unique file name:
// $baseDirectory Defines where the uploaded file will go to
// $prefix The first part of your file name, e.g. "image"
$destinationFileName = tempnam($baseDirectory, $prefix);
Appending the extension to your new file should be done after moving the uploaded file, i.e.:
// Assuming $_FILES['file']['error'] == 0 (no errors)
if (move_uploaded_file($_FILES['file']['tmp_name'], $destinationFileName)) {
    // use extension from uploaded file
    $fileExtension = '.' . pathinfo($_FILES['file']['name'], PATHINFO_EXTENSION);
    // or fix the extension yourself
    // $fileExtension = ".jpg";
    rename($destinationFileName, $destinationFileName . $fileExtension);
} else {
    // tempnam() created a new file, but moving the uploaded file failed
    unlink($destinationFileName); // remove temporary file
}
Have you considered using md5_file()?
That way all of your files will have unique names and you will not have to worry about duplicate names. But please note that it will return the same string for two files with identical contents.
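As a sketch of that idea, reusing the upload path from the question (the .jpg extension is assumed):
// Name the upload after the md5 of its contents, so identical files collide
// on purpose and different files get distinct names.
$new_image_name = md5_file($_FILES["file"]["tmp_name"]) . ".jpg";
move_uploaded_file($_FILES["file"]["tmp_name"], "/home/virtual/domain.com/public_html/upload/" . $new_image_name);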
Also here is another method:
do {
    $filename = DIR_UPLOAD_PATH . '/' . make_string(10) . '-' . make_string(10) . '-' . make_string(10) . '-' . make_string(10);
} while (is_file($filename));

return $filename;

/**
 * Make random string
 *
 * @param integer $length
 * @param string $allowed_chars
 * @return string
 */
function make_string($length = 10, $allowed_chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890') {
    $allowed_chars_len = strlen($allowed_chars);
    if ($allowed_chars_len == 1) {
        return str_pad('', $length, $allowed_chars);
    } else {
        $result = '';
        while (strlen($result) < $length) {
            // rand() is inclusive on both ends, so the last valid index is length - 1
            $result .= substr($allowed_chars, rand(0, $allowed_chars_len - 1), 1);
        } // while
        return $result;
    } // if
} // make_string
This function will create a unique name before the image is uploaded.
// Upload file with unique name
if ( ! function_exists('getUniqueFilename'))
{
    function getUniqueFilename($file)
    {
        if (is_array($file) and $file['name'] != '')
        {
            // getting file extension
            $fnarr = explode(".", $file['name']);
            $file_extension = strtolower($fnarr[count($fnarr) - 1]);
            // getting unique file name
            $file_name = substr(md5($file['name'] . time()), 5, 15) . "." . $file_extension;
            return $file_name;
        } // ends for is_array check
        else
        {
            return '';
        } // else ends
    } // ends
}
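To use it (a hypothetical call, mirroring the upload code from the question):
$new_image_name = getUniqueFilename($_FILES["file"]);
move_uploaded_file($_FILES["file"]["tmp_name"], "/home/virtual/domain.com/public_html/upload/" . $new_image_name);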
I've got a problem with the following PHP code. It is supposed to list the items of an S3 bucket and find & delete files which contain a certain string in their filenames.
The problem: only one file is deleted; the others remain in the bucket after the script has run.
I can't find where the issue comes from, so I'm asking you :/
$aS3Files = $s3->getBucket($bucketName); // list all elements in the bucket
$query = mysql_query("SELECT filename FROM prizes_media WHERE prize_id=" . $_POST["prizeId"]); // finds all filenames linked to the prize

while ($media = mysql_fetch_array($query)) {
    // Find relevant files
    while (list($cFilename, $rsFileData) = each($aS3Files)) { // reformat the bucket list into a table and read through it
        if (strpos($cFilename, $media['filename'])) {
            $s3->deleteObject($bucketName, $cFilename); // deletes all files that contain $media['filename'] in their filename
        }
    }
}

// 2. Delete DB entry
mysql_query("DELETE FROM prizes WHERE id=" . $_POST['prizeId']); // deletes the entry corresponding to the prize in the DB (deletes media table in cascade)
You may be getting false negatives on your if: strpos() returns 0 when the needle sits at the very start of the haystack, and 0 is falsy, so you should compare strictly against FALSE:
if (strpos($cFilename, $media['filename']) !== FALSE) { ...
Edit
Here is a different way to loop over the bucket, based on the structure in your comment:
foreach ($aS3Files as $filename => $filedata) {
    if (strpos($filename, $media['filename']) !== FALSE) {
        $s3->deleteObject($bucketName, $filename); // deletes all files that contain $media['filename'] in their filename
    }
}
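As a side note (my reasoning, not part of the original answer): each() advances the array's internal pointer and never rewinds it, so in the nested while loops of the question the inner list()/each() loop is exhausted after the first database row, which would also explain why only the first prize's files are deleted. foreach iterates the array from the start on every pass, and each() is deprecated as of PHP 7.2 anyway.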