PHP generating same random number at same time (in seconds) - php

I'm using a random number function in a PHP script while uploading files, because I want to avoid overwriting files that have the same name. The following is the portion of the script used when uploading the file.
$filename = rand(0,100000).strtolower($_FILES['file']['name']);
$dir="/file/upload/directory/".$filename;
move_uploaded_file($_FILES["user_file"]["tmp_name"], $dir);
This application is expected to have a large number of concurrent users, so QA is testing it with automated tools that simulate heavy concurrent traffic. Under that load, the random number appears to generate the same value within the same second.
We then tested the random number generation separately, and the same value being produced at the same time was clearly reproducible.
While searching the web, some posts suggest mt_rand(), but it still produces the same value at the millisecond level.
Is there any way of generating a random number in a time-independent way in PHP?

Random numbers are seeded from the time, but for this particular issue we only need a few lines of code. If we check whether the file already exists and append an incrementing number to the file name, the problem is fixed. The code can look like the following.
$filename = strtolower($_FILES['file']['name']);
$dir = "/file/upload/directory/";
$i = 1;
while (is_file($dir . $i . $filename)) {
    $i++;
}
move_uploaded_file($_FILES['file']['tmp_name'], $dir . $i . $filename);
Even though the loop is inefficient, it will make sure a file is never overwritten.
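If a random component is still preferred, a name based on a cryptographically secure source avoids the time-seeded behaviour of rand()/mt_rand(). A minimal sketch, assuming PHP 7+ for random_bytes() and the 'file' upload key from the question's first snippet:
// 16 hex chars from a CSPRNG rather than the clock, so concurrent
// uploads in the same second still get different names.
$original = strtolower(basename($_FILES['file']['name']));
$filename = bin2hex(random_bytes(8)) . '_' . $original;
$dir = "/file/upload/directory/" . $filename;
move_uploaded_file($_FILES['file']['tmp_name'], $dir);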

Related

BigQuery PHP API - large query result memory bloat - even with paging

I am running a range of queries in BigQuery and exporting them to CSV via PHP. There are reasons why this is the easiest method for me to do this (multiple queries dependent on variables within an app).
I am struggling with memory issues when the result set is larger than 100mb. The memory usage of my code seems to grow in line with the result set, which I thought paging would avoid. Here is my code:
$query = $bq->query($myQuery);
$queryResults = $bq->runQuery($query,['maxResults'=>5000]);
$FH = fopen($storagepath, 'w');
$rows = $queryResults->rows();
foreach ($rows as $row) {
    fputcsv($FH, $row);
}
fclose($FH);
The $queryResults->rows() function returns a Google Iterator which uses paging to scroll through the results, so I do not understand why memory usage grows as the script runs.
Am I missing a way to discard previous pages from memory as I page through the results?
UPDATE
I have noticed that actually since upgrading to the v1.4.3 BigQuery PHP API, the memory usage does cap out at 120mb for this process, even when the result set reaches far beyond this (currently processing a 1gb result set). But still, 120mb seems too much. How can I identify and fix where this memory is being used?
UPDATE 2
This 120mb seems to work out to 24kb per row of maxResults in the page. E.g. adding 1000 rows to maxResults adds 24mb of memory. So my question is now: why is 1 row of data using 24kb in the Google Iterator? Is there a way to reduce this? The data itself is < 1kb per row.
Answering my own question
The extra memory is used by a load of PHP type mapping and other data structure info that comes alongside the data from BigQuery. Unfortunately I couldn't find a way to reduce the memory usage below around 24kb per row multiplied by the page size. If someone finds a way to reduce the bloat that comes along with the data please post below.
However thanks to one of the comments I realized you can extract a query directly to CSV in a Google Cloud Storage Bucket. This is really easy:
$query = $bq->query($myQuery);
$queryResults = $bq->runQuery($query);
$qJobInfo = $queryResults->job()->info();
$dataset = $bq->dataset($qJobInfo['configuration']['query']['destinationTable']['datasetId']);
$table = $dataset->table($qJobInfo['configuration']['query']['destinationTable']['tableId']);
$extractJob = $table->extract('gs://mybucket/'.$filename.'.csv');
$table->runJob($extractJob);
However this still didn't solve my issue as my result set was over 1gb, so I had to make use of the data sharding function by adding a wildcard.
$extractJob = $table->extract('gs://mybucket/'.$filename.'*.csv');
This created ~100 shards in the bucket. These need to be recomposed using gsutil compose <shard filenames> <final filename>. However, gsutil only lets you compose 32 files at a time. Given I will have variable numbers of shards, often above 32, I had to write some code to clean them up.
//Save above job as variable
$eJob = $table->runJob($extractJob);
$eJobInfo = $eJob->info();
//This bit of info from the job tells you how many shards were created
$eJobFiles = $eJobInfo['statistics']['extract']['destinationUriFileCounts'][0];
$composedFiles = 0; $composeLength = 0; $subfile = 0; $fileString = "";
while (($composedFiles < $eJobFiles) && ($eJobFiles > 1)) {
    while (($composeLength < 32) && ($composedFiles < $eJobFiles)) {
        // gsutil creates shards with a 12-digit number after the filename, so build a string of up to 32 such filenames at a time
        $fileString .= "gs://bucket/$filename" . str_pad($composedFiles, 12, "0", STR_PAD_LEFT) . ".csv ";
        $composedFiles++;
        $composeLength++;
    }
    $composeLength = 0;
    // Compose a batch of up to 32 shards into a subfile
    system("gsutil compose $fileString gs://bucket/" . $filename . "-" . $subfile . ".csv");
    $subfile++;
    $fileString = "";
}
if ($eJobFiles > 1) {
    // Compose all the subfiles into the final file
    system('gsutil compose gs://bucket/' . $filename . '-*.csv gs://bucket/' . $filename . '.csv');
}
Note: in order to give my Apache user access to gsutil, I had to allow the user to create a .config directory in the web root. Ideally you would use the Cloud Storage PHP client library instead of shelling out to gsutil, but I didn't want the code bloat.
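For reference, a minimal sketch of doing the recomposition with the google/cloud-storage PHP client instead of gsutil; the bucket name and shard naming are assumptions carried over from the example above, and compose() has the same 32-source limit, so the batching is still needed:
use Google\Cloud\Storage\StorageClient;

$storage = new StorageClient();
$bucket  = $storage->bucket('mybucket');

// Build the list of shard object names (same 12-digit suffix as above)
$shards = [];
for ($i = 0; $i < $eJobFiles; $i++) {
    $shards[] = $filename . str_pad($i, 12, '0', STR_PAD_LEFT) . '.csv';
}

// compose() accepts at most 32 sources per call, so merge in batches of 32
foreach (array_chunk($shards, 32) as $batch => $chunk) {
    $bucket->compose($chunk, $filename . '-' . $batch . '.csv');
}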
If anyone has a better answer please post it
Is there a way to get smaller output from the BigQuery library than 24kb per row?
Is there a more efficient way to clean up variable numbers of shards?

Upload image evenly into a directory structure

I'm sure this question has been asked thousands of times, so here goes my version...
I have a form that uploads images...
Every image gets a unique id ("unid"). I use the following function to generate it:
function generateUnid($key) {
    $name = $_FILES[$key]['name']; // get image name from superglobal $_FILES
    $ext = pathinfo($name, PATHINFO_EXTENSION); // get image extension
    $prefix = 'fc'; // prefix for unid
    do {
        $unid = uniqid($prefix, true); // generate a unid
        $filename = $unid . '.' . $ext; // replace image name with unid
        $path = PATH_UPLOAD_ARTWORK . $filename; // image path
    } while (file_exists($path)); // repeat if the name already exists
    return $filename;
}
A sample of return values is:
fc4e7801523a04e6.06876802.jpg
So far so good. Now I want to create some sort of directory structure for my images, something like:
0/
    0/
    1/
    2/
        fc4e7801523a04e6.06876802.jpg
        ...
    3/
    ...
1/
    0/
    1/
    2/
    3/
    ...
2/
    0/
    1/
    ...
I could probably take the last 2 digits of my unique id to file the image in the correct directory, but I'm not too sure that is the correct strategy...
How can I make sure that the images are distributed evenly across the folders? I don't want to find myself with one folder that contains 12,000 images and another with 1,500...
Am I doing it the correct way by extracting the last 2 digits of my unique id? Are there better ways of filing the images evenly?
Thanks
Assuming the unique id is uniformly (pseudo)random, which I think it is, this strategy will work pretty well. There will inevitably be a few folders with somewhat more or fewer files than the average, as the statistics of random binning predict.
A slightly better technique for "binning" the images is to use the modulo (%) of many digits from the uid, rather than using the last two digits, in case the digits you have picked have some kind of pattern.
My advice would be to give it a go and see how it works for you. Ideally, you could create a "test harness" which calls the algorithm hundreds of thousands of times, after which you could assess whether the distribution of files in the directory structure is appropriate for your purposes.
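For example, a minimal sketch of binning by a hash of the whole unique id rather than its last digits; generateUnid() and PATH_UPLOAD_ARTWORK are the names from the question, and the 10x10 layout is an assumption:
$filename = generateUnid('image');                 // e.g. fc4e7801523a04e6.06876802.jpg
$bucket   = sprintf('%u', crc32($filename)) % 100; // 100 bins, roughly uniform
$dir      = PATH_UPLOAD_ARTWORK . (int)($bucket / 10) . '/' . ($bucket % 10) . '/';

if (!is_dir($dir)) {
    mkdir($dir, 0755, true);                       // create both levels on demand
}
move_uploaded_file($_FILES['image']['tmp_name'], $dir . $filename);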

Create Unique Image Names

What's a good way to create a unique name for an image that my user is uploading?
I don't want to have any duplicates so something like MD5($filename) isn't suitable.
Any ideas?
As was mentioned, I think the best way to create a unique file name is to simply prepend time(). That would look like:
$image_name = time()."_".$image_name;
Grab the file extension from uploaded file:
$ext = pathinfo($uploaded_filename, PATHINFO_EXTENSION);
Grab the time to the second: time()
Grab some randomness: md5(microtime())
Convert time to base 36: base_convert (time(), 10, 36) - base 36 compresses a 10 byte string down to about 6 bytes to allow for more of the random string to be used
Send the whole lot out as a 16 char string:
$unique_id = substr( base_convert( time(), 10, 36 ) . md5( microtime() ), 0, 16 ) . '.' . $ext;
I doubt that will ever collide - you could even not truncate it if you don't mind very long file names.
If you actually need a filename (it's not entirely clear from your question) I would use tempnam(), which:
Creates a file with a unique filename, with access permission set to 0600, in the specified directory.
...and let PHP do the heavy lifting of working out uniqueness. Note that as well as returning the filename, tempnam() actually creates the file; you can just overwrite it when you drop the image file there.
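A minimal sketch, assuming a writable /uploads directory and the usual $_FILES handling:
$target = tempnam('/uploads', 'img_');                     // creates an empty, uniquely named file
move_uploaded_file($_FILES['image']['tmp_name'], $target); // overwrite it with the uploaded image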
You could take a hash (e.g., md5, sha) of the image data itself. That would help identify duplicate images too (if it was byte-for-byte, the same). But any sufficiently long string of random characters would work.
You can always rig it up in a way that the file name looks like:
/image/0/1/012345678/original-name.jpg
That way the file name looks normal, but it's still unique.
I'd recommend sha1_file() over md5_file(). It's less prone to collisions.
You could also use hash_file('sha256', $filePath) to get even better results.
http://php.net/manual/en/function.uniqid.php maybe?
You can prefix it with the user id to avoid collisions between 2 users uploading within the same microsecond.
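For instance (a sketch; $userId and $ext are assumed to come from your own session and upload handling):
$image_name = uniqid($userId . '_', true) . '.' . $ext;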
For short names:
$i = 0;
while (file_exists($name . '_' . $i)) {
    $i++;
}
WARNING: this might fail on a multi-threaded server if two users upload an image with the same name at the same time.
In that case you should include the md5 of the username.
lol, there are around 3.4 × 10^38 (2^128) possibilities that md5 can produce.
Plus, just to be on the safe side, you could use:
$newfilename = md5(time() . 'image');
if (file_exists('./images/' . $newfilename)) {
    $newfilename = md5(time() . $newfilename);
}
//uploadimage
How big is the probability of two users uploading an image with the same name in the same microsecond?
try
$currTime = microtime(true);
$finalFileName = cleanTheInput($fileName) . "_" . $currTime;
// you can also append "_" . rand(0,1000) at the end to make name collisions even less likely
function cleanTheInput($input)
{
    // do some formatting here, e.g. keep only letters, digits, dashes and underscores
    return preg_replace('/[^A-Za-z0-9_-]/', '', pathinfo($input, PATHINFO_FILENAME));
}
This would also help you track the upload time of each file for analysis, or to sort and manage the files.
For good performance and uniqueness you can use an approach like this:
files are stored on the server with names like md5_file($file).jpg
the directory to store the file in is derived from the md5 file name, by taking the first two chars (first level) and the next two chars (second level), like this (see the sketch after the pros and cons below):
uploaded_files/30/c5/30c567139b64ee14c80cc5f5006d8081.pdf
create a record in the database with file_id, original file name, uploading user id, and the path to the file on the server
on the server side, create a script that takes the role of serving downloads: it fetches the file by id from the db and outputs its content with the original filename supplied by the user (see the PHP example in the CodeIgniter download_helper). So the url to a file will look like:
http://site.com/download.php?file=id
Pros:
minimal collision risk
good file-lookup performance (not too many files in one directory, not too many directories at the same level)
original file names are preserved
you can control access to files in the server-side script (check session or cookies)
Cons:
only suitable for small file sizes, because before the user can download a file the server has to read it into memory
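A minimal sketch of the storage layout described above; $tmpPath, $uploadRoot and the upload key are assumptions, and the database insert is omitted:
$hash = md5_file($tmpPath); // name derived from the file contents
$ext  = strtolower(pathinfo($_FILES['file']['name'], PATHINFO_EXTENSION));

$dir = $uploadRoot . '/' . substr($hash, 0, 2) . '/' . substr($hash, 2, 2);
if (!is_dir($dir)) {
    mkdir($dir, 0755, true); // first and second level come from the hash
}
move_uploaded_file($tmpPath, $dir . '/' . $hash . '.' . $ext);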
try this file format:
$filename = microtime(true) . $username . '.jpg';
I think it would be good for you.
<?php
$name=uniqid(mt_rand()).$image_name;
?>
You should try to meet two goals: Uniqueness, and usefulness.
Using a GUID guarantees uniqueness, but one day the files may become detached from their original source, and then you will be in trouble.
My typical solution is to embed crucial information into the filename, such as the userID (if it belongs to a user) or the date and time uploaded (if this is significant), or the filename used when uploading it.
This may really save your skin one day, when the information embedded in the filename allows you to, for example, recover from a bug, or the accidental deletion of records. If all you have is GUIDs, and you lose the catalogue, you will have a heck of a job cleaning that up.
For example, if a file "My Holiday: Florida 23.jpg" is uploaded, by userID 98765, on 2013/04/04 at 12:51:23 I would name it something like this, adding a random string ad8a7dsf9:
20130404125123-ad8a7dsf9-98765-my-holiday-florida-23.jpg
Uniqueness is ensured by the date and time plus the random string (provided it is properly random, e.g. from /dev/urandom or CryptGenRandom).
If the file is ever detached, you can identify the user, the date and time, and the title.
Everything is folded to lower case and anything non-alphanumeric is removed and replaced by dashes, which makes the filename easy to handle using simple tools (e.g. no spaces which can confuse badly written scripts, no colons or other characters which are forbidden on some filesystems, and so on).
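A minimal sketch of building such a name; $userId and the upload key are assumptions, and random_bytes() needs PHP 7+:
$original = pathinfo($_FILES['file']['name'], PATHINFO_FILENAME);
$ext      = strtolower(pathinfo($_FILES['file']['name'], PATHINFO_EXTENSION));

$slug     = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($original)), '-'); // lower case, dashes for everything else
$filename = date('YmdHis') . '-' . bin2hex(random_bytes(5)) . '-' . $userId . '-' . $slug . '.' . $ext;
// e.g. 20130404125123-ad8a7dsf9f-98765-my-holiday-florida-23.jpg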
Something like this could work for you:
while (file_exists('/uploads/' . $filename . '.jpeg')) {
    $filename .= rand(10, 99);
}
Ready-to-use code:
$file_ext = substr($file['name'], -4); // e.g.'.jpg', '.gif', '.png', 'jpeg' (note the missing leading point in 'jpeg')
$new_name = sha1($file['name'] . uniqid('',true)); // this will generate a 40-character-long random name
$new_name .= ((substr($file_ext, 0, 1) != '.') ? ".{$file_ext}" : $file_ext); //the original extension is appended (accounting for the point, see comment above)

How to write to file in large php application(multiple questions)

What is the best way to write to files in a large PHP application? Let's say there are lots of writes needed per second. What is the best way to go about this?
Could I just open the file and append the data, or should I open, lock, write and unlock?
If the file is being worked on and other data needs to be written, will that write be lost or will it be saved? And if it is saved, will it halt the application?
If you have been, thank you for reading!
Here's a simple example that highlights the danger of simultaneous writes:
<?php
for ($i = 0; $i < 100; $i++) {
    $pid = pcntl_fork();
    // only spawn more children if we're not a child ourselves
    if (!$pid)
        break;
}
$fh = fopen('test.txt', 'a');
// The following is a simple attempt to get multiple processes to start at the same time.
$until = round(ceil(time() / 10.0) * 10);
echo "Sleeping until $until\n";
time_sleep_until($until);
$myPid = posix_getpid();
// create a line starting with pid, followed by 10,000 copies of
// a "random" char based on pid.
$line = $myPid . str_repeat(chr(ord('A') + $myPid % 25), 10000) . "\n";
for ($i = 0; $i < 1; $i++) {
    fwrite($fh, $line);
}
fclose($fh);
echo "done\n";
If appends were safe, you should get a file with 100 lines, all of which are roughly 10,000 chars long and begin with an integer. Sometimes, when you run this script, that's exactly what you'll get. Sometimes, however, a few appends will conflict and lines will get mangled.
You can find corrupted lines with grep '^[^0-9]' test.txt
This is because file append is only atomic if:
You make a single fwrite() call
and that fwrite() is smaller than PIPE_BUF (somewhere around 1-4k)
and you write to a fully POSIX-compliant filesystem
If you make more than a single call to fwrite during your log append, or you write more than about 4k, all bets are off.
Now, as to whether or not this matters: are you okay with having a few corrupt lines in your log under heavy load? Honestly, most of the time this is perfectly acceptable, and you can avoid the overhead of file locking.
I do have a high-performance, multi-threaded application where all threads write (append) to a single log file. So far I have not noticed any problems with that: each thread writes multiple times per second and nothing gets lost. I think just appending to a huge file should be no issue. But if you want to modify already existing content, especially with concurrency, I would go with locking, otherwise a big mess can happen...
If concurrency is an issue, you should really be using databases.
If you're just writing logs, maybe you should take a look at the syslog() function, since syslog provides an API for exactly this.
You could also delegate writes to a dedicated backend and do the job in an asynchronous manner.
These are my 2p.
Unless a single file is needed for a specific reason, I would avoid appending everything to one huge file. Instead, I would rotate the file by time and size; a couple of configuration parameters (wrap_time and wrap_size) could be defined for this.
Also, I would probably introduce some buffering to avoid waiting for the write operation to complete.
PHP is probably not the best-suited language for this kind of operation, but it is still possible.
Use flock()
See this question
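A minimal sketch of a locked append (the log path and $line are assumptions):
$fh = fopen('/var/log/myapp/app.log', 'a');
if ($fh !== false && flock($fh, LOCK_EX)) { // exclusive lock: concurrent writers wait their turn
    fwrite($fh, $line . "\n");
    fflush($fh);                            // flush before releasing the lock
    flock($fh, LOCK_UN);
    fclose($fh);
}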
If you just need to append data, PHP should be fine with that, as the filesystem should take care of simultaneous appends.

Generate unique names?

I am working on a PHP site where users upload images. I have to rename each file to prevent conflicts between image names.
uniqid(rand(), true);
I am adding a large random number after it. Will this work reliably? Any suggestions?
It's about generating unique names for the images.
Function tempnam() creates a file with a unique name.
Take an md5 of the file and use that. IIRC, the odds of a collision are 1 in 64M. If that's not enough, prefix it with the timestamp expressed in seconds or milliseconds. That way even if a duplicate md5 is generated, the files would have to come in during the same second/millisecond for a collision.
You can use Base36 on the AutoIncrement value from a SQL Table (hoping that you do use a SQL table).
$filename = base_convert($last_insert_id, 10, 36);
You have two approaches, depending on how big your image library can get:
1. For a modest number of files I do this:
<?php
$file = sanitize_file($file); // remove all non [a-z0-9_] characters for safe url linking
$file_md5 = md5_file($file); // hash of the file contents
$file_extension = pathinfo($file, PATHINFO_EXTENSION); // keep the original extension
// since I assume the file belongs to someone, you can do this
$file_name = $user_id . $file_md5 . '.' . $file_extension;
// then save the file
?>
2. The other option: CacheMogul. Here you need to use your imagination, but for a huge number of files this does nice sharding, so you don't need to worry about a folder's maximum quantity or size.
