Updating a file and reading it concurrently - PHP

I want to regularly update (rewrite, not append) a txt file from PHP using file_put_contents. Another PHP API reads this file and prints the content for the user.
Is it possible that when the user reads the file via the PHP API, it comes back empty? Because when the first PHP script updates the file, it erases the data and then writes the new content. If this is possible, how can I avoid it?

You can prevent the reader from ever seeing an empty source file with the following approach:
Keep the text file you are writing in a tmp folder, e.g. tmp_txt, created alongside the location of your current text file, so the new file is produced there first.
Create a shell script and keep it under the tmp folder or any other folder.
In the shell script, add a command that checks the file size and moves the file once it is complete, and register the script with the cron job scheduler:
find /your project root path/tmp_txt/ -type f -size +1k -name "mytext.txt" -exec mv {} /your project root path/folder where you want it/ \;
"find" is command for search the file and next your tmp folder path"
"-type f" this will consider only the file
"-size +1" +1 mean above 1 KB
"-name "mytext.txt"" you can define your file name, if dynamic names then -name "*.txt"
"-exec mv {}" this will move the file on path that next to it, if match the file size with above condition which is 1KB you can change that as per your need
Example cron job entry that runs the script every minute:
* * * * * bash /your project root path/tmp_txt/shellscriptfilename >> /dev/null 2>&1
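The same idea of writing the file somewhere else first and then moving the finished file into place can also be done entirely from PHP, without a cron job. A minimal sketch, assuming both paths are on the same filesystem (the paths below are placeholders):

<?php
// Write the new content to a temporary file next to the target, then move it
// into place. rename() on the same filesystem replaces the target in one step,
// so a concurrent reader sees either the old content or the new content,
// never an empty or partially written file.
$target     = '/path/to/mytext.txt';   // placeholder target path
$tmp        = $target . '.tmp';        // temporary file in the same directory
$newContent = "fresh data\n";          // whatever the script produces

file_put_contents($tmp, $newContent);
rename($tmp, $target);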

Related

Move millions of files to root of S3

I have 20 million local files. Each file is represented by a numeric ID that is hashed.
File 1 is named 356a192b7913b04c54574d18c28d46e6395428ab (sha1 of "1")
File 2 is named da4b9237bacccdf19c0760cab7aec4a8359010b0 (sha1 of "2")
etc. etc.
Not every number represents a file, but I have a list of all the numbers that do.
The files are placed in folders named after the first two characters in the hash, followed by the next two, followed by the next two.
For file 2 (da4b9237bacccdf19c0760cab7aec4a8359010b0) the folder structure is
da/4b/92/
The file is placed in that folder and named with its full hash, so the full path of the file is
da/4b/92/da4b9237bacccdf19c0760cab7aec4a8359010b0
I now want to move all the files from the file system to a bucket at Amazon S3, and while doing so I want to move them out to the root of that bucket.
As there are so many files, it would be good if there was a way to log which files have been moved and which might have failed for some reason; I need to be able to resume the operation if it fails.
My plan is to create a table in MySQL called moved_files and then run a PHP script that fetches X IDs from the files table and uses the AWS SDK for PHP to copy each file to S3; if the copy succeeds, it adds that ID to the moved_files table. However, I'm not sure this is the fastest way to do it; maybe I should look into writing a bash script that uses the AWS CLI.
Any suggestions would be appreciated!
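For reference, a rough PHP sketch of the plan described above (assuming the AWS SDK for PHP's S3Client and a PDO connection; bucket, region, DSN, table and column names are placeholders, and AWS credentials are taken from the environment):

<?php
// Copy a batch of files to S3 and record each successful ID in a moved_files
// table so the run can be resumed after a failure.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3  = new S3Client(['region' => 'eu-west-1', 'version' => 'latest']);   // placeholder region
$pdo = new PDO('mysql:host=localhost;dbname=files', 'user', 'pass');     // placeholder DSN

// Fetch a batch of IDs that have not been moved yet.
$ids = $pdo->query(
    'SELECT id FROM files WHERE id NOT IN (SELECT id FROM moved_files) LIMIT 1000'
)->fetchAll(PDO::FETCH_COLUMN);

$done = $pdo->prepare('INSERT INTO moved_files (id) VALUES (?)');

foreach ($ids as $id) {
    $hash = sha1($id);                               // file name is the sha1 of the ID
    $path = sprintf('%s/%s/%s/%s',                   // aa/bb/cc/<hash> layout on disk
        substr($hash, 0, 2), substr($hash, 2, 2), substr($hash, 4, 2), $hash);

    try {
        $s3->putObject([
            'Bucket'     => 'mybucket',              // placeholder bucket
            'Key'        => $hash,                   // flat key at the bucket root
            'SourceFile' => $path,
        ]);
        $done->execute([$id]);                       // log success so the run can resume
    } catch (Exception $e) {
        error_log("Failed to copy $path: " . $e->getMessage());
    }
}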
I do NOT use AWS S3, but a little Googling suggests you need a command like:
aws s3 cp test.txt s3://mybucket/test2.txt
So, if you want to run that for all your files, I would suggest you use GNU Parallel to keep your connection fully utilised and to reduce latencies.
Please make a test directory with a few tens of files, then cd to that directory and try this command:
find . -type f -print0 | parallel -0 --dry-run aws s3 cp {} s3://rootbucket/{/}
Sample Output
aws s3 cp ./da/4b/92/da4b9237bacccdf19c0760cab7aec4a8359010b0 s3://rootbucket/da4b9237bacccdf19c0760cab7aec4a8359010b0
aws s3 cp ./da/4b/92/da4b9237bacccdf19c0760cab7aec4a8359010b1 s3://rootbucket/da4b9237bacccdf19c0760cab7aec4a8359010b1
If you have 8 CPU cores, that will run 8 parallel copies of aws at a time until all your files are copied.
The {} expands to mean "the current file", and {/} expands to mean "the current file without its directory". You can also add --bar to get a progress bar.
If that looks hopeful, we can add a little bash function for each file that updates your database, or deletes the local file, conditionally upon the success of the aws command. That looks like this - start reading at the bottom ;-)
#!/bin/bash

# bash function to upload a single file
upload() {
    local="$1"    # local path of the file to upload
    remote="$2"   # key to use at the root of the bucket
    # "echo" makes this a dry run that only prints the command;
    # remove it to actually upload to AWS
    echo aws s3 cp "$local" "s3://rootbucket/$remote"
    if [ $? -eq 0 ] ; then
        :   # Delete locally or update database with success
    else
        :   # Log error somewhere
    fi
}

# Export the upload function so processes started by GNU Parallel can find it
export -f upload

# Run GNU Parallel on all files
find . -type f -print0 | parallel -0 upload {} {/}

PHP symlink a huge files list

I have a huge list of files (more than 48k files with paths) and I want to create a symlink for each of them.
Here is my PHP code:
$files = explode("\n", file_get_contents("files.txt"));
foreach ($files as $file) {
    $file = trim($file);
    @symlink($file, "/home/" . $file . "#" . rand(1, 80000) . ".txt");
}
The problem is that the process takes more than 1 hour.
I thought about checking whether the file exists first and then creating the symlink, so I did some research on php.net and found functions like is_link() and readlink() that seemed to do what I wanted in the first place, but a comment caught my attention:
It is necessary to be notified that readlink only works for a real link file; if this link file points to a directory, you cannot detect and read its contents, the is_link() function will always return false and readlink() will display a warning if you try to do so.
So I made this new code:
$files = explode("\n", file_get_contents("files.txt"));
foreach ($files as $file) {
    $file = trim($file);
    if (!empty(readlink($file))) {
        @symlink($file, "/home/" . $file . "#" . rand(1, 80000) . ".txt");
    }
}
The problem now: no symlink files are created at all!
How can I prevent these problems? Should I use multi-threading, or is there another option?
Obviously you are running a Linux-based operating system and your question relates to the file system.
In this case I would recommend creating a bash script that reads file.txt and creates the symlinks for all of the files.
Good starting points:
How to symlink a file in Linux?
Linux/UNIX: Bash Read a File Line By Line
Random number from a range in a Bash Script
So you may try something like this:
#!/bin/bash
while read name
do
    # Do what you want to $name
    ln -s "/full/path/to/the/file/$name" "/path/to/symlink/$(shuf -i 1-80000 -n 1)${name}.txt"
done < file.txt
EDIT:
One line solution:
while read name; do ln -s "/full/path/to/the/file/$name" "/path/to/symlink/$(shuf -i 1-80000 -n 1)${name}.txt"; done < file.txt
Note: Replace "file.txt" with the full path to the file, and test it on a small number of files first in case anything goes wrong.
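As an aside on the readlink() attempt in the question: readlink() and is_link() only apply to paths that are already symlinks, so calling readlink() on the ordinary source files returns false for every one of them, which is why no links were created. If the intent was simply to skip missing files, a file_exists() guard on the original PHP loop would do it; a minimal sketch (paths and naming as in the question):

<?php
$files = explode("\n", file_get_contents("files.txt"));
foreach ($files as $file) {
    $file = trim($file);
    if ($file !== '' && file_exists($file)) {   // skip blank lines and missing source files
        @symlink($file, "/home/" . $file . "#" . rand(1, 80000) . ".txt");
    }
}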

reading a file created from a cron job

I have a small program that runs and generates a new text dump every 30sec-1min via cron.
program > dump.txt
However, I have another PHP web program that accesses the text dump in read-only mode whenever someone visits the webpage. The problem is that if someone accesses the website at the very second the cron job is running, the webpage may read only half the file, because I believe Linux does not lock the file when > is used.
I was thinking of doing:
echo "###START###" > dump.txt
program >> dump.txt
echo "###END###" >> dump.txt
Then, when the PHP webpage reads the dump into memory, I could use a regex to check whether both the start and end flags are present, and if not, try again until it reads the file with both flags.
Will this ensure the file's integrity? If not, how can I ensure that dump.txt is intact when I read it?
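A minimal PHP sketch of that marker check on the reading side (the file name and retry limit are assumptions):

<?php
$content = '';
for ($attempt = 0; $attempt < 5; $attempt++) {                    // retry a few times
    $content = file_get_contents('dump.txt');
    if (preg_match('/^###START###.*###END###\s*$/s', $content)) {
        break;                                                    // both markers present
    }
    usleep(100000);                                               // wait 100 ms before retrying
}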
Rather than create the file in the directory the web page reads from, why not create it somewhere else? Then, after the data has been written, move it into the webroot, overwriting the previous set. On the same filesystem, mv is a single rename operation, so a reader always sees either the complete old file or the complete new one, never a partially written one.
Example:
sh create_some_data.sh > /home/cronuser/my_data.html
mv /home/cronuser/my_data.html /var/www/

tar gz file extract exclude folder "data"

I have a little problem: I have a large 41 GB file on a server and I need to extract it.
How would I go about it? The file is in tar.gz format; extraction takes 24 hours on a GoDaddy server and then stops for some reason.
I need to exclude a folder named data, which contains the bulk of the data (40.9 GB); the rest is just PHP.
home/xxx/public_html/xxx.com.au/data << this is the folder I don't need
I have been searching Google and other sites for days, but nothing works.
shell_exec('tar xvf xxx_backup_20140921.tar.gz'); is the command I use. I have even used the 'k' flag to skip existing files, and it doesn't work.
I have also tried the --exclude option, but nothing.
Try this:
shell_exec("tar xzvf xxx_backup_20140921.tar.gz --exclude='home/xxx/public_html/xxx.com.au/data'");
This should prevent the path listed (relative to the root of the archive) from being extracted.

Call PHP File via BATCH (Cronjob)

I have a PHP file on my server which should be executed to import a CSV file. How can I do that via batch, just like a cron job does?
Thanks
You can add the PHP file to the crontab directly; for this, the file must have 755 permissions.
To add it:
php -f /absolute/file/path
Or you can create a .sh file and call the PHP file from it; for this, the .sh file must also have 755 permissions.
#!/bin/bash
php -f /absolute/file/path
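For example, a crontab entry that runs the import every night at 2 a.m. could look like the following (the schedule and paths are placeholders):
0 2 * * * /usr/bin/php -f /absolute/file/path >> /dev/null 2>&1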
