tar gz file extract exclude folder "data" - php

I have a little problem: I have a large 41 GB file on a server and I need to extract it.
How would I go about it? The file is in tar.gz format, extraction takes over 24 hours on a GoDaddy server, and then it stops for some reason.
I need to exclude a folder named data; it contains the bulk of the data (40.9 GB), the rest is just PHP.
home/xxx/public_html/xxx.com.au/data << this is the folder I don't need
I have been searching Google and other sites for days but nothing works.
shell_exec('tar xvf xxx_backup_20140921.tar.gz'); is the command I use. I have even used the 'k' option to skip existing files and it doesn't work.
I have also used the --exclude option, but nothing.

Try this:
shell_exec("tar xzvf xxx_backup_20140921.tar.gz --exclude='home/xxx/public_html/xxx.com.au/data'");
This should prevent the path listed (relative to the root of the archive) from being extracted.
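If the extraction dies silently, it can also help to run tar through exec() instead of shell_exec() so you can inspect the exit status. A minimal sketch, reusing the archive name and excluded path from the question:
<?php
// Rough sketch, not a guaranteed fix: run tar via exec() so the exit status and
// its output are available, and exclude the big data folder from extraction.
$archive = 'xxx_backup_20140921.tar.gz';
$exclude = 'home/xxx/public_html/xxx.com.au/data';

$cmd = sprintf(
    'tar xzf %s --exclude=%s 2>&1',
    escapeshellarg($archive),
    escapeshellarg($exclude)
);

set_time_limit(0);                  // lift PHP's own time limit (the host may still enforce one)
exec($cmd, $output, $exitCode);     // $output collects tar's messages, $exitCode is tar's return status

if ($exitCode !== 0) {
    echo "tar failed with code $exitCode:\n" . implode("\n", $output);
}
On shared hosting the web server can still kill a long request regardless of set_time_limit(), so running the same tar command from cron or an SSH session is usually the safer route for a 41 GB archive.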

Related

PHP symlink a huge files list

I have a huge list of files (more than 48k files with their paths) and I want to create a symlink for each of them.
Here is my PHP code:
$files = explode("\n", file_get_contents("files.txt")); // read the list's contents, not the literal string "files.txt"
foreach ($files as $file) {
    $file = trim($file);
    @symlink($file, "/home/" . $file . "#" . rand(1, 80000) . ".txt"); // @ suppresses the warning if a link cannot be created
}
The problem is that the process takes more than 1 hour.
I thought about checking whether the link exists first and then creating the symlink, so I did some research on php.net. There are functions like is_link() and readlink() for what I wanted in the first place, but a comment caught my attention:
It is necessary to be aware that readlink only works on a real link file; if the link points to a directory, you cannot detect or read its contents: is_link() will always return false and readlink() will display a warning if you try.
So I made this new code:
$files = explode("\n", file_get_contents("files.txt"));
foreach ($files as $file) {
    $file = trim($file);
    if (!empty(readlink($file))) {
        @symlink($file, "/home/" . $file . "#" . rand(1, 80000) . ".txt");
    }
}
The problem now: there are no symlink files at all!
How can I avoid these problems? Should I use multi-threading, or is there another option?
Obviously you are running a Linux-based operating system and your question is related to the file system.
In this case I would recommend creating a bash script that reads file.txt and creates the symlinks for all of the entries.
Good starting points for this are:
How to symlink a file in Linux?
Linux/UNIX: Bash Read a File Line By Line
Random number from a range in a Bash Script
So you may try something like this:
#!/bin/bash
while read name
do
    # Do what you want to $name
    ln -s "/full/path/to/the/file/$name" "/path/to/symlink/$(shuf -i 1-80000 -n 1)$name.txt"
done < file.txt
EDIT:
One line solution:
while read name; do ln -s "/full/path/to/the/file/$name" "/path/to/symlink/$(shuf -i 1-80000 -n 1)$name.txt"; done < file.txt
Note: replace "file.txt" with the full path to the file, and test it on a small number of files first in case anything goes wrong.
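If you would rather stay in PHP, the same line-by-line idea works there too and avoids loading the whole 48k-entry list into memory at once. A rough sketch, reusing the layout from the question:
<?php
// Rough sketch: read files.txt line by line instead of loading it all at once.
// The target path follows the pattern used in the question.
$handle = fopen("files.txt", "r");
if ($handle === false) {
    die("Cannot open files.txt\n");
}

while (($file = fgets($handle)) !== false) {
    $file = trim($file);
    if ($file === "" || !file_exists($file)) {
        continue;                        // skip blank lines and entries that point nowhere
    }
    $target = "/home/" . $file . "#" . rand(1, 80000) . ".txt";
    if (!@symlink($file, $target)) {     // @ keeps one bad entry from flooding the output with warnings
        error_log("Could not link $file");
    }
}

fclose($handle);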

how to avoid duplicate filename in wordpress

As we all know, WordPress stores your uploaded files (for me it's just JPG files) in a folder named "uploads" under "wp-content". Files are separated into folders based on year and month.
Now I want to copy every file from every folder into a single folder on another server (for certain purposes). I want to know: does WordPress rename duplicate files? Is it possible that my files will be overwritten on the new server?
If yes, how can I avoid this? Is there a way to make WordPress rename files before storing them?
You can scan your uploads folder and you have two options:
1.- Set a random name for each file
2.- Set a naming convention that includes the path in the file name, for example: my_path_my_filename.jpg
By the way, your files won't be overwritten, because they are on another server.
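A minimal sketch of option 2, with hypothetical source and destination paths: it flattens the year/month folders into the copied file's name, so two image.jpg files from different months cannot collide.
<?php
// Hypothetical paths for illustration.
$uploads = '/var/www/wp-content/uploads';
$dest    = '/var/www/flat-copy';

$iterator = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($uploads, FilesystemIterator::SKIP_DOTS)
);

foreach ($iterator as $file) {
    if (!$file->isFile()) {
        continue;
    }
    // Turn "2014/09/image.jpg" into "2014_09_image.jpg" so duplicates from other months can't clash.
    $relative = substr($file->getPathname(), strlen($uploads) + 1);
    $flatName = str_replace(DIRECTORY_SEPARATOR, '_', $relative);
    copy($file->getPathname(), $dest . '/' . $flatName);
}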
This question seems to be about export/import...
Check the exported XML (WordPress eXtended RSS file format); you can find all media URLs in the <wp:attachment_url> tags... Use any XML parser.
Example without a parser, at the terminal:
cat exportedSite.xml | grep wp:attachment_url
will list all URLs. Each parsed URL can then be downloaded with curl or wget.
If you want to restore the XML backup, change (only) the URLs in the wp:attachment_url tags to the new repo URLs.
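With a parser it is only a few lines. A sketch using SimpleXML, assuming the export is saved as exportedSite.xml and uses the usual wp namespace prefix:
<?php
// Load the WordPress eXtended RSS export and walk its items.
$xml = simplexml_load_file('exportedSite.xml');
if ($xml === false) {
    die("Could not parse exportedSite.xml\n");
}

foreach ($xml->channel->item as $item) {
    // The attachment URL lives in the "wp" namespace declared at the top of the export.
    $wp = $item->children('wp', true);
    if (isset($wp->attachment_url)) {
        echo (string) $wp->attachment_url, "\n";   // feed this list to curl or wget
    }
}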

Can back up specific files automatically from FTP to a specific folder using PHP?

I have lots of backup files on my FTP.
The file names look like: index.php.bk-2013-12-27
I want to back those files up to a folder named /backup/
so the inside of my httpdocs folder looks like this:
index.php
backup/index.php.bk.2013-12-27
Either of the following two methods would be fine for doing this:
01. If any file name contains .bk, it should be backed up automatically to the folder backup
or
02.
Create a text file named backup_move.text that contains the
paths of the files that need to be copied, and place it into the httpdocs folder.
Then the PHP script extracts those file paths from
backup_move.text and syncs the files to the folder named backup.
How can I do this with some PHP code?
Any help will be very much appreciated.
It can be done (both solutions).
But you need to say whether solution (01) needs to be recursive or not. I suppose you know PHP has a time limit to run (the standard is 60 seconds), so the files you need to back up cannot take more than about 55 seconds to copy.
You can try the following link; it will completely back up a site, database and files and put them into a zip file. It needs to be configured a little, but it can help.
http://www.starkinfotech.com/php-script-to-take-a-backup-of-your-site-and-database/
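For solution 01, a non-recursive pass over httpdocs is only a few lines. A sketch with hypothetical paths that moves anything whose name contains .bk into backup/:
<?php
// Hypothetical paths; adjust to your hosting layout.
$httpdocs = '/var/www/httpdocs';
$backup   = $httpdocs . '/backup';

if (!is_dir($backup)) {
    mkdir($backup, 0755);
}

foreach (scandir($httpdocs) as $name) {
    // Only plain files whose name contains ".bk" (e.g. index.php.bk-2013-12-27).
    if (strpos($name, '.bk') !== false && is_file($httpdocs . '/' . $name)) {
        rename($httpdocs . '/' . $name, $backup . '/' . $name);   // move, not copy, so httpdocs stays clean
    }
}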

Zip files contain same files but have different hashes?

I have created hundreds of folders and text files using PHP, and I then add them to a zip archive.
This all works fine, but if I create another zip archive using the same folders and files, the new archive will have a different hash from the first one. The same happens if I use WinRAR instead of PHP to create the archive.
It only seems to produce different hashes when I zip the files I have created through PHP, yet they open fine.
Very strange. Can anyone shed any light on this?
Thanks
Zip is not deterministic. To solve this problem (it really is a problem when you have CI and need to update an AWS Lambda, for example, and don't want to redeploy it every time but only when something has actually changed) I used this article: https://medium.com/@pat_wilson/building-deterministic-zip-files-with-built-in-commands-741275116a19
Like this:
find . -exec touch -t "$(git ls-files -z . | \
xargs -0 -n1 -I{} -- git log -1 --date=format:"%Y%m%d%H%M" --format="%ad" '{}' | \
sort -r | head -n 1)" '{}' +
zip -rq -D -X -9 -A --compression-method deflate dest.zip sources...
There is certainly some difference in the files. If the lengths are not exactly the same, the hash will be different. You can use a comparing hex editor, like Hex Workshop for example, to see exactly what the differences are.
Possibilities that come to mind:
As @orn mentioned, there may be a timestamp in the zip format you are using (not sure).
The order that the files are added to the archive may be different (depending on how you're selecting them / building the source array).
You can consider using deterministic_zip, which solves this issue. From its documentation:
There are three tricks to building a deterministic zip:
Files must be added to the zip in the same order. Directory iteration order may vary across machines, resulting in different zips. deterministic_zip sorts all files before adding them to the zip archive.
Files in the zip must have consistent timestamps. If I share a directory to another machine, the timestamps of individual files may differ, despite having identical content. To achieve timestamp consistency, deterministic_zip sets the timestamp of all added files to 2019-01-01 00:00:00.
Files in the zip must have consistent permissions. File permissions look like -rw-r--r-- for a file that is readable by all users, and only writable by the user who owns the file. Similarly executable files might have permissions that look like: -rwxr-xr-x or -rwx------. deterministic_zip sets the permission of all files added to the archive to either -r--r--r--, or -r-xr-xr-x. The latter is only used when the user running deterministic_zip has execute access on the file.
Note: deterministic_zip does not modify nor update timestamps of any files it adds to archives. The techniques used above apply only to the copies of files within archives deterministic_zip creates.
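Two of those tricks (stable ordering and fixed timestamps) can also be approximated directly from PHP with ZipArchive. This is only a rough sketch, not the deterministic_zip tool itself; the timestamp call needs PHP 8.0+ with a libzip build that provides ZipArchive::setMtimeName, hence the method_exists() guard:
<?php
// Rough sketch: sort the file list so entries are always added in the same order,
// then pin every entry's timestamp so re-zipping identical content yields identical bytes.
$sourceDir = '/path/to/generated/files';   // hypothetical
$zipPath   = '/path/to/dest.zip';          // hypothetical

$files = [];
$it = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($sourceDir, FilesystemIterator::SKIP_DOTS)
);
foreach ($it as $info) {
    if ($info->isFile()) {
        $files[] = $info->getPathname();
    }
}
sort($files);                               // trick 1: deterministic order

$zip = new ZipArchive();
$zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE);
foreach ($files as $path) {
    $entry = substr($path, strlen($sourceDir) + 1);
    $zip->addFile($path, $entry);
    if (method_exists($zip, 'setMtimeName')) {
        $zip->setMtimeName($entry, strtotime('2019-01-01 00:00:00'));   // trick 2: constant timestamp
    }
}
$zip->close();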

How to efficiently monitor a directory for changes on linux?

I am working with Magento, and there is a function that merges CSS and JavaScript into one big file.
Regardless of the pros and cons of that, there is the following problem:
The final file gets cached at multiple levels that include but are not limited to:
Amazon CloudFront
Proxy servers
Clients browser cache
Magento uses an MD5 sum of the concatenated CSS filenames to generate a new filename for the merged CSS file, so that every page that has a distinct set of CSS files gets its own merged CSS file.
To work around the caching issue, I also included the file modification timestamps in that hash, so that a new hash is generated every time a CSS file is modified.
So we keep the full advantage of caching without revalidation, but if something changes, it is visible instantly, because the resource link has changed.
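For illustration, the kind of hash generation I mean looks roughly like this (simplified, not the actual Magento code):
<?php
// Simplified illustration: hash the filenames plus their mtimes,
// so the merged file gets a new name whenever any source file changes.
function mergedCssName(array $cssFiles)
{
    $key = '';
    foreach ($cssFiles as $file) {
        $key .= $file . filemtime($file);
    }
    return md5($key) . '.css';
}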
So far so good:
The only problem is that the filenames used to generate the hash are only the ones that would normally be directly referenced in the HTML head block, and they don't include the CSS imports inside those files.
So changes in files that are imported inside CSS files don't result in a new hash.
Now, I really don't want to recursively parse out all the imports and scan them, or anything like that.
I thought instead about a directory-based solution. Is there anything to efficiently monitor the "last change inside a directory" at the file system level?
We are using ext4.
Or is there another way, maybe with the find command, that does the whole job based on inode indexes?
Something like that?
I have seen a lot of programs that instantly "see" changes without scanning whole filesystems. I believe there are also some sort of "file manipulation watch" daemons available under Linux.
The problem is that the CSS directory is pretty huge.
Can anyone point me in the right direction?
I suggest you use a PHP-independent daemon to update the modification time of your main CSS file when one of the dependent files is modified. You can use dnotify for it, something like:
dnotify -a -r -b -s /path/to/imported/css/files/ -e touch /path/to/main/css/file;
It will execute 'touch' on the main CSS file each time one of the files in the other folder is modified (-a -r -b -s = any access / recursive directory lookup / run in background / no output). Or you can do any other action and test for it from PHP.
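On the PHP side the test can then be as simple as comparing that file's mtime with the one recorded when the hash was last built; a tiny sketch with hypothetical paths:
<?php
// Hypothetical: $mainCss is the touched file, /tmp/css.mtime stores the mtime seen at the last rebuild.
$mainCss = '/path/to/main/css/file';
$lastHashedMtime = (int) @file_get_contents('/tmp/css.mtime');

clearstatcache(true, $mainCss);            // make sure we see the fresh mtime after dnotify's touch
if (filemtime($mainCss) > $lastHashedMtime) {
    // An imported CSS file changed: rebuild the merged file / hash here.
    file_put_contents('/tmp/css.mtime', (string) filemtime($mainCss));
}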
If you use the command
ls -ltr `find . -type f `
It will give you a long listing of all files with the newest at the bottom.
Try having a look at the inotify packages, which allow you to be notified each time a modification occurs in a directory.
InotifyTools
php-inotify
I've never used it, but apparently there is inotify support for PHP.
(inotify would be the most efficient way to get notifications under Linux)
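A minimal sketch of that PHP binding, assuming the PECL inotify extension is installed; note that inotify watches are not recursive, so a big CSS tree would need one watch per subdirectory:
<?php
// Requires the PECL inotify extension (pecl install inotify).
$fd = inotify_init();

// Watch a single directory for writes, new files, deletions and moves.
$watch = inotify_add_watch($fd, '/path/to/skin/css', IN_CLOSE_WRITE | IN_CREATE | IN_DELETE | IN_MOVE);

while (true) {
    $events = inotify_read($fd);           // blocks until something happens
    foreach ($events as $event) {
        // $event['name'] is the changed file; regenerate the merged CSS hash here.
        echo "Changed: {$event['name']}\n";
    }
}

inotify_rm_watch($fd, $watch);             // unreachable in this sketch, shown for completeness
fclose($fd);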
