Read content of 12000 files from another FTP server - php

What I would like to script: a PHP script that finds a certain string in loads of files.
Is it possible to read the contents of thousands of text files from another FTP server without actually downloading those files (ftp_get)?
If not, would downloading them once (skip if the file already exists, redownload if the filesize differs), then searching for the string, be the easiest option?

If URL fopen wrappers are enabled, then file_get_contents can do the trick and you do not need to save the file on your server.
<?php
$find = 'mytext'; // text to find
$files = array('http://example.com/file1.txt', 'http://example.com/file2.txt'); // source files
foreach ($files as $file) {
    $data = file_get_contents($file);
    if (strpos($data, $find) !== FALSE) {
        echo "found in $file".PHP_EOL;
    }
}
?>
[EDIT]: If the files are accessible only by FTP:
In that case, you have to use FTP URLs like this:
$files = array('ftp://user:pass@domain.com/path/to/file', 'ftp://user:pass@domain.com/path/to/file2');

If you are going to store the files after you download them, then you may be better served to just download or update all of the files, then search through them for the string.
The best approach depends on how you will use it.
If you are going to delete the files after you have searched them, you may also want to keep track of which ones you searched, along with their file date information, so that on the next run you don't waste time searching files that haven't changed since the last check.
When you are dealing with so many files, try to cache any information that will help your program to be more efficient next time it runs.
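A minimal sketch of that caching idea, assuming a local JSON file for the scan state; the host, credentials, paths and search string are placeholders, and ftp_nlist() may return relative names on some servers, so adjust the URL building accordingly:
<?php
// Sketch only: cache each remote file's size so unchanged files are skipped on the next run.
$cacheFile = __DIR__.'/scan-cache.json';
$cache = is_file($cacheFile) ? json_decode(file_get_contents($cacheFile), true) : array();

$conn = ftp_connect('ftp.example.com');
ftp_login($conn, 'user', 'pass');
$list = ftp_nlist($conn, '/path/to/files');

foreach ((array) $list as $remote) {
    $size = ftp_size($conn, $remote); // -1 if the server cannot report a size
    if (isset($cache[$remote]) && $cache[$remote] === $size) {
        continue; // unchanged since the last run, skip it
    }
    $data = file_get_contents('ftp://user:pass@ftp.example.com/'.ltrim($remote, '/'));
    if (strpos($data, 'mytext') !== false) {
        echo "found in $remote".PHP_EOL;
    }
    $cache[$remote] = $size;
}

file_put_contents($cacheFile, json_encode($cache));
ftp_close($conn);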

PHP's built-in file reading functions, such as fopen()/fread()/fclose() and file_get_contents() do support FTP URLs, like this:
<?php
$data = file_get_contents('ftp://user:password@ftp.example.com/dir/file');
// The file's contents are stored in the $data variable
If you need to get a list of the files in the directory, check out opendir(), readdir() and closedir(), which I'm pretty sure support FTP URLs.
An example:
<?php
$dir = opendir('ftp://user:password@ftp.example.com/dir/');
if (!$dir) {
    die;
}
while (($file = readdir($dir)) !== false) {
    echo htmlspecialchars($file).'<br />';
}
closedir($dir);

If you can connect via SSH to that server, and if you can install new PECL (and PEAR) modules, then you might consider using PHP SSH2. Here's a good tutorial on how to install and use it. This is a better alternative to FTP. But if that is not possible, your only solution is file_get_contents('ftp://domain/path/to/remote/file');.
** UPDATE **
Here is a PHP-only implementation of an SSH client : SSH in PHP.
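For illustration, a hedged sketch of reading a remote file through the PECL ssh2 extension's ssh2.sftp:// stream wrapper; host, credentials, path and the search string are placeholders:
<?php
// Sketch using the PECL ssh2 extension; adjust host/credentials/path to your setup.
$conn = ssh2_connect('example.com', 22);
if (!$conn || !ssh2_auth_password($conn, 'user', 'password')) {
    die('SSH connection or authentication failed');
}
$sftp = ssh2_sftp($conn);
// the ssh2.sftp:// wrapper lets the normal file functions read remote files
$data = file_get_contents('ssh2.sftp://'.intval($sftp).'/path/to/remote/file');
if (strpos($data, 'mytext') !== false) {
    echo 'found'.PHP_EOL;
}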

With FTP you'll always have to download to check.
I do not know what kind of bandwidth you have and how big the files are, but this might be an interesting use case for running the job in the cloud, e.g. on Amazon EC2 or google-apps (if you can download the files within the time limit).
In the EC2 case you spin up the server for an hour to check for updates in the files and shut it down again afterwards. This will cost a couple of bucks per month and save you from potentially having to upgrade your line or hosting contract.

If this is a regular task, then it might be worth using a simple queue system so you can run multiple processes at once (this will hugely increase speed). It would involve the following steps:
Get a list of all files on the remote server
Put the list into a queue (you can use memcached for a basic message queuing system)
Use a separate script to get the next item from the queue.
The processing script would contain simple functionality (in a do-while loop):
ftp_connect
do
    item = next item from queue
    $contents = file_get_contents(item);
    preg_match(..., $contents);
while (true);
ftp_close
You could then, in theory, fork off multiple processes through the command line without needing to worry about race conditions.
This method is probably best suited to cron/batch processing, but it might work in this situation too. A rough sketch of such a worker is below.
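A rough sketch of that worker, assuming memcached holds the file list plus a counter, so each worker can claim the next index with an atomic increment() and several workers can run in parallel. Hosts, keys, credentials and the search pattern are illustrative assumptions, not part of the original answer.
<?php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// --- producer: run once to fill the queue ---
$conn = ftp_connect('ftp.example.com');
ftp_login($conn, 'user', 'pass');
$mc->set('queue:files', ftp_nlist($conn, '/path/to/files'));
$mc->set('queue:next', 0);
ftp_close($conn);

// --- worker: start as many copies of this loop as you like ---
$files = $mc->get('queue:files');
while (($claimed = $mc->increment('queue:next')) !== false) {
    $idx = $claimed - 1; // increment() returns the new value, so the claimed index is one less
    if (!isset($files[$idx])) {
        break; // queue exhausted
    }
    $contents = file_get_contents('ftp://user:pass@ftp.example.com/'.ltrim($files[$idx], '/'));
    if (preg_match('/mytext/', $contents)) {
        echo "found in {$files[$idx]}".PHP_EOL;
    }
}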


Extract a Zip Archive using PHP

What I'm trying to do is basically extract the contents of Zip archives on my server. Here is some code:
$entry="test.zip";
$zip = new ZipArchive;
if ($zip->open($entry,ZIPARCHIVE::OVERWRITE) === TRUE)
{
$zip->extractTo('unpacked');
$zip->close();
}else
{
echo ‘failed’;
}
the directory "unpacked" is writeable for everyone and all the used methods of the ZipArchive Class return true. However nothing is being extracted. Does anyone happen to have an idea what could cause this behaviour? Any hint will be highly appreciated...Thanks in advance!
If you are using PHP 5.2.0 or later, first check that the zlib extension is available: http://www.zlib.net/
Also check the PECL extensions. To verify that zip support works at all, you can also try zip_open() and zip_read(), just for checking.
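A quick sanity check along those lines (a sketch, nothing more):
<?php
// Confirm zip support is actually available before blaming the archive or the permissions.
var_dump(extension_loaded('zip'));     // the zip extension itself
var_dump(class_exists('ZipArchive'));  // the class used in the question
var_dump(extension_loaded('zlib'));    // compression support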
If this code is in-house, and you can safely make the assumption that you won't move this code from Linux to Windows (or vice versa), you also have the option to execute local system commands, which may help solve your problem.
<?php
echo `unzip myarchive.zip`;
echo `tar -xzf myotherarchive.tar.gz`;
?>
When developing internal-use and/or maintenance scripts, I used to opt for straight-up system calls, as it was more in-line with the commands sysadmins were used to using.
In case of failure you should echo the return value of $zip->open(), as it contains an error code (a small sketch of that check is below).
Furthermore, I'd guess that you may not have the needed permissions for test.zip.
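A sketch of that check, reusing $entry from the question; open() returns TRUE on success or an integer error code:
<?php
$zip = new ZipArchive;
$res = $zip->open($entry);
if ($res !== true) {
    // e.g. ZipArchive::ER_NOENT (file not found) or ZipArchive::ER_NOZIP (not a zip archive)
    echo 'open() failed with error code '.$res;
}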
If your zip archive is big, sometimes you cannot extract all files within the maximum allowed execution time of your server.
The only solution, if you cannot change max_execution_time in your php.ini, is to use JavaScript to extract one file after the other. On the first JavaScript request you get the number of files in the archive:
$nbr_of_files = $zip->numFiles;
And then you extract the files one after another, using each file's index number in the zip archive:
$zip->extractTo('unpacked', array($zip->getNameIndex($file_nbr)));
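Putting those pieces together, a sketch of the per-request extraction; the archive name, target directory and the file_nbr request parameter are placeholders:
<?php
$zip = new ZipArchive;
if ($zip->open('test.zip') !== true) {
    die('cannot open archive');
}
if (!isset($_GET['file_nbr'])) {
    echo $zip->numFiles; // first request: tell the client how many entries there are
} else {
    $file_nbr = (int) $_GET['file_nbr'];
    $zip->extractTo('unpacked', array($zip->getNameIndex($file_nbr)));
}
$zip->close();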
Please try removing the ZIPARCHIVE::OVERWRITE flag from the ZipArchive open method. (The flag may not be functioning as expected and may be the root of the issue if you have followed the advice in the other answers.)
I had the same issue too. The solution:
$zip->extractTo(public_path() .'/restoreDb/extracted/');
i.e. add the public_path() helper function (Laravel) so the extraction target is an absolute path.

find public html folder using php's ftp functions

I have a php script that logs into my servers via the ftp function, and backs up the entire thing easily and quickly, but I can't seem to find a way to let the script determine the folder where the main index file is located.
For example, I have 6 servers with a slew of FTP accounts, all set up differently. Some log into the FTP root, which contains httpdocs/httpsdocs/public_html/error_docs/sub_domains and folders like that, and some log in directly to the httpdocs folder where the index file is. I only want to back up the main working web files and not all the other stuff that may be in there.
I've set up a way to define the working directory, but that means I have to have different scripts for each server or backup I want to do.
Is it possible to have the php script find or work out the main web directory?
One option would be to set up a database that has either the directory to use or nothing if the ftp logs in directly to that directory, but I'm going for automation here.
If it's not possible I'll go with the database option though.
You cannot figure out through FTP alone what the root directory configured in apache is - unless you fetch httpd.conf and parse it, which I'm fairly sure you don't want to do. Presumably you are looping to do this backup from multiple servers with the same script?
If so, just define everything in an array, and loop it with a foreach and all the relevant data will be available in each iteration.
So I would do something like this:
// This will hold all our configuration
$serverData = array();
// First server
$serverData['server1']['ftp_address'] = 'ftp://11.22.33.44/';
$serverData['server1']['ftp_username'] = 'admin';
$serverData['server1']['ftp_password'] = 'password';
$serverData['server1']['root_dir'] = 'myuser/public_html';
// Second server
$serverData['server2']['ftp_address'] = 'ftp://11.22.22.11/';
$serverData['server2']['ftp_username'] = 'root';
$serverData['server2']['ftp_password'] = 'hackmeplease';
$serverData['server2']['root_dir'] = 'myuser/public_html';
// ...and so on
// Of course, you could also query a database to populate the $serverData array
foreach ($serverData as $server) {
// Process each server - all the data is available in $server['ftp_address'], $server['root_dir'] etc etc
}
No, you can't do it reliably without knowledge of how Apache is set up for each of those domains. You'd be better off with the database/config file route. One-time setup cost for that, plus a teensy bit of maintenance as sites are added/modded/removed.
You'll probably spend days getting a detector script going, and it'll fail the next time some unknown configuration comes up. Attempting to create an AI is hard... you have to get it to the Artificial Stupidity level first (e.g. the MS Paperclip).

How to determine whether a file is still being transferred via ftp

I have a directory with files that need processing in a batch with PHP. The files are copied on the server via FTP. Some of the files are very big and take a long time to copy. How can I determine in PHP if a file is still being transferred (so I can skip the processing on that file and process it in the next run of the batch process)?
A possibility is to get the file size, wait a few moments, and check whether the file size has changed. This is not foolproof, because there is a slight chance that the transfer simply stalled for a few moments...
One of the safest ways of doing this is to upload the files with a temporary name, and rename them once the transfer is finished. Your program should skip files with the temporary name (a simple extension works just fine). Obviously this requires the client (uploader) to cooperate, so it's not ideal.
[This also allows you to delete failed (partial) transfers after a given time period if you need that.]
Anything based on polling the file size is racy and unsafe.
Another scheme (that also requires cooperation from the uploader) can involve uploading the file's hash and size first, then the actual file. That allows you to know both when the transfer is done, and if it is consistent. (There are lots of variants around this idea.)
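A sketch of that hash-then-file handshake, assuming the uploader sends a sidecar file such as myfile.dat.sha256 (containing the SHA-256 of the data file) before the data file itself; the file names are illustrative:
<?php
$dataFile = 'incoming/myfile.dat';
$hashFile = $dataFile.'.sha256';

if (is_file($dataFile) && is_file($hashFile)) {
    $expected = trim(file_get_contents($hashFile));
    if (hash_file('sha256', $dataFile) === $expected) {
        // transfer is complete and consistent -> safe to process
    } else {
        // still uploading, stalled or corrupted -> skip for this run
    }
}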
Something that doesn't require cooperation from the client is checking whether the file is open by another process or not. (How you do that is OS dependent - I don't know of a PHP builtin that does this. lsof and/or fuser can be used on a variety of Unix-type platforms, Windows has APIs for this.) If another process has the file open, chances are it's not complete yet.
Note that this last approach might not be fool-proof if you allow restarting/resuming uploads, or if your FTP server software doesn't keep the file open for the entire duration of the transfer, so YMMV.
Our server admin suggested ftpwho, which outputs which files are currently transferred.
http://www.castaglia.org/proftpd/doc/ftpwho.html
So the solution is to parse the output of ftpwho to see if a file in the directory is being transferred.
Some FTP servers allow running commands when certain events occur. If your FTP server supports this, you can build a simple signalling scheme to let your application know that the file has been uploaded more or less successfully ("more or less" because you don't know whether the user intended to upload the file completely or in parts). The signalling scheme can be as simple as creating an "uploaded_file_name.ext.complete" file, and you then monitor for files with the ".complete" extension.
You can also check whether you can open the file for writing. Most FTP servers won't let you do this while the file is still being uploaded.
One more approach mentioned by Mat is using system-specific techniques to check if the file is opened by other process.
The best way to check would be to try to get an exclusive lock on the file using flock(). The sftp/ftp process will be using the fopen libraries.
// try to get an exclusive lock on the file
$fp = fopen($pathname, "r+");
if ($fp && flock($fp, LOCK_EX)) { // acquire an exclusive lock
    flock($fp, LOCK_UN); // release the lock
    fclose($fp);
} else {
    error_log("Failed to get exclusive lock on $pathname. The file may still be uploading.");
}
It's not a really nice trick, but it's simple :-), and you can do the same with filemtime().
$result = false;
$tries = 5;
$filesize = array();
if (file_exists($filepath)) {
    for ($i = 0; $i < $tries; $i++) {
        sleep(1);
        clearstatcache(); // otherwise filesize() keeps returning the cached value
        $filesize[] = filesize($filepath);
    }
    $filesize = array_unique($filesize);
    if (count($filesize) == 1) {
        $result = true;
    } else {
        $result = false;
    }
}
return $result;

PHP - Download from URL and Upload via FTP

Slightly weird concept here... A client of ours wants data pushed to them over FTP/S.
The idea is that we download one of our reports from a URL (a CSV file), then push this to the client over FTP/S. I know I can do this in bash scripts using wget & ftp, but this needs to be added to a web interface, so PHP is the best way forward.
As this is a background task I can extend time-outs etc.
I also know I can use fopen to download and save a file, then find it and upload it using the PHP FTP library. I'm just looking for a way to download using fopen and hold the data in memory to upload straight away.
Any help appreciated in advance!
To retrieve the data from the URL you have a few options. You say you want the data in memory only to push directly to the FTP host.
One approach (that I find the simplest to use, but lacking in terms of reliability and error handling) is file_get_contents()
Example:
$url = 'http://www.domain.com/csvfile';
$data = file_get_contents($url);
Now you have your CSV data in $data; on to pushing it to an FTP server.
Again the simplest way to do this is to use the built-in stream wrappers, as in the get example above. (Note however that this requires PHP 4.3.0 or later.)
Simply build up the connection string like this.
$protocol = 'ftps';
$hostname = 'ftp.domain.com';
$username = 'user';
$password = 'password';
$directory = '/pub';
$filename = 'filename.csv';
$connectionString = sprintf('%s://%s:%s@%s%s/%s',
    $protocol, $username, $password,
    $hostname, $directory,
    $filename);
file_put_contents($connectionString, $data);
Have a look at the ftp wrappers manual
If this does not work there are other options.
You could use curl to get the data and the FTP Extension to push it.
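A hedged sketch of that cURL + FTP-extension route; the URL, host and credentials are placeholders, and ftp_ssl_connect() can replace ftp_connect() if FTPS is required:
<?php
$ch = curl_init('http://www.domain.com/csvfile');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);

// keep the CSV in memory and push it without touching the local disk
$stream = fopen('php://temp', 'r+');
fwrite($stream, $data);
rewind($stream);

$conn = ftp_connect('ftp.domain.com'); // or ftp_ssl_connect() for FTPS
ftp_login($conn, 'user', 'password');
ftp_fput($conn, '/pub/filename.csv', $stream, FTP_BINARY);
ftp_close($conn);
fclose($stream);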
To avoid saving the file to disk and to "upload straight away", i.e. to start pushing to FTP as soon as the first chunk of data is downloaded, try this:
http://www.php.net/manual/en/function.stream-copy-to-stream.php
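A sketch of how that could look, streaming straight from the HTTP source into an ftps:// stream; hosts, paths and credentials are placeholders, and the 'overwrite' context option matters if the remote file already exists:
<?php
$src = fopen('http://www.domain.com/csvfile', 'r');
$ctx = stream_context_create(array('ftp' => array('overwrite' => true)));
$dst = fopen('ftps://user:password@ftp.domain.com/pub/filename.csv', 'w', false, $ctx);

if ($src && $dst) {
    stream_copy_to_stream($src, $dst); // pushes chunks as they arrive
}
if ($src) fclose($src);
if ($dst) fclose($dst);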
You'll need an FTP server and client library which support resuming uploads

PHP Subversion Setup FTP

I work at a small php shop and I recently proposed that we move away from using our nas as a shared code base and start using subversion for source control.
I've figured out how to make sure our dev server gets updated with every commit to our development branch, and I know how to merge into trunk and have that update our staging server, because we have direct access to both of these. My biggest question is how to write a script that will update the production server, which we often only have FTP access to. I don't want to upload the entire site every time... is there any way to write a script that is smart enough to upload only what has changed when we execute it? (We don't want it automatically uploading to the production environment; we want to execute it manually.)
Does my question even make sense?
Basically, your issue is that you can't use subversion on the production server. What you need to do is keep, on a separate (ideally identically configured) server a copy of your production checkout, and copy that through whatever method to the production server. You could think of this as your staging server, actually, since it will also be useful for doing final tests on releases before rolling them out.
As far as the copy goes, if your provider supports rsync, you're set. If you have only FTP you'll have to find some method of doing the equivalent of rsync via FTP. This is not the first time anybody's had that need; a web search will help you out there. But if you can't find anything, drop me a note and I'll look around myself a little further.
EDIT: Hope the author doesn't mind me adding this, but I think it belongs here. To do something approximately similar to rsync with FTP, look at weex (http://weex.sourceforge.net/). It's a wrapper around command-line ftp that uses a local mirror to keep track of what's on the remote server, so that it can send only changed files. Works great for me.
It doesn't sound like SVN plays well with FTP, but if you have HTTP access, that may prove sufficient to push changes using svnsync. That's how we push changes to our production servers -- we use svnsync to keep a read-only mirror of the repository available.
I use the following solution. Just install the SVN client on your webserver, and put this script behind a privately accessible URL:
<?php
// make sure you have a robot account that can't commit ;)
$username = Settings::Load()->Get('svn', 'username');
$password = Settings::Load()->Get('svn', 'password');
$repos = Settings::Load()->Get('svn', 'repository');
echo '<h1>updating from svn</h1><pre>';
// for security, define an array of folders that you do want to be synced from svn; the rest will be skipped
$svnfolders = array('includes/', 'plugins/', 'images/', 'templates/', 'index.php' => 'index.php');
$svnfiles = $svnfolders; // default: sync the whole whitelist
if (!empty($_GET['justthisone']) && array_search($_GET['justthisone'], $svnfolders) !== false) {
    // you can also update just one of the above by passing it in $_GET
    $svnfiles = array($_GET['justthisone']);
}
foreach ($svnfiles as $targetlocation) {
    echo system("svn export --username={$username} --password={$password} {$repos}{$targetlocation} ".dirname(__FILE__)."/../{$targetlocation} --force");
}
die("</pre><h1>Done!</h1>");
I'm going to make an assumption here and say you are using a post-commit hook to do your merging/updating of your staging server. This may work, but I would strongly recommend you look into a Continuous Integration solution. The following are some that I am aware of:
Xinc - http://code.google.com/p/xinc/ (PHP Specific)
CruiseControl - http://cruisecontrol.sourceforge.net/ (Wildly popular.)
PHP integration made possible with http://phpundercontrol.org/about.html
Hudson - https://hudson.dev.java.net/ (Appears to be Java based, but allows for plugins/extensions)
LFTP is capable of synchronizing directories over ftp.
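For example (a hedged sketch; credentials and paths are placeholders), lftp's reverse mirror can be driven from PHP:
<?php
// "mirror -R" uploads the local tree; --only-newer skips files that have not changed.
$cmd = 'lftp -u user,password ftp.example.com '
     . '-e "mirror -R --only-newer /local/site /httpdocs; quit"';
echo shell_exec($cmd);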
Just an idea:
You could keep a checkout of your project on a host you have access to and where Subversion is installed. This working copy reflects the production server's version.
You could then write a PHP script that updates this working copy via svn and finds all files that have changed since the update started. These files you can upload.
Such a script could look like this:
$path = realpath( '/path/to/production/mirror' );
chdir( $path );
$start = time();
shell_exec( 'svn update' );
$list = array();
$i = new RecursiveIteratorIterator( new RecursiveDirectoryIterator( $path ), RecursiveIteratorIterator::SELF_FIRST );
foreach( $i as $node )
{
if ( $node->isFile() && $node->getCTime() > $start )
{
$list[] = $node->getPathname();
}
// directories should also be handled
}
$conn = ftp_connect( ... );
// and so on
Just as it came to my mind.
I think this will help you: https://github.com/midhundevasia/deploy
It works well on Windows.
