The situation is a bit complicated here.
We receive large folders with lots of files on a remote file server (accessible over FTP; let's call it FTP1).
Those folders can have a complex directory tree and weigh between 50 MB and 4 GB.
With PHP, the goal is to remove unwanted files (.exe, .pdf ...).
Then take all the files, put them in the root folder, and reorganise them into a new, predefined directory tree.
After this processing, the web server should send everything to another remote FTP server (FTP2).
Then the folders/files can be removed from FTP1.
With Laravel and its Storage facade this is all easy to implement, but my main concern is speed.
Is it better to
Copy the files to the web server, run the processing, copy the result to FTP2, and then clean up
Process directly on FTP1 and then copy to FTP2
Copy to FTP2 and then process directly on FTP2
I don't have much experience in IT infrastructure/architecture, but both FTP servers are only accessible over the internet and will never be on the same network as the web server.
The connection between the FTP servers and the web server is supposed to be highly available, but we all know what that means ...
I don't expect a definitive answer, more a guideline or the usual way of dealing with this case.
Since you are just looking for a rough guide, from what I see you should:
Reduce the number of FTP transfers, because FTP is slow. If you can get away with one transfer instead of two, it is definitely better.
Check your FTP servers' hardware specifications. You obviously want the faster machine to do the processing.
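Whichever option you pick, the Laravel side can stream files between the two remotes rather than buffering whole files in memory. Below is a rough sketch of option 1 under assumptions that are not in the question: two FTP disks named 'ftp1' and 'ftp2' configured in config/filesystems.php, an 'incoming' source folder, and an illustrative extension blacklist and target layout.

    <?php
    // Rough sketch of option 1: stream each file through the web server,
    // skipping unwanted extensions, then delete the source afterwards.
    use Illuminate\Support\Facades\Storage;

    $blocked = ['exe', 'pdf'];

    foreach (Storage::disk('ftp1')->allFiles('incoming') as $path) {
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, $blocked, true)) {
            continue; // unwanted file, leave it for the later cleanup
        }

        // Build the new, reorganised target path (illustrative layout).
        $target = 'sorted/' . $ext . '/' . basename($path);

        // Stream rather than load into memory, since files can be large.
        $stream = Storage::disk('ftp1')->readStream($path);
        Storage::disk('ftp2')->writeStream($target, $stream);
        if (is_resource($stream)) {
            fclose($stream);
        }
    }

    // Once everything is confirmed on FTP2, clean up FTP1.
    Storage::disk('ftp1')->deleteDirectory('incoming');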
One of my clients approached me to check and fix a hacked site. The site was developed by another, very inexperienced developer who didn't take care of even basic security.
The problem was that somehow PHP files were written to the images folder. The hackers also wrote an index.html which displays that the site is hacked. When I checked, the images folder had 777 permissions, so I came to the rough conclusion that the folder permissions were the cause. The hosting support guy says that some poorly written PHP script allowed files with any extension to be uploaded to the server, and that the hackers then executed those files to gain access or do whatever they wanted.
I have a few questions:
Is it only through the upload functionality that other PHP files can be uploaded?
Is there no other way to write files remotely, given that the folder permissions are 777?
The site has some FCKeditor instances and a couple of upload functions. I checked them; there are enough validations, so when files with extensions other than images or PDF are uploaded, they just return false.
Doesn't setting the folder permissions to a lower level fix the issue?
I asked the support guy to change the folder permissions, saying it would solve the issue, but he says there is some PHP file through which the other PHP files were written, and he wants that fixed, otherwise the site cannot go live. He says that even if the folder permissions are changed, the hacker can change them back to 777 and execute whatever he wants because of that poorly written PHP file.
What approach should I take to find whether such a PHP file exists? Any help or pointers would be much appreciated.
777 means that any user on the system (with execute access for all the parent directories, anyway) can add anything to that directory. Web users are not system users, though, and most web servers (Apache included) won't let random clients write files there right out of the box. You'd have to specifically tell the server to allow that, and I'm fairly certain that's not what happened.
If you're allowing any file uploads, though, the upload folder needs to at least be writable by the web server's user (or the site's, if you're using something like suPHP). And if the web server can write to that directory, then any PHP code can write to that directory. You can't set permissions high enough to allow uploads and low enough to keep PHP code from running, short of making the directory write-only (which makes it pretty useless for fckeditor and such).
The compromise almost certainly happened because of a vulnerability in the site itself. Chances are, either there's a file upload script that's not properly checking where it's writing to, or a script that blindly accepts a name of something to include. Since the PHP code typically runs as the web server's user, it has write access to everything the web server has write access to. (It's also possible that someone got in via FTP, in which case you'd better change your passwords. But the chances of the web server being at fault are slim at best.)
As for what to do at this point, the best option is to wipe the site and restore from backup -- as has been mentioned a couple of times, once an attacker has gotten arbitrary code to run on your server, there's not a whole lot you can trust anymore. If you can't do that, at least find any files with recent modification times and delete them. (Exploits hardly ever go through that much trouble to cover their tracks.)
Either way, then set the permissions on any non-upload, non-temp, non-session directories -- and all the existing scripts -- to disallow writes, period...particularly by the web server. If the site's code runs as the same user that owns the files, you'll want to use 555 for directories and 444 for files; otherwise, you can probably get by with 755/644. (A web server would only be able to write those if it's horribly misconfigured, and a hosting company that incompetent would be out of business very quickly.)
Frankly, though, the "support guy" has the right idea -- I certainly wouldn't let a site go live on my servers knowing that it's going to be executing arbitrary code from strangers. (Even if it can't write anything to the local filesystem, it can still be used to launch an attack on other servers.) The best option for now is to remove all ability to upload files. It's obvious that someone has no idea how to handle file uploads securely, and now that someone out there knows you're vulnerable, chances are you'd keep getting hacked anyway until you find the hole and plug it.
As for what to look for...unfortunately, it's somewhat vague, as we're talking about concepts above the single-statement level. Look for any PHP scripts that either include, require, or write to file names derived in any way from $_GET, $_POST, or $_COOKIE.
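As a purely illustrative sketch (the file and parameter names here are made up, not taken from the site in question), the kind of pattern to grep for looks like the first snippet below, with a safer whitelist-based version after it:

    <?php
    // Dangerous: the include path is derived directly from user input.
    // A request like ?page=../../uploads/avatar.jpg can pull in
    // attacker-controlled PHP that was smuggled in through an upload.
    include $_GET['page'] . '.php';

    // Safer: only ever include from a fixed whitelist of known pages.
    $pages = ['home' => 'home.php', 'about' => 'about.php'];
    $key   = $_GET['page'] ?? 'home';
    include __DIR__ . '/pages/' . ($pages[$key] ?? 'home.php');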
Changing folder permissions won’t solve the issue unless you’re using CGI, since PHP probably needs to be able to write to an upload folder, and your web server probably needs to be able to read from it. Check the extension of any uploaded files!
(So no, 0777 permissions don’t mean that anyone can upload anything.)
As cryptic mentioned, once a hacker can run code on your server then you have to assume that all files are potentially dangerous. You should not try to fix this yourself - restoring from a backup (either from the client or the original developer) is the only safe way around this.
Once you have the backup files ready, delete everything on your site and upload the backup - if it is a shared host you should contact them as well in case other files are compromised [rarely happens though].
You've identified two issues: the permissions and the lack of extension checking. However, do you have any evidence that these were the means by which the system was compromised? You've not provided anything to support that assertion.
Changing the permissions to something more restrictive would have provided NO PROTECTION against users uploading malicious PHP scripts.
Checking the extensions of files might have made it a bit more difficult to inject PHP code into the site, but it WOULD NOT PREVENT IT.
Restoring from backup might remove the vandalized content but WILL NOT FIX THE VULNERABILITIES in the code.
You don't have the skills your client (who is probably paying you for this) needs to resolve this. And acquiring those skills is a much longer journey than reading a few answers here (although admittedly it's a start).
Is it only through the upload functionality that other PHP files can be uploaded? Is there no other way to write files remotely, given that the folder permissions are 777?
There definitely are multiple possible ways to write a file in the web server’s document root directory. Just think of HTTP’s PUT method, WebDAV, or even FTP that may be accessible anonymously.
The site has some FCKeditor instances and a couple of upload functions. I checked them; there are enough validations, so when files with extensions other than images or PDF are uploaded, they just return false.
There are many things one can do wrong when validating an uploaded file. Trusting the reliability of information the client sent is one of the biggest mistakes one can make. This means it doesn't suffice to check whether the client says the uploaded file is an image (e.g. one of image/…). Such information can be easily forged. And even proper image files can contain PHP code that gets executed when interpreted by PHP, whether it's in an optional section like a comment or in the image data itself.
Doesn't setting the folder permissions to a lower level fix the issue?
No, probably not. The upload directory must be writable by PHP’s and readable by the web server’s process. Since both are probably the same and executing a PHP file requires only reading permissions, any uploaded .php file is probably also executable. The only solution is to make sure that the stored files don’t have any extension that denote files that are executed by the web server, i.e. make sure a PNG is actually stored as .png.
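As a rough sketch of what a stricter server-side check might look like (the form field name, target directory, and allowed types here are assumptions, and a real handler would need more hardening than this):

    <?php
    // Map of accepted content types to the extension the file will be stored under.
    $allowed = [
        'image/png'       => 'png',
        'image/jpeg'      => 'jpg',
        'image/gif'       => 'gif',
        'application/pdf' => 'pdf',
    ];

    $tmp = $_FILES['upload']['tmp_name'];

    // Inspect the actual file contents; never trust the client-supplied MIME type or name.
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    $mime  = finfo_file($finfo, $tmp);
    finfo_close($finfo);

    if (!isset($allowed[$mime])) {
        http_response_code(400);
        exit('File type not allowed.');
    }

    // Store under a generated name with a known-safe extension,
    // never under the name the client sent.
    $name = bin2hex(random_bytes(16)) . '.' . $allowed[$mime];
    move_uploaded_file($tmp, __DIR__ . '/uploads/' . $name);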
I have a website that currently uses two servers, an application server and a database server. However, the load on the application server is increasing, so we are going to add a second application server.
The problem I have is that the website lets users upload files to the server. How do I get the uploaded files onto both servers?
I do not want to store images directly in a database as our application is database intensive already.
Is there a way to keep the servers in sync with each other, or is there something else I can do?
Any help would be appreciated.
Thanks
EDIT: I am adding the following links for people that helped me understand this question more:
Synchronize Files on Multiple Servers
and
Keep Uploaded Files in Sync Across Multiple Servers - LAMP
For everyone reading this post: NFS seems to be the better of the two.
NFS will keep the files in sync. You could also use FTP to upload the files to all the servers, but NFS looks like the way to go.
This is a question for Server Fault.
Anyway, I think you should definitely consider moving into the "cloud".
Syncing uploads from one server to another is simply unreliable: you have no idea what kind of errors you might get or why. The syncing process will also put load on both servers. For me, the proper solution is moving to the cloud.
Should you choose the syncing method, you have a couple of options:
Use rsync to sync the files you need between the servers (a rough sketch follows this list).
Use crontab to sync the files every X minutes/hours/days.
Copy the files upon some event (user login, etc.).
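As a very rough sketch of the rsync-from-cron idea (everything here is an assumption, not from the question: the uploads live in /var/www/uploads, the second box is reachable over SSH as app2, and you are happy to shell out to rsync from a small PHP script run by cron; a plain crontab entry calling rsync directly works just as well):

    <?php
    // sync-uploads.php - run from cron, e.g. every 5 minutes.
    // Mirrors the local uploads directory to the second app server over SSH.
    $source = '/var/www/uploads/';
    $target = 'deploy@app2:/var/www/uploads/';

    // -a preserves permissions/timestamps, -z compresses, --delete removes
    // files on the target that no longer exist on the source.
    $cmd = sprintf(
        'rsync -az --delete %s %s 2>&1',
        escapeshellarg($source),
        escapeshellarg($target)
    );

    exec($cmd, $output, $status);

    if ($status !== 0) {
        // Log failures so a broken sync doesn't go unnoticed.
        error_log("upload sync failed:\n" . implode("\n", $output));
    }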
I got this answer from server fault:
The most appropriate course of action in a situation like this is to break the file share into a separate service of its own. Don't duplicate files if you have a network that can let the files be "everywhere (almost) at once." You can do this through NFS/CIFS or through a proper storage protocol like iSCSI. Mount as local storage in the appropriate directory. Depending on the performance of your network and your storage needs, this could add a couple of undetectable milliseconds to page load time.
So using NFS to share the server files would work, OR,
as stated by #kgb, you could designate one single server to hold all uploaded files and have the other servers pull from it (just make sure you run a cron job or something to back up the files).
Most sites solve this problem by using a 3rd party designated file server like Amazon S3 for the user uploads.
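With the S3 route, the upload handler writes straight to the bucket, so every app server sees the same object and nothing needs syncing. A minimal sketch using the AWS SDK for PHP (the bucket name, key scheme, and form field name are made up; credentials are assumed to come from the environment):

    <?php
    require 'vendor/autoload.php'; // aws/aws-sdk-php installed via Composer

    use Aws\S3\S3Client;

    $s3 = new S3Client([
        'region'  => 'us-east-1',
        'version' => 'latest',
    ]);

    // Push the uploaded file straight to S3 instead of local disk.
    $key = 'uploads/' . bin2hex(random_bytes(16)) . '.jpg';

    $s3->putObject([
        'Bucket'     => 'example-user-uploads',
        'Key'        => $key,
        'SourceFile' => $_FILES['upload']['tmp_name'],
    ]);

    // Every app server can now serve or reference the same object.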
Another answer could be to use a piece of software called BTSync. It is very easy to install and use and lets you keep files in sync across as many servers as you need. It takes only three terminal commands to install and is very efficient.
You could use the DB server for storage... not in the DB, I mean, but by running a web server on that machine too. It is not going to increase CPU load much, but it will require more bandwidth.
You could do it with rsync. People have suggested using NFS, but that way you create a single point of failure: if the NFS server goes down, both your servers are screwed. Correct me if I'm wrong.
This is the situation:
I have a LAMP server, which serves HTML, PHP, etc. Now I have a remote folder, somewhere on the web, which contains a directory full of PHP files, images, an MVC folder structure (CodeIgniter), etc.
What I want to do is, instead of downloading those PHP files and uploading them to my LAMP server every time, to use them directly and serve them from my LAMP server.
Again, I want the PHP files from a folder on another server, where I only have access to a direct link to each individual file, to be served by my LAMP server. So if I access my website, for instance www.website.com/page1, it gets the folder structure (or all the PHP files) from the remote web server and they get served from within my server.
I know this sounds a little complicated, but I'm not sure what to use... maybe a reverse proxy? Or do you think I should download the files directly and constantly keep them in sync? If anyone comes up with a good solution I may even pay that person...
EDIT(1)
Good answers so far... but I think I did not phrase the question well, so here it goes again:
I have access to a "list" of PHP files, and in order to get them I need to authenticate using OAuth via PHP. Once authenticated, I can retrieve a list of PHP, HTML, etc. files, each of them having a public URL that anyone can access. The thing is that instead of downloading all the files in that repository and serving them, I want to reuse that repository's web space and just serve those files myself. So basically I want something like symbolic links to URLs, which I think is not possible, or at least a way to just read the files and execute the PHP logic, even though the files live elsewhere.
I'm concerned about the security issues involved, but if someone could help me I would be thankful... Also, if you are interested in what I'm doing, I can always use a partner for this project, which I intend to use for charity, but I can still pay that person.
This is not a smart thing to do. You open yourself up to potential security issues, but at a minimum, you will significantly slow your site down.
I would recommend that you simply synchronize the files between both servers over SSH with a script.
Edit: ManseUK's suggestion of rsync is also a good one.
If you have FTP access to the remote server, you could mount the folder using FUSE and have Apache serve it as usual.
Do you have the ability to mount the remote folder as an NFS volume, or perhaps with SSHFS? If those options are available, either could work for you. You'd mount the remote folder locally and tell your local web server to serve files from that path.
Not that it would be the most efficient setup in the world, but I don't know why you have all this split apart in the first place. ;)
You could write a cron job to grab the remote file list every X minutes/hours/days and store the results locally, then write a simple script to parse those results upon request. Alternatively, you could still use an NFS or SSHFS mount to read the remote paths in real time and build whatever URLs you need.
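A rough sketch of that cron-job idea (the endpoint URL, the bearer-token authentication, and the cache path are all made-up placeholders, since the actual API in question isn't specified):

    <?php
    // refresh-file-list.php - run from cron every X minutes.
    // Fetches the remote repository's file list and caches it locally so that
    // page requests never have to hit the remote service directly.
    $endpoint = 'https://remote.example.com/api/files';
    $token    = getenv('REMOTE_API_TOKEN');

    $context = stream_context_create([
        'http' => [
            'header'  => "Authorization: Bearer {$token}\r\n",
            'timeout' => 10,
        ],
    ]);

    $json = file_get_contents($endpoint, false, $context);

    if ($json !== false) {
        // Write to a temp file first so readers never see a half-written list.
        file_put_contents('/var/cache/remote-files.json.tmp', $json);
        rename('/var/cache/remote-files.json.tmp', '/var/cache/remote-files.json');
    }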
Is there any way to make Apache read PHP documents from RAM?
I'm thinking of creating a virtual disk in the memory and then modify httpd.conf to change the document root directory to the virtual disk in the memory.
Is this viable?
Basically, what I want to do is distribute my PHP code to my users' computers so they can run it. But I don't want them to be able to look at the PHP source code easily: the code can't be stored on the hard disk in plain text; instead, it is stored in a data file and then read by my program into memory, where Apache reads it.
Is this viable? Is it easy to create a virtual disk in memory in C++ such that the virtual disk can't be accessed by any other means, such as My Computer?
Update:
Thank you all for the questions, which help me better pin down my goals, but I think I know what I'm doing. Please just suggest any solutions you may have towards my needs.
The hard part so far is getting Apache to read from somewhere other than a plain directory on the hard disk that contains all of my project's source code. I would like it to be as concealed as possible. I know a little about Windows desktop development and thought a virtual disk might be a solution, but if you have better ones, please suggest them.
You can, in theory, have Apache serve files out of a Samba share. You would need to configure the server to mount a specific file share made by the user. This won't work if the user is behind a firewall or NAT gateway of any variety.
This will be:
Slower than molasses in January ... in Alaska. Apache does a lot of stat calls on each request by default. This is going to add a lot of overhead before even finding the file, transferring it over, and then executing it.
Hard to configure. Adding mounts is a non-trivial task at the server level and Samba can be rather finicky on both sides. Further, if you are using RHEL/CentOS or any other distro running SELinux, you're going to have to do the chcon/setsebool tapdance to even get it working. The default settings expressly prohibit Apache from touching any file that came to the system through a Samba share.
A security nightmare. You will be allowing Apache to serve up files to anyone from a computer that is not under your direct control. The malicious possibilities are endless. This is a horrible idea that you should not seriously consider.
A safer-but-still-insane alternative might be available. FastCGI. The remote systems can run a FastCGI process and actually host and execute the code directly. Apache can be configured to pass PHP requests to the remote FastCGI process. This will still break if the users are firewalled or NATted. This will only be an acceptable solution if the user can actually run a FastCGI process and you don't mind the code actually executing on their system instead of the server.
This has the distinct advantage of the files not executing in the context of the server.
Perhaps I've entirely misunderstood -- are you asking for code to be run live from user's systems? Because I wrote this answer under that interpretation.
I have a simple question and wish to hear others' experiences regarding which is the best way to replicate images across multiple hosts.
I have determined that storing images in the database and then using database replication over multiple hosts would result in maximum availability.
The worry I have with the filesystem is the difficulty of synchronising the images (e.g. I don't want 5 servers all hitting the same server for images!).
Now, the only concerns I have with storing images in the database are the extra queries hitting the database and the extra handling I'd have to put in place in Apache if I wanted 'virtual' image links to point to database entries (e.g. AddHandler).
As far as my understanding goes:
If you have a script serving up the images: each image would require a database call.
If you display the images inline as binary data: this could be done in a single database call.
To provide external / linkable images you would have to add an AddHandler for the extension you wish to 'fake' and point it to your scripting language (e.g. PHP, ASP).
I might have missed something, but I'm curious if anyone has any better ideas?
Edit:
Tom has suggested using mod_rewrite to avoid needing an AddHandler, which I have accepted as a proposed solution to the AddHandler issue; however, I don't feel like I have a complete solution yet, so please, please, keep answering ;)
A few have suggested using lighttpd over Apache. How different are the ISAPI modules for lighttpd?
If you store images in the database, you take an extra database hit plus you lose the innate caching/file serving optimizations in your web server. Apache will serve a static image much faster than PHP can manage it.
In our large app environments, we use up to 4 clusters:
App server cluster
Web service/data service cluster
Static resource (image, documents, multi-media) cluster
Database cluster
You'd be surprised how much traffic a static resource server can handle. Since it's not really computing (no app logic), a response can be optimized like crazy. If you go with a separate static resource cluster, you also leave yourself open to change just that portion of your architecture. For instance, in some benchmarks lighttpd is even faster at serving static resources than apache. If you have a separate cluster, you can change your http server there without changing anything else in your app environment.
I'd start with a 2-machine static resource cluster and see how that performs. That's another benefit of separating functions - you can scale out only where you need it. As far as synchronizing files, take a look at existing file synchronization tools versus rolling your own. You may find something that does what you need without having to write a line of code.
Serving the images from wherever you decide to store them is a trivial problem; I won't discuss how to solve it.
Deciding where to store them is the real decision you need to make. You need to think about what your goals are:
Redundancy of hardware
Lots of cheap storage
Read-scaling
Write-scaling
The last two are not the same and will definitely cause problems.
If you are confident that the size of this image library will not exceed the disk you're happy to put in your web servers (say, 200 GB at the time of writing, the largest high-speed server-grade disks that can be obtained; I assume you want to use 1U web servers, so you won't be able to store more than that in RAID 1, depending on your vendor), then you can get very good read-scaling by placing a copy of all the images on every web server.
Of course you might want to keep a master copy somewhere too, and have a daemon or process which syncs them from time to time, and have monitoring to check that they remain in sync and this daemon works, but these are details. Keeping a copy on every web server will make read-scaling pretty much perfect.
But keeping a copy everywhere will ruin write-scalability, as every single web server will have to write every changed / new file. Therefore your total write throughput will be limited to the slowest single web server in the cluster.
"Sharding" your image data between many servers will give good read/write scalability, but is a nontrivial exercise. It may also allow you to use cheap(ish) storage.
Having a single central server (or active/passive pair or something) with expensive IO hardware will give better write-throughput than using "cheap" IO hardware everywhere, but you'll then be limited by read-scalability.
Having your images in a database doesn't necessarily mean a database call for each one; you could cache these separately on each host (e.g. in temporary files) when they are retrieved. The source images would still be in the database and easy to synchronise across servers.
You also don't really need to add Apache handlers to serve an image through a PHP script whilst maintaining nice URLs - you can make URLs like http://server/image.php/param1/param2/param3.JPG and read the parameters through $_SERVER['PATH_INFO']. You could also remove the 'image.php' portion of the URL (if you needed to) using mod_rewrite.
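A minimal sketch of that image.php approach, combined with the per-host file cache mentioned above (the cache directory, database credentials, and the way images are looked up are all assumptions):

    <?php
    // image.php - serves images via URLs like /image.php/gallery/2024/photo1.jpg
    // using PATH_INFO, with a simple per-host file cache in front of the database.
    $path = ltrim($_SERVER['PATH_INFO'] ?? '', '/');

    if ($path === '' || strpos($path, '..') !== false) {
        http_response_code(404);
        exit;
    }

    $cacheFile = '/var/cache/images/' . md5($path);

    if (!is_file($cacheFile)) {
        // Cache miss: fetch the image bytes from the database (schema is illustrative).
        $pdo  = new PDO('mysql:host=dbhost;dbname=app', 'user', 'pass');
        $stmt = $pdo->prepare('SELECT data FROM images WHERE path = ?');
        $stmt->execute([$path]);
        $data = $stmt->fetchColumn();

        if ($data === false) {
            http_response_code(404);
            exit;
        }
        file_put_contents($cacheFile, $data);
    }

    header('Content-Type: image/jpeg'); // real code would store and serve the actual MIME type
    readfile($cacheFile);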
What you are looking for already exists and is called MogileFS.
The target setup involves mogilefsd, replicated MySQL databases, and lighttpd/Perlbal for serving files. It will give you failover and fine-grained file replication (for example, you can decide to duplicate end-user images across several physical devices and keep only one physical instance of thumbnails). Load balancing can also be achieved quite easily.