I am in the process of developing an application (in Go or possibly in PHP) where users need to upload photos and images.
I have set up a couple of ZFS (mirror) storage servers in different locations, but I am in doubt about how best to let users upload files. ZFS handles quotas and reservations.
I am running a replicated Galera database on all servers, both for safety and for easy access to user accounts from each server. In other words, each server has a local copy of the database at all times. All users are virtual users only.
So far I have tested the following setup options:
Solution 1
Running SFTP (ProFTPD with its SFTP module) or FTPS (Pure-FTPd with TLS) on the storage servers with virtual users.
This gives people direct access to the storage servers using a client like FileZilla. At the same time, users can also upload using our web GUI from our main web server.
One advantage with this setup is that the FTP server can handle virtual users. Our web application will also send files via SFTP or FTPS.
One disadvantage is that FTP is meh: annoying to firewall. I also much prefer file transfer over SSH (SFTP) to FTP over TLS (FTPS). However, only ProFTPD has a module for SFTP, and it has been a real pain to work with compared to Pure-FTPd (many problems with non-working configuration options and file permission errors), while Pure-FTPd only supports TLS.
Running with real SSH/SCP accounts and using PAM is not an option.
Solution 2
Mount the storage servers locally on the web server using NFS or CIFS (Samba is great at automatic resume in case a box goes down).
In this setup users can only upload via our main web server. The web application, and the application running on the storage servers, then need to support resumable uploads. I have been looking into using the tus protocol.
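To make the idea concrete, here is a stripped-down Go sketch of the server side of a tus-style resumable upload: HEAD reports how many bytes have already arrived, PATCH appends the next chunk at that offset. This is an illustration only (upload creation, locking, checksums and the Tus-Resumable handshake are omitted), and the upload directory is a made-up path; in practice an existing implementation such as tusd would be the safer choice.

package main

import (
	"log"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"strconv"
)

// uploadDir is a hypothetical directory on the mounted storage server.
const uploadDir = "/mnt/storage1/uploads"

// resumeHandler sketches the core of resumable uploads: HEAD returns the
// current offset so a client can resume, PATCH appends data at that offset.
func resumeHandler(w http.ResponseWriter, r *http.Request) {
	path := filepath.Join(uploadDir, filepath.Base(r.URL.Path))

	switch r.Method {
	case http.MethodHead:
		info, err := os.Stat(path)
		if err != nil {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Upload-Offset", strconv.FormatInt(info.Size(), 10))

	case http.MethodPatch:
		offset, err := strconv.ParseInt(r.Header.Get("Upload-Offset"), 10, 64)
		if err != nil {
			http.Error(w, "missing or invalid Upload-Offset", http.StatusBadRequest)
			return
		}
		f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY, 0640)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		defer f.Close()
		if _, err := f.Seek(offset, io.SeekStart); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		n, err := io.Copy(f, r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.Header().Set("Upload-Offset", strconv.FormatInt(offset+n, 10))
		w.WriteHeader(http.StatusNoContent)

	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

func main() {
	http.HandleFunc("/files/", resumeHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}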
A disadvantage with both the above setups is that storage capacity needs to be managed somehow. When storage server 1 reaches its maximum number of users, the application needs to know this and then only create virtual users for storage server 2, 3, etc.
I have calculated how many users each storage server can hold and then have the web application check the database with virtual users to see when it needs to move newly created users to the next storage server.
This is rather old school, but it works.
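As a rough sketch of that check against the local Galera copy of the database (table and column names are made up for illustration):

package placement

import (
	"database/sql"
	"fmt"
)

// pickStorageServer returns the hostname of a storage server that still has
// room for another virtual user, preferring the least-populated one.
// The storage_servers and virtual_users tables are hypothetical.
func pickStorageServer(db *sql.DB, maxUsersPerServer int) (string, error) {
	const q = `
		SELECT s.hostname
		FROM storage_servers s
		LEFT JOIN virtual_users u ON u.server_id = s.id
		GROUP BY s.id, s.hostname
		HAVING COUNT(u.id) < ?
		ORDER BY COUNT(u.id) ASC
		LIMIT 1`

	var host string
	if err := db.QueryRow(q, maxUsersPerServer).Scan(&host); err != nil {
		return "", fmt.Errorf("no storage server with free capacity: %w", err)
	}
	return host, nil
}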
Solution 3
Same as solution 2 (no FTP), but clone our web application upload thingy to each storage server and then redirect users (or provide them with a physical link to the storage server, s1.example.com, s2.example.com, etc.)
The possible advantage of this setup is that users upload directly to the storage server they have been assigned to rather than going through our main web server (preventing it from becoming a possible bottleneck).
Solution 4
Use GlusterFS on the storage servers and build a cluster that can easily be expanded. I have tested out GlusterFS and it works very well for this purpose.
The advantage with this setup is that I don't really need to care about where files physically go on which storage servers, and I can easily expand storage by adding more servers to the cluster.
However, the disadvantage here is again that our main web server might become a bottleneck.
I have also considered adding a load balancer and then using multiple web servers in case our main web server becomes a bottleneck for uploading files.
In any case, I much prefer to keep it simple! I don't like adding stuff. I want it to be easy to maintain in the long run.
Any ideas, suggestions, and advice will be greatly appreciated.
How do you do it?
A web application should be agnostic of the underlying storage when we are talking about file storage: separation of concerns.
(S)FTP(S), on the other hand, is not a storage method. It is a communication protocol. It does not preclude you from having shared storage. See above.
ZFS does not come with shared-storage capabilities built in, so you are basically down to the following choices:
Which underlying filesystem?
Do I want to offer an additional access mode via (S)FTP(S)?
How do I make my filesystem available across multiple servers? GlusterFS, CIFS or NFS?
So, let us walk this through.
Filesystem
I know ZFS is intriguing, but here is the thing: XFS, for example, already has a maximum filesystem size of 8 exbibytes minus one byte. The specialist term for this is "a s...load". To put that in perspective: the Library of Congress holds about 20 TB of digital media, which would fit into that roughly 400k times. Even good ol' ext4 can hold about 50k Libraries of Congress. And if you hold that much data, your filesystem is your smallest concern. Building the next couple of power plants to keep your stuff running presumably is.
Gist: Nice to think about, but use whatever you feel comfortable with. I personally use XFS (on LVM) for pretty much everything.
Additional access methods
Sure, why not? Aside from the security nightmare (privilege escalation, anyone?), that is. And ProFTPD, with its built-in coffee machine and kitchen sink, is the last FTP server I would use for anything. It has a ginormous code base, which lends itself to accidentally introduced vulnerabilities.
Basically it boils down to the skills present in the project. Can you guys properly harden a system and an FTP server and monitor it for security incidents? Unless your answer is a confident "Yes, ofc, plenty of experience with it!" you should minimize the attack surface you present.
Gist: Don't, unless you really know what you are doing. And if you have to ask, you probably do not. No offense intended, just stating facts.
Shared filesystem
Personally, I have had... less than perfect experiences with GlusterFS. The replication has quite some requirements when it comes to network latency and the like. In a nutshell: if we are talking about multiple availability zones, say EMEA, APAC and NCSA, it is close to impossible. You'd be stuck with geo-replication, which is less than ideal for the use case you describe.
NFS and CIFS on the other hand have the problem that there is no replication at all, and all clients need to access the same server instance in order to access the data - hardly a good idea if you think you need an underlying ZFS to get along.
Gist: Shared filesystems at a global scale with halfway decent replication lags and access times are very hard to do and can get very expensive.
Haha, Smartypants, so what would you suggest?
Scale. Slowly. In the beginning, you should be able to get along with a simple FS-based repository for your files. Then evaluate other means of large-scale shared storage and migrate to one when you need to.
Taking the turn towards implementation, I would even go a step further: make your storage an interface:
// Storer takes the source and stores its contents under path for further reading via
// Retriever.
type Storer interface {
	StreamTo(path string, source io.Reader) (err error)
}

// Retriever takes a path and streams the file it has stored under path to w.
type Retriever interface {
	StreamFrom(path string, w io.Writer) (err error)
}

// Repository is a composite interface. It requires a
// repository to accept and provide streams of files.
type Repository interface {
	Storer
	Retriever
	Close() error
}
Now, you can implement various storage methods quite easily:
// FileStore represents a filesystem based file Repository.
type FileStore struct {
	basepath string
}

// StreamFrom satisfies the Retriever interface.
func (s *FileStore) StreamFrom(path string, w io.Writer) (err error) {
	f, err := os.Open(filepath.Join(s.basepath, path))
	if err != nil {
		return handleErr(path, err)
	}
	defer f.Close()
	_, err = io.Copy(w, f)
	return err
}
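For completeness, the Storer side of the FileStore could be sketched along the same lines (handleErr being whatever error wrapper the example already uses; this is an illustration, not necessarily how the linked repository implements it):

// StreamTo satisfies the Storer interface. It refuses to overwrite an
// existing file; relax the flags if overwriting is desired.
func (s *FileStore) StreamTo(path string, source io.Reader) (err error) {
	full := filepath.Join(s.basepath, path)
	if err = os.MkdirAll(filepath.Dir(full), 0750); err != nil {
		return handleErr(path, err)
	}
	f, err := os.OpenFile(full, os.O_CREATE|os.O_WRONLY|os.O_EXCL, 0640)
	if err != nil {
		return handleErr(path, err)
	}
	defer f.Close()
	_, err = io.Copy(f, source)
	return err
}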
Personally, I think this would be a great use case for GridFS, which, despite its name is not a filesystem, but a feature of MongoDB. As for the reasons:
MongoDB comes with a concept called replica sets to ensure availability with transparent automatic failover between servers
It comes with a rather simple mechanism of automatic data partitioning, called a sharded cluster
It comes with an indefinite number of access gateways called mongos query routers to access your sharded data.
For the client, aside from the connection URL, all of this is transparent. So it makes (almost) no difference, aside from read preference and write concern, whether its storage backend consists of a single server or a globally replicated, sharded cluster with 600 nodes.
If done properly, there is not a single point of failure, you can replicate across availability zones while keeping the "hot" data close to the respective users.
I have created a repository on GitHub which contains an example of the interface suggestion and implements a filesystem based repository as well as a MongoDB repository. You might want to have a look at it. It lacks caching at the moment. In case you would like to see that implemented, please open an issue there.
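For illustration, a GridFS-backed implementation of the Repository interface could look roughly like the sketch below. It assumes the v1 mongo-go-driver (go.mongodb.org/mongo-driver); it is not the code from the linked repository, and package paths and method names differ between driver versions.

package storage

import (
	"context"
	"io"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/gridfs"
)

// GridFSStore implements the Repository interface on top of MongoDB GridFS.
type GridFSStore struct {
	client *mongo.Client
	bucket *gridfs.Bucket
}

// NewGridFSStore wraps an already connected client and the named database.
func NewGridFSStore(client *mongo.Client, db string) (*GridFSStore, error) {
	bucket, err := gridfs.NewBucket(client.Database(db))
	if err != nil {
		return nil, err
	}
	return &GridFSStore{client: client, bucket: bucket}, nil
}

// StreamTo satisfies the Storer interface.
func (g *GridFSStore) StreamTo(path string, source io.Reader) error {
	_, err := g.bucket.UploadFromStream(path, source)
	return err
}

// StreamFrom satisfies the Retriever interface.
func (g *GridFSStore) StreamFrom(path string, w io.Writer) error {
	_, err := g.bucket.DownloadToStreamByName(path, w)
	return err
}

// Close disconnects the underlying client.
func (g *GridFSStore) Close() error {
	return g.client.Disconnect(context.Background())
}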
Related
I did some Google searches and can't seem to find what I want. I'm designing my web site to use MySQL and PHP web servers. Multiple web servers with load balancers and a MySQL cluster are planned so far for scaling. But then I get to images/videos/MP3s. I need a file system that multiple servers can read files from and write files to. So one web server can run MySQL, the networked file system and the web server, but as the site scales it can be switched to multiple servers. Does anyone have any examples, tutorials or resources to help me with this? The site runs on Ubuntu servers. My original idea was to just store the images in MySQL (I know how to do that and have working examples) so all servers could read/write, but other people told me that's a bad idea and I should use a file system (but I don't want to use the local one, as I don't think it can scale for large sites).
There are three systems that come to mind: MogileFS, MongoDB GridFS and a cloud-based storage solution.
MogileFS (OMG Files!) was developed for LiveJournal and stores metadata in MySQL. It uses that to find the actual disk with the appropriate file and streams it out.
MongoDB GridFS is a lot newer, and probably easier to get going, certainly for a smaller system. It uses a 'NoSQL' database to store chunks of files, assembling them as required. Searching around will turn up plenty of information.
Finally, you could simply avoid the whole issue and just upload images into Amazon's S3, or Rackspace Cloudfiles. I've done the latter before (though the site was already running inside Rackspace's system) and it's not very difficult, again with plenty of examples around.
For S3 there is also a command-line tool, s3cmd, that can be set to sync (or, better, upload and then delete) a directory full of files into an S3 'bucket'.
First, storing images/large files is not really practical with MySQL because of its maximum packet size limitation.
To quote this answer: Choosing data type for MySQL?
MySQL is incapable of working with any data that is larger than max_allowed_packet (default: 1M) in size, unless you construct complicated and memory intense workarounds at the server side. This further restricts what can be done with TEXT/BLOB-like types, and generally makes the LONGTEXT/LONGBLOB types useless in a default configuration.
Now, for storage and upgrade compatibility, why not just store them on a NAS or RAID system that you can continue to tack drives onto? Then in your DB just store a path to the file. Much less DB intensive, and it allows for decent scalability.
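A minimal Go sketch of that pattern: the bytes go to the mount, the database only gets the relative path. The NAS root and the files table are illustrative, not part of any real schema.

package upload

import (
	"database/sql"
	"io"
	"os"
	"path/filepath"
)

// saveUpload writes the uploaded file to a NAS/RAID mount and records only
// its relative path in the database.
func saveUpload(db *sql.DB, nasRoot, userID, name string, src io.Reader) error {
	rel := filepath.Join(userID, filepath.Base(name))
	dst := filepath.Join(nasRoot, rel)

	if err := os.MkdirAll(filepath.Dir(dst), 0750); err != nil {
		return err
	}
	f, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer f.Close()
	if _, err := io.Copy(f, src); err != nil {
		return err
	}

	// Only the path is stored in the database, never the blob itself.
	_, err = db.Exec(`INSERT INTO files (user_id, path) VALUES (?, ?)`, userID, rel)
	return err
}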
I'm planning a system for serving image files from a server cluster with load balancing. I'm battling with the architecture and whether to save the actual image files as blobs in the database or in the filesystem.
My problem is that a database connection is required anyway, as the users need to be authenticated. Different users have access only to their friends' content and items they have uploaded themselves. Since the connection is required anyway, maybe the images could be retrieved from there as well?
Images should be stored with no single point of failure. And obviously, the system should be fast.
For database approach:
The database is separate from the rest of my application, so my application's main database won't get bloated by all the images. The database would be easy to scale, as I just need to add more servers to the cluster. The problem is that I've heard this might be slow for a website with millions, even billions, of photos.
For filesystem:
I would be really interested in knowing how one could design a system where the web servers are load balanced and none of them is too important for the overall system. All the servers should use common storage, so they can access the same files in the cluster.
What do you think? Which is the best solution in this case?
What kind of overall architecture and servers would you recommend for an image-serving cluster? Note: this cluster only serves images. Application servers are a whole different story.
I definitely wouldn't store them in the database. If you need to use PHP for authentication, then do that as quickly as possible and use X-SendFile to hand over the actual image serving to your web server.
For the filesystem it sounds like MogileFS would be a good fit.
For the web server I'd suggest nginx. If you can adapt your authentication mechanism to use one of the existing modules, or write your own module for it, you could omit PHP completely (there's already a MogileFS client module).
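The same hand-off works from any backend, not just PHP. Below is a minimal Go sketch using nginx's X-Accel-Redirect header (nginx's counterpart to X-SendFile); it assumes an internal /protected/ location in the nginx configuration and uses a placeholder authentication check.

package main

import (
	"log"
	"net/http"
	"strings"
)

// imageHandler authenticates the request, then hands the actual file
// transfer back to nginx via X-Accel-Redirect. It assumes an nginx
// location such as:
//
//	location /protected/ { internal; alias /srv/images/; }
func imageHandler(w http.ResponseWriter, r *http.Request) {
	if !authenticated(r) {
		http.Error(w, "forbidden", http.StatusForbidden)
		return
	}
	// Do not stream the bytes through the application; let nginx do it.
	name := strings.TrimPrefix(r.URL.Path, "/images/")
	w.Header().Set("X-Accel-Redirect", "/protected/"+name)
}

// authenticated is a placeholder for a real session/token check.
func authenticated(r *http.Request) bool {
	_, err := r.Cookie("session")
	return err == nil
}

func main() {
	http.HandleFunc("/images/", imageHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}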
I have a file host website that's burning through 2 Gbit of bandwidth, so I need to start adding secondary media servers to store the files. What would be the best way to manage a multiple-server setup with a large number of files? Preferably through PHP only.
Currently, I only have around 100 GB of files... so I could get a 2nd server, mirror all content between them, and then round-robin the traffic 50/50, 33/33/33, etc. But once the total amount of files grows beyond the capacity of a single server, this won't work.
The idea that I had was to have a list of media servers stored in the DB with the amount of free space left on each server. Once a file is uploaded, PHP will choose which server the file is actually uploaded to, spreading the files out evenly among the servers.
Was hoping to get some more input/inspiration.
Can't use any third-party services like Amazon. The files range from a few bytes to a gigabyte.
Thanks
You could try MogileFS. It is a distributed file system and has a good API for PHP. You can create categories and upload a file to a category. For each category you can define how many servers it should be distributed across. You can use the API to get a URL to that file on a random node.
If you are doing as much data transfer as you say, it would seem whatever it is you are doing is growing quite rapidly.
It might be worth your while to contact your hosting provider and see if they offer any sort of shared storage solution via iSCSI, NAS, or other means. Ideally the storage would not only start out large enough to store everything you have on it, but it would also be able to grow dynamically beyond your needs. I know my hosting provider offers a solution like this.
If they do not, you might consider colocating your servers somewhere that either does offer a service like that, or would allow you to install your own storage server (which could be built cheaply from off-the-shelf components and software like FreeNAS or Openfiler).
Once you have a centralized storage platform, you could then add web servers to your heart's content and load balance them based on load, all while accessing the same central storage repository.
Not only is this the correct way to do it, but it would also offer you much more redundancy and expandability in the future if your endeavor continues to grow at the pace it is currently growing.
The other solutions offered, using a database record of what is stored where, would work, but they not only add an extra layer of complexity, they also add an extra layer of processing between your visitors and the data they wish to access.
What if you lost a hard disk, do you lose 1/3 or 1/2 of all your data?
Should the heavy I/O of static content be on the same spindles as the rest of your operating system and application data?
Your best bet is really to get your files into some sort of storage that scales. Storing files locally should only be done with good reason (they are sensitive, private, etc.)
Your best bet is to move your content into the cloud. Mosso's CloudFiles or Amazon's S3 will both allow you to store an almost infinite amount of files. All your content is then accessible through an API. If you want, you can then use MySQL to track meta-data for easy searching, and let the service handle the actual storage of the files.
I think your own idea is not the worst one. Get a bunch of servers, and for every file store which server(s) it's on. If new files are uploaded, use most-free-space first*. Every server handles its own delivery (instead of piping through the main server).
Pros:
You can use multiple servers for a single file, e.g. for cutekitten.jpg: filepath="server1\cutekitten.jpg;server2\cutekitten.jpg", and then choose the server depending on the server load (or randomly, or alternating, ...).
If you're careful you may be able to move files around automatically depending on the current load. So if your cute-kitten image gets reddited/slashdotted hard, move it to the server with the lowest load and update the entry.
You could do this with a cron job: just log the downloads for the last xx minutes, try some formula like downloads-per-minute * filesize * (product of server loads) for weighting, and pick thresholds for increasing/decreasing the number of servers those files are distributed to.
If you add a new server, it's relatively painless (just add the address to the server pool).
Cons:
Homebrew solutions are always risky.
Your load distribution algorithm must be well tested, otherwise bad things could happen (everything mirrored everywhere).
Constantly moving files around for balancing adds additional server load.
* Or use a mixed weighting algorithm: free space, server load, file popularity (a rough sketch follows after this answer).
disclaimer: never been in the situation myself, just guessing.
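Sketching the mixed weighting idea from the footnote above in Go; the weights and the score formula are arbitrary and purely illustrative, and would need tuning against real traffic.

package balance

// Server describes one media server as tracked in the database.
type Server struct {
	Host      string
	FreeBytes int64   // free disk space reported by the server
	Load      float64 // e.g. normalised 1-minute load average
	Downloads int     // downloads in the last xx minutes, for popularity
}

// pickServer picks an upload target: more free space raises the score,
// higher load and popularity lower it.
func pickServer(servers []Server) (Server, bool) {
	if len(servers) == 0 {
		return Server{}, false
	}
	best, bestScore := servers[0], score(servers[0])
	for _, s := range servers[1:] {
		if sc := score(s); sc > bestScore {
			best, bestScore = s, sc
		}
	}
	return best, true
}

func score(s Server) float64 {
	return float64(s.FreeBytes) / ((1.0 + s.Load) * float64(1+s.Downloads))
}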
Consider HDFS, which is part of Apache's Hadoop. This will integrate with PHP, but you'll be setting up a second application. This will also solve all your points of balancing among servers and handling things when your file space usage exceeds one server's ability. It's not purely in PHP, though, but I don't think that's what you meant when you said "pure" anyway.
See http://hadoop.apache.org/core/docs/current/hdfs_design.html for the idea of it. They cover the whole idea of how it handles large files, many files, replication, etc.
I have a simple question and wish to hear others' experiences regarding which is the best way to replicate images across multiple hosts.
I have determined that storing images in the database and then using database replication over multiple hosts would result in maximum availability.
The worry I have with the filesystem is the difficulty of synchronising the images (e.g. I don't want 5 servers all hitting the same server for images!).
Now, the only concerns I have with storing images in the database are the extra queries hitting the database and the extra handling I'd have to put in place in Apache if I wanted 'virtual' image links to point to database entries (e.g. AddHandler).
As far as my understanding goes:
If you have a script serving up the images: each image would require a database call.
If you display the images inline as binary data: this could be done in a single database call.
To provide external / linkable images, you would have to add an AddHandler for the extension you wish to 'fake' and point it to your scripting language (e.g. PHP, ASP).
I might have missed something, but I'm curious if anyone has any better ideas?
Edit:
Tom has suggested using mod_rewrite to avoid using an AddHandler; I have accepted that as a proposed solution to the AddHandler issue. However, I don't feel like I have a complete solution yet, so please, please keep answering ;)
A few have suggested using lighttpd over Apache. How different are the ISAPI modules for lighttpd?
If you store images in the database, you take an extra database hit plus you lose the innate caching/file serving optimizations in your web server. Apache will serve a static image much faster than PHP can manage it.
In our large app environments, we use up to 4 clusters:
App server cluster
Web service/data service cluster
Static resource (image, documents, multi-media) cluster
Database cluster
You'd be surprised how much traffic a static resource server can handle. Since it's not really computing anything (no app logic), a response can be optimized like crazy. If you go with a separate static resource cluster, you also leave yourself open to changing just that portion of your architecture. For instance, in some benchmarks lighttpd is even faster at serving static resources than Apache. If you have a separate cluster, you can change your HTTP server there without changing anything else in your app environment.
I'd start with a 2-machine static resource cluster and see how that performs. That's another benefit of separating functions - you can scale out only where you need it. As far as synchronizing files, take a look at existing file synchronization tools versus rolling your own. You may find something that does what you need without having to write a line of code.
Serving the images from wherever you decide to store them is a trivial problem; I won't discuss how to solve it.
Deciding where to store them is the real decision you need to make. You need to think about what your goals are:
Redundancy of hardware
Lots of cheap storage
Read-scaling
Write-scaling
The last two are not the same and will definitely cause problems.
If you are confident that the size of this image library will not exceed the discs you're happy to put in your web servers (say, 200 GB at the time of writing, being the largest high-speed server-grade discs that can be obtained; I assume you want to use 1U web servers, so you won't be able to store more than that in RAID 1, depending on your vendor), then you can get very good read-scaling by placing a copy of all the images on every web server.
Of course you might want to keep a master copy somewhere too, and have a daemon or process which syncs them from time to time, and have monitoring to check that they remain in sync and this daemon works, but these are details. Keeping a copy on every web server will make read-scaling pretty much perfect.
But keeping a copy everywhere will ruin write-scalability, as every single web server will have to write every changed / new file. Therefore your total write throughput will be limited to the slowest single web server in the cluster.
"Sharding" your image data between many servers will give good read/write scalability, but is a nontrivial exercise. It may also allow you to use cheap(ish) storage.
Having a single central server (or active/passive pair or something) with expensive IO hardware will give better write-throughput than using "cheap" IO hardware everywhere, but you'll then be limited by read-scalability.
Having your images in a database doesn't necessarily mean a database call for each one; you could cache these separately on each host (e.g. in temporary files) when they are retrieved. The source images would still be in the database and easy to synchronise across servers.
You also don't really need to add Apache handlers to serve an image through a PHP script whilst maintaining nice URLs: you can make URLs like http://server/image.php/param1/param2/param3.JPG and read the parameters through $_SERVER['PATH_INFO']. You could also remove the 'image.php' portion of the URL (if you needed to) using mod_rewrite.
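A rough Go sketch of that "database as source of truth, local files as cache" idea; loadFromDB and the images table are hypothetical, and real code would also want cache invalidation and content-type handling.

package imgcache

import (
	"database/sql"
	"io"
	"net/http"
	"os"
	"path/filepath"
)

// cachedImage serves an image that canonically lives in the database but is
// written through to a local cache directory on first request.
func cachedImage(w http.ResponseWriter, r *http.Request, db *sql.DB, cacheDir string) {
	name := filepath.Base(r.URL.Path)
	cached := filepath.Join(cacheDir, name)

	// Fast path: serve the locally cached copy if it exists.
	if f, err := os.Open(cached); err == nil {
		defer f.Close()
		io.Copy(w, f)
		return
	}

	// Slow path: fetch from the database and cache it for next time.
	data, err := loadFromDB(db, name)
	if err != nil {
		http.NotFound(w, r)
		return
	}
	_ = os.WriteFile(cached, data, 0640) // best effort
	w.Write(data)
}

// loadFromDB is a hypothetical query against an images(name, data) table.
func loadFromDB(db *sql.DB, name string) ([]byte, error) {
	var data []byte
	err := db.QueryRow(`SELECT data FROM images WHERE name = ?`, name).Scan(&data)
	return data, err
}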
What you are looking for already exists and is called MogileFS.
The target setup involves mogilefsd, replicated MySQL databases and lighttpd/Perlbal for serving files. It will bring you failover and fine-grained file replication (for example, you can decide to duplicate end-user images on several physical devices, and to keep only one physical instance of thumbnails). Load balancing can also be achieved quite easily.
In ASP.NET, I grew to love the Application and Cache stores. They're awesome. For the uninitiated, you can just throw your data-logic objects into them, and hey presto, you only need to query the database once for a bit of data.
By far one of the best ASP.NET features, IMO.
I've since ditched Windows for Linux, and therefore ASP.NET for PHP, Python and Ruby for web dev. I use PHP the most because I dev several open source projects, all using PHP.
Needless to say, I've explored what PHP has to offer in terms of caching data-objects. So far I've played with:
Serializing to file (a pretty slow/expensive process)
Writing the data to file as JSON/XML/plaintext/etc (even slower for read ops)
Writing the data to file as pure PHP (the fastest read, but quite a convoluted write op)
I should stress now that I'm looking for a solution that doesn't rely on a third-party app (e.g. memcached), as the apps are installed in all sorts of scenarios, most of which don't have install rights (e.g. a cheap shared hosting account).
So, back to what I'm doing now: is persisting to file secure? Rule 1 in production server security has always been to disable file writing, but I really don't see any way PHP could cache if it couldn't write. Are there any tips and/or tricks to boost the security?
Is there another persist-to-file method that I'm forgetting?
Are there any better methods of caching in "limited" environments?
Serializing is quite safe and commonly used. There is an alternative, however, and that is to cache in memory. Check out memcached and APC; they're both free and highly performant. This article on different caching techniques in PHP might also be of interest.
Re: Is there another persist-to-file method that I'm forgetting?
It's of limited utility but if you have a particularly beefy database query you could write the serialized object back out to an indexed database table. You'd still have the overhead of a database query, but it would be a simple select as opposed to the beefy query.
Re: Is persisting to file secure? (and cheap shared hosting accounts)
The sad fact is cheap shared hosting isn't secure. How much do you trust the 100, 500, or 1000 other people who have access to your server? For historic and (ironically) security reasons, shared hosting environments have PHP/Apache running as an unprivileged user (with PHP running as an Apache module). The security rationale here is that if the world-facing Apache process gets compromised, the exploiters only have access to an unprivileged account that can't screw with important system files.
The bad part is, that means whenever you write to a file using PHP, the owner of that file is the same unprivileged Apache user. This is true for every user on the system, which means anyone has read and write access to the files. The theoretical hackers in the above scenario would also have access to the files.
There's also a persistent bad practice in PHP of giving directories and files permissions of 777 to enable the unprivileged Apache user to write files out, and then leaving them in that state. That gives anyone on the system read/write access.
Finally, you may think obscurity saves you. "There's no way they can know where my secret cache files are", but you'd be wrong. Shared hosting sets up users in the same group, and most default file masks will give your group read permission on files you create. SSH into your shared hosting account sometime, navigate up a directory, and you can usually start browsing through other users' files on the system. This can be used to sniff out writable files.
The solutions aren't pretty. Some hosts will offer a CGI Wrapper that lets you run PHP as a CGI. The benefit here is PHP will run as the owner of the script, which means it will run as you instead of the unprivileged user. Problem averted! New Problem! Traditional CGI is slow as molasses in February.
There is FastCGI, but FastCGI is finicky and requires constant tuning. Not many shared hosts offer it. If you find one that does, chances are they'll have APC enabled, and may even be able to provide a mechanism for memcached.
I had a similar problem, and thus wrote a solution: a memory cache written in PHP. It only requires the PHP build to support sockets. Other than that, it is a pure PHP solution and should run just fine on shared hosting.
http://code.google.com/p/php-object-cache/
What I always do if I have to be able to write is to ensure I'm not writing anywhere I have PHP code. Typically my directory structure looks something like this (it's varied between projects, but this is the general idea):
project/
    app/
    html/
        index.php
    data/
    cache/
app is not writable by the web server (neither is index.php, preferably). cache is writable and used for caching things such as parsed templates and objects. data is possibly writable, depending on need. That is, if the users upload data, it goes into data.
The web server gets pointed to project/html and whatever method is convenient is used to set up index.php as the script to run for every page in the project. You can use mod_rewrite in Apache, or content negotiation (my preference but often not possible), or whatever other method you like.
All your real code lives in app, which is not directly accessible by the web server, but should be added to the PHP path.
This has worked quite well for me for several projects. I've even been able to get, for instance, Wikimedia to work with a modified version of this structure.
Oh... and I'd use serialize()/unserialize() to do the caching, although generating PHP code has a certain appeal. All the templating engines I know of generate PHP code to execute, making post-parse very fast.
If you have access to the database query cache (i.e. MySQL), you could go with serializing your objects and storing them in the DB. The database will take care of holding the query results in memory, so that should be pretty fast.
You don't spell out -why- you're trying to cache objects. Are you trying to speed up a slow database query, work around expensive object instantiation, avoid repeated generation of complex pages, maintain application state, or are you just compulsively storing away objects in case of a long winter?
The best solution, given the atrocious limitations of most low-cost shared hosting, is going to depend on what you're trying to accomplish. Going for bottom-of-the-barrel shared hosting means you have to accept that you won't be working with the best tools. The numbers are hard to quantify, but there's a trade-off between hosting costs, site performance and developer time (i.e. fast, cheap or easy).
It's in theory possible to store objects in sessions. That might get you past the disabled-file-writing problem. Additionally, you could store the session in a MySQL memory-backed table to speed up the queries.
Some hosting places may have APC compiled in. That would allow you to store the objects in memory.