Referencing Files from the Cloud - PHP

I asked a question earlier about the difference between cloud apps and web apps, and the answers and links I received led me to believe that 'cloud' refers more to where an application is hosted than to a specific kind of application. That prompts these questions:
1) If I'm developing an application that will be based in the cloud using PHP and MySQL: a traditional server setup requires PHP and MySQL engines on the server; otherwise, the application won't run. Is it the same with the cloud? Do I have to look for cloud providers that offer these engines, install them myself, or are they not needed at all?
2) When building applications, files are usually referenced relatively or absolutely, based on their location relative to the calling file. In the cloud, since you don't know where the files are located, how do you reference the required files? Do you have to use URLs for that?
I've pored over many of the cloud questions on here, and it seems that there are a lot of confused souls out there just like myself, and most of the answers don't seem too convincing. Hence, my reason for asking again.
Thanks.

The cloud doesn't mean you don't know the location of your files; it only means that the files are (possibly) not stored on the end user's computer. From your perspective as the developer of the web application, you still will (indeed must) know the locations of any stored files, since it is your application storing them.
To give your end user a reference URL to a file, you can do many different things. One method, for example, involves storing some kind of unique identifier along with the file path to a stored file on your server together in a database. You give the user a URL that references the unique identifier, and in your code you then retrieve the file from disk and stream it down to the user using the correct headers.
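For example, a minimal sketch of that first method might look like the following (the table and column names here are assumptions for illustration, not anything from the question):

```php
<?php
// download.php?id=... — sketch of serving a stored file by its unique identifier.
// Assumes a "files" table with columns id, original_name, mime_type and disk_path.
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

$stmt = $pdo->prepare('SELECT original_name, mime_type, disk_path FROM files WHERE id = ?');
$stmt->execute([$_GET['id'] ?? '']);
$file = $stmt->fetch(PDO::FETCH_ASSOC);

if (!$file || !is_readable($file['disk_path'])) {
    http_response_code(404);
    exit('Not found');
}

// Send the correct headers, then stream the file from disk down to the user.
header('Content-Type: ' . $file['mime_type']);
header('Content-Length: ' . filesize($file['disk_path']));
header('Content-Disposition: attachment; filename="' . $file['original_name'] . '"');
readfile($file['disk_path']);
```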
Another method is to store files in the database as binary BLOBs, and retrieve the data and send it down to the browser with the correct headers. Again, you as the application developer are still responsible for the fate of those files, even though the end user doesn't need to worry about where or how they're stored.

Related

Back-end functionality for a file sharing website

How are file uploads to a clustered environment typically handled in PHP?
I am currently working on a backend system for a cloud storage/sharing website
I would like to make sure that a client's upload is stored on the node with the least load and the most available resources, creating a unique file reference that is stored in the database. Multiple uploads of the same file should also be detected to save space.
Are there any best practices, design patterns, or documentation covering this topic that might help illustrate what is needed to get this done?
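For the duplicate-detection part, one common approach (a hedged sketch, not something from the question; the table and column names are assumptions) is to key uploads by a content hash and only store the bytes once:

```php
<?php
// Sketch: detect duplicate uploads by content hash before storing the bytes again.
// Assumes a "files" table with columns id, sha1 and storage_node.
$pdo = new PDO('mysql:host=localhost;dbname=storage', 'user', 'pass');

$tmpPath = $_FILES['upload']['tmp_name'];
$hash    = sha1_file($tmpPath);

$stmt = $pdo->prepare('SELECT id FROM files WHERE sha1 = ?');
$stmt->execute([$hash]);

if ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // Same content already stored: just reference the existing copy.
    $fileId = $row['id'];
} else {
    // New content: pick a node (load-based selection not shown) and store the bytes once.
    $node = 'node-01'; // placeholder for "node with least load"
    move_uploaded_file($tmpPath, "/mnt/$node/" . $hash);
    $pdo->prepare('INSERT INTO files (sha1, storage_node) VALUES (?, ?)')
        ->execute([$hash, $node]);
    $fileId = $pdo->lastInsertId();
}
```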

Web Application Activation | Computer, Local Server

I have done some research and found some things that may be helpful.
I would like your opinion about my approaches on this.
THE GOAL
I will develop an application in PHP (that's the only language I know, and unfortunately I don't have time to learn another one right now). I want this application to be able to run offline and locally on any PC. I will use WampServer and the CakePHP framework for this.
THE PROBLEM
This application will be for sale, so I will need some activation method to prevent each copy from being used on multiple computers. I don't want something complicated or extremely secure; I just need something simple, to prevent non-programmers from running this app on any computer. Of course, the more secure, the better! :)
POSSIBLE SOLUTIONS I AM THINKING OF
First of all, I am thinking of forcing users to activate their application by going online during installation. That way they would get a unique KEY from my online database.
I found PHP's shell_exec() function. So I am thinking, during online installation, of getting the Host ID (Machine ID) of that computer, sending it to my server, and storing it in my database next to a unique KEY. The Machine ID and unique KEY could then be stored in a PHP file. (Could I store them somewhere more secure? Maybe encrypt them?)
Every time the user opens the application, PHP will read the Machine ID. If it's not the same as the one stored in the PHP file, activation will be required. (Maybe I could also store the computer's name, or some other ID?)
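As a rough sketch of what that check could look like (the command that returns a machine ID is OS-specific, so `hostid` here is just a placeholder assumption, as is the `activation.php` file layout):

```php
<?php
// Sketch: compare this computer's machine ID against the one saved at activation time.
// "hostid" is a Unix-style placeholder; on Windows you would shell out to something
// like "wmic csproduct get uuid" instead.
$currentId = trim(shell_exec('hostid'));

// Assumes activation.php simply returns ['machine_id' => ..., 'key' => ...].
$saved = @include __DIR__ . '/activation.php';

if (!is_array($saved) || $saved['machine_id'] !== $currentId) {
    // Machine ID doesn't match: send the user back through online activation.
    header('Location: activate.php');
    exit;
}
```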
Is that a good approach? Would it be possible?
Another approach I am thinking of is to have someone create a non-PHP installer. When run, it will prompt for the WAMP installation and, when that finishes, transfer all the necessary files to the WAMP root folder (automation for the user). I can only guess that this will work, though, as my knowledge of other languages is limited...
Could I benefit from this in terms of validation? Can a non-PHP file interact with my PHP application and validate it for only one unique computer?
Any info will be much appreciated. I have just started building the application and want to know if there is a good way (or not) to secure it.
Thanks!
There is no point to any of this, because if people want to, they can simply crack any copy protection method you come up with. This also applies to any app written in any other language. If people want to use it without permission, there are ways to do that.
There are some ways to obfuscate the code (see Is there a code obfuscator for PHP?), but these solutions are mostly pointless because if people really want to, they can get the code in plain text anyway.
A better idea might be to run the app on your server and let people pay for it monthly: Software as a Service, like Google Apps for Business.

Storing profile pictures. (Database or Filesystem?) [duplicate]

Possible Duplicate:
Storing Images in DB - Yea or Nay?
I need to store profile pictures. Lots of them.
I'm not sure if I should store them in the database. I'm not even sure if that's a good idea to begin with, or if I should just store them in a separate directory on the server and disallow access to them with .htaccess.
But I'm not overly familiar with .htaccess, and when I have used snippets to disallow access to a folder, it has just never worked.
I am using winhost.com to host my sites, so I would assume that .htaccess would work.
Can anyone suggest which way would be better for storing tens of thousands of profile pictures on a single server? I have read many blogs, forum posts etc that I've found on Google, and am a little bit more confused since half of them suggest one thing, and the other half disagree and suggest using a database would be perfectly fine.
Personal experience says that storing lots of images in a database makes the database very slow to back up. That can be irritating when you come to run repeatable tests or update the DB schema and want to take an ad-hoc backup, as well as in the general case. Also, depending on the database, storing BLOBs (which inevitably means that you're storing rows of non-fixed length) can make querying the table quite slow, although that can easily be fixed with appropriate indexing.
If you store them in the filesystem and serve them directly with your webserver as you suggest, one problem you will find is how to appropriately access-control them if you want only logged-in users to see them. That will depend on the design of your application and may not be a problem.
Two other options:
you can store them in the filesystem and serve them with an application page, so that it can e.g. check access control before fetching the image and sending it to the client.
you can use X-SendFile: headers if your webserver supports them to serve a file on the filesystem - the application page tells the webserver the file to fetch, and the webserver will fetch the file and send it. Potentially the application and the image files can live on different machines if you use e.g. FastCGI, and the image is never sent over the FastCGI connection.
You may also want to consider caching: if you write any programmatic way to send the file, you'll need to add additional logic so that the image can be cached by the browser, or you'll just end up serving the image over and over again and upping your bandwidth costs.
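A minimal sketch of that "application page" approach, combining the access check, an X-Sendfile handoff where available, and basic browser caching (the storage path and session key here are assumptions):

```php
<?php
// image.php?id=... — sketch: check access, set caching headers, then hand the file
// off to the webserver or stream it from PHP.
session_start();

if (empty($_SESSION['user_id'])) {          // only logged-in users may see the picture
    http_response_code(403);
    exit;
}

$id   = (int)($_GET['id'] ?? 0);
$path = "/var/app/profile-pics/{$id}.jpg";  // assumed storage layout

if (!is_readable($path)) {
    http_response_code(404);
    exit;
}

// Let the browser cache the image so it isn't re-served on every page view.
header('Cache-Control: private, max-age=86400');
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', filemtime($path)) . ' GMT');
header('Content-Type: image/jpeg');

if (function_exists('apache_get_modules') && in_array('mod_xsendfile', apache_get_modules(), true)) {
    header('X-Sendfile: ' . $path);         // webserver fetches and sends the file itself
} else {
    header('Content-Length: ' . filesize($path));
    readfile($path);                        // fall back to streaming from PHP
}
```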
There is a trade-off; it will depend on your exact situation and needs. The benefits of each include:
Filesystem
Performance, especially caching and I/O
Database
Easier to scale out to multiple web servers
Easier to administer (backup, security etc)
I'm guessing that you are using MySQL, but on the off chance that you have a SQL Server 2008 DB, have a look at FileStream in this SO article; it gives the best of both worlds.
I'd definitely root for storing only the image path in the database. Storing the image data will slow your site down and put extra strain on your system.
The only case where I could imagine an advantage in storing the image data inside the database would be if you're planning on moving the site around. Then you wouldn't have to worry about file paths, etc.

Future-proof file storage

I accept file uploads from users. Each file has a pointer in the db which has info on the file location in the filesystem.
Currently, I'm storing the files in the filesystem without any categorization, and each file is just named with a unique value. All categorisation, naming, etc. is done in the app using the db.
One factor I'm concerned about is file synchronization issues.
If I wanted to set up file system synchronization where, for example, the user's files are automatically updated by bridging with a pc app, would this system still work well?
I have no idea how such a system would work so hopefully I can get some input.
Basically, is representing a file's name and location purely in the database optimal, especially if said file may be synchronized with a pc application?
Yes, the way you are doing this is the best way to do it. You are using a file system to store files and a database to store structured data.
One suggestion I would make is that you create a directory tree on the file system. You may one day run up against a maximum-files-per-directory limit of your file system. I have built systems that create a new subdirectory for each day or week.
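For instance, a per-day tree like the one described might be created along these lines (a sketch; the base path is an assumption):

```php
<?php
// Sketch: store each upload under a per-day subdirectory to avoid huge flat directories.
$baseDir = '/var/app/uploads';                 // assumed document repository root
$subDir  = $baseDir . '/' . date('Y/m/d');     // e.g. /var/app/uploads/2013/05/17

if (!is_dir($subDir)) {
    mkdir($subDir, 0750, true);                // create the whole tree if needed
}

$target = $subDir . '/' . uniqid('', true);    // unique, meaningless filename
move_uploaded_file($_FILES['upload']['tmp_name'], $target);

// Store $target (or a path relative to $baseDir) in the database as before.
```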
Make sure you have good backups of the database as well as the document repository.
All you need to make such a system work is to make sure the API you use (or, more likely, create) can talk to the database and to the filesystem in a sensible way. Since this is what your site is already doing anyway, it shouldn't be hard to implement.
The mere fact that your files are given identifiers instead of plain-English names is mostly irrelevant with regard to remote synchronization.
Store a file hash (e.g. SHA-1) in the database rather than a path, and have a separate database connect the hash with the path. Write a small app that will synchronize the hash database, so that when you move your files to a different location it'll be easy to build a new database with updated paths.
That way you can also have the system load the file from a different location depending on which hash database you use to locate the file, so it offers some transparency if you need people to be able to access the same file from diverse locations (e.g. NFS or WebDAV).
We use exactly this model for file storage, along with (shameless plug) SabreDAV to make it seem to the end-user it's a normal filesystem.
I think this is a perfectly fine model; as long as the file lookup is documented and files can be retrieved easily, there shouldn't be an issue. Just make backups of your DB :)
One other piece of advice I can give: we use md5() on the file id to generate a unique filename, and we use parts of that hash to build a directory structure. For example, id 1 yields b026324c6904b2a9cb4b88d6d61c81d1, and the resulting filename becomes:
b02/632/4c6/904b2a9cb4b88d6d61c81d1
The reason for this is that most filesystems become very slow once a high number of files (or directories) accumulate in one directory. It's much, much faster to traverse a few subdirectories.
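A small helper in that spirit might look like this (a sketch; the exact digest depends on how the id is serialized before hashing):

```php
<?php
// Sketch: turn a file id into a nested path like b02/632/4c6/904b2a9cb4b88d6d61c81d1,
// so no single directory ends up holding millions of entries.
function idToStoragePath(string $id): string
{
    $hash = md5($id);   // exact digest depends on how the id is serialized before hashing
    return substr($hash, 0, 3) . '/'
         . substr($hash, 3, 3) . '/'
         . substr($hash, 6, 3) . '/'
         . substr($hash, 9);
}

// e.g. idToStoragePath('1') gives "c4c/a42/38a/0b923820dcc509a6f75849b"
```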
The Boring Answer™:
I think it depends on what you wanna do, as always :)
I mean take your regular web hosting company. Developers are synching files to web servers all the time. Would it make sense for a web server to store hash-generated file names in a db that pointed to physical files? No. Then you couldn't log in with your FTP-client and upload files like that, and you'd have to code a custom module to get Apache to work etc. Instant headache.
Does it make sense for Flickr to use a db? Yes, absolutely! (Then again, you can't log in with an FTP-client and manage your photos—and that's probably a good thing!)
Just remember, a file system is a (very simple) db too. And it's a db that comes with a lot of useful free tools.
my 2¢
/0

File / Image Replication

I have a simple question and wish to hear others' experiences regarding which is the best way to replicate images across multiple hosts.
I have determined that storing images in the database and then using database replication over multiple hosts would result in maximum availability.
The worry I have with the filesystem is the difficulty of synchronising the images (e.g. I don't want 5 servers all hitting the same server for images!).
Now, the only concerns I have with storing images in the database are the extra queries hitting the database and the extra handling I'd have to put in place in Apache if I wanted 'virtual' image links to point to database entries (e.g. AddHandler).
As far as my understanding goes:
If you have a script serving up the images: each image would require a database call.
If you display the images inline as binary data: this could be done in a single database call.
To provide external / linkable images, you would have to add an AddHandler for the extension you wish to 'fake' and point it to your scripting language (e.g. PHP, ASP).
I might have missed something, but I'm curious if anyone has any better ideas?
Edit:
Tom has suggested using mod_rewrite to avoid needing an AddHandler, which I have accepted as a proposed solution to the AddHandler issue; however, I don't feel like I have a complete solution yet, so please, please, keep answering ;)
A few have suggested using lighttpd over Apache. How different are the ISAPI modules for lighttpd?
If you store images in the database, you take an extra database hit plus you lose the innate caching/file serving optimizations in your web server. Apache will serve a static image much faster than PHP can manage it.
In our large app environments, we use up to 4 clusters:
App server cluster
Web service/data service cluster
Static resource (image, documents, multi-media) cluster
Database cluster
You'd be surprised how much traffic a static resource server can handle. Since it's not really computing (no app logic), a response can be optimized like crazy. If you go with a separate static resource cluster, you also leave yourself open to changing just that portion of your architecture. For instance, in some benchmarks lighttpd is even faster at serving static resources than Apache. If you have a separate cluster, you can change your HTTP server there without changing anything else in your app environment.
I'd start with a 2-machine static resource cluster and see how that performs. That's another benefit of separating functions - you can scale out only where you need it. As far as synchronizing files, take a look at existing file synchronization tools versus rolling your own. You may find something that does what you need without having to write a line of code.
Serving the images from wherever you decide to store them is a trivial problem; I won't discuss how to solve it.
Deciding where to store them is the real decision you need to make. You need to think about what your goals are:
Redundancy of hardware
Lots of cheap storage
Read-scaling
Write-scaling
The last two are not the same and will definitely cause problems.
If you are confident that the size of this image library will not exceed the disc you're happy to put in your web servers (say, 200 GB at the time of writing, roughly the largest high-speed server-grade disc that can be obtained; I assume you want to use 1U web servers, so you won't be able to store more than that in RAID 1, depending on your vendor), then you can get very good read-scaling by placing a copy of all the images on every web server.
Of course you might want to keep a master copy somewhere too, and have a daemon or process which syncs them from time to time, and have monitoring to check that they remain in sync and this daemon works, but these are details. Keeping a copy on every web server will make read-scaling pretty much perfect.
But keeping a copy everywhere will ruin write-scalability, as every single web server will have to write every changed / new file. Therefore your total write throughput will be limited to the slowest single web server in the cluster.
"Sharding" your image data between many servers will give good read/write scalability, but is a nontrivial exercise. It may also allow you to use cheap(ish) storage.
Having a single central server (or active/passive pair or something) with expensive IO hardware will give better write-throughput than using "cheap" IO hardware everywhere, but you'll then be limited by read-scalability.
Having your images in a database doesn't necessarily mean a database call for each one; you could cache these separately on each host (e.g. in temporary files) when they are retrieved. The source images would still be in the database and easy to synchronise across servers.
You also don't really need to add Apache handlers to serve an image through a PHP script whilst maintaining nice URLs; you can make URLs like http://server/image.php/param1/param2/param3.JPG and read the parameters through $_SERVER['PATH_INFO']. You could also remove the 'image.php' portion of the URL (if you needed to) using mod_rewrite.
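A minimal sketch of what that looks like (the parameter layout and storage path are assumptions for illustration):

```php
<?php
// image.php — sketch: read "/param1/param2/param3.JPG" from PATH_INFO instead of
// registering an AddHandler for the .JPG extension.
// (A mod_rewrite rule could additionally map /images/... to image.php/... to hide the script name.)
$pathInfo = $_SERVER['PATH_INFO'] ?? '';
$parts    = array_values(array_filter(explode('/', $pathInfo), 'strlen'));

if (count($parts) < 3) {
    http_response_code(400);
    exit;
}

// Example interpretation of the three parameters — adjust to whatever your app encodes.
[$album, $size, $filename] = $parts;

header('Content-Type: image/jpeg');
readfile('/var/images/' . $album . '/' . $size . '/' . basename($filename));
```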
What you are looking for already exists and is called MogileFS.
The target setup involves mogilefsd, replicated MySQL databases, and lighttpd/Perlbal for serving files. It gives you failover and fine-grained file replication (for example, you can decide to duplicate end-user images on several physical devices and keep only one physical instance of thumbnails). Load balancing can also be achieved quite easily.
