File creation limits - php

Dear respected developers across the globe,
I'm only seeking knowledge and understanding. Please provide as much
information as you can to not only help me out, but also others around the
world.
I will divide my questions so it's easier to understand what I'm asking for
and allow you to answer the questions individually for better
understanding.
My questions are:
a)
I'm running a home server using MAMP PRO on Windows 8 (32 GB RAM, 4k SSD, i7 CPU). My server's DNS is set up with cloudflare.com. When people from the public visit my site, a PHP script creates a text file named after their username, like {username}.txt, one for every user. The same file gets re-created every time the user logs in, to keep the data about them fresh. That was some background on what I'm doing. What I want to understand is: are there any limits? Let's say 500000 people try to reach my site at the same time, and each request makes my site create a fresh txt file for that user. Will it work? Are there any problems? Please share with me.
b)
Can a text file be viewed by, let's say, 1000000 people at the same time? I'm talking about viewing here, not creating.

The number of files is not limited at all by the operating system or by PHP.
But it is limited by the file system you save the files in. The exact numbers depend on the type of file system and its configuration. A commonly cited limit is around 32000 entries in a single directory (ext3, for example, caps the number of subdirectories per directory at roughly that figure), but as mentioned, this varies and can often be configured.
What is typically done in such cases is that you spread all those files over directories, where the directories are named by substrings of the file name itself. So for example the file somegoodguy.txt is saved under /som/ego/odg/somegoodguy.txt. Provided the characters in the names are more or less evenly distributed, that should keep you from hitting any limits, since the files are spread evenly over many, many folders.
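The directory-spreading scheme described above can be sketched as follows. This is only an illustration in Python (the same logic ports directly to PHP); the function name `sharded_path` and the choice of three levels of three characters each are assumptions taken from the somegoodguy.txt example, not a fixed rule.

```python
from pathlib import PurePosixPath


def sharded_path(filename: str, levels: int = 3, width: int = 3) -> str:
    """Build a nested relative path from the leading characters of the file name."""
    stem = PurePosixPath(filename).stem  # "somegoodguy.txt" -> "somegoodguy"
    parts = [stem[i * width:(i + 1) * width] for i in range(levels)]
    parts = [p for p in parts if p]  # very short names simply get fewer levels
    return "/".join(parts + [filename])


print(sharded_path("somegoodguy.txt"))
```

Short names end up in shallower directories, which is usually acceptable; hashing the name first is a common variation when usernames share long prefixes.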
However:
It is questionable if that is a good approach at all. File based storage is not exactly efficient. You get a much better performance if you use a database instead. One entry (row) per user in a database table. Accessing that information is really efficient and fast. And you don't have to worry about any limits.
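As a rough sketch of the "one row per user" idea, here is a minimal example using SQLite from Python. The table name `user_data`, its columns, and the helper functions are hypothetical; a real setup would more likely use MySQL from PHP, but the shape of the solution is the same: replacing the row on each login plays the role of re-creating {username}.txt.

```python
import sqlite3


def save_user(conn, username, data):
    # Overwrite the user's single row, like re-creating {username}.txt on login.
    conn.execute(
        "INSERT OR REPLACE INTO user_data (username, data) VALUES (?, ?)",
        (username, data),
    )
    conn.commit()


def load_user(conn, username):
    row = conn.execute(
        "SELECT data FROM user_data WHERE username = ?", (username,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute(
    "CREATE TABLE user_data (username TEXT PRIMARY KEY, data TEXT NOT NULL)"
)
save_user(conn, "somegoodguy", "first login")
save_user(conn, "somegoodguy", "second login")
```

Because `username` is the primary key, each user has at most one row no matter how often they log in, and lookups go through the index rather than the file system.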

Related

I get mass traffic to one of my PHP files, should I split to several same files?

OK, so I get lots of traffic from around the world to a PHP file that is hosted on my server. This file runs several checks against the visitor, runs several SQL queries, and decides what to do based on the user's status.
I'm getting something like hundreds of hits per second.
So, my question is:
Should I create many identical files, and randomly direct the traffic to each of the files I created?
I want to avoid traffic loss and overload, but I don't know if splitting into different files even matters.
Thanks for all the helpers.

Input on decision: file hosting with amazon s3 or similar and php

I appreciate your comments to help me decide on the following.
My requirements:
I have a site hosted on a shared server and I'm going to provide content to my users: about 60 GB of content (about 2000 files of 30 MB each; users will have access to only 20 files at a time). I estimate about 100 GB of monthly bandwidth usage.
Once a user registers for the content, links will be accessible for the user to download. But I want the links to expire in 7 days, with the possibility of extending the expiration time.
I think that the disk space and bandwidth calls for a service like Amazon S3 or Rackspace Cloud files (or is there an alternative? )
To manage the expiration I plan to somehow obtain links that expire (I think S3 has that feature, not Rackspace) OR control the expiration date on my database and have a batch process that will rename on a daily basis all 200 files on the cloud and on my database (in case a user copied the direct link, it won't work the next day, only my webpage will have the updated links). PHP is used for programming.
So what do you think? Is cloud file hosting the way to go? Which one? Does managing the links that way make sense, or is it too difficult to do through programming (sending commands to the cloud server...)?
EDIT:
Some hosting companies offer unlimited space and bandwidth on their shared plans. I asked their support staff and they said that they really honor the "unlimited" deal. So 100 GB of transfer a month is OK; the only thing to look out for is CPU usage. So shared hosting is one more alternative to choose from.
FOLLOWUP:
So digging more into this I found that the TOS of the unlimited plans say that it is not permitted to use the space primarily to host multimedia files. So I decided to go with Amazon S3 and the solution provided by Tom Andersen.
Thanks for the input.
I personally don't think you necessarily need to go to a cloud based solution for this. It may be a little costly. You could simply get a dedicated server instead. One provider that comes to mind gives 3,000 GB/month of bandwidth on some of their lowest level plans. That is on a 10Mbit uplink; you can upgrade to 100Mbps for $10/mo or 1Gbit for $20/mo. I won't mention any names, but you can search for dedicated servers and possibly find one to your liking.
As for expiring the files, just implement that in PHP backed by a database. You won't have to move files around: store all the files in a directory that is not accessible from the web, and use a PHP script to determine whether the link is valid. If it is, read the contents of the file and pass them through to the browser; if it is invalid, show an error message instead. It's a pretty simple concept, and I think there are a lot of pre-written scripts available that do this, but depending on your needs it isn't too difficult to do it yourself.
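The link check described above might look roughly like this. It is a sketch in Python rather than PHP, and the `links` dictionary is only a stand-in for the database table; the function names, the token length, and the file path are all hypothetical.

```python
import secrets

SEVEN_DAYS = 7 * 24 * 3600  # expiry window from the question, in seconds


def issue_link(links, file_path, now, ttl=SEVEN_DAYS):
    """Create a random token for a file and remember when it expires."""
    token = secrets.token_urlsafe(16)
    links[token] = {"path": file_path, "expires_at": now + ttl}
    return token


def resolve_link(links, token, now):
    """Return the private file path if the token is known and unexpired, else None."""
    entry = links.get(token)
    if entry is None or now >= entry["expires_at"]:
        return None
    return entry["path"]
```

The download script would call something like `resolve_link` and either stream the file from the private directory or show the error message. Extending a link is then just a matter of updating its stored `expires_at` value.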
Cloud hosting has advantages, but right now I think it's costly, and if you aren't trying to spread the load geographically, or planning to support thousands of simultaneous users who need the elasticity of the cloud, you could possibly use a dedicated server instead.
Hope that helps.
I can't speak for S3 but I use Rackspace Cloud files and servers.
It's good in that you don't pay for incoming bandwidth, so uploads are super cheap.
I would do it like this:
Upload all the files you need to a 'private' container
Create a public container with CDN enabled
That'll give you a special url like http://c3214146.r65.ce3.rackcdn.com
Make your own CNAME DNS record for your domain point to that, like: http://cdn.yourdomain.com
When a user requests a file, use the COPY api operation with a long random filename to do a server side copy from the private container to the public container.
Store the filename in a mysql DB for your app
Once the file expires, use the DELETE api operation, then the PURGE api operation to get it out of the CDN .. finally delete the record from the mysql table.
With the PURGE command .. I heard it doesn't work 100% of the time and it may leave the file around for an extra day .. also the docs say to reserve its use for emergencies only.
Edit: I just heard, there's a 25 purge per day limit.
However personally I've just used delete on objects and found that took it out the CDN straight away. In summary, the worst case would be that the file would still be accessible on some CDN nodes for 24 hours after deletion.
Edit: You can change the TTL (caching time) on the CDN nodes .. default is 72 hours so it might pay to set it to something lower .. but not so low that you lose the advantage of the CDN.
The advantages I find with the CDN are:
It pushes content right out to end users far away from the USA servers and gives super fast download times for them
If you have a super popular file .. it won't take out your site when 1000 people start trying to download it .. as they'd all get copies pushed out to whatever CDN node they were closest to.
You don't have to rename the files on S3 every day. Just make them private (which is the default), and hand out time-limited URLs valid for a day or a week to anyone who is authorized.
I would consider making the links only good for 20 mins, so that a user has to re-login in order to re-download the files. Then they can't even share the links they get from you.
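To make the idea of a time-limited URL concrete, here is a generic sketch of how such links can work: the expiry time is embedded in the URL and protected by an HMAC signature, so nothing needs to be renamed or stored per link. This is not the actual S3 signing scheme (S3 provides its own pre-signed URLs through its SDKs); the secret key, parameter names, and functions below are all assumptions for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical server-side secret, never sent to users


def _signature(path, expires_at, key):
    message = f"{path}:{expires_at}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(path, expires_at, key=SECRET_KEY):
    """Return the path with an expiry timestamp and a signature appended."""
    return f"{path}?expires={expires_at}&sig={_signature(path, expires_at, key)}"


def is_valid(path, expires_at, signature, now, key=SECRET_KEY):
    """Accept the link only if it is unexpired and the signature matches."""
    expected = _signature(path, expires_at, key)
    return now < expires_at and hmac.compare_digest(expected, signature)
```

A 20-minute link would simply use `expires_at = now + 20 * 60`. Changing the path or the expiry in the URL invalidates the signature, so users cannot extend a link themselves.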

ImageMagick server requirements

I'm in the process of building a simple website, or to be more precise a simple component for a website that adds a watermark to an image, creates a few different size images, and overlays it onto a few products. These edits will be made every time someone queries an image in a certain directory on the server.
I know this can all be done with imagemagick, my only concern is that the whole website will grind to a halt every time someone views their image for the first time (after the edit's been made once, the database is updated to get the edited version every time a user accesses it).
The website isn't hosted yet, for the time being I'm testing on XAMPP, but I figured for this I'm going to need a virtual or dedicated server, I just need some advice on what sort of hardware specs I ought to be looking at. I doubt more than 2 or 3 people will be viewing photos at any one time, but at a guess I need to be sure that the server can handle up to 10 or so and still be functional.
Hope someone can advise on this, cheers!
For image processing you need some CPU power. If the images are large they will also consume memory. But I think you should work with caching in any case. I don't know your application, but there are certainly ways to cache images in the filesystem once they have been rendered.
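A minimal sketch of that caching idea, in Python: the expensive rendering step runs only when no cached copy exists yet. The `render` callable stands in for whatever ImageMagick call does the watermarking and resizing; it, the function name, and the cache layout are hypothetical.

```python
from pathlib import Path


def cached_render(source: Path, cache_dir: Path, render) -> Path:
    """Return the path of the processed image, rendering it only on first request."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / source.name
    if not target.exists():
        # Expensive step: happens once per image, later requests hit the cache.
        target.write_bytes(render(source.read_bytes()))
    return target
```

This sketch never invalidates the cache; if source images can change, comparing modification times or including a version in the cached file name would be needed.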

What's the best way to manage multiple media servers, and file allocations between them?

I have a file hosting website that's burning through 2 Gbit of bandwidth, so I need to start adding secondary media servers to store the files. What would be the best way to manage a multiple-server setup with a large number of files? Preferably through PHP only.
Currently, I only have around 100 GB of files... so I could get a 2nd server, mirror all content between them, and then round-robin the traffic 50/50, 33/33/33, etc. But once the total amount of files grows beyond the capacity of a single server, this won't work.
The idea that I had was to have a list of media servers stored in the DB, with the amount of free space left on each server. Once a file is uploaded, PHP will choose which server the file is actually uploaded to, and spread all the files evenly among the servers.
Was hoping to get some more input/inspiration.
Can't use any 3rd party services like Amazon. The files range from several bytes to a gigabyte.
Thanks
You could try MogileFS. It is a distributed file system. Has a good API for PHP. You can create categories and upload a file to that category. For each category you can define on how many servers it should be distributed. You can use the API to get a URL to that file on a random node.
If you are doing as much data transfer as you say, it would seem whatever it is you are doing is growing quite rapidly.
It might be worth your while to contact your hosting provider and see if they offer any sort of shared storage solutions via iscsi, nas, or other means. Ideally the storage would not only start out large enough to store everything you have on it, but it would also be able to dynamically grow beyond your needs. I know my hosting provider offers a solution like this.
If they do not, you might consider colocating your servers somewhere that either does offer a service like that, or would allow you install your own storage server (which could be built cheaply from off the shelf components and software like Freenas or Openfiler).
Once you have a centralized storage platform, you could then add web-servers to your hearts content and load balance them based on load, all while accessing the same central storage repository.
Not only is this the correct way to do it, it would offer you much more redundancy and expandability in the future if your endeavor continues to grow at the pace it is currently growing.
The other solutions offered, using a database record of what is stored where, would work, but they add not only an extra layer of complexity but also an extra layer of processing between your visitors and the data they wish to access.
What if you lost a hard disk, do you lose 1/3 or 1/2 of all your data?
Should the heavy IO's of static content be on the same spindles as the rest of your operating system and application data?
Your best bet is really to get your files into some sort of storage that scales. Storing files locally should only be done with good reason (they are sensitive, private, etc.)
Your best bet is to move your content into the cloud. Mosso's CloudFiles or Amazon's S3 will both allow you to store an almost infinite amount of files. All your content is then accessible through an API. If you want, you can then use MySQL to track meta-data for easy searching, and let the service handle the actual storage of the files.
i think your own idea is not the worst one. get a bunch of servers, and for every file store which server(s) it's on. if new files are uploaded, use most-free-space first*. every server handles its own delivery (instead of piping through the main server).
pros:
use multiple servers for a single file. e.g. for cutekitten.jpg: filepath="server1\cutekitten.jpg;server2\cutekitten.jpg", and then choose the server depending on the server load (or randomly, or alternating, ...)
if you're careful you may be able to move around files automatically depending on the current load. so if your cute-kitten image gets reddited/slashdotted hard, move it to the server with the lowest load and update the entry.
you could do this with a cron-job. just log the downloads for the last xx minutes. try some formula like (downloads-per-minute * filesize * (product of serverloads)) for weighting. pick thresholds for increasing/decreasing the number of servers those files are distributed to.
if you add a new server, it's relatively painless (just add the address to the server pool)
cons:
homebrew solutions are always risky
your load distribution algorithm must be well tested, otherwise bad things could happen (everything mirrored everywhere)
constantly moving files around for balancing adds additional server load
* or use a mixed weighting algorithm: free-space, server-load, file-popularity
disclaimer: never been in the situation myself, just guessing.
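The "most-free-space first" placement rule could be sketched like this. It is a Python illustration only; the `free_bytes` dictionary stands in for the table of media servers in the database, and the server names and function names are made up.

```python
def pick_server(free_bytes, file_size):
    """Choose the server with the most free space that can still hold the file."""
    candidates = {
        name: free for name, free in free_bytes.items() if free >= file_size
    }
    if not candidates:
        raise RuntimeError("no media server has enough free space")
    return max(candidates, key=candidates.get)


def record_upload(free_bytes, file_size):
    """Pick a server and update its remaining free space."""
    server = pick_server(free_bytes, file_size)
    free_bytes[server] -= file_size
    return server
```

A mixed weighting (free space, server load, file popularity) would replace the `max` key with a scoring function, but that scoring is exactly the part that needs careful testing.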
Consider HDFS, which is part of Apache's Hadoop. This will integrate with PHP, but you'll be setting up a second application. This will also solve all your points of balancing among servers and handling things when your file space usage exceeds one server's ability. It's not purely in PHP, though, but I don't think that's what you meant when you said "pure" anyway.
See http://hadoop.apache.org/core/docs/current/hdfs_design.html for the idea of it. They cover the whole idea of how it handles large files, many files, replication, etc.

File / Image Replication

I have a simple question and wish to hear others' experiences regarding which is the best way to replicate images across multiple hosts.
I have determined that storing images in the database and then using database replication over multiple hosts would result in maximum availability.
The worry I have with the filesystem is the difficulty synchronising the images (e.g I don't want 5 servers all hitting the same server for images!).
Now, the only concerns I have with storing images in the database are the extra queries hitting the database and the extra handling I'd have to put in place in Apache if I wanted 'virtual' image links to point to database entries (e.g. AddHandler).
As far as my understanding goes:
If you have a script serving up the images: each image would require a database call.
If you display the images inline as binary data: this could be done in a single database call.
To provide external / linkable images you would have to add an AddHandler for the extension you wish to 'fake' and point it to your scripting language (e.g. PHP, ASP).
I might have missed something, but I'm curious if anyone has any better ideas?
Edit:
Tom has suggested using mod_rewrite to avoid needing an AddHandler. I have accepted that as a proposed solution to the AddHandler issue; however, I don't feel like I have a complete solution yet, so please, please, keep answering ;)
A few have suggested using lighttpd over Apache. How different are the ISAPI modules for lighttpd?
If you store images in the database, you take an extra database hit plus you lose the innate caching/file serving optimizations in your web server. Apache will serve a static image much faster than PHP can manage it.
In our large app environments, we use up to 4 clusters:
App server cluster
Web service/data service cluster
Static resource (image, documents, multi-media) cluster
Database cluster
You'd be surprised how much traffic a static resource server can handle. Since it's not really computing (no app logic), a response can be optimized like crazy. If you go with a separate static resource cluster, you also leave yourself open to change just that portion of your architecture. For instance, in some benchmarks lighttpd is even faster at serving static resources than apache. If you have a separate cluster, you can change your http server there without changing anything else in your app environment.
I'd start with a 2-machine static resource cluster and see how that performs. That's another benefit of separating functions - you can scale out only where you need it. As far as synchronizing files, take a look at existing file synchronization tools versus rolling your own. You may find something that does what you need without having to write a line of code.
Serving the images from wherever you decide to store them is a trivial problem; I won't discuss how to solve it.
Deciding where to store them is the real decision you need to make. You need to think about what your goals are:
Redundancy of hardware
Lots of cheap storage
Read-scaling
Write-scaling
The last two are not the same and will definitely cause problems.
If you are confident that the size of this image library will not exceed the disc you're happy to put on your web servers (say, 200G at the time of writing, as being the largest high speed server-grade discs that can be obtained; I assume you want to use 1U web servers so you won't be able to store more than that in raid1, depending on your vendor), then you can get very good read-scaling by placing a copy of all the images on every web server.
Of course you might want to keep a master copy somewhere too, and have a daemon or process which syncs them from time to time, and have monitoring to check that they remain in sync and this daemon works, but these are details. Keeping a copy on every web server will make read-scaling pretty much perfect.
But keeping a copy everywhere will ruin write-scalability, as every single web server will have to write every changed / new file. Therefore your total write throughput will be limited to the slowest single web server in the cluster.
"Sharding" your image data between many servers will give good read/write scalability, but is a nontrivial exercise. It may also allow you to use cheap(ish) storage.
Having a single central server (or active/passive pair or something) with expensive IO hardware will give better write-throughput than using "cheap" IO hardware everywhere, but you'll then be limited by read-scalability.
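As a very rough illustration of what "sharding" means here, each image name can be mapped deterministically to one server by hashing it. This Python sketch shows only the simplest form; the server names are invented, and it ignores the hard parts (replication, rebalancing) that make sharding nontrivial.

```python
import hashlib


def shard_for(image_name, servers):
    """Map an image name to one server, always the same one for the same name."""
    digest = hashlib.sha256(image_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


servers = ["img1.example.com", "img2.example.com", "img3.example.com"]
print(shard_for("cutekitten.jpg", servers))
```

Both reads and writes for a given image go to a single server, which is what lets read and write capacity grow with the number of servers. The catch is that with plain modulo hashing, adding a server changes the mapping for most images, which is one reason schemes such as consistent hashing exist.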
Having your images in a database doesn't necessarily mean a database call for each one; you could cache these separately on each host (e.g. in temporary files) when they are retrieved. The source images would still be in the database and easy to synchronise across servers.
You also don't really need to add Apache handlers to serve an image through a PHP script whilst maintaining nice urls- you can make urls like http://server/image.php/param1/param2/param3.JPG and read the parameters through $_SERVER['PATH_INFO'] . You could also remove the 'image.php' portion of the URL (if you needed to) using mod_rewrite.
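Splitting such a URL into its parameters is straightforward. The sketch below, in Python, takes the string that PHP would expose as `$_SERVER['PATH_INFO']` (for example `/param1/param2/param3.JPG`) and separates the leading parameters from the file name; the function name and return shape are assumptions.

```python
def parse_image_path(path_info):
    """Split '/param1/param2/file.JPG' into (['param1', 'param2'], 'file.JPG')."""
    segments = [s for s in path_info.split("/") if s]
    if not segments:
        raise ValueError("no image requested")
    *params, filename = segments
    return params, filename


print(parse_image_path("/param1/param2/param3.JPG"))
```

The parameters should still be validated before they are used to look anything up, since they come straight from the URL.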
What you are looking for already exists and is called MogileFS.
The target setup involves mogilefsd, replicated MySQL databases and lighttpd/perlbal for serving files. It will bring you failover and fine-grained file replication (for example, you can decide to duplicate end-user images on several physical devices, and to keep only one physical instance of thumbnails). Load balancing can also be achieved quite easily.
