I have a simple CRM system that allows sales to put in customer info and upload appropriate files to create a project.
The system is already hosted in the cloud, but the office internet upload speed is horrendous. A single file can take 15 minutes or more to finish uploading, causing a bottleneck in the sales process.
Upgrading our office internet is not an option; what other good solutions are out there?
I propose splitting the project submission form into two parts. The project info fields are posted directly to our cloud-hosted web app and stored in the appropriate DB table, while the file itself is submitted to a LAN server with a simple DB and an API that the cloud-hosted web app can call to retrieve the file via a download link if it is ever needed again. The details still need to be worked out, but this is what I want to do in general.
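Roughly, the LAN-side receiver could be a minimal PHP endpoint like the sketch below; the table, paths, and field names are all illustrative:

```php
<?php
// LAN-side upload receiver (sketch): store the file locally, record it,
// and return an ID the cloud app can keep next to the project record.
$db = new PDO('mysql:host=localhost;dbname=lanfiles', 'user', 'pass');

$name = basename($_FILES['file']['name']);
$dest = '/srv/crm-files/' . uniqid('', true) . '-' . $name;
move_uploaded_file($_FILES['file']['tmp_name'], $dest);

$stmt = $db->prepare('INSERT INTO files (project_id, name, path) VALUES (?, ?, ?)');
$stmt->execute([$_POST['project_id'], $name, $dest]);

// The cloud app builds its download link against this server's API using the ID.
header('Content-Type: application/json');
echo json_encode(['file_id' => $db->lastInsertId()]);
```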
Is this a good approach to solving the slow upload problem? I've never done this before, so are there any obstacles to this implementation? Cross-domain restrictions come to mind, but I believe that can be worked around with an iframe.
If bandwidth is the bottleneck, then you need a solution that doesn't chew up all your bandwidth. You mentioned that you can't upgrade your bandwidth - what about putting in a second connection?
If not, the files need to stay on the LAN a little longer. It sounds like your plan is to keep the files on the LAN forever, but you could store them locally initially and then push them to the cloud later.
When you do copy the files out to the cloud, be sure to compress them, and set up rate limiting so they take up maybe 10% of your available bandwidth during business hours.
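If you use rsync for that push, compression and rate limiting are one flag each. A sketch of the cron entry, with hypothetical paths, host, and cap (--bwlimit is in KB/s, so pick roughly 10% of your uplink):

```
# crontab on the LAN server: push queued uploads hourly.
# -z compresses in transit; --bwlimit caps the transfer rate in KB/s.
0 * * * * rsync -az --bwlimit=100 /srv/crm/uploads/ crm@cloudhost:/srv/crm/uploads/
```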
Also put some monitoring in place to make sure the files are being sent in a timely manner.
I hope nobody needs to download those files! :(
So, for a simple test game, I'm working on generating user images based on their current in-game avatar. I got this idea from Club Penguin and GTA V. They both generate images of the current in-game avatar.
I created a script to simply put a few images together and print out the final image to the client. It's similar to how Club Penguin does it, I believe: http://cdn.avatar.clubpenguin.com/%7B13bcb2a5-2e21-442c-b8e4-10516be6abc6%7D/cp?size=300
As you can see, the penguin is wearing multiple clothing items. The items are each different images located at http://mobcdn.clubpenguin.com/game/items/images/paper/image/300/ (ex: http://mobcdn.clubpenguin.com/game/items/images/paper/image/300/210.png)
Anyway, I've already made the script and all, but I have a few questions.
When going to Club Penguin's or Grand Theft Auto's avatar generator, you'll notice the request finishes very fast. Even for a new user (i.e., before the image has been generated and cached), it finishes in under a second.
How could I speed up the image generation process? Right now I'm using PHP, but I could definitely switch to another language; I know a few others and I'm willing to learn. Which language can provide the fastest web image generator (it has to hit a database first to grab the user's avatar info)?
For server specs, how much RAM and so on would be a reasonable amount? Right now I'm testing on an OVH cloud server (VPS Cloud 2) and it's fine. But, for someone with experience in this: what might happen once traffic grows and clients make 100+ image requests each when they first log in (a relationship system shows their friends' avatars)? I'll probably use Cloudflare and other caching tools so that most requests are cached for up to 24 hours, but I can't rely on that completely.
tl;dr:
Two main questions:
What's the fastest way to generate avatars on the web (right now I'm using PHP)?
What are some good server specs for around 100+ daily unique clients (at minimum) using this server for generating these avatars?
Edit: Another question: which web server could process more requests for this? Right now I'm using Apache on this server, but my other servers use nginx for other API things (logging users in, getting info, etc.).
IMHO, the language is not the bottleneck; PHP is fast enough for real-time processing of small images. You just need the right algorithm. Also check out bytecode caching engines such as APC or XCache, or even HHVM; they can significantly improve PHP performance.
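To illustrate how little work the compositing itself is, here is a minimal sketch with PHP's GD that layers transparent PNGs in order. The layer list would really come from the user's avatar row in the DB; all paths here are made up:

```php
<?php
// Layer order matters: body first, then each clothing item on top.
// In the real script this list comes from the avatar lookup in the DB.
$layers = ['body.png', 'hat.png', 'scarf.png'];
$size   = 300;

$canvas = imagecreatetruecolor($size, $size);
imagesavealpha($canvas, true);
imagealphablending($canvas, false);
imagefill($canvas, 0, 0, imagecolorallocatealpha($canvas, 0, 0, 0, 127));
imagealphablending($canvas, true);

foreach ($layers as $file) {
    $layer = imagecreatefrompng('/var/avatars/items/' . $file);
    imagecopyresampled($canvas, $layer, 0, 0, 0, 0,
                       $size, $size, imagesx($layer), imagesy($layer));
    imagedestroy($layer);
}

header('Content-Type: image/png');
imagepng($canvas);
imagedestroy($canvas);
```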
I think any VPS can do the job until you have more than about 20 concurrent requests. The more clients use the service at the same time, the more RAM you need. You can easily determine your script's memory needs and other performance figures with a profiler such as XHProf.
Nginx or Lighttpd in FastCGI mode uses less RAM than the Apache HTTP server and can handle more concurrent connections. But that's not important until you actually have many concurrent connections.
Yes, PHP can do this job fast and flexibly (for example, generate.php?size=32).
I only know German hosting providers, but they also have an English interface: www.nitrado.net
I have a website that currently uses 2 servers, an application server and a database server. However, the load on the application server is increasing, so we are going to add a second application server.
The problem I have is that the website has users upload files to the server. How do I get the uploaded files onto both of the servers?
I do not want to store images directly in a database as our application is database intensive already.
Is there a way to sync the files across the servers, or is there something else I can do?
Any help would be appreciated.
Thanks
EDIT: I am adding the following links, which helped me understand this question better:
Synchronize Files on Multiple Servers
and
Keep Uploaded Files in Sync Across Multiple Servers - LAMP
For everyone reading this post: NFS seems to be the better of the two.
NFS will keep the files in sync; you could also use FTP to upload the files to all servers, but NFS looks like the way to go.
This is a question for Server Fault.
Anyway, I think you should definitely consider getting into the "cloud".
Syncing uploads from one server to another is simply unreliable - you have no idea what kind of errors you can get or why. The syncing process will also load both servers. For me, the proper solution is going to the cloud.
Should you choose the syncing method, you have a couple of solutions:
Use rsync to sync the files you need between the servers.
Use crontab to sync the files every X minutes/hours/days.
Copy the files upon some event (user login, etc.)
I got this answer from Server Fault:
The most appropriate course of action in a situation like this is to break the file share into a separate service of its own. Don't duplicate files if you have a network that can let the files be "everywhere (almost) at once." You can do this through NFS/CIFS or through a proper storage protocol like iSCSI. Mount as local storage in the appropriate directory. Depending on the performance of your network and your storage needs, this could add a couple of undetectable milliseconds to page load time.
So using NFS to share the files would work, OR
as stated by @kgb, you could designate one single server to hold all uploaded files and have the other servers pull from it (just make sure you run a cron job or something to back the files up).
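If you go the NFS route, the mount is a one-liner on each web server; the hostname and export path below are hypothetical:

```
# /etc/fstab on each application server
fileserver:/export/uploads  /var/www/uploads  nfs  defaults  0  0
```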
Most sites solve this problem by using a designated third-party file server like Amazon S3 for the user uploads.
Another answer could be to use a piece of software called BTSync. It is very easy to install and use, and it lets you keep files in sync across as many servers as you need. It takes only 3 terminal commands to install and is very efficient.
You could use the DB server for storage too... not in the DB itself, I mean; run a web server on that machine as well. It is not going to increase CPU load much, but it will require a better network link.
You could do it with rsync. People have suggested using NFS, but that way you create a single point of failure: if the NFS server goes down, both your servers are screwed. Correct me if I'm wrong.
I'd appreciate your comments to help me decide on the following.
My requirements:
I have a site hosted on a shared server and I'm going to provide content to my users: about 60 GB of content (about 2,000 files of 30 MB each; users will have access to only 20 files at a time). I calculate about 100 GB of monthly bandwidth usage.
Once a user registers for the content, links become available for the user to download. But I want the links to expire in 7 days, with the possibility of extending the expiration time.
I think the disk space and bandwidth call for a service like Amazon S3 or Rackspace Cloud Files (or is there an alternative?).
To manage the expiration I plan either to somehow obtain links that expire (I think S3 has that feature, not Rackspace) OR to track the expiration date in my database and have a daily batch process that renames all 200 files on the cloud and in my database (so if a user copied the direct link, it won't work the next day; only my web page will have the updated links). PHP is used for the programming.
So what do you think? Is cloud file hosting the way to go? Which one? Does managing the links that way make sense, or is it too difficult to do through programming (sending commands to the cloud server...)?
EDIT:
Some hosting companies offer unlimited space and bandwidth on their shared plans. I asked their support staff and they said they really honor the "unlimited" deal, so 100 GB of transfer a month is OK; the only thing to watch out for is CPU usage. So shared hosting is one more alternative to choose from.
FOLLOWUP:
Digging more into this, I found that the TOS of the unlimited plans say it is not permitted to use the space primarily to host multimedia files. So I decided to go with Amazon S3 and the solution provided by Tom Andersen.
Thanks for the input.
I personally don't think you necessarily need a cloud-based solution for this; it may be a little costly. You could simply get a dedicated server instead. One provider that comes to mind gives 3,000 GB/month of bandwidth on some of its lowest-level plans. That is on a 10 Mbit uplink; you can upgrade to 100 Mbps for $10/mo or 1 Gbit for $20/mo. I won't mention any names, but you can search for dedicated servers and possibly find one to your liking.
As for expiring the files, just implement that in PHP backed by a database. You won't even have to move files around: store them all in a directory not accessible from the web, and use a PHP script to determine whether the link is valid; if it is, read the contents of the file and pass them through to the browser, and if not, show an error message. It's a pretty simple concept, and there are plenty of pre-written scripts that do this, but depending on your needs it isn't too difficult to do yourself.
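A minimal sketch of that script, assuming a downloads table with token, filename, and expires_at columns (all names and paths here are illustrative):

```php
<?php
// Look up the token; a link is valid only while expires_at is in the future.
$db = new PDO('mysql:host=localhost;dbname=content', 'user', 'pass');

$stmt = $db->prepare(
    'SELECT filename FROM downloads WHERE token = ? AND expires_at > NOW()');
$stmt->execute([$_GET['token']]);
$file = $stmt->fetchColumn();

if ($file === false) {
    header('HTTP/1.1 410 Gone');
    exit('This download link has expired.');
}

// Files live outside the web root, so this script is the only way in.
$path = '/var/content/' . basename($file);
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
header('Content-Length: ' . filesize($path));
readfile($path);
```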
Cloud hosting has advantages, but right now I think it's costly, and unless you are trying to spread the load geographically or plan on supporting thousands of simultaneous users and need the elasticity of the cloud, you could probably use a dedicated server instead.
Hope that helps.
I can't speak for S3 but I use Rackspace Cloud files and servers.
It's good in that you don't pay for incoming bandwidth, so uploads are super cheap.
I would do it like this:
Upload all the files you need to a 'private' container
Create a public container with CDN enabled
That'll give you a special URL like http://c3214146.r65.ce3.rackcdn.com
Make your own CNAME DNS record pointing your domain at it, like http://cdn.yourdomain.com
When a user requests a file, use the COPY API operation with a long random filename to do a server-side copy from the private container to the public one (sketched after this list).
Store the filename in a MySQL DB for your app.
Once the file expires, use the DELETE API operation, then the PURGE API operation to get it out of the CDN; finally, delete the record from the MySQL table.
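A rough sketch of the COPY step against the Cloud Files (OpenStack Swift) REST API with PHP's curl. It assumes you already have an auth token and storage URL from the authentication step, and the container and object names are made up:

```php
<?php
// $authToken and $storageUrl come from the Cloud Files auth step (not shown).
$randomName = sha1(uniqid(mt_rand(), true)) . '.pdf'; // long random filename

$ch = curl_init($storageUrl . '/private_container/report.pdf');
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'COPY');
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'X-Auth-Token: ' . $authToken,
    'Destination: public_container/' . $randomName, // server-side copy target
]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
curl_close($ch);

// Store $randomName in MySQL so the app can build the CDN URL for the user
// and DELETE/PURGE the object when the link expires.
```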
As for the PURGE command: I heard it doesn't work 100% of the time and may leave the file around for an extra day; the docs also say to reserve its use for emergencies only.
Edit: I just heard there's a limit of 25 purges per day.
However, personally I've just used DELETE on objects and found that took them out of the CDN straight away. In summary, the worst case would be that the file is still accessible on some CDN nodes for 24 hours after deletion.
Edit: You can change the TTL (caching time) on the CDN nodes; the default is 72 hours, so it might pay to set it lower, but not so low that you lose the advantage of the CDN.
The advantages I find with the CDN are:
It pushes content right out to end users far away from the US servers and gives them super-fast download times.
If you have a super popular file, it won't take out your site when 1,000 people start trying to download it, as they all get copies pushed out from whichever CDN node they are closest to.
You don't have to rename the files on S3 every day. Just make them private (which is the default) and hand out time-limited URLs, good for a day or a week, to anyone who is authorized.
I would consider making the links good for only 20 minutes, so that a user has to log in again in order to re-download the files. Then they can't even share the links they get from you.
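For reference, generating such a time-limited URL looks roughly like this with the AWS SDK for PHP (v3); the bucket and key are placeholders, and credentials are assumed to come from the environment:

```php
<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'region'  => 'us-east-1',
    'version' => 'latest',
]);

$cmd = $s3->getCommand('GetObject', [
    'Bucket' => 'my-private-content',   // hypothetical bucket
    'Key'    => 'files/chapter-01.zip', // hypothetical key
]);

// Pre-signed URL valid for 20 minutes; the object itself stays private.
$request = $s3->createPresignedRequest($cmd, '+20 minutes');
echo (string) $request->getUri();
```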
I am designing a file download network.
The ultimate goal is to have an API that lets you directly upload a file to a storage server (no gateway or something). The file is then stored and referenced in a database.
When the file is requested, a server that currently holds the file is selected from the database and an HTTP redirect is done (or an API returns the currently valid direct URL).
Background jobs take care of desired replication of the file for durability/scaling purposes.
Background jobs also move files around to ensure even workload on the servers regarding disk and bandwidth usage.
There is no RAID or anything similar at any point; every drive is just hung into the server as JBOD. All the replication is at the application level. If one server breaks down, it is just marked as broken in the database, and the background jobs take care of replication from healthy sources until the desired redundancy is reached again.
The system also needs accurate stats for monitoring/balancing and maybe, later, billing.
So I thought about the following setup.
The environment is a classic Ubuntu, Apache2, PHP, MySql LAMP stack.
A URL that hits the current storage server is generated by the API (that's no problem so far; just a classic PHP website and MySQL database).
Now it gets interesting...
The storage server runs Apache2, and a PHP script catches the request. The URL parameters (a secure token hash) are validated: IP, timestamp, and filename are checked, so the request is authorized. (No database connection required, just a PHP script that knows a secret token.)
The PHP script sets the file header to hand delivery over to Apache2's mod_xsendfile.
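A minimal sketch of that gate script, assuming the API signs filename, expiry, and client IP with a shared secret; all names here are illustrative:

```php
<?php
// Shared secret known to the API server that generated the link.
$secret  = 'change-me';
$file    = basename($_GET['file']);
$expires = (int) $_GET['expires'];        // unix timestamp
$token   = $_GET['token'];

// Recompute the signature over the same fields the API signed.
$expected = hash_hmac('sha1',
    $file . '|' . $expires . '|' . $_SERVER['REMOTE_ADDR'], $secret);

if ($expires < time() || $token !== $expected) {
    header('HTTP/1.1 403 Forbidden');
    exit('Link expired or invalid');
}

// Hand the actual transfer off to Apache via mod_xsendfile;
// PHP exits immediately and Apache streams the file.
header('X-Sendfile: /data/storage/' . $file);
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . $file . '"');
```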
Apache delivers the file passed via mod_xsendfile and is configured to pipe the access log to another PHP script.
Apache runs mod_logio, and the access log is in Combined I/O log format, additionally extended with the %D variable (the time taken to serve the request, in microseconds) to calculate transfer speeds and spot bottlenecks in the network and so on.
The piped access log then goes to a PHP script that parses the URL (the first folder is a "bucket", just as in Google Storage or Amazon S3, and each bucket is assigned to one client, so the client is known), counts input/output traffic, and increments database fields. For performance reasons I thought about having daily rows and updating them like traffic = traffic + X, creating the row if none was updated.
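A sketch of such a piped-log consumer, assuming a traffic table with a unique key on (bucket, day); the log-parsing regexes and all names are illustrative:

```php
<?php
// Apache pipes each access-log line to this script's STDIN.
$db = new mysqli('localhost', 'stats', 'secret', 'storage');

while (($line = fgets(STDIN)) !== false) {
    // Bucket = first path segment of the request line; the line ends
    // with %I %O %D (bytes in, bytes out, duration in microseconds).
    if (!preg_match('#"[A-Z]+ /([^/ ]+)/#', $line, $m) ||
        !preg_match('#(\d+) (\d+) (\d+)\s*$#', $line, $n)) {
        continue;
    }
    $bucket = $m[1];
    $in     = (int) $n[1];
    $out    = (int) $n[2];
    $day    = date('Y-m-d');

    $upd = $db->prepare('UPDATE traffic SET bytes_in = bytes_in + ?,
                         bytes_out = bytes_out + ? WHERE bucket = ? AND day = ?');
    $upd->bind_param('iiss', $in, $out, $bucket, $day);
    $upd->execute();

    // affected_rows is read from the OK packet of the last execute,
    // so checking it costs no extra round trip to the MySQL server.
    if ($upd->affected_rows === 0) {
        $ins = $db->prepare('INSERT IGNORE INTO traffic
                             (bucket, day, bytes_in, bytes_out) VALUES (?, ?, ?, ?)');
        $ins->bind_param('ssii', $bucket, $day, $in, $out);
        $ins->execute();
    }
}
```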
I should mention that the servers will be low-budget machines with massive storage.
You can have a close look at the intended setup in this thread on Server Fault.
The key data is that the systems will have gigabit throughput (maxed out 24/7) and the file requests will be rather large (so no images or loads of small files that produce high load through lots of log lines and requests); maybe 500 MB on average or so.
The currently planned setup runs on a cheap consumer mainboard (Asus), 2 GB of DDR3 RAM, and an AMD Athlon II X2 220 (2x 2.80 GHz, tray) CPU.
Of course download managers and range requests will be an issue, but I think the average size of an access will be at least 50 megs or so.
So my questions are:
Do I have any severe bottleneck in this flow? Can you spot any problems?
Am I right in assuming that mysql_affected_rows() can be read directly from the last request and does not issue another query to the MySQL server?
Do you think the system with the specs given above can handle this? If not, how could I improve it? I think the first bottleneck would be the CPU, wouldn't it?
What do you think about it? Do you have any suggestions for improvement? Maybe something completely different? I thought about using Lighttpd and the mod_secdownload module, but unfortunately it can't check the IP address, and I am not as flexible with it. It would have the advantage that the download validation would not need a PHP process to fire. But since that process only runs briefly and doesn't read and output the data itself, I think this is OK. Do you? I once served downloads using Lighttpd on old throwaway PCs and the performance was awesome. I also thought about using nginx, but I have no experience with it.
What do you think about the piped logging to a script that directly updates the database? Should I rather write requests to a job queue and update the database in a second process that can handle delays? Or not do it at all and parse the log files at night? My thinking is that I would like it to be as real-time as possible and not have data accumulating anywhere other than in the central database. I also don't want to keep track of jobs running on all the servers; that could be a mess to maintain. There should be a simple unit test that generates a secured link, downloads it, and checks whether everything worked and the logging took place.
Any further suggestions? I am happy for any input you may have!
I am also planning to open-source all of this. I just think there needs to be an open-source alternative to expensive storage services like Amazon S3 that is oriented toward file downloads.
I really searched a lot but didn't find anything like this out there. Of course I would reuse an existing solution, preferably open source. Do you know of anything like that?
MogileFS (http://code.google.com/p/mogilefs/) -- this is almost exactly the thing that you want.
I have a file-hosting website that's burning through 2 Gbit of bandwidth, so I need to start adding secondary media servers to store the files. What would be the best way to manage a multiple-server setup with a large number of files? Preferably through PHP only.
Currently, I only have around 100 GB of files... so I could get a second server, mirror all content between them, and then round-robin the traffic 50/50, 33/33/33, etc. But once the total amount of files grows beyond the capacity of a single server, this won't work.
The idea that I had was to have a list of media servers stored in the DB with the amount of free space left on each. Once a file is uploaded, PHP will choose which server the file is actually uploaded to, spreading all the files evenly among the servers (see the sketch below).
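In PHP that selection could be as simple as the following sketch; the servers and files tables and the push mechanism are placeholders:

```php
<?php
// Pick the media server with the most free space and record the upload.
$db = new PDO('mysql:host=localhost;dbname=filehost', 'user', 'pass');

// Most-free-space-first keeps the fleet roughly balanced over time.
$server = $db->query(
    'SELECT id, hostname, free_bytes FROM servers
     WHERE active = 1 ORDER BY free_bytes DESC LIMIT 1'
)->fetch(PDO::FETCH_ASSOC);

$size = filesize($_FILES['upload']['tmp_name']);

// ...push the file to $server['hostname'] here (HTTP PUT, scp, FTP, ...).

// Record where the file lives and decrement that server's free space.
$db->prepare('INSERT INTO files (name, server_id, bytes) VALUES (?, ?, ?)')
   ->execute([$_FILES['upload']['name'], $server['id'], $size]);
$db->prepare('UPDATE servers SET free_bytes = free_bytes - ? WHERE id = ?')
   ->execute([$size, $server['id']]);
```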
Was hoping to get some more input/inspiration.
Can't use any 3rd-party services like Amazon. The files range from several bytes to a gigabyte.
Thanks
You could try MogileFS. It is a distributed file system with a good API for PHP. You can create categories and upload a file to a category; for each category you can define how many servers it should be distributed across. You can then use the API to get a URL to that file on a random node.
If you are doing as much data transfer as you say, it would seem that whatever you are doing is growing quite rapidly.
It might be worth your while to contact your hosting provider and see if they offer any sort of shared storage solution via iSCSI, NAS, or other means. Ideally the storage would not only start out large enough to store everything you have, but would also be able to grow dynamically beyond your needs. I know my hosting provider offers a solution like this.
If they do not, you might consider colocating your servers somewhere that either offers a service like that or would allow you to install your own storage server (which could be built cheaply from off-the-shelf components and software like FreeNAS or Openfiler).
Once you have a centralized storage platform, you can then add web servers to your heart's content and load-balance them, all while accessing the same central storage repository.
Not only is this the correct way to do it, it also offers you much more redundancy and expandability in the future if your endeavor continues to grow at its current pace.
The other solutions offered, using a database to track what is stored where, would work, but they not only add an extra layer of complexity but also an extra layer of processing between your visitors and the data they wish to access.
What if you lose a hard disk - do you lose 1/3 or 1/2 of all your data?
Should the heavy I/O of static content be on the same spindles as the rest of your operating system and application data?
Your best bet is really to get your files into some sort of storage that scales. Storing files locally should only be done with good reason (they are sensitive, private, etc.)
Your best bet is to move your content into the cloud. Mosso's CloudFiles or Amazon's S3 will both let you store an almost infinite number of files. All your content is then accessible through an API. If you want, you can then use MySQL to track metadata for easy searching, and let the service handle the actual storage of the files.
I think your own idea is not the worst one: get a bunch of servers, and for every file, store which server(s) it's on. If new files are uploaded, use most-free-space first*. Every server handles its own delivery (instead of piping through the main server).
pros:
use multiple servers for a single file, e.g. for cutekitten.jpg: filepath="server1\cutekitten.jpg;server2\cutekitten.jpg", and then choose the server depending on the server load (or randomly, or alternating, ...)
if you're careful you may be able to move files around automatically depending on the current load. So if your cute-kitten image gets reddited/slashdotted hard, move it to the server with the lowest load and update the entry.
you could do this with a cron job: just log the downloads for the last xx minutes and try some formula like downloads-per-minute * filesize * (product of server loads) for weighting; pick thresholds for increasing/decreasing the number of servers those files are distributed to.
if you add a new server, it's relatively painless (just add its address to the server pool)
cons:
homebrew solutions are always risky
your load distribution algorithm must be well tested, otherwise bad things could happen (everything mirrored everywhere)
constantly moving files around for balancing adds additional server load
* or use a mixed weighting algorithm: free-space, server-load, file-popularity
disclaimer: never been in the situation myself, just guessing.
Consider HDFS, which is part of Apache Hadoop. It will integrate with PHP, but you'll be setting up a second application. It will also address all your points about balancing among servers and handling things when your file space usage exceeds one server's capacity. It's not purely PHP, though, but I don't think that's what you meant when you said "pure" anyway.
See http://hadoop.apache.org/core/docs/current/hdfs_design.html for the idea of it. They cover the whole idea of how it handles large files, many files, replication, etc.