Files disappear on server - PHP

I have a PHP application hosted on AppFog, and sometimes it needs to generate files and store them on the server. The files are saved with file_put_contents() or with the imagejpeg() and imagepng() functions. After a while the files are removed. Can you tell me why this happens and how I can prevent it?

Many PaaS providers, including AppFog, do not provide a persistent filesystem. Generally, you can save files, but they will be removed when you redeploy your application.
For persistent file storage, you are encouraged to use a cloud provider like Amazon S3.
From the AppFog FAQ:
Does AppFog have a persistent file system?
Not yet. We're working on this feature, but in the meantime, the file system is volatile. This means that any changes you make to the file system through a web interface, including any admin changes and content uploads, will be lost on the app's next start, stop, restart, deploy, or resource change. Because of this, you should make any changes to the file system on a local development environment and keep media assets and content uploads on an external storage system like Amazon's S3.
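For illustration, a minimal sketch of writing a generated image straight to S3 with the AWS SDK for PHP (v3) instead of the local disk. The bucket name and region are placeholders, credentials are assumed to come from the environment, and $image stands for the GD resource the application already creates:

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1',          // placeholder region
]);

// Render the image into memory instead of writing it to the volatile filesystem.
ob_start();
imagejpeg($image);                     // $image: GD resource created elsewhere
$jpegData = ob_get_clean();

$s3->putObject([
    'Bucket'      => 'my-app-uploads', // placeholder bucket
    'Key'         => 'generated/photo.jpg',
    'Body'        => $jpegData,
    'ContentType' => 'image/jpeg',
]);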

Related

Why are my uploaded files deleted on Heroku when I push a new version of the application?

On my Symfony project, I store uploaded pictures and documents in the directory public/upload/.
But when the dyno is rebuilt (pushing changes from staging to prod, changing configuration variables, etc.), everything in it is deleted.
I don't know what can be done with Heroku so that it preserves certain directories when rebuilding the application image.
The filesystem for a Heroku app is rebuilt each time a dyno is started. Storing any user-generated content there is a recipe for failure.
You should store these files somewhere else: a remote filesystem, a database, or anything else that does not depend on the application container.
Heroku has a generic help page for this, and they advise their users to use S3 to handle user uploads.
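As a rough sketch for the Symfony case, the local public/upload/ writes can be replaced with an S3-backed Flysystem filesystem. Class names follow Flysystem 3 and league/flysystem-aws-s3-v3; the bucket, region and $uploadedFile (a Symfony UploadedFile from the form) are assumptions:

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;
use League\Flysystem\AwsS3V3\AwsS3V3Adapter;
use League\Flysystem\Filesystem;

$client  = new S3Client(['version' => 'latest', 'region' => 'eu-west-1']); // placeholder region
$adapter = new AwsS3V3Adapter($client, 'my-symfony-uploads');              // placeholder bucket
$uploads = new Filesystem($adapter);

// Instead of moving the uploaded file into public/upload/, stream it to S3.
$stream = fopen($uploadedFile->getPathname(), 'rb');
$uploads->writeStream('upload/' . $uploadedFile->getClientOriginalName(), $stream);
if (is_resource($stream)) {
    fclose($stream);
}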

Joomla Deployment in AWS, Especially managing common files/folders between web servers (instances) running in multiple availability zones

What should the Joomla log and tmp paths in configuration.php be when we deploy the application on an AWS VPC EC2 instance? My VPC is auto-scalable, so I assume the logs should somehow live in a shared location, or should they stay on the EC2 instance's local storage?
Also, what paths should we put in Joomla's configuration.php?
public $log_path = '/home/public_html/logs';
public $tmp_path = '/home/public_html/temp';
In a VPC with web servers running in multiple availability zones, shared media content is a common problem: the log and tmp paths, and the images or files that are uploaded by users/admins or generated by the application.
When the load balancer directs website traffic to arbitrary servers (running in different availability zones), this becomes a big problem to tackle.
In such cases we need a shared, common location that holds these media files (uploaded by users or generated by the application). Otherwise each server (in each availability zone) saves its files separately, and if a user uploaded a photo while on the Availability Zone A server and is directed (by the load balancer) to the Availability Zone B server the next time, the files will not be there. This is a very bad situation.
Solutions:
1. Preferably, it is recommended to use the AWS EFS service, which is designed by AWS to address exactly this need. It creates a common location for shared files that can be accessed much like a local filesystem folder, just like an NFS-mounted folder (see the configuration sketch after this list).
2. You can use S3 storage, which is also recommended in some cases, but you need to modify or set up your code so that it can work with an S3-backed filesystem.
3. You can also mount S3 storage on your web server (EC2 instance) using the third-party s3fs. AWS only half-recommends this, though, since it may not be able to do some of the monitoring that helps customers keep track of traffic and other metrics.
4. You can also create a standalone EC2 instance whose only job is to hold these common media files. It can be created in a public or private subnet; it will work as long as it is in the same route table as the other subnets. We can use this instance as an NFS server by creating the required directories on it and exporting them to specific web servers (sitting in our different availability zones), or to a range of IPs (if we have set up an Auto Scaling group in AWS). Each client (web server) then mounts the exported directories at the locations where the application expects them. These directories act like normal local directories but are common across all running web servers in the different availability zones.
5. If we have a NAT EC2 instance running, we can also use it as an NFS server, creating directories on it and exporting them to the web servers.
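As a sketch of option 1 (or options 4/5, the difference is only what serves the NFS share): each web server mounts the shared filesystem and Joomla's writable paths point at it. The mount point, EFS endpoint and sub-directories below are placeholders:

<?php
// configuration.php (excerpt) - point Joomla's writable paths at the shared
// mount so every instance behind the load balancer sees the same files.
// Assumes the share is already mounted on each web server, e.g. for EFS:
//   mount -t nfs4 fs-XXXXXXXX.efs.us-east-1.amazonaws.com:/ /mnt/shared
class JConfig {
    public $log_path = '/mnt/shared/joomla/logs';
    public $tmp_path = '/mnt/shared/joomla/tmp';
    // ... the rest of the Joomla settings stay unchanged
}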
Conclusion:
Space and budget are the two important factors; performance is broadly similar in all of the cases mentioned above. When the media files total less than about 10 GB, a suitably sized EC2 instance acting as NFS is usually the right choice, but option 3 may add read/write costs over time, in which case option 4 is more appropriate. S3 is also a good option when you are able to change the code accordingly, and it is the right option for very large files.

Connect file scan protection to my site

I am looking into implementing a virus scanner in a web application I am creating that allows users to upload files. All the background functionality is complete, but I am wary that a malicious user may upload a virus that other users then download.
I have been researching for a few months now on how to implement a scanner into my web application.
I have located various online virus scanners such as MetaScan-Online and VirusTotal.
I have read through the documentation they provide, but I am still confused and unsure whether I can integrate these services into my application using their APIs.
Can I?
And if so, is there another virus scanner that enables a whole folder of files to be scanned simultaneously?
If the anti-virus force is strong in you, then you can probably implement a service class and upload the incoming files to one of the public scan services.
Keep in mind that they are limiting the accepted file size and number of files and that they don't store the scan reports forever.
MetaScan
The public API for Metascan is described here: https://www.metascan-online.com/public-api#!/
There is also a PHP checker available, but it uses v1 of their API and looks outdated. Maybe contact them to get an updated version that uses API v2.
https://www.metascan-online.com/apps#!/php-metascan-online-checker
VirusTotal
The public API is described here: https://www.virustotal.com/de/documentation/public-api/
There are multiple libraries for PHP available, just to mention one of them
https://github.com/jayzeng/virustotal_apiwrapper
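As a rough, untested sketch of submitting one uploaded file to the VirusTotal public API (v2) with plain curl; the API key and file path are placeholders, and the public API is rate-limited, so queue your requests accordingly:

<?php
// Send a single file to VirusTotal for scanning (public API v2).
$apiKey = 'YOUR_API_KEY';                       // placeholder
$path   = '/var/uploads/quarantine/file.bin';   // placeholder

$ch = curl_init('https://www.virustotal.com/vtapi/v2/file/scan');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POSTFIELDS     => [
        'apikey' => $apiKey,
        'file'   => new CURLFile($path),
    ],
]);
$response = json_decode(curl_exec($ch), true);
curl_close($ch);

// The response contains a scan_id/resource that you poll later against
// https://www.virustotal.com/vtapi/v2/file/report to get the verdict.
$scanId = isset($response['scan_id']) ? $response['scan_id'] : null;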
Local clamAV scan after upload
Another solution is to simply trigger a ClamAV scan by running clamscan after a file has been uploaded to your server. That means: upload to a sandboxed av-scan folder, scan, drop (if bad) or keep (if ok), and finally move to the upload folder. This is slow, because the signatures have to be loaded each time the clamscan command is invoked from PHP.
By sandbox folder I mean a restricted system environment for controlling the resources better, e.g. an "upload directory" with restricted or removed read/write/exec permissions (user/group/other), isolated from the application, not accessible from the web and with restricted PHP capabilities (think of the disable_functions directive).
When your server runs the ClamAV daemon (clamd), it's possible to invoke clamdscan on the folder instead. This is fast, because the signatures are kept in memory.
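A minimal sketch of the clamdscan variant, assuming clamd is running and the PHP user is allowed to execute clamdscan; the quarantine path is a placeholder. clamscan/clamdscan exit with 0 for clean, 1 for infected and 2 on errors:

<?php
// Scan a quarantined upload directory with the ClamAV daemon scanner.
// --fdpass hands the file descriptors to clamd, which avoids permission
// problems when clamd runs as a different user.
$quarantineDir = '/var/uploads/quarantine';   // placeholder sandbox folder

exec('clamdscan --fdpass --no-summary ' . escapeshellarg($quarantineDir), $output, $exitCode);

if ($exitCode === 0) {
    // clean: move the files to the public upload folder
} elseif ($exitCode === 1) {
    // infected: delete the files and log the report in $output
} else {
    // scanner error: keep the files quarantined and alert an admin
}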
You'll have to handle sending the folder contents yourself; as external services, they won't be able to list the files in a given folder.
VirusTotal provides a public API to scan your files, which is a good start. You could implement multi-threading and store the result for each file; this way you avoid sending the same file multiple times.

How can I mount an S3 bucket to an EC2 instance and write to it with PHP?

I'm working on a project that is being hosted on Amazon Web Services. The server setup consists of two EC2 instances, one Elastic Load Balancer and an extra Elastic Block Store on which the web application resides. The project is supposed to use S3 for storage of files that users upload. For the sake of this question, I'll call the S3 bucket static.example.com
I have tried using s3fs (https://code.google.com/p/s3fs/wiki/FuseOverAmazon), RioFS (https://github.com/skoobe/riofs) and s3ql (https://code.google.com/p/s3ql/). s3fs will mount the filesystem but won't let me write to the bucket (I asked this question on SO: How can I mount an S3 volume with proper permissions using FUSE). RioFS will mount the filesystem and will let me write to the bucket from the shell, but files that are saved using PHP don't appear in the bucket (I opened an issue with the project on GitHub). s3ql will mount the bucket, but none of the files that are already in the bucket appear in the filesystem.
These are the mount commands I used:
s3fs static.example.com -ouse_cache=/tmp,allow_other /mnt/static.example.com
riofs -o allow_other http://s3.amazonaws.com static.example.com /mnt/static.example.com
s3ql mount.s3ql s3://static.example.com /mnt/static.example.com
I've also tried using this S3 class: https://github.com/tpyo/amazon-s3-php-class/ and this FuelPHP specific S3 package: https://github.com/tomschlick/fuel-s3. I was able to get the FuelPHP package to list the available buckets and files, but saving files to the bucket failed (but did not error).
Have you ever mounted an S3 bucket on a local linux filesystem and used PHP to write a file to the bucket successfully? What tool(s) did you use? If you used one of the above mentioned tools, what version did you use?
EDIT
I have been informed that the issue I opened with RioFS on GitHub has been resolved. Although I decided to use the S3 REST API rather than attempting to mount a bucket as a volume, it seems that RioFS may be a viable option these days.
Have you ever mounted an S3 bucket on a local linux filesystem?
No. It's fun for testing, but I wouldn't let it near a production system. It's much better to use a library to communicate with S3. Here's why:
It won't hide errors. A filesystem only has a few error codes it can send you to indicate a problem. An S3 library will give you the exact error message from Amazon so you understand what's going on, can log it, handle corner cases, etc.
A library will use less memory. Filesystem layers will cache lots of random data that you may never use again. A library puts you in control of what to cache and what not to cache.
Expansion. If you ever need to do anything fancy (set an ACL on a file, generate a signed link, versioning, lifecycle, change durability, etc), then you'll have to dump your filesystem abstraction and use a library anyway.
Timing and retries. Some fraction of requests randomly error out and can be retried. Sometimes you may want to retry a lot, sometimes you would rather error out quickly. A filesystem doesn't give you granular control, but a library will.
The bottom line is that S3 under FUSE is a leaky abstraction. S3 doesn't have (or need) directories. Filesystems weren't built for billions of files. Their permissions models are incompatible. You are wasting a lot of the power of S3 by trying to shoehorn it into a filesystem.
Two random PHP libraries for talking to S3:
https://github.com/KnpLabs/Gaufrette
https://aws.amazon.com/sdkforphp/ - this one is useful if you expand beyond just using S3, or if you need to do any of the fancy requests mentioned above.
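For instance, generating a time-limited signed link (one of the "fancy requests" above) is only a few lines with the AWS SDK for PHP v3; the region and object key here are placeholders:

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(['version' => 'latest', 'region' => 'us-east-1']); // placeholder region

// Build a GetObject request and pre-sign it so a browser can fetch the
// private object directly from S3 for the next 15 minutes.
$cmd = $s3->getCommand('GetObject', [
    'Bucket' => 'static.example.com',
    'Key'    => 'uploads/user42/avatar.jpg',   // placeholder key
]);
$request   = $s3->createPresignedRequest($cmd, '+15 minutes');
$signedUrl = (string) $request->getUri();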
Quite often, it is advantageous to write files to the EBS volume, then force subsequent public requests for the file(s) to route through CloudFront CDN.
In that way, if the app must do any transformations to the file, it's much easier to do on the local drive & system, then force requests for the transformed files to pull from the origin via CloudFront.
e.g. if your user is uploading an image for an avatar, and the avatar image needs several iterations for size & crop, your app can create these on the local volume, but all public requests for the file will take place through a CloudFront origin-pull request. In that way, you have maximum flexibility to keep the original file (or an optimized version of the file), and any subsequent user requests can either pull an existing version from a CloudFront edge, or CloudFront will route the request back to the app and create any necessary iterations.
An elementary example of the above would be WordPress, which creates multiple sized/cropped versions of any graphic image uploaded, in addition to keeping the original (subject to file size restrictions, and/or plugin transformations). CDN-capable WordPress plugins such as W3 Total Cache rewrite requests to pull through CDN, so the app only needs to create unique first-request iterations. Adding browser caching URL versioning (http://domain.tld/file.php?x123) further refines and leverages CDN functionality.
If you are concerned about rapid expansion of EBS volume file size or inodes, you can automate a pruning process for seldom-requested files, or aged files.

Inter-network File Transfers using PHP with polling

I am designing a web-based file-management system that can be conceptualised as 3 different servers:
The server that hosts the system interface (built in PHP) where users 'upload' and manage files (no actual files are stored here, it's all meta).
A separate staging server where files are placed to be worked on.
A file-store where the files are stored when they are not being worked on.
All 3 servers will be *nix-based on the same internal network. Users, based in Windows, will use a web interface to create an initial entry for a file on Server 1. This file will be 'uploaded' to Server 3 either from the user's local drive (if the file doesn't currently exist anywhere on the network) or another network drive on the internal network.
My question relates to the best programmatic approach to achieve what I want to do, namely:
When a user uploads a file (selecting the source via a web form) from the network, the file is transferred to Server 3 as an inter-network transfer, rather than passing through the user (which I believe is what would happen if it was sent as a standard HTTP form upload). I know I could set up FTP servers on each machine and attempt to FXP files between locations, but is this preferable to PHP executing a command on Server 1 (which will have global network access), to perform a cross-network transfer that way?
The second problem is that these are very large files we're talking about, at least a gigabyte or two each, and so transfers will not be instant. I need some method of polling the status of the transfer, and returning this to the web interface so that the user knows what is going on.
Alternatively this upload could be left to run asynchronously to the user's current view, but I would still need a method to check the status of the transfer to ensure it completes.
So, if using an FXP solution, how could polling be achieved? If using a file move/copy command from the shell, is any form of polling possible? PHP/JQuery solutions would be very acceptable.
My final part to this question relates to Windows network drive mapping. A user may map a drive and select a file from an arbitrarily specified mapped drive. Their G:\ may relate to \\server4\some\location\therein, but presumably any drive path given to the server via a web form will only send the G:\ file path. Is there a way to determine the 'real path' of mapped network drives?
Any solution would be used to stage files from Server 3 to Server 2 when the files are being worked on - the emphasis being on these giant files not having to pass through the user's local machine first.
Please let me know if you have comments and I will try to make this question more coherent if it is unclear.
As far as I’m aware (and I could be wrong) there is no standard way to determine the UNC path of a mapped drive from a browser.
The only way to do this would be to have some kind of control within the web page. Could be ActiveX or maybe flash. I’ve seen ActiveX doing this, but not flash.
In the past when designing web based systems that need to know the UNC path of a user’s mapped drive I’ve had to have a translation of drive to UNC path stored server side. I did have a luxury though of knowing which drive would map to what UNC path. If the user can set arbitrary paths then this obviously won’t work.
Ok, as I’m procrastinating and avoiding real work I’ve given this some thought.
I’ll preface this by saying that I’m in no way a Linux expert and the system I’m about to describe has just been thought up off the top of my head and is not something you’d want to put into any kind of production. However, it might help you down the right path.
So, you have 3 servers, the Interface Server (LAMP stack I’m assuming?) your Staging Server and your File Store Server. You will also have Client Machines and Network Shares. For the purpose of this design your Network Shares are hosted on nix boxes that your File Store can scp from.
You’d create your frontend website that tracks and stores information about files etc. This will also hold the details about which files are being copied, which are in Staging and so on.
You’ll also need some kind of Service running on the File Store Server. I’ll call this the File Copy Service. This will be responsible for coping the files from your servers hosting the network shares.
Now, you’ve still got an issue with how you figure out what path the users file is actually on. If you can stop users from mapping their own drives and force them to use consistent drive letters then you could keep a translation of drive letter to UNC path on the server. If you can’t, well I’ll let you figure that out. If you’re in a windows domain you can force the drive mappings using Group Policies.
Anyway, the process for the system would work something like this.
User goes to system and selects a file
The Interface Server takes the file path and calls the File Copy Service on the File Store Server
The File Copy Service connects to the server that hosts the file and initiates the copy. If they're all nix boxes you could easily use something like SCP. Now, I haven't actually looked up how to do it, but I'd be very surprised if you can't get a running total of percentage complete from SCP as it's copying. With this running total the File Copy Service updates the database on the Interface Server with how the copy is doing, so the user can see it from the Interface Server (see the rough sketch after these steps).
The File Copy Service can also be used to move files from the File Store to the staging server.
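A very rough sketch of that File Copy Service step. scp only prints progress when attached to a terminal, so this substitutes rsync --progress and parses the percentage from its output; the host, paths, $transferId and updateTransferStatus() are hypothetical:

<?php
// Pull a file from the host that owns it and report copy progress.
$transferId = 123;   // hypothetical row id in the Interface Server's database
$source     = 'copyuser@share-host:/exports/projects/bigfile.mxf';   // placeholder
$dest       = '/srv/filestore/incoming/';                            // placeholder

$cmd  = sprintf('rsync --progress %s %s 2>&1', escapeshellarg($source), escapeshellarg($dest));
$proc = popen($cmd, 'r');

while (!feof($proc)) {
    $chunk = fread($proc, 4096);
    // rsync rewrites a progress line such as "  1,234,567  42%  3.1MB/s  0:00:12"
    if (preg_match_all('/(\d+)%/', $chunk, $m)) {
        updateTransferStatus($transferId, (int) end($m[1]));   // hypothetical DB update
    }
}
$exitCode = pclose($proc);   // 0 means the copy completed successfully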
As I said, this is very roughly thought out. The above would work, but it all depends a lot on how your systems are set up.
Having said all that though, there must be software that would do this out there. Have you looked?
Assuming I understand your architecture correctly:
1.)
First let's solve the issue of "inter-server transfer".
I would solve this by mounting the filesystems of Server 2 and Server 3 on Server 1 via NFS.
https://help.ubuntu.com/8.04/serverguide/network-file-system.html
So PHP can store files directly on the filesystem and doesn't need to know which server the files really live on (see the short example after the mount commands).
/etc/exports
on Server 2 and Server 3 (note: no space between the client address and its options, otherwise the options apply to every host):
/var/lib/data/ 192.168.IPofServer.1(rw,sync)
exportfs -ra
/etc/fstab
on Server 1 (each remote share gets its own mount point so one doesn't shadow the other):
192.168.IPofServer.2:/var/lib/data/server2/ /directory/with/files/server2 nfs rsize=8192,wsize=8192,timeo=14,intr
192.168.IPofServer.3:/var/lib/data/server3/ /directory/with/files/server3 nfs rsize=8192,wsize=8192,timeo=14,intr
mount -a
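With the shares mounted, the PHP side stays plain filesystem code; a trivial example using the mount point configured above:

<?php
// From Server 1 this writes through the NFS mount onto Server 2's data directory.
// $csvData: content produced by the application.
file_put_contents('/directory/with/files/server2/report-' . date('Ymd') . '.csv', $csvData);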
2.)
Getting upload progress for really large files:
Here are some possibilities for showing a progress bar for HTTP uploads.
But for a resume function you would have to use a Flash plugin.
http://fineuploader.com/#demo
https://github.com/valums/file-uploader
Or you can build it yourself using the APC extension (see the sketch after this link):
http://www.amwsites.com/blog/2011/01/use-a-combination-of-jquery-php-apc-uploadprogress-to-show-progress-bar-during-an-upload/
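A small sketch of the APC route: a progress endpoint polled by jQuery, assuming apc.rfc1867 is enabled and the upload form contains a hidden APC_UPLOAD_PROGRESS field whose value matches the key sent here:

<?php
// progress.php - polled by the browser while the upload is running.
$key    = 'upload_' . $_GET['key'];   // APC stores progress under "upload_" + form key
$status = apc_fetch($key);            // false until PHP has started receiving the file

header('Content-Type: application/json');
if ($status === false || empty($status['total'])) {
    echo json_encode(['percent' => 0]);
} else {
    echo json_encode(['percent' => round($status['current'] / $status['total'] * 100)]);
}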
3.)
Letting the server load files from a network drive.
This I would try with a Java applet to figure out the real network path and send it to the server, so the server can fetch the file in the background.
But I have never done anything like this before and have no further information.
