Transfer large data from one server to another - php

I m currently trying to transfer large data from one server to another using php cURL (posting the data). In some cases the remote server is getting incomplete data(corrupted).
Is there any other way to achieve this reliably
EDIT - 1
Using FTP seems good idea, anybody would like to say that it is bad or i should avoid it for any reason (Suggestions - #Ed Heal, #Neo)

I would guess your php session is timing out. See How to increase the execution timeout in php?
Or you could get curl to run in it's own thread. Call it from a bash script maybe.

Posting large files is not what http is for. Ftp is for transferring files. Hence the name.
But if you are stuck on using http, you can take a look at the WebDAV extensions to http. There is a php library called SabreDAV that you should take a look at:
http://code.google.com/p/sabredav/

You can even use scp to do so, so that the data transfer is secure as well. You would be able to find libraries to do so. Also basic function in php can be useful: http://php.net/manual/en/function.ssh2-sftp.php

As you say that it is truncated, I would imaging that the server has a file limitation size - i.e. to prevent abuse and denial of service attacks.
I would stick to FTP and perhaps compressing the files.

Related

Handling Very Large Uploads [duplicate]

I want to allow uploads of very large files into our PHP application (hundred of megs - 8 gigs). There are a couple of problems with this however.
Browser:
HTML uploads have crappy feedback, we need to either poll for progress (which is a bit silly) or show no feedback at all
Flash uploader puts entire file into memory before starting the upload
Server:
PHP forces us to set post_max_size, which could result in an easily exploitable DOS attack. I'd like to not set this setting globally.
The server also requires some other variables to be there in the POST vars, such as an secret key. We'd like to be able to refuse the request right away, instead of after the entire file is uploaded.
Requirements:
HTTP is a must.
I'm flexible with client-side technology, as long as it works in a browser.
PHP is not a requirement, if there's some other technology that will work well on a linux environment, that's perfectly cool.
upload_max_filesize can be set on a per-directory basis; the same goes for post_max_size
e.g.:
<Directory /uploadpath/>
php_value upload_max_filesize 10G
php_value post_max_size 10G
</IfModule>
Python Handler?
Using a Python POST handler instead of PHP. Generate a unique identifier from your PHP app that the client can put in the HTTP headers. With mod_python to reject or accept the large upload before the entire POST body is transmitted.
I think
http://www.modpython.org/live/current/doc-html/dir-handlers-hph.html
Allows you to check headers and decline the rest of the POST input. I haven't tried it but might be the right path?
Looking at the source of mod_python, the buffering of the input via read() seems to allow bit-at-a-time evaluation of the HTTP input. Headers are first.
https://svn.apache.org/repos/asf/quetzalcoatl/mod_python/trunk/src/filterobject.c
It's old I know, but maybe someone have this problem nowdays ,too.
Now you can do this with only Javascript and, say, PHP. No Flash or Java required on client side.
demo: http://dnduploader.filkor.org/
The idea is to slice the files with Javascript's Blob slice() method...
How about a Java applet? That's how we had to do it at a company I previously worked for. I know applets suck, especially in this day and age with all our options available, but they really are the most versatile solution to desktop-like problems encountered in web development. Just something to consider.
You can set the post_max_size for just scripts in 1 directory. Place your upload script there, and allow only that script to handle large sizes. It's still possible for that script to be attacked with large/useless files, but it avoids setting it globally.
Use that with APC and you might be able to work out something good:
IBM Developer works article on APC
Tried all of this... this is by far the best I have used yet...
http://www.uploadify.com/
Take a look at jumploader.com
A good java-applet for uploading.
I've used it for uploading images and it works fine. Haven't tried with bigger files than 10MB, but i should work for really big files too.
Have you looked into using APC to check the progress and total file size. Here is a good blog post about it. It might help.
Maybe you could use Webdav and Javascript in the browser
AJAX Big file upload, with progress, to WebDAV
http://www.webdavsystem.com/ajax/programming/upload_progress
A simple library
http://debris.demon.nl/projects/davclient.js/doc/README.html
You can then get the JS to redirect the user to a success page. Secret keys and what-not can be handled in a PHP prelude before handing off the JS Client->WebDAV
I would look into FTP, SSH or SCP this allows you to upload a large file and still have access control over the file as well. This might take a little longer to implement but its probably the most secure way I could think of.
I know it sucks to add another dependency but in my experience, most websites that are doing something like this are using flash on the client side, and uploading the large file as chunks
adobe as a howto on flash file uploads
I also found this tutorial on codeproject:
Multiple File Upload With Progress Bar Using Flash and ASP.NET
PS - I know you're using PHP and not .net, I figured the important part was the flash ;)
I've had success with uploadify, and I would recommend it. It's a jQuery/Flash script that handles large uploads, and you can pass extra parameters to it (like the secret key). To solve the server-side issues, simply use the following code. The changes take affect just for the script they're called in:
//Check to see if the key is there
if(!isset($_POST['secret_key']) || !isValid($_POST['secret_key']))
{
exit("Invalid request");
}
function isValid($key)
{
//Put your validation code here.
}
//This line changes the timeout.
//Give it a value in seconds (3600 = 1 hour)
set_time_limit(3600);
//Set these amounts to whatever you need.
ini_set("post_max_size","8192M");
ini_set("upload_max_filesize","8192M");
//Generally speaking, the memory_limit should be higher
//than your post size. So make sure that's right too.
ini_set("memory_limit","8200M");
EDIT In response to your comment:
Given what you've said, I'm afraid you may not be able to meet your requirements over http. All of the solutions out there are code that add features to http that it was never designed for.
Like you said yourself, it's a simple protocol. Apart from writing your own client software that runs outside of the browser, a java applet, or using a different protocol (like FTP, which was designed for this), you might not get what you want.
I've done the best I could within the given constraints. Sorry I couldn't do better.
Try this: http://www.simple2ftp.com uses a Java based FTP applet from within a clever PHP application wrapper.

PHP redirect file post stream

I'm writing an API using php to wrap a website functionality and returning everything in json\xml. I've been using curl and so far it's working great.
The website has a standard file upload Post that accepts file(s) up to 1GB.
So the problem is how to redirect the file upload stream to the correspondent website?
I could download the file and after that upload it, but I'm limited by my server to just 20MG. And it seems a poor solution.
Is it even possible to control the stream and redirect it directly to the website?
I preserverd original at the bottom for posterity, but as it turns out, there is a way to do this
What you need to use is a combination of HTTP put method (which unfortunately isn't available in native browser forms), the PHP php://input wrapper, and a streaming php Socket. This gets around several limitations - PHP disallows php://input for post data, but it does nothing with regards to PUT filedata - clever!
If you're going to attempt this with apache, you're going to need mod_actions installed an activated. You're also going to need to specify a PUT script with the Script directive in your virtualhost/.htaccess.
http://blog.haraldkraft.de/2009/08/invalid-command-script-in-apache-configuration/
This allows put methods only for one url endpoint. This is the file that will open the socket and forward its data elsewhere. In my example case below, this is just index.php
I've prepared a boilerplate example of this using the python requests module as the client sending the put request with an image file. If you run the remote_server.py it will open a service that just listens on a port and awaits the forwarded message from php. put.py sends the actual put request to PHP. You're going to need to set the hosts put.py and index.php to the ones you define in your virtual host depending on your setup.
Running put.py will open the included image file, send it to your php virtual host, which will, in turn, open a socket and stream the received data to the python pseudo-service and print it to stdout in the terminal. Streaming PHP forwarder!
There's nothing stopping you from using any remote service that listens on a TCP port in the same way, in another language entirely. The client could be rewritten the same way, so long as it can send a PUT request.
The complete example is here:
https://github.com/DeaconDesperado/php-streamer
I actually had a lot of fun with this problem. Please let me know how it works and we can patch it together.
Begin original answer
There is no native way in php to pass a file asynchronously as it comes in with the request body without saving its state down to disc in some manner. This means you are hard bound by the memory limit on your server (20MB). The manner in which the $_FILES superglobal is initialized after the request is received depends upon this, as it will attempt to migrate that multipart data to a tmp directory.
Something similar can be acheived with the use of sockets, as this will circumvent the HTTP protocol at least, but if the file is passed in the HTTP request, php is still going to attempt to save it statefully in memory before it does anything at all with it. You'd have the tail end of the process set up with no practical way of getting that far.
There is the Stream library comes close, but still relies on reading the file out of memory on the server side - it's got to already be there.
What you are describing is a little bit outside of the HTTP protocol, especially since the request body is so large. HTTP is a request/response based mechanism, and one depends upon the other... it's very difficult to accomplish a in-place, streaming upload at an intermediary point since this would imply some protocol that uploads while the bits are streamed in.
One might argue this is more a limitation of HTTP than PHP, and since PHP is designed expressedly with the HTTP protocol in mind, you are moving about outside its comfort zone.
Deployments like this are regularly attempted with high success using other scripting languages (Twisted in Python for example, lot of people are getting onboard with NodeJS for its concurrent design patter, there are alternatives in Ruby or Java that I know much less about.)

best method for large File transfer via http(s) using PHP (POST) invoked via shell

I want to setup a automated backup via PHP, that using http/s I can "POST" a zip file request to another server and send over a large .zip file , basically, I want to backup an entire site (and its database) and have a cron peridocally transmit the file over via a http/s. somethiling like
wget http://www.thissite.com/cron_backup.php?dest=www.othersite.com&file=backup.zip
The appropriate authentication security can be added afterwords....
I prefer http/s because this other site has limited use of ftp and is on a windows box. So the I figure the sure way to communicate with it is via http/s .. the other end would have a correspondign php script that would store the file.
this process needs to be completely programmatic (ie. Flash uploaders will not work, as this needs a browser to work, this script will run from a shell session)//
Are there any generalized PHP libraries or functions that help with this sort of thing? I'm aware of the PHP script timeout issues, but I can typically alter php.ini to minimize this.
I'd personally not use wget, and just run from the shell directly for this.
Your php script would be called from cron like this:
/usr/local/bin/php /your/script/location.php args here if you want
This way you don't have to worry about yet another program to handle things (wget) if your settings are the same at each run then just put them in a config file or directly into the PHP script.
Timeout can be handled by this, makes PHP script run an unlimited amount of time.
set_time_limit(0);
Not sure what libraries you're using, but look into CRUL to do the POST, should work fine.
I think the biggest issues that would come up would be more sever related and less PHP/script related, i.e make sure you got the bandwidth for it, and that your PHP script CAN connect to an outside server.
If its at all possible I'd stay away from doing large transfers over HTTP. FTP is far from ideal too - but for very different reasons.
Yes it is possible to do this via ftp, http and https using curl - but this does not really solve any of the problems. HTTP is optimized around sending relatively small files in erlatively short periods of time - when you stray away from that you will end up undermining a lot of the optimization that's applied to webservers (e.g. if you've got a setting for maxrequestsperchild you could be artificially extending the life of processes which should have stopped, and there's the interaction between LimitRequest* settings and max_file_size, not to mention various timeouts and the other limit settings in Apache).
A far more sensible solution is to use rsync over ssh for content/code backups and the appropriate database replication method for the DBMS you are using - e.g. mysql replication.

Large file uploads from web pages

I code primarily in PHP and Perl. I have a client who is insisting on seeking video submissions (any encoding) from the public via one of their pages rather than letting YouTube do its job.
Server in question is a virtual machine and I can adjust ini settings for max post, max upload size etc as needed.
My initial thought is to use a Flash based uploader with PHP on the back end but I wondered if someone might have useful advice and experience on the subject?
Doing large file transfers of HTTP is not usually fun -- but sometimes it's necessary.
For large files, you'll definitely want to provide some kind of progress gauge for end-users.
There are flash-based tools that do this (swfUpload comes to mind).
If you want to avoid flash and do it with pretty html/javascript/css, you can leverage PHP's APC extension, which for some reason provides support for getting upload status from the server, as explained here
You can adjust the post size and use a normal html form. The big problem is not Apache, its http. If anything goes wrong in the transmission you will have no way to detect the error. Further more there is no way to resume the transfer. This is exactly why BitTorrent is so popular.
I don't know how against youtube your client is, but you can use their api to do the uploads from a page on your site.
http://code.google.com/apis/youtube/2.0/developers_guide_protocol.html#Uploading_Videos
See: browser based uploading.
For web-based uploads, there's not many options. Regardless of web platform, web server, etc. you're still transferring over HTTP. The transfer is all or nothing.
Your best option might be to find a Flash, Java, or other client side option that can chunk files and upload them piecemeal, then do a checksum to verify. That will allow for resuming uploads. Unfortunately, I don't know of any such open source component that does this.
Try to convince your client to change point of view.
Using http (and the browser, hell, the browser!) for this kind of issue is rarely a good deal; Will his users wait 40 minutes with the computer and the browser running until the upload is complete?
I dont think so.
Maybe, you could set up a public ftp account, where users can upload but not download and see the others user's files.. then, who want to use FTP software can, who like to do it via browser can too.
The big problem dealing using a browser is that, if something go wrong, you cant resume but have to restart from zero again.
the past year i had the same issue, i gave a look to ZUpload
, but i didnt use it so i can suggest (we wrote a small python script that we send to our customer; the python script create a torrent of the folder our costumer need to send to us, and we download it via utorrent ;)
p.s: again, sorry for my bad english ;)
I used jupload. Yes it looks horrible, but it just works.
With that said, it's still a better idea to convince the client that doing so is stupid.
I would agree with others stating that using HTML is a poor option. I believe there is a size limitation using Flash as well. I know of a script that uses a JavaScript Applet to perform an actual FTP transfer. It is called Simple2FTP and can be found at http://www.simple2ftp.com
Not sure but perhaps worth a try?

Sending large files via HTTP

I have a PHP client that requests an XML file over HTTP (i.e. loads an XML file via URL). As of now, the XML file is only several KB in size. A problem I can foresee is that the XML becomes several MBs or Gbs in size. I know that this is a huge question and that there are probably a myriad of solutions, but What ideas do you have to transport this data to the client?
Thanks!
based on your use case i'd definitely suggest zipping up the data first. in addition, you may want to md5 hash the file and compare it before initiating the download (no need to update if the file has no changes), this will help with point #2.
also, would it be possible to just send a segment of XML that has been instead of the whole file?
Ignoring how well a browser may or may-not handle a GB-sized XML file, the only real concern I can think of off the top of my head is if the execution time to generate all the XML is greater than any execution time thresholds that are set in your environment.
PHP's max_execution_time setting
PHP's set_time_limit() function
Apache's TimeOut Directive
Given that the XML is created dynamically with your PHP, the simplest thing I can think of is to ensure that the file is gzipped automatically by the webserver, like described here, it offers a general PHP approach and an Apache httpd-specific solution.
Besides that, having a browser (what else can be a PHP-client?) do such a job every night for some data synchonizing sounds like there must be a far simpler solution somewhere else.
And, of course, at some point, transferring "a lot" of data is going to take "a lot" of time...
The problem is that he's syncing up two datasets. The problem is completely misstated.
You need to either a) keep a differential log of changes to dataset A to that you can send that log to dataset B, or b) keep two copies of the dataset (last nights and the current dataset), and then compare them so you can then send the differential log from A to B.
Welcome to the world of replication.
The problem with (a) is that it's potentially invasive to all of your code, though if you're using an RDBMS you could do some logging perchance via database triggers to keep track of inserts/updates/deletes, and write the information in to a table, then export the relevant rows as your differential log. But, that can be nasty too.
The problem with (b) is the whole "comparing the database" all at once. Fine for 100 rows. Bad for 10^9 rows. Nasty nasty.
In fact, it can all be nasty. Replication is nasty.
A better plan is to look into a "real" replication system designed for the particular databases that you're running (assuming you're running a database). Something that perhaps sends database log records over for synchronization rather than trying to roll your own.
Most of the modern DBMS systems have replication systems.
Gallery2, which allows you to upload photos over http, makes you set up a couple of php parameters, post_max_size and upload_max_filesize, to allow larger uploads. You might want to look into that.
It seems to me that posting large files has problems with browser time-outs and the like, but on the plus side it works with proxy servers and firewalls better than trying a different file upload protocol.
Thanks for the responses. I failed to mention that transferring the file should be relatively fast (few mintues max, is this even possible?). The XML that is requested will be parsed and inserted into a database every night. The XML may be the same from the night before, or it may be different. One solution that was proposed is to zip the xml file and then transfer it. So there are basically two requirements: 1. it has to relatively fast 2. it should minimize the number of writes to the database.
One solution that was proposed is to zip the xml file and then transfer it. but that only satisfies (1)
Any other ideas?
Are there any algorithms that I could apply to compress the XML? How are large files such as MP3s being downloaded in a matter of seconds?
PHP receiving GB's of data will take long and is overhead.
Even more perceptible to flaws.
I would - dispatch the assignment to a shellscript (wget with simple error catching) that is not bothered by execution time and on failure could perhaps even retry on its own merit.
Am not experienced with this, but though one could use exec() or alike, these sadly run modal.
Calling a script with **./test.sh &** makes it run in background and solves that problem / i guess. The script could easily let your PHP pick it back up via a wget `http://yoursite.com/continue-xml-stuff.php?id=1049381023&status=0ยด. The id could be a filename, if you don't need to backtrack lost requests. The status would indicate how the script ended up handling the request.
Have you thought about using some sort of version control system to handle this? You could leverage its ability to calculate and send just the differences in the files, plus you get the added benefits of maintaining a version history of your file.
Since I don't know the details of your situation I'll throw question out there. Just for sake of argument does it have to be HTTP? FTP is much better suited for large data transfer and can be automated easily via PHP or Perl.
If you are using Apache, you might also consider Apache mod_gzip. This should allow you to compress the file automatically and the decompression should also happen automatically, as long as both sides accept gzip compression.

Categories