Partial uploads, upload only header

Partial uploads, upload only header - php

Is there a way in PHP use a HTML form to upload only the header of the file. The length of the header is defined within the first few bytes of the file, I want only that part nothing else. After I have received the last byte of the header, I don't want any more of the file. Is there a graceful way to do this in PHP or am I going to have to do some socket programming?

In the PHP side, you cannot abort an upload (HTTP connection) gracefully. The client would face a "connection reset by peer" error (or whatever friendlier flavored message the webbrowser made out of the error). Best what you can do is to ignore the read bytes, but that's not (clearly?) easy in PHP since it's a pretty high level language.
In the HTML side, there's absolutely no way to upload only the "header" of a file. It's the complete file or nothing. There are countless different file formats and HTML has no notion of any of them, let alone know how to distinguish the "header" part of the file in question.
Your best bet is to write a little application which get served by the webpage and downloaded into client's machine and runs locally over there. This application should have a file chooser/picker/opener and then programmatically determine the "header" and send it to the server. You could do that in flavor of a Java Applet or MS Silverlight or probably Adobe Flash.

Related

Process Uploaded file on web server without storing locally first?

I am trying to process the user uploaded file real time on the websever,
but it seems, APACHE invokes PHP, only once complete file is uploaded.
When i uploaded the file using CURL, and set
Transfer-Encoding : "Chunked"
I had some success, but can't do same thing via browser.
I used Dropzone.js but when i tried to set same header, it said Transfer -Encoding is an unsafe header, hence not setting it.
This answer explains what is the issue there.
Can't set Transfer-Encoding :"Chunked from Browser"
In a Nutshell problem is , when a user uploads the file to webserver, i want webserver to start processing it as soon as first byte is available.
by process i mean, PIPING it to a Named Pipe.
Dont want 500mb first getting uploaded to a server, then start processing it.
But with current Webserver (APACHE - PHP), I cant seem to be able to accomplish it.
could someone please explain, what technology stack or workarounds to use, so that i can upload the large file via browser and start processing it, as soon as first byte is available.

It is possible to use NodeJS/Multiparty to do that. Here they have an example of a direct upload to Amazon S3. This is the form, which sets content type to multipart/form-data. And here is the function for form parts processing. part parameter is of type ReadableStream, which will allow per-chunk processing of the input using data event.
More on readable streams in node js is here.

If you really want that (sorry don`t think thats a good idea) you should try looking for a FUSE Filesystem which does your job.
Maybe there is already one https://github.com/libfuse/libfuse/wiki/Filesystems
Or you should write your own.
But remember as soon as the upload is completed and the post script finishes his job the temp file will be deleted

you can upload file with html5 resumable upload tools (like Resumable.js) and process uploaded parts as soon as they received.
or as a workaround , you may find the path of uploaded file (usually in /tmp) and then write a background job to stream it to 3rd app. it may be harder.
there may be other solutions...

How is it possible to get the name of PHP's currently uploading file's temp name

As all of us know, PHP finishes the upload and the enables you to use move_uploaded_file(); before this, however, it creates a temp file and then does the job. I want to know is it possible to get the name of this uploaded file during the file upload and before populating it into $_FILES?
I want to get the upload progress, while $_SESSION and Javascript onprogress solution both suck..

$_FILES['file']['tmp_name']; is the filename. It is not possible in PHP (without using ugly tricks) to get the filename before the upload is finished.
To do this, you have to fallback on either Flash (uploadify) or CGI (Perl / Python / C++ / Other)

A "reliable" progress bar, which seems to be your goal, will always require some sort of server and client support. In its most general and portable instance, PHP will see only the completed upload and you'll get no progress bar, but only the filled $_FILES structure.
On some platforms the information can be garnered from the system itself. For example under Linux/Apache you can inspect what temporary files Apache has opened in the /proc pseudo-filesystem, where available; so you need to put in the requisites "Linux, Apache, php5_module, /proc".
You can use a dedicated POST endpoint that does not terminate on the Web server, but on a specially crafted uploader process (I worked on a Perl script doing this years ago; I recall it used POE, and the architecture):
POST (from browser) ==> (server, proxying) ==> UPLOADER
The uploader immediately echoes a crafted GET to the server, activating
a PHP "pre-upload" page, and then might call a progress GET URL periodically
to update the upload status. When completed, it would issue a pseudo POST
to PHP "almost" as if it came from the client, sending $_POST['_FILES']
instead of $_FILES.
The $_SESSION solution is a good compromise but relies on the server not doing buffering.
A better and more "modern" solution would be to leverage the chunked upload AJAX trick and get resumable uploads, reliable progress and large file support all in one nifty package. See for example this other answer. Now you get wider server support but the solution won't work on some older browsers.
You could offer the user the choice between old-style FILE upload, Flash uploader (which bypasses all problems as it doesn't rely on the browser but on Flash code), Java FTP upload control (same thing, but sometimes with some protocol and firewall issues since it doesn't use HTTP as the container web page does), and AJAX HTML5 chunking, possibly based on browser capabilities.
I.e., a user with IE6 would see a form saying
SORRY!
Your browser does not support large file uploads and progress bar.
To send a file of no more than XXX meg,
[ ] [Choose file...] [ >> BEGIN UPLOAD >>> ]

PHP redirect file post stream

I'm writing an API using php to wrap a website functionality and returning everything in json\xml. I've been using curl and so far it's working great.
The website has a standard file upload Post that accepts file(s) up to 1GB.
So the problem is how to redirect the file upload stream to the correspondent website?
I could download the file and after that upload it, but I'm limited by my server to just 20MG. And it seems a poor solution.
Is it even possible to control the stream and redirect it directly to the website?

I preserverd original at the bottom for posterity, but as it turns out, there is a way to do this
What you need to use is a combination of HTTP put method (which unfortunately isn't available in native browser forms), the PHP php://input wrapper, and a streaming php Socket. This gets around several limitations - PHP disallows php://input for post data, but it does nothing with regards to PUT filedata - clever!
If you're going to attempt this with apache, you're going to need mod_actions installed an activated. You're also going to need to specify a PUT script with the Script directive in your virtualhost/.htaccess.
http://blog.haraldkraft.de/2009/08/invalid-command-script-in-apache-configuration/
This allows put methods only for one url endpoint. This is the file that will open the socket and forward its data elsewhere. In my example case below, this is just index.php
I've prepared a boilerplate example of this using the python requests module as the client sending the put request with an image file. If you run the remote_server.py it will open a service that just listens on a port and awaits the forwarded message from php. put.py sends the actual put request to PHP. You're going to need to set the hosts put.py and index.php to the ones you define in your virtual host depending on your setup.
Running put.py will open the included image file, send it to your php virtual host, which will, in turn, open a socket and stream the received data to the python pseudo-service and print it to stdout in the terminal. Streaming PHP forwarder!
There's nothing stopping you from using any remote service that listens on a TCP port in the same way, in another language entirely. The client could be rewritten the same way, so long as it can send a PUT request.
The complete example is here:
https://github.com/DeaconDesperado/php-streamer
I actually had a lot of fun with this problem. Please let me know how it works and we can patch it together.
Begin original answer
There is no native way in php to pass a file asynchronously as it comes in with the request body without saving its state down to disc in some manner. This means you are hard bound by the memory limit on your server (20MB). The manner in which the $_FILES superglobal is initialized after the request is received depends upon this, as it will attempt to migrate that multipart data to a tmp directory.
Something similar can be acheived with the use of sockets, as this will circumvent the HTTP protocol at least, but if the file is passed in the HTTP request, php is still going to attempt to save it statefully in memory before it does anything at all with it. You'd have the tail end of the process set up with no practical way of getting that far.
There is the Stream library comes close, but still relies on reading the file out of memory on the server side - it's got to already be there.
What you are describing is a little bit outside of the HTTP protocol, especially since the request body is so large. HTTP is a request/response based mechanism, and one depends upon the other... it's very difficult to accomplish a in-place, streaming upload at an intermediary point since this would imply some protocol that uploads while the bits are streamed in.
One might argue this is more a limitation of HTTP than PHP, and since PHP is designed expressedly with the HTTP protocol in mind, you are moving about outside its comfort zone.
Deployments like this are regularly attempted with high success using other scripting languages (Twisted in Python for example, lot of people are getting onboard with NodeJS for its concurrent design patter, there are alternatives in Ruby or Java that I know much less about.)

How to read a client-side file header from a webpage?

I am making an online tool for identifying certain file types. I need to access some byte values from the file header to do this.
The user selects the file on the client machine. Somehow, I need to get the key byte values from the file, and then these are looked up in a server side database to categorize the file.
How can I read bytes from a client-side file?
I know I could have the user upload the file to the server, but these files are very large, and I only need a few bytes, so it would be slow and wasteful to upload the whole file.
Could I somehow upload part of the file? It seems it is difficult to cancel a html form upload and the file-part is not available after cancel. Is this correct?
Is it possible to read a file in javascript? I have googled this, but the answer is unclear. I have read that it is possible with a java applet, but only if the applet is signed.
Is there some other way?

You can use html5, but will need to fallback on flash or some other non-javascript method for older browsers.
http://www.html5rocks.com/en/tutorials/file/dndfiles/

So. as Said above you must use non-javascript methodds. But each of this methods has some minus.
FLASH - bad work with proxy. Really bad. Of course you can use flash obly for get base64 code of file and give it to js. In this case this will be work greate.
Java Applet - greate work but not many users have JVM or versions of JVM may not be sasme (but if you will use JDK1.4 or 1.5 thi is no problem).
ActiveX - work only in IE and on Windows
HTML5 File Api - not cross browsers solution. Will be work only on last browsers and not in all.
of course much better use server side - in php for example getmimetype and other functions.
But I can manually change headers of my file. For example i can add to php file headers from jpeg or png - and your script will be think that is image.
So this is bad solution : use headers. For check filetype maybe simple use mimetype of file of trust to user and generate icon through file extension

Uploading big files over HTTP

I need to upload potentially big (as in, 10's to 100's of megabytes) files from a desktop application to a server. The server code is written in PHP, the desktop application in C++/MFC. I want to be able to resume file uploads when the upload fails halfway through because this software will be used over unreliable connections. What are my options? I've found a number of HTTP upload components for C++, such as http://www.chilkatsoft.com/refdoc/vcCkUploadRef.html which looks excellent, but it doesn't seem to handle 'resume' of half done uploads (I assume this is because HTTP 1.1 doesn't support it). I've also looked at the BITS service but for uploads it requires an IIS server. So far my only option seems to be to cut up the file I want to upload into smaller pieces (say 1 meg each), upload them all to the server, reassemble them with PHP and run a checksum to see if everything went ok. To resume, I'd need to have some form of 'handshake' at the beginning of the upload to find out which pieces are already on the server. Will I have to code this by hand or does anyone know of a library that does all this for me, or maybe even a completely different solution? I'd rather not switch to another protocol that supports resume natively for maintenance reasons (potential problems with firewalls etc.)

I'm eight months late, but I just stumbled upon this question and was surprised that webDAV wasn't mentioned. You could use the HTTP PUT method to upload, and include a Content-Range header to handle resuming and such. A HEAD request would tell you if the file already exists and how big it is. So perhaps something like this:
1) HEAD the remote file
2) If it exists and size == local size, upload is already done
3) If size < local size, add a Content-Range header to request and seek to the appropriate location in local file.
4) Make PUT request to upload the file (or portion of the file, if resuming)
5) If connection fails during PUT request, start over with step 1
You can also list (PROPFIND) and rename (MOVE) files, and create directories (MKCOL) with dav.
I believe both Apache and Lighttpd have dav extensions.

You need a standard size (say 256k). If your file "abc.txt", uploaded by user x is 78.3MB it would be 313 full chunks and one smaller chunk.
You send a request to upload stating filename and size, as well as number of initial threads.
your php code will create a temp folder named after the IP address and filename,
Your app can then use MULTIPLE connections to send the data in different threads, so you could be sending chunks 1,111,212,313 at the same time (with separate checksums).
your php code saves them to different files and confirms reception after validating the checksum, giving the number of a new chunk to send, or to stop with this thread.
After all thread are finished, you would ask the php to join all the files, if something is missing, it would goto 3
You could increase or decrease the number of threads at will, since the app is controlling the sending.
You can easily show a progress indicator, either a simple progress bar, or something close to downthemall's detailed view of chunks.

libcurl (C api) could be a viable option
-C/--continue-at
Continue/Resume a previous file transfer at the given offset. The given offset is the exact number of bytes that will be skipped, counting from the beginning of the source file before it is transferred to the destination. If used with uploads, the FTP server command SIZE will not be used by curl.
Use "-C -" to tell curl to automatically find out where/how to resume the transfer. It then uses the given output/input files to figure that out.
If this option is used several times, the last one will be used

Google have created a Resumable HTTP Upload protocol. See https://developers.google.com/gdata/docs/resumable_upload

Is reversing the whole proccess an option? I mean, instead of pushing file over to the server make the server pull the file using standard HTTP GET with all bells and whistles (like accept-ranges, etc.).

Maybe the easiest method would be to create an upload page that would accept the filename and range in parameter, such as http://yourpage/.../upload.php?file=myfile&from=123456 and handle resumes in the client (maybe you could add a function to inspect which ranges the server has received)

# Anton Gogolev
Lol, I was just thinking about the same thing - reversing whole thing, making server a client, and client a server. Thx to Roel, why it wouldn't work, is clearer to me now.
# Roel
I would suggest implementing Java uploader [JumpLoader is good, with its JScript interface and even sample PHP server side code]. Flash uploaders suffer badly when it comes to BIIIGGG files :) , in a gigabyte scale that is.

F*EX can upload files up to TB range via HTTP and is able to resume after link failures.
It does not exactly meets your needs, because it is written in Perl and needs an UNIX based server, but the clients can be on any operating system. Maybe it is helpful for you nevertheless:
http://fex.rus.uni-stuttgart.de/

Exists the protocol called TUS for resumable uploads with some implementations in PHP and C++

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.