I have a PHP script that receives images from remote devices and saves them to the database. The script, launched from Apache, first receives a header defining what is being uploaded and then the contents of the uploaded images, all as a single multipart transmission. After successfully adding the images, it replies to the device with a confirmation.
Now the problem is that the connection isn't very reliable. Sometimes the transmission times out. That wouldn't be a problem in itself, since the device resends the data after a while if it doesn't receive the confirmation. But if the transmission breaks halfway through, Apache launches the script as usual, and the script happily saves the incomplete set of images to the database, with their creation timestamps as unique keys. When the remote device then resends the data, the script receives it correctly but can't save it, because the unique keys are already taken by the corrupt data.
Is there some reliable way to either tell from within a PHP script that it's been launched on incomplete multipart transmission, or prevent Apache from starting it if the transmission didn't end successfully?
(we can't really change the database structure or the format received from the remote device.)
Check your device's application code for a send timeout. Maybe the connection is too slow and the device breaks it off when the time limit is reached. In that case Apache would run the PHP script with the partially received data, even though it doesn't match the Content-Length header.
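Here is a minimal sketch of a check the script could run before touching the database. It assumes the handler can see the raw body, the declared Content-Length and the multipart boundary, and the function name is hypothetical. A complete multipart body ends with the closing delimiter `--boundary--`, and a truncated one does not. The check is a heuristic: it assumes no epilogue follows the closing delimiter. In PHP, the per-file `error` field in `$_FILES` is also worth checking, because `UPLOAD_ERR_PARTIAL` is the documented code for a partially uploaded file.

```python
from typing import Optional

def multipart_body_complete(body: bytes, boundary: str,
                            content_length: Optional[int] = None) -> bool:
    """Heuristic completeness check for a raw multipart body.

    Assumes no epilogue follows the closing delimiter (RFC 2046 allows one,
    but clients rarely send it).
    """
    # A short read means the connection dropped before the declared length arrived.
    if content_length is not None and len(body) != content_length:
        return False
    # A complete multipart body ends with "--<boundary>--", optionally followed by CRLF.
    closing = b"--" + boundary.encode("ascii") + b"--"
    return body.rstrip(b"\r\n").endswith(closing)
```

For example, `multipart_body_complete(body, "XyZ", len(body))` is true for an intact body. It is false for `body[:40]`, because both the length and the closing delimiter are wrong. Rejecting the request at this point, before any insert, leaves the unique keys free for the device's retry.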
I am trying to process user-uploaded files in real time on the web server, but it seems Apache invokes PHP only once the complete file has been uploaded.
When I uploaded the file using cURL and set
Transfer-Encoding: chunked
I had some success, but I can't do the same thing via the browser.
I used Dropzone.js, but when I tried to set the same header, it said Transfer-Encoding is an unsafe header and didn't set it.
This answer explains what the issue is there:
Can't set Transfer-Encoding: "chunked" from browser
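For reference, this is what `Transfer-Encoding: chunked` does to a request body on the wire. Each chunk is prefixed with its size in hexadecimal, and a zero-length chunk ends the message. Below is a minimal Python sketch of the encoder. The function name is illustrative, and trailers are not supported.

```python
from typing import Iterable

def encode_chunked(chunks: Iterable[bytes]) -> bytes:
    """Encode byte chunks using HTTP/1.1 chunked transfer coding (no trailers)."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would be read as the terminator, so skip it
        # Chunk size in hex, CRLF, the data itself, CRLF.
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    # The last-chunk (size 0) followed by an empty line ends the body.
    out += b"0\r\n\r\n"
    return bytes(out)
```

For instance, `encode_chunked([b"hello", b" world!"])` gives `b"5\r\nhello\r\n7\r\n world!\r\n0\r\n\r\n"`. Non-browser clients can produce this framing themselves. Python's `http.client` should apply it automatically when `HTTPConnection.request()` is given an iterable body without a Content-Length header (Python 3.6+). Browser page scripts, as the answer above explains, can't set the header.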
In a nutshell, the problem is this: when a user uploads a file to the web server, I want the web server to start processing it as soon as the first byte is available.
By processing I mean piping it to a named pipe.
I don't want all 500 MB to be uploaded to the server first and only then start processing it.
But with my current web server (Apache with PHP), I can't seem to accomplish this.
Could someone please explain what technology stack or workarounds to use so that I can upload a large file via the browser and start processing it as soon as the first byte is available?
It is possible to do that with Node.js and the multiparty module. Here they have an example of a direct upload to Amazon S3. This is the form, which sets the content type to multipart/form-data. And here is the function that processes the form parts. The part parameter is a ReadableStream, which allows per-chunk processing of the input using its data event.
More on readable streams in Node.js is here.
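The per-chunk idea in that answer is not specific to Node.js. Here is a minimal Python sketch; the function and parameter names are illustrative. The loop reads a fixed-size piece from the incoming stream and immediately writes it to the sink, which could be a named pipe opened for writing. Work therefore starts as soon as the first bytes arrive. This only helps if the server hands the body to your code as it arrives rather than buffering it first, which, as the question notes, Apache with PHP does not do.

```python
from typing import BinaryIO, Callable, Optional

def pump(src: BinaryIO, dst: BinaryIO, chunk_size: int = 64 * 1024,
         on_chunk: Optional[Callable[[int], None]] = None) -> int:
    """Copy src to dst one chunk at a time; return the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:          # b"" means the sender finished (or the stream closed)
            break
        dst.write(chunk)
        dst.flush()            # push each piece onward immediately (e.g. into a FIFO)
        total += len(chunk)
        if on_chunk is not None:
            on_chunk(total)    # progress hook: bytes copied so far
    return total
```

On Unix, the named pipe itself can be created with `os.mkfifo(path)`. Opening it with `open(path, "wb")` blocks until a reader opens the other end.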
If you really want that (sorry, I don't think it's a good idea), you should look for a FUSE filesystem that does the job.
Maybe one already exists: https://github.com/libfuse/libfuse/wiki/Filesystems
Otherwise you would have to write your own.
But remember that as soon as the upload has completed and the POST script has finished its job, the temp file will be deleted.
You can upload the file with an HTML5 resumable upload tool (like Resumable.js) and process the uploaded parts as soon as they are received.
Or, as a workaround, you may find the path of the uploaded file (usually in /tmp) and write a background job to stream it to the other application. That may be harder.
There may be other solutions...
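With a resumable uploader, the file arrives as many small requests, each carrying a chunk number. Below is a minimal sketch of the server side. It assumes 1-based chunk numbers and that chunks may arrive out of order. The class and method names are hypothetical and not part of Resumable.js. Each chunk is held back until every earlier chunk has been seen, then forwarded in order. Processing therefore starts with the first chunk rather than after the last.

```python
from typing import BinaryIO, Dict

class ChunkAssembler:
    """Forward numbered chunks (1-based) to a sink in order, as soon as possible."""

    def __init__(self, total_chunks: int, sink: BinaryIO) -> None:
        self.total_chunks = total_chunks
        self.sink = sink
        self.next_chunk = 1                  # the chunk number we are waiting for
        self.pending: Dict[int, bytes] = {}  # out-of-order chunks held back

    def add(self, number: int, data: bytes) -> bool:
        """Accept one chunk; return True once every chunk has been written."""
        if not 1 <= number <= self.total_chunks:
            raise ValueError(f"chunk {number} out of range")
        if number >= self.next_chunk:        # ignore re-sends of already-written chunks
            self.pending[number] = data
        # Flush the contiguous run of chunks that is now available.
        while self.next_chunk in self.pending:
            self.sink.write(self.pending.pop(self.next_chunk))
            self.next_chunk += 1
        return self.next_chunk > self.total_chunks
```

In a real deployment each chunk is a separate request, so this state would have to live on disk or in a shared store rather than in one object.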
In short, if I send an HTTP POST with a largish (20-30 MB) payload and the connection drops halfway through sending the request to the server, can I recover the 10 MB+ that was sent before the connection dropped?
In my testing of PHP on NGINX, if the connection drops during the upload, my PHP script never seems to start. I have ignore_user_abort(1) at the top of the script, but that only seems to matter once the complete request has been received.
Is there a configuration setting somewhere that will allow me to see all of the request that was received, even if it wasn't received in full?
I'm sending these files mostly over intermittent connections, so I'd like to send as much as I can per request and then just ask the server where to continue from. As things stand, I have to send the files in pieces, reducing the size of the pieces when there are errors and increasing it when there haven't been any errors for a while. That's very slow and wastes bandwidth.
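The "ask the server where to continue from" approach can be sketched as two server-side operations. The first reports how many bytes are already stored. The second appends a piece only if it starts exactly at that offset. This is a minimal Python sketch with hypothetical function names, assuming a single writer per upload file. The tus resumable upload protocol standardises a similar exchange over HTTP using an Upload-Offset header.

```python
import os

def current_offset(path: str) -> int:
    """How many bytes of this upload the server already holds."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return 0

def append_piece(path: str, offset: int, data: bytes) -> int:
    """Append data if it starts at the stored offset; return the new offset.

    A mismatched offset (e.g. a retry of a piece that already arrived) is
    rejected, and the client should ask for current_offset() again.
    """
    stored = current_offset(path)
    if offset != stored:
        raise ValueError(f"expected offset {stored}, got {offset}")
    with open(path, "ab") as f:
        f.write(data)
    return stored + len(data)
```

After a dropped connection, the client asks for the current offset and resends from there. Only the piece in flight is lost, not the whole file.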
=======
I should clarify that I'm not so much after uploading a large file in one go as after being able to pick up where I left off if the connection breaks. In all my testing, if the complete POST is not received, the whole request is junked and PHP is never notified, so I have to start from scratch.
I'll have to run some tests, but are you saying that if I used chunked transfer encoding for the request, PHP would get all the chunks received before the disconnection? It's worth a try, and certainly better than making multiple smaller POSTs on the off chance that the connection will break.
Thanks for the suggestion.
Never process big file uploads through a scripting backend (Ruby, PHP). nginx has built-in direct upload functionality, the client_body_in_file_only directive; see my detailed overview of it here: https://coderwall.com/p/swgfvw/nginx-direct-file-upload-without-passing-them-through-backend
The only limitation is that it doesn't work with multipart form data, only with AJAX or direct POSTs from mobile apps or server to server.
I have an odd problem: I've got an AJAX file uploader (Fine Uploader, as it happens) with a server-side PHP script that handles the upload. Although the uploader has an "allowedExtensions" setting, I want the allowed extensions to be specified on the server side, so my upload handler checks the file extension and returns an error response if the extension is not allowed.
The problem I'm having is that whereas in my dev environment the upload handler returns this response straight away (as expected) and the upload stops, on another server the upload goes ahead and the response is only returned after the upload has completed, which potentially means a very long wait before receiving an error message.
Looking at the network info in developer tools, it seems the browser is waiting for a response the whole time, and (on the problematic server) only receives it after the upload has completed, which seems to suggest that on the server the upload handler script is actually not being executed until after all the data has been received.
It seems to me that the most likely culprit is some setting to do with how PHP handles uploads / multi-part form data, but I can't figure out what that might be. Would be grateful for any advice!
Update:
It seems the problem is the same on both servers (I just didn't notice the lag on one). So it seems my upload handler script doesn't execute until the file transfer has completed. This seems likely because the script checks the filename early on and throws an exception if the extension is wrong, so it should respond quickly once it has started. Also, it always responds almost immediately once the upload has completed, however long that takes, which suggests that's when it's executed.
Is this just a feature of how PHP handles multi-part form data? Is there any way of allowing the script to respond immediately if the filename is unsuitable?
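The server-side check described in the question needs only the filename, not the file's contents. Here is a minimal Python sketch; the function name is hypothetical. Because it needs so little input, the same check could also be exposed as a separate lightweight request made before the upload starts, so the user sees the error without waiting for the body. That is an assumption about how the code might be restructured, not something the handler above does. The upload handler itself should still repeat the check.

```python
import os
from typing import Iterable

def extension_allowed(filename: str, allowed: Iterable[str]) -> bool:
    """True if filename's extension (case-insensitive, without the dot) is in allowed."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    # Files without an extension (including dotfiles like ".bashrc") are rejected.
    return ext != "" and ext in {a.lower().lstrip(".") for a in allowed}
```

For example, `extension_allowed("photo.JPG", ["jpg", "png"])` is true and `extension_allowed("script.php", ["jpg", "png"])` is false.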
It could be as simple as the dev server being fast enough to receive the file and respond quickly. If the dev server is your own machine or a local dev server, a 100 Mbit/s connection makes even fairly large files upload blazingly fast. A production server, by contrast, is often outside the local network, so the upload takes much longer...
Sadly no, PHP can't respond before the request is complete, because the web server only hands the request to PHP once the whole body has arrived. You can cut off the connection once the request is sent and not read the response, but you can't expect to receive a response until the whole request has been sent. Although I used to do this with my friends (cutting them off and answering before the end of the question), I can assure you that only humans are capable of that feat! Oh, and don't do that, it breaks friendships.
Also, multipart doesn't mean chunked upload. It means the message is split into different parts, with a boundary string separating the elements. If processing started before the whole request had been sent, you'd need mechanisms to detect whether a given part of the request had been completely uploaded, which would make the web even harder to program than it is now!
You can look at this for an example of what a multipart request looks like htmlcodetutorial.com/forms/form_enctype.html
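To make the structure concrete, here is a minimal Python sketch that assembles a multipart/form-data body by hand. The function name and the boundary value are illustrative; real clients pick a random boundary that does not occur in the data. Each part starts with `--boundary` and carries its own headers. The body ends with `--boundary--`, and the receiver can't know a part is finished until it sees the next delimiter.

```python
from typing import List, Tuple

def build_multipart(boundary: str, fields: List[Tuple[str, str, bytes]]) -> bytes:
    """Build a multipart/form-data body from (name, filename, content) tuples.

    An empty filename produces a plain form field instead of a file part.
    """
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    out = bytearray()
    for name, filename, content in fields:
        disposition = f'form-data; name="{name}"'
        if filename:
            disposition += f'; filename="{filename}"'
        out += delimiter + crlf
        out += f"Content-Disposition: {disposition}".encode("utf-8") + crlf
        if filename:
            out += b"Content-Type: application/octet-stream" + crlf
        out += crlf + content + crlf  # blank line separates part headers from content
    out += delimiter + b"--" + crlf   # closing delimiter marks the end of the body
    return bytes(out)
```

The request carrying such a body would be sent with the header `Content-Type: multipart/form-data; boundary=XyZ` (for a boundary of "XyZ").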
I have a really weird behavior going on.
I'm hosting tracking software for users that mainly logs mobile traffic. The flow is as follows:
1. My client gets a PHP code snippet to put on his website.
2. This code sends a cURL POST (with predefined post fields such as visitor IP, user agent, host, etc.) to my server.
3. My server logs the data and decides what the risk level is.
4. It then responds to the client's server with the status; that is, it sends "true" or "false" back to the client's server.
5. The client's server gets that response and decides what to do (load different HTML content, redirect, block the visitor, etc.).
The problem I'm facing is that, for some reason, all the requests made from my clients' servers to my server are recorded and stored in a log file, but my clients report click loss, as if my server sends back the response but their servers fail to receive it.
I should note that there are tons of requests every minute from different clients' servers, and many from each client.
Could the reason be related to CURLOPT_RETURNTRANSFER not getting any response? Or maybe the problem is cURL overload?
I really have no idea. My server is pretty fast and uses only 10% of its resources.
Thanks in advance for your thoughts.
You've touched a very problematic domain, high-load servers. Your problem can be in so many places that you will have to spend real time fixing it, or at least partially fixing it.
First of all, you should understand what is really going on, check out this simplified scheme:
1. The client's PHP code tries to open a connection to your server; to do this, it sends some data over the network to your server.
2. Your server (I suppose Apache) accepts it if it has the resources; check the maximum-connections settings in the Apache config (e.g. MaxRequestWorkers).
3. If the server can accept the connection, it creates a new thread (or uses one from the thread pool).
4. After the thread has started, it runs your PHP script.
5. Your PHP script does some work, connects to the DB and sends the response back over the network.
6. The client waits for the answer from step 5, or closes the connection because of a timeout.
So, at each step you can have a bottleneck:
Network bandwidth
Max opened connections
Thread pool size
Script execution time
Max database connections, table locks, I/O wait times
Client timeouts
And this is not a complete list of the places where a problem can occur and eventually lead to an empty cURL response.
To start with, I suggest adding logging to both PHP scripts (client and server) and storing all curl_error() output in a text file. At least then you will see which problems occur most often.
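Logging on the calling side is what separates a timeout from a refused or reset connection. In the original PHP clients, `curl_errno()` and `curl_error()` report the failure. As a language-neutral illustration, here is a minimal Python sketch; the function names are hypothetical. It wraps whatever performs the request and appends each failure, with its type and elapsed time, to a text file. It then counts the failures by type.

```python
import time
from collections import Counter
from typing import Callable, Optional

def call_and_log(send: Callable[[], bytes], log_path: str) -> Optional[bytes]:
    """Run one request; on failure append error type, elapsed time and message to log_path."""
    start = time.monotonic()
    try:
        return send()
    except Exception as exc:  # e.g. TimeoutError, ConnectionResetError, urllib.error.URLError
        elapsed = time.monotonic() - start
        message = str(exc).replace("\n", " ")  # keep one failure per log line
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')}\t{type(exc).__name__}"
                      f"\t{elapsed:.3f}s\t{message}\n")
        return None

def summarize(log_path: str) -> Counter:
    """Count logged failures by error type to see which problem occurs most often."""
    with open(log_path, encoding="utf-8") as log:
        return Counter(line.split("\t")[1] for line in log if line.strip())
```

`send` could be, for example, `lambda: urllib.request.urlopen(url, data, timeout=5).read()`. A log dominated by timeouts points at the later steps in the list above. Refused connections point at the connection limits.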
I have a PHP file on the server. When a user sends a request, that PHP file stores the user's values in the database. But my problem is that it stores values only for some requests.
If you want to make sure a request was received, you can:
Check the server access log.
No matter how your PHP script is hosted, you should be able to access this. The specifics vary depending on your situation, though.
Try logging requests to a text file when the script starts.
If your database insertion somehow fails, this should still work, so it's a way to confirm that the script actually receives the request.
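Here is the "log first" idea sketched in Python as a WSGI app; the names are hypothetical. The request is written to a text log as the very first step, before the database insert, and any insert failure is logged too. If a request shows up in this log but not in the database, the insert is the problem. If it is missing from this log as well, the request never reached the script.

```python
import time
from typing import Callable, Dict, Iterable

def logging_app(log_path: str, store: Callable[[Dict], None]):
    """Wrap a handler so every request is logged before the database is touched."""
    def app(environ: Dict, start_response: Callable) -> Iterable[bytes]:
        # Step 1: record that the request arrived, independent of what happens next.
        line = "{}\t{}\t{}\t{}?{}\n".format(
            time.strftime("%Y-%m-%d %H:%M:%S"),
            environ.get("REMOTE_ADDR", "-"),
            environ.get("REQUEST_METHOD", "-"),
            environ.get("PATH_INFO", ""),
            environ.get("QUERY_STRING", ""),
        )
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(line)
        # Step 2: the actual work (e.g. the database insert), which may fail.
        try:
            store(environ)
        except Exception as exc:
            with open(log_path, "a", encoding="utf-8") as log:
                log.write(f"\tstore failed: {type(exc).__name__}: {exc}\n")
            start_response("500 Internal Server Error", [("Content-Type", "text/plain")])
            return [b"error\n"]
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok\n"]
    return app
```

The same two-step structure carries over directly to the PHP script: append to a log file first, then do the insert.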
Your webserver will probably log all connection attempts; find out where those logs are stored (it varies based on which webserver you use, which operating system you run it under, and how you have configured it), and then you can search for the specific URL that you are interested in.
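Assume your server writes the common or combined access log format, which is what many default Apache and nginx configurations use, though yours may be customised. Under that assumption, here is a minimal Python sketch for pulling out the requests for one URL together with their status codes; the names are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the request and status fields of the common/combined log format,
# e.g.  "POST /save.php?u=1 HTTP/1.1" 200
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]*" (?P<status>\d{3})')

def hits_for(lines: Iterable[str], path: str) -> Iterator[Tuple[str, str]]:
    """Yield (method, status) for each access-log line whose request path equals path."""
    for line in lines:
        m = REQUEST_RE.search(line)
        # Compare the path only, ignoring any query string.
        if m and m.group("target").split("?", 1)[0] == path:
            yield m.group("method"), m.group("status")
```

Feeding it the lines of the access log (for example, an open log file) shows whether the missing requests arrived at all, and with what status.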