preg_replace on HTML page gives net ::ERR_INVALID_CHUNKED_ENCODING


I have a simple curl call that retrieves an HTML page from the server, then a preg_replace() that inserts something into the page, and the result is echoed back to the browser.
What I noticed is that if the HTTP server that curl fetches the HTML page from uses the header 'Transfer-Encoding: chunked', the HTML output is somehow encoded (I noticed a few strange characters): the preg_replace() call does its job, but the browser just gets ERR_INVALID_CHUNKED_ENCODING and won't load the page. There must be a way to replace part of the page without messing up the chunked encoding?

Chunked transfer-encoding is an HTTP 1.1 feature for cases where the server doesn't know the size of the resource when it starts to send the data, so it sends a series of "chunks" to the client, each chunk preceded by its size (the number of bytes, in hexadecimal).
Alas, if you insert data into a chunk, you must change the size of the chunk too when you send it to the browser. Alternatively of course, you get the full thing, do your replacement and send out the entire response in one single chunk (or even without chunks).
A proper HTTP 1.1 client should be able to decode the chunks and a proper HTTP 1.1 server should send a legitimate series of chunks (a somewhat common server-side error is to leave out the final zero-sized chunk).
See here for the spec: https://www.rfc-editor.org/rfc/rfc7230#section-4.1
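As a rough sketch of the "get the full thing, do your replacement" route: if the body you hold is still chunk-framed, it can be unwrapped before preg_replace() touches it. The function name decodeChunked() is made up for this example, and the sketch assumes the complete raw body is already in memory and ignores any trailer headers.

```php
<?php
/**
 * Decode a chunk-framed HTTP body into the plain payload.
 * Hypothetical helper for illustration: assumes the whole body is in
 * memory and ignores trailer headers after the last chunk.
 */
function decodeChunked(string $raw): string
{
    $body = '';
    $pos = 0;
    while (true) {
        // Each chunk starts with a size line terminated by CRLF.
        $lineEnd = strpos($raw, "\r\n", $pos);
        if ($lineEnd === false) {
            throw new RuntimeException('Missing chunk size line');
        }
        // The size line may carry extensions after ';', which are dropped here.
        $sizeField = trim(explode(';', substr($raw, $pos, $lineEnd - $pos))[0]);
        if ($sizeField === '' || !ctype_xdigit($sizeField)) {
            throw new RuntimeException('Invalid chunk size');
        }
        $size = (int) hexdec($sizeField);
        if ($size === 0) {
            return $body; // the zero-sized chunk ends the body
        }
        $start = $lineEnd + 2;
        if (strlen($raw) < $start + $size + 2) {
            throw new RuntimeException('Truncated chunk');
        }
        $body .= substr($raw, $start, $size);
        $pos = $start + $size + 2; // skip the chunk data and its trailing CRLF
    }
}
```

Once decoded, the page is an ordinary string: run preg_replace() on it and send the result with a Content-Length that matches the new size, without forwarding the upstream Transfer-Encoding header.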

Related

Verifying complete response from cURL

Sometimes when using multiple concurrent connections and scraping with cURL in my PHP script, incomplete webpages are returned. Is there some value in curl_getinfo() that will let me know if a webpage was 100% fetched vs. only 90% fetched?
Would the content-size header of a returned page be the actual size of what was returned or would it be the entire page? If so, I could check the content-size against the actual size of the response..
Thanks!

Assuming your question is whether you can check if the content size header comes from the other side or is calculated on your side, yes, you can use that header to check if you've received the full response because it is generated on the other side from the actually intended content. A few things, though:
It's Content-Length, not Content-Size;
you can use it as long as you trust the other party to set it correctly;
it may not be available because while it SHOULD exist, it is not strictly necessary.
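A minimal sketch of that check, assuming the two numbers come from curl_getinfo() on a finished handle. The helper name isCompleteResponse() is invented for this example, and the caveats above still apply: the result is only as trustworthy as the header the other side sent.

```php
<?php
/**
 * Compare the length the server declared with the bytes actually received.
 * Returns true or false when a comparison is possible, and null when the
 * server declared no length. Hypothetical helper for illustration.
 */
function isCompleteResponse(float $declaredLength, int $receivedBytes): ?bool
{
    if ($declaredLength < 0) {
        return null; // cURL reports -1 when there was no Content-Length
    }
    return (int) $declaredLength === $receivedBytes;
}

// With a finished cURL handle $ch, the two inputs could be read like this:
//   $declared = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
//   $received = (int) curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD);
//   $ok = isCompleteResponse($declared, $received);
```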

Display BLOB data PHP?

How would I display BLOB data with PHP? I've entered the BLOB into the DB, but how would I retrieve it? Any examples would be great.

I considered voting to close this as a duplicate, but the title is pretty good, and looking through other questions, I don't find a complete answer to a general question. These sorts of questions betray an absence of understanding of the basics of HTTP, so I wrote this long answer instead. I've glossed over a bit, but anyone who understands the following probably wouldn't need to ask a question like this one. Or if they did, they'd be able to ask a more specific question.
First - If you're storing images or other files in the database, stop and reconsider your architecture. RDBMSes aren't really optimized to handle BLOBs. There are a number of (non-relational) databases that are specifically tuned to handle files. They are called filesystems, and they're really good at this. At least 95% of the time that I've found regular files stuck in a RDBMS, it's been pointless. So first off, consider not storing the file data in the database, use the filesystem, and store some small data in the database (paths if you must, often you can organize your filesystem so all you need is a unique id).
So, you're sure you want to store your blob in the database?
In that case, you need to understand how HTTP works. Without getting into too much detail, whenever some client requests a URL (makes an HTTP Request), the server responds with a HTTP Response. A HTTP response has two major parts: the headers, and the data. The two parts are separated by two consecutive newlines.
Headers, on the wire, are simple plain-text key/value pairs that look like:
Name: value
and are separated by a newline.
The data is basically a BLOB. It's just data. The way that data is interpreted is decided (by the client) based on the value of the Content-Type header that accompanies it. The Content-Type header specifies the Internet Media Type of the data contained in the data section.
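To make the headers/data split concrete, here is a small sketch that takes a raw response as a string and pulls it apart. The function name splitHttpResponse() is made up for this example; it assumes the wire format's CRLF line endings and ignores repeated or folded headers.

```php
<?php
/**
 * Split a raw HTTP response into status line, headers and data.
 * Hypothetical helper for illustration: assumes CRLF line endings and
 * keeps only the last value when a header name is repeated.
 */
function splitHttpResponse(string $raw): array
{
    // The first blank line (CRLF CRLF) separates the headers from the data.
    $parts = explode("\r\n\r\n", $raw, 2);
    $headerLines = explode("\r\n", $parts[0]);
    $statusLine = array_shift($headerLines);

    $headers = [];
    foreach ($headerLines as $line) {
        if (strpos($line, ':') === false) {
            continue; // not a "Name: value" pair
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }

    return [
        'status'  => $statusLine,
        'headers' => $headers,
        'body'    => $parts[1] ?? '', // just bytes; Content-Type says what they mean
    ];
}
```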
See it work
There's nothing magic about this. For a regular HTML page, the whole response is human readable. Try the following:
$ telnet google.com 80 # connect to google.com on port 80
You'll see something like:
Trying 74.125.113.104...
Connected to google.com.
Escape character is '^]'.
Now type:
GET /
(followed by return).
You've just made a very simple HTTP request! And you've probably received a response. Look at the response. You'll see all the headers, followed by a blank line, followed by the HTML code of the google home page.
So what?
So now you know what web servers do. They take requests (like GET /), and return responses (comprised of headers followed by a blank line (two consecutive newlines) followed by data).
Now, it's time to realize that:
Your web application is really just a customized web server
All that code you write takes whatever the request is, and translates it into an HTTP response. So you're basically just making a specialized version of apache, or IIS, or nginx, or lighty, or whatever.
Now, the default way that a web server usually handles requests is to look for a file in a directory (the document root), look at it to figure out which headers to send, and then send those headers, followed by the file contents.
But, while your webserver does all that magically for files in the filesystem, it is completely ignorant of some BLOB in an RDBMS. So you have to do it yourself.
If you know the contents of your BLOB are, say, a JPG image that should be named based on a "name" column in the same table, you might do something like:
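<?php
// query() and fetch_assoc() stand in for whatever database API you use.
$result = query('select name, blobdata from table where id = 5');
$row = fetch_assoc($result);
// Tell the client how to interpret the bytes that follow.
header('Content-Type: image/jpeg');
echo $row['blobdata'];
?>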
(If you wanted to hint that the browser should download the file instead of displaying it, you might use an additional header like: header('Content-Disposition: attachment; filename="' . $row['name'].'"');)
PHP is smart enough to provide the header() function, which sets headers, and makes sure they're sent first (and separated from the data). Once you're done setting headers, you just send your data.
As long as your headers give the client enough information about how to handle the data payload, everything is hunky-dory.
Hooray.

Simple example:
$blob_data = "something you've got from BLOB field";
header('Content-type: image/jpeg'); // e.g. if it's JPEG image
echo $blob_data;

Strip all headers in PHP response

How if at all can I strip all the header from a PHP response through apache in order to just stream the text response. I've tried adding a custom htaccess but to no avail. I've got limited control of the hosting server. The stream is read by an embedded device which doesn't need any headers.

It gets to a point where certain headers are NEEDED by the browser so that it can interpret and render the output. If the reason you want to remove the headers is a chat-like feature, think about using a persistent keep-alive connection.
Tips in reducing bandwidth
Use ajax: keep the response from PHP in JSON format and update DOM elements
Gzip.
Just don't worry about headers -- typically a HTTP OK response will only take up < 200 bytes, hardly anything in comparison to the actual page content. Focus on where it really matters.
Edit:
To suit your case, look into using sockets (UDP would be a good option if you want to cut back on a lot of bandwidth): socket_listen() (non-UDP) or socket_bind(), which is capable of UDP.

That's impossible.
You are using HTTP protocol and HTTP protocol response always contains headers.
Either do not use HTTP or teach your device to strip headers. It's not that hard.
Anyway, PHP has very little to do with removing headers. There is also a web server that actually interacts with your device and is taught to send proper headers.

There is a PHP function called header_remove(). I never used it before but you can try if this works for you. Note that this function is available since PHP 5.3.0.

How is http post method implemented?

I want to know what happens behind the scenes of an HTTP POST method,
i.e. the browser sends an HTTP POST request to a server-side script (in PHP, for example).
How does PHP's $_POST variable get the values from the client?
Could someone explain in detail or point to a guide?

The HTTP protocol(*) specifies how the browser should send the request.
HTTP basically consists of a set of headers in plain text, separated by line feeds, followed by the data being transmitted. Inside the HTTP request, POST data is actually formatted pretty much the same as GET data; it's just in a different part of the HTTP request (the body, rather than the URL).
You can use tools like Firebug or Fiddler to see exactly how the headers and data are formatted for incoming and outgoing HTTP requests. It's actually all quite simple to read, so you should be able to work it out just by looking.
Once it gets to the server, the PHP interpreter is responsible for translating the raw HTTP request data into its standard $_GET, $_POST, etc variables. This is something that PHP does for you.
Other languages (eg Perl) do not have this functionality built in, so a Perl programmer would have to have code in their program to parse the incoming request data into useful variables. Fortunately, even Perl has a standard library which can be included that does the job, so even Perl programmers don't generally have to write the code themselves any more.
The way PHP, and any other language, does it is simply string manipulation. As I said, the HTTP data is plain text and is received in simple string format, so it's just a case of breaking it down by splitting it on ampersand and equals sign characters.
As PHP does it all behind the scenes, you probably don't need to worry about the exact mechanisms it uses, but the PHP source code is available if you really want to find out.
I said it's all in plain text. HTTPS, of course, is encrypted. However by the time PHP gets hold of it, the Apache server has already done the decryption, so as far as PHP is concerned it's still plain text.
(*) Before anyone pulls me up on it, yes, I know that saying "HTTP protocol" is a redundancy, like "ATM machine" or "PIN number".

The browser encodes the data according to the content-type of the form, then transmits it as the body of a POST request. PHP then picks it up and populates $_POST with the names and values (performing special handling when the name includes the characters [ and ] or .).
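PHP's parse_str() applies the same naming rules to a URL-encoded string, so it gives a fair approximation of how $_POST ends up looking. This is only a sketch: the wrapper name parseFormBody() and the sample body are made up, and it covers application/x-www-form-urlencoded bodies only, not multipart ones.

```php
<?php
/**
 * Turn a URL-encoded form body into an array shaped like $_POST.
 * Hypothetical wrapper for illustration; it approximates PHP's own
 * handling and does not cover multipart/form-data.
 */
function parseFormBody(string $body): array
{
    $fields = [];
    // parse_str() URL-decodes names and values, builds arrays for names
    // using [ ], and converts dots in names to underscores.
    parse_str($body, $fields);
    return $fields;
}

// "tags%5B%5D" is the URL-encoded form of "tags[]".
$post = parseFormBody('foo=bar&tags%5B%5D=php&tags%5B%5D=http&user.name=alice');
// $post is now:
//   ['foo' => 'bar', 'tags' => ['php', 'http'], 'user_name' => 'alice']
```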

I'd suggest to get a capturing proxy (e.g. Fiddler) or a network capture tool (e.g. Wireshark) and watch your own browsing traffic for a while; it will give you a nice view of the issue.
Other than that, POST is rather similar to GET, except that the data is sent in the body of the request instead of the URL, and there are two ways to encode it (multipart/form-data in addition to the URL encoding that's shared with GET).

Well, let's illustrate step by step, starting with a page containing a [form action="foo.php" method="post"]
Once you click submit (or hit enter), the browser will trigger an event named "submit". This event can be caught internally for processing with JavaScript/DOM, and this is what most sites do for validation or Ajax routines.
If those routines do not stop the flow with a return false, the browser continues to process the POST request (this process is the same as making a POST with the XMLHttpRequest object).
The browser will first check the method, action and content encoding, then parse the input values to know the size of the data it will send, and encode it.
Finally it sends something like this (raw values):
POST /foo.php HTTP/1.1
Host: example.org
Content-Type: application/x-www-form-urlencoded
Content-Length: 7

foo=bar
This is a POST request. Note, though, that the browser may announce a Content-Length and then deliver the body in several pieces; browser and server both know this can happen (carrying a body like this is what the POST method is for). When a server receives a POST request, it keeps listening to the browser until the content received matches the announced Content-Length.
Now the other side. The server receives the request, reads the content, parses it (foo = bar; xxx = baz), and makes it available in its environment for that specific request, so you can pick it up with PHP or Python or Java...
That's it. Oh, and note that you can pass both GET and POST variables in the same request!
Using a [form action="foo.php?someVar=123&anotherVar=TRUE" method="post"]
Will make the browser send the request as
POST /foo.php?someVar=123&anotherVar=TRUE HTTP/1.1
Host: example.org
Content-Type: application/x-www-form-urlencoded
Content-Length: 7

foo=bar
And the server, when parsing this request, will make the following variables available:
GET[someVar] = 123
GET[anotherVar] = TRUE
POST[foo] = bar
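The server-side half of this can be sketched in a few lines of PHP. The function name parseRawRequest() is made up, and the sketch assumes a complete request held in one string with CRLF line endings and a URL-encoded body; a real server also has to honour Content-Length and other content types.

```php
<?php
/**
 * Extract GET and POST variables from a raw HTTP request string.
 * Hypothetical helper for illustration: assumes CRLF line endings and an
 * application/x-www-form-urlencoded body, and does not check Content-Length.
 */
function parseRawRequest(string $raw): array
{
    // Headers and body are separated by the first blank line.
    [$head, $body] = array_pad(explode("\r\n\r\n", $raw, 2), 2, '');
    $lines = explode("\r\n", $head);

    // Request line: METHOD, request target, HTTP version.
    [$method, $target] = explode(' ', $lines[0]);

    // GET variables come from the query string of the request target.
    $get = [];
    parse_str((string) parse_url($target, PHP_URL_QUERY), $get);

    // POST variables come from the body.
    $post = [];
    if ($method === 'POST') {
        parse_str($body, $post);
    }

    return ['get' => $get, 'post' => $post];
}
```

Fed the second request shown above, this returns someVar and anotherVar under 'get' and foo under 'post', all as strings.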

How to send data in PHP without content-length?

I know it's possible, but I can't seem to figure it out.
I have a mySQL query that has a couple hundred thousand results. I want to be able to send the results, but it seems the response requires content-length header to start downloading.
In phpMyAdmin, if you go to export a database, it starts the download right away, FF just says unknown file size, but it works. I looked at that code, but couldn't figure it out.
Help?
Thanks.

This document, in section 14.14, states: "In HTTP, it SHOULD be sent whenever the message's length can be determined prior to being transferred, unless this is prohibited by the rules in section 4.4." This means it DOESN'T have to be sent if you can't tell its size in advance.
Just don't send it.
If you want to send parts of data to the browser before all data is available, do you flush your output buffer? Maybe that's what is the problem, not lack of a header?
The way you use flush is like this:
1. generate some output, which adds it to the buffer
2. flush() it, which sends the current buffer to the client
3. go to 1
So, if your query returns a lot of results, you could just generate the output for, let's say, 100 or 1000 of them, then flush, and so on.
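A minimal sketch of that loop, under a few assumptions: the function name streamRows() is invented, rows are plain arrays written as naive comma-separated lines (no quoting), and PHP-level output buffering is off. If ob_start() is active, an ob_flush() is needed before each flush().

```php
<?php
/**
 * Echo rows in batches, flushing after every $batchSize rows.
 * Returns the number of flushes performed. Hypothetical helper for
 * illustration; the comma-joined output does no CSV quoting.
 */
function streamRows(iterable $rows, int $batchSize = 100): int
{
    $flushes = 0;
    $pending = 0;
    foreach ($rows as $row) {
        echo implode(',', $row), "\n"; // generate some output
        if (++$pending === $batchSize) {
            flush();                    // push what we have to the client
            $flushes++;
            $pending = 0;
        }
    }
    if ($pending > 0) {
        flush();                        // send the final partial batch
        $flushes++;
    }
    return $flushes;
}
```

Passing a generator that fetches one database row at a time keeps memory use flat, since no more than one batch is ever waiting to be sent.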
Also, to tell the client browser to attempt to save a file instead of displaying it in window, you can try using the Content-disposition: attachment header as well. See the specification here, 19.5.1 section.

You can use chunked transfer-encoding. Basically, you send a "Transfer-encoding: chunked" header, and then the data is sent in chunked mode, meaning that you send the length of a chunk followed by the chunk. You keep repeating this until the end of data, at which point you send a zero-length chunk.
Details are in RFC 2616.
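The framing itself is small enough to sketch. The function names encodeChunk() and encodeChunked() are made up for this example. Note the assumption: this only matters when you write the response to a raw socket yourself, because a web server in front of PHP will normally apply chunking on its own when no Content-Length is set.

```php
<?php
/**
 * Frame one piece of data as an HTTP chunk: the size in hexadecimal,
 * CRLF, the data, CRLF. Hypothetical helper for illustration.
 */
function encodeChunk(string $data): string
{
    return dechex(strlen($data)) . "\r\n" . $data . "\r\n";
}

/**
 * Frame a list of pieces as a complete chunked body, ending with the
 * zero-length chunk that marks the end of the data.
 */
function encodeChunked(array $pieces): string
{
    $out = '';
    foreach ($pieces as $piece) {
        if ($piece === '') {
            continue; // an empty chunk would be read as the terminator
        }
        $out .= encodeChunk($piece);
    }
    return $out . "0\r\n\r\n";
}
```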

It’s possible that you are using gzip which waits for all content to be generated. Check your .htaccess for directives regarding this.

You don’t need to set a Content-Length header field. This header field just tells the client the amount of data it has to expect. In fact, a “wrong” value can cause the client to discard some data.

If you use the LiveHTTPHeaders plugin in Firefox, you can see the difference between the headers being sent by phpMyAdmin and the headers being sent by your application. I don't know specifically what you're missing, but this should give you a hint. One header that I see when running a quick test is "Transfer-Encoding: chunked", and I also see that they're not sending Content-Length. Here's a link to the Firefox plugin if you don't already have it:
LiveHTTPHeaders

http content length is a SHOULD field, so you can drop it...
but you have to set transfer encoding then
have a look at Why "Content-Length: 0" in POST requests?
