PHP Post Body Encoding Problems


I'm trying to mimic an application that sends octet streams to and from a server. The data in the body looks like raw bytes, and I'm fairly certain the data sent for each command is static, so I'm hoping to map the bytes to something more readable in my application. For example, I'll have an array entry like "test" => "&^D^^&#*#dgkel", so I can look up "test" and get the real bytes that need to be sent.
The trouble is that PHP seems to convert these bytes. I'm not sure whether it is an encoding problem or something else, but what has been happening is that I give it some bytes (for example, �ھ����#�qs��������������������X����������������������������), which I believe have a length of 67, yet when I var_dump the HTTP request, the headers sent contain "Content-Length: 174" or something close to that, and the bytes look like ï¿½Ú¾ï¿½ï¿½ï¿½ï¿½#ï¿½qsï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½Xï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½
I'm not really sure how to fix this. Does anyone have any ideas? Cheers!
Edit, a little PHP:
$request = new HttpRequest($this->GetMessageURL(), HTTP_METH_POST);
$request->addHeaders($headers);
$request->addRawPostData($buttonMapping[$button]);
$request->send();
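The numbers in the question are consistent with the bytes being treated as UTF-8 text somewhere along the way: each byte that is not valid UTF-8 gets replaced by U+FFFD, which is three bytes (EF BF BD) in UTF-8 and shows up as "ï¿½" when viewed as Latin-1. The sketch below reproduces that effect in Python rather than PHP, on five made-up bytes (not the real command data), and shows hex strings as one possible way to keep a byte table readable without pasting raw bytes into source code. Whether this is what actually happens inside HttpRequest, or in the editor used to save the script, is an assumption.

```python
# Made-up sample bytes (an assumption, not the real command data).
raw = bytes([0x9C, 0xDA, 0xBE, 0x80, 0x23])

# Treat the bytes as UTF-8 text, replacing anything invalid with U+FFFD,
# then encode the resulting text back to UTF-8.
as_text = raw.decode("utf-8", errors="replace")
mangled = as_text.encode("utf-8")

print(len(raw), len(mangled))      # 5 9: the payload grows
print(mangled.decode("latin-1"))   # how the mangled bytes look as Latin-1

# Keeping the table as hex strings avoids any text encoding step.
# "test" is the hypothetical command name from the question.
button_mapping = {"test": "9cdabe8023"}
payload = bytes.fromhex(button_mapping["test"])
assert payload == raw
```

In PHP, hex2bin() plays the role of bytes.fromhex(), and strlen() counts bytes, so strlen() of the decoded value should match the Content-Length that goes out.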

Related

Binary Data from PHP to Angular 6 via http

The goal is to make an (empty) HTTP request from Angular 7 to PHP and receive binary data in Angular for use with protobuf3.
More specifically, the binary data (encoded as described here: https://developers.google.com/protocol-buffers/docs/encoding) is encapsulated in a string in PHP (the source), while the goal in Angular is a Uint8Array.
Therefore, I currently have the following working code:
PHP Code (a simple ProcessWire root template):
header('Content-Type: application/b64-protobuf');
…
echo base64_encode($response->serializeToString());
Angular:
let res = this.httpClient.get(`${this.API_URL}`, { responseType: 'text' });
res.subscribe((data) => {
  // Decode base64 into a "binary string" (one character per byte)
  let binary_string = atob(data);
  let len = binary_string.length;
  // Copy each character code into a typed array
  let bytes = new Uint8Array(len);
  for (let i = 0; i < len; i++) {
    bytes[i] = binary_string.charCodeAt(i);
  }
  let parsedResponse = pb.Response.deserializeBinary(bytes);
});
As you can see, I encode the data as base64 before sending it. This is not as efficient as it could be, because base64 carries less information per character. I have already tried quite a lot to get binary transmission working, but in the end the data always gets corrupted, i.e. the variable bytes is not identical to the argument of base64_encode.
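To put a number on the base64 cost: every 3 input bytes become 4 output characters, so the encoded form is about a third larger. A small Python sketch of the arithmetic (the payload is arbitrary sample bytes, not real protobuf output):

```python
import base64
import math

# Arbitrary sample payload standing in for serializeToString() output.
payload = bytes(range(256)) * 4   # 1024 bytes, covering every byte value

encoded = base64.b64encode(payload)

# Standard base64 with padding always produces 4 * ceil(n / 3) characters.
expected_len = 4 * math.ceil(len(payload) / 3)
overhead = len(encoded) / len(payload) - 1

print(len(payload), len(encoded), round(overhead * 100, 1))

# The round trip is lossless, which is why the base64 variant works.
assert len(encoded) == expected_len
assert base64.b64decode(encoded) == payload
```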
Still, according to some sources (e.g. "PHP write binary response" and "Binary data corrupted from php to AS3 via http", where nobody says it would be impossible), it should be possible.
So my question is: What must change to directly transfer binary data? Is it even possible?
What have I tried?
I tried using different headers, such as header('Content-Type:binary/octet-stream;'), and using Blob in Angular.
I also tried removing base64_encode from the PHP code and atob from the Angular code. The result: the content of the data is modified between serializeToString and deserializeBinary(bytes), which is not desired.
I checked for stray characters before <?php.
Specifications:
PHP 7.2.11
Apache 2.4.35
Angular 7.0.2
If further information is needed, just let me know in the comments. I am eager to provide it. Thanks.

Sending compressed text over Amazon SQS from PHP to NodeJS

I seem to be stuck at sending the compressed messages from PHP to NodeJS over Amazon SQS.
Over on the PHP side I have:
$SQS->sendMessage(Array(
    'QueueUrl' => $queueUrl,
    'MessageBody' => 'article',
    'MessageAttributes' => Array(
        'json' => Array(
            'BinaryValue' => bzcompress(json_encode(Array('type'=>'article','data'=>$vijest))),
            'DataType' => 'Binary'
        )
    )
));
NOTE 1: I also tried putting compressed data directly in the message, but the library gave me an error with some invalid byte data
On the Node side, I have:
body = decodeBzip(message.MessageAttributes.json.BinaryValue);
Here message comes from the sqs.receiveMessage() call, and that part works, since it worked for raw (uncompressed) messages.
What I am getting is TypeError: improper format
I also tried using:
PHP - NODE
gzcompress() - zlib.inflateraw()
gzdeflate() - zlib.inflate()
gzencode() - zlib.gunzip()
Each of those pairs gave me its own version of the same error (essentially, the input data is wrong).
Given all that, I started to suspect that the error is somewhere in message transmission.
What am I doing wrong?
EDIT 1: It seems that the error is somewhere in transmission, since bin2hex() in PHP and .toString('hex') in Node return totally different values. It seems that the Amazon SQS API in PHP transfers BinaryAttribute using base64, but Node fails to decode it. I managed to partially decode it by turning off automatic conversion in the Amazon AWS config file and then manually decoding the base64 in Node, but decodeBzip still could not decode the result.
EDIT 2: I managed to accomplish the same thing by using base64_encode() on the PHP side and sending the base64 as the messageBody (not using MessageAttributes). On the Node side I used new Buffer(messageBody,'base64') and then decodeBzip on that. It all works, but I would still like to know why MessageAttribute is not working as it should. The current base64 approach adds overhead, and I like to use services as they are intended, not through workarounds.

This is what all the SQS libraries do under the hood. You can get the PHP source code of the SQS library and see for yourself. Binary data will always be base64 encoded (whether or not you use MessageAttributes) to satisfy the API requirement of form-url-encoded messages.
I do not know how long the data in your $vijest is, but I am willing to bet that after compressing and then base64 encoding it will be bigger than before.
So my answer to you has two parts (plus a third if you are really stubborn):
First, when looking at the underlying raw API it is absolutely clear that not using MessageAttributes does NOT add additional overhead from base64. Instead, using MessageAttributes adds some slight additional overhead because of the structure of the data enforced by the SQS PHP library. So not using MessageAttributes is clearly NOT a workaround, and you should do it if you want to compress the data yourself and you got it to work that way.
Second, because of the nature of an HTTP POST request it is a very bad idea to compress your data inside your application. Base64 overhead will likely nullify the compression advantage, and you are probably better off sending plain text.
Third, if you absolutely do not believe me or the API spec or the HTTP spec and want to proceed, then I would advise sending a simple short string 'teststring' in the BinaryValue parameter and comparing what you sent with what you got. That will make it very easy to understand the transformations the SQS library is doing on the BinaryValue parameter.
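Whether base64 cancels out the compression gain depends on the payload, and it is easy to measure. The sketch below uses Python's bz2 and base64 modules as stand-ins for bzcompress() and the library's base64 step; the helper name wire_size and both sample payloads are invented for illustration.

```python
import base64
import bz2
import json

def wire_size(obj):
    """Return (plain JSON size, size after bz2 + base64) in bytes."""
    plain = json.dumps(obj).encode("utf-8")
    packed = base64.b64encode(bz2.compress(plain))
    return len(plain), len(packed)

# Invented payloads: one tiny, one long and repetitive.
small = {"type": "article", "data": "hi"}
large = {"type": "article", "data": "lorem ipsum dolor sit amet " * 400}

for name, obj in (("small", small), ("large", large)):
    plain_size, packed_size = wire_size(obj)
    print(name, plain_size, packed_size)

# The 'teststring' check: see what base64 does to a known short string.
print(base64.b64encode(b"teststring").decode("ascii"))  # dGVzdHN0cmluZw==
```

For tiny messages the compressed-and-encoded form comes out larger than the plain JSON, while long, repetitive text still shrinks a lot, so it is worth measuring with representative data before deciding.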

gzcompress() would be decoded by zlib.Inflate(). gzdeflate() would be decoded by zlib.InflateRaw(). gzencode() would be decoded by zlib.Gunzip(). So out of the three you listed, two are wrong, but one should work.
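The three PHP functions produce three different containers around the same DEFLATE data: gzcompress() writes the zlib format, gzdeflate() writes raw DEFLATE, and gzencode() writes gzip. The Python sketch below reproduces the three formats with the zlib module's wbits parameter and shows that each one decodes only with its matching setting; the sample text and the helper name compress are made up.

```python
import zlib

data = b"article body " * 50   # made-up sample text

def compress(payload, wbits):
    """Compress with the container selected by wbits."""
    co = zlib.compressobj(wbits=wbits)
    return co.compress(payload) + co.flush()

zlib_fmt = compress(data, 15)    # zlib container, like gzcompress()
raw_fmt = compress(data, -15)    # raw DEFLATE, like gzdeflate()
gzip_fmt = compress(data, 31)    # gzip container, like gzencode()

# Each format decodes with the matching wbits value.
assert zlib.decompress(zlib_fmt, wbits=15) == data
assert zlib.decompress(raw_fmt, wbits=-15) == data
assert zlib.decompress(gzip_fmt, wbits=31) == data

# A mismatched pair fails, much like the errors in the question.
try:
    zlib.decompress(gzip_fmt, wbits=15)
    mismatch_error = None
except zlib.error as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```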

File reading from PHP using python script

Okay, this is driving me crazy. I have a small file. Here is the dropbox link https://www.dropbox.com/s/74nde57f07jj0zj/transcript.txt?dl=0.
If I try to read the content of the file using python f.read(), I can easily read it. But, if I try to run the same python program using php shell_exec(), the file read fails. This is the error I get.
Traceback (most recent call last):
File "/var/www/python_code.py", line 2, in <module>
transcript = f.read()
File "/opt/anaconda/lib/python3.4/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 107: ordinal not in range(128)
I have checked all the permission issues and there is no problem with that.
Can anyone kindly shed some light?
Here is my python code.
f = open('./transcript/transcript.txt', 'r')
transcript = f.read()
print(transcript)
Here is my PHP code.
$output = shell_exec("/opt/anaconda/bin/python /var/www/python_code.py");
Thank you!
EDIT: I think the problem is in the file content. If I replace the content with simple 'I eat rice', then I can read the content from php. But the current content cannot be read. Still don't know why.

The problem appears to be that your file contains non-ASCII characters, but you're trying to read it as ASCII text.
Either it is text, but is in some encoding or other that you haven't told us (probably UTF-8, Latin-1, or cp1252, but there are countless other possibilities), or it's not text at all, but rather arbitrary binary data.
When you open a text file without specifying an encoding, Python has to guess. When you're running from inside the terminal or whatever IDE you use, presumably, it's guessing the same encoding that you used in creating the file, and you're getting lucky. But when you're running from PHP, Python doesn't have as much information, so it's just guessing ASCII, which means it fails to read the file because the file has bytes that aren't valid as ASCII.
If you want to understand how Python guesses, see the docs for open, but briefly: it calls locale.getpreferredencoding(), which, at least on non-Windows platforms, reads the encoding from the locale settings in the environment. On a typical Linux system that is not new enough to be based on systemd, but not too old either, the user's shell will be set up for a UTF-8 locale, while services will be set up for the C locale. If all of that makes sense to you, you may see a way to work around your problem. If it all sounds like gobbledegook, just ignore it.
If the file is meant to be text, then the right solution is to just pass the encoding to the open call. For example, if the file is UTF-8, do this:
f = open('./transcript/transcript.txt', 'r', encoding='utf-8')
Then Python doesn't have to guess.
If, on the other hand, the file is arbitrary binary data, then don't open it in text mode:
f = open('./transcript/transcript.txt', 'rb')
In this case, of course, you'll get bytes instead of str every time you read from it, and print is just going to print something ugly like b'aq\x9bz' that makes no sense; you'll have to figure out what you actually want to do with the bytes instead of printing them as bytes.
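The three cases can be reproduced without PHP. This sketch writes a small temporary file containing a UTF-8 non-breaking space (bytes C2 A0, so it includes the 0xc2 from the traceback; the actual content of transcript.txt is not known here, so the sample text is an assumption) and reads it back three ways:

```python
import os
import tempfile

# Sample content: "I eat" + NO-BREAK SPACE + "rice", stored as UTF-8.
content = "I eat\u00a0rice"
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(content.encode("utf-8"))

# 1. Forcing ASCII mimics the guess Python made in the traceback.
try:
    with open(path, "r", encoding="ascii") as f:
        f.read()
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# 2. Naming the real encoding works regardless of the locale.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# 3. Binary mode returns the undecoded bytes.
with open(path, "rb") as f:
    raw = f.read()

os.remove(path)
print(ascii_error)
print(repr(text), raw)
```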

Large print/echo break http response

I am having a problem with an echo/print that returns a large amount of data. The response is broken and is as follows:
end of data
http response header printed in body
start of data
I am running the following script to my browser to replicate the problem:
<?php
// Make a large array of strings
for ($i = 0; $i < 10000; $i++)
{
    $arr[] = "testing this string becuase it is must longer than all the rest to see if we can replicate the problem. testing this string becuase it is must longer than all the rest to see if we can replicate the problem. testing this string becuase it is must longer than all the rest to see if we can replicate the problem.";
}
// Create one large string from array
$var = implode("-",$arr);
// Set HTTP headers to ensure we are not 'chunking' response
header('Content-Length: '.strlen($var));
header('Content-type: text/html');
// Print response
echo $var;
?>
What is happening here?
Can someone else try this?

You might have automatic output buffering activated on your server. If the buffer overflows, it just starts dumping out the rest of the data instead.
Note that something like gzip compression also implicitly buffers the output. If that's the case, an ob_end_flush() call after the headers should solve it.

Browsers often limit the number of characters you're allowed to pass through a GET variable.
To work around this, you could base64 encode the string and then decode it once you have received the response.
I think there are JavaScript base64 encoding libraries available.
Like this one:
http://www.webtoolkit.info/javascript-base64.html

How to parse dict output in a user friendly way in PHP?

I am trying to implement a dictionary-type service.
I send a request with php using cURL to dict.org with the dict protocol.
This is my code (which on its own works and may be helpful for future readers):
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "dict://dict.org/define:(hello):english:exact");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$definition = curl_exec($ch);
curl_close($ch);
echo $definition;
The server returns the definition, as expected, along with several headers (that I do not need). The response looks something like this:
220 miranda.org dictd 1.9.15/rf on Linux 2.6.26-2-686 <auth.mime> <29631663.31530.1250750274#miranda.org>
250 ok
150 3 definitions retrieved
151 "Hello" gcide "The Collaborative International Dictionary of English v.0.48"
Hello \Hel*lo"\, interj. & n.
An exclamation used as a greeting, to call attention, as an
exclamation of surprise, or to encourage one. This variant of
{Halloo} and {Holloo} has become the dominant form. In the
United States, it is the most common greeting used in
answering a telephone.
[1913 Webster +PJC]
(... some content removed)
.
250 ok [d/m/c = 3/0/162; 0.000r 0.000u 0.000s]
221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]
I was wondering if:
a) Is there a way to specify to curl (or an option in the dict protocol) to not return all that extra information (i.e. 250 ok [d/m/c = 3/0/162; 0.000r...])
b) You probably noticed that the dict response returns information that is not displayed in the most user friendly way. I was wondering if anybody knew of any existing php library that will allow me to display this in a nicer way. Otherwise I'd have to code my own.
c) If this is not the way most dictionary websites retrieve their definitions, how do they do it? In my understanding the most comprehensive dictionary database is the one at dict.org (which supports the dict protocol and is where I am sending my cURL request).
Thank you!

Before I start, let me state that I don't know the specifics of the dict protocol.
I doubt that you'll be able to create a request that only delivers the text. The information you wish to discard looks like status information and is therefore useful.
The way I'd handle this is as follows:
Read the curl response data into an array so that each line is a separate entry in the array. You could use explode() and split at the newline character (\n) to do this.
Iterate over the array, e.g. foreach ($responseLines as $responseLine) {}
Perform a regex (or some other form of pattern matching) on $responseLine to find the definition. It looks like the lines of actual text are the only ones that don't start with a number.
You may want to check which character set the dict protocol uses. I haven't mentioned any error handling, but that should be straightforward.
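The steps above could look roughly like this. It is a sketch in Python rather than PHP (in PHP, explode(), foreach and preg_match() would play the same roles); the function name extract_definition is made up, and the rule "status lines start with three digits and a space" is inferred from the sample response in the question, not taken from the protocol specification.

```python
import re

# Assumed shape of a status line: three digits followed by a space.
STATUS_LINE = re.compile(r"^\d{3} ")

def extract_definition(response):
    """Return the response text with status lines and '.' terminators removed."""
    lines = response.replace("\r\n", "\n").split("\n")
    kept = [
        line for line in lines
        if not STATUS_LINE.match(line) and line != "."
    ]
    return "\n".join(kept).strip()

# Shortened sample based on the response shown in the question.
sample = (
    "220 miranda.org dictd 1.9.15/rf on Linux 2.6.26-2-686\r\n"
    "250 ok\r\n"
    "150 3 definitions retrieved\r\n"
    '151 "Hello" gcide "The Collaborative International Dictionary of English v.0.48"\r\n'
    'Hello \\Hel*lo"\\, interj. & n.\r\n'
    "An exclamation used as a greeting, to call attention.\r\n"
    ".\r\n"
    "250 ok [d/m/c = 3/0/162; 0.000r 0.000u 0.000s]\r\n"
    "221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]\r\n"
)

print(extract_definition(sample))
```

A definition line that happens to start with three digits and a space would be dropped by this rule, so a stricter parser would track each block from its 151 line to the terminating "." instead.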
