http_post_data adding extra characters in response - php

Hey guys, I am getting some extra characters like '5ae' and '45c' interspersed with valid data when using http_post_data. The data I am sending is XML and so is the response. The response contains these weird characters, which make the XML invalid. If I use fsockopen I do not have this issue. I would really like some input on this.

Your question does not give many details, but (this is quite a wild guess) it reminds me of chunked transfer encoding, and could be related to it (quoting):
If a Transfer-Encoding header with a value of chunked is specified in an HTTP message, the body of the message is made of an unspecified number of chunks ending with a last, zero-sized, chunk.
Each non-empty chunk starts with the number of octets of the data it embeds (size written in hexadecimal) followed by a CRLF (carriage return and line feed), and the data itself.
The 5ae and 45c you're getting in your data could correspond to the size of each chunk.
If you are trying to send HTTP requests by hand, that might not be such a good idea: HTTP is not an easy protocol, and you should use existing libraries that deal with this kind of trouble for you.
For instance, you could take a look at curl -- see curl_setopt for the impressive list of possible options.
Edit : I realize that http_post_data is a function provided by the PECL http extension.
There's a function that might interest you, in that library, to decode chunked data : http_chunked_decode
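To make the chunk format concrete, here is a minimal hand-written sketch of a chunked-body decoder, for cases where http_chunked_decode is not available. The function name decode_chunked_sketch is hypothetical, it assumes the headers have already been stripped from the response, and it ignores any trailer headers after the last chunk.

```php
<?php
// Sketch only: decode an HTTP body that uses Transfer-Encoding: chunked.
// Assumes $body starts at the first chunk-size line (headers already removed).
function decode_chunked_sketch(string $body): string
{
    $decoded = '';
    $offset  = 0;
    $length  = strlen($body);

    while ($offset < $length) {
        // Each chunk starts with its size in hexadecimal, terminated by CRLF.
        $lineEnd = strpos($body, "\r\n", $offset);
        if ($lineEnd === false) {
            throw new InvalidArgumentException('Missing CRLF after chunk size');
        }

        // The size line may carry extensions after ';', which are ignored here.
        $sizeLine = substr($body, $offset, $lineEnd - $offset);
        $sizeHex  = trim(explode(';', $sizeLine, 2)[0]);
        if (preg_match('/^[0-9a-fA-F]+$/', $sizeHex) !== 1) {
            throw new InvalidArgumentException("Invalid chunk size: '$sizeHex'");
        }

        $size = (int) hexdec($sizeHex); // e.g. "5ae" is 1454 bytes
        if ($size === 0) {
            break; // the zero-sized chunk marks the end of the body
        }

        $dataStart = $lineEnd + 2;
        if ($dataStart + $size > $length) {
            throw new InvalidArgumentException('Chunk is longer than the remaining body');
        }

        $decoded .= substr($body, $dataStart, $size);
        $offset   = $dataStart + $size + 2; // skip the data and its trailing CRLF
    }

    return $decoded;
}
```

If the '5ae' and '45c' values in the response sit on their own lines right before blocks of 1454 and 1116 bytes, that would be consistent with the chunked-encoding guess.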

Yes, it is a result of chunked transfer encoding.
You can observe this behavior in Fiddler by unchecking 'Chunked Transfer-Encoding' in the Transformer tab of the response pane.

Related

How PHP knows encoding of textual data when parsing POST 'multipart/form-data' input?

I wonder how exactly PHP detects encoding of input data which came from HTTP request.
Imagine an API that accepts requests from various clients (mobile apps maybe, not browsers). The incoming request has the header Content-Type: multipart/form-data, and that header gives no additional encoding information.
Let's say that there was a text field within the POST body and I want to apply validation on it. How do I know the encoding I should use?
I found no clear answer in the documentation so please help. I guess, I may look at Mbstring extension which has mb_check_encoding() and it could be used for this purpose. Is it a reliable way?
How a client can indicate which encoding it used when sending a multipart/form-data POST request:
As RFC 7578 says (https://www.rfc-editor.org/rfc/rfc7578#section-4.4), each part of a multipart/form-data payload can have its own Content-Type information, which solves the problem of communicating the encoding.
Ok, there are actually two problems:
How to communicate the encoding that was used to create the text (when transferring it with the multipart/form-data method);
How to detect what real encoding was used for text creation;
The first problem is a problem of communication. It can be solved by documenting the expected encoding, or by providing an explicit header for the text section within the request payload. See RFC 7578 for details.
The second problem has no proper solution. It is hard to detect the actual encoding that was used to assemble the text. Though, you can guess with functions like mb_detect_encoding().
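As a rough illustration of the difference between checking and detecting: if the API documents that text fields must be UTF-8, validation can simply reject anything that is not valid UTF-8 instead of guessing. The sketch below assumes that UTF-8 agreement; the helper names is_valid_utf8 and validate_text_field are hypothetical, and the PCRE fallback is only there in case the mbstring extension is not loaded.

```php
<?php
// Sketch only: check a submitted text field against an agreed encoding (UTF-8).
// This validates; it does not detect which encoding the client really used.
function is_valid_utf8(string $value): bool
{
    if (function_exists('mb_check_encoding')) {
        return mb_check_encoding($value, 'UTF-8');
    }
    // Fallback: an empty pattern with the /u modifier fails on invalid UTF-8.
    return preg_match('//u', $value) === 1;
}

// Returns the field value if it is a valid UTF-8 string, otherwise null.
// $post stands in for $_POST so the function can be tried without a request.
function validate_text_field(array $post, string $field): ?string
{
    $value = $post[$field] ?? null;
    if (!is_string($value) || !is_valid_utf8($value)) {
        return null;
    }
    return $value;
}
```

A valid result only means the bytes are well-formed UTF-8. Text sent in another single-byte encoding can still happen to pass, which is why an explicit agreement with clients matters more than the check itself.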

PHP_NORMAL_READ cutting off the data being sent

I need some help, please. I have a problem reading the binary data sent by a device over a socket: I do not receive exactly the data that was sent. I am using this code:
$data = @socket_read($read_sock, 2048, PHP_NORMAL_READ);
I am using PHP_NORMAL_READ because it stops reading at "\r\n".
But what I receive is not the exact data; I only get a few bytes of the binary data.
The length parameter specifies the maximum length that will be read from the stream. The PHP documentation is a bit misleading on this subject, but what I think it means is that you will get:
less than or exactly 'length' bytes
at least one byte
no '\r' or '\n' in the response, unless it is the only character
Most socket APIs you will encounter work this way: they may give you fewer bytes than requested, because more bytes may not be available yet and the data may arrive in smaller parts than the device sent. The solution is to read from the socket repeatedly until you get what you want (in your case, until you have a string ending with a newline).
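The repeated-read idea can be sketched as follows. To keep the example runnable without a live socket, it takes a hypothetical $readChunk callable; in real code that could be something like fn() => socket_read($sock, 2048, PHP_BINARY_READ), which is an assumption and not what the question uses. The function name read_until_crlf and the 64 KiB limit are also made up for illustration.

```php
<?php
// Sketch only: accumulate chunks until the buffer contains "\r\n".
// $readChunk is a hypothetical callable returning the next piece of data,
// or false / '' when the peer has closed the connection.
// Returns [line including the CRLF, leftover bytes read past the CRLF].
function read_until_crlf(callable $readChunk, int $maxBytes = 65536): array
{
    $buffer = '';

    while (strpos($buffer, "\r\n") === false) {
        $chunk = $readChunk();
        if ($chunk === false || $chunk === '') {
            // Connection closed before a terminator arrived: return what we have.
            return [$buffer, ''];
        }

        $buffer .= $chunk;
        if (strlen($buffer) > $maxBytes) {
            throw new RuntimeException('No line terminator within the size limit');
        }
    }

    $end = strpos($buffer, "\r\n") + 2;
    return [substr($buffer, 0, $end), substr($buffer, $end)];
}
```

Note that if the device really sends binary data, a "\r" or "\n" byte inside the payload would end a PHP_NORMAL_READ early, so a length-prefixed or otherwise unambiguous framing may be a better fit than a line terminator.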
You may also want to consult http://php.net/manual/en/function.socket-read.php, where the commenters suggest the function is somewhat buggy when used with PHP_NORMAL_READ. It might be worth searching for a PHP socket library that supports readLine.

Strange Characters In XML Response From Google Weather API Error

I've just launched a small application I've been working on. Nothing major, but something I would like to get working properly. It's at www.wedrapp.com.
Most of the time it works perfectly fine. Enter a city, XML is returned, parsed and the data returned is shown to the user.
Unfortunately however, an error is returned when certain cities are searched such as Marseille. If you search Marseille you will see what I mean. I have a feeling it is to do with special characters, as Marseille searched actually returns Marseilles, Provence-Alpes-Côte d'Azur in the XML. Similarly Paris gives an error as it actually returns Paris, Île-de-France.
Can anyone shed some light on how to strip these strange characters out, or at least stop them providing an error before hitting the screen? It is XML parsed with PHP.
Find out in which encoding the XML returned by google is. Then re-encode it from that encoding to UTF-8, then you can load the XML with SimpleXML.
The Google Weather API XML has an encoding based on the language that is specified when it is requested (it is also possible to specify the encoding you want; more on that below).
For example, it can be ISO-8859-2 as a related question PHP XML — Google Weather API - parsing and modifying data (Language, UTF-8, and F to Celsius) shows.
You can find out which one by looking into the HTTP Response Header Content-Type:
Content-Type: text/xml; charset=ISO-8859-1
You used utf8_encode to change the encoding; it converts an ISO-8859-1 (also referred to as Latin-1) encoded string to UTF-8. It looks like standard queries to the secret Google Weather API return ISO-8859-1 by default.
You can specify the encoding you'd like to have by adding a oe parameter to the query. For example to get it directly as UTF-8:
http://www.google.com/ig/api?weather=Mountain+View&oe=utf-8
Doing this ensures you always get a specific encoding, instead of having to guess it or parse response headers.
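If you do need to handle whatever charset the server announces, the "find the encoding, then re-encode to UTF-8" step could look roughly like this. It is a sketch that assumes you already have the raw body and the Content-Type header value as strings; the function names are hypothetical, and falling back to ISO-8859-1 when no charset is given is an assumption based on the default described above. It uses iconv for the conversion.

```php
<?php
// Sketch only: pick the charset out of a Content-Type header value.
// The ISO-8859-1 default is an assumption, not something the API guarantees.
function charset_from_content_type(string $contentType, string $default = 'ISO-8859-1'): string
{
    if (preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $m) === 1) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Re-encode a response body to UTF-8 based on the announced charset.
function body_to_utf8(string $body, string $contentType): string
{
    $charset = charset_from_content_type($contentType);
    if ($charset === 'UTF-8') {
        return $body; // nothing to do
    }

    $converted = iconv($charset, 'UTF-8', $body);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $charset to UTF-8");
    }
    return $converted;
}
```

One thing to watch for: if the XML itself declares an encoding in its prolog, that declaration would need to agree with the converted bytes before the result is handed to SimpleXML.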

Strange characters at beginning of XML AJAX response?

I'm making multiple AJAX calls that return XML data. When I get the data back, my success function (in jQuery) tries to turn the XML into JSON (using a plugin). I was quickly reminded why I can't assume I will get VALID XML back from my AJAX request -- it turns out a few of the XML responses were invalid, causing the JSON conversion to fail, the script to fail, etc...
My questions are:
What is the best way to check for valid XML on an AJAX response? Or, should I just attempt the JSON conversion, then do a quick check if the JSON object is valid?
In troubleshooting the XML, I found that there are a few strange characters at the VERY beginning of the XML response. Here's an image from my Firebug:
Should I try to detect and strip the response of those chars or could there possibly be something wrong with my encoding?
Any help is appreciated! Let me know if more info is needed!
It's the UTF-8 byte-order mark when incorrectly interpreted as ISO-8859-1.
You can't safely strip this because it's just a symptom of a larger problem. Your content is encoded as UTF-8. Somewhere along the way you are decoding it as ISO-8859-1 instead. If you try to hide the problem by stripping the BOM, you're only setting yourself up for more problems down the line as soon as you start using non-ASCII characters. The only reason things are even looking sort-of right is because ASCII is a common subset of both UTF-8 and ISO-8859-1.
The strange characters are the byte order mark, which is actually valid in XML, so in most circumstances you can strip it without risk.
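For reference, here is a small PHP sketch (PHP only to match the rest of this page; the question itself is about jQuery) showing what the UTF-8 byte order mark is at the byte level, why it shows up as three odd characters when read as ISO-8859-1, and how a server-side script could strip it. The constant and function names are hypothetical.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// True if the byte string starts with a UTF-8 BOM.
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Remove a leading UTF-8 BOM, leaving everything else untouched.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

// Reading the BOM bytes as ISO-8859-1 yields the characters U+00EF, U+00BB
// and U+00BF, which is the "strange characters" symptom described above.
$misread = iconv('ISO-8859-1', 'UTF-8', UTF8_BOM);
```

Stripping the BOM this way only hides the symptom if the real problem is that UTF-8 content is being decoded as ISO-8859-1 somewhere; in that case non-ASCII characters later in the document would still come out wrong.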

Anyone have issues going from ColdFusion's serializeJSON method to PHP's json_decode?

The Interwebs are no help on this one. We're encoding data in ColdFusion using serializeJSON and trying to decode it in PHP using json_decode. Most of the time, this is working fine, but in some cases, json_decode returns NULL. We've looked for the obvious culprits, but serializeJSON seems to be formatting things as expected. What else could be the problem?
UPDATE: A couple of people (wisely) asked me to post the output that is causing the problem. I would, except we just discovered that the result set is all of our data (listing information for 2300+ rental properties for a total of 565,135 ASCII characters)! That could be a problem, though I didn't see anything in the PHP docs about a max size for the string. What would be the limiting factor there? RAM?
UPDATE II: It looks like the problem was that a couple of our users had copied and pasted Microsoft Word text with "smart" quotes. Those pesky users...
You could try operating in UTF-8 and also letting PHP know that fact.
I had an issue with PHP's json_decode not being able to decode a UTF-8 JSON string (with some "weird" characters other than the curly quotes that you have). My solution was to hint PHP that I was working in UTF-8 mode by inserting a Content-Type meta tag in the HTML page that was doing the submit to the PHP. That way the content type of the submitted data, which is the JSON string, would also be UTF-8:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
After that, PHP's json_decode was able to properly decode the string.
Can you replicate this issue reliably? And if so, can you post sample data that returns null? I'm sure you know this, but for the sake of others stumbling on this who may not: RFC 4627 describes JSON, and it's a common mistake to assume valid JavaScript is valid JSON. It's better to think of JSON as a subset of JavaScript.
In response to the edit:
I'd suggest checking that your information is actually being populated in your PHP script (before it is passed to json_decode), and also validating that information (especially if you can reliably reproduce the error). You can try an online validator for convenience. Based on the very limited information, it sounds like perhaps it's timing out and not grabbing all the data? Is there a need for such a large dataset?
I had this exact problem, and it turned out to be due to ColdFusion putting non-printable characters into the JSON packets (these characters did actually exist in our data), but they can't go into JSON.
Two questions on this site fixed this problem for me, although I went for the PHP solution rather than the ColdFusion solution as I felt it was the more elegant of the two.
PHP solution
Fix the string before you pass it to json_decode()
$string = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);
ColdFusion solution
Use the cleanXmlString() function in that SO question after using serializeJSON()
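When json_decode returns NULL, json_last_error() can help tell these cases apart before anything is stripped. The sketch below uses hypothetical helper names. Its strip_control_chars variant removes only the control range 0x00-0x1F, because the pattern above also deletes every byte from 0x80 to 0xFF, which would drop all non-ASCII text, including valid UTF-8.

```php
<?php
// Sketch only: decode JSON and report why it failed, if it did.
function try_json_decode(string $json): array
{
    $value = json_decode($json, true);
    $code  = json_last_error();

    return [
        'ok'    => $code === JSON_ERROR_NONE,
        'code'  => $code,
        'error' => $code === JSON_ERROR_NONE ? null : json_last_error_msg(),
        'value' => $value,
    ];
}

// Narrower clean-up than the pattern above: remove control characters only
// and leave bytes >= 0x80 alone so UTF-8 text survives.
function strip_control_chars(string $json): string
{
    return preg_replace('/[\x00-\x1F]/', '', $json) ?? $json;
}
```

A JSON_ERROR_CTRL_CHAR result points at stray control characters, while JSON_ERROR_UTF8 points at bytes that are not valid UTF-8, which is what Word "smart" quotes pasted in a Windows code page would look like. In the second case, converting the input to UTF-8 is likely a better fix than deleting bytes.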
You could try parsing it with another parser, and looking for an error -- I know Python's JSON parsers are very high quality. If you have Python installed it's easy enough to run the text through demjson's syntax checker. If it's a very large dataset you can use my library jsonlib -- memory use will be higher than with demjson, but it will run faster because it's written in C.
