Getting some funny characters in JSON text - PHP

I am calling a web service that returns JSON text, which ends up with some garbage at the start ("ï»¿"). Any help or pointers appreciated. I am a bit rusty with the curl options, and this is from some old code I have used; it has been some time since I have done work like this.
When I call the web service through the browser I get nice JSON text, such as the following. I have removed some of the values to keep it to a few lines.
{ "values": [[1511596680,3],[1511596740,2],[1511596800,0],[1511596860,6],[1511596920,0],[1511596980,0],[1511597040,0],[1511597100,0],[1511597160,0],[1511603220,0],[1511603280,0],[1511603340,0],[1511603400,0],[1511603460,0],[1511603520,0],[1511603580,0],[1511603640,0],[1511603700,0],[1511603760,0],[1511603820,0]]}
When I call it via a PHP page that acts as a wrapper, I get some junk in front of it, which prevents PHP from calling json_decode on it. The URL called is exactly the same one I used to call the web service in the browser.
{ "values": [[1511596680,3],[1511596740,2],[1511596800,0],[1511596860,6],[1511596920,0],[1511596980,0],[1511597040,0],[1511597100,0],[1511597160,0],[1511603220,0],[1511603280,0],[1511603340,0],[1511603400,0],[1511603460,0],[1511603520,0],[1511603580,0],[1511603640,0],[1511603700,0],[1511603760,0],[1511603820,0]]}
My PHP code to call the web service is as follows. I am not sure whether $post_string being empty is a problem. The URL consists of params passed in a query string of the form ?param=val&param2=val2 etc.
$contenttype = 'application/json';
$headers = array(
'Content-Type: ' . $contenttype,
'Content-Length: ' . strlen($post_string) /* this an empty string */
);
/* dump of headers
Array
(
[0] => Content-Type: application/json
[1] => Content-Length: 0
)
*/
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $method); // $method is "GET"
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
if (is_array($headers)) {
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
}
$output = curl_exec($ch); // this contains the garbage at the start

I had to insert the following to remove the byte order mark:
$output = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $output);
Thanks to the following link:
How do I remove ï»¿ from the beginning of a file?

The "funny characters" are caused by the UTF-8 BOM: the string starts with the bytes EF BB BF, which signal that it is encoded in UTF-8.
You can remove the BOM like this (found in another answer, by jasonhao):
//Remove UTF8 Bom
function remove_utf8_bom($text)
{
$bom = pack('H*','EFBBBF');
$text = preg_replace("/^$bom/", '', $text);
return $text;
}
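A minimal sketch of how the BOM removal and the decode step could be combined. The helper name decode_json_without_bom and the shortened sample payload are assumptions for illustration, not part of the original code:

```php
<?php
// Hypothetical helper: strip a leading UTF-8 BOM, then decode the JSON.
function decode_json_without_bom(string $text)
{
    // The UTF-8 BOM is the three bytes EF BB BF.
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        $text = substr($text, 3);
    }

    $decoded = json_decode($text, true);
    if ($decoded === null && json_last_error() !== JSON_ERROR_NONE) {
        // Report the decoder's own message instead of failing silently.
        throw new RuntimeException('JSON decode failed: ' . json_last_error_msg());
    }
    return $decoded;
}

// Shortened version of the payload from the question, with a BOM in front.
$raw  = "\xEF\xBB\xBF" . '{ "values": [[1511596680,3],[1511596740,2]]}';
$data = decode_json_without_bom($raw);
echo count($data['values']), "\n";
```

Unlike the broad character-class replacement shown in the question, this only removes the three BOM bytes, so any non-ASCII text inside the JSON is left alone.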


PHP - json_decode - issues decoding string

I'm playing with the API from deepl.com that provides automatic translations. I call the API through cURL and I get a json string in return which appears to be fine but cannot be decoded by PHP for some reason.
Let me show first how I make the cURL call :-
$content = "bonjour <caption>monsieur</caption> madame";
$url = 'https://api.deepl.com/v2/translate';
$fields = array(
'text' => $content,
'target_lang' => $lg,
'tag_handling' => 'xml',
'ignore_tags' => 'caption',
'auth_key' => 'my_api_key');
$fields_string = "";
foreach($fields as $key=>$value)
{
$fields_string .= $key.'='.$value.'&';
}
$fields_string = rtrim($fields_string, '&'); // rtrim() returns the trimmed string; it does not modify its argument
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL, $url);
curl_setopt($ch,CURLOPT_POST, count($fields));
curl_setopt($ch,CURLOPT_POSTFIELDS, $fields_string);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded','Content-Length: '. strlen($fields_string)));
$result = curl_exec($ch);
curl_close($ch);
If at this stage I do
echo $result;
I get:
{"translations":[{"detected_source_language":"FR","text":"Hola <caption>monsieur</caption> Señora"}]}
Which seems ok to me. Then if I use code below -
echo gettype($result);
I get "string" which is still ok but now, the following code fails:
$result = json_decode($result,true);
print_r($result);
The output is empty!
If I now do something like this:
$test = '{"translations":[{"detected_source_language":"FR","text":"Hola <caption>monsieur</caption> Señora"}]}';
echo gettype($test);
$test = json_decode($test,true);
print_r($test);
I get a perfectly fine array:
(
[translations] => Array
(
[0] => Array
(
[detected_source_language] => FR
[text] => Hola <caption>monsieur</caption> Señora
)
)
)
I did nothing other than copy and paste the content from the API into a static variable, and it works; coming from the API, it doesn't. It's as if the data coming from the API is not understood by PHP.
Do you have any idea of what's wrong?
Thanks!
Laurent
I've had very similar issues before, and for me the issue was the encoding of the data returned from the API. I'm guessing that when you copy and paste, the hard-coded string ends up in a different encoding, which is why it works fine when passed into json_decode.
The PHP docs specify json_decode only works with UTF-8 encoded strings:
http://php.net/manual/en/function.json-decode.php
You may be able to use mb_convert_encoding() to convert to UTF-8:
http://php.net/manual/en/function.mb-convert-encoding.php
Try this before calling json_decode:
$result = mb_convert_encoding($result, "UTF-8");
Make sure to set CURLOPT_RETURNTRANSFER to true. Only then will curl_exec actually return the response, otherwise it will output the response and return a boolean, indicating success or failure.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
if ($result !== false) {
$response = json_decode($result, true);
// do something with $response
} else {
// handle curl error
}
As @Eilert Hjelmeseth said, you have a special character in your JSON string: "Señora".
Another way to encode a string to UTF-8 is utf8_encode(), which assumes the input is ISO-8859-1 (and is deprecated as of PHP 8.2):
$result = json_decode(utf8_encode($result),true);
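The answers above point at two different causes: curl_exec() not returning a string, and a payload that is not valid UTF-8. A small diagnostic along these lines may help tell them apart. The helper name explain_json_failure is made up for this sketch, and it assumes the mbstring extension is available:

```php
<?php
// Hypothetical helper: report why a payload does not decode as JSON.
function explain_json_failure($payload): string
{
    if (!is_string($payload)) {
        // curl_exec() returns true/false when CURLOPT_RETURNTRANSFER is not set.
        return 'not a string: ' . gettype($payload);
    }
    if (!mb_check_encoding($payload, 'UTF-8')) {
        return 'not valid UTF-8';
    }
    json_decode($payload, true);
    return json_last_error() === JSON_ERROR_NONE ? 'ok' : json_last_error_msg();
}

// "\xF1" is the ISO-8859-1 byte for the letter n with tilde.
$latin1 = "{\"text\":\"Se\xF1ora\"}";

echo explain_json_failure(true), "\n";
echo explain_json_failure($latin1), "\n";

// Naming the source encoding explicitly avoids guessing.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo explain_json_failure($utf8), "\n";
```

Passing the source encoding as the third argument to mb_convert_encoding() is a deliberate choice here: without it, the function falls back to the internal encoding, which is often already UTF-8, so nothing would change.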

Sending binary data to php-cgi via shell_exec

I have a script that sends a post request to /usr/bin/php-cgi. The script is working fine when dealing with plain text, but fails when the data is binary:
$data = file_get_contents('example.jpg');
$size = filesize('example.jpg') + 5;
$post_data = 'file='.$data;
$response = shell_exec('echo "'.$post_data.'" |
REDIRECT_STATUS=CGI
REQUEST_METHOD=POST
SCRIPT_FILENAME=/example/script.php
SCRIPT_NAME=/script.php
PATH_INFO=/
SERVER_NAME=localhost
SERVER_PROTOCOL=HTTP/1.1
REQUEST_URI=/example/index.html
HTTP_HOST=example.com
CONTENT_TYPE=application/x-www-form-urlencoded
CONTENT_LENGTH='.$size.' php-cgi');
I get the following error:
sh: -c: line 1: unexpected EOF while looking for matching `"'
sh: -c: line 5: syntax error: unexpected end of file
I guess this is because the data I'm trying to send is binary and must be encoded/escaped somehow.
Like I said the above code works if the data is plain text:
$post_data = "data=sample data to php-cgi";
$size = strlen($post_data);
I also tried to encode the data using base64_encode(), but then I face another problem: the data must be decoded within the receiving script. I was thinking that perhaps I could encode the data in base64 and then add some content or MIME type header to force the php-cgi binary to do the conversion?
Another problem is that I would like to send the data as an attachment, so I think we must set CONTENT_TYPE to multipart/form-data; boundary=<random_boundary> and CONTENT_DISPOSITION to form-data, but I'm not sure how to set these headers from the command line.
You are trying to post binary file contents through shell_exec, but raw binary data cannot be passed safely inside a shell command string. If you convert the image data to base64, that problem is solved, but you run into another one: how to tell whether a submitted value is text or an image. At present I know of no way to identify that.
Since you want to post both the image and other data, I would suggest using cURL. Here is the way I submit an image and data through cURL:
$local_directory=dirname(__FILE__).'/local_files/';
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_VERBOSE, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible;)");
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_URL, 'http://localhost/curl_image/uploader.php' );
// most important: cURL treats a value starting with @ as a file field
// (PHP 5.5+ provides CURLFile for this; the @ prefix was removed in PHP 7)
$post_array = array(
"my_file"=>"@".$local_directory.'filename.jpg',
"upload"=>"Upload"
);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_array);
$response = curl_exec($ch);
echo $response;
You can also store all post data in a temporary file and then cat that file into php-cgi:
char file[20] = "/tmp/php"; //temp post data location
char name[10];
webserver_alpha_random(name, 10); //create random name
strcat(file, name);
int f = open(file, O_TRUNC | O_WRONLY | O_CREAT, 0600); //O_CREAT requires a mode argument
write(f, conn->content, conn->content_len); //post data from mongoose
close(f);
/* cat temp post data into php-cgi */
/* this function also takes care of all environment variables */
/* but the idea is understandable */
output = webserver_shell("cat %s | php-cgi %s", conn, request, file, request);
unlink(file);
Finally I got this working. The solution was to send a base64-encoded request, which also contained a constant named field, ORIGINAL_REQUEST_URI, to a sort of gateway file that in turn decodes the request and bounces it to its original destination.
Basically, the server does this:
encode any received file data in base64
add a form-data field named ORIGINAL_REQUEST_URI with the original url as value
assemble a valid http request body encoded as multipart/form-data based on above
send this data using shell_exec to a gateway file that will decode the content
Here is the command I used to send everything to php-cgi:
shell_exec('echo "' . $body . '" |
HOST=localhost
REDIRECT_STATUS=200
REQUEST_METHOD=POST
SCRIPT_FILENAME=/<path>/gate.php
SCRIPT_NAME=/gate.php
PATH_INFO=/
SERVER_NAME=localhost
SERVER_PROTOCOL=HTTP/1.1
REQUEST_URI=/example.php
HTTP_HOST=localhost
CONTENT_TYPE="multipart/form-data; boundary=' . $boundary . '"
CONTENT_LENGTH=' . $size . ' php-cgi');
Then inside the gate.php file I decoded the data and included the file pointed to by the ORIGINAL_REQUEST_URI field.
// File: gate.php
if (isset($_FILES)) {
foreach ($_FILES as $key => $value) {
// decode the `base64` encoded file data
$content = file_get_contents($_FILES[$key]['tmp_name']);
$content = base64_decode($content);
file_put_contents($_FILES[$key]['tmp_name'], $content);
}
}
// bounce to original destination
// NOTE: including a path taken from request data is unsafe unless it is validated first
include_once($_POST['ORIGINAL_REQUEST_URI']);
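As an alternative to quoting the body inside an echo command, the body could be spooled to a temporary file and attached to the child's standard input with proc_open(), so the bytes never pass through shell quoting. This is only a sketch: the helper name run_with_stdin is hypothetical, getenv() without arguments needs PHP 7.1 or later, and the demonstration uses cat (Unix only) in place of php-cgi:

```php
<?php
// Hypothetical helper: run a shell command with $body as its standard input.
function run_with_stdin(string $command, string $body, array $env = array())
{
    // Spool the body to a temporary file so it is never part of the command string.
    $tmp = tempnam(sys_get_temp_dir(), 'post');
    file_put_contents($tmp, $body);

    $descriptors = array(
        0 => array('file', $tmp, 'r'), // child's stdin reads the spooled body
        1 => array('pipe', 'w'),       // child's stdout is captured below
    );

    // Extra variables are merged into the current environment.
    $process = proc_open($command, $descriptors, $pipes, null, array_merge(getenv(), $env));
    if (!is_resource($process)) {
        unlink($tmp);
        return false;
    }

    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    proc_close($process);
    unlink($tmp);
    return $output;
}

// Bytes that would break the echo "..." approach: a double quote, a NUL, a high byte.
$binary = "a\"b\0c'\xFF\n";
var_dump(run_with_stdin('cat', $binary) === $binary);
```

For the php-cgi case, the CGI variables from the command above (REQUEST_METHOD, SCRIPT_FILENAME, CONTENT_TYPE, CONTENT_LENGTH and so on) would go in the $env array, with CONTENT_LENGTH set to strlen($body). The body still has to be valid for the declared CONTENT_TYPE, so this removes the quoting problem but not the need to build a proper request body.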

MongoLab: using php to PUT update using CURL

I have been trying to use MongoLab's API to simplify my life, and for the most part it was working until I tried to push updates to the DB using PHP and cURL; no dice. My code is similar to this:
// encode an array; calling json_encode() on an already-encoded JSON string would double-encode it
$data_string = json_encode(array("user.userEmail" => "USER_EMAIL", "user.pass" => "USER_PASS"));
try {
$ch = curl_init();
//need to create temp file to pass to curl to use PUT
$tempFile = fopen('php://temp/maxmemory:256000', 'w');
if (!$tempFile) {
die('could not open temp memory data');
}
fwrite($tempFile, $data_string);
fseek($tempFile, 0);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "PUT");
//curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
//curl_setopt($ch, CURLOPT_INFILE, $tempFile); // file pointer
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, DB_API_REQUEST_TIMEOUT);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data_string);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Content-Type: application/json',
'Content-Length: ' . strlen($data_string),
)
);
$cache = curl_exec($ch);
curl_close($ch);
} catch (Exception $e) {
return FALSE;
}
My problem seems to be with MongoLab's API. The code works perfectly except that MongoLab tells me the data I am passing is an 'Invalid object{ "user.firstName" :"Pablo","user.newsletter":"true"}: fields stored in the db can't have . in them.'. I have tried passing a file and using the postfields, but neither worked.
When I test it with Firefox's Poster plugin the values work fine. If someone out there has a better understanding of MongoLab, I would love some enlightenment. Thanks in advance!
You will need to remove the dots from your field names. You might try going to a schema like this:
{ "user": { "userEmail": "USER_EMAIL", "pass": "USER_PASS" } }
Unfortunately, MongoDB doesn't support using dots in field names. This is because its query language uses the dot as an operator to chain nested field names. If MongoDB were to allow dots in field names dotted queries would become ambiguous without some kind of escaping mechanism.
If this document were legal:
{ "bow.ties": "uncool", "bow": { "ties": "cool" } }
This query would be ambiguous:
{ "bow.ties": "cool" }
Not clear if the document would match or not. Did you mean the field "bow.ties" or the field "ties" nested within the value of field "bow"?
Here's a capture of a mongo shell session demonstrating these ideas.
% mongo
MongoDB shell version: 2.1.1
connecting to: test
> db.stuff.save({"bow.ties":"uncool"})
Wed Jul 18 11:17:59 uncaught exception: can't have . in field names [bow.ties]
> db.stuff.save({"bow":{"ties":"cool"}})
> db.stuff.find({"bow.ties":"cool"})
{ "_id" : ObjectId("5006ff3f1348197bacb458f7"), "bow" : { "ties" : "cool" } }
After some time working on other functionality of the project I realized my mistake, and ultimately the source of the confusion.
The curl PUT was intended to send modifier operations to MongoDB. I was sending all my data as JSON, then intercepting it, decoding it for use in PHP, and re-encoding part of it to send on. The original data received looks something like this:
{"userEmail":"p@g.com","pass":"****", "$oid":"5555", "$set":{"user.firstName":"Pablo","user.newsletter":"true"}}
The problem was that I was grabbing the value of the "$set" object (in PHP) and re-encoding only that value, {"user.firstName":"Pablo","user.newsletter":"true"}, without the "$set" operator, and sending that produced the error. In this case the proper string to send would have been {"$set":{"user.firstName":"Pablo","user.newsletter":"true"}}
While this is a simple mistake, I hope that the next time someone does something like this and gets an invalid object error, they are lucky enough to find this.
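In PHP, the modifier document described above could be rebuilt as follows. This is a sketch using the field values from the example; the variable names are illustrative:

```php
<?php
// Field updates taken from the incoming request (values from the example above).
$fields = array('user.firstName' => 'Pablo', 'user.newsletter' => 'true');

// Wrap the fields in the $set operator before encoding.
// Single quotes keep PHP from treating $set as a variable.
$update = json_encode(array('$set' => $fields));

echo $update, "\n";
// {"$set":{"user.firstName":"Pablo","user.newsletter":"true"}}
```

Dotted names are acceptable here because they appear inside an update operator, where the dot addresses a nested field, rather than as literal field names in a stored document.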

Decoding JSON after sending using PHP cUrl

I've researched everywhere and cannot figure this out.
I am writing a test cUrl request to test my REST service:
// initialize curl handler
$ch = curl_init();
$data = array(
"products" => array ("product1"=>"abc","product2"=>"pass"));
$data = json_encode($data);
$postArgs = 'order=new&data=' . $data;
// set curl options
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLINFO_HEADER_OUT, TRUE);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postArgs);
curl_setopt($ch, CURLOPT_URL, 'http://localhost/store/rest.php');
// execute curl
curl_exec($ch);
This works fine: the request is accepted by my service and $_POST is populated as required, with two variables, order and data. data holds the encoded JSON object. When I print out $_POST['data'] it shows:
{"products":{"product1":"abc","product2":"pass"}}
Which is exactly what is expected and identical to what was sent in.
When I try to decode this, json_decode() returns nothing!
If I create a new string and manually type that string, json_decode() works fine!
I've tried:
strip_tags() to remove any tags that might have been added in the http post
utf8_encode() to encode the string to the required utf 8
addslashes() to add slashes before the quotes
Nothing works.
Any ideas why json_decode() is not working after a string is received from an http post message?
Below is the relevant part of my processing of the request for reference:
public static function processRequest($requestArrays) {
// get our verb
$request_method = strtolower($requestArrays->server['REQUEST_METHOD']);
$return_obj = new RestRequest();
// we'll store our data here
$data = array();
switch ($request_method) {
case 'post':
$data = $requestArrays->post;
break;
}
// store the method
$return_obj->setMethod($request_method);
// set the raw data, so we can access it if needed (there may be
// other pieces to your requests)
$return_obj->setRequestVars($data);
if (isset($data['data'])) {
// translate the JSON to an Object for use however you want
//$decoded = json_decode(addslashes(utf8_encode($data['data'])));
//print_r(addslashes($data['data']));
//print_r($decoded);
$return_obj->setData(json_decode($data['data']));
}
return $return_obj;
}
It turns out that when JSON is sent by cURL inside the POST parameters, &quot; replaces the " as part of the message encoding. I'm not sure why the preg_replace() call I tried didn't work, but using html_entity_decode() removed the &quot; and made the JSON decodable.
old:
$return_obj->setData(json_decode($data['data']));
new:
$data = json_decode( urldecode( $data['data'] ), true );
$return_obj->setData($data);
Try it; I'm curious whether it works.
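A sketch of the round trip that combines the ideas from the answers above: percent-encode the JSON on the sending side, and undo entity encoding on the receiving side only if it is actually present. Here parse_str() stands in for what PHP does when it fills $_POST, and the variable names are illustrative:

```php
<?php
$data = array('products' => array('product1' => 'abc', 'product2' => 'pass'));

// http_build_query() percent-encodes the JSON so quotes, '&' and '=' survive the POST body.
$postArgs = http_build_query(array('order' => 'new', 'data' => json_encode($data)));

// Stand-in for the receiving side: PHP decodes the body into an array like $_POST.
parse_str($postArgs, $received);

$payload = $received['data'];
// Only needed if something upstream turned the quotes into &quot; entities.
if (strpos($payload, '&quot;') !== false) {
    $payload = html_entity_decode($payload, ENT_QUOTES);
}

$decoded = json_decode($payload, true);
var_dump($decoded === $data);
```

If the entities turn out to come from a sanitizing step in the request wrapper (for example a call to htmlspecialchars() on all input), removing that step for the JSON field is likely cleaner than decoding afterwards.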

Escaping CURL @ symbol with PHP

I'm writing a PHP application that submits data via curl to sign up for an iContact email list. However, I keep getting an invalid email address error. I think this may be due to the fact that I'm escaping the @ symbol so it looks like %40 instead of @. Also, according to the PHP documentation for curl_setopt with CURLOPT_POSTFIELDS:
The full data to post in a HTTP
"POST" operation. To post a file,
prepend a filename with @ and use the
full path.
So, is there any way to pass the @ symbol as post data through curl in PHP without running it through urlencode first?
Use http_build_query() on your data array before passing it to curl_setopt(). That makes cURL send the form as application/x-www-form-urlencoded instead of multipart/form-data (and thus the @ is not interpreted).
Also, why do you care about the @ in an email address? It only matters if the @ is the first character, not somewhere in the middle.
After searching the PHP curl manual, I found no information on how to escape a leading '@' when the post field is a string rather than a file and the post uses multipart/form-data encoding.
The way I worked around this problem is by prefixing the text with a blank. Our backend API strips blanks, so it removes the blank and restores the original text. I don't know whether the Twitter API trims blanks from user input.
If so, this workaround also works for you.
If anyone finds a way to escape the leading '@' when using PHP curl with multipart/form-data encoding, please let us know.
I ran into the same issue, though with curl itself and not PHP curl.
When using curl's field option '-F', a leading @ symbol is not sent in the POST; instead it instructs curl to send the contents of the file whose name immediately follows the symbol as part of the POST.
Fortunately, curl offers another option, '--form-string', which behaves the same way as '-F', except that its value is not parsed.
As an example, if you want to use curl to POST field1 with value "@value" and file1 with the file "testfile.txt", you can do so as follows:
curl "http://www.url.com" --form-string "field1=@value" -F "file1=@testfile.txt"
This is the true solution that can support both strings containing @ and files.
Solution for PHP 5.6 or later:
Use CURLFile instead of @.
Solution for PHP 5.5 or later:
Enable CURLOPT_SAFE_UPLOAD.
Use CURLFile instead of @.
Solution for PHP 5.3 or later:
Build up the multipart content body yourself.
Change the Content-Type header yourself.
The following snippet will help you :D
<?php
/**
* For safe multipart POST request for PHP5.3 ~ PHP 5.4.
*
* @param resource $ch cURL resource
* @param array $assoc "name => value"
* @param array $files "name => path"
* @return bool
*/
function curl_custom_postfields($ch, array $assoc = array(), array $files = array()) {
// invalid characters for "name" and "filename"
static $disallow = array("\0", "\"", "\r", "\n");
// initialize body
$body = array();
// build normal parameters
foreach ($assoc as $k => $v) {
$k = str_replace($disallow, "_", $k);
$body[] = implode("\r\n", array(
"Content-Disposition: form-data; name=\"{$k}\"",
"",
filter_var($v),
));
}
// build file parameters
foreach ($files as $k => $v) {
switch (true) {
case false === $v = realpath(filter_var($v)):
case !is_file($v):
case !is_readable($v):
continue 2; // skip this file (a bare continue inside switch acts like break); or return false, throw new InvalidArgumentException
}
$data = file_get_contents($v);
$v = call_user_func("end", explode(DIRECTORY_SEPARATOR, $v));
list($k, $v) = str_replace($disallow, "_", array($k, $v));
$body[] = implode("\r\n", array(
"Content-Disposition: form-data; name=\"{$k}\"; filename=\"{$v}\"",
"Content-Type: application/octet-stream",
"",
$data,
));
}
// generate safe boundary
do {
$boundary = "---------------------" . md5(mt_rand() . microtime());
} while (preg_grep("/{$boundary}/", $body));
// add boundary for each parameters
array_walk($body, function (&$part) use ($boundary) {
$part = "--{$boundary}\r\n{$part}";
});
// add final boundary
$body[] = "--{$boundary}--";
$body[] = "";
// set options
return curl_setopt_array($ch, array(
CURLOPT_POST => true,
CURLOPT_POSTFIELDS => implode("\r\n", $body),
CURLOPT_HTTPHEADER => array(
"Expect: 100-continue",
"Content-Type: multipart/form-data; boundary={$boundary}", // change Content-Type
),
));
}
?>
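For PHP 5.5 and later, the CURLFile approach mentioned above might look like the following sketch. The temporary file, field names and URL are placeholders, the curl extension is assumed to be loaded, and the request itself is left commented out so that nothing is sent:

```php
<?php
// Hypothetical file created only for this demonstration.
$path = tempnam(sys_get_temp_dir(), 'up');
file_put_contents($path, 'demo');

$postfields = array(
    // With CURLFile in use, a leading @ in a plain value is just text.
    'upload_text' => '@text_to_upload',
    // Arguments: path on disk, MIME type, file name reported to the server.
    'upload_file' => new CURLFile($path, 'text/plain', 'testfile.txt'),
);

$ch = curl_init('http://example.com/upload-test'); // placeholder URL
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
// $response = curl_exec($ch); // uncomment to actually send the request

unlink($path);
```

On PHP 5.5 specifically, CURLOPT_SAFE_UPLOAD would also have to be set to true before CURLOPT_POSTFIELDS, as the answer notes; from PHP 5.6 on it is the default.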
@PatricDaryll's answer is correct, but I needed to do a bit of research to understand where to use the http_build_query function.
To clarify and summarise, instead of doing:
curl_setopt($ch, CURLOPT_POSTFIELDS, $array);
You will use:
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($array));
Simple but confusing... curl is smart enough to tell whether you gave it a string or an array.
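To see what the string form looks like, here is a short sketch with made-up field values:

```php
<?php
$fields = array('email' => 'user@example.com', 'note' => '@starts-with-at');

// A string body is sent as application/x-www-form-urlencoded, so every @ is
// percent-encoded and is never read as a file marker.
$body = http_build_query($fields);

echo $body, "\n";
// email=user%40example.com&note=%40starts-with-at
```

The receiving server decodes %40 back to @ when it parses the form, so the encoded form in the request body is expected and is not by itself a sign of a malformed email address.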
In order to escape the @ sign in the non-file data, you need to do the following.
Prepend the NUL character to the text string:
$postfields = array(
'upload_file' => '@file_to_upload.png',
'upload_text' => sprintf("\0%s", '@text_to_upload')
);
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://example.com/upload-test');
curl_setopt($curl, CURLOPT_POSTFIELDS, $postfields);
curl_exec($curl);
curl_close($curl);
