Decode complex JSON in Python - php

I have a JSON object created in PHP, and that JSON object contains another escaped JSON string in one of its cells:
php > $insidejson = array('foo' => 'bar','foo1' => 'bar1');
php > $arr = array('a' => array('a1'=>json_encode($insidejson)));
php > echo json_encode($arr);
{"a":{"a1":"{\"foo\":\"bar\",\"foo1\":\"bar1\"}"}}
Then, in Python, I try decoding it using simplejson:
>>> import simplejson as json
>>> json.loads('{"a":{"a1":"{\"foo\":\"bar\",\"foo1\":\"bar1\"}"}}')
This fails with the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "build/bdist.linux-i686/egg/simplejson/__init__.py", line 307, in loads
  File "build/bdist.linux-i686/egg/simplejson/decoder.py", line 335, in decode
  File "build/bdist.linux-i686/egg/simplejson/decoder.py", line 351, in raw_decode
ValueError: Expecting , delimiter: line 1 column 14 (char 14)
How can I get this JSON object decoded in Python? Both PHP and JS decode it successfully, and I can't change its structure, since that would require major changes in many different components written in different languages.
Thanks!

Try prefixing your string with 'r' to make it a raw string:
# Python 2.6.2
>>> import json
>>> s = r'{"a":{"a1":"{\"foo\":\"bar\",\"foo1\":\"bar1\"}"}}'
>>> json.loads(s)
{u'a': {u'a1': u'{"foo":"bar","foo1":"bar1"}'}}
What Alex says below is true: you can just double the backslashes. (His answer was not posted when I started mine.) I think that using raw strings is simpler, if only because it's a language feature that means the same thing and is harder to get wrong.
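To make the escaping concrete, here is a small sketch assuming Python 3 and the standard json module rather than simplejson. It shows that the raw string and the doubled-backslash literal produce the same text, that the plain literal does not (its inner quotes end the "a1" string early, which is where the "char 14" in the traceback comes from), and that the "a1" value decodes to a JSON string, which needs a second json.loads to become a dict.

```python
import json

# Exactly the text PHP's json_encode printed above.
# A raw string keeps every backslash, so \" reaches the JSON parser intact.
raw = r'{"a":{"a1":"{\"foo\":\"bar\",\"foo1\":\"bar1\"}"}}'

# Doubling the backslashes in an ordinary literal yields the identical string.
doubled = '{"a":{"a1":"{\\"foo\\":\\"bar\\",\\"foo1\\":\\"bar1\\"}"}}'
assert raw == doubled

# In an ordinary literal, \" is just ", so the inner quotes end the "a1" string early.
plain = '{"a":{"a1":"{\"foo\":\"bar\",\"foo1\":\"bar1\"}"}}'
try:
    json.loads(plain)
except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
    print("plain literal fails:", exc)

outer = json.loads(raw)
# "a1" holds a JSON *string*, so decode it a second time to get a dict.
inner = json.loads(outer["a"]["a1"])
print(inner)  # {'foo': 'bar', 'foo1': 'bar1'}
```

When the JSON arrives from PHP at runtime (over a socket, a file, or stdin) there is no Python literal involved, so no extra escaping is needed; only the nested second decode remains.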

Try The Python Standard Library or jyson. Maybe simplejson is a bit too "simple".

If you want to put backslashes into a string literal, they need to be escaped themselves:
import simplejson as json
json.loads('{"a":{"a1":"{\\"foo\\":\\"bar\\",\\"foo1\\":\\"bar1\\"}"}}')
I've tested it, and Python handles that input just fine, except that I used the json module included in the standard library (import json, Python 3.1).

Related

How to send an array of URIs from PHP to Python and print it there

I am trying to send an array of URIs from my PHP code to my Python scraper. The array contains links to scrape.
I got an example from here, and it worked fine when sending an array of integers, but when I fill the array with URIs, an error shows up.
PHP snippet:
$array = ["https://www.google.com/","https://www.google.com/","https://www.google.com/"];
//$array = [1,2,3]; // This worked fine
$resultScript= system('python C:\xampp\htdocs\selenium\dummy.py ' .escapeshellarg(json_encode($array)));
$resultData = json_decode($resultScript, true);
var_dump($resultData);
Python:
import sys
import json
def jsontoarray(json_data):
    data = json.loads(json_data)
    print(json.dumps(data))
jsontoarray(sys.argv[1])
print(data)
Result from my IDE:
Traceback (most recent call last):
  File "C:\xampp\htdocs\selenium\dummy.py", line 8, in <module>
    jsontoarray(sys.argv[1])
  File "C:\xampp\htdocs\selenium\dummy.py", line 6, in jsontoarray
    data = json.loads(json_data)
  File "C:\Users\PC2\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Users\PC2\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\PC2\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 3 (char 2)
NULL
Process finished with exit code 0
As said here, the reasons for getting an error like:
json.decoder.JSONDecodeError: Expecting value: line 1 column 3 (char 2)
can be:
non-JSON conforming quoting
XML/HTML output (that is, a string starting with <), or
incompatible character encoding
So, based on what you have, I would bet that there is an encoding problem.
Take a look at your PHP file and make sure it's UTF-8 encoded.
Check the Python script's file encoding too; it would be good to add the line below at the beginning of the file to enforce proper encoding:
# -*- coding: utf-8 -*-
So your Python file will look like:
# -*- coding: utf-8 -*-
import sys
import json
def jsontoarray(json_data):
    data = json.loads(json_data)
    print(json.dumps(data))
jsontoarray(sys.argv[1])
Additionally, for debugging you could print the type of sys.argv[1]:
print(type(sys.argv[1]))
You should get a string (or a bytes string, for UTF-8). If you get something else, you could convert it into a string like this:
str(sys.argv[1])
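Printing type() only tells you that the argument is a string; printing its repr() shows the exact characters that arrived, including any quotes, spaces or escapes. A minimal sketch of that idea, written as a hypothetical variant of jsontoarray that reports the raw argument on stderr before re-raising the decode error:

```python
import json
import sys

def jsontoarray(json_data):
    """Decode JSON passed in by PHP; on failure, report exactly what was received."""
    try:
        return json.loads(json_data)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        # repr() makes stray spaces, missing quotes and escapes visible.
        sys.stderr.write("could not decode %r: %s\n" % (json_data, exc))
        raise

# Well-formed input decodes normally.
print(jsontoarray('["https://www.google.com/"]'))

# Hypothetical input whose quotes were lost on the way in: the stderr line shows it.
try:
    jsontoarray('[ https://www.google.com/ ]')
except ValueError:
    pass
```

In the real script the call would be jsontoarray(sys.argv[1]); comparing the printed repr with what PHP's json_encode produced shows whether the problem is encoding or something that happened to the argument in transit.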

Python unserialize PHP session

I have been trying to unserialize PHP session data in Python by using phpserialize and serek's module (which I got from "Unserialize PHP data in python"), but it seems impossible.
Both modules expect PHP session data to be like:
a:2:{s:3:"Usr";s:5:"AxL11";s:2:"Id";s:1:"2";}
But the data stored in the session file is:
Id|s:1:"2";Usr|s:5:"AxL11";
Any help would be very much appreciated.
The default algorithm used for PHP session serialization is not the one used by serialize, but another internal, broken format called php, which cannot store numeric indexes, nor string indexes containing the special characters | and !, in $_SESSION.
The correct solution is to change the crippled default session serialization format to the one supported by Armin Ronacher's original phpserialize library, or even to serialize and deserialize as JSON, by changing the session.serialize_handler INI setting.
I decided to use the former for maximal compatibility on the PHP side by using
ini_set('session.serialize_handler', 'php_serialize');
which makes the new sessions compatible with standard phpserialize.
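For session files that were already written with the default php handler, a hand-rolled parser is another option. This is a minimal sketch under stated assumptions: it takes the raw bytes of the session file (opened with 'rb'), because the N in s:N counts bytes rather than characters, and it only understands string, integer, boolean and null values, so arrays or objects stored in the session raise ValueError. parse_php_session_bytes is a hypothetical name, not part of phpserialize.

```python
import re

# One "name|" prefix of PHP's default ("php" handler) session format.
_KEY = re.compile(rb'([^|]+)\|')

def parse_php_session_bytes(data):
    """Parse simple PHP session data (s:, i:, b:, N; values only) into a dict."""
    result = {}
    pos = 0
    while pos < len(data):
        m = _KEY.match(data, pos)
        if not m:
            raise ValueError("expected 'name|' at byte %d" % pos)
        key = m.group(1).decode("utf-8")
        pos = m.end()
        kind = data[pos:pos + 2]
        if kind == b's:':
            colon = data.index(b':', pos + 2)
            length = int(data[pos + 2:colon])    # s:N is a length in *bytes*
            start = colon + 2                    # skip ':"'
            value = data[start:start + length].decode("utf-8")
            pos = start + length + 2             # skip the closing '";'
        elif kind in (b'i:', b'b:'):
            end = data.index(b';', pos)
            number = int(data[pos + 2:end])
            value = bool(number) if kind == b'b:' else number
            pos = end + 1
        elif kind == b'N;':
            value = None
            pos += 2
        else:
            raise ValueError("unsupported type %r at byte %d" % (kind, pos))
        result[key] = value
    return result

print(parse_php_session_bytes(b'Id|s:1:"2";Usr|s:5:"AxL11";'))  # {'Id': '2', 'Usr': 'AxL11'}
```

Reading the file in binary mode matters for non-ASCII values: a two-byte UTF-8 character is stored as s:2, and slicing a decoded str by that count would cut it wrong.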
After reaching page 3 on Google, I found a fork of the original phpserialize library that worked with the string I provided:
>>> loads('Id|s:1:"2";Usr|s:5:"AxL11";')
{'Id': '2', 'Usr': 'AxL11'}
This is how I do it in a stupid way:
First, convert Id|s:1:"2";Usr|s:5:"AxL11"; to the query string Id=2&Usr=AxL11&, then use parse_qs:
import sys
import re
if sys.version_info >= (3, 0):
    from urllib.parse import parse_qs, quote
else:
    from urlparse import parse_qs
    from urllib import quote
def parse_php_session(path):
    with open(path, 'r') as sess:
        return parse_qs(
            re.sub(r'\|s:([0-9]+):"?(.*?)(?=[^;|]+\|s:[0-9]+:|$)',
                   lambda m : '=' + quote(m.group(2)[:int(m.group(1))]) + '&',
                   sess.read().rstrip().rstrip(';') + ';')
        )
print(parse_php_session('/session-save-path/sess_0123456789abcdef'))
# {'Id': ['2'], 'Usr': ['AxL11']}
It used to work with ; instead of & (both used to be accepted as separators), but since Python 3.10 parse_qs splits only on & by default.
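A quick way to see that change, assuming Python 3.10 or later (the separator parameter was also backported to a few earlier 3.x security releases): with the default, a ; no longer separates fields, and it only does so again when passed explicitly.

```python
from urllib.parse import parse_qs

query = "Id=2;Usr=AxL11"

# Default separator is "&" only, so the whole string becomes the value of "Id".
print(parse_qs(query))  # {'Id': ['2;Usr=AxL11']}

# Opting back in to ";" via the separator parameter.
print(parse_qs(query, separator=";"))  # {'Id': ['2'], 'Usr': ['AxL11']}
```

Emitting & from the regex, as the code above does, avoids depending on the separator parameter at all, which keeps it working on Python 2 as well.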

executing Python script in PHP and exchanging data between the two

Is it possible to run a Python script from within PHP and transfer variables between them?
I have a class that scrapes websites for data in a fairly general way. I want to make it a lot more specific, and I already have Python scripts specific to several websites.
I am looking for a way to incorporate those inside my class.
Is safe and reliable data transfer between the two even possible? If so, how difficult is it to get something like that going?
You can generally communicate between languages by using common, language-neutral data formats, and using stdin and stdout to communicate the data.
Example with PHP/Python using a shell argument to send the initial data via JSON
PHP:
// This is the data you want to pass to Python
$data = array('as', 'df', 'gh');
// Execute the python script with the JSON data
$result = shell_exec('python /path/to/myScript.py ' . escapeshellarg(json_encode($data)));
// Decode the result
$resultData = json_decode($result, true);
// This will contain: array('status' => 'Yes!')
var_dump($resultData);
Python:
import sys, json
# Load the data that PHP sent us
try:
    data = json.loads(sys.argv[1])
except (IndexError, ValueError):  # missing argument or invalid JSON
    print("ERROR")
    sys.exit(1)
# Generate some data to send to PHP
result = {'status': 'Yes!'}
# Send it to stdout (to PHP)
print(json.dumps(result))
You are looking for "interprocess communication" (IPC). You could use something like XML-RPC, which basically lets you call a function in a remote process and handles the translation of all the argument data types between languages (so you could call a PHP function from Python, or vice versa, as long as the arguments are of a supported type).
Python has a builtin XML-RPC server and a client
The phpxmlrpc library has both a client and server
There are examples for both, Python server and client, and a PHP client and server
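A minimal sketch of the Python side, assuming Python 3's xmlrpc.server and xmlrpc.client modules. scrape_site is a hypothetical stand-in for one of the site-specific scrapers, and the server runs in a background thread only so the example is self-contained; in practice it would be its own long-lived process, and the phpxmlrpc client would call it over HTTP.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def scrape_site(url):
    # Hypothetical placeholder for a real site-specific scraper.
    return {"url": url, "status": "ok"}

# Port 0 lets the OS pick a free port; logRequests=False keeps the output quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False, allow_none=True)
server.register_function(scrape_site, "scrape_site")
host, port = server.server_address
threading.Thread(target=server.serve_forever, daemon=True).start()

# Python's own client stands in here for the PHP client.
with ServerProxy("http://%s:%d/" % (host, port)) as proxy:
    result = proxy.scrape_site("https://example.com/")
print(result)

server.shutdown()
server.server_close()
```

The dict returned by scrape_site travels as an XML-RPC struct, which is what lets a PHP client receive it as an associative array without any custom parsing.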
Just had the same problem and wanted to share my solution (it closely follows what Amadan suggests).
Python piece:
import subprocess
output = subprocess.check_output(["php", path_to_my_php_script, input1])
You could also pass blah=input1 instead of an unnamed argument, and then use $_GET['blah'] (this works with the php-cgi binary; the plain php CLI does not fill $_GET from its arguments).
PHP piece:
if (isset($argv[1])) {
    $blah = $argv[1];
    // do stuff with $blah
} else {
    throw new \Exception('No blah.');
}
The best bet is running python as a subprocess and capturing its output, then parsing it.
$pythonoutput = `/usr/bin/env python pythoncode.py`;
Using JSON would probably help make it easy to both produce and parse in both languages, since it's standard and both languages support it (well, at least non-ancient versions do). In Python,
json.dumps(stuff)
and then in PHP
$stuff = json_decode($pythonoutput);
You could also explicitly save the data as files, or use sockets, or have many different ways to make this more efficient (and more complicated) depending on the exact scenario you need, but this is the simplest.
For me, escapeshellarg(json_encode($data)) does not give exactly a JSON-formatted string, but something like { name : Carl , age : 23 }.
So in Python I need to .replace(' ', '"') the spaces to get real JSON and be able to call json.loads(sys.argv[1]) on it.
The problem is when someone enters a name that already contains spaces, like "Ca rl".
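That mangling comes from the shell-quoting step, not from JSON: on Windows, PHP's escapeshellarg replaces double quotes with spaces. One way around it, following the stdin/stdout approach mentioned in the first answer, is to keep the JSON off the command line entirely. A minimal Python-side sketch, assuming PHP writes the JSON to the script's standard input (for example through proc_open) and reads the reply from standard output; handle and main are hypothetical names.

```python
import io
import json
import sys

def handle(stream_in, stream_out):
    """Read one JSON document from stream_in and write a JSON reply to stream_out."""
    data = json.load(stream_in)  # no shell quoting involved, so quotes and spaces survive
    reply = {"status": "Yes!", "received": data}
    json.dump(reply, stream_out)
    stream_out.write("\n")

def main():
    # Entry point when PHP launches the script: JSON in on stdin, JSON out on stdout.
    handle(sys.stdin, sys.stdout)

# Quick local check without PHP: a name containing a space comes through intact.
demo_out = io.StringIO()
handle(io.StringIO('{"name": "Ca rl", "age": 23}'), demo_out)
print(demo_out.getvalue())
```

In the deployed script, main() would be called at the bottom (e.g. under if __name__ == "__main__":); the local check is only there to show the round trip.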

Pack/unpack and BSON encode/decode data

We have an iOS app that sends data in an encoded format. In PHP the following code will decode it properly.
bson_decode(pack("H*", $hex_string));
In Python, the following code will create a valid encoded object that the PHP code can then decode (data is a dict here).
from bson import BSON
def encode(data):
    return str(BSON.encode(data)).encode('hex')
The following Python code will decode a string that was encoded by the above Python code:
from bson import BSON
def parse(str):
    hexed = str.decode('hex')
    return BSON.decode(BSON(hexed))
In theory that should decode data sent from the app as well, but it throws the following exception:
bson.errors.InvalidBSON: bad eoo
It looks like the Objective-C code that encodes the data in the app adds some extra padding: if I remove the last characters from the app-encoded string, it works. Is there anything I can do to account for this? Changing the app code is NOT possible. Even if it were, there are millions of devices running the old code which I need to support, so I still need a fix for this.
According to the BSON specification, BSON documents must be terminated with a NULL byte (\x00). Have you checked whether the byte string you are trying to decode is NULL-terminated? If not, you may need to append a NULL byte at the end.
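A sketch of a different workaround, based on another part of the BSON specification: every document begins with its total size as a little-endian int32, so the decoder can keep exactly that many bytes and ignore whatever padding follows. This assumes Python 3 and that the padding is only ever appended after the document; trim_bson is a hypothetical helper name.

```python
import struct

def trim_bson(hex_string):
    """Decode a hex string and keep only the BSON document it starts with."""
    if len(hex_string) < 10:  # smallest document: 4-byte size + 1-byte terminator
        raise ValueError("too short to hold a BSON document")
    # Per the BSON spec, a document begins with its total size as a little-endian int32.
    (size,) = struct.unpack("<i", bytes.fromhex(hex_string[:8]))
    if size < 5 or 2 * size > len(hex_string):
        raise ValueError("declared size %d does not fit the input" % size)
    doc = bytes.fromhex(hex_string[:2 * size])  # drop everything after the document
    if doc[-1:] != b"\x00":
        raise ValueError("document is not terminated by a NULL byte")
    return doc

# An encoded {"a": 1} (12 bytes) followed by some stray padding characters.
padded = "0c0000001061000100000000" + "000"
print(trim_bson(padded))
```

Slicing the hex string before calling bytes.fromhex means an odd number of padding characters does not break the conversion. The trimmed bytes could then be passed to PyMongo's bson.decode() (PyMongo 3.9 and later); that call is not exercised here.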

Python's cPickle deserialization from PHP?

I have to deserialize a dictionary in PHP that was serialized using cPickle in Python.
In this specific case I probably could just regexp the wanted information, but is there a better way? Any extensions for PHP that would allow me to deserialize more natively the whole dictionary?
Apparently it is serialized in Python like this:
import cPickle as pickle
data = { 'user_id' : 5 }
pickled = pickle.dumps(data)
print pickled
The contents of such a serialization cannot easily be pasted here, because they contain binary data.
If you want to share data objects between programs written in different languages, it might be easier to serialize/deserialize using something like JSON instead. Most major programming languages have a JSON library.
Can you do a system call? You could use a python script like this to convert the pickle data into json:
# pickle2json.py
import sys, optparse, cPickle, os
try:
    import json
except ImportError:
    import simplejson as json
# Set up the arguments this script can accept from the command line
parser = optparse.OptionParser()
parser.add_option('-p','--pickled_data_path',dest="pickled_data_path",type="string",help="Path to the file containing pickled data.")
parser.add_option('-j','--json_data_path',dest="json_data_path",type="string",help="Path to where the json data should be saved.")
opts,args=parser.parse_args()
# Load in the pickled data from either a file or the standard input stream
if opts.pickled_data_path:
    # Binary mode, so protocol 1/2 pickles are read byte-for-byte
    unpickled_data = cPickle.loads(open(opts.pickled_data_path, 'rb').read())
else:
    unpickled_data = cPickle.loads(sys.stdin.read())
# Output the json version of the data either to another file or to the standard output
if opts.json_data_path:
    open(opts.json_data_path, 'w').write(json.dumps(unpickled_data))
else:
    print json.dumps(unpickled_data)
This way, if you're getting the data from a file, you could do something like this:
<?php
$output = array();
exec("python pickle2json.py -p pickled_data.txt", $output);
$data = json_decode(implode("\n", $output), true);
?>
or if you want to save it out to a file this:
<?php
system("python pickle2json.py -p pickled_data.txt -j p_to_j.json");
?>
All the code above probably isn't perfect (I'm not a PHP developer), but would something like this work for you?
I know this is ancient, but I've just needed to do this for a Django 1.3 app (circa 2012) and found this:
https://github.com/terryf/Phpickle
Posting it just in case someone else needs the same solution one day.
If the pickle is being created by the code that you showed, then it won't contain binary data, unless you are calling newlines "binary data". See the Python docs. The following code was run with Python 2.6.
>>> import cPickle
>>> data = {'user_id': 5}
>>> for protocol in (0, 1, 2): # protocol 0 is the default
... print protocol, repr(cPickle.dumps(data, protocol))
...
0 "(dp1\nS'user_id'\np2\nI5\ns."
1 '}q\x01U\x07user_idq\x02K\x05s.'
2 '\x80\x02}q\x01U\x07user_idq\x02K\x05s.'
>>>
Which of the above looks most like what you are seeing? Can you post the pickled file contents as displayed by a hex editor/dumper or whatever is the PHP equivalent of Python's repr()? How many items in a typical dictionary? What data types other than "integer" and "string of 8-bit bytes" (what encoding?)?
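If it helps to answer those questions, here is a hedged Python 3 sketch that prints a pickle's repr(), a hex dump, and an opcode listing from the standard pickletools module. pickletools.dis only reads the opcodes and does not rebuild any objects, so it is also safe to run on data you don't trust. describe_pickle is a hypothetical helper name, and the sample uses Python 3's pickle in place of cPickle, so its protocol 0 bytes will not be identical to the 2.6 output above.

```python
import io
import pickle
import pickletools

def describe_pickle(blob):
    """Return a printable description of pickled bytes: repr, hex dump, opcode listing."""
    listing = io.StringIO()
    pickletools.dis(blob, out=listing)  # disassembles opcodes without unpickling
    return "repr: %r\nhex:  %s\n%s" % (blob, blob.hex(), listing.getvalue())

sample = pickle.dumps({"user_id": 5}, protocol=0)
print(describe_pickle(sample))
```

For a real file, the bytes would come from open(path, 'rb').read(); the opcode listing makes it easy to see which protocol was used and which value types appear.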
