How to handle nested objects in processing a JSON stream - php

I am working on a program where we need to process a very large JSON file, so I would like to use a streaming, event-oriented reader (like jsonstreamingparser) so that we can avoid loading the entire structure into memory at one time. Something I'm concerned about, though, is the object structure that seems to be required to make this work.
For example, say I'm writing a program like Evite to send out invitations to an activity, with a JSON structure like:
{
"title": "U2 Concert",
"location": "San Jose",
"attendees": [
{"email": "foo@bar.com"},
{"email": "baz@bar.com"}
],
"date": "July 4, 2015"
}
What I would like to do is have a programming "event" that when the stream encounters a new attendee, sends out an invite email. But, I can't do that because the stream has not yet reached the date of the event.
Of course, given the example, it's fine to just read everything into memory, but in my dataset the "attendees" entries are complex objects, and there can be tens of thousands of them.
Another "solution" is to just mandate: you HAVE to put all the required "parent" attributes first, but that is what I'm trying to find a way around.
Any ideas?

This is another 'tree walking' problem. The JSON streaming parser reads the source file and starts to build the 'tree'. It does this by collecting 'elements' and storing them in memory. To enable us to process each entry, it 'emits events' at convenient times, which means it calls your functions, passing useful values as required.
Examples of 'tree events' are:
start_object()
end_object()
start_array()
end_array()
...
The 'example' code provided with the 'Parser' is a program that uses the Parser to build the tree in memory. I just modified that example to call our function whenever it has a 'complete Event' stored.
So, how do we identify a 'complete Event'?
The input file consists of an array where each entry is a JSON 'object'. Each object consists of 'sub entries' that make up the data of the 'object'.
Now, as we traverse the 'tree' building it, our code will be called at various points as shown above. Specifically when 'starting' and 'ending' of objects and arrays. We need to collect all the data for the 'outer object'.
How do we identify this? We record where we are in the 'tree' as the processing proceeds. This we do by keeping track of the depth of 'nesting' in the tree. Hence the 'levels'. The 'start' of an object 'nests' down one level, the 'end' of an object 'unnests' one level.
The objects we are interested in are at 'level 1'.
The code provided:
1) keeps track of the 'levels' and calls our function when it reaches the end of an object that is at 'level 1'.
2) Accumulates the data in the appropriate structure from the start of the object at 'level 1'.
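The level bookkeeping on its own can be sketched in a few lines. This is only an illustration: DepthTracker, its start()/end() methods and the hard-coded event list are hypothetical stand-ins for the parser's start_object/start_array and end_object/end_array callbacks, and no data is accumulated here.

```php
<?php
// Hypothetical mini listener: it tracks nesting depth only and counts each
// time an entry at 'level 1' is closed (the level drops from 2 to 1).
class DepthTracker
{
    public $level = 0;
    public $completed = 0;

    // Stands in for start_object() / start_array(): nest down one level.
    public function start()
    {
        $this->level++;
    }

    // Stands in for end_object() / end_array(): unnest one level.
    public function end()
    {
        $prevLevel = $this->level;
        $this->level--;
        if ($prevLevel === 2 && $this->level === 1) {
            $this->completed++; // one complete top-level entry
        }
    }
}

// Events a parser would emit for: [ {"a": [1, 2]}, {"b": {}} ]
$events = array('start', 'start', 'start', 'end', 'end',
                'start', 'start', 'end', 'end', 'end');

$tracker = new DepthTracker();
foreach ($events as $event) {
    $tracker->$event();
}

echo $tracker->completed, " complete entries\n";
```

In the real listener the same check decides when to hand the accumulated data to the 'processEvent' callable.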
Requirements:
1) Call a 'callable' routine when there is a 'complete Event' that can be processed.
Assumptions:
The input file consists of an array of 'Events'.
Processing:
Parse the file
Whenever the current Event is 'complete'
Execute the 'processEvent' callable with access to the current Event.
Source Code:
Source: class Q31079129Listener at Pastebin.com
Source: index.php file at Pastebin.com
Source: test datafile : Q31079129.json at Pastebin.com
Demonstration using the code
Code: index.php
<?php // https://stackoverflow.com/questions/31079129/how-to-handle-nested-objects-in-processing-a-json-stream
require_once __DIR__ .'/vendor/jsonstreamingparser/src/JsonStreamingParser/Parser.php';
require_once __DIR__ .'/vendor/jsonstreamingparser/src/JsonStreamingParser/Listener/IdleListener.php';
require_once __DIR__ .'/Q31079129Listener.php';
/**
* The input file consists of a JSON array of 'Events'.
*
* The important point is that when the file is being 'parsed' the 'listener' is
* 'walking' the tree.
*
* Therefore
* 1) Each 'Event' is at 'level 1' in the tree.
*
* Event Level Changes:
* Start: level will go from 1 => 2
* End: level will go from 2 => 1 !!!!
*
* Actions:
* The 'processEvent' function will be called when the
* 'Event Level' changes from 2 to 1.
*
*/
define('JSON_FILE', __DIR__. '/Q31079129.json');
/**
* This is called when one 'Event' is complete
*
* @param Q31079129Listener $listener
*/
function processEvent($listener) {
echo '<pre>', '+++++++++++++++';
print_r($listener->get_event());
echo '</pre>';
}
// ----------------------------------------------------------------------
// the 'Listener'
$listener = new Q31079129Listener();
// setup the 'Event' Listener that will be called with each complete 'Event'
$listener->whenLevelAction = 'processEvent';
// process the input stream
$stream = fopen(JSON_FILE, 'r');
try {
$parser = new JsonStreamingParser_Parser($stream, $listener);
$parser->parse();
}
catch (Exception $e) {
fclose($stream);
throw $e;
}
fclose($stream);
exit;
Code: Q31079129Listener.php
<?php // // https://stackoverflow.com/questions/31079129/how-to-handle-nested-objects-in-processing-a-json-stream
/**
* This is the supplied example modified:
*
* 1) Record the current 'depth' of 'nesting' in the current object being parsed.
*/
class Q31079129Listener extends JsonStreamingParser\Listener\IdleListener {
public $whenLevelAction = null;
protected $event;
protected $prevLevel;
protected $level;
private $_stack;
private $_keys;
public function get_event() {
return $this->event;
}
public function get_prevLevel() {
return $this->prevLevel;
}
public function get_level() {
return $this->level;
}
public function start_document() {
$this->prevLevel = 0;
$this->level = 0;
$this->_stack = array();
$this->_keys = array();
// echo '<br />start of document';
}
public function end_document() {
// echo '<br />end of document';
}
public function start_object() {
$this->prevLevel = $this->level;
$this->level++;
$this->_start_complex_value('object');
}
public function end_object() {
$this->prevLevel = $this->level;
$this->level--;
$this->_end_complex_value();
}
public function start_array() {
$this->prevLevel = $this->level;
$this->level++;
$this->_start_complex_value('array');
}
public function end_array() {
$this->prevLevel = $this->level;
$this->level--;
$this->_end_complex_value();
}
public function key($key) {
$this->_keys[] = $key;
}
public function value($value) {
$this->_insert_value($value);
}
private function _start_complex_value($type) {
// We keep a stack of complex values (i.e. arrays and objects) as we build them,
// tagged with the type that they are so we know how to add new values.
$current_item = array('type' => $type, 'value' => array());
$this->_stack[] = $current_item;
}
private function _end_complex_value() {
$obj = array_pop($this->_stack);
// If the nesting level has just dropped from 2 to 1,
// we're done parsing the current complete event, so we can
// move the result into place so that get_event() can return it. Otherwise, we
// associate the value with its parent via _insert_value().
// var_dump(__FILE__.__LINE__, $this->prevLevel, $this->level, $obj);
if ($this->prevLevel == 2 && $this->level == 1) {
if (!is_null($this->whenLevelAction)) {
$this->event = $obj['value'];
call_user_func($this->whenLevelAction, $this);
$this->event = null;
}
}
else {
$this->_insert_value($obj['value']);
}
}
// Inserts the given value into the top value on the stack in the appropriate way,
// based on whether that value is an array or an object.
private function _insert_value($value) {
// Grab the top item from the stack that we're currently parsing.
$current_item = array_pop($this->_stack);
// Examine the current item, and then:
// - if it's an object, associate the newly-parsed value with the most recent key
// - if it's an array, push the newly-parsed value to the array
if ($current_item['type'] === 'object') {
$current_item['value'][array_pop($this->_keys)] = $value;
} else {
$current_item['value'][] = $value;
}
// Replace the current item on the stack.
$this->_stack[] = $current_item;
}
}

Related

Where condition with multiple result set on ZF2 AbstractTableGateway

In a ZF2 application I have written a model file to retrieve a data set from a table.
It works as expected when returning a single row, but I have not been able to return multiple rows with the code below.
Working, returning a single row:
/**
* @param $id
* @return bool|Entity\Feeds
*/
public function getAppFeed($id)
{
$row = $this->select(array('app_id' => (int)$id))->current();
if (!$row)
return false;
$feedVal = new Entity\Feeds(array(
'id' => $row->id,
'title' => $row->title,
'link' => $row->link,
'Description' => $row->description,
'created' => $row->created,
));
return $feedVal;
}
I removed current() and also tried the TableGateway object, but that throws an error.
The feeds table will have multiple records for each application, so I need a function that returns all of them.
The Select always returns a ResultSet. You can access the objects(1) of ResultSet by iterating over it, because it implements the Iterator Interface.
Just an example piece of code:
public function getAppFeed($id)
{
$resultSet = $this->select(array('app_id' => (int)$id));
if ($resultSet instanceof \Zend\Db\ResultSet\ResultSetInterface) {
foreach($resultSet as $item) {
// do your feed stuff here
// e.g. $item->id
}
} else {
return false;
}
}
(1) Object: meaning whatever object you assigned as the prototype in your TableGateway.
For further details, please checkout the documentation of ResultSet.
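To return every row rather than one, the loop can collect an entity per row. The following is a minimal sketch under stated assumptions: Feed is a hypothetical stand-in for Entity\Feeds, buildFeeds is a made-up helper name, and an ArrayIterator of plain objects stands in for the ResultSet so the snippet runs without the framework.

```php
<?php
// Hypothetical stand-in for Entity\Feeds.
class Feed
{
    public $id;
    public $title;

    public function __construct(array $data)
    {
        $this->id = $data['id'];
        $this->title = $data['title'];
    }
}

// Accepts anything that can be iterated with foreach. In the table class,
// $rows would be the result of $this->select(array('app_id' => (int) $id)).
function buildFeeds($rows)
{
    $feeds = array();
    foreach ($rows as $row) {
        $feeds[] = new Feed(array('id' => $row->id, 'title' => $row->title));
    }
    return $feeds;
}

// Stand-in data instead of a database query.
$rows = new ArrayIterator(array(
    (object) array('id' => 1, 'title' => 'First'),
    (object) array('id' => 2, 'title' => 'Second'),
));

$feeds = buildFeeds($rows);
echo count($feeds), " feeds\n";
```

An empty result then simply yields an empty array, which callers can test with count() instead of checking for false.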

Gridfield not populate with ArrayList when not in public function getCMSFields

When I create a GridField within the admin console, everything is OK: I can populate the GridField via the classic method (e.g. Member::get()) or via an ArrayList:
$al1 = new ArrayList();
$records = DB::query("SELECT * from Member where id<10");
while ($rec = $records->next()) {
$al1->push(new ArrayData($rec));
}
$grid = new GridField('Pages', 'All pages', $al1);
Both methods are working ok.
However, if I try to create a GridField on a user page, presented in a form, the second method (where the GridField should be populated by an ArrayList) does not work.
$gridField = new GridField('pages1', 'All pages1', Member::get(), $config);
- works OK, but with the method where I create the ArrayList the old-fashioned way:
$al = new ArrayList();
$records = DB::query("SELECT * from Member where id<10");
while ($rec1 = $records->next()) {
$al->push(new ArrayData($rec));
}
I get an error when I try to render gridfield through:
return new Form($this, "AllSubmissions", new FieldList($gridField), new FieldList());
The error I am getting is:
[Warning] Missing argument 1 for ArrayData::__construct() GET /ss340/gridfield-test/gridfield-underr-grid/ Line 27 in C:\wamp\www\ss340\framework\view\ArrayData.php
Since I need data from an external database to populate the GridField on non-admin pages, I am desperate for a solution.
If someone can suggest an alternative way to show/edit tabular data in SilverStripe, I would appreciate it very much.
I just looked up your error. It comes from the gridfield that tries to use this function:
public function getDisplayFields($gridField) {
if(!$this->displayFields) {
return singleton($gridField->getModelClass())->summaryFields();
}
return $this->displayFields;
}
If you are giving an ArrayList with ArrayData in it, it is trying to create a singleton of ArrayData. This causes an error because ArrayData expects an object or an array.
My opinion is still to use my old answer; this would give you a DataList and you won't have to go through the trouble.
Old Answer
Why go through all the trouble and not just make use of SilverStripe's ORM with SearchFilters?
$dbConfig = [
"type" => 'MySQLDatabase',
"server" => 'localhost',
"username" => '',
"password" => '',
"database" => '',
"path" => '',
]; // fill this array with your other database configuration
//connect
DB::connect($dbConfig);
$members = Member::get()->filter('ID:LessThan', 10);
//reset to your own database
global $databaseConfig;
DB::connect($databaseConfig);
One last note; when 'developing' it is recommended to put SilverStripe in 'dev mode'. In the comments you say you are getting a Server Error (500) which indicates your SilverStripe is not in dev mode, or your error_reporting is not enabled. Maybe this could help you doing that.
OK, from the error you posted:
/**
* @var array
* @see ArrayData::_construct()
*/
protected $array;
/**
* @param object|array $value An associative array, or an object with simple properties.
* Converts object properties to keys of an associative array.
*/
public function __construct($value) {
if (is_object($value)) {
$this->array = get_object_vars($value);
} elseif (ArrayLib::is_associative($value)) {
$this->array = $value;
} elseif (is_array($value) && count($value) === 0) {
$this->array = array();
This error looks incomplete, but I noticed your submitted code is:
$al = new ArrayList();
$records = DB::query("SELECT * from Member where id<10");
while ($rec1 = $records->next()) {
$al->push(new ArrayData($rec));
}
But $rec is not defined inside that loop (the loop variable is $rec1), so you are likely not passing a valid constructor argument to new ArrayData().
Try:
$al = new ArrayList();
$records = DB::query("SELECT * from Member where id<10");
while ($rec = $records->next()) {
$al->push(new ArrayData($rec));
}

PHP memory references

I have been wondering about this for a long time: how does PHP handle references, and are they a good idea to use? I can't explain it better than with an example, so let's look at the following class, and in particular at the comment on the setResult method.
Let's imagine we are using a model-view-controller framework and building a basic AjaxController that has only one action method (getUsers) so far. Read the comments, and I hope my question is clear: how does PHP handle this kind of situation, and is what I wrote in the setResult docblock about the array being in memory several times true?
class AjaxController{
private $json = array(
'result' => array(),
'errors' => array(),
'debug' => array()
);
/**
* Adds an error, always displayed to users if any errors.
*
* @param type $description
*/
private function addError($description){
$this->json['errors'][] = $description;
}
/**
* Adds an debug message, these are displayed only with DEBUG_MODE.
*
* @param type $description
*/
private function addDebug($description){
$this->json['debug'][] = $description;
}
/**
* QUESTION: How does this go in memory? Because if I use no references,
* the array would be in memory 3 times; if the array is big (5000+),
* that's pretty much a waste of resources.
*
* 1st time in memory @ model result.
* 2nd time in memory @ setResult ($resultSet variable)
* 3rd time in memory @ $this->json
*
* @param array $resultSet
*/
private function setResult($resultSet){
$this->json['result'] = $resultSet;
}
/**
* Gets all the users
*/
public function _getUsers(){
$users = new Users();
$this->setResult($users->getUsers());
}
public function __construct(){
if(!DEBUG_MODE && count($this->json['debug']) > 0){
unset($this->json['debug']);
}
if(count($this->json['errors']) > 0){
unset($this->json['errors']);
}
echo json_encode($this->json);
}
}
Another simple example: What would be better to use technique A:
function example(){
$latestRequest = $_SESSION['abc']['test']['abc'];
if($latestRequest === null){
$_SESSION['abc']['test']['abc'] = 'test';
}
}
Or technique B:
function example(){
$latestRequest =& $_SESSION['abc']['test']['abc'];
if($latestRequest === null){
$latestRequest = 'test';
}
}
Thanks for reading and advise :)
In short: don't use references.
PHP copies on write. Consider:
$foo = "a large string";
$bar = $foo; // no copy
$zed = $foo; // no copy
$bar .= 'test'; // $foo is duplicated at this point.
// $zed and $foo still point to the same string
You should only use references when you need the functionality that they provide, i.e., you need to modify the original array or scalar via a reference to it.
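The same copy-on-write behaviour applies to arrays and can be observed with memory_get_usage(). This is a rough sketch: the function name measureCopyOnWrite and the array size are made up for illustration, and the exact byte counts depend on the PHP version, so only their relative size is meaningful.

```php
<?php
// Hypothetical helper: reports how much memory a plain assignment costs
// versus the first write to the assigned variable.
function measureCopyOnWrite()
{
    $original = range(1, 100000);

    $start = memory_get_usage();
    $alias = $original;              // assignment only: both names share one array
    $afterAssign = memory_get_usage();

    $alias[] = 'extra';              // first write: PHP now duplicates the array
    $afterWrite = memory_get_usage();

    return array(
        'assignCost'    => $afterAssign - $start,
        'writeCost'     => $afterWrite - $afterAssign,
        'originalCount' => count($original),
        'aliasCount'    => count($alias),
    );
}

$result = measureCopyOnWrite();
print_r($result);
```

The assignment should cost close to nothing while the first write costs roughly the size of the array, and the original stays unchanged. Passing an array to a function by value behaves the same way, so setResult() does not by itself create extra copies.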

Is it possible to use AJAX POST data for the constructor values in a PHP class?

I want to call a PHP class via AJAX to process some form data. Since you can pass values to a class's constructor when you instantiate it in PHP, I wondered if the same thing is possible via AJAX.
I'm currently using the POST method with a separate function in the class to detect the POST values and then process them, but I could save time by pre-loading the values in the constructor if this is possible!
Update: Code example
class myAjaxClass {
private $data;
public function __construct($could, $this, $be, $post, $data) {
$this->data = $data;
...etc...
Via AJAX you can only call a script, e.g. my_script.php, which will look like:
<?php
$myAjaxClass = new MyAjaxClass($_POST['could'], $_POST['this'], $_POST['be'], $_POST['post'], ...);
var_dump($myAjaxClass);
?>
and within the JS AJAX call you have to provide the data for the POST, e.g. with jQuery:
$(document).ready(function(){
$.post(
"my_script.php",
{could: "COULD", this: "THIS", be: "BE", ... },
function(data) {
alert(data); // data must be a string... when object, use data.property, when array, use data['index']
}
);
});
The post values are superglobals so you don't need to pass them to anything. If your ajax request is calling the correct obj all you need do is use $_POST within the methods of that class...
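A middle ground between the two answers is a constructor that takes the request data as a single array. This is only a sketch: the get() accessor and the sample keys are hypothetical, and in my_script.php the argument would be $_POST.

```php
<?php
// Hypothetical handler: the constructor receives the request data as one
// array, so the entry script can pass $_POST and tests can pass any array.
class MyAjaxClass
{
    private $data;

    public function __construct(array $input)
    {
        $this->data = $input;
    }

    // Returns a submitted value, or $default when the key was not sent.
    public function get($name, $default = null)
    {
        return isset($this->data[$name]) ? $this->data[$name] : $default;
    }
}

// In my_script.php this would be: $handler = new MyAjaxClass($_POST);
$handler = new MyAjaxClass(array('could' => 'COULD', 'be' => 'BE'));
echo $handler->get('could'), "\n";
```

Keeping the superglobal at the edge of the script makes the class usable outside a web request, at the cost of one extra line in the entry script.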
In the end I decided to write a base Ajax handler class to prepare and load the POST data etc. I can then extend this with other classes for specific purposes such as 'AjaxLogin' and 'AjaxRegister'.
Here it is:
class Ajax {
protected $data = array();
protected $command; // Used to request a specific method.
protected $count; // Counter for multi-page forms (Really needed?)
/**
* Response is the output array to be serialised using `json_encode()`
* @var Boolean - Used to imply the success or failure of the AJAX function (i.e. form validation)
* @var Array (optional) - An array of output / error messages [ Defined as: 'messages' => ... ]
*/
protected $response = array('valid' => true); // Output to be serialised using 'json_encode()'
public function __construct() {
/* Redirect empty or insufficient POST data with 'Forbidden' response header (overwrite false) */
if( !$_POST OR count($_POST) < 1 ) {
header('location:'.$_SERVER['HTTP_REFERER'], false, '403');
exit();
}
/* Session validation (if 'hash' sent) */
if( isset($_POST['hash']) AND $_POST['hash'] != md5(session_id()) ) {
header('location:'.$_SERVER['HTTP_REFERER'], false, '403');
exit();
}
$this->processRequest($_POST['data']);
}
protected function addMessage($message) {
$this->response['valid'] = false;
$this->response['messages'][] = $message;
}
/**
* Unserialise AJAX data. Accepts data from either:
* - jQuery.serialize() [String]
* - jQuery.serializeArray() [Array of Objects]
*/
private function unserializeData($data) {
// -- from jQuery.serialize()
if( is_string($data) ) {
$array = explode('&', $data);
foreach($array as $key => $value) {
$string = preg_split('/=/', $value);
$this->data[$string[0]] = $string[1];
}
}
// -- from jQuery.serializeArray()
elseif( is_array($data) ) {
$array = (array) $data;
foreach($array as $element) {
$this->data[$element['name']] = $element['value'];
// $this->addMessage($element['name'].' => '.$element['value']);
}
}
else $this->addMessage('Unable to process your request, Please contact our Technical Support!');
}
// TODO: Use strip_tags or something for security??
private function processRequest($data) {
/* Process serialised data in to an Array */
$this->unserializeData($data);
/* Process additional POST data (if present) */
if( isset($_POST['command']) ) $this->command = $_POST['command'];
if( isset($_POST['count']) ) $this->count = $_POST['count'];
// Add additional POST data processing here!!
}
}
Feel free to use, modify, pass judgement etc. as you see fit, I hope this helps someone! ;)

Indirect modification of overloaded property

I'm creating a forum, and I want to keep track of which threads have been updated since the user last visited. So I have an array that I keep in $_SESSION that is basically structured as [$boardid][$threadid] = 1. If the threadid and boardid are set, then the thread has not been read and the board contains unread threads. When a user views a thread, I just unset() the appropriate board and thread id. However, I'm having problems getting unset to work with arrays like this.
Firstly, I have a session class to make handling session data a little nicer
class Session {
private $_namespace;
public function __construct($namespace = '_default') {
$this->_namespace = $namespace;
}
/**
* Erase all variables in the namespace
*/
public function clear() {
unset($_SESSION[$this->_namespace]);
}
public function __set($name, $value) {
$_SESSION[$this->_namespace][$name] = $value;
}
public function __get($name) {
if(isset($_SESSION[$this->_namespace]) && array_key_exists($name, $_SESSION[$this->_namespace])) {
return $_SESSION[$this->_namespace][$name];
}
return null;
}
public function __isset($name) {
return isset($_SESSION[$this->_namespace][$name]);
}
public function __unset($name) {
unset($_SESSION[$this->_namespace][$name]);
}
};
Then I have a CurrentUser class representing the current user. The CurrentUser class has a member named _data which is-a Session object. In the CurrentUser class I override the __get and __set methods to use the _data member.
public function __set($name, $value) {
$this->_data->$name = $value;
}
public function __isset($name) {
return isset($this->_data->$name);
}
public function __get($name) {
if(isset($this->_data->$name)) {
return $this->_data->$name;
}
return null;
}
Now to keep track of which threads have been unread, I fetch all threads whose date is >= the user's last_seen date. I also have methods to remove board and threads from the array.
public function buildUnreadList($since) {
// Build a "new since last visit" list
$forumModel = new Model_Forum();
$newThreads = $forumModel->fetchThreadsSinceDate($since);
foreach($newThreads as $thread) {
$tmp =& $this->unreadThreadsList;
$tmp[$thread['board']][$thread['id']] = 1;
}
}
public function removeThreadFromUnreadList($boardid, $threadid) {
$threads =& $this->unreadThreadsList;
unset($threads[$boardid][$threadid]);
}
public function removeBoardFromUnreadList($boardid) {
$threads =& $this->_data->unreadThreadsList;
unset($threads[$boardid]);
}
This is where I'm running into problems. I'm getting an "Indirect modification of overloaded property Session::$unreadThreadsList has no effect" error on $threads =& $this->_data->unreadThreadsList; How can I either fix this problem or design a better solution? I thought about creating a class that keeps track of the array so I don't have to have an array of arrays of arrays of arrays, but I'm not certain about persisting objects, and creating an object just to manage an array feels really dirty to me.
Sorry if I'm a little bit off base; I'm trying to understand how the variables are being used (as their initialization is not shown). So $this->unreadThreadsList is an array whose indices are the board and thread ids (with the value set to 1). Why not set everything directly?
Looking at what you're doing, here is an idea I had. It does the same thing but just does some extra checking on $this->unreadThreadsList and it accesses the variable directly.
Assuming I figured out the array structure properly, this should work.
public function buildUnreadList($since) {
// Build a "new since last visit" list
$forumModel = new Model_Forum;
$newThreads = $forumModel->fetchThreadsSinceDate($since);
foreach($newThreads as $thread)
{
// Avoid an error if no list pre-exists
if(is_array($this->unreadThreadsList))
if(array_key_exists($thread['board'],$this->unreadThreadsList))
if(array_key_exists($thread['id'],$this->unreadThreadsList[$thread['board']]))
// Skip this result, already in
if($this->unreadThreadsList[$thread['board']][$thread['id']] == 1) continue;
$this->unreadThreadsList[$thread['board']][$thread['id']] = 1;
}
}
This assumes an array structure like:
array(
1 => array(
'board' => 1,
'id' => 2
),
2 => array(
'board' => 3,
'id' => 1
),
3 => array(
'board' => 7,
'id' => 2
));
for the result of "fetchThreadsSinceDate($since)" and an array structure of
array(
1 => array(
2 => 1
),
3 => array(
1 => 1
),
7 => array(
2 => 1
));
for the $this->unreadThreadsList where the first index is the board and the second index is the thread id.
For the other functions why not simply unset them directly as well?
unset($this->unreadThreadsList[$boardid][$threadid]);
unset($this->unreadThreadsList[$boardid]);
Good luck!
Dennis M.
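One caveat: when unreadThreadsList is served by __get, which returns a copy, writing to or unsetting an element of it directly may raise the same "Indirect modification" notice. A pattern that avoids it is to read the whole array, change the local copy, and assign it back through __set. The sketch below is an assumption-laden illustration, not the original poster's code: ArrayBackedSession is a hypothetical stand-in for the Session class (backed by a plain array instead of $_SESSION) and removeThread is a made-up helper name.

```php
<?php
// Hypothetical stand-in for the Session class, backed by a plain array
// instead of $_SESSION so the sketch runs anywhere.
class ArrayBackedSession
{
    private $store = array();

    public function __set($name, $value)
    {
        $this->store[$name] = $value;
    }

    public function __get($name)
    {
        return isset($this->store[$name]) ? $this->store[$name] : null;
    }

    public function __isset($name)
    {
        return isset($this->store[$name]);
    }

    public function __unset($name)
    {
        unset($this->store[$name]);
    }
}

// Read through __get, modify the local copy, write back through __set.
function removeThread(ArrayBackedSession $session, $boardId, $threadId)
{
    $list = $session->unreadThreadsList;      // local copy (null if never set)
    if (!isset($list[$boardId][$threadId])) {
        return;                               // nothing to remove
    }
    unset($list[$boardId][$threadId]);
    if (count($list[$boardId]) === 0) {
        unset($list[$boardId]);               // board has no unread threads left
    }
    $session->unreadThreadsList = $list;      // one assignment, handled by __set
}

$session = new ArrayBackedSession();
$session->unreadThreadsList = array(1 => array(2 => 1, 3 => 1), 7 => array(2 => 1));
removeThread($session, 1, 2);
removeThread($session, 7, 2);
print_r($session->unreadThreadsList);
```

Declaring __get to return by reference is another option, but it changes how every property of the class behaves, so the read-modify-write form is shown here as the less invasive choice.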
