Neo4J, how to query hierarchical data / PHP

I created a category structure in the graph database Neo4j.
I have nodes and relationships, and everything works perfectly.
I am using Neoxygen NeoClient for PHP to access the data. How can I query the whole category graph efficiently (including its structure) starting from my root element?
MATCH (a:`Category`{category_id:0})-[r:HAS_CHILD*]->(b:`Category`)
RETURN b,r
My desired structure in PHP is:
- Root
--- Category A
-------- Subcategory AB
--- Category B
--- Category C
-------- Subcategory CA
----------------- Subsubcategory CAA
...
Any ideas?
Thanks in advance.
mr_g

This is entirely feasible and straightforward with Neoxygen's NeoClient.
The first thing to make sure of is that you activate the response formatter:
$client = ClientBuilder::create()
->setAutoFormatResponse(true)
->addConnection(xxx...)
->build();
Secondly, I would definitely set a depth limit in your query to avoid excessive memory use, which depends on how connected your graph is:
MATCH (a:`Category`{category_id:0})-[r:HAS_CHILD*..20]->(b:`Category`)
RETURN b,r
Then you can send it with the client, which will remap the results into a graph structure:
$query = 'MATCH (a:`Category`{category_id:{id}})-[r:HAS_CHILD*..20]->(b:`Category`)
RETURN b,r';
$children = $client->sendCypherQuery($query, ['id'=>0])->getResult()->getNodes();
Now each node knows its relationships, and each relationship knows its start and end nodes. For example, $children holds the returned nodes, so the nodes one level below them can be collected like this:
$nodes = [];
foreach ($children as $child) {
    // getNodes() returns a collection, so ask each node for its outgoing relationships
    foreach ($child->getOutboundRelationships() as $rel) {
        $nodes[] = $rel->getEndNode();
    }
}
$nodes now holds the nodes one level deeper.
There is currently no method to get the connected nodes directly from a node object without first getting the relationship; that may be something I can add to the client.

Cypher returns tabular data so if you want to get a tree hierarchy the most efficient way is to return all of the paths from the root to the leaves. A path is a collection/array of node-(relationship-node)* (that is it's an odd number of objects, always containing a node at each end with alternating nodes and relationships). Here is the cypher for how you would do that:
MATCH path = (a:`Category`{category_id:0})-[:HAS_CHILD*]->(b:`Category`)
WHERE NOT (b)-[:HAS_CHILD]->()
RETURN path
The WHERE clause ensures that you return only the paths that end at a leaf. You could return all categories in the tree, which would give you the partial paths too, but every partial path is contained in some longer path, so you would just end up returning more data than you need.
Once you have the paths (I'm not sure what form they show up in Neoclient as I'm not a PHP guy) you can build a hierarchical data structure in memory from the results. If I recall correctly the map/dictionary-type structure in PHP is an associative array.
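As a minimal sketch of that last step, assume each returned path has already been reduced to an ordered list of category names from the root to a leaf. How NeoClient exposes paths is not shown here, and the names buildTree and renderTree are made up for illustration. Under those assumptions the nested associative array could be built like this:

```php
<?php
// Hypothetical input: one array per root-to-leaf path, each entry a category name.
$paths = [
    ['Root', 'Category A', 'Subcategory AB'],
    ['Root', 'Category B'],
    ['Root', 'Category C', 'Subcategory CA', 'Subsubcategory CAA'],
];

// Merge all paths into one nested array: name => array of children.
function buildTree(array $paths): array
{
    $tree = [];
    foreach ($paths as $path) {
        $level = &$tree;               // start again at the top for every path
        foreach ($path as $name) {
            if (!isset($level[$name])) {
                $level[$name] = [];    // first time this category is seen at this level
            }
            $level = &$level[$name];   // descend one level
        }
        unset($level);                 // drop the reference before the next path
    }
    return $tree;
}

// Render the nested array as an indented outline.
function renderTree(array $tree, int $depth = 0): string
{
    $out = '';
    foreach ($tree as $name => $children) {
        $out .= str_repeat('  ', $depth) . $name . "\n";
        $out .= renderTree($children, $depth + 1);
    }
    return $out;
}

$tree = buildTree($paths);
echo renderTree($tree);
```

Because shared prefixes are merged, it does not matter that every path repeats the root. If category names are not unique among siblings, keying on category_id instead of the name would be the safer choice.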

Schema:
Indexes
ON :Category(category_id) ONLINE (for uniqueness constraint)
Constraints
ON (category:Category) ASSERT category.category_id IS UNIQUE
Query:
MATCH(c:Category) RETURN c

Related

Counting the number of nodes under a node using hierarchies modelled in neo4j

I have an adjacency list (parent/child relationship) modelled in neo4j, but counting the number of nodes under a parent gives the wrong count.
Here is the cypher query being used
MATCH (me:Members)-[:IS_PARENT_OF*]->(child)
WHERE me.membershipID = {membershipID}
RETURN count(child)

To do this, the best way is to make a graph traversal.
Thanks to APOC (https://neo4j-contrib.github.io/neo4j-apoc-procedure), it's directly possible in cypher.
This query should give you the expected result :
MATCH (me:Members { membershipID:$membershipID}) WITH me
CALL apoc.path.subgraphAll(me, {relationshipFilter:'IS_PARENT_OF>', uniqueness: 'NODE_GLOBAL'}) YIELD nodes
RETURN size(nodes)

Logisima's answer is a great answer and an efficient way to traverse the subgraph. The reason you are getting what I presume is too many nodes in your result, though, is that you are likely double counting some of the children returned in the result.
If you add a DISTINCT, I think you will get the number you expect.
MATCH (me:Members)-[:IS_PARENT_OF*]->(child)
WHERE me.membershipID = {membershipID}
RETURN count(DISTINCT child)
You could identify children that are being double counted by doing something like this
MATCH (me:Members)-[:IS_PARENT_OF*]->(child)
WHERE me.membershipID = {membershipID}
RETURN child, count(*)
Do you have any extraneous IS_PARENT_OF relationships in the subgraph in question?
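To illustrate why a variable-length match can over-count, here is a small plain-PHP sketch that does not touch Neo4j. The adjacency list is invented for the example, and the function pathEnds is a hypothetical helper. It assumes the hierarchy has no cycles. A node with two parents is reached by two different paths, so counting path ends gives a larger number than counting distinct nodes:

```php
<?php
// Hypothetical parent => children adjacency list; node 4 has two parents (2 and 3).
$children = [
    1 => [2, 3],
    2 => [4],
    3 => [4],
    4 => [],
];

// Returns the end node of every path that starts at $node, one entry per path.
function pathEnds(array $children, int $node): array
{
    $ends = [];
    foreach ($children[$node] as $child) {
        $ends[] = $child;                                        // the one-step path
        $ends = array_merge($ends, pathEnds($children, $child)); // longer paths through $child
    }
    return $ends;
}

$ends = pathEnds($children, 1);
$pathCount = count($ends);                   // analogous to count(child)
$distinctCount = count(array_unique($ends)); // analogous to count(DISTINCT child)

echo "paths: $pathCount, distinct: $distinctCount\n"; // prints "paths: 4, distinct: 3"
```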

Google Cloud Datastore: Working with Keys

I haven't found any documentation on this, although it must exist somewhere, being as it's rather simple.
I can query using PHP for all of the tasklists (for example) as follows:
$query = $datastore->query();
$query->kind('tasklist')
->filter('date_approved', '<', 0)
->order("date_approved")
->order("date_updated", $query::ORDER_DESCENDING)
->limit(50);
$res = $datastore->runQuery($query);
And to see the key (for example, for updates), I've been using:
foreach($res as $r) {
$parentkey = $r->key()->pathEnd()['name'];
echo $parentkey; //"default"
}
I noticed that if I "JOIN" child records that were created as follows:
$childkey = $datastore->key('tasklist', $parentkey)
->pathElement('task', 'task001');
$entity = $datastore->entity($childkey, $myTaskArray);
$datastore->upsert($entity);
When I later query for those by "parent" key:
$subquery = $datastore->query();
$subquery->kind('task')
->filter('date_approved','<',0)
->hasAncestor( $datastore->key('tasklist', $parentkey) )
->order("date_approved")
->order("date_updated", $subquery::ORDER_DESCENDING);
$subres = $datastore->runQuery($subquery);
Then printing the key for the child will work the same:
foreach($subres as $sr){
$childkey = $sr->key()->pathEnd()['name'];
echo $childkey; //"task001"
}
Is there a method for working with keys and keys of ancestors that's less goofball than: $entity->key()->pathEnd()['name'];
For example, in MongoDB
$myobj = array();
$db->Insert($myobj);
echo (string) $myobj['_id']; //key
Also, shouldn't I be able to update a document by providing the key alone, and not having to specify the ancestor key?
$childkey = $datastore->key('tasklist', $parentkey)
->pathElement('task', "task001");
$entity = $datastore->lookup($childkey);
$entity = $datastore->entity($childkey, $myUpdatedTaskArray);
$datastore->update($entity, array("allowOverwrite"=>true));
versus:
$childkey = $datastore->key('task', "task001");
$entity = $datastore->lookup($childkey);
$entity = $datastore->entity($childkey, $myUpdatedTaskArray);
$datastore->update($entity, array("allowOverwrite"=>true));
Lastly, can I query for entities AND their descendants without having to do a join (as I'm doing above), while still filtering (date_approved<0, for example) and sorting (date_updated DESC, for example)?
NOTE: goofball being a non-technical term

Is there a method for working with keys and keys of ancestors that's less goofball than: $entity->key()->pathEnd()['name'];
Keys in Datastore are a rather complex concept, so they cannot be used in quite the way you describe from your work with Mongo. However, there are some helpers on the Google\Cloud\Datastore\Key class that would simplify your code a little. You could use pathEndIdentifier in place of pathEnd()['name'], for instance $key->pathEndIdentifier(). This is especially useful in cases where you may not know whether the key uses an ID or a name.
Also, shouldn't i be able to update a document by providing the key alone, and not having to specify the ancestor key?
Unfortunately not. A key of form [Parent: john, Child: junior] refers to an entirely different entity than a key of form [Child: junior]. To use parent entities, you must supply the full key path. If however you can think of ways to make this easier on you, please let me know, preferably by filing an issue. I'd love to figure out how to make this easier -- I know it is a bit complex currently.
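The point about full key paths can be sketched in plain PHP without the Datastore library. Here a key is modelled as an ordered list of [kind, name] pairs, and pathEndName is a hypothetical helper that mimics what pathEnd()['name'] reads. This is only a model of the idea, not the client's actual data structure:

```php
<?php
// Plain-PHP model of two key paths that end in the same [kind, name] pair.
$childWithParent = [['tasklist', 'default'], ['task', 'task001']];
$childAlone      = [['task', 'task001']];

// Returns the name of the last path element.
function pathEndName(array $path): string
{
    $last = end($path);
    return $last[1];
}

// Both paths end in "task001" ...
$sameEndName = pathEndName($childWithParent) === pathEndName($childAlone);

// ... but as whole paths they differ, so they identify different entities.
$sameKey = $childWithParent === $childAlone;

var_dump($sameEndName, $sameKey); // bool(true), bool(false)
```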
Lastly, can i query for entities AND their descendants without having to do a join (as i'm doing above), while still filtering (date_approved<0 for example) and sorting (date_updated DESC also for example).
Unfortunately not. You can query for either one kind or none (i.e. a kindless query). The latter can return entities of multiple kinds, but you cannot filter on entity properties or values.

Assign array value to a field

I am working on a migration and I am migrating the taxonomy terms that the document has been tagged with. The terms in the document are separated by commas. So far I have managed to separate each term and place it into an array like so:
public function prepareRow($row) {
$terms = explode(",", $row->np_tax_terms);
foreach ($terms as $key => $value) {
$terms[$key] = trim($value);
}
var_dump($terms);
exit;
}
This gives me the following result when I dump it in the terminal:
array(2) {
[0]=>
string(7) "Smoking"
[1]=>
string(23) "Not Smoking"
}
Now I have two fields field_one and field_two and I want to place the value 0 of the array into field_one and value 1 into field_two
e.g
field_one=[0]$terms;
I know this isn't correct and I'm not sure how to do this part. Any suggestions on how to do this please?

If you are only looking to store the string value of the taxonomy term into a different field of a node, then the following code should do the trick:
$node->field_one['und'][0]['value'] = $terms[0];
$node->field_two['und'][0]['value'] = $terms[1];
node_save($node);
Note that you will need to load the node first. If you need help with that, comment here and I will update my answer.

You are asking specifically about ArrayList and HashMap, but I think to fully understand what is going on you have to understand the Collections framework. So an ArrayList implements the List interface and a HashMap implements the Map interface.
List:
An ordered collection (also known as a sequence). The user of this interface has precise control over where in the list each element is inserted. The user can access elements by their integer index (position in the list), and search for elements in the list.
Map:
An object that maps keys to values. A map cannot contain duplicate keys; each key can map to at most one value.
So as other answers have discussed, the List interface (ArrayList) is an ordered collection of objects that you access using an index, much like an array (in the case of ArrayList, as the name suggests, it is just an array in the background, but a lot of the details of dealing with the array are handled for you). You would use an ArrayList when you want to keep things in a defined order (the order in which they are added, or the position within the list that you specify when you add the object).
A Map, on the other hand, takes one object and uses it as a key (index) to another object (the value). So let's say you have objects with unique IDs, and you know you are going to want to access these objects by ID at some point: the Map will make this very easy for you (and quicker and more efficient). The HashMap implementation uses the hash value of the key object to locate where it is stored, so there is no longer any guarantee about the order of the values.

You might like to try:
list($field_one, $field_two) = prepareRow($row);
The list construct assigns the entries of an array (in order) to the listed variables.
This is a little fragile, but it should work as long as prepareRow returns the $terms array and you know it will contain at least two items.
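A slightly more defensive sketch of the same idea, in plain PHP: the helper splitTerms is hypothetical and stands in for a prepareRow that returns its terms, and array_pad fills in null when fewer than two terms are present so that list() never reads a missing index.

```php
<?php
// Hypothetical helper: split a comma-separated string and trim each term.
function splitTerms(string $raw): array
{
    return array_map('trim', explode(',', $raw));
}

// array_pad guarantees at least two entries, padding with null if needed.
list($field_one, $field_two) = array_pad(splitTerms('Smoking, Not Smoking'), 2, null);

// With a single term, the second variable is simply null.
list($only_one, $missing) = array_pad(splitTerms('Smoking'), 2, null);

var_dump($field_one, $field_two, $only_one, $missing);
```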

Search by term giving empty result using elastica

I am using Elastica, the Elasticsearch PHP client. There are many fields, but I want to search only in the "screen_name" field. What do I have to do for that? I tried a term filter without success; the result is an empty array.
Here is the code.
// Load index (database)
$elasticaIndex = $elasticaClient->getIndex('twitter_json');
//Load type (table)
$elasticaType = $elasticaIndex->getType('tweet_json');
$elasticaFilterScreenName = new \Elastica\Filter\Term();
$elasticaFilterScreenName->setTerm('screen_name', 'sohail');
//Search on the index.
$elasticaResultSet = $elasticaIndex->search($elasticaFilterScreenName);
var_dump($elasticaResultSet); exit;
$elasticaResults = $elasticaResultSet->getResults();
$totalResults = $elasticaResultSet->getTotalHits();

It's hard to say without knowing your mapping, but there is a good chance the document property "screen_name" does not contain the term "sohail".
Try using a Match query or a Query String query.
"Term" has special meaning in ElasticSearch. A Term is the base, atomic unit of an index. Terms are generated after your text is run through an analyzer. If "screen_name" has an analyzer associated with the index, "sohail" is being modified in some capacity before being saved into the index.

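The effect of analysis on term lookups described in the answer above can be sketched with a toy model in plain PHP. The analyze function below is only a stand-in that lowercases and splits on non-alphanumeric characters; it is not how any real Elasticsearch analyzer is implemented, and both function names are invented for the example:

```php
<?php
// Toy stand-in for an analyzer: lowercase the text, then split on non-alphanumeric characters.
function analyze(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// A term lookup compares the search value with the stored terms exactly, without analyzing it.
function termMatches(array $indexedTerms, string $term): bool
{
    return in_array($term, $indexedTerms, true);
}

$indexed = analyze('Sohail Khan');   // what ends up in the index

$exactOriginal = termMatches($indexed, 'Sohail Khan'); // false: the whole string is not a term
$capitalised   = termMatches($indexed, 'Sohail');      // false: the stored term is lowercase
$lowercased    = termMatches($indexed, 'sohail');      // true: matches a stored term exactly

var_dump($indexed, $exactOriginal, $capitalised, $lowercased);
```

In this model a term lookup only succeeds when the search value is identical to one of the stored terms, which is why checking the mapping of the field is the first step.
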
PHP node not passing by reference

I have a bunch of dom manipulation functions within a class.
One of those functions assigns unique ids to specific nodes.
$resource_info_node->setAttribute('id', 'resource_'.$this->ids);
$details['id'] = 'resource_'.$this->ids;
$details['node'] = $resource_info_node;
$this->resource_nodes['resource_'.$this->ids] = $details;
$this->ids += 1;
Later I want to look up and modify those nodes.
I have tried :
$current_node = $this->resource_nodes[$id]['node'];
When I print_r() I find that this node is a duplicate of the original node.
It has the original node's attributes but is not a part of the DOM tree.
I get the same results with :
$this->content->getElementById($id);
I suppose I based this whole thing on storing node references in an array, which I thought was a fine thing to do. Even if it is not, using getElementById() afterwards should have returned the node within the DOM.
I thought that in PHP all objects were passed by reference, including DOM nodes.
Any ideas on how I can test what is actually going on?
EDIT :
Well I used :
$this->xpath->query('//*[@id]');
That returned the right number of items with ids. The node is just not in the DOM tree when I edit it.
and
$current_node = &$this->resource_nodes[$id]['node'];
Using the reference syntax had no effect.
The strangest part is that getElementById() is not returning a node in the DOM. It has all the right attributes but no parentNode.
FIX - not answer :
I just used xpath instead of my reference or getElementById().

Use the reference explicitly:
$current_node = &$this->resource_nodes[$id]['node'];
and then modify $current_node.
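One way to test what is actually going on is a small standalone script. The sketch below uses a made-up two-element document rather than the class from the question, and it assumes the DOM extension is available. It stores a node in an array the same way, reads it back without the & operator, and checks whether the result is the same object and still attached to the tree:

```php
<?php
// Build a tiny document and grab one of its elements.
$doc = new DOMDocument();
$doc->loadXML('<root><item/></root>');
$item = $doc->documentElement->firstChild;
$item->setAttribute('id', 'resource_0');

// Store the node in an array, mirroring the structure from the question.
$nodes = ['resource_0' => ['id' => 'resource_0', 'node' => $item]];

// Read it back with a plain assignment and modify it.
$current = $nodes['resource_0']['node'];
$current->setAttribute('class', 'changed');

$sameObject = ($current === $item);           // identity check on the object
$inTree     = ($current->parentNode !== null); // still attached to the document?
$xml        = $doc->saveXML($doc->documentElement);

var_dump($sameObject, $inTree);
echo $xml, "\n";
```

If the same checks fail inside the real class, a likely place to look is wherever the document is reloaded, cloned, or imported between storing the node and looking it up.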
