I have some JSON of which the following is a small sample:
{
"results": {
"div": [
{
"class": "sylEntry",
"div": [
{
"class": "sT",
"id": "sOT",
"p": "Mon 11/17, Computer work time"
},
{
"class": "des",
"id": "dOne",
"p": "All classes Siebel 0218"
}
],
"id": "sylOne"
}
]
}
}
I would like to retrieve only the "p" content for the div element with class "sT". I would like to use a loop, but doing something like this:
var arrayOfResults = $.results..div.p
does not work because I only want to retrieve the p value for the div element with class "sT".
So how do I construct my JSONPath so that it retrieves the array of p elements contained within the divs with class "sT"?
Thanks!!
Concepts
JSONPath apparently has a filter syntax that allows you to insert arbitrary JavaScript into an expression for the purposes of matching or filtering. It also uses @ as a shortcut for the current node. Their example of combining these two things looks like this:
$..book[?(@.price<10)] // filter all books cheaper than 10
So this is probably what you want to use here.
Solution
To test the query I had in mind, I modified the jsonpath-test-js.html file in JSONPath's repo to test your data. You can copy-paste my sample to an HTML file and just load it in a browser.
Their test suite has an array of objects with fields called o and p. o contains the original data to operate on while p contains an array of JSONPath expressions to apply to o. It loops over all these pairs and applies all the ps to their respective os, printing out the result. Not as handy as a simple REPL, but it'll do.
Here's what I came up with:
<html>
<head>
<title> JSONPath - Tests (js)</title>
<script type="text/javascript" src="http://www.json.org/json.js"></script>
<script type="text/javascript"
src="http://jsonpath.googlecode.com/svn/trunk/src/js/jsonpath.js">
</script>
</head>
<body>
<pre>
<script type="text/javascript">
var out = "", tests =
[ { "o": { "results" : { "div" : [ { "clazz": "sylEntry",
"id": "sylOne", "div": [ { "clazz": "sT", "id": "sOT",
"p": "Mon 11/17, Computer work time" }, { "clazz": "des",
"id": "dOne", "p": "All classes Siebel 0218" } ] } ] } },
"p": ["$.results..div[?(@.clazz=='sT')].p", // my suggested expression
"$.results..div[*].p"]}, // your question's expression
];
function evaluate($, p) {
var res = eval(p);
return res != null ? res.toJSONString() : null;
}
for (var i=0; i<tests.length; i++) {
var pathes;
for (var j=0; j<tests[i].p.length; j++) {
pre = ">";
if (pathes = jsonPath(tests[i].o, tests[i].p[j], {resultType: "PATH"}))
for (var k=0; k<pathes.length; k++) {
out += pre + " " + pathes[k] +
" = " + evaluate(tests[i].o, pathes[k]) + "\n";
pre = " ";
}
}
out += "<hr/>";
}
document.write(out);
</script>
</pre>
</body>
</html>
Note that this will first print the results of my query expression and then print the results of yours, so we can compare what they produce.
Here's the output it produces:
> $['results']['div'][0]['div'][0]['p'] = "Mon 11/17, Computer work time"
> $['results']['div'][0]['div'][0]['p'] = "Mon 11/17, Computer work time"
$['results']['div'][0]['div'][1]['p'] = "All classes Siebel 0218"
So the correct operator in the filter expression is ==, meaning the correct expression for you is:
$.results..div[?(@.class=='sT')].p
However, I discovered one unfortunate issue (at least in the JavaScript implementation of JSONPath): using the word 'class' in the above query results in this:
SyntaxError: jsonPath: Parse error: _v.class=='sT'
My only guess is that there's an eval being called somewhere to actually evaluate the JSONPath expression. class is a reserved word in JavaScript, so it's causing issues. Let's try using the alternate bracket syntax for @.class:
$.results..div[?(@['class']=='sT')].p
Results:
> $['results']['div'][0]['div'][0]['p'] = "Mon 11/17, Computer work time"
> $['results']['div'][0]['div'][0]['p'] = "Mon 11/17, Computer work time"
$['results']['div'][0]['div'][1]['p'] = "All classes Siebel 0218"
So use the above expression and you should be good to go! The filter feature looks powerful, so it'll probably be well worth exploring its capabilities!
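As a cross-check, here is a minimal sketch of what that filter selects, written in plain JavaScript with no JSONPath library. The names selectP and sampleData are made up for illustration. The sketch walks the whole structure, looks at every "div" array it meets, and keeps the p of each entry whose class matches.

```javascript
// Hypothetical helper: collect the "p" of every entry of any "div" array
// whose "class" property equals wantedClass.
function selectP(node, wantedClass, found = []) {
  if (node === null || typeof node !== "object") return found;
  for (const [key, value] of Object.entries(node)) {
    if (key === "div" && Array.isArray(value)) {
      for (const child of value) {
        const isObject = child !== null && typeof child === "object";
        if (isObject && child["class"] === wantedClass && "p" in child) {
          found.push(child.p);
        }
      }
    }
    // Descend into nested objects and arrays (array indices arrive as keys).
    selectP(value, wantedClass, found);
  }
  return found;
}

// The sample data from the question.
const sampleData = {
  results: {
    div: [
      {
        class: "sylEntry",
        div: [
          { class: "sT", id: "sOT", p: "Mon 11/17, Computer work time" },
          { class: "des", id: "dOne", p: "All classes Siebel 0218" },
        ],
        id: "sylOne",
      },
    ],
  },
};

console.log(selectP(sampleData, "sT"));
```

Because the property is read with bracket notation, the reserved word class causes no trouble here.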
Instead of using a hard-to-grasp, non-standard query style, you could use DefiantJS (http://defiantjs.com), which extends the global object JSON with the method "search", with which you can query JSON structures using standardised XPath queries. This method returns the matches in an array (an empty array if no matches were found).
Here is a working JSFiddle of the code below:
http://jsfiddle.net/hbi99/sy2bb/
var data = {
"results": {
"div": {
"class": "sylEntry",
"id": "sylOne",
"div": [
{
"class": "sT",
"id": "sOT",
"p": "Mon 11/17, Computer work time"
},
{
"class": "des",
"id": "dOne",
"p": "All classes Siebel 0218"
}
]
}
}
},
res = JSON.search( data, '//div[class="sT"]/p' );
console.log( res[0] );
// Mon 11/17, Computer work time
To get an idea of XPath and how it works, check out this XPath Evaluator tool:
http://defiantjs.com/#xpath_evaluator
try this
JsonPath.with(jsonResponse).param("name", "getName").get("findAll { a -> a.name == name }")
Related
I'm trying to scrape information directly from the Maersk website.
For example, I'm trying to scrape the information from this URL: https://www.maersk.com/tracking/221242675
I have a lot of tracking numbers to update in the database every day, so I decided to automate it a little.
But with the following code, the site says it needs JS to work. I have even tried curl, etc.
But nothing works. Does anyone know another way?
I tried the following code:
<?php
// ------------ test 14 ------------
$html = file_get_contents('https://www.maersk.com/tracking/#tracking/221242675'); //get the html returned from the following url
echo $html;
$ETAupdate = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
$ETAupdate->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$ETA_xpath = new DOMXPath($ETAupdate);
//get all the <strong> elements
$ETA_row = $ETA_xpath->query('//strong');
if($ETA_row->length > 0){
foreach($ETA_row as $row){
echo $row->nodeValue . "<br/>";
}
}
}
?>
You need to scrape the data from their API requests rather than from the page URL (unless you're using something like Puppeteer, but I really don't recommend that for this simple task).
I took a look at the site and the API endpoint is:
https://api.maersk.com/track/221242675?operator=MAEU
This will return a JSON-formatted response which you can parse and use to extract the details. It'll also give you a much easier method to access the data rather than parsing the HTML. Example below.
{
"tpdoc_num": "221242675",
"isContainerSearch": false,
"origin": {
"terminal": "YanTian Intl. Container Terminal",
"geo_site": "1PVA2R05ZGGHQ",
"city": "Yantian",
"state": "Guangdong",
"country": "China",
"country_code": "CN",
"geoid_city": "0L3DBFFJ3KZ9A",
"site_type": "TERMINAL"
},
"destination": {
"terminal": "DCT Gdansk sa",
"geo_site": "02RB4MMG6P32M",
"city": "Gdansk",
"state": "",
"country": "Poland",
"country_code": "PL",
"geoid_city": "3RIGHAIZMGKN3",
"site_type": "TERMINAL"
},
"containers": [ ... ]
}
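A minimal sketch of pulling a few fields out of a response shaped like the sample above, written in JavaScript. summarizeTracking and sampleResponse are hypothetical names, and the trimmed sample only contains fields shown in the example. The layout of the entries inside containers is not shown there, so the sketch only counts them. Whether the endpoint needs extra request headers is an open question here.

```javascript
// Hypothetical helper: summarise a tracking response shaped like the sample above.
function summarizeTracking(jsonText) {
  const data = JSON.parse(jsonText);
  // Join the non-empty parts of an origin/destination record.
  const place = (p) => [p.terminal, p.city, p.country].filter(Boolean).join(", ");
  return {
    number: data.tpdoc_num,
    origin: place(data.origin),
    destination: place(data.destination),
    containerCount: Array.isArray(data.containers) ? data.containers.length : 0,
  };
}

// Trimmed copy of the sample response (containers left empty on purpose).
const sampleResponse = JSON.stringify({
  tpdoc_num: "221242675",
  origin: { terminal: "YanTian Intl. Container Terminal", city: "Yantian", country: "China" },
  destination: { terminal: "DCT Gdansk sa", city: "Gdansk", country: "Poland" },
  containers: [],
});

console.log(summarizeTracking(sampleResponse));
```

In a real script the text would come from the HTTP response body of the endpoint above instead of sampleResponse.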
Hello I need some help to find my XML elements with PHP and xpath.
This is a part of my xml:
"processen": {
"proces": [
{
"@attributes": {
"id": "B1221"
},
"velden": {
"kernomschrijving": "activiteit aanleggen alarminstallatie",
"model-kernomschrijving": "aanleggen alarminstallatie",
"naam": "Het beoordelen van een alarminstallatie",
"standaard-dossiernaam": {
"@attributes": {
"ref": "SCN0000029"
}
},
"[tag:taakveld]": "Bouwzaken & Procedures",
"proceseigenaar": "Bouwzaken",
"toelichting-proces": "bla die bla.",
"aanleiding": "Dit werkproces wordt intern getriggerd.",
"opmerking-proces": {
"@attributes": {
"ref": "SCN0000036"
}
},
"exportprofiel": {
"@attributes": {
"ref": "SCN0000037"
},
},...
For example I want to be able to find the id (fast) and access all the elements under the id B1221
I tried this in all kinds of variants but none of them work:
$xml = simplexml_load_file( $filename );
$proces = $xml->xpath("//processen/proces/@attributes/id=B1221");
$proces = $xml->xpath("//processen/proces[@attributes/id=B1221]");
It always returns an empty array...
Thanks for your help.
What you've shown there is not the XML; it is some kind of representation of a PHP object which has been produced by parsing the XML, and through which you can access the content of the XML.
That may sound like a pedantic distinction, but it's actually key to your problem: XPath expressions aren't specific to PHP, and so aren't searching through this structure; they are a standard language for searching through the XML itself.
So to construct the correct XPath expression, you need to look only at the actual XML. From the representation you show, I'm guessing it looks, in part, something like this:
<processen>
<proces id="B1816">
<velden>
<kernomschrijving>activiteit aanleggen alarminstallatie</kernomschrijving>
<model-kernomschrijving>aanleggen alarminstallatie</model-kernomschrijving>
<naam>Het beoordelen van een alarminstallatie</naam>
</velden>
</proces>
</processen>
In XPath, you access elements (tags) by name, attributes (like id="...") with a leading @, and literal strings in double-quotes. The [...] operator means something like "has", so [@foo="bar"] means "has an attribute foo whose value is the string bar".
Which gives you this:
$xml = simplexml_load_file( $filename );
$proces = $xml->xpath('//processen/proces[@id="B1816"]');
echo $proces[0]->asXML();
(Here's a live demo of that example.)
It looks like you may also have namespaces in there (tags with : in the name); those require some extra tricks discussed in this reference question.
I am trying to retrieve information from a JSON file using JSONReader (I've installed the JSONReader extension in my PHP setup). From the simple JSON file below, I want to get the whole array entry containing the homeTeam "Lawrence Library", and I am struggling with this.
To be honest, I don't know how to use JSONReader properly.
This is my code:
<?php
$reader = new JSONReader();
$reader->open('http://www.example.com/news.json');
while ($reader->read()) {
switch($reader->tokenType) {
case JSONReader::ARRAY_START:
echo "Array start:\n";
break;
case JSONReader::ARRAY_END:
echo "Array end.\n";
break;
case JSONReader::VALUE:
echo " - " . $reader->value . "\n";
break;
}
}
$reader->close();
?>
It is just printing array start and array end, but does not print the value.
JSON code:
{
"markers": [
{
"homeTeam": "Lawrence Library",
"awayTeam": "LUGip",
"markerImage": "images/red.png",
"information": "Linux users group meets second Wednesday of each month.",
"fixture": "Wednesday 7pm",
"capacity": "",
"previousScore": ""
},
{
"homeTeam": "Hamilton Library",
"awayTeam": "LUGip HW SIG",
"markerImage": "images/white.png",
"information": "Linux users can meet the first Tuesday of the month to work out harward and configuration issues.",
"fixture": "Tuesday 7pm",
"capacity": "",
"tv": ""
},
{
"homeTeam": "Applebees",
"awayTeam": "After LUPip Mtg Spot",
"markerImage": "images/newcastle.png",
"information": "Some of us go there after the main LUGip meeting, drink brews, and talk.",
"fixture": "Wednesday whenever",
"capacity": "2 to 4 pints",
"tv": ""
}
] }
Link to the JSONReader documentation: https://github.com/shevron/ext-jsonreader
Btw, I am trying to parse big JSON files so please do not suggest to use json_decode or curl methods.
I am using PHP on shared server to access external site via API that is returning JSON containing 2 levels of data (Level 1: Performer & Level 2: Category array inside performer). I want to convert this to multidimensional associative array WITHOUT USING json_decode function (it uses too much memory for this usage!!!)
Example of JSON data:
[
{
"performerId": 99999,
"name": " Any performer name",
"category": {
"categoryId": 99,
"name": "Some category name",
"eventType": "Category Event"
},
"eventType": "Performer Event",
"url": "http://www.novalidsite.com/something/performerspage.html",
"priority": 0
},
{
"performerId": 88888,
"name": " Second performer name",
"category": {
"categoryId": 88,
"name": "Second Category name",
"eventType": "Category Event 2"
},
"eventType": "Performer Event 2",
"url": "http://www.novalidsite.com/somethingelse/performerspage2.html",
"priority": 7
}
]
I have tried to use substr and strip the "[" and "]".
Then performed the call:
preg_match_all('/\{([^}]+)\}/', $input, $matches);
This gives me the string for each row BUT truncates after the trailing "}" of the category data.
How can I return the FULL ROW of data AS AN ARRAY using something like preg_split, preg_match_all, etc. INSTEAD of the heavy handed calls like json_decode on the overall JSON string?
Once I have the array with each row identified correctly, I CAN THEN perform json_decode on that string without overtaxing the memory on the shared server.
For those wanting more detail about json_decode usage causing error:
$aryPerformersfile[ ] = file_get_contents('https://subdomain.domain.com/dir/getresults?id=1234');
$aryPerformers = $aryPerformersfile[0];
unset($aryPerformersfile);
$mytmpvar = json_decode($aryPerformers);
print_r($mytmpvar);
exit;
If you have a limited amount of memory, you could read the data as a stream and parse the JSON one piece at a time, instead of parsing everything at once.
getresults.json:
[
{
"performerId": 99999,
"name": " Any performer name",
"category": {
"categoryId": 99,
"name": "Some category name",
"eventType": "Category Event"
},
"eventType": "Performer Event",
"url": "http://www.novalidsite.com/something/performerspage.html",
"priority": 0
},
{
"performerId": 88888,
"name": " Second performer name",
"category": {
"categoryId": 88,
"name": "Second Category name",
"eventType": "Category Event 2"
},
"eventType": "Performer Event 2",
"url": "http://www.novalidsite.com/somethingelse/performerspage2.html",
"priority": 7
}
]
PHP:
$stream = fopen('getresults.json', 'rb');
// Read one character at a time from $stream until
// $count number of $char characters is read
function readUpTo($stream, $char, $count)
{
$str = '';
$foundCount = 0;
while (!feof($stream)) {
$readChar = stream_get_contents($stream, 1);
$str .= $readChar;
if ($readChar == $char && ++$foundCount == $count)
return $str;
}
return false;
}
// Read one JSON performer object
function readOneJsonPerformer($stream)
{
if ($json = readUpTo($stream, '{', 1))
return '{' . readUpTo($stream, '}', 2);
return false;
}
while ($json = readOneJsonPerformer($stream)) {
$performer = json_decode($json);
echo 'Performer with ID ' . $performer->performerId
. ' has category ' . $performer->category->name, PHP_EOL;
}
fclose($stream);
Output:
Performer with ID 99999 has category Some category name
Performer with ID 88888 has category Second Category name
This code could of course be improved, for example by using a buffer for faster reads and by taking into account that string values may themselves include { and } characters.
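One way to handle braces inside string values is to track nesting depth and string state while scanning. The following is a minimal sketch of that idea in JavaScript rather than PHP; splitTopLevelObjects and the sample chunks are made up for illustration, and the input is assumed to be a JSON array of objects delivered in arbitrary text chunks.

```javascript
// Yield the text of each top-level {...} object found in a sequence of
// text chunks, ignoring braces that appear inside JSON string literals.
function* splitTopLevelObjects(chunks) {
  let depth = 0;        // current {...} nesting depth
  let inString = false; // inside a JSON string literal?
  let escaped = false;  // was the previous character a backslash in a string?
  let current = "";     // text of the object being collected
  for (const chunk of chunks) {
    for (const ch of chunk) {
      if (depth > 0) current += ch;
      if (inString) {
        if (escaped) escaped = false;
        else if (ch === "\\") escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') {
        inString = true;
      } else if (ch === "{") {
        if (depth === 0) current = ch; // start of a new top-level object
        depth++;
      } else if (ch === "}") {
        depth--;
        if (depth === 0) {
          yield current; // one complete object, small enough to decode alone
          current = "";
        }
      }
    }
  }
}

// Two chunks that split an object in the middle and contain braces in strings.
const exampleChunks = [
  '[{"performerId":1,"category":{"na',
  'me":"a}b"}},{"performerId":2,"note":"say \\"{hi}\\""}]',
];

for (const text of splitTopLevelObjects(exampleChunks)) {
  console.log(JSON.parse(text));
}
```

Only one object is held in memory at a time, which is the same idea as the PHP loop above with a more careful scanner.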
You have two options here, and neither of them involves writing your own decoder; don't over-complicate the solution with an unnecessary work-around.
1) Decrease the size of the json that is being decoded, or
2) Increase the allowed memory on your server.
The first option would require access to the json that is being created. This may or may not be possible depending on if you're the one originally creating the json. The easiest way to do this is to unset() any useless data. For example, maybe there is some debug info you won't need, so you can do unset($json_array['debug']); on the useless data.
http://php.net/manual/en/function.unset.php
The second option requires you to have access to the php.ini file on your server. You need to find the line with something like memory_limit = 128M and make the 128M part larger. Try increasing this to double the value already within the file (so it would be 256M in this case). This might not solve your problem though, since large json data could still be the core of your problem; this only provides a work-around for inefficient code.
I had a yaml file that I needed parsed into an array and I got that done. Now this array is huge. I only want a couple values...
X1, Y1, X2, Y2 and owner are all I would like. If I could get them spit out into arrays nicely, it would mean the world to me. (The owner must be the owner related to those X1, Y1, X2, Y2 values; they are all related to each other.)
There are many X1, Y1 entries in the array, but they all come under headings, and I don't know how to get them all.
Here is a look at what the array spits out (shortened because of the file size limit):
http://pastebin.com/PyH18mZv
Any help would be appreciated.
To parse YAML you can use any of the available PHP parsers. I parsed your YAML with an online YAML parser and output the result as JSON. The required array values can then be accessed by decoding the JSON.
Please note that I cut the string short for the sake of the example.
$arr='{
"Residences": {
"WorkArea": {"BlackList": {"Type": "BLACKLIST", "ItemList": []},
"EnterMessage": "Welcome %player to %residence, owned by %owner.",
"Areas": {
"main": {
"Y1": 217,
"X1": -6301,
"X2": -6306,
"Y2": 205,
"Z1": 3001,
"Z2": 2981
}
},
"Permissions": {"Owner": "cal9897","World": "VivaWorld"}
},
"caylyn55": {
"BlackList": {
"Type": "BLACKLIST",
"ItemList": []
},
"EnterMessage": "Welcome %player to %residence, owned by %owner.",
"StoredMoney": 0,
"IgnoreList": {
"Type": "IGNORELIST",
"ItemList": []
},
"LeaveMessage": "Now leaving %residence.",
"Subzones": {},
"Areas": {
"main": {
"Y1": 67,
"X1": 1220,
"X2": 1210,
"Y2": 64,
"Z1": 369,
"Z2": 360
}
},
"Permissions": {
"Owner": "caylyn55",
"PlayerFlags": {},
"GroupFlags": {},
"World": "VivaWorld"
}
}
},
"Version": 1,
"Seed": 1337068141
}';
Decode the JSON:
$a= json_decode($arr,true);
The first area's value can be read with
$a['Residences']['WorkArea']['Areas']['main']['Y1'];
and the second area's value with
$a['Residences']['caylyn55']['Areas']['main']['Y1'];
If ['WorkArea'] and ['caylyn55'] are dynamic, you can use this code:
$b=array_values($a);
foreach($b as $values)
{
if(is_array($values)) {
foreach(array_keys($values) as $c){
echo $a['Residences'][$c]['Areas']['main']['Y1'];
}
}
}
You just need to add the complete reference to the data you're trying to output.
Example: echo $data['Residences']['WorkArea']['Permissions']['Owner'];
Something like this
$extracted = array_map(function ($residence) {
$r = $residence['Areas']['main'];
$r['owner'] = $residence['Permissions']['Owner'];
return $r;
}, $data['Residences']);