PHP xpath find elements


Hello I need some help to find my XML elements with PHP and xpath.
This is a part of my xml:
"processen": {
"proces": [
{
"@attributes": {
"id": "B1221"
},
"velden": {
"kernomschrijving": "activiteit aanleggen alarminstallatie",
"model-kernomschrijving": "aanleggen alarminstallatie",
"naam": "Het beoordelen van een alarminstallatie",
"standaard-dossiernaam": {
"@attributes": {
"ref": "SCN0000029"
}
},
"[tag:taakveld]": "Bouwzaken & Procedures",
"proceseigenaar": "Bouwzaken",
"toelichting-proces": "bla die bla.",
"aanleiding": "Dit werkproces wordt intern getriggerd.",
"opmerking-proces": {
"@attributes": {
"ref": "SCN0000036"
}
},
"exportprofiel": {
"@attributes": {
"ref": "SCN0000037"
},
},...
For example, I want to be able to find an id quickly and access all the elements under it, such as id B1221.
I tried this in all kinds of variants, but none of them works:
$xml = simplexml_load_file( $filename );
$proces = $xml->xpath("//processen/proces/@attributes/id=B1221");
$proces = $xml->xpath("//processen/proces[@attributes/id=B1221]");
It always returns an empty array...
Thanks for your help.

What you've shown there is not the XML; it is some kind of representation of a PHP object which has been produced by parsing the XML, and through which you can access the content of the XML.
That may sound like a pedantic distinction, but it's actually key to your problem: XPath expressions aren't specific to PHP, and so aren't searching through this structure; they are a standard language for searching through the XML itself.
So to construct the correct XPath expression, you need to look only at the actual XML. From the representation you show, I'm guessing it looks, in part, something like this:
<processen>
<proces id="B1816">
<velden>
<kernomschrijving>activiteit aanleggen alarminstallatie</kernomschrijving>
<model-kernomschrijving>aanleggen alarminstallatie</model-kernomschrijving>
<naam>Het beoordelen van een alarminstallatie</naam>
</velden>
</proces>
</processen>
In XPath, you access elements (tags) by name, attributes (like id="...") with a leading @, and literal strings in single or double quotes. The [...] operator means something like "has", so [@foo="bar"] means "has an attribute foo whose value is the string bar".
Which gives you this:
$xml = simplexml_load_file( $filename );
$proces = $xml->xpath('//processen/proces[@id="B1816"]');
echo $proces[0]->asXML();
(Here's a live demo of that example.)
It looks like you may also have namespaces in there (tags with : in the name); those require some extra tricks discussed in this reference question.
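To answer the "access all the elements under the id" part of the question, here is a minimal self-contained sketch. The inline XML is a hypothetical reconstruction (the real file was not shown), so the element names below are assumptions taken from the dumped structure.

```php
<?php
// Hypothetical sample data; the real file's structure is an assumption.
$source = <<<XML
<processen>
  <proces id="B1221">
    <velden>
      <naam>Het beoordelen van een alarminstallatie</naam>
      <proceseigenaar>Bouwzaken</proceseigenaar>
    </velden>
  </proces>
  <proces id="B1816">
    <velden>
      <naam>Voorbeeld</naam>
    </velden>
  </proces>
</processen>
XML;

$xml = simplexml_load_string($source);

// @id selects the attribute; the predicate keeps only the matching <proces>.
$matches = $xml->xpath('//processen/proces[@id="B1221"]');

if ($matches) {
    $proces = $matches[0];
    $id   = (string) $proces['id'];            // attribute value
    $naam = (string) $proces->velden->naam;    // text of a child element
    echo "$id: $naam\n";
}
```

The casts to string matter: without them you get SimpleXMLElement objects rather than plain values.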

Related

Scraping websites with PHP

I'm trying to scrape information directly from the Maersk website.
For example, I'm trying to scrape the information from this URL: https://www.maersk.com/tracking/221242675
I have a lot of tracking numbers to update in the database every day, so I decided to automate it a little.
But with the following code, the site responds that it needs JavaScript to work. I have already tried cURL and other approaches,
but nothing works. Does anyone know another way?
I tried the following code:
<?php
// ------------ teste 14 ------------
$html = file_get_contents('https://www.maersk.com/tracking/#tracking/221242675'); //get the html returned from the following url
echo $html;
$ETAupdate = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
$ETAupdate->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$ETA_xpath = new DOMXPath($ETAupdate);
//get all the <strong> elements
$ETA_row = $ETA_xpath->query('//strong');
if($ETA_row->length > 0){
foreach($ETA_row as $row){
echo $row->nodeValue . "<br/>";
}
}
}
?>

You need to scrape the data directly from their API requests, rather than trying to scrape the page URL directly (Unless you're using something like puppeteer, but I really don't recommend that for this simple task)
I took a look at the site and the API endpoint is:
https://api.maersk.com/track/221242675?operator=MAEU
This will return a JSON-formatted response which you can parse and use to extract the details. It'll also give you a much easier method to access the data rather than parsing the HTML. Example below.
{
"tpdoc_num": "221242675",
"isContainerSearch": false,
"origin": {
"terminal": "YanTian Intl. Container Terminal",
"geo_site": "1PVA2R05ZGGHQ",
"city": "Yantian",
"state": "Guangdong",
"country": "China",
"country_code": "CN",
"geoid_city": "0L3DBFFJ3KZ9A",
"site_type": "TERMINAL"
},
"destination": {
"terminal": "DCT Gdansk sa",
"geo_site": "02RB4MMG6P32M",
"city": "Gdansk",
"state": "",
"country": "Poland",
"country_code": "PL",
"geoid_city": "3RIGHAIZMGKN3",
"site_type": "TERMINAL"
},
"containers": [ ... ]
}
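As a rough sketch of the parsing step, the snippet below decodes a trimmed copy of the response shown above and reads a few fields. The JSON is inlined so the example runs offline; a real script would fetch it over HTTP, and whether the endpoint still responds this way (or needs extra headers) is an assumption that should be checked.

```php
<?php
// Trimmed copy of the sample response; a real script would fetch this
// over HTTP (for example with file_get_contents or cURL) instead.
$json = '{"tpdoc_num":"221242675","isContainerSearch":false,
  "origin":{"terminal":"YanTian Intl. Container Terminal","city":"Yantian","country":"China"},
  "destination":{"terminal":"DCT Gdansk sa","city":"Gdansk","country":"Poland"},
  "containers":[]}';

// Decode into nested associative arrays.
$data = json_decode($json, true);
if ($data === null) {
    exit('Invalid JSON: ' . json_last_error_msg() . "\n");
}

// Build a one-line summary from the decoded fields.
$route = sprintf(
    '%s: %s (%s) -> %s (%s)',
    $data['tpdoc_num'],
    $data['origin']['city'],
    $data['origin']['country'],
    $data['destination']['city'],
    $data['destination']['country']
);
echo $route, "\n";
```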

Get specific JSON data without decoding multiple links

I have a problem with fetching the correct data from a decoded JSON file. I don't know if my question is correct since I don't really know what I am doing for the moment.
So, this is what I don't want to do.
$ln = 'https://api.steamprices.net/v2/csgoprices/?id='.$market_hash_name.'&key=XXX';
$link1 = file_get_contents($ln);
$myarray1 = json_decode($link1, true);
echo $myarray1['median_price'];
I am trying to get the price for every steam skin that's being loaded in my code. What this code does is that it loads this api link for every item I load. So if I have 50 items, this link will be loaded 50 times, which is not accepted by the API.
What I want to do, is that I want to load it once, and fetch the prices for every item from that exact link. That link would look like this:
https://api.steamprices.net/v2/csgoprices/?&key=XXX
So, let's say I load it once. How do I then look up a given market_hash_name in it?
I assume it is something like this.
$priceJson = file_get_contents('https://api.steamprices.net/v2/csgoprices/?key=XXX');
$priceData = json_decode($priceJson, true);
echo $priceData[''.$market_hash_name.'']['price'];
But it doesn't seem to work. I am sorry for this messy explanation; I am unfamiliar with this.
Note that an example response for the api link looks like this:
{
"-r-H1Z1 Shirt": {
"price": 0.11,
"image": "https://steamcommunity-a.akamaihd.net/economy/image/iGm5OjgdO5r8OoJ7TJjS39tTyGCTzzQwmWl1QPRXu8oaf69-NOHLAbqw_23aLe8AcRQ8-3uyKA7_CGvsJYds9U65FMF7i6AbXTJ8PDm57EliZdK7KLPuuh3dxC3m4m0ihzss0MKE6NtIt4qs-JukOX73WgETXYze_pxEBA",
"game": "h1z1"
},
"2016 Invitational Crate": {
"price": 0.09,
"image": "https://steamcommunity-a.akamaihd.net/economy/image/iGm5OjgdO5r8OoJ7TJjS39tTyGCTzzQwmWl1QPRXu8oaf69-NOHLAbqw_23aLe8AcRQ8-3uyKA7_CGvsJYds9U65FMF7i6APSjJ6BjX9rGBYZ9ioCPzysSX6hNNacA",
"game": "h1z1"
},
"ANGRYPUG Motorcycle Helmet": {
"price": 0.17,
"image": "https://steamcommunity-a.akamaihd.net/economy/image/iGm5OjgdO5r8OoJ7TJjS39tTyGCTzzQwmWl1QPRXu8oaf69-NOHLAbqw_23aLe8AcRQ8-3uyKA7_CGvsJYds9U65FMF7i6AbXTJ8PDm57EliZdK7KLPuuh3WySnxyXoUgz870MKd7sFTkZq98oW1ORiqAVsCUYfbNu3SUQqvUSGyY__iEw",
"game": "h1z1"
},
Another output
{
"name":"Aces High Pin",
"price":1210,
"have":2,
"max":9,
"rate":95,
"tr":0
}

Well, the json string you provide isn't valid but something like this may help you
<?php
$jsonData=file_get_contents("json.file"); // simply contains your json string as posted
$jsonArray=json_decode($jsonData,true);
$jsonObject=json_decode($jsonData);
$list_of_MHN=array("2016 Invitational Crate","ANGRYPUG Motorcycle Helmet");
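print_r($jsonArray); // dump the whole decoded array for inspection
// exit; // uncomment to stop after the dump; left active, it would skip the lookups below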
foreach($jsonArray as $hash_name=>$arr){
if(in_array($hash_name,$list_of_MHN)){
print_r($arr);
}
}
for($i=0;$i<count($list_of_MHN);$i++){
if(isset($jsonArray[$list_of_MHN[$i]])){
print_r($jsonArray[$list_of_MHN[$i]]);
}
}
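for($i=0;$i<count($list_of_MHN);$i++){
// braces are required: in PHP 7+, $jsonObject->$list_of_MHN[$i] is parsed as ($jsonObject->$list_of_MHN)[$i]
if(isset($jsonObject->{$list_of_MHN[$i]})){
print_r($jsonObject->{$list_of_MHN[$i]});
}
}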
?>
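Applied to the original question (load the price list once, then look up each item), a minimal sketch could look like this. The inline JSON is a shortened, hypothetical excerpt of the bulk response, and priceFor is a helper name invented for this example.

```php
<?php
// Hypothetical excerpt of the bulk response; in practice it would be
// fetched once and reused for every item.
$priceJson = '{"2016 Invitational Crate":{"price":0.09,"game":"h1z1"},'
           . '"ANGRYPUG Motorcycle Helmet":{"price":0.17,"game":"h1z1"}}';
$priceData = json_decode($priceJson, true);

// Hypothetical helper: returns the price for one item, or null if it is not listed.
function priceFor(array $priceData, string $marketHashName): ?float
{
    return isset($priceData[$marketHashName]['price'])
        ? (float) $priceData[$marketHashName]['price']
        : null;
}

$items = ['2016 Invitational Crate', 'Unknown Skin'];
foreach ($items as $name) {
    $price = priceFor($priceData, $name);
    echo $name, ': ', $price === null ? 'no price listed' : $price, "\n";
}
```

Checking with isset before reading avoids "undefined index" notices for items the API does not list.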

SimpleXML removes tags in node

I'd like to parse a XML file which is generated by an application called Folker. It's an application to transcribe spoken text. Sometimes it saves the lines in a good format which can be parsed with SimpleXML but sometimes it doesn't.
This line is good:
<contribution speaker-reference="KU" start-reference="TLI_107" end-reference="TLI_109" parse-level="1">
<unparsed>ich überLEG mir das [nochma:l,]</unparsed>
</contribution>
This line is not:
<contribution speaker-reference="VK" start-reference="TLI_108" end-reference="TLI_111" parse-level="1">
<unparsed>[JA:_a; ]<time timepoint-reference="TLI_109"/>ja,<time timepoint-reference="TLI_110"/>also (.) wie [geSAGT;]</unparsed>
</contribution>
In the second line SimpleXML removes the tags which are inside the unparsed node.
How can I get SimpleXML to not remove these tags but parse it as deeper nodes or outputs as an object for example like this (just in JSON for better understanding):
"contribution": {
"speaker-reference": "VK",
"start-reference": "TLI_108",
"end-reference": "TLI_111",
"parse-level": "1",
"unparsed": {
"content": "[JA:_a; ]",
"time": {
[
"timepoint-reference": "TLI_109",
"content": "ja,"
],
[
"timepoint-reference": "TLI_110",
"content": "also (.) wie [geSAGT;]"
]
}
}
}

No, it does not remove them. This works flawlessly (interesting app btw):
<?php
$string = '<contribution speaker-reference="VK" start-reference="TLI_108" end-reference="TLI_111" parse-level="1">
<unparsed>[JA:_a; ]<time timepoint-reference="TLI_109"/>ja,<time timepoint-reference="TLI_110"/>also (.) wie [geSAGT;]</unparsed>
</contribution>';
$xml = simplexml_load_string($string);
$t = $xml->unparsed->time[0];
print_r($t->attributes());
?>
// output:
SimpleXMLElement Object
(
[@attributes] => Array
(
[timepoint-reference] => TLI_109
)
)
You can even iterate over them:
$times = $xml->unparsed->children();
foreach ($times as $t) {
$attributes = $t->attributes();
// do sth. useful with them afterwards
}
Hint: Presumably, you were using print_r() or var_dump() on the XML tree. This can give misleading results, because SimpleXML builds its properties on demand and the dump does not show everything. It is better to use echo $xml->asXML(); to see the actual XML string.
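If you also need to know which text follows which time tag, as in the structure sketched in the question, SimpleXML alone is awkward because it does not expose the order of text and child elements. One possible approach, shown here as a sketch rather than the only way, is to switch to the DOM view of the same node with dom_import_simplexml. The $segments layout is an assumption modelled on the question's desired output.

```php
<?php
$string = '<contribution speaker-reference="VK" start-reference="TLI_108" end-reference="TLI_111" parse-level="1">
<unparsed>[JA:_a; ]<time timepoint-reference="TLI_109"/>ja,<time timepoint-reference="TLI_110"/>also (.) wie [geSAGT;]</unparsed>
</contribution>';

$xml = simplexml_load_string($string);

// Get the DOM element behind <unparsed> so text and <time/> nodes can be walked in order.
$unparsed = dom_import_simplexml($xml->unparsed);

$segments = [];
$timepoint = null; // text before the first <time/> has no timepoint
foreach ($unparsed->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE && $node->nodeName === 'time') {
        // Remember the most recent timepoint for the text that follows it.
        $timepoint = $node->getAttribute('timepoint-reference');
    } elseif ($node->nodeType === XML_TEXT_NODE) {
        $segments[] = [
            'timepoint-reference' => $timepoint,
            'content' => $node->nodeValue,
        ];
    }
}
print_r($segments);
```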

Getting values from a PHP Object

I am trying to figure out how to echo the genre value from an object created with this programme wrapper that requests json data from 'The Movie Database'. I'm so stuck and grappling to understand Object-oriented PHP so any help would be great.
I think the fact it is 'nested' (if that's the correct terminology) might be the issue.
<?php
include("tmdb/tmdb-api.php");
$apikey = "myapi_key";
$tmdb = new TMDB($apikey, 'en', true);
$idMovie = 206647;
$movie = $tmdb->getMovie($idMovie);
// returns a Movie Object
echo $movie->getTitle().'<br>';
echo $movie->getVoteAverage().'<br>';
echo '<img src="'. $tmdb->getImageURL('w185') . $movie->getPoster() .'"/></li><br>';
echo $movie->genres->id[28]->name;
?>
All of the other values are echoed just fine, but I can't seem to get at genres. Part of the JSON data looks like this:
{
"adult":false,
"backdrop_path":"\/fa9qPNpmLtk7yC5KZj9kIxlDJvG.jpg",
"belongs_to_collection":{
"id":645,
"name":"James Bond Collection",
"poster_path":"\/HORpg5CSkmeQlAolx3bKMrKgfi.jpg",
"backdrop_path":"\/6VcVl48kNKvdXOZfJPdarlUGOsk.jpg" },
"budget": 0,
"genres":[
{ "id": 28, "name": "Action" },
{ "id": 12, "name": "Adventure" },
{ "id": 80, "name": "Crime" }
],
"homepage":"http:\/\/www.sonypictures.com\/movies\/spectre\/",
"id":206647,
"imdb_id":"tt2379713",
"original_language":"en",
"original_title":"SPECTRE",
"overview":"A cryptic message from Bond\u2019s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE."
}

$movie->genres->id[28]->name
This assumes that id is an array and you want the item with index number 28 from it. What you want is the item containing an id with the value 28 without knowing its index number.
There's no easy way to get to it. You'd have to loop over the array $movie->genres and output the right one.
Maybe like this:
$n = 28;
// loop over genres-array
foreach($movie->genres as $i=>$g){
// check if id of item has right value and if so print it
if($g->id == $n){
echo $g->name;
// skip rest of loop if you only want one
break;
}
}
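If you need several lookups, an alternative sketch is to build an id => name map once with array_column, which can read public properties of objects in PHP 7 and later. The example decodes an excerpt of the JSON from the question; whether the wrapper's Movie object exposes genres as a public property like this is an assumption.

```php
<?php
// Excerpt of the JSON from the question, decoded into objects (json_decode's default).
$movie = json_decode(
    '{"genres":[{"id":28,"name":"Action"},{"id":12,"name":"Adventure"},{"id":80,"name":"Crime"}]}'
);

// Build the map once: keys are genre ids, values are genre names.
$genreNames = array_column($movie->genres, 'name', 'id');

// Look up one genre by id, with a fallback if the id is missing.
echo $genreNames[28] ?? 'unknown genre', "\n";

// Or list every genre of the movie.
echo implode(', ', $genreNames), "\n";
```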

Craft a JSONpath expression so that it retrieves only a specific value?

I have some JSON of which the following is a small sample:
{
"results": {
"div": [
{
"class": "sylEntry",
"div": [
{
"class": "sT",
"id": "sOT",
"p": "Mon 11/17, Computer work time"
},
{
"class": "des",
"id": "dOne",
"p": "All classes Siebel 0218"
}
],
"id": "sylOne"
}
]
}
}
I would like to only retrieve the "p" content for the div element with class "sT". I would like to use a loop and doing something like this:
var arrayOfResults = $.results..div.p
does not work because I only want to retrieve the p value for the div element with class "sT".
So how do I construct my JSONPath so that it retrieves the array of p values contained in the divs with class "sT"?
Thanks!!

Concepts
JSONPath apparently has a filter syntax that allows you to insert arbitrary Javascript into an expression for the purposes of matching or filtering. It also uses @ as a shortcut for the current node. Their example of combining these two things looks like this:
$..book[?(@.price<10)] // filter all books cheaper than 10
So this is probably what you want to use here.
Solution
To test the query I had in mind, I modified the jsonpath-test-js.html file in JSONPath's repo to test your data. You can copy-paste my sample to an HTML file and just load it in a browser.
Their test suite has an array of objects with fields called o and p. o contains the original data to operate on while p contains an array of JSONPath expressions to apply to o. It loops over all these pairs and applies all the ps to their respective os, printing out the result. Not as handy as a simple REPL, but it'll do.
Here's what I came up with:
<html>
<head>
<title> JSONPath - Tests (js)</title>
<script type="text/javascript" src="http://www.json.org/json.js"></script>
<script type="text/javascript"
src="http://jsonpath.googlecode.com/svn/trunk/src/js/jsonpath.js">
</script>
</head>
<body>
<pre>
<script type="text/javascript">
var out = "", tests =
[ { "o": { "results" : { "div" : [ { "clazz": "sylEntry",
"id": "sylOne", "div": [ { "clazz": "sT", "id": "sOT",
"p": "Mon 11/17, Computer work time" }, { "clazz": "des",
"id": "dOne", "p": "All classes Siebel 0218" } ] } ] } },
"p": ["$.results..div[?(@.clazz=='sT')].p", // my suggested expression
"$.results..div[*].p"]}, // your question's expression
];
function evaluate($, p) {
var res = eval(p);
return res != null ? res.toJSONString() : null;
}
for (var i=0; i<tests.length; i++) {
var pathes;
for (var j=0; j<tests[i].p.length; j++) {
pre = ">";
if (pathes = jsonPath(tests[i].o, tests[i].p[j], {resultType: "PATH"}))
for (var k=0; k<pathes.length; k++) {
out += pre + " " + pathes[k] +
" = " + evaluate(tests[i].o, pathes[k]) + "\n";
pre = " ";
}
}
out += "<hr/>";
}
document.write(out);
</script>
</pre>
</body>
</html>
Note that this will first print the results of my query expression and then print the results of yours, so we can compare what they produce.
Here's the output it produces:
> $['results']['div'][0]['div'][0]['p'] = "Mon 11/17, Computer work time"
> $['results']['div'][0]['div'][0]['p'] = "Mon 11/17, Computer work time"
$['results']['div'][0]['div'][1]['p'] = "All classes Siebel 0218"
So the correct operator in the filter expression is ==, meaning the correct expression for you is:
$.results..div[?(@.class=='sT')].p
However, I discovered one unfortunate issue (at least in the Javascript implementation of JSONPath): using the word 'class' in the above query results in this:
SyntaxError: jsonPath: Parse error: _v.class=='sT'
My only guess is that there's an eval being called somewhere to actually evaluate the JSONPath expression. class is a reserved word in Javascript, so it's causing issues. Let's try using the alternate bracket syntax for @.class:
$.results..div[?(@['class']=='sT')].p
Results:
> $['results']['div'][0]['div'][0]['p'] = "Mon 11/17, Computer work time"
> $['results']['div'][0]['div'][0]['p'] = "Mon 11/17, Computer work time"
$['results']['div'][0]['div'][1]['p'] = "All classes Siebel 0218"
So use the above expression and you should be good to go! The filter feature looks powerful, so it'll probably be well worth exploring its capabilities!

Instead of using hard-to-grasp, non-standard query style, you could use DefiantJS (http://defiantjs.com), which extends the global object JSON with the method "search" - with which you can query JSON structures with standardised XPath queries. This method returns the matches in an array (empty array if no matches were found).
Here is a working JSfiddle of the code below;
http://jsfiddle.net/hbi99/sy2bb/
var data = {
"results": {
"div": {
"class": "sylEntry",
"id": "sylOne",
"div": [
{
"class": "sT",
"id": "sOT",
"p": "Mon 11/17, Computer work time"
},
{
"class": "des",
"id": "dOne",
"p": "All classes Siebel 0218"
}
]
}
}
},
res = JSON.search( data, '//div[class="sT"]/p' );
console.log( res[0] );
// Mon 11/17, Computer work time
To get an idea of XPath and how it works, check out this XPath Evaluator tool:
http://defiantjs.com/#xpath_evaluator

try this
JsonPath.with(jsonResponse).param("name", "getName").get("findAll { a -> a.name == name }")
