Parsing XML with PHP

I'm trying to parse a jobs feed using PHP's SimpleXML. I've only used JSON before and am having problems getting the parser to work. Here's some sample data:
<shrs>
<rq url="http://api.simplyhired.com/a/jobs-api/xml_v2/q-comission">
<t>Comission Jobs</t>
<dt>2011-02-18T23:58:38Z</dt>
<si>0</si>
<rpd>10</rpd>
<tr>192</tr>
<tv>146</tv>
<em url=""/>
<h>
<kw pos="1"/>
</h>
</rq>
<rs>
<r>
<jt>Virtual Recruiter (IT) - Comission ...</jt>
<cn url="">Remedy Intelligent Staffing</cn>
<src url="http://api.simplyhired.com/a/job-details/view/jobkey-monster91949932/cjp-0/hits-192?aff_id=28700">Monster</src>
<ty>organic</ty>
<loc cty="Buffalo" st="NY" postal="14211" county="" region="" country="US">Buffalo, NY</loc>
<ls>2011-02-04T05:51:17Z</ls>
<dp>2011-02-04T05:51:17Z</dp>
<e>
Seeking a candidate with previous recruiting experience to work as a Virtual Recruiter for a large client in the IT industry.a Responsibilities: Will recruit, screen, interview, and place candidates for many openings throughout the US Will...
</e>
</r>
<r>
<jt>Virtual Loan Officer (Mortgage) draw vs comission</jt>
<cn url="">Netbranchology.com</cn>
<src url="http://api.simplyhired.com/a/job-details/view/jobkey-7114.353281/cjp-2/hits-192?aff_id=28700">netbranchology.com</src>
<ty>organic</ty>
<loc cty="Denver" st="CO" postal="80218" county="" region="" country="US">Denver, CO</loc>
<ls>2011-02-10T11:47:50Z</ls>
<dp>2011-01-26T11:36:18Z</dp>
<e>
Minimize your overhead by becoming a virtual loan officer... Our client, a Texas-based mortgage banker, has just launched an innovative new program that lets you work from anywhere to originate residential mortgage loans. No office is...
</e>
</r>
</rs>
</shrs>
[etc]
I'd like to retrieve the metadata in the <rq> tags into variables, and then loop through each job result under <rs> to process it. How can I do this with PHP? (I've been playing around with the SimpleXML functions so far.)

Nodes are accessed as object properties; attributes use array notation. foreach lets you iterate over nodes. You can get the content of a node by casting it to a string (echo does this implicitly).
$shrs = simplexml_load_string($xml);
foreach ($shrs->rs->r as $r)
{
$jobTitle = $r->jt;
$city = $r->loc['cty'];
echo "There's an offer for $jobTitle in $city<br />\n";
}

Try SimpleXML: http://www.php.net/manual/en/book.simplexml.php
It will parse your XML into a nice object.
Edit: here's how to use it (assumes your xml is stored in the variable $xml):
$xmlObject = new SimpleXMLElement($xml);
// to retrieve "http://api.simplyhired.com/a/jobs-api/xml_v2/q-comission"
$url = $xmlObject->rq['url'];
// to retrieve "Comission Jobs"
$t = $xmlObject->rq->t;
...
Hope it helps.
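Putting the two answers together, here is a minimal sketch that reads the metadata and then collects each job into a plain array. It uses a trimmed copy of the sample feed; the array keys (title, company, city, state) are names I made up for illustration.

```php
<?php
// Trimmed copy of the sample feed from the question.
$xml = <<<XML
<shrs>
  <rq url="http://api.simplyhired.com/a/jobs-api/xml_v2/q-comission">
    <t>Comission Jobs</t>
    <tr>192</tr>
  </rq>
  <rs>
    <r>
      <jt>Virtual Recruiter (IT) - Comission ...</jt>
      <cn url="">Remedy Intelligent Staffing</cn>
      <loc cty="Buffalo" st="NY">Buffalo, NY</loc>
    </r>
    <r>
      <jt>Virtual Loan Officer (Mortgage) draw vs comission</jt>
      <cn url="">Netbranchology.com</cn>
      <loc cty="Denver" st="CO">Denver, CO</loc>
    </r>
  </rs>
</shrs>
XML;

$shrs = simplexml_load_string($xml);
if ($shrs === false) {
    exit("Could not parse the feed\n");
}

// Metadata: cast so the variables hold plain values, not SimpleXMLElement objects.
$feedUrl = (string) $shrs->rq['url'];
$feedTitle = (string) $shrs->rq->t;
$tr = (int) $shrs->rq->tr; // 192 in the sample

// One array per <r> element; the keys are illustrative names.
$jobs = [];
foreach ($shrs->rs->r as $r) {
    $jobs[] = [
        'title'   => (string) $r->jt,
        'company' => (string) $r->cn,
        'city'    => (string) $r->loc['cty'],
        'state'   => (string) $r->loc['st'],
    ];
}

echo "$feedTitle\n";
foreach ($jobs as $job) {
    echo "{$job['title']} in {$job['city']}, {$job['state']}\n";
}
```

The explicit casts matter once you store the values: without them each variable keeps a reference to a SimpleXMLElement, which behaves oddly in comparisons and cannot be serialized.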

Related

PHP script to echo VLC now playing XML attributes

I've been searching for a while on this and haven't had much luck. I've found plenty of resources showing how to echo data from dynamic XML, but I'm a PHP novice, and nothing I've written seems to grab and print exactly what I want, though from everything I've heard, it should be relatively easy. The source XML (located at 192.168.0.15:8080/requests/status.xml) is as follows:
<root>
<fullscreen>0</fullscreen>
<volume>97</volume>
<repeat>false</repeat>
<version>2.0.5 Twoflower</version>
<random>true</random>
<audiodelay>0</audiodelay>
<apiversion>3</apiversion>
<videoeffects>
<hue>0</hue>
<saturation>1</saturation>
<contrast>1</contrast>
<brightness>1</brightness>
<gamma>1</gamma>
</videoeffects>
<state>playing</state>
<loop>true</loop>
<time>37</time>
<position>0.22050105035305</position>
<rate>1</rate>
<length>168</length>
<subtitledelay>0</subtitledelay>
<equalizer/>
<information>
<category name="meta">
<info name="description">
000003EC 00000253 00000D98 000007C0 00009C57 00004E37 000068EB 00003DC5 00015F90 00011187
</info>
<info name="date">2003</info>
<info name="artwork_url"> file://brentonshp04/music%24/Music/Hackett%2C%20Steve/Guitar%20Noir%20%26%20There%20Are%20Many%20Sides%20to%20the%20Night%20Disc%202/Folder.jpg
</info>
<info name="artist">Steve Hackett</info>
<info name="publisher">Recall</info>
<info name="album">Guitar Noir & There Are Many Sides to the Night Disc 2
</info>
<info name="track_number">5</info>
<info name="title">Beja Flor [Live]</info>
<info name="genre">Rock</info>
<info name="filename">Beja Flor [Live]</info>
</category>
<category name="Stream 0">
<info name="Bitrate">128 kb/s</info>
<info name="Type">Audio</info>
<info name="Channels">Stereo</info>
<info name="Sample rate">44100 Hz</info>
<info name="Codec">MPEG Audio layer 1/2/3 (mpga)</info>
</category>
</information>
<stats>
<lostabuffers>0</lostabuffers>
<readpackets>568</readpackets>
<lostpictures>0</lostpictures>
<demuxreadbytes>580544</demuxreadbytes>
<demuxbitrate>0.015997290611267</demuxbitrate>
<playedabuffers>0</playedabuffers>
<demuxcorrupted>0</demuxcorrupted>
<sendbitrate>0</sendbitrate>
<sentbytes>0</sentbytes>
<displayedpictures>0</displayedpictures>
<demuxreadpackets>0</demuxreadpackets>
<sentpackets>0</sentpackets>
<inputbitrate>0.016695899888873</inputbitrate>
<demuxdiscontinuity>0</demuxdiscontinuity>
<averagedemuxbitrate>0</averagedemuxbitrate>
<decodedvideo>0</decodedvideo>
<averageinputbitrate>0</averageinputbitrate>
<readbytes>581844</readbytes>
<decodedaudio>0</decodedaudio>
</stats>
</root>
What I'm trying to write is a simple PHP script that echoes the artist's name (In this example Steve Hackett). Actually I'd like it to echo the artist, song and album, but I'm confident that if I'm shown how to retrieve one, I can figure out the rest on my own.
What little of my script which actually seems to work goes as follows. I've tried more than what's below, but I left out the bits that I know for a fact aren't working.
<?PHP
$file = file_get_contents('http://192.168.0.15:8080/requests/status.xml');
$sxe = new SimpleXMLElement($file);
foreach($sxe->...
echo "Artist: "...
?>
I think I need to use foreach and echo, but I can't figure out how to do it in a way that will print what's between those info brackets.
I'm sorry if I've left anything out. I'm not only new to PHP, but I'm new to StackOverflow too. I've referenced this site in other projects, and it's always been incredibly helpful, so thanks in advance for your patience and help!
////////Finished Working Script - Thanks to Stefano and all who helped!
<?PHP
$file = file_get_contents('http://192.168.0.15:8080/requests/status.xml');
$sxe = new SimpleXMLElement($file);
$artist_xpath = $sxe->xpath('//info[@name="artist"]');
$album_xpath = $sxe->xpath('//info[@name="album"]');
$title_xpath = $sxe->xpath('//info[@name="title"]');
$artist = (string) $artist_xpath[0];
$album = (string) $album_xpath[0];
$title = (string) $title_xpath[0];
echo "<B>Artist: </B>".$artist."<br>";
echo "<B>Title: </B>".$title."<br>";
echo "<B>Album: </B>".$album."<br>";
?>
Instead of using a for loop, you can obtain the same result with XPath:
// Extraction split across two lines for clarity
$artist_xpath = $sxe->xpath('//info[@name="artist"]');
$artist = (string) $artist_xpath[0];
echo $artist;
You will have to adjust the XPath expression (i.e. change @name=... appropriately), but you get the idea. Also notice that [0] is necessary because xpath returns an array of matches (and you only need the first), and the (string) cast extracts the text contained in the node.
Besides, your XML is not well-formed and will be rejected by the parser because of the literal & in the <info name="album"> element; it should be written as &amp;.
If you look at your code again, what is missing is a function that takes the first result of the XPath expression and casts that SimpleXMLElement to a string.
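If you cannot fix the source of the XML, one workaround is to escape bare ampersands before parsing. The sketch below is only that, a workaround: escapeBareAmpersands is a hypothetical helper name, and the regular expression would also rewrite ampersands inside CDATA sections or comments, so it assumes the document has none.

```php
<?php
// Hypothetical helper: escape every "&" that does not already start an
// entity or character reference (e.g. &amp; or &#38; or &#x26;).
function escapeBareAmpersands(string $xml): string
{
    return preg_replace(
        '/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)/',
        '&amp;',
        $xml
    );
}

$raw = '<root><info name="album">Guitar Noir & There Are Many Sides to the Night Disc 2</info></root>';

// Collect parse errors instead of emitting warnings.
libxml_use_internal_errors(true);
$broken = simplexml_load_string($raw); // fails because of the bare "&"
libxml_clear_errors();

$sxe = simplexml_load_string(escapeBareAmpersands($raw));
$album = (string) $sxe->info;

echo $album, "\n";
```

The parser decodes &amp; back to & when you read the node, so the string you get out contains the original ampersand.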
One way to write this once is to extend from SimpleXMLElement:
class BetterXMLElement extends SimpleXMLElement
{
public function xpathString($expression) {
list($result) = $this->xpath($expression);
return (string) $result;
}
}
You then create the more specific BetterXMLElement the same way you created the plain SimpleXMLElement before:
$file = file_get_contents('http://192.168.0.15:8080/requests/status.xml');
$sxe = new BetterXMLElement($file);
Your later code then becomes shorter:
$artist = $sxe->xpathString('//info[@name="artist"]');
$album = $sxe->xpathString('//info[@name="album"]');
$title = $sxe->xpathString('//info[@name="title"]');
echo "<B>Artist: </B>".$artist."<br>";
echo "<B>Title: </B>".$title."<br>";
echo "<B>Album: </B>".$album."<br>";
This spares you some repeated code, which also means fewer places to make an error :)
You can of course optimize this further by accepting an array of multiple XPath queries and returning all the values keyed by name. But that is something you need to write yourself according to your specific needs. So use what you learn in programming to make programming easier :)
If you want some more suggestions, here is another, very detailed example using DOMDocument, the sister library of SimpleXML. It is quite advanced but might give you some good inspiration. I think something similar is possible with SimpleXML as well, and this is probably what you're looking for in the end:
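A rough sketch of that multi-query idea, assuming you want an empty string when a query matches nothing. The method name xpathStrings and the array keys are my own choices, not part of SimpleXML.

```php
<?php
class BetterXMLElement extends SimpleXMLElement
{
    // Hypothetical method: for each key => XPath pair, return the string
    // value of the first match, or '' when nothing matches.
    public function xpathStrings(array $expressions): array
    {
        $values = [];
        foreach ($expressions as $name => $expression) {
            $matches = $this->xpath($expression);
            $values[$name] = $matches ? (string) $matches[0] : '';
        }
        return $values;
    }
}

// Cut-down version of the VLC status document (no album element on purpose).
$xml = '<root><information><category name="meta">'
     . '<info name="artist">Steve Hackett</info>'
     . '<info name="title">Beja Flor [Live]</info>'
     . '</category></information></root>';

$sxe = new BetterXMLElement($xml);
$meta = $sxe->xpathStrings([
    'artist' => '//info[@name="artist"]',
    'title'  => '//info[@name="title"]',
    'album'  => '//info[@name="album"]',
]);

echo "Artist: {$meta['artist']}\n";
echo "Title: {$meta['title']}\n";
```

Checking the result of xpath() before indexing it avoids a notice when an element is missing, which the list() version does not guard against.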
Extracting data from HTML using PHP and xPath

Ebay api GetSellerList, Parsing response XML

I am using the eBay Trading API to get a seller's currently listed stock via the GetSellerList call. I am having trouble parsing the XML response, which I would then insert into their website shop.
This is the xml request.
<GetSellerListRequest xmlns='urn:ebay:apis:eBLBaseComponents'>
<UserID>".$user_id."</UserID>
<DetailLevel>ReturnAll</DetailLevel>
<ErrorLanguage>RFC 3066</ErrorLanguage>
<WarningLevel>Low</WarningLevel>
<Version>".$compat_level."</Version>
<RequesterCredentials>
<eBayAuthToken>".$auth_token."</eBayAuthToken>
</RequesterCredentials>
<StartTimeFrom>2012-06-12T23:35:27.000Z</StartTimeFrom>
<StartTimeTo>2012-08-30T23:35:27.000Z</StartTimeTo>
<Pagination>
<EntriesPerPage>200</EntriesPerPage>
</Pagination>
<OutputSelector>ItemArray.Item.Title</OutputSelector>
<OutputSelector>ItemArray.Item.Description</OutputSelector>
<OutputSelector>ItemArray.Item.BuyItNowPrice</OutputSelector>
<OutputSelector>ItemArray.Item.Quantity</OutputSelector>
</GetSellerListRequest>
I am not the best with PHP and am still learning; I have looked through w3schools and the PHP docs and found nothing. I have been using this (from the eBay tutorials) to try to get the values of the XML tags with getElementsByTagName.
$dom = new DOMDocument();
$dom->loadXML($response);
$titles = $dom->getElementsByTagName('Title')->length > 0 ? $dom->getElementsByTagName('Title')->item(0)->nodeValue : '';
I was hoping to build an array with this and then use foreach to insert the values into the database, but this only gets the value of the first 'Title' tag.
I'm sure there is a way to create an array with all the 'Title' values in it.
All help is appreciated.
This would be easier to answer if you posted the response XML (just the relevant portion) rather than the request.
The code you have will only grab the first item - specifically this part:
$dom->getElementsByTagName('Title')->item(0)->nodeValue
Rather, you'll want to loop through all the Title elements and extract their nodeValue. This is a starting point:
$dom = new DOMDocument();
$dom->loadXML($response);
$title_nodes = $dom->getElementsByTagName('Title');
$titles = array();
foreach ($title_nodes as $node) {
$titles[] = $node->nodeValue;
}
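If you need more than one field per listing, you can loop over the Item elements and look up each field inside the current item. This is only a sketch: the response below is a made-up sample whose shape I inferred from the OutputSelector paths in the request (ItemArray.Item.Title and so on), so check it against the real response.

```php
<?php
// Assumed sample response; the real one will contain more elements.
$response = <<<XML
<GetSellerListResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <ItemArray>
    <Item><Title>First item</Title><Quantity>3</Quantity></Item>
    <Item><Title>Second item</Title><Quantity>1</Quantity></Item>
  </ItemArray>
</GetSellerListResponse>
XML;

$dom = new DOMDocument();
$dom->loadXML($response);

$rows = [];
foreach ($dom->getElementsByTagName('Item') as $item) {
    $row = [];
    foreach (['Title', 'Quantity'] as $field) {
        // Search only inside the current <Item>, not the whole document.
        $nodes = $item->getElementsByTagName($field);
        $row[$field] = $nodes->length > 0 ? $nodes->item(0)->nodeValue : '';
    }
    $rows[] = $row;
}

print_r($rows);
```

Each entry of $rows then holds the values for one listing, which is a convenient shape for a foreach that inserts into the database.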

XPath multidimensional arrays in PHP

I'm scraping a website that's mostly table based. I have <tr> tags that each represent a category and <td> tags inside these that represent properties of the category.
Using Xpath I get the <tr> fine but with all the <td> info inside it bunched as one string:
$html_string = file_get_contents('testpage.html');
$dom = new DOMDocument();
$dom->loadHTML($html_string);
$xpath = new DOMXpath($dom);
$context_nodes = $xpath->query('//table[@id="category"]/tr[not(starts-with(@id, "category"))]');
And I can get each <td> fine, but with no reference back to its category, with:
$context_nodes = $xpath->query('//table[@id="category"]/tr[not(starts-with(@id, "category"))]/td');
What I would like to do later is be able to reference the properties of each category. I presumed I could do so with $context_nodes[2] etc., thinking that the array it created was a multidimensional string array. This doesn't seem to be the case.
How would I go about creating an array from the xpath info where I can grab a property of a category based on identifying what category I specifically want. E.g. train[1][2]?
Your second attempt is on the right lines. PHP (or, rather, libxml) retains a reference to the context the nodes you selected were returned from, allowing you to do precisely what you need in your case.
XML
<root>
<cat name="category 1">
<prop>prop 1.1</prop>
<prop>prop 1.2</prop>
</cat>
<cat name="category 2">
<prop>prop 2.1</prop>
<prop>prop 2.2</prop>
</cat>
</root>
PHP
$xml = new SimpleXMLElement($xml);
$props = $xml->xpath('cat/prop');
foreach($props as $prop) {
//let's go back up...
$parent_cat = $prop->xpath('parent::*/@name');
echo '<p>'.$prop.' (property of '.$parent_cat[0].')</p>';
}
Notice how we navigate back up the tree, from the point of the prop node, to reference the parent category. Not sure if this is what you meant but hope it helps.
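Going the other way (top down) also works with DOMXPath, because query() accepts a context node as its second argument. A minimal sketch that builds the multidimensional array the question asks for; the table markup and the row ids (train, bus) are invented for the example.

```php
<?php
// Invented sample markup; the row ids are assumptions.
$html_string = <<<HTML
<table id="category">
  <tr id="category-header"><td>Name</td><td>Speed</td></tr>
  <tr id="train"><td>Train</td><td>Fast</td></tr>
  <tr id="bus"><td>Bus</td><td>Slow</td></tr>
</table>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html_string);
$xpath = new DOMXPath($dom);

// "//tr" (descendant axis) so the query also matches if a <tbody> is present.
$rows = $xpath->query('//table[@id="category"]//tr[not(starts-with(@id, "category"))]');

$table = [];
foreach ($rows as $tr) {
    $cells = [];
    // Relative query: only the <td> children of the current row.
    foreach ($xpath->query('td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $table[$tr->getAttribute('id')] = $cells;
}

echo $table['train'][1], "\n";
```

Keying the outer array by the row id lets you pick a category by name instead of by position.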

read xml using file get contents

I have an XML file. How do I get the value of the Title field using file_get_contents? I just want to get the value "TomTom XXL 550M - US, Canada & Mexico Automotive GPS. (Brand New)".
<Title>
TomTom XXL 550M - US, Canada & Mexico Automotive GPS. (Brand New)
</Title>
my code
$xmlstr = file_get_contents($source);
$parseXML = new SimpleXMLElement($xmlstr);
print($parseXML);
// load as file
$parseXMLFile = new SimpleXMLElement($source,null,true);
If you feel comfortable with JavaScript's DOM methods, there is another solution: DOMDocument.
You can load XML files and use functions such as getElementsByTagName. For example, if you have a books.xml file like this:
<?xml version="1.0" encoding="utf-8"?>
<books>
<book><title>Patterns of Enterprise Application Architecture</title></book>
<book><title>Design Patterns: Elements of Reusable Software Design</title></book>
<book><title>Clean Code</title></book>
</books>
You can extract the titles like this:
$dom = new DOMDocument;
$dom->load('books.xml');
$books = $dom->getElementsByTagName('title');
foreach ($books as $book) {
echo $book->nodeValue.'<br>';
}
You just have to read your file with simplexml_load_file.
You will then get an object of class SimpleXMLElement.
Then you can use it to get what you want; see the SimpleXML examples in the PHP manual.
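For the Title case specifically, a small sketch. The question does not show the root element, so the <Item> wrapper here is an assumption, and the ampersand is written as &amp; because a bare & would make the parser reject the document.

```php
<?php
// Assumed wrapper element; only <Title> comes from the question.
$xmlstr = <<<XML
<Item>
  <Title>
    TomTom XXL 550M - US, Canada &amp; Mexico Automotive GPS. (Brand New)
  </Title>
</Item>
XML;

$parseXML = simplexml_load_string($xmlstr);
if ($parseXML === false) {
    exit("Could not parse the XML\n");
}

// Cast to string to get the text, then trim the surrounding line breaks.
$title = trim((string) $parseXML->Title);

echo $title, "\n";
```

With a file or URL you would call simplexml_load_file($source) instead and read the Title the same way.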

PHP XML Strategy: Parsing DOM to fill "Bean"

I have a question concerning a good strategy on how to fill a data "bean" with data inside an xml file.
The bean might look like this:
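class Person
{
    public $id;
    public $forename = "";
    public $surname = "";
    public $bio;

    public function __construct()
    {
        // A property default must be a constant expression, so create the object here.
        $this->bio = new Biography();
    }
}
class Biography
{
    public $url = "";
    public $id;
}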
the xml subtree containing the info might look like this:
<root>
<!-- some more parent elements before node(s) of interest -->
<person>
<name pre="forename">
Foo
</name>
<name pre="surname">
Bar
</name>
<id>
1254
</id>
<biography>
<url>
http://www.someurl.com
</url>
<id>
5488
</id>
</biography>
</person>
</root>
At the moment, I have one approach using DOMDocument. A method
iterates over the entries and fills the bean by "remembering"
the last node. I think that's not a good approach.
What I have in mind is something like preconstructing some xpath
expression(s) and then iterate over the subtrees/nodeLists. Return
an array containing the beans as defined above eventually.
However, it does not seem possible to reuse a subtree/DOMNode
as a DOMXPath constructor parameter.
Has anyone of you encountered such a problem?
Did you mean using an XML file as a sort of template?
You can use some factory to build the empty person or biography node and then feed it, or validate using DTDs.
You can search using XPath on selected DOM nodes; see the PHP DOMXPath manual.
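To illustrate the last point: DOMXPath is constructed once from the document, and each query() or evaluate() call can take a context node, so the expressions can be relative to the current person. A sketch under that assumption; extractPersons and the $text closure are names I made up, and the classes are repeated only to keep the example self-contained.

```php
<?php
class Biography { public $url = ''; public $id; }
class Person { public $id; public $forename = ''; public $surname = ''; public $bio; }

// Hypothetical function: returns one Person per /root/person element.
function extractPersons(string $xml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);

    // Evaluate an expression relative to $ctx and return its trimmed string value.
    $text = function (string $expr, DOMNode $ctx) use ($xpath): string {
        return trim($xpath->evaluate("string($expr)", $ctx));
    };

    $persons = [];
    foreach ($xpath->query('/root/person') as $node) {
        $person = new Person();
        $person->id = $text('id', $node);
        $person->forename = $text('name[@pre="forename"]', $node);
        $person->surname = $text('name[@pre="surname"]', $node);
        $person->bio = new Biography();
        $person->bio->id = $text('biography/id', $node);
        $person->bio->url = $text('biography/url', $node);
        $persons[] = $person;
    }
    return $persons;
}

$xml = '<root><person><name pre="forename"> Foo </name><name pre="surname"> Bar </name>'
     . '<id> 1254 </id><biography><url> http://www.someurl.com </url><id> 5488 </id></biography>'
     . '</person></root>';

$persons = extractPersons($xml);
echo $persons[0]->forename, ' ', $persons[0]->surname, "\n";
```

Since the relative paths are plain strings, they could be moved into a per-class map of property name to expression if you want to keep the extraction general.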
No. The XML contains real data. I need to transform it into a PHP array (unfortunately it must be PHP; don't ask why).
---> You can use some factory to build the empty person or biography node and then feed it, or validate using DTD's
The "bean" is not the problem. Constructing the list of beans is harder than I thought. Maybe the main problem lies in the solution, since I want to keep it as general as possible.
Here is some Java code I just wrote; maybe it gives you an idea:
public List<PersonBean> extract(String xml) throws Exception {
InputSource is =new InputSource(new StringReader(xml));
XPathFactory xfactory = XPathFactory.newInstance();
XPath xpath = xfactory.newXPath();
NodeList nodeList = (NodeList)xpath.evaluate("/root/person", is, XPathConstants.NODESET);
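int length = nodeList.getLength();
// Result list; it was used below but never declared.
List<PersonBean> persons = new ArrayList<PersonBean>();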
int pos = -1;
Traverser tra = new Traverser();
Attribute nameAttr = new Attribute();
nameAttr.setName("attr");
while(++pos < length) {
PersonBean bean = new PersonBean();
Node person = nodeList.item(pos);
Node fore = tra.getElementByNodeName(person, "id");
nameAttr.setValue("forename");
Node pre = tra.getElementByNodeNameWithAttribute(person,"name",nameAttr);
nameAttr.setValue("surname");
Node sur = tra.getElementByNodeNameWithAttribute(person, "name", nameAttr);
bean.setForeName(pre.getTextContent());
bean.setSurName(sur.getTextContent());
bean.setId(fore.getTextContent());
Node bio = tra.getElementByNodeName(person, "biography");
Node bid = tra.getElementByNodeName(bio, "id");
Node url = tra.getElementByNodeName(bio, "url");
BiographyBean bioBean = new BiographyBean();
bioBean.setId(bid.getTextContent());
bioBean.setUrl(url.getTextContent());
bean.setBio(bioBean);
persons.add(bean);
}
return persons;
}
Traverser is just a simple iterative XML traverser.
Attribute is another bean holding a name and a value.
This solution works fine as long as there is a "person" node. However, the code could grow drastically for all the other elements that need to be parsed.
I don't expect ready made solutions, just a small hint in the right direction.. :)
Cheers,
Mike
