How to retrieve an attribute from a namespaced element

How to retrieve an attribute from a namespaced element - php

I am trying to get the url attributes from the <media:content> elements in this RSS feed:
https://news.google.com/rss/search?q=test&as_qdr=w1&scoring=n&num=100&hl=en-CA&gl=CA&ceid=CA:en
Here's what I have so far:
$feed_url = "https://news.google.com/rss/search?q=test&as_qdr=w1&scoring=n&num=100&hl=en-CA&gl=CA&ceid=CA:en";
$rss = file_get_contents($feed_url);
$rss = new SimpleXMLElement($rss);
$items = $rss->channel->item;
foreach ($items as $item) {
print_r($item);
echo "<hr>";
}
This code works for all elements except the ones with a semicolon in the name, like <media:content> or <dc:contributor>. If I open the XML feed in my browser I can see the tag I am looking for:
<media:content url="https://lh6.googleusercontent.com/proxy/nxX8kqpFKSDvYg_bf_QrdsS0PYNMFPGspYmTlZlIo0IzyyhYhURxQc5nrpnzfrNBZkWQywioGXdPclazSIEwiz5wklsBePHOCft9qdHl2EmqIES_SMl5orim2xM2eHYalvIgFFeGYvp7cQaCQpKAObhPGQ--diqZg4Io3MSW8f6PXlRAbUcPvpDxB-KRqBj53bbROhoUYuqxkA=-w150-h150-c" medium="image" width="150" height="150"/>
</item>
I tried various solutions from other threads but it didn't work for me. Example:
$xml_object = $rss->channel->item[0];
$ns_media = $xml_object->children('http://search.yahoo.com/mrss/');
I don't know what I'm doing wrong so I would appreciate some help

You're missing a call to the attributes() method of your SimpleXMLElement instance:
foreach ($items as $item) {
$media_content_url = $item->children('http://search.yahoo.com/mrss/')->attributes()->url;
// ...
}

I'm not familiar with SimpleXML, but this is simple enough to do with DomDocument:
$feed_url = "https://news.google.com/rss/search?q=test&as_qdr=w1&scoring=n&num=100&hl=en-CA&gl=CA&ceid=CA:en";
$rss = file_get_contents($feed_url);
$dom = new DomDocument;
$dom->loadXML($rss);
$nodes = $dom->getElementsByTagNameNS("http://search.yahoo.com/mrss/", "content");
foreach ($nodes as $node) {
printf("%s\n", $node->getAttribute("url"));
}
Output:
https://lh6.googleusercontent.com/proxy/m7yanlDdWIjGc1XmsY6AHB5DqqJcgSe1Z7vs9DUC5NbD-FfQqJzEY8uIadNckLJFu7O6rcuh4W-CsXRg2vjr_KLOWhwNG5shhfdetcUkY5dMHa0uN1GBC5iY0svkP-Wxcm7JJ_kJMh6sctcvJ5Hfbb2Vor8KPlnYXUk_Y3jxYeCgmDBTqeRKwQ1pTMtWtJ_7fK5P5PSdKQKjUNnfVODZjHg_c4PwFWw3Cw=-w150-h150-c
https://lh3.googleusercontent.com/proxy/j7vDbXvscxGVLF8xo2DGkEgmgyQ9-u5vE0RWJjmAp84xOuy4v-Ff6cHADsLiC2Zd2KE7s04sCgtT_WNx4K5vxjDw_jbFRwQhlBgpL-YdXMgvDgakxzx8xWDO5bdpHaVssEGXgkxCnXnHXBRgb67vXeY6XnbgeEp7Fe5ohK1fpyk_hE3IYGyHdJnTxiH_=-w150-h150-c
https://lh6.googleusercontent.com/proxy/nxX8kqpFKSDvYg_bf_QrdsS0PYNMFPGspYmTlZlIo0IzyyhYhURxQc5nrpnzfrNBZkWQywioGXdPclazSIEwiz5wklsBePHOCft9qdHl2EmqIES_SMl5orim2xM2eHYalvIgFFeGYvp7cQaCQpKAObhPGQ--diqZg4Io3MSW8f6PXlRAbUcPvpDxB-KRqBj53bbROhoUYuqxkA=-w150-h150-c
https://lh3.googleusercontent.com/proxy/PbDyKTNQAyxkLNnyQFm00dHkNyKoASc3zKJjw7tjRtfmebHfbP_Ov_5RfcsG1RL8gyFaMSvVltd7IQJns6x_N_thPQTWz7E3ER0RlqLhZBYjmM-cp9xUkCdICiFyfkY0XGx-xGSh6zq5C_SpuzAxCVdhoOkqW4Lz_kyw-KN1fUJB6b8VgDGFvssIfmurSm3qCdJYeFJAx7x6lh_NQS1GNeNdbVBf0RoE2jiZfK6SYgFCX2s9KifQ7Sld_0plNrvTyW6VSR9D0AEwlBClWXNfoMmB_NYl4j03ELoUIjB0fRpUxV7YAqiIC1nSxqn92Q=-w150-h150-c
https://lh6.googleusercontent.com/proxy/DcJhP_BX_r7IbiwFgYt7MOL8RKQMizjCEWAn4YBcWdDy-PYOncpb_PrDZ0H5cMlxSFk9X8SQz5WEAX1xJRV4RWBiPwSH6uDJr7bt1Dh4H7MyYAaB_66BnASNA-fw0pszLPYEgfkwZsRyEZNT045MRYXA3Q=-w150-h150-c
https://lh5.googleusercontent.com/proxy/ECDRkzc4AO3OP1V0PNEVw1OhBYwuDRJVdDzF9lFNW34D8aNO8s54aWJfuR_LhDz-wKCRRvS64ggZnsg2UkCE5EYJghnBkQlpmwktgFYcKW0OnXP_-Ynh37EHR9nG9lyRoM1-3ebj=-w150-h150-c
https://lh6.googleusercontent.com/proxy/9xL1YVQyHg2mzitNgeiHRkjq_vJBxmOb0iAb1bBfJcqFLlWOJWkzRmLYrc9-hmK4nGcLnFfLMEb-bTnZmlWRM72_9ibysFUU_z-77ZK5PhX-f8vfIoWp=-w150-h150-c
https://lh5.googleusercontent.com/proxy/FcDG9_8xHTVUP2uHZ6cMAnIAdDxd-Kg29IksHUEDJCX8_mjTd2voG8BITnqpPEXKtFEImDogPfJfHNlqJr7X2I0VHlkesJvRnE2D-aLRxiNJfc2Lh6WIb0PrRy2nluPe6IJOhUulh0AzZ8JXJVBPgYnfeifItdhBsCTz7QtKGN4DbLZzDAVRL28mHNzaFBlCjCMGyhrbR3jmmlLWqj5K6lbfdoS0jxbLDKkNY8ywVq61rua1YHe-J6ZOY40ESL_0hf27KTgIJFSNq99yGX2sMw=-w150-h150-c
https://lh3.googleusercontent.com/proxy/ciJtTGr7RBlyJD2JGS-Ps4OUOcHxph6Csa1RCJOhcIjjqnjMHnqDzBU2MwEBoieNz37zmC69cPcoUi9696CfpMYh_cry5O6xmTT-1BnlyJhGMTeDR2mIbf3-3VEJ5YsNHmyozvClGvbR6_ZOaMgH0w3AWwf_bZppmqXp7Bul8rXDOkIDMeHrmKCpQcJff-lAbV7hnud3h0JH0--7zw5wCCOj57dNwIA=-w150-h150-c
https://lh5.googleusercontent.com/proxy/nyoQUYC-IPq8Td_GPYc60euyS5cgKwUk7ta1ISJl9wafgrGt1HkhVtPSpoO36KZl_8em4B9bBuP_OXmR0RZlGk1yLwcfAK42NknrGy5H0bLwJqouJ0sE0a21EwardDsVGe1XhGXETO92NfSG2Tikpl9pUFLiJEE-ySdL2d5CK9LA82P4DG6FM5eW=-w150-h150-c
https://lh3.googleusercontent.com/proxy/F2QF_T-xeBkwMyMljtdxwaXMQmNvyG9YCv2QmdBBSmNCe6okG7AIuElWuXXI45IjTA8fuyEZbeGEHBJdWIeyxcjiwapXjzAIxm1lxrmMdzLgJyD4C867KZtcTS1NTWyJebHY4u3gBQ2Z=-w150-h150-c
https://lh4.googleusercontent.com/proxy/rbbBxl7QWG4BBIlsvJUIfL3fr8j4f7L_LoRc3NvfWcNOGT7oX7U1_CpJ71oE04TD0Ax75WmDJrlBQNYPGcsQnOid874Z6P3nwNpdNtZlytX_6FlXqefr58IQ3fB-sivfI20EmQVnRfaBXUozjpHbjW225jeI1hxWc1U15MC_rMuJryLEC_CV4OLg=-w150-h150-c
https://lh5.googleusercontent.com/proxy/7TRaIMikpkiHsWKNenXItiKUUSnRhEl92XSOmDHl828VWobr_M8crMxMLvfG9BDbVc4SioxINBmDyDLhxyHiLlk_c_-6ocsm6ATjrUbWHuc-FTba=-w150-h150-c
https://lh5.googleusercontent.com/proxy/VDXcSNdIqLdhl6IFXyQlwOlzDlhPGaPTOWN0XMuX_DHozxXowzuQMWGAnFNIizgXavLZ5rVcw9rGl7NWTHMyRboJVqjzRRtoOs1GJCb0dylyUsHSt5qUSOZULdyBDPkW6HXVDHHyR40EeR4CS9nOX0M=-w150-h150-c
https://lh5.googleusercontent.com/proxy/YlbnCDloJxYZBVFhx-k2JZzYzFhcBA6DMAim5QTgNoyB4-Q_8DjgR2-JV73ARnUmpROHYfdZiKwfEUCdPB7tUSJ-uJuSRFgfQ_t8CV6rQ8zAXiIKOuoNO20AMArh5NXKr99nP_FoiOLf6mIEJw--URXUP0Tg4-i7bCXXdIPIvVpWNaDxKesNa1MRzIWkzHnYoGuy5QGL7byi32Y0ld8mHP2KFbXT2aga0f3S5rl-ikFkRxpaSxk0coE=-w150-h150-c
https://lh6.googleusercontent.com/proxy/XgDFopqdfCYmrzYi5NKRDYZYZgmJ2fyAn-9eN9QnXgmBezTviTAYV4ct07peVeZMerMND3ZwgZ-lK8Uv7B6FivV6LXqAJN4E7OVvtKYToSriuCRK4QOTuf6oFXG0KTetG-QJCoHZT77mWJtCGb0jG9tch59MJ0aWWZ1NA5X7wF4aMtoLSHkvAK2cuspI85CpPMj2ayu6wiG-0GT8fcAwRsVW64773g=-w150-h150-c
https://lh6.googleusercontent.com/proxy/WTq4Z72Acy7ykLcNmv9b_B2IQerfE4M7V00dxJ1o65IBz4OPyDzKkDBrVLAvqKdSjOHuTsHwTw5_UBINqKU3tFIOeEjCdsOs3yLkqG52YI-NGYvtw0EUohgu6Ps=-w150-h150-c
https://lh6.googleusercontent.com/proxy/uyiP15MOcfSPSHn6FU_LMK08w-0WiaDKQwC0e5rC67ZmdEqYqDYQDjHCkkH0UB0vxLRNpUw0jyz_YCvZ713hPuAeZM4xk7ZItIzNnkXRv_H-n8196YojFvVHZjbRYNZguxHtN7uAT1hxNvrRlRL-pdG9eiUs58rx_w=-w150-h150-c
https://lh4.googleusercontent.com/proxy/MBgkZaPJYX173801aVCotrRKb5pTCO3orWgu0cgsCuglj0bAQW7Za7HhaQUVk2NIUxg8MQz0qiizFGBTSZXdcUwpsmTPpz2rXYuZpqxgSAsxjZH4RUZW9P74EuM-_KzJQ7fKZx7sVK26kLxXr206i6DWH8LCCFa9utvSWevAS2OlYSYc17a09Z6Af5NJ7FoIJ6jxh1wnsuZhfwebq_4zQGDg4AF9XVrKayVrYExcBtx-kYqgjAqTiswm7YoO1yJGBUNkXh1aXC76C50WEOiYemdoXeygD1KIsRpExccCQZZ_NDjIgRMjBvPAGy-1Ue1xwCiXvgFu1P7AtZ2g0XA=-w150-h150-c
https://lh6.googleusercontent.com/proxy/9TiiPwVKm7hFE4kzBEeQ2nrwR8h3hjJ1KLMw6G5s-3_FzO0QorKfKYkubFvF-YDgMYyujUloeApsM00xuBWgQYJE4vRrcdXFoAik032DyRykwAYl7e87Qjqb2wnNMUpfmX1PTZo_OK6VC69sGQY9ISWevaI09tKI6w=-w150-h150-c
https://lh4.googleusercontent.com/proxy/bePdkudsqFAyihVJPg94KH4SKhVTwB0BSFiXEsCdQEZubli_o9RtbEctWtw1X5CC_x9JqM9vBPhWMyP29eBmkrCzi04osGEwiTOoaeLI3WhPl49w0UKeNvOONgkK3ZbPhJ4RFutptw71nhLoiWX2uAUDChy_zgxYmg=-w150-h150-c
https://lh5.googleusercontent.com/proxy/GgkEfaSJZU_ZrtKUpU3aqKBS-u1fUkz3wDJH7m3iPlFduw_C0yfprGaOGubYqxBdILL69inogeniN3zM83hh2EaBGS5wNJ8PfmSe1bUOy605NxMR19SPSgeLu0hGlrc_d16v-V1OtwxDt2SVDM382UQ=-w150-h150-c
https://lh3.googleusercontent.com/proxy/C04R8mKirMGoXC7SvbOwbMAh2BpjAUcqRhyFEoZhIg2bl55t5jgF6zLwXvAAe4O95ETW1fp1sIQSGgzCoxlFBp51LCEzyfvDiKkxt0LpYzHmNTeIxmGTlmBkRv4wRNGquW0ZBp1AWnjoqaGosgMWv0kQI6QTkgFTEI5tuhrLppr_0Xcfy4JNSqo8bSVxa_fb27Iih5Evf1RUSS6Umnc6wW3cHip0icT7QmfdebQs3LUvSUqHaaeDikc60NOQdZRX8tUcODic84RoOn41vM8NvBrZZuQgImC8GTPavaMTUIsEOK4QxSNUdxKduRcPpa0p=-w150-h150-c
https://lh4.googleusercontent.com/proxy/qpUSiYScbVAjAv63NKloqPadlLXD7xWo0eocfLMerlUozukyVTS4QWZYcJBPmkHuxJZCh85Zh1mEepVEeZH3JSMxQRrcE-4Apawmnw=-w150-h150-c
https://lh6.googleusercontent.com/proxy/CNvMvcbMHHckCXbyFXkRZnnuPr17TEzvspLGwIobu15dDsrlHt-3QKzi7kcHKvTpJZlCr2l-HhWOPBfJJzPRrd2gn734awWdE5jXRcqrnfly4bwnIPokO68_luWur73lg-k=-w150-h150-c
https://lh4.googleusercontent.com/proxy/eDks_caYi82LNyG2L2AMQENYCuL7LfaH5rhL-qT6QiVnb5r142EFLTe4u61mZf-1xE2UkJB9GfcUy4x5IfNOU-JCd78FMn--f1CldoEY9y7ouU5cdZ8=-w150-h150-c
https://lh6.googleusercontent.com/proxy/WsWb8Woo-ogEIuDkvaBpsxrizSDUwM_k6h9w1ma-d_i4f3c9Bpefe6llemcMlZNODP5hx_raBrZ6dlclfDXJpirHGgVuTFp3W_mCdrGWO1LCsQf6Nz3iyjgJIbFFv12K3rC9sy2sfV3kgpQRURxi50MwLLG4lUcDx8LIiHWk5bG-VR9IBpygAMPtL5LJoRN8fkg9Vh7RA-J8kkbDm8-xirGXhkYheENaly7yH3qpIo_3aBYrHzS1GsCULOfpjdEuw2OISw=-w150-h150-c

Related

How to extract specific type of links from website using php?

I am trying to extract specific type of links from the webpage using php
links are like following..
http://www.example.com/pages/12345667/some-texts-available-here
I want to extract all links like in the above format.
maindomain.com/pages/somenumbers/sometexts
So far I can extract all the links from the webpage, but the above filter is not happening. How can i acheive this ?
Any suggestions ?
<?php
$html = file_get_contents('http://www.example.com');
//Create a new DOM document
$dom = new DOMDocument;
#$dom->loadHTML($html);
$links = $dom->getElementsByTagName('a');
//Iterate over the extracted links and display their URLs
foreach ($links as $link){
//Extract and show the "href" attribute.
echo $link->nodeValue;
echo $link->getAttribute('href'), '<br>';
}
?>

You can use DOMXPath and register a function with DOMXPath::registerPhpFunctions to use it after in an XPATH query:
function checkURL($url) {
$parts = parse_url($url);
unset($parts['scheme']);
if ( count($parts) == 2 &&
isset($parts['host']) &&
isset($parts['path']) &&
preg_match('~^/pages/[0-9]+/[^/]+$~', $parts['path']) ) {
return true;
}
return false;
}
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTMLFile($filename);
$xp = new DOMXPath($dom);
$xp->registerNamespace("php", "http://php.net/xpath");
$xp->registerPhpFunctions('checkURL');
$links = $xp->query("//a[php:functionString('checkURL', #href)]");
foreach ($links as $link) {
echo $link->getAttribute('href'), PHP_EOL;
}
In this way you extract only the links you want.

This is a slight guess, but if I got it wrong you can still see the way to do it.
foreach ($links as $link){
//Extract and show the "href" attribute.
If(preg_match("/(?:http.*)maindomain\.com\/pages\/\d+\/.*/",$link->getAttribute('href')){
echo $link->nodeValue;
echo $link->getAttribute('href'), '<br>';
}
}

You already use a parser, so you might step forward and use an xpath query on the DOM. XPath queries offer functions like starts-with() as well, so this might work:
$xpath = new DOMXpath($dom);
$links = $xpath->query("//a[starts-with(#href, 'maindomain.com')]");
Loop over them afterwards:
foreach ($links as $link) {
// do sth. with it here
// after all, it is a DOMElement
}

Parsing RSS with PHP

I'm trying to parse RSS: http://www.mlssoccer.com/rss/en.xml .
$feed = new DOMDocument();
$feed->load($url)
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
foreach($items as $key => $item)
{
$title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
$description = $item->getElementsByTagName('description')->item(0)->firstChild->nodeValue;
// do some stuff
}
The thing is: I'm getting "$title" and "$pubDate" without a problem, but for some reason "$description" is always empty, there's nothing in it. What could be the reason for such behaviour and how to fix it?

The problem was with CDATA you need to use textContent instead of nodeValue to retreive value beetween
<?php
$feed = new DOMDocument();
$feed->load('http://www.mlssoccer.com/rss/en.xml');
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
foreach($items as $key => $item)
{
$title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
$description = $item->getElementsByTagName('description')->item(0)->textContent; // textContent
}

Here can be whitespaces between the opening <description> tag and the opening <![CDATA[. This is a text node.
So if you access the firstChild of description, you might fetch that whitespace text node.
In a generic way you can set the DOMdocument to ignore whitespace nodes:
$feed = new DOMDocument();
$feed->preserveWhiteSpace = FALSE;
$feed->load($url);
Additionally you should check out XPath, it makes reading a DOM much easier:
$xpath = new DOMXpath($feed);
foreach ($xpath->evaluate('//channel/item') as $item) {
$title = $xpath->evaluate('string(title)', $item);
$pubDate = $xpath->evaluate('string(pubDate)', $item);
$description = $xpath->evaluate('string(description)', $item);
// do some stuff
var_dump([$title, $pubData, $description]);
}

RSS parsing with PHP and SimpleXML: How to enter namespaced items?

I am parsing the following RSS feed (relevant part shown)
<item>
<title>xxx</title>
<link>xxx</link>
<guid>xxx</guid>
<description>xxx</description>
<prx:proxy>
<prx:ip>101.226.74.168</prx:ip>
<prx:port>8080</prx:port>
<prx:type>Anonymous</prx:type>
<prx:ssl>false</prx:ssl>
<prx:check_timestamp>1369199066</prx:check_timestamp>
<prx:country_code>CN</prx:country_code>
<prx:latency>20585</prx:latency>
<prx:reliability>9593</prx:reliability>
</prx:proxy>
<prx:proxy>...</prx:proxy>
<prx:proxy>...</prx:proxy>
<pubDate>xxx</pubDate>
</item>
<item>...</item>
<item>...</item>
<item>...</item>
Using the php code
$proxylist_rss = file_get_contents('http://www.xxx.com/xxx.xml');
$proxylist_xml = new SimpleXmlElement($proxylist_rss);
foreach($proxylist_xml->channel->item as $item) {
var_dump($item); // Ok, Everything marked with xxx
var_dump($item->title); // Ok, title
foreach($item->proxy() as $entry) {
var_dump($entry); //empty
}
}
While I can access everything marked with xxx, I cannot access anything inside prx:proxy - mainly because : cannot be present in valid php varnames.
The question is how to reach prx:ip, as example.
Thanks!

Take a look at SimpleXMLElement::children, you can access the namespaced elements with that.
For example: -
<?php
$xml = '<xml xmlns:prx="http://example.org/">
<item>
<title>xxx</title>
<link>xxx</link>
<guid>xxx</guid>
<description>xxx</description>
<prx:proxy>
<prx:ip>101.226.74.168</prx:ip>
<prx:port>8080</prx:port>
<prx:type>Anonymous</prx:type>
<prx:ssl>false</prx:ssl>
<prx:check_timestamp>1369199066</prx:check_timestamp>
<prx:country_code>CN</prx:country_code>
<prx:latency>20585</prx:latency>
<prx:reliability>9593</prx:reliability>
</prx:proxy>
</item>
</xml>';
$sxe = new SimpleXMLElement($xml);
foreach($sxe->item as $item)
{
$proxy = $item->children('prx', true)->proxy;
echo $proxy->ip; //101.226.74.169
}
Anthony.

I would just strip out the "prx:"...
$proxylist_rss = file_get_contents('http://www.xxx.com/xxx.xml');
$proxylist_rss = str_replace('prx:', '', $proxylist_rss);
$proxylist_xml = new SimpleXmlElement($proxylist_rss);
foreach($proxylist_xml->channel->item as $item) {
foreach($item->proxy as $entry) {
var_dump($entry);
}
}
http://phpfiddle.org/main/code/jsz-vga

Try it like this:
$proxylist_rss = file_get_contents('http://www.xxx.com/xxx.xml');
$feed = simplexml_load_string($proxylist_rss);
$ns=$feed->getNameSpaces(true);
foreach ($feed->channel->item as $item){
var_dump($item);
var_dump($item->title);
$proxy = $item->children($ns["prx"]);
$proxy = $proxy->proxy;
foreach ($proxy as $key => $value){
var_dump($value);
}
}

PHP XML file filter on match

I am having a heck of a time getting this working...
What I want to do is filter a xml file by a city (or market in this case).
This is the xml data.
<itemset>
<item>
<id>2171</id>
<market>Vancouver</market>
<url>http://</url></item>
<item>
<id>2172</id>
<market>Toronto</market>
<url>http://</url></item>
<item>
<id>2171</id>
<market>Vancouver</market>
<url>http://</url></item>
This is my code...
<?php
$source = 'get-xml-feed.php.xml';
$xml = new SimpleXMLElement($source);
$result = $xml->xpath('//item/[contains(market, \'Toronto\')]');
while(list( , $node) = each($result)) {
echo '//Item/[contains(Market, \'Toronto\')]',$node,"\n";
}
?>
If I can get this working I would like to access each element, item[0], item[1] base on filtered results.
Thanks

I think this implements what you are looking for using XPath:
<?php
$source = file_get_contents('get-xml-feed.php.xml');
$xml = new SimpleXMLElement($source);
foreach ($xml as $node)
{
$row = simplexml_load_string($node->asXML());
$result = $row->xpath("//item/market[.='Toronto']");
if ($result[0])
{
var_dump($row);
}
}
?>
As another answer mentioned, unless you are wed to the use of XPath it's probably more trouble than it's worth for this application: just load the XML and treat the result as an array.

I propose using simplexml_load_file. The learning curve is less step than using the specific XML objects + XPath. It returns an object in the format you descibe.
Try this and you'll see what I mean:
<?php
$source = 'get-xml-feed.php.xml';
$xml = simplexml_load_file($source);
var_dump($xml);
?>
There is also simplexml_load_string if you just have an XML snippet.

<?php
$source = 'get-xml-feed.php.xml';
//$xml = new SimpleXMLElement($source);
$dom = new DOMDocument();
#$dom->loadHTMLFile($source);
$xml = simplexml_import_dom($dom);
$result = $xml->xpath("//item/market[.='Toronto']/..");
while(list( , $node) = each($result)) {
print_r($node);
}
?>
This will get you the parent nodeset when it contains a node with "Toronto" in it. It returns $node as a simplexml element so you will have to deal with it accordingly (I just printed it as an array).

ignoring nested elements when parsing xml with php

probably a simple question to answer for someone:::
xml:
<foobar>
<foo>i am a foo</foo>
<bar>i am a bar</bar>
<foo>i am a <bar>bar</bar></foo>
</foobar>
In the above, I want to display all elements that are <foo>. When the script gets to the line with the nested < bar > the result is "i am a bar" .. which isn't the result I had hoped for.
Is it not possible to print out the entire contents of that element as it is, so that i see: "i am a <bar>bar</bar>"
php:
$xml = file_get_contents('sample');
$dom = new DOMDocument;
#$dom->loadHTML($xml);
$resources= $dom->getElementsByTagName('foo');
foreach ($resources as $resource){
echo $resource->nodeValue . "\n";
}

After some trolling and trying to do what I needed with SimpleXML, I arrived at the following conclusion. My issue with SimpleXML was where the elements are. If the xml is structured, and the hierarchy is standard ... I have no problem.
If the XML is a web page for example, and the <foo> element is anywhere, SimpleXML doesn't have a good facility like getElementsByTagName to pull out the element wherever it may be....
<?php
$doc = new DOMDocument();
$doc->load('sample');
$element_name = 'foo';
if ($doc->getElementsByTagName($element_name)->length > 0) {
$resources = $doc->getElementsByTagName($element_name);
foreach ($resources as $resource) {
$id = null;
if (!$resource->hasAttribute('id')) {
$resource->setAttribute('id', gen_uuid());
}
$innerHTML = null;
$children = $resource->childNodes;
foreach ($children as $child) {
$tmp_doc = new DOMDocument();
$tmp_doc->appendChild($tmp_doc->importNode($child,true));
$innerHTML .= rtrim($tmp_doc->saveHTML());
}
$resource->nodevalue = $innerHTML;
}
}
echo $doc->saveHTML();
?>

Rather than writing all that code, you might try XPath. That expression would be "//foo", which would get a list of all the elements in the document named "foo".
http://php.net/manual/en/simplexmlelement.xpath.php

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

How to retrieve an attribute from a namespaced element - php

You're missing a call to the attributes() method of your SimpleXMLElement instance: foreach ($items as $item) { $media_content_url = $item->children('http://search.yahoo.com/mrss/')->attributes()->url; // ... }

Related

How to extract specific type of links from website using php?

Parsing RSS with PHP

RSS parsing with PHP and SimpleXML: How to enter namespaced items?

PHP XML file filter on match

ignoring nested elements when parsing xml with php

Categories

Resources