Get XML Attributes using PHP - php

I want to get the URL of the image in the media:content element. The XML document tree is as follows:
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
<channel>
<title>
<![CDATA[ The Star Online Business Highlights ]]>
</title>
<link>/TheStar/Website</link>
<description>...</description>
<image>...</image>
<language>en</language>
<item>
<guid isPermaLink="false">{F88B27DD-24FB-4807-941F-070D772B7586}</guid>
<link>
http://www.thestar.com.my/business/business-news/2017/10/24/top-glove-says-not-buying-adventa-nor-supermax/
</link>
<title>
<![CDATA[ Top Glove says not buying Adventa nor Supermax ]]>
</title>
<description>
<![CDATA[KUALA LUMPUR: Top Glove, which has allocated about RM1bil to expand via mergers, has denied news reports the target companies are Adventa Bhd and Supermax Corporation Bhd.]]>
</description>
<pubDate>Tue, 24 Oct 2017 13:17:18 +08:00</pubDate>
<enclosure url="http://www.thestar.com.my/~/media/online/2017/08/22/03/58/hartalega-glove3.ashx?crop=1&w=0&h=0&" length="" type="image/jpeg"/>
<media:content url="http://www.thestar.com.my/~/media/online/2017/08/22/03/58/hartalega-glove3.ashx?crop=1&w=0&h=0&" type="image/jpeg">
<media:description>
<![CDATA[ ]]>
</media:description>
</media:content>
<section>
<![CDATA[ Business ]]>
</section>
</item>
<item>...</item>
<item>...</item>
<item>...</item>
</channel>
As there are multiple items, I want to process them in a loop. I tried:
foreach($xml->channel->item as $news) {
$media = $news->media->children('http://search.yahoo.com/mrss/');
echo ($media->content);
}
and also
foreach($xml->channel->item as $news) {
$media = $news->children('http://search.yahoo.com/mrss/');
echo ($media->content);
}
but both attempts fail. What is the right method?

The $media variable is of type SimpleXMLElement.
You can loop over $media with foreach and read the url from each child's attributes.
For example (using simplexml_load_string with additional libxml parameters to load your example XML):
$source = <<<SOURCE
//Your example xml here
SOURCE;
$xml = simplexml_load_string($source, "SimpleXMLElement", LIBXML_NOERROR|LIBXML_ERR_NONE|LIBXML_ERR_FATAL);
foreach($xml->channel->item as $news) {
$media = $news->children('http://search.yahoo.com/mrss/');
foreach($media as $child) {
echo $child->attributes()->url;
}
}
Will result in:
http://www.thestar.com.my/~/media/online/2017/08/22/03/58/hartalega-glove3.ashx?crop=1=0=0
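If only the url of media:content is needed, the inner loop can probably be skipped by addressing the element directly after switching to the media namespace. The following is a minimal sketch, not the asker's real feed: the example.com URLs are hypothetical, and the ampersand is written as &amp;amp; on the assumption that the input is well-formed XML.

```php
<?php
// Hypothetical, trimmed-down feed; the ampersand in the URL is escaped as &amp;
// so that the document is well-formed XML.
$source = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>First</title>
      <media:content url="http://example.com/a.jpg?crop=1&amp;w=0" type="image/jpeg"/>
    </item>
    <item>
      <title>Second</title>
      <media:content url="http://example.com/b.jpg" type="image/jpeg"/>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($source);
$urls = [];
foreach ($xml->channel->item as $news) {
    // Switch to the media namespace by URI, then read the unprefixed attribute.
    $media = $news->children('http://search.yahoo.com/mrss/');
    $urls[] = (string) $media->content->attributes()->url;
}
print_r($urls);
```

The cast to string matters here: without it, the array would hold SimpleXMLElement objects rather than plain URLs.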

$xml = new SimpleXMLElement($xml, LIBXML_NOERROR|LIBXML_ERR_NONE|LIBXML_ERR_FATAL);
foreach ($xml->xpath("//media:content") as $node)
{
var_dump ((string) $node["url"]);
}

Related

How do I get the child nodes of this RSS feed?

How can I get the contest logo and start date from this RSS feed? I can get the dc:modified child for example but always get a blank for anything from dc:dataset.
My code:
$feed_url = 'https://www.website.com/?call_custom_simple_rss=1&csrp_post_type=contest&csrp_posts_per_page=2&csrp_show_meta=1';
$feed = file_get_contents($feed_url);
$rss = simplexml_load_string($feed);
foreach($rss->channel->item as $entry) {
echo $entry->children("dc", true)->modified . "<br>";
echo $entry->children("dc", true)->dataset->contest_logo . "<br>";
echo $entry->children("dc", true)->dataset->start_date . "<br>";
}
The RSS feed:
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:wp="http://wordpress.org/export/1.2/" xmlns:excerpt="http://wordpress.org/export/1.2/excerpt/" version="2.0">
<channel>
<title>RSS Title</title>
<description>A website</description>
<lastBuildDate>Wed, 17 Feb 2021 15:03:03 +0000</lastBuildDate>
<item>
<title>
<![CDATA[ Photography Awards ]]>
</title>
<link>
<![CDATA[ /contests/photography-awards/ ]]>
</link>
<pubDate>Mon, 11 Jan 2021 13:52:27 -0600</pubDate>
<dc:identifier>619116</dc:identifier>
<dc:modified>2021-02-09 07:50:10</dc:modified>
<dc:created unix="1610373147">2021-01-11 13:52:27</dc:created>
<dc:dataset>
<contest_logo>
<![CDATA[ 619130 ]]>
</contest_logo>
<start_date>
<![CDATA[ 20210110 ]]>
</start_date>
</dc:dataset>
</item>
</channel>
</rss>
The contest_logo and start_date elements are in the empty namespace, so you have to switch back to it. Additionally, it is not good to rely on the namespace prefixes defined in the document. Use the namespace URI instead (for example, defined in a mapping array in your code).
$rss = simplexml_load_string($feed);
$xmlns = [
'dc' => 'http://purl.org/dc/elements/1.1/'
];
foreach($rss->channel->item as $entry) {
echo $entry->children($xmlns['dc'])->modified . "<br>";
echo $entry->children($xmlns['dc'])->dataset->children('')->contest_logo . "<br>";
echo $entry->children($xmlns['dc'])->dataset->children('')->start_date . "<br>";
}
Output:
2021-02-09 07:50:10<br>
619130
<br>
20210110
<br>
In DOM you would register an alias on the Xpath processor and use it in the expressions. Here is a demo:
$document = new DOMDocument();
$document->loadXML($feed);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');
foreach ($xpath->evaluate('/rss/channel/item') as $entry) {
echo $xpath->evaluate('string(dc:modified)', $entry). "<br>";
echo $xpath->evaluate('string(dc:dataset/contest_logo)', $entry). "<br>";
echo $xpath->evaluate('string(dc:dataset/start_date)', $entry). "<br>";
}
Another alternative - use xpath:
echo $rss->xpath('//dc:dataset/contest_logo')[0] . "\r\n";
echo $rss->xpath('//dc:modified')[0] . "\r\n";
echo $rss->xpath('//start_date')[0] . "\r\n";
Output:
619130
2021-02-09 07:50:10
20210110
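The XPath one-liners above still depend on the prefix the document happens to use. A possible way to combine XPath with the "use the namespace URI" advice is SimpleXMLElement::registerXPathNamespace, sketched below. The feed here is hypothetical and deliberately binds the Dublin Core URI to the prefix x rather than dc.

```php
<?php
// Hypothetical feed that binds the Dublin Core namespace to the prefix "x"
// instead of "dc", to show that the document's own prefix does not matter.
$feed = <<<XML
<rss xmlns:x="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <item>
      <x:modified>2021-02-09 07:50:10</x:modified>
      <x:dataset>
        <start_date><![CDATA[ 20210110 ]]></start_date>
      </x:dataset>
    </item>
  </channel>
</rss>
XML;

$rss = simplexml_load_string($feed);
// Bind our own alias "dc" to the namespace URI for XPath queries on $rss.
$rss->registerXPathNamespace('dc', 'http://purl.org/dc/elements/1.1/');

$modified  = (string) $rss->xpath('//dc:modified')[0];
// start_date has no prefix, so it is addressed without one; trim() drops CDATA padding.
$startDate = trim((string) $rss->xpath('//dc:dataset/start_date')[0]);

echo $modified, "\n", $startDate, "\n";
```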

Delete Selected Items From Google Merchant XML

I want to use PHP to remove items from my Google Merchant XML feed when g:price is 0, the item is out of stock, or it has no image.
I have been trying for hours but have not found a solution yet.
Example: given the XML below, the new XML must list only the second item.
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
<title><![CDATA[example title]]></title>
<link><![CDATA[http://www.example.com]]></link>
<description><![CDATA[example description]]></description>
<item>
<g:additional_image_link><![CDATA[]]></g:additional_image_link>
<g:image><![CDATA[]]></g:image>
<g:availability><![CDATA[out of stock]]></g:availability>
<g:price>0.00 TRY</g:price>
</item>
<item>
<g:image><![CDATA[http://www.example.com/image.jpg]]></g:image>
<g:availability><![CDATA[in stock]]></g:availability>
<g:price>100.00 TRY</g:price>
</item>
</channel>
</rss>
Could someone help me? Expected output is this:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
<title><![CDATA[example title]]></title>
<link><![CDATA[http://www.example.com]]></link>
<description><![CDATA[example description]]></description>
<item>
<g:image><![CDATA[http://www.example.com/image.jpg]]></g:image>
<g:availability><![CDATA[in stock]]></g:availability>
<g:price>100.00 TRY</g:price>
</item>
</channel>
</rss>
Here we use DOMDocument to walk the item nodes, collect the ones that fail any of the three checks, and then remove them:
<?php
ini_set('display_errors', 1);
libxml_use_internal_errors(true);
$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
<title><![CDATA[example title]]></title>
<link><![CDATA[http://www.example.com]]></link>
<description><![CDATA[example description]]></description>
<item>
<g:additional_image_link><![CDATA[]]></g:additional_image_link>
<g:image><![CDATA[]]></g:image>
<g:availability><![CDATA[out of stock]]></g:availability>
<g:price>0.00 TRY</g:price>
</item>
<item>
<g:image><![CDATA[http://www.example.com/image.jpg]]></g:image>
<g:availability><![CDATA[in stock]]></g:availability>
<g:price>100.00 TRY</g:price>
</item>
</channel>
</rss>
XML;
$array = array("g:image", "g:price", "g:availability");
$domObject = new DOMDocument();
$domObject->loadXML($string);
$results = $domObject->getElementsByTagName("item");
$nodesToRemove = array();
foreach ($results as $node)
{
foreach ($node->childNodes as $innerNode)
{
if ($innerNode instanceof DOMElement && in_array($innerNode->tagName, $array))
{
if ($innerNode->tagName == "g:image" && empty($innerNode->textContent))
{
$nodesToRemove[] = $innerNode->parentNode;
break;
} elseif ($innerNode->tagName == "g:price" && preg_match("/\b0+(\.[0]+)\b/", $innerNode->textContent))
{
$nodesToRemove[] = $innerNode->parentNode;
break;
} elseif ($innerNode->tagName == "g:availability" && $innerNode->textContent == "out of stock")
{
$nodesToRemove[] = $innerNode->parentNode;
break;
}
}
}
}
foreach ($nodesToRemove as $node)
{
$node->parentNode->removeChild($node);
}
echo $domObject->saveXML();
Output:
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<title><![CDATA[example title]]></title>
<link><![CDATA[http://www.example.com]]></link>
<description><![CDATA[example description]]></description>
<item>
<g:image><![CDATA[http://www.example.com/image.jpg]]></g:image>
<g:availability><![CDATA[in stock]]></g:availability>
<g:price>100.00 TRY</g:price>
</item>
</channel>
</rss>
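As an alternative sketch, the same three rules could be expressed as a single XPath predicate with DOMXPath, which avoids the nested loops. This is not the answer's method; it assumes that every g:price has the form "amount currency" separated by a space, as in the sample feed.

```php
<?php
$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
<item>
<g:image><![CDATA[]]></g:image>
<g:availability><![CDATA[out of stock]]></g:availability>
<g:price>0.00 TRY</g:price>
</item>
<item>
<g:image><![CDATA[http://www.example.com/image.jpg]]></g:image>
<g:availability><![CDATA[in stock]]></g:availability>
<g:price>100.00 TRY</g:price>
</item>
</channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($string);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('g', 'http://base.google.com/ns/1.0');

// One clause per rule: no non-empty image, out of stock, or a zero price.
// Assumption: the price is "amount currency", so the amount precedes the space.
$query = '//item['
    . 'not(g:image[normalize-space()])'
    . ' or normalize-space(g:availability) = "out of stock"'
    . ' or number(substring-before(normalize-space(g:price), " ")) = 0'
    . ']';

// Copy the matches into an array before modifying the tree.
foreach (iterator_to_array($xpath->query($query)) as $item) {
    $item->parentNode->removeChild($item);
}

$remaining = $xpath->query('//item')->length;
echo $doc->saveXML();
```

An item with no g:price at all is kept by this predicate, because number("") is NaN and NaN never equals 0; add a further clause if such items should also go.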

get media:description and media:content url from xml

I have XML data in which some item tags have a media:content tag and some do not. How can I check whether media:content exists in an item, and how can I get the description under the media:content tag?
Here is XML data:
<rss xmlns:content="" xmlns:wfw="" xmlns:dc="" xmlns:atom="" xmlns:sy="" xmlns:slash="" version="2.0">
<channel>
<item>
<title>Title1</title>
<link>Link</link>
<pubDate>Date</pubDate>
<content:encoded>
<![CDATA[ This is description 1 ]]>
<![CDATA[ This is description 2 ]]>
</content:encoded>
<media:content url="URL" type="image/jpeg">
<media:description>
<![CDATA[ Text ]]>
</media:description>
</media:content>
</item>
<item> -- this item tag does not have media: content
<title>Title2</title>
<link>Link2</link>
<pubDate>Date2</pubDate>
<content:encoded>
<![CDATA[ This is description 3 ]]>
<![CDATA[ This is description 4 ]]>
</content:encoded>
</item>
<item>
<title>Title3</title>
<link>Link3</link>
<pubDate>Date3</pubDate>
<content:encoded>
<![CDATA[ This is description 5 ]]>
<![CDATA[ This is description 6 ]]>
</content:encoded>
<media:content url="UR1L" type="image/jpeg">
<media:description>
<![CDATA[ Text 2 ]]>
</media:description>
</media:content>
</item>
</channel>
</rss>
What I tried is:
<?php
function feeds()
{
$url = "http://localhost/xmldata/xmld.xml"; // xmld.xml contains above data
$feeds = file_get_contents($url);
$rss = simplexml_load_string($feeds);
foreach($rss->channel->item as $entry) {
if($entry->children('media', true)->content->attributes()) {
$md = $entry->children('media', true)->content->attributes();
print_r("$md->url");
}
}
}
?>
It returns the following error:
Node no longer exists
I also have no idea how to get the media:description inside the media:content tag.
You can use isset to check if the 'media:content' property is set on the SimpleXMLElement.
I think it would help if you change these lines:
foreach($rss->channel->item as $entry) {
if($entry->children('media', true)->content->attributes()) {
$md = $entry->children('media', true)->content->attributes();
print_r("$md->url");
}
}
To these lines:
$rss = @simplexml_load_string($feeds);
foreach ($rss->channel->item as $entry) {
if (isset($entry->{'media:content'})) {
$url = (string)$entry->{'media:content'}->attributes()->url;
$description = (string)$entry->{'media:content'}->{'media:description'};
echo "$url<br>";
echo "$description<br>";
}
}
Will result in:
URL
Text
UR1L
Text 2
This works for me:
$namespaces = $entry->getNamespaces(true);
$media_url = trim((string)$entry->children($namespaces['media'])->content->attributes()->url);
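The existence check and the namespace lookup can be combined. The sketch below assumes a feed that, unlike the sample in the question, actually declares xmlns:media on the root element; the feed content is hypothetical.

```php
<?php
// Hypothetical feed; unlike the sample in the question, it declares xmlns:media.
$feeds = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title>Title1</title>
      <media:content url="URL" type="image/jpeg">
        <media:description><![CDATA[ Text ]]></media:description>
      </media:content>
    </item>
    <item>
      <title>Title2</title>
    </item>
  </channel>
</rss>
XML;

$rss = simplexml_load_string($feeds);
$found = [];
foreach ($rss->channel->item as $entry) {
    $media = $entry->children('http://search.yahoo.com/mrss/');
    // Skip items that have no <media:content> child.
    if (!isset($media->content)) {
        continue;
    }
    $found[] = [
        'url'         => (string) $media->content->attributes()->url,
        // description is looked up in the media namespace as well.
        'description' => trim((string) $media->content->description),
    ];
}
print_r($found);
```

Checking with isset before touching attributes() is what avoids the "Node no longer exists" warning for items without media:content.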

xml DOM : delete element with condition

This may already have been answered in one form or another, but since I'm a newbie in XML, I can't figure it out for my project.
I have an RSS (XML) file with this structure:
<rss>
<channel>
<item>
<title>some title</title>
<description> some descrp </description>
...
</item>
</channel>
</rss>
How can I, in PHP, delete an item when its title is equal to some value? Thanks.
EDIT1 : I have my XML file stored at my web server.
$rss = "
<rss>
<channel>
<item>
<title>some title</title>
<description> some descrp </description>
</item>
<item>
<title>some other title</title>
<description> some descrp </description>
</item>
</channel>
</rss>
";
$doc = new DOMDocument();
$doc->loadXML($rss);
$xpath = new DOMXPath($doc);
$els = $xpath->query('//title[text()="some title"]');
foreach($els as $el)
{
$parent = $el->parentNode;
$parent->parentNode->removeChild($parent);
}
echo $doc->saveXML();
This searches for an exact match on the title text.
PS: another method, without XPath:
$doc = new DOMDocument();
$doc->loadXML($rss);
$els = $doc->getElementsByTagName('title');
for($i = $els->length-1; $i >= 0; $i--)
{
$el = $els->item($i);
if ($el->nodeValue == 'some title')
{
$parent = $el->parentNode;
$parent->parentNode->removeChild($parent);
}
}
echo $doc->saveXML();
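Because both versions above compare the title text exactly, a title padded with whitespace would not match. A possible variation, sketched here under the assumption that surrounding whitespace should be ignored, selects the item itself and compares normalize-space(title):

```php
<?php
$rss = <<<XML
<rss>
  <channel>
    <item>
      <title> some title </title>
      <description> some descrp </description>
    </item>
    <item>
      <title>some other title</title>
      <description> some descrp </description>
    </item>
  </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($rss);
$xpath = new DOMXPath($doc);

// Select the <item> itself; normalize-space() ignores padding around the title.
$items = $xpath->query('//item[normalize-space(title) = "some title"]');
foreach (iterator_to_array($items) as $item) {
    $item->parentNode->removeChild($item);
}

// Collect the titles that are left, for inspection.
$titles = [];
foreach ($xpath->query('//item/title') as $title) {
    $titles[] = trim($title->textContent);
}
echo $doc->saveXML();
```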

XML reforming with DOM

I am trying to restructure XML by adding an intermediate-level node.
Here is what I have as input:
<channel>
<item>
<title>Advanced PHP Book</title>
</item>
<item>
<title>MySQL primer</title>
</item>
<item>
<title>C++ for beginners</title>
</item>
</channel>
I need it to be like that at the end (page node added between channel and item):
<channel>
<page>
<item>
<title>Advanced PHP Book</title>
</item>
<item>
<title>MySQL primer</title>
</item>
<item>
<title>C++ for beginners</title>
</item>
</page>
</channel>
Here is my testing code:
$sxe = simplexml_load_string($string);
$dom_sxe = dom_import_simplexml($sxe);
$dom = new DOMDocument('1.0');
$channel = $dom->appendChild($dom->createElement('channel'));
$page = $channel->appendChild($dom->createElement('page'));
$dom_sxe = $dom->importNode($dom_sxe, true);
$dom_sxe = $page->appendChild($dom_sxe);
$dom->formatOutput = true;
echo $dom->saveXML();
The problem is that the channel element ends up duplicated.
Please help.
I don't think this should be too hard: I think you're overcomplicating it by using the simplexml stuff.
$dom = new DOMDocument;
$dom->loadXML($string);
// create the <page> element
$page = $dom->createElement('page');
while ($dom->firstChild->firstChild) {
// move the items in <channel> to the <page> element
$page->appendChild($dom->firstChild->firstChild);
}
// insert the <page> element into <channel>
$dom->firstChild->appendChild($page);
echo $dom->saveXML();
$xml = '<channel> <item> <title>Advanced PHP Book</title> </item> <item> <title>MySQL primer</title> </item> <item> <title>C++ for beginners</title> </item> </channel>';
$dom = new DOMDocument;
$dom->loadXML($xml);
$page = $dom->createElement('page');
$items = $dom->getElementsByTagName('item');
while ($items->length) {
$page->appendChild($items->item(0));
}
$dom->getElementsByTagName('channel')->item(0)->appendChild($page);
echo $dom->saveXML();
Output
<?xml version="1.0"?>
<channel> <page><item> <title>Advanced PHP Book</title> </item><item> <title>MySQL primer</title> </item><item> <title>C++ for beginners</title> </item></page></channel>
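The output above keeps the original whitespace, so it is not indented. If readable output matters (the question sets formatOutput), one option is to discard whitespace-only text nodes on load so that DOMDocument can re-indent the tree. A minimal sketch, using a shortened version of the input:

```php
<?php
$xml = '<channel> <item> <title>Advanced PHP Book</title> </item> <item> <title>MySQL primer</title> </item> </channel>';

$dom = new DOMDocument();
// Drop whitespace-only text nodes on load so formatOutput can re-indent.
$dom->preserveWhiteSpace = false;
$dom->loadXML($xml);

$channel = $dom->documentElement;
$page = $dom->createElement('page');
// Moving a node removes it from <channel>, so the loop ends when it is empty.
while ($channel->firstChild) {
    $page->appendChild($channel->firstChild);
}
$channel->appendChild($page);

$dom->formatOutput = true;
$output = $dom->saveXML();
echo $output;
```

preserveWhiteSpace must be set before loadXML; setting it afterwards has no effect on nodes that are already parsed.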
