Getting namespaces out of XML from simplexml_load_file in PHP

I am trying to parse this YouTube XML using simplexml_load_file in php.
The XML feed can be found here:
https://www.youtube.com/feeds/videos.xml?playlist_id=PL1mm1FfX5EHRjGyoBpEXBRIGAmCNt8pBT
Below in php I am trying to iterate through the media groups nested inside each entry node.
<?php
$xmlFeed=simplexml_load_file('https://www.youtube.com/feeds/videos.xml?playlist_id=PL1mm1FfX5EHRjGyoBpEXBRIGAmCNt8pBT')
or die("Cannot load YouTube video feed, please try again later.");
foreach ($xmlFeed->entry->children('media', true)->group as $video) {
echo $video->title;
echo $video->description;
echo $video->thumbnail->getNameSpaces(true);
}
?>
Title and description print just fine. But I'm trying to get at the thumbnail URL found in this namespace:
<media:thumbnail url="https://i1.ytimg.com/vi/HEYQXVGnwXc/hqdefault.jpg" width="480" height="360"/>
I've tried all 3 of the following:
echo $video->thumbnail->getNameSpaces(true);
echo $video->thumbnail->getNameSpaces(true)['url'];
echo $video->thumbnail->getNameSpaces(true)->url;
None return the url. The first returns Array and the last two are blank. What am I missing?

Several things. First, use the attributes() method: url is an attribute of <media:thumbnail>, not a child element. Second, you don't need getNamespaces(true) at all, because children('media', true) in the loop already selects the media namespace. Finally, your loop does not cover every media:group: $xmlFeed->entry refers only to the first <entry>, so you get one set of values rather than one per <entry> node. Add an outer loop that iterates over the <entry> nodes.
// Outer loop: every <entry>; inner loop: each media:group inside it
foreach ($xmlFeed->entry as $entry) {
    foreach ($entry->children('media', true)->group as $video) {
        echo $video->title."\n";
        echo $video->description."\n";
        // url is an un-namespaced attribute, so read it via attributes()
        echo $video->thumbnail->attributes()->url."\n";
    }
}
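As a self-contained illustration of the same idea, here is a minimal sketch that runs against an inline sample instead of the live feed. The sample XML, the example.com URLs, and the variable names are assumptions made up for the example; only the element and attribute names come from the question.

```php
<?php
// Inline sample standing in for the YouTube feed (structure assumed from the question).
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <media:group>
      <media:title>First video</media:title>
      <media:thumbnail url="https://example.com/a.jpg" width="480" height="360"/>
    </media:group>
  </entry>
  <entry>
    <media:group>
      <media:title>Second video</media:title>
      <media:thumbnail url="https://example.com/b.jpg" width="480" height="360"/>
    </media:group>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($xml);
$thumbnailUrls = [];
foreach ($feed->entry as $entry) {
    // Switch into the media namespace (by prefix) to reach media:group
    $group = $entry->children('media', true)->group;
    // attributes() with no arguments returns the un-namespaced attributes
    $thumbnailUrls[] = (string) $group->thumbnail->attributes()->url;
}
print_r($thumbnailUrls);
```

The (string) cast matters when you store the value: without it the array would hold SimpleXMLElement objects rather than plain strings.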
XPATH Alternative
Alternatively, you could do this with XPath: register the media namespace, query the exact locations, and iterate over each result set:
$xmlFeed->registerXPathNamespace('media', 'http://search.yahoo.com/mrss/');
// ARRAYS TO HOLD XML VALUES
$videos = $xmlFeed->xpath('//media:group');
$title = $xmlFeed->xpath('//media:group/media:title');
$description = $xmlFeed->xpath('//media:group/media:description');
$url = $xmlFeed->xpath('//media:group/media:thumbnail/@url');
// ITERATING THROUGH EACH ARRAY
for($i = 0; $i < sizeof($videos); $i++) {
echo $title[$i]."\n";
echo $description[$i]."\n";
echo $url[$i]."\n";
}
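The parallel arrays above line up only if every media:group has exactly one title, description, and thumbnail. A possible variation, sketched here against a made-up inline sample (the XML, URLs, and variable names are assumptions), queries relative to each group so that values from the same video stay together:

```php
<?php
// Inline sample standing in for the feed (assumed structure).
$xml = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <media:group>
      <media:title>First video</media:title>
      <media:thumbnail url="https://example.com/a.jpg"/>
    </media:group>
  </entry>
  <entry>
    <media:group>
      <media:title>Second video</media:title>
      <media:thumbnail url="https://example.com/b.jpg"/>
    </media:group>
  </entry>
</feed>
XML;

$mediaNs = 'http://search.yahoo.com/mrss/';
$feed = simplexml_load_string($xml);
$feed->registerXPathNamespace('media', $mediaNs);

$thumbnailByTitle = [];
foreach ($feed->xpath('//media:group') as $group) {
    // Registered again on each group as a precaution, since the
    // registration is made on an individual SimpleXMLElement object
    $group->registerXPathNamespace('media', $mediaNs);
    // Relative paths: evaluated from the current media:group
    $title = $group->xpath('media:title');
    $url = $group->xpath('media:thumbnail/@url');
    if ($title && $url) {
        $thumbnailByTitle[(string) $title[0]] = (string) $url[0];
    }
}
print_r($thumbnailByTitle);
```

Groups that lack a title or a thumbnail are simply skipped rather than shifting the indexes of the other arrays.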

Related

Extracting XML Data with Foreach Loops, Results Inconsistent

I am extracting XML data using DOMDocument and foreach loops. I am pulling certain attributes and node values from the XML document and creating variables with that data. I am then echoing the variables.
I have successfully completed this for the first portion of the XML data, which lives in the <VehicleDescription> tag. However, using the same logic with data within the <style> tags, I have been having issues. Specifically, the created variables won't echo unless they are in the foreach loop. See the code below for clarification.
My php:
<?php
$vehiclexml = $_POST['vehiclexml'];
$xml = file_get_contents($vehiclexml);
$dom = new DOMDocument();
$dom->loadXML($xml);
//This foreach loop works perfectly, the variables echo below:
foreach ($dom->getElementsByTagName('VehicleDescription') as $vehicleDescription){
$year = $vehicleDescription->getAttribute('modelYear');
$make = $vehicleDescription->getAttribute('MakeName');
$model = $vehicleDescription->getAttribute('ModelName');
$trim = $vehicleDescription->getAttribute('StyleName');
$id = $vehicleDescription->getAttribute('id');
$BodyType = $vehicleDescription->getAttribute('altBodyType');
$drivetrain = $vehicleDescription->getAttribute('drivetrain');
}
//This foreach loop works; however, the variables don't echo below, they will only echo within the loop. How can I resolve this?
foreach ($dom->getElementsByTagName('style') as $style){
$displacement = $style->getElementsByTagName('displacement')->item(0)->nodeValue;
}
echo "<b>Year:</b> ".$year;
echo "<br>";
echo "<b>Make:</b> ".$make;
echo "<br>";
echo "<b>Model:</b> ".$model;
echo "<br>";
echo "<b>Trim:</b> ".$trim;
echo "<br>";
echo "<b>Drivetrain:</b> ".$drivetrain;
echo "<br>";
//Displacement will not echo
echo "<b>Displacement:</b> ".$displacement;
?>
Here is the XML file it is pulling from:
<VehicleDescription country="US" language="en" modelYear="2019" MakeName="Toyota" ModelName="RAV4" StyleName="LE" id="1111" altBodyType="SUV" drivetrain="AWD">
<style modelYear="2019" name="Toyota RAV4 LE" passDoors="4">
<make>Toyota</make>
<model>RAV4</model>
<style>LE</style>
<drivetrain>AWD</drivetrain>
<displacement>2.5 liter</displacement>
<cylinders>4-cylinder</cylinders>
<gears>8-speed</gears>
<transtype>automatic</transtype>
<horsepower>203</horsepower>
<torque>184</torque>
</style>
</VehicleDescription>
Any help or insight as to why variables from the first foreach loop echo but variables from the second don't would be greatly appreciated.
Thanks!
Just to post an alternative solution to the way you've fixed this.
As you've found, there are two <style> tags, one nested inside the other, so the foreach visits both. The nested <style>LE</style> has no <displacement> child, so the second iteration overwrites $displacement with an empty value. Since you only want the contents of the first tag, you can drop the foreach loop and use the item() method:
$displacement = $dom->getElementsByTagName('style')->item(0)
->getElementsByTagName('displacement')->item(0)->nodeValue;
This also applies to how you fetch the data from the <VehicleDescription> tag. Drop the foreach and use
$vehicleDescription = $dom->getElementsByTagName('VehicleDescription')->item(0);
The error was within the XML document.
Within the <style> tags was another set of <style> tags. Changing the name of the second set solved this issue.
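If the XML cannot be changed, another option is to keep the loop but guard against elements that lack the child you want. The following is only a sketch on a trimmed copy of the XML from the question; the null check and the early break are my own additions, not part of either answer.

```php
<?php
// Trimmed copy of the question's XML: a <style> element nested inside another <style>.
$xml = '<VehicleDescription modelYear="2019">'
     . '<style name="Toyota RAV4 LE">'
     . '<style>LE</style>'
     . '<displacement>2.5 liter</displacement>'
     . '</style>'
     . '</VehicleDescription>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// Matches both the outer and the nested <style>, in document order
$styles = $dom->getElementsByTagName('style');

$displacement = null;
foreach ($styles as $style) {
    $node = $style->getElementsByTagName('displacement')->item(0);
    // item(0) is null when this <style> has no <displacement> descendant
    if ($node !== null) {
        $displacement = $node->nodeValue;
        break; // stop at the first match so later elements cannot overwrite it
    }
}
echo "Displacement: " . $displacement;
```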

simplexml_load_file : if main tag contains something

I'm not so sure about the title, will try to explain in the next lines.
I have an xml file like this :
<CAR park="3" id="1" bay="0">
<SITE_ID>0</SITE_ID>
<SITE_NAME>Car Seller 1</SITE_NAME>
. . .
</CAR>
I am successfully iterating through my XML to get all the data.
But, I want to be able to filter by bays. I want to do something like
$xml = simplexml_load_file('myfile.xml');
$x = 1;
foreach($xml as $car) {
if($car->bay == '0'){
echo $car->SITE_ID;
$x++;
}
}
You can use XPath to fetch only the bay 0 cars...
$bay0 = $xml->xpath('//CAR[@bay="0"]');
foreach ( $bay0 as $car ) {
echo $car->SITE_ID.PHP_EOL;
}
The XPath expression selects any CAR element whose bay attribute has the value 0.
If you need to read attributes elsewhere, SimpleXML exposes them as though they were array elements, so in the code you had above it would be $car['bay'].
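For comparison, the loop from the question can be made to work with array-style attribute access. This is a minimal sketch; the <CARS> wrapper element and the extra sample cars are assumptions, since the question shows only a single <CAR>.

```php
<?php
// Sample data; the <CARS> root and the second and third cars are made up for illustration.
$xml = simplexml_load_string(
    '<CARS>'
    . '<CAR park="3" id="1" bay="0"><SITE_ID>0</SITE_ID></CAR>'
    . '<CAR park="1" id="2" bay="2"><SITE_ID>7</SITE_ID></CAR>'
    . '<CAR park="2" id="3" bay="0"><SITE_ID>9</SITE_ID></CAR>'
    . '</CARS>'
);

$siteIds = [];
foreach ($xml->CAR as $car) {
    // $car['bay'] reads the attribute; $car->bay would look for a child element
    if ((string) $car['bay'] === '0') {
        $siteIds[] = (string) $car->SITE_ID;
    }
}
print_r($siteIds);
```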

How to parse malformed RSS feed from third party sites using php?

I'm trying to parse RSS feeds from several media sites. My script works for most of them. The problem is that I need to aggregate all of them, even though some are malformed.
I can't get the description from these two feeds. How can I proceed anyway?
Here is my script :
<?php
function RSS_items ($url) {
$i = 0;
$doc = new DOMDocument();
$doc->load($url);
$channels = $doc->getElementsByTagName('channel');
foreach($channels as $channel) {
$items = $channel->getElementsByTagName('item');
foreach($items as $item) {
$i++;
$y[$i]['title'] = $item->getElementsByTagName('title')->item(0)->firstChild->textContent;
$y[$i]['link'] = $item->getElementsByTagName('link')->item(0)->firstChild->textContent;
$y[$i]['updated'] = $item->getElementsByTagName('pubDate')->item(0)->firstChild->textContent;
$y[$i]['description'] = $item->getElementsByTagName('description')->item(0)->firstChild->textContent;
}
}
echo '<pre>';
print_r ($y);
echo '</pre>';
}
// the two malformed feeds
RSS_items ('http://www.lefigaro.fr/rss/figaro_actualites-a-la-une.xml');
RSS_items ('https://francais.rt.com/rss');
?>
The problem in your code is the firstChild property, which selects the first child node of an element. In the target XML the description tag has no child nodes, so there is nothing for firstChild to select. Remove it from the code; the result should look like this:
$item->getElementsByTagName('description')->item(0)->textContent;
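The same precaution can be applied to every field, including elements that are missing altogether. Below is a minimal sketch using a hypothetical helper, childText() (not part of DOM or of the original script), run against a made-up inline feed:

```php
<?php
// Hypothetical helper: text of the first matching descendant, or '' if there is none.
function childText(DOMElement $parent, string $tag): string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? '' : trim($node->textContent);
}

// Made-up feed: the first item has a CDATA description, the second has none at all.
$doc = new DOMDocument();
$doc->loadXML(
    '<rss><channel>'
    . '<item><title>A</title><description><![CDATA[Hello]]></description></item>'
    . '<item><title>B</title></item>'
    . '</channel></rss>'
);

$rows = [];
foreach ($doc->getElementsByTagName('item') as $item) {
    $rows[] = [
        'title' => childText($item, 'title'),
        'description' => childText($item, 'description'),
    ];
}
print_r($rows);
```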

How to get iTunes-specific child nodes of RSS feeds?

I'm trying to process an RSS feed using PHP and there are some tags such as 'itunes:image' which I need to process. The code I'm using is below, and for some reason these elements are not returning any value: the output shows a length of 0.
How can I read these tags and get their attributes?
$f = $_REQUEST['feed'];
$feed = new DOMDocument();
$feed->load($f);
$items = $feed->getElementsByTagName('channel')->item(0)->getElementsByTagName('item');
foreach($items as $key => $item)
{
$title = $item->getElementsByTagName('title')->item(0)->firstChild->nodeValue;
$pubDate = $item->getElementsByTagName('pubDate')->item(0)->firstChild->nodeValue;
$description = $item->getElementsByTagName('description')->item(0)->textContent; // textContent
$arrt = $item->getElementsByTagName('itunes:image');
print_r($arrt);
}
getElementsByTagName is specified by DOM, and PHP is just following that. It doesn't consider namespaces. Instead, use getElementsByTagNameNS, which requires the full namespace URI (not the prefix). This appears to be http://www.itunes.com/dtds/podcast-1.0.dtd*. So:
// item(0) returns the first matching element, or null if the item has none
$img = $item->getElementsByTagNameNS('http://www.itunes.com/dtds/podcast-1.0.dtd', 'image')->item(0);
// Set preemptive fallback, then set value if check passes
$urlImage = '';
if ($img) {
    $urlImage = $img->getAttribute('href');
}
Or put the namespace in a constant.
You might be able to get away with simply removing the prefix and getting all image tags of any namespace with getElementsByTagName.
Make sure to check whether a given item has an itunes:image element at all (example now given); in the example podcast, some don't, and I suspect that was also giving you trouble. (If there's no href attribute, getAttribute will return either null or an empty string per the DOM spec without erroring out.)
*In case you're wondering, there is no actual DTD file hosted at that location, and there hasn't been for about ten years.
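Putting those pieces together, here is a minimal end-to-end sketch against a made-up inline feed. The sample XML, the example.com URL, and the variable names are assumptions for illustration; the namespace URI is the one given above.

```php
<?php
$itunesNs = 'http://www.itunes.com/dtds/podcast-1.0.dtd';

// Made-up feed: only the first item carries an itunes:image element.
$xml = <<<XML
<rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" version="2.0">
  <channel>
    <item>
      <title>Episode 1</title>
      <itunes:image href="https://example.com/ep1.jpg"/>
    </item>
    <item>
      <title>Episode 2</title>
    </item>
  </channel>
</rss>
XML;

$feed = new DOMDocument();
$feed->loadXML($xml);

$images = [];
foreach ($feed->getElementsByTagName('item') as $item) {
    $title = $item->getElementsByTagName('title')->item(0)->textContent;
    // Look the element up by namespace URI and local name, not by prefix
    $img = $item->getElementsByTagNameNS($itunesNs, 'image')->item(0);
    // Fall back to an empty string when the item has no itunes:image
    $images[$title] = $img ? $img->getAttribute('href') : '';
}
print_r($images);
```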
<?php
$rss_feed = simplexml_load_file("url link");
if (!empty($rss_feed)) {
    foreach ($rss_feed->channel->item as $feed_item) {
        // Read the itunes:image href of the current item
        echo $feed_item->children('itunes', true)->image->attributes()->href;
    }
}
?>

RSS to HTML with varying number of elements

I've adapted the code found here http://www.w3schools.com/php/php_ajax_rss_reader.asp to turn XML into HTML, and it works fine.
But what I'm stuck on is getting it to show all the items in a feed when the feed can have varying numbers of items. The feed is published daily and can have anywhere from 12-20 articles in it, and I want to show all of them.
In the For Loop for ($i=0; $i<=12; $i++) if I set the condition to be greater than the number of articles, I get an error PHP Fatal error: Call to a member function getElementsByTagName(), so I can't just set it to a big number.
I get the same error if I just remove the condition.
I can't figure out how to count the number of items, either; if I could do that the solution would be easy.
The feed is created in-house so I could ask my colleague to insert the number of items in the feed; is that the best way to go about it?
Thanks!
If you don't know the number of items in the feed, you can go through them all using a foreach loop. Here is an example using the RSS feed from the PHP tag on StackOverflow. Have a look at the rss format so you can see what each entry looks like, and compare it to the code below.
# start off like the w3schools code...
$xml=("https://stackoverflow.com/feeds/tag?tagnames=php&sort=newest");
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
# StackOverflow uses the <entry> element for each separate item.
# find all the "entry" items. This returns a DOMNodeList of matching entry elements
$items = $xmlDoc->getElementsByTagName('entry');
# go through the list of "entry" elements one at a time
# $items is the list of <entry> elements
# $i is set to each <entry> in turn, starting from the first one on the page
foreach ($items as $i) {
# some sample code to get the title, tags, and link
$title = $i->getElementsByTagName('title')->item(0)->nodeValue;
$href = $i->getElementsByTagName('link')->item(0)->getAttribute('href');
$tags = $i->getElementsByTagName('category');
$tag_arr = [];
foreach ($tags as $t) {
$tag_arr[] = $t->getAttribute('term');
}
echo "Title: $title; tags: " . implode(", ", $tag_arr) . ";\nhref: $href\n\n";
}
Using a foreach loop means you are not stuck with having to work out how many items you have in your array, and you don't have to set up an array iterator using for ($i = 0; $i < 500; $i++).
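If you do want the item count, for example to keep an index-based for loop, the list returned by getElementsByTagName has a length property. A minimal sketch against a made-up inline feed (the sample XML and variable names are assumptions):

```php
<?php
// Made-up Atom-style feed with three entries.
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    . '<entry><title>One</title></entry>'
    . '<entry><title>Two</title></entry>'
    . '<entry><title>Three</title></entry>'
    . '</feed>'
);

$items = $xmlDoc->getElementsByTagName('entry');
// length holds the number of matching elements, so the loop bound is never guessed
$count = $items->length;

$titles = [];
for ($i = 0; $i < $count; $i++) {
    $titles[] = $items->item($i)->getElementsByTagName('title')->item(0)->nodeValue;
}
echo "$count items: " . implode(", ", $titles) . "\n";
```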
