How to Parse XML's Media:Content with PHP? - php

I've found a great tutorial on how to accomplish most of the work at:
https://www.developphp.com/video/PHP/simpleXML-Tutorial-Learn-to-Parse-XML-Files-and-RSS-Feeds
but I can't understand how to extract media:content images from the feeds. I've read as much info as i can find, but i'm still stuck.
ie: How to get media:content with SimpleXML
this suggests using:
foreach ($xml->channel->item as $news){
$ns_media = $news->children('http://search.yahoo.com/mrss/');
echo $ns_media->content; // displays "<media:content>"}
but i can't get it to work.
Here's my script and feed i'm trying to parse:
<?php
$html = "";
$url = "http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC";
$xml = simplexml_load_file($url);
for($i = 0; $i < 10; $i++){
$title = $xml->channel->item[$i]->title;
$link = $xml->channel->item[$i]->link;
$description = $xml->channel->item[$i]->description;
$pubDate = $xml->channel->item[$i]->pubDate;
$html .= "<a href='$link'><h3>$title</h3></a>";
$html .= "$description";
$html .= "<br />$pubDate<hr />";
}
echo $html;
?>
I don't know where to add this code into the script to make it work. Honestly, i've browsed for hours, but couldn't find working script that would parse media:content.
Can someone help with this?
========================
UPDATE:
Thanx to fusion3k, i got the final code working:
<?php
$html = "";
$url = "http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC";
$xml = simplexml_load_file($url);
for($i = 0; $i < 5; $i++){
$image = $xml->channel->item[$i]->children('media', True)->content->attributes();
$title = $xml->channel->item[$i]->title;
$link = $xml->channel->item[$i]->link;
$description = $xml->channel->item[$i]->description;
$pubDate = $xml->channel->item[$i]->pubDate;
$html .= "<img src='$image' alt='$title'>";
$html .= "<a href='$link'><h3>$title</h3></a>";
$html .= "$description";
$html .= "<br />$pubDate<hr />";
}
echo $html;
?>
Basically all i needed was this simple line:
$image = $xml->channel->item[$i]->children('media', True)->content->attributes();
Can't believe it was so hard for non techie to find this info online after reading dozens of posts and articles. Well, hope this will serve well for other folks like me :)

To get 'url' attribute, use ->attribute() syntax:
$ns_media = $news->children('http://search.yahoo.com/mrss/');
/* Echoes 'url' attribute: */
echo $ns_media->content->attributes()['url'];
// in php < 5.5: $attr = $ns_media->content->attributes(); echo $attr['url'];
/* Catches 'url' attribute: */
$url = $ns_media->content->attributes()['url']->__toString();
// in php < 5.5: $attr = $ns_media->content->attributes(); $url = $attr['url']->__toString();
Namespaces explanation:
The ->children() arguments is not the URL of your XML, it is a Namespace URI.
XML namespaces are used for providing uniquely named elements and attributes in an XML document:
<xxx> Standard XML tag
<yyy:zzz> Namespaced tag
└┬┘ └┬┘
│ └──── Element Name
└──────── Element Prefix (Namespace Identifier)
So, in your case, <media:content> is the “content” element of Namespace “media”. Namespaced elements must be have an associated Namespace URI, as attribute of a parent node or — most commonly — of the root element: this attribute has the form xmlns:yyy="NamespaceURI" (in your case xmlns:media="http://search.yahoo.com/mrss/" as attribute of root node <rss>).
Ultimately, the above $news->children( 'http://search.yahoo.com/mrss/' ) means “retrieve all children elements with http://search.yahoo.com/mrss/ as Namespace URI; an alternative — most intelligible — syntax is: $news->children( 'media', True ) (True means “regarded as a prefix”).
Returning to the code in example, the generic syntax to retrieve all first item's children with prefix media is:
$xml = simplexml_load_file( 'http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC' );
$xml->channel->item[0]->children( 'http://search.yahoo.com/mrss/' );
or (identical result):
$xml = simplexml_load_file( 'http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC' );
$xml->channel->item[0]->children( 'media', True );
Your new code:
If you want to show the <media:content url> thumbnail for each element in your page, modify the original code in this way:
(...)
$pubDate = $xml->channel->item[$i]->pubDate;
$image = $xml->channel->item[$i]->children( 'media', True )->content->attributes()['url'];
// in php < 5.5:
// $attr = $xml->channel->item[$i]->children( 'media', True )->content->attributes();
// $image = $attr['url'];
$html .= "<a href='$link'><h3>$title</h3></a>";
$html .= "<img src='$image' alt='$title'>";
(...)

Simple example for newbs like me:
$url = "https://www.youtube.com/feeds/videos.xml?channel_id=UCwNPPl_oX8oUtKVMLxL13jg";
$rss = simplexml_load_file($url);
foreach($rss->entry as $item) {
$time = $item->published;
$time = date('Y-m-d \ H:i', strtotime($time));
$media_group = $item->children( 'media', true );
$title = $media_group->group->title;
$description = $media_group->group->description;
$views = $media_group->group->community->statistics->attributes()['views'];
}
echo $time . ' :: ' . $title . '<br>' . $description . '<br>' . $views . '<br>';

Related

Mixing PHP and HTML for Search Results Page

i'm working on setting up a page that uses PHP to layout the page, but i also need this PHP HTML code inserted into the PHP page for a search function. The search is done on a different page, and then the action is sent to the results page. I'm trying to get the PHP and HTML to mix together. I've tried using echo to no success. Basically i need the PHP HTML code to put the results into the $layout->content("");
<?php
require_once($_SERVER["DOCUMENT_ROOT"].'/layout/layout.inc.php');
require_once($_SERVER["DOCUMENT_ROOT"].'/functions/general.inc.php');
$layout = new default_layout();
$layout->title('IT KB Search');
$layout->content("<div class='border'>");
$layout->content('<h1>IT Support Knowledge Base - Search Results</h1>');
if (isset($_GET['q'])) {
$query = rawurlencode( strip_tags($_GET['q']));
$timestamp = time();
$baseUrl = 'https://oursite.atlassian.net/wiki';
$url = $baseUrl.'/rest/api/content/search?cql=space=KB%20AND%20type=page%20AND%20title~'.$query;
// To enable authenticated search:
// $url .= "&os_username=$username&os_password=$password";
$response = file_get_contents($url);
$response = json_decode($response);
$results = $response->results;
Echo '<div>';
Echo ' <ol>';
foreach($results as $item) {
Echo ' <li><strong><a href="';
$baseUrl. $item-> _links-> webui
Echo ' " target='_blank'>';
$item->title
Echo ' </a></strong></li>';
}
Echo ' </ol></div><hr>';
}
$layout->content("</div>");
$layout->render();
?>
You can to store the HTML to a string and then pass it to the $layout->content() function, like this...
<?php
require_once($_SERVER["DOCUMENT_ROOT"].'/layout/layout.inc.php');
require_once($_SERVER["DOCUMENT_ROOT"].'/functions/general.inc.php');
$layout = new default_layout();
$layout->title('IT KB Search');
$layout->content("<div class='border'>");
$layout->content('<h1>IT Support Knowledge Base - Search Results</h1>');
if (isset($_GET['q'])) {
$query = rawurlencode( strip_tags($_GET['q']));
$timestamp = time();
$baseUrl = 'https://oursite.atlassian.net/wiki';
$url = $baseUrl.'/rest/api/content/search?cql=space=KB%20AND%20type=page%20AND%20title~'.$query;
// To enable authenticated search:
// $url .= "&os_username=$username&os_password=$password";
$response = file_get_contents($url);
$response = json_decode($response);
$results = $response->results;
# Change Starts
$html = '<div>';
$html .= '<ol>';
foreach($results as $item) {
$html .= '<li><strong><a href="';
$html .= $baseUrl. $item-> _links-> webui;
$html .= '" target="_blank">';
$html .= $item->title;
$html .= '</a></strong></li>';
}
$html .= '</ol></div><hr>';
$html .= '</div>';
$layout->content($html);
# Change Ends
}
$layout->content('</div>');
$layout->render();

How to exclude posts that are categorized as "Tutorials" from RSS feed?

What I am doing is parsing RSS feed of a WordPress blog and getting information as title, link, description, and pubDate from the XML file.
<?php
$html = "";
$url = "http://www.ajneffects.com/blog/feed/";
$xml = simplexml_load_file($url);
for ($i = 0; $i < 5; $i++) {
$title = $xml->channel->item[$i]->title;
$link = $xml->channel->item[$i]->link;
$description = $xml->channel->item[$i]->description;
$pubDate = $xml->channel->item[$i]->pubDate;
$html .= "<a href='$link'><h3>$title</h3></a>";
$html .= "$description";
// $html .= "$pubDate";
}
echo $html;
Sorry for the bad markup, but that's it.
The thing is, I don't want to post tagged as "Tutorials" to show in the rendered HMTL file. I'm not sure how to get it done.
Try url like : http://www.ajneffects.com/blog/feed/?cat=-5
Here 5 is your "tutorials" category id
for more see: https://codex.wordpress.org/WordPress_Feeds

PHP: How to find an element with particular name attribute in html (from url)

I am currently using PHP's file_get_contents($url) to fetch content from a URL. After getting the contents I need to inspect the given HTML chunk, find a 'select' that has a given name attribute, extract its options, and their values text. I am not sure how to go about this, I can use PHP's simplehtmldom class to parse html, but how do I get a particular 'select' with name 'union'
<span class="d3-box">
<select name='union' class="blockInput" >
<option value="">Select a option</option> ..
Page can have multiple 'select' boxes and hence I need to specifically look by name attribute
<?php
include_once("simple_html_dom.php");
$htmlContent = file_get_contents($url);
foreach($htmlContent->find(byname['union']) as $element)
echo 'option : value';
?>
Any sort of help is appreciated. Thank you in advance.
Try this PHP code:
<?php
require_once dirname(__FILE__) . "/simple_html_dom.php";
$url = "Your link here";
$htmlContent = str_get_html(file_get_contents($url));
foreach ($htmlContent->find("select[name='union'] option") as $element) {
$option = $element->plaintext;
$value = $element->getAttribute("value");
echo $option . ":" . $value . "<br>";
}
?>
how about this:
$htmlContent = file_get_html('your url');
$htmlContent->find('select[name= "union"]');
in object oriented way:
$html = new simple_html_dom();
$htmlContent = $html->load_file('your url');
$htmlContent->find('select[name= "union"]');
From DOMDocument documentation: http://www.php.net/manual/en/class.domdocument.php
$html = file_get_contents( $url );
$dom = new DOMDocument();
$dom->loadHTML( $html );
$selects = $dom->getElementsByTagName( 'select' );
$select = $selects->item(0);
// Assuming all children are options.
$children = $select->childNodes;
$options_values = array();
for ( $i = 0; $i < $children->length; $i++ )
{
$item = $children->item( $i );
$options_values[] = $item->nodeValue;
}

Parsing XML in PHP DOM via cURL - can't get nodeValue if it is url address or date

I have this strange problem parsing XML document in PHP loaded via cURL. I cannot get nodeValue containing URL address (I'm trying to implement simple RSS reader into my CMS). Strange thing is that it works for every node except that containing url addresses and date ( and ).
Here is the code (I know it is a stupid solution, but I'm kinda newbie in working with DOM and parsing XML documents).
function file_get_contents_curl($url) {
$ch = curl_init(); // initialize curl handle
curl_setopt($ch, CURLOPT_URL, $url); // set url to post to
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
curl_setopt($ch, CURLOPT_TIMEOUT, 4); // times out after 4s
$result = curl_exec($ch); // run the whole process
return $result;
}
function vypis($adresa) {
$html = file_get_contents_curl($adresa);
$doc = new DOMDocument();
#$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
$desc = $doc->getElementsByTagName('description');
$ctg = $doc->getElementsByTagName('category');
$pd = $doc->getElementsByTagName('pubDate');
$ab = $doc->getElementsByTagName('link');
$aut = $doc->getElementsByTagName('author');
for ($i = 1; $i < $desc->length; $i++) {
$dsc = $desc->item($i);
$titles = $nodes->item($i);
$categorys = $ctg->item($i);
$pubDates = $pd->item($i);
$links = $ab->item($i);
$autors = $aut->item($i);
$description = $dsc->nodeValue;
$title = $titles->nodeValue;
$category = $categorys->nodeValue;
$pubDate = $pubDates->nodeValue;
$link = $links->nodeValue;
$autor = $autors->nodeValue;
echo 'Title:' . $title . '<br/>';
echo 'Description:' . $description . '<br/>';
echo 'Category:' . $category . '<br/>';
echo 'Datum ' . gmdate("D, d M Y H:i:s",
strtotime($pubDate)) . " GMT" . '<br/>';
echo "Autor: $autor" . '<br/>';
echo 'Link: ' . $link . '<br/><br/>';
}
}
Can you please help me with this?
To read RSS you shouldn't use loadHTML, but loadXML. One reason why your links don't show is because the <link> tag in HTML ignores its contents. See also here: http://www.w3.org/TR/html401/struct/links.html#h-12.3
Also, I find it easier to just iterate over the <item> tags and then iterate over their children nodes. Like so:
$d = new DOMDocument;
// don't show xml warnings
libxml_use_internal_errors(true);
$d->loadXML($xml_contents);
// clear xml warnings buffer
libxml_clear_errors();
$items = array();
// iterate all item tags
foreach ($d->getElementsByTagName('item') as $item) {
$item_attributes = array();
// iterate over children
foreach ($item->childNodes as $child) {
$item_attributes[$child->nodeName] = $child->nodeValue;
}
$items[] = $item_attributes;
}
var_dump($items);

Simple HTML Dom

Thanks for taking the time to read my post... I'm trying to extract some information from my website using Simple HTML Dom...
I have it reading from the HTML source ok, now I'm just trying to extract the information that I need. I have a feeling I'm going about this in the wrong way... Here's my script...
<?php
include_once('simple_html_dom.php');
// create doctype
$dom = new DOMDocument("1.0");
// display document in browser as plain text
// for readability purposes
//header("Content-Type: text/plain");
// create root element
$xmlProducts = $dom->createElement("products");
$dom->appendChild($xmlProducts);
$html = file_get_html('http://myshop.com/small_houses.html');
$html .= file_get_html('http://myshop.com/medium_houses.html');
$html .= file_get_html('http://myshop.com/large_houses.html');
//Define my variable for later
$product['image'] = '';
$product['title'] = '';
$product['description'] = '';
foreach($html->find('img') as $src){
if (strpos($src->src,"http://myshop.com") === false) {
$src->src = "http://myshop.com/$src->src";
}
$product['image'] = $src->src;
}
foreach($html->find('p[class*=imAlign_left]') as $description){
$product['description'] = $description->innertext;
}
foreach($html->find('span[class*=fc3]') as $title){
$product['title'] = $title->innertext;
}
echo $product['img'];
echo $product['description'];
echo $product['title'];
?>
I put echo's on the end for sake of testing...but I'm not getting anything... Any pointers would be a great HELP!
Thanks
Charles
file_get_html() returns a HTMLDom Object, and you cannot concatenate Objects, although HTMLDom have __toString methods when there concatenated there more then lilly corrupt in some way, try the following:
<?php
include_once('simple_html_dom.php');
// create doctype
$dom = new DOMDocument("1.0");
// display document in browser as plain text
// for readability purposes
//header("Content-Type: text/plain");
// create root element
$xmlProducts = $dom->createElement("products");
$dom->appendChild($xmlProducts);
$pages = array(
'http://myshop.com/small_houses.html',
'http://myshop.com/medium_houses.html',
'http://myshop.com/large_houses.html'
)
foreach($pages as $page)
{
$product = array();
$source = file_get_html($page);
foreach($source->find('img') as $src)
{
if (strpos($src->src,"http://myshop.com") === false)
{
$product['image'] = "http://myshop.com/$src->src";
}
}
foreach($source->find('p[class*=imAlign_left]') as $description)
{
$product['description'] = $description->innertext;
}
foreach($source->find('span[class*=fc3]') as $title)
{
$product['title'] = $title->innertext;
}
//debug perposes!
echo "Current Page: " . $page . "\n";
print_r($product);
echo "\n\n\n"; //Clear seperator
}
?>

Categories