Below is the code. It does not seem to be opening the Google calendar at all. I believe it has something to do with the URL I am using, and possibly the special character in it. I get the following:
Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: I/O warning : failed to load external entity "https://www.google.com/calendar/feeds/tjd@dreilinginc.com/public/basic" on line 4
<?
$dom = new DOMDocument();
$feed = "https://www.google.com/calendar/feeds/tjd@dreilinginc.com/public/basic";
$html = $dom->loadHTMLFile($feed);
$dom->preserveWhiteSpace = true;
$entries = $dom->getElementsByTagName("entry");
foreach ( $entries as $entry )
{
$status = $entry->getElementsByTagName( "eventStatus" );
$eventStatus = $status->item(0)->getAttributeNode("value")->value;
if ($eventStatus == $confirmed)
{
$titles = $entry->getElementsByTagName( "title" );
$title = $titles->item(0)->nodeValue;
$times = $entry->getElementsByTagName( "when" );
$startTime = $times->item(0)->getAttributeNode("startTime")->value;
$when = date( "l jS \o\f F Y - h:i A", strtotime( $startTime ) );
$places = $entry->getElementsByTagName( "where" );
$where = $places->item(0)->getAttributeNode("valueString")->value;
print $title . "\n";
print $when . " AST\n";
print $where . "\n";
print "\n";
}
}
?>
As far as I know, DOMDocument::loadHTMLFile() is capable of negotiating SSL, but if it is failing you might try file_get_contents() to read the file first into a string.
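$dom = new DOMDocument();
$feed = "https://www.google.com/calendar/feeds/tjd@dreilinginc.com/public/basic";
// Fetch the feed into a string first
$feed_string = file_get_contents($feed);
// loadHTML() parses a string; loadHTMLFile() would treat it as a path
$html = $dom->loadHTML($feed_string);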
This is fully speculative though. Treat it as such.
EDIT: Make sure that allow_url_fopen is enabled in your php.ini.
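If the special character in the address is the culprit, percent-encoding the calendar ID before building the URL is one more thing to try. This is only a minimal sketch: it assumes the ID is an e-mail-style address containing "@", and build_feed_url() is a hypothetical helper, not part of any Google API.

```php
<?php
// Hypothetical helper: percent-encode the calendar ID so that characters
// such as "@" cannot be misread as URL syntax.
function build_feed_url(string $calendarId): string
{
    return 'https://www.google.com/calendar/feeds/'
        . rawurlencode($calendarId)
        . '/public/basic';
}

$feed = build_feed_url('tjd@dreilinginc.com');
echo $feed, "\n";

// file_get_contents() returns false on failure, so check before parsing.
// The network call is left commented out so the sketch runs offline.
// $feed_string = file_get_contents($feed);
// if ($feed_string === false) { die("Could not fetch $feed"); }
```

The encoded address can then be passed to file_get_contents() as in the snippet above.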
I am trying to scrape this url https://nrg91.gr/nrg-airplay-chart/ using simple-html-dom, but it does not seem to get the full html source code. This code:
include_once('simple_html_dom.php');
$html = file_get_html('https://nrg91.gr/nrg-airplay-chart');
echo $html->plaintext;
displays the content up to the h1, just before the content I am after. And from the simple-html-dom manual examples, this should display all links from that url:
foreach($html->find('a') as $e)
echo $e->href . '<br>';
but it only displays the links up to the main navigation menu, not from the main body or footer.
I also tried using prerender.com to fully render the page before passing it to file_get_html, but the result was the same. What am I doing wrong?
That library looks like it hasn't been updated in 7 years. I'd always recommend using PHP's built-in functions:
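$url = "https://nrg91.gr/nrg-airplay-chart/";
$dom = new DOMDocument();
// Collect parser warnings instead of printing them
libxml_use_internal_errors(true);
// load() expects well-formed XML; loadHTMLFile() uses the more forgiving HTML parser
$dom->loadHTMLFile($url);
foreach ($dom->getElementsByTagName("a") as $e) {
    echo $e->getAttribute("href") . "\n";
}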
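To check the parsing step without hitting the network, the same idea can be tried on an inline string. A small sketch, assuming made-up markup in place of the real page:

```php
<?php
// Inline markup stands in for the downloaded page (an assumption for the demo).
$html = '<html><body><nav><a href="/home">Home</a></nav>'
      . '<main><a href="/chart">Chart</a></main></body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);              // HTML parser, tolerant of real-world markup
libxml_clear_errors();

// Gather the href of every link, wherever it sits in the document.
$links = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}
print_r($links);
```

If links are missing here too, the problem is in the markup; if they only go missing with the live URL, look at what the server actually returns.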
Here's my super dirty approach to fetching the rank/artist/title/youtube data using both DOMDocument and SimpleXML.
The concept is to locate each "row" of data via the xpath //ul[@id="chart_ul"]/li, then use dom_import_simplexml( $set )->getNodePath() to build a new xpath that selects the individual elements where the desired data can be located.
$temp = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'nrg-airplay-chart.html';
if( file_exists( $temp ) === false or filemtime( $temp ) < time() - 3600 )
{
    file_put_contents( $temp, $html = file_get_contents('https://nrg91.gr/nrg-airplay-chart/') );
}
else
{
    $html = file_get_contents( $temp );
}
$dom = new DOMDocument();
$dom->loadHTML( $html );
$xml = simplexml_import_dom( $dom );
$array = array();
foreach( $xml->xpath('//ul[@id="chart_ul"]/li') as $index => $set )
{
    $basexpath = dom_import_simplexml( $set )->getNodePath();
    $array[] = array(
        'ranking' => (string) $xml->xpath( $basexpath . '//span[@id="ranking"]' )[0],
        'artist' => (string) $xml->xpath( $basexpath . '//p[@id="artist"]/b' )[0],
        'title' => (string) $xml->xpath( $basexpath . '//p[@id="title"]' )[0],
        'youtube' => (string) $xml->xpath( $basexpath . '//div[@id="media"]/a/@href' )[0],
    );
}
print_r( $array );
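The one-hour cache at the top of that snippet can be pulled out into a reusable function. A rough sketch: cached_fetch() is a hypothetical helper, and a closure stands in for the real download so the example runs offline.

```php
<?php
// Hypothetical helper: return the cached copy when it is fresh enough,
// otherwise call $fetch and store its result.
function cached_fetch(string $cacheFile, int $maxAge, callable $fetch): string
{
    if (file_exists($cacheFile) && filemtime($cacheFile) >= time() - $maxAge) {
        return file_get_contents($cacheFile);
    }
    $data = $fetch();
    file_put_contents($cacheFile, $data);
    return $data;
}

$cacheFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'cached-fetch-demo.html';
if (file_exists($cacheFile)) {
    unlink($cacheFile);              // start from a clean state for the demo
}

$calls = 0;
$fetch = function () use (&$calls) {
    $calls++;                        // count how often the "download" runs
    return '<ul id="chart_ul"></ul>';
};

$first  = cached_fetch($cacheFile, 3600, $fetch);
$second = cached_fetch($cacheFile, 3600, $fetch);   // served from the cache file
echo "fetch ran $calls time(s)\n";
```

In real use the closure would wrap file_get_contents() or a cURL call for the page URL.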
Another approach you might want to try:
<?php
function get_content($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Execute the request once and keep the response body
    $htmlContent = curl_exec($ch);
    curl_close($ch);
    return $htmlContent;
}
$link = "https://nrg91.gr/nrg-airplay-chart/";
$xml = get_content($link);
$dom = new DOMDocument();
// Suppress warnings about imperfect real-world markup
@$dom->loadHTML($xml);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//li[contains(@id,"wprs_chart-")]') as $items) {
    // Query relative to the current <li> by passing it as the context node
    $artist = $xpath->query('.//p[@id="artist"]/b', $items)->item(0)->nodeValue;
    $title = $xpath->query('.//p[@id="title"]', $items)->item(0)->nodeValue;
    echo "{$artist} -- {$title}<br>";
}
?>
You should get output like:
PORTOGAL THE MAN -- Feel It Still
JAX JONEW Feat INA WROLDSEN -- Breathe
CAMILA CABELLO -- Havana
CARBI B, J BALVIN & BAD BUNNY -- I Like It
ZAYN Feat SIA -- Dusk Till Dawn
I've found a great tutorial on how to accomplish most of the work at:
https://www.developphp.com/video/PHP/simpleXML-Tutorial-Learn-to-Parse-XML-Files-and-RSS-Feeds
but I can't understand how to extract media:content images from the feeds. I've read as much info as I can find, but I'm still stuck.
For example, "How to get media:content with SimpleXML"
suggests using:
foreach ($xml->channel->item as $news){
    $ns_media = $news->children('http://search.yahoo.com/mrss/');
    echo $ns_media->content; // displays "<media:content>"
}
but I can't get it to work.
Here's my script and the feed I'm trying to parse:
<?php
$html = "";
$url = "http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC";
$xml = simplexml_load_file($url);
for($i = 0; $i < 10; $i++){
$title = $xml->channel->item[$i]->title;
$link = $xml->channel->item[$i]->link;
$description = $xml->channel->item[$i]->description;
$pubDate = $xml->channel->item[$i]->pubDate;
$html .= "<a href='$link'><h3>$title</h3></a>";
$html .= "$description";
$html .= "<br />$pubDate<hr />";
}
echo $html;
?>
I don't know where to add this code in the script to make it work. Honestly, I've browsed for hours, but couldn't find a working script that would parse media:content.
Can someone help with this?
========================
UPDATE:
Thanks to fusion3k, I got the final code working:
<?php
$html = "";
$url = "http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC";
$xml = simplexml_load_file($url);
for($i = 0; $i < 5; $i++){
$image = $xml->channel->item[$i]->children('media', True)->content->attributes();
$title = $xml->channel->item[$i]->title;
$link = $xml->channel->item[$i]->link;
$description = $xml->channel->item[$i]->description;
$pubDate = $xml->channel->item[$i]->pubDate;
$html .= "<img src='$image' alt='$title'>";
$html .= "<a href='$link'><h3>$title</h3></a>";
$html .= "$description";
$html .= "<br />$pubDate<hr />";
}
echo $html;
?>
Basically all I needed was this simple line:
$image = $xml->channel->item[$i]->children('media', True)->content->attributes();
Can't believe it was so hard for a non-techie to find this info online after reading dozens of posts and articles. Well, I hope this will serve other folks like me well :)
To get the 'url' attribute, use the ->attributes() syntax:
$ns_media = $news->children('http://search.yahoo.com/mrss/');
/* Echoes 'url' attribute: */
echo $ns_media->content->attributes()['url'];
// in php < 5.4: $attr = $ns_media->content->attributes(); echo $attr['url'];
/* Catches 'url' attribute: */
$url = $ns_media->content->attributes()['url']->__toString();
// in php < 5.4: $attr = $ns_media->content->attributes(); $url = $attr['url']->__toString();
Namespaces explanation:
The ->children() argument is not the URL of your XML, it is a Namespace URI.
XML namespaces are used for providing uniquely named elements and attributes in an XML document:
<xxx>        Standard XML tag
<yyy:zzz>    Namespaced tag
 └┬┘ └┬┘
  │   └──── Element Name
  └──────── Element Prefix (Namespace Identifier)
So, in your case, <media:content> is the “content” element of Namespace “media”. Namespaced elements must have an associated Namespace URI, declared as an attribute of a parent node or, most commonly, of the root element: this attribute has the form xmlns:yyy="NamespaceURI" (in your case xmlns:media="http://search.yahoo.com/mrss/" as an attribute of the root node <rss>).
Ultimately, the above $news->children( 'http://search.yahoo.com/mrss/' ) means “retrieve all children elements with http://search.yahoo.com/mrss/ as Namespace URI”; an alternative, more readable syntax is: $news->children( 'media', True ) (True means “regarded as a prefix”).
Returning to the example code, the generic syntax to retrieve all of the first item's children with prefix media is:
$xml = simplexml_load_file( 'http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC' );
$xml->channel->item[0]->children( 'http://search.yahoo.com/mrss/' );
or (identical result):
$xml = simplexml_load_file( 'http://rssfeeds.webmd.com/rss/rss.aspx?RSSSource=RSS_PUBLIC' );
$xml->channel->item[0]->children( 'media', True );
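The equivalence of the two forms can be checked offline against an inline feed. A minimal sketch, assuming a made-up item and thumbnail URL rather than the real WebMD feed:

```php
<?php
// Inline feed standing in for the real one (contents are made up for the demo).
$rssString = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Example item</title>
      <media:content url="http://example.com/thumb.jpg" medium="image"/>
    </item>
  </channel>
</rss>
XML;

$xml  = simplexml_load_string($rssString);
$item = $xml->channel->item[0];

// Select the namespaced children by Namespace URI...
$byUri = (string) $item->children('http://search.yahoo.com/mrss/')->content->attributes()->url;
// ...or by prefix (second argument true means "treat the first as a prefix").
$byPrefix = (string) $item->children('media', true)->content->attributes()->url;

echo $byUri, "\n", $byPrefix, "\n";
```

The URI form keeps working even if a feed declares the same namespace under a different prefix, which is one reason to prefer it in code that reads feeds you do not control.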
Your new code:
If you want to show the <media:content url> thumbnail for each element in your page, modify the original code in this way:
(...)
$pubDate = $xml->channel->item[$i]->pubDate;
$image = $xml->channel->item[$i]->children( 'media', True )->content->attributes()['url'];
// in php < 5.4:
// $attr = $xml->channel->item[$i]->children( 'media', True )->content->attributes();
// $image = $attr['url'];
$html .= "<a href='$link'><h3>$title</h3></a>";
$html .= "<img src='$image' alt='$title'>";
(...)
Simple example for newbs like me:
$url = "https://www.youtube.com/feeds/videos.xml?channel_id=UCwNPPl_oX8oUtKVMLxL13jg";
$rss = simplexml_load_file($url);
foreach ($rss->entry as $item) {
    $time = $item->published;
    $time = date('Y-m-d \ H:i', strtotime($time));
    $media_group = $item->children( 'media', true );
    $title = $media_group->group->title;
    $description = $media_group->group->description;
    $views = $media_group->group->community->statistics->attributes()['views'];
    // Print inside the loop so every entry is shown, not just the last one
    echo $time . ' :: ' . $title . '<br>' . $description . '<br>' . $views . '<br>';
}
I am trying to parse a Google Calendar to use on our TVs to display 'Today's Events'.
While I am most of the way there thanks to the help of a friend, I wanted to see if somebody could help me the rest of the way.
The code below generates the calendar with all the information, but it shows the date for EVERY entry. Since they are all on the same day, this is frustrating and confusing to look at. I am nowhere near a programmer, but I can make sense of some things.
How would I group all of today's events under a single date heading?
Thanks in advance.
<?php
$confirmed = 'http://schemas.google.com/g/2005#event.confirmed';
$three_months_in_seconds = 60 * 60 * 24 * 28 * 3;
$three_months_ago = date("Y-m-d\Th:i:s", time() - $three_months_in_seconds);
$three_months_from_today = date("Y-m-d\Th:i:s", time() + $three_months_in_seconds);
$params = "?orderby=starttime&start-min=" . $three_months_ago . "&start-max=" . $three_months_from_today;
//$params = "?orderby=starttime&start-min=2012-12-01T05:48:47&start-max=2013-05-07T05:48:47&sortorder=a&singleevents=true&futureevents=true";
$params = "?orderby=starttime&sortorder=a&singleevents=true&futureevents=true";
$feed = "https://www.google.com/calendar/feeds/REDACTED%40gmail.com/private-REDACTED/full".$params;
$doc = new DOMDocument();
if (!$doc->load( $feed )) echo 'failed to load';
$entries = $doc->getElementsByTagName( "entry" );
foreach ( $entries as $entry ) {
$status = $entry->getElementsByTagName( "eventStatus" );
$eventStatus = $status->item(0)->getAttributeNode("value")->value;
if ($eventStatus == $confirmed) {
$titles = $entry->getElementsByTagName( "title" );
$title = $titles->item(0)->nodeValue;
$times = $entry->getElementsByTagName( "when" );
$startTime = $times->item(0)->getAttributeNode("startTime")->value;
$when = date( "D M j, Y", strtotime( $startTime ) );
$time = date("g:i A",strtotime($startTime));
$places = $entry->getElementsByTagName( "where" );
$where = $places->item(0)->getAttributeNode("valueString")->value;
print "<div class='row when'>$when</div>";
echo "<div class='row event'><span class='time'>$time</span><span class='title'>$title</span><span class='where'>$where</span></div>";
// print $where . "\n";
print "\n";
}
}
?>
Here is an answer. Just change this:
print "<div class='row when'>$when</div>";
to this:
if ($old_when != $when) print "<div class='row when'>$when</div>";
$old_when = $when;
and add
$old_when = null;
before the foreach, so the date heading is printed only when the date changes.
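Put together, the change looks roughly like this. A self-contained sketch, assuming a made-up array of events in place of the calendar feed, already sorted by start time:

```php
<?php
// Made-up events standing in for the parsed feed entries.
$events = [
    ['start' => '2013-02-01T09:00:00', 'title' => 'Staff meeting'],
    ['start' => '2013-02-01T13:30:00', 'title' => 'Lunch and learn'],
    ['start' => '2013-02-02T10:00:00', 'title' => 'Site visit'],
];

$old_when = null;   // date heading printed most recently
$out = '';
foreach ($events as $event) {
    $ts   = strtotime($event['start']);
    $when = date('D M j, Y', $ts);
    $time = date('g:i A', $ts);
    if ($old_when !== $when) {          // new day: print the heading once
        $out .= "<div class='row when'>$when</div>\n";
        $old_when = $when;
    }
    $out .= "<div class='row event'><span class='time'>$time</span>"
          . "<span class='title'>{$event['title']}</span></div>\n";
}
echo $out;
```

This relies on the entries arriving in date order, which the orderby=starttime and sortorder=a parameters in the question ask for.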
I have a php readfile script, like this:
<?php
$contentFile = "http://google.com";
readfile( $contentFile );
?>
I want to insert some code at a specific line in the output of readfile.
Example:
<html>
{top_code}
{Code i want to insert here}
{bottom-code}
</html>
How can I make this possible?
You can't. readfile() streams whatever you're reading out to the user's browser. You could use the output buffering mechanism to capture that data instead, but then you might as well just use file_get_contents() instead and save yourself a few extra lines of code.
file_get_contents returns the requested file/url as a string. Then you use standard string or DOM operations to manipulate that 'page'.
This can do the job:
$contentFile = "http://google.com";
$html = file_get_contents($contentFile);
$html = explode("\n", $html);
$line_number = 3; // 1-based line at which to insert (example value)
$line = $line_number - 1;
array_splice($html, $line, 0, "Burim Shala"); // the text to insert
$html = implode("\n", $html);
echo $html;
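The same splice can be wrapped in a function and tried on a small string first. A minimal sketch: insert_at_line() is a hypothetical helper name, and the placeholder page mirrors the example from the question.

```php
<?php
// Hypothetical helper: insert $snippet so that it becomes line $lineNumber (1-based).
function insert_at_line(string $text, int $lineNumber, string $snippet): string
{
    $lines = explode("\n", $text);
    // Wrap the snippet in an array so array_splice() inserts it as one element.
    array_splice($lines, $lineNumber - 1, 0, [$snippet]);
    return implode("\n", $lines);
}

$page   = "<html>\n{top_code}\n{bottom-code}\n</html>";
$result = insert_at_line($page, 3, '{inserted code}');
echo $result, "\n";
```

Counting lines is fragile on pages you do not control, so for real HTML a DOM-based insert may be the safer route.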
I found this to be the solution for displaying RSS feeds using PHP without MySQL.
Thanks to http://bavotasan.com/2010/display-rss-feed-with-php/
$rss = new DOMDocument();
$rss->load('http://wordpress.org/news/feed/');
$feed = array();
foreach ($rss->getElementsByTagName('item') as $node) {
    $item = array (
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
        'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
        'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue,
    );
    array_push($feed, $item);
}
$limit = 5;
for ($x = 0; $x < $limit; $x++) {
    $title = str_replace(' & ', ' &amp; ', $feed[$x]['title']);
    $link = $feed[$x]['link'];
    $description = $feed[$x]['desc'];
    $date = date('l F d, Y', strtotime($feed[$x]['date']));
    echo '<p><strong>'.$title.'</strong><br />';
    echo '<small><em>Posted on '.$date.'</em></small></p>';
    echo '<p>'.$description.'</p>';
}
I have a strange problem parsing an XML document in PHP loaded via cURL. I cannot get the nodeValue of nodes containing a URL address (I'm trying to implement a simple RSS reader in my CMS). The strange thing is that it works for every node except those containing URL addresses and dates (<link> and <pubDate>).
Here is the code (I know it is a clumsy solution, but I'm a newbie at working with the DOM and parsing XML documents).
function file_get_contents_curl($url) {
$ch = curl_init(); // initialize curl handle
curl_setopt($ch, CURLOPT_URL, $url); // set url to post to
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
curl_setopt($ch, CURLOPT_TIMEOUT, 4); // times out after 4s
$result = curl_exec($ch); // run the whole process
return $result;
}
function vypis($adresa) {
$html = file_get_contents_curl($adresa);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
$desc = $doc->getElementsByTagName('description');
$ctg = $doc->getElementsByTagName('category');
$pd = $doc->getElementsByTagName('pubDate');
$ab = $doc->getElementsByTagName('link');
$aut = $doc->getElementsByTagName('author');
for ($i = 1; $i < $desc->length; $i++) {
$dsc = $desc->item($i);
$titles = $nodes->item($i);
$categorys = $ctg->item($i);
$pubDates = $pd->item($i);
$links = $ab->item($i);
$autors = $aut->item($i);
$description = $dsc->nodeValue;
$title = $titles->nodeValue;
$category = $categorys->nodeValue;
$pubDate = $pubDates->nodeValue;
$link = $links->nodeValue;
$autor = $autors->nodeValue;
echo 'Title:' . $title . '<br/>';
echo 'Description:' . $description . '<br/>';
echo 'Category:' . $category . '<br/>';
echo 'Datum ' . gmdate("D, d M Y H:i:s",
strtotime($pubDate)) . " GMT" . '<br/>';
echo "Autor: $autor" . '<br/>';
echo 'Link: ' . $link . '<br/><br/>';
}
}
Can you please help me with this?
To read RSS you shouldn't use loadHTML, but loadXML. One reason why your links don't show is that <link> is an empty element in HTML, so the HTML parser does not keep its contents. See also here: http://www.w3.org/TR/html401/struct/links.html#h-12.3
Also, I find it easier to just iterate over the <item> tags and then iterate over their child nodes. Like so:
$d = new DOMDocument;
// don't show xml warnings
libxml_use_internal_errors(true);
$d->loadXML($xml_contents);
// clear xml warnings buffer
libxml_clear_errors();
$items = array();
// iterate all item tags
foreach ($d->getElementsByTagName('item') as $item) {
    $item_attributes = array();
    // iterate over children
    foreach ($item->childNodes as $child) {
        $item_attributes[$child->nodeName] = $child->nodeValue;
    }
    $items[] = $item_attributes;
}
var_dump($items);
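For a quick offline check, the same loop can be run against an inline feed. A sketch under the assumption of a made-up one-item feed; it also skips the whitespace text nodes between elements, which would otherwise show up under a "#text" key:

```php
<?php
// Inline feed standing in for the downloaded XML (contents are made up).
$xml_contents = <<<XML
<rss version="2.0">
  <channel>
    <title>Demo feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <pubDate>Mon, 06 Sep 2010 00:01:00 +0000</pubDate>
    </item>
  </channel>
</rss>
XML;

$d = new DOMDocument();
$d->loadXML($xml_contents);   // XML parser: <link> keeps its text content

$items = [];
foreach ($d->getElementsByTagName('item') as $item) {
    $fields = [];
    foreach ($item->childNodes as $child) {
        // Keep element children only; skip whitespace text nodes.
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $fields[$child->nodeName] = $child->nodeValue;
        }
    }
    $items[] = $fields;
}
print_r($items);
```

With loadXML the link and pubDate values come through like any other field.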