reading twitter's rss search feed with simple xml - php

I'm having some trouble selecting some nodes in the RSS feed for Twitter's search.
The RSS URL is here:
http://search.twitter.com/search.rss?q=twitfile
Each item looks like this:
<item>
<title>RT @TwittBoy: TwitFile - Comparte tus archivos en Twitter (hasta 200Mb) http://bit.ly/xYNsM</title>
<link>http://twitter.com/MarielaCelita/statuses/5990165590</link>
<description>RT <a href="http://twitter.com/TwittBoy">@TwittBoy</a>: <b>TwitFile</b> - Comparte tus archivos en Twitter (hasta 200Mb) <a href="http://bit.ly/xYNsM">http://bit.ly/xYNsM</a></description>
<pubDate>Mon, 23 Nov 2009 22:45:39 +0000</pubDate>
<guid>http://twitter.com/MarielaCelita/statuses/5990165590</guid>
<author>MarielaCelita@twitter.com (M.Celita Lijerón)</author>
<media:content type="image/jpg" width="48" height="48" url="http://a3.twimg.com/profile_images/537676869/orkut_normal.jpg"/>
<google:image_link>http://a3.twimg.com/profile_images/537676869/orkut_normal.jpg</google:image_link>
</item>
My PHP is below:
foreach ($twitter_xml->channel->item as $key) {
$screenname = $key->{"author"};
$date = $key->{"pubDate"};
$profimg = $key->{"google:image_link"};
$link = $key->{"link"};
$title = $key->{"title"};
echo"
<li>
<h5><a href=$link>$author</a></h5>
<p class=info><a href=$link>$title</a></p>
</li>
";
The problem is that nothing from the RSS feed is being echoed. If there are 20 results, it loops 20 times, but with no data.

In your code, $screenname is assigned a value, but you are echoing $author.
To get elements within a namespace, like google:image_link, you have to do this:
$g = $key->children("http://base.google.com/ns/1.0");
$profimg = $g->{"image_link"};
If you are wondering where "http://base.google.com/ns/1.0" comes from: the namespace is declared in the second line of the RSS feed.
$url="http://search.twitter.com/search.rss?q=twitfile";
$twitter_xml = simplexml_load_file($url);
foreach ($twitter_xml->channel->item as $key) {
$author = $key->{"author"};
$date = $key->{"pubDate"};
$link = $key->{"link"};
$title = $key->{"title"};
$g = $key->children("http://base.google.com/ns/1.0");
$profimg = $g->{"image_link"};
echo "
<li>
<h5><a href=\"$link\">$author</a></h5>
<p class=\"info\"><a href=\"$link\">$title</a></p>
</li>
";
}
This code works.

Set error_reporting(E_ALL); and you'll see that $author isn't defined.
You can't access <google:image_link/> this way; you'll have to use XPath or children():
$key->children("google", true)->image_link;
If you use SimpleDOM, there's a shortcut that returns the first element of an XPath result:
$key->firstOf("google:image_link");
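As a rough, self-contained illustration of the two children() forms mentioned above (namespace URI versus prefix), here is a sketch that runs against an inline sample instead of the live feed. The sample XML and the helper name readImageLinks are made up for the example; only the namespace URI comes from the answers.

```php
<?php
// Trimmed-down sample item; only the namespace URI is taken from the answers above.
$rss = <<<XML
<rss version="2.0" xmlns:google="http://base.google.com/ns/1.0">
  <channel>
    <item>
      <title>Example tweet</title>
      <google:image_link>http://example.com/avatar.jpg</google:image_link>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: returns each item's image link, looked up both ways.
function readImageLinks(string $rss): array
{
    $xml = simplexml_load_string($rss);
    $links = [];
    foreach ($xml->channel->item as $item) {
        $links[] = [
            // Select children by namespace URI...
            'byUri'    => (string) $item->children('http://base.google.com/ns/1.0')->image_link,
            // ...or by the prefix the document itself uses (second argument true).
            'byPrefix' => (string) $item->children('google', true)->image_link,
        ];
    }
    return $links;
}

print_r(readImageLinks($rss));
```

The URI form keeps working if the feed ever changes the prefix it binds to that namespace, which is why it is usually the safer of the two.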

if (!$xml = simplexml_load_file('http://search.twitter.com/search.atom?q='.urlencode ($terms)))
{
throw new RuntimeException('Unable to load or parse search results feed');
}
if (!count($entries = $xml->entry))
{
throw new RuntimeException('No entry found');
}
for($i=0;$i<count($entries);$i++)
{
$title[$i] = $entries[$i]->title;
// etc.: continue with the description and the other fields
}

I made this and it works :)) $sea_name is the keyword you're looking for...
<?php
function twitter_feed( $sea_name ){
$endpoint = 'http://search.twitter.com/search.rss?q='.urlencode($sea_name); // URL to call
$resp = simplexml_load_file($endpoint);
// Check to see if the response was loaded, else print an error
if ($resp) {
$results = '';
$counter=0;
// If the response was loaded, parse it and build links
foreach($resp->channel->item as $item) {
//var_dump($item);
preg_match("/\((.*?)\)/", $item->author, $blah);
$content = $item->children("http://search.yahoo.com/mrss/" );
$imageUrl = getXmlAttribute( $content, "url" );
echo '
<div class="twitter-item">
<img src="'.$imageUrl.'" />
<span class="twit">'.$blah[1].'</span><br />
<span class="twit-content">'.$item->title.'</span>
<br style="clear:both; line-height:0;margin:0;padding:0;">
</div>';
$counter++;
}
}
// If there was no response, print an error
else {
$results = "Oops! Must not have gotten the response!";
}
echo $results;
}
function getXmlAttribute( SimpleXMLElement $xmlElement, $attribute ) {
foreach( $xmlElement->attributes() as $name => $value ) {
if( $name == $attribute ) {
return (string)$value;
}
}
}
?>
The object will contain something like:
<!-- SimpleXMLElement Object
(
[title] => Before I go to bed, I just want to say I've just seen Peter Kay's CIN cartoon video for the 1st time... one word... WOW.
[link] => http://twitter.com/Alex_Segal/statuses/5993710015
[description] => Before I go to bed, I just want to say I&apos;ve just seen <b>Peter</b> <b>Kay</b>&apos;s CIN cartoon video for the 1st time... one word... WOW.
[pubDate] => Tue, 24 Nov 2009 01:00:00 +0000
[guid] => http://twitter.com/Alex_Segal/statuses/5993710015
[author] => Alex_Segal@twitter.com (Alex Segal)
)
-->
You can use any of these inside the foreach loop and echo them, such as $item->author, $item->link, etc. For any other attributes you can use the getXmlAttribute function.

Related

How to parse html inside xml's tag

I need help getting data from the description tag, which consists of <a>, <img> and some text. The xml I am trying to parse is this
I managed to get all the data I need, except for the description tag, where I got the <a> tag along with the description text. What I need is the img's src and the description text.
My code :
foreach ($rss->getElementsByTagName('item') as $node) {
/*$test = $node->getElementsByTagName('description');
$test = $test->item(0)->textContent;*/
var_dump($test);
exit;
$nodes = $node->getElementsByTagName('content');
if(!is_object($nodes) || $nodes === null || $nodes->length==0){
$linkthumbNode = $node->getElementsByTagName('image');
if(isset($linkthumbNode) && $linkthumbNode->length >0){
$linkthumb=$linkthumbNode->item(0)->nodeValue;
if(empty($linkthumb)||$linkthumb == " "){
$linkthumb = $linkthumbNode->item(0)->getAttribute('src');
}
}else{
$linkthumb = "NO IMAGE";
}
}else{
$linkthumb = $nodes->item(0)->getAttribute('url');
}
$title = $node->getElementsByTagName('title')->item(0)->nodeValue;
$desc = $node->getElementsByTagName('description')->item(0)->textContent;
$link = $node->getElementsByTagName('link')->item(0)->nodeValue;
$img = $linkthumb;
$date = $node->getElementsByTagName('pubDate');
if(isset($date) && $date->length >0){
$date = $date->item(0)->nodeValue;
}else{
$date = "no date provided";
}
$item = array (
'title' => $title,
'desc' => $desc,
'link' => $link,
'img' => $img,
'date' => $date,
);
array_push($feed, $item);
}
the xml description tag is :
<description>
<img border="0" hspace="10" align="left" style="margin-top:3px;margin-right:5px;" src="http://timesofindia.indiatimes.com/photo/20984744.cms" />Nine food combinations that will make staying healthy and looking fit easier
</description>
what I need: http://timesofindia.indiatimes.com/photo/20984744.cms as image and Nine food combinations that will make staying healthy and looking fit easier as my description.
Can someone help me? I'm not that great at PHP and parsing XML.
Maybe I am a little late to the party, but if an answer is still needed, check out my solution. I use PHP DOMDocument and regular expressions since I haven't found a simple way to get the needed data using only XML-extensions.
$rss = file_get_contents('https://timesofindia.indiatimes.com/rssfeeds/2886704.cms');
$feed = new DOMDocument();
$feed->loadXML($rss);
$items = array();
foreach($feed->getElementsByTagName('item') as $item) {
$arr = array();
foreach($item->childNodes as $child) {
if($child->nodeName === 'title' || $child->nodeName === 'link') $arr[$child->nodeName] = $child->nodeValue;
if($child->nodeName === 'pubDate') $arr['date'] = $child->nodeValue;
if($child->nodeName === 'description') {
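// Capture the src value up to its closing quote, so any later attributes are not swallowed
if (preg_match('/src=[\'"]([^\'"]+)[\'"]/i', $child->nodeValue, $matches)) $arr['img'] = $matches[1];
// The description text is whatever follows the last tag
if (preg_match('/[^>]+$/', $child->nodeValue, $matches)) $arr['desc'] = trim($matches[0]);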
}
}
array_push($items, $arr);
}
print_r($items);
The output is like this and seems to be what you needed:
Array ( [0] => Array ( [title] => 5 reasons you get sore after sex [img] => https://timesofindia.indiatimes.com/photo/61101815.cms [desc] => Sometimes, a super-filmy, almost-perfect sex leaves you all euphoric but only to end with soreness later. So, what is it that is going wrong? Can it be remedied? [link] => https://timesofindia.indiatimes.com/life-style/health-fitness/health-news/5-reasons-you-get-sore-after-sex/life-style/health-fitness/health-news/5-reasons-you-get-sore-after-sex/photostory/61101724.cms [date] => Mon, 16 Oct 2017 10:21:27 GMT )...
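If you would rather avoid regular expressions, one alternative is to hand the description's HTML to DOMDocument::loadHTML() and read the image and text from the resulting tree. The sketch below is only an illustration: the helper name splitDescription is made up, and it assumes the description arrives as an HTML string (as it does via nodeValue in the code above). Non-ASCII text may need an explicit encoding hint, which is left out here.

```php
<?php
// Hypothetical helper: splits a description's HTML into image URL and plain text.
function splitDescription(string $html): array
{
    $doc = new DOMDocument();
    // Feed markup is often sloppy, so collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $img = $doc->getElementsByTagName('img')->item(0);

    return [
        // src of the first <img>, or null when the description has no image
        'img'  => $img ? $img->getAttribute('src') : null,
        // textContent drops all tags and keeps only the text
        'desc' => trim($doc->documentElement->textContent),
    ];
}

$parts = splitDescription(
    '<img border="0" src="http://timesofindia.indiatimes.com/photo/20984744.cms" />'
    . 'Nine food combinations that will make staying healthy and looking fit easier'
);
print_r($parts);
```

Inside the loop above you would call it as splitDescription($child->nodeValue) when $child->nodeName is 'description'.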

PHP - RSS Parser XML

Question: How to Parse <media:content URL="IMG" /> from XML?
OK. This is like asking why 1+1 = 2. And 2+2=Not Available.
Original link:
How to Parse XML With SimpleXML and PHP // By: John Morris.
https://www.youtube.com/watch?v=_1F1Iq1IIS8
Using his method I can easily reach items on RSS FEED New York Times
With Following Code:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>How to Parse XML with SimpleXML and PHP</title>
</head>
<body>
<?php
$url = 'http://rss.nytimes.com/services/xml/rss/nyt/Sports.xml';
$xml = simplexml_load_file($url) or die("Can't connect to URL");
?><pre><?php //print_r($xml); ?></pre><?php
foreach ($xml->channel->item as $item) {
printf('<li><a href="%s">%s</a></li>', $item->link, $item->title);
}
?>
</body>
</html>
GIVES:
Sparky Lyle in Monument Park? Fans Say Yes, but He Disagrees
The Thickly Accented American Behind the N.B.A. in France
On Pro Basketball: ‘That Got Ugly in a Hurry’: More Playoff Pain Delivered by the Spurs
...
BUT
To reach media:content you apparently cannot use simplexml_load_file the same way, as it doesn't seem to pick up any media:content tags.
So, yes, I searched around on the web.
I found this example on Stack Overflow:
get media:description and media:content url from xml
But using the code:
<?php
function feeds()
{
$url = "http://rss.nytimes.com/services/xml/rss/nyt/Sports.xml"; // xmld.xml contains above data
$feeds = file_get_contents($url);
$rss = simplexml_load_string($feeds);
foreach($rss->channel->item as $entry) {
if($entry->children('media', true)->content->attributes()) {
$md = $entry->children('media', true)->content->attributes();
print_r("$md->url");
}
}
}
?>
It gave me no errors, but also a blank page.
It seems most people (from googling) have little to no idea how to really use media:content, so I have to turn to Stack Overflow and hope someone can provide an answer. I'm even willing to not use SimpleXML.
What I want is to grab the media:content url images and use them on an external site.
Also, if possible, I would like to put the parsed XML items into a SQL database.
I came up with this:
<?php
$url = "http://rss.nytimes.com/services/xml/rss/nyt/Sports.xml"; // xmld.xml contains above data
$feeds = file_get_contents($url);
$rss = simplexml_load_string($feeds);
$items = [];
foreach($rss->channel->item as $entry) {
$image = 'N/A';
$description = 'N/A';
foreach ($entry->children('media', true) as $k => $v) {
$attributes = $v->attributes();
if ($k == 'content') {
if (property_exists($attributes, 'url')) {
$image = $attributes->url;
}
}
if ($k == 'description') {
$description = $v;
}
}
$items[] = [
'link' => $entry->link,
'title' => $entry->title,
'image' => $image,
'description' => $description,
];
}
print_r($items);
?>
Giving:
Array
(
[0] => Array
(
[link] => SimpleXMLElement Object
(
[0] => https://www.nytimes.com/2017/04/17/sports/basketball/a-court-used-for-playing-hoops-since-1893-where-paris.html?partner=rss&emc=rss
)
[title] => SimpleXMLElement Object
(
[0] => A Court Used for Playing Hoops Since 1893. Where? Paris.
)
[image] => SimpleXMLElement Object
(
[0] => https://static01.nyt.com/images/2017/04/05/sports/basketball/05oldcourt10/05oldcourt10-moth-v13.jpg
)
[description] => SimpleXMLElement Object
(
[0] => The Y.M.C.A. in Paris says its basketball court, with its herringbone pattern and loose slats, is the oldest one in the world. It has been continuously functional since the building opened in 1893.
)
)
.....
And you can iterate over
foreach ($items as $item) {
printf('<img src="%s">', $item['image']);
printf('<a href="%s">%s</a>', $item['link'], $item['title']);
}
Hope this helps.
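One thing the dump above makes visible is that every value is still a SimpleXMLElement object. If the items are going to be printed, compared or stored in a database, it is usually easier to cast them to strings while building the array. The following is a sketch of that idea against a made-up inline feed; the function name extractItems and the sample data are assumptions for the example.

```php
<?php
// Made-up sample feed; only the media namespace URI matches the usual Media RSS one.
$feed = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Sample headline</title>
      <link>https://example.com/story.html</link>
      <media:content url="https://example.com/photo.jpg" medium="image"/>
      <media:description>Sample caption</media:description>
    </item>
    <item>
      <title>No media here</title>
      <link>https://example.com/other.html</link>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: same loop as above, but every value is cast to a plain string.
function extractItems(string $feed): array
{
    $rss = simplexml_load_string($feed);
    $items = [];
    foreach ($rss->channel->item as $entry) {
        $image = 'N/A';
        $description = 'N/A';
        foreach ($entry->children('media', true) as $name => $node) {
            if ($name === 'content') {
                $url = (string) $node->attributes()->url;
                if ($url !== '') {
                    $image = $url;
                }
            }
            if ($name === 'description') {
                $description = (string) $node;
            }
        }
        $items[] = [
            'link'        => (string) $entry->link,
            'title'       => (string) $entry->title,
            'image'       => $image,
            'description' => $description,
        ];
    }
    return $items;
}

print_r(extractItems($feed));
```

With plain strings in the array, the values can be bound directly as parameters of a prepared SQL statement.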

viewing XML data if attribute value equals variable value

I'm stuck on something extremely simple.
Here is my xml feed:
http://xml.betfred.com/Horse-Racing-Daily.xml
Here is my code
<?php
function HRList5($viewbets) {
$xmlData = 'http://xml.betfred.com/Horse-Racing-Daily.xml';
$xml = simplexml_load_file($xmlData);
$curdate = date('d/m/Y');
$new_array = array();
foreach ($xml->event as $event) {
if($event->bettype->attributes()->bettypeid == $viewbets){//$_GET['evid']){
// $eventid = $_GET['eventid'];
// if ($limit == $c) {
// break;
// }
// $c++;
$eventd = substr($event->attributes()->{'date'},6,2);
$eventm = substr($event->attributes()->{'date'},4,2);
$eventy = substr($event->attributes()->{'date'},0,4);
$eventt = $event->attributes()->{'time'};
$eventid = $event->attributes()->{'eventid'};
$betname = $event->bettype->bet->attributes()->{'name'};
$bettypeid = $event->bettype->attributes()->{'bettypeid'};
$betprice = $event->bettype->bet->attributes()->{'price'};
$betid = $event->bettype->bet->attributes()->{'id'};
$new_array[$betname.$betid] = array(
'betname' => $betname,
'viewbets' => $viewbets,
'betid' => $betid,
'betname' => $betname,
'betprice' => $betprice,
'betpriceid' => $event->bettype->attributes()->{'betid'},
);
}
ksort($new_array);
$limit = 10;
$c = 0;
foreach ($new_array as $event_time => $event_data) {
// $racedate = $event_data['eventy'].$event_data['eventm'].$event_data['eventd'];
$today = date('Ymd');
//if($today == $racedate){
// if ($limit == $c) {
// break;
//}
//$c++;
$replace = array("/"," ");
// $eventname = str_replace($replace,'-', $event_data['eventname']);
//$venue = str_replace($replace,'-', $event_data['venue']);
echo "<div class=\"units-row unit-100\">
<div class=\"unit-20\" style=\"margin-left:0px;\">
".$event_data['betprice']."
</div>
<div class=\"unit-50\">
".$event_data['betname'].' - '.$event_data['betprice']."
</div>
<div class=\"unit-20\">
<img src=\"betnow.gif\" ><br />
</div>
</div>";
}
}//echo "<strong>View ALL Horse Races</strong> <strong>>></strong>";
//var_dump($event_data);
}
?>
Now basically the XML file contains a list of horse races that are happening today.
The page I call the function on also declares
<?php $viewbets = $_GET['EVID'];?>
Then where the function is called I have
<?php HRList5($viewbets);?>
I've just had a play around and now it displays the data in the first <bet> node,
but the issue is that it's not displaying them all; it's just repeating the first one down the page.
I basically need the XML feed queried, and if $event->bettype->attributes()->{'bettypeid'} == $viewbets I want the bet nodes repeated down the page.
I don't use simplexml so can offer no guidance with that - I would say however that to find the elements and attributes you need within the xml feed that you ought to use an XPath query. The following code will hopefully be of use in that respect, it probably has an easy translation into simplexml methods.
Edit: Rather than targeting each bet, as the original XPath did (which then caused issues), the following should be more useful. It targets the bettype and then processes its child nodes.
/* The `eid` to search for in the DOM document */
$eid=25573360.20;
/* create the DOM object & load the xml */
$dom=new DOMDocument;
$dom->load( 'http://xml.betfred.com/Horse-Racing-Daily.xml' );
/* Create a new XPath object */
$xp=new DOMXPath( $dom );
/* Search the DOM for nodes with a particular attribute (bettypeid); XPath's number() function makes the comparison numeric */
$oCol=$xp->query('//event/bettype[ number( @bettypeid )="'.$eid.'" ]');
/* If the query was successful there should be a nodelist object to work with */
if( $oCol ){
foreach( $oCol as $node ) {
echo '
<h1>'.$node->parentNode->getAttribute('name').'</h1>
<h2>'.date('D, j F, Y',strtotime($node->getAttribute('bet-start-date'))).'</h2>';
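foreach( $node->childNodes as $bet ){
/* childNodes also contains whitespace text nodes, which have no attributes */
if( !( $bet instanceof DOMElement ) ) continue;
echo "<div>Name: {$bet->getAttribute('name')} ID: {$bet->getAttribute('id')} Price: {$bet->getAttribute('price')}</div>";
}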
}
} else {
echo 'XPath query failed';
}
$dom = $xp = $oCol = null;
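For anyone who wants to stay with SimpleXML, a possible translation is sketched below. The feed layout is an assumption inferred from the attribute names used in the question (the real feed may nest things differently), and betsForType is a made-up helper name. The key difference from the question's code is the inner loop over every <bet>, rather than reading only $event->bettype->bet, which always refers to the first one.

```php
<?php
// Assumed feed shape, inferred from the attribute names used in the question.
$sample = <<<XML
<category>
  <event name="Ascot 14:30" eventid="100">
    <bettype bettypeid="25573360.20">
      <bet name="Red Rum" id="1" price="5/1"/>
      <bet name="Arkle" id="2" price="7/2"/>
    </bettype>
  </event>
  <event name="York 15:00" eventid="101">
    <bettype bettypeid="99999999.10">
      <bet name="Shergar" id="3" price="2/1"/>
    </bettype>
  </event>
</category>
XML;

// Hypothetical helper: collects every bet under the bettype with the given id.
function betsForType(SimpleXMLElement $xml, string $bettypeid): array
{
    $bets = [];
    foreach ($xml->event as $event) {
        foreach ($event->bettype as $bettype) {
            // Exact string comparison; "25573360.2" would not match "25573360.20".
            if ((string) $bettype['bettypeid'] !== $bettypeid) {
                continue;
            }
            // Loop over every <bet>, not just the first one.
            foreach ($bettype->bet as $bet) {
                $bets[] = [
                    'name'  => (string) $bet['name'],
                    'id'    => (string) $bet['id'],
                    'price' => (string) $bet['price'],
                ];
            }
        }
    }
    return $bets;
}

$bets = betsForType(simplexml_load_string($sample), '25573360.20');
foreach ($bets as $bet) {
    echo "{$bet['name']} - {$bet['price']}\n";
}
```

Since $_GET values are already strings, the id from the query string can be passed in unchanged.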

Trim characters from RSS feed

I'm calling in a RSS feed to my website using PHP. Currently my code below is calling in the entire contents for pubDate:
<pubDate>Thu, 12 Sep 2013 07:23:59 +0000</pubDate>
How do I just display the day and month from the above example i.e. 12 Sep?
EDIT
I should clarify, the above line of code is an example output I currently get but as I'm calling the latest 3 posts from an RSS feed, this date and time will vary. I therefore need the code to be more dynamic (if that's the right term!)
This code is my full code that fetches the contents of an RSS feed:
<?php
$counter = 0;
$xml=simplexml_load_file("http://tutorial.world.edu/feed/");
foreach ($xml->channel->item as $item) {
$title = (string) $item->title; // Title Post
$link = (string) $item->link; // Url Link
$pubDate = (string) $item->pubDate; // date
$description = (string) $item->description; //Description Post
echo '<div class="display-rss-feed"><a href="'.$link.'" target="_blank" title="" >'.$title.' </a><br/><br/>';
echo $description.'<hr><p style="background-color:#e4f;">'.$pubDate.'</p></div>';
if($counter == 2 ) {
break;
} else {
$counter++;
}
} ?>
Use strtotime and date:
$pubDate = 'Thu, 12 Sep 2013 07:23:59 +0000';
$pubDate = date('j M', strtotime($pubDate)); //This is the only one you need!
var_dump($pubDate); //string(6) "12 Sep"
You can parse the date using date_parse and then use the values of month and day in the resulting array.
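A minimal sketch of the date_parse route, assuming the pubDate is in the usual RSS format; the helper name dayAndMonth is made up for the example. Note that date_parse() returns the month as a number, so it has to be turned back into a short name.

```php
<?php
// Hypothetical helper: formats an RSS pubDate as "day month", or null if it cannot be parsed.
function dayAndMonth(string $pubDate): ?string
{
    $parts = date_parse($pubDate);
    if ($parts['error_count'] > 0 || $parts['day'] === false || $parts['month'] === false) {
        return null;
    }
    // mktime() builds a timestamp in the given month so date('M') can name it.
    $monthName = date('M', mktime(0, 0, 0, $parts['month'], 1, 2000));
    return $parts['day'] . ' ' . $monthName;
}

echo dayAndMonth('Thu, 12 Sep 2013 07:23:59 +0000');
```

In the loop from the question this would replace the plain cast, for example $pubDate = dayAndMonth((string) $item->pubDate);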
You can use the preg_match() function with a suitable regular expression to fetch particular data.
For example:
$content="Thu, 12 Sep 2013 07:23:59 +0000";
preg_match("/.*,(.*)20[0-9][0-9]/", $content, $g_val);
trim($g_val[1]) would then be "12 Sep" (the raw capture has a space on each side).
Even this works
<?php
$str="<pubDate>Thu, 12 Sep 2013 07:23:59 +0000</pubDate>";
$str=explode(" ",$str);
echo $str[1]." ".$str[2];//12 Sep
EDIT:
<?php
$counter = 0;
$xml=simplexml_load_file("http://tutorial.world.edu/feed/");
foreach ($xml->channel->item as $item) {
$title = (string) $item->title; // Title Post
$link = (string) $item->link; // Url Link
$pubDate = (string) $item->pubDate; // date
$pubDate=explode(" ",$pubDate);
$pubDate = $pubDate[1]." ".$pubDate[2];
$description = (string) $item->description; //Description Post
echo '<div class="display-rss-feed"><a href="'.$link.'" target="_blank" title="" >'.$title.' </a><br/><br/>';
echo $description.'<hr><p style="background-color:#e4f;">'.$pubDate.'</p></div>';
if($counter == 2 ) {
break;
} else {
$counter++;
}
} ?>

Zend_Dom gives you a DOMElement... how do I use it?

I'm trying to use Zend_Dom for some very light screen scraping (I want to grab a headline, some body text and a link from a small block of news items on my website) and I'm not sure how to handle the DOMElement that it gives me.
In the manual for Zend_Dom the code says:
foreach ($results as $result) {
// $result is a DOMElement
}
How do I make use of this DOMElement?
A detailed example (looking for the anchor elements on Google):
$url='http://google.com/';
$client = new Zend_Http_Client($url);
$response = $client->request();
$html = $response->getBody();
$dom = new Zend_Dom_Query($html);
$results = $dom->query('a');
foreach($results as $r){
Zend_Debug::dump($r);
}
This gives me:
object(DOMElement)#81 (0) {
}
object(DOMElement)#82 (0) {
}
object(DOMElement)#83 (0) {
}
... etc, etc...
What I find confusing is that this looks like each element contains nothing (0)! This isn't the case but that is my first impression. So I poke around online and find I can add nodeValue to get something out of this:
Zend_Debug::dump($r->nodeValue);
which gives me:
string(6) "Images"
string(6) "Videos"
string(4) "Maps"
...etc, etc...
But where I run into trouble is getting specific elements and their contents.
For instance given this html:
<div class="newsBlurb">
<span class="newsDate">Mon, 11 October 2010</span>
<h3 class="newsHeadline">Some text</h3>
<a class="newsMore" href="http://foo.com/1/2/">More</a>
</div>
<div class="hr"></div>
<div class="newsBlurb">
<span class="newsDate">Mon, 16 August 2010</span>
<h3 class="newsHeadline">Stuff is here</h3>
<a class="newsMore" href="http://bar.com/pants.html">More</a>
</div>
I can grab the text from each newsBlurb, using the technique I use in the Google example, but cannot get each element by itself. I want to get the date and stick it somewhere, get the headline text and stick it somewhere and get the link to use. But all I get is the actual text in the div.
How do I get what I want from this?
EDIT
Here is another example that does not work as I expect. Any ideas why?
$url = 'http://php.net/manual/en/class.domelement.php';
$client = new Zend_Http_Client($url);
$response = $client->request();
$html = $response->getBody();
$dom = new Zend_Dom_Query($html);
$newsBlurbNode = $dom->query('div.note');
Zend_Debug::dump($newsBlurbNode);
this gives me:
object(Zend_Dom_Query_Result)#867 (7) {
["_count":protected] => NULL
["_cssQuery":protected] => string(8) "div.note"
["_document":protected] => object(DOMDocument)#79 (0) {
}
["_nodeList":protected] => object(DOMNodeList)#864 (0) {
}
["_position":protected] => int(0)
["_xpath":protected] => NULL
["_xpathQuery":protected] => string(33) "//div[contains(@class, ' note ')]"
}
Trying to get anything out of this I used:
$children = $newsBlurbNode->childNodes;
foreach ($children as $child) {
}
Which results in an error because the foreach loop has nothing in it. Ack! What am I not getting?
The query result is iterable, so loop over it first; each item is a DOMElement whose child nodes you can then walk:
foreach ($newsBlurbNode as $node) {
foreach ($node->childNodes as $child) {
//do something with individual nodes
}
}
Otherwise I would go through: http://php.net/manual/en/class.domelement.php
Hey I have been messing around with something similar - let me know if this is sufficient to help you out - if not I can explain it some more.
$data = "<p id='p_1'><a href='testing1.html'><span>testing in a span 1</span></a></p>
<p id='p_2'><a href='testing2.html'></a></p>
<p id='p_3'><a href='testing3.html'><span>testing in a span 3</span></a></p>
<p id='p_4'><a href='testing4.html'><span>testing in a span 4</span></a></p>
<p id='p_5'><a href='testing5.html'><span>testing in a span 5</span></a></p>";
$dom = new Zend_Dom_Query();
$dom->setDocumentHtml($data);
//Look for any links inside of paragraph tags
$results = $dom->query('p a');
foreach($results as $r){
echo "Parent Tag: ".$r->nodeName."<br />";
echo $r->nodeValue."<br />";
$children = $r->childNodes;
if($children->length > 0){
foreach($children as $c){
echo "Child Tag: <br />";
echo $c->nodeName."<br />";
echo $c->nodeValue."<br />";
}
}
echo $r->getAttribute('href')."<br /><br />";
}
echo $data;
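To tie this back to the newsBlurb markup in the question, here is a sketch that pulls out the date, headline and link of each blurb. It uses PHP's native DOMDocument and DOMXPath rather than Zend_Dom so that it runs on its own; since Zend_Dom_Query hands back the same DOMElement class, the calls inside the loop should carry over. The function name parseBlurbs is made up for the example.

```php
<?php
$html = <<<HTML
<div class="newsBlurb">
  <span class="newsDate">Mon, 11 October 2010</span>
  <h3 class="newsHeadline">Some text</h3>
  <a class="newsMore" href="http://foo.com/1/2/">More</a>
</div>
<div class="hr"></div>
<div class="newsBlurb">
  <span class="newsDate">Mon, 16 August 2010</span>
  <h3 class="newsHeadline">Stuff is here</h3>
  <a class="newsMore" href="http://bar.com/pants.html">More</a>
</div>
HTML;

// Hypothetical helper: one array per newsBlurb with its date, headline and link.
function parseBlurbs(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $blurbs = [];
    foreach ($xpath->query('//div[@class="newsBlurb"]') as $blurb) {
        // $blurb is a DOMElement, so it can be searched like a small document.
        $date     = $blurb->getElementsByTagName('span')->item(0);
        $headline = $blurb->getElementsByTagName('h3')->item(0);
        $link     = $blurb->getElementsByTagName('a')->item(0);

        $blurbs[] = [
            'date'     => $date ? trim($date->nodeValue) : null,
            'headline' => $headline ? trim($headline->nodeValue) : null,
            'href'     => $link ? $link->getAttribute('href') : null,
        ];
    }
    return $blurbs;
}

print_r(parseBlurbs($html));
```

With Zend_Dom the outer loop would presumably become foreach ($dom->query('div.newsBlurb') as $blurb), with the body unchanged.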
