how to add offset and limit to my php rss parser? - php

How can I add an offset and a limit to my PHP RSS parser, which returns the result as an object? Here is what I have at the moment. It has neither an offset nor a limit. How should I approach this?
class Rss
{
/*
*@access public
*@params url,int ''=default,int ''=default
*@usage input url,offset and limit,
*@returns content based on the offset/limit input
*/
public function getFeed($url,$offset='',$limit=''){
$object = array();
$rss = simplexml_load_file($url);
foreach($rss->channel->item as $item){
$object[] = $item->title;
$object[] = $item->description;
$object[] = $item->link;
}
return $object;
}
}

Simplest way:
$limit = 10; $offset = 5;
$i=0; $taken=0;
foreach($rss->channel->item as $item){
if ($i>=$offset && $taken<$limit){
++$taken;
$object[] = $item->title;
$object[] = $item->description;
$object[] = $item->link;
}
//little optimization here
if ($taken == $limit)
break;
++$i;
}
Of course you can store $limit and $offset as object properties, or get them elsewhere.

How about a single counter? Set the offset and limit as needed (3 and 8 are only example defaults):
public function getFeed($url, $offset = 3, $limit = 8){
$object = array();
$rss = simplexml_load_file($url);
$counter = 0;
foreach($rss->channel->item as $item){
$counter++;
// skip the first $offset items, then take at most $limit items
if ($counter > $offset && $counter <= $offset + $limit) {
$object[] = $item->title;
$object[] = $item->description;
$object[] = $item->link;
}
}
return $object;
}

You can use SimpleXMLElement::xpath. This way you don't have to traverse all items just for counting things.
public function getFeed($url, $offset = 1, $limit = -1){
$object = array();
$rss = simplexml_load_file($url);
$limitCriteria = '';
if ($limit > 0) {
// $offset is 1-based here; select positions $offset .. $offset + $limit - 1
$limitCriteria = 'and position() < ' . ((int)$offset + (int)$limit);
}
foreach($rss->xpath(sprintf('//item[position() >= %s %s]', (int)$offset, $limitCriteria)) as $item){
$object[] = $item->title;
$object[] = $item->description;
$object[] = $item->link;
}
return $object;
}
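Another option, sketched here under the assumption that the feed follows the usual rss/channel/item layout, is to collect the items with xpath() and let array_slice apply the offset and limit. The function name sliceFeedItems and the sample XML are made up for illustration, and the offset is 0-based in this version.

```php
<?php
// Hypothetical helper: returns at most $limit items, skipping the first $offset (0-based).
// A $limit of null means "everything after the offset".
function sliceFeedItems(SimpleXMLElement $rss, int $offset = 0, ?int $limit = null): array
{
    // xpath() returns an array of SimpleXMLElement nodes in document order
    $items = $rss->xpath('/rss/channel/item');
    if (!is_array($items)) {
        return array();
    }

    $result = array();
    foreach (array_slice($items, $offset, $limit) as $item) {
        $result[] = array(
            'title'       => (string) $item->title,
            'description' => (string) $item->description,
            'link'        => (string) $item->link,
        );
    }
    return $result;
}

// Sample feed for illustration only
$xml = '<rss><channel>'
     . '<item><title>A</title><description>a</description><link>http://example.com/a</link></item>'
     . '<item><title>B</title><description>b</description><link>http://example.com/b</link></item>'
     . '<item><title>C</title><description>c</description><link>http://example.com/c</link></item>'
     . '<item><title>D</title><description>d</description><link>http://example.com/d</link></item>'
     . '</channel></rss>';

$rss  = simplexml_load_string($xml);
$page = sliceFeedItems($rss, 1, 2); // skip one item, take two
```

Casting each field to a string keeps the result free of SimpleXMLElement objects, which may be convenient if the array is later serialized or passed to json_encode.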

Related

how to scrape a webpage with pagination

I'm setting up a new server and want to scrape some information from a website.
This is my code. I tried to scrape the pages one by one, but I only get 2 of the pages.
$result = array();
function scrapingAnimelist($url, $page)
{
$res = array();
$urlParsed = $url . "&page=" . $page;
$html = file_get_html($urlParsed);
$pageData = array();
foreach ($html->find('div[class=body]') as $item) {
$metaData = array();
$metaData['title'] = $item->find('h2[class=title]', 0)->innertext;
$metaData['img'] = $item->find('img[class=img]', 0)->src;
$metaData['url'] = $item->find('a', 0)->href;
array_push($pageData, $metaData);
}
$res[$page] = $pageData;
if (sizeof($pageData) == 20) {
$page++;
$res[$page] = scrapingAnimelist($url, $page);
}
global $result;
$result = $res;
return $pageData;
}
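I expected the output JSON object to contain 3 arrays (one per page of data) for this link, but it only contains 2: https://anime-list2.cf/anime-search?s=mag
Your $result is not set correctly on the second run. Each call builds its own local $res and then overwrites the global $result with it, so only the two pages held by the outermost call survive.
You should write directly into the global $result instead, like this: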
$result = array();
function scrapingAnimelist($url, $page) {
global $result;
$urlParsed = $url . "&page=" . $page;
$html = file_get_html($urlParsed);
$pageData = array();
foreach ($html->find('div[class=body]') as $item) {
$metaData = array();
$metaData['title'] = $item->find('h2[class=title]', 0)->innertext;
$metaData['img'] = $item->find('img[class=img]', 0)->src;
$metaData['url'] = $item->find('a', 0)->href;
array_push($pageData, $metaData);
}
$result[$page] = $pageData;
if (sizeof($pageData) == 20) {
return scrapingAnimelist($url, $page + 1);
}
return $result;
}
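The recursion and the global can also be avoided with a plain loop. The sketch below is only an illustration: scrapeAllPages and the $fetchPage callable are hypothetical names, $fetchPage stands in for the file_get_html parsing shown above, and the page size of 20 is taken from the question. The $maxPages cap is an added safety assumption.

```php
<?php
// Hypothetical helper: keeps requesting pages until one comes back
// with fewer than $pageSize entries, or until $maxPages is reached.
function scrapeAllPages(callable $fetchPage, int $pageSize = 20, int $maxPages = 100): array
{
    $result = array();
    for ($page = 1; $page <= $maxPages; $page++) {
        $pageData = $fetchPage($page);
        $result[$page] = $pageData;
        if (count($pageData) < $pageSize) {
            break; // a short page means it was the last one
        }
    }
    return $result;
}

// Stub fetcher for illustration: pretends the site has 45 entries in total
$fakeFetch = function (int $page): array {
    $total = 45;
    $start = ($page - 1) * 20;
    $count = max(0, min(20, $total - $start));
    return $count > 0 ? range($start + 1, $start + $count) : array();
};

$all = scrapeAllPages($fakeFetch);
```

In real use, $fetchPage would build the URL with "&page=" . $page and return the $pageData array for that page.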

php array keep pushing in the depth

My current code is like :
<?php
$item = "123.456.789.963.852.741";
$item_arr = explode(".", $item);
$inner_count = count($item_arr);
$parent_element = "myarray";
if($inner_count==3){
$my_array[$item_arr[0]]['XYZ-Key'][$item_arr[1]]['XYZ-Key'][$item_arr[2]]['order'] = $count;
$my_array[$parent_element]['XYZ-Key'][$item_arr[1]]['XYZ-Key'][$item_arr[2]]=$my_array[$item_arr[0]]['XYZ-Key'][$item_arr[1]]['XYZ-Key'][$item_arr[2]];
unset($my_array[$item_arr[0]]['XYZ-Key'][$item_arr[1]]['XYZ-Key'][$item_arr[2]]);
}
if($inner_count==4){
$my_array[$item_arr[0]]['XYZ-Key'][$item_arr[1]]['XYZ-Key'][$item_arr[2]]['XYZ-Key'][$item_arr[3]]['order'] = $count;
$my_array[$parent_element]['XYZ-Key'][$item_arr[1]]['XYZ-Key'][$item_arr[2]]['XYZ-Key'][$item_arr[3]]=$my_array[$item_arr[0]]['XYZ-Key'][$item_arr[1]]['XYZ-Key'][$item_arr[2]]['XYZ-Key'][$item_arr[3]];
unset($my_array[$item_arr[0]]['XYZ-Key'][$item_arr[1]]['XYZ-Key'][$item_arr[2]]['XYZ-Key'][$item_arr[3]]);
}
if($inner_count==5){
$my_array[$item_arr[0]]['XYZ-Key'][$item_arr[1]]['XYZ-Key'][$item_arr[2]]['XYZ-Key'][$item_arr[3]]['XYZ-Key'][$item_arr[4]]['order'] = $count;
$my_array[$parent_element]['XYZ-Key'][$item_arr[1]]['XYZ-Key'][$item_arr[2]]['XYZ-Key'][$item_arr[3]]['XYZ-Key'][$item_arr[4]]=$my_array[$item_arr[0]]['XYZ-Key'][$item_arr[1]]['XYZ-Key'][$item_arr[2]]['XYZ-Key'][$item_arr[3]]['XYZ-Key'][$item_arr[4]];
unset($my_array[$item_arr[0]]['XYZ-Key'][$item_arr[1]]['XYZ-Key'][$item_arr[2]]['XYZ-Key'][$item_arr[3]]['XYZ-Key'][$item_arr[4]]);
}
Now I want to extend it to larger counts (right now the code only handles up to 5),
but the problem is that I cannot keep extending the code in the same way.
You can do it in this manner. Just change the string to match your format:
$item = "1.2.3.4.5";
$ar = explode('.', $item);
$start = '{';
$end = '}';
foreach($ar as $i) {
$start .= '"'.$i.'":{';
$end = '}'.$end;
}
$temp = $start . '"order":{}' .$end; // {"1":{"2":{"3":...{"order":{}}}}}
$res = json_decode($temp,true);
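If building a JSON string feels indirect, the same nesting can be produced by walking the array with a reference. This is a sketch under assumptions: setNestedOrder is a hypothetical helper name, and the key layout (first key replaced by the parent element, every later key placed under 'XYZ-Key') is inferred from the question's code.

```php
<?php
// Hypothetical helper: walks down $keys, creating each level under $childKey,
// and sets 'order' on the deepest element. Works for any number of keys.
function setNestedOrder(array &$tree, array $keys, string $childKey, $order): void
{
    $node = &$tree;
    foreach ($keys as $i => $key) {
        if ($i > 0) {
            $node = &$node[$childKey]; // descend into the child container
        }
        $node = &$node[$key];          // descend into (or create) this key
    }
    $node['order'] = $order;
}

$item = "123.456.789.963";
$keys = explode(".", $item);
$keys[0] = "myarray"; // the question moves everything under $parent_element

$tree = array();
setNestedOrder($tree, $keys, 'XYZ-Key', 1);
```

Taking a reference to a missing key creates it, so no depth-specific branches are needed.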

What is wrong with this php string

I am new to PHP and am trying to get an RSS reader to display a message when there is nothing to display.
I asked for some help yesterday and was kindly assisted, and it made sense, but for some reason the value is not being stored.
I am hoping someone can tell me what is wrong with the following PHP.
<?php
require_once("rsslib.php");
$url = "http://www.bom.gov.au/fwo/IDZ00063.warnings_land_qld.xml";
$rss123 = RSS_Display($url, 3, false, true);
if (count($rss123) < 1)
{
// nothing shown, do whatever you want
echo 'There are no current warnings';
echo '<style type="text/css">
#flashing_wrapper {
display: none;
}
</style>';
}
else
{
// something to display
echo $rss123;
}
?>
My problem is that it doesn't seem to be storing a value in $rss123.
It can be viewed at the following address: http://goo.gl/12XQSe
Thanks in advance,
Pete
----- EDIT ------
As requested in a comment, RSS_Display is from the rsslib.php file, which is as follows
<?php
/*
RSS Extractor and Displayer
(c) 2007-2010 Scriptol.com - Licence Mozilla 1.1.
rsslib.php
Requirements:
- PHP 5.
- A RSS feed.
Using the library:
Insert this code into the page that displays the RSS feed:
<?php
require_once("rsslib.php");
echo RSS_Display("http://www.xul.fr/rss.xml", 15);
? >
*/
$RSS_Content = array();
function RSS_Tags($item, $type)
{
$y = array();
$tnl = $item->getElementsByTagName("title");
$tnl = $tnl->item(0);
$title = $tnl->firstChild->textContent;
$tnl = $item->getElementsByTagName("link");
$tnl = $tnl->item(0);
$link = $tnl->firstChild->textContent;
$tnl = $item->getElementsByTagName("pubDate");
$tnl = $tnl->item(0);
$date = $tnl->firstChild->textContent;
$tnl = $item->getElementsByTagName("description");
$tnl = $tnl->item(0);
$description = $tnl->firstChild->textContent;
$y["title"] = $title;
$y["link"] = $link;
$y["date"] = $date;
$y["description"] = $description;
$y["type"] = $type;
return $y;
}
function RSS_Channel($channel)
{
global $RSS_Content;
$items = $channel->getElementsByTagName("item");
// Processing channel
$y = RSS_Tags($channel, 0); // get description of channel, type 0
array_push($RSS_Content, $y);
// Processing articles
foreach($items as $item)
{
$y = RSS_Tags($item, 1); // get description of article, type 1
array_push($RSS_Content, $y);
}
}
function RSS_Retrieve($url)
{
global $RSS_Content;
$doc = new DOMDocument();
$doc->load($url);
$channels = $doc->getElementsByTagName("channel");
$RSS_Content = array();
foreach($channels as $channel)
{
RSS_Channel($channel);
}
}
function RSS_RetrieveLinks($url)
{
global $RSS_Content;
$doc = new DOMDocument();
$doc->load($url);
$channels = $doc->getElementsByTagName("channel");
$RSS_Content = array();
foreach($channels as $channel)
{
$items = $channel->getElementsByTagName("item");
foreach($items as $item)
{
$y = RSS_Tags($item, 1); // get description of article, type 1
array_push($RSS_Content, $y);
}
}
}
function RSS_Links($url, $size = 15)
{
global $RSS_Content;
$page = "<ul>";
RSS_RetrieveLinks($url);
if($size > 0)
$recents = array_slice($RSS_Content, 0, $size + 1);
foreach($recents as $article)
{
$type = $article["type"];
if($type == 0) continue;
$title = $article["title"];
$link = $article["link"];
$page .= "<li>$title</li>\n";
}
$page .="</ul>\n";
return $page;
}
function RSS_Display($url, $size = 18, $site = 0, $withdate = 0)
{
global $RSS_Content;
$opened = false;
$page = "";
$site = (intval($site) == 0) ? 1 : 0;
RSS_Retrieve($url);
if($size > 0)
$recents = array_slice($RSS_Content, $site, $size + 1 - $site);
foreach($recents as $article)
{
$type = $article["type"];
if($type == 0)
{
if($opened == true)
{
$page .="</ul>\n";
$opened = false;
}
$page .="<b>";
}
else
{
if($opened == false)
{
$page .= "<ul>\n";
$opened = true;
}
}
$title = $article["title"];
$link = $article["link"];
$page .= "<li>$title";
if($withdate)
{
$date = $article["date"];
$page .=' <span class="rssdate">'.$date.'</span>';
}
$description = $article["description"];
if($description != false)
{
$page .= "<br><span class='rssdesc'>$description</span>";
}
$page .= "</li>\n";
if($type==0)
{
$page .="</b><br />";
}
}
if($opened == true)
{
$page .="</ul>\n";
}
return $page."\n";
}
?>
There seems to be something wrong with the XML file you are using. I tried it with another XML file by replacing the URL with this value: $url = "http://www.scriptol.com/rss.xml";
Oddly enough, it seems to be working now with the old XML file as well.
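A separate thing worth checking, independent of the feed itself: RSS_Display as listed above returns a string (it ends with return $page."\n";), so count($rss123) is not a meaningful emptiness test, and on PHP 8 calling count() on a string throws a TypeError. A minimal sketch of a string-based check follows. hasFeedOutput is a hypothetical helper name, and the sample strings merely stand in for what RSS_Display might return.

```php
<?php
// Hypothetical helper: true when the rendered markup contains
// anything besides whitespace.
function hasFeedOutput(string $html): bool
{
    return trim($html) !== '';
}

// Stand-ins for RSS_Display() results (assumed, for illustration)
$empty  = "\n"; // only the trailing newline, no article markup
$filled = "<ul>\n<li>Severe weather warning</li>\n</ul>\n";

foreach (array($empty, $filled) as $rss123) {
    if (hasFeedOutput($rss123)) {
        echo $rss123;
    } else {
        echo "There are no current warnings\n";
    }
}
```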

XML parsing in php

I am parsing XML, but there is a tag that contains both an image and text. I want to separate the image and the text into different columns of a table in my layout, but I don't know how to do it. Please help me. My PHP file is:
<?php
$RSS_Content = array();
function RSS_Tags($item, $type)
{
$y = array();
$tnl = $item->getElementsByTagName("title");
$tnl = $tnl->item(0);
$title = $tnl->firstChild->textContent;
$tnl = $item->getElementsByTagName("link");
$tnl = $tnl->item(0);
$link = $tnl->firstChild->textContent;
$tnl = $item->getElementsByTagName("description");
$tnl = $tnl->item(0);
$img = $tnl->firstChild->textContent;
$y["title"] = $title;
$y["link"] = $link;
$y["description"] = $img;
$y["type"] = $type;
return $y;
}
function RSS_Channel($channel)
{
global $RSS_Content;
$items = $channel->getElementsByTagName("item");
// Processing channel
$y = RSS_Tags($channel, 0); // get description of channel, type 0
array_push($RSS_Content, $y);
// Processing articles
foreach($items as $item)
{
$y = RSS_Tags($item, 1); // get description of article, type 1
array_push($RSS_Content, $y);
}
}
function RSS_Retrieve($url)
{
global $RSS_Content;
$doc = new DOMDocument();
$doc->load($url);
$channels = $doc->getElementsByTagName("channel");
$RSS_Content = array();
foreach($channels as $channel)
{
RSS_Channel($channel);
}
}
function RSS_RetrieveLinks($url)
{
global $RSS_Content;
$doc = new DOMDocument();
$doc->load($url);
$channels = $doc->getElementsByTagName("channel");
$RSS_Content = array();
foreach($channels as $channel)
{
$items = $channel->getElementsByTagName("item");
foreach($items as $item)
{
$y = RSS_Tags($item, 1);
array_push($RSS_Content, $y);
}
}
}
function RSS_Links($url, $size = 15)
{
global $RSS_Content;
$page = "<ul>";
RSS_RetrieveLinks($url);
if($size > 0)
$recents = array_slice($RSS_Content, 0, $size + 1);
foreach($recents as $article)
{
$type = $article["type"];
if($type == 0) continue;
$title = $article["title"];
$link = $article["link"];
$img = $article["description"];
$page .= "$title\n";
}
$page .="</ul>\n";
return $page;
}
function RSS_Display($url, $click, $size = 8, $site = 0, $withdate = 0)
{
global $RSS_Content;
$opened = false;
$page = "";
$site = (intval($site) == 0) ? 1 : 0;
RSS_Retrieve($url);
if($size > 0)
$recents = array_slice($RSS_Content, $site, $size + 1 - $site);
foreach($recents as $article)
{
$type = $article["type"];
if($type == 0)
{
if($opened == true)
{
$page .="</ul>\n";
$opened = false;
}
$page .="<b>";
}
else
{
if($opened == false)
{
$page .= "<table width='369' border='0'>
<tr>";
$opened = true;
}
}
$title = $article["title"];
$link = $article["link"];
$img = $article["description"];
$page .= "<td width='125' align='center' valign='middle'>
<div align='center'>$img</div></td>
<td width='228' align='left' valign='middle'><div align='left'><a
href=\"$click\" target='_top'>$title</a></div></td>";
if($withdate)
{
$date = $article["date"];
$page .=' <span class="rssdate">'.$date.'</span>';
}
if($type==0)
{
$page .="<br />";
}
}
if($opened == true)
{
$page .="</tr>
</table>";
}
return $page."\n";
}
?>
To separate the image and the description, you need to parse the HTML stored inside the description element again as XML. Luckily, the content of that element is valid XML, so you can do this straightforwardly with SimpleXML. The following code example takes the URL, converts each item's description into text only, and extracts the src attribute of the image to store it as the image element, so that each item ends up like this:
<item>
<title>Fake encounter: BJP backs Kataria, says CBI targeting Modi</title>
<link>http://ibnlive.in.com/news/fake-encounter-bjp-backs-kataria-says-cbi-targeting-modi/391802-37-64.html</link>
<description>The BJP lashed out at the CBI and questioned its 'shoddy investigation' into the Sohrabuddin fake encounter case.</description>
<pubDate>Wed, 15 May 2013 13:48:56 +0530</pubDate>
<guid>http://ibnlive.in.com/news/fake-encounter-bjp-backs-kataria-says-cbi-targeting-modi/391802-37-64.html</guid>
<image>http://static.ibnlive.in.com/ibnlive/pix/sitepix/05_2013/bjplive_kataria3.jpg</image>
</item>
The code-example is:
$url = 'http://ibnlive.in.com/ibnrss/top.xml';
$feed = simplexml_load_file($url);
$items = $feed->xpath('(//channel/item)');
foreach ($items as $item) {
list($description, $image) =
simplexml_load_string("<r>$item->description</r>")
->xpath('(/r|/r//@src)');
$item->description = (string)$description;
$item->image = (string)$image;
}
You can then import the SimpleXML element into a DOMElement with dom_import_simplexml(). Honestly, though, I would just build that small piece of HTML inside a foreach over the SimpleXML items as well. You can use LimitIterator for the paging just as you could with DOMDocument, and the data you need is easier to reach with SimpleXML. It is simpler to pass the XML elements along as SimpleXMLElement objects than to parse them into an array first and then process the array, which makes the intermediate array moot.
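The paging idea could look roughly like this. It is a sketch only: the inline feed is made up, and wrapping the xpath() result in an ArrayIterator is one choice among several for giving LimitIterator something to iterate.

```php
<?php
// Made-up feed for illustration
$feed = simplexml_load_string(
    '<rss><channel>'
    . '<item><title>One</title></item>'
    . '<item><title>Two</title></item>'
    . '<item><title>Three</title></item>'
    . '<item><title>Four</title></item>'
    . '</channel></rss>'
);

$items  = $feed->xpath('//channel/item'); // array of SimpleXMLElement
$offset = 1; // 0-based: skip the first item
$count  = 2; // show at most two items per page

$titles = array();
foreach (new LimitIterator(new ArrayIterator($items), $offset, $count) as $item) {
    // each $item is still a SimpleXMLElement, so child elements are at hand
    $titles[] = (string) $item->title;
}
```

The HTML for each table row can be built inside the same loop, using $item->title, $item->description and $item->image directly.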

Using the script "RSSlib" (PHP), how can I sort items in an RSS feed so the oldest shows first?

I have two pages: one is the script itself, and the other is the page that calls it. I can display my feed fine, but I need to display it in reverse order (oldest first). I have tried the asort and arsort functions, but I can't get them to work.
Here is the code:
$RSS_Content = array();
function RSS_Tags($item, $type)
{
$y = array();
$tnl = $item->getElementsByTagName("title");
$tnl = $tnl->item(0);
$title = $tnl->firstChild->data;
$tnl = $item->getElementsByTagName("link");
$tnl = $tnl->item(0);
$link = $tnl->firstChild->data;
//$tnl = $item->getElementsByTagName("description");
// $tnl = $tnl->item(0);
// $description = $tnl->firstChild->data;
$y["title"] = $title;
$y["link"] = $link;
//$y["description"] = $description;
$y["type"] = $type;
return $y;
}
function RSS_Channel($channel)
{
global $RSS_Content;
$items = $channel->getElementsByTagName("item");
// Processing channel
$y = RSS_Tags($channel, 0); // get description of channel, type 0
array_push($RSS_Content, $y);
// Processing articles
foreach($items as $item)
{
$y = RSS_Tags($item, 1); // get description of article, type 1
array_push($RSS_Content, $y);
}
}
function RSS_Retrieve($url)
{
global $RSS_Content;
$doc = new DOMDocument();
$doc->load($url);
$channels = $doc->getElementsByTagName("channel");
$RSS_Content = array();
foreach($channels as $channel)
{
RSS_Channel($channel);
}
}
function RSS_RetrieveLinks($url)
{
global $RSS_Content;
$doc = new DOMDocument();
$doc->load($url);
$channels = $doc->getElementsByTagName("channel");
$RSS_Content = array();
foreach($channels as $channel)
{
$items = $channel->getElementsByTagName("item");
foreach($items as $item)
{
$y = RSS_Tags($item, 1); // get description of article, type 1
array_push($RSS_Content, $y);
}
}
}
function RSS_Links($url, $size)
{
global $RSS_Content;
$page = "<ul>";
RSS_RetrieveLinks($url);
if($size > 0)
$recents = array_slice($RSS_Content, 0, $size);
foreach($recents as $article)
{
$type = $article["type"];
if($type == 0) continue;
$title = $article["title"];
$link = $article["link"];
$page .= "<li>$title</li>\n";
}
$page .="</ul>\n";
return $page;
}
function RSS_Display($url, $size)
{
global $RSS_Content;
asort($RSS_Content);
$opened = false;
$page = "";
RSS_Retrieve($url);
if($size > 0)
$recents = array_slice($RSS_Content, 0, $size);
foreach($recents as $article)
{
$type = $article["type"];
if($type == 0)
{
if($opened == true)
{
$page .="</ul>\n";
$opened = false;
}
$page .="<b>";
}
else
{
if($opened == false)
{
$page .= "<ul>\n";
$opened = true;
}
}
$title = $article["title"];
$link = $article["link"];
// $description = $article["description"];
$page .= "<p>$title";
// if($description != false)
{
//$page .= "<br>$description";
}
$page .= "</p>\n";
if($type==0)
{
$page .="</b><br />";
}
}
if($opened == true)
{
$page .="</ul>\n";
}
return $page."\n";
}
Then on the second page, I have this:
$url = "feedurlhere.xml";
echo RSS_Links($url, 10);
?>
</div>
Thanks to Scriptol, the answer is:
change all instances of array_push to array_unshift, so that each item is added to the front of the array instead of the end
Cheers
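If the entries are already collected in an array, an alternative is to reverse the list just before rendering. A small sketch, with made-up entries in the shape RSS_Tags produces. Note that asort compares the entries themselves rather than any date, and in the posted RSS_Display it runs before RSS_Retrieve has filled the array, so it would not give an oldest-first order there.

```php
<?php
// Made-up entries, newest first, as a feed usually delivers them
$RSS_Content = array(
    array('title' => 'Newest', 'link' => 'http://example.com/3', 'type' => 1),
    array('title' => 'Middle', 'link' => 'http://example.com/2', 'type' => 1),
    array('title' => 'Oldest', 'link' => 'http://example.com/1', 'type' => 1),
);

// array_reverse returns a new list with the order flipped (oldest first)
$oldestFirst = array_reverse($RSS_Content);

// Prepending while collecting produces the same order
$collected = array();
foreach ($RSS_Content as $entry) {
    array_unshift($collected, $entry);
}
```

Both variants assume the feed lists its items newest first. If it does not, sorting on a parsed pubDate with usort would be the more reliable route.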
