Parsing Tumblr XML Breaks PHP Loop

I'm building a website that relies on reading the XML produced by Tumblr for my photoblog and extracting blog post URLs and image URLs for display and linking.
I've written PHP code that parses the XML and uses an outer and an inner loop to deal with Tumblr's limit of at most 50 posts per request. The loops should process requests that return batches of 50 URLs, with the final pass handling only the remainder.
Currently the blog has 53 posts, so the code determines that 2 requests should be read in 2 loops:
The first loop reads: http://abstractmueller.tumblr.com/api/read?start=0&num=50
The second loop reads: http://abstractmueller.tumblr.com/api/read?start=50&num=3
In both cases the XML is loaded into the $xml variable and then parsed for specific data that is collected in an inner loop. When executed, the code should simply produce a list of image numbers, blog post URLs, and image URLs across the entire set of loops.
When I execute this code, the loop always breaks after parsing 20 items whenever I read from the XML file returned by the request. If I test the code with the references to the XML output removed, just letting it count through the loops with simple output, it works.
I don't understand what in the XML file might be causing this, though there is a subtle change in some of the XML data after the first 20 posts.
Can anyone help?
Here is the code:
<?php
// Get XML file from the Tumblr API and load it
$request_url = "http://abstractmueller.tumblr.com/api/read";
$xml = simplexml_load_file($request_url);

// Get total number of posts, set max posts per request, and calculate loop parameters
$posts_data = $xml->posts[0]->attributes();
$posts_max  = 50;
$loop_number    = ceil($posts_data['total'] / $posts_max);
$loop_remainder = $posts_data['total'] % $posts_max;

echo '<p>Total posts = '.$posts_data['total'].'</p>';
echo '<p>Loop number = '.$loop_number.'</p>';
?>
<div>
<?php
// Start outer loop to fetch up to 50 post-related image URLs per request
for ($outer_loop = 1; $outer_loop <= $loop_number; $outer_loop++) {
    $post_start = ($outer_loop - 1) * $posts_max;
    echo '<p>Current loop = '.$outer_loop.'</p>';

    // Branch the looping so the first branch handles each full batch of 50 images,
    // and the last pass is limited to the remainder
    if ($outer_loop < $loop_number) {
        echo '<p>Post start = '.$post_start.'</p>';
        echo '<p>Loop end = '.($post_start + $posts_max - 1).'</p>';
        $request_url = 'http://abstractmueller.tumblr.com/api/read?start='.$post_start.'&num='.$posts_max;
        echo '<p>XML URL '.$request_url.'</p>';

        // Get post URLs and image URLs in batches of 50
        for ($img_num = $post_start; $img_num < ($post_start + $posts_max); $img_num++) {
            $blog_url  = $xml->posts->post[$img_num]->attributes();
            $photo_url = $xml->posts->post[$img_num]->{'photo-url'}[2];
            echo '<p>Image '.$img_num.' Blog URL '.$blog_url['url'].' Image URL '.$photo_url.'</p>';
        }
    } else {
        echo '<p>Post start = '.$post_start.'</p>';
        echo '<p>Loop end = '.($post_start + $loop_remainder - 1).'</p>';
        $request_url = 'http://abstractmueller.tumblr.com/api/read?start='.$post_start.'&num='.$loop_remainder;
        echo '<p>XML URL '.$request_url.'</p>';

        // Get post URLs and image URLs up to the total remainder
        for ($img_num = $post_start; $img_num <= $loop_remainder + $posts_max; $img_num++) {
            $blog_url  = $xml->posts->post[$img_num]->attributes();
            $photo_url = $xml->posts->post[$img_num]->{'photo-url'}[2];
            echo '<p>Image '.$img_num.' Blog URL '.$blog_url['url'].' Image URL '.$photo_url.'</p>';
        }
    }
}
?>
</div>

Related

Get pagination results in Active Collab API

I have just discovered you can get paginated results through the API by passing in the page parameter like so:
$projects = $client->get('projects/147/time-records?page=3')->getJson();
Is there a way of knowing how many time records a project has, so I know how many times I need to paginate?
Alternatively, how would I go about retrieving several pages' worth of data? I'm struggling with the code!
I have created an issue on GitHub and will await a response.
For now, I do the following:
// Get all the projects
// Set the page number
$page = 1;

// Create an empty array
$project_records = array();

// Get the first page of results
$project_records_results = $client->get('projects?page=' . $page)->getJson();

// Merge the results with the base array
$project_records = array_merge($project_records, $project_records_results);

// Get the next page of results;
// if it returns something, merge it into the base array and continue
while ($project_records_results = $client->get('projects?page=' . ++$page)->getJson()) {
    $project_records = array_merge($project_records, $project_records_results);
}
Sure. All paginated results will include the following headers:
X-Angie-PaginationCurrentPage - indicates the current page
X-Angie-PaginationItemsPerPage - indicates the number of items per page
X-Angie-PaginationTotalItems - indicates the number of items in the entire data set
Once you have the header values, a simple:
$total_pages = ceil($total_items_header_value / $items_per_page_header_value);
will give you the number of pages in the collection.
Alternative: you can iterate through the pages (starting with the page GET parameter set to 1 and incrementing it) until you get an empty result (a page with no records). The page that returns no records is the last page.
Please note that the headers are now all lowercase (v1)!
So the answer above should be corrected.
To get them, call:
$headers = $client->get($path)->getHeaders();
Working code example from /api/v1/:
$paginationCurrentPage = isset($headers['x-angie-paginationcurrentpage'][0]) ? $headers['x-angie-paginationcurrentpage'][0] : NULL;
$paginationItemsPerPage = isset($headers['x-angie-paginationitemsperpage'][0]) ? $headers['x-angie-paginationitemsperpage'][0] : NULL;
$paginationTotalItems = isset($headers['x-angie-paginationtotalitems'][0]) ? $headers['x-angie-paginationtotalitems'][0] : NULL;
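Putting those pieces together, a minimal sketch of a header-driven fetch loop (the loop itself is illustrative; $client, $path, getJson() and the header names come from the snippets above):
$headers = $client->get($path)->getHeaders();
$items_per_page = isset($headers['x-angie-paginationitemsperpage'][0]) ? $headers['x-angie-paginationitemsperpage'][0] : 0;
$total_items    = isset($headers['x-angie-paginationtotalitems'][0]) ? $headers['x-angie-paginationtotalitems'][0] : 0;

$records = array();
if ($items_per_page > 0) {
    $total_pages = ceil($total_items / $items_per_page);
    // Fetch each page exactly once and collect the rows
    for ($page = 1; $page <= $total_pages; $page++) {
        $records = array_merge($records, $client->get($path . '?page=' . $page)->getJson());
    }
}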

paginated api request, how to know if there is another page?

I am creating a PHP class that uses a 3rd-party API. The API has a method with this request URL structure:
https://api.domain.com/path/sales?page=x
Where "x" is the page number.
Each page returns 50 sales, and I need to fetch an undefined number of pages for each user (depending on the user's sales) and store some data from each sale.
I have already created some methods that get the data from the URL, decode it, and create a new array with the desired data, but only for the first page request.
Now I want to create a method that checks whether there is another page and, if there is, gets it and repeats the check.
How can I check if there is another page? And how do I create a loop that gets the next page if there is one?
I already have this code, but it creates an infinite loop.
require('classes/class.example_api.php');

$my_class = new Example_API;

$page = 1;
$sales_url = $my_class->sales_url($page);
$url = $my_class->get_data($sales_url);

while (!empty($url)) {
    $page++;
    $sales_url = $my_class->sales_url($page);
    $url = $my_class->get_data($sales_url);
}
I don't use cURL; I use file_get_contents. When I request a page out of range, I get this result:
string(2) "[]"
And this after json_decode:
array(0) { }
From your input: in the while loop you reassign $url (which actually holds the data returned by the API call), and that is what gets checked for emptiness, if I'm correct.
$url = $my_class->get_data($sales_url);
If the above is just the raw response (so, for a page out of range, the string "[]"), then empty("[]") will never be true. So my guess is that the return value of get_data is this string, while it should be the actual array/JSON even when the result is empty (i.e. I suspect you perform the json_decode only once you have collected the data, e.g. outside the loop).
If this is the case, my suggestion would be to either check for "[]" in the loop (e.g. while ($url !== "[]")) or decode the response data within the loop ($url = json_decode($url)).
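A minimal sketch of the second option, keeping the names from the question (sales_url and get_data are the question's own methods; decoding inside the loop is the suggested change):
$page = 1;
$sales = json_decode($my_class->get_data($my_class->sales_url($page)), true);

// An out-of-range page decodes to an empty array, which ends the loop
while (!empty($sales)) {
    // ... store the data you need from $sales ...
    $page++;
    $sales = json_decode($my_class->get_data($my_class->sales_url($page)), true);
}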
From my experience with several APIs, the response returns the number of rows found plus x number per page, starting with page 1.
In your case, if the response includes the number of rows, just divide it by the per-page count and loop through the results as page numbers.
$results = 1000;
$perPage = 50;
$pages = ceil($results / $perPage);

for ($i = 1; $i <= $pages; $i++) {
    // execute your API call and store the results
}
Hope this helps.
From the responses you've shown, you get the string "[]" (an empty array once decoded) when there are no results. In that case, you can decode each page and use empty() in a loop to determine whether there's anything left to fetch:
// Craft the initial request URL
$page = 1;
$url = 'https://api.domain.com/path/sales?page=' . $page;

// Now start looping; an out-of-range page decodes to an empty array, which stops the loop
while (!empty($data = json_decode(file_get_contents($url), true))) {
    // There's data here, do something with it

    // And set the new URL for the next page
    $url = 'https://api.domain.com/path/sales?page=' . ++$page;
}
That way it will keep looping over all the pages until there is no more data.
Check the HTTP response headers for the total number of items in the set.

RSS to HTML with varying number of elements

I've adapted the code found at http://www.w3schools.com/php/php_ajax_rss_reader.asp to turn XML into HTML, and it works fine.
But what I'm stuck on is getting it to show all the items in a feed when the feed can have a varying number of items. The feed is published daily and can have anywhere from 12-20 articles in it, and I want to show all of them.
In the for loop for ($i=0; $i<=12; $i++), if I set the condition to be greater than the number of articles, I get PHP Fatal error: Call to a member function getElementsByTagName(), so I can't just set it to a big number.
I get the same error if I just remove the condition.
I can't figure out how to count the number of items either; if I could do that, the solution would be easy.
The feed is created in-house, so I could ask my colleague to insert the number of items in the feed; is that the best way to go about it?
Thanks!
If you don't know the number of items in the feed, you can go through them all using a foreach loop. Here is an example using the RSS feed for the PHP tag on Stack Overflow. Have a look at the feed so you can see what each entry looks like, and compare it to the code below.
# start off like the w3schools code...
$xml = "https://stackoverflow.com/feeds/tag?tagnames=php&sort=newest";
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);

# StackOverflow uses the <entry> element for each separate item.
# Find all the "entry" items. This returns a node list of matching entry elements.
$items = $xmlDoc->getElementsByTagName('entry');

# Go through the "entry" elements one at a time.
# $items is the list of <entry> elements;
# $i is set to each <entry> in turn, starting from the first one in the feed.
foreach ($items as $i) {
    # some sample code to get the title, tags, and link
    $title = $i->getElementsByTagName('title')->item(0)->nodeValue;
    $href  = $i->getElementsByTagName('link')->item(0)->getAttribute('href');
    $tags  = $i->getElementsByTagName('category');
    $tag_arr = [];
    foreach ($tags as $t) {
        $tag_arr[] = $t->getAttribute('term');
    }
    echo "Title: $title; tags: " . implode(", ", $tag_arr) . ";\nhref: $href\n\n";
}
Using a foreach loop means you are not stuck having to work out how many items are in the list, and you don't have to set up a counting loop such as for ($i = 0; $i < 500; $i++).
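If you do want the item count (the part the question was stuck on), a DOMNodeList exposes it directly via its length property; a short sketch:
$items = $xmlDoc->getElementsByTagName('entry');
$count = $items->length;   // number of <entry> elements in the feed

for ($i = 0; $i < $count; $i++) {
    $entry = $items->item($i);
    // ... same getElementsByTagName() calls as in the foreach version ...
}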

divide xml into chunks based on url in php

I have a robots.txt file in which I generate dynamic sitemap links.
I get the following links if I open robots.txt in the browser; there are 5 sitemap links for each language.
Reason: there are 10 products in the database,
and I want to show only two products per link, so I divided the total number of products by the number of products on one page.
Sitemap:http://demo.com/pub/sitemap_products.php?page=1&lang=it_IT
The page and lang parts of the URL are dynamic.
Code in sitemap_products.php:
$Qproduct returns an array of all the products in the DB for all the languages.
So the loop below generates XML containing the product links for the language given in the sitemap URL.
For example, if the link is
Sitemap:http://demo.com/pub/sitemap_products.php?page=1&lang=it_IT
it will generate entries for all the products present in the IT language.
The XML links that are generated now are based only on the language we get from the URL,
but I want to divide them into chunks of 2 products' XML per sitemap link.
while ($Qproduct->next())
{
    if (!isset($page_language[$Qproduct->valueInt('language_id')]))
    {
        $page_language[$Qproduct->valueInt('language_id')] = mxp_get_page_language($MxpLanguage->getCode($Qproduct->valueInt('language_id')), 'products');
    }

    if ($Qproduct->valueInt('language_id') == $QproductLang->valueInt('languages_id'))
    {
        $string_to_out .= '<url>
            <loc>' . href_link($page_language[$Qproduct->valueInt('language_id')], $Qproduct->value('keyword'), 'NONSSL', false) . '</loc>
            <changefreq>weekly</changefreq>
            <priority>1</priority>
        </url>';
    }
}
What I wish to do is apply a condition so that it gives me exactly two product links in the XML when page=1 (see the sitemap links above), instead of links for all 10 products.
Similarly, if page=2 it should give the next 2 products, and so on.
I am a bit confused about the condition I am supposed to apply.
Please help me out.
First of all, use an XML library to create the XML, not string concatenation. Example:
$loc = href_link($page_language[$Qproduct->valueInt('language_id')], $Qproduct->value('keyword'), 'NONSSL', false);
$url = new SimpleXMLElement('<url/>');
$url->loc = $loc;
$url->changefreq = 'weekly';
$url->priority = 1;
In your case, you can even easily wrap that into a function that just returns such an element and takes two parameters: $Qproduct and $page_language (as a string, not an array!).
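A sketch of that helper, assuming the caller passes the already-resolved page-language string for the product's language (the function name is illustrative):
function product_url_element($Qproduct, $page_language)
{
    $url = new SimpleXMLElement('<url/>');
    $url->loc = href_link($page_language, $Qproduct->value('keyword'), 'NONSSL', false);
    $url->changefreq = 'weekly';
    $url->priority = 1;
    return $url;
}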
But that's just some additional advice; the main point you ask about is the looping, and more specifically the filtering and navigating inside the loop to the elements you're interested in.
First of all you operate on all results by looping over them:
while ($Qproduct->next())
{
...
}
Then you say that you're only interested in links of a specific language:
while ($Qproduct->next())
{
    $condition = $Qproduct->valueInt('language_id') == $QproductLang->valueInt('languages_id');
    if (!$condition) {
        continue;
    }
    ...
}
This already filters out all the elements you're not interested in. What is left is to keep count and decide which elements to take:
$page  = 1;                     // in practice this would come from the page GET parameter
$start = ($page - 1) * 2 + 1;   // first matching item (1-based) that belongs on this page
$end   = $page * 2;             // last matching item that belongs on this page
$count = 0;

while ($Qproduct->next())
{
    $condition = $Qproduct->valueInt('language_id') == $QproductLang->valueInt('languages_id');
    if (!$condition) {
        continue;
    }

    $count++;
    if ($count < $start) {
        continue;
    }

    ...

    if ($count >= $end) {
        break;
    }
}
Alternatively, instead of writing all of this by hand each time, create an Iterator for the $Qproduct iteration and then use FilterIterator and LimitIterator for the filtering and pagination.
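A minimal sketch of that iterator-based alternative, assuming the rows have first been pulled into a plain array ($rows, $lang_id and $per_page are illustrative names; CallbackFilterIterator is used as the ready-made FilterIterator):
$page = (int) $_GET['page'];   // from the sitemap URL, e.g. ?page=1
$per_page = 2;

// Keep only the rows for the requested language
$filtered = new CallbackFilterIterator(new ArrayIterator($rows), function ($row) use ($lang_id) {
    return $row['language_id'] == $lang_id;
});

// Take exactly the slice for this page: an offset, then a count
$page_rows = new LimitIterator($filtered, ($page - 1) * $per_page, $per_page);

foreach ($page_rows as $row) {
    // build the <url> element for $row here
}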

How can I skip first item from Twitter Status feed, display next 4 items with PHP?

I am setting up a series of Twitter feed displays on one page. One shows the MOST RECENT status, in a particular fashion. The other (I am hoping) will show the next 4 statuses, while NOT including the most recent status. Here is the part of the code that I think needs attention for this idea to work out:
$rss = file_get_contents('https://api.twitter.com/1/statuses/user_timeline.rss?screen_name=' . $twitter_user_id);
if ($rss) {
    // Parse the RSS feed to an XML object.
    $xml = simplexml_load_string($rss);
    if ($xml !== false) {
        // Error check: make sure there is at least one item.
        if (count($xml->channel->item)) {
            $tweet_count = 0;
            // Start output buffering.
            ob_start();
            // Open the twitter wrapping element.
            $twitter_html = $twitter_wrap_open;
            // Iterate over tweets.
            foreach ($xml->channel->item as $tweet) {
Here is the website which lent me the code for this task:
Pixel Acres - Display recent Twitter tweets using PHP
Your foreach loop goes over each item in the feed, and you want to skip certain elements based on their position. Note that with SimpleXML, the key in foreach ($xml->channel->item as $i => $tweet) is the element name "item" for every entry, not a numeric index, so keep an explicit counter and skip with an if inside the loop:
$position = 0;
foreach ($xml->channel->item as $tweet) {
    if ($position++ == 0 || $position > 5)
        continue;   // skip the most recent status, keep the next 4
I used an alternate method to solve the issue: a string replace on the latest tweet's URL to obtain its tweet ID, which then allowed me to query tweets using (tweet ID - 1) as the max_id parameter.
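A rough sketch of that idea against the old v1 API, assuming the RSS timeline honours the same count/max_id parameters as its JSON counterpart (the ID extraction below uses strrpos rather than a literal string replace, and the subtraction needs a 64-bit PHP build because tweet IDs overflow 32-bit integers):
// Link of the newest item, e.g. .../status/123456789
$link = (string) $xml->channel->item[0]->link;
$tweet_id = substr($link, strrpos($link, '/') + 1);

// Ask for the next 4 statuses, excluding the most recent one
$older_rss = file_get_contents('https://api.twitter.com/1/statuses/user_timeline.rss'
    . '?screen_name=' . $twitter_user_id . '&count=4&max_id=' . ($tweet_id - 1));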
