Over at rsscache they offer a mechanism that caches your website's feed. They claim that if a new node gets added to your feed, instead of flushing and refilling the entire cache (for the current user; they probably do for new users), they only send the new node to the current user's newsreader, and the reader slots it in among the other nodes. The feed updates without completely refreshing, saving bandwidth (see step 6).
Is this correct? I can imagine the cache having a node added rather than being flushed and renewed, but I don't see how this scenario could work in the visitor's feed reader.
If so, how can it (a selective update to the cache and/or the reader) be achieved with PHP?
Could this selective node-update mechanism be extended to exclude error nodes, like:
error: node not found
So when random nodes in the mashup feed (lifestream) originating from a specific service, e.g. Twitter, disappear because the service is offline, the nodes don't get replaced with the offline error, but their previous state sticks?
Normally feed readers keep all items ever included in a feed, or let you configure the maximum age or number of items to keep.
Rsscache.com probably stores the items of a feed, and which clients have received which items, in a database. With that information it's possible to dynamically create a feed containing only the items not yet returned.
So when you start reading a feed through rsscache.com all items are returned, but in later requests only the new items.
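A minimal sketch of that bookkeeping, with the database replaced by an in-memory array so it's self-contained. The function name `newItemsFor` and the item shape (`guid`, `added_at`) are assumptions; the real service presumably does the same filtering with a SQL query against a table of delivered items.

```php
<?php
// Sketch: return only items added after this client's last visit.
// $lastRead stands in for a (client_id, feed_url) -> last_read_at table.

function newItemsFor(string $clientId, string $feedUrl, array $items, array &$lastRead): array
{
    $since = $lastRead["$clientId|$feedUrl"] ?? 0;  // first visit: everything is new
    $fresh = array_filter($items, fn ($it) => $it['added_at'] > $since);
    $lastRead["$clientId|$feedUrl"] = time();       // remember this visit
    return array_values($fresh);
}

$lastRead = [];
$items = [
    ['guid' => 'n1', 'added_at' => 100],
    ['guid' => 'n2', 'added_at' => 200],
];
$first  = newItemsFor('client-1', 'http://example.com/feed', $items, $lastRead); // all items
$second = newItemsFor('client-1', 'http://example.com/feed', $items, $lastRead); // nothing new
```

The first request gets the full feed; every later request only gets items added since the previous one, which matches the behaviour described above.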
Here's one approach that might work:
Keep a local cache of the feeds, and do conditional GETs (using the If-Modified-Since header) based on the date of this cached version (resulting in less bandwidth for the webmaster, IF his server supports conditional GETs). Once you have the feed, check every item in it to see whether it already exists in your DB. If it already exists, skip it; if it doesn't, add it along with a timestamp reflecting when you added it. You probably want to remove "dead" items from the feed as well.
Now you need to keep track of the last time someone queried a specific feed on the server. If it's the first time someone asks for a feed, just return all items you stored in the DB for that feed, and add a record containing user (IP?), feed and the current time. If they have been there before for that feed, select all records for that feed that were added after their previous query (and afterwards update the last_query time for that specific user and feed to the current time). And if there are no new items, simply return an empty feed.
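The conditional-GET part of this approach can be sketched with cURL. This is a minimal version under assumed names (`fetchFeed`, the cache file path); it uses the cache file's mtime as the If-Modified-Since value, which is the usual trick.

```php
<?php
// Sketch: conditional GET against a local cache file. If the server
// supports If-Modified-Since it answers 304 with no body and we reuse
// our cached copy; otherwise we store the fresh feed.

function fetchFeed(string $url, string $cacheFile): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    if (is_file($cacheFile)) {
        // ask the server to skip the body if nothing changed since our copy
        curl_setopt($ch, CURLOPT_TIMECONDITION, CURL_TIMECOND_IFMODSINCE);
        curl_setopt($ch, CURLOPT_TIMEVALUE, filemtime($cacheFile));
    }
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($code === 304 || $body === false) {
        return file_get_contents($cacheFile); // unchanged (or unreachable): reuse cache
    }
    file_put_contents($cacheFile, $body);     // changed: refresh the cache
    return $body;
}
```

After fetching, you would parse the body and run the existing-item check against your DB as described above.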
Related
I have been pondering over this question for a few days now, and I can't seem to figure out what the best practice would be, so hopefully one of you guys / girls will be able to lead me down the right path.
Data
Let's say, for the sake of argument, we have a Meta data table, an Item data table and an Expiration date table.
Meta
Items
Expirations
A meta is what defines a collection of items and holds the general information for that collection, and there are certain criteria set on the meta which are used throughout the items' layout.
An item is a derivation of a meta and holds unique data specific to that ONE item, and usually draws on the meta information. (Note: an item can ALSO be a collection of OTHER items.)
An expiration is self-explanatory: it is the expiration date of one individual item.
Current implementation
If you enter the meta page for any given meta, it loads the items in a paginated AJAX call, and for each loaded item it loads a status icon. This status icon is generated EACH time it is shown, and it is recursive (meaning that if I were to put a status icon on the meta, it would load all items for that meta, check all their expiration dates, and so forth).
With the extended implementation it currently makes 600 calls (5.56 seconds) to the database, which is quite a load, but is necessary given the database layout.
My theory
I reckon it would give much better results for the end user to simply store the status value in the database alongside each element. However, I don't know if that is the right way to do things, considering that I would have to recursively adjust this status every time an element within the spectrum changes (including having a server job set up to update the status based on the expiration date).
The question
What would be the best practice for a status icon that is recursive and has many factors and relations to keep track of?
Best practice would be to cache the status value and have the end user read it from the cache only. All your load-heavy stuff (the recursive DB-call magic) goes into a background job. Finally, you create observers for all your models that queue this job to update the cached value whenever an item changes.
This way the load-heavy work is handled by workers on the server side, while everything on the client side stays fast and smooth. Once the status-update job has run, the end user will see the change.
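A minimal, framework-free sketch of that pattern. All names here (`StatusCache`, `recalculateStatus`, the queue-as-array) are illustrative stand-ins for your real cache, job queue and model observers.

```php
<?php
// Sketch of the cached-status approach: heavy recursion runs in a job,
// the page only ever reads the cached value.

class StatusCache
{
    private array $store = [];
    public function set(string $key, string $value): void { $this->store[$key] = $value; }
    public function get(string $key): ?string { return $this->store[$key] ?? null; }
}

// The heavy recursive calculation, run only inside a worker/job,
// never while rendering the page.
function recalculateStatus(int $metaId, array $expirations, StatusCache $cache): void
{
    $expired = false;
    foreach ($expirations as $ts) {              // one pass over the item expirations
        if ($ts < time()) { $expired = true; break; }
    }
    $cache->set("meta:$metaId:status", $expired ? 'expired' : 'ok');
}

// The observer hook: whenever an item/expiration changes, queue the job.
// Here the "queue" is just an array of callables for illustration.
$queue = [];
$onItemChanged = function (int $metaId, array $expirations, StatusCache $cache) use (&$queue) {
    $queue[] = fn () => recalculateStatus($metaId, $expirations, $cache);
};

$cache = new StatusCache();
$onItemChanged(42, [time() - 3600], $cache);  // an item expired an hour ago
foreach ($queue as $job) { $job(); }          // a worker drains the queue
echo $cache->get('meta:42:status');           // prints "expired"
```

The page render then becomes a single cache read instead of 600 database calls; the cost of recalculation is paid asynchronously, once per change.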
I need to retrieve all the Instagram media with a specific tag since a specific date.
The following call is supposed to do that, so I can just keep paging through it until I find an item with a timestamp before my limit date, and then stop:
/tags/<tag-name>/media/recent
The problem is that it doesn't return any field identifying when the item was tagged. The only two timestamps returned by the call are:
create_time: date the item was created
caption->create_time: date the item's caption was created
But neither really matters, since the tag can also be added via comments (which don't come in the response).
I believe Instagram must return the tag timestamp, since this is what this call is based on. But as it doesn't, is there any other way?
Here is an example to help understanding the case:
I want to bring back all media tagged with #stackoverflow since 01/01/2017. For that I will call the API, retrieving page by page and checking item by item until I find one with a date before 01/01/2017:
/tags/stackoverflow/media/recent
But it happens that someone can go to a really old photo and comment on it, including the tag #stackoverflow. This will make the media item go straight to the top of the API response, but there is no way to know if the tag was added before my limit date.
All dates in this commented item are before my limit date, but the tagging is recent.
I'm currently developing a database/website server interface to facilitate inputting data for a data collection project. There are two types of additions being made to the database, call them A and B. Some of the tables in the database that handle these entries are as follows:
dcs_projectname_a
dcs_projectname_b
Each of these tables has columns for all the required input fields, in addition to things like creator, timestamp, etc.
The pages on the website facilitate three different options: add, view, and edit. Each page for each type of entry performs the respective function. That is, the add page adds, view page views, etc.
I am just about done; however, there is a major challenge I haven't really confronted yet: concurrency. There will be multiple users adding content to the database at the same time. Each entry is given its own specific ID and there CANNOT be any duplicate IDs. That is, the first A entry is A000001, the next is A000002, and so on.
On the add and edit pages there is a disabled field where the user can view the ID, for reference when physically documenting entries.
What I need to figure out is how to implement concurrency management so that users concurrently adding A entries will not end up with the same ID and row.
The add page automatically generates the ID to be used by reading the database's most recent ID and adding one.
My thought was to create a new row in the table every time the add page is opened and give it the calculated ID. Then, when information is added, it performs a modification to that existing row. This way, if another user opens the add page while an entry is currently being added, they will be given the next ID, not the same one.
With this method I need a way to delete this placeholder row if the user leaves the add page or closes the browser. I also need other users with open add pages to automatically decrement their IDs when a user holding a lower pending ID leaves the add page and cancels the addition.
This is kind of a complicated explanation; if you don't understand, let me know and I'll try to clarify as best I can. Any help is much appreciated!
Thanks
There are a number of solutions to your problem, but you seem to have made things harder by having your application generate the record IDs itself.
Instead, you could just use MySQL's AUTO_INCREMENT functionality to automatically generate/increment the record ID for you upon insert. MySQL will ensure that there are no duplicates, and you can get rid of the extra database call to retrieve the most recent ID.
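A sketch of how that looks: let MySQL own the numeric ID and derive the display ID ("A000001") from it only for presentation. The table and column names are illustrative.

```php
<?php
// Schema sketch (MySQL generates the ID; duplicates are impossible):
//
// CREATE TABLE dcs_projectname_a (
//     id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
//     created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
//     -- ...your input fields...
// );
//
// After an INSERT, ask the driver for the generated ID, e.g. with PDO:
// $db->exec("INSERT INTO dcs_projectname_a (...) VALUES (...)");
// $numericId = (int) $db->lastInsertId();

// Formatting the user-facing ID is then a pure display concern:
function displayId(string $prefix, int $numericId): string
{
    return $prefix . str_pad((string) $numericId, 6, '0', STR_PAD_LEFT);
}

echo displayId('A', 1);   // A000001
echo displayId('A', 42);  // A000042
```

Note this also sidesteps the placeholder-row cleanup problem: since the ID is only allocated at insert time, abandoned add pages never reserve one. Gaps in the sequence can still occur (e.g. rolled-back inserts), which is normal for AUTO_INCREMENT.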
This is an RSS feed (A) where the user can add several images, but he/she can also add an RSS feed (B) from a different user with images. When feed (A) is requested, the server fetches feed (B), and the images from that feed are added to the requested feed (A).
What are some mechanisms or options to prevent infinite circular recursion?
e.g. when feed (B) also includes feed (A)
// Feed A setup
// - image1a
// - image2a
// - feed-B
// Feed B setup
// - image1b
// - feed-A
// fetching / assembling feed A
// - image1a
// - image2a
// - (A fetches feed-B)
// - image1b
// - (B fetches feed-A)
// - image1a
// - image2a
// - (fetched A fetches feed-B again)
// - image1b
// - (second B fetches feed-A again)
// .. recursion
There are three solutions:
Store the original feed id with each item and forward only items created initially for each feed,
or forward all items for each feed, and pass around a list of all feeds the item has been in (and check that list),
or use a unique itemID for each item, store it only once, put a unique constraint or primary key on the itemID, and thus never store an item twice for any feed.
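The second option (passing around the list of feeds already expanded) can be sketched like this. `assembleFeed` and the array-based feed representation are stand-ins for the real fetch-and-parse logic.

```php
<?php
// Sketch: carry a list of feeds already expanded on this path, and stop
// when a feed would expand itself again.

function assembleFeed(string $feedId, array $feeds, array $visited = []): array
{
    if (in_array($feedId, $visited, true)) {
        return []; // already expanded this feed on this path: stop recursing
    }
    $visited[] = $feedId;

    $items = [];
    foreach ($feeds[$feedId] as $entry) {
        if (is_array($entry) && $entry['type'] === 'feed') {
            // embedded feed: expand it, carrying the visited list along
            $items = array_merge($items, assembleFeed($entry['id'], $feeds, $visited));
        } else {
            $items[] = $entry; // plain image item
        }
    }
    return $items;
}

// Feed A embeds B, and B embeds A again (the circular case from above):
$feeds = [
    'A' => ['image1a', 'image2a', ['type' => 'feed', 'id' => 'B']],
    'B' => ['image1b', ['type' => 'feed', 'id' => 'A']],
];
print_r(assembleFeed('A', $feeds)); // image1a, image2a, image1b — no loop
```

In a real implementation the visited list would travel across HTTP (e.g. as a query parameter or header on the fetch of feed B), since each feed may live on a different server.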
As it turns out, the solution is in several stages.
1. Locking: on request, return a copy of the cached XML. While building the feed XML, set a lock. This prevents external feeds that are fetching this feed from triggering a second, concurrent build; the external feed only receives the cached XML.
2. Identify items: the lock stops a potential runaway process, but the feed XML still grows on every request with the previously cached XML items. To prevent duplicates, add a unique identifier for the feed to each "guid" field. If an item is the feed's own, don't include it and log a message (and notify if needed).
I am trying to read data from an RSS feed which has 25 items. When I request the RSS file through HTTP it says there are only 20 items.
function test($location)
{
    $doc = new DOMDocument();
    $doc->load($location);                       // fetches the URL (or local file) and parses it
    $items = $doc->getElementsByTagName('item');
    return $items->length;                       // number of <item> elements found
}
// Prints 20
echo test('http://www.reddit.com/r/programming/new/.rss?after=t3_');
// Prints 25
echo test('programming.xml');
I've tried RSS feeds from other subreddits as well with the same result.
I see what the issue is now... If you visit a sub-reddit like /r/programming/ and go to the "new" tab to see newest submissions, there are two sorting options. The first option is "rising" which only shows up-and-coming entries, the alternate sort order is "new".
Since I chose the "new" sort order in my browser, it was saved in a cookie and used as my default sort order afterwards. However, accessing the page through code was still using the site's default sort order, which returned a variable number of results.
I resolved the issue by appending the sort order query string to the request url: http://www.reddit.com/r/programming/new/.rss?sort=new
If it were having problems loading the feed, it'd probably issue a warning of some sort.
Right now, your sample code for the reddit feed shows that it has 14 items. The number of items in that feed is not constant, so the issue is simply that your local copy is different from the one you were loading from reddit.