Ok, so I have these requirements and I really don't know where to start. Here is what I have.
What I need is some PHP code that will grab the latest article from the RSS feed of a WordPress blog. When the PHP grabs the RSS feed, it should cache it and only look for a newer version if the cache is empty or if 24 hours have passed. I need this code to be pretty foolproof and able to run without a DB behind it. Can you just cache the RSS results in memory?
I found this, but I am not sure it will be useful in this situation... What I am looking for is some good direction on what to do and how to do it, and whether there is already a tool out there that can help with this.
Thanks in advance
So if you want to cache the feed itself, it would be pretty simple to do this with a plain text file. Something like this should do the trick:
$validCache = false;

// Try to reuse the cached copy if it is less than 24 hours old
if (file_exists('rss_cache.txt')) {
    $contents = file_get_contents('rss_cache.txt');
    $data = unserialize($contents);
    if (time() - $data['created'] < 24 * 60 * 60) {
        $validCache = true;
        $feed = $data['feed'];
    }
}

// Cache is missing or stale: fetch the feed again and store it with a timestamp
if (!$validCache) {
    $feed = file_get_contents('http://example.com/feed.rss');
    $data = array('feed' => $feed, 'created' => time());
    file_put_contents('rss_cache.txt', serialize($data));
}
You could then access the contents of the RSS feed with $feed. If you wanted to cache the article itself, the changes should be fairly obvious.
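If the goal is just the latest article, here is a minimal sketch of reading it out of the cached feed. It assumes $feed holds the raw XML from the cache above and that the blog serves a standard RSS 2.0 feed (channel/item structure):

// Rough sketch: parse the cached XML and read the first <item> of the channel.
$xml = simplexml_load_string($feed);
if ($xml !== false && isset($xml->channel->item[0])) {
    $latest = $xml->channel->item[0];
    $title  = (string) $latest->title;
    $link   = (string) $latest->link;
    $date   = (string) $latest->pubDate;
    echo $title . ' - ' . $link . ' (' . $date . ')';
}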
I've been searching for a suitable PHP caching method for MSSQL results.
Most of the examples I can find suggest storing the results in an array, which then gets included into the page. This seems great unless a request for the content is made at the same time as it is being updated/rebuilt.
I was hoping to find something similar to ASP's application-level variables, but as far as I'm aware, PHP doesn't offer this functionality?
The problem I'm facing is that I need to perform six queries per page to populate dropdown boxes. This happens on the vast majority of pages. It's also not an option to combine the queries. The cached data will also need to be rebuilt sporadically, when the system changes. This could be once a day, once a week or once a month. Any advice will be greatly received, thanks!
You can use a Redis server and the phpredis PHP extension to cache results fetched from the database:
$redis = new Redis();
$redis->connect('/tmp/redis.sock');
$sql = "SELECT something FROM sometable WHERE condition";
$sql_hash = md5($sql);
$redis_key = "dbcache:" . $sql_hash;
$ttl = 3600; // values expire in 1 hour
if ($result = $redis->get($redis_key)) {
$result = json_decode($result, true);
} else {
$result = Db::fetchArray($sql);
$redis->setex($redis_key, $ttl, json_encode($result));
}
(Error checks skipped for clarity)
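Since the cached data only needs rebuilding sporadically (when the system changes), you can also drop a cached entry explicitly instead of waiting for the TTL to expire. A minimal sketch, assuming the same key scheme as above:

// When the underlying data changes, delete the cached entry so the next
// request rebuilds it from the database.
$redis->del("dbcache:" . md5($sql));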
So let's say I have a Google News feed, like this: https://news.google.com/news/feeds?pz=1&cf=all&ned=no_no&hl=no&q=%22something%22&output=atom&num=1
Grabbing the title, author and link would be easy, but how would I go about getting, say, the first 200 characters of the content? It's full of HTML, and mixed in with the title and author as well.
I could use strip_tags on it, but it would still be a mess.
Is there any way to make Google return a ['description'], maybe?
Or is there perhaps some other good news feed that gives me the content in a way that's easier to manage?
[edit]
Update on how I ended up doing it.
// @ suppresses warnings from malformed feed data
$news = @simplexml_load_string(file_get_contents('https://news.google.com/news/feeds?pz=1&cf=all&ned=no_no&hl=no&q=%22molde+fotballklubb%22+OR+%22tornekrattet%22+OR+%22mfk%22+OR+%22oddmund+bjerkeset%22+-%22moss%22&output=atom&num=1'), 'SimpleXMLElement', LIBXML_NOCDATA);
$data = get_object_vars($news->{'entry'});
// The content blob mixes title, author and body, separated by <font> tags
$test = explode('<font size="-1">', $data['content']);
$link = get_object_vars($data['link']);
$return['title'] = strip_tags($test[0]);
$return['author'] = strip_tags($test[1]);
$return['description'] = strip_tags($test[2]);
$return['link'] = $link['#attributes']['href'];
It is still not working properly, but that's because the feed gives me the content in different ways all the time. Sometimes the content of the news article itself is just metadata like the authors and image descriptions.
And breaking it up by HTML tags, when the HTML changes from time to time, causes some problems. But I can't figure out any other way of doing it with this feed.
You could try loading the HTML into a DOMDocument instance and extracting the parts you need, or use a wrapper for it like Goutte, which makes it a lot easier to extract the portions you need.
http://php.net/manual/en/class.domdocument.php
https://github.com/fabpot/Goutte
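As a rough sketch of the DOMDocument approach (the exact markup Google returns varies, so treat the element access here as an assumption), you could parse the entry's HTML content into a document and take the first ~200 characters of its plain text:

// Rough sketch: parse the HTML blob from the feed entry and trim its plain
// text to about 200 characters. Requires the mbstring extension for mb_substr.
$html = (string) $news->entry->content;
$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings from sloppy markup
$text = trim($doc->textContent);
$description = mb_substr($text, 0, 200);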
I currently run a website that pulls in an RSS feed, and you can go to the links. The problem I am having is that when I click on an RSS link it takes me to a webpage, but that webpage loads really slowly.
I am looking to cache that webpage so it loads really quickly. What is the best way to do this? I can create a cache folder in my project, cache each file to that folder, and then serve from there; example below.
<?php
// Fetch each linked page and store it as an HTML file in the cache folder
foreach ($source_xml->channel->item as $rss) {
    $title = trim($rss->title);
    $link = $rss->link;
    $html = $title . '.html';
    $homepage = file_get_contents($link);
    file_put_contents('cache/' . $html, $homepage);
}
?>
This takes quite a long time with a lot of feeds, and I am not sure it is the most productive way. I have also tried creating a database table with an extra text field called cache, in which I store the output from file_get_contents; example below.
<?php
// Fetch each linked page and store it in the database alongside the feed item
foreach ($source_xml->channel->item as $rss) {
    $title = trim($rss->title);
    $link = $rss->link;
    $cache = file_get_contents($link);
    $data = array(
        'title' => $title,
        'link' => $link,
        'cache' => $cache
    );
    echo $this->cron_model->addResults($data);
}
?>
This works, but I get this issue when looking at the MySQL column:
Because of its length, this column might not be editable
I am not familiar with caching and have never really needed to deal with it until now. Can someone give me some best-practice advice? I know I can hack something together, but I would prefer to know the right way before going forward.
Thanks
For better performance when caching with PHP + MySQL, you can use Memcached.
For further caching performance, you can utilize opcode caching (e.g. APC) and meta caching (HTTP-EQUIV headers).
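As a rough sketch of the memcached route for this particular case (it assumes the memcached PHP extension; the server address, key prefix and TTL are placeholders, not values from the code above), the fetched pages could be cached like this:

// Minimal sketch: cache each fetched page in memcached, keyed by its link,
// instead of writing files or storing the HTML in a MySQL text column.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

foreach ($source_xml->channel->item as $rss) {
    $link = (string) $rss->link;
    $key  = 'page:' . md5($link);

    $homepage = $mc->get($key);
    if ($homepage === false) {                // cache miss
        $homepage = file_get_contents($link); // fetch the slow page once
        $mc->set($key, $homepage, 3600);      // keep it for an hour
    }
    // serve $homepage from here
}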
I'm using a remote XML feed, and I don't want to hit it every time. This is the code I have so far:
$feed = simplexml_load_file('http://remoteserviceurlhere');
if ($feed) {
    $feed->asXML('feed.xml');
} elseif (file_exists('feed.xml')) {
    $feed = simplexml_load_file('feed.xml');
} else {
    die('No available feed');
}
What I want to do is have my script hit the remote service every hour and cache that data into the feed.xml file.
Here is a simple solution:
Check the last time your local feed.xml file was modified. If the difference between the current timestamp and the filemtime timestamp is greater than 3600 seconds, update the file:
$feed_updated = filemtime('feed.xml');
$current_time = time();
if ($current_time - $feed_updated >= 3600) {
    // Your sample code here...
} else {
    // use cached feed...
}
<?php
$cache = new JG_Cache();
if (!($feed = $cache->get('feed.xml', 3600))) {
    $feed = simplexml_load_file('http://remoteserviceurlhere');
    $cache->set('feed.xml', $feed);
}
Use any file based caching mechanism e.g. http://www.jongales.com/blog/2009/02/18/simple-file-based-php-cache-class/
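One thing to watch out for with this approach: SimpleXMLElement objects can't be serialized, so if the cache class serializes values before writing them to disk, it is probably safer to cache the raw XML string (for example the output of $feed->asXML()) and re-parse it with simplexml_load_string() after a cache hit.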
// Check the cache file's age only if it exists; otherwise fetch a fresh copy
if (!file_exists('feed.xml') || (time() - filemtime('feed.xml') >= 3600)) {
    $feed = simplexml_load_file($url);
    $feed->asXML('feed.xml');
} else {
    $feed = simplexml_load_file('feed.xml');
}
return $feed;
Take a look at Simple PHP caching.
I created a simple PHP class to tackle this issue. Since I'm dealing with a variety of sources, it can handle whatever you throw at it (XML, JSON, etc.). You give it a local filename (for storage purposes), the external feed, and an expiry time. It begins by checking for the local file. If it exists and hasn't expired, it returns the contents. If it has expired, it attempts to grab the remote file. If there's an issue with the remote file, it falls back to the cached file.
Blog post here: http://weedygarden.net/2012/04/simple-feed-caching-with-php/
Code here: https://github.com/erunyon/FeedCache
I'm working on a site with a simple PHP-generated Twitter box, with user timeline tweets pulled from the user_timeline RSS feed and cached to a local file to cut down on loads, and as a backup for when Twitter goes down. I based the caching on this: http://snipplr.com/view/8156/twitter-cache/. It all seemed to be working well yesterday, but today I discovered the cache file was blank. Deleting it and then loading the page again generated a fresh file.
The code I'm using is below. I've edited it to try to get it to work with what I was already using to display the feed, and I probably messed something crucial up.
The changes I made are the following (and I strongly believe that any of these could be the cause):
- Revised the time-difference code (the linked example seemed to use a custom function that wasn't included in the code).
- Removed the "serialize" function from the "fwrites". This is purely because I couldn't figure out how to unserialize once I loaded it in the display code. I truthfully don't understand the role that serialize plays or how it works, so I'm sure I should have kept it in. If that's the case, I just need to understand where/how to deserialize in the second part of the code so that it can be parsed.
- Removed the $rss variable in favor of just loading up the cache file in my original tweet display code.
So, here are the relevant parts of the code I used:
<?php
$feedURL = "http://twitter.com/statuses/user_timeline/#######.rss";

// START CACHING
$cache_file = dirname(__FILE__).'/cache/twitter_cache.rss';

// Start with the cache
if (file_exists($cache_file)) {
    $mtime = (strtotime("now") - filemtime($cache_file));
    if ($mtime > 600) {
        // Cache is older than ten minutes: refresh it
        $cache_rss = file_get_contents('http://twitter.com/statuses/user_timeline/75168146.rss');
        $cache_static = fopen($cache_file, 'wb');
        fwrite($cache_static, $cache_rss);
        fclose($cache_static);
    }
    echo "<!-- twitter cache generated ".date('Y-m-d h:i:s', filemtime($cache_file))." -->";
} else {
    // No cache file yet: create it
    $cache_rss = file_get_contents('http://twitter.com/statuses/user_timeline/#######.rss');
    $cache_static = fopen($cache_file, 'wb');
    fwrite($cache_static, $cache_rss);
    fclose($cache_static);
}
// END CACHING

// START DISPLAY
$doc = new DOMDocument();
$doc->load($cache_file);
$arrFeeds = array();
foreach ($doc->getElementsByTagName('item') as $node) {
    $itemRSS = array(
        'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
        'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue
    );
    array_push($arrFeeds, $itemRSS);
}
// the rest of the formatting and display code....
?>
ETA 6/17 Nobody can help…?
I'm thinking it has something to do with a blank cache file being written over a good one when Twitter is down, because otherwise I imagine this would happen every ten minutes when the cache file is overwritten again, but it doesn't happen that frequently.
I made the following change to the part where it checks how old the file is to overwrite it:
$cache_rss = file_get_contents('http://twitter.com/statuses/user_timeline/75168146.rss');
if ($mtime > 600 && $cache_rss != '') {
    $cache_static = fopen($cache_file, 'wb');
    fwrite($cache_static, $cache_rss);
    fclose($cache_static);
}
…so now it will only write the file if it's over ten minutes old and there's actual content retrieved from the RSS page. Do you think this will work?
Yes, your code is problematic, because you write whatever Twitter sends you.
You should test the file you get from Twitter like this:
if (($mtime > 600) && ($cache_rss = file_get_contents($feedURL))) {
    // Only overwrite the cache when the download actually succeeded
    file_put_contents($cache_file, $cache_rss);
}
file_get_contents() returns false if there is an error; check it before caching the new content.