I am writing some code to fetch news from an RSS feed and publish n items at a time, every m hours, to another site.
Using PHP, I compare the updated XML file with the previous one saved on the server.
I load the two XML files into PHP arrays and filter out the latest posts using array_diff_assoc().
If the number of new posts is greater than n, the older ones will be published first and the rest handled next time. Therefore I need some way to track which items have been published and which have not.
What is the simplest way to do this? I don't want to bring in MySQL for such a simple task.
Can't you just store the items that haven't been published? Then, each time, pull up the old stored ones and append the new ones identified by array_diff_assoc(). Publish n, and if the total is greater than n, store the new list of unpublished items.
As to how to store them, I'm not a PHP programmer, but what about using PHP's serialize and unserialize functions? In Python, I'd use the pickle module if I had to store data objects of some type, and I understand those are the PHP equivalent.
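Something like this sketch, perhaps (untested, since PHP isn't my language; the queue file name and publish_item() are made up):
// Load the stored queue of not-yet-published items ("queue.dat" is a made-up name).
$queue = file_exists('queue.dat') ? unserialize(file_get_contents('queue.dat')) : array();

// $new = the latest items found by array_diff_assoc() on the two feed arrays.
$queue = array_merge($queue, $new);

// Publish the oldest n items first; array_splice() removes them from $queue.
$n = 5;
foreach (array_splice($queue, 0, $n) as $item) {
    publish_item($item); // hypothetical: your existing publishing code
}

// Whatever is left waits for the next run.
file_put_contents('queue.dat', serialize($queue));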
I would like to make full use of MySQL in a web application I have developed for a chiropractor.
So far I have been storing what are called progress notes in a single row per year. The table structure looks something like this: (progress_note_id, patient_id, date (Y-0-0), progress_note). When the client wishes to append to the current year's progress notes, he simply clicks at the top of a textarea (HTML, for which I use the TinyMCE JavaScript library) to add a new entry date along with the shorthand notes, which go at the beginning of the progress_note column. So far it has been working OK; with 900+ clients (est.) there could potentially be 1,300+ progress notes for each year since the beginning of the application (2018).
Now the client wishes to be able to see previous progress notes (history) but be unable to modify them, while still being able to write new ones. The solution I have come up with is to use XML inside the textarea and use PHP to tell the new notes apart from the old ones.
My problem, however, is that if I had to convert my entire table from one row per year to one row per note, it could take a lot of time and energy to split the combined notes into single rows (est. 10x), which could end up being 13,000+ rows. I realize that whichever method I choose is going to be a lot of work. Another way around this, I found, might be to use an XML column type in MySQL to store multiple records; to append, all I would need is PHP to parse the XML and add a new child node at the beginning. Each progress note is 255-500 chars, and in the worst case, if a patient visited 52 times a year (once every week), there shouldn't be too large an overhead.
Is this the correct way to solve this problem? I do wish to stay with MySQL, and I realize MySQL is not intended for XML. For clarification, what I hope to accomplish is the same thing I do with the current progress notes, but with XML, in descending date order (newest to oldest).
<xml_result>
  <progress_note>
    <date>2020-08-16</date>
    <content></content>
  </progress_note>
</xml_result>
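Prepending a new note would then look roughly like this (a sketch with DOMDocument; $newNoteText is a placeholder):
// $xml = current contents of the progress_note column
$doc = new DOMDocument();
$doc->loadXML($xml);

// Build the new note; createTextNode() keeps special characters safe.
$note = $doc->createElement('progress_note');
$note->appendChild($doc->createElement('date', date('Y-m-d')));
$content = $doc->createElement('content');
$content->appendChild($doc->createTextNode($newNoteText));
$note->appendChild($content);

// Prepend so the newest note comes first under <xml_result>.
$root = $doc->documentElement;
$root->insertBefore($note, $root->firstChild);
$xml = $doc->saveXML();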
Thank you for your time and for any suggestions.
Firstly, 13,000+ rows is not a problem for MySQL. In most web applications, a single MySQL instance can handle 10M+ records with good performance.
Secondly, you can store either XML or JSON in a text field and handle the decoding in your application.
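For example, with JSON (a rough sketch; the table name, the $pdo connection, and the variable names are assumptions):
// Decode the existing notes (empty array if the column is still empty).
$notes = json_decode($row['progress_note'], true);
if (!is_array($notes)) { $notes = array(); }

// Prepend the new note so the newest entry comes first; old entries are untouched.
array_unshift($notes, array('date' => date('Y-m-d'), 'content' => $newNoteText));

// Write the whole blob back ($pdo is an assumed PDO connection).
$stmt = $pdo->prepare('UPDATE progress_notes SET progress_note = ? WHERE progress_note_id = ?');
$stmt->execute(array(json_encode($notes), $noteId));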
I have already made a simple RSS reader, but it only gets me about 25 articles. How do I make it work like feedly.com or digg.com, so that it retrieves many more items, not just 25?
The PHP code I have:
$rss = simplexml_load_file('http://www.elespectador.com/rss.xml');
I already know how to retrieve the title, description, etc. of each item.
Pagination in feeds is arbitrary and you'll have trouble finding a consistent pattern. You should store the data yourself: right now you have 25 elements, and as new ones are published you keep accumulating them.
Another solution is to use the data from a service like Superfeedr (I created it!), which stores past content for millions of feeds.
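A minimal sketch of the store-it-yourself approach (file-based storage is just an example; a database works the same way):
// Items seen so far, keyed by GUID (falls back to link when there is no GUID).
$seen = file_exists('items.json') ? json_decode(file_get_contents('items.json'), true) : array();

$rss = simplexml_load_file('http://www.elespectador.com/rss.xml');
foreach ($rss->channel->item as $item) {
    $key = isset($item->guid) ? (string)$item->guid : (string)$item->link;
    if (!isset($seen[$key])) {
        $seen[$key] = array(
            'title'       => (string)$item->title,
            'description' => (string)$item->description,
            'pubDate'     => (string)$item->pubDate,
        );
    }
}

// Run this on a cron schedule and the archive keeps growing past 25 items.
file_put_contents('items.json', json_encode($seen));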
There is something I am trying to accomplish, although I'm not really sure where to start.
I currently have a MySQL database with a list of articles. The DB contains the article title, content, and some other info like dates, etc.
There is an RSS feed that we monitor for new articles; it's a Google Alert feed that just contains the latest news on certain subjects. I want to be able to automatically monitor this feed and record any feed items that are similar to stories currently in our DB.
I know how to set a script to run automatically, and I know how to parse the RSS feed with SimplePie.
What I need to figure out is how to take the description of each RSS feed item, check it against our DB to see whether the item is similar to something we already have, and return a numerical score of some sort, like a "similarity rating".
After that I can have the info I need recorded to the DB if the "similarity rating" is above a set limit, which I know how to do.
So my only issue is how to compare each feed item to our current articles, and return a score based on how similar it is.
The Levenshtein function (available natively in PHP) is a good way to handle this. It calculates the number of single-character edits (insertions, deletions, substitutions) required to convert one string into another; that distance would be the basis of your "similarity rating".
EDIT: the Levenshtein function is not available natively in MySQL, but there are SQL implementations of it that you can use, such as: http://kristiannissen.wordpress.com/2010/07/08/mysql-levenshtein/
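In PHP it might look like this (a sketch; the 80-point threshold and the SimplePie/DB variable names are assumptions):
// Convert the edit distance into a 0-100 similarity score.
// Note: PHP's levenshtein() only accepts strings up to 255 characters,
// so compare titles or truncated descriptions.
function similarity_score($a, $b) {
    $a = substr($a, 0, 255);
    $b = substr($b, 0, 255);
    $max = max(strlen($a), strlen($b));
    return $max ? (1 - levenshtein($a, $b) / $max) * 100 : 100;
}

// $articles fetched from your DB, $feedItem from SimplePie (names assumed).
foreach ($articles as $article) {
    if (similarity_score($feedItem->get_title(), $article['title']) >= 80) {
        // similar enough -- record it
    }
}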
I have an XML feed that I have to check periodically for updates. The XML consists of many elements, and I'm trying to figure out the best (and probably fastest) way to find out which elements have changed since the last time I checked.
What I'm thinking of is to first check the lastBuildDate for modifications and, if it differs from the previous one, to parse the XML again. This would involve keeping each element, with all of its attributes, in my database. But each element can have a different number of attributes, as well as other nested elements. So if I were to store each element in my database, what would be the best way to keep them?
That's why I'm asking for your help :) Thank you.
Most modern databases will store your XML as a blob if you like. (You tagged PHP... MySQL? If so, use MEDIUMTEXT.) Store your XML and generate a diff when you get a new one. If you don't have an XML diff tool, canonicalize both XML documents and then run a text diff.
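For example (a sketch using DOMDocument::C14N(); the file names are made up):
// Canonicalize with DOMDocument::C14N() so attribute order and namespace
// formatting don't produce false differences, then compare.
function canonical($xml) {
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc->C14N();
}

$old = canonical(file_get_contents('feed_previous.xml'));
$new = canonical(file_get_contents('http://example.com/feed.xml'));

if ($old !== $new) {
    // something changed: re-parse / run your element-level diff here,
    // then save the new version as the baseline
    file_put_contents('feed_previous.xml', $new);
}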
I would like to design a web app that allows me to sort, browse, and display various attributes (e.g. title, tag, description) for a collection of man pages.
Specifically, these are R documentation files within an R package that houses a collection of data sets, maintained by several people in an SVN repository. The format of these files is .Rd, which is LaTeX-like, but different.
R has functions for converting these man pages to html or pdf, but I'd like to be able to have a web interface that allows users to click on a particular keyword, and bring up a list (and brief excerpts) for those man pages that have that keyword within the \keyword{} tag.
Also, the generated html is somewhat ugly and I'd like to be able to provide my own CSS.
One obvious option is to load all the metadata I desire into a database like MySQL and design my site to run queries and fetch the appropriate data.
I'd like to avoid that to minimize upkeep for future maintainers. The number of files is small (<500) and the amount of data is small (only a couple of hundred lines per file).
My current leaning is to have a script that pulls the desired metadata from each file into a summary JSON file, then load this summary.json file in PHP, decode it, and loop through the array looking for items whose attributes match the current query (e.g. all docs with keyword1 AND keyword2).
I was starting in that direction with the following...
$contents = file_get_contents("summary.json");
$c = json_decode($contents, true);
foreach ($c as $ind => $val) {
    // ... examine each doc's attributes here
}
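The keyword filtering could then be something like this (a sketch; it assumes each entry carries a 'keywords' array, which is my own invention):
$query = array('keyword1', 'keyword2'); // e.g. taken from $_GET
$matches = array_filter($c, function ($doc) use ($query) {
    // keep docs whose keyword list contains every queried keyword
    return isset($doc['keywords']) && !array_diff($query, $doc['keywords']);
});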
Another idea was to write a script to convert these .Rd files to XML. In that case, are there any lightweight frameworks that make it easy to sort and search a small collection of XML files?
I'm not sure if XQuery is overkill or whether I have time to dig into it...
I think I'm suffering from too-many-options-syndrome with all the AJAX temptations. Any help is greatly appreciated.
I'm looking for a super simple solution. How might some of you out there approach this?
My approach would be to parse the keywords from the files (from your description I assume they have a special notation that distinguishes them from normal text) and store this data as a search index somewhere. It does not have to be MySQL; SQLite would surely be enough for your project.
A search would then be very simple.
Parsing the files could be automated as a post-commit hook in your Subversion repository.
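Roughly like this (a sketch; the regex and schema are guesses at the .Rd format and directory layout):
// Rebuild a small SQLite index of \keyword{} values per .Rd file.
$db = new PDO('sqlite:rd_index.sqlite');
$db->exec('CREATE TABLE IF NOT EXISTS keywords (file TEXT, keyword TEXT)');
$db->exec('DELETE FROM keywords'); // rebuild from scratch on each run

$stmt = $db->prepare('INSERT INTO keywords (file, keyword) VALUES (?, ?)');
foreach (glob('man/*.Rd') as $file) {
    preg_match_all('/\\\\keyword\{([^}]*)\}/', file_get_contents($file), $m);
    foreach ($m[1] as $kw) {
        $stmt->execute(array(basename($file), trim($kw)));
    }
}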
Why don't you create a SUMMARIES table with a column for each of the summary's fields?
Then you could index that with a full-text index, assigning a different weight to each field.
You don't need MySQL; you can use SQLite, which has full-text indexing (FTS3) built in.
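For example (a sketch; the column names are assumed from the summary fields you mentioned):
// FTS3 virtual table: one row per man page, full-text searchable.
$db = new PDO('sqlite:summaries.sqlite');
$db->exec('CREATE VIRTUAL TABLE summaries USING fts3(title, keywords, description)');

// ... one INSERT per man page, then multiple MATCH terms are implicitly ANDed:
$stmt = $db->prepare('SELECT title FROM summaries WHERE summaries MATCH ?');
$stmt->execute(array('keyword1 keyword2'));
$titles = $stmt->fetchAll(PDO::FETCH_COLUMN);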