I have done a search but I can't really find anything related to what I need; I'm not sure if I'm searching for it correctly or not. I have an XML file that is created via PHP and populated with data from a form. What I am trying to do is delete certain entries in that XML file after X amount of time. Is there an easy way to do this, or can it even be done at all? I was thinking of a PHP script, run by a CRON job, that checks the XML file and deletes certain entries by their timestamp after X amount of time. Can someone provide some suggestions or point me in the right direction?
-Thanks!
To remove the old nodes from an XML file in a PHP script, you would need to parse the XML file, using, for example, the SimpleXML (http://www.php.net/manual/en/simplexml.examples-basic.php) library.
You can then check each of the nodes in question to see whether a timestamp attribute/child value is less than the current time - 12 hours, and if so, remove that node.
This assumes that the nodes have a timestamp attribute or value - you may need to add this in where you create the XML in PHP, for example
<anode created="2013/06/04 12:00">
<!-- ... -->
</anode>
Lastly, to perform the date comparison, you will want to use an if statement similar to the following:
$timestamp = strtotime($domElement->getAttribute('created'));
if ($timestamp < strtotime('-12 hours')) {
    // remove node
}
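If it helps, here is a minimal sketch of such a cron-run cleanup script, using DOMDocument/DOMXPath to match the snippet above. The file name, the anode element name and the created attribute are assumptions taken from the example; adjust them to your real structure.

<?php
// cleanup.php - prune entries older than 12 hours (sketch; names are placeholders)
$file = 'data.xml';

$doc = new DOMDocument();
$doc->load($file);

$xpath  = new DOMXPath($doc);
$cutoff = strtotime('-12 hours');

// Collect the expired nodes first so the list is not modified while iterating.
$expired = [];
foreach ($xpath->query('//anode[@created]') as $node) {
    if (strtotime($node->getAttribute('created')) < $cutoff) {
        $expired[] = $node;
    }
}

foreach ($expired as $node) {
    $node->parentNode->removeChild($node);
}

$doc->save($file);

Run it from cron, e.g. 0 * * * * php /path/to/cleanup.php, to prune the file every hour.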
Related
I am working on a genealogy project and I am trying to read the GEDCOM (*.ged) file format.
Instead of using columns, the format uses lines: each line starts with a level number, then a tag (node) and its associated value, with child nodes listed on the following lines. It's a very simple numbering system: 0 marks a root node, 1, 2, 3... mark increasingly nested child nodes, then another 0 starts the next root node, and so on.
The problem I'm having is that I have placed variables as checkpoints indicating which part/section of the file the program is currently in (head / submission / individual), in order to narrow down what should be parsed at that point. But one of the child nodes (particularly DATE) appears inside indi-birt-date as well as indi-chan-date, and the program therefore fails to differentiate them and parse the correct date at the intended checkpoint.
Unlike head and submission, there are multiple (indi)viduals, and it's therefore difficult to write code that matches the scenario.
preg_match('/^\d.(DATE)\s(.*)/i', '2 DATE Jan 01 2022', $matches); matches both instances, and my birth date is therefore overwritten by the final update (chan-date).
What I would like to know is how to track the depth as a variable, so I can decide what level the program is currently at and limit which matches and which code are executed to that depth.
Update: I moved a couple of checkpoints inside so they could only be opened once per individual, and closed when they are not needed.
I have created an image showing how to create/read from a GED file in PHP.
The code assumes that it has a file open (fopen) and is reading line by line (while-fgets loop). Instead of reading the entire file into memory (arrays), it's best to store it all in a relational database; that code can be written in place of the first die() function. It's also worth mentioning that the programs that create these GED files add hidden characters at the front of the line (a zero-width no-break space, Unicode code point U+FEFF).
So use the following: $line = preg_replace('/[\x{200B}-\x{200D}\x{FEFF}]/u', '', trim($line));
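As a rough illustration of the depth idea (not your actual code - the file name and the tags handled are only examples): keep the level number from the start of each line as the current depth, maintain a tag path indexed by depth, and use the parent tag to tell the two DATE lines apart.

<?php
// Sketch: depth tracking while reading a GEDCOM file line by line.
$handle = fopen('family.ged', 'r') or die('Cannot open file');

$path = [];   // tag path by depth, e.g. [0 => 'INDI', 1 => 'BIRT', 2 => 'DATE']

while (($line = fgets($handle)) !== false) {
    // Strip the zero-width / BOM characters some GEDCOM exporters add.
    $line = preg_replace('/[\x{200B}-\x{200D}\x{FEFF}]/u', '', trim($line));

    // level, optional @xref@, tag, optional value
    if (!preg_match('/^(\d+)\s+(?:@[^@]+@\s+)?(\S+)(?:\s+(.*))?$/', $line, $m)) {
        continue;
    }
    $level = (int)$m[1];
    $tag   = $m[2];
    $value = $m[3] ?? '';

    // Truncate the path to the current depth, then record this tag.
    $path = array_slice($path, 0, $level);
    $path[$level] = $tag;

    // A DATE line can now be told apart by its parent tag.
    if ($tag === 'DATE' && ($path[1] ?? '') === 'BIRT') {
        echo "Birth date: $value\n";
    } elseif ($tag === 'DATE' && ($path[1] ?? '') === 'CHAN') {
        echo "Last changed: $value\n";
    }
}

fclose($handle);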
I would like to make full use out of MySQL for the purpose of a (web) application I have developed for a chiropractor.
So far I have been storing a single row per year for what are called progress notes. The table structure looks something like this: (progress_note_id, patient_id, date (Y-0-0), progress_note). When the client wishes to append to the current year's progress notes, he simply clicks at the top of a textarea (HTML), for which I use the TinyMCE JavaScript library, to add a new entry date along with the shorthand notes at the beginning of the column (progress_note). So far it's been working OK; with 900+ clients (est.) there could potentially be 1,300+ progress notes for each year since the beginning of the application (2018).
Now the client wishes to be able to see previous progress notes (history), but be unable to modify any previous notes while still being able to write new ones. The solution I have come up with is to use XML inside the textarea and use PHP to separate the new notes from the old ones.
My problem, however, is that if I have to convert my entire table from yearly to daily rows, it could take a lot of time and energy to split multiple notes into individual rows (est. 10x), which could end up being 13,000+ rows. I realize that whatever method I choose is going to be a lot of work. Another way around this, I found, is to store XML in a MySQL column to hold multiple records, and if I wish to append to it, all I would need is PHP to parse the entire XML and add a new child node at the beginning. Each progress note is 255-500 chars, and even in the worst case, if a patient were to visit 52 times a year (once every week), there shouldn't be a large enough overhead.
Is this the correct way of solving this problem? I do wish to stick with a MySQL DB, and I realize that MySQL is not intended for XML. For some clarification, what I hope to accomplish is the same thing I currently do with progress notes, but with XML, ordered newest to oldest.
<xml_result>
  <progress_note>
    <date>2020-08-16</date>
    <content></content>
  </progress_note>
</xml_result>
Thank-you for any of your time and for any suggestions.
Firstly, 13,000+ rows is not a problem for MySQL. In most cases for a web application, MySQL can handle 10M+ records in a single instance with good performance.
Secondly, you can use either XML or JSON format in a text field and handle the decoding in your application.
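A hedged sketch of the JSON variant of that (table and column names are placeholders, not your schema): keep all of a patient's notes as a JSON array in a MEDIUMTEXT column, decode it in PHP, prepend the new note, and write the whole thing back. Old notes are never edited by the UI, only carried along.

<?php
// Sketch only - assumes a PDO connection and a progress_notes table with
// a notes_json MEDIUMTEXT column; adjust names to the real schema.
function appendProgressNote(PDO $pdo, int $patientId, string $content): void
{
    $stmt = $pdo->prepare('SELECT notes_json FROM progress_notes WHERE patient_id = ?');
    $stmt->execute([$patientId]);
    $notes = json_decode((string)$stmt->fetchColumn(), true) ?: [];

    // Newest entry first; older entries are re-saved untouched.
    array_unshift($notes, [
        'date'    => date('Y-m-d'),
        'content' => $content,
    ]);

    $update = $pdo->prepare('UPDATE progress_notes SET notes_json = ? WHERE patient_id = ?');
    $update->execute([json_encode($notes, JSON_UNESCAPED_UNICODE), $patientId]);
}

The more conventional relational alternative is simply one row per note (patient_id, date, content), which also makes "history is read-only" a one-line application rule.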
I'm fairly new to PHP, and I'm trying to write a script that solves the following problem:
I have an RSS feed that gets saved to my server every 10 minutes (copied from elsewhere).
There is a problem with the timestamps (pubDate tag) on the RSS feed, they always have the correct date but 00:00:00 GMT as the timestamp (I have no control over this).
Therefore, when I use an autotweeting program to tweet updates from the feed (it checks it every hour or so), it doesn't work properly: it only tweets the first update of each day.
Therefore, what I'm trying to do to fix it, at least to some degree, is to check whether the feed has changed, and if it has, change the saved pubDate to the current server time on only the new items.
I'm also kind of confused as to how I can have it check for changes. If I compare against a corrected version (with fairly accurate timestamps) saved to my server, it will ALWAYS think there are changes, because the incoming timestamps will always be 00:00:00. I'm thinking of checking both feeds for items with matching full strings such as <guid isPermaLink="true">http://services.runescape.com/m=adventurers-log/a=161/display_player_profile.ws?searchName=A13d&id=-463827091</guid> - since the id= at the end stays constant, the script would only change the <pubDate> of items found to be new.
http://services.runescape.com/m=adventurers-log/a=161/rssfeed?searchName=A13d Here is a feed as an example. If anyone could get me started or point me to some kind of tutorial that might help, I'd really appreciate it. This is over my head, but something I need to learn how to do.
Maybe there is something wrong with your code parsing the timestamp, date format perhaps?
I believe the method of doing full string comparisons (<title> & <description>) between items with the same <guid> is your best bet. Here is some reading about RSS duplicate detection if you are interested.
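A hedged sketch of the GUID-based approach with SimpleXML (file names are placeholders): index the previously saved items by <guid>, keep the pubDate already assigned to them, and stamp only the genuinely new items with the current server time.

<?php
// Sketch: merge the freshly copied feed with the locally corrected copy.
$freshFile = 'feed_fresh.xml';   // the copy saved every 10 minutes
$savedFile = 'feed_saved.xml';   // the corrected copy the autotweeter reads

$fresh = simplexml_load_file($freshFile);
$saved = file_exists($savedFile) ? simplexml_load_file($savedFile) : null;

// Remember the pubDate we already assigned to each known item.
$known = [];
if ($saved !== null) {
    foreach ($saved->channel->item as $item) {
        $known[(string)$item->guid] = (string)$item->pubDate;
    }
}

foreach ($fresh->channel->item as $item) {
    $guid = (string)$item->guid;
    if (isset($known[$guid])) {
        $item->pubDate = $known[$guid];   // seen before: keep our timestamp
    } else {
        $item->pubDate = date(DATE_RSS);  // new item: stamp with "now"
    }
}

$fresh->asXML($savedFile);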
I'm using PHP to take XML files and convert them into single-line, tab-delimited plain text with set columns (i.e. it ignores certain tags the database doesn't need, and certain tags will be empty). The problem I ran into is that it took 13 minutes to go through 56k (+ change) files, which I think is ridiculously slow. (The average folder has upwards of a million XML files.) I'll probably cron-job it overnight anyway, but it is completely untestable at a reasonable pace while I'm at work for things like missing files, corrupt files and such.
Here's hoping someone can help me make the thing faster; the XML files themselves are not too big (<1k lines) and I don't need every single data tag, just some. Here's my data node method:
// Builds one output chunk per entry: node value, a marker, then name=value pairs.
function dataNode ($entries) {
    $out = "";
    foreach ($entries as $e) {
        $out .= $e->nodeValue."[ATTRIBS]";
        foreach ($e->attributes as $name => $node) {
            $out .= $name."=".$node->nodeValue;
        }
    }
    return $out;
}
where $entries is a DOMNodeList generated from XPath queries for the nodes I need. So the question is: what is the fastest way to go to a target data node or nodes (if I have 10 keyword nodes from my XPath query, I need all of them printed from that function) and output the node value and all its attributes?
I read here that iterating through a DOMNodeList isn't constant time, but I can't really use the solution given, because a sibling of the node I want might be one that I don't need, or might need a different format function called before I write it to file, and I really don't want to run every node through a gigantic switch statement on each iteration trying to format the data.
Edit: I'm an idiot, I had my write function inside my processing loop so every iteration it had to reopen the file I was writing to, thanks for both of your help, I'm trying to learn XSLT right now as it seems very useful.
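For anyone hitting the same thing, a sketch of the fix described in the edit: open the output handle once before the loop, write inside it, and close once at the end (paths and the XPath query are placeholders).

<?php
// Sketch: one output handle for the whole run instead of reopening per file.
$out = fopen('output.tsv', 'w');

foreach (glob('xml/*.xml') as $file) {
    $doc = new DOMDocument();
    if (!@$doc->load($file)) {
        continue;                               // skip/log corrupt or missing files
    }
    $xpath   = new DOMXPath($doc);
    $entries = $xpath->query('//keyword');      // placeholder query

    fwrite($out, dataNode($entries) . "\n");    // the function shown above
}

fclose($out);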
A comment would be a little short, so I write it as an answer:
It's hard to say where actually your setup can benefit from optimizing. Perhaps it's possible to join multiple of your many XML files together before loading.
From the information you give in your question, I would assume that it's more the disk operations that are taking the time than the XML parsing. I found DOMDocument and XPath quite fast, even on large files. An XML file of up to 60 MB takes about 4-6 secs to load, a file of 2 MB only a fraction of that.
Having many small files (< 1k lines) would mean a lot of work on the disk, opening/closing files. Additionally, I have no clue how you iterate over directories/files; sometimes this can be sped up dramatically as well, especially as you say that you have millions of files.
So perhaps concatenating/merging files is an option for you; it can be done quite safely, and it would reduce the time needed to test your converter.
If you encounter missing or corrupt files, you should create a log and catch these errors. So you can let the job run through and check for errors later.
Additionally, if possible, try to make your workflow resumable. E.g. if an error occurs, the current state is saved, and next time you can continue from that state.
The suggestion above in a comment, to run an XSLT on the files to transform them first, is a good idea as well. Adding a new layer in the middle to transpose data can reduce the overall problem dramatically, as it reduces complexity.
This workflow on XML files has helped me so far:
Preprocess the file (plain text filters, optional)
Parse the XML. That's loading into DomDocument, XPath iterating etc.
My Parser sends out events with the parsed data if found.
The parser throws a specific exception if data is encountered that is not in the expected format. That makes it possible to spot errors in the parser itself.
All other errors are converted to exceptions as well.
Exceptions can be caught and the operation finished, e.g. skip to the next file, etc.
Logger, Resumer and Exporter (file-export) can hook onto the events. Sort of the visitor pattern.
I've built such a system to process larger XML files whose formats change. It's flexible enough to deal with changes (e.g. replace the parser with a new version while keeping logging and exporting). The event system really made it work for me.
Instead of a gigantic switch statement, I normally use a $state variable for the parser's state while iterating over a DOMNodeList. $state can be handy for resuming operations later: restore the state, go to the last known position, then continue.
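As a small illustration of that $state idea (the element names and states are made up for the example): instead of one huge switch over every possible tag, keep a state and only apply the rules that matter in the current state.

<?php
// Illustrative only: a tiny state machine over a DOMNodeList.
$doc = new DOMDocument();
$doc->load('input.xml');
$xpath = new DOMXPath($doc);

$state  = 'idle';
$buffer = [];

foreach ($xpath->query('//record//*') as $node) {
    switch ($state) {
        case 'idle':
            if ($node->nodeName === 'header') {
                $state = 'in_header';           // header rules apply from here on
            }
            break;

        case 'in_header':
            if ($node->nodeName === 'body') {
                $state = 'in_body';
            } else {
                $buffer['header'][] = $node->nodeValue;
            }
            break;

        case 'in_body':
            $buffer['body'][] = $node->nodeValue;
            break;
    }
}

// $state (plus a position marker) could also be persisted to resume a long run.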
I have an XML feed that I have to check periodically for updates. The XML consists of many elements, and I'm trying to figure out the best (and probably fastest) way to find out which elements have been updated since the last time I checked.
What I'm thinking is to check the lastBuildDate first and, if it differs from the previous one, start parsing the XML again. This would involve keeping each element, with all of its attributes, in my database. But each element can have a different number of attributes as well as other nested elements. So if I were to store each element in my database, what would be the best way to keep them?
That's why I'm asking for your help :) Thank you.
Most modern databases will store your XML as a blob if you like. (You tagged PHP... MySQL? If so, use MEDIUMTEXT.) Store your XML and generate a diff when you get a new one. If you don't have an XML diff tool, canonicalize both XML listings then run a text diff.
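A minimal sketch of the canonicalize-then-compare idea, assuming the old feed sits in a MEDIUMTEXT column and the new one has just been fetched (variable and function names are placeholders):

<?php
// C14N() canonicalizes the DOM, so whitespace or attribute-order differences
// do not trigger a false "changed" result.
function feedChanged(string $storedXml, string $newXml): bool
{
    $old = new DOMDocument();
    $old->loadXML($storedXml);

    $new = new DOMDocument();
    $new->loadXML($newXml);

    return $old->C14N() !== $new->C14N();
}

// Usage: only reparse element by element when something actually changed.
// if (feedChanged($rowFromDatabase, $freshlyFetchedXml)) { ... }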