I have an XML feed that I have to check periodically for updates. The feed consists of many elements, and I'm trying to figure out the best (and probably fastest) way to find out which elements have been updated since the last time I checked.
What I have in mind is to check the lastBuildDate first and, if it differs from the previous one, to start parsing the XML again. This would involve keeping each element with all of its attributes in my database. But each element can have a different number of attributes, as well as other nested elements. So if I were to store each element in my database, what would be the best way to keep them?
That's why I'm asking for your help :) Thank you.
Most modern databases will store your XML as a blob if you like. (You tagged PHP... MySQL? If so, use MEDIUMTEXT.) Store your XML and generate a diff when you get a new one. If you don't have an XML diff tool, canonicalize both XML documents, then run a text diff.
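If it helps, here's a minimal sketch of the canonicalize-and-compare idea in PHP, assuming $oldXml and $newXml hold the previous and current feed contents (the variable names are made up):

<?php
// Minimal sketch: canonicalize two XML strings and compare them.
// $oldXml / $newXml are assumed to hold the previous and current feeds.
function canonicalize(string $xml): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false; // ignore insignificant whitespace
    $doc->loadXML($xml);
    return $doc->C14N();              // W3C canonical form
}

if (canonicalize($oldXml) !== canonicalize($newXml)) {
    // The feeds differ: re-parse, or hand both canonical strings to a text diff.
    echo "Feed changed, re-parsing...\n";
}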
I would like to make full use out of MySQL for the purpose of a (web) application I have developed for a chiropractor.
So far I have been storing one row per year for what are called progress notes. The table structure looks something like this: (progress_note_id, patient_id, date (Y-0-0), progress_note). When the client wishes to append to the current year's progress notes, he simply clicks at the top of a textarea (HTML, using the TinyMCE JavaScript library) to add a new entry date along with the shorthand notes, which go at the beginning of the column (progress_note). So far it's been working OK; with 900+ clients (est.) there could potentially be 1,300+ progress notes for each year since the beginning of the application (2018).
Now the client wishes to be able to see previous progress notes (history) but be unable to modify any of them, while still being able to write new ones. The solution I have come up with is to use XML inside the textarea and use PHP to separate the new notes from the old ones.
My problem, however, is that if I have to convert my entire table from yearly to daily rows, it could take a lot of time and energy to split multiple notes into single rows (est. 10x), which could end up being 13,000+ rows. I realize that no matter what method I choose, it is going to be a lot of work. Another way around this, I found, might be to store the XML in a MySQL column holding multiple records; to append, all I would need is PHP to parse the entire XML and add a new child node at the beginning (a rough sketch follows the sample below). Each progress note is 255-500 chars, and in the worst-case scenario, if a patient were to visit 52 times a year (once every week), there shouldn't be a large overhead.
Is this the correct way of solving this problem? I do wish to stick with a MySQL DB, and I realize that MySQL is not intended for XML. For some clarification, what I hope to accomplish is the same thing I intended with the current progress notes, but with XML, ordered newest to oldest:
<xml_result>
<progress_note>
<date>2020-08-16</date>
<content></content>
</progress_note>
</xml_result>
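(For the appending part, here's a rough sketch of what I mean, assuming the structure above; $storedXml and $newNoteText are placeholders:)

<?php
// Sketch: prepend a new <progress_note> to the stored XML (structure as above).
$doc = new DOMDocument();
$doc->loadXML($storedXml); // the current value of the progress_note column

$note = $doc->createElement('progress_note');
$note->appendChild($doc->createElement('date', date('Y-m-d')));
$content = $doc->createElement('content');
$content->appendChild($doc->createTextNode($newNoteText)); // text node handles escaping
$note->appendChild($content);

$root = $doc->documentElement;                 // <xml_result>
$root->insertBefore($note, $root->firstChild); // newest note first

$updatedXml = $doc->saveXML();                 // write this back to the column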
Thank you for your time and for any suggestions.
Firstly, 13,000+ rows is not a problem for MySQL. In most cases, a single MySQL instance can handle 10M+ records with good performance for a web application.
Secondly, you can store either XML or JSON in a text field and handle the decoding in your application.
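For example, with JSON the append-only update could look something like this rough sketch (table and variable names are illustrative; $pdo is a PDO connection, and the row is assumed to already exist):

<?php
// Sketch: notes stored as a JSON array in a TEXT column, newest first.
$stmt = $pdo->prepare('SELECT progress_note FROM progress_notes WHERE patient_id = ?');
$stmt->execute([$patientId]);
$notes = json_decode($stmt->fetchColumn() ?: '[]', true);

// Prepend the new note; existing entries are never touched.
array_unshift($notes, [
    'date'    => date('Y-m-d'),
    'content' => $newNoteText,
]);

$stmt = $pdo->prepare('UPDATE progress_notes SET progress_note = ? WHERE patient_id = ?');
$stmt->execute([json_encode($notes), $patientId]);

Read-only history then falls out naturally: your PHP renders old entries as plain text and only ever prepends new ones.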
I'm looking into the possibility of efficiently comparing two similar XML files and updating outdated information.
The main XML file I'm working with is about 200-250 MB in size. The second is a tad smaller.
The two XML files pretty much look like this:
<product>
<Category>BOOK</Category>
<Bookgroup>BOOKF</Bookgroup>
<Productname>Name of the book</Productname>
<Productcode>123456789</Productcode>
<Price>79.00</Price>
<Availability>Stock On Order</Availability>
<ProductURL>www.url.com</ProductURL>
<Release>07.08.2013</Release>
<Author>Name of author</Author>
<Genre>Crime</Genre>
<BookType>Pocket</BookType>
<Language>English</Language>
</product>
As you can see, I'm working with books, and the purpose of having a second XML file with the same information is that I only want one copy of each book for further use.
Basically, I'm trying to figure out how I can effectively parse through the first XML and check whether each book exists in the second XML. If it exists, I'll check whether the product information (price, availability, etc.) has been updated; if so, it needs to be updated in the second XML as well.
If it doesn't exist, it needs to be added to the second XML.
Using XMLReader, I'm able to parse through each book from the first XML fairly fast (40-ish seconds to loop through 4.5 million lines of XML and echo out all the books) using a similar approach to this.
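(A simplified sketch of the kind of loop I mean, using the element names from the sample above:)

<?php
// Sketch: stream products with XMLReader, expanding one <product> at a time.
$reader = new XMLReader();
$reader->open('products1.xml');
$dom = new DOMDocument();

// Skip ahead to the first <product> element.
while ($reader->read() && $reader->name !== 'product');

while ($reader->name === 'product') {
    // Expand only this <product> subtree into memory.
    $product = simplexml_import_dom($dom->importNode($reader->expand(), true));
    echo $product->Productname, ' (', $product->Productcode, ")\n";
    $reader->next('product'); // jump to the next sibling <product>
}
$reader->close();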
My problem occurs when I want to check whether a given book exists in the second XML, and then update or add it there as needed.
Would it, for example, be possible to use XMLReader on the second XML and stop at nodes with the same book title as the one I've stopped at in the first XML, and then make the check? If so, how?
How can I parse an 88 GB RDF file with PHP?
This RDF is filled with entities and facts about each entity.
I'm trying to iterate through each entity and check for certain facts per each entity. Then write those facts to an XML document I created earlier in the script.
So as I navigate the RDF, for each entity I create a <card></card> element and give it a child called <facts>. I run through all the facts on the entity, take the ones I need, and write them as <fact></fact> children inside the <facts></facts> element.
How can I parse the rdf, extract the data, and write it to XML?
First, use an RDF parser. Googling for a PHP RDF parser turns up lots of results; I don't use PHP personally, but I'm sure one of them will do the job of parsing RDF. Just make sure it's a streaming parser: you're not going to hold 88 GB of RDF in memory on your workstation.
Second, you said you need to 'iterate through each entity'; that might be tricky if the entities are not sorted by subject in the original file, or if the parser does not report them in that order.
Assuming that is not a problem, you can just keep the triples for each subject in a local data structure, and when you get a triple whose subject differs from the ones you've queued locally, do whatever business logic you need and write out the XML. You might want to make sure you can't queue up so many statements locally that you'll OOM.
Lastly, I'm going to assume you have a good reason to take RDF and turn it into an XML format that is not RDF/XML. But you might want to reconsider your design, just in case.
Or you could put the data in an RDF database and write SPARQL queries against it, transforming query results into whatever XML or anything else you need.
I think your best option would be:
use some external tool (probably something like rapper?) to convert the source file from Turtle into N-Triples format
iterate over the file one line at a time via fopen+fgets, since N-Triples defines a strict one-statement-per-line constraint, which is perfect in this case (see the sketch below)
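A rough sketch of that pipeline, assuming the file is already converted to N-Triples and sorted by subject, and using the <card>/<facts> output format the question describes (the line parsing here is deliberately naive, e.g. it ignores blank-node subjects):

<?php
// Sketch: stream an N-Triples file line by line, group triples by subject,
// and emit one <card><facts>...</facts></card> block per entity.
$in  = fopen('data.nt', 'r');
$out = new XMLWriter();
$out->openURI('cards.xml');
$out->startDocument('1.0', 'UTF-8');
$out->startElement('cards');

$current = null;
$facts   = [];

$flush = function () use ($out, &$current, &$facts) {
    if ($current === null) return;
    $out->startElement('card');
    $out->writeAttribute('subject', $current);
    $out->startElement('facts');
    foreach ($facts as [$predicate, $object]) {
        $out->startElement('fact');
        $out->writeAttribute('predicate', $predicate);
        $out->text($object);
        $out->endElement(); // </fact>
    }
    $out->endElement(); // </facts>
    $out->endElement(); // </card>
    $facts = [];
};

while (($line = fgets($in)) !== false) {
    // Naive N-Triples split: <subject> <predicate> object .
    if (!preg_match('/^(<[^>]+>)\s+(<[^>]+>)\s+(.+?)\s*\.\s*$/', $line, $m)) {
        continue; // skip blank lines, comments, blank-node subjects
    }
    if ($m[1] !== $current) { // new subject: flush the previous entity
        $flush();
        $current = $m[1];
    }
    $facts[] = [$m[2], $m[3]];
}
$flush();           // flush the last entity

$out->endElement(); // </cards>
$out->endDocument();
fclose($in);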
I am trying to build a very simple price comparison script.
Until now, I have written code that fetches some product XML feeds from shops, and with the help of XSLT I create a single global XML out of all those inputs. I use XSLT because the shops use different names for their elements.
Now I want to take it one step further and create a search form that will display the products matching a term, say "laptop".
I know how to create a form, but I need some coding guidance to understand how to make it search my XML file (products.xml) and display, let's say, the matching products.
Thank you
You might want to check out http://php.net/manual/en/class.xmlreader.php
Using that, it is pretty easy to navigate through an XML file and grab all the info you need.
EDIT:
On second thought, http://php.net/manual/en/book.simplexml.php is a MUCH simpler way to achieve what you're trying to do. Hence the name, I guess ;)
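For instance, a rough SimpleXML + XPath search over products.xml could look like this sketch (element names like <product>, <name>, <price> are made up, since I don't know your merged format):

<?php
// Sketch: case-insensitive search of products.xml with SimpleXML + XPath.
$term = strtolower($_GET['q'] ?? 'laptop');

$xml   = simplexml_load_file('products.xml');
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower = 'abcdefghijklmnopqrstuvwxyz';

// XPath 1.0 has no lower-case(), so translate() does the case folding.
// NOTE: for real user input, sanitize $term before embedding it in the query.
$matches = $xml->xpath("//product[contains(translate(name, '$upper', '$lower'), '$term')]");

foreach ($matches as $product) {
    echo $product->name, ' - ', $product->price, "\n";
}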
You can use the SimpleXML library to parse your XML file. In my opinion, SimpleXML is easier to use than XMLReader. Note, though, that SimpleXML was only introduced in PHP 5.
I want to store the contents of an XML file in the database. Is there an easy way to do it?
Can I write a script that does the task for me?
The schema of the XML file looks like this:
<schedule start="20100727120000 +0530" stop="20100727160000 +0530" ch_id="0210.CHNAME.in">
<title>Title_info</title>
<date>20100727</date>
<category>cat_02</category>
</schedule>
One thing to note: how do I read the start time? I need the +0530 offset taken into account when parsing the time.
Thank you so much.
You'll probably want to create a table called schedules that matches your data, then read the contents of the XML file with an XML parser of your choice. SimpleXML might be the right tool for this job.
As for the dates, I recommend you try using the function date_parse_from_format().
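A quick sketch with its object-oriented sibling, DateTime::createFromFormat(), using the format from your sample ('O' is the GMT offset, e.g. +0530):

<?php
// Sketch: parse the start attribute, keeping the +0530 offset.
$start = '20100727120000 +0530'; // from the schedule's start attribute

$dt = DateTime::createFromFormat('YmdHis O', $start);
echo $dt->format('Y-m-d H:i:s P'); // 2010-07-27 12:00:00 +05:30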
Look up SimpleXML on the PHP site. Offhand I'm not too hot on it, but basically you will end up with a loop that adds your data to an object, e.g.:
$xml
and you will be able to access tags like $xml->schedule->title, $xml->schedule->date and $xml->schedule->category, and attributes like $xml->schedule['start'], but you might want to check that.
I had to do this recently for a client, and this was the best way I could find. The attributes may be tricky; I can't quite remember, but you might have to look into namespaces and such... anyway, find SimpleXML and you're on the right track.
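For instance, assuming <schedule> is the root element of the file (in which case you access its children directly rather than via ->schedule), a rough sketch:

<?php
// Sketch: read the sample <schedule> element with SimpleXML.
$xml = simplexml_load_file('schedule.xml');

echo (string) $xml->title;    // Title_info
echo (string) $xml->date;     // 20100727
echo (string) $xml->category; // cat_02
echo (string) $xml['start'];  // 20100727120000 +0530 (attribute access)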