I am trying to parse an XML string containing the special octal character 205. The XML string comes from a shoutcast server. It seems that this character crashes the internals of SimpleXMLElement.
// This is my metadata.php file:
<?php
header('Content-type: application/json');
$output = file_get_contents('error.xml.bak');
$xml = new SimpleXMLElement($output, LIBXML_NOERROR|LIBXML_ERR_NONE) or die('Something is wrong');
echo 'OK';
?>
I am getting the following error:
kazepis$ php metadata.php
PHP Fatal error: Uncaught Exception: String could not be parsed as XML in /var/www/html/wolfclub/metadata.php:4
Stack trace:
#0 /var/www/html/radio/metadata.php(4): SimpleXMLElement->__construct('<?xml version="...', 32)
#1 {main}
thrown in /var/www/html/radio/metadata.php on line 4
You can find the sample XML with the problematic character (decimal 133, octal 205) here:
https://wetransfer.com/downloads/f1b615c1cc09c8262cdd9965991b9cd420200123155505/801ab3
or inline:
<?xml version="1.0" standalone="yes" ?><!DOCTYPE SHOUTCASTSERVER [<!ELEMENT SHOUTCASTSERVER (CURRENTLISTENERS,PEAKLISTENERS,MAXLISTENERS,REPORTEDLISTENERS,AVERAGETIME,SERVERGENRE,SERVERURL,SERVERTITLE,SONGTITLE,SONGURL,IRC,ICQ,AIM,WEBHITS,STREAMHITS,STREAMSTATUS,BITRATE,CONTENT,VERSION,WEBDATA,LISTENERS,SONGHISTORY)><!ELEMENT CURRENTLISTENERS (#PCDATA)><!ELEMENT PEAKLISTENERS (#PCDATA)><!ELEMENT MAXLISTENERS (#PCDATA)><!ELEMENT REPORTEDLISTENERS (#PCDATA)><!ELEMENT AVERAGETIME (#PCDATA)><!ELEMENT SERVERGENRE (#PCDATA)><!ELEMENT SERVERURL (#PCDATA)><!ELEMENT SERVERTITLE (#PCDATA)><!ELEMENT SONGTITLE (#PCDATA)><!ELEMENT SONGURL (#PCDATA)><!ELEMENT IRC (#PCDATA)><!ELEMENT ICQ (#PCDATA)><!ELEMENT AIM (#PCDATA)><!ELEMENT WEBHITS (#PCDATA)><!ELEMENT STREAMHITS (#PCDATA)><!ELEMENT STREAMSTATUS (#PCDATA)><!ELEMENT BITRATE (#PCDATA)><!ELEMENT CONTENT (#PCDATA)><!ELEMENT VERSION (#PCDATA)><!ELEMENT WEBDATA (INDEX,LISTEN,PALM7,LOGIN,LOGINFAIL,PLAYED,COOKIE,ADMIN,UPDINFO,KICKSRC,KICKDST,UNBANDST,BANDST,VIEWBAN,UNRIPDST,RIPDST,VIEWRIP,VIEWXML,VIEWLOG,INVALID)><!ELEMENT INDEX (#PCDATA)><!ELEMENT LISTEN (#PCDATA)><!ELEMENT PALM7 (#PCDATA)><!ELEMENT LOGIN (#PCDATA)><!ELEMENT LOGINFAIL (#PCDATA)><!ELEMENT PLAYED (#PCDATA)><!ELEMENT COOKIE (#PCDATA)><!ELEMENT ADMIN (#PCDATA)><!ELEMENT UPDINFO (#PCDATA)><!ELEMENT KICKSRC (#PCDATA)><!ELEMENT KICKDST (#PCDATA)><!ELEMENT UNBANDST (#PCDATA)><!ELEMENT BANDST (#PCDATA)><!ELEMENT VIEWBAN (#PCDATA)><!ELEMENT UNRIPDST (#PCDATA)><!ELEMENT RIPDST (#PCDATA)><!ELEMENT VIEWRIP (#PCDATA)><!ELEMENT VIEWXML (#PCDATA)><!ELEMENT VIEWLOG (#PCDATA)><!ELEMENT INVALID (#PCDATA)><!ELEMENT LISTENERS (LISTENER*)><!ELEMENT LISTENER (HOSTNAME,USERAGENT,UNDERRUNS,CONNECTTIME, POINTER, UID)><!ELEMENT HOSTNAME (#PCDATA)><!ELEMENT USERAGENT (#PCDATA)><!ELEMENT UNDERRUNS (#PCDATA)><!ELEMENT CONNECTTIME (#PCDATA)><!ELEMENT POINTER (#PCDATA)><!ELEMENT UID (#PCDATA)><!ELEMENT SONGHISTORY (SONG*)><!ELEMENT SONG (PLAYEDAT, TITLE)><!ELEMENT PLAYEDAT (#PCDATA)><!ELEMENT 
TITLE (#PCDATA)>]><SHOUTCASTSERVER><CURRENTLISTENERS>1</CURRENTLISTENERS><PEAKLISTENERS>3</PEAKLISTENERS><MAXLISTENERS>5000</MAXLISTENERS><REPORTEDLISTENERS>1</REPORTEDLISTENERS><AVERAGETIME>1</AVERAGETIME><SERVERGENRE>public</SERVERGENRE><SERVERURL>http://www.virtualdj.com/</SERVERURL><SERVERTITLE>wolf</SERVERTITLE><SONGTITLE>BARRY WHITE - YOU'RE THE FIRST,THE LAST ▒ </SONGTITLE><SONGURL></SONGURL><IRC>wolf</IRC><ICQ>wolf</ICQ><AIM>wolf</AIM><WEBHITS>80</WEBHITS><STREAMHITS>6</STREAMHITS><STREAMSTATUS>1</STREAMSTATUS><BITRATE>96</BITRATE><CONTENT>audio/mpeg</CONTENT><VERSION>1.9.8</VERSION><WEBDATA><INDEX>0</INDEX><LISTEN>0</LISTEN><PALM7>6</PALM7><LOGIN>0</LOGIN><LOGINFAIL>0</LOGINFAIL><PLAYED>0</PLAYED><COOKIE>0</COOKIE><ADMIN>1</ADMIN><UPDINFO>1</UPDINFO><KICKSRC>0</KICKSRC><KICKDST>0</KICKDST><UNBANDST>0</UNBANDST><BANDST>0</BANDST><VIEWBAN>0</VIEWBAN><UNRIPDST>0</UNRIPDST><RIPDST>0</RIPDST><VIEWRIP>0</VIEWRIP><VIEWXML>69</VIEWXML><VIEWLOG>0</VIEWLOG><INVALID>3</INVALID></WEBDATA><LISTENERS><LISTENER><HOSTNAME>78.129.222.56</HOSTNAME><USERAGENT>curl/7.29.0</USERAGENT><UNDERRUNS>0</UNDERRUNS><CONNECTTIME>216</CONNECTTIME><POINTER>0</POINTER><UID>2</UID></LISTENER></LISTENERS><SONGHISTORY><SONG><PLAYEDAT>1579791561</PLAYEDAT><TITLE>BARRY WHITE - YOU'RE THE FIRST,THE LAST ▒ </TITLE></SONG></SONGHISTORY></SHOUTCASTSERVER>
Any ideas why this is happening?
My operating system:
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
Thank you!
PHP 7.3.11-1~deb10u1 (cli) (built: Oct 26 2019 14:14:18) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.11, Copyright (c) 1998-2018 Zend Technologies
with Zend OPcache v7.3.11-1~deb10u1, Copyright (c) 1999-2018, by Zend Technologies
I found this:
https://www.php.net/manual/en/class.simplexmlelement.php#107869
However, the $errors array, as shown in the line $errors = libxml_get_errors();, is always empty in my case, so that snippet did not help. Moreover, I also got the following warnings:
PHP Warning: DOMDocument::loadXML(): Input is not proper UTF-8, indicate encoding !\nBytes: 0x85 0x20 0x20 0x20 in Entity, line: 1 in /var/www/html/radio/metadata.php on line 6
[Fri Jan 24 18:56:12.290661 2020] [php7:warn] [pid 17910] [client 130.xxx.xxx.xxx:xxxxx] PHP Warning: simplexml_import_dom(): Invalid Nodetype to import in /var/www/html/radio/metadata.php on line 12
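For reference, libxml_get_errors() only has something to return when libxml_use_internal_errors(true) was called before parsing. Below is a minimal sketch of that pattern, assuming a made-up one-element document ($badXml) in place of the real feed and passing no LIBXML_* options. It also shows that the constructor throws on failure, so an `or die(...)` after `new SimpleXMLElement(...)` is never reached.

```php
<?php
// Hypothetical input: a tiny document containing the raw byte 0x85,
// which is not valid UTF-8 and has no encoding declaration to explain it.
$badXml = "<?xml version=\"1.0\"?><SONGTITLE>THE LAST \x85 </SONGTITLE>";

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
libxml_clear_errors();

$parsed = null;
$messages = [];
try {
    // Throws an Exception when the string cannot be parsed.
    $parsed = new SimpleXMLElement($badXml);
} catch (Exception $e) {
    // Copy the collected parser messages before clearing the buffer.
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();
}

// Restore whatever error mode was active before.
libxml_use_internal_errors($previous);

print_r($messages);
```

The exact wording of the messages depends on the libxml2 version, so the sketch only prints whatever was collected.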
Anyway, I managed to get over this messy situation by using utf8_encode() to encode my string before feeding it to the SimpleXMLElement constructor.
My resulting "test" php file is:
<?php
// header('Content-type: application/json');
$output = file_get_contents('error.xml.bak');
$output = utf8_encode($output);
$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadXML($output);
var_dump($output);
$errors = libxml_get_errors();
var_dump($errors);
$xml = simplexml_import_dom($doc);
// $xml = new SimpleXMLElement($output, LIBXML_NOERROR|LIBXML_ERR_NONE) or die('Something is wrong');
var_dump($xml);
?>
which results in the following printout, without any errors or warnings whatsoever:
string(3396) "]>13500011publichttp://www.virtualdj.com/wolfBARRY WHITE - YOU'RE THE FIRST,THE LAST wolfwolfwolf806196audio/mpeg1.9.800600001100000000690378.129.222.56curl/7.29.00216021579791561 " array(0) { } object(SimpleXMLElement)#2 (22) { ["CURRENTLISTENERS"]=> string(1) "1" ["PEAKLISTENERS"]=> string(1) "3" ["MAXLISTENERS"]=> string(4) "5000" ["REPORTEDLISTENERS"]=> string(1) "1" ["AVERAGETIME"]=> string(1) "1" ["SERVERGENRE"]=> string(6) "public" ["SERVERURL"]=> string(25) "http://www.virtualdj.com/" ["SERVERTITLE"]=> string(4) "wolf" ["SONGTITLE"]=> string(64) "BARRY WHITE - YOU'RE THE FIRST,THE LAST " ["SONGURL"]=> object(SimpleXMLElement)#3 (0) { } ["IRC"]=> string(4) "wolf" ["ICQ"]=> string(4) "wolf" ["AIM"]=> string(4) "wolf" ["WEBHITS"]=> string(2) "80" ["STREAMHITS"]=> string(1) "6" ["STREAMSTATUS"]=> string(1) "1" ["BITRATE"]=> string(2) "96" ["CONTENT"]=> string(10) "audio/mpeg" ["VERSION"]=> string(5) "1.9.8" ["WEBDATA"]=> object(SimpleXMLElement)#4 (20) { ["INDEX"]=> string(1) "0" ["LISTEN"]=> string(1) "0" ["PALM7"]=> string(1) "6" ["LOGIN"]=> string(1) "0" ["LOGINFAIL"]=> string(1) "0" ["PLAYED"]=> string(1) "0" ["COOKIE"]=> string(1) "0" ["ADMIN"]=> string(1) "1" ["UPDINFO"]=> string(1) "1" ["KICKSRC"]=> string(1) "0" ["KICKDST"]=> string(1) "0" ["UNBANDST"]=> string(1) "0" ["BANDST"]=> string(1) "0" ["VIEWBAN"]=> string(1) "0" ["UNRIPDST"]=> string(1) "0" ["RIPDST"]=> string(1) "0" ["VIEWRIP"]=> string(1) "0" ["VIEWXML"]=> string(2) "69" ["VIEWLOG"]=> string(1) "0" ["INVALID"]=> string(1) "3" } ["LISTENERS"]=> object(SimpleXMLElement)#5 (1) { ["LISTENER"]=> object(SimpleXMLElement)#7 (6) { ["HOSTNAME"]=> string(13) "78.129.222.56" ["USERAGENT"]=> string(11) "curl/7.29.0" ["UNDERRUNS"]=> string(1) "0" ["CONNECTTIME"]=> string(3) "216" ["POINTER"]=> string(1) "0" ["UID"]=> string(1) "2" } } ["SONGHISTORY"]=> object(SimpleXMLElement)#6 (1) { ["SONG"]=> object(SimpleXMLElement)#7 (2) { ["PLAYEDAT"]=> string(10) "1579791561" ["TITLE"]=> string(64) 
"BARRY WHITE - YOU'RE THE FIRST,THE LAST " } } }
NOTE: The problematic character is STILL THERE! \u0085 but properly encoded so I guess that is why it is not a problem any more...
I also tried the previous version of the code with the SimpleXMLElement constructor:
<?php
$output = file_get_contents('error.xml.bak');
$output = utf8_encode($output);
$xml = new SimpleXMLElement($output, LIBXML_NOERROR|LIBXML_ERR_NONE) or die('Something is wrong');
echo json_encode($xml);
?>
which also worked as expected:
{"CURRENTLISTENERS":"1","PEAKLISTENERS":"3","MAXLISTENERS":"5000","REPORTEDLISTENERS":"1","AVERAGETIME":"1","SERVERGENRE":"public","SERVERURL":"http:\/\/www.virtualdj.com\/","SERVERTITLE":"wolf","SONGTITLE":"BARRY WHITE - YOU'RE THE FIRST,THE LAST \u0085 ","SONGURL":{},"IRC":"wolf","ICQ":"wolf","AIM":"wolf","WEBHITS":"80","STREAMHITS":"6","STREAMSTATUS":"1","BITRATE":"96","CONTENT":"audio\/mpeg","VERSION":"1.9.8","WEBDATA":{"INDEX":"0","LISTEN":"0","PALM7":"6","LOGIN":"0","LOGINFAIL":"0","PLAYED":"0","COOKIE":"0","ADMIN":"1","UPDINFO":"1","KICKSRC":"0","KICKDST":"0","UNBANDST":"0","BANDST":"0","VIEWBAN":"0","UNRIPDST":"0","RIPDST":"0","VIEWRIP":"0","VIEWXML":"69","VIEWLOG":"0","INVALID":"3"},"LISTENERS":{"LISTENER":{"HOSTNAME":"78.129.222.56","USERAGENT":"curl\/7.29.0","UNDERRUNS":"0","CONNECTTIME":"216","POINTER":"0","UID":"2"}},"SONGHISTORY":{"SONG":{"PLAYEDAT":"1579791561","TITLE":"BARRY WHITE - YOU'RE THE FIRST,THE LAST \u0085 "}}}
NOTE the \u0085 towards the end...
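A side note on what utf8_encode() actually did here: it treats the input as ISO-8859-1, where byte 0x85 is the control character U+0085, which is why \u0085 shows up in the JSON. If the server really sends Windows-1252 (only a guess on my part, based on the song title), the same byte would be a horizontal ellipsis (U+2026). The sketch below compares both conversions with iconv() on a hypothetical one-element sample. Note that utf8_encode() is deprecated as of PHP 8.2, so iconv() or mb_convert_encoding() is the safer call in newer code.

```php
<?php
// Hypothetical sample: one raw 0x85 byte inside an element, as in the feed.
$raw = "<SONGTITLE>THE LAST \x85 </SONGTITLE>";

// ISO-8859-1 maps 0x85 to U+0085; this is the conversion utf8_encode() performs.
$latin1 = iconv('ISO-8859-1', 'UTF-8', $raw);

// Windows-1252 (CP1252) maps 0x85 to U+2026, the horizontal ellipsis.
// That the server uses CP1252 is an assumption, not something the feed declares.
$cp1252 = iconv('CP1252', 'UTF-8', $raw);

// Both converted strings are valid UTF-8, so both parse without errors.
$a = new SimpleXMLElement($latin1);
$b = new SimpleXMLElement($cp1252);

echo json_encode((string) $a), PHP_EOL; // contains \u0085, as in the JSON above
echo json_encode((string) $b), PHP_EOL; // contains \u2026 instead
```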
I am trying to access a Taleo RSS/XML feed and parse the data. I am using SimpleXML, and it is loading in all of the regular data correctly, such as <title>, <link>, etc.
However, there are several nodes that are formatted like <taleo:reqId> or <taleo:location>, and I can't seem to figure out how to access that data. It's not being returned by SimpleXML.
`$xml = simplexml_load_file('https://chp.tbe.taleo.net/chp03/ats/servlet/Rss?org=DRAGADOS&cws=1&WebPage=SRCHR&WebVersion=0&_rss_version=2');`
Returns in web browser source:
`<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="https://chp.tbe.taleo.net/chp03/ats/rss/taleorssfeed.xsl" ?>
<?xml-stylesheet type="text/css" href="https://chp.tbe.taleo.net/chp03/ats/rss/taleorssfeed.css" ?>
<rss xmlns:taleo="urn:TBERss" version="2.0">
<channel>
<title>Dragados Job Feed</title>
<link>https://chp.tbe.taleo.net/dispatcher/servlet/DispatcherServlet?org=DRAGADOS&act=redirectCws&cws=1</link>
<description>Dragados Job Feed</description>
<language>en</language>
<pubDate>Tue, 07 Nov 2017 17:00:40 GMT</pubDate>
<ttl>60</ttl>
<item>
<title>Estimator</title>
<link>https://chp.tbe.taleo.net/chp03/ats/careers/requisition.jsp?org=DRAGADOS&cws=1&rid=1155</link>
<guid>https://chp.tbe.taleo.net/chp03/ats/careers/requisition.jsp?org=DRAGADOS&cws=1&rid=1155</guid>
<description> ..... </description>
<pubDate>Tue, 07 Nov 2017 17:00:40 GMT</pubDate>
<taleo:reqId>1155</taleo:reqId>
<taleo:location>Southern California Branch (bidding)</taleo:location>
<taleo:locationCountry>US</taleo:locationCountry>
<taleo:locationState>US-CA</taleo:locationState>
<taleo:locationCity>Costa Mesa</taleo:locationCity>
<taleo:department>West Coast Bidding</taleo:department>
<taleo:html-description> ... </taleo:html-description>
</item>
...`
Returns in SimpleXML:
`object(SimpleXMLElement)#487 (2) { ["#attributes"]=> array(1) { ["version"]=> string(3) "2.0" } ["channel"]=> object(SimpleXMLElement)#486 (7) { ["title"]=> string(17) "Dragados Job Feed" ["link"]=> string(97) "https://chp.tbe.taleo.net/dispatcher/servlet/DispatcherServlet?org=DRAGADOS&act=redirectCws&cws=1" ["description"]=> string(17) "Dragados Job Feed" ["language"]=> string(2) "en" ["pubDate"]=> string(29) "Tue, 07 Nov 2017 17:00:40 GMT" ["ttl"]=> string(2) "60" ["item"]=> array(79) { [0]=> object(SimpleXMLElement)#485 (5) { ["title"]=> string(9) "Estimator" ["link"]=> string(87) "https://chp.tbe.taleo.net/chp03/ats/careers/requisition.jsp?org=DRAGADOS&cws=1&rid=1155" ["guid"]=> string(87) "https://chp.tbe.taleo.net/chp03/ats/careers/requisition.jsp?org=DRAGADOS&cws=1&rid=1155" ["description"]=> string(3580) " ... " ["pubDate"]=> string(29) "Tue, 07 Nov 2017 17:00:40 GMT" }
...`
I sometimes find it easier to split out the data by namespace. When going through the <item> data, you can extract all the elements of a specific namespace (the 'taleo' elements in this case) using ->children('namespaceURI') and then access the data in the usual way, using this new set of nodes as the basis.
foreach ( $xml->channel as $channel ) {
echo "title=".$channel->title.PHP_EOL;
foreach ( $channel->item as $item ) {
echo "pubDate=".$item->pubDate.PHP_EOL;
// Extract the elements for taleo namespace
$taleo = $item->children("urn:TBERss");
echo "taleo:reqId=".$taleo->reqId.PHP_EOL;
}
}
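For a version that runs without the live feed, here is a small sketch on a shortened, hypothetical copy of the XML (only the namespace URI urn:TBERss and the element names come from the question). It shows the two ways of calling children(): by namespace URI, or by prefix when the second argument is true.

```php
<?php
// Shortened stand-in for the real feed.
$feed = <<<XML
<rss xmlns:taleo="urn:TBERss" version="2.0">
  <channel>
    <title>Dragados Job Feed</title>
    <item>
      <title>Estimator</title>
      <taleo:reqId>1155</taleo:reqId>
      <taleo:locationCity>Costa Mesa</taleo:locationCity>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);
$item = $xml->channel->item[0];

// Select by namespace URI; this keeps working if the feed changes its prefix.
$byUri = $item->children('urn:TBERss');

// Select by prefix: the second argument marks the first one as a prefix.
$byPrefix = $item->children('taleo', true);

$reqId = (string) $byUri->reqId;
$city = (string) $byPrefix->locationCity;

echo "taleo:reqId=" . $reqId . PHP_EOL;
echo "taleo:locationCity=" . $city . PHP_EOL;
```

Element names containing a hyphen, such as taleo:html-description, need the brace syntax, e.g. $byUri->{'html-description'}.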
Below is my xml with namespace
<?xml version="1.0" encoding="UTF-8"?>
<hotels xmlns="http://www.test.com/schemas/messages" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" schemaLocation="http://www.test.com/schemas/messages">
<hotel >
<rooms>
<room>
<rates>
<rate id="1" adults="1" child="0"></rate>
<rate id="2" adults="2" child="0"></rate>
<rate id="3" adults="1" child="0"></rate>
</rates>
</room>
<room>
<rates>
<rate id="4" adults="1" child="0"></rate>
<rate id="5" adults="2" child="0"></rate>
<rate id="6" adults="2" child="0"></rate>
</rates>
</room>
</rooms>
</hotel>
</hotels>
I am trying the PHP code below (XPath with foreach) to get the values of the rate nodes:
$xd = simplexml_load_file('C:/inetpub/vhosts/test.com/data_download/q.xml');
$xd->registerXPathNamespace("n", "http://www.test.com/schemas/messages");
foreach($xd->xpath("//n:hotels/n:hotel") as $xd_item)
{
echo 'item - A';
foreach($xd_item->xpath("rooms/room") as $xd_room)
{
foreach($xd_room->xpath("rates/rate") as $xd_rate)
{
echo 'rate - C';
}
}
}
The second foreach (the one over $xd_item) is not working: the output stops after echo 'item - A';. Can anyone help me?
The problem with your code is, as mentioned by @kjhughes in his comment, that some elements are in the default namespace, but your XPath did not use the corresponding prefix on those elements. Also, your code prints static string literals, i.e. 'item - A' and 'rate - C', not any part of the XML being parsed.
"Is it possible to write conditions like $xd->xpath("rates/rate[@id='1']") in DOM?"
Yes, it is possible. It is also possible using SimpleXML, for example:
$xd->registerXPathNamespace("n", "http://www.test.com/schemas/messages");
foreach($xd->xpath("//n:rate[@id='1']") as $rate){
echo $rate["id"] .", ". $rate["adults"] .", ". $rate["child"] ."\r\n";
}
output :
//id, adults, child
1, 1, 0
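The nested loops from the question can be kept as well, as long as every step of the relative paths carries the prefix. Here is a sketch on a shortened, hypothetical copy of the XML; it registers the prefix again on each context node before the relative query, which is a precaution I am assuming is needed because the registration made on $xd is not guaranteed to carry over to the elements returned by xpath().

```php
<?php
// Shortened stand-in for q.xml; the namespace URI is the one from the question.
$source = <<<XML
<hotels xmlns="http://www.test.com/schemas/messages">
  <hotel>
    <rooms>
      <room><rates><rate id="1" adults="1" child="0"/><rate id="2" adults="2" child="0"/></rates></room>
      <room><rates><rate id="4" adults="1" child="0"/></rates></room>
    </rooms>
  </hotel>
</hotels>
XML;

$ns = 'http://www.test.com/schemas/messages';
$xd = simplexml_load_string($source);
$xd->registerXPathNamespace('n', $ns);

$ids = [];
foreach ($xd->xpath('//n:hotels/n:hotel') as $hotel) {
    // Register the prefix on the context node before the relative query.
    $hotel->registerXPathNamespace('n', $ns);
    // Every step is prefixed, because every element is in the default namespace.
    foreach ($hotel->xpath('n:rooms/n:room/n:rates/n:rate') as $rate) {
        $ids[] = (string) $rate['id'];
    }
}

echo implode(', ', $ids), PHP_EOL;
```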
The variable $xd_item is of type SimpleXMLElement from which you can access its room properties like this for example:
$xd_item->rooms->room
This will return an object of type SimpleXMLElement from which you can get the rates, and you can loop over them using a foreach. The values that you want from each 'rate' are in its attributes.
For example:
$xd = simplexml_load_file('C:/inetpub/vhosts/test.com/data_download/q.xml');
$xd->registerXPathNamespace("n", "http://www.test.com/schemas/messages");
foreach($xd->xpath("//n:hotels/n:hotel") as $xd_item)
{
foreach($xd_item->rooms->room as $room) {
foreach ($room->rates->rate as $rate) {
echo sprintf(
'id: %s<br>adults: %s<br>child: %s<br><br>',
$rate->attributes()->id->__toString(),
$rate->attributes()->adults->__toString(),
$rate->attributes()->child->__toString()
);
}
}
}
Whether the 25 digits include decimals or are integer digits only, DOMDocument::schemaValidate() fires a warning, returns false, and libxml_get_errors() captures the following errors:
PHP snippet:
$DD = new DOMDocument('1.0', 'ISO-8859-1');
$DD -> loadXML('<?xml version ="1.0" encoding="ISO-8859-1"?><a></a>');
libxml_use_internal_errors(true);
$old_libxml_disable_entity_loader = libxml_disable_entity_loader(false);
$DD -> schemaValidate(__DIR__ . '/schemas/schema.xsd'); // WARNING
libxml_disable_entity_loader($old_libxml_disable_entity_loader);
$errors = libxml_get_errors();
foreach ($errors as $error) { // PRINT ERRORS
echo $error -> code . '<br>';
echo $error -> message . '<br>';
}
DOMDocument::schemaValidate() Generated Errors:
Error 1824:
Element '{http://www.w3.org/2001/XMLSchema}maxInclusive':
'9999999999999999999999999' is not a valid value of the
atomic type 'xs:decimal'. in /path/schema.xsd on line X
Error 1717:
Element '{http://www.w3.org/2001/XMLSchema}maxInclusive': The value
'9999999999999999999999999' of the facet does not validate
against the base type '{http://www.w3.org/2001/XMLSchema}decimal'. in
/path/schema.xsd on line X
Valid schema (invalid XML only):
<?xml version="1.0" encoding="ISO-8859-1"?>
<xs:schema
targetNamespace="http://www.lala.com/la"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:la="http://www.lala.com/la"
elementFormDefault="qualified"
attributeFormDefault="unqualified">
<xs:simpleType name="AmountType">
<xs:restriction base="xs:decimal">
<xs:totalDigits value="100"/>
<xs:fractionDigits value="20"/>
<xs:maxInclusive value="999999999999999999999999"/><!-- 24 DIGITS -->
</xs:restriction>
</xs:simpleType>
</xs:schema>
Invalid schema: WARNING + Libxml internal errors of invalid schema
<?xml version="1.0" encoding="ISO-8859-1"?>
<xs:schema
targetNamespace="http://www.lala.com/la"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:la="http://www.lala.com/la"
elementFormDefault="qualified"
attributeFormDefault="unqualified">
<xs:simpleType name="AmountType">
<xs:restriction base="xs:decimal">
<xs:totalDigits value="100"/>
<xs:fractionDigits value="20"/>
<xs:maxInclusive value="9999999999999999999999999"/><!-- 25 DIGITS -->
</xs:restriction>
</xs:simpleType>
</xs:schema>
PHP version: 5.5.20
Libxml version: 2.9.2
According to W3C XML Schema Part 2: Datatypes Second Edition, libxml2 can limit the range of maxInclusive because it is allowed to limit the range of the value space of xs:decimal...
4.3.7 maxInclusive:
[Definition:] maxInclusive is the ·inclusive upper bound· of the
·value space· for a datatype with the ·ordered· property. The value of
maxInclusive ·must· be in the ·value space· of the ·base type·.
3.2.3 decimal
Note: All ·minimally conforming· processors ·must· support decimal
numbers with a minimum of 18 decimal digits (i.e., with a
·totalDigits· of 18). However, ·minimally conforming· processors ·may·
set an application-defined limit on the maximum number of decimal
digits they are prepared to support, in which case that
application-defined maximum number ·must· be clearly documented.
I want to load an XML file and then remove every <Charge> whose <DispositionDate> is older than 7 years. The date format is YYYY-MM-DD.
XML example:
<BackgroundReports userId="" password="" account="" >
<BackgroundReportPackage>
<Screenings>
<Screening type="criminal" qualifier="">
<CriminalReport>
<CriminalCase>
<AgencyReference type="Docket">
<IdValue>CR-0870120-09</IdValue>
</AgencyReference>
<Charge>
<ChargeId>
<IdValue>1</IdValue>
</ChargeId>
<ChargeOrComplaint>DUI: HIGHEST RTE OF ALC (BAC .16+) 1ST OFF</ChargeOrComplaint>
<ChargeTypeClassification>unknown</ChargeTypeClassification>
<DispositionDate>2009-04-07</DispositionDate>
</Charge>
<Charge>
<ChargeId>
<IdValue>2</IdValue>
</ChargeId>
<ChargeOrComplaint>CARELESS DRIVING</ChargeOrComplaint>
<ChargeTypeClassification>unknown</ChargeTypeClassification>
<DispositionDate>2010-08-02</DispositionDate>
</Charge>
<Charge>
<ChargeId>
<IdValue>3</IdValue>
</ChargeId>
<ChargeOrComplaint>STATUTE: 475 PC</ChargeOrComplaint>
<ChargeTypeClassification>misdemeanor</ChargeTypeClassification>
<OffenseDate>1988-11-05</OffenseDate>
<Disposition>CONVICTED</Disposition>
<DispositionDate>1988-11-09</DispositionDate>
<DispositionDate>1988-11-05</DispositionDate>
<DispositionDate>1988-11-09</DispositionDate>
</Charge>
</CriminalCase>
</CriminalReport>
</Screening>
</Screenings>
</BackgroundReportPackage>
</BackgroundReports>
I know how to open and save a file using PHP, but I don't know how to delete the parts I don't want. If anyone could help me with that, I would be extremely thankful!
You can either use SimpleXML, DOM or XSL for it.
Example XML (shortened for brevity (from Revision 1 of your question)):
$xml = <<< XML
<CriminalCase>
<Charge>
<DispositionDate>1995-12-21</DispositionDate>
</Charge>
<Charge>
<DispositionDate>2010-12-21</DispositionDate>
</Charge>
</CriminalCase>
XML;
With SimpleXml
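$sevenYearsAgo = new DateTime('-7 years');
$CriminalCase = new SimpleXmlElement($xml);
// Walk backwards so that removing a node does not shift the indexes still to be visited.
for ($i = $CriminalCase->Charge->count() - 1; $i >= 0; $i--) {
    // Read the date of the current Charge, not always the first one.
    $dispositionDate = new DateTime((string) $CriminalCase->Charge[$i]->DispositionDate);
    if ($dispositionDate < $sevenYearsAgo) {
        unset($CriminalCase->Charge[$i]);
    }
}
echo $CriminalCase->asXml();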
With DOM
$dom = new DOMDocument;
$dom->loadXml($xml);
$xpath = new DOMXPath($dom);
$oldCases = $xpath->query(
sprintf(
'//Charge[substring-before(DispositionDate, "-") < %d]',
date('Y', strtotime('-7 years'))
)
);
foreach ($oldCases as $oldCase) {
$oldCase->parentNode->removeChild($oldCase);
}
echo $dom->saveXml();
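The query above compares only the year. If the full date matters, one option is to strip the dashes with translate() so that the YYYY-MM-DD value compares as a number. The sketch below wraps this in a hypothetical helper, removeChargesBefore(), which is my own name and not part of DOM. It removes a Charge as soon as any of its DispositionDate children is before the cutoff, which is an assumption about how the repeated dates in the full example should be treated.

```php
<?php
// Removes every <Charge> with a DispositionDate before the cutoff.
// $cutoffYmd is the cutoff date written as a number, e.g. 20050101.
function removeChargesBefore(DOMDocument $dom, int $cutoffYmd): int
{
    $xpath = new DOMXPath($dom);
    // translate() strips the dashes, so "1995-12-21" compares as 19951221.
    $query = sprintf(
        '//Charge[DispositionDate[translate(., "-", "") < %d]]',
        $cutoffYmd
    );
    $removed = 0;
    // query() returns a static list, so removing nodes while looping is safe.
    foreach ($xpath->query($query) as $charge) {
        $charge->parentNode->removeChild($charge);
        $removed++;
    }
    return $removed;
}

$sample = new DOMDocument();
$sample->loadXML(
    '<CriminalCase>'
    . '<Charge><DispositionDate>1995-12-21</DispositionDate></Charge>'
    . '<Charge><DispositionDate>2010-12-21</DispositionDate></Charge>'
    . '</CriminalCase>'
);
// A fixed cutoff keeps the example reproducible.
$removedCount = removeChargesBefore($sample, 20050101);
echo $sample->saveXML();
```

For the real task, the cutoff would be (int) date('Ymd', strtotime('-7 years')).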
With XSLT
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:date="http://exslt.org/dates-and-times"
extension-element-prefixes="date">
<xsl:output indent="yes" method="xml"/>
<xsl:template match="/">
<CriminalCase>
<xsl:apply-templates />
</CriminalCase>
</xsl:template>
<xsl:template match="Charge">
<xsl:if test="date:year(DispositionDate) > date:year() - 7">
<xsl:copy-of select="."/>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
and then use this PHP Code to transform it
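// $stylesheet is assumed to hold the XSLT shown above as a string.
$xslDoc = new DOMDocument();
$xslDoc->loadXml($stylesheet);
$xsl = new XSLTProcessor();
$xsl->importStyleSheet($xslDoc);
$doc = new DOMDocument();
$doc->loadXml($xml);
echo $xsl->transformToXML($doc);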
Here are some tips on how to get started:
You need to parse the XML to something a little easier to work with. PHP has a library called SimpleXML.
Loop through the data and remove the objects which are older than 7 years. To compare dates, you first have to convert the dates you got from the XML to something PHP can process as a date. Take a look at strtotime, which gives you a Unix timestamp (seconds since 1970; negative values cover earlier dates, back to 1901 on 32-bit systems), or DateTime, which also supports dates before 1970.
To check whether the fetched date is more than 7 years old, one way is to subtract its timestamp from the current timestamp and see whether the difference is greater than 7 years in seconds. If you use DateTime, take a look at DateTime::diff. Remove (unset) the objects you iterate over that are older than 7 years.
To save as XML again, take a look at SimpleXMLElement::asXML
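The steps above can be sketched as follows. dropOldCharges() is a hypothetical helper name, and removing a Charge when any one of its DispositionDate values is before the cutoff is an assumption about what you want. The unset($charge[0]) call is the usual SimpleXML idiom for removing an element from its parent.

```php
<?php
// Returns the XML with every <Charge> removed that has a
// DispositionDate earlier than $cutoff.
function dropOldCharges(string $xmlString, DateTimeImmutable $cutoff): string
{
    $xml = new SimpleXMLElement($xmlString);
    // xpath() returns a plain array, so removing nodes while looping is safe.
    foreach ($xml->xpath('//Charge') as $charge) {
        $isOld = false;
        foreach ($charge->DispositionDate as $date) {
            if (new DateTimeImmutable((string) $date) < $cutoff) {
                $isOld = true;
                break;
            }
        }
        if ($isOld) {
            unset($charge[0]); // removes this <Charge> from the document
        }
    }
    return $xml->asXML();
}

$sampleXml = '<CriminalCase>'
    . '<Charge><DispositionDate>1995-12-21</DispositionDate></Charge>'
    . '<Charge><DispositionDate>2010-12-21</DispositionDate></Charge>'
    . '</CriminalCase>';

// A fixed cutoff keeps the example reproducible;
// use new DateTimeImmutable('-7 years') for the real task.
$result = dropOldCharges($sampleXml, new DateTimeImmutable('2005-01-01'));
echo $result;
```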
Hope that helps!