I have a SimpleXML object made from merging multiple XMLs from PubMed (snippet below), but the merge has introduced repetition. How can I compare all the first-level child arrays (array[0], array[1], etc.) and discard any duplicates?
I thought serialising might be the answer, but as far as I know you can't serialise a SimpleXML object.
I'm not sure where to start.
Array
(
[0] => Array
(
[title] => SimpleXMLElement Object
(
[0] => Superstructure of the centromeric complex of TubZRC plasmid partitioning systems.
)
[link] => SimpleXMLElement Object
(
[#attributes] => Array
(
[Version] => 1
)
[0] => 23010931
)
[author] => Aylett, CH., Löwe, J.
[journal] => SimpleXMLElement Object
(
[0] => Proc. Natl. Acad. Sci. U.S.A.
)
[pubdate] => 2012-9-27
[day] => SimpleXMLElement Object
(
[0] => 25
)
[month] => SimpleXMLElement Object
(
[0] => Sep
)
[year] => SimpleXMLElement Object
(
[0] => 2012
)
)
[1] => Array
(
[title] => SimpleXMLElement Object
(
[0] => Superstructure of the centromeric complex of TubZRC plasmid partitioning systems.
)
[link] => SimpleXMLElement Object
(
[#attributes] => Array
(
[Version] => 1
)
[0] => 23010931
)
[author] => Aylett, CH., Löwe, J.
[journal] => SimpleXMLElement Object
(
[0] => Proc. Natl. Acad. Sci. U.S.A.
)
[pubdate] => 2012-9-27
[day] => SimpleXMLElement Object
(
[0] => 25
)
[month] => SimpleXMLElement Object
(
[0] => Sep
)
[year] => SimpleXMLElement Object
(
[0] => 2012
)
)
Alternatively, it could be done at the initial XML merge stage. I currently use the code below; can anyone suggest how to modify it to remove duplicates?
function simplexml_merge (SimpleXMLElement &$xml1, SimpleXMLElement $xml2) {
    $dom1 = new DomDocument();
    $dom2 = new DomDocument();
    $dom1->loadXML($xml1->asXML());
    $dom2->loadXML($xml2->asXML());
    $xpath = new domXPath($dom2);
    $xpathQuery = $xpath->query('/*/*');
    for ($i = 0; $i < $xpathQuery->length; $i++) {
        $dom1->documentElement->appendChild(
            $dom1->importNode($xpathQuery->item($i), true));
    }
    $xml1 = simplexml_import_dom($dom1);
}
$xml1 = new SimpleXMLElement($search1);
$xml2 = new SimpleXMLElement($search2);
simplexml_merge($xml1, $xml2);
Thanks.
...
...
For clarity, here's the layout of the XML source that I import into SimpleXML. Each PubmedArticle is one "element" that I want to compare, making sure there are no duplicates:
<?xml ...?>
<Document>
    <PubmedArticle>
        <MedlineCitation>
            <PMID version="1">xxx</PMID>
            ...
        </MedlineCitation>
        ...
    </PubmedArticle>
    <PubmedArticle>
        <MedlineCitation>
            <PMID version="1">xxx</PMID>
            ...
        </MedlineCitation>
        ...
    </PubmedArticle>
    etc
</Document>
The PMID node is unique, so it can be used to check for duplicates.
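Since PMID is unique, one way to drop duplicates at the merge stage is to remember every PMID already copied and skip repeats. A minimal sketch, assuming each input is an XML string with the Document/PubmedArticle/MedlineCitation/PMID layout shown above and that every article has a PMID. `mergeUniqueArticles()` is a hypothetical helper (not part of SimpleXML or DOM), and the sample PMID 12345678 is made up:

```php
<?php
// Merge any number of PubMed-style documents into one <Document>,
// skipping every PubmedArticle whose MedlineCitation/PMID was already seen.
// Hypothetical helper; assumes every article carries a PMID.
function mergeUniqueArticles(string ...$sources): SimpleXMLElement
{
    $out  = new DOMDocument();
    $root = $out->appendChild($out->createElement('Document'));
    $seen = [];                                   // PMID => true

    foreach ($sources as $source) {
        $dom = new DOMDocument();
        $dom->loadXML($source);
        $xpath = new DOMXPath($dom);
        foreach ($xpath->query('/*/PubmedArticle') as $article) {
            // PMID of this article as a plain string
            $pmid = trim($xpath->evaluate('string(MedlineCitation/PMID)', $article));
            if (isset($seen[$pmid])) {
                continue;                         // duplicate: skip it
            }
            $seen[$pmid] = true;
            $root->appendChild($out->importNode($article, true));
        }
    }
    return simplexml_import_dom($out);
}

// Usage with two tiny, made-up inputs that share one article
$search1 = '<Document><PubmedArticle><MedlineCitation><PMID version="1">23010931</PMID></MedlineCitation></PubmedArticle></Document>';
$search2 = '<Document>'
    . '<PubmedArticle><MedlineCitation><PMID version="1">23010931</PMID></MedlineCitation></PubmedArticle>'
    . '<PubmedArticle><MedlineCitation><PMID version="1">12345678</PMID></MedlineCitation></PubmedArticle>'
    . '</Document>';

$merged = mergeUniqueArticles($search1, $search2);
echo count($merged->PubmedArticle), "\n";        // 2
```

Because the seen-set spans all inputs, this also catches duplicates within a single input, and it takes any number of searches in one call rather than merging them pairwise.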
...
...
Using the link from @Gordon, I now use:
//Get my source XML
$xml1 = new SimpleXMLElement($search1);
$xml2 = new SimpleXMLElement($search2);
//Run through $xml1 and build a query based on its PMIDs
$query = array();
foreach ($xml1->PubmedArticle as $paper) {
    $query[] = sprintf('(PMID != %s)', $paper->MedlineCitation->PMID);
}
$query = implode(' and ', $query);
//Run through $xml2 and get the nodes whose PMID doesn't match any in $xml1
foreach ($xml2->xpath(sprintf('PubmedArticle/MedlineCitation[%s]', $query)) as $paper) {
    echo $paper->asXml();
}
However, I still have one problem: getting the output merged.
For a start, the output from $xml2 is missing the <PubmedArticle> node around each match. After that, I presume I can use the same merge code as above to do the merge.
Can you point me in the right direction?
Convert it to an array (I'm not going to write that for you; just iterate and add), then use array_diff().
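A minimal sketch of that array approach, assuming the PubmedArticle/MedlineCitation/PMID layout from the question: each document is reduced to a list of PMID strings and array_diff() finds the new ones. `pmids()` is a hypothetical helper and the sample documents are made up:

```php
<?php
// Reduce a document to its PMIDs as plain strings (hypothetical helper).
function pmids(SimpleXMLElement $doc): array
{
    $ids = [];
    foreach ($doc->PubmedArticle as $article) {
        // Cast to string so array_diff() compares values, not objects
        $ids[] = trim((string)$article->MedlineCitation->PMID);
    }
    return $ids;
}

$xml1 = new SimpleXMLElement('<Document>'
    . '<PubmedArticle><MedlineCitation><PMID>23010931</PMID></MedlineCitation></PubmedArticle>'
    . '</Document>');
$xml2 = new SimpleXMLElement('<Document>'
    . '<PubmedArticle><MedlineCitation><PMID>23010931</PMID></MedlineCitation></PubmedArticle>'
    . '<PubmedArticle><MedlineCitation><PMID>12345678</PMID></MedlineCitation></PubmedArticle>'
    . '</Document>');

// PMIDs present in $xml2 but not in $xml1
$newIds = array_values(array_diff(pmids($xml2), pmids($xml1)));

// Keep only the articles of $xml2 whose PMID is new
$fresh = [];
foreach ($xml2->PubmedArticle as $article) {
    if (in_array(trim((string)$article->MedlineCitation->PMID), $newIds, true)) {
        $fresh[] = $article;
    }
}
echo count($fresh), "\n";                // 1
```

The surviving elements in $fresh could then be fed to the existing merge code. Note that this only compares $xml2 against $xml1; duplicates inside $xml2 itself are not removed.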
I decided to follow @Gordon's approach, as it kept everything in XML, and eventually got it all working:
//function to check 2 xml inputs for duplicate nodes
//(returns the PubmedArticles of $xml2 whose PMID is not already in $xml1)
function dedupeXML($xml1, $xml2) {
    $query = array();
    foreach ($xml1->PubmedArticle as $paper) {
        $query[] = sprintf('(MedlineCitation/PMID != %s)', $paper->MedlineCitation->PMID);
    }
    // nothing to compare against: an empty predicate would be invalid XPath
    if (empty($query)) {
        return $xml2;
    }
    $query = implode(' and ', $query);
    $xmlClean = '<Document>';
    foreach ($xml2->xpath(sprintf('PubmedArticle[%s]', $query)) as $paper) {
        $xmlClean .= $paper->asXML();
    }
    $xmlClean .= '</Document>';
    return new SimpleXMLElement($xmlClean);
}
//function to merge 2 xml inputs
function mergeXML (SimpleXMLElement $xml1, SimpleXMLElement $xml2) {
    // convert SimpleXML objects into DOM ones
    $dom1 = new DomDocument();
    $dom2 = new DomDocument();
    $dom1->loadXML($xml1->asXML());
    $dom2->loadXML($xml2->asXML());
    // pull all child elements of second XML
    $xpath = new domXPath($dom2);
    $xpathQuery = $xpath->query('/*/*');
    for ($i = 0; $i < $xpathQuery->length; $i++) {
        // and pump them into first one
        $dom1->documentElement->appendChild(
            $dom1->importNode($xpathQuery->item($i), true));
    }
    return simplexml_import_dom($dom1);
}
$xml1 = new SimpleXMLElement($search1);
$xml2 = new SimpleXMLElement($search2);
$xml3 = new SimpleXMLElement($search3);
//dedupe and merge inputs
//input 1 & 2
$xml2Clean = dedupeXML($xml1, $xml2);
$xml12 = mergeXML($xml1, $xml2Clean);
//input 1+2 & 3
$xml3Clean = dedupeXML($xml12, $xml3);
$xml123 = mergeXML($xml12, $xml3Clean);
This should be easy to adapt to other data sources: just modify the dedupeXML function to match the structure of your XML.
Related
I can't manage to sort an array alphabetically.
It's an array of cities that I get from an external XML.
The XML looks like this, and it's the localidad node I am trying to sort.
<parada>
<id>506</id>
<localidad>
<![CDATA[ Alvor ]]>
</localidad>
<parada>
<![CDATA[ Alvor Baia Hotel (Bus Stop Alvor Férias) ]]>
</parada>
<lat>37.1296</lat>
<lng>-8.58058</lng>
<horasalida>05:40</horasalida>
</parada>
The relevant code:
$xml = new SimpleXMLElement($viajes);
foreach ($xml->parada as $excursion) {
    $newParadasarray[] = $excursion->localidad;
}
$newParadasarray = array_unique($newParadasarray);
foreach ($newParadasarray as $parada) {
    if (strpos($parada, 'Almuñecar') !== false)
        echo '<option value="Almuñecar">Almuñecar</option>';
    if (strpos($parada, 'Benalmádena') !== false)
        echo '<option value="Benalmádena Costa">Benalmádena Costa</option>';
    if (strpos($parada, 'Estepona') !== false)
        echo '<option value="Estepona">Estepona</option>';
    // etc.
}
I have tried with sort() and array_values().
This is the output of print_r($newParadasarray):
Array (
[0] => SimpleXMLElement Object ( [0] => SimpleXMLElement Object ( ) )
[1] => SimpleXMLElement Object ( [0] => SimpleXMLElement Object ( ) )
[2] => SimpleXMLElement Object ( [0] => SimpleXMLElement Object ( ) )
[4] => SimpleXMLElement Object ( [0] => SimpleXMLElement Object ( ) )
[9] => SimpleXMLElement Object ( [0] => SimpleXMLElement Object ( ) )
[14] => SimpleXMLElement Object ( [0] => SimpleXMLElement Object ( ) )
[20] => etc.
The problem is that you're assigning a SimpleXMLElement to the array, when what you want is the content of the element, so just change the line...
$newParadasarray[] = $excursion->localidad;
to
$newParadasarray[] = trim((string)$excursion->localidad);
The cast (string) takes the text content and trim() removes the extra whitespace around it.
I am assuming that you have multiple <parada> elements, so that $xml->parada is returning the correct data.
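Putting the fix together with array_unique() and sort(), a minimal sketch; the sample XML and its <viajes> root element are assumptions standing in for the real feed:

```php
<?php
// Collect each <parada>'s localidad as trimmed text, drop duplicates,
// sort, and print one <option> per city. Sample data is made up.
$viajes = <<<XML
<viajes>
  <parada><id>506</id><localidad><![CDATA[ Estepona ]]></localidad></parada>
  <parada><id>507</id><localidad><![CDATA[ Alvor ]]></localidad></parada>
  <parada><id>508</id><localidad><![CDATA[ Estepona ]]></localidad></parada>
  <parada><id>509</id><localidad><![CDATA[ Almuñecar ]]></localidad></parada>
</viajes>
XML;

$xml = new SimpleXMLElement($viajes);
$localidades = [];
foreach ($xml->parada as $excursion) {
    // (string) gives the text content; trim() strips the CDATA padding
    $localidades[] = trim((string)$excursion->localidad);
}
$localidades = array_unique($localidades);
sort($localidades);                 // byte-wise sort; also reindexes from 0

foreach ($localidades as $localidad) {
    $safe = htmlspecialchars($localidad);
    echo "<option value=\"$safe\">$safe</option>\n";
}
```

sort() compares bytes, so a name starting with an accented capital (e.g. "Á") would land after "Z". For Spanish-aware ordering, the intl extension's Collator class (for example new Collator('es_ES') and its sort() method) is one option.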
If you're familiar with DOMDocument, you could simply do this:
$doc = new DOMDocument();
$doc->loadXML($xml);
$array = array();
foreach($doc->getElementsByTagName("localidad") as $localidad) {
$array[] = trim($localidad->nodeValue);
}
$array = array_unique($array);
sort($array);
$data = simplexml_load_file($source_url);
If I'm looping through some items that look like this:
[item] => Array
(
[0] => SimpleXMLElement Object
(
[title] => Justin Bieber and Chance the Rapper Collaborate
[link] => http://feedproxy.google.com/~r/absolutepunknet/~3/vn98Mqyr4_4/showthread.php
[pubDate] => Mon, 09 Dec 2013 07:37:44 GMT
[description] => SimpleXMLElement Object
(
)
[category] => News
[guid] => http://www.absolutepunk.net/showthread.php?t=3576311
)
[1] => SimpleXMLElement Object
(
[title] => SimpleXMLElement Object
(
)
[link] => http://feedproxy.google.com/~r/absolutepunknet/~3/IjS0KtTy8Ws/showthread.php
[pubDate] => Mon, 09 Dec 2013 07:06:56 GMT
[description] => SimpleXMLElement Object
(
)
[category] => News
[guid] => http://www.absolutepunk.net/showthread.php?t=3576281
)
Using
foreach ($data->channel->item as $key => $value)
How can I access, and potentially remove, one of the objects? In this example, I'd like to remove [1] => SimpleXMLElement Object.
I've tried using the value of $key, but that just contains the word "item". I can't quite figure this one out.
If you want to remove the item from the XML element (not the array item itself), you can do it this way:
$dom = dom_import_simplexml($element);
$dom->parentNode->removeChild($dom);
For removing array items, @Dave Chen's answer should work (only outside the foreach loop, I think).
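A minimal sketch of the DOM approach applied to the question, removing the second item ([1] in the print_r output) by its 0-based index outside any loop. The tiny feed is a made-up stand-in for the real one:

```php
<?php
$data = simplexml_load_string(
    '<rss><channel>'
    . '<item><title>First</title></item>'
    . '<item><title>Second</title></item>'
    . '<item><title>Third</title></item>'
    . '</channel></rss>'
);

// Address the item by its sibling index instead of $key,
// which in the foreach is only the tag name "item".
$node = dom_import_simplexml($data->channel->item[1]);
$node->parentNode->removeChild($node);   // SimpleXML shares the same tree

echo count($data->channel->item), "\n";  // 2
```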
Use DOMDocument directly, with DOMXPath to find the nodes:
$dom = new DOMDocument();
$dom->load($rssFile);
$xpath = new DOMXpath($dom);
//fetch the second item into a list
$items = $xpath->evaluate('/rss/channel/item[2]');
foreach ($items as $item) {
// remove the item from its parent node
$item->parentNode->removeChild($item);
}
echo $dom->saveXml();
In XPath, list positions start at 1, not 0, so index 1 in PHP is position 2 in XPath.
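To see the off-by-one side by side, a small sketch that reads the same <item> both ways; the two-item feed is made up:

```php
<?php
$rss = '<rss><channel><item><title>A</title></item><item><title>B</title></item></channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($rss);
$xpath = new DOMXPath($dom);

// XPath: position 2 is the second <item>
$viaXpath = $xpath->evaluate('string(/rss/channel/item[2]/title)');

// SimpleXML: index 1 is the same second <item>
$viaSimpleXml = (string)simplexml_import_dom($dom)->channel->item[1]->title;

echo "$viaXpath $viaSimpleXml\n";   // B B
```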
How can I get Cxyabc, Cxy123 and Cxy234 into an array from the object below?
$xml_element = simplexml_load_string($xml,null, LIBXML_NOCDATA);
$childId = $xml_element->Parent->ChildID;
print_r($childId);
SimpleXMLElement Object (
[#attributes] => Array (
[entity] => result
[order-value] => 1
)
[0] => Cxyabc
[1] => Cxy123
[2] => Cxy234
)
Thanks for the answers. I tried the code below and it works fine; the string conversion is necessary.
$test = array();
foreach($childId as $value){
$strValue = (string)$value;
array_push($test,$strValue);
}
Try:
$cxyabc = $obj->{0};
$cxy123 = $obj->{1};
The { } syntax is needed because property names cannot begin with a digit, so $obj->0 is not valid.
You would access the attributes using array notation:
$entity = $obj['entity'];
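A sketch of both kinds of array-style access on the result of `$xml_element->Parent->ChildID`: integer offsets select the repeated <ChildID> siblings, string keys read attributes. The XML document is hypothetical, reconstructed from the print_r output in the question:

```php
<?php
// Hypothetical document shaped after the print_r output above
$xml = <<<XML
<Root>
  <Parent>
    <ChildID entity="result" order-value="1">Cxyabc</ChildID>
    <ChildID entity="result" order-value="2">Cxy123</ChildID>
    <ChildID entity="result" order-value="3">Cxy234</ChildID>
  </Parent>
</Root>
XML;

$xml_element = simplexml_load_string($xml);
$childId = $xml_element->Parent->ChildID;

// All sibling values as plain strings
$ids = [];
foreach ($childId as $value) {
    $ids[] = (string)$value;            // without the cast, the array holds objects
}

$second = (string)$childId[1];          // second <ChildID>
$entity = (string)$childId['entity'];   // attribute of the first <ChildID>
```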
I have the following working example, which retrieves a specific Wikipedia page and returns a SimpleXMLElement object:
ini_set('user_agent', 'michael@example.com');
$doc = new DOMDocument();
$doc->load('http://en.wikipedia.org/w/api.php?action=parse&page=Main%20Page&format=xml');
$xml = simplexml_import_dom($doc);
print '<pre>';
print_r($xml);
print '</pre>';
Which returns:
SimpleXMLElement Object
(
[parse] => SimpleXMLElement Object
(
[#attributes] => Array
(
[title] => Main Page
[revid] => 472210092
[displaytitle] => Main Page
)
[text] => <body><table id="mp-topbanner" style="width: 100%;"...
Silly question/mind blank. What I am trying to do is capture the $xml->parse->text element and, in turn, parse that. Ultimately, what I want returned is the following object; how do I achieve this?
SimpleXMLElement Object
(
[body] => SimpleXMLElement Object
(
[table] => SimpleXMLElement Object
(
[#attributes] => Array
(
[id] => mp-topbanner
[style] => width:100% ...
After grabbing a fresh tea and eating a banana, here's the solution I've come up with:
ini_set('user_agent','michael@example.com');
$doc = new DOMDocument();
$doc->load('http://en.wikipedia.org/w/api.php?action=parse&page=Main%20Page&format=xml');
$nodes = $doc->getElementsByTagName('text');
$str = $nodes->item(0)->nodeValue;
$html = new DOMDocument();
$html->loadHTML($str);
This then allows me to get an element's value, which is what I was after. For example:
echo "Some value: ";
echo $html->getElementById('someid')->nodeValue;
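A self-contained sketch of the same idea, with one addition: simplexml_import_dom() on the HTML document yields the SimpleXMLElement tree the question originally asked for. The tiny API response is made up; in the real code it would come from $doc->load() with the Wikipedia URL:

```php
<?php
// Made-up response whose <text> element carries escaped HTML
$response = '<api><parse title="Main Page"><text>'
    . '&lt;body&gt;&lt;table id="mp-topbanner" style="width:100%"&gt;'
    . '&lt;tr&gt;&lt;td&gt;Welcome&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/body&gt;'
    . '</text></parse></api>';

$doc = new DOMDocument();
$doc->loadXML($response);
$str = $doc->getElementsByTagName('text')->item(0)->nodeValue;  // unescaped HTML

libxml_use_internal_errors(true);   // real-world HTML often triggers parser warnings
$html = new DOMDocument();
$html->loadHTML($str);
libxml_clear_errors();

// DOM access, as in the solution above
$style = $html->getElementById('mp-topbanner')->getAttribute('style');

// SimpleXML view; the root is the <html> element that loadHTML() adds
$sx = simplexml_import_dom($html);
$tableId = (string)$sx->body->table['id'];
```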
I just want to get the values from an XML node, so I followed the code in the PHP documentation for SimpleXMLElement::xpath(), but it didn't give me what I wanted. XPath also seems rather inconvenient to me; is there a better way to get the node I want?
My PHP code:
<?php
/**
 * @author kevien
 * @copyright 2010
*/
$arr = array ();
$xml = simplexml_load_file("users.xml");
$result = $xml->xpath('/users/user[@id="126"]/watchHistory/whMonthRecords[@month="2010-09"]/whDateList/date');
while(list( , $node) = each($result)) {
array_push($arr, $node);
}
print_r($arr);
?>
It returns:
Array ( [0] => SimpleXMLElement Object ( [0] => 02 ) [1] => SimpleXMLElement Object ( [0] => 03 ) [2] => SimpleXMLElement Object ( [0] => 06 ) [3] => SimpleXMLElement Object ( [0] => 10 ) [4] => SimpleXMLElement Object ( [0] => 21 ) )
Part of my users.xml:
<users>
<user id="126">
<name>老黄牛三</name>
<watchHistory>
<whMonthRecords month="2010-09">
<whDateList month="2010-09">
<date>02</date>
<date>03</date>
<date>06</date>
<date>10</date>
<date>21</date>
</whDateList>
</whMonthRecords>
</watchHistory>
</user>
</users>
Thank you very much!!
Replace your whole loop with:
foreach ($result as $node) {
$arr[] = (string)$node;
}
or even:
$result = array_map('strval', $result);
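A runnable sketch with a trimmed copy of the users.xml fragment (loaded from a string here instead of the file). It shows the XPath route with strval, and, since the question also asked for an alternative, plain property access, which assumes the wanted user is the first <user>:

```php
<?php
$users = <<<XML
<users>
  <user id="126">
    <watchHistory>
      <whMonthRecords month="2010-09">
        <whDateList month="2010-09">
          <date>02</date><date>03</date><date>06</date><date>10</date><date>21</date>
        </whDateList>
      </whMonthRecords>
    </watchHistory>
  </user>
</users>
XML;

$xml = simplexml_load_string($users);

// XPath route: every match converted to a plain string
$result = $xml->xpath('/users/user[@id="126"]/watchHistory/whMonthRecords[@month="2010-09"]/whDateList/date');
$dates = array_map('strval', $result);

// Without XPath: walk the tree (assumes the first <user> is the one wanted)
$plain = [];
foreach ($xml->user[0]->watchHistory->whMonthRecords->whDateList->date as $date) {
    $plain[] = (string)$date;
}

print_r($dates);   // the five dates as strings
```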