Merge people profiles based on email match - php

I have an existing directory (php with xml datasource) which contains people information such as this:
MainSource.xml
<people>
<person>
<id></id>
<last_name></last_name>
<first_name></first_name>
<email></email>
<phone></phone>
</person>
...
</people>
I need to add a new node to MainSource.xml from NewSource.xml, matching on email address, from the new datasource which contains people info like this:
NewSource.xml
<people>
<person>
<email></email>
<website_url></website_url>
</person>
...
</people>
I have tried a number of variations, but I think my hangup is properly comparing the two documents. Logically, it feels like I need to be iterating, as opposed to foreach? Or two foreach, one for each source? Here's a sample of what I'm thinking. Please offer any clarity or insight which can nudge me along in the right direction.
<?php
$doc1 = new DOMDocument();
$doc1->load('MainSource.xml');
$doc2 = new DOMDocument();
$doc2->load('NewSource.xml');
foreach ($doc1->person as $person) {
if ($person->email === $doc2->person->email) {
$node = $doc1->createElement("website_url", $valueFromDoc2);
$newnode = $doc1->appendChild($node);
}
}
$merged = $doc1->saveXML();
file_put_contents('MergedSource.xml', $merged)
?>

As mentioned by #waterloomatt, you need to use xpath to achieve that.
Assuming that MainSource.xml looks like this:
<people>
<person>
<id>1</id>
<last_name>smith</last_name>
<first_name>john</first_name>
<email>js#example.com</email>
<phone>555-123-1234</phone>
</person>
<person>
<id>2</id>
<last_name>doe</last_name>
<first_name>jane</first_name>
<email>jd#anotherexample.com</email>
<phone>666-234-2345</phone>
</person>
</people>
and NewSource.xml looks like this:
<people>
<person>
<email>js#example.com</email>
<website_url>js.example.com</website_url>
</person>
<person>
<email>jd#anotherexample.com</email>
<website_url>jd.anotherexample.com</website_url>
</person>
</people>
you can try this:
$doc1->loadXML('MainSource.xml');
$xpath1 = new DOMXPath($doc1);
# find each person's email address
$sources = $xpath1->query('//person//email');
$doc2->loadXML('NewSource.xml');
$xpath2 = new DOMXPath($doc2);
foreach ($sources as $source) {
#for each email address, get the parent and use that as the destination
#of the new web address element
$destination = $xpath1->query('..',$source);
#in the other doc, search for each person whose email address matches
#that of the first doc and get the relevant web address
$exp2 = "//person[email[text()='{$source->nodeValue}']]//website_url";
$target = $xpath2->query($exp2);
#import the result of the search as a node into the first doc
$node = $doc1->importNode($target[0], true);
#finally, append the imported node in the right location of the first doc
$destination[0]->appendChild($node);
};
echo $doc1->saveXml();
Output:
<people>
<person>
<id>1</id>
<last_name>smith</last_name>
<first_name>john</first_name>
<email>js#example.com</email>
<phone>555-123-1234</phone>
<website_url>js.example.com</website_url></person>
<person>
<id>2</id>
<last_name>doe</last_name>
<first_name>jane</first_name>
<email>jd#anotherexample.com</email>
<phone>666-234-2345</phone>
<website_url>jd.anotherexample.com</website_url></person>
</people>

Related

How can I search a XML file using regular expressions in php

I am curious if I can search an XML file for a certain tag with regular expressions. I can search the file if I use fopen('foo.xml'); but it will only allow me to search the content between the tags not the tags them self. My objective for this is I hope to create a function that will allow me to delete all the content between two tags for example between users which are in a xml file. He language that I am using is PHP.
Thanks in advance john.
You should use something like SimpleXMLto handle/edit XML files.
If you really insist on doing it by treating the SML file as a string you can do something like this (or you can use regex). But you should use an XML library.
// get your file as a string
$yourXML = file_get_contents($file) ;
$posStart = stripos($yourXML,'<users>') + strlen('<users>') ;
$posEnd = stripos($yourXML,'</users>') ;
$newXML = substr($yourXML,0,$posStart) . substr($yourXML,$posEnd) ;
// <users> is now empty
echo $newXML ;
DomDocument & XPath will make things very clean, direct and reliable.
You can use evaluate() or query() as they provide the same result.
// will seek out the matching tags regardless of their location.
Be aware that my solution is case-sensitive.
Code: (Demo)
$xml = <<<XML
<myXml>
<Person>
<firstName>pradeep</firstName>
<lastName>jain</lastName>
<address>
<doorNumber>287</doorNumber>
<street>2nd block</street>
<city>bangalore</city>
</address>
<phoneNums type="mobile">9980572765</phoneNums>
<phoneNums type="landline">080 42056434</phoneNums>
<phoneNums type="skype">123456</phoneNums>
</Person>
<Person>
<firstName>pradeep</firstName>
<lastName>jain</lastName>
<address>
<doorNumber>287</doorNumber>
<street>2nd block</street>
<city>bangalore</city>
</address>
<phoneNums type="mobile">1</phoneNums>
<phoneNums type="landline">2</phoneNums>
<phoneNums type="skype">3</phoneNums>
</Person>
</myXml>
XML;
$dom = new DOMDocument;
$dom->loadXML($xml); // <-- you'll need to import your file instead of a string as demo'ed here
$xpath = new DOMXPath($dom);
echo count($xpath->evaluate("//phoneNums")) , "\n"; // 6
echo count($xpath->evaluate("//street")) , "\n"; // 2
echo count($xpath->evaluate("//myXml")) , "\n"; // 1
echo count($xpath->evaluate("//Person")) , "\n"; // 2
echo count($xpath->evaluate("//person")) , "\n"; // 0 <-- case-sensitive
As a simple mock up of the various parts needed to do this in SimpleXML, there are a few concepts you need to know to get it to work.
The main one being XPath, which a sort of SQL for XML. Of course it has it's own notation and can be a little pedantic at times, but you can experiment with it on sites like https://codebeautify.org/Xpath-Tester.
$data = '<?xml version="1.0" encoding="UTF-8"?>
<Users>
<User id="123">
<Name>fred</Name>
<Extension>1234</Extension>
</User>
<User id="124">
<Name>bert</Name>
<Extension>1235</Extension>
</User>
<User id="125">
<Name>foo</Name>
<Extension>1236</Extension>
</User>
</Users>';
$userID = "123";
$users = simplexml_load_string($data);
// Find the user with the id attribute (use [0] as the call to xpath
// returns a list of matches and you only want the first one)
$userMatch = $users->xpath("//User[#id='{$userID}']")[0];
// Just output user id attribute and name
echo "id=".$userMatch['id'].",name=".$userMatch->Name.PHP_EOL;
echo "Removing user...".PHP_EOL;
// Remove the user - note the [0] is required here
unset($userMatch[0]);
// Print out the resulting XML after the removal
echo $users->asXML();
I've put comments through the code as how it works. The output is...
id=123,name=fred
Removing user...
<?xml version="1.0" encoding="UTF-8"?>
<Users>
<User id="124">
<Name>bert</Name>
<Extension>1235</Extension>
</User>
<User id="125">
<Name>foo</Name>
<Extension>1236</Extension>
</User>
</Users>

Count how many children are in XML with PHP

i know a few about php, so sorry for the question:
i have this file xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<alert>
<status> </status>
<nothing> </nothing>
<info>
<area>
</area>
</info>
<info>
<area>
</area>
</info>
<info>
<area>
</area>
</info>
</alert>
i must do a for loop and inside a "foreach" for each
The problem is that i'm not sure what is a way to know how many times i had to repeat a for loop. Because in this file xml (that is an example) i don't know how many are
Is good if:
$url = "pathfile";
$xml = simplexml_load_file($url);
$numvulcani = count($xml->alert->info); // is good ?
for ($i = 0; $i <= $numvulcani; $i++) {
foreach ($xml->alert->info[$i] as $entry) {
$area = $entry->area;
}
}
is true ?
sorry for bad english
You need to use SimpleXMLElement::count function for this — It counts the children of an element.
<?php
$xml = <<<EOF
<people>
<person name="Person 1">
<child/>
<child/>
<child/>
</person>
<person name="Person 2">
<child/>
<child/>
<child/>
<child/>
<child/>
</person>
</people>
EOF;
$elem = new SimpleXMLElement($xml);
foreach ($elem as $person) {
printf("%s has got %d children.\n", $person['name'], $person->count());
}
?>
The output will be as follows :
Person 1 has got 3 children.
Person 2 has got 5 children.
Also take a look at this link : xml count using php
Try replacing foreach ($xml->alert->info[$i] as $entry) with:
foreach ($xml->alert->info[$i] as $j => $entry)
The current item index will be $j
You're perhaps overcomplicating this a bit as it's new to you.
First of all, you don't need to reference the alert root element like $xml->alert because the SimpleXMLElement named by the variable $xml represents that document element already.
And second, you don't need to count here, you can just foreach directly:
foreach ($xml->info as $info) {
echo ' * ', $info->asXML(), "\n";
}
This iterates over those three info elements that are children of the alert element.
I recommend the Basic SimpleXML usage guide in the PHP manual for a good start with SimpleXML.

PHP XML append to created file

I have the following XML documment:
<list>
<person>
<name>Simple name</name>
</person>
</list>
I try to read it, and basically create another "person" element. The output I want to achieve is:
<list>
<person>
<name>Simple name</name>
</person>
<person>
<name>Simple name again</name>
</person>
</list>
Here is how I am doing it:
$xml = new DOMDocument();
$xml->load('../test.xml');
$list = $xml->getElementsByTagName('list') ;
if ($list->length > 0) {
$person = $xml->createElement("person");
$name = $xml->createElement("name");
$name->nodeValue = 'Simple name again';
$person->appendChild($name);
$list->appendChild($person);
}
$xml->save("../test.xml");
What I am missing here?
Edit: I have translated the tags, so that example would be clearer.
Currently, you're pointing/appending to the node list instead of that found parent node:
$list->appendChild($person);
// ^ DOMNodeList
You should point to the element:
$list->item(0)->appendChild($person);
Sidenote: The text can already put inside the second argument of ->createElement():
$name = $xml->createElement("name", 'Simple name again');

Prepending raw XML using PHP's SimpleXML

Given a base $xml and a file containing a <something> tag with attributes, children and children of its children, I would like to append it as first child and all of its children as raw XML.
Original XML:
<root>
<people>
<person>
<name>John Doe</name>
<age>47</age>
</person>
<person>
<name>James Johnson</name>
<age>13</age>
</person>
</people>
</root>
XML in file:
<something someval="x" otherthing="y">
<child attr="val" ..> { some children and values ... }</child>
<child attr="val2" ..> { some children and values ... }</child>
...
</something>
Result XML:
<root>
<something someval="x" otherthing="y">
<child attr="val" ..> { some children and values ... }</child>
<child attr="val2" ..> { some children and values ... }</child>
...
</something>
<people>
<person>
<name>John Doe</name>
<age>47</age>
</person>
<person>
<name>James Johnson</name>
<age>13</age>
</person>
</people>
</root>
This tag would contain several children both direct and recursively, so it would not be practical to build the XML via the SimpleXML operations. Besides, keeping it in a file would result in lower maintenance costs.
Technically it would simply be prepending one child. The problem is that this child would have other children and so on.
On the PHP addChild page there's a comment that says:
$x = new SimpleXMLElement('<root name="toplevel"></root>');
$f1 = new SimpleXMLElement('<child pos="1">alpha</child>');
$x->{$f1->getName()} = $f1; // adds $f1 to $x
However, this does not seem to treat my XML as raw XML therefore causing < and > escaped tags to appear. Several warnings concerning namespaces seem to appear as well.
I suppose I could do a quick replace of such tags but I am not sure whether it could cause future problems and it certainly does not feel right.
Manually hacking the XML is not an option and neither is adding children one by one. Choosing a different library could be.
Any clues on how to get this working?
Thanks!
I'm really not sure if that will work. Try this or downvote this, but I hope it helps. Using DOMDocument (Reference)
<?php
$xml = new DOMDocument();
$xml->loadHTML($yourOriginalXML);
$newNode = DOMDocument::createElement($someXMLtoPrepend);
$nodeRoot = $xml->getElementsByTagName('root')->item(0);
$nodeOriginal = $xml->getElementsByTagName('people')->item(0);
$nodeRoot->insertBefore($newNode,$nodeOriginal);
$finalXmlAsString = $xml->saveXML();
?>
Sometimes UTF-8 can make problems, then try this:
<?php
$xml = new DOMDocument();
$xml->loadHTML(mb_convert_encoding($yourOriginalXML, 'HTML-ENTITIES', 'UTF-8'));
$newNode = DOMDocument::createElement(mb_convert_encoding($someXMLtoPrepend, 'HTML-ENTITIES', 'UTF-8'));
$nodeRoot = $xml->getElementsByTagName('root')->item(0);
$nodeOriginal = $xml->getElementsByTagName('people')->item(0);
$nodeRoot->insertBefore($newNode,$nodeOriginal);
$finalXmlAsString = $xml->saveXML();
?>

PHP parsing xml xpath

XML
<person>
<description>
<p>blah blah blah</p>
<p>kjdsfksdjf</p>
</description>
</person>
<person>
<description>
k kjsdf kk sak kfsdjk sadk
</description>
</person>
I'd like to parse the description so that it returns the html tags that are inside.
I've tried both of these, without success
$description = ereg_replace('<description>|</description>','',$person->description->asXML());
$description = $person->description;
Any suggestions?
EDIT
What I'm trying to accomplish is to import an xml file into a mysql db. Everything is working accept what is mentioned above... the paragraph tags inside the description aren't showing up... and they need to be there. The mysql field "description" is set as a text field. If I was to parse the xml to output in the browser then $description = ereg_replace('<description>|</description>','',$person->description->asXML()); works fine... this isn't true though when I'm trying to import into mysql. Do I need to add something to the mysql INSERT? mysql_query("UPDATE table SET description = '$value' WHERE id = '$id'");
Please familiarize yourself with the SimpleXml API:
$xml = <<< XML
<person>
<description>
<p>blah blah blah</p>
<p>kjdsfksdjf</p>
</description>
</person>
XML;
$person = simplexml_load_string($xml);
foreach ($person->description->children() as $child) {
echo $child->asXml();
}
gives
<p>blah blah blah</p><p>kjdsfksdjf</p>
Note that SimpleXml isnt capable of doing the same for the second description element you show because it has no concept of text nodes, e.g.
$xml = <<< XML
<person>
<description>
k kjsdf kk sak kfsdjk sadk
</description>
</person>
XML;
$person = simplexml_load_string($xml);
foreach ($person->description->children() as $child) {
echo $child->asXml();
}
will return an empty string. If you want a unified API, use DOM:
$xml = <<< XML
<people>
<person>
<description>
<p>blah blah blah</p>
<p>kjdsfksdjf</p>
</description>
</person>
<person>
<description>
k kjsdf kk sak kfsdjk sadk
</description>
</person>
</people>
XML;
$dom = new DOMDocument;
$dom->loadXml($xml);
$xp = new DOMXPath($dom);
foreach ($xp->query('/people/person/description/node()') as $child) {
echo $dom->saveXml($child);
}
will give
<p>blah blah blah</p>
<p>kjdsfksdjf</p>
k kjsdf kk sak kfsdjk sadk
For importing XML into MySql, you can also use http://dev.mysql.com/doc/refman/5.5/en/load-xml.html
I'd like to parse the description so that it returns the html tags that are inside.
In XPath you would select the child nodes of the description elements.
Use:
"//person/description/*"
to get all child nodes (html tags only) or
"//person/description/node()"
to get all child nodes (html tags and text nodes).
For instance, this php code:
<?php
$xml = simplexml_load_file("test.xml");
$result = $xml->xpath("//person/description/*");
print_r($result);
?>
Returns an array of SimpleXMLElements which are children of description. Each item is retrieved with all its descendant nodes.

Categories