Trying to get all URL values from XML.
I have hundreds of entries, all in exactly the same form as entry 16 here:
<?xml version="1.0" encoding="utf-8" ?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<entries>
<entry id="16">
<revision number="1" status="accepted" wordclass="v" nounclasses="" unverified="false"></revision>
<media type="audio" url="http://website.com/file/65.mp3" />
</entry>
<entry id="17">
....
</entry>
</entries>
</root>
I am using this code but cannot get it to work. Why?
$doc = new DOMDocument;
$doc->Load('data.xml');
$xpath = new DOMXPath($doc);
$query = '//root/entries/entry/media';
$entries = $xpath->query($query);
What is the correct query for that? Ideally I would get only the url value.
Your query probably returns the proper elements, but by default it gives you the content of the media tag (which in your case is empty, since the tag is self-closing).
To get the url attribute of the tag, use getAttribute(). Example:
$entries = $xpath->query('//root/entries/entry/media');
foreach($entries as $entry) {
print $entry->getAttribute("url")."<br/>";
}
Or you can query the attribute directly with XPath and read out its value:
$urlAttributes = $xpath->query('//root/entries/entry/media/@url');
foreach ($urlAttributes as $urlAttribute)
{
echo $urlAttribute->value, "<br/>\n";
}
See DOMAttr::$value in the docs:
value
The value of the attribute
I would do that with SimpleXML actually:
$file = 'data.xml';
$xpath = '//root/entries/entry/media/@url';
$xml = simplexml_load_file($file);
$urls = array();
if ($xml) {
$urls = array_map('strval', $xml->xpath($xpath));
}
Which will give you all URLs as strings inside the $urls array. If there was an error loading the XML file, the array is empty.
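For comparison, here is a minimal self-contained sketch of the DOMXPath variant with the @url attribute axis. The inline XML mirrors the structure from the question; the second entry and its URL are made up for illustration.

```php
<?php
// Inline sample in the question's format; entry 17's URL is a made-up example.
$xml = <<<XML
<?xml version="1.0" encoding="utf-8" ?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <entries>
    <entry id="16">
      <media type="audio" url="http://website.com/file/65.mp3" />
    </entry>
    <entry id="17">
      <media type="audio" url="http://website.com/file/66.mp3" />
    </entry>
  </entries>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Select the url attribute nodes themselves and collect their values.
$urls = [];
foreach ($xpath->query('/root/entries/entry/media/@url') as $attr) {
    $urls[] = $attr->value;
}

print_r($urls);
```

Each result here is a DOMAttr node, so ->value (or ->nodeValue) returns the URL string directly.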
I want to delete those entries where the title matches my $titleArray.
My XML file looks like:
<products>
<product>
<title>Battlefield 1</title>
<url>https://www.google.de/</url>
<price>0.80</price>
</product>
<product>
<title>Battlefield 2</title>
<url>https://www.google.de/</url>
<price>180</price>
</product>
</products>
Here is my code, but I don't think it is working. My IDE flags $node->removeChild($product); with "Expected DOMNode, got DOMNodeList".
What is wrong and how can I fix it?
function removeProduct($dom, $productTag, $pathXML, $titleArray){
$doc = simplexml_import_dom($dom);
$items = $doc->xpath($pathXML);
foreach ($items as $item) {
$node = dom_import_simplexml($item);
foreach ($titleArray as $title) {
if (mb_stripos($node->textContent, $title) !== false) {
$product = $node->parentNode->getElementsByTagName($productTag);
$node->removeChild($product);
}
}
}
}
Thank you and Greetings!
Most DOM methods that fetch nodes return a list of nodes, because several element nodes can share the same name. So the result will be a list (an empty list if nothing is found). You can traverse the list and apply logic to each node in it.
There are two problems with the approach. First, removing nodes modifies the document, so you have to be careful not to remove a node that you are still using afterwards; that can lead to unexpected results. Second, DOMElement::getElementsByTagName() returns a node list, and it is a "live" result: if you remove the first node, the list itself changes, not just the XML document.
DOMXpath::evaluate() solves both problems at the same time. The result is not "live", so you can iterate it with foreach() and remove nodes. XPath expressions allow for conditions, so you can filter and fetch specific nodes. Unfortunately XPath 1.0 has no lower-case function, but you can call back into PHP for that.
function isTitleInArray($title) {
$titles = [
'battlefield 2'
];
return in_array(mb_strtolower($title, 'UTF-8'), $titles);
}
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions('isTitleInArray');
$expression = '//product[php:function("isTitleInArray", normalize-space(title))]';
foreach ($xpath->evaluate($expression) as $product) {
$product->parentNode->removeChild($product);
}
echo $document->saveXml();
Output:
<?xml version="1.0"?>
<products>
<product>
<title>Battlefield 1</title>
<url>https://www.google.de/</url>
<price>0.80</price>
</product>
</products>
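The difference between a "live" list and an XPath result described above can be observed directly. This is a small sketch on a made-up three-product document; it only compares the length of both lists before and after one removal.

```php
<?php
// Hypothetical minimal document with three empty product elements.
$document = new DOMDocument();
$document->loadXML('<products><product/><product/><product/></products>');
$xpath = new DOMXpath($document);

$live   = $document->getElementsByTagName('product'); // live list
$static = $xpath->evaluate('//product');              // fixed XPath result

$liveBefore = $live->length;

// Remove the first product from the document.
$first = $live->item(0);
$first->parentNode->removeChild($first);

// The live list follows the document; the XPath result keeps its nodes.
$liveAfter   = $live->length;
$staticAfter = $static->length;

echo "live: $liveBefore -> $liveAfter, static: $staticAfter\n";
```

Because the XPath result does not shrink while you iterate, removing nodes inside a foreach over it does not skip any element.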
I have an XML like below
<entries>
<entry>
<title lang="en">Sample</title>
<entrydate>0</entrydate>
<contents>0</contents>
<entrynum>0</entrynum>
</entry>
<entry>
<title lang="fr">Sample</title>
<entrydate>1</entrydate>
<contents>1</contents>
<entrynum>1</entrynum>
</entry>
</entries>
Is there a way in PHP to delete the parent node (entry) based on the title lang attribute? I need to keep only the en ones, so in this case I would need to get the XML without the second entry node.
I tried looking around but couldn't find any solution...
You need to use the DOMDocument class to parse the string into an XML document. Then use the DOMXpath class to find the target element in the document, and DOMNode::removeChild() to remove it.
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXpath($doc);
// select target entry tag
$entry = $xpath->query("//entry[title[@lang='fr']]")->item(0);
// remove selected element
$entry->parentNode->removeChild($entry);
$xml = $doc->savexml();
You can check result in demo
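The snippet above removes only the first French entry. If the goal is to keep only the en entries, one possible variation is to select every entry whose title has a different lang and remove them all. This is a sketch on a shortened sample; the "de" entry is added for illustration, and entries without a lang attribute would not be matched by this expression.

```php
<?php
// Shortened sample; the "de" entry is a made-up addition.
$xml = <<<XML
<entries>
  <entry><title lang="en">Sample</title></entry>
  <entry><title lang="fr">Sample</title></entry>
  <entry><title lang="de">Sample</title></entry>
</entries>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXpath($doc);

// XPath results are not live, so removing inside the loop is safe.
foreach ($xpath->query("//entry[title/@lang != 'en']") as $entry) {
    $entry->parentNode->removeChild($entry);
}

$remaining = $xpath->query('//entry')->length;
echo $doc->saveXML();
```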
You could also read your file and generate a new one with your modifications:
<?php
$entries = array('title' => "What's For Dinner",
'link' => 'http://menu.example.com/',
'description' => 'Choose what to eat tonight.');
print "<entries>\n";
foreach ($entries as $element => $content) {
print " <$element>";
print htmlentities($content);
print "</$element>\n";
}
print "</entries>";
?>
Use the method described in this answer, i.e.
<?php
$xml = simplexml_load_file('1.xml');
$del_items = [];
foreach ($xml->entry as $e) {
$attr = $e->title->attributes();
if ($attr && $attr['lang'] != 'en') {
$del_items []= $e;
}
}
foreach ($del_items as $e) {
$dom = dom_import_simplexml($e);
$dom->parentNode->removeChild($dom);
}
echo $xml->asXML();
Output
<?xml version="1.0" encoding="UTF-8"?>
<entries>
<entry>
<title lang="en">Sample</title>
<entrydate>0</entrydate>
<contents>0</contents>
<entrynum>0</entrynum>
</entry>
</entries>
The items cannot be removed within the first loop, because that could break the iteration. Instead, we collect the entry objects into the $del_items array, then remove them from the XML in a separate loop.
Consider the following code:
$xml = <<<XML
<root>
<region id='thisRegion'></region>
<region id='thatRegion'></region>
</root>
XML;
$partials['thisRegion'] = "<p>Here's this region</p>";
$partials['thatRegion'] = "<p>Here's that region</p>";
$DOM = new DOMDocument;
$DOM->loadXML($xml);
$regions = $DOM->getElementsByTagname('region');
foreach( $regions as $region )
{
$id = $region->getAttribute('id');
$partial = $DOM->createDocumentFragment();
$partial->appendXML( $partials[$id] );
$region->parentNode->replaceChild($partial, $region);
}
echo $DOM->saveXML();
The output is:
<root>
<p>Here's this region</p>
<region id="thatRegion"/>
</root>
I cannot for the life of me figure out why all of the region tags aren't being replaced. This is a problem in my project, and at first I thought that it wasn't replacing elements I appended after the loadXML, but with some experimenting I haven't been able to narrow down the pattern here.
I would appreciate a code correction to allow me to replace all tags in a DOMDocument with a given Element Node. I also wouldn't mind any input into a more efficient/practical way to execute this if I haven't found it.
Thanks in advance!
[edit] PHP 5.3.13
NodeLists are live.
So when you remove an item from the document, the NodeList is modified as well. Avoid holding a reference to the NodeList, and start replacing at the last item:
$DOM = new DOMDocument;
$DOM->loadXML($xml);
$regionsCount = $DOM->getElementsByTagName('region')->length;
for($i= $regionsCount;$i>0;--$i)
{
$region=$DOM->getElementsByTagName('region')->item($i-1);
$id = $region->getAttribute('id');
$partial = $DOM->createDocumentFragment();
$partial->appendXML( $partials[$id] );
$region->parentNode->replaceChild($partial, $region);
}
echo $DOM->saveXML();
?>
http://codepad.org/gTjYC4hr
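An alternative to counting backwards is to copy the live NodeList into a plain array first and iterate over that copy. This is a sketch using the sample data from the question and iterator_to_array(), not part of the answer above.

```php
<?php
$xml = "<root><region id='thisRegion'></region><region id='thatRegion'></region></root>";
$partials = [
    'thisRegion' => "<p>Here's this region</p>",
    'thatRegion' => "<p>Here's that region</p>",
];

$DOM = new DOMDocument();
$DOM->loadXML($xml);

// The array is a snapshot; replacing nodes no longer affects the iteration.
$regions = iterator_to_array($DOM->getElementsByTagName('region'));
foreach ($regions as $region) {
    $partial = $DOM->createDocumentFragment();
    $partial->appendXML($partials[$region->getAttribute('id')]);
    $region->parentNode->replaceChild($partial, $region);
}

echo $DOM->saveXML();
```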
I have two XML files. I first load one and loop through it; then I need to take an id from the first file, find it in the second one, and echo the results associated with that id. If I were doing this with SQL, I would simply do this:
$query = mysql_query("SELECT * FROM HotelSummary WHERE roomTypeCode = '$id'") or die();
while($row=mysql_fetch_array($query)){
$name = $row['Name'];
}
echo $name;
How can I do this with XML and PHP?
I recommend reading the DOMDocument documentation.
It's quite heavy but also powerful (it's not always clear what happens, but the Internet should always give you a solution).
You can simply walk through your first document to find your id, and then find the matching DOMElement in the second document via XPath.
<?php
$dom = new DOMDocument();
$dom->load('1.xml');
foreach ($dom->getElementsByTagName('article') as $node) {
// your conditions to find out the id
$id = $node->getAttribute('id');
}
$dom = new DOMDocument();
$dom->load('2.xml');
$xpath = new DOMXPath($dom);
$element = $xpath->query("//*[@id='".$id."']")->item(0);
// would echo "top_2" based on my example files
echo $element->getAttribute('name');
Based on following test files:
1.xml
<?xml version="1.0" encoding="UTF-8"?>
<articles>
<article id="foo_1">
<title>abc</title>
</article>
<article id="foo_2">
<title>def</title>
</article>
</articles>
2.xml
<?xml version="1.0" encoding="UTF-8"?>
<tests>
<test id="foo_1" name="top_1">
</test>
<test id="foo_2" name="top_2">
</test>
</tests>
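If every id from the first file should be looked up, rather than only the last one found, the same idea can be sketched as a loop that collects the results. The two test files are inlined as strings here, and the expression assumes the ids contain no quote characters.

```php
<?php
// The two test files from above, inlined as strings.
$first  = '<articles><article id="foo_1"><title>abc</title></article>'
        . '<article id="foo_2"><title>def</title></article></articles>';
$second = '<tests><test id="foo_1" name="top_1"/><test id="foo_2" name="top_2"/></tests>';

$articles = new DOMDocument();
$articles->loadXML($first);

$tests = new DOMDocument();
$tests->loadXML($second);
$xpath = new DOMXPath($tests);

$names = [];
foreach ($articles->getElementsByTagName('article') as $article) {
    $id = $article->getAttribute('id');
    // Assumption: $id contains no quotes, so plain concatenation is safe.
    $match = $xpath->query("//test[@id='" . $id . "']")->item(0);
    if ($match !== null) {
        $names[$id] = $match->getAttribute('name');
    }
}

print_r($names);
```

The null check matters because item(0) returns null when the second file has no element with that id.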
Use SimpleXML to create an object representation of the file. You can then loop through the elements of the Simple XML object.
Depending on the format of the XML file:
Assuming it is:
<xml>
<roomTypeCode>
<stuff>stuff</stuff>
<name>Skunkman</name>
</roomTypeCode>
<roomTypeCode>
<stuff>other stuff</stuff>
<name>Someone Else</name>
</roomTypeCode>
</xml>
It would be something like this:
$xml = simplexml_load_file('xmlfile.xml');
for($i = 0; $i < count($xml->roomTypeCode); $i++)
{
if($xml->roomTypeCode[$i]->stuff == "stuff")
{
$name = $xml->roomTypeCode[$i]->name;
}
}
That loads the XML file, counts the roomTypeCode entries, checks the value of "stuff" in each one and, when it matches, lets you access anything else in that entry.
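The same lookup can also be written with SimpleXML's xpath() method, which moves the condition into the expression, much like a WHERE clause. This is a sketch on the assumed format above, loaded from a string instead of a file.

```php
<?php
// The assumed sample format from above, inlined as a string.
$xml = simplexml_load_string(
    '<xml>'
    . '<roomTypeCode><stuff>stuff</stuff><name>Skunkman</name></roomTypeCode>'
    . '<roomTypeCode><stuff>other stuff</stuff><name>Someone Else</name></roomTypeCode>'
    . '</xml>'
);

// Select the name of every roomTypeCode whose stuff child equals "stuff".
$matches = $xml->xpath("//roomTypeCode[stuff='stuff']/name");

// xpath() returns an array; take the first hit if there is one.
$name = $matches ? (string) $matches[0] : null;

echo $name, "\n";
```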
Consider the following code :
$dom = new DOMDocument();
$dom->loadXML($file);
$xmlPath = new DOMXPath($dom);
$arrNodes = $xmlPath->query('*/item');
foreach($arrNodes as $item){
//missing code
}
The $file is an xml and each item has a title and a description.
How can I display them (title and description)?
$file = "<item>
<title>test_title</title>
<desc>test</desc>
</item>";
I suggest using PHP's SimpleXML. With it you still get XPath functionality, but with a simpler API. For example, you would access attributes like this:
$name = $item['name'];
Here's an example:
xmlfile.xml:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<items>
<item title="Hello World" description="Hellowing the world.." />
<item title="Hello People" description="greeting people.." />
</items>
</xml>
do.php:
<?php
$xml_str = file_get_contents('xmlfile.xml');
$xml = new SimpleXMLElement($xml_str);
$items = $xml->xpath('*/item');
foreach($items as $item) {
echo $item['title'], ': ', $item['description'], "\n";
}
If your item looks like this:
<item>
<title>foo</title>
<description>frob</description>
</item>
You could use getElementsByTagName() and nodeValue:
foreach($arrNodes as $item){
print $item->getElementsByTagName('title')->item(0)->nodeValue;
}
Are title and description attributes? E.g., does an item look like this:
<item title="foo" description="frob" />
If so, you could just use getAttribute():
...
foreach($arrNodes as $item){
print $item->getAttribute('title');
}
The right XPath expression should be:
/*/item/title | /*/item/desc
Or
/*/item/*[self::title or self::desc]
This evaluates to a node set containing the title and desc elements in document order.
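A short sketch of how that expression could be used with DOMXPath. Note the assumption: the item is wrapped in a hypothetical items root element here, because /*/item only matches item elements that are children of the document element.

```php
<?php
// Assumption: item sits inside a root element (a made-up "items" wrapper).
$file = "<items><item><title>test_title</title><desc>test</desc></item></items>";

$dom = new DOMDocument();
$dom->loadXML($file);
$xmlPath = new DOMXPath($dom);

// Collect title and desc in document order, keyed by element name.
$values = [];
foreach ($xmlPath->query('/*/item/*[self::title or self::desc]') as $node) {
    $values[$node->nodeName] = $node->nodeValue;
}

print_r($values);
```

With several items, the keys would overwrite each other, so in that case it is probably easier to loop over /*/item and read the children per item.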