This question already has answers here:
Simple XML - Dealing With Colons In Nodes
(4 answers)
Closed 9 years ago.
I've been trying to use SimpleXML, but it doesn't seem to like XML that looks like this:
<xhtml:div>sample <xhtml:em>italic</xhtml:em> text</xhtml:div>
So what library will handle tags that look like that (have a colon in them)?
Say you have some xml like this.
<xhtml:div>
<xhtml:em>italic</xhtml:em>
<date>2010-02-01 06:00</date>
</xhtml:div>
You can access 'em' like this: $xml->children('xhtml', true)->div->em;
However, if you want the date field, $xml->children('xhtml', true)->div->date; won't work, because you are still stuck in the xhtml namespace.
You must call children() again to get back to the default namespace:
$xml->children('xhtml', true)->div->children()->date;
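Here is a minimal, self-contained sketch of the behaviour described above. The wrapping <root> element and the xmlns:xhtml declaration are assumptions added so that the snippet parses on its own; they are not part of the original example.

```php
<?php
// Assumed wrapper: <root> and the xmlns:xhtml declaration are added for illustration.
$source = <<<XML
<root xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xhtml:div>
    <xhtml:em>italic</xhtml:em>
    <date>2010-02-01 06:00</date>
  </xhtml:div>
</root>
XML;

$xml = simplexml_load_string($source);

// Select the children of <root> that use the "xhtml" prefix.
$div = $xml->children('xhtml', true)->div;

// <xhtml:em> is in the same namespace, so plain property access finds it.
$em = (string) $div->em;

// <date> has no namespace, so it is not found while the xhtml namespace is selected...
$stuck = (string) $div->date;

// ...and is found again once children() switches back to un-namespaced elements.
$date = (string) $div->children()->date;

echo $em, "\n", $date, "\n";
```

The namespace selection carries over from one property access to the next, which is why the second children() call is needed.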
If you want to fix it quickly do this (I do when I feel lazy):
// Replaces : in tag and attribute names with _, allowing easy access
$xml = preg_replace('~(</?|\s)([a-z0-9_]+):~is', '$1$2_', $xml);
This will convert <xhtml: to <xhtml_ and </xhtml: to </xhtml_.
It's kind of hacky, and it can fail when CDATA blocks containing namespaced XML are involved or when tag names use Unicode characters, but I'd say you are usually safe using it (it hasn't failed me yet).
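To see what the replacement does in practice, here is a small sketch run against the sample from the previous answer. The xmlns:xhtml declaration is an assumption added so that the input is well-formed. Note that the pattern also matches ordinary text that looks like "whitespace, word, colon", so the time of day inside <date> gets rewritten as well.

```php
<?php
// Assumed sample: the namespace declaration is added so the input is well-formed.
$xml = '<xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml">'
     . '<xhtml:em>italic</xhtml:em>'
     . '<date>2010-02-01 06:00</date>'
     . '</xhtml:div>';

// The quick fix quoted above.
$hacked = preg_replace('~(</?|\s)([a-z0-9_]+):~is', '$1$2_', $xml);

$doc = simplexml_load_string($hacked);

// The root is now <xhtml_div>, and <xhtml_em> is an ordinary child.
$rootName = $doc->getName();
$em = (string) $doc->xhtml_em;

// The text content changed too, because " 06:" matches (\s)([a-z0-9_]+):
$date = (string) $doc->date;

echo $rootName, "\n", $em, "\n", $date, "\n";
```

If the data can contain times, URLs after a space, or similar "word:" text, the namespace-aware children() approach avoids this side effect.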
Colon denotes an XML namespace. The DOM has good support for namespaces.
I don't think it's a good idea to get rid of the colon or to replace it with something else as some people suggested. You can easily access elements that have a namespace prefix. You can either pass the URL that identifies the namespace as an argument to the children() method or pass the namespace prefix and "true" to the children() method. The second approach requires PHP 5.2 and up.
SimpleXMLElement::children
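The difference between the two approaches shows up when a document binds the same namespace to a different prefix. The sketch below assumes a hypothetical document that uses the prefix "h" for the XHTML namespace; the <root> wrapper is also made up for illustration.

```php
<?php
// Assumed document: it binds the XHTML namespace to the prefix "h" instead of "xhtml".
$source = '<root xmlns:h="http://www.w3.org/1999/xhtml">'
        . '<h:div>sample <h:em>italic</h:em> text</h:div>'
        . '</root>';

$nsXhtml = 'http://www.w3.org/1999/xhtml';
$xml = simplexml_load_string($source);

// Lookup by namespace URI works whatever prefix the document uses.
$byUri = $xml->children($nsXhtml);
$em = (string) $byUri->div->em;

// Lookup by the prefix "xhtml" finds nothing, because this document uses "h".
$byPrefix = $xml->children('xhtml', true);

echo count($byUri), ' ', count($byPrefix), ' ', $em, "\n";
```

Passing the URI is a little more typing, but it keeps working if the producer of the file renames its prefixes.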
I'm using the following PHP code to read XML data from NOAA's tide reporting station API:
$rawxml = file_get_contents(
"http://opendap.co-ops.nos.noaa.gov/axis/webservices/activestations/"
."response.jsp?v=2&format=xml&Submit=Submit"
);
$rawxml = utf8_encode($rawxml);
$ob = simplexml_load_string($rawxml);
var_dump($ob);
Unfortunately, I end up with it displaying this:
object(SimpleXMLElement)#246 (0) { }
It looks to me like the XML is perfectly well-formed - why won't this parse? From looking at another question (Simplexml_load_string() fail to parse error) I got the idea that the header might be the problem - the http call does indeed return a charset value of "ISO-8859-1". But adding in the utf8_encode() call doesn't seem to do the trick.
What's especially confusing is that simplexml_load_string() doesn't actually fail - it returns a cheerful XML array, just with nothing in it!
You've been fooled (and had me fooled) by the oldest trick in the SimpleXML book: SimpleXML doesn't parse the whole document into a PHP object, it presents a PHP API to an internal structure. Functions like var_dump can't see this structure, so don't always give a useful idea of what's in the object.
The reason it looks "empty" is that it is listing the children of the root element which are in the default namespace - but there aren't any, they're all in the "soapenv:" namespace.
To access namespaced elements, you need to use the children() method, passing in the full namespace name (recommended) or its local prefix (simpler, but could be broken by changes in the way the file is generated at the other end). To switch back to the "default namespace", use ->children(null).
So you could get the ID attribute of the first stationV2 element like this (live demo):
// Define constant for the namespace names, rather than relying on the prefix the remote service uses remaining stable
define('NS_SOAP', 'http://schemas.xmlsoap.org/soap/envelope/');
// Download the XML
$rawxml = file_get_contents("http://opendap.co-ops.nos.noaa.gov/axis/webservices/activestations/response.jsp?v=2&format=xml&Submit=Submit");
// Parse it
$ob = simplexml_load_string($rawxml);
// Use it!
echo $ob->children(NS_SOAP)->Body->children(null)->ActiveStationsV2->stationsV2->stationV2[0]['ID'];
I've written some debugging functions to use with SimpleXML which should be much less misleading than var_dump etc. Here's a live demo with your code and simplexml_dump.
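The same effect can be reproduced offline. The sketch below uses a made-up, trimmed-down envelope in place of the real NOAA response (the station ID "demo-1" is invented), and uses count() rather than var_dump to show that the root looks empty only while no namespace is selected.

```php
<?php
// Assumed stand-in for the NOAA response; element names follow the answer above,
// the ID value is invented.
$rawxml = <<<XML
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <ActiveStationsV2>
      <stationsV2>
        <stationV2 ID="demo-1"/>
      </stationsV2>
    </ActiveStationsV2>
  </soapenv:Body>
</soapenv:Envelope>
XML;

$nsSoap = 'http://schemas.xmlsoap.org/soap/envelope/';
$ob = simplexml_load_string($rawxml);

// No namespace selected: the root has no un-namespaced children, so it looks empty.
$plainCount = count($ob->children());

// SOAP namespace selected: <soapenv:Body> shows up.
$soapCount = count($ob->children($nsSoap));

// Walk into Body, then switch back to un-namespaced elements for the payload.
$id = (string) $ob->children($nsSoap)->Body->children(null)
    ->ActiveStationsV2->stationsV2->stationV2[0]['ID'];

echo $plainCount, ' ', $soapCount, ' ', $id, "\n";
```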
I have two lines of XML data that carry attributes but also contain data inside them, and they are repeating fields. They are being stored in a SimpleXML variable.
<inputField Type="Name">John Doe</inputField>
<inputField Type="DateOfHire">Tomorrow</inputField>
(Clearly this isn't real data, but the syntax is what is actually in my data; I'm just using string data in the fields.)
Everything that I've seen says to access the data like this, which I have tried, and it worked perfectly. But my data is dynamic, so the data isn't always going to be in the same place, and this doesn't fit my needs.
$xmlFile->inputField[0];
$xmlFile->inputField[1];
This works fine until one of the lines is missing, and I can have anywhere from 0 to 5 lines. So what I was wondering was is there any way that I can access the data by attribute name? So potentially like this.
$xmlFile->inputField['Name'];
or
$xmlFile->inputField->Name;
I use these as examples strictly to illustrate what I'm trying to do, I am aware that neither of the above lines of code are syntactically correct.
Just a note this information is being generated externally so I cannot change the format.
If anyone needs clarification feel free to let me know and would be happy to elaborate.
Maybe like this?
echo $xmlFile->inputField->attributes()->Type;
And what are you using? DOMDocument or SimpleXML?
You don't say, but I assume you're using SimpleXMLElement?
If you want to access every item, just iterate:
foreach ($xmlFile->inputField as $inputField) { ... }
If you want to access an attribute use array notation:
$inputField['Type']
If you want to access only one specific element, use xpath:
$xmlFile->xpath('inputField[@Type="Name"]');
Perhaps you should read through the basic examples of usage in the SimpleXMLElement documentation?
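Putting those pieces together, a minimal sketch might look like the following. The <form> root element and the "Salary" lookup are hypothetical; only the two inputField lines come from the question.

```php
<?php
// Assumed wrapper: the <form> root is hypothetical.
$source = <<<XML
<form>
  <inputField Type="Name">John Doe</inputField>
  <inputField Type="DateOfHire">Tomorrow</inputField>
</form>
XML;

$xmlFile = simplexml_load_string($source);

// Option 1: build a Type => value map once, then look fields up by name.
$fields = [];
foreach ($xmlFile->inputField as $inputField) {
    $fields[(string) $inputField['Type']] = (string) $inputField;
}

// A missing field is simply absent from the map instead of shifting the indexes.
$name = $fields['Name'] ?? null;
$missing = $fields['Salary'] ?? null; // hypothetical field that is not present

// Option 2: ask XPath for one specific field; xpath() returns an array of matches.
$matches = $xmlFile->xpath('inputField[@Type="DateOfHire"]');
$dateOfHire = $matches ? (string) $matches[0] : null;

echo $name, "\n", $dateOfHire, "\n";
```

The map is convenient when several fields are needed; the XPath query is enough for a single lookup.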
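For example, you can grab the data like this:
$xmlFile = simplexml_load_file($file);
foreach ($xmlFile->inputField as $res) {
    // "Name" is a value of the Type attribute, not an attribute name
    if ((string) $res['Type'] === 'Name') {
        echo $res;
    }
}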
Working with PHP DOM - HTML manipulation.
I've got 2 questions:
1. I recently read that there is a better way to output special HTML characters (e.g. ©): the DOMDocument::createEntityReference() method. The main advantage is that you don't need to use htmlentities; it will be escaped automatically.
For example: $copyright_symbol = $document->createEntityReference("copy");
Now, the problem is: where can I find a reference for the characters' codes? In my case I need the PHP equivalent of &times; (the × symbol).
2. What if I want to set multiple classes on an element? Can I do it like this: $el->setAttribute('class', 'class1 class2 ...')?
Here you can see character codes as well as friendly names. For your ×, you will use "times".
And for the second question: yes, you can do it like that.
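Both answers can be tried in one short sketch. The <p> element and the "3 × 4" text are made up for illustration.

```php
<?php
$document = new DOMDocument('1.0', 'UTF-8');
$el = $document->createElement('p'); // hypothetical element for the demo
$document->appendChild($el);

// Several classes go into one space-separated attribute value.
$el->setAttribute('class', 'class1 class2');

// Build "3 x 4" with an entity reference node for the multiplication sign.
$el->appendChild($document->createTextNode('3 '));
$el->appendChild($document->createEntityReference('times'));
$el->appendChild($document->createTextNode(' 4'));

// Serialise just this element as HTML.
$html = $document->saveHTML($el);
echo $html, "\n";
```

createEntityReference() takes the entity name without the leading "&" and trailing ";".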
I'm a somewhat experienced PHP scripter, however I just dove into parsing XML and all that good stuff.
I just can't seem to wrap my head around why one would use a separate XML parser instead of just using the explode function, which seems to be just as simple. Here's what I've been doing (assuming there is a valid XML file at the path xml.php):
$contents = file_get_contents("xml.php");
$array1 = explode("<a_tag>", $contents);
$array2 = explode("</a_tag>", $array1[1]);
$data = $array2[0];
So my question is, what is the practical use for an XML parser if you can just separate the values into arrays and extract the data from that point?
Thanks in advance! :)
Excuse me for not going into details but for starters try parsing
$contents = '<a xmlns="urn:something">
<a_tag>
<b>..</b>
<related>
<a_tag>...</a_tag>
</related>
</a_tag>
<foo:a_tag xmlns:foo="urn:something">
<![CDATA[This is another <a_tag> element]]>
</foo:a_tag>
</a>';
with your explode-approach. When you're done we can continue with some trickier things ;-)
In a nutshell, it's consistency. Before XML came into wide use, there were numerous undocumented formats for keeping information in files. One of the motivators behind XML was to create a well-defined, standard document format. With this format in place, a general set of parsing tools could be developed that would work consistently on documents, so long as the documents adhered to that format.
In some specific cases, your example code will work. However, if the document changes
...
<!-- adding an attribute -->
<a_tag foo="bar">Contents of the Tag</a_tag>
...
...
<!-- adding a comment to the contents -->
<a_tag>Contents <!-- foobar --> of the Tag</a_tag>
...
Your parsing code will probably break. Code written using a correctly defined XML parser will not.
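A small sketch of the two changes above, comparing explode() with SimpleXML. The <root> wrapper is an assumption added so that each sample is a complete document.

```php
<?php
// Assumed sample documents; the <root> wrapper is added for illustration.
$withAttribute = '<root><a_tag foo="bar">Contents of the Tag</a_tag></root>';
$withComment   = '<root><a_tag>Contents <!-- foobar --> of the Tag</a_tag></root>';

// explode() looks for the literal string "<a_tag>", which no longer occurs
// once the tag carries an attribute, so there is nothing at index 1.
$pieces = explode('<a_tag>', $withAttribute);
$explodeFound = count($pieces) > 1;

// A parser reads the element regardless of its attributes...
$fromAttribute = (string) simplexml_load_string($withAttribute)->a_tag;

// ...and does not treat the comment as part of the element's text.
$fromComment = (string) simplexml_load_string($withComment)->a_tag;

var_dump($explodeFound);
echo $fromAttribute, "\n", $fromComment, "\n";
```

With explode(), the comment text would end up inside the extracted data and the attribute case would return nothing at all.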
XML parsers:
Handle encoding
May have xpath support
Allow you to easily modify and save the XML; append/remove child nodes, add/remove attributes, etc.
Don't need to load the whole file into memory (except for DOM parsers)
Know about namespaces
...
How would you explode the same file if a_tag had an attribute?
explode("<a_tag>" ... will work differently than explode("<a_tag attr='value'>" ..., after all.
XML Parsers understand the XML specification. Explode can only handle the simplest of cases, and will most likely fail in a lot of instances of that case.
Using a proven XML parsing method will make the code more maintainable and easier to read. It will also make it more easily adaptable should the schema change, and it can make it easier to determine error conditions. XPath and XSLT exist for a reason: they are proven ways to deal with XML data in a sensible, legible manner. I'd suggest you use whichever is applicable in your given situation. Remember, even if you think you're only writing code for one specific purpose, you never know what a piece of well-written code could evolve into.