PHP SimpleXML::addChild with empty string - redundant node - php

Calling addChild with an empty string as the value (or even with whitespace) seems to cause a redundant SimpleXml node to be added inside the node instead of adding just the node with no value.
Here's a quick demo of what happens:
[description] => !4jh5jh1uio4jh5ij14j34io5j!
And here's with an empty string:
[description] => SimpleXMLElement Object ( [0] => )
The workaround I'm using at the moment is pretty horrible - I'm doing a str_replace on the final JSON to replace !4jh5jh1uio4jh5ij14j34io5j! with an empty string. Yuck. Perhaps the only answer at this point is 'submit a bug report to simplexml'...
Does anyone have a better solution?

I think I figured out what is going on. Given code like this:
$xml = new SimpleXMLElement('<xml></xml>');
$xml->addChild('node','value');
print_r($xml);
$xml = new SimpleXMLElement('<xml></xml>');
$xml->addChild('node','');
print_r($xml);
$xml = new SimpleXMLElement('<xml></xml>');
$xml->addChild('node');
print_r($xml);
The output is this:
SimpleXMLElement Object
(
[node] => value
)
SimpleXMLElement Object
(
[node] => SimpleXMLElement Object
(
[0] =>
)
)
SimpleXMLElement Object
(
[node] => SimpleXMLElement Object
(
)
)
So, if you don't know whether the second argument is going to be an empty string, you can avoid the extra empty element from case #2 by doing something like this:
$mystery_string = '';
$xml = new SimpleXMLElement('<xml></xml>');
if (preg_match('#\S#', $mystery_string)) { // Checks for a non-whitespace character
    $xml->addChild('node', $mystery_string);
} else {
    $xml->addChild('node');
}
print_r($xml);
echo "\nOr in JSON:\n";
echo json_encode($xml);
To output:
SimpleXMLElement Object
(
[node] => SimpleXMLElement Object
(
)
)
Or in JSON:
{"node":{}}
Is that what you want?
Personally, I never use SimpleXML, and not only because of this sort of weird behavior -- it is still under major development and in PHP5 is missing like 2/3 of the methods you need to do DOM manipulation (like deleteChild, replaceChild etc).
I use DOMDocument (which is standardized, fast and feature-complete, since it's an interface to libxml2).
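For comparison, here is a minimal sketch of the same "maybe empty" case with DOMDocument. The element names mirror the SimpleXML example above, and using trim() for the whitespace check is my own choice, not something from the original code:

```php
<?php
// Build <xml><node/></xml> with DOMDocument instead of SimpleXML.
$doc  = new DOMDocument('1.0', 'UTF-8');
$root = $doc->createElement('xml');
$doc->appendChild($root);

$mystery_string = '';                 // value that may or may not be empty
$node = $doc->createElement('node');
if (trim($mystery_string) !== '') {
    // Only attach a text node when there is real content.
    $node->appendChild($doc->createTextNode($mystery_string));
}
$root->appendChild($node);

echo $doc->saveXML();
```

Because the text node is only created when there is something to put in it, the empty case ends up as a plain element with no children.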

With SimpleXML, what you get if you use print_r(), var_dump(), serialize(), or similar does not correspond to what is stored internally in the object. It is a 'magical' object which overloads the way PHP iterates its contents.
You get the true representation of the element with asXML() only.
When something like print_r() iterates over a SimpleXML element, or you access its properties using the -> operator, you get a munged version of the object. This munged version allows you to do things like "echo $xml->surname" or $xml->names[1] as if it really had these as properties, but it is separate from the true XML contained within. In the munged representation, elements are not necessarily in order, and elements whose names are not valid PHP identifiers (like "first-name") can't be written as plain properties, but can be accessed with code like $xml->{'first-name'} (array syntax such as $xml["var"] reads an attribute, not a child element). Where multiple sibling elements have the same name, they are presented like arrays. I guess an empty string is also presented like an array for some reason. However, when output using asXML() you get the real representation.
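To check what is really in the document rather than what print_r() shows, something like the following sketch should do. It only uses asXML(), a string cast and count(), with the same element names as the question:

```php
<?php
$xml = new SimpleXMLElement('<xml></xml>');
$xml->addChild('node', '');

// The serialized document, independent of how print_r() renders the object.
echo $xml->asXML();

// Casting reads the element's text content directly.
$text = (string) $xml->node;
var_dump($text === '');

// There is still exactly one child element under the root.
$childCount = count($xml->children());
var_dump($childCount);
```

So even though print_r() shows a nested object for the empty-string case, the cast gives you back an empty string and there is only the one node element.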

Maybe I'm not understanding the question right, but it seems to me that when you use the addChild method, you're required to pass a string as the name of the node regardless of what content is in the node. The value (second argument) is optional and can be left out to add an empty node.
Let me know if that helps.

I've created an XML library which extends the SimpleXML object to include all of the functionality that is present in DOMDocument but has no interface in SimpleXML (the two interact with the same underlying libxml2 object, by reference). It also has niceties such as AsArray() or AsJson() to output your object in one of those formats.
I've just updated the library to work as you expect when outputting JSON. You can do the following:
$xml = new bXml('<xml></xml>');
$xml->addChild('node', '');
$json_w_root = $xml->asJson(); // is { 'xml': {'node':'' } }
$json = $xml->children()->asJson(); // is { 'node' : '' } as expected.
The library is hosted on google code at http://code.google.com/p/blibrary/

Related

How to get ALL elements of simplexml object

OK, I'm totally stumped here. I've found similar questions, but the answers don't seem to work for my specific problem. I've been working on this on and off for days.
I have this here simplexml object (it's actually much, much, MUCH longer than this, but I'm cutting out all the extraneous stuff so you'll actually look at it):
SimpleXMLElement Object
(
[SubjectClassification] => Array
(
[0] => SimpleXMLElement Object
(
[#attributes] => Array
(
[Authority] => Category Code
[Value] => s
[Id] => s
)
)
[1] => SimpleXMLElement Object
(
[#attributes] => Array
(
[Authority] => Subject
[Value] => Sports
[Id] => 54df6c687df7100483dedf092526b43e
)
)
[2] => SimpleXMLElement Object
(
[#attributes] => Array
(
[Authority] => Subject
[Value] => Professional baseball
[Id] => 20dd2c287e4e100488e5d0913b2d075c
)
)
)
)
I got this block of code by doing a print_r on a variable containing the following:
$subjects->SubjectClassification->children();
Now, I want to get at all the elements of the subjectClassification array. ALL of them! But when I do this:
$subjects->SubjectClassification;
Or this:
$subjects->SubjectClassification->children();
OR if I try to get all the array elements via a loop, all I get is this:
SimpleXMLElement Object
(
[#attributes] => Array
(
[Authority] => Category Code
[Value] => s
[Id] => s
)
)
Why? How can I get everything?
You can use XPath to do this. I find it the easiest and most efficient way, and it cuts down on the for loops needed to pick out items.
If your XML is like this:
<Subjects>
<SubjectClassification>
</SubjectClassification>
<SubjectClassification>
</SubjectClassification>
<SubjectClassification>
</SubjectClassification>
</Subjects>
Then to get all subject classifications in an array you can do the following:
$subject_classifications = $xml->xpath("//SubjectClassification");
The xml variable refers to your main simplexml object i.e. the file you loaded using simplexml.
Then you can just iterate through the array using a foreach loop like this:
foreach($subject_classifications as $subject_classification){
echo (string) $subject_classification['Authority']; // attributes are read with array syntax
echo (string) $subject_classification['Value'];
echo (string) $subject_classification['Id'];
}
Your structure may vary but you get the idea anyway. There is also a good article from IBM on this, "Using Xpath With PHP".
Because of the extent to which SimpleXML overloads PHP syntax, relying on print_r to figure out what's in a SimpleXML object, or what you can do with it, is not always helpful. (I've written a couple of debugging functions intended to be more comprehensive.) Ultimately, the reference should be to the XML structure itself, and knowledge of how SimpleXML works.
In this case, it looks from the output you provide that what you have is a list of elements all called SubjectClassification, and all siblings to each other. So you don't want to call $subjects->SubjectClassification->children(), because those nodes have no children.
Without a better idea of the underlying XML structure, it's hard to say more, so I'll save this incomplete answer for now.
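If the elements really are siblings carrying only attributes, a sketch like this should walk all of them. The XML here is reconstructed from the print_r output in the question, so treat the root name Subjects and the exact layout as assumptions:

```php
<?php
// Sample XML modelled on the print_r output in the question (assumed layout).
$subjects = new SimpleXMLElement(
    '<Subjects>
        <SubjectClassification Authority="Category Code" Value="s" Id="s"/>
        <SubjectClassification Authority="Subject" Value="Sports" Id="54df6c687df7100483dedf092526b43e"/>
        <SubjectClassification Authority="Subject" Value="Professional baseball" Id="20dd2c287e4e100488e5d0913b2d075c"/>
    </Subjects>'
);

$values = [];
// Iterating the property visits every sibling with that name, not just the first.
foreach ($subjects->SubjectClassification as $classification) {
    // Attributes are read with array syntax and cast to string.
    $values[] = (string) $classification['Value'];
}
print_r($values);
```

Note there is no children() call anywhere: the loop runs over the same-named siblings themselves.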
For all descendants (that is, children, grandchildren, great-grandchildren, and so on) of the <SubjectClassification> elements ("all the elements [...] ALL of them!" as you put it), you can use XPath, which supports such more advanced queries (at least I assume that is what you're looking for; your question gives no detailed description or example of what you mean by "all" specifically).
With SimpleXML, XPath can only return elements and attributes, but as you need elements only, this is no showstopper:
$allOfThem = $subjects->xpath('./SubjectClassification//*');
The key point here is the Xpath expression:
./SubjectClassification//*
Because of the dot . at the beginning, it is relative to the context node, which is $subjects in your case. It then looks for all elements descending from the direct child elements named SubjectClassification. This works via // (unspecified depth) and * (any element; the star acts as a wildcard).
So hopefully this answers your question months after. I just stumbled over it by cleaning up some XML questions and perhaps this is useful for future reference as well.
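A small sketch of that expression in action. The nested element names (Name, Short, Code) are made up for illustration, since the question does not show any children:

```php
<?php
// Hypothetical nested structure; the real feed's child elements are not shown in the question.
$subjects = new SimpleXMLElement(
    '<Subjects>
        <SubjectClassification><Name><Short>s</Short></Name></SubjectClassification>
        <SubjectClassification><Code>54</Code></SubjectClassification>
    </Subjects>'
);

// All element descendants of every direct SubjectClassification child.
$allOfThem = $subjects->xpath('./SubjectClassification//*');

$names = [];
foreach ($allOfThem as $element) {
    $names[] = $element->getName();
}
print_r($names);
```

The SubjectClassification elements themselves are not in the result, only what sits below them.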
I have added this second answer in case what's actually throwing you is retrieving the attributes array as opposed to the nodes. This is how you could print out the attributes for each SubjectClassification in your main $xml object.
foreach ($xml->SubjectClassification as $classification) {
    foreach ($classification->attributes() as $key => $value) {
        echo $key . " : " . $value . "\n";
    }
}
I've found that count returns the proper number of elements, and you can then use a standard for loop to iterate over them:
$n = count($subjects->SubjectClassification);
for ($i = 0; $i < $n; $i++) {
var_dump($subjects->SubjectClassification[$i]);
}
I'm not sure why the foreach loop doesn't work, nor why dumping $subjects->SubjectClassification directly only shows the first node, but for any who stumble across this ancient question as I have, the above is one way to find more information without resorting to external libraries.

Getting the first XML element with SimpleXML

ok, this might be a stupid question, but how do I get one single element from an XML document?
I have this XML
$element = $response['linkedin'];
SimpleXMLElement Object
(
[id] => 575677478478
[first-name] => John
[last-name] => Doe
[email-address] => john@doe.com
[picture-url] => http://m3.licdn.com/mpr/mprx/123
[headline] => Headline goes here
[industry] => Internet
[num-connections] => 71
I just want to assign first-name as $firstName
I can loop over it using xPath, but that just seems like overkill.
ex:
$fName = $element->xpath('first-name');
foreach ($fName as $name)
{
$firstName = $name;
}
If you access a list of (one or more) element nodes in SimpleXML as a single element, it will return the first element. That is by default (and outlined as well in the SimpleXML Basic Usage):
$first = $element->{'first-name'};
If there is more than one element, you can specify which one you mean by its zero-based index, either in square (array-access) or curly brackets. The curly form only works on older PHP versions; it was deprecated in PHP 7.4 and removed in PHP 8.0:
$first = $element->{'first-name'}[0];
$first = $element->{'first-name'}{0}; # PHP < 8.0 only
This also allows you to create a so called SimpleXML self-reference to access the element itself, e.g. to remove it:
unset($first[0]); # removes the element node from the document.
unset($first); # unsets the variable $first
You might think your Xpath would be overkill. But it's not that expensive in SimpleXML. Sometimes the only way to access an element is with Xpath even. Therefore it might be useful for you to know that you can easily access the first element as well per an xpath. For example the parent element in SimpleXML:
list($parent) = $element->xpath('..'); # PHP < 5.4
$parent = $element->xpath('..')[0]; # PHP >= 5.4
As you can see, it is worth actually understanding how things work to make more use of SimpleXML. If you already know everything from the SimpleXML Basic Usage page, you might want to learn a bit more with the following:
SimpleXML Type Cheatsheet
How to tell apart SimpleXML objects representing element and attribute?
SimpleXMLElement implements JsonSerializable
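Putting the above together for the case in the question, a sketch might look like this. The person root element and the trimmed-down field list are assumptions, since the real response is only shown as print_r output:

```php
<?php
// Hypothetical fragment shaped like the LinkedIn response in the question.
$element = new SimpleXMLElement(
    '<person><id>575677478478</id><first-name>John</first-name><last-name>Doe</last-name></person>'
);

// Curly-brace property syntax handles the hyphen; the cast yields a plain string.
$firstName = (string) $element->{'first-name'};
echo $firstName, "\n";

// xpath() returns an array, so take index 0 for the single match.
$matches = $element->{'first-name'}->xpath('..');
$parent  = $matches[0];
echo $parent->getName(), "\n";
```

The cast matters if you want a string in $firstName rather than a SimpleXMLElement that merely prints like one.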
Answer form per request. ^^
If that SimpleXMLElement is the only one contained within $response['linkedin'], you can change it with:
$response['linkedin']->{'first-name'} = $name;
That allows you direct access to the element without needing to do an xpath on it. ^^
You can use XPath to find the first instance of a matching element.
/root/firstname[1] would give you the first instance of firstname in your document.
$res = $response['linkedin']->xpath('first-name[1]'); // relative to the current element; xpath() returns an array

How do you use a # symbol in the name of a PHP object

I've got an XML file that has the label #attributes for one of the names
SimpleXMLElement Object
(
[#attributes] => Array
(
[PART_NUMBER] => ABC123
I want to make a reference to this object like $product->#attributes['part_number'] but of course the # symbol causes an error.
So how do I reference this item in the object?
Well, in the case of SimpleXML, you'd call the $product->attributes() method as defined in the manual. That gives you a SimpleXMLElement you can iterate over to get attribute names and values.
$obj:
stdClass Object
(
[#id] => Hello
[$] => World!
)
To access #id and $:
echo $obj->{'#id'};
echo ' ';
echo $obj->{'$'};
$product[0]['PART_NUMBER'] should work.
If you got more than one attribute, you should use $product->attributes() in a foreach
attributes in SimpleXML manual
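Both approaches from the answers above, as a rough sketch. The product element and the COLOR attribute are invented for the example; only PART_NUMBER comes from the question:

```php
<?php
// Hypothetical product element; only PART_NUMBER appears in the question.
$product = new SimpleXMLElement('<product PART_NUMBER="ABC123" COLOR="red"/>');

// Single attribute: array syntax plus a string cast. Names are case-sensitive.
$partNumber = (string) $product['PART_NUMBER'];
echo $partNumber, "\n";

// Several attributes: iterate attributes() and cast each value.
$attributes = [];
foreach ($product->attributes() as $name => $value) {
    $attributes[$name] = (string) $value;
}
print_r($attributes);
```

Either way you never have to touch the attributes key that print_r() displays; it is just how the dump labels the attribute list.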
If you're using SimpleXML objects, they already have an attributes method built in (without the # sign). Use it like this: $product->attributes()->PART_NUMBER (note that attribute names are case-sensitive).
If you're trying to create your own objects to map to the XML, then as you already found out, you can't use the # symbol in a PHP variable name (nor any other symbol except underscore).
I'd suggest simply using $product->attributes['part_number'] (ie without the # symbol at all) and mapping it inside your class.
If you really need to map it into your variable names, the best you can really hope for would be some kind of replacement string that you can swap in and out as you convert between the two formats.
eg: $product->at__attributes['part_number']
But that's not really a particularly good solution, IMHO.

misleading usage of square bracket in SimpleXMLElement object

I found the [] operator is sometimes confusing when it is used against a SimpleXMLElement object.
$level_a = $xml->children();
$level_a['name']; # this returns the 'name' attribute of level_a (SimpleXmlElement object)
$level_a[0]; # this returns $level_a itself!
$level_a[1]; # this returns the second SimpleXmlElement object under root node. (Same level as level_a)
I can't find any documents about the numeric indexing usage of SimpleXmlElement class. Can anybody explain how those two worked?
Note that it seems this [num] operator of SimpleXmlElement just mimics the behavior of Array. I feel that this has nothing to do with Array itself, but with the implementation of the SimpleXmlElement class.
I don't believe anything magical is going on here. An array in PHP can be keyed by an integer, and may be keyed by a string as well. So the $xml->children() line is likely making an array of key-value attribute pairs in the form
foreach (attrs($element) as $attribute_name => $attribute_value)
$array[$attribute_name] = $attribute_value;
$array[0] = $element;
// etc.
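As far as I can tell, the rule is: a string key reads an attribute, an integer key picks the Nth element from a list of siblings. Here is a sketch with made-up element and attribute names (root, item, name):

```php
<?php
$xml = new SimpleXMLElement('<root><item name="first"/><item name="second"/></root>');

$items = $xml->item;                       // list of same-named sibling elements

// String key: attribute of the first element in the list.
$nameOfFirst = (string) $items['name'];

// Integer key: pick an element by position, then read its attribute.
$first  = (string) $items[0]['name'];
$second = (string) $items[1]['name'];

echo $nameOfFirst, ' ', $first, ' ', $second, "\n";
```

That would explain why index 0 looks like it returns the object itself: the list, when used as a single element, already stands for its first entry.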

getting items from a SimpleXMLElement object

From what I can tell, a SimpleXMLElement is just an array of other SimpleXMLElements, plus some regular array values if there wasn't a tag nested in another tag.
I have a SimpleXMLElement in a variable $data, and var_dump($data) gives me this:
object(SimpleXMLElement)#1 (33) {
["buyer-accepts-marketing"]=>
string(4) "true"
...
...
but calling var_dump($data->buyer-accepts-marketing) gives me an error, and var_dump($data["buyer-accepts-marketing"]) gives me NULL. Calling var_dump($data->shipping-address->children()) gives me an error.
going like this:
foreach($data as $item) {
var_dump($item);
}
gives a whole bunch of SimpleXMLElement objects, but oddly enough, no strings or ints.
What am I missing here? I want to take specific portions of it and pass them to a function, so for example, I don't have to go
$data->billing-address->postal-code;
...
$data->shipping-address->postal-code;
...
and can just go
address($data->billing-address);
address($data->shipping-address);
etc.
SimpleXMLElement is not just an array. To access child elements, you must use object notation ($a->b) and to access attributes you must use array notation ($a['b']).
Problem is, with object notation, valid tag names can be illegal PHP code.
You need to do this:
$data->{'buyer-accepts-marketing'};
Note that this returns a SimpleXMLElement! The reason for this is that it can contain either just text, more child elements, or both. The output of var_dump() is very misleading for SimpleXMLElements. If you want the text content of a single <buyer-accepts-marketing> tag, you have to do this:
(string)$data->{'buyer-accepts-marketing'};
Of course it is also perfectly legal to do this:
(int)$data->{'buyer-accepts-marketing'};
The reason this appears to work in some cases (such as echoing a SimpleXMLElement) is that the type conversions are implicit and automatic. You can't echo an object, so PHP automatically converts it to a string.
I have a love/hate relationship with SimpleXML. It makes things very easy only after you understand how complex the actual API is.
Read up on the so-called "basic" examples to get a good handle on it.
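For the address() idea in the question, a sketch along these lines should work. The order root element, the postal code values and the body of address() are all assumptions made for the example:

```php
<?php
// Hypothetical order document; element names follow the question, values are made up.
$data = new SimpleXMLElement(
    '<order>
        <billing-address><postal-code>12345</postal-code></billing-address>
        <shipping-address><postal-code>67890</postal-code></shipping-address>
    </order>'
);

// Takes one address element and returns its postal code as a plain string.
function address(SimpleXMLElement $address): string
{
    return (string) $address->{'postal-code'};
}

$billing  = address($data->{'billing-address'});
$shipping = address($data->{'shipping-address'});
echo $billing, ' ', $shipping, "\n";
```

The hyphenated names always go through the curly-brace syntax, both when passing the sub-element in and when reading from it inside the function.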
