I'm using SimpleXML to parse an XML feed of property listings from different realtors. The relevant section of the XML feed looks something like this:
<branch name="Trustee Realtors">
<properties>
<property>
<reference>1</reference>
<price>275000</price>
<bedrooms>3</bedrooms>
</property>
<property>
<reference>2</reference>
<price>350000</price>
<bedrooms>4</bedrooms>
</property>
<property>
<reference>3</reference>
<price>128500</price>
<bedrooms>4</bedrooms>
</property>
</properties>
</branch>
<branch name="Quick-E-Realty Inc">
<properties>
<property>
<reference>4</reference>
<price>180995</price>
<bedrooms>3</bedrooms>
</property>
</properties>
</branch>
and is then converted to an array like this:
$xml = file_get_contents($filename);
$xml = simplexml_load_string($xml);
$xml_array = json_decode(json_encode((array) $xml), 1);
$xml_array = array($xml->getName() => $xml_array);
The issue is that a single listing ends up at a different level of the array than multiple listings. If I var_dump() the array for a branch with multiple listings, it looks like this:
array(3) {
[0]=>
array(3) {
["reference"]=>
string(4) "0001"
["price"]=>
string(6) "275000"
["bedrooms"]=>
int(3)
}
[1]=>
array(3) {
["reference"]=>
string(4) "0002"
["price"]=>
string(6) "350000"
["bedrooms"]=>
int(4)
}
[2]=>
array(3) {
["reference"]=>
string(4) "0003"
["price"]=>
string(6) "128500"
["bedrooms"]=>
int(2)
}
}
If I var_dump() the array for the single listing it looks like this:
array(3) {
["reference"]=>
string(4) "0004"
["price"]=>
string(6) "180995"
["bedrooms"]=>
int(3)
}
But what I need it to look like is this:
array(1) {
[0]=>
array(3) {
["reference"]=>
string(4) "0004"
["price"]=>
string(6) "180995"
["bedrooms"]=>
int(3)
}
}
Each of these arrays represents the property listings from a single realtor. I'm not sure whether this is just the way that SimpleXML or the json functions work but what I need is for the same format to be used (the array containing the property listing to be the value of the [0] key).
Thanks in advance!
SimpleXML is quirky like this. I used it recently while trying to make configuration files "easier" to write and found that it doesn't always behave consistently. In this case I think you will benefit from simply detecting whether a <property> is the only one in its set and, if so, wrapping it in an array of its own before sending it to your loop.
NOTE: ['root'] is there because I needed to wrap a '<root></root>' element around your XML to make my test work.
//Rebuild the properties listings
$rebuild = array();
foreach($xml_array['root']['branch'] as $key => $branch) {
$branchName = $branch['@attributes']['name'];
//Check to see if 'properties' is only one, if it
//is then wrap it in an array of its own.
if(is_array($branch['properties']['property']) && !isset($branch['properties']['property'][0])) {
//Only one property found, wrap it in an array
$rebuild[$branchName] = array($branch['properties']['property']);
} else {
//Multiple properties found
$rebuild[$branchName] = $branch['properties']['property'];
}
}
That takes care of rebuilding your properties. It feels a little hackish, but basically you are checking for the absence of a multi-dimensional array here:
if(is_array($branch['properties']['property']) && !isset($branch['properties']['property'][0]))
If you don't find a multi-dimensional array then you explicitly make one of the single <property>. Then to test that everything was rebuilt correctly you can use this code:
//Now do your operation...whatever it is.
foreach($rebuild as $branch => $properties) {
print("Listings for $branch:\n");
foreach($properties as $property) {
print("Reference of " . $property['reference'] . " sells at $" . $property['price'] . " for " . $property['bedrooms'] . " bedrooms.\n");
}
print("\n");
}
This produces the following output:
Listings for Trustee Realtors:
Reference of 1 sells at $275000 for 3 bedrooms.
Reference of 2 sells at $350000 for 4 bedrooms.
Reference of 3 sells at $128500 for 4 bedrooms.
Listings for Quick-E-Realty Inc:
Reference of 4 sells at $180995 for 3 bedrooms.
And a dump of the rebuild will produce:
Array
(
[Trustee Realtors] => Array
(
[0] => Array
(
[reference] => 1
[price] => 275000
[bedrooms] => 3
)
[1] => Array
(
[reference] => 2
[price] => 350000
[bedrooms] => 4
)
[2] => Array
(
[reference] => 3
[price] => 128500
[bedrooms] => 4
)
)
[Quick-E-Realty Inc] => Array
(
[0] => Array
(
[reference] => 4
[price] => 180995
[bedrooms] => 3
)
)
)
I hope that helps you out getting closer to a solution to your problem.
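The same check can be pulled into a small helper so it does not have to be repeated for every element that may occur once or many times. This is only a sketch: the function name as_list and the sample arrays are made up for illustration, and it assumes the data came from the json_encode/json_decode round trip shown in the question.

```php
<?php
// Hypothetical helper (not part of SimpleXML): always return a list of items.
function as_list($node): array
{
    if ($node === null) {
        return []; // element was missing entirely
    }
    // A repeated element decodes to a 0-indexed list; a lone element
    // decodes to an associative array (or a scalar), so wrap that case.
    if (is_array($node) && array_key_exists(0, $node)) {
        return $node;
    }
    return [$node];
}

// Sample data shaped like the decoded arrays from the question.
$single = ['reference' => '4', 'price' => '180995', 'bedrooms' => '3'];
$multiple = [
    ['reference' => '1', 'price' => '275000', 'bedrooms' => '3'],
    ['reference' => '2', 'price' => '350000', 'bedrooms' => '4'],
];

foreach (as_list($single) as $property) {
    print("Reference " . $property['reference'] . "\n");
}
foreach (as_list($multiple) as $property) {
    print("Reference " . $property['reference'] . "\n");
}
```

With a helper like this, the loop body stays the same whether a branch has one listing or several.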
The big massive "think outside the box" question to ask yourself here is: why are you converting the SimpleXML object to an array in the first place?
SimpleXML is not just a library for parsing XML and then using something else to manipulate it, it's designed for exactly the kind of thing you're about to do with that array.
In fact, this problem of sometimes having single elements and sometimes multiple is one of the big advantages it has over a plain array representation: for nodes that you know will be single, you can leave off the [0]; but for nodes you know might be multiple, you can use [0], or a foreach loop, and that will work too.
Here are some examples of why SimpleXML lives up to its name with your XML:
$sxml = simplexml_load_string($xml);
// Looping over multiple nodes with the same name
// We could also use $sxml->children() to loop regardless of name
// or even the shorthand foreach ( $sxml as $children )
foreach ( $sxml->branch as $branch ) {
// Access an attribute using array index notation
// the (string) is optional here, but good habit to avoid
// passing around SimpleXML objects by mistake
echo 'The branch name is: ' . (string)$branch['name'] . "\n";
// We know there is only one <properties> node, so we can take a shortcut:
// $branch->properties means the same as $branch->properties[0]
// We don't know if there are 1 or many <property> nodes, but it
// doesn't matter: we're asking to loop over them, so SimpleXML
// knows what we mean
foreach ( $branch->properties->property as $property ) {
echo 'The property reference is ' . (string)$property->reference . "\n";
}
}
Basically, whenever I see that ugly json_decode(json_encode( trick, I cringe a little, because 99 times out of 100 the code that follows is much uglier than just using SimpleXML.
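If a plain array really is needed at the end, one option is to build it from the SimpleXML loop directly, so the shape never depends on how many <property> nodes exist. The following is a sketch under a few assumptions: the <root> wrapper, the shortened sample XML, and the $listings variable name are mine, not from the original feed.

```php
<?php
// Shortened sample feed; the <root> wrapper is an assumption for this sketch.
$xml = <<<XML
<root>
  <branch name="Trustee Realtors">
    <properties>
      <property><reference>1</reference><price>275000</price><bedrooms>3</bedrooms></property>
      <property><reference>2</reference><price>350000</price><bedrooms>4</bedrooms></property>
    </properties>
  </branch>
  <branch name="Quick-E-Realty Inc">
    <properties>
      <property><reference>4</reference><price>180995</price><bedrooms>3</bedrooms></property>
    </properties>
  </branch>
</root>
XML;

$sxml = simplexml_load_string($xml);

$listings = [];
foreach ($sxml->branch as $branch) {
    $name = (string) $branch['name'];
    $listings[$name] = [];
    // foreach works the same for one <property> or many
    foreach ($branch->properties->property as $property) {
        $listings[$name][] = [
            'reference' => (string) $property->reference,
            'price'     => (int) $property->price,
            'bedrooms'  => (int) $property->bedrooms,
        ];
    }
}

print_r($listings);
```

Because each property is appended with [], the single listing for the second branch lands under key 0, which is the shape the question asks for.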
One possibility is reading the XML with DOM+XPath. XML cannot be converted to JSON generically, but building a specific JSON structure for a specific XML format is easy:
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXPath($dom);
$result = [];
foreach ($xpath->evaluate('//branch') as $branchNode) {
$properties = [];
foreach ($xpath->evaluate('properties/property', $branchNode) as $propertyNode) {
$properties[] = [
'reference' => $xpath->evaluate('string(reference)', $propertyNode),
'price' => (int)$xpath->evaluate('string(price)', $propertyNode),
'bedrooms' => (int)$xpath->evaluate('string(bedrooms)', $propertyNode)
];
}
$result[] = [
'name' => $xpath->evaluate('string(@name)', $branchNode),
'properties' => $properties
];
}
echo json_encode($result, JSON_PRETTY_PRINT);
Output: https://eval.in/154352
[
{
"name": "Trustee Realtors",
"properties": [
{
"reference": "1",
"price": 275000,
"bedrooms": 3
},
{
"reference": "2",
"price": 350000,
"bedrooms": 4
},
{
"reference": "3",
"price": 128500,
"bedrooms": 4
}
]
},
{
"name": "Quick-E-Realty Inc",
"properties": [
{
"reference": "4",
"price": 180995,
"bedrooms": 3
}
]
}
]
Use the SimpleXMLElement Class:
<?php
$xml = "<body>
<item>
<id>2</id>
</item>
</body>";
$elem = new SimpleXMLElement($xml);
if($elem->children()->count() === 1){
$id = $elem->item->addChild(0)->addChild('id',$elem->item->id);
unset($elem->item->id);
};
$array = json_decode(json_encode($elem), true);
print_r($array);
Output:
Array
(
[item] => Array
(
[0] => Array
(
[id] => 2
)
)
)
Did you use this as the loop source?
$xml_array['branch']['properties']['property']
Try this instead:
$xml_array['branch']['properties']
Leave off ['property'] at the end of the line: use two segments rather than three.
<?php
$xml = file_get_contents('simple.xml');
$xml = simplexml_load_string($xml);
$xml_array = json_decode(json_encode((array) $xml), 1);
$xml_array = array($xml->getName() => $xml_array);
print_r($xml_array);
foreach($xml_array['branch']['properties'] as $a){
print_r($a);
}
?>
To solve this problem you should select using XPath (as others mention), but in my opinion it is not a very familiar tool to most web developers. I created a very small Composer-enabled package that solves this problem. Credit to the Symfony CssSelector component (https://symfony.com/doc/current/components/css_selector.html), which rewrites CSS selectors to XPath selectors. My package is just a thin wrapper that deals with what you will most commonly do with XML in PHP. You can find it here: https://github.com/diversen/simple-query-selector
use diversen\querySelector;
// Load simple XML document
$xml = simplexml_load_file('test2.xml');
// Get all branches as DOM elements
$elems = querySelector::getElementsAsDOM($xml, 'branch');
foreach($elems as $elem) {
// Get attribute name
echo $elem->attributes()->name . "\n";
// Get properties as array
$props = querySelector::getElementsAsAry($elem, 'property');
print_r($props); // You will get the array structure you expect
}
You could also (if you don't care about the branch name) just do:
$elems = querySelector::getElementsAsAry($xml, 'property');
Instead of rebuilding the array, you can test whether the parsed XML contains multiple tags or a single tag converted to an array:
<?php
if (isset($info[0]) && is_array($info[0])) {
foreach ($info as $fields) {
// Do something...
}
} else {
// Do something else...
}
Try it=)
$xml = simplexml_load_string($xml_raw, "SimpleXMLElement", LIBXML_NOCDATA);
$json = json_encode($xml);
$array = json_decode($json, TRUE);
$marray['RepairSheets']['RepairSheet'][0] = $array['RepairSheets']['RepairSheet'];
$array = (isset($array['RepairSheets']['RepairSheet'][0]) == true) ? $array : $marray;
First of all, I've seen a good deal of similar questions. I know regex or DOM can be used, but I can't find any good examples of DOM, and regex makes me pull my hair out. In addition, I need to pull multiple values out of the HTML source: some are element contents, some are attributes.
Here is an example of the html I need to get info from:
<div class="log">
<div class="message">
<abbr class="dt" title="time string">
DATA_1
</abbr>
:
<cite class="user">
<a class="tel" href="tel:+xxxx">
<abbr class="fn" title="DATA_2">
Me
</abbr>
</a>
</cite>
:
<q>
DATA_3
</q>
</div>
</div>
The "message" block may occur once or hundreds of times. I am trying to end up with data like this:
array(4) {
[0] => array(3) {
["time"] => "DATA_1"
["name"] => "DATA_2"
["message"] => "DATA_3"
}
[1] => array(3) {
["time"] => "DATA_1"
["name"] => "DATA_2"
["message"] => "DATA_3"
}
[2] => array(3) {
["time"] => "DATA_1"
["name"] => "DATA_2"
["message"] => "DATA_3"
}
[3] => array(3) {
["time"] => "DATA_1"
["name"] => "DATA_2"
["message"] => "DATA_3"
}
}
I tried using SimpleXML, but it only seems to work on very simple HTML pages. Could someone link me to some examples? I get really confused since I need to get DATA_2 from a title attribute. What do you think is the best way to extract this data? It seems very similar to XML extraction, which I have done, but I need to use some other method.
Here is an example using DOMDocument and DOMXpath to parse your HTML.
$doc = new DOMDocument;
$doc->loadHTMLFile('your_file.html');
$xpath = new DOMXpath($doc);
$res = array();
foreach ($xpath->query('//div[@class="message"]') as $elem) {
$res[] = array(
'time' => $xpath->query('abbr[@class="dt"]', $elem)->item(0)->nodeValue,
'name' => $xpath->query('cite/a/abbr[@class="fn"]', $elem)->item(0)->getAttribute('title'),
'message' => $xpath->query('q', $elem)->item(0)->nodeValue,
);
}
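For reference, here is a self-contained variant of the same idea that can be run without a file on disk. It is a sketch, not a drop-in replacement: the inline $html string is a condensed copy of the sample markup, and it swaps query()->item(0) for evaluate() with the XPath string functions, which return an empty string instead of failing when a node is missing.

```php
<?php
// Condensed copy of the sample markup from the question.
$html = <<<HTML
<div class="log">
  <div class="message">
    <abbr class="dt" title="time string"> DATA_1 </abbr> :
    <cite class="user"><a class="tel" href="tel:+xxxx"><abbr class="fn" title="DATA_2"> Me </abbr></a></cite> :
    <q> DATA_3 </q>
  </div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$res = [];
foreach ($xpath->query('//div[@class="message"]') as $elem) {
    $res[] = [
        // normalize-space() trims the whitespace around the text content
        'time'    => $xpath->evaluate('normalize-space(abbr[@class="dt"])', $elem),
        // string() on an attribute node yields the attribute value
        'name'    => $xpath->evaluate('string(cite/a/abbr[@class="fn"]/@title)', $elem),
        'message' => $xpath->evaluate('normalize-space(q)', $elem),
    ];
}

print_r($res);
```

Each additional <div class="message"> block in the input adds one more entry to $res.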
Can I suggest using xPath? It seems like a perfect candidate for what you want to do (but I may be misinterpreting what you're asking).
XPath will let you select particular nodes of an XML/HTML tree, and then you can operate on them from there. After that, it should be a simple task (or a tiny bit of simple regex at most. Personally, I love regex, so let me know if you need help with that).
Your XPath statements will look something like (assuming no conflicting names):
time (data 1):
/div/div/abbr/text()
name (data 2):
/div/div/cite/a/abbr/@title
message (data 3):
/div/div/q/text()
You can get more technical than this, for example by identifying the elements via their attributes, but what I've given you will be pretty fast.
I have the following piece of code:
print_r($queries);
$id2query = array();
while ($res_array = mysql_fetch_array($results)) {
$id = $res_array['id'];
$query = $res_array['query'];
$id2query[$id] = $query;
}
print_r($queries);
The interesting thing is that print_r before and after the loop returns different things.
Does anybody know how it can be possible?
ADDED
$queries is an array. The code shown is part of a function, and $queries is one of the function's arguments. Before the loop it returns:
Array ( [0] => )
and after the loop it returns:
Array ( [0] => web 2.0 )
ADDED 2
web 2.0 comes from $res_array. Here is the content of the $res_array:
Array ( [0] => 17 [id] => 17 [1] => web 2.0 [query] => web 2.0 [2]
But I do not understand how a value from $res_array migrates to $queries.
ADDED 3
I tried
print "AAAA".var_dump($queries)."BBB";
it returns AAABBB.
ADDED 4
I have managed to use var_dump in the correct way and this is what it returns before the loop:
array(1) { [0]=> &string(0) "" }
This is what I have after the loop:
array(1) { [0]=> &string(7) "web 2.0" }
But I do not understand what it means.
The var_dump below ADDED 4 shows it: the array contains a reference to a string (that is what the & means). So it is not a copy of that string; it is something like a pointer to the original string (I know they are not real pointers, see the PHP docs below). If the original gets changed, the reference shows the changed value too.
I'd suggest you have a look at:
PHPDoc References
PHPDoc What references do
Example code:
$s = "lulu";
$a = array(&$s);
var_dump($a);
$s = "lala";
var_dump($a);
First var_dump will return:
array(1) {
[0]=>
&string(4) "lulu"
}
And the second:
array(1) {
[0]=>
&string(4) "lala"
}
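To make the "migration" from the question more concrete, the sketch below shows two things: a reference stored in an array survives an ordinary array copy, and copying the elements by value in a foreach detaches them. The variable names $copy and $detached are only illustrative; this does not claim to reproduce where the reference in the original function came from.

```php
<?php
$s = "lulu";
$a = [&$s];   // slot 0 is bound to $s by reference

$copy = $a;   // copying the array keeps slot 0 bound to the same reference
$s = "lala";
var_dump($copy[0]);     // string(4) "lala"

// Copy each element by value to get an array that no longer follows $s.
$detached = [];
foreach ($a as $key => $value) {
    $detached[$key] = $value;
}

$s = "changed";
var_dump($a[0]);        // string(7) "changed"
var_dump($detached[0]); // string(4) "lala"
```

So if an array that was filled with references is passed around, later writes to the referenced variable can show up in it, which matches the &string markers in the var_dump output.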
I'm doing some content importing using the node import module in drupal. My problem is that I'm getting errors on data that looks like it should be working smoothly. This is the code at issue:
if (count($allowed_values) && !array_key_exists($item['value'], $allowed_values)) { //$allowed_values[$item['value']] == NULL) {
print "||||" . $item['value'] . "||||";
print_r($allowed_values);
And this is a sample of what is printing:
||||1||||Array ( [0] => no [1] => Zicam® Cold Remedy Nasal Gel Spray Single Hole Actuator (“Jet”) ) ||||1||||Array ( [0] => No [1] => Yes )
It looks to me like it's saying that "1" is not in the array, even though the printed array clearly contains the key 1. If I replace the existing module code with the commented-out check, no error is thrown.
Your code is not complete and I cannot reproduce the error.
Allow me to adjust your example:
<?php
$item = array('value' => 1);
$allowed_values = array(0 => 'no',1 => 'yes');
echo "needle:";
var_dump($item['value']);
echo "haystack:";
var_dump($allowed_values);
if (count($allowed_values) && !array_key_exists($item['value'], $allowed_values)) {
echo "needle has not been found or haystack is empty\n";
} else {
echo "needle has been found\n";
}
gives the desired output:
needle:int(1)
haystack:array(2) {
[0]=>
string(2) "no"
[1]=>
string(3) "yes"
}
needle has been found
This also works when the needle is the string "1" rather than the integer 1, because PHP treats integer-like string keys as integers. This implicit type conversion can be really convenient, but it can also hide bugs: often you don't know what type a value really has until something breaks.
But still. I bet you have something wrong with your variable types.
You should dump them and see what is really in there.
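As a starting point for that kind of check, here is a small sketch of how array_key_exists treats a few look-alike needles. The candidate values are my own guesses at what imported data might contain (for example trailing whitespace or a leading zero); they are not taken from the module.

```php
<?php
$allowed_values = [0 => 'No', 1 => 'Yes'];

// Values that all print as something like "1" but are not the same key.
$candidates = [1, '1', '1 ', '01'];

$results = [];
foreach ($candidates as $candidate) {
    $found = array_key_exists($candidate, $allowed_values);
    $results[] = $found;
    // var_export shows the exact type and content, including whitespace
    printf("%s => %s\n", var_export($candidate, true), var_export($found, true));
}
```

Only the integer 1 and the exact string '1' match; a value with stray whitespace or a leading zero is a different key, which is easy to miss in plain print output.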
I started using the phpquery thingy, but I got lost in all that documentation.
In case someone does not know what the hell I am talking about: http://code.google.com/p/phpquery/
My question is pretty much basic.
I succeeded at loading an XML document and now I want to parse all the tags from it.
Using pq()->find('title') I can output all of the contents inside the title tags. Great!
But I want to throw every <title> tag in a variable. So, lets say that there are 10 <title> tags, I want every one of them in a separate variable, like: $title1, $title2 ... $title10. How can this be done?
Hope you understand the question.
TIA!
You could do it like this:
phpQuery::unloadDocuments();
phpQuery::newDocument($content);
$allTitles = [];
pq('title')->each(function ($item) use (&$allTitles) {
$allTitles[] = pq($item)->text();
});
var_dump($allTitles);
For example if there are 3 titles in the $content this var_dump will output:
array(3) {
[0] =>
string(6) "title1"
[1] =>
string(6) "title2"
[2] =>
string(6) "title3"
}