Why does my XML data get all mixed up? - php

I have a problem getting content from an XML file into a MySQL database.
This is the code:
$objDOM = new DOMDocument('1.0', 'UTF-8');
$objDOM->load("something.xml");
$IAutnr = $objDOM->getElementsByTagName("Data");
Now, in a for loop:
for($i=$t;$i<=$max;$i++) {
$some= $objDOM->getElementsByTagName("some");
$something = $some->item($i)->nodeValue;
$some2 = $objDOM->getElementsByTagName("some2");
$something2 = $some2->item($i)->nodeValue;
// now put $something and $something2 into the database
}
Now, what happens is that everything works perfectly fine until one of the elements (some, some2...) does not exist within a "Data" tag. What it then does is take the element from the next "Data" tag, which shifts all my data, so I end up with data in my database that doesn't belong there, and the whole database is mixed up.
I already spent several hours trying to fix the XML manually by adding the missing tags, but with thousands of data records that is not possible.
So I need to add something to my code so that, if the tag doesn't exist, it is simply skipped instead of being taken from the next "Data" tag.
I don't even understand why it does that: why does it just jump into the next "Data" tag?
Thank you very much for your help!

I'm only guessing here about the content of your XML structure, but I imagine it looks something like
...
<Data>
<some>a</some>
<some2>b</some2>
</Data>
<Data>
<some>c</some>
<some2>d</some2>
</Data>
...
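If this is the case, you should loop over the collection of Data elements in $IAutnr and search for some/some2 only inside each one, e.g.:
for($i = 0, $limit = min($IAutnr->length, $max); $i < $limit; $i++) {
$data = $IAutnr->item($i);
// search only inside the current <Data> element, not the whole document
$some = $data->getElementsByTagName('some');
$something = $some->length > 0 ? $some->item(0)->nodeValue : null;
$some2 = $data->getElementsByTagName('some2');
$something2 = $some2->length > 0 ? $some2->item(0)->nodeValue : null;
// insert (either value may be null if the tag is missing in this <Data>)
}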
Unless you need some of the more advanced features of the DOM library, I'd recommend using SimpleXML.
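For comparison, here is a minimal sketch of the same per-record loop with SimpleXML, assuming the XML layout guessed above (the root element name `root` and all values are invented). Each `<Data>` is visited once, and a missing child gives `null` instead of a value borrowed from the next record:

```php
<?php
// Sketch: per-<Data> extraction with SimpleXML.
// The sample XML is made up; the second <Data> has no <some2>.
$xml = <<<XML
<root>
  <Data><some>a</some><some2>b</some2></Data>
  <Data><some>c</some></Data>
  <Data><some>e</some><some2>f</some2></Data>
</root>
XML;

$root = simplexml_load_string($xml);
$rows = [];

foreach ($root->Data as $data) {
    // isset() checks whether THIS <Data> element has the child
    $something  = isset($data->some)  ? (string) $data->some  : null;
    $something2 = isset($data->some2) ? (string) $data->some2 : null;
    // This is where the database insert would go; here we just collect the pairs
    $rows[] = [$something, $something2];
}

print_r($rows);
```

The second row comes out as `['c', null]`, so you can decide explicitly how to store a missing value instead of silently shifting the data.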

It does that because you're asking it to extract elements with tag name "some" and "some2" from the entire XML structure, so that's what it does: it doesn't look only into the branch you intend, because you never tell it to. One way to fix it is to look at $some->item($i)->parentNode (and possibly that node's parent, and so on) in order to identify which parent $something and $something2 actually belong to. Of course, there's no guarantee that $something and $something2 belong to the same parent, unless your XML is somehow guaranteed to contain either none or both of them within each branch. I know the explanation's a bit hairy, but that's the best way I could put it into words.
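A small demonstration of this, using an invented document in which the second `<Data>` has no `<some2>`: the document-wide list only contains two `<some2>` elements, so index 1 is really the one inside the third `<Data>`, and `parentNode` shows where it lives.

```php
<?php
// Sketch: why a document-wide getElementsByTagName() shifts the data.
// Sample XML is invented; the second <Data> has no <some2>.
$xml = <<<XML
<root>
  <Data><some>a</some><some2>b</some2></Data>
  <Data><some>c</some></Data>
  <Data><some>e</some><some2>f</some2></Data>
</root>
XML;

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML($xml);

$allData  = $doc->getElementsByTagName('Data');
$allSome2 = $doc->getElementsByTagName('some2'); // searches the WHOLE document

echo $allData->length, " Data elements, ", $allSome2->length, " some2 elements\n";

// Index 1 is "meant" for the second <Data>, but it is actually "f" from the third one
$second = $allSome2->item(1);
echo "some2 #1 = ", $second->nodeValue, "\n";

// parentNode tells you which <Data> the element really belongs to
foreach ($allData as $index => $data) {
    if ($second->parentNode->isSameNode($data)) {
        echo "it belongs to Data #", $index, "\n";
    }
}
```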

Related

Getting an XML value from a named field

Sorry to be asking this, but it's driving me crazy.
I've been using PHP's SimpleXMLElement as my go-to XML parser, and I've looked at many examples and given up on this many times. But now I just need to have this working. There are many examples on how to get simple fields, but not so many with values in the fields...
I'm trying to get the "track_artist_name" value from this XML as a named variable in php.
<nowplaying-info-list>
<nowplaying-info >
<property name="track_title"><![CDATA[Song Title]]></property>
<property name="track_album_name"><![CDATA[Song Album]]></property>
<property name="track_artist_name"><![CDATA[Song Artist]]></property>
</nowplaying-info>
</nowplaying-info-list>
I've tried using xpath with:
$sxml->xpath("/nowplaying-info-list[0]/nowplaying-info/property[@name='track_artist_name']");
But, I know it's all mucked up and not working.
I originally tried something like this too, thinking it made sense - but no:
$attrs = $sxml->nowplaying_info[0]->property['@name']['track_artist_name'];
echo $attrs . "\n\n";
I know I can get the values with something such as this:
$sxml->nowplaying_info[0]->property[2];
Sometimes there are more lines in the XML results than other times, and because of this, accessing by position breaks the calculations with the wrong data.
Can someone shed some light on my problem? I'm just trying to get the name of the artist into a variable. Many thanks.
WORKING UPDATE:
I was unaware there were different XML interpreter methods, and was using the following XML interpreter version:
// read feed into SimpleXML object
$sxml = new SimpleXMLElement($json);
That didn't work, but I have now updated that section of code to the following, thanks to the help here:
$sxml_new = simplexml_load_string($json_raw);
// run the query once; xpath() returns an array of matches (empty if nothing matched)
$results = $sxml_new->xpath("/nowplaying-info-list/nowplaying-info/property[@name='track_artist_name']");
if ( !empty($results) )
{
$artist = (string) $results[0];
echo "Artist: " . $artist . "\n";
}
Your xpath expression is pretty much right, but you don't need to specify an index for the <nowplaying-info-list> element - it'll deal with that itself. If you were to supply an index, it would need to start at 1, not 0.
Try:
$results = $sxml->xpath("/nowplaying-info-list/nowplaying-info/property[@name='track_artist_name']");
echo (string) $results[0];
which outputs:
Song Artist
See https://3v4l.org/eH4Dr
Your second approach:
$sxml->nowplaying_info[0]->property['@name']['track_artist_name'];
This would try to access an attribute literally named @name on the first property element, rather than treating it as an XPath-style @ expression. To do this without XPath, you'd need to loop over each of the <property> elements and test their name attribute.
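A minimal sketch of that loop, using the sample XML from the question. Note that the hyphenated element name has to be written as `{'nowplaying-info'}` in SimpleXML property access:

```php
<?php
// Sketch: find the artist without XPath by testing each <property>'s name attribute.
// The XML is the sample from the question.
$xmlString = <<<XML
<nowplaying-info-list>
  <nowplaying-info>
    <property name="track_title"><![CDATA[Song Title]]></property>
    <property name="track_album_name"><![CDATA[Song Album]]></property>
    <property name="track_artist_name"><![CDATA[Song Artist]]></property>
  </nowplaying-info>
</nowplaying-info-list>
XML;

$sxml = simplexml_load_string($xmlString);
$artist = null;

// Hyphenated element names need the {'...'} syntax
foreach ($sxml->{'nowplaying-info'}[0]->property as $property) {
    // Attributes are read with array syntax; cast to string before comparing
    if ((string) $property['name'] === 'track_artist_name') {
        $artist = (string) $property; // the string cast includes CDATA text
        break;
    }
}

echo "Artist: ", $artist, "\n";
```

Because the match is by attribute value rather than by position, it keeps working when some feeds contain more or fewer property lines.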
In case the node you are looking for is nested deeper somewhere, you can add a double slash at the start:
$results = $sxml->xpath("//nowplaying-info-list/nowplaying-info/property[@name='track_artist_name']");
Also, if you have multiple <nowplaying-info> elements, you can use an index to pick one (note the [1] index):
$results = $sxml->xpath("//nowplaying-info-list/nowplaying-info[1]/property[@name='track_artist_name']");
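A quick sketch of the positional index, with two `<nowplaying-info>` blocks whose artist names are invented. XPath positions start at 1, and without an index every match is returned:

```php
<?php
// Sketch: two <nowplaying-info> blocks (invented data); XPath indexes start at 1.
$xmlString = <<<XML
<nowplaying-info-list>
  <nowplaying-info>
    <property name="track_artist_name"><![CDATA[First Artist]]></property>
  </nowplaying-info>
  <nowplaying-info>
    <property name="track_artist_name"><![CDATA[Second Artist]]></property>
  </nowplaying-info>
</nowplaying-info-list>
XML;

$sxml = simplexml_load_string($xmlString);

// Without an index, every matching property is returned
$all = $sxml->xpath("//nowplaying-info/property[@name='track_artist_name']");
echo count($all), " matches\n";

// [1] selects the first <nowplaying-info>, [2] the second
$first  = $sxml->xpath("//nowplaying-info-list/nowplaying-info[1]/property[@name='track_artist_name']");
$second = $sxml->xpath("//nowplaying-info-list/nowplaying-info[2]/property[@name='track_artist_name']");

echo (string) $first[0], "\n";
echo (string) $second[0], "\n";
```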

Create comma separated string via xml values

I've been working on a system for a few hours now and this little thing is too much for me to think through logically at the moment.
Normally I would wait a few hours but this is a last minute job and I need to finish this.
Here's my problem:
I have an XML file that gets posted to my PHP file, the PHP file inserts certain data into a DB, but some XML nodes have the same name:
<accessoires>
<accessoire>value1</accessoire>
<accessoire>value2</accessoire>
<accessoire>value3</accessoire>
</accessoires>
Now I want to get a variable $acclist which contains all values separated by a comma:
value1,value2,value3,
I bet the solution to this is very easy, but I'm at that point where even the easiest piece of code becomes a hassle. And googling only turns up nodes that have their own identifiers in some way.
Could someone help me out please?
You can try simplexml_load_string to parse the XML, then call implode on the node after casting it to an array.
Note: this code was tested in PHP 5.4.6 and behaves as expected.
<?php
$xml = '<accessoires>
<accessoire>value1</accessoire>
<accessoire>value2</accessoire>
<accessoire>value3</accessoire>
</accessoires>';
$dat = simplexml_load_string($xml);
echo implode(",",(array)$dat->accessoire);
For PHP 5.3.x I had to change it to:
$xml = '<accessoires>
<accessoire>value1</accessoire>
<accessoire>value2</accessoire>
<accessoire>value3</accessoire>
</accessoires>';
$dat = simplexml_load_string($xml);
$dat = (array)$dat;
echo implode(",",$dat["accessoire"]);
You do this with a library that can parse and process XML, for example SimpleXML:
implode(',', iterator_to_array($accessoires->accessoire, FALSE));
The key part here is to use iterator_to_array() as SimpleXML offers the same-named child-elements here as an iterator. Otherwise $accessoires->accessoire gives you auto-magically only the first element (if any).
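A self-contained version of that one-liner, assuming `$accessoires` is simply the loaded root element (the answer leaves that variable's origin implicit):

```php
<?php
// Sketch: join all <accessoire> values with commas using iterator_to_array().
$xml = '<accessoires>
<accessoire>value1</accessoire>
<accessoire>value2</accessoire>
<accessoire>value3</accessoire>
</accessoires>';

// The root element is <accessoires>, so its children are reachable directly
$accessoires = simplexml_load_string($xml);

// FALSE = don't preserve keys: every child has the key "accessoire",
// so preserving keys would keep only the last element
$items = iterator_to_array($accessoires->accessoire, false);

// implode() converts each SimpleXMLElement to its text content
$acclist = implode(',', $items);
echo $acclist, "\n";
```

This produces `value1,value2,value3` without a trailing comma; append `','` yourself if the receiving side really expects one.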

Find a node with xpath

I'd like to parse the Google geocode API response, but the structure of the result is not always the same. I need to know the postal code, for example, but it is sometimes in the Locality/DependentLocality/PostalCode/PostalCodeNumber node and sometimes in the Locality/PostalCode/PostalCodeNumber node. I don't really know the logic behind this; I just want to get the value of the PostalCodeNumber node, no matter where exactly it is. Can I do it with XPath? If so, how?
UPDATE
I tried //PostalCodeNumber, but it returns an empty array. The code snippet is the following:
$xml = new \SimpleXMLElement($response);
var_dump($xml->xpath('//PostalCodeNumber'));
The $response is the content of http://maps.google.com/maps/geo?q=1055+Budapest&output=xml
(copy paste the url instead of clicking on it because of some character problems...)
Try this XPath:
Locality//PostalCodeNumber
It finds all PostalCodeNumber descendants of the Locality element.
//PostalCode/PostalCodeNumber
Should do the trick. A quick Google search turns up the schema, which indicates that there may be multiple nested DependentLocality elements, so you'll want to check for multiple results and have some idea of whether you want the most specific (most deeply nested) or the least specific one.
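A minimal sketch showing that `//PostalCode/PostalCodeNumber` matches at either nesting depth. The XML is a simplified, namespace-free imitation using the element names from the question; the root element and postal code values are invented:

```php
<?php
// Sketch: the same XPath matches PostalCodeNumber at either nesting depth.
// Simplified, namespace-free XML; element names follow the question, values are invented.
$xml = <<<XML
<results>
  <Locality>
    <DependentLocality>
      <PostalCode><PostalCodeNumber>1055</PostalCodeNumber></PostalCode>
    </DependentLocality>
  </Locality>
  <Locality>
    <PostalCode><PostalCodeNumber>1051</PostalCodeNumber></PostalCode>
  </Locality>
</results>
XML;

$sxe = new SimpleXMLElement($xml);

$codes = [];
foreach ($sxe->xpath('//PostalCode/PostalCodeNumber') as $node) {
    $codes[] = (string) $node;
}

print_r($codes); // one entry per match, in document order
```

With the real response, the namespace also has to be taken into account, as the updates in this answer point out.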
Update:
To guard against namespace issues, explicitly add the namespace to the query:
$xml = new SimpleXMLElement($response);
$xml->registerXPathNamespace('ns', 'urn:oasis:names:tc:ciq:xsdschema:xAL:2.0');
var_dump($xml->xpath('//ns:PostalCodeNumber'));
Update 2: fixed a couple of typos
Update 3:
<?php
$result = file_get_contents('http://maps.google.com/maps/geo?q=1055+Budapest&output=xml');
$sxe = new SimpleXMLElement($result);
$sxe->registerXPathNamespace('c', 'urn:oasis:names:tc:ciq:xsdschema:xAL:2.0');
$search = $sxe->xpath('//c:PostalCodeNumber');
foreach($search as $code) {
echo $code;
}
?>
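To see why the unprefixed query came back empty, here is an offline sketch with a trimmed, invented document; only the xAL namespace URI is taken from the answer. In XPath 1.0 an unprefixed name means "no namespace", so an element in a default namespace needs a registered prefix:

```php
<?php
// Sketch: why '//PostalCodeNumber' returns nothing when the element is namespaced.
// Trimmed, invented XML; only the xAL namespace URI comes from the answer above.
$xml = <<<XML
<AddressDetails xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0">
  <Locality>
    <PostalCode><PostalCodeNumber>1055</PostalCodeNumber></PostalCode>
  </Locality>
</AddressDetails>
XML;

$sxe = new SimpleXMLElement($xml);

// No prefix in the query means "no namespace", so nothing matches
$plain = $sxe->xpath('//PostalCodeNumber');
echo count($plain), " match(es) without a namespace\n";

// Bind a prefix (any name works) to the namespace URI and use it in the query
$sxe->registerXPathNamespace('c', 'urn:oasis:names:tc:ciq:xsdschema:xAL:2.0');
$prefixed = $sxe->xpath('//c:PostalCodeNumber');
echo count($prefixed), " match(es) with the prefix: ", (string) $prefixed[0], "\n";
```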

PHP: How can I access this XML entity when its name contains a reserved word?

I'm trying to parse this feed: http://musicbrainz.org/ws/1/artist/c0b2500e-0cef-4130-869d-732b23ed9df5?type=xml&inc=url-rels
I want to grab the URLs inside the 'relation-list' tag.
I've tried fetching the URL with PHP using simplexml_load_file(), but I can't access it using $feed->artist->relation-list as PHP interprets "list" as the list() function.
I have a feeling I'm going about this wrong (not much XML experience), and even if I was able to get hold of the elements I want, I don't know how to extract their attributes (I just want the type and target fields).
Can anyone gently nudge me in the right direction?
Thanks.
Matt
Have a look at the examples on the php.net page; they actually tell you how to solve this. PHP reads the hyphen in relation-list as a minus sign, so wrap the element name in braces:
// $feed->artist->relation-list
$feed->artist->{'relation-list'}
To get an attribute of a node, just use the attribute name as array index on the node:
foreach( $feed->artist->{'relation-list'}->relation as $relation ) {
$target = (string)$relation['target'];
$type = (string)$relation['type'];
// Do something with it
}
(Untested)
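Since the snippet above is untested, here is a small offline sketch of the same idea. The XML is an invented, namespace-free imitation of the feed's structure (the real feed's exact layout and namespace are not reproduced here), and the example.org URLs are placeholders:

```php
<?php
// Sketch: access a hyphenated element with {'...'} and read attributes.
// Invented, namespace-free XML imitating the structure described in the question.
$xml = <<<XML
<metadata>
  <artist>
    <relation-list target-type="Url">
      <relation type="Wikipedia" target="http://example.org/wiki/Some_Artist"/>
      <relation type="OfficialHomepage" target="http://example.org/"/>
    </relation-list>
  </artist>
</metadata>
XML;

$feed = simplexml_load_string($xml);
$links = [];

// Without the braces, the hyphen would be read as a minus sign
foreach ($feed->artist->{'relation-list'}->relation as $relation) {
    // Attributes use array syntax; cast to string to get plain values
    $links[(string) $relation['type']] = (string) $relation['target'];
}

print_r($links);
```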

Extract form fields using RegEx

I'm looking for a way to get all the form inputs and respective values from a page given a specific URL and form name.
function GetForm($url, $name)
{
return array
(
'field_name_1' => 'value_1',
'field_name_2' => 'value_2',
'select_field_name' => array('option_1', 'option_2', 'option_3'),
);
}
GetForm('http://www.google.com/', 'f');
Can anyone provide me with the necessary regular expressions to accomplish this?
EDIT: I understand that querying the DOM would be far more reliable, however what I'm looking for is a website agnostic solution that allows me to get all the fields of a given form. I don't believe this is possible with DOM without knowing the document nodes first, am I wrong?
I don't need a bulletproof solution, just something that works on standard web pages. For the FORM tag I've come up with the following regex:
'~<form.*?name=[\'"]?' . $name . '[\'"]?.*?>(.+?)</form>~is'
I believe that doing something similar for input fields won't be difficult, what I find most challenging is the RegEx for the select and option fields.
Using regex to parse HTML is probably not the best way to go.
You might take a look at DOMDocument::loadHTML, which will allow you to work with an HTML document using DOM methods (and XPath queries, for instance, if you know those).
You might also want to take a look at Zend_Dom and Zend_Dom_Query, btw, which are quite nice if you can use some parts of Zend Framework in your application.
They are used to fetch data from HTML pages when doing functional testing with Zend_Test, for instance, and work quite well ;-)
It may seem harder at first... But considering the mess some HTML pages are, it is probably a much wiser idea...
EDIT after the comment and the edit of the OP
Here are a couple of thoughts, beginning with something "simple", an input tag:
it can spread across several lines
it can have many attributes
considering only name and value are of interest to you, you have to deal with the fact that those two can appear in any order
attributes can have double quotes, single quotes, or even nothing around their values
tags / attributes can be lower-case or upper-case
tags don't always have to be closed
Some of those points are not valid HTML, but they still work in the most common web browsers, so they have to be taken into account...
Considering only those points, I wouldn't like to be the one writing the regex ^^
But I suppose there might be other difficulties I didn't think about.
On the other side, you have DOM and XPath... To get the value of an input name="q" (the example uses this page), it's a matter of something like this:
$url = 'http://www.google.fr/search?q=test&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a';
$html = file_get_contents($url);
$dom = new DOMDocument();
if (@$dom->loadHTML($html)) {
// yep, not necessarily valid-html...
$xpath = new DOMXpath($dom);
$nodeList = $xpath->query('//input[@name="q"]');
if ($nodeList->length > 0) {
for ($i=0 ; $i<$nodeList->length ; $i++) {
$node = $nodeList->item($i);
var_dump($node->getAttribute('value'));
}
}
} else {
// too bad...
}
What matters here? The XPath query, and only that... And is there anything static/constant in it?
Well, I say I want all <input> elements that have a name attribute equal to "q".
And it just works: I'm getting this result:
string 'test' (length=4)
string 'test' (length=4)
(I checked: there are two input name="q" on the page ^^)
Do I know the structure of the page? Absolutely not ;-)
I just know I/you/we want input tags named q ;-)
And that's what we get ;-)
EDIT 2: a bit of fun with select and options:
Well, just for fun, here's what I came up with for select and option:
$url = 'http://www.google.fr/language_tools?hl=fr';
$html = file_get_contents($url);
$dom = new DOMDocument();
if (@$dom->loadHTML($html)) {
// yep, not necessarily valid-html...
$xpath = new DOMXpath($dom);
$nodeListSelects = $xpath->query('//select');
if ($nodeListSelects->length > 0) {
for ($i=0 ; $i<$nodeListSelects->length ; $i++) {
$nodeSelect = $nodeListSelects->item($i);
$name = $nodeSelect->getAttribute('name');
$nodeListOptions = $xpath->query('option[@selected="selected"]', $nodeSelect); // We want options that are inside the current select
if ($nodeListOptions->length > 0) {
for ($j=0 ; $j<$nodeListOptions->length ; $j++) {
$nodeOption = $nodeListOptions->item($j);
$value = $nodeOption->getAttribute('value');
var_dump("name='$name' => value='$value'");
}
}
}
}
} else {
// too bad...
}
And I get this output:
string 'name='sl' => value='fr'' (length=23)
string 'name='tl' => value='en'' (length=23)
string 'name='sl' => value='en'' (length=23)
string 'name='tl' => value='fr'' (length=23)
string 'name='sl' => value='en'' (length=23)
string 'name='tl' => value='fr'' (length=23)
Which is what I expected.
Some explanations?
Well, first of all, I get all the select tags of the page, and keep their name in memory.
Then, for each one of those, I get the selected option tags that are its children (there's always only one, btw).
And there, I have the value.
A bit more complicated than the previous example... But still much easier than regex, I believe... Took me maybe 10 minutes, not more... And I still won't have the courage (madness?) to start thinking about some kind of mutant regex that would be able to do that :-D
Oh, and as a side note: I still have no idea what the structure of the HTML document looks like; I have not even taken a single look at its source ^^
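Coming back to the GetForm() signature from the question, here is a rough sketch of it built on the same DOM/XPath approach instead of regex. It is a hypothetical helper (`getFormFromHtml` is not a library function): it takes the HTML as a string rather than a URL, assumes the form name contains no double quote, and returns every option value of a select rather than only the selected one:

```php
<?php
// Hypothetical helper sketching GetForm() with DOM + XPath instead of regex.
// Takes HTML as a string (fetching the URL is left out on purpose).
function getFormFromHtml(string $html, string $formName): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is often invalid
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assumes $formName contains no double quote
    $form = $xpath->query(sprintf('//form[@name="%s"]', $formName))->item(0);
    if ($form === null) {
        return [];
    }

    $fields = [];
    // Relative queries (.//) only search inside the chosen form; a union keeps document order
    foreach ($xpath->query('.//input[@name] | .//textarea[@name] | .//select[@name]', $form) as $field) {
        $name = $field->getAttribute('name');
        if ($field->nodeName === 'select') {
            $options = [];
            foreach ($xpath->query('.//option', $field) as $option) {
                // Fall back to the option text when there is no value attribute
                $options[] = $option->hasAttribute('value')
                    ? $option->getAttribute('value')
                    : trim($option->textContent);
            }
            $fields[$name] = $options;
        } elseif ($field->nodeName === 'textarea') {
            $fields[$name] = $field->textContent;
        } else {
            $fields[$name] = $field->getAttribute('value');
        }
    }
    return $fields;
}

// Usage with an invented page
$html = <<<HTML
<html><body>
<form name="f" action="/search">
  <input type="text" name="q" value="test">
  <input type="hidden" name="hl" value="en">
  <select name="lang">
    <option value="en">English</option>
    <option value="fr" selected>French</option>
    <option>de</option>
  </select>
</form>
</body></html>
HTML;

$fields = getFormFromHtml($html, 'f');
print_r($fields);
```

Nothing in the queries depends on the page layout: only the form name and the element types are fixed, which is the point made above about not needing to know the document's structure.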
I hope this helps a bit more...
Who knows, maybe I'll convince you regex are not a good idea when it comes to parsing HTML... maybe ? ;-)
Still : have fun !
