I want to use XPath to get only the contents of the head element.
//some curl request
$result = curl_exec($ch);
if (curl_errno($ch)) {
echo 'Error:' . curl_error($ch);
}
curl_close ($ch);
$DOM = new DOMDocument;
libxml_use_internal_errors(true);
if (!$DOM->loadHTML('<?xml encoding="utf-8" ?>' . $result)) {
$errors = "";
foreach (libxml_get_errors() as $error) {
$errors .= $error->message . "<br/>";
}
libxml_clear_errors();
print "libxml errors:<br>$errors";
return;
}
$xpath = new DOMXPath($DOM);
$head = $xpath->query('//*["head"]')->item(0); // Any suggestions?
As stated in the title, I am using PHP to try to extract whatever is inside the head element of the cURL response. Any suggestions are welcome.
If your curl response is as expected and everything up until your XPath is good, then try:
$head = $xpath->query('/html/head/*'); // loadHTML() wraps the page in <html>, so /head/* would match nothing
foreach ($head as $headElement) {
/* DO STUFF*/
}
You may also not need to add <?xml encoding="utf-8" ?> to the beginning of the result.
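A minimal, self-contained sketch of the same approach, assuming $html stands in for the string returned by curl_exec() (the markup below is made up). It lists the elements inside head and also shows one way to get the head element's full markup back with saveHTML().

```php
<?php
// Sketch: read what is inside <head> with DOMXPath.
// $html is a made-up stand-in for the string returned by curl_exec().
$html = '<!DOCTYPE html><html><head><title>Demo</title>'
      . '<meta name="robots" content="noindex"></head><body><p>Hi</p></body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Child elements of head; loadHTML() always builds an /html/head/... tree
$names = [];
foreach ($xpath->query('/html/head/*') as $el) {
    $names[] = $el->nodeName;
}
echo implode(',', $names), PHP_EOL;

// The head element itself, serialized back to markup
$head = $xpath->query('//head')->item(0);
echo $dom->saveHTML($head), PHP_EOL;
```

If you only need the text of one element, a narrower query such as //head/title works the same way.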
I have tried various methods, as seen here
and here, and many more.
I even tried the function here.
The XML looks something like this:
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:a="http://www.w3.org/2005/08/addressing"><s:Header><a:Action s:mustUnderstand="1">http://tempuri.org/IFooEntryOperation/SaveFooStatusResponse</a:Action></s:Header><s:Body><SaveFooStatusResponse xmlns="http://htempuri.org/"><SaveFooStatusResult xmlns:b="http://schemas.datacontract.org/2004/07/FooAPI.Entities.Foo" xmlns:i="http://www.w3.org/2001/XMLSchema-instance"><b:AWBNumber>999999999</b:AWBNumber><b:IsError>true</b:IsError><b:Status><b:FooEntryStatus><b:StatusCode>Foo_ENTRY_FAILURE</b:StatusCode><b:StatusInformation>InvalidEmployeeCode</b:StatusInformation></b:FooEntryStatus></b:Status></SaveFooStatusResult></SaveFooStatusResponse></s:Body></s:Envelope>
And here's one example of my code (I have a dozen variations):
$ReturnData = $row["ReturnData"]; // string from a database
if (strpos($ReturnData, "s:Envelope") !== false){
$ReturnXML = new SimpleXMLElement($ReturnData);
$xml = simplexml_load_string($ReturnXML);
$StatusCode = $xml["b:StatusCode"];
echo "<br>StatusCode: " . $StatusCode;
$IsError = $xml["b:IsError"];
echo "<br>IsError: " . $IsError;
}
Another option I tried:
$test = json_decode(json_encode($xml), true); // this didn't work either
I either get an empty array or I get errors like:
"Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML'"
I have tried so many things that I may have lost track of where my code is right now. Please help, I am really stuck...
I also tried:
$ReturnXML = new SimpleXMLElement($ReturnData);
foreach( $ReturnXML->children('b', true)->entry as $entries ) {
echo (string) 'Summary: ' . simplexml_load_string($entries->StatusCode->children()->asXML(), null, LIBXML_NOCDATA) . "<br />\n";
}
Method 1.
You can try the code snippet below to parse it into an array:
$p = xml_parser_create();
xml_parse_into_struct($p, $xml, $values, $indexes);// $xml containing the XML
xml_parser_free($p);
echo "Index array\n";
print_r($indexes);
echo "\nVals array\n";
print_r($values);
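With the parser's defaults, the keys in $indexes are upper-cased and keep their namespace prefix (for example B:STATUSCODE), which is easy to miss when looking up a value. A sketch that turns case folding off and reads two fields, assuming a shortened copy of the SOAP response from the question:

```php
<?php
// Sketch: look up single values with xml_parse_into_struct().
// $xml is a shortened copy of the SOAP response from the question.
$xml = '<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"><s:Body>'
     . '<SaveFooStatusResult xmlns:b="http://schemas.datacontract.org/2004/07/FooAPI.Entities.Foo">'
     . '<b:IsError>true</b:IsError>'
     . '<b:StatusCode>Foo_ENTRY_FAILURE</b:StatusCode>'
     . '</SaveFooStatusResult></s:Body></s:Envelope>';

$p = xml_parser_create();
// Keep tag names as written; by default the parser upper-cases them
xml_parser_set_option($p, XML_OPTION_CASE_FOLDING, 0);
if (!xml_parse_into_struct($p, $xml, $values, $indexes)) {
    die('XML error: ' . xml_error_string(xml_get_error_code($p)));
}
unset($p); // release the parser

// $indexes maps each tag name (prefix included) to its positions in $values
$statusCode = $values[$indexes['b:StatusCode'][0]]['value'];
$isError    = $values[$indexes['b:IsError'][0]]['value'];
echo "StatusCode: $statusCode, IsError: $isError", PHP_EOL;
```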
Method 2.
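function XMLtoArray($xml) {
$previous_value = libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->preserveWhiteSpace = false;
$loaded = $dom->loadXml($xml);
// read the errors before restoring the old setting: switching internal errors off clears the buffer
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous_value);
if (!$loaded || $errors) {
return [];
}
return DOMtoArray($dom);
}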
function DOMtoArray($root) {
$result = array();
if ($root->hasAttributes()) {
$attrs = $root->attributes;
foreach ($attrs as $attr) {
$result['#attributes'][$attr->name] = $attr->value;
}
}
if ($root->hasChildNodes()) {
$children = $root->childNodes;
if ($children->length == 1) {
$child = $children->item(0);
if (in_array($child->nodeType,[XML_TEXT_NODE,XML_CDATA_SECTION_NODE]))
{
$result['_value'] = $child->nodeValue;
return count($result) == 1
? $result['_value']
: $result;
}
}
$groups = array();
foreach ($children as $child) {
if (!isset($result[$child->nodeName])) {
$result[$child->nodeName] = DOMtoArray($child);
} else {
if (!isset($groups[$child->nodeName])) {
$result[$child->nodeName] = array($result[$child->nodeName]);
$groups[$child->nodeName] = 1;
}
$result[$child->nodeName][] = DOMtoArray($child);
}
}
}
return $result;
}
You can get an array using print_r(XMLtoArray($xml));
I don't know how you would do this using SimpleXMLElement, but since you have tried so many things I assume the exact method isn't important, so you may find the following, which uses DOMDocument and DOMXPath, of interest.
/* The SOAP response */
$strxml='<?xml version="1.0" encoding="UTF-8"?>
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope" xmlns:a="http://www.w3.org/2005/08/addressing">
<s:Header>
<a:Action s:mustUnderstand="1">http://tempuri.org/IFooEntryOperation/SaveFooStatusResponse</a:Action>
</s:Header>
<s:Body>
<SaveFooStatusResponse xmlns="http://htempuri.org/">
<SaveFooStatusResult xmlns:b="http://schemas.datacontract.org/2004/07/FooAPI.Entities.Foo" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<b:AWBNumber>999999999</b:AWBNumber>
<b:IsError>true</b:IsError>
<b:Status>
<b:FooEntryStatus>
<b:StatusCode>Foo_ENTRY_FAILURE</b:StatusCode>
<b:StatusInformation>InvalidEmployeeCode</b:StatusInformation>
</b:FooEntryStatus>
</b:Status>
</SaveFooStatusResult>
</SaveFooStatusResponse>
</s:Body>
</s:Envelope>';
/* create the DOMDocument and manually control errors */
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=true;
$dom->recover=true;
$dom->strictErrorChecking=true;
$dom->loadXML( $strxml );
libxml_clear_errors();
/* Create the XPath object */
$xp=new DOMXPath( $dom );
/* Register the various namespaces found in the XML response */
$xp->registerNamespace('b','http://schemas.datacontract.org/2004/07/FooAPI.Entities.Foo');
$xp->registerNamespace('i','http://www.w3.org/2001/XMLSchema-instance');
$xp->registerNamespace('s','http://www.w3.org/2003/05/soap-envelope');
$xp->registerNamespace('a','http://www.w3.org/2005/08/addressing');
/* make XPath queries for whatever pieces of information you need */
$Action=$xp->query( '//a:Action' )->item(0)->nodeValue;
$StatusCode=$xp->query( '//b:StatusCode' )->item(0)->nodeValue;
$StatusInformation=$xp->query( '//b:StatusInformation' )->item(0)->nodeValue;
printf(
"<pre>
%s
%s
%s
</pre>",
$Action,
$StatusCode,
$StatusInformation
);
The output from the above:
http://tempuri.org/IFooEntryOperation/SaveFooStatusResponse
Foo_ENTRY_FAILURE
InvalidEmployeeCode
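For comparison, the same values can also be reached with SimpleXML, which the question started with. The key points are to parse the string only once (the SimpleXMLElement is already the document, so it should not be passed to simplexml_load_string() again) and to bind the b prefix before using it in an XPath expression. A sketch, assuming a shortened copy of the response:

```php
<?php
// Sketch: read the SOAP values with SimpleXML instead of DOM.
// $soap is a shortened copy of the response shown in the question.
$soap = '<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope">'
      . '<s:Body><SaveFooStatusResponse xmlns="http://htempuri.org/">'
      . '<SaveFooStatusResult xmlns:b="http://schemas.datacontract.org/2004/07/FooAPI.Entities.Foo">'
      . '<b:IsError>true</b:IsError>'
      . '<b:Status><b:FooEntryStatus><b:StatusCode>Foo_ENTRY_FAILURE</b:StatusCode>'
      . '</b:FooEntryStatus></b:Status>'
      . '</SaveFooStatusResult></SaveFooStatusResponse></s:Body></s:Envelope>';

// Parse the string once; the object itself is the document
$env = new SimpleXMLElement($soap);

// Prefixes used in XPath must be bound explicitly to the namespace URI
$env->registerXPathNamespace('b', 'http://schemas.datacontract.org/2004/07/FooAPI.Entities.Foo');

// xpath() returns an array of SimpleXMLElement objects; cast to string for the text
$isError    = (string) $env->xpath('//b:IsError')[0];
$statusCode = (string) $env->xpath('//b:StatusCode')[0];
echo "IsError: $isError, StatusCode: $statusCode", PHP_EOL;
```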
Please see my script below:
<?php
function getContent ()
{
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL, 'http://localhost/test.php/test2.php');
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
$output=curl_exec($ch);
curl_close($ch);
return $output;
}
function getHrefFromLinks ($cString){
libxml_use_internal_errors(true);
$dom = new DomDocument();
$dom->loadHTML($cString);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a/@href');
foreach($nodes as $href) {
echo $href->nodeValue; echo "<br />"; // echo current attribute value
$href->nodeValue = 'new value'; // set new attribute value
$href->parentNode->removeAttribute('href'); // remove attribute
}
foreach (libxml_get_errors() as $error) {
}
libxml_clear_errors();
}
echo getHrefFromLinks (getContent());
?>
The output of http://localhost/test.php/test2.php is:
Luck</span> LuckyLuck</span>'s Locki
When echo getHrefFromLinks (getContent()); runs, the output is:
/oncelink/index.html<br />/oncelink-2/lucky<br />
This is wrong; the output should be:
/oncelink/index.html<br />/oncelink-2/lucky'locki<br />
I understand that the generated href value is malformed because it contains an extra apostrophe, but I can't change that because it is pre-generated.
My other question is how to get the value of the span tag:
<span class="lsbold">
Thanks in advance!
SOLVED :)
Well, if it's stupid but it works, it ain't stupid :D
I just added the following code at the end:
$fix = str_replace("href='", 'href="', getContent());
$fix = str_replace("'>", '">', $fix);
echo getHrefFromLinks ($fix);
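The span part of the question is not covered by the fix above. A minimal sketch with DOMXPath that reads both the href attributes and the text of span elements with class lsbold, assuming markup shaped like the hypothetical $html below (the real page output isn't fully shown in the question):

```php
<?php
// Sketch: read href attributes and the text of <span class="lsbold"> with DOMXPath.
// $html is a hypothetical stand-in for the page returned by getContent().
$html = '<a href="/oncelink/index.html"><span class="lsbold">Luck</span></a>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// '@' selects attributes: one DOMAttr node per href
$hrefs = [];
foreach ($xpath->query('//a/@href') as $attr) {
    $hrefs[] = $attr->value;
}

// This class test also matches elements that carry several classes
$spans = [];
$query = '//span[contains(concat(" ", normalize-space(@class), " "), " lsbold ")]';
foreach ($xpath->query($query) as $span) {
    $spans[] = $span->textContent;
}
echo implode(',', $hrefs), ' | ', implode(',', $spans), PHP_EOL;
```

If the class attribute is always exactly lsbold, the simpler //span[@class="lsbold"] would also work.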
$newstring = substr_replace("http://ws.spotify.com/search/1/track?q=", $_COOKIE["word"], 39, 0);
/*$curl = curl_init($newstring);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($curl);*/
//echo $result;
$xml = simplexml_load_file($newstring);
//print_r($xml);
$xpath = new DOMXPath($xml);
$value = $xpath->query("//track/@href");
foreach ($value as $e) {
echo $e->nodevalue;
}
This is my code. I am using spotify to supply me with an xml document. I am then trying to get the href link from all of the track tags so I can use it. Right now the print_r($xml) I have commented out works, but if I try to query and print that out it returns nothing. The exact link I am trying to get my xml from is: http://ws.spotify.com/search/1/track?q=incredible
This may not be the answer you need, because I dropped DOMXPath and used getElementsByTagName() instead.
$url = "http://ws.spotify.com/search/1/track?q=incredible";
$xml = file_get_contents( $url );
$domDocument = new DOMDocument();
$domDocument->loadXML( $xml );
$value = $domDocument->getElementsByTagName( "track" );
foreach ( $value as $e ) {
echo $e->getAttribute( "href" )."<br>";
}
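If you want to keep DOMXPath instead, a few things in the question's code need changing: DOMXPath only accepts a DOMDocument (simplexml_load_file() returns a SimpleXMLElement), elements in a default namespace only match through a registered prefix, and the property is nodeValue (PHP property names are case-sensitive). A sketch with a made-up feed and a hypothetical namespace URI; check the real response for its actual xmlns, if it has one:

```php
<?php
// Sketch: DOMXPath over a DOMDocument, with a default namespace bound to a prefix.
// The XML and the namespace URI are made-up stand-ins, not the real Spotify response.
$xml = '<tracks xmlns="urn:example:music">'
     . '<track href="spotify:track:1"><name>A</name></track>'
     . '<track href="spotify:track:2"><name>B</name></track>'
     . '</tracks>';

$dom = new DOMDocument();
$dom->loadXML($xml);               // DOMXPath needs a DOMDocument, not a SimpleXMLElement

$xpath = new DOMXPath($dom);
// Elements in a default namespace are only reachable through a registered prefix
$xpath->registerNamespace('m', 'urn:example:music');

$hrefs = [];
foreach ($xpath->query('//m:track/@href') as $attr) {
    $hrefs[] = $attr->nodeValue;   // camelCase matters: nodeValue, not nodevalue
}
print_r($hrefs);
```

If you already have a SimpleXMLElement, dom_import_simplexml($sxml)->ownerDocument gives you the DOMDocument to pass to DOMXPath.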
I'm trying to get the href of all anchor (a) tags using this code:
$obj = json_decode($client->getResponse()->getContent());
$dom = new DOMDocument;
if($dom->loadHTML(htmlentities($obj->data->partial))) {
foreach ($dom->getElementsByTagName('a') as $node) {
echo $dom->saveHtml($node), PHP_EOL;
echo $node->getAttribute('href');
}
}
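where the returned JSON looks like the one here, but nothing is echoed. The HTML does contain a tags, but the foreach body never runs. What am I doing wrong?
Just remove that htmlentities() call and it will work. htmlentities() turns < and > into &lt; and &gt;, so DOMDocument sees plain text instead of tags and finds no a elements.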
$contents = file_get_contents('http://jsonblob.com/api/jsonBlob/54a7ff55e4b0c95108d9dfec');
$obj = json_decode($contents);
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($obj->data->partial);
libxml_clear_errors();
foreach ($dom->getElementsByTagName('a') as $node) {
echo $dom->saveHTML($node) . '<br/>';
echo $node->getAttribute('href') . '<br/>';
}
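A small sketch that makes the difference visible, assuming a made-up fragment in place of $obj->data->partial and a hypothetical helper countLinks():

```php
<?php
// Sketch: why htmlentities() hides the links from DOMDocument.
// $partial is a made-up fragment standing in for $obj->data->partial.
$partial = '<p><a href="/one">One</a> <a href="/two">Two</a></p>';

// Hypothetical helper: parse a string as HTML and count its <a> elements
function countLinks(string $html): int {
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    return $dom->getElementsByTagName('a')->length;
}

echo countLinks($partial), PHP_EOL;               // raw markup: the parser sees the <a> tags
echo countLinks(htmlentities($partial)), PHP_EOL; // escaped: "&lt;a ...&gt;" is only text
```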
I have a strange problem parsing an XML document in PHP loaded via cURL. I cannot get the nodeValue of nodes containing URL addresses (I'm trying to implement a simple RSS reader in my CMS). The strange thing is that it works for every node except those containing URL addresses and dates (<link> and <pubDate>).
Here is the code (I know it is a stupid solution, but I'm kind of a newbie when it comes to working with DOM and parsing XML documents).
function file_get_contents_curl($url) {
$ch = curl_init(); // initialize curl handle
curl_setopt($ch, CURLOPT_URL, $url); // set url to post to
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
curl_setopt($ch, CURLOPT_TIMEOUT, 4); // times out after 4s
$result = curl_exec($ch); // run the whole process
return $result;
}
function vypis($adresa) {
$html = file_get_contents_curl($adresa);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
$desc = $doc->getElementsByTagName('description');
$ctg = $doc->getElementsByTagName('category');
$pd = $doc->getElementsByTagName('pubDate');
$ab = $doc->getElementsByTagName('link');
$aut = $doc->getElementsByTagName('author');
for ($i = 1; $i < $desc->length; $i++) {
$dsc = $desc->item($i);
$titles = $nodes->item($i);
$categorys = $ctg->item($i);
$pubDates = $pd->item($i);
$links = $ab->item($i);
$autors = $aut->item($i);
$description = $dsc->nodeValue;
$title = $titles->nodeValue;
$category = $categorys->nodeValue;
$pubDate = $pubDates->nodeValue;
$link = $links->nodeValue;
$autor = $autors->nodeValue;
echo 'Title:' . $title . '<br/>';
echo 'Description:' . $description . '<br/>';
echo 'Category:' . $category . '<br/>';
echo 'Datum ' . gmdate("D, d M Y H:i:s",
strtotime($pubDate)) . " GMT" . '<br/>';
echo "Autor: $autor" . '<br/>';
echo 'Link: ' . $link . '<br/><br/>';
}
}
Can you please help me with this?
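To read RSS you should use loadXML, not loadHTML. One reason your links don't show up is that in HTML the <link> element is empty and ignores its contents, so the URL text does not end up inside the link node. See also here: http://www.w3.org/TR/html401/struct/links.html#h-12.3 The date fails for a related reason: the HTML parser lowercases tag names, so getElementsByTagName('pubDate') finds nothing.
Also, I find it easier to just iterate over the <item> tags and then iterate over their child nodes. Like so: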
$d = new DOMDocument;
// don't show xml warnings
libxml_use_internal_errors(true);
$d->loadXML($xml_contents);
// clear xml warnings buffer
libxml_clear_errors();
$items = array();
// iterate all item tags
foreach ($d->getElementsByTagName('item') as $item) {
$item_attributes = array();
// iterate over children
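foreach ($item->childNodes as $child) {
// skip the whitespace text nodes between elements
if ($child->nodeType !== XML_ELEMENT_NODE) {
continue;
}
$item_attributes[$child->nodeName] = $child->nodeValue;
}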
$items[] = $item_attributes;
}
var_dump($items);
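A self-contained sketch of the same loadXML() approach that reads link and pubDate directly and formats the date the way the question's code does, assuming a tiny made-up feed in place of the cURL result:

```php
<?php
// Sketch: read <link> and <pubDate> from an RSS 2.0 feed with loadXML().
// $rss is a tiny made-up feed; in the question it would come from file_get_contents_curl().
$rss = '<?xml version="1.0"?><rss version="2.0"><channel><title>Feed</title>'
     . '<item><title>First</title><link>http://example.com/1</link>'
     . '<pubDate>Mon, 06 Jan 2014 10:00:00 GMT</pubDate></item>'
     . '</channel></rss>';

$doc = new DOMDocument();
$doc->loadXML($rss);   // XML parsing keeps the <link> text and the case of pubDate

$items = [];
foreach ($doc->getElementsByTagName('item') as $item) {
    // getElementsByTagName() on $item only searches inside that item
    $pubDate = $item->getElementsByTagName('pubDate')->item(0)->nodeValue;
    $items[] = [
        'title'   => $item->getElementsByTagName('title')->item(0)->nodeValue,
        'link'    => $item->getElementsByTagName('link')->item(0)->nodeValue,
        'pubDate' => gmdate('D, d M Y H:i:s', strtotime($pubDate)) . ' GMT',
    ];
}
print_r($items);
```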