I'm trying to write a script that grabs the URL of the first image from this website: http://www.slothradio.com/covers/?adv=&artist=pantera&album=vulgar+display+of+power
Here's my script:
$content = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("*/div[@class='album0']/img");
echo '<pre>';print_r($elements);exit;
When I run that, it outputs
DOMNodeList Object
(
)
Even when I change my query to $xpath->query("*/img"), I still get nothing. What am I doing wrong?
loadHTMLFile() takes a file path (or URL), not HTML content, so $doc->loadHTMLFile($content); will not work. See the documentation:
http://php.net/manual/en/domdocument.loadhtmlfile.php
Use
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
To output the elements, use
var_dump(iterator_to_array($elements));
//Or
print_r(iterator_to_array($elements));
Thanks
:)
What am I doing wrong?
You are using print_r, but DOMNodeList does not offer any output for that function (because it's an internal class). You can start with outputting the number of items for example. In the end you need to iterate over the node list and deal with each node on your own.
printf("Found %d element(s).\n", $elements->length);
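A minimal sketch of that iteration, assuming a simplified, hypothetical stand-in for the page's markup (the real page is not reproduced here) and using `//` so the match is not tied to a particular depth:

```php
<?php
// Hypothetical, simplified stand-in for the real page's HTML.
$content = '<html><body>'
    . '<div class="album0"><img src="http://example.com/cover0.jpg"></div>'
    . '<div class="album1"><img src="http://example.com/cover1.jpg"></div>'
    . '</body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($content);

$xpath = new DOMXPath($doc);
// "//" matches at any depth; "@class" tests the class attribute.
$elements = $xpath->query("//div[@class='album0']/img");

printf("Found %d element(s).\n", $elements->length);

// Deal with each node individually: here, collect the src attribute.
$urls = [];
foreach ($elements as $img) {
    $urls[] = $img->getAttribute('src');
}

// First image URL, or null when nothing matched.
$first = $urls[0] ?? null;
echo $first, "\n";
```

Each item in the list is a DOMElement, so getAttribute() is available on it; the example.com URLs are placeholders.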
Related
I want to see what is inside a DOMXPath object. I am not referring to using query/evaluate functions of XPath. At the moment I have the following code:
$file = file_get_contents("schema.xsd");
$doc = new DOMDocument();
$doc->loadXML($file);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('xs', 'http://www.w3.org/2001/XMLSchema');
How do I display the $xpath using PHP?
You can use print_r, var_dump or var_export, all of which let you "view" variables in PHP.
More information in this link.
As a bonus, you can wrap it in a pre or code tag so it gets laid out decently.
<pre>
<?php print_r($xpath); ?>
</pre>
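A DOMXPath object is mostly a wrapper around the document it was created from, so dumping it may show little. A sketch, assuming a tiny inline schema in place of schema.xsd, that looks at its public `document` property and serializes that instead:

```php
<?php
// Assumption: a tiny inline schema stands in for schema.xsd.
$file = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    . '<xs:element name="note" type="xs:string"/>'
    . '</xs:schema>';

$doc = new DOMDocument();
$doc->loadXML($file);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('xs', 'http://www.w3.org/2001/XMLSchema');

// DOMXPath exposes the document it is bound to as a public property.
$bound = $xpath->document;

// Serializing the bound document shows what the XPath object will query.
$serialized = $bound->saveXML();
echo $serialized;
```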
I'm trying to figure out how to parse an HTML page to get a form's action value, the labels within the form tag, and the input field names. I took a look at DOMDocument on php.net and it tells me to get a child node, but all that does is give me errors that it doesn't exist. I also tried doing print_r on the variable holding the HTML content and all that shows me is length=1. Can someone show me a few samples that I can use, because php.net is confusing to follow?
<?php
$content = "some-html-source";
$content = preg_replace("/&(?!(?:apos|quot|[gl]t|amp);|#)/", '&amp;', $content);
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($content);
$form = $dom->getElementsByTagName('form');
print_r($form);
I suggest using DomXPath instead of getElementsByTagName because it allows you to select attribute values directly and returns a DOMNodeList object just like getElementsByTagName. The @ in @action indicates that we're selecting by attribute.
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DomXPath($doc);
$action = $xpath->query('//form/@action')->item(0);
var_dump($action);
Similarly, to get the first input:
$input = $xpath->query('//form/input')->item(0);
To get all input fields (running the query once and iterating over the result):
foreach ($xpath->query('//form/input') as $input) {
    var_dump($input);
}
If you're not familiar with XPath, I recommend viewing these examples.
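To tie this back to the question, here is a hedged sketch that pulls the form action, the label texts and the input names in one pass. The form markup is hypothetical, and `//form//...` is used so that inputs wrapped in other elements are still found:

```php
<?php
// Hypothetical form markup standing in for the fetched page.
$content = '<html><body><form action="/login.php" method="post">'
    . '<label for="user">Username</label><input type="text" name="user" id="user">'
    . '<label for="pass">Password</label><input type="password" name="pass" id="pass">'
    . '</form></body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);

// evaluate() with string() returns a plain string rather than a node list.
$action = $xpath->evaluate('string(//form/@action)');

// Text of every label inside the form.
$labels = [];
foreach ($xpath->query('//form//label') as $label) {
    $labels[] = trim($label->textContent);
}

// Value of every name attribute on inputs inside the form.
$names = [];
foreach ($xpath->query('//form//input/@name') as $attr) {
    $names[] = $attr->nodeValue;
}

print_r(['action' => $action, 'labels' => $labels, 'names' => $names]);
```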
I'm building a PHP script to transfer selected contents of an XML file to an SQL database.
One of the hardcoded XML contents is formatted like this:
<visualURL>
id=18144083|img=http://upload.wikimedia.org/wikipedia/en/8/86/Holyrollernovacaine.jpg
</visualURL>
And I'm looking for a way to just get the contents of the URL (all text after img=).
$Image = $xpath->query("substring-after(/Playlist/PlaylistEntry[1]/visualURL[1]/text(), 'img=')", $element)->item(0)->nodeValue;
This produces a "property of non-object" error in my PHP output.
There must be another way to just extract the URL contents using XPath that I want, no?
Any help would be greatly appreciated!
EDIT:
Here is the minimum code
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->loadXML('<Playlist>
<PlaylistEntry>
<visualURL>
id=12582194|img=http://upload.wikimedia.org/wikipedia/en/9/96/Sometime_around_midnight.jpg
</visualURL>
</PlaylistEntry>
</Playlist>');
$xpath = new DOMXpath($xmlDoc);
$elements = $xpath->query("/Playlist/PlaylistEntry[1]");
if (!is_null($elements))
foreach ($elements as $element)
$Image = $xpath->query("substring-after(/Playlist/PlaylistEntry[1]/visualURL[1]/text(), 'img=')", $element)->item(0)->nodeValue;
print "Finished Item: $Image";
?>
EDIT 2:
After some research I believe I must use
$xpath->evaluate
instead of my current use of
$xpath->query
see this link
Same XPath query is working with Google docs but not PHP
I'm not exactly sure how to do this yet, but I will investigate more in the morning. Again, any help would be appreciated.
You're headed in the right direction. Use DOMXPath::evaluate() for XPath expressions that don't return nodes, such as substring-after() (it returns a string, as documented in the linked page). The following code prints the expected output:
$xmlDoc = new DOMDocument();
$xml = <<<XML
<Playlist>
<PlaylistEntry>
<visualURL>
id=12582194|img=http://upload.wikimedia.org/wikipedia/en/9/96/Sometime_around_midnight.jpg
</visualURL>
</PlaylistEntry>
</Playlist>
XML;
$xmlDoc->loadXML($xml);
$xpath = new DOMXpath($xmlDoc);
$elements = $xpath->query("/Playlist/PlaylistEntry");
foreach ($elements as $element) {
$Image = $xpath->evaluate("substring-after(visualURL, 'img=')", $element);
print "Finished Item: $Image <br>";
}
output :
Finished Item: http://upload.wikimedia.org/wikipedia/en/9/96/Sometime_around_midnight.jpg
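Because the URL sits on its own line inside `<visualURL>`, the string returned by substring-after() keeps the trailing whitespace. A small sketch, assuming the same XML as above, that wraps the expression in XPath's normalize-space() to trim it:

```php
<?php
$xml = <<<XML
<Playlist>
<PlaylistEntry>
<visualURL>
id=12582194|img=http://upload.wikimedia.org/wikipedia/en/9/96/Sometime_around_midnight.jpg
</visualURL>
</PlaylistEntry>
</Playlist>
XML;

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xml);
$xpath = new DOMXPath($xmlDoc);

$images = [];
foreach ($xpath->query('/Playlist/PlaylistEntry') as $entry) {
    // The second argument makes "visualURL" relative to the current entry.
    $images[] = $xpath->evaluate(
        "normalize-space(substring-after(visualURL, 'img='))",
        $entry
    );
}

print_r($images);
```

Trimming in XPath keeps the cleanup next to the extraction; calling PHP's trim() on the result of evaluate() would work just as well.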
I am getting a xml response from doing:
$foo = $client->__doRequest (parameters here)
When I echo out $foo I get the XML exactly as I'm told I should. The problem is that I now want to extract some values from the XML. The easiest way I can see to do that is to convert it to a PHP array, from which it is simple to read values, but I am having trouble doing this. I have seen a lot of examples using simplexml_load_string, but all I get is 'Notice: Array to string conversion in'. When I var_dump $foo I get 'string 'xml' '.
What am I doing wrong?
As suggested by @CD001 I persevered with DOMDocument and figured it out in the end with the following code:
$dom = new DOMDocument;
$dom->loadXML($xml);
$things = $dom->getElementsByTagName('chocolate');
// I only had a single result, so I did it this way rather than with a loop
$chocolate = null; // stays null when nothing matches
if ($things->length > 0) {
    $node = $things->item(0);
    $chocolate = $node->nodeValue;
}
else {
    // empty result set
}
echo $chocolate;
bah! JSON is so much nicer...
Use Xpath:
$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXpath($dom);
// get content of the first chocolate element node as a string
$chocolate = $xpath->evaluate('string(//chocolate)');
echo $chocolate;
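For comparison, a hedged sketch of the SimpleXML route mentioned in the question, assuming a hypothetical namespace-free response with a single `<chocolate>` element (a real SOAP envelope would need namespace handling). Casting the element to a string avoids echoing an object or array directly:

```php
<?php
// Hypothetical response body; the real XML from __doRequest() is not shown in the question.
$foo = '<response><chocolate>dark</chocolate><price>2.50</price></response>';

$xml = simplexml_load_string($foo);
if ($xml === false) {
    throw new RuntimeException('Could not parse XML');
}

// Cast to string to get the element's text content.
$chocolate = (string) $xml->chocolate;

// One common trick for getting a plain array: round-trip through JSON.
$asArray = json_decode(json_encode($xml), true);

echo $chocolate, "\n";
print_r($asArray);
```

The JSON round-trip drops attributes' structure and namespaces in less trivial documents, so it is best treated as a convenience for simple, flat responses.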
I am using domDocument hoping to parse this little html code. I am looking for a specific span tag with a specific id.
<span id="CPHCenter_lblOperandName">Hello world</span>
My code:
$dom = new domDocument;
@$dom->loadHTML($html); // the @ is to silence errors from malformed HTML
$dom->preserveWhiteSpace = false;
$nodes = $dom->getElementsByTagName('//span[@id="CPHCenter_lblOperandName"');
foreach($nodes as $node){
echo $node->nodeValue;
}
But for some reason I think something is wrong with either the code or the HTML (how can I tell?):
1. When I count nodes with echo count($nodes); the result is always 1
2. I get no output in the nodes loop
3. How can I learn the syntax of these complex queries?
4. What did I do wrong?
You can simply use getElementById:
$dom->getElementById('CPHCenter_lblOperandName')->nodeValue
or use an XPath selector:
$selector = new DOMXPath($dom);
$list = $selector->query('/html/body//span[@id="CPHCenter_lblOperandName"]');
echo($list->item(0)->nodeValue);
//or
foreach($list as $span) {
$text = $span->nodeValue;
}
Your four-part question gets an answer in three parts:
1. getElementsByTagName does not take an XPath expression; you need to give it a tag name.
2. Nothing is output because no tag would ever match the tag name you provided (see #1).
3. It looks like what you want is XPath, which means you need to create an XPath object; see the PHP docs for more.
Also, a better method of controlling the libxml errors is to use libxml_use_internal_errors(true) (rather than the '@' operator, which will also hide other, more legitimate errors). That would leave you with code that looks something like this:
<?php
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach($xpath->query("//span[@id='CPHCenter_lblOperandName']") as $node) {
echo $node->textContent;
}
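With internal errors enabled, the parser's complaints are buffered rather than lost, which also helps answer the "how can I tell?" part of the question. A sketch, assuming hypothetical markup with a deliberately unclosed `<p>`, that reads the buffer with libxml_get_errors() and then clears it (whether any errors are reported depends on the libxml version):

```php
<?php
// Hypothetical markup; the unclosed <p> is deliberate.
$html = '<html><body><p><span id="CPHCenter_lblOperandName">Hello world</span></body></html>';

// Buffer parser errors instead of emitting PHP warnings; remember the old setting.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

$errors = libxml_get_errors(); // array of LibXMLError objects (may be empty)
libxml_clear_errors();         // empty the buffer so it does not grow
libxml_use_internal_errors($previous); // restore the earlier setting

foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

$xpath = new DOMXPath($dom);
$text = $xpath->evaluate("string(//span[@id='CPHCenter_lblOperandName'])");
echo $text, "\n";
```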