Having a real bugger of an XPath issue. I am trying to match the nodes with a certain value.
Here is an example XML fragment.
http://pastie.org/private/xrjb2ncya8rdm8rckrjqg
I am trying to match a given MatchNumber node value to see if there are two or more. Assuming that this is stored in a variable called $data, I am using the expression below. It's been a while since I've done much XPath, as most things seem to be JSON these days, so please excuse any rookie oversights.
$doc = new DOMDocument;
$doc->load($data);
$xpath = new DOMXPath($doc);
$result = $xpath->query("/CupRoundSpot/MatchNumber[.='1']");
Basically, I need to match every node that has a MatchNumber value of 1 and then determine whether the result length is greater than 1 (i.e. two or more have been found).
Many thanks in advance for any help.
Your XML document has a default namespace: xmlns="http://www.fixtureslive.com/".
You have to register this namespace on the DOMXPath object and use the registered prefix in your query.
$xpath->registerNamespace ('fl' , 'http://www.fixtureslive.com/');
$result = $xpath->query("/fl:ArrayOfCupRoundSpot/fl:CupRoundSpot/fl:MatchNumber[.='1']");
foreach( $result as $e ) {
echo '.';
}
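To answer the "two or more" part directly, the length of the returned node list can be checked after the namespaced query. The following is only a minimal sketch: the original pastie is not reproduced here, so the XML is a hypothetical stand-in that reuses the element names and namespace URI mentioned above.

```php
<?php
// Hypothetical sample data; only the element names and the namespace URI
// are taken from the answer above.
$xml = <<<'XML'
<ArrayOfCupRoundSpot xmlns="http://www.fixtureslive.com/">
  <CupRoundSpot><MatchNumber>1</MatchNumber></CupRoundSpot>
  <CupRoundSpot><MatchNumber>1</MatchNumber></CupRoundSpot>
  <CupRoundSpot><MatchNumber>2</MatchNumber></CupRoundSpot>
</ArrayOfCupRoundSpot>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml); // loadXML() parses a string; load() expects a file path

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('fl', 'http://www.fixtureslive.com/');

$result = $xpath->query("/fl:ArrayOfCupRoundSpot/fl:CupRoundSpot/fl:MatchNumber[.='1']");

// DOMNodeList::length holds the number of matched nodes.
$hasTwoOrMore = $result->length > 1;
var_dump($result->length, $hasTwoOrMore);
```

With this sample, two MatchNumber nodes carry the value 1, so the check is true.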
The following XPath:
/CupRoundSpot[MatchNumber = 1]
Returns all the CupRoundSpot nodes where MatchNumber equals 1. You can then process these nodes further in your PHP code.
Executing:
count(/CupRoundSpot[MatchNumber = 1])
Returns you the total CupRoundSpot nodes found where MatchNumber equals 1.
You have to register the namespace. After that you can use the XPath count() function. An expression like that only works with evaluate(), not with query(): query() can only return node lists, not scalar values.
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$xpath->registerNamespace('fl', 'http://www.fixtureslive.com/');
var_dump(
$xpath->evaluate(
'count(/fl:ArrayOfCupRoundSpot/fl:CupRoundSpot[number(fl:MatchNumber) = 1])'
)
);
Output:
float(2)
DEMO: https://eval.in/130366
To iterate the CupRoundSpot nodes, just use foreach:
$nodes = $xpath->evaluate(
'/fl:ArrayOfCupRoundSpot/fl:CupRoundSpot[number(fl:MatchNumber) = 1]'
);
foreach ($nodes as $node) {
//...
}
I'm trying to understand the basics of DOMXPath in PHP. I have an XML file that starts with what's below.
<?xml version="1.0"?>
<ListFinancialEventsResponse xmlns="http://mws.amazonservices.com/Finances/2015-05-01">
<ListFinancialEventsResult>
<FinancialEvents>
<ShipmentEventList>
<ShipmentEvent>
I'm trying to get the FinancialEvents tags using the below PHP with a few different xpath query attempts but neither works.
$file = file_get_contents('file.xml');
$dom = new DOMDocument();
$dom->loadXML($file);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('m','http://mws.amazonservices.com/Finances/2015-05-01');
$events = $xpath->query('FinancialEvents'); // Attempt 1
$events = $xpath->query('m:FinancialEvents');// Attempt 2
According to the docs, these should return all nodes with name 'FinancialEvents'. I know that it works if I use the below xpath query
$events = $xpath->query('//m:FinancialEvents');
So my question is, why don't my first 2 queries work? Isn't the element <FinancialEvents> also a node of the same name?
Thanks
Not tested but you could try:
$query = '//m:ListFinancialEventsResult/m:FinancialEvents'; // "m" is the prefix registered in the question
$events = $xpath->query($query);
if ($events !== false && $events->length > 0) {
    foreach ($events as $event) {
        echo $event->nodeValue;
    }
}
I suspect the first two queries return an empty node list because a relative path is evaluated against the document node, so you are effectively looking for FinancialEvents at root level. The first attempt also lacks the namespace prefix. If you supplied a context node (the ListFinancialEventsResult DOMNode) as the second argument to query(), then 'm:FinancialEvents' would work.
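The difference between the two contexts can be sketched as follows. The XML is the fragment from the question, closed so that it parses; the variable names $fromRoot, $resultNode and $fromContext are just illustrative.

```php
<?php
// Fragment from the question, with closing tags added so it is well-formed.
$xml = <<<'XML'
<ListFinancialEventsResponse xmlns="http://mws.amazonservices.com/Finances/2015-05-01">
  <ListFinancialEventsResult>
    <FinancialEvents>
      <ShipmentEventList>
        <ShipmentEvent/>
      </ShipmentEventList>
    </FinancialEvents>
  </ListFinancialEventsResult>
</ListFinancialEventsResponse>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('m', 'http://mws.amazonservices.com/Finances/2015-05-01');

// Relative path from the document node: its only child element is the
// root element, so nothing matches.
$fromRoot = $xpath->query('m:FinancialEvents');

// Same relative path, evaluated against ListFinancialEventsResult.
$resultNode  = $xpath->query('//m:ListFinancialEventsResult')->item(0);
$fromContext = $xpath->query('m:FinancialEvents', $resultNode);

printf("%d vs %d\n", $fromRoot->length, $fromContext->length);
```

The second argument of query() sets the context node, which is what makes the short relative path usable.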
I have an XML file which contains this:
<ns1:Response xmlns:ns1="http://example.com/">
<ns1:return>
<ns1:mid>39824</ns1:mid>
<ns1:serverType>4</ns1:serverType>
<ns1:size>5</ns1:size>
</ns1:return>
<ns1:return>....
</ns1:return>
Now I want to get the node value of mid where the value of size is 5. I tried the following code, but it returns no results:
$doc = new DOMDocument();
$doc->load($file);
$xpath = new DOMXPath($doc);
$query = '//Response/return/size[.="5"]/mid';
$entries = $xpath->evaluate($query);
So how can I do that ?
thanks in advance
PHP automatically registers some namespaces from the current context, but it is better not to depend on that: prefixes in a document can change, and a document can even use a default namespace with no prefixes at all.
Best register your own prefix:
$xpath->registerNamespace('e', 'http://example.com/');
In XPath you define location paths with conditions:
Any return node inside a Response node:
//e:Response/e:return
If it has a child size node with the value 5
//e:Response/e:return[e:size = 5]
Get the mid node inside it
//e:Response/e:return[e:size = 5]/e:mid
Cast the first found mid node into a string
string(//e:Response/e:return[e:size = 5]/e:mid)
Complete example:
$xml = <<<'XML'
<ns1:Response xmlns:ns1="http://example.com/">
<ns1:return>
<ns1:mid>39824</ns1:mid>
<ns1:serverType>4</ns1:serverType>
<ns1:size>5</ns1:size>
</ns1:return>
<ns1:return></ns1:return>
</ns1:Response>
XML;
$doc = new DOMDocument();
$doc->loadXml($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('e', 'http://example.com/');
$mid = $xpath->evaluate(
'string(//e:Response/e:return[e:size = 5]/e:mid)'
);
var_dump($mid);
Output:
string(5) "39824"
You can also use the following-sibling axis in this case: get the mid element whose following sibling is a size element with text equal to 5. Rough example:
$query = 'string(//ns1:Response/ns1:return/ns1:mid[following-sibling::ns1:size[text()="5"]])';
Your query is missing the namespace prefixes, and it tries to get a mid child of the size element whose content is 5, whereas mid and size are siblings.
Try this:
$query = '//ns1:Response/ns1:return/ns1:mid[../ns1:size[text()="5"]]';
then, to see the result:
foreach ($entries as $entry) {
echo $entry->nodeValue . "<br />";
}
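For comparison, here is a small self-contained sketch that runs both predicate styles from the answers above (the parent step and the following-sibling axis) against the same data. The second return element with mid 11111 and size 7 is hypothetical and added only to show that non-matching entries are skipped; the prefix is registered explicitly rather than relying on automatic registration.

```php
<?php
$xml = <<<'XML'
<ns1:Response xmlns:ns1="http://example.com/">
  <ns1:return>
    <ns1:mid>39824</ns1:mid>
    <ns1:serverType>4</ns1:serverType>
    <ns1:size>5</ns1:size>
  </ns1:return>
  <ns1:return>
    <ns1:mid>11111</ns1:mid>
    <ns1:size>7</ns1:size>
  </ns1:return>
</ns1:Response>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('ns1', 'http://example.com/');

// mid elements whose parent has a size child with text "5".
$viaParent = $xpath->query(
    '//ns1:Response/ns1:return/ns1:mid[../ns1:size[text()="5"]]'
);
// mid elements followed by a sibling size element with text "5".
$viaSibling = $xpath->query(
    '//ns1:Response/ns1:return/ns1:mid[following-sibling::ns1:size[text()="5"]]'
);

foreach ($viaParent as $entry) {
    echo $entry->nodeValue, "\n";
}
```

The two forms differ only when size can appear before mid: the following-sibling version would miss that case, while the parent-step version would not.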
I'm trying to write a script that grabs the URL of the first image from this website: http://www.slothradio.com/covers/?adv=&artist=pantera&album=vulgar+display+of+power
Here's my script:
$content = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("*/div[#class='album0']/img");
echo '<pre>';print_r($elements);exit;
When I run that, it outputs
DOMNodeList Object
(
)
Even when I change my query to $xpath->query("*/img"), I still get nothing. What am I doing wrong?
DOMDocument::loadHTMLFile() takes a file path or URL, not HTML content; see the documentation:
http://php.net/manual/en/domdocument.loadhtmlfile.php
Use
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
To output the elements, use
var_dump(iterator_to_array($elements));
//Or
print_r(iterator_to_array($elements));
Thanks
:)
What am I doing wrong?
You are using print_r, but DOMNodeList does not offer any output for that function (because it's an internal class). You can start with outputting the number of items for example. In the end you need to iterate over the node list and deal with each node on your own.
printf("Found %d element(s).\n", $elements->length);
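Separately from the output problem, the expression in the question may be worth a second look: XPath tests an attribute with @class rather than #class, and a path starting with */div only considers div elements directly below the root element. A minimal sketch, assuming markup shaped like the snippet below (the real page's structure is not shown in the question, so this HTML is hypothetical):

```php
<?php
// Hypothetical markup standing in for the real page.
$html = '<html><body><div class="album0"><img src="cover.jpg"></div></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// "//" searches the whole document; "@class" tests the class attribute.
$elements = $xpath->query("//div[@class='album0']/img");

printf("Found %d element(s).\n", $elements->length);
if ($elements->length > 0) {
    // The image URL is the value of the src attribute.
    echo $elements->item(0)->getAttribute('src'), "\n";
}
```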
I am parsing an HTML page with DOM and XPath in PHP.
I have to fetch a nested <Table...></table> from the HTML.
I have defined a query using FirePath in the browser which is pointing to
html/body/table[2]/tbody/tr/td[2]/table[2]/tbody/tr/td/table
When I run the code it says DOMNodeList is fetched having length 0. My objective is to spout out the queried <Table> as a string. This is an HTML scraping script in PHP.
Below is the function. How can I extract the required <table>?
$pageUrl = "http://www.boc.cn/sourcedb/whpj/enindex.html";
getExchangeRateTable($pageUrl);
function getExchangeRateTable($url){
    $htmlTable = "";
    $xPathTable = null;
    $xPathQuery1 = "html/body/table[2]/tbody/tr/td[2]/table[2]/tbody/tr/td/table";
    if(strlen($url)==0){die('Argument exception: method call [getExchangeRateTable] expects a string of URL!');}
    // initialize objects
    $page = tidyit($url);
    $dom = new DOMDocument();
    $dom->loadHTML($page);
    $xpath = new DOMXPath($dom);
    // $elements is appearing as DOMNodeList
    $elements = $xpath->query($xPathQuery1);
    // print_r($elements);
    foreach($elements as $e){
        $e->firstChild->nodeValue;
    }
}
Have you tried it like this?
$dom = new DOMDocument;
$dom->loadHTML($page); // $page: the HTML string, as in the question
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName("table");
$rows = $tables->item(0)->getElementsByTagName("tr");
print_r($rows);
Remove the tbody elements from your XPath query. In most cases they are inserted by the browser and are not in the HTML source, which is the case for the page you are trying to scrape.
/html/body/table[2]/tr/td[2]/table[2]/tr/td/table
This will most likely work.
However, it is probably safer to use a different XPath. The following expression selects the th by its text content, then steps up to the tr and on to its parent, which is either a tbody or the table:
//th[contains(text(),'Currency Name')]/parent::tr/parent::*
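Since the goal in the question is to get the table as a string, here is a rough sketch of that last step. The markup is hypothetical (the real page is not reproduced here), and the expression is a variation on the one above: it uses ancestor::table[1] to reach the nearest enclosing table whether or not a tbody is present.

```php
<?php
// Hypothetical markup; note that the source HTML contains no <tbody>.
$html = '<html><body><table><tr><td>'
      . '<table><tr><th>Currency Name</th><th>Buying Rate</th></tr>'
      . '<tr><td>USD</td><td>1.00</td></tr></table>'
      . '</td></tr></table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// From the header cell, step up to the nearest enclosing table.
$tables = $xpath->query("//th[contains(text(),'Currency Name')]/ancestor::table[1]");

$htmlTable = '';
if ($tables->length > 0) {
    // saveHTML() with a node argument serializes just that subtree.
    $htmlTable = $dom->saveHTML($tables->item(0));
}
echo $htmlTable, "\n";
```

Anchoring on the header text rather than on positional indexes such as table[2] should make the query less sensitive to layout changes on the page.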
The XPath query should start with a leading /, like this:
/html/...
I am loading HTML into DOM and then querying it using XPath in PHP. My current problem is how do I find out how many matches have been made, and once that is ascertained, how do I access them?
I currently have this dirty solution:
$i = 0;
foreach($nodes as $node) {
echo $dom->savexml($nodes->item($i));
$i++;
}
Is there a cleaner way to find the number of nodes? I have tried count(), but that does not work.
You haven't posted any code related to $nodes so I assume you are using DOMXPath and query(), or at the very least, you have a DOMNodeList.
DOMXPath::query() returns a DOMNodeList, which has a length member. You can access it via (given your code):
$nodes->length
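Put together, the loop from the question could be tidied up roughly like this. The markup and the //p query are hypothetical and serve only to produce a node list; foreach already hands over each node, so the manual counter and item($i) are not needed.

```php
<?php
// Hypothetical markup, used only to have something to query.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>one</p><p>two</p></body></html>');
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//p');

// Number of matches.
echo "Matches: {$nodes->length}\n";

// Serialize each matched node without a manual index.
$fragments = [];
foreach ($nodes as $node) {
    $fragments[] = $dom->saveXML($node);
}
echo implode("\n", $fragments), "\n";
```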
If you just want to know the count, you can also use DOMXPath::evaluate.
Example from PHP Manual:
$doc = new DOMDocument;
$doc->load('book.xml');
$xpath = new DOMXPath($doc);
$tbody = $doc->getElementsByTagName('tbody')->item(0);
// our query is relative to the tbody node
$query = 'count(row/entry[. = "en"])';
$entries = $xpath->evaluate($query, $tbody);
echo "There are $entries english books\n";