PHP XPath ends-with [duplicate]


This question already has an answer here:
Closed 11 years ago.
The community is reviewing whether to reopen this question as of 6 days ago.
Possible Duplicate:
How to use XPath function in a XPathExpression instance programatically?
I'm trying to find all of the rows of a nested table that contain an image with an id that ends with '_imgProductImage'.
I'm using the following query:
"//tr[/td/a/img[ends-with(@id,'_imgProductImage')]"
I'm getting the error: xmlXPathCompOpEval: function ends-with not found
My Google searches suggest that this should be a valid query/function. What's the actual function I'm looking for, if it's not "ends-with"?

From the answer to "How to use XPath function in a XPathExpression instance programatically?":
One can easily construct an XPath 1.0 expression, the evaluation of which produces the same result as the function ends-with():
$str2 = substring($str1, string-length($str1)- string-length($str2) +1)
produces the same boolean result (true() or false()) as:
ends-with($str1, $str2)
So for your example, the following XPath should work:
//tr[td/a/img['_imgProductImage' = substring(@id, string-length(@id) - 15)]]
You will probably want to add a comment saying that this is an XPath 1.0 reformulation of ends-with().
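As a rough sketch of how that reformulation could be used from PHP, the snippet below builds the substring() test for an arbitrary suffix and runs it with DOMXPath. The helper name xpath_ends_with and the sample markup are made up for illustration, and the helper assumes an ASCII suffix that contains no single quote.

```php
<?php
// Hypothetical helper (not part of PHP or libxml): returns an XPath 1.0
// expression that is true when $expr ends with $suffix.
// Assumes $suffix is ASCII and contains no single quote.
function xpath_ends_with(string $expr, string $suffix): string
{
    // substring() is 1-based, so the suffix starts at length - strlen + 1.
    return sprintf(
        "substring(%s, string-length(%s) - %d) = '%s'",
        $expr,
        $expr,
        strlen($suffix) - 1,
        $suffix
    );
}

// Made-up sample markup standing in for the real page.
$html = '<table>
<tr><td><a href="#"><img id="ctl00_imgProductImage" src="a.png"></a></td></tr>
<tr><td><a href="#"><img id="ctl00_imgOther" src="b.png"></a></td></tr>
</table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Relative path (td/a/img) so the predicate is evaluated per row.
$rows = $xpath->query('//tr[td/a/img[' . xpath_ends_with('@id', '_imgProductImage') . ']]');
echo $rows->length, "\n";
```

For the suffix '_imgProductImage' (16 characters) the helper produces the same "- 15" offset used in the expression above.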

It seems that ends-with() is an XPath 2.0 function.
DOMXPath only supports XPath 1.0.
Edit after the comment: in your case, I suppose you'll have to:
Find all images using a simpler XPath query that returns more images than you want, but includes those you want to keep.
Loop over those, testing in PHP, for each one, whether the id attribute (see the getAttribute method) matches what you want.
To test whether the attribute is OK, you could use something like this in the loop that iterates over the images:
$id = $currentNode->getAttribute('id');
if (preg_match('/_imgProductImage$/', $id)) {
    // the current node is OK ;-)
}
Note that, in my regex pattern, I used a $ to indicate end of string.
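Put together, the two steps might look like the sketch below. The sample markup and the variable names are assumptions made for illustration, and the walk from img up to tr assumes the tr/td/a/img nesting described in the question.

```php
<?php
// Made-up sample markup standing in for the real page.
$html = '<table>
<tr><td><a href="#"><img id="ctl00_imgProductImage" src="a.png"></a></td></tr>
<tr><td><a href="#"><img id="ctl00_imgOther" src="b.png"></a></td></tr>
</table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$rows = [];
// Step 1: a broad XPath 1.0 query that returns every candidate image.
foreach ($xpath->query('//tr/td/a/img[@id]') as $img) {
    // Step 2: do the ends-with test in PHP instead of XPath.
    if (preg_match('/_imgProductImage$/', $img->getAttribute('id'))) {
        // Climb img -> a -> td -> tr to get the containing row.
        $rows[] = $img->parentNode->parentNode->parentNode;
    }
}
echo count($rows), "\n";
```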

There is no ends-with function in XPath 1.0, but you can fake it:
"//tr[td/a/img[substring(@id, string-length(@id) - 15) = '_imgProductImage']]"

If you're on PHP 5.3.0 or later, you can use registerPHPFunctions to call any PHP function you want, although the syntax is a little odd. For example:
$xpath = new DOMXPath($document);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions("ends_with");
// The name passed to php:function must match the PHP function name exactly.
$nodes = $xpath->query("//tr[td/a/img[php:function('ends_with', @id, '_imgProductImage')]]");
// A node-set argument such as @id arrives as an array of DOM nodes.
function ends_with($node, $value) {
    return isset($node[0]) && substr($node[0]->nodeValue, -strlen($value)) == $value;
}

Related

Can't get picture url by Parsing [duplicate]

This question already has answers here:
Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?
(2 answers)
Closed 8 years ago.
I'm building a script that gives me a product array by parsing HTML from a list of websites.
I believe I'm doing everything right, but for some reason I'm having a lot of difficulty with just one website, Makita.ca.
I'm using DOMXPath to retrieve elements, and I'm running it against the raw HTML that I get from makita.ca.
The pictures I want are the ones on the left of the page.
Please also note that the only thing I need is the link to each image, not the actual image.
The page in question is at http://www.makita.ca/index2.php?event=tool&id=100
$productArray = array();
$Dom = new DOMDocument();
@$Dom->loadHTML($this->html);
$xpath = new DOMXPath($Dom);
echo $xpath->query('//*[@id="content_other"]/table[2]/tbody/tr/td[1]/table/tbody/tr[4]/td/table/tbody/tr[1]/td/div/a/img')->length;
if($xpath->query('//*[@id="content_other"]/table[2]/tbody/tr/td[1]/table/tbody/tr[4]/td/table')->length > 0)
{
    for($i=0;$i<$xpath->query('//*[@id="content_other"]/table[2]/tbody/tr/td[1]/table/tbody/tr[4]/td/table/tbody/tr')->length;$i++)
    {
        if($xpath->query('//*[@id="content_other"]/table[2]/tr/td[1]/table/tr[4]/td/table/tr['.$i.']/td/div/a/img') > 0)
            $productArray['picture'][] = $xpath->query('//*[@id="content_other"]/table[2]/tr/td[1]/table/tr[4]/td/table/tr['.$i.']/td/div/a/img')->item(0)->nodeValue;
    }
}
Do you see what my mistake is? I'm really lost now.
Edit:
OK, for test purposes I am echoing the length of the query() result, which should tell me how many elements match the query.
I retyped the whole query by hand so that it can't contain any non-ASCII characters:
'//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td/table//tr[1]/td/div/a/img'
The result is 0.
So I removed the end of the query part by part:
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td/table//tr[1]/td/div/a = 0
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td/table//tr[1]/td/div = 0
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td/table//tr[1]/td = 0
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td/table//tr[1] = 0
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td/table = 0
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td = 0
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr = 5
Now I get some matching elements!
OK, let's try the last element, which is the one I need.
Since it is zero-based, to get tr number 5 I need to enter this path:
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]
But I still get 0, so I don't know what to do any more.

//div[@class='product_heading']/ancestor-or-self::table[1]//a/img first selects the "Action Shots" block, then all the images found under that block.
This XPath expression will be more reliable than yours because it uses few positional expressions, which tend to break easily as the markup changes.
//div[@class='product_heading']/ancestor-or-self::table[1]//a[@rel='thumbnail']/img would be even more robust.
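A minimal sketch of running that expression with DOMXPath and collecting the image links follows. The markup is a simplified stand-in invented for illustration (the real page is more deeply nested), and the link is read from the src attribute because an img element has no text content for nodeValue to return.

```php
<?php
// Simplified, made-up markup; the real makita.ca page differs.
$html = <<<HTML
<html><body>
<table>
  <tr><td><div class="product_heading">Action Shots</div></td></tr>
  <tr><td><a rel="thumbnail" href="#"><img src="/images/shot1.jpg"></a></td></tr>
  <tr><td><a rel="thumbnail" href="#"><img src="/images/shot2.jpg"></a></td></tr>
</table>
</body></html>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$query = "//div[@class='product_heading']/ancestor-or-self::table[1]//a[@rel='thumbnail']/img";

$pictures = [];
foreach ($xpath->query($query) as $img) {
    // The link lives in the src attribute, not in the node's text.
    $pictures[] = $img->getAttribute('src');
}
print_r($pictures);
```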

How to read this kind of XML with PHP simpleXML [duplicate]

This question already has answers here:
Simple XML - Dealing With Colons In Nodes
(4 answers)
Closed 9 years ago.
I've been trying to use SimpleXML, but it doesn't seem to like XML that looks like this:
<xhtml:div>sample <xhtml:em>italic</xhtml:em> text</xhtml:div>
So what library will handle tags that look like that (have a colon in them)?

Say you have some xml like this.
<xhtml:div>
<xhtml:em>italic</xhtml:em>
<date>2010-02-01 06:00</date>
</xhtml:div>
You can access 'em' like this: $xml->children('xhtml', true)->div->em;
However, if you want the date field, $xml->children('xhtml', true)->div->date; won't work, because you are stuck in the xhtml namespace.
You must call children() again to get back to the default namespace:
$xml->children('xhtml', true)->div->children()->date;
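Here is a small self-contained sketch of that access pattern. The wrapping root element and the xmlns:xhtml declaration are assumptions added so that the sample parses on its own; the original snippet shows neither.

```php
<?php
// The <root> wrapper and the namespace URI are assumptions for this sketch.
$source = <<<XML
<root xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xhtml:div>
    <xhtml:em>italic</xhtml:em>
    <date>2010-02-01 06:00</date>
  </xhtml:div>
</root>
XML;

$xml = simplexml_load_string($source);

// Select children by prefix; the second argument marks 'xhtml' as a prefix, not a URI.
$div = $xml->children('xhtml', true)->div;

// Still inside the xhtml namespace, so the prefixed em is reachable directly.
$em = (string) $div->em;

// date has no namespace, so switch back with a bare children() call.
$date = (string) $div->children()->date;

echo $em, "\n", $date, "\n";
```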

If you want to fix it quickly do this (I do when I feel lazy):
// Will replace : in tags and attributes names with _ allowing easy access
$xml = preg_replace('~(</?|\s)([a-z0-9_]+):~is', '$1$2_', $xml);
This will convert <xhtml: to <xhtml_ and </xhtml: to </xhtml_.
It's kind of hacky and can fail when CDATA blocks containing namespaced XML, or Unicode tag names, are involved, but I'd say you are usually safe using it (it hasn't failed me yet).

Colon denotes an XML namespace. The DOM has good support for namespaces.

I don't think it's a good idea to get rid of the colon or to replace it with something else as some people suggested. You can easily access elements that have a namespace prefix. You can either pass the URL that identifies the namespace as an argument to the children() method or pass the namespace prefix and "true" to the children() method. The second approach requires PHP 5.2 and up.
SimpleXMLElement::children

Find a node with xpath

I'd like to parse the Google geocode API response, but the structure of the result is not always the same. I need to know the postal code, for example, but it is sometimes in the Locality/DependentLocality/PostalCode/PostalCodeNumber node and sometimes in the Locality/PostalCode/PostalCodeNumber node. I don't really know the logic behind this; I just want to get the value of the PostalCodeNumber node, no matter where exactly it is. Can I do it with XPath? If so, how?
UPDATE
I tried //PostalCodeNumber, but it returns an empty array. The code snippet is the following:
$xml = new \SimpleXMLElement($response);
var_dump($xml->xpath('//PostalCodeNumber'));
The $response is the content of http://maps.google.com/maps/geo?q=1055+Budapest&output=xml
(copy paste the url instead of clicking on it because of some character problems...)

Try this XPath:
Locality//PostalCodeNumber
It will find all PostalCodeNumber descendants of a Locality element.

//PostalCode/PostalCodeNumber
Should do the trick. A quick google search yields the following schema snippet, indicating that there may be multiple DependentLocality elements, nested, so you'll want to check for multiple results, and have some idea of whether you want the most specific (most deeply nested) or least specific.
Update:
To guard against namespace issues, explicitly add the namespace to the query:
$xml = new SimpleXMLElement($response);
$xml->registerXPathNamespace('ns', 'urn:oasis:names:tc:ciq:xsdschema:xAL:2.0');
var_dump($xml->xpath('//ns:PostalCodeNumber'));
Update 2: fixed a couple of typos
Update 3:
<?php
$result = file_get_contents('http://maps.google.com/maps/geo?q=1055+Budapest&output=xml');
$sxe = new SimpleXMLElement($result);
$sxe->registerXPathNamespace('c', 'urn:oasis:names:tc:ciq:xsdschema:xAL:2.0');
$search = $sxe->xpath('//c:PostalCodeNumber');
foreach($search as $code) {
echo $code;
}
?>
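Since that example depends on a live URL, here is an offline sketch of the same idea. The inline XML is a cut-down guess at the response shape, invented for illustration; only the xAL namespace URI comes from the answer above. It shows the unprefixed query matching nothing while the prefixed one finds the node.

```php
<?php
// Cut-down, made-up response; only the xAL namespace URI is taken from the answer.
$response = <<<XML
<kml xmlns="http://earth.google.com/kml/2.0">
  <Response>
    <Placemark>
      <AddressDetails xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0">
        <Country>
          <Locality>
            <PostalCode><PostalCodeNumber>1055</PostalCodeNumber></PostalCode>
          </Locality>
        </Country>
      </AddressDetails>
    </Placemark>
  </Response>
</kml>
XML;

$sxe = new SimpleXMLElement($response);

// Without a prefix, XPath 1.0 only matches elements in no namespace.
$unprefixed = $sxe->xpath('//PostalCodeNumber');

// Bind a prefix of our choosing ('c') to the namespace URI, then use it in the query.
$sxe->registerXPathNamespace('c', 'urn:oasis:names:tc:ciq:xsdschema:xAL:2.0');
$codes = array_map('strval', $sxe->xpath('//c:PostalCodeNumber'));

print_r($codes);
```

Because the query starts with //, it matches PostalCodeNumber whether or not a DependentLocality level sits in between.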

Is there any faster/better way instead of using preg_match in the following code? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How can I edit my code to echo the data of child's element where my search term was found in, in XMLReader?
This code checks whether the string 2004 occurs in <date_iso></date_iso> and, if so, echoes some data from the element in which the search string was found.
I was wondering whether this is the best/fastest approach, because my main concern is speed and the XML file is huge. Thank you for your ideas.
This is a sample of the XML:
<entry ID="4406">
<id>4406</id>
<title>Book Look Back at 2002</title>
<link>http://www.sebastian-bergmann.de/blog/archives/33_Book_Look_Back_at_2002.html</link>
<description></description>
<content_encoded></content_encoded>
<dc_date>20.1.2003, 07:11</dc_date>
<date_iso>2003-01-20T07:11</date_iso>
<blog_link/>
<blog_title/>
</entry>
This is the code:
<?php
$books = simplexml_load_file('planet.xml');
$search = '2004';
foreach ($books->entry as $entry) {
if (preg_match('/' . preg_quote($search) . '/i', $entry->date_iso)) {
echo $entry->dc_date;
}
}
?>
This is another approach:
<?php
$books = simplexml_load_file('planet.xml');
$search = '2004';
$regex = '/' . preg_quote($search) . '/i';
foreach ($books->entry as $entry) {
if (preg_match($regex, $entry->date_iso)) {
echo $entry->dc_date;
}
}
?>

If your main concern is speed, you shouldn't use SimpleXML or any other DOM-based XML parsing for this; use a SAX-based parser. Furthermore, don't use preg_match if you only want to do simple substring matching (use strpos).
If speed isn't really your concern but being idiomatic is, use an XPath 2.0 implementation (I don't know if there is one for PHP) or do other XPath-based regex matching; a quick Google search shows EXSLT options, or simpler XPath 1.0-based string matching options.
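One way to follow the streaming advice in PHP is XMLReader, which is a pull parser rather than a true SAX parser but likewise avoids loading the whole file. The sketch below is only an illustration: the second entry and its dates are invented, and the XML is read from a string here, whereas a real script would open the file (for example with $reader->open('planet.xml')).

```php
<?php
// Made-up sample data; the second entry is invented for the sketch.
$source = <<<XML
<entries>
  <entry ID="4406">
    <dc_date>20.1.2003, 07:11</dc_date>
    <date_iso>2003-01-20T07:11</date_iso>
  </entry>
  <entry ID="5001">
    <dc_date>5.3.2004, 10:00</dc_date>
    <date_iso>2004-03-05T10:00</date_iso>
  </entry>
</entries>
XML;

$search = '2004';
$matches = [];

$reader = new XMLReader();
$reader->XML($source);

while ($reader->read()) {
    // Only one <entry> at a time is materialised as a SimpleXML object.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'entry') {
        $entry = new SimpleXMLElement($reader->readOuterXml());
        // Plain substring search instead of a regular expression.
        if (strpos((string) $entry->date_iso, $search) !== false) {
            $matches[] = (string) $entry->dc_date;
        }
    }
}
$reader->close();

print_r($matches);
```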

preg_match is a regular expression function. If you only need simple string comparisons, it's usually recommended not to use regular expressions for them.
An alternative to your use of preg_match would be to compare the beginning of the <date_iso> contents against the year:
if ($search === substr($entry->date_iso, 0, 4))
as the date is always in the same format (hopefully) and starts with the year. You could also add the - to the search string and then compare against the first 5 characters.
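The prefix comparison, including the trailing "-" variant, could be sketched like this. The sample entries are invented, and strncmp is used here as one possible way to compare only the leading characters; it is a substitution for the substr call shown above, not the answer's own code.

```php
<?php
// Made-up sample data; the second entry is invented for the sketch.
$source = <<<XML
<entries>
  <entry ID="4406"><dc_date>20.1.2003, 07:11</dc_date><date_iso>2003-01-20T07:11</date_iso></entry>
  <entry ID="5001"><dc_date>5.3.2004, 10:00</dc_date><date_iso>2004-03-05T10:00</date_iso></entry>
</entries>
XML;

$books = simplexml_load_string($source);
$search = '2004-';   // year plus the dash, compared against the first 5 characters
$found = [];

foreach ($books->entry as $entry) {
    // strncmp() returns 0 when the first strlen($search) bytes are identical.
    if (strncmp((string) $entry->date_iso, $search, strlen($search)) === 0) {
        $found[] = (string) $entry->dc_date;
    }
}

print_r($found);
```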

Get a single element with PHP and XPath

There are lots of tutorials around the net, but none of them explains this to me:
How do I select a single element (in a table, for example), given its absolute XPath?
Example:
I have this:
/html/body/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[3]/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[4]/td[5]/span
What's the PHP function to get the text of that element?
I really could not find an answer. I found lots of guides and hints on getting all the elements of a table, all the buttons of a form, etc., but not what I need.
Thank you.

$xml = simplexml_load_string($html_content_string);
$arr = $xml->xpath("//body/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[3]/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[4]/td[5]/span");
var_dump($arr);

Load your HTML document into a DOM object, then make a DOMXPath object from it and let it evaluate your query string.
It's all described in detail here: http://php.net/manual/en/book.dom.php
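A minimal sketch of those steps, using a tiny invented document and a much shorter absolute path than the one in the question. Note the assumption baked into the path: it contains no tbody step, because DOMDocument only sees tbody when it is actually present in the source markup.

```php
<?php
// Tiny made-up document; the real page and path are far longer.
$html = '<html><body><table>'
      . '<tr><td>a</td><td><span>first</span></td></tr>'
      . '<tr><td>b</td><td><span>target</span></td></tr>'
      . '</table></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML often triggers parser warnings
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// query() always returns a list, even for a single match.
$nodes = $xpath->query('/html/body/table/tr[2]/td[2]/span');

// Take the first node, if any, and read its text.
$text = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

echo $text, "\n";
```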
