Update
To avoid deleting this question: after a few comments I realized that PHP supports only XPath 1.0, while I was trying to use functions from XPath 2.0.
Thanks for your feedback. I hope some high-reputation users can tell me whether it's better to delete the question or leave it with this update.
I have been searching for a solution to this problem, but none of the ones posted worked for me. I'm using DOMXPath in PHP, and I have used the Template Tester for XPath 2.0 to test my queries: http://videlibri.sourceforge.net/cgi-bin/xidelcgi
This is the setup code I start from:
$dom_html = new DOMDocument();
libxml_use_internal_errors(true);
$dom_html->loadHTMLFile($file);
libxml_clear_errors();
$xpath = new DOMXPath($dom_html);
Then I tried this query in the XPath 2.0 tester:
//table/tbody/tr/td/div/count(table)
This query gives me the number of tables that each div contains, which is exactly what I need:
11
2
1
1
2
14
4
19
4
4
3
2
9
16
But when I tried to do the same in PHP, I did not get those numbers. I have tried the following solutions:
$quantity = $xpath->evaluate('count(//table/tbody/tr/td/div/table)');
But this gives me the total count across all divs, not the per-div counts I want.
$quantity = $xpath->query('//table/tbody/tr/td/div/count(table)');
With this query, I tried two different ways of reading the result, but neither of them works:
1)
foreach ($quantity as $content)
{
echo $content->nodeValue;
}
2)
foreach ($quantity as $content)
{
echo $content->textContent;
}
Thanks
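Since DOMXPath only implements XPath 1.0, a path step cannot end in a function call such as count(table). A minimal sketch of one workaround, assuming a small inline HTML sample (hypothetical markup, not the real page) in place of $file: select the div elements first, then evaluate count(table) relative to each one by passing the div as the context node.

```php
<?php
// Hypothetical sample markup standing in for the real file.
$html = '<table><tbody><tr><td>
  <div><table></table><table></table></div>
  <div><table></table></div>
</td></tr></tbody></table>';

$dom_html = new DOMDocument();
libxml_use_internal_errors(true);
$dom_html->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom_html);

$counts = [];
foreach ($xpath->query('//table/tbody/tr/td/div') as $div) {
    // The second argument makes count(table) relative to this div.
    // evaluate() returns a float for count(), hence the cast.
    $counts[] = (int) $xpath->evaluate('count(table)', $div);
}

// One number per div, in document order.
echo implode("\n", $counts), "\n";
```

The same loop should work with loadHTMLFile($file); only the sample markup above is invented.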
Until now I was using the PHP REST API to send requests with Cypher queries and get a response back. The response is a huge string, which makes it difficult to parse, and it cannot be converted to JSON.
I have now installed Neo4jPHP and I am trying to figure out how to write the same query I had in Cypher.
This is my query:
MATCH (n:RealNode)-[r:contains*]-(z) WHERE n.gid='123' RETURN n,z;
What I actually want is a list of the names of all the nodes (name is a property of each node) that are related to my n node. How do I do this?
I cannot find many examples for Neo4jPHP online, and the ones I found do not seem to work. I downloaded the latest version from here (https://github.com/jadell/neo4jphp).
Thanks
D.
RE-EDITED
I tried this query in the Neo4j server:
MATCH (n)-[r:KNOWS*]-(z) WHERE n.name='Arthur Dent' AND z.name='Ford Prefect' RETURN n,z,r;
and I get all 3 nodes that are connected to each other. The same query through Neo4jPHP returns only the name of one node. Why is this happening?
$querystring="MATCH path=(n:RealNode {gid:'58731'})-[:contains*]-(z) RETURN [x in nodes(path) | x.id] as names";
$query=new Everyman\Neo4j\Cypher\Query($client,$querystring);
$result=$query->getResultSet();
print_r($result);
foreach($result as $row){
echo $row['x']->getProperty('name') . "\n";
}
At the Cypher level you might use a query like:
MATCH path=(n {name:'Arthur Dent'})-[:KNOWS*]-(z {name:'Ford Prefect'})
RETURN [x in nodes(path) | x.name] as names
You assign a variable to a pattern, here path. In the RETURN clause you iterate over all nodes along that path and extract each node's name property.
Two additional hints:
Consider assigning labels, e.g. Person, to your nodes and use a declarative index ( CREATE INDEX ON :Person(name) ) to speed up the lookup of the start and end nodes of your query.
For variable-length path matches such as [:KNOWS*], consider using an upper limit; depending on the size and structure of your graph, an unbounded match can get rather expensive. Use [:KNOWS*..10] to limit the match to at most 10 hops.
After lots of trying and some help from Stefan Armbruster I made it work using Neo4jPHP.
This is how it looks:
$client = new Everyman\Neo4j\Client();
$querystring="MATCH path=(n {gid:'58731'})-[:contains*]-(z) RETURN LAST([x in nodes(path) | x.id]) as names";
$query=new Everyman\Neo4j\Cypher\Query($client,$querystring);
$result=$query->getResultSet();
$resultArray = array();
foreach($result as $resultItem){
// the column is aliased "names" in the RETURN clause
$resultArray[] = $resultItem['names'];
}
print_r($resultArray); // prints the array
Neo4jPHP is a very handy tool, but it is not very popular: the community is small and there are few examples online. I hope this helps someone.
This question already has answers here:
Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?
I'm building a script that gives me a product array by parsing HTML from a list of websites.
I believe I'm doing everything right, but for some reason I'm having a lot of difficulty with just one website, Makita.ca.
I'm using DOMXPath to retrieve elements, and I'm working from the raw HTML that I get from makita.ca.
The pictures I want are the ones on the left of the page.
Please note that the only thing I need is the link to each image, not the actual image.
The page in question is at http://www.makita.ca/index2.php?event=tool&id=100
$productArray = array();
$Dom = new DOMDocument();
@$Dom -> loadHTML($this->html);
$xpath = new DOMXPath($Dom);
echo $xpath -> query('//*[@id="content_other"]/table[2]/tbody/tr/td[1]/table/tbody/tr[4]/td/table/tbody/tr[1]/td/div/a/img')->length;
if($xpath -> query('//*[@id="content_other"]/table[2]/tbody/tr/td[1]/table/tbody/tr[4]/td/table')->length > 0)
{
for($i=0;$i<$xpath->query('//*[@id="content_other"]/table[2]/tbody/tr/td[1]/table/tbody/tr[4]/td/table/tbody/tr')->length;$i++)
{
if($xpath->query('//*[@id="content_other"]/table[2]/tr/td[1]/table/tr[4]/td/table/tr['.$i.']/td/div/a/img') > 0)
$productArray['picture'][] = $xpath -> query('//*[@id="content_other"]/table[2]/tr/td[1]/table/tr[4]/td/table/tr['.$i.']/td/div/a/img')->item(0)->nodeValue;
}
}
Do you see what my mistake is? I'm really lost now.
Edit:
For testing purposes I am echoing the length of the query() result, which should tell me how many elements match the query.
I retyped the whole query by hand so that it cannot contain any non-ASCII characters:
'//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td/table//tr[1]/td/div/a/img'
The result is 0.
So I removed the end of the query part by part:
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td/table//tr[1]/td/div/a = 0
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td/table//tr[1]/td/div = 0
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td/table//tr[1]/td = 0
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td/table//tr[1] = 0
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td/table = 0
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]/td = 0
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr = 5
Finally, some elements match here!
OK, let's try the last element, which is the one I need.
Since I assume the index is zero-based, to get tr number 5 I need to enter this path:
//*[@id="content_other"]/table[2]//tr/td[1]/table//tr[4]
But I still get 0, so I don't know what to do any more.
//div[@class='product_heading']/ancestor-or-self::table[1]//a/img first locates the "Action Shots" heading, then selects all the images found in the table that encloses it.
This XPath expression will be more reliable than yours because it uses far fewer positional steps, which tend to break easily when the markup changes.
//div[@class='product_heading']/ancestor-or-self::table[1]//a[@rel='thumbnail']/img would be even more robust.
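One likely reason the positional query matches in a browser but not in PHP is that browsers insert tbody elements into the DOM, while the libxml2 parser behind DOMDocument keeps the markup as written. The sketch below illustrates this on made-up markup (the structure, rel values and image paths are assumptions, not the real Makita page) and then uses the attribute-based expression to collect the image links.

```php
<?php
// Hypothetical markup; note that the source contains no <tbody>.
$html = '<div id="content_other"><table><tr><td>
  <div class="product_heading">Action Shots</div>
  <a rel="thumbnail" href="#"><img src="/images/a.jpg"></a>
  <a rel="thumbnail" href="#"><img src="/images/b.jpg"></a>
</td></tr></table></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// A path copied from a browser inspector expects tbody and finds nothing here.
$withTbody = $xpath->query('//*[@id="content_other"]/table/tbody/tr')->length;
// The descendant axis (//) does not care whether tbody is present.
$withoutTbody = $xpath->query('//*[@id="content_other"]/table//tr')->length;

$links = [];
$images = $xpath->query(
    "//div[@class='product_heading']/ancestor-or-self::table[1]//a[@rel='thumbnail']/img"
);
foreach ($images as $img) {
    // An img element has no text content; the link lives in its src attribute.
    $links[] = $img->getAttribute('src');
}

echo $withTbody, ' ', $withoutTbody, "\n";
print_r($links);
```

Two further points apply to the original loop: XPath positions start at 1, so tr[4] is the fourth row and tr[0] never matches, and nodeValue of an img element is empty, so getAttribute('src') is needed to read the link.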
Possible Duplicate:
PHP HTML DomDocument getElementById problems
I'm trying to extract info from Google searches in PHP and find that I can read the search urls without problem, but getting anything out of them is a whole different issue. After reading numerous posts, and applicable PHP docs, I came up with the following
// get large panoramas of montana
$url = 'http://www.google.com/search?q=montana+panorama&tbm=isch&biw=1408&bih=409';
$html = file_get_contents($url);
// was getting tons of "entity parse" errors, so added
$html = htmlentities($html, ENT_COMPAT, 'UTF-8', true); // tried false as well
$doc = new DOMDocument();
//$doc->strictErrorChecking = false; // tried both true and false here, same result
$result = $doc->loadHTML($html);
//echo $doc->saveHTML(); this shows that the tags I'm looking for are in fact in $doc
if ($result === true)
{
var_dump($result); // prints 'true'
$tags = $doc->getElementById('center_col');
$tags = $doc->getElementsByTagName('td');
var_dump($tags); // previous 2 lines both print NULL
}
I've verified that the ids and tags I'm looking for are in the HTML with error_log($html), and in the parsed doc with $doc->saveHTML(). Does anyone see what I'm doing wrong?
Edit:
Thanks all for the help, but I've hit a wall with DOMDocument. Nothing in any of the docs, or other threads, works with Google image queries. Here's what I tried:
I looked at the @Jon link and tried all the suggestions there, then looked at the getElementById docs and read all the comments there as well. Still getting empty result sets. Better than NULL, but not much.
I tried the xpath trick:
$xpath = new DOMXPath($doc);
$ccol = $xpath->query("//*[@id='center_col']");
Same result, an empty set.
I did a error_log($html) directly after the file read and the document has a doctype "" so it's not that.
I also see there that user "carl2088" says "From my experience, getElementById seem to work fine without any setups if you have loaded a HTML document". Not in the case of Google image queries, it would appear.
In desperation, I tried
echo count(explode('center_col', $html))
to see if for some strange reason it disappears after the initial error_log($html). It's definitely there, the string is split into 4 chunks.
I checked my version of PHP (5.3.15), compiled Aug. 25 2012, so it's not a version too old to support getElementById.
Before yesterday, I had been using an extremely ugly series of "explodes" to get the info, and while it's horrid code, it took 45 minutes to write and it works.
I'd really like to ditch my "explode" hack, but 5 hours to achieve nothing vs 45 minutes to get something that works, makes it really difficult to do things the right way.
If anyone else with experience using DOMDocument has some additional tricks I could try, it would be much appreciated.
Are you using the JavaScript getElementById and getElementsByTagName? If so, that is the problem.
$tags = $doc->getElementById('center_col');
$tags = $doc->getElementsByTagName('td');
You need to validate your document with DOMDocument->validate(), or set DOMDocument->validateOnParse, before calling $doc->getElementById('center_col'):
$doc->validateOnParse = true;
$doc->loadHTML($html);
stackoverflow: getelementbyid-problem
http://php.net/manual/de/domdocument.getelementbyid.php
It's in the question that @Jon posted in his comment!
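Another thing worth checking in the question's code is the htmlentities() call: it escapes every < and > in the page, so the parser receives plain text and there are no elements left to find. A minimal sketch of the difference, using a tiny hypothetical page instead of a live Google response, with parse errors collected through libxml_use_internal_errors() rather than avoided by escaping:

```php
<?php
// Hypothetical stand-in for the fetched page.
$html = '<html><body><table><tr><td id="center_col">result</td></tr></table></body></html>';

libxml_use_internal_errors(true);

// Escaped input: the tags become text, so the document contains no td elements.
$escaped = new DOMDocument();
$escaped->loadHTML(htmlentities($html, ENT_COMPAT, 'UTF-8'));
$escapedCells = $escaped->getElementsByTagName('td')->length;

// Raw input: the parser sees real markup.
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$rawCells = $doc->getElementsByTagName('td')->length;
$cell = $doc->getElementById('center_col');

echo $escapedCells, ' ', $rawCells, ' ', $cell !== null ? $cell->textContent : 'not found', "\n";
```

Whether this alone fixes the Google case is not certain, since the id may also be missing from the markup served to a script, but it removes one definite obstacle.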
I'm using the Bing Search API 2.0 (XML) and PHP to retrieve results.
But for some queries, the API doesn't return the same results that Bing.com does.
When I send this request (using the API):
http://api.search.live.net/xml.aspx?Appid=__________&query=3+ts+site%3Amycharity.ie/charity&sources=web&web.count=10&web.offset=0
I get 0 results.
But if I go to Bing.com and search for bacon, the URL is:
http://www.bing.com/search?q=bacon&go=&form=QBRE&filt=all&qs=n&sk=&sc=8-5
So if I substitute my API query into this URL, like so:
http://www.bing.com/search?q=3+ts+site%3Amycharity.ie/charity&go=&form=QBRE&filt=all&qs=n&sk=&sc=8-5
I should get 0 results again, right?
No, I get 1 result (the one I was looking for with the API).
Why is this? Is there any way around it?
Yes, the Bing API is totally brain dead and utterly useless because of this.
But, luckily, screen scraping is trivial:
<?php
// $start is the value sent as Bing's first= parameter (1 = first page of results)
function searchBing($search_term, $start = 1)
{
$html = file_get_contents("http://www.bing.com/search?q=".urlencode($search_term)."&go=&qs=n&sk=&sc=8-20&first=$start&FORM=QBLH");
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings about malformed markup
$x = new DOMXPath($doc);
$output = array();
// just grab the urls for now
foreach ($x->query("//div[@class='sb_tlst']//a") as $node)
{
$output[] = $node->getAttribute("href");
}
return $output;
}
print_r(searchBing("bacon"));
It doesn't look like the API request is actually requesting the same information. Well, it is, but not quite. For example,
in the Bing search URL, "search?q=bacon&go=&form", note the word bacon.
This doesn't appear to be passed in any way in the API request, not even as a hex value. I believe that this is where the problem lies.
Perhaps there was an issue, which is now fixed...
Currently, when I try the following queries, built according to the Bing API 2.0 documentation on MSDN, they all return the same single result:
http://www.bing.com/search?q=3+ts+site%3Amycharity.ie/charity&go=&form=QBRE&filt=all&qs=n&sk=&sc=8-5
http://api.bing.net/xml.aspx?Appid=______7&query=3+ts+site%3Amycharity.ie/charity&sources=web
http://api.bing.net/json.aspx?Appid=_______&query=3+ts+site%3Amycharity.ie/charity&sources=web
Possible Duplicate:
How to use XPath function in a XPathExpression instance programatically?
I'm trying to find all of the rows of a nested table that contain an image with an id that ends with '_imgProductImage'.
I'm using the following query:
"//tr[/td/a/img[ends-with(@id,'_imgProductImage')]]"
I'm getting the error: xmlXPathCompOpEval: function ends-with not found
My Google searches suggest this should be a valid query and function. What is the actual function I'm looking for if it's not "ends-with"?
From "How to use XPath function in a XPathExpression instance programatically?":
One can easily construct an XPath 1.0 expression, the evaluation of which produces the same result as the function ends-with():
$str2 = substring($str1, string-length($str1)- string-length($str2) +1)
produces the same boolean result (true() or false()) as:
ends-with($str1, $str2)
So for your example, the following XPath should work:
//tr[td/a/img['_imgProductImage' = substring(@id, string-length(@id) - 15)]]
Note the relative td step: a leading / inside the predicate would restart the search from the document root. The offset 15 is the suffix length (16) minus 1. You will probably want to add a comment that this is an XPath 1.0 reformulation of ends-with().
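As a rough illustration of the substring reformulation, the sketch below builds the offset from the suffix length instead of hard-coding 15. The markup and the id values are invented for the example.

```php
<?php
// Hypothetical markup: one id ends with the suffix, the other does not.
$html = '<table>
  <tr><td><a href="#"><img id="ctl00_imgProductImage" src="a.jpg"></a></td></tr>
  <tr><td><a href="#"><img id="ctl00_imgOther" src="b.jpg"></a></td></tr>
</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$suffix = '_imgProductImage';
// XPath 1.0 stand-in for ends-with(@id, $suffix):
// take the last strlen($suffix) characters of @id and compare them with the suffix.
$query = sprintf(
    "//tr[td/a/img[substring(@id, string-length(@id) - %d) = '%s']]",
    strlen($suffix) - 1,
    $suffix
);

$rows = $xpath->query($query);
echo $rows->length, "\n";
```

Interpolating the suffix with sprintf is only safe here because it is a fixed string without quotes; a suffix taken from user input would need escaping first.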
It seems that ends-with() is an XPath 2.0 function.
DOMXPath only supports XPath 1.0
Edit after the comment: in your case, I suppose you'll have to:
Find all images using a simpler XPath query, one that returns more images than you want but includes those you want to keep.
Loop over those, testing in PHP, for each one of them, whether the id attribute (see the getAttribute method) matches what you want.
To test whether the attribute is OK, you could use something like this in the loop that iterates over the images:
$id = $currentNode->getAttribute('id');
if (preg_match('/_imgProductImage$/', $id)) {
// the current node is OK ;-)
}
Note that, in my regex pattern, I used a $ to indicate end of string.
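Put together, the two steps could look like the sketch below. The markup and id values are hypothetical, and walking up to the row with ancestor::tr[1] is an addition for illustration, since the question asks for the rows rather than the images.

```php
<?php
// Hypothetical markup: only the first image id ends with the wanted suffix.
$html = '<table>
  <tr><td><a href="#"><img id="p1_imgProductImage" src="a.jpg"></a></td></tr>
  <tr><td><a href="#"><img id="p1_imgBanner" src="b.jpg"></a></td></tr>
</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$matchedIds = [];
$matchedRows = [];
// Step 1: a deliberately broad query that returns every image with an id.
foreach ($xpath->query('//tr/td/a/img[@id]') as $currentNode) {
    $id = $currentNode->getAttribute('id');
    // Step 2: do the ends-with test in PHP.
    if (preg_match('/_imgProductImage$/', $id)) {
        $matchedIds[] = $id;
        // Nearest enclosing row of the matching image.
        $matchedRows[] = $xpath->query('ancestor::tr[1]', $currentNode)->item(0);
    }
}

print_r($matchedIds);
```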
There is no ends-with function in XPath 1.0, but you can fake it:
"//tr[td/a/img[substring(@id, string-length(@id) - 15) = '_imgProductImage']]"
If you're on PHP 5.3.0 or later, you can use registerPHPFunctions to call any PHP function you want, although the syntax is a little odd. For example:
$xpath = new DOMXPath($document);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions("ends_with");
// the name passed to php:function must match the PHP function name exactly
$nodes = $xpath->query("//tr[td/a/img[php:function('ends_with',@id,'_imgProductImage')]]");
// $node is the matched node-set, passed as an array of DOM nodes (here the id attribute)
function ends_with($node, $value){
return isset($node[0]) && substr($node[0]->nodeValue,-strlen($value))==$value;
}