Error while using contains in XPath query - php

I'm trying to parse the following URL:
http://rss.cbc.ca/lineup/technology.xml
My code is:
$doc = new DOMDocument();
$doc->load("http://rss.cbc.ca/lineup/technology.xml");
echo '<ul class="rss">';
$i = 0;
if( isset($_GET['filter']) ){
$xpath = new DOMXPath($doc);
$doc = $xpath->query("item/title[contains(.,'".$_GET['filter']."')] or item/description[contains(.,'".$_GET['filter']."')]");
echo "<p>Filtering news items on '".$_GET['filter']."'</p>";
}
foreach ($doc->getElementsByTagName('item') as $node) {
if($i % 2 == 0)
$class = "even";
else
$class = "odd";
echo '<li class="'.$class.'">';
echo "<h1>".$node->getElementsByTagName('title')->item(0)->nodeValue."</h1>";
echo "<p>".$node->getElementsByTagName('description')->item(0)->nodeValue."</p>";
echo 'Link to story';
echo "</li>";
$i = $i + 1;
}
echo "<ul>";
The issue that I'm having is that if I specify a filter (through a URL var), when I do the foreach later down the page, I get an error:
Fatal error: Call to undefined method DOMNodeList::getElementsByTagName()

Your XPath expression evaluates to a boolean (false, because your path is wrong), not to a node set.
If you want to select those item elements that have a title or description child containing some string, use:
/rss/channel/item[(title|description)[contains(.,'string')]]

First: $xpath->query returns a DOMNodeList which does not have a method getElementsByTagName.
Second: Your query returns the title element or the description and not the item.
Change it to //item[contains(title,'".$_GET['filter']."')]
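To make both answers concrete, here is a minimal sketch that keeps $doc untouched and stores the query result in a separate variable. The inline feed, the variable names $items and $titles, and the fixed $filter value are assumptions for illustration; in the original script the filter would come from $_GET['filter'], and it must not contain a single quote or the XPath string breaks.

```php
<?php
// Inline sample feed standing in for the remote RSS (hypothetical data).
$xml = <<<XML
<rss version="2.0"><channel>
  <item><title>New phone released</title><description>A gadget story</description></item>
  <item><title>Weather update</title><description>Rain expected</description></item>
  <item><title>Science news</title><description>A phone call from space</description></item>
</channel></rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Assumption: stands in for $_GET['filter'] and contains no single quote.
$filter = 'phone';

// query() returns a DOMNodeList of <item> elements; $doc is not overwritten.
$items = $xpath->query("/rss/channel/item[(title|description)[contains(., '$filter')]]");

$titles = [];
foreach ($items as $item) {
    // Each $item is a DOMElement, so getElementsByTagName() is available here.
    $titles[] = $item->getElementsByTagName('title')->item(0)->nodeValue;
}

echo implode("\n", $titles), "\n";
```

Iterating over $items directly replaces the getElementsByTagName('item') call that failed on the DOMNodeList.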


How to get image url by page in PHP

This is my code :
<form method="POST">
<input name="link">
<button type="submit">></button>
</form>
<title>GET IMAGE URL</title>
<?php
if (!isset($_POST['link'])) exit();
$link = $_POST['link'];
$parse = explode('.html', $link);
echo '<div id="pin" style="float:center"><textarea class="text" cols="110" rows="50">';
for ($i = 1; $i <=5; $i++)
{
if ($i > 1)
$link = "$parse[0]-$i.html";
$get = file_get_contents($link);
if (preg_match_all('/src="(.*?)"/', $get, $matches))
{
foreach ($matches[1] as $content)
echo $content."\r\n";
}
}
echo '</textarea>';
The site I'm trying to get the img src values from has 10 to 15 pages, so I want my code to collect all the img URLs until the last page. How can I do that without hard-coding the loop limit?
If I use:
for ($i = 1; $i <=5; $i++)
this will only get the img URLs of 5 pages, but I want it to continue until the end. Then I wouldn't need to edit the loop every time I submit another URL with a different number of pages.
From this:
"this will only get the img URLs of 5 pages, but I want it to continue until the end. Then I wouldn't need to edit the loop every time I submit another URL with a different number of pages."
I understand that your problem is the dynamic number of pages. Your URLs have a next-page link at the bottom:
下一页 ("Next page")
Detect it and fetch your images in a while loop:
<?php
// Link given in form
$link = "http://www.xiumm.org/photos/XiuRen-17305.html";
$parse = explode('.html', $link);
$i = 1;
// Initialize a boolean
$nextPageFound = true;
while ($nextPageFound) {
    // Construct the URL each time a next page was found
    if ($i == 1) {
        $url = "$parse[0].html";
        echo "First Page<br><br>";
    } else {
        $url = "$parse[0]-$i.html";
    }
    // Get the URL contents
    $get = file_get_contents($url);
    if (preg_match_all('/src="(.*?)"/', $get, $matches)) {
        // Echo the matched src values
        foreach ($matches[1] as $content) {
            echo $content . "<br>";
        }
    }
    // Check whether a nextPageBtn is present
    if (strpos($get, '"nextPageBtn"') !== false) {
        $nextPageFound = true;
        // Move on to the next page
        $i++;
        echo "<br>Page $i<br><br>";
    } else {
        $nextPageFound = false;
        echo "THE END";
    }
}
?>
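The same idea can be wrapped in a function with two safety nets: a page cap and a stop on failed fetches. This is only a sketch. The function name collectImageUrls, the $fetch callback, the $maxPages limit, and the example.test pages are all hypothetical; for real use the callback would simply call file_get_contents().

```php
<?php
// Collect every src value across paginated pages until no next-page marker is found.
// $fetch is a hypothetical callback returning the page HTML, or false on failure.
function collectImageUrls(string $link, callable $fetch, int $maxPages = 50): array
{
    $base = explode('.html', $link)[0];
    $urls = [];
    for ($i = 1; $i <= $maxPages; $i++) {
        // Page 1 is "<base>.html", later pages are "<base>-N.html".
        $url = $i === 1 ? "$base.html" : "$base-$i.html";
        $html = $fetch($url);
        if ($html === false) {
            break; // fetch failed: stop instead of looping on bad input
        }
        if (preg_match_all('/src="(.*?)"/', $html, $matches)) {
            $urls = array_merge($urls, $matches[1]);
        }
        if (strpos($html, '"nextPageBtn"') === false) {
            break; // no next-page marker: this was the last page
        }
    }
    return $urls;
}

// Fake three-page site so the sketch runs without network access.
$fakeSite = [
    'http://example.test/set.html'   => '<img src="a1.jpg"><a class="nextPageBtn" href="set-2.html">next</a>',
    'http://example.test/set-2.html' => '<img src="b1.jpg"><img src="b2.jpg"><a class="nextPageBtn" href="set-3.html">next</a>',
    'http://example.test/set-3.html' => '<img src="c1.jpg">',
];
$fetch = function (string $url) use ($fakeSite) {
    return $fakeSite[$url] ?? false;
};

$result = collectImageUrls('http://example.test/set.html', $fetch);
print_r($result);
```

The cap keeps the loop from running forever if a site shows the next-page marker on every page.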
You should use an HTML/XML parser, like DOMDocument, in combination with DOMXPath (XPath is a query language for (X)HTML and XML data structures):
// create DOMDocument
$doc = new DOMDocument();
// load remote HTML file
$doc->loadHTMLFile( $link );
// create DOMXPath
$xpath = new DOMXPath( $doc );
// fetch all IMG elements that have a src attribute
$nodes = $xpath->query( '//img[@src]' );
// loop through found IMG elements and echo their src attribute values
for( $i = 0; $i < $nodes->length; $i++ ) {
echo $nodes->item( $i )->getAttribute( 'src' ) . PHP_EOL;
}
Regarding the XPath query //div[contains(@class,'pic_box')]//@src, mentioned by @Enuma in the comments:
The resulting DOMNodeList of that query will not contain DOMElement objects, but DOMAttr objects, because the query directly asks for attributes, not elements. Since DOMAttr represents an attribute and not an element, the method getAttribute() does not exist. To get the value of the attribute you have to use the property DOMAttr->value.
So, we have to slightly alter the relevant part of our example code from above to:
// loop through found src attributes and echo their value
for( $i = 0; $i < $nodes->length; $i++ ) {
echo $nodes->item( $i )->value . PHP_EOL;
}
Putting it all together, our example code then becomes:
// create DOMDocument
$doc = new DOMDocument();
// load remote HTML file
$doc->loadHTMLFile( $link );
// create DOMXPath
$xpath = new DOMXPath( $doc );
// fetch all src attributes that are descendants of div.pic_box
$nodes = $xpath->query( "//div[contains(@class,'pic_box')]//@src" );
// loop through found src attributes and echo their value
for( $i = 0; $i < $nodes->length; $i++ ) {
echo $nodes->item( $i )->value . PHP_EOL;
}
PS: For DOMDocument to be able to load remote files, I believe some PHP configuration setting (most likely allow_url_fopen) may need to be enabled. But since loading already appeared to work for @Enuma, it's not actually relevant here.
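The difference between the element query and the attribute query can be checked side by side. The sketch below runs on an inline HTML string instead of a remote page; the markup and the array names $fromElements and $fromAttributes are made up for illustration.

```php
<?php
// Inline HTML standing in for a fetched page (hypothetical markup).
$html = '<html><body>'
      . '<div class="pic_box"><img src="/a.jpg"><img src="/b.jpg"></div>'
      . '<div class="ads"><img src="/banner.png"></div>'
      . '</body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Element query: each node is a DOMElement, so getAttribute() is available.
$fromElements = [];
foreach ($xpath->query('//img[@src]') as $img) {
    $fromElements[] = $img->getAttribute('src');
}

// Attribute query: each node is a DOMAttr, so read ->value instead.
$fromAttributes = [];
foreach ($xpath->query("//div[contains(@class,'pic_box')]//@src") as $attr) {
    $fromAttributes[] = $attr->value;
}

print_r($fromElements);
print_r($fromAttributes);
```

The first list contains all three images, while the second is limited to those inside the pic_box div.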

How to get value of onclick= using xpath?

I have a string that has lots of <li> sets of data. I want to get this value:
1: call.php?category=fruits&fruitid=123456
from inside the onclick attribute using XPath. My current XPath doesn't return the onclick value, which I need so that I can parse it further for the data I want. Could anyone tell me the correct XPath to get the value of onclick?
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($code2);
$xpath = new DOMXPath($dom);
// Empty array to hold all links to return
$result = array();
//Loop through each <li> tag in the dom
foreach($dom->getElementsByTagName('li') as $li) {
//Loop through each <a> tag within the li, then extract the node value
foreach($li->getElementsByTagName('a') as $links){
$result[] = $links->nodeValue;
echo $result[0] . "\n";
}
$onclicks = $xpath->query("//li/a/onclick");
foreach ($onclicks as $onclick) {
echo $onclick->nodeValue . "\n";
}
}
data:
<li><a id="FR123456" onclick="setFood(false);setSeasonFruitID('123456');getit('call.php?category=fruits&fruitid=123456&',detailFruit,false);">mango season</a><img src="http://imagehosting.com/images/fru_123456.png">
</li>
onclick is an attribute, and you use @attribute_name to reference an attribute in XPath:
$onclicks = $xpath->query("//li/a/@onclick");
foreach ($onclicks as $onclick) {
echo $onclick->nodeValue . "\n";
}
Try something like this:
$links = $xpath->query("//li/a");
foreach ($links as $link) {
echo $link->getAttribute('onclick'). "\n";
}
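Once the onclick value is available, the URL still has to be cut out of the JavaScript. A possible sketch follows, assuming the URL is always the first quoted argument of getit(...) as in the sample data; the regular expression and the $urls array are not from the original answers.

```php
<?php
// Sample markup based on the question's data (ampersands written as &amp; to keep the parser quiet).
$code2 = <<<'HTML'
<ul><li><a id="FR123456" onclick="setFood(false);setSeasonFruitID('123456');getit('call.php?category=fruits&amp;fruitid=123456&amp;',detailFruit,false);">mango season</a></li></ul>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($code2);
$xpath = new DOMXPath($dom);

$urls = [];
foreach ($xpath->query('//li/a/@onclick') as $onclick) {
    // Assumed pattern: the URL is the first single-quoted argument of getit(...).
    if (preg_match("/getit\\('([^']*)'/", $onclick->value, $m)) {
        // Drop the trailing "&" that the sample data carries.
        $urls[] = rtrim($m[1], '&');
    }
}

print_r($urls);
```

If the site passes the URL to a different function, only the pattern needs to change.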

How to call UL class only once using domdocument php

I am using PHP DOMDocument to load my HTML. In my HTML, class="smalllist" occurs twice, but I only need the elements of the first occurrence.
My current PHP code is:
$d = new DOMDocument();
$d->validateOnParse = true;
@$d->loadHTML($html);
$xpath = new DOMXPath($d);
$table = $xpath->query('//ul[@class="smalllist"]');
foreach ($table as $row) {
echo $row->getElementsByTagName('a')->item(0)->nodeValue."-";
echo $row->getElementsByTagName('a')->item(1)->nodeValue."\n";
}
which loads both the classes.
But, I need to load only one class with that name.
Please help me in this. Thanks in advance.
DOMXPath::query() returns a DOMNodeList, which has an item() method. See if this works:
$table->item(0)->getElementsByTagName('a')->item(0)->nodeValue
edited (untested):
foreach($table->item(0)->getElementsByTagName('a') as $anchor){
echo $anchor->nodeValue . "\n";
}
You can put a break inside the foreach loop so that only the first matching list is read. Or you can work on $table->item(0) directly, for example foreach ($table->item(0)->getElementsByTagName('a') as $anchor) {...
Code:
$count = 0;
foreach($table->item(0)->getElementsByTagName('a') as $anchor){
echo $anchor->nodeValue . "\n";
if( ++$count >= 2 ) {
break;
}
}
Another way, without break (more than one way to skin a cat):
$anchors = $table->item(0)->getElementsByTagName('a');
for($i = 0; $i < 2; $i++){
echo $anchors->item($i)->nodeValue . "\n";
}
This is my final code:
$d = new DOMDocument();
$d->validateOnParse = true;
@$d->loadHTML($html);
$xpath = new DOMXPath($d);
$table = $xpath->query('//ul[@class="smalllist"]');
$count = 0;
foreach($table->item(0)->getElementsByTagName('a') as $anchor){
$data[$k][$arr1[$count]] = $anchor->nodeValue;
if( ++$count > 1 ) {
break;
}
}
Working fine.
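As an alternative to item(0), the restriction to the first list can also be expressed in the XPath itself with a positional predicate. This is a sketch on made-up markup, not part of the accepted solution; the link texts and the $texts array are assumptions.

```php
<?php
// Two lists with the same class (hypothetical markup).
$html = <<<'HTML'
<ul class="smalllist"><li><a href="#">First A</a></li><li><a href="#">First B</a></li></ul>
<ul class="smalllist"><li><a href="#">Second A</a></li><li><a href="#">Second B</a></li></ul>
HTML;

libxml_use_internal_errors(true);
$d = new DOMDocument();
$d->loadHTML($html);
$xpath = new DOMXPath($d);

// (...)[1] keeps only the first matching <ul> in document order; XPath positions start at 1.
$anchors = $xpath->query('(//ul[@class="smalllist"])[1]//a');

$texts = [];
foreach ($anchors as $anchor) {
    $texts[] = $anchor->nodeValue;
}

echo implode("-", $texts), "\n";
```

The parentheses matter: //ul[@class="smalllist"][1] without them would select every such ul that is the first one under its own parent.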

Fetch the attributes using PHP crawler

I am trying to fetch the name, address and location by crawling a website. It's a single page and I don't want anything other than these values. I am using the code below.
<?php
include 'simple_html_dom.php';
$html = "http://www.phunwa.com/phone/0191/2604233";
$dom = new DomDocument();
$dom->loadHtml($html);
$xpath = new DomXpath($dom);
$div = $xpath->query('//*[@class="address-tags"]')->item(0);
for($i=0; $i < $div->length; $i++ )
{
print "nodename=".$div->item( $i )->nodeName;
print "\t";
print "nodevalue : ".$div->item( $i )->nodeValue;
print "\r\n";
echo $link->getElementsByTagName("<p>");
}
?>
The website html source code is
<div class="address-tags">
<p><strong>Name:</strong> RAJ GOPAL SINGH</p>
<p><strong>Address:</strong> R/O BARNAI NETARKOTHIAN, P.O.MUTHI TEH.& DISTT.JAMMU,X, 181206</p>
<p><strong>Location:</strong> JAMMU, Jammu & Kashmir, India</p>
<p><strong>Other Numbers:</strong> 01912604233 | +911912604233 | +91-191-2604233</p>
Can someone please help me get the three attributes as output? Nothing is echoed on the page as of now.
Thanks a lot.
You need $dom->load($html); instead of $dom->loadHtml($html);, because $html holds a URL rather than markup. After that change you will find that the page's HTML is not well-formed XML, so the XPath query returns nothing.
Maybe try something like:
$html = file_get_contents('http://www.phunwa.com/phone/0191/2604233');
$name = preg_replace('/(.*)(<p><strong>Name:<\/strong> )([^<]+)(<\/p>)(.*)/mis','$3',$html);
$address = preg_replace('/(.*)(<p><strong>Address:<\/strong> )([^<]+)(<\/p>)(.*)/mis','$3',$html);
$location = preg_replace('/(.*)(<p><strong>Location:<\/strong> )([^<]+)(<\/p>)(.*)/mis','$3',$html);
$othernumbers = preg_replace('/(.*)(<p><strong>Other Numbers:<\/strong> )(.*)/mis','$3',$html);
list($othernumbers,$trash)= preg_split('/<\/p>/mis',$othernumbers,0);
echo 'name: '.$name.'<br>address: '.$address.'<br>location: '.$location.'<br>other numbers: '.$othernumbers;
exit;
You should use the following for your XPath query:
//*[@class='address-tags']/p
so that you retrieve the actual paragraph nodes that are children of the 'address-tags' parent. Then you can loop over them:
$nodes = $xpath->query('//*[@class="address-tags"]/p');
for ($i = 0; $i < $nodes->length; $i++) {
echo $nodes->item($i)->nodeValue;
}
// or just
foreach($nodes as $node) {
echo $node->nodeValue;
}
Right now your code is properly fetching the first div that's found, but then you continue treating that div as if it was a DOMNodeList returned from an xpath query, which is incorrect. ->item() returns a DOMNode object, which does NOT have an ->item() method.
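To end up with named values rather than raw paragraph text, each paragraph can be split into its strong label and the remaining text. A minimal sketch, assuming every paragraph starts with a strong element as in the quoted source; the $fields array and the inline copy of the markup are for illustration only.

```php
<?php
// Inline copy of the relevant markup (ampersands escaped so the parser stays quiet).
$html = <<<'HTML'
<div class="address-tags">
<p><strong>Name:</strong> RAJ GOPAL SINGH</p>
<p><strong>Address:</strong> R/O BARNAI NETARKOTHIAN, P.O.MUTHI TEH.&amp; DISTT.JAMMU,X, 181206</p>
<p><strong>Location:</strong> JAMMU, Jammu &amp; Kashmir, India</p>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$fields = [];
foreach ($xpath->query('//*[@class="address-tags"]/p') as $p) {
    // Relative query: the <strong> child of this paragraph holds the label.
    $label = $xpath->query('strong', $p)->item(0);
    if ($label === null) {
        continue; // paragraph without a label: skip it
    }
    $key = rtrim(trim($label->nodeValue), ':');
    // The paragraph text minus the leading label text is the value.
    $fields[$key] = trim(substr($p->nodeValue, strlen($label->nodeValue)));
}

print_r($fields);
```

The result is keyed by label, so $fields['Name'], $fields['Address'] and $fields['Location'] can be echoed individually.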

Returning nodeValues using a while loop

I am using XPath to get various elements on a page. A foreach loop such as foreach ($company as $node) { echo $node->nodeValue. "<br>"; } works, but it only iterates over one variable, so I would need two separate foreach loops. I want to use a while loop so that I can output the values from both variables at the same time. The while loop below returns neither an error nor any values.
$doc = new DOMDocument();
@$doc->loadHTML($source);
$xpath = new DOMXpath($doc);
$company = $xpath->query("//*[@class='name']");
$address = $xpath->query("//*[@class='address']");
$i = 0;
while ($i < count($company)) {
echo $company->nodeValue. "<br>";
echo $address->nodeValue. "<br><br>";
$i++;
}
They are DOMNodeLists. To retrieve individual nodes by index, use ->item() and take the count from ->length:
while ($i < $company->length ) {
echo $company->item($i)->nodeValue. "<br>";
echo $address->item($i)->nodeValue. "<br><br>";
$i++;
}
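One thing to watch with index-based pairing is that the two lists may differ in length, in which case item($i) returns null. The sketch below stops at the shorter list; the sample markup and the $pairs array are assumptions for illustration.

```php
<?php
// Sample markup: the third company has no address (hypothetical data).
$source = <<<'HTML'
<div><span class="name">Acme Ltd</span><span class="address">1 Main St</span></div>
<div><span class="name">Globex</span><span class="address">22 Side Rd</span></div>
<div><span class="name">Initech</span></div>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($source);
$xpath = new DOMXPath($doc);

$company = $xpath->query("//*[@class='name']");
$address = $xpath->query("//*[@class='address']");

// Stop at the shorter list so item($i) never returns null.
$n = min($company->length, $address->length);

$pairs = [];
for ($i = 0; $i < $n; $i++) {
    $pairs[] = [$company->item($i)->nodeValue, $address->item($i)->nodeValue];
}

foreach ($pairs as [$name, $addr]) {
    echo $name, "<br>", $addr, "<br><br>";
}
```

Pairing by index only lines up correctly when missing addresses occur at the end; if an entry in the middle can lack one, querying the address relative to each name's parent element would be more robust.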
