Need help finding the correct XPath - PHP

I am working with the following weather site.
I need to query it with XPath, but my query returns nothing.
I'm using this XPath, which should return 2 rows:
$xpath->query('/html/body/table[3]/tbody/tr/td/table/tbody/tr/td[2]/p/table/tbody/tr/td/font/div/center/table/tbody/tr[1]/td[1]/font/font/b');
but it does not return anything.
Please help me complete this XPath.
I'm using this code, but it shows an error:
Catchable fatal error: Object of class DOMNodeList could not be
converted to string in /home/mysite/curl.php on line 23
<?php
$url="http://www.irimo.ir/farsi/current/index.asp?station=40770";
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$dom = new DomDocument();
@$dom->loadHTML($allcont);
$xpath = new DomXPath($dom);
$return = $xpath->query('/html/body/table[3]/tbody/tr/td/table/tbody/tr/td[2]/p/table/tbody/tr/td/font/div/center/table/tbody/tr[3]/td/font/b');
echo $return;
echo $xpath;
?>

Try
$xpath->query('/html/body/table[3]/tbody/tr/td/table/tbody/tr/td[2]/p/table/tbody/tr/td/font/div/center/table/tbody/tr[3]/td/font/b');
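The "could not be converted to string" error in the question comes from echoing the result of query(), which is a DOMNodeList rather than a string. A minimal sketch of reading the matched nodes follows. The HTML string is a hypothetical stand-in for the downloaded page, whose real markup is not shown here. It also assumes the raw HTML has no tbody elements: browser developer tools often display tbody inside tables even when the source does not contain it, which may be why the long absolute path matches nothing.

```php
<?php
// Hypothetical stand-in for the page returned by cURL (not the real site markup).
$html = '<html><body><table>'
      . '<tr><td><font><b>Tehran</b></font></td></tr>'
      . '<tr><td><font><b>25</b></font></td></tr>'
      . '</table></body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// No tbody step here, because the raw HTML above contains none.
$nodes = $xpath->query('/html/body/table/tr/td/font/b');

// query() returns a DOMNodeList, so loop over it instead of echoing it.
$values = [];
foreach ($nodes as $node) {
    $values[] = trim($node->nodeValue);
}
echo implode("\n", $values), "\n";
```

If the loop prints nothing, shortening the path step by step (for example starting from //table) is one way to find the step that fails to match.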

Related

Foreach loop and problems with cURL requests

I am a beginner in PHP programming. I have this script in which I'm trying to get a string multiple times, each time with different "login" data, from an external website. I am using PHP, cURL, DOM and XPath. The fact is that my code seems to work only if I don't use a foreach construct to loop the entire operation. But I don't know how else I could repeat this operation changing the data from time to time.
The situation is: I have just logged in, and now the site asks me to fill in two more fields that are necessary to proceed to the next page, where I can get the string that I need. The next portion of code is contained in an if block.
// A function to automatically select the form fields:
function form_fields($xpath, $query) {
$inputs = $xpath->query($query);
$fields = array();
foreach ($inputs as $input) {
$key = $input->attributes->getNamedItem('name')->nodeValue;
$type = $input->nodeName;
$value = $input->attributes->getNamedItem('value')->nodeValue;
$fields[$key] = $value;
}
return $fields;
}
// Executing the XPath queries to fill the fields:
$opzutenza = 'incarichi';
$action = $xpath->query("//form[@name='fm_$opzutenza']")->item(0)->attributes->getNamedItem('action')->nodeValue;
curl_setopt($ch, CURLOPT_URL, $action);
$fields = form_fields($xpath, "//form[@name='fm_$opzutenza']/input");
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
$html = curl_exec($ch);
$dom = new DomDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// The strings that I need to get depend on each value contained in this select element:
$options = $xpath->query("//select[@name='sceltaincarico']/option");
$partiteiva = array();
foreach($options as $option){
$partiteiva[] = $option->nodeValue;
unset($partiteiva[0]);
}
} // -----------> END OF 'IF' BLOCK
$queriesNA = array();
foreach ($partiteiva as $piv) {
$queryNA = ".//select[@name='sceltaincarico']/option[text()='$piv']";
$queriesNA[] = $queryNA;
}
// And this is the problematic loop:
foreach($queriesNA as $querypiv){
$form = $xpath->query("//form[@name='fm_scelta_tipo_incarico']")->item(0);
$action = $form->attributes->getNamedItem('action')->nodeValue;
@$option = $xpath->query($querypiv, $form);
curl_setopt($ch, CURLOPT_URL, $action);
$fields = [
'sceltaincarico' => $option->item(0)->attributes->getNamedItem('value')->nodeValue,
'tipoincaricante' => 'incDiretto'
];
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields)); // ----> Filling the last field
curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, 'https://website.com/dp/api');
curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, 'https://website.com/cons/cons-services/sc/tokenB2BCookie/get');
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
$http = curl_exec($ch);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
function parse_headers($http) {
$headers = explode("\r\n", $http);
$hdrs = array();
foreach($headers as $h) {
@list($k, $v) = explode(':', $h);
$hdrs[trim($k)] = trim($v);
}
return $hdrs;
}
$hdrs = parse_headers($http);
$tokens = array(
"x-token: ".$hdrs['x-token'],
"x-b2bcookie: ".$hdrs['x-b2bcookie']
);
curl_setopt($ch, CURLOPT_HTTPHEADER, $tokens);
curl_setopt($ch, CURLOPT_URL, "https://website.com/cons/cons-services/rs/disclaimer/accetta"); // Accepting the disclaimer...
curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, "https://website.com/portale/web/guest/home");
$html = curl_exec($ch); // Finally got to the page that I need
$dom = new DomDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Selecting the string:
$string = $xpath->query("//div[@class='informativa']/strong[2]");
$nomeazienda = array();
foreach ($string as $str) {
$nomeazienda[] = $str->childNodes->item(0)->nodeValue;
}
// Going back to the initial page so the loop can start again from the beginning:
$piva_page = 'https://website.com/portale/scelta-utenza-lavoro?....';
curl_setopt($ch, CURLOPT_URL, $piva_page);
$html = curl_exec($ch);
$dom = new DomDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
}
curl_close($ch);
These are the error messages:
Notice: Trying to get property 'attributes' of non-object...
Fatal error: Uncaught Error: Call to a member function getNamedItem() on null...
Error: Call to a member function getNamedItem() on null...
The failing getNamedItem() call is the first one at the start of the problematic loop, and the same goes for 'attributes'.

get the grandson nodeValue when using XPath

I am looking for a way to retrieve the nodeValue of a grandson. The only element I can select unambiguously is the one with the class "I have unambiguous access to this class", as shown below. How do I do it?
<class="I have unambiguous access to this class">
- <class="childClass">
-- <class="grandsonClass">
///////
function curlGet($url){
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array($url));
$results = curl_exec($ch);
curl_close($ch);
return $results;
}
function returnXPathObject($item){
$xmlPageDom = new DOMDocument();
@$xmlPageDom->loadHTML($item);
$xmlPageXPath = new DOMXPath($xmlPageDom);
return $xmlPageXPath;
}
$examplePage = curlGet("www.example.com");
$exampleXPath = returnXPathObject($examplePage);
$rating = $exampleXPath->query("//span[@class='grandfather']"); // I can access this guy, but I want its grandchild.
// get child of grandfather's child (grandson of grandfather)
You can run a relative XPath query by passing the 'grandfather' node as the context node. Note that $rating is a DOMNodeList, so pass a single node from it, e.g. $rating->item(0):
$query = "./span[@class='childClass']/span[@class='grandsonClass']";
$grandSon = $exampleXPath->query($query, $rating->item(0));
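To make the context-node idea concrete, here is a small self-contained sketch. The markup is hypothetical and only mirrors the class names used above. The second argument of query() must be a single node, so the sketch takes item(0) from the first result and checks it for null before using it.

```php
<?php
// Hypothetical markup mirroring the class names from the question.
$html = '<div><span class="grandfather"><span class="childClass">'
      . '<span class="grandsonClass">4.5</span></span></span></div>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// First query: the element we can reach unambiguously.
$grandfather = $xpath->query("//span[@class='grandfather']")->item(0);

$grandsonValue = null;
if ($grandfather !== null) {
    // Relative query: the leading "./" starts from the context node.
    $query = "./span[@class='childClass']/span[@class='grandsonClass']";
    $grandsons = $xpath->query($query, $grandfather);
    if ($grandsons->length > 0) {
        $grandsonValue = $grandsons->item(0)->nodeValue;
    }
}
echo $grandsonValue, "\n";
```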

<?php echo file_get_contents how to get content in a certain tag

<?php echo file_get_contents ("http://www.google.com/"); ?>
But I only want to get the contents of a certain tag at that URL. How can I do that?
I need to echo the content between a tag, not the whole page.
Refer to the PHP manual; cURL can also help you.
You may also use a user-defined function instead of file_get_contents():
function get_content($URL){
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $URL);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
echo get_content('http://example.com');
Hope this resolves your issue.
I think you want to extract content from a specific HTML tag in the file. You could use regular expressions for this; however, see the following link for parsing an HTML document:
http://php.net/manual/en/class.domdocument.php
libxml_use_internal_errors(true);
$url = "http://stackoverflow.com/questions/15947331/php-echo-file-get-contents-how-to-get-content-in-a-certain-tag";
$dom = new DomDocument();
$dom->loadHTML(file_get_contents($url));
foreach($dom->getElementsByTagName('a') as $element) {
echo $element->nodeValue.'<br/>';
}
exit;
More info: http://www.php.net/manual/en/class.domdocument.php
There you can see how to select elements by id or class, how to get elements' attribute values etc.
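As a rough illustration of selecting by id or class and reading attribute values, here is a short sketch that uses DOMXPath on top of DOMDocument. The markup, ids, class names, and URLs are made up for the example.

```php
<?php
// Hypothetical markup; the id, classes, and URLs are invented for illustration.
$html = '<html><body><div id="main">'
      . '<a class="ext link" href="http://example.com/a">First</a>'
      . '<a href="/b">Second</a>'
      . '</div></body></html>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Select by id.
$main = $xpath->query("//*[@id='main']")->item(0);
$mainText = $main !== null ? $main->textContent : null;

// Select by class; the concat() idiom matches one class among several.
$extLinks = $xpath->query(
    "//a[contains(concat(' ', normalize-space(@class), ' '), ' ext ')]"
);

// Read an attribute value from each matched element.
$hrefs = [];
foreach ($extLinks as $link) {
    $hrefs[] = $link->getAttribute('href');
}

echo $mainText, "\n";
print_r($hrefs);
```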
Note: It's often better to fetch content via cURL instead of file_get_contents, since cURL gives finer control over the request. For example:
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
Also note that on some websites you have to specify options like CURLOPT_USERAGENT etc., otherwise the content may not be returned.
Here are the other options: http://www.php.net/manual/en/function.curl-setopt.php

PHP Scraping with curl - How can I debug

I just learned what scraping and cURL are a few hours ago, and I have been playing with them since. However, I am now facing something strange. The code below works fine with some sites and not with others (of course I modified the URL and the XPath...). Note that no error is raised when I test whether curl_exec was executed properly, so the problem must come from somewhere after that. So my questions are as follows:
How can I check if the new DOMDocument has been created properly: if(??)
How can I check if the new DOMDocument has been populated properly with HTML?
...if a new DOMXPath object has been created?
I hope I was clear. Thank you in advance for your replies. Cheers. Marc
My php:
<?php
$target_url = "http://www.somesite.com";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
if (!$html) {
echo "<br />cURL error number:" .curl_errno($ch);
echo "<br />cURL error:" . curl_error($ch);
exit;
}
// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->query('somepath');
for ($i = 0; $i < $hrefs->length; $i++) {
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
echo "<br />Link: $url";
}
?>
Use a try/catch to check if the document object was created, then check the return value of loadHTML() to determine if the HTML was loaded into the document. You can use a try/catch on the XPath object as well.
try
{
$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);
if($loaded)
{
// loaded OK
}
else
{
// could not load HTML
}
}
catch(Exception $e)
{
// document could not be created, see $e->getMessage()
}
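Beyond try/catch, the three checks asked about can also be made by inspecting return values. The sketch below is one possible approach, not the only one. It uses a hypothetical HTML string in place of the cURL response, reads the libxml error buffer, and relies on query() returning false for a malformed expression and an empty DOMNodeList when a valid expression matches nothing.

```php
<?php
// Hypothetical, slightly sloppy HTML standing in for the cURL response.
$html = '<html><body><a href="/one">One</a><p>Unclosed<a href="/two">Two</a></body></html>';

libxml_use_internal_errors(true);        // buffer parser messages instead of printing them
$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);         // false means the HTML could not be loaded
$parseErrors = libxml_get_errors();      // array of LibXMLError objects, possibly empty
libxml_clear_errors();

// A populated document has a root element.
$populated = $loaded && $dom->documentElement !== null;

$xpath = new DOMXPath($dom);
$hrefs = $xpath->query('//a');           // valid expression: a DOMNodeList
$broken = @$xpath->query('//a[');        // malformed expression: false (warning suppressed)

if ($hrefs === false) {
    echo "The XPath expression is invalid\n";
} elseif ($hrefs->length === 0) {
    echo "The XPath is valid but matched nothing; the path may not fit the raw HTML\n";
} else {
    foreach ($hrefs as $href) {
        echo 'Link: ', $href->getAttribute('href'), "\n";
    }
}
```

An empty result with no errors usually points at the path rather than at cURL or the parser, which is consistent with a path copied from a browser tool not matching the raw HTML.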
Problem solved. The error came from Firebug, which gave a wrong path. Big thanks to MrCode for his support...

Consume WebService with php

Can anyone give me an example of how I can consume the following web service with php?
http://www.webservicex.net/uszip.asmx?op=GetInfoByZIP
Here's a simple example which uses curl and the GET interface.
$zip = 97219;
$url = "http://www.webservicex.net/uszip.asmx/GetInfoByZIP?USZip=$zip";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
$xmlobj = simplexml_load_string($result);
The $result variable contains XML which looks like this:
<?xml version="1.0" encoding="utf-8"?>
<NewDataSet>
<Table>
<CITY>Portland</CITY>
<STATE>OR</STATE>
<ZIP>97219</ZIP>
<AREA_CODE>503</AREA_CODE>
<TIME_ZONE>P</TIME_ZONE>
</Table>
</NewDataSet>
Once the XML is parsed into a SimpleXML object, you can get at the various nodes like this:
print $xmlobj->Table->CITY;
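The parsing step can be tried without calling the service. The sketch below feeds the sample XML shown above directly to simplexml_load_string(), checks for a parse failure, and casts nodes to strings before using them. The $info array is just an illustrative name.

```php
<?php
// The sample response from above, used here in place of a live request.
$result = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<NewDataSet>
  <Table>
    <CITY>Portland</CITY>
    <STATE>OR</STATE>
    <ZIP>97219</ZIP>
    <AREA_CODE>503</AREA_CODE>
    <TIME_ZONE>P</TIME_ZONE>
  </Table>
</NewDataSet>
XML;

$xmlobj = simplexml_load_string($result);
if ($xmlobj === false) {
    exit("Could not parse the XML response\n");
}

// Cast to string to get plain values rather than SimpleXMLElement objects.
$city = (string) $xmlobj->Table->CITY;

// Collect every field of the first Table element into an array.
$info = [];
foreach ($xmlobj->Table->children() as $name => $value) {
    $info[$name] = (string) $value;
}

echo $city, "\n";
print_r($info);
```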
If you want to get fancy, you could throw the whole thing into a class:
class GetInfoByZIP {
public $zip;
public $xmlobj;
public function __construct($zip='') {
if($zip) {
$this->zip = $zip;
$this->load();
}
}
public function load() {
if($this->zip) {
$url = "http://www.webservicex.net/uszip.asmx/GetInfoByZIP?USZip={$this->zip}";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
$this->xmlobj = simplexml_load_string($result);
}
}
public function __get($name) {
return $this->xmlobj->Table->$name;
}
}
which can then be used like this:
$zipInfo = new GetInfoByZIP(97219);
print $zipInfo->CITY;
I would use the HTTP POST or GET interfaces with curl. It looks like it gives you a nice clean XML output that you could parse with simpleXML.
Something like the following would go a long way (warning, totally untested here):
$ch = curl_init('http://www.webservicex.net/uszip.asmx/GetInfoByZIP?USZip=string');
curl_setopt($ch,CURLOPT_RETURNTRANSFER,TRUE);
$xml = curl_exec($ch);
curl_close($ch);
$parsed = new SimpleXMLElement($xml);
print_r($parsed);
