I'm trying to get the same result from a site by using DOM elements and by using XPath, so I can make this crawler dynamic for more sites and only have to fill in the URL and the type (XPath or DOM element).
$url = 'https://#/';
$xpath = "/html[1]/body[1]/div[3]/header[1]/div[1]/div[1]/div[2]/div[1]/ul[1]/li[2]/ul[1]/li[1]/span[1]";
$client = new Client();
$guzzleClient = new GuzzleClient(array(
    'timeout' => 60,
));
$client->setClient($guzzleClient);
$crawler = $client->request('GET', $url);

$crawler->filter('.rate')->filter('.gold')->each(function ($node) {
    print $node->text()."\n";
});
$result = $crawler->filterXPath($xpath);
var_dump($result);
The result should be the gold price, like this piece of code outputs:
$crawler->filter('.rate')->filter('.gold')->each(function ($node) {
    print $node->text()."\n";
});
If anything is unclear please let me know!
Welcome to SO.
If you want to fetch the gold rate, you can use the selectors below.
XPath
//ul[@class='rates-widget list-inline']//span[@class='rate gold']
CSS
ul.rates-widget.list-inline span.rate.gold
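For reference, a minimal sketch of plugging either selector into the Goutte crawler from the question (the class names come from the selectors above; adjust them if the site's markup differs):
// XPath variant
$crawler->filterXPath("//ul[@class='rates-widget list-inline']//span[@class='rate gold']")
    ->each(function ($node) {
        print $node->text()."\n";
    });

// CSS variant, same result
$crawler->filter('ul.rates-widget.list-inline span.rate.gold')->each(function ($node) {
    print $node->text()."\n";
});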
I have the following method from my controller that gets the data from the site:
$goutteClient = new Client();
$guzzleClient = new GuzzleClient([
    'timeout' => 60,
]);
$goutteClient->setClient($guzzleClient);

$crawler = $goutteClient->request('GET', 'https://html.duckduckgo.com/html/?q=Laravel');
$crawler->filter('.result__title .result__a')->each(function ($node) {
    dump($node->text());
});
The above code gives me the title of contents from the search results. I also want to get the link of the corresponding search result. That resides in class result__extras__url.
How do I filter the link in and the title at once? Or do I have to run another method for that?
Try to inspect the attributes of the nodes. Once you get the href attribute, parse it to get the URL.
$crawler->filter('.result__title .result__a')->each(function ($node) {
    $parts = parse_url(urldecode($node->attr('href')));
    parse_str($parts['query'], $params);

    $url = $params['uddg']; // DDG links to a masked URL and places the actual URL in the uddg query param.
    $title = $node->text();
});
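If you need both values outside the closure, one option (a sketch, not part of the original answer) is to let each() collect them into an array of results:
$results = $crawler->filter('.result__title .result__a')->each(function ($node) {
    $parts = parse_url(urldecode($node->attr('href')));
    parse_str($parts['query'], $params);

    // each() returns the closure's return values as an array.
    return [
        'title' => $node->text(),
        'url' => $params['uddg'] ?? null,
    ];
});

dump($results);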
For parsing, I usually do the following:
$doc = new DOMDocument();
$doc->loadHTML((string)$crawler->getBody());
From then on, you can access elements using the getElementsByTagName methods on your DOMDocument.
For example:
$rows = $doc->getElementsByTagName('tr');
foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');
    $value = trim($cols->item(0)->nodeValue);
}
You can find more information at
https://www.php.net/manual/en/class.domdocument.php
I'm using Guzzle with Laravel to fetch an object from an external API over HTTP. The API returns an XML object similar to this (https://www.w3schools.com/php/note.xml).
I need to check one value of the response body. Here is my code:
$client = new \GuzzleHttp\Client();
$res = $client->request('GET', 'https://www.w3schools.com/php/note.xml');
$stringBody = (string) $res->getBody()->getContents();
echo $stringBody;
That works fine, I mean it displays the body as in the picture below,
but I couldn't get a single value out of it.
I tried different methods but none of them work!
For example, this way:
$result = starts_with($stringBody, 'Tove');
or
$splitName = explode(' ', $res->getBody());
$first_name = $splitName[0];
echo $first_name;
None of them work? I think it considers the body text empty?
I tried using json_decode but it doesn't work or isn't supported anymore.
Any idea?
Thanks
What you get from the response is XML content. Because of how the browser renders XML, you only see the text. You just need PHP's SimpleXML class to get the content per XML node. Here is sample code:
$client = new \GuzzleHttp\Client();
$res = $client->request('GET', 'https://www.w3schools.com/php/note.xml');
$stringBody = (string) $res->getBody()->getContents();
$xml = simplexml_load_string($stringBody);
echo $xml->to;
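To check a single value from the body, as asked in the question, compare against the node cast to a string; a minimal sketch assuming the note.xml structure (to, from, heading, body):
// SimpleXML elements are objects, so cast to string before comparing.
if ((string) $xml->to === 'Tove') {
    echo 'The note is addressed to Tove';
}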
Hope this helps you.
I'm trying to get these messages (METAR) from a page and show them all in another PHP file without the styles and extra info.
At the moment I'm using this code:
<?php
$options = array('http' => array(
    'method' => 'GET',
));
$config = stream_context_create($options);
$config_final=file_get_contents('http://www.smn.gov.ar/mensajes/index.php?observacion=metar&operacion=consultar&87582=on&87641=on&87750=on&87765=on&87222=on&87761=on&87860=on&87395=on&87344=on&87166=on&87904=on&87571=on&87347=on&87803=on&87576=on&87162=on&87532=on&87497=on&87097=on&87046=on&87548=on&87217=on&87506=on&87692=on&87418=on&87574=on&87715=on&87374=on&87289=on&87852=on&87178=on&87896=on&87823=on&87270=on&87155=on&87453=on&87925=on&87934=on&87480=on&87047=on&87553=on&87311=on&87909=on&87436=on&87509=on&87912=on&87623=on&87444=on&87129=on&87371=on&87645=on&87022=on&87127=on&87828=on&87121=on&87938=on&87791=on&87448=on',false, $config);
preg_match_all("|<td width=\"100%\">METAR (.*)</td>|sU", $config_final, $tiempo);
echo $tiempo[1][0];
?>
</div>
Using that code I can only get the first METAR. What I need is to see all of them on different lines, shown as separate results.
Any ideas?
Thanks in advance
I suggest you utilize PHP's built-in HTML parsers for this, in particular DOMDocument, and use DOMXpath to search for those nodes.
Example:
$url = 'http://www.smn.gov.ar/mensajes/index.php?observacion=metar&operacion=consultar&87582=on&87641=on&87750=on&87765=on&87222=on&87761=on&87860=on&87395=on&87344=on&87166=on&87904=on&87571=on&87347=on&87803=on&87576=on&87162=on&87532=on&87497=on&87097=on&87046=on&87548=on&87217=on&87506=on&87692=on&87418=on&87574=on&87715=on&87374=on&87289=on&87852=on&87178=on&87896=on&87823=on&87270=on&87155=on&87453=on&87925=on&87934=on&87480=on&87047=on&87553=on&87311=on&87909=on&87436=on&87509=on&87912=on&87623=on&87444=on&87129=on&87371=on&87645=on&87022=on&87127=on&87828=on&87121=on&87938=on&87791=on&87448=on';
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile($url);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
// search for td's with contains METAR
$metars = $xpath->query('//td[contains(text(), "METAR")]');
if ($metars->length <= 0) {
    echo 'no metars found';
    exit;
}

$data = array();
foreach ($metars as $metar) {
    $data[] = $metar->nodeValue;
}
echo '<pre>';
print_r($data);
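Since the question asks to show each METAR on a separate line, you could also echo them directly instead of using print_r; a small sketch:
// one METAR per line, no extra markup
foreach ($data as $metar) {
    echo trim($metar) . "\n"; // use "<br>" instead if the output is rendered as HTML
}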
I'm trying to parse the XML response from a SOAP service. However, I can't get simplexml_load_string to work! Here is my code:
//make soap call
$objClient = new SoapClient('my-wsdl',
    array('trace' => true, 'exceptions' => 0, 'encoding' => 'UTF-8'));
$soapvar = new SoapVar('my-xml', XSD_ANYXML);
$objResponse = $objClient->__soapCall($operation, array($soapvar));
//process result
$str_xml = $objClient->__getLastResponse();
$rs_xml = simplexml_load_string($str_xml);
$rs_xml always has just one element, named Envelope.
However, if I use print var_export($objClient->__getLastResponse(), true); to dump the result to my browser, then cut and paste it into my code as a string variable, it works fine! This is what I mean:
$str_xml = 'my cut and pasted xml';
$rs_xml = simplexml_load_string($str_xml);
So it seems the problem is somehow related to something $objClient->__getLastResponse() is doing to the string it creates... but I'm at a loss as to what the problem is or how to fix it.
Do the following:
$str_xml = $objClient->__getLastResponse();
$str_xml = strstr($str_xml, '<');
$rs_xml = simplexml_load_string($str_xml);
It's a quick and easy hack to strip off anything before the first opening element.
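If the parsed document still only exposes the Envelope element after that, it is usually because the SOAP response is namespaced. Here is a sketch of reaching the body with SimpleXML, assuming the standard SOAP 1.1 envelope namespace (adjust the URI and element names to your service):
$rs_xml = simplexml_load_string($str_xml);

// Register the envelope namespace so xpath() can address soap:Body.
$rs_xml->registerXPathNamespace('soap', 'http://schemas.xmlsoap.org/soap/envelope/');

// Grab whatever elements sit inside the Body, regardless of their own namespace.
foreach ($rs_xml->xpath('//soap:Body/*') as $node) {
    echo $node->getName() . "\n";
}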
I am using SoapClient to pull data out of a SharePoint list. If it's a normal text field it works fine and gives me the image. I can even attach the image to the individual list elements, link it into another field, and get the image that way. The problem with that is it asks me to log in to my SharePoint account whenever I access the page, which obviously a normal user of my site will not be able to do.
First, if there is a way around this, that will be a sufficient answer because that is my ideal way of doing it.
However, if the better way is to make a picture gallery and then pull the pictures from there then that isn't a problem.
Basically what I need to know is how to use the Imaging library and maybe the GetItemsByIds method? I am very new to SOAP and SharePoint in general, so I apologize for what may be trivial questions, but I really need to know how to do this and I can find no resource on the internet that explains what I need to know (if there is one, link it!). Keep in mind, I have to do this in PHP.
Here is some code that I use to pull the list data:
<?php
$authParams = array(
    'login' => 'username',
    'password' => 'pass'
);

$listName = "{GUID}";
$rowLimit = '500';
$wsdl = "list.wsdl";

$soapClient = new SoapClient($wsdl, $authParams);

$params = array(
    'listName' => $listName,
    'rowLimit' => $rowLimit
);

echo file_get_contents($wsdl, FILE_TEXT, stream_context_create(array('http' => array('timeout' => 1))), 0, 1);
$rawXMLresponse = null;
try {
    $rawXMLresponse = $soapClient->GetListItems($params)->GetListItemsResult->any;
} catch (SoapFault $fault) {
    echo 'Fault code: ' . $fault->faultcode;
    echo 'Fault string: ' . $fault->faultstring;
}
echo '<pre>' . $rawXMLresponse . '</pre>';

$dom = new DOMDocument();
$dom->loadXML($rawXMLresponse);
$results = $dom->getElementsByTagNameNS("#RowsetSchema", "*");

// do the useful thing

unset($soapClient);
?>
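For what it's worth, the nodes in $results are the rowset's z:row elements, and each SharePoint field comes back as an ows_* attribute on them. A minimal sketch of reading a field, assuming the list has a Title column (attribute names vary per list):
foreach ($results as $row) {
    // Field values are exposed as ows_* attributes on each <z:row>.
    echo $row->getAttribute('ows_Title') . "\n";
}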
It was an issue with the user permissions. Once we had an account with the correct permissions, it worked fine.