How to find <title> and <form> using xpath query - php

I'm using an XPath query to find the <title> text and any occurrence of <form>. My code is as follows:
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
$pokemon_doc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$pokemon_xpath = new DOMXPath($pokemon_doc);
$pokemon_row = $pokemon_xpath->query('/title');
if($pokemon_row->length > 0){
$data["title"]="Yes";
}
}
But it's not working. Any idea how to do this?

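The path /title only matches a <title> element at the document root, and the root element of an HTML document is <html>. Use //title to match it anywhere in the document. Try this: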
$pokemon_row = $pokemon_xpath->query('//title');
$data["title"]="";
if($pokemon_row->length > 0){
foreach($pokemon_row as $row){
$data["title"].= $row->nodeValue . "<br/>";
}
}
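The question also asks about occurrences of <form>, which the snippet above does not cover. As a minimal sketch, assuming a small inline HTML string in place of the fetched page (the sample markup and the "forms" and "has_form" keys are made up for illustration), both values can be read with DOMXPath::evaluate():

```php
<?php
// Hypothetical sample markup standing in for the fetched $html.
$html = '<html><head><title>Demo page</title></head>'
      . '<body><form action="/a"></form><form action="/b"></form></body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$data = [];
// string() yields the text of the first matching node, or "" if there is none
$data['title'] = $xpath->evaluate('string(//title)');
// count() comes back as a float in PHP, so cast it to int
$data['forms'] = (int) $xpath->evaluate('count(//form)');
$data['has_form'] = $data['forms'] > 0 ? 'Yes' : 'No';

print_r($data);
```

evaluate() is used here instead of query() because it can return a string or a number directly, which avoids looping over a node list when only one value is needed.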

Related

how to get content from div container with two class names (imdb)

Hello everybody, please excuse my English.
I get a blank page when I try to query the content of the DIV container at the URL below.
$html = file_get_contents('https://www.imdb.com/search/title?title_type=feature,tv_movie&release_date=,2018'); //get the html returned from the following url
$doc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
$doc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$xpath = new DOMXPath($doc);
//get the links inside the div's that carry both class names
$row = $xpath->query("//div[contains(@class, 'lister-item-image') and contains(@class, 'float-left')]/a");
if($row->length > 0){
foreach($row as $row){
echo $row->nodeValue . "<br/>";
}
}
}
The content can be found within this DIV:
<div class="lister-item-image float-left">
<a href="/title/tt1502407/?ref_=adv_li_i"
> <img alt="Halloween"
class="loadlate"
loadlate="https://m.media-amazon.com/images/M/MV5BMmMzNjJhYjUtNzFkZi00MWQ4LWJiMDEtYWM0NTAzNGZjMTI3XkEyXkFqcGdeQXVyOTE2OTMwNDk#._V1_UX67_CR0,0,67,98_AL_.jpg"
data-tconst="tt1502407"
height="98"
src="https://m.media-amazon.com/images/G/01/imdb/images/nopicture/large/film-184890147._CB470041630_.png"
width="67" />
</a> </div>
I mainly want to query the name, link, genre and length. A maximum of 50 entries should be displayed, with a "Next" link that queries the next 50.
I thank you in advance for possible help.
Working version:
Thanks to Mohammad.
$html = file_get_contents('https://www.imdb.com/search/title?title_type=feature,tv_movie&release_date=,2018'); //get the html returned from the following url
$doc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
$doc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$xpath = new DOMXPath($doc);
//get the div's that carry both class names
$row = $xpath->query("//div[contains(@class, 'lister-item-image') and contains(@class, 'float-left')]");
if($row->length > 0){
foreach($row as $row){
echo $doc->saveHtml($row) . "<br/>";
}
}
}
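The first attempt printed nothing because the <a> element contains only an <img>, so its nodeValue is empty; the name and link live in attributes. A minimal sketch, assuming a shortened copy of the markup shown in the question, reads the href and alt attributes instead. It also matches whole class tokens rather than substrings. Genre and length are not part of the quoted markup, so they are left out here.

```php
<?php
// Shortened copy of the markup quoted in the question.
$html = '<div class="lister-item-image float-left">'
      . '<a href="/title/tt1502407/?ref_=adv_li_i">'
      . '<img alt="Halloween" class="loadlate" height="98" width="67" />'
      . '</a></div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// Pad the class attribute with spaces so only whole class names match
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' lister-item-image ')"
       . " and contains(concat(' ', normalize-space(@class), ' '), ' float-left ')]/a";

$items = [];
foreach ($xpath->query($query) as $link) {
    // Look up the <img> relative to the current link
    $img = $xpath->query('img', $link)->item(0);
    $items[] = [
        'link' => $link->getAttribute('href'),
        'name' => $img !== null ? $img->getAttribute('alt') : '',
    ];
}
print_r($items);
```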

PHP xpath how to get start tag

I am trying to fetch a form start tag with attributes from a DomDocument loaded with a HTML string.
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$result = $xpath->query('//form[@class="af-form acf-form"]');
if ($result->length > 0) {
echo '<pre>';
print_r(($result->item(0)->C14N()));
echo '</pre>';
die();
}
But this way it prints out the entire form. I would like to fetch only this bit:
<form action="http://localhost/wp-test/form-loose" class="af-form acf-form" id="form_5b72d1cd12cc0" method="POST">
How to do so?
XPath fetches nodes, not opening/closing tags. The DOM is a hierarchy of objects; only the serialized (HTML) string has opening/closing tags.
However, here are two possible approaches:
Clone the node without its child nodes. Save the clone and remove the closing tag with a string function.
$html = <<<'HTML'
<form
action="http://localhost/wp-test/form-loose"
class="af-form acf-form" id="form_5b72d1cd12cc0" method="POST">
some other stuff
<input>
</form>
HTML;
$document = new DOMDocument();
@$document->loadHTML($html);
$xpath = new DOMXpath($document);
$result = $xpath->evaluate('//form[@class="af-form acf-form"][1]');
foreach ($result as $node) {
echo substr($document->saveHTML($node->cloneNode()), 0, -7);
}
Output:
<form action="http://localhost/wp-test/form-loose" class="af-form acf-form" id="form_5b72d1cd12cc0" method="POST">
Or you save each attribute:
$result = $xpath->evaluate('//form[@class="af-form acf-form"][1]');
foreach ($result as $node) {
$result = '<'.$node->nodeName;
foreach ($node->attributes as $attribute) {
$result .= $document->saveHTML($attribute);
}
$result .= '>';
echo $result;
}
Note: Adding [1] to the Xpath expression limits the result list to the first found node.

Scraping some data using PHP

I would like to track the scores of some selected users from https://location.services.mozilla.com/leaders. For that I want to scrape their data using the id, as you can see for the first user: <a id="g#v" href="#g#v">g#v</a>
I want to get his data using the id on that anchor. I have written the code below, but couldn't get it to work. Ideally I could create an array like $names=array("g#v","elly","elkos"," grack"); and get the scores of all of them, so that if I add more names, I get everyone's score. For now I was trying with a single user:
<?php
$html = file_get_contents('https://location.services.mozilla.com/leaders');
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
$pokemon_doc->loadHTML($html);
libxml_clear_errors();
$pokemon_xpath = new DOMXPath($pokemon_doc);
$pokemon_row = $pokemon_xpath->query('//a[@id="g#v"]');
if($pokemon_row->length > 0){
foreach($pokemon_row as $row){
$name = $row->nodeValue;
$scores = $pokemon_xpath->query('//.td.tr/td[@class="text-right"]', $row);
foreach($scores as $score){
$lead = $score->nodeValue;
}
echo $name . ": ". $lead;
}
}
}
?>
I tweaked your scraping code to display all users and their score:
<?php
$html = file_get_contents('https://location.services.mozilla.com/leaders');
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
$pokemon_doc->loadHTML($html);
libxml_clear_errors();
$pokemon_xpath = new DOMXPath($pokemon_doc);
$pokemon_row = $pokemon_xpath->query('//tr');
if($pokemon_row->length > 0){
// Get each tr
foreach($pokemon_row as $row){
// Get each td in the tr group
$tds = $row->childNodes;
foreach($tds as $td){
// Filter out empty nodes
if(trim($td->nodeValue, " \n\r\t\0\xC2\xA0") !== "") {
$name = $td->nodeValue;
echo $name . " ";
}
}
echo "<br>" . PHP_EOL;
}
}
}
?>
This will print out:
Rank User Points
1. g#v 18444343
2. elly 4458330
3. elkos 4452607
4. grack 2769789
5. sonickydon 2707133
6. SDBoyd 2636721
...
If you want to search for a single user, then just replace the query line with this:
$pokemon_row = $pokemon_xpath->query('//tr[td/a[@id="g#v"]]');
which will print:
1. g#v 18444343
Hope that helps.
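The question also asked for the scores of a whole array of names. A minimal sketch of that, assuming each row has the shape rank / user anchor / points with the points in the last cell (the inline table below is a hypothetical excerpt built from the output above, not the live page):

```php
<?php
// Hypothetical excerpt shaped like the leaderboard rows printed above.
$html = '<table>'
      . '<tr><td>2.</td><td><a id="elly" href="#elly">elly</a></td><td class="text-right">4458330</td></tr>'
      . '<tr><td>3.</td><td><a id="elkos" href="#elkos">elkos</a></td><td class="text-right">4452607</td></tr>'
      . '<tr><td>4.</td><td><a id="grack" href="#grack">grack</a></td><td class="text-right">2769789</td></tr>'
      . '</table>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$names  = ['elly', 'elkos', 'grack', 'nobody'];
$scores = [];
foreach ($names as $name) {
    // Assumes the name contains no double quote, since it is placed inside "..."
    $row = $xpath->query(sprintf('//tr[td/a[@id="%s"]]', $name))->item(0);
    if ($row === null) {
        continue;   // name not on the page
    }
    // Read the last cell of the matched row, relative to that row
    $scores[$name] = (int) trim($xpath->evaluate('string(td[last()])', $row));
}
print_r($scores);
```

Names that are not found are skipped, so the resulting array only contains users that actually appear in the table.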

regex to scrape data from web page

I tried to scrape data from a web page, but it gives a DOM warning. So I want to know: is it possible for a regex to scrape the date, review, and rating value from this page?
http://www.yelp.com/biz/franchino-san-francisco?start=80
Here it is with DOM:
https://eval.in/143074 gives an error.
This works for a smaller piece of code: https://eval.in/143036
Is it possible using regex?
<?php
$html= file_get_contents('http://www.yelp.com/biz/franchino-san-francisco?start=80');
$html = escapeshellarg($html) ;
$html = nl2br($html);
$classname = 'rating-qualifier';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='" . $classname . "']");
if ($results->length > 0) {
echo $review = $results->item(0)->nodeValue;
}
$classname = 'review_comment ieSucks';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='" . $classname . "']");
if ($results->length > 0) {
echo $review = $results->item(0)->nodeValue;
}
$meta = $dom->documentElement->getElementsByTagName("meta");
echo $meta->item(0)->getAttribute('content');
?>
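One likely source of trouble in the code above is that the page is passed through escapeshellarg() and nl2br() before parsing, which alters the markup that DOMDocument receives. A minimal sketch of the DOM approach without those steps, assuming a small hypothetical snippet (only the two class names come from the question; the sample text and the firstText helper are made up for illustration):

```php
<?php
// Hypothetical helper: trimmed text of the first element whose class attribute equals $class.
function firstText(DOMXPath $xpath, string $class): string
{
    return trim($xpath->evaluate(sprintf('string(//*[@class="%s"])', $class)));
}

// Hypothetical snippet; only the class names are taken from the question.
$html = '<div><span class="rating-qualifier"> 4/12/2014 </span>'
      . '<p class="review_comment ieSucks">Great pasta.</p></div>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);              // parse the raw HTML with no escaping applied first
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$date   = firstText($xpath, 'rating-qualifier');
$review = firstText($xpath, 'review_comment ieSucks');
echo $date, "\n", $review, "\n";
```

The document is parsed once and reused for both lookups, rather than being loaded again for each class name.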

PHP Xpath Error already defined in Entity not showing results

I am getting errors in this PHP XPath app and I cannot fix them. I would love some help if possible.
<?php
//Get Username
$username = $_GET["u"];
$html = file_get_contents('http://us.playstation.com/publictrophy/index.htm?onlinename=' .$username);
$html = tidy_repair_string($html);
$doc = new DomDocument();
$doc->loadHtml($html);
$xpath = new DomXPath($doc);
// Now query the document:
foreach ($xpath->query('//*[@id="id-handle"]') as $node) {
echo $node, "\n";
}
foreach ($xpath->query('//*[@id="leveltext"]') as $node1) {
echo $node1, "\n";
}
?>
Put @ before $dom->loadHTML($html), because loadHTML usually raises a lot of warnings and notices:
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
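As an alternative to the @ operator, libxml_use_internal_errors() buffers the parser messages so they can still be inspected. Separately, the question's echo $node tries to print a DOMElement object, which cannot be converted to a string; the text has to be read from the node instead. A minimal sketch, assuming hypothetical markup with a duplicated id in place of the fetched page:

```php
<?php
// Hypothetical markup with a duplicated id, standing in for the fetched page.
$html = '<div id="id-handle">some_user</div><div id="id-handle">dup</div>'
      . '<div id="leveltext">12</div>';

libxml_use_internal_errors(true);   // buffer parser errors instead of emitting warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
$errors = libxml_get_errors();      // array of LibXMLError objects, available for logging
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// string() returns the text of the first match in document order
$handle = $xpath->evaluate('string(//*[@id="id-handle"])');
$level  = $xpath->evaluate('string(//*[@id="leveltext"])');
echo $handle, "\n", $level, "\n";
```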
