How to find <title> and <form> using xpath query

I'm using an XPath query to read the <title> text and to check whether a <form> occurs in the page. My code is as follows:
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
$pokemon_doc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$pokemon_xpath = new DOMXPath($pokemon_doc);
$pokemon_row = $pokemon_xpath->query('/title');
if($pokemon_row->length > 0){
$data["title"]="Yes";
}
}
But it's not working. Any idea how to do this?

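The path /title only matches a root element named title, and the root of an HTML document is <html>, so the query returns nothing. Use //title to match the element anywhere in the document (//form works the same way for forms). Try this: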
$pokemon_row = $pokemon_xpath->query('//title');
$data["title"]="";
if($pokemon_row->length > 0){
foreach($pokemon_row as $row){
$data["title"].= $row->nodeValue . "<br/>";
}
}
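The answer above only covers <title>. A minimal sketch that also counts <form> elements could look like the following; the function name inspectPage and the sample markup are made up for illustration and are not part of the original post.

```php
<?php
// Hypothetical helper (not from the original post): returns the <title> text
// and the number of <form> elements found in an HTML string.
function inspectPage(string $html): array
{
    $data = ['title' => '', 'forms' => 0];
    if (trim($html) === '') {
        return $data; // nothing to parse
    }

    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // string() yields the text of the first matching node, or '' if there is none.
    $data['title'] = trim($xpath->evaluate('string(//title)'));
    // count() comes back as a float, so cast it to int.
    $data['forms'] = (int) $xpath->evaluate('count(//form)');

    return $data;
}

// Made-up sample markup for illustration.
$sample = '<html><head><title>Login</title></head>'
    . '<body><form action="/login"><input name="user"></form></body></html>';

print_r(inspectPage($sample));
```

Using evaluate() with string() and count() avoids looping over a node list when only a text value or a number is needed.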

Related

how to get content from div container with two class names (imdb)

Excuse my English.
Hello everybody,
I get a blank page when I try to query the content of the DIV containers at this URL.
$html = file_get_contents('https://www.imdb.com/search/title?title_type=feature,tv_movie&release_date=,2018'); //get the html returned from the following url
$doc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
$doc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$xpath = new DOMXPath($doc);
//get the links inside every div that carries both classes
$row = $xpath->query("//div[contains(@class, 'lister-item-image') and contains(@class, 'float-left')]/a");
if($row->length > 0){
foreach($row as $row){
echo $row->nodeValue . "<br/>";
}
}
}
The content can be found within DIVs like this one:
<div class="lister-item-image float-left">
<a href="/title/tt1502407/?ref_=adv_li_i"
> <img alt="Halloween"
class="loadlate"
loadlate="https://m.media-amazon.com/images/M/MV5BMmMzNjJhYjUtNzFkZi00MWQ4LWJiMDEtYWM0NTAzNGZjMTI3XkEyXkFqcGdeQXVyOTE2OTMwNDk@._V1_UX67_CR0,0,67,98_AL_.jpg"
data-tconst="tt1502407"
height="98"
src="https://m.media-amazon.com/images/G/01/imdb/images/nopicture/large/film-184890147._CB470041630_.png"
width="67" />
</a> </div>
I mainly want to query the name, link, genre and length. At most 50 entries should be displayed, and a "Next" link should query the next 50.
Thank you in advance for any help.

Working version:
Thanks to Mohammad.
$html = file_get_contents('https://www.imdb.com/search/title?title_type=feature,tv_movie&release_date=,2018'); //get the html returned from the following url
$doc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
$doc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$xpath = new DOMXPath($doc);
//get every div that carries both classes
$rows = $xpath->query("//div[contains(@class, 'lister-item-image') and contains(@class, 'float-left')]");
if($rows->length > 0){
foreach($rows as $row){
echo $doc->saveHTML($row) . "<br/>";
}
}
}
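The original query printed nothing visible because the <a> element only contains an <img>, so its nodeValue has no text. To pull out the name and link instead of dumping the markup, something along these lines might work. This is only a sketch: it assumes the name is in the img alt attribute, as in the snippet from the question, and extractPosters is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns [['name' => ..., 'link' => ...], ...]
// for each poster link, up to $limit entries.
function extractPosters(string $html, int $limit = 50): array
{
    if (trim($html) === '') {
        return [];
    }

    $previous = libxml_use_internal_errors(true); // silence warnings for messy HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word, then take links that contain an <img>.
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), "
        . "' lister-item-image ')]/a[img]";

    $items = [];
    foreach ($xpath->query($query) as $anchor) {
        if (count($items) >= $limit) {
            break;
        }
        $img = $xpath->query('img', $anchor)->item(0);
        $items[] = [
            'name' => $img->getAttribute('alt'),     // assumption: alt holds the title
            'link' => $anchor->getAttribute('href'),
        ];
    }

    return $items;
}

// Sample markup shortened from the question.
$sample = '<div class="lister-item-image float-left">'
    . '<a href="/title/tt1502407/?ref_=adv_li_i">'
    . '<img alt="Halloween" class="loadlate" height="98" width="67"></a></div>';

print_r(extractPosters($sample));
```

The $limit parameter covers the "maximum of 50" part of the question; genre, length and paging are not handled here because the question does not show that markup.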

PHP xpath how to get start tag

I am trying to fetch a form start tag with attributes from a DomDocument loaded with a HTML string.
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$result = $xpath->query('//form[@class="af-form acf-form"]');
if ($result->length > 0) {
echo '<pre>';
print_r(($result->item(0)->C14N()));
echo '</pre>';
die();
}
But this way it prints out the entire form. I would like to fetch only this bit:
<form action="http://localhost/wp-test/form-loose" class="af-form acf-form" id="form_5b72d1cd12cc0" method="POST">
How to do so?

XPath fetches nodes, not opening/closing tags. The DOM is a hierarchy of objects; only the serialized (HTML) string has opening and closing tags.
However, here are two possible approaches:
Clone the node without its child nodes, save the clone, and remove the closing tag with a string function:
$html = <<<'HTML'
<form
action="http://localhost/wp-test/form-loose"
class="af-form acf-form" id="form_5b72d1cd12cc0" method="POST">
some other stuff
<input>
</form>
HTML;
$document = new DOMDocument();
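@$document->loadHTML($html);
$xpath = new DOMXpath($document);
$result = $xpath->evaluate('//form[@class="af-form acf-form"][1]');
foreach ($result as $node) {
// cloneNode() without arguments copies the element and its attributes, but no children;
// substr(..., 0, -7) then cuts off the trailing "</form>" (7 characters)
echo substr($document->saveHTML($node->cloneNode()), 0, -7);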
}
Output:
<form action="http://localhost/wp-test/form-loose" class="af-form acf-form" id="form_5b72d1cd12cc0" method="POST">
Or serialize each attribute yourself:
$result = $xpath->evaluate('//form[@class="af-form acf-form"][1]');
foreach ($result as $node) {
$tag = '<'.$node->nodeName;
foreach ($node->attributes as $attribute) {
$tag .= $document->saveHTML($attribute);
}
$tag .= '>';
echo $tag;
}
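Note: Adding [1] to the XPath expression limits the result to the first matching form within each parent element. To get only the first match in the whole document, wrap the expression in parentheses: (//form[@class="af-form acf-form"])[1]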

Scraping some data using PHP

I would like to track the scores of some selected users from https://location.services.mozilla.com/leaders. For that I want to scrape their data using the id, as you can see for the first user: <a id="g#v" href="#g#v">g#v</a>
I want to get his data using the id on that anchor. I have written the code below, but it doesn't work. Ideally I could create an array like $names=array("g#v","elly","elkos"," grack"); and get the scores of all of them, so that when I add more names I get everyone's score. For now I was trying with a single user:
<?php
$html = file_get_contents('https://location.services.mozilla.com/leaders');
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
$pokemon_doc->loadHTML($html);
libxml_clear_errors();
$pokemon_xpath = new DOMXPath($pokemon_doc);
$pokemon_row = $pokemon_xpath->query('//a[@id="g#v"]');
if($pokemon_row->length > 0){
foreach($pokemon_row as $row){
$name = $row->nodeValue;
$scores = $pokemon_xpath->query('//.td.tr/td[@class="text-right"]', $row);
foreach($scores as $score){
$lead = $score->nodeValue;
}
echo $name . ": ". $lead;
}
}
}
?>

I tweaked your scraping code to display all users and their score:
<?php
$html = file_get_contents('https://location.services.mozilla.com/leaders');
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){
$pokemon_doc->loadHTML($html);
libxml_clear_errors();
$pokemon_xpath = new DOMXPath($pokemon_doc);
$pokemon_row = $pokemon_xpath->query('//tr');
if($pokemon_row->length > 0){
// Get each tr
foreach($pokemon_row as $row){
// Get each td in the tr group
$tds = $row->childNodes;
foreach($tds as $td){
// Filter out empty nodes
if(trim($td->nodeValue, " \n\r\t\0\xC2\xA0") !== "") {
$name = $td->nodeValue;
echo $name . " ";
}
}
echo "<br>" . PHP_EOL;
}
}
}
?>
This will print out:
Rank User Points
1. g#v 18444343
2. elly 4458330
3. elkos 4452607
4. grack 2769789
5. sonickydon 2707133
6. SDBoyd 2636721
...
If you want to search for a single user, then just replace the query line with this:
$pokemon_row = $pokemon_xpath->query('//tr[td/a[@id="g#v"]]');
which will print:
1. g#v 18444343
Hope that helps.
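For the array of names asked about in the question, one possible sketch is to walk every row that has a user anchor and keep only the wanted ids. The table layout in the sample is an assumption (rank, user anchor, points in the last cell), and scoresFor is a hypothetical helper name. Comparing ids in PHP rather than inside the XPath string avoids quoting problems with unusual names.

```php
<?php
// Hypothetical helper: maps each requested user id to its score.
// Assumes one <tr> per user with an <a id="..."> and the points in the last <td>.
function scoresFor(string $html, array $names): array
{
    if (trim($html) === '') {
        return [];
    }

    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wanted = array_flip($names); // name => index, for quick lookups
    $scores = [];

    foreach ($xpath->query('//tr[td/a[@id]]') as $row) {
        // Relative expressions are evaluated against the current row.
        $id = $xpath->evaluate('string(td/a/@id)', $row);
        if (isset($wanted[$id])) {
            $scores[$id] = (int) trim($xpath->evaluate('string(td[last()])', $row));
        }
    }

    return $scores;
}

// Made-up sample table using the values printed above.
$sample = '<table>'
    . '<tr><td>1.</td><td><a id="g#v" href="#g#v">g#v</a></td><td class="text-right">18444343</td></tr>'
    . '<tr><td>2.</td><td><a id="elly" href="#elly">elly</a></td><td class="text-right">4458330</td></tr>'
    . '<tr><td>3.</td><td><a id="elkos" href="#elkos">elkos</a></td><td class="text-right">4452607</td></tr>'
    . '</table>';

print_r(scoresFor($sample, ['g#v', 'elkos']));
```

Names that do not appear in the table are simply missing from the result, so the caller can detect them with isset().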

regex to scrape data from web page

I tried to scrape data from a web page using regex, but it gives a DOM warning. So I want to know: is it possible to use regex to scrape the date, review, and rating value from this page?
http://www.yelp.com/biz/franchino-san-francisco?start=80
Here is the DOM version, which gives an error: https://eval.in/143074
This works for a smaller piece of code: https://eval.in/143036
Is it possible using regex?
<?php
$html= file_get_contents('http://www.yelp.com/biz/franchino-san-francisco?start=80');
$html = escapeshellarg($html) ;
$html = nl2br($html);
$classname = 'rating-qualifier';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='" . $classname . "']");
if ($results->length > 0) {
echo $review = $results->item(0)->nodeValue;
}
$classname = 'review_comment ieSucks';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='" . $classname . "']");
if ($results->length > 0) {
echo $review = $results->item(0)->nodeValue;
}
$meta = $dom->documentElement->getElementsByTagName("meta");
echo $meta->item(0)->getAttribute('content');
?>

PHP Xpath Error already defined in Entity not showing results

I am getting errors in this PHP XPath app that I cannot fix. I would appreciate some help.
<?php
//Get Username
$username = $_GET["u"];
$html = file_get_contents('http://us.playstation.com/publictrophy/index.htm?onlinename=' .$username);
$html = tidy_repair_string($html);
$doc = new DomDocument();
$doc->loadHtml($html);
$xpath = new DomXPath($doc);
// Now query the document:
foreach ($xpath->query('//*[@id="id-handle"]') as $node) {
echo $node, "\n";
}
foreach ($xpath->query('//*[@id="leveltext"]') as $node1) {
echo $node1, "\n";
}
?>

Put @ before $dom->loadHTML($html), because loadHTML usually raises a lot of warnings and notices on real-world HTML:
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
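As an alternative to the @ operator, libxml_use_internal_errors() can collect the parser warnings instead. Note also that echoing a DOM node directly, as the question's loops do, fails because the object cannot be converted to a string; its textContent has to be read instead. A rough sketch, where textsByIds and the sample markup are made up for illustration:

```php
<?php
// Hypothetical helper: returns the trimmed text of each element whose id
// is in $ids, keyed by id.
function textsByIds(string $html, array $ids): array
{
    if (trim($html) === '') {
        return [];
    }

    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $found = [];
    foreach ($xpath->query('//*[@id]') as $node) {
        $id = $node->getAttribute('id');
        if (in_array($id, $ids, true)) {
            // Read the text; "echo $node" would be a fatal error.
            $found[$id] = trim($node->textContent);
        }
    }

    return $found;
}

// Made-up sample markup; the real page structure is not shown in the question.
$sample = '<div id="id-handle">some_user</div><div id="leveltext"> 12 </div>';

foreach (textsByIds($sample, ['id-handle', 'leveltext']) as $id => $text) {
    echo $id, ': ', $text, "\n";
}
```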
