php read html and handle double id-appearance

For my project I'm reading an external website that uses the same ID twice. I can't change that.
I need the content from the second occurrence of that ID, but my code only returns the first one and does not see the second.
Also, count($data) returns 1, not 2.
I'm desperate. Does anyone have an idea how to access the second element with the ID 'hours'?
<?PHP
$url = 'myurl';
$contents = file_get_contents($url);
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile($url);
$data = $dom->getElementById("hours");
echo $data->nodeValue."\n";
echo count($data);
?>

As @rickdenhaan points out, getElementById always returns a single element: the first one whose id has that value. However, you can use DOMXPath to find all nodes with a given id value and then pick out the one you want (this code finds the second one):
$url = 'myurl';
$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings from malformed HTML out of the output
$dom->loadHTMLFile($url);
$xpath = new DOMXPath($dom);
$count = 0;
foreach ($xpath->query("//*[@id='hours']") as $node) {
    // $count is zero-based, so 1 is the second match
    if ($count == 1) echo $node->nodeValue;
    $count++;
}
As @NigelRen points out in the comments, you can simplify this further by selecting the second match directly in the XPath expression, i.e.
$node = $xpath->query("(//*[@id='hours'])[2]")->item(0);
echo $node->nodeValue;
Demo on 3v4l.org
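To try the XPath approach without fetching anything, here is a minimal self-contained sketch. The inline HTML and its contents are assumptions made up for illustration; only the id 'hours' comes from the question. It also shows why the original count() gave 1: getElementById returns one element, while the XPath query returns a list of all matches.

```php
<?php
// Inline sample HTML (an assumption for illustration) that repeats id="hours".
$html = '<html><body>'
      . '<div id="hours">Mon-Fri 9-17</div>'
      . '<div id="hours">Sat 10-14</div>'
      . '</body></html>';

libxml_use_internal_errors(true);   // malformed markup can raise libxml warnings
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//*[@id='hours']");

// The node list knows how many elements carry the id.
$matchCount = $nodes->length;

// item() is zero-based, so item(1) is the second occurrence.
$second = $nodes->length > 1 ? $nodes->item(1)->nodeValue : null;

echo $matchCount, "\n";
echo $second, "\n";
```

Checking the length before calling item(1) avoids a null dereference on pages where the id appears only once.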

Related

php - loadHTML() - every <p> until a certain class

I'm fetching some Wikipedia content in two different ways:
$html = file_get_contents('https://en.wikipedia.org/wiki/Sans-serif');
The first one gets the first paragraph:
$dom = new DOMDocument();
@$dom->loadHTML($html);
$p = $dom->getElementsByTagName('p')->item(0)->nodeValue;
echo $p;
The second one gets the first paragraph under a specific $id:
$dom = new DOMDocument();
@$dom->loadHTML($html);
$p = $dom->getElementById($id)->getElementsByTagName('p')->item(0);
echo $p->nodeValue;
I'm looking for a third way that gets the whole introductory section.
So I was thinking about getting every <p> that comes before the id or class "toc", which is the id/class of the table of contents.
Any idea how to do that?

If you're just looking for the intro in plain text, you can simply use Wikipedia's API:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Sans-serif
If you want HTML formatting as well (excluding inner images and the like):
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&titles=Sans-serif
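Those URLs return JSON, so the intro still has to be pulled out of the response. A rough sketch follows, assuming the usual query/pages/extract nesting of the response; the helper name firstExtract and the sample JSON (page id and text) are made up for illustration and are not real API output.

```php
<?php
// Hypothetical helper: return the first "extract" found in an extracts response.
function firstExtract(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['query']['pages'])) {
        return null;
    }
    // Pages are keyed by page id, which is not known in advance, so iterate.
    foreach ($data['query']['pages'] as $page) {
        if (isset($page['extract'])) {
            return $page['extract'];
        }
    }
    return null;
}

// Made-up sample in the assumed response shape (id and text are placeholders).
$sample = '{"query":{"pages":{"123":{"pageid":123,"title":"Sans-serif",'
        . '"extract":"Sample intro text."}}}}';
$intro = firstExtract($sample);
echo $intro, "\n";

// Against the live API (needs network access and allow_url_fopen):
// $json = file_get_contents('https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Sans-serif');
// echo firstExtract($json);
```

For live requests the server may expect a User-Agent header, which file_get_contents does not send unless one is configured.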

You could use DOMDocument and DOMXPath with, for example, an XPath expression like:
//div[@id="toc"]/preceding-sibling::p
$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML usually triggers parser warnings
$doc->loadHTMLFile("https://en.wikipedia.org/wiki/Sans-serif"); // load() would parse the page as XML
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="toc"]/preceding-sibling::p');
foreach ($nodes as $node) {
    echo $node->nodeValue;
}
That would give you the content of the paragraphs that precede the div with id="toc" and share its parent.
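The preceding-sibling axis can be checked offline with a small sketch like the one below. The inline markup is an assumption that only mimics the layout described in the question (intro paragraphs, then a div with id "toc", then body text); it is not Wikipedia's real markup.

```php
<?php
// Inline sample (an assumption): two intro paragraphs, a TOC div, one later paragraph.
$html = '<html><body><div id="content">'
      . '<p>Intro one.</p><p>Intro two.</p>'
      . '<div id="toc">Contents</div>'
      . '<p>Body text.</p>'
      . '</div></body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$intro = [];
// Only <p> elements that share a parent with the TOC div and come before it match.
foreach ($xpath->query('//div[@id="toc"]/preceding-sibling::p') as $p) {
    $intro[] = $p->nodeValue;
}
print_r($intro);
```

Paragraphs nested deeper than the TOC's own level are not siblings and would be missed; in that case the preceding:: axis may be a better fit.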

Display external html specific value

I want to display the "hiOutsideTemp" value from this html page: http://amira.meteokrites.gr/ZWNTANA.htm to my page. It's a temp value.
I'm using the following code:
<?php
require_once 'simple_html_dom.php';
$html_string = file_get_contents('http://amira.meteokrites.gr/ZWNTANA.htm');
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html_string);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$values = array();
$row = $xpath->query('//td[@id="/html/body/table/tbody/tr[7]/td[2]"]');
foreach($row as $value) {
    $values[] = trim($value->textContent);
}
echo '<pre>';
print_r($values);
?>
But I get no result.
What should I do?

Looking at the content of the URL you have, it seems to be just a list of settings rather than an HTML page (this is only displayed if you don't have the actual HTML file included). The content is something like...
imerominia="29/03/18";
ora=" 9:37";
sunriseTime=" 7:10";
sunsetTime="19:37";
ForecastStr=" Mostly cloudy and cooler. Windy with possible wind shift to the W, NW, or N. ";
tempUnit="°C";
outsideTemp="9.3";
hiOutsideTemp="9.5";
hiOutsideTempAT=" 0:03";
lowOutsideTemp="6.2";
lowOutsideTempAT=" 4:24";
...
So you can just load it as though it were in INI format, which gives you an associative array of the data.
$html_string = file_get_contents('http://amira.meteokrites.gr/ZWNTANA.htm');
$data = parse_ini_string($html_string);
echo $data["hiOutsideTemp"]; // outputs - 9.5
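The same idea can be tried without the network. This sketch uses a few lines copied from the excerpt above as sample input and guards against a failed parse; the variable names $raw and $hi are just for illustration.

```php
<?php
// Sample lines taken from the excerpt; each ends in ";", which the INI
// parser treats as the start of a comment, so it does not end up in the value.
$raw = implode("\n", [
    'outsideTemp="9.3";',
    'hiOutsideTemp="9.5";',
    'hiOutsideTempAT=" 0:03";',
    'lowOutsideTemp="6.2";',
]) . "\n";

// parse_ini_string() returns false on failure, so check before indexing.
$data = parse_ini_string($raw);
$hi = is_array($data) ? ($data['hiOutsideTemp'] ?? null) : null;

echo $hi === null ? "value not found\n" : $hi . "\n";
```

Values come back as strings, so cast with (float) before doing arithmetic on the temperature.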

How to detect a meta tag of another site?

I am trying to verify that someone actually owns the site that they claim to own. I need to detect a meta tag that I give them with a unique code. How can I go about doing this?

This would detect and print out the meta tag:
<?php
$html = file_get_contents('http://example.com');
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$element = $xpath->query('//meta');
// item(0) returns the 1st meta element, item(1) returns the 2nd meta element
$element = $element->item(0);
$result = $dom->saveXML($element);
$result = preg_replace('/(<[^>]+) style=".*?"/i', '$1', $result);
echo htmlspecialchars($result);
?>
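The snippet above prints the first meta tag; for ownership verification you would still need to compare it with the code you issued. One possible sketch is below. The function name hasVerificationMeta, the meta name "site-verification", and the token "abc123" are all hypothetical placeholders, not part of any standard.

```php
<?php
// Hypothetical helper: true when the page contains
// <meta name="$name" content="$token">.
function hasVerificationMeta(string $html, string $name, string $token): bool
{
    if ($html === '') {
        return false;               // loadHTML() rejects an empty string
    }
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    if (!$loaded) {
        return false;
    }
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//meta[@name]') as $meta) {
        if ($meta->getAttribute('name') === $name
            && $meta->getAttribute('content') === $token) {
            return true;
        }
    }
    return false;
}

// Inline sample page (an assumption); in practice $page would come from
// file_get_contents('http://example.com').
$page = '<html><head><meta name="site-verification" content="abc123">'
      . '</head><body></body></html>';
$ok  = hasVerificationMeta($page, 'site-verification', 'abc123');
$bad = hasVerificationMeta($page, 'site-verification', 'wrong');
var_dump($ok, $bad);
```

Comparing attributes in PHP rather than inside the XPath string avoids quoting problems if the token contains quote characters.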

How to get id of HTML elements

In PHP, I want to parse an HTML page and obtain the ids of certain elements. I am able to obtain all the elements, but unable to obtain the ids.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><h3 id="h3-elem-id">A</h3></body></html>');
$divs = $doc->getElementsByTagName('h3');
foreach($divs as $n) {
    (...)
}
Is there a way to also obtain the id of the element?
Thank you.

If you want the id attribute values, then you need to use getAttribute():
$doc = new DOMDocument();
$doc->loadHTML('<html><body><h3 id="h3-elem-id">A</h3></body></html>');
$divs = $doc->getElementsByTagName('h3');
foreach($divs as $n) {
    echo $n->getAttribute('id') . '<br/>';
}
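Note that getAttribute() returns an empty string when the attribute is missing, so a variant that collects only real ids might look like this sketch. The second h3 without an id is added here purely for illustration.

```php
<?php
libxml_use_internal_errors(true);
$doc = new DOMDocument();
// The second <h3> has no id; it is an assumption added to show the guard.
$doc->loadHTML('<html><body><h3 id="h3-elem-id">A</h3><h3>B</h3></body></html>');
libxml_clear_errors();

$ids = [];
foreach ($doc->getElementsByTagName('h3') as $n) {
    // Skip elements that carry no id attribute at all.
    if ($n->hasAttribute('id')) {
        $ids[] = $n->getAttribute('id');
    }
}
print_r($ids);
```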

xpath extract complete html

I am trying to extract a complete table including the HTML tags, with XPath, that I can store in a variable, do a bit of string replacement on, then echo directly to the screen. I have found numerous posts on getting the text out of the table but I want to retain the HTML formatting since I am just going to display it (after minor modification).
At present I am extracting the table using string functions stristr, substr etc. but I would prefer to use XPath.
I can display the contents of the table with the following but it just displays the table TD fields with no formatting. It also does not store it in a variable that I can manipulate.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$arr = $xpath->query('//table');
foreach($arr as $el) {
    echo $el->textContent;
}
I tried this but got no output:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$arr = $xpath->query('//table');
echo $arr->saveHTML();

Use DOMNode::C14N():
foreach($arr as $el) {
    echo $el->C14N();
}
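An alternative to C14N() is DOMDocument::saveHTML() with a node argument, which serialises just that subtree as HTML. The second attempt in the question fails because saveHTML() is a method of the document, not of the node list returned by query(). A minimal sketch follows, with made-up inline HTML and a made-up class name for the string replacement:

```php
<?php
// Inline sample HTML (an assumption for illustration).
$html = '<html><body><p>Before</p>'
      . '<table><tr><td>A</td><td>B</td></tr></table>'
      . '</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$tables = [];
foreach ($xpath->query('//table') as $el) {
    // Passing a node makes saveHTML() return only that element's markup.
    $tables[] = $dom->saveHTML($el);
}

// The markup is now an ordinary string that can be modified before output.
$markup = $tables[0] ?? '';
$markup = str_replace('<td>', '<td class="cell">', $markup);
echo $markup, "\n";
```

Unlike C14N(), which produces canonical XML, this keeps HTML serialisation rules, which may matter when the result is echoed straight into a page.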
