DOMDocument type object recognition - php

This is my php code:
$dom = new DOMDocument();
$html ='<html><body><input type="text" name="test" id="test" class="form-control" value="120.00" style="text-align: right;"></body></html>';
$dom->loadHTML($html);
$myElement = $dom->getElementById("test");
How can I get the type of the element (its tag name) and, for inputs, the value of its type attribute (for example input type="hidden")? For example:
if ($myElement->is('input')) then etc....
if ($myElement->is('img')) then etc....
if (($myElement->is('input')) && ($myElement->has('hidden'))) then etc....
Is this possible?
Thanks a lot.
Aesis.

You could do it like this. Make use of the getAttribute() method of the DOMElement class:
<?php
$dom = new DOMDocument();
$html ='<html><body><input type="text" name="test" id="test" class="form-control" value="120.00" style="text-align: right;"></body></html>';
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('input') as $tag) {
if ($tag->getAttribute('name') === 'test') {
echo $tag->getAttribute('value'); //"prints" 120.00
echo $tag->getAttribute('type'); //"prints" text (attribute)
}
}
You can do the same for other attributes too.

Did you try $myElement->tagName or $dom->getElementById("test")->tagName ?
http://www.php.net/manual/pt_BR/domdocument.getelementbyid.php
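For what it's worth, here is a minimal sketch of how the is('input') / has('hidden') checks from the question could be written with tagName and getAttribute(). The helper name elementIs() is made up for illustration and is not part of the DOM extension, and the sample input is changed to type="hidden" so the check has something to match.

```php
<?php
// Hypothetical helper (not a DOM extension method): true when $element has
// the given tag name and, if $type is given, a matching type attribute.
function elementIs(DOMElement $element, string $tag, ?string $type = null): bool
{
    if (strcasecmp($element->tagName, $tag) !== 0) {
        return false;
    }
    if ($type === null) {
        return true;
    }
    return strcasecmp($element->getAttribute('type'), $type) === 0;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><input type="hidden" name="test" id="test" value="120.00"></body></html>');
$myElement = $dom->getElementById('test');

if ($myElement !== null && elementIs($myElement, 'input', 'hidden')) {
    echo "hidden input\n";
}
```

Comparing the type attribute directly avoids matching the word "hidden" somewhere else in the markup, such as inside a value.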

Try this...
You can get the type of the object with the code below:
$typeOfObj = $myElement->nodeName;
echo $typeOfObj;
To find out whether its markup contains "hidden":
$node = $dom->saveHTML($myElement);
if (preg_match("/(hidden)/i", $node)) {
// has hidden
} else {
// does not have hidden
}

Related

How to Get value with name by Dom

Hi friends, I am using this method to get all href links from the a tags of a site:
$DOM = new DOMDocument();
@$DOM->loadHTML($data);
@$links = $DOM->getElementsByTagName('a');
foreach($links as $link){
$url = $link->getAttribute('href');
echo $url;
}
Now I don't know how to get the value of the input named fb_dtsg. Here is the source code:
<input type="hidden" name="fb_dtsg" value="AQF0dSiG6Lyr:AQEnJP0PhWzy" autocomplete="off" />
I want to get its value with DOM. How can I do this? Thanks in advance.
$DOM = new DOMDocument();
@$DOM->loadHTML($data);
$inputs = $DOM->getElementsByTagName('input');
foreach($inputs as $input) {
if ($input->getAttribute('name') == 'fb_dtsg') {
echo 'found, do whatever';
break;
}
}
You can use DOMXpath()'s query method to get elements by the name attribute.
$DOM = new DOMDocument();
@$DOM->loadHTML($data);
@$links = $DOM->getElementsByTagName('a');
$xpath = new DOMXpath($DOM);
$input = $xpath->query('//input[@name="fb_dtsg"]');
echo $input->item(0)->getAttribute('value');
This will print the value of the first input element with name 'fb_dtsg'.
Hope it helps :) Feel free to ask if you need to know anything more.
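As a rough sketch, the same lookup can be wrapped in a small function that returns null when nothing matches. The function name inputValueByName() is hypothetical. It also assumes you would rather silence parser warnings with libxml_use_internal_errors() than with the @ operator, and that $name never contains a double quote.

```php
<?php
// Hypothetical helper: value of the first <input> whose name matches, or null.
function inputValueByName(string $html, string $name): ?string
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assumption: $name contains no double quote, so it is safe to inline.
    $nodes = $xpath->query('//input[@name="' . $name . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('value');
}

$data = '<form><input type="hidden" name="fb_dtsg" value="AQF0dSiG6Lyr:AQEnJP0PhWzy" autocomplete="off" /></form>';
echo inputValueByName($data, 'fb_dtsg'), "\n";
```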
Use xpath for that.
$DOM = new DOMDocument();
@$DOM->loadHTML($data);
$xpath = new DOMXpath($DOM);
$elementByName = $xpath->query("//input[@name='fb_dtsg']");
...
http://php.net/manual/ro/class.domxpath.php
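$DOM->getElementsByTagName('a'); // for tag name
$DOM->getElementsByName('fb_dtsg'); // for name (note: PHP's DOMDocument has no such method; use XPath instead)
document.getElementById('fb_dtsg_id').value // for showing value of the field (this line is browser JavaScript, not PHP)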

Simple dom php parse get custom data attribute value

HTML:
<div class="something" data-a="abc">ddsf</div>
PHP:
foreach ($dom->find('.something[data-rel]') as $this) {
var_dump($this->attr());
}
I tried this but got an error, and I couldn't find any information about it in the documentation. I want to get the value of data-a, which is abc.
Why not just use the well-documented, built in, DOM extension?
Example:
$html = '<div class="something" data-a="abc">ddsf</div>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@class="something"]/@data-a');
foreach ($nodes as $node) {
var_dump($node->value);
}
Output:
string(3) "abc"
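If you need more than one custom attribute, a possible variation is to select the element itself and read its attributes. This is only a sketch; the helper name dataAttributes() is made up for illustration.

```php
<?php
// Hypothetical helper: map of data-* attribute names to values for one element.
function dataAttributes(DOMElement $element): array
{
    $result = [];
    foreach ($element->attributes as $attribute) {
        // Keep only attributes whose name starts with "data-".
        if (strpos($attribute->nodeName, 'data-') === 0) {
            $result[$attribute->nodeName] = $attribute->nodeValue;
        }
    }
    return $result;
}

$dom = new DOMDocument();
$dom->loadHTML('<div class="something" data-a="abc">ddsf</div>');
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[@class="something"]')->item(0);

var_dump($div->getAttribute('data-a')); // string(3) "abc"
$data = dataAttributes($div);
```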
It looks like this:
$dom->find('div[data-a]',0)->{'data-a'}
use
foreach($html->find('button') as $element)
echo $element->{'data-coupon'};
Use XPath.
It should be something like this:
foreach ($dom->xpath('/div[@data-a]') as $item) {
...
}

PHP Simple HTML DOM Parser: Accessing custom attributes

I want to access a custom attribute that I added to some elements in an HTML file, here's an example of the littleBox="somevalue" attribute
<div id="someId" littleBox="someValue">inner text</div>
The following doesn't work:
foreach($html->find('div') as $element){
echo $element;
if(isset($element->type)){
echo $element->littleBox;
}
}
I saw an article with a similar problem, but I couldn't replicate it for some reason. Here is what I tried:
function retrieveValue($str){
if (stripos($str, 'littleBox')){//check if element has it
$var=preg_split("/littleBox=\"/",$str);
//echo $var[1];
$var1=preg_split("/\"/",$var[1]);
echo $var1[0];
}
else
return false;
}
Whenever I call the retrieveValue() function, nothing happens. Is $element (in the first PHP example above) not a string? I don't know if I missed something, but it's not returning anything.
Here's the script in its entirety:
<?php
require("../../simplehtmldom/simple_html_dom.php");
if (isset($_POST['submit'])){
$html = file_get_html($_POST['webURL']);
// Find all images
foreach($html->find('div') as $element){
echo $element;
if(isset($element->type)!= false){
echo retrieveValue($element);
}
}
}
function retrieveValue($str){
if (stripos($str, 'littleBox')){//check if element has it
$var=preg_split("/littleBox=\"/",$str);
//echo $var[1];
$var1=preg_split("/\"/",$var[1]);
return $var1[0];
}
else
return false;
}
?>
<form method="post">
Website URL<input type="text" name="webURL">
<br />
<input type="submit" name="submit">
</form>
Have you tried:
$html->getElementById("someId")->getAttribute('littleBox');
You could also use SimpleXML:
$html = '<div id="someId" littleBox="someValue">inner text</div>';
$dom = new DOMDocument;
$dom->loadXML($html);
$div = simplexml_import_dom($dom);
echo $div->attributes()->littleBox;
I would advise against using regex to parse HTML, but shouldn't this part be like this:
$str = $html->getElementById("someId")->outertext;
$var = preg_split('/littleBox=\"/', $str);
$var1 = preg_split('/\"/',$var[1]);
echo $var1[0];
Also see this answer https://stackoverflow.com/a/8851091/1059001
See http://code.google.com/p/phpquery/. It's like jQuery, but for PHP. A very powerful library.

PHP code to read a web page's source and get attribute from a tag

I am reading the source code of a page in PHP. That page contains a hidden input field: <input type="hidden" name="session_id" value=
$url = 'URL HERE';
$needle = '<input type="hidden" name="session_id" value=';
$contents = file_get_contents($url);
if(strpos($contents, $needle)!== false) {
echo 'found';
} else {
echo 'not found';
}
I want to read that hidden field value.
By far the best way to do this is with the DOM extension to PHP.
$dom = new DOMDocument;
$dom->loadHTMLFile('your URL');
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//input[@name="session_id"]');
if ($elements->length) {
echo "found: ", $elements->item(0)->getAttribute('value');
} else {
echo "not found";
}
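If the page has several hidden fields, one possible extension is to collect them all into an array keyed by name. This is just a sketch working on an HTML string such as the $contents from the question. The function name hiddenInputs() and the sample markup (including its values) are made up for illustration.

```php
<?php
// Hypothetical helper: name => value for every <input type="hidden"> in $html.
function hiddenInputs(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($dom);
    // Only hidden inputs that actually carry a name attribute.
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

// Made-up sample markup standing in for the downloaded page.
$contents = '<form>'
    . '<input type="hidden" name="session_id" value="abc123">'
    . '<input type="text" name="user" value="me">'
    . '<input type="hidden" name="token" value="xyz">'
    . '</form>';
print_r(hiddenInputs($contents));
```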
I'd look into PHP's native DOMDocument extension:
http://www.php.net/manual/en/domdocument.getelementbyid.php#example-4867

how to use dom php parser

I'm new to DOM parsing in PHP:
I have a HTML file that I'm trying to parse. It has a bunch of DIVs like this:
<div id="interestingbox">
<div id="interestingdetails" class="txtnormal">
<div>Content1</div>
<div>Content2</div>
</div>
</div>
<div id="interestingbox">
......
I'm trying to get the contents of the many div boxes using php.
How can I use the DOM parser to do this?
Thanks!
First, I have to tell you that you can't use the same id on two different divs; that is what classes are for. Every element should have a unique id.
Code to get the contents of the div with id="interestingbox":
$html = '
<html>
<head></head>
<body>
<div id="interestingbox">
<div id="interestingdetails" class="txtnormal">
<div>Content1</div>
<div>Content2</div>
</div>
</div>
<div id="interestingbox2">a link</div>
</body>
</html>';
$dom_document = new DOMDocument();
$dom_document->loadHTML($html);
//use DOMXpath to navigate the html with the DOM
$dom_xpath = new DOMXpath($dom_document);
// if you want to get the div with id=interestingbox
$elements = $dom_xpath->query("*/div[@id='interestingbox']");
if (!is_null($elements)) {
foreach ($elements as $element) {
echo "\n[". $element->nodeName. "]";
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
//OUTPUT
[div] {
Content1
Content2
}
Example with classes:
$html = '
<html>
<head></head>
<body>
<div class="interestingbox">
<div id="interestingdetails" class="txtnormal">
<div>Content1</div>
<div>Content2</div>
</div>
</div>
<div class="interestingbox">a link</div>
</body>
</html>';
//the same as before.. just change the xpath
[...]
$elements = $dom_xpath->query("*/div[@class='interestingbox']");
[...]
//OUTPUT
[div] {
Content1
Content2
}
[div] {
a link
}
Refer to the DOMXPath page for more details.
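One thing to keep in mind: @class='interestingbox' only matches when the class attribute is exactly that string. If your divs may carry several classes, a common XPath idiom is to pad the attribute with spaces and test for the token. A minimal sketch follows; the helper name classPredicate() and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: XPath predicate that matches one class token
// even when the element has several classes.
function classPredicate(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body>
<div class="interestingbox first">Content1</div>
<div class="interestingbox">a link</div>
<div class="interestingboxes">other</div>
</body></html>');

$xpath = new DOMXPath($dom);
$texts = [];
foreach ($xpath->query('//div[' . classPredicate('interestingbox') . ']') as $div) {
    $texts[] = trim($div->nodeValue);
}
print_r($texts);
```

The third div is skipped because "interestingboxes" is a different token, which a plain contains(@class, 'interestingbox') would not distinguish.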
I got this to work using simplehtmldom as a start:
$html = file_get_html('example.com');
foreach ($html->find('div[id=interestingbox]') as $result)
{
echo $result->innertext;
}
Very nice function from http://www.sitepoint.com/forums/showthread.php?611393-php5-need-something-like-innerHTML-instead-of-nodeValue
function innerXML($node)
{
$doc = $node->ownerDocument;
$frag = $doc->createDocumentFragment();
foreach ($node->childNodes as $child)
{
$frag->appendChild($child->cloneNode(TRUE));
}
return $doc->saveXML($frag);
}
$dom = new DOMDocument();
$dom->loadXML('
<html>
<body>
<table>
<tr>
<td id="foo">
The first bit of Data I want
<br />The second bit of Data I want
<br />The third bit of Data I want
</td>
</tr>
</table>
</body>
</html>
');
$xpath = new DOMXPath($dom);
$node = $xpath->evaluate("/html/body//td[@id='foo' ]");
$dataString = innerXML($node->item(0));
$dataArr = explode("<br/>", $dataString); // saveXML() serializes the empty element as <br/>, without the space
$dataUno = $dataArr[0];
$dataDos = $dataArr[1];
$dataTres = $dataArr[2];
echo "firstdata = $dataUno<br />seconddata = $dataDos<br />thirddata = $dataTres<br />";
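An alternative that avoids splitting a serialized string at all is to walk the child nodes of the cell and keep only the text nodes. This is a sketch under the same sample markup; the helper name childTexts() is made up for illustration.

```php
<?php
// Hypothetical helper: trimmed text of each non-empty text node directly under $node.
function childTexts(DOMNode $node): array
{
    $texts = [];
    foreach ($node->childNodes as $child) {
        // Skip elements such as <br/> and whitespace-only text nodes.
        if ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) !== '') {
            $texts[] = trim($child->nodeValue);
        }
    }
    return $texts;
}

$dom = new DOMDocument();
$dom->loadXML('<html><body><table><tr><td id="foo">
The first bit of Data I want
<br />The second bit of Data I want
<br />The third bit of Data I want
</td></tr></table></body></html>');

$xpath = new DOMXPath($dom);
$cell = $xpath->query("/html/body//td[@id='foo']")->item(0);
$parts = childTexts($cell);
print_r($parts);
```

This does not depend on how the br element happens to be serialized.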
WebExtractor: https://github.com/knyga/webextractor
It can parse a page with CSS, regex, and XPath selectors.
Look at the package and its tests for examples:
use WebExtractor\DataExtractor\DataExtractorFactory;
use WebExtractor\DataExtractor\DataExtractorTypes;
use WebExtractor\Client\Client;
$factory = DataExtractorFactory::getFactory();
$extractor = $factory->createDataExtractor(DataExtractorTypes::CSS);
$client = new Client;
$content = $client->get('https://en.wikipedia.org/wiki/2014_Winter_Olympics');
$extractor->setContent($content);
$h1 = $extractor->setSelector('h1')->extract();
