PHP Simple HTML DOM Parser: Accessing custom attributes - php

I want to access a custom attribute that I added to some elements in an HTML file. Here's an example with the littleBox="someValue" attribute:
<div id="someId" littleBox="someValue">inner text</div>
The following doesn't work:
foreach($html->find('div') as $element){
echo $element;
if(isset($element->type)){
echo $element->littleBox;
}
}
I saw an article with a similar problem, but I couldn't replicate it for some reason. Here is what I tried:
function retrieveValue($str){
if (stripos($str, 'littleBox')){//check if element has it
$var=preg_split("/littleBox=\"/",$str);
//echo $var[1];
$var1=preg_split("/\"/",$var[1]);
echo $var1[0];
}
else
return false;
}
Whenever I call the retrieveValue() function, nothing happens. Is $element (in the first PHP example above) not a string? I don't know if I missed something, but it's not returning anything.
Here's the script in its entirety:
<?php
require("../../simplehtmldom/simple_html_dom.php");
if (isset($_POST['submit'])){
$html = file_get_html($_POST['webURL']);
// Find all images
foreach($html->find('div') as $element){
echo $element;
if(isset($element->type)!= false){
echo retrieveValue($element);
}
}
}
function retrieveValue($str){
if (stripos($str, 'littleBox')){//check if element has it
$var=preg_split("/littleBox=\"/",$str);
//echo $var[1];
$var1=preg_split("/\"/",$var[1]);
return $var1[0];
}
else
return false;
}
?>
<form method="post">
Website URL<input type="text" name="webURL">
<br />
<input type="submit" name="submit">
</form>

Have you tried:
$html->getElementById("someId")->getAttribute('littleBox');
You could also use SimpleXML:
$html = '<div id="someId" littleBox="someValue">inner text</div>';
$dom = new DOMDocument;
$dom->loadXML($html);
$div = simplexml_import_dom($dom);
echo $div->attributes()->littleBox;
I would advise against using regex to parse HTML, but if you do, shouldn't this part look like this:
$str = $html->getElementById("someId")->outertext;
$var = preg_split('/littleBox=\"/', $str);
$var1 = preg_split('/\"/',$var[1]);
echo $var1[0];
Also see this answer https://stackoverflow.com/a/8851091/1059001
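As a rough sketch of the attribute-based approach with PHP's built-in DOM extension (instead of Simple HTML DOM): the helper name getAttributeIgnoreCase is made up for illustration, and it compares names case-insensitively on the assumption that an HTML parser may normalize attribute names such as littleBox to lowercase.

```php
<?php
// Hypothetical helper: returns the value of an attribute, matching its name
// case-insensitively, or null when the element does not carry it.
function getAttributeIgnoreCase(DOMElement $element, string $name): ?string
{
    foreach ($element->attributes as $attribute) {
        if (strcasecmp($attribute->name, $name) === 0) {
            return $attribute->value;
        }
    }
    return null;
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML('<div id="someId" littleBox="someValue">inner text</div>');

// Print the custom attribute of every div that has one.
foreach ($dom->getElementsByTagName('div') as $div) {
    $value = getAttributeIgnoreCase($div, 'littleBox');
    if ($value !== null) {
        echo $value, PHP_EOL;
    }
}
```

Checking for null here plays the role that isset($element->littleBox) would play in the original loop; the question's code tested $element->type instead, which is a different attribute.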

Have a look at http://code.google.com/p/phpquery/. It's like jQuery, but for PHP. A very powerful library.

Related

Php Simple HTML DOM Parser - how to work with block repeats

I have <h2> blocks without attributes, each followed by <p> blocks that also have no attributes.
The structure looks like this:
<h2></h2>
<p></p>
<p></p>
<p></p>
<h2></h2>
<p></p>
<p></p>
<h2></h2>
<p></p>
I'm using PHP Simple HTML DOM Parser. I want to get the data from an <h2> block, then get all the <p> blocks up to the next <h2>, and so on.
Each <h2> must stay connected to the <p> blocks that follow it. I thought of using key => value pairs (for example <h2> => <p>, <p>, ... and then the next <h2>), but I am not sure how to do this.
I also know about next_sibling(), but I don't know how to use it in a loop. I made two variables: the first holds all the <h2> elements, the second the <p> elements. I thought that could be useful for my goal. Here is the code:
$test = file_get_html('url');
foreach($test->find('h2') as $test2) {
echo $test2 . '<br>';
foreach($test->find('p') as $test3) {
echo $test3 .'<br>';
}
}
It's not super clear what you're looking for but here's an idea to get you started:
foreach($html->find('h2') as $el){
$h2 = $el;
while($el = $el->next_sibling()){
if('p' != $el->tag) break;
// do something
}
}
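The sibling walk above can also collect the paragraphs into the heading => paragraphs structure the question asks for. Below is a minimal sketch using PHP's built-in DOMDocument rather than Simple HTML DOM; the function name groupParagraphsByHeading and the sample markup are invented for illustration. It returns a list rather than an array keyed by heading text, on the assumption that two headings might share the same text.

```php
<?php
// Hypothetical helper: groups the text of each <p> under the <h2> that precedes it.
function groupParagraphsByHeading(string $html): array
{
    libxml_use_internal_errors(true); // tolerate imperfect real-world markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $groups = [];
    foreach ($dom->getElementsByTagName('h2') as $h2) {
        $paragraphs = [];
        for ($node = $h2->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeType !== XML_ELEMENT_NODE) {
                continue; // skip text nodes and comments between elements
            }
            if ($node->nodeName !== 'p') {
                break; // the next <h2> (or any other element) ends this group
            }
            $paragraphs[] = trim($node->textContent);
        }
        $groups[] = [
            'heading'    => trim($h2->textContent),
            'paragraphs' => $paragraphs,
        ];
    }
    return $groups;
}

// Made-up sample with the same shape as the structure in the question.
$sample = '<h2>A</h2><p>a1</p><p>a2</p><h2>B</h2><p>b1</p>';
print_r(groupParagraphsByHeading($sample));
```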
Here is the answer to my own question. I hope it helps somebody!
foreach ($html->find('.div') as $div)
{
// only handle elements that are immediately followed by an <h2>
$h2 = $div->next_sibling();
if (!$h2 || $h2->tag !== 'h2') continue;
echo $h2;
$node = $h2->next_sibling();
// echo the run of <p> siblings after the <h2>, then any <table> run, then any <ul> run
foreach (array('p', 'table', 'ul') as $tag)
{
while ($node && $node->tag === $tag)
{
echo $node;
$node = $node->next_sibling();
}
}
}

DOMDocument type object recognition

This is my php code:
$dom = new DOMDocument();
$html ='<html><body><input type="text" name="test" id="test" class="form-control" value="120.00" style="text-align: right;"></body></html>';
$dom->loadHTML($html);
$myElement = $dom->getElementById("test");
How can I get the type of the element (its tag) and check one of its properties (for example, input type="hidden")? For example:
if ($myElement->is('input')) then etc....
if ($myElement->is('img')) then etc....
if (($myElement->is('input')) && ($myElement->has('hidden'))) then etc....
Is this possible?
Thanks a lot.
Aesis.
You could do it like this, using the getAttribute method of the DOMElement class:
<?php
$dom = new DOMDocument();
$html ='<html><body><input type="text" name="test" id="test" class="form-control" value="120.00" style="text-align: right;"></body></html>';
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('input') as $tag) {
if ($tag->getAttribute('name') === 'test') {
echo $tag->getAttribute('value'); //"prints" 120.00
echo $tag->getAttribute('type'); //"prints" text (attribute)
}
}
You can do the same for other attributes too.
Did you try $myElement->tagName or $dom->getElementById("test")->tagName ?
http://www.php.net/manual/pt_BR/domdocument.getelementbyid.php
Try this.
You can get the type of the element with the code below:
$typeOfObj = $myElement->nodeName;
echo $typeOfObj;
To find out whether it contains "hidden":
$node = $dom->saveHTML($myElement);
if(preg_match("/(hidden)/i",$node)) {
// has hidden
}
else { //not have hidden
}
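The is('input') and has('hidden') checks from the question can be approximated by combining tagName with getAttribute('type'), which avoids running a regex over the serialized markup. This is only a sketch: isInputOfType is a made-up helper, not a DOMDocument method, and the second input in the sample markup was added for illustration.

```php
<?php
// Hypothetical helper mirroring the is('input') && has('hidden') check from the question.
function isInputOfType(DOMElement $element, string $type): bool
{
    return strtolower($element->tagName) === 'input'
        && strtolower($element->getAttribute('type')) === strtolower($type);
}

$dom = new DOMDocument();
$dom->loadHTML(
    '<html><body>'
    . '<input type="text" name="test" id="test" value="120.00">'
    . '<input type="hidden" name="token" value="abc">'
    . '</body></html>'
);

foreach ($dom->getElementsByTagName('input') as $input) {
    if (isInputOfType($input, 'hidden')) {
        echo $input->getAttribute('name'), ' is a hidden input', PHP_EOL;
    } else {
        echo $input->getAttribute('name'), ' is a ', $input->getAttribute('type'), ' input', PHP_EOL;
    }
}
```

Comparing the attribute value directly means an element such as class="hidden-field" is not mistaken for a hidden input, which a plain text match on "hidden" would do.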

Simple HTML DOM Not Finding DIV

I have code that tries to extract the event SKU from a Robot Events page (here is an example). The code I am using doesn't find any SKU on the page. The SKU is on line 411 of the page source, in a div with the class "product-sku". My code doesn't even find the div on the page and just downloads all the events. Here is my code:
<?php
require('simple_html_dom.php');
$html = new simple_html_dom();
if(!$events)
{
echo mysqli_error($con);
}
while($event = mysqli_fetch_row($events))
{
$htmldown = file_get_html($event[4]);
$html->load($htmldown);
echo "Downloaded";
foreach ($html->find('div[class=product-sku]') as $row) {
$sku = $row->plaintext;
echo $sku;
}
}
?>
Can anyone help me fix my code?
This code uses PHP's DOMDocument class. It works for the sample HTML below. Please try it.
// new dom object
$dom = new DOMDocument();
// HTML string
$html_string = '<html>
<body>
<div class="product-sku1" name="div_name">This is the div content product-sku</div>
<div class="product-sku2" name="div_name">This is the div content product-sku</div>
<div class="product-sku" name="div_name">This is the div content product-sku</div>
</body>
</html>';
// discard insignificant whitespace (must be set before loading)
$dom->preserveWhiteSpace = false;
// load the HTML
$dom->loadHTML($html_string);
// get all divs by tag name
$divs = $dom->getElementsByTagName('div');
// loop over the all DIVs
foreach ($divs as $div) {
if ($div->hasAttributes()) {
foreach ($div->attributes as $attribute){
if($attribute->name === 'class' && $attribute->value == 'product-sku'){
// Print the DIV class name and content
echo 'DIV Class Name: '.$attribute->value.PHP_EOL;
echo 'DIV Content: '.$div->nodeValue.PHP_EOL;
}
}
}
}
I would use a regex (regular expression) to accomplish pulling skus out.
The regex:
preg_match('~<div class="product-sku"><b>Event Code:</b>(.*?)</div>~',$html,$matches);
See php regex docs.
New code:
<?php
if(!$events)
{
echo mysqli_error($con);
}
while($event = mysqli_fetch_row($events))
{
$htmldown = curl_init($event[4]);
curl_setopt($htmldown, CURLOPT_RETURNTRANSFER, true);
$html=curl_exec($htmldown);
curl_close($htmldown);
echo "Downloaded";
preg_match('~<div class="product-sku"><b>Event Code:</b>(.*?)</div>~',$html,$matches);
foreach ($matches as $row) {
echo $row;
}
}
?>
In this case (with that web page) there is only one SKU, so
instead of:
foreach ($matches as $row) {
echo $row;
}
You could just use: echo $matches[1]; (The reason for array index 1 is because the whole regex pattern plus the sku will be in $matches[0] but just the subgroup containing the sku is in $matches[1].)
Try using:
require('simple_html_dom.php');
$html = new simple_html_dom();
if(!$events)
{
echo mysqli_error($con);
}
while($event = mysqli_fetch_row($events))
{
// $event[4] holds a URL, so fetch and parse it with file_get_html()
$htmldown = file_get_html($event[4]);
echo "Downloaded";
foreach ($htmldown->find('div[class=product-sku]') as $row) {
$sku = $row->plaintext;
echo $sku;
}
}
If the class "product-sku" is used only on divs, you can also use:
$htmldown->find('.product-sku')
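The class matching that the DOMDocument answer does by hand can also be expressed as a single XPath query. The sketch below assumes the built-in DOM extension; textByDivClass is a hypothetical helper and the sample markup is made up. The contains(concat(...)) idiom matches a whole class token, so "product-sku1" does not match "product-sku".

```php
<?php
// Hypothetical helper: returns the trimmed text of every <div> whose class
// list contains $class as a whole token.
// Assumes $class contains no quote characters, since it is placed inside the query.
function textByDivClass(string $html, string $class): array
{
    libxml_use_internal_errors(true); // tolerate imperfect real-world markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $xpath = new DOMXPath($dom);
    $query = sprintf(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $div) {
        $texts[] = trim($div->textContent);
    }
    return $texts;
}

// Made-up sample markup for illustration.
$sample = '<html><body>'
    . '<div class="product-sku1">first</div>'
    . '<div class="product-sku extra">second</div>'
    . '</body></html>';
print_r(textByDivClass($sample, 'product-sku'));
```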

PHP code to read a web page's source and get attribute from a tag

I am reading the source code of a page in PHP. The page contains a hidden input field that starts with <input type="hidden" name="session_id" value=.
$url = 'URL HERE';
$needle = '<input type="hidden" name="session_id" value=';
$contents = file_get_contents($url);
if(strpos($contents, $needle)!== false) {
echo 'found';
} else {
echo 'not found';
}
I want to read that hidden field value.
By far the best way to do this is with the DOM extension to PHP.
$dom = new DOMDocument;
$dom->loadHtmlFile('your URL');
$xpath = new DOMXPath($dom);
$elements = $xpath->query('//input[@name="session_id"]');
if ($elements->length) {
echo "found: ", $elements->item(0)->getAttribute('value');
} else {
echo "not found";
}
I'd look into PHP's native DOMDocument extension:
http://www.php.net/manual/en/domdocument.getelementbyid.php#example-4867

php simple html dom parser doesn't return anything

Why won't my script return the div with the id of "pp-featured"?
<?php
# create and load the HTML
include('lib/simple_html_dom.php');
$html = new simple_html_dom();
$html->load("http://maps.google.com/maps/place?cid=6703996311168776503&q=hills+garage&hl=en&view=feature&mcsrc=google_reviews&num=20&start=0&ved=0CFUQtQU&sa=X&ei=sCq_Tr3mJZToygTOmuCGCg");
$ret = $html->find('div[id=pp-featured]');
# output it!
echo $ret->save();
?>
This gets me on my way. Thanks for your help.
<?php
include_once 'lib/simple_html_dom.php';
$url = "http://maps.google.com/maps/place?cid=6703996311168776503&q=hills+garage&hl=en&view=feature&mcsrc=google_reviews&num=20&start=0&ved=0CFUQtQU&sa=X&ei=sCq_Tr3mJZToygTOmuCGCg";
$html = file_get_html($url);
$ret = $html->find('div[id=pp-reviews]');
foreach($ret as $story)
echo $story;
?>
When called without an index, find() returns an array, because more than one element may match the selector.
If you expect only one match, check that the page you're analyzing behaves as expected.
Suggested solution:
<?php
include_once 'lib/simple_html_dom.php';
$url = "http://maps.google.com/maps/place?cid=6703996311168776503&q=hills+garage&hl=en&view=feature&mcsrc=google_reviews&num=20&start=0&ved=0CFUQtQU&sa=X&ei=sCq_Tr3mJZToygTOmuCGCg";
$html = file_get_html($url);
$ret = $html->find('div[id=pp-reviews]');
if(count($ret)==1){
echo $ret[0]->save();
}
else{
echo "Something went wrong";
}
