I'm using Simple HTML DOM to try to extract a div and all of its contents from a target URL. Here is my code:
<?php
require 'simple_html_dom.php';
$html = file_get_html('http://mozilla.org');
foreach($html->find('.accordion') as $element)
echo $element . '<br>';
?>
The problem is that the code above only extracts the plain text of the div. The div also contains images that I need to extract. If I use the following code, all the images are extracted, but so is everything else on the page.
<?php
require 'simple_html_dom.php';
$html = file_get_html('http://mozilla.org');
echo $html;
?>
So my question is, how can I use the first bit of code to extract the contents + images from .accordion?
Thanks
You could always try:
$imgs = array();
foreach($html->find('.accordion',0)->find('img') as $img){
$imgs[] = $img->src;
}
print_r($imgs);
This should populate the $imgs variable with all of the image links from the .accordion div.
:)
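If the goal is the whole .accordion block (text, nested elements and the <img> tags) rather than only the image links, here is a sketch that uses PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. It returns the serialized markup of every element whose class list contains "accordion", not just the first one. The function name and the sample markup are hypothetical. If the <img> tags do appear in the output but the pictures don't display, a likely cause is relative src values, which the browser resolves against your own site rather than the page you fetched.

```php
<?php
// Sketch using PHP's built-in DOM extension rather than Simple HTML DOM.
// Returns the complete markup (text, <img> tags, nested elements) of every
// element whose class list contains the token "accordion".

function accordionMarkup(string $html): array
{
    $dom = new DOMDocument();
    // Real pages often trigger libxml warnings; collect them silently
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Token match on the class attribute, so class="accordion open" also counts
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " accordion ")]';

    $blocks = [];
    foreach ($xpath->query($query) as $node) {
        // saveHTML($node) serializes the node itself plus all of its children
        $blocks[] = $dom->saveHTML($node);
    }
    return $blocks;
}

// Hypothetical sample; in practice pass the fetched page source instead
$sample = '<div class="accordion"><p>Intro</p><img src="/a.png"></div><p>Other</p>';

foreach (accordionMarkup($sample) as $block) {
    echo $block, "\n";
}
```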
I'm totally new to PHP, and I'm having a hard time changing the src attribute of img tags.
I have a website that pulls part of a page using Simple HTML DOM. Here is the code:
<?php
include_once('simple_html_dom.php');
$html = file_get_html('http://www.tabuademares.com/br/bahia/morro-de-sao-paulo');
foreach($html ->find('img') as $item) {
$item->outertext = '';
}
$html->save();
$elem = $html->find('table[id=tabla_mareas]', 0);
echo $elem;
?>
This code correctly returns the part of the page I want, but the img tags come with the src of the original page: /assets/svg/icon_name.svg
What I want to do is change the original src so that it looks like this: http://www.mywebsite.com/wp-content/themes/mytheme/assets/svg/icon_name.svg
I want to put the URL of my site in front of /assets/svg/icon_name.svg.
I have already tried some tutorials, but I could not get any of them to work.
Could someone please help a PHP noob?
I got it working. In case someone has the same question, here is how I did it.
<?php
// Note: you must download simple_html_dom.php from
// https://sourceforge.net/projects/simplehtmldom/files/
// and then include it
include_once('simple_html_dom.php');
// Load the target website
$html = file_get_html('http://the_target_website.com');
// Loop through all images in the HTML DOM
foreach($html->find('img') as $item) {
    // Get an attribute (for a valueless attribute such as checked or selected, this returns true or false)
    $value = $item->src;
    // Set the attribute; ltrim() drops the leading slash of paths like /assets/svg/icon_name.svg
    // so the result does not contain a double slash
    $item->src = 'http://yourwebsite.com/' . ltrim($value, '/');
}
// Save the modified document
$html->save();
// Find the div whose content you want
$elem = $html->find('div[id=container]', 0);
// Output it using echo
echo $elem;
?>
That's it!
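Here is a sketch of the same rewrite using PHP's built-in DOMDocument instead of Simple HTML DOM. It assumes that any src without a scheme is relative to the site root. That holds for /assets/svg/icon_name.svg in the question, but this is not a general URL resolver: paths relative to the page's own directory would need more work. It also leaves absolute, protocol-relative and data: URLs alone. The function name and sample table are hypothetical, and the base URL is the example from the question.

```php
<?php
// Minimal sketch using PHP's built-in DOM extension (not Simple HTML DOM).
// Assumption: every src that is not already absolute is relative to the
// site root, as with /assets/svg/icon_name.svg in the question.

function prefixImageSources(DOMDocument $dom, string $base): void
{
    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Skip empty values and anything that already has a scheme
        // (http:, https:, data:) or is protocol-relative (//host/...)
        if ($src === '' || preg_match('#^([a-z][a-z0-9+.-]*:|//)#i', $src)) {
            continue;
        }
        // Join with exactly one slash between the base URL and the path
        $img->setAttribute('src', rtrim($base, '/') . '/' . ltrim($src, '/'));
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<table id="tabla_mareas"><tr><td>'
    . '<img src="/assets/svg/icon_name.svg">'
    . '<img src="https://example.com/logo.png">'
    . '</td></tr></table>');

// Base URL taken from the question's example
prefixImageSources($dom, 'http://www.mywebsite.com/wp-content/themes/mytheme/');

// Print only the table, as the question's code does
echo $dom->saveHTML($dom->getElementsByTagName('table')->item(0));
```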
Did you read the documentation on reading and modifying attributes? According to it:
// Get an attribute (for a valueless attribute such as checked or selected, this returns true or false)
$value = $e->href;
// Set an attribute
$e->href = 'yoursitename' . $value;
I need to get all the images from the infobox on a Wikipedia page. I wrote this code, but it gets all the images on the page, not only those in the infobox. I need some help.
include("simple_html_dom.php");
$wikilink = "http://en.wikipedia.org/wiki/Aberdeen_F.C.";
//Wikipedia page to parse
$html = file_get_html($wikilink);
$images_array = array();
foreach ($html->find('table.infobox vcard td, img') as $element) {
$allimages = strtok($element->src . '|', '|');
array_push($images_array, $allimages);
}
print_r($images_array);
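In the selector 'table.infobox vcard td, img', the comma separates two independent selectors. The second one, img, therefore matches every image on the page. In addition, "vcard" is read as a descendant element name, not as a second class. Scoping the query to images inside the infobox table (in Simple HTML DOM, something like 'table.infobox img') should fix this. Below is a sketch of that scoping with PHP's built-in DOMXPath instead of Simple HTML DOM. The function name and the trimmed sample markup are hypothetical stand-ins for a real Wikipedia page.

```php
<?php
// Sketch using DOMXPath in place of Simple HTML DOM. The sample markup is a
// hypothetical, heavily trimmed stand-in for a Wikipedia infobox.

function infoboxImageSources(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Only <img> elements nested inside a <table> whose class list contains
    // "infobox"; images elsewhere on the page are ignored
    $query = '//table[contains(concat(" ", normalize-space(@class), " "), " infobox ")]//img';

    $sources = [];
    foreach ($xpath->query($query) as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

$sample = '<table class="infobox vcard"><tr><td><img src="//upload.example/crest.png"></td></tr></table>'
        . '<p><img src="//upload.example/stadium.jpg"></p>';

print_r(infoboxImageSources($sample));
```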
The example below shows the HTML elements I want to get.
Say I have HTML code like this:
$html = "This is some stuff right here. OH MY GOSH";
I am trying to get the href values and also the anchor text (for example "check this out"). I am able to get the href value with this code:
foreach ($displaybody->find('a') as $element)
    echo $element;
That works for me, but how do I get the text "check this out"? Could you help me out? I searched but could not find an answer. Thanks in advance.
My actual HTML looks like this:
» Download MP4 « - <b>144p (Video Only)</b> - <span> 19.1</span> MB<br />
My href looks like the one above. The code above returns only "Download MP4", but I want it to return "Download MP4 144p (Video Only) 19.1 MB". How do I do that?
If you are using Simple HTML DOM, ->innertext works fine on the anchor elements you have found:
include 'simple_html_dom.php';
$html = "This is some stuff right here. OH MY GOSH";
$displaybody = str_get_html($html);
foreach($displaybody->find('a ') as $element) {
echo $element->innertext . '<br/>';
}
If you were referring to PHP's DOMDocument, find() is not the function you need. To target each anchor element, use ->getElementsByTagName(), then read ->nodeValue on each selected element:
$html = "This is some stuff right here. OH MY GOSH";
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('a') as $element) {
echo $element->nodeValue . '<br/>';
}
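For the "Download MP4" case, the text is split across the anchor and its nested <b> and <span> elements. With DOMDocument, textContent (which nodeValue returns for elements) already includes the text of those children. Collapsing whitespace then gives one clean line. In Simple HTML DOM, ->plaintext plays a similar role to textContent, whereas ->innertext keeps the inner tags. The sketch below uses DOMDocument. The function name is hypothetical, and the sample anchor is a reconstruction of the question's markup, because its real href was not preserved.

```php
<?php
// Sketch with PHP's built-in DOMDocument. The anchor below is a hypothetical
// reconstruction of the question's markup (its real href was not preserved).

function anchorTexts(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Without a charset declaration loadHTML() treats raw bytes as
    // ISO-8859-1, so the sample uses &raquo; / &laquo; entities instead
    $dom->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        // textContent also includes the text of the nested <b> and <span>
        $text = $a->textContent;
        // Collapse whitespace runs (e.g. "-  19.1") to single spaces
        $texts[] = trim(preg_replace('/\s+/u', ' ', $text));
    }
    return $texts;
}

$sample = '<a href="#">&raquo; Download MP4 &laquo; - <b>144p (Video Only)</b> - <span> 19.1</span> MB</a>';

foreach (anchorTexts($sample) as $text) {
    echo $text, '<br/>';
}
```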
I have code that gets a div's contents:
include_once('simple_html_dom.php');
$html = file_get_html("link");
$ret = $html->find('div');
echo $ret[0];
preg_match_all('/(src)=("[^"]*")/i',$ret[0], $link);
echo $link[0];
It returns the full div contents, including all the CSS. However, I just want it to echo what comes after src=, basically just the image link and nothing else. I've tried preg_match with no success.
Any ideas?
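Your HTML parser will help you there. The src property belongs to the img elements inside the div, not to the div object in $ret itself:
echo $ret[0]->find('img', 0)->src;
You don't need a regex for that, since you are already using a DOM parser. To print every image link in the div:
foreach ($ret[0]->find('img') as $element)
    echo $element->src, '<br/>';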
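For comparison, the regex attempt in the question fails mainly for two reasons. First, $link[0] is an array of whole matches such as src="...", so echo prints "Array". Second, the second capture group includes the quotes. Here is a corrected sketch of that attempt. It only handles double-quoted attributes, which is one reason the DOM approach above is the more robust choice. The function name and sample markup are hypothetical.

```php
<?php
// For comparison only: a corrected version of the regex attempt from the
// question. Assumes double-quoted src attributes; note that \bsrc would
// also match attributes such as data-src.

function srcValuesByRegex(string $html): array
{
    // Capture only what is between the quotes after src=
    preg_match_all('/\bsrc\s*=\s*"([^"]*)"/i', $html, $matches);
    // $matches[0] holds whole matches (src="..."); $matches[1] the captured URLs
    return $matches[1];
}

$div = '<div class="x"><img src="/one.png" alt=""><img SRC="two.gif"></div>';

foreach (srcValuesByRegex($div) as $url) {
    echo $url, '<br/>';
}
```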
Folks,
I am using Simple HTML DOM (simple_html_dom.php).
I am not able to parse the HTML. When I var_dump the HTML document, it just shows the DOM structure and no HTML content.
$produrl = 'http://wap.ebay.com/Pages/ViewItem.aspx?aid=160586179890&sv=160586179890/';
var_dump(file_get_html($produrl));
$html = file_get_html($produrl);
var_dump($html->find('div[id=Teaser_Item] img[src]', 0));
What I actually want to extract is the img src, which is:
http://wap.ebay.com/Pages/RbHttpHandler.ashx?width=51&height=240&fsize=999000&format=jpg&url=http%3A%2F%2Fi.ebayimg.com%2F00%2F%24%28KGrHqN%2C!jEE2n%28iTLozBNwBPG0bUg~~0_1.JPG%3Fset_id%3D8800005007
Can someone help me debug this, please?
Cheers
Natasha Thomas
<?php
require_once('simple_html_dom.php');
$produrl = 'http://wap.ebay.com/Pages/ViewItem.aspx?aid=160586179890&sv=160586179890/';
// Grab the document
$html = file_get_html($produrl);
// Find the img tag in the Teaser_Item div
$a = $html->find('div[id=Teaser_Item] img', 0);
// Display the src
echo($a->attr['src']);
?>
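One thing worth checking: when the selector matches nothing (for example, because the server returns different markup from what the browser shows), find(..., 0) returns null, and reading ->attr on it fails. Below is a sketch with PHP's built-in DOMXPath, used in place of Simple HTML DOM, that makes the "no match" case explicit by returning null. The function name and the sample markup are hypothetical. The sample also shows that getAttribute() returns the src with &amp; decoded to &.

```php
<?php
// Sketch using DOMXPath instead of Simple HTML DOM. Returns null when the
// element is missing rather than failing on a null object, which makes
// "selector matched nothing" easy to tell apart from other problems.

function teaserImageSrc(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // First <img> with a src attribute inside the div whose id is Teaser_Item
    $img = $xpath->query('//div[@id="Teaser_Item"]//img[@src]')->item(0);

    // getAttribute() returns the value with entities such as &amp; decoded
    return $img === null ? null : $img->getAttribute('src');
}

$page = '<div id="Teaser_Item"><img src="RbHttpHandler.ashx?width=51&amp;height=240"></div>';

var_dump(teaserImageSrc($page));
var_dump(teaserImageSrc('<p>no teaser here</p>'));
```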