I need to get content from external page.
For example:
Let's use this site: https://en.wikipedia.org/wiki/Main_Page I need to get
only content of "On this day..." so it means div with id="mp-otd"
How can I do that with PHP?
You can do this by suing PHP DOM parser
include_once('simple_html_dom.php');
$html = file_get_html('https://en.wikipedia.org/wiki/Main_Page');
$div_content = $html->find('div[id=mp-otd]', 0);
Need to download library from http://simplehtmldom.sourceforge.net/
for example
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find specific
foreach($html->find('div #mp-otd') as $element)
echo $element->innertext . '<br>';
Related
I'm totally new to php, and I'm having a hard time changing the src attribute of img tags.
I have a website that pulls a part of a page using Simple Html Dom php, here is the code:
<?php
include_once('simple_html_dom.php');
$html = file_get_html('http://www.tabuademares.com/br/bahia/morro-de-sao-paulo');
foreach($html ->find('img') as $item) {
$item->outertext = '';
}
$html->save();
$elem = $html->find('table[id=tabla_mareas]', 0);
echo $elem;
?>
This code correctly returns the part of the page I want. But when I do this the img tags comes with the src of the original page: /assets/svg/icon_name.svg
What I want to do is change the original src so that it looks like this: http://www.mywebsite.com/wp-content/themes/mytheme/assets/svg/icon_name.svg
I want to put the url of my site in front of assets / svg / icon_name.svg
I already tried some tutorials, but I could not make any work.
Could someone please kind of help a noob in php?
i could make it work. So if someone have the same question, here is how i managed to get the code working.
<?php
// Note you must download the php files simple_html_dom.php from
// this link https://sourceforge.net/projects/simplehtmldom/files/
//than include them
include_once('simple_html_dom.php');
//target the website
$html = file_get_html('http://the_target_website.com');
//loop thru all images of the html dom
foreach($html ->find('img') as $item) {
// Get a attribute ( If the attribute is non-value attribute (eg. checked, selected...), it will returns true or false)
$value = $item->src;
// Set a attribute
$item->src = 'http://yourwebsite.com/'.$value;
}
//save the variable
$html->save();
//findo on html the div you want to get the content
$elem = $html->find('div[id=container]', 0);
//output it using echo
echo $elem;
?>
That's it!
did you read the documentation for read and modify attributes
As per that
// Get a attribute ( If the attribute is non-value attribute (eg. checked, selected...), it will returns true or false)
$value = $e->href;
// Set a attribute
$e->href = 'ursitename'.$value;
I try to isert a stock-chart-module from an orther site into my own website.
As i use:
jimport('simplehtmldom.simple_html_dom');
// get DOM from URL or file
$html = file_get_html('http://www.raiffeisen.com/');
foreach($html->find('div#agrarfenster') as $element)
echo $element->innertext;
The Output will work. But i need this Code for the required output:
jimport('simplehtmldom.simple_html_dom');
// get DOM from URL or file
$html = file_get_html('http://www.raiffeisen.com/');
foreach($html->find('div#boersenfenster_bf_4562') as $element)
echo $element->innertext;
This Code would'nt work. But why?
My guess is that there are those underscores in the "boersenfenster_bf_4562".
Can somebody help me?
say i have html code like this
$html = "This is some stuff right here. OH MY GOSH";
i am trying to get values of href and also on which anchor work i mean check this out text i am able to get href value by following this code
$displaybody->find('a ') as $element;
echo $element;
well it works for me but how do i get value of check this out could you guys help me out. i did search but i am not able to find it out . thanks in advance
my actual html look like this
» Download MP4 « - <b>144p (Video Only)</b> - <span> 19.1</span> MB<br />
my href look like this above code return download mp4 and i want it like downloadmp4 114p (video only) 19.1 mb how do i do that
If what you are using now is the SimpleHTMLDOM, then ->innertext works fine on that anchor elements that you have found:
include 'simple_html_dom.php';
$html = "This is some stuff right here. OH MY GOSH";
$displaybody = str_get_html($html);
foreach($displaybody->find('a ') as $element) {
echo $element->innertext . '<br/>';
}
If you were referring to PHP's DOMDocument, then its not find() function you need to use, to target each anchor element, you need to use ->getElementsByTagName(), then each selected elements you need to use ->nodeValue:
$html = "This is some stuff right here. OH MY GOSH";
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('a') as $element) {
echo $element->nodeValue . '<br/>';
}
Current code is like this :
include 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('http://www.AnyLinkAlsoCan.com');
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
It will crawl and find tag like this :
<a href="http://news.example.com/node">
And will output all link it found on the website.
Example
http://news.example.com.my/node/321072
http://news.example.com.my/taxonomy/term/2
http://news.example.com.my/node/321060?tid=2
I want to search url that contain only ?tid= as you can see on the 3rd URL in the example.
http://news.example.com.my/node/321060?tid=2
I replace echo $element->href="*?tid but that just return error. Can someone help me with this?
You can use preg_match or you can check all urls taken if they contain ?tid
<?php
include 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('http://www.AnyLinkAlsoCan.com');
// Find all links
foreach($html->find('a') as $element) {
$search = '?tid';
if(strpos($element->href,$search)) {
echo $element->href . '<br>';
}
}
?>
Use parse_url() to parse each url and then only select ones you want based on PHP_URL_QUERY
How should parse with PHP (simple html dom/etc..) background and other images of webpage?
case 1: inline css
<div id="id100" style="background:url(/mycar1.jpg)"></div>
case 2: css inside html page
<div id="id100"></div>
<style type="text/css">
#id100{
background:url(/mycar1.jpg);
}
</style>
case 3: separate css file
<div id="id100" style="background:url(/mycar1.jpg);"></div>
external.css
#id100{
background:url(/mycar1.jpg);
}
case 4: image inside img tag
solution to case 4 as he appears in php simple html dom parser:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
Please help me to parse case 1,2,3.
If exist more cases please write them, with soltion if you can please.
Thanks
For Case 1:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Get the style attribute for the item
$style = $html->getElementById("id100")->getAttribute('style');
// $style = background:url(/mycar1.jpg)
// You would now need to put it into a css parser or do some regular expression magic to get the values you need.
For Case 2/3:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Get the Style element
$style = $html->find('head',0)->find('style');
// $style now contains an array of style elements within the head. You will need to work out using attribute selectors what whether an element has a src attribute, if it does download the external css file and parse (using a css parser), if it doesnt then pass the innertext to the css parser.
To extract <img> from the page you can try something like:
$doc = new DOMDocument();
$doc->loadHTML("<html><body>Foo<br><img src=\"bar.jpg\" title=\"Foo bar\" alt=\"alt\"></body></html>");
$xml = simplexml_import_dom($doc);
$images = $xml->xpath('//img');
foreach ($images as $img)
echo $img['src'] . ' ' . $img['alt'] . ' ' . $img['title'];
See doc for DOMDocument for more details.