Getting content from external div

I need to get content from an external page.
For example, take this site: https://en.wikipedia.org/wiki/Main_Page
I need only the content of "On this day...", that is, the div with id="mp-otd".
How can I do that with PHP?

You can do this using the PHP Simple HTML DOM Parser:
include_once('simple_html_dom.php');
$html = file_get_html('https://en.wikipedia.org/wiki/Main_Page');
$div_content = $html->find('div[id=mp-otd]', 0);

You need to download the library from http://simplehtmldom.sourceforge.net/ first.
For example:
include_once('simple_html_dom.php');
// Create DOM from URL or file
$html = file_get_html('https://en.wikipedia.org/wiki/Main_Page');
// Find the div with id="mp-otd"
foreach($html->find('div#mp-otd') as $element)
    echo $element->innertext . '<br>';
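If you would rather not depend on a third-party library, PHP's built-in DOM extension can probably do the same job. This is only a minimal sketch, not the method from the answers above. The inline $page markup is a made-up stand-in for the downloaded page (in practice you might fetch it with file_get_contents(), assuming allow_url_fopen is enabled), and getInnerHtmlById() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return the inner HTML of the element with the given id,
// or null when no such element exists.
// Assumes $id contains no double quotes (it is pasted into an XPath expression).
function getInnerHtmlById(string $page, string $id): ?string
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($page);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Concatenate the serialized children to get the element's inner HTML.
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

// Made-up stand-in for the downloaded page.
$page = '<html><body><div id="mp-otd"><p>On this day...</p></div>'
      . '<div id="other">x</div></body></html>';

$content = getInnerHtmlById($page, 'mp-otd');
echo $content, PHP_EOL;
```

The helper returns null for a missing id, so the caller can tell "not found" apart from an empty div.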

Related

Change src attribute from img, using Simple HTML Dom php library

I'm totally new to PHP, and I'm having a hard time changing the src attribute of img tags.
I have a website that pulls part of a page using the Simple HTML DOM PHP library. Here is the code:
<?php
include_once('simple_html_dom.php');
$html = file_get_html('http://www.tabuademares.com/br/bahia/morro-de-sao-paulo');
foreach($html ->find('img') as $item) {
$item->outertext = '';
}
$html->save();
$elem = $html->find('table[id=tabla_mareas]', 0);
echo $elem;
?>
This code correctly returns the part of the page I want. But the img tags come with the src of the original page: /assets/svg/icon_name.svg
What I want to do is change the original src so that it looks like this: http://www.mywebsite.com/wp-content/themes/mytheme/assets/svg/icon_name.svg
That is, I want to put the URL of my site in front of /assets/svg/icon_name.svg
I have already tried some tutorials, but I could not get any of them to work.
Could someone please help a PHP noob?

I got it working. In case someone has the same question, here is how I managed it.
<?php
// Note: you must download simple_html_dom.php from
// https://sourceforge.net/projects/simplehtmldom/files/
// and then include it
include_once('simple_html_dom.php');
// Target the website
$html = file_get_html('http://the_target_website.com');
// Loop through all images in the HTML DOM
foreach($html->find('img') as $item) {
    // Get an attribute (for a non-value attribute such as checked or selected, this returns true or false)
    $value = $item->src;
    // Set an attribute
    $item->src = 'http://yourwebsite.com/'.$value;
}
// Serialize the modified DOM (save() returns it as a string)
$html->save();
// Find the div whose content you want
$elem = $html->find('div[id=container]', 0);
// Output it using echo
echo $elem;
?>
That's it!
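One detail worth watching: when the original src already starts with a slash, plain concatenation yields a doubled slash, and an src that is already absolute would get the prefix too. A small helper could guard against both. This is only a sketch; prefixSrc() is a hypothetical name, and it only recognizes scheme:// and //host style URLs as absolute.

```php
<?php
// Hypothetical helper: prepend $base to a site-relative path,
// leaving absolute URLs untouched.
function prefixSrc(string $base, string $src): string
{
    // Already absolute (http://, https://) or protocol-relative (//host/...): keep as is.
    if (preg_match('#^(?:[a-z][a-z0-9+.-]*:)?//#i', $src) === 1) {
        return $src;
    }
    // Join with exactly one slash between the two parts.
    return rtrim($base, '/') . '/' . ltrim($src, '/');
}

$base = 'http://www.mywebsite.com/wp-content/themes/mytheme';
echo prefixSrc($base, '/assets/svg/icon_name.svg'), PHP_EOL;
```

Inside the foreach loop above, the assignment would then read $item->src = prefixSrc($base, $item->src);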

Did you read the documentation on reading and modifying attributes?
According to it:
// Get an attribute (for a non-value attribute such as checked or selected, this returns true or false)
$value = $e->href;
// Set an attribute
$e->href = 'yoursitename'.$value;

Issue with php simple html DOM parser in Joomla

I am trying to insert a stock chart module from another site into my own website.
When I use:
jimport('simplehtmldom.simple_html_dom');
// get DOM from URL or file
$html = file_get_html('http://www.raiffeisen.com/');
foreach($html->find('div#agrarfenster') as $element)
echo $element->innertext;
the output works. But I need this code to get the output I require:
jimport('simplehtmldom.simple_html_dom');
// get DOM from URL or file
$html = file_get_html('http://www.raiffeisen.com/');
foreach($html->find('div#boersenfenster_bf_4562') as $element)
echo $element->innertext;
This code doesn't work. But why?
My guess is that the underscores in "boersenfenster_bf_4562" are the problem.
Can somebody help me?

Parsing an HTML page using PHP to find the text a link is assigned to

Say I have HTML code like this:
$html = "This is some stuff right here. OH MY GOSH";
I am trying to get the href values and also the anchor text (the "check this out" text). I am able to get the href value with the following code:
foreach($displaybody->find('a') as $element)
    echo $element;
It works for me, but how do I get the "check this out" text? Could you help me out? I searched but was not able to find an answer. Thanks in advance.
My actual HTML looks like this:
» Download MP4 « - <b>144p (Video Only)</b> - <span> 19.1</span> MB<br />
That is what my link looks like. The code above returns "Download MP4", and I want it to return "Download MP4 144p (Video Only) 19.1 MB". How do I do that?

If what you are using is Simple HTML DOM, then ->innertext works fine on the anchor elements you have found:
include 'simple_html_dom.php';
$html = "This is some stuff right here. OH MY GOSH";
$displaybody = str_get_html($html);
foreach($displaybody->find('a ') as $element) {
echo $element->innertext . '<br/>';
}
If you were referring to PHP's DOMDocument, then find() is not the function to use. To target each anchor element, use ->getElementsByTagName(), and then read ->nodeValue on each selected element:
$html = "This is some stuff right here. OH MY GOSH";
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('a') as $element) {
echo $element->nodeValue . '<br/>';
}
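To get both the href and the link text in one pass with DOMDocument, something like the following might work. It is a sketch only: the markup, link targets, and variable names are made up for illustration, since the original sample string does not show its anchors.

```php
<?php
// Made-up markup for illustration; the links and file names are hypothetical.
$html = '<p>This is some stuff right here. '
      . '<a href="index.html">Check this out!</a> '
      . '<a href="derp.html">OH <b>MY</b> GOSH</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Collect the href attribute and the visible text of every anchor.
$links = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $links[] = [
        'href' => $anchor->getAttribute('href'),
        // textContent flattens nested tags such as <b> into plain text.
        'text' => trim($anchor->textContent),
    ];
}

foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], PHP_EOL;
}
```

Reading textContent rather than serializing the node means nested formatting tags do not show up in the result.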

How to find specific query in the URL and display whole link

My current code looks like this:
include 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('http://www.AnyLinkAlsoCan.com');
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
It crawls the page and finds tags like this:
<a href="http://news.example.com/node">
and outputs every link it finds on the website.
Example:
http://news.example.com.my/node/321072
http://news.example.com.my/taxonomy/term/2
http://news.example.com.my/node/321060?tid=2
I want to find only the URLs that contain ?tid=, like the third URL in the example:
http://news.example.com.my/node/321060?tid=2
I tried replacing the echo with $element->href="*?tid but that just returns an error. Can someone help me with this?

You can use preg_match, or you can check whether each URL contains ?tid:
<?php
include 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('http://www.AnyLinkAlsoCan.com');
// Find all links and print only those that contain ?tid
$search = '?tid';
foreach($html->find('a') as $element) {
    // strpos() returns 0 for a match at the start, so compare against false explicitly
    if(strpos($element->href, $search) !== false) {
        echo $element->href . '<br>';
    }
}
?>

Use parse_url() to parse each URL and then select only the ones you want based on PHP_URL_QUERY.
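A minimal sketch of the parse_url() approach, using the three example URLs from the question. hasQueryParam() is a hypothetical helper name, and the choice to combine parse_url() with parse_str() is an assumption about how one might check for the tid parameter.

```php
<?php
// Hypothetical helper: true when $url has a query parameter called $name.
function hasQueryParam(string $url, string $name): bool
{
    // parse_url() returns null when the URL has no query string
    // and false when the URL cannot be parsed at all.
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return false;
    }
    // Split "a=1&b=2" into an associative array.
    parse_str($query, $params);
    return array_key_exists($name, $params);
}

$urls = [
    'http://news.example.com.my/node/321072',
    'http://news.example.com.my/taxonomy/term/2',
    'http://news.example.com.my/node/321060?tid=2',
];

// Keep only the URLs that carry a tid parameter.
$matches = array_values(array_filter($urls, function ($url) {
    return hasQueryParam($url, 'tid');
}));

foreach ($matches as $url) {
    echo $url, PHP_EOL;
}
```

Unlike a plain substring search, this matches tid only as a parameter name, so a URL such as /page?subtid=2 would not be selected.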

How should I parse background images and other images of a webpage with PHP (simple html dom parser)?

How should I parse background images and other images of a webpage with PHP (Simple HTML DOM, etc.)?
case 1: inline css
<div id="id100" style="background:url(/mycar1.jpg)"></div>
case 2: css inside html page
<div id="id100"></div>
<style type="text/css">
#id100{
background:url(/mycar1.jpg);
}
</style>
case 3: separate css file
<div id="id100" style="background:url(/mycar1.jpg);"></div>
external.css
#id100{
background:url(/mycar1.jpg);
}
case 4: image inside img tag
The solution to case 4, as it appears in the PHP Simple HTML DOM Parser documentation:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
Please help me parse cases 1, 2, and 3.
If more cases exist, please list them, with a solution if you can.
Thanks

For Case 1:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Get the style attribute for the item
$style = $html->getElementById("id100")->getAttribute('style');
// $style = background:url(/mycar1.jpg)
// You would now need to put it into a css parser or do some regular expression magic to get the values you need.
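The "regular expression magic" mentioned above could look roughly like this. It is a sketch, not a full CSS parser: extractCssUrls() is a hypothetical helper, and the pattern only handles simple url(...) values with optional quotes.

```php
<?php
// Hypothetical helper: return every url(...) target found in a CSS string.
function extractCssUrls(string $css): array
{
    // Matches url(x), url('x') and url("x"); the path is captured in group 1.
    preg_match_all('/url\(\s*[\'"]?([^\'")]+)[\'"]?\s*\)/i', $css, $found);
    // Trim stray whitespace left inside unquoted url( ... ) values.
    return array_map('trim', $found[1]);
}

$style = 'background:url(/mycar1.jpg)';
print_r(extractCssUrls($style));
```

The same helper could be applied to the text of a style element or to a downloaded stylesheet, since it only looks at the CSS string.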
For Case 2/3:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Get the Style element
$style = $html->find('head',0)->find('style');
// $style now contains an array of the style elements within the head; pass the innertext of each one to a CSS parser.
// External stylesheets are referenced by link elements (rel="stylesheet") rather than style elements: download the file named in each href attribute and pass it to the CSS parser as well.
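For cases 2 and 3, collecting the CSS sources might look like the following sketch, written with PHP's built-in DOMDocument instead of Simple HTML DOM. The $page markup and the external.css path are made up for illustration, and the rel check is simplified to an exact match on "stylesheet".

```php
<?php
// Made-up page for illustration.
$page = '<html><head>'
      . '<link rel="stylesheet" href="/external.css">'
      . '<style type="text/css">#id100{background:url(/mycar1.jpg);}</style>'
      . '</head><body><div id="id100"></div></body></html>';

$dom = new DOMDocument();
// Collect parser warnings instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($page);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Case 2: CSS text embedded in <style> elements.
$embeddedCss = [];
foreach ($dom->getElementsByTagName('style') as $style) {
    $embeddedCss[] = $style->textContent;
}

// Case 3: external stylesheets referenced by <link rel="stylesheet" href="...">.
$stylesheetUrls = [];
foreach ($dom->getElementsByTagName('link') as $link) {
    if (strtolower($link->getAttribute('rel')) === 'stylesheet') {
        $stylesheetUrls[] = $link->getAttribute('href');
    }
}

print_r($embeddedCss);
print_r($stylesheetUrls);
```

Each entry in $stylesheetUrls would still need to be resolved against the page URL and downloaded before its CSS can be searched for background images.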

To extract <img> from the page you can try something like:
$doc = new DOMDocument();
$doc->loadHTML("<html><body>Foo<br><img src=\"bar.jpg\" title=\"Foo bar\" alt=\"alt\"></body></html>");
$xml = simplexml_import_dom($doc);
$images = $xml->xpath('//img');
foreach ($images as $img)
echo $img['src'] . ' ' . $img['alt'] . ' ' . $img['title'];
See the DOMDocument documentation for more details.
