PHP DOM: doesn't load CSS styles - php

I have this code, which fetches the HTML of a page and then rewrites the href attribute of every A tag so that the link goes through my site. My site then loads the linked page and rewrites its links again, and so on.
<?php
libxml_use_internal_errors(true); // hide the parsing errors
$dom = new DOMDocument; // init new DOMDocument
if($_GET){
$dom->loadHtmlFile($_GET['open']); // getting link to redirect to
}else{
$dom->loadHtmlFile('http://www.stackoverflow.com'); // getting default site
}
$dom->loadHtmlFile('http://www.stackoverflow.com'); // load HTML into it
$xpath = new DOMXPath($dom); // create a new XPath
$nodes = $xpath->query('//a[@href]'); // Find all A elements with a href attribute
foreach($nodes as $node) { // Iterate over found elements
$node->setAttribute('href', 'index.php?open=http://www.stackoverflow.com'.$node->getAttribute('href')); // Change href attribute
}
echo $dom->saveXml(); // output cleaned HTML
?>
The code runs fine; the only problem is that the CSS files don't load for some reason.
You're more than welcome to test this code and see what the problem is.
Here is an online version: http://browser.breet.co.il
Thank you in advance!

Use saveHTML() instead of saveXml().
With saveXml(), an XML declaration is emitted at the start of the output, so the browser doesn't parse the page correctly.
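A small sketch of the difference, using a hard-coded HTML string (a hypothetical stand-in for the fetched page) so that it runs without network access:

```php
<?php
libxml_use_internal_errors(true); // hide parser warnings

// Hypothetical sample page; in the question this comes from loadHtmlFile()
$dom = new DOMDocument();
$dom->loadHTML('<html><head><link rel="stylesheet" href="style.css"></head>'
    . '<body><a href="/questions">Questions</a></body></html>');

$asXml  = $dom->saveXML();  // XML serialization, begins with an XML declaration
$asHtml = $dom->saveHTML(); // HTML serialization, no XML declaration

echo substr($asXml, 0, 5), "\n";
echo strpos($asHtml, '<?xml') === false ? "no XML declaration\n" : "XML declaration\n";
```

Only the serialization call changes; the link rewriting from the question stays the same.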

Related

Change src attribute of img, using Simple HTML Dom php library

I'm totally new to php, and I'm having a hard time changing the src attribute of img tags.
I have a website that pulls a part of a page using Simple Html Dom php, here is the code:
<?php
include_once('simple_html_dom.php');
$html = file_get_html('http://www.tabuademares.com/br/bahia/morro-de-sao-paulo');
foreach($html ->find('img') as $item) {
$item->outertext = '';
}
$html->save();
$elem = $html->find('table[id=tabla_mareas]', 0);
echo $elem;
?>
This code correctly returns the part of the page I want. But the img tags come back with the src of the original page: /assets/svg/icon_name.svg
What I want to do is change the original src so that it looks like this: http://www.mywebsite.com/wp-content/themes/mytheme/assets/svg/icon_name.svg
In other words, I want to put the URL of my site in front of /assets/svg/icon_name.svg
I already tried some tutorials, but I could not make any of them work.
Could someone please help a PHP beginner?
I got it working. In case someone has the same question, here is how I did it.
<?php
// Note: you must download simple_html_dom.php from
// https://sourceforge.net/projects/simplehtmldom/files/
// and then include it
include_once('simple_html_dom.php');
// target the website
$html = file_get_html('http://the_target_website.com');
// loop through all images in the HTML DOM
foreach ($html->find('img') as $item) {
    // Get an attribute (for a non-value attribute such as checked or selected, this returns true or false)
    $value = $item->src;
    // Set the attribute; the original src already starts with a slash, so none is added here
    $item->src = 'http://yourwebsite.com' . $value;
}
// save the changes
$html->save();
// find the div whose content you want
$elem = $html->find('div[id=container]', 0);
// output it using echo
echo $elem;
?>
That's it!
Did you read the documentation on reading and modifying attributes?
According to it:
// Get an attribute (for a non-value attribute such as checked or selected, this returns true or false)
$value = $e->href;
// Set an attribute
$e->href = 'ursitename'.$value;
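The same idea can be sketched without the third-party library, using PHP's built-in DOMDocument. This is only an illustration: prefixImageSources() is a hypothetical helper name, and the sample markup is made up to resemble the table from the question.

```php
<?php
// Hypothetical helper: prefix every relative img src in $html with $baseUrl.
function prefixImageSources(string $html, string $baseUrl): string
{
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Skip empty values and URLs that are already absolute
        if ($src === '' || preg_match('#^(https?:)?//#i', $src)) {
            continue;
        }
        // Join with exactly one slash between base URL and path
        $img->setAttribute('src', rtrim($baseUrl, '/') . '/' . ltrim($src, '/'));
    }
    return $dom->saveHTML();
}

$sample = '<table id="tabla_mareas"><tr><td><img src="/assets/svg/icon_name.svg"></td></tr></table>';
$out = prefixImageSources($sample, 'http://www.mywebsite.com/wp-content/themes/mytheme');
echo $out;
```

Trimming the slashes on both sides avoids a doubled slash when the original src already begins with one.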

scraping images from url using php

I am trying to make a page that lets me grab and save images from another URL. Here is what I want on my page:
a text box (to enter the URL I want to get images from).
a save dialog box to specify the path to save the images to.
However, I only want to save the images from that URL that are inside a specific element.
For example, my code should go to example.com and grab all images from inside the element with class="images".
Note: not all images from the page, just those inside that element.
Whether the element has 3 images in it or 50 or 100 doesn't matter.
Here is what I tried in PHP, and it works:
<?php
$html = file_get_contents('http://www.tgo-tv.net');
preg_match_all( '|<img.*?src=[\'"](.*?)[\'"].*?>|i',$html, $matches );
echo $matches[ 1 ][ 0 ];
?>
This gets the image name and path, but what I am trying to make is a save dialog box, and the code must save the image directly into that path instead of echoing it.
I hope that makes sense.
Edit 2
It's OK not to have a save dialog box; I will specify the save path in the code.
If you want something generic, you can use:
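<?php
$the_site = "http://somesite.com"; // page to fetch
$the_tag = "div"; // tag of the containing element
$the_class = "images"; // class of the containing element
$html = file_get_contents($the_site);
libxml_use_internal_errors(true); // suppress warnings about malformed HTML
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// select img elements that are direct children of the matching element
foreach ($xpath->query('//'.$the_tag.'[contains(@class,"'.$the_class.'")]/img') as $item) {
    $img_src = $item->getAttribute('src');
    print $img_src."\n";
}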
Usage:
Change the site, the tag (which can be a div, span, a, etc.) and the class name.
For example, change the values to:
$the_site = "https://stackoverflow.com/questions/23674744/what-is-the-equivalent-of-python-any-and-all-functions-in-javascript";
$the_tag = "div"; #
$the_class = "gravatar-wrapper-32";
Output:
https://www.gravatar.com/avatar/67d8ca039ee1ffd5c6db0d29aeb4b168?s=32&d=identicon&r=PG
https://www.gravatar.com/avatar/24da669dda96b6f17a802bdb7f6d429f?s=32&d=identicon&r=PG
https://www.gravatar.com/avatar/24780fb6df85a943c7aea0402c843737?s=32&d=identicon&r=PG
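Since the question asks for the images to be saved to a path set in the code, a possible next step is sketched below. saveImages() is a hypothetical helper, not part of the answer above; it assumes the src values have already been turned into absolute URLs and that allow_url_fopen is enabled for remote files.

```php
<?php
// Hypothetical helper: download each source and store it in $targetDir.
// Returns the list of file paths that were written.
function saveImages(array $sources, string $targetDir): array
{
    if (!is_dir($targetDir) && !mkdir($targetDir, 0775, true)) {
        throw new RuntimeException("Cannot create directory: $targetDir");
    }
    $saved = [];
    foreach ($sources as $src) {
        $data = @file_get_contents($src); // works for URLs and local paths
        if ($data === false) {
            continue; // skip sources that cannot be read
        }
        // Use the last path segment as the file name, ignoring any query string
        $name = basename((string) parse_url($src, PHP_URL_PATH));
        if ($name === '') {
            $name = md5($src); // fallback name when the URL has no file part
        }
        $path = rtrim($targetDir, '/') . '/' . $name;
        if (file_put_contents($path, $data) !== false) {
            $saved[] = $path;
        }
    }
    return $saved;
}

// Usage (the URL and directory below are placeholders):
// $saved = saveImages(['http://somesite.com/img/photo.jpg'], __DIR__ . '/downloads');
```

Images that share a file name overwrite each other in this sketch, so a real script would probably add a counter or hash to the name.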
Maybe you should try HTML DOM Parser for PHP. I found this tool recently and, to be honest, it works pretty well. It has jQuery-like selectors, as you can see on the site. I suggest you take a look and try something like:
<?php
require_once("./simple_html_dom.php");
// Load the page to search (example.com is a placeholder)
$html = file_get_html("http://example.com");
// Start from the root (<html></html>) and find the parent tag you want to search in (e.g. "div" to search in all divs)
foreach ($html->find("div") as $parent)
{
    // Search for img tags in each div that was found
    foreach ($parent->find("img") as $img)
    {
        // Output the img's src attribute (for <img src="www.example.com/cat.png"> you get www.example.com/cat.png)
        echo $img->src . "<br>";
    }
}
?>
I hope this helps.

How to get data or value from any div in php

I have created a PHP page that uses many divs with different ids, and I want to get the data or value from one of them.
Here is one div with its id:
<div id="tablename">tablename</div>
I have tried this, but it's not working:
$doc = new DomDocument();
$thediv = $doc->getElementById('tablename');
echo $thediv->textContent;
So please tell me how I can get this value from my div.
You need to pass the whole content of your page to the class; otherwise it can't select anything, because it thinks the document is empty:
$content = '<div id="tablename">tablename</div>';
$doc = new DomDocument();
$doc->loadHTML($content); // That's the addition
$thediv = $doc->getElementById('tablename');
echo $thediv->textContent;
More info:
loadHTML(): Load the HTML from a string.
loadHTMLFile(): Load the HTML from a file.
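One caveat: getElementById() returns null when no element has the requested id, so reading textContent directly can fail. A minimal sketch with that check added follows; getTextById() is a hypothetical helper name and the sample markup is made up.

```php
<?php
// Hypothetical helper: return the text of the element with the given id,
// or null when the id does not exist in the document.
function getTextById(string $html, string $id): ?string
{
    libxml_use_internal_errors(true); // hide warnings about HTML fragments
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $node = $doc->getElementById($id);
    return $node === null ? null : trim($node->textContent);
}

$page = '<div id="tablename">tablename</div><div id="other">something else</div>';
$value = getTextById($page, 'tablename');
$missing = getTextById($page, 'does-not-exist');

var_dump($value, $missing);
```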
Download and include PHP Simple HTML DOM Parser from https://sourceforge.net/projects/simplehtmldom/files/ and try this:
include 'simple_html_dom.php';
$html = file_get_html("http://www.facebook.com");
$displaybody = $html->find('div[id=blueBarDOMInspector]', 0)->plaintext;
echo $displaybody;
exit;

Insert HTML codes to specific location

I know this topic has been posted everywhere, but those questions are not what I want. I want to insert some HTML before the page is loaded, without touching the original code of the page.
Suppose my body is rendered by a function called render_body():
function render_body() {
return "<body>
<div class='container'>
<div class='a'>A</div>
<div class='b'>B</div>
</div>
</body>";
}
Now I want to insert HTML using PHP without editing render_body(). I want a function that inserts some divs into the container div.
render_body();
<?php // Insert <div class="c"> inside the container div ?>
As an alternative, you can use XPath: load the output of render_body() into a DOMDocument object and create a DOMXPath object to query the HTML, so you can easily pick the node where the new HTML should go.
This will probably only work if your HTML is well formed as XML, though.
//read in the document
$xml = new DOMDocument();
$xml->loadHTML(render_body());
//create an XPath query object
$xpath = new DOMXpath($xml);
//create the HTML nodes you want to insert
// using $xml->createElement() ...
//find the node to which you want to attach the new content
$xmlDivContainer = $xpath->query('//body/div[@class="container"]')->item(0);
$xmlDivContainer->appendChild( /* the HTML nodes you've previously created */ );
//output
echo $xml->saveHTML();
Took a little while as I had to refer to the documentation ... too much JQuery lately it's ruining my ability to manipulate the DOM without looking things up :\
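For a concrete version of the XPath approach, here is a minimal sketch that fills in the element-creation step. The helper name insertIntoContainer() and its parameters are made up for illustration, and the XPath expression assumes the container div carries exactly class="container".

```php
<?php
function render_body() {
    return "<body>
<div class='container'>
<div class='a'>A</div>
<div class='b'>B</div>
</div>
</body>";
}

// Hypothetical helper: append <div class="$class">$text</div> to the container div.
function insertIntoContainer(string $html, string $class, string $text): string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $container = $xpath->query('//div[@class="container"]')->item(0);
    if ($container === null) {
        return $html; // nothing to attach to, return the input unchanged
    }

    // Build the new node and attach it as the last child of the container
    $div = $dom->createElement('div');
    $div->setAttribute('class', $class);
    $div->appendChild($dom->createTextNode($text));
    $container->appendChild($div);

    return $dom->saveHTML();
}

$result = insertIntoContainer(render_body(), 'c', 'C');
echo $result;
```

Using createTextNode() for the content means special characters in the text are escaped on output rather than being parsed as markup.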
The only thing I can think of is to turn on output buffering and then use the DOMDocument class to read in the entire buffer and make changes to it. It is worth reading the DOM documentation (http://www.php.net/manual/en/book.dom.php) for the classes used in the script.
For example:
<?php
function render_body() {
return "<body>
<div class='container'>
<div class='a'>A</div>
<div class='b'>B</div>
</div>
</body>";
}
$dom = new DOMDocument();
$dom->loadHTML(render_body());
// get body tag
$body = $dom->getElementsByTagName('body')->item(0);
// add a new element at the end of the body
$element = $dom->createElement('div', 'My new element at the end!');
$body->appendChild($element);
echo $dom->saveHTML(); // echo what is in the dom
?>
EDIT:
As per CD001's suggestions, I have tested this code and it works.
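The output-buffering part of this suggestion can be sketched as follows. The echoed markup is a stand-in for whatever the page normally prints; everything else uses the standard ob_start() and ob_get_clean() functions.

```php
<?php
// Start buffering: everything echoed from here on is captured, not sent
ob_start();

// Stand-in for the page's normal output
echo "<body><div class='container'><div class='a'>A</div><div class='b'>B</div></div></body>";

// Stop buffering and take the captured output as a string
$buffer = ob_get_clean();

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($buffer);
libxml_clear_errors();

// Modify the captured page: add a new element at the end of the body
$body = $dom->getElementsByTagName('body')->item(0);
$body->appendChild($dom->createElement('div', 'My new element at the end!'));

// Send the modified page instead of the original output
$modified = $dom->saveHTML();
echo $modified;
```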

get the href value of a specific element and load it

I'm using jQuery to add rel=brochure with $('.imageOuter a').attr('rel', 'brochure'), and this works as expected.
However, I want to grab the link that has rel set to brochure. I'm trying to do this with loadHTML, as below:
function getBrochureLink() {
$doc = new DOMDocument();
$doc->loadHTML($file);
$area = $doc->getElementsByTagName('body')->item(0);
$links = $area->getElementsByTagName("link");
foreach($links as $l) {
if($l->getAttribute("rel") == "brochure") {
$brochureLink = $l->getAttribute("href");
}
}
}
Sadly, $brochureLink is empty; the link is not being picked up.
Your issue is that the attribute is set via JavaScript. When you load the page's contents with loadHTML, the JS is not executed, so the matching link cannot be found.
You'll have to either run the JS on the server side, put the attribute into the markup directly without JS, or find another architecture for whatever you're attempting to accomplish.
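To illustrate the second option, here is a sketch of the lookup once rel="brochure" is written into the markup itself. It searches a elements (which is where the jQuery selector puts the attribute) and takes the HTML as a parameter; the sample markup and file path are made up.

```php
<?php
// Return the href of the first <a rel="brochure">, or null if there is none.
function getBrochureLink(string $html): ?string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $link = $xpath->query('//a[@rel="brochure"]')->item(0);

    return $link instanceof DOMElement ? $link->getAttribute('href') : null;
}

// Attribute present in the served markup: the link is found
$withRel = '<div class="imageOuter"><a rel="brochure" href="/files/brochure.pdf">Brochure</a></div>';
// Attribute only added later by JavaScript: PHP never sees it
$withoutRel = '<div class="imageOuter"><a href="/files/brochure.pdf">Brochure</a></div>';

$found = getBrochureLink($withRel);
$notFound = getBrochureLink($withoutRel);
var_dump($found, $notFound);
```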
