Get and set inline styles with PHP - php

Is there a way to get and set inline styles of DOM elements inside an HTML fragment with PHP? Example:
<div style="background-color:black"></div>
I need to get whether the background-color is black and if it is, change it to white. (This is an example and not the actual goal)
I tried phpQuery, but it lacks the .css() method, while the branch that implements it doesn't seem to work (at least for me).
Basically, what I need is a port of jQuery's .css() method to PHP.

Per Ryan P's good suggestion above, the PHP DOM functions may help you out. Something like this might do what you want with that particular example.
$my_url = 'index.php';
$dom = new DOMDocument;
$dom->loadHTMLFile($my_url);
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
$div_style = $div->getAttribute('style');
// Ignore surrounding whitespace and an optional trailing semicolon
if (rtrim(trim($div_style), ';') == 'background-color:black') {
$div->setAttribute('style','background-color:white;');
}
}
echo $dom->saveHTML();
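The exact string comparison above only works when the style attribute holds that single declaration. Something closer to jQuery's .css() can be sketched by splitting the attribute into a property map first. This is only a rough sketch: parseInlineStyle() and buildInlineStyle() are made-up helper names (not part of PHP's DOM extension), and the naive split on ';' would break on values that contain semicolons themselves, such as data URIs.

```php
<?php
// Hypothetical helper: turn "a:b; c:d" into ['a' => 'b', 'c' => 'd'].
function parseInlineStyle(string $style): array
{
    $props = [];
    foreach (explode(';', $style) as $declaration) {
        // Split on the first colon only, so values such as url(http://...) survive.
        $parts = explode(':', $declaration, 2);
        if (count($parts) === 2 && trim($parts[0]) !== '') {
            $props[strtolower(trim($parts[0]))] = trim($parts[1]);
        }
    }
    return $props;
}

// Hypothetical helper: rebuild the attribute value from the property map.
function buildInlineStyle(array $props): string
{
    $out = [];
    foreach ($props as $name => $value) {
        $out[] = $name . ':' . $value;
    }
    return implode(';', $out);
}

$dom = new DOMDocument;
$dom->loadHTML('<div style="background-color: black; color:red"></div>');
foreach ($dom->getElementsByTagName('div') as $div) {
    $props = parseInlineStyle($div->getAttribute('style'));
    // Only touch elements whose background-color is exactly "black".
    if (($props['background-color'] ?? null) === 'black') {
        $props['background-color'] = 'white';
        $div->setAttribute('style', buildInlineStyle($props));
    }
}
$result = $dom->getElementsByTagName('div')->item(0)->getAttribute('style');
echo $result, "\n";
```

Other declarations on the element (color:red here) are kept, which the plain string comparison cannot do.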

Related

php extract body tag content

I'm trying what should be very easy, but I can't get it to work. Which makes me wonder if I'm using the right workflow.
I have a simple html page which I load in my desktop application as a help file. This page has no menu just the content.
On my website I want to have a more sophisticated help system. So I want to use a PHP file which will show a menu, breadcrumbs, a header, and a footer.
To not duplicate my help content I want to load the original HTML help file and add its body content to my enhanced help page.
I'm using this code to extract the title:
function getURLContent($filename){
$url = realpath(dirname(__FILE__)) . DIRECTORY_SEPARATOR . $filename;
$doc = new DOMDocument;
$doc->preserveWhiteSpace = FALSE;
@$doc->loadHTMLFile($url);
return $doc;
}
function getSingleElementValue($element){
if (!is_null($element)) {
$node = $element->childNodes->item(0);
return $node->nodeValue;
}
}
$doc = getURLContent("test.html");
$title = getSingleElementValue($doc->getElementsByTagName('title')->item(0));
echo $title;
The title is correctly extracted.
Now I try to extract the body:
function getBodyContent($element){
$mock = new DOMDocument;
foreach ($element->childNodes as $child){
$mock->appendChild($mock->importNode($child, true));
}
return $mock->saveHTML();
}
$body = getBodyContent($doc->getElementsByTagName('body')->item(0));
echo $body;
The getBodyContent() function is one of the several options I tried.
All of them return the whole HTML tag, including the HEAD tag.
My question is: Is this a correct workflow or should I use something else?
Thanks.
Update: My final goal is to have a website with multiple pages that has the help files accessible via a menu. These pages will be generated using something like generate.php?page=test.html. I'm not yet at this part. The goal is also to not duplicate the content of test.html because this file will be used in my desktop application (using a web control). In my desktop application I don't need the menu and such.
Update #2: I had to add <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/> to the HTML file I want to read, and now I do get the body content. Unfortunately, all tags are stripped. I'll need to fix that as well.
The problem is that saveHTML() will return an actual document. You don't want this. Instead, you want just what you put in.
Thankfully, you can do this much more easily.
function getBodyContent(DOMNode $element) {
$doc = $element->ownerDocument;
$wrapper = $doc->createElement('div');
// childNodes is a live list, so move nodes until none remain instead of iterating over it
while ($element->firstChild) {
$wrapper->appendChild($element->firstChild);
}
$element->appendChild($wrapper);
$html = $doc->saveHTML($wrapper);
return substr($html, strlen("<div>"), -strlen("</div>"));
}
This wraps the contents into a single element of known tag representation (the body may have attributes that make it unknown), gets the rendered HTML from that element, and strips off the known tag of the wrapper.
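If moving the nodes into a wrapper is not wanted, another option is to serialize each child of the body separately and concatenate the results, which leaves the DOM untouched. A minimal sketch, assuming PHP 5.3.6 or later (where saveHTML() accepts a node argument); getInnerHtml() is a hypothetical name, and the exact whitespace between serialized elements can vary with the libxml version.

```php
<?php
// Hypothetical alternative to getBodyContent(): serialize the children one by one.
function getInnerHtml(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // saveHTML() with a node argument serializes only that subtree.
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument;
$doc->loadHTML('<html><head><title>Help</title></head><body class="x"><h1>Hi</h1><p>Text</p></body></html>');
$inner = getInnerHtml($doc->getElementsByTagName('body')->item(0));
echo $inner, "\n";
```

Nothing is moved or removed here, so iterating childNodes directly is safe.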
I'd also like to suggest an improvement to getSingleElementValue:
function getSingleElementValue(DOMNode $element) {
return trim($element->textContent);
}
Note also the use of type hints to ensure that your functions are indeed getting the kind of thing that is expected - this is useful as it means we no longer need to check "does $element->ownerDocument exist? does $element->ownerDocument->saveHTML() do what we think it does?" and other such questions. It ensures we have a DOMNode, so we know it has those things.

PHP Modify an included file

I have a bunch of .html files that I am including on a page. Conditionally, I need to add classes to some of the components in these files, for example:
<div id='foo' class='bar'></div>
to
<div id='foo' class='bar bar2'></div>
I know I can do this with some inline PHP like this
<div id='foo' class="bar <?php echo " bar2"; ?>"></div>
However, having PHP in any of the files I'm including is not an option.
I also looked into including a file and then modifying afterward, but that doesn't seem possible. Then I was thinking I should read the files line-by-line, and add it in then.
Is there a nicer way I'm not thinking of?
Since having PHP is not an option, you could use PHP's DOM Parser with an XPath selector:
$dom = new DOMDocument();
$dom->loadHTMLFile($htmlFile);
$finder = new DOMXPath($dom);
// getting the class name using XPath
$nodes = $finder->query("//*[contains(@class, 'bar')]");
// changing the class name using setAttribute
foreach ($nodes as $node) {
$node->setAttribute('class', $node->getAttribute('class') . ' bar2');
}
// modified HTML source
$html = $dom->saveHTML();
That should get you started.
You can use the DOMDocument class in PHP to retrieve the information from the file and then add attributes and data.
I don't really remember the code for DOMDocument so I haven't included any code here (sorry), but here are some links:
Use this method to get the HTML from your file:
http://php.net/manual/en/domdocument.loadhtmlfile.php
Review the DOMDocument class:
http://php.net/manual/en/class.domdocument.php
You may need to use .php instead of .html.
So do like below:
$variableClass="bar2";
include("htmlfilename.html");
where htmlfilename.html contains
<div id='foo' class="bar <?php echo $variableClass; ?>"></div>
Depends on what you actually want to achieve - but basically this tends to be better solved by jQuery on the client.
But anyway you might put your HTML fragments in a DOM object, analyze and modify it, and read the HTML back after the modifications, for example:
// including an HTML file writes to the output stream, so buffer this
ob_start();
include('myfile.html');
$html = ob_get_clean();
// make a DOMDocument
$doc = new DOMDocument();
$doc->loadHTML($html);
// make the changes you need to
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query('//div[@id="foo"]');
// etc...
// get modified HTML
$html = $doc->saveHTML();
Hope this helps.

manipulate html navigation with php dom

I need to add classes to the navigation HTML being output from a function in a custom CMS.
The only way I can get the output I need is to parse the HTML with PHP.
I am using PHP's DOM methods to look through the HTML and add a class to any <li> element that contains a child <ul> (top level navigation items).
So far it's working, but I have 2 questions:
Is there a more efficient way for me to go through this DOM data? It seems cumbersome to me, but that could just be my lack of experience.
In some cases, my <li> elements may already have a class, how can I add to the existing class attribute without destroying what may or may not already be there?
<?php
$mcms_nav = getContent(
// call to cms that returns navigation html as a string
// ex. <ul id="pnav"><li>home</li>....</ul>
);
$dom = new DOMDocument();
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($mcms_nav);
$x = new DOMXPath($dom);
foreach($x->query('//ul/li/ul') as $node)
{
$parent = $node->parentNode;
$parent_attr = $dom->createAttribute('class');
$parent_attr->value = 'has-flyout';
$parent->appendChild($parent_attr);
$flyout_attr = $dom->createAttribute('class');
$flyout_attr->value = 'flyout';
$node->appendChild($flyout_attr);
}
$mcms_nav = $dom->getElementById('pnav');
echo $dom->saveHTML($mcms_nav);
?>
Not really. You could take the XML class from the CakePHP framework, turn this into an array, manipulate the array, and turn it back. Not sure if that's an option in your case. http://book.cakephp.org/2.0/en/core-utility-libraries/xml.html
You can use DOMElement::hasAttribute() and DOMElement::getAttribute() on each element to read the existing class attribute, if there is one, and then append to its value.
Also, a new job wouldn't hurt ;)
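To make the second point concrete, here is a rough sketch of adding a class without destroying what is already there. addClass() is a hypothetical helper name, and the sample markup is made up to stand in for the CMS output. It also uses the XPath predicate li[ul] to select the parent items directly, which matches any <li> with a direct <ul> child (slightly broader than //ul/li/ul in the question).

```php
<?php
// Hypothetical helper: append a class token while keeping existing ones.
function addClass(DOMElement $element, string $class): void
{
    // getAttribute() returns '' when the attribute is missing; NO_EMPTY drops blank tokens.
    $classes = preg_split('/\s+/', $element->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
    if (!in_array($class, $classes, true)) {
        $classes[] = $class;
    }
    $element->setAttribute('class', implode(' ', $classes));
}

// Sample markup, assumed for illustration only.
$dom = new DOMDocument();
$dom->loadHTML('<ul id="pnav"><li class="first">home</li><li class="about">about<ul><li>team</li></ul></li></ul>');
$xpath = new DOMXPath($dom);

// Every <li> that has a direct <ul> child, so no parentNode hop is needed.
foreach ($xpath->query('//li[ul]') as $li) {
    addClass($li, 'has-flyout');
}
// Every <ul> nested directly inside an <li>.
foreach ($xpath->query('//li/ul') as $ul) {
    addClass($ul, 'flyout');
}
echo $dom->saveHTML($dom->getElementById('pnav')), "\n";
```

Calling addClass() twice with the same class is harmless, since the token is only added when it is not present yet.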

How to remove style tags from divs with certain class

How can I (with PHP) remove the style attributes from DIVS having a certain class?
Because of a 'Drag&Drop' process some DIV elements get polluted with unnecessary styles which can lead to problems later on.
I know I can remove the style attributes with JavaScript after a 'Drag&Drop' process, but I only want to remove them when the HTML is being processed by the server (for sending the HTML as an e-mail).
This isn't a particularly difficult problem, so far as I can tell. You need to load the HTML into a DOMDocument structure, then use a simple XPath attribute selector to find the relevant elements and DOMElement::removeAttribute to remove the style attribute. Your code might look like this:
$dom = new DOMDocument;
$dom->loadHTML($dirtyHtml);
$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div[@class="someclass"]');
foreach ($divs as $div) {
$div->removeAttribute('style');
}
$cleanHtml = $dom->saveHTML();
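Note that an exact comparison on @class only matches divs whose class attribute is exactly "someclass". If the divs can carry several classes, a common XPath 1.0 idiom is to pad the normalized class list with spaces and look for the padded token. A small sketch with made-up sample markup:

```php
<?php
// Sample input, assumed for illustration: the second div must NOT match.
$dirtyHtml = '<div class="box someclass" style="left:10px">a</div>'
           . '<div class="someclassy" style="top:5px">b</div>';

$dom = new DOMDocument;
$dom->loadHTML($dirtyHtml);
$xpath = new DOMXPath($dom);

// Padding with spaces makes "someclass" match only as a whole class token.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " someclass ")]';
foreach ($xpath->query($query) as $div) {
    $div->removeAttribute('style');
}

$divs = $dom->getElementsByTagName('div');
$cleanHtml = $dom->saveHTML();
```

A plain contains(@class, "someclass") would also have matched "someclassy", which is why the padding is there.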

How do I get the link element in a html page with PHP

First, I know that I can get the HTML of a webpage with:
file_get_contents($url);
What I am trying to do is get a specific link element in the page (found in the head).
e.g:
<link type="text/plain" rel="service" href="/service.txt" /> (the element could close with just >)
My question is: How can I get that specific element with the "rel" attribute equal to "service" so I can get the href?
My second question is: Should I also get the "base" element? Does it apply to the "link" element? I am trying to follow the standard.
Also, the HTML might have errors. I don't have control over how my users code their stuff.
Using PHP's DOMDocument, this should do it (untested):
$doc = new DOMDocument();
$file = file_get_contents($url); // fetch the page as in the question
$doc->loadHTML($file);
$head = $doc->getElementsByTagName('head')->item(0);
$links = $head->getElementsByTagName("link");
foreach($links as $l) {
if($l->getAttribute("rel") == "service") {
echo $l->getAttribute("href");
}
}
You should get the Base element, but know how it works and its scope.
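On the base element: its href is the base for resolving every relative URL in the document, including the href of a link element. A rough sketch that reads both values and tolerates sloppy markup follows. The sample HTML and the variable names are made up, the rel attribute is compared as an exact string (real pages may list several rel tokens), and resolving the relative href against the base is left out because PHP has no built-in function for it.

```php
<?php
// Sample markup standing in for file_get_contents($url); note the unclosed and stray tags.
$file = '<html><head><base href="http://example.com/app/">'
      . '<link type="text/plain" rel="service" href="service.txt">'
      . '</head><body><p>Unclosed paragraph<div>stray</span>';

// Collect parser warnings instead of emitting them, since user markup may be invalid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($file);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$base = $xpath->query('//base[@href]')->item(0);
$link = $xpath->query('//link[@rel="service"][@href]')->item(0);

// Either element may be absent, so fall back to null.
$baseHref = $base ? $base->getAttribute('href') : null;
$serviceHref = $link ? $link->getAttribute('href') : null;

echo 'base: ', var_export($baseHref, true), "\n";
echo 'service: ', var_export($serviceHref, true), "\n";
```

If $baseHref is null, the relative href would be resolved against the URL the page was fetched from.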
In truth, when I have to screen-scrape, I use phpQuery. This is an older PHP port of jQuery, and while that may sound like something of a dumb concept, it is awesome for document traversal and doesn't require well-formed XHTML.
http://code.google.com/p/phpquery/
I'm working with Selenium under Java for Web-Application-Testing. It provides very nice features for document traversal using CSS-Selectors.
Have a look at How to use Selenium with PHP.
But this setup might be too complex for your needs if you only want to extract this one link.
