I am working on a project that requires the PHP Simple HTML DOM Parser, and I need a way to add a custom attribute to a number of elements based on class name.
I am able to loop through the elements with a foreach loop, and it would be easy to set a standard attribute such as href, but I can't find a way to add a custom attribute.
The closest I can guess is something like:
foreach($html -> find(".myelems") as $element) {
$element->myattr="customvalue";
}
but this doesn't work.
I have seen a number of other questions on similar topics, but they all suggest using an alternative method for parsing HTML (DOMDocument etc.). In my case this is not an option, as I must use Simple HTML DOM Parser.
Did you try it? Try this example, which adds data-* attributes:
include 'simple_html_dom.php';
$html_string = '
<style>.myelems{color:green}</style>
<div>
<p class="myelems">text inside 1</p>
<p class="myelems">text inside 2</p>
<p class="myelems">text inside 3</p>
<p>simple text 1</p>
<p>simple text 2</p>
</div>
';
$html = str_get_html($html_string);
foreach($html->find('div p[class="myelems"]') as $key => $p_tags) {
$p_tags->{'data-index'} = $key;
}
echo htmlentities($html);
Output:
<style>.myelems{color:green}</style>
<div>
<p class="myelems" data-index="0">text inside 1</p>
<p class="myelems" data-index="1">text inside 2</p>
<p class="myelems" data-index="2">text inside 3</p>
<p>simple text 1</p>
<p>simple text 2</p>
</div>
I know this is an old post, but it may still help somebody like me :)
In my case, I added a custom attribute to an image tag:
$markup = file_get_contents('pathtohtmlfile');
//Create a new DOM document
$dom = new DOMDocument;
//Parse the HTML. The @ is used to suppress any parsing errors
//that will be thrown if the $markup string isn't valid XHTML.
@$dom->loadHTML($markup);
//Get all images tags
$imgs = $dom->getElementsByTagName('img');
//Iterate over the extracted images
foreach ($imgs as $img)
{
$img->setAttribute('customAttr', 'customAttrVal');
}
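The snippet above stops after setting the attribute. Here is a minimal sketch of the full round trip, including writing the markup back out with saveHTML(). The sample markup and the data-custom attribute name are made up for illustration, and libxml_use_internal_errors() is used in place of the @ operator to keep parse warnings quiet.

```php
<?php
// Hypothetical markup standing in for the file read in the answer above.
$markup = '<div><img src="a.png"><img src="b.png"></div>';

$dom = new DOMDocument();

// Collect libxml parse warnings internally instead of silencing them with @.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($markup);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Add the custom attribute to every image.
foreach ($dom->getElementsByTagName('img') as $img) {
    $img->setAttribute('data-custom', 'customAttrVal');
}

// Serialize the modified document back to an HTML string.
$result = $dom->saveHTML();
echo $result;
```

Note that saveHTML() without arguments returns the whole document, including the doctype and the html/body wrappers that the parser adds.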
I am attempting to get various elements inside of an li as shown below. I am pretty new to this so I may not be using the most efficient methods but this is where I have started...
EXAMPLE CODE SIMPLIFIED....
<li id='entry_0' title='09879879'>
<div ....>
<h2> The title text would go here </h2>
<span class='entrySize' ....> 20oz </span>
<span class='entryPrice' ....> $32.09 </span>
<span class='anotherEntry' ....> More Data I need To Grab </span>
.......
</div>
</li>
<li> .... With same structure as above .... 100's of entries like this </li>
I know how to pull individual parts separately, but I am having trouble grasping how to do it grouped within a portion of the HTML.
$filename = "directory/file.html";
$html = file_get_html($filename);
for($i=0; $i<=count(entryNumber);$i++)
{
$li_id = "entry_".$i;
foreach($html->find('li[id='.$li_id.']') as $li) {
echo $li->innertext;
}
}
So this gets me the content of the line item tag with the id as the unique attribute. I would like to grab the h2 text, entrySize, entryPrice, etc. as I iterate through the line item tags. What I don't understand is how, once I have the line item's content, I can parse its inner tags and attributes. There may be other parts of the full HTML document that have tags with the same id or class, so I am breaking this down into portions and then looking to parse each section one at a time.
I would also like to pull the title attribute out of the li tag.
I hope my explanation makes sense.
You should probably use a DOM parser. PHP comes bundled with one, and there are many others you could use.
http://php.net/dom
PHP Simple HTML DOM Parser
<?php
$html = file_get_contents($page);
$doc = new DOMDocument();
$doc->loadHTML($html);
// now find what you need
$items = $doc->getElementsByTagName('li');
foreach ($items as $item) {
    $id = $item->getAttribute('id');
    // the ids in the question start with "entry_"
    if (strpos($id, 'entry_') === 0) {
        // found matching li, grab its children
    }
}
Use this as a baseline; we can't write all the code for you. Check out the PHP docs to finish it :) From here, you need to follow the docs to grab the child values and handle them.
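One possible way to grab the child values is to run XPath queries relative to each li. This is only a sketch: the sample markup is a trimmed copy of the question's structure, and the array keys are names picked for illustration.

```php
<?php
// Trimmed-down sample based on the structure in the question.
$html = <<<'HTML'
<ul>
<li id="entry_0" title="09879879"><div><h2> The title text would go here </h2>
<span class="entrySize"> 20oz </span><span class="entryPrice"> $32.09 </span></div></li>
<li id="entry_1" title="123"><div><h2> Second </h2>
<span class="entrySize"> 12oz </span><span class="entryPrice"> $9.99 </span></div></li>
</ul>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$entries = [];
// Select only the li elements whose id starts with "entry_".
foreach ($xpath->query('//li[starts-with(@id, "entry_")]') as $li) {
    // The second argument makes each query relative to the current li,
    // so matching tags elsewhere in the document are ignored.
    $entries[] = [
        'id'      => $li->getAttribute('id'),
        'title'   => $li->getAttribute('title'),
        'heading' => trim($xpath->evaluate('string(.//h2)', $li)),
        'size'    => trim($xpath->evaluate('string(.//span[@class="entrySize"])', $li)),
        'price'   => trim($xpath->evaluate('string(.//span[@class="entryPrice"])', $li)),
    ];
}

print_r($entries);
```

Because every lookup is scoped to one li, there is no need to build the id in a counter loop or to worry about the same class names appearing in other parts of the page.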
I want to get the HTML inside the parent element using php. For example, I have this structure:
<p>
<p>this is my first xml file </p>
</p>
and I want to get below text as a result.
<p>this is my first xml file </p>
Make use of a DOM Parser
<?php
$html='<p>
<p>this is my first xml file </p>
</p>';
$dom = new DOMDocument;
@$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('p') as $tag){
if(!empty($tag->nodeValue)){ echo $tag->nodeValue;}
}
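If you want the inner markup with its tags rather than just the text, one option is to serialize each child of the outer element. This sketch assumes the input is well-formed XML, as the question describes it (an HTML parser would not keep a p nested inside another p). The variable names are illustrative.

```php
<?php
$xml = "<p>\n<p>this is my first xml file </p>\n</p>";

$dom = new DOMDocument();
$dom->loadXML($xml);

// Concatenate the serialized form of every child of the root element.
$inner = '';
foreach ($dom->documentElement->childNodes as $child) {
    $inner .= $dom->saveXML($child);
}

// Drop the whitespace-only text nodes around the inner element.
$inner = trim($inner);
echo $inner, PHP_EOL;
```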
I'm trying to get img and the div which is coming after the div which contains that img, all in one query.
So I did this:
$nodes = $xpath->query('//div[starts-with(@id, "someid")]/img |
//div[starts-with(@id, "someid")]/following-sibling::div[@class="spec_class"][1]/text()');
Now, I'm able to get the attributes of img tag, but I can't get the text of the following sibling. If I separate the query (two queries - first for the img and second query for the sibling) it works. But how can I do this with only one query? By the way, there is no error in the syntax. But somehow the union doesn't work or maybe I'm not extracting the sibling content right.
Here's the markup (which repeats many times with different text and id="someid_%randomNumber%"):
<div id="someid_1">
<img src="link_to_image.png" />
...some text...
</div>
<div>...another text...</div>
<div class="spec_class">
...Important text...
</div>
I want to get in one query both link_to_image.png and ...Important text...
Your query seems correct.
Example XML:
<div>
<div id="someid-1"><img src="foo"/></div>
<div class="spec_class">bar</div>
<div class="spec_class">baz</div>
</div>
Example PHP Code:
$dom = new DOMDocument;
$dom->loadXml($xhtml);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div…') as $node) {
echo $dom->saveXML($node);
}
Outputs (demo):
<img src="foo"/>bar
Note that you will have to iterate the DOMNodeList returned by the XPath query.
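To make the iteration concrete, here is a sketch that runs the union query from the question against the example XML and reads a different property depending on the node type. The attribute tests are written as @id and @class, and the $results array is just a name chosen for this example.

```php
<?php
$xhtml = '<div>'
       . '<div id="someid-1"><img src="foo"/></div>'
       . '<div class="spec_class">bar</div>'
       . '<div class="spec_class">baz</div>'
       . '</div>';

$dom = new DOMDocument();
$dom->loadXML($xhtml);
$xpath = new DOMXPath($dom);

// Union of the img element and the text of the first matching sibling div.
$query = '//div[starts-with(@id, "someid")]/img'
       . ' | //div[starts-with(@id, "someid")]'
       . '/following-sibling::div[@class="spec_class"][1]/text()';

$results = [];
foreach ($xpath->query($query) as $node) {
    if ($node instanceof DOMElement) {
        // The img branch of the union: read the src attribute.
        $results[] = $node->getAttribute('src');
    } else {
        // The text() branch of the union: read the text itself.
        $results[] = trim($node->nodeValue);
    }
}

print_r($results);
```

The union returns a mix of element and text nodes, so code that only reads attributes will appear to miss the sibling text even though the query matched it.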
I have a HUGE HTML document that I need to parse.
The document is a list of <p> elements all (direct) children of the body tag.
The difference is the class name. The structure is like this:
<p class="first-level"></p>
<p class="second-level"></p>
<p class="third-level"></p>
<p class="third-level"></p>
<p class="nth-levels just-for-demo-1"></p>
<p class="nth-levels just-for-demo-1"></p>
<p class="third-level"></p>
<p class="second-level"></p>
<p class="third-level"></p>
<p class="nth-levels just-for-demo-2"></p>
<p class="first-level"></p>
<p class="second-level"></p>
<p class="second-level"></p>
<p class="third-level"></p>
And so on. nth-level can be any class name that isn't first-level, second-level or third-level.
Basically it's a multi-level <ul> element very poorly marked-up.
What I want to do is parse it and obtain all <p> elements (including tag, not just innerHTML) that are between one of the class names above.
In the example above, I want to get:
<p class="nth-levels just-for-demo-1"></p>
<p class="nth-levels just-for-demo-1"></p>
and
<p class="nth-levels just-for-demo-2"></p>
How the heck can I do that please?
Thank you.
Using XPath:
//p[not(@class='first-level')][not(@class='second-level')][not(@class='third-level')]
to get the (non?)matching nodes, then you can use this answer to get the outerHTML of the nodes.
Additionally, if you're familiar with jQuery, try a jQuery port to PHP. It gives you a powerful set of tools for matching elements in a document (selectors), just as in jQuery, along with hierarchy, attribute filters, child filters, etc. (Reference)
$doc = new DOMDocument;
$doc->loadHTML(...);
$query = '//p[contains(@class, "just-for-demo-")]';
$xpath = new DOMXPath($doc);
$entries = $xpath->query($query);
foreach ($entries as $entry)
{
// not a best solution yet
$attribute = '';
foreach ($entry->attributes as $attr)
{
// the leading space separates each attribute from the tag name
$attribute .= " {$attr->name}=\"{$attr->value}\"";
}
echo "<{$entry->nodeName}{$attribute}>{$entry->nodeValue}</{$entry->nodeName}>";
}
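Instead of rebuilding the tag by hand, you can pass the node to saveHTML(), which returns its outer HTML. A small sketch, assuming a shortened copy of the question's markup with some placeholder text added inside each p:

```php
<?php
// Shortened sample; the letters inside the tags are placeholders.
$html = '<body>'
      . '<p class="first-level">a</p>'
      . '<p class="second-level">b</p>'
      . '<p class="third-level">c</p>'
      . '<p class="nth-levels just-for-demo-1">d</p>'
      . '<p class="nth-levels just-for-demo-1">e</p>'
      . '<p class="third-level">f</p>'
      . '</body>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Every p whose class is none of the three known level names.
$query = '//p[not(@class="first-level")'
       . ' and not(@class="second-level")'
       . ' and not(@class="third-level")]';

$matches = [];
foreach ($xpath->query($query) as $p) {
    // saveHTML() with a node argument serializes that node, tag included.
    $matches[] = $doc->saveHTML($p);
}

print_r($matches);
```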
You could open the file (with fopen or something similar) and read one line at a time. Then just check if the required string is in the line (for example with strstr) and if yes, then add it to an array or do what you need with the line.
Note: this only works if the paragraphs are on different lines each.
fopen documentation
strstr documentation
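A rough sketch of the line-by-line approach. It writes a tiny sample to a temporary file first so that it runs on its own; in practice you would open your real file. The search string just-for-demo- is taken from the question, and everything else is illustrative.

```php
<?php
// Write a small sample to a temp file, one <p> per line.
$path = tempnam(sys_get_temp_dir(), 'p_');
file_put_contents(
    $path,
    "<p class=\"first-level\"></p>\n"
    . "<p class=\"nth-levels just-for-demo-1\"></p>\n"
    . "<p class=\"third-level\"></p>\n"
);

$found = [];
$handle = fopen($path, 'r');

// fgets() returns one line per call and false at end of file,
// so only a single line is held in memory at a time.
while (($line = fgets($handle)) !== false) {
    if (strstr($line, 'just-for-demo-') !== false) {
        $found[] = rtrim($line, "\r\n");
    }
}

fclose($handle);
unlink($path);

print_r($found);
```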
Given the following string in PHP:
$html = "<div>
<p><span class='test1 test2 test3'>text 1</span></p>
<p><span class='test1 test2'>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class='test1 test3 test2'>text 4</span></p>
</div>";
I just want to either empty or remove any class that has "test2" in it, so the result would be this:
<div>
<p><span class=''>text 1</span></p>
<p><span class=''>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class=''>text 4</span></p>
</div>
or if you're removing the element:
<div>
<p>text 1</p>
<p>text 2</p>
<p><span class='test1'>text 3</span></p>
<p>text 4</p>
</div>
I'm happy to use a regular expression or something like PHP Simple HTML DOM Parser, but I have no clue how to use the latter. With regex, I know how to find the element, but not the specific attribute associated with it, especially if there are multiple attributes like my example above. Any ideas?
The DOMDocument class is a straightforward, easy-to-understand interface for working with your data in a DOM-like fashion. Querying your DOM with XPath selectors should make the task all the more trivial:
Clear All Classes
// Build our DOMDocument, and load our HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
// Preserve a reference to our DIV container
$div = $doc->getElementsByTagName("div")->item(0);
// New-up an instance of our DOMXPath class
$xpath = new DOMXPath($doc);
// Find all elements whose class attribute has test2
$elements = $xpath->query("//*[contains(@class,'test2')]");
// Cycle over each, remove attribute 'class'
foreach ($elements as $element) {
// Empty out the class attribute value
$element->attributes->getNamedItem("class")->nodeValue = '';
// Or remove the attribute entirely
// $element->removeAttribute("class");
}
// Output the HTML of our container
echo $doc->saveHTML($div);
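For the second variant in the question (removing the element but keeping its text), here is a sketch that unwraps each matching span. It also uses a stricter class test, because a plain contains() would match a class such as test20 as well. The extra test20 span in the sample is added only to show that difference.

```php
<?php
$html = "<div>"
      . "<p><span class='test1 test2 test3'>text 1</span></p>"
      . "<p><span class='test1'>text 3</span></p>"
      . "<p><span class='test20'>text 5</span></p>"
      . "</div>";

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match test2 only as a whole, space-separated class token.
$query = '//*[contains(concat(" ", normalize-space(@class), " "), " test2 ")]';

// Copy the node list to an array because the tree is modified in the loop.
foreach (iterator_to_array($xpath->query($query)) as $el) {
    // Move every child in front of the element, then drop the empty element.
    while ($el->firstChild) {
        $el->parentNode->insertBefore($el->firstChild, $el);
    }
    $el->parentNode->removeChild($el);
}

$div = $doc->getElementsByTagName('div')->item(0);
$result = $doc->saveHTML($div);
echo $result;
```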
Using the PHP Simple HTML DOM Parser
Updated and tested!
You can get the simple_html_dom.php include from the above link or here.
For both cases:
include('../simple_html_dom.php');
$html = str_get_html("<div><p><span class='test1 test2 test3'>text 1</span></p>
<p><span class='test1 test2'>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class='test1 test3 test2'>text 4</span></p></div>");
case 1:
foreach($html->find('span[class*="test2"]') as $e)
$e->class = '';
echo $html;
case 2:
foreach($html->find('span[class*="test2"]') as $e)
$e->parent()->innertext = $e->plaintext;
echo $html;
$notest2 = preg_replace(
"/class\s*=\s*'[^\']*test2[^\']*'/",
"class=''",
$src);
You can use any DOM parser and iterate over every element. Check whether its class attribute contains the test2 class (strpos()); if so, set an empty string as the value of the class attribute.
You can also use regular expressions to do that, which is a much shorter way. Just find and replace (preg_replace()) using the following expression: #class=".*?test2.*?"#is
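If you go the regex route, it may be safer to stop the match at the closing quote so it cannot run across several attributes or tags. A sketch under that assumption, accepting either quote style. The pattern is my own variation, not the expression given above, and regex on HTML remains fragile for anything beyond simple input like this.

```php
<?php
$src = "<div>"
     . "<p><span class='test1 test2 test3'>text 1</span></p>"
     . "<p><span class='test1'>text 3</span></p>"
     . "</div>";

// (["\']) captures the opening quote and \1 requires the same closing quote.
// [^"\']* keeps the match inside a single attribute value.
// \btest2\b avoids matching longer names such as test20.
$pattern = '/\bclass\s*=\s*(["\'])[^"\']*\btest2\b[^"\']*\1/';

// Replace the whole attribute with an empty class in the original quote style.
$notest2 = preg_replace($pattern, 'class=$1$1', $src);

echo $notest2, PHP_EOL;
```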