The simple_html_dom library is great for getting known attributes, but is there a way to get a list of all the attributes for an element?
For example, if I have:
<div id="test" custom1="custom" custom2="custom">
I can easily get the id:
$el = $html->find('div', 0);
$id = $el->id;
But is it possible to get custom1 and custom2 if they are not known ahead of time? Ideally, the solution would produce an array of name/value pairs for all attributes (id, custom1, custom2).
$el->attr is an associative array of name => value pairs.
You can use get_object_vars to get an associative array, and then loop over them.
$attrs = get_object_vars($el);
foreach ($attrs as $key => $value) {
    // $key is the property name, $value is its content
}
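For comparison, PHP's built-in DOM extension exposes the same information through the attributes property of an element. The sketch below is not simple_html_dom: it swaps in DOMDocument and reuses the markup from the question purely to illustrate collecting every attribute into a name => value array.

```php
<?php
// DOMDocument stands in for simple_html_dom here (an assumption for illustration).
$dom = new DOMDocument();
$dom->loadHTML('<div id="test" custom1="custom" custom2="custom"></div>');

// First <div> in the document.
$div = $dom->getElementsByTagName('div')->item(0);

$attrs = [];
foreach ($div->attributes as $attr) {
    // Each $attr is a DOMAttr with a name and a value.
    $attrs[$attr->name] = $attr->value;
}

print_r($attrs);
```

The resulting array has one entry per attribute, whatever the attribute names turn out to be.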
I need to save some values from XML.
First step - I get the structure:
$xml = $dom_xml->saveXML();
$xml_ = new \SimpleXMLElement($xml);
dd($xml_);
Here TextFrame has 8 arrays. Each of them has a PathPointType, which has 4 more arrays with 3 attributes each. I need these attributes from each TextFrame.
I can get, for instance, the Anchor value like this:
$res = $xml_
->Spread
->TextFrame
->Properties
->PathGeometry
->GeometryPathType
->PathPointArray
->PathPointType
->attributes();
dd($res['Anchor']);
(BTW: is there a prettier way to get it?)
But the question is: how can I loop through all the arrays and save the values separately for each array?
I assume this needs a nested foreach loop, perhaps in conjunction with a for loop?
Or is it better to achieve this using DOMDocument?
Since it looks as though you are starting off with DOMDocument (you are using $dom_xml->saveXML() to generate the XML), it may be easier to continue using it; it also has some easy features for getting the details you're after.
Using getElementsByTagName() allows you to get a list of the elements with a specific tag name from a start point, so starting with $dom_xml, get all of the <TextFrame> elements. Then foreach() over this list and using this element as a start point, use getElementsByTagName("PathPointType") to get the nested <PathPointType> elements. At this point you can then use getAttribute("Anchor") for each of the attributes you need from the <PathPointType> elements...
$textFrames = $dom_xml->getElementsByTagName("TextFrame");
foreach ( $textFrames as $frame ) {
    $pathPointTypes = $frame->getElementsByTagName("PathPointType");
    foreach ( $pathPointTypes as $type ) {
        echo $type->getAttribute("Anchor").PHP_EOL;
    }
}
Edit
You can extend the code to build an array of frames and then the anchors within that. This code also stores the anchor in an associative array so that if you add the other attributes, you can add them here (or remove it if you don't need another layer of detail)...
$frames = [];
foreach ( $textFrames as $frame ) {
    $anchors = [];
    $pathPointTypes = $frame->getElementsByTagName("PathPointType");
    foreach ( $pathPointTypes as $type ) {
        $anchors[] = ['Anchor' => $type->getAttribute("Anchor")];
    }
    $frames[] = $anchors;
}
Also if you have some way of identifying the frames, you could create an associative array at that level as well...
$frames[$frameID] = $anchors;
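If all attributes of each PathPointType are wanted rather than just Anchor, the inner loop can walk the element's attributes instead of naming them. This is a minimal sketch under stated assumptions: the XML is a hypothetical, cut-down sample, and the Self, LeftDirection and RightDirection attribute names are made up for illustration (only the element names and Anchor come from the question).

```php
<?php
// Hypothetical sample; only TextFrame, PathPointType and Anchor come from the question.
$xmlString = <<<XML
<Document>
  <Spread>
    <TextFrame Self="frame1">
      <PathPointType Anchor="0 0" LeftDirection="0 0" RightDirection="0 0"/>
      <PathPointType Anchor="0 10" LeftDirection="0 10" RightDirection="0 10"/>
    </TextFrame>
    <TextFrame Self="frame2">
      <PathPointType Anchor="5 5" LeftDirection="5 5" RightDirection="5 5"/>
    </TextFrame>
  </Spread>
</Document>
XML;

$dom_xml = new DOMDocument();
$dom_xml->loadXML($xmlString);

$frames = [];
foreach ($dom_xml->getElementsByTagName('TextFrame') as $frame) {
    $points = [];
    foreach ($frame->getElementsByTagName('PathPointType') as $type) {
        $point = [];
        // Copy every attribute, whatever its name, into the array.
        foreach ($type->attributes as $attr) {
            $point[$attr->name] = $attr->value;
        }
        $points[] = $point;
    }
    // Key each frame by its (assumed) Self attribute.
    $frames[$frame->getAttribute('Self')] = $points;
}

print_r($frames);
```

Keying by an identifying attribute keeps the values for each frame separate and easy to look up later.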
As a complement to the existing answer from Nigel Ren, I thought I'd show how the same loops look with SimpleXML.
Firstly, note that you don't need to convert the XML to a string and back if you want to switch between DOM and SimpleXML for any reason; you can use simplexml_import_dom, which just swaps out the interface:
$sxml = simplexml_import_dom($dom_xml);
Next we need our TextFrame elements; we could either step through the structure explicitly, as you had before:
$textFrames = $sxml->Spread->TextFrame;
Or we could use XPath to search for matching tag names within our current node (. is the current element, and // means "any descendant"):
$textFrames = $sxml->xpath('.//TextFrame');
The first will give you a SimpleXMLElement object, and the second an array, but either way, you can use foreach to go through the matches.
This time we definitely want an XPath expression to get the PathPointType nodes, to avoid all the nested loops through levels we're not that interested in:
foreach ( $textFrames as $frame ) {
    $pathPointTypes = $frame->xpath('.//PathPointType');
    foreach ( $pathPointTypes as $type ) {
        echo $type['Anchor'] . PHP_EOL;
    }
}
Note that you don't need to call $type->attributes(); unless you're dealing with namespaces, all you need to get an attribute is $node['AttributeName']. Beware that attributes in SimpleXML are objects though, so you'll often want to force them to be strings with (string)$node['AttributeName'].
To take the final example, you might then have something like this:
$frames = [];
foreach ( $sxml->Spread->TextFrame as $frame ) {
    $anchors = [];
    $pathPointTypes = $frame->xpath('.//PathPointType');
    foreach ( $pathPointTypes as $type ) {
        $anchors[] = ['Anchor' => (string)$type['Anchor']];
    }
    $frames[] = $anchors;
}
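When the attributes do need to be enumerated rather than named, attributes() can be iterated as name => value pairs. A rough sketch, assuming a tiny made-up document: the LeftDirection and RightDirection attributes and their values are hypothetical, and the string cast follows the advice above.

```php
<?php
// Hypothetical sample document; the attribute values are invented.
$sxml = simplexml_load_string(
    '<Document><Spread><TextFrame>'
    . '<PathPointType Anchor="0 0" LeftDirection="1 1" RightDirection="2 2"/>'
    . '</TextFrame></Spread></Document>'
);

$frames = [];
foreach ($sxml->Spread->TextFrame as $frame) {
    $points = [];
    foreach ($frame->xpath('.//PathPointType') as $type) {
        $point = [];
        foreach ($type->attributes() as $name => $value) {
            // $value is a SimpleXMLElement, so cast it to a plain string.
            $point[$name] = (string) $value;
        }
        $points[] = $point;
    }
    $frames[] = $points;
}

print_r($frames);
```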
I am using the Simple HTML DOM parser to get an element from an HTML string using its class name, like:
foreach ($html->find('div[class=news-div]') as $news) {
    // $news is the matched div
}
But I also need to get two elements (one is a span and the other is an a) that occur just before $news, but they don't have an id that can be predicted because it is calculated dynamically, and they don't have a unique class name.
How can I extract the two adjacent elements occurring before $news?
Simple HTML DOM has prev_sibling and next_sibling methods:
$elems = $html->find('div[class=news-div]');
foreach ( $elems as $news ) {
    $prev_span = $news->prev_sibling();
    $prev_a = $prev_span->prev_sibling();
}
I can't find a way to retrieve all elements that have an attribute mc:edit. I've only found examples for getting namespaced elements, but not attributes.
And there is also no result when searching the attributes with attr() or hasAttr().
dbpedia example:
foreach ($qp->branch()->find('foaf|page') as $img) {
print $img->attr('rdf:resource') . PHP_EOL;
}
rdf file sample:
<dbpprop:artist rdf:resource="http://dbpedia.org/resource/The_Beatles" />
But this won't retrieve any results:
$edits = $htmldocument->find('div[mc|edit]');
foreach ($edits as $key => $value) {
echo $value->attr('mc:edit');
}
sample data:
<div mc:edit="stuff"> // etc
I get nothing.
Ok, lambdas solve everything:
$edits = $htmldocument->find('div')->filterLambda('return qp($item)->hasAttr("mc:edit");');
So I have a HTML string like this:
<td class="name">
Some Name
</td>
<td class="name">
Some Name2
</td>
Using XPath I'm able to get the value of the href attribute with this XPath query:
$domXpath = new \DOMXPath($this->domPage);
$hrefs = $domXpath->query("//td[@class='name']/a/@href");
foreach($hrefs as $href) {...}
And it's even easier to get the text value, like this:
// XPath automatically strips any HTML tags, so we are
// left with the clean text value of the a element
$domXpath = new \DOMXPath($this->domPage);
$names = $domXpath->query("//td[@class='name']/a");
foreach($names as $name) {...}
Now I'm curious: how can I combine these two queries to get both values with a single query (if that is even possible)?
Fetch
//td[@class='name']/a
and then pluck the text with nodeValue and the attribute with getAttribute('href').
Apart from that, you can combine Xpath queries with the Union Operator | so you can use
//td[@class='name']/a/@href|//td[@class='name']
as well.
To reduce the code to a single loop, try:
$anchors = $domXpath->query("//td[@class='name']/a");
foreach ($anchors as $a) {
    print $a->nodeValue." - ".$a->getAttribute("href")."<br/>";
}
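For a self-contained version of that single loop, here is a minimal sketch. It assumes each cell wraps its text in a link; the href values and the surrounding table markup are hypothetical, chosen only so the snippet runs on its own.

```php
<?php
// Hypothetical markup: the href values are invented for illustration.
$html = '<table><tr>'
    . '<td class="name"><a href="/people/1">Some Name</a></td>'
    . '<td class="name"><a href="/people/2">Some Name2</a></td>'
    . '</tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$domXpath = new DOMXPath($dom);

$links = [];
foreach ($domXpath->query("//td[@class='name']/a") as $a) {
    // nodeValue gives the text, getAttribute() the href, from the same node.
    $links[] = [
        'text' => trim($a->nodeValue),
        'href' => $a->getAttribute('href'),
    ];
}

print_r($links);
```

Selecting the a element itself, rather than its @href or its text, is what lets one query serve both values.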
As per above :) Too slow ..
Simplest way, evaluate is for this task!
The simplest way to obtain a value is with the evaluate() method:
$xp = new DOMXPath($dom);
$v = $xp->evaluate("string(/etc[1]/@stringValue)");
Note: it is important to limit the XPath result to one item (the first a in this case) and to cast the value with string(), round(), etc.
So, in a set of multiple items, using your foreach code,
$names = $domXpath->query("//td[@class='name']");
foreach ($names as $contextNode) {
    $text = $domXpath->evaluate("string(./a[1])", $contextNode);
    $href = $domXpath->evaluate("string(./a[1]/@href)", $contextNode);
}
PS: this example only illustrates evaluate(). When the information is already available on the node, use whatever performs best, such as the methods getAttribute(), saveXML(), etc. and the properties nodeValue, textContent, etc. provided by the DOM classes. See @Gordon's answer for this particular problem. An XPath subquery (with a context node) is good for complex cases, or to simplify your code by avoiding hasChildNodes() checks and loops over childNodes, though with no significant gain in performance.
I'm using DOMi ( http://domi.sourceforge.net ) to create XML from arrays.
But I don't know how to create attributes in this XML (in the arrays, so that the attributes appear in the XML). How can I construct the arrays so that I get some tags with attributes after the conversion?
Thank you!
Looking at the source code, apparently you pass "attributes" as the second argument ($prefix) to attachToXml:
public function attachToXml($data, $prefix, &$parentNode = false) {
    if(!$parentNode) {
        $parentNode = &$this->mainNode;
    }
    // i don't like how this is done, but i can't see an easy alternative
    // that is clean. if the prefix is attributes, instead of creating
    // a node, just put all of the data onto the parent node as attributes
    if(strtolower($prefix) == 'attributes') {
        // set all of the attributes onto the node
        foreach($data as $key=>$val)
            $parentNode->setAttribute($key, $val);
        $node = &$parentNode;
    }
    //...
}