I have a situation here: I am a bit of a Java guy and I'm having a hard time with PHP.
I am creating an XML file from a database. So far I have created more than 90 dynamic elements, some with attributes, children, etc., without any problem.
But things got messy here:
text1:
here is a list of pencils[1]. here is a list of another type of pencils[2].
What I want to have is:
<text1>
here is a list of pencils <id>1</id>. here is a list of another type of pencils <id>2</id>.
</text1>
I can replace the substrings ([1], [2]) with some other text, but how do I replace these substrings with DOM elements?
Any help is deeply appreciated.
You cannot do it as a plain string replacement, because the string in which you want to do the replacement is the node value of the text1 node. A variant would be to structure it like:
<text1>
<partial>here is a list of pencils</partial>
<id>1</id>
<partial>.here is a list of another type of pencils</partial>
<id>2</id>
</text1>
But honestly that is suboptimal.
I assume what got you confused (and me for a second there) is the way we write HTML:
<p>some text here <a href="...">a link</a> more <strong>variation</strong></p>
This might give us the impression that the same should work as-is in XML; but there is another thing to know: browsers actually transform the HTML above into roughly the following form:
<p>
<textnode>some text here </textnode>
<a href="...">
<textnode>a link</textnode>
</a>
<textnode> more </textnode>
<strong>variation</strong>
</p>
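The same interleaving of text nodes and elements can be built by hand with PHP's DOM extension, if mixed content is acceptable in your format. A minimal sketch, assuming the markers always look like [n] with n being digits; buildMixedContent is a hypothetical helper name, not part of any library:

```php
<?php
// Hypothetical helper: turns "text[1] more text[2]." into an element whose
// children alternate between text nodes and <id> elements.
function buildMixedContent(DOMDocument $doc, $name, $text)
{
    $element = $doc->createElement($name);

    // Split on the [n] markers and keep the captured digits in the result.
    // Even indexes hold plain text, odd indexes hold the captured numbers.
    $parts = preg_split('/\[(\d+)\]/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $element->appendChild($doc->createElement('id', $part));
        } elseif ($part !== '') {
            $element->appendChild($doc->createTextNode($part));
        }
    }

    return $element;
}

$doc = new DOMDocument('1.0', 'UTF-8');
$text1 = buildMixedContent(
    $doc,
    'text1',
    'here is a list of pencils[1]. here is a list of another type of pencils[2].'
);
$doc->appendChild($text1);

// Serialize only the <text1> element, without the XML declaration.
$xml = $doc->saveXML($text1);
echo $xml, "\n";
```

Using createTextNode for the text parts means any special characters in the database text are escaped automatically on output.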
Not the answer, but I'd recommend you rethink your XML format.
I'm parsing an HTML document using DOM -> SimpleXML:
$dom = new DOMDocument();
$dom->loadHTML($this->resource->get());
$html = simplexml_import_dom($dom);
And I want to load this piece:
<p>
Some text here <strong class="wanna-attributes-too">with strong element!</strong>.
But there can be even <b>bold</b> tag and many others.
</p>
Then I want to do something with it and export it. The inner tags are parsed as child nodes of <p>, which is formally right, but how can I reconstruct the original document? Is there a library which can handle tags inside text values?
How do browsers handle this, given that it is such a common case?
Thanks
// P.S. I CAN parse documents with nodes within text, that ISN'T the problem; the problem is that the nodes lose their positions in the original text.
Update v1.0
OK, one solution could be to encapsulate the text of every node that has child nodes and a text value at the same time.
The updated question is: how do I get the raw node value from SimpleXML?
From previous HTML fragment I want something like this:
echo $nodeParagraph->rawValue;
and output will be
Some text here <strong class="wanna-attributes-too">with strong element!</strong>.
But there can be even <b>bold</b> tag and many others.
Update v2.0
My bad: a SimpleXML node has saveXML (an alias of asXML), which does what I want. Sorry for the noise. I'll post an answer when I have built a working test.
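A minimal sketch of that approach, assuming the fragment is loaded through DOMDocument::loadHTML as in the question (which wraps it in html and body elements, hence the $root->body->p path). The variable names are illustrative only:

```php
<?php
$html = '<p>Some text here <strong class="wanna-attributes-too">with strong element!</strong>. '
      . 'But there can be even <b>bold</b> tag and many others.</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// simplexml_import_dom() returns the root element, here <html>.
$root = simplexml_import_dom($dom);
$nodeParagraph = $root->body->p;

// asXML() serializes the node including its inline children, in document order.
$raw = $nodeParagraph->asXML();
echo $raw, "\n";
```

Note that the result includes the enclosing <p> tags, so strip them if only the inner markup is needed.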
So, as @jzasnake pointed out, a nice solution is to do this:
sample (input):
<p>
Some text here <strong class="wanna-attributes-too">with strong element!</strong>.
But there can be even <b>bold</b> tag and many others.
</p>
This produces something like the following in the DOM:
p
strong
b
where the text is in the incorrect order (if you later want to reconstruct it).
A solution can be to wrap every piece of text in its own node (notice the <value> tags):
<p>
<value>Some text here </value><strong class="wanna-attributes-too">with strong element!</strong><value>.
But there can be even </value><b>bold</b><value> tag and many others.</value>
</p>
The markup is a bit more verbose, but look at this:
p
value
strong
value
value
b
value
value
Everything is preserved, so you are able to reconstruct original document as is.
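For comparison, here is a small sketch showing that the plain DOM extension (as opposed to SimpleXML's property access) already exposes text nodes and elements in document order, so the order can also be recovered by walking childNodes. The '#text' label and the $sequence array are illustrative choices, not anything from the original posts:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<p>Some text here <strong class="wanna-attributes-too">with strong element!</strong>. '
    . 'But there can be even <b>bold</b> tag and many others.</p>'
);

$sequence = array();
foreach ($dom->documentElement->childNodes as $child) {
    if ($child->nodeType === XML_TEXT_NODE) {
        // Plain text between the inline elements.
        $sequence[] = '#text: ' . $child->nodeValue;
    } elseif ($child->nodeType === XML_ELEMENT_NODE) {
        // Inline element such as <strong> or <b>.
        $sequence[] = $child->nodeName . ': ' . $child->textContent;
    }
}

print_r($sequence);
```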
I have the following line of code in a HTML file (or something similar):
...
Link Content
...
I need to be able to extract the a/b/c/d part of the href and convert the link to something like:
Link Content
Ideally I'd like to be able to do this with regex, but most of the regex stuff I've seen for XSLT on StackOverflow seems to require XPath 2.
Ah yes... I'm using SimpleXML/DOMDocument on PHP 5.3 to apply the stylesheet, which I believe doesn't support XSLT 2.0.
I think I could do string replacement to lose the first part, but I'd like to have a pattern match to extract it.
Any thoughts?
As already pointed out in the answer given by michael.hor257k, you have to adjust the & character to have valid XML. Given an input containing for example
Link Content
the following template
<xsl:template match="a/@href[starts-with(.,'#SCRIPT_NAME#')]">
<xsl:attribute name="href">
<xsl:value-of select="concat('/lookup?id=', substring-after(.,'id='))"/>
</xsl:attribute>
</xsl:template>
changes the link to
Link Content
matching every href starting with #SCRIPT_NAME#.
Though it's not clear from the question which is the part that has to be matched / how to identify the links that have to be adjusted, possibly you can adjust this example to fit your requirements or provide further input to your question.
most of the regex stuff I've seen for XSLT on StackOverflow seems to
require XPath 2.
Not most: all. Unless your specific XSLT 1.0 processor offers regex as a (processor-specific) extension.
Now, the part missing from your question is how to recognize the part that you want to extract from the existing value. If, for example, it is always the substring that comes after (the first occurrence of) "id=", then you could use the substring-after() function to retrieve it.
Or at least in theory you could. In practice, nothing will work with the given example, because it contains an unescaped & character - a big no-no in XML.
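As a rough illustration of the substring-after() idea outside of a stylesheet, the same XPath 1.0 function can be evaluated from PHP with DOMXPath. This is only a sketch: the original link was not preserved, so the href below is a made-up example, and the ampersand is written as &amp; so the input is well-formed:

```php
<?php
// Hypothetical input: the href value is an assumption, with & escaped as &amp;.
$xml = '<div><a href="/index.php?page=lookup&amp;id=a/b/c/d">Link Content</a></div>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Select only links whose href contains "id=".
foreach ($xpath->query('//a[contains(@href, "id=")]') as $a) {
    // substring-after() is plain XPath 1.0, so no regex support is needed.
    $id = $xpath->evaluate('substring-after(@href, "id=")', $a);
    $a->setAttribute('href', '/lookup?id=' . $id);
}

$result = $dom->saveXML($dom->documentElement);
echo $result, "\n";
```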
This is just a shot in the dark, but if you are specifically looking to solve this with a regex, you may be able to use something like the following:
$xslt_string = 'Link Content';
preg_match('/href=".+?id=(.+?)"/', $xslt_string, $matches);
print_r($matches);
https://regex101.com/r/rY7oY7/1
When using PHP's BBCode extension, does anyone know what BBCODE_TYPE_ROOT means exactly? It doesn't seem necessary, at least with this example, however, it is used in most of the examples in the documentation.
The documentation is pretty vague about this element:
BBCODE_TYPE_ROOT (integer)
This BBCode tag is the special tag root (nesting level 0).
Thank you in advance.
Okay, I kept experimenting and looking at examples, and I figured it out.
The key example is on this page. Notice that the [i]Italic Text[/i] example does not get translated into HTML. This is because !i was specified under the root element. Basically, this BBCode interpreter understands the "tree" that BBCode creates. Using parents and children, you can create [ul] and [li] items respectively. Perhaps you'd like to add properties to the "highest level" element. The !i example prevents italic text from being used when no tags have been opened yet, i.e. directly under the root element.
So if you keep the tree structure of BBCode in mind, the BBCODE_TYPE_ROOT element is the root element. Kind of like the <html> element in HTML pages, except it's invisible in BBCode.
If I had the following HTML:
<li> Thisislink1</li>
<li> Thisisanotherlink</li>
<li> Onemorelink</li>
Where each link will be different in length and value.
How can I search for the values inside the link (IE: Thisislink1, Thisisanotherlink and Onemorelink) with a search phrase, say 'another'. So in this example, only 'Thisisanotherlink' would be returned, but if I changed the search phrase to 'link', then all 3 values will be returned.
Don't use regex. Use DOMDocument.
/\w*another\w*/
This needs to be done in two passes:
1. Extract the text from all links in the document. XSL or XPath should be workable for this purpose. As you extract text, keep a copy of the DOM around so you can attach information to it and to the text, telling you where the text was extracted from (you may or may not need this information later). As an alternative, just attach the contents of the href attribute to the text.
Be sure to extract all the text you need (e.g. title attributes, or the alt text of <a href><img alt></a> type constructs).
2. Search the extracted text for the phrase you are looking for.
3. (Optional) Use the information you set earlier to map back to the DOM, figure out which element you gathered the text from, and highlight it. If you extracted the href attribute, you could just make a new link using this and the matching text.
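The passes above can be sketched with DOMDocument roughly as follows. The anchors in the question's markup were not preserved, so the href values here are invented for illustration, and findLinksContaining is a hypothetical helper name:

```php
<?php
// Hypothetical markup: the href values are assumptions made for this example.
$html = '<ul>'
      . '<li><a href="/one">Thisislink1</a></li>'
      . '<li><a href="/two">Thisisanotherlink</a></li>'
      . '<li><a href="/three">Onemorelink</a></li>'
      . '</ul>';

// Returns an array of href => link text for every link whose text
// contains $phrase (case-insensitive).
function findLinksContaining($html, $phrase)
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $matches = array();
    foreach ($dom->getElementsByTagName('a') as $a) {
        // Pass 1: extract the text, keeping the href as a back-reference.
        $text = trim($a->textContent);
        // Pass 2: search the extracted text.
        if (stripos($text, $phrase) !== false) {
            $matches[$a->getAttribute('href')] = $text;
        }
    }

    return $matches;
}

print_r(findLinksContaining($html, 'another'));
print_r(findLinksContaining($html, 'link'));
```

Keying the result by href assumes each href is unique; a list of pairs would be safer if links can repeat.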
Let's say the HTML contains 15 table tags, before each table there is a div tag with some text inside. I need to get the text from the div tag that is directly before the 10th table tag in the HTML markup. How would I do that?
The only way I can think of is to use explode('<table', $html) to split the HTML into parts and then get the last div tag from the 9th value of the exploded array with regular expression. Is there a better way?
I'm reading through the PHP DOM documentation but I cannot see any method that would help me with this task there.
You load your HTML into a DOMDocument and query it with this XPath expression:
//table[10]/preceding-sibling::div[1]
This would work for the following layout:
<div>Some text.</div>
<table><!-- #1 --></table>
<!-- ...eight more... -->
<div>Some other text.</div> <!-- this would be selected -->
<table><!-- #10 --></table>
<!-- ...five more... -->
XPath is capable of doing really complex node lookups with ease. If the above expression does not yet work for you, probably very little is required to make it do what you want.
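A minimal runnable sketch of that query in PHP, assuming the div/table pairs are siblings as in the layout above. The generated markup and its text are made up for the example:

```php
<?php
// Build a hypothetical document with 15 <div>/<table> pairs as siblings.
$html = '<html><body>';
for ($i = 1; $i <= 15; $i++) {
    $html .= "<div>Text before table $i</div><table><tr><td>$i</td></tr></table>";
}
$html .= '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// preceding-sibling::div[1] is the nearest div before the 10th table.
$nodes = $xpath->query('//table[10]/preceding-sibling::div[1]');
$text = $nodes->length > 0 ? $nodes->item(0)->textContent : null;

echo $text, "\n";
```

Keep in mind that //table[10] selects a table that is the 10th table child of its parent, so if the tables are spread over different parents, (//table)[10] is the expression to try instead.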
HTML is structured data represented as a string, which is substantially different from being just a string. Don't give in to the temptation of doing stuff like this with string handling functions like explode(), or even regex.
If you don't feel like learning xpath, you can use the same old-school DOM walking techniques you would use with JavaScript in the browser.
document.getElementsByTagName('table')[9]
then crawl your way up the .previousSibling values until you find one that isn't a TextNode and is a div
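The snippet above is JavaScript; a rough PHP equivalent of the same walk might look like this. The sample markup is generated only for the example:

```php
<?php
// Hypothetical document: 15 <div>/<table> pairs separated by whitespace text nodes.
$html = '<html><body>';
for ($i = 1; $i <= 15; $i++) {
    $html .= "<div>Text before table $i</div>\n<table><tr><td>$i</td></tr></table>\n";
}
$html .= '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// item(9) is the 10th table, since the list is zero-based.
$table = $dom->getElementsByTagName('table')->item(9);

// Walk backwards over the siblings, skipping text nodes and non-div elements.
$node = $table->previousSibling;
while ($node !== null
    && !($node->nodeType === XML_ELEMENT_NODE && $node->nodeName === 'div')) {
    $node = $node->previousSibling;
}

$divText = $node !== null ? $node->textContent : null;
echo $divText, "\n";
```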
I've found that PHP's DOMDocument works pretty well with non-perfect HTML and then once you have the DOM I think you can even pass that into a SimpleXML object and work with it XML-style even though the original HTML/XHTML structure wasn't perfect.