I have this HTML:
<span id="bla">text</span>more text
I want to get text and more text.
I have this XPath:
//span[@id="bla"]/text()
I can't figure out how to get the closing tag and what comes after it.
The "more text" part is called the "tail" of the element and can be retrieved via the following-sibling axis:
//span[@id="bla"]/following-sibling::text()
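For anyone who wants to try the text/tail distinction outside of an XPath engine, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. It assumes the snippet is wrapped in a <div> (my addition) so that it parses as well-formed XML:

```python
import xml.etree.ElementTree as ET

# The wrapping <div> is an assumption; the bare snippet has no single root element.
root = ET.fromstring('<div><span id="bla">text</span>more text</div>')

span = root.find('.//span[@id="bla"]')
inner = span.text   # text inside the <span>
tail = span.tail    # text between </span> and the next tag
combined = inner + tail
print(combined)
```

Here span.text holds what is inside the element and span.tail holds what follows its closing tag.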
<span id="bla">text</span>more text alone is not well-formed and cannot be processed via XPath.
Let's put it in context:
<div><span id="bla">text</span>more text</div>
Then, you can simply take the string value of the parent element, div:
string(/div)
to get
textmore text
as requested.
If there's other surrounding content that you don't want:
<div>DO NOT WANT<span id="bla">text</span>more text<b/>DO NOT WANT</div>
You can follow @alecxe's lead with the following-sibling:: axis and use concat() to combine the parts you want:
concat(//span[@id="bla"], //span[@id="bla"]/following-sibling::text()[1])
to again get
textmore text
as requested.
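The same selection can be sketched in Python with the standard-library xml.etree.ElementTree, which is handy for checking the result quickly. This is only an approximation of the XPath above: span.tail matches following-sibling::text()[1] for this particular input, but it is not a general equivalent.

```python
import xml.etree.ElementTree as ET

doc = '<div>DO NOT WANT<span id="bla">text</span>more text<b/>DO NOT WANT</div>'
root = ET.fromstring(doc)

span = root.find('.//span[@id="bla"]')
# span.tail is only the text up to the next sibling tag (<b/>), so the
# trailing "DO NOT WANT" is left out.
wanted = (span.text or "") + (span.tail or "")
# Joining every text node is comparable to string(/div).
everything = "".join(root.itertext())
print(wanted)
print(everything)
```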
How can I put a div inside a paragraph? I changed its display to inline, but the browser splits the original paragraph into two elements, and the div ends up outside the paragraph.
From the W3 site:
The P element represents a paragraph. It cannot contain
block-level elements (including P itself).
No, you cannot nest anything other than an inline element in a <p> if you want the code to validate.
From the element definition:
<!ELEMENT P - O (%inline;)* -- paragraph -->
I know it's hard to understand those recs at times, but the bit in brackets means the <p> can only contain inline elements.
You can (and should) use a <span> inside a <p>, and if required you can change its CSS display property to block or inline-block. That is perfectly valid, because CSS properties do not change the actual definition of an element. In your case it sounds like you need an inline element, so just use a <span>.
Make a span and set its display to block.
<p>
paragraph text
<span style="display: block;">Span stuff</span>
</p>
You cannot nest a <div> element inside a <p> element according to the HTML standard. Consider why you even want to do this; it should never be necessary. A <p> element can only, and logically should only contain inline elements and text.
For those who think that validation is decisive, there are in fact two ways to get a div inside a p.
One way is to use DOM manipulation in script. For example:
var theDiv = document.createElement("div");
theDiv.appendChild(document.createTextNode("inserted"));
var theP = document.getElementsByTagName("p")[0];
theP.insertBefore(theDiv, theP.firstChild);
See it in action here: http://www.alohci.net/text/html/div-in-p-by-script.htm.ashx
The other way is to use XHTML and serve it with an XML content type. (No support in IE prior to IE9)
See that here : http://www.alohci.net/application/xhtml+xml/div-in-p-by-mime.htm.ashx
(Do note however, that while it is possible this way - it is still not valid.)
But You makes the vital point. Semantically, it's nonsense. You wouldn't put a block of something in the middle of a paragraph if you were writing text on to paper, so there should be no need to do it in HTML either.
Div is a block. Span is inline. Both of them are containers.
You would have to set display: inline on the div with CSS.
Of course, that still won't pass validation, so it is better to use a span (which is inline) to achieve the same thing.
I'm using Symfony DomCrawler to get all text in a document.
$this->crawler->filter('p')->each(function (Crawler $node, $i) {
// process text
});
I'm trying to gather all text within the <body> that is outside of elements.
<body>
This is an example
<p>
blablabla
</p>
another example
<p>
<span>Yo!</span>
again, another piece of text <br/>
with an annoy BR in the middle
</p>
</body>
I'm using PHP Symfony and can use XPath (preferred) or RegEx.
The string value of the entire document can be obtained with this simple XPath:
string(/)
All text nodes in the document would be:
//text()
The immediate text node children of body would be:
/body/text()
Note that the XPaths that select text nodes would typically be converted to concatenated string values, depending upon context.
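To see what "immediate text node children of body" means for the sample markup, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree. It assumes the input is well-formed XML (the sample is, since it uses <br/>); real-world HTML would need an HTML-aware parser instead.

```python
import xml.etree.ElementTree as ET

markup = """<body>
This is an example
<p>
blablabla
</p>
another example
<p>
<span>Yo!</span>
again, another piece of text <br/>
with an annoy BR in the middle
</p>
</body>"""

body = ET.fromstring(markup)

# Direct text children of <body>: the text before the first child element
# plus the tail that follows each child element.
pieces = [body.text] + [child.tail for child in body]
outside = [p.strip() for p in pieces if p and p.strip()]
print(outside)
```

Only the two strings that sit directly under <body> are kept; everything inside the <p> elements is skipped.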
Scenario:
I need to apply a php function to the plain text contained inside HTML tags, and show the result, maintaining the original tags (with their original attributes).
Visualize:
Take this:
<p>Some text here pointing to the moon and that's it</p>
Return this:
<p>
phpFunction('Some text here pointing to the ')
phpFunction('moon')
phpFunction(' and that\'s it')
</p>
What I should do:
Use a PHP html parser (instead of using regexp) and iterate over every tag, applying the callback to the node text content.
Problem:
If I have, for example, an <a> tag inside a <p> tag, the text content of the parent <p> tag would consist of two different plain text parts, which the PHP callback should treat as separate.
Question:
How should I approach this in a clean and smooth way?
Thanks for your time, all the best.
In the end, I decided to use regex instead of including an external library.
For the sake of simplicity:
$expectedOutput = preg_replace_callback(
    '/>(.*)</U',
    function ($matches) {
        // $matches[1] holds the text captured between '>' and '<'
        return '>' . doStuff($matches[1]) . '<';
    },
    $fromInput
);
This will look for everything between > and <, which is, indeed, what I was looking for.
Of course any suggestion/comment is still welcome.
Peace.
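For comparison, here is a rough sketch of the parser-based approach described in the question, written in Python with only the standard-library html.parser. The names TextTransformer and transform_text are made up for this example, the <a href="#"> around "moon" is an assumption about the original markup, and str.upper stands in for the real callback.

```python
from html.parser import HTMLParser


class TextTransformer(HTMLParser):
    """Re-emit tags unchanged while passing each text run through a callback."""

    def __init__(self, callback):
        super().__init__(convert_charrefs=False)
        self.callback = callback
        self.out = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag as written, attributes included.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Each text run between tags arrives here separately.
        self.out.append(self.callback(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def transform_text(markup, callback):
    parser = TextTransformer(callback)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


source = '<p>Some text here pointing to the <a href="#">moon</a> and that\'s it</p>'
result = transform_text(source, str.upper)
print(result)
```

The callback is invoked once per text run, so the three parts around the <a> tag are handled separately. Note that this sketch drops comments and doctype declarations, so it is not a drop-in replacement for a full parser.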
Preface: I cannot rename the source tags or edit their IDs. Any changes to the tags must happen after they have been fetched.
What I'm doing: using file_get_contents in PHP, I am requesting data from a remote site. This data is just two <p> tags. I need to hide or rename the second of the two <p> tags.
Is this possible with PHP or jQuery?
What I'm working with:
<p>Hello my name is test</p><p>I like studying geology.</p>
If you need to hide the second paragraph, you can do this with jQuery:
$('p:eq(1)').hide();
Jsfiddle
You could try a php string replace
$new_string = str_replace('</p><p>','',file_get_contents('somecontent'));
If you need to do it before rendering the HTML, you need to parse the contents, remove or replace the second p tag, and build the new content.
Here is a DOM parser: Simple HTML DOM Parser
Find similar questions below
How to match second <a> tag in this string
How to add attribute to first P tag using PHP regular expression?
Or you can do it after rendering HTML as rNix suggested.
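The parse-then-modify idea can be sketched as follows. This uses Python's standard-library xml.etree.ElementTree purely as an illustration (the thread itself is about PHP), assumes the fetched markup is well-formed, and wraps it in a hypothetical <root> element so both paragraphs sit in one tree.

```python
import xml.etree.ElementTree as ET

fetched = "<p>Hello my name is test</p><p>I like studying geology.</p>"

# The <root> wrapper is an assumption so the two sibling <p> tags parse together.
root = ET.fromstring(f"<root>{fetched}</root>")
paragraphs = root.findall("p")
if len(paragraphs) > 1:
    # Hide the second paragraph; root.remove(paragraphs[1]) would drop it instead.
    paragraphs[1].set("style", "display:none")

# Serialize the children only, leaving the wrapper out of the result.
cleaned = "".join(ET.tostring(p, encoding="unicode") for p in root)
print(cleaned)
```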
If I had the following HTML:
<li> Thisislink1</li>
<li> Thisisanotherlink</li>
<li> Onemorelink</li>
Where each link will be different in length and value.
How can I search for the values inside the links (i.e. Thisislink1, Thisisanotherlink and Onemorelink) with a search phrase, say 'another'? In this example, only 'Thisisanotherlink' would be returned, but if I changed the search phrase to 'link', then all 3 values would be returned.
Don't use regex. Use DOMDocument.
/\w*another\w*/
This needs to be done in two passes:
Extract the text from all links in the document. XSL or XPath should be workable for this purpose. As you extract text, keep a copy of the DOM around so you can attach information to it and to the text, telling you where the text was extracted from (you may or may not need this later). As an alternative, just attach the contents of the href attribute to the text.
Be sure to extract all the text you need (e.g. title attributes, or the alt text of <a href><img alt></a> type constructs).
Search the extracted text for the phrase you are looking for.
(Optional) use the information you set earlier to map back to the DOM to figure out what element you gathered the text from, and highlight it. If you extracted the href attribute, you could just make a new link using this and the matching text.
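The two passes above can be sketched roughly like this, using Python's standard-library html.parser. The class and function names are made up for the example, and the <a href="..."> tags and their href values are assumptions, since the question's snippet does not show them.

```python
from html.parser import HTMLParser


class LinkTextCollector(HTMLParser):
    """Pass 1: collect (href, text) for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._parts = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._parts = []

    def handle_data(self, data):
        if self._in_link:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append((self._href, "".join(self._parts).strip()))
            self._in_link = False


def search_links(markup, phrase):
    collector = LinkTextCollector()
    collector.feed(markup)
    collector.close()
    # Pass 2: plain substring search over the extracted link text.
    return [(href, text) for href, text in collector.links if phrase in text]


# The href values are invented for illustration.
markup = """
<li><a href="/one">Thisislink1</a></li>
<li><a href="/two">Thisisanotherlink</a></li>
<li><a href="/three">Onemorelink</a></li>
"""
print(search_links(markup, "another"))
print(search_links(markup, "link"))
```

Keeping the href next to each piece of text covers the optional last step: a match can be turned back into a link without going through the DOM again.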