Prepend HTML text using DOMDocument without parent container - php

Let's say I have <p>Text</p>
I'd like to create a function using DOMDocument that can insert text, e.g.:
insertText('<p>Text</p>', '<strong>1.</strong> ')
so that the result would be <p><strong>1.</strong> Text</p>
I'm already accessing this paragraph tag, so I think I'm almost there; I just can't figure out how to prepend plain text that will be parsed as HTML.
$dom = new DOMDocument();
$question_paragraphs = array();
$dom->loadHTML($str);
$par = $dom->getElementsByTagName('p');
if ($par->length == 1) {
    $par->item(0)->setAttribute("class", "first last");
    ###
    ### How do I do this here?
    ###
}
Is it possible to inject text this way?

You can use the DOMNode::insertBefore method as follows:
Create your strong element.
Insert this node before the existing text node.
$strong = $dom->createElement('strong', '1.');
$par->item(0)->insertBefore($strong, $par->item(0)->firstChild);
Note that the second parameter of insertBefore is the existing child node before which the new node is inserted. In this case you can use firstChild, because your <p> only contains the text node Text.
This will finally output:
<p class="first last"><strong>1.</strong>Text</p>
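The question actually asks for a function that takes the prefix as an HTML string. One way to sketch that, assuming the prefix is well-formed XML, is to parse it into a DOMDocumentFragment with appendXML() and insert the whole fragment before the paragraph's first child. The name insertText() comes from the question; the body below is only an illustration, not an established helper.

```php
<?php
// Illustrative implementation of insertText() from the question.
// Assumes $prefix is well-formed XML (appendXML() uses the XML parser).
function insertText(string $html, string $prefix): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $p = $dom->getElementsByTagName('p')->item(0);

    // Parse the prefix markup into real nodes held by a fragment.
    $fragment = $dom->createDocumentFragment();
    $fragment->appendXML($prefix);

    // Insert every node of the fragment before the current first child.
    $p->insertBefore($fragment, $p->firstChild);

    // Serialize only the <p>, not the implied <html><body> wrapper.
    return $dom->saveHTML($p);
}

echo insertText('<p>Text</p>', '<strong>1.</strong> ');
```

Because appendXML() is an XML parser, HTML-only syntax in the prefix (an unclosed <br>, or entities such as &nbsp;) will not parse; for those you would build the nodes by hand as in the answer above.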

Related

How to add text in between html tags

I want to add a string between HTML tags in PHP. I'm using PHP's DOMDocument class, but it seems to let me add strings only inside elements. Here is an example of what I am trying to accomplish:
<tag>example</tag><another>sometext</another>
I want to add a string between these two tags, so it should look like
<tag>example</tag>STRING<another>sometext</another>
I want to be able to separate these tags so that I can use explode() to split every tag in the HTML page into an array and then traverse them later.
You can add a text node that is not wrapped in any tag:
$doc = new DOMDocument();
$tag = $doc->createElement('tag');
$doc->appendChild($tag);
$tag->appendChild($doc->createTextNode('example'));
$node = $doc->createTextNode('STRING');
$doc->appendChild($node);
$another = $doc->createElement('another');
$doc->appendChild($another);
$another->appendChild($doc->createTextNode('sometext'));
echo $doc->saveHTML();
will give
<tag>example</tag>STRING<another>sometext</another>
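If the two elements already exist, for example because they were loaded from markup, the same idea works with insertBefore(): create a text node and insert it in front of the second element inside their common parent. The sketch below assumes the fragment is well-formed, loads it with loadXML() under an arbitrary <root> wrapper (loadHTML() would warn about the unknown tag names), and then serializes only the wrapper's children.

```php
<?php
// <root> is an arbitrary wrapper so the fragment is a valid XML document.
$doc = new DOMDocument();
$doc->loadXML('<root><tag>example</tag><another>sometext</another></root>');

// Insert a bare text node in front of <another>, inside the same parent.
$another = $doc->getElementsByTagName('another')->item(0);
$another->parentNode->insertBefore($doc->createTextNode('STRING'), $another);

// Serialize the wrapper's children one by one, leaving out <root> itself.
$out = '';
foreach ($doc->documentElement->childNodes as $child) {
    $out .= $doc->saveXML($child);
}
echo $out;
```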
Alternatively, if you are building the markup as a plain string, you can simply concatenate the pieces in PHP.
Example:
<?php echo '<tag>example</tag>' . 'STRING' . '<another>sometext</another>'; ?>

Get href value with DOMDocument in PHP

After a file_get_contents() call, I receive this HTML:
<h1>
<a href="blablabla.html">Manhattan Skyline</a>
</h1>
I want to get the blablabla.html part only.
How can I parse it with PHP's DOMDocument?
Important: the HTML I receive contains more than one <a href="...">.
What I tried is:
$page = file_get_contents('https://...');
$dom = new DOMDocument();
$dom->loadHTML($page);
$xp = new DOMXpath($dom);
$url = $xp->query('h1//a[@href=""]');
$url = $url->item(0)->getAttribute('href');
Thanks for your help.
h1//a[@href=""] looks for an a element whose href attribute is the empty string, whereas your href attribute has a non-empty value.
If that's the entire document, you could use the expression //a.
Otherwise, h1//a should work as well.
If you require the a element to have an href attribute with any value, you could use h1//a[@href].
If the h1 is not the root element of the document (and after loadHTML() it never is, because the parser wraps the markup in <html><body>), use //h1 instead. The last example would then become //h1//a[@href].
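Putting the last expression together with the question's code gives something like the following sketch. The sample markup is modelled on the question, and the second link (other.html) is made up to show that only the link inside the <h1> is returned.

```php
<?php
// Sample page: one link inside <h1>, one elsewhere (made up).
$page = '<h1><a href="blablabla.html">Manhattan Skyline</a></h1>'
      . '<p><a href="other.html">Other link</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($page);
$xp = new DOMXPath($dom);

// //h1 finds the heading at any depth; [@href] requires the attribute to exist.
$links = $xp->query('//h1//a[@href]');
$href = $links->length > 0 ? $links->item(0)->getAttribute('href') : null;
echo $href;
```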

Changing a tag <a> to <div> with DOMDocument on WordPress

I'm a beginner in PHP and I would like to set up several functions to replace specific code bits on WordPress (including plugin elements that I can't edit directly).
Below is an example (first line: initial result, second line: desired result):
<span class="fn" itemprop="name">Gael Beyries</span>
<div class="vcard author"><span class="fn" itemprop="name">Gael Beyries</span></div>
PS: I came across this topic: Parsing WordPress post content, but the example is too complicated for what I want to do. Could you show me some example code that solves this problem, so I can adapt it to modify other HTML elements?
Although I'm not sure how this fits into WP, I have basically taken the code from the linked answer and adapted it to your requirements.
I've assumed you want to find the <a> tags with class="vcard author" and this is the basis of the XPath expression. The code in the foreach() loop just copies the data into a new node and replaces the old one...
function replaceAWithDiv($content){
    $dom = new DOMDocument();
    $dom->loadHTML($content);
    $xpath = new DOMXPath($dom);
    // Select <a> elements whose class is exactly "vcard author"
    $aTags = $xpath->query('//a[@class="vcard author"]');
    foreach($aTags as $a){
        // Create replacement element
        $div = $dom->createElement("div");
        $div->setAttribute("class", "vcard author");
        // Move the contents from the a tag to the div.
        // childNodes is a live list and appendChild() moves each node,
        // so a foreach over it would skip every other child.
        while ($a->firstChild) {
            $div->appendChild($a->firstChild);
        }
        // Replace a tag with div
        $a->parentNode->replaceChild($div, $a);
    }
    return $dom->saveHTML();
}
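To reuse the same approach for other elements, the replacement step can be pulled out into a small generic helper. renameElement() below is a hypothetical name, not a DOMDocument method. It copies every attribute instead of hard-coding the class, so you may need to drop attributes that don't fit the new tag (here, href on a <div>). The href value in the sample is made up.

```php
<?php
// Hypothetical helper: swap $old for a new element named $newName,
// carrying over all of its attributes and child nodes.
function renameElement(DOMElement $old, string $newName): DOMElement
{
    $new = $old->ownerDocument->createElement($newName);

    // Copy every attribute (class, href, itemprop, ...).
    foreach ($old->attributes as $attr) {
        $new->setAttribute($attr->nodeName, $attr->nodeValue);
    }

    // appendChild() moves a node, so keep taking the first child
    // until the old element is empty.
    while ($old->firstChild) {
        $new->appendChild($old->firstChild);
    }

    $old->parentNode->replaceChild($new, $old);
    return $new;
}

// Usage on markup modelled on the question (the href is made up).
$dom = new DOMDocument();
$dom->loadHTML('<a class="vcard author" href="/author/gael"><span class="fn" itemprop="name">Gael Beyries</span></a>');
$xpath = new DOMXPath($dom);

$result = '';
foreach ($xpath->query('//a[@class="vcard author"]') as $a) {
    $div = renameElement($a, 'div');
    // A <div> has no use for href, so drop it.
    $div->removeAttribute('href');
    $result .= $dom->saveHTML($div);
}
echo $result;
```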

Extract text from HTML <p> with a particular title

I have a huge file with lots of entries, and they all have one thing in common: the first line. I want to extract all of the text from each paragraph whose first line is:
Type of document: Contract Notice
The HTML code I am working on is here:
<!-- other HTML -->
<p>
<b>Type of document:</b>
" Contract Notice" <br>
<b>Country</b> <br>
... rest of text ...
</p>
<!-- other HTML -->
I have put the HTML into a DOM like this:
$dom = new DOMDocument;
$dom->loadHTML($content);
I need to return all of the text in the paragraph node whose first line is 'Type of document: Contract Notice'. I am sure there is a simple way of doing this using DOM methods or XPath; please advise!
Speaking of XPath, try the following expression, which selects <p> elements:
whose <b> child element (first one) has the value Type of document:
whose next sibling text node (first one) contains the text Contract Notice
//p[
b[1][.="Type of document:"]
/following-sibling::text()[1][contains(., "Contract Notice")]
]
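A minimal sketch of running that expression with DOMXPath and collecting the text of each matching paragraph via textContent. The two sample entries are invented to show that only the Contract Notice paragraph is selected.

```php
<?php
// Two sample entries (invented); only the first is a Contract Notice.
$content = '<p><b>Type of document:</b> Contract Notice<br><b>Country</b><br>Somewhere</p>'
         . '<p><b>Type of document:</b> Prior Information<br><b>Country</b><br>Elsewhere</p>';

$dom = new DOMDocument;
$dom->loadHTML($content);
$xpath = new DOMXPath($dom);

$query = '//p[b[1][.="Type of document:"]'
       . '/following-sibling::text()[1][contains(., "Contract Notice")]]';

$texts = [];
foreach ($xpath->query($query) as $p) {
    // textContent joins all descendant text, including the <b> labels.
    $texts[] = trim($p->textContent);
}
print_r($texts);
```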
With this XPath expression, you select all text nodes inside the p element, at any depth:
//b[text()="Type of document:"]/parent::p//text()
I don't like using DOMDocument parsing unless I need to heavily parse a document, but if you want to, it could be something like this:
// Using DOMDocument
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXpath($doc);
$matchedDoms = $xpath->query('//b[text()="Type of document:"]/parent::p//text()');
$data = '';
foreach($matchedDoms as $domMatch) {
    $data .= $domMatch->data . ' ';
}
var_dump($data);
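I would prefer a single regex to do it all; after all, it's just one piece of the document you are looking for. Use lazy quantifiers so the match cannot run across neighboring paragraphs in a large file (and preg_match_all() if you need every matching entry):
// Using a regular expression
preg_match('/<p>\s*<b>Type of document:<\/b>[^<]*Contract Notice(?<data>.*?)<\/p>/si', $content, $matches);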
var_dump($matches['data']); //If you want everything in there
var_dump(strip_tags($matches['data'])); //If you just want the text

How to remove consecutive links from a webpage?

I wish to remove consecutive links from a webpage.
Here is a sample:
<div style="font-family: Arial;">
<br>
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Google is a search
engine
In the HTML above, I want to remove the first two <a> tags but not the third one (my script should only remove consecutive links).
Don't use a regex for this. Regular expressions are extremely powerful, but they are not suited to finding this kind of "consecutive" tag.
I suggest you use DOM instead; then you can browse the HTML as a tree.
Here is an example (not tested):
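$doc = new DOMDocument();
// avoid blank nodes when parsing
$doc->preserveWhiteSpace = false;
// reads HTML in a string, loadHTMLFile() also exists
$doc->loadHTML($html);
// copy the live node list into an array, so removing links
// does not shift the remaining indexes
$links = iterator_to_array($doc->getElementsByTagName('a'));
$toRemove = array();
foreach ($links as $link) {
    // find the next sibling, skipping whitespace-only text nodes
    $next = $link->nextSibling;
    while ($next && $next->nodeType == XML_TEXT_NODE && trim($next->nodeValue) === '') {
        $next = $next->nextSibling;
    }
    // if it is another link, both links are consecutive
    if ($next && $next->nodeType == XML_ELEMENT_NODE && $next->nodeName == 'a') {
        $toRemove[] = $link;
        $toRemove[] = $next;
    }
}
// remove the marked links; a link can be marked twice,
// so skip it once it has already been detached
foreach ($toRemove as $link) {
    if ($link->parentNode) {
        $link->parentNode->removeChild($link);
    }
}
// print the modified HTML
// See DOMDocument's attributes if you want to format the output
echo $doc->saveHTML();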
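The same "nearest non-blank sibling is a link" test can also be written as a single XPath query, which avoids walking siblings by hand. This is a sketch under the assumption that links only count as consecutive when nothing but whitespace separates them; removeConsecutiveLinks() is a hypothetical helper name and the hrefs in the sample are made up.

```php
<?php
// Hypothetical helper: remove every link whose nearest non-whitespace
// sibling (before or after it) is also a link.
function removeConsecutiveLinks(string $html): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // Nearest sibling on each side that is not a whitespace-only text node.
    $nextSibling = 'following-sibling::node()[not(self::text()[normalize-space()=""])][1]';
    $prevSibling = 'preceding-sibling::node()[not(self::text()[normalize-space()=""])][1]';
    $query = '//a[' . $nextSibling . '/self::a or ' . $prevSibling . '/self::a]';

    // XPath results are a snapshot, so removing while looping is safe.
    foreach ($xpath->query($query) as $link) {
        $link->parentNode->removeChild($link);
    }
    return $doc->saveHTML();
}

// Sample modelled on the question; the hrefs are made up.
$html = <<<HTML
<div style="font-family: Arial;">
<br>
<a href="/a">AAAAAAAAAA</a>
<a href="/b">BBBBBBBBBB</a>
<br>
<a href="/g">Google</a> is a search
engine
</div>
HTML;

$result = removeConsecutiveLinks($html);
echo $result;
```

Unlike getElementsByTagName(), the list returned by DOMXPath::query() is not live, which is why the removal loop needs no index bookkeeping.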
