Insert XML element on the same line of deleted element - php

I have a php document that deletes an XML element (with child elements), based on the value of the attribute "id", and then creates a new element with the same child elements, but with different text added from a form input:
<?php
function ctr($myXML, $id) {
$xmlDoc = new DOMDocument();
$xmlDoc->load($myXML);
$xpath = new DOMXpath($xmlDoc);
$nodeList = $xpath->query('//noteboard[@id="'.$id.'"]');
if ($nodeList->length) {
$node = $nodeList->item(0);
$node->parentNode->removeChild($node);
}
$xmlDoc->save($myXML);
}
$xml = 'xml.xml'; // file
$to = $_POST['eAttr'];// the attribute value for "id"
ctr($xml,$to);
$target = "3";
$newline = "
<noteboard id='".$_POST['eId']."'>
<noteTitle>".$_POST['eTitle']."</noteTitle>
<noteMessage>".$_POST['eMessage']."</noteMessage>
<logo>".$_POST['eType']."</logo>
</noteboard>"; // HERE
$stats = file($xml, FILE_IGNORE_NEW_LINES);
$offset = array_search($target,$stats) +1;
array_splice($stats, $offset, 0, $newline);
file_put_contents($xml, join("\n", $stats));
?>
XML.xml
<note>
<noteboard id="title">
<noteTitle>Title Text Here</noteTitle>
<noteMessage>Text Here</noteMessage>
<logo>logo.jpg</logo>
</noteboard>
</note>
This works fine, but I would like the new XML content to go where the old (deleted) element used to be, instead of $target placing it at line 3. It is supposed to look as if the element is being 'edited', which it does not if the new element ends up on the wrong line.

Line positions in an XML document are not really relevant; they are just formatting that makes the document easier for a human to read. Think of the document as a tree of nodes. Not only elements are nodes but any other content too, such as attributes and text.
With that in mind you can think about your problem as replacing an element node.
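To see why line numbers are not a reliable handle, here is a minimal sketch (the sample XML string is made up for illustration) that lists the child nodes of the root element. The line breaks show up as whitespace-only text nodes between the elements:

```php
<?php
// Illustrative document: every "\n" between the tags becomes a text node.
$document = new DOMDocument();
$document->loadXML("<note>\n<noteboard id=\"a\"/>\n<noteboard id=\"b\"/>\n</note>");

$types = [];
foreach ($document->documentElement->childNodes as $child) {
    // Classify each child as an element node or a text node.
    $types[] = $child instanceof DOMElement ? 'element' : 'text';
}

echo implode(', ', $types), "\n";
```

Replacing one element node with another leaves those surrounding text nodes untouched, so the new element automatically ends up in the position of the old one.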
First create the new noteboard element. This can be encapsulated in a function:
function createNote(DOMDocument $document, $id, array $data) {
// Use the same element name as the existing nodes, so the replacement matches the document
$noteboard = $document->createElement('noteboard');
$noteboard->setAttribute('id', $id);
$noteboard
->appendChild($document->createElement('noteTitle'))
->appendChild($document->createTextNode($data['title']));
$noteboard
->appendChild($document->createElement('noteMessage'))
->appendChild($document->createTextNode($data['text']));
$noteboard
->appendChild($document->createElement('logo'))
->appendChild($document->createTextNode($data['logo']));
return $noteboard;
}
Call the function to create the new noteboard element node. I am using literal values here; you will have to replace them with the variables from your form.
$newNoteCard = createNote(
$document,
42,
[
'title' => 'New Title',
'text' => 'New Text',
'logo' => 'newlogo.svg',
]
);
Now that you have the new element, you can find the existing one and replace it:
foreach($xpath->evaluate('//noteboard[@id=3][1]') as $noteboard) {
$noteboard->parentNode->replaceChild($newNoteCard, $noteboard);
}
Complete example:
$document = new DOMDocument();
$document->formatOutput = true;
$document->preserveWhiteSpace = false;
$document->loadXml($xml);
$xpath = new DOMXpath($document);
function createNote(DOMDocument $document, $id, array $data) {
$noteboard = $document->createElement('noteboard');
$noteboard->setAttribute('id', $id);
$noteboard
->appendChild($document->createElement('noteTitle'))
->appendChild($document->createTextNode($data['title']));
$noteboard
->appendChild($document->createElement('noteMessage'))
->appendChild($document->createTextNode($data['text']));
$noteboard
->appendChild($document->createElement('logo'))
->appendChild($document->createTextNode($data['logo']));
return $noteboard;
}
$newNoteCard = createNote(
$document,
42,
[
'title' => 'New Title',
'text' => 'New Text',
'logo' => 'newlogo.svg',
]
);
foreach($xpath->evaluate('//noteboard[@id=3][1]') as $noteboard) {
$noteboard->parentNode->replaceChild($newNoteCard, $noteboard);
}
echo $document->saveXml();
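Since the id in the question comes from form input, here is a hedged variation that avoids building the XPath expression from user data by comparing the attribute in PHP instead. The helper name replaceNoteboard() and the sample XML are assumptions made for illustration, not part of the original code:

```php
<?php
// Hypothetical helper: replaces the first <noteboard> whose id equals $oldId.
// Returns true if an element was replaced, false if no match was found.
function replaceNoteboard(DOMDocument $document, string $oldId, DOMElement $replacement): bool
{
    foreach ($document->getElementsByTagName('noteboard') as $noteboard) {
        if ($noteboard->getAttribute('id') === $oldId) {
            $noteboard->parentNode->replaceChild($replacement, $noteboard);
            return true; // stop right away, the node list is live
        }
    }
    return false;
}

// Sample data standing in for xml.xml.
$document = new DOMDocument();
$document->loadXML(
    '<note><noteboard id="title"><noteTitle>Old</noteTitle></noteboard><noteboard id="other"/></note>'
);

// Build the replacement; text nodes escape special characters automatically.
$replacement = $document->createElement('noteboard');
$replacement->setAttribute('id', 'title');
$replacement
    ->appendChild($document->createElement('noteTitle'))
    ->appendChild($document->createTextNode('New & improved'));

$replaced = replaceNoteboard($document, 'title', $replacement);
$result = $document->saveXML();
echo $result;
```

To work on the file from the question, the same sketch would use $document->load('xml.xml') instead of loadXML() and $document->save('xml.xml') instead of saveXML().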

Related

turn HTML into a PHP array

I have a string in a $html variable that also contains HTML:
'Here is some text which I do not need to extract but then there are
<figure class="class-one">
<img src="/example.jpg" alt="example alt" class="some-image-class">
<figcaption>example caption</figcaption>
</figure>
And another one (and many more)
<figure class="class-one some-other-class">
<img src="/example2.jpg" alt="example2 alt">
</figure>'
I want to extract all <figure> elements and everything they contain including their attributes and other html-elements and put this in an array in PHP so I would get something like:
$figures = [
0 => [
"class" => "class-one",
"img" => [
"src" => "/example.jpg",
"alt" => "example alt",
"class" => "some-image-class"
],
"figcaption" => "example caption"
],
1 => [
"class" => "class-one some-other-class",
"img" => [
"src" => "/example2.jpg",
"alt" => "example2 alt",
"class" => null
],
"figcaption" => null
]];
So far I have tried:
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$figures = array();
foreach ($figures as $figure) {
$figures['class'] = $figure->getAttribute('class');
// here I tried to create the whole array but I can't seem to get the values from the HTML
// also I'm not sure how to get all html-elements within <figure>
}
Here is the code that should get you where you want to be. I have added comments where I felt they would be helpful:
<?php
$htmlString = 'Here is some text which I do not need to extract but then there are <figure class="class-one"><img src="/example.jpg" alt="example alt" class="some-image-class"><figcaption>example caption</figcaption></figure>And another one (and many more)<figure class="class-one some-other-class"><img src="/example2.jpg" alt="example2 alt"></figure>';
//Create a new DOM document
$dom = new DOMDocument;
//Parse the HTML.
@$dom->loadHTML($htmlString);
//Create a new DOMXPath object to query the document
$xp = new DOMXpath($dom);
//Create empty figures array that will hold all of our parsed HTML data
$figures = array();
//Get all <figure> elements
$figureElements = $xp->query('//figure');
//Create number variable to keep track of our $figures array index
$figureCount = 0;
//Loop through each <figure> element
foreach ($figureElements as $figureElement) {
//Use relative paths (starting with ".") so that only the current <figure> is searched
$img = $xp->query('.//img', $figureElement)->item(0);
$caption = $xp->query('.//figcaption', $figureElement)->item(0);
$figures[$figureCount]["class"] = trim($figureElement->getAttribute('class'));
//item(0) returns null if the <figure> has no <img>, so check before reading attributes
$figures[$figureCount]["img"]["src"] = $img ? $img->getAttribute('src') : null;
$figures[$figureCount]["img"]["alt"] = $img ? $img->getAttribute('alt') : null;
//getAttribute() returns an empty string for a missing attribute, so use hasAttribute() to get null instead
$figures[$figureCount]["img"]["class"] = ($img && $img->hasAttribute('class')) ? $img->getAttribute('class') : null;
//Check that a <figcaption> element exists, otherwise set the value to null
$figures[$figureCount]["figcaption"] = $caption ? $caption->nodeValue : null;
//Increment our $figureCount so that the next <figure> gets a new array index.
$figureCount++;
}
print_r($figures);
?>
$doc = new \DOMDocument();
$doc->loadHTML($html);
$figure = $doc->getElementsByTagName("figure"); // DOMNodeList Object
//Create an array to collect the values of all DOMElement objects
$figures = array();
$i= 0;
foreach($figure as $item) { // DOMElement Object
$figures[$i]['class']= $item->getAttribute('class');
//DOMElement::getElementsByTagName— Returns html tag
$img = $item->getElementsByTagName('img')[0];
if($img){
//DOMElement::getAttribute — Returns value of attribute
$figures[$i]['img']['src'] = $img->getAttribute('src');
$figures[$i]['img']['alt'] = $img->getAttribute('alt');
$figures[$i]['img']['class'] = $img->getAttribute('class');
}
//textContent - use to get the text of tag
if($item->getElementsByTagName('figcaption')[0]){
$figures[$i]['figcaption'] = $item->getElementsByTagName('figcaption')[0]->textContent;
}
$i++;
}
echo "<pre>";
print_r($figures);
echo "</pre>";

Array filter in PHP

I am using Simple HTML DOM to parse an HTML file.
I have a dynamic array called links2, it can be empty or maybe have 4 elements inside or more depending on the case
<?php
include('simple_html_dom.php');
$url = 'http://www.example.com/';
$html = file_get_html($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
//////////////////////////////////////////////////////////////////////////////
foreach ($doc->getElementsByTagName('p') as $link)
{
$intro2 = $link->nodeValue;
$links2[] = array(
'value' => $link->textContent,
);
$su=count($links2);
}
$word = 'document.write(';
Assume that two elements of the array $links2 contain $word. When I try to filter $links2 by removing the elements that match
unset( $links2[array_search($word, $links2 )] );
print_r($links2);
the filter removes only one element and array_diff doesn't solve the problem. Any suggestion?
Solved by adding a condition that skips the matching elements while the array is being built:
foreach ($doc->getElementsByTagName('p') as $link)
{
$dont = $link->textContent;
if (strpos($dont, 'document') === false) {
$links2[] = array(
'value' => $link->textContent,
);
}
$su=count($links2);
}
echo $su;
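The original attempt most likely fails because array_search() looks for an element that is equal to $word and returns only the first matching key. Here every element is an array, so nothing matches, array_search() returns false, and unset($links2[false]) removes index 0. If the array has already been built, a filter callback can remove every matching entry afterwards. A minimal sketch, with made-up sample data standing in for the scraped paragraphs:

```php
<?php
// Sample data (illustrative only), shaped like $links2 in the question.
$links2 = [
    ['value' => 'First paragraph'],
    ['value' => 'document.write("ad one");'],
    ['value' => 'Second paragraph'],
    ['value' => 'x = 1; document.write("ad two");'],
];
$word = 'document.write(';

// Keep only the entries whose text does not contain $word.
$filtered = array_filter($links2, function (array $link) use ($word) {
    return strpos($link['value'], $word) === false;
});

// array_filter() preserves the original keys; array_values() reindexes from 0.
$filtered = array_values($filtered);

print_r($filtered);
```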

Xpath Query Won't Return Results

I'm trying to return some results from an Xpath query but it won't select the elements correctly. I'm using the following code:
public function getTrustPilotReviews($amount)
{
$trustPilotUrl = 'https://www.trustpilot.co.uk/review/purplegriffon.com';
$html5 = new HTML5;
$document = $html5->loadHtml(file_get_contents($trustPilotUrl));
$document->validateOnParse = true;
$xpath = new DOMXpath($document);
$reviewsDomNodeList = $xpath->query('//div[@id="reviews-container"]//div[@itemprop="review"]');
$reviews = new Collection;
foreach ($reviewsDomNodeList as $key => $reviewDomElement)
{
$xpath = new DOMXpath($reviewDomElement->ownerDocument);
if ((int) $xpath->query('//*[@itemprop="ratingValue"]')->item($key)->getAttribute('content') >= 4)
{
$review = [
'title' => 'Test',
'author' => $xpath->query('//*[@itemprop="author"]')->item($key)->nodeValue,
'date' => $xpath->query('//*[@class="ndate"]')->item($key)->nodeValue,
'rating' => $xpath->query('//*[@itemprop="ratingValue"]')->item($key)->nodeValue,
'body' => $xpath->query('//*[@itemprop="reviewBody"]')->item($key)->nodeValue,
];
$reviews->add((object) $review);
}
}
return $reviews->take($amount);
}
This code won't return anything:
//div[@id="reviews-container"]//div[@itemprop="review"]
But if I change it to:
//*[@id="reviews-container"]//*[@itemprop="review"]
It partially works but does not return the correct results.
It looks like you're using the HTML5-PHP library. If so, you need to use namespaces. The library loads HTML5 into an XHTML document. You can verify that by saving the DOM document as XML. The output will be something like:
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
...
</html>
So if you use XPath, you need to register a prefix for the XHTML namespace and use it for element names.
...
$xpath = new DOMXPath($document);
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');
$reviewNodes= $xpath->evaluate(
'//x:div[@id="reviews-container"]//x:div[@itemprop="review"]'
);
foreach ($reviewNodes as $reviewNode) {
...
}
...
You have a condition inside the loop that can be part of the outer XPath expression used to fetch the reviews:
$expression =
'//x:div[@id="reviews-container"]
//x:div[
@itemprop="review" and
(.//*[@itemprop = "ratingValue"]/@content >= 4)
]';
Do not use DOMXPath::query() but DOMXPath::evaluate(); it allows you to get scalar values directly. The second argument of both methods is the context node. Use relative location paths (without a / at the start of the expression) so that they are resolved against that node.
...
foreach ($reviewNodes as $reviewNode) {
$review = [
'title' => 'Test',
'author'=> $xpath->evaluate('string(.//*[@itemprop="author"])', $reviewNode),
'date'=> $xpath->evaluate('string(.//*[@class="ndate"])', $reviewNode),
'rating'=> $xpath->evaluate('string(.//*[@itemprop="ratingValue"])', $reviewNode),
'body'=> $xpath->evaluate('string(.//*[@itemprop="reviewBody"])', $reviewNode)
];
...
}
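The pieces above can be tried without fetching a live page. The following is a sketch under stated assumptions: the XHTML snippet is made up to mimic the structure described in the question, and it is loaded with DOMDocument::loadXML() rather than with HTML5-PHP, which should give a comparably namespaced document:

```php
<?php
// Made-up XHTML snippet; the default namespace mirrors what HTML5-PHP produces.
$xml = <<<'XML'
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="reviews-container">
      <div itemprop="review">
        <span itemprop="author">Alice</span>
        <meta itemprop="ratingValue" content="5"/>
      </div>
      <div itemprop="review">
        <span itemprop="author">Bob</span>
        <meta itemprop="ratingValue" content="2"/>
      </div>
    </div>
  </body>
</html>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
$xpath->registerNamespace('x', 'http://www.w3.org/1999/xhtml');

// Without the prefix nothing matches, because the elements are namespaced.
$unprefixed = $xpath->evaluate('count(//div[@itemprop="review"])');

// The rating filter is part of the expression, so only matching reviews are returned.
$expression = '//x:div[@id="reviews-container"]'
    . '//x:div[@itemprop="review" and .//*[@itemprop="ratingValue"]/@content >= 4]';

$authors = [];
foreach ($xpath->evaluate($expression) as $reviewNode) {
    // Relative path plus context node: only this review is searched.
    $authors[] = $xpath->evaluate('string(.//*[@itemprop="author"])', $reviewNode);
}

print_r($authors);
```

The wildcard steps (.//*) need no prefix because * matches elements in any namespace, and attributes such as itemprop are not in the XHTML namespace.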
Thanks to Viper-7, biberu and salathe in the ##php IRC I have this working now using:
public function getTrustPilotReviews($amount)
{
$context = stream_context_create(array('ssl' => array('verify_peer' => false)));
$url = 'https://www.trustpilot.co.uk/review/purplegriffon.com';
$data = file_get_contents($url, false, $context);
libxml_use_internal_errors(true);
$doc = new \DOMDocument();
$doc->loadHTML($data);
$xpath = new DOMXpath($doc);
$reviews = new Collection;
foreach($xpath->query('//div[@id="reviews-container"]/div[@itemprop="review"]') as $node)
{
$xpath = new DOMXpath($doc);
$rating = $xpath->query('.//*[@itemprop="ratingValue"]', $node)->item(0)->getAttribute('content');
if ($rating >= 4)
{
$review = [
'title' => $xpath->evaluate('normalize-space(descendant::*[@itemprop="headline"]/a)', $node),
'author' => $xpath->evaluate('normalize-space(descendant::*[@itemprop="author"])', $node),
'date' => $xpath->evaluate('normalize-space(descendant::*[@class="ndate"])', $node),
'rating' => $xpath->evaluate('number(descendant::*[@itemprop="ratingValue"]/@content)', $node),
'body' => $xpath->evaluate('normalize-space(descendant::*[@itemprop="reviewBody"])', $node),
];
$reviews->add((object) $review);
}
}
return $reviews->take($amount);
}

How can I retrieve infos from PHP DOMElement?

I'm working on a function that gets the whole content of the style.css file and returns only the CSS rules that are needed by the currently viewed page (the result will be cached too, so the function only runs when the page has changed).
My problem is with parsing the DOM (I have never done this before with PHP DOM). I have the following function, but $element->tagname returns NULL. I also want to check the element's "class" attribute, but I'm stuck here.
function get_rules($html) {
$arr = array();
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('*') as $element ){
$arr[sizeof($arr)] = $element->tagname;
}
return array_unique($arr);
}
What can I do? How can I get the tag name and class of every DOM element in the HTML?
$element->tagname is NULL because that property does not exist: property names are case-sensitive, and the property is called tagName (camel case).
function get_rules($html) {
$arr = array();
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('*') as $element ){
$e = array();
$e['tagName'] = $element->tagName; // tagName not tagname
// get all elements attributes
foreach($element->attributes as $attr) {
$attrs = array();
$attrs['name'] = $attr->nodeName;
$attrs['value'] = $attr->nodeValue;
$e['attributes'][] = $attrs;
}
$arr[] = $e;
}
return $arr;
}
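For the stated goal of matching CSS rules, only the distinct tag names and class names may be needed. The following is a sketch along those lines; the function name get_used_selectors() and the sample markup are assumptions made for illustration:

```php
<?php
// Hypothetical helper: returns the distinct tag names and class names used in $html.
function get_used_selectors(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $tags = [];
    $classes = [];
    foreach ($dom->getElementsByTagName('*') as $element) {
        // Using the names as array keys removes duplicates.
        $tags[$element->tagName] = true;
        // The class attribute is a whitespace-separated list of names.
        $names = preg_split('/\s+/', $element->getAttribute('class'), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($names as $class) {
            $classes[$class] = true;
        }
    }

    return ['tags' => array_keys($tags), 'classes' => array_keys($classes)];
}

$used = get_used_selectors('<div class="box  wide"><p class="box">Hi</p><p>There</p></div>');
print_r($used);
```

Note that loadHTML() wraps a fragment in html and body elements, so those two tag names appear in the result as well.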

How can I extract all img tag within an anchor tag?

I would like to extract all img tags that are inside an anchor tag, using the PHP DOM object.
I am trying it with the code below, but it gets all anchor tags, and their text is empty when they contain only an img tag.
function get_links($url) {
// Create a new DOM Document to hold our webpage structure
$xml = new DOMDocument();
// Load the url's contents into the DOM
@$xml->loadHTMLFile($url);
// Empty array to hold all links to return
$links = array();
//Loop through each <a> tag in the dom and add it to the link array
foreach($xml->getElementsByTagName('a') as $link)
{
$hrefval = '';
if(strpos($link->getAttribute('href'),'www') > 0)
{
//$links[] = array('url' => $link->getAttribute('href'), 'text' => $link->nodeValue);
$hrefval = '#URL#'.$link->getAttribute('href').'#TEXT#'.$link->nodeValue;
$links[$hrefval] = $hrefval;
}
else
{
//$links[] = array('url' => GetMainBaseFromURL($url).$link->getAttribute('href'), 'text' => $link->nodeValue);
$hrefval = '#URL#'.GetMainBaseFromURL($url).$link->getAttribute('href').'#TEXT#'.$link->nodeValue;
$links[$hrefval] = $hrefval;
}
}
foreach($xml->getElementsByTagName('img') as $link)
{
$srcval = '';
if(strpos($link->getAttribute('src'),'www') > 0)
{
//$links[] = array('src' => $link->getAttribute('src'), 'nodval' => $link->nodeValue);
$srcval = '#SRC#'.$link->getAttribute('src').'#NODEVAL#'.$link->nodeValue;
$links[$srcval] = $srcval;
}
else
{
//$links[] = array('src' => GetMainBaseFromURL($url).$link->getAttribute('src'), 'nodval' => $link->nodeValue);
$srcval = '#SRC#'.GetMainBaseFromURL($url).$link->getAttribute('src').'#NODEVAL#'.$link->nodeValue;
$links[$srcval] = $srcval;
}
}
//Return the links
//$links = unsetblankvalue($links);
return $links;
}
This returns all anchor tags and all img tags separately.
$xml = new DOMDocument;
libxml_use_internal_errors(true);
$xml->loadHTMLFile($url);
libxml_clear_errors();
libxml_use_internal_errors(false);
$xpath = new DOMXPath($xml);
foreach ($xpath->query('//a[contains(@href, "www")]/img') as $entry) {
var_dump($entry->getAttribute('src'));
}
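The usage of the strpos() function in the code is not correct: it returns 0 when the match is at the very start of the string, so the > 0 check treats that case as no match.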
Instead of using
if(strpos($link->getAttribute('href'),'www') > 0)
Use
if(strpos($link->getAttribute('href'),'www')!==false )
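A short sketch of the difference between the two checks, using a made-up href value that starts with the search string:

```php
<?php
// Illustrative value: the needle sits at the very start of the string.
$href = 'www.example.com/page';

// strpos() returns the position of the match, or false if there is none.
$position = strpos($href, 'www');

$looseCheck  = $position > 0;       // false: position 0 is not greater than 0
$strictCheck = $position !== false; // true: a match was found

var_dump($position, $looseCheck, $strictCheck);
```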
