How to remove DOM Elements from XML in PHP?

Following is my XML File content
Now I want to remove the <text> element from the XML. How can I do it?
$doc = new DOMDocument;
$doc->load("XML FILE");
$thedocument = $doc->documentElement;
$list = $thedocument->getElementsByTagName('text');
foreach ($list as $domElement){
//Code to remove current text element...
}

Did you check the manual? You can use removeChild(); the manual has an example.
<?php
$doc = new DOMDocument;
$doc->load('book.xml');
$book = $doc->documentElement;
// we retrieve the chapter and remove it from the book
$chapter = $book->getElementsByTagName('chapter')->item(0);
$oldchapter = $book->removeChild($chapter);
echo $doc->saveXML();
?>
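
One thing to watch for when removing every matching element in a loop: getElementsByTagName() returns a live list, so removing nodes while iterating over it can skip elements. A minimal sketch of one way around that, assuming a made-up sample document (the XML from the question was not shown) and copying the list into an array first:

```php
<?php
// Hypothetical sample document; the XML from the question is not available.
$xml = '<root><item><text>a</text><name>x</name></item><item><text>b</text></item></root>';

$doc = new DOMDocument;
$doc->loadXML($xml);

// Copy the live node list into a plain array before removing anything.
$targets = iterator_to_array($doc->getElementsByTagName('text'));

foreach ($targets as $node) {
    // Remove through the real parent, which need not be the document element.
    $node->parentNode->removeChild($node);
}

echo $doc->saveXML();
```

Going through $node->parentNode also covers the case where <text> is nested deeper than a direct child of the root.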

Related

Displaying the results of php document parser

Can you echo the results of a document parser or do you have to first create an array to display the results? Anyway, when running the code, nothing appears (no output or errors), and I have tried both methods. Could possibly be a site issue but I have tried a few others and get the same result.
<?php
$ebayquery ='halo';
$ebayhtml = 'https://www.ebay.com/sch/i.html_from=R40&_trksid=p2380057.m570.l1311.R6.TR12.TRC2.A0.H0.X.TRS0&_nkw=' . $ebayquery . '&_sacat=0';
$ebayresults = array();
$document = new \DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$document->loadHTML($ebayhtml);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXpath($document);
$links = $xpath->query('//h3[@id="lvtitle"]/a');
foreach($links as $a) {
echo $a->nodeValue;
}
?>

There are a couple of problems with the code. First, loadHTML() takes a string of HTML, not a filename or URI, so you have to read the web page first and pass its contents in (I've used file_get_contents() here).
Second, the XPath was looking for any <h3> tag with an id attribute of lvtitle, but the page only has <h3> tags whose class attribute is lvtitle. I've updated the XPath expression to use class instead.
$ebayquery ='halo';
$ebayhtml = 'https://www.ebay.com/sch/i.html_from=R40&_trksid=p2380057.m570.l1311.R6.TR12.TRC2.A0.H0.X.TRS0&_nkw=' . $ebayquery . '&_sacat=0';
$ebayresults = array();
$document = new \DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$ebayhtml = file_get_contents($ebayhtml);
$document->loadHTML($ebayhtml);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXpath($document);
$links = $xpath->query('//h3[@class="lvtitle"]/a');
print_r($links);
foreach($links as $a) {
echo $a->nodeValue.PHP_EOL;
}
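
To try the two fixes without hitting a live site, here is a small sketch that feeds loadHTML() a string and queries by class. The markup is invented for illustration and is only an assumption about what a results page might look like:

```php
<?php
// Hypothetical markup standing in for a downloaded results page.
$html = '<html><body>'
      . '<h3 class="lvtitle"><a href="/item/1">Halo 3</a></h3>'
      . '<h3 id="other"><a href="/item/2">Not a result</a></h3>'
      . '</body></html>';

$document = new DOMDocument();
$previous = libxml_use_internal_errors(true); // keep parser warnings quiet
$document->loadHTML($html);                   // expects markup, not a URL
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($document);
$titles = [];
foreach ($xpath->query('//h3[@class="lvtitle"]/a') as $a) {
    $titles[] = trim($a->nodeValue);
}
print_r($titles);
```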

Using DOMdocument createElement in foreach

I want to generate content from an array and insert it into a document that I loaded from a site. This is what I have so far. Is there a way to do this with foreach? Is there an append function?
$dom = new \DOMDocument();
$dom->loadHTML($data);
// create the new element
$newNode = $dom->createElement('div', 'this is new');
$newNode->setAttribute('id', 'new_div');
//foreach
$count = 0;
foreach($tags->contents as $content){
$contents[$count] = $content->text;
}
// fetch and replace the old element
$oldNode = $dom->getElementById('blog-xpath');
$oldNode->parentNode->replaceChild($newNode, $oldNode);
$nodes = $dom->saveHTML($dom->documentElement);
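
A possible approach, sketched under assumptions: DOMNode has an appendChild() method, so each array entry can be turned into its own element inside the loop and appended to the new container. The $data markup and the $contents array below are hypothetical stand-ins for the values in the question:

```php
<?php
// Hypothetical inputs: the page markup and the strings to insert.
$data = '<html><body><div id="blog-xpath">old</div></body></html>';
$contents = ['first entry', 'second entry'];

$dom = new DOMDocument();
$dom->loadHTML($data);

// Container that will replace the old element.
$newNode = $dom->createElement('div');
$newNode->setAttribute('id', 'new_div');

foreach ($contents as $text) {
    // Build the element and its text separately so that characters
    // such as & are escaped correctly.
    $p = $dom->createElement('p');
    $p->appendChild($dom->createTextNode($text));
    $newNode->appendChild($p);
}

// Fetch and replace the old element.
$oldNode = $dom->getElementById('blog-xpath');
$oldNode->parentNode->replaceChild($newNode, $oldNode);

echo $dom->saveHTML($dom->documentElement);
```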

get all <h2> tag and <p> tag text from mysql text datatype column

I am trying to split and fetch the <p> and <h2> tag texts from a database. I have tried the code below, but it returns only the first result. For example, in my database I have
<h2>india</h2><p>country</p><h2>dravid</h2><p>cricket player</p>
I want to fetch the h2 results and the paragraph results separately, but the code below returns only the first h2 and the first paragraph. How do I get all h2 and p tag texts from the database?
$getdata = $res['review_content'];
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($getdata); // loads your html
$xpath = new DOMXPath($doc);
$heading = $xpath->evaluate("string(//h2/text())");
// paragraph text
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($getdata); // loads your html
$xpath = new DOMXPath($doc);
$paragraph = $xpath->evaluate("string(//p/text())");
When I echo $heading, it returns only india, but I want to display both india and dravid.

Try the code below. It first parses the HTML into a document object,
then finds the elements by tag name with getElementsByTagName() and reads the content of each tag from its textContent property.
<?php
$getdata = '<h2>india</h2><p>country</p><h2>dravid</h2><p>cricket player</p>';
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$pTag = array();
$h2Tag = array();
$xmlDoc = new DOMDocument();
$xmlDoc->loadHTML($getdata);
// collect the text of every <p> element
$searchNode = $xmlDoc->getElementsByTagName("p");
foreach ($searchNode as $d) {
    $pTag[] = $d->textContent;
}
// collect the text of every <h2> element
$searchNode = $xmlDoc->getElementsByTagName("h2");
foreach ($searchNode as $d) {
    $h2Tag[] = $d->textContent;
}
// $pTag now holds the text of every <p> tag
// $h2Tag now holds the text of every <h2> tag
?>

You can use the function getElementsByTagName.
Example:
$h2 = $doc->getElementsByTagName('h2');
$p = $doc->getElementsByTagName('p');
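
If each heading needs to stay matched with its paragraph, one possible variation is to loop over the <h2> nodes and look up the next <p> sibling with XPath. This is only a sketch: it assumes every heading is followed by its paragraph as a sibling, as in the sample string from the question, and the $pairs array is a name made up here:

```php
<?php
$getdata = '<h2>india</h2><p>country</p><h2>dravid</h2><p>cricket player</p>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($getdata);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);
$pairs = [];
foreach ($xpath->query('//h2') as $h2) {
    // First <p> sibling that follows this heading, if there is one.
    $p = $xpath->query('following-sibling::p[1]', $h2)->item(0);
    $pairs[$h2->textContent] = $p !== null ? $p->textContent : null;
}
print_r($pairs);
```

The original attempt returned a single value because string() applied to a node-set yields the string value of the first node only.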

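You should try this code; it will help you get the desired result.
$db_string = html_entity_decode($file_contents); // string from the database
$doc = new DOMDocument();
libxml_use_internal_errors(true);
// loadHTML() accepts a fragment with several top-level elements; loadXML() requires a single root
$doc->loadHTML($db_string);
$para = $doc->getElementsByTagName("p");
$a = $doc->getElementsByTagName("a");
$para_values = array();
$a_values = array();
foreach ($para as $p_tag) {
    // each item is already a DOMElement, so read nodeValue directly
    $para_values[] = $p_tag->nodeValue;
}
foreach ($a as $a_tag) {
    $a_values[] = $a_tag->nodeValue;
}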

DOM Parser grabbing href of <a> tag by class="Decision"

I'm working with a DOM parser and I'm having issues. I'm basically trying to grab the href of only those <a> tags that have the class 'thumbnail '. I've been trying to print the links on the screen and still get no results. Any help is appreciated. I also turned on error_reporting(E_ALL); and still nothing.
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$classId = "thumbnail ";
$div = $html->find('a#'.$classId);
echo $div;
I also tried this but still had the same result of NOTHING:
include('simple_html_dom.php');
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab all the <a> elements on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a");
$ret = $html->find('a[class=thumbnail]');
echo $ret;

You were almost there:
<?php
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.reddit.com/r/funny');
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body//a[contains(concat(' ',normalize-space(@class),' '),' thumbnail ')]");
var_dump($hrefs);
Gives:
class DOMNodeList#28 (1) {
public $length =>
int(25)
}
25 matches, I'd call it success.
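
For anyone wondering why the expression pads the class attribute with spaces: it matches thumbnail as a whole class token, so elements with several classes still match while similar names such as thumbnails do not. A small offline sketch with invented markup:

```php
<?php
// Hypothetical markup: only the first two links carry the "thumbnail" token.
$html = '<html><body>'
      . '<a class="thumbnail may-blank" href="/a">A</a>'
      . '<a class="thumbnail" href="/b">B</a>'
      . '<a class="thumbnails" href="/c">C</a>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Pad the normalized class list with spaces, then look for " thumbnail ".
$expr = "//a[contains(concat(' ', normalize-space(@class), ' '), ' thumbnail ')]";
$hrefs = [];
foreach ($xpath->query($expr) as $a) {
    $hrefs[] = $a->getAttribute('href');
}
print_r($hrefs);
```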

This code would probably work:
$html = file_get_contents('http://www.reddit.com/r/funny');
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$hyperlinks = $xpath->query('//a[@class="thumbnail"]');
foreach($hyperlinks as $hyperlink) {
echo $hyperlink->getAttribute('href'), '<br>';
}

If you're using simple_html_dom, why are you doing all these superfluous things? It already wraps the resource in everything you need: http://simplehtmldom.sourceforge.net/manual.htm
include('simple_html_dom.php');
// set up:
$html = new simple_html_dom();
// load from URL:
$html->load_file('http://www.reddit.com/r/funny');
// find those <a> elements:
$links = $html->find('a[class=thumbnail]');
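// done; find() returns an array of elements, so print each href:
foreach ($links as $link) {
echo $link->href, PHP_EOL;
}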

I tested it and made some changes; this works perfectly too.
<?php
// load the url and set up an array for the links
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.reddit.com/r/funny');
$links = array();
// loop thru all the A elements found
foreach($dom->getElementsByTagName('a') as $link) {
$url = $link->getAttribute('href');
$class = $link->getAttribute('class');
// Check if the URL is not empty and if the class contains thumbnail
if(!empty($url) && strpos($class,'thumbnail') !== false) {
array_push($links, $url);
}
}
// Print results
print_r($links);
?>

get value of <h2> of html page with PHP DOM?

I have a Craigslist URL in the variable $link and have loaded the page's contents into $linkhtml, so $linkhtml holds the HTML code of the page at $link.
I need to extract the text between <h2> and </h2>. I could use a regexp, but how do I do this with PHP DOM? I have this so far:
$linkhtml= file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
What do I do next to put the contents of the element <h2> into a var $title?

If DOMDocument looks too complicated to understand or use, you may try PHP Simple HTML DOM Parser, which provides a very easy way to parse HTML.
require 'simple_html_dom.php';
$html = '<h1>Header 1</h1><h2>Header 2</h2>';
$dom = new simple_html_dom();
$dom->load( $html );
$title = $dom->find('h2',0)->plaintext;
echo $title; // outputs: Header 2

You can use this code:
$linkhtml= file_get_contents($link);
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($linkhtml); // loads your html
$xpath = new DOMXPath($doc);
$h2text = $xpath->evaluate("string(//h2/text())");
// $h2text is your text between <h2> and </h2>

You can do this with XPath (untested, may contain errors):
$linkhtml= file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
$xpath = new DOMXpath($dom);
$elements = $xpath->query("/html/body/h2");
// query() returns false (not null) when the expression is invalid
if ($elements !== false) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
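
To get just the first <h2> into $title without XPath, something along these lines may be enough. It is a sketch on made-up markup; with the real page, $linkhtml would come from file_get_contents($link) as in the question:

```php
<?php
// Hypothetical markup standing in for the downloaded page.
$linkhtml = '<html><body><div><h2> Header 2 </h2></div><h2>Another</h2></body></html>';

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($linkhtml);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// item(0) is the first <h2> in document order, or null if there is none.
$h2 = $dom->getElementsByTagName('h2')->item(0);
$title = $h2 !== null ? trim($h2->textContent) : null;
var_dump($title);
```

Checking for null first avoids an error on pages that contain no <h2> at all.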
