I want to get the HTML content in this page using file_get_contents as string :
https://www.emitennews.com/search/
Then I want to unminify the html code.
So far what I done to unminify it :
$html = file_get_contents("https://www.emitennews.com/search/");
$dom = new \DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML($html,LIBXML_HTML_NOIMPLIED);
$dom->formatOutput = true;
print $dom->saveXML($dom->documentElement);
But in the code above I got is error :
DOMDocument::loadHTML(): Tag header invalid in Entity, line: 1
What is the proper way to do it ?
You must add the xml tag at the first line:
$dom = new DOMDocument();
$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
This is the correct code :
$html = file_get_contents("https://www.emitennews.com/search/");
$dom = new \DOMDocument();
libxml_use_internal_errors(true);
$dom->preserveWhiteSpace = false;
$dom->loadHTML('<?xml encoding="UTF-8">' . $html,LIBXML_HTML_NOIMPLIED);
$dom->formatOutput = true;
print $dom->saveXML($dom->documentElement);
The problem is the site using HTML5. So we need to put :
libxml_use_internal_errors(true);
I have a XML file and I'm trying to insert a new node between two others with PHP script.
XML file:
<playlistLog>
<hour>
<track>
<type>take</type>
<url>URL</url>
<title>1473869236.wav</title>
<mix>0</mix>
</track>
(...)
</hour>
</playlistLog>
PHP file:
$xmldoc = new DOMDocument();
$xmldoc->load('../logs/log14-09-2016.xml');
$elem = $xmldoc->getElementsByTagName("track");
$track = $xmldoc->createElement('track');
$type = $xmldoc->createElement('type', 'take');
$track->appendChild($type);
$url = $xmldoc->createElement('url', 'url');
$track->appendChild($url);
$title = $xmldoc->createElement('title', 'title');
$track->appendChild($title);
$mix = $xmldoc->createElement('mix', 'mix');
$track->appendChild($mix);
$xmldoc->documentElement->insertBefore($track,$elem[660]);
$xmldoc->save('../logs/log14-09-2016.xml');
I'm trying to insert the new node before "track" tag number 660 but when I try to open the php file it doesn't work at all.
Can anybody tell me what I am doing wrong?
SOLUTION:
After #ThW response I change a bit what he wrotes and finally the code is doing right:
$document = new DOMDocument();
$document->preserveWhiteSpace = FALSE;
$document->load('../logs/log14-09-2016.xml');
$xpath = new DOMXpath($document);
$previousTrack = $xpath->evaluate('/playlistLog/hour/track')->item(659);
$newTrack = $previousTrack->parentNode->insertBefore($document->createElement('track'),$previousTrack);
$newTrack
->appendChild($document->createElement('type'))
->appendChild($document->createTextNode('take'));
$document->formatOutput = TRUE;
echo $document->save('../logs/log14-09-2016.xml');
$elem[660] is the 661st element node with the tag name track. But its parent node is not the document element. Here is another hour ancestor between. The node you're providing to insertBefore() has a different parent then the node you're adding the new element to.
You can use the $parentNode property to make sure that you append to that node.
Additionally I suggest using Xpath to fetch the track node.
$xml = <<<'XML'
<playlistLog>
<hour>
<track>
<type>take</type>
<url>URL</url>
<title>1473869236.wav</title>
<mix>0</mix>
</track>
</hour>
</playlistLog>
XML;
$document = new DOMDocument();
$document->preserveWhiteSpace = FALSE;
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$previousTrack = $xpath->evaluate('/playlistLog/hour/track[1]')->item(0);
$newTrack = $previousTrack
->parentNode
->insertBefore(
$document->createElement('track'),
$previousTrack
);
$newTrack
->appendChild($document->createElement('type'))
->appendChild($document->createTextNode('take'));
$document->formatOutput = TRUE;
echo $document->saveXml();
Here are the codes:
$doc = new DomDocument('1.0');
// create root node
$root = $doc->createElement('root');
$root = $doc->appendChild($root);
$signed_values = array('a' => 'eee', 'b' => 'sd', 'c' => 'df');
// process one row at a time
foreach ($signed_values as $key => $val) {
// add node for each row
$occ = $doc->createElement('error');
$occ = $root->appendChild($occ);
// add a child node for each field
foreach ($signed_values as $fieldname => $fieldvalue) {
$child = $doc->createElement($fieldname);
$child = $occ->appendChild($child);
$value = $doc->createTextNode($fieldvalue);
$value = $child->appendChild($value);
}
}
// get completed xml document
$xml_string = $doc->saveXML() ;
echo $xml_string;
If I print it in the browser I don't get nice XML structure like
<xml> \n tab <child> etc.
I just get
<xml><child>ee</child></xml>
And I want to be utf-8
How is this all possible to do?
You can try to do this:
...
// get completed xml document
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$xml_string = $doc->saveXML();
echo $xml_string;
You can make set these parameter right after you've created the DOMDocument as well:
$doc = new DomDocument('1.0');
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
That's probably more concise. Output in both cases is (Demo):
<?xml version="1.0"?>
<root>
<error>
<a>eee</a>
<b>sd</b>
<c>df</c>
</error>
<error>
<a>eee</a>
<b>sd</b>
<c>df</c>
</error>
<error>
<a>eee</a>
<b>sd</b>
<c>df</c>
</error>
</root>
I'm not aware how to change the indentation character(s) with DOMDocument. You could post-process the XML with a line-by-line regular-expression based replacing (e.g. with preg_replace):
$xml_string = preg_replace('/(?:^|\G) /um', "\t", $xml_string);
Alternatively, there is the tidy extension with tidy_repair_string which can pretty print XML data as well. It's possible to specify indentation levels with it, however tidy will never output tabs.
tidy_repair_string($xml_string, ['input-xml'=> 1, 'indent' => 1, 'wrap' => 0]);
With a SimpleXml object, you can simply
$domxml = new DOMDocument('1.0');
$domxml->preserveWhiteSpace = false;
$domxml->formatOutput = true;
/* #var $xml SimpleXMLElement */
$domxml->loadXML($xml->asXML());
$domxml->save($newfile);
$xml is your simplexml object
So then you simpleXml can be saved as a new file specified by $newfile
<?php
$xml = $argv[1];
$dom = new DOMDocument();
// Initial block (must before load xml string)
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
// End initial block
$dom->loadXML($xml);
$out = $dom->saveXML();
print_R($out);
Tried all the answers but none worked. Maybe it's because I'm appending and removing childs before saving the XML.
After a lot of googling found this comment in the php documentation. I only had to reload the resulting XML to make it work.
$outXML = $xml->saveXML();
$xml = new DOMDocument();
$xml->preserveWhiteSpace = false;
$xml->formatOutput = true;
$xml->loadXML($outXML);
$outXML = $xml->saveXML();
// ##### IN SUMMARY #####
$xmlFilepath = 'test.xml';
echoFormattedXML($xmlFilepath);
/*
* echo xml in source format
*/
function echoFormattedXML($xmlFilepath) {
header('Content-Type: text/xml'); // to show source, not execute the xml
echo formatXML($xmlFilepath); // format the xml to make it readable
} // echoFormattedXML
/*
* format xml so it can be easily read but will use more disk space
*/
function formatXML($xmlFilepath) {
$loadxml = simplexml_load_file($xmlFilepath);
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($loadxml->asXML());
$formatxml = new SimpleXMLElement($dom->saveXML());
//$formatxml->saveXML("testF.xml"); // save as file
return $formatxml->saveXML();
} // formatXML
Two different issues here:
Set the formatOutput and preserveWhiteSpace attributes to TRUE to generate formatted XML:
$doc->formatOutput = TRUE;
$doc->preserveWhiteSpace = TRUE;
Many web browsers (namely Internet Explorer and Firefox) format XML when they display it. Use either the View Source feature or a regular text editor to inspect the output.
See also xmlEncoding and encoding.
This is a slight variation of the above theme but I'm putting here in case others hit this and cannot make sense of it ...as I did.
When using saveXML(), preserveWhiteSpace in the target DOMdocument does not apply to imported nodes (as at PHP 5.6).
Consider the following code:
$dom = new DOMDocument(); //create a document
$dom->preserveWhiteSpace = false; //disable whitespace preservation
$dom->formatOutput = true; //pretty print output
$documentElement = $dom->createElement("Entry"); //create a node
$dom->appendChild ($documentElement); //append it
$message = new DOMDocument(); //create another document
$message->loadXML($messageXMLtext); //populate the new document from XML text
$node=$dom->importNode($message->documentElement,true); //import the new document content to a new node in the original document
$documentElement->appendChild($node); //append the new node to the document Element
$dom->saveXML($dom->documentElement); //print the original document
In this context, the $dom->saveXML(); statement will NOT pretty print the content imported from $message, but content originally in $dom will be pretty printed.
In order to achieve pretty printing for the entire $dom document, the line:
$message->preserveWhiteSpace = false;
must be included after the $message = new DOMDocument(); line - ie. the document/s from which the nodes are imported must also have preserveWhiteSpace = false.
based on the answer by #heavenevil
This function pretty prints using the browser
function prettyPrintXmlToBrowser(SimpleXMLElement $xml)
{
$domXml = new DOMDocument('1.0');
$domXml->preserveWhiteSpace = false;
$domXml->formatOutput = true;
$domXml->loadXML($xml->asXML());
$xmlString = $domXml->saveXML();
echo nl2br(str_replace(' ', ' ', htmlspecialchars($xmlString)));
}
I generating xml.When i try to add cdata it gives me error
$dom = new DomDocument("1.0", "ISO-8859-1");
$root = $dom->appendChild($dom->createElement( "Questions"));
$sxe = simplexml_import_dom( $dom );
$question = $sxe->addchild("Question");
$question->addAttribute('id', $Question_Id);
$question->addAttribute('type', $type);
$question->appendChild($sxe->createCDATASection( $Questiontext));//error
$question->addChild('Option_One', $Option_One);
$dom->formatOutput = true;
$xmlString = $dom->saveXML();
print($xmlString);
$dom->save("{$Campaign_Name}.xml");
You have made a spelling mistake, instead of appendChild you write appendChid
$question->appendChild(new DOMElement('direction'))->appendChild(new DOMCdataSection('<p>Hello!</p>'));
for DOM
I have the following code in my PHP script:
$str = '<item></item>';
$xml = new DOMDocument('1.0', 'UTF-8');
$xml->formatOutput = true;
$xml->load('file.xml');
$items = $addon->getElementsByTagName('items')->item(0);
$items->nodeValue = $str;
$xml->save('file.xml');
In the saved file.xml I see the following:
<item><\item>
How can I save it in the XML file without encoding HTML entities?
Use a DOMDocumentFragment:
<?php
$doc = new DOMDocument();
$doc->load('file.xml'); // '<doc><items/></doc>'
$item = $doc->createDocumentFragment();
$item->appendXML('<item id="1">item</item>');
$doc->getElementsByTagName('items')->item(0)->appendChild($item);
$doc->save('file.xml');
If you're appending to the root element, use $doc->documentElement->appendChild($item); instead of getElementsByTagName.