XMLReader and doctype - php

I need to parse an XML file, including its doctype. I tried XMLReader, but when I reach a node of type 10 (doctype), I can't get its value.
Is there a way to extract the doctype from an XML file with XMLReader?
Edit: as requested, here is some sample code. Right now it is nothing more than a dump.
$reader = new XMLReader( );
$filename = 'test.xhtml';
$reader->open($filename);
while( $reader->read( ) )
{
    $nodeType = $reader->nodeType;
    $nodeName = $reader->name;
    $nodeValue = $reader->value;
    // 10 is the doctype node type (XMLReader::DOC_TYPE)
    if( $nodeType == 10 )
    {
        echo $nodeType ."\n";
        echo $nodeName ."\n";
        echo $nodeValue ."\n";
        echo $reader->localName ."\n";
        echo $reader->namespaceURI ."\n";
        echo $reader->prefix ."\n";
        echo $reader->xmlLang ."\n";
        echo $reader->readString() . "\n";
        echo $reader->readInnerXML() . "\n";
        while( $reader->moveToNextAttribute( ) )
        {
            echo $reader->name . "=" . $reader->value;
        }
    }
}

You can use DOM to read the DOCTYPE data:
$doc = new DOMDocument();
$doc->loadXML($xmlData);
var_dump($doc->doctype->publicId);
var_dump($doc->doctype->systemId);
var_dump($doc->doctype->name);
var_dump($doc->doctype->entities);
var_dump($doc->doctype->notations);
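As a self-contained illustration of the DOM approach, here is a minimal sketch that parses a small inline document and reads its doctype fields. The sample XHTML string and the variable names are made up for the example; as far as I know, loadXML() does not fetch the external DTD unless you pass extra libxml flags, so this should not need network access.

```php
<?php
// Hypothetical sample document; only the DOCTYPE line matters here.
$xmlData = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Test</title></head><body></body></html>
XML;

$doc = new DOMDocument();
$doc->loadXML($xmlData);

// $doc->doctype is null when the document has no DOCTYPE declaration.
$doctype = $doc->doctype;
if ($doctype !== null) {
    echo 'name:     ', $doctype->name, "\n";
    echo 'publicId: ', $doctype->publicId, "\n";
    echo 'systemId: ', $doctype->systemId, "\n";
}
```

Checking for null first avoids a warning on documents that have no doctype at all.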

I have not found a way to do this with XMLReader despite a lot of looking. However you can use DOMDocument to read the doctype quite easily, then revert to XMLReader to read the rest of the stream. For example, to get the system ID part of the doctype before processing the rest of the XML file:
$doc = new DOMDocument();
$doc->load($xmlfile);
$systemId = $doc->doctype->systemId;
unset($doc);
// Then proceed with XMLReader:
$reader = new XMLReader();
$reader->open($xmlfile);
while($reader->read())
{
    // etc
}
I suppose that this may not be practical in all circumstances but it worked for me while processing very large XML files for which I needed to read the system ID from the doctype.
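To make the two-pass idea concrete, here is a runnable sketch under a few assumptions: the sample file, its `catalog.dtd` system ID and the `item` element are invented for the illustration, and the file is written to a temp location so the example needs nothing else on disk.

```php
<?php
// Hypothetical sample file, written to a temp location so the sketch is runnable.
$xmlfile = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents(
    $xmlfile,
    '<?xml version="1.0"?>' . "\n"
    . '<!DOCTYPE catalog SYSTEM "catalog.dtd">' . "\n"
    . '<catalog><item>a</item><item>b</item></catalog>'
);

// Pass 1: DOM, used only to read the doctype.
$doc = new DOMDocument();
$doc->load($xmlfile);
$systemId = $doc->doctype !== null ? $doc->doctype->systemId : null;
unset($doc);

// Pass 2: XMLReader streams through the actual content.
$reader = new XMLReader();
$reader->open($xmlfile);
$itemCount = 0;
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        $itemCount++;
    }
}
$reader->close();
unlink($xmlfile);

echo "systemId: $systemId, items: $itemCount\n";
```

Keep in mind that the DOM pass still parses the whole file into memory once, so the saving comes only from doing the real work in the streaming pass.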

Related

Retrieve OuterXML From XML Child Node

I need to retrieve the OuterXML for each speak tag.
For example, I need to retrieve this data for the first speak tag in test.ssml:
<speak xmlns="https://www.w3.org/2001/10/synthesis" version="1.0" xml:lang="en-US">
<voice name="en-US-GuyNeural">
<prosody rate="0.00%">Test 1</prosody>
</voice>
</speak>
index.php
set_time_limit(0);
require_once('src/Config.php');
$fileName = __DIR__.DIRECTORY_SEPARATOR.'test.ssml';
$fileContent = file_get_contents($fileName);
// $fileContent = preg_replace( "/\r|\n/", "", $fileContent );
$xml=simplexml_load_file($fileName);
$reader = new XMLReader();
foreach($xml->speak as $child)
{
echo $child->getName() . " ::: " . htmlspecialchars( $reader->readOuterXml ( $child ) ). "<br>";
}
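test.ssml
(The markup of this file was not preserved; only its text content survives: "all tracks.mp3", "bookmarks.dat", "Test 1", "Test 2".)
Current output in browser: (not preserved)
Desired output: (not preserved)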
You can get the XML directly with SimpleXML's asXML() method; as far as I can tell, you don't need XMLReader at all:
$xml=simplexml_load_file($fileName);
foreach($xml->speak as $child)
{
echo $child->asXML()."<br />";
}
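Here is a self-contained sketch of the same idea. The inline markup is a hypothetical stand-in for test.ssml (namespaces are left out to keep it short), and `$outerXml` is just a name chosen for the example.

```php
<?php
// Hypothetical stand-in for test.ssml: a root element with two <speak> children.
$ssml = '<root>'
    . '<speak version="1.0"><voice name="en-US-GuyNeural"><prosody rate="0.00%">Test 1</prosody></voice></speak>'
    . '<speak version="1.0"><voice name="en-US-GuyNeural"><prosody rate="0.00%">Test 2</prosody></voice></speak>'
    . '</root>';

$xml = simplexml_load_string($ssml);

// asXML() on a child element returns that element's markup, tags included.
$outerXml = [];
foreach ($xml->speak as $child) {
    $outerXml[] = $child->asXML();
}

// Escape the markup so a browser shows the tags instead of interpreting them.
foreach ($outerXml as $markup) {
    echo htmlspecialchars($markup), "<br />\n";
}
```

The htmlspecialchars() call only matters when the result is shown in a browser; drop it if you write the markup to a file or a log.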

How to store XML files in proper order? [duplicate]

I'm trying to add some data to an existing XML file using PHP's SimpleXML. The problem is that it adds all the data on a single line:
<name>blah</name><class>blah</class><area>blah</area> ...
And so on, all on a single line. How do I introduce line breaks?
How do I make it look like this?
<name>blah</name>
<class>blah</class>
<area>blah</area>
I am using asXML() function.
Thanks.
You could use the DOMDocument class to reformat your code:
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($simpleXml->asXML());
echo $dom->saveXML();
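For a version you can run on its own, here is a small sketch that builds a document with SimpleXML and then reformats it through DOMDocument. The `record` root element and the `$formatted` variable are made up for the illustration.

```php
<?php
// Build a small document with SimpleXML (hypothetical element names).
$simpleXml = new SimpleXMLElement('<record/>');
$simpleXml->addChild('name', 'blah');
$simpleXml->addChild('class', 'blah');
$simpleXml->addChild('area', 'blah');

// Set both properties before loading, so they apply while the markup is parsed.
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($simpleXml->asXML());

// saveXML() now returns one element per line, indented by nesting level.
$formatted = $dom->saveXML();
echo $formatted;
```

The order matters: preserveWhiteSpace is applied at load time, so setting it after loadXML() would have no effect on the already parsed document.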
Gumbo's solution does the trick. You can do your work with SimpleXML as above and then add this at the end to echo and/or save the result with formatting.
The code below echoes it and saves it to a file (see the comments in the code and remove whatever you don't want):
//Format XML to save indented tree rather than one line
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($simpleXml->asXML());
//Echo XML - remove this and following line if echo not desired
echo $dom->saveXML();
//Save XML to file - remove this and following line if save not desired
$dom->save('fileName.xml');
Use dom_import_simplexml to convert to a DOMElement, then use its owner document's ability to format output.
$dom = dom_import_simplexml($simple_xml)->ownerDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
echo $dom->saveXML();
Building on Gumbo's and Witman's answers, here is how to load and save an XML document from an existing file (there are a lot of newbies around here) with DOMDocument::load and DOMDocument::save.
<?php
$xmlFile = 'filename.xml';
if( !file_exists($xmlFile) ) die('Missing file: ' . $xmlFile);
else
{
    $dom = new DOMDocument('1.0');
    $dom->preserveWhiteSpace = false;
    $dom->formatOutput = true;
    $dl = @$dom->load($xmlFile); // remove the error control operator (@) to print any error message generated while loading.
    if ( !$dl ) die('Error while parsing the document: ' . $xmlFile);
    echo $dom->save($xmlFile); // save() returns the number of bytes written
}
?>

How to prevent XMLWriter from appending blank line to outputted XML file?

The following code creates an XML file, but the last line is blank which causes problems when validated.
How can I change the following code so that the outputted file does not have a blank line at the end of it?
<?php
$xmlFileName = 'testoutput.xml';
$xml = new XMLWriter;
$xml->openURI($xmlFileName);
$xml->startDocument('1.0', 'UTF-8');
$xml->setIndent(1);
$xml->startElement('name');
$xml->text('jim');
$xml->endElement();
$xml->endDocument();
$xml->flush();
?>
@DavidRR, the validation problem appears when I validate the XML file with the following code; it tells me that there is "extra content at the end of the document":
$schema = 'test.xsd';
$files[] = 'test1.xml';
$files[] = 'test2.xml';
foreach ($files as $file) {
validateXml($file, $schema);
}
function validateXml($xmlFile, $xsdFile) {
$dom = new DOMDocument;
$dom->load($xmlFile);
libxml_use_internal_errors(true); // enable user error handling
echo "Validating <b>$xmlFile</b> with <b>$xsdFile</b>:";
if ($dom->schemaValidate($xsdFile)) {
echo '<div style="margin-left:20px">ok</div>';
} else {
$errors = libxml_get_errors();
if (count($errors) > 0) {
echo '<ul style="color:red">';
foreach ($errors as $error) {
//var_dump($error);
echo '<li>' . $error->message . '</li>';
}
echo '</ul>';
}
libxml_clear_errors();
echo '</span>';
libxml_use_internal_errors(false); // disable user error handling again
}
}
Reported problem: Because of the presence of a blank line at the end of an XML file, a schema validation attempt on the file results in the error:
"Extra content at the end of the document"
I'm not able to reproduce your stated problem at codepad, PHP version 5.4-dev, or any of the earlier versions of PHP on that site. I'm including my edited version of your code here as well. (My version includes functions to create the simple XSD and XML files under examination.)
Possibility: Could your problem be related to the version of PHP that you are using?
If I haven't accurately tested your scenario with my adaptation of your code, please further modify my code to precipitate the problem.
<?php
$xsdFile = sys_get_temp_dir() . '/test1.xsd';
$xmlFile = sys_get_temp_dir() . '/test1.xml';
createXsdFile($xsdFile);
createXmlFile($xmlFile);
$files[] = $xmlFile;
foreach ($files as $file) {
validateXml($file, $xsdFile);
}
function validateXml($xmlFile, $xsdFile) {
$dom = new DOMDocument;
$dom->load($xmlFile);
libxml_use_internal_errors(true); // enable user error handling
echo "Validating <b>$xmlFile</b> with <b>$xsdFile</b>:";
if ($dom->schemaValidate($xsdFile)) {
echo '<div style="margin-left:20px">ok</div>';
} else {
$errors = libxml_get_errors();
if (count($errors) > 0) {
echo '<ul style="color:red">';
foreach ($errors as $error) {
//var_dump($error);
echo '<li>' . $error->message . '</li>';
}
echo '</ul>';
}
libxml_clear_errors();
echo '</span>';
libxml_use_internal_errors(false); // disable user error handling again
}
}
function createXsdFile($xsdFile) {
$file = fopen($xsdFile, 'w');
fwrite($file, "<?xml version='1.0' encoding='utf-8'?>\n");
fwrite($file, "<schema xmlns='http://www.w3.org/2001/XMLSchema'>\n");
fwrite($file, "<element name='name' type='string' />\n");
fwrite($file, "</schema>\n");
fclose($file);
}
//
// Appends a blank line at the end of the XML file.
// Does this cause a schema validation problem?
//
function createXmlFile($xmlFile) {
$xml = new XMLWriter;
$xml->openURI($xmlFile);
$xml->startDocument('1.0', 'UTF-8');
$xml->setIndent(1);
$xml->startElement('name');
$xml->text('jim');
$xml->endElement();
$xml->endDocument();
$xml->flush();
}
?>
I have found no way to change the behavior of XMLWriter in that regard. A possible fix would be to read the file, trim it and then write it back, e.g.
file_put_contents($xmlFileName, trim(file_get_contents($xmlFileName)));
An alternative would be to ftruncate the file
ftruncate(fopen($xmlFileName, 'r+'), filesize($xmlFileName) - strlen(PHP_EOL));
The latter assumes that the file ends with a platform-dependent newline. If it doesn't, this will likely break the file. The trim version is more robust in that regard, as it will not damage the file if there isn't a newline, but it has to read the entire file into memory in order to trim the content.
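A middle ground between the two is to look at the last bytes of the file and truncate only when a line ending is really there. The following is a minimal sketch of that idea; `stripTrailingNewline` is a hypothetical helper name, not a built-in function, and it removes at most one trailing "\n" or "\r\n".

```php
<?php
// Hypothetical helper: removes one trailing line ending ("\n" or "\r\n") in place.
// Returns true if the file was shortened, false otherwise.
function stripTrailingNewline(string $path): bool
{
    clearstatcache(true, $path); // make sure filesize() is not a stale cached value
    $size = filesize($path);
    if ($size === false || $size === 0) {
        return false;
    }
    $handle = fopen($path, 'r+');
    if ($handle === false) {
        return false;
    }
    // Read only the last one or two bytes instead of the whole file.
    $tailLength = min(2, $size);
    fseek($handle, -$tailLength, SEEK_END);
    $tail = fread($handle, $tailLength);

    $remove = 0;
    if (substr($tail, -2) === "\r\n") {
        $remove = 2;
    } elseif (substr($tail, -1) === "\n") {
        $remove = 1;
    }
    if ($remove > 0) {
        ftruncate($handle, $size - $remove);
    }
    fclose($handle);
    return $remove > 0;
}

// Usage with a throwaway temp file:
$xmlFileName = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($xmlFileName, "<name>jim</name>\n");
$first = stripTrailingNewline($xmlFileName);  // newline present, file is shortened
$second = stripTrailingNewline($xmlFileName); // nothing left to strip
$result = file_get_contents($xmlFileName);
unlink($xmlFileName);
```

Unlike the bare ftruncate call, this leaves the file alone when it does not end with a newline, and unlike trim it never loads the whole file.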
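If you are on a Linux/Unix system with GNU head, you can drop the last line of the file. Write to a temporary file first, because redirecting the output to the file being read would truncate it before head reads it. Note that head -n -1 removes the last line whatever it contains, so this only helps if the final line is really empty:
$tmpFile = $xmlFileName . '.tmp';
shell_exec('head -n -1 ' . escapeshellarg($xmlFileName) . ' > ' . escapeshellarg($tmpFile) . ' && mv ' . escapeshellarg($tmpFile) . ' ' . escapeshellarg($xmlFileName));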
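Another option, not taken from the answers above, is to avoid the trailing newline before it ever reaches the file: let XMLWriter write to memory, trim the string, and save it yourself. This is only a sketch; the temp file path is a placeholder for the real output file.

```php
<?php
$xmlFileName = tempnam(sys_get_temp_dir(), 'xml'); // placeholder output path

$xml = new XMLWriter();
$xml->openMemory();          // buffer the document instead of writing to a URI
$xml->setIndent(true);
$xml->startDocument('1.0', 'UTF-8');
$xml->startElement('name');
$xml->text('jim');
$xml->endElement();
$xml->endDocument();

// outputMemory() returns the buffered XML; rtrim() drops any trailing whitespace.
$output = rtrim($xml->outputMemory());
file_put_contents($xmlFileName, $output);

$written = file_get_contents($xmlFileName);
unlink($xmlFileName);
```

This keeps the whole document in memory, so it suits small and medium files better than very large ones.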

getting image src in php

How can I get the image source (the src attribute) from an img tag using a PHP function?
Or, you can use the built-in DOM functions (if you use PHP 5+):
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXpath($doc);
$imgs = $xpath->query("//img");
for ($i=0; $i < $imgs->length; $i++) {
$img = $imgs->item($i);
$src = $img->getAttribute("src");
// do something with $src
}
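Here is a variant you can run without a URL, as a sketch: the inline HTML string is invented for the example, and the XPath expression selects the src attributes directly instead of the img elements.

```php
<?php
// Hypothetical inline HTML; in the snippet above it comes from loadHTMLFile($url).
$html = '<html><body><img src="demo/banner.png" alt=""><p><img src="logo.png"></p></body></html>';

// Collect parser warnings internally instead of printing them for sloppy HTML.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// '//img/@src' matches the src attribute node of every img element.
$xpath = new DOMXPath($doc);
$sources = [];
foreach ($xpath->query('//img/@src') as $attr) {
    $sources[] = $attr->nodeValue;
}
print_r($sources);
```

Selecting the attribute nodes also skips img elements that have no src at all, which getAttribute() would report as empty strings.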
This keeps you from having to use external classes.
I'm not sure if this is an accepted method of solving your problem, but check out this code snippet, which relies on the file_get_html() function from PHP Simple HTML DOM Parser:
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
You can use PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net/)
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element) {
echo $element->src.'<br>';
}
// Find all links
foreach($html->find('a') as $element) {
echo $element->href.'<br>';
}
$path1 = 'http://example.com/index.html'; // path of the HTML page
$file = file_get_contents($path1);
$dom = new DOMDocument;
@$dom->loadHTML($file); // @ suppresses warnings about malformed HTML
$links = $dom->getElementsByTagName('img');
$a = []; // collected src values
foreach ($links as $link)
{
    $re = $link->getAttribute('src');
    $a[] = $re;
}
print_r($a);
Output:
Array
(
[0] => demo/banner_31.png
[1] => demo/my_code.png
)
