Two errors when using PHP XMLReader - php

I have the following two errors when using XMLReader.
1) Warning: XMLReader::read() [xmlreader.read]: MyXML.xml:43102: parser error : xmlParseEntityRef: no name
2) Warning: XMLReader::read() [xmlreader.read]: ^ in MyXMLReader.php on line 56
Does anyone know what those refer to?
My PHP Code (The XML file is about 100MB so I can't include it):
<?php
//Assign file names
$XMLFile = 'MyXML.xml';
$CSVFile = 'MyCSV.csv';
//take start time to calculate run-time
$time_start = time();
//Open PHP's XMLReader. XMLReader opens each element in the XML one by one to keep memory use small.
$xml = new XMLReader();
$xml->open($XMLFile, null, 1<<19);
//Loop through all elements. Save all text from tags and attributes.
while ($xml->read()) {
if($xml->nodeType == XMLReader::TEXT) {
$row[$xml->name] = $xml->value;
}
if($xml->hasAttributes) {
while($xml->moveToNextAttribute()) {
$row[$xml->name] = $xml->value;
}
}
}
//save the titles which should appear in CSV file. All others will not be included.
$SavedRows = $row;
unset($row);
//Remove unnecessary columns i.e. datasource URLs
$RemoveColumn='xmlns:message, xmlns:common, xmlns:frb, xmlns:xsi, xsi:schemaLocation, xmlns:kf';
$RemoveColumns = explode(',', $RemoveColumn);
foreach($RemoveColumns as $key => $val) {
$val = trim($val);
unset($SavedRows[$val]);
}
//initiate all rows which should be included
foreach($SavedRows as $key => $val) {
$row[$key] = '';
}
//Create csv file
$fp = fopen($CSVFile, 'w');
//Input the column headings as first row
fputcsv($fp, array_keys($row), ',');
// Start 2nd loop through XML.
$xml = new XMLReader();
$xml->open($XMLFile, null, 1<<19);
while ($xml->read()) {
//Determine if tag is empty (An empty tag will contain data) Non empty tags contain series information.
$Output = $xml->isEmptyElement;
//Take data from non empty XML tags
if($xml->nodeType == XMLReader::TEXT) {
if(isset($SavedRows[$xml->name])) {
$row[$xml->name] = $xml->value;
}
}
//take data from XML tag attributes
if($xml->hasAttributes) {
while($xml->moveToNextAttribute()) {
if(isset($SavedRows[$xml->name])) {
$row[$xml->name] = $xml->value;
}
}
}
//If tag is empty, assume it is data and write row to file.
if($Output) {
fputcsv($fp, array_values($row), ',');
}
}
//Close file handle
fclose($fp);
//Calculate runtime
$time_end = time();
$time = $time_end - $time_start;
echo "Complete. Runtime: $time seconds";
?>

xmlParseEntityRef: no name
Means you've got bogus unescaped ampersands in the XML file. (Well, “XML”... technically if it ain't well-formed, it ain't XML.)
You'll need to check the file for lone &s (or fix the code that generated it) and escape them to &amp;. According to the error, the first one is on line 43102 of the file (yikes!).
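If the generator cannot be fixed, the file can be repaired before parsing. The following is only a minimal sketch: escapeBareAmpersands() and repairXmlFile() are hypothetical helper names, and it assumes the file contains no CDATA sections (a bare & inside CDATA is legal and must not be escaped).

```php
<?php
// Hypothetical helper: escape every "&" that does not start an entity reference.
function escapeBareAmpersands(string $line): string
{
    // Named (&amp;), decimal (&#38;) and hex (&#x26;) references are left untouched.
    return preg_replace('/&(?!(?:[A-Za-z_][\w.\-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/', '&amp;', $line);
}

// Hypothetical helper: copy $source to $target line by line, so a 100 MB file
// is never held in memory at once.
function repairXmlFile(string $source, string $target): void
{
    $in  = fopen($source, 'r');
    $out = fopen($target, 'w');
    while (($line = fgets($in)) !== false) {
        fwrite($out, escapeBareAmpersands($line));
    }
    fclose($in);
    fclose($out);
}

echo escapeBareAmpersands('<name>Johnson & Johnson &amp; Co</name>'), "\n";
// prints: <name>Johnson &amp; Johnson &amp; Co</name>
```

Running repairXmlFile('MyXML.xml', 'MyXML.fixed.xml') and pointing XMLReader at the repaired copy should make the "no name" warning go away, provided bare ampersands are the only problem.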

Related

Iterate through a CSV file and get every value for a specified header?

I have a CSV file and I want to check whether a row contains a special title. Only rows with a special title should be converted to XML, have other data added, and so on.
My question is: how can I iterate through the whole CSV file and get the value of the title field for every row?
If the value matches one of my special titles, I want to convert just that row. Any ideas on how to do that?
Sample: CSV File
I need to add that feature to my current function, which converts the whole CSV to XML, whereas I only want to convert the specified rows.
My current function:
function csvToXML($inputFilename, $outputFilename, $delimiter = ',')
{
// Open csv to read
$inputFile = fopen($inputFilename, 'rt');
// Get the headers of the file
$headers = fgetcsv($inputFile, 0, $delimiter);
// Create a new dom document with pretty formatting
$doc = new DOMDocument('1.0', 'utf-8');
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
// Add a root node to the document
$root = $doc->createElement('products');
$root = $doc->appendChild($root);
// Loop through each row creating a <row> node with the correct data
while (($row = fgetcsv($inputFile, 0, $delimiter)) !== false) {
$container = $doc->createElement('product');
foreach ($headers as $i => $header) {
$child = $doc->createElement($header);
$child = $container->appendChild($child);
$value = $doc->createTextNode($row[$i]);
$value = $child->appendChild($value);
}
$root->appendChild($container);
}
$strxml = $doc->saveXML();
$handle = fopen($outputFilename, 'w');
fwrite($handle, $strxml);
fclose($handle);
}
Just check the title before adding the rows to XML. You could do it by adding the following lines:
while (($row = fgetcsv($inputFile, 0, $delimiter)) !== false) {
$specialTitles = Array('Title 1', 'Title 2', 'Title 3'); // titles you want to keep
if(in_array($row[1], $specialTitles)){
$container = $doc->createElement('product');
foreach ($headers as $i => $header) {
$child = $doc->createElement($header);
$child = $container->appendChild($child);
$value = $doc->createTextNode($row[$i]);
$value = $child->appendChild($value);
}
$root->appendChild($container);
}
}
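The snippet above assumes the title sits in the second column ($row[1]). If the column position is not fixed, it can be looked up from the header row instead. This is a sketch only: filterCsvRows() and the column name 'title' are hypothetical, not part of the original function.

```php
<?php
// Hypothetical helper: return the rows whose $column value is listed in $wanted,
// keyed by header name. The column is located by name, not by position.
function filterCsvRows(string $inputFilename, string $column, array $wanted, string $delimiter = ','): array
{
    $inputFile = fopen($inputFilename, 'r');
    $headers   = fgetcsv($inputFile, 0, $delimiter, '"', '\\');

    // Find the position of the requested header.
    $index = array_search($column, $headers, true);
    if ($index === false) {
        fclose($inputFile);
        throw new InvalidArgumentException("No column named '$column'");
    }

    $rows = [];
    while (($row = fgetcsv($inputFile, 0, $delimiter, '"', '\\')) !== false) {
        // Skip short or blank rows, then keep only the rows with a wanted title.
        if (count($row) === count($headers) && in_array($row[$index], $wanted, true)) {
            $rows[] = array_combine($headers, $row);
        }
    }
    fclose($inputFile);
    return $rows;
}
```

Each returned row can then be passed to the same createElement()/createTextNode() loop that csvToXML() already uses.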

Read All Lines In A File And Remove Ones Containing Certain String

I'm rather new to PHP, and was trying to remove all lines where any instance of the string variable 'user' appears. My current code
if($action == "removeUser")
{
foreach(file('users.txt') as $line)
{
if (strpos($line, $parameters) !== false)
{
$line = "";
}
}
}
For some reason this doesn't seem to have any effect at all. What am I doing wrong?
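Assigning an empty string to $line only changes the loop's local copy; nothing is ever written back to users.txt. You need to read the lines, keep the ones you want, and write the result back to the file.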
<?php
if($action == "removeUser")
{
$filename = "users.txt";
// Open your file
$handle = fopen($filename, "r");
$new_content='';
echo "Valid input: <br><br>";
// Loop through all the lines
while( ($line = fgets($handle)) !== false ) // strict check, so a line containing only "0" does not end the loop
{
//try to find the string 'user' - Case-insensitive
if(stristr($line,"user")===FALSE)
{
// To remove white spaces
$line=trim($line);
if($line!='') echo $line."<br>";
//if doesn't contain the string "user",
// add it to new input
$new_content.=$line."\n";
}
}
// closes the file
fclose($handle);
$new_content=trim($new_content); // Remove the \n from the last line
echo "<br>Updating file with new content...";
file_put_contents($filename,$new_content);
echo "Ok";
}
?>
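The same idea can be written more compactly with file() and array_filter(). This is a sketch under assumptions: removeLinesContaining() is a hypothetical helper, and the match is case-insensitive, as in the answer above.

```php
<?php
// Hypothetical helper: rewrite $filename without the lines that contain $needle.
// Returns the number of lines that were removed.
function removeLinesContaining(string $filename, string $needle): int
{
    if ($needle === '') {
        return 0; // an empty needle would match every line
    }

    // Read all lines without their trailing newlines.
    $lines = file($filename, FILE_IGNORE_NEW_LINES);

    // Keep only the lines that do NOT contain the needle (case-insensitive).
    $kept = array_filter($lines, function ($line) use ($needle) {
        return stripos($line, $needle) === false;
    });

    // Write the surviving lines back, one per line.
    $content = count($kept) > 0 ? implode("\n", $kept) . "\n" : '';
    file_put_contents($filename, $content, LOCK_EX);

    return count($lines) - count($kept);
}
```

In the question's code this would be called as removeLinesContaining('users.txt', $parameters).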

extracting anchor values hidden in div tags

From an HTML page I need to extract the value of v from every anchor link. Each anchor link is nested inside about five div tags:
<a href="/watch?v=value to be retrieved&list=blabla&feature=plpp_play_all">
Each v value has 11 characters. For now I am reading the file character by character, like this:
<?php
$file=fopen("xx.html","r") or exit("Unable to open file!");
$d='v';
$dd='=';
$vd=array();
while (!feof($file))
{
$f=fgetc($file);
if($f==$d)
{
$ff=fgetc($file);
if ($ff==$dd)
{
$id=''; // reset the buffer for this value (was misspelled as $idea)
for($i=0;$i<=10;$i++)
{
$sData = fgetc($file);
$id=$id.$sData;
}
array_push($vd, $id);
}
}
}
fclose($file);
?>
That is, I am reading each character after v= into $sData and appending it to $id, so that the 11 characters end up as one string ($id).
The problem is that searching the entire HTML file for 'v=' character by character, then reading the 11 characters and pushing them into the array, takes a considerable amount of time. Please help me find a more efficient way to do this.
<?php
function substring(&$string,$start,$end)
{
$pos = strpos(">".$string,$start);
if(! $pos) return "";
$pos--;
$string = substr($string,$pos+strlen($start));
$posend = strpos($string,$end);
$toret = substr($string,0,$posend);
$string = substr($string,$posend);
return $toret;
}
$contents = @file_get_contents("xx.html");
$old="";
$videosArray=array();
while ($old <> $contents)
{
$old = $contents;
$v = substring($contents,"?v=","&");
if($v) $videosArray[] = $v;
}
//$videosArray is array of v's
?>
I would rather parse the HTML with DOMDocument, SimpleXML and XPath:
// Get your page HTML string
$html = file_get_contents('xx.html');
// As per comment by Gordon to suppress invalid markup warnings
libxml_use_internal_errors(true);
// Create SimpleXML object
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);
// Find a nodes
$anchors = $xml->xpath('//a[contains(@href, "v=")]');
foreach ($anchors as $a)
{
$href = (string)$a['href'];
$url = parse_url($href);
parse_str($url['query'], $params);
// $params['v'] contains what we need
$vd[] = $params['v']; // push into array
}
// Clear invalid markup error buffer
libxml_clear_errors();
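If speed matters more than robustness, a single regular expression pass over the file contents is another option. This is only a sketch: extractVideoIds() is a hypothetical helper, and the character set [A-Za-z0-9_-] for the 11-character value is an assumption, since the question only states the length.

```php
<?php
// Hypothetical helper: collect every distinct 11-character v value in $html.
function extractVideoIds(string $html): array
{
    // Match "?v=", "&v=" or "&amp;v=" followed by exactly 11 allowed characters.
    preg_match_all('/(?:[?&]|&amp;)v=([A-Za-z0-9_-]{11})/', $html, $matches);

    // $matches[1] holds the captured values; drop duplicates and reindex.
    return array_values(array_unique($matches[1]));
}

// Usage: $vd = extractVideoIds(file_get_contents('xx.html'));
```

Unlike the DOM approach, this also matches v= parameters that appear outside of anchor tags, so it fits best when the page markup is known and regular.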

CSV generation in PHP Smarty

I have the PHP output, which is in table form, and I want that output in CSV format. How can I get that using PHP Smarty?
You should parse that PHP output with an xml parser. Use something like:
$phpOutput = ... ; //this is the output containing only the table
$xml = simplexml_load_string($phpOutput);
$csvOutput = "";
foreach ($xml->tr as $rows)
{
$cells = array();
foreach ($rows->td as $cell)
{
$cells[] = (string)$cell;
}
$csvOutput .= implode(",",$cells)."\r\n";
}
$smarty->assign("csv",$csvOutput);
Of course you must be careful and close every tag if you don't want to get warnings.
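One caveat with implode(","): cells that themselves contain commas, quotes or line breaks are not escaped. A possible way around this, sketched here with a hypothetical helper rowsToCsv(), is to let fputcsv() do the quoting into an in-memory stream.

```php
<?php
// Hypothetical helper: turn an array of rows into a CSV string with proper quoting.
function rowsToCsv(array $rows): string
{
    // php://temp is an in-memory stream that spills to disk only when it grows large.
    $buffer = fopen('php://temp', 'r+');
    foreach ($rows as $cells) {
        fputcsv($buffer, $cells, ',', '"', '\\');
    }
    rewind($buffer);
    $csv = stream_get_contents($buffer);
    fclose($buffer);
    return $csv;
}

echo rowsToCsv([['Name', 'Note'], ['Smith, John', 'said "hi"']]);
// prints:
// Name,Note
// "Smith, John","said ""hi"""
```

In the loop above, each $cells array would be collected into $rows, and the result passed to $smarty->assign("csv", rowsToCsv($rows)).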

Indentation with DOMDocument in PHP

I'm using DOMDocument to generate a new XML file and I would like for the output of the file to be indented nicely so that it's easy to follow for a human reader.
For example, when DOMDocument outputs this data:
<?xml version="1.0"?>
<this attr="that"><foo>lkjalksjdlakjdlkasd</foo><foo>lkjlkasjlkajklajslk</foo></this>
I want the XML file to be:
<?xml version="1.0"?>
<this attr="that">
    <foo>lkjalksjdlakjdlkasd</foo>
    <foo>lkjlkasjlkajklajslk</foo>
</this>
I've been searching around looking for answers, and everything that I've found seems to say to try to control the white space this way:
$foo = new DOMDocument();
$foo->preserveWhiteSpace = false;
$foo->formatOutput = true;
But this does not seem to do anything. Perhaps this only works when reading XML? Keep in mind I'm trying to write new documents.
Is there anything built-in to DOMDocument to do this? Or a function that can accomplish this easily?
DOMDocument will do the trick. I personally spent a couple of hours Googling and trying to figure this out, and I noticed that if you use
$xmlDoc = new DOMDocument ();
$xmlDoc->loadXML ( $xml );
$xmlDoc->preserveWhiteSpace = false;
$xmlDoc->formatOutput = true;
$xmlDoc->save($xml_file);
in that order, it just doesn't work. But if you use the same code in this order:
$xmlDoc = new DOMDocument ();
$xmlDoc->preserveWhiteSpace = false;
$xmlDoc->formatOutput = true;
$xmlDoc->loadXML ( $xml );
$xmlDoc->save($xml_file);
it works like a charm. Hope this helps.
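A likely explanation for why the order matters: preserveWhiteSpace is consulted while the document is being parsed, whereas formatOutput is consulted when it is saved. If whitespace-only text nodes are already in the tree, libxml appears to leave their parent element unformatted. The sketch below compares both orders on an assumed sample string:

```php
<?php
// Sample input with a newline (a whitespace-only text node) inside the root element.
$xml = "<this attr=\"that\">\n<foo>a</foo><foo>b</foo></this>";

// Flags set BEFORE loadXML(): blank text nodes are dropped while parsing.
$before = new DOMDocument();
$before->preserveWhiteSpace = false;
$before->formatOutput = true;
$before->loadXML($xml);

// Flags set AFTER loadXML(): the "\n" text node is already part of the tree.
$after = new DOMDocument();
$after->loadXML($xml);
$after->preserveWhiteSpace = false;
$after->formatOutput = true;

echo $before->saveXML(); // each <foo> on its own indented line
echo $after->saveXML();  // both <foo> elements stay on one line
```

When the document is built from scratch with createElement(), no whitespace nodes exist, so formatOutput alone is enough.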
After some help from John and playing around with this on my own, it seems that even DOMDocument's inherent support for formatting didn't meet my needs. So, I decided to write my own indentation function.
This is a pretty crude function that I just threw together quickly, so if anyone has any optimization tips or anything to say about it in general, I'd be glad to hear it!
function indent($text)
{
// Create new lines where necessary
$find = array('>', '</', "\n\n");
$replace = array(">\n", "\n</", "\n");
$text = str_replace($find, $replace, $text);
$text = trim($text); // for the \n that was added after the final tag
$text_array = explode("\n", $text);
$open_tags = 0;
foreach ($text_array AS $key => $line)
{
if (($key == 0) || ($key == 1)) // The first two lines (XML declaration and root tag) are not indented
$tabs = '';
else
$tabs = str_repeat("\t", $open_tags); // one tab per currently open tag; avoids appending to an undefined variable
if ($key != 0)
{
if ((strpos($line, '</') === false) && (strpos($line, '>') !== false))
$open_tags++;
else if ($open_tags > 0)
$open_tags--;
}
$new_array[] = $tabs . $line;
unset($tabs);
}
$indented_text = implode("\n", $new_array);
return $indented_text;
}
I have tried running the code below setting formatOutput and preserveWhiteSpace in different ways, and the only member that has any effect on the output is formatOutput. Can you run the script below and see if it works?
<?php
echo "<pre>";
$foo = new DOMDocument();
//$foo->preserveWhiteSpace = false;
$foo->formatOutput = true;
$root = $foo->createElement("root");
$root->setAttribute("attr", "that");
$bar = $foo->createElement("bar", "some text in bar");
$baz = $foo->createElement("baz", "some text in baz");
$foo->appendChild($root);
$root->appendChild($bar);
$root->appendChild($baz);
echo htmlspecialchars($foo->saveXML());
echo "</pre>";
?>
Which method do you call when printing the xml?
I use this:
$doc = new DOMDocument('1.0', 'utf-8');
$root = $doc->createElement('root');
$doc->appendChild($root);
(...)
$doc->formatOutput = true;
$doc->saveXML($root);
It works perfectly, but it prints only the element, so you must print the <?xml ... ?> declaration manually.
Most answers in this topic deal with xml text flow.
Here is another approach using the dom functionalities to perform the indentation job.
The loadXML() dom method imports indentation characters present in the xml source as text nodes. The idea is to remove such text nodes from the dom and then recreate correctly formatted ones (see comments in the code below for more details).
The xmlIndent() function is implemented as a method of the indentDomDocument class, which is inherited from domDocument.
Below is a complete example of how to use it :
$dom = new indentDomDocument("1.0");
$xml = file_get_contents("books.xml");
$dom->loadXML($xml);
$dom->xmlIndent();
echo $dom->saveXML();
class indentDomDocument extends domDocument {
public function xmlIndent() {
// Retrieve all text nodes using XPath
$x = new DOMXPath($this);
$nodeList = $x->query("//text()");
foreach($nodeList as $node) {
// 1. "Trim" each text node by removing its leading and trailing spaces and newlines.
$node->nodeValue = preg_replace("/^[\s\r\n]+/", "", $node->nodeValue);
$node->nodeValue = preg_replace("/[\s\r\n]+$/", "", $node->nodeValue);
// 2. Resulting text node may have become "empty" (zero length nodeValue) after trim. If so, remove it from the dom.
if(strlen($node->nodeValue) == 0) $node->parentNode->removeChild($node);
}
// 3. Starting from root (documentElement), recursively indent each node.
$this->xmlIndentRecursive($this->documentElement, 0);
} // end function xmlIndent
private function xmlIndentRecursive($currentNode, $depth) {
$indentCurrent = true;
if(($currentNode->nodeType == XML_TEXT_NODE) && ($currentNode->parentNode->childNodes->length == 1)) {
// A text node being the unique child of its parent will not be indented.
// In this special case, we must tell the parent node not to indent its closing tag.
$indentCurrent = false;
}
if($indentCurrent && $depth > 0) {
// Indenting a node consists of inserting before it a new text node
// containing a newline followed by a number of tabs corresponding
// to the node depth.
$textNode = $this->createTextNode("\n" . str_repeat("\t", $depth));
$currentNode->parentNode->insertBefore($textNode, $currentNode);
}
if($currentNode->childNodes) {
$indentClosingTag = false;
foreach($currentNode->childNodes as $childNode) $indentClosingTag = $this->xmlIndentRecursive($childNode, $depth+1);
if($indentClosingTag) {
// If children have been indented, then the closing tag
// of the current node must also be indented.
$textNode = $this->createTextNode("\n" . str_repeat("\t", $depth));
$currentNode->appendChild($textNode);
}
}
return $indentCurrent;
} // end function xmlIndentRecursive
} // end class indentDomDocument
Yo peeps,
I just found out that libxml apparently will not re-indent an element that has a text node as a direct child (mixed content). This is quite nonintuitive, but apparently it is the reason that, for instance,
$x = new \DOMDocument;
$x -> preserveWhiteSpace = false;
$x -> formatOutput = true;
$x -> loadXML('<root>a<b>c</b></root>');
echo $x -> saveXML();
will fail to indent.
https://bugs.php.net/bug.php?id=54972
So there you go, hope that helps.
header("Content-Type: text/xml");
$str = "";
$str .= "<customer>";
$str .= "<offer>";
$str .= "<opened></opened>";
$str .= "<redeemed></redeemed>";
$str .= "</offer>";
echo $str .= "</customer>";
If you are using any file extension other than .xml, first set the Content-Type header to the correct value.
