compress.zlib adding symbols to the end of xml string - php

I wrote the following php code to:
1. download an xml.gz file
2. decompress the file and get the contents as a string
3. save it as an xml file.
$data = file_get_contents('compress.zlib://'.$url);
file_put_contents($name, $data);
echo file_get_contents($name);
The code works absolutely fine except for one irritating bug.
The XML file it generates (after decompressing), has several symbols attached to the end of it such as these:
qD�_}7
I need to fix this bug because when I parse this XML string in javascript to convert into an XML object, it gives the following error - "Invalid XML"
Please suggest corrections or alternative solutions to the problem.

Related

Extract XML from .prt file using PHP but file becomes unreadable when opened with PHP

I have a .prt (CAD Design File) that I need to extract some XML from using PHP. When I view this file directly in the browser, I can see the XML along with some unreadable areas. However, when I go to open it using PHP to get the XML I need from it, the file becomes mostly unreadable and the XML is no where to be found as the file looks like it was encrypted.
This is an example of what the .prt file looks like when opened directly in the browser: File in Browser
This is an example of what the file looks like when opened using PHP: Using PHP
This is how I am trying to open the file with PHP:
$handle = fopen("thePart.prt", "rb");
$contents = trim(stream_get_contents($handle));
fclose($handle);
//echo out contents to see what happens
echo $contents;
If I could get this file to open without doing what it is doing, I can get the XML out of it myself. How do I fix the issue that I am having? Thank you very much in advance.
Real Answer
Turns out that there was no problem at all with the code. The browser was just interpreting the XML tags as HTML and so the data was not displayed (PHP by default sets a content type of text/html). When viewing the source code, the XML was plain and visible. The XML can also be seen without viewing the source by setting the content type of the php file:
header('Content-Type: text/plain');
This way, the browser will just display the XML as it is, without attempting to parse it as HTML first.
Initial Answer
Just a guess here, but it might be that you're opening the file in binary mode (the "rb" in your first line of code. Try opening it as a plain text file (use "r" instead of "rb").
More likely, it's an encoding issue where PHP is trying to decode a UTF-8 file as ASCII, for instance. Since you are opening a binary file (CAD Design File is binary with a little XML, I'm assuming), PHP might be getting confused while trying to detect the encoding of the file. I would need a copy of the file to know for sure.
Try comparing the result of mb_detect_encoding:
mb_detect_encoding($contents)
and the actual encoding of the XML data within the .prt file. If they are different, that's how you know that PHP is using the wrong encoding. In that case, use mb_convert_encoding to convert from PHP's detected encoding to that of the XML data.

getting xml data as is(with tags, attributes and values) with php

Hello im struggling with a problem. I have an url that contains xml data...
when i'm using file_get_contents($url) or fopen($url,'r') it gives me only values:
Consider the xml:
<tag1 attrName="something">
<tag2>some Value</tag2>
<tag2>some Other Value</tag2>
...
...
</tag1>
what i get: some Value, some Other Value
But i need to get whole xml (with tags and attributes and its' values) and parse it with my own way because there's a restriction that i'm not allowed to use php 5.x practices.I mean i cant use any parser.. It shouldnt be so hard to get xml data as is.. should it??
what i get: some Value, some Other Value
Nope - my suspicion is that that is what you see in your browser, because it is swallowing all <tags>.
The XML source code will be there after a file_get_contents() operation.
You are using file_get_contents() which states
This function is similar to file(), except that file_get_contents()
returns the file in a string, starting at the specified offset up to
maxlen bytes. On failure, file_get_contents() will return FALSE.
Press Ctrl+u to see the source code in any of the major browsers(except IE where its F12 in IE9). I am sure that your code will be there. Your browser wont display the tags that's all.
The other longer(but better way) to display an XML file from your php file is to pass the content type as text/xml. Use the following way
<?php
header("Content-Type: text/xml");//SHOULD come before any output
// dynamically generate and output your xml here
?>

Load an invalid XML in PHP DOM

I have and input XML file that is not correctly formatted ( ie. it has '&' instead of '& amp;')
When i try to load this XML using PHP DOM, $doc->load("file.xml") it throws and error and stops the parsing.
Is there any way to load this un-formatted XML? and No I cant edit the source XML file.
I did try using $doc->loadHTML() but it throws errors all over the place.
I wanted to know if there is a proper way to do this (like load file contents and change it using regex or something similar)
Try setting $doc->validateOnParse = false; before loading your XML via $doc->loadHTML(...).
First, check that it's the & that's causing the error and not something else.
One way or another, you'll have to modify the XML to get it parsed. The HTML in loadHTML is loaded from a string, can't you just replace the invalid characters with the correct ones?
If your installation supports the PHP Tidy extension (http://php.net/manual/en/book.tidy.php) you could try to clean it up with that, though in my experience it's far from foolproof.
If you are sure that's the only thing making it not validate, then you could try loading the file into a string with file_get_contents() function, then search & replace through the string to change the &'s into &'s, then place that string into simpleXML like $xml = simplexml_load_string($cleaned_string);

Why do I see `á` instead of a space when writing to screen (encoding problem)?

I am completely lost with encoding issues, I have no idea what's going on, what the problem is exactly and how to fix it.
Basically I'm just trying to read an HTML file from a Zip file, parse it then output pieces to XML. Now something funky is happening with the text I get out of the parser.
When parsing the HTML, instead of a space I get á only if I write to the screen. If I keep it in a variable and write to a file it looks fine in the file. However even though it looks right in the XML something is wrong with it, my PHP parser can't parse that XML nor does IE seem to like it.
I had to first mb_convert_encoding($xmlcontent, "ASCII"); so I could get that XML to parse in PHP.
Any idea what my problem is?
extract HTML from a .tar.gz file using Perl
my $tar = Archive::Tar->new;
$tar->read("myfile.tar.gz");
$tar->extract_file('index.html', 'output.html');
load HTML, this is where it starts to get funky, I get output like Numberáofásourceálines
my $tree = HTML::TreeBuilder->new;
$tree->parse_file('output.html') or die $!;
$tree->elementify;
write to XML
my $output = new IO::File(">output.xml");
my $writer = new XML::Writer(OUTPUT => $output, DATA_MODE => 1,DATA_INDENT => 2);
If it looks correct when you write it to a file and wrong when you write it to the terminal, it sounds like your terminal is expecting the wrong encoding. Check your terminal settings.'
Also, see Jon Rockway's answer to "Why does modern Perl avoid UTF-8 by default?". With encodings, you have to convert your input to the correct encoding and convert your output to the correct encoding. Everything that looks at the data needs to know which encoding you're using.
I think I just fixed it by processing this on the html before parsing it, thanks for all the great pointers!
s/\&nbsp\;/ /g;

XML not well formed error

I have a php script that writes xml data to a file and another one that sends the contents of this file to the client as the response.
But on the client side,im getting the following error:
XML Parsing Error: not well-formed
When i view source of the page, the XML i see is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<books><date>December 24th, 2009</date><total>2</total><book><name>Book 1</name><url>http://www.mydomain.com/posters/68370/img.jpg</url></book><book><name>Book 2</name><url>http://www.anotherdomain.com/posters/76198/img1.jpg</url></book></books>
In file1.php i have the following code that writes the XML to a file :
$file= fopen("book_results.xml", "w");
$xml_writer = new XMLWriter();
$xml_writer->openMemory();
$xml_writer->startDocument('1.0', 'UTF-8', 'yes');
$xml_writer->startElement('books');
$xml_writer->writeElement('date',get_current_date()); // Like December 23rd, 2009
$xml_writer->writeElement('total',$totalResults);
foreach($bookList as $key => $value) { /* $bookList contains key value pairs */
$xml_writer->startElement('book');
$xml_writer->writeElement('name',$key);
$xml_writer->writeElement('url',$value);
$xml_writer->endElement(); //book
}
$xml_writer->endElement(); //books
$xml_data = $xml_writer->outputMemory();
fwrite($file,$xml_data);
fclose($file);
And in index.php, i have the following code to send the contents of the file as a response
<?php
//Send the xml file contents as response
header('Content-type: text/xml');
readfile('book_results.xml');
?>
What could be causing the error ?
Please help.
Thank You.
The above looks good to me (including the fact that you're forming the XML via a dedicated component) and either:
what you're using to validate this is wrong
you're looking at something different to what you think you are
I would definitely try another tool/browser/whatever to validate this. Additionally, you may want to save the XML file as sent to the browser, and check it using XMLStarlet (a command-line XML toolkit).
I'm wondering also if it's an issue that we can't easily see - a character encoding problem or a Byte-Order-Mark issue (related to encodings). Does the character encoding of the web page you're sending match/differ from the encoding of the XML (UTF-8).
There are some free websites and tools for checking for validity in XML.
According to the XML Validator, when I pasted your XML above into the textarea, it said "no errors found".
However, Validome says "Can not find declaration of element 'books'."
Perhaps Jeff's suggestion of changing date and total to attributes might help. It would probably be easy to try that.
Have you tried using those 2 loose date and total tags as attributes instead?:
<books date="December 24th" total="2">
Also, xml can be quite sensitive. Make sure to use CDATA tags were appropriate
It validates fine in WMHelp XMLPad 3.0.1.0, and opens fine in FireFox 3.0.8 and IE7 without errors.
The only thing I can see, from a copy and paste of your XML, is that the XML declaration is followed by a CR/LF combination (0x0D0x0A). This is platform specific (Windows), and may be an issue on the client; you didn't mention what the client was, however, so I can't be sure if that's the problem.
Ensure that you are writing UTF-8 or 7-bit ASCII encoding to the file (test with a text editor or the 'file' command, if you have it), and that your checker supports it. Keep in mind that UTF-8 can include a signature (sometimes called the byte-order mark) in the first three bytes (EF BB BF) that sometimes confuses some tools if it is there, and rarely if it is not.
xml version='1.0' encoding='UTF-8' standalone='yes'
use single quote.

Categories