XSLT stripping comments


I have a weird problem. I am using XSLT transformations with PHP, and for some reason the transformed output that is sent to the user has all comments stripped from it. This never happened before, and I have been unable to debug it. Even at the source, $xslt->transformToXML($xml), the comments are now stripped, when they weren't before.
This is particularly annoying with JS blocks that are wrapped in <!-- -->.
Any ideas?

As far as I know, unless you tell it otherwise, an XSLT transform will strip comments and processing instructions, because the built-in template rules for both node types do nothing.
If you want to keep comments you can add something like
<xsl:template match="comment()">
<xsl:comment><xsl:value-of select="."/></xsl:comment>
</xsl:template>
to your xslt file.
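To see the effect from PHP, here is a minimal sketch. It assumes the xsl extension is enabled; the input document and the surrounding stylesheet are made up for illustration, and only the comment() template comes from the answer above.

```php
<?php
// Hypothetical input document containing one comment.
$xml = new DOMDocument();
$xml->loadXML('<page><!-- keep me --><p>Hello</p></page>');

// Hypothetical stylesheet: copies elements and re-creates comments.
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="*">
    <xsl:copy><xsl:apply-templates select="node()"/></xsl:copy>
  </xsl:template>
  <xsl:template match="comment()">
    <xsl:comment><xsl:value-of select="."/></xsl:comment>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$xslt = new XSLTProcessor();
$xslt->importStylesheet($xsl);

// Without the comment() template, the comment would be missing here.
$result = $xslt->transformToXML($xml);
echo $result;
```

Note that this only carries over comments from the source XML. Comments written in the stylesheet itself are never copied to the output, so those have to be produced with xsl:comment as well.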

Related

XML declaration allowed only at the start of the document Sitemap Error [duplicate]

This error,
The processing instruction target matching "[xX][mM][lL]" is not allowed
occurs whenever I run an XSLT page that begins as follows:
<?xml version="1.0" encoding="windows-1256"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:include href="../header.xsl"/>
<xsl:template match="/">
<xsl:call-template name="pstyle"/>
<xsl:call-template name="Validation"/>
<xsl:variable name="strLang">
<xsl:value-of select="//lang"/>
</xsl:variable>
<!-- ////////////// Page Title ///////////// -->
<title>
<xsl:value-of select="//ListStudentFinishedExam.Title"/>
</title>
Note: I removed any leading spaces before the first line, but the error still occurs!

Xerces-based tools will emit the following error
The processing instruction target matching "[xX][mM][lL]" is not allowed.
when an XML declaration is encountered anywhere other than at the top of an XML file.
This is a valid diagnostic message; other XML parsers should issue a similar error message in this situation.
To correct the problem, check the following possibilities:
1. Some blank space or other visible content exists before the <?xml ?> declaration.
   Resolution: remove blank space or any other visible content before the XML declaration.
2. Some invisible content exists before the <?xml ?> declaration. Most commonly this is a Byte Order Mark (BOM).
   Resolution: remove the BOM using techniques such as those suggested by the W3C page on the BOM in HTML.
3. A stray <?xml ?> declaration exists within the XML content. This can happen when XML files are combined programmatically or via cut-and-paste. There can only be one <?xml ?> declaration in an XML file, and it can only be at the top.
   Resolution: search for <?xml in a case-insensitive manner, and remove all but the top XML declaration from the file.
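The three checks above can be sketched in PHP roughly like this. The function name and the labels it returns are invented for this example, and it only looks for a UTF-8 BOM, so treat it as a starting point rather than a complete validator.

```php
<?php
// Hypothetical helper (name and return format are invented for this sketch).
function diagnoseXmlDeclaration(string $xml): array
{
    $problems = [];

    // Invisible content: a UTF-8 byte order mark is the bytes EF BB BF.
    if (strncmp($xml, "\xEF\xBB\xBF", 3) === 0) {
        $problems[] = 'bom';
        $xml = substr($xml, 3);
    }

    // Visible content or whitespace before the first declaration.
    if (preg_match('/<\?xml\s/i', $xml, $m, PREG_OFFSET_CAPTURE) && $m[0][1] > 0) {
        $problems[] = 'leading-content';
    }

    // More than one declaration in the same document.
    if (preg_match_all('/<\?xml\s/i', $xml) > 1) {
        $problems[] = 'stray-declaration';
    }

    return $problems;
}

// A blank line before the declaration is reported as leading content.
print_r(diagnoseXmlDeclaration("\n<?xml version=\"1.0\"?><a/>"));
```

The pattern requires whitespace after <?xml so that other processing instructions such as <?xml-stylesheet ...?> are not counted as declarations.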

Debug your XML file. Either there is stray whitespace, or there are extra or missing tags.
For a clearer error message, build the project from the command line. Windows: gradlew build
In my case, AndroidManifest.xml had a blank line at the very top:
<Empty Row> // This Creates the issue
<?xml version="1.0" encoding="utf-8"?>

There was an auto-generated copyright message in the XML and a blank line before the <resources> tag; once I removed them, my build was successful.

Just remove this line: <?xml version="1.0" encoding="utf-8"?>. This kind of error only comes from this line. Alternatively, check that the declaration in your file is formatted exactly like the one shown in this answer.

I had a similar issue with 50,000 RDF/XML files in 5,000 directories (the Project Gutenberg catalog file). I solved it with riot (in the Jena distribution).
The files are at cache/epub/NN/nn.rdf (where NN is a number).
In the directory above the one that holds all the files, i.e. in cache, run:
riot epub/*/*.rdf --output=turtle > allTurtle.ttl
This may produce many warnings, but the result is in a format that can be loaded into Jena (using the Fuseki web interface).
Surprisingly simple (at least in this case).

Another cause of the above error is a corrupted jar file. I got the same error with JUnit when running unit tests. Removing the jar and downloading it again fixed the issue.

In my case it was a wrong path in a config file: the file was not found, and this exception came out:
Error configuring from input stream. Initial cause was The processing
instruction target matching "[xX][mM][lL]" is not allowed.

For PHP, put this line of code before you start printing your XML:
while(ob_get_level()) ob_end_clean();

It's worth checking your server's folders to see if there's a stray pom.xml hanging around.
I found that I had the problem everyone else described with a malformed pom.xml, but in a folder that I didn't expect to be on the server. An old build was sticking around unwelcome D:

In my case, a tab was the troublemaker. Replacing the tab with a space should resolve the issue.

How to extract part of an attributes value using XSLT

I have the following line of code in a HTML file (or something similar):
...
Link Content
...
I need to be able to extract the a/b/c/d part of the href and convert the link to something like:
Link Content
Ideally I'd like to be able to do this with regex, but most of the regex stuff I've seen for XSLT on StackOverflow seems to require XPath 2.
Ah yes... I'm using SimpleXML/DomDocument on PHP5.3 to apply the stylesheet which I believe doesn't support v2 xslt.
I think I could do string replacement to lose the first part, but I'd like to have a pattern match to extract it.
Any thoughts?

As already pointed out in the answer given by michael.hor257k, you have to adjust the & character to have valid XML. Given an input containing for example
Link Content
the following template
<xsl:template match="a/@href[starts-with(.,'#SCRIPT_NAME#')]">
<xsl:attribute name="href">
<xsl:value-of select="concat('/lookup?id=', substring-after(.,'id='))"/>
</xsl:attribute>
</xsl:template>
changes the link to
Link Content
matching every href starting with #SCRIPT_NAME#.
Though it's not clear from the question which is the part that has to be matched / how to identify the links that have to be adjusted, possibly you can adjust this example to fit your requirements or provide further input to your question.

most of the regex stuff I've seen for XSLT on StackOverflow seems to
require XPath 2.
Not most: all. Unless your specific XSLT 1.0 processor offers regex as a (processor-specific) extension.
Now, the part missing from your question is how to recognize the part that you want to extract from the existing value. If, for example, it is always the substring that comes after (the first occurrence of) "id=", then you could use the substring-after() function to retrieve it.
Or at least in theory you could. In practice, nothing will work with the given example, because it contains an unescaped & character - a big no-no in XML.
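As a rough illustration of the substring-after() approach in plain XSLT 1.0 from PHP, here is a sketch. The href value is invented (the original link is not shown above), the & is escaped as &amp; so the input is well-formed, and the match condition contains(., 'id=') is only a guess at how the links could be recognized.

```php
<?php
// Hypothetical input: the href value is made up for illustration.
$xml = new DOMDocument();
$xml->loadXML('<div><a href="/script.php?page=1&amp;id=a/b/c/d">Link Content</a></div>');

$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <!-- Identity template: copy everything unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
  <!-- Rewrite only href attributes that contain "id=" -->
  <xsl:template match="a/@href[contains(., 'id=')]">
    <xsl:attribute name="href">
      <xsl:value-of select="concat('/lookup?id=', substring-after(., 'id='))"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$xslt = new XSLTProcessor();
$xslt->importStylesheet($xsl);

// Everything after the first "id=" is kept and given the new prefix.
$result = $xslt->transformToXML($xml);
echo $result;
```

The identity template keeps the rest of the document intact, so only the matching href attributes change.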

This is just a shot in the dark, but if you are specifically looking to solve this with a regex, you may be able to use something like the following:
$xslt_string = 'Link Content';
preg_match('/href=".+?id=(.+?)"/', $xslt_string, $matches);
print_r($matches);
https://regex101.com/r/rY7oY7/1

Parsing XML - line feed, carriage return??? I'm confused?

OK, I have searched for about 3 hours and have decided to post this. I am pulling an XML feed and have one XML element that has a bunch of text making up one paragraph. When I look at the source, though, I see it broken up with carriage returns (as mentioned in the title, not sure if that's the correct term).
Here is the feed I am pulling from: http://jobs.cbizsoft.com/cbizjobs/jobdetail_post.aspx?cid=cbiz_advantech&jobid=Req-0005
I am using php to build the xml file and then jquery/ajax to build the page as needed.
My question is if I can use php to parse the breaks and format the output to look nicer?
Thanks for the help!

OK, if I understand you correctly, the problem is that when the text is output in your HTML document, the line breaks are gone. This is because in HTML, line breaks (like all white space) are collapsed into a single space, so
<div>Hello World!</div>
and
<div>Hello
World!</div>
produce the same output.
There are several ways you can solve this:
Put the CSS style white-space: pre-line (or pre-wrap) on the surrounding element.
Or use PHP to replace all line breaks with <br>
Or use a Markdown library that basically does the same as the second point, but with additional kinds of formatting, such as properly wrapping paragraphs or turning bullet lists into real HTML lists.
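For the second option, a minimal PHP sketch could look like the following. The description text is a made-up stand-in for whatever the feed returns; escaping with htmlspecialchars() before nl2br() is my own addition so that characters like & do not break the HTML.

```php
<?php
// Hypothetical description text as it might arrive from the XML feed.
$description = "- Support acquisition processes.\n- Analyze policy & provide support.";

// Escape HTML special characters first, then insert <br> before each newline.
// The second argument (false) asks nl2br() for <br> instead of <br />.
$html = nl2br(htmlspecialchars($description), false);

echo '<div>' . $html . '</div>';
```

nl2br() keeps the original newline after each inserted <br>, so the page source stays readable too.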

You should be using a CDATA section for the description data so that any offending characters are ignored by XML parsers
<Item name='Description' caption='Description'>
<![CDATA[
- Support acquisition and installation processes and coordinate with multiple SPAWAR and Navy stakeholders.
- Analyze acquisition policy life cycle and provide analytical support.
...
]]>
</Item>
Melaos is correct that carriage returns (and other whitespace characters) are valid within XML documents

See the nl2br() function, which will put HTML line breaks for each actual line in the text.
Here's a quicky example with your XML.

XSLTProcessor in PHP always removes white space

I'm trying to use XSLTProcessor to combine some XML and an XSLT stylesheet into an HTML file.
However, it always outputs the HTML on one line.
So for example my XSLT:
<p>
<strong>my sheet</strong>
this is <strong>my</strong> <em>style</em>
</p>
Turns into:
<p><strong>my sheet</strong>this is <strong>my</strong><em>style</em></p>
I am using:
<xsl:preserve-space elements="*" />
<xsl:output method="html" version="4.0" encoding="iso-8859-1" indent="yes"/>
But I would like to preserve my html as it is.
Does anyone have any ideas?

preserve-space deals with the processing of elements and their contents from the data file, and does not affect how the script is parsed. The short answer is that you can't, and shouldn't.
If you have significant whitespace (for example two spans which need a space in between to prevent the words running together) then you add it in with <xsl:text> </xsl:text>. If you don't have significant whitespace (for example, between <h1>..</h1> space <p>...), then you shouldn't try to add it in.
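Here is a small sketch of the <xsl:text> approach, run from PHP. It assumes the xsl extension is enabled; the stylesheet and the dummy <root/> input are invented for illustration, and indent="no" is set so the serializer adds nothing of its own.

```php
<?php
// Hypothetical stylesheet: the space between </strong> and <em> is
// whitespace-only, so it has to be wrapped in xsl:text to survive.
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="no"/>
  <xsl:template match="/">
    <p>this is <strong>my</strong><xsl:text> </xsl:text><em>style</em></p>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

// Dummy input document; the template above ignores its content.
$xml = new DOMDocument();
$xml->loadXML('<root/>');

$xslt = new XSLTProcessor();
$xslt->importStylesheet($xsl);

$result = $xslt->transformToXML($xml);
echo $result;
```

The text "this is " keeps its trailing space without any help because that text node is not whitespace-only; only the lone space between the two inline elements needs <xsl:text>.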
XML is there to precisely, reliably transfer a document tree from one program to another, and being pretty is in no way part of its job. XSLT won't add in whitespace, because it doesn't know where it is safe to do so, and it won't take it away, because it doesn't know where that is useful. Remember XSLT knows nothing about HTML; it's markup language independent. To do what you want, XSLT would need to know that it can put space around block elements (h1, p, etc) but not around spans, otherwise you might get floating punctuation:
my cunning paragraph with
<span>text</span>
, and more
The above is clearly not acceptable output. Because it doesn't know which elements are safe and which aren't, XSLT takes the obviously correct option and doesn't risk misprocessing your data for the sake of some pretty-printing.
XML is not designed to be written by hand, nor read as raw data. Don't try it. Open the XML output in Firefox, and it can do the formatting for you; if you want it to look pretty, do that in another application.
For completeness, there is in fact one safe way of doing pretty printing without affecting spacing:
<root
><h1>The correct way of handling pretty-printing with XML</h1
><p
>A test paragraph with a <span
>span</span
>, which won't break</p
></root
>
Finally, kill ISO-8859-1. It must die. Try to avoid h1 inside p.

Cross compatible CSS (positioning)?

I have a site: http://www.quass.com/erase.php
Position of the flash widget is fine in Firefox but not in IE8
What's the reason? How to fix it?

You don't define a DOCTYPE, so the page is rendered in quirks mode. You need to use a proper DOCTYPE. Here you can find what a DOCTYPE is and which options you have. You must add it at the top of your HTML document. If you want to use HTML5, the DOCTYPE is still needed, so you have to use <!DOCTYPE HTML>. Then, whatever DOCTYPE you choose, you can validate your web page with the W3C validator.

As answered by reiso above, you have a malformed HTML problem. And while Firefox doesn't mind that much, IE8 is a bit picky about this.
Check your source and make sure every <div> tag is properly closed. If you make the HTML well-formed and standards-compliant, I'm 100% sure everything will work as you wish :)
Once I had the very same problem, and guess what? It was just an unclosed <div> tag that caused IE to mess around with everything.

Your code is malformed. Most specifically you are missing a doctype declaration. <!doctype html> is a decent choice for starters — there should be absolutely no characters before it in your source.
