Showing posts with label XSLT. Show all posts
Showing posts with label XSLT. Show all posts

Monday, 24 May 2010

Removing whitespace from HTML tags

The Problem

This situation occurs when using the .net 2.0 classes to transform XML into HTML with a style sheet the transform (System.Xml.Xsl.XslCompiledTransform).  the output will include whitespace inside of certain HTML tags that may occasionally display within form fields when output.  This can be a pain especially when displaying TextArea elements. In IE the behaviour appeared to change unexpectedly when some other element appeared before the TextArea – such as a dropdown list (select).

The Solution

A solution is to remove the whitespaces from these tags after it has been transformed.

private string CleanUpWhitespaceInTags(string xmlString)
{
return cleanUpWhitespaceInTagsRegEx.Replace(xmlString, cleanUpWhitespaceInTagsReplacement);
}

private static readonly Regex cleanUpWhitespaceInTagsRegEx = new Regex(@"<(?<tagName>[\w-]+)\b(?<attributes>[^>]*)>\s*(?<content>[^<]*?)\s*</\1>");
private const string cleanUpWhitespaceInTagsReplacement = "<${tagName}${attributes}>${content}</${tagName}>";


The above code uses a regular expression to match tags that have no inner xml/html and gets named captures of the tag name, it’s attributes and content (without any surrounding whitespace).



To break it down:

























































< match the opening tag’s starting brace
(?<tagName>[\w-]+) capture any valid tag name (containing one or more word characters or dashes)
\b check that we are at a word boundary
(?<attributes>[^>]*) capture any characters after the tag name and before an end brace as attributes
> match the opening tag’s end brace
\s* match any whitespace on the left of the content
(?<content>[^<]*?) capture any character that isn’t a starting brace but don’t be greedy
\s* match any whitespace on the right of the content
</\1> match the closing tag


Disclaimer: Whether this regex covers all scenarios has not been tested but it works for the situation we encountered.



The replacement string then reconstitutes the tag, attributes and content without the leading or trailing whitespace.



Technorati Tags: ,,

Wednesday, 14 January 2009

Using an ASP.NET handler to output a CSV file

I wanted a ASP.NET handler to output some data as a CSV file with the most likely target being to open it in Excel.

I had the following problems to resolve:

  1. Force download rather than opening in IE.

  2. Numbers beginning with zeros (0) must not be stripped off in Excel.

  3. Creating the CSV from an XML file using XSLT.

  4. Decoding entities (e.g. &amp;) in XSLT.

  5. Accessing Session State from the handler code.
So lets look at these in order ...

1. Force download rather than opening in IE.

This one is fairly simple to resolve with some addition of headers:

context.Response.ContentType = "text/csv";
context.Response.AddHeader("Content-Disposition", "inline; filename=\"file.csv\"");

The only thing to look out for here is to use Response.AddHeader rather than the Response.Headers collection as it won't work on most IIS platforms.

2. Numbers beginning with zeros (0) must not be stripped off in Excel.

This solution is a disappointing but effective one. Enclose the number in quotes with a leading equals sign.

So 01234 becomes ="001234"

This is a very Excel bias solution but was good enough for me. Another solution is to lead with a single quote but unfortunately this will initially show in the Excel cell until the user 'edits' the cell then it disappears.

3. Creating the CSV from an XML file using XSLT.

This is fairly straightforward, here's the XSLT:

<xsl:stylesheet version="1.0" xsl="http://www.w3.org/1999/XSL/Transform" msxsl="urn:schemas-microsoft-com:xslt" prefixes="msxsl">
<xsl:output method="text" encoding="utf-8">

<xsl:template match="doc">
<xsl:apply-templates select="row">
</xsl:apply-templates>

<xsl:template match="row">
<xsl:for-each select="*">
<xsl:value-of select="." disable-output-escaping="yes" />
<xsl:if test="position() != last()">
<xsl:value-of select="','" />
</xsl:if>
</xsl:for-each>
<xsl:text>&#10;</xsl:text>
</xsl:template>
</xsl:stylesheet>

This assumes an Xml document hierarchy of doc > row > fields.

This is fine except that often you want to output only certain fields, perhaps in a particular format, and show the field headings in the first row. The above can be adapted for this by swapping out the "for-each" for a "value-of" naming each field individually.

4. Decoding entities (e.g. &amp;) in XSLT.

In the above Xslt you will notice that I have added disable-output-escaping which will do just that.

5. Accessing Session State from the handler code.

The class definition for a handler is by default:

public class CsvHandler : IHttpHandler

To indicate that you need session state too you need to add another interface although this is just a marker (probably should have been an attribute), I only need read access so:

public class CsvHandler : IHttpHandler, IReadOnlySessionState


Okay that covers all the issues listed above.