Title: BBCode Parser
Description: A very flexible, extendable and configurable parser for BBCode.
Author: Paulo Santos
eMail: pjondevelopment@gmail.com
Environment: VB.NET 2008 (Visual Studio 2008)
Keywords: BBCode Parser Tokenizer

BBCode Parser

DISCLAIMER

The Software is provided "AS IS", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and non-infringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the Software or the use or other dealings in the Software.

Introduction

For the last few weeks I've been working on improving the website of the company I work for. One of my tasks was to let end users edit rich text content while giving management enough peace of mind that the users wouldn't mess up the site design and configuration.

So I proposed using BBCode, which is common on Internet forums (phpBB and Snitz, among others) and therefore has a very low learning curve, if any, for our user base.

The problem is that the few BBCode parsers I found online weren't flexible and extensible enough to meet our needs. So, as any developer would, I rolled up my sleeves and coded one myself.

Parsing the BBCode

The first thing I took into account when creating this parser was that it needed to be flexible: it should be possible to add as many tags as wanted and to format the text in any way.

The heart of the parser is a tokenizer I created a while back for another project that never took off. It was robust enough for this project, so I reused it.

With the tokenizer in place, the rest of the code pretty much wrote itself, and the objective was accomplished.

Within the syntax of BBCode, a tag starts with an open square bracket '[' and ends with a close square bracket ']'. Anything between these two markers is considered a tag.

BBCode Tags

There are basically three types of tags:

  1. Simple Tags [tag]
  2. Value Tags [name=value]
  3. Parametrized Tags [tag param=value]

Each tag may or may not require a closing tag. A closing tag has the same name as the opening tag, preceded by a forward slash '/'.

The name of a tag can contain any character except the equals sign (=) and the square brackets ([]).

Yes, I know that some parsers out there allow square brackets within a tag, but this is a limitation I can live with.
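To make the three forms and the naming rule concrete, here is a minimal sketch that classifies a single bracketed candidate. It is not the parser's tokenizer: TagKind, TagSyntaxSketch and Classify are hypothetical names, and the rule that a name ends at the first space or equals sign is an assumption made for the illustration.

```vbnet
Imports System

' Hypothetical names for illustration; none of this is the parser's own code.
Public Enum TagKind
    NotATag
    Simple        ' [tag]
    Value         ' [name=value]
    Parametrized  ' [tag param=value]
    Closing       ' [/tag]
End Enum

Public Module TagSyntaxSketch

    ' Classifies one bracketed candidate such as "[url=http://example.com]".
    Public Function Classify(ByVal text As String) As TagKind
        If text Is Nothing OrElse text.Length < 3 Then Return TagKind.NotATag
        If text(0) <> "["c OrElse text(text.Length - 1) <> "]"c Then
            Return TagKind.NotATag
        End If

        Dim body As String = text.Substring(1, text.Length - 2)

        ' Square brackets may not appear inside a tag.
        If body.IndexOfAny(New Char() {"["c, "]"c}) >= 0 Then Return TagKind.NotATag

        Dim equalsAt As Integer = body.IndexOf("="c)
        Dim spaceAt As Integer = body.IndexOf(" "c)

        ' A closing tag is a slash followed by a plain name.
        If body(0) = "/"c Then
            If body.Length > 1 AndAlso equalsAt < 0 Then Return TagKind.Closing
            Return TagKind.NotATag
        End If

        ' An empty name is not a tag.
        If equalsAt = 0 OrElse spaceAt = 0 Then Return TagKind.NotATag

        If equalsAt < 0 Then Return TagKind.Simple

        ' Assumption: the name ends at the first space or '='.
        If spaceAt > 0 AndAlso spaceAt < equalsAt Then Return TagKind.Parametrized
        Return TagKind.Value
    End Function

End Module
```

With this sketch, "[b]" is classified as Simple, "[url=http://example.com]" as Value, "[img width=100]" as Parametrized and "[/b]" as Closing.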

Configuring Tags

While some basic tags are common to most BBCode parsers, there is no standard, although there have been a few attempts to define one, for instance BBCode.org.

With this in mind, this parser is very configurable and flexible. It can be configured either programmatically or through an XML file.

To configure the parser programmatically, you need to add tag definitions to the BBCodeParser.Dictionary collection, as shown in the example below.

Dim parser As New BBCodeParser()
parser.Dictionary.Add("b", "<b>{value}</b>", True)
parser.Dictionary.Add("u", "<u>{value}</u>", True)
parser.Dictionary.Add("i", "<i>{value}</i>", True)

The last parameter indicates whether the tag requires a closing tag. With the configuration above, the text

[b][u][i]Sample Text[/i][/u][/b]

would be formatted as the following HTML:

<b><u><i>Sample Text</i></u></b>

which a browser renders as bold, underlined, italic text.

Tag Parameters

Any tag can have any number of parameters. Each parameter is a name=value pair. The value can be enclosed in single (') or double (") quotes.

These parameters can then be used in placeholders within the replacement text format.

There are two special parameters:

  • default : Only used for value tags; it holds the text after the first equals sign in the tag.
  • value : Used for any tag that requires a closing tag; it represents the formatted text inside the tag.
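As an illustration of how the parameters of a tag could be collected, here is a minimal sketch that turns the text between '[' and ']' into a name/value dictionary. It is not the parser's implementation: TagParameterSketch and ParseParameters are hypothetical names, and the sketch assumes that a value tag has no further parameters and that unquoted values end at the next whitespace.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Hypothetical helper for illustration; not part of the parser.
Public Module TagParameterSketch

    ' name=value, where value is "double quoted", 'single quoted' or a bare word.
    Private ReadOnly ParameterPattern As New Regex( _
        "(?<name>[^\s=]+)\s*=\s*(?:""(?<value>[^""]*)""|'(?<value>[^']*)'|(?<value>\S+))")

    ' tagBody is the text between '[' and ']', e.g. "img width=100 alt='A cat'".
    Public Function ParseParameters(ByVal tagBody As String) _
        As Dictionary(Of String, String)

        Dim result As New Dictionary(Of String, String)()
        Dim firstSpace As Integer = tagBody.IndexOf(" "c)
        Dim firstEquals As Integer = tagBody.IndexOf("="c)

        ' Value tag: '=' comes right after the name, so everything after
        ' the first equals sign becomes the special "default" parameter.
        If firstEquals > 0 AndAlso _
           (firstSpace < 0 OrElse firstEquals < firstSpace) Then
            result("default") = tagBody.Substring(firstEquals + 1)
            Return result
        End If

        ' Simple tag: a name and nothing else.
        If firstSpace < 0 Then Return result

        ' Parametrized tag: read each name=value pair after the name.
        For Each m As Match In ParameterPattern.Matches(tagBody.Substring(firstSpace + 1))
            result(m.Groups("name").Value) = m.Groups("value").Value
        Next
        Return result
    End Function

End Module
```

For "url=http://example.com" the sketch yields a single default parameter; for "img width=100 alt='A cat'" it yields width and alt, with the quotes removed from the alt value.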

For example, with the configuration below:

parser.Dictionary.Add _
  ("url", _
   "<a href='{default|value}'>{value|default}</a>", _
   True)

the text [url=http://pjondevelopment.50webs.com]here[/url] could be used as a link pointing to this site.

Note that in the replacement format text, a placeholder is enclosed in '{' and '}'. If more than one parameter can be used in its place, the parameter names are separated by pipes '|'. When formatting, the first non-empty parameter is used.

So the example above has two placeholders: {default|value} and {value|default}. The first uses the value of the default parameter if it is not empty, and otherwise the formatted text between [url=...] and [/url]. The second tries the same two parameters in the opposite order.
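The "first non-empty parameter wins" rule can be sketched in a few lines. This is an illustration only, not the parser's formatter: PlaceholderSketch and ResolvePlaceholders are hypothetical names, and replacing a placeholder with an empty string when none of its parameters has a value is an assumption.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Hypothetical helper for illustration; not part of the parser.
Public Module PlaceholderSketch

    ' Matches {name} as well as {name1|name2|...}.
    Private ReadOnly PlaceholderPattern As New Regex("\{([^{}]+)\}")

    ' Replaces every placeholder in the template with the first
    ' non-empty parameter it names.
    Public Function ResolvePlaceholders( _
        ByVal template As String, _
        ByVal parameters As IDictionary(Of String, String)) As String

        Return PlaceholderPattern.Replace(template, _
            Function(m As Match) FirstNonEmpty(m.Groups(1).Value, parameters))
    End Function

    ' Walks the pipe-separated names from left to right.
    Private Function FirstNonEmpty( _
        ByVal names As String, _
        ByVal parameters As IDictionary(Of String, String)) As String

        For Each name As String In names.Split("|"c)
            Dim value As String = Nothing
            If parameters.TryGetValue(name.Trim(), value) _
               AndAlso Not String.IsNullOrEmpty(value) Then
                Return value
            End If
        Next
        ' Assumption: no usable parameter means an empty replacement.
        Return String.Empty
    End Function

End Module
```

Given the template "<a href='{default|value}'>{value|default}</a>" with default set to "http://example.com" and value set to "here", the sketch returns "<a href='http://example.com'>here</a>". If only value is set, both placeholders fall back to it.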

Extending Tags

Beyond tag definitions, the parser can be extended even further by creating a class derived from BBCodeElement and adding it to the BBCodeParser.ElementTypes collection. With custom types the possibilities are endless: the class might read its content from a file, a database, or anywhere else, whether or not that depends on the tag parameters.

Just note that the derived type must provide a public default constructor.

parser.ElementTypes.Add _
  ("userImage", _
   GetType(SubClassOfBBCodeElement))

With this configuration, every time the text [userImage] is found, it will be replaced by whatever text the class SubClassOfBBCodeElement provides through the method Format(ITextFormatter).
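The parser's internals are not shown in this article, but one plausible reason for the default constructor requirement is that elements are created by reflection from the registered type. The sketch below shows that mechanism in isolation. SketchElement, UserImageElement and ElementTypeRegistrySketch are hypothetical stand-ins, and Render() replaces the real Format(ITextFormatter) method to keep the example self-contained.

```vbnet
Imports System
Imports System.Collections.Generic

' Stand-in for BBCodeElement; the real class has more members.
Public MustInherit Class SketchElement
    Public MustOverride Function Render() As String
End Class

' A custom element; the compiler supplies its public default constructor.
Public Class UserImageElement
    Inherits SketchElement

    Public Overrides Function Render() As String
        Return "<img src='/images/user.png' />"
    End Function
End Class

' Hypothetical registry that maps tag names to element types.
Public Module ElementTypeRegistrySketch

    Private ReadOnly registeredTypes As New Dictionary(Of String, Type)()

    Public Sub Register(ByVal tagName As String, ByVal elementType As Type)
        If Not GetType(SketchElement).IsAssignableFrom(elementType) Then
            Throw New ArgumentException("Type must derive from SketchElement.", "elementType")
        End If
        ' Without a public parameterless constructor, the registry
        ' would have no way to build the element later.
        If elementType.GetConstructor(Type.EmptyTypes) Is Nothing Then
            Throw New ArgumentException("Type needs a public default constructor.", "elementType")
        End If
        registeredTypes(tagName) = elementType
    End Sub

    ' Creates a fresh element for the tag, or Nothing if it is unknown.
    Public Function Create(ByVal tagName As String) As SketchElement
        Dim elementType As Type = Nothing
        If Not registeredTypes.TryGetValue(tagName, elementType) Then Return Nothing
        Return DirectCast(Activator.CreateInstance(elementType), SketchElement)
    End Function

End Module
```

After Register("userImage", GetType(UserImageElement)), each call to Create("userImage") builds a new UserImageElement, which is why a type without a public default constructor could not be used this way.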

Saving and Loading Configurations

Calling the SaveConfiguration method saves the configuration of all the elements to an XML file, which can be loaded again with the LoadConfiguration method.

The BBCode XML Configuration File has the following structure:

<?xml version="1.0" encoding="utf-8"?>
<Configuration>
  <ElementTypes>
    <element 
      name="tagName" 
      type="SubClassOfBBCodeElement, AssemblyQualifiedName" />
  </ElementTypes>
  <Dictionary>
    <tag name="tagName" 
         requireClosingTag="True|False">
      <!-- XHTML Replacement Text -->
      Use {parameter} as a placeholder 
      for any variable portion of the text.
    </tag>
    <tag name="otherTagName" 
         requireClosingTag="True|False" 
         escaped="True">
      Use escaped="True" if the text 
      is not valid XML, escaping all 
      the necessary XML characters.
    </tag>
  </Dictionary>
</Configuration>

A tag can appear in either section or in both. If it appears in both, the tag is created using the type specified in the <ElementTypes> section, and its BBCodeElement.ReplacementFormat property is set to the text from the <Dictionary> section.

The order of the sections is not important. However, a tag must be described in at least one of them so that the parser can format it properly. If a tag is not found in either section, it is rendered as simple text, ignoring the fact that it is a tag.
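The lookup rule above amounts to a small decision table, sketched here. TagSource, ConfigurationLookupSketch and ResolveTag are hypothetical names, and the two collections simply stand for the tag names found in each section of the file.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical names for illustration; not part of the parser.
Public Enum TagSource
    PlainText             ' in neither section: rendered as simple text
    DictionaryFormat      ' only in <Dictionary>
    CustomType            ' only in <ElementTypes>
    CustomTypeWithFormat  ' in both: custom type plus replacement format
End Enum

Public Module ConfigurationLookupSketch

    ' typedTags: names listed in <ElementTypes>.
    ' formattedTags: names listed in <Dictionary>.
    Public Function ResolveTag( _
        ByVal tagName As String, _
        ByVal typedTags As ICollection(Of String), _
        ByVal formattedTags As ICollection(Of String)) As TagSource

        Dim hasType As Boolean = typedTags.Contains(tagName)
        Dim hasFormat As Boolean = formattedTags.Contains(tagName)

        If hasType AndAlso hasFormat Then Return TagSource.CustomTypeWithFormat
        If hasType Then Return TagSource.CustomType
        If hasFormat Then Return TagSource.DictionaryFormat
        Return TagSource.PlainText
    End Function

End Module
```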

BBCodeParser Class

Public Properties

Public Property Dictionary
Read Only.
Gets the dictionary of known elements of the BBCodeParser class.
ReadOnly Property Dictionary() As BBCodeElementDictionary

Public Property ElementTypes
Read Only.
Gets the dictionary of known custom element types of the BBCodeParser class.
ReadOnly Property ElementTypes() As BBCodeElementTypeDictionary

Public Methods

Public Method LoadConfiguration
Loads the configuration (Dictionary and ElementTypes) from the specified file, Stream or TextReader.
Sub LoadConfiguration(ByVal fileName As String)
Sub LoadConfiguration(ByVal stream As IO.Stream)
Sub LoadConfiguration(ByVal reader As IO.TextReader)

Public Method SaveConfiguration
Saves the configuration (Dictionary and ElementTypes) to the specified file, Stream or TextWriter.
Sub SaveConfiguration(ByVal fileName As String)
Sub SaveConfiguration(ByVal stream As IO.Stream)
Sub SaveConfiguration(ByVal writer As IO.TextWriter)

Public Method Parse
Parses the BBCode, returning a BBCodeDocument.
Function Parse(ByVal text As String) As BBCodeDocument
Function Parse(ByVal stream As IO.Stream) As BBCodeDocument
Function Parse(ByVal stream As IO.Stream, _
               ByVal encoding As Text.Encoding) As BBCodeDocument
Function Parse(ByVal reader As IO.TextReader) As BBCodeDocument

BBCodeDocument Class

Public Properties

Public Property Nodes
Read Only.
Gets the collection of BBCodeNode parsed by the BBCodeParser.
ReadOnly Property Nodes() As BBCodeNodeCollection

Public Property Text
Gets or sets the BBCode of the BBCodeDocument.
Property Text() As String

Public Methods

Public Method Format
Formats the BBCode into HTML by default, or into any other format using an implementation of the ITextFormatter interface.
Function Format() As String
Function Format(ByVal formatter As ITextFormatter) As String
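Putting the members documented so far together, a typical round trip might look like the sketch below. It uses only the signatures listed in this article, but it is not self-contained: it assumes the project references the assembly that contains BBCodeParser and imports its namespace (neither is named here), and the file name bbcode.config.xml is a hypothetical choice.

```vbnet
' Assumes a reference to the assembly containing BBCodeParser and an
' Imports statement for its namespace; neither is named in this article.
Module ParseAndFormatSketch

    Sub Main()
        ' Configure a parser programmatically.
        Dim parser As New BBCodeParser()
        parser.Dictionary.Add("b", "<b>{value}</b>", True)
        parser.Dictionary.Add("i", "<i>{value}</i>", True)

        ' Parse the BBCode and format it as HTML (the default format).
        Dim document As BBCodeDocument = parser.Parse("[b]Hello[/b] [i]world[/i]")
        Dim html As String = document.Format()
        Console.WriteLine(html)

        ' Save the configuration and load it into a second parser.
        ' The file name is a hypothetical example.
        parser.SaveConfiguration("bbcode.config.xml")

        Dim otherParser As New BBCodeParser()
        otherParser.LoadConfiguration("bbcode.config.xml")
        Console.WriteLine(otherParser.Parse("[b]Hello again[/b]").Format())
    End Sub

End Module
```

Keeping the tag definitions in a file this way lets the set of allowed tags change without recompiling the site.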

BBCodeNode Class

Public Properties

Public Property InnerBBCode
Gets or sets the BBCode for the node.
Property InnerBBCode() As String

Public Property InnerText
Gets or sets the text of the BBCodeNode.
Property InnerText() As String

Public Property OuterBBCode
Read Only.
Gets the BBCode for the complete node.
ReadOnly Property OuterBBCode() As String

Public Property Parent
Read Only.
Gets the node above the current BBCodeNode.
ReadOnly Property Parent() As BBCodeNode

Public Methods

Public Method Format
Formats the BBCode using an implementation of the ITextFormatter interface.
Function Format(ByVal formatter As ITextFormatter) As String

BBCodeElement Class

Inherits from BBCodeNode

Public Properties

Public Property Attributes
Read Only.
Gets a collection of attributes of the BBCodeElement.
ReadOnly Property Attributes() As BBCodeAttributeDictionary

Public Property InnerBBCode
Inherited from BBCodeNode.

Public Property InnerText
Inherited from BBCodeNode.

Public Property Name
Read Only.
Gets the name of the tag.
ReadOnly Property Name() As String

Public Property Nodes
Read Only.
Gets a collection of BBCodeNode children of the current BBCodeElement.
ReadOnly Property Nodes() As BBCodeNodeCollection

Public Property OuterBBCode
Inherited from BBCodeNode.

Public Property Parent
Inherited from BBCodeNode.

Public Property ReplacementFormat
Read Only.
Gets the string that describes how the element should be formatted by the ITextFormatter.
ReadOnly Property ReplacementFormat() As String

Public Property RequireClosingTag
Read Only.
Gets a value indicating whether the tag requires a closing tag.
ReadOnly Property RequireClosingTag() As Boolean

Public Methods

Public Method Format
Inherited from BBCodeNode.