Wikitext

The Wikitext extension is a fast wikitext-to-HTML translator written in C and packaged as a Ruby extension.

Usage is straightforward:

$ irb -r wikitext
>> Wikitext::Parser.new.parse("hello world!")
=> "<p>hello world!</p>\n"

Design goals

I needed a wikitext-to-HTML translator for a Rails application; a number of design goals flowed on from this:

  • fast: Rails has a reputation for being slow, so the translator had to be part of the solution, not part of the problem

  • efficient: again, given how much memory Rails likes to use, the translator had to be very memory-efficient

  • robust: on a public-facing web application that had to be up for long periods, the translator had to be stable (no crashes, no resource leaks)

  • secure: again, accepting input from untrusted sources meant that the translator had to sanitize or reject unsafe input

  • easy to use: for end users, the translator should provide a simple, familiar markup as close as possible to what they already know from other applications (such as MediaWiki, the wiki software that powers Wikipedia)

  • forgiving: wikitext is presentation markup, not source code, so the translator should do a reasonable job of formatting even the most invalid markup rather than giving up

  • informative: when provided invalid markup the translator should fail gracefully and emit HTML that provides useful visual feedback about where the errors are in the input

  • multilingual-friendly: the translator should handle input beyond printable ASCII in a compatible fashion

  • attractive: the emitted HTML source should be consistent and attractive

  • valid output: regardless of the input, the translator should always produce valid HTML5 output

  • well-tested: the translator should have a comprehensive test suite to ensure that its behaviour is not only correct but also stable over time

  • cross-platform: should work identically on Mac OS X, Linux (explicitly tested platforms) and perhaps others as well

Some notable things that were not design goals:

  • implement all of the MediaWiki syntax (tables etc)

Markup

The markup is very close to that used by MediaWiki, the most popular wiki software and the one that powers Wikipedia.

Headings

= Heading 1 =
== Heading 2 ==
=== Heading 3 ===
==== Heading 4 ====
===== Heading 5 =====
====== Heading 6 ======

Are marked up as:

<h1>Heading 1</h1>
<h2>Heading 2</h2>
<h3>Heading 3</h3>
<h4>Heading 4</h4>
<h5>Heading 5</h5>
<h6>Heading 6</h6>

Paragraphs

Consecutive linebreaks are converted into paragraph breaks.

This is one paragraph.
Another line.

And this is another.

Would be marked up as:

<p>This is one paragraph. Another line.</p>
<p>And this is another.</p>

Emphasis, Strong

Emphasis is marked up as follows:

''emphasized''

Which gets translated into:

<em>emphasized</em>

Strong is marked up like this:

'''strong text'''

And transformed into:

<strong>strong text</strong>

You can nest spans inside one another, provided you don’t try to produce invalid HTML (for example, nesting strong inside strong). Here is a valid example:

'''''foo'' bar''' baz

This would become:

<strong><em>foo</em> bar</strong> baz

Note that the translator emits HTML on the fly, so when it sees the first run of five apostrophes it has no way of knowing what will come afterwards and so doesn’t know whether you mean to say “strong em” or “em strong”; it therefore always assumes “strong em”. If you wish to force the alternative interpretation you can do one of the following:

'' '''foo''' bar'' baz (ie. use whitespace)
''<nowiki></nowiki>'''foo''' bar'' baz (ie. insert an empty nowiki span)
<em><strong>foo</strong> bar</em> baz (ie. use explicit HTML tags instead)
<em>'''foo''' bar</em> baz (ie. use explicit HTML tags instead)

Note that to avoid ambiguity, the translator will not let you intermix the shorthand style with the literal HTML tag style.

<em>foo'' (ie. intermixed, invalid)

Teletype

The translator recognizes both standard HTML tt tags and the backtick (‘) as a shorthand. These two are equivalent:

<tt>fixed</tt>
`fixed`

As of version 2.0, this markup is actually translated to code tags in the output because the tt tag was removed from the HTML5 standard.

If you need to insert a literal backtick in your text you use a nowiki span:

here follows a literal <nowiki>`</nowiki> backtick

To avoid ambiguity, the translator will not let you intermix the two styles.

nowiki spans

Already mentioned above, you can use nowiki tags to temporarily disable wikitext markup. As soon as the translator sees the opening nowiki tag it starts emitting a literal copy of everything it sees up until the closing nowiki tag:

Hello <nowiki>''world''</nowiki>

Would be emitted as:

Hello ''world''

Blockquotes

> Hello world!
> Bye for now.

Would be emitted as:

<blockquote><p>Hello world! Bye for now.</p></blockquote>

You can nest blockquotes or any other kind of block or span inside blockquotes. For example:

> first quote
>> quote inside a quote

Preformatted text

Any line indented with whitespace will be interpreted as part of a pre block. Wikitext markup inside pre blocks has no special meaning. For example, consider the following block indented by a single space:

 // source code listing
 void foo(void)
 {
     x++;
 }

Would be translated into:

<pre>// source code listing
void foo(void)
{
    x++;
}</pre>

pre blocks may be nested inside blockquote blocks.

[[article title]]

Would become:

<a href="/wiki/article_title">article title</a>

And:

[[title|link text]]

Would become:

<a href="/wiki/article">link text</a>

See the Wikitext::Parser attributes documentation for how you can override the default link prefix (/wiki/ as shown in the example), and how “red links” can be implemented by applying custom CSS depending on the link target (this can be used to make links to non-existent pages appear in a different color).

Alternative blockquote and preformatted block syntax

For blockquote and pre blocks that go on for many lines it may be more convenient to use the alternative syntax which uses standard HTML tags rather than special prefixes at the beginning of each line.

<blockquote>This is
a blockquote!</blockquote>

<pre>And this is
preformatted text</pre>

blockquote and pre blocks may nest inside other blockquote blocks.

Note that to avoid ambiguity, the translator will not let you intermix the two styles (HTML markup and wikitext markup).

pre blocks may also contain a custom lang attribute for the purposes of marking up a block for syntax-highlighting (note that the highlighting itself would be provided by JavaScript in the browser and is not actually part of the Wikitext extension). For example:

<pre lang="ruby">puts @person.name</pre>

Would be translated into:

<pre class="ruby-syntax">puts @person.name</pre>

The lang attribute may only contain letters, so “Objective-C”, for example would need to be written as “objc” or similar.

[http://example.com/ this site]

Would become:

<a href="http://example.com/" class="external">this site</a>

See the Wikitext::Parser attributes documentation for information on overriding the default external link class (external in this example), or including a rel attribute of “nofollow” (which may be useful for search-engine optimization).

Note that in addition to providing a fully-qualified URL including a protocol (such as “http://” or “ftp://”) you also have the option of using an unqualified “path”-style URL. This is useful for making links to other pages still on the same site, but outside of the wiki:

[/issues/1024 ticket #1024]

Would become:

<a href="/issues/1024">ticket #1024</a>

Note that no “external” class is included in the generated link.

To avoid false positives, what constitutes a “path” is narrowly-defined as a string that begins with a slash, optionally followed by zero or more “path components” consisting of upper and lowercase letters, numbers, underscores, hyphens or periods. Path components are separated by a slash, and the trailing slash after the last path component is optional.

Images

{{foo.png}}

When outputting using HTML syntax (the default), this would become:

<img src="/images/foo.png" alt="foo.png">

When outputting using XML syntax, this would become a self-closing tag:

<img src="/images/foo.png" alt="foo.png" />

See the Wikitext::Parser documentation for information on setting the output syntax.

You can override the “/images/” prefix using the img_prefix attribute of the Parser.

You can also specify “absolute” image “src” attributes regardless of the current prefix setting by starting the image path with a forward slash; that is:

{{/foo.png}}

Would become:

<img src="/foo.png" alt="/foo.png">

Lists

Lists come in both unordered (“ul”):

* item
* item
* item

And ordered (“ol”) forms:

# first
# second
# third

These would produce, respectively:

<ul>
  <li>item</li>
  <li>item</li>
  <li>item</li>
</ul>

And:

<ol>
  <li>first</li>
  <li>second</li>
  <li>third</li>
</ol>

Lists may be nested inside one another as needed. For example:

# outer a
# outer b
#* nested 1
#* nested 2
# outer c
## nested foo
## nested bar
##* x
##* y
##** z

Would produce:

<ol>
  <li>outer a</li>
  <li>outer b
    <ul>
      <li>nested 1</li>
      <li>nested 2</li>
    </ul>
  </li>
  <li>outer c
    <ol>
      <li>nested foo</li>
      <li>nested bar
        <ul>
          <li>x</li>
          <li>y
            <ul>
              <li>z</li>
            </ul>
          </li>
        </ul>
      </li>
    </ol>
  </li>
</ol>

Ruby support

Version 4.0.0 and above target Ruby 2.0.0 or higher.

For older versions of Ruby, you may use the 3.1 release or older.

Rails support

The Wikitext extension provides a template handler so that templates named following the template_name.html.wikitext format will automatically be translated from wikitext markup into HTML when rendered.

Additionally, an optional Haml filter is available if you require "wikitext/haml_filter", which enables you to write wikitext markup inline (in Haml):

:wikitext
  = Here is some [[wikitext]] =

Likewise, a to_wikitext method (aliased as w) is added to the String class (and also NilClass, for convenience) so that content can be easily translated from inside view templates following patterns like:

@post.body.w

The to_wikitext method will preprocess its string using the String#wikitext_preprocess method, if it is defined, before feeding the string through the parser. This can be used to add application-specific behavior such as converting special strings like:

ticket #1234

into links. An example preprocessor is included with the extension but it is not active by default; it can be activated with:

require 'wikitext/preprocess'

Finally, a Wikitext::Parser#shared_parser method is added to provide convenient access to a shared singleton instance of the parser so as to avoid repeatedly instantiating and setting up new parser instances as part of every request.

Rails 2.3

For Rails 2.3.x support, use version 2.1.x of the Wikitext gem.

The plug-in can be activated with an appropriate config.gem statement in your config/environment.rb:

config.gem 'wikitext', '2.1.1'

Rails 3.0

For Rails 3.0.x support, use version 2.1.x of the Wikitext gem.

Add a line like the following to your Gemfile:

gem 'wikitext', '~> 2.1.1'

Note that while older versions of Wikitext do work with Rails 3 to some degree, for full compatibility Wikitext version 2.0 or higher should be used.

Rails 3.1

Add a line like the following to your Gemfile:

gem 'wikitext'

Links

Author

Wikitext is written and maintained by Greg Hurrell (greg@hurrell.net). Other contributors that have submitted patches include:

  • Mike Stangel

License

Copyright 2007-present Greg Hurrell. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Feedback

Please let me know if you’re using the Wikitext extension in your project. If you have any bug reports or feature requests please open a ticket in the issue tracker at wincent.com/issues.