Class: Wikitext::Parser

Inherits:
Object
Defined in:
doc/rdoc.rb

Overview

Attributes

line_ending (String)

The line ending to be used in the generated HTML (defaults to “\n”).

internal_link_prefix (String)

The prefix to be prepended to internal links (defaults to “/wiki/”). For example, given an internal_link_prefix of “/wiki/”, the internal link:

[[Apple]]

would be transformed into:

<a href="/wiki/Apple">Apple</a>

external_link_class (String)

The CSS class to be applied to external links (defaults to “external”). For example, given an external_link_class of “external”, the external link:

[http://www.google.com/ the best search engine]

would be transformed into:

<a class="external" href="http://www.google.com/">the best search engine</a>

external_link_rel (String)

The rel attribute to be applied to external links (defaults to nil, meaning that no rel attribute is applied). Setting a rel attribute of “nofollow” may be useful for search-engine optimization (see en.wikipedia.org/wiki/Nofollow for more details).

This attribute can be set during initialization:

parser = Wikitext::Parser.new :external_link_rel => 'nofollow'

Or via setting an attribute on the parser:

parser = Wikitext::Parser.new
parser.external_link_rel = 'nofollow'

Or at parse time:

parser = Wikitext::Parser.new
parser.parse input, :external_link_rel => 'nofollow'

Setting external_link_rel to nil suppresses the emission of any previously configured rel attribute:

parser.parse input, :external_link_rel => nil

mailto_class (String)

The CSS class to be applied to external “mailto” links (defaults to “mailto”). For example:

[mailto:user@example.com user@example.com]

or if autolinking of email addresses is turned on, just:

user@example.com

would be transformed into:

<a class="mailto" href="mailto:user@example.com">user@example.com</a>

img_prefix (String)

The prefix to be prepended to image tags (defaults to “/images/”). For example, given this image markup:

{{foo.png}}

The following img tag would be produced:

<img src="/images/foo.png" alt="foo.png" />

autolink (boolean)

Whether to autolink URIs found in the plain scope. When true:

http://apple.com/

will be transformed to:

<a href="http://apple.com/">http://apple.com/</a>

and if an external_link_class is set (to “external”, for example) then the transformation will be:

<a class="external" href="http://apple.com/">http://apple.com/</a>

When false, no transformation will be applied and the link will be echoed literally:

http://apple.com/

pre_code (boolean)

When true, “pre” blocks are formatted using “code” elements. For example:

<pre>foo</pre>

Produces:

<pre><code>foo</code></pre>

When false (the default), it produces:

<pre>foo</pre>

space_to_underscore (boolean)

Whether spaces in link targets should be encoded normally or transformed into underscores.

When false, an internal link like:

[[foo bar]]

Would be converted into:

<a href="/wiki/foo%20bar">foo bar</a>

But when true (the default), it would be converted into:

<a href="/wiki/foo_bar">foo bar</a>

Converting spaces to underscores makes most URLs prettier, but it comes at a cost: when this mode is enabled the articles “foo bar” and “foo_bar” can no longer be disambiguated, and a link to “foo_bar” will actually resolve to “foo bar”. It is therefore recommended that you explicitly disallow underscores in titles at the application level to avoid this kind of confusion.
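The application-level check recommended here could look something like the following minimal sketch. It assumes titles are plain Ruby strings; the helper names underscored and valid_title? are hypothetical and are not part of Wikitext::Parser:

```ruby
# Hypothetical application-level helpers; not part of Wikitext::Parser.

# Mimics only the space-to-underscore substitution described above.
def underscored(title)
  title.tr(' ', '_')
end

# Rejects titles containing underscores so that "foo bar" and "foo_bar"
# can never both exist as distinct articles.
def valid_title?(title)
  !title.include?('_')
end

underscored('foo bar') == underscored('foo_bar') # the two titles collide
valid_title?('foo bar') # accepted
valid_title?('foo_bar') # rejected
```

Running a check of this kind when an article is created or renamed keeps the ambiguity from ever entering the title namespace.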

base_heading_level (integer)

An integer between 0 and 6 denoting the current “heading level”. This can be used to inform the parser of the “context” in which it is translating markup.

For example, the parser might be translating blog post excerpts on a page where there is an “h1” title element for the page itself and an “h2” title element for each excerpt. In this context it is useful to set base_heading_level to 2, so that any “top level” headings in the markup (that is “h1” elements) can be automatically transformed into “h3” elements so that they appear to be appropriately “nested” inside the containing page elements.

In this way, markup authors can be freed from thinking about which header size they should use and just always start from “h1” for their most general content and work their way down.

An additional benefit is that markup can be used in different contexts at different levels of nesting and the headings will be adjusted to suit automatically with no intervention from the markup author.

Finally, it’s worth noting that in contexts where the user input is not necessarily trusted, this setting can be used to prevent users from inappropriately employing “h1” tags in deeply-nested contexts where they would otherwise disturb the visual harmony of the page.
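The arithmetic behind this adjustment can be sketched as follows. This is only an illustration of the behaviour described above, not the parser's C implementation; the method name sketch_heading_level is hypothetical, and clamping at “h6” is an assumption based on HTML having no deeper heading element:

```ruby
# Hypothetical illustration; the real logic lives in parser.c.
# markup_level is the level written by the author (1 for "h1", and so on).
def sketch_heading_level(markup_level, base_heading_level = 0)
  # Assumption: levels that would exceed 6 are clamped to 6.
  [markup_level + base_heading_level, 6].min
end

sketch_heading_level(1, 2) # an "h1" in the markup is emitted as "h3"
```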

output_style (Symbol)

Wikitext emits valid HTML5 fragments. By default, the output syntax is HTML. Optionally, the output syntax can be changed to XML by setting the output_style to “:xml”.

This can be done during initialization:

parser = Wikitext::Parser.new :output_style => :xml

Or via setting an attribute on the parser:

parser = Wikitext::Parser.new
parser.output_style = :xml

Or at parse time:

parser = Wikitext::Parser.new
parser.parse input, :output_style => :xml

In practice the only difference between the two output syntaxes is that the XML syntax uses self-closing img tags:

<img src="foo.png" alt="Foo" />

While the HTML syntax does not:

<img src="foo.png" alt="Foo">

link_proc (lambda)

“Red links” can be implemented by providing a custom link_proc block at parse time. This can be used to check for existing or non-existent link targets and apply custom CSS styling accordingly. For example, consider:

link_proc = lambda { |target| target == 'bar' ? 'redlink' : nil }
Wikitext::Parser.new.parse '[[foo]] [[bar]]', :link_proc => link_proc

This would add the “redlink” CSS class to the “bar” link but not the “foo” link. Please note that if your link_proc involves database queries then you should implement an appropriate caching strategy to ensure that markup with many links does not overwhelm your database.
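One possible caching strategy is to memoize the existence check inside the lambda itself. The sketch below assumes a hypothetical in-memory set, EXISTING_TITLES, standing in for the database, and a hypothetical helper named caching_link_proc; neither is provided by Wikitext:

```ruby
require 'set'

# Hypothetical stand-in for a database of existing article titles.
EXISTING_TITLES = Set['foo'].freeze

# Wraps an existence check in a per-lambda cache so that each distinct
# target is looked up at most once, however many times it is linked.
def caching_link_proc(exists)
  cache = {}
  lambda do |target|
    cache[target] = exists.call(target) unless cache.key?(target)
    cache[target] ? nil : 'redlink'
  end
end

lookups = 0
exists = lambda do |target|
  lookups += 1 # in a real application this would be a database query
  EXISTING_TITLES.include?(target)
end

link_proc = caching_link_proc(exists)
link_proc.call('foo') # nil: no extra CSS class
link_proc.call('bar') # 'redlink'
link_proc.call('bar') # 'redlink' again, served from the cache
```

A lambda built this way can be passed as the :link_proc option like any other. Because the cache lives as long as the lambda does, a long-lived lambda would need some form of invalidation when articles are created or deleted.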

A link_proc may also be set during initialization:

parser = Wikitext::Parser.new :link_proc => link_proc

Or via setting an attribute on the parser:

parser = Wikitext::Parser.new
parser.link_proc = link_proc

Many more examples of link procs can be found in the spec suite.

Defined Under Namespace

Classes: Error, Token

Constructor Details

#initialize(options = {}) ⇒ Parser

Prepares a Parser instance.

There are a number of attributes that you can set on the returned parser to customize its behaviour. See the attributes documentation in the Parser class. You also have the option of overriding the attributes at initialization time by passing in the attribute name in symbol form together with the overridden value.

In other words, both:

parser = Wikitext::Parser.new
parser.autolink = false
parser.mailto_class = 'mail'

And:

parser = Wikitext::Parser.new :autolink => false, :mailto_class => 'mail'

Are equivalent.



# File 'doc/rdoc.rb', line 332

def initialize options = {}
  # This is just a placeholder.
  # See parser.c for the C source code to this method.
end

Class Method Details

.encode_link_target(string) ⇒ Object

URL-encodes an internal link target for use as an href attribute in an anchor. Expects string to be UTF-8-encoded.

For example, the link target:

foo, "bar" & baz €

would be encoded as:

foo%2c%20%22bar%22%20%26%20baz%20%e2%82%ac

The encoding is based on RFCs 2396 and 2718. The “unreserved” characters a..z, A..Z, 0..9, “-”, “_”, “.” and “~” are passed through unchanged and all others are converted into percent escapes.

When combined with sanitize_link_target this method can be used to emit the following link for the example article:

<a href="foo%2c%20%22bar%22%20%26%20baz%20%e2%82%ac">foo, &quot;bar&quot; &amp; baz &#x20ac;</a>

Note that when space_to_underscore is true spaces are treated specially, and are converted to “_” rather than “%20”. For the majority of links this yields much prettier URLs at the cost of some reduction in the namespace of possible titles (this is because when using space_to_underscore you should disallow underscores in article titles to avoid ambiguity between titles like “foo bar” and “foo_bar”).
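The rules described above can be approximated in pure Ruby. This is a minimal sketch rather than the actual implementation (which is written in C); the method name sketch_encode_link_target and its space_to_underscore keyword are hypothetical, and lowercase hex digits are assumed because the example above uses them:

```ruby
# Hypothetical pure-Ruby approximation; the real method lives in parser.c.
UNRESERVED_BYTES =
  [*'a'..'z', *'A'..'Z', *'0'..'9', '-', '_', '.', '~'].join.bytes.freeze

def sketch_encode_link_target(string, space_to_underscore: false)
  string.each_byte.map do |byte|
    if UNRESERVED_BYTES.include?(byte)
      byte.chr                # unreserved: passed through unchanged
    elsif byte == 0x20 && space_to_underscore
      '_'                     # spaces handled specially in this mode
    else
      format('%%%02x', byte)  # everything else: percent escape per byte
    end
  end.join
end

sketch_encode_link_target("foo, \"bar\" & baz \u20ac")
sketch_encode_link_target('foo bar', space_to_underscore: true)
```

Working byte by byte means that a multi-byte UTF-8 character such as “€” naturally becomes one percent escape per byte.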



# File 'doc/rdoc.rb', line 308

def self.encode_link_target string
  # This is just a placeholder.
  # See parser.c for the C source code to this method.
end

.sanitize_link_target(string) ⇒ Object

Sanitizes an internal link target for inclusion within the HTML stream. Expects string to be UTF-8-encoded.

For example, a link target for the article titled:

foo, "bar" & baz €

would be sanitized as:

foo, &quot;bar&quot; &amp; baz &#x20ac;

Note that characters which have special meaning within HTML such as quotes and ampersands are turned into named entities, and characters outside of the printable ASCII range are turned into hexadecimal entities.

See also encode_link_target.
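The sanitization described above can be approximated as follows. This is a hedged sketch, not the C implementation; the name sketch_sanitize_link_target is hypothetical, and the handling of “<” and “>” is an assumption, since the text above names only quotes and ampersands:

```ruby
# Hypothetical pure-Ruby approximation; the real method lives in parser.c.
NAMED_ENTITIES = {
  '"' => '&quot;',
  '&' => '&amp;',
  '<' => '&lt;', # assumption: not stated in the description above
  '>' => '&gt;'  # assumption: not stated in the description above
}.freeze

def sketch_sanitize_link_target(string)
  string.each_char.map do |char|
    code = char.ord
    if NAMED_ENTITIES.key?(char)
      NAMED_ENTITIES[char]        # HTML-special characters: named entities
    elsif code.between?(0x20, 0x7e)
      char                        # printable ASCII: passed through
    else
      format('&#x%04x;', code)    # everything else: hexadecimal entity
    end
  end.join
end

sketch_sanitize_link_target("foo, \"bar\" & baz \u20ac")
```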



# File 'doc/rdoc.rb', line 274

def self.sanitize_link_target string
  # This is just a placeholder.
  # See parser.c for the C source code to this method.
end

Instance Method Details

#benchmarking_tokenize(string) ⇒ Object

Like the tokenize method, this feeds string into the scanner to obtain the corresponding tokens, but unlike tokenize it does not return them, because its sole purpose is to measure the speed of the scanner.

Like the tokenize method, it raises a Wikitext::Parser::Error if passed invalid UTF-8 input.



# File 'doc/rdoc.rb', line 357

def benchmarking_tokenize string
  # This is just a placeholder.
  # See parser.c for the C source code to this method.
end

#parse(string, options = {}) ⇒ Object

Parses and transforms the UTF-8 wikitext markup input string into HTML. Raises a Wikitext::Parser::Error if passed invalid UTF-8. You can customize some aspects of the transformation by setting attributes on the parser instance before calling this method (see the attributes documentation for the Parser class), or by passing in an (optional) options hash.

Options that can be overridden at parse-time include:

indent

A non-negative number (to add an arbitrary amount of indentation to all lines in the output) or false (to disable indentation entirely).

base_heading_level

An integer between 0 and 6 denoting the current “heading level” (documented above).

output_style

A symbol, “:xml”, to emit XML syntax (by default HTML syntax is emitted).

link_proc

A lambda that can be used to apply custom CSS to links to produce “red links” (documented above).
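Because parse raises a Wikitext::Parser::Error when passed invalid UTF-8, an application handling untrusted input may prefer to check the bytes up front. A minimal sketch using only core String methods; the helper name parseable_utf8? is hypothetical and not part of Wikitext:

```ruby
# Hypothetical pre-flight check; not part of Wikitext::Parser.
# Interprets the string's bytes as UTF-8, whatever encoding it is tagged
# with, and reports whether they form a valid sequence. The input itself is
# left untouched because the check runs on a copy.
def parseable_utf8?(input)
  input.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

parseable_utf8?("caf\u00e9") # valid UTF-8
parseable_utf8?("\xFF".b)    # a lone 0xFF byte is never valid UTF-8
```

Input that fails the check can be rejected or repaired before it reaches the parser; alternatively, the exception can simply be rescued at the call site.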



# File 'doc/rdoc.rb', line 382

def parse string, options = {}
  # This is just a placeholder.
  # See parser.c for the C source code to this method.
end

#tokenize(string) ⇒ Object

Feeds the UTF-8-encoded string into the scanner and returns an array of recognized tokens. Raises a Wikitext::Parser::Error exception if the input string is not valid UTF-8.

Normally you don’t need to invoke this method manually because the parse method automatically sets up a scanner and obtains tokens as it needs them. This method exists for testing and introspection only.



# File 'doc/rdoc.rb', line 345

def tokenize string
  # This is just a placeholder.
  # See parser.c for the C source code to this method.
end