Class RDoc::Markup
In: markup.rb
markup/fragments.rb
markup/inline.rb
markup/lines.rb
markup/to_flow.rb
doc-tmp/rdoc/markup.rb
Parent: Object

RDoc::Markup parses plain text documents and attempts to decompose them into their constituent parts. Some of these parts are high-level: paragraphs, chunks of verbatim text, list entries and the like. Other parts happen at the character level: a piece of bold text, a word in code font. This markup is similar in spirit to that used on WikiWiki webs, where folks create web pages using a simple set of formatting rules.

RDoc::Markup itself does no output formatting: this is left to a different set of classes.

RDoc::Markup is extendable at runtime: you can add new markup elements to be recognised in the documents that RDoc::Markup parses.

RDoc::Markup is intended to be the basis for a family of tools which share the common requirement that simple plain text should be rendered in a variety of different output formats and media. It is envisaged that RDoc::Markup could be the basis for formatting RDoc-style comment blocks, Wiki entries, and online FAQs.

Basic Formatting

  • RDoc::Markup looks for a document's natural left margin. This is used as the initial margin for the document.
  • Consecutive lines starting at this margin are considered to be a paragraph.
  • If a paragraph starts with a "*", "-", or with "<digit>.", then it is taken to be the start of a list. The margin is increased to be the first non-space following the list start flag. Subsequent lines should be indented to this new margin until the list ends. For example:
       * this is a list with three paragraphs in
         the first item.  This is the first paragraph.
    
         And this is the second paragraph.
    
         1. This is an indented, numbered list.
         2. This is the second item in that list
    
         This is the third conventional paragraph in the
         first list item.
    
       * This is the second item in the original list
    
  • You can also construct labeled lists, sometimes called description or definition lists. Do this by putting the label in square brackets and indenting the list body:
        [cat]  a small furry mammal
               that seems to sleep a lot
    
        [ant]  a little insect that is known
               to enjoy picnics
    

    A minor variation on labeled lists uses two colons to separate the label from the list body:

        cat::  a small furry mammal
               that seems to sleep a lot
    
        ant::  a little insect that is known
               to enjoy picnics
    

    This latter style guarantees that the list bodies’ left margins are aligned: think of them as a two column table.

  • Any line that starts to the right of the current margin is treated as verbatim text. This is useful for code listings. The example of a list above is also verbatim text.
  • A line starting with an equals sign (=) is treated as a heading. Level one headings have one equals sign, level two headings have two, and so on.
  • A line starting with three or more hyphens (at the current indent) generates a horizontal rule. The more hyphens, the thicker the rule (within reason, and if supported by the output device).
  • You can use markup within text (except verbatim) to change the appearance of parts of that text. Out of the box, RDoc::Markup supports word-based and general markup.

    Word-based markup uses flag characters around individual words:

    *word*
    displays word in a bold font
    _word_
    displays word in an emphasized font
    +word+
    displays word in a code font

    General markup affects text between a start delimiter and an end delimiter. Not surprisingly, these delimiters look like HTML markup.

    <b>text...</b>
    displays text in a bold font
    <em>text...</em>
    displays text in an emphasized font
    <i>text...</i>
    displays text in an emphasized font
    <tt>text...</tt>
    displays text in a code font

    Unlike conventional Wiki markup, general markup can cross line boundaries. You can turn off the interpretation of markup by preceding the first character with a backslash, so \<b>bold text</b> and \*bold* produce <b>bold text</b> and *bold* respectively.

  • Hyperlinks to the web starting http:, mailto:, ftp:, or www. are recognized. An HTTP URL that references an external image file is converted into an inline <IMG..>. Hyperlinks starting ‘link:’ are assumed to refer to local files whose path is relative to the --op directory.

    Hyperlinks can also be of the form label[url], in which case the label is used in the displayed text, and url is used as the target. If label contains multiple words, put it in braces: {multi word label}[url].
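
The word-based flags and the backslash escape described above can be sketched in a few lines of Ruby. This is only an illustration of the rules, not RDoc::Markup's own implementation: the names WORD_FLAGS, WORD_MARKUP_RE and render_word_markup, and the choice of output tags, are assumptions made for the example.

```ruby
# Hypothetical mapping from flag character to an HTML tag name.
WORD_FLAGS = { "*" => "b", "_" => "em", "+" => "tt" }.freeze

# An optional backslash, a flag character that does not sit inside a word,
# a single word, and the same flag character again.
WORD_MARKUP_RE = /(\\)?(?<!\w)([*_+])(\w+)\2(?!\w)/

# Replace *word*, _word_ and +word+ with tagged text. A leading backslash
# turns the markup off: the backslash is dropped and the rest is kept as is.
def render_word_markup(text)
  text.gsub(WORD_MARKUP_RE) do
    escaped, flag, word = $1, $2, $3
    if escaped
      "#{flag}#{word}#{flag}"
    else
      tag = WORD_FLAGS.fetch(flag)
      "<#{tag}>#{word}</#{tag}>"
    end
  end
end

puts render_word_markup('a *bold* word, some +code+, and an escaped \*star*')
```

The lookaround guards keep identifiers such as foo_bar_baz from being read as emphasized text; whether the real parser makes the same choice is not stated here.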

Synopsis

This code converts input_string to HTML. The conversion takes place in the convert method, so you can use the same RDoc::Markup object to convert multiple input strings.

  require 'rdoc/markup'
  require 'rdoc/markup/to_html'

  p = RDoc::Markup.new
  h = RDoc::Markup::ToHtml.new

  puts p.convert(input_string, h)

You can extend the RDoc::Markup parser to recognise new markup sequences, and to add special processing for text that matches a regular expression. Here we make WikiWords significant to the parser, and also make the sequences {word} and <no>text...</no> signify strike-through text. We then subclass the HTML output class to deal with these:

  require 'rdoc/markup'
  require 'rdoc/markup/to_html'

  class WikiHtml < RDoc::Markup::ToHtml
    def handle_special_WIKIWORD(special)
      "<font color=red>" + special.text + "</font>"
    end
  end

  m = RDoc::Markup.new
  m.add_word_pair("{", "}", :STRIKE)
  m.add_html("no", :STRIKE)

  m.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)

  h = WikiHtml.new
  h.add_tag(:STRIKE, "<strike>", "</strike>")

  puts "<body>" + m.convert(ARGF.read, h) + "</body>"

Methods

Public Class methods

Take a block of text and use various heuristics to determine its structure (paragraphs, lists, and so on). Invoke an event handler as we identify significant chunks.

[Source]

  # File doc-tmp/rdoc/markup.rb, line 194
  def initialize
    @am = RDoc::Markup::AttributeManager.new
    @output = nil
  end

Public Instance methods

Add to the sequences recognized as general markup.

[Source]

  # File doc-tmp/rdoc/markup.rb, line 211
  def add_html(tag, name)
    @am.add_html(tag, name)
  end

Add to other inline sequences. For example, we could add WikiWords using something like:

   parser.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)

Each wiki word will be presented to the output formatter via the accept_special method.

[Source]

  # File markup.rb, line 224
  def add_special(pattern, name)
    @am.add_special(pattern, name)
  end
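
To see what the WikiWord pattern from the example actually picks up, it can be tried on its own, outside the parser. The sketch below uses only core Ruby; the constant WIKIWORD_RE and the helper wiki_words are names invented for this illustration.

```ruby
# The pattern from the add_special example: a capital, one or more lowercase
# letters, another capital, then at least one more word character.
WIKIWORD_RE = /\b([A-Z][a-z]+[A-Z]\w+)/

# Hypothetical helper: list every WikiWord found in a piece of text.
def wiki_words(text)
  text.scan(WIKIWORD_RE).flatten
end

p wiki_words("See WikiWord and FooBar, but not Foo or RDoc")
```

Names such as RDoc do not match, because the pattern requires a lowercase letter directly after the first capital.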

Add to the sequences used to add formatting to an individual word (such as bold). Matching entries will generate attributes that the output formatters can recognize by their name.

[Source]

  # File doc-tmp/rdoc/markup.rb, line 204
  def add_word_pair(start, stop, name)
    @am.add_word_pair(start, stop, name)
  end

Look through the text at line indentation. We flag each line as being Blank, a paragraph, a list element, or verbatim text.

[Source]

  # File markup.rb, line 254
  def assign_types_to_lines(margin = 0, level = 0)
    now_blocking = false
    while line = @lines.next
      if line.blank? then
        line.stamp :BLANK, level
        next
      end

      # if a line contains non-blanks before the margin, then it must belong
      # to an outer level

      text = line.text

      for i in 0...margin
        if text[i] != SPACE
          @lines.unget
          return
        end
      end

      active_line = text[margin..-1]

      #
      # block_exceptions checking
      #
      if @block_exceptions
        if now_blocking
          line.stamp(:PARAGRAPH, level)
          @block_exceptions.each{ |be|
            if now_blocking == be['name']
              be['replaces'].each{ |rep|
                line.text.gsub!(rep['from'], rep['to'])
              }
            end
            if now_blocking == be['name'] && line.text =~ be['end']
              now_blocking = false
              break
            end
          }
          next
        else
          @block_exceptions.each{ |be|
            if line.text =~ be['start']
              now_blocking = be['name']
              line.stamp(:PARAGRAPH, level)
              break
            end
          }
          next if now_blocking
        end
      end

      # Rules (horizontal lines) look like
      #
      #  ---   (three or more hyphens)
      #
      # The more hyphens, the thicker the rule
      #

      if /^(---+)\s*$/ =~ active_line
        line.stamp :RULE, level, $1.length-2
        next
      end

      # Then look for list entries.  First the ones that have to have
      # text following them (* xxx, - xxx, and dd. xxx)

      if SIMPLE_LIST_RE =~ active_line
        offset = margin + $1.length
        prefix = $2
        prefix_length = prefix.length

        flag = case prefix
               when "*","-" then :BULLET
               when /^\d/   then :NUMBER
               when /^[A-Z]/ then :UPPERALPHA
               when /^[a-z]/ then :LOWERALPHA
               else raise "Invalid List Type: #{self.inspect}"
               end

        line.stamp :LIST, level+1, prefix, flag
        text[margin, prefix_length] = " " * prefix_length
        assign_types_to_lines(offset, level + 1)
        next
      end

      if LABEL_LIST_RE =~ active_line
        offset = margin + $1.length
        prefix = $2
        prefix_length = prefix.length

        next if handled_labeled_list(line, level, margin, offset, prefix)
      end

      # Headings look like
      # = Main heading
      # == Second level
      # === Third
      #
      # Headings reset the level to 0

      if active_line[0] == ?= and active_line =~ /^(=+)\s*(.*)/
        prefix_length = $1.length
        prefix_length = 6 if prefix_length > 6
        line.stamp :HEADING, 0, prefix_length
        line.strip_leading(margin + prefix_length)
        next
      end

      # If the character's a space, then we have verbatim text,
      # otherwise

      if active_line[0] == SPACE
        line.strip_leading(margin) if margin > 0
        line.stamp :VERBATIM, level
      else
        line.stamp :PARAGRAPH, level
      end
    end
  end
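
The order of the checks in this method (blank, rule, simple list, labeled list, heading, then verbatim or paragraph) can be sketched as a standalone classifier for a single line. This is a simplified illustration, not the method itself: the regular expressions below are stand-ins for SIMPLE_LIST_RE and LABEL_LIST_RE, whose definitions are not shown on this page, alphabetic lists and block exceptions are left out, and classify is a hypothetical name.

```ruby
# Assumed, simplified patterns (the library's own constants are not shown here).
RULE_RE    = /\A(---+)\s*\z/
BULLET_RE  = /\A[*-]\s+\S/
NUMBER_RE  = /\A\d+\.\s+\S/
LABELED_RE = /\A\[.*?\](\s+|\z)/
NOTE_RE    = /\A\S.*?::(\s+|\z)/
HEADING_RE = /\A(=+)\s*(.*)/

# Classify one line relative to the current margin, testing the rules in the
# same order as the listing above. Returns the type plus any parameter.
def classify(line, margin = 0)
  return [:BLANK] if line.strip.empty?

  active = line[margin..-1].to_s
  case active
  when RULE_RE    then [:RULE, $1.length - 2]          # more hyphens, thicker rule
  when BULLET_RE  then [:LIST, :BULLET]
  when NUMBER_RE  then [:LIST, :NUMBER]
  when LABELED_RE then [:LIST, :LABELED]
  when NOTE_RE    then [:LIST, :NOTE]
  when HEADING_RE then [:HEADING, [$1.length, 6].min]  # levels are capped at 6
  when /\A /      then [:VERBATIM]
  else                 [:PARAGRAPH]
  end
end

["= Title", "-----", "* item", "cat:: a small mammal", "  code", "text"].each do |l|
  p [l, classify(l)]
end
```

Unlike the real method, the sketch does not recurse into list bodies or hand lines back to an outer level; it only shows how one line is labelled.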

For debugging, we allow access to our line contents as text.

[Source]

  # File markup.rb, line 489
  def content
    @lines.as_text
  end

We take a string, split it into lines, work out the type of each line, and from there deduce groups of lines (for example all lines in a paragraph). We then invoke the output formatter using a Visitor to display the result.

[Source]

  # File doc-tmp/rdoc/markup.rb, line 234
  def convert(str, op, block_exceptions=nil)
    lines = str.split(/\r?\n/).map { |line| Line.new line }
    @lines = Lines.new lines
    @block_exceptions = block_exceptions

    return "" if @lines.empty?
    @lines.normalize
    assign_types_to_lines
    group = group_lines
    # call the output formatter to handle the result
    #group.each { |line| p line }
    group.accept @am, op
  end
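
The first step of convert, splitting the input on /\r?\n/, behaves the same for Unix and Windows line endings and yields no lines at all for an empty string. A small sketch using only core Ruby (split_lines is a name invented for the illustration):

```ruby
# Split text into lines the same way the first line of convert does.
def split_lines(str)
  str.split(/\r?\n/)
end

p split_lines("one\r\ntwo\nthree")  # mixed line endings
p split_lines("a\n\nb")             # the blank line in the middle is kept
p split_lines("")                   # nothing to process
```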

For debugging, return the list of line types.

[Source]

  # File doc-tmp/rdoc/markup.rb, line 497
  def get_line_types
    @lines.line_types
  end

Return a block consisting of fragments which are paragraphs, list entries or verbatim text. We merge consecutive lines of the same type and level together. We are also slightly tricky with lists: the lines following a list introduction look like paragraph lines at the next level, and we remap them into list entries instead.

[Source]

  # File markup.rb, line 456
  def group_lines
    @lines.rewind

    in_list = false
    wanted_type = wanted_level = nil

    block = LineCollection.new
    group = nil

    while line = @lines.next
      if line.level == wanted_level and line.type == wanted_type
        group.add_text(line.text)
      else
        group = block.fragment_for(line)
        block.add(group)

        if line.type == :LIST
          wanted_type = :PARAGRAPH
        else
          wanted_type = line.type
        end

        wanted_level = line.type == :HEADING ? line.param : line.level
      end
    end

    block.normalize
    block
  end
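
The merging rule, including the list trick, can be shown with plain data structures. The sketch below is an approximation under stated assumptions: SketchLine and the hash-based groups are invented stand-ins for the library's line and fragment classes, the function takes its lines as an argument instead of reading @lines, and the special handling of heading levels is omitted.

```ruby
# Hypothetical stand-in for a typed line: a type, a nesting level, and text.
SketchLine = Struct.new(:type, :level, :text)

# Merge consecutive lines of the same type and level into one group.
# After a :LIST line we look for :PARAGRAPH lines at the same level, so the
# continuation lines of a list entry are folded into that entry.
def group_lines(lines)
  groups = []
  wanted_type = wanted_level = nil

  lines.each do |line|
    if line.type == wanted_type && line.level == wanted_level
      groups.last[:text] += "\n" + line.text
    else
      groups << { type: line.type, level: line.level, text: line.text }
      wanted_type  = line.type == :LIST ? :PARAGRAPH : line.type
      wanted_level = line.level
    end
  end

  groups
end

sample = [
  SketchLine.new(:PARAGRAPH, 0, "first"),
  SketchLine.new(:PARAGRAPH, 0, "second"),
  SketchLine.new(:LIST,      1, "item"),
  SketchLine.new(:PARAGRAPH, 1, "continued"),
  SketchLine.new(:VERBATIM,  0, "  code")
]
group_lines(sample).each { |g| p g }
```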

Handle labeled list entries. We have a special case to deal with. Because the labels can be long, they force the remaining block of text over to the right:

  this is a long label that I wrote:: and here is the
                                      block of text with
                                      a silly margin

So we allow the special case. If the label is followed by nothing, and if the following line is indented, then we take the indent of that line as the new margin.

  this is a long label that I wrote::
      here is a more reasonably indented block which
      will be attached to the label.

[Source]

  # File markup.rb, line 393
  def handled_labeled_list(line, level, margin, offset, prefix)
    prefix_length = prefix.length
    text = line.text
    flag = nil

    case prefix
    when /^\[/ then
      flag = :LABELED
      prefix = prefix[1, prefix.length-2]
    when /:$/ then
      flag = :NOTE
      prefix.chop!
    else
      raise "Invalid List Type: #{self.inspect}"
    end

    # body is on the next line
    if text.length <= offset then
      original_line = line
      line = @lines.next
      return false unless line
      text = line.text

      for i in 0..margin
        if text[i] != SPACE
          @lines.unget
          return false
        end
      end

      i = margin
      i += 1 while text[i] == SPACE

      if i >= text.length then
        @lines.unget
        return false
      else
        offset = i
        prefix_length = 0

        if text[offset..-1] =~ SIMPLE_LIST_RE then
          @lines.unget
          line = original_line
          line.text = ''
        else
          @lines.delete original_line
        end
      end
    end

    line.stamp :LIST, level+1, prefix, flag
    text[margin, prefix_length] = " " * prefix_length
    assign_types_to_lines(offset, level + 1)
    return true
  end
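
The margin rule for the two cases described above can be sketched in isolation. This is a simplified illustration for the "label::" style only, with body_margin as an invented helper; the real method also handles bracketed labels, the current margin, and nested simple lists.

```ruby
# Work out where the body of a "label::" entry starts.
#
# If text follows the "::" on the same line, the body margin is the column
# of its first non-space character. If nothing follows, the indent of the
# next line is used instead. Returns nil when no margin can be determined.
def body_margin(label_line, next_line = nil)
  sep = label_line.index("::")
  return nil unless sep

  body_start = sep + 2
  rest = label_line[body_start..-1]

  # Ordinary case: the body begins on the label line itself.
  return body_start + rest[/\A\s*/].length unless rest.strip.empty?

  # Special case: the label stands alone, so look at the following line.
  return nil if next_line.nil?
  indent = next_line[/\A */].length
  (indent.zero? || indent == next_line.length) ? nil : indent
end

p body_margin("cat::  a small furry mammal")
p body_margin("this is a long label that I wrote::", "    here is a more reasonably indented block")
```

With the long label, the body margin comes from the four-space indent of the second line rather than from the width of the label.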
