| Class | RDoc::Markup |
| In: |
markup.rb
markup/fragments.rb markup/inline.rb markup/lines.rb markup/to_flow.rb doc-tmp/rdoc/markup.rb |
| Parent: | Object |
RDoc::Markup parses plain text documents and attempts to decompose them into their constituent parts. Some of these parts are high-level: paragraphs, chunks of verbatim text, list entries and the like. Other parts happen at the character level: a piece of bold text, a word in code font. This markup is similar in spirit to that used on WikiWiki webs, where folks create web pages using a simple set of formatting rules.
RDoc::Markup itself does no output formatting: this is left to a different set of classes.
RDoc::Markup is extendable at runtime: you can add new markup elements to be recognised in the documents that RDoc::Markup parses.
RDoc::Markup is intended to be the basis for a family of tools which share the common requirement that simple, plain text should be rendered in a variety of different output formats and media. It is envisaged that RDoc::Markup could be the basis for formatting RDoc-style comment blocks, Wiki entries, and online FAQs.
* this is a list with three paragraphs in
  the first item. This is the first paragraph.

  And this is the second paragraph.

  1. This is an indented, numbered list.
  2. This is the second item in that list

  This is the third conventional paragraph in the
  first list item.

* This is the second item in the original list

[cat]   a small furry mammal
        that seems to sleep a lot

[ant]   a little insect that is known
        to enjoy picnics
A minor variation on labeled lists uses two colons to separate the label from the list body:
cat::   a small furry mammal
        that seems to sleep a lot

ant::   a little insect that is known
        to enjoy picnics
This latter style guarantees that the list bodies’ left margins are aligned: think of them as a two column table.
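As a rough illustration of the two labeled-list forms described above, the following standalone sketch splits a line into its label and body. The pattern LABELED_LINE and the helper split_label are hypothetical names invented for this example; they are not the pattern RDoc::Markup itself uses.

```ruby
# Hypothetical pattern: matches "[label] body" or "label:: body".
LABELED_LINE = /\A(?:\[(?<bracket>[^\]]+)\]|(?<colon>[^:\s][^:]*)::)\s*(?<body>.*)\z/

# Returns [label, body], or nil when the line is not a labeled-list entry
# (for example, a continuation line of the previous body).
def split_label(line)
  m = LABELED_LINE.match(line)
  return nil unless m
  [m[:bracket] || m[:colon], m[:body]]
end

p split_label("[cat] a small furry mammal")
p split_label("ant:: a little insect that is known")
p split_label("to enjoy picnics")
```

A real parser also has to track the indentation of continuation lines, which this sketch ignores.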
Word-based markup uses flag characters around individual words:
General markup affects text between a start delimiter and an end delimiter. Not surprisingly, these delimiters look like HTML markup.
Unlike conventional Wiki markup, general markup can cross line boundaries. You can turn off the interpretation of markup by preceding the first character with a backslash, so \<b>bold text</b> and \*bold* produce <b>bold text</b> and *bold* respectively.
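The backslash rule can be sketched in isolation. The helper bold_words below is a hypothetical stand-in written for this example, not part of RDoc::Markup; it handles only the single-word asterisk form and shows how a leading backslash suppresses interpretation.

```ruby
# Hypothetical helper: turns *word* into <b>word</b>, unless the opening
# asterisk is preceded by a backslash, in which case the backslash is
# dropped and the text is passed through untouched.
def bold_words(text)
  text.gsub(/(\\)?\*(\w+)\*/) do
    $1 ? "*#{$2}*" : "<b>#{$2}</b>"
  end
end

puts bold_words('this is *bold* text')
puts bold_words('this is \*bold* text')
```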
Hyperlinks can also be of the form label[url], in which case the label is used in the displayed text, and url is used as the target. If label contains multiple words, put it in braces: {multi word label}[url].
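To make the two link forms concrete, here is a small standalone sketch that pulls label and URL pairs out of a string. The LINK pattern, the extract_links helper, and the example.com addresses are all assumptions made for illustration; RDoc's own link handling is more involved.

```ruby
# Hypothetical pattern for the two forms: label[url] and
# {multi word label}[url].
LINK = /(?:\{(?<multi>[^}]+)\}|(?<single>\w+))\[(?<url>[^\]\s]+)\]/

# Returns an array of [label, url] pairs found in the text.
def extract_links(text)
  # scan yields the captures in group order: multi, single, url.
  text.scan(LINK).map { |multi, single, url| [multi || single, url] }
end

text = "See RDoc[http://example.com/a] and {the Ruby site}[http://example.com/b]"
p extract_links(text)
```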
This code converts input_string to HTML. The conversion takes place in the convert method, so you can use the same RDoc::Markup object to convert multiple input strings.
require 'rdoc/markup'
require 'rdoc/markup/to_html'
p = RDoc::Markup.new
h = RDoc::Markup::ToHtml.new
puts p.convert(input_string, h)
You can extend the RDoc::Markup parser to recognise new markup sequences, and to add special processing for text that matches a regular expression. Here we make WikiWords significant to the parser, and also make the sequences {word} and <no>text...</no> signify strike-through text. We then subclass the HTML output class to deal with these:
require 'rdoc/markup'
require 'rdoc/markup/to_html'
class WikiHtml < RDoc::Markup::ToHtml
  def handle_special_WIKIWORD(special)
    "<font color=red>" + special.text + "</font>"
  end
end
m = RDoc::Markup.new
m.add_word_pair("{", "}", :STRIKE)
m.add_html("no", :STRIKE)
m.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
h = WikiHtml.new
h.add_tag(:STRIKE, "<strike>", "</strike>")
puts "<body>" + m.convert(ARGF.read, h) + "</body>"
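It can help to check which words the WikiWord pattern actually selects before wiring it into a parser. The sketch below applies the same regular expression with plain String#scan; the wiki_words helper is a hypothetical name used only here.

```ruby
# The pattern from the example above: a capital, one or more lowercase
# letters, another capital, then at least one more word character.
WIKIWORD = /\b([A-Z][a-z]+[A-Z]\w+)/

# Hypothetical helper: list every WikiWord found in the text.
def wiki_words(text)
  text.scan(WIKIWORD).flatten
end

p wiki_words("WikiWords link pages; Ruby and RDoc do not, but FrontPage does")
```

Note that RDoc is not matched, because the pattern requires at least one lowercase letter between the first two capitals.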
Take a block of text and use various heuristics to determine its structure (paragraphs, lists, and so on). Invoke an event handler as we identify significant chunks.
# File doc-tmp/rdoc/markup.rb, line 194
def initialize
  @am = RDoc::Markup::AttributeManager.new
  @output = nil
end
Add to the sequences recognized as general markup.
# File doc-tmp/rdoc/markup.rb, line 211
def add_html(tag, name)
  @am.add_html(tag, name)
end
Add to other inline sequences. For example, we could add WikiWords using something like:
parser.add_special(/\b([A-Z][a-z]+[A-Z]\w+)/, :WIKIWORD)
Each wiki word will be presented to the output formatter via the accept_special method.
# File markup.rb, line 224
def add_special(pattern, name)
  @am.add_special(pattern, name)
end
Add to the sequences used to add formatting to an individual word (such as bold). Matching entries will generate attributes that the output formatters can recognize by their name.
# File doc-tmp/rdoc/markup.rb, line 204
def add_word_pair(start, stop, name)
  @am.add_word_pair(start, stop, name)
end
Look through the text at line indentation. We flag each line as being Blank, a paragraph, a list element, or verbatim text.
# File markup.rb, line 254
def assign_types_to_lines(margin = 0, level = 0)
  now_blocking = false
  while line = @lines.next
    if line.blank? then
      line.stamp :BLANK, level
      next
    end

    # if a line contains non-blanks before the margin, then it must belong
    # to an outer level

    text = line.text

    for i in 0...margin
      if text[i] != SPACE
        @lines.unget
        return
      end
    end

    active_line = text[margin..-1]

    #
    # block_exceptions checking
    #
    if @block_exceptions
      if now_blocking
        line.stamp(:PARAGRAPH, level)
        @block_exceptions.each{ |be|
          if now_blocking == be['name']
            be['replaces'].each{ |rep|
              line.text.gsub!(rep['from'], rep['to'])
            }
          end
          if now_blocking == be['name'] && line.text =~ be['end']
            now_blocking = false
            break
          end
        }
        next
      else
        @block_exceptions.each{ |be|
          if line.text =~ be['start']
            now_blocking = be['name']
            line.stamp(:PARAGRAPH, level)
            break
          end
        }
        next if now_blocking
      end
    end

    # Rules (horizontal lines) look like
    #
    #  ---   (three or more hyphens)
    #
    # The more hyphens, the thicker the rule
    #

    if /^(---+)\s*$/ =~ active_line
      line.stamp :RULE, level, $1.length-2
      next
    end

    # Then look for list entries. First the ones that have to have
    # text following them (* xxx, - xxx, and dd. xxx)

    if SIMPLE_LIST_RE =~ active_line
      offset = margin + $1.length
      prefix = $2
      prefix_length = prefix.length

      flag = case prefix
             when "*","-" then :BULLET
             when /^\d/ then :NUMBER
             when /^[A-Z]/ then :UPPERALPHA
             when /^[a-z]/ then :LOWERALPHA
             else raise "Invalid List Type: #{self.inspect}"
             end

      line.stamp :LIST, level+1, prefix, flag
      text[margin, prefix_length] = " " * prefix_length
      assign_types_to_lines(offset, level + 1)
      next
    end

    if LABEL_LIST_RE =~ active_line
      offset = margin + $1.length
      prefix = $2
      prefix_length = prefix.length

      next if handled_labeled_list(line, level, margin, offset, prefix)
    end

    # Headings look like
    # = Main heading
    # == Second level
    # === Third
    #
    # Headings reset the level to 0

    if active_line[0] == ?= and active_line =~ /^(=+)\s*(.*)/
      prefix_length = $1.length
      prefix_length = 6 if prefix_length > 6
      line.stamp :HEADING, 0, prefix_length
      line.strip_leading(margin + prefix_length)
      next
    end

    # If the character's a space, then we have verbatim text,
    # otherwise

    if active_line[0] == SPACE
      line.strip_leading(margin) if margin > 0
      line.stamp :VERBATIM, level
    else
      line.stamp :PARAGRAPH, level
    end
  end
end
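The order of the checks in the method above can be easier to follow in a stripped-down form. The following is a minimal sketch, assuming no nesting, margins, block exceptions, or labeled lists; the classify helper and its patterns are hypothetical and only approximate the behaviour of SIMPLE_LIST_RE and the other tests.

```ruby
# Hypothetical, much-simplified classifier for a single chomped line.
# Returns the line type followed by any parameter for that type.
def classify(line)
  case line
  when /\A\s*\z/
    [:BLANK]
  when /\A(---+)\s*\z/
    [:RULE, $1.length - 2]          # more hyphens, thicker rule
  when /\A(\*|-|\d+\.|[A-Za-z]\.)\s+\S/
    [:LIST, $1]                     # bullet, number, or letter prefix
  when /\A(=+)\s*(.*)/
    [:HEADING, [$1.length, 6].min]  # heading depth is capped at 6
  when /\A\s/
    [:VERBATIM]                     # leading space means verbatim text
  else
    [:PARAGRAPH]
  end
end

["= Title", "plain text", "  code", "* item", "-----", ""].each do |l|
  p [l, classify(l)]
end
```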
For debugging, we allow access to our line contents as text.
# File markup.rb, line 489
def content
  @lines.as_text
end
We take a string, split it into lines, work out the type of each line, and from there deduce groups of lines (for example all lines in a paragraph). We then invoke the output formatter using a Visitor to display the result.
# File doc-tmp/rdoc/markup.rb, line 234
def convert(str, op, block_exceptions=nil)
  lines = str.split(/\r?\n/).map { |line| Line.new line }
  @lines = Lines.new lines
  @block_exceptions = block_exceptions

  return "" if @lines.empty?
  @lines.normalize
  assign_types_to_lines
  group = group_lines
  # call the output formatter to handle the result
  #group.each { |line| p line }
  group.accept @am, op
end
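The final step, handing the grouped result to the output formatter through a Visitor, can be sketched independently of RDoc. The Fragment struct and TinyHtml class below are hypothetical names invented for this example; they only show the dispatch idea, not RDoc's actual fragment or formatter classes.

```ruby
# Hypothetical fragment type for this sketch: a line type plus its text.
Fragment = Struct.new(:type, :text) do
  # Visitor entry point: hand ourselves to the formatter method named
  # after our type, e.g. :PARAGRAPH dispatches to accept_paragraph.
  def accept(formatter)
    formatter.public_send("accept_#{type.to_s.downcase}", self)
  end
end

# Hypothetical formatter; any object with matching accept_* methods works.
class TinyHtml
  def accept_paragraph(fragment)
    "<p>#{fragment.text}</p>"
  end

  def accept_verbatim(fragment)
    "<pre>#{fragment.text}</pre>"
  end
end

fragments = [Fragment.new(:PARAGRAPH, "hello"), Fragment.new(:VERBATIM, "  code")]
puts fragments.map { |f| f.accept(TinyHtml.new) }.join("\n")
```

Because the parser only calls accept, a different output format needs nothing more than another formatter class.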
For debugging, return the list of line types.
# File doc-tmp/rdoc/markup.rb, line 497
def get_line_types
  @lines.line_types
end
Return a block consisting of fragments which are paragraphs, list entries or verbatim text. We merge consecutive lines of the same type and level together. We are also slightly tricky with lists: the lines following a list introduction look like paragraph lines at the next level, and we remap them into list entries instead.
# File markup.rb, line 456
def group_lines
  @lines.rewind

  in_list = false
  wanted_type = wanted_level = nil

  block = LineCollection.new
  group = nil

  while line = @lines.next
    if line.level == wanted_level and line.type == wanted_type
      group.add_text(line.text)
    else
      group = block.fragment_for(line)
      block.add(group)

      if line.type == :LIST
        wanted_type = :PARAGRAPH
      else
        wanted_type = line.type
      end

      wanted_level = line.type == :HEADING ? line.param : line.level
    end
  end

  block.normalize
  block
end
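The core merging rule, joining consecutive lines that share a type and level, can be sketched with Enumerable#chunk_while. TypedLine and group are hypothetical names for this example, and the sketch deliberately leaves out the list remapping and heading handling that the method above performs.

```ruby
# Hypothetical line record for this sketch: a type, a nesting level, and text.
TypedLine = Struct.new(:type, :level, :text)

# Merge runs of consecutive lines that share both type and level into
# [type, level, joined_text] triples.
def group(lines)
  lines.chunk_while { |a, b| a.type == b.type && a.level == b.level }
       .map { |run| [run.first.type, run.first.level, run.map(&:text).join("\n")] }
end

lines = [
  TypedLine.new(:PARAGRAPH, 0, "first line"),
  TypedLine.new(:PARAGRAPH, 0, "second line"),
  TypedLine.new(:VERBATIM,  0, "  code"),
  TypedLine.new(:PARAGRAPH, 0, "closing line")
]
group(lines).each { |fragment| p fragment }
```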
Handle labeled list entries. We have a special case to deal with: because labels can be long, they force the remaining block of text over to the right:
this is a long label that I wrote:: and here is the
                                    block of text with
                                    a silly margin
So we allow a special case. If the label is followed by nothing, and if the following line is indented, then we take the indent of that line as the new margin.
this is a long label that I wrote::
    here is a more reasonably indented block which
    will be attached to the label.
# File markup.rb, line 393
def handled_labeled_list(line, level, margin, offset, prefix)
  prefix_length = prefix.length
  text = line.text
  flag = nil

  case prefix
  when /^\[/ then
    flag = :LABELED
    prefix = prefix[1, prefix.length-2]
  when /:$/ then
    flag = :NOTE
    prefix.chop!
  else
    raise "Invalid List Type: #{self.inspect}"
  end

  # body is on the next line
  if text.length <= offset then
    original_line = line
    line = @lines.next
    return false unless line
    text = line.text

    for i in 0..margin
      if text[i] != SPACE
        @lines.unget
        return false
      end
    end

    i = margin
    i += 1 while text[i] == SPACE

    if i >= text.length then
      @lines.unget
      return false
    else
      offset = i
      prefix_length = 0

      if text[offset..-1] =~ SIMPLE_LIST_RE then
        @lines.unget
        line = original_line
        line.text = ''
      else
        @lines.delete original_line
      end
    end
  end

  line.stamp :LIST, level+1, prefix, flag
  text[margin, prefix_length] = " " * prefix_length
  assign_types_to_lines(offset, level + 1)
  return true
end
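The margin rule described for labeled lists can be shown on its own. This is a minimal sketch for the two-colon form only, assuming top-level entries; body_margin is a hypothetical helper written for this example and does not reproduce the method above line for line.

```ruby
# Hypothetical helper: given a "label::" line and the line after it, return
# the column at which the list body starts, or nil if no body is attached.
def body_margin(label_line, next_line)
  m = /\A(\s*\S.*?::\s*)(.*)\z/.match(label_line)
  return nil unless m
  return m[1].length unless m[2].empty?  # body shares the label's line

  following = next_line.to_s
  indent = following[/\A */].length
  # An unindented, blank, or missing following line does not belong
  # to this label.
  return nil if indent.zero? || indent >= following.length
  indent
end

p body_margin("cat:: a small furry mammal", nil)
p body_margin("this is a long label that I wrote::",
              "    here is a more reasonably indented block")
```

In the first call the margin is the column just past the label and its trailing space; in the second it is the indent of the following line, which keeps long labels from pushing the body far to the right.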