170 lines
6.0 KiB
ReStructuredText
170 lines
6.0 KiB
ReStructuredText
.. -*- mode: rst -*-
|
|
|
|
========================
|
|
Write your own formatter
|
|
========================
|
|
|
|
As well as creating :doc:`your own lexer <lexerdevelopment>`, writing a new
|
|
formatter for Pygments is easy and straightforward.
|
|
|
|
A formatter is a class that is initialized with some keyword arguments (the
|
|
formatter options) and that must provides a `format()` method.
|
|
Additionally a formatter should provide a `get_style_defs()` method that
|
|
returns the style definitions from the style in a form usable for the
|
|
formatter's output format.
|
|
|
|
|
|
Quickstart
|
|
==========
|
|
|
|
The most basic formatter shipped with Pygments is the `NullFormatter`. It just
|
|
sends the value of a token to the output stream:
|
|
|
|
.. sourcecode:: python
|
|
|
|
from pygments.formatter import Formatter
|
|
|
|
class NullFormatter(Formatter):
|
|
def format(self, tokensource, outfile):
|
|
for ttype, value in tokensource:
|
|
outfile.write(value)
|
|
|
|
As you can see, the `format()` method is passed two parameters: `tokensource`
|
|
and `outfile`. The first is an iterable of ``(token_type, value)`` tuples,
|
|
the latter a file like object with a `write()` method.
|
|
|
|
Because the formatter is that basic it doesn't overwrite the `get_style_defs()`
|
|
method.
|
|
|
|
|
|
Styles
|
|
======
|
|
|
|
Styles aren't instantiated but their metaclass provides some class functions
|
|
so that you can access the style definitions easily.
|
|
|
|
Styles are iterable and yield tuples in the form ``(ttype, d)`` where `ttype`
|
|
is a token and `d` is a dict with the following keys:
|
|
|
|
``'color'``
|
|
Hexadecimal color value (eg: ``'ff0000'`` for red) or `None` if not
|
|
defined.
|
|
|
|
``'bold'``
|
|
`True` if the value should be bold
|
|
|
|
``'italic'``
|
|
`True` if the value should be italic
|
|
|
|
``'underline'``
|
|
`True` if the value should be underlined
|
|
|
|
``'bgcolor'``
|
|
Hexadecimal color value for the background (eg: ``'eeeeeee'`` for light
|
|
gray) or `None` if not defined.
|
|
|
|
``'border'``
|
|
Hexadecimal color value for the border (eg: ``'0000aa'`` for a dark
|
|
blue) or `None` for no border.
|
|
|
|
Additional keys might appear in the future, formatters should ignore all keys
|
|
they don't support.
|
|
|
|
|
|
HTML 3.2 Formatter
|
|
==================
|
|
|
|
For an more complex example, let's implement a HTML 3.2 Formatter. We don't
|
|
use CSS but inline markup (``<u>``, ``<font>``, etc). Because this isn't good
|
|
style this formatter isn't in the standard library ;-)
|
|
|
|
.. sourcecode:: python
|
|
|
|
from pygments.formatter import Formatter
|
|
|
|
class OldHtmlFormatter(Formatter):
|
|
|
|
def __init__(self, **options):
|
|
Formatter.__init__(self, **options)
|
|
|
|
# create a dict of (start, end) tuples that wrap the
|
|
# value of a token so that we can use it in the format
|
|
# method later
|
|
self.styles = {}
|
|
|
|
# we iterate over the `_styles` attribute of a style item
|
|
# that contains the parsed style values.
|
|
for token, style in self.style:
|
|
start = end = ''
|
|
# a style item is a tuple in the following form:
|
|
# colors are readily specified in hex: 'RRGGBB'
|
|
if style['color']:
|
|
start += '<font color="#%s">' % style['color']
|
|
end = '</font>' + end
|
|
if style['bold']:
|
|
start += '<b>'
|
|
end = '</b>' + end
|
|
if style['italic']:
|
|
start += '<i>'
|
|
end = '</i>' + end
|
|
if style['underline']:
|
|
start += '<u>'
|
|
end = '</u>' + end
|
|
self.styles[token] = (start, end)
|
|
|
|
def format(self, tokensource, outfile):
|
|
# lastval is a string we use for caching
|
|
# because it's possible that an lexer yields a number
|
|
# of consecutive tokens with the same token type.
|
|
# to minimize the size of the generated html markup we
|
|
# try to join the values of same-type tokens here
|
|
lastval = ''
|
|
lasttype = None
|
|
|
|
# wrap the whole output with <pre>
|
|
outfile.write('<pre>')
|
|
|
|
for ttype, value in tokensource:
|
|
# if the token type doesn't exist in the stylemap
|
|
# we try it with the parent of the token type
|
|
# eg: parent of Token.Literal.String.Double is
|
|
# Token.Literal.String
|
|
while ttype not in self.styles:
|
|
ttype = ttype.parent
|
|
if ttype == lasttype:
|
|
# the current token type is the same of the last
|
|
# iteration. cache it
|
|
lastval += value
|
|
else:
|
|
# not the same token as last iteration, but we
|
|
# have some data in the buffer. wrap it with the
|
|
# defined style and write it to the output file
|
|
if lastval:
|
|
stylebegin, styleend = self.styles[lasttype]
|
|
outfile.write(stylebegin + lastval + styleend)
|
|
# set lastval/lasttype to current values
|
|
lastval = value
|
|
lasttype = ttype
|
|
|
|
# if something is left in the buffer, write it to the
|
|
# output file, then close the opened <pre> tag
|
|
if lastval:
|
|
stylebegin, styleend = self.styles[lasttype]
|
|
outfile.write(stylebegin + lastval + styleend)
|
|
outfile.write('</pre>\n')
|
|
|
|
The comments should explain it. Again, this formatter doesn't override the
|
|
`get_style_defs()` method. If we would have used CSS classes instead of
|
|
inline HTML markup, we would need to generate the CSS first. For that
|
|
purpose the `get_style_defs()` method exists:
|
|
|
|
|
|
Generating Style Definitions
|
|
============================
|
|
|
|
Some formatters like the `LatexFormatter` and the `HtmlFormatter` don't
|
|
output inline markup but reference either macros or css classes. Because
|
|
the definitions of those are not part of the output, the `get_style_defs()`
|
|
method exists. It is passed one parameter (if it's used and how it's used
|
|
is up to the formatter) and has to return a string or ``None``.
|