Update vendor'd Pygments to 2.1.3
316
vendor/pygments/doc/docs/api.rst
vendored
Normal file
@@ -0,0 +1,316 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
=====================
|
||||
The full Pygments API
|
||||
=====================
|
||||
|
||||
This page describes the Pygments API.
|
||||
|
||||
High-level API
|
||||
==============
|
||||
|
||||
.. module:: pygments
|
||||
|
||||
Functions from the :mod:`pygments` module:
|
||||
|
||||
.. function:: lex(code, lexer)
|
||||
|
||||
Lex `code` with the `lexer` (must be a `Lexer` instance)
|
||||
and return an iterable of tokens. Currently, this only calls
|
||||
`lexer.get_tokens()`.
|
||||
|
||||
.. function:: format(tokens, formatter, outfile=None)
|
||||
|
||||
Format a token stream (iterable of tokens) `tokens` with the
|
||||
`formatter` (must be a `Formatter` instance). The result is
|
||||
written to `outfile`, or if that is ``None``, returned as a
|
||||
string.
|
||||
|
||||
.. function:: highlight(code, lexer, formatter, outfile=None)
|
||||
|
||||
This is the most high-level highlighting function.
|
||||
It combines `lex` and `format` in one function.
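As a minimal sketch of how these three functions relate (the sample string and the choice of ``PythonLexer`` and ``HtmlFormatter`` are only illustrative assumptions):

```python
from pygments import format, highlight, lex
from pygments.formatters import HtmlFormatter
from pygments.lexers import PythonLexer

code = 'print("Hello World")\n'

# One call: lex the code and format the resulting tokens.
html = highlight(code, PythonLexer(), HtmlFormatter())

# The same work in two explicit steps.
tokens = lex(code, PythonLexer())
html_two_step = format(tokens, HtmlFormatter())
```

Because no ``outfile`` is passed, both calls return the formatted result as a string.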
|
||||
|
||||
|
||||
.. module:: pygments.lexers
|
||||
|
||||
Functions from :mod:`pygments.lexers`:
|
||||
|
||||
.. function:: get_lexer_by_name(alias, **options)
|
||||
|
||||
Return an instance of a `Lexer` subclass that has `alias` in its
|
||||
aliases list. The lexer is given the `options` at its
|
||||
instantiation.
|
||||
|
||||
Will raise :exc:`pygments.util.ClassNotFound` if no lexer with that alias is
|
||||
found.
|
||||
|
||||
.. function:: get_lexer_for_filename(fn, **options)
|
||||
|
||||
Return a `Lexer` subclass instance that has a filename pattern
|
||||
matching `fn`. The lexer is given the `options` at its
|
||||
instantiation.
|
||||
|
||||
Will raise :exc:`pygments.util.ClassNotFound` if no lexer for that filename
|
||||
is found.
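The lookups by alias and by file name might be used as in the following sketch. The alias ``no-such-language`` and the fallback to the ``text`` lexer are assumptions made for illustration only:

```python
from pygments.lexers import get_lexer_by_name, get_lexer_for_filename
from pygments.util import ClassNotFound

# Look up a lexer by alias; keyword arguments become lexer options.
by_name = get_lexer_by_name("python", stripall=True)

# Look up a lexer through its file name patterns (e.g. "*.py").
by_filename = get_lexer_for_filename("setup.py")

# An unknown alias raises ClassNotFound, so callers can fall back.
try:
    fallback = get_lexer_by_name("no-such-language")
except ClassNotFound:
    fallback = get_lexer_by_name("text")
```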
|
||||
|
||||
.. function:: get_lexer_for_mimetype(mime, **options)
|
||||
|
||||
Return a `Lexer` subclass instance that has `mime` in its mimetype
|
||||
list. The lexer is given the `options` at its instantiation.
|
||||
|
||||
Will raise :exc:`pygments.util.ClassNotFound` if no lexer for that mimetype
|
||||
is found.
|
||||
|
||||
.. function:: guess_lexer(text, **options)
|
||||
|
||||
Return a `Lexer` subclass instance that's guessed from the text in
|
||||
`text`. For that, the :meth:`.analyse_text()` method of every known lexer
|
||||
class is called with the text as argument, and the lexer which returned the
|
||||
highest value will be instantiated and returned.
|
||||
|
||||
:exc:`pygments.util.ClassNotFound` is raised if no lexer thinks it can
|
||||
handle the content.
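A short sketch of guessing with a guard for the failure case. The sample text is an assumption, and which lexer is chosen depends on the installed Pygments version, so no particular result is claimed here:

```python
from pygments.lexers import guess_lexer
from pygments.util import ClassNotFound

sample = "#!/usr/bin/env python\nprint('hello')\n"

try:
    lexer = guess_lexer(sample)
    print(lexer.name)
except ClassNotFound:
    # No lexer claimed the text; the caller decides what to do next.
    lexer = None
```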
|
||||
|
||||
.. function:: guess_lexer_for_filename(filename, text, **options)
|
||||
|
||||
As :func:`guess_lexer()`, but only lexers which have a pattern in `filenames`
|
||||
or `alias_filenames` that matches `filename` are taken into consideration.
|
||||
|
||||
:exc:`pygments.util.ClassNotFound` is raised if no lexer thinks it can
|
||||
handle the content.
|
||||
|
||||
.. function:: get_all_lexers()
|
||||
|
||||
Return an iterable over all registered lexers, yielding tuples in the
|
||||
format::
|
||||
|
||||
(longname, tuple of aliases, tuple of filename patterns, tuple of mimetypes)
|
||||
|
||||
.. versionadded:: 0.6
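For instance, the tuples can be unpacked to build a lookup table from alias to long name. This is only a sketch; the variable names are not part of the API:

```python
from pygments.lexers import get_all_lexers

# Map every alias to the human-readable lexer name.
alias_to_name = {}
for longname, aliases, filename_patterns, mimetypes in get_all_lexers():
    for alias in aliases:
        alias_to_name[alias] = longname
```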
|
||||
|
||||
|
||||
.. module:: pygments.formatters
|
||||
|
||||
Functions from :mod:`pygments.formatters`:
|
||||
|
||||
.. function:: get_formatter_by_name(alias, **options)
|
||||
|
||||
Return an instance of a :class:`.Formatter` subclass that has `alias` in its
|
||||
aliases list. The formatter is given the `options` at its instantiation.
|
||||
|
||||
Will raise :exc:`pygments.util.ClassNotFound` if no formatter with that
|
||||
alias is found.
|
||||
|
||||
.. function:: get_formatter_for_filename(fn, **options)
|
||||
|
||||
Return a :class:`.Formatter` subclass instance that has a filename pattern
|
||||
matching `fn`. The formatter is given the `options` at its instantiation.
|
||||
|
||||
Will raise :exc:`pygments.util.ClassNotFound` if no formatter for that filename
|
||||
is found.
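The formatter lookups mirror the lexer lookups. A minimal sketch, assuming the builtin HTML formatter and a made-up output file name ``report.html``:

```python
from pygments.formatters import get_formatter_by_name, get_formatter_for_filename

# Keyword arguments are passed on as formatter options.
html_formatter = get_formatter_by_name("html", linenos=True)

# The file name only has to match one of the formatter's patterns.
from_filename = get_formatter_for_filename("report.html")
```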
|
||||
|
||||
|
||||
.. module:: pygments.styles
|
||||
|
||||
Functions from :mod:`pygments.styles`:
|
||||
|
||||
.. function:: get_style_by_name(name)
|
||||
|
||||
Return a style class by its short name. The names of the builtin styles
|
||||
are listed in :data:`pygments.styles.STYLE_MAP`.
|
||||
|
||||
Will raise :exc:`pygments.util.ClassNotFound` if no style of that name is
|
||||
found.
|
||||
|
||||
.. function:: get_all_styles()
|
||||
|
||||
Return an iterable over all registered styles, yielding their names.
|
||||
|
||||
.. versionadded:: 0.6
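The two style functions can be combined as sketched below. ``load_style`` is a hypothetical helper written for this example, not part of Pygments:

```python
from pygments.styles import get_all_styles, get_style_by_name
from pygments.util import ClassNotFound

style_names = sorted(get_all_styles())


def load_style(name, default="default"):
    """Return the style class for `name`, or the default style if unknown."""
    try:
        return get_style_by_name(name)
    except ClassNotFound:
        return get_style_by_name(default)


emacs_style = load_style("emacs")
unknown_style = load_style("no-such-style")
```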
|
||||
|
||||
|
||||
.. module:: pygments.lexer
|
||||
|
||||
Lexers
|
||||
======
|
||||
|
||||
The base lexer class from which all lexers are derived is:
|
||||
|
||||
.. class:: Lexer(**options)
|
||||
|
||||
The constructor takes a \*\*keywords dictionary of options.
|
||||
Every subclass must first process its own options and then call
|
||||
the `Lexer` constructor, since it processes the `stripnl`,
|
||||
`stripall` and `tabsize` options.
|
||||
|
||||
An example looks like this:
|
||||
|
||||
.. sourcecode:: python
|
||||
|
||||
    def __init__(self, **options):
        self.compress = options.get('compress', '')
        Lexer.__init__(self, **options)
|
||||
|
||||
As these options must all be specifiable as strings (due to the
|
||||
command line usage), there are various utility functions
|
||||
available to help with that, see `Option processing`_.
|
||||
|
||||
.. method:: get_tokens(text)
|
||||
|
||||
This method is the basic interface of a lexer. It is called by
|
||||
the `highlight()` function. It must process the text and return an
|
||||
iterable of ``(tokentype, value)`` pairs from `text`.
|
||||
|
||||
Normally, you don't need to override this method. The default
|
||||
implementation processes the `stripnl`, `stripall` and `tabsize`
|
||||
options and then yields all tokens from `get_tokens_unprocessed()`,
|
||||
with the ``index`` dropped.
|
||||
|
||||
.. method:: get_tokens_unprocessed(text)
|
||||
|
||||
This method should process the text and return an iterable of
|
||||
``(index, tokentype, value)`` tuples where ``index`` is the starting
|
||||
position of the token within the input text.
|
||||
|
||||
This method must be overridden by subclasses.
|
||||
|
||||
.. staticmethod:: analyse_text(text)
|
||||
|
||||
A static method which is called for lexer guessing. It should analyse
|
||||
the text and return a float in the range from ``0.0`` to ``1.0``.
|
||||
If it returns ``0.0``, the lexer will not be selected as the most
|
||||
probable one; if it returns ``1.0``, it will be selected immediately.
|
||||
|
||||
.. note:: You don't have to add ``@staticmethod`` to the definition of
|
||||
this method, this will be taken care of by the Lexer's metaclass.
|
||||
|
||||
For a list of known tokens have a look at the :doc:`tokens` page.
|
||||
|
||||
A lexer also can have the following attributes (in fact, they are mandatory
|
||||
except `alias_filenames`) that are used by the builtin lookup mechanism.
|
||||
|
||||
.. attribute:: name
|
||||
|
||||
Full name for the lexer, in human-readable form.
|
||||
|
||||
.. attribute:: aliases
|
||||
|
||||
A list of short, unique identifiers that can be used to look up
|
||||
the lexer from a list, e.g. using `get_lexer_by_name()`.
|
||||
|
||||
.. attribute:: filenames
|
||||
|
||||
A list of `fnmatch` patterns that match filenames which contain
|
||||
content for this lexer. The patterns in this list should be unique among
|
||||
all lexers.
|
||||
|
||||
.. attribute:: alias_filenames
|
||||
|
||||
A list of `fnmatch` patterns that match filenames which may or may not
|
||||
contain content for this lexer. This list is used by the
|
||||
:func:`.guess_lexer_for_filename()` function, to determine which lexers
|
||||
are then included in guessing the correct one. That means that
|
||||
e.g. every lexer for HTML and a template language should include
|
||||
``\*.html`` in this list.
|
||||
|
||||
.. attribute:: mimetypes
|
||||
|
||||
A list of MIME types for content that can be lexed with this
|
||||
lexer.
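Putting the pieces together, a deliberately tiny lexer subclass could look like the sketch below. ``DigitLexer``, its alias and its regular expression are hypothetical and exist only to show which method to override and which attributes to set:

```python
import re

from pygments.lexer import Lexer
from pygments.token import Number, Text


class DigitLexer(Lexer):
    """Mark runs of digits as numbers and everything else as text."""

    name = "Digits (example)"
    aliases = ["digits-example"]
    filenames = []
    mimetypes = []

    def get_tokens_unprocessed(self, text):
        # Yield (index, tokentype, value) for each run of digits or non-digits.
        for match in re.finditer(r"\d+|\D+", text):
            value = match.group()
            ttype = Number if value.isdigit() else Text
            yield match.start(), ttype, value


# get_tokens() applies the standard options and drops the index.
tokens = list(DigitLexer().get_tokens("ab12\n"))
```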
|
||||
|
||||
|
||||
.. module:: pygments.formatter
|
||||
|
||||
Formatters
|
||||
==========
|
||||
|
||||
A formatter is derived from this class:
|
||||
|
||||
|
||||
.. class:: Formatter(**options)
|
||||
|
||||
As with lexers, this constructor processes options and then must call the
|
||||
base class :meth:`__init__`.
|
||||
|
||||
The :class:`Formatter` class recognizes the options `style`, `full` and
|
||||
`title`. It is up to the formatter class whether it uses them.
|
||||
|
||||
.. method:: get_style_defs(arg='')
|
||||
|
||||
This method must return statements or declarations suitable to define
|
||||
the current style for subsequent highlighted text (e.g. CSS classes
|
||||
in the `HtmlFormatter`).
|
||||
|
||||
The optional argument `arg` can be used to modify the generation and
|
||||
is formatter dependent (it is standardized because it can be given on
|
||||
the command line).
|
||||
|
||||
This method is called by the ``-S`` :doc:`command-line option <cmdline>`,
|
||||
the `arg` is then given by the ``-a`` option.
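With the HTML formatter, for example, `arg` is used as a CSS selector prefix. A minimal sketch, assuming the ``colorful`` style and a ``.syntax`` selector:

```python
from pygments.formatters import HtmlFormatter

formatter = HtmlFormatter(style="colorful")

# Returns the CSS rules as one string, each prefixed with ".syntax".
css = formatter.get_style_defs(".syntax")
```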
|
||||
|
||||
.. method:: format(tokensource, outfile)
|
||||
|
||||
This method must format the tokens from the `tokensource` iterable and
|
||||
write the formatted version to the file object `outfile`.
|
||||
|
||||
Formatter options can control how exactly the tokens are converted.
|
||||
|
||||
.. versionadded:: 0.7
|
||||
A formatter must have the following attributes that are used by the
|
||||
builtin lookup mechanism.
|
||||
|
||||
.. attribute:: name
|
||||
|
||||
Full name for the formatter, in human-readable form.
|
||||
|
||||
.. attribute:: aliases
|
||||
|
||||
A list of short, unique identifiers that can be used to look up
|
||||
the formatter from a list, e.g. using :func:`.get_formatter_by_name()`.
|
||||
|
||||
.. attribute:: filenames
|
||||
|
||||
A list of :mod:`fnmatch` patterns that match filenames for which this
|
||||
formatter can produce output. The patterns in this list should be unique
|
||||
among all formatters.
|
||||
|
||||
|
||||
.. module:: pygments.util
|
||||
|
||||
Option processing
|
||||
=================
|
||||
|
||||
The :mod:`pygments.util` module has some utility functions usable for option
|
||||
processing:
|
||||
|
||||
.. exception:: OptionError
|
||||
|
||||
This exception will be raised by all option processing functions if
|
||||
the type or value of the argument is not correct.
|
||||
|
||||
.. function:: get_bool_opt(options, optname, default=None)
|
||||
|
||||
Interpret the key `optname` from the dictionary `options` as a boolean and
|
||||
return it. Return `default` if `optname` is not in `options`.
|
||||
|
||||
The valid string values for ``True`` are ``1``, ``yes``, ``true`` and
|
||||
``on``, the ones for ``False`` are ``0``, ``no``, ``false`` and ``off``
|
||||
(matched case-insensitively).
|
||||
|
||||
.. function:: get_int_opt(options, optname, default=None)
|
||||
|
||||
As :func:`get_bool_opt`, but interpret the value as an integer.
|
||||
|
||||
.. function:: get_list_opt(options, optname, default=None)
|
||||
|
||||
If the key `optname` from the dictionary `options` is a string,
|
||||
split it at whitespace and return it. If it is already a list
|
||||
or a tuple, it is returned as a list.
|
||||
|
||||
.. function:: get_choice_opt(options, optname, allowed, default=None)
|
||||
|
||||
If the key `optname` from the dictionary is not in the sequence
|
||||
`allowed`, raise an error, otherwise return it.
|
||||
|
||||
.. versionadded:: 0.8
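The helpers above can be exercised as in this sketch. The option names and values in the dictionary are invented for the example:

```python
from pygments.util import (OptionError, get_bool_opt, get_choice_opt,
                           get_int_opt, get_list_opt)

# Options often arrive as strings, e.g. from the command line.
options = {"linenos": "yes", "tabsize": "4", "names": "foo bar", "case": "upper"}

linenos = get_bool_opt(options, "linenos", False)
tabsize = get_int_opt(options, "tabsize", 8)
names = get_list_opt(options, "names", [])
case = get_choice_opt(options, "case", ["lower", "upper"], "lower")

# A value that cannot be interpreted raises OptionError.
try:
    get_bool_opt({"linenos": "maybe"}, "linenos")
except OptionError as exc:
    error_message = str(exc)
```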
|
||||
4
vendor/pygments/doc/docs/authors.rst
vendored
Normal file
@@ -0,0 +1,4 @@
|
||||
Full contributor list
|
||||
=====================
|
||||
|
||||
.. include:: ../../AUTHORS
|
||||
1
vendor/pygments/doc/docs/changelog.rst
vendored
Normal file
@@ -0,0 +1 @@
|
||||
.. include:: ../../CHANGES
|
||||
149
vendor/pygments/doc/docs/cmdline.rst
vendored
Normal file
@@ -0,0 +1,149 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
======================
|
||||
Command Line Interface
|
||||
======================
|
||||
|
||||
You can use Pygments from the shell, provided you installed the
|
||||
:program:`pygmentize` script::
|
||||
|
||||
$ pygmentize test.py
|
||||
print "Hello World"
|
||||
|
||||
will print the file test.py to standard output, using the Python lexer
|
||||
(inferred from the file name extension) and the terminal formatter (because
|
||||
you didn't give an explicit formatter name).
|
||||
|
||||
If you want HTML output::
|
||||
|
||||
$ pygmentize -f html -l python -o test.html test.py
|
||||
|
||||
As you can see, the ``-l`` option explicitly selects a lexer. As seen above, if you
|
||||
give an input file name and it has an extension that Pygments recognizes, you can
|
||||
omit this option.
|
||||
|
||||
The ``-o`` option gives an output file name. If it is not given, output is
|
||||
written to stdout.
|
||||
|
||||
The ``-f`` option selects a formatter (as with ``-l``, it can also be omitted
|
||||
if an output file name is given and has a supported extension).
|
||||
If no output file name is given and ``-f`` is omitted, the
|
||||
:class:`.TerminalFormatter` is used.
|
||||
|
||||
The above command could therefore also be given as::
|
||||
|
||||
$ pygmentize -o test.html test.py
|
||||
|
||||
To create a full HTML document, including line numbers and stylesheet (using the
|
||||
"emacs" style), highlighting the Python file ``test.py`` to ``test.html``::
|
||||
|
||||
$ pygmentize -O full,style=emacs,linenos=1 -o test.html test.py
|
||||
|
||||
|
||||
Options and filters
|
||||
-------------------
|
||||
|
||||
Lexer and formatter options can be given using the ``-O`` option::
|
||||
|
||||
$ pygmentize -f html -O style=colorful,linenos=1 -l python test.py
|
||||
|
||||
Be sure to enclose the option string in quotes if it contains any special shell
|
||||
characters, such as spaces or expansion wildcards like ``*``. If an option
|
||||
expects a list value, separate the list entries with spaces (you'll have to
|
||||
quote the option value in this case too, so that the shell doesn't split it).
|
||||
|
||||
Since the ``-O`` option argument is split at commas and expects the split values
|
||||
to be of the form ``name=value``, you can't give an option value that contains
|
||||
commas or equals signs. Therefore, an option ``-P`` is provided (as of Pygments
|
||||
0.9) that works like ``-O`` but can only pass one option per ``-P``. Its value
|
||||
can then contain all characters::
|
||||
|
||||
$ pygmentize -P "heading=Pygments, the Python highlighter" ...
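To see why a value containing a comma cannot be passed with ``-O``, here is a rough sketch of the splitting rule described above. ``parse_option_string`` is a hypothetical helper, not the code Pygments uses, and treating a bare name as a true flag is an assumption of this example:

```python
def parse_option_string(option_string):
    """Split a ``-O``-style string such as "style=emacs,linenos=1" into a dict."""
    options = {}
    for item in option_string.split(","):
        item = item.strip()
        if not item:
            continue
        if "=" in item:
            name, value = item.split("=", 1)
            options[name.strip()] = value.strip()
        else:
            # Assumption: a bare name is treated as a flag that is switched on.
            options[item] = True
    return options


parsed = parse_option_string("style=colorful,linenos=1")

# The comma inside the intended value splits it into two unrelated entries.
broken = parse_option_string("heading=Pygments, the Python highlighter")
```

Here ``broken`` ends up with a shortened ``heading`` value plus a stray entry, which is the situation ``-P`` avoids.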
|
||||
|
||||
Filters are added to the token stream using the ``-F`` option::
|
||||
|
||||
$ pygmentize -f html -l pascal -F keywordcase:case=upper main.pas
|
||||
|
||||
As you see, options for the filter are given after a colon. As for ``-O``, the
|
||||
filter name and options must be one shell word, so there may not be any spaces
|
||||
around the colon.
|
||||
|
||||
|
||||
Generating styles
|
||||
-----------------
|
||||
|
||||
Formatters normally don't output full style information. For example, the HTML
|
||||
formatter by default only outputs ``<span>`` tags with ``class`` attributes.
|
||||
Therefore, there's a special ``-S`` option for generating style definitions.
|
||||
Usage is as follows::
|
||||
|
||||
$ pygmentize -f html -S colorful -a .syntax
|
||||
|
||||
generates a CSS style sheet (because you selected the HTML formatter) for
|
||||
the "colorful" style prepending a ".syntax" selector to all style rules.
|
||||
|
||||
For an explanation of what ``-a`` means for :doc:`a particular formatter
|
||||
<formatters>`, look for the `arg` argument for the formatter's
|
||||
:meth:`.get_style_defs()` method.
|
||||
|
||||
|
||||
Getting lexer names
|
||||
-------------------
|
||||
|
||||
.. versionadded:: 1.0
|
||||
|
||||
The ``-N`` option guesses a lexer name for a given filename, so that ::
|
||||
|
||||
$ pygmentize -N setup.py
|
||||
|
||||
will print out ``python``. It won't highlight anything yet. If no specific
|
||||
lexer is known for that filename, ``text`` is printed.
|
||||
|
||||
|
||||
Getting help
|
||||
------------
|
||||
|
||||
The ``-L`` option lists lexers and formatters (along with their short
names and supported file name extensions), styles and filters. If you want to see
|
||||
only one category, give it as an argument::
|
||||
|
||||
$ pygmentize -L filters
|
||||
|
||||
will list only the installed filters.
|
||||
|
||||
The ``-H`` option will give you detailed information (the same that can be found
|
||||
in this documentation) about a lexer, formatter or filter. Usage is as follows::
|
||||
|
||||
$ pygmentize -H formatter html
|
||||
|
||||
will print the help for the HTML formatter, while ::
|
||||
|
||||
$ pygmentize -H lexer python
|
||||
|
||||
will print the help for the Python lexer, etc.
|
||||
|
||||
|
||||
A note on encodings
|
||||
-------------------
|
||||
|
||||
.. versionadded:: 0.9
|
||||
|
||||
Pygments tries to be smart regarding encodings in the formatting process:
|
||||
|
||||
* If you give an ``encoding`` option, it will be used as the input and
|
||||
output encoding.
|
||||
|
||||
* If you give an ``outencoding`` option, it will override ``encoding``
|
||||
as the output encoding.
|
||||
|
||||
* If you give an ``inencoding`` option, it will override ``encoding``
|
||||
as the input encoding.
|
||||
|
||||
* If you don't give an encoding and have given an output file, the default
|
||||
encoding for lexer and formatter is the terminal encoding or the default
|
||||
locale encoding of the system. As a last resort, ``latin1`` is used (which
|
||||
will pass through all non-ASCII characters).
|
||||
|
||||
* If you don't give an encoding and haven't given an output file (that means
|
||||
output is written to the console), the default encoding for lexer and
|
||||
formatter is the terminal encoding (``sys.stdout.encoding``).
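The override rules for explicitly given options can be summarized in a few lines. ``resolve_encodings`` is a hypothetical helper for illustration, not a Pygments function, and the ``default`` argument merely stands in for the terminal or locale fallback described in the last two items:

```python
def resolve_encodings(options, default=None):
    """Return (input_encoding, output_encoding) for the given options."""
    # `encoding` applies to both sides unless a specific option overrides it.
    common = options.get("encoding", default)
    input_encoding = options.get("inencoding", common)
    output_encoding = options.get("outencoding", common)
    return input_encoding, output_encoding


both = resolve_encodings({"encoding": "utf-8"})
mixed = resolve_encodings({"encoding": "utf-8", "outencoding": "latin1"})
fallback = resolve_encodings({}, default="latin1")
```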
|
||||
71
vendor/pygments/doc/docs/filterdevelopment.rst
vendored
Normal file
@@ -0,0 +1,71 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
=====================
|
||||
Write your own filter
|
||||
=====================
|
||||
|
||||
.. versionadded:: 0.7
|
||||
|
||||
Writing your own filters is very easy. All you have to do is to subclass
|
||||
the `Filter` class and override the `filter` method. Additionally a
|
||||
filter is instantiated with some keyword arguments you can use to
|
||||
adjust the behavior of your filter.
|
||||
|
||||
|
||||
Subclassing Filters
|
||||
===================
|
||||
|
||||
As an example, we write a filter that converts all `Name.Function` tokens
|
||||
to normal `Name` tokens to make the output less colorful.
|
||||
|
||||
.. sourcecode:: python
|
||||
|
||||
    from pygments.util import get_bool_opt
    from pygments.token import Name
    from pygments.filter import Filter

    class UncolorFilter(Filter):

        def __init__(self, **options):
            Filter.__init__(self, **options)
            self.class_too = get_bool_opt(options, 'classtoo')

        def filter(self, lexer, stream):
            for ttype, value in stream:
                if ttype is Name.Function or (self.class_too and
                                              ttype is Name.Class):
                    ttype = Name
                yield ttype, value
|
||||
|
||||
Some notes on the `lexer` argument: that can be quite confusing since it doesn't
|
||||
need to be a lexer instance. If a filter was added by using the `add_filter()`
|
||||
function of lexers, that lexer is registered for the filter. In that case
|
||||
`lexer` will refer to the lexer that has registered the filter. It *can* be used
|
||||
to access options passed to a lexer. Because it could be `None` you always have
|
||||
to check for that case if you access it.
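A small sketch of a filter that guards against `lexer` being `None` and is then attached to a lexer. ``UncolorFunctions`` and its ``lexer_name`` attribute are hypothetical names chosen for this example:

```python
from pygments.filter import Filter
from pygments.lexers import PythonLexer
from pygments.token import Name


class UncolorFunctions(Filter):
    """Turn Name.Function tokens into plain Name tokens."""

    def filter(self, lexer, stream):
        # `lexer` may be None, so check before reading anything from it.
        self.lexer_name = lexer.name if lexer is not None else None
        for ttype, value in stream:
            if ttype is Name.Function:
                ttype = Name
            yield ttype, value


lexer = PythonLexer()
lexer.add_filter(UncolorFunctions())
tokens = list(lexer.get_tokens("def greet():\n    pass\n"))
```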
|
||||
|
||||
|
||||
Using a decorator
|
||||
=================
|
||||
|
||||
You can also use the `simplefilter` decorator from the `pygments.filter` module:
|
||||
|
||||
.. sourcecode:: python
|
||||
|
||||
    from pygments.util import get_bool_opt
    from pygments.token import Name
    from pygments.filter import simplefilter


    @simplefilter
    def uncolor(self, lexer, stream, options):
        class_too = get_bool_opt(options, 'classtoo')
        for ttype, value in stream:
            if ttype is Name.Function or (class_too and
                                          ttype is Name.Class):
                ttype = Name
            yield ttype, value
|
||||
|
||||
The decorator automatically subclasses an internal filter class and uses the
|
||||
decorated function as a method for filtering. (That's why there is a `self`
|
||||
argument that you probably won't end up using in the method.)
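Since the decorated name becomes a filter class, it is instantiated like any other filter. A usage sketch, with a made-up sample string:

```python
from pygments import lex
from pygments.filter import simplefilter
from pygments.lexers import PythonLexer
from pygments.token import Name
from pygments.util import get_bool_opt


@simplefilter
def uncolor(self, lexer, stream, options):
    class_too = get_bool_opt(options, 'classtoo', False)
    for ttype, value in stream:
        if ttype is Name.Function or (class_too and ttype is Name.Class):
            ttype = Name
        yield ttype, value


# Keyword arguments given here arrive in the function as `options`.
lexer = PythonLexer()
lexer.add_filter(uncolor(classtoo=True))
tokens = list(lex("class Greeter:\n    pass\n", lexer))
```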
|
||||
41
vendor/pygments/doc/docs/filters.rst
vendored
Normal file
@@ -0,0 +1,41 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
=======
|
||||
Filters
|
||||
=======
|
||||
|
||||
.. versionadded:: 0.7
|
||||
|
||||
You can filter token streams coming from lexers to improve or annotate the
|
||||
output. For example, you can highlight special words in comments, convert
|
||||
keywords to upper or lowercase to enforce a style guide etc.
|
||||
|
||||
To apply a filter, you can use the `add_filter()` method of a lexer:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
    >>> from pygments.lexers import PythonLexer
    >>> l = PythonLexer()
    >>> # add a filter given by a string and options
    >>> l.add_filter('codetagify', case='lower')
    >>> l.filters
    [<pygments.filters.CodeTagFilter object at 0xb785decc>]
    >>> from pygments.filters import KeywordCaseFilter
    >>> # or give an instance
    >>> l.add_filter(KeywordCaseFilter(case='lower'))
|
||||
|
||||
The `add_filter()` method takes keyword arguments which are forwarded to
|
||||
the constructor of the filter.
|
||||
|
||||
To get a list of all registered filters by name, you can use the
|
||||
`get_all_filters()` function from the `pygments.filters` module that returns an
|
||||
iterable for all known filters.
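As a brief sketch, the names can be collected into a list, and a filter instance can be created from a name with ``get_filter_by_name()`` from the same module (the ``keywordcase`` filter and its ``case`` option are the ones used on the command line page):

```python
from pygments.filters import get_all_filters, get_filter_by_name

# Names of all registered filters.
filter_names = sorted(get_all_filters())

# Create a filter instance from its name; keyword arguments are its options.
keyword_filter = get_filter_by_name("keywordcase", case="upper")
```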
|
||||
|
||||
If you want to write your own filter, have a look at :doc:`Write your own filter
|
||||
<filterdevelopment>`.
|
||||
|
||||
|
||||
Builtin Filters
|
||||
===============
|
||||
|
||||
.. pygmentsdoc:: filters
|
||||
169
vendor/pygments/doc/docs/formatterdevelopment.rst
vendored
Normal file
@@ -0,0 +1,169 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
========================
|
||||
Write your own formatter
|
||||
========================
|
||||
|
||||
As well as creating :doc:`your own lexer <lexerdevelopment>`, writing a new
|
||||
formatter for Pygments is easy and straightforward.
|
||||
|
||||
A formatter is a class that is initialized with some keyword arguments (the
|
||||
formatter options) and that must provide a `format()` method.
|
||||
Additionally a formatter should provide a `get_style_defs()` method that
|
||||
returns the style definitions from the style in a form usable for the
|
||||
formatter's output format.
|
||||
|
||||
|
||||
Quickstart
|
||||
==========
|
||||
|
||||
The most basic formatter shipped with Pygments is the `NullFormatter`. It just
|
||||
sends the value of a token to the output stream:
|
||||
|
||||
.. sourcecode:: python
|
||||
|
||||
    from pygments.formatter import Formatter

    class NullFormatter(Formatter):
        def format(self, tokensource, outfile):
            for ttype, value in tokensource:
                outfile.write(value)
|
||||
|
||||
As you can see, the `format()` method is passed two parameters: `tokensource`
|
||||
and `outfile`. The first is an iterable of ``(token_type, value)`` tuples,
|
||||
the latter a file like object with a `write()` method.
|
||||
|
||||
Because the formatter is that basic it doesn't override the `get_style_defs()`
|
||||
method.
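A formatter written this way can be handed straight to ``highlight()``. In the sketch below, ``PassThroughFormatter`` is a hypothetical stand-in with the same body as the formatter shown above, so that the example is self-contained:

```python
from pygments import highlight
from pygments.formatter import Formatter
from pygments.lexers import PythonLexer


class PassThroughFormatter(Formatter):
    """Write every token value to the output unchanged."""

    def format(self, tokensource, outfile):
        for ttype, value in tokensource:
            outfile.write(value)


source = "x = 1\n"

# Without an outfile, highlight() collects the output and returns it.
result = highlight(source, PythonLexer(), PassThroughFormatter())
```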
|
||||
|
||||
|
||||
Styles
|
||||
======
|
||||
|
||||
Styles aren't instantiated but their metaclass provides some class functions
|
||||
so that you can access the style definitions easily.
|
||||
|
||||
Styles are iterable and yield tuples in the form ``(ttype, d)`` where `ttype`
|
||||
is a token and `d` is a dict with the following keys:
|
||||
|
||||
``'color'``
|
||||
Hexadecimal color value (eg: ``'ff0000'`` for red) or `None` if not
|
||||
defined.
|
||||
|
||||
``'bold'``
|
||||
`True` if the value should be bold
|
||||
|
||||
``'italic'``
|
||||
`True` if the value should be italic
|
||||
|
||||
``'underline'``
|
||||
`True` if the value should be underlined
|
||||
|
||||
``'bgcolor'``
|
||||
Hexadecimal color value for the background (eg: ``'eeeeee'`` for light
|
||||
gray) or `None` if not defined.
|
||||
|
||||
``'border'``
|
||||
Hexadecimal color value for the border (eg: ``'0000aa'`` for a dark
|
||||
blue) or `None` for no border.
|
||||
|
||||
Additional keys might appear in the future; formatters should ignore all keys
|
||||
they don't support.
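For illustration, the pairs can be collected into a dictionary and queried per token type. This sketch assumes the builtin ``default`` style:

```python
from pygments.styles import get_style_by_name
from pygments.token import Keyword

style = get_style_by_name("default")

# Iterating over the style class yields (ttype, d) pairs.
definitions = {ttype: d for ttype, d in style}

keyword_style = definitions[Keyword]
is_bold = keyword_style["bold"]
```

A formatter that reads only the keys it knows from ``keyword_style`` keeps working if further keys are added later.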
|
||||
|
||||
|
||||
HTML 3.2 Formatter
|
||||
==================
|
||||
|
||||
For a more complex example, let's implement an HTML 3.2 formatter. We don't
|
||||
use CSS but inline markup (``<u>``, ``<font>``, etc). Because this isn't good
|
||||
style this formatter isn't in the standard library ;-)
|
||||
|
||||
.. sourcecode:: python
|
||||
|
||||
    from pygments.formatter import Formatter

    class OldHtmlFormatter(Formatter):

        def __init__(self, **options):
            Formatter.__init__(self, **options)

            # create a dict of (start, end) tuples that wrap the
            # value of a token so that we can use it in the format
            # method later
            self.styles = {}

            # iterating over the style yields (token, style) pairs,
            # where `style` is a dict of the parsed style values
            for token, style in self.style:
                start = end = ''
                # colors are given as hex strings: 'RRGGBB'
                if style['color']:
                    start += '<font color="#%s">' % style['color']
                    end = '</font>' + end
                if style['bold']:
                    start += '<b>'
                    end = '</b>' + end
                if style['italic']:
                    start += '<i>'
                    end = '</i>' + end
                if style['underline']:
                    start += '<u>'
                    end = '</u>' + end
                self.styles[token] = (start, end)

        def format(self, tokensource, outfile):
            # lastval is a string we use for caching
            # because it's possible that a lexer yields a number
            # of consecutive tokens with the same token type.
            # to minimize the size of the generated html markup we
            # try to join the values of same-type tokens here
            lastval = ''
            lasttype = None

            # wrap the whole output with <pre>
            outfile.write('<pre>')

            for ttype, value in tokensource:
                # if the token type doesn't exist in the stylemap
                # we try it with the parent of the token type
                # eg: parent of Token.Literal.String.Double is
                # Token.Literal.String
                while ttype not in self.styles:
                    ttype = ttype.parent
                if ttype == lasttype:
                    # the current token type is the same as in the last
                    # iteration. cache it
                    lastval += value
                else:
                    # not the same token as last iteration, but we
                    # have some data in the buffer. wrap it with the
                    # defined style and write it to the output file
                    if lastval:
                        stylebegin, styleend = self.styles[lasttype]
                        outfile.write(stylebegin + lastval + styleend)
                    # set lastval/lasttype to current values
                    lastval = value
                    lasttype = ttype

            # if something is left in the buffer, write it to the
            # output file, then close the opened <pre> tag
            if lastval:
                stylebegin, styleend = self.styles[lasttype]
                outfile.write(stylebegin + lastval + styleend)
            outfile.write('</pre>\n')
|
||||
|
||||
The comments should explain it. Again, this formatter doesn't override the
|
||||
`get_style_defs()` method. If we had used CSS classes instead of
|
||||
inline HTML markup, we would need to generate the CSS first. For that
|
||||
purpose the `get_style_defs()` method exists:
|
||||
|
||||
|
||||
Generating Style Definitions
|
||||
============================
|
||||
|
||||
Some formatters like the `LatexFormatter` and the `HtmlFormatter` don't
|
||||
output inline markup but reference either macros or CSS classes. Because
|
||||
the definitions of those are not part of the output, the `get_style_defs()`
|
||||
method exists. It is passed one parameter (if it's used and how it's used
|
||||
is up to the formatter) and has to return a string or ``None``.
|
||||
48
vendor/pygments/doc/docs/formatters.rst
vendored
Normal file
@@ -0,0 +1,48 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
====================
|
||||
Available formatters
|
||||
====================
|
||||
|
||||
This page lists all builtin formatters.
|
||||
|
||||
Common options
|
||||
==============
|
||||
|
||||
All formatters support these options:
|
||||
|
||||
`encoding`
|
||||
If given, must be an encoding name (such as ``"utf-8"``). This will
|
||||
be used to convert the token strings (which are Unicode strings)
|
||||
to byte strings in the output (default: ``None``).
|
||||
It will also be written in an encoding declaration suitable for the
|
||||
document format if the `full` option is given (e.g. a ``meta
|
||||
content-type`` directive in HTML or an invocation of the `inputenc`
|
||||
package in LaTeX).
|
||||
|
||||
If this is ``""`` or ``None``, Unicode strings will be written
|
||||
to the output file, which most file-like objects do not support.
|
||||
For example, `pygments.highlight()` will return a Unicode string if
|
||||
called with no `outfile` argument and a formatter that has `encoding`
|
||||
set to ``None`` because it uses a `StringIO.StringIO` object that
|
||||
supports Unicode arguments to `write()`. Using a regular file object
|
||||
wouldn't work.
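The effect of `encoding` on the return type can be checked with a short sketch. It assumes Python 3, where text strings are ``str`` and byte strings are ``bytes``; the sample code is made up:

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import PythonLexer

code = "name = 'café'\n"

# No encoding: the result is a text (Unicode) string.
as_text = highlight(code, PythonLexer(), HtmlFormatter())

# With an encoding: the result is a byte string in that encoding.
as_bytes = highlight(code, PythonLexer(), HtmlFormatter(encoding="utf-8"))
```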
|
||||
|
||||
.. versionadded:: 0.6
|
||||
|
||||
`outencoding`
|
||||
When using Pygments from the command line, any `encoding` option given is
|
||||
passed to the lexer and the formatter. This is sometimes not desirable,
|
||||
for example if you want to set the input encoding to ``"guess"``.
|
||||
Therefore, `outencoding` has been introduced which overrides `encoding`
|
||||
for the formatter if given.
|
||||
|
||||
.. versionadded:: 0.7
|
||||
|
||||
|
||||
Formatter classes
|
||||
=================
|
||||
|
||||
All these classes are importable from :mod:`pygments.formatters`.
|
||||
|
||||
.. pygmentsdoc:: formatters
|
||||
66
vendor/pygments/doc/docs/index.rst
vendored
Normal file
@@ -0,0 +1,66 @@
|
||||
Pygments documentation
|
||||
======================
|
||||
|
||||
**Starting with Pygments**
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
../download
|
||||
quickstart
|
||||
cmdline
|
||||
|
||||
**Builtin components**
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
lexers
|
||||
filters
|
||||
formatters
|
||||
styles
|
||||
|
||||
**Reference**
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
unicode
|
||||
tokens
|
||||
api
|
||||
|
||||
**Hacking for Pygments**
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
lexerdevelopment
|
||||
formatterdevelopment
|
||||
filterdevelopment
|
||||
plugins
|
||||
|
||||
**Hints and tricks**
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
rstdirective
|
||||
moinmoin
|
||||
java
|
||||
integrate
|
||||
|
||||
**About Pygments**
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 1
|
||||
|
||||
changelog
|
||||
authors
|
||||
|
||||
|
||||
If you find bugs or have suggestions for the documentation, please look
|
||||
:ref:`here <contribute>` for info on how to contact the team.
|
||||
|
||||
.. XXX You can download an offline version of this documentation from the
|
||||
:doc:`download page </download>`.
|
||||
|
||||
40
vendor/pygments/doc/docs/integrate.rst
vendored
Normal file
@@ -0,0 +1,40 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
===================================
|
||||
Using Pygments in various scenarios
|
||||
===================================
|
||||
|
||||
Markdown
|
||||
--------
|
||||
|
||||
Since Pygments 0.9, the distribution ships Markdown_ preprocessor sample code
|
||||
that uses Pygments to render source code in
|
||||
:file:`external/markdown-processor.py`. You can copy and adapt it to your
|
||||
liking.
|
||||
|
||||
.. _Markdown: http://www.freewisdom.org/projects/python-markdown/
|
||||
|
||||
TextMate
|
||||
--------
|
||||
|
||||
Antonio Cangiano has created a Pygments bundle for TextMate that lets you
|
||||
colorize code via a simple menu option. It can be found here_.
|
||||
|
||||
.. _here: http://antoniocangiano.com/2008/10/28/pygments-textmate-bundle/
|
||||
|
||||
Bash completion
|
||||
---------------
|
||||
|
||||
The source distribution contains a file ``external/pygments.bashcomp`` that
|
||||
sets up completion for the ``pygmentize`` command in bash.
|
||||
|
||||
Wrappers for other languages
|
||||
----------------------------
|
||||
|
||||
These libraries provide Pygments highlighting for users of languages other
than Python:
|
||||
|
||||
* `pygments.rb <https://github.com/tmm1/pygments.rb>`_, a pygments wrapper for Ruby
|
||||
* `Clygments <https://github.com/bfontaine/clygments>`_, a pygments wrapper for
|
||||
Clojure
|
||||
* `PHPygments <https://github.com/capynet/PHPygments>`_, a pygments wrapper for PHP
|
||||
70
vendor/pygments/doc/docs/java.rst
vendored
Normal file
@@ -0,0 +1,70 @@
|
||||
=====================
|
||||
Use Pygments in Java
|
||||
=====================
|
||||
|
||||
Thanks to `Jython <http://www.jython.org>`_ it is possible to use Pygments in
|
||||
Java.
|
||||
|
||||
This page is a simple tutorial to get an idea of how this works. You can
|
||||
then look at the `Jython documentation <http://www.jython.org/docs/>`_ for more
|
||||
advanced uses.
|
||||
|
||||
Since version 1.5, Pygments is deployed on `Maven Central
|
||||
<http://repo1.maven.org/maven2/org/pygments/pygments/>`_ as a JAR, as is Jython,
|
||||
which makes it a lot easier to create a Java project.
|
||||
|
||||
Here is an example of a `Maven <http://www.maven.org>`_ ``pom.xml`` file for a
|
||||
project running Pygments:
|
||||
|
||||
.. sourcecode:: xml
|
||||
|
||||
    <?xml version="1.0" encoding="UTF-8"?>

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                                 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>example</groupId>
      <artifactId>example</artifactId>
      <version>1.0-SNAPSHOT</version>
      <dependencies>
        <dependency>
          <groupId>org.python</groupId>
          <artifactId>jython-standalone</artifactId>
          <version>2.5.3</version>
        </dependency>
        <dependency>
          <groupId>org.pygments</groupId>
          <artifactId>pygments</artifactId>
          <version>1.5</version>
          <scope>runtime</scope>
        </dependency>
      </dependencies>
    </project>
|
||||
|
||||
The following Java example:
|
||||
|
||||
.. sourcecode:: java
|
||||
|
||||
    PythonInterpreter interpreter = new PythonInterpreter();

    // Set a variable with the content you want to work with
    interpreter.set("code", code);

    // Simply use Pygments as you would in Python
    interpreter.exec("from pygments import highlight\n"
        + "from pygments.lexers import PythonLexer\n"
        + "from pygments.formatters import HtmlFormatter\n"
        + "\nresult = highlight(code, PythonLexer(), HtmlFormatter())");

    // Get the result that has been set in a variable
    System.out.println(interpreter.get("result", String.class));
|
||||
|
||||
will print something like:
|
||||
|
||||
.. sourcecode:: html
|
||||
|
||||
    <div class="highlight">
    <pre><span class="k">print</span> <span class="s">"Hello World"</span></pre>
    </div>
|
||||
681
vendor/pygments/doc/docs/lexerdevelopment.rst
vendored
Normal file
@@ -0,0 +1,681 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
.. highlight:: python
|
||||
|
||||
====================
|
||||
Write your own lexer
|
||||
====================
|
||||
|
||||
If a lexer for your favorite language is missing in the Pygments package, you
|
||||
can easily write your own and extend Pygments.
|
||||
|
||||
All you need can be found inside the :mod:`pygments.lexer` module. As you can
|
||||
read in the :doc:`API documentation <api>`, a lexer is a class that is
|
||||
initialized with some keyword arguments (the lexer options) and that provides a
|
||||
:meth:`.get_tokens_unprocessed()` method which is given a string or unicode
|
||||
object with the data to lex.
|
||||
|
||||
The :meth:`.get_tokens_unprocessed()` method must return an iterator or iterable
|
||||
containing tuples in the form ``(index, token, value)``. Normally you don't
|
||||
need to do this since there are base lexers that do most of the work and that
|
||||
you can subclass.
|
||||
|
||||
|
||||
RegexLexer
|
||||
==========
|
||||
|
||||
The lexer base class used by almost all of Pygments' lexers is the
|
||||
:class:`RegexLexer`. This class allows you to define lexing rules in terms of
|
||||
*regular expressions* for different *states*.
|
||||
|
||||
States are groups of regular expressions that are matched against the input
|
||||
string at the *current position*. If one of these expressions matches, a
|
||||
corresponding action is performed (such as yielding a token with a specific
|
||||
type, or changing state), the current position is set to where the last match
|
||||
ended and the matching process continues with the first regex of the current
|
||||
state.
|
||||
|
||||
Lexer states are kept on a stack: each time a new state is entered, the new
|
||||
state is pushed onto the stack. The most basic lexers (like the `DiffLexer`)
|
||||
just need one state.
|
||||
|
||||
Each state is defined as a list of tuples in the form (`regex`, `action`,
|
||||
`new_state`) where the last item is optional. In the most basic form, `action`
|
||||
is a token type (like `Name.Builtin`). That means: When `regex` matches, emit a
|
||||
token with the match text and that token type, and push `new_state` on the state
|
||||
stack. If the new state is ``'#pop'``, the topmost state is popped from the
|
||||
stack instead. To pop more than one state, use ``'#pop:2'`` and so on.
|
||||
``'#push'`` is a synonym for pushing the current state on the stack.
|
||||
|
||||
The following example shows the `DiffLexer` from the builtin lexers. Note that
|
||||
it contains some additional attributes `name`, `aliases` and `filenames` which
|
||||
aren't required for a lexer. They are used by the builtin lexer lookup
|
||||
functions. ::
|
||||
|
||||
    from pygments.lexer import RegexLexer
    from pygments.token import *

    class DiffLexer(RegexLexer):
        name = 'Diff'
        aliases = ['diff']
        filenames = ['*.diff']

        tokens = {
            'root': [
                (r' .*\n', Text),
                (r'\+.*\n', Generic.Inserted),
                (r'-.*\n', Generic.Deleted),
                (r'@.*\n', Generic.Subheading),
                (r'Index.*\n', Generic.Heading),
                (r'=.*\n', Generic.Heading),
                (r'.*\n', Text),
            ]
        }
|
||||
|
||||
As you can see this lexer only uses one state. When the lexer starts scanning
|
||||
the text, it first checks if the current character is a space. If this is true
|
||||
it scans everything until newline and returns the data as a `Text` token (which
|
||||
is the "no special highlighting" token).
|
||||
|
||||
If this rule doesn't match, it checks if the current char is a plus sign. And
|
||||
so on.
|
||||
|
||||
If no rule matches at the current position, the current char is emitted as an
|
||||
`Error` token that indicates a lexing error, and the position is increased by
|
||||
one.
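
To see the rule order in action, the sketch below runs the same rules over a small diff and prints the resulting tokens. The class name ``MiniDiffLexer`` and the sample text are made up for this illustration; only the rules come from the example above.

```python
from pygments.lexer import RegexLexer
from pygments.token import Generic, Text


class MiniDiffLexer(RegexLexer):
    """Same rules as the DiffLexer example, under a hypothetical name."""
    name = 'MiniDiff'
    tokens = {
        'root': [
            (r' .*\n', Text),
            (r'\+.*\n', Generic.Inserted),
            (r'-.*\n', Generic.Deleted),
            (r'@.*\n', Generic.Subheading),
            (r'Index.*\n', Generic.Heading),
            (r'=.*\n', Generic.Heading),
            (r'.*\n', Text),
        ]
    }


sample = '@@ -1,2 +1,2 @@\n unchanged\n-old line\n+new line\n'

# get_tokens() yields (tokentype, value) pairs, one per matched rule.
tokens = list(MiniDiffLexer().get_tokens(sample))
for tokentype, value in tokens:
    print(tokentype, repr(value))
```

Each line of the input is consumed by the first rule whose regex matches at the current position, so the token values concatenate back to the input text.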
|
||||
|
||||
|
||||
Adding and testing a new lexer
|
||||
==============================
|
||||
|
||||
To make Pygments aware of your new lexer, you have to perform the following
|
||||
steps:
|
||||
|
||||
First, change to the directory containing the Pygments source code:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ cd .../pygments-main
|
||||
|
||||
Select a matching module under ``pygments/lexers``, or create a new module for
|
||||
your lexer class.
|
||||
|
||||
Next, make sure the lexer is known from outside of the module. All modules in
|
||||
the ``pygments.lexers`` package specify ``__all__``. For example, ``esoteric.py`` sets::
|
||||
|
||||
__all__ = ['BrainfuckLexer', 'BefungeLexer', ...]
|
||||
|
||||
Simply add the name of your lexer class to this list.
|
||||
|
||||
Finally the lexer can be made publicly known by rebuilding the lexer mapping:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ make mapfiles
|
||||
|
||||
To test the new lexer, store an example file with the proper extension in
|
||||
``tests/examplefiles``. For example, to test your ``DiffLexer``, add a
|
||||
``tests/examplefiles/example.diff`` containing a sample diff output.
|
||||
|
||||
Now you can use pygmentize to render your example to HTML:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ ./pygmentize -O full -f html -o /tmp/example.html tests/examplefiles/example.diff
|
||||
|
||||
Note that this explicitly calls the ``pygmentize`` in the current directory
|
||||
by preceding it with ``./``. This ensures your modifications are used.
|
||||
Otherwise a possibly already installed, unmodified version without your new
|
||||
lexer would have been called from the system search path (``$PATH``).
|
||||
|
||||
To view the result, open ``/tmp/example.html`` in your browser.
|
||||
|
||||
Once the example renders as expected, you should run the complete test suite:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ make test
|
||||
|
||||
It also tests that your lexer fulfills the lexer API and certain invariants,
|
||||
such as that the concatenation of all token text is the same as the input text.
|
||||
|
||||
|
||||
Regex Flags
|
||||
===========
|
||||
|
||||
You can either define regex flags locally in the regex (``r'(?x)foo bar'``) or
|
||||
globally by adding a `flags` attribute to your lexer class. If no attribute is
|
||||
defined, it defaults to `re.MULTILINE`. For more information about regular
|
||||
expression flags see the page about `regular expressions`_ in the Python
|
||||
documentation.
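
A class-level `flags` attribute might look like the following sketch. The lexer, its name and its rules are hypothetical and exist only to show the attribute; ``re.MULTILINE`` is kept alongside ``re.IGNORECASE`` because setting `flags` replaces the default rather than adding to it.

```python
import re

from pygments.lexer import RegexLexer
from pygments.token import Keyword, Name, Text


class CaseInsensitiveLexer(RegexLexer):
    """Hypothetical lexer; only illustrates the class-level ``flags``."""
    name = 'CaseInsensitiveExample'
    flags = re.IGNORECASE | re.MULTILINE

    tokens = {
        'root': [
            (r'(select|from)\b', Keyword),
            (r'\w+', Name),
            (r'\s+', Text),
        ]
    }


tokens = list(CaseInsensitiveLexer().get_tokens('SELECT name From users'))
keywords = [value for tokentype, value in tokens if tokentype is Keyword]
print(keywords)
```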
|
||||
|
||||
.. _regular expressions: http://docs.python.org/library/re.html#regular-expression-syntax
|
||||
|
||||
|
||||
Scanning multiple tokens at once
|
||||
================================
|
||||
|
||||
So far, the `action` element in the rule tuple of regex, action and state has
|
||||
been a single token type. Now we look at the first of several other possible
|
||||
values.
|
||||
|
||||
Here is a more complex lexer that highlights INI files. INI files consist of
|
||||
sections, comments and ``key = value`` pairs::
|
||||
|
||||
    from pygments.lexer import RegexLexer, bygroups
    from pygments.token import *

    class IniLexer(RegexLexer):
        name = 'INI'
        aliases = ['ini', 'cfg']
        filenames = ['*.ini', '*.cfg']

        tokens = {
            'root': [
                (r'\s+', Text),
                (r';.*?$', Comment),
                (r'\[.*?\]$', Keyword),
                (r'(.*?)(\s*)(=)(\s*)(.*?)$',
                 bygroups(Name.Attribute, Text, Operator, Text, String))
            ]
        }
|
||||
|
||||
The lexer first looks for whitespace, comments and section names. Later it
|
||||
looks for a line that looks like a key, value pair, separated by an ``'='``
|
||||
sign, and optional whitespace.
|
||||
|
||||
The `bygroups` helper yields each capturing group in the regex with a different
|
||||
token type. First the `Name.Attribute` token, then a `Text` token for the
|
||||
optional whitespace, after that an `Operator` token for the equals sign. Then a
|
||||
`Text` token for the whitespace again. The rest of the line is returned as
|
||||
`String`.
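
The group-by-group output of `bygroups` can be observed with a short run. The sketch below reuses the INI rules under the made-up name ``MiniIniLexer``; the input text is likewise only an illustration.

```python
from pygments.lexer import RegexLexer, bygroups
from pygments.token import Comment, Keyword, Name, Operator, String, Text


class MiniIniLexer(RegexLexer):
    """Same rules as the IniLexer example, under a hypothetical name."""
    name = 'MiniINI'
    tokens = {
        'root': [
            (r'\s+', Text),
            (r';.*?$', Comment),
            (r'\[.*?\]$', Keyword),
            (r'(.*?)(\s*)(=)(\s*)(.*?)$',
             bygroups(Name.Attribute, Text, Operator, Text, String)),
        ]
    }


# One regex match on "port = 8080" produces five separate tokens.
tokens = list(MiniIniLexer().get_tokens('[server]\nport = 8080\n'))
for tokentype, value in tokens:
    print(tokentype, repr(value))
```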
|
||||
|
||||
Note that for this to work, every part of the match must be inside a capturing
|
||||
group (a ``(...)``), and there must not be any nested capturing groups. If you
|
||||
nevertheless need a group, use a non-capturing group defined using this syntax:
|
||||
``(?:some|words|here)`` (note the ``?:`` after the beginning parenthesis).
|
||||
|
||||
If you find yourself needing a capturing group inside the regex which shouldn't
|
||||
be part of the output but is used in the regular expressions for backreferencing
|
||||
(e.g. ``r'(<(foo|bar)>)(.*?)(</\2>)'``), you can pass `None` to the bygroups
|
||||
function and that group will be skipped in the output.
|
||||
|
||||
|
||||
Changing states
|
||||
===============
|
||||
|
||||
Many lexers need multiple states to work as expected. For example, some
|
||||
languages allow multiline comments to be nested. Since this is a recursive
|
||||
pattern it's impossible to lex just using regular expressions.
|
||||
|
||||
Here is a lexer that recognizes C++ style comments (multi-line with ``/* */``
|
||||
and single-line with ``//`` until end of line)::
|
||||
|
||||
    from pygments.lexer import RegexLexer
    from pygments.token import *

    class CppCommentLexer(RegexLexer):
        name = 'Example Lexer with states'

        tokens = {
            'root': [
                (r'[^/]+', Text),
                (r'/\*', Comment.Multiline, 'comment'),
                (r'//.*?$', Comment.Single),
                (r'/', Text)
            ],
            'comment': [
                (r'[^*/]', Comment.Multiline),
                (r'/\*', Comment.Multiline, '#push'),
                (r'\*/', Comment.Multiline, '#pop'),
                (r'[*/]', Comment.Multiline)
            ]
        }
|
||||
|
||||
This lexer starts lexing in the ``'root'`` state. It tries to match as much as
|
||||
possible until it finds a slash (``'/'``). If the next character after the slash
|
||||
is an asterisk (``'*'``) the `RegexLexer` sends those two characters to the
|
||||
output stream marked as `Comment.Multiline` and continues lexing with the rules
|
||||
defined in the ``'comment'`` state.
|
||||
|
||||
If there wasn't an asterisk after the slash, the `RegexLexer` checks if it's a
|
||||
single-line comment (i.e. followed by a second slash). If this also wasn't the
|
||||
case it must be a single slash, which is not a comment starter (the separate
|
||||
regex for a single slash must also be given, else the slash would be marked as
|
||||
an error token).
|
||||
|
||||
Inside the ``'comment'`` state, we do the same thing again. Scan until the
|
||||
lexer finds a star or slash. If it's the opening of a multiline comment, push
|
||||
the ``'comment'`` state on the stack and continue scanning, again in the
|
||||
``'comment'`` state. Else, check if it's the end of the multiline comment. If
|
||||
yes, pop one state from the stack.
|
||||
|
||||
Note: If you pop from an empty stack you'll get an `IndexError`. (There is an
|
||||
easy way to prevent this from happening: don't ``'#pop'`` in the root state).
|
||||
|
||||
If the `RegexLexer` encounters a newline that is flagged as an error token, the
|
||||
stack is emptied and the lexer continues scanning in the ``'root'`` state. This
|
||||
can help produce error-tolerant highlighting for erroneous input, e.g. when a
|
||||
single-line string is not closed.
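
The push and pop behaviour for nested comments can be checked with a small run. This sketch uses the comment rules from this section under the hypothetical name ``NestedCommentLexer``, with the standard `Comment.Single` token type for single-line comments; the input string is made up.

```python
from pygments.lexer import RegexLexer
from pygments.token import Comment, Text


class NestedCommentLexer(RegexLexer):
    """Comment rules from this section, under a hypothetical name."""
    name = 'NestedCommentExample'
    tokens = {
        'root': [
            (r'[^/]+', Text),
            (r'/\*', Comment.Multiline, 'comment'),
            (r'//.*?$', Comment.Single),
            (r'/', Text),
        ],
        'comment': [
            (r'[^*/]', Comment.Multiline),
            (r'/\*', Comment.Multiline, '#push'),   # nested opener
            (r'\*/', Comment.Multiline, '#pop'),    # closes one level
            (r'[*/]', Comment.Multiline),
        ],
    }


source = 'a /* outer /* inner */ still outer */ b\n'
tokens = list(NestedCommentLexer().get_tokens(source))

# Everything between the outermost delimiters stays a comment.
comment_text = ''.join(v for t, v in tokens if t is Comment.Multiline)
print(repr(comment_text))
```

The first ``*/`` only pops the inner level, so ``still outer`` is still lexed in the ``'comment'`` state.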
|
||||
|
||||
|
||||
Advanced state tricks
|
||||
=====================
|
||||
|
||||
There are a few more things you can do with states:
|
||||
|
||||
- You can push multiple states onto the stack if you give a tuple instead of a
|
||||
simple string as the third item in a rule tuple. For example, if you want to
|
||||
match a comment containing a directive, something like:
|
||||
|
||||
.. code-block:: text
|
||||
|
||||
/* <processing directive> rest of comment */
|
||||
|
||||
you can use this rule::
|
||||
|
||||
    tokens = {
        'root': [
            (r'/\* <', Comment, ('comment', 'directive')),
            ...
        ],
        'directive': [
            (r'[^>]*', Comment.Directive),
            (r'>', Comment, '#pop'),
        ],
        'comment': [
            (r'[^*]+', Comment),
            (r'\*/', Comment, '#pop'),
            (r'\*', Comment),
        ]
    }
|
||||
|
||||
When this encounters the above sample, first ``'comment'`` and ``'directive'``
|
||||
are pushed onto the stack, then the lexer continues in the directive state
|
||||
until it finds the closing ``>``, then it continues in the comment state until
|
||||
the closing ``*/``. Then, both states are popped from the stack again and
|
||||
lexing continues in the root state.
|
||||
|
||||
.. versionadded:: 0.9
|
||||
The tuple can contain the special ``'#push'`` and ``'#pop'`` (but not
|
||||
``'#pop:n'``) directives.
|
||||
|
||||
|
||||
- You can include the rules of a state in the definition of another. This is
|
||||
done by using `include` from `pygments.lexer`::
|
||||
|
||||
    from pygments.lexer import RegexLexer, bygroups, include
    from pygments.token import *

    class ExampleLexer(RegexLexer):
        tokens = {
            'comments': [
                (r'/\*.*?\*/', Comment),
                (r'//.*?\n', Comment),
            ],
            'root': [
                include('comments'),
                (r'(function )(\w+)( {)',
                 bygroups(Keyword, Name, Keyword), 'function'),
                (r'.', Text),
            ],
            'function': [
                (r'[^}/]+', Text),
                include('comments'),
                (r'/', Text),
                (r'\}', Keyword, '#pop'),
            ]
        }
|
||||
|
||||
This is a hypothetical lexer for a language that consists of functions and
|
||||
comments. Because comments can occur at toplevel and in functions, we need
|
||||
rules for comments in both states. As you can see, the `include` helper saves
|
||||
repeating rules that occur more than once (in this example, the state
|
||||
``'comments'`` will never be entered by the lexer, as it's only there to be
|
||||
included in ``'root'`` and ``'function'``).
|
||||
|
||||
- Sometimes, you may want to "combine" a state from existing ones. This is
|
||||
possible with the `combined` helper from `pygments.lexer`.
|
||||
|
||||
If you, instead of a new state, write ``combined('state1', 'state2')`` as the
|
||||
third item of a rule tuple, a new anonymous state will be formed from state1
|
||||
and state2 and if the rule matches, the lexer will enter this state.
|
||||
|
||||
This is not used very often, but can be helpful in some cases, such as the
|
||||
`PythonLexer`'s string literal processing.
|
||||
|
||||
- If you want your lexer to start lexing in a different state you can modify the
|
||||
stack by overriding the `get_tokens_unprocessed()` method::
|
||||
|
||||
    from pygments.lexer import RegexLexer

    class ExampleLexer(RegexLexer):
        tokens = {...}

        def get_tokens_unprocessed(self, text, stack=('root', 'otherstate')):
            # Pass ``self`` explicitly when calling the base class method.
            for item in RegexLexer.get_tokens_unprocessed(self, text, stack):
                yield item
|
||||
|
||||
Some lexers like the `PhpLexer` use this to make the leading ``<?php``
|
||||
preprocessor comments optional. Note that you can crash the lexer easily by
|
||||
putting values into the stack that don't exist in the token map. Also
|
||||
removing ``'root'`` from the stack can result in strange errors!
|
||||
|
||||
- In some lexers, a state should be popped if anything is encountered that isn't
|
||||
matched by a rule in the state. You could use an empty regex at the end of
|
||||
the state list, but Pygments provides a more obvious way of spelling that:
|
||||
``default('#pop')`` is equivalent to ``('', Text, '#pop')``.
|
||||
|
||||
.. versionadded:: 2.0
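
The `default` helper from the last item can be sketched with a toy lexer. The language, the class name ``LetLexer`` and the state name ``'varname'`` are all hypothetical; only `default` itself comes from `pygments.lexer`.

```python
from pygments.lexer import RegexLexer, default
from pygments.token import Keyword, Name, Number, Operator, Text


class LetLexer(RegexLexer):
    """Hypothetical toy language: ``let <name> = <number>``."""
    name = 'LetExample'
    tokens = {
        'root': [
            (r'let\b', Keyword, 'varname'),
            (r'\s+', Text),
            (r'\d+', Number),
            (r'=', Operator),
            (r'\w+', Name),
        ],
        'varname': [
            (r'\s+', Text),
            (r'[a-z_]\w*', Name.Variable, '#pop'),
            # Anything else: leave the state without consuming input.
            default('#pop'),
        ],
    }


good = list(LetLexer().get_tokens('let x = 1'))
odd = list(LetLexer().get_tokens('let 5'))
print(good)
print(odd)
```

In the second input no variable name follows ``let``, so the ``'varname'`` state is popped and the ``5`` is lexed by the root rules instead of becoming an error token.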
|
||||
|
||||
|
||||
Subclassing lexers derived from RegexLexer
|
||||
==========================================
|
||||
|
||||
.. versionadded:: 1.6
|
||||
|
||||
Sometimes multiple languages are very similar, but should still be lexed by
|
||||
different lexer classes.
|
||||
|
||||
When subclassing a lexer derived from RegexLexer, the ``tokens`` dictionaries
|
||||
defined in the parent and child class are merged. For example::
|
||||
|
||||
    from pygments.lexer import RegexLexer, inherit
    from pygments.token import *

    class BaseLexer(RegexLexer):
        tokens = {
            'root': [
                ('[a-z]+', Name),
                (r'/\*', Comment, 'comment'),
                ('"', String, 'string'),
                (r'\s+', Text),
            ],
            'string': [
                ('[^"]+', String),
                ('"', String, '#pop'),
            ],
            'comment': [
                ...
            ],
        }

    class DerivedLexer(BaseLexer):
        tokens = {
            'root': [
                ('[0-9]+', Number),
                inherit,
            ],
            'string': [
                (r'[^"\\]+', String),
                (r'\\.', String.Escape),
                ('"', String, '#pop'),
            ],
        }
|
||||
|
||||
The `BaseLexer` defines two states, lexing names and strings. The
|
||||
`DerivedLexer` defines its own tokens dictionary, which extends the definitions
|
||||
of the base lexer:
|
||||
|
||||
* The "root" state has an additional rule and then the special object `inherit`,
|
||||
which tells Pygments to insert the token definitions of the parent class at
|
||||
that point.
|
||||
|
||||
* The "string" state is replaced entirely, since there is no `inherit` rule.
|
||||
|
||||
* The "comment" state is inherited entirely.
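
The merge can be verified by lexing the same text with both classes. In this sketch the rules of the ``'comment'`` state are an assumption (the example above elides them), and the input text is made up.

```python
from pygments.lexer import RegexLexer, inherit
from pygments.token import Comment, Error, Name, Number, String, Text


class BaseLexer(RegexLexer):
    name = 'BaseExample'
    tokens = {
        'root': [
            (r'[a-z]+', Name),
            (r'/\*', Comment, 'comment'),
            (r'"', String, 'string'),
            (r'\s+', Text),
        ],
        'string': [
            (r'[^"]+', String),
            (r'"', String, '#pop'),
        ],
        'comment': [
            # Assumed rules; the original example leaves this state elided.
            (r'[^*]+', Comment),
            (r'\*/', Comment, '#pop'),
            (r'\*', Comment),
        ],
    }


class DerivedLexer(BaseLexer):
    name = 'DerivedExample'
    tokens = {
        'root': [
            (r'[0-9]+', Number),
            inherit,  # the parent's 'root' rules are inserted here
        ],
        'string': [
            (r'[^"\\]+', String),
            (r'\\.', String.Escape),
            (r'"', String, '#pop'),
        ],
    }


text = 'abc 42 /* note */'
base_types = [t for t, _ in BaseLexer().get_tokens(text)]
derived_types = [t for t, _ in DerivedLexer().get_tokens(text)]
print(Error in base_types, Number in derived_types)
```

The base lexer has no rule for digits and reports them as `Error` tokens, while the derived lexer handles them and still enters the inherited ``'comment'`` state.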
|
||||
|
||||
|
||||
Using multiple lexers
|
||||
=====================
|
||||
|
||||
Using multiple lexers for the same input can be tricky. One of the easiest
|
||||
combination techniques is shown here: You can replace the action entry in a rule
|
||||
tuple with a lexer class. The matched text will then be lexed with that lexer,
|
||||
and the resulting tokens will be yielded.
|
||||
|
||||
For example, look at this stripped-down HTML lexer::
|
||||
|
||||
    import re

    from pygments.lexer import RegexLexer, bygroups, using
    from pygments.token import *
    from pygments.lexers.javascript import JavascriptLexer

    class HtmlLexer(RegexLexer):
        name = 'HTML'
        aliases = ['html']
        filenames = ['*.html', '*.htm']

        flags = re.IGNORECASE | re.DOTALL
        tokens = {
            'root': [
                ('[^<&]+', Text),
                ('&.*?;', Name.Entity),
                (r'<\s*script\s*', Name.Tag, ('script-content', 'tag')),
                (r'<\s*[a-zA-Z0-9:]+', Name.Tag, 'tag'),
                (r'<\s*/\s*[a-zA-Z0-9:]+\s*>', Name.Tag),
            ],
            'script-content': [
                (r'(.+?)(<\s*/\s*script\s*>)',
                 bygroups(using(JavascriptLexer), Name.Tag),
                 '#pop'),
            ]
        }
|
||||
|
||||
Here the content of a ``<script>`` tag is passed to a newly created instance of
|
||||
a `JavascriptLexer` and not processed by the `HtmlLexer`. This is done using
|
||||
the `using` helper that takes the other lexer class as its parameter.
|
||||
|
||||
Note the combination of `bygroups` and `using`. This makes sure that the
|
||||
content up to the ``</script>`` end tag is processed by the `JavascriptLexer`,
|
||||
while the end tag is yielded as a normal token with the `Name.Tag` type.
|
||||
|
||||
Also note the ``(r'<\s*script\s*', Name.Tag, ('script-content', 'tag'))`` rule.
|
||||
Here, two states are pushed onto the state stack, ``'script-content'`` and
|
||||
``'tag'``. That means that first ``'tag'`` is processed, which will lex
|
||||
attributes and the closing ``>``, then the ``'tag'`` state is popped and the
|
||||
next state on top of the stack will be ``'script-content'``.
|
||||
|
||||
Since you cannot refer to the class currently being defined, use `this`
|
||||
(imported from `pygments.lexer`) to refer to the current lexer class, i.e.
|
||||
``using(this)``. This construct may seem unnecessary, but this is often the
|
||||
most obvious way of lexing arbitrary syntax between fixed delimiters without
|
||||
introducing deeply nested states.
|
||||
|
||||
The `using()` helper has a special keyword argument, `state`, which works as
|
||||
follows: if given, the lexer that is used starts not in the ``"root"`` state,
|
||||
but in the state given by this argument. This does not work with advanced
|
||||
`RegexLexer` subclasses such as `ExtendedRegexLexer` (see below).
|
||||
|
||||
Any other keyword arguments passed to `using()` are added to the keyword
|
||||
arguments used to create the lexer.
|
||||
|
||||
|
||||
Delegating Lexer
|
||||
================
|
||||
|
||||
Another approach for nested lexers is the `DelegatingLexer` which is for example
|
||||
used for the template engine lexers. It takes two lexers as arguments on
|
||||
initialisation: a `root_lexer` and a `language_lexer`.
|
||||
|
||||
The input is processed as follows: First, the whole text is lexed with the
|
||||
`language_lexer`. All tokens yielded with the special type of ``Other`` are
|
||||
then concatenated and given to the `root_lexer`. The language tokens of the
|
||||
`language_lexer` are then inserted into the `root_lexer`'s token stream at the
|
||||
appropriate positions. ::
|
||||
|
||||
    from pygments.lexer import DelegatingLexer
    from pygments.lexers.web import HtmlLexer, PhpLexer

    class HtmlPhpLexer(DelegatingLexer):
        def __init__(self, **options):
            super(HtmlPhpLexer, self).__init__(HtmlLexer, PhpLexer, **options)
|
||||
|
||||
This procedure ensures that e.g. HTML with template tags in it is highlighted
|
||||
correctly even if the template tags are put into HTML tags or attributes.
|
||||
|
||||
If you want to change the needle token ``Other`` to something else, you can give
|
||||
the lexer another token type as the third parameter::
|
||||
|
||||
    DelegatingLexer.__init__(self, MyLexer, OtherLexer, Text, **options)
|
||||
|
||||
|
||||
Callbacks
|
||||
=========
|
||||
|
||||
Sometimes the grammar of a language is so complex that a lexer would be unable
|
||||
to process it just by using regular expressions and stacks.
|
||||
|
||||
For this, the `RegexLexer` allows callbacks to be given in rule tuples, instead
|
||||
of token types (`bygroups` and `using` are nothing else but preimplemented
|
||||
callbacks). The callback must be a function taking two arguments:
|
||||
|
||||
* the lexer itself
|
||||
* the match object for the last matched rule
|
||||
|
||||
The callback must then return an iterable of (or simply yield) ``(index,
|
||||
tokentype, value)`` tuples, which are then just passed through by
|
||||
`get_tokens_unprocessed()`. The ``index`` here is the position of the token in
|
||||
the input string, ``tokentype`` is the normal token type (like `Name.Builtin`),
|
||||
and ``value`` the associated part of the input string.
|
||||
|
||||
You can see an example here::
|
||||
|
||||
    from pygments.lexer import RegexLexer
    from pygments.token import Generic

    class HypotheticLexer(RegexLexer):

        def headline_callback(lexer, match):
            equal_signs = match.group(1)
            text = match.group(2)
            yield match.start(), Generic.Headline, equal_signs + text + equal_signs

        tokens = {
            'root': [
                (r'(=+)(.*?)(\1)', headline_callback)
            ]
        }
|
||||
|
||||
If the regex for the `headline_callback` matches, the function is called with
|
||||
the match object. Note that after the callback is done, processing continues
|
||||
normally, that is, after the end of the previous match. The callback has no
|
||||
possibility to influence the position.
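
A runnable variant of the callback idea is sketched below. The names ``HeadlineLexer`` and the extra ``Text`` rule are assumptions added so that ordinary text is lexed too, and the standard `Generic.Heading` token type is used for the headline.

```python
from pygments.lexer import RegexLexer
from pygments.token import Generic, Text


def headline_callback(lexer, match):
    # Emit the whole "==Title==" match as a single token.
    yield match.start(), Generic.Heading, match.group(0)


class HeadlineLexer(RegexLexer):
    """Hypothetical lexer that mirrors the callback example."""
    name = 'HeadlineExample'
    tokens = {
        'root': [
            (r'(=+)(.*?)(\1)', headline_callback),
            (r'[^=\n]+', Text),
        ]
    }


# get_tokens_unprocessed() yields (index, tokentype, value) tuples.
tokens = list(HeadlineLexer().get_tokens_unprocessed('==Title==\nbody\n'))
print(tokens[0])
```

After the callback has yielded its token, matching resumes at the end of the headline match, as described above.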
|
||||
|
||||
There are not really any simple examples for lexer callbacks, but you can see
|
||||
them in action e.g. in the `SMLLexer` class in `ml.py`_.
|
||||
|
||||
.. _ml.py: http://bitbucket.org/birkenfeld/pygments-main/src/tip/pygments/lexers/ml.py
|
||||
|
||||
|
||||
The ExtendedRegexLexer class
|
||||
============================
|
||||
|
||||
The `RegexLexer`, even with callbacks, unfortunately isn't powerful enough for
|
||||
the funky syntax rules of languages such as Ruby.
|
||||
|
||||
But fear not; even then you don't have to abandon the regular expression
|
||||
approach: Pygments has a subclass of `RegexLexer`, the `ExtendedRegexLexer`.
|
||||
All features known from RegexLexers are available here too, and the tokens are
|
||||
specified in exactly the same way, *except* for one detail:
|
||||
|
||||
The `get_tokens_unprocessed()` method holds its internal state data not as local
|
||||
variables, but in an instance of the `pygments.lexer.LexerContext` class, and
|
||||
that instance is passed to callbacks as a third argument. This means that you
|
||||
can modify the lexer state in callbacks.
|
||||
|
||||
The `LexerContext` class has the following members:
|
||||
|
||||
* `text` -- the input text
|
||||
* `pos` -- the current starting position that is used for matching regexes
|
||||
* `stack` -- a list containing the state stack
|
||||
* `end` -- the maximum position to which regexes are matched, this defaults to
|
||||
the length of `text`
|
||||
|
||||
Additionally, the `get_tokens_unprocessed()` method can be given a
|
||||
`LexerContext` instead of a string and will then process this context instead of
|
||||
creating a new one for the string argument.
|
||||
|
||||
Note that because you can set the current position to anything in the callback,
|
||||
it won't be automatically set by the caller after the callback is finished.
|
||||
For example, this is how the hypothetical lexer above would be written with the
|
||||
`ExtendedRegexLexer`::
|
||||
|
||||
    from pygments.lexer import ExtendedRegexLexer
    from pygments.token import Generic

    class ExHypotheticLexer(ExtendedRegexLexer):

        def headline_callback(lexer, match, ctx):
            equal_signs = match.group(1)
            text = match.group(2)
            yield match.start(), Generic.Headline, equal_signs + text + equal_signs
            ctx.pos = match.end()

        tokens = {
            'root': [
                (r'(=+)(.*?)(\1)', headline_callback)
            ]
        }
|
||||
|
||||
This might sound confusing (and it really can be), but it is needed. For an
example, look at the Ruby lexer in `ruby.py`_.
|
||||
|
||||
.. _ruby.py: https://bitbucket.org/birkenfeld/pygments-main/src/tip/pygments/lexers/ruby.py
|
||||
|
||||
|
||||
Handling Lists of Keywords
|
||||
==========================
|
||||
|
||||
For a relatively short list (hundreds) you can construct an optimized regular
|
||||
expression directly using ``words()`` (longer lists, see next section). This
|
||||
function handles a few things for you automatically, including escaping
|
||||
metacharacters and Python's first-match rather than longest-match in
|
||||
alternations. Feel free to put the lists themselves in
|
||||
``pygments/lexers/_$lang_builtins.py`` (see examples there), and generate them by
|
||||
code if possible.
|
||||
|
||||
An example of using ``words()`` is something like::
|
||||
|
||||
    from pygments.lexer import RegexLexer, words
    from pygments.token import Name

    class MyLexer(RegexLexer):

        tokens = {
            'root': [
                (words(('else', 'elseif'), suffix=r'\b'), Name.Builtin),
                (r'\w+', Name),
            ],
        }
|
||||
|
||||
As you can see, you can add ``prefix`` and ``suffix`` parts to the constructed
|
||||
regex.
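As a small sketch of why the ``prefix`` and ``suffix`` parts matter, the lexer below wraps the word list in ``\b`` boundaries so that a longer identifier that merely starts with a keyword is not split. The lexer name and the sample input are hypothetical.

```python
from pygments.lexer import RegexLexer, words
from pygments.token import Keyword, Name, Text


class KeywordDemoLexer(RegexLexer):
    """Hypothetical lexer used only to demonstrate words()."""
    name = 'KeywordDemo'

    tokens = {
        'root': [
            # \b on both sides: match whole words only.
            (words(('else', 'elseif'), prefix=r'\b', suffix=r'\b'), Keyword),
            (r'\w+', Name),
            (r'\s+', Text),
        ],
    }


# Keep only the non-whitespace tokens for readability.
named = [(ttype, value)
         for ttype, value in KeywordDemoLexer().get_tokens('elseif elsewhere else')
         if value.strip()]
```

Here ``elsewhere`` ends up as a plain `Name`, because the trailing ``\b`` prevents the ``else`` alternative from matching in the middle of a word.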
|
||||
|
||||
|
||||
Modifying Token Streams
|
||||
=======================
|
||||
|
||||
Some languages ship a lot of builtin functions (for example PHP). The total
|
||||
amount of those functions differs from system to system because not everybody
|
||||
has every extension installed. In the case of PHP there are over 3000 builtin
|
||||
functions. That's an incredibly huge amount of functions, much more than you
|
||||
want to put into a regular expression.
|
||||
|
||||
But because only `Name` tokens can be function names this is solvable by
|
||||
overriding the ``get_tokens_unprocessed()`` method. The following lexer
|
||||
subclasses the `PythonLexer` so that it highlights some additional names as
|
||||
pseudo keywords::
|
||||
|
||||
from pygments.lexers.python import PythonLexer
|
||||
from pygments.token import Name, Keyword
|
||||
|
||||
class MyPythonLexer(PythonLexer):
|
||||
EXTRA_KEYWORDS = set(('foo', 'bar', 'foobar', 'barfoo', 'spam', 'eggs'))
|
||||
|
||||
def get_tokens_unprocessed(self, text):
|
||||
for index, token, value in PythonLexer.get_tokens_unprocessed(self, text):
|
||||
if token is Name and value in self.EXTRA_KEYWORDS:
|
||||
yield index, Keyword.Pseudo, value
|
||||
else:
|
||||
yield index, token, value
|
||||
|
||||
The `PhpLexer` and `LuaLexer` use this method to resolve builtin functions.
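The effect of such an override can be checked by lexing a short line and looking at the token types of the identifiers. This is a minimal sketch for Python 3. The class name and the extra names are made up, and a real lexer would load a much longer, generated list.

```python
from pygments.lexers.python import PythonLexer
from pygments.token import Keyword, Name


class ExtraKeywordLexer(PythonLexer):
    # Hypothetical extra names, standing in for a large builtin list.
    EXTRA_KEYWORDS = frozenset(('spam', 'eggs'))

    def get_tokens_unprocessed(self, text):
        for index, token, value in super().get_tokens_unprocessed(text):
            if token is Name and value in self.EXTRA_KEYWORDS:
                yield index, Keyword.Pseudo, value
            else:
                yield index, token, value


# Map each token value to the type it was given.
kinds = {value: token
         for token, value in ExtraKeywordLexer().get_tokens('spam = eggs + ham\n')}
```

The set lookup costs the same however many names are listed, which is the main advantage over a very large regular expression.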
|
||||
69
vendor/pygments/doc/docs/lexers.rst
vendored
Normal file
@@ -0,0 +1,69 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
================
|
||||
Available lexers
|
||||
================
|
||||
|
||||
This page lists all available builtin lexers and the options they take.
|
||||
|
||||
Currently, **all lexers** support these options:
|
||||
|
||||
`stripnl`
|
||||
Strip leading and trailing newlines from the input (default: ``True``)
|
||||
|
||||
`stripall`
|
||||
Strip all leading and trailing whitespace from the input (default:
|
||||
``False``).
|
||||
|
||||
`ensurenl`
|
||||
Make sure that the input ends with a newline (default: ``True``). This
|
||||
is required for some lexers that consume input linewise.
|
||||
|
||||
.. versionadded:: 1.3
|
||||
|
||||
`tabsize`
|
||||
If given and greater than 0, expand tabs in the input (default: ``0``).
|
||||
|
||||
`encoding`
|
||||
If given, must be an encoding name (such as ``"utf-8"``). This encoding
|
||||
will be used to convert the input string to Unicode (if it is not already
|
||||
a Unicode string). The default is ``"guess"``.
|
||||
|
||||
If this option is set to ``"guess"``, a simple UTF-8 vs. Latin-1
|
||||
detection is used, if it is set to ``"chardet"``, the
|
||||
`chardet library <http://chardet.feedparser.org/>`_ is used to
|
||||
guess the encoding of the input.
|
||||
|
||||
.. versionadded:: 0.6
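The preprocessing options can be observed with the plain-text lexer, which hands the input back as tokens once the options have been applied. The helper function below is a hypothetical name introduced for this sketch.

```python
from pygments.lexers import TextLexer


def lexed_text(source, **options):
    """Return the text a lexer sees after the options were applied."""
    lexer = TextLexer(**options)
    return ''.join(value for _, value in lexer.get_tokens(source))


default = lexed_text('\n\n  hi  \n\n')                  # stripnl + ensurenl
stripped = lexed_text('\n\n  hi  \n\n', stripall=True)  # all whitespace stripped
expanded = lexed_text('a\tb', tabsize=4)                # tab expanded to spaces
raw = lexed_text('hi', ensurenl=False)                  # no newline appended
```

With the defaults, only the surrounding newlines are removed and a single trailing newline is added back. The inner indentation survives unless `stripall` is set.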
|
||||
|
||||
|
||||
The "Short Names" field lists the identifiers that can be used with the
|
||||
`get_lexer_by_name()` function.
|
||||
|
||||
These lexers are builtin and can be imported from `pygments.lexers`:
|
||||
|
||||
.. pygmentsdoc:: lexers
|
||||
|
||||
|
||||
Iterating over all lexers
|
||||
-------------------------
|
||||
|
||||
.. versionadded:: 0.6
|
||||
|
||||
To get all lexers (both the builtin and the plugin ones), you can
|
||||
use the `get_all_lexers()` function from the `pygments.lexers`
|
||||
module:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> from pygments.lexers import get_all_lexers
|
||||
>>> i = get_all_lexers()
|
||||
>>> next(i)
('Diff', ('diff',), ('*.diff', '*.patch'), ('text/x-diff', 'text/x-patch'))
>>> next(i)
('Delphi', ('delphi', 'objectpascal', 'pas', 'pascal'), ('*.pas',), ('text/x-pascal',))
>>> next(i)
|
||||
('XML+Ruby', ('xml+erb', 'xml+ruby'), (), ())
|
||||
|
||||
As you can see, the return value is an iterator which yields tuples
|
||||
in the form ``(name, aliases, filetypes, mimetypes)``.
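One use for these tuples is an index from short name to lexer name, for example to fill a language selection list. This is only a sketch, and the dictionary name is an assumption.

```python
from pygments.lexers import get_all_lexers

# Map every alias to the human-readable lexer name.
alias_to_name = {}
for name, aliases, filenames, mimetypes in get_all_lexers():
    for alias in aliases:
        # Keep the first lexer that claims an alias.
        alias_to_name.setdefault(alias, name)
```

Any key of this dictionary can then be passed to `get_lexer_by_name()`.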
|
||||
39
vendor/pygments/doc/docs/moinmoin.rst
vendored
Normal file
@@ -0,0 +1,39 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
============================
|
||||
Using Pygments with MoinMoin
|
||||
============================
|
||||
|
||||
From Pygments 0.7, the source distribution ships a `Moin`_ parser plugin that
|
||||
can be used to get Pygments highlighting in Moin wiki pages.
|
||||
|
||||
To use it, copy the file `external/moin-parser.py` from the Pygments
|
||||
distribution to the `data/plugin/parser` subdirectory of your Moin instance.
|
||||
Edit the options at the top of the file (currently ``ATTACHMENTS`` and
|
||||
``INLINESTYLES``) and rename the file to the name that the parser directive
|
||||
should have. For example, if you name the file ``code.py``, you can get a
|
||||
highlighted Python code sample with this Wiki markup::
|
||||
|
||||
{{{
|
||||
#!code python
|
||||
[...]
|
||||
}}}
|
||||
|
||||
where ``python`` is the Pygments name of the lexer to use.
|
||||
|
||||
Additionally, if you set the ``ATTACHMENTS`` option to True, Pygments will also
|
||||
be called for all attachments for whose filenames there is no other parser
|
||||
registered.
|
||||
|
||||
You are responsible for including CSS rules that will map the Pygments CSS
|
||||
classes to colors. You can output a stylesheet file with `pygmentize`, put it
|
||||
into the `htdocs` directory of your Moin instance and then include it in the
|
||||
`stylesheets` configuration option in the Moin config, e.g.::
|
||||
|
||||
stylesheets = [('screen', '/htdocs/pygments.css')]
|
||||
|
||||
If you do not want to do that and are willing to accept larger HTML output, you
|
||||
can set the ``INLINESTYLES`` option to True.
|
||||
|
||||
|
||||
.. _Moin: http://moinmoin.wikiwikiweb.de/
|
||||
93
vendor/pygments/doc/docs/plugins.rst
vendored
Normal file
@@ -0,0 +1,93 @@
|
||||
================
|
||||
Register Plugins
|
||||
================
|
||||
|
||||
If you want to extend Pygments without hacking the sources, but want to
|
||||
use the lexer/formatter/style/filter lookup functions (`lexers.get_lexer_by_name`
|
||||
et al.), you can use `setuptools`_ entrypoints to add new lexers, formatters
|
||||
or styles as if they were in the Pygments core.
|
||||
|
||||
.. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools
|
||||
|
||||
That means you can use your highlighter modules with the `pygmentize` script,
|
||||
which relies on the mentioned functions.
|
||||
|
||||
|
||||
Entrypoints
|
||||
===========
|
||||
|
||||
Here is a list of setuptools entrypoints that Pygments understands:
|
||||
|
||||
`pygments.lexers`
|
||||
|
||||
This entrypoint is used for adding new lexers to the Pygments core.
|
||||
The name of an entrypoint value doesn't really matter; Pygments extracts the
required metadata from the class definition:
|
||||
|
||||
.. sourcecode:: ini
|
||||
|
||||
[pygments.lexers]
|
||||
yourlexer = yourmodule:YourLexer
|
||||
|
||||
Note that you have to define ``name``, ``aliases`` and ``filenames``
|
||||
attributes so that you can use the highlighter from the command line:
|
||||
|
||||
.. sourcecode:: python
|
||||
|
||||
class YourLexer(...):
|
||||
name = 'Name Of Your Lexer'
|
||||
aliases = ['alias']
|
||||
filenames = ['*.ext']
|
||||
|
||||
|
||||
`pygments.formatters`
|
||||
|
||||
You can use this entrypoint to add new formatters to Pygments. The
|
||||
name of an entrypoint item is the name of the formatter. If you
|
||||
prefix the name with a slash it's used as a filename pattern:
|
||||
|
||||
.. sourcecode:: ini
|
||||
|
||||
[pygments.formatters]
|
||||
yourformatter = yourmodule:YourFormatter
|
||||
/.ext = yourmodule:YourFormatter
|
||||
|
||||
|
||||
`pygments.styles`
|
||||
|
||||
To add a new style you can use this entrypoint. The name of the entrypoint
|
||||
is the name of the style:
|
||||
|
||||
.. sourcecode:: ini
|
||||
|
||||
[pygments.styles]
|
||||
yourstyle = yourmodule:YourStyle
|
||||
|
||||
|
||||
`pygments.filters`
|
||||
|
||||
Use this entrypoint to register a new filter. The name of the
|
||||
entrypoint is the name of the filter:
|
||||
|
||||
.. sourcecode:: ini
|
||||
|
||||
[pygments.filters]
|
||||
yourfilter = yourmodule:YourFilter
|
||||
|
||||
|
||||
How To Use Entrypoints
|
||||
======================
|
||||
|
||||
This documentation doesn't explain how to use those entrypoints because this is
|
||||
covered in the `setuptools documentation`_. That page should cover everything
|
||||
you need to write a plugin.
|
||||
|
||||
.. _setuptools documentation: http://peak.telecommunity.com/DevCenter/setuptools
|
||||
|
||||
|
||||
Extending The Core
|
||||
==================
|
||||
|
||||
If you have written a Pygments plugin that is open source, please inform us
|
||||
about that. There is a high chance that we'll add it to the Pygments
|
||||
distribution.
|
||||
205
vendor/pygments/doc/docs/quickstart.rst
vendored
Normal file
@@ -0,0 +1,205 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
===========================
|
||||
Introduction and Quickstart
|
||||
===========================
|
||||
|
||||
|
||||
Welcome to Pygments! This document explains the basic concepts and terms and
|
||||
gives a few examples of how to use the library.
|
||||
|
||||
|
||||
Architecture
|
||||
============
|
||||
|
||||
There are four types of components that work together to highlight a piece of
|
||||
code:
|
||||
|
||||
* A **lexer** splits the source into tokens, fragments of the source that
|
||||
have a token type that determines what the text represents semantically
|
||||
(e.g., keyword, string, or comment). There is a lexer for every language
|
||||
or markup format that Pygments supports.
|
||||
* The token stream can be piped through **filters**, which usually modify
|
||||
the token types or text fragments, e.g. uppercasing all keywords.
|
||||
* A **formatter** then takes the token stream and writes it to an output
|
||||
file, in a format such as HTML, LaTeX or RTF.
|
||||
* While writing the output, a **style** determines how to highlight all the
|
||||
different token types. It maps them to attributes like "red and bold".
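The four components can also be wired together by hand, which makes the flow of the token stream visible. The sketch below uses the builtin ``keywordcase`` filter. The input snippet is an arbitrary assumption.

```python
import pygments
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

# 1. The lexer splits the source into tokens ...
lexer = PythonLexer()
# 2. ... which are piped through a filter that uppercases keywords.
lexer.add_filter('keywordcase', case='upper')
token_stream = pygments.lex('def f(): pass\n', lexer)

# 3./4. The formatter writes the tokens as HTML, using the chosen style.
html = pygments.format(token_stream, HtmlFormatter(style='default'))
```

In everyday use the ``highlight()`` function performs the lexing and formatting steps in one call.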
|
||||
|
||||
|
||||
Example
|
||||
=======
|
||||
|
||||
Here is a small example for highlighting Python code:
|
||||
|
||||
.. sourcecode:: python
|
||||
|
||||
from pygments import highlight
|
||||
from pygments.lexers import PythonLexer
|
||||
from pygments.formatters import HtmlFormatter
|
||||
|
||||
code = 'print "Hello World"'
|
||||
print highlight(code, PythonLexer(), HtmlFormatter())
|
||||
|
||||
which prints something like this:
|
||||
|
||||
.. sourcecode:: html
|
||||
|
||||
<div class="highlight">
|
||||
<pre><span class="k">print</span> <span class="s">"Hello World"</span></pre>
|
||||
</div>
|
||||
|
||||
As you can see, Pygments uses CSS classes (by default, but you can change that)
|
||||
instead of inline styles in order to avoid outputting redundant style information over
|
||||
and over. A CSS stylesheet that contains all CSS classes possibly used in the output
|
||||
can be produced by:
|
||||
|
||||
.. sourcecode:: python
|
||||
|
||||
print HtmlFormatter().get_style_defs('.highlight')
|
||||
|
||||
The argument to :func:`get_style_defs` is used as an additional CSS selector:
|
||||
the output may look like this:
|
||||
|
||||
.. sourcecode:: css
|
||||
|
||||
.highlight .k { color: #AA22FF; font-weight: bold }
|
||||
.highlight .s { color: #BB4444 }
|
||||
...
|
||||
|
||||
|
||||
Options
|
||||
=======
|
||||
|
||||
The :func:`highlight()` function supports a fourth argument called *outfile*; it
|
||||
must be a file object if given. The formatted output will then be written to
|
||||
this file instead of being returned as a string.
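As a brief sketch of the *outfile* behaviour, an in-memory text buffer can stand in for a real file. The buffer and variable names are assumptions.

```python
import io

from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import HtmlFormatter

buffer = io.StringIO()
# With outfile given, the markup goes to the file object, not the return value.
returned = highlight('print("Hello World")\n', PythonLexer(), HtmlFormatter(),
                     outfile=buffer)
html = buffer.getvalue()
```

A file opened with ``open('out.html', 'w')`` would be used in the same way.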
|
||||
|
||||
Lexers and formatters both support options. They are given to them as keyword
|
||||
arguments either to the class or to the lookup method:
|
||||
|
||||
.. sourcecode:: python
|
||||
|
||||
from pygments import highlight
|
||||
from pygments.lexers import get_lexer_by_name
|
||||
from pygments.formatters import HtmlFormatter
|
||||
|
||||
lexer = get_lexer_by_name("python", stripall=True)
|
||||
formatter = HtmlFormatter(linenos=True, cssclass="source")
|
||||
result = highlight(code, lexer, formatter)
|
||||
|
||||
This makes the lexer strip all leading and trailing whitespace from the input
|
||||
(`stripall` option), lets the formatter output line numbers (`linenos` option),
|
||||
and sets the wrapping ``<div>``'s class to ``source`` (instead of
|
||||
``highlight``).
|
||||
|
||||
Important options include:
|
||||
|
||||
`encoding` : for lexers and formatters
|
||||
Since Pygments uses Unicode strings internally, this determines which
|
||||
encoding will be used to convert to or from byte strings.
|
||||
`style` : for formatters
|
||||
The name of the style to use when writing the output.
|
||||
|
||||
|
||||
For an overview of builtin lexers and formatters and their options, visit the
|
||||
:doc:`lexers <lexers>` and :doc:`formatters <formatters>` lists.
|
||||
|
||||
For documentation on filters, see :doc:`this page <filters>`.
|
||||
|
||||
|
||||
Lexer and formatter lookup
|
||||
==========================
|
||||
|
||||
If you want to look up a built-in lexer by its alias or a filename, you can use
|
||||
one of the following methods:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> from pygments.lexers import (get_lexer_by_name,
|
||||
... get_lexer_for_filename, get_lexer_for_mimetype)
|
||||
|
||||
>>> get_lexer_by_name('python')
|
||||
<pygments.lexers.PythonLexer>
|
||||
|
||||
>>> get_lexer_for_filename('spam.rb')
|
||||
<pygments.lexers.RubyLexer>
|
||||
|
||||
>>> get_lexer_for_mimetype('text/x-perl')
|
||||
<pygments.lexers.PerlLexer>
|
||||
|
||||
All these functions accept keyword arguments; they will be passed to the lexer
|
||||
as options.
|
||||
|
||||
A similar API is available for formatters: use :func:`.get_formatter_by_name()`
|
||||
and :func:`.get_formatter_for_filename()` from the :mod:`pygments.formatters`
|
||||
module for this purpose.
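A short sketch of the formatter lookups follows. The output file name ``report.tex`` is a made-up example.

```python
from pygments.formatters import (HtmlFormatter, LatexFormatter,
                                 get_formatter_by_name,
                                 get_formatter_for_filename)

# Look up by alias; keyword arguments become formatter options.
html_formatter = get_formatter_by_name('html', linenos=True)

# Look up by the name of the file the output will be written to.
latex_formatter = get_formatter_for_filename('report.tex')
```

Both functions return formatter instances, so the results can be passed straight to ``highlight()``.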
|
||||
|
||||
|
||||
Guessing lexers
|
||||
===============
|
||||
|
||||
If you don't know the content of the file, or you want to highlight a file
|
||||
whose extension is ambiguous, such as ``.html`` (which could contain plain HTML
|
||||
or some template tags), use these functions:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> from pygments.lexers import guess_lexer, guess_lexer_for_filename
|
||||
|
||||
>>> guess_lexer('#!/usr/bin/python\nprint "Hello World!"')
|
||||
<pygments.lexers.PythonLexer>
|
||||
|
||||
>>> guess_lexer_for_filename('test.py', 'print "Hello World!"')
|
||||
<pygments.lexers.PythonLexer>
|
||||
|
||||
:func:`.guess_lexer()` passes the given content to the lexer classes'
|
||||
:meth:`analyse_text()` method and returns the one for which it returns the
|
||||
highest number.
|
||||
|
||||
All lexers have two different filename pattern lists: the primary and the
|
||||
secondary one. The :func:`.get_lexer_for_filename()` function only uses the
|
||||
primary list, whose entries are supposed to be unique among all lexers.
|
||||
:func:`.guess_lexer_for_filename()`, however, will first loop through all lexers
|
||||
and look at the primary and secondary filename patterns if the filename matches.
|
||||
If only one lexer matches, it is returned, else the guessing mechanism of
|
||||
:func:`.guess_lexer()` is used with the matching lexers.
|
||||
|
||||
As usual, keyword arguments to these functions are given to the created lexer
|
||||
as options.
|
||||
|
||||
|
||||
Command line usage
|
||||
==================
|
||||
|
||||
You can use Pygments from the command line, using the :program:`pygmentize`
|
||||
script::
|
||||
|
||||
$ pygmentize test.py
|
||||
|
||||
will highlight the Python file test.py using ANSI escape sequences
|
||||
(a.k.a. terminal colors) and print the result to standard output.
|
||||
|
||||
To output HTML, use the ``-f`` option::
|
||||
|
||||
$ pygmentize -f html -o test.html test.py
|
||||
|
||||
to write an HTML-highlighted version of test.py to the file test.html.
|
||||
Note that it will only be a snippet of HTML; if you want a full HTML document,
|
||||
use the "full" option::
|
||||
|
||||
$ pygmentize -f html -O full -o test.html test.py
|
||||
|
||||
This will produce a full HTML document with included stylesheet.
|
||||
|
||||
A style can be selected with ``-O style=<name>``.
|
||||
|
||||
If you need a stylesheet for an existing HTML file using Pygments CSS classes,
|
||||
it can be created with::
|
||||
|
||||
$ pygmentize -S default -f html > style.css
|
||||
|
||||
where ``default`` is the style name.
|
||||
|
||||
More options and tricks can be found in the :doc:`command line reference
|
||||
<cmdline>`.
|
||||
22
vendor/pygments/doc/docs/rstdirective.rst
vendored
Normal file
@@ -0,0 +1,22 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
================================
|
||||
Using Pygments in ReST documents
|
||||
================================
|
||||
|
||||
Many Python people use `ReST`_ for documenting their source code, programs,
|
||||
scripts et cetera. This also means that documentation often includes sourcecode
|
||||
samples or snippets.
|
||||
|
||||
You can easily enable Pygments support for your ReST texts using a custom
|
||||
directive -- this is also how this documentation displays source code.
|
||||
|
||||
From Pygments 0.9, the directive is shipped in the distribution as
|
||||
`external/rst-directive.py`. You can copy and adapt this code to your liking.
|
||||
|
||||
.. removed -- too confusing
|
||||
*Loosely related note:* The ReST lexer now recognizes ``.. sourcecode::`` and
|
||||
``.. code::`` directives and highlights the contents in the specified language
|
||||
if the `handlecodeblocks` option is true.
|
||||
|
||||
.. _ReST: http://docutils.sf.net/rst.html
|
||||
145
vendor/pygments/doc/docs/styles.rst
vendored
Normal file
@@ -0,0 +1,145 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
======
|
||||
Styles
|
||||
======
|
||||
|
||||
Pygments comes with some builtin styles that work for both the HTML and
|
||||
LaTeX formatter.
|
||||
|
||||
The builtin styles can be looked up with the `get_style_by_name` function:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> from pygments.styles import get_style_by_name
|
||||
>>> get_style_by_name('colorful')
|
||||
<class 'pygments.styles.colorful.ColorfulStyle'>
|
||||
|
||||
You can select a builtin style for a formatter by passing its name as the
`style` option, in the form of a string:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> from pygments.styles import get_style_by_name
|
||||
>>> from pygments.formatters import HtmlFormatter
|
||||
>>> HtmlFormatter(style='colorful').style
|
||||
<class 'pygments.styles.colorful.ColorfulStyle'>
|
||||
|
||||
Or you can also import your own style (which must be a subclass of
|
||||
`pygments.style.Style`) and pass it to the formatter:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> from yourapp.yourmodule import YourStyle
|
||||
>>> from pygments.formatters import HtmlFormatter
|
||||
>>> HtmlFormatter(style=YourStyle).style
|
||||
<class 'yourapp.yourmodule.YourStyle'>
|
||||
|
||||
|
||||
Creating Own Styles
|
||||
===================
|
||||
|
||||
So, how to create a style? All you have to do is to subclass `Style` and
|
||||
define some styles:
|
||||
|
||||
.. sourcecode:: python
|
||||
|
||||
from pygments.style import Style
|
||||
from pygments.token import Keyword, Name, Comment, String, Error, \
|
||||
Number, Operator, Generic
|
||||
|
||||
class YourStyle(Style):
|
||||
default_style = ""
|
||||
styles = {
|
||||
Comment: 'italic #888',
|
||||
Keyword: 'bold #005',
|
||||
Name: '#f00',
|
||||
Name.Function: '#0f0',
|
||||
Name.Class: 'bold #0f0',
|
||||
String: 'bg:#eee #111'
|
||||
}
|
||||
|
||||
That's it. There are just a few rules. When you define a style for `Name`
|
||||
the style automatically also affects `Name.Function` and so on. If you
|
||||
defined ``'bold'`` and you don't want boldface for a subtoken use ``'nobold'``.
|
||||
|
||||
(Philosophy: the styles aren't written in CSS syntax since this way
|
||||
they can be used for a variety of formatters.)
|
||||
|
||||
`default_style` is the style inherited by all token types.
|
||||
|
||||
To make the style usable for Pygments, you must
|
||||
|
||||
* either register it as a plugin (see :doc:`the plugin docs <plugins>`)
|
||||
* or drop it into the `styles` subpackage of your Pygments distribution, with one
style class per file, where the file name is the style name and the class name is
`StylenameStyle`. For example, if your style should be called
|
||||
``"mondrian"``, name the class `MondrianStyle`, put it into the file
|
||||
``mondrian.py`` and this file into the ``pygments.styles`` subpackage
|
||||
directory.
|
||||
|
||||
|
||||
Style Rules
|
||||
===========
|
||||
|
||||
Here is a small overview of all allowed styles:
|
||||
|
||||
``bold``
|
||||
render text as bold
|
||||
``nobold``
|
||||
don't render text as bold (to prevent subtokens being highlighted bold)
|
||||
``italic``
|
||||
render text italic
|
||||
``noitalic``
|
||||
don't render text as italic
|
||||
``underline``
|
||||
render text underlined
|
||||
``nounderline``
|
||||
don't render text underlined
|
||||
``bg:``
|
||||
transparent background
|
||||
``bg:#000000``
|
||||
background color (black)
|
||||
``border:``
|
||||
no border
|
||||
``border:#ffffff``
|
||||
border color (white)
|
||||
``#ff0000``
|
||||
text color (red)
|
||||
``noinherit``
|
||||
don't inherit styles from supertoken
|
||||
|
||||
Note that there may not be a space between ``bg:`` and the color value
|
||||
since the style definition string is split at whitespace.
|
||||
Also, using named colors is not allowed since the supported color names
|
||||
vary for different formatters.
|
||||
|
||||
Furthermore, not all formatters might support every style.
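How a style definition string is resolved, including inheritance from the parent token type, can be inspected with ``style_for_token()``. The style class below is a hypothetical example, not one of the builtin styles.

```python
from pygments.style import Style
from pygments.token import Keyword, Name


class DemoStyle(Style):
    """Hypothetical style used only to show how rules are resolved."""
    default_style = ""
    styles = {
        Name:          'bold #f00',
        Name.Function: 'nobold',             # inherits the color, drops bold
        Keyword:       'italic bg:#eeeeee',  # no space after "bg:"
    }


function_style = DemoStyle.style_for_token(Name.Function)
keyword_style = DemoStyle.style_for_token(Keyword)
```

The returned dictionaries hold normalized values: colors are six-digit hex strings without the leading ``#``, and the text attributes are booleans.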
|
||||
|
||||
|
||||
Builtin Styles
|
||||
==============
|
||||
|
||||
Pygments ships some builtin styles which are maintained by the Pygments team.
|
||||
|
||||
To get a list of known styles you can use this snippet:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> from pygments.styles import STYLE_MAP
|
||||
>>> STYLE_MAP.keys()
|
||||
['default', 'emacs', 'friendly', 'colorful']
|
||||
|
||||
|
||||
Getting a list of available styles
|
||||
==================================
|
||||
|
||||
.. versionadded:: 0.6
|
||||
|
||||
Because it could be that a plugin registered a style, there is
|
||||
a way to iterate over all styles:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> from pygments.styles import get_all_styles
|
||||
>>> styles = list(get_all_styles())
|
||||
356
vendor/pygments/doc/docs/tokens.rst
vendored
Normal file
@@ -0,0 +1,356 @@
|
||||
.. -*- mode: rst -*-
|
||||
|
||||
==============
|
||||
Builtin Tokens
|
||||
==============
|
||||
|
||||
.. module:: pygments.token
|
||||
|
||||
In the :mod:`pygments.token` module, there is a special object called `Token`
|
||||
that is used to create token types.
|
||||
|
||||
You can create a new token type by accessing an attribute of `Token`:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> from pygments.token import Token
|
||||
>>> Token.String
|
||||
Token.String
|
||||
>>> Token.String is Token.String
|
||||
True
|
||||
|
||||
Note that tokens are singletons so you can use the ``is`` operator for comparing
|
||||
token types.
|
||||
|
||||
As of Pygments 0.7 you can also use the ``in`` operator to perform set tests:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> from pygments.token import Comment
|
||||
>>> Comment.Single in Comment
|
||||
True
|
||||
>>> Comment in Comment.Multi
|
||||
False
|
||||
|
||||
This can be useful in :doc:`filters <filters>` and if you write lexers on your
|
||||
own without using the base lexers.
|
||||
|
||||
You can also split a token type into a hierarchy, and get the parent of it:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> String.split()
|
||||
[Token, Token.Literal, Token.Literal.String]
|
||||
>>> String.parent
|
||||
Token.Literal
|
||||
|
||||
In principle, you can create an unlimited number of token types but nobody can
|
||||
guarantee that a style would define style rules for a token type. Because of
|
||||
that, Pygments proposes some global token types defined in the
|
||||
`pygments.token.STANDARD_TYPES` dict.
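The dict maps each standard token type to a short name, which is also what shows up as CSS class names such as ``k`` and ``s`` in HTML output. A quick sketch of a few lookups:

```python
from pygments.token import STANDARD_TYPES, Comment, Keyword, Name, String

# Short names for a handful of standard token types.
short_names = {
    'keyword': STANDARD_TYPES[Keyword],
    'string': STANDARD_TYPES[String],
    'function': STANDARD_TYPES[Name.Function],
    'line comment': STANDARD_TYPES[Comment.Single],
}
```

A token type invented on the fly, for example ``Token.MyOwnType``, has no entry here, so styles are unlikely to define a rule for it.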
|
||||
|
||||
For some tokens aliases are already defined:
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> from pygments.token import String
|
||||
>>> String
|
||||
Token.Literal.String
|
||||
|
||||
Inside the :mod:`pygments.token` module the following aliases are defined:
|
||||
|
||||
============= ============================ ====================================
|
||||
`Text` `Token.Text` for any type of text data
|
||||
`Whitespace` `Token.Text.Whitespace` for specially highlighted whitespace
|
||||
`Error` `Token.Error` represents lexer errors
|
||||
`Other` `Token.Other` special token for data not
|
||||
matched by a parser (e.g. HTML
|
||||
markup in PHP code)
|
||||
`Keyword` `Token.Keyword` any kind of keywords
|
||||
`Name` `Token.Name` variable/function names
|
||||
`Literal` `Token.Literal` Any literals
|
||||
`String` `Token.Literal.String` string literals
|
||||
`Number` `Token.Literal.Number` number literals
|
||||
`Operator` `Token.Operator` operators (``+``, ``not``...)
|
||||
`Punctuation` `Token.Punctuation` punctuation (``[``, ``(``...)
|
||||
`Comment` `Token.Comment` any kind of comments
|
||||
`Generic` `Token.Generic` generic tokens (have a look at
|
||||
the explanation below)
|
||||
============= ============================ ====================================
|
||||
|
||||
The `Whitespace` token type is new in Pygments 0.8. It is used only by the
|
||||
`VisibleWhitespaceFilter` currently.
|
||||
|
||||
Normally you just create token types using the already defined aliases. For each
|
||||
of those token aliases, a number of subtypes exists (excluding the special tokens
|
||||
`Token.Text`, `Token.Error` and `Token.Other`).
|
||||
|
||||
The `is_token_subtype()` function in the `pygments.token` module can be used to
|
||||
test if a token type is a subtype of another (such as `Name.Tag` and `Name`).
|
||||
(This is the same as ``Name.Tag in Name``. The overloaded `in` operator was newly
introduced in Pygments 0.7; the function still exists for backwards
compatibility.)
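Both spellings of the subtype test can be compared side by side. This is a minimal sketch and the variable names are arbitrary.

```python
from pygments.token import Name, is_token_subtype

# Function form and operator form answer the same question.
via_function = is_token_subtype(Name.Tag, Name)
via_operator = Name.Tag in Name

# The relation is not symmetric: a parent is not a subtype of its child.
reverse = is_token_subtype(Name, Name.Tag)
```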
|
||||
|
||||
With Pygments 0.7, it's also possible to convert strings to token types (for example
|
||||
if you want to supply a token from the command line):
|
||||
|
||||
.. sourcecode:: pycon
|
||||
|
||||
>>> from pygments.token import String, string_to_tokentype
|
||||
>>> string_to_tokentype("String")
|
||||
Token.Literal.String
|
||||
>>> string_to_tokentype("Token.Literal.String")
|
||||
Token.Literal.String
|
||||
>>> string_to_tokentype(String)
|
||||
Token.Literal.String
|
||||
|
||||
|
||||
Keyword Tokens
|
||||
==============
|
||||
|
||||
`Keyword`
|
||||
For any kind of keyword (especially if it doesn't match any of the
|
||||
subtypes of course).
|
||||
|
||||
`Keyword.Constant`
|
||||
For keywords that are constants (e.g. ``None`` in future Python versions).
|
||||
|
||||
`Keyword.Declaration`
|
||||
For keywords used for variable declaration (e.g. ``var`` in some programming
|
||||
languages like JavaScript).
|
||||
|
||||
`Keyword.Namespace`
|
||||
For keywords used for namespace declarations (e.g. ``import`` in Python and
|
||||
Java and ``package`` in Java).
|
||||
|
||||
`Keyword.Pseudo`
|
||||
For keywords that aren't really keywords (e.g. ``None`` in old Python
|
||||
versions).
|
||||
|
||||
`Keyword.Reserved`
|
||||
For reserved keywords.
|
||||
|
||||
`Keyword.Type`
|
||||
For builtin types that can't be used as identifiers (e.g. ``int``,
|
||||
``char`` etc. in C).
|
||||
|
||||
|
||||
Name Tokens
|
||||
===========
|
||||
|
||||
`Name`
|
||||
For any name (variable names, function names, classes).
|
||||
|
||||
`Name.Attribute`
|
||||
For all attributes (e.g. in HTML tags).
|
||||
|
||||
`Name.Builtin`
|
||||
Builtin names; names that are available in the global namespace.
|
||||
|
||||
`Name.Builtin.Pseudo`
|
||||
Builtin names that are implicit (e.g. ``self`` in Ruby, ``this`` in Java).
|
||||
|
||||
`Name.Class`
|
||||
Class names. Because no lexer can know if a name is a class or a function
|
||||
or something else this token is meant for class declarations.
|
||||
|
||||
`Name.Constant`
|
||||
Token type for constants. In some languages you can recognise a token by the
|
||||
way it's defined (the value after a ``const`` keyword for example). In
|
||||
other languages constants are uppercase by definition (Ruby).
|
||||
|
||||
`Name.Decorator`
|
||||
Token type for decorators. Decorators are syntactic elements in the Python
|
||||
language. Similar syntax elements exist in C# and Java.
|
||||
|
||||
`Name.Entity`
|
||||
Token type for special entities (e.g. ``&nbsp;`` in HTML).
|
||||
|
||||
`Name.Exception`
|
||||
Token type for exception names (e.g. ``RuntimeError`` in Python). Some languages
|
||||
define exceptions in the function signature (Java). You can highlight
|
||||
the name of that exception using this token then.
|
||||
|
||||
`Name.Function`
|
||||
Token type for function names.
|
||||
|
||||
`Name.Label`
|
||||
Token type for label names (e.g. in languages that support ``goto``).
|
||||
|
||||
`Name.Namespace`
|
||||
Token type for namespaces (e.g. import paths in Java/Python, or names following
the ``module``/``namespace`` keyword in other languages).
|
||||
|
||||
`Name.Other`
|
||||
Other names. Normally unused.
|
||||
|
||||
`Name.Tag`
|
||||
Tag names (in HTML/XML markup or configuration files).
|
||||
|
||||
`Name.Variable`
|
||||
Token type for variables. Some languages have prefixes for variable names
|
||||
(PHP, Ruby, Perl). You can highlight them using this token.
|
||||
|
||||
`Name.Variable.Class`
|
||||
same as `Name.Variable` but for class variables (also static variables).
|
||||
|
||||
`Name.Variable.Global`
|
||||
same as `Name.Variable` but for global variables (used in Ruby, for
|
||||
example).
|
||||
|
||||
`Name.Variable.Instance`
|
||||
same as `Name.Variable` but for instance variables.
|
||||
|
||||
|
||||
Literals
|
||||
========
|
||||
|
||||
`Literal`
|
||||
For any literal (if not further defined).
|
||||
|
||||
`Literal.Date`
|
||||
for date literals (e.g. ``42d`` in Boo).
|
||||
|
||||
|
||||
`String`
|
||||
For any string literal.
|
||||
|
||||
`String.Backtick`
|
||||
Token type for strings enclosed in backticks.
|
||||
|
||||
`String.Char`
|
||||
Token type for single characters (e.g. Java, C).
|
||||
|
||||
`String.Doc`
|
||||
Token type for documentation strings (for example Python).
|
||||
|
||||
`String.Double`
|
||||
Double quoted strings.
|
||||
|
||||
`String.Escape`
|
||||
Token type for escape sequences in strings.
|
||||
|
||||
`String.Heredoc`
|
||||
Token type for "heredoc" strings (e.g. in Ruby or Perl).
|
||||
|
||||
`String.Interpol`
|
||||
Token type for interpolated parts in strings (e.g. ``#{foo}`` in Ruby).
|
||||
|
||||
`String.Other`
|
||||
Token type for any other strings (for example ``%q{foo}`` string constructs
|
||||
in Ruby).
|
||||
|
||||
`String.Regex`
|
||||
Token type for regular expression literals (e.g. ``/foo/`` in JavaScript).
|
||||
|
||||
`String.Single`
|
||||
Token type for single quoted strings.
|
||||
|
||||
`String.Symbol`
|
||||
Token type for symbols (e.g. ``:foo`` in LISP or Ruby).
|
||||
|
||||
|
||||
`Number`
|
||||
Token type for any number literal.
|
||||
|
||||
`Number.Bin`
|
||||
Token type for binary literals (e.g. ``0b101010``).
|
||||
|
||||
`Number.Float`
|
||||
Token type for float literals (e.g. ``42.0``).
|
||||
|
||||
`Number.Hex`
|
||||
Token type for hexadecimal number literals (e.g. ``0xdeadbeef``).
|
||||
|
||||
`Number.Integer`
|
||||
Token type for integer literals (e.g. ``42``).
|
||||
|
||||
`Number.Integer.Long`
|
||||
Token type for long integer literals (e.g. ``42L`` in Python).
|
||||
|
||||
`Number.Oct`
|
||||
Token type for octal literals.
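Because `String` and `Number` are both subtypes of `Literal`, a single
containment test is enough to pick out every literal in a token stream. The
following is a minimal sketch; the sample source and the variable names are
made up for illustration, and the exact subtypes a lexer emits may differ
between Pygments versions.

```python
from pygments.lexers import PythonLexer
from pygments.token import Literal, Number, String

# Illustrative input only; any Python source would do.
code = "total = 42 + 0x2a + 4.2\nname = 'spam'\n"

# ``ttype in Literal`` is true for Literal itself and for all of its subtypes.
literals = [(ttype, value)
            for ttype, value in PythonLexer().get_tokens(code)
            if ttype in Literal]

# Narrow the result down with the more specific parent types.
numbers = [value for ttype, value in literals if ttype in Number]
strings = "".join(value for ttype, value in literals if ttype in String)

for ttype, value in literals:
    print(ttype, repr(value))
```

Testing against the parent type keeps the code independent of whether a lexer
reports, say, `Number.Hex` or plain `Number` for a given literal.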
|
||||
|
||||
|
||||
Operators
|
||||
=========
|
||||
|
||||
`Operator`
|
||||
For any punctuation operator (e.g. ``+``, ``-``).
|
||||
|
||||
`Operator.Word`
|
||||
For any operator that is a word (e.g. ``not``).
|
||||
|
||||
|
||||
Punctuation
|
||||
===========
|
||||
|
||||
.. versionadded:: 0.7
|
||||
|
||||
`Punctuation`
|
||||
For any punctuation which is not an operator (e.g. ``[``, ``(``...).
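The difference between `Operator`, `Operator.Word` and `Punctuation` is
easiest to see on a concrete token stream. This is a small sketch using the
Python lexer; the sample line and the variable names are assumptions chosen
for illustration, and other lexers may classify the same characters
differently.

```python
from pygments.lexers import PythonLexer
from pygments.token import Operator, Punctuation

# Illustrative input only.
code = "ok = not (a + b)\n"
tokens = list(PythonLexer().get_tokens(code))

# Token types are singletons, so an identity test selects one exact type.
word_ops = [value for ttype, value in tokens if ttype is Operator.Word]
symbol_ops = [value for ttype, value in tokens if ttype is Operator]
punctuation = [value for ttype, value in tokens if ttype is Punctuation]

print(word_ops, symbol_ops, punctuation)
```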
|
||||
|
||||
|
||||
Comments
|
||||
========
|
||||
|
||||
`Comment`
|
||||
Token type for any comment.
|
||||
|
||||
`Comment.Hashbang`
|
||||
Token type for hashbang comments (i.e. first lines of files that start with
|
||||
``#!``).
|
||||
|
||||
`Comment.Multiline`
|
||||
Token type for multiline comments.
|
||||
|
||||
`Comment.Preproc`
|
||||
Token type for preprocessor comments (also ``<?php``/``<%`` constructs).
|
||||
|
||||
`Comment.Single`
|
||||
Token type for comments that end at the end of a line (e.g. ``# foo``).
|
||||
|
||||
`Comment.Special`
|
||||
Special data in comments. For example code tags, author and license
|
||||
information, etc.
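To collect every comment regardless of its subtype, a containment test
against `Comment` is sufficient. The sketch below uses the Python lexer on a
made-up snippet; which subtype each comment receives (for example
`Comment.Hashbang` or `Comment.Single`) depends on the lexer and the Pygments
version.

```python
from pygments.lexers import PythonLexer
from pygments.token import Comment

# Illustrative input only.
code = "#!/usr/bin/env python\n# TODO: tidy up\nx = 1  # trailing\n"

# ``ttype in Comment`` matches Comment and all of its subtypes.
comments = [(ttype, value)
            for ttype, value in PythonLexer().get_tokens(code)
            if ttype in Comment]

for ttype, value in comments:
    print(ttype, repr(value))
```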
|
||||
|
||||
|
||||
Generic Tokens
|
||||
==============
|
||||
|
||||
Generic tokens are for special lexers like the `DiffLexer` that don't really
|
||||
highlight a programming language but a patch file.
|
||||
|
||||
|
||||
`Generic`
|
||||
A generic, unstyled token. Normally you don't use this token type.
|
||||
|
||||
`Generic.Deleted`
|
||||
Marks the token value as deleted.
|
||||
|
||||
`Generic.Emph`
|
||||
Marks the token value as emphasized.
|
||||
|
||||
`Generic.Error`
|
||||
Marks the token value as an error message.
|
||||
|
||||
`Generic.Heading`
|
||||
Marks the token value as a headline.
|
||||
|
||||
`Generic.Inserted`
|
||||
Marks the token value as inserted.
|
||||
|
||||
`Generic.Output`
|
||||
Marks the token value as program output (e.g. for the Python console lexer).
|
||||
|
||||
`Generic.Prompt`
|
||||
Marks the token value as a command prompt (e.g. for the bash lexer).
|
||||
|
||||
`Generic.Strong`
|
||||
Marks the token value as bold (e.g. for the rst lexer).
|
||||
|
||||
`Generic.Subheading`
|
||||
Marks the token value as a subheadline.
|
||||
|
||||
`Generic.Traceback`
|
||||
Marks the token value as a part of an error traceback.
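As a sketch of how generic tokens show up in practice, the following runs the
`DiffLexer` over a tiny hand-written patch and keeps only the `Generic`
tokens. The patch text and the variable names are made up for illustration.

```python
from pygments.lexers import DiffLexer
from pygments.token import Generic

# Illustrative patch fragment only.
patch = "@@ -1 +1 @@\n-old line\n+new line\n"

# Keep the Generic tokens; strip the line ending that some versions include
# in the token value and others emit as a separate whitespace token.
changes = [(ttype, value.rstrip("\n"))
           for ttype, value in DiffLexer().get_tokens(patch)
           if ttype in Generic]

for ttype, value in changes:
    print(ttype, repr(value))
```

A style then only has to define colors for `Generic.Inserted` and
`Generic.Deleted` to render added and removed lines differently.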
|
||||
58
vendor/pygments/doc/docs/unicode.rst
vendored
Normal file
@@ -0,0 +1,58 @@
|
||||
=====================
|
||||
Unicode and Encodings
|
||||
=====================
|
||||
|
||||
Since Pygments 0.6, all lexers use unicode strings internally. Because of that
|
||||
you might encounter the occasional :exc:`UnicodeDecodeError` if you pass strings
|
||||
with the wrong encoding.
|
||||
|
||||
By default, all lexers have their input encoding set to `guess`. This means
|
||||
that the following encodings are tried:
|
||||
|
||||
* UTF-8 (including BOM handling)
|
||||
* The locale encoding (i.e. the result of `locale.getpreferredencoding()`)
|
||||
* As a last resort, `latin1`
|
||||
|
||||
If you pass a lexer a byte string object (not unicode), it tries to decode the
|
||||
data using this encoding.
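The fallback order listed above can be sketched in plain Python. This is only
an illustration of the strategy, not the code Pygments itself runs; the
function name ``guess_decode_sketch`` and the sample bytes are assumptions
made for this example.

```python
import locale


def guess_decode_sketch(data):
    """Decode bytes by trying UTF-8, then the locale encoding, then latin1.

    Returns a (text, encoding_name) tuple.
    """
    try:
        # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
        return data.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        preferred = locale.getpreferredencoding()
        return data.decode(preferred), preferred
    except (UnicodeDecodeError, LookupError):
        pass
    # latin1 maps every byte to a code point, so this last step cannot fail.
    return data.decode("latin1"), "latin1"


# UTF-8 input with a BOM: the first attempt succeeds and the BOM is removed.
text, used = guess_decode_sketch(b"\xef\xbb\xbfcaf\xc3\xa9")
print(repr(text), used)
```

Because the last step always succeeds, guessing never raises, but it can
silently produce the wrong characters; passing an explicit encoding avoids
that.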
|
||||
|
||||
You can override the encoding using the `encoding` or `inencoding` lexer
|
||||
options. If you have the `chardet`_ library installed and set the encoding to
|
||||
``chardet``, it will analyse the text and use the encoding it thinks is the
|
||||
right one automatically:
|
||||
|
||||
.. sourcecode:: python
|
||||
|
||||
from pygments.lexers import PythonLexer
|
||||
lexer = PythonLexer(encoding='chardet')
|
||||
|
||||
The best way is to pass Pygments unicode objects. In that case you can't get
|
||||
unexpected output.
|
||||
|
||||
The formatters now send Unicode objects to the stream if you don't set the
|
||||
output encoding. You can do so by passing the formatters an `encoding` option:
|
||||
|
||||
.. sourcecode:: python
|
||||
|
||||
from pygments.formatters import HtmlFormatter
|
||||
f = HtmlFormatter(encoding='utf-8')
|
||||
|
||||
**You will have to set this option if you have non-ASCII characters in the
|
||||
source and the output stream does not accept Unicode written to it!**
|
||||
This is the case for all regular files and for terminals.
|
||||
|
||||
Note: The Terminal formatter tries to be smart: if its output stream has an
|
||||
`encoding` attribute, and you haven't set the option, it will encode any
|
||||
Unicode string with this encoding before writing it. This is the case for
|
||||
`sys.stdout`, for example. The other formatters don't have that behavior.
|
||||
|
||||
Another note: If you call Pygments via the command line (`pygmentize`),
|
||||
encoding is handled differently, see :doc:`the command line docs <cmdline>`.
|
||||
|
||||
.. versionadded:: 0.7
|
||||
The formatters now also accept an `outencoding` option which will override
|
||||
the `encoding` option if given. This makes it possible to use a single
|
||||
options dict with lexers and formatters, and still have different input and
|
||||
output encodings.
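A minimal sketch of the shared options dict described above, assuming latin1
input and UTF-8 output; the sample source and the variable names are made up
for illustration.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import PythonLexer

# One dict for both objects: the lexer reads ``encoding`` as its input
# encoding, while the formatter lets ``outencoding`` override ``encoding``.
options = {"encoding": "latin1", "outencoding": "utf-8"}

lexer = PythonLexer(**options)
formatter = HtmlFormatter(**options)

# Illustrative latin1-encoded source containing a non-ASCII character.
source = b"s = 'caf\xe9'\n"

# With an output encoding set, the result is a byte string.
html = highlight(source, lexer, formatter)
print(type(html))
```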
|
||||
|
||||
.. _chardet: http://chardet.feedparser.org/
|
||||