API Reference

WikiText

class wikitextparser.WikiText(string: Union[MutableSequence[str], str], _type_to_spans: Dict[str, List[List[int]]] = None)[source]

Bases: object

__call__(start: int, stop: Optional[int] = False, step: int = None) → str[source]

Return self.string[start] or self.string[start:stop].

Return self.string[start] if stop is False. Otherwise return self.string[start:stop:step].
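
Using False as the sentinel leaves None available as a valid stop value, as in ordinary slicing. The following standalone sketch mirrors the rule described above on a plain str. The function name call_sketch is hypothetical and the code does not call wikitextparser itself.

```python
def call_sketch(string, start, stop=False, step=None):
    """Mimic the documented __call__ rule on a plain str (illustrative only)."""
    if stop is False:
        # No stop given: return the single character at index `start`.
        return string[start]
    # stop may be None here, which means "slice to the end", as in s[start:].
    return string[start:stop:step]


single = call_sketch('abcdef', 1)         # one character
to_end = call_sketch('abcdef', 1, None)   # slice to the end
stepped = call_sketch('abcdef', 0, 4, 2)  # slice with a step
```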

__contains__(value: Union[str, WikiText]) → bool[source]

Return True if value is inside self, False otherwise.

If value is a WikiText object, it must also belong to the same parsed wikitext object as self for this method to return True.

__delitem__(key: Union[slice, int]) → None[source]

Remove the specified range or character from self.string.

Note: If an operation involves both insertion and deletion, it’ll be safer to use the insert function first. Otherwise there is a possibility of insertion into the wrong spans.

__init__(string: Union[MutableSequence[str], str], _type_to_spans: Dict[str, List[List[int]]] = None) → None[source]

Initialize the object.

Set the initial values for self._lststr, self._type_to_spans.

Parameters:
  • string – The string to be parsed or a list containing the string of the parent object.
  • _type_to_spans – If the lststr is already parsed, pass its _type_to_spans property as _type_to_spans to avoid parsing it again.

__repr__() → str[source]

Return the string representation of self.

__setitem__(key: Union[slice, int], value: str) → None[source]

Set a new string for the given slice or character index.

Use this method instead of calling insert and del consecutively. By doing so only one of the _insert_update and _shrink_update functions will be called and the performance will improve.

__str__() → str[source]

Return self as a string.

static ancestors(type_: Optional[str] = None) → list[source]

Return [] (the root node has no ancestors).

comments

Return a list of comment objects.

external_links

Return a list of found external link objects.

Note:

Templates adjacent to external links are considered part of the link. In reality, this depends on the contents of the template:

>>> WikiText(
...    'http://example.com{{dead link}}'
...).external_links[0].url
'http://example.com{{dead link}}'
>>> WikiText(
...    '[http://example.com{{space template}} text]'
...).external_links[0].url
'http://example.com{{space template}}'

get_bolds(recursive=True) → List[wikitextparser._comment_bold_italic.Bold][source]

Return bold parts of self.

Parameters: recursive – if True also look inside templates, parser functions, extension tags, etc.

get_bolds_and_italics(*, recursive=True, filter_cls: type = None) → List[Union[wikitextparser._comment_bold_italic.Bold, wikitextparser._comment_bold_italic.Italic]][source]

Return a list of bold and italic objects in self.

This is faster than calling get_bolds and get_italics individually.

Parameters:
  • recursive – if True also look inside templates, parser functions, extension tags, etc.
  • filter_cls – only return this type. Should be wikitextparser.Bold or wikitextparser.Italic. The default is None and means both bolds and italics.

get_italics(recursive=True) → List[wikitextparser._comment_bold_italic.Italic][source]

Return italic parts of self.

Parameters: recursive – if True also look inside templates, parser functions, extension tags, etc.

get_lists(pattern: Union[str, Tuple[str]] = ('\\#', '\\*', '[:;]')) → List[wikitextparser._wikilist.WikiList][source]

Return a list of WikiList objects.

Parameters: pattern – The starting pattern for list items. If pattern is not None, it will be passed to the regex engine, so remember to escape the * character. Examples:

  • '\#' means top-level ordered lists
  • '\#\*' means unordered lists inside an ordered one
  • Currently definition lists are not well supported, but you can use '[:;]' as their pattern.

Tips and tricks:

Be careful when using the following patterns, as they will probably cause the sublists method of the resulting list to malfunction. (However, don’t worry about them if you are not going to use the sublists or List.get_lists method.)
  • Use '\*+' as a pattern and nested unordered lists will be treated as flat.
  • Use '\*\s*' as a pattern to rstrip the items of the list.
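
Because the pattern is handed to the regex engine, the escaping advice above can be checked with Python's re module alone. The sketch below is only an illustration of how such patterns behave as regular expressions. It does not reproduce how the library applies them to wikitext, and the helper name compiles is hypothetical.

```python
import re


def compiles(pattern):
    """Return True if `pattern` is a valid regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


bare_star_ok = compiles('*')        # a bare * has nothing to repeat
escaped_star_ok = compiles(r'\*')   # an escaped * matches a literal asterisk

# r'\#\*' matches the literal prefix of an unordered item nested in an ordered one.
nested_prefix = re.match(r'\#\*', '#* nested item').group()

# r'\*+' swallows any run of asterisks, so nesting depth is no longer distinguished.
flat_prefix = re.match(r'\*+', '*** deep item').group()

# r'\*\s*' also consumes the whitespace that follows the marker.
marker_and_space = re.match(r'\*\s*', '*   padded item').group()
```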
get_sections(include_subsections=True, level=None) → List[wikitextparser._section.Section][source]

Return a list of sections in the current wikitext.

The first section will always be the lead section, even if it is an empty string.

Parameters:
  • include_subsections – Only return the leading part of each section if False.
  • level – Only return sections where section.level == level. Return all levels if None (default).

get_tables(recursive=False) → List[wikitextparser._table.Table][source]

Return tables. Include nested tables if recursive is True.

get_tags(name=None) → List[wikitextparser._tag.Tag][source]

Return all tags with the given name.

insert(index: int, string: str) → None[source]

Insert the given string before the specified index.

This method has the same effect as self[index:index] = string; it only avoids some condition checks as it rules out the possibility of the key being a slice, or the need to shrink any of the sub-spans.

lists(pattern: str = None) → List[wikitextparser._wikilist.WikiList][source]

Deprecated, use self.get_lists instead.

parameters

Return a list of parameter objects.

static parent(type_: Optional[str] = None) → Optional[wikitextparser._wikitext.WikiText][source]

Return None (The parent of the root node is None).

parser_functions

Return a list of parser function objects.

pformat(indent: str = ' ', remove_comments=False) → str[source]

Return a pretty-print of self.string as string.

Try to organize templates and parser functions by indenting, aligning at the equal signs, and adding space where appropriate.

Note that this function will not mutate self.

plain_text(*, replace_templates=True, replace_parser_functions=True, replace_parameters=True, replace_tags=True, replace_external_links=True, replace_wikilinks=True, unescape_html_entities=True, replace_bolds_and_italics=True, _is_root_node=False) → str[source]

Return a plain text string representation of self.

Comments are always removed.

Parameters:
  • replace_templates – Replace {{template|argument}} with an empty string.
  • replace_parser_functions – Replace {{#if:a|y|n}} with an empty string.
  • replace_parameters – Replace {{{a}}} with an empty string and {{{a|b}}} with b.
  • replace_tags – Replace <s>text</s> with text.
  • replace_external_links – Replace [https://wikimedia.org/ wm] with wm, and [https://wikimedia.org/] with an empty string.
  • replace_wikilinks – Replace wikilinks with their text representation, e.g. [[a|b]] with b and [[a]] with a.
  • unescape_html_entities – Replace HTML entities like &Sigma;, &#931;, and &#x3a3; with Σ.
  • replace_bolds_and_italics – Replace '''b''' with b and ''i'' with i.
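
A few of the replacements listed above can be approximated with the standard library alone. The sketch below is a rough regex-based illustration, not the library's implementation: the name rough_plain_text is hypothetical, nested markup is not handled, and templates, parser functions, parameters, tags and external links are left out.

```python
import html
import re


def rough_plain_text(text):
    """Approximate a few of the documented replacements (illustrative only)."""
    # Comments are always removed.
    text = re.sub(r'<!--.*?-->', '', text, flags=re.S)
    # [[a|b]] becomes b, and [[a]] becomes a.
    text = re.sub(r'\[\[(?:[^\]|]*\|)?([^\]]*)\]\]', r'\1', text)
    # '''b''' becomes b first, then ''i'' becomes i.
    text = re.sub(r"'''(.*?)'''", r'\1', text)
    text = re.sub(r"''(.*?)''", r'\1', text)
    # Named, decimal and hexadecimal HTML entities are unescaped.
    return html.unescape(text)


result = rough_plain_text(
    "[[a|b]] '''bold''' ''it''<!-- c --> &Sigma; &#931; &#x3a3; [[a]]"
)
```

Handling bold before italic matters in this sketch, because the two-apostrophe pattern would otherwise consume part of a three-apostrophe marker.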
sections

Return self.get_sections(include_subsections=True).

span

Return the span of self relative to the start of the root node.

string

Return str(self). Support get, set, and delete operations.

setter and deleter: Note that this will overwrite the current string, emptying any object that points to the old string.

tables

Return a list of all tables.

tags(name=None) → List[wikitextparser._tag.Tag][source]

Deprecated, use self.get_tags instead.

templates

Return a list of templates as template objects.

wikilinks

Return a list of wikilink objects.

SubWikiText

class wikitextparser._wikitext.SubWikiText(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None)[source]

Bases: wikitextparser._wikitext.WikiText

Define a class to be inherited by some subclasses of WikiText.

Allow focusing on a particular part of WikiText.

__init__(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None) → None[source]

Initialize the object.

ancestors(type_: Optional[str] = None) → List[wikitextparser._wikitext.WikiText][source]

Return the ancestors of the current node.

Parameters: type_ – the type of the desired ancestors as a string. Currently the following types are supported: {Template, ParserFunction, WikiLink, Comment, Parameter, ExtensionTag}. The default is None and means all the ancestors of any type above.

parent(type_: Optional[str] = None) → Optional[wikitextparser._wikitext.WikiText][source]

Return the parent node of the current object.

Parameters: type_ – the type of the desired parent object. Currently the following types are supported: {Template, ParserFunction, WikiLink, Comment, Parameter, ExtensionTag}. The default is None and means the first parent, of any type above.
Returns: the parent WikiText object, or None if no parent with the desired type_ is found.

SubWikiTextWithAttrs

class wikitextparser._tag.SubWikiTextWithAttrs(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None)[source]

Bases: wikitextparser._wikitext.SubWikiText

Define a class for SubWikiText objects that have attributes.

Any class that inherits from SubWikiTextWithAttrs should provide an _attrs_match property. Note that matching should be done on the shadow. It’s usually a good idea to cache the _attrs_match property.

attrs

Return self attributes as a dictionary.

del_attr(attr_name: str) → None[source]

Delete all the attributes with the given name.

Pass if the attr_name is not found in self.

get_attr(attr_name: str) → Optional[str][source]

Return the value of the last attribute with the given name.

Return None if the attr_name does not exist in self. If there are already multiple attributes with the given name, only return the value of the last one. Return an empty string if the mentioned name is an empty attribute.

has_attr(attr_name: str) → bool[source]

Return True if self contains an attribute with the given name.

set_attr(attr_name: str, attr_value: str) → None[source]

Set the value for the given attribute name.

If there are already multiple attributes with the given name, only set the value for the last one. If attr_value == '', use the implicit empty attribute syntax.

SubWikiTextWithArgs

class wikitextparser._parser_function.SubWikiTextWithArgs(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None)[source]

Bases: wikitextparser._wikitext.SubWikiText

Define common attributes for Template and ParserFunction.

arguments

Parse template content. Create self.name and self.arguments.

get_lists(pattern: Union[str, Iterable[str]] = ('\\#', '\\*', '[:;]')) → List[wikitextparser._wikilist.WikiList][source]

Return the lists in all arguments.

For performance reasons it is usually preferred to get a specific Argument and use the get_lists method of that argument instead.

name

Template’s name (includes whitespace).

getter: Return the name. setter: Set a new name.

nesting_level

Return the nesting level of self.

The minimum nesting_level is 0. Being part of any Template or ParserFunction increases the level by one.

Template

class wikitextparser.Template(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None)[source]

Bases: wikitextparser._parser_function.SubWikiTextWithArgs

Convert strings to Template objects.

The string should start with {{ and end with }}.

del_arg(name: str) → None[source]

Delete all arguments with the given name.

get_arg(name: str) → Optional[wikitextparser._argument.Argument][source]

Return the last argument with the given name.

Return None if no argument with that name is found.

has_arg(name: str, value: str = None) → bool[source]

Return True if there is an argument named name.

Also check equality of values if value is provided.

Note: If you just need to get an argument and you want to LBYL, it’s better to call get_arg directly and then check if the returned value is None.

normal_name(rm_namespaces=('Template', ), *, code: str = None, capitalize=False) → str[source]

Return normal form of self.name.

  • Remove comments.
  • Remove language code.
  • Remove namespace (“template:” or any of the localized namespaces in rm_namespaces).
  • Use space instead of underscore.
  • Remove consecutive spaces.
  • Use uppercase for the first letter if capitalize.
  • Remove #anchor.
Parameters:
  • rm_namespaces – is used to provide additional localized namespaces for the template namespace. They will be removed from the result. Default is (‘Template’,).
  • capitalize – If True, convert the first letter of the template’s name to a capital letter. See [[mw:Manual:$wgCapitalLinks]] for more info.
  • code – is the language code.
Example:
>>> Template(
...     '{{ eN : tEmPlAtE : <!-- c --> t_1 # b | a }}'
... ).normal_name(code='en')
'T 1'
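
The normalization steps listed above can be sketched on a plain string. This is a standalone illustration and not the library's code: the function name normal_name_sketch is hypothetical, the order in which the steps are applied is an assumption, and the input is the raw name portion of the template rather than a Template object. The sketch is called with capitalize=True.

```python
import re


def normal_name_sketch(name, rm_namespaces=('Template',), code=None,
                       capitalize=False):
    """Apply the documented normalization steps to a raw template name."""
    # Remove comments.
    name = re.sub(r'<!--.*?-->', '', name, flags=re.S)
    # Remove the #anchor.
    name = name.partition('#')[0]
    # Use spaces instead of underscores, then collapse consecutive spaces.
    name = ' '.join(name.replace('_', ' ').split())
    # Remove the language code prefix, compared case-insensitively.
    if code:
        head, sep, tail = name.partition(':')
        if sep and head.strip().lower() == code.lower():
            name = tail.strip()
    # Remove the namespace prefix, compared case-insensitively.
    head, sep, tail = name.partition(':')
    if sep and head.strip().lower() in {ns.lower() for ns in rm_namespaces}:
        name = tail.strip()
    # Optionally use uppercase for the first letter.
    if capitalize and name:
        name = name[0].upper() + name[1:]
    return name


raw_name = ' eN : tEmPlAtE : <!-- c --> t_1 # b '
normalized = normal_name_sketch(raw_name, code='en', capitalize=True)
```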
rm_dup_args_safe(tag: str = None) → None[source]

Remove duplicate arguments in a safe manner.

Remove the duplicate arguments only in the following situations:
  1. Both arguments have the same name AND value. (Remove one of them.)
  2. Arguments have the same name and one of them is empty. (Remove the empty one.)

Warning: Although this is considered safe and no meaningful data is removed from the wikitext, the rendered result may actually change if the second argument is empty and is removed while the first one had a value.

If tag is defined, it should be a string that will be appended to the value of the remaining duplicate arguments.

Also see rm_first_of_dup_args function.
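
The two removal rules can be illustrated on a plain list of (name, value) pairs. This is a minimal sketch of the rules as stated, not the library's implementation: the function name is hypothetical, whitespace handling is ignored, and the tag parameter is not modelled.

```python
def rm_dup_args_safe_sketch(args):
    """Apply the two 'safe' rules to (name, value) pairs given in source order."""
    kept = []
    for name, value in args:
        drop_new = False
        for i, (kept_name, kept_value) in enumerate(kept):
            if kept_name != name:
                continue
            if kept_value == value or value == '':
                # Rule 1 (identical pair) or rule 2 (the new one is empty).
                drop_new = True
                break
            if kept_value == '':
                # Rule 2: the earlier one is empty, so drop it instead.
                del kept[i]
                break
        if not drop_new:
            kept.append((name, value))
    return kept


args = [('a', '1'), ('a', '1'), ('b', ''), ('b', '2'), ('c', 'x'), ('c', 'y')]
cleaned = rm_dup_args_safe_sketch(args)
```

The duplicates of c differ and neither is empty, so both survive. That case is where the tag parameter would apply.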

rm_first_of_dup_args() → None[source]

Eliminate duplicate arguments by removing the first occurrences.

Remove the first occurrences of duplicate arguments, regardless of their value. Result of the rendered wikitext should remain the same. Warning: Some meaningful data may be removed from wikitext.

Also see rm_dup_args_safe function.
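
On a plain list of (name, value) pairs, the behaviour described above amounts to keeping only the last occurrence of each name. The sketch below assumes that reading. The function name is hypothetical and the code does not use wikitextparser.

```python
def rm_first_of_dup_args_sketch(args):
    """Keep only the last (name, value) pair for each name, preserving order."""
    # Map every name to the index of its last occurrence.
    last_index = {name: i for i, (name, _) in enumerate(args)}
    return [pair for i, pair in enumerate(args) if last_index[pair[0]] == i]


args = [('a', 'old'), ('b', '1'), ('a', 'new')]
deduplicated = rm_first_of_dup_args_sketch(args)
```

Here the value 'old' is lost, which is the kind of data removal the warning refers to.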

set_arg(name: str, value: str, positional: bool = None, before: str = None, after: str = None, preserve_spacing: bool = True) → None[source]

Set the value for name argument. Add it if it doesn’t exist.

  • Use positional, before and after keyword arguments only when adding a new argument.
  • If before is given, ignore after.
  • If neither before nor after are given and it’s needed to add a new argument, then append the new argument to the end.
  • If positional is True, try to add the given value as a positional argument. Ignore preserve_spacing if positional is True. If it’s None, do what seems more appropriate.

templates

Return a list of templates as template objects.

ParserFunction

class wikitextparser.ParserFunction(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None)[source]

Bases: wikitextparser._parser_function.SubWikiTextWithArgs

parser_functions

Return a list of parser function objects.

Argument

class wikitextparser.Argument(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None, _parent: SubWikiTextWithArgs = None)[source]

Bases: wikitextparser._wikitext.SubWikiText

Create a new Argument Object.

Note that in MediaWiki documentation arguments are (also) called parameters. In this module the convention is: {{{parameter}}}, {{template|argument}}. See https://www.mediawiki.org/wiki/Help:Templates for more information.

__init__(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None, _parent: SubWikiTextWithArgs = None)[source]

Initialize the object.

name

Argument’s name.

getter: return the position as a string, for positional arguments. setter: convert it to keyword argument if positional.

positional

True if self is positional, False if keyword.

setter:
If set to False, convert self to keyword argument. Raise ValueError on trying to convert positional to keyword argument.

value

Value of self.

Support both keyword and positional arguments.

getter: Return value of self. setter: Assign a new value to self.

Parameter

class wikitextparser.Parameter(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None)[source]

Bases: wikitextparser._wikitext.SubWikiText

append_default(new_default_name: str) → None[source]

Append a new default parameter in the appropriate place.

Add the new default to the innermost parameter. If the parameter already exists among defaults, don’t change anything.

Example:
>>> p = Parameter('{{{p1|{{{p2|}}}}}}')
>>> p.append_default('p3')
>>> p
Parameter("'{{{p1|{{{p2|{{{p3|}}}}}}}}}'")
default

The default value of current parameter.

getter: Return None if there is no default. setter: Set a new default value. deleter: Delete the default value, including the pipe character.

name

Current parameter’s name.

getter: Return current parameter’s name. setter: set a new name for the current parameter.

parameters

Return a list of parameter objects.

pipe

Return | if there is a pipe (default value) in the Parameter.

Return '' otherwise.

Section

class wikitextparser.Section(*args, **kwargs)[source]

Bases: wikitextparser._wikitext.SubWikiText

__init__(*args, **kwargs)[source]

Initialize the object.

contents

Contents of this section.

getter: Return the contents. setter: Set contents to a new string value.

level

The level of this section.

getter: Return the level, which is an int in range(1, 7), or 0 for the lead section.

setter: Change the level.

title

The title of this section.

getter: Return the title, or None for lead sections or sections that don’t have any title.

setter: Set a new title.

deleter: Remove the title, including the equal signs and the newline after it.

Comment

class wikitextparser.Comment(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None)[source]

Bases: wikitextparser._wikitext.SubWikiText

comments

Return a list of comment objects.

contents

Return contents of this comment.

Table

class wikitextparser.Table(*args, **kwargs)[source]

Bases: wikitextparser._tag.SubWikiTextWithAttrs

__init__(*args, **kwargs)[source]

Initialize the object.

caption

Caption of the table. Support get and set.

caption_attrs

Caption attributes. Support get and set operations.

cells(row: int = None, column: int = None, span: bool = True) → Union[List[List[wikitextparser._cell.Cell]], List[wikitextparser._cell.Cell], wikitextparser._cell.Cell][source]

Return a list of lists containing Cell objects.

Parameters:
  • span – If True, rearrange the result according to colspan and rowspan attributes.
  • row – Return the specified row only. Zero-based index.
  • column – Return the specified column only. Zero-based index.

If both row and column are provided, return the relevant cell object.

If you only need the values inside cells, use the data method instead.

data(span: bool = True, strip: bool = True, row: int = None, column: int = None) → Union[List[List[str]], List[str], str][source]

Return a list containing lists of row values.

Parameters:
  • span – If True, calculate rows according to rowspan and colspan attributes. Otherwise ignore them.
  • row – Return the specified row only. Zero-based index.
  • column – Return the specified column only. Zero-based index.
  • strip – strip data values
Note: Because of the many complications it may cause, this function won’t look inside templates, parser functions, etc. See https://www.mediawiki.org/wiki/Extension:Pipe_Escape for how wiki-tables can be inserted within templates.
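
To make the effect of span concrete, the sketch below expands rowspan and colspan values into a rectangular grid. It shows one common way of doing such an expansion and is not taken from the library: the function name expand_spans and the (value, rowspan, colspan) tuple layout are assumptions made for this example.

```python
def expand_spans(rows):
    """Expand (value, rowspan, colspan) cells into a rectangular grid.

    Positions not covered by any cell are filled with None.
    """
    grid = {}  # maps (row, column) to the value occupying that position
    for r, row in enumerate(rows):
        c = 0
        for value, rowspan, colspan in row:
            # Skip positions already claimed by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            # Repeat the value over every position the cell spans.
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = value
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


table = [
    [('a', 2, 1), ('b', 1, 2)],  # 'a' spans two rows, 'b' spans two columns
    [('c', 1, 1), ('d', 1, 1)],
]
expanded = expand_spans(table)
```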
nesting_level

Return the nesting level of self.

The minimum nesting_level is 0. Being part of any Table increases the level by one.

Tag

class wikitextparser.Tag(*args, **kwargs)[source]

Bases: wikitextparser._tag.SubWikiTextWithAttrs

__init__(*args, **kwargs)[source]

Initialize the object.

contents

Tag contents. Support both get and set operations.

setter:
Set contents to a new value. Note that if the tag is self-closing, then it will be expanded to have a start tag and an end tag. For example:

>>> t = Tag('<t/>')
>>> t.contents = 'n'
>>> t.string
'<t>n</t>'

get_tags(name=None) → List[wikitextparser._tag.Tag][source]

Return all tags with the given name.

name

Tag’s name. Support both get and set operations.

parsed_contents

Return the contents as a SubWikiText object.

WikiList

class wikitextparser.WikiList(string: Union[str, MutableSequence[str]], pattern: str, _match: Match[AnyStr] = None, _type_to_spans: Dict[str, List[List[int]]] = None, _span: List[int] = None, _type: str = None)[source]

Bases: wikitextparser._wikitext.SubWikiText

Class to represent ordered, unordered, and definition lists.

__init__(string: Union[str, MutableSequence[str]], pattern: str, _match: Match[AnyStr] = None, _type_to_spans: Dict[str, List[List[int]]] = None, _span: List[int] = None, _type: str = None) → None[source]

Initialize the object.

convert(newstart: str) → None[source]

Convert to another list type by replacing starting pattern.

fullitems

Return list of item strings. Includes their start and sub-items.

get_lists(pattern: Union[str, Iterable[str]] = ('\\#', '\\*', '[:;]')) → List[wikitextparser._wikilist.WikiList][source]

Return a list of WikiList objects.

Parameters: pattern – The starting pattern for list items. If pattern is not None, it will be passed to the regex engine, so remember to escape the * character. Examples:

  • '\#' means top-level ordered lists
  • '\#\*' means unordered lists inside an ordered one
  • Currently definition lists are not well supported, but you can use '[:;]' as their pattern.

Tips and tricks:

Be careful when using the following patterns, as they will probably cause the sublists method of the resulting list to malfunction. (However, don’t worry about them if you are not going to use the sublists or List.get_lists method.)
  • Use '\*+' as a pattern and nested unordered lists will be treated as flat.
  • Use '\*\s*' as a pattern to rstrip the items of the list.

items

Return items as a list of strings.

Don’t include sub-items and the start pattern.

level

Return level of nesting for the current list.

Level is a one-based index, for example the level for * a will be 1.

sublists(i: int = None, pattern: Union[str, Iterable[str]] = ('\\#', '\\*', '[:;]')) → List[wikitextparser._wikilist.WikiList][source]

Return the Lists inside the item with the given index.

Parameters:
  • i – The index of the item whose sub-lists are desired.
  • pattern – The starting symbol for the desired sub-lists. The pattern of the current list will be automatically added as prefix.

BoldItalic

class wikitextparser._comment_bold_italic.BoldItalic(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None)[source]

Bases: wikitextparser._wikitext.SubWikiText

text

Return text value of self (without triple quotes).

Bold

class wikitextparser.Bold(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None)[source]

Bases: wikitextparser._comment_bold_italic.BoldItalic

Italic

class wikitextparser.Italic(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None, end_token: bool = True)[source]

Bases: wikitextparser._comment_bold_italic.BoldItalic

__init__(string: Union[str, MutableSequence[str]], _type_to_spans: Optional[Dict[str, List[List[int]]]] = None, _span: Optional[List[int]] = None, _type: Union[str, int, None] = None, end_token: bool = True)[source]

Initialize the Italic object.

Parameters: end_token – set to True if the italic object ends with a '' token, False otherwise.