Changelog
v0.56.4 (2025-05-14)
Fixed a bug in detection of images in plain_text method. (#141)
Improved type hints.
v0.56.3 (2024-10-18)
Fixed a bug in detecting HTML tags nested in wiki markup. (#140)
Improved type hints.
v0.56.2
Fixed a bug in external_links property where | was recognized as part of the link by mistake. (#139)
v0.56.1
Fixed a bug in get_sections when top_levels_only was True.
v0.56.0
Drop Python 3.7 support.
v0.55.14
Fixed a bug in detecting the text of an external link. (#137)
v0.55.13
Fixed a bug in Section.level resulting in malformed section titles when multiple levels are added. (#135)
v0.55.12
Performance improvements in extracting bold and italic nodes. (#133)
v0.55.11
Performance improvements in __setitem__/__delitem__ and pformat/plain_text methods. (#131)
v0.55.10
Fixed a bug in plain_text causing IndexError when using a custom function to replace templates/parser_functions.
v0.55.9
Fixed a bug in plain_text not detecting images with multiple dots correctly. (#129)
v0.55.8
Fixed: Equal signs in extension tag attributes are no longer confused with name-value separator in arguments. (#128)
v0.55.7
Fixed a bug in plain_text. (#126)
Fixed another bug in parsing tables that end without a |} mark. (#125)
v0.55.6
Fixed a bug in parsing tables that end without a |} mark. (#124)
v0.55.5
Fixed: regression in plain_text not being able to handle wikilinks only containing fragment/anchor, not title.
v0.55.4
plain_text method now uses a more accurate image-detection algorithm.
v0.55.3
Fixed and improved handling of tables and images in plain_text. (#122)
v0.55.0
Added: top_levels_only argument to get_sections.
Deprecated: Calling get_sections with positional arguments is now deprecated.
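The deprecation above follows a common Python pattern: keep accepting positional arguments for a while, but emit a DeprecationWarning so callers can migrate to keywords. The sketch below illustrates that pattern with a hypothetical stand-in function, get_sections_sketch; it is not wikitextparser's implementation, and its parameter list is an assumption chosen for illustration.

```python
import warnings


def get_sections_sketch(*args, include_subsections=True, level=None):
    """Hypothetical stand-in; not wikitextparser's implementation."""
    if args:
        # stacklevel=2 attributes the warning to the caller's line,
        # not to this function body.
        warnings.warn(
            "calling get_sections_sketch with positional arguments is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        names = ("include_subsections", "level")
        if len(args) > len(names):
            raise TypeError("too many positional arguments")
        # Map legacy positional values onto the keyword parameters, in order.
        legacy = dict(zip(names, args))
        include_subsections = legacy.get("include_subsections", include_subsections)
        level = legacy.get("level", level)
    return {"include_subsections": include_subsections, "level": level}


# Preferred, warning-free call style: keywords only.
options = get_sections_sketch(include_subsections=False, level=2)
```

Code that already passes keyword arguments is unaffected by this kind of change; only positional call sites see the warning.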
v0.54.1
Fixed some bugs in plain_text method. (#119, #120)
Fixed bug in get_tags. (#121)
v0.54.0
Fixed a bug in WikiText.external_links not detecting external links inserted via overwriting a template string. (#74)
The following already deprecated functions/parameters are removed:
Setting Parameter.default to None is not possible anymore. Use del Parameter.default instead.
The default value for preserve_spacing parameter of Template.set_arg is now False. (It was deprecated to call this method without providing a value for preserve_spacing.)
The pattern parameter of WikiList.sublists, WikiList.get_lists and WikiText.get_lists cannot be None anymore. Use the default value instead.
WikiText.lists and WikiText.tags are removed. Use get_lists or get_tags instead.
v0.53.0
Fixed a bug in plain_text()/remove_markup, not being able to handle tables with row/colspan. (#116)
plain_text() will now include table captions.
v0.52.1
Fixed a syntax error for Python < 3.10.
v0.52.0
BREAKING CHANGE: dropping Python 3.6 support.
Fixed error in getting plain_text() of emptied-out wikitext. (#113)
Deprecated: Calling Template.set_arg() without specifying a value for preserve_spacing parameter is deprecated. This is a temporary warning in preparation for changing the default value of this parameter from True to False. (#111)
Fixed the stacklevel of warnings.
New feature: plain_text() replaces wiki-tables with a TSV string. (#115)
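For readers unfamiliar with the format, TSV (tab-separated values) puts one table row per line and separates cells with tab characters. The following sketch shows the shape of such a string using a hypothetical helper, rows_to_tsv, and made-up cell values; the exact string produced by plain_text() (surrounding whitespace, captions, spanned cells) may differ.

```python
def rows_to_tsv(rows):
    """Join cells with tabs and rows with newlines (hypothetical helper)."""
    return "\n".join("\t".join(cells) for cells in rows)


# Hypothetical cell values, as they might be read from a two-column wiki table.
rows = [["Name", "Year"], ["Alpha", "2019"], ["Beta", "2021"]]
tsv = rows_to_tsv(rows)
```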
v0.51.2
Fixed a bug in detecting reverse pipe tricks as wikilinks.
v0.51.1
Fixed a bug in WikiText.external_links causing external links within extension tags (e.g. ref tag) not to be detected when tag is inside a template/parser function/parameter. (#110)
v0.51.0
WikiText.get_lists now correctly detects lists with a missing level. (#70)
WikiList.sublists are now returned in sorted order.
v0.50.2
Fixed a bug in WikiText.pformat which used to cause IndexError on a parser function which had no argument, e.g. for {{FULLPAGENAMEE}}.
v0.50.0
Feat: Table objects now have row_attrs property.
Fixed: Infinite loop on parsing tables containing \r. (This is just to prevent an infinite loop; CRLF line endings are not supported.)
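Since CRLF line endings are not supported, one option for callers is to normalize line endings before handing text to the parser. This is a minimal sketch using only string methods; normalize_newlines is a hypothetical helper, not part of wikitextparser.

```python
def normalize_newlines(text):
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so that a pair does not become two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# A small wiki table saved with Windows line endings.
raw = "{|\r\n| a\r\n|}\r\n"
clean = normalize_newlines(raw)
```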
v0.49.4
Fixed: Handle empty tables instead of raising IndexError. (#107)
v0.49.3
Fixed an issue in handling of / in tags. (#108)
Fixed a false-positive detection of invalid external links. (#109)
v0.49.2
Fixed an issue in Template.normal_name() causing IndexError on empty/invalid template names, e.g. {{Template:}}. (#105)
v0.49.1
Fixed a bug in plain_text/remove_markup causing duplicate values when replacing nested templates.
v0.49.0
Feature: replace_templates and replace_parser_functions parameters of plain_text/remove_markup now accept a function mapping Template or ParserFunction objects to the desired replacement string. (#103)
v0.48.3
Fixed a bug in Tag.parsed_contents method. (#102)
v0.48.2
Fixed a bug in plain_text method. (#101)
v0.48.1
Fixed a bug in pformat and plain_text methods. (#100)
v0.48.0
BREAKING: dropped support for Python 3.5.
Fixed: bug in handling of external links with uppercase scheme. (#99)
v0.47.9
Fixed missing table rows after comments. (#98)
v0.47.8
Fixed: Template titles cannot include wikilinks.
Fixed: Detection of tags within WikiLinks. (#96)
v0.47.7
Fixed a bug in Template.set_arg causing duplicate values. (#97)
v0.47.6
Fixed problem in detecting extension tags with uppercase letters in their names. (#95)
v0.47.5
Fixed regex requirement for Python 3.5 on Windows platform.
v0.47.4
Fixed handling of external links within definition lists. (#91)
v0.47.3
Fixed a bug in plain_text method, not handling self-closing tags correctly.
v0.47.2
Fixed a bug that was causing the parser to hang when parsing complicated nested tags.
v0.47.1
Fixed the order of items in WikiList.fullitems. (#72)
Fixed and improved a few edge cases in Table.caption. (pr #81)
Fixed handling of external links within definition lists. (pr #83)
Fixed a bug in parsing extension tags. (#90)
v0.47.0
MW variables are now recognized as parser functions, not templates. (#69)
Fixed a bug in mutation of root element when a child was mutated. (#66)
Fixed a bug that was causing templates like {{NAMESPACE|2}} to be detected as a parser function. It is a template if the first argument starts with a :.
Fixed bugs in detecting attributes of table cells. (#71, #73)
Fixed a bug in detecting header cells in tables. (#77)
Fixed a bug in get_tags where extension tags without attributes were not returned. (#84)
Fixed a bug in get_tables method where tables within tag extensions were not recognized. (#85)
v0.46.0
Fixed a bug in detecting parser functions without parameters. {{NAMESPACE}} used to be detected as a template, but {{NAMESPACE:MediaWiki}} as a parser function. Now both are detected as parser functions.
v0.45.3
Fix a bug in detecting external links within extension tags. (#65)
Fix a few bugs in plain_text/remove_markup. (#65)
v0.45.2
Detect unclosed comments, e.g. <!-- a.
Fix parsing priority of tag extensions and comments. For example, <ref>b<!--c</ref>d--> used to be parsed with <!--c</ref>d--> as a comment, which was incorrect.
v0.45.1
Fixed a catastrophic backtracking issue in parsing nested extension tags. (#60)
Fixed a bug in Bold.text and Italic.text, failing to parse objects containing \n. (#61)
v0.45.0
Fixed a bug in parsing tags containing the < character. (#58)
Updated the list of known extension tags.
Improved detection of nested tag extensions, e.g. a <ref> tag within <references>.
v0.44.1
Fixed a bug in get_bolds_and_italics causing it to return duplicate items in some situations. This was also causing an error in plain_text method. (#57)
v0.44.0
Fixed bug in matching header cells in Table.cells. (#53)
Add Cell.is_header property.
v0.43.2
Fixed a bug in detection of Table.caption and Table.caption_attrs.
v0.43.1
Improve the performance of get_bolds_and_italics(recursive=True, filter_cls=None).
Fix a bug in get_bolds_and_italics(recursive=False, filter_cls=None) which was causing it to return recursive Bold items.
v0.43.0
Remove the deprecated parameters of Template.normal_name().
Fix a bug in get_bolds_and_italics() which was causing it to return only Bold items.
v0.42.3
Fix a bug in handling of comments in template names. (#54)
v0.42.2
Improve the handling of weird colspan and rowspan values in tables. (#53)
v0.42.1
Fix a syntax error in Python 3.5.
v0.42.0
BREAKING CHANGE: Remove replace_bolds/replace_italics params from remove_markup/plain_text methods. Users can use the new replace_bolds_and_italics parameter. Removing only bolds or only italics is no longer possible.
Add get_bolds_and_italics as a new method.
Fixed bugs and rewrote the algorithm for finding Bold and Italic objects. (#51)
v0.41.0
Trying to mutate an overwritten/detached object will now raise DeadIndexError (a subclass of TypeError). Hopefully this will prevent some subtle late-appearing bugs.
v0.38.2
Fix a bug in plain_text method.
v0.38.1
Fix a bug in detection of external links in parsable tag extensions. (#50)
v0.38.0
Fix a bug in handling of half-marked bold/italic, e.g. '''bold\n.
v0.37.13
Fix a bug in handling of half-marked bold/italic items, e.g. '''bold text\n.
v0.37.12
Improve handling of extension tags inside external links. (#49)
Ignore invalid attributes that do not start with space characters. (#48)
v0.37.11
Improved how invalid attributes (in html tags, tables, etc.) are handled. (#47)
v0.37.10
Fixed a bug in handling <pre> tags. (#46)
v0.37.9
Fixed a bug in parsing tag attributes. (#44)
v0.37.8
Fixed handling of tags having different casings in start and end name, e.g. <s></S>.
Fix handling of extension tags.
Fixed a bug in get_bolds/get_italics resulting in duplicate items in returned values. It was also causing a subtle issue in plain_text/remove_markup. (#42)
Fixed detection of parameters containing single braces.
v0.37.7
Fix handling of external links containing wikilinks.
v0.37.6
Fixed a bug in plain_text/remove_markup causing unexpectedly empty objects. (#40)
v0.37.5
Fixed some other bugs in plain_text/remove_markup functions for:
images containing wikitext
tags containing bold/italic items
nested tags
Fixed a bug in extracting sub-tags.
v0.37.4
Fixed a bug in Tag objects causing strange behaviour upon mutating a tag.
Fixed a bug in plain_text/remove_markup functions, causing some objects that were expected to be removed to remain in the result. (#39)
v0.37.3
Fix syntax errors for Python 3.5, 3.6, and 3.7.
v0.37.2
Fix a bug in getting the parser functions of a Template object.
v0.37.1
Fix a catastrophic backtracking issue for wikitexts containing html tags. (#37)
v0.37.0
Add wikitextparser.remove_markup function and WikiText.plain_text method.
Improve detection of parameters and wikilinks.
Add get_bolds and get_italics methods.
WikiLink.wikilinks, WikiList.get_lists(), Template.templates, Tag.get_tags(), ParserFunction.parser_functions, and Parameter.parameters won’t return objects equal to self anymore; only sub-elements will be returned.
Improve handling of comments within wikilinks.
WikiLink.text.setter no longer accepts None values. This was marked as deprecated since v0.25.0.
Drop support for Python 3.4.
Remove the deprecated pprint method. Users should use pformat instead.
Allow a tuple of patterns in get_lists and sublists methods. The default None is now deprecated and a tuple is used instead.
v0.36.0
Add a new parameter, level, for the get_sections method.
v0.35.0
Fixed a rare bug in handling lists and template arguments when there is a newline or a pipe inside a starting or closing tag.
Section.title will return None instead of '' when the section does not have any title.
v0.34.0
Invoking the deleter of Section.title won’t raise a RuntimeError anymore if the section does not have a title already.
v0.33.0
Add a deleter for Section.title property. (#32)
v0.32.0
Fixed a bug in WikiText.get_lists() which was causing it to sometimes return items in an unordered fashion. (#31)
v0.31.0
Rename WikiText.lists() method to WikiText.get_lists() and deprecate the old name.
Add get_sections() method with include_subsections parameter which allows getting a section without including subsections. (#23)
v0.30.0
Fixed a bug in parsing wikilinks containing [.*] (#29)
Fixed: wikilinks are not allowed to be preceded by [ anymore.
Rename WikiText.tags() method to WikiText.get_tags() and deprecate the old name.
v0.29.2
Fix a bug in detecting the end-tag of two consecutive same-name tags. (#27)
v0.29.1
Properly exclude the test package from the source distribution.
v0.29.0
Fix a regression in parsing some corner cases of nested templates. (#26)
The previously deprecated WikiText.__getitem__ now raises NotImplementedError.
WikiText.__call__: Remove the deprecated support for start is None.
Optimize a little and use more robust algorithms.
v0.28.1
Implemented a workaround for a catastrophic backtracking condition when parsing tables. (#22)
v0.28.0
Add get_tables as a new method to WikiText objects. It allows extracting tables in a non-recursive manner.
The nesting_level property was only meaningful for tables, templates, and parser functions; it has been removed from other types.
v0.27.0
Fix a bug in detecting nested tables. (#21)
Fix a few bugs in detecting tables and template arguments.
Changed the comments property of Comment objects to return an empty list.
Changed the external_links property of ExternalLink objects to return an empty list.
v0.26.1
Fix a bug in setting Section.contents which only occurred when the title had trailing whitespace.
Setting Section.level will not overwrite Section.title anymore.
v0.26.0
Define WikiLink.title property. It is similar to WikiLink.target but will not include the #fragment.
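The relationship between a link's target, title, and fragment can be pictured with plain string handling: the target is everything before the pipe, the title is the part of the target before #, and the fragment is the part after it. The sketch below uses a hypothetical helper, split_target, to illustrate that split; it is not how wikitextparser computes these properties and it ignores edge cases the library handles.

```python
def split_target(target):
    """Split a wikilink target into (title, fragment); hypothetical helper."""
    title, separator, fragment = target.partition("#")
    # Without a "#" there is no fragment at all, which differs from an empty one.
    return title, (fragment if separator else None)


with_fragment = split_target("Python (programming language)#History")
without_fragment = split_target("Main Page")
# A link such as [[#Notes]] has a fragment but an empty title.
fragment_only = split_target("#Notes")
```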
v0.25.1
Deprecate using None as the start value of __call__.
v0.25.0
Added fragment property to WikiLink class. (#18)
Added deleter method for WikiLink.text property.
Deprecated: Setting WikiLink.text to None. Use del WikiLink.text instead.
Added deleter method for WikiLink.target property.
Added deleter method for ExternalLink.text property.
Added deleter method for Parameter.default property.
Deprecated: Setting Parameter.default to None. Use del Parameter.default instead.
Defined WikiText.__call__ to get a slice of wikitext as string.
Deprecated WikiText.__getitem__. Use WikiText.__call__ or WikiText.string instead.
v0.24.4
Fixed a bug in Tag.parsed_contents. (#19)
v0.24.3
Fixed a rarely occurring bug in detecting parameters with names consisting only of whitespace or underscores.
v0.24.2
Fixed a bug in detecting parser functions containing parameters.
v0.24.1
Fixed a bug in detecting table header cells that start with +, -, or }. (#17)
v0.24.0
Define deleter method for WikiText.string property and add Template.del_arg method. (#14)
Improve the lists method of Template and ParserFunction classes. (#15)
Fixed a bug in detection of multiline arguments. (#13)
Deprecated capital_links parameter of Template.normal_name. Use capitalize instead (keyword-only argument).
Deprecated passing the code parameter of Template.normal_name as a positional argument. It’s now a keyword-only argument.
v0.23.0
Fixed a bug in Section objects that was causing them to return the properties of the whole page. (#15)
Removed the deprecated attribute access methods. The following deprecated methods, accessible on Table and Tag objects, have been removed: .has, .get, .set. Use .has_attr, .get_attr, .set_attr instead.
Fixed a bug in set_attr method.
Removed the deprecated Table.getdata method. Use Table.data instead.
Removed the deprecated Table.getrdata(row_num) method. Use Table.data(row=row_num) instead.
Removed the deprecated Table.getcdata(col_num) method. Use Table.data(col=col_num) instead.
Removed the deprecated Table.table_attrs property. Use Table.attrs or other attribute-related methods instead.
v0.22.1
Fixed MemoryError caused by very long or unclosed comment tags. (issue #12)
v0.22.0
Change the behaviour of external_links property to never return Templates or parser functions as part of the external link.
Add support for literal IPv6 external links, e.g. https://[2001:db8:85a3:8d3:1319:8a2e:370:7348]:443/.
Fixed: Do not mistake the equal signs of section titles for template keyword arguments.
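As background on the literal IPv6 form mentioned above: the address is wrapped in square brackets so that its colons are not confused with the port separator. The sketch below uses Python's standard urllib.parse (not wikitextparser) to show how the example URL decomposes.

```python
from urllib.parse import urlsplit

url = "https://[2001:db8:85a3:8d3:1319:8a2e:370:7348]:443/"
parts = urlsplit(url)

# hostname drops the brackets; port is parsed from the text after the closing bracket.
host = parts.hostname
port = parts.port
```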
v0.21.5
Fixed invalid escape sequences for Python 3.6.
Added msg, msgnw, raw, safesubst, and subst to known parser function identifiers.
v0.21.4
Fixed a bug in Table.data. (issue #9)
v0.21.3
Fixed: A bug in processing Section objects.
v0.21.2
Fixed: A bug in external_links (the starting position must now be a word boundary; previously this condition was not checked)
v0.21.1
Fixed: A bug in external_links (external links within sub-templates are now detected correctly; previously they were ignored)
v0.21.0
Changed: The order of results, now everything is sorted by its starting position.
Fixed: Bug in ancestors and parent methods
v0.20.0
Added: parent and ancestors methods
Added: __version__ to __init__.py
v0.19.0
Removed: Support for Python 3.3
Fixed: Handling of comments and tags in section titles
v0.18.0
Changed: Add an underscore prefix to private internal module names
Changed: Moved test modules to a different directory
Changed: Templates adjacent to external links are now treated as part of the link
Fixed: A bug in handling tag extensions within parser functions
Fixed: A minor bug in Template.set_arg
Changed: ExternalLink.text: Return None if the link is not within brackets
Fixed: Handling of comments and templates in external links