API Documentation

Documentation of classes and methods.

Matcher

class iamsystem.Matcher(tokenizer: ~iamsystem.tokenization.api.ITokenizer = <iamsystem.tokenization.tokenize.TokenizerImp object>, stopwords: ~iamsystem.stopwords.api.IStopwords[~iamsystem.tokenization.api.TokenT] = None)[source]

Bases: IMatcher[TokenT]

Main public API to perform semantic annotation (aka entity linking) with the iamsystem algorithm.

__init__(tokenizer: ~iamsystem.tokenization.api.ITokenizer = <iamsystem.tokenization.tokenize.TokenizerImp object>, stopwords: ~iamsystem.stopwords.api.IStopwords[~iamsystem.tokenization.api.TokenT] = None)[source]

Create an IAMsystem matcher to annotate documents.

Parameters

tokenizer – default french_tokenizer(). An ITokenizer instance responsible for tokenizing and normalizing.
stopwords – an IStopwords instance. If None, defaults to Stopwords.

add_fuzzy_algo(fuzzy_algo: FuzzyAlgo[TokenT]) → None[source]

Add a fuzzy algorithm that provides synonyms to help match a token of a document to a token of a keyword.

Parameters: fuzzy_algo – a FuzzyAlgo instance.
Returns: None.

add_keyword(keyword: IKeyword) → None[source]

Add a keyword to find in a document.

Parameters: keyword – IKeyword to search in a document.
Returns: None.

add_keywords(keywords: Iterable[IKeyword]) → None[source]

Utility function to add multiple keywords.

Parameters: keywords – IKeyword to search in a document.
Returns: None.

add_labels(labels: Iterable[str]) → None[source]

Utility function that calls ‘add_keywords’ from a list of labels: IKeyword instances are created and added.

Parameters: labels – the labels (keywords) to be searched in the document.
Returns: None.

add_stopwords(words: Iterable[str]) → None[source]

Add words (tokens) to be ignored in IKeyword and in documents.

Parameters: words – a list of words to ignore.
Returns: None.

annot_text(text: str, w: int = 1) → List[Annotation[TokenT]][source]

Annotate a document.

Parameters

text – the document to annotate.
w – Window. How discontinuous the sequence of a keyword’s tokens can be. By default, w=1 means the sequence must be continuous. w=2 means each token can be separated by another token.

Returns

a list of Annotation.
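
The effect of the window can be pictured with a small standalone sketch. It does not use iamsystem and is not the library's algorithm: the helper name found_within_window is hypothetical, and it assumes that w bounds the distance between the positions of two consecutive matched tokens.

```python
from typing import Sequence


def found_within_window(doc: Sequence[str], keyword: Sequence[str], w: int = 1) -> bool:
    """Return True if the keyword tokens occur in order in doc, with at most
    w positions between two consecutive matched tokens (w=1: contiguous)."""
    if not keyword:
        return False
    # Positions where the first keyword token occurs.
    positions = {i for i, tok in enumerate(doc) if tok == keyword[0]}
    for kw_tok in keyword[1:]:
        # Keep the positions of kw_tok reachable from a previous match within w.
        positions = {
            j
            for j, tok in enumerate(doc)
            if tok == kw_tok and any(0 < j - p <= w for p in positions)
        }
        if not positions:
            return False
    return bool(positions)


doc = ["heart", "chronic", "failure"]
print(found_within_window(doc, ["heart", "failure"], w=1))  # False
print(found_within_window(doc, ["heart", "failure"], w=2))  # True
```

With w=1 the extra token "chronic" breaks the match; with w=2 one intervening token is tolerated.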

annot_tokens(tokens: Sequence[TokenT], w: int) → List[Annotation[TokenT]][source]

Annotate a sequence of tokens.

Parameters

tokens – an ordered or unordered sequence of tokens.
w – Window. How discontinuous the sequence of a keyword’s tokens can be. w=1 means the sequence must be continuous. w=2 means each token can be separated by another token.
remove_nested_annots – if two annotations overlap, remove the shorter one.

Returns

a list of Annotation.

property fuzzy_algos: Iterable[FuzzyAlgo[TokenT]]

The fuzzy algorithms used by the algorithm.

Returns: FuzzyAlgo instances responsible for finding possible synonyms for each token of a document.

get_keywords_unigrams() → Set[str][source]: Get all the unigrams (single words excluding stopwords) in the keywords.

get_synonyms(tokens: Sequence[TokenT], i: int, w_states: List[List[IState]]) → Iterable[Tuple[Tuple[str, ...], List[str]]][source]

Get synonyms of a token with configured fuzzy algorithms.

Parameters

tokens – document’s tokens.
i – the ith token for which synonyms are expected.
w_states – algorithm’s states.

Returns

tuples of synonyms and fuzzy algorithm’s names.

is_stopword(word: str) → bool[source]: Return True if word is a stopword.

is_token_a_stopword(token: TokenT) → bool[source]

Check if a token is a stopword.

Parameters: token – a generic token that implements IToken.
Returns: True if the token is a stopword.

property keywords: Collection[IKeyword]: Return the keywords added.

property remove_nested_annots: bool

Matcher config: whether to remove nested annotations. Defaults to True.

tokenize(text: str) → Sequence[TokenT][source]

Tokenize a text with the tokenizer’s instance.

Parameters: text – a document or a keyword.
Returns: A sequence of tokens, the type depends on the tokenizer but must implement IToken protocol.

Annotation

class iamsystem.Annotation(tokens_states: Sequence[TransitionState[TokenT]])[source]

Bases: Span[TokenT]

Output class of Matcher storing information about linked entities.

end: int

get_tokens_algos() → Iterable[Tuple[TokenT, List[str]]][source]

Get each token and the list of fuzzy algorithms that matched it.

Returns: an iterator of tuples (token0, [‘algo1’,…]) where token0 is a token and [‘algo1’,…] a list of fuzzy algorithms.

property keywords: Sequence[IKeyword]: The linked entities, IKeyword instances that matched a document’s tokens.

label: str

norm_label: str

start: int

to_brat_format() → str

Get Brat offsets format. See https://brat.nlplab.org/standoff.html ‘The start-offset is the index of the first character of the annotated span in the text (“.txt” file), i.e. the number of characters in the document preceding it. The end-offset is the index of the first character after the annotated span.’

Returns: a string format of tokens’ offsets

to_dict(text: str = None) → Dict[str, Any][source]

Return a dictionary representation of this object.

Parameters: text – the document this annotation comes from. Defaults to None.
Returns: A dictionary of relevant attributes.

to_string(text: str = None, debug=False) → str[source]

Get a default string representation of this object.

Parameters

text – the document this annotation comes from. Defaults to None. If set, add the document substring: text[first-token-start-offset : last-token-end-offset].
debug – defaults to False. If True, add the sequence of tokens and fuzzy algorithm names.

Returns

a concatenated string of ‘keywords’ ‘start’ ‘end’ ‘substring’? ‘debug_info’?

property tokens: Sequence[TokenT]

The tokens of the document that matched the keywords attribute of this instance.

Returns: an ordered sequence of TokenT, a generic type that implements IToken.

rm_nested_annots

iamsystem.rm_nested_annots(annots: List[Annotation], keep_ancestors=False)[source]

In case of two nested annotations, remove the shorter one. For example, if we have “prostate” and “prostate cancer” annotations, the “prostate” annotation is removed.

Parameters

annots – a list of annotations.
keep_ancestors – Default to False. Whether to keep the nested annotations that are ancestors and remove only other cases.

Returns

a filtered list of annotations.
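
A minimal sketch of the idea, assuming nesting is decided on character offsets alone. The Span type and drop_nested helper are hypothetical stand-ins, not the library's classes, and the keep_ancestors option is not modelled here.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str


def drop_nested(spans: List[Span]) -> List[Span]:
    """Remove every span whose offsets lie inside a strictly longer span."""
    kept = []
    for s in spans:
        nested = any(
            o.start <= s.start
            and s.end <= o.end
            and (o.end - o.start) > (s.end - s.start)
            for o in spans
        )
        if not nested:
            kept.append(s)
    return kept


spans = [Span(0, 8, "prostate"), Span(0, 15, "prostate cancer")]
print([s.label for s in drop_nested(spans)])  # ['prostate cancer']
```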

replace_annots

iamsystem.replace_annots(text: str, annots: Sequence[Annotation], new_labels: Sequence[str])[source]

Replace each annotation in a document (text parameter) by a new label. Warning: an annotation is ignored if overlapped by another one.

Parameters

text – the document the annotations come from.
annots – an ordered sequence of annotation.
new_labels – one new label per annotation, same length as annots expected.

Returns

a new document.
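
The replacement logic can be sketched on plain (start, end) offsets. This is an illustration under stated assumptions, not the library's code: replace_spans is a hypothetical helper, spans are assumed sorted by start offset, and "overlapped" is taken to mean starting before the end of the previously replaced span.

```python
from typing import List, Sequence, Tuple


def replace_spans(
    text: str, spans: Sequence[Tuple[int, int]], new_labels: Sequence[str]
) -> str:
    """Replace text[start:end] of each span by its label.
    A span overlapping an earlier replaced span is skipped."""
    if len(spans) != len(new_labels):
        raise ValueError("one new label per span is expected")
    parts: List[str] = []
    cursor = 0  # end offset of the last replaced span
    for (start, end), label in zip(spans, new_labels):
        if start < cursor:
            continue  # overlapped by a previous span: ignored
        parts.append(text[cursor:start])
        parts.append(label)
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


text = "chronic heart failure"
print(replace_spans(text, [(8, 21)], ["HF"]))  # chronic HF
```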

Keyword and subclasses

IKeyword

class iamsystem.IKeyword(*args, **kwargs)[source]

Bases: Protocol

A string to search in a document (ex: “heart failure”).

__init__(*args, **kwargs)

get_kb_id()[source]: Get the knowledge base id of this keyword.

label: str

Keyword

class iamsystem.Keyword(label: str)[source]

Bases: IKeyword

Base class to search keywords in a document.

__init__(label: str)[source]

Create a keyword.

Parameters: label – a string to search in a document (ex: “heart failure”).

get_kb_id()[source]

Get the knowledge base id of this keyword. It returns the label if this method is not overridden in the subclass.

Returns: A unique identifier.

Term

class iamsystem.Term(label: str, code: str)[source]

Bases: Keyword

This class represents a term in a particular domain where each keyword is associated to a unique identifier called a code.

__init__(label: str, code: str)[source]

Create a term.

Parameters

label – a string to search in a document (ex: “heart failure”).
code – the code associated to this keyword.

get_kb_id()[source]: returns the code of this term.

Terminology

class iamsystem.Terminology[source]

Bases: IStoreKeywords

A utility class to store a set of keywords.

add_keyword(keyword: IKeyword) → None[source]

Add a keyword.

Parameters: keyword – an IKeyword or a subclass.
Returns: None

add_keywords(keywords: Iterable[IKeyword]) → None[source]

Add multiple keywords.

Parameters: keywords – an iterable of IKeyword instances or subclasses.
Returns: None

get_unigrams(tokenizer: ITokenizer, stopwords: IStopwords) → Set[str][source]: Get all the unigrams (single words excluding stopwords) in the keywords.

property keywords: Collection[IKeyword]: Get the collection of keywords.

property size: int: Get the number of keywords.

Tokenization

IOffsets

class iamsystem.IOffsets(*args, **kwargs)[source]

Bases: Protocol

Offsets interface. Default implementation Offsets.

Offsets

class iamsystem.Offsets(start: int, end: int)[source]

Bases: IOffsets

Store the start and end offsets of a token.

__init__(start: int, end: int)[source]

Parameters

start – start-offset is the index of the first character of the annotated span.
end – end-offset is the index of the first character after the annotated span.

IToken

class iamsystem.IToken(*args, **kwargs)[source]

Bases: IOffsets, Protocol

Token interface. Default implementation Token.

Token

class iamsystem.Token(start: int, end: int, label: str, norm_label: str)[source]

Bases: Offsets, IToken

Store the label, normalized label, start and end offsets of a token.

__init__(start: int, end: int, label: str, norm_label: str)[source]

Create a token.

Parameters

start – start-offset is the index of the first character of the annotated span.
end – end-offset is the index of the first character after the annotated span.
label – the label as it is in the document.
norm_label – the normalized label (used by iamsystem’s algorithm to perform entity linking).

ITokenizer

class iamsystem.ITokenizer(*args, **kwargs)[source]

Bases: Protocol[TokenT]

Tokenizer Interface. Default implementation TokenizerImp.

tokenize(text: str) → Sequence[TokenT][source]

Tokenize a string.

Parameters: text – an unnormalized string.
Returns: A sequence of generic type (TokenT) that implements IToken protocol.

TokenizerImp

class iamsystem.TokenizerImp(split: Callable[[str], Iterable[IOffsets]], normalize: Callable[[str], str])[source]

Bases: ITokenizer[Token]

An ITokenizer implementation. Class responsible for the tokenization and normalization of tokens. See also french_tokenizer(), english_tokenizer().

__init__(split: Callable[[str], Iterable[IOffsets]], normalize: Callable[[str], str])[source]

Create a custom tokenizer that splits and normalizes a string.

Parameters

split – a function that splits a text into (start,end) tuples. This function must return an iterable of IOffsets. See also split_find_iter_closure().
normalize – a function that normalizes a string. This function must return a string.

tokenize(text: str) → Sequence[Token][source]: Split the text into a sequence of Token.

english_tokenizer

iamsystem.english_tokenizer() → TokenizerImp[source]

An opinionated English tokenizer. It splits the text by ‘word’ character.

It normalizes by lowercasing.

Returns: a TokenizerImp implementation.

french_tokenizer

iamsystem.french_tokenizer() → TokenizerImp[source]

An opinionated French tokenizer. It splits the text by ‘word’ character.

It normalizes by lowercasing and unicode normalization form.

Returns: a TokenizerImp implementation.
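
The text above does not say which Unicode normalization form is applied. As a hedged illustration only, the sketch below assumes NFKD decomposition followed by removal of combining marks, a common way to strip French diacritics; normalize_french is a hypothetical name.

```python
import unicodedata


def normalize_french(text: str) -> str:
    """Lowercase, then strip diacritics via NFKD decomposition (assumed form)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop the combining marks left by the decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(normalize_french("Insuffisance Rénale Aiguë"))  # insuffisance renale aigue
```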

Build a custom split function

iamsystem.split_find_iter_closure(pattern: str) → Callable[[str], Iterable[IOffsets]][source]

Build a split function that maps a document to (start, end) tuples.

Parameters: pattern – a regex to split sentence characters.
Returns: a split function.
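
A split function of this shape can be sketched with re.finditer, which the function name suggests but the text does not confirm. The helper make_split is hypothetical and returns plain (start, end) tuples rather than IOffsets instances.

```python
import re
from typing import Callable, Iterable, Tuple


def make_split(pattern: str) -> Callable[[str], Iterable[Tuple[int, int]]]:
    """Build a split function returning the (start, end) offsets of each match."""
    regex = re.compile(pattern)  # compiled once, reused on every call

    def split(text: str) -> Iterable[Tuple[int, int]]:
        return [(m.start(), m.end()) for m in regex.finditer(text)]

    return split


split = make_split(r"\w+")
print(split("heart failure"))  # [(0, 5), (6, 13)]
```

Each pair follows the offset convention used throughout this page: text[start:end] is the token.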

Order tokens

iamsystem.tokenize_and_order_decorator(tokenize: Callable[[str], Sequence[TokenT]]) → Callable[[str], Sequence[TokenT]][source]

Decorate a tokenize function: the tokens are sorted alphabetically by their label.

Parameters: tokenize – a tokenize function to decorate.
Returns: the decorated tokenize function.
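
A minimal sketch of such a decorator, using a hypothetical Tok tuple in place of the library's token type. Sorting changes the order of the tokens but leaves their offsets untouched.

```python
import re
from typing import Callable, NamedTuple, Sequence


class Tok(NamedTuple):
    start: int
    end: int
    label: str


def order_by_label(
    tokenize: Callable[[str], Sequence[Tok]]
) -> Callable[[str], Sequence[Tok]]:
    """Decorate a tokenize function so its tokens come back sorted by label."""

    def wrapper(text: str) -> Sequence[Tok]:
        return sorted(tokenize(text), key=lambda tok: tok.label)

    return wrapper


@order_by_label
def tokenize(text: str) -> Sequence[Tok]:
    return [Tok(m.start(), m.end(), m.group()) for m in re.finditer(r"\w+", text)]


print([t.label for t in tokenize("heart failure")])  # ['failure', 'heart']
```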

Stopwords classes

IStopwords

class iamsystem.IStopwords(*args, **kwargs)[source]

Bases: Protocol[TokenT]

Stopwords Interface.

is_token_a_stopword(token: TokenT) → bool[source]

Check if a token is a stopword.

Parameters: token – a generic Token that implements IToken protocol.
Returns: true if this token is a stopword.

Stopwords

class iamsystem.Stopwords(stopwords: Optional[Iterable[str]] = None)[source]

Bases: SimpleStopwords[TokenT]

A simple implementation of IStopwords protocol.

add(words: Iterable[str]) → None[source]

Add stopwords.

Parameters: words – a list of string.
Returns: None

is_stopword(word: str) → bool[source]: True if, after lowercasing, the word belongs to the stopwords set.

property stopwords: Get the set of stopwords.

NegativeStopwords

class iamsystem.NegativeStopwords(words_to_keep: Optional[Iterable[str]] = None)[source]

Bases: IStopwords[TokenT]

Like a negative image (a total inversion, in which light areas appear dark and vice versa), every token is a stopword until proven otherwise.

add_fun_is_a_word_to_keep(fun: Callable[[TokenT], bool]) → None[source]

Add a function that checks if a word should be kept.

Parameters: fun – a Callable that takes a token as a parameter and returns a boolean.
Returns: None.

add_words(words_to_keep: Iterable[str]) → None[source]

Add words not to be ignored.

Parameters: words_to_keep – a list of string.
Returns: None

is_token_a_stopword(token: TokenT) → bool[source]

Check whether the token is not a word to keep.

Parameters: token – a token.
Returns: False if the token’s lowercased form belongs to the set of words to keep or if a function added with add_fun_is_a_word_to_keep() returns True.
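
The inverted logic can be sketched on plain strings. KeepListStopwords is a hypothetical stand-in written from the description above; the real class works on tokens rather than strings.

```python
from typing import Callable, Iterable, List, Optional, Set


class KeepListStopwords:
    """Every word is a stopword unless it is explicitly kept."""

    def __init__(self, words_to_keep: Optional[Iterable[str]] = None):
        self._keep: Set[str] = set()
        self._keep_funs: List[Callable[[str], bool]] = []
        if words_to_keep is not None:
            self.add_words(words_to_keep)

    def add_words(self, words_to_keep: Iterable[str]) -> None:
        self._keep.update(word.lower() for word in words_to_keep)

    def add_fun_is_a_word_to_keep(self, fun: Callable[[str], bool]) -> None:
        self._keep_funs.append(fun)

    def is_stopword(self, word: str) -> bool:
        if word.lower() in self._keep:
            return False
        # Kept if at least one registered function accepts the word.
        return not any(fun(word) for fun in self._keep_funs)


stopwords = KeepListStopwords(["heart", "failure"])
stopwords.add_fun_is_a_word_to_keep(str.isdigit)
print(stopwords.is_stopword("Heart"))    # False
print(stopwords.is_stopword("chronic"))  # True
print(stopwords.is_stopword("42"))       # False
```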

NoStopwords

class iamsystem.NoStopwords(*args, **kwargs)[source]

Bases: SimpleStopwords[TokenT]

Utility class. Class to use when no stopwords are used.

is_stopword(word: str) → bool[source]: Return False.

is_token_a_stopword(token: TokenT) → bool[source]: Return False.

Fuzzy algorithms

Abstract Base classes

FuzzyAlgo

class iamsystem.FuzzyAlgo(name: str)[source]

Bases: Generic[TokenT], ABC

Fuzzy Algorithm base class.

NO_SYN: Iterable[Tuple[str, ...]] = []: Default value returned by a fuzzy algorithm when no synonym is found.

abstract get_synonyms(tokens: Sequence[TokenT], i: int, w_states: List[List[IState]]) → Iterable[Tuple[Tuple[str, ...], str]][source]

Main API function to retrieve all synonyms provided by a fuzzy algorithm.

Parameters

tokens – the sequence of tokens of the document. Useful when the fuzzy algorithm needs context, namely the tokens around the token of interest given by ‘i’ parameter.
i – the ith token of this sequence for which synonyms are expected.
w_states – the states in which the algorithm currently is. Useful if the fuzzy algorithm needs to know the current states and the possible state transitions.

Returns

0 to many synonyms (SynAlgo type).

static word_to_syn(word: str) → Tuple[str, ...][source]

Utility function to transform a string to expected SynType.

Parameters: word – a word synonym produced by the algorithm. Ex: word=’insuffisance’ for token ‘ins’.
Returns: SynType, the expected output format.

static words_seq_to_syn(words: Sequence[str]) → Tuple[str, ...][source]

Utility function to transform a sequence of string to the expected output type.

Parameters: words – a sequence of words produced by the algorithm. Ex: words=[‘insuffisance’, ‘cardiaque’] for the token ‘ic’.
Returns: SynType, the expected output format.

ContextFreeAlgo

class iamsystem.ContextFreeAlgo(name: str)[source]

Bases: FuzzyAlgo[TokenT], ABC

A FuzzyAlgo that doesn’t take into account context, only the current token.

get_synonyms(tokens: Sequence[TokenT], i: int, w_states: List[List[IState]]) → Iterable[Tuple[Tuple[str, ...], str]][source]: Delegate to get_syns_of_token.

abstract get_syns_of_token(token: TokenT) → Iterable[Tuple[str, ...]][source]: Returns synonyms of this token.

NormLabelAlgo

class iamsystem.NormLabelAlgo(name: str)[source]

Bases: ContextFreeAlgo[TokenT], INormLabelAlgo, ABC

A FuzzyAlgo that uses only the normalized label of a token. These fuzzy algorithms can be put in cache to avoid calling them multiple times. See CacheFuzzyAlgos.

get_syns_of_token(token: TokenT) → Iterable[Tuple[str, ...]][source]: Delegate to get_syns_of_word.

abstract get_syns_of_word(word: str) → Iterable[Tuple[str, ...]][source]: Returns synonyms of this word (e.g. the normalized label of a token).

CacheFuzzyAlgos

class iamsystem.CacheFuzzyAlgos(name: str = 'Cache')[source]

Bases: FuzzyAlgo, Generic[TokenT]

A FuzzyAlgo that provides a cache for NormLabelAlgo algorithms. Since these algorithms don’t depend on context, their output can be cached to avoid calling them multiple times.

add_algo(algo: INormLabelAlgo) → None[source]: Add NormLabelAlgo.

empty_cache() → None[source]: Empty the cache. Done automatically when an algorithm is added.

get_synonyms(tokens: Sequence[IToken], i: int, w_states: List[List[IState]]) → List[Tuple[Tuple[str, ...], str]][source]: Implements superclass abstract method.

get_syns_of_word(word: str) → List[Tuple[Tuple[str, ...], str]][source]: Retrieve all synonyms of fuzzy algorithms from cache or by calling them once.

property max_nb_of_words: The maximum number of words to put in cache. Default: 100,000 words.
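
Why caching is safe for these algorithms can be shown with functools.lru_cache: the result depends only on the word, so it can be computed once. This is a sketch of the principle, not the library's cache; cached and slow_algo are hypothetical names.

```python
from functools import lru_cache
from typing import Callable, List, Tuple

Syn = Tuple[str, ...]


def cached(algo: Callable[[str], List[Syn]], max_nb_of_words: int = 100_000):
    """Wrap a word-only synonym function with a bounded cache."""

    @lru_cache(maxsize=max_nb_of_words)
    def get_syns_of_word(word: str) -> Tuple[Syn, ...]:
        return tuple(algo(word))

    return get_syns_of_word


calls: List[str] = []


def slow_algo(word: str) -> List[Syn]:
    calls.append(word)  # record each real invocation
    return [(word.upper(),)]


get_syns = cached(slow_algo)
get_syns("heart")
get_syns("heart")
print(len(calls))  # 1
```

The second call is served from the cache, so the wrapped algorithm runs only once per distinct word.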

Abbreviations

class iamsystem.Abbreviations(name: str, token_is_an_abbreviation: ~typing.Callable[[~iamsystem.tokenization.api.TokenT], bool] = <function Abbreviations.<lambda>>)[source]

Bases: ContextFreeAlgo[TokenT], INormLabelAlgo, ABC

A FuzzyAlgo to handle abbreviations. This class doesn’t take into account the context of a document to return a long form.

__init__(name: str, token_is_an_abbreviation: ~typing.Callable[[~iamsystem.tokenization.api.TokenT], bool] = <function Abbreviations.<lambda>>)[source]

Create an instance to store abbreviations.

Parameters

name – a name given to this algorithm. (ex: ‘medical abbs’)
token_is_an_abbreviation – a function that verifies whether a token is an abbreviation (ex: checks that all letters are uppercase). The function is called before the dictionary look-up is performed to retrieve long forms. Default: no check is performed, the function always returns True.

add(short_form: str, long_form: str, tokenizer: ITokenizer) → None[source]

Add an abbreviation.

Parameters

short_form – an abbreviation short form (ex: CHF).
long_form – an abbreviation long form. (ex: congestive heart failure).
tokenizer – an ITokenizer to tokenize the long form. It is recommended to use your Matcher tokenizer.

Returns

None.

add_tokenized_long_form(short_form, long_form: Sequence[str]) → None[source]: Add an abbreviation already tokenized.

get_syns_of_token(token: TokenT) → Iterable[Tuple[str, ...]][source]: Return the abbreviation long form(s).

get_syns_of_word(word: str) → Iterable[Tuple[str, ...]][source]: Return the abbreviation long form(s).
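
The look-up described above can be sketched as a dictionary from short forms to tokenized long forms, guarded by the abbreviation check. AbbreviationTable is a hypothetical stand-in that works on strings, and a whitespace split replaces the tokenizer.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple


class AbbreviationTable:
    def __init__(self, is_abbreviation: Callable[[str], bool] = lambda token: True):
        self._long_forms: DefaultDict[str, List[Tuple[str, ...]]] = defaultdict(list)
        self._is_abbreviation = is_abbreviation

    def add(self, short_form: str, long_form: str) -> None:
        # Whitespace split stands in for the matcher's tokenizer.
        tokens = tuple(long_form.lower().split())
        self._long_forms[short_form.lower()].append(tokens)

    def get_syns_of_token(self, token: str) -> List[Tuple[str, ...]]:
        # The check runs before the dictionary look-up.
        if not self._is_abbreviation(token):
            return []
        return self._long_forms.get(token.lower(), [])


table = AbbreviationTable(is_abbreviation=str.isupper)
table.add("CHF", "congestive heart failure")
print(table.get_syns_of_token("CHF"))  # [('congestive', 'heart', 'failure')]
print(table.get_syns_of_token("chf"))  # []
```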

FuzzyRegex

class iamsystem.FuzzyRegex(algo_name: str, pattern: str, pattern_name: str)[source]

Bases: ContextFreeAlgo, INormLabelAlgo

A FuzzyAlgo to handle regular expressions. Useful when one or multiple tokens of a keyword need to be matched to a regular expression.

get_syns_of_token(token: TokenT) → Iterable[Tuple[str, ...]][source]: Return the pattern_name if this token matches the regular expression.

get_syns_of_word(word: str) → Iterable[Tuple[str, ...]][source]: Return the pattern_name if this word matches it.

replace_pattern_in_keyword(keyword: IKeyword, tokenizer: ITokenizer) → IKeyword[source]: Utility function to replace keyword’s tokens that match the pattern by the pattern name.

token_matches_pattern(token: TokenT) → bool[source]: Return True if this token matches this instance’s pattern.
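
A sketch of the mechanism: a token that matches the pattern is given the pattern name as its synonym, and the same substitution is applied to the keyword so the two sides meet. RegexSynonym is a hypothetical stand-in; whether the library matches the whole token or only a prefix is not stated above, so full matching is an assumption.

```python
import re
from typing import List, Tuple


class RegexSynonym:
    def __init__(self, pattern: str, pattern_name: str):
        self._regex = re.compile(pattern)
        self.pattern_name = pattern_name

    def matches(self, word: str) -> bool:
        # Assumption: the whole word must match the pattern.
        return self._regex.fullmatch(word) is not None

    def get_syns_of_word(self, word: str) -> List[Tuple[str, ...]]:
        return [(self.pattern_name,)] if self.matches(word) else []

    def replace_in_keyword(self, label: str) -> str:
        # Whitespace split stands in for the matcher's tokenizer.
        return " ".join(
            self.pattern_name if self.matches(tok) else tok for tok in label.split()
        )


numval = RegexSynonym(r"\d+([.,]\d+)?", "numval")
print(numval.get_syns_of_word("2.1"))                   # [('numval',)]
print(numval.replace_in_keyword("calcium 2.1 mmol/L"))  # calcium numval mmol/L
```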

WordNormalizer

class iamsystem.WordNormalizer(name: str, norm_fun: Callable[[str], str])[source]

Bases: NormLabelAlgo

A FuzzyAlgo to handle normalization techniques such as stemming and lemmatization.

add_words(words: Iterable[str]) → None[source]

A list of possible word synonyms, in general all the tokens of your keywords. An easy way to provide these tokens is to call get_keywords_unigrams() of the matcher.

Parameters: words – A list of words to normalize and store.
Returns: None.

get_syns_of_word(word: str) → Iterable[Tuple[str, ...]][source]

Return all the words that have the same normalized form as this word.

For example, if the normalize function is an English stemmer and you called add_words([“eating”]), this instance stores the stem “eat” associated with the word “eating”. Then, if a document contains the token “eats”, since the stem is the same, this function returns the synonym “eating”.

Parameters: word – a string, i.e. a word from a document.
Returns: word synonyms and algorithm name.
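
The example above can be reproduced with a toy index. NormalizerIndex and toy_stem are hypothetical: the stemmer only strips a trailing "ing" or "s" and stands in for a real stemming function.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, List, Set, Tuple


def toy_stem(word: str) -> str:
    """Toy stemmer: strips a trailing 'ing' or 's'. Not a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


class NormalizerIndex:
    def __init__(self, norm_fun: Callable[[str], str]):
        self._norm_fun = norm_fun
        self._by_norm: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_words(self, words: Iterable[str]) -> None:
        # Store each keyword word under its normalized form.
        for word in words:
            self._by_norm[self._norm_fun(word)].add(word)

    def get_syns_of_word(self, word: str) -> List[Tuple[str, ...]]:
        # Look up the stored words sharing the normalized form of `word`.
        same_norm = self._by_norm.get(self._norm_fun(word), set())
        return [(w,) for w in sorted(same_norm)]


index = NormalizerIndex(toy_stem)
index.add_words(["eating"])
print(index.get_syns_of_word("eats"))  # [('eating',)]
```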

SpellWise

SpellWiseWrapper

class iamsystem.SpellWiseWrapper(spellwise_algo: ESpellWiseAlgo, max_distance: int, min_nb_char: int = 5, name: str = None)[source]

Bases: NormLabelAlgo

A FuzzyAlgo that wraps an algorithm from the spellwise library.

add_words(words: Iterable[str], warn=False) → None[source]

A list of possible word synonyms, in general all the tokens of your keywords. An easy way to provide these tokens is to call get_keywords_unigrams() method after you added your keywords to the matcher instance.

Parameters

words – A list of possible synonyms.
warn – raise a warning if a word added is ignored. Default False.

Returns

None.

add_words_to_ignore(words: Iterable[str])[source]: Add words that the algorithm will ignore: no string distance will be computed.

get_syns_of_word(word: str) → Iterable[Tuple[str, ...]][source]: Return the closest words if this word is not a word to ignore.

property max_distance: Maximum edit distance (see spellwise documentation).

property min_nb_char: The minimum number of characters a word must have not to be ignored.
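
How max_distance and min_nb_char interact can be sketched with a plain Levenshtein distance written in pure Python. This does not call the spellwise library; closest_words is a hypothetical helper and a linear scan of the vocabulary replaces whatever index the library uses.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,               # deletion
                    current[j - 1] + 1,            # insertion
                    previous[j - 1] + (ca != cb),  # substitution
                )
            )
        previous = current
    return previous[-1]


def closest_words(
    word: str, vocabulary: Iterable[str], max_distance: int = 1, min_nb_char: int = 5
) -> List[str]:
    # Short words are ignored: too many unrelated words lie within a small distance.
    if len(word) < min_nb_char:
        return []
    return sorted(v for v in vocabulary if levenshtein(word, v) <= max_distance)


vocab = {"insuffisance", "cardiaque"}
print(closest_words("insufisance", vocab))  # ['insuffisance']
print(closest_words("card", vocab))         # []
```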

ESpellWiseAlgo

class iamsystem.ESpellWiseAlgo(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Enumerated list of spellwise library algorithms. See spellwise documentation for more information.

CAVERPHONE_1 = <class 'spellwise.algorithms.caverphone_one.CaverphoneOne'>

CAVERPHONE_2 = <class 'spellwise.algorithms.caverphone_two.CaverphoneTwo'>

EDITEX = <class 'spellwise.algorithms.editex.Editex'>

LEVENSHTEIN = <class 'spellwise.algorithms.levenshtein.Levenshtein'>

SOUNDEX = <class 'spellwise.algorithms.soundex.Soundex'>

TYPOX = <class 'spellwise.algorithms.typox.Typox'>

Brat

BratDocument

class iamsystem.BratDocument[source]

Bases: object

Class representing a Brat Document containing Brat’s annotations, namely Brat Entity and Brat Note in this package. A BratDocument should be linked to a single text document. Entities and notes can be serialized in a text file with ‘ann’ extension, one per line. See https://brat.nlplab.org/standoff.html

add_annots(annots: List[Annotation], text: str, keyword_attr: str = None, brat_type: str = None) → None[source]

Add iamsystem annotations to convert them to Brat format.

Parameters

annots – a list of Annotation, Matcher output.
text – the document these annotations come from.
keyword_attr – the attribute name of an IKeyword that stores brat_type. Defaults to None. If None, the brat_type parameter must be used.
brat_type – A string, the Brat entity type for all these annotations. Default to None. If None, keyword_attr parameter must be used.

Returns

None

add_entity(brat_type: str, offsets: List[IOffsets], text: str) → None[source]

Add a Brat Entity.

Parameters

brat_type – A Brat entity type (see Brat documentation).
offsets – a list of (start,end) annotation offsets. See IOffsets. A list is expected since the tokens can be discontinuous.
text – document substring using (start,end) offsets (not the document itself).

Returns

None

entities_to_string() → str[source]: Brat entities in the Brat format ready to be serialized to ‘.ann’ text file.

get_entities() → Iterable[BratEntity][source]: An iterable of Brat entities.

get_notes() → Iterable[BratNote][source]: An iterable of Brat notes.

notes_to_string() → str[source]: Brat notes in the Brat format ready to be serialized to ‘.ann’ text file.

BratEntity

class iamsystem.BratEntity(entity_id: str, brat_type: str, offsets: Sequence[IOffsets], text: str)[source]

Bases: object

Class representing a Brat Entity. https://brat.nlplab.org/standoff.html: ‘Each entity annotation has a unique ID and is defined by type (e.g. Person or Organization) and the span of characters containing the entity mention (represented as a “start end” offset pair).’

Format: ID TYPE START END[;START END]* TEXT.

__init__(entity_id: str, brat_type: str, offsets: Sequence[IOffsets], text: str)[source]

Create a Brat Entity.

Parameters

entity_id – a unique ID (^T[0-9]+$).
brat_type – A Brat entity type (see Brat documentation).
offsets – (start,end) annotation offsets. See IOffsets.
text – document substring using (start,end) offsets.
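
The format line above can be illustrated with a small formatter. brat_entity_line is a hypothetical helper, not the class's own serialization; it assumes the tab-separated layout of the Brat standoff format, where the type and offsets share one field and discontinuous spans are joined by semicolons.

```python
from typing import Sequence, Tuple


def brat_entity_line(
    entity_id: str, brat_type: str, offsets: Sequence[Tuple[int, int]], text: str
) -> str:
    """Format one entity as: ID <tab> TYPE START END[;START END]* <tab> TEXT."""
    spans = ";".join(f"{start} {end}" for start, end in offsets)
    return f"{entity_id}\t{brat_type} {spans}\t{text}"


# "heart failure" found discontinuously in "heart chronic failure".
line = brat_entity_line("T1", "DISEASE", [(0, 5), (14, 21)], "heart failure")
print(line)
```

The entity type "DISEASE" is an example value; Brat expects types declared in its own configuration.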

BratNote

class iamsystem.BratNote(note_id: str, ref_id: str, note: str)[source]

Bases: object

Class representing a Brat Note. https://brat.nlplab.org/standoff.html Brat notes are used to store additional information on a detected entity. Format: #ID TYPE REFID NOTE.

__init__(note_id: str, ref_id: str, note: str)[source]

Create a Brat Note.

Parameters

note_id – a unique ID (^#[0-9]+$)
ref_id – a unique ID. For a BratEntity, the format is (^T[0-9]+$)
note – any string comment.

TYPE = 'IAMSYSTEM': BratNote type. Replace it with ‘AnnotatorNotes’ to make notes human-writable in the Brat interface.

BratWriter

class iamsystem.BratWriter[source]

Bases: object

Utility class to write IAMsystem annotations in Brat format to a text file.

classmethod saveEntities(brat_entities: Iterable[BratEntity], write: Callable[[str], Any]) → None[source]

Write Brat entities.

Parameters

brat_entities – an iterable of Brat entities.
write – a write function (ex: f.write from ‘with open(filename, ‘w’) as f:’).

Returns

None

classmethod saveNotes(brat_notes: Iterable[BratNote], write: Callable[[str], Any]) → None[source]

Write Brat notes.

Parameters

brat_notes – an iterable of Brat notes.
write – a write function (ex: f.write from ‘with open(filename, ‘w’) as f:’).
