API Documentation

Documentation of classes and methods.

Matcher

class iamsystem.Matcher(tokenizer: ITokenizer = <TokenizerImp object>, stopwords: IStopwords[TokenT] = None)[source]

Bases: IMatcher[TokenT]

Main public API to perform semantic annotation (aka entity linking) with the iamsystem algorithm.

__init__(tokenizer: ITokenizer = <TokenizerImp object>, stopwords: IStopwords[TokenT] = None)[source]

Create an IAMsystem matcher to annotate documents.

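Parameters
  • tokenizer – an ITokenizer used to tokenize documents and keywords.

  • stopwords – an IStopwords instance; its stopwords are ignored in keywords and in documents.
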
add_fuzzy_algo(fuzzy_algo: FuzzyAlgo[TokenT]) → None[source]

Add a fuzzy algorithm that provides synonyms to help match a token of a document with a token of a keyword.

Parameters

fuzzy_algo – a FuzzyAlgo instance.

Returns

None.

add_keyword(keyword: IKeyword) → None[source]

Add a keyword to find in a document.

Parameters

keyword – an IKeyword to search in a document.

Returns

None.

add_keywords(keywords: Iterable[IKeyword]) → None[source]

Utility function to add multiple keywords.

Parameters

keywords – the IKeyword instances to search in a document.

Returns

None.

add_labels(labels: Iterable[str]) → None[source]

Utility function that calls ‘add_keywords’ with a list of labels: an IKeyword instance is created for each label and added.

Parameters

labels – the labels (keywords) to be searched in the document.

Returns

None.

add_stopwords(words: Iterable[str]) → None[source]

Add words (tokens) to be ignored in IKeyword and in documents.

Parameters

words – a list of words to ignore.

Returns

None.

annot_text(text: str, w: int = 1) → List[Annotation[TokenT]][source]

Annotate a document.

Parameters
  • text – the document to annotate.

  • w – window: how far apart a keyword’s tokens can be in the document. By default, w=1 means the tokens must be contiguous; w=2 means each token can be separated from the next one by one other token.

Returns

a list of Annotation.
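The window rule above can be illustrated with a small, self-contained sketch. It assumes that w bounds the distance between the positions of two consecutive keyword tokens in the document (w=1: adjacent). The helper matches_with_window is hypothetical and leaves out everything else iamsystem does (normalization, stopwords, fuzzy algorithms).

```python
from typing import Sequence


def matches_with_window(doc: Sequence[str], keyword: Sequence[str], w: int) -> bool:
    """Return True if the keyword tokens occur in order in doc, each one at
    most w positions after the previous one (w=1 means contiguous)."""

    def search(k: int, prev: int) -> bool:
        if k == len(keyword):
            return True  # every keyword token has been placed
        if k == 0:
            candidates = range(len(doc))  # the first token can start anywhere
        else:
            candidates = range(prev + 1, min(len(doc), prev + w + 1))
        # try every candidate position and backtrack on failure
        return any(doc[i] == keyword[k] and search(k + 1, i) for i in candidates)

    return search(0, -1)


doc = ["heart", "severe", "failure"]
print(matches_with_window(doc, ["heart", "failure"], w=1))  # False
print(matches_with_window(doc, ["heart", "failure"], w=2))  # True
```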

annot_tokens(tokens: Sequence[TokenT], w: int) → List[Annotation[TokenT]][source]

Annotate a sequence of tokens.

Parameters
  • tokens – an ordered or unordered sequence of tokens.

  • w – window: how far apart a keyword’s tokens can be in the document. w=1 means the tokens must be contiguous; w=2 means each token can be separated from the next one by one other token.

  • remove_nested_annots – if two annotations overlap, remove the shorter one.

Returns

a list of Annotation.

property fuzzy_algos: Iterable[FuzzyAlgo[TokenT]]

The fuzzy algorithms used by the matcher.

Returns

FuzzyAlgo instances responsible for finding possible synonyms for each token of a document.

get_keywords_unigrams() → Set[str][source]

Get all the unigrams (single words excluding stopwords) in the keywords.

get_synonyms(tokens: Sequence[TokenT], i: int, w_states: List[List[IState]]) → Iterable[Tuple[Tuple[str, ...], List[str]]][source]

Get synonyms of a token with configured fuzzy algorithms.

Parameters
  • tokens – document’s tokens.

  • i – the ith token for which synonyms are expected.

  • w_states – algorithm’s states.

Returns

tuples of a synonym and the names of the fuzzy algorithms that produced it.

is_stopword(word: str) → bool[source]

Return True if word is a stopword.

is_token_a_stopword(token: TokenT) → bool[source]

Check if a token is a stopword.

Parameters

token – a generic token that implements IToken.

Returns

True if the token is a stopword.

property keywords: Collection[IKeyword]

Return the keywords added.

property remove_nested_annots: bool

Whether to remove nested annotations. Defaults to True.

Type

Matcher config

tokenize(text: str) → Sequence[TokenT][source]

Tokenize a text with the tokenizer’s instance.

Parameters

text – a document or a keyword.

Returns

A sequence of tokens, the type depends on the tokenizer but must implement IToken protocol.

Annotation

class iamsystem.Annotation(tokens_states: Sequence[TransitionState[TokenT]])[source]

Bases: Span[TokenT]

Output class of Matcher that stores information about linked entities.

end: int
get_tokens_algos() → Iterable[Tuple[TokenT, List[str]]][source]

Get each token and the list of fuzzy algorithms that matched it.

Returns

an iterator of tuples (token0, [‘algo1’,…]) where token0 is a token and [‘algo1’,…] a list of fuzzy algorithms.

property keywords: Sequence[IKeyword]

The linked entities, IKeyword instances that matched a document’s tokens.

label: str
norm_label: str
start: int
to_brat_format() → str

Get Brat offsets format. See https://brat.nlplab.org/standoff.html ‘The start-offset is the index of the first character of the annotated span in the text (“.txt” file), i.e. the number of characters in the document preceding it. The end-offset is the index of the first character after the annotated span.’

Returns

a string format of tokens’ offsets
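As a sketch of the Brat offsets notation quoted above, assuming each matched token contributes one START END pair and that discontinuous pairs are joined with ‘;’ (the format given for BratEntity). The helper to_brat_offsets is hypothetical; the library may, for instance, group adjacent tokens differently.

```python
from typing import Iterable, Tuple


def to_brat_offsets(offsets: Iterable[Tuple[int, int]]) -> str:
    """Format (start, end) token offsets as Brat's 'START END[;START END]*'."""
    return ";".join(f"{start} {end}" for start, end in offsets)


text = "heart severe failure"
# "heart" is text[0:5] and "failure" is text[13:20]: end offsets are exclusive
print(to_brat_offsets([(0, 5), (13, 20)]))  # 0 5;13 20
```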

to_dict(text: str = None) → Dict[str, Any][source]

Return a dictionary representation of this object.

Parameters

text – the document this annotation comes from. Defaults to None.

Returns

A dictionary of relevant attributes.

to_string(text: str = None, debug=False) → str[source]

Get a default string representation of this object.

Parameters
  • text – the document this annotation comes from. Defaults to None. If set, add the document substring: text[first-token-start-offset : last-token-end-offset].

  • debug – Defaults to False. If True, add the sequence of tokens and fuzzy algorithm names.

Returns

a concatenated string of the keywords, start and end and, optionally, the document substring and the debug information.

property tokens: Sequence[TokenT]

The tokens of the document that matched the keywords attribute of this instance.

Returns

an ordered sequence of TokenT, a generic type that implements IToken.

rm_nested_annots

iamsystem.rm_nested_annots(annots: List[Annotation], keep_ancestors=False)[source]

In case of two nested annotations, remove the shorter one. For example, given the annotations “prostate” and “prostate cancer”, the “prostate” annotation is removed.

Parameters
  • annots – a list of annotations.

  • keep_ancestors – Defaults to False. If True, keep the nested annotations that are ancestors and remove only the other nested annotations.

Returns

a filtered list of annotations.
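A minimal sketch of the nesting rule, assuming that an annotation is nested when its character span lies inside a strictly longer span. Span and remove_nested are hypothetical names; the keep_ancestors option is not modeled, and the library may compare annotations differently (for example by tokens rather than characters).

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str


def remove_nested(spans: List[Span]) -> List[Span]:
    """Drop every span whose offsets lie inside a strictly longer span."""

    def is_nested(inner: Span, outer: Span) -> bool:
        return (
            outer.start <= inner.start
            and inner.end <= outer.end
            and (outer.end - outer.start) > (inner.end - inner.start)
        )

    return [s for s in spans if not any(is_nested(s, o) for o in spans)]


annots = [Span(0, 8, "prostate"), Span(0, 15, "prostate cancer")]
print(remove_nested(annots))  # [Span(start=0, end=15, label='prostate cancer')]
```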

replace_annots

iamsystem.replace_annots(text: str, annots: Sequence[Annotation], new_labels: Sequence[str])[source]

Replace each annotation in a document (the text parameter) with a new label. Warning: an annotation is ignored if it overlaps another one.

Parameters
  • text – the document the annotations come from.

  • annots – an ordered sequence of annotations.

  • new_labels – one new label per annotation; must have the same length as annots.

Returns

a new document.
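The replacement can be sketched with plain character offsets. This version assumes the spans are sorted by start offset and, when two spans overlap, keeps the first one; which annotation iamsystem keeps is not specified above. replace_spans is a hypothetical helper, not replace_annots itself.

```python
from typing import List, Sequence, Tuple


def replace_spans(text: str, spans: Sequence[Tuple[int, int]],
                  new_labels: Sequence[str]) -> str:
    """Replace each (start, end) span with its new label, skipping overlaps."""
    if len(spans) != len(new_labels):
        raise ValueError("one new label per span is expected")
    pieces: List[str] = []
    cursor = 0  # end offset of the last replaced span
    for (start, end), label in zip(spans, new_labels):
        if start < cursor:  # overlaps the previously kept span: ignore it
            continue
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(label)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Patient with CHF and diabetes."
print(replace_spans(text, [(13, 16), (21, 29)], ["HEART_FAILURE", "DIABETES"]))
# Patient with HEART_FAILURE and DIABETES.
```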

Keyword and subclasses

IKeyword

class iamsystem.IKeyword(*args, **kwargs)[source]

Bases: Protocol

A string to search in a document (ex: “heart failure”).

__init__(*args, **kwargs)
get_kb_id()[source]

Get the knowledge base id of this keyword.

label: str

Keyword

class iamsystem.Keyword(label: str)[source]

Bases: IKeyword

Base class to search keywords in a document.

__init__(label: str)[source]

Create a keyword.

Parameters

label – a string to search in a document (ex: “heart failure”).

get_kb_id()[source]

Get the knowledge base id of this keyword. It returns the label unless this method is overridden in a subclass.

Returns

A unique identifier.

Term

class iamsystem.Term(label: str, code: str)[source]

Bases: Keyword

This class represents a term in a particular domain, where each keyword is associated with a unique identifier called a code.

__init__(label: str, code: str)[source]

Create a term.

Parameters
  • label – a string to search in a document (ex: “heart failure”).

  • code – the code associated with this keyword.

get_kb_id()[source]

Return the code of this term.
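The get_kb_id override pattern can be shown with two simplified stand-ins that mirror only the documented behavior: Keyword returns its label, Term returns its code. They are not the iamsystem classes, and the code value below is only illustrative.

```python
class Keyword:
    """Stand-in for iamsystem.Keyword: the knowledge base id defaults to the label."""

    def __init__(self, label: str):
        self.label = label

    def get_kb_id(self) -> str:
        return self.label


class Term(Keyword):
    """Stand-in for iamsystem.Term: the knowledge base id is the code."""

    def __init__(self, label: str, code: str):
        super().__init__(label)
        self.code = code

    def get_kb_id(self) -> str:  # overrides the label-based default
        return self.code


print(Keyword("heart failure").get_kb_id())     # heart failure
print(Term("heart failure", "I50").get_kb_id())  # I50
```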

Terminology

class iamsystem.Terminology[source]

Bases: IStoreKeywords

A utility class to store a set of keywords.

add_keyword(keyword: IKeyword) → None[source]

Add a keyword.

Parameters

keyword – an IKeyword or subclass instance.

Returns

None

add_keywords(keywords: Iterable[IKeyword]) → None[source]

Add multiple keywords.

Parameters

keywords – an iterable of IKeyword (or subclass) instances.

Returns

None

get_unigrams(tokenizer: ITokenizer, stopwords: IStopwords) → Set[str][source]

Get all the unigrams (single words excluding stopwords) in the keywords.
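A sketch of what “unigrams excluding stopwords” means, assuming a toy tokenizer that lowercases and keeps runs of word characters. keywords_unigrams is hypothetical; the real method takes an ITokenizer and an IStopwords instead of a regex and a set.

```python
import re
from typing import Iterable, Set


def keywords_unigrams(labels: Iterable[str], stopwords: Set[str]) -> Set[str]:
    """Collect the distinct normalized tokens of the keywords, minus stopwords."""
    unigrams: Set[str] = set()
    for label in labels:
        # toy tokenizer: lowercase, then keep runs of word characters
        for token in re.findall(r"\w+", label.lower()):
            if token not in stopwords:
                unigrams.add(token)
    return unigrams


print(sorted(keywords_unigrams(["Heart failure", "failure of the kidney"], {"of", "the"})))
# ['failure', 'heart', 'kidney']
```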

property keywords: Collection[IKeyword]

Get the collection of keywords.

property size: int

Get the number of keywords.

Tokenization

IOffsets

class iamsystem.IOffsets(*args, **kwargs)[source]

Bases: Protocol

Offsets interface. Default implementation: Offsets.

Offsets

class iamsystem.Offsets(start: int, end: int)[source]

Bases: IOffsets

Store the start and end offsets of a token.

__init__(start: int, end: int)[source]
Parameters
  • start – start-offset is the index of the first character of the annotated span.

  • end – end-offset is the index of the first character after the annotated span.

IToken

class iamsystem.IToken(*args, **kwargs)[source]

Bases: IOffsets, Protocol

Token interface. Default implementation: Token.

Token

class iamsystem.Token(start: int, end: int, label: str, norm_label: str)[source]

Bases: Offsets, IToken

Store the label, normalized label, start and end offsets of a token.

__init__(start: int, end: int, label: str, norm_label: str)[source]

Create a token.

Parameters
  • start – start-offset is the index of the first character of the annotated span.

  • end – end-offset is the index of the first character after the annotated span.

  • label – the label as it is in the document.

  • norm_label – the normalized label (used by iamsystem’s algorithm to perform entity linking).
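Because the end offset is the index of the first character after the span, Python slicing text[start:end] recovers the token label. The SimpleToken dataclass below is a hypothetical stand-in for Token, used only to show this convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleToken:
    """Hypothetical stand-in for Token, following the offsets convention above."""
    start: int       # index of the first character of the token
    end: int         # index of the first character after the token
    label: str       # the label as it appears in the document
    norm_label: str  # the normalized label used for matching


text = "Heart Failure"
token = SimpleToken(start=6, end=13, label=text[6:13], norm_label=text[6:13].lower())
assert text[token.start:token.end] == token.label
print(token.label, token.norm_label)  # Failure failure
```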

ITokenizer

class iamsystem.ITokenizer(*args, **kwargs)[source]

Bases: Protocol[TokenT]

Tokenizer interface. Default implementation: TokenizerImp.

tokenize(text: str) → Sequence[TokenT][source]

Tokenize a string.

Parameters

text – an unnormalized string.

Returns

A sequence of generic type (TokenT) that implements IToken protocol.

TokenizerImp

class iamsystem.TokenizerImp(split: Callable[[str], Iterable[IOffsets]], normalize: Callable[[str], str])[source]

Bases: ITokenizer[Token]

An ITokenizer implementation responsible for tokenizing a string and normalizing its tokens. See also french_tokenizer(), english_tokenizer().

__init__(split: Callable[[str], Iterable[IOffsets]], normalize: Callable[[str], str])[source]

Create a custom tokenizer that splits and normalizes a string.

Parameters
  • split – a function that splits a text into (start,end) tuples. This function must return an iterable of IOffsets. See also split_find_iter_closure().

  • normalize – a function that normalizes a string. This function must return a string.

tokenize(text: str) → Sequence[Token][source]

Split the text into a sequence of Token.
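The split/normalize design can be sketched with the standard library. This assumes split returns plain (start, end) tuples instead of IOffsets objects and that each run of word characters is a token; make_tokenizer, split_words and Tok are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Tok:
    start: int
    end: int
    label: str
    norm_label: str


def make_tokenizer(
    split: Callable[[str], Iterable[Tuple[int, int]]],
    normalize: Callable[[str], str],
) -> Callable[[str], List[Tok]]:
    """Combine a split function (offsets) and a normalize function into a tokenizer."""

    def tokenize(text: str) -> List[Tok]:
        tokens = []
        for start, end in split(text):
            label = text[start:end]  # offsets always point into the original text
            tokens.append(Tok(start, end, label, normalize(label)))
        return tokens

    return tokenize


def split_words(text: str) -> Iterable[Tuple[int, int]]:
    # assumption: each run of word characters is one token
    return ((m.start(), m.end()) for m in re.finditer(r"\w+", text))


tokenize = make_tokenizer(split_words, str.lower)
print([t.norm_label for t in tokenize("Acute Heart-Failure")])  # ['acute', 'heart', 'failure']
```

Keeping split and normalize separate means the offsets always refer to the original text, while matching relies on the normalized label.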

english_tokenizer

iamsystem.english_tokenizer() → TokenizerImp[source]
An opinionated English tokenizer.
It splits the text into runs of ‘word’ characters.
It normalizes tokens by lowercasing.
Returns

a TokenizerImp implementation.

french_tokenizer

iamsystem.french_tokenizer() → TokenizerImp[source]
An opinionated French tokenizer.
It splits the text into runs of ‘word’ characters.
It normalizes tokens by lowercasing and Unicode normalization.
Returns

a TokenizerImp implementation.

Build a custom split function

iamsystem.split_find_iter_closure(pattern: str) → Callable[[str], Iterable[IOffsets]][source]

Build a split function that maps a document to (start, end) tuples.

Parameters

pattern – a regex to split sentence characters.

Returns

a split function.

Order tokens

iamsystem.tokenize_and_order_decorator(tokenize: Callable[[str], Sequence[TokenT]]) → Callable[[str], Sequence[TokenT]][source]

Decorate a tokenize function: the tokens are sorted alphabetically by their label.

Parameters

tokenize – a tokenize function to decorate.

Returns

the decorated tokenize function.
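A sketch of the ordering decorator, simplified to tokens that are plain strings (the real decorator sorts token objects by label). Sorting makes word order irrelevant, so “heart failure” and “failure heart” produce the same sequence; order_tokens is a hypothetical name.

```python
import functools
import re
from typing import Callable, List


def order_tokens(tokenize: Callable[[str], List[str]]) -> Callable[[str], List[str]]:
    """Wrap a tokenize function so that its output is sorted alphabetically."""

    @functools.wraps(tokenize)
    def wrapper(text: str) -> List[str]:
        return sorted(tokenize(text))

    return wrapper


@order_tokens
def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


print(tokenize("heart failure"))  # ['failure', 'heart']
print(tokenize("Failure heart"))  # ['failure', 'heart']
```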

Stopwords classes

IStopwords

class iamsystem.IStopwords(*args, **kwargs)[source]

Bases: Protocol[TokenT]

Stopwords Interface.

is_token_a_stopword(token: TokenT) → bool[source]

Check if a token is a stopword.

Parameters

token – a generic Token that implements IToken protocol.

Returns

True if this token is a stopword.

Stopwords

class iamsystem.Stopwords(stopwords: Optional[Iterable[str]] = None)[source]

Bases: SimpleStopwords[TokenT]

A simple implementation of IStopwords protocol.

add(words: Iterable[str]) → None[source]

Add stopwords.

Parameters

words – a list of strings.

Returns

None

is_stopword(word: str) → bool[source]

Return True if the lowercased word belongs to the set of stopwords.

property stopwords

Get the set of stopwords.

NegativeStopwords

class iamsystem.NegativeStopwords(words_to_keep: Optional[Iterable[str]] = None)[source]

Bases: IStopwords[TokenT]

Like a negative image (a total inversion, in which light areas appear dark and vice versa), every token is a stopword until proven otherwise.

add_fun_is_a_word_to_keep(fun: Callable[[TokenT], bool]) → None[source]

Add a function that checks if a word should be kept.

Parameters

fun – a Callable that takes a token as a parameter and returns a boolean.

Returns

None.

add_words(words_to_keep: Iterable[str]) → None[source]

Add words that must not be ignored.

Parameters

words_to_keep – a list of strings.

Returns

None

is_token_a_stopword(token: TokenT) → bool[source]

Check whether the token is not a token to keep.

Parameters

token – a token.

Returns

False if the token’s lowercased form belongs to the set of words to keep, or if a function added with add_fun_is_a_word_to_keep() returns True; True otherwise.
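The negative-stopwords logic can be sketched as follows: a word is a stopword unless it is in the set of words to keep or a keep function accepts it. KeepList and add_keep_fun are hypothetical, and this sketch works on strings rather than tokens.

```python
from typing import Callable, Iterable, List, Optional, Set


class KeepList:
    """Everything is a stopword unless it is explicitly kept."""

    def __init__(self, words_to_keep: Optional[Iterable[str]] = None):
        self.words_to_keep: Set[str] = {w.lower() for w in (words_to_keep or [])}
        self.keep_funs: List[Callable[[str], bool]] = []

    def add_words(self, words: Iterable[str]) -> None:
        self.words_to_keep.update(w.lower() for w in words)

    def add_keep_fun(self, fun: Callable[[str], bool]) -> None:
        self.keep_funs.append(fun)

    def is_stopword(self, word: str) -> bool:
        if word.lower() in self.words_to_keep:
            return False
        return not any(fun(word) for fun in self.keep_funs)


sw = KeepList(["heart", "failure"])
sw.add_keep_fun(str.isdigit)  # also keep numbers
kept = [w for w in "the heart failure of 2 patients".split() if not sw.is_stopword(w)]
print(kept)  # ['heart', 'failure', '2']
```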

NoStopwords

class iamsystem.NoStopwords(*args, **kwargs)[source]

Bases: SimpleStopwords[TokenT]

Utility class to use when no stopwords are needed.

is_stopword(word: str) → bool[source]

Return False.

is_token_a_stopword(token: TokenT) → bool[source]

Return False.

Fuzzy algorithms

Abstract Base classes

FuzzyAlgo

class iamsystem.FuzzyAlgo(name: str)[source]

Bases: Generic[TokenT], ABC

Fuzzy Algorithm base class.

NO_SYN: Iterable[Tuple[str, ...]] = []

Default value returned by a fuzzy algorithm when no synonym is found.

abstract get_synonyms(tokens: Sequence[TokenT], i: int, w_states: List[List[IState]]) → Iterable[Tuple[Tuple[str, ...], str]][source]

Main API function to retrieve all synonyms provided by a fuzzy algorithm.

Parameters
  • tokens – the sequence of tokens of the document. Useful when the fuzzy algorithm needs context, namely the tokens around the token of interest given by the ‘i’ parameter.

  • i – the ith token of this sequence for which synonyms are expected.

  • w_states – the states in which the algorithm currently is. Useful if the fuzzy algorithm needs to know the current states and the possible state transitions.

Returns

0 to many synonyms (SynAlgo type).

static word_to_syn(word: str) → Tuple[str, ...][source]

Utility function to transform a string to the expected SynType.

Parameters

word – a word synonym produced by the algorithm. Ex: word=‘insuffisance’ for the token ‘ins’.

Returns

SynType, the expected output format.

static words_seq_to_syn(words: Sequence[str]) → Tuple[str, ...][source]

Utility function to transform a sequence of string to the expected output type.

Parameters

words – a sequence of words produced by the algorithm. Ex: words=[‘insuffisance’, ‘cardiaque’] for the token ‘ic’.

Returns

SynType, the expected output format.

ContextFreeAlgo

class iamsystem.ContextFreeAlgo(name: str)[source]

Bases: FuzzyAlgo[TokenT], ABC

A FuzzyAlgo that doesn’t take into account context, only the current token.

get_synonyms(tokens: Sequence[TokenT], i: int, w_states: List[List[IState]]) → Iterable[Tuple[Tuple[str, ...], str]][source]

Delegate to get_syns_of_token.

abstract get_syns_of_token(token: TokenT) → Iterable[Tuple[str, ...]][source]

Returns synonyms of this token.
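A sketch of the FuzzyAlgo / ContextFreeAlgo split, using simplified stand-in base classes: tokens are plain strings and the w_states argument is omitted. A context-free algorithm only looks at tokens[i] and tags each synonym with its name. All class names and the spelling table are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List, Sequence, Tuple

SynType = Tuple[str, ...]


class ToyFuzzyAlgo(ABC):
    """Simplified stand-in for FuzzyAlgo: returns (synonym, algo name) pairs."""

    NO_SYN: List[SynType] = []

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def get_synonyms(self, tokens: Sequence[str], i: int) -> Iterable[Tuple[SynType, str]]:
        ...


class ToyContextFreeAlgo(ToyFuzzyAlgo):
    """Ignores the context: only tokens[i] is passed to get_syns_of_token."""

    def get_synonyms(self, tokens: Sequence[str], i: int) -> Iterable[Tuple[SynType, str]]:
        return [(syn, self.name) for syn in self.get_syns_of_token(tokens[i])]

    @abstractmethod
    def get_syns_of_token(self, token: str) -> Iterable[SynType]:
        ...


class UkToUs(ToyContextFreeAlgo):
    """Example algorithm: maps a few British spellings to American ones."""

    SPELLINGS = {"oedema": "edema", "anaemia": "anemia"}

    def get_syns_of_token(self, token: str) -> Iterable[SynType]:
        if token in self.SPELLINGS:
            return [(self.SPELLINGS[token],)]  # a one-word synonym tuple
        return self.NO_SYN


algo = UkToUs("uk_to_us")
print(list(algo.get_synonyms(["pulmonary", "oedema"], 1)))  # [(('edema',), 'uk_to_us')]
```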

NormLabelAlgo

class iamsystem.NormLabelAlgo(name: str)[source]

Bases: ContextFreeAlgo[TokenT], INormLabelAlgo, ABC

A FuzzyAlgo that uses only the normalized label of a token. The output of these fuzzy algorithms can be cached to avoid calling them multiple times. See CacheFuzzyAlgos.

get_syns_of_token(token: TokenT) → Iterable[Tuple[str, ...]][source]

Delegate to get_syns_of_word.

abstract get_syns_of_word(word: str) → Iterable[Tuple[str, ...]][source]

Returns synonyms of this word (e.g. the normalized label of a token).

CacheFuzzyAlgos

class iamsystem.CacheFuzzyAlgos(name: str = 'Cache')[source]

Bases: FuzzyAlgo, Generic[TokenT]

A FuzzyAlgo that provides a cache for NormLabelAlgo algorithms. Since these algorithms don’t depend on context, their output can be cached to avoid calling them multiple times.

add_algo(algo: INormLabelAlgo) → None[source]

Add a NormLabelAlgo.

empty_cache() → None[source]

Empty the cache. Done automatically when an algorithm is added.

get_synonyms(tokens: Sequence[IToken], i: int, w_states: List[List[IState]]) → List[Tuple[Tuple[str, ...], str]][source]

Implement the superclass abstract method.

get_syns_of_word(word: str) → List[Tuple[Tuple[str, ...], str]][source]

Retrieve the synonyms produced by all fuzzy algorithms, either from the cache or by calling each algorithm once.

property max_nb_of_words

The maximum number of words to put in the cache. Defaults to 100,000 words.
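A sketch of the caching idea, assuming word-level algorithms are plain functions that return synonym tuples. SynonymCache is hypothetical; in particular, what the library does once max_nb_of_words is reached is not documented here, and this sketch simply stops adding entries.

```python
from typing import Callable, Dict, List, Tuple

SynType = Tuple[str, ...]
WordAlgo = Callable[[str], List[SynType]]


class SynonymCache:
    """Calls each word-level algorithm at most once per cached word."""

    def __init__(self, max_nb_of_words: int = 100_000):
        self.max_nb_of_words = max_nb_of_words
        self.algos: Dict[str, WordAlgo] = {}
        self.cache: Dict[str, List[Tuple[SynType, str]]] = {}
        self.calls = 0  # counts real algorithm calls, to show the cache working

    def add_algo(self, name: str, algo: WordAlgo) -> None:
        self.algos[name] = algo
        self.cache.clear()  # cached results are stale once the algorithms change

    def get_syns_of_word(self, word: str) -> List[Tuple[SynType, str]]:
        if word in self.cache:
            return self.cache[word]
        syns: List[Tuple[SynType, str]] = []
        for name, algo in self.algos.items():
            self.calls += 1
            syns.extend((syn, name) for syn in algo(word))
        if len(self.cache) < self.max_nb_of_words:  # stop caching once full
            self.cache[word] = syns
        return syns


cache = SynonymCache()
cache.add_algo("plural", lambda w: [(w[:-1],)] if w.endswith("s") else [])
cache.get_syns_of_word("kidneys")
cache.get_syns_of_word("kidneys")  # served from the cache
print(cache.calls)  # 1
```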

Abbreviations

class iamsystem.Abbreviations(name: str, token_is_an_abbreviation: Callable[[TokenT], bool] = <function Abbreviations.<lambda>>)[source]

Bases: ContextFreeAlgo[TokenT], INormLabelAlgo, ABC

A FuzzyAlgo to handle abbreviations. This class doesn’t take into account the context of a document to return a long form.

__init__(name: str, token_is_an_abbreviation: Callable[[TokenT], bool] = <function Abbreviations.<lambda>>)[source]

Create an instance to store abbreviations.

Parameters
  • name – a name given to this algorithm (ex: ‘medical abbs’).

  • token_is_an_abbreviation – a function that checks whether a token is an abbreviation (ex: checks that all letters are uppercase). The function is called before the dictionary look-up that retrieves long forms. Default: no check is performed; the function always returns True.

add(short_form: str, long_form: str, tokenizer: ITokenizer) → None[source]

Add an abbreviation.

Parameters
  • short_form – an abbreviation short form (ex: CHF).

  • long_form – an abbreviation long form (ex: congestive heart failure).

  • tokenizer – an ITokenizer to tokenize the long form. It is recommended to use your Matcher’s tokenizer.

Returns

None.

add_tokenized_long_form(short_form, long_form: Sequence[str]) → None[source]

Add an abbreviation already tokenized.

get_syns_of_token(token: TokenT) → Iterable[Tuple[str, ...]][source]

Return the abbreviation long form(s).

get_syns_of_word(word: str) → Iterable[Tuple[str, ...]][source]

Return the abbreviation long form(s).
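A dictionary-based sketch of the abbreviation look-up described above, using str.isupper as the token_is_an_abbreviation check and a toy tokenizer for the long forms. AbbreviationTable is hypothetical, and it works on strings rather than tokens.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

SynType = Tuple[str, ...]


class AbbreviationTable:
    """Maps a lowercased short form to its tokenized long forms."""

    def __init__(self, is_abbreviation: Callable[[str], bool] = lambda token: True):
        self.is_abbreviation = is_abbreviation
        self.long_forms: Dict[str, List[SynType]] = defaultdict(list)

    def add(self, short_form: str, long_form: str) -> None:
        # toy tokenizer: lowercase runs of word characters
        tokens = tuple(re.findall(r"\w+", long_form.lower()))
        self.long_forms[short_form.lower()].append(tokens)

    def get_syns_of_token(self, token: str) -> List[SynType]:
        if not self.is_abbreviation(token):  # checked before the dictionary look-up
            return []
        return list(self.long_forms.get(token.lower(), []))


abbs = AbbreviationTable(is_abbreviation=str.isupper)
abbs.add("CHF", "congestive heart failure")
print(abbs.get_syns_of_token("CHF"))  # [('congestive', 'heart', 'failure')]
print(abbs.get_syns_of_token("chf"))  # [] because the check rejects lowercase tokens
```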

FuzzyRegex

class iamsystem.FuzzyRegex(algo_name: str, pattern: str, pattern_name: str)[source]

Bases: ContextFreeAlgo, INormLabelAlgo

A FuzzyAlgo to handle regular expressions. Useful when one or multiple tokens of a keyword need to be matched to a regular expression.

get_syns_of_token(token: TokenT) → Iterable[Tuple[str, ...]][source]

Return the pattern_name if this token matches the regular expression.

get_syns_of_word(word: str) → Iterable[Tuple[str, ...]][source]

Return the pattern_name if this word matches the regular expression.

replace_pattern_in_keyword(keyword: IKeyword, tokenizer: ITokenizer) → IKeyword[source]

Utility function to replace a keyword’s tokens that match the pattern with the pattern name.

token_matches_pattern(token: TokenT) → bool[source]

Return True if this token matches this instance’s pattern.
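A sketch of the regex approach, assuming a token matches when the whole token matches the pattern (re.fullmatch). The pattern, the ‘NUMVAL’ name and the RegexSyn class are illustrative assumptions. The keyword’s number is replaced by the pattern name, so any number in a document can then match it.

```python
import re
from typing import List, Tuple

SynType = Tuple[str, ...]


class RegexSyn:
    """A token that matches the pattern gets pattern_name as its synonym."""

    def __init__(self, pattern: str, pattern_name: str):
        self.regex = re.compile(pattern)
        self.pattern_name = pattern_name

    def get_syns_of_word(self, word: str) -> List[SynType]:
        return [(self.pattern_name,)] if self.regex.fullmatch(word) else []

    def replace_pattern_in_keyword(self, keyword: str) -> str:
        # toy whitespace tokenization of the keyword
        return " ".join(
            self.pattern_name if self.regex.fullmatch(tok) else tok
            for tok in keyword.split()
        )


numval = RegexSyn(r"\d+(?:[.,]\d+)?", "NUMVAL")
keyword = numval.replace_pattern_in_keyword("calcium 2.5 mmol")
print(keyword)                         # calcium NUMVAL mmol
print(numval.get_syns_of_word("2.7"))  # [('NUMVAL',)]
```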

WordNormalizer

class iamsystem.WordNormalizer(name: str, norm_fun: Callable[[str], str])[source]

Bases: NormLabelAlgo

A FuzzyAlgo to handle normalization techniques such as stemming and lemmatization.

add_words(words: Iterable[str]) → None[source]

Add a list of possible word synonyms, usually all the tokens of your keywords. An easy way to provide these tokens is to call the matcher’s get_keywords_unigrams() method.

Parameters

words – A list of words to normalize and store.

Returns

None.

get_syns_of_word(word: str) → Iterable[Tuple[str, ...]][source]

Return all the words that have the same normalized form as this word.

For example, if the normalize function is an english stemmer, and you provided add_words=[“eating”], this instance stored the stem “eat” associated to the word “eating”. Then, if a document contains the token “eats”, since the stem is the same, this function returns the synonym “eating”.

Parameters

word – a string, i.e. a word from a document.

Returns

word synonyms and algorithm name.
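The “eating”/“eats” example can be sketched with a deliberately naive suffix-stripping function standing in for a real stemmer. toy_stem and NormalizedIndex are hypothetical; in practice, norm_fun would come from a stemming or lemmatization library.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

SynType = Tuple[str, ...]


def toy_stem(word: str) -> str:
    """Deliberately naive suffix stripper (not a real stemmer)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


class NormalizedIndex:
    """Stores keyword words under their normalized form."""

    def __init__(self, norm_fun: Callable[[str], str]):
        self.norm_fun = norm_fun
        self.index: Dict[str, Set[str]] = defaultdict(set)

    def add_words(self, words: Iterable[str]) -> None:
        for word in words:
            self.index[self.norm_fun(word)].add(word)

    def get_syns_of_word(self, word: str) -> List[SynType]:
        # every stored word sharing the normalized form is a synonym
        return [(w,) for w in sorted(self.index.get(self.norm_fun(word), ()))]


normalizer = NormalizedIndex(toy_stem)
normalizer.add_words(["eating"])  # stored under the stem "eat"
print(normalizer.get_syns_of_word("eats"))  # [('eating',)]
```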

SpellWise

SpellWiseWrapper

class iamsystem.SpellWiseWrapper(spellwise_algo: ESpellWiseAlgo, max_distance: int, min_nb_char: int = 5, name: str = None)[source]

Bases: NormLabelAlgo

A FuzzyAlgo that wraps an algorithm from the spellwise library.

add_words(words: Iterable[str], warn=False) → None[source]

Add a list of possible word synonyms, usually all the tokens of your keywords. An easy way to provide these tokens is to call the get_keywords_unigrams() method after adding your keywords to the matcher instance.

Parameters
  • words – A list of possible synonyms.

  • warn – if True, raise a warning when an added word is ignored. Defaults to False.

Returns

None.

add_words_to_ignore(words: Iterable[str])[source]

Add words that the algorithm will ignore: no string distance will be computed.

get_syns_of_word(word: str) → Iterable[Tuple[str, ...]][source]

Return the closest words, unless the word is a word to ignore.

property max_distance

Maximum edit distance (see spellwise documentation).

property min_nb_char

The minimum number of characters a word must have not to be ignored.
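A brute-force sketch of the same idea, using a plain Levenshtein distance instead of the spellwise library and honoring max_distance and min_nb_char as described above. TypoMatcher is hypothetical; spellwise algorithms use their own indexes and distances.

```python
from typing import Iterable, List, Set, Tuple

SynType = Tuple[str, ...]


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


class TypoMatcher:
    """Returns the stored words within max_distance edits of a document word."""

    def __init__(self, max_distance: int, min_nb_char: int = 5):
        self.max_distance = max_distance
        self.min_nb_char = min_nb_char
        self.words: Set[str] = set()
        self.words_to_ignore: Set[str] = set()

    def add_words(self, words: Iterable[str]) -> None:
        self.words.update(words)

    def add_words_to_ignore(self, words: Iterable[str]) -> None:
        self.words_to_ignore.update(words)

    def get_syns_of_word(self, word: str) -> List[SynType]:
        if len(word) < self.min_nb_char or word in self.words_to_ignore:
            return []  # no distance is computed for short or ignored words
        return [(w,) for w in sorted(self.words)
                if levenshtein(word, w) <= self.max_distance]


typos = TypoMatcher(max_distance=1)
typos.add_words(["insuffisance", "cardiaque"])
print(typos.get_syns_of_word("cardiaqe"))  # [('cardiaque',)]
print(typos.get_syns_of_word("card"))      # [] (shorter than min_nb_char)
```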

ESpellWiseAlgo

class iamsystem.ESpellWiseAlgo(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Enumerated list of spellwise library algorithms. See spellwise documentation for more information.

CAVERPHONE_1 = <class 'spellwise.algorithms.caverphone_one.CaverphoneOne'>
CAVERPHONE_2 = <class 'spellwise.algorithms.caverphone_two.CaverphoneTwo'>
EDITEX = <class 'spellwise.algorithms.editex.Editex'>
LEVENSHTEIN = <class 'spellwise.algorithms.levenshtein.Levenshtein'>
SOUNDEX = <class 'spellwise.algorithms.soundex.Soundex'>
TYPOX = <class 'spellwise.algorithms.typox.Typox'>

Brat

BratDocument

class iamsystem.BratDocument[source]

Bases: object

Class representing a Brat document that contains Brat annotations; in this package, these are Brat entities and Brat notes. A BratDocument should be linked to a single text document. Entities and notes can be serialized, one per line, in a text file with the ‘.ann’ extension. See https://brat.nlplab.org/standoff.html

add_annots(annots: List[Annotation], text: str, keyword_attr: str = None, brat_type: str = None) → None[source]

Add iamsystem annotations to convert them to Brat format.

Parameters
  • annots – a list of Annotation, Matcher output.

  • text – the document these annotations come from.

  • keyword_attr – the name of the IKeyword attribute that stores the brat_type. Defaults to None. If None, the brat_type parameter must be used.

  • brat_type – A string, the Brat entity type for all these annotations. Defaults to None. If None, the keyword_attr parameter must be used.

Returns

None

add_entity(brat_type: str, offsets: List[IOffsets], text: str) → None[source]

Add a Brat Entity.

Parameters
  • brat_type – A Brat entity type (see Brat documentation).

  • offsets – a list of (start,end) annotation offsets. See IOffsets. A list is expected since the tokens can be discontinuous.

  • text – document substring using (start,end) offsets (not the document itself).

Returns

None

entities_to_string() → str[source]

Brat entities in the Brat format ready to be serialized to ‘.ann’ text file.

get_entities() → Iterable[BratEntity][source]

An iterable of Brat entities.

get_notes() → Iterable[BratNote][source]

An iterable of Brat notes.

notes_to_string() → str[source]

Brat notes in the Brat format ready to be serialized to ‘.ann’ text file.

BratEntity

class iamsystem.BratEntity(entity_id: str, brat_type: str, offsets: Sequence[IOffsets], text: str)[source]

Bases: object

Class representing a Brat Entity. https://brat.nlplab.org/standoff.html: ‘Each entity annotation has a unique ID and is defined by type (e.g. Person or Organization). and the span of characters containing the entity mention (represented as a “start end” offset pair).’

Format: ID TYPE START END[;START END]* TEXT.

__init__(entity_id: str, brat_type: str, offsets: Sequence[IOffsets], text: str)[source]

Create a Brat Entity.

Parameters
  • entity_id – a unique ID (^T[0-9]+$).

  • brat_type – A Brat entity type (see Brat documentation).

  • offsets – (start,end) annotation offsets. See IOffsets.

  • text – document substring using (start,end) offsets.
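A sketch of one entity line in the standoff format, assuming tab separators after the ID and before the text (as on the Brat standoff page) and, for discontinuous spans, fragments joined with a space in the TEXT field. brat_entity_line is a hypothetical helper and ‘Disease’ an arbitrary entity type.

```python
from typing import Sequence, Tuple


def brat_entity_line(entity_id: str, brat_type: str,
                     offsets: Sequence[Tuple[int, int]], text: str) -> str:
    """Format one entity as: ID<TAB>TYPE START END[;START END]*<TAB>TEXT."""
    spans = ";".join(f"{start} {end}" for start, end in offsets)
    return f"{entity_id}\t{brat_type} {spans}\t{text}"


doc = "heart severe failure"
offsets = [(0, 5), (13, 20)]  # "heart" and "failure": a discontinuous mention
mention = " ".join(doc[start:end] for start, end in offsets)  # "heart failure"
line = brat_entity_line("T1", "Disease", offsets, mention)
print(line)
```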

BratNote

class iamsystem.BratNote(note_id: str, ref_id: str, note: str)[source]

Bases: object

Class representing a Brat Note (see https://brat.nlplab.org/standoff.html). Brat notes store additional information about a detected entity. Format: #ID TYPE REFID NOTE

__init__(note_id: str, ref_id: str, note: str)[source]

Create a Brat Note.

Parameters
  • note_id – a unique ID (^#[0-9]+$)

  • ref_id – a unique ID. For a BratEntity, the format is (^T[0-9]+$)

  • note – any string comment.

TYPE = 'IAMSYSTEM'

BratNote type. Replace it with ‘AnnotatorNotes’ to make notes editable by humans in the Brat interface.

BratWriter

class iamsystem.BratWriter[source]

Bases: object

Utility class to write IAMsystem annotations in Brat format to a text file.

classmethod saveEntities(brat_entities: Iterable[BratEntity], write: Callable[[str], Any]) → None[source]

Write Brat entities.

Parameters
  • brat_entities – an iterable of Brat entities.

  • write – a write function (ex: f.write from ‘with open(filename, "w") as f:’)

Returns

None

classmethod saveNotes(brat_notes: Iterable[BratNote], write: Callable[[str], Any]) → None[source]

Write Brat notes.

Parameters
  • brat_notes – an iterable of Brat notes.

  • write – a write function (ex: f.write from ‘with open(filename, "w") as f:’)

Returns

None
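Since write is any callable that takes a string, an in-memory buffer works as well as an open file. The sketch below assumes each entity or note string is written on its own line; save_lines is hypothetical, and the lines are illustrative (the note uses the default BratNote TYPE, ‘IAMSYSTEM’).

```python
import io
from typing import Any, Callable, Iterable


def save_lines(lines: Iterable[str], write: Callable[[str], Any]) -> None:
    """Write one Brat annotation per line through any write callable."""
    for line in lines:
        write(line + "\n")


entities = ["T1\tDisease 0 5;13 20\theart failure"]
notes = ["#1\tIAMSYSTEM T1\theart failure"]  # a note that refers to entity T1

buffer = io.StringIO()  # stands in for f from: with open("doc.ann", "w") as f
save_lines(entities, buffer.write)
save_lines(notes, buffer.write)
print(buffer.getvalue(), end="")
```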

spaCy

IAMsystemSpacy

TokenSpacyAdapter

IsStopSpacy

SpacyTokenizer