API Reference¶
An overview of chirptext modules.
Chirp Text - Minimalist Text Processing Library
Enhanced IO module¶
Chirptext’s enhanced IO functions
- chirptext.chio.iter_csv_stream(input_stream, fieldnames=None, sniff=False, *args, **kwargs)[source]¶
Read CSV content as a table (list of lists) from an input stream
- chirptext.chio.process_file(path, processor, encoding='utf-8', mode='rt', *args, **kwargs)[source]¶
Process a text file’s content. If the file name ends with .gz, read it as gzip file
- chirptext.chio.read(path, encoding='utf-8', *args, **kwargs)¶
Read text file content. If the file name ends with .gz, read it as gzip file. If mode argument is provided as ‘rb’, content will be read as byte stream. By default, content is read as text (string).
# Read content as text
>>> txt = chio.read_file("sample.txt")
# Read content as binary (bytes)
>>> bin = chio.read_file("sample.dat.gz", mode="rb")
- Parameters
encoding – Defaults to UTF-8. Ignored when the reading mode is 'rb'.
- chirptext.chio.read_csv(path, fieldnames=None, sniff=True, encoding='utf-8', *args, **kwargs)[source]¶
Read CSV rows as a table from a file. By default, csv.reader() is used and the output is a list of lists. If fieldnames is provided, DictReader is used and the output is a list of OrderedDict instead. CSV sniffing (dialect detection) is enabled by default; set sniff=False to switch it off.
- chirptext.chio.read_csv_iter(path, fieldnames=None, sniff=True, mode='rt', encoding='utf-8', *args, **kwargs)[source]¶
Iterate through CSV rows in a file. By default, csv.reader() is used and each row is a list. If fieldnames is provided, DictReader is used and each row is an OrderedDict instead. CSV sniffing (dialect detection) is enabled by default; set sniff=False to switch it off.
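The reader selection and sniffing behaviour described for the CSV functions above can be approximated with the standard library alone. This is a minimal sketch, not chirptext's implementation: the name sketch_iter_csv, the 1024-character sample size and the fallback to the excel dialect are assumptions made for illustration.

```python
import csv
import io


def sketch_iter_csv(stream, fieldnames=None, sniff=True):
    """Yield CSV rows from a seekable text stream.

    Rows are lists by default, or dict-like objects when fieldnames is given.
    (Hypothetical helper, written for illustration only.)
    """
    dialect = "excel"
    if sniff:
        sample = stream.read(1024)  # assumed sample size
        stream.seek(0)
        try:
            dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
        except csv.Error:
            pass  # could not detect a dialect, keep the default
    if fieldnames:
        # With explicit fieldnames, the first line is treated as data
        yield from csv.DictReader(stream, fieldnames=fieldnames, dialect=dialect)
    else:
        yield from csv.reader(stream, dialect=dialect)


rows = list(sketch_iter_csv(io.StringIO("name;age\nAlice;30\nBob;25\n")))
dict_rows = list(
    sketch_iter_csv(io.StringIO("Alice,30\nBob,25\n"),
                    fieldnames=["name", "age"], sniff=False)
)
```

Dialect detection is heuristic, so switching it off (sniff=False) is the safer choice when the delimiter is already known.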
- chirptext.chio.read_file(path, encoding='utf-8', *args, **kwargs)[source]¶
Read text file content. If the file name ends with .gz, read it as gzip file. If mode argument is provided as ‘rb’, content will be read as byte stream. By default, content is read as text (string).
# Read content as text
>>> txt = chio.read_file("sample.txt")
# Read content as binary (bytes)
>>> bin = chio.read_file("sample.dat.gz", mode="rb")
- Parameters
encoding – Defaults to UTF-8. Ignored when the reading mode is 'rb'.
- chirptext.chio.write(path, content, mode=None, encoding='utf-8')¶
Write content to a file. If the path ends with .gz, gzip will be used.
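The .gz convention used by the read and write functions above can be illustrated with the standard library. The following is a sketch under stated assumptions, not the library's code: sketch_read and sketch_write are hypothetical names, and choosing the opener from the file-name suffix is simply the most direct reading of the descriptions.

```python
import gzip
import os
import tempfile


def sketch_read(path, mode="rt", encoding="utf-8"):
    """Read a file, decompressing it when the name ends with .gz."""
    opener = gzip.open if str(path).endswith(".gz") else open
    if "b" in mode:
        # Binary mode returns bytes; an encoding must not be passed
        with opener(path, mode) as infile:
            return infile.read()
    with opener(path, mode, encoding=encoding) as infile:
        return infile.read()


def sketch_write(path, content, encoding="utf-8"):
    """Write str or bytes, compressing when the name ends with .gz."""
    opener = gzip.open if str(path).endswith(".gz") else open
    if isinstance(content, bytes):
        with opener(path, "wb") as outfile:
            outfile.write(content)
    else:
        with opener(path, "wt", encoding=encoding) as outfile:
            outfile.write(content)


tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "sample.txt.gz")
sketch_write(gz_path, "xin chào")
text = sketch_read(gz_path)            # decoded text
raw = sketch_read(gz_path, mode="rb")  # decompressed bytes
```

In binary mode the encoding argument plays no role, which matches the note on the encoding parameter of read_file.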
- chirptext.chio.write_csv(path, rows, dialect='excel', fieldnames=None, quoting=1, extrasaction='ignore', encoding='utf-8', newline='', *args, **kwargs)[source]¶
Write rows data to a CSV file (with or without fieldnames)
By default content will be written in excel-csv dialect. This can be changed by using the optional argument dialect.
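For reference, quoting=1 in the signature above is the value of csv.QUOTE_ALL, so every field is quoted by default. The sketch below shows how the documented arguments map onto the standard csv writers. It is an illustration only: sketch_write_csv is a hypothetical name, it writes to an open stream rather than a path, and it writes no header row, which is an assumption rather than documented behaviour.

```python
import csv
import io


def sketch_write_csv(stream, rows, dialect="excel", fieldnames=None,
                     quoting=csv.QUOTE_ALL, extrasaction="ignore"):
    """Write rows (lists, or dicts when fieldnames is given) to a text stream."""
    if fieldnames:
        # extrasaction="ignore" silently drops keys not listed in fieldnames
        writer = csv.DictWriter(stream, fieldnames=fieldnames, dialect=dialect,
                                quoting=quoting, extrasaction=extrasaction)
    else:
        writer = csv.writer(stream, dialect=dialect, quoting=quoting)
    writer.writerows(rows)


list_buffer = io.StringIO()
sketch_write_csv(list_buffer, [["Alice", 30]])

dict_buffer = io.StringIO()
sketch_write_csv(dict_buffer, [{"name": "Bob", "age": 25, "extra": "x"}],
                 fieldnames=["name", "age"])
```

When writing to a real file, the csv module documentation recommends opening it with newline='', which is also the default shown in the write_csv signature.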
Text annotation (TTL) module¶
Text Annotation (texttaglib - TTL) module
Japanese parser¶
Convenient Japanese text parser that produces results in TTL format
- chirptext.deko.analyse(content, splitlines=True, format=None, **kwargs)[source]¶
Japanese text > tokenize/txt/html
- chirptext.deko.get_mecab_bin()¶
Get MeCab binary location
- chirptext.deko.set_mecab_bin(location)¶
Set MeCab binary location
Chinese character radicals¶
Tools for processing Chinese
- class chirptext.sino.Radical(idseq='', radical='', variants='', strokes='', meaning='', pinyin='', hanviet='', hiragana='', romaji='', hangeul='', romaja='', frequency='', simplified='', examples='')[source]¶
Chinese radical. Source: https://en.wikipedia.org/wiki/Kangxi_radical#Table_of_radicals
Swadesh list¶
Language profile: UK English
Vietnamese support functions¶
Dao Phay: A collection of tools for processing Vietnamese text using Python.
Utilities¶
Miscellaneous tools for text processing
- class chirptext.leutile.AppConfig(name, mode='ini', working_dir='.', extra_potentials=None)[source]¶
Application configuration helper. This class supports guessing the configuration file location and reads either INI (default) or JSON format.
- property config¶
Read config automatically if required
- property config_path¶
Path to config file
- read_config(key, strict=False, **kwargs)[source]¶
Read a config by key
Default value can be passed by using the kwarg default
>>> read_config(key, default='my value')
- Parameters
key – configuration key
strict – Set to True to raise KeyError if the config key was not set. Defaults to False
default – Optional kwarg to set default value when key could not be found
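The interaction between strict and default can be pictured with a plain dictionary lookup. This is a hypothetical sketch of the documented contract, not AppConfig's code: sketch_read_config is an invented name, and letting strict take precedence over default is an assumption.

```python
def sketch_read_config(config, key, strict=False, **kwargs):
    """Look up key in a dict-like config, following the documented contract."""
    if key in config:
        return config[key]
    if strict:
        # Assumption: strict wins even when a default is supplied
        raise KeyError(key)
    return kwargs.get("default")


settings = {"language": "en"}
found = sketch_read_config(settings, "language")
fallback = sketch_read_config(settings, "theme", default="my value")
missing = sketch_read_config(settings, "theme")
```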
- class chirptext.leutile.FileHub(*filenames, working_dir='.', default_mode='a', ext='txt')[source]¶
A helper class for working with multiple text reports at the same time
- class chirptext.leutile.Table(header=True, padding=True, NoneValue=None)[source]¶
A text-based table which can be used with TextReport
- chirptext.leutile.hamilton_allocate(numbers, total=100, precision=2)[source]¶
Use largest remainder (Hamilton) method to make sure rounded percentages add up to 100
>>> hamilton_allocate((33.33, 33.33, 33.33))
[33.34, 33.33, 33.33]
>>> hamilton_allocate((24.99, 24.99, 24.99, 24.99))
[25.0, 25.0, 25.0, 25.0]
>>> hamilton_allocate((76.69, 20.83, 2.49))
[76.69, 20.83, 2.48]
>>> hamilton_allocate([13.626332, 47.989636, 9.596008, 28.788024])
[13.63, 47.99, 9.59, 28.79]
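For readers unfamiliar with the technique, here is a textbook largest remainder allocation. It is a sketch, not the code behind hamilton_allocate: the function name largest_remainder is hypothetical, inputs are normalised by their sum before rounding, total is assumed to be an integer, and ties go to the earlier item. Those choices mean results can differ from the library when the inputs do not already add up to total.

```python
from fractions import Fraction
from math import floor


def largest_remainder(numbers, total=100, precision=2):
    """Round numbers so that they sum exactly to total (textbook method)."""
    scale = 10 ** precision
    values = [Fraction(n) for n in numbers]  # exact arithmetic, no float drift
    grand_total = sum(values)                # assumed to be non-zero
    # Exact share of each item, counted in units of 10**-precision
    quotas = [v / grand_total * total * scale for v in values]
    units = [floor(q) for q in quotas]
    leftover = total * scale - sum(units)
    # Hand out the leftover units to the largest fractional remainders;
    # sorted() is stable, so ties are resolved in favour of earlier items
    order = sorted(range(len(quotas)),
                   key=lambda i: quotas[i] - units[i], reverse=True)
    for i in order[:leftover]:
        units[i] += 1
    return [u / scale for u in units]


thirds = largest_remainder((33.33, 33.33, 33.33))
mixed = largest_remainder([13.626332, 47.989636, 9.596008, 28.788024])
```

Counting in integer units of 10**-precision keeps the final sum exact; the conversion back to float happens only once, at the end.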
Command-line applications¶
Command-line interface helper
- class chirptext.cli.CLIApp(desc, add_vq=True, add_tasks=True, **kwargs)[source]¶
A simple template for command-line interface applications
- property logger¶
Lazy logger
- chirptext.cli.setup_logging(config_path, log_dir=None, force_setup=False, default_level=30, silent=True)[source]¶
Try to load logging configuration from a file. Set level to INFO if failed.
- Parameters
config_path – Path to the logging config file (JSON)
log_dir – Path to log output directory. When log_dir is not None and the directory does not exist, it will be created automatically.
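The load-or-fall-back pattern described for setup_logging can be sketched with logging.config.dictConfig. This is an illustration under assumptions, not the library's code: sketch_setup_logging is a hypothetical name, the boolean return value is added for demonstration, and the fallback level is taken from the default_level argument.

```python
import json
import logging
import logging.config
import os
import tempfile


def sketch_setup_logging(config_path, log_dir=None, default_level=logging.WARNING):
    """Configure logging from a JSON file; fall back to basicConfig on failure.

    Returns True when the file was loaded, False when the fallback was used.
    """
    if log_dir is not None:
        # Create the log output directory if it does not exist yet
        os.makedirs(log_dir, exist_ok=True)
    try:
        with open(config_path, encoding="utf-8") as config_file:
            config = json.load(config_file)
        logging.config.dictConfig(config)
        return True
    except (OSError, ValueError):
        # Missing file, invalid JSON or invalid logging configuration
        logging.basicConfig(level=default_level)
        return False


base_dir = tempfile.mkdtemp()
logs_dir = os.path.join(base_dir, "logs")
loaded_missing = sketch_setup_logging(os.path.join(base_dir, "missing.json"),
                                      log_dir=logs_dir)

valid_path = os.path.join(base_dir, "logging.json")
with open(valid_path, "w", encoding="utf-8") as handle:
    json.dump({"version": 1, "disable_existing_loggers": False}, handle)
loaded_valid = sketch_setup_logging(valid_path)
```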
Python data mapping functions¶
Data mapping functions
- class chirptext.anhxa.TypedJSONEncoder(*args, type_map=None, **kwargs)[source]¶
- default(obj)[source]¶
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
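As a generic illustration of overriding default with a per-type lookup, the sketch below dispatches on a dictionary of converter functions. ConverterEncoder and its converters argument are hypothetical; they are not TypedJSONEncoder, and converters is not claimed to have the same semantics as type_map.

```python
import json
from datetime import date


class ConverterEncoder(json.JSONEncoder):
    """JSON encoder that converts unsupported objects through a type lookup."""

    def __init__(self, *args, converters=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.converters = dict(converters or {})

    def default(self, o):
        for cls, convert in self.converters.items():
            if isinstance(o, cls):
                return convert(o)
        # Let the base class raise the TypeError for unknown types
        return super().default(o)


# json.dumps forwards unknown keyword arguments to the encoder class
encoded = json.dumps(
    {"day": date(2020, 1, 31), "tags": {"a"}},
    cls=ConverterEncoder,
    converters={date: date.isoformat, set: sorted},
)
```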
- class chirptext.anhxa.TypelessSONEncoder(*args, type_map=None, **kwargs)[source]¶
- default(obj)[source]¶
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)