.. chirptext documentation master file, created by
   sphinx-quickstart on Mon May 3 11:34:14 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to chirptext's documentation!
=====================================
ChirpText is free and open-source software: a collection of text processing
tools for Python.

.. image:: https://img.shields.io/lgtm/alerts/g/letuananh/chirptext.svg?logo=lgtm&logoWidth=18
   :target: https://lgtm.com/projects/g/letuananh/chirptext/alerts/
.. image:: https://img.shields.io/lgtm/grade/python/g/letuananh/chirptext.svg?logo=lgtm&logoWidth=18
   :target: https://lgtm.com/projects/g/letuananh/chirptext/context:python

It is not meant to be a powerful tank like the popular NLTK, but a small
package that you can pip-install anywhere and use to process textual data
in a few lines of code.

Main features
=============
- Parse Japanese text. The ``mecab-python3`` package is not required, even on Windows;
  only a MeCab binary release (e.g. ``mecab.exe``) is needed.
- Built-in “lite” text annotation formats (TTL/CSV and TTL/JSON)
- Helper functions and useful data for processing English, Japanese, Chinese and Vietnamese.
- Enhanced ``open()`` function that supports common text-based and binary formats (txt, gz, csv, tsv, json, etc.)
- Quick text-based report generation
- Application configuration management that can make an educated guess about where config files are located
- **(Experimental)** Web fetcher with responsible web crawling ethics
  (supports caching out of the box)
- **(Experimental)** Console application template
Installation
============
Chirptext is available on `PyPI <https://pypi.org/project/chirptext/>`_ and can be installed using ``pip``:

.. code:: bash

   pip install chirptext

**Note**: chirptext no longer supports Python 2. Please
upgrade to Python 3 to use this package.

Sample code
===========
Using MeCab on Windows
----------------------
Download the MeCab binary package from
http://taku910.github.io/mecab/#download and install it.
Once it is installed, you can try:

.. code:: python

   >>> from chirptext import deko
   >>> sent = deko.parse('猫が好きです。')
   >>> sent.tokens
   [[猫(名詞-一般/*/*|猫|ネコ|ネコ)], [が(助詞-格助詞/一般/*|が|ガ|ガ)], [好き(名詞-形容動詞語幹/*/*|好き|スキ|スキ)], [です(助動詞-*/*/*|です|デス|デス)], [。(記号-句点/*/*|。|。|。)], [EOS(-//|||)]]
   >>> sent.words
   ['猫', 'が', '好き', 'です', '。']
   >>> sent[0].pos
   '名詞'
   >>> sent[0].root
   '猫'
   >>> sent[0].reading
   'ネコ'

If you installed MeCab to a custom location, for example
``C:\mecab\bin\mecab.exe``, tell chirptext where to find it:

.. code:: python

   >>> deko.set_mecab_bin("C:\\mecab\\bin\\mecab.exe")
   >>> deko.get_mecab_bin()
   'C:\\mecab\\bin\\mecab.exe'
   >>> # That is all; MeCab is now ready to use
   >>> deko.parse('雨が降る。').words
   ['雨', 'が', '降る', '。']

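
Before setting a custom path, it can help to check whether a MeCab executable is already
reachable through ``PATH``. The following is a minimal standard-library sketch, not part of
chirptext: the helper name ``find_mecab`` and the candidate file names are assumptions made
for illustration only.

.. code:: python

   import shutil


   def find_mecab(candidates=("mecab", "mecab.exe")):
       """Return the full path of the first candidate found on PATH, or None.

       Hypothetical helper for illustration; chirptext does not provide it.
       """
       for name in candidates:
           path = shutil.which(name)
           if path:
               return path
       return None


   mecab_path = find_mecab()
   if mecab_path is None:
       print("MeCab not found on PATH; pass its location to deko.set_mecab_bin()")
   else:
       print("MeCab found at", mecab_path)

If the lookup returns a path, that string could be handed to ``deko.set_mecab_bin()`` in the
same way as the hard-coded location shown earlier.
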
Convenient IO APIs
------------------

.. code:: python

   >>> from chirptext import chio
   >>> chio.write_tsv('data/test.tsv', [['a', 'b'], ['c', 'd']])
   >>> chio.read_tsv('data/test.tsv')
   [['a', 'b'], ['c', 'd']]
   >>> chio.write_file('data/content.tar.gz', 'Support writing to .tar.gz file')
   >>> chio.read_file('data/content.tar.gz')
   'Support writing to .tar.gz file'
   >>> for row in chio.read_tsv_iter('data/test.tsv'):
   ...     print(row)
   ...
   ['a', 'b']
   ['c', 'd']

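
For comparison, the TSV round trip can be approximated with the standard library alone.
This is only a rough sketch of the equivalent behaviour and makes no claim about how
``chio`` is implemented; the helper names ``write_tsv_plain`` and ``read_tsv_plain`` are
hypothetical.

.. code:: python

   import csv
   import os
   import tempfile


   def write_tsv_plain(path, rows):
       """Write rows (lists of strings) to a tab-separated file."""
       with open(path, "w", newline="", encoding="utf-8") as outfile:
           csv.writer(outfile, delimiter="\t").writerows(rows)


   def read_tsv_plain(path):
       """Read a tab-separated file back into a list of rows."""
       with open(path, newline="", encoding="utf-8") as infile:
           return list(csv.reader(infile, delimiter="\t"))


   # Round trip through a temporary directory so nothing is left behind
   with tempfile.TemporaryDirectory() as tmpdir:
       tsv_path = os.path.join(tmpdir, "test.tsv")
       write_tsv_plain(tsv_path, [["a", "b"], ["c", "d"]])
       rows = read_tsv_plain(tsv_path)

   print(rows)  # [['a', 'b'], ['c', 'd']]

The plain version needs explicit ``newline`` and ``delimiter`` arguments at every call site,
which is the kind of repetition the one-line helpers are meant to remove.
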

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   recipes
   api

Useful links
------------
- Chirptext source code: https://github.com/letuananh/chirptext/
- Chirptext documentation: https://chirptext.readthedocs.io/
- Chirptext on PyPI: https://pypi.org/project/chirptext/
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`