[Paper Plagiarism Check] Setting Up Typo Detection with pycorrector

Upgrade wheel first; otherwise the installation fails with the following error:

error: can't copy 'pycorrector/data/kenlm': doesn't exist or not a regular file

pip install --upgrade wheel
pip install --upgrade setuptools

pip install pypinyin
pip install kenlm
pip install numpy
pip install jieba
pip install pycorrector
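
After the installs finish, one quick way to confirm that each package is visible to the interpreter is to look it up without importing it. The following is only a minimal sketch and is not part of pycorrector: the helper name `check_packages` is made up for illustration, and it assumes each pip package provides a top-level module with the same name.

```python
import importlib.util

# Packages installed in the steps above; module names are assumed to match the pip names.
REQUIRED = ["pypinyin", "kenlm", "numpy", "jieba", "pycorrector"]


def check_packages(names=REQUIRED):
    """Return {module name: True/False} for whether the module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(name, "OK" if found else "MISSING")
```

Any package reported as MISSING can be reinstalled with the matching pip command before moving on.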

Installing kenlm on CentOS may fail with an error saying that Python.h cannot be found.

Fix:    yum install python36u-devel
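
To see whether the header is actually present, before or after installing the devel package, you can ask Python where its include directory is. This is a small sketch using only the standard library; `has_python_header` is a hypothetical helper name, and it only checks the interpreter that runs the script.

```python
import os
import sysconfig


def has_python_header():
    """Return (found, path) for Python.h in this interpreter's include directory."""
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return os.path.isfile(header), header


if __name__ == "__main__":
    found, header = has_python_header()
    print(header, "found" if found else "missing")
```

Run it with the same Python that pip uses; if the header is reported missing, the kenlm build is likely to hit the error described above.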

Contents of the Python file

#coding=utf-8
import pycorrector

# correct() returns the corrected sentence plus a list of fixes (see the output below)
corrected_sent, detail = pycorrector.correct('少先队员因该为老人让坐')
print(corrected_sent, detail)

Output

2019-03-08 10:10:32,814 - /usr/local/lib/python3.5/dist-packages/pycorrector/corrector.py - DEBUG - Loaded same pinyin file: /usr/local/lib/python3.5/dist-packages/pycorrector/data/same_pinyin.txt, same stroke file: /usr/local/lib/python3.5/dist-packages/pycorrector/data/same_stroke.txt, spend: 0.047 s.
2019-03-08 10:10:32,816 - /usr/local/lib/python3.5/dist-packages/pycorrector/detector.py - DEBUG - Loaded language model: /usr/local/lib/python3.5/dist-packages/pycorrector/data/kenlm/people_chars_lm.klm, spend: 0.0011556148529052734 s
2019-03-08 10:10:34,098 - /usr/local/lib/python3.5/dist-packages/pycorrector/detector.py - DEBUG - Loaded word freq file: /usr/local/lib/python3.5/dist-packages/pycorrector/data/word_dict.txt, spend: 1.2822625637054443 s
2019-03-08 10:10:34,099 - /usr/local/lib/python3.5/dist-packages/pycorrector/detector.py - DEBUG - Loaded confusion file: /usr/local/lib/python3.5/dist-packages/pycorrector/data/custom_confusion.txt, spend: 1.2832543849945068 s
少先队员应该为老人让座 [['因该', '应该', 4, 6], ['坐', '座', 10, 11]]
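
Judging from the output above, each entry in detail appears to have the form [wrong, right, begin, end], where begin and end are character offsets into the original sentence. Under that assumption, the sketch below reapplies the fixes by hand, which makes the offsets easy to verify. `apply_details` is a hypothetical helper and not part of pycorrector.

```python
def apply_details(sentence, details):
    """Rebuild the corrected sentence from [wrong, right, begin, end] entries."""
    result = sentence
    # Apply from right to left so earlier offsets stay valid even if lengths differ.
    for wrong, right, begin, end in sorted(details, key=lambda d: d[2], reverse=True):
        if result[begin:end] != wrong:
            raise ValueError("offsets %d:%d do not match %r" % (begin, end, wrong))
        result = result[:begin] + right + result[end:]
    return result


if __name__ == "__main__":
    original = '少先队员因该为老人让坐'
    detail = [['因该', '应该', 4, 6], ['坐', '座', 10, 11]]
    print(apply_details(original, detail))  # prints 少先队员应该为老人让座
```

The same offsets can also be used to highlight the suspect characters in the original text instead of replacing them.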