Project description
Chardet: The Universal Character Encoding Detector

Detects the following encodings:
- ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
- Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
- EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP (Japanese)
- EUC-KR, ISO-2022-KR, Johab (Korean)
- KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
- ISO-8859-5, windows-1251 (Bulgarian)
- ISO-8859-1, windows-1252, MacRoman (Western European languages)
- ISO-8859-7, windows-1253 (Greek)
- ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
- TIS-620 (Thai)

Our ISO-8859-2 and windows-1250 (Hungarian) probers have been temporarily disabled until we can retrain the models.

Requires Python 3.7+.
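
As a rough illustration of how the detector is typically called, the sketch below passes raw bytes to `chardet.detect()` and prints the guessed encoding with its confidence. The helper name `guess_encoding` and the sample text are assumptions made for this example and are not part of chardet. The guarded import is only there so the snippet still loads when the package is not installed.

```python
# chardet is a third-party package (pip install chardet); the guarded import
# lets this sketch load even when it is not installed.
try:
    import chardet
except ImportError:
    chardet = None


def guess_encoding(data: bytes):
    """Return chardet's result dict for data, or None if chardet is unavailable.

    guess_encoding is a hypothetical helper written for this example.
    """
    if chardet is None:
        return None
    # detect() returns a dict with "encoding", "confidence" and "language" keys.
    return chardet.detect(data)


# Illustrative sample: Cyrillic text encoded as windows-1251.
sample = "Привет, мир! Это пример текста для определения кодировки.".encode("windows-1251")
result = guess_encoding(sample)
if result is not None:
    print(result["encoding"], result["confidence"])
```

The result is a statistical guess, so very short inputs may come back with low confidence or with `None` as the encoding. It is usually worth checking the `confidence` value before decoding with the reported encoding.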