긍정적 사고, 음식의 절제, 규칙적인 운동

자음

[python] 한글 자음 확인해서 치환하기 2024.03.22
[python] 한글 자음, 모음, 초성 추출하기 2024.03.20
한글 자음, 모음 설명 2024.03.15
[python] 한글 자음 모음 분리하기, jamo, jamotools 2024.03.15

[python] 한글 자음 확인해서 치환하기

홍반장水_ 2024. 3. 22. 15:49

2024. 3. 22. 15:49

[python] 한글 자음 확인해서 치환하기

""" 한글 자음인지 확인 
"""
# Define the list of Korean initial consonants (초성)
INITIAL_CONSONANTS = [
    'ㄱ', 'ㄲ', 'ㄴ', 'ㄷ', 'ㄸ', 'ㄹ', 
    'ㅁ', 'ㅂ', 'ㅃ', 'ㅅ', 'ㅆ', 'ㅇ', 
    'ㅈ', 'ㅉ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ'
]

def is_korean_char(ch):
    """Check if the character is a Korean syllable."""
    return 0xAC00 <= ord(ch) <= 0xD7A3

def get_initial_consonant(ch):
    """Extract the initial consonant from a Korean syllable."""
    if not is_korean_char(ch):
        return None  # or you could return an empty string or raise an exception
    
    # Calculate the index for the initial consonant
    initial_index = (ord(ch) - 0xAC00) // (21 * 28)
    return INITIAL_CONSONANTS[initial_index]

# Example usage
sentence = "안녕하세요"
initials = [get_initial_consonant(ch) for ch in sentence if is_korean_char(ch)]
print(''.join(initials))  # Output: ㅇㄴㅎㅅㅇ

저작자표시 비영리

'프로그래밍 > Python' 카테고리의 다른 글

[PYTHON] Lambda function (0)	2024.04.05
[python] pip install prettytable, 표 형태로 데이터를 보여준다. (0)	2024.03.25
[python] 문자열 인코딩 확인하기. chardet (0)	2024.03.22
[python] 엑셀 읽고 쓰기 openpyxl (0)	2024.03.20
[python] 한글 자음, 모음, 초성 추출하기 (0)	2024.03.20

[python] 한글 자음, 모음, 초성 추출하기

홍반장水_ 2024. 3. 20. 17:34

2024. 3. 20. 17:34

[python] 한글 자음, 모음, 초성 추출하기

# pip install jamotools 
# https://pypi.org/project/jamotools/
# A library for Korean Jamo split and vectorize. 
#
# 음절 분할 및 jamos를 음절에 결합하는 API는 hangul-utils 를 기반으로 합니다 .
# 
# Split_syllables : 음절 문자열을 jamos 문자열로 변환하고 유니코드 유형을 변환하도록 선택할 수 있습니다.
# Join_jamos : jamos 문자열을 음절 문자열로 변환합니다.
# Normalize_to_compat_jamo : jamos 문자열을 한글 호환성 Jamo 문자열로 정규화합니다 .
import jamotools

print(jamotools.split_syllable_char(u"안"))
#('ㅇ', 'ㅏ', 'ㄴ')

print(jamotools.split_syllables(u"안녕하세요"))
# ㅇㅏㄴㄴㅕㅇㅎㅏㅅㅔㅇㅛ


sentence = u"앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집  깨죽은 검은 깨 깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라."
s = jamotools.split_syllables(sentence)
print(s, '\n')

""" ㅇㅏㅍ ㅈㅣㅂ ㅍㅏㅌㅈㅜㄱㅇㅡㄴ ㅂㅜㄺㅇㅡㄴ ㅍㅏㅌ ㅍㅜㅅㅍㅏㅌㅈㅜㄱㅇㅣㄱㅗ,
ㄷㅟㅅㅈㅣㅂ ㅋㅗㅇㅈㅜㄱㅇㅡㄴ ㅎㅐㅅㅋㅗㅇ ㄷㅏㄴㅋㅗㅇ ㅋㅗㅇㅈㅜㄱ.ㅇㅜㄹㅣ
ㅈㅣㅂ ㄲㅐㅈㅜㄱㅇㅡㄴ ㄱㅓㅁㅇㅡㄴ ㄲㅐ ㄲㅐㅈㅜㄱㅇㅣㄴㄷㅔ ㅅㅏㄹㅏㅁㄷㅡㄹㅇㅡㄴ
ㅎㅐㅅㅋㅗㅇ ㄷㅏㄴㅋㅗㅇ ㅋㅗㅇㅈㅜㄱ ㄲㅐㅈㅜㄱ ㅈㅜㄱㅁㅓㄱㄱㅣㄹㅡㄹ
ㅅㅣㅀㅇㅓㅎㅏㄷㅓㄹㅏ. """

sentence2 = jamotools.join_jamos(s)
print(sentence2)
""" 앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집 깨죽은 검은 깨
깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라. """

print(sentence == sentence2)
# True


# 자음만 추출
def extract_vowels(text):
    vowels = set(['ㅏ', 'ㅑ', 'ㅓ', 'ㅕ', 'ㅗ', 'ㅛ', 'ㅜ', 'ㅠ', 'ㅡ', 'ㅣ', 'ㅐ', 'ㅒ', 'ㅔ', 'ㅖ', 'ㅘ', 'ㅙ', 'ㅚ', 'ㅝ', 'ㅞ', 'ㅟ', 'ㅢ'])
    result = ''
    for char in text:
        if '가' <= char <= '힣':  # Check if the character is Hangul
            syllables = jamotools.split_syllables(char)
            for syllable in syllables:
                if syllable in vowels:
                    result += syllable
    return result

sentence = u"앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집  깨죽은 검은 깨 깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라."
vowels_only = extract_vowels(sentence)
print(vowels_only)
# ㅏㅣㅏㅜㅡㅜㅡㅏㅜㅏㅜㅣㅗㅟㅣㅗㅜㅡㅐㅗㅏㅗㅗㅜㅜㅣㅣㅐㅜㅡㅓㅡㅐㅐㅜㅣㅔㅏㅏㅡㅡㅐㅗㅏㅗㅗㅜㅐㅜㅜㅓㅣㅡㅣㅓㅏㅓㅏ


# 모음만 추출 
def extract_consonants(text):
    consonants = set(['ㄱ', 'ㄲ', 'ㄴ', 'ㄷ', 'ㄸ', 'ㄹ', 'ㅁ', 'ㅂ', 'ㅃ', 'ㅅ', 'ㅆ', 'ㅇ', 'ㅈ', 'ㅉ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ'])
    result = ''
    for char in text:
        if '가' <= char <= '힣':  # Check if the character is Hangul
            syllables = jamotools.split_syllables(char)
            for syllable in syllables:
                if syllable in consonants:
                    result += syllable
    return result

sentence = u"앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집  깨죽은 검은 깨 깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라."
consonants_only = extract_consonants(sentence)
print(consonants_only)
# ㅇㅍㅈㅂㅍㅌㅈㄱㅇㄴㅂㅇㄴㅍㅌㅍㅅㅍㅌㅈㄱㅇㄱㄷㅅㅈㅂㅋㅇㅈㄱㅇㄴㅎㅅㅋㅇㄷㄴㅋㅇㅋㅇㅈㄱㅇㄹㅈㅂㄲㅈㄱㅇㄴㄱㅁㅇㄴㄲㄲㅈㄱㅇㄴㄷㅅㄹㅁㄷㄹㅇㄴㅎㅅㅋㅇㄷㄴㅋㅇㅋㅇㅈㄱㄲㅈㄱㅈㄱㅁㄱㄱㄹㄹㅅㅇㅎㄷㄹ


# 초성만 추출 
def extract_initial_consonants(text):
    result = ''
    for char in text:
        if '가' <= char <= '힣':  # Check if the character is Hangul
            initial_consonant = jamotools.split_syllable_char(char)[0]
            result += initial_consonant
    return result

sentence = u"앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집  깨죽은 검은 깨 깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라."
initial_consonants_only = extract_initial_consonants(sentence)
print(initial_consonants_only)

저작자표시 비영리

'프로그래밍 > Python' 카테고리의 다른 글

[python] 문자열 인코딩 확인하기. chardet (0)	2024.03.22
[python] 엑셀 읽고 쓰기 openpyxl (0)	2024.03.20
[python] 한글 자음 모음 분리하기, jamo, jamotools (0)	2024.03.15
[python] gTTS 한글 speak (0)	2024.03.12
[python] pyttsx3, TTS, AudioBook, gTTS (0)	2024.03.12

한글 자음, 모음 설명

홍반장水_ 2024. 3. 15. 17:16

2024. 3. 15. 17:16

https://namu.wiki/w/%ED%95%9C%EA%B8%80/%EC%9E%90%EB%AA%A8

한글/자모

현재 사전이나 컴퓨터 한글 코드에서 한글 자모는 다음 순으로 배열한다. 한글 맞춤법 제4항 붙임 2와 그 해설에 따

namu.wiki

한글을 이루는 낱글자. '자모'라고 하여 ' 자음'과 ' 모음'의 약자로 알고 있는 경우가 많으나, 한자를 보면 子母가 아닌 字母이다. '글자(字)를 이루는 모(母)체'라는 뜻.

자음(19자) : ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ
모음(21자) : ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
받침(27자) : ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ

저작자표시 비영리

'프로그래밍' 카테고리의 다른 글

애플, 자율주행차 버리고 ‘로봇’ 택했다...프로젝트 스컹크웍스 시동 (0)	2024.04.05
취약층 의대 보낸 '서울런' 인강…"AI로 맞춤 학습 강화"(종합) (1)	2024.03.29
[VSCODE] vscode-pdf (0)	2024.03.07
프리랜서를 위한 종합소득세 신고 Q&A (0)	2024.02.05
Meta의 무료 Code Llama AI 프로그래밍 도구로 GPT-4와의 격차 해소 (0)	2024.01.31

[python] 한글 자음 모음 분리하기, jamo, jamotools

홍반장水_ 2024. 3. 15. 16:41

2024. 3. 15. 16:41

[python] 한글 자음 모음 분리하기

pip install jamo

https://pypi.org/project/jamo/

jamo

A Hangul syllable and jamo analyzer.

pypi.org

https://github.com/jdongian/python-jamo

------------

pip install jamotools

https://pypi.org/project/jamotools/

A library for Korean Jamo split and vectorize. 한국어 Jamo를 분할하고 벡터화하는 라이브러리입니다.

>>> import jamotools
>>> print(jamotools.split_syllable_char(u"안"))
('ㅇ', 'ㅏ', 'ㄴ')

>>> print(jamotools.split_syllables(u"안녕하세요"))
ㅇㅏㄴㄴㅕㅇㅎㅏㅅㅔㅇㅛ

>>> sentence = u"앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집
    깨죽은 검은 깨 깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라."
>>> s = jamotools.split_syllables(sentence)
>>> print(s)
ㅇㅏㅍ ㅈㅣㅂ ㅍㅏㅌㅈㅜㄱㅇㅡㄴ ㅂㅜㄺㅇㅡㄴ ㅍㅏㅌ ㅍㅜㅅㅍㅏㅌㅈㅜㄱㅇㅣㄱㅗ,
ㄷㅟㅅㅈㅣㅂ ㅋㅗㅇㅈㅜㄱㅇㅡㄴ ㅎㅐㅅㅋㅗㅇ ㄷㅏㄴㅋㅗㅇ ㅋㅗㅇㅈㅜㄱ.ㅇㅜㄹㅣ
ㅈㅣㅂ ㄲㅐㅈㅜㄱㅇㅡㄴ ㄱㅓㅁㅇㅡㄴ ㄲㅐ ㄲㅐㅈㅜㄱㅇㅣㄴㄷㅔ ㅅㅏㄹㅏㅁㄷㅡㄹㅇㅡㄴ
ㅎㅐㅅㅋㅗㅇ ㄷㅏㄴㅋㅗㅇ ㅋㅗㅇㅈㅜㄱ ㄲㅐㅈㅜㄱ ㅈㅜㄱㅁㅓㄱㄱㅣㄹㅡㄹ
ㅅㅣㅀㅇㅓㅎㅏㄷㅓㄹㅏ.

>>> sentence2 = jamotools.join_jamos(s)
>>> print(sentence2)
앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집 깨죽은 검은 깨
깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라.

>>> print(sentence == sentence2)
True

저작자표시 비영리

'프로그래밍 > Python' 카테고리의 다른 글

[python] 엑셀 읽고 쓰기 openpyxl (0)	2024.03.20
[python] 한글 자음, 모음, 초성 추출하기 (0)	2024.03.20
[python] gTTS 한글 speak (0)	2024.03.12
[python] pyttsx3, TTS, AudioBook, gTTS (0)	2024.03.12
[python] Pandas tutorial (0)	2024.03.08

PREV 이전 1 NEXT 다음

긍정적 사고, 음식의 절제, 규칙적인 운동

자음

[python] 한글 자음 확인해서 치환하기

'프로그래밍 > Python' 카테고리의 다른 글

[python] 한글 자음, 모음, 초성 추출하기

'프로그래밍 > Python' 카테고리의 다른 글

한글 자음, 모음 설명

'프로그래밍' 카테고리의 다른 글

[python] 한글 자음 모음 분리하기, jamo, jamotools

'프로그래밍 > Python' 카테고리의 다른 글

+ Recent posts

티스토리툴바