반응형

[python] 한글 자음, 모음, 초성 추출하기 

 

# pip install jamotools 
# https://pypi.org/project/jamotools/
# A library for Korean Jamo split and vectorize. 
#
# 음절 분할 및 jamos를 음절에 결합하는 API는 hangul-utils 를 기반으로 합니다 .
# 
# Split_syllables : 음절 문자열을 jamos 문자열로 변환하고 유니코드 유형을 변환하도록 선택할 수 있습니다.
# Join_jamos : jamos 문자열을 음절 문자열로 변환합니다.
# Normalize_to_compat_jamo : jamos 문자열을 한글 호환성 Jamo 문자열로 정규화합니다 .
import jamotools

print(jamotools.split_syllable_char(u"안"))
#('ㅇ', 'ㅏ', 'ㄴ')

print(jamotools.split_syllables(u"안녕하세요"))
# ㅇㅏㄴㄴㅕㅇㅎㅏㅅㅔㅇㅛ


sentence = u"앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집  깨죽은 검은 깨 깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라."
s = jamotools.split_syllables(sentence)
print(s, '\n')

""" ㅇㅏㅍ ㅈㅣㅂ ㅍㅏㅌㅈㅜㄱㅇㅡㄴ ㅂㅜㄺㅇㅡㄴ ㅍㅏㅌ ㅍㅜㅅㅍㅏㅌㅈㅜㄱㅇㅣㄱㅗ,
ㄷㅟㅅㅈㅣㅂ ㅋㅗㅇㅈㅜㄱㅇㅡㄴ ㅎㅐㅅㅋㅗㅇ ㄷㅏㄴㅋㅗㅇ ㅋㅗㅇㅈㅜㄱ.ㅇㅜㄹㅣ
ㅈㅣㅂ ㄲㅐㅈㅜㄱㅇㅡㄴ ㄱㅓㅁㅇㅡㄴ ㄲㅐ ㄲㅐㅈㅜㄱㅇㅣㄴㄷㅔ ㅅㅏㄹㅏㅁㄷㅡㄹㅇㅡㄴ
ㅎㅐㅅㅋㅗㅇ ㄷㅏㄴㅋㅗㅇ ㅋㅗㅇㅈㅜㄱ ㄲㅐㅈㅜㄱ ㅈㅜㄱㅁㅓㄱㄱㅣㄹㅡㄹ
ㅅㅣㅀㅇㅓㅎㅏㄷㅓㄹㅏ. """

sentence2 = jamotools.join_jamos(s)
print(sentence2)
""" 앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집 깨죽은 검은 깨
깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라. """

print(sentence == sentence2)
# True


# 자음만 추출
def extract_vowels(text):
    vowels = set(['ㅏ', 'ㅑ', 'ㅓ', 'ㅕ', 'ㅗ', 'ㅛ', 'ㅜ', 'ㅠ', 'ㅡ', 'ㅣ', 'ㅐ', 'ㅒ', 'ㅔ', 'ㅖ', 'ㅘ', 'ㅙ', 'ㅚ', 'ㅝ', 'ㅞ', 'ㅟ', 'ㅢ'])
    result = ''
    for char in text:
        if '가' <= char <= '힣':  # Check if the character is Hangul
            syllables = jamotools.split_syllables(char)
            for syllable in syllables:
                if syllable in vowels:
                    result += syllable
    return result

sentence = u"앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집  깨죽은 검은 깨 깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라."
vowels_only = extract_vowels(sentence)
print(vowels_only)
# ㅏㅣㅏㅜㅡㅜㅡㅏㅜㅏㅜㅣㅗㅟㅣㅗㅜㅡㅐㅗㅏㅗㅗㅜㅜㅣㅣㅐㅜㅡㅓㅡㅐㅐㅜㅣㅔㅏㅏㅡㅡㅐㅗㅏㅗㅗㅜㅐㅜㅜㅓㅣㅡㅣㅓㅏㅓㅏ


# 모음만 추출 
def extract_consonants(text):
    consonants = set(['ㄱ', 'ㄲ', 'ㄴ', 'ㄷ', 'ㄸ', 'ㄹ', 'ㅁ', 'ㅂ', 'ㅃ', 'ㅅ', 'ㅆ', 'ㅇ', 'ㅈ', 'ㅉ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ'])
    result = ''
    for char in text:
        if '가' <= char <= '힣':  # Check if the character is Hangul
            syllables = jamotools.split_syllables(char)
            for syllable in syllables:
                if syllable in consonants:
                    result += syllable
    return result

sentence = u"앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집  깨죽은 검은 깨 깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라."
consonants_only = extract_consonants(sentence)
print(consonants_only)
# ㅇㅍㅈㅂㅍㅌㅈㄱㅇㄴㅂㅇㄴㅍㅌㅍㅅㅍㅌㅈㄱㅇㄱㄷㅅㅈㅂㅋㅇㅈㄱㅇㄴㅎㅅㅋㅇㄷㄴㅋㅇㅋㅇㅈㄱㅇㄹㅈㅂㄲㅈㄱㅇㄴㄱㅁㅇㄴㄲㄲㅈㄱㅇㄴㄷㅅㄹㅁㄷㄹㅇㄴㅎㅅㅋㅇㄷㄴㅋㅇㅋㅇㅈㄱㄲㅈㄱㅈㄱㅁㄱㄱㄹㄹㅅㅇㅎㄷㄹ


# 초성만 추출 
def extract_initial_consonants(text):
    result = ''
    for char in text:
        if '가' <= char <= '힣':  # Check if the character is Hangul
            initial_consonant = jamotools.split_syllable_char(char)[0]
            result += initial_consonant
    return result

sentence = u"앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집  깨죽은 검은 깨 깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라."
initial_consonants_only = extract_initial_consonants(sentence)
print(initial_consonants_only)
반응형
반응형

[python] 한글 자음 모음 분리하기 

 

 

pip install jamo

https://pypi.org/project/jamo/

 

jamo

A Hangul syllable and jamo analyzer.

pypi.org

https://github.com/jdongian/python-jamo  

 

------------

 

pip install jamotools

https://pypi.org/project/jamotools/

A library for Korean Jamo split and vectorize.  한국어 Jamo를 분할하고 벡터화하는 라이브러리입니다.

 

 

>>> import jamotools
>>> print(jamotools.split_syllable_char(u"안"))
('ㅇ', 'ㅏ', 'ㄴ')

>>> print(jamotools.split_syllables(u"안녕하세요"))
ㅇㅏㄴㄴㅕㅇㅎㅏㅅㅔㅇㅛ

>>> sentence = u"앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집
    깨죽은 검은 깨 깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라."
>>> s = jamotools.split_syllables(sentence)
>>> print(s)
ㅇㅏㅍ ㅈㅣㅂ ㅍㅏㅌㅈㅜㄱㅇㅡㄴ ㅂㅜㄺㅇㅡㄴ ㅍㅏㅌ ㅍㅜㅅㅍㅏㅌㅈㅜㄱㅇㅣㄱㅗ,
ㄷㅟㅅㅈㅣㅂ ㅋㅗㅇㅈㅜㄱㅇㅡㄴ ㅎㅐㅅㅋㅗㅇ ㄷㅏㄴㅋㅗㅇ ㅋㅗㅇㅈㅜㄱ.ㅇㅜㄹㅣ
ㅈㅣㅂ ㄲㅐㅈㅜㄱㅇㅡㄴ ㄱㅓㅁㅇㅡㄴ ㄲㅐ ㄲㅐㅈㅜㄱㅇㅣㄴㄷㅔ ㅅㅏㄹㅏㅁㄷㅡㄹㅇㅡㄴ
ㅎㅐㅅㅋㅗㅇ ㄷㅏㄴㅋㅗㅇ ㅋㅗㅇㅈㅜㄱ ㄲㅐㅈㅜㄱ ㅈㅜㄱㅁㅓㄱㄱㅣㄹㅡㄹ
ㅅㅣㅀㅇㅓㅎㅏㄷㅓㄹㅏ.

>>> sentence2 = jamotools.join_jamos(s)
>>> print(sentence2)
앞 집 팥죽은 붉은 팥 풋팥죽이고, 뒷집 콩죽은 햇콩 단콩 콩죽.우리 집 깨죽은 검은 깨
깨죽인데 사람들은 햇콩 단콩 콩죽 깨죽 죽먹기를 싫어하더라.

>>> print(sentence == sentence2)
True
반응형
반응형

[python] gTTS 한글 speak

 

# gTTS : Google Text to Speech API : Google에서 제공하는 TTS 서비스. gTTS라는 모듈을 인스톨해야 함
"""
    'en'으로 지정했을 때 한글이 text에 포함되어 있으면 이를 무시하지만, 'ko'일때 text내의 영문은 무시되지 않고 음성합성을 수행한다(아주 이상함)
    영어는 여자 성우, 한글은 남자 성우이다(변경 불가능)
"""

from gtts import gTTS
import pygame
import time

text ="안녕하세요, 여러분. 파이썬으로 노는 것은 재미있습니다!!!"

tts = gTTS(text=text, lang='ko')

tts.save("helloKO.mp3")



# Initialize Pygame mixer
pygame.mixer.init()

# Load the audio file
pygame.mixer.music.load("helloKO.mp3")

# Play the audio file
pygame.mixer.music.play()

# Allow time for the audio to play
time.sleep(10)

# Stop the playback
pygame.mixer.music.stop()

 

반응형
반응형

pyttsx3

*** https://medium.com/analytics-vidhya/easy-way-to-build-an-audiobook-using-python-20fc7d6fb1af

 

Easy Way To Build An Audiobook Using Python

An audiobook is nothing but a book that was recorded in an audio format. It can also be stated as a book that is being read aloud…

medium.com

 

pip install gTTS  https://pypi.org/project/gTTS/

 

https://pypi.org/project/pyttsx3/

 

pyttsx3

Text to Speech (TTS) library for Python 2 and 3. Works without internet connection or delay. Supports multiple TTS engines, including Sapi5, nsss, and espeak.

pypi.org

Text to Speech (TTS) library for Python 2 and 3. Works without internet connection or delay. Supports multiple TTS engines, including Sapi5, nsss, and espeak.

 

pip install pyttsx3

Project description

pyttsx3 is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline, and is compatible with both Python 2 and 3.

Installation

pip install pyttsx3

If you recieve errors such as No module named win32com.client, No module named win32, or No module named win32api, you will need to additionally install pypiwin32.

Usage :

import pyttsx3
engine = pyttsx3.init()
engine.say("I will speak this text")
engine.runAndWait()

Changing Voice , Rate and Volume :

import pyttsx3
engine = pyttsx3.init() # object creation

""" RATE"""
rate = engine.getProperty('rate')   # getting details of current speaking rate
print (rate)                        #printing current voice rate
engine.setProperty('rate', 125)     # setting up new voice rate


"""VOLUME"""
volume = engine.getProperty('volume')   #getting to know current volume level (min=0 and max=1)
print (volume)                          #printing current volume level
engine.setProperty('volume',1.0)    # setting up volume level  between 0 and 1

"""VOICE"""
voices = engine.getProperty('voices')       #getting details of current voice
#engine.setProperty('voice', voices[0].id)  #changing index, changes voices. o for male
engine.setProperty('voice', voices[1].id)   #changing index, changes voices. 1 for female

engine.say("Hello World!")
engine.say('My current speaking rate is ' + str(rate))
engine.runAndWait()
engine.stop()

"""Saving Voice to a file"""
# On linux make sure that 'espeak' and 'ffmpeg' are installed
engine.save_to_file('Hello World', 'test.mp3')
engine.runAndWait()

Full documentation of the Library

https://pyttsx3.readthedocs.io/en/latest/

Included TTS engines:

  • sapi5
  • nsss
  • espeak

Feel free to wrap another text-to-speech engine for use with pyttsx3.

반응형
반응형

https://pandas.pydata.org/pandas-docs/stable/index.html

 

pandas documentation — pandas 2.2.1 documentation

API reference The reference guide contains a detailed description of the pandas API. The reference describes how the methods work and which parameters can be used. It assumes that you have an understanding of the key concepts.

pandas.pydata.org

 

Community tutorials — pandas 2.2.1 documentation

 

pandas.pydata.org

 

반응형
반응형

[python] PyMuPDF 에서 해상도 올리기. PDF to IMG

How to Increase Image Resolution

 

https://pymupdf.readthedocs.io/en/latest/recipes-images.html

 

Images - PyMuPDF 1.23.25 documentation

Previous Text

pymupdf.readthedocs.io

The image of a document page is represented by a Pixmap, and the simplest way to create a pixmap is via method Page.get_pixmap().

This method has many options to influence the result. The most important among them is the Matrix, which lets you zoom, rotate, distort or mirror the outcome.

Page.get_pixmap() by default will use the Identity matrix, which does nothing.

In the following, we apply a zoom factor of 2 to each dimension, which will generate an image with a four times better resolution for us (and also about 4 times the size):

zoom_x = 2.0  # horizontal zoom
zoom_y = 2.0  # vertical zoom
mat = fitz.Matrix(zoom_x, zoom_y)  # zoom factor 2 in each dimension
pix = page.get_pixmap(matrix=mat)  # use 'mat' instead of the identity matrix

dpi = 600
pix = page.get_pixmap(dpi)

Since version 1.19.2 there is a more direct way to set the resolution: Parameter "dpi" (dots per inch) can be used in place of "matrix". To create a 300 dpi image of a page specify pix = page.get_pixmap(dpi=300). Apart from notation brevity, this approach has the additional advantage that the dpi value is saved with the image file – which does not happen automatically when using the Matrix notation.

 
반응형
반응형

djangorestframework 3.14.0

 

 

https://www.django-rest-framework.org/

 


*** 참고 : https://towardsdatascience.com/designing-and-deploying-a-machine-learning-python-application-part-2-99eb37787b2b

https://pypi.org/project/djangorestframework/

pip install djangorestframework

Requirements Python 3.6+ Django 4.1, 4.0, 3.2, 3.1, 3.0 We highly recommend and only officially support the latest patch release of each Python and Django series.

Installation Install using pip...

pip install djangorestframework
Add 'rest_framework' to your INSTALLED_APPS setting.

INSTALLED_APPS = [
    ...
    'rest_framework',
]

 

Designing and Deploying a Machine Learning Python Application (Part 2)

You don’t have to be Atlas to get your model into the cloud

towardsdatascience.com

 

 

 

 

 

반응형
반응형

https://wikidocs.net/book/2070

 

Python 계단밟기

## Python을 배워보자 > 문제를 풀다보면 문법은 자동으로 알게된다. > 문법을 배우는것이 프로그램을 배우는것이 아니라 어떻게 활용하는냐가 중요하다

wikidocs.net

 

반응형

+ Recent posts