Encoding

[PYTHON] File encoding notes 2022.12.20
[python] Appending pandas results to a CSV file: to_csv 2022.04.05
[python] Splitting a large file into smaller files 2020.12.14
sublime text3, encoding: applying Korean encoding 2019.04.18
[python] Handling Unicode streams in Python 2017.06.22
[M/L] tensorflow - Module: tf.compat, character encoding 2017.03.30

[PYTHON] File encoding notes

홍반장水_ 2022. 12. 20. 14:58

UTF-8

UTF-8-SIG
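
The two encodings named above differ only in a leading byte order mark (BOM): Python's utf-8-sig codec writes the three bytes EF BB BF before the text and strips them again when reading. Below is a minimal sketch of that difference, using in-memory encoding rather than files. The sample string is chosen only for illustration.

```python
import codecs

text = "잘되나"  # sample Korean text, chosen only for illustration

plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

# utf-8-sig output is the UTF-8 BOM followed by the plain UTF-8 bytes
print(with_bom.startswith(codecs.BOM_UTF8))
print(with_bom[len(codecs.BOM_UTF8):] == plain)

# Decoding with utf-8-sig strips the BOM;
# decoding with plain utf-8 keeps it as the character U+FEFF
stripped = with_bom.decode("utf-8-sig")
kept = with_bom.decode("utf-8")
print(stripped == text)
print(kept == "\ufeff" + text)
```

A reader that assumes plain UTF-8 therefore sees an extra U+FEFF at the start of the first field, while a program that needs the BOM to detect UTF-8 may misread a file written without it.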

""" 
    Encoding notes
"""
import os
import sys


# Check the current working directory
#print(os.getcwd())
prePath_in = "./Project/"
prePath_out = "./Project/"

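#1. Writing basic content

# Write new text (no encoding given, so the platform default encoding is used)
file = open("test.txt", "w")
file.write("내용입력")
file.close()

# Prevent garbled Korean text: ENCODING UTF-8
file = open("test.txt", "w", encoding="UTF-8")
file.write("내용입력")
file.close()

# Prevent garbled Korean text 2: ENCODING UTF-8-sig
# UTF-8 is enough for txt, but a csv written as plain UTF-8 can show garbled
# text when another program (e.g. Excel) reads it with a different encoding
file = open("test.csv", "w", encoding="UTF-8-sig")
file.write("test,test,test\n")
file.write("잘되나,안된다,오된다\n")
file.close()

# Note
# If "Permission denied: 'test.csv'" appears,
# the file is open in another program and cannot be modified. Close it there first.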

# Append
# (use the same encoding the file was written with, otherwise encodings get mixed)
file = open("test.txt", "a", encoding="UTF-8")
file.write("추가 내용입력")
file.close()


# Read
file = open("test.txt", "r", encoding="UTF-8")
print(file.read())
file.close()

# with statement: handles open & close
with open("test.txt", "w", encoding="UTF-8") as file:
    file.write("내용입력")
with open("test.txt", "r", encoding="UTF-8") as file:
    print(file.read())
 

 

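#2. Writing print output to a txt file

# Save a log by reassigning sys.stdout
f = open('test.txt','w', encoding='utf-8') # open the file that will hold the log
sys.stdout = f
print("내용입력")
sys.stdout = sys.__stdout__   # restore the original stdout
f.close()                     # close the log file

# This also works, but unless the file is closed and stdout restored afterwards,
# it can cause problems if the program does not exit cleanly
sys.stdout = open('test.txt','w', encoding='utf-8')
print("내용입력")
sys.stdout.close()            # close the log file explicitly
sys.stdout = sys.__stdout__   # restore the original stdout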
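
Python's standard library also provides contextlib.redirect_stdout, which restores sys.stdout automatically when the block ends, even if an exception is raised inside it. The following is a minimal sketch of that alternative. The log file name and the temporary directory are assumptions made for this example.

```python
import contextlib
import os
import tempfile

# Hypothetical log location, used only for this example
log_path = os.path.join(tempfile.mkdtemp(), "log_example.txt")

with open(log_path, "w", encoding="utf-8") as f:
    with contextlib.redirect_stdout(f):
        print("내용입력")       # written to the file, not the console

print("back on the console")   # stdout has been restored here

# Read the log back to confirm what was captured
with open(log_path, "r", encoding="utf-8") as f:
    logged = f.read()
```

Because both the file and the redirection are managed by with blocks, nothing stays open or redirected if the program stops early.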


[python] Appending pandas results to a CSV file: to_csv

홍반장水_ 2022. 4. 5. 15:18

   
## Show the time  #####################################
import os
import time
import datetime
now = datetime.datetime.now()

timestamp = str(int(time.time()))   # Unix time in seconds, as a string
print(timestamp)
print(now)
#################################################

# Check the current working directory
print(os.getcwd())

prePath = "./Project/DataCrawring/"


# Save to a CSV file
def dfToCsv(movie_df, num):
    csv_path = prePath + 'input/movie_data' + str(num) + '.csv'
    try:
        if not os.path.exists(csv_path):
            # First save: create the file and write the header
            movie_df.to_csv(csv_path, index=False, mode='w', header=True, encoding='utf-8-sig')
            print("First Save Success~~~ ")
        else:
            # The file already exists: append with mode='a', header=False
            movie_df.to_csv(csv_path, index=False, mode='a', header=False, encoding='utf-8-sig')
            print("Add Save Success~~~ ")
    except Exception as e:
        print("Error - dfToCsv.....", e)

Using to_csv append mode

import pandas as pd
import os

# Create sample data
soda = {'상품명': ['콜라', '사이다'], '가격': [2700, 2000]}
df = pd.DataFrame(soda)

# .to_csv
# After the file is first created, use mode='a' (append)
if not os.path.exists('output.csv'):
    df.to_csv('output.csv', index=False, mode='w', encoding='utf-8-sig')
else:
    df.to_csv('output.csv', index=False, mode='a', encoding='utf-8-sig', header=False)

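With encoding='utf-8' the Korean text came out garbled, so 'utf-8-sig' is used instead. utf-8-sig writes a byte order mark (BOM) at the start of the file, which programs such as Excel use to recognize the file as UTF-8. For more on utf-8-sig, see https://stackoverflow.com/questions/25788037/pandas-df-to-csvfile-csv-encode-utf-8-still-gives-trash-characters-for-min.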
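
The same write-once-then-append pattern can be sketched with only the standard library's csv module, swapped in here for pandas so the example runs without third-party packages. The helper name append_rows, the extra sample row, and the output location are assumptions for this sketch. It also checks that appending with utf-8-sig leaves a single BOM at the start of the file, since Python does not write the BOM again when a non-empty file is opened in append mode.

```python
import codecs
import csv
import os
import tempfile


def append_rows(path, header, rows):
    """Write the header only when the file is first created, then append rows."""
    is_new = not os.path.exists(path)
    mode = "w" if is_new else "a"
    # newline="" lets the csv module control line endings itself
    with open(path, mode, encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)


# Hypothetical output location, used only for this example
out_path = os.path.join(tempfile.mkdtemp(), "output_example.csv")

append_rows(out_path, ["상품명", "가격"], [["콜라", 2700], ["사이다", 2000]])
append_rows(out_path, ["상품명", "가격"], [["환타", 1800]])  # second call appends

# Count BOMs in the raw bytes: only the first write should have added one
with open(out_path, "rb") as f:
    bom_count = f.read().count(codecs.BOM_UTF8)

# Read the rows back; utf-8-sig strips the BOM from the first header cell
with open(out_path, "r", encoding="utf-8-sig", newline="") as f:
    table = list(csv.reader(f))
```

Reading the file back with plain utf-8 instead would leave U+FEFF attached to the first header name, so the reader's encoding should match the writer's.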


[python] Splitting a large file into smaller files

홍반장水_ 2020. 12. 14. 16:26

Reading a large txt file all at once causes memory problems, so the file is split into parts before the word list is processed.

import os
import sys
import konlpy
import pandas as pd
import numpy as np
os.environ['JAVA_OPTS'] = '-Xmx4096M'   # JVM heap option; the flag needs the leading '-'

import itertools

import mr #local module

file_name = "test_export_mentions_2020-11-17_title.txt"
#file_name = "test_export_mentions_2020-11-17_title_utf8.txt"  #test
file_out  = "outputfile"
lines_tot = mr.file_len(file_name)
filesize  = mr.getfilesize(file_name) * 1000
print("File name : ", file_name)
print("Line count : ", lines_tot)
print("File size : ", filesize)

numbits  = 1000000   # size hint for readlines(): roughly this many characters per part
loop_num = round(os.stat(file_name).st_size/numbits+1)+1   # rough estimate of the number of parts

print(os.stat(file_name).st_size/numbits+1)
print(loop_num)

# Read whole lines in chunks of about numbits characters and write each chunk to its own file
with open(file_name,'r', encoding='utf-8') as f:
    i = 0
    while True:
        segment = f.readlines(numbits)
        if not segment:
            break   # end of file
        with open('./input/'+file_out+str(i)+'.txt','w', encoding='utf-8') as o:
            o.writelines(segment)   # lines keep their own "\n", so none is added
        i += 1
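
A variation on the splitting loop above, sketched with itertools.islice so that each part holds a fixed number of lines and only one part is in memory at a time. The function name split_by_lines, the lines_per_part parameter, and the temporary sample file are assumptions for this sketch, not part of the original script.

```python
import itertools
import os
import tempfile


def split_by_lines(src_path, out_dir, lines_per_part, prefix="outputfile"):
    """Split a UTF-8 text file into parts of at most lines_per_part lines."""
    part_paths = []
    with open(src_path, "r", encoding="utf-8") as src:
        for index in itertools.count():
            # islice pulls the next group of lines without reading the whole file
            chunk = list(itertools.islice(src, lines_per_part))
            if not chunk:
                break
            part_path = os.path.join(out_dir, f"{prefix}{index}.txt")
            with open(part_path, "w", encoding="utf-8") as part:
                part.writelines(chunk)  # lines keep their own line endings
            part_paths.append(part_path)
    return part_paths


# Small demonstration on a temporary file with 5 lines
work_dir = tempfile.mkdtemp()
sample_path = os.path.join(work_dir, "sample.txt")
original = "".join(f"줄 {n}\n" for n in range(5))
with open(sample_path, "w", encoding="utf-8") as f:
    f.write(original)

parts = split_by_lines(sample_path, work_dir, lines_per_part=2)

# Joining the parts in order should reproduce the original text
rejoined = ""
for p in parts:
    with open(p, "r", encoding="utf-8") as f:
        rejoined += f.read()
```

Splitting by line count rather than by a size hint makes the part sizes predictable, at the cost of parts whose byte sizes vary with line length.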

import itertools

def f_append(text):
    sign = 'N'
    # Load the words already in the file and check whether the new word is among them
    with open('./replace_word.txt','r',encoding='utf-8') as f:
        list_word = f.read().strip().split('\n')
        for line in list_word:
            if line == text:
                sign = 'Exist'
    if sign == 'N':
        # Add the word to the existing file
        with open('./replace_word.txt', 'a', encoding='utf-8') as myfile:
            myfile.write(text)
            myfile.write('\n')
            sign = 'Yes'

    return sign

def f_list():
    # Return the word file as a list
    with open('./replace_word.txt','r',encoding='utf-8') as f:
        list_word = f.read().strip().split('\n')
    return list_word

def f_del(text):
    # Delete the given word
    sign = 'N'
    matrix = []
    with open('./replace_word.txt','r',encoding='utf-8') as f:
        dic = f.read().strip().split('\n')

    for word in dic:
        if word != text:
            matrix.append(word)
        else:
            sign = 'Del'
    print(sign)
    print(dic)

    if sign == 'Del':
        # Rewrite the file without the deleted word
        with open('./replace_word.txt', 'w', encoding='utf-8') as myfile:
            for line2 in matrix:
                print(line2)
                myfile.write(line2)
                myfile.write('\n')
            sign = 'Y'

    return sign


sublime text3, encoding: applying Korean encoding

홍반장水_ 2019. 4. 18. 13:30

In ST3, press Ctrl + Shift + P to open the Command Palette and type install; only Package Control: Install Package is shown (see the screenshot in the original post). Select Package Control: Install Package, then search for ConvertToUTF8 and install it.


[python] Handling Unicode streams in Python

홍반장水_ 2017. 6. 22. 15:13

# Open the input and output streams
input_file = open("input.txt", "rt", encoding="utf-16")
output_file = open("output.txt", "wt", encoding="utf-8")

# Stream the Unicode data in chunks
with input_file, output_file:
    while True:
        # Read a chunk of data
        chunk = input_file.read(4096)
        if not chunk:
            break
        # Remove vertical tabs
        chunk = chunk.replace("\u000B", "")
        # Write the chunk
        output_file.write(chunk)


[M/L] tensorflow - Module: tf.compat, character encoding

홍반장水_ 2017. 3. 30. 19:07

https://www.tensorflow.org/api_docs/python/tf/compat

Module: tf.compat

Module `tf.compat`

Functions for Python 2 vs. 3 compatibility.

Conversion routines

In addition to the functions below, as_str converts an object to a str.

Types

The compatibility module also provides the following types:

bytes_or_text_types
complex_types
integral_types
real_types

Members

as_bytes(...): Converts either bytes or unicode to bytes, using utf-8 encoding for text.

as_str(...): Converts either bytes or unicode to str (an alias of as_bytes on Python 2 and of as_text on Python 3).

as_str_any(...): Converts to str as str(value), but uses as_str for bytes.

as_text(...): Returns the given argument as a unicode string.

Constant bytes_or_text_types

Constant complex_types

Constant integral_types

Constant real_types

Defined in tensorflow/python/util/compat.py.
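
To make the conversions concrete, here is a minimal pure-Python sketch of what as_bytes and as_text do for bytes and str inputs. It is an illustration written for this post, not TensorFlow's implementation. The names to_bytes and to_text are hypothetical, and only the bytes and str cases are handled.

```python
def to_bytes(value, encoding="utf-8"):
    """Return value as bytes; str input is encoded, bytes input is returned unchanged."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError(f"Expected bytes or str, got {type(value).__name__}")


def to_text(value, encoding="utf-8"):
    """Return value as str; bytes input is decoded, str input is returned unchanged."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError(f"Expected bytes or str, got {type(value).__name__}")


encoded = to_bytes("한글")    # str -> UTF-8 bytes
decoded = to_text(encoded)    # UTF-8 bytes -> str
print(encoded, decoded)
```

Both helpers accept either type, so calling code does not need to know which form it was handed.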
