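import pandas as pd

filename = 'C:/Users/haesaekju/OneDrive/Documents/PyData/P00000001-ALL.csv'
chunksize = 2 * 10 ** 5  # 200,000 rows per chunk
# Column names written as the header row of every output file
columns = ['cmte_id', 'cand_id', 'cand_nm', 'contbr_nm', 'contbr_city', 'contbr_st', 'contbr_zip', 'contbr_employer', 'contbr_occupation', 'contb_receipt_amt', 'contb_receipt_dt', 'receipt_desc', 'memo_cd', 'memo_text', 'form_tp', 'file_num']

# Read the large CSV in chunks and save each chunk as its own file
for cnt, chunk in enumerate(pd.read_csv(filename, chunksize=chunksize)):
    # preprocessing(chunk)
    chunk.to_csv('str_' + str(cnt) + '.csv', header=columns)
    if cnt >= 10:  # stop after the first 11 chunks (cnt 0 through 10)
        break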