Posts in the 'Python & Pandas & Polars' category (Page 2)

[polars] pl.Config(fmt_str_lengths=): print the full text of every string in a DataFrame

https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.Config.set_fmt_str_lengths.html

Python & Pandas & Polars 2023. 12. 12. 20:08

[Python] Assigning variable names in a for loop with globals()

This is possible with global variables. For example, to create one DataFrame per province (sido) name or province code, you can use code like the one below, built around globals()[variable_name].

    sido_code_list = { '11': '서울', '51': '강원', '41': '경기', '48': '경남', '47': '경북', '29': '광주', '27': '대구', '30': '대전', '26': '부산', '36': '세종', '31': '울산', '28': '인천', '46': '전남', '45': '전북', '50': '제주', '44': '충남', '43': '충북' }
    for code, sido in sido_code_list.items():
        # create the variable name dynamically
        df_name = f"df_{code}" #df_1..
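The excerpt above is cut off, so here is a minimal sketch of the same idea rather than the post's own code. A plain dict stands in for each DataFrame so that nothing beyond the standard library is needed; the shortened three-entry mapping and the name `frames` are assumptions made for illustration.

```python
# Shortened, hypothetical subset of the province-code mapping.
sido_code_list = {"11": "서울", "26": "부산", "50": "제주"}

for code, sido in sido_code_list.items():
    df_name = f"df_{code}"  # "df_11", "df_26", "df_50"
    # A dict stands in for the DataFrame that would be built for this province.
    globals()[df_name] = {"code": code, "name": sido}

print(globals()["df_11"])

# Alternative sketch: one dictionary keyed by name instead of many globals.
frames = {
    f"df_{code}": {"code": code, "name": sido}
    for code, sido in sido_code_list.items()
}
print(frames["df_26"]["name"])
```

The dictionary variant is usually easier to loop over later and keeps the global namespace clean, so it may be worth considering before reaching for globals().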

Python & Pandas & Polars 2023. 11. 21. 18:44

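[polars] Querying rows that satisfy a condition (df.filter)

In pandas I looked rows up with df[df['컬럼'] == 'abc'], whereas polars uses a method called filter ('컬럼' is a placeholder for a column name).

    df.filter(pl.col("컬럼") == "abc")

With multiple conditions (example from the official site):

    df.filter((pl.col("foo") < 3) & (pl.col("ham") == "a"))

Rows whose value is duplicated:

    df.filter(pl.col('컬럼').is_duplicated())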

Python & Pandas & Polars 2023. 11. 21. 16:29

[polars] Showing the full string or list value inside a cell (polars.Config.set_fmt_str_lengths, max_colwidth)

When polars prints a DataFrame, long strings are cut off. In pandas, max_colwidth solved this: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.option_context.html

The polars syntax:

    pl.Config.set_fmt_str_lengths(200)

Example code from the official site:

    df = pl.DataFrame( { "txt": [ "Play it, Sam. Play 'As Time Goes By'.", "T..

Python & Pandas & Polars 2023. 11. 21. 15:07

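[polars] Changing the data type of specific columns or all columns of a DataFrame (cast)

Basic syntax from the official site:

    df.cast({"foo": pl.Float32, "bar": pl.UInt8})

Converting every column to str (string), which polars calls Utf8:

    df = df.with_columns(pl.all().cast(pl.Utf8, strict=False))

with_columns selects all columns through pl.all(), and cast converts each of them to the given data type. strict controls what happens when a value cannot be converted: with strict=False the value becomes null instead of raising an error.

(For reference) code that converts the values inside a list to str. Source: https://stackoverflow.com/questions/75628413/cast-column-of-type-list-to-str-in-polars

    df.with_columns(pl.col("f..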

Python & Pandas & Polars 2023. 11. 21. 12:53

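[polars] Saving with write_csv as if it were UTF-8-SIG (include_bom)

Unlike pandas, polars saves the file without an index.

    # basic code ('파일명' means 'file name')
    df.write_csv("파일명.csv")

pandas can save with a specific encoding, but polars does not support that as of 2023-11-21. The library is updated frequently, though, and the recently added `include_bom` parameter writes a UTF-8 byte order mark so that the file is read correctly on Windows (as of polars 0.19.15). https://github.com/pola-rs/polars/pull/12253

    df.write_csv("파일명.csv", include_bom=True)

See the official site for the other parameters. Below is a plain translation of them (the library is new, so it is likely to keep changing).

parameter:
- file: the file path to which the result is written, or ..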

Python & Pandas & Polars 2023. 11. 21. 11:19

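[polars] with_columns(), map_elements (= apply): creating a new column after preprocessing columns

If none of the columns contain null values, you can write it like this (새컬럼 is the new column, 참조컬럼1 and 참조컬럼2 are the source columns):

    df = df.with_columns(새컬럼 = pl.col('참조컬럼1') + pl.col('참조컬럼2'))

To handle rows that contain null values, use when, then, otherwise. The null check is written with is_not_null(), because comparing an expression to None with != does not give a usable True/False result in polars.

    df = df.with_columns(새컬럼 = pl.when(pl.col('참조컬럼2').is_not_null()).then(pl.col('참조컬럼1') + pl.col('참조컬럼2')).otherwise(None))

If 참조컬럼2 is not null (when), the value of 참조컬럼1 + 참조컬럼2 is written (then). If 참조컬럼2 is null, None is written instead of that sum (otherwise).

apply(=map_eleme..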

Python & Pandas & Polars 2023. 11. 17. 10:45

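[polars] read_csv: treating specific strings as None, setting dtypes

Turning specific strings into None while reading a csv file. Note that dtype and na_values are pandas parameter names; in polars, infer_schema_length=0 reads every column as a string and null_values lists the strings to treat as null.

    df = pl.read_csv("data.csv", infer_schema_length=0, null_values=["", " "])

Reference: stackoverflow

    df = pl.read_csv("test.csv", infer_schema_length=0).with_columns(pl.all().cast(pl.Utf8, strict=False))

Turning specific strings into None when the DataFrame already exists:

    df = df.with_columns(
        pl.when(pl.col(pl.Utf8) == "")
        .then(None)
        .otherwise(pl.col(pl.Utf8)) # keep original value
        .name.keep()
    )

If any one of several strings ..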

Python & Pandas & Polars 2023. 11. 17. 10:25

[Python] Merging several dictionaries into one dictionary

** is used as an unpacking operator in Python. When applied to a dictionary, ** spreads its key-value pairs out, either into a new dictionary literal or into the keyword arguments of a function call. For example, suppose you want to merge the following two dictionaries into one:

    dict1 = {'a': 1, 'b': 2}
    dict2 = {'c': 3, 'd': 4}

Unpacking them with ** looks like this:

    combined_dict = {**dict1, **dict2}
    print(combined_dict) # {'a': 1, 'b': 2, 'c': 3, 'd': 4}

Using ** this way makes it easy to merge several dictionaries into one, and if keys are duplicated..
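A short sketch of how ** behaves when the dictionaries share a key, and of the keyword-argument form of unpacking. The sample values and the function `describe` are made up for illustration; the `|` operator at the end requires Python 3.9 or later.

```python
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 20, "c": 3}

# When a key appears in both, the value from the later dictionary wins.
combined = {**dict1, **dict2}
print(combined)  # {'a': 1, 'b': 20, 'c': 3}


def describe(a, b):
    """Hypothetical helper that receives the unpacked pairs as keyword arguments."""
    return f"a={a}, b={b}"


print(describe(**dict1))  # a=1, b=2

# Python 3.9+ offers the merge operator with the same "later wins" rule.
merged = dict1 | dict2
print(merged == combined)  # True
```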

Python & Pandas & Polars 2023. 5. 3. 16:33

[Pandas] groupby, agg: collecting several rows into a list in a single row

This post introduces code that does the opposite of pandas' explode function. The code was written with help from chatGPT. Collecting strings into a list in a single row can be implemented with groupby and agg as follows.

    import pandas as pd

    # example DataFrame
    df = pd.DataFrame({ 'id': [1, 1, 2, 2], 'text': ['hello', 'world', 'foo', 'bar'] })

    # collect the strings into a list
    grouped = df.groupby('id')['text'].apply(list).reset_index(name='text_list')

In the code above, groupby groups the rows by id, and instead of the agg function the apply function..
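Since the explanation above is cut off, here is a sketch that shows the agg(list) spelling next to the apply(list) one and then round-trips the result through explode. The variable names `grouped_agg` and `restored` are assumptions for illustration.

```python
import pandas as pd

# Same example data as in the excerpt.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "text": ["hello", "world", "foo", "bar"],
})

# Collect the strings of each id into one list per row.
grouped = df.groupby("id")["text"].apply(list).reset_index(name="text_list")

# agg(list) gives the same result for this case.
grouped_agg = df.groupby("id")["text"].agg(list).reset_index(name="text_list")

# explode undoes the grouping: one row per list element again.
restored = grouped.explode("text_list")

print(grouped)
print(restored)
```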

Python & Pandas & Polars 2023. 4. 28. 14:07

여분의 해마
