[Python] TabTransformer with TensorFlow
Following up on the previous post, I was digging around in Python to try applying something. The official code is
https://github.com/lucidrains/tab-transformer-pytorch
but for some reason it kept throwing errors, so after wandering around GitHub for a while I'm sharing a repository I found:
https://github.com/aruberts/TabTransformerTF/tree/main
It is written in TensorFlow rather than PyTorch.
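The package appears to be installable from PyPI under the same name as the import path used below (if that doesn't match your setup, check the repository README):

pip install tabtransformertf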
import numpy as np   # needed by the modified df_to_dataset below
import pandas as pd  # needed for the df_to_dataset type hint
import seaborn as sns
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from tabtransformertf.models.tabtransformer import TabTransformer
from tabtransformertf.utils.preprocessing import df_to_dataset, build_categorical_prep
# load the Titanic dataset from seaborn
df = sns.load_dataset('titanic')
# handle missing data: drop columns with missing values, then drop remaining NaN rows
df = df.drop(['deck', 'age'], axis=1)
df = df.dropna()
df = df.reset_index(drop=True)
df = df[['sex', 'embarked', 'class', 'who', 'adult_male', 'embark_town', 'pclass', 'sibsp', 'alone', 'fare', 'survived']]
# cast everything to string, then restore numeric types for the numerical feature and the label
df = df.astype('str')
df['fare'] = df['fare'].astype('float')
df['survived'] = df['survived'].astype('int')
# define the feature groups
categorical_features_list = ['sex', 'embarked', 'class', 'who', 'adult_male', 'embark_town', 'pclass', 'sibsp', 'alone']
numerical_features_list = ['fare']
label_feature = ['survived']
# build preprocessing layers for the categorical features
category_prep_layers = build_categorical_prep(df, categorical_features_list)
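# (Aside) A hedged sketch of roughly what build_categorical_prep is assumed to do here:
# one string-lookup layer per categorical column, keyed by column name. The actual
# implementation lives in tabtransformertf.utils.preprocessing and may differ in detail.
def build_categorical_prep_sketch(frame, categorical_features):
    from tensorflow.keras.layers import StringLookup
    prep = {}
    for col in categorical_features:
        # vocabulary taken from the unique string values of the column
        prep[col] = StringLookup(vocabulary=list(frame[col].unique()))
    return prep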
# Split the data into training and testing sets
df, df1 = train_test_split(df, test_size=0.2, random_state=28)
# if an error occurs, use the imported version instead: from tabtransformertf.utils.preprocessing import df_to_dataset
def df_to_dataset(
    dataframe: pd.DataFrame,
    target: str = None,
    # test set -> shuffle=False
    shuffle: bool = True,
    batch_size: int = 32,
):
    df = dataframe.copy()
    if target:
        labels = df.pop(target)
        dataset = {}
        for key, value in df.items():
            # dataset[key] = value[:, tf.newaxis]  # old version
            dataset[key] = np.array(value)[:, tf.newaxis]  # modified
        dataset = tf.data.Dataset.from_tensor_slices((dict(dataset), labels))
    else:
        dataset = {}
        for key, value in df.items():
            # dataset[key] = value[:, tf.newaxis]  # old version
            dataset[key] = np.array(value)[:, tf.newaxis]  # modified
        dataset = tf.data.Dataset.from_tensor_slices(dict(dataset))
    if shuffle:
        dataset = dataset.shuffle(buffer_size=len(dataframe))
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(batch_size)
    return dataset
# build the training tf.data.Dataset
train_dataset = df_to_dataset(df[categorical_features_list + numerical_features_list + label_feature], *label_feature)
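# (Aside) quick sanity check: the dataset yields (features_dict, label) batches,
# with each feature arriving as a column of shape (batch_size, 1)
for features, labels in train_dataset.take(1):
    print(list(features.keys()))   # feature/column names
    print(features['fare'].shape)  # e.g. (32, 1)
    print(labels.shape)            # e.g. (32,)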
# binary classification
model = TabTransformer(
    numerical_features=numerical_features_list,
    categorical_features=categorical_features_list,
    categorical_lookup=category_prep_layers,
    # output dimension and activation
    out_dim=1,
    embedding_dim=32,
    out_activation='sigmoid',
    # number of transformer blocks
    depth=6,
    # number of attention heads
    heads=4,
    # dropout in attention layers
    attn_dropout=0.2,
    # dropout in dense layers
    ff_dropout=0.2,
    # MLP hidden unit factors: hidden_units = [input_dim // f for f in factors]
    mlp_hidden_factors=[2, 4],
    # fixed column embeddings
    use_column_embedding=True,
)
# train the model
model.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])
model.fit(train_dataset, epochs=40)
# evaluate on the test set
test_data = df_to_dataset(df1[categorical_features_list + numerical_features_list + label_feature], *label_feature, shuffle=False)
# predict returns probabilities of shape (n, 1); threshold at 0.5
y_pred = [1 if i > 0.5 else 0 for i in model.predict(test_data)]
print(f1_score(df1['survived'].values, y_pred))
# Epoch 40/40
# 23/23 [==============================] - 1s 24ms/step - loss: 0.4314 - accuracy: 0.8172
# 6/6 [==============================] - 1s 8ms/step
# 0.7478260869565218
#reference
#https://github.com/aruberts/TabTransformerTF/tree/main
I ran this with the Titanic data. As long as you match the data types and specify numerical_features and categorical_features correctly, the code runs without any trouble.
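To adapt this to another dataset, the same steps boil down to the sketch below. It reuses the imports and helpers from above; the DataFrame variable data and the column names cat_a, cat_b, num_x, label are placeholders, and the model arguments are simply the ones used in this post:

# hedged template: data, cat_a, cat_b, num_x, label are all placeholders
cat_cols = ['cat_a', 'cat_b']     # categorical feature names
num_cols = ['num_x']              # numerical feature names
target = 'label'                  # binary target column

data[cat_cols] = data[cat_cols].astype('str')    # categorical features as strings
data[num_cols] = data[num_cols].astype('float')  # numerical features as floats
data[target] = data[target].astype('int')        # binary label as int

prep = build_categorical_prep(data, cat_cols)
train_df, test_df = train_test_split(data, test_size=0.2, random_state=28)
train_ds = df_to_dataset(train_df[cat_cols + num_cols + [target]], target)
test_ds = df_to_dataset(test_df[cat_cols + num_cols + [target]], target, shuffle=False)

new_model = TabTransformer(
    numerical_features=num_cols,
    categorical_features=cat_cols,
    categorical_lookup=prep,
    out_dim=1,
    embedding_dim=32,
    out_activation='sigmoid',
    depth=6,
    heads=4,
    attn_dropout=0.2,
    ff_dropout=0.2,
    mlp_hidden_factors=[2, 4],
    use_column_embedding=True,
)
new_model.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])
new_model.fit(train_ds, epochs=40)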