
Miscellany Blog

[Python] Using TabTransformer with TensorFlow

코딩부대찌개 2023. 11. 19. 01:28

https://eupppo.tistory.com/entry/%EB%85%BC%EB%AC%B8-%EC%9D%BD%EA%B8%B0-TabTransformer-Tabular-Data-Modeling-Using-Contextual-Embeddings

[Paper Reading] TabTransformer: Tabular Data Modeling Using Contextual Embeddings (paper: https://arxiv.org/abs/2012.06678)

Following up on that post, I went digging through Python code to try actually applying the model.

I had no picture to include, so I asked an AI to draw a TabTransformer, and this is what came out -> ??

The commonly cited PyTorch implementation (written by lucidrains, not released by the paper's authors):

https://github.com/lucidrains/tab-transformer-pytorch

For some reason it kept throwing errors for me. I browsed GitHub for a while and found the repository below, which I'm sharing here:

https://github.com/aruberts/TabTransformerTF/tree/main

It is written in TensorFlow rather than PyTorch.
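Further down, the script redefines `df_to_dataset` locally. Compared with the line marked `# old version`, the only real change is `np.array(value)[:, tf.newaxis]` in place of `value[:, tf.newaxis]`.

My guess is that the old line indexes a pandas Series with a 2-D key, which newer pandas versions no longer accept. This is an assumption, since the post does not show the actual error.

The sketch below reproduces only the patched step with NumPy and pandas on a tiny illustrative frame. It uses `np.newaxis` in place of `tf.newaxis`, since both are just `None`. It shows that every column becomes an `(n, 1)` array before being handed to `tf.data`.

```python
import numpy as np
import pandas as pd

# Tiny illustrative stand-in for two Titanic columns (not the real dataset)
df = pd.DataFrame({"sex": ["male", "female", "female"], "fare": [7.25, 71.2833, 7.925]})

# Old line in df_to_dataset:  value[:, tf.newaxis]  (indexes the Series directly)
# Patched line:               np.array(value)[:, tf.newaxis]  (convert to NumPy first)
features = {key: np.array(value)[:, np.newaxis] for key, value in df.items()}

for key, column in features.items():
    print(key, column.shape, column.dtype)
# sex (3, 1) object
# fare (3, 1) float64
```

Converting to NumPy first sidesteps pandas' indexing rules entirely, so the patched line should behave the same regardless of the installed pandas version.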

import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from tabtransformertf.models.tabtransformer import TabTransformer
from tabtransformertf.utils.preprocessing import df_to_dataset, build_categorical_prep

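# Load the Titanic dataset bundled with seaborn
df = sns.load_dataset('titanic')

# Handle missing data: drop the sparse 'deck' and 'age' columns,
# then drop the few remaining rows with NaNs (in 'embarked'/'embark_town')
df = df.drop(['deck', 'age'], axis=1)
df = df.dropna()
df = df.reset_index(drop=True)
df = df[['sex', 'embarked', 'class', 'who', 'adult_male', 'embark_town', 'pclass', 'sibsp', 'alone', 'fare', 'survived']]

# Cast everything to str so the categorical lookup layers receive strings,
# then restore the numerical feature (float) and the label (int)
df = df.astype('str')
df['fare'] = df['fare'].astype('float')
df['survived'] = df['survived'].astype('int')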

# Assign feature roles
categorical_features_list = ['sex', 'embarked', 'class', 'who', 'adult_male', 'embark_town', 'pclass', 'sibsp', 'alone']
numerical_features_list = ['fare']
label_feature = ['survived']
# Build one category-lookup preprocessing layer per categorical column
category_prep_layers = build_categorical_prep(df, categorical_features_list)

# Split the data into training (df) and test (df1) sets
df, df1 = train_test_split(df, test_size=0.2, random_state=28)

# Local copy of df_to_dataset from tabtransformertf.utils.preprocessing with one line
# modified (marked below); if you hit an error, use whichever version runs in your environment
def df_to_dataset(
    dataframe: pd.DataFrame,
    target: str = None,
    shuffle: bool = True,  # use shuffle=False for the test set
    batch_size: int = 32,
):
    df = dataframe.copy()
    labels = df.pop(target) if target else None
    dataset = {}
    for key, value in df.items():
        # dataset[key] = value[:, tf.newaxis]  # old version
        dataset[key] = np.array(value)[:, tf.newaxis]  # modified: convert to NumPy first
    if labels is not None:
        dataset = tf.data.Dataset.from_tensor_slices((dataset, labels))
    else:
        dataset = tf.data.Dataset.from_tensor_slices(dataset)
    if shuffle:
        dataset = dataset.shuffle(buffer_size=len(dataframe))
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(batch_size)
    return dataset

# Build the batched training dataset (features + 'survived' label)
train_dataset = df_to_dataset(df[categorical_features_list + numerical_features_list + label_feature], target=label_feature[0])

# Binary classification model
model = TabTransformer(
    numerical_features=numerical_features_list,
    categorical_features=categorical_features_list,
    categorical_lookup=category_prep_layers,
    # output size and activation: one sigmoid unit for binary classification
    out_dim=1,
    out_activation='sigmoid',
    # size of each categorical embedding
    embedding_dim=32,
    # number of transformer blocks
    depth=6,
    # number of attention heads
    heads=4,
    # dropout in the attention layers
    attn_dropout=0.2,
    # dropout in the dense (feed-forward) layers
    ff_dropout=0.2,
    # MLP hidden-unit factors:
    # hidden_units = [input_dim // f for f in factors]
    mlp_hidden_factors=[2, 4],
    # column embedding: marks which feature each categorical token came from
    use_column_embedding=True,
)

#model train
model.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])
model.fit(train_dataset, epochs=40)

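# Evaluate on the held-out split (shuffle=False keeps predictions aligned with df1)
test_data = df_to_dataset(df1[categorical_features_list + numerical_features_list + label_feature], target=label_feature[0], shuffle=False)
# Threshold the sigmoid outputs at 0.5 to get class labels
y_pred = (model.predict(test_data).ravel() > 0.5).astype(int)
print(f1_score(df1['survived'].values, y_pred))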
# Output from my run (last training epoch, then test-set F1):
# Epoch 40/40
# 23/23 [==============================] - 1s 24ms/step - loss: 0.4314 - accuracy: 0.8172
# 6/6 [==============================] - 1s 8ms/step
# 0.7478260869565218
# Reference: https://github.com/aruberts/TabTransformerTF/tree/main
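The `mlp_hidden_factors` comment in the script divides the MLP input size by each factor. As a worked example, here is that rule applied to this post's settings: 9 categorical columns, 1 numerical column, and `embedding_dim=32`.

The sketch assumes, following the paper's layout, that the MLP input is the flattened contextual embeddings of the categorical columns plus the raw numerical columns. TabTransformerTF may size its MLP input differently, so treat the numbers as illustrative. `mlp_hidden_units` is a hypothetical helper, not part of the library.

```python
def mlp_hidden_units(n_categorical, n_numerical, embedding_dim, factors):
    """Hypothetical helper applying hidden_units = [input_dim // f for f in factors]."""
    # Assumption: each categorical column contributes one embedding_dim-sized
    # contextual embedding, and numerical columns are appended as-is.
    input_dim = n_categorical * embedding_dim + n_numerical
    return input_dim, [input_dim // f for f in factors]


# 9 categorical columns, 1 numerical column ('fare'), embedding_dim=32, factors [2, 4]
input_dim, hidden_units = mlp_hidden_units(9, 1, 32, [2, 4])
print(input_dim, hidden_units)  # 289 [144, 72]
```

With factors `[2, 4]`, the two hidden layers come out at roughly half and a quarter of the input width. Larger factors shrink the MLP faster.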

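I ran this on the Titanic data. As long as the data types line up (categorical columns as strings, the numerical column as float, the label as int) and numerical_features and categorical_features are specified correctly, the code runs without problems.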
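Why do the data types matter? `build_categorical_prep` creates one lookup layer per categorical column, and such a layer matches tokens by exact string value. I assume here that it wraps Keras `StringLookup` built from each column's unique values; check the repository to confirm.

The dependency-free sketch below imitates that behaviour using Keras' default layout: index 0 for out-of-vocabulary tokens, vocabulary indices from 1. `build_lookup` and `encode` are hypothetical names. The real layer may handle mismatched dtypes differently from this sketch.

```python
def build_lookup(values):
    """Hypothetical string -> index table: index 0 is reserved for unknown tokens."""
    vocab = []
    for v in values:
        if v not in vocab:
            vocab.append(v)  # keep first-seen order, like pandas .unique()
    return {token: i + 1 for i, token in enumerate(vocab)}


def encode(lookup, values):
    # Tokens not in the vocabulary fall into the out-of-vocabulary bucket 0
    return [lookup.get(v, 0) for v in values]


# pclass arrives as integers in seaborn's Titanic data; the script casts it
# to str first (df.astype('str')), which this mirrors
pclass = [3, 1, 3, 2]
lookup = build_lookup([str(v) for v in pclass])
print(lookup)                           # {'3': 1, '1': 2, '2': 3}
print(encode(lookup, ["1", "2", "4"]))  # [2, 3, 0]
print(encode(lookup, [1]))              # [0]  (int 1 does not match the string '1')
```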