Gradient Accumulation


잡동사니 블로그 (Miscellaneous Blog)


Category: 공부용 (study notes)


코딩부대찌개 2025. 1. 1. 15:37

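Gradient Accumulation is a training technique for getting the same effect as a large batch in a memory-constrained environment. It processes several small mini-batches one after another, accumulating (summing) their gradients, and only performs the optimizer update once the accumulation is finished. With a mini-batch size of B and N accumulation steps, the effective batch size is B × N.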
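To see why dividing each mini-batch loss by the number of accumulation steps reproduces the large-batch gradient, here is a minimal pure-Python sketch. It assumes a 1-D linear model `y_hat = w * x` with mean-squared-error loss and equal-sized mini-batches; the data and the helper names (`mse_grad`, `accumulated_grad`) are hypothetical and only for illustration.

```python
# Hypothetical toy setup: 1-D linear model y_hat = w * x with MSE loss.

def mse_grad(w, xs, ys):
    """Derivative w.r.t. w of mean((w*x - y)^2) over one batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Sum of per-mini-batch gradients, each scaled by 1/accumulation_steps."""
    assert len(xs) % micro_batch_size == 0, "assumes equal-sized mini-batches"
    accumulation_steps = len(xs) // micro_batch_size
    grad = 0.0
    for start in range(0, len(xs), micro_batch_size):
        end = start + micro_batch_size
        # Same effect as calling backward() on (loss / accumulation_steps)
        grad += mse_grad(w, xs[start:end], ys[start:end]) / accumulation_steps
    return grad


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
w = 0.5

full = mse_grad(w, xs, ys)              # one batch of 8 examples
accum = accumulated_grad(w, xs, ys, 2)  # 4 mini-batches of 2, accumulated
print(full, accum)
```

The two values agree up to floating-point rounding. The equivalence relies on the loss being a mean over equal-sized mini-batches (with a sum-reduced loss you would not divide at all). Layers that compute statistics over the batch, such as BatchNorm, still only see the small mini-batch, so accumulation is not an exact substitute in those models.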

1. Why Is Gradient Accumulation Needed?

1.1. GPU memory limits

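Increasing the batch size means more input data and more intermediate activations (which must be kept for backpropagation) have to be held in memory at once, and this often leads to GPU out-of-memory errors. The memory for the model parameters themselves does not grow with the batch size; it is mainly the activations that do.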
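When the desired batch size does not fit in GPU memory, one practical approach is to find the largest mini-batch that does fit (usually by trial) and derive the number of accumulation steps from it. The function below is a hypothetical helper, not part of any library; it picks the largest mini-batch size within the memory limit that divides the target batch size evenly, so that mini-batch size × accumulation steps equals the target exactly.

```python
def plan_accumulation(target_batch_size, max_micro_batch_size):
    """Return (micro_batch_size, accumulation_steps) such that
    micro_batch_size * accumulation_steps == target_batch_size.

    Hypothetical helper: chooses the largest micro-batch size that is
    <= max_micro_batch_size and divides target_batch_size evenly.
    """
    if target_batch_size <= 0 or max_micro_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    for micro in range(min(target_batch_size, max_micro_batch_size), 0, -1):
        if target_batch_size % micro == 0:
            return micro, target_batch_size // micro


print(plan_accumulation(256, 32))  # (32, 8)
print(plan_accumulation(96, 40))   # (32, 3)
```

The loop always terminates with a result because a micro-batch size of 1 divides any positive target.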

1.2. Benefits of a large batch size

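Gradient stability: averaging over more examples reduces the variance of the gradient estimate, so training is more stable.
Generalization: a larger batch size can help the model learn more general patterns, although this is not guaranteed; very large batches are often reported to generalize worse unless the learning rate and schedule are retuned.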
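The stability claim can be checked numerically: for independent samples, the variance of a mini-batch mean shrinks roughly as 1/B. The sketch below treats per-example gradients as hypothetical draws from a Gaussian (true gradient 1.0, noise standard deviation 2.0, an assumption made only for illustration) and compares the empirical variance of the batch-averaged gradient for two batch sizes.

```python
import random
import statistics


def batch_grad_variance(batch_size, trials=2000, seed=0):
    """Empirical variance of the mean of `batch_size` simulated per-example gradients."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Hypothetical per-example gradients: mean 1.0, std 2.0
        grads = [rng.gauss(1.0, 2.0) for _ in range(batch_size)]
        means.append(statistics.fmean(grads))
    return statistics.variance(means)


var_small = batch_grad_variance(4)   # expected to be near 2.0**2 / 4 = 1.0
var_large = batch_grad_variance(32)  # expected to be near 2.0**2 / 32 = 0.125
print(var_small, var_large, var_small / var_large)
```

The ratio comes out close to 32 / 4 = 8. Because the accumulated gradient is the average over all mini-batches in the window, Gradient Accumulation gets this variance reduction for the effective batch size, not just the per-step mini-batch size.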

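Example (PyTorch training loop with mixed precision (AMP) and Gradient Accumulation):

import torch
from torch.amp import autocast, GradScaler  # torch.amp.GradScaler needs PyTorch 2.3+

# Assumed to be defined elsewhere: model, train_dataloader, loss_fn,
# optimizer, device, num_epochs, accumulation_steps
scaler = GradScaler("cuda")

model.train()

for epoch in range(num_epochs):
    batch_loss = 0.0
    optimizer.zero_grad()  # start every epoch with clean gradients

    for i, (inputs, targets) in enumerate(train_dataloader):
        inputs, targets = inputs.to(device), targets.to(device)

        # Forward pass in mixed precision
        with autocast("cuda"):
            outputs = model(inputs)
            # loss function
            loss = loss_fn(outputs, targets)

        # Apply Gradient Accumulation: divide the loss so the summed
        # gradients average over accumulation_steps mini-batches
        loss = loss / accumulation_steps
        scaler.scale(loss).backward()

        batch_loss += loss.item()  # average loss over the current window

        # Update the weights every accumulation_steps mini-batches, and once
        # more at the end of the epoch so leftover gradients are not lost
        if (i + 1) % accumulation_steps == 0 or (i + 1) == len(train_dataloader):
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
            print(f"epoch {epoch} step {i + 1}: loss {batch_loss:.4f}")
            batch_loss = 0.0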
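An update condition of the form `(i + 1) % accumulation_steps == 0` only fires on full windows, so when the number of mini-batches per epoch is not a multiple of `accumulation_steps`, the last few gradients never get their own optimizer step in that epoch (and, unless gradients are zeroed at the start of the next epoch, they leak into its first window). The small pure-Python sketch below uses hypothetical helper names to list which mini-batch indices would trigger an update with and without a final flush, and which divisor each mini-batch loss would need so that a short final window is still a true average.

```python
def update_indices(num_batches, accumulation_steps, flush_last=True):
    """0-based mini-batch indices i at which optimizer.step() would run."""
    steps = []
    for i in range(num_batches):
        window_full = (i + 1) % accumulation_steps == 0
        is_last = (i + 1) == num_batches
        if window_full or (flush_last and is_last):
            steps.append(i)
    return steps


def loss_divisors(num_batches, accumulation_steps):
    """Divisor for each mini-batch loss so every window is averaged correctly.

    Full windows divide by accumulation_steps; a shorter final window
    divides by its actual length instead.
    """
    divisors = []
    for start in range(0, num_batches, accumulation_steps):
        window = min(accumulation_steps, num_batches - start)
        divisors.extend([window] * window)
    return divisors


print(update_indices(10, 4, flush_last=False))  # [3, 7]
print(update_indices(10, 4))                    # [3, 7, 9]
print(loss_divisors(10, 4))                     # [4, 4, 4, 4, 4, 4, 4, 4, 2, 2]
```

Dividing by the actual window length only changes anything for a short final window; for full windows it reduces to the usual `loss / accumulation_steps` scaling.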
