
A Collection of Web Pages on Using PyTorch DDP (Distributed Data Parallel)


1. PyTorch Tutorials

[overview] https://pytorch.org/tutorials/beginner/dist_overview.html

PyTorch Distributed Overview (author: Shen Li): the overview page for the torch.distributed package, which groups the distributed documentation into topics and briefly describes each of them. (pytorch.org)

[getting started with DDP] https://pytorch.org/tutorials/intermediate/ddp_tutorial.html

Getting Started with Distributed Data Parallel (author: Shen Li, edited by Joe Zhu): DistributedDataParallel (DDP) implements data parallelism at the module level and can run across multiple machines. (pytorch.org)
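As a companion to that tutorial, here is a minimal single-node sketch of the pattern it teaches (a sketch only; the nn.Linear toy model, the nccl backend, and the localhost/29500 rendezvous address are assumptions, with one spawned process per GPU):

    import os
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def worker(rank, world_size):
        # Each spawned process joins the same process group before using DDP.
        os.environ["MASTER_ADDR"] = "localhost"
        os.environ["MASTER_PORT"] = "29500"
        dist.init_process_group("nccl", rank=rank, world_size=world_size)

        # Place the model on this process's GPU, then wrap it in DDP.
        model = nn.Linear(10, 1).to(rank)
        ddp_model = DDP(model, device_ids=[rank])

        # backward() transparently all-reduces gradients across processes.
        loss = ddp_model(torch.randn(8, 10, device=rank)).sum()
        loss.backward()

        dist.destroy_process_group()

    if __name__ == "__main__":
        world_size = torch.cuda.device_count()
        mp.spawn(worker, args=(world_size,), nprocs=world_size)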

2. Personal Blogs

[explanation collection] https://kongsberg.tistory.com/7

pytorch Distributed DataParallel explained (how to do multi-GPU): shows how each process gets its own shard of the dataset via DistributedSampler, e.g.

    from torch.utils.data.distributed import DistributedSampler

    train_dataset = datasets.ImageFolder(traindir, ...)
    train_sampler = DistributedSampler(train_dataset)
    train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=args.batch_size,
        shuffle=False,  # the sampler takes over shuffling across processes
        sampler=train_sampler)

(kongsberg.tistory.com)
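One usage note on the snippet above (the epoch loop and num_epochs are illustrative): when DistributedSampler handles shuffling, set_epoch must be called at the start of every epoch so that each epoch shuffles in a different order across processes:

    for epoch in range(num_epochs):
        train_sampler.set_epoch(epoch)  # reseed this epoch's shuffle
        for batch in train_loader:
            ...  # training step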

[terminology guide] https://better-tomorrow.tistory.com/entry/Pytorch-Multi-GPU-%EC%A0%95%EB%A6%AC-%EC%A4%91

Pytorch Multi-GPU notes (in progress): terminology. [Node] in distributed processing, a machine with GPUs attached is called a node; one computer means one node, two computers mean two nodes. [World size] the number of processes participating in the job. (better-tomorrow.tistory.com)
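To make those terms concrete, the usual arithmetic looks like this (a sketch; the node and GPU counts are illustrative, and the environment variables in the comment are the ones exported by torchrun):

    # Example: 2 nodes, each with 4 GPUs, one process per GPU.
    nodes = 2
    gpus_per_node = 4
    world_size = nodes * gpus_per_node  # 8 processes in total

    # A process's global rank is derived from its node and local GPU index:
    node_rank = 1    # this process runs on the second machine
    local_rank = 3   # on that machine's fourth GPU
    rank = node_rank * gpus_per_node + local_rank  # global rank 7

    # Launchers such as torchrun export these per process as the
    # RANK, LOCAL_RANK, and WORLD_SIZE environment variables.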

[explanation with examples] https://dbwp031.tistory.com/32

[Pytorch] DDP-Distributed Data Parallel implementation: when training large models, e.g. on ImageNet, multiple GPUs can be used to speed up training. DataParallel and DistributedDataParallel are the two representative approaches, and DataParallel works with just a one-line change (see the sketch below). (dbwp031.tistory.com)
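For reference, that one-line DataParallel change looks roughly like this (a sketch; the nn.Linear model stands in for any nn.Module):

    import torch.nn as nn

    model = nn.Linear(10, 1)               # any existing nn.Module
    model = nn.DataParallel(model).cuda()  # the one-line change: a single
                                           # process splits each batch across
                                           # all visible GPUs

DistributedDataParallel instead runs one process per GPU and all-reduces gradients between them, which is why it needs the extra setup covered in the tutorials of section 1.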