Building Simple Modules with PyTorch - 1

피라냐콜라다 2023. 3. 17. 14:47

Before building an actual deep learning model, this post walks through writing a few simple custom modules. Since it shows several module examples, it will probably run a bit long.

We will use nn.Module for this. The details are in the official documentation below.

https://pytorch.org/docs/stable/generated/torch.nn.Module.html?highlight=nn+module#torch.nn.Module

Module — PyTorch 1.13 documentation


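The word "model" usually brings to mind a complicated structure of layers and activation functions wired together, but in a broader sense a model is just one kind of module. Let's start with nn.Module and build a simple module that adds two tensors.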

import torch
from torch import nn

class Add(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x1, x2):
        return torch.add(x1, x2)


x1 = torch.tensor([1])
x2 = torch.tensor([2])

add = Add()
add(x1, x2)

>>> tensor([3])

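When you define a module like this, forward holds the computation that runs every time the module is called. That is why calling the Add() instance like a function is enough to perform the addition, even though forward is never called explicitly.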
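To make the link between calling a module and forward concrete, here is a minimal sketch. The class name `Doubler` and its `calls` counter are made up for illustration; the only PyTorch behavior it relies on is that calling a module instance dispatches to `forward`.

```python
import torch
from torch import nn


class Doubler(nn.Module):
    """Hypothetical module that counts how often forward runs."""

    def __init__(self):
        super().__init__()
        self.calls = 0  # plain Python attribute, not a parameter

    def forward(self, x):
        self.calls += 1
        return x * 2


doubler = Doubler()
x = torch.tensor([1, 2])

out_call = doubler(x)             # goes through nn.Module.__call__
out_forward = doubler.forward(x)  # direct call, skips any registered hooks
```

Both lines run the same computation, but calling the instance is the usual choice because it also runs any hooks registered on the module.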

Let's look at several features of modules through the examples below. First, we redefine Add as a module that adds a given value to the input tensor.

class Add(nn.Module):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def forward(self, x):
        return x + self.value

There are several ways to run multiple modules one after another.

1. Sequential

Sequential() chains several modules so that the output of each one becomes the input of the next.

calculator = nn.Sequential(Add(3), Add(2), Add(5))

x = torch.tensor([1])

output = calculator(x)

>>> tensor([11])
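As a rough check of what Sequential does, the sketch below compares it with feeding each module's output into the next one by hand. It redefines Add so the block runs on its own; the variable names `manual`, `first`, and `n_modules` are only for illustration.

```python
import torch
from torch import nn


class Add(nn.Module):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def forward(self, x):
        return x + self.value


calculator = nn.Sequential(Add(3), Add(2), Add(5))
x = torch.tensor([1])

# Manual chaining: feed each module's output into the next one
manual = x
for module in calculator:
    manual = module(manual)

first = calculator[0]        # integer indexing returns the stored module
n_modules = len(calculator)  # number of modules in the chain
```

The manual loop and `calculator(x)` give the same tensor, which is all Sequential adds on top of an ordered container.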

2. ModuleList

Alternatively, you can define a new module, store the modules in a ModuleList(), and run each of them in turn inside forward.

class Calculator(nn.Module):
    def __init__(self):
        super().__init__()
        self.add_list = nn.ModuleList([Add(2), Add(3), Add(5)])

    def forward(self, x):
        for module in self.add_list:
            x = module(x)
        return x
        
x = torch.tensor([1])

calculator = Calculator()
output = calculator(x)

>>> tensor([11])

3. ModuleDict

When there are only a few modules, managing them in a list as above is fine, but as the number grows it becomes awkward to find a module by index. In that case you can use a dict-like object and manage the modules as key-value pairs.

class Calculator(nn.Module):
    def __init__(self):
        super().__init__()
        self.add_dict = nn.ModuleDict({'add2': Add(2),
                                       'add3': Add(3),
                                       'add5': Add(5)})

    def forward(self, x):
        x = self.add_dict["add2"](x)
        x = self.add_dict["add3"](x)
        x = self.add_dict["add5"](x)
        return x

x = torch.tensor([1])

calculator = Calculator()
output = calculator(x)

>>> tensor([11])
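One thing key-based access makes easy is choosing a module at call time. The sketch below is an assumed variation on the example above, not part of the original post: the class name `KeyedCalculator` and the extra `op` argument of forward are hypothetical.

```python
import torch
from torch import nn


class Add(nn.Module):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def forward(self, x):
        return x + self.value


class KeyedCalculator(nn.Module):
    def __init__(self):
        super().__init__()
        self.ops = nn.ModuleDict({'add2': Add(2), 'add3': Add(3)})

    def forward(self, x, op):
        # Look up the module by its key instead of by position
        return self.ops[op](x)


calculator = KeyedCalculator()
x = torch.tensor([1])

out = calculator(x, 'add3')
keys = list(calculator.ops.keys())
```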

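The reason to use ModuleList here instead of a plain Python list is that a plain list, and the modules inside it, are not registered as submodules. Their parameters then do not show up in parameters() or state_dict(), and calls such as to() do not reach them.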
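The difference is easy to see by counting parameters. This is a minimal sketch with two made-up classes, `WithPlainList` and `WithModuleList`, that hold the same nn.Linear layers in the two kinds of container.

```python
import torch
from torch import nn


class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: nn.Module does not register its contents
        self.layers = [nn.Linear(2, 2), nn.Linear(2, 2)]


class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.ModuleList registers every entry as a submodule
        self.layers = nn.ModuleList([nn.Linear(2, 2), nn.Linear(2, 2)])


plain = WithPlainList()
registered = WithModuleList()

n_plain = len(list(plain.parameters()))            # nothing is visible
n_registered = len(list(registered.parameters()))  # weight and bias per layer
saved_keys = list(registered.state_dict().keys())
```

With the plain list, an optimizer built from `plain.parameters()` would have nothing to update, and saving the model would store no weights.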

Next, let's build a simple module that performs a linear transformation, that is, Wx + b.

class Linear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.W = nn.Parameter(torch.ones(out_features, in_features))
        # The bias has one entry per output feature and broadcasts over the batch
        self.b = nn.Parameter(torch.ones(out_features))
        

    def forward(self, x):
        output = torch.addmm(self.b, x, self.W.T)
        return output

x = torch.Tensor([[1, 2],
                  [3, 4]])

linear = Linear(2, 3)
output = linear(x)

>>> tensor([[4., 4., 4.],
            [8., 8., 8.]], grad_fn=<AddmmBackward0>)

# [[1, 2], [3, 4]] @ [[1, 1, 1], [1, 1, 1]] + [1, 1, 1] = [[4, 4, 4], [8, 8, 8]]

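Here the tensors are wrapped in Parameter() rather than assigned as plain tensors. Only a Parameter is registered with the module, so it is returned by parameters() for the optimizer to update after backpropagation computes its gradient, and its value is stored when the model is saved.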
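A small sketch makes the distinction visible. The module `Scale` and its attributes `w` and `t` are hypothetical; the point is only to compare a Parameter with a plain tensor attribute.

```python
import torch
from torch import nn


class Scale(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(1))  # registered and trainable
        self.t = torch.ones(1)                # plain attribute, not registered

    def forward(self, x):
        return x * self.w * self.t


scale = Scale()
param_names = [name for name, _ in scale.named_parameters()]
saved_keys = list(scale.state_dict().keys())

# Backpropagate through a single forward pass
loss = scale(torch.tensor([3.0])).sum()
loss.backward()
```

After `backward()`, only `w` has a gradient, and only `w` appears in `named_parameters()` and `state_dict()`.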

Next, let's go over a few module methods that can come in handy, using the example code below.

from torch.nn import Parameter  # needed for the Parameter(...) calls below

class Function_A(nn.Module):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def forward(self, x):
        x = x * 2
        return x

class Function_B(nn.Module):
    def __init__(self):
        super().__init__()
        self.W1 = Parameter(torch.Tensor([10]))
        self.W2 = Parameter(torch.Tensor([2]))

    def forward(self, x):
        x = x / self.W1
        x = x / self.W2

        return x

class Function_C(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('duck', torch.Tensor([7]), persistent=True)

    def forward(self, x):
        x = x * self.duck
        
        return x

class Function_D(nn.Module):
    def __init__(self):
        super().__init__()
        self.W1 = Parameter(torch.Tensor([3]))
        self.W2 = Parameter(torch.Tensor([5]))
        self.c = Function_C()

    def forward(self, x):
        x = x + self.W1
        x = self.c(x)
        x = x / self.W2

        return x


# Layer
class Layer_AB(nn.Module):
    def __init__(self):
        super().__init__()

        self.a = Function_A('duck')
        self.b = Function_B()

    def forward(self, x):
        x = self.a(x) / 5
        x = self.b(x)

        return x

class Layer_CD(nn.Module):
    def __init__(self):
        super().__init__()

        self.c = Function_C()
        self.d = Function_D()

    def forward(self, x):
        x = self.c(x)
        x = self.d(x) + 1

        return x


# Model
class Model(nn.Module):
    def __init__(self):
        super().__init__()

        self.ab = Layer_AB()
        self.cd = Layer_CD()

    def forward(self, x):
        x = self.ab(x)
        x = self.cd(x)

        return x

x = torch.tensor([7])

model = Model()
model(x)

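The example above is a model with two layers, each of which contains two Functions (Function_D additionally wraps its own Function_C). For someone else reading the code, though, this structure is hard to see at a glance. In that case, methods such as named_children and named_modules help reveal it.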
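Before inspecting the structure, it can help to trace what model(x) computes for x = 7. The sketch below does not use the classes above; it repeats each step of the forward passes with plain tensors, using the constants from the example (10, 2, 7, 3, 5).

```python
import torch

x = torch.tensor([7])

# Layer_AB
x = x * 2                     # Function_A
x = x / 5                     # division in Layer_AB.forward
x = x / torch.tensor([10.0])  # Function_B, W1
x = x / torch.tensor([2.0])   # Function_B, W2

# Layer_CD
x = x * torch.tensor([7.0])   # Function_C (buffer 'duck')
x = x + torch.tensor([3.0])   # Function_D, W1
x = x * torch.tensor([7.0])   # Function_C inside Function_D
x = x / torch.tensor([5.0])   # Function_D, W2
x = x + 1                     # addition in Layer_CD.forward

result = x  # approximately 6.572
```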

1. named_modules()

This method returns every module under the given module, including the module itself, together with its name.

for name, module in model.named_modules():
    print(f"[ Name ] : {name}\n[ Module ]\n{module}")
    print("-" * 30)

The code above prints the name and structure of every module under model, producing the output below. The first entry, with an empty name, is model itself.

[ Name ] : 
[ Module ]
Model(
  (ab): Layer_AB(
    (a): Function_A()
    (b): Function_B()
  )
  (cd): Layer_CD(
    (c): Function_C()
    (d): Function_D(
      (c): Function_C()
    )
  )
)
------------------------------
[ Name ] : ab
[ Module ]
Layer_AB(
  (a): Function_A()
  (b): Function_B()
)
------------------------------
[ Name ] : ab.a
[ Module ]
Function_A()
------------------------------
[ Name ] : ab.b
[ Module ]
Function_B()
------------------------------
[ Name ] : cd
[ Module ]
Layer_CD(
  (c): Function_C()
  (d): Function_D(
    (c): Function_C()
  )
)
------------------------------
[ Name ] : cd.c
[ Module ]
Function_C()
------------------------------
[ Name ] : cd.d
[ Module ]
Function_D(
  (c): Function_C()
)
------------------------------
[ Name ] : cd.d.c
[ Module ]
Function_C()
------------------------------

2. named_children()

Unlike named_modules(), this method returns only the modules directly under the given module.

for name, child in model.named_children():
    print(f"[ Name ] : {name}\n[ Children ]\n{child}")
    print("-" * 30)

The code above prints only the two layers directly under model.

[ Name ] : ab
[ Children ]
Layer_AB(
  (a): Function_A()
  (b): Function_B()
)
------------------------------
[ Name ] : cd
[ Children ]
Layer_CD(
  (c): Function_C()
  (d): Function_D(
    (c): Function_C()
  )
)
------------------------------

If you only need the modules and not their names, use modules() or children() instead.
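To compare the two traversals on something smaller, here is a sketch with a hypothetical two-level module (`Outer` holding two `Inner` modules, each with one nn.Linear).

```python
import torch
from torch import nn


class Inner(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2, 2)


class Outer(nn.Module):
    def __init__(self):
        super().__init__()
        self.first = Inner()
        self.second = Inner()


outer = Outer()

# Direct children only
child_names = [name for name, _ in outer.named_children()]
# The module itself (empty name) plus every descendant
module_names = [name for name, _ in outer.named_modules()]

n_children = len(list(outer.children()))
n_modules = len(list(outer.modules()))
```

Here `child_names` holds only `first` and `second`, while `module_names` also includes the root (as an empty string) and the nested `first.fc` and `second.fc`.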

3. get_submodule()

The value shown as [ Name ] in the output above can be used to fetch that module directly.

model.get_submodule("ab.a")

>>> Function_A()
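The same dotted-path lookup works for any nested module. The sketch below uses nested nn.Sequential containers, whose children are named "0", "1", and so on; the path "0.missing" is a deliberately invalid name chosen for illustration.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Sequential(nn.Linear(2, 3), nn.ReLU()),
    nn.Linear(3, 1),
)

# "0.0" means: first child of the first child
inner = model.get_submodule("0.0")

# A path that does not resolve raises AttributeError
try:
    model.get_submodule("0.missing")
    missing_raises = False
except AttributeError:
    missing_raises = True
```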

4. named_parameters()

This returns every parameter under the module together with its name.

for name, parameter in model.named_parameters():
    print(f"[ Name ] : {name}\n[ Parameter ]\n{parameter}")
    print("-" * 30)

The code above produces the following output.

[ Name ] : ab.b.W1
[ Parameter ]
Parameter containing:
tensor([10.], requires_grad=True)
------------------------------
[ Name ] : ab.b.W2
[ Parameter ]
Parameter containing:
tensor([2.], requires_grad=True)
------------------------------
[ Name ] : cd.d.W1
[ Parameter ]
Parameter containing:
tensor([3.], requires_grad=True)
------------------------------
[ Name ] : cd.d.W2
[ Parameter ]
Parameter containing:
tensor([5.], requires_grad=True)
------------------------------

To fetch one specific parameter, use get_parameter().

parameter = model.get_parameter("ab.b.W1")

>>> Parameter containing:
tensor([10.], requires_grad=True)
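Note that the tensor registered with register_buffer in Function_C does not appear in the parameter listing above. A buffer is stored with the module but is not a trainable parameter. The sketch below shows this with a hypothetical module `Scaler` that has one parameter and one buffer.

```python
import torch
from torch import nn


class Scaler(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.tensor([3.0]))
        # persistent=True means the buffer is saved in state_dict
        self.register_buffer('duck', torch.tensor([7.0]), persistent=True)

    def forward(self, x):
        return x * self.weight * self.duck


scaler = Scaler()

param_names = [name for name, _ in scaler.named_parameters()]
buffer_names = [name for name, _ in scaler.named_buffers()]
saved_keys = sorted(scaler.state_dict().keys())
```

The buffer is listed by `named_buffers()` and saved in `state_dict()`, but an optimizer built from `parameters()` never touches it.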

5. repr(), extra_repr()

With repr() you can view the model's structure without writing the loops above.

model_repr = repr(model)

>>>
Model(
  (ab): Layer_AB(
    (a): Function_A()
    (b): Function_B()
  )
  (cd): Layer_CD(
    (c): Function_C()
    (d): Function_D(
      (c): Function_C()
    )
  )
)

What is extra_repr()?

If you define a method named extra_repr() on a module, the string it returns is included in the output whenever repr() is called on that module.

For example, let's modify Function_A from the code above as follows.

class Function_A(nn.Module):
    def __init__(self, name):  # takes a name argument
        super().__init__()
        self.name = name

    def forward(self, x):
        x = x * 2
        return x

    def extra_repr(self):  # define extra_repr
        return 'name={}'.format(self.name)

Then, to pass a name (for example "duck") to Function_A, Layer_AB is updated as well.

class Layer_AB(nn.Module):
    def __init__(self):
        super().__init__()

        self.a = Function_A('duck')  # pass "duck" as the name
        self.b = Function_B()

    def forward(self, x):
        x = self.a(x) / 5
        x = self.b(x)

        return x

Applying repr() again in the same way now gives the following result.

model_repr = repr(model)

>>>
Model(
  (ab): Layer_AB(
    (a): Function_A(name=duck)
    (b): Function_B()
  )
  (cd): Layer_CD(
    (c): Function_C()
    (d): Function_D(
      (c): Function_C()
    )
  )
)
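The same idea applies to the Add module from earlier in this post. The sketch below is an assumed extension, not something the original Add defines: it adds an extra_repr() that reports the stored value, so a chain of Add modules becomes readable when printed.

```python
import torch
from torch import nn


class Add(nn.Module):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def forward(self, x):
        return x + self.value

    def extra_repr(self):
        # Shown inside the parentheses of the module's repr
        return f"value={self.value}"


single = repr(Add(3))
calculator = nn.Sequential(Add(3), Add(2))
text = repr(calculator)
```

Printing `calculator` now shows which value each Add holds instead of a row of identical `Add()` entries.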