(Learning the TVM Codebase) Getting Familiar with Relay Operator Code

This article is a follow-up to the previous post, (TVM开发代码学习)给Relay系统添加新算子 - 知乎 (zhihu.com). It works through Relay operator code in six parts, deconstructing TVM's Relay operators from the perspective of adding a new one.

To recap the previous post, adding a Relay operator to TVM requires the following:

  1. Define a struct in the C++ sources that inherits from AttrsNode and holds the operator's general attribute parameters. include/tvm/relay/attrs/

  2. A type relation function in the src folder. src/relay/op/

  3. Register the operator's attribute information. src/relay/op/

  4. The Python-side compute definition. python/tvm/topi/

  5. The Python-side compute + schedule wrapper. python/tvm/relay/op/strategy/

  6. Create a CallNode instance for the operator in the src folder and register it. src/relay/op/

  7. A concise Python API that exposes the final functionality. python/tvm/relay/op

Note that below I merge steps 3 and 6, since both perform registrations (honestly, I don't quite see why the official guide splits them into two steps; they usually live in the same .cc file and do similar work, perhaps just for narrative flow). Step 6 registers the CallNode: creating a CallNode instance for the operator is what registers it globally. Step 3 registers the AttrsNode created in step 1 and the type relation created in step 2.

So the Relay operator code in TVM breaks down into the following six parts:

1. A struct inheriting from AttrsNode that holds the operator's general attribute parameters. include/tvm/relay/attrs/

2. The type relation function in the src folder. src/relay/op/

3. Registration of the operator's AttrsNode attribute information and type relation, plus creation and registration of the CallNode. src/relay/op/

4. The Python-side compute definition. python/tvm/topi/

5. The Python-side compute + schedule wrapper. python/tvm/relay/op/strategy/

6. The concise Python API that exposes the final functionality. python/tvm/relay/op

A quick diagram would make this clearer (the original figure is not reproduced in this repost).

To get more familiar with the flow, let's walk through the code of the Softmax and Conv1d operators.

The Softmax operator:

1. AttrsNode

/*! \brief Attributes used in softmax operators */
struct SoftmaxAttrs : public tvm::AttrsNode<SoftmaxAttrs> {
  int axis;

  TVM_DECLARE_ATTRS(SoftmaxAttrs, "relay.attrs.SoftmaxAttrs") {
    TVM_ATTR_FIELD(axis).set_default(-1).describe("The axis to sum over when computing softmax.");
  }
};

The source is in include/tvm/relay/attrs/nn.h.

A single axis attribute is all softmax needs.
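To see the attributes in action, here is a minimal sketch (standard TVM Python API) that builds a softmax call and reads the axis back off the resulting CallNode's SoftmaxAttrs:

from tvm import relay

x = relay.var("x", shape=(1, 10))
call = relay.nn.softmax(x, axis=1)
print(call.attrs.axis)  # 1, carried by the SoftmaxAttrs struct above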

2. The type relation function

The type relation for softmax is very simple, as the operator itself is simple. Instead of defining a softmax-specific type relation, it reuses IdentityRel(), sharing this type relation with many other operators (relu, fast_softmax, l2_normalize, etc.) whose inputs and outputs have identical types and shapes.

The source is in src/relay/op/type_relations.cc:

bool IdentityRel(const Array<Type>& types, int num_inputs, const Attrs& attrs,
                 const TypeReporter& reporter) {
  for (size_t i = 1; i < types.size(); ++i) {
    reporter->Assign(types[i], types[0]);
  }
  return true;
}
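The loop simply assigns the type of the first input (types[0]) to every remaining type, so all inputs and the output end up identical. A minimal sketch of this taking effect through type inference:

import tvm
from tvm import relay

x = relay.var("x", shape=(2, 3), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.softmax(x)))
mod = relay.transform.InferType()(mod)
print(mod["main"].ret_type)  # Tensor[(2, 3), float32], same as the input, per IdentityRel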

3. Registering the AttrsNode, the type relation, and the CallNode

The AttrsNode is registered with TVM_REGISTER_NODE_TYPE.

The operator's type relation (together with its other attributes) is registered with RELAY_REGISTER_OP.

The source is in src/relay/op/nn/nn.cc:

TVM_REGISTER_NODE_TYPE(SoftmaxAttrs);
RELAY_REGISTER_OP("nn.softmax")
    .describe(R"code(Softmax layer.

.. math:: \text{softmax}(x)_i = \frac{exp(x_i)}{\sum_j exp(x_j)}

.. note::
    This operator can be optimized away for inference.

- **data**: The input data
)code" TVM_ADD_FILELINE)
    .set_attrs_type<SoftmaxAttrs>()
    .set_num_inputs(1)
    .add_argument("data", "Tensor", "The input tensor.")
    .set_support_level(1)
    .add_type_rel("Identity", IdentityRel);
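As a quick sanity check that the registration took effect, the registered Op can be looked up from Python (a minimal sketch):

from tvm import relay

op = relay.op.get("nn.softmax")
print(op.num_inputs)     # 1, from set_num_inputs above
print(op.support_level)  # 1, from set_support_level above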

Then a CallNode is created and registered with the TVM_REGISTER_GLOBAL macro:

TVM_REGISTER_GLOBAL("relay.op.nn._make.softmax").set_body_typed([](Expr data, int axis) {
  auto attrs = make_object<SoftmaxAttrs>();
  attrs->axis = axis;
  static const Op& op = Op::Get("nn.softmax");
  return Call(op, {data}, Attrs(attrs), {});
});

Unlike the previous post, where a separate MakeCumsum function returned the CallNode, softmax is simple enough that the CallNode construction and the macro registration are fused into a single lambda.
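The name given to TVM_REGISTER_GLOBAL becomes a packed function reachable from Python, which is exactly what the Python wrapper in step 6 calls. A minimal sketch:

import tvm
from tvm import relay

make_softmax = tvm.get_global_func("relay.op.nn._make.softmax")
x = relay.var("x", shape=(1, 4))
call = make_softmax(x, -1)  # the same Call that relay.nn.softmax builds
print(call.op.name)         # nn.softmax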

4. Defining the concrete compute in the Python-side TOPI library

The source is in python/tvm/topi/nn/softmax.py. This is perhaps the hardest step in the whole operator definition, since it involves designing the tensor expression.

# Excerpted from python/tvm/topi/nn/softmax.py; imports added here for context.
import tvm
from tvm import te, topi


def softmax_common(x, axis, use_fast_exp):
    """The common part of softmax and fast_softmax"""
    shape = x.shape
    if axis < 0:
        axis = len(shape) + axis
    if axis >= len(shape):
        # the upstream file forgets the `raise`; added here
        raise ValueError("axis parameter should be less than input dim")

    k1 = te.reduce_axis((0, shape[axis]), name="k")
    k2 = te.reduce_axis((0, shape[axis]), name="k")

    def insert_reduce_index(indices, reduce_index):
        return indices[:axis] + (reduce_index,) + indices[axis:]

    def get_non_reduce_indices(indices):
        return tuple([var for (i, var) in enumerate(indices) if i != axis])

    def _compute_max(*indices):
        eval_range = insert_reduce_index(indices, k1)
        return tvm.te.max(x[eval_range], axis=k1)

    def _compute_delta(max_elem, *indices):
        non_reduce_indices = get_non_reduce_indices(indices)
        return x[indices] - max_elem[non_reduce_indices]

    def _compute_exp(max_elem, *indices):
        non_reduce_indices = get_non_reduce_indices(indices)
        return te.exp(x[indices] - max_elem[non_reduce_indices])

    def _compute_expsum(exp, *indices):
        eval_range = insert_reduce_index(indices, k2)
        return te.sum(exp[eval_range], axis=k2)

    def _normalize(exp, expsum, *indices):
        non_reduce_indices = get_non_reduce_indices(indices)
        return exp[indices] / expsum[non_reduce_indices]

    reduced_shape = tuple([dim for (i, dim) in enumerate(shape) if i != axis])
    max_elem = te.compute(reduced_shape, _compute_max, name="T_softmax_maxelem")

    if use_fast_exp:
        delta = te.compute(
            shape, lambda *indices: _compute_delta(max_elem, *indices), name="T_softmax_delta"
        )
        exp = topi.math.fast_exp(delta)
    else:
        exp = te.compute(
            shape, lambda *indices: _compute_exp(max_elem, *indices), name="T_softmax_exp"
        )
    expsum = te.compute(
        reduced_shape, lambda *indices: _compute_expsum(exp, *indices), name="T_softmax_expsum"
    )
    return te.compute(
        shape,
        lambda *indices: _normalize(exp, expsum, *indices),
        name="T_softmax_norm",
        attrs={"axis": axis},
    )
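To watch these stages materialize, here is a minimal sketch that calls topi.nn.softmax (which dispatches into softmax_common) on a placeholder and lowers it with a naive schedule:

import tvm
from tvm import te, topi

x = te.placeholder((2, 8), name="x")
y = topi.nn.softmax(x, axis=1)  # builds the T_softmax_* stages defined above
s = te.create_schedule(y.op)
print(tvm.lower(s, [x, y], simple_mode=True))  # the TIR shows the maxelem/exp/expsum/norm stages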

5. Wrapping the compute and schedule

The source is in python/tvm/relay/op/strategy/generic.py; the flow matches what the previous post described.

def wrap_compute_softmax(topi_compute):
    """Wrap softmax topi compute"""

    def _compute_softmax(attrs, inputs, out_type):
        axis = attrs.get_int("axis")
        return [topi_compute(inputs[0], axis)]

    return _compute_softmax

@override_native_generic_func("softmax_strategy")
def softmax_strategy(attrs, inputs, out_type, target):
    """softmax generic strategy"""
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        wrap_compute_softmax(topi.nn.softmax),
        wrap_topi_schedule(topi.generic.schedule_softmax),
        name="softmax.generic",
    )
    return strategy
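One piece of wiring the six parts above don't show: softmax_strategy still has to be attached to the nn.softmax op. If I recall the code layout correctly, this happens in python/tvm/relay/op/nn/_nn.py:

# python/tvm/relay/op/nn/_nn.py (from memory; verify against your TVM checkout)
reg.register_strategy("nn.softmax", strategy.softmax_strategy)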

6. The final wrapper

The source is in python/tvm/relay/op/nn/nn.py:

def softmax(data, axis=-1):
    r"""Computes softmax.

    .. math:: \text{softmax}(x)_i = \frac{exp(x_i)}{\sum_j exp(x_j)}

    .. note::
        This operator can be optimized away for inference.

    Parameters
    ----------
    data: tvm.relay.Expr
        The input data to the operator.

    axis: int, optional
        The axis to sum over when computing softmax

    Returns
    -------
    result : tvm.relay.Expr
        The computed result.
    """
    return _make.softmax(data, axis)
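With all six parts in place, the operator can be built and run end to end. A minimal sketch targeting LLVM on CPU:

import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

x = relay.var("x", shape=(1, 5), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.softmax(x)))
lib = relay.build(mod, target="llvm")
rt = graph_executor.GraphModule(lib["default"](tvm.cpu()))
rt.set_input("x", np.arange(5, dtype="float32").reshape(1, 5))
rt.run()
print(rt.get_output(0).numpy())  # each row sums to 1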

The 1-D convolution operator Conv1d

1. AttrsNode

/*! \brief Attributes used in 1D convolution operators */
struct Conv1DAttrs : public tvm::AttrsNode<Conv1DAttrs> {
  Array<IndexExpr> strides;
  Array<IndexExpr> padding;
  Array<IndexExpr> dilation;
  int groups;
  IndexExpr channels;
  Array<IndexExpr> kernel_size;
  std::string data_layout;
  std::string kernel_layout;
  std::string out_layout;
  DataType out_dtype;

  TVM_DECLARE_ATTRS(Conv1DAttrs, "relay.attrs.Conv1DAttrs") {
    TVM_ATTR_FIELD(strides)
        .set_default(Array<IndexExpr>({
            1,
        }))
        .describe("Specifies the stride of the convolution.");
    TVM_ATTR_FIELD(padding)
        .set_default(Array<IndexExpr>({0, 0}))
        .describe(
            "If padding is non-zero, then the input is implicitly zero-padded"
            "on both sides for padding number of points");
    TVM_ATTR_FIELD(dilation)
        .set_default(Array<IndexExpr>({
            1,
        }))
        .describe("Specifies the dilation rate to use for dilated convolution.");
    TVM_ATTR_FIELD(groups).set_default(1).describe(
        "Currently unused but may be added in the future.");
    TVM_ATTR_FIELD(channels)
        .describe(
            "The number of output channels in the convolution."
            " If it is not set, inferred by shape of the weight.")
        .set_default(NullValue<IndexExpr>());
    TVM_ATTR_FIELD(kernel_size)
        .describe("Specifies the dimensions of the convolution window.")
        .set_default(NullValue<Array<IndexExpr>>());
    TVM_ATTR_FIELD(data_layout)
        .set_default("NCW")
        .describe(
            "Dimension ordering of input data. Can be 'NCW', 'NWC', etc."
            "'N', 'C', 'W' stands for batch, channel, and width"
            "dimensions respectively. Convolution is applied on the 'W'"
            "dimension.");
    TVM_ATTR_FIELD(kernel_layout)
        .set_default("OIW")
        .describe(
            "Dimension ordering of weight. Can be 'OIW', or 'WIO', etc."
            "'O', 'I', 'W' stands for num_filter, input_channel, and width"
            "dimensions respectively.");

    // use 0 bits to indicate none.
    TVM_ATTR_FIELD(out_dtype)
        .set_default(NullValue<DataType>())
        .describe("Output data type, set to explicit type under mixed precision setting");
  }
};

2. The type relation function

The source is in src/relay/op/nn/convolution.h, which contains the type relation functions for all of TVM's convolution operators. The conv1d relation checks the input and output data layouts and constrains the output type through the reporter.

// Standard convolution operator shape relations
template <typename AttrType>
bool Conv1DRel(const Array<Type>& types, int num_inputs, const Attrs& attrs,
               const TypeReporter& reporter) {
  ICHECK_EQ(types.size(), 3);
  const auto* data = types[0].as<TensorTypeNode>();
  const auto* weight = types[1].as<TensorTypeNode>();
  if (data == nullptr) return false;
  static const Layout kNCW("NCW");
  static const Layout kOIW("OIW");

  const AttrType* param = attrs.as<AttrType>();
  ICHECK(param != nullptr);
  const Layout in_layout(param->data_layout);
  const Layout kernel_layout(param->kernel_layout);

  const auto trans_in_layout = tir::BijectiveLayout(in_layout, kNCW);
  ICHECK(trans_in_layout.defined())
      << "Conv only support input layouts that are convertible from NCW."
      << " But got " << in_layout;

  const auto trans_kernel_layout = tir::BijectiveLayout(kernel_layout, kOIW);
  ICHECK(trans_kernel_layout.defined())
      << "Conv only support kernel layouts that are convertible from OIW."
      << " But got " << kernel_layout;

  Layout out_layout(param->out_layout == "" ? param->data_layout : param->out_layout);
  const auto trans_out_layout = tir::BijectiveLayout(out_layout, kNCW);
  ICHECK(trans_out_layout.defined())
      << "Conv only support output layouts that are convertible from NCW."
      << " But got " << out_layout;

  Array<IndexExpr> dshape_ncw = trans_in_layout.ForwardShape(data->shape);

  IndexExpr channels, dilated_ksize;
  // infer weight if the kernel_size and channels are defined
  if (param->kernel_size.defined() && param->channels.defined()) {
    Array<IndexExpr> wshape;

    wshape = {{param->channels, dshape_ncw[1], param->kernel_size[0]}};

    wshape = trans_kernel_layout.BackwardShape(wshape);
    channels = param->channels;
    dilated_ksize = 1 + (param->kernel_size[0] - 1) * param->dilation[0];
    DataType weight_dtype = data->dtype;
    if (weight != nullptr) {
      weight_dtype = weight->dtype;
    }
    // assign result to reporter
    reporter->Assign(types[1], TensorType(wshape, weight_dtype));
  } else {
    // use weight to infer the conv shape.
    if (weight == nullptr) return false;
    auto wshape = trans_kernel_layout.ForwardShape(weight->shape);
    if (param->kernel_size.defined()) {
      // check the size
      ICHECK(reporter->AssertEQ(param->kernel_size[0], wshape[2]))
          << "Conv1D: shape of weight is inconsistent with kernel_size, "
          << " kernel_size=" << param->kernel_size << " wshape=" << wshape;
    }
    if (param->channels.defined()) {
      ICHECK(reporter->AssertEQ(param->channels, wshape[0]))
          << "Conv1D: shape of weight is inconsistent with channels, "
          << " channels=" << param->channels << " wshape=" << wshape;
    }
    if (!dshape_ncw[1].as<tir::AnyNode>() && !wshape[1].as<tir::AnyNode>()) {
      ICHECK(reporter->AssertEQ(dshape_ncw[1], wshape[1]));
    }
    channels = wshape[0];
    dilated_ksize = 1 + (wshape[2] - 1) * param->dilation[0];
  }
  // dilation
  Array<IndexExpr> oshape({dshape_ncw[0], channels, 0});

  if (!dshape_ncw[2].as<tir::AnyNode>()) {
    oshape.Set(2, indexdiv(dshape_ncw[2] + param->padding[0] + param->padding[1] - dilated_ksize,
                           param->strides[0]) +
                      1);
  } else {
    oshape.Set(2, dshape_ncw[2]);
  }

  DataType out_dtype = param->out_dtype;
  if (out_dtype.bits() == 0) {
    out_dtype = data->dtype;
  }
  oshape = trans_out_layout.BackwardShape(oshape);
  // assign output type
  reporter->Assign(types[2], TensorType(oshape, out_dtype));
  return true;
}
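The output-width arithmetic at the end is out_width = (W + pad_left + pad_right - dilated_ksize) // stride + 1. A minimal sketch that exercises this relation through type inference (kernel_size and channels left unset, so they are inferred from the weight, taking the else branch above):

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 3, 32))     # NCW
weight = relay.var("weight", shape=(8, 3, 3))  # OIW
y = relay.nn.conv1d(data, weight, padding=1)
mod = relay.transform.InferType()(tvm.IRModule.from_expr(relay.Function([data, weight], y)))
print(mod["main"].ret_type)  # Tensor[(1, 8, 32), float32]: (32 + 1 + 1 - 3) // 1 + 1 = 32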

3. Registering the AttrsNode, the type relation, and the CallNode

The code is in src/relay/op/nn/convolution.cc.

AttrsNode:

// relay.nn.conv1d
TVM_REGISTER_NODE_TYPE(Conv1DAttrs);

Type relation and additional attribute registration:

RELAY_REGISTER_OP("nn.conv1d")
    .describe(R"code(1D convolution layer (e.g. spatial convolution over sequences).

This layer creates a convolution kernel that is convolved
with the layer input to produce a tensor of outputs.

- **data**: This depends on the `layout` parameter. Input is 3D array of shape
            (batch_size, in_channels, width) if `layout` is `NCW`.
- **weight**: (channels, in_channels, kernel_size)
- **out**:  This depends on the `layout` parameter. Output is 3D array of shape
            (batch_size, channels, out_width) if `layout` is `NCW`.

)code" TVM_ADD_FILELINE)
    .set_attrs_type<Conv1DAttrs>()
    .set_num_inputs(2)
    .add_argument("data", "Tensor", "The input tensor.")
    .add_argument("weight", "Tensor", "The weight tensor.")
    .set_support_level(2)
    .add_type_rel("Conv1D", Conv1DRel<Conv1DAttrs>)
    .set_attr<FInferCorrectLayout>("FInferCorrectLayout", ConvInferCorrectLayout<Conv1DAttrs>);

CallNode construction and registration:

// Build the CallNode. This helper is shared by the 1-D, 2-D, and 3-D convolution operators.
template <typename T>
inline Expr MakeConv(Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
                     Array<IndexExpr> dilation, int groups, IndexExpr channels,
                     Array<IndexExpr> kernel_size, std::string data_layout,
                     std::string kernel_layout, std::string out_layout, DataType out_dtype,
                     std::string op_name) {
  auto attrs = make_object<T>();
  attrs->strides = std::move(strides);
  attrs->padding = std::move(padding);
  attrs->dilation = std::move(dilation);
  attrs->groups = groups;
  attrs->channels = std::move(channels);
  attrs->kernel_size = std::move(kernel_size);
  attrs->data_layout = std::move(data_layout);
  attrs->kernel_layout = std::move(kernel_layout);
  attrs->out_layout = std::move(out_layout);
  attrs->out_dtype = std::move(out_dtype);
  const Op& op = Op::Get(op_name);
  return Call(op, {data, weight}, Attrs(attrs), {});
}
// Register it globally
TVM_REGISTER_GLOBAL("relay.op.nn._make.conv1d")
    .set_body_typed([](Expr data, Expr weight, Array<IndexExpr> strides, Array<IndexExpr> padding,
                       Array<IndexExpr> dilation, int groups, IndexExpr channels,
                       Array<IndexExpr> kernel_size, String data_layout, String kernel_layout,
                       String out_layout, DataType out_dtype) {
      return MakeConv<Conv1DAttrs>(data, weight, strides, padding, dilation, groups, channels,
                                   kernel_size, data_layout, kernel_layout, out_layout, out_dtype,
                                   "nn.conv1d");
    });

4. The compute definition in the TOPI library

The source is in python/tvm/topi/nn/conv1d.py, which implements convolution for both the NWC and NCW data layouts; only the NCW version is shown here.

Defining operator computations with tensor expressions in TOPI is arguably the hardest and most abstract part of TVM's operator code; Tianqi Chen has described the tensor expression as the bridge between the high-level and low-level IRs. I haven't found a simple, clear, and complete tutorial on TVM's tensor expressions online or in the TVM community, so I plan to write one in a future post.

# Excerpted from python/tvm/topi/nn/conv1d.py; helper imports (as in the upstream file) shown for context.
from tvm import te
from .pad import pad
from .utils import get_pad_tuple1d
from ..utils import simplify


def conv1d_ncw(data, kernel, strides=1, padding="VALID", dilation=1, out_dtype=None):
    if out_dtype is None:
        out_dtype = data.dtype
    if isinstance(strides, (tuple, list)):
        strides = strides[0]
    if isinstance(dilation, (tuple, list)):
        dilation = dilation[0]

    batch, in_channels, data_width = data.shape
    out_channels, _, kernel_size = kernel.shape

    # Compute the output shape
    dilated_kernel_size = (kernel_size - 1) * dilation + 1
    pad_left, pad_right = get_pad_tuple1d(padding, (dilated_kernel_size,))
    out_channels = simplify(out_channels)
    out_width = simplify((data_width - dilated_kernel_size + pad_left + pad_right) // strides + 1)

    # Apply padding
    pad_before = [0, 0, pad_left]
    pad_after = [0, 0, pad_right]
    temp = pad(data, pad_before, pad_after, name="pad_temp")

    # Compute graph
    rc = te.reduce_axis((0, in_channels), name="rc")
    rw = te.reduce_axis((0, kernel_size), name="rw")

    return te.compute(
        (batch, out_channels, out_width),
        lambda b, c, w: te.sum(
            temp[b, rc, w * strides + rw * dilation].astype(out_dtype)
            * kernel[c, rc, rw].astype(out_dtype),
            axis=[rc, rw],
        ),
        tag="conv1d_ncw",
    )
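A minimal sketch of calling this compute directly on placeholders; with padding=1 the output width stays 32:

from tvm import te, topi

data = te.placeholder((1, 3, 32), name="data")
kernel = te.placeholder((8, 3, 3), name="kernel")
out = topi.nn.conv1d_ncw(data, kernel, strides=1, padding=1, dilation=1)
print(out.shape)  # [1, 8, 32]: (32 - 3 + 1 + 1) // 1 + 1 = 32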

5. Wrapping compute + schedule

As in the previous post, the generic CPU implementation is in python/tvm/relay/op/strategy/generic.py.

I haven't compared the CPU and GPU implementations yet; once I'm familiar with CUDA programming and hardware acceleration, I may write a post on that.

# conv1d
def wrap_compute_conv1d(topi_compute):
    """wrap conv1d topi compute"""

    def _compute_conv1d(attrs, inputs, out_type):
        """Compute definition of conv1d"""
        strides = get_const_tuple(attrs.strides)
        padding = get_const_tuple(attrs.padding)
        dilation = get_const_tuple(attrs.dilation)
        out_dtype = attrs.out_dtype
        out_dtype = inputs[0].dtype if out_dtype in ("same", "") else out_dtype
        return [topi_compute(inputs[0], inputs[1], strides, padding, dilation, out_dtype)]

    return _compute_conv1d


@override_native_generic_func("conv1d_strategy")
def conv1d_strategy(attrs, inputs, out_type, target):
    """conv1d generic strategy"""
    logger.warning("conv1d is not optimized for this platform.")
    layout = attrs.data_layout
    dilation = get_const_tuple(attrs.dilation)
    if dilation[0] < 1:
        raise ValueError("dilation should be a positive value")
    strategy = _op.OpStrategy()
    if layout == "NCW":
        strategy.add_implementation(
            wrap_compute_conv1d(topi.nn.conv1d_ncw),
            wrap_topi_schedule(topi.generic.schedule_conv1d_ncw),
            name="conv1d_ncw.generic",
        )
    elif layout == "NWC":
        strategy.add_implementation(
            wrap_compute_conv1d(topi.nn.conv1d_nwc),
            wrap_topi_schedule(topi.generic.schedule_conv1d_nwc),
            name="conv1d_nwc.generic",
        )
    else:
        raise ValueError("Unsupported conv1d layout {}".format(layout))
    return strategy
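As with softmax, this strategy gets attached to the op elsewhere; again from memory, in python/tvm/relay/op/nn/_nn.py:

# python/tvm/relay/op/nn/_nn.py (from memory; verify against your TVM checkout)
reg.register_strategy("nn.conv1d", strategy.conv1d_strategy)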

6. The final API wrapper, in python/tvm/relay/op/nn/nn.py:

def conv1d(
    data,
    weight,
    strides=1,
    padding=0,
    dilation=1,
    groups=1,
    channels=None,
    kernel_size=None,
    data_layout="NCW",
    kernel_layout="OIW",
    out_layout="",
    out_dtype="",
):
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size,)
    if isinstance(strides, int):
        strides = (strides,)
    if isinstance(dilation, int):
        dilation = (dilation,)
    padding = get_pad_tuple1d(padding)
    return _make.conv1d(
        data,
        weight,
        strides,
        padding,
        dilation,
        groups,
        channels,
        kernel_size,
        data_layout,
        kernel_layout,
        out_layout,
        out_dtype,
    )
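A minimal sketch showing how this wrapper normalizes its scalar arguments before handing them to the packed function:

from tvm import relay

data = relay.var("data", shape=(1, 3, 32))
weight = relay.var("weight", shape=(8, 3, 3))
call = relay.nn.conv1d(data, weight, strides=2, padding=1, kernel_size=3, channels=8)
print(call.attrs.strides)  # [2]: the int was wrapped into a tuple
print(call.attrs.padding)  # [1, 1]: get_pad_tuple1d expanded the int to both sides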

Source: https://blog.csdn.net/weixin_44114416/article/details/122255964
