
DCT-Former: Efficient Self-Attention with Discrete Cosine Transform
arXiv - CS - Machine Learning. Pub Date: 2022-03-02, DOI: arxiv-2203.01178
Carmelo Scribano, Giorgia Franchini, Marco Prato, Marko Bertogna

Since their introduction, Transformer architectures have emerged as the dominant architectures for natural language processing and, more recently, computer vision applications. An intrinsic limitation of this family of "fully-attentive" architectures arises from the computation of the dot-product attention, which grows both in memory consumption and number of operations as $O(n^2)$, where $n$ stands for the input sequence length, thus limiting the applications that require modeling very long sequences. Several approaches have been proposed in the literature to mitigate this issue, with varying degrees of success. Our idea takes inspiration from the world of \textit{lossy} data compression (such as the JPEG algorithm) to derive an approximation of the attention module by leveraging the properties of the Discrete Cosine Transform. An extensive set of experiments shows that our method uses less memory for the same performance, while also drastically reducing inference time. This makes it particularly suitable for real-time applications on embedded platforms. Moreover, we believe that the results of our research might serve as a starting point for a broader family of deep neural models with reduced memory footprint. The implementation will be made publicly available at https://github.com/cscribano/DCT-Former-Public
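
As a rough illustration of the kind of DCT-based attention approximation the abstract describes, the following NumPy sketch compresses the keys and values along the sequence axis with a truncated orthonormal DCT-II before computing standard dot-product attention, so the score matrix shrinks from $n \times n$ to $n \times n_{\text{keep}}$. The helper names (`dct_matrix`, `dct_compressed_attention`) and the `n_keep` parameter are illustrative assumptions, not the paper's actual DCT-Former formulation; refer to the repository linked above for the authors' implementation.

```python
import numpy as np

def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis restricted to the first n_out frequencies.

    Hypothetical helper for illustration; the paper's exact construction
    may differ.
    """
    k = np.arange(n_out)[:, None]          # retained frequency indices
    i = np.arange(n_in)[None, :]           # sequence positions
    C = np.cos(np.pi * (2 * i + 1) * k / (2 * n_in))
    C *= np.sqrt(2.0 / n_in)
    C[0, :] *= np.sqrt(0.5)                # orthonormal scaling of the DC row
    return C                               # shape (n_out, n_in)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dct_compressed_attention(Q, K, V, n_keep):
    """Single-head attention with keys/values compressed along the sequence
    axis by a truncated DCT (a lossy, JPEG-like projection).

    Q, K, V: (n, d) arrays; n_keep < n retained DCT coefficients.
    Cost drops from O(n^2 * d) to O(n * n_keep * d).
    """
    n, d = K.shape
    C = dct_matrix(n_keep, n)              # (n_keep, n)
    K_c, V_c = C @ K, C @ V                # compressed keys/values, (n_keep, d)
    scores = Q @ K_c.T / np.sqrt(d)        # (n, n_keep) instead of (n, n)
    return softmax(scores, axis=-1) @ V_c  # (n, d)

# Quick usage example on random data.
rng = np.random.default_rng(0)
n, d = 512, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = dct_compressed_attention(Q, K, V, n_keep=128)
print(out.shape)  # (512, 64)
```

Because the DCT concentrates most of the signal energy in the low-frequency coefficients, truncating to `n_keep` rows discards fine-grained sequence detail in exchange for the reduced memory and compute noted in the abstract.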

Updated: 2022-03-02
