The transformer is a deep learning architecture that was developed by researchers at Google and is based on the multi-head attention mechanism, which...
106 KB (13,091 words) - 21:14, 29 April 2025
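The entry above names multi-head attention as the mechanism the transformer is built on. As a minimal NumPy sketch (single sequence, no batching or masking; the weight shapes and random inputs are illustrative assumptions, not the published implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product attention computed independently per head.

    X: (seq_len, d_model); each W* matrix: (d_model, d_model).
    """
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    # Project to queries/keys/values, then split the feature dim into heads
    Q = (X @ Wq).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    K = (X @ Wk).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    V = (X @ Wv).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    # Per-head attention weights: softmax(Q K^T / sqrt(d_head))
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    # Weighted sum of values, heads concatenated back, output projection
    out = (weights @ V).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 4, 8, 2
X = rng.standard_normal((seq_len, d_model))
Ws = [rng.standard_normal((d_model, d_model)) for _ in range(4)]
Y = multi_head_attention(X, *Ws, n_heads=n_heads)
```

The head split is what distinguishes this from single-head attention: each head attends over its own `d_head`-dimensional projection, so different heads can weight different positions.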
natural language processing by machines. It is based on the transformer deep learning architecture, pre-trained on large data sets of unlabeled text, and able...
65 KB (5,342 words) - 13:55, 1 May 2025
Mamba is a deep learning architecture focused on sequence modeling. It was developed by researchers from Carnegie Mellon University and Princeton University...
11 KB (1,159 words) - 19:42, 16 April 2025
masked self-attention". Recurrent neural network seq2seq Transformer (deep learning architecture) Attention Dynamic neural network Niu, Zhaoyang; Zhong...
36 KB (3,494 words) - 17:00, 1 May 2025
his pioneering contributions in the field of deep learning, most notably the development of the Transformer neural network, which he co-authored in landmark...
5 KB (382 words) - 19:03, 25 March 2025
of 1.6 exaFLOPs. Transformer (machine learning model) Convolutional neural network Attention (machine learning) Perceiver Deep learning PyTorch TensorFlow...
37 KB (4,127 words) - 20:13, 29 April 2025
Transformer (deep learning architecture), a machine learning architecture Transformer (flying car), a DARPA military project "Electronic transformer"...
2 KB (236 words) - 11:42, 17 June 2024
a sequence of vectors using self-supervised learning. It uses the encoder-only transformer architecture. BERT dramatically improved the state-of-the-art...
31 KB (3,528 words) - 01:20, 29 April 2025
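The BERT entry above describes self-supervised pre-training with an encoder-only transformer; the core pretext task is recovering hidden tokens. A toy sketch of that input corruption (a simplification: BERT's actual scheme also leaves some selected positions unchanged or swaps in random tokens, and the 0.3 rate here is exaggerated from BERT's 15% so the small example visibly masks something):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Hide a random subset of tokens; the encoder is trained to predict them.

    Returns the corrupted sequence and a {position: original_token} mapping
    of the prediction targets.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok           # label the model must recover
            masked.append(mask_token)  # what the model actually sees
        else:
            masked.append(tok)
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
# Exaggerated masking rate so this short sentence gets at least one [MASK]
masked, targets = mask_tokens(tokens, mask_prob=0.3)
```

Because the encoder sees the whole corrupted sequence at once (no causal mask), each prediction can condition on context from both directions, which is the "bidirectional" part of BERT.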
Deep Learning Super Sampling (DLSS) is a suite of real-time deep learning image enhancement and upscaling technologies developed by Nvidia that are available...
34 KB (3,692 words) - 18:38, 5 March 2025
"How has DeepSeek improved the Transformer architecture?". Epoch AI. Retrieved 3 February 2025. Metz, Cade (27 January 2025). "What is DeepSeek? And How...
62 KB (6,059 words) - 16:53, 1 May 2025
purpose. Most modern deep learning models are based on multi-layered neural networks such as convolutional neural networks and transformers, although they can...
180 KB (17,764 words) - 08:07, 11 April 2025
machine learning authored by eight scientists working at Google. The paper introduced a new deep learning architecture known as the transformer, based...
15 KB (3,915 words) - 20:36, 1 May 2025
T5 (language model) (section Architecture)
(Text-to-Text Transfer Transformer) is a series of large language models developed by Google AI introduced in 2019. Like the original Transformer model, T5 models...
20 KB (1,932 words) - 22:58, 21 March 2025
Google Brain (redirect from Google deep learning project)
present in a photo that a human could easily spot. The transformer deep learning architecture was invented by Google Brain researchers in 2017, and explained...
44 KB (4,223 words) - 07:15, 26 April 2025
Multimodal learning is a type of deep learning that integrates and processes multiple types of data, referred to as modalities, such as text, audio, images...
9 KB (2,338 words) - 08:44, 24 October 2024
Residual neural network (category Deep learning)
network (also referred to as a residual network or ResNet) is a deep learning architecture in which the layers learn residual functions with reference to...
27 KB (2,929 words) - 23:15, 25 February 2025
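The ResNet entry above says the layers learn residual functions with reference to their inputs, i.e. the block computes y = x + F(x) rather than learning the full mapping. A minimal sketch with an assumed two-layer ReLU branch (illustrative, not the original convolutional formulation):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = x + F(x), where F is a small two-layer network.

    If the branch weights are near zero, the block reduces to the identity,
    which is what keeps very deep stacks of such blocks trainable.
    """
    return x + W2 @ relu(W1 @ x)

rng = np.random.default_rng(1)
d = 6
x = rng.standard_normal(d)
# With zero branch weights the residual term vanishes: the block is identity
y = residual_block(x, np.zeros((d, d)), np.zeros((d, d)))
```

The identity shortcut also gives gradients a direct path through the sum, which is the usual explanation for why residual networks avoid the degradation seen in plain deep stacks.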
Convolutional neural network (redirect from CNN (machine learning model))
recently been replaced—in some cases—by newer deep learning architectures such as the transformer. Vanishing gradients and exploding gradients, seen during...
138 KB (15,599 words) - 06:42, 18 April 2025
GPT-1 (category Generative pre-trained transformers)
Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017. In...
32 KB (1,064 words) - 01:34, 21 March 2025
Li, Changliang; Wong, Derek F.; Chao, Lidia S. (2019-06-04), Learning Deep Transformer Models for Machine Translation, arXiv:1906.01787 Xiong, Ruibin;...
31 KB (4,740 words) - 02:07, 19 January 2025
Whisper is a weakly supervised deep learning acoustic model built on an encoder-decoder transformer architecture. Whisper Large V2 was released on...
15 KB (1,613 words) - 00:22, 7 April 2025
Deep reinforcement learning (deep RL) is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the...
27 KB (2,929 words) - 10:33, 13 March 2025
GPT-3 (redirect from Generative Pre-trained Transformer 3)
transformer-based deep-learning neural network architectures. Previously, the best-performing neural NLP models commonly employed supervised learning...
54 KB (4,913 words) - 12:17, 2 May 2025
Neural processing unit (redirect from Deep learning accelerator)
A neural processing unit (NPU), also known as AI accelerator or deep learning processor, is a class of specialized hardware accelerator or computer system...
51 KB (4,926 words) - 10:31, 10 April 2025
autoencoders. Self-supervised learning has since been applied to many modalities through the use of deep neural network architectures such as convolutional neural...
45 KB (5,114 words) - 14:51, 30 April 2025
Long short-term memory (category Deep learning)
2024). One of the two blocks (mLSTM) of the architecture is parallelizable like the Transformer architecture; the other (sLSTM) allows state tracking...
52 KB (5,788 words) - 16:41, 2 May 2025
day. AlphaChip is a reinforcement-learning-based neural architecture that guides the task of chip placement. DeepMind claimed that the time needed to...
92 KB (8,892 words) - 18:11, 18 April 2025
Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning differs...
64 KB (7,580 words) - 08:49, 30 April 2025
adversarial networks (GAN) and transformers are used for content creation across numerous industries. This is because deep learning models are able to learn...
168 KB (17,637 words) - 20:48, 21 April 2025
GPT-2 (redirect from Generative Pre-trained Transformer 2)
GPT-4, a generative pre-trained transformer architecture implementing a deep neural network, specifically a transformer model that uses attention instead...
44 KB (3,243 words) - 09:14, 19 April 2025
ongoing AI spring, and further increasing interest in deep learning. The transformer architecture was first described in 2017 as a method to teach ANNs...
84 KB (8,626 words) - 11:12, 27 April 2025