The transformer is a deep learning architecture that was developed by researchers at Google and is based on the multi-head attention mechanism, which...
106 KB (13,091 words) - 21:14, 29 April 2025
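The entry above names multi-head attention as the mechanism the transformer is built on. As a minimal NumPy sketch (single sequence, no batching or masking; the weight shapes and random inputs are illustrative assumptions, not the published implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product attention computed independently per head.

    X: (seq_len, d_model); each W* matrix: (d_model, d_model).
    """
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    # Project to queries/keys/values, then split the feature dim into heads
    Q = (X @ Wq).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    K = (X @ Wk).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    V = (X @ Wv).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    # Per-head attention weights: softmax(Q K^T / sqrt(d_head))
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    # Weighted sum of values, heads concatenated back, output projection
    out = (weights @ V).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 4, 8, 2
X = rng.standard_normal((seq_len, d_model))
Ws = [rng.standard_normal((d_model, d_model)) for _ in range(4)]
Y = multi_head_attention(X, *Ws, n_heads=n_heads)
```

The head split is what distinguishes this from single-head attention: each head attends over its own `d_head`-dimensional projection, so different heads can weight different positions.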
natural language processing by machines. It is based on the transformer deep learning architecture, pre-trained on large data sets of unlabeled text, and able...
65 KB (5,342 words) - 13:55, 1 May 2025
Mamba is a deep learning architecture focused on sequence modeling. It was developed by researchers from Carnegie Mellon University and Princeton University...
11 KB (1,159 words) - 19:42, 16 April 2025
masked self-attention". Recurrent neural network seq2seq Transformer (deep learning architecture) Attention Dynamic neural network Niu, Zhaoyang; Zhong...
36 KB (3,494 words) - 17:00, 1 May 2025
his pioneering contributions in the field of deep learning, most notably the development of the Transformer neural network, which he co-authored in landmark...
5 KB (382 words) - 19:03, 25 March 2025
of 1.6 exaFLOPs. Transformer (machine learning model) Convolutional neural network Attention (machine learning) Perceiver Deep learning PyTorch TensorFlow...
37 KB (4,127 words) - 20:13, 29 April 2025
Transformer (deep learning architecture), a machine learning architecture Transformer (flying car), a DARPA military project "Electronic transformer"...
2 KB (236 words) - 11:42, 17 June 2024
a sequence of vectors using self-supervised learning. It uses the encoder-only transformer architecture. BERT dramatically improved the state-of-the-art...
31 KB (3,528 words) - 01:20, 29 April 2025
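The BERT entry above describes self-supervised pre-training with an encoder-only transformer; the core pretext task is recovering hidden tokens. A toy sketch of that input corruption (a simplification: BERT's actual scheme also leaves some selected positions unchanged or swaps in random tokens, and the 0.3 rate here is exaggerated from BERT's 15% so the small example visibly masks something):

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Hide a random subset of tokens; the encoder is trained to predict them.

    Returns the corrupted sequence and a {position: original_token} mapping
    of the prediction targets.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok           # label the model must recover
            masked.append(mask_token)  # what the model actually sees
        else:
            masked.append(tok)
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
# Exaggerated masking rate so this short sentence gets at least one [MASK]
masked, targets = mask_tokens(tokens, mask_prob=0.3)
```

Because the encoder sees the whole corrupted sequence at once (no causal mask), each prediction can condition on context from both directions, which is the "bidirectional" part of BERT.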
Deep Learning Super Sampling (DLSS) is a suite of real-time deep learning image enhancement and upscaling technologies developed by Nvidia that are available...
34 KB (3,692 words) - 18:38, 5 March 2025
"How has DeepSeek improved the Transformer architecture?". Epoch AI. Retrieved 3 February 2025. Metz, Cade (27 January 2025). "What is DeepSeek? And How...
62 KB (6,059 words) - 16:53, 1 May 2025
purpose. Most modern deep learning models are based on multi-layered neural networks such as convolutional neural networks and transformers, although they can...
180 KB (17,764 words) - 08:07, 11 April 2025
machine learning authored by eight scientists working at Google. The paper introduced a new deep learning architecture known as the transformer, based...
15 KB (3,915 words) - 20:36, 1 May 2025
T5 (language model) (section Architecture)
(Text-to-Text Transfer Transformer) is a series of large language models developed by Google AI introduced in 2019. Like the original Transformer model, T5 models...
20 KB (1,932 words) - 22:58, 21 March 2025
Google Brain (redirect from Google deep learning project)
present in a photo that a human could easily spot. The transformer deep learning architecture was invented by Google Brain researchers in 2017, and explained...
44 KB (4,223 words) - 07:15, 26 April 2025
Multimodal learning is a type of deep learning that integrates and processes multiple types of data, referred to as modalities, such as text, audio, images...
9 KB (2,338 words) - 08:44, 24 October 2024
Residual neural network (category Deep learning)
network (also referred to as a residual network or ResNet) is a deep learning architecture in which the layers learn residual functions with reference to...
27 KB (2,929 words) - 23:15, 25 February 2025
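The ResNet entry above says the layers learn residual functions with reference to their inputs, i.e. the block computes y = x + F(x) rather than learning the full mapping. A minimal sketch with an assumed two-layer ReLU branch (illustrative, not the original convolutional formulation):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = x + F(x), where F is a small two-layer network.

    If the branch weights are near zero, the block reduces to the identity,
    which is what keeps very deep stacks of such blocks trainable.
    """
    return x + W2 @ relu(W1 @ x)

rng = np.random.default_rng(1)
d = 6
x = rng.standard_normal(d)
# With zero branch weights the residual term vanishes: the block is identity
y = residual_block(x, np.zeros((d, d)), np.zeros((d, d)))
```

The identity shortcut also gives gradients a direct path through the sum, which is the usual explanation for why residual networks avoid the degradation seen in plain deep stacks.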
Convolutional neural network (redirect from CNN (machine learning model))
recently been replaced—in some cases—by newer deep learning architectures such as the transformer. Vanishing gradients and exploding gradients, seen during...
138 KB (15,599 words) - 06:42, 18 April 2025
GPT-1 (category Generative pre-trained transformers)
Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017. In...
32 KB (1,064 words) - 01:34, 21 March 2025
Li, Changliang; Wong, Derek F.; Chao, Lidia S. (2019-06-04), Learning Deep Transformer Models for Machine Translation, arXiv:1906.01787 Xiong, Ruibin;...
31 KB (4,740 words) - 02:07, 19 January 2025
Whisper is a weakly supervised deep learning acoustic model built on an encoder-decoder transformer architecture. Whisper Large V2 was released on...
15 KB (1,613 words) - 00:22, 7 April 2025
Deep reinforcement learning (deep RL) is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the...
27 KB (2,929 words) - 10:33, 13 March 2025
GPT-3 (redirect from Generative Pre-trained Transformer 3)
transformer-based deep-learning neural network architectures. Previously, the best-performing neural NLP models commonly employed supervised learning...
54 KB (4,913 words) - 12:17, 2 May 2025
Neural processing unit (redirect from Deep learning accelerator)
A neural processing unit (NPU), also known as AI accelerator or deep learning processor, is a class of specialized hardware accelerator or computer system...
51 KB (4,926 words) - 10:31, 10 April 2025
autoencoders. Self-supervised learning has since been applied to many modalities through the use of deep neural network architectures such as convolutional neural...
45 KB (5,114 words) - 14:51, 30 April 2025
Long short-term memory (category Deep learning)
2024). One of the two blocks (mLSTM) of the architecture is parallelizable like the Transformer architecture; the other (sLSTM) allows state tracking...
52 KB (5,788 words) - 16:41, 2 May 2025
day. AlphaChip is a reinforcement-learning-based neural architecture that guides the task of chip placement. DeepMind claimed that the time needed to...
92 KB (8,892 words) - 18:11, 18 April 2025
Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning differs...
64 KB (7,580 words) - 08:49, 30 April 2025
adversarial networks (GAN) and transformers are used for content creation across numerous industries. This is because deep learning models are able to learn...
168 KB (17,637 words) - 20:48, 21 April 2025
GPT-2 (redirect from Generative Pre-trained Transformer 2)
GPT-4, a generative pre-trained transformer architecture implementing a deep neural network, specifically a transformer model that uses attention instead...
44 KB (3,243 words) - 09:14, 19 April 2025
ongoing AI spring, and further increasing interest in deep learning. The transformer architecture was first described in 2017 as a method to teach ANNs...
84 KB (8,626 words) - 11:12, 27 April 2025