  • Transformer (deep learning architecture)
    In deep learning, the transformer is an architecture based on the multi-head attention mechanism, in which text is converted to numerical representations...
    106 KB (13,130 words) - 14:54, 15 July 2025
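    A minimal numpy sketch of the multi-head attention mechanism named in the entry above (toy shapes, random weights, and simplified wiring; not code from the article):

        import numpy as np

        def softmax(x, axis=-1):
            e = np.exp(x - x.max(axis=axis, keepdims=True))
            return e / e.sum(axis=axis, keepdims=True)

        def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
            """Project, split into heads, run scaled dot-product
            attention per head, concatenate, project back."""
            T, d_model = X.shape
            d_head = d_model // n_heads
            split = lambda M: M.reshape(T, n_heads, d_head).transpose(1, 0, 2)
            Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
            scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, T, T)
            out = softmax(scores) @ V                            # (heads, T, d_head)
            out = out.transpose(1, 0, 2).reshape(T, d_model)     # concatenate heads
            return out @ Wo

        rng = np.random.default_rng(0)
        T, d = 5, 8                          # 5 tokens, model width 8
        X = rng.normal(size=(T, d))          # the "numerical representations"
        W = [rng.normal(size=(d, d)) for _ in range(4)]
        print(multi_head_attention(X, *W, n_heads=2).shape)  # (5, 8)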
  • Mamba is a deep learning architecture focused on sequence modeling. It was developed by researchers from Carnegie Mellon University and Princeton University...
    11 KB (1,159 words) - 19:42, 16 April 2025
  • Generative pre-trained transformer
    used in natural language processing. It is based on the transformer deep learning architecture, pre-trained on large data sets of unlabeled text, and able...
    65 KB (5,276 words) - 21:27, 10 July 2025
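    The pre-training on "large data sets of unlabeled text" mentioned above is next-token prediction; a hedged numpy sketch of that objective (vocabulary size, logits, and token ids are made up):

        import numpy as np

        def next_token_loss(logits, token_ids):
            """Average cross-entropy of predicting token t+1 from position t.
            The raw text supervises itself, so no labels are needed."""
            logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
            return -logp[np.arange(len(token_ids) - 1), token_ids[1:]].mean()

        rng = np.random.default_rng(1)
        vocab, T = 50, 6
        logits = rng.normal(size=(T, vocab))       # stand-in model outputs
        tokens = rng.integers(0, vocab, size=T)    # stand-in text
        print(next_token_loss(logits, tokens))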
  • Attention (machine learning)
    masked self-attention". See also: recurrent neural network, seq2seq, Transformer (deep learning architecture), attention, dynamic neural network. Niu, Zhaoyang; Zhong...
    35 KB (3,418 words) - 03:54, 9 July 2025
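    The "masked self-attention" quoted above blocks each position from attending to later ones; a toy numpy illustration of the causal mask (shapes and seed are arbitrary):

        import numpy as np

        def causal_self_attention(Q, K, V):
            """Scaled dot-product attention where position i may
            only attend to positions j <= i."""
            T, d = Q.shape
            scores = Q @ K.T / np.sqrt(d)
            mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # strictly upper
            scores = np.where(mask, -np.inf, scores)          # hide the future
            w = np.exp(scores - scores.max(axis=-1, keepdims=True))
            w /= w.sum(axis=-1, keepdims=True)
            return w @ V

        rng = np.random.default_rng(2)
        Q = K = V = rng.normal(size=(4, 8))
        print(causal_self_attention(Q, K, V).shape)  # (4, 8)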
  • his pioneering contributions in the field of deep learning, most notably the development of the Transformer neural network, which he co-authored in landmark...
    5 KB (383 words) - 06:54, 22 May 2025
  • Vision transformer
    of 1.6 exaFLOPs. See also: Transformer (machine learning model), convolutional neural network, attention (machine learning), Perceiver, deep learning, PyTorch, TensorFlow...
    37 KB (4,127 words) - 15:37, 11 July 2025
  • Deep Learning Super Sampling (DLSS) is a suite of real-time deep learning image enhancement and upscaling technologies developed by Nvidia that are available...
    35 KB (3,842 words) - 05:21, 16 July 2025
  • Attention Is All You Need
    machine learning authored by eight scientists working at Google. The paper introduced a new deep learning architecture known as the transformer, based...
    15 KB (3,911 words) - 13:54, 9 July 2025
  • Deep learning
    purpose. Most modern deep learning models are based on multi-layered neural networks such as convolutional neural networks and transformers, although they can...
    182 KB (17,994 words) - 00:54, 4 July 2025
    T5 (Text-to-Text Transfer Transformer) is a series of large language models developed by Google AI, introduced in 2019. Like the original Transformer model, T5 models...
    20 KB (1,932 words) - 03:55, 7 May 2025
  • Transformer (deep learning architecture), a machine learning architecture Transformer (flying car), a DARPA military project "Electronic transformer"...
    2 KB (236 words) - 11:42, 17 June 2024
    a sequence of vectors using self-supervised learning. It uses the encoder-only transformer architecture. BERT dramatically improved the state of the art...
    32 KB (3,623 words) - 12:46, 7 July 2025
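    The self-supervised objective behind the encoder-only BERT described above is masked-token prediction; a simplified numpy sketch of the input corruption (the token ids are invented, and BERT's real 80/10/10 mask/random/keep refinement is omitted):

        import numpy as np

        def mask_for_mlm(token_ids, mask_id, rate=0.15, rng=None):
            """Hide a random subset of tokens and remember what was there,
            so the encoder can be trained to restore them."""
            if rng is None:
                rng = np.random.default_rng()
            ids = token_ids.copy()
            chosen = rng.random(len(ids)) < rate
            targets = ids[chosen]          # what the model must reconstruct
            ids[chosen] = mask_id          # replace with [MASK]
            return ids, chosen, targets

        tokens = np.array([101, 7592, 2088, 2003, 2307, 102])  # made-up ids
        corrupted, pos, tgt = mask_for_mlm(
            tokens, mask_id=103, rate=0.5, rng=np.random.default_rng(3))
        print(corrupted, tgt)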
    Residual neural network (category Deep learning)
    network (also referred to as a residual network or ResNet) is a deep learning architecture in which the layers learn residual functions with reference to...
    28 KB (3,042 words) - 23:27, 7 June 2025
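    "Layers learn residual functions with reference to the layer inputs" means each block computes y = x + F(x); a minimal numpy sketch with an arbitrary two-layer F:

        import numpy as np

        def residual_block(x, W1, W2):
            """y = x + F(x): the block only learns the residual F, while
            the identity path lets gradients flow through unchanged."""
            h = np.maximum(0.0, x @ W1)   # ReLU
            return x + h @ W2             # skip connection

        rng = np.random.default_rng(4)
        d = 16
        x = rng.normal(size=d)
        W1, W2 = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))
        print(residual_block(x, W1, W2).shape)  # (16,)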
  • Deep reinforcement learning (DRL) is a subfield of machine learning that combines principles of reinforcement learning (RL) and deep learning. It involves...
    12 KB (1,658 words) - 12:58, 11 June 2025
  • present in a photo that a human could easily spot. The transformer deep learning architecture was invented by Google Brain researchers in 2017, and explained...
    44 KB (4,228 words) - 06:25, 18 June 2025
  • Large language model (category Deep learning)
    deep recurrent neural networks. These early NMT systems used LSTM-based encoder-decoder architectures, as they preceded the invention of transformers...
    133 KB (14,153 words) - 11:12, 16 July 2025
  • Multimodal learning is a type of deep learning that integrates and processes multiple types of data, referred to as modalities, such as text, audio, images...
    9 KB (2,212 words) - 22:40, 1 June 2025
  • DeepL Translator
    gradually expanded to support 35 languages. Its algorithm uses the transformer architecture. It offers a paid subscription for additional features and access...
    30 KB (2,324 words) - 13:34, 16 July 2025
  • Jingbo; Li, Changliang; Wong, Derek F.; Chao, Lidia S. (2019). "Learning Deep Transformer Models for Machine Translation". arXiv:1906.01787 [cs.CL]. Xiong...
    35 KB (5,361 words) - 05:48, 19 June 2025
  • source-available DeepSeek License. The architecture was essentially the same as the Llama series. They used the pre-norm decoder-only Transformer with RMSNorm...
    69 KB (6,445 words) - 23:52, 16 July 2025
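    A hedged numpy sketch of the "pre-norm ... with RMSNorm" wiring mentioned above: normalization is applied before each sublayer rather than after, and RMSNorm rescales by the root mean square without mean-centering (the sublayer below is a stand-in, not DeepSeek's code):

        import numpy as np

        def rmsnorm(x, gain, eps=1e-6):
            """Rescale by the root mean square; no mean subtraction."""
            return gain * x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

        def pre_norm_sublayer(x, sublayer, gain):
            """Pre-norm residual wiring: normalize first, then add back."""
            return x + sublayer(rmsnorm(x, gain))

        rng = np.random.default_rng(5)
        d = 8
        x = rng.normal(size=d)
        toy_sublayer = lambda h: 0.5 * h      # stand-in for attention or MLP
        print(pre_norm_sublayer(x, toy_sublayer, gain=np.ones(d)))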
  • recently been replaced—in some cases—by newer deep learning architectures such as the transformer. Vanishing gradients and exploding gradients, seen during...
    138 KB (15,569 words) - 03:12, 17 July 2025
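    The vanishing and exploding gradients mentioned above come from multiplying a gradient by the same weight matrix many times during backpropagation through time; a toy numpy demonstration (weight scales chosen only to exhibit the two regimes):

        import numpy as np

        rng = np.random.default_rng(6)
        d, steps = 32, 50
        for scale, label in [(0.5, "vanishing"), (1.5, "exploding")]:
            W = scale * rng.normal(size=(d, d)) / np.sqrt(d)
            g = np.ones(d)                 # gradient at the final time step
            for _ in range(steps):         # one chain-rule factor per step
                g = W.T @ g
            print(label, np.linalg.norm(g))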
  • Intel. May 20, 2024. "AMD XDNA Architecture". "Deploying Transformers on the Apple Neural Engine". Apple Machine Learning Research. Retrieved August 24...
    9 KB (767 words) - 03:50, 15 July 2025
  • Feature learning
    autoencoders. Self-supervised learning has since been applied to many modalities through the use of deep neural network architectures such as convolutional neural...
    45 KB (5,114 words) - 09:22, 4 July 2025
  • Neural network (machine learning)
    adversarial networks (GAN) and transformers are used for content creation across numerous industries. This is because deep learning models are able to learn...
    168 KB (17,613 words) - 15:58, 16 July 2025
  • to the field of artificial intelligence and deep learning, particularly in the development of transformer models and natural language processing. Noam...
    8 KB (854 words) - 16:28, 6 April 2025
    Long short-term memory (category Deep learning)
    2024). One of the two blocks (mLSTM) of the architecture is parallelizable like the Transformer architecture; the other (sLSTM) allows state tracking...
    52 KB (5,822 words) - 10:08, 15 July 2025
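    For context, the standard LSTM cell that these newer variants extend, in the usual notation (not drawn from the snippet; \sigma is the logistic sigmoid, \odot is elementwise multiplication):

        f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
        i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
        o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
        \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
        c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
        h_t = o_t \odot \tanh(c_t)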
    Whisper is a weakly supervised deep learning acoustic model, built on an encoder-decoder transformer architecture. Whisper Large V2 was released on...
    16 KB (1,659 words) - 22:46, 13 July 2025
  • Reinforcement learning
    Reinforcement learning is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning differs...
    69 KB (8,200 words) - 16:29, 4 July 2025
  • Q-learning algorithm. In 2014, Google DeepMind patented an application of Q-learning to deep learning, titled "deep reinforcement learning" or "deep Q-learning"...
    29 KB (3,856 words) - 18:16, 16 July 2025
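    For reference, the tabular Q-learning update that "deep Q-learning" approximates with a neural network; a minimal numpy sketch with a made-up two-state, two-action environment:

        import numpy as np

        def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
            """One step of Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
            Deep Q-learning replaces the table Q with a neural network."""
            td_target = r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (td_target - Q[s, a])
            return Q

        Q = np.zeros((2, 2))                  # 2 states x 2 actions
        Q = q_update(Q, s=0, a=1, r=1.0, s_next=1)
        print(Q)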
    GPT-1 (category Generative pre-trained transformers)
    Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017. In...
    32 KB (1,069 words) - 19:45, 10 July 2025