• Language model benchmark
    A language model benchmark is a standardized test designed to evaluate the performance of language models on various natural language processing tasks. These...
    103 KB (11,038 words) - 12:06, 30 July 2025
  • A language model is a model of the human brain's ability to produce natural language. Language models are useful for a variety of tasks, including speech...
    17 KB (2,424 words) - 12:05, 30 July 2025
  • A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing...
    135 KB (14,333 words) - 04:47, 4 August 2025
  • Claude (language model)
    Claude is a family of large language models developed by Anthropic. The first model was released in March 2023. The Claude 3 family, released in March...
    26 KB (2,274 words) - 20:30, 2 August 2025
  • Humanity's Last Exam (category Large language models)
    Humanity's Last Exam (HLE) is a language model benchmark consisting of 2,500 questions across a broad range of subjects. It was created jointly by the...
    8 KB (525 words) - 15:19, 2 August 2025
  • Measuring Massive Multitask Language Understanding (MMLU) is a popular benchmark for evaluating the capabilities of large language models. It inspired several...
    6 KB (746 words) - 20:37, 28 July 2025
  • Reasoning language models (RLMs) are large language models that are trained further to solve tasks that take several steps of reasoning. They tend to do...
    26 KB (3,061 words) - 21:30, 31 July 2025
  • written. List of chatbots List of language model benchmarks This is the date that documentation describing the model's architecture was first released....
    64 KB (3,353 words) - 15:04, 24 July 2025
  • variety of industry benchmarks, while Gemini Pro was said to have outperformed GPT-3.5. Gemini Ultra was also the first language model to outperform human...
    64 KB (5,017 words) - 19:03, 2 August 2025
  • Llama (language model)
    Llama (Large Language Model Meta AI) is a family of large language models (LLMs) released by Meta AI starting in February 2023. The latest version is Llama...
    57 KB (5,448 words) - 20:35, 2 August 2025
  • for Transformer-based Masked Language-models, arXiv:2106.10199 "Papers with Code - MMLU Benchmark (Multi-task Language Understanding)". paperswithcode...
    54 KB (5,552 words) - 18:04, 25 July 2025
  • Bidirectional encoder representations from transformers (BERT) is a language model introduced in October 2018 by researchers at Google. It learns to represent...
    32 KB (3,623 words) - 20:01, 2 August 2025
  • average accuracy of 67.5% on the Measuring Massive Multitask Language Understanding (MMLU) benchmark, which is 7% higher than Gopher's performance. Chinchilla...
    8 KB (615 words) - 19:14, 2 August 2025
  • Benchmark (computing)
    In computing, a benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance...
    22 KB (2,675 words) - 14:20, 31 July 2025
  • Qwen (category Large language models)
    large language models developed by Chinese company Alibaba Cloud. In July 2024, it was ranked as the top Chinese language model in some benchmarks and third...
    22 KB (1,560 words) - 20:03, 2 August 2025
  • in 2016, and of the paper that introduced the language model benchmark MMLU (Massive Multitask Language Understanding) in 2020. In February 2022, Hendrycks...
    10 KB (860 words) - 05:42, 11 June 2025
  • Emily M. Bender and colleagues in a 2021 paper, that frames large language models as systems that statistically mimic text without real understanding...
    22 KB (2,359 words) - 14:00, 3 August 2025
  • Generative artificial intelligence (category CS1 Japanese-language sources (ja))
    language model benchmarks. Yann LeCun has advocated open-source models for their value to vertical applications and for improving AI safety. Language...
    155 KB (13,950 words) - 05:14, 30 July 2025
  • OpenAI o3 (category Large language models)
    accuracy of o1. List of large language models Knight, Will (December 20, 2024). "OpenAI Upgrades Its Smartest AI Model With Improved Reasoning Skills"...
    9 KB (851 words) - 20:12, 2 August 2025
  • most powerful Arabic-language AI model". ZDNET. Retrieved 2025-07-31. "Core42 Sets New Benchmark for Arabic Large Language Models with the Release of Jais...
    5 KB (463 words) - 12:57, 1 August 2025
  • LMArena (category Large language models)
    evaluates large language models (LLMs) through anonymous, crowd-sourced pairwise comparisons. Users enter prompts for two anonymous models to respond to...
    4 KB (341 words) - 18:29, 11 July 2025
  • Retrieval-augmented generation (category Large language models)
    Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information. With RAG, LLMs...
    24 KB (2,549 words) - 15:32, 16 July 2025
  • PaLM (redirect from Pathways Language Model)
    PaLM (Pathways Language Model) is a 540-billion-parameter dense decoder-only transformer-based large language model (LLM) developed by Google AI. Researchers...
    13 KB (807 words) - 19:02, 2 August 2025
  • company founded in 2021. Anthropic has developed a family of large language models (LLMs) named Claude as a competitor to OpenAI's ChatGPT and Google's...
    39 KB (3,620 words) - 05:10, 2 August 2025
  • Perplexity (category Language modeling)
    q = \tilde{p}. In natural language processing (NLP), a corpus is a structured collection of texts or documents, and a language model is a probability distribution...
    13 KB (1,895 words) - 17:29, 22 July 2025
  • DeepSeek (category Articles containing Chinese-language text)
    is a Chinese artificial intelligence company that develops large language models (LLMs). Based in Hangzhou, Zhejiang, DeepSeek is owned and funded by...
    71 KB (6,578 words) - 11:51, 3 August 2025
  • Benchmark (surveying)
    The term benchmark, bench mark, or survey benchmark originates from the chiseled horizontal marks that surveyors made in stone structures, into which an...
    10 KB (1,053 words) - 15:42, 10 February 2025
  • intelligence (AI) model. A prompt is natural language text describing the task that an AI should perform. A prompt for a text-to-text language model can be a query...
    40 KB (4,480 words) - 21:07, 27 July 2025
  • (Google's family of large language models) and other generative AI tools, such as the text-to-image model Imagen and the text-to-video model Veo. The start-up...
    98 KB (9,531 words) - 05:53, 3 August 2025
  • evaluating and aligning large language models (LLMs), including through initiatives such as Humanity's Last Exam, a benchmark designed to assess advanced...
    25 KB (2,312 words) - 05:00, 2 August 2025