• In information theory, the cross-entropy between two probability distributions $p$ and $q$, over the same underlying...
    19 KB (3,264 words) - 23:00, 21 April 2025
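For reference, the discrete form of this definition (standard, stated here for convenience) is
$$H(p,q) = -\sum_{x} p(x)\log q(x) = H(p) + D_{\mathrm{KL}}(p\parallel q).$$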
  • statistics, the Kullback–Leibler (KL) divergence (also called relative entropy and I-divergence), denoted $D_{\text{KL}}(P\parallel Q)$...
    77 KB (13,067 words) - 13:07, 12 June 2025
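The discrete definition, for convenience:
$$D_{\text{KL}}(P\parallel Q) = \sum_{x} P(x)\log\frac{P(x)}{Q(x)},$$
which is zero exactly when $P = Q$.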
  • Entropy (information theory)
    In information theory, the entropy of a random variable quantifies the average level of uncertainty or information associated with the variable's potential...
    72 KB (10,220 words) - 13:03, 6 June 2025
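For a discrete random variable $X$ with probability mass function $p$, the standard definition is
$$H(X) = -\sum_{x} p(x)\log p(x) = \mathbb{E}[-\log p(X)].$$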
  • entropy states that the probability distribution which best represents the current state of knowledge about a system is the one with largest entropy,...
    31 KB (4,196 words) - 11:16, 14 June 2025
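Under moment constraints $\mathbb{E}_p[f_k] = F_k$, the maximum-entropy solution takes the exponential-family form
$$p^{*}(x) \propto \exp\!\Big(-\sum_{k}\lambda_k f_k(x)\Big);$$
for example, with no constraints it is the uniform distribution, and with fixed mean and variance on $\mathbb{R}$ it is the Gaussian.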
  • correlation for regression tasks or using information measures such as cross-entropy for classification tasks. Theoretically, one can justify the diversity...
    53 KB (6,685 words) - 14:14, 8 June 2025
  • In physics, the Tsallis entropy is a generalization of the standard Boltzmann–Gibbs entropy. It is proportional to the expectation of the q-logarithm...
    24 KB (2,881 words) - 17:28, 12 June 2025
  • The cross-entropy (CE) method is a Monte Carlo method for importance sampling and optimization. It is applicable to both combinatorial and continuous...
    7 KB (1,085 words) - 19:50, 23 April 2025
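The following is a minimal sketch of the CE method for continuous minimization, assuming a diagonal-Gaussian sampling distribution; the function name and parameter defaults are illustrative, not taken from any particular library:

```python
import numpy as np

def cross_entropy_method(f, dim, n_samples=100, n_elite=10, n_iters=50):
    """Minimize f over R^dim; a hypothetical minimal CE-method sketch."""
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(n_iters):
        # Draw candidates from the current parameterized (Gaussian) distribution.
        x = np.random.randn(n_samples, dim) * sigma + mu
        scores = np.array([f(row) for row in x])
        # Keep the elite samples with the lowest objective values.
        elite = x[np.argsort(scores)[:n_elite]]
        # For a Gaussian family, cross-entropy minimization against the elite
        # set reduces to matching its sample mean and standard deviation.
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-8
    return mu

# Example: recover the minimum of a shifted sphere function at (3, 3).
print(cross_entropy_method(lambda v: np.sum((v - 3.0) ** 2), dim=2))
```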
  • of the factors’ logarithms and flipping the sign yields the classic cross-entropy loss: $\theta^{*}=\operatorname{arg\,min}_{\theta}\;-\sum_{i}^{T}\log\sum_{j=1}^{J(i)}P(y_{j}^{(i)}\ldots$
    36 KB (3,901 words) - 13:08, 9 June 2025
  • Hyperbolastic functions
    binary cross-entropy compares the observed $y\in\{0,1\}$ with the predicted probabilities. The average binary cross-entropy for...
    41 KB (7,041 words) - 15:11, 5 May 2025
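Written out, the average binary cross-entropy over $N$ observations is
$$\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\big[y_i\log\hat{y}_i + (1-y_i)\log(1-\hat{y}_i)\big],$$
where $\hat{y}_i$ is the predicted probability for example $i$.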
  • … The cross-entropy of two Wishart distributions $p_{0}$ with parameters...
    27 KB (4,194 words) - 19:55, 19 June 2025
  • Cross-entropy benchmarking (also referred to as XEB) is a quantum benchmarking protocol which can be used to demonstrate quantum supremacy. In XEB, a...
    4 KB (548 words) - 18:33, 10 December 2024
  • the relationship between maximizing the likelihood and minimizing the cross-entropy, URL (version: 2019-11-06): https://stats.stackexchange.com/q/364237...
    68 KB (9,706 words) - 19:59, 16 June 2025
  • The ORM is usually trained via logistic regression, i.e. minimizing cross-entropy loss. Given a PRM, an ORM can be constructed by multiplying the total...
    24 KB (2,862 words) - 09:59, 13 June 2025
  • In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of...
    36 KB (4,495 words) - 18:46, 19 June 2025
  • is different from the data set used to train the large model) using cross-entropy as the loss function between the output of the distilled model $y(x\ldots$
    17 KB (2,568 words) - 19:31, 2 June 2025
  • evaluation and comparison of language models, cross-entropy is generally the preferred metric over entropy. The underlying principle is that a lower BPW...
    115 KB (11,926 words) - 02:40, 16 June 2025
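Concretely, bits per word is the empirical cross-entropy measured in base 2,
$$\mathrm{BPW} = -\frac{1}{N}\sum_{i=1}^{N}\log_2 q(w_i),$$
so a lower BPW means the model assigns higher probability to the observed text.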
  • Perplexity (category Entropy and information)
    …$-\frac{1}{N}\sum_{i=1}^{N}\log_{b}q(x_{i})$ may also be interpreted as a cross-entropy: $H(\tilde{p},q)=-\sum_{x}\tilde{p}(x)\log_{b}q(x)$
    13 KB (1,893 words) - 18:04, 6 June 2025
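Exponentiating that cross-entropy recovers the perplexity:
$$\mathrm{PPL}(q) = b^{H(\tilde{p},q)} = b^{-\frac{1}{N}\sum_{i=1}^{N}\log_b q(x_i)}.$$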
  • Beta distribution
    expression is identical to the negative of the cross-entropy (see section on "Quantities of information (entropy)"). Therefore, finding the maximum of the...
    245 KB (40,562 words) - 12:56, 14 May 2025
  • Genetic algorithm
    The cross-entropy (CE) method generates candidate solutions via a parameterized probability distribution. The parameters are updated via cross-entropy minimization...
    69 KB (8,221 words) - 21:33, 24 May 2025
  • interpreted geometrically by using entropy to measure variation: the MLE minimizes cross-entropy (equivalently, relative entropy, Kullback–Leibler divergence)...
    13 KB (1,720 words) - 09:30, 21 May 2025
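The equivalence is immediate: with empirical distribution $\tilde{p}$,
$$\arg\max_\theta \frac{1}{N}\sum_{i=1}^{N}\log q_\theta(x_i) = \arg\min_\theta H(\tilde{p}, q_\theta) = \arg\min_\theta D_{\mathrm{KL}}(\tilde{p}\parallel q_\theta),$$
since $H(\tilde{p}, q_\theta) = H(\tilde{p}) + D_{\mathrm{KL}}(\tilde{p}\parallel q_\theta)$ and $H(\tilde{p})$ does not depend on $\theta$.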
  • loss function or "cost function". For classification, this is usually cross-entropy (XC, log loss), while for regression it is usually squared error loss...
    56 KB (7,993 words) - 15:52, 29 May 2025
  • conditional entropy, conditional quantum entropy, confusion and diffusion, cross-entropy, data compression, entropic uncertainty (Hirschman uncertainty), entropy encoding...
    1 KB (93 words) - 09:42, 8 August 2023
  • Simulated annealing
    The cross-entropy method (CE) generates candidate solutions via a parameterized probability distribution. The parameters are updated via cross-entropy minimization...
    35 KB (4,641 words) - 11:29, 29 May 2025
  • is trained by gradient descent to minimize the cross-entropy loss. In full formula, the cross-entropy loss is: $-\sum_{i}\ln e^{v'_{w_{i}}\cdot(\sum_{j\in i+N}v\ldots}$
    33 KB (4,250 words) - 02:31, 10 June 2025
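A numerically stable way to compute this kind of softmax cross-entropy for a single position, sketched in NumPy (the function name is illustrative, not from the source):

```python
import numpy as np

def softmax_cross_entropy(logits, target):
    """-log softmax(logits)[target], computed without overflow."""
    shifted = logits - np.max(logits)        # shift logits for numerical stability
    log_z = np.log(np.exp(shifted).sum())    # log of the partition function
    return log_z - shifted[target]           # equals -log p(target)

# Example: loss for predicting class 2 out of 5.
print(softmax_cross_entropy(np.array([1.0, 0.5, 3.0, -1.0, 0.0]), target=2))
```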
  • the mean squared error criterion implemented in MSECriterion and the cross-entropy criterion implemented in ClassNLLCriterion. What follows is an example...
    10 KB (863 words) - 00:26, 14 December 2024
  • regression, multinomial logit (mlogit), the maximum entropy (MaxEnt) classifier, and the conditional maximum entropy model. Multinomial logistic regression is used...
    31 KB (5,225 words) - 12:07, 3 March 2025
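The model in question is the softmax: for $K$ classes with weight vectors $\beta_k$,
$$\Pr(y = k \mid x) = \frac{\exp(\beta_k^\top x)}{\sum_{j=1}^{K}\exp(\beta_j^\top x)},$$
and fitting it by maximum likelihood is the same as minimizing cross-entropy against the observed labels.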
  • Battiti, G. Tecchiolli (1994), recently reviewed in the reference book; cross-entropy method by Rubinstein and Kroese (2004); random search by Anatoly Zhigljavsky...
    12 KB (1,071 words) - 06:25, 15 December 2024
  • perform biproportion. There are also entropy maximization, information-loss minimization (or cross-entropy), and RAS, which consists of factoring the...
    22 KB (3,463 words) - 21:01, 17 March 2025
  • supervised model. In particular, it is trained to minimize the following cross-entropy loss function: $L(\theta)=-\frac{1}{\binom{K}{2}}\,E_{(x,y_{w},y_{l})}\big[\log\big(\sigma\ldots$
    62 KB (8,617 words) - 19:50, 11 May 2025
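The truncated formula matches the pairwise reward-model loss popularized by InstructGPT; assuming that formulation, it continues as
$$L(\theta) = -\frac{1}{\binom{K}{2}}\,\mathbb{E}_{(x,y_w,y_l)}\big[\log\sigma\big(r_\theta(x,y_w) - r_\theta(x,y_l)\big)\big],$$
where $r_\theta$ is the reward model and $y_w$, $y_l$ are the preferred and dispreferred responses.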
  • … Despite similar notation, joint entropy should not be confused with cross-entropy. The conditional entropy or conditional uncertainty of $X$ given...
    64 KB (7,973 words) - 23:39, 4 June 2025