Restricted Boltzmann machine

<h2 id="structure">Structure</h2>
The standard type of RBM has binary-valued (<a href="/facts/Boolean_algebra/VkCToAmg">Boolean</a>) hidden and visible units, and consists of a <a href="/facts/Matrix_(mathematics)/qa8FI4ko">matrix</a> of weights \(W\) of size \(m\times n\). Each weight element \((w_{i,j})\) of the matrix is associated with the connection between the visible (input) unit \(v_{i}\) and the hidden unit \(h_{j}\). In addition, there are bias weights (offsets) \(a_{i}\) for \(v_{i}\) and \(b_{j}\) for \(h_{j}\). Given the weights and biases, the energy of a configuration (pair of Boolean vectors) \((v,h)\) is defined as

\[E(v,h)=-\sum _{i}a_{i}v_{i}-\sum _{j}b_{j}h_{j}-\sum _{i}\sum _{j}v_{i}w_{i,j}h_{j}\]

or, in matrix notation,

\[E(v,h)=-a^{\mathrm {T} }v-b^{\mathrm {T} }h-v^{\mathrm {T} }Wh.\]
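
The energy definition above can be evaluated directly for a small RBM. The following is a minimal sketch in plain Python; the function name `energy` and the toy values of `W`, `a`, and `b` are hypothetical and chosen only for illustration, with \(m=3\) visible and \(n=2\) hidden units.

```python
def energy(v, h, W, a, b):
    """E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_i sum_j v_i w_ij h_j."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(
        v[i] * W[i][j] * h[j]
        for i in range(len(v))
        for j in range(len(h))
    )
    return -visible_term - hidden_term - interaction


# Hypothetical toy parameters: W is m x n with m = 3 visible, n = 2 hidden units.
W = [[0.5, -1.0],
     [1.5, 0.0],
     [-0.5, 2.0]]
a = [0.1, -0.2, 0.3]   # visible biases a_i
b = [0.0, 0.4]         # hidden biases b_j

v = [1, 0, 1]
h = [0, 1]
print(energy(v, h, W, a, b))
```

For this configuration the three terms are 0.4, 0.4 and 1.0, so the energy is approximately -1.8; lower energy corresponds to higher probability under the joint distribution.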

This energy function is analogous to that of a <a href="/facts/Hopfield_network/xUBr6rpm">Hopfield network</a>. As with general Boltzmann machines, the <a href="/facts/Joint_probability_distribution/klX2ksGY">joint probability distribution</a> for the visible and hidden vectors is defined in terms of the energy function as follows,<a class="footnote-ref" id="fnref:14" href="#fn:14">14</a>

\[P(v,h)={\frac {1}{Z}}e^{-E(v,h)}\]

where \(Z\) is a <a href="/facts/Partition_function_(mathematics)/dp53Le9y">partition function</a> defined as the sum of \(e^{-E(v,h)}\) over all possible configurations, which can be interpreted as a <a href="/facts/Normalizing_constant/KKTfybqr">normalizing constant</a> to ensure that the probabilities sum to 1. The <a href="/facts/Marginal_distribution/U9XBWAd1">marginal probability</a> of a visible vector is the sum of \(P(v,h)\) over all possible hidden layer configurations,<a class="footnote-ref" id="fnref:15" href="#fn:15">15</a>

\[P(v)={\frac {1}{Z}}\sum _{\{h\}}e^{-E(v,h)},\]
and vice versa. Since the underlying graph structure of the RBM is <a href="/facts/Bipartite_graph/xWcXV9MB">bipartite</a> (meaning there are no intra-layer connections), the hidden unit activations are <a href="/facts/Conditional_independence/udUXFaMw">mutually independent</a> given the visible unit activations. Conversely, the visible unit activations are mutually independent given the hidden unit activations.<a class="footnote-ref" id="fnref:16" href="#fn:16">16</a> That is, for m visible units and n hidden units, the <a href="/facts/Conditional_probability/QcN2UERV">conditional probability</a> of a configuration of the visible units v, given a configuration of the hidden units h, is

\[P(v|h)=\prod _{i=1}^{m}P(v_{i}|h).\]
Conversely, the conditional probability of h given v is

\[P(h|v)=\prod _{j=1}^{n}P(h_{j}|v).\]
The individual activation probabilities are given by

\[P(h_{j}=1|v)=\sigma \left(b_{j}+\sum _{i=1}^{m}w_{i,j}v_{i}\right)\]
and
\[P(v_{i}=1|h)=\sigma \left(a_{i}+\sum _{j=1}^{n}w_{i,j}h_{j}\right)\]

where \(\sigma\) denotes the <a href="/facts/Logistic_function/IaQ254t7">logistic sigmoid</a>.
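
For a very small RBM, the partition function, the marginal \(P(v)\), and the sigmoid form of the conditionals can all be checked by brute-force enumeration. The sketch below assumes plain Python lists for \(W\), \(a\), and \(b\); all function names and toy values are hypothetical. The enumeration costs \(2^{m+n}\) energy evaluations, so it is only meant as a sanity check, not a practical method.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, W, a, b):
    return -(
        sum(a[i] * v[i] for i in range(len(v)))
        + sum(b[j] * h[j] for j in range(len(h)))
        + sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))
    )


def partition_function(W, a, b):
    """Z: sum of exp(-E(v, h)) over all 2^(m+n) binary configurations."""
    return sum(
        math.exp(-energy(v, h, W, a, b))
        for v in itertools.product([0, 1], repeat=len(a))
        for h in itertools.product([0, 1], repeat=len(b))
    )


def marginal_visible(v, W, a, b):
    """P(v): sum of exp(-E(v, h)) over all hidden configurations, divided by Z."""
    unnormalized = sum(
        math.exp(-energy(v, h, W, a, b))
        for h in itertools.product([0, 1], repeat=len(b))
    )
    return unnormalized / partition_function(W, a, b)


def hidden_activation_probs(v, W, b):
    """Closed form: P(h_j = 1 | v) = sigmoid(b_j + sum_i w_ij v_i)."""
    return [
        sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(len(v))))
        for j in range(len(b))
    ]


def brute_force_hidden_probs(v, W, a, b):
    """P(h_j = 1 | v) by enumeration; Z cancels in the ratio."""
    configs = list(itertools.product([0, 1], repeat=len(b)))
    weights = [math.exp(-energy(v, h, W, a, b)) for h in configs]
    total = sum(weights)
    return [
        sum(w for h, w in zip(configs, weights) if h[j] == 1) / total
        for j in range(len(b))
    ]


# Hypothetical toy parameters (m = 3 visible, n = 2 hidden units).
W = [[0.5, -1.0], [1.5, 0.0], [-0.5, 2.0]]
a = [0.1, -0.2, 0.3]
b = [0.0, 0.4]

v = (1, 0, 1)
print(hidden_activation_probs(v, W, b))
print(brute_force_hidden_probs(v, W, a, b))
print(sum(marginal_visible(u, W, a, b) for u in itertools.product([0, 1], repeat=3)))
```

The first two printed lists should agree up to floating-point rounding, and the marginals should sum to 1, which reflects the role of \(Z\) as a normalizing constant.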
The visible units of a restricted Boltzmann machine can be <a href="/facts/Multinomial_distribution/y58v1p9J">multinomial</a>, although the hidden units are <a href="/facts/Bernoulli_distribution/ChCtYyvs">Bernoulli</a>. In this case, the logistic function for visible units is replaced by the <a href="/facts/Softmax_function/pvxeWV6L">softmax function</a>

\[P(v_{i}^{k}=1|h)={\frac {\exp(a_{i}^{k}+\sum _{j}W_{ij}^{k}h_{j})}{\sum _{k'=1}^{K}\exp(a_{i}^{k'}+\sum _{j}W_{ij}^{k'}h_{j})}}\]

where \(K\) is the number of discrete values that a visible unit can take. Such RBMs are applied in topic modeling<a class="footnote-ref" id="fnref:17" href="#fn:17">17</a> and <a href="/facts/Recommender_system/HjodW6nS">recommender systems</a>.<a class="footnote-ref" id="fnref:18" href="#fn:18">18</a>
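
The softmax conditional for a single \(K\)-valued visible unit can be sketched as follows. The data layout is an assumption made for illustration: `W_i[k][j]` stands for \(W_{ij}^{k}\) and `a_i[k]` for \(a_{i}^{k}\), and the function name and toy values are hypothetical.

```python
import math


def softmax_visible_probs(h, W_i, a_i):
    """Return [P(v_i^k = 1 | h) for k in 0..K-1] for one visible unit i."""
    logits = [
        a_i[k] + sum(W_i[k][j] * h[j] for j in range(len(h)))
        for k in range(len(a_i))
    ]
    # Subtracting the maximum logit leaves the softmax unchanged
    # and avoids overflow in exp().
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy values: K = 3 categories, n = 2 hidden units.
W_i = [[0.2, -0.5],
       [1.0, 0.3],
       [-0.7, 0.8]]
a_i = [0.0, 0.1, -0.1]
h = [1, 0]

probs = softmax_visible_probs(h, W_i, a_i)
print(probs, sum(probs))
```

With \(K=2\) and the parameters of the second category fixed at zero, the first probability reduces to the logistic sigmoid of the first logit, which recovers the binary case.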

<h3>Relation to other models</h3>
Restricted Boltzmann machines are a special case of <a href="/facts/Boltzmann_machine/2wyLI0pI">Boltzmann machines</a> and <a href="/facts/Markov_random_field/DJazwaeP">Markov random fields</a>.<a class="footnote-ref" id="fnref:19" href="#fn:19">19</a><a class="footnote-ref" id="fnref:20" href="#fn:20">20</a>
The <a href="/facts/Graphical_model/XxfmKhmM">graphical model</a> of RBMs corresponds to that of <a href="/facts/Factor_analysis/LT6B9I7D">factor analysis</a>.<a class="footnote-ref" id="fnref:21" href="#fn:21">21</a>

<h2 id="training-algorithm">Training algorithm</h2>
Restricted Boltzmann machines are trained to maximize the product of probabilities assigned to some training set \(V\) (a matrix, each row of which is treated as a visible vector \(v\)),

\[\arg \max _{W}\prod _{v\in V}P(v)\]

or equivalently, to maximize the <a href="/facts/Expected_value/1XV0JKL8">expected</a> <a href="/facts/Log_probability/wMFT5vj0">log probability</a> of a training sample \(v\) selected randomly from \(V\):<a class="footnote-ref" id="fnref:22" href="#fn:22">22</a><a class="footnote-ref" id="fnref:23" href="#fn:23">23</a>

\[\arg \max _{W}\mathbb {E} \left[\log P(v)\right]\]
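
The equivalence of the two objectives follows because the logarithm is strictly increasing and a positive constant factor does not change the maximizer. The short derivation below assumes that \(v\) is drawn uniformly from the rows of \(V\) and writes \(|V|\) for the number of training vectors; both are notational assumptions of this sketch.

```latex
\begin{aligned}
\arg\max_{W} \prod_{v\in V} P(v)
  &= \arg\max_{W} \log \prod_{v\in V} P(v)
   = \arg\max_{W} \sum_{v\in V} \log P(v) \\
  &= \arg\max_{W} \frac{1}{|V|} \sum_{v\in V} \log P(v)
   = \arg\max_{W} \mathbb{E}\left[\log P(v)\right]
\end{aligned}
```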

The algorithm most often used to train RBMs, that is, to optimize the weight matrix \(W\), is the contrastive divergence (CD) algorithm due to <a href="/facts/Geoffrey_Hinton/HJU6lC2H">Hinton</a>, originally developed to train PoE (<a href="/facts/Product_of_experts/gkXjf426">product of experts</a>) models.<a class="footnote-ref" id="fnref:24" href="#fn:24">24</a><a class="footnote-ref" id="fnref:25" href="#fn:25">25</a>
The algorithm performs <a href="/facts/Gibbs_sampling/HDYRhxEq">Gibbs sampling</a> and is used inside a <a href="/facts/Gradient_descent/pFFrek0F">gradient descent</a> procedure (similar to the way backpropagation is used inside such a procedure when training feedforward neural nets) to compute the weight update.
The basic, single-step contrastive divergence (CD-1) procedure for a single sample can be summarized as follows:

<ol><li>Take a training sample v, compute the probabilities of the hidden units and sample a hidden activation vector h from this probability distribution.</li>
<li>Compute the <a href="/facts/Outer_product/qR9C2BU1">outer product</a> of v and h and call this the positive gradient.</li>
<li>From h, sample a reconstruction v' of the visible units, then resample the hidden activations h' from this. (Gibbs sampling step)</li>
<li>Compute the <a href="/facts/Outer_product/qR9C2BU1">outer product</a> of v' and h' and call this the negative gradient.</li>
<li>Let the update to the weight matrix \(W\) be the positive gradient minus the negative gradient, times some learning rate: \(\Delta W=\epsilon (vh^{\mathsf {T}}-v'h'^{\mathsf {T}})\).</li>
<li>Update the biases a and b analogously: \(\Delta a=\epsilon (v-v')\), \(\Delta b=\epsilon (h-h')\).</li></ol>
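
The six steps above can be sketched in plain Python as follows. This is a minimal illustration and not a tuned implementation: the function names, the toy data set, the learning rate, and the small Gaussian weight initialization are all assumptions. Sampled binary states are used for \(h\), \(v'\), and \(h'\), as in the list, and the two outer products are folded into the update loop.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_bernoulli(probs, rng):
    """Draw one binary value per probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def hidden_probs(v, W, b):
    return [sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(len(v))))
            for j in range(len(b))]


def visible_probs(h, W, a):
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def cd1_update(v, W, a, b, lr, rng):
    """Apply one CD-1 update in place for a single sample v; return v'."""
    m, n = len(a), len(b)
    # Step 1: sample hidden activations h from P(h | v).
    h = sample_bernoulli(hidden_probs(v, W, b), rng)
    # Step 3: one Gibbs step, reconstruct v' from h, then resample h'.
    v_recon = sample_bernoulli(visible_probs(h, W, a), rng)
    h_recon = sample_bernoulli(hidden_probs(v_recon, W, b), rng)
    # Steps 2, 4, 5: Delta W = lr * (v h^T - v' h'^T), element by element.
    for i in range(m):
        for j in range(n):
            W[i][j] += lr * (v[i] * h[j] - v_recon[i] * h_recon[j])
    # Step 6: Delta a = lr * (v - v'), Delta b = lr * (h - h').
    for i in range(m):
        a[i] += lr * (v[i] - v_recon[i])
    for j in range(n):
        b[j] += lr * (h[j] - h_recon[j])
    return v_recon


# Hypothetical toy run: 4 visible units, 2 hidden units, two training patterns.
rng = random.Random(0)
m, n = 4, 2
W = [[rng.gauss(0.0, 0.01) for _ in range(n)] for _ in range(m)]
a = [0.0] * m
b = [0.0] * n
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
for epoch in range(200):
    for v in data:
        cd1_update(v, W, a, b, lr=0.1, rng=rng)
```

In practice the update is usually averaged over a mini-batch, and probabilities are often used in place of sampled states for some of the terms; those variations are left out here to stay close to the steps as listed.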
A Practical Guide to Training RBMs written by Hinton can be found on his homepage.<a class="footnote-ref" id="fnref:26" href="#fn:26">26</a>

<h2 id="stacked-restricted-boltzmann-machine">Stacked Restricted Boltzmann Machine</h2>

See also: <a href="/facts/Deep_belief_network/aPkAjLIU">Deep belief network</a>
<ul><li>The difference between a stacked restricted Boltzmann machine and a single RBM is that, in an RBM, lateral connections within a layer are prohibited to make analysis tractable. The stacked Boltzmann machine, on the other hand, combines an unsupervised three-layer network with symmetric weights and a supervised, fine-tuned top layer for recognizing three classes.</li>
<li>Stacked Boltzmann machines are used to <a href="/facts/Natural-language_understanding/5waCJC1e">understand natural languages</a>, <a href="/facts/Document_retrieval/H0n6LnCQ">retrieve documents</a>, generate images, and perform classification. These functions are trained with unsupervised pre-training and/or supervised fine-tuning. Unlike the undirected symmetric top layer, the connection to the RBM is a two-way asymmetric layer. The restricted Boltzmann machine's connection has three layers with asymmetric weights, and the two networks are combined into one.</li>
<li>The stacked Boltzmann machine does share similarities with the RBM: its neuron is a stochastic binary Hopfield neuron, the same as in the restricted Boltzmann machine. The energy for both the stacked Boltzmann machine and the RBM is given by the Gibbs probability measure: \(E=-{\frac {1}{2}}\sum _{i,j}{w_{ij}{s_{i}}{s_{j}}}+\sum _{i}{\theta _{i}}{s_{i}}\). The training process of the stacked Boltzmann machine is similar to that of the RBM. It trains one layer at a time and approximates the equilibrium state with a 3-segment pass, without performing backpropagation. It uses both supervised and unsupervised training on different RBMs for pre-training for classification and recognition. The training uses contrastive divergence with Gibbs sampling: \(\Delta w_{ij}=\epsilon (p_{ij}-p'_{ij})\).</li>
<li>The restricted Boltzmann machine's strength is that it performs a non-linear transformation, so it is easy to expand and can give a hierarchical layer of features. Its weakness is that calculations with integer and real-valued neurons are complicated. It does not follow the gradient of any function, so the approximation of contrastive divergence to maximum likelihood is improvised.<a class="footnote-ref" id="fnref:27" href="#fn:27">27</a></li></ul>
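
Training one layer at a time, as described in the list above, is commonly done by fitting an RBM to the data and then using its hidden activations as training data for the next RBM. The sketch below illustrates that greedy layer-wise idea with CD-1 in plain Python. It is an assumption-laden illustration rather than the specific procedure of any cited source: the function names, layer sizes, and hyperparameters are hypothetical, and the supervised fine-tuning stage is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def up(v, W, b):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(W[i][j] * v[i] for i in range(len(v))))
            for j in range(len(b))]


def down(h, W, a):
    """P(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]


def train_rbm(data, n_hidden, epochs, lr, rng):
    """Fit a single RBM to binary data with CD-1; return (W, a, b)."""
    m = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(m)]
    a, b = [0.0] * m, [0.0] * n_hidden
    for _ in range(epochs):
        for v in data:
            h = sample(up(v, W, b), rng)
            v1 = sample(down(h, W, a), rng)
            h1 = sample(up(v1, W, b), rng)
            for i in range(m):
                a[i] += lr * (v[i] - v1[i])
                for j in range(n_hidden):
                    W[i][j] += lr * (v[i] * h[j] - v1[i] * h1[j])
            for j in range(n_hidden):
                b[j] += lr * (h[j] - h1[j])
    return W, a, b


def train_stack(data, layer_sizes, epochs, lr, rng):
    """Greedy layer-wise training: each RBM is trained on the layer below."""
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        W, a, b = train_rbm(inputs, n_hidden, epochs, lr, rng)
        layers.append((W, a, b))
        # Sampled hidden states of this layer become the next layer's data.
        inputs = [sample(up(v, W, b), rng) for v in inputs]
    return layers


# Hypothetical toy run: 6 visible units, then hidden layers of size 4 and 2.
rng = random.Random(0)
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
layers = train_stack(data, layer_sizes=[4, 2], epochs=50, lr=0.1, rng=rng)
print([(len(W), len(W[0])) for W, a, b in layers])
```

Passing hidden probabilities instead of sampled states to the next layer is an equally common choice; sampling is used here only to keep every layer's input binary.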
<h2 id="literature">Literature</h2>
<ul><li>Fischer, Asja; Igel, Christian (2012), "An Introduction to Restricted Boltzmann Machines", Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Lecture Notes in Computer Science, vol. 7441, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 14–36, <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1007%2F978-3-642-33275-3_2">10.1007/978-3-642-33275-3_2</a>, <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 978-3-642-33274-6</li></ul>
<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/Autoencoder/jnTqhtm9">Autoencoder</a></li>
<li><a href="/facts/Helmholtz_machine/E5G0Wzpg">Helmholtz machine</a></li></ul>

<h2 id="bibliography">Bibliography</h2>
<ul><li>Chen, Edwin (2011-07-18). <a href="http://blog.echen.me/2011/07/18/introduction-to-restricted-boltzmann-machines/">"Introduction to Restricted Boltzmann Machines"</a>. Edwin Chen's blog.</li>
<li>Nicholson, Chris; Gibson, Adam. <a href="https://web.archive.org/web/20170211042953/https://deeplearning4j.org/restrictedboltzmannmachine.html">"A Beginner's Tutorial for Restricted Boltzmann Machines"</a>. <a href="/facts/Deeplearning4j/chxHLqJs">Deeplearning4j</a> Documentation. Archived from the original on 2017-02-11. Retrieved 2018-11-15.</li>
<li>Nicholson, Chris; Gibson, Adam. <a href="https://web.archive.org/web/20160920122139/http://deeplearning4j.org/understandingRBMs.html">"Understanding RBMs"</a>. Deeplearning4j Documentation. Archived from <a href="http://deeplearning4j.org/understandingRBMs.html">the original</a> on 2016-09-20. Retrieved 2014-12-29.</li></ul>
<h2 id="external-links">External links</h2>
<ul><li><a href="/facts/Python_(programming_language)/YbuGqofa">Python</a> <a href="https://github.com/AmazaspShumik/sklearn-bayes/blob/master/skbayes/decomposition_models/rbm.py">implementation</a> of Bernoulli RBM and <a href="https://github.com/AmazaspShumik/sklearn-bayes/blob/master/ipython_notebooks_tutorials/decomposition_models/rbm_demo.ipynb">tutorial</a></li>
<li><a href="https://github.com/swirepe/SimpleRBM">SimpleRBM</a> is a very small RBM implementation (24kB), useful for learning how RBMs learn and work.</li>
<li><a href="/facts/Julia_(programming_language)/AoB0PJ9C">Julia</a> implementation of Restricted Boltzmann machines: <a href="https://github.com/cossio/RestrictedBoltzmannMachines.jl">https://github.com/cossio/RestrictedBoltzmannMachines.jl</a></li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1">Sherrington, David; Kirkpatrick, Scott (1975), "Solvable Model of a Spin-Glass", Physical Review Letters, 35 (35): 1792–1796, Bibcode:1975PhRvL..35.1792S, doi:10.1103/PhysRevLett.35.1792 <a href="#fnref:1" class="footnote-back-ref">↩</a></li>
<li id="fn:2">Smolensky, Paul (1986). "Chapter 6: Information Processing in Dynamical Systems: Foundations of Harmony Theory" (PDF). In Rumelhart, David E.; McLelland, James L. (eds.). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press. pp. 194–281. ISBN 0-262-68053-X. <a href="#fnref:2" class="footnote-back-ref">↩</a></li>
<li id="fn:3">Hinton, G. E.; Salakhutdinov, R. R. (2006). "Reducing the Dimensionality of Data with Neural Networks" (PDF). Science. 313 (5786): 504–507. Bibcode:2006Sci...313..504H. doi:10.1126/science.1127647. PMID 16873662. S2CID 1658773. Archived from the original (PDF) on 2015-12-23. Retrieved 2015-12-02. <a href="https://web.archive.org/web/20151223152006/http://www.cs.toronto.edu/~hinton/science.pdf" target="_blank">https://web.archive.org/web/20151223152006/http://www.cs.toronto.edu/~hinton/science.pdf</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></li>
<li id="fn:4">Larochelle, H.; Bengio, Y. (2008). Classification using discriminative restricted Boltzmann machines (PDF). Proceedings of the 25th international conference on Machine learning - ICML '08. p. 536. doi:10.1145/1390156.1390224. ISBN 978-1-60558-205-4. <a href="#fnref:4" class="footnote-back-ref">↩</a></li>
<li id="fn:5">Salakhutdinov, R.; Mnih, A.; Hinton, G. (2007). Restricted Boltzmann machines for collaborative filtering. Proceedings of the 24th international conference on Machine learning - ICML '07. p. 791. doi:10.1145/1273496.1273596. ISBN 978-1-59593-793-3. <a href="#fnref:5" class="footnote-back-ref">↩</a></li>
<li id="fn:6">Coates, Adam; Lee, Honglak; Ng, Andrew Y. (2011). An analysis of single-layer networks in unsupervised feature learning (PDF). International Conference on Artificial Intelligence and Statistics (AISTATS). Archived from the original (PDF) on 2014-12-20. Retrieved 2014-12-19. <a href="https://web.archive.org/web/20141220030058/http://cs.stanford.edu/~acoates/papers/coatesleeng_aistats_2011.pdf" target="_blank">https://web.archive.org/web/20141220030058/http://cs.stanford.edu/~acoates/papers/coatesleeng_aistats_2011.pdf</a> <a href="#fnref:6" class="footnote-back-ref">↩</a></li>
<li id="fn:7">Ruslan Salakhutdinov and Geoffrey Hinton (2010). Replicated softmax: an undirected topic model Archived 2012-05-25 at the Wayback Machine. Neural Information Processing Systems 23. <a href="http://books.nips.cc/papers/files/nips22/NIPS2009_0817.pdf" target="_blank">http://books.nips.cc/papers/files/nips22/NIPS2009_0817.pdf</a> <a href="#fnref:7" class="footnote-back-ref">↩</a></li>
<li id="fn:8">Bravi, Barbara; Di Gioacchino, Andrea; Fernandez-de-Cossio-Diaz, Jorge; Walczak, Aleksandra M; Mora, Thierry; Cocco, Simona; Monasson, Rémi (2023-09-08). Bitbol, Anne-Florence; Eisen, Michael B (eds.). "A transfer-learning approach to predict antigen immunogenicity and T-cell receptor specificity". eLife. 12: e85126. doi:10.7554/eLife.85126. ISSN 2050-084X. PMC 10522340. PMID 37681658. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10522340" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10522340</a> <a href="#fnref:8" class="footnote-back-ref">↩</a></li>
<li id="fn:9">Carleo, Giuseppe; Troyer, Matthias (2017-02-10). "Solving the quantum many-body problem with artificial neural networks". Science. 355 (6325): 602–606. arXiv:1606.02318. Bibcode:2017Sci...355..602C. doi:10.1126/science.aag2302. ISSN 0036-8075. PMID 28183973. S2CID 206651104. <a href="#fnref:9" class="footnote-back-ref">↩</a></li>
<li id="fn:10">Melko, Roger G.; Carleo, Giuseppe; Carrasquilla, Juan; Cirac, J. Ignacio (September 2019). "Restricted Boltzmann machines in quantum physics". Nature Physics. 15 (9): 887–892. Bibcode:2019NatPh..15..887M. doi:10.1038/s41567-019-0545-1. ISSN 1745-2481. S2CID 256704838. <a href="#fnref:10" class="footnote-back-ref">↩</a></li>
<li id="fn:11">Pan, Ruizhi; Clark, Charles W. (2024). "Efficiency of neural-network state representations of one-dimensional quantum spin systems". Physical Review Research. 6 (2): 023193. arXiv:2302.00173. Bibcode:2024PhRvR...6b3193P. doi:10.1103/PhysRevResearch.6.023193. <a href="#fnref:11" class="footnote-back-ref">↩</a></li>
<li id="fn:12">Miguel Á. Carreira-Perpiñán and Geoffrey Hinton (2005). On contrastive divergence learning. Artificial Intelligence and Statistics. <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.221.8829&rep=rep1&type=pdf#page=42" target="_blank">http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.221.8829&rep=rep1&type=pdf#page=42</a> <a href="#fnref:12" class="footnote-back-ref">↩</a></li>
<li id="fn:13">Hinton, G. (2009). "Deep belief networks". Scholarpedia. 4 (5): 5947. Bibcode:2009SchpJ...4.5947H. doi:10.4249/scholarpedia.5947. <a href="https://doi.org/10.4249%2Fscholarpedia.5947" target="_blank">https://doi.org/10.4249%2Fscholarpedia.5947</a> <a href="#fnref:13" class="footnote-back-ref">↩</a></li>
<li id="fn:14">Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines. UTML TR 2010–003, University of Toronto. <a href="http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf" target="_blank">http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf</a> <a href="#fnref:14" class="footnote-back-ref">↩</a></li>
<li id="fn:15">Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines. UTML TR 2010–003, University of Toronto. <a href="http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf" target="_blank">http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf</a> <a href="#fnref:15" class="footnote-back-ref">↩</a></li>
<li id="fn:16">Miguel Á. Carreira-Perpiñán and Geoffrey Hinton (2005). On contrastive divergence learning. Artificial Intelligence and Statistics. <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.221.8829&rep=rep1&type=pdf#page=42" target="_blank">http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.221.8829&rep=rep1&type=pdf#page=42</a> <a href="#fnref:16" class="footnote-back-ref">↩</a></li>
<li id="fn:17">Ruslan Salakhutdinov and Geoffrey Hinton (2010). Replicated softmax: an undirected topic model Archived 2012-05-25 at the Wayback Machine. Neural Information Processing Systems 23. <a href="http://books.nips.cc/papers/files/nips22/NIPS2009_0817.pdf" target="_blank">http://books.nips.cc/papers/files/nips22/NIPS2009_0817.pdf</a> <a href="#fnref:17" class="footnote-back-ref">↩</a></li>
<li id="fn:18">Salakhutdinov, R.; Mnih, A.; Hinton, G. (2007). Restricted Boltzmann machines for collaborative filtering. Proceedings of the 24th international conference on Machine learning - ICML '07. p. 791. doi:10.1145/1273496.1273596. ISBN 978-1-59593-793-3. <a href="#fnref:18" class="footnote-back-ref">↩</a></li>
<li id="fn:19">Sutskever, Ilya; Tieleman, Tijmen (2010). "On the convergence properties of contrastive divergence" (PDF). Proc. 13th Int'l Conf. On AI and Statistics (AISTATS). Archived from the original (PDF) on 2015-06-10. <a href="https://web.archive.org/web/20150610230811/http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_SutskeverT10.pdf" target="_blank">https://web.archive.org/web/20150610230811/http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_SutskeverT10.pdf</a> <a href="#fnref:19" class="footnote-back-ref">↩</a></li>
<li id="fn:20">Asja Fischer and Christian Igel. Training Restricted Boltzmann Machines: An Introduction Archived 2015-06-10 at the Wayback Machine. Pattern Recognition 47, pp. 25-39, 2014 <a href="http://image.diku.dk/igel/paper/TRBMAI.pdf" target="_blank">http://image.diku.dk/igel/paper/TRBMAI.pdf</a> <a href="#fnref:20" class="footnote-back-ref">↩</a></li>
<li id="fn:21">María Angélica Cueto; Jason Morton; Bernd Sturmfels (2010). "Geometry of the restricted Boltzmann machine". Algebraic Methods in Statistics and Probability. 516. American Mathematical Society. arXiv:0908.4425. Bibcode:2009arXiv0908.4425A. <a href="#fnref:21" class="footnote-back-ref">↩</a></li>
<li id="fn:22">Sutskever, Ilya; Tieleman, Tijmen (2010). "On the convergence properties of contrastive divergence" (PDF). Proc. 13th Int'l Conf. On AI and Statistics (AISTATS). Archived from the original (PDF) on 2015-06-10. <a href="https://web.archive.org/web/20150610230811/http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_SutskeverT10.pdf" target="_blank">https://web.archive.org/web/20150610230811/http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_SutskeverT10.pdf</a> <a href="#fnref:22" class="footnote-back-ref">↩</a></li>
<li id="fn:23">Asja Fischer and Christian Igel. Training Restricted Boltzmann Machines: An Introduction Archived 2015-06-10 at the Wayback Machine. Pattern Recognition 47, pp. 25-39, 2014 <a href="http://image.diku.dk/igel/paper/TRBMAI.pdf" target="_blank">http://image.diku.dk/igel/paper/TRBMAI.pdf</a> <a href="#fnref:23" class="footnote-back-ref">↩</a></li>
<li id="fn:24">Geoffrey Hinton (1999). Products of Experts. ICANN 1999. <a href="http://www.gatsby.ucl.ac.uk/publications/papers/06-1999.pdf" target="_blank">http://www.gatsby.ucl.ac.uk/publications/papers/06-1999.pdf</a> <a href="#fnref:24" class="footnote-back-ref">↩</a></li>
<li id="fn:25">Hinton, G. E. (2002). "Training Products of Experts by Minimizing Contrastive Divergence" (PDF). Neural Computation. 14 (8): 1771–1800. doi:10.1162/089976602760128018. PMID 12180402. S2CID 207596505. <a href="http://www.cs.toronto.edu/~hinton/absps/tr00-004.pdf" target="_blank">http://www.cs.toronto.edu/~hinton/absps/tr00-004.pdf</a> <a href="#fnref:25" class="footnote-back-ref">↩</a></li>
<li id="fn:26">Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines. UTML TR 2010–003, University of Toronto. <a href="http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf" target="_blank">http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf</a> <a href="#fnref:26" class="footnote-back-ref">↩</a></li>
<li id="fn:27">Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines. UTML TR 2010–003, University of Toronto. <a href="http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf" target="_blank">http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf</a> <a href="#fnref:27" class="footnote-back-ref">↩</a></li>
</ol>
