Inception score

<h2 id="definition">Definition</h2>
Let there be two spaces, the space of images \(\Omega_X\) and the space of labels \(\Omega_Y\). The space of labels is finite.
Let \(p_{gen}\) be a probability distribution over \(\Omega_X\) that we wish to judge.
Let a discriminator be a function of type
\[p_{dis}:\Omega_X\to M(\Omega_Y)\]
where \(M(\Omega_Y)\) is the set of all probability distributions on \(\Omega_Y\). For any image \(x\) and any label \(y\), let \(p_{dis}(y|x)\) be the probability that image \(x\) has label \(y\), according to the discriminator. It is usually implemented as an Inception-v3 network trained on ImageNet.
The Inception Score of \(p_{gen}\) relative to \(p_{dis}\) is
\[IS(p_{gen},p_{dis}):=\exp\left(\mathbb{E}_{x\sim p_{gen}}\left[D_{KL}\left(p_{dis}(\cdot|x)\,\Big\|\,\int p_{dis}(\cdot|x)p_{gen}(x)\,dx\right)\right]\right)\]
Equivalent rewrites include
\[\ln IS(p_{gen},p_{dis}):=\mathbb{E}_{x\sim p_{gen}}\left[D_{KL}\left(p_{dis}(\cdot|x)\,\Big\|\,\mathbb{E}_{x\sim p_{gen}}[p_{dis}(\cdot|x)]\right)\right]\]
\[\ln IS(p_{gen},p_{dis}):=H\left[\mathbb{E}_{x\sim p_{gen}}[p_{dis}(\cdot|x)]\right]-\mathbb{E}_{x\sim p_{gen}}\left[H[p_{dis}(\cdot|x)]\right]\]

\(\ln IS\) is nonnegative by <a href="/facts/Jensen%2527s_inequality/sosRQWN1">Jensen's inequality</a>.
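
The agreement between the KL-divergence form and the entropy form can be checked numerically. The following is a minimal sketch, not a reference implementation: it assumes the discriminator outputs are already available as a NumPy array with one row per image, and it substitutes random Dirichlet samples for real predictions. The function names and the array probs are illustrative only.

```python
import numpy as np


def log_is_kl_form(probs):
    """ln IS as the mean KL divergence between each row and the marginal."""
    marginal = probs.mean(axis=0)
    kl_per_image = np.sum(probs * (np.log(probs) - np.log(marginal)), axis=1)
    return kl_per_image.mean()


def entropy(p):
    """Shannon entropy in nats along the last axis (assumes strictly positive p)."""
    return -np.sum(p * np.log(p), axis=-1)


def log_is_entropy_form(probs):
    """ln IS as H[marginal] minus the mean entropy of the rows."""
    return entropy(probs.mean(axis=0)) - entropy(probs).mean()


# Hypothetical stand-in for discriminator outputs: 500 images, 10 labels.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.full(10, 2.0), size=500)

kl_value = log_is_kl_form(probs)
entropy_value = log_is_entropy_form(probs)
print(kl_value, entropy_value)  # the two values should agree up to rounding
```

Both functions use the same empirical marginal, so any difference between the two printed values is floating-point rounding only.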

Pseudocode:<blockquote>INPUT discriminator \(p_{dis}\).
INPUT generator \(g\).
Sample images \(x_i\) from generator.
Compute \(p_{dis}(\cdot|x_i)\), the probability distribution over labels conditional on image \(x_i\).
Average the results to obtain \(\hat{p}\), an empirical estimate of \(\int p_{dis}(\cdot|x)p_{gen}(x)\,dx\).
Sample more images \(x_i\) from generator, and for each, compute \(D_{KL}\left(p_{dis}(\cdot|x_i)\,\|\,\hat{p}\right)\).
Average the results, and take the exponential of the average.
RETURN the result.</blockquote>
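
The pseudocode can be sketched in Python roughly as follows. This is an illustration under stated assumptions rather than a standard implementation: the function takes two arrays of label distributions (one row per sampled image) instead of a generator and a network, the eps guard against log(0) is an added detail, and fake_discriminator is a hypothetical placeholder that returns a softmax over random logits in place of a real Inception-v3 forward pass.

```python
import numpy as np


def inception_score(probs_for_marginal, probs_for_kl, eps=1e-12):
    """Estimate IS from two batches of label distributions (rows sum to 1)."""
    # Empirical estimate p_hat of the marginal label distribution.
    p_hat = probs_for_marginal.mean(axis=0)
    # KL(p_dis(.|x_i) || p_hat) for each image; eps guards against log(0).
    log_ratio = np.log(probs_for_kl + eps) - np.log(p_hat + eps)
    kl_per_image = np.sum(probs_for_kl * log_ratio, axis=1)
    # Average the divergences, then exponentiate.
    return float(np.exp(kl_per_image.mean()))


rng = np.random.default_rng(0)


def fake_discriminator(n_images, n_labels=10):
    """Hypothetical placeholder: softmax over random logits, one row per image."""
    logits = rng.normal(size=(n_images, n_labels))
    shifted = np.exp(logits - logits.max(axis=1, keepdims=True))
    return shifted / shifted.sum(axis=1, keepdims=True)


# First batch estimates the marginal, second batch is scored against it.
score = inception_score(fake_discriminator(1000), fake_discriminator(1000))
print(f"Inception score on placeholder predictions: {score:.3f}")
```

Passing the same array for both arguments reuses one batch for the marginal and for the divergences, which corresponds to the plug-in estimate of the definition.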
<h3>Interpretation</h3>
A higher inception score is interpreted as "better", as it means that \(p_{gen}\) is a "sharp and distinct" collection of pictures.

\(\ln IS(p_{gen},p_{dis})\in[0,\ln N]\), where \(N\) is the total number of possible labels.
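
One way to see both bounds is through the entropy form of \(\ln IS\). The following short derivation is a sketch that writes \(\bar{p}\) for the marginal label distribution (a symbol introduced here for brevity) and relies only on standard facts about Shannon entropy on a set of \(N\) labels: it is concave, nonnegative, and at most \(\ln N\).

\[\bar{p}:=\mathbb{E}_{x\sim p_{gen}}[p_{dis}(\cdot|x)],\qquad \ln IS(p_{gen},p_{dis})=H[\bar{p}]-\mathbb{E}_{x\sim p_{gen}}\left[H[p_{dis}(\cdot|x)]\right]\]
\[\text{Lower bound: } H[\bar{p}]\geq\mathbb{E}_{x\sim p_{gen}}\left[H[p_{dis}(\cdot|x)]\right]\ \text{(Jensen, } H \text{ concave)}\ \Rightarrow\ \ln IS\geq 0\]
\[\text{Upper bound: } \mathbb{E}_{x\sim p_{gen}}\left[H[p_{dis}(\cdot|x)]\right]\geq 0\ \text{and}\ H[\bar{p}]\leq\ln N\ \Rightarrow\ \ln IS\leq H[\bar{p}]\leq\ln N\]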

 
 
 
\(\ln IS(p_{gen},p_{dis})=0\) iff for almost all \(x\sim p_{gen}\)
\[p_{dis}(\cdot|x)=\int p_{dis}(\cdot|x)p_{gen}(x)\,dx\]
That means \(p_{gen}\) is completely "indistinct". That is, for any image \(x\) sampled from \(p_{gen}\), the discriminator returns exactly the same label predictions \(p_{dis}(\cdot|x)\).
The highest inception score \(N\) is achieved if and only if the two conditions are both true:

<ul><li>For almost all \(x\sim p_{gen}\), the distribution \(p_{dis}(y|x)\) is concentrated on one label. That is, \(H_y[p_{dis}(y|x)]=0\). That is, every image sampled from \(p_{gen}\) is exactly classified by the discriminator.</li>
<li>For every label \(y\), the proportion of generated images labelled as \(y\) is exactly \(\mathbb{E}_{x\sim p_{gen}}[p_{dis}(y|x)]=\frac{1}{N}\). That is, the generated images are equally distributed over all labels.</li></ul>
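
The two extremes can be illustrated on small synthetic prediction matrices. The sketch below assumes \(N=5\) labels and 100 images. The three arrays are made-up examples, not outputs of a real discriminator, and log_inception_score is an illustrative helper built on the entropy form of \(\ln IS\).

```python
import numpy as np


def log_inception_score(probs):
    """ln IS = H[mean of rows] - mean of H[row], with 0 * log 0 taken as 0."""
    def entropy(p):
        safe = np.where(p > 0, p, 1.0)  # log(1) = 0 drops zero-probability terms
        return -np.sum(p * np.log(safe), axis=-1)
    return entropy(probs.mean(axis=0)) - entropy(probs).mean()


N = 5
cases = {
    # Every image receives the same prediction: completely indistinct.
    "indistinct": np.tile(np.array([0.5, 0.2, 0.1, 0.1, 0.1]), (100, 1)),
    # Each image is classified with certainty and all labels occur equally often.
    "sharp_and_balanced": np.eye(N)[np.arange(100) % N],
    # Certain predictions, but every image gets label 0 (second condition fails).
    "sharp_but_collapsed": np.eye(N)[np.zeros(100, dtype=int)],
}

scores = {name: float(np.exp(log_inception_score(p))) for name, p in cases.items()}
for name, value in scores.items():
    print(f"{name}: IS = {value:.3f}")
```

Up to rounding, the indistinct and collapsed cases should give a score of 1 and the sharp, balanced case a score of \(N=5\), which shows that sharpness alone is not enough to reach the maximum.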

<h2 id="references">References</h2>

<ol>
<li id="fn:1">Salimans, Tim; Goodfellow, Ian; Zaremba, Wojciech; Cheung, Vicki; Radford, Alec; Chen, Xi (2016). "Improved Techniques for Training GANs". Advances in Neural Information Processing Systems. 29. Curran Associates, Inc. arXiv:1606.03498. <a href="https://proceedings.neurips.cc/paper/2016/hash/8a3363abe792db2d8761d6403605aeb7-Abstract.html" target="_blank">https://proceedings.neurips.cc/paper/2016/hash/8a3363abe792db2d8761d6403605aeb7-Abstract.html</a> <a href="#fnref:1" class="footnote-back-ref">↩</a></li>
<li id="fn:2">Frolov, Stanislav; Hinz, Tobias; Raue, Federico; Hees, Jörn; Dengel, Andreas (December 2021). "Adversarial text-to-image synthesis: A review". Neural Networks. 144: 187–209. arXiv:2101.09983. doi:10.1016/j.neunet.2021.07.019. PMID 34500257. S2CID 231698782. <a href="https://doi.org/10.1016%2Fj.neunet.2021.07.019" target="_blank">https://doi.org/10.1016%2Fj.neunet.2021.07.019</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></li>
<li id="fn:3">Borji, Ali (2022). "Pros and cons of GAN evaluation measures: New developments". Computer Vision and Image Understanding. 215: 103329. arXiv:2103.09396. doi:10.1016/j.cviu.2021.103329. S2CID 232257836. <a href="https://linkinghub.elsevier.com/retrieve/pii/S1077314221001685" target="_blank">https://linkinghub.elsevier.com/retrieve/pii/S1077314221001685</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></li>
</ol>
