EfficientNet

<h2 id="compound-scaling">Compound scaling</h2>
<p>EfficientNet introduces compound scaling. Instead of scaling one dimension of the network at a time, such as depth (number of layers), width (number of channels), or resolution (input image size), it uses a single compound coefficient \(\phi\) to scale all three dimensions simultaneously. Specifically, given a baseline network, the depth, width, and resolution are scaled according to the following equations:<a class="footnote-ref" id="fnref:2" href="#fn:2"><sup>2</sup></a>
\[\begin{aligned}\text{depth multiplier: } d &= \alpha^{\phi}\\ \text{width multiplier: } w &= \beta^{\phi}\\ \text{resolution multiplier: } r &= \gamma^{\phi}\end{aligned}\]
subject to \(\alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2\) and \(\alpha \geq 1, \beta \geq 1, \gamma \geq 1\). FLOPs grow roughly linearly with depth and quadratically with width and resolution, so the constraint \(\alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2\) is chosen such that increasing \(\phi\) by \(\phi_{0}\) multiplies the total FLOPs of running the network on an image by approximately \(2^{\phi_{0}}\). The <a href="/facts/Hyperparameter_(machine_learning)/jCxm3fTA">hyperparameters</a> \(\alpha\), \(\beta\), and \(\gamma\) are determined by a small <a href="/facts/Grid_search/bzcJJrxs">grid search</a>. The original paper suggested \(\alpha = 1.2\), \(\beta = 1.1\), and \(\gamma = 1.15\).
</p><p>Architecturally, the authors optimized the choice of modules by <a href="/facts/Neural_architecture_search/txtVQK92">neural architecture search</a> (NAS) and found that the inverted bottleneck convolution used in <a href="/facts/MobileNet/gaLNKeYR">MobileNet</a>, which they called MBConv, worked well.
</p><p>The EfficientNet family is a stack of MBConv layers, with shapes determined by the compound scaling. The original publication introduced eight models, EfficientNet-B0 through EfficientNet-B7, of increasing size and accuracy. EfficientNet-B0 is the baseline network, and the subsequent models are obtained by scaling it up with increasing \(\phi\).
</p>
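<p>The scaling rule above can be illustrated with a minimal Python sketch. It assumes the coefficients reported in the original paper and the approximation that FLOPs are proportional to \(d \cdot w^{2} \cdot r^{2}\). The function names <code>compound_scale</code> and <code>flops_multiplier</code> are hypothetical, chosen for this example only, and the sketch does not attempt to reproduce the exact layer counts, channel widths, or input sizes of the published B1 to B7 models.</p>

```python
# Coefficients suggested in the original paper (found by a small grid search).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi


def flops_multiplier(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Approximate FLOPs growth, assuming FLOPs are proportional to d * w^2 * r^2."""
    d, w, r = compound_scale(phi, alpha, beta, gamma)
    return d * w ** 2 * r ** 2


if __name__ == "__main__":
    for phi in range(4):
        d, w, r = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.3f}, width x{w:.3f}, "
              f"resolution x{r:.3f}, FLOPs x{flops_multiplier(phi):.2f}")
```

<p>With these coefficients the product \(\alpha \cdot \beta^{2} \cdot \gamma^{2}\) is about 1.92, so each unit increase in \(\phi\) slightly less than doubles the estimated FLOPs.</p>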
<h2 id="variants">Variants</h2>
<p>EfficientNet has been adapted for fast inference on <a href="/facts/Edge_computing/4lqr8N6V">edge</a> <a href="/facts/Tensor_Processing_Unit/aVlg0LFF">TPUs</a><a class="footnote-ref" id="fnref:3" href="#fn:3"><sup>3</sup></a> and centralized TPU or <a href="/facts/Graphics_processing_unit/PTK1RQVp">GPU</a> <a href="/facts/Computer_cluster/mT7S1BcX">clusters</a> by NAS.<a class="footnote-ref" id="fnref:4" href="#fn:4"><sup>4</sup></a>
</p><p>EfficientNet V2 was published in June 2021. The architecture was improved by further NAS over a search space with more types of convolutional layers.<a class="footnote-ref" id="fnref:5" href="#fn:5"><sup>5</sup></a> It also introduced a training method that progressively increases image size during training and uses regularization techniques such as <a href="/facts/Dropout_training/HvqnkJpt">dropout</a>, <a href="/facts/Data_augmentation/vf8p9qHe">RandAugment</a>,<a class="footnote-ref" id="fnref:6" href="#fn:6"><sup>6</sup></a> and Mixup.<a class="footnote-ref" id="fnref:7" href="#fn:7"><sup>7</sup></a> The authors claim this approach mitigates the accuracy drops often associated with progressive resizing.
</p>
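<p>A progressive training schedule of this kind might be sketched as follows. This is only an illustration: it assumes that image size and regularization strength are raised together by linear interpolation across a fixed number of training stages. The function name <code>progressive_schedule</code>, the stage count, and all numeric ranges are hypothetical placeholders rather than settings taken from the paper.</p>

```python
def progressive_schedule(num_stages, size_range, reg_ranges):
    """Build one training configuration per stage.

    Image size and every regularization strength are linearly interpolated
    from their starting value (first stage) to their final value (last stage).
    """
    size_lo, size_hi = size_range
    schedule = []
    for stage in range(num_stages):
        # Fraction of the way through the schedule: 0.0 at first stage, 1.0 at last.
        t = stage / max(num_stages - 1, 1)
        config = {"stage": stage,
                  "image_size": round(size_lo + t * (size_hi - size_lo))}
        for name, (lo, hi) in reg_ranges.items():
            config[name] = lo + t * (hi - lo)
        schedule.append(config)
    return schedule


# Illustrative placeholder ranges, not the settings from the paper.
example = progressive_schedule(
    num_stages=4,
    size_range=(128, 300),
    reg_ranges={
        "dropout_rate": (0.1, 0.3),
        "randaugment_magnitude": (5, 15),
        "mixup_alpha": (0.0, 0.2),
    },
)
for config in example:
    print(config)
```

<p>Each entry could then be used to configure the data pipeline and model for the corresponding portion of training, with early stages using small images and weak regularization and later stages using larger images and stronger regularization.</p>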
<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/Convolutional_neural_network/kHEdnmGU">Convolutional neural network</a></li>
<li><a href="/facts/SqueezeNet/rTV0pYCP">SqueezeNet</a></li>
<li><a href="/facts/MobileNet/gaLNKeYR">MobileNet</a></li>
<li><a href="/facts/You_Only_Look_Once/Kqm7lJ3k">You Only Look Once</a></li></ul>

<h2 id="external-links">External links</h2>
<ul><li><a href="https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html">EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling (Google AI Blog)</a></li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1"><p>Tan, Mingxing; Le, Quoc V. (2020-09-11), EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, arXiv:1905.11946 <a href="#fnref:1" class="footnote-back-ref">↩</a></p></li>
<li id="fn:2"><p>Tan, Mingxing; Le, Quoc V. (2020-09-11), EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, arXiv:1905.11946 <a href="#fnref:2" class="footnote-back-ref">↩</a></p></li>
<li id="fn:3"><p>"EfficientNet-EdgeTPU: Creating Accelerator-Optimized Neural Networks with AutoML". research.google. August 6, 2019. Retrieved 2024-10-18. <a href="https://research.google/blog/efficientnet-edgetpu-creating-accelerator-optimized-neural-networks-with-automl/" target="_blank">https://research.google/blog/efficientnet-edgetpu-creating-accelerator-optimized-neural-networks-with-automl/</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></p></li>
<li id="fn:4"><p>Li, Sheng; Tan, Mingxing; Pang, Ruoming; Li, Andrew; Cheng, Liqun; Le, Quoc; Jouppi, Norman P. (2021-02-10), Searching for Fast Model Families on Datacenter Accelerators, arXiv:2102.05610 <a href="#fnref:4" class="footnote-back-ref">↩</a></p></li>
<li id="fn:5"><p>Tan, Mingxing; Le, Quoc V. (2021-06-23), EfficientNetV2: Smaller Models and Faster Training, arXiv:2104.00298 <a href="#fnref:5" class="footnote-back-ref">↩</a></p></li>
<li id="fn:6"><p>Cubuk, Ekin D.; Zoph, Barret; Shlens, Jonathon; Le, Quoc V. (2020). "Randaugment: Practical Automated Data Augmentation With a Reduced Search Space": 702–703. arXiv:1909.13719. <a href="https://openaccess.thecvf.com/content_CVPRW_2020/html/w40/Cubuk_Randaugment_Practical_Automated_Data_Augmentation_With_a_Reduced_Search_Space_CVPRW_2020_paper.html" target="_blank">https://openaccess.thecvf.com/content_CVPRW_2020/html/w40/Cubuk_Randaugment_Practical_Automated_Data_Augmentation_With_a_Reduced_Search_Space_CVPRW_2020_paper.html</a> <a href="#fnref:6" class="footnote-back-ref">↩</a></p></li>
<li id="fn:7"><p>Zhang, Hongyi; Cisse, Moustapha; Dauphin, Yann N.; Lopez-Paz, David (2018-04-27), mixup: Beyond Empirical Risk Minimization, arXiv:1710.09412 <a href="#fnref:7" class="footnote-back-ref">↩</a></p></li>
</ol>
