List of datasets in computer vision and image processing

<h2 id="object-detection-and-recognition">Object detection and recognition</h2>
<table><tbody><tr><th scope="col">Dataset Name</th><th scope="col">Brief description</th><th scope="col">Preprocessing</th><th scope="col">Instances</th><th scope="col">Format</th><th scope="col">Default Task</th><th scope="col">Created (updated)</th><th scope="col">Reference</th><th scope="col">Creator</th></tr><tr><td><a href="/facts/MNIST_database/nzUEydbL">MNIST</a></td><td>Database of grayscale handwritten digits.</td><td></td><td>60,000</td><td>image, label</td><td>classification</td><td>1994</td><td><a class="footnote-ref" id="fnref:1" href="#fn:1">1</a></td><td>LeCun et al.</td></tr><tr><td><a href="/facts/MNIST_database/nzUEydbL">Extended MNIST</a></td><td>Database of grayscale handwritten digits and letters.</td><td></td><td>810,000</td><td>image, label</td><td>classification</td><td>2010</td><td><a class="footnote-ref" id="fnref:2" href="#fn:2">2</a></td><td>NIST</td></tr><tr><td>NYU Object Recognition Benchmark (NORB)</td><td>Stereoscopic pairs of photos of toys in various orientations.</td><td>Centering, perturbation.</td><td>97,200 image pairs (50 uniform-colored toys under 36 angles, 9 azimuths, and 6 lighting conditions)</td><td>Images</td><td>Object recognition</td><td>2004</td><td><a class="footnote-ref" id="fnref:3" href="#fn:3">3</a><a class="footnote-ref" id="fnref:4" href="#fn:4">4</a></td><td>LeCun et al.</td></tr><tr><td><a href="/facts/80_Million_Tiny_Images/PXJGcDyx">80 Million Tiny Images</a></td><td>80 million 32×32 images labelled with 75,062 non-abstract nouns.</td><td></td><td>80,000,000</td><td>image, label</td><td></td><td>2008</td><td><a class="footnote-ref" id="fnref:5" href="#fn:5">5</a></td><td>Torralba et al.</td></tr><tr><td>Street View House Numbers (SVHN)</td><td>630,420 digits with bounding boxes in house numbers captured in <a href="/facts/Google_Street_View/Ruqzal5A">Google Street View</a>.</td><td></td><td>630,420</td><td>image, label, bounding boxes</td><td></td><td>2011</td><td><a class="footnote-ref" id="fnref:6" 
href="#fn:6">6</a><a class="footnote-ref" id="fnref:7" href="#fn:7">7</a></td><td>Netzer et al.</td></tr><tr><td>JFT-300M</td><td>Dataset internal to Google Research. 303M images with 375M labels in 18,291 categories.</td><td></td><td>303,000,000</td><td>image, label</td><td></td><td>2017</td><td><a class="footnote-ref" id="fnref:8" href="#fn:8">8</a><a class="footnote-ref" id="fnref:9" href="#fn:9">9</a><a class="footnote-ref" id="fnref:10" href="#fn:10">10</a></td><td>Google Research</td></tr><tr><td>JFT-3B</td><td>Internal to Google Research. 3 billion images, annotated with ~30k categories in a hierarchy.</td><td></td><td>3,000,000,000</td><td>image, label</td><td></td><td>2021</td><td><a class="footnote-ref" id="fnref:11" href="#fn:11">11</a></td><td>Google Research</td></tr><tr><td><a href="http://places2.csail.mit.edu/">Places</a></td><td>10+ million images in 400+ scene classes, with 5,000 to 30,000 images per class.</td><td></td><td>10,000,000</td><td>image, label</td><td></td><td>2018</td><td><a class="footnote-ref" id="fnref:12" href="#fn:12">12</a></td><td>Zhou et al.</td></tr><tr><td>Ego 4D</td><td>A massive-scale, <a href="/facts/Egocentric_vision/gK86w1KA">egocentric</a> dataset and benchmark suite collected at 74 locations worldwide in 9 countries, with over 3,670 hours of daily-life activity video.</td><td>Object bounding boxes, transcriptions, labeling.</td><td>3,670 video hours</td><td>video, audio, transcriptions</td><td>Multimodal first-person task</td><td>2022</td><td><a class="footnote-ref" id="fnref:13" href="#fn:13">13</a></td><td>K. 
Grauman et al.</td></tr><tr><td>Wikipedia-based Image Text Dataset</td><td>37.5 million image-text examples with 11.5 million unique images across 108 Wikipedia languages.</td><td></td><td>11,500,000</td><td>image, caption</td><td>Pretraining, image captioning</td><td>2021</td><td><a class="footnote-ref" id="fnref:14" href="#fn:14">14</a></td><td>Srinivasan et al., Google Research</td></tr><tr><td>Visual Genome</td><td>Images and their descriptions.</td><td></td><td>108,000</td><td>images, text</td><td>Image captioning</td><td>2016</td><td><a class="footnote-ref" id="fnref:15" href="#fn:15">15</a></td><td>R. Krishna et al.</td></tr><tr><td>Berkeley 3-D Object Dataset</td><td>849 images taken in 75 different scenes. About 50 different object classes are labeled.</td><td>Object bounding boxes and labeling.</td><td>849</td><td>labeled images, text</td><td>Object recognition</td><td>2014</td><td><a class="footnote-ref" id="fnref:16" href="#fn:16">16</a><a class="footnote-ref" id="fnref:17" href="#fn:17">17</a></td><td>A. Janoch et al.</td></tr><tr><td>Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500)</td><td>500 natural images, explicitly separated into disjoint train, validation, and test subsets, plus benchmarking code. 
Based on BSDS300.</td><td>Each image was segmented by five different subjects on average.</td><td>500</td><td>Segmented images</td><td>Contour detection and hierarchical image segmentation</td><td>2011</td><td><a class="footnote-ref" id="fnref:18" href="#fn:18">18</a></td><td><a href="/facts/University_of_California%2c_Berkeley/MXVEjklr">University of California, Berkeley</a></td></tr><tr><td>Microsoft Common Objects in Context (COCO)</td><td>Complex everyday scenes of common objects in their natural context.</td><td>Object highlighting, labeling, and classification into 91 object types.</td><td>2,500,000</td><td>Labeled images, text</td><td>Object recognition</td><td>2015</td><td><a class="footnote-ref" id="fnref:19" href="#fn:19">19</a><a class="footnote-ref" id="fnref:20" href="#fn:20">20</a><a class="footnote-ref" id="fnref:21" href="#fn:21">21</a></td><td>T. Lin et al.</td></tr><tr><td><a href="/facts/ImageNet/DPaK4GLC">ImageNet</a></td><td>Labeled object image database, used in the <a href="/facts/ImageNet_Large_Scale_Visual_Recognition_Challenge/DPaK4GLC">ImageNet Large Scale Visual Recognition Challenge</a>.</td><td>Labeled objects, bounding boxes, descriptive words, SIFT features</td><td>14,197,122</td><td>Images, text</td><td>Object recognition, scene recognition</td><td>2009 (2014)</td><td><a class="footnote-ref" id="fnref:22" href="#fn:22">22</a><a class="footnote-ref" id="fnref:23" href="#fn:23">23</a><a class="footnote-ref" id="fnref:24" href="#fn:24">24</a></td><td>J. Deng et al.</td></tr><tr><td>SUN (Scene UNderstanding)</td><td>Very large scene and object recognition database.</td><td>Places and objects are labeled. Objects are segmented.</td><td>131,067</td><td>Images, text</td><td>Object recognition, scene recognition</td><td>2014</td><td><a class="footnote-ref" id="fnref:25" href="#fn:25">25</a><a class="footnote-ref" id="fnref:26" href="#fn:26">26</a></td><td>J. 
Xiao et al.</td></tr><tr><td>LSUN (Large SUN)</td><td>10 scene categories (bedroom, etc.) and 20 object categories (airplane, etc.).</td><td>Images and labels.</td><td>~60 million</td><td>Images, text</td><td>Object recognition, scene recognition</td><td>2015</td><td><a class="footnote-ref" id="fnref:27" href="#fn:27">27</a><a class="footnote-ref" id="fnref:28" href="#fn:28">28</a><a class="footnote-ref" id="fnref:29" href="#fn:29">29</a></td><td>Yu et al.</td></tr><tr><td>LVIS (Large Vocabulary Instance Segmentation)</td><td>Segmentation masks for over 1,000 entry-level object categories in images.</td><td></td><td>2.2 million segmentations, 164K images</td><td>Images, segmentation masks</td><td>Instance segmentation</td><td>2019</td><td><a class="footnote-ref" id="fnref:30" href="#fn:30">30</a></td><td></td></tr><tr><td>Open Images</td><td>A large set of images listed as having a CC BY 2.0 license, with image-level labels and bounding boxes spanning thousands of classes.</td><td>Image-level labels, bounding boxes</td><td>9,178,275</td><td>Images, text</td><td>Classification, object recognition</td><td>2017 (V7: 2022)</td><td><a class="footnote-ref" id="fnref:31" href="#fn:31">31</a></td><td></td></tr><tr><td>TV News Channel Commercial Detection Dataset</td><td>TV commercials and news broadcasts.</td><td>Audio and video features extracted from still images.</td><td>129,685</td><td>Text</td><td>Clustering, classification</td><td>2015</td><td><a class="footnote-ref" id="fnref:32" href="#fn:32">32</a><a class="footnote-ref" id="fnref:33" href="#fn:33">33</a></td><td>P. 
Guha et al.</td></tr><tr><td>Statlog (Image Segmentation) Dataset</td><td>The instances were drawn randomly from a database of 7 outdoor images, which were hand-segmented to create a classification for every pixel.</td><td>Many features calculated.</td><td>2,310</td><td>Text</td><td>Classification</td><td>1990</td><td><a class="footnote-ref" id="fnref:34" href="#fn:34">34</a></td><td><a href="/facts/University_of_Massachusetts/82Dwv2JU">University of Massachusetts</a></td></tr><tr><td><a href="/facts/Caltech_101/o17fLq9s">Caltech 101</a></td><td>Pictures of objects.</td><td>Detailed object outlines marked.</td><td>9,146</td><td>Images</td><td>Classification, object recognition</td><td>2003</td><td><a class="footnote-ref" id="fnref:35" href="#fn:35">35</a><a class="footnote-ref" id="fnref:36" href="#fn:36">36</a></td><td>F. Li et al.</td></tr><tr><td>Caltech-256</td><td>Large dataset of images for object classification.</td><td>Images categorized and hand-sorted.</td><td>30,607</td><td>Images, Text</td><td>Classification, object detection</td><td>2007</td><td><a class="footnote-ref" id="fnref:37" href="#fn:37">37</a><a class="footnote-ref" id="fnref:38" href="#fn:38">38</a></td><td>G. Griffin et al.</td></tr><tr><td>COYO-700M</td><td>Image–text pair dataset</td><td>Collected from 10 billion pairs of alt-text and image sources in HTML documents in Common Crawl</td><td>746,972,269</td><td>Images, Text</td><td>Classification, Image-Language</td><td>2022</td><td><a class="footnote-ref" id="fnref:39" href="#fn:39">39</a></td><td></td></tr><tr><td>SIFT10M Dataset</td><td><a href="/facts/Scale-invariant_feature_transform/oX14fcpr">SIFT</a> features of the Caltech-256 dataset.</td><td>Extensive SIFT feature extraction.</td><td>11,164,866</td><td>Text</td><td>Classification, object detection</td><td>2016</td><td><a class="footnote-ref" id="fnref:40" href="#fn:40">40</a></td><td>X. 
Fu et al.</td></tr><tr><td><a href="/facts/LabelMe/7QUugKBy">LabelMe</a></td><td>Annotated pictures of scenes.</td><td>Objects outlined.</td><td>187,240</td><td>Images, text</td><td>Classification, object detection</td><td>2005</td><td><a class="footnote-ref" id="fnref:41" href="#fn:41">41</a></td><td><a href="/facts/MIT_Computer_Science_and_Artificial_Intelligence_Laboratory/mTg66jb9">MIT Computer Science and Artificial Intelligence Laboratory</a></td></tr><tr><td>PASCAL VOC Dataset</td><td>Images in 20 categories and localization bounding boxes.</td><td>Labeling, bounding boxes included</td><td>500,000</td><td>Images, text</td><td>Classification, object detection</td><td>2010</td><td><a class="footnote-ref" id="fnref:42" href="#fn:42">42</a><a class="footnote-ref" id="fnref:43" href="#fn:43">43</a></td><td>M. Everingham et al.</td></tr><tr><td><a href="/facts/CIFAR-10/7YThml10">CIFAR-10</a> Dataset</td><td>Many small, low-resolution images of 10 classes of objects.</td><td>Classes labelled, training set splits created.</td><td>60,000</td><td>Images</td><td>Classification</td><td>2009</td><td><a class="footnote-ref" id="fnref:44" href="#fn:44">44</a><a class="footnote-ref" id="fnref:45" href="#fn:45">45</a></td><td><a href="/facts/Alex_Krizhevsky/SjGe6B3f">A. Krizhevsky</a> et al.</td></tr><tr><td>CIFAR-100 Dataset</td><td>Like CIFAR-10, above, but with 100 classes of objects.</td><td>Classes labelled, training set splits created.</td><td>60,000</td><td>Images</td><td>Classification</td><td>2009</td><td><a class="footnote-ref" id="fnref:46" href="#fn:46">46</a><a class="footnote-ref" id="fnref:47" href="#fn:47">47</a></td><td>A. Krizhevsky et al.</td></tr><tr><td>CINIC-10 Dataset</td><td>A combination of CIFAR-10 and ImageNet images in 10 classes, with 3 splits. 
Larger than CIFAR-10.</td><td>Classes labelled; training, validation, and test set splits created.</td><td>270,000</td><td>Images</td><td>Classification</td><td>2018</td><td><a class="footnote-ref" id="fnref:48" href="#fn:48">48</a></td><td>Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey</td></tr><tr><td><a href="/facts/Fashion_MNIST/eMFARmXf">Fashion-MNIST</a></td><td>An MNIST-like database of fashion product images.</td><td>Classes labelled, training set splits created.</td><td>60,000</td><td>Images</td><td>Classification</td><td>2017</td><td><a class="footnote-ref" id="fnref:49" href="#fn:49">49</a></td><td>Zalando SE</td></tr><tr><td>notMNIST</td><td>Glyphs extracted from publicly available fonts to make a dataset similar to MNIST. There are 10 classes, the letters A–J, taken from different fonts.</td><td>Classes labelled, training set splits created.</td><td>500,000</td><td>Images</td><td>Classification</td><td>2011</td><td><a class="footnote-ref" id="fnref:50" href="#fn:50">50</a></td><td>Yaroslav Bulatov</td></tr><tr><td>Linnaeus 5 dataset</td><td>Images of 5 classes of objects.</td><td>Classes labelled, training set splits created.</td><td>8,000</td><td>Images</td><td>Classification</td><td>2017</td><td><a class="footnote-ref" id="fnref:51" href="#fn:51">51</a></td><td>Chaladze & Kalatozishvili</td></tr><tr><td>11K Hands</td><td>11,076 hand images (1600 × 1200 pixels) of 190 subjects aged 18 to 75, for gender recognition and biometric identification.</td><td>None</td><td>11,076 hand images</td><td>Images and (.mat, .txt, and .csv) label files</td><td>Gender recognition and biometric identification</td><td>2017</td><td><a class="footnote-ref" id="fnref:52" href="#fn:52">52</a></td><td>M. Afifi</td></tr><tr><td>CORe50</td><td>Designed specifically for continuous/lifelong learning and object recognition; a collection of more than 500 videos (30 fps) of 50 domestic objects belonging to 10 different 
categories.</td><td>Classes labelled, training set splits created based on a 3-way, multi-run benchmark.</td><td>164,866 RGB-D images</td><td>images (.png or .pkl) and (.pkl, .txt, .tsv) label files</td><td>Classification, object recognition</td><td>2017</td><td><a class="footnote-ref" id="fnref:53" href="#fn:53">53</a></td><td>V. Lomonaco and D. Maltoni</td></tr><tr><td>OpenLORIS-Object</td><td>Lifelong/continual robotic vision dataset collected by real robots mounted with multiple high-resolution sensors. It includes 121 object instances (first version of the dataset: 40 categories of daily-necessity objects in 20 scenes). The dataset explicitly considers 4 environmental factors across scenes (illumination, occlusion, object pixel size, and clutter) and defines difficulty levels for each factor.</td><td>Classes labelled, training/validation/testing set splits created by benchmark scripts.</td><td>1,106,424 RGB-D images</td><td>images (.png and .pkl) and (.pkl) label files</td><td>Classification, lifelong object recognition, robotic vision</td><td>2019</td><td><a class="footnote-ref" id="fnref:54" href="#fn:54">54</a></td><td>Q. She et al.</td></tr><tr><td>THz and thermal video data set</td><td>This multispectral data set includes terahertz, thermal, visual, near-infrared, and three-dimensional videos of objects hidden under people's clothes.</td><td>Images and 3D point clouds</td><td>More than 20 videos. The duration of each video is about 85 seconds (about 345 frames).</td><td>AP2J</td><td>Experiments with hidden object detection</td><td>2019</td><td><a class="footnote-ref" id="fnref:55" href="#fn:55">55</a><a class="footnote-ref" id="fnref:56" href="#fn:56">56</a></td><td>Alexei A. Morozov and Olga S. Sushkova</td></tr></tbody></table>
<h3>3D Objects</h3>
See Calli et al. (2015)<a class="footnote-ref" id="fnref:57" href="#fn:57">57</a> for a review of 33 datasets of 3D objects as of 2015, and Downs et al. (2022)<a class="footnote-ref" id="fnref:58" href="#fn:58">58</a> for a review of additional datasets as of 2022.

<table><tbody><tr><th scope="col">Dataset Name</th><th scope="col">Brief description</th><th scope="col">Preprocessing</th><th scope="col">Instances</th><th scope="col">Format</th><th scope="col">Default Task</th><th scope="col">Created (updated)</th><th scope="col">Reference</th><th scope="col">Creator</th></tr><tr><td>Princeton Shape Benchmark</td><td>3D polygonal models collected from the Internet</td><td></td><td>1814 models in 92 categories</td><td>3D polygonal models, categories</td><td>shape-based retrieval and analysis</td><td>2004</td><td><a class="footnote-ref" id="fnref:59" href="#fn:59">59</a><a class="footnote-ref" id="fnref:60" href="#fn:60">60</a></td><td>Shilane et al.</td></tr><tr><td>Berkeley 3-D Object Dataset (B3DO)</td><td>Depth and color images collected from crowdsourced <a href="/facts/Kinect/dD5P9XDW">Microsoft Kinect</a> users. Annotated in 50 object categories.</td><td></td><td>849 images, in 75 scenes</td><td>color image, depth image, object class, bounding boxes, 3D center points</td><td>Predict bounding boxes</td><td>2011, updated 2014</td><td><a class="footnote-ref" id="fnref:61" href="#fn:61">61</a></td><td>Janoch et al.</td></tr><tr><td>ShapeNet</td><td>3D models. Some are classified into <a href="/facts/WordNet/TbrGVGAF">WordNet synsets</a>, like <a href="/facts/ImageNet/DPaK4GLC">ImageNet</a>. 
Partially classified into 3,135 categories.</td><td></td><td>3,000,000 models, 220,000 of which are classified.</td><td>3D models, class labels</td><td>Predict class label.</td><td>2015</td><td><a class="footnote-ref" id="fnref:62" href="#fn:62">62</a></td><td>Chang et al.</td></tr><tr><td>ObjectNet3D</td><td>Images, 3D shapes, and objects in 100 categories.</td><td></td><td>90,127 images, 201,888 objects, 44,147 3D shapes</td><td>images, 3D shapes, object bounding boxes, category labels</td><td>recognizing the 3D pose and 3D shape of objects from 2D images</td><td>2016</td><td><a class="footnote-ref" id="fnref:63" href="#fn:63">63</a><a class="footnote-ref" id="fnref:64" href="#fn:64">64</a></td><td>Xiang et al.</td></tr><tr><td>Common Objects in 3D (CO3D)</td><td>Video frames from videos capturing objects from 50 MS-COCO categories, filmed by people on Amazon Mechanical Turk.</td><td></td><td>6 million frames from 40,000 videos</td><td>multi-view images, camera poses, 3D point clouds, object category</td><td>Predict object category. 
Generate objects.</td><td>2021, updated 2022 as CO3Dv2</td><td><a class="footnote-ref" id="fnref:65" href="#fn:65">65</a><a class="footnote-ref" id="fnref:66" href="#fn:66">66</a></td><td><a href="/facts/Meta_AI/raPFo7MW">Meta AI</a></td></tr><tr><td>Google Scanned Objects</td><td>3D-scanned household items in SDF (Simulation Description Format).</td><td></td><td>over 1,000</td><td></td><td></td><td>2022</td><td><a class="footnote-ref" id="fnref:67" href="#fn:67">67</a></td><td><a href="/facts/Google_AI/dG4ouT6p">Google AI</a></td></tr><tr><td>Objaverse-XL</td><td>3D objects</td><td></td><td>over 10 million</td><td>3D objects, metadata</td><td>novel view synthesis, 3D object generation</td><td>2023</td><td><a class="footnote-ref" id="fnref:68" href="#fn:68">68</a></td><td>Deitke et al.</td></tr><tr><td>OmniObject3D</td><td>Scanned objects, labeled in 190 daily categories</td><td></td><td>6,000</td><td>textured meshes, point clouds, multiview images, videos</td><td>robust 3D perception, novel-view synthesis, surface reconstruction, 3D object generation</td><td>2023</td><td><a class="footnote-ref" id="fnref:69" href="#fn:69">69</a><a class="footnote-ref" id="fnref:70" href="#fn:70">70</a></td><td>Wu et al.</td></tr><tr><td>UnCommon Objects in 3D (uCO3D)</td><td>Objects in 1,070 LVIS categories</td><td></td><td></td><td></td><td></td><td>2025</td><td><a class="footnote-ref" id="fnref:71" href="#fn:71">71</a><a class="footnote-ref" id="fnref:72" href="#fn:72">72</a></td><td><a href="/facts/Meta_AI/raPFo7MW">Meta AI</a></td></tr></tbody></table>
<h3>Object detection and recognition for autonomous vehicles</h3>

<table><tbody><tr><th scope="col">Dataset Name</th><th scope="col">Brief description</th><th scope="col">Preprocessing</th><th scope="col">Instances</th><th scope="col">Format</th><th scope="col">Default Task</th><th scope="col">Created (updated)</th><th scope="col">Reference</th><th scope="col">Creator</th></tr><tr><td>Cityscapes Dataset</td><td>Stereo video sequences recorded in street scenes, with pixel-level annotations. Metadata are also included.</td><td>Pixel-level segmentation and labeling</td><td>25,000</td><td>Images, text</td><td>Classification, object detection</td><td>2016</td><td><a class="footnote-ref" id="fnref:73" href="#fn:73">73</a></td><td><a href="/facts/Daimler_AG/KfdR8UTS">Daimler AG</a> et al.</td></tr><tr><td>German Traffic Sign Detection Benchmark Dataset</td><td>Images of traffic signs on German roads, taken from vehicles. These signs comply with UN standards and therefore resemble the signs used in other countries that follow those standards.</td><td>Signs manually labeled</td><td>900</td><td>Images</td><td>Classification</td><td>2013</td><td><a class="footnote-ref" id="fnref:74" href="#fn:74">74</a><a class="footnote-ref" id="fnref:75" href="#fn:75">75</a></td><td>S. Houben et al.</td></tr><tr><td>KITTI Vision Benchmark Dataset</td><td>Images of various areas of a mid-size city, captured with cameras and laser scanners on a vehicle driving through it.</td><td>Many benchmarks extracted from the data.</td><td>>100 GB of data</td><td>Images, text</td><td>Classification, object detection</td><td>2012</td><td><a class="footnote-ref" id="fnref:76" href="#fn:76">76</a><a class="footnote-ref" id="fnref:77" href="#fn:77">77</a><a class="footnote-ref" id="fnref:78" href="#fn:78">78</a></td><td>A. 
Geiger et al.</td></tr><tr><td>FieldSAFE</td><td>Multi-modal dataset for obstacle detection in agriculture, including data from a stereo camera, thermal camera, web camera, 360-degree camera, lidar, and radar, plus precise localization.</td><td>Classes labelled geographically.</td><td>>400 GB of data</td><td>Images and 3D point clouds</td><td>Classification, object detection, object localization</td><td>2017</td><td><a class="footnote-ref" id="fnref:79" href="#fn:79">79</a></td><td>M. Kragh et al.</td></tr><tr><td>Daimler Monocular Pedestrian Detection dataset</td><td>A dataset of pedestrians in urban environments.</td><td>Pedestrians are labeled with bounding boxes.</td><td>The labeled part contains 15,560 samples with pedestrians and 6,744 samples without. The test set contains 21,790 unlabeled images.</td><td>Images</td><td>Object recognition and classification</td><td>2006</td><td><a class="footnote-ref" id="fnref:80" href="#fn:80">80</a><a class="footnote-ref" id="fnref:81" href="#fn:81">81</a><a class="footnote-ref" id="fnref:82" href="#fn:82">82</a></td><td><a href="/facts/Daimler_AG/KfdR8UTS">Daimler AG</a></td></tr><tr><td>CamVid</td><td>The Cambridge-driving Labeled Video Database (CamVid) is a collection of videos.</td><td>The dataset has semantic labels for 32 classes.</td><td>over 700 images</td><td>Images</td><td>Object recognition and classification</td><td>2008</td><td><a class="footnote-ref" id="fnref:83" href="#fn:83">83</a><a class="footnote-ref" id="fnref:84" href="#fn:84">84</a><a class="footnote-ref" id="fnref:85" href="#fn:85">85</a></td><td>Gabriel J. 
Brostow, Jamie Shotton, Julien Fauqueur, Roberto Cipolla</td></tr><tr><td>RailSem19</td><td>RailSem19 is a scene-understanding dataset for vision systems on railways.</td><td>The dataset is labeled semantically and box-wise.</td><td>8,500</td><td>Images</td><td>Object recognition and classification, scene recognition</td><td>2019</td><td><a class="footnote-ref" id="fnref:86" href="#fn:86">86</a><a class="footnote-ref" id="fnref:87" href="#fn:87">87</a></td><td>Oliver Zendel, Markus Murschitz, Marcel Zeilinger, Daniel Steininger, Sara Abbasi, Csaba Beleznai</td></tr><tr><td>BOREAS</td><td>BOREAS is a multi-season autonomous driving dataset. It includes data from a Velodyne Alpha-Prime (128-beam) lidar, a FLIR Blackfly S camera, a Navtech CIR304-H radar, and an Applanix POS LV GNSS-INS.</td><td>The data are annotated with 3D bounding boxes.</td><td>350 km of driving data</td><td>Images, lidar and radar data</td><td>Object recognition and classification, scene recognition</td><td>2023</td><td><a class="footnote-ref" id="fnref:88" href="#fn:88">88</a><a class="footnote-ref" id="fnref:89" href="#fn:89">89</a></td><td>Keenan Burnett, David J. Yoon, Yuchen Wu, Andrew Zou Li, Haowei Zhang, Shichen Lu, Jingxing Qian, Wei-Kang Tseng, Andrew Lambert, Keith Y.K. Leung, <a href="/facts/Angela_Schoellig/ww2MyGf7">Angela P. Schoellig</a>, Timothy D. 
Barfoot</td></tr><tr><td>Bosch Small Traffic Lights Dataset</td><td>A dataset of traffic lights.</td><td>The labels include bounding boxes of traffic lights together with their state (active light).</td><td>5,000 images for training and a video sequence of 8,334 frames for evaluation</td><td>Images</td><td>Traffic light recognition</td><td>2017</td><td><a class="footnote-ref" id="fnref:90" href="#fn:90">90</a><a class="footnote-ref" id="fnref:91" href="#fn:91">91</a></td><td>Karsten Behrendt, Libor Novak, Rami Botros</td></tr><tr><td>FRSign</td><td>A dataset of French railway signals.</td><td>The labels include bounding boxes of railway signals together with their state (active light).</td><td>more than 100,000</td><td>Images</td><td>Railway signal recognition</td><td>2020</td><td><a class="footnote-ref" id="fnref:92" href="#fn:92">92</a><a class="footnote-ref" id="fnref:93" href="#fn:93">93</a></td><td>Jeanine Harb, Nicolas Rébéna, Raphaël Chosidow, Grégoire Roblin, Roman Potarusov, Hatem Hajri</td></tr><tr><td>GERALD</td><td>A dataset of German railway signals.</td><td>The labels include bounding boxes of railway signals together with their state (active light).</td><td>5,000</td><td>Images</td><td>Railway signal recognition</td><td>2023</td><td><a class="footnote-ref" id="fnref:94" href="#fn:94">94</a><a class="footnote-ref" id="fnref:95" href="#fn:95">95</a></td><td>Philipp Leibner, Fabian Hampel, Christian Schindler</td></tr><tr><td>Multi-cue pedestrian</td><td>The Multi-cue onboard pedestrian detection dataset is a dataset for pedestrian detection.</td><td>The dataset is labeled box-wise.</td><td>1,092 image pairs with 1,776 pedestrian bounding boxes</td><td>Images</td><td>Object recognition and classification</td><td>2009</td><td><a class="footnote-ref" id="fnref:96" href="#fn:96">96</a></td><td>Christian Wojek, Stefan Walk, Bernt Schiele</td></tr><tr><td>RAWPED</td><td>RAWPED is a dataset for detection of pedestrians in the context of 
railways.</td><td>The dataset is labeled box-wise.</td><td>26,000</td><td>Images</td><td>Object recognition and classification</td><td>2020</td><td><a class="footnote-ref" id="fnref:97" href="#fn:97">97</a><a class="footnote-ref" id="fnref:98" href="#fn:98">98</a></td><td>Tugce Toprak, Burak Belenlioglu, Burak Aydın, Cuneyt Guzelis, M. Alper Selver</td></tr><tr><td>OSDaR23</td><td>OSDaR23 is a multi-sensory dataset for detection of objects in the context of railways.</td><td>The dataset is labeled box-wise.</td><td>16,874 frames</td><td>Images, Lidar, Radar and Infrared</td><td>Object recognition and classification</td><td>2023</td><td><a class="footnote-ref" id="fnref:99" href="#fn:99">99</a><a class="footnote-ref" id="fnref:100" href="#fn:100">100</a></td><td>Roman Tilly, Rustam Tagiew, Pavel Klasek (<a href="/facts/DZSF/m8XAtYuw">DZSF</a>); Philipp Neumaier, Patrick Denzler, Tobias Klockau, Martin Boekhoff, Martin Köppel (Digitale Schiene Deutschland); Karsten Schwalbe (FusionSystems)</td></tr><tr><td>Argoverse</td><td>Argoverse is a multi-sensory dataset for detection of objects in the context of roads.</td><td>The dataset is annotated box-wise.</td><td>320 hours of recording</td><td>Data from 7 cameras and LiDAR</td><td>Object recognition and classification, object tracking</td><td>2022</td><td><a class="footnote-ref" id="fnref:101" href="#fn:101">101</a><a class="footnote-ref" id="fnref:102" href="#fn:102">102</a></td><td>Argo AI, <a href="/facts/Carnegie_Mellon_University/XKd7L2KA">Carnegie Mellon University</a>, <a href="/facts/Georgia_Institute_of_Technology/HKtrUSao">Georgia Institute of Technology</a></td></tr><tr><td>Rail3D</td><td>Rail3D is a LiDAR dataset for railways recorded in Hungary, France, and Belgium.</td><td>The dataset is annotated semantically</td><td>288 million annotated points</td><td>LiDAR</td><td>Object recognition and classification, object tracking</td><td>2024</td><td><a class="footnote-ref" id="fnref:103" 
href="#fn:103">103</a></td><td>Abderrazzaq Kharroubi, Ballouch Zouhair, Rafika Hajji, Anass Yarroudh, and Roland Billen; <a href="/facts/University_of_Li%25C3%25A8ge/YQBja3KQ">University of Liège</a> and Hassan II Institute of Agronomy and Veterinary Medicine</td></tr><tr><td>WHU-Railway3D</td><td>WHU-Railway3D is a LiDAR dataset for urban, rural, and plateau railways recorded in China</td><td>The dataset is annotated semantically</td><td>4.6 billion annotated data points</td><td>LiDAR</td><td>Object recognition and classification, object tracking</td><td>2024</td><td><a class="footnote-ref" id="fnref:104" href="#fn:104">104</a></td><td>Bo Qiu, Yuzhou Zhou, Lei Dai; Bing Wang, Jianping Li, Zhen Dong, Chenglu Wen, Zhiliang Ma, Bisheng Yang; <a href="/facts/Wuhan_University/WYURh1Ju">Wuhan University</a>, <a href="/facts/University_of_Oxford/ExVVQL6V">University of Oxford</a>, <a href="/facts/Hong_Kong_Polytechnic_University/CHjvYpPE">Hong Kong Polytechnic University</a>, <a href="/facts/Nanyang_Technological_University/C0ksstA4">Nanyang Technological University</a>, <a href="/facts/Xiamen_University/fdzcCFsM">Xiamen University</a> and <a href="/facts/Tsinghua_University/HNwjrEuU">Tsinghua University</a></td></tr><tr><td>RailFOD23</td><td>A dataset of foreign objects on railway catenary</td><td>The dataset is annotated boxwise</td><td>14,615 images</td><td>Images</td><td>Object recognition and classification, object tracking</td><td>2024</td><td><a class="footnote-ref" id="fnref:105" href="#fn:105">105</a></td><td>Zhichao Chen, Jie Yang, Zhicheng Feng, Hao Zhu; <a href="/facts/Jiangxi_University_of_Science_and_Technology/RNhKMnDP">Jiangxi University of Science and Technology</a></td></tr><tr><td>ESRORAD</td><td>A dataset of images and point clouds for urban road and rail scenes from <a href="/facts/Le_Havre/VHwdcG61">Le Havre</a> and <a href="/facts/Rouen/o4noeA7S">Rouen</a></td><td>The dataset is annotated boxwise</td><td>2,700 k virtual images and 100,000 real 
images</td><td>Images, LiDAR</td><td>Object recognition and classification, object tracking</td><td>2022</td><td><a class="footnote-ref" id="fnref:106" href="#fn:106">106</a></td><td>Redouane Khemmar, Antoine Mauri, Camille Dulompont, Jayadeep Gajula, Vincent Vauchey, Madjid Haddad and Rémi Boutteau; <a href="/facts/Le_Havre_Normandy_University/AReYmvwR">Le Havre Normandy University</a> and SEGULA Technologies</td></tr><tr><td>RailVID</td><td>Data recorded with an InfiRay AT615X infrared thermography camera in diverse railway <a href="/facts/Scenario_(vehicular_automation)/mMJCdSU8">scenarios</a>, including carports, depots, and straight track.</td><td>The dataset is annotated semantically</td><td>1,071 images</td><td>infrared images</td><td>Object recognition and classification, object tracking</td><td>2022</td><td><a class="footnote-ref" id="fnref:107" href="#fn:107">107</a></td><td>Hao Yuan, Zhenkun Mei, Yihao Chen, Weilong Niu, Cheng Wu; <a href="/facts/Soochow_University_(Suzhou)/1fsaVkVH">Soochow University</a></td></tr><tr><td>RailPC</td><td>LiDAR dataset in the context of railways</td><td>The dataset is annotated semantically</td><td>3 billion data points</td><td>LiDAR</td><td>Object recognition and classification, object tracking</td><td>2024</td><td><a class="footnote-ref" id="fnref:108" href="#fn:108">108</a></td><td>Tengping Jiang, Shiwei Li, Qinyu Zhang, Guangshuai Wang, Zequn Zhang, Fankun Zeng, Peng An, Xin Jin, Shan Liu, Yongjun Wang; <a href="/facts/Nanjing_Normal_University/CqJKg7nW">Nanjing Normal University</a>, <a href="/facts/Ministry_of_Natural_Resources_(China)/sdCzosUT">Ministry of Natural Resources</a>, <a href="/facts/Eastern_Institute_of_Technology/1So5oN4E">Eastern Institute of Technology</a>, Tianjin Key Laboratory of Rail Transit Navigation Positioning and Spatio‐temporal Big Data Technology, <a href="/facts/Northwest_Normal_University/FdfBnkpT">Northwest Normal University</a>, <a href="/facts/Washington_University_in_St._Louis/GmIT8zU6">Washington 
University in St. Louis</a> and <a href="/facts/Ningbo_University_of_Technology/5M7UXNBT">Ningbo University of Technology</a></td></tr><tr><td>RailCloud-HdF</td><td>LiDAR dataset in the context of railways</td><td>The dataset is annotated semantically</td><td>8,060.3 million data points</td><td>LiDAR</td><td>Object recognition and classification, object tracking</td><td>2024</td><td><a class="footnote-ref" id="fnref:109" href="#fn:109">109</a></td><td>Mahdi Abid, Mathis Teixeira, Ankur Mahtani and Thomas Laurent; Railenium</td></tr><tr><td>RailGoerl24</td><td>RGB and LiDAR dataset in the context of railways</td><td>The dataset is annotated with bounding boxes</td><td>12,205 HD RGB frames and 383,922,305 colored LiDAR points</td><td>RGB, LiDAR</td><td>Person recognition and classification</td><td>2025</td><td><a class="footnote-ref" id="fnref:110" href="#fn:110">110</a></td><td>DZSF, PECS-WORK GmbH, EYYES Deutschland GmbH, TU Dresden</td></tr></tbody></table>
<h2 id="facial-recognition">Facial recognition</h2>
In <a href="/facts/Computer_vision/Tl2Yyk66">computer vision</a>, face images have been used extensively to develop <a href="/facts/Facial_recognition_system/mtVxaDln">facial recognition systems</a>, <a href="/facts/Face_detection/KVIvPRFA">face detection</a> methods, and many other applications that use images of faces. A curated list of face datasets, focused on the pre-2005 period, is also available.<a class="footnote-ref" id="fnref:111" href="#fn:111">111</a>

<table><tbody><tr><th scope="col">Dataset name</th><th scope="col">Brief description</th><th scope="col">Preprocessing</th><th scope="col">Instances</th><th scope="col">Format</th><th scope="col">Default task</th><th scope="col">Created (updated)</th><th scope="col">Reference</th><th scope="col">Creator</th></tr><tr><td>Labeled Faces in the Wild (LFW)</td><td>Images of named individuals obtained by Internet search.</td><td>Frontal face detection, bounding box cropping</td><td>13,233 images of 5,749 named individuals</td><td>images, labels</td><td>unconstrained face recognition</td><td>2008</td><td><a class="footnote-ref" id="fnref:112" href="#fn:112">112</a><a class="footnote-ref" id="fnref:113" href="#fn:113">113</a></td><td>Huang et al.</td></tr><tr><td>Aff-Wild</td><td>298 videos of 200 individuals, ~1,250,000 manually annotated images: annotated in terms of dimensional affect (valence-arousal); in-the-wild setting; color database; various resolutions (average = 640×360)</td><td>Detected faces, facial landmarks and valence-arousal annotations</td><td>~1,250,000 manually annotated images</td><td>video (visual + audio <a href="/facts/Modality_(human-computer_interaction)/KzL4uqMm">modalities</a>)</td><td>affect recognition (valence-arousal estimation)</td><td>2017</td><td>CVPR<a class="footnote-ref" id="fnref:114" href="#fn:114">114</a>, IJCV<a class="footnote-ref" id="fnref:115" href="#fn:115">115</a></td><td>D. 
Kollias et al.</td></tr><tr><td>Aff-Wild2</td><td>558 videos of 458 individuals, ~2,800,000 manually annotated images: annotated in terms of i) categorical affect (7 basic expressions: neutral, happiness, sadness, surprise, fear, disgust, anger); ii) dimensional affect (valence-arousal); iii) action units (AUs 1, 2, 4, 6, 12, 15, 20, 25); in-the-wild setting; color database; various resolutions (average = 1030×630)</td><td>Detected faces, detected and aligned faces, and annotations</td><td>~2,800,000 manually annotated images</td><td>video (visual + audio modalities)</td><td>affect recognition (valence-arousal estimation, basic expression classification, action unit detection)</td><td>2019</td><td>BMVC<a class="footnote-ref" id="fnref:116" href="#fn:116">116</a>, FG<a class="footnote-ref" id="fnref:117" href="#fn:117">117</a></td><td>D. Kollias et al.</td></tr><tr><td><a href="/facts/FERET_(facial_recognition_technology)/fBrC9If4">FERET (facial recognition technology)</a></td><td>11,338 images of 1,199 individuals in different poses and taken at different times.</td><td>None.</td><td>11,338</td><td>Images</td><td>Classification, face recognition</td><td>2003</td><td><a class="footnote-ref" id="fnref:118" href="#fn:118">118</a><a class="footnote-ref" id="fnref:119" href="#fn:119">119</a></td><td><a href="/facts/United_States_Department_of_Defense/dWAvf7e3">United States Department of Defense</a></td></tr><tr><td>Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS)</td><td>7,356 video and audio recordings of 24 professional actors. Eight emotions, each at two intensities.</td><td>Files labelled with expression. Perceptual validation ratings provided by 319 raters.</td><td>7,356</td><td>Video, sound files</td><td>Classification, face recognition, voice recognition</td><td>2018</td><td><a class="footnote-ref" id="fnref:120" href="#fn:120">120</a><a class="footnote-ref" id="fnref:121" href="#fn:121">121</a></td><td>S.R. Livingstone and F.A. 
Russo</td></tr><tr><td>SCFace</td><td>Color images of faces at various angles.</td><td>Location of facial features extracted. Coordinates of features given.</td><td>4,160</td><td>Images, text</td><td><a href="/facts/Statistical_classification/jXXHRkXR">Classification</a>, face recognition</td><td>2011</td><td><a class="footnote-ref" id="fnref:122" href="#fn:122">122</a><a class="footnote-ref" id="fnref:123" href="#fn:123">123</a></td><td>M. Grgic et al.</td></tr><tr><td>Yale Face Database</td><td>Faces of 15 individuals in 11 different expressions.</td><td>Labels of expressions.</td><td>165</td><td>Images</td><td>Face recognition</td><td>1997</td><td><a class="footnote-ref" id="fnref:124" href="#fn:124">124</a><a class="footnote-ref" id="fnref:125" href="#fn:125">125</a></td><td>J. Yang et al.</td></tr><tr><td>Cohn-Kanade AU-Coded Expression Database</td><td>Large database of images with labels for expressions.</td><td>Tracking of certain facial features.</td><td>500+ sequences</td><td>Images, text</td><td>Facial expression analysis</td><td>2000</td><td><a class="footnote-ref" id="fnref:126" href="#fn:126">126</a><a class="footnote-ref" id="fnref:127" href="#fn:127">127</a></td><td>T. Kanade et al.</td></tr><tr><td>JAFFE Facial Expression Database</td><td>213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models.</td><td>Images are cropped to the facial region. 
Includes semantic rating data for the emotion labels.</td><td>213</td><td>Images, text</td><td>Facial expression cognition</td><td>1998</td><td><a class="footnote-ref" id="fnref:128" href="#fn:128">128</a><a class="footnote-ref" id="fnref:129" href="#fn:129">129</a></td><td>Lyons, Kamachi, Gyoba</td></tr><tr><td>FaceScrub</td><td>Images of public figures collected from web image searches.</td><td>Name and gender (male/female) annotation.</td><td>107,818</td><td>Images, text</td><td>Face recognition</td><td>2014</td><td><a class="footnote-ref" id="fnref:130" href="#fn:130">130</a><a class="footnote-ref" id="fnref:131" href="#fn:131">131</a></td><td>H. Ng et al.</td></tr><tr><td>BioID Face Database</td><td>Images of faces with eye positions marked.</td><td>Eye positions set manually.</td><td>1,521</td><td>Images, text</td><td>Face recognition</td><td>2001</td><td><a class="footnote-ref" id="fnref:132" href="#fn:132">132</a></td><td>BioID</td></tr><tr><td>Skin Segmentation Dataset</td><td>Randomly sampled color values from face images.</td><td>B, G, R values extracted.</td><td>245,057</td><td>Text</td><td>Segmentation, classification</td><td>2012</td><td><a class="footnote-ref" id="fnref:133" href="#fn:133">133</a><a class="footnote-ref" id="fnref:134" href="#fn:134">134</a></td><td>R. 
Bhatt</td></tr><tr><td>Bosphorus</td><td>3D face image database.</td><td>34 action units and 6 expressions labeled; 24 facial landmarks labeled.</td><td>4,652</td><td>Images, text</td><td>Face recognition, classification</td><td>2008</td><td><a class="footnote-ref" id="fnref:135" href="#fn:135">135</a><a class="footnote-ref" id="fnref:136" href="#fn:136">136</a></td><td>A. Savran et al.</td></tr><tr><td>UOY 3D-Face</td><td>Neutral face and 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised.</td><td>Labeling.</td><td>5,250</td><td>Images, text</td><td>Face recognition, classification</td><td>2004</td><td><a class="footnote-ref" id="fnref:137" href="#fn:137">137</a><a class="footnote-ref" id="fnref:138" href="#fn:138">138</a></td><td><a href="/facts/University_of_York/qXEdu8vD">University of York</a></td></tr><tr><td>CASIA 3D Face Database</td><td>Expressions: anger, smile, laugh, surprise, closed eyes.</td><td>None.</td><td>4,624</td><td>Images, text</td><td>Face recognition, classification</td><td>2007</td><td><a class="footnote-ref" id="fnref:139" href="#fn:139">139</a><a class="footnote-ref" id="fnref:140" href="#fn:140">140</a></td><td><a href="/facts/Institute_of_Automation%2c_Chinese_Academy_of_Sciences/JxAHgCv8">Institute of Automation, Chinese Academy of Sciences</a></td></tr><tr><td>CASIA NIR</td><td>Expressions: anger, disgust, fear, happiness, sadness, surprise.</td><td>None.</td><td>480</td><td>Annotated visible-spectrum and near-infrared video captures at 25 frames per second</td><td>Face recognition, classification</td><td>2011</td><td><a class="footnote-ref" id="fnref:141" href="#fn:141">141</a></td><td>Zhao, G. et al.</td></tr><tr><td>BU-3DFE</td><td>Neutral face and 6 expressions: anger, happiness, sadness, surprise, disgust, fear (4 intensity levels). 
3D images extracted.</td><td>None.</td><td>2,500</td><td>Images, text</td><td>Facial expression recognition, classification</td><td>2006</td><td><a class="footnote-ref" id="fnref:142" href="#fn:142">142</a></td><td><a href="/facts/Binghamton_University/zqumysQb">Binghamton University</a></td></tr><tr><td><a href="/facts/Face_Recognition_Grand_Challenge/gNQuxGmS">Face Recognition Grand Challenge</a> Dataset</td><td>Up to 22 samples for each subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D data.</td><td>None.</td><td>4,007</td><td>Images, text</td><td>Face recognition, classification</td><td>2004</td><td><a class="footnote-ref" id="fnref:143" href="#fn:143">143</a><a class="footnote-ref" id="fnref:144" href="#fn:144">144</a></td><td><a href="/facts/National_Institute_of_Standards_and_Technology/gr9Fnh85">National Institute of Standards and Technology</a></td></tr><tr><td>GavabDB</td><td>61 subjects with 9 samples each. Expressions: neutral face, smile, frontal accentuated laugh, frontal random gesture. 3D images.</td><td>None.</td><td>549</td><td>Images, text</td><td>Face recognition, classification</td><td>2008</td><td><a class="footnote-ref" id="fnref:145" href="#fn:145">145</a><a class="footnote-ref" id="fnref:146" href="#fn:146">146</a></td><td><a href="/facts/King_Juan_Carlos_University/LV7sa1ES">King Juan Carlos University</a></td></tr><tr><td>3D-RMA</td><td>Up to 100 subjects, expressions mostly neutral. 
Several poses are also included.</td><td>None.</td><td>9,971</td><td>Images, text</td><td>Face recognition, classification</td><td>2004</td><td><a class="footnote-ref" id="fnref:147" href="#fn:147">147</a><a class="footnote-ref" id="fnref:148" href="#fn:148">148</a></td><td><a href="/facts/Royal_Military_Academy_(Belgium)/hhkITMz8">Royal Military Academy (Belgium)</a></td></tr><tr><td>SoF</td><td>112 people (66 male, 46 female) wearing glasses under different illumination conditions.</td><td>A set of synthetic filters (blur, occlusions, noise, and posterization) with different levels of difficulty.</td><td>42,592 (2,662 original images × 16 synthetic versions)</td><td>Images, MAT files</td><td>Gender classification, face detection, face recognition, age estimation, and glasses detection</td><td>2017</td><td><a class="footnote-ref" id="fnref:149" href="#fn:149">149</a><a class="footnote-ref" id="fnref:150" href="#fn:150">150</a></td><td>Afifi, M. et al.</td></tr><tr><td>IMDb-WIKI</td><td>IMDb and Wikipedia face images with gender and age labels.</td><td>None.</td><td>523,051</td><td>Images</td><td>Gender classification, face detection, face recognition, age estimation</td><td>2015</td><td><a class="footnote-ref" id="fnref:151" href="#fn:151">151</a></td><td>R. Rothe, R. Timofte, L. Van Gool</td></tr></tbody></table>
<h2 id="action-recognition">Action recognition</h2>
<table><tbody><tr><th scope="col">Dataset name</th><th scope="col">Brief description</th><th scope="col">Preprocessing</th><th scope="col">Instances</th><th scope="col">Format</th><th scope="col">Default Task</th><th scope="col">Created (updated)</th><th scope="col">Reference</th><th scope="col">Creator</th></tr><tr><td>AVA-Kinetics Localized Human Actions Video</td><td>80 action classes annotated on keyframes of videos from Kinetics-700.</td><td></td><td>1.6 million annotations; 238,906 video clips; 624,430 keyframes.</td><td>Annotations, videos.</td><td>Action prediction</td><td>2020</td><td><a class="footnote-ref" id="fnref:152" href="#fn:152">152</a><a class="footnote-ref" id="fnref:153" href="#fn:153">153</a></td><td>Li et al. from the Perception Team of <a href="/facts/Google_AI/dG4ouT6p">Google AI</a>.</td></tr><tr><td>TV Human Interaction Dataset</td><td>Videos from 20 different TV shows for predicting social interactions: handshake, high five, hug, kiss, and none.</td><td>None.</td><td>6,766 video clips</td><td>video clips</td><td>Action prediction</td><td>2013</td><td><a class="footnote-ref" id="fnref:154" href="#fn:154">154</a></td><td>Patron-Perez, A. et al.</td></tr><tr><td>Berkeley Multimodal Human Action Database (MHAD)</td><td>Recordings of a single person performing 12 actions.</td><td>MoCap pre-processing</td><td>660 action samples</td><td>8 PhaseSpace Motion Capture, 2 Stereo Cameras, 4 Quad Cameras, 6 accelerometers, 4 microphones</td><td>Action classification</td><td>2013</td><td><a class="footnote-ref" id="fnref:155" href="#fn:155">155</a></td><td>Ofli, F. et al.</td></tr><tr><td>THUMOS Dataset</td><td>Large video dataset for action classification.</td><td>Actions classified and labeled.</td><td>45 million frames of video</td><td>Video, images, text</td><td>Classification, action detection</td><td>2013</td><td><a class="footnote-ref" id="fnref:156" href="#fn:156">156</a><a class="footnote-ref" id="fnref:157" href="#fn:157">157</a></td><td>Y. 
Jiang et al.</td></tr><tr><td>MEXAction2</td><td>Video dataset for action localization and spotting</td><td>Actions classified and labeled.</td><td>1000</td><td>Video</td><td>Action detection</td><td>2014</td><td><a class="footnote-ref" id="fnref:158" href="#fn:158">158</a></td><td>Stoian et al.</td></tr></tbody></table>
<h2 id="handwriting-and-character-recognition">Handwriting and character recognition</h2>
<table><tbody><tr><th>Dataset name</th><th>Brief description</th><th>Preprocessing</th><th>Instances</th><th>Format</th><th>Default Task</th><th>Created (updated)</th><th>Reference</th><th>Creator</th></tr><tr><td>Artificial Characters Dataset</td><td>Artificially generated data describing the structure of 10 capital English letters.</td><td>Coordinates of lines drawn given as integers. Various other features.</td><td>6,000</td><td>Text</td><td><a href="/facts/Handwriting_recognition/RgdyWtDQ">Handwriting recognition</a>, classification</td><td>1992</td><td><a class="footnote-ref" id="fnref:159" href="#fn:159">159</a></td><td>H. Guvenir et al.</td></tr><tr><td>Letter Dataset</td><td>Upper-case printed letters.</td><td>16 numerical features extracted from each image.</td><td>20,000</td><td>Text</td><td>OCR, classification</td><td>1991</td><td><a class="footnote-ref" id="fnref:160" href="#fn:160">160</a><a class="footnote-ref" id="fnref:161" href="#fn:161">161</a></td><td>D. Slate et al.</td></tr><tr><td>CASIA-HWDB</td><td>Offline handwritten <a href="/facts/Chinese_characters/5pn1PFvj">Chinese character</a> database. 3,755 classes in the <a href="/facts/GB_2312/5bhsAQE5">GB 2312</a> character set.</td><td>Grayscale images with background pixels set to 255.</td><td>1,172,907</td><td>Images, text</td><td>Handwriting recognition, classification</td><td>2009</td><td><a class="footnote-ref" id="fnref:162" href="#fn:162">162</a></td><td><a href="/facts/Institute_of_Automation%2c_Chinese_Academy_of_Sciences/JxAHgCv8">CASIA</a></td></tr><tr><td>CASIA-OLHWDB</td><td>Online handwritten Chinese character database, collected using an Anoto pen on paper. 
3,755 classes in the <a href="/facts/GB_2312/5bhsAQE5">GB 2312</a> character set.</td><td>Provides the coordinate sequences of the strokes.</td><td>1,174,364</td><td>Images, text</td><td>Handwriting recognition, classification</td><td>2009</td><td><a class="footnote-ref" id="fnref:163" href="#fn:163">163</a><a class="footnote-ref" id="fnref:164" href="#fn:164">164</a></td><td>CASIA</td></tr><tr><td>Character Trajectories Dataset</td><td>Labeled samples of pen-tip trajectories for people writing simple characters.</td><td>3-dimensional pen-tip velocity trajectory matrix for each sample.</td><td>2,858</td><td>Text</td><td>Handwriting recognition, classification</td><td>2008</td><td><a class="footnote-ref" id="fnref:165" href="#fn:165">165</a><a class="footnote-ref" id="fnref:166" href="#fn:166">166</a></td><td>B. Williams</td></tr><tr><td>Chars74K Dataset</td><td>Images of symbols used in both English and <a href="/facts/Kannada_alphabet/LUbwH0pS">Kannada</a>, for character recognition in natural images.</td><td></td><td>74,107</td><td></td><td>Character recognition, handwriting recognition, OCR, classification</td><td>2009</td><td><a class="footnote-ref" id="fnref:167" href="#fn:167">167</a></td><td>T. de Campos</td></tr><tr><td>EMNIST dataset</td><td>Handwritten characters from 3,600 contributors.</td><td>Derived from NIST Special Database 19. 
Converted to 28×28 pixel images, matching the MNIST dataset.<a class="footnote-ref" id="fnref:168" href="#fn:168">168</a></td><td>800,000</td><td>Images</td><td>Character recognition, classification, handwriting recognition</td><td>2016</td><td>EMNIST dataset<a class="footnote-ref" id="fnref:169" href="#fn:169">169</a>, documentation<a class="footnote-ref" id="fnref:170" href="#fn:170">170</a></td><td>Gregory Cohen et al.</td></tr><tr><td>UJI Pen Characters Dataset</td><td>Isolated handwritten characters.</td><td>Pen-position coordinates recorded as the characters were written.</td><td>11,640</td><td>Text</td><td>Handwriting recognition, classification</td><td>2009</td><td><a class="footnote-ref" id="fnref:171" href="#fn:171">171</a><a class="footnote-ref" id="fnref:172" href="#fn:172">172</a></td><td>F. Prat et al.</td></tr><tr><td>Gisette Dataset</td><td>Handwriting samples of the often-confused digits 4 and 9.</td><td>Handwriting images size-normalized, features extracted, and data split into train/test sets.</td><td>13,500</td><td>Images, text</td><td>Handwriting recognition, classification</td><td>2003</td><td><a class="footnote-ref" id="fnref:173" href="#fn:173">173</a></td><td>Yann LeCun et al.</td></tr><tr><td>Omniglot dataset</td><td>1,623 different handwritten characters from 50 different alphabets.</td><td>Hand-labeled.</td><td>38,300</td><td>Images, text, strokes</td><td>Classification, <a href="/facts/One-shot_learning_in_computer_vision/CgIw6JSz">one-shot learning</a></td><td>2015</td><td><a class="footnote-ref" id="fnref:174" href="#fn:174">174</a><a class="footnote-ref" id="fnref:175" href="#fn:175">175</a></td><td><a href="/facts/American_Association_for_the_Advancement_of_Science/zfWYugCg">American Association for the Advancement of Science</a></td></tr><tr><td><a href="/facts/MNIST_database/nzUEydbL">MNIST database</a></td><td>Database of handwritten digits.</td><td>Hand-labeled.</td><td>60,000</td><td>Images, 
text</td><td>Classification</td><td>1994</td><td><a class="footnote-ref" id="fnref:176" href="#fn:176">176</a><a class="footnote-ref" id="fnref:177" href="#fn:177">177</a></td><td><a href="/facts/National_Institute_of_Standards_and_Technology/gr9Fnh85">National Institute of Standards and Technology</a></td></tr><tr><td>Optical Recognition of Handwritten Digits Dataset</td><td>Normalized bitmaps of handwritten digits.</td><td>Size-normalized and mapped to bitmaps.</td><td>5,620</td><td>Images, text</td><td>Handwriting recognition, classification</td><td>1998</td><td><a class="footnote-ref" id="fnref:178" href="#fn:178">178</a></td><td>E. Alpaydin et al.</td></tr><tr><td>Pen-Based Recognition of Handwritten Digits Dataset</td><td>Handwritten digits written on an electronic pen tablet.</td><td>Feature vectors extracted from pen trajectories resampled to uniformly spaced points.</td><td>10,992</td><td>Images, text</td><td>Handwriting recognition, classification</td><td>1998</td><td><a class="footnote-ref" id="fnref:179" href="#fn:179">179</a><a class="footnote-ref" id="fnref:180" href="#fn:180">180</a></td><td>E. Alpaydin et al.</td></tr><tr><td>Semeion Handwritten Digit Dataset</td><td>Handwritten digits from 80 people.</td><td>All handwritten digits have been normalized for size and mapped to the same grid.</td><td>1,593</td><td>Images, text</td><td>Handwriting recognition, classification</td><td>2008</td><td><a class="footnote-ref" id="fnref:181" href="#fn:181">181</a></td><td>T. 
Srl</td></tr><tr><td>HASYv2</td><td>Handwritten mathematical symbols</td><td>All symbols are centered and 32×32 pixels in size.</td><td>168,233</td><td>Images, text</td><td>Classification</td><td>2017</td><td><a class="footnote-ref" id="fnref:182" href="#fn:182">182</a></td><td>Martin Thoma</td></tr><tr><td>Noisy Handwritten Bangla Dataset</td><td>Includes a Handwritten Numeral Dataset (10 classes) and a Basic Character Dataset (50 classes); each has three types of noise: white Gaussian noise, motion blur, and reduced contrast.</td><td>All images are centered and 32×32 pixels in size.</td><td>Numeral Dataset: 23,330; Character Dataset: 76,000</td><td>Images, text</td><td>Handwriting recognition, classification</td><td>2017</td><td><a class="footnote-ref" id="fnref:183" href="#fn:183">183</a><a class="footnote-ref" id="fnref:184" href="#fn:184">184</a></td><td>M. Karki et al.</td></tr></tbody></table>
<h2 id="aerial-images">Aerial images</h2>
<table><tbody><tr><th scope="col">Dataset name</th><th scope="col">Brief description</th><th scope="col">Preprocessing</th><th scope="col">Instances</th><th scope="col">Format</th><th scope="col">Default Task</th><th scope="col">Created (updated)</th><th scope="col">Reference</th><th scope="col">Creator</th></tr><tr><td>iSAID: Instance Segmentation in Aerial Images Dataset</td><td></td><td>Precise instance-level annotation carried out by professional annotators, cross-checked and validated by expert annotators following well-defined guidelines.</td><td>655,451 (15 classes)</td><td>Images, JPG, JSON</td><td>Aerial classification, object detection, instance segmentation</td><td>2019</td><td><a class="footnote-ref" id="fnref:185" href="#fn:185">185</a><a class="footnote-ref" id="fnref:186" href="#fn:186">186</a></td><td>Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai</td></tr><tr><td>Aerial Image Segmentation Dataset</td><td>80 high-resolution aerial images with spatial resolution ranging from 0.3 to 1.0 m.</td><td>Images manually segmented.</td><td>80</td><td>Images</td><td>Aerial classification, object detection</td><td>2013</td><td><a class="footnote-ref" id="fnref:187" href="#fn:187">187</a><a class="footnote-ref" id="fnref:188" href="#fn:188">188</a></td><td>J. Yuan et al.</td></tr><tr><td>KIT AIS Data Set</td><td>Multiple labeled training and evaluation datasets of aerial images of crowds.</td><td>Images manually labeled to show paths of individuals through crowds.</td><td>~150</td><td>Images with paths</td><td>People tracking, aerial tracking</td><td>2012</td><td><a class="footnote-ref" id="fnref:189" href="#fn:189">189</a><a class="footnote-ref" id="fnref:190" href="#fn:190">190</a></td><td>M. 
Butenuth et al.</td></tr><tr><td>Wilt Dataset</td><td>Remote sensing data of diseased trees and other land cover.</td><td>Various features extracted.</td><td>4,899</td><td>Images</td><td>Classification, aerial object detection</td><td>2014</td><td><a class="footnote-ref" id="fnref:191" href="#fn:191">191</a><a class="footnote-ref" id="fnref:192" href="#fn:192">192</a></td><td>B. Johnson</td></tr><tr><td>MASATI dataset</td><td>Maritime scenes of optical aerial images in the visible spectrum. Color images of dynamic marine environments; each image may contain one or more targets under different weather and illumination conditions.</td><td>Object bounding boxes and labeling.</td><td>7,389</td><td>Images</td><td>Classification, aerial object detection</td><td>2018</td><td><a class="footnote-ref" id="fnref:193" href="#fn:193">193</a><a class="footnote-ref" id="fnref:194" href="#fn:194">194</a></td><td>A.-J. Gallego et al.</td></tr><tr><td>Forest Type Mapping Dataset</td><td>Satellite imagery of forests in Japan.</td><td>Image wavelength bands extracted.</td><td>326</td><td>Text</td><td>Classification</td><td>2015</td><td><a class="footnote-ref" id="fnref:195" href="#fn:195">195</a><a class="footnote-ref" id="fnref:196" href="#fn:196">196</a></td><td>B. Johnson</td></tr><tr><td><a href="/facts/Overhead_Imagery_Research_Data_Set/a1SUqQh5">Overhead Imagery Research Data Set</a></td><td>Annotated overhead imagery with multiple objects per image.</td><td>Over 30 annotations and over 60 statistics that describe the target within the context of the image.</td><td>1,000</td><td>Images, text</td><td>Classification</td><td>2009</td><td><a class="footnote-ref" id="fnref:197" href="#fn:197">197</a><a class="footnote-ref" id="fnref:198" href="#fn:198">198</a></td><td>F. 
Tanner et al.</td></tr><tr><td>SpaceNet</td><td>Corpus of commercial satellite imagery and labeled training data.</td><td>GeoTIFF and GeoJSON files containing building footprints.</td><td>>17,533</td><td>Images</td><td>Classification, object identification</td><td>2017</td><td><a class="footnote-ref" id="fnref:199" href="#fn:199">199</a><a class="footnote-ref" id="fnref:200" href="#fn:200">200</a><a class="footnote-ref" id="fnref:201" href="#fn:201">201</a></td><td><a href="/facts/DigitalGlobe/t0uyAdux">DigitalGlobe, Inc.</a></td></tr><tr><td>UC Merced Land Use Dataset</td><td>Images manually extracted from larger images in the USGS National Map Urban Area Imagery collection, covering various urban areas around the US.</td><td>21 land-use classes with 100 images per class, intended for research purposes.</td><td>2,100</td><td>Image chips of 256×256 pixels, 30 cm (1 foot) GSD</td><td>Land cover classification</td><td>2010</td><td><a class="footnote-ref" id="fnref:202" href="#fn:202">202</a></td><td>Yi Yang and Shawn Newsam</td></tr><tr><td>SAT-4 Airborne Dataset</td><td>Images were extracted from the National Agriculture Imagery Program (NAIP) dataset.</td><td>Four broad land cover classes: barren land, trees, grassland, and a class containing all other land cover.</td><td>500,000</td><td>Images</td><td>Classification</td><td>2015</td><td><a class="footnote-ref" id="fnref:203" href="#fn:203">203</a><a class="footnote-ref" id="fnref:204" href="#fn:204">204</a></td><td>S. 
Basu et al.</td></tr><tr><td>SAT-6 Airborne Dataset</td><td>Images were extracted from the National Agriculture Imagery Program (NAIP) dataset.</td><td>Six broad land cover classes: barren land, trees, grassland, roads, buildings, and water bodies.</td><td>405,000</td><td>Images</td><td>Classification</td><td>2015</td><td><a class="footnote-ref" id="fnref:205" href="#fn:205">205</a><a class="footnote-ref" id="fnref:206" href="#fn:206">206</a></td><td>S. Basu et al.</td></tr></tbody></table>
<h2 id="underwater-images">Underwater images</h2>
<table><tbody><tr><th scope="col">Dataset name</th><th scope="col">Brief description</th><th scope="col">Preprocessing</th><th scope="col">Instances</th><th scope="col">Format</th><th scope="col">Default Task</th><th scope="col">Created (updated)</th><th scope="col">Reference</th><th scope="col">Creator</th></tr><tr><td>SUIM Dataset</td><td>Images collected during oceanic explorations and human-robot collaborative experiments, and annotated by human participants.</td><td>Images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, sea-floor, and waterbody background.</td><td>1,635</td><td>Images</td><td>Segmentation</td><td>2020</td><td><a class="footnote-ref" id="fnref:207" href="#fn:207">207</a></td><td>Md Jahidul Islam et al.</td></tr><tr><td>LIACI Dataset</td><td>Images collected during underwater ship inspections and annotated by human domain experts.</td><td>Images with pixel annotations for ten object categories: defects, corrosion, paint peel, marine growth, sea chest gratings, overboard valves, propeller, anodes, bilge keel, and ship hull.</td><td>1,893</td><td>Images</td><td>Segmentation</td><td>2022</td><td><a class="footnote-ref" id="fnref:208" href="#fn:208">208</a></td><td>Waszak et al.</td></tr></tbody></table>
<h2 id="other-images">Other images</h2>
<table><tbody><tr><th scope="col">Dataset name</th><th scope="col">Brief description</th><th scope="col">Preprocessing</th><th scope="col">Instances</th><th scope="col">Format</th><th scope="col">Default Task</th><th scope="col">Created (updated)</th><th scope="col">Reference</th><th scope="col">Creator</th></tr><tr><td>Kodak Lossless True Color Image Suite</td><td>RGB images for testing image compression.</td><td>None</td><td>24</td><td>Image</td><td>Image compression</td><td>1999</td><td><a class="footnote-ref" id="fnref:209" href="#fn:209">209</a></td><td><a href="/facts/Kodak/zMnK74Cp">Kodak</a></td></tr><tr><td>NRC-GAMMA</td><td>Benchmark dataset of gas meter images.</td><td>None</td><td>28,883</td><td>Image, Label</td><td>Classification</td><td>2021</td><td><a class="footnote-ref" id="fnref:210" href="#fn:210">210</a><a class="footnote-ref" id="fnref:211" href="#fn:211">211</a></td><td>A. Ebadi, P. Paul, S. Auer, & S. Tremblay</td></tr><tr><td>The SUPATLANTIQUE dataset</td><td>Images of scanned official and Wikipedia documents</td><td>None</td><td>4,908</td><td>TIFF, PDF</td><td>Source device identification, forgery detection, classification, etc.</td><td>2020</td><td><a class="footnote-ref" id="fnref:212" href="#fn:212">212</a></td><td>C. Ben Rabah et al.</td></tr><tr><td>Density functional theory quantum simulations of graphene</td><td>Labelled images of raw input to a simulation of graphene</td><td>Raw data (in HDF5 format) and output labels from density functional theory quantum simulation</td><td>60,744 test and 501,473 training files</td><td>Labeled images</td><td>Regression</td><td>2019</td><td><a class="footnote-ref" id="fnref:213" href="#fn:213">213</a></td><td>K. Mills & I. 
Tamblyn</td></tr><tr><td>Quantum simulations of an electron in a two-dimensional potential well</td><td>Labelled images of raw input to a simulation of 2D quantum mechanics</td><td>Raw data (in HDF5 format) and output labels from quantum simulation</td><td>1.3 million images</td><td>Labeled images</td><td>Regression</td><td>2017</td><td><a class="footnote-ref" id="fnref:214" href="#fn:214">214</a></td><td>K. Mills, M.A. Spanner, & I. Tamblyn</td></tr><tr><td>MPII Cooking Activities Dataset</td><td>Videos and images of various cooking activities.</td><td>Activity paths and directions, labels, fine-grained motion labeling, activity class, still image extraction and labeling.</td><td>881,755 frames</td><td>Labeled video, images, text</td><td>Classification</td><td>2012</td><td><a class="footnote-ref" id="fnref:215" href="#fn:215">215</a><a class="footnote-ref" id="fnref:216" href="#fn:216">216</a></td><td>M. Rohrbach et al.</td></tr><tr><td>FAMOS Dataset</td><td>5,000 unique microstructures; each sample was acquired 3 times with each of two different cameras.</td><td>Original PNG files, sorted per camera and then per acquisition. MATLAB data files with one 16384 × 5000 matrix per camera per acquisition.</td><td>30,000</td><td>Images and .mat files</td><td>Authentication</td><td>2012</td><td><a class="footnote-ref" id="fnref:217" href="#fn:217">217</a></td><td>S. Voloshynovskiy et al.</td></tr><tr><td>PharmaPack Dataset</td><td>1,000 unique classes with 54 images per class.</td><td>Class labeling; local descriptors such as SIFT and AKAZE; local feature aggregators such as Fisher Vectors (FV).</td><td>54,000</td><td>Images and .mat files</td><td>Fine-grained classification</td><td>2017</td><td><a class="footnote-ref" id="fnref:218" href="#fn:218">218</a></td><td>O. Taran and S. 
Rezaeifar, et al.</td></tr><tr><td>Stanford Dogs Dataset</td><td>Images of 120 breeds of dogs from around the world.</td><td>Train/test splits and ImageNet annotations provided.</td><td>20,580</td><td>Images, text</td><td>Fine-grained classification</td><td>2011</td><td><a class="footnote-ref" id="fnref:219" href="#fn:219">219</a><a class="footnote-ref" id="fnref:220" href="#fn:220">220</a></td><td>A. Khosla et al.</td></tr><tr><td>StanfordExtra Dataset</td><td>2D keypoints and segmentations for the Stanford Dogs Dataset.</td><td>2D keypoints and segmentations provided.</td><td>12,035</td><td>Labelled images</td><td>3D reconstruction/pose estimation</td><td>2020</td><td><a class="footnote-ref" id="fnref:221" href="#fn:221">221</a></td><td>B. Biggs et al.</td></tr><tr><td>The Oxford-IIIT Pet Dataset</td><td>37 categories of pets with roughly 200 images of each.</td><td>Breed labels, tight bounding boxes, foreground-background segmentation.</td><td>~7,400</td><td>Images, text</td><td>Classification, object detection</td><td>2012</td><td><a class="footnote-ref" id="fnref:222" href="#fn:222">222</a><a class="footnote-ref" id="fnref:223" href="#fn:223">223</a></td><td>O. Parkhi et al.</td></tr><tr><td>Corel Image Features Data Set</td><td>Features extracted from a database of images.</td><td>Many features, including color histogram, co-occurrence texture, and color moments.</td><td>68,040</td><td>Text</td><td>Classification, object detection</td><td>1999</td><td><a class="footnote-ref" id="fnref:224" href="#fn:224">224</a><a class="footnote-ref" id="fnref:225" href="#fn:225">225</a></td><td>M. Ortega-Binderberger et al.</td></tr><tr><td>Online Video Characteristics and Transcoding Time Dataset</td><td>Transcoding times for various videos and video properties.</td><td>Video features given.</td><td>168,286</td><td>Text</td><td>Regression</td><td>2015</td><td><a class="footnote-ref" id="fnref:226" href="#fn:226">226</a></td><td>T.
Deneke et al.</td></tr><tr><td>Microsoft Sequential Image Narrative Dataset (SIND)</td><td>Dataset for sequential vision-to-language tasks.</td><td>Descriptive captions and storytelling text for each photo, with photos arranged in sequences</td><td>81,743</td><td>Images, text</td><td>Visual storytelling</td><td>2016</td><td><a class="footnote-ref" id="fnref:227" href="#fn:227">227</a></td><td><a href="/facts/Microsoft_Research/04A5LpMI">Microsoft Research</a></td></tr><tr><td>Caltech-UCSD Birds-200-2011 Dataset</td><td>Large dataset of images of birds.</td><td>Part locations, bounding boxes, and 312 binary attributes given</td><td>11,788</td><td>Images, text</td><td>Classification</td><td>2011</td><td><a class="footnote-ref" id="fnref:228" href="#fn:228">228</a><a class="footnote-ref" id="fnref:229" href="#fn:229">229</a></td><td>C. Wah et al.</td></tr><tr><td>YouTube-8M</td><td>Large and diverse labeled video dataset</td><td>YouTube video IDs and associated labels from a diverse vocabulary of 4,800 visual entities</td><td>8 million</td><td>Video, text</td><td>Video classification</td><td>2016</td><td><a class="footnote-ref" id="fnref:230" href="#fn:230">230</a><a class="footnote-ref" id="fnref:231" href="#fn:231">231</a></td><td>S. Abu-El-Haija et al.</td></tr><tr><td>YFCC100M</td><td>Large and diverse labeled image and video dataset</td><td>Flickr videos and images with associated descriptions, titles, tags, and other metadata (such as EXIF and geotags)</td><td>100 million</td><td>Video, Image, Text</td><td>Video and image classification</td><td>2016</td><td><a class="footnote-ref" id="fnref:232" href="#fn:232">232</a><a class="footnote-ref" id="fnref:233" href="#fn:233">233</a></td><td>B.
Thomee et al.</td></tr><tr><td>Discrete LIRIS-ACCEDE</td><td>Short videos annotated for valence and arousal.</td><td>Valence and arousal labels.</td><td>9,800</td><td>Video</td><td>Video emotion elicitation detection</td><td>2015</td><td><a class="footnote-ref" id="fnref:234" href="#fn:234">234</a></td><td>Y. Baveye et al.</td></tr><tr><td>Continuous LIRIS-ACCEDE</td><td>Long videos annotated for valence and arousal, with galvanic skin response also recorded.</td><td>Valence and arousal labels.</td><td>30</td><td>Video</td><td>Video emotion elicitation detection</td><td>2015</td><td><a class="footnote-ref" id="fnref:235" href="#fn:235">235</a></td><td>Y. Baveye et al.</td></tr><tr><td>MediaEval LIRIS-ACCEDE</td><td>Extension of Discrete LIRIS-ACCEDE including annotations for violence levels of the films.</td><td>Violence, valence, and arousal labels.</td><td>10,900</td><td>Video</td><td>Video emotion elicitation detection</td><td>2015</td><td><a class="footnote-ref" id="fnref:236" href="#fn:236">236</a></td><td>Y. Baveye et al.</td></tr><tr><td>Leeds Sports Pose</td><td>Articulated human pose annotations in 2,000 natural sports images from Flickr.</td><td>Rough crop around a single person of interest, with 14 joint labels</td><td>2,000</td><td>Images plus .mat file labels</td><td>Human pose estimation</td><td>2010</td><td><a class="footnote-ref" id="fnref:237" href="#fn:237">237</a></td><td>S. Johnson and M. Everingham</td></tr><tr><td>Leeds Sports Pose Extended Training</td><td>Articulated human pose annotations in 10,000 natural sports images from Flickr.</td><td>14 joint labels via crowdsourcing</td><td>10,000</td><td>Images plus .mat file labels</td><td>Human pose estimation</td><td>2011</td><td><a class="footnote-ref" id="fnref:238" href="#fn:238">238</a></td><td>S. Johnson and M.
Everingham</td></tr><tr><td>MCQ Dataset</td><td>Six real multiple-choice exams (735 answer sheets and 33,540 answer boxes) for evaluating computer vision techniques and systems developed for multiple-choice test assessment.</td><td>None</td><td>735 answer sheets and 33,540 answer boxes</td><td>Images and .mat file labels</td><td>Development of multiple-choice test assessment systems</td><td>2017</td><td><a class="footnote-ref" id="fnref:239" href="#fn:239">239</a><a class="footnote-ref" id="fnref:240" href="#fn:240">240</a></td><td>M. Afifi et al.</td></tr><tr><td>Surveillance Videos</td><td>Real surveillance videos covering a long period (7 days of 24 hours each).</td><td>None</td><td>19 surveillance videos (7 days of 24 hours each)</td><td>Videos</td><td>Data compression</td><td>2016</td><td><a class="footnote-ref" id="fnref:241" href="#fn:241">241</a></td><td>I. A. T. F. Taj-Eddin et al.</td></tr><tr><td>LILA BC</td><td>Labeled Information Library of Alexandria: Biology and Conservation. Labeled images that support machine learning research in ecology and environmental science.</td><td>None</td><td>~10M images</td><td>Images</td><td>Classification</td><td>2019</td><td><a class="footnote-ref" id="fnref:242" href="#fn:242">242</a></td><td>LILA working group</td></tr><tr><td>Can We See Photosynthesis?</td><td>32 videos of eight live and eight dead leaves recorded under both DC and AC lighting conditions.</td><td>None</td><td>32 videos</td><td>Videos</td><td>Liveness detection of plants</td><td>2017</td><td><a class="footnote-ref" id="fnref:243" href="#fn:243">243</a></td><td>I. A. T. F. Taj-Eddin
et al.</td></tr><tr><td>Mathematical Mathematics Memes</td><td>Collection of about 10,000 memes on mathematics.</td><td>None</td><td>~10,000</td><td>Images</td><td>Visual storytelling, object detection</td><td>2021</td><td><a class="footnote-ref" id="fnref:244" href="#fn:244">244</a></td><td>Mathematical Mathematics Memes</td></tr><tr><td>Flickr-Faces-HQ Dataset</td><td>Collection of images, each containing a face, crawled from Flickr.</td><td>Pruned with "various automatic filters" and cropped and aligned to faces; images of statues, paintings, or photos of photos were removed via crowdsourcing.</td><td>70,000</td><td>Images</td><td>Face generation</td><td>2019</td><td><a class="footnote-ref" id="fnref:245" href="#fn:245">245</a></td><td>Karras et al.</td></tr><tr><td>Fruits-360 dataset</td><td>Collection of images of 170 classes of fruits, vegetables, nuts, and seeds.</td><td>100×100 pixels, white background.</td><td>115,499</td><td>Images (JPG)</td><td>Classification</td><td>2017–2025</td><td><a class="footnote-ref" id="fnref:246" href="#fn:246">246</a></td><td>Mihai Oltean</td></tr></tbody></table>

<h2 id="references">References</h2>

<ol>
<li id="fn:1">Bottou, L.; Cortes, C.; Denker, J.S.; Drucker, H.; Guyon, I.; Jackel, L.D.; LeCun, Y.; Muller, U.A.; Sackinger, E.; Simard, P.; Vapnik, V. (1994). "Comparison of classifier methods: A case study in handwritten digit recognition". Proceedings of the 12th IAPR International Conference on Pattern Recognition (Cat. No.94CH3440-5). Vol. 2. IEEE Comput. Soc. Press. pp. 77–82. doi:10.1109/ICPR.1994.576879. ISBN 978-0-8186-6270-6. <a href="https://doi.org/10.1109/ICPR.1994.576879" target="_blank">https://doi.org/10.1109/ICPR.1994.576879</a> <a href="#fnref:1" class="footnote-back-ref">↩</a></li>
<li id="fn:2">"NIST Special Database 19". NIST. 2010-08-27. <a href="https://www.nist.gov/srd/nist-special-database-19" target="_blank">https://www.nist.gov/srd/nist-special-database-19</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></li>
<li id="fn:3">LeCun, Yann. "NORB: Generic Object Recognition in Images". cs.nyu.edu. Retrieved 2025-04-26. <a href="https://cs.nyu.edu/~yann/research/norb/" target="_blank">https://cs.nyu.edu/~yann/research/norb/</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></li>
<li id="fn:4">LeCun, Y.; Huang, Fu Jie; Bottou, L. (2004). "Learning methods for generic object recognition with invariance to pose and lighting". Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Vol. 2. IEEE. pp. 97–104. doi:10.1109/CVPR.2004.1315150. ISBN 978-0-7695-2158-9. <a href="https://doi.org/10.1109/CVPR.2004.1315150" target="_blank">https://doi.org/10.1109/CVPR.2004.1315150</a> <a href="#fnref:4" class="footnote-back-ref">↩</a></li>
<li id="fn:5">Torralba, A.; Fergus, R.; Freeman, W.T. (November 2008). "80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 30 (11): 1958–1970. doi:10.1109/TPAMI.2008.128. ISSN 0162-8828. PMID 18787244. <a href="https://ieeexplore.ieee.org/document/4531741" target="_blank">https://ieeexplore.ieee.org/document/4531741</a> <a href="#fnref:5" class="footnote-back-ref">↩</a></li>
<li id="fn:6">"The Street View House Numbers (SVHN) Dataset". ufldl.stanford.edu. Retrieved 2025-02-25. <a href="http://ufldl.stanford.edu/housenumbers/" target="_blank">http://ufldl.stanford.edu/housenumbers/</a> <a href="#fnref:6" class="footnote-back-ref">↩</a></li>
<li id="fn:7">Netzer, Yuval; Wang, Tao; Coates, Adam; Bissacco, Alessandro; Wu, Bo; Ng, Andrew Y. (2011). "Reading Digits in Natural Images with Unsupervised Feature Learning". NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011. <a href="http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf" target="_blank">http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf</a> <a href="#fnref:7" class="footnote-back-ref">↩</a></li>
<li id="fn:8">Hinton, Geoffrey; Vinyals, Oriol; Dean, Jeff (2015-03-09). "Distilling the Knowledge in a Neural Network". arXiv:1503.02531 [stat.ML]. <a href="https://arxiv.org/abs/1503.02531" target="_blank">https://arxiv.org/abs/1503.02531</a> <a href="#fnref:8" class="footnote-back-ref">↩</a></li>
<li id="fn:9">Sun, Chen; Shrivastava, Abhinav; Singh, Saurabh; Gupta, Abhinav (2017). "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era". pp. 843–852. arXiv:1707.02968 [cs.CV]. <a href="https://arxiv.org/abs/1707.02968" target="_blank">https://arxiv.org/abs/1707.02968</a> <a href="#fnref:9" class="footnote-back-ref">↩</a></li>
<li id="fn:10">Abnar, Samira; Dehghani, Mostafa; Neyshabur, Behnam; Sedghi, Hanie (2021-10-05). "Exploring the Limits of Large Scale Pre-training". arXiv:2110.02095 [cs.LG]. <a href="https://arxiv.org/abs/2110.02095" target="_blank">https://arxiv.org/abs/2110.02095</a> <a href="#fnref:10" class="footnote-back-ref">↩</a></li>
<li id="fn:11">Zhai, Xiaohua; Kolesnikov, Alexander; Houlsby, Neil; Beyer, Lucas (2021-06-08). "Scaling Vision Transformers". arXiv:2106.04560 [cs.CV]. <a href="https://arxiv.org/abs/2106.04560" target="_blank">https://arxiv.org/abs/2106.04560</a> <a href="#fnref:11" class="footnote-back-ref">↩</a></li>
<li id="fn:12">Zhou, Bolei; Lapedriza, Agata; Khosla, Aditya; Oliva, Aude; Torralba, Antonio (2018-06-01). "Places: A 10 Million Image Database for Scene Recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 40 (6): 1452–1464. doi:10.1109/TPAMI.2017.2723009. ISSN 0162-8828. PMID 28692961. <a href="https://ieeexplore.ieee.org/document/7968387" target="_blank">https://ieeexplore.ieee.org/document/7968387</a> <a href="#fnref:12" class="footnote-back-ref">↩</a></li>
<li id="fn:13">Grauman, Kristen; Westbury, Andrew; Byrne, Eugene; Chavis, Zachary; Furnari, Antonino; Girdhar, Rohit; Hamburger, Jackson; Jiang, Hao; Liu, Miao; Liu, Xingyu; Martin, Miguel; Nagarajan, Tushar; Radosavovic, Ilija; Ramakrishnan, Santhosh Kumar; Ryan, Fiona; Sharma, Jayant; Wray, Michael; Xu, Mengmeng; Xu, Eric Zhongcong; Zhao, Chen; Bansal, Siddhant; Batra, Dhruv; Cartillier, Vincent; Crane, Sean; Do, Tien; Doulaty, Morrie; Erapalli, Akshay; Feichtenhofer, Christoph; Fragomeni, Adriano; Fu, Qichen; Gebreselasie, Abrham; Gonzalez, Cristina; Hillis, James; Huang, Xuhua; Huang, Yifei; Jia, Wenqi; Khoo, Weslie; Kolar, Jachym; Kottur, Satwik; Kumar, Anurag; Landini, Federico; Li, Chao; Li, Yanghao; Li, Zhenqiang; Mangalam, Karttikeya; Modhugu, Raghava; Munro, Jonathan; Murrell, Tullie; Nishiyasu, Takumi; Price, Will; Puentes, Paola Ruiz; Ramazanova, Merey; Sari, Leda; Somasundaram, Kiran; Southerland, Audrey; Sugano, Yusuke; Tao, Ruijie; Vo, Minh; Wang, Yuchen; Wu, Xindi; Yagi, Takuma; Zhao, Ziwei; Zhu, Yunyi; Arbelaez, Pablo; Crandall, David; Damen, Dima; Farinella, Giovanni Maria; Fuegen, Christian; Ghanem, Bernard; Ithapu, Vamsi Krishna; Jawahar, C. V.; Joo, Hanbyul; Kitani, Kris; Li, Haizhou; Newcombe, Richard; Oliva, Aude; Park, Hyun Soo; Rehg, James M.; Sato, Yoichi; Shi, Jianbo; Shou, Mike Zheng; Torralba, Antonio; Torresani, Lorenzo; Yan, Mingfei; Malik, Jitendra (2022). "Ego4D: Around the World in 3,000 Hours of Egocentric Video". arXiv:2110.07058 [cs.CV]. <a href="https://arxiv.org/abs/2110.07058" target="_blank">https://arxiv.org/abs/2110.07058</a> <a href="#fnref:13" class="footnote-back-ref">↩</a></li>
<li id="fn:14">Srinivasan, Krishna; Raman, Karthik; Chen, Jiecao; Bendersky, Michael; Najork, Marc (2021-07-11). "WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning". Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM. pp. 2443–2449. arXiv:2103.01913. doi:10.1145/3404835.3463257. ISBN 978-1-4503-8037-9. <a href="https://doi.org/10.1145/3404835.3463257" target="_blank">https://doi.org/10.1145/3404835.3463257</a> <a href="#fnref:14" class="footnote-back-ref">↩</a></li>
<li id="fn:15">Krishna, Ranjay; Zhu, Yuke; Groth, Oliver; Johnson, Justin; Hata, Kenji; Kravitz, Joshua; Chen, Stephanie; Kalantidis, Yannis; Li, Li-Jia; Shamma, David A; Bernstein, Michael S; Fei-Fei, Li (2017). "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations". International Journal of Computer Vision. 123: 32–73. arXiv:1602.07332. doi:10.1007/s11263-016-0981-7. S2CID 4492210. <a href="https://arxiv.org/abs/1602.07332" target="_blank">https://arxiv.org/abs/1602.07332</a> <a href="#fnref:15" class="footnote-back-ref">↩</a></li>
<li id="fn:16">Karayev, S., et al. "A category-level 3-D object dataset: putting the Kinect to work." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2011. <a href="http://alliejanoch.com/iccvw2011.pdf" target="_blank">http://alliejanoch.com/iccvw2011.pdf</a> <a href="#fnref:16" class="footnote-back-ref">↩</a></li>
<li id="fn:17">Tighe, Joseph, and Svetlana Lazebnik. "Superparsing: scalable nonparametric image parsing with superpixels." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 352–365. <a href="#fnref:17" class="footnote-back-ref">↩</a></li>
<li id="fn:18">Arbelaez, P.; Maire, M.; Fowlkes, C.; Malik, J. (May 2011). "Contour Detection and Hierarchical Image Segmentation" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 33 (5): 898–916. doi:10.1109/tpami.2010.161. PMID 20733228. S2CID 206764694. Retrieved 27 February 2016. <a href="http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/papers/amfm_pami2010.pdf" target="_blank">http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/papers/amfm_pami2010.pdf</a> <a href="#fnref:18" class="footnote-back-ref">↩</a></li>
<li id="fn:19">Lin, Tsung-Yi; Maire, Michael; Belongie, Serge; Bourdev, Lubomir; Girshick, Ross; Hays, James; Perona, Pietro; Ramanan, Deva; Lawrence Zitnick, C.; Dollár, Piotr (2014). "Microsoft COCO: Common Objects in Context". arXiv:1405.0312 [cs.CV]. <a href="https://arxiv.org/abs/1405.0312" target="_blank">https://arxiv.org/abs/1405.0312</a> <a href="#fnref:19" class="footnote-back-ref">↩</a></li>
<li id="fn:20">Russakovsky, Olga; et al. (2015). "Imagenet large scale visual recognition challenge". International Journal of Computer Vision. 115 (3): 211–252. arXiv:1409.0575. doi:10.1007/s11263-015-0816-y. hdl:1721.1/104944. S2CID 2930547. <a href="https://arxiv.org/abs/1409.0575" target="_blank">https://arxiv.org/abs/1409.0575</a> <a href="#fnref:20" class="footnote-back-ref">↩</a></li>
<li id="fn:21">"COCO – Common Objects in Context". cocodataset.org. <a href="https://cocodataset.org/" target="_blank">https://cocodataset.org/</a> <a href="#fnref:21" class="footnote-back-ref">↩</a></li>
<li id="fn:22">Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database." Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009. <a href="https://www.researchgate.net/profile/Li_Jia_Li/publication/221361415_ImageNet_a_Large-Scale_Hierarchical_Image_Database/links/00b495388120dbc339000000/ImageNet-a-Large-Scale-Hierarchical-Image-Database.pdf" target="_blank">https://www.researchgate.net/profile/Li_Jia_Li/publication/221361415_ImageNet_a_Large-Scale_Hierarchical_Image_Database/links/00b495388120dbc339000000/ImageNet-a-Large-Scale-Hierarchical-Image-Database.pdf</a> <a href="#fnref:22" class="footnote-back-ref">↩</a></li>
<li id="fn:23">Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012. <a href="http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf" target="_blank">http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf</a> <a href="#fnref:23" class="footnote-back-ref">↩</a></li>
<li id="fn:24">Russakovsky, Olga; Deng, Jia; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; et al. (11 April 2015). "ImageNet Large Scale Visual Recognition Challenge". International Journal of Computer Vision. 115 (3): 211–252. arXiv:1409.0575. doi:10.1007/s11263-015-0816-y. hdl:1721.1/104944. S2CID 2930547. <a href="https://arxiv.org/abs/1409.0575" target="_blank">https://arxiv.org/abs/1409.0575</a> <a href="#fnref:24" class="footnote-back-ref">↩</a></li>
<li id="fn:25">Xiao, Jianxiong; Hays, James; Ehinger, Krista A.; Oliva, Aude; Torralba, Antonio (June 2010). "SUN database: Large-scale scene recognition from abbey to zoo". 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE. pp. 3485–3492. doi:10.1109/cvpr.2010.5539970. hdl:1721.1/60690. ISBN 978-1-4244-6984-0. <a href="https://doi.org/10.1109/cvpr.2010.5539970" target="_blank">https://doi.org/10.1109/cvpr.2010.5539970</a> <a href="#fnref:25" class="footnote-back-ref">↩</a></li>
<li id="fn:26">Donahue, Jeff; Jia, Yangqing; Vinyals, Oriol; Hoffman, Judy; Zhang, Ning; Tzeng, Eric; Darrell, Trevor (2013). "DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition". arXiv:1310.1531 [cs.CV]. <a href="https://arxiv.org/abs/1310.1531" target="_blank">https://arxiv.org/abs/1310.1531</a> <a href="#fnref:26" class="footnote-back-ref">↩</a></li>
<li id="fn:27">Yu, Fisher; Seff, Ari; Zhang, Yinda; Song, Shuran; Funkhouser, Thomas; Xiao, Jianxiong (2016-06-04). "LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop". arXiv:1506.03365 [cs.CV]. <a href="https://arxiv.org/abs/1506.03365" target="_blank">https://arxiv.org/abs/1506.03365</a> <a href="#fnref:27" class="footnote-back-ref">↩</a></li>
<li id="fn:28">"Index of /lsun/". dl.yf.io. Retrieved 2024-09-19. <a href="http://dl.yf.io/lsun/" target="_blank">http://dl.yf.io/lsun/</a> <a href="#fnref:28" class="footnote-back-ref">↩</a></li>
<li id="fn:29">"LSUN". Complex Adaptive Systems Laboratory. Retrieved 2024-09-19. <a href="https://complexity.cecs.ucf.edu/lsun/" target="_blank">https://complexity.cecs.ucf.edu/lsun/</a> <a href="#fnref:29" class="footnote-back-ref">↩</a></li>
<li id="fn:30">Gupta, Agrim; Dollár, Piotr; Girshick, Ross (2019). "LVIS: A Dataset for Large Vocabulary Instance Segmentation". Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5356–5364. <a href="https://openaccess.thecvf.com/content_CVPR_2019/html/Gupta_LVIS_A_Dataset_for_Large_Vocabulary_Instance_Segmentation_CVPR_2019_paper.html" target="_blank">https://openaccess.thecvf.com/content_CVPR_2019/html/Gupta_LVIS_A_Dataset_for_Large_Vocabulary_Instance_Segmentation_CVPR_2019_paper.html</a> <a href="#fnref:30" class="footnote-back-ref">↩</a></li>
<li id="fn:31">Ivan Krasin, Tom Duerig, Neil Alldrin, Andreas Veit, Sami Abu-El-Haija, Serge Belongie, David Cai, Zheyun Feng, Vittorio Ferrari, Victor Gomes, Abhinav Gupta, Dhyanesh Narayanan, Chen Sun, Gal Chechik, Kevin Murphy. "OpenImages: A public dataset for large-scale multi-label and multi-class image classification". 2017. Available from https://github.com/openimages. <a href="https://github.com/openimages" target="_blank">https://github.com/openimages</a> <a href="#fnref:31" class="footnote-back-ref">↩</a></li>
<li id="fn:32">Vyas, Apoorv, et al. "Commercial Block Detection in Broadcast News Videos." Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing. ACM, 2014. <a href="https://dl.acm.org/citation.cfm?id=2683546" target="_blank">https://dl.acm.org/citation.cfm?id=2683546</a> <a href="#fnref:32" class="footnote-back-ref">↩</a></li>
<li id="fn:33">Hauptmann, Alexander G., and Michael J. Witbrock. "Story segmentation and detection of commercials in broadcast news video." Research and Technology Advances in Digital Libraries, 1998. ADL 98. Proceedings. IEEE International Forum on. IEEE, 1998. <a href="https://pdfs.semanticscholar.org/5c21/6db7892fa3f515d816f84893bfab1137f0b2.pdf" target="_blank">https://pdfs.semanticscholar.org/5c21/6db7892fa3f515d816f84893bfab1137f0b2.pdf</a> <a href="#fnref:33" class="footnote-back-ref">↩</a></li>
<li id="fn:34">Tung, Anthony K. H., Xin Xu, and Beng Chin Ooi. "CURLER: finding and visualizing nonlinear correlated clusters." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. ACM, 2005. <a href="https://www.researchgate.net/profile/Anthony_Tung/publication/221214229_CURLER_Finding_and_Visualizing_Nonlinear_Correlated_Clusters/links/55b8691a08aed621de05cd92.pdf" target="_blank">https://www.researchgate.net/profile/Anthony_Tung/publication/221214229_CURLER_Finding_and_Visualizing_Nonlinear_Correlated_Clusters/links/55b8691a08aed621de05cd92.pdf</a> <a href="#fnref:34" class="footnote-back-ref">↩</a></li>
<li id="fn:35">Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?" Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009. <a href="https://ieeexplore.ieee.org/abstract/document/5459469/" target="_blank">https://ieeexplore.ieee.org/abstract/document/5459469/</a> <a href="#fnref:35" class="footnote-back-ref">↩</a></li>
<li id="fn:36">Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006. <a href="#fnref:36" class="footnote-back-ref">↩</a></li>
<li id="fn:37">Griffin, G., A. Holub, and P. Perona. "Caltech-256 object category dataset." Technical Report 7694, California Institute of Technology, 2007. <a href="http://authors.library.caltech.edu/7694" target="_blank">http://authors.library.caltech.edu/7694</a> <a href="#fnref:37" class="footnote-back-ref">↩</a></li>
<li id="fn:38">Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto. Modern information retrieval. Vol. 463. New York: ACM press, 1999. <a href="#fnref:38" class="footnote-back-ref">↩</a></li>
<li id="fn:39">"🐺 COYO-700M: Image-Text Pair Dataset". Kakao Brain. 2022-11-03. Retrieved 2022-11-03. <a href="https://github.com/kakaobrain/coyo-dataset" target="_blank">https://github.com/kakaobrain/coyo-dataset</a> <a href="#fnref:39" class="footnote-back-ref">↩</a></li>
<li id="fn:40">Fu, Xiping, et al. "NOKMeans: Non-Orthogonal K-means Hashing." Computer Vision—ACCV 2014. Springer International Publishing, 2014. 162–177. <a href="https://pdfs.semanticscholar.org/9da2/abae3072fd9fcff0e13b8f00fc21f22d0085.pdf" target="_blank">https://pdfs.semanticscholar.org/9da2/abae3072fd9fcff0e13b8f00fc21f22d0085.pdf</a> <a href="#fnref:40" class="footnote-back-ref">↩</a></li>
<li id="fn:41">Heitz, Geremy; et al. (2009). "Shape-based object localization for descriptive classification". International Journal of Computer Vision. 84 (1): 40–62. CiteSeerX 10.1.1.142.280. doi:10.1007/s11263-009-0228-y. S2CID 646320. <a href="https://doi.org/10.1007/s11263-009-0228-y" target="_blank">https://doi.org/10.1007/s11263-009-0228-y</a> <a href="#fnref:41" class="footnote-back-ref">↩</a></li>
<li id="fn:42">Everingham, Mark; et al. (2010). "The PASCAL Visual Object Classes (VOC) Challenge". International Journal of Computer Vision. 88 (2): 303–338. doi:10.1007/s11263-009-0275-4. hdl:20.500.11820/88a29de3-6220-442b-ab2d-284210cf72d6. S2CID 4246903. <a href="https://www.research.ed.ac.uk/portal/en/publications/the-pascal-visual-object-classes-voc-challenge(88a29de3-6220-442b-ab2d-284210cf72d6).html" target="_blank">https://www.research.ed.ac.uk/portal/en/publications/the-pascal-visual-object-classes-voc-challenge(88a29de3-6220-442b-ab2d-284210cf72d6).html</a> <a href="#fnref:42" class="footnote-back-ref">↩</a></li>
<li id="fn:43">Felzenszwalb, Pedro F.; et al. (2010). "Object detection with discriminatively trained part-based models". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (9): 1627–1645. CiteSeerX 10.1.1.153.2745. doi:10.1109/tpami.2009.167. PMID 20634557. S2CID 3198903. <a href="https://doi.org/10.1109/tpami.2009.167" target="_blank">https://doi.org/10.1109/tpami.2009.167</a> <a href="#fnref:43" class="footnote-back-ref">↩</a></li>
<li id="fn:44">Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012. <a href="http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf" target="_blank">http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf</a> <a href="#fnref:44" class="footnote-back-ref">↩</a></li>
<li id="fn:45">Gong, Yunchao, and Svetlana Lazebnik. "Iterative quantization: A procrustean approach to learning binary codes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011. <a href="#fnref:45" class="footnote-back-ref">↩</a></li>
<li id="fn:46">Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012. <a href="http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf" target="_blank">http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf</a> <a href="#fnref:46" class="footnote-back-ref">↩</a></li>
<li id="fn:47">Gong, Yunchao, and Svetlana Lazebnik. "Iterative quantization: A procrustean approach to learning binary codes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011. <a href="#fnref:47" class="footnote-back-ref">↩</a></li>
<li id="fn:48">"CINIC-10 dataset". Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey (2018) CINIC-10 is not ImageNet or CIFAR-10. 2018-10-09. Retrieved 2018-11-13. <a href="http://www.bayeswatch.com/2018/10/09/CINIC/" target="_blank">http://www.bayeswatch.com/2018/10/09/CINIC/</a> <a href="#fnref:48" class="footnote-back-ref">↩</a></li>
<li id="fn:49">"fashion-mnist: A MNIST-like fashion product database. Benchmark". Zalando Research. 2017-10-07. Retrieved 2017-10-07. <a href="https://github.com/zalandoresearch/fashion-mnist" target="_blank">https://github.com/zalandoresearch/fashion-mnist</a> <a href="#fnref:49" class="footnote-back-ref">↩</a></li>
<li id="fn:50">"notMNIST dataset". Machine Learning, etc. 2011-09-08. Retrieved 2017-10-13. <a href="http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html" target="_blank">http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html</a> <a href="#fnref:50" class="footnote-back-ref">↩</a></li>
<li id="fn:51">Chaladze, G., Kalatozishvili, L. (2017). Linnaeus 5 dataset. Chaladze.com. Retrieved 13 November 2017, from http://chaladze.com/l5/ <a href="http://chaladze.com/l5/" target="_blank">http://chaladze.com/l5/</a> <a href="#fnref:51" class="footnote-back-ref">↩</a></li>
<li id="fn:52">Afifi, Mahmoud (2017-11-12). "Gender recognition and biometric identification using a large dataset of hand images". arXiv:1711.04322 [cs.CV]. <a href="https://arxiv.org/abs/1711.04322" target="_blank">https://arxiv.org/abs/1711.04322</a> <a href="#fnref:52" class="footnote-back-ref">↩</a></li>
<li id="fn:53">Lomonaco, Vincenzo; Maltoni, Davide (2017-10-18). "CORe50: a New Dataset and Benchmark for Continuous Object Recognition". arXiv:1705.03550 [cs.CV]. <a href="https://arxiv.org/abs/1705.03550" target="_blank">https://arxiv.org/abs/1705.03550</a> <a href="#fnref:53" class="footnote-back-ref">↩</a></li>
<li id="fn:54">She, Qi; Feng, Fan; Hao, Xinyue; Yang, Qihan; Lan, Chuanlin; Lomonaco, Vincenzo; Shi, Xuesong; Wang, Zhengwei; Guo, Yao; Zhang, Yimin; Qiao, Fei; Chan, Rosa H.M. (2019-11-15). "OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning". arXiv:1911.06487v2 [cs.CV]. <a href="https://arxiv.org/abs/1911.06487v2" target="_blank">https://arxiv.org/abs/1911.06487v2</a> <a href="#fnref:54" class="footnote-back-ref">↩</a></li>
<li id="fn:55">Morozov, Alexei; Sushkova, Olga (2019-06-13). "THz and thermal video data set". Development of the multi-agent logic programming approach to a human behaviour analysis in a multi-channel video surveillance. Moscow: IRE RAS. Retrieved 2019-07-19. <a href="http://www.fullvision.ru/monitoring/description_eng.php" target="_blank">http://www.fullvision.ru/monitoring/description_eng.php</a> <a href="#fnref:55" class="footnote-back-ref">↩</a></li>
<li id="fn:56">Morozov, Alexei; Sushkova, Olga; Kershner, Ivan; Polupanov, Alexander (2019-07-09). "Development of a method of terahertz intelligent video surveillance based on the semantic fusion of terahertz and 3D video images" (PDF). CEUR. 2391: paper19. Retrieved 2019-07-19. <a href="http://ceur-ws.org/Vol-2391/paper19.pdf" target="_blank">http://ceur-ws.org/Vol-2391/paper19.pdf</a> <a href="#fnref:56" class="footnote-back-ref">↩</a></li>
<li id="fn:57">Calli, Berk; Walsman, Aaron; Singh, Arjun; Srinivasa, Siddhartha; Abbeel, Pieter; Dollar, Aaron M. (September 2015). "Benchmarking in Manipulation Research: Using the Yale-CMU-Berkeley Object and Model Set". IEEE Robotics & Automation Magazine. 22 (3): 36–52. arXiv:1502.03143. doi:10.1109/MRA.2015.2448951. ISSN 1070-9932. <a href="https://ieeexplore.ieee.org/document/7254318" target="_blank">https://ieeexplore.ieee.org/document/7254318</a> <a href="#fnref:57" class="footnote-back-ref">↩</a></li>
<li id="fn:58">Downs, Laura; Francis, Anthony; Koenig, Nate; Kinman, Brandon; Hickman, Ryan; Reymann, Krista; McHugh, Thomas B.; Vanhoucke, Vincent (2022-05-23). "Google Scanned Objects: A High-Quality Dataset of 3D Scanned Household Items". 2022 International Conference on Robotics and Automation (ICRA). IEEE. pp. 2553–2560. arXiv:2204.11918. doi:10.1109/ICRA46639.2022.9811809. ISBN 978-1-7281-9681-7. <a href="https://doi.org/10.1109/ICRA46639.2022.9811809" target="_blank">https://doi.org/10.1109/ICRA46639.2022.9811809</a> <a href="#fnref:58" class="footnote-back-ref">↩</a></li>
<li id="fn:59">"Princeton Shape Benchmark". shape.cs.princeton.edu. Retrieved 2025-03-07. <a href="https://shape.cs.princeton.edu/benchmark/main.html" target="_blank">https://shape.cs.princeton.edu/benchmark/main.html</a> <a href="#fnref:59" class="footnote-back-ref">↩</a></li>
<li id="fn:60">Shilane, P.; Min, P.; Kazhdan, M.; Funkhouser, T. (2004). "The Princeton Shape Benchmark". Proceedings Shape Modeling Applications, 2004. IEEE. pp. 167–178. doi:10.1109/SMI.2004.1314504. ISBN 978-0-7695-2075-9. <a href="https://doi.org/10.1109/SMI.2004.1314504" target="_blank">https://doi.org/10.1109/SMI.2004.1314504</a> <a href="#fnref:60" class="footnote-back-ref">↩</a></li>
<li id="fn:61">Janoch, Allison; Karayev, Sergey; Jia, Yangqing; Barron, Jonathan T.; Fritz, Mario; Saenko, Kate; Darrell, Trevor (2013), Fossati, Andrea; Gall, Juergen; Grabner, Helmut; Ren, Xiaofeng (eds.), "A Category-Level 3D Object Dataset: Putting the Kinect to Work", Consumer Depth Cameras for Computer Vision: Research Topics and Applications, London: Springer, pp. 141–165, doi:10.1007/978-1-4471-4640-7_8, ISBN 978-1-4471-4640-7, retrieved 2025-03-07 <a href="https://doi.org/10.1007/978-1-4471-4640-7_8" target="_blank">https://doi.org/10.1007/978-1-4471-4640-7_8</a> <a href="#fnref:61" class="footnote-back-ref">↩</a></li>
<li id="fn:62">Chang, Angel X.; Funkhouser, Thomas; Guibas, Leonidas; Hanrahan, Pat; Huang, Qixing; Li, Zimo; Savarese, Silvio; Savva, Manolis; Song, Shuran (2015-12-09). "ShapeNet: An Information-Rich 3D Model Repository". arXiv:1512.03012 [cs.GR]. <a href="https://arxiv.org/abs/1512.03012" target="_blank">https://arxiv.org/abs/1512.03012</a> <a href="#fnref:62" class="footnote-back-ref">↩</a></li>
<li id="fn:63">"Computational Vision and Geometry Lab". cvgl.stanford.edu. Retrieved 2025-03-07. <a href="https://cvgl.stanford.edu/projects/objectnet3d/" target="_blank">https://cvgl.stanford.edu/projects/objectnet3d/</a> <a href="#fnref:63" class="footnote-back-ref">↩</a></li>
<li id="fn:64">Xiang, Yu; Kim, Wonhui; Chen, Wei; Ji, Jingwei; Choy, Christopher; Su, Hao; Mottaghi, Roozbeh; Guibas, Leonidas; Savarese, Silvio (2016). "ObjectNet3D: A Large Scale Database for 3D Object Recognition". In Leibe, Bastian; Matas, Jiri; Sebe, Nicu; Welling, Max (eds.). Computer Vision – ECCV 2016. Lecture Notes in Computer Science. Vol. 9912. Cham: Springer International Publishing. pp. 160–176. doi:10.1007/978-3-319-46484-8_10. ISBN 978-3-319-46484-8. <a href="https://doi.org/10.1007/978-3-319-46484-8_10" target="_blank">https://doi.org/10.1007/978-3-319-46484-8_10</a> <a href="#fnref:64" class="footnote-back-ref">↩</a></li>
<li id="fn:65">Reizenstein, Jeremy; Shapovalov, Roman; Henzler, Philipp; Sbordone, Luca; Labatut, Patrick; Novotny, David (2021). "Common Objects in 3D: Large-Scale Learning and Evaluation of Real-Life 3D Category Reconstruction". Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 10901–10911. <a href="https://openaccess.thecvf.com/content/ICCV2021/html/Reizenstein_Common_Objects_in_3D_Large-Scale_Learning_and_Evaluation_of_Real-Life_ICCV_2021_paper.html" target="_blank">https://openaccess.thecvf.com/content/ICCV2021/html/Reizenstein_Common_Objects_in_3D_Large-Scale_Learning_and_Evaluation_of_Real-Life_ICCV_2021_paper.html</a> <a href="#fnref:65" class="footnote-back-ref">↩</a></li>
<li id="fn:66">Reizenstein, Jeremy; Shapovalov, Roman; Henzler, Philipp; Sbordone, Luca; Labatut, Patrick; Novotny, David (2021-09-01). "Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction". arXiv:2109.00512 [cs.CV]. <a href="https://arxiv.org/abs/2109.00512" target="_blank">https://arxiv.org/abs/2109.00512</a> <a href="#fnref:66" class="footnote-back-ref">↩</a></li>
<li id="fn:67">Downs, Laura; Francis, Anthony; Koenig, Nate; Kinman, Brandon; Hickman, Ryan; Reymann, Krista; McHugh, Thomas B.; Vanhoucke, Vincent (2022-05-23). "Google Scanned Objects: A High-Quality Dataset of 3D Scanned Household Items". 2022 International Conference on Robotics and Automation (ICRA). IEEE. pp. 2553–2560. arXiv:2204.11918. doi:10.1109/ICRA46639.2022.9811809. ISBN 978-1-7281-9681-7. <a href="https://doi.org/10.1109/ICRA46639.2022.9811809" target="_blank">https://doi.org/10.1109/ICRA46639.2022.9811809</a> <a href="#fnref:67" class="footnote-back-ref">↩</a></li>
<li id="fn:68">Deitke, Matt; Liu, Ruoshi; Wallingford, Matthew; Ngo, Huong; Michel, Oscar; Kusupati, Aditya; Fan, Alan; Laforte, Christian; Voleti, Vikram; Gadre, Samir Yitzhak; VanderBilt, Eli; Kembhavi, Aniruddha; Vondrick, Carl; Gkioxari, Georgia; Ehsani, Kiana (2023-12-15). "Objaverse-XL: A Universe of 10M+ 3D Objects". Advances in Neural Information Processing Systems. 36: 35799–35813. <a href="https://proceedings.neurips.cc/paper_files/paper/2023/hash/70364304877b5e767de4e9a2a511be0c-Abstract-Datasets_and_Benchmarks.html" target="_blank">https://proceedings.neurips.cc/paper_files/paper/2023/hash/70364304877b5e767de4e9a2a511be0c-Abstract-Datasets_and_Benchmarks.html</a> <a href="#fnref:68" class="footnote-back-ref">↩</a></li>
<li id="fn:69">Wu, Tong; Zhang, Jiarui; Fu, Xiao; Wang, Yuxin; Ren, Jiawei; Pan, Liang; Wu, Wayne; Yang, Lei; Wang, Jiaqi; Qian, Chen; Lin, Dahua; Liu, Ziwei (2023). "OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation". Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 803–814. <a href="https://openaccess.thecvf.com/content/CVPR2023/html/Wu_OmniObject3D_Large-Vocabulary_3D_Object_Dataset_for_Realistic_Perception_Reconstruction_and_CVPR_2023_paper.html" target="_blank">https://openaccess.thecvf.com/content/CVPR2023/html/Wu_OmniObject3D_Large-Vocabulary_3D_Object_Dataset_for_Realistic_Perception_Reconstruction_and_CVPR_2023_paper.html</a> <a href="#fnref:69" class="footnote-back-ref">↩</a></li>
<li id="fn:70">"OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation". omniobject3d.github.io. Retrieved 2025-03-07. <a href="https://omniobject3d.github.io/" target="_blank">https://omniobject3d.github.io/</a> <a href="#fnref:70" class="footnote-back-ref">↩</a></li>
<li id="fn:71">"UnCommon Objects in 3D". uco3d.github.io. Retrieved 2025-03-07. <a href="https://uco3d.github.io/" target="_blank">https://uco3d.github.io/</a> <a href="#fnref:71" class="footnote-back-ref">↩</a></li>
<li id="fn:72">Liu, Xingchen; Tayal, Piyush; Wang, Jianyuan; Zarzar, Jesus; Monnier, Tom; Tertikas, Konstantinos; Duan, Jiali; Toisoul, Antoine; Zhang, Jason Y. (2025-01-13). "UnCommon Objects in 3D". arXiv:2501.07574 [cs.CV]. <a href="https://arxiv.org/abs/2501.07574" target="_blank">https://arxiv.org/abs/2501.07574</a> <a href="#fnref:72" class="footnote-back-ref">↩</a></li>
<li id="fn:73">M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset." In CVPR Workshop on The Future of Datasets in Vision, 2015. <a href="https://www.cityscapes-dataset.com/wordpress/wp-content/papercite-data/pdf/cordts2015cvprw.pdf" target="_blank">https://www.cityscapes-dataset.com/wordpress/wp-content/papercite-data/pdf/cordts2015cvprw.pdf</a> <a href="#fnref:73" class="footnote-back-ref">↩</a></li>
<li id="fn:74">Houben, Sebastian, et al. "Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013. <a href="https://www.researchgate.net/profile/Sebastian_Houben/publication/242346625_Detection_of_Traffic_Signs_in_Real-World_Images_The_German_Traffic_Sign_Detection_Benchmark/links/0046352a03ec384e97000000/Detection-of-Traffic-Signs-in-Real-World-Images-The-German-Traffic-Sign-Detection-Benchmark.pdf" target="_blank">https://www.researchgate.net/profile/Sebastian_Houben/publication/242346625_Detection_of_Traffic_Signs_in_Real-World_Images_The_German_Traffic_Sign_Detection_Benchmark/links/0046352a03ec384e97000000/Detection-of-Traffic-Signs-in-Real-World-Images-The-German-Traffic-Sign-Detection-Benchmark.pdf</a> <a href="#fnref:74" class="footnote-back-ref">↩</a></li>
<li id="fn:75">Mathias, Mayeul, et al. "Traffic sign recognition—How far are we from the solution?" Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013. <a href="http://www.varcity.eu/paper/ijcnn2013_mathias_trafficsign.pdf" target="_blank">http://www.varcity.eu/paper/ijcnn2013_mathias_trafficsign.pdf</a> <a href="#fnref:75" class="footnote-back-ref">↩</a></li>
<li id="fn:76">Geiger, Andreas, Philip Lenz, and Raquel Urtasun. "Are we ready for autonomous driving? The KITTI vision benchmark suite." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012. <a href="https://www.cvlibs.net/publications/Geiger2012CVPR.pdf" target="_blank">https://www.cvlibs.net/publications/Geiger2012CVPR.pdf</a> <a href="#fnref:76" class="footnote-back-ref">↩</a></li>
<li id="fn:77">Sturm, Jürgen, et al. "A benchmark for the evaluation of RGB-D SLAM systems." Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE, 2012. <a href="http://jsturm.de/publications/data/sturm12iros.pdf" target="_blank">http://jsturm.de/publications/data/sturm12iros.pdf</a> <a href="#fnref:77" class="footnote-back-ref">↩</a></li>
<li id="fn:78">The KITTI Vision Benchmark Suite on YouTube <a href="https://www.youtube.com/watch?v=KXpZ6B1YB_k" target="_blank">https://www.youtube.com/watch?v=KXpZ6B1YB_k</a> <a href="#fnref:78" class="footnote-back-ref">↩</a></li>
<li id="fn:79">Kragh, Mikkel F.; et al. (2017). "FieldSAFE – Dataset for Obstacle Detection in Agriculture". Sensors. 17 (11): 2579. arXiv:1709.03526. Bibcode:2017Senso..17.2579K. doi:10.3390/s17112579. PMC 5713196. PMID 29120383. <a href="https://vision.eng.au.dk/fieldsafe" target="_blank">https://vision.eng.au.dk/fieldsafe</a> <a href="#fnref:79" class="footnote-back-ref">↩</a></li>
<li id="fn:80">"Papers with Code - Daimler Monocular Pedestrian Detection Dataset". paperswithcode.com. Retrieved 5 May 2023. <a href="https://paperswithcode.com/dataset/daimler-monocular-pedestrian-detection" target="_blank">https://paperswithcode.com/dataset/daimler-monocular-pedestrian-detection</a> <a href="#fnref:80" class="footnote-back-ref">↩</a></li>
<li id="fn:81">Enzweiler, Markus; Gavrila, Dariu M. (December 2009). "Monocular Pedestrian Detection: Survey and Experiments". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (12): 2179–2195. doi:10.1109/TPAMI.2008.260. ISSN 1939-3539. PMID 19834140. S2CID 1192198. <a href="https://ieeexplore.ieee.org/document/4657363" target="_blank">https://ieeexplore.ieee.org/document/4657363</a> <a href="#fnref:81" class="footnote-back-ref">↩</a></li>
<li id="fn:82">Yin, Guojun; Liu, Bin; Zhu, Huihui; Gong, Tao; Yu, Nenghai (28 July 2020). "A Large Scale Urban Surveillance Video Dataset for Multiple-Object Tracking and Behavior Analysis". arXiv:1904.11784 [cs.CV]. <a href="https://arxiv.org/abs/1904.11784" target="_blank">https://arxiv.org/abs/1904.11784</a> <a href="#fnref:82" class="footnote-back-ref">↩</a></li>
<li id="fn:83">"Object Recognition in Video Dataset". mi.eng.cam.ac.uk. Retrieved 5 May 2023. <a href="https://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/" target="_blank">https://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/</a> <a href="#fnref:83" class="footnote-back-ref">↩</a></li>
<li id="fn:84">Brostow, Gabriel J.; Shotton, Jamie; Fauqueur, Julien; Cipolla, Roberto (2008). "Segmentation and Recognition Using Structure from Motion Point Clouds". Computer Vision – ECCV 2008. Lecture Notes in Computer Science. Vol. 5302. Springer. pp. 44–57. doi:10.1007/978-3-540-88682-2_5. ISBN 978-3-540-88681-5. <a href="https://doi.org/10.1007/978-3-540-88682-2_5" target="_blank">https://doi.org/10.1007/978-3-540-88682-2_5</a> <a href="#fnref:84" class="footnote-back-ref">↩</a></li>
<li id="fn:85">Brostow, Gabriel J.; Fauqueur, Julien; Cipolla, Roberto (15 January 2009). "Semantic object classes in video: A high-definition ground truth database". Pattern Recognition Letters. 30 (2): 88–97. Bibcode:2009PaReL..30...88B. doi:10.1016/j.patrec.2008.04.005. ISSN 0167-8655. <a href="https://www.sciencedirect.com/science/article/abs/pii/S0167865508001220" target="_blank">https://www.sciencedirect.com/science/article/abs/pii/S0167865508001220</a> <a href="#fnref:85" class="footnote-back-ref">↩</a></li>
<li id="fn:86">"WildDash 2 Benchmark". wilddash.cc. Retrieved 5 May 2023. <a href="https://wilddash.cc/railsem19" target="_blank">https://wilddash.cc/railsem19</a> <a href="#fnref:86" class="footnote-back-ref">↩</a></li>
<li id="fn:87">Zendel, Oliver; Murschitz, Markus; Zeilinger, Marcel; Steininger, Daniel; Abbasi, Sara; Beleznai, Csaba (June 2019). "RailSem19: A Dataset for Semantic Rail Scene Understanding". 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). pp. 1221–1229. doi:10.1109/CVPRW.2019.00161. ISBN 978-1-7281-2506-0. S2CID 198166233. <a href="https://doi.org/10.1109/CVPRW.2019.00161" target="_blank">https://doi.org/10.1109/CVPRW.2019.00161</a> <a href="#fnref:87" class="footnote-back-ref">↩</a></li>
<li id="fn:88">"The Boreas Dataset". www.boreas.utias.utoronto.ca. Retrieved 5 May 2023. <a href="https://www.boreas.utias.utoronto.ca/#/" target="_blank">https://www.boreas.utias.utoronto.ca/#/</a> <a href="#fnref:88" class="footnote-back-ref">↩</a></li>
<li id="fn:89">Burnett, Keenan; Yoon, David J.; Wu, Yuchen; Li, Andrew Zou; Zhang, Haowei; Lu, Shichen; Qian, Jingxing; Tseng, Wei-Kang; Lambert, Andrew; Leung, Keith Y. K.; Schoellig, Angela P.; Barfoot, Timothy D. (26 January 2023). "Boreas: A Multi-Season Autonomous Driving Dataset". arXiv:2203.10168 [cs.RO]. <a href="https://arxiv.org/abs/2203.10168" target="_blank">https://arxiv.org/abs/2203.10168</a> <a href="#fnref:89" class="footnote-back-ref">↩</a></li>
<li id="fn:90">"Bosch Small Traffic Lights Dataset". hci.iwr.uni-heidelberg.de. 1 March 2017. Retrieved 5 May 2023. <a href="https://hci.iwr.uni-heidelberg.de/content/bosch-small-traffic-lights-dataset" target="_blank">https://hci.iwr.uni-heidelberg.de/content/bosch-small-traffic-lights-dataset</a> <a href="#fnref:90" class="footnote-back-ref">↩</a></li>
<li id="fn:91">Behrendt, Karsten; Novak, Libor; Botros, Rami (May 2017). "A deep learning approach to traffic lights: Detection, tracking, and classification". 2017 IEEE International Conference on Robotics and Automation (ICRA). pp. 1370–1377. doi:10.1109/ICRA.2017.7989163. ISBN 978-1-5090-4633-1. S2CID 6257133. <a href="https://doi.org/10.1109/ICRA.2017.7989163" target="_blank">https://doi.org/10.1109/ICRA.2017.7989163</a> <a href="#fnref:91" class="footnote-back-ref">↩</a></li>
<li id="fn:92">"FRSign Dataset". frsign.irt-systemx.fr. Retrieved 5 May 2023. <a href="https://frsign.irt-systemx.fr/" target="_blank">https://frsign.irt-systemx.fr/</a> <a href="#fnref:92" class="footnote-back-ref">↩</a></li>
<li id="fn:93">Harb, Jeanine; Rébéna, Nicolas; Chosidow, Raphaël; Roblin, Grégoire; Potarusov, Roman; Hajri, Hatem (5 February 2020). "FRSign: A Large-Scale Traffic Light Dataset for Autonomous Trains". arXiv:2002.05665 [cs.CY]. <a href="https://arxiv.org/abs/2002.05665" target="_blank">https://arxiv.org/abs/2002.05665</a> <a href="#fnref:93" class="footnote-back-ref">↩</a></li>
<li id="fn:94">"ifs-rwth-aachen/GERALD". Chair and Institute for Rail Vehicles and Transport Systems. 30 April 2023. Retrieved 5 May 2023. <a href="https://github.com/ifs-rwth-aachen/GERALD" target="_blank">https://github.com/ifs-rwth-aachen/GERALD</a> <a href="#fnref:94" class="footnote-back-ref">↩</a></li>
<li id="fn:95">Leibner, Philipp; Hampel, Fabian; Schindler, Christian (3 April 2023). "GERALD: A novel dataset for the detection of German mainline railway signals". Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit. 237 (10): 1332–1342. doi:10.1177/09544097231166472. ISSN 0954-4097. S2CID 257939937. <a href="https://journals.sagepub.com/doi/abs/10.1177/09544097231166472" target="_blank">https://journals.sagepub.com/doi/abs/10.1177/09544097231166472</a> <a href="#fnref:95" class="footnote-back-ref">↩</a></li>
<li id="fn:96">Wojek, Christian; Walk, Stefan; Schiele, Bernt (June 2009). "Multi-cue onboard pedestrian detection". 2009 IEEE Conference on Computer Vision and Pattern Recognition. pp. 794–801. doi:10.1109/CVPR.2009.5206638. ISBN 978-1-4244-3992-8. S2CID 18000078. <a href="https://doi.org/10.1109/CVPR.2009.5206638" target="_blank">https://doi.org/10.1109/CVPR.2009.5206638</a> <a href="#fnref:96" class="footnote-back-ref">↩</a></li>
<li id="fn:97">Toprak, Tuğçe; Aydın, Burak; Belenlioğlu, Burak; Güzeliş, Cüneyt; Selver, M. Alper (5 April 2020). "Conditional Weighted Ensemble of Transferred Models for Camera Based Onboard Pedestrian Detection in Railway Driver Support Systems". IEEE Transactions on Vehicular Technology: 1. doi:10.1109/TVT.2020.2983825. S2CID 216510283. Retrieved 5 May 2023. <a href="https://zenodo.org/record/3741742" target="_blank">https://zenodo.org/record/3741742</a> <a href="#fnref:97" class="footnote-back-ref">↩</a></li>
<li id="fn:98">Toprak, Tugce; Belenlioglu, Burak; Aydın, Burak; Guzelis, Cuneyt; Selver, M. Alper (May 2020). "Conditional Weighted Ensemble of Transferred Models for Camera Based Onboard Pedestrian Detection in Railway Driver Support Systems". IEEE Transactions on Vehicular Technology. 69 (5): 5041–5054. doi:10.1109/TVT.2020.2983825. ISSN 1939-9359. S2CID 216510283. <a href="https://ieeexplore.ieee.org/document/9050835" target="_blank">https://ieeexplore.ieee.org/document/9050835</a> <a href="#fnref:98" class="footnote-back-ref">↩</a></li>
<li id="fn:99">Tilly, Roman; Neumaier, Philipp; Schwalbe, Karsten; Klasek, Pavel; Tagiew, Rustam; Denzler, Patrick; Klockau, Tobias; Boekhoff, Martin; Köppel, Martin (2023). "Open Sensor Data for Rail 2023". FID Move (in German). doi:10.57806/9mv146r0. <a href="https://doi.org/10.57806/9mv146r0" target="_blank">https://doi.org/10.57806/9mv146r0</a> <a href="#fnref:99" class="footnote-back-ref">↩</a></li>
<li id="fn:100">Tagiew, Rustam; Köppel, Martin; Schwalbe, Karsten; Denzler, Patrick; Neumaier, Philipp; Klockau, Tobias; Boekhoff, Martin; Klasek, Pavel; Tilly, Roman (4 May 2023). "OSDaR23: Open Sensor Data for Rail 2023". 2023 8th International Conference on Robotics and Automation Engineering (ICRAE). pp. 270–276. arXiv:2305.03001. doi:10.1109/ICRAE59816.2023.10458449. ISBN 979-8-3503-2765-6. <a href="https://doi.org/10.1109/ICRAE59816.2023.10458449" target="_blank">https://doi.org/10.1109/ICRAE59816.2023.10458449</a> <a href="#fnref:100" class="footnote-back-ref">↩</a></li>
<li id="fn:101">"Home". Argoverse. Retrieved 5 May 2023. <a href="https://www.argoverse.org/" target="_blank">https://www.argoverse.org/</a> <a href="#fnref:101" class="footnote-back-ref">↩</a></li>
<li id="fn:102">Chang, Ming-Fang; Lambert, John; Sangkloy, Patsorn; Singh, Jagjeet; Bak, Slawomir; Hartnett, Andrew; Wang, De; Carr, Peter; Lucey, Simon; Ramanan, Deva; Hays, James (6 November 2019). "Argoverse: 3D Tracking and Forecasting with Rich Maps". arXiv:1911.02620 [cs.CV]. <a href="https://arxiv.org/abs/1911.02620" target="_blank">https://arxiv.org/abs/1911.02620</a> <a href="#fnref:102" class="footnote-back-ref">↩</a></li>
<li id="fn:103">Kharroubi, Abderrazzaq; Ballouch, Zouhair; Hajji, Rafika; Yarroudh, Anass; Billen, Roland (9 April 2024). "Multi-Context Point Cloud Dataset and Machine Learning for Railway Semantic Segmentation". Infrastructures. 9 (4): 71. doi:10.3390/infrastructures9040071. <a href="https://doi.org/10.3390%2Finfrastructures9040071" target="_blank">https://doi.org/10.3390%2Finfrastructures9040071</a> <a href="#fnref:103" class="footnote-back-ref">↩</a></li>
<li id="fn:104">Qiu, Bo; Zhou, Yuzhou; Dai, Lei; Wang, Bing; Li, Jianping; Dong, Zhen; Wen, Chenglu; Ma, Zhiliang; Yang, Bisheng (December 2024). "WHU-Railway3D: A Diverse Dataset and Benchmark for Railway Point Cloud Semantic Segmentation". IEEE Transactions on Intelligent Transportation Systems. 25 (12): 20900–20916. doi:10.1109/TITS.2024.3469546. ISSN 1558-0016. <a href="https://ieeexplore.ieee.org/document/10716569" target="_blank">https://ieeexplore.ieee.org/document/10716569</a> <a href="#fnref:104" class="footnote-back-ref">↩</a></li>
<li id="fn:105">Chen, Zhichao; Yang, Jie; Feng, Zhicheng; Zhu, Hao (16 January 2024). "RailFOD23: A dataset for foreign object detection on railroad transmission lines". Scientific Data. 11 (1): 72. Bibcode:2024NatSD..11...72C. doi:10.1038/s41597-024-02918-9. ISSN 2052-4463. PMC 10791632. PMID 38228610. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10791632" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10791632</a> <a href="#fnref:105" class="footnote-back-ref">↩</a></li>
<li id="fn:106">Khemmar, Redouane; Mauri, Antoine; Dulompont, Camille; Gajula, Jayadeep; Vauchey, Vincent; Haddad, Madjid; Boutteau, Rémi (22 May 2022). "Road and Railway Smart Mobility: A High-Definition Ground Truth Hybrid Dataset". Sensors. 22 (10): 3922. Bibcode:2022Senso..22.3922K. doi:10.3390/s22103922. PMC 9143394. PMID 35632331. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9143394" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9143394</a> <a href="#fnref:106" class="footnote-back-ref">↩</a></li>
<li id="fn:107">ICONS 2022: the seventeenth International Conference on Systems: April 24–28, 2022, Barcelona, Spain. Wilmington, DE, USA: IARIA. 2022. ISBN 978-1-61208-941-6. <a href="#fnref:107" class="footnote-back-ref">↩</a></li>
<li id="fn:108">Jiang, Tengping; Li, Shiwei; Zhang, Qinyu; Wang, Guangshuai; Zhang, Zequn; Zeng, Fankun; An, Peng; Jin, Xin; Liu, Shan; Wang, Yongjun (2024). "RailPC: A large-scale railway point cloud semantic segmentation dataset". CAAI Transactions on Intelligence Technology. 9 (6): 1548–1560. doi:10.1049/cit2.12349. ISSN 2468-2322. <a href="https://doi.org/10.1049%2Fcit2.12349" target="_blank">https://doi.org/10.1049%2Fcit2.12349</a> <a href="#fnref:108" class="footnote-back-ref">↩</a></li>
<li id="fn:109">Abid, Mahdi; Teixeira, Mathis; Mahtani, Ankur; Laurent, Thomas (2024). "RailCloud-HdF: A Large-Scale Point Cloud Dataset for Railway Scene Semantic Segmentation". Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. pp. 159–170. doi:10.5220/0012394800003660. ISBN 978-989-758-679-8. <a href="https://doi.org/10.5220/0012394800003660" target="_blank">https://doi.org/10.5220/0012394800003660</a> <a href="#fnref:109" class="footnote-back-ref">↩</a></li>
<li id="fn:110">Tagiew, Rustam; Wunderlich, Ilkay; Zanitzer, Philipp; Sastuba, Mark; Knoll, Carsten; Göller, Kilian; Amjad, Haadia; Seitz, Steffen (2025). "Görlitz Rail Test Center CV Dataset 2024 (RailGoerl24)". German National Library of Science and Technology. <a href="https://data.fid-move.de/de/dataset/railgoerl24" target="_blank">https://data.fid-move.de/de/dataset/railgoerl24</a> <a href="#fnref:110" class="footnote-back-ref">↩</a></li>
<li id="fn:111">"Face Recognition Homepage - Databases". www.face-rec.org. Retrieved 2025-04-26. <a href="https://www.face-rec.org/databases/" target="_blank">https://www.face-rec.org/databases/</a> <a href="#fnref:111" class="footnote-back-ref">↩</a></li>
<li id="fn:112">Huang, Gary B., et al. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Vol. 1. No. 2. Technical Report 07-49, University of Massachusetts, Amherst, 2007. <a href="https://hal.inria.fr/docs/00/32/19/23/PDF/Huang_long_eccv2008-lfw.pdf" target="_blank">https://hal.inria.fr/docs/00/32/19/23/PDF/Huang_long_eccv2008-lfw.pdf</a> <a href="#fnref:112" class="footnote-back-ref">↩</a></li>
<li id="fn:113">"LFW Face Database : Main". web.archive.org. 2012-12-01. Archived from the original on 2012-12-01. Retrieved 2025-04-26. <a href="https://web.archive.org/web/20121201044531/http://vis-www.cs.umass.edu/lfw" target="_blank">https://web.archive.org/web/20121201044531/http://vis-www.cs.umass.edu/lfw</a> <a href="#fnref:113" class="footnote-back-ref">↩</a></li>
<li id="fn:114">Zafeiriou, S.; Kollias, D.; Nicolaou, M.A.; Papaioannou, A.; Zhao, G.; Kotsia, I. (2017). "Aff-Wild: Valence and Arousal 'In-the-Wild' Challenge" (PDF). 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). pp. 1980–1987. doi:10.1109/CVPRW.2017.248. ISBN 978-1-5386-0733-6. S2CID 3107614. <a href="https://doi.org/10.1109/CVPRW.2017.248" target="_blank">https://doi.org/10.1109/CVPRW.2017.248</a> <a href="#fnref:114" class="footnote-back-ref">↩</a></li>
<li id="fn:115">Kollias, D.; Tzirakis, P.; Nicolaou, M.A.; Papaioannou, A.; Zhao, G.; Schuller, B.; Kotsia, I.; Zafeiriou, S. (2019). "Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond". International Journal of Computer Vision. 127 (6–7): 907–929. arXiv:1804.10938. doi:10.1007/s11263-019-01158-4. S2CID 13679040. <a href="https://rdcu.be/bmGm2" target="_blank">https://rdcu.be/bmGm2</a> <a href="#fnref:115" class="footnote-back-ref">↩</a></li>
<li id="fn:116">Kollias, D.; Zafeiriou, S. (2019). "Expression, affect, action unit recognition: Aff-wild2, multi-task learning and arcface" (PDF). British Machine Vision Conference (BMVC), 2019. arXiv:1910.04855. <a href="https://bmvc2019.org/wp-content/uploads/papers/0399-paper.pdf" target="_blank">https://bmvc2019.org/wp-content/uploads/papers/0399-paper.pdf</a> <a href="#fnref:116" class="footnote-back-ref">↩</a></li>
<li id="fn:117">Kollias, D.; Schulc, A.; Hajiyev, E.; Zafeiriou, S. (2020). "Analysing Affective Behavior in the First ABAW 2020 Competition". 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020). pp. 637–643. arXiv:2001.11409. doi:10.1109/FG47880.2020.00126. ISBN 978-1-7281-3079-8. S2CID 210966051. <a href="https://doi.org/10.1109/FG47880.2020.00126" target="_blank">https://doi.org/10.1109/FG47880.2020.00126</a> <a href="#fnref:117" class="footnote-back-ref">↩</a></li>
<li id="fn:118">Phillips, P. Jonathon; et al. (1998). "The FERET database and evaluation procedure for face-recognition algorithms". Image and Vision Computing. 16 (5): 295–306. doi:10.1016/s0262-8856(97)00070-x. <a href="https://doi.org/10.1016/s0262-8856(97)00070-x" target="_blank">https://doi.org/10.1016/s0262-8856(97)00070-x</a> <a href="#fnref:118" class="footnote-back-ref">↩</a></li>
<li id="fn:119">Wiskott, Laurenz; et al. (1997). "Face recognition by elastic bunch graph matching". IEEE Transactions on Pattern Analysis and Machine Intelligence. 19 (7): 775–779. CiteSeerX 10.1.1.44.2321. doi:10.1109/34.598235. S2CID 30523165. <a href="https://doi.org/10.1109/34.598235" target="_blank">https://doi.org/10.1109/34.598235</a> <a href="#fnref:119" class="footnote-back-ref">↩</a></li>
<li id="fn:120">Livingstone, Steven R.; Russo, Frank A. (2018). "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English". PLOS ONE. 13 (5): e0196391. Bibcode:2018PLoSO..1396391L. doi:10.1371/journal.pone.0196391. PMC 5955500. PMID 29768426. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5955500" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5955500</a> <a href="#fnref:120" class="footnote-back-ref">↩</a></li>
<li id="fn:121">Livingstone, Steven R.; Russo, Frank A. (2018). "Emotion". The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). doi:10.5281/zenodo.1188976. <a href="https://doi.org/10.5281/zenodo.1188976" target="_blank">https://doi.org/10.5281/zenodo.1188976</a> <a href="#fnref:121" class="footnote-back-ref">↩</a></li>
<li id="fn:122">Grgic, Mislav; Delac, Kresimir; Grgic, Sonja (2011). "SCface–surveillance cameras face database". Multimedia Tools and Applications. 51 (3): 863–879. doi:10.1007/s11042-009-0417-2. S2CID 207218990. <a href="https://doi.org/10.1007/s11042-009-0417-2" target="_blank">https://doi.org/10.1007/s11042-009-0417-2</a> <a href="#fnref:122" class="footnote-back-ref">↩</a></li>
<li id="fn:123">Wallace, Roy, et al. "Inter-session variability modelling and joint factor analysis for face authentication." Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011. <a href="https://repository.ubn.ru.nl/bitstream/handle/2066/94489/94489.pdf" target="_blank">https://repository.ubn.ru.nl/bitstream/handle/2066/94489/94489.pdf</a> <a href="#fnref:123" class="footnote-back-ref">↩</a></li>
<li id="fn:124">Georghiades, A. "Yale face database". Center for Computational Vision and Control at Yale University. 2: 1997. <a href="http://CVC.yale.edu/Projects/Yalefaces/Yalefa" target="_blank">http://CVC.yale.edu/Projects/Yalefaces/Yalefa</a> <a href="#fnref:124" class="footnote-back-ref">↩</a></li>
<li id="fn:125">Nguyen, Duy; et al. (2006). "Real-time face detection and lip feature extraction using field-programmable gate arrays". IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics. 36 (4): 902–912. CiteSeerX 10.1.1.156.9848. doi:10.1109/tsmcb.2005.862728. PMID 16903373. S2CID 7334355. <a href="https://doi.org/10.1109/tsmcb.2005.862728" target="_blank">https://doi.org/10.1109/tsmcb.2005.862728</a> <a href="#fnref:125" class="footnote-back-ref">↩</a></li>
<li id="fn:126">Kanade, Takeo, Jeffrey F. Cohn, and Yingli Tian. "Comprehensive database for facial expression analysis." Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000. <a href="#fnref:126" class="footnote-back-ref">↩</a></li>
<li id="fn:127">Zeng, Zhihong; et al. (2009). "A survey of affect recognition methods: Audio, visual, and spontaneous expressions". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (1): 39–58. CiteSeerX 10.1.1.144.217. doi:10.1109/tpami.2008.52. PMID 19029545. <a href="https://doi.org/10.1109/tpami.2008.52" target="_blank">https://doi.org/10.1109/tpami.2008.52</a> <a href="#fnref:127" class="footnote-back-ref">↩</a></li>
<li id="fn:128">Lyons, Michael; Kamachi, Miyuki; Gyoba, Jiro (1998). "Facial expression images". The Japanese Female Facial Expression (JAFFE) Database. doi:10.5281/zenodo.3451524. <a href="https://doi.org/10.5281/zenodo.3451524" target="_blank">https://doi.org/10.5281/zenodo.3451524</a> <a href="#fnref:128" class="footnote-back-ref">↩</a></li>
<li id="fn:129">Lyons, Michael; Akamatsu, Shigeru; Kamachi, Miyuki; Gyoba, Jiro. "Coding facial expressions with Gabor wavelets." Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on. IEEE, 1998. <a href="https://zenodo.org/record/3430156" target="_blank">https://zenodo.org/record/3430156</a> <a href="#fnref:129" class="footnote-back-ref">↩</a></li>
<li id="fn:130">Ng, Hong-Wei, and Stefan Winkler. "A data-driven approach to cleaning large face datasets." Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014. <a href="http://vintage.winklerbros.net/Publications/icip2014a.pdf" target="_blank">http://vintage.winklerbros.net/Publications/icip2014a.pdf</a> <a href="#fnref:130" class="footnote-back-ref">↩</a></li>
<li id="fn:131">RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik (2015). "One-to-many face recognition with bilinear CNNs". arXiv:1506.01342 [cs.CV]. <a href="https://arxiv.org/abs/1506.01342" target="_blank">https://arxiv.org/abs/1506.01342</a> <a href="#fnref:131" class="footnote-back-ref">↩</a></li>
<li id="fn:132">Jesorsky, Oliver, Klaus J. Kirchberg, and Robert W. Frischholz. "Robust face detection using the hausdorff distance." Audio-and video-based biometric person authentication. Springer Berlin Heidelberg, 2001. <a href="#fnref:132" class="footnote-back-ref">↩</a></li>
<li id="fn:133">Bhatt, Rajen B., et al. "Efficient skin region segmentation using low complexity fuzzy decision tree model." India Conference (INDICON), 2009 Annual IEEE. IEEE, 2009. <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.708.9158&rep=rep1&type=pdf" target="_blank">http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.708.9158&rep=rep1&type=pdf</a> <a href="#fnref:133" class="footnote-back-ref">↩</a></li>
<li id="fn:134">Lingala, Mounika; et al. (2014). "Fuzzy logic color detection: Blue areas in melanoma dermoscopy images". Computerized Medical Imaging and Graphics. 38 (5): 403–410. doi:10.1016/j.compmedimag.2014.03.007. PMC 4287461. PMID 24786720. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4287461" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4287461</a> <a href="#fnref:134" class="footnote-back-ref">↩</a></li>
<li id="fn:135">Maes, Chris, et al. "Feature detection on 3D face surfaces for pose normalisation and recognition." Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010. <a href="https://lirias.kuleuven.be/retrieve/135678" target="_blank">https://lirias.kuleuven.be/retrieve/135678</a> <a href="#fnref:135" class="footnote-back-ref">↩</a></li>
<li id="fn:136">Savran, Arman, et al. "Bosphorus database for 3D face analysis." Biometrics and Identity Management. Springer Berlin Heidelberg, 2008. 47–56. <a href="https://web.archive.org/web/20190222192331/http://pdfs.semanticscholar.org/4254/fbba3846008f50671edc9cf70b99d7304543.pdf" target="_blank">https://web.archive.org/web/20190222192331/http://pdfs.semanticscholar.org/4254/fbba3846008f50671edc9cf70b99d7304543.pdf</a> <a href="#fnref:136" class="footnote-back-ref">↩</a></li>
<li id="fn:137">Heseltine, Thomas, Nick Pears, and Jim Austin. "Three-dimensional face recognition: An eigensurface approach." Image Processing, 2004. ICIP'04. 2004 International Conference on. Vol. 2. IEEE, 2004. <a href="http://eprints.whiterose.ac.uk/1526/01/austinj4.pdf" target="_blank">http://eprints.whiterose.ac.uk/1526/01/austinj4.pdf</a> <a href="#fnref:137" class="footnote-back-ref">↩</a></li>
<li id="fn:138">Ge, Yun; et al. (2011). "3D Novel Face Sample Modeling for Face Recognition". Journal of Multimedia. 6 (5): 467–475. CiteSeerX 10.1.1.461.9710. doi:10.4304/jmm.6.5.467-475. <a href="https://doi.org/10.4304/jmm.6.5.467-475" target="_blank">https://doi.org/10.4304/jmm.6.5.467-475</a> <a href="#fnref:138" class="footnote-back-ref">↩</a></li>
<li id="fn:139">Wang, Yueming; Liu, Jianzhuang; Tang, Xiaoou (2010). "Robust 3D face recognition by local shape difference boosting". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (10): 1858–1870. CiteSeerX 10.1.1.471.2424. doi:10.1109/tpami.2009.200. PMID 20724762. S2CID 15263913. <a href="https://doi.org/10.1109/tpami.2009.200" target="_blank">https://doi.org/10.1109/tpami.2009.200</a> <a href="#fnref:139" class="footnote-back-ref">↩</a></li>
<li id="fn:140">Zhong, Cheng, Zhenan Sun, and Tieniu Tan. "Robust 3D face recognition using learned visual codebook." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007. <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.580.8534&rep=rep1&type=pdf" target="_blank">http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.580.8534&rep=rep1&type=pdf</a> <a href="#fnref:140" class="footnote-back-ref">↩</a></li>
<li id="fn:141">Zhao, G.; Huang, X.; Taini, M.; Li, S. Z.; Pietikäinen, M. (2011). "Facial expression recognition from near-infrared videos" (PDF). Image and Vision Computing. 29 (9): 607–619. doi:10.1016/j.imavis.2011.07.002.[dead link] <a href="http://www.academia.edu/download/42229488/Image_and_Vision_Computing20160206-29020-1auzaon.pdf" target="_blank">http://www.academia.edu/download/42229488/Image_and_Vision_Computing20160206-29020-1auzaon.pdf</a> <a href="#fnref:141" class="footnote-back-ref">↩</a></li>
<li id="fn:142">Soyel, Hamit, and Hasan Demirel. "Facial expression recognition using 3D facial feature distances." Image Analysis and Recognition. Springer Berlin Heidelberg, 2007. 831–838. <a href="https://pdfs.semanticscholar.org/cf81/4b618fcbc9a556cdce225e74a8806867ba84.pdf" target="_blank">https://pdfs.semanticscholar.org/cf81/4b618fcbc9a556cdce225e74a8806867ba84.pdf</a> <a href="#fnref:142" class="footnote-back-ref">↩</a></li>
<li id="fn:143">Bowyer, Kevin W.; Chang, Kyong; Flynn, Patrick (2006). "A survey of approaches and challenges in 3D and multi-modal 3D+2D face recognition". Computer Vision and Image Understanding. 101 (1): 1–15. CiteSeerX 10.1.1.134.8784. doi:10.1016/j.cviu.2005.05.005. <a href="https://doi.org/10.1016/j.cviu.2005.05.005" target="_blank">https://doi.org/10.1016/j.cviu.2005.05.005</a> <a href="#fnref:143" class="footnote-back-ref">↩</a></li>
<li id="fn:144">Tan, Xiaoyang; Triggs, Bill (2010). "Enhanced local texture feature sets for face recognition under difficult lighting conditions". IEEE Transactions on Image Processing. 19 (6): 1635–1650. Bibcode:2010ITIP...19.1635T. CiteSeerX 10.1.1.105.3355. doi:10.1109/tip.2010.2042645. PMID 20172829. S2CID 4943234. <a href="https://doi.org/10.1109/tip.2010.2042645" target="_blank">https://doi.org/10.1109/tip.2010.2042645</a> <a href="#fnref:144" class="footnote-back-ref">↩</a></li>
<li id="fn:145">Mousavi, Mir Hashem; Faez, Karim; Asghari, Amin (2008). "Three Dimensional Face Recognition Using SVM Classifier". Seventh IEEE/ACIS International Conference on Computer and Information Science (ICIS 2008). pp. 208–213. doi:10.1109/ICIS.2008.77. ISBN 978-0-7695-3131-1. S2CID 2710422. <a href="https://doi.org/10.1109/ICIS.2008.77" target="_blank">https://doi.org/10.1109/ICIS.2008.77</a> <a href="#fnref:145" class="footnote-back-ref">↩</a></li>
<li id="fn:146">Amberg, Brian; Knothe, Reinhard; Vetter, Thomas (2008). "Expression invariant 3D face recognition with a Morphable Model" (PDF). 2008 8th IEEE International Conference on Automatic Face &amp; Gesture Recognition. pp. 1–6. doi:10.1109/AFGR.2008.4813376. ISBN 978-1-4244-2154-1. S2CID 5651453. Archived from the original (PDF) on 28 July 2018. Retrieved 6 August 2019. <a href="https://doi.org/10.1109/AFGR.2008.4813376" target="_blank">https://doi.org/10.1109/AFGR.2008.4813376</a> <a href="#fnref:146" class="footnote-back-ref">↩</a></li>
<li id="fn:147">Irfanoglu, M.O.; Gokberk, B.; Akarun, L. (2004). "3D shape-based face recognition using automatically registered facial surfaces". Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004. Vol. 4. pp. 183–186. doi:10.1109/ICPR.2004.1333734. ISBN 0-7695-2128-2. S2CID 10987293. <a href="https://doi.org/10.1109/ICPR.2004.1333734" target="_blank">https://doi.org/10.1109/ICPR.2004.1333734</a> <a href="#fnref:147" class="footnote-back-ref">↩</a></li>
<li id="fn:148">Beumier, Charles; Acheroy, Marc (2001). "Face verification from 3D and grey level clues". Pattern Recognition Letters. 22 (12): 1321–1329. Bibcode:2001PaReL..22.1321B. doi:10.1016/s0167-8655(01)00077-0. <a href="https://doi.org/10.1016/s0167-8655(01)00077-0" target="_blank">https://doi.org/10.1016/s0167-8655(01)00077-0</a> <a href="#fnref:148" class="footnote-back-ref">↩</a></li>
<li id="fn:149">Afifi, Mahmoud; Abdelhamed, Abdelrahman (2017-06-13). "AFIF4: Deep Gender Classification based on AdaBoost-based Fusion of Isolated Facial Features and Foggy Faces". arXiv:1706.04277 [cs.CV]. <a href="https://arxiv.org/abs/1706.04277" target="_blank">https://arxiv.org/abs/1706.04277</a> <a href="#fnref:149" class="footnote-back-ref">↩</a></li>
<li id="fn:150">"SoF dataset". sites.google.com. Retrieved 2017-11-18. <a href="https://sites.google.com/view/sof-dataset" target="_blank">https://sites.google.com/view/sof-dataset</a> <a href="#fnref:150" class="footnote-back-ref">↩</a></li>
<li id="fn:151">"IMDb-WIKI". data.vision.ee.ethz.ch. Retrieved 2018-03-13. <a href="https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/" target="_blank">https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/</a> <a href="#fnref:151" class="footnote-back-ref">↩</a></li>
<li id="fn:152">"AVA: A Video Dataset of Atomic Visual Action". research.google.com. Retrieved 2024-10-18. <a href="https://research.google.com/ava/" target="_blank">https://research.google.com/ava/</a> <a href="#fnref:152" class="footnote-back-ref">↩</a></li>
<li id="fn:153">Li, Ang; Thotakuri, Meghana; Ross, David A.; Carreira, João; Vostrikov, Alexander; Zisserman, Andrew (2020-05-20). "The AVA-Kinetics Localized Human Actions Video Dataset". arXiv:2005.00214 [cs.CV]. <a href="https://arxiv.org/abs/2005.00214" target="_blank">https://arxiv.org/abs/2005.00214</a> <a href="#fnref:153" class="footnote-back-ref">↩</a></li>
<li id="fn:154">Patron-Perez, A.; Marszalek, M.; Reid, I.; Zisserman, A. (2012). "Structured learning of human interactions in TV shows". IEEE Transactions on Pattern Analysis and Machine Intelligence. 34 (12): 2441–2453. doi:10.1109/tpami.2012.24. PMID 23079467. S2CID 6060568. <a href="https://doi.org/10.1109/tpami.2012.24" target="_blank">https://doi.org/10.1109/tpami.2012.24</a> <a href="#fnref:154" class="footnote-back-ref">↩</a></li>
<li id="fn:155">Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., & Bajcsy, R. (January 2013). Berkeley MHAD: A comprehensive multimodal human action database. In Applications of Computer Vision (WACV), 2013 IEEE Workshop on (pp. 53–60). IEEE. <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.432.5113&rep=rep1&type=pdf" target="_blank">http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.432.5113&rep=rep1&type=pdf</a> <a href="#fnref:155" class="footnote-back-ref">↩</a></li>
<li id="fn:156">Jiang, Y. G., et al. "THUMOS challenge: Action recognition with a large number of classes." ICCV Workshop on Action Recognition with a Large Number of Classes, http://crcv.ucf.edu/ICCV13-Action-Workshop. 2013. <a href="http://crcv.ucf.edu/ICCV13-Action-Workshop" target="_blank">http://crcv.ucf.edu/ICCV13-Action-Workshop</a> <a href="#fnref:156" class="footnote-back-ref">↩</a></li>
<li id="fn:157">Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos." Advances in Neural Information Processing Systems. 2014. <a href="https://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf" target="_blank">https://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf</a> <a href="#fnref:157" class="footnote-back-ref">↩</a></li>
<li id="fn:158">Stoian, Andrei; Ferecatu, Marin; Benois-Pineau, Jenny; Crucianu, Michel (2016). "Fast Action Localization in Large-Scale Video Archives". IEEE Transactions on Circuits and Systems for Video Technology. 26 (10): 1917–1930. doi:10.1109/TCSVT.2015.2475835. S2CID 31537462. <a href="https://doi.org/10.1109/TCSVT.2015.2475835" target="_blank">https://doi.org/10.1109/TCSVT.2015.2475835</a> <a href="#fnref:158" class="footnote-back-ref">↩</a></li>
<li id="fn:159">Botta, M., A. Giordana, and L. Saitta. "Learning fuzzy concept definitions." Fuzzy Systems, 1993. Second IEEE International Conference on. IEEE, 1993. <a href="#fnref:159" class="footnote-back-ref">↩</a></li>
<li id="fn:160">Frey, Peter W.; Slate, David J. (1991). "Letter recognition using Holland-style adaptive classifiers". Machine Learning. 6 (2): 161–182. doi:10.1007/bf00114162. <a href="https://doi.org/10.1007%2Fbf00114162" target="_blank">https://doi.org/10.1007%2Fbf00114162</a> <a href="#fnref:160" class="footnote-back-ref">↩</a></li>
<li id="fn:161">Peltonen, Jaakko; Klami, Arto; Kaski, Samuel (2004). "Improved learning of Riemannian metrics for exploratory analysis". Neural Networks. 17 (8): 1087–1100. CiteSeerX 10.1.1.59.4865. doi:10.1016/j.neunet.2004.06.008. PMID 15555853. <a href="https://doi.org/10.1016/j.neunet.2004.06.008" target="_blank">https://doi.org/10.1016/j.neunet.2004.06.008</a> <a href="#fnref:161" class="footnote-back-ref">↩</a></li>
<li id="fn:162">Liu, Cheng-Lin; Yin, Fei; Wang, Da-Han; Wang, Qiu-Feng (January 2013). "Online and offline handwritten Chinese character recognition: Benchmarking on new databases". Pattern Recognition. 46 (1): 155–162. Bibcode:2013PatRe..46..155L. doi:10.1016/j.patcog.2012.06.021. <a href="https://doi.org/10.1016/j.patcog.2012.06.021" target="_blank">https://doi.org/10.1016/j.patcog.2012.06.021</a> <a href="#fnref:162" class="footnote-back-ref">↩</a></li>
<li id="fn:163">Wang, D.; Liu, C.; Yu, J.; Zhou, X. (2009). "CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters". 2009 10th International Conference on Document Analysis and Recognition. pp. 1206–1210. doi:10.1109/ICDAR.2009.163. ISBN 978-1-4244-4500-4. S2CID 5705532. <a href="https://doi.org/10.1109/ICDAR.2009.163" target="_blank">https://doi.org/10.1109/ICDAR.2009.163</a> <a href="#fnref:163" class="footnote-back-ref">↩</a></li>
<li id="fn:164">Liu, Cheng-Lin; Yin, Fei; Wang, Da-Han; Wang, Qiu-Feng (January 2013). "Online and offline handwritten Chinese character recognition: Benchmarking on new databases". Pattern Recognition. 46 (1): 155–162. Bibcode:2013PatRe..46..155L. doi:10.1016/j.patcog.2012.06.021. <a href="https://doi.org/10.1016/j.patcog.2012.06.021" target="_blank">https://doi.org/10.1016/j.patcog.2012.06.021</a> <a href="#fnref:164" class="footnote-back-ref">↩</a></li>
<li id="fn:165">Williams, Ben H., Marc Toussaint, and Amos J. Storkey. Extracting motion primitives from natural handwriting data. Springer Berlin Heidelberg, 2006. <a href="https://www.era.lib.ed.ac.uk/bitstream/handle/1842/3221/BH%20Williams%20PhD%20thesis%2009.pdf?sequence=1" target="_blank">https://www.era.lib.ed.ac.uk/bitstream/handle/1842/3221/BH%20Williams%20PhD%20thesis%2009.pdf?sequence=1</a> <a href="#fnref:165" class="footnote-back-ref">↩</a></li>
<li id="fn:166">Meier, Franziska, et al. "Movement segmentation using a primitive library." Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. IEEE, 2011. <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.395.8598&rep=rep1&type=pdf" target="_blank">http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.395.8598&rep=rep1&type=pdf</a> <a href="#fnref:166" class="footnote-back-ref">↩</a></li>
<li id="fn:167">T. E. de Campos, B. R. Babu and M. Varma. Character recognition in natural images. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009. <a href="http://personal.ee.surrey.ac.uk/Personal/T.Decampos/papers/decampos_etal_visapp2009.pdf" target="_blank">http://personal.ee.surrey.ac.uk/Personal/T.Decampos/papers/decampos_etal_visapp2009.pdf</a> <a href="#fnref:167" class="footnote-back-ref">↩</a></li>
<li id="fn:168">Cohen, Gregory; Afshar, Saeed; Tapson, Jonathan; André van Schaik (2017). "EMNIST: An extension of MNIST to handwritten letters". arXiv:1702.05373v1 [cs.CV]. <a href="https://arxiv.org/abs/1702.05373v1" target="_blank">https://arxiv.org/abs/1702.05373v1</a> <a href="#fnref:168" class="footnote-back-ref">↩</a></li>
<li id="fn:169">"The EMNIST Dataset". NIST. 4 April 2017. <a href="https://www.nist.gov/itl/products-and-services/emnist-dataset" target="_blank">https://www.nist.gov/itl/products-and-services/emnist-dataset</a> <a href="#fnref:169" class="footnote-back-ref">↩</a></li>
<li id="fn:170">Cohen, Gregory; Afshar, Saeed; Tapson, Jonathan; André van Schaik (2017). "EMNIST: An extension of MNIST to handwritten letters". arXiv:1702.05373 [cs.CV]. <a href="https://arxiv.org/abs/1702.05373" target="_blank">https://arxiv.org/abs/1702.05373</a> <a href="#fnref:170" class="footnote-back-ref">↩</a></li>
<li id="fn:171">Llorens, David, et al. "The UJIpenchars Database: a Pen-Based Database of Isolated Handwritten Characters." LREC. 2008. <a href="https://web.archive.org/web/20190806015012/https://pdfs.semanticscholar.org/24cf/ef15094c59322560377bbf8e4185245c654f.pdf" target="_blank">https://web.archive.org/web/20190806015012/https://pdfs.semanticscholar.org/24cf/ef15094c59322560377bbf8e4185245c654f.pdf</a> <a href="#fnref:171" class="footnote-back-ref">↩</a></li>
<li id="fn:172">Calderara, Simone; Prati, Andrea; Cucchiara, Rita (2011). "Mixtures of von Mises distributions for people trajectory shape analysis". IEEE Transactions on Circuits and Systems for Video Technology. 21 (4): 457–471. doi:10.1109/tcsvt.2011.2125550. S2CID 1427766. <a href="https://doi.org/10.1109/tcsvt.2011.2125550" target="_blank">https://doi.org/10.1109/tcsvt.2011.2125550</a> <a href="#fnref:172" class="footnote-back-ref">↩</a></li>
<li id="fn:173">Guyon, Isabelle, et al. "Result analysis of the NIPS 2003 feature selection challenge." Advances in Neural Information Processing Systems. 2004. <a href="http://papers.nips.cc/paper/2728-result-analysis-of-the-nips-2003-feature-selection-challenge.pdf" target="_blank">http://papers.nips.cc/paper/2728-result-analysis-of-the-nips-2003-feature-selection-challenge.pdf</a> <a href="#fnref:173" class="footnote-back-ref">↩</a></li>
<li id="fn:174">Lake, B. M.; Salakhutdinov, R.; Tenenbaum, J. B. (2015-12-11). "Human-level concept learning through probabilistic program induction". Science. 350 (6266): 1332–1338. Bibcode:2015Sci...350.1332L. doi:10.1126/science.aab3050. ISSN 0036-8075. PMID 26659050. <a href="https://doi.org/10.1126%2Fscience.aab3050" target="_blank">https://doi.org/10.1126%2Fscience.aab3050</a> <a href="#fnref:174" class="footnote-back-ref">↩</a></li>
<li id="fn:175">Lake, Brenden (2019-11-09). "Omniglot data set for one-shot learning". GitHub. Retrieved 2019-11-10. <a href="https://github.com/brendenlake/omniglot" target="_blank">https://github.com/brendenlake/omniglot</a> <a href="#fnref:175" class="footnote-back-ref">↩</a></li>
<li id="fn:176">LeCun, Yann; et al. (1998). "Gradient-based learning applied to document recognition". Proceedings of the IEEE. 86 (11): 2278–2324. CiteSeerX 10.1.1.32.9552. doi:10.1109/5.726791. S2CID 14542261. <a href="https://doi.org/10.1109/5.726791" target="_blank">https://doi.org/10.1109/5.726791</a> <a href="#fnref:176" class="footnote-back-ref">↩</a></li>
<li id="fn:177">Kussul, Ernst; Baidyk, Tatiana (2004). "Improved method of handwritten digit recognition tested on MNIST database". Image and Vision Computing. 22 (12): 971–981. doi:10.1016/j.imavis.2004.03.008. <a href="https://doi.org/10.1016/j.imavis.2004.03.008" target="_blank">https://doi.org/10.1016/j.imavis.2004.03.008</a> <a href="#fnref:177" class="footnote-back-ref">↩</a></li>
<li id="fn:178">Xu, Lei; Krzyżak, Adam; Suen, Ching Y. (1992). "Methods of combining multiple classifiers and their applications to handwriting recognition". IEEE Transactions on Systems, Man, and Cybernetics. 22 (3): 418–435. doi:10.1109/21.155943. hdl:10338.dmlcz/135217. <a href="https://doi.org/10.1109/21.155943" target="_blank">https://doi.org/10.1109/21.155943</a> <a href="#fnref:178" class="footnote-back-ref">↩</a></li>
<li id="fn:179">Alimoglu, Fevzi, et al. "Combining multiple classifiers for pen-based handwritten digit recognition." (1996). <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.6299" target="_blank">http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.6299</a> <a href="#fnref:179" class="footnote-back-ref">↩</a></li>
<li id="fn:180">Tang, E. Ke; et al. (2005). "Linear dimensionality reduction using relevance weighted LDA". Pattern Recognition. 38 (4): 485–493. Bibcode:2005PatRe..38..485T. doi:10.1016/j.patcog.2004.09.005. S2CID 10580110. <a href="https://doi.org/10.1016/j.patcog.2004.09.005" target="_blank">https://doi.org/10.1016/j.patcog.2004.09.005</a> <a href="#fnref:180" class="footnote-back-ref">↩</a></li>
<li id="fn:181">Hong, Yi, et al. "Learning a mixture of sparse distance metrics for classification and dimensionality reduction." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011. <a href="https://pages.ucsd.edu/~ztu/publication/iccv11_sparsemetric.pdf" target="_blank">https://pages.ucsd.edu/~ztu/publication/iccv11_sparsemetric.pdf</a> <a href="#fnref:181" class="footnote-back-ref">↩</a></li>
<li id="fn:182">Thoma, Martin (2017). "The HASYv2 dataset". arXiv:1701.08380 [cs.CV]. <a href="https://arxiv.org/abs/1701.08380" target="_blank">https://arxiv.org/abs/1701.08380</a> <a href="#fnref:182" class="footnote-back-ref">↩</a></li>
<li id="fn:183">Karki, Manohar; Liu, Qun; DiBiano, Robert; Basu, Saikat; Mukhopadhyay, Supratik (2018-06-20). "Pixel-level Reconstruction and Classification for Noisy Handwritten Bangla Characters". arXiv:1806.08037 [cs.CV]. <a href="https://arxiv.org/abs/1806.08037" target="_blank">https://arxiv.org/abs/1806.08037</a> <a href="#fnref:183" class="footnote-back-ref">↩</a></li>
<li id="fn:184">Liu, Qun; Collier, Edward; Mukhopadhyay, Supratik (2019). "PCGAN-CHAR: Progressively Trained Classifier Generative Adversarial Networks for Classification of Noisy Handwritten Bangla Characters". Digital Libraries at the Crossroads of Digital Information for the Future. Lecture Notes in Computer Science. Vol. 11853. Springer International Publishing. pp. 3–15. arXiv:1908.08987. doi:10.1007/978-3-030-34058-2_1. ISBN 978-3-030-34057-5. S2CID 201665955. <a href="https://doi.org/10.1007/978-3-030-34058-2_1" target="_blank">https://doi.org/10.1007/978-3-030-34058-2_1</a> <a href="#fnref:184" class="footnote-back-ref">↩</a></li>
<li id="fn:185">"iSAID". captain-whu.github.io. Retrieved 2021-11-30. <a href="https://captain-whu.github.io/iSAID/index.html" target="_blank">https://captain-whu.github.io/iSAID/index.html</a> <a href="#fnref:185" class="footnote-back-ref">↩</a></li>
<li id="fn:186">Zamir, Syed; Arora, Aditya; Gupta, Akshita; Khan, Salman; Sun, Guolei; Khan, Fahad; Zhu, Fan; Shao, Ling; Xia, Gui-Song; Bai, Xiang (2019). "iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images". <a href="https://captain-whu.github.io/iSAID/index.html" target="_blank">https://captain-whu.github.io/iSAID/index.html</a> <a href="#fnref:186" class="footnote-back-ref">↩</a></li>
<li id="fn:187">Yuan, Jiangye; Gleason, Shaun S.; Cheriyadat, Anil M. (2013). "Systematic benchmarking of aerial image segmentation". IEEE Geoscience and Remote Sensing Letters. 10 (6): 1527–1531. Bibcode:2013IGRSL..10.1527Y. doi:10.1109/lgrs.2013.2261453. S2CID 629629. <a href="https://doi.org/10.1109/lgrs.2013.2261453" target="_blank">https://doi.org/10.1109/lgrs.2013.2261453</a> <a href="#fnref:187" class="footnote-back-ref">↩</a></li>
<li id="fn:188">Vatsavai, Ranga Raju. "Object based image classification: state of the art and computational challenges." Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data. ACM, 2013. <a href="https://dl.acm.org/citation.cfm?id=2534927" target="_blank">https://dl.acm.org/citation.cfm?id=2534927</a> <a href="#fnref:188" class="footnote-back-ref">↩</a></li>
<li id="fn:189">Butenuth, Matthias, et al. "Integrating pedestrian simulation, tracking and event detection for crowd analysis." Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on. IEEE, 2011. <a href="http://www.hartmann-alberts.de/dirk/pub/proceedings2011e.pdf" target="_blank">http://www.hartmann-alberts.de/dirk/pub/proceedings2011e.pdf</a> <a href="#fnref:189" class="footnote-back-ref">↩</a></li>
<li id="fn:190">Fradi, Hajer, and Jean-Luc Dugelay. "Low level crowd analysis using frame-wise normalized feature for people counting." Information Forensics and Security (WIFS), 2012 IEEE International Workshop on. IEEE, 2012. <a href="http://www.eurecom.fr/fr/publication/3841/download/mm-publi-3841.pdf" target="_blank">http://www.eurecom.fr/fr/publication/3841/download/mm-publi-3841.pdf</a> <a href="#fnref:190" class="footnote-back-ref">↩</a></li>
<li id="fn:191">Johnson, Brian Alan, Ryutaro Tateishi, and Nguyen Thanh Hoan. "A hybrid pansharpening approach and multiscale object-based image analysis for mapping diseased pine and oak trees." International Journal of Remote Sensing 34.20 (2013): 6969–6982. <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.826.9200&rep=rep1&type=pdf" target="_blank">http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.826.9200&rep=rep1&type=pdf</a> <a href="#fnref:191" class="footnote-back-ref">↩</a></li>
<li id="fn:192">Mohd Pozi, Muhammad Syafiq; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran (2015). "A new classification model for a class imbalanced data set using genetic programming and support vector machines: Case study for wilt disease classification". Remote Sensing Letters. 6 (7): 568–577. Bibcode:2015RSL.....6..568M. doi:10.1080/2150704X.2015.1062159. S2CID 58788630. <a href="https://www.tandfonline.com/doi/abs/10.1080/2150704X.2015.1062159" target="_blank">https://www.tandfonline.com/doi/abs/10.1080/2150704X.2015.1062159</a> <a href="#fnref:192" class="footnote-back-ref">↩</a></li>
<li id="fn:193">Gallego, A.-J.; Pertusa, A.; Gil, P. "Automatic Ship Classification from Optical Aerial Images with Convolutional Neural Networks." Remote Sensing. 2018; 10(4):511. <a href="https://www.mdpi.com/2072-4292/10/4/511" target="_blank">https://www.mdpi.com/2072-4292/10/4/511</a> <a href="#fnref:193" class="footnote-back-ref">↩</a></li>
<li id="fn:194">Gallego, A.-J.; Pertusa, A.; Gil, P. "MAritime SATellite Imagery dataset". Available: https://www.iuii.ua.es/datasets/masati/, 2018. <a href="https://www.iuii.ua.es/datasets/masati/" target="_blank">https://www.iuii.ua.es/datasets/masati/</a> <a href="#fnref:194" class="footnote-back-ref">↩</a></li>
<li id="fn:195">Johnson, Brian; Tateishi, Ryutaro; Xie, Zhixiao (2012). "Using geographically weighted variables for image classification". Remote Sensing Letters. 3 (6): 491–499. Bibcode:2012RSL.....3..491J. doi:10.1080/01431161.2011.629637. S2CID 122543681. <a href="https://doi.org/10.1080/01431161.2011.629637" target="_blank">https://doi.org/10.1080/01431161.2011.629637</a> <a href="#fnref:195" class="footnote-back-ref">↩</a></li>
<li id="fn:196">Chatterjee, Sankhadeep, et al. "Forest Type Classification: A Hybrid NN-GA Model Based Approach." Information Systems Design and Intelligent Applications. Springer India, 2016. 227–236. <a href="https://www.researchgate.net/profile/Sankhadeep_Chatterjee/publication/282605325_Forest_Type_Classification_A_Hybrid_NN-GA_Model_Based_Approach/links/57493cb308ae5c51e29e6f1b/Forest-Type-Classification-A-Hybrid-NN-GA-Model-Based-Approach.pdf" target="_blank">https://www.researchgate.net/profile/Sankhadeep_Chatterjee/publication/282605325_Forest_Type_Classification_A_Hybrid_NN-GA_Model_Based_Approach/links/57493cb308ae5c51e29e6f1b/Forest-Type-Classification-A-Hybrid-NN-GA-Model-Based-Approach.pdf</a> <a href="#fnref:196" class="footnote-back-ref">↩</a></li>
<li id="fn:197">Diegert, Carl. "A combinatorial method for tracing objects using semantics of their shape." Applied Imagery Pattern Recognition Workshop (AIPR), 2010 IEEE 39th. IEEE, 2010. <a href="https://www.osti.gov/servlets/purl/1278837" target="_blank">https://www.osti.gov/servlets/purl/1278837</a> <a href="#fnref:197" class="footnote-back-ref">↩</a></li>
<li id="fn:198">Razakarivony, Sebastien, and Frédéric Jurie. "Small target detection combining foreground and background manifolds." IAPR International Conference on Machine Vision Applications. 2013. <a href="https://hal.archives-ouvertes.fr/hal-00943444/file/13_mva-detection.pdf" target="_blank">https://hal.archives-ouvertes.fr/hal-00943444/file/13_mva-detection.pdf</a> <a href="#fnref:198" class="footnote-back-ref">↩</a></li>
<li id="fn:199">"SpaceNet". explore.digitalglobe.com. Archived from the original on 13 March 2018. Retrieved 2018-03-13. <a href="https://web.archive.org/web/20180313092809/http://explore.digitalglobe.com/spacenet" target="_blank">https://web.archive.org/web/20180313092809/http://explore.digitalglobe.com/spacenet</a> <a href="#fnref:199" class="footnote-back-ref">↩</a></li>
<li id="fn:200">Etten, Adam Van (2017-01-05). "Getting Started With SpaceNet Data". The DownLinQ. Retrieved 2018-03-13. <a href="https://medium.com/the-downlinq/getting-started-with-spacenet-data-827fd2ec9f53" target="_blank">https://medium.com/the-downlinq/getting-started-with-spacenet-data-827fd2ec9f53</a> <a href="#fnref:200" class="footnote-back-ref">↩</a></li>
<li id="fn:201">Vakalopoulou, M.; Bus, N.; Karantzalos, K.; Paragios, N. (July 2017). "Integrating edge/boundary priors with classification scores for building detection in very high resolution data". 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). pp. 3309–3312. doi:10.1109/IGARSS.2017.8127705. ISBN 978-1-5090-4951-6. S2CID 8297433. <a href="https://doi.org/10.1109/IGARSS.2017.8127705" target="_blank">https://doi.org/10.1109/IGARSS.2017.8127705</a> <a href="#fnref:201" class="footnote-back-ref">↩</a></li>
<li id="fn:202">Yang, Yi; Newsam, Shawn (2010). "Bag-of-visual-words and spatial extensions for land-use classification". Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. New York, New York, USA: ACM Press. pp. 270–279. doi:10.1145/1869790.1869829. ISBN 9781450304283. S2CID 993769. <a href="https://doi.org/10.1145/1869790.1869829" target="_blank">https://doi.org/10.1145/1869790.1869829</a> <a href="#fnref:202" class="footnote-back-ref">↩</a></li>
<li id="fn:203">Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (2015-11-03). "DeepSat: A learning framework for satellite imagery". Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM. pp. 1–10. doi:10.1145/2820783.2820816. ISBN 9781450339674. S2CID 4387134. <a href="https://doi.org/10.1145/2820783.2820816" target="_blank">https://doi.org/10.1145/2820783.2820816</a> <a href="#fnref:203" class="footnote-back-ref">↩</a></li>
<li id="fn:204">Liu, Qun; Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (2019-11-21). "DeepSat V2: feature augmented convolutional neural nets for satellite image classification". Remote Sensing Letters. 11 (2): 156–165. arXiv:1911.07747. doi:10.1080/2150704x.2019.1693071. ISSN 2150-704X. S2CID 208138097. <a href="https://doi.org/10.1080/2150704x.2019.1693071" target="_blank">https://doi.org/10.1080/2150704x.2019.1693071</a> <a href="#fnref:204" class="footnote-back-ref">↩</a></li>
<li id="fn:205">Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (2015-11-03). "DeepSat: A learning framework for satellite imagery". Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM. pp. 1–10. doi:10.1145/2820783.2820816. ISBN 9781450339674. S2CID 4387134. <a href="https://doi.org/10.1145/2820783.2820816" target="_blank">https://doi.org/10.1145/2820783.2820816</a> <a href="#fnref:205" class="footnote-back-ref">↩</a></li>
<li id="fn:206">Liu, Qun; Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (2019-11-21). "DeepSat V2: feature augmented convolutional neural nets for satellite image classification". Remote Sensing Letters. 11 (2): 156–165. arXiv:1911.07747. doi:10.1080/2150704x.2019.1693071. ISSN 2150-704X. S2CID 208138097. <a href="https://doi.org/10.1080/2150704x.2019.1693071" target="_blank">https://doi.org/10.1080/2150704x.2019.1693071</a> <a href="#fnref:206" class="footnote-back-ref">↩</a></li>
<li id="fn:207">Islam, Md Jahidul, et al. "Semantic Segmentation of Underwater Imagery: Dataset and Benchmark." 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020. <a href="https://ieeexplore.ieee.org/abstract/document/9340821" target="_blank">https://ieeexplore.ieee.org/abstract/document/9340821</a> <a href="#fnref:207" class="footnote-back-ref">↩</a></li>
<li id="fn:208">Waszak et al. "Semantic Segmentation in Underwater Ship Inspections: Benchmark and Data Set." IEEE Journal of Oceanic Engineering. IEEE, 2022. <a href="https://ieeexplore.ieee.org/document/9998080" target="_blank">https://ieeexplore.ieee.org/document/9998080</a> <a href="#fnref:208" class="footnote-back-ref">↩</a></li>
<li id="fn:209">"True Color Kodak Images". r0k.us. Retrieved 2025-02-27. <a href="https://r0k.us/graphics/kodak/" target="_blank">https://r0k.us/graphics/kodak/</a> <a href="#fnref:209" class="footnote-back-ref">↩</a></li>
<li id="fn:210">Ebadi, Ashkan; Paul, Patrick; Auer, Sofia; Tremblay, Stéphane (2021-11-12). "NRC-GAMMA: Introducing a Novel Large Gas Meter Image Dataset". arXiv:2111.06827 [cs.CV]. <a href="https://arxiv.org/abs/2111.06827" target="_blank">https://arxiv.org/abs/2111.06827</a> <a href="#fnref:210" class="footnote-back-ref">↩</a></li>
<li id="fn:211">Canada, Government of Canada National Research Council (2021). "The gas meter image dataset (NRC-GAMMA) - NRC Digital Repository". nrc-digital-repository.canada.ca. doi:10.4224/3c8s-z290. Retrieved 2021-12-02. <a href="https://nrc-digital-repository.canada.ca/eng/view/object/?id=ba1fc493-e65f-4c0a-ab31-ecbcdf00bfa4" target="_blank">https://nrc-digital-repository.canada.ca/eng/view/object/?id=ba1fc493-e65f-4c0a-ab31-ecbcdf00bfa4</a> <a href="#fnref:211" class="footnote-back-ref">↩</a></li>
<li id="fn:212">Rabah, Chaima Ben; Coatrieux, Gouenou; Abdelfattah, Riadh (October 2020). "The Supatlantique Scanned Documents Database for Digital Image Forensics Purposes". 2020 IEEE International Conference on Image Processing (ICIP). IEEE. pp. 2096–2100. doi:10.1109/icip40778.2020.9190665. ISBN 978-1-7281-6395-6. S2CID 224881147. <a href="https://doi.org/10.1109/icip40778.2020.9190665" target="_blank">https://doi.org/10.1109/icip40778.2020.9190665</a> <a href="#fnref:212" class="footnote-back-ref">↩</a></li>
<li id="fn:213">Mills, Kyle; Tamblyn, Isaac (2018-05-16). "Big graphene dataset". National Research Council of Canada. doi:10.4224/c8sc04578j.data. <a href="https://doi.org/10.4224/c8sc04578j.data" target="_blank">https://doi.org/10.4224/c8sc04578j.data</a> <a href="#fnref:213" class="footnote-back-ref">↩</a></li>
<li id="fn:214">Mills, Kyle; Spanner, Michael; Tamblyn, Isaac (2018-05-16). "Quantum simulation". Quantum simulations of an electron in a two dimensional potential well. National Research Council of Canada. doi:10.4224/PhysRevA.96.042113.data. <a href="https://doi.org/10.4224/PhysRevA.96.042113.data" target="_blank">https://doi.org/10.4224/PhysRevA.96.042113.data</a> <a href="#fnref:214" class="footnote-back-ref">↩</a></li>
<li id="fn:215">Rohrbach, M.; Amin, S.; Andriluka, M.; Schiele, B. (2012). "A database for fine grained activity detection of cooking activities". 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE. pp. 1194–1201. doi:10.1109/cvpr.2012.6247801. ISBN 978-1-4673-1228-8. <a href="https://doi.org/10.1109/cvpr.2012.6247801" target="_blank">https://doi.org/10.1109/cvpr.2012.6247801</a> <a href="#fnref:215" class="footnote-back-ref">↩</a></li>
<li id="fn:216">Kuehne, Hilde, Ali Arslan, and Thomas Serre. "The language of actions: Recovering the syntax and semantics of goal-directed human activities." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014. <a href="https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Kuehne_The_Language_of_2014_CVPR_paper.pdf" target="_blank">https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Kuehne_The_Language_of_2014_CVPR_paper.pdf</a> <a href="#fnref:216" class="footnote-back-ref">↩</a></li>
<li id="fn:217">Voloshynovskiy, Sviatoslav, et al. "Towards Reproducible results in authentication based on physical non-cloneable functions: The Forensic Authentication Microstructure Optical Set (FAMOS)." Proceedings of the IEEE International Workshop on Information Forensics and Security (WIFS). 2012. <a href="http://vision.unige.ch/publications/postscript/2012/2012.WIFS.database.pdf" target="_blank">http://vision.unige.ch/publications/postscript/2012/2012.WIFS.database.pdf</a> <a href="#fnref:217" class="footnote-back-ref">↩</a></li>
<li id="fn:218">Taran, Olga, Shideh Rezaeifar, et al. "PharmaPack: mobile fine-grained recognition of pharma packages." Proc. European Signal Processing Conference (EUSIPCO). 2017. <a href="https://archive-ouverte.unige.ch/unige:97444/ATTACHMENT01" target="_blank">https://archive-ouverte.unige.ch/unige:97444/ATTACHMENT01</a> <a href="#fnref:218" class="footnote-back-ref">↩</a></li>
<li id="fn:219">Khosla, Aditya, et al. "Novel dataset for fine-grained image categorization: Stanford dogs." Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC). 2011. <a href="https://people.csail.mit.edu/khosla/papers/fgvc2011.pdf" target="_blank">https://people.csail.mit.edu/khosla/papers/fgvc2011.pdf</a> <a href="#fnref:219" class="footnote-back-ref">↩</a></li>
<li id="fn:220">Parkhi, Omkar M., et al. "Cats and dogs." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012. <a href="http://www.robots.ox.ac.uk:5000/~vgg/publications/2012/parkhi12a/parkhi12a.pdf" target="_blank">http://www.robots.ox.ac.uk:5000/~vgg/publications/2012/parkhi12a/parkhi12a.pdf</a> <a href="#fnref:220" class="footnote-back-ref">↩</a></li>
<li id="fn:221">Biggs, Benjamin; Boyne, Oliver; Charles, James; Fitzgibbon, Andrew; Cipolla, Roberto (2020). Computer Vision – ECCV 2020. Lecture Notes in Computer Science. Vol. 12356. arXiv:2007.11110. doi:10.1007/978-3-030-58621-8. ISBN 978-3-030-58620-1. S2CID 227173931. <a href="https://arxiv.org/abs/2007.11110" target="_blank">https://arxiv.org/abs/2007.11110</a> <a href="#fnref:221" class="footnote-back-ref">↩</a></li>
<li id="fn:222">Parkhi, Omkar M., et al. "Cats and dogs." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012. <a href="http://www.robots.ox.ac.uk:5000/~vgg/publications/2012/parkhi12a/parkhi12a.pdf" target="_blank">http://www.robots.ox.ac.uk:5000/~vgg/publications/2012/parkhi12a/parkhi12a.pdf</a> <a href="#fnref:222" class="footnote-back-ref">↩</a></li>
<li id="fn:223">Razavian, Ali, et al. "CNN features off-the-shelf: an astounding baseline for recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014. <a href="https://www.cv-foundation.org/openaccess/content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.pdf" target="_blank">https://www.cv-foundation.org/openaccess/content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.pdf</a> <a href="#fnref:223" class="footnote-back-ref">↩</a></li>
<li id="fn:224">Ortega, Michael; et al. (1998). "Supporting ranked boolean similarity queries in MARS". IEEE Transactions on Knowledge and Data Engineering. 10 (6): 905–925. CiteSeerX 10.1.1.36.6079. doi:10.1109/69.738357. <a href="https://doi.org/10.1109/69.738357" target="_blank">https://doi.org/10.1109/69.738357</a> <a href="#fnref:224" class="footnote-back-ref">↩</a></li>
<li id="fn:225">He, Xuming, Richard S. Zemel, and Miguel Á. Carreira-Perpiñán. "Multiscale conditional random fields for image labeling." Computer vision and pattern recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE computer society conference on. Vol. 2. IEEE, 2004. <a href="ftp://www-vhost.cs.toronto.edu/public_html/public_html/dist/zemel/Papers/cvpr04.pdf" target="_blank">ftp://www-vhost.cs.toronto.edu/public_html/public_html/dist/zemel/Papers/cvpr04.pdf</a> <a href="#fnref:225" class="footnote-back-ref">↩</a></li>
<li id="fn:226">Deneke, Tewodros, et al. "Video transcoding time prediction for proactive load balancing." Multimedia and Expo (ICME), 2014 IEEE International Conference on. IEEE, 2014. <a href="https://ieeexplore.ieee.org/abstract/document/6890256/" target="_blank">https://ieeexplore.ieee.org/abstract/document/6890256/</a> <a href="#fnref:226" class="footnote-back-ref">↩</a></li>
<li id="fn:227">Ting-Hao (Kenneth) Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell (13 April 2016). "Visual Storytelling". arXiv:1604.03968 [cs.CL]. <a href="https://arxiv.org/abs/1604.03968" target="_blank">https://arxiv.org/abs/1604.03968</a> <a href="#fnref:227" class="footnote-back-ref">↩</a></li>
<li id="fn:228">Wah, Catherine, et al. "The Caltech-UCSD Birds-200-2011 Dataset." (2011). <a href="https://authors.library.caltech.edu/27452/1/CUB_200_2011.pdf" target="_blank">https://authors.library.caltech.edu/27452/1/CUB_200_2011.pdf</a> <a href="#fnref:228" class="footnote-back-ref">↩</a></li>
<li id="fn:229">Duan, Kun, et al. "Discovering localized attributes for fine-grained recognition." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012. <a href="http://vision.soic.indiana.edu/papers/attributes2012cvpr.pdf" target="_blank">http://vision.soic.indiana.edu/papers/attributes2012cvpr.pdf</a> <a href="#fnref:229" class="footnote-back-ref">↩</a></li>
<li id="fn:230">"YouTube-8M Dataset". research.google.com. Retrieved 1 October 2016. <a href="https://research.google.com/youtube8m/" target="_blank">https://research.google.com/youtube8m/</a> <a href="#fnref:230" class="footnote-back-ref">↩</a></li>
<li id="fn:231">Abu-El-Haija, Sami; Kothari, Nisarg; Lee, Joonseok; Natsev, Paul; Toderici, George; Varadarajan, Balakrishnan; Vijayanarasimhan, Sudheendra (27 September 2016). "YouTube-8M: A Large-Scale Video Classification Benchmark". arXiv:1609.08675 [cs.CV]. <a href="https://arxiv.org/abs/1609.08675" target="_blank">https://arxiv.org/abs/1609.08675</a> <a href="#fnref:231" class="footnote-back-ref">↩</a></li>
<li id="fn:232">"YFCC100M Dataset". mmcommons.org. Yahoo-ICSI-LLNL. Retrieved 1 June 2017. <a href="http://mmcommons.org" target="_blank">http://mmcommons.org</a> <a href="#fnref:232" class="footnote-back-ref">↩</a></li>
<li id="fn:233">Bart Thomee; David A Shamma; Gerald Friedland; Benjamin Elizalde; Karl Ni; Douglas Poland; Damian Borth; Li-Jia Li (25 April 2016). "YFCC100M: The new data in multimedia research". Communications of the ACM. 59 (2): 64–73. arXiv:1503.01817. doi:10.1145/2812802. S2CID 207230134. <a href="https://doi.org/10.1145/2812802" target="_blank">https://doi.org/10.1145/2812802</a> <a href="#fnref:233" class="footnote-back-ref">↩</a></li>
<li id="fn:234">Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "LIRIS-ACCEDE: A Video Database for Affective Content Analysis," in IEEE Transactions on Affective Computing, 2015. <a href="https://hal.archives-ouvertes.fr/hal-01375518/document" target="_blank">https://hal.archives-ouvertes.fr/hal-01375518/document</a> <a href="#fnref:234" class="footnote-back-ref">↩</a></li>
<li id="fn:235">Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "Deep Learning vs. Kernel Methods: Performance for Emotion Prediction in Videos," in 2015 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2015. <a href="https://hal.archives-ouvertes.fr/hal-01193144/document" target="_blank">https://hal.archives-ouvertes.fr/hal-01193144/document</a> <a href="#fnref:235" class="footnote-back-ref">↩</a></li>
<li id="fn:236">M. Sjöberg, Y. Baveye, H. Wang, V. L. Quang, B. Ionescu, E. Dellandréa, M. Schedl, C.-H. Demarty, and L. Chen, "The MediaEval 2015 Affective Impact of Movies Task," in MediaEval 2015 Workshop, 2015. <a href="https://www.researchgate.net/profile/Hanli_Wang2/publication/309704559_The_MediaEval_2015_Affective_Impact_of_Movies_Task/links/581dada308ae12715af33bc8/The-MediaEval-2015-Affective-Impact-of-Movies-Task.pdf" target="_blank">https://www.researchgate.net/profile/Hanli_Wang2/publication/309704559_The_MediaEval_2015_Affective_Impact_of_Movies_Task/links/581dada308ae12715af33bc8/The-MediaEval-2015-Affective-Impact-of-Movies-Task.pdf</a> <a href="#fnref:236" class="footnote-back-ref">↩</a></li>
<li id="fn:237">S. Johnson and M. Everingham, "Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation," in Proceedings of the 21st British Machine Vision Conference (BMVC 2010). <a href="http://sam.johnson.io/research/publications/johnson10bmvc.pdf" target="_blank">http://sam.johnson.io/research/publications/johnson10bmvc.pdf</a> <a href="#fnref:237" class="footnote-back-ref">↩</a></li>
<li id="fn:238">S. Johnson and M. Everingham, "Learning Effective Human Pose Estimation from Inaccurate Annotation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011). <a href="http://sam.johnson.io/research/publications/johnson11cvpr.pdf" target="_blank">http://sam.johnson.io/research/publications/johnson11cvpr.pdf</a> <a href="#fnref:238" class="footnote-back-ref">↩</a></li>
<li id="fn:239">Afifi, Mahmoud; Hussain, Khaled F. (2017-11-02). "The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques". arXiv:1711.00972 [cs.CV]. <a href="https://arxiv.org/abs/1711.00972" target="_blank">https://arxiv.org/abs/1711.00972</a> <a href="#fnref:239" class="footnote-back-ref">↩</a></li>
<li id="fn:240">"MCQ Dataset". sites.google.com. Retrieved 2017-11-18. <a href="https://sites.google.com/view/mcq-dataset/mcqe-dataset" target="_blank">https://sites.google.com/view/mcq-dataset/mcqe-dataset</a> <a href="#fnref:240" class="footnote-back-ref">↩</a></li>
<li id="fn:241">Taj-Eddin, I. A. T. F.; Afifi, M.; Korashy, M.; Hamdy, D.; Nasser, M.; Derbaz, S. (July 2016). "A new compression technique for surveillance videos: Evaluation using new dataset". 2016 Sixth International Conference on Digital Information and Communication Technology and its Applications (DICTAP). pp. 159–164. doi:10.1109/DICTAP.2016.7544020. ISBN 978-1-4673-9609-7. S2CID 8698850. <a href="https://doi.org/10.1109/DICTAP.2016.7544020" target="_blank">https://doi.org/10.1109/DICTAP.2016.7544020</a> <a href="#fnref:241" class="footnote-back-ref">↩</a></li>
<li id="fn:242">Tabak, Michael A.; Norouzzadeh, Mohammad S.; Wolfson, David W.; Sweeney, Steven J.; Vercauteren, Kurt C.; Snow, Nathan P.; Halseth, Joseph M.; Di Salvo, Paul A.; Lewis, Jesse S.; White, Michael D.; Teton, Ben; Beasley, James C.; Schlichting, Peter E.; Boughton, Raoul K.; Wight, Bethany; Newkirk, Eric S.; Ivan, Jacob S.; Odell, Eric A.; Brook, Ryan K.; Lukacs, Paul M.; Moeller, Anna K.; Mandeville, Elizabeth G.; Clune, Jeff; Miller, Ryan S.; Photopoulou, Theoni (2018). "Machine learning to classify animal species in camera trap images: Applications in ecology". Methods in Ecology and Evolution. 10 (4): 585–590. doi:10.1111/2041-210X.13120. ISSN 2041-210X. <a href="https://doi.org/10.1111%2F2041-210X.13120" target="_blank">https://doi.org/10.1111%2F2041-210X.13120</a> <a href="#fnref:242" class="footnote-back-ref">↩</a></li>
<li id="fn:243">Taj-Eddin, Islam A. T. F.; Afifi, Mahmoud; Korashy, Mostafa; Ahmed, Ali H.; Ng, Yoke Cheng; Hernandez, Evelyng; Abdel-Latif, Salma M. (November 2017). "Can we see photosynthesis? Magnifying the tiny color changes of plant green leaves using Eulerian video magnification". Journal of Electronic Imaging. 26 (6): 060501. arXiv:1706.03867. Bibcode:2017JEI....26f0501T. doi:10.1117/1.jei.26.6.060501. ISSN 1017-9909. S2CID 12367169. <a href="https://doi.org/10.1117/1.jei.26.6.060501" target="_blank">https://doi.org/10.1117/1.jei.26.6.060501</a> <a href="#fnref:243" class="footnote-back-ref">↩</a></li>
<li id="fn:244">"Mathematical Mathematics Memes". <a href="https://www.kaggle.com/abdelghanibelgaid/mathematical-mathematics-memes" target="_blank">https://www.kaggle.com/abdelghanibelgaid/mathematical-mathematics-memes</a> <a href="#fnref:244" class="footnote-back-ref">↩</a></li>
<li id="fn:245">Karras, Tero; Laine, Samuli; Aila, Timo (June 2019). "A Style-Based Generator Architecture for Generative Adversarial Networks". 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 4396–4405. arXiv:1812.04948. doi:10.1109/cvpr.2019.00453. ISBN 978-1-7281-3293-8. S2CID 54482423. <a href="https://doi.org/10.1109/cvpr.2019.00453" target="_blank">https://doi.org/10.1109/cvpr.2019.00453</a> <a href="#fnref:245" class="footnote-back-ref">↩</a></li>
<li id="fn:246">Oltean, Mihai (2017). "Fruits-360 dataset". GitHub. <a href="https://www.github.com/fruits-360" target="_blank">https://www.github.com/fruits-360</a> <a href="#fnref:246" class="footnote-back-ref">↩</a></li>
</ol>