List of datasets in computer vision and image processing

This is a list of image and video datasets for machine learning research; it is part of the broader list of datasets for machine-learning research. These datasets consist primarily of images or videos for tasks such as object detection, facial recognition, and multi-label classification.
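Many of the classification datasets below are distributed through common library loaders. As a minimal sketch (assuming torch and torchvision are installed; the "data" directory is a placeholder), this is how MNIST and CIFAR-10 are typically consumed:

```python
# Illustrative only: download and batch two datasets from the tables below.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

mnist = datasets.MNIST(root="data", train=True, download=True, transform=to_tensor)
cifar10 = datasets.CIFAR10(root="data", train=True, download=True, transform=to_tensor)

loader = DataLoader(mnist, batch_size=64, shuffle=True)
images, labels = next(iter(loader))
print(images.shape)  # torch.Size([64, 1, 28, 28]) -- grayscale 28x28 digits
print(labels[:8])    # integer class labels 0-9
```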

Object detection and recognition

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
MNIST | Database of grayscale handwritten digits. | | 60,000 | image, label | classification | 1994 | [1] | LeCun et al.
Extended MNIST | Database of grayscale handwritten digits and letters. | | 810,000 | image, label | classification | 2010 | [2] | NIST
NYU Object Recognition Benchmark (NORB) | Stereoscopic pairs of photos of toys in various orientations. | Centering, perturbation. | 97,200 image pairs (50 uniform-colored toys under 36 angles, 9 elevations, and 6 lighting conditions) | images | object recognition | 2004 | [3][4] | LeCun et al.
80 Million Tiny Images | 80 million 32×32 images labelled with 75,062 non-abstract nouns. | | 80,000,000 | image, label | | 2008 | [5] | Torralba et al.
Street View House Numbers (SVHN) | 630,420 digits with bounding boxes in house numbers captured in Google Street View. | | 630,420 | image, label, bounding boxes | | 2011 | [6][7] | Netzer et al.
JFT-300M | Dataset internal to Google Research: 303M images with 375M labels in 18,291 categories. | | 303,000,000 | image, label | | 2017 | [8][9][10] | Google Research
JFT-3B | Internal to Google Research: 3 billion images, annotated with ~30k categories in a hierarchy. | | 3,000,000,000 | image, label | | 2021 | [11] | Google Research
Places | 10+ million images in 400+ scene classes, with 5,000 to 30,000 images per class. | | 10,000,000 | image, label | | 2018 | [12] | Zhou et al.
Ego4D | A massive-scale egocentric dataset and benchmark suite collected across 74 worldwide locations and 9 countries, with over 3,670 hours of daily-life activity video. | Object bounding boxes, transcriptions, labeling. | 3,670 video hours | video, audio, transcriptions | multimodal first-person task | 2022 | [13] | K. Grauman et al.
Wikipedia-based Image Text Dataset (WIT) | 37.5 million image–text examples with 11.5 million unique images across 108 Wikipedia languages. | | 11,500,000 | image, caption | pretraining, image captioning | 2021 | [14] | Srinivasan et al., Google Research
Visual Genome | Images and their descriptions. | | 108,000 | images, text | image captioning | 2016 | [15] | R. Krishna et al.
Berkeley 3-D Object Dataset | 849 images taken in 75 different scenes; about 50 different object classes are labeled. | Object bounding boxes and labeling. | 849 | labeled images, text | object recognition | 2014 | [16][17] | A. Janoch et al.
Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500) | 500 natural images, explicitly separated into disjoint train, validation and test subsets, plus benchmarking code. Based on BSDS300. | Each image segmented by five different subjects on average. | 500 | segmented images | contour detection and hierarchical image segmentation | 2011 | [18] | University of California, Berkeley
Microsoft Common Objects in Context (COCO) | Complex everyday scenes of common objects in their natural context. | Object highlighting, labeling, and classification into 91 object types. | 2,500,000 | labeled images, text | object recognition | 2015 | [19][20][21] | T. Lin et al.
ImageNet | Labeled object image database, used in the ImageNet Large Scale Visual Recognition Challenge. | Labeled objects, bounding boxes, descriptive words, SIFT features. | 14,197,122 | images, text | object recognition, scene recognition | 2009 (2014) | [22][23][24] | J. Deng et al.
SUN (Scene UNderstanding) | Very large scene and object recognition database. | Places and objects are labeled; objects are segmented. | 131,067 | images, text | object recognition, scene recognition | 2014 | [25][26] | J. Xiao et al.
LSUN (Large SUN) | 10 scene categories (bedroom, etc.) and 20 object categories (airplane, etc.). | Images and labels. | ~60 million | images, text | object recognition, scene recognition | 2015 | [27][28][29] | Yu et al.
LVIS (Large Vocabulary Instance Segmentation) | Segmentation masks for over 1,000 entry-level object categories in images. | | 2.2 million segmentations, 164K images | images, segmentation masks | image segmentation | 2019 | [30] | Gupta et al.
Open Images | A large set of images listed as having a CC BY 2.0 license, with image-level labels and bounding boxes spanning thousands of classes. | Image-level labels, bounding boxes. | 9,178,275 | images, text | classification, object recognition | 2017 (V7: 2022) | [31] | Krasin et al.
TV News Channel Commercial Detection Dataset | TV commercials and news broadcasts. | Audio and video features extracted from still images. | 129,685 | text | clustering, classification | 2015 | [32][33] | P. Guha et al.
Statlog (Image Segmentation) Dataset | The instances were drawn randomly from a database of 7 outdoor images and hand-segmented to create a classification for every pixel. | Many features calculated. | 2,310 | text | classification | 1990 | [34] | University of Massachusetts
Caltech 101 | Pictures of objects. | Detailed object outlines marked. | 9,146 | images | classification, object recognition | 2003 | [35][36] | F. Li et al.
Caltech-256 | Large dataset of images for object classification. | Images categorized and hand-sorted. | 30,607 | images, text | classification, object detection | 2007 | [37][38] | G. Griffin et al.
COYO-700M | Image–text pair dataset. | 10 billion pairs of alt-text and image sources in HTML documents in CommonCrawl. | 746,972,269 | images, text | classification, image–language | 2022 | [39] | Kakao Brain
SIFT10M Dataset | SIFT features of the Caltech-256 dataset. | Extensive SIFT feature extraction. | 11,164,866 | text | classification, object detection | 2016 | [40] | X. Fu et al.
LabelMe | Annotated pictures of scenes. | Objects outlined. | 187,240 | images, text | classification, object detection | 2005 | [41] | MIT Computer Science and Artificial Intelligence Laboratory
PASCAL VOC Dataset | Images in 20 categories with localization bounding boxes. | Labeling, bounding boxes included. | 500,000 | images, text | classification, object detection | 2010 | [42][43] | M. Everingham et al.
CIFAR-10 Dataset | Many small, low-resolution images of 10 classes of objects. | Classes labelled, training set splits created. | 60,000 | images | classification | 2009 | [44][45] | A. Krizhevsky et al.
CIFAR-100 Dataset | Like CIFAR-10, above, but with 100 classes of objects. | Classes labelled, training set splits created. | 60,000 | images | classification | 2009 | [46][47] | A. Krizhevsky et al.
CINIC-10 Dataset | A unified contribution of CIFAR-10 and ImageNet with 10 classes and 3 splits; larger than CIFAR-10. | Classes labelled; training, validation and test set splits created. | 270,000 | images | classification | 2018 | [48] | Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey
Fashion-MNIST | An MNIST-like fashion product database. | Classes labelled, training set splits created. | 60,000 | images | classification | 2017 | [49] | Zalando SE
notMNIST | Glyphs extracted from publicly available fonts to make a dataset similar to MNIST; 10 classes, with letters A–J taken from different fonts. | Classes labelled, training set splits created. | 500,000 | images | classification | 2011 | [50] | Yaroslav Bulatov
Linnaeus 5 dataset | Images of 5 classes of objects. | Classes labelled, training set splits created. | 8,000 | images | classification | 2017 | [51] | Chaladze & Kalatozishvili
11K Hands | 11,076 hand images (1600 × 1200 pixels) of 190 subjects of varying ages (18 to 75 years old), for gender recognition and biometric identification. | None. | 11,076 hand images | images and (.mat, .txt, .csv) label files | gender recognition and biometric identification | 2017 | [52] | M. Afifi
CORe50 | Designed specifically for continuous/lifelong learning and object recognition; a collection of more than 500 videos (30 fps) of 50 domestic objects belonging to 10 different categories. | Classes labelled; training set splits created based on a 3-way, multi-run benchmark. | 164,866 RGB-D images | images (.png or .pkl) and (.pkl, .txt, .tsv) label files | classification, object recognition | 2017 | [53] | V. Lomonaco and D. Maltoni
OpenLORIS-Object | Lifelong/continual robotic vision dataset collected by real robots mounted with multiple high-resolution sensors; includes 121 object instances (1st version: 40 categories of daily necessities under 20 scenes). The dataset rigorously considers 4 environmental factors under different scenes (illumination, occlusion, object pixel size, and clutter) and explicitly defines difficulty levels for each factor. | Classes labelled; training/validation/testing set splits created by benchmark scripts. | 1,106,424 RGB-D images | images (.png and .pkl) and (.pkl) label files | classification, lifelong object recognition, robotic vision | 2019 | [54] | Q. She et al.
THz and thermal video data set | Multispectral data set of terahertz, thermal, visual, near-infrared, and three-dimensional videos of objects hidden under people's clothes. | Images and 3D point clouds. | More than 20 videos, each about 85 seconds long (about 345 frames). | AP2J | experiments with hidden object detection | 2019 | [55][56] | Alexei A. Morozov and Olga S. Sushkova
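Several of the detection datasets above (COCO, LVIS, Open Images exports) use COCO-style JSON annotations. A hedged sketch of reading them with the pycocotools package follows; the annotation file path is a placeholder.

```python
# Illustrative only: index COCO-style annotations and walk one class's boxes.
from pycocotools.coco import COCO

coco = COCO("annotations/instances_val2017.json")  # builds an index over the JSON

cat_ids = coco.getCatIds(catNms=["dog"])           # category ids for the "dog" class
img_ids = coco.getImgIds(catIds=cat_ids)           # images containing at least one dog
ann_ids = coco.getAnnIds(imgIds=img_ids[:1], catIds=cat_ids)

for ann in coco.loadAnns(ann_ids):
    # each annotation carries an [x, y, width, height] bounding box
    print(ann["category_id"], ann["bbox"])
```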

3D Objects

See (Calli et al., 2015) [57] for a review of 33 datasets of 3D objects as of 2015, and (Downs et al., 2022) [58] for a review of further datasets as of 2022.

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Princeton Shape Benchmark | 3D polygonal models collected from the Internet. | | 1,814 models in 92 categories | 3D polygonal models, categories | shape-based retrieval and analysis | 2004 | [59][60] | Shilane et al.
Berkeley 3-D Object Dataset (B3DO) | Depth and color images collected from crowdsourced Microsoft Kinect users, annotated in 50 object categories. | | 849 images, in 75 scenes | color image, depth image, object class, bounding boxes, 3D center points | predict bounding boxes | 2011 (updated 2014) | [61] | Janoch et al.
ShapeNet | 3D models, some classified into WordNet synsets, like ImageNet; partially classified into 3,135 categories. | | 3,000,000 models, 220,000 of which are classified | 3D models, class labels | predict class label | 2015 | [62] | Chang et al.
ObjectNet3D | Images, 3D shapes, and objects in 100 categories. | | 90,127 images, 201,888 objects, 44,147 3D shapes | images, 3D shapes, object bounding boxes, category labels | recognizing the 3D pose and 3D shape of objects from 2D images | 2016 | [63][64] | Xiang et al.
Common Objects in 3D (CO3D) | Video frames from videos capturing objects from 50 MS-COCO categories, filmed by people on Amazon Mechanical Turk. | | 6 million frames from 40,000 videos | multi-view images, camera poses, 3D point clouds, object category | predict object category; generate objects | 2021 (updated 2022 as CO3Dv2) | [65][66] | Meta AI
Google Scanned Objects | Scanned household objects in SDF format. | | ~1,000 scanned objects | 3D models | | 2022 | [67] | Google AI
Objaverse-XL | 3D objects. | | over 10 million | 3D objects, metadata | novel view synthesis, 3D object generation | 2023 | [68] | Deitke et al.
OmniObject3D | Scanned objects, labelled in 190 daily categories. | | 6,000 | textured meshes, point clouds, multiview images, videos | robust 3D perception, novel-view synthesis, surface reconstruction, 3D object generation | 2023 | [69][70] | Wu et al.
UnCommon Objects in 3D (uCO3D) | Objects in 1,070 categories of the LVIS taxonomy. | | | | | 2025 | [71][72] | Meta AI
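Meshes from collections like these are commonly inspected and sampled with the trimesh library. A minimal sketch, assuming trimesh is installed and using a placeholder file name:

```python
# Illustrative only: load one mesh and sample a point cloud from its surface.
import trimesh

mesh = trimesh.load("model.obj")        # also reads .ply, .stl, .glb, ...
print(mesh.vertices.shape)              # (n_vertices, 3) float coordinates
print(mesh.faces.shape)                 # (n_faces, 3) vertex indices
print(mesh.is_watertight)               # True if the surface encloses a volume

points, _ = trimesh.sample.sample_surface(mesh, 2048)  # point cloud for 3D perception
print(points.shape)                     # (2048, 3)
```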

Object detection and recognition for autonomous vehicles

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Cityscapes Dataset | Stereo video sequences recorded in street scenes, with pixel-level annotations; metadata also included. | Pixel-level segmentation and labeling. | 25,000 | images, text | classification, object detection | 2016 | [73] | Daimler AG et al.
German Traffic Sign Detection Benchmark Dataset | Images from vehicles of traffic signs on German roads. These signs comply with UN standards and therefore are the same as in other countries. | Signs manually labeled. | 900 | images | classification | 2013 | [74][75] | S. Houben et al.
KITTI Vision Benchmark Dataset | Autonomous vehicles driving through a mid-size city captured images of various areas using cameras and laser scanners. | Many benchmarks extracted from data. | >100 GB of data | images, text | classification, object detection | 2012 | [76][77][78] | A. Geiger et al.
FieldSAFE | Multi-modal dataset for obstacle detection in agriculture, including stereo camera, thermal camera, web camera, 360-degree camera, lidar, radar, and precise localization. | Classes labelled geographically. | >400 GB of data | images and 3D point clouds | classification, object detection, object localization | 2017 | [79] | M. Kragh et al.
Daimler Monocular Pedestrian Detection dataset | A dataset of pedestrians in urban environments. | Pedestrians are box-wise labeled. | Labeled part contains 15,560 samples with pedestrians and 6,744 samples without; test set contains 21,790 images without labels. | images | object recognition and classification | 2006 | [80][81][82] | Daimler AG
CamVid | The Cambridge-driving Labeled Video Database (CamVid) is a collection of videos. | Labeled with semantic labels for 32 semantic classes. | over 700 images | images | object recognition and classification | 2008 | [83][84][85] | Gabriel J. Brostow, Jamie Shotton, Julien Fauqueur, Roberto Cipolla
RailSem19 | A dataset for understanding scenes for vision systems on railways. | Labeled semantically and box-wise. | 8,500 | images | object recognition and classification, scene recognition | 2019 | [86][87] | Oliver Zendel, Markus Murschitz, Marcel Zeilinger, Daniel Steininger, Sara Abbasi, Csaba Beleznai
BOREAS | A multi-season autonomous driving dataset. It includes data from a Velodyne Alpha-Prime (128-beam) lidar, a FLIR Blackfly S camera, a Navtech CIR304-H radar, and an Applanix POS LV GNSS-INS. | Annotated with 3D bounding boxes. | 350 km of driving data | images, lidar and radar data | object recognition and classification, scene recognition | 2023 | [88][89] | Keenan Burnett, David J. Yoon, Yuchen Wu, Andrew Zou Li, Haowei Zhang, Shichen Lu, Jingxing Qian, Wei-Kang Tseng, Andrew Lambert, Keith Y.K. Leung, Angela P. Schoellig, Timothy D. Barfoot
Bosch Small Traffic Lights Dataset | A dataset of traffic lights. | Labeling includes bounding boxes of traffic lights together with their state (active light). | 5,000 images for training and a video sequence of 8,334 frames for evaluation | images | traffic light recognition | 2017 | [90][91] | Karsten Behrendt, Libor Novak, Rami Botros
FRSign | A dataset of French railway signals. | Labeling includes bounding boxes of railway signals together with their state (active light). | more than 100,000 | images | railway signal recognition | 2020 | [92][93] | Jeanine Harb, Nicolas Rébéna, Raphaël Chosidow, Grégoire Roblin, Roman Potarusov, Hatem Hajri
GERALD | A dataset of German railway signals. | Labeling includes bounding boxes of railway signals together with their state (active light). | 5,000 | images | railway signal recognition | 2023 | [94][95] | Philipp Leibner, Fabian Hampel, Christian Schindler
Multi-cue pedestrian | Multi-cue onboard pedestrian detection dataset. | Labeled box-wise. | 1,092 image pairs with 1,776 pedestrian boxes | images | object recognition and classification | 2009 | [96] | Christian Wojek, Stefan Walk, Bernt Schiele
RAWPED | A dataset for detection of pedestrians in the context of railways. | Labeled box-wise. | 26,000 | images | object recognition and classification | 2020 | [97][98] | Tugce Toprak, Burak Belenlioglu, Burak Aydın, Cuneyt Guzelis, M. Alper Selver
OSDaR23 | A multi-sensor dataset for detection of objects in the context of railways. | Labeled box-wise. | 16,874 frames | images, lidar, radar and infrared | object recognition and classification | 2023 | [99][100] | Roman Tilly, Rustam Tagiew, Pavel Klasek (DZSF); Philipp Neumaier, Patrick Denzler, Tobias Klockau, Martin Boekhoff, Martin Köppel (Digitale Schiene Deutschland); Karsten Schwalbe (FusionSystems)
Argoverse | A multi-sensor dataset for detection of objects in the context of roads. | Annotated box-wise. | 320 hours of recording | data from 7 cameras and LiDAR | object recognition and classification, object tracking | 2022 | [101][102] | Argo AI, Carnegie Mellon University, Georgia Institute of Technology
Rail3D | A LiDAR dataset for railways recorded in Hungary, France, and Belgium. | Annotated semantically. | 288 million annotated points | LiDAR | object recognition and classification, object tracking | 2024 | [103] | Abderrazzaq Kharroubi, Ballouch Zouhair, Rafika Hajji, Anass Yarroudh, and Roland Billen; University of Liège and Hassan II Institute of Agronomy and Veterinary Medicine
WHU-Railway3D | A LiDAR dataset for urban, rural, and plateau railways recorded in China. | Annotated semantically. | 4.6 billion annotated data points | LiDAR | object recognition and classification, object tracking | 2024 | [104] | Bo Qiu, Yuzhou Zhou, Lei Dai, Bing Wang, Jianping Li, Zhen Dong, Chenglu Wen, Zhiliang Ma, Bisheng Yang; Wuhan University, University of Oxford, Hong Kong Polytechnic University, Nanyang Technological University, Xiamen University and Tsinghua University
RailFOD23 | A dataset of foreign objects on railway catenary. | Annotated box-wise. | 14,615 images | images | object recognition and classification, object tracking | 2024 | [105] | Zhichao Chen, Jie Yang, Zhicheng Feng, Hao Zhu; Jiangxi University of Science and Technology
ESRORAD | A dataset of images and point clouds for urban road and rail scenes from Le Havre and Rouen. | Annotated box-wise. | 2,700,000 virtual images and 100,000 real images | images, LiDAR | object recognition and classification, object tracking | 2022 | [106] | Redouane Khemmar, Antoine Mauri, Camille Dulompont, Jayadeep Gajula, Vincent Vauchey, Madjid Haddad and Rémi Boutteau; Le Havre Normandy University and SEGULA Technologies
RailVID | Data recorded by an AT615X infrared camera from InfiRay in diverse railway scenarios, including carport, depot, and straight track. | Annotated semantically. | 1,071 images | infrared images | object recognition and classification, object tracking | 2022 | [107] | Hao Yuan, Zhenkun Mei, Yihao Chen, Weilong Niu, Cheng Wu; Soochow University
RailPC | A LiDAR dataset in the context of railways. | Annotated semantically. | 3 billion data points | LiDAR | object recognition and classification, object tracking | 2024 | [108] | Tengping Jiang, Shiwei Li, Qinyu Zhang, Guangshuai Wang, Zequn Zhang, Fankun Zeng, Peng An, Xin Jin, Shan Liu, Yongjun Wang; Nanjing Normal University, Ministry of Natural Resources, Eastern Institute of Technology, Tianjin Key Laboratory of Rail Transit Navigation Positioning and Spatio-temporal Big Data Technology, Northwest Normal University, Washington University in St. Louis and Ningbo University of Technology
RailCloud-HdF | A LiDAR dataset in the context of railways. | Annotated semantically. | 8,060.3 million data points | LiDAR | object recognition and classification, object tracking | 2024 | [109] | Mahdi Abid, Mathis Teixeira, Ankur Mahtani and Thomas Laurent; Railenium
RailGoerl24 | An RGB and LiDAR dataset in the context of railways. | Annotated box-wise. | 12,205 HD RGB frames and 383,922,305 colored LiDAR points | RGB, LiDAR | person recognition and classification | 2025 | [110] | DZSF, PECS-WORK GmbH, EYYES Deutschland GmbH, TU Dresden
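Lidar scans in driving datasets such as KITTI are usually shipped as flat binary files. A hedged sketch of reading one scan, assuming the usual raw format of float32 (x, y, z, reflectance) quadruples; the file name is a placeholder:

```python
# Illustrative only: load a KITTI-style Velodyne scan into a point array.
import numpy as np

scan = np.fromfile("000000.bin", dtype=np.float32).reshape(-1, 4)
xyz, reflectance = scan[:, :3], scan[:, 3]
print(xyz.shape)  # (n_points, 3) lidar points in the sensor frame

# e.g. keep only points in front of the vehicle for camera-lidar fusion
front = xyz[xyz[:, 0] > 0.0]
```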

Facial recognition

In computer vision, face images have been used extensively to develop facial recognition and face detection systems, as well as many other projects that use images of faces. See [111] for a curated list of datasets focused on the pre-2005 period.

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Labeled Faces in the Wild (LFW) | Images of named individuals obtained by Internet search. | Frontal face detection, bounding-box cropping. | 13,233 images of 5,749 named individuals | images, labels | unconstrained face recognition | 2008 | [112][113] | Huang et al.
Aff-Wild | 298 videos of 200 individuals, ~1,250,000 manually annotated images: annotated in terms of dimensional affect (valence-arousal); in-the-wild setting; color database; various resolutions (average 640×360). | Detected faces, facial landmarks and valence-arousal annotations. | ~1,250,000 manually annotated images | video (visual + audio modalities) | affect recognition (valence-arousal estimation) | 2017 | CVPR [114], IJCV [115] | D. Kollias et al.
Aff-Wild2 | 558 videos of 458 individuals, ~2,800,000 manually annotated images: annotated in terms of (i) categorical affect (7 basic expressions: neutral, happiness, sadness, surprise, fear, disgust, anger); (ii) dimensional affect (valence-arousal); (iii) action units (AUs 1, 2, 4, 6, 12, 15, 20, 25); in-the-wild setting; color database; various resolutions (average 1030×630). | Detected faces, detected and aligned faces, and annotations. | ~2,800,000 manually annotated images | video (visual + audio modalities) | affect recognition (valence-arousal estimation, basic expression classification, action unit detection) | 2019 | BMVC [116], FG [117] | D. Kollias et al.
FERET (facial recognition technology) | 11,338 images of 1,199 individuals in different positions and at different times. | None. | 11,338 | images | classification, face recognition | 2003 | [118][119] | United States Department of Defense
Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) | 7,356 video and audio recordings of 24 professional actors; 8 emotions, each at two intensities. | Files labelled with expression; perceptual validation ratings provided by 319 raters. | 7,356 | video, sound files | classification, face recognition, voice recognition | 2018 | [120][121] | S.R. Livingstone and F.A. Russo
SCFace | Color images of faces at various angles. | Location of facial features extracted; coordinates of features given. | 4,160 | images, text | classification, face recognition | 2011 | [122][123] | M. Grgic et al.
Yale Face Database | Faces of 15 individuals in 11 different expressions. | Labels of expressions. | 165 | images | face recognition | 1997 | [124][125] | J. Yang et al.
Cohn-Kanade AU-Coded Expression Database | Large database of images with labels for expressions. | Tracking of certain facial features. | 500+ sequences | images, text | facial expression analysis | 2000 | [126][127] | T. Kanade et al.
JAFFE Facial Expression Database | 213 images of 7 facial expressions (6 basic facial expressions + 1 neutral) posed by 10 Japanese female models. | Images cropped to the facial region; includes semantic ratings data on emotion labels. | 213 | images, text | facial expression recognition | 1998 | [128][129] | Lyons, Kamachi, Gyoba
FaceScrub | Images of public figures scrubbed from image searching. | Name and m/f annotation. | 107,818 | images, text | face recognition | 2014 | [130][131] | H. Ng et al.
BioID Face Database | Images of faces with eye positions marked. | Manually set eye positions. | 1,521 | images, text | face recognition | 2001 | [132] | BioID
Skin Segmentation Dataset | Randomly sampled color values from face images. | B, G, R values extracted. | 245,057 | text | segmentation, classification | 2012 | [133][134] | R. Bhatt
Bosphorus | 3D face image database. | 34 action units and 6 expressions labeled; 24 facial landmarks labeled. | 4,652 | images, text | face recognition, classification | 2008 | [135][136] | A. Savran et al.
UOY 3D-Face | Neutral face plus 5 expressions: anger, happiness, sadness, eyes closed, eyebrows raised. | Labeling. | 5,250 | images, text | face recognition, classification | 2004 | [137][138] | University of York
CASIA 3D Face Database | Expressions: anger, smile, laugh, surprise, closed eyes. | None. | 4,624 | images, text | face recognition, classification | 2007 | [139][140] | Institute of Automation, Chinese Academy of Sciences
CASIA NIR | Expressions: anger, disgust, fear, happiness, sadness, surprise. | None. | 480 | annotated visible-spectrum and near-infrared video captured at 25 frames per second | face recognition, classification | 2011 | [141] | Zhao, G. et al.
BU-3DFE | Neutral face plus 6 expressions: anger, happiness, sadness, surprise, disgust, fear (4 levels); 3D images extracted. | None. | 2,500 | images, text | facial expression recognition, classification | 2006 | [142] | Binghamton University
Face Recognition Grand Challenge Dataset | Up to 22 samples per subject. Expressions: anger, happiness, sadness, surprise, disgust, puffy. 3D data. | None. | 4,007 | images, text | face recognition, classification | 2004 | [143][144] | National Institute of Standards and Technology
Gavabdb | Up to 61 samples per subject. Expressions: neutral face, smile, frontal accentuated laugh, frontal random gesture. 3D images. | None. | 549 | images, text | face recognition, classification | 2008 | [145][146] | King Juan Carlos University
3D-RMA | Up to 100 subjects, expressions mostly neutral; several poses as well. | None. | 9,971 | images, text | face recognition, classification | 2004 | [147][148] | Royal Military Academy (Belgium)
SoF | 112 persons (66 males and 46 females) wearing glasses under different illumination conditions. | A set of synthetic filters (blur, occlusions, noise, and posterization) with different levels of difficulty. | 42,592 (2,662 original images × 16 synthetic variants) | images, .mat file | gender classification, face detection, face recognition, age estimation, and glasses detection | 2017 | [149][150] | Afifi, M. et al.
IMDb-WIKI | IMDb and Wikipedia face images with gender and age labels. | None. | 523,051 | images | gender classification, face detection, face recognition, age estimation | 2015 | [151] | R. Rothe, R. Timofte, L. V. Gool
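Face datasets like LFW were themselves built with automatic face detectors. As a generic, hedged illustration (not the method used by any dataset above), a classical detection pass with OpenCV's bundled Haar cascade might look like this; the input path is a placeholder:

```python
# Illustrative only: detect frontal faces and draw their bounding boxes.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)
image = cv2.imread("photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)    # the detector expects grayscale
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                        # one rectangle per detected face
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", image)
```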

Action recognition

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
AVA-Kinetics Localized Human Actions Video | Annotations of 80 action classes on keyframes from Kinetics-700 videos. | | 1.6 million annotations; 238,906 video clips; 624,430 keyframes | annotations, videos | action prediction | 2020 | [152][153] | Li et al., Perception Team of Google AI
TV Human Interaction Dataset | Videos from 20 different TV shows for predicting social actions: handshake, high five, hug, kiss, and none. | None. | 6,766 video clips | video clips | action prediction | 2013 | [154] | Patron-Perez, A. et al.
Berkeley Multimodal Human Action Database (MHAD) | Recordings of a single person performing 12 actions. | MoCap pre-processing. | 660 action samples | 8 PhaseSpace motion capture trackers, 2 stereo cameras, 4 quad cameras, 6 accelerometers, 4 microphones | action classification | 2013 | [155] | Ofli, F. et al.
THUMOS Dataset | Large video dataset for action classification. | Actions classified and labeled. | 45M frames of video | video, images, text | classification, action detection | 2013 | [156][157] | Y. Jiang et al.
MEXAction2 | Video dataset for action localization and spotting. | Actions classified and labeled. | 1,000 | video | action detection | 2014 | [158] | Stoian et al.
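A common first step when preparing clips from action datasets like these is uniform frame sampling. A hedged sketch with OpenCV, using a placeholder clip name:

```python
# Illustrative only: sample ~16 evenly spaced frames from a video clip.
import cv2

cap = cv2.VideoCapture("clip.mp4")
total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
frames = []
for idx in range(0, total, max(total // 16, 1)):
    cap.set(cv2.CAP_PROP_POS_FRAMES, idx)   # seek to the target frame index
    ok, frame = cap.read()
    if ok:
        frames.append(frame)
cap.release()
print(len(frames), frames[0].shape if frames else None)  # e.g. 16 (H, W, 3)
```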

Handwriting and character recognition

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Artificial Characters Dataset | Artificially generated data describing the structure of 10 capital English letters. | Coordinates of drawn lines given as integers; various other features. | 6,000 | text | handwriting recognition, classification | 1992 | [159] | H. Guvenir et al.
Letter Dataset | Upper-case printed letters. | 17 features extracted from all images. | 20,000 | text | OCR, classification | 1991 | [160][161] | D. Slate et al.
CASIA-HWDB | Offline handwritten Chinese character database; 3,755 classes in the GB 2312 character set. | Gray-scaled images with background pixels labeled as 255. | 1,172,907 | images, text | handwriting recognition, classification | 2009 | [162] | CASIA
CASIA-OLHWDB | Online handwritten Chinese character database, collected using an Anoto pen on paper; 3,755 classes in the GB 2312 character set. | Provides the sequences of coordinates of strokes. | 1,174,364 | images, text | handwriting recognition, classification | 2009 | [163][164] | CASIA
Character Trajectories Dataset | Labeled samples of pen-tip trajectories for people writing simple characters. | 3-dimensional pen-tip velocity trajectory matrix for each sample. | 2,858 | text | handwriting recognition, classification | 2008 | [165][166] | B. Williams
Chars74K Dataset | Character recognition in natural images of symbols used in both English and Kannada. | | 74,107 | | character recognition, handwriting recognition, OCR, classification | 2009 | [167] | T. de Campos
EMNIST dataset | Handwritten characters from 3,600 contributors. | Derived from NIST Special Database 19; converted to 28×28 pixel images, matching the MNIST dataset. [168] | 800,000 | images | character recognition, classification, handwriting recognition | 2016 | EMNIST dataset [169], documentation [170] | Gregory Cohen et al.
UJI Pen Characters Dataset | Isolated handwritten characters. | Coordinates of pen position given as the characters were written. | 11,640 | text | handwriting recognition, classification | 2009 | [171][172] | F. Prat et al.
Gisette Dataset | Handwriting samples of the often-confused digits 4 and 9. | Features extracted from images; split into train/test; handwriting images size-normalized. | 13,500 | images, text | handwriting recognition, classification | 2003 | [173] | Yann LeCun et al.
Omniglot dataset | 1,623 different handwritten characters from 50 different alphabets. | Hand-labeled. | 38,300 | images, text, strokes | classification, one-shot learning | 2015 | [174][175] | American Association for the Advancement of Science
MNIST database | Database of handwritten digits. | Hand-labeled. | 60,000 | images, text | classification | 1994 | [176][177] | National Institute of Standards and Technology
Optical Recognition of Handwritten Digits Dataset | Normalized bitmaps of handwritten data. | Size-normalized and mapped to bitmaps. | 5,620 | images, text | handwriting recognition, classification | 1998 | [178] | E. Alpaydin et al.
Pen-Based Recognition of Handwritten Digits Dataset | Handwritten digits on an electronic pen-tablet. | Feature vectors extracted to be uniformly spaced. | 10,992 | images, text | handwriting recognition, classification | 1998 | [179][180] | E. Alpaydin et al.
Semeion Handwritten Digit Dataset | Handwritten digits from 80 people. | All handwritten digits normalized for size and mapped to the same grid. | 1,593 | images, text | handwriting recognition, classification | 2008 | [181] | T. Srl
HASYv2 | Handwritten mathematical symbols. | All symbols centered, 32×32 px. | 168,233 | images, text | classification | 2017 | [182] | Martin Thoma
Noisy Handwritten Bangla Dataset | Includes a Handwritten Numeral Dataset (10 classes) and a Basic Character Dataset (50 classes); each dataset has three types of noise: white Gaussian, motion blur, and reduced contrast. | All images centered, 32×32. | Numeral Dataset: 23,330; Character Dataset: 76,000 | images, text | handwriting recognition, classification | 2017 | [183][184] | M. Karki et al.
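As a toy end-to-end example in the spirit of the digit datasets above, the sketch below trains a linear classifier on scikit-learn's small built-in digits set (1,797 8×8 images, not MNIST itself), assuming scikit-learn is installed:

```python
# Illustrative only: fit and score a digit classifier on a tiny built-in dataset.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()                    # images come pre-flattened to 64 features
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```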

Aerial images

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
iSAID: Instance Segmentation in Aerial Images Dataset | Precise instance-level annotation carried out by professional annotators, cross-checked and validated by expert annotators complying with well-defined guidelines. | | 655,451 (15 classes) | images, jpg, json | aerial classification, object detection, instance segmentation | 2019 | [185][186] | Syed Waqas Zamir, Aditya Arora, Akshita Gupta, Salman Khan, Guolei Sun, Fahad Shahbaz Khan, Fan Zhu, Ling Shao, Gui-Song Xia, Xiang Bai
Aerial Image Segmentation Dataset | 80 high-resolution aerial images with spatial resolution ranging from 0.3 to 1.0 m. | Images manually segmented. | 80 | images | aerial classification, object detection | 2013 | [187][188] | J. Yuan et al.
KIT AIS Data Set | Multiple labeled training and evaluation datasets of aerial images of crowds. | Images manually labeled to show paths of individuals through crowds. | ~150 | images with paths | people tracking, aerial tracking | 2012 | [189][190] | M. Butenuth et al.
Wilt Dataset | Remote sensing data of diseased trees and other land cover. | Various features extracted. | 4,899 | images | classification, aerial object detection | 2014 | [191][192] | B. Johnson
MASATI dataset | Maritime scenes of optical aerial images from the visible spectrum: color images of dynamic marine environments, where each image may contain one or multiple targets under different weather and illumination conditions. | Object bounding boxes and labeling. | 7,389 | images | classification, aerial object detection | 2018 | [193][194] | A.-J. Gallego et al.
Forest Type Mapping Dataset | Satellite imagery of forests in Japan. | Image wavelength bands extracted. | 326 | text | classification | 2015 | [195][196] | B. Johnson
Overhead Imagery Research Data Set | Annotated overhead imagery; images with multiple objects. | Over 30 annotations and over 60 statistics describing each target in the context of the image. | 1,000 | images, text | classification | 2009 | [197][198] | F. Tanner et al.
SpaceNet | A corpus of commercial satellite imagery and labeled training data. | GeoTiff and GeoJSON files containing building footprints. | >17,533 | images | classification, object identification | 2017 | [199][200][201] | DigitalGlobe, Inc.
UC Merced Land Use Dataset | A 21-class land-use image dataset meant for research purposes, with 100 images per class, manually extracted from large images in the USGS National Map Urban Area Imagery collection for various urban areas around the US. | | 2,100 | image chips of 256×256 pixels, 30 cm (1 foot) GSD | land cover classification | 2010 | [202] | Yi Yang and Shawn Newsam
SAT-4 Airborne Dataset | Images extracted from the National Agriculture Imagery Program (NAIP) dataset. | SAT-4 has four broad land cover classes: barren land, trees, grassland, and a class consisting of all land cover other than the above three. | 500,000 | images | classification | 2015 | [203][204] | S. Basu et al.
SAT-6 Airborne Dataset | Images extracted from the National Agriculture Imagery Program (NAIP) dataset. | SAT-6 has six broad land cover classes: barren land, trees, grassland, roads, buildings, and water bodies. | 405,000 | images | classification | 2015 | [205][206] | S. Basu et al.
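Datasets such as SpaceNet distribute imagery as GeoTIFF tiles. A hedged sketch of opening one with the rasterio library, assuming it is installed; the file name is a placeholder:

```python
# Illustrative only: inspect a GeoTIFF's bands, size, and coordinate system.
import rasterio

with rasterio.open("tile.tif") as src:
    print(src.count, src.width, src.height)  # bands, columns, rows
    print(src.crs)                           # coordinate reference system
    bands = src.read()                       # numpy array, shape (bands, rows, cols)
```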

Underwater images

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
SUIM Dataset | Images rigorously collected during oceanic explorations and human-robot collaborative experiments, and annotated by human participants. | Images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, and sea-floor. | 1,635 | images | segmentation | 2020 | [207] | Md Jahidul Islam et al.
LIACI Dataset | Images collected during underwater ship inspections and annotated by human domain experts. | Images with pixel annotations for ten object categories: defects, corrosion, paint peel, marine growth, sea chest gratings, overboard valves, propeller, anodes, bilge keel and ship hull. | 1,893 | images | segmentation | 2022 | [208] | Waszak et al.
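Pixel-annotated datasets like these pair each image with an integer class-id mask. A generic, hedged sketch of overlaying such a mask on its image; both file names are placeholders:

```python
# Illustrative only: alpha-blend a segmentation mask over its source image.
import numpy as np
from PIL import Image

image = np.asarray(Image.open("frame.jpg").convert("RGB"), dtype=np.float32)
mask = np.asarray(Image.open("frame_mask.png"))      # integer class id per pixel

color = np.zeros_like(image)
color[mask > 0] = (255.0, 0.0, 0.0)                  # paint all non-background red
overlay = (0.6 * image + 0.4 * color).astype(np.uint8)
Image.fromarray(overlay).save("overlay.png")
```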

Other images

Dataset name | Brief description | Preprocessing | Instances | Format | Default task | Created (updated) | Reference | Creator
Kodak Lossless True Color Image Suite | RGB images for testing image compression. | None. | 24 | images | image compression | 1999 | [209] | Kodak
NRC-GAMMA | A novel benchmark gas meter image dataset. | None. | 28,883 | image, label | classification | 2021 | [210][211] | A. Ebadi, P. Paul, S. Auer, & S. Tremblay
The SUPATLANTIQUE dataset | Images of scanned official and Wikipedia documents. | None. | 4,908 | TIFF/PDF | source device identification, forgery detection, classification | 2020 | [212] | C. Ben Rabah et al.
Density functional theory quantum simulations of graphene | Labelled images of raw input to a simulation of graphene. | Raw data (in HDF5 format) and output labels from density functional theory quantum simulation. | 60,744 test and 501,473 training files | labeled images | regression | 2019 | [213] | K. Mills & I. Tamblyn
Quantum simulations of an electron in a two-dimensional potential well | Labelled images of raw input to a simulation of 2D quantum mechanics. | Raw data (in HDF5 format) and output labels from quantum simulation. | 1.3 million images | labeled images | regression | 2017 | [214] | K. Mills, M.A. Spanner, & I. Tamblyn
MPII Cooking Activities Dataset | Videos and images of various cooking activities. | Activity paths and directions, labels, fine-grained motion labeling, activity class, still image extraction and labeling. | 881,755 frames | labeled video, images, text | classification | 2012 | [215][216] | M. Rohrbach et al.
FAMOS Dataset | 5,000 unique microstructures; all samples acquired 3 times with two different cameras. | Original PNG files, sorted per camera and then per acquisition; MATLAB data files with one 16384×5000 matrix per camera per acquisition. | 30,000 | images and .mat files | authentication | 2012 | [217] | S. Voloshynovskiy et al.
PharmaPack Dataset | 1,000 unique classes with 54 images per class. | Class labeling; many local descriptors, like SIFT and aKaZE, and local feature aggregators, like Fisher Vector (FV). | 54,000 | images and .mat files | fine-grained classification | 2017 | [218] | O. Taran and S. Rezaeifar et al.
Stanford Dogs Dataset | Images of 120 breeds of dogs from around the world. | Train/test splits and ImageNet annotations provided. | 20,580 | images, text | fine-grained classification | 2011 | [219][220] | A. Khosla et al.
StanfordExtra Dataset | 2D keypoints and segmentations for the Stanford Dogs Dataset. | 2D keypoints and segmentations provided. | 12,035 | labelled images | 3D reconstruction/pose estimation | 2020 | [221] | B. Biggs et al.
The Oxford-IIIT Pet Dataset | 37 categories of pets with roughly 200 images of each. | Breed labeled, tight bounding box, foreground-background segmentation. | ~7,400 | images, text | classification, object detection | 2012 | [222][223] | O. Parkhi et al.
Corel Image Features Data Set | Database of images with features extracted. | Many features, including color histogram, co-occurrence texture, and color moments. | 68,040 | text | classification, object detection | 1999 | [224][225] | M. Ortega-Bindenberger et al.
Online Video Characteristics and Transcoding Time Dataset | Transcoding times for various videos and video properties. | Video features given. | 168,286 | text | regression | 2015 | [226] | T. Deneke et al.
Microsoft Sequential Image Narrative Dataset (SIND) | Dataset for sequential vision-to-language. | Descriptive caption and storytelling given for each photo; photos arranged in sequences. | 81,743 | images, text | visual storytelling | 2016 | [227] | Microsoft Research
Caltech-UCSD Birds-200-2011 Dataset | Large dataset of images of birds. | Part locations for birds, bounding boxes, 312 binary attributes given. | 11,788 | images, text | classification | 2011 | [228][229] | C. Wah et al.
YouTube-8M | Large and diverse labeled video dataset. | YouTube video IDs and associated labels from a diverse vocabulary of 4,800 visual entities. | 8 million | video, text | video classification | 2016 | [230][231] | S. Abu-El-Haija et al.
YFCC100M | Large and diverse labeled image and video dataset. | Flickr videos and images with associated descriptions, titles, tags, and other metadata (such as EXIF and geotags). | 100 million | video, image, text | video and image classification | 2016 | [232][233] | B. Thomee et al.
Discrete LIRIS-ACCEDE | Short videos annotated for valence and arousal. | Valence and arousal labels. | 9,800 | video | video emotion elicitation detection | 2015 | [234] | Y. Baveye et al.
Continuous LIRIS-ACCEDE | Long videos annotated for valence and arousal while also collecting galvanic skin response. | Valence and arousal labels. | 30 | video | video emotion elicitation detection | 2015 | [235] | Y. Baveye et al.
MediaEval LIRIS-ACCEDE | Extension of Discrete LIRIS-ACCEDE including annotations for the violence level of the films. | Violence, valence and arousal labels. | 10,900 | video | video emotion elicitation detection | 2015 | [236] | Y. Baveye et al.
Leeds Sports Pose | Articulated human pose annotations in 2,000 natural sports images from Flickr. | Rough crop around a single person of interest, with 14 joint labels. | 2,000 | images plus .mat file labels | human pose estimation | 2010 | [237] | S. Johnson and M. Everingham
Leeds Sports Pose Extended Training | Articulated human pose annotations in 10,000 natural sports images from Flickr. | 14 joint labels via crowdsourcing. | 10,000 | images plus .mat file labels | human pose estimation | 2011 | [238] | S. Johnson and M. Everingham
MCQ Dataset | 6 different real multiple-choice exams (735 answer sheets and 33,540 answer boxes) to evaluate computer vision techniques and systems developed for multiple-choice test assessment. | None. | 735 answer sheets and 33,540 answer boxes | images and .mat file labels | development of multiple-choice test assessment systems | 2017 | [239][240] | Afifi, M. et al.
Surveillance Videos | Real surveillance videos covering a large surveillance span (7 days with 24 hours each). | None. | 19 surveillance videos (7 days with 24 hours each) | videos | data compression | 2016 | [241] | Taj-Eddin, I. A. T. F. et al.
LILA BC | Labeled Information Library of Alexandria: Biology and Conservation. Labeled images that support machine learning research around ecology and environmental science. | None. | ~10M images | images | classification | 2019 | [242] | LILA working group
Can We See Photosynthesis? | 32 videos of eight live and eight dead leaves recorded under both DC and AC lighting conditions. | None. | 32 videos | videos | liveness detection of plants | 2017 | [243] | Taj-Eddin, I. A. T. F. et al.
Mathematical Mathematics Memes | Collection of 10,000 memes on mathematics. | None. | ~10,000 | images | visual storytelling, object detection | 2021 | [244] | Mathematical Mathematics Memes
Flickr-Faces-HQ Dataset | Collection of images crawled from Flickr, each containing a face. | Pruned with "various automatic filters", cropped and aligned to faces; images of statues, paintings, or photos of photos removed via crowdsourcing. | 70,000 | images | face generation | 2019 | [245] | Karras et al.
Fruits-360 dataset | Collection of images containing 170 fruits, vegetables, nuts, and seeds. | 100×100 pixels, white background. | 115,499 | images (jpg) | classification | 2017–2025 | [246] | Mihai Oltean
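Suites like the Kodak images above are typically used by encoding each lossless source at several qualities and measuring the resulting rate. A hedged sketch with Pillow; the Kodak file name is a placeholder:

```python
# Illustrative only: JPEG-encode one image at several qualities and report bits/pixel.
import io
from PIL import Image

img = Image.open("kodim01.png").convert("RGB")
for quality in (95, 75, 50):
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    bpp = buf.tell() * 8 / (img.width * img.height)   # compressed bits per pixel
    print(f"quality {quality}: {bpp:.2f} bpp")
```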

References

  1. Bottou, L.; Cortes, C.; Denker, J.S.; Drucker, H.; Guyon, I.; Jackel, L.D.; LeCun, Y.; Muller, U.A.; Sackinger, E.; Simard, P.; Vapnik, V. (1994). "Comparison of classifier methods: A case study in handwritten digit recognition". Proceedings of the 12th IAPR International Conference on Pattern Recognition (Cat. No.94CH3440-5). Vol. 2. IEEE Comput. Soc. Press. pp. 77–82. doi:10.1109/ICPR.1994.576879. ISBN 978-0-8186-6270-6.

  2. "NIST Special Database 19". NIST. 2010-08-27. https://www.nist.gov/srd/nist-special-database-19

  3. LeCun, Yann. "NORB: Generic Object Recognition in Images". cs.nyu.edu. Retrieved 2025-04-26. https://cs.nyu.edu/~yann/research/norb/

  4. LeCun, Y.; Fu Jie Huang; Bottou, L. (2004). "Learning methods for generic object recognition with invariance to pose and lighting". Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004). Vol. 2. IEEE. pp. 97–104. doi:10.1109/CVPR.2004.1315150. ISBN 978-0-7695-2158-9.

  5. Torralba, A.; Fergus, R.; Freeman, W.T. (November 2008). "80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 30 (11): 1958–1970. doi:10.1109/TPAMI.2008.128. ISSN 0162-8828. PMID 18787244. https://ieeexplore.ieee.org/document/4531741

  6. "The Street View House Numbers (SVHN) Dataset". ufldl.stanford.edu. Retrieved 2025-02-25. http://ufldl.stanford.edu/housenumbers/

  7. Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng. "Reading Digits in Natural Images with Unsupervised Feature Learning" NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011 http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf

  8. Hinton, Geoffrey; Vinyals, Oriol; Dean, Jeff (2015-03-09). "Distilling the Knowledge in a Neural Network". arXiv:1503.02531 [stat.ML].

  9. Sun, Chen; Shrivastava, Abhinav; Singh, Saurabh; Gupta, Abhinav (2017). "Revisiting Unreasonable Effectiveness of Data in Deep Learning Era". pp. 843–852. arXiv:1707.02968 [cs.CV].

  10. Abnar, Samira; Dehghani, Mostafa; Neyshabur, Behnam; Sedghi, Hanie (2021-10-05). "Exploring the Limits of Large Scale Pre-training". arXiv:2110.02095 [cs.LG].

  11. Zhai, Xiaohua; Kolesnikov, Alexander; Houlsby, Neil; Beyer, Lucas (2021-06-08). "Scaling Vision Transformers". arXiv:2106.04560 [cs.CV].

  12. Zhou, Bolei; Lapedriza, Agata; Khosla, Aditya; Oliva, Aude; Torralba, Antonio (2018-06-01). "Places: A 10 Million Image Database for Scene Recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 40 (6): 1452–1464. doi:10.1109/TPAMI.2017.2723009. ISSN 0162-8828. PMID 28692961. https://ieeexplore.ieee.org/document/7968387

  13. Grauman, Kristen; Westbury, Andrew; Byrne, Eugene; Chavis, Zachary; Furnari, Antonino; Girdhar, Rohit; et al. (2022). "Ego4D: Around the World in 3,000 Hours of Egocentric Video". arXiv:2110.07058 [cs.CV].

  14. Srinivasan, Krishna; Raman, Karthik; Chen, Jiecao; Bendersky, Michael; Najork, Marc (2021-07-11). "WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning". Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM. pp. 2443–2449. arXiv:2103.01913. doi:10.1145/3404835.3463257. ISBN 978-1-4503-8037-9.

  15. Krishna, Ranjay; Zhu, Yuke; Groth, Oliver; Johnson, Justin; Hata, Kenji; Kravitz, Joshua; Chen, Stephanie; Kalantidis, Yannis; Li, Li-Jia; Shamma, David A; Bernstein, Michael S; Fei-Fei, Li (2017). "Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations". International Journal of Computer Vision. 123: 32–73. arXiv:1602.07332. doi:10.1007/s11263-016-0981-7. S2CID 4492210.

  16. Karayev, S., et al. "A category-level 3-D object dataset: putting the Kinect to work." Proceedings of the IEEE International Conference on Computer Vision Workshops. 2011. http://alliejanoch.com/iccvw2011.pdf

  17. Tighe, Joseph, and Svetlana Lazebnik. "Superparsing: scalable nonparametric image parsing with superpixels Archived 6 August 2019 at the Wayback Machine." Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 352–365.

  18. Arbelaez, P.; Maire, M; Fowlkes, C; Malik, J (May 2011). "Contour Detection and Hierarchical Image Segmentation" (PDF). IEEE Transactions on Pattern Analysis and Machine Intelligence. 33 (5): 898–916. doi:10.1109/tpami.2010.161. PMID 20733228. S2CID 206764694. Retrieved 27 February 2016. http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/papers/amfm_pami2010.pdf

  19. Lin, Tsung-Yi; Maire, Michael; Belongie, Serge; Bourdev, Lubomir; Girshick, Ross; Hays, James; Perona, Pietro; Ramanan, Deva; Lawrence Zitnick, C.; Dollár, Piotr (2014). "Microsoft COCO: Common Objects in Context". arXiv:1405.0312 [cs.CV].

  20. Russakovsky, Olga; et al. (2015). "Imagenet large scale visual recognition challenge". International Journal of Computer Vision. 115 (3): 211–252. arXiv:1409.0575. doi:10.1007/s11263-015-0816-y. hdl:1721.1/104944. S2CID 2930547.

  21. "COCO – Common Objects in Context". cocodataset.org. https://cocodataset.org/

  22. Deng, Jia, et al. "Imagenet: A large-scale hierarchical image database."Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009. https://www.researchgate.net/profile/Li_Jia_Li/publication/221361415_ImageNet_a_Large-Scale_Hierarchical_Image_Database/links/00b495388120dbc339000000/ImageNet-a-Large-Scale-Hierarchical-Image-Database.pdf

  23. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

  24. Russakovsky, Olga; Deng, Jia; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; et al. (11 April 2015). "ImageNet Large Scale Visual Recognition Challenge". International Journal of Computer Vision. 115 (3): 211–252. arXiv:1409.0575. doi:10.1007/s11263-015-0816-y. hdl:1721.1/104944. S2CID 2930547.

  25. Xiao, Jianxiong; Hays, James; Ehinger, Krista A.; Oliva, Aude; Torralba, Antonio (June 2010). "SUN database: Large-scale scene recognition from abbey to zoo". 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE. pp. 3485–3492. doi:10.1109/cvpr.2010.5539970. hdl:1721.1/60690. ISBN 978-1-4244-6984-0.

  26. Donahue, Jeff; Jia, Yangqing; Vinyals, Oriol; Hoffman, Judy; Zhang, Ning; Tzeng, Eric; Darrell, Trevor (2013). "DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition". arXiv:1310.1531 [cs.CV].

  27. Yu, Fisher; Seff, Ari; Zhang, Yinda; Song, Shuran; Funkhouser, Thomas; Xiao, Jianxiong (2016-06-04). "LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop". arXiv:1506.03365 [cs.CV].

  28. "Index of /lsun/". dl.yf.io. Retrieved 2024-09-19. http://dl.yf.io/lsun/

  29. "LSUN". Complex Adaptive Systems Laboratory. Retrieved 2024-09-19. https://complexity.cecs.ucf.edu/lsun/

  30. Gupta, Agrim; Dollar, Piotr; Girshick, Ross (2019). "LVIS: A Dataset for Large Vocabulary Instance Segmentation". Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5356–5364. https://openaccess.thecvf.com/content_CVPR_2019/html/Gupta_LVIS_A_Dataset_for_Large_Vocabulary_Instance_Segmentation_CVPR_2019_paper.html

  31. Ivan Krasin, Tom Duerig, Neil Alldrin, Andreas Veit, Sami Abu-El-Haija, Serge Belongie, David Cai, Zheyun Feng, Vittorio Ferrari, Victor Gomes, Abhinav Gupta, Dhyanesh Narayanan, Chen Sun, Gal Chechik, Kevin Murphy. "OpenImages: A public dataset for large-scale multi-label and multi-class image classification", 2017. Available from https://github.com/openimages

  32. Vyas, Apoorv, et al. "Commercial Block Detection in Broadcast News Videos." Proceedings of the 2014 Indian Conference on Computer Vision Graphics and Image Processing. ACM, 2014. https://dl.acm.org/citation.cfm?id=2683546

  33. Hauptmann, Alexander G., and Michael J. Witbrock. "Story segmentation and detection of commercials in broadcast news video." Research and Technology Advances in Digital Libraries, 1998. ADL 98. Proceedings. IEEE International Forum on. IEEE, 1998. https://pdfs.semanticscholar.org/5c21/6db7892fa3f515d816f84893bfab1137f0b2.pdf

  34. Tung, Anthony KH, Xin Xu, and Beng Chin Ooi. "Curler: finding and visualizing nonlinear correlation clusters." Proceedings of the 2005 ACM SIGMOD international conference on Management of data. ACM, 2005. https://www.researchgate.net/profile/Anthony_Tung/publication/221214229_CURLER_Finding_and_Visualizing_Nonlinear_Correlated_Clusters/links/55b8691a08aed621de05cd92.pdf

  35. Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009. https://ieeexplore.ieee.org/abstract/document/5459469/

  36. Lazebnik, Svetlana, Cordelia Schmid, and Jean Ponce. "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 2. IEEE, 2006.

  37. Griffin, G., A. Holub, and P. Perona. Caltech-256 object category dataset. California Inst. Technol., Tech. Rep. 7694, 2007. Available: http://authors.library.caltech.edu/7694

  38. Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto. Modern information retrieval. Vol. 463. New York: ACM press, 1999.

  39. "🐺 COYO-700M: Image-Text Pair Dataset". Kakao Brain. 2022-11-03. Retrieved 2022-11-03. https://github.com/kakaobrain/coyo-dataset

  40. Fu, Xiping, et al. "NOKMeans: Non-Orthogonal K-means Hashing." Computer Vision—ACCV 2014. Springer International Publishing, 2014. 162–177. https://pdfs.semanticscholar.org/9da2/abae3072fd9fcff0e13b8f00fc21f22d0085.pdf

  41. Heitz, Geremy; et al. (2009). "Shape-based object localization for descriptive classification". International Journal of Computer Vision. 84 (1): 40–62. CiteSeerX 10.1.1.142.280. doi:10.1007/s11263-009-0228-y. S2CID 646320.

  42. Everingham, Mark; et al. (2010). "The pascal visual object classes (voc) challenge". International Journal of Computer Vision. 88 (2): 303–338. doi:10.1007/s11263-009-0275-4. hdl:20.500.11820/88a29de3-6220-442b-ab2d-284210cf72d6. S2CID 4246903. https://www.research.ed.ac.uk/portal/en/publications/the-pascal-visual-object-classes-voc-challenge(88a29de3-6220-442b-ab2d-284210cf72d6).html

  43. Felzenszwalb, Pedro F.; et al. (2010). "Object detection with discriminatively trained part-based models". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (9): 1627–1645. CiteSeerX 10.1.1.153.2745. doi:10.1109/tpami.2009.167. PMID 20634557. S2CID 3198903.

  44. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

  45. Gong, Yunchao, and Svetlana Lazebnik. "Iterative quantization: A procrustean approach to learning binary codes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.

  46. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012. http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

  47. Gong, Yunchao, and Svetlana Lazebnik. "Iterative quantization: A procrustean approach to learning binary codes." Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011.

  48. "CINIC-10 dataset". Luke N. Darlow, Elliot J. Crowley, Antreas Antoniou, Amos J. Storkey (2018) CINIC-10 is not ImageNet or CIFAR-10. 2018-10-09. Retrieved 2018-11-13. http://www.bayeswatch.com/2018/10/09/CINIC/

  49. "fashion-mnist: A MNIST-like fashion product database. Benchmark :point_right". Zalando Research. 2017-10-07. Retrieved 2017-10-07. https://github.com/zalandoresearch/fashion-mnist

  50. "notMNIST dataset". Machine Learning, etc. 2011-09-08. Retrieved 2017-10-13. http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html

  51. Chaladze, G., Kalatozishvili, L. (2017). Linnaeus 5 dataset. Chaladze.com. Retrieved 13 November 2017, from http://chaladze.com/l5/

  52. Afifi, Mahmoud (2017-11-12). "Gender recognition and biometric identification using a large dataset of hand images". arXiv:1711.04322 [cs.CV].

  53. Lomonaco, Vincenzo; Maltoni, Davide (2017-10-18). "CORe50: a New Dataset and Benchmark for Continuous Object Recognition". arXiv:1705.03550 [cs.CV].

  54. She, Qi; Feng, Fan; Hao, Xinyue; Yang, Qihan; Lan, Chuanlin; Lomonaco, Vincenzo; Shi, Xuesong; Wang, Zhengwei; Guo, Yao; Zhang, Yimin; Qiao, Fei; Chan, Rosa H.M. (2019-11-15). "OpenLORIS-Object: A Robotic Vision Dataset and Benchmark for Lifelong Deep Learning". arXiv:1911.06487v2 [cs.CV].

  55. Morozov, Alexei; Sushkova, Olga (2019-06-13). "THz and thermal video data set". Development of the multi-agent logic programming approach to a human behaviour analysis in a multi-channel video surveillance. Moscow: IRE RAS. Retrieved 2019-07-19. http://www.fullvision.ru/monitoring/description_eng.php

  56. Morozov, Alexei; Sushkova, Olga; Kershner, Ivan; Polupanov, Alexander (2019-07-09). "Development of a method of terahertz intelligent video surveillance based on the semantic fusion of terahertz and 3D video images" (PDF). CEUR. 2391: paper19. Retrieved 2019-07-19. http://ceur-ws.org/Vol-2391/paper19.pdf

  57. Calli, Berk; Walsman, Aaron; Singh, Arjun; Srinivasa, Siddhartha; Abbeel, Pieter; Dollar, Aaron M. (September 2015). "Benchmarking in Manipulation Research: Using the Yale-CMU-Berkeley Object and Model Set". IEEE Robotics & Automation Magazine. 22 (3): 36–52. arXiv:1502.03143. doi:10.1109/MRA.2015.2448951. ISSN 1070-9932. https://ieeexplore.ieee.org/document/7254318

  58. Downs, Laura; Francis, Anthony; Koenig, Nate; Kinman, Brandon; Hickman, Ryan; Reymann, Krista; McHugh, Thomas B.; Vanhoucke, Vincent (2022-05-23). "Google Scanned Objects: A High-Quality Dataset of 3D Scanned Household Items". 2022 International Conference on Robotics and Automation (ICRA). IEEE. pp. 2553–2560. arXiv:2204.11918. doi:10.1109/ICRA46639.2022.9811809. ISBN 978-1-7281-9681-7.

  59. "Princeton Shape Benchmark". shape.cs.princeton.edu. Retrieved 2025-03-07. https://shape.cs.princeton.edu/benchmark/main.html

  60. Shilane, P.; Min, P.; Kazhdan, M.; Funkhouser, T. (2004). "The princeton shape benchmark". Proceedings Shape Modeling Applications, 2004. IEEE. pp. 167–388. doi:10.1109/SMI.2004.1314504. ISBN 978-0-7695-2075-9. 978-0-7695-2075-9

  61. Janoch, Allison; Karayev, Sergey; Jia, Yangqing; Barron, Jonathan T.; Fritz, Mario; Saenko, Kate; Darrell, Trevor (2013), Fossati, Andrea; Gall, Juergen; Grabner, Helmut; Ren, Xiaofeng (eds.), "A Category-Level 3D Object Dataset: Putting the Kinect to Work", Consumer Depth Cameras for Computer Vision: Research Topics and Applications, London: Springer, pp. 141–165, doi:10.1007/978-1-4471-4640-7_8, ISBN 978-1-4471-4640-7, retrieved 2025-03-07.

  62. Chang, Angel X.; Funkhouser, Thomas; Guibas, Leonidas; Hanrahan, Pat; Huang, Qixing; Li, Zimo; Savarese, Silvio; Savva, Manolis; Song, Shuran (2015-12-09). "ShapeNet: An Information-Rich 3D Model Repository". arXiv:1512.03012 [cs.GR].

  63. "Computational Vision and Geometry Lab". cvgl.stanford.edu. Retrieved 2025-03-07. https://cvgl.stanford.edu/projects/objectnet3d/

  64. Xiang, Yu; Kim, Wonhui; Chen, Wei; Ji, Jingwei; Choy, Christopher; Su, Hao; Mottaghi, Roozbeh; Guibas, Leonidas; Savarese, Silvio (2016). "ObjectNet3D: A Large Scale Database for 3D Object Recognition". In Leibe, Bastian; Matas, Jiri; Sebe, Nicu; Welling, Max (eds.). Computer Vision – ECCV 2016. Lecture Notes in Computer Science. Vol. 9912. Cham: Springer International Publishing. pp. 160–176. doi:10.1007/978-3-319-46484-8_10. ISBN 978-3-319-46484-8.

  65. Reizenstein, Jeremy; Shapovalov, Roman; Henzler, Philipp; Sbordone, Luca; Labatut, Patrick; Novotny, David (2021). "Common Objects in 3D: Large-Scale Learning and Evaluation of Real-Life 3D Category Reconstruction". Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 10901–10911. https://openaccess.thecvf.com/content/ICCV2021/html/Reizenstein_Common_Objects_in_3D_Large-Scale_Learning_and_Evaluation_of_Real-Life_ICCV_2021_paper.html

  66. Reizenstein, Jeremy; Shapovalov, Roman; Henzler, Philipp; Sbordone, Luca; Labatut, Patrick; Novotny, David (2021-09-01). "Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction". arXiv:2109.00512 [cs.CV].

  67. Downs, Laura; Francis, Anthony; Koenig, Nate; Kinman, Brandon; Hickman, Ryan; Reymann, Krista; McHugh, Thomas B.; Vanhoucke, Vincent (2022-05-23). "Google Scanned Objects: A High-Quality Dataset of 3D Scanned Household Items". 2022 International Conference on Robotics and Automation (ICRA). IEEE. pp. 2553–2560. arXiv:2204.11918. doi:10.1109/ICRA46639.2022.9811809. ISBN 978-1-7281-9681-7.

  68. Deitke, Matt; Liu, Ruoshi; Wallingford, Matthew; Ngo, Huong; Michel, Oscar; Kusupati, Aditya; Fan, Alan; Laforte, Christian; Voleti, Vikram; Gadre, Samir Yitzhak; VanderBilt, Eli; Kembhavi, Aniruddha; Vondrick, Carl; Gkioxari, Georgia; Ehsani, Kiana (2023-12-15). "Objaverse-XL: A Universe of 10M+ 3D Objects". Advances in Neural Information Processing Systems. 36: 35799–35813. https://proceedings.neurips.cc/paper_files/paper/2023/hash/70364304877b5e767de4e9a2a511be0c-Abstract-Datasets_and_Benchmarks.html

  69. Wu, Tong; Zhang, Jiarui; Fu, Xiao; Wang, Yuxin; Ren, Jiawei; Pan, Liang; Wu, Wayne; Yang, Lei; Wang, Jiaqi; Qian, Chen; Lin, Dahua; Liu, Ziwei (2023). "OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation". Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 803–814. https://openaccess.thecvf.com/content/CVPR2023/html/Wu_OmniObject3D_Large-Vocabulary_3D_Object_Dataset_for_Realistic_Perception_Reconstruction_and_CVPR_2023_paper.html

  70. "OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation". omniobject3d.github.io. Retrieved 2025-03-07. https://omniobject3d.github.io/

  71. "UnCommon Objects in 3D". uco3d.github.io. Retrieved 2025-03-07. https://uco3d.github.io/

  72. Liu, Xingchen; Tayal, Piyush; Wang, Jianyuan; Zarzar, Jesus; Monnier, Tom; Tertikas, Konstantinos; Duan, Jiali; Toisoul, Antoine; Zhang, Jason Y. (2025-01-13). "UnCommon Objects in 3D". arXiv:2501.07574 [cs.CV].

  73. M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset." In CVPR Workshop on The Future of Datasets in Vision, 2015. https://www.cityscapes-dataset.com/wordpress/wp-content/papercite-data/pdf/cordts2015cvprw.pdf

  74. Houben, Sebastian, et al. "Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark." Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013. https://www.researchgate.net/profile/Sebastian_Houben/publication/242346625_Detection_of_Traffic_Signs_in_Real-World_Images_The_German_Traffic_Sign_Detection_Benchmark/links/0046352a03ec384e97000000/Detection-of-Traffic-Signs-in-Real-World-Images-The-German-Traffic-Sign-Detection-Benchmark.pdf

  75. Mathias, Mayeul, et al. "Traffic sign recognition—How far are we from the solution?" Neural Networks (IJCNN), The 2013 International Joint Conference on. IEEE, 2013. http://www.varcity.eu/paper/ijcnn2013_mathias_trafficsign.pdf

  76. Geiger, Andreas, Philip Lenz, and Raquel Urtasun. "Are we ready for autonomous driving? The KITTI vision benchmark suite." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012. https://www.cvlibs.net/publications/Geiger2012CVPR.pdf

  77. Sturm, Jürgen, et al. "A benchmark for the evaluation of RGB-D SLAM systems." Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on. IEEE, 2012. http://jsturm.de/publications/data/sturm12iros.pdf

  78. The KITTI Vision Benchmark Suite on YouTube https://www.youtube.com/watch?v=KXpZ6B1YB_k

  79. Kragh, Mikkel F.; et al. (2017). "FieldSAFE – Dataset for Obstacle Detection in Agriculture". Sensors. 17 (11): 2579. arXiv:1709.03526. Bibcode:2017Senso..17.2579K. doi:10.3390/s17112579. PMC 5713196. PMID 29120383. https://vision.eng.au.dk/fieldsafe

  80. "Papers with Code - Daimler Monocular Pedestrian Detection Dataset". paperswithcode.com. Retrieved 5 May 2023. https://paperswithcode.com/dataset/daimler-monocular-pedestrian-detection

  81. Enzweiler, Markus; Gavrila, Dariu M. (December 2009). "Monocular Pedestrian Detection: Survey and Experiments". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (12): 2179–2195. doi:10.1109/TPAMI.2008.260. ISSN 1939-3539. PMID 19834140. S2CID 1192198. https://ieeexplore.ieee.org/document/4657363

  82. Yin, Guojun; Liu, Bin; Zhu, Huihui; Gong, Tao; Yu, Nenghai (28 July 2020). "A Large Scale Urban Surveillance Video Dataset for Multiple-Object Tracking and Behavior Analysis". arXiv:1904.11784 [cs.CV].

  83. "Object Recognition in Video Dataset". mi.eng.cam.ac.uk. Retrieved 5 May 2023. https://mi.eng.cam.ac.uk/research/projects/VideoRec/CamVid/

  84. Brostow, Gabriel J.; Shotton, Jamie; Fauqueur, Julien; Cipolla, Roberto (2008). "Segmentation and Recognition Using Structure from Motion Point Clouds". Computer Vision – ECCV 2008. Lecture Notes in Computer Science. Vol. 5302. Springer. pp. 44–57. doi:10.1007/978-3-540-88682-2_5. ISBN 978-3-540-88681-5.

  85. Brostow, Gabriel J.; Fauqueur, Julien; Cipolla, Roberto (15 January 2009). "Semantic object classes in video: A high-definition ground truth database". Pattern Recognition Letters. 30 (2): 88–97. Bibcode:2009PaReL..30...88B. doi:10.1016/j.patrec.2008.04.005. ISSN 0167-8655. https://www.sciencedirect.com/science/article/abs/pii/S0167865508001220

  86. "WildDash 2 Benchmark". wilddash.cc. Retrieved 5 May 2023. https://wilddash.cc/railsem19

  87. Zendel, Oliver; Murschitz, Markus; Zeilinger, Marcel; Steininger, Daniel; Abbasi, Sara; Beleznai, Csaba (June 2019). "RailSem19: A Dataset for Semantic Rail Scene Understanding". 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). pp. 1221–1229. doi:10.1109/CVPRW.2019.00161. ISBN 978-1-7281-2506-0. S2CID 198166233.

  88. "The Boreas Dataset". www.boreas.utias.utoronto.ca. Retrieved 5 May 2023. https://www.boreas.utias.utoronto.ca/#/

  89. Burnett, Keenan; Yoon, David J.; Wu, Yuchen; Li, Andrew Zou; Zhang, Haowei; Lu, Shichen; Qian, Jingxing; Tseng, Wei-Kang; Lambert, Andrew; Leung, Keith Y. K.; Schoellig, Angela P.; Barfoot, Timothy D. (26 January 2023). "Boreas: A Multi-Season Autonomous Driving Dataset". arXiv:2203.10168 [cs.RO].

  90. "Bosch Small Traffic Lights Dataset". hci.iwr.uni-heidelberg.de. 1 March 2017. Retrieved 5 May 2023. https://hci.iwr.uni-heidelberg.de/content/bosch-small-traffic-lights-dataset

  91. Behrendt, Karsten; Novak, Libor; Botros, Rami (May 2017). "A deep learning approach to traffic lights: Detection, tracking, and classification". 2017 IEEE International Conference on Robotics and Automation (ICRA). pp. 1370–1377. doi:10.1109/ICRA.2017.7989163. ISBN 978-1-5090-4633-1. S2CID 6257133.

  92. "FRSign Dataset". frsign.irt-systemx.fr. Retrieved 5 May 2023. https://frsign.irt-systemx.fr/

  93. Harb, Jeanine; Rébéna, Nicolas; Chosidow, Raphaël; Roblin, Grégoire; Potarusov, Roman; Hajri, Hatem (5 February 2020). "FRSign: A Large-Scale Traffic Light Dataset for Autonomous Trains". arXiv:2002.05665 [cs.CY].

  94. "ifs-rwth-aachen/GERALD". Chair and Institute for Rail Vehicles and Transport Systems. 30 April 2023. Retrieved 5 May 2023. https://github.com/ifs-rwth-aachen/GERALD

  95. Leibner, Philipp; Hampel, Fabian; Schindler, Christian (3 April 2023). "GERALD: A novel dataset for the detection of German mainline railway signals". Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit. 237 (10): 1332–1342. doi:10.1177/09544097231166472. ISSN 0954-4097. S2CID 257939937. https://journals.sagepub.com/doi/abs/10.1177/09544097231166472

  96. Wojek, Christian; Walk, Stefan; Schiele, Bernt (June 2009). "Multi-cue onboard pedestrian detection". 2009 IEEE Conference on Computer Vision and Pattern Recognition. pp. 794–801. doi:10.1109/CVPR.2009.5206638. ISBN 978-1-4244-3992-8. S2CID 18000078.

  97. Toprak, Tuğçe; Aydın, Burak; Belenlioğlu, Burak; Güzeliş, Cüneyt; Selver, M. Alper (5 April 2020). "Conditional Weighted Ensemble of Transferred Models for Camera Based Onboard Pedestrian Detection in Railway Driver Support Systems". IEEE Transactions on Vehicular Technology: 1. doi:10.1109/TVT.2020.2983825. S2CID 216510283. Retrieved 5 May 2023. https://zenodo.org/record/3741742

  98. Toprak, Tugce; Belenlioglu, Burak; Aydın, Burak; Guzelis, Cuneyt; Selver, M. Alper (May 2020). "Conditional Weighted Ensemble of Transferred Models for Camera Based Onboard Pedestrian Detection in Railway Driver Support Systems". IEEE Transactions on Vehicular Technology. 69 (5): 5041–5054. doi:10.1109/TVT.2020.2983825. ISSN 1939-9359. S2CID 216510283. https://ieeexplore.ieee.org/document/9050835

  99. Tilly, Roman; Neumaier, Philipp; Schwalbe, Karsten; Klasek, Pavel; Tagiew, Rustam; Denzler, Patrick; Klockau, Tobias; Boekhoff, Martin; Köppel, Martin (2023). "Open Sensor Data for Rail 2023". FID Move (in German). doi:10.57806/9mv146r0.

  100. Tagiew, Rustam; Köppel, Martin; Schwalbe, Karsten; Denzler, Patrick; Neumaier, Philipp; Klockau, Tobias; Boekhoff, Martin; Klasek, Pavel; Tilly, Roman (4 May 2023). "OSDaR23: Open Sensor Data for Rail 2023". 2023 8th International Conference on Robotics and Automation Engineering (ICRAE). pp. 270–276. arXiv:2305.03001. doi:10.1109/ICRAE59816.2023.10458449. ISBN 979-8-3503-2765-6.

  101. "Home". Argoverse. Retrieved 5 May 2023. https://www.argoverse.org/

  102. Chang, Ming-Fang; Lambert, John; Sangkloy, Patsorn; Singh, Jagjeet; Bak, Slawomir; Hartnett, Andrew; Wang, De; Carr, Peter; Lucey, Simon; Ramanan, Deva; Hays, James (6 November 2019). "Argoverse: 3D Tracking and Forecasting with Rich Maps". arXiv:1911.02620 [cs.CV].

  103. Kharroubi, Abderrazzaq; Ballouch, Zouhair; Hajji, Rafika; Yarroudh, Anass; Billen, Roland (9 April 2024). "Multi-Context Point Cloud Dataset and Machine Learning for Railway Semantic Segmentation". Infrastructures. 9 (4): 71. doi:10.3390/infrastructures9040071. https://doi.org/10.3390%2Finfrastructures9040071

  104. Qiu, Bo; Zhou, Yuzhou; Dai, Lei; Wang, Bing; Li, Jianping; Dong, Zhen; Wen, Chenglu; Ma, Zhiliang; Yang, Bisheng (December 2024). "WHU-Railway3D: A Diverse Dataset and Benchmark for Railway Point Cloud Semantic Segmentation". IEEE Transactions on Intelligent Transportation Systems. 25 (12): 20900–20916. doi:10.1109/TITS.2024.3469546. ISSN 1558-0016. https://ieeexplore.ieee.org/document/10716569

  105. Chen, Zhichao; Yang, Jie; Feng, Zhicheng; Zhu, Hao (16 January 2024). "RailFOD23: A dataset for foreign object detection on railroad transmission lines". Scientific Data. 11 (1): 72. Bibcode:2024NatSD..11...72C. doi:10.1038/s41597-024-02918-9. ISSN 2052-4463. PMC 10791632. PMID 38228610. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10791632

  106. Khemmar, Redouane; Mauri, Antoine; Dulompont, Camille; Gajula, Jayadeep; Vauchey, Vincent; Haddad, Madjid; Boutteau, Rémi (22 May 2022). "Road and Railway Smart Mobility: A High-Definition Ground Truth Hybrid Dataset". Sensors. 22 (10): 3922. Bibcode:2022Senso..22.3922K. doi:10.3390/s22103922. PMC 9143394. PMID 35632331. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9143394

  107. ICONS 2022: the seventeenth International Conference on Systems: April 24-28, 2022, Barcelona, Spain. Wilmington, DE, USA: IARIA. 2022. ISBN 978-1-61208-941-6.

  108. Jiang, Tengping; Li, Shiwei; Zhang, Qinyu; Wang, Guangshuai; Zhang, Zequn; Zeng, Fankun; An, Peng; Jin, Xin; Liu, Shan; Wang, Yongjun (2024). "RailPC: A large-scale railway point cloud semantic segmentation dataset". CAAI Transactions on Intelligence Technology. 9 (6): 1548–1560. doi:10.1049/cit2.12349. ISSN 2468-2322. https://doi.org/10.1049%2Fcit2.12349

  109. Abid, Mahdi; Teixeira, Mathis; Mahtani, Ankur; Laurent, Thomas (2024). "RailCloud-HdF: A Large-Scale Point Cloud Dataset for Railway Scene Semantic Segmentation". Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. pp. 159–170. doi:10.5220/0012394800003660. ISBN 978-989-758-679-8.

  110. Tagiew, Rustam; Wunderlich, Ilkay; Zanitzer, Philipp; Sastuba, Mark; Knoll, Carsten; Göller, Kilian; Amjad, Haadia; Seitz, Steffen (2025). "Görlitz Rail Test Center CV Dataset 2024 (RailGoerl24)". German National Library of Science and Technology. https://data.fid-move.de/de/dataset/railgoerl24

  111. "Face Recognition Homepage - Databases". www.face-rec.org. Retrieved 2025-04-26. https://www.face-rec.org/databases/

  112. Huang, Gary B., et al. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Vol. 1. No. 2. Technical Report 07-49, University of Massachusetts, Amherst, 2007. https://hal.inria.fr/docs/00/32/19/23/PDF/Huang_long_eccv2008-lfw.pdf

  113. "LFW Face Database : Main". web.archive.org. 2012-12-01. Archived from the original on 2012-12-01. Retrieved 2025-04-26. https://web.archive.org/web/20121201044531/http://vis-www.cs.umass.edu/lfw

  114. Zafeiriou, S.; Kollias, D.; Nicolaou, M.A.; Papaioannou, A.; Zhao, G.; Kotsia, I. (2017). "Aff-Wild: Valence and Arousal 'In-the-Wild' Challenge" (PDF). 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). pp. 1980–1987. doi:10.1109/CVPRW.2017.248. ISBN 978-1-5386-0733-6. S2CID 3107614.

  115. Kollias, D.; Tzirakis, P.; Nicolaou, M.A.; Papaioannou, A.; Zhao, G.; Schuller, B.; Kotsia, I.; Zafeiriou, S. (2019). "Deep Affect Prediction in-the-wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond". International Journal of Computer Vision. 127 (6–7): 907–929. arXiv:1804.10938. doi:10.1007/s11263-019-01158-4. S2CID 13679040. https://rdcu.be/bmGm2

  116. Kollias, D.; Zafeiriou, S. (2019). "Expression, affect, action unit recognition: Aff-wild2, multi-task learning and arcface" (PDF). British Machine Vision Conference (BMVC), 2019. arXiv:1910.04855. https://bmvc2019.org/wp-content/uploads/papers/0399-paper.pdf

  117. Kollias, D.; Schulc, A.; Hajiyev, E.; Zafeiriou, S. (2020). "Analysing Affective Behavior in the First ABAW 2020 Competition". 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020). pp. 637–643. arXiv:2001.11409. doi:10.1109/FG47880.2020.00126. ISBN 978-1-7281-3079-8. S2CID 210966051.

  118. Phillips, P. Jonathon; et al. (1998). "The FERET database and evaluation procedure for face-recognition algorithms". Image and Vision Computing. 16 (5): 295–306. doi:10.1016/s0262-8856(97)00070-x.

  119. Wiskott, Laurenz; et al. (1997). "Face recognition by elastic bunch graph matching". IEEE Transactions on Pattern Analysis and Machine Intelligence. 19 (7): 775–779. CiteSeerX 10.1.1.44.2321. doi:10.1109/34.598235. S2CID 30523165.

  120. Livingstone, Steven R.; Russo, Frank A. (2018). "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English". PLOS ONE. 13 (5): e0196391. Bibcode:2018PLoSO..1396391L. doi:10.1371/journal.pone.0196391. PMC 5955500. PMID 29768426. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5955500

  121. Livingstone, Steven R.; Russo, Frank A. (2018). "Emotion". The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). doi:10.5281/zenodo.1188976.

  122. Grgic, Mislav; Delac, Kresimir; Grgic, Sonja (2011). "SCface–surveillance cameras face database". Multimedia Tools and Applications. 51 (3): 863–879. doi:10.1007/s11042-009-0417-2. S2CID 207218990.

  123. Wallace, Roy, et al. "Inter-session variability modelling and joint factor analysis for face authentication." Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011. https://repository.ubn.ru.nl/bitstream/handle/2066/94489/94489.pdf

  124. Georghiades, A. "Yale face database". Center for Computational Vision and Control at Yale University. 2: 1997. http://CVC.yale.edu/Projects/Yalefaces/Yalefa

  125. Nguyen, Duy; et al. (2006). "Real-time face detection and lip feature extraction using field-programmable gate arrays". IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics. 36 (4): 902–912. CiteSeerX 10.1.1.156.9848. doi:10.1109/tsmcb.2005.862728. PMID 16903373. S2CID 7334355.

  126. Kanade, Takeo, Jeffrey F. Cohn, and Yingli Tian. "Comprehensive database for facial expression analysis." Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000.

  127. Zeng, Zhihong; et al. (2009). "A survey of affect recognition methods: Audio, visual, and spontaneous expressions". IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (1): 39–58. CiteSeerX 10.1.1.144.217. doi:10.1109/tpami.2008.52. PMID 19029545.

  128. Lyons, Michael; Kamachi, Miyuki; Gyoba, Jiro (1998). "Facial expression images". The Japanese Female Facial Expression (JAFFE) Database. doi:10.5281/zenodo.3451524.

  129. Lyons, Michael; Akamatsu, Shigeru; Kamachi, Miyuki; Gyoba, Jiro. "Coding facial expressions with Gabor wavelets." Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on. IEEE, 1998. https://zenodo.org/record/3430156

  130. Ng, Hong-Wei, and Stefan Winkler. "A data-driven approach to cleaning large face datasets." Image Processing (ICIP), 2014 IEEE International Conference on. IEEE, 2014. Archived 6 December 2019 at the Wayback Machine. http://vintage.winklerbros.net/Publications/icip2014a.pdf

  131. RoyChowdhury, Aruni; Lin, Tsung-Yu; Maji, Subhransu; Learned-Miller, Erik (2015). "One-to-many face recognition with bilinear CNNs". arXiv:1506.01342 [cs.CV].

  132. Jesorsky, Oliver, Klaus J. Kirchberg, and Robert W. Frischholz. "Robust face detection using the hausdorff distance." Audio-and video-based biometric person authentication. Springer Berlin Heidelberg, 2001.

  133. Bhatt, Rajen B., et al. "Efficient skin region segmentation using low complexity fuzzy decision tree model." India Conference (INDICON), 2009 Annual IEEE. IEEE, 2009. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.708.9158&rep=rep1&type=pdf

  134. Lingala, Mounika; et al. (2014). "Fuzzy logic color detection: Blue areas in melanoma dermoscopy images". Computerized Medical Imaging and Graphics. 38 (5): 403–410. doi:10.1016/j.compmedimag.2014.03.007. PMC 4287461. PMID 24786720. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4287461

  135. Maes, Chris, et al. "Feature detection on 3D face surfaces for pose normalisation and recognition." Biometrics: Theory Applications and Systems (BTAS), 2010 Fourth IEEE International Conference on. IEEE, 2010. https://lirias.kuleuven.be/retrieve/135678

  136. Savran, Arman, et al. "Bosphorus database for 3D face analysis." Biometrics and Identity Management. Springer Berlin Heidelberg, 2008. 47–56. https://web.archive.org/web/20190222192331/http://pdfs.semanticscholar.org/4254/fbba3846008f50671edc9cf70b99d7304543.pdf

  137. Heseltine, Thomas, Nick Pears, and Jim Austin. "Three-dimensional face recognition: An eigensurface approach." Image Processing, 2004. ICIP'04. 2004 International Conference on. Vol. 2. IEEE, 2004. http://eprints.whiterose.ac.uk/1526/01/austinj4.pdf

  138. Ge, Yun; et al. (2011). "3D Novel Face Sample Modeling for Face Recognition". Journal of Multimedia. 6 (5): 467–475. CiteSeerX 10.1.1.461.9710. doi:10.4304/jmm.6.5.467-475.

  139. Wang, Yueming; Liu, Jianzhuang; Tang, Xiaoou (2010). "Robust 3D face recognition by local shape difference boosting". IEEE Transactions on Pattern Analysis and Machine Intelligence. 32 (10): 1858–1870. CiteSeerX 10.1.1.471.2424. doi:10.1109/tpami.2009.200. PMID 20724762. S2CID 15263913.

  140. Zhong, Cheng, Zhenan Sun, and Tieniu Tan. "Robust 3D face recognition using learned visual codebook." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.580.8534&rep=rep1&type=pdf

  141. Zhao, G.; Huang, X.; Taini, M.; Li, S. Z.; Pietikäinen, M. (2011). "Facial expression recognition from near-infrared videos" (PDF). Image and Vision Computing. 29 (9): 607–619. doi:10.1016/j.imavis.2011.07.002. http://www.academia.edu/download/42229488/Image_and_Vision_Computing20160206-29020-1auzaon.pdf

  142. Soyel, Hamit, and Hasan Demirel. "Facial expression recognition using 3D facial feature distances." Image Analysis and Recognition. Springer Berlin Heidelberg, 2007. 831–838. https://pdfs.semanticscholar.org/cf81/4b618fcbc9a556cdce225e74a8806867ba84.pdf

  143. Bowyer, Kevin W.; Chang, Kyong; Flynn, Patrick (2006). "A survey of approaches and challenges in 3D and multi-modal 3D+2D face recognition". Computer Vision and Image Understanding. 101 (1): 1–15. CiteSeerX 10.1.1.134.8784. doi:10.1016/j.cviu.2005.05.005.

  144. Tan, Xiaoyang; Triggs, Bill (2010). "Enhanced local texture feature sets for face recognition under difficult lighting conditions". IEEE Transactions on Image Processing. 19 (6): 1635–1650. Bibcode:2010ITIP...19.1635T. CiteSeerX 10.1.1.105.3355. doi:10.1109/tip.2010.2042645. PMID 20172829. S2CID 4943234.

  145. Mousavi, Mir Hashem; Faez, Karim; Asghari, Amin (2008). "Three Dimensional Face Recognition Using SVM Classifier". Seventh IEEE/ACIS International Conference on Computer and Information Science (ICIS 2008). pp. 208–213. doi:10.1109/ICIS.2008.77. ISBN 978-0-7695-3131-1. S2CID 2710422.

  146. Amberg, Brian; Knothe, Reinhard; Vetter, Thomas (2008). "Expression invariant 3D face recognition with a Morphable Model" (PDF). 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition. pp. 1–6. doi:10.1109/AFGR.2008.4813376. ISBN 978-1-4244-2154-1. S2CID 5651453. Archived from the original (PDF) on 28 July 2018. Retrieved 6 August 2019.

  147. Irfanoglu, M.O.; Gokberk, B.; Akarun, L. (2004). "3D shape-based face recognition using automatically registered facial surfaces". Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004. pp. 183–186 Vol.4. doi:10.1109/ICPR.2004.1333734. ISBN 0-7695-2128-2. S2CID 10987293.

  148. Beumier, Charles; Acheroy, Marc (2001). "Face verification from 3D and grey level clues". Pattern Recognition Letters. 22 (12): 1321–1329. Bibcode:2001PaReL..22.1321B. doi:10.1016/s0167-8655(01)00077-0.

  149. Afifi, Mahmoud; Abdelhamed, Abdelrahman (2017-06-13). "AFIF4: Deep Gender Classification based on AdaBoost-based Fusion of Isolated Facial Features and Foggy Faces". arXiv:1706.04277 [cs.CV].

  150. "SoF dataset". sites.google.com. Retrieved 2017-11-18. https://sites.google.com/view/sof-dataset

  151. "IMDb-WIKI". data.vision.ee.ethz.ch. Retrieved 2018-03-13. https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/

  152. "AVA: A Video Dataset of Atomic Visual Action". research.google.com. Retrieved 2024-10-18. https://research.google.com/ava/

  153. Li, Ang; Thotakuri, Meghana; Ross, David A.; Carreira, João; Vostrikov, Alexander; Zisserman, Andrew (2020-05-20). "The AVA-Kinetics Localized Human Actions Video Dataset". arXiv:2005.00214 [cs.CV].

  154. Patron-Perez, A.; Marszalek, M.; Reid, I.; Zisserman, A. (2012). "Structured learning of human interactions in TV shows". IEEE Transactions on Pattern Analysis and Machine Intelligence. 34 (12): 2441–2453. doi:10.1109/tpami.2012.24. PMID 23079467. S2CID 6060568.

  155. Ofli, F., Chaudhry, R., Kurillo, G., Vidal, R., & Bajcsy, R. (January 2013). Berkeley MHAD: A comprehensive multimodal human action database. In Applications of Computer Vision (WACV), 2013 IEEE Workshop on (pp. 53–60). IEEE. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.432.5113&rep=rep1&type=pdf

  156. Jiang, Y. G., et al. "THUMOS challenge: Action recognition with a large number of classes." ICCV Workshop on Action Recognition with a Large Number of Classes, http://crcv.ucf.edu/ICCV13-Action-Workshop. 2013. http://crcv.ucf.edu/ICCV13-Action-Workshop

  157. Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional networks for action recognition in videos." Advances in Neural Information Processing Systems. 2014. https://papers.nips.cc/paper/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf

  158. Stoian, Andrei; Ferecatu, Marin; Benois-Pineau, Jenny; Crucianu, Michel (2016). "Fast Action Localization in Large-Scale Video Archives". IEEE Transactions on Circuits and Systems for Video Technology. 26 (10): 1917–1930. doi:10.1109/TCSVT.2015.2475835. S2CID 31537462.

  159. Botta, M., A. Giordana, and L. Saitta. "Learning fuzzy concept definitions." Fuzzy Systems, 1993. Second IEEE International Conference on. IEEE, 1993.

  160. Frey, Peter W.; Slate, David J. (1991). "Letter recognition using Holland-style adaptive classifiers". Machine Learning. 6 (2): 161–182. doi:10.1007/bf00114162. https://doi.org/10.1007%2Fbf00114162

  161. Peltonen, Jaakko; Klami, Arto; Kaski, Samuel (2004). "Improved learning of Riemannian metrics for exploratory analysis". Neural Networks. 17 (8): 1087–1100. CiteSeerX 10.1.1.59.4865. doi:10.1016/j.neunet.2004.06.008. PMID 15555853.

  162. Liu, Cheng-Lin; Yin, Fei; Wang, Da-Han; Wang, Qiu-Feng (January 2013). "Online and offline handwritten Chinese character recognition: Benchmarking on new databases". Pattern Recognition. 46 (1): 155–162. Bibcode:2013PatRe..46..155L. doi:10.1016/j.patcog.2012.06.021.

  163. Wang, D.; Liu, C.; Yu, J.; Zhou, X. (2009). "CASIA-OLHWDB1: A Database of Online Handwritten Chinese Characters". 2009 10th International Conference on Document Analysis and Recognition. pp. 1206–1210. doi:10.1109/ICDAR.2009.163. ISBN 978-1-4244-4500-4. S2CID 5705532.

  164. Liu, Cheng-Lin; Yin, Fei; Wang, Da-Han; Wang, Qiu-Feng (January 2013). "Online and offline handwritten Chinese character recognition: Benchmarking on new databases". Pattern Recognition. 46 (1): 155–162. Bibcode:2013PatRe..46..155L. doi:10.1016/j.patcog.2012.06.021.

  165. Williams, Ben H., Marc Toussaint, and Amos J. Storkey. Extracting motion primitives from natural handwriting data. Springer Berlin Heidelberg, 2006. https://www.era.lib.ed.ac.uk/bitstream/handle/1842/3221/BH%20Williams%20PhD%20thesis%2009.pdf?sequence=1

  166. Meier, Franziska, et al. "Movement segmentation using a primitive library." Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. IEEE, 2011. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.395.8598&rep=rep1&type=pdf

  167. T. E. de Campos, B. R. Babu and M. Varma. Character recognition in natural images. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009 http://personal.ee.surrey.ac.uk/Personal/T.Decampos/papers/decampos_etal_visapp2009.pdf

  168. Cohen, Gregory; Afshar, Saeed; Tapson, Jonathan; André van Schaik (2017). "EMNIST: An extension of MNIST to handwritten letters". arXiv:1702.05373v1 [cs.CV].

  169. "The EMNIST Dataset". NIST. 4 April 2017. https://www.nist.gov/itl/products-and-services/emnist-dataset

  170. Cohen, Gregory; Afshar, Saeed; Tapson, Jonathan; André van Schaik (2017). "EMNIST: An extension of MNIST to handwritten letters". arXiv:1702.05373 [cs.CV].

  171. Llorens, David, et al. "The UJIpenchars Database: a Pen-Based Database of Isolated Handwritten Characters." LREC. 2008. https://web.archive.org/web/20190806015012/https://pdfs.semanticscholar.org/24cf/ef15094c59322560377bbf8e4185245c654f.pdf

  172. Calderara, Simone; Prati, Andrea; Cucchiara, Rita (2011). "Mixtures of von Mises distributions for people trajectory shape analysis". IEEE Transactions on Circuits and Systems for Video Technology. 21 (4): 457–471. doi:10.1109/tcsvt.2011.2125550. S2CID 1427766.

  173. Guyon, Isabelle, et al. "Result analysis of the NIPS 2003 feature selection challenge." Advances in neural information processing systems. 2004. http://papers.nips.cc/paper/2728-result-analysis-of-the-nips-2003-feature-selection-challenge.pdf

  174. Lake, B. M.; Salakhutdinov, R.; Tenenbaum, J. B. (2015-12-11). "Human-level concept learning through probabilistic program induction". Science. 350 (6266): 1332–1338. Bibcode:2015Sci...350.1332L. doi:10.1126/science.aab3050. ISSN 0036-8075. PMID 26659050. https://doi.org/10.1126%2Fscience.aab3050

  175. Lake, Brenden (2019-11-09). "Omniglot data set for one-shot learning". GitHub. Retrieved 2019-11-10. https://github.com/brendenlake/omniglot

  176. LeCun, Yann; et al. (1998). "Gradient-based learning applied to document recognition". Proceedings of the IEEE. 86 (11): 2278–2324. CiteSeerX 10.1.1.32.9552. doi:10.1109/5.726791. S2CID 14542261.

  177. Kussul, Ernst; Baidyk, Tatiana (2004). "Improved method of handwritten digit recognition tested on MNIST database". Image and Vision Computing. 22 (12): 971–981. doi:10.1016/j.imavis.2004.03.008.

  178. Xu, Lei; Krzyżak, Adam; Suen, Ching Y. (1992). "Methods of combining multiple classifiers and their applications to handwriting recognition". IEEE Transactions on Systems, Man, and Cybernetics. 22 (3): 418–435. doi:10.1109/21.155943. hdl:10338.dmlcz/135217.

  179. Alimoglu, Fevzi, et al. "Combining multiple classifiers for pen-based handwritten digit recognition." (1996). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.6299

  180. Tang, E. Ke; et al. (2005). "Linear dimensionality reduction using relevance weighted LDA". Pattern Recognition. 38 (4): 485–493. Bibcode:2005PatRe..38..485T. doi:10.1016/j.patcog.2004.09.005. S2CID 10580110.

  181. Hong, Yi, et al. "Learning a mixture of sparse distance metrics for classification and dimensionality reduction." Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011. https://pages.ucsd.edu/~ztu/publication/iccv11_sparsemetric.pdf

  182. Thoma, Martin (2017). "The HASYv2 dataset". arXiv:1701.08380 [cs.CV].

  183. Karki, Manohar; Liu, Qun; DiBiano, Robert; Basu, Saikat; Mukhopadhyay, Supratik (2018-06-20). "Pixel-level Reconstruction and Classification for Noisy Handwritten Bangla Characters". arXiv:1806.08037 [cs.CV].

  184. Liu, Qun; Collier, Edward; Mukhopadhyay, Supratik (2019). "PCGAN-CHAR: Progressively Trained Classifier Generative Adversarial Networks for Classification of Noisy Handwritten Bangla Characters". Digital Libraries at the Crossroads of Digital Information for the Future. Lecture Notes in Computer Science. Vol. 11853. Springer International Publishing. pp. 3–15. arXiv:1908.08987. doi:10.1007/978-3-030-34058-2_1. ISBN 978-3-030-34057-5. S2CID 201665955.

  185. "iSAID". captain-whu.github.io. Retrieved 2021-11-30. https://captain-whu.github.io/iSAID/index.html

  186. Zamir, Syed; Arora, Aditya; Gupta, Akshita; Khan, Salman; Sun, Guolei; Khan, Fahad; Zhu, Fan; Shao, Ling; Xia, Gui-Song; Bai, Xiang (2019). "iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images". https://captain-whu.github.io/iSAID/index.html

  187. Yuan, Jiangye; Gleason, Shaun S.; Cheriyadat, Anil M. (2013). "Systematic benchmarking of aerial image segmentation". IEEE Geoscience and Remote Sensing Letters. 10 (6): 1527–1531. Bibcode:2013IGRSL..10.1527Y. doi:10.1109/lgrs.2013.2261453. S2CID 629629.

  188. Vatsavai, Ranga Raju. "Object based image classification: state of the art and computational challenges." Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data. ACM, 2013. https://dl.acm.org/citation.cfm?id=2534927

  189. Butenuth, Matthias, et al. "Integrating pedestrian simulation, tracking and event detection for crowd analysis." Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on. IEEE, 2011. http://www.hartmann-alberts.de/dirk/pub/proceedings2011e.pdf

  190. Fradi, Hajer, and Jean-Luc Dugelay. "Low level crowd analysis using frame-wise normalized feature for people counting." Information Forensics and Security (WIFS), 2012 IEEE International Workshop on. IEEE, 2012. http://www.eurecom.fr/fr/publication/3841/download/mm-publi-3841.pdf

  191. Johnson, Brian Alan, Ryutaro Tateishi, and Nguyen Thanh Hoan. "A hybrid pansharpening approach and multiscale object-based image analysis for mapping diseased pine and oak trees." International Journal of Remote Sensing 34.20 (2013): 6969–6982. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.826.9200&rep=rep1&type=pdf

  192. Mohd Pozi, Muhammad Syafiq; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran (2015). "A new classification model for a class imbalanced data set using genetic programming and support vector machines: Case study for wilt disease classification". Remote Sensing Letters. 6 (7): 568–577. Bibcode:2015RSL.....6..568M. doi:10.1080/2150704X.2015.1062159. S2CID 58788630. https://www.tandfonline.com/doi/abs/10.1080/2150704X.2015.1062159

  193. Gallego, A.-J.; Pertusa, A.; Gil, P. "Automatic Ship Classification from Optical Aerial Images with Convolutional Neural Networks." Remote Sensing. 2018; 10(4):511. https://www.mdpi.com/2072-4292/10/4/511

  194. Gallego, A.-J.; Pertusa, A.; Gil, P. "MAritime SATellite Imagery dataset". Available: https://www.iuii.ua.es/datasets/masati/, 2018.

  195. Johnson, Brian; Tateishi, Ryutaro; Xie, Zhixiao (2012). "Using geographically weighted variables for image classification". Remote Sensing Letters. 3 (6): 491–499. Bibcode:2012RSL.....3..491J. doi:10.1080/01431161.2011.629637. S2CID 122543681.

  196. Chatterjee, Sankhadeep, et al. "Forest Type Classification: A Hybrid NN-GA Model Based Approach." Information Systems Design and Intelligent Applications. Springer India, 2016. 227–236. https://www.researchgate.net/profile/Sankhadeep_Chatterjee/publication/282605325_Forest_Type_Classification_A_Hybrid_NN-GA_Model_Based_Approach/links/57493cb308ae5c51e29e6f1b/Forest-Type-Classification-A-Hybrid-NN-GA-Model-Based-Approach.pdf

  197. Diegert, Carl. "A combinatorial method for tracing objects using semantics of their shape." Applied Imagery Pattern Recognition Workshop (AIPR), 2010 IEEE 39th. IEEE, 2010. https://www.osti.gov/servlets/purl/1278837

  198. Razakarivony, Sebastien, and Frédéric Jurie. "Small target detection combining foreground and background manifolds." IAPR International Conference on Machine Vision Applications. 2013. https://hal.archives-ouvertes.fr/hal-00943444/file/13_mva-detection.pdf

  199. "SpaceNet". explore.digitalglobe.com. Archived from the original on 13 March 2018. Retrieved 2018-03-13. https://web.archive.org/web/20180313092809/http://explore.digitalglobe.com/spacenet

  200. Etten, Adam Van (2017-01-05). "Getting Started With SpaceNet Data". The DownLinQ. Retrieved 2018-03-13. https://medium.com/the-downlinq/getting-started-with-spacenet-data-827fd2ec9f53

  201. Vakalopoulou, M.; Bus, N.; Karantzalos, K.; Paragios, N. (July 2017). "Integrating edge/Boundary priors with classification scores for building detection in very high resolution data". 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). pp. 3309–3312. doi:10.1109/IGARSS.2017.8127705. ISBN 978-1-5090-4951-6. S2CID 8297433.

  202. Yang, Yi; Newsam, Shawn (2010). "Bag-of-visual-words and spatial extensions for land-use classification". Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. New York, New York, USA: ACM Press. pp. 270–279. doi:10.1145/1869790.1869829. ISBN 9781450304283. S2CID 993769.

  203. Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (2015-11-03). "DeepSat: A learning framework for satellite imagery". Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM. pp. 1–10. doi:10.1145/2820783.2820816. ISBN 9781450339674. S2CID 4387134.

  204. Liu, Qun; Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (2019-11-21). "DeepSat V2: feature augmented convolutional neural nets for satellite image classification". Remote Sensing Letters. 11 (2): 156–165. arXiv:1911.07747. doi:10.1080/2150704x.2019.1693071. ISSN 2150-704X. S2CID 208138097.

  205. Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (2015-11-03). "DeepSat: A learning framework for satellite imagery". Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems. ACM. pp. 1–10. doi:10.1145/2820783.2820816. ISBN 9781450339674. S2CID 4387134.

  206. Liu, Qun; Basu, Saikat; Ganguly, Sangram; Mukhopadhyay, Supratik; DiBiano, Robert; Karki, Manohar; Nemani, Ramakrishna (2019-11-21). "DeepSat V2: feature augmented convolutional neural nets for satellite image classification". Remote Sensing Letters. 11 (2): 156–165. arXiv:1911.07747. doi:10.1080/2150704x.2019.1693071. ISSN 2150-704X. S2CID 208138097.

  207. Md Jahidul Islam, et al. "Semantic Segmentation of Underwater Imagery: Dataset and Benchmark." 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020. https://ieeexplore.ieee.org/abstract/document/9340821

  208. Waszak et al. "Semantic Segmentation in Underwater Ship Inspections: Benchmark and Data Set." IEEE Journal of Oceanic Engineering. IEEE, 2022. https://ieeexplore.ieee.org/document/9998080

  209. "True Color Kodak Images". r0k.us. Retrieved 2025-02-27. https://r0k.us/graphics/kodak/

  210. Ebadi, Ashkan; Paul, Patrick; Auer, Sofia; Tremblay, Stéphane (2021-11-12). "NRC-GAMMA: Introducing a Novel Large Gas Meter Image Dataset". arXiv:2111.06827 [cs.CV].

  211. Canada, Government of Canada National Research Council (2021). "The gas meter image dataset (NRC-GAMMA) - NRC Digital Repository". nrc-digital-repository.canada.ca. doi:10.4224/3c8s-z290. Retrieved 2021-12-02. https://nrc-digital-repository.canada.ca/eng/view/object/?id=ba1fc493-e65f-4c0a-ab31-ecbcdf00bfa4

  212. Rabah, Chaima Ben; Coatrieux, Gouenou; Abdelfattah, Riadh (October 2020). "The Supatlantique Scanned Documents Database for Digital Image Forensics Purposes". 2020 IEEE International Conference on Image Processing (ICIP). IEEE. pp. 2096–2100. doi:10.1109/icip40778.2020.9190665. ISBN 978-1-7281-6395-6. S2CID 224881147.

  213. Mills, Kyle; Tamblyn, Isaac (2018-05-16). "Big graphene dataset". National Research Council of Canada. doi:10.4224/c8sc04578j.data.

  214. Mills, Kyle; Spanner, Michael; Tamblyn, Isaac (2018-05-16). "Quantum simulation". Quantum simulations of an electron in a two dimensional potential well. National Research Council of Canada. doi:10.4224/PhysRevA.96.042113.data.

  215. Rohrbach, M.; Amin, S.; Andriluka, M.; Schiele, B. (2012). "A database for fine grained activity detection of cooking activities". 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE. pp. 1194–1201. doi:10.1109/cvpr.2012.6247801. ISBN 978-1-4673-1228-8.

  216. Kuehne, Hilde, Ali Arslan, and Thomas Serre. "The language of actions: Recovering the syntax and semantics of goal-directed human activities." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014. https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Kuehne_The_Language_of_2014_CVPR_paper.pdf

  217. Voloshynovskiy, Sviatoslav, et al. "Towards reproducible results in authentication based on physical non-cloneable functions: The Forensic Authentication Microstructure Optical Set (FAMOS)." Proceedings of the IEEE International Workshop on Information Forensics and Security. 2012. http://vision.unige.ch/publications/postscript/2012/2012.WIFS.database.pdf

  218. Taran, Olga; Rezaeifar, Shideh; et al. "PharmaPack: mobile fine-grained recognition of pharma packages." Proceedings of the European Signal Processing Conference (EUSIPCO). 2017. https://archive-ouverte.unige.ch/unige:97444/ATTACHMENT01

  219. Khosla, Aditya, et al. "Novel dataset for fine-grained image categorization: Stanford dogs." Proc. CVPR Workshop on Fine-Grained Visual Categorization (FGVC). 2011. https://people.csail.mit.edu/khosla/papers/fgvc2011.pdf

  220. Parkhi, Omkar M., et al. "Cats and dogs." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012. http://www.robots.ox.ac.uk:5000/~vgg/publications/2012/parkhi12a/parkhi12a.pdf

  221. Biggs, Benjamin; Boyne, Oliver; Charles, James; Fitzgibbon, Andrew; Cipolla, Roberto (2020). "Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop". Computer Vision – ECCV 2020. Lecture Notes in Computer Science. Vol. 12356. arXiv:2007.11110. doi:10.1007/978-3-030-58621-8. ISBN 978-3-030-58620-1. S2CID 227173931.

  222. Parkhi, Omkar M., et al. "Cats and dogs." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012. http://www.robots.ox.ac.uk:5000/~vgg/publications/2012/parkhi12a/parkhi12a.pdf

  223. Razavian, Ali, et al. "CNN features off-the-shelf: an astounding baseline for recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014. https://www.cv-foundation.org/openaccess/content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.pdf

  224. Ortega, Michael; et al. (1998). "Supporting ranked boolean similarity queries in MARS". IEEE Transactions on Knowledge and Data Engineering. 10 (6): 905–925. CiteSeerX 10.1.1.36.6079. doi:10.1109/69.738357.

  225. He, Xuming, Richard S. Zemel, and Miguel Á. Carreira-Perpiñán. "Multiscale conditional random fields for image labeling." Computer vision and pattern recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE computer society conference on. Vol. 2. IEEE, 2004. ftp://www-vhost.cs.toronto.edu/public_html/public_html/dist/zemel/Papers/cvpr04.pdf

  226. Deneke, Tewodros, et al. "Video transcoding time prediction for proactive load balancing." Multimedia and Expo (ICME), 2014 IEEE International Conference on. IEEE, 2014. https://ieeexplore.ieee.org/abstract/document/6890256/

  227. Ting-Hao (Kenneth) Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell (13 April 2016). "Visual Storytelling". arXiv:1604.03968 [cs.CL].

  228. Wah, Catherine, et al. "The Caltech-UCSD Birds-200-2011 dataset." (2011). https://authors.library.caltech.edu/27452/1/CUB_200_2011.pdf

  229. Duan, Kun, et al. "Discovering localized attributes for fine-grained recognition." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012. http://vision.soic.indiana.edu/papers/attributes2012cvpr.pdf

  230. "YouTube-8M Dataset". research.google.com. Retrieved 1 October 2016. https://research.google.com/youtube8m/

  231. Abu-El-Haija, Sami; Kothari, Nisarg; Lee, Joonseok; Natsev, Paul; Toderici, George; Varadarajan, Balakrishnan; Vijayanarasimhan, Sudheendra (27 September 2016). "YouTube-8M: A Large-Scale Video Classification Benchmark". arXiv:1609.08675 [cs.CV].

  232. "YFCC100M Dataset". mmcommons.org. Yahoo-ICSI-LLNL. Retrieved 1 June 2017. http://mmcommons.org

  233. Bart Thomee; David A Shamma; Gerald Friedland; Benjamin Elizalde; Karl Ni; Douglas Poland; Damian Borth; Li-Jia Li (25 April 2016). "YFCC100M: The new data in multimedia research". Communications of the ACM. 59 (2): 64–73. arXiv:1503.01817. doi:10.1145/2812802. S2CID 207230134.

  234. Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "LIRIS-ACCEDE: A Video Database for Affective Content Analysis," in IEEE Transactions on Affective Computing, 2015. https://hal.archives-ouvertes.fr/hal-01375518/document

  235. Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, "Deep Learning vs. Kernel Methods: Performance for Emotion Prediction in Videos," in 2015 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2015. https://hal.archives-ouvertes.fr/hal-01193144/document

  236. M. Sjöberg, Y. Baveye, H. Wang, V. L. Quang, B. Ionescu, E. Dellandréa, M. Schedl, C.-H. Demarty, and L. Chen, "The MediaEval 2015 Affective Impact of Movies Task," in MediaEval 2015 Workshop, 2015. https://www.researchgate.net/profile/Hanli_Wang2/publication/309704559_The_MediaEval_2015_Affective_Impact_of_Movies_Task/links/581dada308ae12715af33bc8/The-MediaEval-2015-Affective-Impact-of-Movies-Task.pdf

  237. S. Johnson and M. Everingham, "Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation", in Proceedings of the 21st British Machine Vision Conference (BMVC2010). Archived 2021-11-04 at the Wayback Machine. http://sam.johnson.io/research/publications/johnson10bmvc.pdf

  238. S. Johnson and M. Everingham, "Learning Effective Human Pose Estimation from Inaccurate Annotation", in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR2011). Archived 2021-11-04 at the Wayback Machine. http://sam.johnson.io/research/publications/johnson11cvpr.pdf

  239. Afifi, Mahmoud; Hussain, Khaled F. (2017-11-02). "The Achievement of Higher Flexibility in Multiple Choice-based Tests Using Image Classification Techniques". arXiv:1711.00972 [cs.CV].

  240. "MCQ Dataset". sites.google.com. Retrieved 2017-11-18. https://sites.google.com/view/mcq-dataset/mcqe-dataset

  241. Taj-Eddin, I. A. T. F.; Afifi, M.; Korashy, M.; Hamdy, D.; Nasser, M.; Derbaz, S. (July 2016). "A new compression technique for surveillance videos: Evaluation using new dataset". 2016 Sixth International Conference on Digital Information and Communication Technology and its Applications (DICTAP). pp. 159–164. doi:10.1109/DICTAP.2016.7544020. ISBN 978-1-4673-9609-7. S2CID 8698850.

  242. Tabak, Michael A.; Norouzzadeh, Mohammad S.; Wolfson, David W.; Sweeney, Steven J.; Vercauteren, Kurt C.; Snow, Nathan P.; Halseth, Joseph M.; Di Salvo, Paul A.; Lewis, Jesse S.; White, Michael D.; Teton, Ben; Beasley, James C.; Schlichting, Peter E.; Boughton, Raoul K.; Wight, Bethany; Newkirk, Eric S.; Ivan, Jacob S.; Odell, Eric A.; Brook, Ryan K.; Lukacs, Paul M.; Moeller, Anna K.; Mandeville, Elizabeth G.; Clune, Jeff; Miller, Ryan S.; Photopoulou, Theoni (2018). "Machine learning to classify animal species in camera trap images: Applications in ecology". Methods in Ecology and Evolution. 10 (4): 585–590. doi:10.1111/2041-210X.13120. ISSN 2041-210X. https://doi.org/10.1111%2F2041-210X.13120

  243. Taj-Eddin, Islam A. T. F.; Afifi, Mahmoud; Korashy, Mostafa; Ahmed, Ali H.; Ng, Yoke Cheng; Hernandez, Evelyng; Abdel-Latif, Salma M. (November 2017). "Can we see photosynthesis? Magnifying the tiny color changes of plant green leaves using Eulerian video magnification". Journal of Electronic Imaging. 26 (6): 060501. arXiv:1706.03867. Bibcode:2017JEI....26f0501T. doi:10.1117/1.jei.26.6.060501. ISSN 1017-9909. S2CID 12367169.

  244. "Mathematical Mathematics Memes". https://www.kaggle.com/abdelghanibelgaid/mathematical-mathematics-memes

  245. Karras, Tero; Laine, Samuli; Aila, Timo (June 2019). "A Style-Based Generator Architecture for Generative Adversarial Networks". 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 4396–4405. arXiv:1812.04948. doi:10.1109/cvpr.2019.00453. ISBN 978-1-7281-3293-8. S2CID 54482423.

  246. Oltean, Mihai (2017). "Fruits-360 dataset". GitHub. https://www.github.com/fruits-360