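1. Traditional image algorithm engineer:
Works mainly on image processing: morphology, image quality, the camera "3A" algorithms (auto exposure, auto focus, auto white balance), dehazing, color space conversion, filters, and so on. These roles are mostly found at security companies or in machine vision, including defect detection.
2. Modern image algorithm engineer:
Works on pattern recognition. Typical experience includes research on and application of AdaBoost and SVM, feature selection and extraction, and applications such as intelligent driving, pedestrian detection, and face recognition.
3. Image algorithm engineer in the AI era:
Works on deep learning, mostly at large internet companies or research institutes. Typical work is research on and application of open-source libraries such as TensorFlow, including robotics research and deep-learning-based face recognition.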
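CSDN also has many roundups (added 2016-12-21 23:49:53):
A big collection of papers and source code in computer vision and machine learning (continuously updated)
Image processing and computer vision: fundamentals, classics, and recent developments (a collection of classic papers)
Blogs of leading computer vision researchers and home pages of very strong research groups (repost)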
https://github.com/ty4z2008/Qix
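General Library
- OpenCV Needs no introduction.
- RAVL Recognition And Vision Library. Thread-safe. Powerful I/O mechanism. Includes AAM.
Image, Video IO
Augmented Reality
- ARToolKit Marker-based AR library.
- ARToolKitPlus An enhanced version of ARToolKit with a better pose estimation algorithm.
- PTAM Real-time tracking, SLAM, and AR library. Requires no markers, templates, or built-in sensors.
- BazAR AR library based on feature point detection and recognition.
Local Invariant Feature
- VLFeat Considered the best open-source SIFT implementation at the time of writing. Also includes KD-tree, KD-forest, and BoW implementations.
- VLFeat: well known and widely used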
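A well-known open-source computer vision and image processing project, probably not much less famous than OpenCV. It won first prize in the ACM Open Source Software Competition 2010. It is written in C and provides C and Matlab interfaces. It implements a large number of computer vision algorithms, including:
Common image processing functions, including color space conversion and geometric transformations (as a complement to Matlab); common machine learning algorithms, including GMM, SVM, and k-means; and common plotting tools for image processing.
Feature extraction, including covariant detectors, HOG, SIFT, MSER, and others. VLFeat provides the vl_covdet() function as a framework that unifies the so-called "co-variant feature detectors", including DoG, Harris-Affine, and Harris-Laplace, and can extract SIFT or raw-patch descriptors.
Superpixel segmentation, including the commonly used Quick shift and SLIC algorithms.
Advanced clustering algorithms, such as integer k-means (IKM), the hierarchical version of integer k-means (HIKM), and the Agglomerative Information Bottleneck (AIB) algorithm, which determines the number of clusters automatically based on mutual information.
High-dimensional feature matching with randomized kd-trees.
The full VLFeat feature list is available here: http://www.vlfeat.org/matlab/matlab.html
- Ferns Feature point recognition based on a Naive Bayesian Bundle. Fast, but uses a lot of memory.
- SIFT By Rob Hess A SIFT implementation based on OpenCV.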
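Object Detection
- AdaBoost By JianXin.Wu Yet another AdaBoost implementation. Fast training.
- Pedestrian detection By JianXin.Wu Fast pedestrian detection based on CENTRIST and linear SVM.
(Approximate) Nearest Neighbor/ANN
- FLANN The most complete open-source (approximate) nearest neighbor library at the time of writing. Besides a range of search algorithms, it includes a mechanism for automatically selecting the fastest one.
- ANN Another approximate nearest neighbor library.
SLAM & SFM
- SceneLib [LGPL] MonoSLAM library, developed by Andrew Davison.
Segmentation
- SLIC Super Pixel Uses Simple Linear Iterative Clustering to produce a specified number of approximately uniformly distributed superpixels.
Tracking
- TLD Object tracking algorithm based on an online random forest.
- KLT Kanade-Lucas-Tomasi tracker
- Online Boosting Trackers
Line Detection
- DSCC Line detection algorithm based on linking connected components.
- LSD [GPL] Gradient-based local line segment detector.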
Finger Print
- pHash [GPL] Perceptual hashing for multimedia files (extracts and compares fingerprints of images, video, and audio).
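To make the idea of a perceptual fingerprint concrete, here is a minimal sketch of an average hash in plain Python. It is not pHash's algorithm or API (pHash uses more robust hashes); it is a simpler stand-in technique. The function names `average_hash` and `hamming_distance` and the tiny 4x4 images are made up for illustration, and a real pipeline would first downscale each image to a small fixed size.

```python
def average_hash(pixels):
    """Return a bit-string fingerprint: 1 where a pixel is above the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)


def hamming_distance(a, b):
    """Number of positions at which two equal-length fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 4x4 grayscale images (0..255), assumed already downscaled.
img = [
    [200, 210, 30, 20],
    [190, 220, 25, 15],
    [40, 35, 180, 170],
    [30, 20, 175, 185],
]
brighter = [[min(255, p + 10) for p in row] for row in img]  # same scene, brighter
inverted = [[255 - p for p in row] for row in img]           # very different image

h_img = average_hash(img)
print(hamming_distance(h_img, average_hash(brighter)))  # small distance: similar
print(hamming_distance(h_img, average_hash(inverted)))  # large distance: different
```

Comparing fingerprints by Hamming distance rather than by equality is what lets such a hash tolerate small changes like brightness shifts.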
Visual Salience
- Global Contrast Based Salient Region Detection Ming-Ming Cheng's visual saliency algorithm.
FFT/DWT
- FFTW [GPL] Described here as the fastest and best open-source FFT.
- FFTReal [WTFPL] Lightweight FFT implementation. The license is the highlight.
Audio processing
- STK [Free] Audio processing and audio synthesis.
- libsndfile [LGPL] Audio file I/O.
- libsamplerate [GPL] Audio resampling.
Wavelet transform: Fast Wavelet Transform (FWT)
BRIEF: Binary Robust Independent Elementary Features. A very good local feature descriptor; the page includes a demo of feature point matching with FAST corner + BRIEF: http://cvlab.epfl.ch/software/brief/
http://code.google.com/p/javacv
Java wrappers for OpenCV, FFmpeg, libdc1394, PGR FlyCapture, OpenKinect, videoInput, and ARToolKitPlus. Can be used on Android.
libHIK, HIK SVM: a library for computing HIK SVM and CENTRIST. http://c2inet.sce.ntu.edu.sg/Jianxin/projects/libHIK/libHIK.htm
A set of links to visual saliency detection code: http://cg.cs.tsinghua.edu.cn/people/~cmm/saliency/
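Peter Kovesi's toolbox: lightweight and handy, focused on image processing http://www.peterkovesi.com/matlabfns/
Project site: http://www.csse.uwa.edu.au/~pk/research/matlabfns/
Peter currently works at The University of Western Australia. He wrote his own set of Matlab computer vision algorithms. The so-called toolbox is really just a collection of m-files, implemented entirely in Matlab, with no compilation or installation needed, and it supports Octave (so if you do not have Matlab, you can still do image processing in Octave with this toolbox). It may be a one-man effort, but the toolbox is quite well known: when you need some small Matlab computer vision function for your research, just download a few m-files from his home page into your own folder. The toolbox is mainly image processing algorithms, plus some basic 3D vision algorithms. Some of the included functionality:
Feature Detection via Phase Congruency: detecting image features through phase congruency
Spatial Feature Detection: feature algorithms such as Harris and Canny
Edge Linking and Line Segment Fitting: various operations on edge and line features
Image Denoising
Surface Normals to Surfaces: integrating a surface from normal vectors
Anisotropic diffusion: the well-known edge-preserving smoothing algorithm
Functions Supporting Projective Geometry: some algorithms for projective geometry and 3D vision
Feature Matching
Fingerprint Enhancement: fingerprint image enhancement
Interesting Synthetic Images: some fun image generation algorithms
Image Blending
MexOpenCV: calling OpenCV from Matlab
Project site: http://www.cs.sunysb.edu/~kyamagu/mexopencv/
The author, Kota Yamaguchi, is a PhD at Stony Brook University. Some time ago he built his own tooling to compile OpenCV code into mex interfaces usable from Matlab, and it quickly became popular. This summer the project was absorbed into OpenCV as a module, apparently through a Google Summer of Code (GSoC) project, and recently (around September or October) it was merged into the main OpenCV package. If you are interested, look under module/matlab in the OpenCV repository on GitHub; it should be officially released in the OpenCV 3 alpha in October. OpenCV now has both Python and Matlab bindings. I won't go into the specific features: since it is a binding for OpenCV, it naturally gives access to the vast majority of OpenCV's algorithms.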
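An introduction to n computer vision libraries / open-source face recognition libraries / software
OpenCV is Intel®'s open-source computer vision library. It consists of a set of C functions and a small number of C++ classes and implements many general-purpose algorithms in image processing and computer vision. OpenCV has a cross-platform mid- and high-level API of more than 300 C functions. It does not depend on other external libraries, although it can use some. OpenCV, for non-commercial…
faceservice.cgi is a CGI program for face recognition: you upload an image and the program reports the approximate coordinates of the face. faceservice is built on the OpenCV library.
OpenCVDotNet is a .NET wrapper for OpenCV.
jViolajones is a Java implementation of the Viola-Jones face detection algorithm and can load OpenCV XML files. Sample code: http://www.oschina.net/code/snippet_12_2033
JavaCV provides wrappers for libraries in the computer vision field, including OpenCV, ARToolKitPlus, libdc1394 2.x, PGR FlyCapture, and FFmpeg. The tool also makes it easy to use features of the Java platform. JavaCV also comes with hardware-accelerated full-screen image display (CanvasFrame) and easy execution of parallel code on multiple cores (…
QMotion is a motion detection program built with OpenCV and based on Qt.
OpenVSS (Open Platform Video Surveillance System) is a system-level video surveillance software video analytics framework (VAF) covering video analysis, retrieval, playback, recording, and indexing. It is designed with a plugin architecture that supports multi-camera platforms, multiple analyzer modules (with OpenCV integration), and multi-core architectures.
Hand gesture recognition, implemented with OpenCV
Provides face detection, recognition, and detection of specific faces. Sample code: cvReleaseImage( &gray ); cvReleaseMemStorage(&storage); cvReleaseHaarClassifierCascade(&cascade);…
Active Shape Model Library (ASMLibrary©) SDK, built with OpenCV, for face detection and tracking.
ECV is a computer vision library for Lua (currently Linux only).
OpenCVSharp is a .NET wrapper for OpenCV built against the latest OpenCV library. Its usage is closer to the original OpenCV than EmguCV's, and it comes with detailed usage examples.
An image processing and 3D vision library built on OpenCV. Sample code: ImageSequenceReaderFactory factory; ImageSequenceReader* reader = factory.pathRegex("c:/a/im_%03d.jpg", 0, 20); //ImageSequenceReader*
A Qt-based, object-oriented, cross-platform computer vision library. It makes it easy to build graphical applications; the algorithm library draws mainly on high-performance libraries such as OpenCV, GSL, CGAL, IPP, and Octave.
cvBlob is a library for finding connected components in binary images in computer vision applications. It performs connected-component analysis and feature extraction.
GShow is a real-time image/video processing filter development kit. It successfully integrates DirectX11 with DirectShow framework. So it has the following
VideoMan provides a set of video capture APIs. It supports multiple simultaneous video inputs (video cables, USB cameras, video files, and so on), can process the input with OpenGL, and integrates easily with OpenCV, CUDA, and others for building computer vision systems.
Pattern Recognition project: an open project dedicated to developing a function library covering image processing, computer vision, natural language processing, pattern recognition, machine learning, and related fields.
A Python wrapper for OpenCV. Main features: a Python interface very similar to the latest C++ interface in OpenCV 2.x, including C interfaces not covered by the C++ one; bindings for all major components of OpenCV 2.x: CxCORE (almost complete), CxFLANN (complete), Cv (complete),
A rapid development platform for computer vision that provides a test framework so developers can focus on algorithm research.
A wrapper for the v4l2 library. It captures image data from hardware such as webcams and supports raw YUYV output and BGR24 output as an OpenCV IplImage.
OpenVIDIA projects implement computer vision algorithms running on graphics hardware such as single or multiple graphics processing units(GPUs) using
Implements a point set registration algorithm based on Gaussian mixture models, described in the paper: A Robust Algorithm for Point Set Registration Using Mixture of Gaussians, Bing Jian and Baba C. Vemuri. Provides C++/Matlab/Python interfaces…
Recognition And Vision Library (RAVL) is a general-purpose C++ library with modules for computer vision, pattern recognition, and more.
LTI-Lib is an object-oriented library of common algorithms and data structures for image processing and computer vision, available as a VC version for Windows and a gcc version for Linux. It mainly covers: 1. linear algebra 2. cluster analysis 3. image processing 4. visualization and plotting tools
OpenCV optimization: opencv-dsp-acceleration
Optimizes the speed of the OpenCV library on DSPs.
C++ computer vision library: Integrating Vision Toolkit
Integrating Vision Toolkit (IVT) is a powerful and fast C++ computer vision library with an easy-to-use interface and an object-oriented architecture. It has its own set of cross-platform GUI components and can optionally integrate OpenCV.
The Epipolar Geometry Toolbox (EGT) is a toolbox designed for Matlab (by Mathworks Inc.). EGT provides a wide set of functions to approach computer vision
ImageNets is an extension to OpenCV that provides friendly support for robot vision algorithms; its interface is written with Nokia's Qt.
A rapid development library for video processing, computer vision, and computer graphics.
A computer vision package for Matlab with GUI components for viewing results. Development seems to have stopped, but it is quite good for learning.
SIP is an image processing and computer vision library for Scilab (a free Matlab-like programming environment). SIP can read and write JPEG/PNG/BMP images and offers image filtering, segmentation, edge detection, morphological processing, and shape analysis.
STAIR Vision Library (SVL) was originally designed to support the Stanford AI Robot and provides support for computer vision, machine learning, and probabilistic/statistical…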
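The connected-component analysis that cvBlob is described as performing in the list above can be sketched in a few lines of plain Python. This is only an illustration of the technique under my own assumptions (4-connectivity, breadth-first flood fill); it is not cvBlob's API, and the names `label_components` and `component_areas` are hypothetical.

```python
from collections import deque


def label_components(binary):
    """Label 4-connected foreground regions of a binary image.

    Returns (labels, count): labels holds 0 for background and 1..count
    for the components, numbered in raster-scan order of first contact.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not binary[y][x] or labels[y][x]:
                continue  # background, or already part of a component
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def component_areas(labels, count):
    """A simple per-blob feature: the pixel area of each component."""
    areas = [0] * (count + 1)
    for row in labels:
        for value in row:
            areas[value] += 1
    return areas[1:]  # drop the background count


image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
labels, count = label_components(image)
print(count, component_areas(labels, count))
```

Area is only one possible blob feature; bounding boxes or centroids can be accumulated in the same pass over `labels`.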
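Vision-related websites
Today's main task is to share some web pages I have collected that I consider of considerable research value:
Oxford: Andrew Zisserman, http://www.robots.ox.ac.uk/~vgg/hzbook/code/, who mainly works on multiple view geometry. The site provides some tools that are very practical, and there are examples too.
Peter Kovesi at The University of Western Australia: http://www.csse.uwa.edu.au/~pk/research/matlabfns/, which provides some basic Matlab tools, mainly covering Computer Vision and Image Processing.
CMU: http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/vision.html, my favorite site, especially this address: http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-groups.html, which lists the research of institutions and universities around the world across the areas of Computer Vision, including Image Processing and Machine Vision. I later reached many foreign sites through it.
Cambridge: http://mi.eng.cam.ac.uk/milab.html, the Machine Intelligence Laboratory at the University of Cambridge. It has three groups: Computer Vision & Robotics, Machine Intelligence, and Speech. So far, some of the Computer Vision & Robotics results look likely to help me the most in the future, hence the mention.
Lots of original-edition e-books on computer vision: http://homepages.inf.ed.ac.uk/rbf/CVonline/books.htm, where today I downloaded a book by Zisserman. These are original foreign books and all fairly old, but they are still very helpful for learning the fundamentals. For the current state of research you have to rely on papers or research group websites.
Stanford: http://ai.stanford.edu/~asaxena/reconstruction3d/, a site maintained by Andrew Ng and one of his PhD students from India. It is mainly about 3D reconstruction from a single photo. In particular, the page make3d.stanford.edu lets you upload your own photo and reconstructs a 3D model through the site. For someone like me who was just getting started with Computer Vision, this site was a treasure. It has one fatal problem, though: make3d no longer accepts registrations. I have emailed Andrew and his student many times with no reply so far, which is frustrating, because having an account there would be great. Maybe their mail server filtered my emails as spam. Still, the main contribution of this Stanford site is its code: many basic computer vision tools, about 40 MB it seems, all Matlab-based.
Caltech: http://www.vision.caltech.edu/bouguetj/calib_doc/, a link from our Computer Vision lecturer's slides. It is mainly a toolbox for camera calibration, and it also covers the preprocessing for 3D reconstruction from calibration images.
JP Tarel: http://perso.lcpc.fr/tarel.jean-philippe/, his personal home page. He is so far the only foreign researcher who has replied to my emails. The image set I need for reconstruction practice is his. I have read his paper, but it contains no code, and since it dates from before 1994, I cannot download many of the related references. After I kept asking, Professor Tarel only told me that I could reconstruct the football by following that paper. But many of the image processing references can no longer be downloaded; I only know that the image preprocessing follows that paper, with no idea of the actual procedure. I was lucky enough to find one paper from around 1990 on region-based segmentation, but none of its references could be found either. Such is life.
Open-source software site: www.sourceforge.net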
1. Feature Extraction:
- SIFT [1] [Demo program][SIFT Library] [VLFeat]
- PCA-SIFT [2] [Project]
- Affine-SIFT [3] [Project]
- SURF [4] [OpenSURF] [Matlab Wrapper]
- Affine Covariant Features [5] [Oxford project]
- MSER [6] [Oxford project] [VLFeat]
- Geometric Blur [7] [Code]
- Local Self-Similarity Descriptor [8] [Oxford implementation]
- Global and Efficient Self-Similarity [9] [Code]
- Histogram of Oriented Gradients [10] [INRIA Object Localization Toolkit] [OLT toolkit for Windows]
- GIST [11] [Project]
- Shape Context [12] [Project]
- Color Descriptor [13] [Project]
- Pyramids of Histograms of Oriented Gradients [Code]
- Space-Time Interest Points (STIP) [14][Project] [Code]
- Boundary Preserving Dense Local Regions [15][Project]
- Weighted Histogram[Code]
- Histogram-based Interest Points Detectors[Paper][Code]
- An OpenCV – C++ implementation of Local Self Similarity Descriptors [Project]
- Fast Sparse Representation with Prototypes[Project]
- Corner Detection [Project]
- AGAST Corner Detector: faster than FAST and even FAST-ER[Project]
- Real-time Facial Feature Detection using Conditional Regression Forests[Project]
- Global and Efficient Self-Similarity for Object Classification and Detection[code]
- WαSH: Weighted α-Shapes for Local Feature Detection[Project]
- HOG[Project]
- Online Selection of Discriminative Tracking Features[Project]
2. Image Segmentation:
- Normalized Cut [1] [Matlab code]
- Greg Mori's Superpixel code [2] [Matlab code]
- Efficient Graph-based Image Segmentation [3] [C++ code] [Matlab wrapper]
- Mean-Shift Image Segmentation [4] [EDISON C++ code] [Matlab wrapper]
- OWT-UCM Hierarchical Segmentation [5] [Resources]
- TurboPixels [6] [Matlab code 32bit] [Matlab code 64bit] [Updated code]
- Quick-Shift [7] [VLFeat]
- SLIC Superpixels [8] [Project]
- Segmentation by Minimum Code Length [9] [Project]
- Biased Normalized Cut [10] [Project]
- Segmentation Tree [11-12] [Project]
- Entropy Rate Superpixel Segmentation [13] [Code]
- Fast Approximate Energy Minimization via Graph Cuts[Paper][Code]
- Efficient Planar Graph Cuts with Applications in Computer Vision[Paper][Code]
- Isoperimetric Graph Partitioning for Image Segmentation[Paper][Code]
- Random Walks for Image Segmentation[Paper][Code]
- Blossom V: A new implementation of a minimum cost perfect matching algorithm[Code]
- An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Computer Vision[Paper][Code]
- Geodesic Star Convexity for Interactive Image Segmentation[Project]
- Contour Detection and Image Segmentation Resources[Project][Code]
- Biased Normalized Cuts[Project]
- Max-flow/min-cut[Project]
- Chan-Vese Segmentation using Level Set[Project]
- A Toolbox of Level Set Methods[Project]
- Re-initialization Free Level Set Evolution via Reaction Diffusion[Project]
- Improved C-V active contour model[Paper][Code]
- A Variational Multiphase Level Set Approach to Simultaneous Segmentation and Bias Correction[Paper][Code]
- Level Set Method Research by Chunming Li[Project]
- ClassCut for Unsupervised Class Segmentation[code]
- SEEDS: Superpixels Extracted via Energy-Driven Sampling [Project][other]
3. Object Detection:
- A simple object detector with boosting [Project]
- INRIA Object Detection and Localization Toolkit [1] [Project]
- Discriminatively Trained Deformable Part Models [2] [Project]
- Cascade Object Detection with Deformable Part Models [3] [Project]
- Poselet [4] [Project]
- Implicit Shape Model [5] [Project]
- Viola and Jones’s Face Detection [6] [Project]
- Bayesian Modelling of Dynamic Scenes for Object Detection[Paper][Code]
- Hand detection using multiple proposals[Project]
- Color Constancy, Intrinsic Images, and Shape Estimation[Paper][Code]
- Discriminatively trained deformable part models[Project]
- Gradient Response Maps for Real-Time Detection of Texture-Less Objects: LineMOD [Project]
- Image Processing On Line[Project]
- Robust Optical Flow Estimation[Project]
- Where’s Waldo: Matching People in Images of Crowds[Project]
- Scalable Multi-class Object Detection[Project]
- Class-Specific Hough Forests for Object Detection[Project]
- Deformed Lattice Detection In Real-World Images[Project]
4. Saliency Detection:
- Itti, Koch, and Niebur's saliency detection [1] [Matlab code]
- Frequency-tuned salient region detection [2] [Project]
- Saliency detection using maximum symmetric surround [3] [Project]
- Attention via Information Maximization [4] [Matlab code]
- Context-aware saliency detection [5] [Matlab code]
- Graph-based visual saliency [6] [Matlab code]
- Saliency detection: A spectral residual approach. [7] [Matlab code]
- Segmenting salient objects from images and videos. [8] [Matlab code]
- Saliency Using Natural statistics. [9] [Matlab code]
- Discriminant Saliency for Visual Recognition from Cluttered Scenes. [10] [Code]
- Learning to Predict Where Humans Look [11] [Project]
- Global Contrast based Salient Region Detection [12] [Project]
- Bayesian Saliency via Low and Mid Level Cues[Project]
- Top-Down Visual Saliency via Joint CRF and Dictionary Learning[Paper][Code]
- Saliency Detection: A Spectral Residual Approach[Code]
5. Image Classification, Clustering
- Pyramid Match [1] [Project]
- Spatial Pyramid Matching [2] [Code]
- Locality-constrained Linear Coding [3] [Project] [Matlab code]
- Sparse Coding [4] [Project] [Matlab code]
- Texture Classification [5] [Project]
- Multiple Kernels for Image Classification [6] [Project]
- Feature Combination [7] [Project]
- SuperParsing [Code]
- Large Scale Correlation Clustering Optimization[Matlab code]
- Detecting and Sketching the Common[Project]
- Self-Tuning Spectral Clustering[Project][Code]
- User Assisted Separation of Reflections from a Single Image Using a Sparsity Prior[Paper][Code]
- Filters for Texture Classification[Project]
- Multiple Kernel Learning for Image Classification[Project]
- SLIC Superpixels[Project]
6. Image Matting
- A Closed Form Solution to Natural Image Matting [Code]
- Spectral Matting [Project]
- Learning-based Matting [Code]
7. Object Tracking:
- A Forest of Sensors – Tracking Adaptive Background Mixture Models [Project]
- Object Tracking via Partial Least Squares Analysis[Paper][Code]
- Robust Object Tracking with Online Multiple Instance Learning[Paper][Code]
- Online Visual Tracking with Histograms and Articulating Blocks[Project]
- Incremental Learning for Robust Visual Tracking[Project]
- Real-time Compressive Tracking[Project]
- Robust Object Tracking via Sparsity-based Collaborative Model[Project]
- Visual Tracking via Adaptive Structural Local Sparse Appearance Model[Project]
- Online Discriminative Object Tracking with Local Sparse Representation[Paper][Code]
- Superpixel Tracking[Project]
- Learning Hierarchical Image Representation with Sparsity, Saliency and Locality[Paper][Code]
- Online Multiple Support Instance Tracking [Paper][Code]
- Visual Tracking with Online Multiple Instance Learning[Project]
- Object detection and recognition[Project]
- Compressive Sensing Resources[Project]
- Robust Real-Time Visual Tracking using Pixel-Wise Posteriors[Project]
- Tracking-Learning-Detection[Project][OpenTLD/C++ Code]
- the HandVu:vision-based hand gesture interface[Project]
- Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities[Project]
8. Kinect:
9. 3D:
- 3D Reconstruction of a Moving Object[Paper] [Code]
- Shape From Shading Using Linear Approximation[Code]
- Combining Shape from Shading and Stereo Depth Maps[Project][Code]
- Shape from Shading: A Survey[Paper][Code]
- A Spatio-Temporal Descriptor based on 3D Gradients (HOG3D)[Project][Code]
- Multi-camera Scene Reconstruction via Graph Cuts[Paper][Code]
- A Fast Marching Formulation of Perspective Shape from Shading under Frontal Illumination[Paper][Code]
- Reconstruction:3D Shape, Illumination, Shading, Reflectance, Texture[Project]
- Monocular Tracking of 3D Human Motion with a Coordinated Mixture of Factor Analyzers[Code]
- Learning 3-D Scene Structure from a Single Still Image[Project]
10. Machine Learning Algorithms:
- Matlab class for computing Approximate Nearest Neighbor (ANN) [Matlab class providing interface to ANN library]
- Random Sampling[code]
- Probabilistic Latent Semantic Analysis (pLSA)[Code]
- FASTANN and FASTCLUSTER for approximate k-means (AKM)[Project]
- Fast Intersection / Additive Kernel SVMs[Project]
- SVM[Code]
- Ensemble learning[Project]
- Deep Learning[Net]
- Deep Learning Methods for Vision[Project]
- Neural Network for Recognition of Handwritten Digits[Project]
- Training a deep autoencoder or a classifier on MNIST digits[Project]
- THE MNIST DATABASE of handwritten digits[Project]
- Ersatz:deep neural networks in the cloud[Project]
- Deep Learning [Project]
- sparseLM : Sparse Levenberg-Marquardt nonlinear least squares in C/C++[Project]
- Weka 3: Data Mining Software in Java[Project]
- Invited talk “A Tutorial on Deep Learning” by Dr. Kai Yu (余凯)[Video]
- CNN – Convolutional neural network class[Matlab Tool]
- Yann LeCun's Publications[Website]
- LeNet-5, convolutional neural networks[Project]
- Home page of deep learning pioneer Geoffrey E. Hinton[Website]
- Multiple Instance Logistic Discriminant-based Metric Learning (MildML) and Logistic Discriminant-based Metric Learning (LDML)[Code]
- Sparse coding simulation software[Project]
- Visual Recognition and Machine Learning Summer School[Software]
11. Object, Action Recognition:
- Action Recognition by Dense Trajectories[Project][Code]
- Action Recognition Using a Distributed Representation of Pose and Appearance[Project]
- Recognition Using Regions[Paper][Code]
- 2D Articulated Human Pose Estimation[Project]
- Fast Human Pose Estimation Using Appearance and Motion via Multi-Dimensional Boosting Regression[Paper][Code]
- Estimating Human Pose from Occluded Images[Paper][Code]
- Quasi-dense wide baseline matching[Project]
- ChaLearn Gesture Challenge: Principal motion: PCA-based reconstruction of motion histograms[Project]
- Real Time Head Pose Estimation with Random Regression Forests[Project]
- 2D Action Recognition Serves 3D Human Pose Estimation[Project]
- A Hough Transform-Based Voting Framework for Action Recognition[Project]
- Motion Interchange Patterns for Action Recognition in Unconstrained Videos[Project]
- 2D articulated human pose estimation software[Project]
- Learning and detecting shape models [code]
- Progressive Search Space Reduction for Human Pose Estimation[Project]
- Learning Non-Rigid 3D Shape from 2D Motion[Project]
12. Image Processing:
- Distance Transforms of Sampled Functions[Project]
- The Computer Vision Homepage[Project]
- Efficient appearance distances between windows[code]
- Image Exploration algorithm[code]
- Motion Magnification [Project]
- Bilateral Filtering for Gray and Color Images [Project]
- A Fast Approximation of the Bilateral Filter using a Signal Processing Approach [Project]
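For readers who have not met the bilateral filter behind the last two entries, here is a brute-force sketch in plain Python. It is a minimal illustration on a grayscale image stored as a list of lists, with Gaussian weights for both the spatial and the intensity (range) terms; the function name `bilateral_filter`, its parameter names, and the default values are my own choices, not taken from either linked project, and it makes no attempt at the fast approximation.

```python
import math


def bilateral_filter(image, radius=1, sigma_space=1.0, sigma_range=25.0):
    """Brute-force bilateral filter for a 2D grayscale image (list of lists)."""
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            total = 0.0
            weight_sum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # ignore neighbors outside the image
                    value = image[ny][nx]
                    # Spatial weight: nearby pixels count more.
                    w_space = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2))
                    # Range weight: pixels with similar intensity count more.
                    w_range = math.exp(-((value - center) ** 2) / (2 * sigma_range ** 2))
                    total += w_space * w_range * value
                    weight_sum += w_space * w_range
            # weight_sum is at least 1 because the center pixel has weight 1.
            output[y][x] = total / weight_sum
    return output


# A noisy step edge: left half around 10, right half around 200.
noisy = [
    [12, 8, 11, 198, 203, 201],
    [9, 13, 10, 202, 197, 199],
    [11, 10, 9, 200, 204, 198],
]
smoothed = bilateral_filter(noisy)
print([round(v, 1) for v in smoothed[1]])
```

The range weight is what keeps the edge sharp: pixels on the other side of the step differ so much in intensity that their weight is close to zero, so each side is averaged only with itself.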
13. Some Useful Tools:
- EGT: a Toolbox for Multiple View Geometry and Visual Servoing[Project] [Code]
- a development kit of matlab mex functions for OpenCV library[Project]
- Fast Artificial Neural Network Library[Project]
14. Hand and Fingertip Detection and Recognition:
- finger-detection-and-gesture-recognition [Code]
- Hand and Finger Detection using JavaCV[Project]
- Hand and fingers detection[Code]
15. Scene Interpretation:
- Nonparametric Scene Parsing via Label Transfer [Project]
16. Optical Flow:
- High accuracy optical flow using a theory for warping [Project]
- Dense Trajectories Video Description [Project]
- SIFT Flow: Dense Correspondence across Scenes and its Applications[Project]
- KLT: An Implementation of the Kanade-Lucas-Tomasi Feature Tracker [Project]
- Tracking Cars Using Optical Flow[Project]
- Secrets of optical flow estimation and their principles[Project]
- Implementation of the Black and Anandan dense optical flow method[Project]
- Optical Flow Computation[Project]
- Beyond Pixels: Exploring New Representations and Applications for Motion Analysis[Project]
- A Database and Evaluation Methodology for Optical Flow[Project]
- optical flow relative[Project]
- Robust Optical Flow Estimation [Project]
- optical flow[Project]
17. Image Retrieval:
18. Markov Random Fields:
- Markov Random Fields for Super-Resolution [Project]
- A Comparative Study of Energy Minimization Methods for Markov Random Fields with Smoothness-Based Priors [Project]
19. Motion Detection:
- Moving Object Extraction, Using Models or Analysis of Regions [Project]
- Background Subtraction: Experiments and Improvements for ViBe [Project]
- A Self-Organizing Approach to Background Subtraction for Visual Surveillance Applications [Project]
- changedetection.net: A new change detection benchmark dataset[Project]
- ViBe – a powerful technique for background detection and subtraction in video sequences[Project]
- Background Subtraction Program[Project]
- Motion Detection Algorithms[Project]
- Stuttgart Artificial Background Subtraction Dataset[Project]
- Object Detection, Motion Estimation, and Tracking[Project]
Feature Detection and Description
General Libraries:
- VLFeat – Implementation of various feature descriptors (including SIFT, HOG, and LBP) and covariant feature detectors (including DoG, Hessian, Harris Laplace, Hessian Laplace, Multiscale Hessian, Multiscale Harris). Easy-to-use Matlab interface. See Modern features: Software – Slides providing a demonstration of VLFeat and also links to other software. Check also VLFeat hands-on session training
- OpenCV – Various implementations of modern feature detectors and descriptors (SIFT, SURF, FAST, BRIEF, ORB, FREAK, etc.)
Fast Keypoint Detectors for Real-time Applications:
- FAST – High-speed corner detector implementation for a wide variety of platforms
- AGAST – Even faster than the FAST corner detector. A multi-scale version of this method is used for the BRISK descriptor (ECCV 2010).
Binary Descriptors for Real-Time Applications:
- BRIEF – C++ code for a fast and accurate interest point descriptor (not invariant to rotations and scale) (ECCV 2010)
- ORB – OpenCV implementation of the Oriented FAST and Rotated BRIEF (ORB) descriptor (invariant to rotations, but not scale)
- BRISK – Efficient Binary descriptor invariant to rotations and scale. It includes a Matlab mex interface. (ICCV 2011)
- FREAK – Faster than BRISK (invariant to rotations and scale) (CVPR 2012)
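Binary descriptors like the ones above are popular in real-time systems largely because they can be compared with the Hamming distance (an XOR followed by a bit count). The sketch below shows brute-force matching under that distance in plain Python. The 16-bit descriptors are made-up toy values stored as integers, and `hamming`, `match_descriptors`, and the `max_distance` threshold are hypothetical names chosen for this example, not part of any library listed here.

```python
def hamming(a, b):
    """Hamming distance between two descriptors stored as Python ints."""
    return bin(a ^ b).count("1")


def match_descriptors(query, train, max_distance=2):
    """Brute-force nearest-neighbor matching under Hamming distance.

    Returns a list of (query_index, train_index, distance) for every query
    descriptor whose best match is within max_distance bits.
    """
    matches = []
    for qi, q in enumerate(query):
        best_index, best_dist = min(
            ((ti, hamming(q, t)) for ti, t in enumerate(train)),
            key=lambda pair: pair[1],
        )
        if best_dist <= max_distance:
            matches.append((qi, best_index, best_dist))
    return matches


# Toy 16-bit descriptors; real binary descriptors are much longer.
train = [0b1010101010101010, 0b1111000011110000, 0b0000111100001111]
query = [
    0b1010101010101011,  # train[0] with one flipped bit
    0b1111000011110000,  # identical to train[1]
    0b0101010101010101,  # complement of train[0]; far from everything
]
print(match_descriptors(query, train))
```

The distance threshold rejects the third query descriptor, which has no close counterpart; practical matchers often add a ratio test or cross-check on top of this.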
SIFT and SURF Implementations:
- SIFT: VLFeat, OpenCV, Original code by David Lowe, GPU implementation, OpenSIFT
- SURF: Herbert Bay’s code, OpenCV, GPU-SURF
Other Local Feature Detectors and Descriptors:
- VGG Affine Covariant features – Oxford code for various affine covariant feature detectors and descriptors.
- LIOP descriptor – Source code for the Local Intensity order Pattern (LIOP) descriptor (ICCV 2011).
- Local Symmetry Features – Source code for matching of local symmetry features under large variations in lighting, age, and rendering style (CVPR 2012).
Global Image Descriptors:
- GIST – Matlab code for the GIST descriptor
- CENTRIST – Global visual descriptor for scene categorization and object detection (PAMI 2011)
Feature Coding and Pooling
- VGG Feature Encoding Toolkit – Source code for various state-of-the-art feature encoding methods – including Standard hard encoding, Kernel codebook encoding, Locality-constrained linear encoding, and Fisher kernel encoding.
- Spatial Pyramid Matching – Source code for feature pooling based on spatial pyramid matching (widely used for image classification)
Convolutional Nets and Deep Learning
- EBLearn – C++ Library for Energy-Based Learning. It includes several demos and step-by-step instructions to train classifiers based on convolutional neural networks.
- Torch7 – Provides a matlab-like environment for state-of-the-art machine learning algorithms, including a fast implementation of convolutional neural networks.
- Deep Learning – Various links for deep learning software.
Part-Based Models
- Deformable Part-based Detector – Library provided by the authors of the original paper (state-of-the-art in PASCAL VOC detection task)
- Efficient Deformable Part-Based Detector – Branch-and-Bound implementation for a deformable part-based detector.
- Accelerated Deformable Part Model – Efficient implementation of a method that achieves the exact same performance of deformable part-based detectors but with significant acceleration (ECCV 2012).
- Coarse-to-Fine Deformable Part Model – Fast approach for deformable object detection (CVPR 2011).
- Poselets – C++ and Matlab versions for object detection based on poselets.
- Part-based Face Detector and Pose Estimation – Implementation of a unified approach for face detection, pose estimation, and landmark localization (CVPR 2012).
Attributes and Semantic Features
- Relative Attributes – Modified implementation of RankSVM to train Relative Attributes (ICCV 2011).
- Object Bank – Implementation of object bank semantic features (NIPS 2010). See also ActionBank
- Classemes, Picodes, and Meta-class features – Software for extracting high-level image descriptors (ECCV 2010, NIPS 2011, CVPR 2012).
Large-Scale Learning
- Additive Kernels – Source code for fast additive kernel SVM classifiers (PAMI 2013).
- LIBLINEAR – Library for large-scale linear SVM classification.
- VLFeat – Implementation for Pegasos SVM and Homogeneous Kernel map.
Fast Indexing and Image Retrieval
- FLANN – Library for performing fast approximate nearest neighbor.
- Kernelized LSH – Source code for Kernelized Locality-Sensitive Hashing (ICCV 2009).
- ITQ Binary codes – Code for generation of small binary codes using Iterative Quantization and other baselines such as Locality-Sensitive-Hashing (CVPR 2011).
- INRIA Image Retrieval – Efficient code for state-of-the-art large-scale image retrieval (CVPR 2011).
Object Detection
- See Part-based Models and Convolutional Nets above.
- Pedestrian Detection at 100fps – Very fast and accurate pedestrian detector (CVPR 2012).
- Caltech Pedestrian Detection Benchmark – Excellent resource for pedestrian detection, with various links for state-of-the-art implementations.
- OpenCV – Enhanced implementation of Viola&Jones real-time object detector, with trained models for face detection.
- Efficient Subwindow Search – Source code for branch-and-bound optimization for efficient object localization (CVPR 2008).
3D Recognition
- Point-Cloud Library – Library for 3D image and point cloud processing.
Action Recognition
- ActionBank – Source code for action recognition based on the ActionBank representation (CVPR 2012).
- STIP Features – software for computing space-time interest point descriptors
- Independent Subspace Analysis – Look for Stacked ISA for Videos (CVPR 2011)
- Velocity Histories of Tracked Keypoints – C++ code for activity recognition using the velocity histories of tracked keypoints (ICCV 2009)
Datasets
Attributes
- Animals with Attributes – 30,475 images of 50 animal classes with 6 pre-extracted feature representations for each image.
- aYahoo and aPascal – Attribute annotations for images collected from Yahoo and Pascal VOC 2008.
- FaceTracer – 15,000 faces annotated with 10 attributes and fiducial points.
- PubFig – 58,797 face images of 200 people with 73 attribute classifier outputs.
- LFW – 13,233 face images of 5,749 people with 73 attribute classifier outputs.
- Human Attributes – 8,000 people with annotated attributes. Check also this link for another dataset of human attributes.
- SUN Attribute Database – Large-scale scene attribute database with a taxonomy of 102 attributes.
- ImageNet Attributes – Variety of attribute labels for the ImageNet dataset.
- Relative attributes – Data for OSR and a subset of PubFig datasets. Check also this link for the WhittleSearch data.
- Attribute Discovery Dataset – Images of shopping categories associated with textual descriptions.
Fine-grained Visual Categorization
- Caltech-UCSD Birds Dataset – Hundreds of bird categories with annotated parts and attributes.
- Stanford Dogs Dataset – 20,000 images of 120 breeds of dogs from around the world.
- Oxford-IIIT Pet Dataset – 37 category pet dataset with roughly 200 images for each class. Pixel level trimap segmentation is included.
- Leeds Butterfly Dataset – 832 images of 10 species of butterflies.
- Oxford Flower Dataset – Hundreds of flower categories.
Face Detection
- FDDB – UMass face detection dataset and benchmark (5,000+ faces)
- CMU/MIT – Classical face detection dataset.
Face Recognition
- Face Recognition Homepage – Large collection of face recognition datasets.
- LFW – UMass unconstrained face recognition dataset (13,000+ face images).
- NIST Face Homepage – includes face recognition grand challenge (FRGC), vendor tests (FRVT) and others.
- CMU Multi-PIE – contains more than 750,000 images of 337 people, with 15 different views and 19 lighting conditions.
- FERET – Classical face recognition dataset.
- Deng Cai's face dataset in Matlab Format – Easy to use if you want to play with simple face datasets including Yale, ORL, PIE, and Extended Yale B.
- SCFace – Low-resolution face dataset captured from surveillance cameras.
Handwritten Digits
- MNIST – large dataset containing a training set of 60,000 examples, and a test set of 10,000 examples.
Pedestrian Detection
- Caltech Pedestrian Detection Benchmark – 10 hours of video taken from a vehicle, 350K bounding boxes for about 2.3K unique pedestrians.
- INRIA Person Dataset – Currently one of the most popular pedestrian detection datasets.
- ETH Pedestrian Dataset – Urban dataset captured from a stereo rig mounted on a stroller.
- TUD-Brussels Pedestrian Dataset – Dataset with image pairs recorded in a crowded urban setting with an onboard camera.
- PASCAL Human Detection – One of 20 categories in PASCAL VOC detection challenges.
- USC Pedestrian Dataset – Small dataset captured from surveillance cameras.
Generic Object Recognition
- ImageNet – Currently the largest visual recognition dataset in terms of number of categories and images.
- Tiny Images – 80 million 32×32 low resolution images.
- Pascal VOC – One of the most influential visual recognition datasets.
- Caltech 101 / Caltech 256 – Popular image datasets containing 101 and 256 object categories, respectively.
- MIT LabelMe – Online annotation tool for building computer vision databases.
Scene Recognition
- MIT SUN Dataset – MIT scene understanding dataset.
- UIUC Fifteen Scene Categories – Dataset of 15 natural scene categories.
Feature Detection and Description
- VGG Affine Dataset – Widely used dataset for measuring performance of feature detection and description. Check VLBenchmarks for an evaluation framework.
Action Recognition
- Benchmarking Activity Recognition – CVPR 2012 tutorial covering various datasets for action recognition.
RGBD Recognition
- RGB-D Object Dataset – Dataset containing 300 common household objects
,
一、特征提取Feature Extraction:
- SIFT [1] [Demo program][SIFT Library] [VLFeat]
- PCA-SIFT [2] [Project]
- Affine-SIFT [3] [Project]
- SURF [4] [OpenSURF] [Matlab Wrapper]
- Affine Covariant Features [5] [Oxford project]
- MSER [6] [Oxford project] [VLFeat]
- Geometric Blur [7] [Code]
- Local Self-Similarity Descriptor [8] [Oxford implementation]
- Global and Efficient Self-Similarity [9] [Code]
- Histogram of Oriented Gradients [10] [INRIA Object Localization Toolkit] [OLT toolkit for Windows]
- GIST [11] [Project]
- Shape Context [12] [Project]
- Color Descriptor [13] [Project]
- Pyramids of Histograms of Oriented Gradients [Code]
- Space-Time Interest Points (STIP) [14][Project] [Code]
- Boundary Preserving Dense Local Regions [15][Project]
- Weighted Histogram[Code]
- Histogram-based Interest Points Detectors[Paper][Code]
- An OpenCV – C++ implementation of Local Self Similarity Descriptors [Project]
- Fast Sparse Representation with Prototypes[Project]
- Corner Detection [Project]
- AGAST Corner Detector: faster than FAST and even FAST-ER[Project]
- Real-time Facial Feature Detection using Conditional Regression Forests[Project]
- Global and Efficient Self-Similarity for Object Classification and Detection[code]
- WαSH: Weighted α-Shapes for Local Feature Detection[Project]
- HOG[Project]
- Online Selection of Discriminative Tracking Features[Project]
2. Image Segmentation:
- Normalized Cut [1] [Matlab code]
- Greg Mori’s Superpixel code [2] [Matlab code]
- Efficient Graph-based Image Segmentation [3] [C++ code] [Matlab wrapper]
- Mean-Shift Image Segmentation [4] [EDISON C++ code] [Matlab wrapper]
- OWT-UCM Hierarchical Segmentation [5] [Resources]
- TurboPixels [6] [Matlab code 32bit] [Matlab code 64bit] [Updated code]
- Quick-Shift [7] [VLFeat]
- SLIC Superpixels [8] [Project]
- Segmentation by Minimum Code Length [9] [Project]
- Biased Normalized Cut [10] [Project]
- Segmentation Tree [11-12] [Project]
- Entropy Rate Superpixel Segmentation [13] [Code]
- Fast Approximate Energy Minimization via Graph Cuts[Paper][Code]
- Efficient Planar Graph Cuts with Applications in Computer Vision[Paper][Code]
- Isoperimetric Graph Partitioning for Image Segmentation[Paper][Code]
- Random Walks for Image Segmentation[Paper][Code]
- Blossom V: A new implementation of a minimum cost perfect matching algorithm[Code]
- An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Computer Vision[Paper][Code]
- Geodesic Star Convexity for Interactive Image Segmentation[Project]
- Contour Detection and Image Segmentation Resources[Project][Code]
- Biased Normalized Cuts[Project]
- Max-flow/min-cut[Project]
- Chan-Vese Segmentation using Level Set[Project]
- A Toolbox of Level Set Methods[Project]
- Re-initialization Free Level Set Evolution via Reaction Diffusion[Project]
- Improved C-V active contour model[Paper][Code]
- A Variational Multiphase Level Set Approach to Simultaneous Segmentation and Bias Correction[Paper][Code]
- Level Set Method Research by Chunming Li[Project]
- ClassCut for Unsupervised Class Segmentation[code]
- SEEDS: Superpixels Extracted via Energy-Driven Sampling [Project][other]
3. Object Detection:
- A simple object detector with boosting [Project]
- INRIA Object Detection and Localization Toolkit [1] [Project]
- Discriminatively Trained Deformable Part Models [2] [Project]
- Cascade Object Detection with Deformable Part Models [3] [Project]
- Poselet [4] [Project]
- Implicit Shape Model [5] [Project]
- Viola and Jones’s Face Detection [6] [Project]
- Bayesian Modelling of Dynamic Scenes for Object Detection[Paper][Code]
- Hand detection using multiple proposals[Project]
- Color Constancy, Intrinsic Images, and Shape Estimation[Paper][Code]
- Discriminatively trained deformable part models[Project]
- Gradient Response Maps for Real-Time Detection of Texture-Less Objects: LineMOD [Project]
- Image Processing On Line[Project]
- Robust Optical Flow Estimation[Project]
- Where’s Waldo: Matching People in Images of Crowds[Project]
- Scalable Multi-class Object Detection[Project]
- Class-Specific Hough Forests for Object Detection[Project]
- Deformed Lattice Detection In Real-World Images[Project]
4. Saliency Detection:
- Itti, Koch, and Niebur’s saliency detection [1] [Matlab code]
- Frequency-tuned salient region detection [2] [Project]
- Saliency detection using maximum symmetric surround [3] [Project]
- Attention via Information Maximization [4] [Matlab code]
- Context-aware saliency detection [5] [Matlab code]
- Graph-based visual saliency [6] [Matlab code]
- Saliency detection: A spectral residual approach. [7] [Matlab code]
- Segmenting salient objects from images and videos. [8] [Matlab code]
- Saliency Using Natural statistics. [9] [Matlab code]
- Discriminant Saliency for Visual Recognition from Cluttered Scenes. [10] [Code]
- Learning to Predict Where Humans Look [11] [Project]
- Global Contrast based Salient Region Detection [12] [Project]
- Bayesian Saliency via Low and Mid Level Cues[Project]
- Top-Down Visual Saliency via Joint CRF and Dictionary Learning[Paper][Code]
- Saliency Detection: A Spectral Residual Approach[Code]
5. Image Classification, Clustering
- Pyramid Match [1] [Project]
- Spatial Pyramid Matching [2] [Code]
- Locality-constrained Linear Coding [3] [Project] [Matlab code]
- Sparse Coding [4] [Project] [Matlab code]
- Texture Classification [5] [Project]
- Multiple Kernels for Image Classification [6] [Project]
- Feature Combination [7] [Project]
- SuperParsing [Code]
- Large Scale Correlation Clustering Optimization[Matlab code]
- Detecting and Sketching the Common[Project]
- Self-Tuning Spectral Clustering[Project][Code]
- User Assisted Separation of Reflections from a Single Image Using a Sparsity Prior[Paper][Code]
- Filters for Texture Classification[Project]
- Multiple Kernel Learning for Image Classification[Project]
- SLIC Superpixels[Project]
6. Image Matting
- A Closed Form Solution to Natural Image Matting [Code]
- Spectral Matting [Project]
- Learning-based Matting [Code]
7. Object Tracking:
- A Forest of Sensors – Tracking Adaptive Background Mixture Models [Project]
- Object Tracking via Partial Least Squares Analysis[Paper][Code]
- Robust Object Tracking with Online Multiple Instance Learning[Paper][Code]
- Online Visual Tracking with Histograms and Articulating Blocks[Project]
- Incremental Learning for Robust Visual Tracking[Project]
- Real-time Compressive Tracking[Project]
- Robust Object Tracking via Sparsity-based Collaborative Model[Project]
- Visual Tracking via Adaptive Structural Local Sparse Appearance Model[Project]
- Online Discriminative Object Tracking with Local Sparse Representation[Paper][Code]
- Superpixel Tracking[Project]
- Learning Hierarchical Image Representation with Sparsity, Saliency and Locality[Paper][Code]
- Online Multiple Support Instance Tracking [Paper][Code]
- Visual Tracking with Online Multiple Instance Learning[Project]
- Object detection and recognition[Project]
- Compressive Sensing Resources[Project]
- Robust Real-Time Visual Tracking using Pixel-Wise Posteriors[Project]
- Tracking-Learning-Detection[Project][OpenTLD/C++ Code]
- HandVu: vision-based hand gesture interface[Project]
- Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities[Project]
8. Kinect:
9. 3D-Related:
- 3D Reconstruction of a Moving Object[Paper] [Code]
- Shape From Shading Using Linear Approximation[Code]
- Combining Shape from Shading and Stereo Depth Maps[Project][Code]
- Shape from Shading: A Survey[Paper][Code]
- A Spatio-Temporal Descriptor based on 3D Gradients (HOG3D)[Project][Code]
- Multi-camera Scene Reconstruction via Graph Cuts[Paper][Code]
- A Fast Marching Formulation of Perspective Shape from Shading under Frontal Illumination[Paper][Code]
- Reconstruction: 3D Shape, Illumination, Shading, Reflectance, Texture[Project]
- Monocular Tracking of 3D Human Motion with a Coordinated Mixture of Factor Analyzers[Code]
- Learning 3-D Scene Structure from a Single Still Image[Project]
10. Machine Learning Algorithms:
- Matlab class for computing Approximate Nearest Neighbor (ANN) [Matlab class providing interface to ANN library]
- Random Sampling[code]
- Probabilistic Latent Semantic Analysis (pLSA)[Code]
- FASTANN and FASTCLUSTER for approximate k-means (AKM)[Project]
- Fast Intersection / Additive Kernel SVMs[Project]
- SVM[Code]
- Ensemble learning[Project]
- Deep Learning[Net]
- Deep Learning Methods for Vision[Project]
- Neural Network for Recognition of Handwritten Digits[Project]
- Training a deep autoencoder or a classifier on MNIST digits[Project]
- THE MNIST DATABASE of handwritten digits[Project]
- Ersatz: deep neural networks in the cloud[Project]
- Deep Learning [Project]
- sparseLM : Sparse Levenberg-Marquardt nonlinear least squares in C/C++[Project]
- Weka 3: Data Mining Software in Java[Project]
- Invited talk “A Tutorial on Deep Learning” by Dr. Kai Yu (余凯)[Video]
- CNN – Convolutional neural network class[Matlab Tool]
- Yann LeCun’s Publications[Website]
- LeNet-5, convolutional neural networks[Project]
- Homepage of Geoffrey E. Hinton, a leading deep learning researcher[Website]
- Multiple Instance Logistic Discriminant-based Metric Learning (MildML) and Logistic Discriminant-based Metric Learning (LDML)[Code]
- Sparse coding simulation software[Project]
- Visual Recognition and Machine Learning Summer School[Software]
11. Object and Action Recognition:
- Action Recognition by Dense Trajectories[Project][Code]
- Action Recognition Using a Distributed Representation of Pose and Appearance[Project]
- Recognition Using Regions[Paper][Code]
- 2D Articulated Human Pose Estimation[Project]
- Fast Human Pose Estimation Using Appearance and Motion via Multi-Dimensional Boosting Regression[Paper][Code]
- Estimating Human Pose from Occluded Images[Paper][Code]
- Quasi-dense wide baseline matching[Project]
- ChaLearn Gesture Challenge: Principal motion: PCA-based reconstruction of motion histograms[Project]
- Real Time Head Pose Estimation with Random Regression Forests[Project]
- 2D Action Recognition Serves 3D Human Pose Estimation[Project]
- A Hough Transform-Based Voting Framework for Action Recognition[Project]
- Motion Interchange Patterns for Action Recognition in Unconstrained Videos[Project]
- 2D articulated human pose estimation software[Project]
- Learning and detecting shape models [code]
- Progressive Search Space Reduction for Human Pose Estimation[Project]
- Learning Non-Rigid 3D Shape from 2D Motion[Project]
12. Image Processing:
- Distance Transforms of Sampled Functions[Project]
- The Computer Vision Homepage[Project]
- Efficient appearance distances between windows[code]
- Image Exploration algorithm[code]
- Motion Magnification [Project]
- Bilateral Filtering for Gray and Color Images [Project]
- A Fast Approximation of the Bilateral Filter using a Signal Processing Approach [Project]
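To make the bilateral filtering entries above a little more concrete, here is a minimal 1-D sketch in plain Python. Each output sample is a weighted mean of its neighbours, and the weight is the product of a spatial Gaussian and a range (intensity) Gaussian, which is why edges survive the smoothing. The function name `bilateral_1d` and the parameters `radius`, `sigma_s` and `sigma_r` are illustrative assumptions; none of this is taken from the linked projects, and it is the brute-force form, not the fast approximation.

```python
import math

def bilateral_1d(signal, radius=2, sigma_s=1.0, sigma_r=10.0):
    """Edge-preserving smoothing of a 1-D list of numbers (illustrative sketch)."""
    out = []
    n = len(signal)
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial weight: nearby samples count more.
            w_s = math.exp(-((i - j) ** 2) / (2.0 * sigma_s ** 2))
            # Range weight: samples with similar intensity count more.
            w_r = math.exp(-((signal[i] - signal[j]) ** 2) / (2.0 * sigma_r ** 2))
            num += w_s * w_r * signal[j]
            den += w_s * w_r
        # den is never zero because the sample itself (j == i) has weight 1.
        out.append(num / den)
    return out

# A noisy step edge: small fluctuations are averaged, the large jump is kept.
step = [10, 12, 9, 11, 10, 100, 102, 99, 101, 100]
smoothed = bilateral_1d(step)
```

With a small `sigma_r` relative to the size of the jump, samples on the far side of the edge get almost no weight, so the two plateaus are smoothed independently.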
13. Useful Tools:
- EGT: a Toolbox for Multiple View Geometry and Visual Servoing[Project] [Code]
- A development kit of Matlab MEX functions for the OpenCV library[Project]
- Fast Artificial Neural Network Library[Project]
14. Hand and Fingertip Detection and Recognition:
- finger-detection-and-gesture-recognition [Code]
- Hand and Finger Detection using JavaCV[Project]
- Hand and fingers detection[Code]
15. Scene Interpretation:
- Nonparametric Scene Parsing via Label Transfer [Project]
16. Optical Flow:
- High accuracy optical flow using a theory for warping [Project]
- Dense Trajectories Video Description [Project]
- SIFT Flow: Dense Correspondence across Scenes and its Applications[Project]
- KLT: An Implementation of the Kanade-Lucas-Tomasi Feature Tracker [Project]
- Tracking Cars Using Optical Flow[Project]
- Secrets of optical flow estimation and their principles[Project]
- Implementation of the Black and Anandan dense optical flow method[Project]
- Optical Flow Computation[Project]
- Beyond Pixels: Exploring New Representations and Applications for Motion Analysis[Project]
- A Database and Evaluation Methodology for Optical Flow[Project]
- Optical flow related resources[Project]
- Robust Optical Flow Estimation [Project]
- optical flow[Project]
17. Image Retrieval:
18. Markov Random Fields:
- Markov Random Fields for Super-Resolution [Project]
- A Comparative Study of Energy Minimization Methods for Markov Random Fields with Smoothness-Based Priors [Project]
19. Motion Detection:
- Moving Object Extraction, Using Models or Analysis of Regions [Project]
- Background Subtraction: Experiments and Improvements for ViBe [Project]
- A Self-Organizing Approach to Background Subtraction for Visual Surveillance Applications [Project]
- changedetection.net: A new change detection benchmark dataset[Project]
- ViBe – a powerful technique for background detection and subtraction in video sequences[Project]
- Background Subtraction Program[Project]
- Motion Detection Algorithms[Project]
- Stuttgart Artificial Background Subtraction Dataset[Project]
- Object Detection, Motion Estimation, and Tracking[Project]
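Several of the entries above concern background subtraction. As a point of reference, the simplest baseline (much simpler than ViBe or the self-organizing approach listed here) is a running-average background model with a fixed threshold. The sketch below shows only that baseline; the function names and the `alpha` and `threshold` parameters are illustrative assumptions.

```python
def update_background(background, frame, alpha=0.1):
    """Blend the new frame into the background model (exponential running average)."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose absolute difference from the background exceeds the threshold."""
    return [[1 if abs(f - b) > threshold else 0 for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

# Toy 3x3 grayscale frames: a static scene, then a bright object at the centre.
background = [[50.0] * 3 for _ in range(3)]
frame = [[50, 50, 50], [50, 200, 50], [50, 50, 50]]

mask = foreground_mask(background, frame)          # only the centre pixel is foreground
background = update_background(background, frame)  # the model slowly absorbs the change
```

A small `alpha` keeps moving objects out of the model for longer, at the cost of adapting slowly to real scene changes such as lighting.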
Feature Detection and Description
General Libraries:
- VLFeat – Implementation of various feature descriptors (including SIFT, HOG, and LBP) and covariant feature detectors (including DoG, Hessian, Harris Laplace, Hessian Laplace, Multiscale Hessian, Multiscale Harris). Easy-to-use Matlab interface. See Modern features: Software – Slides providing a demonstration of VLFeat and also links to other software. Check also VLFeat hands-on session training
- OpenCV – Various implementations of modern feature detectors and descriptors (SIFT, SURF, FAST, BRIEF, ORB, FREAK, etc.)
Fast Keypoint Detectors for Real-time Applications:
- FAST – High-speed corner detector implementation for a wide variety of platforms
- AGAST – Even faster than the FAST corner detector. A multi-scale version of this method is used for the BRISK descriptor (ECCV 2010).
Binary Descriptors for Real-Time Applications:
- BRIEF – C++ code for a fast and accurate interest point descriptor (not invariant to rotations and scale) (ECCV 2010)
- ORB – OpenCV implementation of the Oriented FAST and Rotated BRIEF (ORB) descriptor (invariant to rotations, but not scale)
- BRISK – Efficient Binary descriptor invariant to rotations and scale. It includes a Matlab mex interface. (ICCV 2011)
- FREAK – Faster than BRISK (invariant to rotations and scale) (CVPR 2012)
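Part of what makes the binary descriptors above suitable for real-time use is that they are compared with the Hamming distance, which reduces to an XOR followed by a bit count. The following is a minimal sketch of brute-force matching under that distance, with descriptors stored as Python integers. The helper names `hamming` and `match` and the toy 8-bit values are assumptions for illustration; real descriptors are much longer (ORB in OpenCV uses 256 bits).

```python
def hamming(a, b):
    """Number of differing bits between two descriptors stored as Python ints."""
    return bin(a ^ b).count("1")

def match(query, database):
    """Brute-force nearest neighbour under Hamming distance.

    Returns (index, distance) of the closest database descriptor.
    """
    distances = [hamming(query, d) for d in database]
    best = min(range(len(database)), key=distances.__getitem__)
    return best, distances[best]

# Toy 8-bit descriptors.
database = [0b10110010, 0b01001101, 0b11110000]
query = 0b10110110  # differs from database[0] in exactly one bit

best_index, best_distance = match(query, database)
```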
SIFT and SURF Implementations:
- SIFT: VLFeat, OpenCV, Original code by David Lowe, GPU implementation, OpenSIFT
- SURF: Herbert Bay’s code, OpenCV, GPU-SURF
Other Local Feature Detectors and Descriptors:
- VGG Affine Covariant features – Oxford code for various affine covariant feature detectors and descriptors.
- LIOP descriptor – Source code for the Local Intensity Order Pattern (LIOP) descriptor (ICCV 2011).
- Local Symmetry Features – Source code for matching of local symmetry features under large variations in lighting, age, and rendering style (CVPR 2012).
Global Image Descriptors:
- GIST – Matlab code for the GIST descriptor
- CENTRIST – Global visual descriptor for scene categorization and object detection (PAMI 2011)
Feature Coding and Pooling
- VGG Feature Encoding Toolkit – Source code for various state-of-the-art feature encoding methods – including Standard hard encoding, Kernel codebook encoding, Locality-constrained linear encoding, and Fisher kernel encoding.
- Spatial Pyramid Matching – Source code for feature pooling based on spatial pyramid matching (widely used for image classification)
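The "standard hard encoding" mentioned above is usually understood as bag-of-visual-words with nearest-codeword assignment: each local descriptor votes for its closest codebook entry, and the image is represented by the normalised vote histogram. Here is a minimal sketch under that assumption. The names `hard_encode` and `squared_distance` and the tiny 2-D codebook are hypothetical and not taken from the VGG toolkit.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def hard_encode(descriptors, codebook):
    """Return an L1-normalised histogram of nearest-codeword assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Toy 2-D example: three codewords, four local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]
histogram = hard_encode(descriptors, codebook)
```

Spatial pyramid matching applies the same encoding separately to sub-regions of the image and concatenates the resulting histograms.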
Convolutional Nets and Deep Learning
- EBLearn – C++ Library for Energy-Based Learning. It includes several demos and step-by-step instructions to train classifiers based on convolutional neural networks.
- Torch7 – Provides a Matlab-like environment for state-of-the-art machine learning algorithms, including a fast implementation of convolutional neural networks.
- Deep Learning – Various links for deep learning software.
Part-Based Models
- Deformable Part-based Detector – Library provided by the authors of the original paper (state-of-the-art in PASCAL VOC detection task)
- Efficient Deformable Part-Based Detector – Branch-and-Bound implementation for a deformable part-based detector.
- Accelerated Deformable Part Model – Efficient implementation of a method that achieves the exact same performance of deformable part-based detectors but with significant acceleration (ECCV 2012).
- Coarse-to-Fine Deformable Part Model – Fast approach for deformable object detection (CVPR 2011).
- Poselets – C++ and Matlab versions for object detection based on poselets.
- Part-based Face Detector and Pose Estimation – Implementation of a unified approach for face detection, pose estimation, and landmark localization (CVPR 2012).
Attributes and Semantic Features
- Relative Attributes – Modified implementation of RankSVM to train Relative Attributes (ICCV 2011).
- Object Bank – Implementation of object bank semantic features (NIPS 2010). See also ActionBank
- Classemes, Picodes, and Meta-class features – Software for extracting high-level image descriptors (ECCV 2010, NIPS 2011, CVPR 2012).
Large-Scale Learning
- Additive Kernels – Source code for fast additive kernel SVM classifiers (PAMI 2013).
- LIBLINEAR – Library for large-scale linear SVM classification.
- VLFeat – Implementation for Pegasos SVM and Homogeneous Kernel map.
Fast Indexing and Image Retrieval
- FLANN – Library for performing fast approximate nearest neighbor.
- Kernelized LSH – Source code for Kernelized Locality-Sensitive Hashing (ICCV 2009).
- ITQ Binary codes – Code for generation of small binary codes using Iterative Quantization and other baselines such as Locality-Sensitive-Hashing (CVPR 2011).
- INRIA Image Retrieval – Efficient code for state-of-the-art large-scale image retrieval (CVPR 2011).
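For readers new to the hashing entries above, the basic locality-sensitive hashing idea can be sketched with random hyperplanes: each bit records on which side of a random hyperplane a vector falls, so vectors pointing in similar directions tend to share bits. This is the plain random-projection variant for cosine similarity, not Kernelized LSH or ITQ. The function names and the choice of 16 bits are illustrative assumptions.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_code(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
                 for plane in hyperplanes)

planes = make_hyperplanes(dim=4, n_bits=16)
a = [1.0, 2.0, 3.0, 4.0]
code_a = lsh_code(a, planes)
code_scaled = lsh_code([2 * x for x in a], planes)  # same direction as a
code_opposite = lsh_code([-x for x in a], planes)   # opposite direction
```

Scaling a vector does not change its code, while flipping its direction flips every bit, which is the behaviour one would expect from a hash that tracks the angle between vectors.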
Object Detection
- See Part-based Models and Convolutional Nets above.
- Pedestrian Detection at 100fps – Very fast and accurate pedestrian detector (CVPR 2012).
- Caltech Pedestrian Detection Benchmark – Excellent resource for pedestrian detection, with various links for state-of-the-art implementations.
- OpenCV – Enhanced implementation of the Viola-Jones real-time object detector, with trained models for face detection.
- Efficient Subwindow Search – Source code for branch-and-bound optimization for efficient object localization (CVPR 2008).
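The Viola-Jones detector listed above owes much of its speed to the integral image (summed-area table), which lets the sum over any axis-aligned rectangle be read with four lookups. A minimal pure-Python sketch follows. The function names `integral_image` and `box_sum` and the half-open rectangle convention are assumptions made for this example, not OpenCV's API.

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column at the top/left."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 using four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = integral_image(image)
total = box_sum(ii, 0, 0, 3, 3)        # whole image
lower_right = box_sum(ii, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
```

A Haar-like feature is then just the difference of two or more such box sums, so its cost does not depend on the rectangle size.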
3D Recognition
- Point-Cloud Library – Library for 3D image and point cloud processing.
Action Recognition
- ActionBank – Source code for action recognition based on the ActionBank representation (CVPR 2012).
- STIP Features – Software for computing space-time interest point descriptors
- Independent Subspace Analysis – Look for Stacked ISA for Videos (CVPR 2011)
- Velocity Histories of Tracked Keypoints – C++ code for activity recognition using the velocity histories of tracked keypoints (ICCV 2009)
Datasets
Attributes
- Animals with Attributes – 30,475 images of 50 animal classes with 6 pre-extracted feature representations for each image.
- aYahoo and aPascal – Attribute annotations for images collected from Yahoo and Pascal VOC 2008.
- FaceTracer – 15,000 faces annotated with 10 attributes and fiducial points.
- PubFig – 58,797 face images of 200 people with 73 attribute classifier outputs.
- LFW – 13,233 face images of 5,749 people with 73 attribute classifier outputs.
- Human Attributes – 8,000 people with annotated attributes. Check also this link for another dataset of human attributes.
- SUN Attribute Database – Large-scale scene attribute database with a taxonomy of 102 attributes.
- ImageNet Attributes – Variety of attribute labels for the ImageNet dataset.
- Relative attributes – Data for OSR and a subset of PubFig datasets. Check also this link for the WhittleSearch data.
- Attribute Discovery Dataset – Images of shopping categories associated with textual descriptions.
Fine-grained Visual Categorization
- Caltech-UCSD Birds Dataset – Hundreds of bird categories with annotated parts and attributes.
- Stanford Dogs Dataset – 20,000 images of 120 breeds of dogs from around the world.
- Oxford-IIIT Pet Dataset – 37 category pet dataset with roughly 200 images for each class. Pixel level trimap segmentation is included.
- Leeds Butterfly Dataset – 832 images of 10 species of butterflies.
- Oxford Flower Dataset – Hundreds of flower categories.
Face Detection
- FDDB – UMass face detection dataset and benchmark (5,000+ faces)
- CMU/MIT – Classical face detection dataset.
Face Recognition
- Face Recognition Homepage – Large collection of face recognition datasets.
- LFW – UMass unconstrained face recognition dataset (13,000+ face images).
- NIST Face Homepage – includes face recognition grand challenge (FRGC), vendor tests (FRVT) and others.
- CMU Multi-PIE – contains more than 750,000 images of 337 people, with 15 different views and 19 lighting conditions.
- FERET – Classical face recognition dataset.