CVPR2022 Papers (Papers/Codes/Demos)
https://github.com/gbstack/cvpr-2022-papers
分类目录:
1. 检测
2. 分割(Segmentation)
3. 图像处理(Image Processing)
4. 估计(Estimation)
5. 图像&视频检索/视频理解(Image&Video Retrieval/Video Understanding)
6. 人脸(Face)
7. 三维视觉(3D Vision)
8. 目标跟踪(Object Tracking)
9. 医学影像(Medical Imaging)
10. 文本检测/识别(Text Detection/Recognition)
11. 遥感图像(Remote Sensing Image)
12. GAN/生成式/对抗式(GAN/Generative/Adversarial)
13. 图像生成/合成(Image Generation/Image Synthesis)
14. 场景图(Scene Graph)
15. 视觉定位(Visual Localization)
16. 视觉推理/视觉问答(Visual Reasoning/VQA)
17. 图像分类(Image Classification)
18. 神经网络结构设计(Neural Network Structure Design)
19. 模型压缩(Model Compression)
20. 模型训练/泛化(Model Training/Generalization)
21. 模型评估(Model Evaluation)
22. 数据处理(Data Processing)
23. 主动学习(Active Learning)
24. 小样本学习/零样本学习(Few-shot/Zero-shot Learning)
25. 持续学习(Continual Learning/Life-long Learning)
26. 迁移学习/domain/自适应(Transfer Learning/Domain Adaptation)
27. 度量学习(Metric Learning)
28. 对比学习(Contrastive Learning)
29. 增量学习(Incremental Learning)
30. 强化学习(Reinforcement Learning)
31. 元学习(Meta Learning)
32. 多模态学习(Multi-Modal Learning)
33. 视觉预测(Vision-based Prediction)
34. 数据集(Dataset)
35. 机器人(Robotic)
36. 自监督学习/半监督学习/无监督学习(Self-supervised Learning/Semi-supervised Learning)
检测
2D目标检测(2D Object Detection)
Oriented RepPoints for Aerial Object Detection(面向空中目标检测的 RepPoints)(小目标检测)
Confidence Propagation Cluster: Unleash Full Potential of Object Detectors(置信度传播聚类:释放目标检测器的全部潜力)
Semantic-aligned Fusion Transformer for One-shot Object Detection(用于单样本目标检测的语义对齐融合 Transformer)
A Dual Weighting Label Assignment Scheme for Object Detection(一种用于目标检测的双重加权标签分配方案)
MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection(混合图像块并解混特征块以用于半监督目标检测)
SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection(用于域自适应目标检测的语义完备图匹配)
Accelerating DETR Convergence via Semantic-Aligned Matching(通过语义对齐匹配加速 DETR 收敛)
Focal and Global Knowledge Distillation for Detectors(用于检测器的焦点与全局知识蒸馏)
keywords: Object Detection, Knowledge Distillation
Unknown-Aware Object Detection: Learning What You Don’t Know from Videos in the Wild(未知感知对象检测:从野外视频中学习你不知道的东西)
Localization Distillation for Dense Object Detection(密集对象检测的定位蒸馏)
keywords: Bounding Box Regression, Localization Quality Estimation, Knowledge Distillation
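The knowledge-distillation keyword above refers to the generic teacher-student setup; below is a minimal sketch of a temperature-scaled logit-distillation loss, for illustration only — it is not the localization distillation proposed in the paper, and the temperature value is an arbitrary assumption.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor, T: float = 4.0) -> torch.Tensor:
    """KL divergence between softened teacher and student class distributions (generic sketch)."""
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # the T^2 factor keeps gradient magnitudes comparable to the hard-label loss
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

# typical usage: total = F.cross_entropy(student_logits, labels) + kd_loss(student_logits, teacher_logits)
```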
视频目标检测(Video Object Detection)
Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering(通过联合表示学习和在线聚类进行无监督活动分割)
3D目标检测(3D object detection)
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers(基于 Transformer 的鲁棒 LiDAR-相机融合 3D 目标检测)
Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds(学习用于 3D LiDAR 点云的高效基于点的检测器)
Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion(借助深度补全迈向高质量 3D 检测)
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer(使用深度感知 Transformer 的单目 3D 对象检测)
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds(从点云进行 3D 对象检测的 Set-to-Set 方法)
VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection(用于单目 3D 目标检测的联合语义与几何代价体)
DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection(用于多模态 3D 目标检测的激光雷达相机深度融合)
Point Density-Aware Voxels for LiDAR 3D Object Detection(用于 LiDAR 3D 对象检测的点密度感知体素)
Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement(带有形状引导标签增强的弱监督 3D 对象检测)
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes(在 3D 场景中实现稳健的定向边界框检测)
A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation(在全景分割的指导下,用于基于 LiDAR 的 3D 对象检测的多功能多视图框架)
keywords: 3D Object Detection with Point-based Methods, 3D Object Detection with Grid-based Methods, Cluster-free 3D Panoptic Segmentation, CenterPoint 3D Object Detection
Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving(自动驾驶中用于单目 3D 目标检测的伪立体)
keywords: Autonomous Driving, Monocular 3D Object Detection
伪装目标检测(Camouflaged Object Detection)
Implicit Motion Handling for Video Camouflaged Object Detection(视频伪装对象检测的隐式运动处理)
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection(放大和缩小:用于伪装目标检测的混合尺度三元组网络)
显著性目标检测(Saliency Object Detection)
Bi-directional Object-context Prioritization Learning for Saliency Ranking(用于显著性排序的双向对象-上下文优先级学习)
Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection(民主确实重要:用于协同显著目标检测的全面特征挖掘)
关键点检测(Keypoint Detection)
UKPGAN: A General Self-Supervised Keypoint Detector(一个通用的自监督关键点检测器)
车道线检测(Lane Detection)
CLRNet: Cross Layer Refinement Network for Lane Detection(用于车道检测的跨层细化网络)
Rethinking Efficient Lane Detection via Curve Modeling(通过曲线建模重新思考高效车道检测)
keywords: Segmentation-based Lane Detection, Point Detection-based Lane Detection, Curve-based Lane Detection, autonomous driving
边缘检测(Edge Detection)
EDTER: Edge Detection with Transformer(使用transformer的边缘检测)
消失点检测(Vanishing Point Detection)
Deep vanishing point detection: Geometric priors make dataset variations vanish(深度消失点检测:几何先验使数据集变化消失)
分割(Segmentation)
图像分割(Image Segmentation)
Learning What Not to Segment: A New Perspective on Few-Shot Segmentation(学习不分割的内容:关于小样本分割的新视角)
CRIS: CLIP-Driven Referring Image Segmentation(CLIP 驱动的参考图像分割)
Hyperbolic Image Segmentation(双曲空间图像分割)
全景分割(Panoptic Segmentation)
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers(使用 Transformers 深入研究全景分割)
Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation(弯曲现实:适应全景语义分割的失真感知Transformer)
keywords: Semantic and panoramic segmentation, Unsupervised domain adaptation, Transformer
语义分割(Semantic Segmentation)
Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation(用于域自适应语义分割的类平衡像素级自标记)
Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation(弱监督语义分割的区域语义对比和聚合)
Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation(走向稀疏注释的语义分割)
Scribble-Supervised LiDAR Semantic Segmentation
ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation(多目标域自适应语义分割的直接适应策略)
Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast(通过像素到原型对比的弱监督语义分割)
Representation Compensation Networks for Continual Semantic Segmentation(连续语义分割的表示补偿网络)
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels(使用不可靠伪标签的半监督语义分割)
Weakly Supervised Semantic Segmentation using Out-of-Distribution Data(使用分布外数据的弱监督语义分割)
Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation(弱监督语义分割的自监督图像特定原型探索)
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation(用于弱监督语义分割的多类token Transformer)
Cross Language Image Matching for Weakly Supervised Semantic Segmentation(用于弱监督语义分割的跨语言图像匹配)
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers(从注意力中学习亲和力:使用 Transformers 的端到端弱监督语义分割)
ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation(让自我训练更好地用于半监督语义分割)
keywords: Semi-supervised learning, Semantic segmentation, Uncertainty estimation
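As background for the keywords above, a minimal confidence-thresholded pseudo-labelling sketch for semi-supervised segmentation follows; it is illustrative only and not the ST++ pipeline, and the 0.95 threshold and ignore index 255 are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def make_pseudo_labels(teacher_logits, threshold=0.95, ignore_index=255):
    # teacher_logits: (B, C, H, W) predictions on unlabeled images
    probs = teacher_logits.softmax(dim=1)
    conf, pseudo = probs.max(dim=1)          # per-pixel confidence and argmax class
    pseudo[conf < threshold] = ignore_index  # uncertain pixels are excluded from the loss
    return pseudo

def unsupervised_loss(student_logits, pseudo_labels, ignore_index=255):
    # standard cross-entropy on the confident pseudo-labelled pixels only
    return F.cross_entropy(student_logits, pseudo_labels, ignore_index=ignore_index)
```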
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation(弱监督语义分割的类重新激活图)
实例分割(Instance Segmentation)
ContrastMask: Contrastive Learning to Segment Every Thing(通过对比学习分割一切物体)
Discovering Objects that Can Move(发现可以移动的物体)
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation(一种基于端到端轮廓的高质量高速实例分割方法)
Efficient Video Instance Segmentation via Tracklet Query and Proposal(通过 Tracklet Query 和 Proposal 进行高效的视频实例分割)
SoftGroup for 3D Instance Segmentation on Point Clouds(用于点云上的 3D 实例分割)
keywords: 3D Vision, Point Clouds, Instance Segmentation
视频目标分割(Video Object Segmentation)
Language as Queries for Referring Video Object Segmentation(语言作为引用视频对象分割的查询)
密集预测(Dense Prediction)
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting(具有上下文感知提示的语言引导密集预测)
视频处理(Video Processing)
视频处理(Video Processing)
Neural Compression-Based Feature Learning for Video Restoration(用于视频复原的基于神经压缩的特征学习)
视频编辑(Video Editing)
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers(M3L:基于多模态多层级 Transformer 的语言驱动视频编辑)
视频生成/视频合成(Video Generation/Video Synthesis)
Depth-Aware Generative Adversarial Network for Talking Head Video Generation(用于说话头视频生成的深度感知生成对抗网络)
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning(展示做什么、说明怎么做:基于多模态条件的视频合成)
估计(Estimation)
光流/运动估计(Optical Flow/Motion Estimation)
Global Matching with Overlapping Attention for Optical Flow Estimation(具有重叠注意力的全局匹配光流估计)
CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation(用于联合光流和场景流估计的双向相机-LiDAR 融合)
深度估计(Depth Estimation)
Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation(基于自适应相关的级联循环网络的实用立体匹配)
Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light(结合双目立体和单目结构光的深度估计)
RGB-Depth Fusion GAN for Indoor Depth Completion(用于室内深度补全的 RGB-深度融合 GAN)
Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective(从特征一致性的角度重新审视域广义立体匹配网络)
Deep Depth from Focus with Differential Focus Volume(基于差分聚焦体积的对焦深度估计)
ChiTransformer: Towards Reliable Stereo from Cues(从线索走向可靠的立体视觉)
Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation and Focal Loss(重新思考多视图立体的深度估计:统一表示和焦点损失)
ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks(立体匹配网络中自动避免捷径和域泛化的信息论方法)
keywords: Learning-based Stereo Matching Networks, Single Domain Generalization, Shortcut Learning
Attention Concatenation Volume for Accurate and Efficient Stereo Matching(用于精确和高效立体匹配的注意力连接体积)
keywords: Stereo Matching, cost volume construction, cost aggregation
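The cost-volume keywords above refer to comparing left/right features over candidate disparities; below is a minimal correlation-based cost-volume sketch of the kind commonly used in learning-based stereo — not the attention concatenation volume of the paper, and `max_disp` is an assumed hyper-parameter.

```python
import torch

def correlation_cost_volume(feat_left, feat_right, max_disp=48):
    # feat_left, feat_right: (B, C, H, W) features from a shared encoder
    B, C, H, W = feat_left.shape
    cost = feat_left.new_zeros(B, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            cost[:, d] = (feat_left * feat_right).mean(dim=1)
        else:
            # compare left pixel x with right pixel x - d (shift the right features by d)
            cost[:, d, :, d:] = (feat_left[:, :, :, d:] * feat_right[:, :, :, :-d]).mean(dim=1)
    return cost  # (B, max_disp, H, W); aggregated and soft-argmin'ed into a disparity map downstream
```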
Occlusion-Aware Cost Constructor for Light Field Depth Estimation(用于光场深度估计的遮挡感知代价构建器)
paper | https://github.com/YingqianWang/OACC-Net
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation(用于单目深度估计的神经窗口全连接 CRF)
keywords: Neural CRFs for Monocular Depth
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion(通过几何感知融合进行 360 度单目深度估计)
keywords: monocular depth estimation(单目深度估计), transformer
人体解析/人体姿态估计(Human Parsing/Human Pose Estimation)
Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization(用于单目绝对 3D 定位的基于射线的 3D 人体姿态估计)
Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video(捕捉运动中的人类:来自单目视频的时间注意 3D 人体姿势和形状估计)
Physical Inertial Poser (PIP): Physics-aware Real-time Human Motion Tracking from Sparse Inertial Sensors(来自稀疏惯性传感器的物理感知实时人体运动跟踪)
Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation(用于多人 3D 姿势估计的分布感知单阶段模型)
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation(用于 3D 人体姿势估计的多假设transformer)
CDGNet: Class Distribution Guided Network for Human Parsing(用于人类解析的类分布引导网络)
Forecasting Characteristic 3D Poses of Human Actions(预测人类行为的特征 3D 姿势)
Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation(学习用于多人姿势估计的局部-全局上下文适应)
keywords: Top-Down Pose Estimation(从上至下姿态估计), Limb-based Grouping, Direct Regression
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video(用于视频中 3D 人体姿势估计的 Seq2seq 混合时空编码器)
图像处理(Image Processing)
超分辨率(Super Resolution)
Local Texture Estimator for Implicit Representation Function(隐式表示函数的局部纹理估计器)
A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution(一种用于空间变形鲁棒场景文本图像超分辨率的文本注意网络)
Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution(一种真实图像超分辨率的局部判别学习方法)
Blind Image Super-resolution with Elaborate Degradation Modeling on Noise and Kernel(对噪声和核进行精细退化建模的盲图像超分辨率)
Reflash Dropout in Image Super-Resolution(重新点燃图像超分辨率中的 Dropout)
Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence(迈向双向任意图像缩放:联合优化和循环幂等)
HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening(用于全色锐化的纹理和光谱特征融合Transformer)
HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging(光谱压缩成像的高分辨率双域学习)
keywords: HSI Reconstruction, Self-Attention Mechanism, Image Frequency Spectrum Analysis
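The self-attention keyword above refers to standard scaled dot-product attention; a minimal single-head sketch follows for orientation only — it is not HDNet's dual-domain or spectral-wise attention.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (B, N, D) token embeddings; returns (B, N, D)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (B, N, N) pairwise similarities
    weights = scores.softmax(dim=-1)                          # attention weights over tokens
    return weights @ v
```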
图像复原/图像增强/图像重建(Image Restoration/Image Reconstruction)
Exploring and Evaluating Image Restoration Potential in Dynamic Scenes(探索和评估动态场景中的图像复原潜力)
Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction(通过随机收缩加速逆问题的条件扩散模型)
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction(用于高效高光谱图像重建的掩模引导光谱变换器)
Restormer: Efficient Transformer for High-Resolution Image Restoration(用于高分辨率图像复原的高效transformer)
Event-based Video Reconstruction via Potential-assisted Spiking Neural Network(通过膜电位辅助的脉冲神经网络进行基于事件的视频重建)
图像去噪/去模糊/去雨去雾(Image Denoising)
AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network(通过非对称 PD 和盲点网络对真实世界图像进行自监督去噪)
IDR: Self-Supervised Image Denoising via Iterative Data Refinement(通过迭代数据细化的自监督图像去噪)
Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots(具有可见盲点的自监督图像去噪)
E-CIR: Event-Enhanced Continuous Intensity Recovery(事件增强的连续强度恢复)
keywords: Event-Enhanced Deblurring, Video Representation
图像编辑/图像修复(Image Edit/Inpainting)
High-Fidelity GAN Inversion for Image Attribute Editing(用于图像属性编辑的高保真 GAN 反演)
Style Transformer for Image Inversion and Editing(用于图像反演与编辑的 Style Transformer)
MISF: Multi-level Interactive Siamese Filtering for High-Fidelity Image Inpainting(用于高保真图像修复的多级交互式 Siamese 过滤)
HairCLIP: Design Your Hair by Text and Reference Image(通过文本和参考图像设计你的头发)
keywords: Language-Image Pre-Training (CLIP), Generative Adversarial Networks
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding(结合掩码位置编码的增量式 Transformer 结构增强图像修复)
keywords: Image Inpainting, Transformer, Image Generation
图像翻译(Image Translation)
Globetrotter: Connecting Languages by Connecting Images(通过连接图像连接语言)
QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation(图像翻译中对比学习的查询选择注意)
FlexIT: Towards Flexible Semantic Image Translation(迈向灵活的语义图像翻译)
Exploring Patch-wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks(探索图像到图像翻译任务中对比学习的补丁语义关系)
keywords: image translation, knowledge transfer, contrastive learning
风格迁移(Style Transfer)
Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization(任意风格迁移和域泛化的精确特征分布匹配)
Style-ERD: Responsive and Coherent Online Motion Style Transfer(响应式和连贯的在线运动风格迁移)
CLIPstyler: Image Style Transfer with a Single Text Condition(仅凭单一文本条件的图像风格迁移)
keywords: Style Transfer, Text-guided synthesis, Language-Image Pre-Training (CLIP)
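The CLIP keyword above refers to scoring image-text agreement in CLIP's joint embedding space; below is a minimal sketch using the openai/CLIP package (installed from github.com/openai/CLIP) — a plain cosine-similarity text loss, not CLIPstyler's directional loss, and the ViT-B/32 checkpoint is an arbitrary choice. For training a stylization network one would feed differentiable image tensors through CLIP's normalization rather than the PIL `preprocess` path used here.

```python
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_text_loss(image_path: str, prompt: str) -> torch.Tensor:
    """1 - cosine similarity between an image and a text prompt in CLIP embedding space."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    tokens = clip.tokenize([prompt]).to(device)
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(tokens)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    return 1.0 - (image_feat * text_feat).sum(dim=-1).mean()
```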
人脸(Face)
人脸(Face)
Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?(跨模态感知者:能否从声音中推断面部几何形状?)
Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data(利用 3D 合成数据去除人像眼镜和阴影)
HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network(分层解析胶囊网络的无监督人脸部分发现)
FaceFormer: Speech-Driven 3D Facial Animation with Transformers(FaceFormer:带有transformer的语音驱动的 3D 面部动画)
Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning(用于鲁棒人脸对齐和地标固有关系学习的稀疏局部补丁transformer)
人脸识别/检测(Facial Recognition/Detection)
Privacy-preserving Online AutoML for Domain-Specific Face Detection(用于特定领域人脸检测的隐私保护在线 AutoML)
An Efficient Training Approach for Very Large Scale Face Recognition(一种有效的超大规模人脸识别训练方法)
人脸生成/合成/重建/编辑(Face Generation/Face Synthesis/Face Reconstruction/Face Editing)
FENeRF: Face Editing in Neural Radiance Fields(神经辐射场中的人脸编辑)
GCFSR: a Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors(一种没有面部和 GAN 先验的生成可控人脸超分辨率方法)
Sparse to Dense Dynamic 3D Facial Expression Generation(稀疏到密集的动态 3D 面部表情生成)
keywords: Facial expression generation, 4D face generation, 3D face modeling
人脸伪造/反欺骗(Face Forgery/Face Anti-Spoofing)
Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing(通过 Shuffled Style Assembly 进行域泛化以进行人脸反欺骗)
Voice-Face Homogeneity Tells Deepfake
Protecting Celebrities with Identity Consistency Transformer(使用身份一致性transformer保护名人)
目标跟踪(Object Tracking)
目标跟踪(Object Tracking)
Transforming Model Prediction for Tracking(转换模型预测以进行跟踪)
MixFormer: End-to-End Tracking with Iterative Mixed Attention(具有迭代混合注意力的端到端跟踪)
Unsupervised Domain Adaptation for Nighttime Aerial Tracking(夜间空中跟踪的无监督域自适应)
Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects(迭代对应几何:融合区域和深度以实现无纹理对象的高效 3D 跟踪)
paper | https://github.com/DLR-RM/3DObjectTracking
TCTrack: Temporal Contexts for Aerial Tracking(空中跟踪的时间上下文)
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds(超越 3D 孪生跟踪:点云中 3D 单目标跟踪的以运动为中心的范式)
keywords: Single Object Tracking, 3D Multi-object Tracking / Detection, Spatial-temporal Learning on Point Clouds
Correlation-Aware Deep Tracking(相关感知深度跟踪)
图像&视频检索/视频理解(Image&Video Retrieval/Video Understanding)
图像&视频检索/视频理解(Image&Video Retrieval/Video Understanding)
Bridging Video-text Retrieval with Multiple Choice Questions(桥接视频文本检索与多项选择题)
BEVT: BERT Pretraining of Video Transformers(视频Transformer的 BERT 预训练)
keywords: Video understanding, Vision transformers, Self-supervised representation learning, BERT pretraining
行为识别/动作识别/检测/分割/定位(Action/Activity Recognition)
E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition(用于以自我为中心的动作识别的运动增强事件流)
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos(寻找变化:从未修剪的网络视频中学习对象状态和状态修改操作)
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition(鲁棒动作识别的 Transformer 方法中的定向注意)
Self-supervised Video Transformer(自监督视频transformer)
Spatio-temporal Relation Modeling for Few-shot Action Recognition(小样本动作识别的时空关系建模)
RCL: Recurrent Continuous Localization for Temporal Action Detection(用于时间动作检测的循环连续定位)
OpenTAL: Towards Open Set Temporal Action Localization(走向开放集时间动作定位)
End-to-End Semi-Supervised Learning for Video Action Detection(视频动作检测的端到端半监督学习)
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos(在特定模态标注视频上进行多模态动作识别的可学习无关模态 Dropout)
Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation(通过代表性片段知识传播的弱监督时间动作定位)
Colar: Effective and Efficient Online Action Detection by Consulting Exemplars(通过咨询示例进行有效且高效的在线动作检测)
keywords: Online action detection(在线动作检测)
行人重识别/检测(Re-Identification/Detection)
Cascade Transformers for End-to-End Person Search(用于端到端行人搜索的级联 Transformer)
图像/视频字幕(Image/Video Caption)
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources(借助在线资源对脱离上下文的图像进行开放域、基于内容的多模态事实核查)
Hierarchical Modular Network for Video Captioning(用于视频字幕的分层模块化网络)
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning(使用 Transformer 进行 3D 密集字幕生成的跨模态知识迁移)
医学影像(Medical Imaging)
医学影像(Medical Imaging)
ACPL: Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification(半监督医学图像分类的反课程伪标签)
Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces from 3D MRI Scans with Geometric Deep Neural Networks(使用几何深度神经网络从 3D MRI 扫描中快速显式重建皮质表面)
Generalizable Cross-modality Medical Image Segmentation via Style Augmentation and Dual Normalization(通过风格增强和双重归一化的可泛化跨模态医学图像分割)
Adaptive Early-Learning Correction for Segmentation from Noisy Annotations(从噪声标签中分割的自适应早期学习校正)
keywords: medical-imaging segmentation, Noisy Annotations
Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations(时间上下文很重要:使用疾病进展表示增强单图像预测)
keywords: Self-supervised Transformer, Temporal modeling of disease progression
文本检测/识别/理解(Text Detection/Recognition/Understanding)
文本检测/识别/理解(Text Detection/Recognition/Understanding)
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition(通过文本检测和文本识别之间更好的协同作用进行场景文本定位)
Fourier Document Restoration for Robust Document Dewarping and Recognition(用于鲁棒文档去扭曲和识别的傅里叶文档恢复)
XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding(迈向面向视觉丰富文档理解的布局感知多模态网络)
GAN/生成式/对抗式(GAN/Generative/Adversarial)
GAN/生成式/对抗式(GAN/Generative/Adversarial)
Subspace Adversarial Training(子空间对抗训练)
DTA: Physical Camouflage Attacks using Differentiable Transformation Network(使用可微变换网络的物理伪装攻击)
Improving the Transferability of Targeted Adversarial Examples through Object-Based Diverse Input(通过基于对象的多样化输入提高目标对抗样本的可迁移性)
Towards Practical Certifiable Patch Defense with Vision Transformer(使用 Vision Transformer 实现实用的可认证补丁防御)
Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment(基于松弛空间结构对齐的小样本生成模型自适应)
Enhancing Adversarial Training with Second-Order Statistics of Weights(使用权重的二阶统计加强对抗训练)
Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack(通过自适应自动攻击对对抗鲁棒性的实际评估)
Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity(对语义相似性的频率驱动的不可察觉的对抗性攻击)
Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon(阴影可能很危险:自然现象的隐秘而有效的物理世界对抗性攻击)
Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-robust Makeup Transfer(保护面部隐私:通过风格稳健的化妆转移生成对抗性身份面具)
Adversarial Texture for Fooling Person Detectors in the Physical World(在物理世界中欺骗行人检测器的对抗性纹理)
Label-Only Model Inversion Attacks via Boundary Repulsion(通过边界排斥的仅标签模型反转攻击)
图像生成/图像合成(Image Generation/Image Synthesis)
图像生成/图像合成(Image Generation/Image Synthesis)
Modulated Contrast for Versatile Image Synthesis(用于多功能图像合成的调制对比度)
Attribute Group Editing for Reliable Few-shot Image Generation(属性组编辑用于可靠的小样本图像生成)
Text to Image Generation with Semantic-Spatial Aware GAN(使用语义空间感知 GAN 生成文本到图像)
Playable Environments: Video Manipulation in Space and Time(可播放环境:空间和时间的视频操作)
FLAG: Flow-based 3D Avatar Generation from Sparse Observations(基于流从稀疏观测生成 3D 虚拟形象)
Dynamic Dual-Output Diffusion Models(动态双输出扩散模型)
Exploring Dual-task Correlation for Pose Guided Person Image Generation(探索姿势引导人物图像生成的双任务相关性)
3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces(通过小批量特征交换实现人体与人脸 3D 形状变分自编码器的隐变量解耦)
Interactive Image Synthesis with Panoptic Layout Generation(具有全景布局生成的交互式图像合成)
Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values(极性采样:通过奇异值对预训练生成网络的质量和多样性控制)
Autoregressive Image Generation using Residual Quantization(使用残差量化的自回归图像生成)
三维视觉(3D Vision)
三维视觉(3D Vision)
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings(在 3D 网格中嵌入消息并从 2D 渲染中提取它们)
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning(使用 Transformer 进行 3D 密集字幕生成的跨模态知识迁移)
点云(Point Cloud)
IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment(通过深度嵌入对齐的动态 3D 点云插值)
No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces(没有痛苦,收获很大:通过拟合特征级时空表面,用静态模型对动态点云序列进行分类)
AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation(通用 3D 零件分割的中间监督搜索)
Geometric Transformer for Fast and Robust Point Cloud Registration(用于快速和稳健点云配准的几何transformer)
Contrastive Boundary Learning for Point Cloud Segmentation(点云分割的对比边界学习)
Shape-invariant 3D Adversarial Point Clouds(形状不变的 3D 对抗点云)
ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation(通过对抗旋转提高点云分类器的旋转鲁棒性)
Lepard: Learning partial point cloud matching in rigid and deformable scenes(Lepard:在刚性和可变形场景中学习部分点云匹配)
A Unified Query-based Paradigm for Point Cloud Understanding(一种基于统一查询的点云理解范式)
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding(用于 3D 点云理解的自监督跨模态对比学习)
keywords: Self-Supervised Learning, Contrastive Learning, 3D Point Cloud, Representation Learning, Cross-Modal Learning
三维重建(3D Reconstruction)
ϕ-SfT: Shape-from-Template with a Physics-Based Deformation Model(基于物理变形模型的模板形状恢复)
Input-level Inductive Biases for 3D Reconstruction(用于 3D 重建的输入级归纳偏差)
AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation(用于 3D 补全、重建与生成的形状先验)
Interacting Attention Graph for Single Image Two-Hand Reconstruction(单幅图像双手重建的交互注意力图)
OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction(实时动态 3D 重建的遮挡感知运动估计)
Neural RGB-D Surface Reconstruction(神经 RGB-D 表面重建)
Neural Face Identification in a 2D Wireframe Projection of a Manifold Object(在流形物体的二维线框投影中进行神经网络面片识别)
paper | https://manycore-research.github.io/faceformer
Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot Ulcers(使用伤口分割和重建生成 3D 生物可打印贴片以治疗糖尿病足溃疡)
keywords: semantic segmentation, 3D reconstruction, 3D bio-printers
H4D: Human 4D Modeling by Learning Neural Compositional Representation(通过学习神经组合表示进行人体 4D 建模)
keywords: 4D Representation(4D 表征),Human Body Estimation(人体姿态估计),Fine-grained Human Reconstruction(细粒度人体重建)
场景重建/视图合成/新视角合成(Novel View Synthesis)
NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction(用于大规模场景重建的融合辐射场)
GeoNeRF: Generalizing NeRF with Geometry Priors(用几何先验概括 NeRF)
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions(用于室内 3D 场景重建的风格迁移)
Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image(看向房间之外:从单张图像合成一致的长时程 3D 场景视频)
Point-NeRF: Point-based Neural Radiance Fields(基于点的神经辐射场)
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields(文本和图像驱动的神经辐射场操作)
keywords: NeRF, Image Generation and Manipulation, Language-Image Pre-Training (CLIP)
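The NeRF keyword above relies on frequency positional encoding of input coordinates; a minimal sketch is given below — a generic NeRF-style encoding, not specific to CLIP-NeRF, with `num_freqs=10` following the common choice for 3D coordinates.

```python
import math
import torch

def positional_encoding(x: torch.Tensor, num_freqs: int = 10) -> torch.Tensor:
    # x: (..., D) coordinates; returns (..., D * 2 * num_freqs) with sin/cos at powers-of-two frequencies
    freqs = (2.0 ** torch.arange(num_freqs, device=x.device, dtype=x.dtype)) * math.pi
    angles = x.unsqueeze(-1) * freqs                 # (..., D, num_freqs)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(start_dim=-2)                 # concatenate per-coordinate encodings
```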
模型压缩(Model Compression)
知识蒸馏(Knowledge Distillation)
Decoupled Knowledge Distillation(解耦知识蒸馏)
Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation(小波知识蒸馏:迈向高效的图像到图像转换)
Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability(知识蒸馏作为高效的预训练:更快的收敛、更高的数据效率和更好的可迁移性)
Focal and Global Knowledge Distillation for Detectors(用于检测器的焦点与全局知识蒸馏)
keywords: Object Detection, Knowledge Distillation
剪枝(Pruning)
Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs(空间剪枝:使用自适应滤波器表示来改进稀疏 CNN 的训练)
量化(Quantization)
Implicit Feature Decoupling with Depthwise Quantization(使用深度量化的隐式特征解耦)
IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization(学习具有类内异质性的合成图像以进行零样本网络量化)
神经网络结构设计(Neural Network Structure Design)
神经网络结构设计(Neural Network Structure Design)
BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning(学习探索样本关系以进行鲁棒表征学习)
keywords: sample relationship, data scarcity learning, Contrastive Self-Supervised Learning, long-tailed recognition, zero-shot learning, domain generalization, self-supervised learning
CNN
TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing(用于布局感知视觉处理的高效平移可变卷积)(动态卷积)
On the Integration of Self-Attention and Convolution(自注意力和卷积的整合)
Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs(将卷积核扩大到 31×31:重新审视 CNN 中的大核设计)
DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos(视频中稀疏帧差异的端到端 CNN 推断)
keywords: sparse convolutional neural network, video inference accelerating
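As a rough illustration of the frame-difference sparsity behind the keywords above (not DeltaCNN's actual sparse kernels), a sketch of a per-pixel update mask between consecutive frames follows; the 0.05 threshold is an arbitrary assumption.

```python
import torch

@torch.no_grad()
def sparse_update_mask(prev_frame, cur_frame, threshold=0.05):
    # frames: (B, C, H, W) in [0, 1]; True where a pixel changed enough to warrant recomputation
    delta = (cur_frame - prev_frame).abs().amax(dim=1, keepdim=True)  # max change over channels
    return delta > threshold
```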
A ConvNet for the 2020s
Transformer
Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition(在视觉transformer中为视觉识别指定协同上下文)
Delving Deep into the Generalization of Vision Transformers under Distribution Shifts(深入研究分布变化下的视觉Transformer的泛化)
keywords: out-of-distribution (OOD) generalization, Vision Transformers
Mobile-Former: Bridging MobileNet and Transformer(连接 MobileNet 和 Transformer)
keywords: Light-weight convolutional neural networks(轻量卷积神经网络),Combination of CNN and ViT
神经网络架构搜索(NAS)
Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning(MAML 的全局收敛和受理论启发的神经架构搜索以进行 Few-Shot 学习)
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search(可微架构搜索的 Beta-Decay 正则化)
MLP
Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information(利用地理和时间信息进行细粒度图像分类的动态 MLP)
Revisiting the Transferability of Supervised Pretraining: an MLP Perspective(重新审视监督预训练的可迁移性:MLP 视角)
An Image Patch is a Wave: Quantum Inspired Vision MLP(图像块即波:受量子启发的视觉 MLP)
数据处理(Data Processing)
数据处理(Data Processing)
Dataset Distillation by Matching Training Trajectories(通过匹配训练轨迹进行数据集蒸馏)(数据集蒸馏)
数据增广(Data Augmentation)
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge(使用教师知识进行数据增强优化)
3D Common Corruptions and Data Augmentation(3D 常见损坏和数据增强)
keywords: Data Augmentation, Image restoration, Photorealistic image synthesis
归一化/正则化(Batch Normalization)
Delving into the Estimation Shift of Batch Normalization in a Network(深入研究网络中批量标准化的估计偏移)
图像聚类(Image Clustering)
RAMA: A Rapid Multicut Algorithm on GPU(GPU 上的快速多重切割算法)
图像压缩(Image Compression)
Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression(用于高效神经图像压缩的统一多元高斯混合)
ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding(具有不均匀分组的空间通道上下文自适应编码的高效学习图像压缩)
The Devil Is in the Details: Window-based Attention for Image Compression(细节中的魔鬼:图像压缩的基于窗口的注意力)
Neural Data-Dependent Transform for Learned Image Compression(用于学习图像压缩的神经数据相关变换)
异常检测(Anomaly Detection)
ViM: Out-Of-Distribution with Virtual-logit Matching(具有虚拟 logit 匹配的分布外)(OOD检测)
Generative Cooperative Learning for Unsupervised Video Anomaly Detection(用于无监督视频异常检测的生成式协作学习)
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection(用于异常检测的自监督预测卷积注意力块)(论文暂未上传)
模型训练/泛化(Model Training/Generalization)
模型训练/泛化(Model Training/Generalization)
Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective(神经网络可以两次学习相同的模型吗? 从决策边界的角度研究可重复性和双重下降)
Towards Efficient and Scalable Sharpness-Aware Minimization(迈向高效和可扩展的锐度感知最小化)
keywords: Sharp Local Minima, Large-Batch Training
CAFE: Learning to Condense Dataset by Aligning Features(通过对齐特征学习压缩数据集)
keywords: dataset condensation, coreset selection, generative models
The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration(魔鬼藏在间隔中:用于网络校准的基于间隔的标签平滑)
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising(通过引入查询去噪加速 DETR 训练)
keywords: Detection Transformer
噪声标签(Noisy Label)
Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels(带有噪声标签的学习中噪声检测的可扩展惩罚回归)
长尾分布(Long-Tailed Distribution)
Targeted Supervised Contrastive Learning for Long-Tailed Recognition(用于长尾识别的有针对性的监督对比学习)
keywords: Long-Tailed Recognition(长尾识别), Contrastive Learning(对比学习)
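The contrastive-learning keyword above builds on supervised contrastive objectives; a minimal SupCon-style loss sketch follows for orientation — it does not include the targeted class-centre assignment of the paper, and the temperature is an assumption.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    # features: (N, D) embeddings, labels: (N,) integer class ids
    z = F.normalize(features, dim=1)
    logits = z @ z.t() / temperature                              # (N, N) pairwise similarities
    n = logits.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    # softmax denominator over all other samples (self excluded)
    denom = torch.logsumexp(logits.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    log_prob = logits - denom
    pos_counts = pos_mask.sum(1).clamp(min=1)                     # avoid division by zero
    mean_log_prob_pos = (log_prob * pos_mask).sum(1) / pos_counts
    valid = pos_mask.sum(1) > 0                                   # anchors with at least one positive
    return -(mean_log_prob_pos[valid]).mean()
```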
图像特征提取与匹配(Image feature extraction and matching)
图像特征提取与匹配(Image feature extraction and matching)
Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences(弱监督语义对应的概率扭曲一致性)
视觉表征学习(Visual Representation Learning)
视觉表征学习(Visual Representation Learning)
SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization(通过相似性感知归一化探索场景文本的自监督表示学习)
Exploring Set Similarity for Dense Self-supervised Representation Learning(探索密集自监督表示学习的集合相似性)
Motion-aware Contrastive Video Representation Learning via Foreground-background Merging(通过前景-背景合并的运动感知对比视频表示学习)
多模态学习(Multi-Modal Learning)
多模态学习(Multi-Modal Learning)
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound(通过视觉、语言和声音的神经脚本知识)
视觉-语言(Vision-language)
An Empirical Study of Training End-to-End Vision-and-Language Transformers(训练端到端视觉-语言 Transformer 的实证研究)
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding(为视觉定位生成伪语言查询)
Conditional Prompt Learning for Vision-Language Models(视觉语言模型的条件提示学习)
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks(视觉和视觉语言任务中的自然语言解释模型)
L-Verse: Bidirectional Generation Between Image and Text(图像和文本之间的双向生成)(Oral Presentation)
HairCLIP: Design Your Hair by Text and Reference Image(通过文本和参考图像设计你的头发)
keywords: Language-Image Pre-Training (CLIP), Generative Adversarial Networks
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields(文本和图像驱动的神经辐射场操作)
keywords: NeRF, Image Generation and Manipulation, Language-Image Pre-Training (CLIP)
Vision-Language Pre-Training with Triple Contrastive Learning(三重对比学习的视觉语言预训练)
keywords: Vision-language representation learning, Contrastive Learning
视觉预测(Vision-based Prediction)
视觉预测(Vision-based Prediction)
Remember Intentions: Retrospective-Memory-based Trajectory Prediction(记住意图:基于回顾性记忆的轨迹预测)
GaTector: A Unified Framework for Gaze Object Prediction(凝视对象预测的统一框架)
On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles(自动驾驶汽车轨迹预测的对抗鲁棒性)
Adaptive Trajectory Prediction via Transferable GNN(基于可迁移 GNN 的自适应轨迹预测)
Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective(迈向稳健和自适应运动预测:因果表示视角)
How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting(多少个观察就足够了? 轨迹预测的知识蒸馏)
keywords: Knowledge Distillation, trajectory forecasting
Motron: Multimodal Probabilistic Human Motion Forecasting(多模态概率人体运动预测)
数据集(Dataset)
数据集(Dataset)
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining(电子商务多模态预训练的自协调对比学习)(多模态预训练数据集)
FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos(用于视频中面部表情识别的大规模多场景数据集)
Ego4D: Around the World in 3,000 Hours of Egocentric Video(3000 小时以自我为中心的视频环游世界)
GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive Recognition of Cereal Grains(用于细粒度和域自适应识别谷物的大规模数据集)
Kubric: A scalable dataset generator(Kubric:可扩展的数据集生成器)
A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection(用于分段级视频复制检测的大规模综合数据集和复制重叠感知评估协议)
主动学习(Active Learning)
主动学习(Active Learning)
Active Learning by Feature Mixing(通过特征混合进行主动学习)
小样本学习/零样本学习(Few-shot Learning/Zero-shot Learning)
小样本学习/零样本学习(Few-shot Learning/Zero-shot Learning)
Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification(小样本分类的相互集中学习)
MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning(用于零样本学习的相互语义蒸馏网络)
keywords: Zero-Shot Learning, Knowledge Distillation
持续学习(Continual Learning/Life-long Learning)
持续学习(Continual Learning/Life-long Learning)
Meta-attention for ViT-backed Continual Learning(ViT 支持的持续学习的元注意力)
Learning to Prompt for Continual Learning(学习提示持续学习)
On Generalizing Beyond Domains in Cross-Domain Continual Learning(关于跨域持续学习中的域外泛化)
场景图(Scene Graph)
场景图生成(Scene Graph Generation)
Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation(用于无偏场景图生成的堆叠混合注意力和组协作学习)
Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs(将视频场景图重新表述为时序二分图)
keywords: Video Scene Graph Generation, Transformer, Video Grounding
视觉定位/位姿估计(Visual Localization/Pose Estimation)
视觉定位/位姿估计(Visual Localization/Pose Estimation)
DiffPoseNet: Direct Differentiable Camera Pose Estimation(直接可微分相机位姿估计)
ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation(用于 6DoF 对象姿态估计的粗到细表面编码)
Object Localization under Single Coarse Point Supervision(单粗点监督下的目标定位)
CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data(由多模态合成数据辅助的可扩展空中定位)
GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting(通过几何引导的逐点投票进行类别级对象位姿估计)
CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild(CPPF:在野外实现稳健的类别级 9D 位姿估计)
OVE6D: Object Viewpoint Encoding for Depth-based 6D Object Pose Estimation(用于基于深度的 6D 对象位姿估计的对象视点编码)
Spatial Commonsense Graph for Object Localisation in Partial Scenes(局部场景中对象定位的空间常识图)
视觉推理/视觉问答(Visual Reasoning/VQA)
视觉推理/视觉问答(Visual Reasoning/VQA)
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering(基于知识的视觉问答的多模态知识提取与积累)
REX: Reasoning-aware and Grounded Explanation(具有推理感知且有依据的解释)
图像分类(Image Classification)
图像分类(Image Classification)
GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes Prediction(用于多类别属性预测的基于全局、局部和内在的密集嵌入网络)
keywords: multi-label classification
迁移学习/domain/自适应(Transfer Learning/Domain Adaptation)
迁移学习/domain/自适应(Transfer Learning/Domain Adaptation)
Learning Affordance Grounding from Exocentric Images(从第三人称(外中心)图像中学习可供性定位)
Category Contrast for Unsupervised Domain Adaptation in Visual Tasks(视觉任务中无监督域适应的类别对比)
Learning Distinctive Margin toward Active Domain Adaptation(面向主动域自适应学习有判别性的间隔)
How Well Do Sparse Imagenet Models Transfer?(稀疏 Imagenet 模型的迁移效果如何?)
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation(用于手语翻译的简单多模态迁移学习基线)
Weakly Supervised Object Localization as Domain Adaption(作为域适应的弱监督对象定位)
keywords: Weakly Supervised Object Localization(WSOL), Multi-instance learning based WSOL, Separated-structure based WSOL, Domain Adaption
度量学习(Metric Learning)
度量学习(Metric Learning)
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning(双曲视觉 Transformer:结合度量学习中的改进)
Non-isotropy Regularization for Proxy-based Deep Metric Learning(基于代理的深度度量学习的非各向同性正则化)
Integrating Language Guidance into Vision-based Deep Metric Learning(将语言指导集成到基于视觉的深度度量学习中)
Enhancing Adversarial Robustness for Deep Metric Learning(增强深度度量学习的对抗鲁棒性)
keywords: Adversarial Attack, Adversarial Defense, Deep Metric Learning
对比学习(Contrastive Learning)
对比学习(Contrastive Learning)
Rethinking Minimal Sufficient Representation in Contrastive Learning(重新思考对比学习中的最小充分表示)
Selective-Supervised Contrastive Learning with Noisy Labels(带有噪声标签的选择性监督对比学习)
HCSC: Hierarchical Contrastive Selective Coding(分层对比选择性编码)
keywords: Self-supervised Representation Learning, Deep Clustering, Contrastive Learning
Crafting Better Contrastive Views for Siamese Representation Learning(为孪生表示学习构建更好的对比视图)
增量学习(Incremental Learning)
增量学习(Incremental Learning)
Forward Compatible Few-Shot Class-Incremental Learning(前向兼容的小样本类增量学习)
Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning(非示例类增量学习的自我维持表示扩展)
元学习(Meta Learning)
元学习(Meta Learning)
What Matters For Meta-Learning Vision Regression Tasks?(元学习视觉回归任务的重要性是什么?)
机器人(Robotic)
机器人(Robotic)
Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation(通过离散化实现视觉机器人操作的高效学习)
IFOR: Iterative Flow Minimization for Robotic Object Rearrangement(IFOR:机器人对象重排的迭代流最小化)
自监督学习/半监督学习/无监督学习(Self-supervised Learning/Semi-supervised Learning)
自监督学习/半监督学习/无监督学习(Self-supervised Learning/Semi-supervised Learning)
SimMatch: Semi-supervised Learning with Similarity Matching(具有相似性匹配的半监督学习)
Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements(一个完全无监督的框架,用于从带噪和不完整的测量中学习成像)
UniVIP: A Unified Framework for Self-Supervised Visual Pre-training(自监督视觉预训练的统一框架)
Class-Aware Contrastive Semi-Supervised Learning(类感知对比半监督学习)
keywords: Semi-Supervised Learning, Self-Supervised Learning, Real-World Unlabeled Data Learning
A study on the distribution of social biases in self-supervised learning visual models(自监督视觉模型中社会偏见分布的研究)
神经网络可解释性(Neural Network Interpretability)
神经网络可解释性(Neural Network Interpretability)
Do Explanations Explain? Model Knows Best(解释真的能解释吗?模型最清楚)
Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks(神经网络中可解释的部分-整体层次结构和概念语义关系)
图像计数(Image Counting)
图像计数(Image Counting)
Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting(表示、比较和学习:用于类不可知计数的相似性感知框架)
Boosting Crowd Counting via Multifaceted Attention(通过多方面注意提高人群计数)
联邦学习(Federated Learning)
联邦学习(Federated Learning)
FedDC: Federated Learning with Non-IID Data via Local Drift Decoupling and Correction(通过局部漂移解耦和校正与非 IID 数据进行联邦学习)
Federated Class-Incremental Learning(联邦类增量学习)
Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning(通过非 IID 联邦学习的无数据知识蒸馏微调全局模型)
Differentially Private Federated Learning with Local Regularization and Sparsification(局部正则化和稀疏化的差分私有联邦学习)
其他
其他
L-Verse: Bidirectional Generation Between Image and Text(图像和文本之间的双向生成)(视觉语言表征学习)
Backbone
Backbone
MPViT: Multi-Path Vision Transformer for Dense Prediction
CLIP
CLIP
PointCLIP: Point Cloud Understanding by CLIP
Blended Diffusion for Text-driven Editing of Natural Images
GAN
GAN
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
NAS
NAS
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
NeRF
NeRF
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images
Urban Radiance Fields
Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
Visual Transformer
Backbone
MPViT: Multi-Path Vision Transformer for Dense Prediction
应用(Application)
Embracing Single Stride 3D Object Detector with Sparse Transformer
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
GroupViT: Semantic Segmentation Emerges from Text Supervision
Splicing ViT Features for Semantic Appearance Transfer
Mask Transfiner for High-Quality Instance Segmentation
数据增强(Data Augmentation)
数据增强(Data Augmentation)
AlignMix: Improving representation by interpolating aligned features
语义分割(Semantic Segmentation)
无监督语义分割
GroupViT: Semantic Segmentation Emerges from Text Supervision
实例分割(Instance Segmentation)
实例分割(Instance Segmentation)
Mask Transfiner for High-Quality Instance Segmentation
自监督实例分割
FreeSOLO: Learning to Segment Objects without Annotations
图像编辑(Image Editing)
图像编辑(Image Editing)
Blended Diffusion for Text-driven Editing of Natural Images
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Low-level Vision
Low-level Vision
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
超分辨率(Super-Resolution)
图像超分辨率(Image Super-Resolution)
Learning the Degradation Distribution for Blind Image Super-Resolution
视频超分辨率(Video Super-Resolution)
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
3D点云(3D Point Cloud)
3D点云(3D Point Cloud)
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
PointCLIP: Point Cloud Understanding by CLIP
3D目标检测(3D Object Detection)
3D目标检测(3D Object Detection)
Embracing Single Stride 3D Object Detector with Sparse Transformer
3D语义场景补全(3D Semantic Scene Completion)
3D语义场景补全(3D Semantic Scene Completion)
MonoScene: Monocular 3D Semantic Scene Completion
3D重建(3D Reconstruction)
3D重建(3D Reconstruction)
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
深度估计(Depth Estimation)
单目深度估计
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
人群计数(Crowd Counting)
人群计数(Crowd Counting)
Leveraging Self-Supervision for Cross-Domain Crowd Counting
医学图像(Medical Image)
医学图像(Medical Image)
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
场景图生成(Scene Graph Generation)
场景图生成(Scene Graph Generation)
SGTR: End-to-end Scene Graph Generation with Transformer
数据集(Datasets)
数据集(Datasets)
It’s About Time: Analog Clock Reading in the Wild
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
新任务(New Task)
新任务(New Task)
It’s About Time: Analog Clock Reading in the Wild
Splicing ViT Features for Semantic Appearance Transfer