arxiv.org

cs.CV updates on arXiv.org

Latest posts

Last updated about 12 hours ago

An Improved Method for Personalizing Diffusion Models

about 12 hours ago

arXiv:2407.05312v2 Announce Type: replace Abstract: Diffusion models have demonstrated impressive image generation...

ResCLIP: Residual Attention for Training-free Dense Vision-language Inference

about 12 hours ago

arXiv:2411.15851v2 Announce Type: replace Abstract: While vision-language models like CLIP have shown...

GS-ROR$^2$: Bidirectional-guided 3DGS and SDF for Reflective Object Relighting and Reconstruction

about 12 hours ago

arXiv:2406.18544v4 Announce Type: replace Abstract: 3D Gaussian Splatting (3DGS) has shown a...

Neuron Populations Exhibit Divergent Selectivity with Scale

about 12 hours ago

arXiv:2606.03990v1 Announce Type: cross Abstract: We investigate whether neuron populations within neural...

MAdam: Metric-Aware Multi-Objective Adam

about 12 hours ago

arXiv:2606.03904v1 Announce Type: cross Abstract: Multi-objective optimization (MOO) underlies many machine learning...

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

about 12 hours ago

arXiv:2606.03985v1 Announce Type: cross Abstract: We introduce Humanoid-GPT, a GPT-style Transformer with...

SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction

about 12 hours ago

arXiv:2606.03940v1 Announce Type: cross Abstract: In robotics systems, vast amounts of visual...

Face versus Body Tracking for Human-Robot Interaction: An Egocentric Dataset

about 12 hours ago

arXiv:2606.03694v1 Announce Type: cross Abstract: To enable meaningful human-robot interaction (HRI), a...

PHASER: Phase-Aware and Semantic Experience Replay for Vision-Language-Action Models

about 12 hours ago

arXiv:2606.03598v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have achieved remarkable success...

Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models

about 12 hours ago

arXiv:2606.03793v1 Announce Type: cross Abstract: Multimodal Large Language Models integrate visual perception...

Does Language Shift Break Medical Vision-Language Models? Indonesian Radiology Visual Question Answering Case Study

about 12 hours ago

arXiv:2606.03693v1 Announce Type: cross Abstract: Medical Vision-Language Models (VLMs) are typically evaluated...

SagaQA: A Multi-hop Reasoning Benchmark for Long-form Narrative Understanding in TV Series

about 12 hours ago

arXiv:2606.03301v1 Announce Type: cross Abstract: We introduce SagaQA, a long-form video benchmark...