arxiv.org

cs.CV updates on arXiv.org

Latest posts

Last updated about 12 hours ago

An Improved Method for Personalizing Diffusion Models

about 12 hours ago

arXiv:2407.05312v2 Announce Type: replace Abstract: Diffusion models have demonstrated impressive image generation...

ResCLIP: Residual Attention for Training-free Dense Vision-language Inference

about 12 hours ago

arXiv:2411.15851v2 Announce Type: replace Abstract: While vision-language models like CLIP have shown...

GS-ROR$^2$: Bidirectional-guided 3DGS and SDF for Reflective Object Relighting and Reconstruction

about 12 hours ago

arXiv:2406.18544v4 Announce Type: replace Abstract: 3D Gaussian Splatting (3DGS) has shown a...

Neuron Populations Exhibit Divergent Selectivity with Scale

about 12 hours ago

arXiv:2606.03990v1 Announce Type: cross Abstract: We investigate whether neuron populations within neural...

MAdam: Metric-Aware Multi-Objective Adam

about 12 hours ago

arXiv:2606.03904v1 Announce Type: cross Abstract: Multi-objective optimization (MOO) underlies many machine learning...

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

about 12 hours ago

arXiv:2606.03985v1 Announce Type: cross Abstract: We introduce Humanoid-GPT, a GPT-style Transformer with...

SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction

about 12 hours ago

arXiv:2606.03940v1 Announce Type: cross Abstract: In robotics systems, vast amounts of visual...

Face versus Body Tracking for Human-Robot Interaction: An Egocentric Dataset

about 12 hours ago

arXiv:2606.03694v1 Announce Type: cross Abstract: To enable meaningful human-robot interaction (HRI), a...

PHASER: Phase-Aware and Semantic Experience Replay for Vision-Language-Action Models

about 12 hours ago

arXiv:2606.03598v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have achieved remarkable success...

Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models

about 12 hours ago

arXiv:2606.03793v1 Announce Type: cross Abstract: Multimodal Large Language Models integrate visual perception...

Does Language Shift Break Medical Vision-Language Models? Indonesian Radiology Visual Question Answering Case Study

about 12 hours ago

arXiv:2606.03693v1 Announce Type: cross Abstract: Medical Vision-Language Models (VLMs) are typically evaluated...

SagaQA: A Multi-hop Reasoning Benchmark for Long-form Narrative Understanding in TV Series

about 12 hours ago

arXiv:2606.03301v1 Announce Type: cross Abstract: We introduce SagaQA, a long-form video benchmark...
