[172] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

August 30, 2024 · 2 min · long8v

[165] Rich Human Feedback for Text-to-Image Generation

July 19, 2024 · 2 min · long8v

[146] Transformer Interpretability Beyond Attention Visualization

February 6, 2024 · 3 min · long8v

[131] Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels

September 13, 2023 · 2 min · long8v

[125] RILS: Masked Visual Reconstruction in Language Semantic Space

August 2, 2023 · 2 min · long8v

feat: add sparse rcnn

July 24, 2023 · 1 min · long8v

[108] Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships

April 4, 2023 · 3 min · long8v

[90] Neural Collaborative Graph Machines for Table Structure Recognition

December 22, 2022 · 1 min · long8v

[87] Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation

December 8, 2022 · 2 min · long8v

[85] Dynamic Head: Unifying Object Detection Heads with Attentions

December 1, 2022 · 2 min · long8v

[51] Structured Sparse R-CNN for Direct Scene Graph Generation

August 19, 2022 · 2 min · long8v

[36] SGTR: End-to-end Scene Graph Generation with Transformer

July 19, 2022 · 2 min · long8v

[28] Learning to Compare: Relation Network for Few-Shot Learning

May 31, 2022 · 1 min · long8v