image

paper , code

TL;DR

  • I read this because.. : ์–ธ๊ธ‰๋˜์–ด. ํ•œ ์ด๋ฏธ์ง€๋ฅผ ํ‘œํ˜„ํ•˜๋Š” ํ…์ŠคํŠธ๊ฐ€ ์—ฌ๋Ÿฌ๊ฐœ๊ฐ€ ๋  ์ˆ˜ ์žˆ์Œ. ์ด์— ๋Œ€ํ•œ ambiguity?!(์†ก๊ฐ•ํ˜ธ, ๋‚จ์ž ๋ฐฐ์šฐ, ๋‚จ์ž)
  • task : contrastive learning
  • problem : ํ•œ ์ด๋ฏธ์ง€์— ๋Œ€ํ•ด ํ…์ŠคํŠธ๊ฐ€ ํ‘œํ˜„ํ•  ๋•Œ ๋‹ค์–‘ํ•œ ์ธต์œ„์—์„œ ํ‘œํ˜„๋  ์ˆ˜ ์žˆ์Œ(๊ฐœ๊ฐ€ ๋ˆˆ ์œ„์— ์„œ ์žˆ๋‹ค, ๊ฐ•์•„์ง€, ใ„ฑใ…‡ใ…‡~)
  • idea : CLIP์˜ ์ž„๋ฒ ๋”ฉ ๊ณต๊ฐ„์„ euclidean ๊ณต๊ฐ„์ด ์•„๋‹ˆ๋ผ hyperbolic ๊ณต๊ฐ„์œผ๋กœ ์˜ฎ๊ธฐ์ž
  • input/output : image/text -> score
  • architecture : CLIP๊ณผ ๊ฐ™์Œ
  • objective : contrastive + entailment loss
  • baseline : CLIP trained with YFCC-100M(by SLIP)
  • data : YFCC-100M
  • evaluation : image text retrieval, zs-image classification
  • result : ๊ฐœ์„ ๋œ ์„ฑ๋Šฅ. ํŠน์ • ์ด๋ฏธ์ง€์—์„œ [ROOT]์— ๋Œ€ํ•ด traverse ํ•˜๋ฉด์„œ ๋‚˜์˜ค๋Š” text๊ฐ€ ์ ์  genericํ•ด์ง„๊ฑธ ๋ณด์ž„.
  • contribution : ์•„๋งˆ CLIP์„ hyperbolic space์—์„œ ํ•œ ์ฒซ work?
  • etc. :

Details

Motivation

image

Arch

image

Lifting embeddings onto the hyperboloid

CLIP encoder๋ฅผ ํ†ต๊ณผํ•˜๋ฉด ๊ฐ๊ฐ์˜ ์ด๋ฏธ์ง€, ํ…์ŠคํŠธ ๋ฒกํ„ฐ๋Š” n์ฐจ์›์˜ ๋ฒกํ„ฐ๋กœ ๋‚˜์˜ค๊ณ  ์—ฌ๊ธฐ์— origin 0๋ฒกํ„ฐ๋ฅผ ์ถ”๊ฐ€ํ•˜๋Š” transformation์„ ์ ์šฉ $v =[v_{enc}, 0]\in\mathbb{R}^{n+1}$ ์ด origin O์˜ tangent space์— ๋“ค์–ด๊ฐ€๊ฒŒ ๋˜๊ณ , ์ด๋Ÿฌ๋ฉด 0๊ณผ ๋‚ด์ ํ•˜๋ฉด 0์ด๋˜๋Š” ์กฐ๊ฑด์„ ์ถฉ์กฑํ•˜๊ฒŒ ๋œ๋‹ค. Lorents ๋ชจ๋ธ์˜ space ๊ณต๊ฐ„์— ๋Œ€ํ•ด์„œ๋งŒ ๊ณ„์‚ฐํ•˜๊ฒŒ ๋˜๋ฉด ๋œ๋‹ค. ๊ทธ๋Ÿด ๊ฒฝ์šฐ์— x ๋ฒกํ„ฐ์— ๋Œ€ํ•œ exponential map(tangent space -> manifold๋กœ ํˆฌ์˜ํ•˜๋Š” map vectors)์€ ์•„๋ž˜์™€ ๊ฐ™์ด ์ •๋ฆฌ๋œ๋‹ค. image

image

์ฆ‰ CLIP encoder์—์„œ ๋‚˜์˜จ ์ž„๋ฒ ๋”ฉ์—๋‹ค๊ฐ€ ์ € transformation์„ ์ ์šฉํ•˜๋ฉด hyperbolic space๋กœ ๊ฐ€๊ฒŒ ๋œ๋‹ค.

Lorents inner product๋Š” ์•„๋ž˜์™€ ๊ฐ™์œผ๋ฏ€๋กœ ๋‚ด์ ์„ ํ†ตํ•ด similiarity๋ฅผ ๊ตฌํ•˜๊ณ  contrastive loss๋ฅผ ์ถ”๊ฐ€ํ•˜๋ฉด ๋œ๋‹ค image

Entailment loss

image

์•„๋ž˜์™€ ๊ฐ™์€ loss๋ฅผ contrastive loss์— ์ถ”๊ฐ€ํ•ด์คŒ ์ˆ˜ํ•™์  ์ดํ•ด๋Š” ์ž˜ ๋ชจ๋ฅด๊ฒ ๊ณ  ์ด loss๋ฅผ ์ถ”๊ฐ€ํ•˜๋Š” ์ง๊ด€์€ {Text-image}ํŽ˜์–ด๊ฐ€ ์žˆ์„ ๋•Œ text๊ฐ€ image๋ฅผ entail ํ•ด์•ผ ํ•จ. image

image

Results

image image
  • ํ…์ŠคํŠธ๊ฐ€ ์ข€ ๋” generic ํ•˜๊ณ  ๋„๋ฆฌ ๋ถ„ํฌ๋˜์–ด ์žˆ์Œ
  • ๋‘˜์˜ ๊ณต๊ฐ„์ด ์•„์˜ˆ ๋ถ„๋ฆฌ ๋˜์–ด ์žˆ์Œ

Ablations

image image

Image Traverse

image