[TOMM] REFINE: Composed Video Retrieval via Shared and Differential Semantics Enhancement

1 Shandong University, China
2 Harbin Institute of Technology (Shenzhen), China
* Corresponding author.

Abstract

Shareability vs. Variability in CVR

(a) illustrates a case of the CVR task, (b) visualizes the shareability in the video, and (c) highlights the variability in the video.


Framework: shaREd and diFferential semantIcs eNhancement nEtwork (REFINE)

Our REFINE framework comprises three core modules: (a) Shared Feature Enhancement, (b) Differential Semantic Disentanglement, and (c) Associative Object Aggregation.


Experiment

Performance comparison on WebVid-CoVR-Test (R@k (%)). The overall best results are in bold, while the best results over baselines are underlined.
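For readers unfamiliar with the metric, R@k is the percentage of queries whose ground-truth target appears among the k highest-ranked candidates. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name `recall_at_k`, the similarity matrix, and the target indices are all hypothetical and chosen only for illustration.

```python
import numpy as np

def recall_at_k(similarity, target_index, k):
    """Percentage of queries whose target is among the k highest-scoring candidates.

    similarity:   (num_queries, num_candidates) array of query-candidate scores
    target_index: (num_queries,) array with the index of each query's target
    """
    similarity = np.asarray(similarity, dtype=float)
    target_index = np.asarray(target_index)
    # Rank candidates for each query from most to least similar, keep the top k.
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    # A query is a hit if its target index appears anywhere in its top-k list.
    hits = (top_k == target_index[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Toy data (hypothetical): 3 queries scored against 4 candidates.
sim = np.array([
    [0.9, 0.1, 0.3, 0.2],
    [0.2, 0.8, 0.7, 0.1],
    [0.4, 0.3, 0.2, 0.6],
])
targets = np.array([0, 2, 1])

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(sim, targets, k):.2f}")
```

In this toy example the three targets sit at ranks 1, 2, and 3 respectively, so each increase in k recovers one more query.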


Performance comparison on FashionIQ (R@k (%)). The overall highest performance is indicated in bold, while the top-performing results over the baseline methods are underlined.


Performance comparison on Shoes (R@k (%)). The overall highest performance is indicated in bold, while the top-performing results over the baseline methods are underlined.


Performance comparison on CIRR with respect to R@k (%) and R_subset@k (%). The overall highest performance is indicated in bold, while the top-performing results over the baseline methods are underlined.
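The subset variant of recall differs from plain R@k only in the candidate pool: each query is ranked against a small, query-specific subset of candidates rather than the whole gallery. A minimal sketch under that assumption follows; the function name `recall_subset_at_k`, the scores, and the subsets are hypothetical and do not reproduce the official CIRR evaluation protocol.

```python
import numpy as np

def recall_subset_at_k(similarity, target_index, subset_indices, k):
    """Recall@k where each query is ranked only within its own candidate subset.

    similarity:     (num_queries, num_candidates) array of scores
    target_index:   target candidate index for each query
    subset_indices: for each query, the candidate indices forming its subset
                    (the target is assumed to be a member of the subset)
    """
    similarity = np.asarray(similarity, dtype=float)
    hits = 0
    for scores, target, subset in zip(similarity, target_index, subset_indices):
        subset = np.asarray(subset)
        # Order the subset members by descending score for this query.
        ranked = subset[np.argsort(-scores[subset])]
        hits += int(target in ranked[:k])
    return 100.0 * hits / len(target_index)

# Toy data (hypothetical): 3 queries, 4 gallery candidates, subsets of size 3.
sim = np.array([
    [0.9, 0.1, 0.3, 0.2],
    [0.2, 0.8, 0.7, 0.1],
    [0.4, 0.3, 0.2, 0.6],
])
targets = [0, 2, 1]
subsets = [[0, 1, 2], [1, 2, 3], [1, 2, 3]]

for k in (1, 2):
    print(f"R_subset@{k} = {recall_subset_at_k(sim, targets, subsets, k):.2f}")
```

Because the pool is smaller, a target that ranks low in the full gallery can still rank high within its subset, which is why the two metrics are reported separately.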


Ablation studies of REFINE with different components and various settings on WebVid-CoVR, FashionIQ, Shoes, and CIRR. Note that Var# is the index of each configuration.


Sensitivity to (a) Number of Semantic Clusters and (b) Number of Enhanced Tokens.


Case Study on (a) WebVid-CoVR, (b) FashionIQ, (c) Shoes, and (d) CIRR.

BibTeX


        @article{hurefine,
        title={REFINE: Composed Video Retrieval via Shared and Differential Semantics Enhancement},
        author={Hu, Yupeng and Li, Zixu and Chen, Zhiwei and Huang, Qinlei and Fu, Zhiheng and Xu, Mingzhu and Nie, Liqiang},
        journal={ACM Transactions on Multimedia Computing, Communications and Applications},
        publisher={ACM New York, NY}
        }