Visual Question Answering: From Theory to Application

Authors: Wu, Qi; Wang, Peng; Wang, Xin
Publisher: Springer
Publication date: 2022/05/14
Language: English

List price: NT$9599

Book Description

Contents

1 Introduction
  1.1 Motivation
  1.2 Visual Question Answering in AI tasks
  1.3 Categorisation of VQA
    1.3.1 Classified by Data Settings
    1.3.2 Classified by Task Settings
    1.3.3 Others
  1.4 Book Overview
  References

Part I Preliminaries

2 Deep Learning Basics
  2.1 Neural Networks
  2.2 Convolutional Neural Networks
  2.3 Recurrent Neural Networks and variants
  2.4 Encoder-Decoder Structure
  2.5 Attention Mechanism
  2.6 Memory Networks
  2.7 Transformer Networks and BERT
  2.8 Graph Neural Networks Basics
  References

3 Question Answering (QA) Basics
  3.1 Rule-based methods
  3.2 Information retrieval-based methods
  3.3 Neural Semantic Parsing for QA
  3.4 Knowledge Base for QA
  References

Part II Image-based VQA

4 The Classical Visual Question Answering
  4.1 Introduction
  4.2 Datasets
  4.3 Generation VS. Classification: Two answering policies
  4.4 Joint Embedding Methods
    4.4.1 Sequence-to-Sequence Encoder-Decoder Models
    4.4.2 Bilinear Encoding for VQA
  4.5 Awesome Attention Mechanisms
    4.5.1 Stacked Attention Networks
    4.5.2 Hierarchical Question-Image Co-attention

About the Authors

Dr. Qi Wu is Senior Lecturer at the University of Adelaide and Chief Investigator at the ARC Centre of Excellence for Robotic Vision. He is also Director of Vision-and-Language Methods at the Australian Institute for Machine Learning. Dr. Wu has worked in the Computer Vision field for 10 years and has a strong track record, having pioneered the field of Vision-and-Language, one of the most interesting and technically challenging areas of Computer Vision. This area, which has emerged over the last 5 years, represents the application of computer vision technology to problems that are closer to Artificial Intelligence. Dr. Wu has made breakthroughs in methods and conceptual understanding to advance the field and is recognised as an international leader in the discipline. Beyond publishing some of the seminal papers in the area, he has organised a series of workshops at CVPR, ICCV, and ACL and authored key benchmarks that define the field. Recently, he led a team that won second place in the VATEX Video Captioning Challenge and first place in both the TextVQA Challenge and the MedicalVQA Challenge. His achievements have been recognised with the Australian Academy of Science J G Russell Award in 2019, one of four awards to ECRs across Australia, and an NVIDIA Pioneer Research Award.

Dr. Peng Wang is Professor at the School of Computer Science, Northwestern Polytechnical University, China. He previously served at the School of Computer Science, University of Adelaide, for four years. His research interests include computer vision, machine learning, and artificial intelligence.

Dr. Xin Wang is currently Assistant Professor at the Department of Computer Science and Technology, Tsinghua University. His research interests include cross-modal multimedia intelligence and inferable recommendations in social media. He has published several high-quality research papers at top conferences including ICML, KDD, WWW, SIGIR, and ACM Multimedia. In addition to being selected for the 2017 China Postdoctoral innovative talents supporting program, he received the ACM China Rising Star Award in 2020.

Dr. Xiaodong He is Deputy Managing Director of JD AI Research; Head of the Deep Learning, NLP and Speech Lab; and Technical Vice President of JD.com. He is also Affiliate Professor at the University of Washington (Seattle), where he serves on doctoral supervisory committees. His research interests are mainly in artificial intelligence areas including deep learning, natural language, computer vision, speech, information retrieval, and knowledge representation. He has published more than 100 papers in ACL, EMNLP, NAACL, CVPR, SIGIR, WWW, CIKM, NIPS, ICLR, ICASSP, Proc. IEEE, IEEE TASLP, IEEE SPM, and other venues. He has received several awards including the Outstanding Paper Award at ACL 2015. He is Co-inventor of the DSSM, which is now broadly applied to language, vision, IR, and knowledge representation tasks. He also led the development of CaptionBot, the world's first image captioning cloud service, deployed in 2016. He and colleagues have won major AI challenges including the 2008 NIST MT Eval, IWSLT 2011, COCO Captioning Challenge 2015, and VQA 2017. His work has been widely integrated into influential software and services including Microsoft Image Caption Services, Bing & Ads, Seeing AI, Word, and PowerPoint. He has held editorial positions with several IEEE journals, served as Area Chair for NAACL-HLT 2015, and served on the organizing and program committees of major speech and language processing conferences. He is IEEE Fellow and Member of the ACL.

Wenwu Zhu is currently Professor in the Department of Computer Science and Technology at Tsinghua University and Vice Dean of the National Research Center for Information Science and Technology. Prior to his current post, he was Senior Researcher and Research Manager at Microsoft Research Asia. He was Chief Scientist and Director at Intel Research China from 2004 to 2008. He worked at Bell Labs, New Jersey, as Member of Technical Staff from 1996 to 1999. He received his Ph.D. degree from New York University in 1996.

His current research interests are in the area of data-driven multimedia networking and multimedia intelligence. He has published over 350 refereed papers and is Inventor or Co-inventor of over 50 patents. He received eight Best Paper Awards, including ACM Multimedia 2012 and IEEE Transactions on Circuits and Systems for Video Technology in 2001 and 2019.

He served as EiC for IEEE Transactions on Multimedia (2017-2019). He serves as Chair of the steering committee for IEEE Transactions on Multimedia and as Associate EiC for IEEE Transactions on Circuits and Systems for Video Technology. He served as General Co-Chair for ACM Multimedia 2018 and ACM CIKM 2019. He is AAAS Fellow, IEEE Fellow, SPIE Fellow, and Member of The Academy of Europe (Academia Europaea).

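Product Details

ISBN: 9789811909634
Format: Hardcover / 254 pages / 23.39 x 15.6 x 1.6 cm / General audience / First edition
Place of publication: United States

Category: Popular Science > Computers > Introduction to Computing
Category: Popular Science > Computers > Artificial Intelligence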
