Multimodal sentiment analysis

<h2 id="features">Features</h2>
<a href="/facts/Feature_engineering/VkNJFAFb">Feature engineering</a>, which involves the selection of features that are fed into <a href="/facts/Machine_learning/e0w0XJTu">machine learning</a> algorithms, plays a key role in the sentiment classification performance.<a class="footnote-ref" id="fnref:12" href="#fn:12">12</a> In multimodal sentiment analysis, a combination of different textual, audio, and visual features are employed.<a class="footnote-ref" id="fnref:13" href="#fn:13">13</a>

<h3>Textual features</h3>
Similar to the conventional text-based <a href="/facts/Sentiment_analysis/QRdywf6g">sentiment analysis</a>, some of the most commonly used textual features in multimodal sentiment analysis are <a href="/facts/N-grams/v4BT6rPP">unigrams</a> and <a href="/facts/N-gram/v4BT6rPP">n-grams</a>, which are basically a sequence of words in a given textual document.<a class="footnote-ref" id="fnref:14" href="#fn:14">14</a> These features are applied using <a href="/facts/Bag-of-words/TlhIx600">bag-of-words</a> or bag-of-concepts feature representations, in which words or concepts are represented as vectors in a suitable space.<a class="footnote-ref" id="fnref:15" href="#fn:15">15</a><a class="footnote-ref" id="fnref:16" href="#fn:16">16</a>

<h3>Audio features</h3>
<a href="/facts/Feeling/RJgeFMo1">Sentiment</a> and <a href="/facts/Emotion/kvAr22wa">emotion</a> characteristics are prominent in different <a href="/facts/Phonetic/ljAVT8fv">phonetic</a> and <a href="/facts/Prosodic/UDzAIz46">prosodic</a> properties contained in audio features.<a class="footnote-ref" id="fnref:17" href="#fn:17">17</a> Some of the most important audio features employed in multimodal sentiment analysis are <a href="/facts/Mel-frequency_cepstrum/oBMnb8FD"> mel-frequency cepstrum (MFCC)</a>, <a href="/facts/Spectral_centroid/48BumUQn">spectral centroid</a>, <a href="/facts/Spectral_flux/4u2uvtzd">spectral flux</a>, beat histogram, beat sum, strongest beat, pause duration, and <a href="/facts/Pitch_accent/aIQu5zse">pitch</a>.<a class="footnote-ref" id="fnref:18" href="#fn:18">18</a> <a href="/facts/OpenSMILE/mUzpGgfG">OpenSMILE</a><a class="footnote-ref" id="fnref:19" href="#fn:19">19</a> and <a href="/facts/Praat/ykIv26Vy">Praat</a> are popular open-source toolkits for extracting such audio features.<a class="footnote-ref" id="fnref:20" href="#fn:20">20</a>

<h3>Visual features</h3>
One of the main advantages of analyzing videos with respect to texts alone, is the presence of rich sentiment cues in visual data.<a class="footnote-ref" id="fnref:21" href="#fn:21">21</a> Visual features include <a href="/facts/Facial_expression/polSzUsf">facial expressions</a>, which are of paramount importance in capturing sentiments and <a href="/facts/Emotion/kvAr22wa">emotions</a>, as they are a main channel of forming a person's present state of mind.<a class="footnote-ref" id="fnref:22" href="#fn:22">22</a> Specifically, <a href="/facts/Smile/o6e11wWV">smile</a>, is considered to be one of the most predictive visual cues in multimodal sentiment analysis.<a class="footnote-ref" id="fnref:23" href="#fn:23">23</a> OpenFace is an open-source facial analysis toolkit available for extracting and understanding such visual features.<a class="footnote-ref" id="fnref:24" href="#fn:24">24</a>

<h2 id="fusion-techniques">Fusion techniques</h2>
Unlike the traditional text-based <a href="/facts/Sentiment_analysis/QRdywf6g">sentiment analysis</a>, multimodal sentiment analysis undergo a fusion process in which data from different modalities (text, audio, or visual) are fused and analyzed together.<a class="footnote-ref" id="fnref:25" href="#fn:25">25</a> The existing approaches in multimodal sentiment analysis <a href="/facts/Data_fusion/71rNwLLs">data fusion</a> can be grouped into three main categories: feature-level, decision-level, and hybrid fusion, and the performance of the sentiment classification depends on which type of fusion technique is employed.<a class="footnote-ref" id="fnref:26" href="#fn:26">26</a>

<h3>Feature-level fusion</h3>
Feature-level fusion (sometimes known as early fusion) gathers all the features from each <a href="/facts/Modality_(human%E2%80%93computer_interaction)/KzL4uqMm">modality</a> (text, audio, or visual) and joins them together into a single feature vector, which is eventually fed into a classification algorithm.<a class="footnote-ref" id="fnref:27" href="#fn:27">27</a> One of the difficulties in implementing this technique is the integration of the heterogeneous features.<a class="footnote-ref" id="fnref:28" href="#fn:28">28</a>

<h3>Decision-level fusion</h3>
Decision-level fusion (sometimes known as late fusion), feeds data from each modality (text, audio, or visual) independently into its own classification algorithm, and obtains the final sentiment classification results by fusing each result into a single decision vector.<a class="footnote-ref" id="fnref:29" href="#fn:29">29</a> One of the advantages of this fusion technique is that it eliminates the need to fuse heterogeneous data, and each <a href="/facts/Modality_(human%E2%80%93computer_interaction)/KzL4uqMm">modality</a> can utilize its most appropriate <a href="/facts/Classification/0YnvM2EF">classification</a> <a href="/facts/Algorithm/fnl5NmRt">algorithm</a>.<a class="footnote-ref" id="fnref:30" href="#fn:30">30</a>

<h3>Hybrid fusion</h3>
Hybrid fusion is a combination of feature-level and decision-level fusion techniques, which exploits complementary information from both methods during the classification process.<a class="footnote-ref" id="fnref:31" href="#fn:31">31</a> It usually involves a two-step procedure wherein feature-level fusion is initially performed between two modalities, and decision-level fusion is then applied as a second step, to fuse the initial results from the feature-level fusion, with the remaining <a href="/facts/Modality_(human%E2%80%93computer_interaction)/KzL4uqMm">modality</a>.<a class="footnote-ref" id="fnref:32" href="#fn:32">32</a><a class="footnote-ref" id="fnref:33" href="#fn:33">33</a>

<h2 id="applications">Applications</h2>
Similar to text-based sentiment analysis, multimodal sentiment analysis can be applied in the development of different forms of <a href="/facts/Recommender_system/HjodW6nS">recommender systems</a> such as in the analysis of user-generated videos of movie reviews<a class="footnote-ref" id="fnref:34" href="#fn:34">34</a> and general product reviews,<a class="footnote-ref" id="fnref:35" href="#fn:35">35</a> to predict the sentiments of customers, and subsequently create product or service recommendations.<a class="footnote-ref" id="fnref:36" href="#fn:36">36</a> Multimodal sentiment analysis also plays an important role in the advancement of <a href="/facts/Virtual_assistant/BuouwL2U">virtual assistants</a> through the application of <a href="/facts/Natural_language_processing/1hjMKsSN">natural language processing</a> (NLP) and <a href="/facts/Machine_learning/e0w0XJTu">machine learning</a> techniques.<a class="footnote-ref" id="fnref:37" href="#fn:37">37</a> In the healthcare domain, multimodal sentiment analysis can be utilized to detect certain medical conditions such as <a href="/facts/Psychological_stress/eqat8pq9">stress</a>, <a href="/facts/Anxiety/Geekl883">anxiety</a>, or <a href="/facts/Depression_(mood)/GDNFWm1L">depression</a>.<a class="footnote-ref" id="fnref:38" href="#fn:38">38</a> Multimodal sentiment analysis can also be applied in understanding the sentiments contained in video news programs, which is considered as a complicated and challenging domain, as sentiments expressed by reporters tend to be less obvious or neutral.<a class="footnote-ref" id="fnref:39" href="#fn:39">39</a>

<h2 id="references">References</h2>

<ol>
<li id="fn:1">Soleymani, Mohammad; Garcia, David; Jou, Brendan; Schuller, Björn; Chang, Shih-Fu; Pantic, Maja (September 2017). "A survey of multimodal sentiment analysis". Image and Vision Computing. 65: 3–14. doi:10.1016/j.imavis.2017.08.003. S2CID 19491070. <a href="https://zenodo.org/record/3449163" target="_blank">https://zenodo.org/record/3449163</a> <a href="#fnref:1" class="footnote-back-ref">↩</a></li>
<li id="fn:2">Karray, Fakhreddine; Milad, Alemzadeh; Saleh, Jamil Abou; Mo Nours, Arab (2008). "Human-Computer Interaction: Overview on State of the Art" (PDF). International Journal on Smart Sensing and Intelligent Systems. 1: 137–159. doi:10.21307/ijssis-2017-283. <a href="http://s2is.org/Issues/v1/n1/papers/paper9.pdf" target="_blank">http://s2is.org/Issues/v1/n1/papers/paper9.pdf</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></li>
<li id="fn:3">Poria, Soujanya; Cambria, Erik; Bajpai, Rajiv; Hussain, Amir (September 2017). "A review of affective computing: From unimodal analysis to multimodal fusion". Information Fusion. 37: 98–125. doi:10.1016/j.inffus.2017.02.003. hdl:1893/25490. S2CID 205433041. <a href="http://researchrepository.napier.ac.uk/Output/1792429" target="_blank">http://researchrepository.napier.ac.uk/Output/1792429</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></li>
<li id="fn:4">Nguyen, Quy Hoang; Nguyen, Minh-Van Truong; Van Nguyen, Kiet (2024-05-01). "New Benchmark Dataset and Fine-Grained Cross-Modal Fusion Framework for Vietnamese Multimodal Aspect-Category Sentiment Analysis". arXiv:2405.00543 [cs.CL]. <a href="/wiki/ArXiv_(identifier)" target="_blank">/wiki/ArXiv_(identifier)</a> <a href="#fnref:4" class="footnote-back-ref">↩</a></li>
<li id="fn:5">"Google AI to make phone calls for you". BBC News. 8 May 2018. Retrieved 12 June 2018. <a href="https://www.bbc.com/news/technology-44045424" target="_blank">https://www.bbc.com/news/technology-44045424</a> <a href="#fnref:5" class="footnote-back-ref">↩</a></li>
<li id="fn:6">Wollmer, Martin; Weninger, Felix; Knaup, Tobias; Schuller, Bjorn; Sun, Congkai; Sagae, Kenji; Morency, Louis-Philippe (May 2013). "YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context" (PDF). IEEE Intelligent Systems. 28 (3): 46–53. doi:10.1109/MIS.2013.34. S2CID 12789201. <a href="https://opus.bibliothek.uni-augsburg.de/opus4/files/72633/72633.pdf" target="_blank">https://opus.bibliothek.uni-augsburg.de/opus4/files/72633/72633.pdf</a> <a href="#fnref:6" class="footnote-back-ref">↩</a></li>
<li id="fn:7">Pereira, Moisés H. R.; Pádua, Flávio L. C.; Pereira, Adriano C. M.; Benevenuto, Fabrício; Dalip, Daniel H. (9 April 2016). "Fusing Audio, Textual and Visual Features for Sentiment Analysis of News Videos". arXiv:1604.02612 [cs.CL]. <a href="/wiki/ArXiv_(identifier)" target="_blank">/wiki/ArXiv_(identifier)</a> <a href="#fnref:7" class="footnote-back-ref">↩</a></li>
<li id="fn:8">Zucco, Chiara; Calabrese, Barbara; Cannataro, Mario (November 2017). "Sentiment analysis and affective computing for depression monitoring". 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE. pp. 1988–1995. doi:10.1109/bibm.2017.8217966. ISBN 978-1-5090-3050-7. S2CID 24408937. <a href="978-1-5090-3050-7" target="_blank">978-1-5090-3050-7</a> <a href="#fnref:8" class="footnote-back-ref">↩</a></li>
<li id="fn:9">Pang, Bo; Lee, Lillian (2008). Opinion mining and sentiment analysis. Hanover, MA: Now Publishers. ISBN 978-1601981509. <a href="978-1601981509" target="_blank">978-1601981509</a> <a href="#fnref:9" class="footnote-back-ref">↩</a></li>
<li id="fn:10">Poria, Soujanya; Cambria, Erik; Bajpai, Rajiv; Hussain, Amir (September 2017). "A review of affective computing: From unimodal analysis to multimodal fusion". Information Fusion. 37: 98–125. doi:10.1016/j.inffus.2017.02.003. hdl:1893/25490. S2CID 205433041. <a href="http://researchrepository.napier.ac.uk/Output/1792429" target="_blank">http://researchrepository.napier.ac.uk/Output/1792429</a> <a href="#fnref:10" class="footnote-back-ref">↩</a></li>
<li id="fn:11">Sun, Shiliang; Luo, Chen; Chen, Junyu (July 2017). "A review of natural language processing techniques for opinion mining systems". Information Fusion. 36: 10–25. doi:10.1016/j.inffus.2016.10.004. <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:11" class="footnote-back-ref">↩</a></li>
<li id="fn:12">Sun, Shiliang; Luo, Chen; Chen, Junyu (July 2017). "A review of natural language processing techniques for opinion mining systems". Information Fusion. 36: 10–25. doi:10.1016/j.inffus.2016.10.004. <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:12" class="footnote-back-ref">↩</a></li>
<li id="fn:13">Poria, Soujanya; Cambria, Erik; Bajpai, Rajiv; Hussain, Amir (September 2017). "A review of affective computing: From unimodal analysis to multimodal fusion". Information Fusion. 37: 98–125. doi:10.1016/j.inffus.2017.02.003. hdl:1893/25490. S2CID 205433041. <a href="http://researchrepository.napier.ac.uk/Output/1792429" target="_blank">http://researchrepository.napier.ac.uk/Output/1792429</a> <a href="#fnref:13" class="footnote-back-ref">↩</a></li>
<li id="fn:14">Yadollahi, Ali; Shahraki, Ameneh Gholipour; Zaiane, Osmar R. (25 May 2017). "Current State of Text Sentiment Analysis from Opinion to Emotion Mining". ACM Computing Surveys. 50 (2): 1–33. doi:10.1145/3057270. S2CID 5275807. <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:14" class="footnote-back-ref">↩</a></li>
<li id="fn:15">Perez Rosas, Veronica; Mihalcea, Rada; Morency, Louis-Philippe (May 2013). "Multimodal Sentiment Analysis of Spanish Online Videos". IEEE Intelligent Systems. 28 (3): 38–45. doi:10.1109/MIS.2013.9. S2CID 1132247. <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:15" class="footnote-back-ref">↩</a></li>
<li id="fn:16">Poria, Soujanya; Cambria, Erik; Hussain, Amir; Huang, Guang-Bin (March 2015). "Towards an intelligent framework for multimodal affective data analysis". Neural Networks. 63: 104–116. doi:10.1016/j.neunet.2014.10.005. hdl:1893/21310. PMID 25523041. S2CID 342649. <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:16" class="footnote-back-ref">↩</a></li>
<li id="fn:17">Chung-Hsien Wu; Wei-Bin Liang (January 2011). "Emotion Recognition of Affective Speech Based on Multiple Classifiers Using Acoustic-Prosodic Information and Semantic Labels". IEEE Transactions on Affective Computing. 2 (1): 10–21. doi:10.1109/T-AFFC.2010.16. S2CID 52853112. <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:17" class="footnote-back-ref">↩</a></li>
<li id="fn:18">Poria, Soujanya; Cambria, Erik; Bajpai, Rajiv; Hussain, Amir (September 2017). "A review of affective computing: From unimodal analysis to multimodal fusion". Information Fusion. 37: 98–125. doi:10.1016/j.inffus.2017.02.003. hdl:1893/25490. S2CID 205433041. <a href="http://researchrepository.napier.ac.uk/Output/1792429" target="_blank">http://researchrepository.napier.ac.uk/Output/1792429</a> <a href="#fnref:18" class="footnote-back-ref">↩</a></li>
<li id="fn:19">Eyben, Florian; Wöllmer, Martin; Schuller, Björn (2009). "OpenEAR — Introducing the munich open-source emotion and affect recognition toolkit". OpenEAR — Introducing the munich open-source emotion and affect recognition toolkit - IEEE Conference Publication. p. 1. doi:10.1109/ACII.2009.5349350. ISBN 978-1-4244-4800-5. S2CID 2081569. <a href="978-1-4244-4800-5" target="_blank">978-1-4244-4800-5</a> <a href="#fnref:19" class="footnote-back-ref">↩</a></li>
<li id="fn:20">Morency, Louis-Philippe; Mihalcea, Rada; Doshi, Payal (14 November 2011). "Towards multimodal sentiment analysis". Towards multimodal sentiment analysis: harvesting opinions from the web. ACM. pp. 169–176. doi:10.1145/2070481.2070509. ISBN 9781450306416. S2CID 1257599. <a href="9781450306416" target="_blank">9781450306416</a> <a href="#fnref:20" class="footnote-back-ref">↩</a></li>
<li id="fn:21">Poria, Soujanya; Cambria, Erik; Hazarika, Devamanyu; Majumder, Navonil; Zadeh, Amir; Morency, Louis-Philippe (2017). "Context-Dependent Sentiment Analysis in User-Generated Videos". Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers): 873–883. doi:10.18653/v1/p17-1081. <a href="https://doi.org/10.18653%2Fv1%2Fp17-1081" target="_blank">https://doi.org/10.18653%2Fv1%2Fp17-1081</a> <a href="#fnref:21" class="footnote-back-ref">↩</a></li>
<li id="fn:22">Poria, Soujanya; Cambria, Erik; Bajpai, Rajiv; Hussain, Amir (September 2017). "A review of affective computing: From unimodal analysis to multimodal fusion". Information Fusion. 37: 98–125. doi:10.1016/j.inffus.2017.02.003. hdl:1893/25490. S2CID 205433041. <a href="http://researchrepository.napier.ac.uk/Output/1792429" target="_blank">http://researchrepository.napier.ac.uk/Output/1792429</a> <a href="#fnref:22" class="footnote-back-ref">↩</a></li>
<li id="fn:23">Perez Rosas, Veronica; Mihalcea, Rada; Morency, Louis-Philippe (May 2013). "Multimodal Sentiment Analysis of Spanish Online Videos". IEEE Intelligent Systems. 28 (3): 38–45. doi:10.1109/MIS.2013.9. S2CID 1132247. <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:23" class="footnote-back-ref">↩</a></li>
<li id="fn:24">OpenFace: An open source facial behavior analysis toolkit - IEEE Conference Publication. March 2016. doi:10.1109/WACV.2016.7477553. ISBN 978-1-5090-0641-0. S2CID 1919851. <a href="978-1-5090-0641-0" target="_blank">978-1-5090-0641-0</a> <a href="#fnref:24" class="footnote-back-ref">↩</a></li>
<li id="fn:25">Poria, Soujanya; Cambria, Erik; Bajpai, Rajiv; Hussain, Amir (September 2017). "A review of affective computing: From unimodal analysis to multimodal fusion". Information Fusion. 37: 98–125. doi:10.1016/j.inffus.2017.02.003. hdl:1893/25490. S2CID 205433041. <a href="http://researchrepository.napier.ac.uk/Output/1792429" target="_blank">http://researchrepository.napier.ac.uk/Output/1792429</a> <a href="#fnref:25" class="footnote-back-ref">↩</a></li>
<li id="fn:26">Poria, Soujanya; Cambria, Erik; Bajpai, Rajiv; Hussain, Amir (September 2017). "A review of affective computing: From unimodal analysis to multimodal fusion". Information Fusion. 37: 98–125. doi:10.1016/j.inffus.2017.02.003. hdl:1893/25490. S2CID 205433041. <a href="http://researchrepository.napier.ac.uk/Output/1792429" target="_blank">http://researchrepository.napier.ac.uk/Output/1792429</a> <a href="#fnref:26" class="footnote-back-ref">↩</a></li>
<li id="fn:27">Poria, Soujanya; Cambria, Erik; Howard, Newton; Huang, Guang-Bin; Hussain, Amir (January 2016). "Fusing audio, visual and textual clues for sentiment analysis from multimodal content". Neurocomputing. 174: 50–59. doi:10.1016/j.neucom.2015.01.095. S2CID 15287807. <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:27" class="footnote-back-ref">↩</a></li>
<li id="fn:28">Poria, Soujanya; Cambria, Erik; Bajpai, Rajiv; Hussain, Amir (September 2017). "A review of affective computing: From unimodal analysis to multimodal fusion". Information Fusion. 37: 98–125. doi:10.1016/j.inffus.2017.02.003. hdl:1893/25490. S2CID 205433041. <a href="http://researchrepository.napier.ac.uk/Output/1792429" target="_blank">http://researchrepository.napier.ac.uk/Output/1792429</a> <a href="#fnref:28" class="footnote-back-ref">↩</a></li>
<li id="fn:29">Poria, Soujanya; Cambria, Erik; Howard, Newton; Huang, Guang-Bin; Hussain, Amir (January 2016). "Fusing audio, visual and textual clues for sentiment analysis from multimodal content". Neurocomputing. 174: 50–59. doi:10.1016/j.neucom.2015.01.095. S2CID 15287807. <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:29" class="footnote-back-ref">↩</a></li>
<li id="fn:30">Poria, Soujanya; Cambria, Erik; Bajpai, Rajiv; Hussain, Amir (September 2017). "A review of affective computing: From unimodal analysis to multimodal fusion". Information Fusion. 37: 98–125. doi:10.1016/j.inffus.2017.02.003. hdl:1893/25490. S2CID 205433041. <a href="http://researchrepository.napier.ac.uk/Output/1792429" target="_blank">http://researchrepository.napier.ac.uk/Output/1792429</a> <a href="#fnref:30" class="footnote-back-ref">↩</a></li>
<li id="fn:31">Wollmer, Martin; Weninger, Felix; Knaup, Tobias; Schuller, Bjorn; Sun, Congkai; Sagae, Kenji; Morency, Louis-Philippe (May 2013). "YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context" (PDF). IEEE Intelligent Systems. 28 (3): 46–53. doi:10.1109/MIS.2013.34. S2CID 12789201. <a href="https://opus.bibliothek.uni-augsburg.de/opus4/files/72633/72633.pdf" target="_blank">https://opus.bibliothek.uni-augsburg.de/opus4/files/72633/72633.pdf</a> <a href="#fnref:31" class="footnote-back-ref">↩</a></li>
<li id="fn:32">Shahla, Shahla; Naghsh-Nilchi, Ahmad Reza (2017). "Exploiting evidential theory in the fusion of textual, audio, and visual modalities for affective music video retrieval - IEEE Conference Publication". doi:10.1109/PRIA.2017.7983051. S2CID 24466718. {{cite journal}}: Cite journal requires |journal= (help) <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:32" class="footnote-back-ref">↩</a></li>
<li id="fn:33">Poria, Soujanya; Peng, Haiyun; Hussain, Amir; Howard, Newton; Cambria, Erik (October 2017). "Ensemble application of convolutional neural networks and multiple kernel learning for multimodal sentiment analysis". Neurocomputing. 261: 217–230. doi:10.1016/j.neucom.2016.09.117. <a href="/wiki/Doi_(identifier)" target="_blank">/wiki/Doi_(identifier)</a> <a href="#fnref:33" class="footnote-back-ref">↩</a></li>
<li id="fn:34">Wollmer, Martin; Weninger, Felix; Knaup, Tobias; Schuller, Bjorn; Sun, Congkai; Sagae, Kenji; Morency, Louis-Philippe (May 2013). "YouTube Movie Reviews: Sentiment Analysis in an Audio-Visual Context" (PDF). IEEE Intelligent Systems. 28 (3): 46–53. doi:10.1109/MIS.2013.34. S2CID 12789201. <a href="https://opus.bibliothek.uni-augsburg.de/opus4/files/72633/72633.pdf" target="_blank">https://opus.bibliothek.uni-augsburg.de/opus4/files/72633/72633.pdf</a> <a href="#fnref:34" class="footnote-back-ref">↩</a></li>
<li id="fn:35">Pérez-Rosas, Verónica; Mihalcea, Rada; Morency, Louis Philippe (1 January 2013). "Utterance-level multimodal sentiment analysis". Long Papers. Association for Computational Linguistics (ACL). <a href="https://experts.umich.edu/en/publications/utterance-level-multimodal-sentiment-analysis" target="_blank">https://experts.umich.edu/en/publications/utterance-level-multimodal-sentiment-analysis</a> <a href="#fnref:35" class="footnote-back-ref">↩</a></li>
<li id="fn:36">Chui, Michael; Manyika, James; Miremadi, Mehdi; Henke, Nicolaus; Chung, Rita; Nel, Pieter; Malhotra, Sankalp. "Notes from the AI frontier. Insights from hundreds of use cases". McKinsey & Company. Retrieved 13 June 2018. <a href="https://www.mckinsey.com/mgi/" target="_blank">https://www.mckinsey.com/mgi/</a> <a href="#fnref:36" class="footnote-back-ref">↩</a></li>
<li id="fn:37">"Google AI to make phone calls for you". BBC News. 8 May 2018. Retrieved 12 June 2018. <a href="https://www.bbc.com/news/technology-44045424" target="_blank">https://www.bbc.com/news/technology-44045424</a> <a href="#fnref:37" class="footnote-back-ref">↩</a></li>
<li id="fn:38">Zucco, Chiara; Calabrese, Barbara; Cannataro, Mario (November 2017). "Sentiment analysis and affective computing for depression monitoring". 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE. pp. 1988–1995. doi:10.1109/bibm.2017.8217966. ISBN 978-1-5090-3050-7. S2CID 24408937. <a href="978-1-5090-3050-7" target="_blank">978-1-5090-3050-7</a> <a href="#fnref:38" class="footnote-back-ref">↩</a></li>
<li id="fn:39">Ellis, Joseph G.; Jou, Brendan; Chang, Shih-Fu (12 November 2014). "Why We Watch the News". Why We Watch the News: A Dataset for Exploring Sentiment in Broadcast Video News. ACM. pp. 104–111. doi:10.1145/2663204.2663237. ISBN 9781450328852. S2CID 14112246. <a href="9781450328852" target="_blank">9781450328852</a> <a href="#fnref:39" class="footnote-back-ref">↩</a></li>
</ol>

Multimodal sentiment analysis open-in-new

Multimodal sentiment analysis