{"id":325,"date":"2023-11-20T05:35:39","date_gmt":"2023-11-20T00:05:39","guid":{"rendered":"https:\/\/callzen.ai\/blog\/?p=325"},"modified":"2024-02-01T14:51:38","modified_gmt":"2024-02-01T09:21:38","slug":"automatic-speech-recognition-in-telephonic-speech","status":"publish","type":"post","link":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/","title":{"rendered":"Automatic Speech Recognition in Telephonic Speech"},"content":{"rendered":"<figure class=\"wp-block-post-featured-image\"><img loading=\"lazy\" decoding=\"async\" width=\"2500\" height=\"1042\" src=\"https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02.jpg\" class=\"attachment-post-thumbnail size-post-thumbnail wp-post-image\" alt=\"\" style=\"object-fit:cover;\" srcset=\"https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02.jpg 2500w, https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02-300x125.jpg 300w, https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02-1024x427.jpg 1024w, https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02-768x320.jpg 768w, https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02-1536x640.jpg 1536w, https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02-2048x854.jpg 2048w\" sizes=\"auto, (max-width: 2500px) 100vw, 2500px\" \/><\/figure>\n\n\n<h2 class=\"wp-block-heading\" style=\"font-size:30px\">What is <strong>Automatic Speech Recognition (ASR)<\/strong>?<\/h2>\n\n\n\n<p>Telephonic conversations are one of the most effective ways of communicating with customers, as attention during calls is usually higher than on other channels.&nbsp;<\/p>\n\n\n\n<p>Any customer-centric company tends to generate a lot of audio data. However, audio can be difficult to analyze and derive insights from. 
This is where Automatic Speech Recognition (<strong>ASR<\/strong>) comes in. ASR is a technology that enables computers to understand speech and convert it into text.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" style=\"font-size:30px\"><strong>Multilingual ASR<\/strong><\/h2>\n\n\n\n<p>convozen.AI serves customers in multiple locations across the world, and we interact with them in their native languages. This poses another big challenge: building an understanding of different languages &amp; accents.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How do ASR systems work?<\/strong><\/h2>\n\n\n\n<p>ASR systems usually have <strong>2 major steps <\/strong>in speech-to-text processing.&nbsp;<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Feature extraction from audio<\/li>\n\n\n\n<li>Mapping learned features to possible text sequences.&nbsp;<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">1. <strong>Feature extraction&nbsp;<\/strong><\/h3>\n\n\n\n<p>Raw audio data consists of a signal sampled at a predefined rate. Usually, this is set at 16 kHz, i.e. 16,000 samples per second. Human speech averages about 12 characters \/ second, so compressing this information is vital to building a quality ASR system. There are several ways to extract features from raw audio.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Mel frequency cepstral coefficients (MFCC):<\/strong> Raw audio can be represented as the power present in different frequency bands within a given time frame. Usually, these bands are defined by the Mel scale, a perceptual scale whose frequency bands are spaced to match how the human auditory system perceives changes in pitch. 
MFCCs are a set of coefficients that capture the shape of the power spectrum of a sound signal expressed on this scale.<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Transformer encoders: <\/strong>Transformers have shown unique capabilities in modelling text data, and current state-of-the-art ASR systems are adopting them for audio representation as well. Raw audio data is processed through several blocks of convolutions (<em>pointwise convolutions &amp; 1-D depthwise convolutions<\/em>) to obtain latent representations. Some of these representations are masked &amp; sent through transformer layers to obtain contextualized representations, which are then used to predict the masked latent representations using a contrastive objective. Once the model is trained, the contextualized representations become the audio embeddings.<\/li>\n<\/ul>\n\n\n\n<p class=\"has-text-align-center\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/_aStV9frXDQ52azxondetsEPFqjPjvYsVCPqTL0b8A883lsgxHAIrzF8cu5T1jNdOIS4af0C9us5JWiMaV5JLn6Z0enXv4tAON5jZjiIQSdd0gVa_l0G4Y9y7YGPCA7l6b1MAoFXSBUGHbOlBWsRRxw\" width=\"610\" height=\"448\"><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. <strong>Text Sequence mapping<\/strong><\/h3>\n\n\n\n<p>After training on a large corpus of audio data, we obtain speech representations that capture voice patterns. These can be further fine-tuned for final objectives such as language identification, emotion recognition &amp; ASR itself.<\/p>\n\n\n\n<p>Feature embeddings are usually generated over short time frames, such as 20 ms &#8211; 50 ms. A single phoneme could be repeated across several of these frames depending on the speed of speech.&nbsp;<\/p>\n\n\n\n<p><strong>Example<\/strong>: Both \u201cGood\u201d &amp; \u201cGod\u201d can be represented as g-g-o-o-o-o-d-d-d across time frames, so simply merging repeated frames cannot tell them apart. 
The algorithm that can learn to collapse these frames into meaningful sequences is known as Connectionist Temporal Classification (CTC).&nbsp;<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Connectionist Temporal Classification (CTC)<\/strong><\/h3>\n\n\n\n<p>CTC is an algorithm that assigns a probability score to an output Y given any input X. The main advantage of CTC is that the lengths of X and Y do not have to match.&nbsp;<\/p>\n\n\n\n<p>It tries to maximize the combined probability of the frame-level sequences that actually collapse to the target word, relative to all possible sequences. Ex: Let&#8217;s say our vocab has only 2 characters, g &amp; o, and the number of time frames is 4. The total number of possible sequences is 2^4 = 16. But only 3 of them collapse to \u201cgo\u201d: g-o-o-o, g-g-o-o, g-g-g-o. CTC will try to maximize the probability of these 3 sequences against all 16.<\/p>\n\n\n\n<p>Advanced ASR systems usually have a few more modules at the decoding stage to improve accuracy, such as beam search, language models, word boosting, etc. We at convozen employ word boosting to cater for domain-specific words occurring across our diverse client base, which spans the healthcare, education &amp; finance sectors.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Ethical considerations&nbsp;<\/strong><\/h3>\n\n\n\n<p>Because ASR systems work with customers&#8217; audio data &amp; domain-specific information, it is of utmost importance to ensure data is not misused in nefarious ways such as voice cloning, data leakage to public domains, or leakage of sales pitches &amp; personalized discounts.&nbsp;<\/p>\n\n\n\n<p>convozen prioritizes data privacy above all else. Client data is used solely to enhance process quality. 
We guarantee the segregation of data sources in the cloud and implement role-based authentication to restrict access appropriately.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><a href=\"https:\/\/convozen.ai\/\"><img loading=\"lazy\" decoding=\"async\" width=\"728\" height=\"90\" src=\"https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/12\/CTA-Banners-2-3.png\" alt=\"https:\/\/convozen.ai\/\" class=\"wp-image-384\" srcset=\"https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/12\/CTA-Banners-2-3.png 728w, https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/12\/CTA-Banners-2-3-300x37.png 300w\" sizes=\"auto, (max-width: 728px) 100vw, 728px\" \/><\/a><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h3>\n\n\n\n<p>Automatic speech recognition is a complex field that is constantly evolving as more open-source audio datasets and better algorithms become available.&nbsp;<\/p>\n\n\n\n<p>With language understanding models rapidly improving at extracting meaningful insights, ASR systems coupled with LLMs open up a new future of automation that can enable companies to understand their customers, serve them effectively &amp; grow rapidly!\u00a0<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>References<\/strong><\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/arxiv.org\/pdf\/1305.1145.pdf\">https:\/\/arxiv.org\/pdf\/1305.1145.pdf<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/jonathanbgn.com\/2021\/09\/30\/illustrated-wav2vec-2.html\">An Illustrated Tour of Wav2vec 2.0 | Jonathan Bgn<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/paperswithcode.com\/task\/automatic-speech-recognition\">Automatic Speech Recognition (ASR) | Papers With Code<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/NVIDIA\/NeMo\">NVIDIA\/NeMo: NeMo: a toolkit for conversational AI (github.com)<\/a><\/li>\n\n\n\n<li><a 
href=\"https:\/\/ai.meta.com\/blog\/wav2vec-20-learning-the-structure-of-speech-from-raw-audio\/\">Wav2vec 2.0: Learning the structure of speech from raw audio (meta.com)<\/a><\/li>\n<\/ul>\n\n\n\n<div class=\"inherit-container-width wp-block-group has-text-color has-background is-layout-constrained wp-block-group-is-layout-constrained\" style=\"color:#000000;background-color:#ffffff\">\n<blockquote class=\"wp-block-quote has-text-align-center has-text-color is-layout-flow wp-block-quote-is-layout-flow\" style=\"color:#5f4399\">\n<p><\/p>\n<cite><em>Unleash Your Contact Center&#8217;s Potential Today! \ud83d\udc49 Get Started with <strong>ConvoZen.AI<\/strong> and Elevate Customer Experience. <br> <\/em><\/cite><\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading has-text-align-center has-large-font-size\" id=\"schedule-a-visit\" style=\"line-height:1\"><strong><strong>Schedule a Demo Now!<\/strong><\/strong><\/h2>\n\n\n\n<div class=\"wp-block-buttons is-horizontal is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-03627597 wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button has-custom-width wp-block-button__width-25\"><a class=\"wp-block-button__link has-text-color has-background wp-element-button\" href=\"https:\/\/convozen.ai\/booking?action=booking&amp;utm_source=organic&amp;utm_medium=blog\" style=\"border-radius:50px;color:#ffffff;background-color:#5f4399\">Click here<\/a><\/div>\n<\/div>\n<\/div>\n\n\n\n<div style=\"height:100px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>What is Automatic Speech Recognition (ASR)? 
Telephonic conversations are one of the most effective ways of communication with customers as [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":409,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[16],"tags":[17,18],"news-category":[],"class_list":["post-325","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technical-category","tag-asr-in-telephonic-speech","tag-automatic-speech-recognition"],"acf":{"before_after":null,"comparison_table":null,"icon":null,"playback_showcase":null,"stats":null},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Automatic Speech Recognition in Telephonic Speech - ConvoZen<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Automatic Speech Recognition in Telephonic Speech - ConvoZen\" \/>\n<meta property=\"og:description\" content=\"What is Automatic Speech Recognition (ASR)? Telephonic conversations are one of the most effective ways of communication with customers as [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/\" \/>\n<meta property=\"og:site_name\" content=\"ConvoZen\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-20T00:05:39+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-02-01T09:21:38+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2500\" \/>\n\t<meta property=\"og:image:height\" content=\"1042\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Team ConvoZen\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Team ConvoZen\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/\"},\"author\":{\"name\":\"Team ConvoZen\",\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/#\\\/schema\\\/person\\\/61823786c164b0c406fc967a2721fc0a\"},\"headline\":\"Automatic Speech Recognition in Telephonic Speech\",\"datePublished\":\"2023-11-20T00:05:39+00:00\",\"dateModified\":\"2024-02-01T09:21:38+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/\"},\"wordCount\":794,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/wp-content\\\/uploads\\\/2023\\\/11\\\/Hero-Banners-JAN-02.jpg\",\"keywords\":[\"ASR in telephonic speech\",\"Automatic Speech Recognition\"],\"articleSection\":[\"Technical\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/\",\"url\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/\",\"name\":\"Automatic Speech 
Recognition in Telephonic Speech - ConvoZen\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/wp-content\\\/uploads\\\/2023\\\/11\\\/Hero-Banners-JAN-02.jpg\",\"datePublished\":\"2023-11-20T00:05:39+00:00\",\"dateModified\":\"2024-02-01T09:21:38+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/#primaryimage\",\"url\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/wp-content\\\/uploads\\\/2023\\\/11\\\/Hero-Banners-JAN-02.jpg\",\"contentUrl\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/wp-content\\\/uploads\\\/2023\\\/11\\\/Hero-Banners-JAN-02.jpg\",\"width\":2500,\"height\":1042},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/technical-category\\\/automatic-speech-recognition-in-telephonic-speech\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Automatic Speech Recognition in Telephonic 
Speech\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/\",\"name\":\"ConvoZen\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/#organization\",\"name\":\"ConvoZen\",\"url\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/wp-content\\\/uploads\\\/2024\\\/02\\\/Convozen-logo.png\",\"contentUrl\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/wp-content\\\/uploads\\\/2024\\\/02\\\/Convozen-logo.png\",\"width\":202,\"height\":58,\"caption\":\"ConvoZen\"},\"image\":{\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/#\\\/schema\\\/person\\\/61823786c164b0c406fc967a2721fc0a\",\"name\":\"Team ConvoZen\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/d29c9b1105eb2f66bf2234a2a3da114d78cf85b53888f6a17b2c934de9bf4766?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/d29c9b1105eb2f66bf2234a2a3da114d78cf85b53888f6a17b2c934de9bf4766?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/d29c9b1105eb2f66bf2234a2a3da114d78cf85b53888f6a17b2c934de9bf4766?s=96&d=mm&r=g\",\"caption\":\"Team 
ConvoZen\"},\"sameAs\":[\"https:\\\/\\\/convozen.ai\\\/blog\\\/\"],\"url\":\"https:\\\/\\\/convozen.ai\\\/blog\\\/author\\\/convozen\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Automatic Speech Recognition in Telephonic Speech - ConvoZen","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/","og_locale":"en_US","og_type":"article","og_title":"Automatic Speech Recognition in Telephonic Speech - ConvoZen","og_description":"What is Automatic Speech Recognition (ASR)? Telephonic conversations are one of the most effective ways of communication with customers as [&hellip;]","og_url":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/","og_site_name":"ConvoZen","article_published_time":"2023-11-20T00:05:39+00:00","article_modified_time":"2024-02-01T09:21:38+00:00","og_image":[{"width":2500,"height":1042,"url":"https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02.jpg","type":"image\/jpeg"}],"author":"Team ConvoZen","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Team ConvoZen","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/#article","isPartOf":{"@id":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/"},"author":{"name":"Team ConvoZen","@id":"https:\/\/convozen.ai\/blog\/#\/schema\/person\/61823786c164b0c406fc967a2721fc0a"},"headline":"Automatic Speech Recognition in Telephonic Speech","datePublished":"2023-11-20T00:05:39+00:00","dateModified":"2024-02-01T09:21:38+00:00","mainEntityOfPage":{"@id":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/"},"wordCount":794,"commentCount":0,"publisher":{"@id":"https:\/\/convozen.ai\/blog\/#organization"},"image":{"@id":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/#primaryimage"},"thumbnailUrl":"https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02.jpg","keywords":["ASR in telephonic speech","Automatic Speech Recognition"],"articleSection":["Technical"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/","url":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/","name":"Automatic Speech Recognition in Telephonic Speech - 
ConvoZen","isPartOf":{"@id":"https:\/\/convozen.ai\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/#primaryimage"},"image":{"@id":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/#primaryimage"},"thumbnailUrl":"https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02.jpg","datePublished":"2023-11-20T00:05:39+00:00","dateModified":"2024-02-01T09:21:38+00:00","breadcrumb":{"@id":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/#primaryimage","url":"https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02.jpg","contentUrl":"https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2023\/11\/Hero-Banners-JAN-02.jpg","width":2500,"height":1042},{"@type":"BreadcrumbList","@id":"https:\/\/convozen.ai\/blog\/technical-category\/automatic-speech-recognition-in-telephonic-speech\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/convozen.ai\/blog\/"},{"@type":"ListItem","position":2,"name":"Automatic Speech Recognition in Telephonic 
Speech"}]},{"@type":"WebSite","@id":"https:\/\/convozen.ai\/blog\/#website","url":"https:\/\/convozen.ai\/blog\/","name":"ConvoZen","description":"","publisher":{"@id":"https:\/\/convozen.ai\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/convozen.ai\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/convozen.ai\/blog\/#organization","name":"ConvoZen","url":"https:\/\/convozen.ai\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/convozen.ai\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2024\/02\/Convozen-logo.png","contentUrl":"https:\/\/convozen.ai\/blog\/wp-content\/uploads\/2024\/02\/Convozen-logo.png","width":202,"height":58,"caption":"ConvoZen"},"image":{"@id":"https:\/\/convozen.ai\/blog\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/convozen.ai\/blog\/#\/schema\/person\/61823786c164b0c406fc967a2721fc0a","name":"Team ConvoZen","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/d29c9b1105eb2f66bf2234a2a3da114d78cf85b53888f6a17b2c934de9bf4766?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/d29c9b1105eb2f66bf2234a2a3da114d78cf85b53888f6a17b2c934de9bf4766?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d29c9b1105eb2f66bf2234a2a3da114d78cf85b53888f6a17b2c934de9bf4766?s=96&d=mm&r=g","caption":"Team 
ConvoZen"},"sameAs":["https:\/\/convozen.ai\/blog\/"],"url":"https:\/\/convozen.ai\/blog\/author\/convozen\/"}]}},"_links":{"self":[{"href":"https:\/\/convozen.ai\/blog\/wp-json\/wp\/v2\/posts\/325","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/convozen.ai\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/convozen.ai\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/convozen.ai\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/convozen.ai\/blog\/wp-json\/wp\/v2\/comments?post=325"}],"version-history":[{"count":3,"href":"https:\/\/convozen.ai\/blog\/wp-json\/wp\/v2\/posts\/325\/revisions"}],"predecessor-version":[{"id":518,"href":"https:\/\/convozen.ai\/blog\/wp-json\/wp\/v2\/posts\/325\/revisions\/518"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/convozen.ai\/blog\/wp-json\/wp\/v2\/media\/409"}],"wp:attachment":[{"href":"https:\/\/convozen.ai\/blog\/wp-json\/wp\/v2\/media?parent=325"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/convozen.ai\/blog\/wp-json\/wp\/v2\/categories?post=325"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/convozen.ai\/blog\/wp-json\/wp\/v2\/tags?post=325"},{"taxonomy":"news-category","embeddable":true,"href":"https:\/\/convozen.ai\/blog\/wp-json\/wp\/v2\/news-category?post=325"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}