I got a chance to go hands-on with the Galaxy S20 and S20 Plus, and these phones are pretty cool right off the bat! But they also have an equally high price to match, with the Galaxy S20 starting at Rs. 67,000 and the S20 Plus starting at Rs. 74,000. The thing is, we’re not here to talk about their value or to review them, but rather to explore the science behind a specific feature: ‘Live Caption’!

What is ‘Live Caption’? Let’s say you’re on a train with people around you and you don’t have your headphones with you. Or you’re watching a video in a language you don’t understand. Or you’re just checking something discreetly. Live Caption shows you real-time captions or subtitles for your videos, podcasts or even voice notes, without sending any of the audio to Google. It’s helpful in any of the scenarios I just mentioned, and it’s also a big help for the hearing impaired, which makes it an awesome feature. In the near future, Live Caption may also be able to translate the generated captions into other languages in real time, which would let you watch videos whether or not you know the language.

You see, the S20 family is not the first line-up of phones to have Live Caption; the feature is built into Android itself. What makes the S20 line-up special is that they’re the first non-Pixel phones to have it. That means Live Caption should eventually make its way to most of the phones out there, including the one you’re using!

To use Live Caption, press the volume up or down button. Once the volume indicator comes up on the screen, there will be a button below it with a captions icon; press it to turn the feature on. If you don’t find that button below the volume indicator, go to Settings, then Accessibility, then ‘Live Caption’. There you’ll find the switch to turn it on, along with some other options like ‘language’, ‘hide profanity’ and so on. If ‘Live Caption’ doesn’t appear in your settings at all, unfortunately it isn’t available for your phone yet. But it should be rolled out to most phones pretty soon, or you can install a custom ROM to use it.

Captioning is a subcategory of ‘natural language processing’, the branch of computer science that deals with interaction between humans and computers through everyday language. You see, we humans come into this world with the innate ability to interact with other human beings because we’re sentient. Now suppose we had to interact with each other only by writing down messages and passing them around. That would be a real pain, and yet that’s essentially how we interact with computers today! It would be much easier to just talk to them. Natural language processing tries to solve this problem through three broad categories: ‘speech recognition’ and ‘speech-to-text’, which deal with understanding the words being spoken and converting them into text, and ‘text analysis’, which deals with understanding what that text means.

As of today, speech recognition is commonly built on hidden Markov models. The system takes the raw audio waveform from the video or podcast, chops it into small pieces, and tries to identify the ‘phonemes’ in each of those pieces. Phonemes are the elemental sounds of a language, which can be combined to create any word in that language; “tomato”, for instance, is just a short sequence of phonemes strung together. The English language is said to contain around 40 phonemes. The algorithm then compares the decoded phoneme sequences with the words in its dictionary to convert them into text.
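To get a feel for how a hidden-Markov-model decoder picks phonemes out of chopped-up audio, here’s a minimal, self-contained Python sketch. Everything in it is invented for illustration: the three-phoneme state set, all the probabilities, and the coarse “acoustic labels” standing in for real audio features. A production recognizer works on real acoustic features with vastly larger models, but the decoding idea is the same.

```python
# Toy sketch of the hidden-Markov-model idea behind speech recognition.
# States are phonemes; observations are crude acoustic labels that a real
# system would derive from ~10 ms slices of the waveform. All numbers
# below are made up for illustration.

phonemes = ["ST", "UH", "FF"]

# P(next phoneme | current phoneme): e.g. "UH" tends to follow "ST".
trans = {
    "ST": {"ST": 0.1, "UH": 0.8, "FF": 0.1},
    "UH": {"ST": 0.1, "UH": 0.2, "FF": 0.7},
    "FF": {"ST": 0.3, "UH": 0.3, "FF": 0.4},
}

# P(acoustic label | phoneme): how each phoneme tends to "sound".
emit = {
    "ST": {"hiss": 0.7, "voiced": 0.2, "noise": 0.1},
    "UH": {"hiss": 0.1, "voiced": 0.8, "noise": 0.1},
    "FF": {"hiss": 0.6, "voiced": 0.1, "noise": 0.3},
}

start = {"ST": 0.6, "UH": 0.2, "FF": 0.2}

def viterbi(frames):
    """Return the most likely phoneme sequence for the observed frames."""
    # best[p] = (probability, path) of the best path ending in phoneme p.
    best = {p: (start[p] * emit[p][frames[0]], [p]) for p in phonemes}
    for frame in frames[1:]:
        best = {
            p: max(
                ((prob * trans[prev][p] * emit[p][frame], path + [p])
                 for prev, (prob, path) in best.items()),
                key=lambda t: t[0],
            )
            for p in phonemes
        }
    return max(best.values(), key=lambda t: t[0])[1]

# Three "chopped" frames of audio, already reduced to coarse labels:
print(viterbi(["hiss", "voiced", "hiss"]))  # -> ['ST', 'UH', 'FF']
```

The Viterbi step is the key trick: instead of judging each slice of audio in isolation, the model finds the phoneme sequence that is most plausible overall, which is what lets it recover from individual noisy frames.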
To make that concrete: if I say the word “STUFF”, the sound card converts the analog signal from the microphone into a digital signal, which is then chopped into smaller pieces to find the phonemes, in this case “ST”, “UH” and “FF”. The algorithm then looks for words that have these three phonemes tagged to them.

Text analysis is the part where the computer tries to understand the text being generated. This phase is pretty important, because interpreting language without understanding the context can be very tricky! Consider the sentences “I know it” and “I said no to it”. Both sentences contain the sound “no”, but in the first it is “K-N-O-W” and in the second it is “N-O”, which have drastically different meanings.
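Here’s that lookup step as a toy pronunciation dictionary in Python. The phoneme spellings and the tiny lexicon are made up for illustration (real systems use standardized phoneme sets and dictionaries with hundreds of thousands of entries), but it shows both the lookup and why homophones make the text-analysis stage necessary:

```python
# Toy "pronunciation dictionary": decoded phoneme sequences are matched
# against words tagged with those phonemes. Entries are invented.

lexicon = {
    ("ST", "UH", "FF"): ["stuff"],
    ("N", "OH"):        ["no", "know"],   # homophones share one entry
    ("R", "AY", "N"):   ["rain", "rein", "reign"],
}

def words_for(phoneme_seq):
    """Return every word whose tagged phonemes match the decoded sequence."""
    return lexicon.get(tuple(phoneme_seq), [])

print(words_for(["ST", "UH", "FF"]))  # ['stuff'] -- unambiguous
print(words_for(["N", "OH"]))         # ['no', 'know'] -- context needed!
```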
To resolve that kind of ambiguity, the algorithm uses the ‘n-gram’ technique, which basically looks at the words adjacent to a given word to predict which word it actually is. For example, consider a sentence that contains the sound “rain”. If that sentence also contains words like ‘thunder’ and ‘lightning’, the sound is probably “R-A-I-N”, while if it contains words like ‘horse’ and ‘riding’, it is probably “R-E-I-N”.
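Here’s a minimal sketch of that context-based disambiguation in the spirit of the n-gram idea. The context words and association scores below are invented; a real model would estimate them from word-sequence counts over a huge text corpus.

```python
# Pick between homophone spellings using nearby context words.
# All scores are made up for illustration.

context_scores = {
    "rain": {"thunder": 0.9, "lightning": 0.8, "horse": 0.05, "riding": 0.05},
    "rein": {"thunder": 0.05, "lightning": 0.05, "horse": 0.9, "riding": 0.8},
}

def disambiguate(candidates, sentence_words):
    """Pick the candidate whose surrounding words score highest overall."""
    def score(word):
        table = context_scores[word]
        # Unknown context words get a small default score.
        return sum(table.get(w, 0.01) for w in sentence_words)
    return max(candidates, key=score)

print(disambiguate(["rain", "rein"],
                   ["the", "thunder", "and", "lightning", "before", "the"]))
# -> 'rain'
print(disambiguate(["rain", "rein"],
                   ["she", "grabbed", "the", "horse", "by", "the"]))
# -> 'rein'
```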
This is why you can see Google’s live transcription continually revising the words on screen as you keep talking: later context changes which earlier word is most likely! Alright, if you enjoyed this video, do give it a thumbs up and share it with your friends. Until I see you in the next video, this is Param signing off, wishing you absolutely nothing but the best of success!