IBM’s Watson Speech-To-Text censors Dick Van Dyke’s… surname.

Heading image by Turinboy / Flickr

IBM has fully embraced the Watson branding ever since its debut on Jeopardy. The Watson umbrella now covers 19 APIs and counting. The terms are also surprisingly good, good enough that I used the APIs for my assistant testing. They do have some quirks, though.

The funniest one, to me, was a transcription that came up with the Mary Poppins question, which asks about its cast. Quoth Watson, verbatim:

the cast of Mary Poppins includes Julie Andrews Dick Van **** and 17 others .

Dick Van What? It had been smart enough not to censor the proper name “Dick”, so either it parsed the surname as a lowercase common word or something else had crept in. Not to worry, Mr Fawlty, I learn: the profanity filter for US English is on by default. Turning it off gives us the unadulterated (or is that adult-rated?) response:

the cast of Mary Poppins includes Julie Andrews Dick Van Dyke and 17 others
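For anyone wanting to flip the same switch, here is a minimal sketch of what the request URL might look like. It assumes the filter is exposed as a `profanity_filter` query parameter on the service's `/v1/recognize` endpoint. The base URL below is a placeholder (the real one depends on your service instance), the model name is an assumption, and `recognize_url` is a hypothetical helper of mine, not part of any IBM SDK.

```python
from urllib.parse import urlencode

# Placeholder: the real base URL is specific to your service instance.
BASE_URL = "https://example.invalid/speech-to-text/api"


def recognize_url(base_url, profanity_filter=True, model="en-US_BroadbandModel"):
    """Build a recognize URL (hypothetical helper, for illustration only)."""
    query = urlencode({
        "model": model,  # assumed model name
        # Booleans are sent as lowercase strings in the query string.
        "profanity_filter": str(profanity_filter).lower(),
    })
    return f"{base_url}/v1/recognize?{query}"


# The filter defaults to on, so it has to be disabled explicitly.
print(recognize_url(BASE_URL, profanity_filter=False))
```

The audio itself would go in the request body along with your credentials, both of which are left out here.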

So it got the name right, which means the speech recogniser was not only trained properly but also produced the proper name as output. The word filter that runs afterwards apparently took issue with “Dyke”, presumably because of its slang usage, with no knowledge of or interest in the fact that it is being used as a proper noun here.
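I have no idea how IBM's filter actually works, but a context-free word list applied after recognition would behave exactly like this. The sketch below is my guess at the general shape, not their implementation; the one-word `BLOCKLIST` and the `naive_filter` function are made up for illustration.

```python
import re

# Hypothetical blocklist for illustration; not IBM's actual list.
BLOCKLIST = {"dyke"}


def naive_filter(transcript):
    """Mask any blocklisted word, ignoring case and context."""
    def mask(match):
        word = match.group(0)
        # Case-insensitive lookup, so a capitalised surname is masked too.
        return "*" * len(word) if word.lower() in BLOCKLIST else word

    return re.sub(r"[A-Za-z']+", mask, transcript)


raw = "the cast of Mary Poppins includes Julie Andrews Dick Van Dyke and 17 others"
print(naive_filter(raw))
# prints: the cast of Mary Poppins includes Julie Andrews Dick Van **** and 17 others
```

Because the lookup only sees one word at a time, it cannot tell that “Dyke” follows “Dick Van” and is part of a name.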
