Interesting excerpt:
De Boer agrees that our brains are the bottleneck. But, he says, instead of being limited by how quickly we can process information by listening, we’re likely limited by how quickly we can gather our thoughts. That’s because, he says, the average person can listen to audio recordings sped up to about 120%—and still have no problems with comprehension. “It really seems that the bottleneck is in putting the ideas together.”


Yes, exactly. This is information that’s encoded by tone, and it is accounted for in the 7 bits per syllable (or per lack of a syllable, in the case of periods, for example). It was more of an example to show that if what you’re conveying is assumed to always be speech, the encoding can be much more efficient.
On that note, a thing I forgot to mention is that speech assumes that what will be said is pretty much always valid. For example, sure, ASCII has a lot more information density at 8 bits per character, as you point out, but in reality it’s capable of encoding things like “hsuuia75hs”. If you tried communicating this to someone over speech, you’d find that your average speed drops dramatically from the normal 7 bits/syllable, whereas the ASCII text of my comment is encoded at a constant rate per character. That’s one of the trade-offs.
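
To put rough numbers on that trade-off, here is a minimal Python sketch. It is only an illustration under assumptions of my own: the 65,536-word vocabulary is a made-up figure, the words are treated as equally likely, and the function names are hypothetical. The point is just that a fixed-width code charges the same for gibberish as for real words, while a code that assumes valid input can be much shorter per word.

```python
import math


def fixed_width_bits(text: str, bits_per_char: int = 8) -> int:
    """Bits used when every character costs the same, valid word or not."""
    return len(text) * bits_per_char


def vocabulary_bits(num_words: int, vocabulary_size: int) -> float:
    """Bits needed to pick num_words items from a fixed list of words.

    Assumes every word in the list is equally likely (a simplification).
    """
    return num_words * math.log2(vocabulary_size)


gibberish = "hsuuia75hs"

# The fixed-width code handles gibberish at the usual per-character cost.
print(fixed_width_bits(gibberish))  # 80

# A code restricted to a hypothetical 65,536-word vocabulary needs far
# fewer bits per word, but it has no way to express the gibberish at all.
print(vocabulary_bits(1, 65536))  # 16.0
```

In speech, the closest fallback for the gibberish is spelling it out one character at a time, which is where the slowdown comes from.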