Using Audio & Text to Speech

First, the excuses…

I thought that I had been doing pretty well on the ITR12 course, and then December came along. In common with teachers around the country, as soon as the calendar turns to 1st December I lose all semblance of order in my professional life as all the ‘other things’ – which, to be fair, are vital parts of the wider life of the school – start making increasing demands on my time, just as my nearest and dearest start doing the same outside of work. I managed to do very little for the course during this time (but was impressed I managed to do anything!). Normally, the holidays can be a good time to pick up some of the slack, but this year we were lucky enough to have arranged a trip to New York and so no slack could be taken up. “No problem,” I thought, “I’ll just get dug right in when we get home.”

Or alternatively I’ll catch some bug and spend the next two and a half weeks feeling absolutely lousy and unable to focus on any kind of work! 

Anyway, feeling a bit more human now, and noticing the course moving on relentlessly without me, I thought I had better try and get caught up. So, apologies for lagging behind, but the catch-up starts now!

Confession Time

I have to admit to some trepidation about my forthcoming confession. There’s no need for it really: I could easily write a blog post reflecting fairly honestly on my audio experiences without making things so clear, but I feel that to do so would be disingenuous at best and downright dishonest at worst. So here it is.

I hated it.

It can’t have been that bad, can it?

Actually, the answer to that is No, but Yes as well. That sounds a bit confusing, so I should probably explain.

It’s a game of two halves Brian….1

During my ‘audio experience’ I had a chance to listen to a couple of stories and a novel as audiobooks. These I really enjoyed. The stories were fairy tales from CDs given away with The Guardian during September, and featured actors Stephen Mangan and Tamsin Greig on voice duties. The novel was the audiobook of “The Great Hamster Massacre” read by someone whose name I didn’t recognise – Susie Riddell – but who turned out to be a graduate of the Royal Welsh College of Music and Drama; a professional actress and voiceover artist who has narrated many books, acted in many radio plays and is currently a regular in “The Archers”.

With such expertise on voice duties, it is perhaps unsurprising that I really enjoyed these. And looking back over the years at how much I have enjoyed radio dramas and professional readings, it is no great surprise. I can easily see why people would enjoy listening to these, and why people would choose to listen to them. I even thought about audiobooks for my car journey to and from work – could be interesting and make a change.

It’s a game of two halves Brian….2

Next, encouraged by my audiobook experience and inspired by David’s blog post, I decided to have a go at using my computer and browsing the internet via audio.

And I hated it.

Why was it so bad?

First of all, I have to admit I’m no expert at setting up or using voice accessibility on the computer (in this case Windows Narrator), so perhaps I contributed to my own downfall somewhat. But then, on the other hand, I’m the guy who should be able to do it for our school, so I’m not going to cut myself any slack there.

I could find nothing properly. Nothing. Think about that for a minute – absolutely nothing. Despite the fact I am a pretty proficient computer user and have some experience in assistive technology, I was unable to open a file, start an application or browse to a webpage purely using the audio. I had to peek. A lot.

And that’s not the half of it. David sums it all up pretty well in his fantastic post, so I’m not going to try and do the same (although I am going to recommend you read his post!) but I will add a couple of points of my own.

Voice Strain

Firstly, just listening to the voices is hard, hard work. Much harder than listening to a recorded voice – even an amateur one – and certainly much harder than listening to a ‘voice professional’ like those discussed above. To try and illustrate what I mean, I am going to insert some short audio clips in here as evidence. Using the introduction to this post as the reading material, I am going to add a text generated Chirbit of a synthetic voice reading the passage and then an AudioBoo of myself reading it (I did ask Stephen Fry to record the same clip too, but it turns out he’s rather busy).  

Check this out on Chirbit


I find the synthetic voice – and it’s not just this one, it’s most of them – incredibly difficult to listen to. They often seem to read too quickly, although I know you can slow a lot of them down. And if you miss a bit, or want to check something again, it can be very awkward. But it’s more than that: I just don’t think my ears ‘like’ doing it.

No discrimination

To compound this misery, the screen reader reads out everything that’s on screen – and I mean everything. And loads of things that aren’t on screen too! Compare that to when you read a piece of text yourself – you know you’re just looking for the actual body of text, so you probably ignore internet addresses, headers, footers, font type, page numbers, prices, copyright notices….you get the idea. The screen reader has no such discrimination; depending on how much text or how many links are on a page, you may end up being there for quite a while. You take for granted how much filtering you do when reading without even thinking about it; when you suddenly lose this ability it’s a nightmare. Then you have to think about all the keyboard shortcuts at the same time to try and get Narrator to read what you want it to. I’ve been trying for about a fortnight with the shortcuts in front of me and I still can’t manage it properly.

Think you’re working hard then, do you?

The last thing I’m going to mention is just how tiring the whole experience is. Possibly due to an interaction of the previous two points, I found the whole experience exhausting. I couldn’t believe how tiring I was finding such simple tasks – and there’s a thought to take back to the classroom.

Now, perhaps some or all of this is due to never having done these things before. Perhaps I would get better the more I practised, and would find the whole experience less uncomfortable. I would like to think that this would be the case, because if it doesn’t get easier, and that is what some students have to go through every day then I think we need to come up with a Better Way – and fast. 


  1. Thanks very much for linking to my article. I’ve not had any experience with Windows Narrator, but according to the WebAIM user survey, it isn’t even in the top 10 most used pieces of software among blind users.

    When I did my testing I used JAWS (the most popular worldwide, but costs a small fortune), and NVDA which is free.

    You mentioned how hard the voice is to understand. Listening to your clip I can understand why! The voice for JAWS isn’t too bad actually; it’s intelligible, and many blind users have the speed cranked right up to speeds far greater than I can comprehend. The free NVDA voice isn’t great, but I found it okay once I got used to it. You can also choose different voices, and I hear you can download a better voice, but this costs money, and I’ve heard some blind users download new voices illegally.

    Whilst I mostly agree that we need “a better way” like you mentioned, I’ve been chatting to some blind and screenreader users recently, and what I’ve found is that they’re actually mostly quite happy with the way the screenreader reads out everything. I considered changing the way I make websites and removing a lot of what they would consider clutter, but in doing so we’d be giving them a different experience, instead of giving them the freedom and the right to the same experience that a sighted user has.

    It could be that blind users just don’t know any better so accept this, but it’s a tricky line between tailoring their experience to make it better, and empowering them with access to full information on the page.

  2. Hey there Dave. Not a problem linking to your article; it’s a great article, and that survey you’ve posted here is pretty useful too.

    Yeah, when I was looking at using some of the software, I thought I would start by looking at what came built in to the computer first. Narrator sounds alright, but it is very difficult to control – although in fairness, I’m finding that with most of the software. I tried installing ChromeVox then, but it wouldn’t load in. I have managed it now, but it still seems a bit buggy. I looked at Thunder then, and also had a look at WebAnywhere. Even with okay vision and looking at the shortcuts, I can’t really get any of them to do what I want. WebAnywhere seems to be the most agreeable at the moment, although it seems much more prone to incorrect pronunciation. I saw a video for NVDA, and maybe it’s the one I should have started with.

    I can’t have made myself clear enough in the post – I didn’t really mean that I couldn’t understand the voices – mostly I could – it’s just that I found the very act of listening to them very, very hard work, and for different reasons for each of them! The speed is one thing, but even slowed down a bit they aren’t easy to listen to. WebAnywhere sounds quite mechanical, but reads all the text easily (even if it makes some mistakes). Narrator sounds better, but getting it to read what you want is tortuous. ChromeVox sounds good and seems quite easy to navigate to where you want it to read, but it keeps hanging in the middle of reading selected text. The Chirbit voice does not sound very good at all in the embedded player – it did sound a bit better in the website itself – but it was the easiest way I could think of to come up with a synthetic voice to embed in the blog. CALL Scotland have produced the Scottish Voices – Stuart and Heather – and they seem quite easy on the ear, although I have not given them a proper test drive just yet, and they can be difficult to get hold of. That’s interesting about the illegal downloading – I wouldn’t have thought of it, although as soon as you said it I thought “Well, yeah, I suppose you would have that issue!”

    What you said about blind and screen reader users and that they are happy with the current interfaces is very interesting. On one hand, I suppose it backs up what I was saying about practice – the more you try something the better you get. I reckon if the users that the technology is intended for are happy with it, then that’s all that counts, but from a technical standpoint I don’t know that it is replicating the sighted user experience – unless of course there are ways of customising the screenreaders that I’m not aware of (although that is quite, quite possible!). I mean, generally, as a sighted user I know what bit of a webpage I’m going to look at first, and my eyes are straight onto it as the page loads in. I don’t need to scan through the address bar or any menus or links to get to it. Can a screenreader be set in that way – to jump to the ‘main’ pane on a page loading? Plus, while I can understand that the users are happy enough with what they have, if they were able to start again and design an interface from scratch now, using all of today’s technology, would the screenreader format be what they would come up with? I would have thought a more Siri-like experience could have been a good starting point.

    As for designing resources, be they web based or otherwise, I’m more confused than ever. Using the screenreaders has made me examine how useful ‘alt text’ can be, but I guess the best way to find out would be to do what you have done and speak to those who will be using the technology.

    Thanks again for your original post, and for your comment which has helped clarify a lot of thinking!
