Adding Text To Speech to Accessible Sudoku

By cyberpuffin, 14 September, 2023

One of the main goals of Accessible Sudoku has been to make Sudoku accessible to all.  It's right there in the title.

The biggest, and first, step towards Accessibility was the theming system.  By allowing the player to adjust their theme (font, sizes, and colors) they could choose what was easiest and most aesthetically appealing to them.

  • Players with color blindness can adjust the colors to their personal needs
  • Low-vision players can increase the size of everything, albeit with a little scrolling
  • Dyslexic players can use the Open Dyslexic font for easier reading (the same font Sudoku Starter is written in)

This leaves at least two big gaps in audience:

  1. New players with no knowledge of how to play
  2. Low- or No-Vision players

New players can be taught with a tutorial (pending), but the issue of low- or no-vision players requires a bit more work, in large part because modern screen readers don't work with Godot as of this devlog.

The best practices for text-to-speech, in terms of the ideal player experience for blind players, is to send output to the player's screen reader. This preserves the choice of language, speed, pitch, etc. that the user set, as well as allows advanced features like allowing players to scroll backward and forward through text. As of now, Godot doesn't provide this level of integration.

-- Godot Tutorial - Text To Speech

emphasis added

Text to speech

No screen reader support yet, but Godot does support the built-in TTS system that ships with the main export targets: Android, iOS, Mac, Web, and Windows. Depending on the variant, Linux (X11) can't be relied upon to ship with the TTS libraries.  It may be necessary to install speech-dispatcher, festival, and espeakup with the variant's package manager (see Requirements for functionality for more details).

Voices available

The first step to enabling TTS is to check if voices are available and Log an error if there are none.
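
A minimal sketch of that check, assuming Godot 4.1 with the Text To Speech project setting enabled. The function name has_tts_voices is hypothetical, and push_error stands in for the project's own Log.error helper.

```gdscript
## Return true when the platform exposes at least one TTS voice.
## Hypothetical helper, not taken from the Accessible Sudoku source.
func has_tts_voices() -> bool:
	# Some platforms or builds have no TTS support at all.
	if not DisplayServer.has_feature(DisplayServer.FEATURE_TEXT_TO_SPEECH):
		push_error("Text to speech is not supported on this platform")
		return false

	# Each entry is a Dictionary with "name", "id" and "language" keys.
	var voices: Array = DisplayServer.tts_get_voices()
	if voices.is_empty():
		push_error("No text to speech voices were found")
		return false

	return true
```

Calling this once at startup, before building the voice list, gives a single place to disable the TTS options when nothing can speak.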

tts_get_voices_for_language(<lang>)

There are two main methods for getting voices from the DisplayServer: tts_get_voices() and tts_get_voices_for_language(<lang>).

At first glance the function for getting the voices for a specific language looks very attractive, especially since the player can choose their language after the locale / translation updates.

var voices: PackedStringArray = DisplayServer.tts_get_voices_for_language(OS.get_locale_language())

Great, now there's an array of voices that can be set.  Add them into an ItemList and we've got ourselves a voice list... well, mostly.

TTS Voices available list, with tts_get_voices_for_language, on Windows.
Godot.tts_get_voice_for_languages.4.1.1.png

Turns out the tts_get_voices_for_language(<lang>) method only gathers the voices' IDs and not their names.  Viewing the full registry key for a Windows TTS voice is not especially useful, and writing a method to parse the multi-platform IDs for useful names is less than appealing.

tts_get_voices()

This brings us back to tts_get_voices(), which returns an Array of Dictionaries holding each voice's name, ID, and language.  With a minor modification we can still use the locale language from the OS singleton and build ourselves a prettier list of voices available on the system.

## Build the item list of TTS voices to select from
func update_tts_voice_list() -> void:
	var current_locale: String = OS.get_locale_language()
	var index: int

	self.tts_list.clear()
	
	for voice in DisplayServer.tts_get_voices():
		if voice["language"].begins_with(current_locale):
			index = self.tts_list.add_item(voice["name"])
			self.tts_list.set_item_metadata(index, voice["id"])
			if voice["id"] == Config.audio.tts_voice_selected:
				self.tts_list.select(index)

	self.tts_list.sort_items_by_text()
	self.tts_list.ensure_current_is_visible()

	return

Breakdown

The above starts off with a documentation comment describing the method, followed by the method declaration itself.  It then moves into a few variable declarations, including one that grabs the current language for later testing.

Next, the existing entries are cleared from the list and a for loop starts iterating over the Array of voice Dictionaries available.

Each voice's language is checked, and the voice is skipped if its language doesn't begin with the current language.

On voices where the language matches we add a new item to the ItemList (self.tts_list), storing its index.

Then the voice's ID is added to the item's metadata for later retrieval.

If the voice ID matches the currently selected voice ID, set the newly added ItemList item as selected.

Finally, after the voice loop, the item list is sorted and scrolled to the selection (when applicable).

TTS Voices available list, with tts_get_voices, on Windows.
Godot.tts_get_voices.4.1.1.png

Other voice attributes

As the Godot tutorial said, it's a best practice to allow the player to adjust the rate, pitch, and voice for TTS and restore these settings at next load.
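
One way to persist those settings is Godot's ConfigFile class. The sketch below is an assumption about how this could be done, not the game's actual Config singleton; the file path, section name, and function names are all hypothetical.

```gdscript
const TTS_SETTINGS_PATH: String = "user://tts_settings.cfg"  # hypothetical path

## Save the player's TTS choices so they survive a restart.
func save_tts_settings(voice_id: String, rate: float, pitch: float) -> void:
	var settings := ConfigFile.new()
	settings.set_value("tts", "voice", voice_id)
	settings.set_value("tts", "rate", rate)
	settings.set_value("tts", "pitch", pitch)
	if settings.save(TTS_SETTINGS_PATH) != OK:
		push_error("Failed to save TTS settings")

## Load saved TTS choices, falling back to defaults on first run.
func load_tts_settings() -> Dictionary:
	var settings := ConfigFile.new()
	# load() fails when the file does not exist yet (first run);
	# get_value() then returns the defaults supplied below.
	var err := settings.load(TTS_SETTINGS_PATH)
	if err != OK:
		push_warning("No saved TTS settings found, using defaults")
	return {
		"voice": settings.get_value("tts", "voice", ""),
		"rate": settings.get_value("tts", "rate", 1.0),
		"pitch": settings.get_value("tts", "pitch", 1.0),
	}
```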

So new components were added to Accessible Sudoku's Options menu with controls that correlate to their method's limits.  For example, the rate can only range from 0.1 to 10.0.
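
As a rough illustration of why the controls mirror the method's limits, here is a sketch that clamps the stored values before handing them to DisplayServer.tts_speak(). The function name speak and its parameters are hypothetical; the ranges are the ones documented for tts_speak() in Godot 4.1.

```gdscript
## Speak a message with the player's chosen voice, pitch and rate.
## Hypothetical helper; values outside the documented ranges are clamped.
func speak(message: String, voice_id: String, pitch: float, rate: float) -> void:
	# Documented ranges for DisplayServer.tts_speak():
	# pitch 0.0 to 2.0, rate 0.1 to 10.0, volume 0 to 100.
	var safe_pitch: float = clampf(pitch, 0.0, 2.0)
	var safe_rate: float = clampf(rate, 0.1, 10.0)
	var volume: int = 50  # the method's default volume
	DisplayServer.tts_speak(message, voice_id, volume, safe_pitch, safe_rate)
```

If the sliders and spin boxes in the Options menu already use the same minimum and maximum values, the clamp is only a safety net for hand-edited settings files.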

Triggers

Now that the voice and attributes are set and saved we can move on to triggering the TTS on the required events.

There are two main ways to track a user scanning the screen: mouse and touch.  Systems should have one or the other if not both.

Tracking with the mouse is easy enough: watch _input for right-click events, enable mouse tracking when the right button is pressed, and disable it when the button is released.
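
A minimal sketch of the right-click toggle, assuming it lives in a script on an autoloaded Node. The tracking_active flag is a hypothetical name for whatever state the TTS check reads.

```gdscript
extends Node

## True while the player holds the right mouse button (hypothetical flag).
var tracking_active: bool = false

func _input(event: InputEvent) -> void:
	if event is InputEventMouseButton and event.button_index == MOUSE_BUTTON_RIGHT:
		# pressed is true on button down and false on release,
		# so the flag follows the button state directly.
		tracking_active = event.pressed
```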

Screen touch is a bit harder to differentiate from normal input, especially when ScrollContainers are involved.  With a timer, however, we can monitor how long the screen touch has been active and start tracking after a timeout.
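
The long-press idea can be sketched with a one-shot Timer. This is an assumed implementation, not the game's code: the 0.5 second threshold, the tracking_active flag, and the handler names are hypothetical, and multi-touch is ignored for brevity.

```gdscript
extends Node

const HOLD_SECONDS: float = 0.5  # hypothetical long-press threshold

var tracking_active: bool = false
var hold_timer: Timer

func _ready() -> void:
	# One-shot timer that fires only if the touch is held long enough.
	hold_timer = Timer.new()
	hold_timer.one_shot = true
	hold_timer.wait_time = HOLD_SECONDS
	hold_timer.timeout.connect(_on_hold_timeout)
	add_child(hold_timer)

func _input(event: InputEvent) -> void:
	if event is InputEventScreenTouch:
		if event.pressed:
			# Finger down: start counting.
			hold_timer.start()
		else:
			# Finger up: cancel the countdown and stop tracking.
			hold_timer.stop()
			tracking_active = false

func _on_hold_timeout() -> void:
	# The touch outlasted the threshold, so treat it as a scan gesture.
	tracking_active = true
```

A short tap or a quick scroll flick ends before the timer fires, so normal input is left alone.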

Signaling mouse_entered

With triggers and voices in place we can start connecting to each element's mouse_entered signal and trigger a TTS check (to check if tracking is active or not) before interrupting whatever is playing to speak the current message.
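
The check itself is not shown in this devlog, so the following is only a guess at its shape. The variables tracking_active and voice_id are hypothetical; the final argument to tts_speak() is its interrupt flag, which stops the current utterance and clears the queue before speaking.

```gdscript
extends Node

var tracking_active: bool = false  # set by the mouse / touch triggers
var voice_id: String = ""          # ID of the voice chosen in Options

## Speak the message only while the player is scanning the screen.
func check_tts(message: String) -> void:
	if not tracking_active or message.is_empty():
		return

	# Arguments: text, voice, volume, pitch, rate, utterance_id, interrupt.
	DisplayServer.tts_speak(message, voice_id, 50, 1.0, 1.0, 0, true)
```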

note: Accessible Sudoku's Project Settings -> Input Devices -> Pointing -> Emulate Mouse From Touch checkbox is true.

But how do you connect to a bunch of arbitrary elements in a scene?  Node groups.

Node groups and mouse processing

Any node can be added into a node group, though the scripting around such a group will have to account for the variety of Nodes it's expected to handle.  Additional node changes may be required, based on the node's defaults.

For instance, the mouse filter on a Label is set to ignore, so it won't receive the mouse_entered signal unless it's changed.

  • Update the mouse filter on each node
    • ignore: elements that shouldn't react to the mouse
    • pass: elements that should react to the mouse and propagate unhandled events up to their parent
    • stop: elements that should react to the mouse and stop further processing
  • Add each node to the "tts" node group
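
Both steps can be done in the editor, but they can also be done from a script. A small sketch, assuming a hypothetical helper named register_for_tts:

```gdscript
## Make a Control eligible for TTS hover announcements.
## Hypothetical helper, not taken from the Accessible Sudoku source.
func register_for_tts(control: Control) -> void:
	# Controls such as Label ignore the mouse by default and would never
	# emit mouse_entered, so switch them to pass.
	if control.mouse_filter == Control.MOUSE_FILTER_IGNORE:
		control.mouse_filter = Control.MOUSE_FILTER_PASS

	if not control.is_in_group("tts"):
		control.add_to_group("tts")
```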

Watch out for that popup

Popups will grab focus from the main application and prevent the global singletons from processing their _input() methods.  One work-around is to add a script to the dialog and pass the input event to the appropriate singleton.
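
A rough sketch of that work-around, attached to the dialog node. TtsTracker and handle_input are hypothetical names for the autoloaded singleton and its input handler; substitute whatever the project actually uses.

```gdscript
extends AcceptDialog

## Forward input received by this dialog's window to the TTS singleton.
func _input(event: InputEvent) -> void:
	# TtsTracker is a hypothetical autoload; handle_input is assumed to
	# contain the same logic as that singleton's own _input().
	TtsTracker.handle_input(event)
```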

This works on Dialogs, though MenuButton popups remain to be handled.

Connect to Nodes in group

The final step is to connect the Nodes in the new group to the new TTS methods.

## Connect TTS node group to mouse_entered signals
func add_tts_to_scene_objects() -> void:
	var tts_callable: Callable
	var tts_msg: String

	# Cycle through nodes in the TTS group
	for tts_child in get_tree().get_nodes_in_group("tts"):
		# Determine message based on Node type
		if tts_child is CheckBox:
			tts_msg = "%s %s" % [tr(tts_child.text), tr("CHECK")]
		elif tts_child is ItemList:
			if tts_child.is_anything_selected():
				tts_msg = "%s, %s:" % [tts_child.name, tr("SELECTED")]
				for selected_item in tts_child.get_selected_items():
					tts_msg += " %s" % [tts_child.get_item_text(selected_item)]
			else:
				tts_msg = "%s %s %s" % [tts_child.name, tr("BLANK"), tr("SELECTED")]
		elif tts_child is Label or tts_child is LineEdit:
			tts_msg = tr(tts_child.text)
		elif tts_child is MenuButton:
			tts_msg = "%s %s" % [tr(tts_child.text), tr("MENU")]
		elif tts_child is Slider:
			tts_msg = "%s %s %s %s" % [
				tr("SLIDER"), str(tts_child.value), tr("OUTOF"),
				str(tts_child.max_value)
			]
		elif tts_child is SpinBox:
			tts_msg = tr(tts_child.name)
		else:
			tts_msg = tts_child.name

		# Bind message to TTS check
		tts_callable = Callable(Config.audio, "check_tts").bind(tts_msg)
		
		# Check if the Node is already connected
		if !tts_child.mouse_entered.is_connected(tts_callable):
			# Connect to node's mouse_entered signal
			if tts_child.mouse_entered.connect(tts_callable) != OK:
				Log.error("Failed to connect %s to TTS check" % [tts_child.name])
				continue

			Log.debug("Connected %s to TTS" % [tts_child.name])

	return

In lieu of a breakdown, comments have been added to the method.

Limitations

  • Values are determined and set at the time of connecting the callback.  If a Label gets updated in a later frame, the old value will still be bound to the tts callback.
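
One possible way around this, shown only as a sketch and not something the game does yet, is to bind the node rather than the message and read the text when the mouse enters. The function names here are hypothetical; Config.audio.check_tts is the method used in the listing above.

```gdscript
## Connect a Label so its current text is read at hover time.
## Call once per Label; connecting the same callable twice raises an error.
func connect_live_label(label: Label) -> void:
	label.mouse_entered.connect(_speak_label_text.bind(label))

## Build the message when the signal fires instead of when it was connected.
func _speak_label_text(label: Label) -> void:
	Config.audio.check_tts(tr(label.text))
```

The trade-off is a little more work per hover in exchange for messages that always match what is on screen.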

Wrap-up

From here all that should remain is a bit of play testing to find which nodes haven't yet been added to the "tts" node group, and working out the popup _input() hogging.

Versions

  • Godot 4.1.1-stable