Design
Event Based Design
Since Node.js was introduced, the web space has enjoyed more scalable and responsive systems thanks to Node’s event-driven architecture, in which components are loosely coupled and asynchronous programming, events and messages govern the program flow. This makes traditional performance bottlenecks such as disk I/O and database access far less significant, since access to those resources is properly managed and interested parties are notified whenever a resource is free to use.
Inspired by this design, we decided to make the screen reader as event based as possible. Eventually, the goal is to abstract the event sources behind interfaces that other components can implement, paving the way for alternative input methods.
For this to be doable, we’d need roughly the following to be implemented:
- An event manager (runs on the main thread)
This is the central hub for events. Here, events are received from system services such as AT-SPI and ATK, then propagated to the components interested in that particular event. Note: each component must register with the hub for every event it wants to receive.
The event hub should also hold a reference to the AT-SPI registry, register all the events the screen reader as a whole needs, and propagate them to the appropriate components. Running the hub on another thread should not only be discouraged but actively prohibited, for example by not implementing Send and Sync.
- Component protocol (can run on multiple threads)
This is the protocol every component should adhere to. It should include methods for sending and receiving events. Every component should be thread-safe, and the only way it communicates with the event hub is through a channel, for example Rust’s mpsc. The hub holds the registrations in a collection of data structures that, among other things, include the sending half of an mpsc channel for each registered component; a rough sketch of this wiring follows the list.
- Keyboard/alternative input method component, UI component, speech component, etc.
These are specific implementations of the component protocol. Each component should deal with one, and only one, subsystem of the screen reader, sending the proper events to the event hub for further propagation and handling by other components when needed. For example, the keyboard handler sends insert+q > event hub > gesture manager > quit action > resource cleanup and program exit.
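A minimal sketch of that wiring, assuming std::sync::mpsc and an invented, non-final set of event kinds and components:

```rust
use std::collections::HashMap;
use std::marker::PhantomData;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Events flowing between the hub and the components; the variants are
/// purely illustrative, not a finalised set.
#[derive(Clone, Debug)]
enum Event {
    KeyPressed(String),
    FocusChanged { app: String, role: String },
    Quit,
}

/// The kind of event a component can register interest in.
#[derive(PartialEq, Eq, Hash, Clone, Copy, Debug)]
enum EventKind {
    Key,
    Focus,
    Quit,
}

/// The hub runs on the main thread and keeps, for every event kind, the
/// sending half of an mpsc channel for each component registered for it.
struct EventHub {
    subscribers: HashMap<EventKind, Vec<Sender<Event>>>,
    /// Raw pointers are neither Send nor Sync, so this marker keeps the hub
    /// pinned to the thread that created it, as required above.
    _not_send_sync: PhantomData<*const ()>,
}

impl EventHub {
    fn new() -> Self {
        Self {
            subscribers: HashMap::new(),
            _not_send_sync: PhantomData,
        }
    }

    /// A component registers for an event kind and gets back the receiving
    /// half of the channel, which it can move onto its own thread.
    fn register(&mut self, kind: EventKind) -> Receiver<Event> {
        let (tx, rx) = channel();
        self.subscribers.entry(kind).or_default().push(tx);
        rx
    }

    /// Propagate an event coming from AT-SPI (or any other source) to every
    /// component registered for its kind.
    fn dispatch(&self, kind: EventKind, event: Event) {
        if let Some(senders) = self.subscribers.get(&kind) {
            for tx in senders {
                // A failed send only means that component has shut down.
                let _ = tx.send(event.clone());
            }
        }
    }
}
```

A component’s receiving side is then just a loop over its Receiver on whatever thread it happens to run on.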
Speech
As we want our screen reader to integrate with the Linux ecosystem and the existing tools as much as possible, we decided to use Speech-Dispatcher as the speech platform.
Each speech-related setting will be saved on a per-screen-reader and per-user basis by default. However, we are considering including a utility that allows modification of the global Speech-Dispatcher configuration. Whether this is achievable depends on how stable the Speech-Dispatcher configuration format is, and on whether we can reverse engineer it or find some kind of formal specification. We could use the nom crate to parse it.
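As a minimal sketch, assuming nom 7 and only the simple “OptionName value” lines (comments, sections and Include directives in the real configuration would need more work), a single option line could be parsed like this:

```rust
use nom::{
    bytes::complete::take_while1,
    character::complete::{not_line_ending, space1},
    sequence::separated_pair,
    IResult,
};

/// Parse one "OptionName value" line, e.g. `DefaultRate 20`.
/// Blank lines, comments and Include directives are deliberately ignored here.
fn option_line(input: &str) -> IResult<&str, (&str, &str)> {
    separated_pair(
        // The option name: letters, digits and underscores.
        take_while1(|c: char| c.is_alphanumeric() || c == '_'),
        space1,
        // Everything up to the end of the line is the value.
        not_line_ending,
    )(input)
}

fn main() {
    assert_eq!(option_line("DefaultRate 20"), Ok(("", ("DefaultRate", "20"))));
}
```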
Though the default speech system will be Speech-Dispatcher, we will try to make it possible for add-ons to define their own speech systems. This is certainly lower priority, as writing a Speech-Dispatcher module is very easy and allows the synthesizer to be used with other Linux tools.
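If add-on defined speech systems do materialise, they would likely boil down to a trait along these lines; the trait name and methods are invented for illustration, with Speech-Dispatcher as the default implementor:

```rust
/// A hypothetical interface an add-on could implement to provide an
/// alternative speech system; none of these names are final.
trait SpeechSystem {
    /// Speak the given text, cancelling current speech first if `interrupt` is true.
    fn speak(&mut self, text: &str, interrupt: bool);
    /// Stop all speech immediately.
    fn stop(&mut self);
    /// Set the speech rate, in whatever range the backend defines.
    fn set_rate(&mut self, rate: i32);
}
```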
Object Navigation
As in NVDA for Windows, this should follow the accessibility hierarchy as much as possible.
This should work the way users of screen readers with such features expect: a left/right command navigates to the previous/next sibling of the object in focus, up goes to the parent object (if any), and down descends into the object’s children (if any).
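A sketch of how those commands could map onto the accessibility tree, with an invented AccessibleNode trait standing in for whatever the AT-SPI bindings eventually expose:

```rust
/// The navigation commands described above; the names are illustrative.
enum NavCommand {
    PreviousSibling,
    NextSibling,
    Parent,
    FirstChild,
}

/// A minimal view of an accessible object; a stand-in, not a real AT-SPI API.
trait AccessibleNode: Sized {
    fn parent(&self) -> Option<Self>;
    fn children(&self) -> Vec<Self>;
    fn index_in_parent(&self) -> Option<usize>;
}

/// Resolve a navigation command against the current navigator object,
/// returning the new object if the move is possible.
fn navigate<N: AccessibleNode>(current: &N, command: NavCommand) -> Option<N> {
    match command {
        NavCommand::Parent => current.parent(),
        NavCommand::FirstChild => current.children().into_iter().next(),
        NavCommand::NextSibling | NavCommand::PreviousSibling => {
            let parent = current.parent()?;
            let index = current.index_in_parent()?;
            let target = match command {
                NavCommand::NextSibling => index.checked_add(1)?,
                // No previous sibling if we are already the first child.
                _ => index.checked_sub(1)?,
            };
            parent.children().into_iter().nth(target)
        }
    }
}
```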
Add-ons
Note: this section needs improvement and more discussion; however, these are the ideas and plans so far.
This section of the design document borrows concepts from the Android app format (.apk), as I think it’s brilliantly designed. Before installation, an add-on is a zip archive that is digitally signed to verify the author’s authenticity. It is composed of the following:
- A manifest file, always named “manifest.json”
Defines metadata about the add-on, such as its name, version, author, minimum Odilia version, capabilities, programming language, documentation links or local documentation, etc. (an illustrative sketch follows this list).
- The add-on bundle
The code of the add-on itself.
Depending on the language declared in the manifest, the add-on will be searched for in the following locations within the archive:
- For Lua add-ons, lib/$addon_name.lua. The add-on will be loaded through the Lua module mechanism, i.e. require.
- For C or Rust add-ons, lib/$target_triple/lib$addon_name.so. This will be loaded through the dynamic linking facilities on Linux.
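To make the manifest item above concrete, here is a rough sketch of how Odilia might represent manifest.json internally, assuming the serde crate with its derive feature; every field name is illustrative rather than a finalised format:

```rust
use serde::Deserialize;

/// A rough sketch of the data carried by manifest.json; the field names are
/// illustrative, not a finalised format.
#[derive(Deserialize, Debug)]
struct AddonManifest {
    name: String,
    version: String,
    author: String,
    /// Lowest Odilia version the add-on is known to work with.
    minimum_odilia_version: String,
    /// The implementation language declared by the add-on: "lua", "rust", "c", ...
    language: String,
    /// Capabilities the add-on requests, e.g. "speech" or "ocr".
    capabilities: Vec<String>,
    /// Optional link to online documentation, or a path to local docs.
    documentation: Option<String>,
}
```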
Note: We may allow add-ons to be written in other languages, such as C, using a custom add-on subsystem that could itself be installed as an add-on. However, installing such add-ons will warn the user that they could cause Odilia to become unstable due to things like memory leaks or segmentation faults, and that we are not responsible for those crashes, which should be reported to the add-on developers. This stems from the fact that add-ons run in the same process as Odilia, so if one crashes, it can bring Odilia down with it. Memory-safe languages such as Rust, or garbage-collected languages such as Lua, won’t produce a warning, as the risk of them crashing the entire screen reader is much lower, if not nonexistent.
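To illustrate the loading path and the same-process point above, here is a minimal sketch that assumes the libloading crate and a hypothetical odilia_addon_init entry point, neither of which is part of the current design:

```rust
use libloading::{Library, Symbol};

/// Load a compiled add-on from a path like lib/$target_triple/lib$addon_name.so.
/// `odilia_addon_init` is a hypothetical entry point, not a defined ABI.
fn load_native_addon(path: &str) -> Result<Library, Box<dyn std::error::Error>> {
    // The shared object is mapped into Odilia's own address space, which is
    // exactly why a faulty add-on can bring the whole screen reader down.
    let library = unsafe { Library::new(path)? };
    unsafe {
        let init: Symbol<unsafe extern "C" fn()> = library.get(b"odilia_addon_init")?;
        init();
    }
    // The caller keeps the returned handle alive for as long as the add-on
    // is in use; dropping it unloads the shared object.
    Ok(library)
}
```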
Add-on Distribution
We intend to provide a central distribution platform for add-ons, for easy installation and discovery by the users.
Like any extension store, it will offer features such as searching and filtering, as well as automatic updates at the user’s choice.
Since this is a public add-on platform, we will have to make sure our users don’t get infected by the add-ons distributed through it. Therefore:
- In order to be published on the store, an add-on needs to be digitally signed with a certificate that uniquely identifies the entity or organisation responsible for its development and distribution.
- A static analysis of the archive will run whenever a new version is uploaded. This could include scanning the manifest file for dubious sets of entitlements, submitting the binary files to VirusTotal, etc.
- Of course, we will encourage developers to make use of unit testing, and down the road we could provide a unit-testing framework that simulates the Odilia add-on API and can be used to programmatically test add-on functionality.
However, with a specific setting in the screen reader, the user can allow the installation of extensions not distributed through the store. The add-on manager will then allow browsing to an add-on file and installing it.
Included Add-ons
In the spirit of modularity and collaboration, and because we want our add-on API to be as bullet-proof, efficient to use, and clearly documented as possible, most components that are useful but not strictly needed to operate the core will be shipped with the distribution as add-ons. At the user’s choice, some of these can be disabled or even uninstalled, cutting down on the installed size.
OCR
The OCR system will have two modes:
NVDA Style OCR
The OCR results open in a separate window, and the user will be able to click pieces of text to make the mouse perform a click at that location in the real window. This would also be the fastest way to perform even arbitrary OCR, as the OCR information will be available to add-ons, so the window doesn’t necessarily need to pop open, exactly like NVDA.
This is still under discussion; it remains to be seen whether such things are possible with the current open-source accessibility stack.
Innovative OCR
This mode doesn’t attempt to open a new window; instead, it overlays the OCR results within the current one.
Based on each control’s shape, position and colour, Odilia will try to build an accessible tree for that window, recognising the control type and using the captured text as the control’s label where applicable. Because the probability of errors is high, the user will be able to define controls manually based on the mouse position and the text they provide, or improve the already generated OCR result, then save it as a file that can be distributed. Such tweaks will of course be resolution dependent, but the utility will probably try to correct for this as much as possible, fitting the saved definitions to the application by finding the greatest similarity in text, coordinates and control type between the provided database and the OCRed content.
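As a sketch of the data such a tree could be built from (every name here is invented for illustration), one recognised element might look like this:

```rust
/// A hypothetical element recovered from an OCR pass; none of these names
/// or fields are part of a finalised design.
#[derive(Debug, Clone)]
struct OcrElement {
    /// The recognised text, used as the control's label where applicable.
    text: String,
    /// Bounding box in screen coordinates: x, y, width, height.
    bounds: (i32, i32, u32, u32),
    /// The control type guessed from shape, position and colour.
    guessed_role: GuessedRole,
    /// Recognition confidence in the 0.0..=1.0 range, so manual corrections
    /// can take precedence over low-confidence guesses.
    confidence: f32,
    /// True if the user defined or corrected this control; these entries are
    /// what ends up in the distributable tweak file.
    user_defined: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GuessedRole {
    Button,
    CheckBox,
    TextField,
    Label,
    Unknown,
}
```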
Sound Themes (Earcons)
For a long time, mobile device users have had earcons: a sound representation of each control and element type in the current interface. Thanks to this, such users don’t need to wait for the speech to reach the control type, as it’s obvious what it is from the specific sound.
NVDA uses an add-on called Audio Themes, together with Unspoken, to deliver this functionality. While not perfect and sometimes laggy, it at least tries to reach the same level of efficiency as the mobile experience, and for the most part it manages.
While preliminary support for audio has been implemented in Orca, it hasn’t been expanded on since, and as far as we know, the only thing that uses it is progress bar beeps.
Odilia will implement this as an add-on that’s part of the distribution, since it’s another feature that isn’t necessary for the functioning of the screen reader, even though it increases productivity.
Eventually, it would also be nice to support positional audio for earcons, so the user knows where the control they’re interacting with is on screen.
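At its core, such an add-on would maintain a mapping from control roles to sound files; a tiny sketch, with the role strings and structure invented for illustration:

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// A sound theme: one earcon per control role. The role strings loosely
/// follow AT-SPI role names; this is not a finalised format.
struct SoundTheme {
    earcons: HashMap<String, PathBuf>,
}

impl SoundTheme {
    /// Look up the earcon for a role such as "push button" or "check box";
    /// None means speech alone announces the role.
    fn earcon_for(&self, role: &str) -> Option<&PathBuf> {
        self.earcons.get(role)
    }
}
```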
Gesture System
This system needs to be as flexible as possible, so the gesture system will be an event component implementing the protocol described above.
First, it receives all the raw user actions as events from the hub, registering for them all: key presses, touch gestures and other user-generated commands coming through alternative input methods.
Then, it reads the gesture configuration file, or perhaps a database.
When it finds the gesture corresponding to the event it received from the hub, it reads the action bound to that gesture, constructs the appropriate event, and sends it to the event hub to be processed and dispatched to the correct component.
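A sketch of the lookup step, with the input and action types invented for illustration:

```rust
use std::collections::HashMap;

/// Raw user actions as they arrive from the event hub; illustrative only.
#[derive(PartialEq, Eq, Hash, Clone, Debug)]
enum RawInput {
    /// A key chord such as "insert+q".
    KeyChord(String),
    /// A named touch gesture such as "two_finger_swipe_left".
    TouchGesture(String),
}

/// Screen reader actions a gesture can be bound to; also illustrative.
#[derive(Clone, Debug)]
enum Action {
    Quit,
    NextObject,
    PreviousObject,
}

/// The gesture map loaded from the configuration file (or database).
struct GestureMap {
    bindings: HashMap<RawInput, Action>,
}

impl GestureMap {
    /// Resolve a raw event from the hub into the action bound to it; the
    /// caller then wraps that action in an event and sends it back to the hub.
    fn resolve(&self, input: &RawInput) -> Option<Action> {
        self.bindings.get(input).cloned()
    }
}

fn main() {
    let mut bindings = HashMap::new();
    bindings.insert(RawInput::KeyChord("insert+q".into()), Action::Quit);
    let gestures = GestureMap { bindings };

    // The keyboard component reported insert+q; the gesture component maps
    // it to Quit and would send that back to the hub for dispatch.
    assert!(matches!(
        gestures.resolve(&RawInput::KeyChord("insert+q".into())),
        Some(Action::Quit)
    ));
}
```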
Links to Currently Available Documentation
- at-spi API documentation
- at-spi general overview, with several links.
- at-spi examples, including JavaScript, C and Python.
- Examples using exclusively py-atspi2, the human-generated bindings, not the ones generated via GObject introspection.
- Rust bindings for at-spi
- Rust bindings to Speech Dispatcher, speech-dispatcher-sys. Bindings auto-generated with rust-bindgen, written by Nolan.
- speech-dispatcher library documentation.
- The Speech Synthesis Interface Protocol (SSIP) API documentation.
- Liblouis bindings for Rust, for braille tables and multiple braille codes, including those for the majority of languages.
- Libbrlapi reference manual.
- Accessible Document Navigation Using AT-SPI