Enough Prototypes, let's do this for real

Published on February 2, 2022 by BGT Lover · 5 minutes read

As of a few days ago, we decided we have played enough with the tools and components of the Linux accessibility stack to be able to say that we aren’t so ignorant anymore. We know at least the basics of how it all fits together: how our screenreaders read information from a Linux GUI app, and how the GUI toolkit and AT-SPI work together to provide the context such tools need to operate.

Since I don’t want to bore you with code snippets, git logs, conversations from who knows how long ago and so on, I will quickly give you a rundown of everything we learned during this long, laborious, but nevertheless fun and educational journey.

what we learned

AT-SPI is not so impossible to work with after all. If you have worked with Windows a11y before, even if only through the NVDA Python console, things shouldn’t be terribly alien to you. All accessible applications offer a tree of components called accessible objects. These have mostly read-only properties such as text, role, relations and so on, events through which they can report internal changes, and methods through which they can be modified to some extent. If you also manage to wrap your head around D-Bus, it’s not that hard, honestly.
Async Rust is pretty hard, and async Rust that communicates with evdev is harder. We’re using an async architecture based on Tokio, in the hope that this will make better use of the CPU and provide higher performance and reliability. A downside is that the code can be significantly harder to write, and sometimes a little harder to understand. Working with the kernel APIs was always hard, and introducing logic to bridge that model to the async model Rust uses is very hard. Our architecture for that is not really sound.
GLib is hard to understand, even harder to master, and hardest of all to bind. I understand what the people behind it wanted to do: an object-oriented library that offers a concrete hierarchy of types in a formal specification, with bindings to different languages being generated by special generators. It’s written in C for maximum performance and all that, while at the same time offering high-level, OOP-like abstractions over OS utilities, making a dev’s life much easier than if they were writing programs in raw C against the OS APIs. However, what resulted is, in my opinion, a monolithic framework that’s non-intuitive to work with for the most part. The binding generators don’t produce idiomatic code, if they are able to generate all of the interface in the first place (looking at you, gir). Weird data structures like GArray are very hard to reason about in other languages (again, trying to do that in Rust was a pain), and so on. Among some other minor reasons, that’s why we chose not to bind libatspi, but instead to communicate with the AT-SPI registryd directly through D-Bus.
In AT-SPI land, everything is a node in a tree, even some desktop decorations that may or may not be visible to the sighted, so we sometimes have to filter out the junk. This will be important later, especially when we’re dealing with object navigation, which allows the user to move freely through the accessibility tree. We must find some heuristics, mostly by trial and error, that would allow us to just skip the decorative parts. Alternatively, we can contact the developers of the desktops we see this happening in and offer them directions on how to remove the junk.
Working with the kernel directly is frowned upon for security reasons. However, evdev brings us advantages and features AT-SPI isn’t made to give us, so we’re willing to accept the risk. Note: in addon space, we will have to be very careful about what we allow to be registered as a keybinding, and about how we allow addons to register keybindings in the first place. Evdev gives us privileged access to every input device the kernel recognises, which means even the power button could be used as a key, and I don’t think users would be very happy if that were the case. So, a malicious addon could easily lock up your keyboard, touchscreen, trackpad and whatever else if we aren’t careful with the design. Perhaps we should only allow keybindings that match a pattern we either hardcode or allow the user to extend in a config file; anything else would be rejected.
D-Bus code can be wordy and verbose at times, and its error messages more cryptic than true ciphertext, but it beats working with libatspi, which is a nightmare to bind in Rust. FFI is generally not hard to do, but GLib requires too much effort, and the binding generators for Rust just aren’t up to the job yet.
Modularity is good, but too much modularity is bad. We were working on many crates separately, the only common denominator between them being that they live inside the Odilia GitHub org. That means cargo isn’t able to reliably track versions for us, plus it generally got a bit chaotic towards the end. Furthermore, some of those crates shouldn’t have been separated at all, imho. That’s why we are now working on unifying those crates, primarily using cargo workspaces.
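
To make the keybinding idea from the evdev point a bit more concrete, here is a minimal sketch of how such a check could look. Nothing in it is actual Odilia code: the `Key`, `Binding`, `BindingPolicy` and `Rejected` names are all made up for illustration, the choice of Caps Lock as the hardcoded activation key is just an assumption, and real code would work with evdev key codes rather than a hand-written enum.

```rust
use std::collections::HashSet;

/// Hypothetical key names for illustration; real code would map evdev key codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Key {
    CapsLock,
    Insert,
    Ctrl,
    Shift,
    Char(char),
    Power,
}

/// A binding an addon asks for: zero or more modifiers plus one final key.
#[derive(Debug, Clone)]
pub struct Binding {
    pub modifiers: Vec<Key>,
    pub key: Key,
}

/// Why a requested binding was refused.
#[derive(Debug, PartialEq, Eq)]
pub enum Rejected {
    /// The binding uses a key that must never be grabbed (e.g. the power button).
    ForbiddenKey(Key),
    /// None of the modifiers is a screenreader activation key.
    MissingActivationKey,
}

/// The allowed pattern: at least one activation key held, no forbidden keys anywhere.
pub struct BindingPolicy {
    activation_keys: HashSet<Key>,
    forbidden_keys: HashSet<Key>,
}

impl BindingPolicy {
    /// The hardcoded defaults (an assumption made for this sketch).
    pub fn hardcoded() -> Self {
        Self {
            activation_keys: HashSet::from([Key::CapsLock]),
            forbidden_keys: HashSet::from([Key::Power]),
        }
    }

    /// Extension point for the user's config file; forbidden keys are silently ignored.
    pub fn allow_activation_key(&mut self, key: Key) {
        if !self.forbidden_keys.contains(&key) {
            self.activation_keys.insert(key);
        }
    }

    /// Accept the binding only if it matches the allowed pattern.
    pub fn check(&self, binding: &Binding) -> Result<(), Rejected> {
        // Forbidden keys are rejected first, wherever they appear in the binding.
        for key in binding.modifiers.iter().chain(std::iter::once(&binding.key)) {
            if self.forbidden_keys.contains(key) {
                return Err(Rejected::ForbiddenKey(*key));
            }
        }
        // Plain shortcuts without an activation key stay with the applications.
        if !binding.modifiers.iter().any(|m| self.activation_keys.contains(m)) {
            return Err(Rejected::MissingActivationKey);
        }
        Ok(())
    }
}
```

With these defaults, an addon asking for Caps Lock + H would be accepted, while a plain Ctrl + C or anything involving the power button would be refused. A user who prefers Insert as their screenreader key could add it through `allow_activation_key`, which is the "extend in a config file" part of the idea.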

final thoughts

This has been a very interesting journey indeed. I personally learned more from this prototype than from most other projects I have made. However, as all playing eventually has to come to an end, we think it should end now. The prototype played its role in all this, and it’s time to do it for real. To everyone who supported us so far, whether by giving feedback, spreading the news around, or even by being in the testing group and helping us test whatever comes in the stream of consciousness that the testing branch still is, we thank you, one and all. To some degree, you influenced the growth of the Odilia screenreader, and you are a part of how things are now.

With that out of the way, expect more news shortly, maybe. Let’s hope this is a good sign for Odilia. As always, let’s build a Linux screenreader together, one step at a time!