macOS has a bunch of apps that can do so, including SketchyVim. Basically, you get all the Vim modes, motions, and operators inside any text box in the OS, in any app. I did some searching and asked LLMs, but didn’t find any Linux equivalents. Ideally, one would work on Wayland and support per-app or per-window-class exceptions.
It looks like this works by following accessibility standards. I’m not sure whether there is an accessibility standard for input fields on Linux, but if there is, it should be possible.