What models is AskUI using?

AskUI uses a layered system of AI models, each with a distinct role in understanding, executing, and interacting with user interfaces. Here’s how we classify and use them:

  1. Grounding Models (Locators)

    • Grounding models identify and interact with UI elements on the screen.
  2. Query Models (Asks)

    • Query models answer user queries and generate responses.
  3. Large Action Models (act command, multi-step)

    • Responsibilities
      • Turn a goal into a plan
      • Delegate to grounding models
      • Delegate to query models
      • Reflect on errors
    • Available models
      • UI-Tars
      • Computer Use
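    The responsibilities listed for Large Action Models can be sketched as a small control loop. This is a minimal illustration, not AskUI's implementation: `Step`, `ActResult`, `act`, and the `demo_*` stubs are hypothetical names, and "reflection" is reduced here to recording the error and retrying the step.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    kind: str    # "ground" (act on a UI element) or "query" (ask about the screen)
    prompt: str


@dataclass
class ActResult:
    log: List[str] = field(default_factory=list)     # outcome of each step
    errors: List[str] = field(default_factory=list)  # errors seen along the way


def act(
    goal: str,
    plan: Callable[[str], List[Step]],
    ground: Callable[[str], str],
    query: Callable[[str], str],
    max_retries: int = 1,
) -> ActResult:
    """Hypothetical act loop: plan, delegate each step, reflect on errors."""
    result = ActResult()
    for step in plan(goal):  # goal -> plan
        # Delegate to a grounding model or a query model depending on the step.
        handler = ground if step.kind == "ground" else query
        for _attempt in range(max_retries + 1):
            try:
                result.log.append(handler(step.prompt))
                break
            except RuntimeError as err:
                # Reflection, simplified: remember the error, then retry.
                result.errors.append(f"{step.prompt}: {err}")
        else:
            # Every attempt failed; note it and move on to the next step.
            result.log.append(f"gave up on: {step.prompt}")
    return result


# Stand-in planner and models (assumptions for the demo only).
def demo_plan(goal: str) -> List[Step]:
    return [
        Step("ground", "click the Login button"),
        Step("query", "Is the dashboard visible?"),
    ]


def demo_ground(prompt: str) -> str:
    return f"grounded: {prompt}"


def demo_query(prompt: str) -> str:
    return f"answered: {prompt}"


if __name__ == "__main__":
    outcome = act("log in", demo_plan, demo_ground, demo_query)
    print(outcome.log)
```

    In a real setup the three callables would be backed by actual models; keeping them as plain function arguments makes the delegation explicit and lets each layer be swapped independently.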
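    | Model Type         | Model Name   | Purpose                                | Teachable | Online Trainable |
    |--------------------|--------------|----------------------------------------|-----------|------------------|
    | Grounding          | UIDT-1       | Locate elements & understand screen    | No        | Partial          |
    | Grounding          | PTA-1        | Convert prompts into one-click actions | No        | Yes              |
    | Query              | GPT-4        | Understand & respond to user queries   | Yes       | No               |
    | Query              | Computer Use | Understand & respond to user queries   | Yes       | No               |
    | Large Action (act) | Computer Use | Plan and execute full workflows        | Yes       | No               |
    | Large Action (act) | UI-Tars      | Plan and execute full workflows        | Yes       | No               |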
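
    For readers who want to work with this overview programmatically, the table can be encoded as a small lookup structure. The sketch below only restates the rows above; `ModelInfo`, `MODELS`, `models_for`, and `teachable_models` are hypothetical names chosen for illustration and are not part of any AskUI API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelInfo:
    model_type: str        # "Grounding", "Query", or "Large Action (act)"
    name: str
    purpose: str
    teachable: bool
    online_trainable: str  # "Yes", "No", or "Partial", as given in the table


# One entry per table row, in the same order.
MODELS: List[ModelInfo] = [
    ModelInfo("Grounding", "UIDT-1", "Locate elements & understand screen", False, "Partial"),
    ModelInfo("Grounding", "PTA-1", "Convert prompts into one-click actions", False, "Yes"),
    ModelInfo("Query", "GPT-4", "Understand & respond to user queries", True, "No"),
    ModelInfo("Query", "Computer Use", "Understand & respond to user queries", True, "No"),
    ModelInfo("Large Action (act)", "Computer Use", "Plan and execute full workflows", True, "No"),
    ModelInfo("Large Action (act)", "UI-Tars", "Plan and execute full workflows", True, "No"),
]


def models_for(model_type: str) -> List[str]:
    """Return the model names listed for one model type, in table order."""
    return [m.name for m in MODELS if m.model_type == model_type]


def teachable_models() -> List[str]:
    """Return the distinct names of models marked as teachable, sorted."""
    return sorted({m.name for m in MODELS if m.teachable})


if __name__ == "__main__":
    print(models_for("Grounding"))
    print(teachable_models())
```

    Note that Computer Use appears under both the Query and Large Action types, so lookups by name alone can return more than one row.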