VisionAgent.act().
Example:
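The example itself is missing from this extract. As a hedged sketch only (the class and method names below are local stand-ins, not the library's actual API), an act-style model is something that receives a goal and executes autonomous actions toward it:

```python
from abc import ABC, abstractmethod

# Hypothetical stand-in for an act-style model interface;
# the real askui ActModel signature may differ.
class ActModelSketch(ABC):
    @abstractmethod
    def act(self, goal: str) -> None:
        """Execute autonomous actions toward the given goal."""

class LoggingActModel(ActModelSketch):
    """Records goals instead of driving mouse/keyboard (for illustration)."""
    def __init__(self) -> None:
        self.goals = []

    def act(self, goal: str) -> None:
        # A real model would plan and perform UI actions here.
        self.goals.append(goal)

act_model = LoggingActModel()
act_model.act("Open the settings menu")
print(act_model.goals)  # ['Open the settings menu']
```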
Attributes:
- name (str): The name of the AI element that was not found.
- locations (list[pathlib.Path]): The locations that were searched for the AI element.
- message (str): The error message.

Attributes:
- status_code (int): The HTTP status code from the API response.
- message (str): The error message from the API response.

Attributes:
- message (str): The error message.

Attributes:
- message (str): The error message.

Attributes:
- message (str): The error message.
- query (str): The query that was made.

Attributes:
- message (str): The error message.
- query (str): The query that was made.
- response (Any): The response that was received.

Attributes:
- model (str | ModelComposition): The model that was used.
- model_type (Literal["Act", "Locator (locate)", "Query (get/extract)"]): The type of model that was used.

Used by the get() method of VisionAgent to extract information from screenshots or other images. These models analyze visual content and return structured or unstructured information based on queries.
Example:
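The original example is not present in this extract. As a hedged, self-contained sketch (names here are local stand-ins, not the library's API), a get-style model takes a query plus an image and returns extracted information:

```python
from abc import ABC, abstractmethod
from typing import Any

# Hypothetical sketch of a get-style model; the real GetModel
# interface in askui may differ.
class GetModelSketch(ABC):
    @abstractmethod
    def get(self, query: str, image: Any) -> str:
        """Extract information from the image based on the query."""

class ConstantGetModel(GetModelSketch):
    """Returns a canned answer, standing in for a vision-language model."""
    def get(self, query: str, image: Any) -> str:
        return f"answer to: {query}"

get_model = ConstantGetModel()
print(get_model.get("What is the page title?", image=None))
```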
Attributes:
- root (PILImage.Image): The underlying PIL Image object.

Attributes:
- root (Img): The image source to load from.

Used by askui.VisionAgent.get(), askui.VisionAgent.locate(), etc.
Accepts:
- PIL.Image.Image
- str or pathlib.Path (file path)
- "data:image/png;base64,..." (data URL)
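To illustrate how a loader might normalize these input forms, here is a hedged pure-stdlib sketch (not the library's implementation; the `load_image_bytes` helper is hypothetical, and the PIL.Image.Image branch is omitted so the code stays self-contained):

```python
import base64
import pathlib

def load_image_bytes(source) -> bytes:
    """Hypothetical helper: normalize a file path or data URL to raw bytes.

    A real image-source loader would also accept PIL.Image.Image objects;
    that branch is left out here for brevity.
    """
    if isinstance(source, pathlib.Path):
        return source.read_bytes()
    if isinstance(source, str):
        if source.startswith("data:image/"):
            # "data:image/png;base64,<payload>" -> decode after the comma
            _, payload = source.split(",", 1)
            return base64.b64decode(payload)
        return pathlib.Path(source).read_bytes()
    raise TypeError(f"unsupported image source: {type(source)!r}")

data_url = "data:image/png;base64," + base64.b64encode(b"PNG").decode()
print(load_image_bytes(data_url))  # b'PNG'
```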
Used by the click(), locate(), and mouse_move() methods of VisionAgent to find UI elements on screen. These models analyze visual content to determine the coordinates of elements based on descriptions or locators.
Example:
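The example body is missing from this extract. As a hedged sketch (local stand-in names, not the library's actual interface), a locate-style model maps an element description plus an image to screen coordinates:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of a locate-style model; the real LocateModel
# interface in askui may differ.
class LocateModelSketch(ABC):
    @abstractmethod
    def locate(self, description: str, image=None):
        """Return (x, y) coordinates of the described element."""

class CenterLocateModel(LocateModelSketch):
    """Always 'finds' the element at the screen center (for illustration)."""
    def __init__(self, width: int, height: int) -> None:
        self.width, self.height = width, height

    def locate(self, description: str, image=None):
        return (self.width // 2, self.height // 2)

locate_model = CenterLocateModel(1920, 1080)
print(locate_model.locate("login button"))  # (960, 540)
```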
Any model that can be used with VisionAgent, whether it's an ActModel, GetModel, or LocateModel. It's useful for type hints when you need to work with models in a generic way.
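Such a union type can be sketched with typing.Union. In this hedged example the three classes are empty local stand-ins defined on the spot, not imports from the library:

```python
from typing import Union

# Local stand-in classes; in the library these would be the real
# ActModel, GetModel, and LocateModel.
class ActModel: ...
class GetModel: ...
class LocateModel: ...

# A single alias for "any kind of model", usable in generic type hints.
Model = Union[ActModel, GetModel, LocateModel]

def register(name: str, model: Model) -> dict:
    """Generic code can accept any model kind via the alias."""
    return {name: model}

reg = register("my-model", GetModel())
print(type(reg["my-model"]).__name__)  # GetModel
```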
A composition of models (list of ModelDefinition) to be used for a task, e.g., locating an element on the screen to be able to click on it or extracting text from an image.
Attributes:
- task (str): The task the model is trained for, e.g., end-to-end OCR ("e2e_ocr") or object detection ("od").
- architecture (str): The architecture of the model, e.g., "easy_ocr" or "yolo".
- version (str): The version of the model.
- interface (str): The interface the model is trained for, e.g., "online_learning".
- use_case (str, optional): The use case the model is trained for. In the case of workspace-specific AskUI models, this is often the workspace id but with "-" replaced by "_". Defaults to "00000000_0000_0000_0000_000000000000" (custom null value).
- tags (list[str], optional): Tags for identifying the model that cannot be represented by other properties, e.g., ["trained", "word_level"].
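To make the field list concrete, here is a hedged sketch mirroring the documented attributes as a plain dataclass. The real ModelDefinition is a Pydantic model with additional validation, so treat this only as an illustration of the fields and defaults:

```python
from dataclasses import dataclass, field

@dataclass
class ModelDefinitionSketch:
    """Illustrative stand-in for a model definition (not the library class)."""
    task: str          # e.g., "e2e_ocr" or "od"
    architecture: str  # e.g., "easy_ocr" or "yolo"
    version: str
    interface: str     # e.g., "online_learning"
    # Custom null value used when no workspace-specific use case applies.
    use_case: str = "00000000_0000_0000_0000_000000000000"
    tags: list = field(default_factory=list)

definition = ModelDefinitionSketch(
    task="e2e_ocr",
    architecture="easy_ocr",
    version="1",
    interface="online_learning",
    tags=["trained", "word_level"],
)
print(definition.use_case)  # 00000000_0000_0000_0000_000000000000
```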
VisionAgent.
Example:
askui.VisionAgent.get().

The following types are allowed:
- ResponseSchemaBase: Custom Pydantic models that extend ResponseSchemaBase
- str: String responses
- bool: Boolean responses
- int: Integer responses
- float: Floating point responses

For example, str is passed to model(s) as the JSON schema {"type": "string"}. Also used for validating the responses of the model(s) used for data extraction.
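The str-to-{"type": "string"} correspondence above can be illustrated with a small mapping. This is a hedged sketch: the real library derives JSON schemas via Pydantic, while this table only shows the documented correspondence for the primitive response types:

```python
# Illustrative mapping from allowed primitive response types to the
# JSON schema fragments they correspond to (per the docs, str maps
# to {"type": "string"}).
TYPE_TO_JSON_SCHEMA = {
    str: {"type": "string"},
    bool: {"type": "boolean"},
    int: {"type": "integer"},
    float: {"type": "number"},
}

def schema_for(tp) -> dict:
    """Look up the JSON schema fragment for a primitive response type."""
    return TYPE_TO_JSON_SCHEMA[tp]

print(schema_for(str))  # {'type': 'string'}
```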
askui.VisionAgent.get().

This class extends Pydantic's BaseModel and adds constraints and configuration on top so that it can be used with models to define the schema (type) of the data to be extracted.
Example:
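The example body is missing from this extract. As a hedged sketch: in the library a custom schema would extend ResponseSchemaBase (a constrained Pydantic BaseModel); here a dataclass named UrlResponse (a hypothetical schema, not from the docs) stands in to show the shape of extracted data:

```python
from dataclasses import dataclass

@dataclass
class UrlResponse:
    """Illustrative schema for extracted data; the real version would
    subclass ResponseSchemaBase instead of being a dataclass."""
    url: str

# A call like agent.get("What is the current URL?", ...) with this schema
# would return a populated instance; we construct one directly here.
resp = UrlResponse(url="https://example.com")
print(resp.url)  # https://example.com
```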