Skip to content

DedInc/emunium

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Emunium

Human-like browser and desktop automation. No CDP, no WebDriver -- emunium drives Chrome through a custom WebSocket bridge and performs all mouse/keyboard actions at the OS level, making scripts indistinguishable from real user input. A standalone mode covers desktop apps via image template matching and OCR.

Preview


Table of Contents


Installation

pip install emunium

Optional extras:

pip install "emunium[standalone]"   # image template matching (OpenCV + NumPy)
pip install "emunium[ocr]"          # EasyOCR text detection
pip install "emunium[parsing]"      # fast HTML parsing with selectolax
pip install "emunium[keyboard]"     # low-level keyboard input

Chrome is downloaded automatically on first launch via ensure_chrome().


Browser mode

from emunium import Browser, ClickType, Wait, WaitStrategy

with Browser(user_data_dir="my_profile") as browser:
    browser.goto("https://duckduckgo.com/")

    browser.type('input[name="q"]', "emunium automation")
    browser.click('button[type="submit"]', click_type=ClickType.LEFT)

    browser.wait(
        "a[data-testid='result-title-a']",
        strategy=WaitStrategy.STABLE,
        condition=Wait().visible().text_not_empty().stable(duration_ms=500),
        timeout=30,
    )

    print(browser.title, browser.url)

    for link in browser.query_selector_all("a[data-testid='result-title-a']")[:5]:
        print(f"  {link.text.strip()[:60]}  ({link.screen_x:.0f}, {link.screen_y:.0f})")

Browser constructor:

Browser(
    headless=False,
    user_data_dir=None,   # persistent profile dir; temp dir if None
    bridge_port=0,        # 0 = OS-assigned
    bridge_timeout=60.0,  # seconds to wait for extension handshake
)

Properties: browser.url, browser.title, browser.bridge.


Standalone mode

from emunium import Emunium, ClickType

emu = Emunium()
matches = emu.find_elements("search_icon.png", min_confidence=0.8)
if matches:
    emu.click_at(matches[0], ClickType.LEFT)

fields = emu.find_elements("text_field.png", min_confidence=0.85)
if fields:
    emu.type_at(fields[0], "hello world")

With OCR:

emu = Emunium(ocr=True, use_gpu=True, langs=["en"])
hits = emu.find_text_elements("Sign in", min_confidence=0.8)
if hits:
    emu.click_at(hits[0])

Waiting

Simple waits

All raise TimeoutError on timeout:

browser.wait_for_element(selector, timeout=10.0)
browser.wait_for_xpath(xpath, timeout=10.0)
browser.wait_for_text(text, timeout=10.0)
browser.wait_for_idle(silence=2.0, timeout=30.0)

Advanced waits with conditions

Wait() is a fluent builder. Conditions are ANDed by default:

browser.wait(
    "#results",
    strategy=WaitStrategy.STABLE,
    condition=Wait().visible().text_not_empty().stable(500),
    timeout=15,
)

Available conditions:

Method Description
.visible() Non-zero dimensions, not visibility:hidden
.clickable() Visible, enabled, pointer-events not none
.stable(duration_ms=300) Bounding rect unchanged for N ms
.unobscured() Not covered by another element at center point
.hidden() Element exists but is not visible
.detached() Element removed from DOM or never appeared
.text_not_empty() Inner text is non-empty after trim
.text_contains(sub) Inner text includes substring
.has_attribute(name, value=None) Attribute present (optionally with value)
.without_attribute(name) Attribute absent
.has_class(name) CSS class present
.has_style(prop, value) Computed style property equals value
.count_gt(n) More than N matching elements in DOM
.count_eq(n) Exactly N matching elements in DOM
.custom_js(code) Custom JS expression; receives el argument

WaitStrategy values: PRESENCE, VISIBLE, CLICKABLE, STABLE, UNOBSCURED.

Logical conditions

Combine conditions with OR/AND/NOT logic:

# Wait for EITHER a success message OR a captcha box
element = browser.wait(
    "body",
    condition=Wait().any_of(
        Wait().has_class("success-loaded"),
        Wait().text_contains("Verify you are human")
    ),
    timeout=15,
)

# Explicit AND (same as chaining, but groups sub-conditions)
browser.wait(
    "#panel",
    condition=Wait().all_of(
        Wait().visible().text_not_empty(),
        Wait().has_attribute("data-ready", "true"),
    ),
)

# NOT: wait until element is no longer disabled
browser.wait(
    "#submit",
    condition=Wait().not_(Wait().has_attribute("disabled")),
)

Negative waits

Wait for a loading spinner to be removed from the DOM:

browser.click("#submit-btn")
browser.wait(".loading-spinner", condition=Wait().detached(), timeout=20)

Wait for an element to become hidden (still in DOM but invisible):

browser.wait(".tooltip", condition=Wait().hidden(), timeout=5)

Soft waits

Check for something without crashing when it doesn't appear. Pass raise_on_timeout=False to get None instead of TimeoutError:

promo = browser.wait(
    ".promo-modal",
    condition=Wait().visible(),
    timeout=3.0,
    raise_on_timeout=False,
)
if promo:
    promo.click()

Network waits

Wait for a specific background API request to finish before proceeding. Uses glob-style pattern matching against response URLs:

browser.click("#fetch-data")
response = browser.wait_for_response("*/api/v1/users*", timeout=10.0)
if response:
    print(f"API status: {response['statusCode']}")

Standalone waits

Polling waits for the standalone (non-browser) mode. These call find_elements / find_text_elements in a loop:

emu = Emunium()

# Wait up to 10s for an image to appear on screen
match = emu.wait_for_image("submit_button.png", timeout=10.0, min_confidence=0.85)
emu.click_at(match)

# Wait for OCR text (requires ocr=True)
emu_ocr = Emunium(ocr=True)
hit = emu_ocr.wait_for_text_ocr("Payment Successful", timeout=30.0)
emu_ocr.click_at(hit)

# Soft standalone wait -- returns None on timeout
maybe = emu.wait_for_image("optional.png", timeout=3.0, raise_on_timeout=False)

Element API

Element instances are returned by all query and wait methods.

Properties: tag, text, attrs, rect, screen_x, screen_y, center, visible.

element.scroll_into_view()
element.hover(offset_x=None, offset_y=None, human=True)
element.move_to(offset_x=None, offset_y=None, human=True)
element.click(human=True)
element.double_click(human=True)
element.right_click(human=True)
element.middle_click(human=True)
element.type(text, characters_per_minute=280, offset=20, human=True)
element.drag_to(target, human=True)
element.focus()
element.get_attribute(name)
element.get_computed_style(prop)
element.refresh()  # re-query from page

Querying elements

browser.query_selector(selector)       # -> Element | None
browser.query_selector_all(selector)   # -> list[Element]
browser.get_by_text(text, exact=False) # -> list[Element]
browser.get_all_interactive()          # -> list[Element]

Mouse interaction

browser.click(selector, click_type=ClickType.LEFT, human=True, timeout=10.0)
browser.click_at(target, click_type=ClickType.LEFT, human=True, timeout=10.0)
browser.move_to(target, offset_x=None, offset_y=None, human=True, timeout=10.0)
browser.hover(target, ...)  # alias for move_to
browser.drag_and_drop(source_selector, target_selector, human=True)
browser.get_center(target)  # -> {"x": int, "y": int}

target can be a CSS selector string or an Element.


Keyboard interaction

browser.type(selector, text, characters_per_minute=280, offset=20, human=True)
browser.type_at(target, text, characters_per_minute=280, offset=20, human=True)

Non-ASCII text is pasted via clipboard (pyperclip). Install emunium[keyboard] for the keyboard library; otherwise pyautogui is used.


Scrolling

browser.scroll_to(element_or_selector)  # scroll element into viewport
browser.scroll_to(x, y)                 # scroll to absolute pixel coords

JavaScript execution

result = browser.execute_script("return document.title")

Tab management

browser.new_tab(url="about:blank")
browser.close_tab(tab_id=None)
browser.tab_info()  # -> dict with url, title, tabId, status
browser.page_info() # -> scrollX, scrollY, innerWidth, innerHeight, readyState, ...

PageParser and Locator

Offline HTML parsing with CSS selectors. No browser needed.

from emunium import PageParser

html = browser.execute_script("return document.documentElement.outerHTML")
parser = PageParser(html)

links = parser.locator("a[href]").all()
btn = parser.get_by_text("Sign in", exact=True).first
inputs = parser.get_by_role("textbox").all()
field = parser.get_by_placeholder("Search").first
email = parser.get_by_label("Email address").first
submit = parser.get_by_test_id("submit-btn").first

Locator supports: .first, .last, .nth(i), .all(), .count(), .inner_text(), .get_attribute(name), .filter(has_text=...).

Requires pip install "emunium[parsing]".


ClickType

from emunium import ClickType

ClickType.LEFT    # default
ClickType.RIGHT   # context menu
ClickType.MIDDLE
ClickType.DOUBLE

Optional extras

Extra What it installs What it unlocks
standalone opencv-python, numpy find_elements() image matching
ocr opencv-python, numpy, easyocr find_text_elements() OCR
parsing selectolax PageParser / Locator
keyboard keyboard Low-level keystroke delivery
pip install "emunium[standalone,parsing,keyboard]"

Advanced utilities

  • Bridge -- the raw WebSocket transport to the Chrome extension. For custom messaging outside the Browser facade.
  • CoordsStore -- thread-safe cache for element coordinates across async workflows.
  • ElementRecord -- lightweight dataclass used by CoordsStore.

ensure_chrome

from emunium import ensure_chrome

path = ensure_chrome()

Downloads the latest stable Chrome for Testing build for the current platform if not already present. Called automatically by Browser.launch().


Notes and limitations

  • Chrome only. The bridge extension targets Chrome/Chromium.
  • One active tab at a time. The bridge tracks a single pinned tab. new_tab() switches focus.
  • Parallel instances may conflict on the shared port.json. Use different bridge_port values.
  • Non-ASCII text is pasted via clipboard instead of typed keystroke-by-keystroke.
  • headless=True uses --headless=new. Coordinates still compute but the cursor is not visible. Use human=False in display-less environments.
  • Image matching uses multi-scale (0.9x, 1.0x, 1.1x) and multi-rotation (-10, 0, +10) search.

License

MIT

About

Human-like browser and desktop automation. No CDP, no WebDriver - emunium drives Chrome through a custom WebSocket bridge and performs all mouse/keyboard actions at the OS level, making scripts indistinguishable from real user input.

Topics

Resources

License

Stars

Watchers

Forks

Contributors