feat(search): add semantic search for AI-powered tool discovery #142

Shashikant86 · 2026-02-06T16:02:19Z

Problem

StackOne has over 10,000 actions across all connectors, and the number is growing; some connectors
alone have 2,000+ actions. Keyword matching breaks down when someone searches for "onboard new hire" but the
action is called hris_create_employee. The StackOne AI SDK already supports keyword search; we need to add support for semantic search using the action search service.

Implementation Details

Semantic search is added to the SDK using the action search service, benchmarked locally for now
Uses the new benchmarks from the action search service; they will be run against prod later

Change Summary

- Add SemanticSearchClient, which interfaces with StackOne's /actions/search API for natural language tool discovery, with Pydantic models for type-safe results
- Add search_tools() to StackOneToolSet with over-fetching and connector filtering; it only returns tools the user has linked accounts for, sorted by semantic relevance
- Add search_action_names() for lightweight lookups without loading full tool definitions
- Fall back automatically to local hybrid BM25+TF-IDF search when the semantic API is unavailable
- Add utility tools (tool_search, tool_execute) for AI agents to dynamically discover and execute tools at runtime, supporting both local and semantic search modes
- Extend Tools with utility_tools(), get_connectors(), and filter_by_connector() for connector-aware tool management
- Include a benchmark suite with 94 evaluation tasks across 8 categories (HRIS, ATS, CRM, PM, messaging, docs, marketing, LMS), currently benchmarked against local endpoints. Semantic search achieves 76.6% Hit@5 vs 66.0% for local search (+10.6 percentage points)
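For readers unfamiliar with the metrics: Hit@5 counts a task as a hit when the expected action appears among the top five results, and MRR averages the reciprocal rank of the first correct result. The following is a minimal sketch of how such metrics are commonly computed, assuming one expected action name per task. The helper names (`hit_at_k`, `reciprocal_rank`, `evaluate`) and the sample action names are illustrative and are not taken from the benchmark script.

```python
from __future__ import annotations


def hit_at_k(ranked: list[str], expected: str, k: int = 5) -> bool:
    """Return True when the expected action is among the top-k results."""
    return expected in ranked[:k]


def reciprocal_rank(ranked: list[str], expected: str) -> float:
    """Return 1/rank of the expected action, or 0.0 when it is absent."""
    for position, name in enumerate(ranked, start=1):
        if name == expected:
            return 1.0 / position
    return 0.0


def evaluate(runs: list[tuple[list[str], str]], k: int = 5) -> tuple[float, float]:
    """Aggregate Hit@k and MRR over (ranked_results, expected_action) pairs."""
    if not runs:
        return 0.0, 0.0
    hits = sum(hit_at_k(ranked, expected, k) for ranked, expected in runs)
    mrr = sum(reciprocal_rank(ranked, expected) for ranked, expected in runs)
    return hits / len(runs), mrr / len(runs)


# Illustrative data only: three made-up tasks, not benchmark results.
runs = [
    (["hris_create_employee", "hris_list_employees"], "hris_create_employee"),
    (["crm_list_deals", "crm_get_deal"], "crm_get_deal"),
    (["ats_list_jobs"], "ats_create_job"),
]
hit_rate, mrr = evaluate(runs)
print(f"Hit@5={hit_rate:.3f} MRR={mrr:.3f}")
```

Under this definition, the reported 76.6% corresponds to 72 hits out of 94 tasks and 66.0% to 62 out of 94, which is a difference of about 10.6 percentage points.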

How to Test

- Run just test to verify all existing and new unit tests pass
- Run just lint to confirm no linting regressions
- Verify search_tools() returns relevant tools filtered to available connectors
- Verify fallback to local search when semantic API is unavailable
- Verify utility tools work with both use_semantic_search=True and default local mode
- Run benchmark suite against local API to validate accuracy metrics

Summary by cubic

Adds semantic search to the SDK for natural language tool discovery across connectors, improving relevance over keyword search. Benchmarks show +10.6% Hit@5 vs local BM25 (76.6% vs 66.0%).

New Features
- SemanticSearchClient for /actions/search with typed models; exported in init; search_action_names added in both client and StackOneToolSet.
- StackOneToolSet.search_tools with account-scoped connector filtering, per-connector backfill when broad search is sparse, optional fallback to local BM25+TF‑IDF, and normalization/dedup of versioned action names.
- Utility tools: tool_search auto-switches to semantic when semantic_client is passed (create_semantic_tool_search); supports optional connector filter; tool_execute unchanged.
- Connector helpers: StackOneTool.connector and Tools.get_connectors for connector-aware filtering.
- Docs and examples: new Semantic Search README section, examples/semantic_search_example.py, semantic variant in utility_tools_example, plus OpenAI/LangChain usage patterns.
Bug Fixes
- tool_search schema: limit and minScore are nullable; execution handles None values.

Written for commit 521339b. Summary will update on new commits.

Copilot

Pull request overview

Adds first-class semantic search capabilities to the StackOne AI SDK to improve natural-language tool discovery at scale (10k+ actions), with optional agent-facing utility tools and a benchmark harness to compare semantic vs local hybrid (BM25+TF‑IDF) search.

Changes:

Introduces SemanticSearchClient (+ Pydantic response models) for /actions/search, and exposes it via StackOneToolSet.semantic_client.
Adds StackOneToolSet.search_tools() and search_action_names() with connector-aware filtering and optional local fallback.
Extends Tools/StackOneTool with connector helpers and adds a semantic variant of the tool_search utility tool; includes benchmark script + documented benchmark results.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| `stackone_ai/semantic_search.py` | New semantic search HTTP client + typed response models. |
| `stackone_ai/toolset.py` | Adds lazy semantic client + semantic search APIs (`search_tools`, `search_action_names`) with fallback logic. |
| `stackone_ai/utility_tools.py` | Adds `create_semantic_tool_search()` utility tool for agent-driven semantic discovery. |
| `stackone_ai/models.py` | Adds `StackOneTool.connector` plus `Tools.get_connectors()` / `filter_by_connector()` and semantic-search support in `utility_tools()`. |
| `stackone_ai/__init__.py` | Re-exports semantic search public API symbols. |
| `tests/test_semantic_search.py` | Unit + integration tests for the semantic client and toolset integration, plus connector helper tests. |
| `tests/benchmark_search.py` | New benchmark runner comparing local vs semantic search. |
| `tests/BENCHMARK_RESULTS.md` | Captures benchmark outcomes and methodology. |
| `README.md` | Documents semantic search at a high level. |

Copilot · 2026-02-06T16:09:10Z

stackone_ai/toolset.py

+            # Fallback to local search
+            all_tools = self.fetch_tools(account_ids=account_ids)
+            utility = all_tools.utility_tools()
+            search_tool = utility.get_tool("tool_search")
+
+            if search_tool:
+                result = search_tool.execute(
+                    {
+                        "query": query,
+                        "limit": top_k,
+                        "minScore": min_score,
+                    }
+                )
+                matched_names = [t["name"] for t in result.get("tools", [])]
+                return Tools([t for t in all_tools if t.name in matched_names])
+


In the local-search fallback path, connector filtering is silently ignored and the returned Tools are not ordered by local relevance (they’re returned in all_tools iteration order, not matched_names/score order). This makes fallback behavior diverge from the semantic path and can return poorly ordered results. Consider applying the connector filter before building the local index (when connector is provided) and sorting the returned tools to match matched_names order.
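One way to address the ordering concern is to look tools up by name in the order the local search returned them, rather than iterating over the full toolset. The sketch below is a hedged illustration: `ToolStub` is a hypothetical stand-in for `StackOneTool`, and `select_in_relevance_order` is not a function in the SDK.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToolStub:
    """Hypothetical stand-in for StackOneTool with only the fields used here."""

    name: str
    connector: str


def select_in_relevance_order(
    all_tools: list[ToolStub],
    matched_names: list[str],
    connector: str | None = None,
) -> list[ToolStub]:
    """Return matched tools in matched_names order, optionally for one connector."""
    by_name = {tool.name: tool for tool in all_tools}
    # Iterate over matched_names (relevance order), not over all_tools.
    ordered = [by_name[name] for name in matched_names if name in by_name]
    if connector is not None:
        ordered = [tool for tool in ordered if tool.connector == connector]
    return ordered


tools = [
    ToolStub("bamboohr_list_employees", "bamboohr"),
    ToolStub("slack_send_message", "slack"),
    ToolStub("bamboohr_create_employee", "bamboohr"),
]
matched = ["bamboohr_create_employee", "slack_send_message", "bamboohr_list_employees"]
print([t.name for t in select_in_relevance_order(tools, matched, connector="bamboohr")])
```

For brevity this sketch filters by connector after the search. Filtering before building the local index, as the comment suggests, would also keep results from other connectors from consuming the limit.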

Copilot · 2026-02-06T16:09:10Z

stackone_ai/toolset.py

+        except SemanticSearchError:
+            if not fallback_to_local:
+                raise
+
+            # Fallback to local search
+            all_tools = self.fetch_tools(account_ids=account_ids)
+            utility = all_tools.utility_tools()
+            search_tool = utility.get_tool("tool_search")
+
+            if search_tool:
+                result = search_tool.execute(
+                    {
+                        "query": query,
+                        "limit": top_k,
+                        "minScore": min_score,
+                    }
+                )
+                matched_names = [t["name"] for t in result.get("tools", [])]
+                return Tools([t for t in all_tools if t.name in matched_names])
+
+            return all_tools


The new fallback_to_local behavior in search_tools() is not covered by tests: tests/test_semantic_search.py doesn’t exercise the except SemanticSearchError branch (ordering, min_score handling, and connector behavior). Adding a unit test that forces SemanticSearchClient.search to raise SemanticSearchError would help prevent regressions in the documented fallback feature.
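A test along these lines could use `unittest.mock` to make the semantic client raise. The sketch below is self-contained and therefore uses stand-ins: the local `SemanticSearchError` class and `search_with_fallback` function only mirror the try/except shape shown in the diff and are assumptions, not the SDK's actual code.

```python
from __future__ import annotations

from unittest.mock import MagicMock


class SemanticSearchError(Exception):
    """Stand-in for the SDK exception of the same name."""


def search_with_fallback(semantic_client, local_search, query: str, fallback_to_local: bool = True):
    """Simplified stand-in for the try/except structure in search_tools()."""
    try:
        return semantic_client.search(query=query)
    except SemanticSearchError:
        if not fallback_to_local:
            raise
        return local_search(query)


def test_falls_back_to_local_search() -> None:
    client = MagicMock()
    # Force the semantic path to fail so the except branch runs.
    client.search.side_effect = SemanticSearchError("API unavailable")
    local_search = MagicMock(return_value=["hris_create_employee"])

    result = search_with_fallback(client, local_search, "onboard new hire")

    assert result == ["hris_create_employee"]
    local_search.assert_called_once_with("onboard new hire")


def test_reraises_when_fallback_disabled() -> None:
    client = MagicMock()
    client.search.side_effect = SemanticSearchError("API unavailable")
    local_search = MagicMock()

    try:
        search_with_fallback(client, local_search, "onboard new hire", fallback_to_local=False)
    except SemanticSearchError:
        pass
    else:
        raise AssertionError("expected SemanticSearchError")
    local_search.assert_not_called()


test_falls_back_to_local_search()
test_reraises_when_fallback_disabled()
```

In the real test suite the same idea would presumably patch `SemanticSearchClient.search` on a `StackOneToolSet` instance and also assert on ordering and `min_score` handling.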

Copilot · 2026-02-06T16:09:10Z

stackone_ai/semantic_search.py

+    def search(
+        self,
+        query: str,
+        connector: str | None = None,
+        top_k: int = 10,
+    ) -> SemanticSearchResponse:
+        """Search for relevant actions using semantic search.
+
+        Args:
+            query: Natural language query describing what tools/actions you need
+            connector: Optional connector/provider filter (e.g., "bamboohr", "slack")
+            top_k: Maximum number of results to return (1-500, default: 10)
+
+        Returns:
+            SemanticSearchResponse containing matching actions with similarity scores
+
+        Raises:
+            SemanticSearchError: If the API call fails
+
+        Example:
+            response = client.search("onboard a new team member", top_k=5)
+            for result in response.results:
+                print(f"{result.action_name}: {result.similarity_score:.2f}")
+        """
+        url = f"{self.base_url}/actions/search"
+        headers = {
+            "Authorization": self._build_auth_header(),
+            "Content-Type": "application/json",
+        }
+        payload: dict[str, Any] = {"query": query, "top_k": top_k}
+        if connector:
+            payload["connector"] = connector
+
+        try:
+            response = httpx.post(url, json=payload, headers=headers, timeout=self.timeout)
+            response.raise_for_status()


SemanticSearchClient.search() documents top_k as 1–500 but does not validate inputs. Passing top_k<=0 or top_k>500 will currently be sent to the API and likely fail at runtime. Consider validating query (non-empty) and top_k range early (raising ValueError) so callers get consistent, client-side errors and you avoid unnecessary network calls.
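A minimal sketch of the suggested client-side validation follows. The 1 to 500 range is taken from the docstring above and has not been verified against the API; `validate_search_args` and `MAX_TOP_K` are hypothetical names.

```python
from __future__ import annotations

MAX_TOP_K = 500  # Assumption: upper bound copied from the docstring, not from the API.


def validate_search_args(query: str, top_k: int) -> None:
    """Raise ValueError before any network call when arguments are out of range."""
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(top_k, bool) or not isinstance(top_k, int):
        raise ValueError("top_k must be an integer")
    if not 1 <= top_k <= MAX_TOP_K:
        raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}, got {top_k}")


validate_search_args("onboard a new team member", 5)  # passes silently
```

Calling this at the top of `search()` would give callers a consistent `ValueError` instead of an HTTP failure wrapped in `SemanticSearchError`.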

Copilot · 2026-02-06T16:09:11Z

stackone_ai/utility_tools.py

+                "default": 5,
+            },
+            "minScore": {
+                "type": "number",
+                "description": "Minimum similarity score (0-1) to filter results (default: 0.0)",
+                "default": 0.0,


In create_semantic_tool_search(), limit and minScore have defaults but are implicitly treated as required in StackOneTool.to_openai_function() (fields are required unless nullable=True). That means agents may be forced to always provide them despite defaults. Consider marking limit and minScore as nullable=True (leaving query required) so the generated tool schema matches the intended optional-argument behavior.

Suggested change

-                "default": 5,
-            },
-            "minScore": {
-                "type": "number",
-                "description": "Minimum similarity score (0-1) to filter results (default: 0.0)",
-                "default": 0.0,
+                "default": 5,
+                "nullable": True,
+            },
+            "minScore": {
+                "type": "number",
+                "description": "Minimum similarity score (0-1) to filter results (default: 0.0)",
+                "default": 0.0,
+                "nullable": True,

Copilot · 2026-02-06T16:09:11Z

tests/benchmark_search.py

+    connector: str | None = None
+
+
+# 103 semantically-challenging evaluation queries


The comment # 103 semantically-challenging evaluation queries is inconsistent with the PR description / BENCHMARK_RESULTS.md (94 tasks) and will get stale as the list changes. Consider deriving this from len(EVALUATION_TASKS) (or updating the comment to match the actual count) to avoid confusion.

Suggested change

-# 103 semantically-challenging evaluation queries
+# Semantically-challenging evaluation queries

Copilot · 2026-02-06T16:09:11Z

tests/BENCHMARK_RESULTS.md

+| Method            | Hit@5      | MRR        | Avg Latency | Hits    |
+| ----------------- | ---------- | ---------- | ----------- | ------- |
+| Local BM25+TF-IDF | 66.0%      | 0.538      | 1.2ms       | 62/94   |
+| Semantic Search   | 76.6%      | 0.634      | 279.6ms     | 72/94   |
+| **Improvement**   | **+10.6%** | **+0.096** |             | **+10** |
+


The markdown tables in this file use leading || (double pipes), which renders as an extra empty column in GitHub-flavored markdown. Use single | table delimiters so the tables render correctly.

Copilot · 2026-02-06T16:09:12Z

stackone_ai/toolset.py

+            # Step 2: Over-fetch from semantic API to account for connector filtering
+            # We fetch 3x to ensure we get enough results after filtering
+            over_fetch_multiplier = 3
+            over_fetch_k = top_k * over_fetch_multiplier
+
+            response = self.semantic_client.search(
+                query=query,
+                connector=connector,
+                top_k=over_fetch_k,
+            )


search_tools() over-fetches top_k by 3x without clamping to the semantic search API limit (docstring in SemanticSearchClient.search() says 1–500). For sufficiently large top_k, over_fetch_k can exceed the API max and cause avoidable failures. Consider validating top_k and clamping over_fetch_k to the API maximum (and also guarding against non-positive values).
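The clamping could look roughly like the sketch below. The maximum of 500 comes from the `SemanticSearchClient.search()` docstring and the multiplier of 3 from the diff; the function name `clamp_over_fetch` is hypothetical.

```python
MAX_TOP_K = 500  # Assumption: documented API maximum, taken from the docstring.
OVER_FETCH_MULTIPLIER = 3  # Same multiplier as in the diff above.


def clamp_over_fetch(top_k: int, multiplier: int = OVER_FETCH_MULTIPLIER, maximum: int = MAX_TOP_K) -> int:
    """Return the over-fetched result count, never exceeding the API maximum."""
    if top_k < 1:
        raise ValueError(f"top_k must be positive, got {top_k}")
    return min(top_k * multiplier, maximum)


print(clamp_over_fetch(10), clamp_over_fetch(200))
```

With clamping in place, a large `top_k` degrades to a smaller over-fetch margin rather than an API error.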

Copilot · 2026-02-06T16:09:12Z

stackone_ai/toolset.py

+        # Over-fetch if filtering by available_connectors
+        fetch_k = top_k * 3 if available_connectors else top_k
+
+        response = self.semantic_client.search(
+            query=query,
+            connector=connector,
+            top_k=fetch_k,
+        )


search_action_names() also multiplies top_k by 3 when available_connectors is provided, but doesn’t clamp to the semantic API’s max top_k (1–500). This can trigger API errors for larger top_k. Add validation/clamping for fetch_k similarly to search_tools().

cubic-dev-ai

3 issues found across 9 files

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="tests/benchmark_search.py">

<violation number="1" location="tests/benchmark_search.py:982">
P2: Connector-specific tasks aren’t filtered in local search, so local benchmark hits can come from the wrong connector and skew the comparison with semantic search. Filter local results by connector when task.connector is set.</violation>
</file>

<file name="stackone_ai/toolset.py">

<violation number="1" location="stackone_ai/toolset.py:347">
P2: Clamp the semantic search over-fetch to the API’s documented max (500). As written, `top_k * 3` can exceed the API limit and cause semantic search to fail for larger requests.</violation>

<violation number="2" location="stackone_ai/toolset.py:392">
P2: Preserve the relevance ordering from `tool_search` when falling back to local search; the current filtering returns tools in original toolset order instead of relevance order.</violation>
</file>

Since this is your first cubic review, here's how it works:

cubic automatically reviews your code and comments on bugs and improvements
Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

cubic-dev-ai

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="stackone_ai/utility_tools.py">

<violation number="1" location="stackone_ai/utility_tools.py:205">
P2: `limit`/`minScore` are now nullable in the semantic tool schema, but the execution path casts them directly to int/float. A null value will raise a TypeError at runtime. Either disallow null in the schema or add None handling before casting.</violation>
</file>

cubic-dev-ai

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="stackone_ai/utility_tools.py">

<violation number="1" location="stackone_ai/utility_tools.py:225">
P2: Using `or 5` overrides an explicit `limit=0`, so callers can no longer request zero results. Preserve 0 while still defaulting when the key is missing or None.</violation>
</file>
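Both this issue and the earlier None-handling issue come down to distinguishing "missing or None" from "explicitly zero". A minimal sketch, assuming the arguments arrive as a plain dict keyed by `limit` and `minScore` with the defaults shown in the schema (5 and 0.0); the helper names are hypothetical:

```python
from __future__ import annotations

from typing import Any

DEFAULT_LIMIT = 5
DEFAULT_MIN_SCORE = 0.0


def resolve_limit(arguments: dict[str, Any]) -> int:
    """Use the default only when limit is missing or None; keep an explicit 0."""
    value = arguments.get("limit")
    return DEFAULT_LIMIT if value is None else int(value)


def resolve_min_score(arguments: dict[str, Any]) -> float:
    """Use the default only when minScore is missing or None."""
    value = arguments.get("minScore")
    return DEFAULT_MIN_SCORE if value is None else float(value)


# `or` treats 0 as falsy, which is the bug described above.
buggy = {"limit": 0}.get("limit") or DEFAULT_LIMIT
fixed = resolve_limit({"limit": 0})
print(buggy, fixed)
```

The explicit `is None` check also avoids the `TypeError` that `int(None)` would raise when an agent sends a null value.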

glebedel · 2026-02-09T09:38:56Z

tests/BENCHMARK_RESULTS.md

+| Method            | Hit@5      | MRR        | Avg Latency | Hits    |
+| ----------------- | ---------- | ---------- | ----------- | ------- |
+| Local BM25+TF-IDF | 66.0%      | 0.538      | 1.2ms       | 62/94   |
+| Semantic Search   | 76.6%      | 0.634      | 279.6ms     | 72/94   |


That's higher latency than I would have hoped; I'm assuming most of it is the network round trip, not the actual search in the backend.

For the benchmark against the local lambda, this is still high because it computes embeddings and runs semantic search on prebuilt embeddings. We have not run the prod benchmarks yet, but optimized embedding infrastructure will likely improve the results in prod.

The benchmark needs to be redone at some point, so removing this for now.

glebedel · 2026-02-09T10:19:10Z

stackone_ai/models.py

+    def utility_tools(
+        self,
+        hybrid_alpha: float | None = None,
+        use_semantic_search: bool = False,


Do we need two new params? Could we instead treat the presence of semantic_client as the use_semantic_search flag?

Good point. They actually take different paths, and passing semantic_client already makes clear which path to take. I need to fix that, thanks 👍.

This is now based on the presence of the semantic client. Thanks.
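The resulting design can be sketched as follows: the search path is chosen from whether a client was passed, with no separate boolean flag. `SearchClient`, `build_tool_search`, and `FakeSemanticClient` are hypothetical names used only for illustration; the SDK's actual `utility_tools()` signature may differ.

```python
from __future__ import annotations

from typing import Callable, Protocol


class SearchClient(Protocol):
    """Hypothetical minimal interface for a semantic search client."""

    def search(self, query: str) -> list[str]: ...


def build_tool_search(
    local_search: Callable[[str], list[str]],
    semantic_client: SearchClient | None = None,
) -> Callable[[str], list[str]]:
    """Pick the search path from the presence of a semantic client."""
    if semantic_client is not None:
        return semantic_client.search
    return local_search


class FakeSemanticClient:
    """Stand-in client that returns a fixed result."""

    def search(self, query: str) -> list[str]:
        return ["semantic:" + query]


def local(query: str) -> list[str]:
    return ["local:" + query]


print(build_tool_search(local)("onboard"))
print(build_tool_search(local, FakeSemanticClient())("onboard"))
```

A single optional parameter removes the inconsistent state where the flag is set but no client is available.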

glebedel · 2026-02-09T11:33:41Z

tests/BENCHMARK_RESULTS.md

+| "see who applied for the role"      | greenhouse_list_applied_candidate_tags | ashby_add_hiring_team_member  |
+| "advance someone to the next round" | greenhouse_move_application            | factorial_invite_employee     |
+| "see open positions"                | teamtailor_list_jobs                   | hibob_create_position_opening |
+| "close a deal"                      | zohocrm_get_deal                       | shopify_close_order           |
+| "check course completion"           | saba_delete_recurring_completion       | saba_get_course               |
+| "update deal and notify team"       | zohocrm_get_deal                       | microsoftteams_update_team    |
+| "look up customer"                  | linear_update_customer_need            | shopify_search_customers      |


All of these are surprising; it's odd that semantic search gets them so wrong. Is that because we're only looking at a single matched result?

Is it possible to commit the benchmark results in terms of the actual returned matches? Some of these results are puzzling.

Benchmarks need to be revisited later.

glebedel · 2026-02-09T12:30:51Z

stackone_ai/models.py

+        """
+        return {tool.connector for tool in self.tools}
+
+    def filter_by_connector(self, connectors: list[str] | set[str]) -> Tools:


Why do we need this function? AFAIK we already support glob filtering, which allows this type of filtering (and more), and I don't think this is related to semantic search.

This is removed now, as glob filtering via fetch_tools already covers that. Great catch; it was likely added speculatively.

glebedel · 2026-02-09T12:56:14Z

stackone_ai/semantic_search.py

+        self.base_url = base_url.rstrip("/")
+        self.timeout = timeout
+
+    def _build_auth_header(self) -> str:


I'm not as familiar with the Python SDK as with the Node one, but it's worth double-checking whether we already have an HTTP client for calling the StackOne API (considering we're calling the MCP server in that same SDK, I'd assume we do).

Good point about potential duplication. This is a good candidate for a later refactor of the SDK that adds a shared HTTP client for making requests. Could the shared HTTP client be a follow-up refactor, since it doesn't exist in the Python version?

glebedel · 2026-02-09T12:56:52Z

stackone_ai/semantic_search.py

+        self,
+        query: str,
+        connector: str | None = None,
+        top_k: int = 10,


I think that's too low a default. Arguably we shouldn't have a default here at all, since we have one in the backend and this property should be optional (I'm assuming it is indeed optional on the backend).

Yes, made it optional now so it uses the value set in the backend. Good point that the magic number shouldn't be there, although the more results we allow, the more it affects the context. Let's use the backend default here.
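A hedged sketch of what "optional, defer to the backend" can look like when building the request body: the `query`, `connector`, and `top_k` field names are taken from the payload in the diff earlier in this review, while `build_search_payload` is a hypothetical helper rather than the SDK's actual function.

```python
from __future__ import annotations

from typing import Any


def build_search_payload(
    query: str,
    connector: str | None = None,
    top_k: int | None = None,
) -> dict[str, Any]:
    """Build the /actions/search body, omitting top_k so the backend default applies."""
    payload: dict[str, Any] = {"query": query}
    if connector:
        payload["connector"] = connector
    # Only send top_k when the caller set it explicitly.
    if top_k is not None:
        payload["top_k"] = top_k
    return payload


print(build_search_payload("onboard a new team member"))
print(build_search_payload("onboard a new team member", connector="bamboohr", top_k=5))
```

Omitting the key entirely, rather than sending `null`, assumes the backend treats a missing `top_k` as "use the default".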

glebedel · 2026-02-09T13:00:35Z

stackone_ai/toolset.py

+        connector: str | None = None,
+        available_connectors: set[str] | None = None,


Why do we need two connector params? What is "available_connectors" and how is it different from connector?

available_connectors shouldn't be there (not yet, or possibly ever), since filtering is done via fetch_tools. It is replaced by account IDs, and connectors are now resolved internally. Fixed.

cubic-dev-ai

1 issue found across 10 files (changes from recent commits).

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="stackone_ai/toolset.py">

<violation number="1" location="stackone_ai/toolset.py:476">
P2: When `account_ids` resolve to zero connectors, the empty set is falsy so connector filtering is skipped and results are returned from unrelated connectors. This breaks account scoping for accounts with no linked connectors; consider returning an empty list or treating an empty set as a valid filter state.</violation>
</file>
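The underlying pitfall is that an empty set is falsy in Python, so `if connectors:` cannot tell "no filter requested" apart from "filter to nothing". A minimal sketch of the distinction, using a hypothetical `filter_to_connectors` helper over (action name, connector) pairs:

```python
from __future__ import annotations


def filter_to_connectors(
    results: list[tuple[str, str]],
    allowed: set[str] | None,
) -> list[str]:
    """Filter (action_name, connector) pairs by an optional connector set.

    None means no filtering was requested; an empty set means nothing is allowed.
    """
    if allowed is None:
        return [name for name, _ in results]
    return [name for name, connector in results if connector in allowed]


results = [("bamboohr_create_employee", "bamboohr"), ("slack_send_message", "slack")]
print(filter_to_connectors(results, None))
print(filter_to_connectors(results, set()))
print(filter_to_connectors(results, {"slack"}))
```

With the `is None` check, accounts that resolve to zero connectors get an empty result instead of tools from unrelated connectors.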

cubic-dev-ai

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="examples/utility_tools_example.py">

<violation number="1" location="examples/utility_tools_example.py:103">
P3: This comment is misleading ("onboard new hire" should not map to termination tools). Update it to reflect onboarding tools to avoid confusing users of the example.</violation>
</file>

Stackone and others added 9 commits February 5, 2026 11:48

Senamtic Search on action in Python AI SDK

0e0f7f3

Filter tools based on the SDK auth config and connector

0b0e9e0

Use the local benchmark from the ai-generations

736e68f

Add Semantinc search bench mark with local benchmarks

4d3deca

Fix CI lint errors

981f912

Fix the lint in the benchmark file

be6db2a

Formalise the docs and code

bcb0b87

Keep semantic search minimal in the README

0a26c57

Remove the old benchmark data

e6ab80b

Copilot AI review requested due to automatic review settings February 6, 2026 16:02

Copilot started reviewing on behalf of Shashikant86 February 6, 2026 16:02

Copilot AI reviewed Feb 6, 2026

cubic-dev-ai bot reviewed Feb 6, 2026

implement PR feedback suggestions from cubic

96270d6

cubic-dev-ai bot reviewed Feb 6, 2026

fix nullable in the semantic tool schema

901d5af

cubic-dev-ai bot reviewed Feb 6, 2026

Shashikant86 added 3 commits February 6, 2026 17:38

limit override

94bb25d

handle per connector calls to avoid the guesswork

2c5619f

ci: trigger rebuild

4c5726d

glebedel reviewed Feb 9, 2026

simplify utility_tools API by inferring semantic search from client p…

06d6a9a

…resence

glebedel reviewed Feb 9, 2026

Benchmark update and PR suggestions

bf45364

cubic-dev-ai bot reviewed Feb 9, 2026

update the README gst

2ae1e77

cubic-dev-ai bot reviewed Feb 9, 2026

Shashikant86 added 3 commits February 10, 2026 09:38

Note on the fetch tools for actions that user expect to discover

e1fb3dd

Update examples and improve the semantic seach

4e19479

Fix ruff issues

521339b
