Skip to content

Conversation

@Shashikant86
Copy link

@Shashikant86 Shashikant86 commented Feb 6, 2026

Problem

StackOne has over 10,000 actions across all connectors and growing, some connectors have 2,000+
actions alone. Keyword matching breaks down when someone searches "onboard new hire" but the
action is called hris_create_employee. In the StackOne AI SDK there is support for the keyword search and we need add the support for the semantic search using the action search service.

Implementation Details

  • Senatic Search added to SDK that uses the action search land benchmarked locally for now
  • Used the new benchmarks from the action search and the and will run against prod later

Change Summary

  • Add SemanticSearchClient interfacing with StackOne's /actions/search API for natural
    language tool discovery, with Pydantic models for type-safe results
  • Add search_tools() to StackOneToolSet with over-fetching and connector filtering only
    returns tools the user has linked accounts for, sorted by semantic relevance
  • Add search_action_names() for lightweight lookups without loading full tool definitions
  • Automatic fallback to local hybrid BM25+TF-IDF search when the semantic API is unavailable
  • Add utility tools (tool_search, tool_execute) for AI agents to dynamically discover and
    execute tools at runtime, supporting both local and semantic search modes
  • Extend Tools with utility_tools(), get_connectors(), and filter_by_connector() for
    connector-aware tool management
  • Include a benchmark suite with 94 evaluation tasks across 8 categories (HRIS, ATS, CRM, PM,
    messaging, docs, marketing, LMS) — currently benchmarked against local endpoints. Semantic
    search achieves 76.6% Hit@5 vs 66.0% for local search (+10.6% improvement)

How to Test

  • Run just test to verify all existing and new unit tests pass
  • Run just lint to confirm no linting regressions
  • Verify search_tools() returns relevant tools filtered to available connectors
  • Verify fallback to local search when semantic API is unavailable
  • Verify utility tools work with both use_semantic_search=True and default local mode
  • Run benchmark suite against local API to validate accuracy metrics

Summary by cubic

Adds semantic search to the SDK for natural language tool discovery across connectors, improving relevance over keyword search. Benchmarks show +10.6% Hit@5 vs local BM25 (76.6% vs 66.0%).

  • New Features

    • SemanticSearchClient for /actions/search with typed models; exported in init; search_action_names added in both client and StackOneToolSet.
    • StackOneToolSet.search_tools with account-scoped connector filtering, per-connector backfill when broad search is sparse, optional fallback to local BM25+TF‑IDF, and normalization/dedup of versioned action names.
    • Utility tools: tool_search auto-switches to semantic when semantic_client is passed (create_semantic_tool_search); supports optional connector filter; tool_execute unchanged.
    • Connector helpers: StackOneTool.connector and Tools.get_connectors for connector-aware filtering.
    • Docs and examples: new Semantic Search README section, examples/semantic_search_example.py, semantic variant in utility_tools_example, plus OpenAI/LangChain usage patterns.
  • Bug Fixes

    • tool_search schema: limit and minScore are nullable; execution handles None values.

Written for commit 521339b. Summary will update on new commits.

Copilot AI review requested due to automatic review settings February 6, 2026 16:02
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds first-class semantic search capabilities to the StackOne AI SDK to improve natural-language tool discovery at scale (10k+ actions), with optional agent-facing utility tools and a benchmark harness to compare semantic vs local hybrid (BM25+TF‑IDF) search.

Changes:

  • Introduces SemanticSearchClient (+ Pydantic response models) for /actions/search, and exposes it via StackOneToolSet.semantic_client.
  • Adds StackOneToolSet.search_tools() and search_action_names() with connector-aware filtering and optional local fallback.
  • Extends Tools/StackOneTool with connector helpers and adds a semantic variant of the tool_search utility tool; includes benchmark script + documented benchmark results.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
stackone_ai/semantic_search.py New semantic search HTTP client + typed response models.
stackone_ai/toolset.py Adds lazy semantic client + semantic search APIs (search_tools, search_action_names) with fallback logic.
stackone_ai/utility_tools.py Adds create_semantic_tool_search() utility tool for agent-driven semantic discovery.
stackone_ai/models.py Adds StackOneTool.connector plus Tools.get_connectors() / filter_by_connector() and semantic-search support in utility_tools().
stackone_ai/__init__.py Re-exports semantic search public API symbols.
tests/test_semantic_search.py Unit + integration tests for the semantic client and toolset integration, plus connector helper tests.
tests/benchmark_search.py New benchmark runner comparing local vs semantic search.
tests/BENCHMARK_RESULTS.md Captures benchmark outcomes and methodology.
README.md Documents semantic search at a high level.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 379 to 394
# Fallback to local search
all_tools = self.fetch_tools(account_ids=account_ids)
utility = all_tools.utility_tools()
search_tool = utility.get_tool("tool_search")

if search_tool:
result = search_tool.execute(
{
"query": query,
"limit": top_k,
"minScore": min_score,
}
)
matched_names = [t["name"] for t in result.get("tools", [])]
return Tools([t for t in all_tools if t.name in matched_names])

Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the local-search fallback path, connector filtering is silently ignored and the returned Tools are not ordered by local relevance (they’re returned in all_tools iteration order, not matched_names/score order). This makes fallback behavior diverge from the semantic path and can return poorly ordered results. Consider applying the connector filter before building the local index (when connector is provided) and sorting the returned tools to match matched_names order.

Copilot uses AI. Check for mistakes.
Comment on lines 375 to 395
except SemanticSearchError:
if not fallback_to_local:
raise

# Fallback to local search
all_tools = self.fetch_tools(account_ids=account_ids)
utility = all_tools.utility_tools()
search_tool = utility.get_tool("tool_search")

if search_tool:
result = search_tool.execute(
{
"query": query,
"limit": top_k,
"minScore": min_score,
}
)
matched_names = [t["name"] for t in result.get("tools", [])]
return Tools([t for t in all_tools if t.name in matched_names])

return all_tools
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new fallback_to_local behavior in search_tools() is not covered by tests: tests/test_semantic_search.py doesn’t exercise the except SemanticSearchError branch (ordering, min_score handling, and connector behavior). Adding a unit test that forces SemanticSearchClient.search to raise SemanticSearchError would help prevent regressions in the documented fallback feature.

Copilot uses AI. Check for mistakes.
Comment on lines 71 to 106
def search(
self,
query: str,
connector: str | None = None,
top_k: int = 10,
) -> SemanticSearchResponse:
"""Search for relevant actions using semantic search.

Args:
query: Natural language query describing what tools/actions you need
connector: Optional connector/provider filter (e.g., "bamboohr", "slack")
top_k: Maximum number of results to return (1-500, default: 10)

Returns:
SemanticSearchResponse containing matching actions with similarity scores

Raises:
SemanticSearchError: If the API call fails

Example:
response = client.search("onboard a new team member", top_k=5)
for result in response.results:
print(f"{result.action_name}: {result.similarity_score:.2f}")
"""
url = f"{self.base_url}/actions/search"
headers = {
"Authorization": self._build_auth_header(),
"Content-Type": "application/json",
}
payload: dict[str, Any] = {"query": query, "top_k": top_k}
if connector:
payload["connector"] = connector

try:
response = httpx.post(url, json=payload, headers=headers, timeout=self.timeout)
response.raise_for_status()
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SemanticSearchClient.search() documents top_k as 1–500 but does not validate inputs. Passing top_k<=0 or top_k>500 will currently be sent to the API and likely fail at runtime. Consider validating query (non-empty) and top_k range early (raising ValueError) so callers get consistent, client-side errors and you avoid unnecessary network calls.

Copilot uses AI. Check for mistakes.
Comment on lines 307 to 312
"default": 5,
},
"minScore": {
"type": "number",
"description": "Minimum similarity score (0-1) to filter results (default: 0.0)",
"default": 0.0,
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In create_semantic_tool_search(), limit and minScore have defaults but are implicitly treated as required in StackOneTool.to_openai_function() (fields are required unless nullable=True). That means agents may be forced to always provide them despite defaults. Consider marking limit and minScore as nullable=True (leaving query required) so the generated tool schema matches the intended optional-argument behavior.

Suggested change
"default": 5,
},
"minScore": {
"type": "number",
"description": "Minimum similarity score (0-1) to filter results (default: 0.0)",
"default": 0.0,
"default": 5,
"nullable": True,
},
"minScore": {
"type": "number",
"description": "Minimum similarity score (0-1) to filter results (default: 0.0)",
"default": 0.0,
"nullable": True,

Copilot uses AI. Check for mistakes.
connector: str | None = None


# 103 semantically-challenging evaluation queries
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment # 103 semantically-challenging evaluation queries is inconsistent with the PR description / BENCHMARK_RESULTS.md (94 tasks) and will get stale as the list changes. Consider deriving this from len(EVALUATION_TASKS) (or updating the comment to match the actual count) to avoid confusion.

Suggested change
# 103 semantically-challenging evaluation queries
# Semantically-challenging evaluation queries

Copilot uses AI. Check for mistakes.
Comment on lines 12 to 17
| Method | Hit@5 | MRR | Avg Latency | Hits |
| ----------------- | ---------- | ---------- | ----------- | ------- |
| Local BM25+TF-IDF | 66.0% | 0.538 | 1.2ms | 62/94 |
| Semantic Search | 76.6% | 0.634 | 279.6ms | 72/94 |
| **Improvement** | **+10.6%** | **+0.096** | | **+10** |

Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The markdown tables in this file use leading || (double pipes), which renders as an extra empty column in GitHub-flavored markdown. Use single | table delimiters so the tables render correctly.

Copilot uses AI. Check for mistakes.
Comment on lines 344 to 353
# Step 2: Over-fetch from semantic API to account for connector filtering
# We fetch 3x to ensure we get enough results after filtering
over_fetch_multiplier = 3
over_fetch_k = top_k * over_fetch_multiplier

response = self.semantic_client.search(
query=query,
connector=connector,
top_k=over_fetch_k,
)
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search_tools() over-fetches top_k by 3x without clamping to the semantic search API limit (docstring in SemanticSearchClient.search() says 1–500). For sufficiently large top_k, over_fetch_k can exceed the API max and cause avoidable failures. Consider validating top_k and clamping over_fetch_k to the API maximum (and also guarding against non-positive values).

Copilot uses AI. Check for mistakes.
Comment on lines 441 to 448
# Over-fetch if filtering by available_connectors
fetch_k = top_k * 3 if available_connectors else top_k

response = self.semantic_client.search(
query=query,
connector=connector,
top_k=fetch_k,
)
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search_action_names() also multiplies top_k by 3 when available_connectors is provided, but doesn’t clamp to the semantic API’s max top_k (1–500). This can trigger API errors for larger top_k. Add validation/clamping for fetch_k similarly to search_tools().

Copilot uses AI. Check for mistakes.
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 9 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="tests/benchmark_search.py">

<violation number="1" location="tests/benchmark_search.py:982">
P2: Connector-specific tasks aren’t filtered in local search, so local benchmark hits can come from the wrong connector and skew the comparison with semantic search. Filter local results by connector when task.connector is set.</violation>
</file>

<file name="stackone_ai/toolset.py">

<violation number="1" location="stackone_ai/toolset.py:347">
P2: Clamp the semantic search over-fetch to the API’s documented max (500). As written, `top_k * 3` can exceed the API limit and cause semantic search to fail for larger requests.</violation>

<violation number="2" location="stackone_ai/toolset.py:392">
P2: Preserve the relevance ordering from `tool_search` when falling back to local search; the current filtering returns tools in original toolset order instead of relevance order.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="stackone_ai/utility_tools.py">

<violation number="1" location="stackone_ai/utility_tools.py:205">
P2: `limit`/`minScore` are now nullable in the semantic tool schema, but the execution path casts them directly to int/float. A null value will raise a TypeError at runtime. Either disallow null in the schema or add None handling before casting.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="stackone_ai/utility_tools.py">

<violation number="1" location="stackone_ai/utility_tools.py:225">
P2: Using `or 5` overrides an explicit `limit=0`, so callers can no longer request zero results. Preserve 0 while still defaulting when the key is missing or None.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

| Method | Hit@5 | MRR | Avg Latency | Hits |
| ----------------- | ---------- | ---------- | ----------- | ------- |
| Local BM25+TF-IDF | 66.0% | 0.538 | 1.2ms | 62/94 |
| Semantic Search | 76.6% | 0.634 | 279.6ms | 72/94 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's higher latency than I would have hoped, i'm assuming most of it is the network roundtrip not the actual search in the backend

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the benchmark against local lambda, this is still high as it it does performs embeddings and does semantic semantic search on prebuilt embedding. We have not done the prod benchmarks yet but with optimized infra for embedding will likely to improve the results in prod.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark needs to re-done at some point so removing this for now.

def utility_tools(
self,
hybrid_alpha: float | None = None,
use_semantic_search: bool = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need two new params, could we instead consider presence of semantic_client as the use_semantic_search flag ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.. They actually takes the different path and Passing semantic_client actually clarify which path to take.. I need to fix that thanks 👍..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now based on the presence of Semantic Client .. Thanks

Comment on lines 51 to 57
| "see who applied for the role" | greenhouse_list_applied_candidate_tags | ashby_add_hiring_team_member |
| "advance someone to the next round" | greenhouse_move_application | factorial_invite_employee |
| "see open positions" | teamtailor_list_jobs | hibob_create_position_opening |
| "close a deal" | zohocrm_get_deal | shopify_close_order |
| "check course completion" | saba_delete_recurring_completion | saba_get_course |
| "update deal and notify team" | zohocrm_get_deal | microsoftteams_update_team |
| "look up customer" | linear_update_customer_need | shopify_search_customers |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these are surprising that semantics earch gets it so wrong. Is that because we're just looking at a single matched result?

Is it possible to commit the benchmark result actually in term of returned matches? Some of these results are puzzling

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmarks needs to be re-visited later.

"""
return {tool.connector for tool in self.tools}

def filter_by_connector(self, connectors: list[str] | set[str]) -> Tools:
Copy link
Contributor

@glebedel glebedel Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this function? Afaik we already support glob filtering which already allows this type of filtering (and more) and I don't think this is related to the semantic search

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is removed now as glob filtering via fetch_tools already covered thats.. Great catch as it was likely added speculatively.

self.base_url = base_url.rstrip("/")
self.timeout = timeout

def _build_auth_header(self) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not as familiar with the python sdk than node one but it's worth dpuble checking whether or not we already have an http client to call the stackone API (considering we're calling the mcp server in that same SDK i'd assume we do)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good point about potential duplication but good candidate for further refactor of the SDK and add A shared HTTP client to make the requests .. I would add the A shared HTTP client could be a follow-up refactor as it doesn't exist in the Python version?

self,
query: str,
connector: str | None = None,
top_k: int = 10,
Copy link
Contributor

@glebedel glebedel Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's too low of a default, arguably we shouldn't actually have a default here since we have one in the backend and this property should be optional (i'm assuming here it is indeed optional on the backend)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, made it optional now and uses the value set in the backend. Good point on magic number shouldn't be there but more we allow more it affect the context. Let's use the backend default here ..

Comment on lines 432 to 433
connector: str | None = None,
available_connectors: set[str] | None = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need two connector params? What's "available_connectors" and how is it different to connecto ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

available connectors shouldn't be there yet or never as filtering is done via fetch tools. replaced by account Ids. It now resolves connectors internally. Fixed this..

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 10 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="stackone_ai/toolset.py">

<violation number="1" location="stackone_ai/toolset.py:476">
P2: When `account_ids` resolve to zero connectors, the empty set is falsy so connector filtering is skipped and results are returned from unrelated connectors. This breaks account scoping for accounts with no linked connectors; consider returning an empty list or treating an empty set as a valid filter state.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="examples/utility_tools_example.py">

<violation number="1" location="examples/utility_tools_example.py:103">
P3: This comment is misleading ("onboard new hire" should not map to termination tools). Update it to reflect onboarding tools to avoid confusing users of the example.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants