-
Notifications
You must be signed in to change notification settings - Fork 0
feat(search): add semantic search for AI-powered tool discovery #142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds first-class semantic search capabilities to the StackOne AI SDK to improve natural-language tool discovery at scale (10k+ actions), with optional agent-facing utility tools and a benchmark harness to compare semantic vs local hybrid (BM25+TF‑IDF) search.
Changes:
- Introduces
SemanticSearchClient(+ Pydantic response models) for/actions/search, and exposes it viaStackOneToolSet.semantic_client. - Adds
StackOneToolSet.search_tools()andsearch_action_names()with connector-aware filtering and optional local fallback. - Extends
Tools/StackOneToolwith connector helpers and adds a semantic variant of thetool_searchutility tool; includes benchmark script + documented benchmark results.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
stackone_ai/semantic_search.py |
New semantic search HTTP client + typed response models. |
stackone_ai/toolset.py |
Adds lazy semantic client + semantic search APIs (search_tools, search_action_names) with fallback logic. |
stackone_ai/utility_tools.py |
Adds create_semantic_tool_search() utility tool for agent-driven semantic discovery. |
stackone_ai/models.py |
Adds StackOneTool.connector plus Tools.get_connectors() / filter_by_connector() and semantic-search support in utility_tools(). |
stackone_ai/__init__.py |
Re-exports semantic search public API symbols. |
tests/test_semantic_search.py |
Unit + integration tests for the semantic client and toolset integration, plus connector helper tests. |
tests/benchmark_search.py |
New benchmark runner comparing local vs semantic search. |
tests/BENCHMARK_RESULTS.md |
Captures benchmark outcomes and methodology. |
README.md |
Documents semantic search at a high level. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
stackone_ai/toolset.py
Outdated
| # Fallback to local search | ||
| all_tools = self.fetch_tools(account_ids=account_ids) | ||
| utility = all_tools.utility_tools() | ||
| search_tool = utility.get_tool("tool_search") | ||
|
|
||
| if search_tool: | ||
| result = search_tool.execute( | ||
| { | ||
| "query": query, | ||
| "limit": top_k, | ||
| "minScore": min_score, | ||
| } | ||
| ) | ||
| matched_names = [t["name"] for t in result.get("tools", [])] | ||
| return Tools([t for t in all_tools if t.name in matched_names]) | ||
|
|
Copilot
AI
Feb 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the local-search fallback path, connector filtering is silently ignored and the returned Tools are not ordered by local relevance (they’re returned in all_tools iteration order, not matched_names/score order). This makes fallback behavior diverge from the semantic path and can return poorly ordered results. Consider applying the connector filter before building the local index (when connector is provided) and sorting the returned tools to match matched_names order.
stackone_ai/toolset.py
Outdated
| except SemanticSearchError: | ||
| if not fallback_to_local: | ||
| raise | ||
|
|
||
| # Fallback to local search | ||
| all_tools = self.fetch_tools(account_ids=account_ids) | ||
| utility = all_tools.utility_tools() | ||
| search_tool = utility.get_tool("tool_search") | ||
|
|
||
| if search_tool: | ||
| result = search_tool.execute( | ||
| { | ||
| "query": query, | ||
| "limit": top_k, | ||
| "minScore": min_score, | ||
| } | ||
| ) | ||
| matched_names = [t["name"] for t in result.get("tools", [])] | ||
| return Tools([t for t in all_tools if t.name in matched_names]) | ||
|
|
||
| return all_tools |
Copilot
AI
Feb 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new fallback_to_local behavior in search_tools() is not covered by tests: tests/test_semantic_search.py doesn’t exercise the except SemanticSearchError branch (ordering, min_score handling, and connector behavior). Adding a unit test that forces SemanticSearchClient.search to raise SemanticSearchError would help prevent regressions in the documented fallback feature.
| def search( | ||
| self, | ||
| query: str, | ||
| connector: str | None = None, | ||
| top_k: int = 10, | ||
| ) -> SemanticSearchResponse: | ||
| """Search for relevant actions using semantic search. | ||
|
|
||
| Args: | ||
| query: Natural language query describing what tools/actions you need | ||
| connector: Optional connector/provider filter (e.g., "bamboohr", "slack") | ||
| top_k: Maximum number of results to return (1-500, default: 10) | ||
|
|
||
| Returns: | ||
| SemanticSearchResponse containing matching actions with similarity scores | ||
|
|
||
| Raises: | ||
| SemanticSearchError: If the API call fails | ||
|
|
||
| Example: | ||
| response = client.search("onboard a new team member", top_k=5) | ||
| for result in response.results: | ||
| print(f"{result.action_name}: {result.similarity_score:.2f}") | ||
| """ | ||
| url = f"{self.base_url}/actions/search" | ||
| headers = { | ||
| "Authorization": self._build_auth_header(), | ||
| "Content-Type": "application/json", | ||
| } | ||
| payload: dict[str, Any] = {"query": query, "top_k": top_k} | ||
| if connector: | ||
| payload["connector"] = connector | ||
|
|
||
| try: | ||
| response = httpx.post(url, json=payload, headers=headers, timeout=self.timeout) | ||
| response.raise_for_status() |
Copilot
AI
Feb 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SemanticSearchClient.search() documents top_k as 1–500 but does not validate inputs. Passing top_k<=0 or top_k>500 will currently be sent to the API and likely fail at runtime. Consider validating query (non-empty) and top_k range early (raising ValueError) so callers get consistent, client-side errors and you avoid unnecessary network calls.
| "default": 5, | ||
| }, | ||
| "minScore": { | ||
| "type": "number", | ||
| "description": "Minimum similarity score (0-1) to filter results (default: 0.0)", | ||
| "default": 0.0, |
Copilot
AI
Feb 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In create_semantic_tool_search(), limit and minScore have defaults but are implicitly treated as required in StackOneTool.to_openai_function() (fields are required unless nullable=True). That means agents may be forced to always provide them despite defaults. Consider marking limit and minScore as nullable=True (leaving query required) so the generated tool schema matches the intended optional-argument behavior.
| "default": 5, | |
| }, | |
| "minScore": { | |
| "type": "number", | |
| "description": "Minimum similarity score (0-1) to filter results (default: 0.0)", | |
| "default": 0.0, | |
| "default": 5, | |
| "nullable": True, | |
| }, | |
| "minScore": { | |
| "type": "number", | |
| "description": "Minimum similarity score (0-1) to filter results (default: 0.0)", | |
| "default": 0.0, | |
| "nullable": True, |
tests/benchmark_search.py
Outdated
| connector: str | None = None | ||
|
|
||
|
|
||
| # 103 semantically-challenging evaluation queries |
Copilot
AI
Feb 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment # 103 semantically-challenging evaluation queries is inconsistent with the PR description / BENCHMARK_RESULTS.md (94 tasks) and will get stale as the list changes. Consider deriving this from len(EVALUATION_TASKS) (or updating the comment to match the actual count) to avoid confusion.
| # 103 semantically-challenging evaluation queries | |
| # Semantically-challenging evaluation queries |
tests/BENCHMARK_RESULTS.md
Outdated
| | Method | Hit@5 | MRR | Avg Latency | Hits | | ||
| | ----------------- | ---------- | ---------- | ----------- | ------- | | ||
| | Local BM25+TF-IDF | 66.0% | 0.538 | 1.2ms | 62/94 | | ||
| | Semantic Search | 76.6% | 0.634 | 279.6ms | 72/94 | | ||
| | **Improvement** | **+10.6%** | **+0.096** | | **+10** | | ||
|
|
Copilot
AI
Feb 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The markdown tables in this file use leading || (double pipes), which renders as an extra empty column in GitHub-flavored markdown. Use single | table delimiters so the tables render correctly.
stackone_ai/toolset.py
Outdated
| # Step 2: Over-fetch from semantic API to account for connector filtering | ||
| # We fetch 3x to ensure we get enough results after filtering | ||
| over_fetch_multiplier = 3 | ||
| over_fetch_k = top_k * over_fetch_multiplier | ||
|
|
||
| response = self.semantic_client.search( | ||
| query=query, | ||
| connector=connector, | ||
| top_k=over_fetch_k, | ||
| ) |
Copilot
AI
Feb 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
search_tools() over-fetches top_k by 3x without clamping to the semantic search API limit (docstring in SemanticSearchClient.search() says 1–500). For sufficiently large top_k, over_fetch_k can exceed the API max and cause avoidable failures. Consider validating top_k and clamping over_fetch_k to the API maximum (and also guarding against non-positive values).
stackone_ai/toolset.py
Outdated
| # Over-fetch if filtering by available_connectors | ||
| fetch_k = top_k * 3 if available_connectors else top_k | ||
|
|
||
| response = self.semantic_client.search( | ||
| query=query, | ||
| connector=connector, | ||
| top_k=fetch_k, | ||
| ) |
Copilot
AI
Feb 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
search_action_names() also multiplies top_k by 3 when available_connectors is provided, but doesn’t clamp to the semantic API’s max top_k (1–500). This can trigger API errors for larger top_k. Add validation/clamping for fetch_k similarly to search_tools().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 issues found across 9 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="tests/benchmark_search.py">
<violation number="1" location="tests/benchmark_search.py:982">
P2: Connector-specific tasks aren’t filtered in local search, so local benchmark hits can come from the wrong connector and skew the comparison with semantic search. Filter local results by connector when task.connector is set.</violation>
</file>
<file name="stackone_ai/toolset.py">
<violation number="1" location="stackone_ai/toolset.py:347">
P2: Clamp the semantic search over-fetch to the API’s documented max (500). As written, `top_k * 3` can exceed the API limit and cause semantic search to fail for larger requests.</violation>
<violation number="2" location="stackone_ai/toolset.py:392">
P2: Preserve the relevance ordering from `tool_search` when falling back to local search; the current filtering returns tools in original toolset order instead of relevance order.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Ask questions if you need clarification on any suggestion
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 4 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="stackone_ai/utility_tools.py">
<violation number="1" location="stackone_ai/utility_tools.py:205">
P2: `limit`/`minScore` are now nullable in the semantic tool schema, but the execution path casts them directly to int/float. A null value will raise a TypeError at runtime. Either disallow null in the schema or add None handling before casting.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="stackone_ai/utility_tools.py">
<violation number="1" location="stackone_ai/utility_tools.py:225">
P2: Using `or 5` overrides an explicit `limit=0`, so callers can no longer request zero results. Preserve 0 while still defaulting when the key is missing or None.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
tests/BENCHMARK_RESULTS.md
Outdated
| | Method | Hit@5 | MRR | Avg Latency | Hits | | ||
| | ----------------- | ---------- | ---------- | ----------- | ------- | | ||
| | Local BM25+TF-IDF | 66.0% | 0.538 | 1.2ms | 62/94 | | ||
| | Semantic Search | 76.6% | 0.634 | 279.6ms | 72/94 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's higher latency than I would have hoped, i'm assuming most of it is the network roundtrip not the actual search in the backend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the benchmark against local lambda, this is still high as it it does performs embeddings and does semantic semantic search on prebuilt embedding. We have not done the prod benchmarks yet but with optimized infra for embedding will likely to improve the results in prod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark needs to re-done at some point so removing this for now.
stackone_ai/models.py
Outdated
| def utility_tools( | ||
| self, | ||
| hybrid_alpha: float | None = None, | ||
| use_semantic_search: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need two new params, could we instead consider presence of semantic_client as the use_semantic_search flag ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.. They actually takes the different path and Passing semantic_client actually clarify which path to take.. I need to fix that thanks 👍..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now based on the presence of Semantic Client .. Thanks
tests/BENCHMARK_RESULTS.md
Outdated
| | "see who applied for the role" | greenhouse_list_applied_candidate_tags | ashby_add_hiring_team_member | | ||
| | "advance someone to the next round" | greenhouse_move_application | factorial_invite_employee | | ||
| | "see open positions" | teamtailor_list_jobs | hibob_create_position_opening | | ||
| | "close a deal" | zohocrm_get_deal | shopify_close_order | | ||
| | "check course completion" | saba_delete_recurring_completion | saba_get_course | | ||
| | "update deal and notify team" | zohocrm_get_deal | microsoftteams_update_team | | ||
| | "look up customer" | linear_update_customer_need | shopify_search_customers | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all these are surprising that semantics earch gets it so wrong. Is that because we're just looking at a single matched result?
Is it possible to commit the benchmark result actually in term of returned matches? Some of these results are puzzling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmarks needs to be re-visited later.
stackone_ai/models.py
Outdated
| """ | ||
| return {tool.connector for tool in self.tools} | ||
|
|
||
| def filter_by_connector(self, connectors: list[str] | set[str]) -> Tools: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this function? Afaik we already support glob filtering which already allows this type of filtering (and more) and I don't think this is related to the semantic search
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is removed now as glob filtering via fetch_tools already covered thats.. Great catch as it was likely added speculatively.
| self.base_url = base_url.rstrip("/") | ||
| self.timeout = timeout | ||
|
|
||
| def _build_auth_header(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not as familiar with the python sdk than node one but it's worth dpuble checking whether or not we already have an http client to call the stackone API (considering we're calling the mcp server in that same SDK i'd assume we do)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good point about potential duplication but good candidate for further refactor of the SDK and add A shared HTTP client to make the requests .. I would add the A shared HTTP client could be a follow-up refactor as it doesn't exist in the Python version?
stackone_ai/semantic_search.py
Outdated
| self, | ||
| query: str, | ||
| connector: str | None = None, | ||
| top_k: int = 10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's too low of a default, arguably we shouldn't actually have a default here since we have one in the backend and this property should be optional (i'm assuming here it is indeed optional on the backend)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, made it optional now and uses the value set in the backend. Good point on magic number shouldn't be there but more we allow more it affect the context. Let's use the backend default here ..
stackone_ai/toolset.py
Outdated
| connector: str | None = None, | ||
| available_connectors: set[str] | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need two connector params? What's "available_connectors" and how is it different to connecto ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
available connectors shouldn't be there yet or never as filtering is done via fetch tools. replaced by account Ids. It now resolves connectors internally. Fixed this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 10 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="stackone_ai/toolset.py">
<violation number="1" location="stackone_ai/toolset.py:476">
P2: When `account_ids` resolve to zero connectors, the empty set is falsy so connector filtering is skipped and results are returned from unrelated connectors. This breaks account scoping for accounts with no linked connectors; consider returning an empty list or treating an empty set as a valid filter state.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="examples/utility_tools_example.py">
<violation number="1" location="examples/utility_tools_example.py:103">
P3: This comment is misleading ("onboard new hire" should not map to termination tools). Update it to reflect onboarding tools to avoid confusing users of the example.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Problem
StackOne has over 10,000 actions across all connectors and growing, some connectors have 2,000+
actions alone. Keyword matching breaks down when someone searches "onboard new hire" but the
action is called
hris_create_employee. In the StackOne AI SDK there is support for the keyword search and we need add the support for the semantic search using the action search service.Implementation Details
Change Summary
SemanticSearchClientinterfacing with StackOne's/actions/searchAPI for naturallanguage tool discovery, with Pydantic models for type-safe results
search_tools()toStackOneToolSetwith over-fetching and connector filtering onlyreturns tools the user has linked accounts for, sorted by semantic relevance
search_action_names()for lightweight lookups without loading full tool definitionstool_search,tool_execute) for AI agents to dynamically discover andexecute tools at runtime, supporting both local and semantic search modes
Toolswithutility_tools(),get_connectors(), andfilter_by_connector()forconnector-aware tool management
messaging, docs, marketing, LMS) — currently benchmarked against local endpoints. Semantic
search achieves 76.6% Hit@5 vs 66.0% for local search (+10.6% improvement)
How to Test
just testto verify all existing and new unit tests passjust lintto confirm no linting regressionssearch_tools()returns relevant tools filtered to available connectorsuse_semantic_search=Trueand default local modeSummary by cubic
Adds semantic search to the SDK for natural language tool discovery across connectors, improving relevance over keyword search. Benchmarks show +10.6% Hit@5 vs local BM25 (76.6% vs 66.0%).
New Features
Bug Fixes
Written for commit 521339b. Summary will update on new commits.