@lotif (Collaborator) commented Feb 11, 2026

Summary

This PR adds basic online evaluations for the report generation agent. These evaluations are meant to be run in the "production" environment and will produce and upload scores to Langfuse for the following:

  • Checking that the final result is present and contains an expected string
  • Adding scores for latency, token count and cost
    • These functions have been added to langfuse.py so they can be easily reused by other agents
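
As a rough illustration of the first check, here is a minimal sketch of a presence-plus-substring score. The names `Score` and `final_result_score`, the score name "final_result", and the 0.0/1.0 scale are assumptions made for this example and are not taken from the PR's code; the sketch also leaves out the upload to Langfuse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Score:
    """Hypothetical container for one evaluation score."""

    name: str
    value: float
    comment: str


def final_result_score(final_result: Optional[str], expected_substring: str) -> Score:
    """Score 1.0 only if the final result exists and contains the expected substring."""
    # A missing or empty result fails the presence check outright.
    if not final_result:
        return Score("final_result", 0.0, "final result is missing or empty")
    # Plain, case-sensitive substring match (an assumption of this sketch).
    if expected_substring in final_result:
        return Score("final_result", 1.0, "expected substring found")
    return Score("final_result", 0.0, "expected substring not found")
```

A binary value keeps the score easy to aggregate on a dashboard, and the comment records why a run failed.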

This is how the scores are displayed in the Langfuse UI:
Trace detail page:
Screenshot 2026-02-11 at 12 05 40

Dashboard page:
Screenshot 2026-02-11 at 14 58 09

ClickUp Ticket(s): N/A

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • 🔧 Refactoring (no functional changes)
  • ⚡ Performance improvement
  • 🧪 Test improvements
  • 🔒 Security fix

Changes Made

  • Small refactor to split report generation evaluations into online and offline
  • Adding a function to upload a score that checks the final result against a string match
  • Adding functions to langfuse.py to upload scores on latency, token count and cost for a trace
  • Adding the function calls to send evaluations on each run of the demo UI for the report generation agent
  • Small fix to the UI for better output formatting
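
To make the reusable trace-level scores more concrete, the sketch below turns a trace's latency, token usage and cost into named scores and hands them to an uploader. Everything here is hypothetical: `TraceStats`, `trace_scores`, `upload_scores`, the score names and the `send` callback are assumptions for illustration and do not reflect the actual contents of langfuse.py or the Langfuse SDK. In a real setup, `send` would wrap whatever score-creation call the installed Langfuse client provides.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class TraceStats:
    """Hypothetical summary of one agent run."""

    latency_seconds: float
    input_tokens: int
    output_tokens: int
    cost_usd: float


def trace_scores(stats: TraceStats) -> Dict[str, float]:
    """Map trace statistics to numeric scores keyed by score name."""
    return {
        "latency": stats.latency_seconds,
        "token_count": float(stats.input_tokens + stats.output_tokens),
        "cost": stats.cost_usd,
    }


def upload_scores(
    trace_id: str,
    scores: Dict[str, float],
    send: Callable[[str, str, float], None],
) -> None:
    """Send each score for a trace through the injected uploader callback."""
    for name, value in scores.items():
        send(trace_id, name, value)
```

Injecting the uploader keeps the scoring logic independent of any one agent, and lets tests pass a stub instead of a network client.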

Testing

  • Tests pass locally (uv run pytest tests/)
  • Type checking passes (uv run mypy <src_dir>)
  • Linting passes (uv run ruff check src_dir/)
  • Manual testing performed (describe below)

Manual testing details:

Tested the demo UI and checked the resulting scores in Langfuse.

Checklist

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Documentation updated (if applicable)
  • No sensitive information (API keys, credentials) exposed

@lotif lotif requested review from amrit110 and fcogidi February 11, 2026 17:16
