Athina AI: LLM Monitoring and Evaluation Platform

LLM developers face a number of problems in production, the most fundamental of which is low visibility into the model's performance on real production data.
This means LLM developers have:
  • Poor visibility into LLM inference calls and responses
  • No clear data or metrics around token usage, cost, latency
  • No way to measure or understand model performance in production
  • No explainability for LLM responses
As a result, they are often forced to build and maintain such systems in-house, a huge effort for any engineering team.

Meet Athina

Athina gives you a self-serve dashboard that you can set up in under 15 minutes to evaluate and monitor any LLM in production.
With our dashboard you can:
  • Track your costs, token usage, and response times, and see user queries and LLM responses in one place
  • View the data at a global level, or drill into cost and usage analytics for individual customers to see how your app is performing for each of them.
  • Get a comprehensive understanding of the types of mistakes your LLM is making: how, when, and why.
  • Visualize performance of your LLM system over time.
  • Configure and run any of our pre-built evaluators.
If you use our evaluators to detect your LLM failures, you can view the results and in-depth insights on the dashboard itself.
 

✅ How Athina can help LLM Developers

1. High level view of your LLM usage

You can see all your user queries and responses along with metadata like model, token usage, latency, cost, environment, etc.
You can also view feedback from your end users and feedback from human graders.
 

2. Detailed client level analytics

Segment your data by customer_id, user_id, or session_id to dive into how your product is performing at every level.
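As a rough illustration of this kind of segmentation, the sketch below aggregates logged inference records per customer. The field names (`customer_id`, `cost_usd`, `latency_ms`) are hypothetical placeholders, not Athina's actual schema:

```python
# Sketch: per-customer rollup of logged inference records.
# Field names here are illustrative, not Athina's real schema.
from collections import defaultdict

records = [
    {"customer_id": "acme", "cost_usd": 0.002, "latency_ms": 410},
    {"customer_id": "acme", "cost_usd": 0.004, "latency_ms": 380},
    {"customer_id": "globex", "cost_usd": 0.001, "latency_ms": 290},
]

# Accumulate call counts and total spend per customer_id.
per_customer = defaultdict(lambda: {"calls": 0, "cost_usd": 0.0})
for r in records:
    agg = per_customer[r["customer_id"]]
    agg["calls"] += 1
    agg["cost_usd"] += r["cost_usd"]
```

Attaching identifiers like these to each logged call is what makes the customer- and session-level views possible.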
 

3. Understand your conversations and your users

For conversational use cases, we can help you get deeper insights by analyzing the topic, tone, and sentiment of user queries, and how those change over the course of the conversation.
 

4. Understand model performance in production

Athina can help you get a comprehensive understanding of your model performance in production through our evals.
If you have configured an Athina evaluator, we will automatically run evals to detect bad outputs from your AI in production.
You can learn where your AI is responding incorrectly, and use these results to measure and improve your response quality.
 

5. Fire and forget, not a proxy endpoint

Your AI inference call is the most important part of your product.
We don’t want to interfere with that.
To use Athina, you simply have to make an API request or call a function in our Python logging SDK.
This can be done as a fire-and-forget request after your inference call, so it has no impact on latency.
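The fire-and-forget pattern can be sketched in plain Python. The `send_log` function and payload fields below are illustrative placeholders standing in for the real logging call, not Athina's actual SDK:

```python
# Fire-and-forget logging sketch. send_log and the payload fields are
# placeholders, not Athina's actual SDK or API.
import threading
import time

def send_log(payload: dict) -> None:
    # A real implementation would POST the payload to a logging endpoint;
    # here we simulate a slow network round trip.
    time.sleep(0.5)

def log_inference(payload: dict) -> threading.Thread:
    """Dispatch logging on a daemon thread so the caller never waits."""
    t = threading.Thread(target=send_log, args=(payload,), daemon=True)
    t.start()
    return t

# The caller returns immediately, even though send_log takes 0.5 s.
start = time.perf_counter()
log_inference({"prompt": "Hi", "response": "Hello!", "model": "gpt-4"})
elapsed = time.perf_counter() - start
```

Because the logging happens off the request path, a slow or failed log call never delays the response your user sees.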
 

Let’s get started 🎉

  • Check out our open source evaluators on GitHub

Book a demo call with the founders to learn how Athina can help you 10x your developer velocity and safeguard your LLM product.

Want to build a reliable GenAI product?

Book a demo

Written by

Shiv Sakhuja

Co-founder, Athina AI