LLM developers face a number of problems in production - the most fundamental of which is low visibility into how the model performs on real production data.
This means LLM developers have:
- Poor visibility into LLM inference calls and responses
- No clear data or metrics around token usage, cost, latency
- No way to measure or understand model performance in production
- No explainability for LLM responses
As a result, they are often forced into building and maintaining such systems in-house - a huge effort for any engineering team.
Meet Athina
Athina gives you a self-serve dashboard that you can set up in under 15 minutes to evaluate and monitor any LLM in production.
With our dashboard you can:
- Track your costs, token usage, and response times, and see user queries and LLM responses in one place
- View the data at a global level, or drill into cost and usage analytics for an individual customer to see how your app is performing for each of them.
- Get a comprehensive understanding of the types of mistakes your LLM is making - how, when and why.
- Visualize performance of your LLM system over time.
- Configure and run any of our pre-built evaluators.
If you use our evaluators to detect your LLM failures, you can view the results and in-depth insights on the dashboard itself.
✅ How Athina can help LLM Developers
1. High-level view of your LLM usage
You can see all your user queries and responses along with metadata like model, token usage, latency, cost, environment, etc.
You can also view feedback from your end users and feedback from human graders.
2. Detailed client level analytics
Segment your data by customer_id, user_id or session_id to dive into how your product is performing at every level.
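To make the idea of segmenting concrete, here is a minimal sketch of the kind of per-customer rollup the dashboard computes. The record fields (`customer_id`, `cost_usd`, `total_tokens`) are illustrative attribute names you might attach when logging, not a prescribed schema.

```python
# Illustrative only: aggregating logged inference records per customer.
# Field names (customer_id, cost_usd, total_tokens) are assumptions,
# not Athina's actual schema.
from collections import defaultdict

records = [
    {"customer_id": "acme",   "cost_usd": 0.012, "total_tokens": 480},
    {"customer_id": "acme",   "cost_usd": 0.008, "total_tokens": 310},
    {"customer_id": "globex", "cost_usd": 0.021, "total_tokens": 900},
]

def cost_by_customer(records):
    """Sum inference cost per customer_id."""
    totals = defaultdict(float)
    for r in records:
        totals[r["customer_id"]] += r["cost_usd"]
    return dict(totals)
```

The same grouping works for `user_id` or `session_id` - any key you log alongside the inference call.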
3. Understand your conversations and your users
For conversational use cases, we can help you get deeper insights by analyzing the topic, tone and sentiment of user queries and how those change over the duration of the conversation.
4. Understand model performance in production
Athina can help you get a comprehensive understanding of your model performance in production through our evals.
If you have configured an Athina evaluator, we will automatically run evals to detect bad outputs from your AI in production.
You can learn where your AI is responding incorrectly, and use these results to measure and improve your response quality.
5. Fire and forget, not a proxy endpoint
Your AI inference call is the most important part of your product.
We don’t want to interfere with that.
To use Athina, you simply have to make an API request or call a function in our Python logging SDK.
This can be done as a fire-and-forget call after your inference completes, so it has no impact on latency.
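The fire-and-forget pattern can be sketched as follows. This is a hypothetical illustration, not Athina's actual SDK: the endpoint URL, header name, and payload fields are placeholders, and the point is only that the logging request runs on a background thread so your main request path never waits on it.

```python
# Hypothetical fire-and-forget logging sketch. The URL, header, and
# payload field names below are illustrative placeholders, not
# Athina's actual API.
import json
import threading
import urllib.request

LOG_URL = "https://example.com/api/v1/log/inference"  # placeholder

def build_log_payload(prompt, response, model, tokens, latency_ms):
    """Collect inference metadata into a single log record."""
    return {
        "prompt": prompt,
        "response": response,
        "model": model,
        "total_tokens": tokens,
        "response_time_ms": latency_ms,
    }

def log_inference(payload, api_key="YOUR_API_KEY"):
    """Send the record on a daemon thread so the caller returns
    immediately (fire and forget)."""
    def _send():
        req = urllib.request.Request(
            LOG_URL,
            data=json.dumps(payload).encode(),
            headers={
                "athina-api-key": api_key,  # placeholder header name
                "Content-Type": "application/json",
            },
        )
        try:
            urllib.request.urlopen(req, timeout=5)
        except OSError:
            pass  # logging must never break the main request path
    threading.Thread(target=_send, daemon=True).start()
```

Because the thread is a daemon and any network error is swallowed, a slow or failing logging endpoint cannot add latency to, or crash, the inference call itself.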
Let’s get started 🎉
- Create your free account on athina.ai
- Have questions? Book a call
- Check out our open source evaluators on Github
- Learn more about our Text Summarization evaluators