Athina AI: LLM Monitoring and Evaluation Platform

LLM developers face a number of problems in production, the most fundamental of which is low visibility into the model's performance on real production data.
This means LLM developers have:
  • Poor visibility into LLM inference calls and responses
  • No clear data or metrics around token usage, cost, latency
  • No way to measure or understand model performance in production
  • No explainability for LLM responses
As a result, they are often forced to build and maintain such systems in-house, a huge effort for any engineering team.

Meet Athina

Athina gives you a self-serve dashboard that you can set up in under 15 minutes to evaluate and monitor any LLM in production.
With our dashboard you can:
  • Track your costs, token usage, and response times, and see user queries and LLM responses in one place
  • View the data at a global level, or drill into cost and usage analytics for individual customers to see how your app is performing for each of them.
  • Get a comprehensive understanding of the types of mistakes your LLM is making: how, when, and why.
  • Visualize performance of your LLM system over time.
  • Configure and run any of our pre-built evaluators.
If you use our evaluators to detect your LLM failures, you can view the results and in-depth insights on the dashboard itself.
 

✅ How Athina can help LLM Developers

1. High level view of your LLM usage

You can see all your user queries and responses along with metadata like model, token usage, latency, cost, environment, etc.
You can also view feedback from your end users and feedback from human graders.
 

2. Detailed client level analytics

Segment your data by customer_id, user_id, or session_id to dive into how your product is performing at every level.
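As a rough illustration of this kind of segmentation, the sketch below aggregates logged inference records per customer. The field names (`customer_id`, `cost_usd`, `latency_ms`) are hypothetical placeholders, not Athina's actual schema:

```python
# Sketch: per-customer rollup of logged inference records.
# Field names here are illustrative, not Athina's real schema.
from collections import defaultdict

records = [
    {"customer_id": "acme", "cost_usd": 0.002, "latency_ms": 410},
    {"customer_id": "acme", "cost_usd": 0.004, "latency_ms": 380},
    {"customer_id": "globex", "cost_usd": 0.001, "latency_ms": 290},
]

# Accumulate call counts and total spend per customer_id.
per_customer = defaultdict(lambda: {"calls": 0, "cost_usd": 0.0})
for r in records:
    agg = per_customer[r["customer_id"]]
    agg["calls"] += 1
    agg["cost_usd"] += r["cost_usd"]
```

Attaching identifiers like these to each logged call is what makes the customer- and session-level views possible.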
 

3. Understand your conversations and your users

For conversational use cases, we can help you get deeper insights by analyzing the topic, tone, and sentiment of user queries, and how those change over the course of the conversation.
 

4. Understand model performance in production

Athina can help you get a comprehensive understanding of your model performance in production through our evals.
If you have configured an Athina evaluator, we will automatically run evals to detect bad outputs from your AI in production.
You can learn where your AI is responding incorrectly, and use these results to measure and improve your response quality.
 

5. Fire and forget, not a proxy endpoint

Your AI inference call is the most important part of your product.
We don’t want to interfere with that.
To use Athina, you simply have to make an API request or call a function in our Python logging SDK.
This can be done as a fire-and-forget request after your inference call, so it has no impact on latency.
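The fire-and-forget pattern can be sketched in plain Python. The `send_log` function and payload fields below are illustrative placeholders standing in for the real logging call, not Athina's actual SDK:

```python
# Fire-and-forget logging sketch. send_log and the payload fields are
# placeholders, not Athina's actual SDK or API.
import threading
import time

def send_log(payload: dict) -> None:
    # A real implementation would POST the payload to a logging endpoint;
    # here we simulate a slow network round trip.
    time.sleep(0.5)

def log_inference(payload: dict) -> threading.Thread:
    """Dispatch logging on a daemon thread so the caller never waits."""
    t = threading.Thread(target=send_log, args=(payload,), daemon=True)
    t.start()
    return t

# The caller returns immediately, even though send_log takes 0.5 s.
start = time.perf_counter()
log_inference({"prompt": "Hi", "response": "Hello!", "model": "gpt-4"})
elapsed = time.perf_counter() - start
```

Because the logging happens off the request path, a slow or failed log call never delays the response your user sees.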
 

Let’s get started 🎉

  • Check out our open source evaluators on GitHub

Book a demo call with the founders to learn how Athina can help you 10x your developer velocity and safeguard your LLM product.

Want to build a reliable GenAI product?

Book a demo

Written by

Shiv Sakhuja

Co-founder, Athina AI