Meet Anton: AI-Powered Search Evaluation in an API Built for Search Engineers
Some days, you get the great fortune of launching products you’ve always wanted for yourself. Today is one of those days here at Objective HQ. Meet Anton, an AI-powered search critic that lets every Search Engineer evaluate & iterate at massive scale. If you don’t have an Objective account, grab one and try Anton out for yourself today; your first 1,000 judgements are included in your account for free. If you’re curious, take it for a spin in the Anton Lab. And if you want to cut straight to the chase, check out the API docs!
Near-human quality judgements at scale.
The history of search is also the history of tools & methods to evaluate the relevancy of search results. Most commonly, evaluating a “search result” — a search query and result object pair — is a human task. And human judgement is (and may always be!) the highest quality judgement of a search result possible.
Sometimes you need the high-precision judgement only a human can provide. Other times, you’re looking for a “human-like” sense of the relevance of a set of search results. Unfortunately, when high-precision judgements are the only tool in your Search Engineering toolbelt, the scale & speed of experimentation are significantly limited by the time it takes a human to judge a result.
We’ve all bumped into this problem throughout our careers, and again while building Objective. So we sat down to build the tool we wanted. And boy, do we love what we ended up with.
Making New Use Cases Possible On-Demand
We built Anton for the use cases that weren’t possible before. Humans are still incredible at high-precision judgements, especially on search results. But a lot of new use cases open up when you can get thousands of judgements back in a few minutes.
- Quickly iterating & experimenting — want to quickly validate a fine-tuning experiment on any model, anywhere? Get new learnings in less time than it takes to go make coffee.
- Evaluating any search platform — Anton is a phenomenal tool for measuring the relative quality of results from one dataset across two different search systems. Ever wondered how to measure your Elasticsearch install programmatically? With Anton, it’s easy to measure a sample, or measure the whole thing.
- Production monitoring — retail & product datasets change frequently (sometimes daily!). Why not monitor your search relevance the way you monitor every other part of your user experience? Build a quick monitoring job with Anton (see the sketch just after this list), hook it up to your alerting system, and never wonder about your production relevance again.
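To make the monitoring idea concrete, here’s a minimal sketch of such a job. The base URL, endpoint paths, payload and response fields, relevance threshold, and the notify-oncall.sh hook are all assumptions for illustration; check the API docs for the real shapes.

```bash
#!/usr/bin/env bash
# Hypothetical monitoring sketch: submit a canned sample for judgement, poll
# until it finishes, and alert if relevance drops below a floor.
set -euo pipefail

API="https://api.objective.inc/v1"
AUTH="Authorization: Bearer $OBJECTIVE_API_KEY"
EVAL="prod-monitor-$(date +%Y%m%d-%H%M)"

# Kick off a judgement run over a canned sample of query+result pairs.
curl -s -X POST "$API/evaluations" -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"eval_name": "'"$EVAL"'", "judgements": [
        {"query": "blue shirt", "object": {"title": "Classic Oxford Shirt"}},
        {"query": "summer dress", "object": {"title": "Floral Maxi Dress"}}
      ]}'

# Poll until the run completes (the "status" field is an assumption).
while [ "$(curl -s "$API/evaluations/$EVAL" -H "$AUTH" | jq -r '.status')" != "completed" ]; do
  sleep 30
done

# Alert if the (assumed) aggregate relevance score falls below a floor.
score=$(curl -s "$API/evaluations/$EVAL" -H "$AUTH" | jq '.mean_relevance')
if awk -v s="$score" 'BEGIN { exit !(s < 0.7) }'; then
  ./notify-oncall.sh "search relevance dropped to $score"
fi
```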
Let’s take it for a spin!
Anton works asynchronously: you submit a request for evaluation, then poll the API for its status. For efficiency, you can hand Anton multiple query+result pairs in a single request, and each pair is judged independently.
To start, let’s make a cURL request for a judgement on two query+result pairs: the first for a search query of “blue shirt”, and the second for a search query of “summer dress”. We give it an eval_name that we’ll use next to check the status of our evaluation.
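The request might look something like the sketch below; the base URL, endpoint path, payload fields, and auth header are illustrative assumptions rather than the documented contract, so defer to the API docs for the canonical shapes:

```bash
# Hypothetical sketch: endpoint path, payload fields, and auth header are
# assumptions for illustration; see the API docs for the real contract.
curl -X POST "https://api.objective.inc/v1/evaluations" \
  -H "Authorization: Bearer $OBJECTIVE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "eval_name": "apparel-relevance-check",
    "judgements": [
      { "query": "blue shirt",
        "object": { "title": "Classic Oxford Shirt", "color": "navy" } },
      { "query": "summer dress",
        "object": { "title": "Floral Maxi Dress", "season": "summer" } }
    ]
  }'
```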
Next, we make a GET request to the API resource for the eval_name we just created. This endpoint returns the status of the request, and the judgements once they are available.
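Under the same assumptions about the endpoint shape, the poll might look like this:

```bash
# Hypothetical sketch: fetch the evaluation by the eval_name we chose above;
# the path and the response fields (e.g. "status") are assumptions.
curl "https://api.objective.inc/v1/evaluations/apparel-relevance-check" \
  -H "Authorization: Bearer $OBJECTIVE_API_KEY"
```

Poll until the status reaches a terminal state, then read the judgements off the same response.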
Retrieving all of the evaluations you have run with Anton is also about as straightforward as it gets: just make a GET request to the “evaluations” API endpoint.
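As a sketch, with the same assumed base URL and auth header:

```bash
# Hypothetical sketch: list every evaluation on the account.
curl "https://api.objective.inc/v1/evaluations" \
  -H "Authorization: Bearer $OBJECTIVE_API_KEY"
```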
And there you have it — Anton can power pretty much any evaluation scenario you can dream up. We can’t wait to see what you evaluate!