Evaluations API

The Evaluations API allows you to submit and retrieve evaluation requests for your search results. You can assess how well your search queries match your data and get insights into the relevance of your results.

The Evaluations API uses our proprietary LLM-as-a-judge, named Anton, to generate relevance judgements for search results.

To use the Evaluations API, get an API key.

How it works

The Evaluations API takes a query and an object and produces a judgement on whether that object is relevant to the query. The object is typically a result from a search system, such as a document, a product, or an image. The LLM judge reads the query and the object's content and decides how relevant the object is to that query. Each judgement includes an integer score of 0, 1, or 2, where 0 means irrelevant (label "BAD"), 1 means partially relevant ("OK"), and 2 means perfectly relevant ("GREAT"), along with a written explanation. You can use these scores to understand search quality, compute metrics, train models, and more.
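For example, once you have judged the top results your search system returns for a query, you can turn the scores into a standard ranking metric such as NDCG@k. This sketch is not part of the Evaluations API. It assumes you have the 0/1/2 scores for one query's results in ranked order. It uses the common exponential gain 2^score - 1 and normalizes against the ideal ordering of the same judged results. The names dcg and ndcg are illustrative helpers.

```python
import math
from typing import Sequence


def dcg(scores: Sequence[int], k: int) -> float:
    """Discounted cumulative gain of the top-k scores, using gain 2**score - 1."""
    # enumerate() is 0-based, so the result at rank 1 is discounted by log2(2) = 1.
    return sum((2 ** s - 1) / math.log2(i + 2) for i, s in enumerate(scores[:k]))


def ndcg(scores: Sequence[int], k: int) -> float:
    """DCG divided by the DCG of the same scores sorted best-first (the ideal order)."""
    ideal = dcg(sorted(scores, reverse=True), k)
    return dcg(scores, k) / ideal if ideal > 0 else 0.0


# Judge scores for one query's results, in the order the search system ranked them.
ranked_scores = [2, 0, 1]
print(round(ndcg(ranked_scores, k=3), 3))  # 0.964
```

Averaging this value over many queries gives a single quality number you can compare across search configurations.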

Learn more about how we evaluate results in Evaluating Relevance.

Usage

The Evaluations API is asynchronous, so using it takes two steps:

1. Create the evaluation.
2. Poll the evaluation ID until the evaluation has completed, then read the results.

Creating an evaluation

To create an evaluation, pass in the (query, result object) pairs you want judged. Each pair is judged independently. Once the evaluation completes, the results contain a (query, result object, judgement) triple for each pair.

In this example, I'm passing two queries for the same object, a striped long-sleeve shirt. The queries are "shirts with stripes" and "solid shirts", so we expect grades of "GREAT" and "BAD", respectively.

$ curl --request POST \
  --url 'https://api.objective.inc/v1/evaluations' \
  --header "Authorization: Bearer $OBJECTIVE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{ 
    "configuration": {
      "eval_name": "Evaluations API Demo"
    },
    "data": [
      {
        "query": "shirts with stripes",
        "object_id": "597073001",
        "object": {
            "detail_desc": "Straight-cut, striped shirt in cotton and viscose twill in a relaxed fit with a turn-down collar and classic front. Long sleeves with buttoned cuffs, and a rounded hem."
        }
      },
      {
        "query": "solid shirts",
        "object_id": "597073001",
        "object": {
            "detail_desc": "Straight-cut, striped shirt in cotton and viscose twill in a relaxed fit with a turn-down collar and classic front. Long sleeves with buttoned cuffs, and a rounded hem."
        }
      }
    ]
  }' | jq  # Optional: pretty-print the response with jq

# Response
{
  "status": "accepted",
  "id": "sb_AyvwW_NEVFarYrCjYfWa9",
  "metadata": {
    "created_at": "2024-08-12T18:45:38Z",
    "judge": "anton-0702",
    "eval_name": "Evaluations API Demo"
  },
  "message": "The evaluation has been accepted and processing has started."
}
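The same request can be sent from code. This sketch uses only Python's standard library (json and urllib.request). The names build_payload, build_create_request, and create_evaluation are hypothetical helpers. It also assumes each search result is a dict with an "id" key plus the fields you want the judge to see.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

API_URL = "https://api.objective.inc/v1/evaluations"


def build_payload(
    query: str, results: List[Dict[str, Any]], eval_name: Optional[str] = None
) -> Dict[str, Any]:
    """Pair one query with each search result as a (query, object) item to judge.

    Assumes every result dict has an "id" key; all other keys become the object.
    """
    payload: Dict[str, Any] = {
        "data": [
            {
                "query": query,
                "object_id": str(r["id"]),
                "object": {k: v for k, v in r.items() if k != "id"},
            }
            for r in results
        ]
    }
    if eval_name is not None:
        payload["configuration"] = {"eval_name": eval_name}  # optional field
    return payload


def build_create_request(payload: Dict[str, Any], api_key: str) -> urllib.request.Request:
    """Build (without sending) the POST /v1/evaluations request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_evaluation(payload: Dict[str, Any], api_key: str) -> Dict[str, Any]:
    """Send the request and return the parsed JSON response, which includes "id"."""
    with urllib.request.urlopen(build_create_request(payload, api_key)) as resp:
        return json.load(resp)
```

For example, create_evaluation(build_payload("shirts with stripes", results, "Evaluations API Demo"), os.environ["OBJECTIVE_API_KEY"])["id"] would return the evaluation ID to poll. Reading the key from the OBJECTIVE_API_KEY environment variable mirrors the curl example above.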

Fetching results

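The create response includes an ID that you can use to poll for results. The status field changes to "completed" when the evaluation has finished. Until then it reports an in-progress status such as "processing", and it reports "error" if the evaluation fails.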

$ curl 'https://api.objective.inc/v1/evaluations/sb_AyvwW_NEVFarYrCjYfWa9' \
  --header "Authorization: Bearer $OBJECTIVE_API_KEY" | jq

# Response
{
  "status": "completed",
  "id": "sb_AyvwW_NEVFarYrCjYfWa9",
  "metadata": {
    "created_at": "2024-08-12T18:45:38Z",
    "judge": "anton-0702",
    "eval_name": "Evaluations API Demo"
  },
  "message": "The evaluation has been completed.",
  "judgements": [
    {
      "query": "shirts with stripes",
      "object_id": "597073001",
      "object": {
        "detail_desc": "Straight-cut, striped shirt in cotton and viscose twill in a relaxed fit with a turn-down collar and classic front. Long sleeves with buttoned cuffs, and a rounded hem."
      },
      "judgement": {
        "score": 2,
        "label": "GREAT",
        "explanation": "The retrieved result matches the user's query exactly. The detail description mentions that the shirt is \"Straight-cut, striped shirt in cotton and viscose twill\", which directly aligns with the user's query for \"shirts with stripes\". The description provides additional details about the shirt's fit, collar, sleeves, and hem, but the key information about it being a striped shirt is a perfect match to the query."
      }
    },
    {
      "query": "solid shirts",
      "object_id": "597073001",
      "object": {
        "detail_desc": "Straight-cut, striped shirt in cotton and viscose twill in a relaxed fit with a turn-down collar and classic front. Long sleeves with buttoned cuffs, and a rounded hem."
      },
      "judgement": {
        "score": 0,
        "label": "BAD",
        "explanation": "The given result is for a striped shirt, which does not match the query for \"solid shirts\". The result is not related to the query at all, as it describes a patterned shirt rather than a solid-colored one. Therefore, the relevance of the result to the query is rated as \"0\"."
      }
    }
  ]
}
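Because the endpoint is asynchronous, client code typically polls until the status is no longer in progress. The sketch below shows one way to do that in Python. The names wait_for_evaluation, http_fetch, and EvaluationFailed are hypothetical. The 5-second interval and 10-minute timeout are arbitrary defaults, not documented limits. The fetch function is passed in so the loop can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

Response = Dict[str, Any]


class EvaluationFailed(RuntimeError):
    """Raised when an evaluation reports status "error"."""


def http_fetch(api_key: str) -> Callable[[str], Response]:
    """Return a function that calls GET /v1/evaluations/{id} and parses the JSON body."""

    def fetch(eval_id: str) -> Response:
        req = urllib.request.Request(
            f"https://api.objective.inc/v1/evaluations/{eval_id}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    return fetch


def wait_for_evaluation(
    fetch: Callable[[str], Response],
    eval_id: str,
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Poll until status is "completed" and return that body.

    The timeout counts only time spent sleeping between polls.
    """
    waited = 0.0
    while True:
        body = fetch(eval_id)
        status = body.get("status")
        if status == "completed":
            return body
        if status == "error":
            raise EvaluationFailed(body.get("message", "evaluation failed"))
        # Any other status (e.g. "accepted" or "processing") means: not done yet.
        if waited >= timeout:
            raise TimeoutError(f"evaluation {eval_id} still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

For example, wait_for_evaluation(http_fetch(os.environ["OBJECTIVE_API_KEY"]), "sb_AyvwW_NEVFarYrCjYfWa9")["judgements"] would return the judgements list once the evaluation completes.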

API Endpoints

1. Request Evaluation

Endpoint: POST /v1/evaluations

Description: Submits a request for evaluation of search results based on the provided configuration and data.

Request (configuration.eval_name is optional):

curl --request POST \
  --url 'https://api.objective.inc/v1/evaluations' \
  --header 'Authorization: Bearer API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "configuration": {
      "eval_name": "My first Evaluation"
    },
    "data": [
      {
        "query": "shirt",
        "object_id": "object123",
        "object": {
          "title": "Blue Shirt",
          "description": "A blue shirt with a classic design."
        }
      },
      {
        "query": "summer dress",
        "object_id": "object456",
        "object": {
          "title": "Floral Summer Dress",
          "description": "A floral dress perfect for any season."
        }
      }
    ]
  }'

Response:

{
  "status": "accepted",
  "metadata": {
    "eval_name": "My first Evaluation",
    "judge": "anton-0702"
  },
  "id": "1234567890",
  "message": "Your evaluation request has been accepted and is being processed. You can check the status and retrieve the results using the results key below.",
  "results": "https://api.objective.inc/v1/evaluations/1234567890"
}

2. Get Evaluation Status and Results

Endpoint: GET /v1/evaluations/{id}

Description: Retrieves the results of a previously submitted evaluation.

Request:

curl --request GET \
  --url 'https://api.objective.inc/v1/evaluations/1234567890' \
  --header 'Authorization: Bearer API_KEY'

Responses:

While Processing:

{
  "id": "1234567890",
  "metadata": {
    "eval_name": "Elastic Search Eval",
    "judge": "anton-0702"
  },
  "status": "processing",
  "message": "Your request is currently being processed."
}

On Error:

{
  "id": "1234567890",
  "metadata": {
    "eval_name": "Elastic Search Eval",
    "judge": "anton-0702"
  },
  "status": "error",
  "message": "There was an error with your evaluation. Please try to run your evaluation again. We apologize for the inconvenience.",
  "errors": [
    "Description of the error",
    "Another error description"
  ]
}

When Complete:

{
  "id": "1234567890",
  "status": "completed",
  "message": "Your evaluation is complete.",
  "metadata": {
    "eval_name": "Elastic Search Eval",
    "judge": "anton-0702"
  },
  "judgements": [
    {
      "query": "shirt",
      "object_id": "object123",
      "object": {
        "id": "12345",
        "title": "Blue Shirt",
        "description": "A blue shirt with a classic design."
      },
      "judgement": {
        "score": 2,
        "label": "GREAT",
        "explanation": "The result is highly relevant and matches the search intent."
      }
    },
    {
      "query": "summer dress",
      "object_id": "object456",
      "object": {
        "id": "67890",
        "title": "Floral Summer Dress",
        "description": "A floral dress perfect for any season."
      },
      "judgement": {
        "score": 1,
        "label": "OK",
        "explanation": "The result is somewhat relevant but not the best match."
      }
    }
  ]
}
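Completed judgements are easy to reshape for downstream use. In the sketch below, to_qrels and label_counts are hypothetical helpers. The first groups scores into the nested {query: {object_id: score}} "qrels" layout that offline evaluation tools such as pytrec_eval accept. The second tallies labels for a quick overview. It assumes the judgements list from a completed response with integer scores.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List


def to_qrels(judgements: List[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Group judgements as {query: {object_id: score}}."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for j in judgements:
        qrels[j["query"]][j["object_id"]] = int(j["judgement"]["score"])
    return dict(qrels)


def label_counts(judgements: List[Dict[str, Any]]) -> Counter:
    """Count how many judgements received each label (GREAT, OK, BAD)."""
    return Counter(j["judgement"]["label"] for j in judgements)
```

Pass body["judgements"] from the completed response to either function.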

3. List Evaluations

Endpoint: GET /v1/evaluations

Description: Retrieves a list of all evaluations you have submitted.

Request:

curl --request GET \
  --url 'https://api.objective.inc/v1/evaluations' \
  --header 'Authorization: Bearer API_KEY'

Response:

{
  "evaluations": [
    {
      "id": "1234567890",
      "metadata": {
        "created_at": "2024-07-12T08:55:43Z",
        "eval_name": "Elastic Search Eval",
        "judge": "anton-0702"
      }
    },
    {
      "id": "0987654321",
      "metadata": {
        "created_at": "2024-07-11T14:22:10Z",
        "eval_name": "Another Eval",
        "judge": "anton-0702"
      }
    }
  ],
  "pagination": {
    "current_page": 1,
    "total_pages": 1,
    "total_records": 2
  }
}
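This page does not document how to request a page other than the first. The sketch therefore takes a caller-supplied fetch_page(page) function (hypothetical) and relies only on the documented pagination fields to decide when to stop. The latest_evaluation helper, also hypothetical, picks the most recent run by created_at.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_evaluations(fetch_page: Callable[[int], Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every evaluation across pages, stopping when current_page reaches total_pages."""
    page = 1
    while True:
        body = fetch_page(page)
        yield from body.get("evaluations", [])
        pagination = body.get("pagination", {})
        if pagination.get("current_page", page) >= pagination.get("total_pages", 1):
            break
        page += 1


def latest_evaluation(evaluations: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the evaluation with the most recent created_at timestamp."""
    # Timestamps in the same ISO 8601 UTC format ("2024-07-12T08:55:43Z") sort correctly as strings.
    return max(evaluations, key=lambda e: e["metadata"]["created_at"])
```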

Error Handling

401 Unauthorized: Invalid API key or missing authorization.
400 Bad Request: Invalid request parameters or malformed request body.
404 Not Found: Evaluation ID not found.
500 Internal Server Error: An error occurred on the server side.
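When calling the API from code, it helps to map these status codes to an action. The sketch below encodes the list above. The retryable flag (retry only on 5xx) is our assumption, not documented behavior, and classify and ErrorInfo are hypothetical names. With urllib, you would call classify(err.code) on a caught urllib.error.HTTPError.

```python
from typing import NamedTuple


class ErrorInfo(NamedTuple):
    meaning: str
    retryable: bool


# Meanings follow the error list above; treating only 5xx as retryable is an assumption.
ERRORS = {
    400: ErrorInfo("Bad Request: invalid parameters or malformed request body", False),
    401: ErrorInfo("Unauthorized: invalid or missing API key", False),
    404: ErrorInfo("Not Found: evaluation ID not found", False),
    500: ErrorInfo("Internal Server Error: server-side failure", True),
}


def classify(status: int) -> ErrorInfo:
    """Describe an HTTP status from the Evaluations API and whether retrying makes sense."""
    if status in ERRORS:
        return ERRORS[status]
    return ErrorInfo(f"Unexpected HTTP {status}", status >= 500)
```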

Best Practices

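Throttle Requests: Limit your request rate, including how often you poll for evaluation status, to avoid exceeding API limits.
Handle Errors Gracefully: Make sure your application handles the errors listed above. Retry requests that fail with a 500 error, preferably with exponential backoff, and resubmit evaluations that end with status "error". Fix the request before retrying after a 400, 401, or 404.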
Secure Your API Key: Keep your API key confidential and avoid exposing it in client-side code.
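The retry advice above can be implemented with exponential backoff. In this sketch, with_retries and RetryableError are hypothetical names. Your HTTP code would raise RetryableError for failures you consider transient, such as a 500. The delays use "full jitter", a random wait up to the exponential cap, so many clients do not retry in lockstep. The default attempt count and delays are arbitrary.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Raise from `call` for failures you consider transient, such as HTTP 500."""


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying RetryableError with capped exponential backoff."""
    attempt = 0
    while True:
        try:
            return call()
        except RetryableError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the last error
            # Full jitter: wait a random time in [0, base_delay * 2**(attempt - 1)], capped at max_delay.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1))))
```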