Zoom In Semantic Text Searches with Highlights, Now in Self-Service Private Beta

April 19, 2024

Team Objective

We’re excited to introduce Text Highlights, shipping into self-service Private Beta today! Text Highlights let your search zoom into smaller parts of your content, giving you finer-grained understanding and better search experiences. If you already have an Objective account, you can jump into your Console, grab the Highlights docs, and get building. If you don’t have an account yet, get your name on the waitlist! We’re continuing to onboard developers as fast as we can, and we’ll let you know when your account is ready.

Zoom your semantic text searches in.

By default, your Text Indexes in Objective understand the ‘whole Object’ in your Object Store: how each Object relates to the others and to your users’ queries. For some tasks, though, you’ll need more control over how your Text Indexes understand your Objects, and more insight into which part of an Object relates to a search query.

Pairing that control with the semantic understanding of intent built into all Objective Indexes gives your app a pretty powerful new superpower. In traditional keyword search, you could search Alice in Wonderland for “rabbit” and never find one of our favorite under-celebrated characters, the March Hare. Semantic search will match him without breaking a sweat. “Rabbit tea party” won’t match anything at all with traditional keyword search. Semantic search with Highlights takes you right to the beginning of Chapter Seven, where things get weird & magical 🐇.

A Text Index with highlights enabled also hydrates the semantically-matched text as part of your search results - so you can render this content in your search results UI however it makes most sense to your user experience.

You can imagine the kind of experiences you can build around searching call transcripts, long articles, customer support interactions, or just about anything else you can think of.

Grab your CLI and let’s get building.

We’re going to be using Python for this example, with our new Python library. But if TypeScript is more your speed, we have you covered: grab the TypeScript library. We’ll also be using Text Indexes. For a quick primer on Text Indexes and how they work, check out this post.

First things first: let’s make sure we’re running the latest objective-sdk that includes Text Highlights support:


% pip3 install --upgrade objective-sdk

Each Object in your Object Store is a structured JSON document that can contain different datatypes — strings, numbers, or URIs to crawlable images. The structure of your Objects can be whatever you want! For this build, let’s say each Object in our Object Store is a chapter of our favorite book, with a simple structure like this:


{
	"chapter_title": "Chapter 1",
	"chapter_content": "Alice was beginning to get very tired of sitting..."
}

Naturally, you might be worried that the chapter_content field we’re creating in each Object could easily be thousands of words long. And maybe your content doesn’t use standard delimiters between sentences or sections of content. Don’t fret. Objective Search handles this for you without breaking a sweat, and intelligently delimits on sentences by default. So hold this thought, and we’ll come back to this in the next section.

Let’s write some code.

Now, grab your favorite .py file and let’s set up the basics & upsert some Objects to your Object Store. We’ll fetch the text of Alice in Wonderland, split it into chapters, and upsert each chapter as its own Object:


from objective import Client
import requests

client = Client(api_key="YOUR_API_KEY")

# URL where the Alice in Wonderland text is located
url = "https://www.wolframcloud.com/objects/e0ae5dd2-f162-4a90-874f-c7bcd862fa85"

# Fetch the text of Alice in Wonderland (assumes the endpoint returns JSON)
response = requests.get(url)
response.raise_for_status()  # fail early on HTTP errors
alice_data = response.json()

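# Function to split Alice in Wonderland into individual chapters.
# Illustrative stand-in for the original "magic stuff": it assumes every
# chapter begins with the word "CHAPTER", so adapt it to your own source text.
def split_into_chapters(text):
    parts = text.split("CHAPTER ")
    # parts[0] is whatever precedes the first chapter heading (title page, etc.)
    return ["CHAPTER " + part.strip() for part in parts[1:]]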

# Splits json into individual chapters
chapters = split_into_chapters(alice_data)

# Create the list of Objects, one per chapter, to be upserted into your Object Store
objects = []
for index, chapter_content in enumerate(chapters):
    chapter_object = {
        "title": "Alice in Wonderland",
        "chapter_title": f"Chapter {index + 1}",
        "chapter_content": chapter_content,
    }
    objects.append(chapter_object)

# And let's upsert them!
client.object_store.upsert_objects(objects)

Just upsert and go. We’ll handle all the messy ‘chunking’ business.

One of the things developers commonly run into when trying to home-grow AI-native search is ‘chunking’ and segmentation strategy — how to break apart chunks of text in order to optimize for the various content pre-processing tasks any search system has to do. Good news for you, friend — Objective Search handles segmenting for you. And exposes an elegant, API-driven control surface for you when you need more fine-grained steering. You just upsert Objects and start building.

Now that your Object Store has some Objects, let’s create a Text Index — this is where Text Highlights starts to do a lot of heavy lifting automatically for you (and you start to look like a hero at work):


index = client.indexes.create_index(
    index_type="text",
    highlights={
        "text": True
    },
    fields={
        "searchable": ["chapter_content"],
        "segment_delimiter": {"chapter_content": "\n\n"}
    }
)

There are a few important things happening here. First, highlights={"text":True} tells Objective to create the Text Index with Highlights enabled. Importantly, Highlights is an Index-level feature that needs to be enabled at Text Index creation.

And second, adding the segment_delimiter gives you an additional level of control over how text content is split into segments to be interpreted by your Text Index. By default, your Text Index will segment by sentence and combine those sentences into highlights. Depending on how your content is structured, it may be helpful to specify your own delimiter for the Index to anticipate so that the desired context stays together. If you need your Index to zoom in on multiple fields in your Objects, you can define one delimiter per Object field in the segment_delimiter property.
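To build some intuition for what a delimiter like "\n\n" does, here is a minimal local sketch of delimiter-based segmentation. This is not Objective's implementation: the split_into_segments helper is hypothetical, and the choice of Python-slice-style offsets (end_char exclusive) is an assumption made for the illustration.

```python
def split_into_segments(text, delimiter="\n\n"):
    """Split text on a delimiter, keeping each segment's character offsets."""
    segments = []
    cursor = 0  # position of the current piece within the original text
    for piece in text.split(delimiter):
        start = cursor
        end = start + len(piece)
        if piece.strip():  # skip empty pieces produced by repeated delimiters
            segments.append({"text": piece, "start_char": start, "end_char": end})
        cursor = end + len(delimiter)
    return segments


sample = "Alice was tired.\n\nA White Rabbit ran by.\n\nOh dear!"
segments = split_into_segments(sample)
for segment in segments:
    print(segment["start_char"], segment["end_char"], repr(segment["text"]))
```

Each paragraph becomes one segment, so context that belongs together stays together, and the recorded offsets always point back into the original field.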

And now you have a Text Index with Highlights enabled! Your Index will get to work parsing & understanding the Objects in your Object Store. Your Text Index automatically processes (and reprocesses!) your Objects as they’re added, edited, or removed from the Object Store. At any time, you can check on indexing status by calling (you guessed it) .status(), which returns a count of Objects in each processing state that you can use to decide what to do next:


{'UPLOADED':1, 'PROCESSING':0, 'READY':0, 'ERROR':0}
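If you want to wait for indexing to finish before searching, a small polling loop does the job. The sketch below is one possible approach, not part of the SDK: wait_until_ready is a hypothetical helper, and it assumes the status dictionary has the shape shown above. A stand-in status function is used so the example runs without an API key.

```python
import time


def wait_until_ready(get_status, poll_seconds=5.0, timeout_seconds=600.0):
    """Poll get_status() until nothing is UPLOADED or PROCESSING, then return the last status."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        pending = status.get("UPLOADED", 0) + status.get("PROCESSING", 0)
        if pending == 0:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Indexing still in progress: {status}")
        time.sleep(poll_seconds)


# Stand-in for the real status call so the sketch is runnable offline.
fake_statuses = iter([
    {"UPLOADED": 1, "PROCESSING": 0, "READY": 0, "ERROR": 0},
    {"UPLOADED": 0, "PROCESSING": 1, "READY": 0, "ERROR": 0},
    {"UPLOADED": 0, "PROCESSING": 0, "READY": 1, "ERROR": 0},
])
final_status = wait_until_ready(lambda: next(fake_statuses), poll_seconds=0.01)
print(final_status)
```

With a real Index, you would pass the status method itself, for example wait_until_ready(index.status), assuming .status() is called on the index object and returns the dictionary shown above. Checking the ERROR count in the returned status is a sensible next step.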

And now we’re ready to do some searching 🔥. Getting search results for a query is as simple as:


index.search(query="chapters with rabbits", result_fields="highlights", object_fields="*")

With Highlights enabled, your Text Index will return an additional highlights field in the standard .search() response. There’s a ton of valuable information that gets hydrated here — let’s unpack it.

At a high level, the structure of the response looks like this:


{
	"results": [
		{
			"id": ...,
			"object": {
				"title": "Alice in Wonderland",
				"chapter_title": "I-DOWN THE RABBIT-HOLE",
				"chapter_content": "Alice was beginning to get very tired of sitting by her sister ..."
			},
			"highlights": [ ... ]
		}
	]
}

The important part is the inclusion of the extra highlights property, an array that holds all highlight objects for that individual search result (in this case, a chapter). This is what each object in the highlights array looks like:


{
	"highlight_type": "text",
	"references": [
		{
			"source": "chapter_content",
			"position": {
				"start_char": 0,
				"end_char": 775
			}
		}
	],
		
	"highlight": {
		"text": "Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do. Once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, \"and what is the use of a book,\" thought Alice, \"without pictures or conversations?\" \n\nSo she was considering in her own mind (as well as she could, for the day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her. \n\nThere was nothing so very remarkable in that, nor did Alice think it so very much out of the way to hear the Rabbit say to itself, \"Oh dear! Oh dear! I shall be too late!\""
	}
}

The references property of each highlight object contains a source property indicating the field of the search result from which the highlighted text originated and a position object with start_char and end_char properties specifying the character positions of the highlighted text within that field.
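As a rough sketch of how those references can be used, the helper below walks one search result and slices the highlighted text back out of the source field. The locate_highlights function and the hand-written result are hypothetical, and the code assumes end_char is exclusive (Python slice semantics), which is worth verifying against a real response.

```python
def locate_highlights(result):
    """Return (source_field, start, end, text) for every reference in a search result."""
    located = []
    for highlight in result.get("highlights", []):
        for reference in highlight["references"]:
            source = reference["source"]
            start = reference["position"]["start_char"]
            end = reference["position"]["end_char"]
            field_text = result["object"][source]
            located.append((source, start, end, field_text[start:end]))
    return located


# Hand-written result that mirrors the response structure described above.
content = "Alice was beginning to get very tired. Suddenly a White Rabbit ran close by her."
start = content.index("Suddenly")
result = {
    "object": {"chapter_title": "Chapter 1", "chapter_content": content},
    "highlights": [
        {
            "highlight_type": "text",
            "references": [
                {
                    "source": "chapter_content",
                    "position": {"start_char": start, "end_char": len(content)},
                }
            ],
            "highlight": {"text": content[start:]},
        }
    ],
}

located = locate_highlights(result)
for source, start_char, end_char, text in located:
    print(source, start_char, end_char, text)
```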

These positions are especially neat because they let you construct an interesting search results UI directly from the response object. You can show UI that “peeks” into the relevant content, without another round-trip to the server, and help users judge whether an item is the result they were looking for.
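Here is one way such a “peek” could be rendered, as a minimal sketch: wrap the highlighted span in a mark tag and show a little surrounding context. The render_snippet helper and its context_chars parameter are hypothetical, and the code again assumes end_char is exclusive.

```python
import html


def render_snippet(field_text, start_char, end_char, context_chars=30):
    """Return an HTML snippet with the highlighted span wrapped in <mark>."""
    before_start = max(0, start_char - context_chars)
    after_end = min(len(field_text), end_char + context_chars)
    # Escape each piece separately so the <mark> tags are the only markup emitted.
    before = html.escape(field_text[before_start:start_char])
    marked = html.escape(field_text[start_char:end_char])
    after = html.escape(field_text[end_char:after_end])
    prefix = "..." if before_start > 0 else ""
    suffix = "..." if after_end < len(field_text) else ""
    return f"{prefix}{before}<mark>{marked}</mark>{after}{suffix}"


text = "She looked up & saw a White Rabbit with pink eyes run close by her."
start = text.index("a White Rabbit")
end = start + len("a White Rabbit")
snippet = render_snippet(text, start, end, context_chars=10)
print(snippet)
```

Escaping the text before adding markup keeps user-supplied content from injecting HTML into your results page.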

We can’t wait to see what you build!

As you can see, enabling Highlights on your Text Indexes pays off whenever your users need to find a specific passage inside long content.

You might be searching books like we did here — but you might also be searching customer support conversations. Or voice & chat transcripts. Or long editorial articles. Emails. The possibilities are almost endless.