This guide offers tips for writing utterances for an intent.
Variables
Variables are used for slot-filling. Each variable has an associated Entity Class (Entity Category) and can take on any entity within that class. The variable name can be anything.
Example:
For an intent called Provision Application, a variable called applicationName could be created with the Entity Class Application.


Utterances / Phrases
Phrases are the training examples that ICM tries to match user input against. They can contain variables, written with the convention ${varName}.
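As a minimal sketch of the ${varName} convention, the snippet below pulls variable names out of a training phrase and flags any placeholder that has no declared variable. The helper names (variables_in, undeclared_variables) and the declared mapping are hypothetical and only illustrate the idea; they are not part of ICM.

```python
import re

# Matches the ${varName} placeholder convention used in training phrases.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def variables_in(phrase: str) -> list[str]:
    """Return the variable names referenced in a training phrase, in order."""
    return PLACEHOLDER.findall(phrase)


def undeclared_variables(phrase: str, declared: dict[str, str]) -> list[str]:
    """Return placeholders with no declared variable (name -> entity class)."""
    return [name for name in variables_in(phrase) if name not in declared]


# Hypothetical declaration for the Provision Application intent.
declared = {"applicationName": "Application"}

print(variables_in("provision ${applicationName} for me"))
print(undeclared_variables("provision ${appName}", declared))
```

A check like this can catch a misspelled placeholder before the phrase is used for training.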

Single-word phrases
Single-word phrases are OK for Skills. However, they can only be matched by user input that is itself a single word after stopword removal. This limits the FP (false positive) rate.
Why are these okay?
- Single-word phrases can ONLY be hit via exact match (after punctuation and stopword removal).
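The single-word rule can be sketched roughly as follows. The stopword list and function names here are assumptions made for illustration; the actual stopword list ICM uses is not given in this guide.

```python
import string

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"i", "a", "an", "the", "to", "please", "want", "need", "my", "me"}


def content_tokens(text: str) -> list[str]:
    """Lowercase, strip ASCII punctuation, and drop stopwords."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def single_word_phrase_matches(training_phrase: str, user_input: str) -> bool:
    """A single-word training phrase is hit only by an exact single-word input."""
    phrase_tokens = content_tokens(training_phrase)
    if len(phrase_tokens) != 1:
        raise ValueError("not a single-word training phrase")
    return content_tokens(user_input) == phrase_tokens


print(single_word_phrase_matches("help", "Help!"))
print(single_word_phrase_matches("help", "Please help me."))
print(single_word_phrase_matches("help", "help reset password"))
```

Under these assumptions the first two inputs reduce to the single token "help" and match, while the third keeps three tokens and does not.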
Using variables
Variables should be used whenever possible. Instead of writing out a training phrase of “book conference room”, it’s better to add a variable (varConferenceSystems) for the appropriate entity category (e.g., Conference System) and write out a training phrase like: “book ${varConferenceSystems}”.
Why should this be done?
If the training phrase “book conference room” were used instead and someone typed “I want to book a conference room” in webchat, the input might be transformed to “I want to book a conference-room” during the NER step of ICM scoring. The separate tokens “conference” and “room” in the training phrase would then no longer line up with the input, so ICM may produce a FN (false negative).
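The effect can be illustrated with a toy token-overlap score. Everything in this sketch is an assumption: the entity lexicon, the way NER rewrites the input, the way a variable is expanded to the entity token, and the overlap measure are simplified stand-ins, not ICM's actual behaviour.

```python
# Hypothetical entity lexicon: surface form -> token produced by the NER step.
ENTITY_TOKENS = {"conference room": "conference-room"}


def apply_ner(text: str) -> str:
    """Replace known entity surface forms with a single entity token."""
    result = text.lower()
    for surface, token in ENTITY_TOKENS.items():
        result = result.replace(surface, token)
    return result


def expand_variable(training_phrase: str, variable: str, token: str) -> str:
    """Stand-in for variable matching: substitute the entity token for ${variable}."""
    return training_phrase.replace("${" + variable + "}", token)


def token_overlap(training_phrase: str, processed_input: str) -> float:
    """Fraction of training-phrase tokens that appear in the processed input."""
    phrase_tokens = set(training_phrase.lower().split())
    input_tokens = set(processed_input.split())
    return len(phrase_tokens & input_tokens) / len(phrase_tokens)


processed = apply_ner("I want to book a conference room")

# Literal phrase: only "book" survives the NER rewrite of the input.
literal_score = token_overlap("book conference room", processed)

# Variable phrase: the variable lines up with the entity token.
variable_phrase = expand_variable(
    "book ${varConferenceSystems}", "varConferenceSystems", "conference-room"
)
variable_score = token_overlap(variable_phrase, processed)

print(processed, literal_score, variable_score)
```

In this toy setup the literal phrase overlaps on one of its three tokens, while the variable-based phrase overlaps on both of its tokens.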
Matching / Scoring / Classification
At matching / scoring / classification time, a few high-level steps are applied hierarchically:
- Exact Match
- BOL
- STS (for tie-breaking)
Notes:
- Exact Match does exactly what it sounds like. This layer is necessary because an exact match should always result in a match.
- BOL does bag-of-lemmas token matching. It’s the tried-and-true method for achieving high recall and precision (given a good annotation strategy).
- STS stands for semantic textual similarity and is currently used only for tie-breaking BOL scores.
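A rough sketch of the three layers is shown below. The lemma table, the Jaccard-style BOL score, and the use of difflib as a tie-breaker are all assumptions chosen to keep the example self-contained; in particular, difflib compares characters and is only a placeholder for a real semantic similarity model.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical lemma table; a real system would use a proper lemmatizer.
LEMMAS = {"booking": "book", "booked": "book", "rooms": "room"}


def lemmas(text: str) -> frozenset:
    """Return the bag (here: set) of lemmas for a phrase."""
    return frozenset(LEMMAS.get(tok, tok) for tok in text.lower().split())


def bol_score(phrase: str, user_input: str) -> float:
    """Jaccard overlap between the two lemma sets (an assumed scoring rule)."""
    a, b = lemmas(phrase), lemmas(user_input)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def similarity(phrase: str, user_input: str) -> float:
    """Placeholder for STS: character-level ratio, NOT a semantic model."""
    return SequenceMatcher(None, phrase.lower(), user_input.lower()).ratio()


def classify(user_input: str, training: dict) -> Optional[str]:
    """Return the matched intent name, or None if nothing overlaps."""
    # Layer 1: exact match wins outright.
    for intent, phrases in training.items():
        if any(p.lower() == user_input.lower() for p in phrases):
            return intent
    # Layer 2: score every training phrase by bag-of-lemmas overlap.
    scored = [
        (bol_score(p, user_input), intent, p)
        for intent, phrases in training.items()
        for p in phrases
    ]
    best = max((score for score, _, _ in scored), default=0.0)
    if best == 0.0:
        return None
    tied = [(intent, p) for score, intent, p in scored if score == best]
    # Layer 3: break ties among the top BOL scores with the similarity stand-in.
    return max(tied, key=lambda pair: similarity(pair[1], user_input))[0]


training = {"Book Room": ["book a room"], "Reset Password": ["reset my password"]}
print(classify("book a room", training))
print(classify("booking rooms", training))
print(classify("hello there", training))
```

The ordering matters: the tie-breaker is only consulted when several phrases share the top BOL score, which mirrors the description of STS above.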
After the matching / scoring / classification step, slot-filling fills each variable with a value, based on which intent was matched.
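Slot-filling could look something like the sketch below, assuming the matched training phrase and the intent's variable definitions are available. The entity lexicon, the fill_slots helper, and the whole-word lookup are hypothetical simplifications rather than ICM's actual mechanism.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

# Hypothetical entity lexicon: entity class -> known entities.
ENTITIES = {"Application": ["slack", "zoom", "jira"]}


def fill_slots(matched_phrase: str, variables: dict, user_input: str) -> dict:
    """Fill each variable in the matched phrase with an entity found in the input.

    variables maps variable name -> entity class. A variable whose entity
    class has no entity in the input is filled with None.
    """
    text = user_input.lower()
    slots = {}
    for name in PLACEHOLDER.findall(matched_phrase):
        candidates = ENTITIES.get(variables[name], [])
        # Take the first known entity that appears in the input as a whole word.
        slots[name] = next(
            (e for e in candidates if re.search(rf"\b{re.escape(e)}\b", text)),
            None,
        )
    return slots


variables = {"applicationName": "Application"}
print(fill_slots("provision ${applicationName}", variables, "Please provision Zoom for me"))
```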
Preprocessing steps for Matching / Scoring / Classification
In order:
- Incoming phrases are sentencized. This means they are split up into sentences, and each individual sentence is matched against the utterance space.
- Incoming phrases are stripped of punctuation.
- Incoming phrases go through a spell-checking step.
- Incoming phrases go through an NER (named entity recognition) step.
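The ordering of these steps can be sketched as a small pipeline. The sentence splitter, the correction table, and the entity lexicon below are toy assumptions for illustration; the real components are not described in this guide.

```python
import re
import string

# Hypothetical spelling corrections and entity lexicon, for illustration only.
CORRECTIONS = {"confrence": "conference", "pasword": "password"}
ENTITY_TOKENS = {"conference room": "conference-room"}


def sentencize(text: str) -> list[str]:
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def strip_punctuation(sentence: str) -> str:
    """Remove ASCII punctuation characters."""
    return sentence.translate(str.maketrans("", "", string.punctuation))


def spell_check(sentence: str) -> str:
    """Replace tokens found in the (toy) correction table."""
    return " ".join(CORRECTIONS.get(tok, tok) for tok in sentence.split())


def apply_ner(sentence: str) -> str:
    """Replace known entity surface forms with a single entity token."""
    for surface, token in ENTITY_TOKENS.items():
        sentence = sentence.replace(surface, token)
    return sentence


def preprocess(text: str) -> list[str]:
    """Run the four steps in order and return one processed string per sentence."""
    processed = []
    for sentence in sentencize(text.lower()):
        sentence = strip_punctuation(sentence)
        sentence = spell_check(sentence)
        sentence = apply_ner(sentence)
        processed.append(sentence)
    return processed


print(preprocess("Hi there! I want to book a confrence room."))
```

Running spell-checking before NER matters in this sketch: the misspelled "confrence room" is only recognized as an entity once it has been corrected.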