Contributing to LUIS with Microsoft/Recognizers-Text – Part 3

Welcome to the final article of this series! Hopefully, if you’re reading this, you know that this article is part of an ongoing guide on how to extend the Microsoft/Recognizers-Text project to support new languages. This exciting open-source project released by the LUIS team provides robust recognition and resolution of entities such as numbers, units, and dates/times expressed in everyday language. In our previous posts, we provided a step-by-step guide to generating new custom language definitions and showed you how to integrate those definitions into the project. In this last post of the series, we’ll go over how to create and run tests to validate or debug a new language model.

For reference, you can check out the previous posts linked below.

  1. Define language specific definitions – Part 1
  2. Implement language specific extractors & parsers – Part 2
  3. Test extraction and parsing to verify the new language patterns, refine and repeat

Overview

All tests for the project are located in the Specs folder, which can be found in the root directory of the repo. Both the .NET and JavaScript platforms use the same pool of JSON tests from this folder.

Since the project supports multiple platforms, all of the unit tests are maintained in JSON format. This allows us to define each unit test once, in one location, without having to duplicate any tests, and enables us to maintain feature parity between all supported platforms.

Defining a Test Case

Here’s a sample test case from Number/English/NumberModel.json:

{
  "Input": "one hundred and six",
  "Results": [
    {
      "Text": "one hundred and six",
      "TypeName": "number",
      "Resolution": {
        "value": "106"
      }
    }
  ]
}

In the JSON format above, the input string represents the sample utterance to test, and the results array includes the entity type along with the resolution that we expect. There are also 4 optional attributes you can add to a test case:

  • Debug – this test will trigger a breakpoint before executing an extractor or parser
{
  "Input": "one hundred and six",
  // triggers a debugger breakpoint before invoking parser or extractor
  "Debug": true,
  "Results": [
    {
      "Text": "one hundred and six",
      "TypeName": "number",
      "Resolution": {
        "value": "106"
      }
    }
  ]
}
  • NotSupported – Indicates that the test case should be skipped for a specified platform that may not support the feature yet, while still running for the other platforms
{
  "Input": "one hundred and six",
  // Test will not run for JavaScript, warning will be generated instead
  "NotSupported": "javascript",
  "Results": [
    {
      "Text": "one hundred and six",
      "TypeName": "number",
      "Resolution": {
        "value": "106"
      }
    }
  ]
}
  • NotSupportedByDesign – Test is ignored or skipped with no warning. Possible values are: dotNet, javascript, python. Separate values with a comma if more than one platform should be skipped.
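For example (the platforms listed below are purely illustrative):
{
  "Input": "one hundred and six",
  // Test is silently skipped for these platforms – example values only
  "NotSupportedByDesign": "javascript, python",
  "Results": [
    {
      "Text": "one hundred and six",
      "TypeName": "number",
      "Resolution": {
        "value": "106"
      }
    }
  ]
}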
  • Context – Used to provide contextual arguments to the extractor or model. This attribute is mostly used to set a reference date object for the DateTime recognizers.
{
  "Input": "I'll go back today",
  // Add contextual arguments to test case
  "Context": {
    "ReferenceDateTime": "2016-11-07T00:00:00"
  },
  "Results": [
    {
      "Text": "today",
      "TypeName": "datetimeV2.date",
      "Resolution": {
        "values": [
          {
            "timex": "2016-11-07",
            "type": "date",
            "value": "2016-11-07"
          }
        ]
      }
    }
  ]
}

.NET

Adding a new Culture

The Recognizers-Text project within the solution provides the interfaces for extraction and parsing that the other unit-specific recognizers use. In this project, all that needs to be done is to add a new constant with the name and language code of the model being added.
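For example, if Japanese were the language being added (an illustrative choice; substitute your own language name and culture code), the new constant would sit alongside the existing ones in the Culture class:

public class Culture
{
    public const string English = "en-us";
    public const string Spanish = "es-es";
    // ... other existing cultures ...

    // Hypothetical new entry for the language being added
    public const string Japanese = "ja-jp";
}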

The .NET project uses the standard unit testing tools (MSTest) available in Visual Studio.

TestHelpers.cs, as the name suggests, includes helper methods that perform much of the automated unit testing.

In this file, we simply need to include the namespace for the new language model, and add the appropriate references as required.
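A minimal sketch of the kind of change, again assuming a hypothetical Japanese number model (the exact namespace depends on how the new language project was structured in Part 2):

// Sketch only: reference the new language's namespace alongside the existing ones
using Microsoft.Recognizers.Text.Number.Japanese;   // hypothetical new language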

Test Classes

In the .NET solution, test classes are decorated with the [TestClass] attribute. Each unit recognizer has one test class per language, named using the following format:

Test{Model}_{Language}

In this class, define a test method for each unit model you wish to run, using the following format:

[DataSource("Microsoft.VisualStudio.TestTools.DataSource.CSV", "{Model}-{Language}.csv", "{Model}-{Language}#csv", DataAccessMethod.Sequential)]
[TestMethod]
public void {Model}()
{
    base.Test{ModelType}();
}

For example, when adding the number model in English, the test method is defined as follows:
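(A sketch based on the template above; the exact base method name in the repo may differ slightly.)

// Sketch: NumberModel test for English, following the template above
[DataSource("Microsoft.VisualStudio.TestTools.DataSource.CSV", "NumberModel-English.csv", "NumberModel-English#csv", DataAccessMethod.Sequential)]
[TestMethod]
public void NumberModel()
{
    // Runs every NumberModel/English test case from the Specs folder
    base.TestNumber();
}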

Running the tests

There are two ways to run your unit tests in the .NET project. The first is to use the Test Explorer in Visual Studio: open the Test menu, and the drop-down will provide you with several options for running and debugging tests. Selecting ‘All Tests’ runs the entire suite.

Alternatively, you can run tests from the command line using these steps:

  1. Open the “Developer Command Prompt for Visual Studio” (from Windows’ Start menu)
  2. CD to the location of the Microsoft.Recognizers.Text .NET solution
  3. Execute the VS Test console:
    > vstest.console.exe
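Note that vstest.console.exe takes the path of a test assembly as its argument; the path below is only a placeholder for whichever test DLL your build produced:

    > vstest.console.exe <path\to\your\test\assembly.dll>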

When running with the debug option in Visual Studio, the debugger will break just before calling the extractor or parser for any test case marked with "Debug": true. You can also set breakpoints normally to walk through each step of the process and gain a stronger understanding of how everything works under the hood.

Node.js

The JavaScript (Node.js) project runs the same tests from the Specs folder using the AVA test runner. AVA is already included in the devDependencies of the project’s package.json file, so it should already be installed. The JavaScript project also contains a ‘test’ folder with the test scripts that match the test cases in the Specs folder to the appropriate recognizer models in the project.

Adding a new Culture

In cultures.js, simply add the new language configuration to the exports using the following format:

'NewLanguage' : Culture.supportedCultures.find(c => c.cultureCode === Culture.NewLanguage)
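For instance, if Japanese were the language being added (an illustrative choice; the corresponding culture code must also exist in the base Culture class, mirroring the .NET step above), the entry would look like this:

// Illustrative entry for a hypothetical new Japanese culture
'Japanese' : Culture.supportedCultures.find(c => c.cultureCode === Culture.Japanese)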

Running the tests

cd into the root folder of the JavaScript project. From here, you can simply run the command ‘npm run test’, which will initiate the AVA test runner.

Alternatively, to add some visuals to testing, run the command ‘npm run browser-test’. This will start up a local server, which defaults to localhost:8001.

Simply open that address in your browser and you’ll be able to see which tests pass or fail!

These tests can also be debugged using Visual Studio Code. Simply hit F5 (start debugging) and Visual Studio Code will run the AVA test runner in debug mode. As with .NET, just place breakpoints where desired and you can easily walk through the code under the hood.

Tip

As you debug a new language model, you may find that some regex definitions need to be edited, or that more need to be added. Careful planning can mitigate this somewhat, but every human language has its own unique characteristics that will drive how you choose to abstract the language definitions. Whenever a definition needs to change, simply regenerate the custom language definitions as shown in Part 1 and proceed with debugging.

Summary

This concludes our three-part series on contributing to the Recognizers-Text project; you should now have everything you need to create your own language model. In Part 1, we showed you how to generate your own language-specific definition files. Then in Part 2, we described how to consume those definitions while covering the basics of entity extraction and parsing. In this final post, we demonstrated how testing is done to validate a language model. All test cases are defined in JSON format in the Specs folder of the repo and can easily be used for any platform the project supports.

Hopefully this has given you some insight into the world of natural language processing, and provided enough of a road map to create your own contributions!

Happy Making!

Matthew Shim from the Bot Framework Team.