I'm trying to use Azure AI Search to return the specific pages within a group of PDFs that match a search query. Right now I'm using the "generateNormalizedImagePerPage" image action to turn each page into an image, and then using the OcrSkill to read the text from the generated images. This lets me split the content by page, but when I query the index it returns entire PDF documents instead of just the pages that match.

I thought that I could use index projections to get each page of the pdf as a separate document in the search index.

This is what I tried. I created the index.

var index = new SearchIndex(name: "myindex")
{
    Fields =
    [
        new SearchField (name: "id", type: SearchFieldDataType.String) 
            { IsSearchable = true, IsKey = true, },
        new SearchField (name: "content", type: SearchFieldDataType.String) 
            { IsFilterable = true, IsKey = false },
        new SearchField (name: "pagetext", type: SearchFieldDataType.String) 
            { IsSearchable = true },
        new SearchField (name: "pagenumber", type: SearchFieldDataType.String) 
            { IsSearchable = true }
    ]
};

And then I created the index projections setting the projection mode to skip indexing parent documents. I also set the parentKeyFieldName to "content" because this article says that this field must be an Edm.String, can't be the key field, and must have Filterable set to true.

var mappings = new List<InputFieldMappingEntry>
{
    new (name: "pagetext")
    {
        Source = "/document/normalized_images/*/text"
    },
    new (name: "pagenumber")
    {
        Source = "/document/normalized_images/*/pageNumber"
    }
};

var selectors = new List<SearchIndexerIndexProjectionSelector>
{
    new (targetIndexName: "myindex",
         parentKeyFieldName: "content",
         sourceContext: "/document/normalized_images/*",
         mappings: mappings)
};

var indexProjections = new SearchIndexerIndexProjections(selectors)
{
    Parameters = new SearchIndexerIndexProjectionsParameters
    {
        ProjectionMode = IndexProjectionMode.SkipIndexingParentDocuments
    }
};

My problem is that I get an error when trying to create my skillset.

One or more index projection selectors are invalid. 
Details: Index 'myindex' must contain field 'content', it must be of type Edm.String, 
cannot be the key field and it must be filterable.

This error confuses me because I thought I met all the requirements for the targetIndexName specified in the article:

  • Must already have been created on the search service before the skillset containing the index projections definition is created.
  • Must contain a field with the name defined in the parentKeyFieldName parameter. This field must be of type Edm.String, can't be the key field, and must have filterable set to true.
  • The key field must have searchable set to true and be defined with the keyword analyzer.
  • Must have fields defined for each of the names defined in mappings, none of which can be the key field.

1 Answer

The content field in your index doesn't meet the requirements specified in the error message. The index should be defined like this:


Fields =
{
    new SearchField("id", SearchFieldDataType.String) { IsSearchable = true, IsKey = true },
    new SearchField("content", SearchFieldDataType.String) { IsSearchable = true, IsFilterable = true },
    new SearchField("pagetext", SearchFieldDataType.String) { IsSearchable = true },
    new SearchField("pagenumber", SearchFieldDataType.Int32) { IsFilterable = true }
}
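
The requirements quoted in the question also say that the key field must be searchable and defined with the keyword analyzer, and neither index definition here sets an analyzer. Below is a minimal sketch of an index that does, assuming the Azure.Search.Documents SDK. The class and method names are hypothetical, and whether this alone resolves the error is not confirmed.

```csharp
using Azure.Search.Documents.Indexes.Models;

public static class PageIndexSketch
{
    // Hypothetical helper: builds the index definition only; it does not call the service.
    public static SearchIndex BuildPageIndex(string indexName)
    {
        return new SearchIndex(indexName)
        {
            Fields =
            {
                // Key field: searchable and defined with the keyword analyzer.
                new SearchField("id", SearchFieldDataType.String)
                {
                    IsKey = true,
                    IsSearchable = true,
                    AnalyzerName = LexicalAnalyzerName.Keyword
                },
                // Parent key field: Edm.String, filterable, and not the key.
                new SearchField("content", SearchFieldDataType.String)
                {
                    IsSearchable = true,
                    IsFilterable = true
                },
                // Fields targeted by the projection mappings.
                new SearchField("pagetext", SearchFieldDataType.String) { IsSearchable = true },
                new SearchField("pagenumber", SearchFieldDataType.Int32) { IsFilterable = true }
            }
        };
    }
}
```

The resulting definition would be passed to SearchIndexClient.CreateOrUpdateIndex before the skillset is created, since the target index must already exist on the search service at that point.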

I modified the skillset creation to include the index projections along with the skills. The CreateOrUpdateDemoSkillSetWithIndexProjections method takes indexProjections as an additional parameter and sets it on the skillset.

Note:

The Entity Recognition skill (v2) (Microsoft.Skills.Text.EntityRecognitionSkill) is now discontinued and has been replaced by Microsoft.Skills.Text.V3.EntityRecognitionSkill. Follow the recommendations in Deprecated skills to migrate to a supported skill.

The code below is taken from git:

 private static SearchIndexerSkillset CreateOrUpdateDemoSkillSet(SearchIndexerClient indexerClient, IList<SearchIndexerSkill> skills, string azureAiServicesKey)
 {
     // Azure AI services was formerly known as Cognitive Services.
     // The APIs still use the old name, so we need to create a CognitiveServicesAccountKey object
     SearchIndexerSkillset skillset = new SearchIndexerSkillset("demoskillset", skills)
     {
         Description = "Demo skillset",
         CognitiveServicesAccount = new CognitiveServicesAccountKey(azureAiServicesKey)
     };

     // Create the skillset in your search service.
     // The skillset does not need to be deleted if it was already created
     // since we are using the CreateOrUpdate method
     try
     {
         indexerClient.CreateOrUpdateSkillset(skillset);
     }
     catch (RequestFailedException ex)
     {
         Console.WriteLine("Failed to create the skillset\n Exception message: {0}\n", ex.Message);
         ExitProgram("Cannot continue without a skillset");
     }

     return skillset;
 }
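
The method above does not take the extra parameter. Below is a sketch of the CreateOrUpdateDemoSkillSetWithIndexProjections variant described earlier, which accepts the index projections and attaches them to the skillset before the create call. It assumes the preview SDK naming used in the question (the SearchIndexerIndexProjections type and an IndexProjections property on the skillset). Other versions of Azure.Search.Documents may name these differently, so check against the version you have installed. The class name is hypothetical.

```csharp
using System.Collections.Generic;
using Azure.Search.Documents.Indexes;
using Azure.Search.Documents.Indexes.Models;

public static class SkillsetSketch
{
    public static SearchIndexerSkillset CreateOrUpdateDemoSkillSetWithIndexProjections(
        SearchIndexerClient indexerClient,
        IList<SearchIndexerSkill> skills,
        string azureAiServicesKey,
        SearchIndexerIndexProjections indexProjections)
    {
        var skillset = new SearchIndexerSkillset("demoskillset", skills)
        {
            Description = "Demo skillset",
            CognitiveServicesAccount = new CognitiveServicesAccountKey(azureAiServicesKey),
            // Assumption: property name as exposed by the preview SDK used in the question.
            IndexProjections = indexProjections
        };

        // The target index must already exist on the service at this point.
        // Throws RequestFailedException if the service rejects the skillset.
        indexerClient.CreateOrUpdateSkillset(skillset);
        return skillset;
    }
}
```

Letting the exception propagate keeps the sketch free of the sample's ExitProgram helper. Wrap the call in the same try/catch as the method above if you want the original behavior.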





2 Comments

Thanks for the response. I realized that I didn't set IsSearchable to true on the content field. However, after making that change I still get the same error saying the field is invalid. I'm setting the index projections when I define the skillset.
Can you delete the existing index, indexers, and skillset, and then run the code again?
