Introducing

Guidepad's Managed Embeddings Service

This demo showcases the capabilities of our embeddings service. The notebook interacts with a set of APIs we offer, showing that the embeddings service can be used by any downstream application with internet access, and by any user in their preferred programming language.

You can use our API to generate and save embeddings for your documents, issue semantic searches across your documents, generate summaries of your documents, and train a model to produce custom embeddings for your documents.

Notebook Outline

  1. Installation and Setup
  2. Document Insertion
  3. Document Search
  4. Document Retrieval
  5. Document Summarization

What this notebook doesn't cover

  1. Bulk insert of documents
  2. Labeling your documents
  3. Labeling document pairs as similar or different
  4. Training a model to compute custom embeddings for your documents

Installation and Setup

To set up the embeddings service, you must install the Guidepad-ML plugin, deploy a model used for generating embeddings, and create the Operations associated with the embeddings service.

Please refer to the guide on setting up the embeddings service to accomplish this.

import requests

OPERATIONS_API_BASE_URL = '[OPERATIONS-API-BASE-URL]'
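
The examples in this notebook build each request URL by concatenating OPERATIONS_API_BASE_URL and an endpoint such as 'embeddings/insert-document-operation', which only works if the base URL ends with a slash. A minimal sketch of a more forgiving join is shown here; the helper name operation_url and the example.com base URL are hypothetical and not part of the service.

```python
def operation_url(base_url: str, endpoint: str) -> str:
    """Join a base URL and an operation endpoint with exactly one slash."""
    return base_url.rstrip('/') + '/' + endpoint.lstrip('/')


# Hypothetical base URL, used only for illustration.
url = operation_url('https://example.com/api', 'embeddings/insert-document-operation')
print(url)
```

The resulting string can be passed to requests.post in place of OPERATIONS_API_BASE_URL + operation_endpoint.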

Document Insertion

This demonstrates how to insert documents into the system using the API. When you insert a document, it is saved to a datastore along with an embedding associated with the document. An ID is also generated for your document. If you have a model associated with the given project that generates custom embeddings, a custom embedding is also calculated and saved alongside this document.

Input

  • document: text of the document.
  • project: name of the project to insert the document into. Projects keep your work, or your team's work, separate. If no project is provided, this defaults to 'default'.
  • model_name: name of the EmbeddingModel used to compute an embedding for this document. If no model_name is provided, this defaults to 'default_embedding_model'.
  • model_version: version of the EmbeddingModel used to compute an embedding for this document. If no model_version is provided, this defaults to 'latest'.
  • label: optionally provide a label for your document. This could be a categorical label for your document and utilized when training a model to produce custom embeddings. This technique is not demonstrated in this notebook.
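
Since every field except document has a server-side default, a request body only needs the fields you want to override. The sketch below assembles such a body; build_insert_payload is a hypothetical helper, and both the non-empty check on document and the string type for label are assumptions rather than documented API behavior.

```python
from typing import Optional


def build_insert_payload(
    document: str,
    project: Optional[str] = None,
    model_name: Optional[str] = None,
    model_version: Optional[str] = None,
    label: Optional[str] = None,
) -> dict:
    """Build the JSON body for the insert operation, omitting unset fields."""
    # Assumption: an empty document is never useful to insert.
    if not document:
        raise ValueError('document must be a non-empty string')
    optional = {
        'project': project,
        'model_name': model_name,
        'model_version': model_version,
        'label': label,
    }
    payload = {'document': document}
    # Leave out unset fields so the service applies its own defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


op_input = build_insert_payload('The runner ran in the Boston marathon.', project='test')
print(op_input)
```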

Example 1

Insert the document "1111The runner ran in the Boston marathon." into the test project

operation_endpoint = 'embeddings/insert-document-operation'
op_input = {
    'document': '1111The runner ran in the Boston marathon.',
    'project': 'test',
    'model_name': 'default_embedding_model'
}
response = requests.post(OPERATIONS_API_BASE_URL + operation_endpoint, json=op_input)
out = response.json()
print('document_id:', out['data']['document_id'])
print('document', out['data']['document'])
print('initial_embedding', out['data']['initial_embedding'][:5])
print('custom_embedding', out['data']['custom_embedding'][:5])
document_id: df949f4730864f04b838c90d3855663e
document 1111The runner ran in the Boston marathon.
initial_embedding [-0.06356184929609299, 0.052243154495954514, -0.1602657437324524, 0.160896435379982, -0.008224453777074814]
custom_embedding []

Example 2

Insert the document "The man went to the bank of the river to swim." into the test project

operation_endpoint = 'embeddings/insert-document-operation'
op_input = {
    'document': 'The man went to the bank of the river to swim.',
    'project': 'test',
}
response = requests.post(OPERATIONS_API_BASE_URL + operation_endpoint, json=op_input)
out = response.json()
print('document_id:', out['data']['document_id'])
print('document', out['data']['document'])
print('initial_embedding', out['data']['initial_embedding'][:5])
print('custom_embedding', out['data']['custom_embedding'][:5])
document_id: 4f4ef788ae8f47cea4d9efcc600766ec
document The man went to the bank of the river to swim.
initial_embedding [0.09251442551612854, 0.038514506071805954, -0.14504171907901764, -0.1457764059305191, 0.04127735644578934]
custom_embedding []

Example 3

Insert an earnings report summary from Apple Inc. into the finance project

operation_endpoint = 'embeddings/insert-document-operation'
op_input = {
    'document': 'Apple Inc. reported earnings results for the second quarter and six months ended April 01, 2023. For the second quarter, the company reported revenue was USD 94,836 million compared to USD 97,278 million a year ago. Net income was USD 24,160 million compared to USD 25,010 million a year ago. Basic earnings per share from continuing operations was USD 1.53 compared to USD 1.54 a year ago. Diluted earnings per share from continuing operations was USD 1.52 compared to USD 1.52 a year ago.For the six months, revenue was USD 211,990 million compared to USD 221,223 million a year ago. Net income was USD 54,158 million compared to USD 59,640 million a year ago. Basic earnings per share from continuing operations was USD 3.42 compared to USD 3.65 a year ago. Diluted earnings per share from continuing operations was USD 3.41 compared to USD 3.62 a year ago.',
    'project': 'finance'
}
response = requests.post(OPERATIONS_API_BASE_URL + operation_endpoint, json=op_input)
out = response.json()
print('document_id:', out['data']['document_id'])
print('document', out['data']['document'])
print('initial_embedding', out['data']['initial_embedding'][:5])
print('custom_embedding', out['data']['custom_embedding'][:5])
document_id: 03fcac6f911b42329472c48e6c34210f
document Apple Inc. reported earnings results for the second quarter and six months ended April 01, 2023. For the second quarter, the company reported revenue was USD 94,836 million compared to USD 97,278 million a year ago. Net income was USD 24,160 million compared to USD 25,010 million a year ago. Basic earnings per share from continuing operations was USD 1.53 compared to USD 1.54 a year ago. Diluted earnings per share from continuing operations was USD 1.52 compared to USD 1.52 a year ago.For the six months, revenue was USD 211,990 million compared to USD 221,223 million a year ago. Net income was USD 54,158 million compared to USD 59,640 million a year ago. Basic earnings per share from continuing operations was USD 3.42 compared to USD 3.65 a year ago. Diluted earnings per share from continuing operations was USD 3.41 compared to USD 3.62 a year ago.
initial_embedding [-0.02217843197286129, 0.20056475698947906, -0.09654616564512253, 0.14187942445278168, -0.13079282641410828]
custom_embedding []

Document Search

Here, we demonstrate how to use the API to semantically search through your documents. This will return the most relevant documents along with the cosine similarity between each document embedding and the query embedding.

Behind the scenes, we

  1. compute an embedding for your query.
  2. calculate the cosine similarity between the query embedding and the embeddings of documents in a given project.
  3. return top_k documents with the highest cosine similarities.
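
The three steps above can be sketched in plain Python. This is an illustration of cosine-similarity ranking only, not the service's implementation: the function names are hypothetical, the toy two-dimensional vectors stand in for real embeddings, and embeddings are assumed to be non-zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_documents(query_embedding, documents, top_k=5):
    """Rank (text, embedding) pairs by similarity to the query embedding."""
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in documents
    ]
    # Highest similarity first, then keep at most top_k results.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy embeddings, made up for illustration.
docs = [
    ('doc a', [1.0, 0.0]),
    ('doc b', [0.0, 1.0]),
    ('doc c', [1.0, 1.0]),
]
results = top_k_documents([1.0, 0.1], docs, top_k=2)
for score, text in results:
    print(round(score, 3), text)
```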

Input

  • query: search query
  • model_name: name of EmbeddingModel used to encode the query. Defaults to 'default_embedding_model' if not provided.
  • model_version: version of EmbeddingModel used to encode the query. Defaults to 'latest' if not provided.
  • project: only search for documents in the given project. Defaults to 'default' if not provided.
  • top_k: return a maximum of top_k documents. Defaults to 5 if not provided.
  • use_custom_embedding: whether or not to utilize custom embeddings that have previously been computed for documents in this project. Defaults to False if not provided.

Example 1

Search "shopping" in the "default" project. The default project contains documents from the SNLI corpus.

operation_endpoint = 'embeddings/document-search-operation'
op_input = {
    'query': 'shopping',
    'model_name': 'default_embedding_model',
    'project': 'default',
    'use_custom_embedding': False,
    'top_k': 3
}
response = requests.post(OPERATIONS_API_BASE_URL + operation_endpoint, json=op_input)
response.json()
{'_id': '557f6432211942baaf7687928b11ed58',
 'data': {'_id': 'aa05be45187e41f8a238306c81797f80',
  'documents': [{'_id': '1ff2a6189a3b482a85c0a1837dcf84a5',
    'cosine_similarity': 0.6684434752270738,
    'document': 'A woman is in Walmart',
    'index': 0},
   {'_id': '88d3da2db5bb4a20880fb4b14274d6c9',
    'cosine_similarity': 0.664186488537493,
    'document': 'A man walks near a store',
    'index': 1},
   {'_id': '1bbbadc1db9548c789155bfb58bb7b8d',
    'cosine_similarity': 0.6538451248555877,
    'document': 'A mother is with her two children at walmart',
    'index': 2}]},
 'message': ''}
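
The search response nests its hits under data.documents, each with a document, a cosine_similarity, and an index. As a sketch, the hypothetical helper below flattens a response of that shape into (document, score) pairs ordered by index. The sample dictionary is a trimmed copy of the output above, and the assumption is that other responses follow the same layout.

```python
def ranked_hits(response_json: dict) -> list:
    """Return (document, cosine_similarity) pairs ordered by their index."""
    documents = response_json.get('data', {}).get('documents', [])
    ordered = sorted(documents, key=lambda hit: hit['index'])
    return [(hit['document'], hit['cosine_similarity']) for hit in ordered]


# Trimmed copy of the response shown above, deliberately out of order.
sample = {
    'data': {
        'documents': [
            {'cosine_similarity': 0.664186488537493,
             'document': 'A man walks near a store',
             'index': 1},
            {'cosine_similarity': 0.6684434752270738,
             'document': 'A woman is in Walmart',
             'index': 0},
        ]
    },
    'message': '',
}
for text, score in ranked_hits(sample):
    print(f'{score:.3f}  {text}')
```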

Example 2

Search "france" in the "reuters" project. The reuters project contains documents from the Reuters-21578 dataset.

operation_endpoint = 'embeddings/document-search-operation'
op_input = {
    'query': 'france',
    'model_name': 'default_embedding_model',
    'project': 'reuters',
    'use_custom_embedding': False,
    'top_k': 3
}
response = requests.post(OPERATIONS_API_BASE_URL + operation_endpoint, json=op_input)
response.json()
{'_id': 'b56412836bf44ab3807f28dad78bb3d3',
 'data': {'_id': '227adece6c6a4377821b4c4f0ed24976',
  'documents': [{'_id': '3f819d656c564e289bbbae97abb69c0a',
    'cosine_similarity': 0.5581547658419515,
    'document': 'The 1992 deadline for abolishing economic barriers within the European Community should help French economic growth and create jobs, president of the French employers\' federation CNPF Francois Perigot said.     "Having a market at our disposal which is as homogeneous and accessible as that of Europe is an incredible piece of luck," he told Le Figaro in an interview.     He said that the majority of French business leaders were enthusiastic about the abolition of barriers and saw it as an opportunity rather than a danger for their companies.     "It can permit us to return to a growth rate which is much better than we could achieve in isolation. We know that we have to reestablish growth at three pct a year to solve the enormous problems confronting us -- and I am referring mainly to unemployment," Perigot added.     Finance Minister Edouard Balladur said yesterday that French growth would be just two pct this year, the same as last year and compared with the government\'s original 2.8 pct target.  REUTER ',
    'index': 0},
   {'_id': 'e8a5151bdd1c482f91f1dcd2f9d29fea',
    'cosine_similarity': 0.5355468562830077,
    'document': "The Bank of France expects a continued revival in short-term industrial activity, but the outlook for any improvement in France's record 10.9 pct unemployment rate remains bleak, the Bank of France said in its monthly review.     The upturn in activity in all industrial sectors except the agro-food sector in February more than compensated for the fall in January, while construction and civil engineering experienced a recovery which appears likely to extend over the next few months.     Internal demand rose and the export situation improved, in particular toward the European Community (EC), the Bank said.     Stocks decreases and order book levels, with the exception of the agro-food industry, improved substantially.     In addition, retail prices and salaries stabilised last months.     Production rose in all sectors except agricultural machinery and aeronautics, where it stabilised, and ship construction, where it declined.     The car industry was the major beneficiary of the upturn in activity in February, with both domestic and export orders rising.     In the consumer goods sector, actitity rose sharply despite a fall in the household goods sector and stability in pharmaceuticals.     Among semi-finished products, output rose sharply, helped by a strong growth in construction materials.     But activity in the retail sector declined slightly over the past two months.  REUTER ",
    'index': 1},
   {'_id': '94f4a01e478445199366642384e7446c',
    'cosine_similarity': 0.5311649995033113,
    'document': 'A year after squeezing to power with a narrow bare coalition majority, Gaullist Prime minister Jacques Chirac has swept away a cobweb of controls and regulations choking the French economy.     But France is still waiting for a promised industrial recovery the government says will follow from its free market policies. Company profits and the stock market are rising. But so is unemployment. Growth is stagnant at about two pct a year and the outlook for inflation, held to a 20-year low of 2.1 pct in 1986, is uncertain.     Forced last month to cut the government\'s 1987 growth target and raise its inflation estimate, Finance Minister Edouard Balladur ruled out action to stimulate the economy. But some government supporters say they fear time for an economic miracle may be running out.     The political clock is ticking towards Presidential elections due by April next year.     France\'s economic performance, led by a mixed cast of right-wing ministers and a socialist President, has won mixed reviews from non-partisan analysts.     For Michel Develle, Director of Economic Studies at newly-privatised Banque Paribas, the government\'s outstanding achievement has been to launch "a veritable intellectual revolution" breaking the staid habits formed by centuries of state control.     "The figures may look mediocre -- neither good nor bad -- but set in their context of structural reforms, they are excellent," Develle said.     But some analysts say they fear that Balladur, chief architect of the government\'s free market policies, may be pursuing a mirage.     "The belief that economic liberalism will produce an explosion of economic forces is ideological" said Indosuez chief economist Jean Cheval. "Personally I think it\'s an illusion. Dirigisme (direction) is a basic fact of the French system, from school onwards. Ultra-liberalism is impossible."     Illusion or not, the government has pushed its vision hard. 
Over the past year foreign exchange and consumer price controls have been largely abolished, labour regulations have been pruned to ease the sacking of redundant workers and a hugely popular programme has been launched to sell state-owned banks and industries to private investors.     Since December, nearly five mln French investors have bought shares in Cie Financiere de Paribas <PARI.PA> and glass maker Cie de Saint-Gobain SA <SGEP.PA>, the first two state companies brought to the stock market under the 300 billion franc five-year privatisation plan.     Encouraged by an amnesty for past illegal exports of capital, and the lifting of most currency controls, money has flooded into the Paris stockmarket from abroad, helping to lift the market 57 pct last year and another 12.5 pct since December.     At the end of last year the government abolished price controls that had existed for 42 years on services such as car repairs and hairdressing, freeing from state intervention small businesses which account for some 60 pct of the French economy.     The immediate result was a 0.9 pct rise in consumer prices in January, partly responsible for a forced revision in the official 1987 inflation forecast, to 2.5 pct from two pct or less.     "But even 2.5 pct would be a fantastic result, when you consider that prices are now free for the first time since 1945," commented Develle of Paribas.     Other achievements include a major reduction in the state\'s foreign debts, and a cut in the state budget deficit to 141.1 billion francs last year, 2.5 billion francs below target and down from 153.3 billion in 1985.     But despite a healthy balance of payments surplus and a gradual improvement in industrial productivity, the French franc was forced by speculators in January into a humiliating three pct devaluation against the West German mark, its second since Chirac took power.     
A recent report by the Organisation for Economic Cooperation and Development pilloried French industry for failing to produce the goods that its potential customers wanted.     Outside the mainly state-controlled high technology sectors, French industrial goods were "increasingly ill-adapted to demand" and over-priced, the report said.     French economists, including Cheval at Indosuez, agreed with the report. "One of the assumptions of the government is that if you give them freedom, the employers will invest and modernise....But nine out of ten will say yes, they like freedom, and then wait to be told which way to go," he said.     And despite rising industrial investment and the introduction of special incentives to boost youth employment, the end-1986 number of jobless was reported at a record 2.7 million, some 300,000 more than a year earlier.     The problem for the government is that there may be little more it can do to prod the economy into faster growth.     French producers failed more than most to take advantage of last year\'s oil price falls and growth hopes now rest on the shaky prospects of expansion in other industrial countries like West Germany and Japan, they say.  REUTER... ',
    'index': 2}]},
 'message': ''}

Document Retrieval

You can retrieve a document given either the document ID or the document text itself. When you retrieve a document, the document will be returned along with the document_id, initial_embedding, and custom_embedding.

Input

  • document_id: ID of the document
  • document: text of the document
  • project: only retrieve a document in the given project. Defaults to 'default' if not provided.

Retrieve a document by its text

operation_endpoint = 'embeddings/retrieve-document-operation'
op_input = {
    'document': 'The runner ran in the Boston marathon.',
    'project': 'test'
}
response = requests.post(OPERATIONS_API_BASE_URL + operation_endpoint, json=op_input)
out = response.json()
print('document_id:', out['data']['document_id'])
print('document', out['data']['document'])
print('initial_embedding', out['data']['initial_embedding'][:5])
print('custom_embedding', out['data']['custom_embedding'])
document_id: ca5a0c54baca496fb53db539bc2c1fd5
document The runner ran in the Boston marathon.
initial_embedding [-0.1317865401506424, 0.13551762700080872, -0.1744004189968109, 0.034594036638736725, -0.033813755959272385]
custom_embedding []

Retrieve the same document by its ID

op_input = {
    'document_id': 'ca5a0c54baca496fb53db539bc2c1fd5',
    'project': 'test'
}
response = requests.post(OPERATIONS_API_BASE_URL + operation_endpoint, json=op_input)
out = response.json()
print('document_id:', out['data']['document_id'])
print('document', out['data']['document'])
print('initial_embedding', out['data']['initial_embedding'][:5])
print('custom_embedding', out['data']['custom_embedding'])
document_id: ca5a0c54baca496fb53db539bc2c1fd5
document The runner ran in the Boston marathon.
initial_embedding [-0.1317865401506424, 0.13551762700080872, -0.1744004189968109, 0.034594036638736725, -0.033813755959272385]
custom_embedding []
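
The two retrieval calls above differ only in which identifier they send. A small sketch of a request-body builder that accepts either one follows; build_retrieve_payload is a hypothetical helper, and rejecting a request with neither identifier is an assumption about sensible client-side behavior, not documented service behavior.

```python
def build_retrieve_payload(document_id=None, document=None, project=None) -> dict:
    """Build the JSON body for the retrieve operation from an ID or the text."""
    # Assumption: at least one identifier is needed to locate a document.
    if document_id is None and document is None:
        raise ValueError('provide document_id or document')
    payload = {}
    if document_id is not None:
        payload['document_id'] = document_id
    if document is not None:
        payload['document'] = document
    if project is not None:
        # Omitting project lets the service fall back to 'default'.
        payload['project'] = project
    return payload


by_id = build_retrieve_payload(document_id='ca5a0c54baca496fb53db539bc2c1fd5', project='test')
by_text = build_retrieve_payload(document='The runner ran in the Boston marathon.', project='test')
print(by_id)
print(by_text)
```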

Document Summarization

Summarize one or more documents in your datastore. For now, only a single model is available for producing summaries.

Input

  • document_ids: IDs of the documents to summarize.
  • max_length: maximum number of tokens in summary.
  • min_length: minimum number of tokens in summary.
  • project: only summarize documents in the given project.
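
Because document_ids is a list and the two length bounds must be consistent with each other, it can help to check the request body before sending it. The sketch below does that; build_summarize_payload is a hypothetical helper, and its checks (wrapping a single ID in a list, requiring 0 < min_length <= max_length) are client-side assumptions rather than documented validation rules of the service.

```python
def build_summarize_payload(document_ids, project: str, max_length: int, min_length: int) -> dict:
    """Build the JSON body for the summarize operation with basic sanity checks."""
    # Accept a single ID for convenience and wrap it in a list.
    if isinstance(document_ids, str):
        document_ids = [document_ids]
    document_ids = list(document_ids)
    if not document_ids:
        raise ValueError('document_ids must contain at least one ID')
    # Assumption: token bounds should be positive and ordered.
    if not 0 < min_length <= max_length:
        raise ValueError('expected 0 < min_length <= max_length')
    return {
        'document_ids': document_ids,
        'project': project,
        'max_length': max_length,
        'min_length': min_length,
    }


op_input = build_summarize_payload(
    'aaa96aaab1024fb5915591a819cad73e', project='reuters', max_length=200, min_length=100
)
print(op_input)
```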

View a Reuters article we can summarize

operation_endpoint = 'embeddings/retrieve-document-operation'
op_input = {
    'document_id': 'aaa96aaab1024fb5915591a819cad73e',
    'project': 'reuters'
}
response = requests.post(OPERATIONS_API_BASE_URL + operation_endpoint, json=op_input)
out = response.json()
print(out['data']['document'])
A year after squeezing to power with a narrow bare coalition majority, Gaullist Prime minister Jacques Chirac has swept away a cobweb of controls and regulations choking the French economy.     But France is still waiting for a promised industrial recovery the government says will follow from its free market policies. Company profits and the stock market are rising. But so is unemployment. Growth is stagnant at about two pct a year and the outlook for inflation, held to a 20-year low of 2.1 pct in 1986, is uncertain.     Forced last month to cut the government's 1987 growth target and raise its inflation estimate, Finance Minister Edouard Balladur ruled out action to stimulate the economy. But some government supporters say they fear time for an economic miracle may be running out.     The political clock is ticking towards Presidential elections due by April next year.     France's economic performance, led by a mixed cast of right-wing ministers and a socialist President, has won mixed reviews from non-partisan analysts.     For Michel Develle, Director of Economic Studies at newly-privatised Banque Paribas, the government's outstanding achievement has been to launch "a veritable intellectual revolution" breaking the staid habits formed by centuries of state control.     "The figures may look mediocre -- neither good nor bad -- but set in their context of structural reforms, they are excellent," Develle said.     But some analysts say they fear that Balladur, chief architect of the government's free market policies, may be pursuing a mirage.     "The belief that economic liberalism will produce an explosion of economic forces is ideological" said Indosuez chief economist Jean Cheval. "Personally I think it's an illusion. Dirigisme (direction) is a basic fact of the French system, from school onwards. Ultra-liberalism is impossible."     Illusion or not, the government has pushed its vision hard. 
Over the past year foreign exchange and consumer price controls have been largely abolished, labour regulations have been pruned to ease the sacking of redundant workers and a hugely popular programme has been launched to sell state-owned banks and industries to private investors.     Since December, nearly five mln French investors have bought shares in Cie Financiere de Paribas <PARI.PA> and glass maker Cie de Saint-Gobain SA <SGEP.PA>, the first two state companies brought to the stock market under the 300 billion franc five-year privatisation plan.     Encouraged by an amnesty for past illegal exports of capital, and the lifting of most currency controls, money has flooded into the Paris stockmarket from abroad, helping to lift the market 57 pct last year and another 12.5 pct since December.     At the end of last year the government abolished price controls that had existed for 42 years on services such as car repairs and hairdressing, freeing from state intervention small businesses which account for some 60 pct of the French economy.     The immediate result was a 0.9 pct rise in consumer prices in January, partly responsible for a forced revision in the official 1987 inflation forecast, to 2.5 pct from two pct or less.     "But even 2.5 pct would be a fantastic result, when you consider that prices are now free for the first time since 1945," commented Develle of Paribas.     Other achievements include a major reduction in the state's foreign debts, and a cut in the state budget deficit to 141.1 billion francs last year, 2.5 billion francs below target and down from 153.3 billion in 1985.     But despite a healthy balance of payments surplus and a gradual improvement in industrial productivity, the French franc was forced by speculators in January into a humiliating three pct devaluation against the West German mark, its second since Chirac took power.     
A recent report by the Organisation for Economic Cooperation and Development pilloried French industry for failing to produce the goods that its potential customers wanted.     Outside the mainly state-controlled high technology sectors, French industrial goods were "increasingly ill-adapted to demand" and over-priced, the report said.     French economists, including Cheval at Indosuez, agreed with the report. "One of the assumptions of the government is that if you give them freedom, the employers will invest and modernise....But nine out of ten will say yes, they like freedom, and then wait to be told which way to go," he said.     And despite rising industrial investment and the introduction of special incentives to boost youth employment, the end-1986 number of jobless was reported at a record 2.7 million, some 300,000 more than a year earlier.     The problem for the government is that there may be little more it can do to prod the economy into faster growth.     French producers failed more than most to take advantage of last year's oil price falls and growth hopes now rest on the shaky prospects of expansion in other industrial countries like West Germany and Japan, they say.  REUTER... 

Summarize this article

operation_endpoint = 'embeddings/summarize-operation'
op_input = {
    'document_ids': ['aaa96aaab1024fb5915591a819cad73e'],
    'project': 'reuters',
    'max_length': 200,
    'min_length': 100
}
response = requests.post(OPERATIONS_API_BASE_URL + operation_endpoint, json=op_input, timeout=600)
response.json()
{'_id': '6af1dd04aa0b404eb986dbf1984bb7e3',
 'data': {'_id': '3320db7988c44eefbc141e61f622cfe3',
  'documents': [{'_id': '42b7aec6da874051a848b3d371601f2f',
    'document_id': 'aaa96aaab1024fb5915591a819cad73e',
    'project': 'reuters',
    'summary': "Jacques Chirac came to power a year ago with a narrow majority. Company profits and the stock market are rising, but the economy is stagnant. The outlook for inflation is uncertain. The French franc was devalued against the West German mark in January. The political clock is ticking towards Presidential elections due by April next year. Some analysts fear that the government's free market policies may be pursuing a mirage. The belief that economic liberalism will produce an explosion of economic forces is ideological, according to Jean Cheval."}]},
 'message': 'done'}
