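Two problems confronting builders of distributed systems are how to define reusable pieces of logic whose inputs and outputs are typed at runtime, and how to make that logic portable across environments.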
In guidepad, both of these questions are answered by one powerful abstraction: the Operation.
Operations, at their core, are functions. They take some input in a defined schema, pass that input to a handler, and return some output in another defined schema. What guidepad does differently is treat each of these facts about a function as data and store that data in a storage engine. A concrete Operation can then be constructed from the stored definition at runtime.
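To make the idea of a function described as data more concrete, here is a minimal sketch in plain Python. It is not guidepad's implementation: the input_fields and output_field keys and the build_operation helper are assumptions made for illustration, and the handler is a standard-library function so the example runs without guidepad installed.

```python
import importlib
from typing import Any, Callable, Dict

# Hypothetical stored definition: every fact about the function is plain data.
# Field names other than "name" and "implemented_at" are assumptions.
definition = {
    "name": "word_caps",
    "implemented_at": "string:capwords",  # stdlib function, so this runs anywhere
    "input_fields": ["text"],
    "output_field": "result",
}


def build_operation(defn: Dict[str, Any]) -> Callable[..., Dict[str, Any]]:
    """Construct a callable operation from its stored definition."""
    module_path, _, attr = defn["implemented_at"].partition(":")
    handler = getattr(importlib.import_module(module_path), attr)

    def operation(**kwargs: Any) -> Dict[str, Any]:
        # Check the call against the declared input schema before running.
        missing = [f for f in defn["input_fields"] if f not in kwargs]
        if missing:
            raise TypeError(f"{defn['name']} is missing inputs: {missing}")
        value = handler(*(kwargs[f] for f in defn["input_fields"]))
        return {defn["output_field"]: value}

    return operation


word_caps = build_operation(definition)
print(word_caps(text="hello operation"))  # {'result': 'Hello Operation'}
```

Nothing about the operation is hard-coded in the caller; changing the dictionary changes what gets built.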
Here's an example of how an Operation instance could be created with guidepad's CLI:
$ guidepad entity create operation - <<EOF
{"name": "my_demo_operation", "implemented_at": "guidepad.operations.builtin.demo.echo:echo"}
> EOF
This demo operation definition is pretty simple: we created an operation named my_demo_operation and pointed its implementation to a Python function echo in the guidepad.operations.builtin.demo.echo module. Importantly, creating this definition makes the same operation immediately available to every user and process with access to the guidepad instance, no matter their location.
Let's test out running our new operation via the CLI:
$ guidepad operation chain 'my_demo_operation(input_str="hi")'
hi
{'_id': 'f67dd602d9664cdb9cf8516dde94c3b2', 'output_str': 'hi'}
--------
The operation chain command in guidepad allows users to call operations with inline Python syntax, passing the input as arguments to a function named after the operation (multiple operations can be chained together into simple pipelines with the | character). If we want to know a bit more about the operation, we can fetch its definition with entity list:
$ guidepad entity list operation '{"name": "my_demo_operation"}' -a name -a input_type -a output_type
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ _id ┃ name ┃ input_type ┃ output_type ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ ada0b270ee9941acaf9e4cd9b4aa629c │ my_demo_operation │ echo_input │ echo_output │
└──────────────────────────────────┴───────────────────┴────────────┴─────────────┘
You'll notice in this operation definition that the input_type and output_type are set to the names of two guidepad types, but in the creation of the operation we only provided the name and implemented_at. The types for the input and output were pulled in from the implementation:
### contents of echo.py
from guidepad.types.base_type import BaseType
from guidepad.types import attributes


class EchoInput(BaseType):
    input_str = attributes.String()


class EchoOutput(BaseType):
    output_str = attributes.String()


def echo(op_input: EchoInput) -> EchoOutput:
    print(op_input.input_str)
    return EchoOutput(output_str=op_input.input_str)
Guidepad noticed that the parameter and return type hints were set to guidepad types, and used those as the input and output types for the operation definition. Because the input and output schemas are types, and types are also data within guidepad, we can introspect the input/output types (useful if one ever forgets the input a particular operation takes!):
$ guidepad types describe-type echo_input
echo_input attributes
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ attribute ┃ data_type ┃ default_value ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ _id │ string │ <guidepad.types.attributes.UnsetValue object at 0x7fbb7c799cf0> │
│ input_str │ string │ <guidepad.types.attributes.UnsetValue object at 0x7fbb7c799cf0> │
└───────────┴───────────┴─────────────────────────────────────────────────────────────────┘
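The type-hint lookup described above can be approximated with the standard library. The sketch below uses dataclasses in place of guidepad's BaseType, and infer_io_types is a hypothetical helper rather than a guidepad function; it assumes the handler has exactly one annotated parameter and an annotated return value.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, get_type_hints


@dataclass
class EchoInput:
    input_str: str


@dataclass
class EchoOutput:
    output_str: str


def echo(op_input: EchoInput) -> EchoOutput:
    return EchoOutput(output_str=op_input.input_str)


def infer_io_types(handler: Callable) -> Tuple[type, type]:
    """Read the single parameter's hint and the return hint from a handler."""
    hints = get_type_hints(handler)
    output_type = hints.pop("return")
    (input_type,) = hints.values()  # assumes exactly one annotated parameter
    return input_type, output_type


input_type, output_type = infer_io_types(echo)
print(input_type.__name__, output_type.__name__)  # EchoInput EchoOutput
```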
One of the biggest powerups achieved by storing operation definitions as data is the ability to edit their behavior at runtime. What if a different handler were implemented that took the same input as echo, but also output the length of the input string? In guidepad we can switch between the two handlers by simply editing the operation definition:
$ guidepad entity edit operation my_demo_operation -a name
_id: ada0b270ee9941acaf9e4cd9b4aa629c
active: true
batch: false
...
group: null
implemented_at: guidepad.operations.builtin.demo.echo:echo2
input_type: echo_input
is_async: false
name: my_demo_operation
operations_api: null
output_type: echo_output
path: null
provided_by: null
type: null
entity edit opens the default editor and populates it with the YAML version of the entity in question. Here we update implemented_at to point at echo2. If we fetch the operation we can see the output type is now echo_output2, which again was pulled from the type hints on the function definition.
$ guidepad entity list operation '{"name": "my_demo_operation"}' -a name -a input_type -a output_type
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ _id ┃ name ┃ input_type ┃ output_type ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ ada0b270ee9941acaf9e4cd9b4aa629c │ my_demo_operation │ echo_input │ echo_output2 │
└──────────────────────────────────┴───────────────────┴────────────┴──────────────┘
$ guidepad operation chain 'my_demo_operation(input_str="hi")'
{'_id': '16d2f1e0ce6348a885c0d7e64f330e10', 'output_str': 'hi', 'output_len': 2}
--------
Executing the operation, we can see that the output has changed to include the length of the input string. What if the developer of echo2 didn't use type hints? How could we change the output type of the operation to match the new return value of the handler? Since the input and output types on an operation are just pointers to types, we can define a new type inline in the attribute:
_id: ada0b270ee9941acaf9e4cd9b4aa629c
active: true
batch: false
...
group: null
implemented_at: guidepad.operations.builtin.demo.echo:echo2
input_type: echo_input
is_async: false
name: my_demo_operation
operations_api: null
output_type: |
  {"name": "echo_output_2", "attributes": {"output_str": {"data_type": "string"}, "output_len": {"data_type": "int"}}}
path: null
provided_by: null
type: null
$ guidepad operation chain 'my_demo_operation(input_str="hi")'
{'_id': 'f5eb2c8cff9d402c8c29f121f93b01e2', 'output_str': 'hi', 'output_len': 2}
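As a rough illustration of what an inline type definition like the one above conveys, the sketch below parses that JSON and checks a record against it. The PYTHON_TYPES mapping and the conforms helper are hypothetical and are not guidepad's validation logic.

```python
import json

# The JSON below is the inline type from the edited definition above.
spec = json.loads(
    '{"name": "echo_output_2", "attributes": {"output_str": {"data_type": "string"},'
    ' "output_len": {"data_type": "int"}}}'
)

# Assumed mapping from data_type names to Python types (illustrative only).
PYTHON_TYPES = {"string": str, "int": int}


def conforms(record: dict, type_spec: dict) -> bool:
    """Check that every declared attribute is present with the declared type."""
    return all(
        isinstance(record.get(attr), PYTHON_TYPES[meta["data_type"]])
        for attr, meta in type_spec["attributes"].items()
    )


print(conforms({"output_str": "hi", "output_len": 2}, spec))  # True
print(conforms({"output_str": "hi"}, spec))                   # False
```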
Because guidepad's datastores serve as the central point of truth, editing the definition in this way changes behavior across the entire system without any code changes/pushes required.
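One way to picture why an edit takes effect everywhere is a dispatcher that reads the definition from shared storage on every call. The sketch below is an assumption about the general pattern, not guidepad's code: the definitions dictionary stands in for a datastore, and the handlers are Python builtins so the example runs on its own.

```python
import importlib

# Stand-in for a shared datastore; guidepad's real storage layer is not shown here.
definitions = {"my_demo_operation": {"implemented_at": "builtins:len"}}


def call_operation(name: str, *args):
    """Look up the definition on every call, then import and run its handler."""
    module_path, _, attr = definitions[name]["implemented_at"].partition(":")
    handler = getattr(importlib.import_module(module_path), attr)
    return handler(*args)


print(call_operation("my_demo_operation", "hi"))  # 2

# Editing the stored definition changes behavior on the next call; callers are untouched.
definitions["my_demo_operation"]["implemented_at"] = "builtins:repr"
print(call_operation("my_demo_operation", "hi"))  # 'hi'
```

Because no caller holds a reference to the handler itself, there is nothing to redeploy when the definition changes.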
The previous section detailed how operations solve the first problem presented in this blog: defining re-usable bits of logic and typing their inputs and outputs at runtime. But what about making this logic portable? That's where guidepad's support for distributed execution of operations comes into play.
For the purposes of this demonstration, we'll lean on the list_service operation, which comes pre-installed with every guidepad instance. It can be described with the entity list command:
$ guidepad entity list operation '{"name": "list_service"}'
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ _id ┃ name ┃ group ┃ type ┃ active ┃ path ┃ operations… ┃ provided_by ┃ implemente… ┃ input_type ┃ output_type ┃ framework_s… ┃ batch ┃ is_async ┃ disable_au… ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ list_servi… │ list_servi… │ autogenerat… │ guidepad.b… │ True │ autogenerat… │ <guidepad.… │ <guidepad.t… │ None │ list_servic… │ list_servi… │ runtime_typ… │ False │ False │ False │
│ │ │ │ │ │ │ object at │ object at │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ 0x7f0015fe… │ 0x7f0015fe9… │ │ │ │ │ │ │ │
└─────────────┴─────────────┴──────────────┴─────────────┴────────┴──────────────┴─────────────┴──────────────┴─────────────┴──────────────┴─────────────┴──────────────┴───────┴──────────┴─────────────┘
Of course, we can run the operation locally and get some results:
$ guidepad operation chain 'list_service(name={"op": "$eq", "value": "my_local_ecr_helper"})'
{'_id': '86e800825f96412084e12aff3b11d074', 'count': 1, 'instances': [{'_id': 'a6e8865ca4764b39873f1d027093a6d9', 'name': 'my_local_ecr_helper', 'deployed_on': None, 'requirements': [], 'log_level': 'INFO', 'environment': None, 'environment_variables': [], 'current_states': ['1fbc923a3dc84acc9e264471e4d99cc6'], 'state_machine': None, 'state_plans': ['default_k8s_ecr_helper_deploy'], 'monitor_interval': 10, 'service_type': 'k8s_ecr_helper', 'version': None, 'service_hosts': [], 'user': None, 'aws_account_id': '', 'aws_region': '', 'aws_access_key': '', 'aws_secret_key': ''}]}
--------
But we can also run the operation asynchronously in a distributed manner:
$ guidepad entity duplicate operation list_service -s name list_service_async -s is_async true
{'ok': True, 'message': 'Made a copy of operation-list_service with new id: 0cb5c6dd330b4c6e9cea6c95f8bd831a'}
$ guidepad operation chain 'list_service_async(name={"op": "$eq", "value": "my_local_ecr_helper"})'
{'invocation_id': 'a4e262a371804a58bd1d1725ae55c856'}
--------
$ guidepad entity list operation_execution '{"operation": "0cb5c6dd330b4c6e9cea6c95f8bd831a"}'
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ _id ┃ operation ┃ caller_id ┃ caller_type ┃ started_at ┃ ended_at ┃ error ┃ output_artifact ┃ user ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 7795c04f8ff24bef95d210ac7d001a9b │ <guidepad.types.attributes.ReferenceCollection object at 0x7f364a0063e0> │ 0d3fd45321e0450e996a456916ea5157 │ job_invocation │ 2023-11-28 16:32:52.123817+00:00 │ 2023-11-28 16:32:52.131200+00:00 │ False │ <guidepad.types.attributes.ReferenceCollection object at 0x7f364a006080> │ <guidepad.types.attributes.ReferenceCollection object at 0x7f364a006290> │
└──────────────────────────────────┴──────────────────────────────────────────────────────────────────────────┴──────────────────────────────────┴────────────────┴──────────────────────────────────┴──────────────────────────────────┴───────┴──────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────────────────────┘
$ guidepad artifact retrieve 36dcf117820949918a793653444c52da async_list_service.json
$ cat async_list_service.json
{"_id": "294e782857d0430a9c09be79c962afd6", "count": 1, "instances": [{"_id": "a6e8865ca4764b39873f1d027093a6d9", "name": "my_local_ecr_helper", "deployed_on": null, "requirements": [], "log_level": "INFO", "environment": null, "environment_variables": [], "current_states": ["1fbc923a3dc84acc9e264471e4d99cc6"], "state_machine": null, "state_plans": ["default_k8s_ecr_helper_deploy"], "monitor_interval": 10, "service_type": "k8s_ecr_helper", "version": null, "service_hosts": [], "user": null, "aws_account_id": "", "aws_region": "us-east-1", "aws_access_key": "", "aws_secret_key": ""}]}
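In the above snippet, we duplicated the list_service operation under a different name with is_async set to true, called the copy (which returned an invocation ID instead of the result), looked up the resulting operation_execution record, and retrieved the output artifact to a local file.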
How and where did the operation execute? When the operation was called, guidepad dynamically created and scheduled a work plan invocation with instructions on how to execute the operation, in one of the environments configured within our guidepad instance. When the operation completed, guidepad persisted its output as an artifact and linked it to the operation_execution record.
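The invoke, persist, and link flow described above can be sketched with a thread pool standing in for a remote environment. Everything here is illustrative: invoke_async, the artifacts and executions dictionaries, and count_items are hypothetical names, and a real work plan involves scheduling that this sketch does not attempt to model.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

artifacts = {}   # artifact_id -> persisted output
executions = {}  # invocation_id -> execution record
pool = ThreadPoolExecutor(max_workers=2)  # stands in for a remote environment


def invoke_async(handler, **kwargs):
    """Schedule the handler and return immediately with an invocation id."""
    invocation_id = uuid.uuid4().hex
    executions[invocation_id] = {"error": False, "output_artifact": None}

    def run():
        record = executions[invocation_id]
        try:
            output = handler(**kwargs)
        except Exception:
            record["error"] = True
            return
        artifact_id = uuid.uuid4().hex
        artifacts[artifact_id] = output          # persist the output
        record["output_artifact"] = artifact_id  # link it to the execution

    future = pool.submit(run)
    return {"invocation_id": invocation_id}, future


def count_items(items):
    return {"count": len(items)}


reply, future = invoke_async(count_items, items=["a", "b"])
future.result()  # wait here only so the sketch can read the result below
record = executions[reply["invocation_id"]]
print(artifacts[record["output_artifact"]])  # {'count': 2}
```

The caller only ever sees the invocation ID; the output is reached afterwards through the execution record, which mirrors the CLI steps in the snippet.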
As a general rule, guidepad can execute operations in any environment for which it has a control plane implemented. We currently have control planes for Kubernetes, AWS, GCloud, Azure, and bare metal servers, with the list constantly expanding. Users can use guidepad's plugin framework to author their own control planes, should there be an exotic or niche use case.
Remote execution of operations through asynchronous work plan scheduling works for operations with a handler written in Python, but what about other languages? Let's say you had this simple REST API, written in Go:
package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

// album represents data about a record album.
type album struct {
	ID     string  `json:"id"`
	Title  string  `json:"title"`
	Artist string  `json:"artist"`
	Price  float64 `json:"price"`
}

// albums slice to seed record album data.
var albums = []album{
	{ID: "1", Title: "Blue Train", Artist: "John Coltrane", Price: 56.99},
	{ID: "2", Title: "Jeru", Artist: "Gerry Mulligan", Price: 17.99},
	{ID: "3", Title: "Sarah Vaughan and Clifford Brown", Artist: "Sarah Vaughan", Price: 39.99},
}

func main() {
	router := gin.Default()
	router.GET("/list_albums", getAlbums)
	router.Run("localhost:8080")
}

// getAlbums responds with the list of all albums as JSON.
func getAlbums(c *gin.Context) {
	c.IndentedJSON(http.StatusOK, albums)
}
Using guidepad, you can create a service definition for this API and deploy it to your environment of choice using the state plan/state machine framework. As part of that deployment, a service exposure can be created that provides access to the deployed service from outside the service's environment (these steps are omitted for brevity; this blog is already quite long).
Then, you could create an Operation that is "provided by" the service:
$ guidepad entity create operation - <<EOF
{"name": "list_albums", "provided_by": "<UUID OF ServiceExposure>"}
EOF
When the list_albums operation is called, guidepad will fetch the current connection details from the ServiceExposure and perform RPC to execute the operation against the deployed service. The service exposure abstraction manages the negotiation of connection details in different environments, allowing for a single interface for accessing the functionality of the operation regardless of which environment(s) the providing service is deployed into.
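A simplified picture of that lookup-then-call step is sketched below. It assumes the exposure stores a scheme, host, and port and that the operation maps to an HTTP GET on a path named after it (as in the Go example); the real ServiceExposure fields and RPC mechanism may differ, and every name in the code is hypothetical.

```python
import json
from typing import Callable, Dict
from urllib.request import urlopen

# Hypothetical connection details; a real ServiceExposure may hold different fields.
exposures: Dict[str, Dict[str, object]] = {
    "exposure-1": {"scheme": "http", "host": "localhost", "port": 8080},
}
operation = {"name": "list_albums", "provided_by": "exposure-1"}


def http_get(url: str) -> bytes:
    """Default transport: a plain HTTP GET."""
    with urlopen(url, timeout=5) as response:
        return response.read()


def call_provided_operation(op: dict, fetch: Callable[[str], bytes] = http_get):
    """Resolve the exposure at call time, then issue the request."""
    details = exposures[op["provided_by"]]
    url = f"{details['scheme']}://{details['host']}:{details['port']}/{op['name']}"
    return json.loads(fetch(url))


# A stub transport keeps the sketch runnable without the Go service running.
def fake_fetch(url: str) -> bytes:
    return json.dumps({"requested": url}).encode()


print(call_provided_operation(operation, fetch=fake_fetch))
# {'requested': 'http://localhost:8080/list_albums'}
```

Because the connection details are read at call time, moving the service only requires updating the exposure record, not the callers.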
In a serverless framework, functionality is provided without concern for the infrastructure it depends on. Through the operation abstraction's ability to execute functionality in heterogeneous compute environments, whether as an asynchronous call or RPC to a deployed service, teams can build serverless functionality on top of their existing infrastructure. This frees organizations from the need to move to the cloud just to take advantage of offerings such as AWS Lambda and Fargate. Of course, if you'd like your operations to run on those frameworks, they can be represented as environment types within guidepad and become targets for asynchronous execution (without changing anything about how your operations are invoked!).
With guidepad's built-in state machine functionality for services, users can even design sophisticated hot/cold behavior for the services backing their operations:
In the above diagram, the diamonds represent service states, with the arrows between them being the permissible state transitions. Each transition is gated on a set of guidepad requirement entities, which must be satisfied before the transition can occur. These requirements are represented by the boxes with the white background. Guidepad is able to autonomously manage these state machines, transitioning services between states when the conditions are met. In this state machine, the service backing an operation would be automatically deployed if it is not deployed and an operation request is received, and then it would be undeployed if there were no requests within a specific time window. Another set of conditions would automatically scale the number of deployed replicas of the service if a set of throughput thresholds was violated.
This diagram would be represented by a set of data entities within a guidepad instance, allowing for real-time modification of the state machine without writing any files or pushing to repositories.
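A state machine held as data, with transitions gated on requirements, might look roughly like the sketch below. The state names, requirement names, and metric keys are all hypothetical; the actual entity schema is not shown in this post.

```python
# Hypothetical state and requirement names; the real entity schema is not shown here.
transitions = [
    {"from": "undeployed", "to": "deployed", "requires": ["request_pending"]},
    {"from": "deployed", "to": "undeployed", "requires": ["idle_too_long"]},
]

# Each requirement is a named predicate over observed metrics.
requirements = {
    "request_pending": lambda m: m["pending_requests"] > 0,
    "idle_too_long": lambda m: m["seconds_since_last_request"] > m["idle_window"],
}


def next_state(state: str, metrics: dict) -> str:
    """Take the first transition out of `state` whose requirements all hold."""
    for t in transitions:
        if t["from"] == state and all(requirements[r](metrics) for r in t["requires"]):
            return t["to"]
    return state


metrics = {"pending_requests": 1, "seconds_since_last_request": 0, "idle_window": 300}
print(next_state("undeployed", metrics))  # deployed

metrics = {"pending_requests": 0, "seconds_since_last_request": 900, "idle_window": 300}
print(next_state("deployed", metrics))    # undeployed
```

Since the transitions list is ordinary data, appending or editing an entry changes the machine's behavior on the next evaluation.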