Monitoring

Logging

Canton uses Logback as the logging library. All Canton logs derive from the logger com.digitalasset.canton. By default, Canton will write a log to the file log/canton.log using the INFO log-level and will also log WARN and ERROR to stdout.

How Canton produces log files can be configured extensively on the command line using the following options:

  • -v (or --verbose) is a short option to set the canton log level to DEBUG. This is likely the most common log option you will use.
  • --debug sets all log levels to DEBUG, except stdout, which is set to INFO. Note that DEBUG logs of external libraries can be very noisy.
  • --log-level-root=<level> configures the log-level of the root logger. This changes the log level of Canton and of external libraries, but not of stdout.
  • --log-level-canton=<level> configures the log-level of only the Canton logger.
  • --log-level-stdout=<level> configures the log-level of stdout. This will usually be the text displayed in the Canton console.
  • --log-file-name=log/canton.log configures the location of the log file.
  • --log-file-appender=flat|rolling|off configures if and how logging to a file should be done. The rolling appender will roll the files according to the defined date-time pattern.
  • --log-file-rolling-history=12 configures the number of historical files to keep when using the rolling appender.
  • --log-file-rolling-pattern=YYYY-mm-dd configures the rolling file suffix (and therefore the frequency) of how files should be rolled.
  • --log-truncate configures whether the log file should be truncated on startup.
  • --log-profile=container provides a default set of logging settings for a particular setup. Right now, we only support the container profile, which logs to STDOUT and turns off flat file logging to avoid storage leaks due to log files within a container.

Please note that if you use --log-profile, the order of the command line arguments matters. The profile settings can be overridden on the command line by placing adjustments after the profile has been selected.
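The ordering rule can be pictured with a small Python model. This is only an illustration of "later arguments win", not Canton's actual argument parser: the PROFILES table and the resolve_logging helper are hypothetical, and the container profile is reduced to the single setting mentioned above (file logging turned off).

```python
# Toy model of left-to-right option handling; not Canton's real parser.
# Assumption: the "container" profile is represented only by the one setting
# named in the text (file logging turned off).
PROFILES = {"container": {"log-file-appender": "off"}}


def resolve_logging(args):
    """Apply logging arguments in order, so later arguments override earlier ones."""
    settings = {}
    for arg in args:
        name, _, value = arg.lstrip("-").partition("=")
        if name == "log-profile":
            # Selecting a profile writes all of its defaults at this position.
            settings.update(PROFILES[value])
        else:
            settings[name] = value
    return settings
```

In this model, passing --log-profile=container followed by --log-file-appender=rolling leaves the rolling appender active, while the reverse order lets the profile switch file logging off again.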

Canton supports the normal log4j logging levels: TRACE, DEBUG, INFO, WARN, ERROR.

For further customization, a custom logback configuration can be provided using JAVA_OPTS.

JAVA_OPTS="-Dlogback.configurationFile=./path-to-file.xml" ./bin/canton --config ...

If you use a custom logback configuration file, the command line arguments for logging will not have any effect, except --log-level-canton and --log-level-root, which can still be used to adjust the log levels of the Canton and root loggers.

Viewing Logs

We strongly recommend the use of a log file viewer such as lnav to view Canton logs and resolve issues. Among other features, lnav has automatic syntax highlighting, convenient filtering for specific log messages, and allows viewing log files of different Canton components in a single view. This makes viewing logs and resolving issues a lot more efficient than simply using standard UNIX tools such as less or grep.

In particular, we have found the following features especially useful when using lnav:

  • viewing log files of different Canton components in a single view merged according to timestamps (lnav <log1> <log2> ...).
  • filtering specific log messages in (:filter-in <regex>) or out (:filter-out <regex>). When filtering in messages, e.g. those with a given trace-id, a transaction can be traced across different components, especially when using the single-view feature described above.
  • searching for specific log messages (/<regex>) and jumping in-between them (n and N).
  • automatic syntax highlighting of parts of log messages (e.g. timestamps) and log messages themselves (e.g. WARN log messages are yellow).
  • jumping in-between error (e and E) and warn messages (w and W).
  • selectively activating and deactivating different filters and files (TAB and SPACE to activate/deactivate a filter).
  • marking lines (m) and jumping back-and-forth between marked lines (u and U).

The custom lnav log format file for Canton logs, canton.lnav.json, is bundled with every Canton release. It can be installed with lnav -i canton.lnav.json.

Detailed Logging

By default, logging will omit details in order to not write sensitive data into log files. For debug or educational purposes, you can turn on additional logging using the following configuration switches:

canton.monitoring.logging {
    event-details = true
    api {
        message-payloads = true
        max-method-length = 1000
        max-message-lines = 10000
        max-string-length = 10000
        max-metadata-size = 10000
    }
}

In particular, this will turn on payload logging in the ApiRequestLogger, which records every gRPC API invocation, and will turn on detailed logging of the SequencerClient and of the transaction trees. Please note that all additional events will be logged at DEBUG level.

Tracing

For further debuggability, Canton provides a trace-id that allows you to trace the processing of requests through the system. The trace-id is exposed to logback through the mapped diagnostic context (MDC) and can be included in the logback output pattern using %mdc{trace-id}.
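To follow one request without a dedicated log viewer, the trace-id can also be used as a plain-text filter. The sketch below assumes that each log line starts with a sortable timestamp and contains the trace-id somewhere in the line (for example because the logback pattern includes %mdc{trace-id}). The helper name lines_for_trace is made up for illustration, and the actual line layout depends on your logback pattern.

```python
from typing import Iterable, List


def lines_for_trace(log_lines: Iterable[str], trace_id: str) -> List[str]:
    """Return the lines mentioning trace_id, ordered by their leading timestamp.

    Assumes every line begins with a timestamp that sorts lexicographically
    (e.g. "2024-01-01 12:00:00,123"), so lines merged from several node
    logs can be ordered with a plain string sort.
    """
    matching = [line.rstrip("\n") for line in log_lines if trace_id in line]
    return sorted(matching)
```

Feeding the concatenated lines of, say, a participant log and a domain log into lines_for_trace gives a single time-ordered view of one request, similar in spirit to combining lnav's merged view with :filter-in.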

Trace-id propagation is controlled by the canton.monitoring.tracing.propagation configuration option, which is set to enabled by default.

It is also possible to configure the service to which traces and spans are reported. Currently, Jaeger and Zipkin are supported. For example, Jaeger reporting can be configured as follows:

monitoring.tracing.tracer.exporter {
  type = jaeger
  address = ... // default: "localhost"
  port = ... // default: 14250
}

You can easily try this out locally by running Jaeger in a Docker container as follows:

docker run --rm -it --name jaeger \
  -p 16686:16686 \
  -p 14250:14250 \
  jaegertracing/all-in-one:1.22.0

Sampling

It is also possible to change how often spans are sampled (i.e. reported to the configured exporter). By default, spans are always reported (monitoring.tracing.tracer.sampler.type = always-on). The sampler can also be configured to never report (monitoring.tracing.tracer.sampler.type = always-off), although this is rarely useful, or to report a specific fraction of spans, as shown below:

monitoring.tracing.tracer.sampler = {
  type = trace-id-ratio
  ratio = 0.5
}

There is one last sampling property that can optionally be changed. By default, parent-based sampling is on (monitoring.tracing.tracer.sampler.parent-based = true), which means that a span is sampled if and only if its parent is sampled (the root span follows the configured sampling strategy). This way, there are never incomplete traces: either the full trace is sampled or none of it is. If this property is set to false, all spans follow the configured sampling strategy, regardless of whether their parent is sampled.
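The interaction between the ratio sampler and the parent-based flag can be sketched in a few lines of Python. This is a simplified model of the decision described above, not Canton's or any tracing library's actual implementation. Treating the trace-id as a 64-bit integer and the function name should_sample are assumptions made for the example.

```python
from typing import Optional

# Simplified sketch of the sampling decision; not a real tracer implementation.
MAX_ID = 2 ** 64  # assumption: the trace-id is treated as a 64-bit integer


def should_sample(trace_id: int, ratio: float,
                  parent_sampled: Optional[bool] = None,
                  parent_based: bool = True) -> bool:
    """Decide whether a span is reported to the exporter.

    parent_sampled is None for a root span, otherwise the parent's decision.
    """
    if parent_based and parent_sampled is not None:
        # Non-root span with parent-based sampling: follow the parent.
        return parent_sampled
    # Root span, or parent-based sampling switched off: apply the ratio.
    # Deriving the decision from the trace-id keeps it stable per trace.
    return trace_id % MAX_ID < ratio * MAX_ID
```

With ratio = 0.5, roughly half of all root spans are reported in this model. With parent-based sampling on, every child span inherits that decision, which is what keeps traces complete.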

Known Limitations

Not every trace that can currently be observed in the logs is reported to the configured trace collector service. Traces originating from console commands or that are part of the transaction protocol are largely well reported, while other kinds of traces are being added to the set of reported traces as the need arises.

Also, even the transaction protocol trace has a known limitation: once a command is submitted (and its trace fully reported), any resulting Daml events that are subsequently processed start a new trace, because the Ledger API currently does not propagate trace context information from command submission to transaction subscription. For example, if a participant creates a ping command, the full transaction processing trace of the submitted ping can be observed, but the participant that processes the ping by creating a pong command starts a separate trace instead of continuing the existing one.

Status

Each Canton node exposes rich status information. Running

<node>.health.status

will return a status object which can be one of

  • Failure - if the status of the node cannot be determined, including an error message explaining why it failed
  • NotInitialized - if the node is not yet initialized
  • Success[NodeStatus] - if the status could be determined including the detailed status.

Depending on the node type, the NodeStatus will differ. A participant node will respond with a message containing

  • Participant id: - the participant id of the node
  • Uptime: - the uptime of this node
  • Ports: - the ports on which the participant node exposes the Ledger and the Admin API.
  • Connected domains: - list of domains the participant is currently connected to properly
  • Unhealthy domains: - list of domains the participant is trying to connect to, but where the connection is not ready for command submission.
  • Active: - true if this instance is the active replica (can be false in case of the passive instance of a high-availability deployment)

A domain node or a sequencer node will respond with a message containing

  • Domain id: - the unique identifier of the domain
  • Uptime: - the uptime of this node
  • Ports: - the ports on which the domain node exposes the Public and the Admin API
  • Connected Participants: - the list of connected participants
  • Sequencer: - a boolean flag indicating if the embedded sequencer writer is operational

A domain topology manager or a mediator node will return

  • Node uid: - the unique identifier of the node
  • Uptime: - the uptime of this node
  • Ports: - the ports on which the node hosts its APIs.
  • Active: - true if this instance is the active replica (can be false in case of the passive instance of a high-availability deployment)

Health Dumps

In order to provide efficient support, we need as much information as possible. For this purpose, Canton implements an information gathering facility that collects essential system information for our support staff. If you encounter an error where you need our help, please ensure the following:

  • Start Canton in interactive mode, with the -v option to enable debug logging: ./bin/canton -v -c <myconfig>. This will provide you with a console prompt.
  • Reproduce the error by following the steps that previously caused the error. Write down these steps so they can be provided to support staff.
  • After you observe the error, type health.dump() into the Canton console to generate the ZIP file.

This will create a dump file (.zip) that stores the following information:

  • The configuration you are using, with all sensitive data stripped from it (no passwords).
  • An extract of the logfile. We don’t log overly sensitive data into log files.
  • A current snapshot of Canton metrics.
  • A stacktrace for each running thread.

Please provide the gathered information, together with the exact list of steps that led to the issue, to your support contact. Providing complete information is very important for us to help you troubleshoot your issues.

Health Check

The Canton process can optionally expose an HTTP endpoint indicating whether the process believes it is healthy. This is intended for use in uptime checks and liveness probes. If enabled, the /health endpoint responds to an HTTP GET request with a 200 HTTP status code if healthy, or 500 if unhealthy (with a plain text description of why it is unhealthy).

To enable this health endpoint, add a monitoring section to the Canton configuration. As this health check is for the whole process, it is added directly to the Canton configuration rather than for a specific node.

canton {
  monitoring.health {
    server {
      port = 7000
    }

    check {
      type = ping
      participant = participant1
      interval = 30s
    }
  }
}

This health check will have participant1 “ledger ping” itself every 30 seconds. The process will be considered healthy if the ping is successful.
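A probe against this endpoint could look roughly like the following Python sketch. It relies only on the behavior described above (200 when healthy, 500 with a plain text reason when unhealthy). The function names are hypothetical, and treating every other non-200 answer or a connection failure as unhealthy is an assumption of the example.

```python
import urllib.error
import urllib.request
from typing import Tuple


def classify_health(status_code: int, body: str) -> Tuple[bool, str]:
    """Map an HTTP response from /health to (healthy, reason)."""
    if status_code == 200:
        return True, ""
    # An unhealthy process answers 500 with a plain text description; any
    # other non-200 code is treated as unhealthy here as well (assumption).
    return False, body.strip()


def check_health(base_url: str, timeout: float = 5.0) -> Tuple[bool, str]:
    """Send GET <base_url>/health and classify the answer."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return classify_health(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the body is still readable.
        return classify_health(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError) as err:
        # The process could not be reached at all (not started, wrong port, ...).
        return False, f"endpoint unreachable: {err}"
```

With the configuration above, the call would be check_health("http://localhost:7000"), where the host name is an assumption about your deployment.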

Metrics

Canton uses Dropwizard’s metrics library to report metrics. The metrics library supports a variety of reporting backends. JMX-based reporting (only for testing purposes) can be enabled using

canton.monitoring.metrics.reporters = [{ type = jmx }]

Additionally, metrics can be written to a file

canton.monitoring.metrics.reporters = [{
  type = jmx
}, {
  type = csv
  directory = "metrics"
  interval = 5s // default
  filters = [{
    contains = "canton"
  }]
}]

or reported via Graphite (to Grafana) using

canton.monitoring.metrics.reporters = [{
  type = graphite
  address = "localhost" // default
  port = 2003
  prefix.type = hostname // default
  interval = 30s // default
  filters = [{
    contains = "canton"
  }]
}]

or reported via Prometheus (to Grafana) using

canton.monitoring.metrics.reporters = [{
  type = prometheus
  address = "localhost" // default
  port = 9000 // default
}]

When using the graphite or csv reporters, Canton will periodically evaluate all metrics matching the given filters. It is therefore advisable to filter for only those metrics that are relevant to you.
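The effect of a contains filter can be pictured with the following Python sketch. It is a toy model rather than the reporter's real code. The helper name is_reported is hypothetical, and two behaviors are assumptions not confirmed by the text above: a metric passes if it matches at least one filter, and an empty filter list lets everything pass.

```python
from typing import Dict, List


def is_reported(metric_name: str, filters: List[Dict[str, str]]) -> bool:
    """Toy version of the reporter filter check.

    Assumptions: a metric passes if it matches at least one filter, and an
    empty filter list lets every metric pass.
    """
    if not filters:
        return True
    return any(f["contains"] in metric_name for f in filters)
```

Under this model, a narrower filter string such as "sequencer-client" would restrict periodic evaluation to the sequencer client metrics listed further below.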

In addition to Canton metrics, the process can also report Daml metrics (of the ledger api server). Optionally, JVM metrics can be included using

canton.monitoring.metrics.report-jvm-metrics = yes // default no

Participant Metrics

canton.<domain>.conflict-detection.sequencer-counter-queue

  • Summary: Size of conflict detection sequencer counter queue
  • Description: The task scheduler will work off tasks according to the timestamp order, scheduling the tasks whenever a new timestamp has been observed. This metric exposes the number of un-processed sequencer messages that will trigger a timestamp advancement.
  • Type: Gauge

canton.<domain>.conflict-detection.task-queue

  • Summary: Size of conflict detection task queue
  • Description: The task scheduler will schedule tasks to run at a given timestamp. This metric exposes the number of tasks that are waiting in the task queue for the right time to pass. A huge number does not necessarily indicate a bottleneck; it could also mean that a huge number of tasks have not yet arrived at their execution time.
  • Type: Gauge
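To make the two conflict detection queue metrics easier to read, here is a minimal Python sketch of a timestamp-ordered task scheduler. It only mirrors the behavior described above (tasks wait until their timestamp has been observed) and is not Canton's implementation. The class name, the integer timestamps, and the task_queue_size property are illustrative assumptions.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class TaskScheduler:
    """Runs tasks in timestamp order once their timestamp has been observed."""

    def __init__(self) -> None:
        self._queue: List[Tuple[int, int, Callable[[], None]]] = []
        # Tie-breaker keeps insertion order for tasks with equal timestamps.
        self._tie_breaker = itertools.count()

    def schedule(self, timestamp: int, task: Callable[[], None]) -> None:
        """Queue a task to run at the given timestamp."""
        heapq.heappush(self._queue, (timestamp, next(self._tie_breaker), task))

    def observe(self, timestamp: int) -> None:
        """A new timestamp was observed: run every task that is now due."""
        while self._queue and self._queue[0][0] <= timestamp:
            _, _, task = heapq.heappop(self._queue)
            task()

    @property
    def task_queue_size(self) -> int:
        """The value a task-queue gauge would report in this model."""
        return len(self._queue)
```

In this model, a large task_queue_size simply means many tasks are scheduled for timestamps that have not been observed yet, which matches the remark that a large value does not necessarily indicate a bottleneck.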

canton.<domain>.protocol-messages.confirmation-request-creation

  • Summary: Time to create a confirmation request
  • Description: The time that the transaction protocol processor needs to create a confirmation request.
  • Type: Timer

canton.<domain>.protocol-messages.confirmation-request-size

  • Summary: Confirmation request size
  • Description: Records the histogram of the sizes of (transaction) confirmation requests.
  • Type: Histogram

canton.<domain>.protocol-messages.transaction-message-receipt

  • Summary: Time to parse a transaction message
  • Description: The time that the transaction protocol processor needs to parse and decrypt an incoming confirmation request.
  • Type: Timer

canton.<domain>.request-tracker.sequencer-counter-queue

  • Summary: Size of record order publisher sequencer counter queue
  • Description: Same as for conflict-detection, but measuring the sequencer counter queues for publishing to the Ledger API server according to record time.
  • Type: Gauge

canton.<domain>.request-tracker.task-queue

  • Summary: Size of record order publisher task queue
  • Description: The task scheduler will schedule tasks to run at a given timestamp. This metric exposes the number of tasks that are waiting in the task queue for the right time to pass.
  • Type: Gauge

canton.<domain>.sequencer-client.application-handle

  • Summary: Timer monitoring time and rate of sequentially handling the event application logic
  • Description: All events are received sequentially. This handler records the rate and time it takes the application (participant or domain) to handle the events.
  • Type: Timer

canton.<domain>.sequencer-client.delay

  • Summary: The delay on the event processing
  • Description: Every message received from the sequencer carries a timestamp. The delay provides the difference between the sequencing time and the processing time. The difference can be a result of either clock-skew or if the system is overloaded and doesn’t manage to keep up with processing events.
  • Type: Gauge

canton.<domain>.sequencer-client.event-handle

  • Summary: Timer monitoring time and rate of entire event handling
  • Description: Most event handling cost should come from the application-handle. This timer measures the full time (which should be just marginally more than the application handle).
  • Type: Timer

canton.<domain>.sequencer-client.load

  • Summary: The load on the event subscription
  • Description: The event subscription processor is a sequential process. The load is a factor between 0 and 1 describing how much of an existing interval has been spent in the event handler.
  • Type: Gauge
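As a reading aid, the load value can be thought of as busy time divided by interval length. The sketch below is a hypothetical calculation, not Canton's implementation. The function name and the clamping to the range 0 to 1 are assumptions.

```python
def load_factor(busy_seconds: float, interval_seconds: float) -> float:
    """Fraction of the interval spent inside the event handler, clamped to [0, 1]."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return min(1.0, max(0.0, busy_seconds / interval_seconds))
```

A value close to 1 means the sequential event handler was busy for almost the whole interval, so there is little headroom left for additional events.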

canton.<domain>.sequencer-client.submissions.dropped

  • Summary: Count of send requests that did not cause an event to be sequenced
  • Description: Counter of send requests for which we did not witness a corresponding event being sequenced by the supplied max-sequencing-time. There could be many reasons for this happening: the request may have been lost before reaching the sequencer, the sequencer may be at capacity and the max-sequencing-time was exceeded by the time the request was processed, or the supplied max-sequencing-time may just be too small for the sequencer to be able to sequence the request.
  • Type: Counter

canton.<domain>.sequencer-client.submissions.in-flight

  • Summary: Number of sequencer send requests we have that are waiting for an outcome or timeout
  • Description: Incremented on every successful send to the sequencer. Decremented when the event or an error is sequenced, or when the max-sequencing-time has elapsed.
  • Type: Gauge
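The relationship between the in-flight gauge and the dropped counter can be sketched as simple bookkeeping. The Python class below is a toy model built from the two descriptions above, not Canton's sequencer client. All names are hypothetical, and time is represented as a plain number.

```python
from typing import Dict


class SubmissionTracker:
    """Toy bookkeeping for the in-flight gauge and the dropped counter."""

    def __init__(self) -> None:
        self._pending: Dict[str, float] = {}  # message id -> max-sequencing-time
        self.dropped = 0

    @property
    def in_flight(self) -> int:
        """Number of sends still waiting for an outcome or timeout."""
        return len(self._pending)

    def sent(self, message_id: str, max_sequencing_time: float) -> None:
        """A send request was successfully handed to the sequencer."""
        self._pending[message_id] = max_sequencing_time

    def sequenced(self, message_id: str) -> None:
        """The event (or an error) for this request was sequenced."""
        self._pending.pop(message_id, None)

    def advance_time(self, now: float) -> None:
        """Time advanced: requests past their max-sequencing-time count as dropped."""
        expired = [m for m, deadline in self._pending.items() if deadline < now]
        for message_id in expired:
            del self._pending[message_id]
            self.dropped += 1
```

In this model, a steadily growing dropped value alongside a low in-flight value would point to requests timing out rather than piling up.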

canton.<domain>.sequencer-client.submissions.overloaded

  • Summary: Count of send requests which receive an overloaded response
  • Description: Counter that is incremented if a send request receives an overloaded response from the sequencer.
  • Type: Counter

canton.<domain>.sequencer-client.submissions.sends

  • Summary: Rate and timings of send requests to the sequencer
  • Description: Provides a rate and time of how long it takes for send requests to be accepted by the sequencer. Note that this is just for the request to be made and not for the requested event to actually be sequenced.
  • Type: Timer

canton.<domain>.sequencer-client.submissions.sequencing

  • Summary: Rate and timings of sequencing requests
  • Description: This timer is started when a submission is made to the sequencer and completed when a corresponding event is witnessed from the sequencer, so it encompasses the entire duration for the sequencer to sequence the request. If the request does not result in an event, no timing will be recorded.
  • Type: Timer

canton.commitments.compute

  • Summary: Time spent on commitment computations.
  • Description: Participant nodes compute bilateral commitments at regular intervals. This metric exposes the time spent on each computation. If the time to compute the commitments starts to exceed the commitment intervals, this likely indicates a problem.
  • Type: Timer

canton.db-storage.<storage>

  • Summary: Timer monitoring duration and rate of accessing the given storage
  • Description: Covers both reads from and writes to the storage.
  • Type: Timer

canton.db-storage.<storage>.load

  • Summary: The load on the given storage
  • Description: The load is a factor between 0 and 1 describing how much of an existing interval has been spent reading from or writing to the storage.
  • Type: Gauge

canton.db-storage.alerts.multi-domain-event-log

  • Summary: Number of failed writes to the multi-domain event log
  • Description: Failed writes to the multi-domain event log indicate an issue requiring user intervention. In the case of domain event logs, the corresponding domain no longer emits any subsequent events until domain recovery is initiated (e.g. by disconnecting and reconnecting the participant from the domain). In the case of the participant event log, an operation might need to be reissued. If this counter is larger than zero, check the Canton log for error details.
  • Type: Counter

canton.db-storage.alerts.single-dimension-event-log

  • Summary: Number of failed writes to the event log
  • Description: Failed writes to the single-dimension event log indicate an issue requiring user intervention. In the case of domain event logs, the corresponding domain no longer emits any subsequent events until domain recovery is initiated (e.g. by disconnecting and reconnecting the participant from the domain). In the case of the participant event log, an operation might need to be reissued. If this counter is larger than zero, check the Canton log for error details.
  • Type: Counter

canton.db-storage.general.executor.queued

  • Summary: Number of database access tasks waiting in queue
  • Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
  • Type: Gauge

canton.db-storage.general.executor.running

  • Summary: Number of database access tasks currently running
  • Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
  • Type: Gauge

canton.db-storage.general.executor.waittime

  • Summary: Scheduling time metric for database tasks
  • Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
  • Type: Timer

canton.db-storage.write.executor.queued

  • Summary: Number of database access tasks waiting in queue
  • Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
  • Type: Gauge

canton.db-storage.write.executor.running

  • Summary: Number of database access tasks currently running
  • Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
  • Type: Gauge

canton.db-storage.write.executor.waittime

  • Summary: Scheduling time metric for database tasks
  • Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
  • Type: Timer

canton.prune

  • Summary: Duration of prune operations.
  • Description: This timer exposes the duration of pruning requests from the Canton portion of the ledger.
  • Type: Timer

canton.updates-published

  • Summary: Number of updates published through the read service to the indexer
  • Description: When an update is published through the read service, it has already been committed to the ledger. The indexer will subsequently store the update in a form that allows for querying the ledger efficiently.
  • Type: Meter

Domain Metrics

canton.db-storage.<storage>

  • Summary: Timer monitoring duration and rate of accessing the given storage
  • Description: Covers both reads from and writes to the storage.
  • Type: Timer

canton.db-storage.<storage>.load

  • Summary: The load on the given storage
  • Description: The load is a factor between 0 and 1 describing how much of an existing interval has been spent reading from or writing to the storage.
  • Type: Gauge

canton.db-storage.alerts.multi-domain-event-log

  • Summary: Number of failed writes to the multi-domain event log
  • Description: Failed writes to the multi-domain event log indicate an issue requiring user intervention. In the case of domain event logs, the corresponding domain no longer emits any subsequent events until domain recovery is initiated (e.g. by disconnecting and reconnecting the participant from the domain). In the case of the participant event log, an operation might need to be reissued. If this counter is larger than zero, check the Canton log for error details.
  • Type: Counter

canton.db-storage.alerts.single-dimension-event-log

  • Summary: Number of failed writes to the event log
  • Description: Failed writes to the single-dimension event log indicate an issue requiring user intervention. In the case of domain event logs, the corresponding domain no longer emits any subsequent events until domain recovery is initiated (e.g. by disconnecting and reconnecting the participant from the domain). In the case of the participant event log, an operation might need to be reissued. If this counter is larger than zero, check the Canton log for error details.
  • Type: Counter

canton.db-storage.general.executor.queued

  • Summary: Number of database access tasks waiting in queue
  • Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
  • Type: Gauge

canton.db-storage.general.executor.running

  • Summary: Number of database access tasks currently running
  • Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
  • Type: Gauge

canton.db-storage.general.executor.waittime

  • Summary: Scheduling time metric for database tasks
  • Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
  • Type: Timer

canton.db-storage.write.executor.queued

  • Summary: Number of database access tasks waiting in queue
  • Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
  • Type: Gauge

canton.db-storage.write.executor.running

  • Summary: Number of database access tasks currently running
  • Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
  • Type: Gauge

canton.db-storage.write.executor.waittime

  • Summary: Scheduling time metric for database tasks
  • Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
  • Type: Timer

canton.mediator.outstanding-requests

  • Summary: Number of currently outstanding requests
  • Description: This metric provides the number of currently open requests registered with the mediator.
  • Type: Gauge

canton.mediator.requests

  • Summary: Total number of processed requests
  • Description: This metric provides the total number of requests processed since the system has been started.
  • Type: Meter

canton.mediator.sequencer-client.application-handle

  • Summary: Timer monitoring time and rate of sequentially handling the event application logic
  • Description: All events are received sequentially. This handler records the rate and time it takes the application (participant or domain) to handle the events.
  • Type: Timer

canton.mediator.sequencer-client.delay

  • Summary: The delay on the event processing
  • Description: Every message received from the sequencer carries a timestamp. The delay provides the difference between the sequencing time and the processing time. The difference can be a result of either clock-skew or if the system is overloaded and doesn’t manage to keep up with processing events.
  • Type: Gauge

canton.mediator.sequencer-client.event-handle

  • Summary: Timer monitoring time and rate of entire event handling
  • Description: Most event handling cost should come from the application-handle. This timer measures the full time (which should be just marginally more than the application handle).
  • Type: Timer

canton.mediator.sequencer-client.load

  • Summary: The load on the event subscription
  • Description: The event subscription processor is a sequential process. The load is a factor between 0 and 1 describing how much of an existing interval has been spent in the event handler.
  • Type: Gauge

canton.mediator.sequencer-client.submissions.dropped

  • Summary: Count of send requests that did not cause an event to be sequenced
  • Description: Counter of send requests for which we did not witness a corresponding event being sequenced by the supplied max-sequencing-time. There could be many reasons for this happening: the request may have been lost before reaching the sequencer, the sequencer may be at capacity and the max-sequencing-time was exceeded by the time the request was processed, or the supplied max-sequencing-time may just be too small for the sequencer to be able to sequence the request.
  • Type: Counter

canton.mediator.sequencer-client.submissions.in-flight

  • Summary: Number of sequencer send requests we have that are waiting for an outcome or timeout
  • Description: Incremented on every successful send to the sequencer. Decremented when the event or an error is sequenced, or when the max-sequencing-time has elapsed.
  • Type: Gauge

canton.mediator.sequencer-client.submissions.overloaded

  • Summary: Count of send requests which receive an overloaded response
  • Description: Counter that is incremented if a send request receives an overloaded response from the sequencer.
  • Type: Counter

canton.mediator.sequencer-client.submissions.sends

  • Summary: Rate and timings of send requests to the sequencer
  • Description: Provides a rate and time of how long it takes for send requests to be accepted by the sequencer. Note that this is just for the request to be made and not for the requested event to actually be sequenced.
  • Type: Timer

canton.mediator.sequencer-client.submissions.sequencing

  • Summary: Rate and timings of sequencing requests
  • Description: This timer is started when a submission is made to the sequencer and completed when a corresponding event is witnessed from the sequencer, so it encompasses the entire duration for the sequencer to sequence the request. If the request does not result in an event, no timing will be recorded.
  • Type: Timer

canton.sequencer.db-storage.<storage>

  • Summary: Timer monitoring duration and rate of accessing the given storage
  • Description: Covers both reads from and writes to the storage.
  • Type: Timer

canton.sequencer.db-storage.<storage>.load

  • Summary: The load on the given storage
  • Description: The load is a factor between 0 and 1 describing how much of an existing interval has been spent reading from or writing to the storage.
  • Type: Gauge

canton.sequencer.db-storage.alerts.multi-domain-event-log

  • Summary: Number of failed writes to the multi-domain event log
  • Description: Failed writes to the multi-domain event log indicate an issue requiring user intervention. In the case of domain event logs, the corresponding domain no longer emits any subsequent events until domain recovery is initiated (e.g. by disconnecting and reconnecting the participant from the domain). In the case of the participant event log, an operation might need to be reissued. If this counter is larger than zero, check the Canton log for error details.
  • Type: Counter

canton.sequencer.db-storage.alerts.single-dimension-event-log

  • Summary: Number of failed writes to the event log
  • Description: Failed writes to the single-dimension event log indicate an issue requiring user intervention. In the case of domain event logs, the corresponding domain no longer emits any subsequent events until domain recovery is initiated (e.g. by disconnecting and reconnecting the participant from the domain). In the case of the participant event log, an operation might need to be reissued. If this counter is larger than zero, check the Canton log for error details.
  • Type: Counter

canton.sequencer.db-storage.general.executor.queued

  • Summary: Number of database access tasks waiting in queue
  • Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
  • Type: Gauge
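
The reason rejected tasks are invisible in this gauge can be sketched with a bounded queue from the Python standard library. This is only an illustration under assumptions: BoundedTaskQueue, its rejected counter, and the task names are hypothetical, and Canton's executor is not implemented this way.

```python
import queue


class BoundedTaskQueue:
    """Hypothetical bounded queue of database tasks."""

    def __init__(self, max_size: int) -> None:
        self._tasks = queue.Queue(maxsize=max_size)
        self.rejected = 0  # tasks that did not fit and must be retried

    def submit(self, task: str) -> bool:
        # Enqueue without blocking; a full queue rejects the task.
        try:
            self._tasks.put_nowait(task)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    @property
    def queued(self) -> int:
        # What a "queued" gauge would report: only tasks that were accepted.
        return self._tasks.qsize()


tasks = BoundedTaskQueue(max_size=2)
results = [tasks.submit(name) for name in ("read-1", "write-1", "write-2")]
```

Here the third task is rejected, so the queued value stays at the maximum of 2 even though three tasks are waiting to be executed. A gauge that sits at its maximum can therefore understate the real backlog.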

canton.sequencer.db-storage.general.executor.running

  • Summary: Number of database access tasks currently running
  • Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
  • Type: Gauge

canton.sequencer.db-storage.general.executor.waittime

  • Summary: Scheduling time metric for database tasks
  • Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
  • Type: Timer

canton.sequencer.db-storage.write.executor.queued

  • Summary: Number of database access tasks waiting in queue
  • Description: Database access tasks get scheduled in this queue and get executed using one of the existing asynchronous sessions. A large queue indicates that the database connection is not able to deal with the large number of requests. Note that the queue has a maximum size. Tasks that do not fit into the queue will be retried, but won’t show up in this metric.
  • Type: Gauge

canton.sequencer.db-storage.write.executor.running

  • Summary: Number of database access tasks currently running
  • Description: Database access tasks run on an async executor. This metric shows the current number of tasks running in parallel.
  • Type: Gauge

canton.sequencer.db-storage.write.executor.waittime

  • Summary: Scheduling time metric for database tasks
  • Description: Every database query is scheduled using an asynchronous executor with a queue. The time a task is waiting in this queue is monitored using this metric.
  • Type: Timer

canton.sequencer.processed

  • Summary: Number of messages processed by the sequencer
  • Description: This metric measures the number of successfully validated messages processed by the sequencer since the start of this process.
  • Type: Meter

canton.sequencer.processed-bytes

  • Summary: Number of message bytes processed by the sequencer
  • Description: This metric measures the total number of message bytes processed by the sequencer.
  • Type: Meter

canton.sequencer.sequencer-client.application-handle

  • Summary: Timer monitoring time and rate of sequentially handling the event application logic
  • Description: All events are received sequentially. This handler records the rate and time it takes the application (participant or domain) to handle the events.
  • Type: Timer

canton.sequencer.sequencer-client.delay

  • Summary: The delay on the event processing
  • Description: Every message received from the sequencer carries a timestamp. The delay provides the difference between the sequencing time and the processing time. The difference can result from clock skew, or from the system being overloaded and unable to keep up with processing events.
  • Type: Gauge
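
The delay described above is a plain difference of two timestamps. The following Python sketch shows the arithmetic; the event_delay function and the example timestamps are hypothetical, and whether Canton reports negative values when the local clock is behind is an assumption of this sketch, not something stated here.

```python
from datetime import datetime, timedelta, timezone


def event_delay(sequencing_time: datetime, processing_time: datetime) -> timedelta:
    """Hypothetical helper: time between sequencing and local processing."""
    return processing_time - sequencing_time


sequenced_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# Overloaded node: the event is processed 250 ms after it was sequenced.
delay = event_delay(sequenced_at, sequenced_at + timedelta(milliseconds=250))

# Clock skew: the local clock is 40 ms behind the sequencer clock,
# so the computed difference is negative in this sketch.
skewed = event_delay(sequenced_at, sequenced_at - timedelta(milliseconds=40))
```

Because both causes produce the same kind of number, a persistently large delay should be checked against the load metrics before concluding that clocks are skewed.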

canton.sequencer.sequencer-client.event-handle

  • Summary: Timer monitoring time and rate of entire event handling
  • Description: Most event handling cost should come from the application-handle. This timer measures the full time, which should be only marginally more than the application-handle time.
  • Type: Timer

canton.sequencer.sequencer-client.load

  • Summary: The load on the event subscription
  • Description: The event subscription processor is a sequential process. The load is a factor between 0 and 1 describing how much of an existing interval has been spent in the event handler.
  • Type: Gauge

canton.sequencer.sequencer-client.submissions.dropped

  • Summary: Count of send requests that did not cause an event to be sequenced
  • Description: Counter of send requests for which no corresponding event was witnessed as sequenced by the supplied max-sequencing-time. There can be many reasons for this: the request may have been lost before reaching the sequencer, the sequencer may be at capacity so that the max-sequencing-time was exceeded by the time the request was processed, or the supplied max-sequencing-time may just be too small for the sequencer to be able to sequence the request.
  • Type: Counter

canton.sequencer.sequencer-client.submissions.in-flight

  • Summary: Number of sequencer send requests we have that are waiting for an outcome or timeout
  • Description: Incremented on every successful send to the sequencer. Decremented when the event or an error is sequenced, or when the max-sequencing-time has elapsed.
  • Type: Gauge
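
The bookkeeping behind an in-flight gauge of this kind, and its relation to the dropped counter, can be sketched as follows. InFlightTracker, its method names, and the message ids are hypothetical; times are plain numbers to keep the example short. This is not Canton's implementation.

```python
class InFlightTracker:
    """Hypothetical tracker for sends awaiting an outcome or a timeout."""

    def __init__(self) -> None:
        self._pending = {}  # message id -> max-sequencing-time
        self.dropped = 0    # sends that timed out without an event

    def on_send_accepted(self, message_id: str, max_sequencing_time: float) -> None:
        # Incremented on every successful send.
        self._pending[message_id] = max_sequencing_time

    def on_sequenced(self, message_id: str) -> None:
        # Decremented when the event (or an error) is sequenced.
        self._pending.pop(message_id, None)

    def on_time_observed(self, now: float) -> None:
        # Decremented when the max-sequencing-time has elapsed.
        expired = [m for m, deadline in self._pending.items() if deadline < now]
        for message_id in expired:
            del self._pending[message_id]
        self.dropped += len(expired)

    @property
    def in_flight(self) -> int:
        return len(self._pending)


tracker = InFlightTracker()
tracker.on_send_accepted("m1", max_sequencing_time=10.0)
tracker.on_send_accepted("m2", max_sequencing_time=12.0)
tracker.on_send_accepted("m3", max_sequencing_time=30.0)
tracker.on_sequenced("m1")           # m1 was sequenced in time
tracker.on_time_observed(now=15.0)   # m2 timed out, m3 is still waiting
```

After these calls one request is still in flight and one has been counted as dropped.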

canton.sequencer.sequencer-client.submissions.overloaded

  • Summary: Count of send requests which receive an overloaded response
  • Description: Counter that is incremented if a send request receives an overloaded response from the sequencer.
  • Type: Counter

canton.sequencer.sequencer-client.submissions.sends

  • Summary: Rate and timings of send requests to the sequencer
  • Description: Provides the rate of send requests and the time it takes for them to be accepted by the sequencer. Note that this covers only the request being made, not the requested event actually being sequenced.
  • Type: Timer

canton.sequencer.sequencer-client.submissions.sequencing

  • Summary: Rate and timings of sequencing requests
  • Description: This timer is started when a submission is made to the sequencer and completed when a corresponding event is witnessed from the sequencer, so it encompasses the entire duration for the sequencer to sequence the request. If the request does not result in an event, no timing is recorded.
  • Type: Timer
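
A minimal sketch of a timer with these semantics is shown below. The SequencingTimer class, its method names, and the message ids are hypothetical and only illustrate that a submission without a matching event contributes no sample.

```python
class SequencingTimer:
    """Hypothetical timer: submission time to witnessed event time."""

    def __init__(self) -> None:
        self._started = {}    # message id -> submission time in seconds
        self.durations = []   # recorded sequencing durations in seconds

    def on_submit(self, message_id: str, now: float) -> None:
        # The timer starts when the submission is made.
        self._started[message_id] = now

    def on_event(self, message_id: str, now: float) -> None:
        # The timer completes only when the matching event is witnessed.
        started = self._started.pop(message_id, None)
        if started is not None:
            self.durations.append(now - started)


timer = SequencingTimer()
timer.on_submit("m1", now=100.0)
timer.on_submit("m2", now=101.0)
timer.on_event("m1", now=100.5)   # m1 is witnessed, m2 never is
```

Only m1 produces a sample here. Requests that never result in an event are therefore missing from the timings, so the timer on its own can look healthy while the dropped counter grows.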

canton.sequencer.subscriptions

  • Summary: Number of active sequencer subscriptions
  • Description: This metric indicates the number of subscriptions that are currently open and actively served at the sequencer.
  • Type: Gauge

canton.sequencer.time-requests

  • Summary: Number of time requests received by the sequencer
  • Description: When a participant needs to know the domain time, it makes a request for a time proof to be sequenced. It is normal to see a small number of these being sequenced. However, if this number becomes a significant portion of the total requests to the sequencer, it could indicate that the strategy for requesting times needs to be revised to deal with different clock skews and latencies between the sequencer and participants.
  • Type: Meter
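
One way to judge whether time requests have become a significant portion of the traffic is to compare them with the total number of requests, as in the following Python sketch. The function name, the sample counts, and the 10% threshold are assumptions made for illustration. Which metric supplies the total (for example canton.sequencer.processed) is also an assumption and depends on your setup.

```python
def time_request_share(time_requests: int, total_requests: int) -> float:
    """Hypothetical helper: fraction of requests that are time requests."""
    if total_requests <= 0:
        return 0.0
    return time_requests / total_requests


# Assumed threshold; pick a value that matches your own baseline.
ALERT_SHARE = 0.10

share = time_request_share(time_requests=30, total_requests=1200)
needs_review = share > ALERT_SHARE
```

With these sample counts the share is 2.5%, which stays below the assumed threshold.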

canton.topology-manager.sequencer-client.application-handle

  • Summary: Timer monitoring time and rate of sequentially handling the event application logic
  • Description: All events are received sequentially. This handler records the rate and time it takes the application (participant or domain) to handle the events.
  • Type: Timer

canton.topology-manager.sequencer-client.delay

  • Summary: The delay on the event processing
  • Description: Every message received from the sequencer carries a timestamp. The delay provides the difference between the sequencing time and the processing time. The difference can result from clock skew, or from the system being overloaded and unable to keep up with processing events.
  • Type: Gauge

canton.topology-manager.sequencer-client.event-handle

  • Summary: Timer monitoring time and rate of entire event handling
  • Description: Most event handling cost should come from the application-handle. This timer measures the full time, which should be only marginally more than the application-handle time.
  • Type: Timer

canton.topology-manager.sequencer-client.load

  • Summary: The load on the event subscription
  • Description: The event subscription processor is a sequential process. The load is a factor between 0 and 1 describing how much of an existing interval has been spent in the event handler.
  • Type: Gauge

canton.topology-manager.sequencer-client.submissions.dropped

  • Summary: Count of send requests that did not cause an event to be sequenced
  • Description: Counter of send requests for which no corresponding event was witnessed as sequenced by the supplied max-sequencing-time. There can be many reasons for this: the request may have been lost before reaching the sequencer, the sequencer may be at capacity so that the max-sequencing-time was exceeded by the time the request was processed, or the supplied max-sequencing-time may just be too small for the sequencer to be able to sequence the request.
  • Type: Counter

canton.topology-manager.sequencer-client.submissions.in-flight

  • Summary: Number of sequencer send requests we have that are waiting for an outcome or timeout
  • Description: Incremented on every successful send to the sequencer. Decremented when the event or an error is sequenced, or when the max-sequencing-time has elapsed.
  • Type: Gauge

canton.topology-manager.sequencer-client.submissions.overloaded

  • Summary: Count of send requests which receive an overloaded response
  • Description: Counter that is incremented if a send request receives an overloaded response from the sequencer.
  • Type: Counter

canton.topology-manager.sequencer-client.submissions.sends

  • Summary: Rate and timings of send requests to the sequencer
  • Description: Provides the rate of send requests and the time it takes for them to be accepted by the sequencer. Note that this covers only the request being made, not the requested event actually being sequenced.
  • Type: Timer

canton.topology-manager.sequencer-client.submissions.sequencing

  • Summary: Rate and timings of sequencing requests
  • Description: This timer is started when a submission is made to the sequencer and completed when a corresponding event is witnessed from the sequencer, so it encompasses the entire duration for the sequencer to sequence the request. If the request does not result in an event, no timing is recorded.
  • Type: Timer