KIP-866: ZooKeeper to KRaft Migration

Current state: Accepted

Discussion thread: https://lists.apache.org/thread/phnrz31dj0jz44kcjmvzrrmhhsmbx945

JIRA: KAFKA-14304

Keep the discussion on the mailing list instead of commenting on the wiki (wiki discussions get messy fast).

To complete the plan laid out in KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum, we need a way to migrate Kafka clusters from a ZooKeeper quorum to a KRaft quorum. This must be possible without affecting partition availability and with minimal impact on operators and client applications.

To give users more confidence in completing the migration to KRaft, we allow rolling back to ZooKeeper until the final step of the migration. This is accomplished by writing two copies of the metadata during the migration: one to the KRaft quorum and one to ZooKeeper.

This KIP defines the behavior and set of new APIs for the "bridge release" first mentioned in KIP-500.

Metrics

MBean name: kafka.server:type=KafkaServer,name=MetadataType
Description: An enumeration of: ZooKeeper (1) or KRaft (2). Each broker reports this.

MBean name: kafka.controller:type=KafkaController,name=MetadataType
Description: An enumeration of: ZooKeeper (1), KRaft (2), or Dual (3). The active controller reports this.

MBean name: kafka.controller:type=KafkaController,name=Features,feature={feature},level={level}
Description: The finalized set of features, with their levels, as seen by the controller. Used to help operators see the cluster's metadata.version.

MBean name: kafka.controller:type=KafkaController,name=ZkMigrationState
Description: An enumeration of the possible migration states the cluster can be in. Only reported by the active controller.

MBean name: kafka.controller:type=KafkaController,name=MigrationIneligibleBrokerCount
Description: A count of ZK brokers that are not eligible for the migration. Only reported by the active KRaft controller when in the MigrationIneligible ZkMigrationState; otherwise it reports zero.

MBean name: kafka.controller:type=KafkaController,name=MigrationIneligibleControllerCount
Description: A count of KRaft quorum controllers that are not eligible for the migration. Only reported by the active KRaft controller when in the MigrationIneligible ZkMigrationState; otherwise it reports zero.

MBean name: kafka.controller:type=KafkaController,name=ZooKeeperWriteBehindLag
Description: The lag, in records, of ZooKeeper relative to the highest committed record in the metadata log. Only reported by the active KRaft controller.

MBean name: kafka.controller:type=KafkaController,name=ZooKeeperBlockingKRaftMillis
Description: The number of milliseconds a KRaft write has been blocked due to lagging ZooKeeper writes. Only reported by the active KRaft controller.

A new MetadataVersion for the 3.4 release is added. This version is used to gate a few things in this design:

  • Enable forwarding on all brokers (KIP-590: redirect Zookeeper mutation protocols to controller)
  • Using the new version of BrokerRegistration RPC
  • Using the new versions of the controller RPCs (LeaderAndIsr, UpdateMetadata, StopReplica)
  • Using the new ApiVersions RPC version (for KRaft controllers only)
  • Using the new ZkMigrationRecord
  • Enabling the migration components on the KRaft controller and the special migration behavior on the ZK brokers

All brokers must be running at least this MetadataVersion before the migration can begin. ZK brokers specify their MetadataVersion using inter.broker.protocol.version as usual. The KRaft controller is initialized with the same metadata version (which is stored in the metadata log as a feature flag; see KIP-778: KRaft to KRaft Upgrades).
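For illustration, assuming the bridge MetadataVersion ships with the 3.4 release, a ZK broker would pin its IBP to that version before the migration (the exact version string depends on the release):

inter.broker.protocol.version=3.4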

Configuration

new one"zoo employee.metadata.migration.enableAdded settings for ZK racer and KRaft controller. The default is false. Setting this setting to true on each agent is a prerequisite to start the migration. Setting this to true in the KRaft controllers will trigger the start of the migration (more on this below). Setting this to true (or false) in a KRaft runner has no effect.

Controller RPCs

A new KRaftControllerId field is added to the three ZK controller RPCs: UpdateMetadataRequest, LeaderAndIsrRequest, and StopReplicaRequest. This field points to the active KRaft controller and is only set when the controller is in KRaft mode. If this field is set, the ControllerId field must be -1.

{ "apiKey": 4, "type": "request", "listeners": ["zkBroker"], "name": "LeaderAndIsrRequest", "validVersions": "0-7", // <-- New version 7 "flex versions": "4+", "fields": [ { "name": "ControllerId", "type": "int32", "versions": "0+", "entityType": "brokerId", " about": "Or controller ID." }, { "name": "KRaftControllerId", "type": "int32", "versions": "7+", "entityType": "brokerId", "about": "O ID of the KRaft controller used during " } , <-- New Field { "name": "ControllerEpoch", "type": "int32", "versions": "0+", "about": "The Controller Epoch". }, ... ]}
{ "apiKey": 5, "type": "request", "listeners": ["zkBroker"], "name": "StopReplicaRequest", "validVersions": "0-4", // <-- New version 4 "flex versions": "2+", "fields": [ { "name": "ControllerId", "type": "int32", "versions": "0+", "entityType": "brokerId", " about": "Or controller ID." }, { "name": "KRaftControllerId", "type": "int32", "versions": "4+", "entityType": "brokerId", "about": "O ID of the KRaft controller used during a migration ." }, // <-- New field { "name": "ControllerEpoch", "type": "int32", "versions": "0+", "about": "The controller epoch". }, ... ]}
{ "apiKey": 6, "type": "request", "listeners": ["zkBroker"], "name": "UpdateMetadataRequest", "validVersions": "0-8", // <-- New version 8 "flex versions": "6+", "fields": [ { "name": "ControllerId", "type": "int32", "versions": "0+", "entityType": "brokerId", " about": "Or controller ID." }, { "name": "KRaftControllerId", "type": "int32", "versions": "8+", "entityType": "brokerId", "about": "O ID of the KRaft controller used during a migration ." }, // <-- New field { "name": "ControllerEpoch", "type": "int32", "versions": "0+", "about": "The controller epoch". }, ... ]}

ApiVersionsResponse

A new tagged field is added to ApiVersionsResponse to allow KRaft controllers to indicate their readiness to perform the migration.

{ "apiKey": 18, "type": "response", "name": "ApiVersionsResponse", "validVersions": "0-4", // <-- New version 4 "flexibleVersions": "3+", "fields": [ // ... { "name": "ZkMigrationReady", "type": "int8", "versions": "4+", "taggedVersions": "4+", "tag": 3 , "ignorable": true, "about": "Set by a KRaft controller if the configuration required for ZK migration is in effect" } // <-- new field ]}

This field is only set by the KRaft controller when it sends an ApiVersionsResponse to other KRaft controllers. As this migration is not supported on KRaft nodes in mixed mode, clients will never see this field when receiving ApiVersionsResponse sent by brokers.

Initially, the supported values are:

  • 0: not ready
  • 1: ready
  • not set: the controller is not configured for ZK migration

A new metadata record is added to indicate whether a ZK migration was started or completed.

{ "apiKey": <NEXT KEY>, "type": "metadata", "name": "ZkMigrationRecord", "validVersions": "0", "flexibleVersions": "0+", "fields": [ { " name": "ZkMigrationState", "type": "int8", "versions": "0+", "about": "One of the possible migration states". },]}

The possible values for ZkMigrationState are: Started (0) and Finished (1). An int8 type is used to allow for additional states in the future.

Broker Registration RPC

A new version of the BrokerRegistration RPC is added to allow ZK brokers to register with the KRaft quorum. A new tagged field is added to indicate that a ZK broker is ready for migration. The presence of this field is also used to indicate that the sending broker is a ZK broker. The use of this field by a ZK broker indicates that zookeeper.metadata.migration.enable and the quorum connection configs are set correctly. The values of this tagged field are the same as those of the corresponding field in ApiVersionsResponse.

{ "apiKey":62, "type": "request", "listeners": ["controller"], "name": "BrokerRegistrationRequest", "validVersions": "0-1", // <-- New version 1 "flex versions": "0+", "fields": [ // ... { "name": "ZkMigrationReady", "type": "int8", "versions": "1+", "tagged versions ": "1+", "tag": 1, "ignorable": true, "about": "Set by a ZK broker if the configuration required for ZK migration is in place." } // <-- New field ]}

RegisterBrokerRecord

A new field is added to indicate that a registered broker is a ZooKeeper broker.

{ "apiKey": 0, "type": "metadata", "name": "RegisterBrokerRecord", "validVersions": "0-2", // <-- new version 2 "flexibleVersions": "0+", "fields": [ { "name": "BrokerId", "type": "int32", "versions": "0+", "entityType": "brokerId", "about": "The broker ID". }, { "name": "IsZkBroker", "type": "bool", "versions": "2+", "default": false, "about": "True if the registered broker is a ZK broker". }, // <-- new field // ... ]}

Migration State ZNode

As part of propagating metadata from KRaft to ZooKeeper in dual-write mode, we need to keep track of what has been synchronized. A new ZNode is introduced to track which KRaft record offset has been written back to ZooKeeper. This is used to recover the synchronization state following a KRaft controller failover.

ZNode /migration

{
  "version": 0,
  "kraft_controller_id": 3000,
  "kraft_controller_epoch": 1,
  "kraft_metadata_offset": 1234,
  "kraft_metadata_epoch": 10
}

By using conditional updates on this ZNode, we can prevent stale KRaft controllers from synchronizing data to ZooKeeper after a new controller has been elected.

Controller ZNodes

The two controller ZNodes, /controller and /controller_epoch, are managed by the KRaft quorum during the migration. See the Controller Leadership section below for more details.

Added a new version of the JSON schema for /controller to include an isKRaft boolean field.

{ "version": 2, // <-- New version 2 "brokerid": 3000, "timestamp": 1234567890, "isKRaft": true // <-- New field}

This field is intended to be informational to aid in debugging.

Forwarding Enabled on Brokers

As detailed in KIP-500 and KIP-590, all brokers (ZK and KRaft) must forward administrative requests such as CreateTopics to the active KRaft controller once the migration has started. When running the new metadata.version defined in this KIP, all brokers enable forwarding.

ZK Broker Additional Settings

To support connecting to a KRaft controller for requests such as AlterPartitions, ZK brokers need the following additional configs (a sketch follows the list below):

  • controller.quorum.voters: a comma-separated list of "id@host:port" entries (the same value the KRaft controllers use)
  • controller.listener.names: a comma-separated list of listeners used by the controller
  • corresponding entries in listener.security.protocol.map for the listener names given in controller.listener.names
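A hedged sketch of what these additions might look like in a ZK broker's server.properties (host names, ports, and the listener name are made up for illustration):

# Existing ZK broker configs stay as-is; these are added for the migration
controller.quorum.voters=3000@controller1.example.com:9093,3001@controller2.example.com:9093,3002@controller3.example.com:9093
controller.listener.names=CONTROLLER
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT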

Additional KRaft Controller Configs

To support connecting to ZooKeeper during the migration, the KRaft controllers need the following additional configs (a sketch follows below):

  • zookeeper.connect (required)
  • zookeeper.connection.timeout.ms (optional)
  • zookeeper.session.timeout.ms (optional)
  • zookeeper.max.in.flight.requests (optional)
  • zookeeper.set.acl (optional)
  • ZooKeeper SSL configs (optional)

These settings must match the ZK settings used by the ZK controller.
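A hedged sketch of a dedicated KRaft controller's properties in migration mode (host names, ports, and IDs are illustrative):

# Illustrative KRaft controller configs (dedicated controller)
process.roles=controller
node.id=3000
controller.quorum.voters=3000@controller1.example.com:9093,3001@controller2.example.com:9093,3002@controller3.example.com:9093
controller.listener.names=CONTROLLER
listeners=CONTROLLER://controller1.example.com:9093

# Migration configs added by this KIP; the ZK configs must match those used by the ZK cluster
zookeeper.metadata.migration.enable=true
zookeeper.connect=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/kafka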

Migration Triggers

The migration from ZK to KRaft is triggered by the state of the cluster. To start a migration, the cluster must meet a few requirements:

  1. All brokers have inter.broker.protocol.version set to the version added by this KIP, to enable forwarding and to indicate that they are running the minimum required software version.
  2. All brokers have zookeeper.metadata.migration.enable set to "true". This indicates that an operator has declared the intention to start the migration.
  3. All brokers have the configs listed in "Additional ZK Broker Configs" set. This allows them to connect to the KRaft controller.
  4. No broker is offline (we will use offline replicas as a proxy for this).
  5. The KRaft quorum is online and all of its members have zookeeper.metadata.migration.enable set to "true" as well as the ZK configs set.

The operator can prepare the ZK brokers or the KRaft controllers in either order. The migration will not begin until all nodes are ready.

By relying on broker/controller configs and restarts to trigger the migration, we follow a paradigm that Kafka operators are already familiar with.

Here is a description of the migration state machine. There will likely be more internal states that the controller uses, but these four will be exposed as the ZkMigrationState metric.


State (value): Description

MigrationIneligible (1): The brokers and controllers do not meet the migration criteria. The cluster operates in ZooKeeper mode.

MigratingZkData (2): The controller is copying data from ZooKeeper into KRaft.

DualWriteMetadata (3): The controller is in KRaft mode and dual-writes metadata to ZooKeeper.

MigrationFinalized (4): The cluster has been migrated to KRaft mode.

An active ZooKeeper controller always reports "MigrationIneligible", while the active KRaft controller reports a state corresponding to the migration's progress.

Preparing the Cluster

The first step in the migration is to upgrade the cluster to at least the bridge release version. Upgrading the cluster to a known starting point narrows our compatibility matrix and ensures that the necessary logic is in place before the migration begins. Brokers must also set the configs defined in "Migration Triggers" above.

To proceed with the migration, all brokers must be online so the controller can verify that they meet the migration criteria.

Controller Migration

This migration only supports dedicated KRaft controllers as the target deployment. Migrating to a combined KRaft broker/controller deployment is not supported.

A new set of nodes is provisioned to host the controller quorum. These controllers are started with zookeeper.metadata.migration.enable set to "true". Once a quorum is established and a leader is elected, the active controller checks that the entire quorum is ready to begin the migration. This is done by examining the new tagged field in the ApiVersionsResponse exchanged between controllers. The controller then determines the set of existing ZK brokers and waits for incoming broker registration requests (see the ZK Broker Presence section below). Once all known ZK brokers have registered with the KRaft controller (and are in a valid state), the migration process begins.

There is no ordering dependency between configuring the ZK brokers for the migration and bringing up the KRaft quorum.

The first step of the migration itself is to copy the existing ZK metadata and write it into the KRaft metadata log. The active KRaft controller will also establish itself as the active controller from ZK's perspective. While copying the ZK data, the controller will not process broker RPCs.

The metadata migration process will cause controller downtime proportional to the total size of the metadata in ZK.

The metadata copied from ZK is wrapped in a single metadata transaction (KIP-868). A ZkMigrationRecord is also included in this transaction.

At this point, all brokers are running in ZK mode and their broker-controller communication channels operate as they would with a ZK controller. ZK brokers learn about the new controller by receiving an UpdateMetadataRequest from the new KRaft controller. From a broker's perspective, this controller looks and behaves like a normal ZK controller.

Metadata changes are now written to the KRaft metadata log and to ZooKeeper.

In this dual-write mode, every metadata update is persisted to both the KRaft metadata log and ZooKeeper.

To ensure consistency of the metadata, writes to ZK from any other controller must stop while the data is being migrated. This is accomplished by having the new KRaft controller become the active ZK controller by forcibly writing to the /controller and /controller_epoch ZNodes.

Broker Migration

After the controller metadata and leadership have been migrated to KRaft, the brokers are restarted in KRaft mode one by one. While this rolling restart takes place, the cluster will be composed of both ZK and KRaft brokers.

The broker migration phase does not cause any downtime, but its total duration is effectively unbounded.

There is likely no reasonable way to limit how long a cluster stays in this mixed state, since a rolling restart of a large cluster can take several hours. It is also possible that the operator needs to roll back to ZK during this time.

Finalizing the Migration

After the cluster has been fully upgraded to KRaft mode, the controller is still running in migration mode, performing dual writes to KRaft and ZK. Since the data in ZK still matches the KRaft metadata log, it is still possible to roll back to ZK.

The time that the cluster runs with all KRaft brokers/controllers, but still in migration mode, is effectively unbounded.

Once the operator has decided to commit to KRaft mode, the final step is to restart the controller quorum to take it out of migration mode by setting zookeeper.metadata.migration.enable to "false" (or removing it). The active controller will only finalize the migration once it sees that all quorum members have signaled that they are finalizing the migration (again, using the tagged field in ApiVersionsResponse). Once the controller leaves migration mode, it writes a ZkMigrationRecord to the log and no longer performs writes to ZK. It also disables its special handling of ZK broker RPCs.

At this point, the cluster has been fully migrated and is running in KRaft mode. A rollback to ZK is still possible after the migration is complete, but it must be done offline and will cause loss of metadata (which can also cause loss of partition data).

Dual Metadata Writes

During the migration, metadata is written to both the KRaft metadata log and ZooKeeper. This gives us two important guarantees: we have a safe path back to ZK mode, and ZK broker metadata that depends on ZK watches keeps working.

At any time during the migration, the operator must be able to decide to return to ZK mode. This process should be safe and simple. By writing all metadata updates to both KRaft and ZK, we can ensure that the state stored in ZK is up to date.

By writing metadata changes to ZK, we also maintain compatibility with a few direct ZK dependencies that exist on the ZK brokers:

  • ACLs
  • Dynamic configs
  • Delegation tokens

ZK brokers still rely on the watch mechanism to learn about changes to this metadata. By performing dual writes, we cover these cases.

The controller uses a bounded write-behind approach for ZooKeeper updates. As records are committed to KRaft, the data is written to ZooKeeper asynchronously. The number of pending ZK records is reported as a metric so we can monitor how far ZK state lags behind KRaft. We can also set a limit on the number of records not yet written to ZooKeeper to avoid excessive divergence between the KRaft and ZooKeeper states.

To ensure consistency of the data written to ZooKeeper, we will use ZooKeeper multi-operation transactions. Each "multi" op sent to ZooKeeper will include the data being written (e.g., topics, configs, etc.) along with a conditional update to the "/migration" ZNode. The contents of "/migration" are updated on each write to include the offset of the latest record written to ZooKeeper. By using conditional updates, we can avoid races between KRaft controllers during a failover and ensure consistency between the metadata log and ZooKeeper.
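The mechanics can be sketched with the plain ZooKeeper client API (the ZNode layout and helper names below are illustrative, not the actual controller code): each batch of writes is a single multi() that checks and advances "/migration", so a write from a stale controller fails atomically.

import java.util.List;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.OpResult;
import org.apache.zookeeper.ZooKeeper;

public class DualWriteSketch {
    // Write one batch of migrated metadata plus the /migration bookkeeping in a single "multi" op.
    static List<OpResult> writeBatch(ZooKeeper zk,
                                     int migrationZnodeVersion,   // version of /migration read earlier
                                     String metadataPath,         // e.g. a topic's assignment ZNode
                                     byte[] metadataValue,
                                     byte[] newMigrationState)    // JSON with the latest KRaft offset/epoch
            throws KeeperException, InterruptedException {
        List<Op> ops = List.of(
            // Fails the whole batch if another KRaft controller updated /migration since we read it.
            Op.check("/migration", migrationZnodeVersion),
            // The metadata being synced to ZooKeeper.
            Op.setData(metadataPath, metadataValue, -1),
            // Advance the offset of the last record written back to ZooKeeper.
            Op.setData("/migration", newMigrationState, migrationZnodeVersion)
        );
        return zk.multi(ops);  // atomic: either every op applies or none do
    }
}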

Another benefit of using multi-operation transactions when synchronizing metadata to ZooKeeper is that we reduce the number of round trips to ZooKeeper. The ZK controller uses a similar batching technique for performance reasons.

This dual-write approach ensures that any metadata visible in ZK has also been committed to KRaft.

ZK Broker RPCs

To support brokers that are still running in ZK mode, the KRaft controller must send additional RPCs to keep the ZK brokers' metadata up to date.

LeaderAndIsr: when the KRaft controller handles AlterPartitions or performs a leader election, it needs to send LeaderAndIsrRequests to the ZK brokers.

UpdateMetadata: for metadata changes, the KRaft controller needs to send UpdateMetadataRequests to the ZK brokers. Instead of ControllerId, the KRaft controller specifies itself in the KRaftControllerId field.

StopReplica: following reassignments and topic deletions, the KRaft controller needs to send StopReplicaRequests to the ZK brokers so they stop managing certain replicas.

Each of these RPCs includes the new KRaftControllerId field that points to the active KRaft controller. When this field is present, it acts as a signal to the brokers that the controller is in KRaft mode. Using this field along with the zookeeper.metadata.migration.enable config, brokers can enable migration-specific behavior.

Controller Leadership

To prevent further writes to ZK, the first thing the new KRaft quorum must do is take over leadership of the ZK controller. This is accomplished by unconditionally overwriting two values in ZK. The "/controller" ZNode indicates the currently active controller. Overwriting it triggers a watch on all ZK brokers to notify them that a new controller has been elected. The active KRaft controller writes its node ID (e.g., 3000) to this ZNode to claim controller leadership. This write is persistent rather than the usual ephemeral write used by the ZK controller election algorithm. This ensures that no ZK broker can claim leadership during a KRaft controller failover.

The second ZNode we write to is "/controller_epoch". This ZNode is used to fence writes from older controllers in ZK mode. Each write from a ZK controller is actually a conditional multi-write that includes a "check" operation on the version of the "/controller_epoch" ZNode. By bumping this node, we ensure that any in-flight writes from the previous ZK controller epoch will fail.

Every time a KRaft controller election takes place, the newly elected controller overwrites the values in "/controller" and "/controller_epoch". The first epoch generated by the KRaft quorum must be greater than the last ZK controller epoch in order to preserve the monotonically increasing epoch invariant.
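A hedged sketch of this takeover using the raw ZooKeeper client (the delete-then-create sequence and the payload layout are illustrative; the real controller performs the equivalent fencing writes):

import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ClaimControllerSketch {
    static void claimZkControllerLeadership(ZooKeeper zk, int kraftControllerId, int newEpoch)
            throws KeeperException, InterruptedException {
        // Remove the ephemeral /controller ZNode written by the current ZK controller, if any.
        try {
            zk.delete("/controller", -1);
        } catch (KeeperException.NoNodeException ignored) {
            // no ZK controller is currently registered
        }
        // Claim leadership with a PERSISTENT node (not the usual ephemeral one) so that no
        // ZK broker can grab leadership while a KRaft controller failover is in progress.
        byte[] controllerJson = String.format(
                "{\"version\":2,\"brokerid\":%d,\"timestamp\":%d,\"isKRaft\":true}",
                kraftControllerId, System.currentTimeMillis()).getBytes(StandardCharsets.UTF_8);
        zk.create("/controller", controllerJson, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Bump /controller_epoch so that in-flight conditional writes from the old ZK
        // controller epoch fail their version check.
        byte[] epochBytes = Integer.toString(newEpoch).getBytes(StandardCharsets.UTF_8);
        zk.setData("/controller_epoch", epochBytes, -1);
    }
}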

Broker Registration

When running in migration mode, the KRaft controller must know about KRaft brokers as well as ZK brokers. This is achieved by having the ZK brokers send the broker lifecycle RPCs to the KRaft controller.

ZK brokers use a new version of the BrokerRegistration RPC to register themselves with the KRaft controller. They set the new ZkMigrationReady field and populate the Features field with a minimum and maximum supported metadata.version equal to their IBP. The KRaft controller will only accept the registration if the given "metadata.version" is equal to the quorum's IBP/MetadataVersion. The controller also only accepts the registration if ZkMigrationReady has a valid value.

Following a successful registration, ZK brokers send the BrokerHeartbeat RPC to indicate liveness. As usual, ZK brokers learn about other brokers via UpdateMetadataRequest.

If a ZK broker attempts to register with an invalid node ID, cluster ID, or IBP, the KRaft controller will reject the registration and the broker will shut down.

If a KRaft broker tries to register with the node ID of an existing ZK broker, the controller will reject the registration and the broker will exit.

KRaft controller soft start

If the KRaft quorum is brought up before a migration has started, it should not process most RPCs until the initial ZooKeeper data migration is complete. This is necessary to avoid metadata divergence during the initial data migration. The controller must handle Raft-related RPCs, as well as BrokerRegistration and BrokerHeartbeat. Other RPCs (for example, CreateTopics) are rejected with a NOT_CONTROLLER error.
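A rough illustration of that gating (class and method names are hypothetical; the actual controller request handling is more involved):

import org.apache.kafka.common.protocol.ApiKeys;
import org.apache.kafka.common.protocol.Errors;

public class PreMigrationGateSketch {
    // Decide whether a request may be processed while the initial ZK data copy is in progress.
    static Errors check(ApiKeys api) {
        switch (api) {
            // Raft RPCs must keep flowing so the quorum stays healthy.
            case VOTE:
            case BEGIN_QUORUM_EPOCH:
            case END_QUORUM_EPOCH:
            case FETCH:
            // ZK brokers must still be able to register and heartbeat.
            case BROKER_REGISTRATION:
            case BROKER_HEARTBEAT:
                return Errors.NONE;
            default:
                return Errors.NOT_CONTROLLER;  // e.g. CREATE_TOPICS is rejected for now
        }
    }
}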

Once the metadata migration is complete, the KRaft controller will start working normally.

ZK Broker Presence

When the KRaft controller enters migration mode, it waits for all known ZK brokers to register before starting the migration. The difficulty here is that we cannot know exactly which ZK brokers exist. Broker registrations in ZK are ephemeral and only reflect currently live brokers. If an operator took some brokers offline and then started a migration, the controller would wrongly assume those brokers do not exist. To improve on this, we can add a heuristic based on the cluster metadata to better capture the full set of ZK brokers. By looking at topic assignments and configs, we can infer the set of brokers that have partitions assigned to them or have a dynamic config. This approach is still imperfect, since a broker could be offline and have no assignments, but it at least prevents partition unavailability caused by a broker running old software that cannot participate in the migration.

When a client bootstraps cluster metadata, it should receive the same metadata regardless of the type of broker it bootstraps from. Normally, ZK brokers return the active ZK controller as the ControllerId and KRaft brokers return a random live KRaft broker. In both cases, this ControllerId is read from the broker's MetadataCache.

Since this KIP requires controller forwarding, we can use the KRaft approach of returning a random broker (ZK or KRaft) as the ControllerId to clients via MetadataResponse and relying on forwarding for writes.

For inter-broker requests such as AlterPartitions and ControlledShutdown, we do not want to add forwarding overhead, so we want to include the actual controller in the UpdateMetadataRequest. However, we cannot simply set the KRaft controller as the ControllerId. ZK brokers connect to a ZK controller using the "inter.broker.listener.name" config and the node information in the LiveBrokers field of UpdateMetadataRequest. To connect to a KRaft controller, ZK brokers must instead use the "controller.listener.names" and "controller.quorum.voters" configs. To make this possible, we use the new KRaftControllerId field in UpdateMetadataRequest.

Topic Deletions

The ZK migration logic will need to deal with asynchronous topic deletions when migrating data. Normally, the ZK controller performs these asynchronous deletions via the TopicDeletionManager. If the KRaft controller takes over before a pending deletion has occurred, we must complete the deletion as part of the ZK-to-KRaft state migration. Once the migration is complete, we also need to finalize the deletion in ZK so that the state is consistent.

The ZK and KRaft brokers maintain a meta.properties file in their log directories to store the node and cluster ID. Each type of broker uses a different version of this file.

v0 is used by ZK brokers:

#Tue Nov 29 10:15:56 EST 2022
broker.id=0
version=0
cluster.id=L05pbYc6Q4qlvxLk3rTO9A

v1 is used by KRaft brokers and controllers:

#Tue Nov 29 10:16:40 EST 2022
node.id=2
version=1
cluster.id=L05pbYc6Q4qlvxLk3rTO9A

Since these two versions contain the same data, just with different field names, we can easily support both v0 and v1 in the KRaft controllers and avoid modifying the file on disk. By leaving this file unchanged, we make it easier to roll back to ZK during the migration. Once the controller has finalized the migration and written the final ZkMigrationRecord, the brokers can rewrite the meta.properties files in their log directories as v1.
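A small sketch of how a KRaft controller could accept either layout when reading meta.properties (hypothetical helper, for illustration only):

import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

public class MetaPropertiesSketch {
    record NodeMeta(int nodeId, String clusterId) {}

    // Accept either the v0 (ZK) or v1 (KRaft) layout of meta.properties.
    static NodeMeta read(String path) throws IOException {
        Properties props = new Properties();
        try (FileReader reader = new FileReader(path)) {
            props.load(reader);
        }
        int version = Integer.parseInt(props.getProperty("version", "0"));
        String idKey = (version == 0) ? "broker.id" : "node.id";  // only the field name differs
        return new NodeMeta(Integer.parseInt(props.getProperty(idKey)),
                            props.getProperty("cluster.id"));
    }
}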

Rolling Back to ZK

As mentioned above, the operator should be able to roll back to ZooKeeper at any point in the migration process prior to taking the KRaft controllers out of migration mode. The rollback procedure amounts to reversing the migration steps performed so far:

  • Brokers need to be restarted in ZK mode one by one
  • The KRaft controller quorum needs to be shut down cleanly
  • The operator removes the persistent "/controller" and "/controller_epoch" ZNodes so that a ZK controller election can take place

A clean shutdown of the KRaft quorum is important because there may be uncommitted metadata waiting to be written to ZooKeeper. A forced shutdown could cause that metadata to be lost, which could lead to data loss.

Failure Modes

There are a few failure scenarios to consider during the migration: the KRaft controller could crash while initially copying the ZooKeeper data, the controller could crash some time after the initial migration, and the controller could fail to write new metadata to ZK.

Initial data migration

For the initial migration, the controller uses KIP-868 metadata transactions to write all of the ZK metadata in a single transaction. If the controller fails before this transaction is finalized, the next active controller will abort the transaction and restart the migration process.

Controller Crashes

After the data has been migrated and the cluster is in the dual-write or finalized state, the KRaft controller may crash. In this case, the Raft layer elects a new leader, which updates the /controller and /controller_epoch ZNodes and assumes controller leadership as usual.

Unavailable ZooKeeper

While in dual-write mode, a write to ZK may fail. In this case, we will want to pause updates to the metadata log to avoid unbounded lag between KRaft and ZooKeeper. Since ZK brokers read data such as ACLs and dynamic configs from ZooKeeper, we need to limit the divergence between ZK brokers and KRaft by capping the amount of lag between KRaft and ZooKeeper.

Incompatible Brokers

At any time during the migration, an operator could bring up an incompatible broker. This could be a new or existing broker. In this case, the KRaft controller will see the broker's registration in ZK, but it will not send it any RPCs. By refusing to send it UpdateMetadata or LeaderAndIsr RPCs, this broker is effectively fenced off from the rest of the cluster.

Misconfigurations

There are some misconfiguration scenarios that we can guard against.

If a migration has been started, but a newly elected KRaft controller is misconfigured (missing zookeeper.metadata.migration.enable or the ZK configs), that controller should resign. By replaying the metadata log during its initialization phase, the controller can detect that a migration is in progress by seeing the initial ZkMigrationRecord. Since it does not have the required configs, it can resign leadership and log an error.

If a migration has been finalized, but the KRaft quorum is brought up with zookeeper.metadata.migration.enable set, the controller must not re-enter migration mode. In this case, by replaying the log, the controller can see the second ZkMigrationRecord and know that the migration is finalized and must not be resumed. This should be logged as an error, but the quorum can continue operating normally.

Other scenarios will likely exist and will be explored when implementing the migration feature.

Test Plan

In addition to basic "happy path" testing, we also want to test that the migration can tolerate failures of KRaft brokers and controllers. We also want system tests covering ZooKeeper being unavailable during the migration. Another class of tests for this feature is metadata consistency at the broker level. Since we support ZK and KRaft brokers simultaneously, we need to ensure that their metadata does not stay inconsistent for too long.

Rejected Alternatives

Offline Migration

The main alternative to this design is an offline migration. While this would be much simpler, it is a non-starter for the many Kafka users who require minimal downtime of their clusters. By enabling an online migration from ZK to KRaft, we can provide a path to KRaft for all Kafka users, including those for whom Kafka is critical infrastructure.

Online Broker Migration

After KRaft has taken over the controller and migrated the ZK data, this design requires the ZK brokers to be restarted in KRaft mode. An alternative would be to switch brokers over dynamically, swapping their RPC-driven metadata handling (UpdateMetadata and LeaderAndIsr) for the KRaft metadata log. This would avoid the rolling restart needed to move brokers into KRaft mode. The difficulty with this approach is that there are significant differences between the KafkaServer (ZK) and BrokerServer (KRaft) implementations. It would be possible to reconcile these differences, but the effort would be substantial. This option would also increase the risk of the migration, since we would be changing the "safe" ZK broker code. By leaving the ZK implementation mostly untouched, we give ourselves a safety net for rolling back during the migration.

No Dual Writes

Another simplifying alternative would be to only write metadata to KRaft while in migration mode. This has a few downsides. First, it would make rolling back to ZK much more difficult, if not impossible. Second, there are still some direct ZK usages on the brokers that require the data in ZK to be up to date (see the section above on Dual Metadata Writes).

Command/RPC based trigger

Another way to trigger the migration would be to have an operator issue a special command or send a special RPC. Adding manual steps like this to the migration can complicate integration with orchestration software such as Ansible, Chef, Kubernetes, etc. By sticking to a "configure and restart" approach, the migration trigger stays simple and is easier to integrate with other control systems.

Write-Ahead Synchronization of ZooKeeper Data

An alternative to the write-behind approach for ZooKeeper would be to write to ZooKeeper first and then write to the metadata log. The main problem with this approach is that KRaft writes would become much slower, since ZK would always be in the write path. By doing write-behind with offset tracking, we can amortize the ZK write latency and potentially batch writes to ZK more efficiently.

Mixed mode migration support

Since mixed mode is primarily intended for developer environments, supporting mixed-mode migrations was not considered a priority for this design. By excluding it from this initial design, we simplify the implementation and remove an entire system configuration from the test matrix. The migration design is already complex, so any reduction in scope is helpful. In the future, we may add support for mixed-mode migrations based on this design.

