Debezium for Oracle - Part 2: Running the connector (2023)

This post is part of a three-part series exploring the use of Debezium to capture changes from an Oracle database using Oracle LogMiner. In case you missed it, the first part of this series is here.

In this second part, we'll build on what we did in part 1 by deploying the Oracle connector using Zookeeper, Kafka, and Kafka Connect. We'll look at a variety of connector configuration options and why they're important. And finally, we'll see the connector in action!

Kafka Connect setup and requirements

To use Debezium, three separate services must be started:

  • Zookeeper

  • Kafka broker

  • Kafka Connect

We are going to use Docker containers to run these services. Using separate containers simplifies the deployment process so that you can see Debezium in action. In addition, we are also going to download the Oracle JDBC driver and deploy it as part of the Kafka Connect container.

In production, you would run multiple instances of these services for performance, reliability, and fault tolerance, typically using a platform like OpenShift or Kubernetes to manage the containers, or dedicated hardware that you manage manually.

For this blog, we use a single instance of each service to keep it simple.

The Zookeeper and Kafka containers here are ephemeral. Volumes would typically be mounted on the host machine so that the data managed by a container persists after the container stops. For the sake of simplicity, we'll skip this step, so stopping a container causes data loss.
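If you do want the data to survive container restarts, mounting host directories over the image's data volumes would look roughly like this (a sketch; the /zookeeper/data and /zookeeper/txns container paths are assumptions based on the Debezium Zookeeper image, and the host paths are placeholders):

docker run -it --rm --name zookeeper \
  -v /path/on/host/zk-data:/zookeeper/data \
  -v /path/on/host/zk-txns:/zookeeper/txns \
  -p 2181:2181 -p 2888:2888 -p 3888:3888 \
  quay.io/debezium/zookeeper:1.9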

Prerequisites: Starting Zookeeper

The Zookeeper service is the first service to start. The Kafka broker uses Zookeeper to handle broker leader election and to manage discovery of services within the cluster, so each broker knows when a sibling has joined or left and which broker is the new leader for a given topic/partition pair.

Open a new terminal window and run the following command:

docker run -it --rm --name zookeeper \
  -p 2181:2181 -p 2888:2888 -p 3888:3888 \
  quay.io/debezium/zookeeper:1.9

The zookeeper container is started in interactive mode and is destroyed when stopped. The container is named zookeeper, which will be important when starting later containers.


Prerequisites: Starting Kafka

The Kafka service is the second service to start, and it depends on the Zookeeper service. Debezium produces change events that are sent to topics managed by the Kafka broker.

Open a new terminal window and run the following command:
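docker run -it --rm --name kafka \
  -p 9092:9092 \
  --link zookeeper:zookeeper \
  quay.io/debezium/kafka:1.9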

The kafka container is started in interactive mode and is destroyed when stopped. The container is named kafka, which will be important when starting later containers. Furthermore, the kafka container links to the zookeeper container, meaning the canonical name zookeeper resolves to the container running the Zookeeper service.

Prerequisites: Downloading the Oracle JDBC driver

The Debezium Kafka Connect image does not come with the Oracle JDBC driver. To use Debezium for Oracle, the JDBC driver must be manually downloaded and included in the Debezium Kafka Connect image.

Navigate to the Oracle Database JDBC driver download page. As of this writing, the latest Oracle database version is Oracle 21, so click the ojdbc8.jar link in the Oracle 21c section. The downloaded JAR is used in the next section, where the driver is added to Debezium's base Kafka Connect container image.

Prerequisites: Starting Kafka Connect

The Kafka Connect service is the third and final service to start, and it depends on the Kafka service. Kafka Connect manages all the connectors and their associated workloads, and it is the runtime environment responsible for running the Debezium connector for Oracle when we register it shortly.

Open a new terminal window and run the following command:

docker run -it --rm --name connect -p 8083:8083 \
  -e GROUP_ID=1 \
  -e CONFIG_STORAGE_TOPIC=my_connect_configs \
  -e OFFSET_STORAGE_TOPIC=my_connect_offsets \
  -e STATUS_STORAGE_TOPIC=my_connect_statuses \
  --link kafka:kafka \
  --link dbz_oracle21:dbz_oracle21 \
  -v /path/to/ojdbc8.jar:/kafka/libs/ojdbc8.jar \
  quay.io/debezium/connect:1.9

The connect container is started in interactive mode and is destroyed when stopped. The container is named connect, and the various environment variables control the naming of the required internal topics along with some required configuration parameters. Furthermore, the connect container links to the kafka container, meaning the canonical name kafka resolves to the container running the Kafka broker service.

Unlike the previous containers, we mount a volume with the -v argument, which takes the form local-path:container-path.

The local-path is where the ojdbc8.jar file exists on the host machine. The container-path should be /kafka/libs/ojdbc8.jar, which places the driver on the Kafka Connect classpath.
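Once the connect container is running, a quick sanity check (a sketch, using the container name connect from the command above) confirms the driver is in place:

docker exec connect ls -l /kafka/libs/ojdbc8.jar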

Create some initial test data

If the Oracle database created in part 1 of this series uses the Oracle container registry image, there will be no seed data in the database. While this isn't necessarily a problem, ideally we'd like to capture some data when deploying the Oracle connector, so some initial data should be available before deployment.

In a new terminal, let's connect to the database using SQL*Plus and create a new table with some initial data. The following uses a common user connecting to the pluggable database ORCLPDB1. You can safely skip this step if you are connecting to an existing environment that has tables to capture.

docker exec -it -e ORACLE_SID=ORCLPDB1 dbz_oracle21 sqlplus c##dbzuser@ORCLPDB1

Once connected, use the following SQL to create a table and some initial data:

CREATE TABLE customers (id number(9,0) primary key, name varchar2(50));
INSERT INTO customers VALUES (1001, 'Tomas Zimmer');
INSERT INTO customers VALUES (1002, 'George Bailey');
INSERT INTO customers VALUES (1003, 'Edward Walker');
INSERT INTO customers VALUES (1004, 'Anne Kretchmar');
COMMIT;

By default, the redo logs capture minimal information about changes to the CUSTOMERS table, because supplemental logging has so far only been enabled at the database level.

If you are familiar with PostgreSQL's REPLICA IDENTITY or MySQL's binlog_format, Oracle provides a similar mechanism called supplemental table-level logging, which we mentioned in part 1 of this series. Supplemental table-level logging controls which columns are recorded in the redo logs when users change rows. Setting the supplemental logging level to (ALL) COLUMNS guarantees that Oracle captures the changes needed for INSERT, UPDATE, and DELETE operations in the redo logs.

Use the following SQL to set the extra logging level for the table:

ALTER TABLE customers ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

Suppose the supplemental logging level of a captured table is set incorrectly. In that case, the connector logs a warning about the problem so that you can adjust the table's configuration and capture changes correctly.
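If you want to verify the setting yourself, one option (assuming a user with privileges to read the DBA_LOG_GROUPS view) is to query the table's log groups; a LOG_GROUP_TYPE of ALL COLUMN LOGGING confirms the level set above:

SELECT log_group_name, log_group_type, always
  FROM dba_log_groups
 WHERE table_name = 'CUSTOMERS';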

It should be noted that although this example creates the CUSTOMERS table with the same user account that the connector uses to connect, it is not uncommon for the connector's user to be different from the user who owns the tables in the Oracle database. In that case, the connector user must have permission to read the captured tables, i.e. the SELECT privilege on each table.
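For illustration, if the tables were owned by a separate (hypothetical) INVENTORY schema, the grant would look like this:

GRANT SELECT ON inventory.customers TO c##dbzuser;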


Deploying the Oracle connector

We are now ready to deploy the Debezium Oracle connector. Before registering the connector with Kafka Connect, let's look at the configuration in detail.

Below is the configuration that we will use in this example:


{"Name":"kunden connector","settings": {"connector.class":"io.debezium.conector.oracle.OracleConnector","tasks.max":"1","Database.Hostname":"dbz_oracle21","database.port":"1521","Database.User":"c##dbzuser","Database Password":"dbz","database.dbname":"ORCLCDB","database.pdb.name":"ORCLPDB1","Datenbank.Server.Name":"Servidor1","table.include.list":"C##DBZUSER.CLIENTE","database.history.kafka.bootstrap.servers":"kafka:9092","database.history.kafka.topic":"schema changes"}}

Let's take a look at what each of these settings means.

name

This is the name assigned to the connector, which must be unique within the Kafka Connect cluster.

connector.class

This is the implementation class of the connector. Each Debezium source connector has a unique class name that identifies which connector is deployed.

tasks.max

This is the maximum number of tasks assigned to the connector deployment in Kafka Connect. Most Debezium connectors read changes sequentially from the source database, so a value of 1 is usually appropriate.

database.hostname

This is the hostname or IP address of the database. Since we provided a link to the dbz_oracle21 container when starting Kafka Connect, we can use that name here to identify the container running the Oracle database. If you have an existing Oracle environment on another host, specify that hostname in this configuration property instead.

database.port

This is the port on which the database listens for connections. Oracle's default port is 1521; however, a database administrator can configure it to be any available port. If you are connecting to an existing Oracle instance, use the port that the database uses.

database.user

This is the database user account used for JDBC connections. This should be the common user created in the first part of this series, the c##dbzuser user. If you are connecting to an environment that does not use multi-tenancy, this is the user you created in the root database without the common-user prefix.

database.password

This is the password of the database user account.

database.dbname

This is the database the connector communicates with. Whether or not multi-tenancy is enabled, this is always the standalone or root container database.

database.pdb.name

This is the optional pluggable database (PDB) system identifier. This property must be specified when connecting to a database that uses multi-tenancy, and it references the PDB. If this field is omitted, the connector assumes the database does not use multi-tenancy.

database.server.name

This is the prefix used for all topics created by the connector. This value must be unique across all connectors deployed in the Kafka Connect cluster.

table.include.list

This is a comma-separated list of regular expressions or simple table names, in the form <schema>.<table>, identifying which tables are captured by the connector.
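For example, to also capture a hypothetical ORDERS table owned by the same schema, the list could read:

"table.include.list": "C##DBZUSER.CUSTOMERS,C##DBZUSER.ORDERS"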

database.history.kafka.bootstrap.servers

This is the address of the Kafka broker where the database history topic will be stored. Since we provided a link to the kafka container when starting Kafka Connect, we can use that name here along with the broker's port.

database.history.kafka.topic

This is the name of the topic that stores the database schema history. The topic is read when the connector restarts, and it populates the connector's in-memory relational model of the table schemas.

All Debezium connectors except PostgreSQL use a schema history topic, which by default stores the schema of every table in the database, not just the captured ones. This is typically not ideal for Oracle databases, especially if the connector is deployed without multi-tenancy.

To limit the schema history to only the tables in the include list, adjust the connector configuration by setting the database.history.store.only.captured.tables.ddl property to true.
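In the configuration shown earlier, that means adding one line to the config block:

"database.history.store.only.captured.tables.ddl": "true"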


For more information on other connector properties, see the Debezium Oracle connector documentation.

To deploy the connector, save the above configuration to a file named register-oracle.json. Now open a new terminal window and use the curl command to register the connector with Kafka Connect:

curl -i -X POST -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  localhost:8083/connectors \
  -d @register-oracle.json

If the registration is successful, the terminal where the connect container is running shows the connector starting and taking a snapshot of the data in the CUSTOMERS table. We can also confirm that the data exists in Kafka by using the Kafka console consumer tool and reading the topic's contents in a local terminal.
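You can also ask the Kafka Connect REST API for the connector's state; a minimal check, using the customers-connector name from the configuration above:

curl -s localhost:8083/connectors/customers-connector/status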

To verify the contents of the topic, use the same terminal where the connector was registered and run the following command:

docker exec -it kafka /kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server 0.0.0.0:9092 \
  --from-beginning \
  --property print.key=true \
  --topic server1.C__DBZUSER.CUSTOMERS

The topic name converts the schema name from C##DBZUSER to C__DBZUSER because the topic naming strategy automatically ensures that the topic name is Avro-compliant, and the hash character is not.

The output of the above command should look like this:

{"Scheme":{...},"useful load":{"Before":Null,"after":{"I WAS GOING":"1001","NAME":"Tomas Zimmer"},"Those":{"execution":"1.9.6. Final","connector":"oracle","Name":"Servidor1","ts_ms":1665102121000,"snapshot":"TRUE","Data bank":"ORCLPDB1","Serie":Null,"Scheme":"C##DBZUSER","Tisch":"CUSTOMERS","txId":Null,"scan":"2868546","cometer_scn":Null,"lcr_position":Null,"rs_id":Null,"sn":0,"redo_thread":Null},"op":"R","ts_ms":1665102126961,"transaction":Null}}...

You can now use the SQL*Plus terminal where you created the initial test data to INSERT, UPDATE, or DELETE records in the CUSTOMERS table, and you will see the corresponding change events appear on the server1.C__DBZUSER.CUSTOMERS topic that is currently being consumed.

Please note that SQL*Plus does not enable auto-commit, so be sure to explicitly commit your changes to the CUSTOMERS table to make them visible to the connector's mining process.
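For example, committing an update like the following (the new name is just an illustration) produces a corresponding change event on the topic:

UPDATE customers SET name = 'Anne Marie Kretchmar' WHERE id = 1004;
COMMIT;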

Conclusion

In the first part of this series, we discussed what Oracle is, why it's so popular in the database world, and how to install and configure the database. In this part of the series, we discussed how to install all the necessary services, including Zookeeper, Apache Kafka, and Kafka Connect. In addition, we also deployed a sample Oracle connector that captures changes to the CUSTOMERS table.

In the next part of this series, I'll talk about performance, how the connector can be monitored, its key metrics, and why they're important. We may even build a small dashboard from those metrics.


