This post is part of a three-part series exploring the use of Debezium to capture changes from an Oracle database using Oracle LogMiner. In case you missed it, the first part of this series is here.
In this second part, we'll build on what we did in part 1 by deploying the Oracle connector using Zookeeper, Kafka, and Kafka Connect. We'll look at a variety of connector configuration options and why they're important. And finally, we'll see the connector in action!
Kafka Connect setup and requirements
To use Debezium, three separate services must be started:
- Zookeeper
- Kafka
- Kafka Connect
We are going to use Docker containers to run the above services. Using separate containers simplifies the deployment process so that you can see Debezium in action. In addition, we are also going to download the Oracle JDBC driver and deploy it as part of the Kafka Connect container.
Using multiple instances of these services in production provides performance, reliability, and fault tolerance. Such a deployment would typically involve a platform like OpenShift or Kubernetes to manage the containers, or you would use dedicated hardware and manage them manually. For this blog, we use a single instance of each service to keep things simple.
The Zookeeper and Kafka containers are ephemeral. Typically, volumes would be mounted from the host machine so that the data the containers manage persists after the containers stop. For the sake of simplicity, we'll skip this step, so when a container stops, all of its data is lost.
Prerequisites: Starting Zookeeper
The Zookeeper service is the first service to start. The Kafka broker uses Zookeeper to handle the broker leadership election and to manage service discovery within the cluster, so that each broker knows when a sibling has joined or left, when a broker has died, and who the new leader is for a given topic/partition tuple.
Open a new terminal window and run the following command:
docker run -it --rm --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 \ quay.io/debezium/zookeeper:1.9
The zookeeper container is started in interactive mode and is destroyed when stopped. The container is named zookeeper, which will be important when starting future containers.
Prerequisites: Starting Kafka
The Kafka service is the second service to start and depends on the Zookeeper service. Debezium produces change events that are sent to topics managed by the Kafka broker.
Open a new terminal window and run the following command:
docker run -it --rm --name kafka -p 9092:9092 --link zookeeper:zookeeper \ quay.io/debezium/kafka:1.9
The kafka container is started in interactive mode and is destroyed when stopped. The container is named kafka, which will be important when starting future containers. Furthermore, the kafka service links to the zookeeper service, meaning the canonical name zookeeper resolves to the container running the zookeeper service.
Prerequisites: Download the Oracle JDBC driver
The Debezium Kafka Connect image does not come with the Oracle JDBC driver. To use Debezium for Oracle, the JDBC driver must be manually downloaded and included in the Debezium Kafka Connect image.
Navigate to the Oracle Database JDBC driver download page. As of this writing, the latest Oracle database version is Oracle 21, so click the ojdbc8.jar link in the Oracle 21c section. The downloaded JAR is used in the next section, where the driver is added to Debezium's base Kafka Connect container image.
Prerequisites: Starting Kafka Connect
The Kafka Connect service is the third and final service to start, and it depends on the Kafka service. Kafka Connect is responsible for managing connectors and their associated workloads, and it is the runtime environment that will run the Debezium connector for Oracle once we deploy it shortly.
Open a new terminal window and run the following command:
docker run -it --rm --name connect -p 8083:8083 \ -e GROUP_ID=1 \ -e CONFIG_STORAGE_TOPIC=my_connect_configs \ -e OFFSET_STORAGE_TOPIC=my_connect_offsets \ -e STATUS_STORAGE_TOPIC=my_connect_statuses \ --link kafka:kafka \ --link dbz_oracle21:dbz_oracle21 \ -v /path/to/ojdbc8.jar:/kafka/libs/ojdbc8.jar \ quay.io/debezium/connect:1.9
The connect container is started in interactive mode and is destroyed when stopped. The container is named connect, and several environment variables control the names of various required topics and some required configuration parameters. Furthermore, the connect container links to the kafka container, meaning the canonical name kafka resolves to the container running the kafka broker service.
Unlike the previous containers, we also mount a volume, where /path/to/ojdbc8.jar represents the local path to the Oracle JDBC driver downloaded in the previous step.
Create some initial test data
If the Oracle database created in part 1 of this series uses the Oracle container registry image, there will be no seed data in the database. While this isn't necessarily a problem, ideally we'd like to capture some data when deploying the Oracle connector; therefore, some initial data should exist before deployment.
In a new terminal, let's connect to the database using SQL*Plus and create a new table with some initial data. The following connects as the common user to the pluggable database ORCLPDB1. You can safely skip this step if you are connecting to an existing environment that has tables to capture.
docker exec -it -e ORACLE_SID=ORCLPDB1 dbz_oracle21 sqlplus c##dbzuser@ORCLPDB1
Once connected, use the following SQL to create a table and some initial data:
CREATE TABLE customers (id number(9,0) primary key, name varchar2(50));
INSERT INTO customers VALUES (1001, 'Tomas Zimmer');
INSERT INTO customers VALUES (1002, 'George Bailey');
INSERT INTO customers VALUES (1003, 'Edward Walker');
INSERT INTO customers VALUES (1004, 'Ana Kretchmar');
COMMIT;
By default, the redo logs capture minimal information about changes to the CUSTOMERS table because supplemental logging has only been defined at the database level. If you are familiar with PostgreSQL's REPLICA IDENTITY or MySQL's binlog_format, Oracle provides a similar mechanism called supplemental table-level logging, which we mentioned in part 1 of this series. Supplemental table-level logging controls which columns are captured in the redo logs when users change rows. Setting the supplemental table log level to (ALL) COLUMNS guarantees that Oracle captures the changes related to INSERT, UPDATE, and DELETE operations in the redo logs.
Use the following SQL to set the supplemental logging level for the table:
ALTER TABLE customers ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
If the supplemental logging level of a captured table is set incorrectly, the connector logs a warning about the problem so that you can adjust the table's configuration and capture the changes.
It should be noted that although this example creates the CUSTOMERS table with the same user account the connector uses to connect, it is not uncommon for the connector's user to be different from the user who owns the tables in the Oracle database. In that case, the connector user must be granted permission to read the captured tables, which is the SELECT permission on the table.
Deploying the Oracle connector
We are now ready to deploy the Debezium Oracle connector. Before registering the connector with Kafka Connect, let's look at the configuration in detail.
Below is an example configuration that we will use in this example:
{
  "name": "customers-connector",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "tasks.max": "1",
    "database.hostname": "dbz_oracle21",
    "database.port": "1521",
    "database.user": "c##dbzuser",
    "database.password": "dbz",
    "database.dbname": "ORCLCDB",
    "database.pdb.name": "ORCLPDB1",
    "database.server.name": "server1",
    "table.include.list": "C##DBZUSER.CUSTOMERS",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes"
  }
}
Let's take a look at what each of these settings means.
- name - This is the name assigned to the connector, which must be unique within the Kafka Connect cluster.
- connector.class - This is the class implementation of the given connector. Each of the Debezium source connectors has a unique class name that identifies which connector is deployed.
- tasks.max - This is the maximum number of tasks given to the connector deployment within Kafka Connect. Most Debezium source connectors read changes sequentially from the source database, so a value of 1 is generally appropriate.
- database.hostname - This is the hostname or IP address of the database. Since we supplied a link to the dbz_oracle21 container when starting Kafka Connect, we can use that name here to identify the container running the Oracle database. If you have an existing Oracle environment on another host, specify that hostname in this configuration property.
- database.port - This is the port the database listens on for connections. Oracle's default port is 1521, but a database administrator can configure the database to use any available port. If you are connecting to an existing Oracle instance, use the port that database listens on.
- database.user - This is the database user account used for JDBC connections. This should be the common user created in part 1 of this series, the c##dbzuser user. If you are connecting to an environment that does not use multitenancy, this is the user you created in the root database without the common-user prefix.
- database.password - This is the password for the database user account.
- database.dbname - This is the database service the connector communicates with. Whether multitenancy is enabled or not, this is always the standalone or root container database.
- database.pdb.name - This is the optional pluggable database system identifier. This property must be specified when connecting to a database that uses multitenancy, and it should reference the PDB. If this field is omitted, the connector assumes the database does not use multitenancy.
- database.server.name - This is the prefix used for all topics created by the connector. This value must be unique across all connector deployments within the Kafka Connect cluster.
- table.include.list - This is a comma-separated list of regular expressions or simple table names in the form <schema>.<table> that identifies which tables the connector captures.
- database.history.kafka.bootstrap.servers - This is the URL of the Kafka broker where the database history topic will be stored. Since we supplied a link to the kafka container when starting Kafka Connect, we can use that name here to refer to the broker and its port.
- database.history.kafka.topic - This is the name of the topic where the database schema history is stored. This topic is read when the connector restarts, repopulating the in-memory relational model from the topic's contents.
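Entries in table.include.list are matched as anchored regular expressions against the fully-qualified schema.table identifier, which is why both plain names and patterns work. The sketch below is a simplified illustration of that filtering idea, not Debezium's actual implementation:

```python
import re

def is_included(include_list: str, schema: str, table: str) -> bool:
    """Simplified illustration of include-list filtering: each
    comma-separated entry is matched as an anchored regular expression
    against the fully-qualified <schema>.<table> identifier."""
    identifier = f"{schema}.{table}"
    return any(
        re.fullmatch(pattern.strip(), identifier) is not None
        for pattern in include_list.split(",")
    )

# A plain table name matches only that table ...
print(is_included("C##DBZUSER.CUSTOMERS", "C##DBZUSER", "CUSTOMERS"))  # True
# ... while a regex entry can match several tables at once.
print(is_included(r"C##DBZUSER\.CUST.*", "C##DBZUSER", "CUSTOMERS"))   # True
```

Because each entry is anchored, a pattern never matches a mere substring of another table's name, which avoids accidentally capturing unrelated tables.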
All Debezium connectors, except PostgreSQL, use a schema history topic to store the schemas of all tables. This is typically not ideal for Oracle databases, especially if the connector is deployed without multitenancy. To limit the history to only the tables in the include list, adjust the connector configuration by setting the database.history.store.only.captured.tables.ddl property to true.
For more information about other connector properties, see the Debezium Oracle connector documentation.
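Before registering the configuration, it can be handy to sanity-check that it contains the properties discussed above. Here is a minimal Python sketch; the required-key set below simply mirrors this post's example and is illustrative, not an exhaustive list of what the connector accepts:

```python
import json

# Illustrative set of properties used in this post's example configuration;
# not an exhaustive list of everything the Oracle connector accepts.
REQUIRED_KEYS = {
    "connector.class",
    "database.hostname",
    "database.port",
    "database.user",
    "database.password",
    "database.dbname",
    "database.server.name",
    "database.history.kafka.bootstrap.servers",
    "database.history.kafka.topic",
}

def check_connector_config(raw_json: str) -> list:
    """Return the sorted list of expected properties missing from 'config'."""
    doc = json.loads(raw_json)
    config = doc.get("config", {})
    return sorted(REQUIRED_KEYS - config.keys())

example = """
{
  "name": "customers-connector",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "tasks.max": "1",
    "database.hostname": "dbz_oracle21",
    "database.port": "1521",
    "database.user": "c##dbzuser",
    "database.password": "dbz",
    "database.dbname": "ORCLCDB",
    "database.pdb.name": "ORCLPDB1",
    "database.server.name": "server1",
    "table.include.list": "C##DBZUSER.CUSTOMERS",
    "database.history.kafka.bootstrap.servers": "kafka:9092",
    "database.history.kafka.topic": "schema-changes"
  }
}
"""

missing = check_connector_config(example)
print(missing or "config looks complete")  # prints: config looks complete
```

Catching a missing property locally is quicker than decoding the error Kafka Connect returns after a failed registration.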
To deploy the connector, save the above configuration to a file named register-oracle.json. Now open a new terminal window and use the curl command to register the connector with Kafka Connect:
curl -i -X POST -H "Accept:application/json" \ -H "Content-Type:application/json" \ localhost:8083/connectors \ -d @register-oracle.json
If the registration is successful, the terminal where the connect container is running shows that the connector has started and begins taking a snapshot of the data in the CUSTOMERS table. We can also confirm that the data exists in Kafka by using the Kafka console consumer tool and reading the topic's contents from a local terminal.
To verify the content of the topic, use the same terminal where the connector was registered and run the following command:
docker exec -it kafka /kafka/bin/kafka-console-consumer.sh \ --bootstrap-server 0.0.0.0:9092 \ --from-beginning \ --property print.key=true \ --topic server1.C__DBZUSER.CUSTOMERS
The topic name has the schema name converted from C##DBZUSER to C__DBZUSER because the # character is not valid in Kafka topic names, so invalid characters are replaced with underscores.
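That sanitization can be sketched as a one-liner: Kafka topic names may only contain ASCII alphanumerics, '.', '_', and '-', so anything else is replaced with an underscore. The helper below is an illustration of the mapping, not Debezium's actual code:

```python
import re

def change_topic_name(server_name: str, schema: str, table: str) -> str:
    """Build the per-table change topic name, replacing characters that
    are not valid in Kafka topic names (the # in C##DBZUSER becomes _)."""
    topic = f"{server_name}.{schema}.{table}"
    # Kafka topic names may contain only ASCII alphanumerics, '.', '_' and '-'.
    return re.sub(r"[^a-zA-Z0-9._-]", "_", topic)

print(change_topic_name("server1", "C##DBZUSER", "CUSTOMERS"))
# server1.C__DBZUSER.CUSTOMERS
```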
The output of the above command should look like this:
{"schema":{...},"payload":{"before":null,"after":{"ID":"1001","NAME":"Tomas Zimmer"},"source":{"version":"1.9.6.Final","connector":"oracle","name":"server1","ts_ms":1665102121000,"snapshot":"true","db":"ORCLPDB1","sequence":null,"schema":"C##DBZUSER","table":"CUSTOMERS","txId":null,"scn":"2868546","commit_scn":null,"lcr_position":null,"rs_id":null,"ssn":0,"redo_thread":null},"op":"r","ts_ms":1665102126961,"transaction":null}}...
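Downstream consumers typically only care about the payload portion of each event. A minimal sketch of unpacking an event shaped like the output above (the helper and the trimmed sample event are illustrative, using just the fields shown in the snapshot output):

```python
import json

def summarize_event(raw: str) -> str:
    """Summarize a Debezium change event: operation, table, and row state."""
    payload = json.loads(raw)["payload"]
    source = payload["source"]
    # Debezium operation codes: 'r' = snapshot read, 'c' = create/insert,
    # 'u' = update, 'd' = delete.
    op = payload["op"]
    # For deletes 'after' is null, so fall back to the 'before' image.
    row = payload["after"] or payload["before"]
    return f"{op} {source['schema']}.{source['table']} {row}"

# Trimmed version of the snapshot event shown above.
event = json.dumps({
    "payload": {
        "before": None,
        "after": {"ID": "1001", "NAME": "Tomas Zimmer"},
        "source": {"schema": "C##DBZUSER", "table": "CUSTOMERS"},
        "op": "r",
    }
})

print(summarize_event(event))
```

The before/after pair is what makes these events useful: an update carries both the old and new row images, so consumers can react to exactly what changed.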
You can now use the SQL*Plus terminal where you created the initial test data to INSERT, UPDATE, or DELETE records in the CUSTOMERS table, and you will see the corresponding change events appear on the server1.C__DBZUSER.CUSTOMERS topic that is currently being consumed.
Please note that SQL*Plus does not enable auto-commit; a COMMIT must be executed before the connector can capture the change events.
Conclusion
In the first part of this series, we discussed what Oracle is, why it's so popular in the database world, and how to install and configure the database. In this part of the series, we discussed how to install all the necessary services, including Zookeeper, Apache Kafka, and Kafka Connect. In addition, we also deployed a sample Oracle connector that captures changes to the CUSTOMERS table.
In the next part of this series, I'll talk about performance, how the connector is monitored, key metrics, and why they're important. We can even create a small dashboard with metrics.