NoSQL Zone is brought to you in partnership with:

Don Pinto is a Product Marketing Manager with experience in cloud and database technologies. Don is a DZone MVB and is not an employee of DZone and has posted 80 posts at DZone. You can read more from them at their website. View Full User Profile

From MySQL to Couchbase using Talend ETL

07.10.2013
| 5585 views |
  • submit to reddit

Curator's Note: The content of this article was originally written by Alexis Roos over at the Couchbase blog. 

Couchbase offers an unique NoSQL database combining integrated cache and storage technology. NoSQL is gaining strong popularity and a number of customers are looking to export or map some of the data they have in RDBMS to Couchbase NoSQL; this is where Talend’s offering can be leveraged.

Talend offers a number of capabilities around integration from business process management to enterprise service bus to master data management. Talend’s new Open Studio for Big Data version 5.3 release contains a Couchbase connector, which enables a large number of scenarios where Couchbase Server can be used alongside with traditional operational and analytical data sources.

To provide you with a basic idea on how to configure and use Talend with Couchbase Server, I will walk you through a simple scenario to create a ETL workflow using Talend Open Studio software to export data from MySQL database server into a Couchbase database server using JSON document format. The workflow will read from a MySQL database table and create corresponding JSON documents in Couchbase Server. You can build upon to example and augment it to support more complex scenarios involving data transformations, complex mappings, etc. The following steps have been tested on CentOS but can be adapted for other systems.

So let’s get started ...

Prerequisites

First, let’s look at the pre-requisites you need to get installed:

Start

Now, let’s make sure that everything is started

  • Start MySQL, if not started already:
    • sudo service mysqld start
  • Start Couchbase, if not started already:
    • sudo /etc/init.d/couchbase-server start
  • Launch Talend Open Studio for Big Data:
    • Execute following command or similar (based on installation directory and version): /opt/TOS_BD-r101800-V5.3.0/TOS_BD-linux-gtk-x86_64

Create

Now, let’s use Talend Open Studio to create the ETL workflow

  • After launching Talend, click on Create button and enter a project name such as “MySQLToCouchbaseIntegration” and click on Finish.
  • Select the created project and click on Open button

  • Create account in Talend or click on Skip button, this will open the IDE
  • Wait for IDE initialization to complete and close or minimize Welcome window
  • In Repository panel, left click on Job Designs and then right click on Create job
  • Enter job name, such as “MySQLToCouchbaseIntegrationJob” and click on Finish.

  • This will create a panel where job artifacts can be dragged and dropped and should display the palette to the right.
  • From the palette, Drag and Drop tMyslInput widget (can be found under Databases/MySQL) to the left

  • From the palette, Drag and Drop tCouchbaseOutput widget (can be found under Databases/Couchbase) to the right

  • Both components require jar files which will are not shipped with Talend software. Talend has a wiki explaining how to add jar files. At times, jar files can be directly downloaded from IDE or might need be downloaded manually and loaded into Talend by clicking on jar symbol (the one with “Add External JARs tooltip”) on Modules panel.

  • Jars must be loaded / resolved before processing to next steps. See detailed info
  • Double click on tMysqlInput on panel; this will display Component panel at bottom
  • Enter MySQL database connection info in panel. Note that all fields must be surrounded by double quotes (ie “)
  • Click on … button which is located to the right of Table Name, expand world and select City entry and Click on OK button
  • Remove default Query and replace by “SELECT * FROM City”
  • Below is an example screenshot of what it should look like:

  • Click on Guess schema button, this will bring up the Schema window, move and resize window as appropriate
  • Edit schema with the following changes:
    • Check Nullable for ID
    • Check Nullable for Population
    • Change ID type to String
    • Input DB types with following and click on OK button
      • INT for ID
      • VARCHAR for Name
      • VARCHAR for CountryCode
      • VARCHAR for District
      • INT for Population
  • Below is an example screenshot of what it should look like:

  • Left click on tMysqlInput and then right click to select Row / Main and drag arrow from tMysqlInput component to tCouchbaseOutput component. An arrow should be created between the two.

  • Left click on tCouchbaseOutput on panel; this will display Component panel at bottom
  • Enter Couchbase database connection info in panel. Note that all fields must be surrounded by double quotes (ie “)
  • In this example, the bucket name is default and it does not have authentication set up so username and password are left with “”
  • URI should be input by clicking on + button and changing field to pool URI such as http://localhost:8091/pools
  • Key should be left to ID (default) and Action on data to Insert
  • Below is an example screenshot of what it should look like:

  • Click on Configure JSON Tree … button; this will bring up mapper allowing mapping from table schema to JSON.
  • Select All schema attributes (using multi selection) on the left schema list and drop onto rootTag on the Link target window to the right and select Create as sub-element of target node, then click on OK button.
  • Output should look like this:

  • Select ID attribute in the Link target window on the right and right click on Set As Loop Element, then click on OK button

  • If everything is fine, both icons on the Designer panel should show no sign of errors, in which case we can now run the job
  • Click on Run panel, this will build job and execute it
  • Access your Couchbase server bucket and verify that the documents have been loaded: http://localhost:8091/index.html#sec=buckets
  • This should have added 4079 documents to the default bucket:

  • You can drill down and look at individual documents:

  • Voila!

As indicated, much more complex scenarios can be built and we will follow up soon with more elaborate examples.

Published at DZone with permission of Don Pinto, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)