Introduction

The Anypoint Connector for the Hadoop Distributed File System (HDFS) is used as a bi-directional gateway between Mule applications and HDFS.

Read through this user guide to understand how to set up and configure a basic flow using the connector. Track feature additions, compatibility, limitations and API version updates with each release of the connector using the Connector Release Notes. Review the connector operations and functionality using the Technical Reference alongside the demo applications.

MuleSoft maintains this connector under the Select support policy.

Prerequisites

This document assumes that you are familiar with Mule, Anypoint Connectors, and Anypoint Studio Essentials. To increase your familiarity with Studio, consider completing a Anypoint Studio Tutorial. This page requires some basic knowledge of Mule Concepts, Elements in a Mule Flow, and Global Elements.

To use the HDFS connector, you need:

  • Anypoint Studio - An instance of Anypoint Studio. If you do not use Anypoint Studio for development, follow the instructions in Configuring Maven Dependencies for your project.

  • An instance of Hadoop Distributed File System up and running. It can be downloaded from here.

Hardware and Software Requirements

For hardware and software requirements, please visit the Hardware and Software Requirements page.

Compatibility

HDFS Hadoop connector is compatible with the following:

Application/Service Version

Mule Runtime

3.6 or newer

Apache Hadoop

2.7.1 or newer

Starting with v5.0.0, the HDFS connector is licensed commercially with Anypoint Platform as are other Select connectors. Prior versions will remain freely available to the community.

Install the Connector

You can install the connector in Anypoint Studio using the instructions in Installing a Connector from Anypoint Exchange.

Upgrading from an Older Version

If you’re currently using an older version of the connector, a small popup appears in the bottom right corner of Anypoint Studio with an "Updates Available" message.

  1. Click the popup and check for available updates. 

  2. Click the Connector version checkbox and click Next and follow the instructions provided by the user interface. 

  3. Restart Studio when prompted. 

  4. After restarting, when creating a flow and using the HDFS Connector, if you have several versions of the connector installed, you may be asked which version you would like to use. Choose the version you would like to use.

Additionally, we recommend that you keep Studio up to date with its latest version.

Configure the Connector Global Element

To use the HDFS connector in your Mule application, you must configure a global HDFS element that can be used by the connector (read more about Global Elements). The HDFS connector offers the following global configuration, requiring the following credentials:

  1. Simple authetication configuration:

    Field Description

    NameNode URI

    The URI of the file system to connect to.

    It is passed to HDFS client as the FileSystem#FS_DEFAULT_NAME_KEY configuration entry. It can be overriden by values in configurationResources and configurationEntries.

    Username

    A simple user identity of a client process.

    It is passed to HDFS client as the "hadoop.job.ugi" configuration entry. It can be overriden by values in configurationResources and configurationEntries. If not provided it will use the currently logged in user.

    Configuration Resources

    A list of configuration resource files to be loaded by the HDFS client. Here you can provide additional configuration files. (e.g core-site.xml)

    Configuration Entries

    A map of configuration entries to be used by the HDFS client. Here you can provide additional configuration entries as key/value pairs.

    hdfs-config

  2. Kerberos authentication configuration:

    Field Description

    NameNode URI

    The URI of the file system to connect to.

    It is passed to HDFS client as the FileSystem#FS_DEFAULT_NAME_KEY configuration entry. It can be overriden by values in configurationResources and configurationEntries.

    Username

    A simple user identity of a client process.

    It is passed to HDFS client as the "hadoop.job.ugi" configuration entry. It can be overriden by values in configurationResources and configurationEntries. If not provided it will use the currently logged in user.

    KeytabPath

    Path to the keytab file associated with username.

    It is used in order to obtain TGT from "Authorization server". If not provided it will look for a TGT associated to username within your local kerberos cache.

    Configuration Resources

    A list of configuration resource files to be loaded by the HDFS client. Here you can provide additional configuration files. (e.g core-site.xml)

    Configuration Entries

    A map of configuration entries to be used by the HDFS client. Here you can provide additional configuration entries as key/value pairs.

    hdfs-config-with-kerberos

Using the Connector

You can use this connector as an inbound endpoint for polling content of a file at a configurable rate (interval) or as an outbound connector for manipulating data into the HDFS server.

See a full list of operations for any version of the connector here.

Connector Namespace and Schema

When designing your application in Studio, the act of dragging the connector from the palette onto the Anypoint Studio canvas should automatically populate the XML code with the connector namespace and schema location.

If you are manually coding the Mule application in Studio’s XML editor or other text editor, define the namespace and schema location in the header of your Configuration XML, inside the <mule> tag.
1
2
3
4
5
6
7
8
9
10
11
12
<mule xmlns="http://www.mulesoft.org/schema/mule/core"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:connector="http://www.mulesoft.org/schema/mule/hdfs"
      xsi:schemaLocation="
               http://www.mulesoft.org/schema/mule/core
               http://www.mulesoft.org/schema/mule/core/current/mule.xsd
               http://www.mulesoft.org/schema/mule/connector
               http://www.mulesoft.org/schema/mule/connector/current/mule-hdfs.xsd">

      <!-- put your global configuration elements and flows here -->

</mule>

Using the Connector in a Mavenized Mule App

If you are coding a Mavenized Mule application, this XML snippet must be included in your pom.xml file.

1
2
3
4
5
<dependency>
  <groupId></groupId>
  <artifactId></artifactId>
  <version></version>
</dependency>

Inside the <version> tags, put the desired version number, the word RELEASE for the latest release, or SNAPSHOT for the latest available version. The available versions to date are:

  • 5.0.0

  • 4.0.0

  • 3.7.0

  • 3.6.0

Demo Mule Applications Using Connector

Exsting demos demontrate how to use connector for basic file system operations and how to poll data from a file at specific interval.

First Example Use Case

The following example shows how to create a text file into HDFS through connector:

  1. In Anypoint Studio, click File > New > Mule Project, name the project, and click OK.

  2. In the search field, type "http" and drag the HTTP connector to the canvas, click the green plus sign to the right of Connector Configuration, and in the next screen, click OK to accept the default settings. Name the endpoint /createFile.

  3. In the Search bar type "HDFS" and drag the HDFS connector onto the canvas. Configure as explained Configure the Connector Global Element

  4. Choose Write to path as an operation. Set Path to "/test.txt" (this is the path of the file that is going to be created into HDFS) and leave other options with default values.

  5. Run the application and from your favorite http client make a POST request having "Content-type:plain/text" to locahost:8081/createFile with content that you want to write as payload. (e.g. curl -X POST -H "Content-Type:plain/text" -d "payload to write to file" localhost:8090/createFile)

  6. Check that /test.txt has been created and has your content by using Hadoop explorer.

This and other basic file systen operations are demonstrated within this demo.

First Example Use Case - XML

Paste this into Anypoint Studio to interact with the example use case application discussed in this guide.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
<?xml version="1.0" encoding="UTF-8"?>

<mule xmlns:hdfs="http://www.mulesoft.org/schema/mule/hdfs" xmlns:http="http://www.mulesoft.org/schema/mule/http" xmlns="http://www.mulesoft.org/schema/mule/core" xmlns:doc="http://www.mulesoft.org/schema/mule/documentation"
        xmlns:spring="http://www.springframework.org/schema/beans"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-current.xsd
http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd
http://www.mulesoft.org/schema/mule/hdfs http://www.mulesoft.org/schema/mule/hdfs/current/mule-hdfs.xsd">
    <http:listener-config name="HTTP_Listener_Configuration" host="0.0.0.0" port="8081" doc:name="HTTP Listener Configuration"/>
    <hdfs:config name="HDFS__Configuration" nameNodeUri="hdfs://localhost:9000" doc:name="HDFS: Configuration"/>
    <flow name="hdfs-example-use-caseFlow">
        <http:listener config-ref="HTTP_Listener_Configuration" path="/createFile" doc:name="HTTP"/>
        <hdfs:write config-ref="HDFS__Configuration" path="/test.txt" doc:name="HDFS"/>
    </flow>
</mule>

Second Example Use Case

The following example shows how to set up the Hadoop Connector for Windows from Anypoint Studio:

  1. Download the Hadoop files and paste them into a local folder

  2. Set the following Environment Variables HADOOP_HOME: C:\Users\user\hadoop\winutils-master\hadoop-2.7.1 > KRB5_CONFIG: C:\Users\user\hadoop > KRB5CCNAME: C:\Users\user\hadoop\tmp\krb5cache

  3. In Anypoint Studio, click File > New > Mule Project, name the project, and click OK.

  4. In the search field, type set-variable. Drag the connector set-variable component on the canvas. Set the KRB5.conf location in a flow variable, like the following: ${mule.home}${file.separator}apps${file.separator}${app.name}${file.separator}classes${file.separator}krb5.conf

  5. In the search field, type expression-component. Drag the connector expression-component on the canvas. Set the Expression like: System.setProperty("java.security.krb5.conf",flowVars.krb5Loc);

  6. In the search field, type set-variable. Drag the connector set-variable component on the canvas. Set the TrustStore Location like this: ${mule.home}${file.separator}apps${file.separator}${app.name}${file.separator}classes${file.separator}hdfs.ts

  7. In the search field, type expression-component. Drag the connector expression-component on the canvas. Set the System.setProperty("javax.net.ssl.trustStore",flowVars.HDFS_TrustStore);

  8. In the search field, type expression-component. Drag the connector expression-component on the canvas. Set the System.setProperty("javax.net.ssl.trustStorePassword","example");

The following second example xml:

< ?xml version="1.0" encoding="UTF-8"?>
            < mule xmlns:tls="http://www.mulesoft.org/schema/mule/tls" xmlns:file="http://www.mulesoft.org/schema/mule/file" xmlns:scripting="http://www.mulesoft.org/schema/mule/scripting" xmlns:api-platform-gw="http://www.mulesoft.org/schema/mule/api-platform-gw" xmlns:dw="http://www.mulesoft.org/schema/mule/ee/dw" xmlns:tracking="http://www.mulesoft.org/schema/mule/ee/tracking" xmlns:hdfs="http://www.mulesoft.org/schema/mule/hdfs" xmlns:doc="http://www.mulesoft.org/schema/mule/documentation" xmlns="http://www.mulesoft.org/schema/mule/core" xmlns:apikit="http://www.mulesoft.org/schema/mule/apikit" xmlns:http="http://www.mulesoft.org/schema/mule/http" xmlns:spring="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mulesoft.org/schema/mule/core http://www.mulesoft.org/schema/mule/core/current/mule.xsd
http://www.mulesoft.org/schema/mule/http http://www.mulesoft.org/schema/mule/http/current/mule-http.xsd
http://www.mulesoft.org/schema/mule/apikit http://www.mulesoft.org/schema/mule/apikit/current/mule-apikit.xsd
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
http://www.mulesoft.org/schema/mule/hdfs http://www.mulesoft.org/schema/mule/hdfs/current/mule-hdfs.xsd
http://www.mulesoft.org/schema/mule/ee/tracking http://www.mulesoft.org/schema/mule/ee/tracking/current/mule-tracking-ee.xsd
http://www.mulesoft.org/schema/mule/ee/dw http://www.mulesoft.org/schema/mule/ee/dw/current/dw.xsd
http://www.mulesoft.org/schema/mule/api-platform-gw http://www.mulesoft.org/schema/mule/api-platform-gw/current/mule-api-platform-gw.xsd
http://www.mulesoft.org/schema/mule/scripting http://www.mulesoft.org/schema/mule/scripting/current/mule-scripting.xsd
http://www.mulesoft.org/schema/mule/file http://www.mulesoft.org/schema/mule/file/current/mule-file.xsd
http://www.mulesoft.org/schema/mule/tls http://www.mulesoft.org/schema/mule/tls/current/mule-tls.xsd">
    < http:listener-config name="keystone-system-httpListenerConfig" host="0.0.0.0" port="8091" doc:name="HTTP Listener Configuration"/>
    < apikit:config name="keystone-system-config" raml="keystone-system.raml" consoleEnabled="false" doc:name="Router" keepRamlBaseUri="true"/>
    < hdfs:config-with-kerberos name="HDFS__Kerberos_Configuration" nameNodeUri="uri" username="username" keytabPath="${mule.home}${file.separator}apps${file.separator}${app.name}${file.separator}classes${file.separator}.keytab" doc:name="HDFS: Kerberos Configuration" >
        < hdfs:configuration-entries>
            < hdfs:configuration-entry key="hadoop.security.key.provider.path">kms://https@username.path.com:93
            < hdfs:configuration-entry key="dfs.encryption.key.provider.uri">kms://https@username.path.com:93
        < /hdfs:configuration-entries>
    < /hdfs:config-with-kerberos>
    < api-platform-gw:api apiName="api-system-keystone-dev" version="1.0.0" flowRef="keystone-system-main" create="true" apikitRef="keystone-system-config" doc:name="API Autodiscovery"/>
    < flow name="keystone-system-main">< http:listener config-ref="keystone-system-httpListenerConfig" path="/api/*" doc:name="HTTP"/>
        < apikit:router config-ref="keystone-system-config" doc:name="APIkit Router"/>
        < exception-strategy ref="keystone-system-apiKitGlobalExceptionMapping" doc:name="Reference Exception Strategy"/>
    < /flow>
    < flow name="keystone-system-console">
        < http:listener config-ref="keystone-system-httpListenerConfig" path="/console/*" doc:name="HTTP"/>
        < apikit:console config-ref="keystone-system-config" doc:name="APIkit Console"/>
    < /flow>
    < flow name="get:/hadoop/hdfs/secure/hr/read:keystone-system-config">
        < flow-ref name="ConfigureHDFS" doc:name="ConfigureHDFS"/>
        < set-variable variableName="HDFSPathToRead" value="#[message.inboundProperties.'http.query.params'.hdfspath]" doc:name="HDFS Path"/>
        < hdfs:read-operation config-ref="HDFS__Kerberos_Configuration" path="#[flowVars.HDFSPathToRead]" doc:name="ReadFileFromHDFS"/>
        < choice-exception-strategy doc:name="Choice Exception Strategy">
            < catch-exception-strategy enableNotifications="false" logException="false" doc:name="Catch Exception Strategy">
                < expression-component doc:name="Expression"> -1) {
	msg = "Missing or Null Payload not Allowed";
} else
{
msg = msg.substring(badRequestException.length());
}
} else if (msg.startsWith(badConnectionException)) {
	msg = msg.substring(badConnectionException.length());

}
payload = msg.replace("\\","\\\\").replace("\"", "\\\"");
]]>< /expression-component>
                < choice doc:name="Choice">
                    < when expression="#[exception.cause.toString().startsWith("org.mule.api.ConnectionException: ")]">
                        < set-property propertyName="http.status" value="#[404]" doc:name="Status - 404"/>
                    < /when>
                    < otherwise>
                        < set-property propertyName="http.status" value="#[500]" doc:name="Status - 500"/>
                    < /otherwise>
                < /choice>
                < dw:transform-message doc:name="Transform Message">
                    < dw:set-payload>< /dw:set-payload>
                < /dw:transform-message>
            < /catch-exception-strategy>
        < /choice-exception-strategy>

    < /flow>
    < flow name="post:/hadoop/hdfs/secure/hr/write:multipart/form-data:keystone-system-config">
        < flow-ref name="ConfigureHDFS" doc:name="ConfigureHDFS"/>
        < set-payload value="#[message.inboundAttachments]" doc:name="Retrieve Attachments"/>
        < set-variable variableName="myMap" value="#[[:]]" doc:name="MyMap"/>

              < foreach doc:name="For Each">
            < logger message="Attachment: #[payload.getName()]" level="DEBUG" doc:name="Logger"/>
            < expression-component doc:name="Expression">< /expression-component>
< /foreach>
        < hdfs:write config-ref="HDFS__Kerberos_Configuration" path="#[flowVars.myMap.hdfspath]" payload-ref="#[flowVars.myMap.fileContents]" doc:name="WriteToHDFS"/>
        < choice-exception-strategy doc:name="Copy_of_Choice Exception Strategy">
            < catch-exception-strategy enableNotifications="false" logException="false" doc:name="Copy_of_Catch Exception Strategy">
                < expression-component doc:name="Copy_of_Expression"> -1) {
	msg = "Missing or Null Payload not Allowed";
} else
{
msg = msg.substring(badRequestException.length());
}
} else if (msg.startsWith(badConnectionException)) {
	msg = msg.substring(badConnectionException.length());

}
payload = msg.replace("\\","\\\\").replace("\"", "\\\"");
]]>< /expression-component>
                < choice doc:name="Copy_of_Choice">
                    < when expression="#[exception.cause.toString().startsWith("org.mule.api.ConnectionException: ")]">
                        < set-property propertyName="http.status" value="#[404]" doc:name="Copy_of_Status - 404"/>
                    < /when>
                    < otherwise>
                        < set-property propertyName="http.status" value="#[500]" doc:name="Copy_of_Status - 500"/>
                    < /otherwise>
                < /choice>
                < dw:transform-message doc:name="Copy_of_Transform Message">
                    < dw:set-payload>< /dw:set-payload>
                < /dw:transform-message>
            < /catch-exception-strategy>
        < /choice-exception-strategy>

    < /flow>
    < apikit:mapping-exception-strategy name="keystone-system-apiKitGlobalExceptionMapping">
        < apikit:mapping statusCode="404">
            < apikit:exception value="org.mule.module.apikit.exception.NotFoundException" />
            < dw:transform-message doc:name="Transform Message">
                < dw:set-payload>< /dw:set-payload>
            < /dw:transform-message>

        < /apikit:mapping>
        < apikit:mapping statusCode="405">
            < apikit:exception value="org.mule.module.apikit.exception.MethodNotAllowedException" />
            < dw:transform-message doc:name="Transform Message">
                < dw:set-payload>< /dw:set-payload>
            < /dw:transform-message>

        < /apikit:mapping>
        < apikit:mapping statusCode="415">
            < apikit:exception value="org.mule.module.apikit.exception.UnsupportedMediaTypeException" />
            < dw:transform-message doc:name="Transform Message">
                < dw:set-payload>< /dw:set-payload>
            < /dw:transform-message>

        < /apikit:mapping>
        < apikit:mapping statusCode="406">
            < apikit:exception value="org.mule.module.apikit.exception.NotAcceptableException" />
            < dw:transform-message doc:name="Transform Message">
                < dw:set-payload>< /dw:set-payload>
            < /dw:transform-message>

        < /apikit:mapping>
        < apikit:mapping statusCode="400">
            < apikit:exception value="org.mule.module.apikit.exception.BadRequestException" />
            < dw:transform-message doc:name="Transform Message">
                < dw:set-payload>< /dw:set-payload>
            < /dw:transform-message>

        < /apikit:mapping>
    < /apikit:mapping-exception-strategy>
    < sub-flow name="ConfigureHDFS">
        < set-variable variableName="krb5Loc" value="${mule.home}${file.separator}apps${file.separator}${app.name}${file.separator}classes${file.separator}krb5.conf" doc:name="Set KRB5 Variable"/>
        < expression-component doc:name="Set System Property">
        < set-variable variableName="HDFS_TrustStore" value="${mule.home}${file.separator}apps${file.separator}${app.name}${file.separator}classes${file.separator}hdfs.ts" doc:name="Set TrustStore"/>
        < expression-component doc:name="Set TrustStore Location">
        < expression-component doc:name="Set TrustStore Password">
    < /sub-flow>
< /mule>