Hadoop Distributed File System (HDFS) Connector
Additional Info

Requires Mule Enterprise License | Requires Entitlement | Mule Version |
---|---|---|
Yes | No | 3.6.0 or higher |
Configs
Kerberos Configuration
<hdfs:config-with-kerberos>
Connection Management
Kerberos authentication configuration. Here you can configure the properties required by Kerberos authentication to establish a connection with the Hadoop Distributed File System.
Attributes

Name | Java Type | Description | Default Value | Required |
---|---|---|---|---|
name | | The name of this configuration. It can be referenced later by this name. | | x |
nameNodeUri | | The name of the file system to connect to. It is passed to the HDFS client as the {FileSystem#FS_DEFAULT_NAME_KEY} configuration entry. It can be overridden by values in configurationResources and configurationEntries. | | x |
keytabPath | | Path to the keytab file associated with username. It is used to obtain a TGT from the authorization server. If not provided, the connector looks for a TGT associated with username in your local Kerberos cache. | | |
username | | A simple user identity of a client process. It is passed to the HDFS client as the "hadoop.job.ugi" configuration entry. It can be overridden by values in configurationResources and configurationEntries. | | |
configurationResources | | A list of configuration resource files to be loaded by the HDFS client. Use it to provide additional configuration files (e.g., core-site.xml). | | |
configurationEntries | | A map of configuration entries to be used by the HDFS client. Use it to provide additional configuration entries as key/value pairs. | | |
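XML Sample

A minimal configuration sketch using the attributes above; the configuration name, name node URI, principal, and keytab path are placeholder values, not taken from the connector documentation.

<!-- Kerberos authentication configuration (placeholder values) -->
<hdfs:config-with-kerberos name="hdfs-conf" nameNodeUri="hdfs://localhost:9000"
    username="nn/localhost@EXAMPLE.COM" keytabPath="hdfs.keytab"/>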
Simple Configuration
<hdfs:config>
Connection Management
Simple authentication configuration. Here you can configure the properties required by simple authentication to establish a connection with the Hadoop Distributed File System.
Attributes

Name | Java Type | Description | Default Value | Required |
---|---|---|---|---|
name | | The name of this configuration. It can be referenced later by this name. | | x |
nameNodeUri | | The name of the file system to connect to. It is passed to the HDFS client as the {FileSystem#FS_DEFAULT_NAME_KEY} configuration entry. It can be overridden by values in configurationResources and configurationEntries. | | x |
username | | A simple user identity of a client process. It is passed to the HDFS client as the "hadoop.job.ugi" configuration entry. It can be overridden by values in configurationResources and configurationEntries. | | |
configurationResources | | A list of configuration resource files to be loaded by the HDFS client. Use it to provide additional configuration files (e.g., core-site.xml). | | |
configurationEntries | | A map of configuration entries to be used by the HDFS client. Use it to provide additional configuration entries as key/value pairs. | | |
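XML Sample

A minimal configuration sketch using the attributes above; the configuration name, name node URI, and username are placeholder values, not taken from the connector documentation.

<!-- Simple authentication configuration (placeholder values) -->
<hdfs:config name="hdfs-conf" nameNodeUri="hdfs://localhost:9000" username="mulesoft"/>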
Processors
Read from path
<hdfs:read-operation>
Read the content of a file designated by its path and stream it to the rest of the flow.
XML Sample
<!-- Read a file with an operation rather than polling with an endpoint -->
<hdfs:read-operation path="/tmp/test.dat" bufferSize="8192" config-ref="hdfs-conf"/>
Attributes

Name | Java Type | Description | Default Value | Required |
---|---|---|---|---|
config-ref | | Specify which config to use. | | x |
path | | The path of the file to read. | | x |
bufferSize | int | The buffer size to use when reading the file. | 4096 | |
Get path metadata
<hdfs:get-metadata>
Get the metadata of a path, as described in HDFSConnector#read(String, int, SourceCallback), and store it in flow variables.
These flow variables are:
- hdfs.path.exists - Indicates whether the path exists (true or false)
- hdfs.content.summary - A summary of the path information
- hdfs.file.checksum - MD5 digest of the file (if it is a file and exists)
- hdfs.file.status - A Hadoop object that contains information about the status of the file (org.apache.hadoop.fs.FileStatus)
XML Sample
<!-- Store the meta-information of a path in flow variables -->
<hdfs:get-metadata path="/tmp/test.dat" config-ref="hdfs-conf"/>
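Once the operation has run, the flow variables can be read with standard Mule expressions. A usage sketch, assuming the hdfs.path.exists key listed above:

<!-- Log whether the path exists, using the hdfs.path.exists flow variable -->
<logger level="INFO" message="Path exists: #[flowVars['hdfs.path.exists']]"/>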
Write to path
<hdfs:write>
Write the current payload to the designated path, either creating a new file or appending to an existing one.
Attributes

Name | Java Type | Description | Default Value | Required |
---|---|---|---|---|
config-ref | | Specify which config to use. | | x |
path | | The path of the file to write to. | | x |
permission | | The file system permission to use if a new file is created, either in octal or symbolic format (umask). | 700 | |
overwrite | boolean | Whether a pre-existing file should be overwritten with the new content. | true | |
bufferSize | int | The buffer size to use when appending to the file. | 4096 | |
replication | int | The block replication for the file. | 1 | |
blockSize | long | The block size to use when creating the file. | 1048576 | |
ownerUserName | | The username of the file owner. | | |
ownerGroupName | | The group name of the file owner. | | |
payload | | The payload to write to the file. | #[payload] | |
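XML Sample

A usage sketch built from the attributes above; the path and config-ref values are placeholders.

<!-- Write the current payload to a file, overwriting any existing content (placeholder path and config name) -->
<hdfs:write path="/tmp/test.dat" overwrite="true" permission="700" config-ref="hdfs-conf"/>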
Append to file
<hdfs:append>
Append the current payload to a file located at the designated path. Note: by default, the Hadoop server has the append option disabled. To be able to append data to an existing file, refer to the dfs.support.append configuration parameter.
Attributes

Name | Java Type | Description | Default Value | Required |
---|---|---|---|---|
config-ref | | Specify which config to use. | | x |
path | | The path of the file to append to. | | x |
bufferSize | int | The buffer size to use when appending to the file. | 4096 | |
payload | | The payload to append to the file. | #[payload] | |
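XML Sample

A usage sketch built from the attributes above, assuming dfs.support.append is enabled on the cluster; the path and config-ref values are placeholders.

<!-- Append the current payload to an existing file (placeholder path and config name) -->
<hdfs:append path="/tmp/test.dat" bufferSize="4096" config-ref="hdfs-conf"/>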
Delete file
<hdfs:delete-file>
Delete the file or directory located at the designated path.
XML Sample
<!-- Delete a file -->
<hdfs:delete-file path="/tmp/test.dat" config-ref="hdfs-conf"/>
Delete directory
<hdfs:delete-directory>
Delete the file or directory located at the designated path.
XML Sample
<!-- Delete a directory -->
<hdfs:delete-directory path="/tmp/my-dir" config-ref="hdfs-conf"/>
Make directories
<hdfs:make-directories>
Make the given file and all non-existent parents into directories. Has the semantics of Unix 'mkdir -p'. Existence of the directory hierarchy is not an error.
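XML Sample

A usage sketch; the path attribute name is assumed by analogy with the other path-based operations in this reference, and the path and config-ref values are placeholders.

<!-- Create the directory and any missing parent directories (assumed attribute name, placeholder values) -->
<hdfs:make-directories path="/tmp/my-dir/sub-dir" config-ref="hdfs-conf"/>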
Rename
<hdfs:rename>
Rename the source path to the target path.
XML Sample
<!-- Rename any source directory or file to the provided target path -->
<hdfs:rename source="/tmp/my-dir" target="/tmp/new-dir" config-ref="hdfs-conf"/>
Attributes

Name | Java Type | Description | Default Value | Required |
---|---|---|---|---|
config-ref | | Specify which config to use. | | x |
source | | The source path to be renamed. | | x |
target | | The new path after the rename. | | x |
List status
<hdfs:list-status>
List the statuses of the files/directories in the given path, if the path is a directory.
XML Sample
<!-- List the statuses of the given path -->
<hdfs:list-status path="/tmp/my-dir" filter="^.*/2014/02/$" config-ref="hdfs-conf"/>
Attributes

Name | Java Type | Description | Default Value | Required |
---|---|---|---|---|
config-ref | | Specify which config to use. | | x |
path | | The given path. | | x |
filter | | The user-supplied path filter. | | |

Returns

Return Java Type | Description |
---|---|
FileStatus | The statuses of the files/directories in the given path. |
Glob status
<hdfs:glob-status>
Return all the files that match the file pattern and are not checksum files. Results are sorted by their names.
XML Sample
<!-- Return all the files that match file pattern, sorted by their names -->
<hdfs:glob-status pathPattern="/tmp/*/*" config-ref="hdfs-conf"/>
Attributes

Name | Java Type | Description | Default Value | Required |
---|---|---|---|---|
config-ref | | Specify which config to use. | | x |
pathPattern | | A regular expression specifying the path pattern. | | x |
filter | PathFilter | The user-supplied path filter. | | |
Copy from local file
<hdfs:copy-from-local-file>
Copy the source file on the local disk to the FileSystem at the given target path. Set deleteSource if the source should be removed.
XML Sample
<!-- Copy from source local disk to the target FileSystem -->
<hdfs:copy-from-local-file deleteSource="true" overwrite="false" source="/tmp/mulesoft/" target="/user/mulesoft/" config-ref="hdfs-conf"/>
Attributes

Name | Java Type | Description | Default Value | Required |
---|---|---|---|---|
config-ref | | Specify which config to use. | | x |
deleteSource | boolean | Whether to delete the source. | false | |
overwrite | boolean | Whether to overwrite an existing file. | true | |
source | | The source path on the local disk. | | x |
target | | The target path on the FileSystem. | | x |
Copy to local file
<hdfs:copy-to-local-file>
Copy the source file on the FileSystem to the local disk at the given target path. Set deleteSource if the source should be removed. useRawLocalFileSystem indicates whether to use RawLocalFileSystem, a local file system that does not perform CRC checks.
XML Sample
<!-- Copy from the source path on the FileSystem to the target path on the local disk -->
<hdfs:copy-to-local-file deleteSource="false" useRawLocalFileSystem="false" source="/tmp/mulesoft/" target="/user/mulesoft/" config-ref="hdfs-conf"/>
Attributes

Name | Java Type | Description | Default Value | Required |
---|---|---|---|---|
config-ref | | Specify which config to use. | | x |
deleteSource | boolean | Whether to delete the source. | false | |
useRawLocalFileSystem | boolean | Whether to use RawLocalFileSystem as the local file system. | false | |
source | | The source path on the FileSystem. | | x |
target | | The target path on the local disk. | | x |
Set permission
<hdfs:set-permission>
Set permission of a path (i.e., a file or a directory).
XML Sample
<!-- Set the permission of a path -->
<hdfs:set-permission path="/tmp/my-dir" permission="511" config-ref="hdfs-conf"/>
Set owner
<hdfs:set-owner>
Set the owner of a path (i.e., a file or a directory). The parameters ownername and groupname cannot both be null.
XML Sample
<!-- Set the owner of a path -->
<hdfs:set-owner path="/tmp/my-dir" ownername="mulesoft" groupname="supergroup" config-ref="hdfs-conf"/>
Attributes

Name | Java Type | Description | Default Value | Required |
---|---|---|---|---|
config-ref | | Specify which config to use. | | x |
path | | The path of the file or directory whose owner to set. | | x |
ownername | | The new owner username. If null, the original username remains unchanged. | | |
groupname | | The new group name. If null, the original group name remains unchanged. | | |
Sources
Read from path
<hdfs:read>
Read the content of a file designated by its path and stream it to the rest of the flow, while adding the path metadata as the following inbound properties:
- HDFSConnector#HDFS_PATH_EXISTS: a boolean set to true if the path exists
- HDFSConnector#HDFS_CONTENT_SUMMARY: an instance of ContentSummary if the path exists.
- HDFSConnector#HDFS_FILE_STATUS: an instance of FileStatus if the path exists.
- HDFSConnector#HDFS_FILE_CHECKSUM: an instance of FileChecksum if the path exists, is a file and has a checksum.
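XML Sample

A sketch of using this source as a flow's message source; the flow name, path, and config-ref values are placeholders, and the inbound property key is assumed to be the same hdfs.path.exists value listed under Get path metadata.

<!-- Poll a file and stream its content into the flow (placeholder flow, path, and config names) -->
<flow name="read-from-hdfs">
    <hdfs:read path="/tmp/test.dat" bufferSize="4096" config-ref="hdfs-conf"/>
    <logger level="INFO" message="Path exists: #[message.inboundProperties['hdfs.path.exists']]"/>
</flow>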
Attributes

Name | Java Type | Description | Default Value | Required |
---|---|---|---|---|
config-ref | | Specify which config to use. | | x |
path | | The path of the file to read. | | x |
bufferSize | int | The buffer size to use when reading the file. | 4096 | |
sourceCallback | SourceCallback | The SourceCallback used to propagate the event to the rest of the flow. | | x |

Returns

Return Java Type | Description |
---|---|
void | |