Baidu Cloud Services Gradle Plugin

Author: Rui · Published: March 11, 2016 · Categories: JAVA, BIGDATA · Views: 401

A Gradle plugin that makes it easier to build programs on top of Baidu cloud services. The plugin is implemented with Gradle's Rule Based Model configuration.

The plugin supports publishing files to OSS, as well as creating a Baidu Hadoop cluster and running MapReduce jobs on it.

Example usage:

buildscript {
  repositories {
    maven {
      url "https://plugins.gradle.org/m2/"
    }
  }
  dependencies {
    classpath "gradle.plugin.org.rapid.develop:baidu-cloud-plugin:1.1"
  }
}

apply plugin: "org.rapid.develop.baidu-cloud-plugin"

model {
    baidu {
        accessKey = 'accessKey'
        secretKey = 'secretKey'
        // OSS upload settings
        // Uploads files to the specified bucketName
        ossPublish {
            bucketName = 'bucketName'
            files = [jar.archivePath, new File('../logs/accesslog-10k.log')]
        }
        // MapReduce settings
        // Creates a Baidu Hadoop cluster, registers the application, and runs the jobs
        // Multiple steps can be executed
        mapReduce {
            name = project.name
            imageType = 'hadoop'
            imageVersion = '0.1.0'
            autoTerminate = false
            logUri = "bos://$bucketName/logs/"
            master {
                instanceType = 'g.small'
                instanceCount = 1
            }
            slaves.create {
                instanceType = 'g.small'
                instanceCount = 2
            }
            steps.create {
                name = "$project.name-$project.version"
                actionOnFailure = 'Continue'
                mainClass = 'com.vianet.cie.hadoop.AccessLogAnalyzer'
                jar = "bos://$bucketName/$project.group/$project.version/$project.name-${project.version}.jar"
                arguments = "bos://$bucketName/$project.group/$project.version/accesslog-10k.log bos://$bucketName/out"
            }
        }
    }
}

Source code:
https://github.com/baidu-cloud-plugin/baidu-cloud-plugin.git

Plugin page:
https://plugins.gradle.org/plugin/org.rapid.develop.baidu-cloud-plugin

Notes on Running Hadoop MapReduce on Windows

Author: Rui · Published: February 19, 2016 · Categories: JAVA, Hadoop, BIGDATA · Views: 446

Exception: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Running MapReduce on Windows requires the Hadoop Windows native tools.

Solution:
Download winutils into HADOOP_HOME/bin, and additionally place hadoop.dll under C:\Windows\System32.

winutils download: hadoop-winutils-2.6.0.zip
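If setting HADOOP_HOME system-wide is inconvenient, the same location can also be supplied from code through the `hadoop.home.dir` system property, as long as it is set before the first Hadoop class loads. A minimal sketch; the `C:\hadoop` path is an example, not a required location:

```java
public class WinutilsSetup {
    // Point Hadoop at the directory that contains bin\winutils.exe.
    static void configureHadoopHome(String dir) {
        System.setProperty("hadoop.home.dir", dir);
    }

    public static void main(String[] args) {
        // "C:\\hadoop" is an example path -- use wherever the winutils bundle was unpacked.
        configureHadoopHome("C:\\hadoop");
        // Any Hadoop client call (Job.getInstance(), FileSystem.get(), ...) must come after this.
        System.out.println(System.getProperty("hadoop.home.dir")); // prints "C:\hadoop"
    }
}
```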

Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x

When submitting MapReduce jobs remotely from Windows, the client defaults to the local OS user, which does not match the user that owns the Hadoop directories.

Solution:
Set the environment variable HADOOP_USER_NAME to your Hadoop user.
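Besides the environment variable, Hadoop's `UserGroupInformation` also falls back to a `HADOOP_USER_NAME` *system property*, so the remote user can be pinned from code before the first Hadoop call. A sketch, assuming the cluster user is named `hadoop`:

```java
public class RemoteUserSetup {
    public static void main(String[] args) {
        // "hadoop" is an example -- use the user that owns your HDFS directories.
        // Must run before any Hadoop client code touches UserGroupInformation.
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        System.out.println(System.getProperty("HADOOP_USER_NAME")); // prints "hadoop"
    }
}
```

Note that the environment variable, if present, takes precedence over the system property.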

Hadoop connection refused

A connection exception when submitting MapReduce jobs remotely: personal Hadoop setups are usually single-node or pseudo-distributed, and their ports often listen only on the loopback address.

Solution:
Edit core-site.xml and change the HDFS address to the host's real (non-loopback) address:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.56.101:9000</value>
    </property>
</configuration>
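Before touching any configuration, it can help to confirm from the client machine that the NameNode port is reachable at all. A plain-Java probe; the host and port mirror the `fs.defaultFS` example above, so adjust them to your setup:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers both "connection refused" and timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Example address taken from the core-site.xml above.
        System.out.println(isReachable("192.168.56.101", 9000, 2000));
    }
}
```

If the probe fails from the client but succeeds on the Hadoop host itself, the daemon is bound to 127.0.0.1 and the address in core-site.xml needs to change.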

Configuring Distributed Hadoop on Ubuntu (Notes)

Author: Rui · Published: October 3, 2015 · Categories: JAVA, Hadoop, BIGDATA · Views: 533

Environment

  1. Ubuntu 12.04 LTS
  2. Oracle JDK 1.8
  3. Hadoop 2.6.4
  4. Hosts: three machines, one Master and two Slaves

Set the hostname and hosts

Modify the hostname and hosts files on all three machines as follows:

vim /etc/hostname

Enter the corresponding hostname: Master, Slave1, or Slave2.

vim /etc/hosts

Add the hosts entries:

192.168.147.128   Master
192.168.147.129   Slave1
192.168.147.130   Slave2

Create a Hadoop user and set up SSH

This walkthrough runs Hadoop under a dedicated user; create it on all three machines.

Create the user:

sudo useradd -m hadoop -s /bin/bash
sudo passwd hadoop 
sudo adduser hadoop sudo
sudo su - hadoop


Hadoop – All specified directories are failed to load

Author: Rui · Published: September 9, 2015 · Categories: JAVA, Hadoop, BIGDATA · Views: 880

Exception:

2015-09-22 15:57:54,057 WARN org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop/data/hdfs/datanode: namenode clusterID = CID-ad31220c-f6e0-4c35-8731-b448f323f208; datanode clusterID = CID-b6802f1e-304b-4df7-8957-23a2958fa83b
2015-09-22 15:57:54,058 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to Master/10.147.6.205:9000. Exiting. 
java.io.IOException: All specified directories are failed to load.

The error is caused by the namenode's and datanode's VERSION files being out of sync.

Solution:

Open /usr/local/hadoop/dfs/datanode/current/VERSION and change

CID-b6802f1e-304b-4df7-8957-23a2958fa83b

to:

CID-ad31220c-f6e0-4c35-8731-b448f323f208

Tips:

Whenever you reformat the namenode, re-check that the namenode's and datanodes' VERSION files agree: they must share the same clusterID and namespaceID for the datanodes to start.
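That check can be scripted: each `current/VERSION` file is a plain `key=value` properties file, so comparing clusterIDs takes only a few lines of Java. A sketch; the paths are examples from this post, so point them at your own `dfs` data directories:

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

public class ClusterIdCheck {
    // current/VERSION uses plain key=value lines, so Properties can parse it.
    static String clusterId(String versionFile) {
        Properties props = new Properties();
        try (FileReader reader = new FileReader(versionFile)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("clusterID");
    }

    public static void main(String[] args) {
        // Example paths -- adjust to your dfs.namenode.name.dir / dfs.datanode.data.dir.
        String nn = "/usr/local/hadoop/data/hdfs/namenode/current/VERSION";
        String dn = "/usr/local/hadoop/data/hdfs/datanode/current/VERSION";
        if (new java.io.File(nn).exists() && new java.io.File(dn).exists()) {
            System.out.println(clusterId(nn).equals(clusterId(dn))
                    ? "clusterIDs match"
                    : "clusterID mismatch -- fix the datanode VERSION file");
        }
    }
}
```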

java.io.IOException: No FileSystem for scheme: hdfs

Author: Rui · Published: September 9, 2015 · Categories: JAVA, Hadoop, BIGDATA · Views: 761

Exception : java.io.IOException: No FileSystem for scheme: hdfs

This error occurs because the hadoop-hdfs jar is missing from the classpath. Add the following dependency to the project:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.5.1</version>
</dependency>