The environment for this is an Aliyun server, http://www.aliyun.com/ — pretty decent, billed by the hour at 0.28 yuan per hour. The OS is Ubuntu 10.10 64-bit, with a single-core CPU and 512 MB of RAM.
First thing, change the hostname:
hostname cluster-1
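The hostname command only renames the machine for the current session; to make the name survive a reboot on Ubuntu, it also needs to go into /etc/hostname and /etc/hosts, roughly like this:
echo cluster-1 > /etc/hostname
echo "127.0.0.1 cluster-1" >> /etc/hosts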
Then install Hadoop 1.1.2 following the tutorial here: http://hadoop.apache.org/docs/…
The Prerequisites section says a Java environment is required, so first try the java command:
root@cluster-1:~# java
The program 'java' can be found in the following packages:
 * gcj-4.4-jre-headless
 * gcj-4.5-jre-headless
 * openjdk-6-jre-headless
Try: apt-get install <selected package>
So the first job is installing Java. I found this tutorial, http://www.itkee.com/developer…: add the repository first, then install. The tutorial says:
My VM runs Ubuntu 10.04, and I installed Sun JDK 6. Starting with Ubuntu 10.04 the official repositories dropped the Sun JDK and recommend OpenJDK instead (my guess is that after Oracle bought Sun, Oracle did some things that hurt the open-source community, and promoting OpenJDK is the payback!), so the first step is to add the Sun JDK repository.
Edit the source list with sudo gedit /etc/apt/sources.list and append this line at the end of the file:
deb http://archive.canonical.com/ lucid partner
Then run sudo apt-get update to refresh the package lists,
then sudo apt-get install sun-java6-jdk; when it asks whether to download the packages, type y to continue.
Once the JDK is installed, set up the environment variables.
Edit the file with sudo gedit /etc/profile and append at the end:
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.24
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
That Java path is the default from the command-line install; if yours differs, just change the first line.
Apply the configuration with source /etc/profile.
Check with java -version whether it took effect; mine shows:
bwk@ubuntu:~$ java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) Client VM (build 19.1-b02, mixed mode, sharing)
That means the configuration succeeded. All done!
But apt-get update kept failing on my machine and the update would not go through, so I decided to download the installer from the Java site by hand. Following this article, http://os.51cto.com/art/201003…, I went to http://www.oracle.com/technetw…, downloaded the JDK 6 .bin file, and uploaded it to the VPS.
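From memory the manual install went roughly like this; the exact file name and install directory depend on which package you grab, so the 64-bit 6u45 bin and /usr/lib/jvm below are only for illustration (the .bin self-extracts into a jdk1.6.0_45 directory):
# chmod +x jdk-6u45-linux-x64.bin
# ./jdk-6u45-linux-x64.bin
# mkdir -p /usr/lib/jvm
# mv jdk1.6.0_45 /usr/lib/jvm/
Then point JAVA_HOME and PATH in /etc/profile at the new location, the same way as above, and java -version comes back with: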
root@cluster-1:~# java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)
Then I downloaded Hadoop 1.1.2 from http://www.apache.org/dyn/clos…:
hadoop-1.1.2.tar.gz 31-Jan-2013 22:42 61927560
After unpacking, as the docs require, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
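That edit boils down to a single line; the path should be whatever directory your JDK actually ended up in (below I use the illustrative /usr/lib/jvm/jdk1.6.0_45 from the manual install above):
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_45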
With that done, I ran the example from the tutorial:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
The result was:
root@cluster-1:~/hadoop-1.1.2# cat output/*
1 dfsadmin
This was in standalone mode, and the tutorial doesn't give a reference answer, so I can't tell whether this result is correct.
But the following command gives a rough check: that example jar mimics grep, and by that measure our result should be right.
root@cluster-1:~/hadoop-1.1.2# cat input/* | egrep dfs[a-z.]
dfsadmin and mradmin commands to refresh the security policy in-effect.
Next, following the tutorial, I tried Pseudo-Distributed mode. The tutorial has you edit a few configuration files (roughly as sketched below) and then make ssh able to log into localhost without a password.
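From what I remember, the edits are just the standard single-node settings from the Hadoop 1.x docs (ports 9000 and 9001 are the ones the tutorial uses):
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
With those configs in place, the passwordless ssh setup is: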
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
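A plain ssh localhost should now get in without asking for a passphrase. The tutorial also has you format a fresh HDFS before starting the daemons:
$ ssh localhost
$ bin/hadoop namenode -format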
After bringing everything up with the start-all script as the tutorial describes, the namenode's web console (http://localhost:50070/ by default) shows a page like this
and this is the jobtracker's console (http://localhost:50030/ by default)
But running this command from the tutorial failed:
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
The error was:
13/06/18 14:57:52 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/06/18 14:57:52 WARN snappy.LoadSnappy: Snappy native library not loaded
13/06/18 14:57:52 INFO mapred.FileInputFormat: Total input paths to process : 16
13/06/18 14:57:52 INFO mapred.JobClient: Running job: job_201306181446_0001
13/06/18 14:57:54 INFO mapred.JobClient: map 0% reduce 0%
13/06/18 14:58:09 INFO mapred.JobClient: Task Id : attempt_201306181446_0001_m_000001_0, Status : FAILED
Error: Java heap space
13/06/18 14:58:09 INFO mapred.JobClient: Task Id : attempt_201306181446_0001_m_000000_0, Status : FAILED
13/06/18 14:58:19 INFO mapred.JobClient: Task Id : attempt_201306181446_0001_m_000000_1, Status : FAILED
Error: Java heap space
13/06/18 14:58:19 INFO mapred.JobClient: Task Id : attempt_201306181446_0001_m_000001_1, Status : FAILED
Error: Java heap space
13/06/18 14:58:29 INFO mapred.JobClient: Task Id : attempt_201306181446_0001_m_000000_2, Status : FAILED
Error: Java heap space
13/06/18 14:58:29 INFO mapred.JobClient: Task Id : attempt_201306181446_0001_m_000001_2, Status : FAILED
Error: Java heap space
13/06/18 14:58:42 INFO mapred.JobClient: Job complete: job_201306181446_0001
13/06/18 14:58:42 INFO mapred.JobClient: Counters: 7
13/06/18 14:58:42 INFO mapred.JobClient: Job Counters
13/06/18 14:58:42 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=42937
13/06/18 14:58:42 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/06/18 14:58:42 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/06/18 14:58:42 INFO mapred.JobClient: Launched map tasks=8
13/06/18 14:58:42 INFO mapred.JobClient: Data-local map tasks=8
13/06/18 14:58:42 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/06/18 14:58:42 INFO mapred.JobClient: Failed map tasks=1
13/06/18 14:58:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201306181446_0001_m_000000
java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1327)
    at org.apache.hadoop.examples.Grep.run(Grep.java:69)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.Grep.main(Grep.java:93)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
A quick Google search turned up this, http://stackoverflow.com/quest…, which says the Java heap size needs to be increased:
Clearly you have run out of the heap size allotted to Java. So you shall try to increase that.
For that you may execute the following before executing hadoop command:
export HADOOP_OPTS="-Xmx4096m"
Alternatively, you can achieve the same thing by adding the following permanent setting in your mapred-site.xml file, this file lies in HADOOP_HOME/conf/:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4096m</value>
</property>
This would set your java heap space to 4096 MB (4GB), you may even try it with a lower value first if that works. If that too doesn't work out then increase it more if your machine supports it, if not then move to a machine having more memory and try there. As heap space simply means you don't have enough RAM available for Java.
Then I restarted the whole Hadoop stack:
root@cluster-1:~/workspace/hadoop-1.1.2# bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
root@cluster-1:~/workspace/hadoop-1.1.2# bin/start-all.sh
starting namenode, logging to /root/workspace/hadoop-1.1.2/libexec/../logs/hadoop-root-namenode-cluster-1.out
localhost: starting datanode, logging to /root/workspace/hadoop-1.1.2/libexec/../logs/hadoop-root-datanode-cluster-1.out
localhost: starting secondarynamenode, logging to /root/workspace/hadoop-1.1.2/libexec/../logs/hadoop-root-secondarynamenode-cluster-1.out
starting jobtracker, logging to /root/workspace/hadoop-1.1.2/libexec/../logs/hadoop-root-jobtracker-cluster-1.out
localhost: starting tasktracker, logging to /root/workspace/hadoop-1.1.2/libexec/../logs/hadoop-root-tasktracker-cluster-1.out
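Before retrying, jps (it ships with the JDK) is a quick way to confirm that all five daemons actually came back up; it should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker:
# jps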
Retrying the job that failed before gives exactly the same error. Then I remembered the conf/hadoop-env.sh file, where the line
export HADOOP_HEAPSIZE=2000
was commented out; I uncommented it, restarted Hadoop, and ran the job again, and this time it crashed outright:
13/06/18 15:16:58 ERROR security.UserGroupInformation: PriviledgedActionException as:root cause:org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.mapred.SafeModeException: JobTracker is in safe mode
    at org.apache.hadoop.mapred.JobTracker.checkSafeMode(JobTracker.java:5270)
    at org.apache.hadoop.mapred.JobTracker.getStagingAreaDir(JobTracker.java:3797)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /user/root/grep-temp-1080766159. Name node is in safe mode.
The reported blocks 17 has reached the threshold 0.9990 of total blocks 17. Safe mode will be turned off automatically in 19 seconds.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2111)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2088)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:832)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at com.sun.proxy.$Proxy1.delete(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
    at com.sun.proxy.$Proxy1.delete(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:981)
    at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:245)
    at org.apache.hadoop.examples.Grep.run(Grep.java:87)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.examples.Grep.main(Grep.java:93)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I reverted the earlier change to mapred-site.xml and tried again; the job still crashed right away.
After a lot more searching, I narrowed the problem down to the HADOOP_HEAPSIZE variable in conf/hadoop-env.sh. My procedure for changing it was: first stop Hadoop,
# bin/stop-all.sh
then delete the logs and the tmp data:
# rm -rf logs/*
# rm -rf /tmp/hadoop-root*
then edit the config file:
# vi conf/hadoop-env.sh
and re-format the namenode:
# bin/hadoop namenode -format
Then restart Hadoop:
# bin/start-all.sh
and copy the input files back in:
# bin/hadoop fs -mkdir input
# bin/hadoop fs -put conf/*.xml input
Then run the job:
# bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Unfortunately, when I set the variable to 5000 or more, it complains that there isn't enough memory to initialize the Java virtual machine:
13/06/18 16:28:14 INFO mapred.JobClient: Task Id : attempt_201306181619_0001_m_000007_1, Status : FAILED
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
attempt_201306181619_0001_m_000007_1: Error occurred during initialization of VM
attempt_201306181619_0001_m_000007_1: Could not reserve enough space for object heap
while at 4000 or below, it reports that the Java heap is too small:
13/06/18 16:46:13 INFO mapred.JobClient: Task Id : attempt_201306181644_0001_m_000001_1, Status : FAILED
Error: Java heap space
Since this VPS only has 512 MB of RAM, I don't think there's any way to simulate Pseudo-Distributed mode on this single box, so I decided to skip it and try Fully-Distributed instead.