Trying out Hadoop.
First, the QuickStart:
http://hadoop.apache.org/core/docs/current/quickstart.html
The QuickStart walks you through running Hadoop on a single node,
to get a feel for the Hadoop Distributed File System and MapReduce.
It runs on Win32 as well, but only as a development platform.
ssh and sshd are required so that Hadoop can manage its daemons remotely.
Download
First, extract the tarball into a suitable directory.
$ tar -xvf hadoop-0.16.0.tar.gz
$ cd hadoop-0.16.0
$ pwd
/home/javian/hadoop/hadoop-0.16.0
$ ls
CHANGES.txt  NOTICE.txt  bin  c++  contrib  hadoop-0.16.0-core.jar  hadoop-0.16.0-test.jar  libhdfs  webapps
LICENSE.txt  README.txt  build.xml  conf  docs  hadoop-0.16.0-examples.jar  lib
Set JAVA_HOME in conf/hadoop-env.sh.
Other environment variables can also be set in hadoop-env.sh, but JAVA_HOME appears to be the only required one.
export JAVA_HOME=/opt/jdk1.5
The docs say to run bin/hadoop, so run it.
$ bin/hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
Standalone Operation
By default, Hadoop runs as a single process in non-distributed mode.
Run hadoop-0.16.0-examples.jar:
$ mkdir input
$ cp conf/*.xml input
$ ls input/
hadoop-default.xml  hadoop-site.xml
$ bin/hadoop jar hadoop-0.16.0-examples.jar grep input output 'dfs[a-z.]+'
08/03/14 10:49:17 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/03/14 10:49:17 INFO mapred.FileInputFormat: Total input paths to process : 2
08/03/14 10:49:17 INFO mapred.JobClient: Running job: job_local_1
08/03/14 10:49:17 INFO mapred.MapTask: numReduceTasks: 1
08/03/14 10:49:17 INFO mapred.LocalJobRunner: file:/home/javian/hadoop/hadoop-0.16.0/input/hadoop-site.xml:0+178
08/03/14 10:49:17 INFO mapred.TaskRunner: Task 'job_local_1_map_0000' done.
08/03/14 10:49:17 INFO mapred.MapTask: numReduceTasks: 1
08/03/14 10:49:18 INFO mapred.LocalJobRunner: file:/home/javian/hadoop/hadoop-0.16.0/input/hadoop-default.xml:0+33751
08/03/14 10:49:18 INFO mapred.TaskRunner: Task 'job_local_1_map_0001' done.
08/03/14 10:49:18 INFO mapred.LocalJobRunner: reduce > reduce
08/03/14 10:49:18 INFO mapred.TaskRunner: Task 'reduce_llvjtt' done.
08/03/14 10:49:18 INFO mapred.TaskRunner: Saved output of task 'reduce_llvjtt' to file:/home/javian/hadoop/hadoop-0.16.0/grep-temp-963773070
08/03/14 10:49:18 INFO mapred.JobClient: Job complete: job_local_1
08/03/14 10:49:18 INFO mapred.JobClient: Counters: 9
08/03/14 10:49:18 INFO mapred.JobClient:   Map-Reduce Framework
08/03/14 10:49:18 INFO mapred.JobClient:     Map input records=1120
08/03/14 10:49:18 INFO mapred.JobClient:     Map output records=39
08/03/14 10:49:18 INFO mapred.JobClient:     Map input bytes=33929
08/03/14 10:49:18 INFO mapred.JobClient:     Map output bytes=1114
08/03/14 10:49:18 INFO mapred.JobClient:     Combine input records=39
08/03/14 10:49:18 INFO mapred.JobClient:     Combine output records=38
08/03/14 10:49:18 INFO mapred.JobClient:     Reduce input groups=38
08/03/14 10:49:18 INFO mapred.JobClient:     Reduce input records=38
08/03/14 10:49:18 INFO mapred.JobClient:     Reduce output records=38
08/03/14 10:49:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
08/03/14 10:49:18 INFO mapred.FileInputFormat: Total input paths to process : 1
08/03/14 10:49:19 INFO mapred.JobClient: Running job: job_local_2
08/03/14 10:49:19 INFO mapred.MapTask: numReduceTasks: 1
08/03/14 10:49:19 INFO mapred.LocalJobRunner: file:/home/javian/hadoop/hadoop-0.16.0/grep-temp-963773070/part-00000:0+1491
08/03/14 10:49:19 INFO mapred.TaskRunner: Task 'job_local_2_map_0000' done.
08/03/14 10:49:19 INFO mapred.LocalJobRunner: reduce > reduce
08/03/14 10:49:19 INFO mapred.TaskRunner: Task 'reduce_hzcsk6' done.
08/03/14 10:49:19 INFO mapred.TaskRunner: Saved output of task 'reduce_hzcsk6' to file:/home/javian/hadoop/hadoop-0.16.0/output
08/03/14 10:49:20 INFO mapred.JobClient: Job complete: job_local_2
08/03/14 10:49:20 INFO mapred.JobClient: Counters: 9
08/03/14 10:49:20 INFO mapred.JobClient:   Map-Reduce Framework
08/03/14 10:49:20 INFO mapred.JobClient:     Map input records=38
08/03/14 10:49:20 INFO mapred.JobClient:     Map output records=38
08/03/14 10:49:20 INFO mapred.JobClient:     Map input bytes=1405
08/03/14 10:49:20 INFO mapred.JobClient:     Map output bytes=1101
08/03/14 10:49:20 INFO mapred.JobClient:     Combine input records=0
08/03/14 10:49:20 INFO mapred.JobClient:     Combine output records=0
08/03/14 10:49:20 INFO mapred.JobClient:     Reduce input groups=2
08/03/14 10:49:20 INFO mapred.JobClient:     Reduce input records=38
08/03/14 10:49:20 INFO mapred.JobClient:     Reduce output records=38
Running hadoop-0.16.0-examples.jar produces a file named part-00000, so let's look inside it.
$ ls output
part-00000
$ cat output/*
2       dfs.
1       dfs.balance.bandwidth
1       dfs.block.size
1       dfs.blockreport.interval
1       dfs.client.block.write.retries
1       dfs.client.buffer.dir
1       dfs.data.dir
1       dfs.datanode.address
1       dfs.datanode.dns.interface
1       dfs.datanode.dns.nameserver
1       dfs.datanode.du.pct
1       dfs.datanode.du.reserved
1       dfs.datanode.http.address
1       dfs.default.chunk.view.size
1       dfs.df.interval
1       dfs.heartbeat.interval
1       dfs.hosts
1       dfs.hosts.exclude
1       dfs.http.address
1       dfs.impl
1       dfs.max.objects
1       dfs.name.dir
1       dfs.namenode.decommission.interval
1       dfs.namenode.handler.count
1       dfs.namenode.logging.level
1       dfs.network.script
1       dfs.permissions
1       dfs.permissions.supergroup
1       dfs.replication
1       dfs.replication.consider
1       dfs.replication.interval
1       dfs.replication.max
1       dfs.replication.min
1       dfs.replication.min.
1       dfs.safemode.extension
1       dfs.safemode.threshold.pct
1       dfs.secondary.http.address
1       dfs.web.ugi
So apparently
$ bin/hadoop jar hadoop-0.16.0-examples.jar grep input output 'dfs[a-z.]+'
collects the strings matching 'dfs[a-z.]+' from the files in the input directory
and writes the result to the output directory.
output/part-00000 holds each matched string together with its count.
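The flow of this example can be sketched in plain Python. This is a rough single-process imitation, not Hadoop's actual code: the first job counts every regex match in the inputs, and the second job sorts the counts in descending order. The function name `hadoop_grep` is made up for illustration.

```python
import re
from collections import Counter

def hadoop_grep(texts, pattern):
    """Imitate the examples-jar grep: job 1 counts every match of
    `pattern` across the inputs; job 2 orders the (match, count)
    pairs by count, highest first (ties broken by name)."""
    regex = re.compile(pattern)
    counts = Counter()
    for text in texts:                 # "map": emit every match
        counts.update(regex.findall(text))
    # second job's "reduce": sort by frequency, descending
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

results = hadoop_grep(["dfs.name.dir dfs.name.dir dfs.replication"],
                      r"dfs[a-z.]+")
for name, n in results:
    print(f"{n}\t{name}")   # prints "2  dfs.name.dir" then "1  dfs.replication"
```

The real job does the same thing with a map phase, a combiner, and a reduce phase spread over tasks, which is why the log above shows two jobs (job_local_1 and job_local_2).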
Let's create a file of our own and try it.
$ vi input/hoge.txt
$ cat input/hoge.txt
dfs.test.hoge
dfs.test.moge
dfs.test.hoge
dfs.test.hoge
dfs.test.hoge
dfs.test.hoge
dfs.test.moge
dfs.test.moge
dfs.test.moge
dfs.test.moge
dfs.test.moge
$ rm -rf output
$ bin/hadoop jar hadoop-0.16.0-examples.jar grep input output 'dfs[a-z.]+'
08/03/14 11:04:01 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/03/14 11:04:01 INFO mapred.FileInputFormat: Total input paths to process : 3
08/03/14 11:04:01 INFO mapred.JobClient: Running job: job_local_1
08/03/14 11:04:01 INFO mapred.MapTask: numReduceTasks: 1
08/03/14 11:04:02 INFO mapred.LocalJobRunner: file:/home/javian/hadoop/hadoop-0.16.0/input/hadoop-site.xml:0+178
08/03/14 11:04:02 INFO mapred.TaskRunner: Task 'job_local_1_map_0000' done.
08/03/14 11:04:02 INFO mapred.MapTask: numReduceTasks: 1
08/03/14 11:04:02 INFO mapred.LocalJobRunner: file:/home/javian/hadoop/hadoop-0.16.0/input/hoge.txt:0+154
08/03/14 11:04:02 INFO mapred.TaskRunner: Task 'job_local_1_map_0001' done.
08/03/14 11:04:02 INFO mapred.MapTask: numReduceTasks: 1
08/03/14 11:04:02 INFO mapred.LocalJobRunner: file:/home/javian/hadoop/hadoop-0.16.0/input/hadoop-default.xml:0+33751
08/03/14 11:04:02 INFO mapred.TaskRunner: Task 'job_local_1_map_0002' done.
08/03/14 11:04:02 INFO mapred.LocalJobRunner: reduce > reduce
08/03/14 11:04:02 INFO mapred.TaskRunner: Task 'reduce_hdig6n' done.
08/03/14 11:04:02 INFO mapred.TaskRunner: Saved output of task 'reduce_hdig6n' to file:/home/javian/hadoop/hadoop-0.16.0/grep-temp-1703807066
08/03/14 11:04:02 INFO mapred.JobClient: Job complete: job_local_1
08/03/14 11:04:02 INFO mapred.JobClient: Counters: 9
08/03/14 11:04:02 INFO mapred.JobClient:   Map-Reduce Framework
08/03/14 11:04:02 INFO mapred.JobClient:     Map input records=1131
08/03/14 11:04:02 INFO mapred.JobClient:     Map output records=50
08/03/14 11:04:02 INFO mapred.JobClient:     Map input bytes=34083
08/03/14 11:04:02 INFO mapred.JobClient:     Map output bytes=1356
08/03/14 11:04:02 INFO mapred.JobClient:     Combine input records=50
08/03/14 11:04:02 INFO mapred.JobClient:     Combine output records=40
08/03/14 11:04:02 INFO mapred.JobClient:     Reduce input groups=40
08/03/14 11:04:02 INFO mapred.JobClient:     Reduce input records=40
08/03/14 11:04:02 INFO mapred.JobClient:     Reduce output records=40
08/03/14 11:04:02 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
08/03/14 11:04:02 INFO mapred.FileInputFormat: Total input paths to process : 1
08/03/14 11:04:03 INFO mapred.JobClient: Running job: job_local_2
08/03/14 11:04:03 INFO mapred.MapTask: numReduceTasks: 1
08/03/14 11:04:03 INFO mapred.LocalJobRunner: file:/home/javian/hadoop/hadoop-0.16.0/grep-temp-1703807066/part-00000:0+1551
08/03/14 11:04:03 INFO mapred.TaskRunner: Task 'job_local_2_map_0000' done.
08/03/14 11:04:03 INFO mapred.LocalJobRunner: reduce > reduce
08/03/14 11:04:03 INFO mapred.TaskRunner: Task 'reduce_ottzrk' done.
08/03/14 11:04:03 INFO mapred.TaskRunner: Saved output of task 'reduce_ottzrk' to file:/home/javian/hadoop/hadoop-0.16.0/output
08/03/14 11:04:04 INFO mapred.JobClient: Job complete: job_local_2
08/03/14 11:04:04 INFO mapred.JobClient: Counters: 9
08/03/14 11:04:04 INFO mapred.JobClient:   Map-Reduce Framework
08/03/14 11:04:04 INFO mapred.JobClient:     Map input records=40
08/03/14 11:04:04 INFO mapred.JobClient:     Map output records=40
08/03/14 11:04:04 INFO mapred.JobClient:     Map input bytes=1465
08/03/14 11:04:04 INFO mapred.JobClient:     Map output bytes=1145
08/03/14 11:04:04 INFO mapred.JobClient:     Combine input records=0
08/03/14 11:04:04 INFO mapred.JobClient:     Combine output records=0
08/03/14 11:04:04 INFO mapred.JobClient:     Reduce input groups=4
08/03/14 11:04:04 INFO mapred.JobClient:     Reduce input records=40
08/03/14 11:04:04 INFO mapred.JobClient:     Reduce output records=40
The log shows that hoge.txt was processed as well.
Looking at the file in the output directory, the strings from the hoge.txt we just created have been counted too.
$ cat output/*
6       dfs.test.moge
5       dfs.test.hoge
2       dfs.
1       dfs.balance.bandwidth
1       dfs.block.size
1       dfs.blockreport.interval
...
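As a sanity check, counting the matches in hoge.txt by hand reproduces the top two lines of that output (a plain-Python sketch, not Hadoop itself):

```python
import re
from collections import Counter

# The eleven lines of input/hoge.txt from above: 5x hoge, 6x moge.
hoge_txt = "\n".join(["dfs.test.hoge"] * 5 + ["dfs.test.moge"] * 6)

counts = Counter(re.findall(r"dfs[a-z.]+", hoge_txt))
ranked = sorted(counts.items(), key=lambda kv: -kv[1])
print(ranked)  # → [('dfs.test.moge', 6), ('dfs.test.hoge', 5)]
```

6 and 5, in descending order of count, just as in output/part-00000.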
Above, the output directory was deleted before re-running the job. If you run
the job with the output directory still in place, the exception below is thrown:
the output destination must not exist before the job starts.
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/home/javian/hadoop/hadoop-0.16.0/output already exists
        at org.apache.hadoop.mapred.OutputFormatBase.checkOutputSpecs(OutputFormatBase.java:108)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:540)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:805)
        at org.apache.hadoop.examples.Grep.run(Grep.java:84)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.examples.Grep.main(Grep.java:93)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:52)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
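The stack trace shows the check fires in checkOutputSpecs, before any work is submitted. The behavior can be mimicked locally (a hypothetical Python stand-in, not Hadoop's code; `check_output_spec` is an invented name):

```python
import os
import shutil
import tempfile

def check_output_spec(output_dir):
    """Fail fast if the output directory already exists, before any
    work is done (stand-in for Hadoop's pre-submission check)."""
    if os.path.exists(output_dir):
        raise FileExistsError(f"Output directory {output_dir} already exists")

workdir = tempfile.mkdtemp()
out = os.path.join(workdir, "output")
check_output_spec(out)   # fine: nothing there yet
os.makedirs(out)         # pretend a previous run left its output behind
try:
    check_output_spec(out)
except FileExistsError as e:
    print(e)             # the job would abort here, as in the trace above
shutil.rmtree(workdir)
```

Failing early like this avoids silently overwriting or mixing results from a previous run, which is presumably why Hadoop makes you delete the directory yourself.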
Configuration
Edit conf/hadoop-site.xml as follows:
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
First, make sure ports 9000 and 9001 are not already in use.
$ netstat -a | grep 9000
$ netstat -a | grep 9001
Setup passphraseless ssh
The docs say to check that you can ssh to localhost without a password.
My environment isn't set up that way, so I generated a key pair as the QuickStart describes.
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ ls ~/.ssh
authorized_keys  id_dsa  id_dsa.pub  known_hosts
$ chmod 600 ~/.ssh/authorized_keys
The chmod is there because passwordless ssh login fails if authorized_keys is writable by anyone other than its owner.
Execution
Format the distributed filesystem.
$ bin/hadoop namenode -format
08/03/14 12:09:44 INFO dfs.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hostname/xxx.xxx.xxx.xxx
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.16.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 618351; compiled by 'hadoopqa' on Mon Feb 4 19:29:11 UTC 2008
************************************************************/
08/03/14 12:09:44 INFO fs.FSNamesystem: fsOwner=javian,javian
08/03/14 12:09:44 INFO fs.FSNamesystem: supergroup=supergroup
08/03/14 12:09:44 INFO fs.FSNamesystem: isPermissionEnabled=true
08/03/14 12:09:44 INFO dfs.Storage: Storage directory /tmp/hadoop-javian/dfs/name has been successfully formatted.
08/03/14 12:09:44 INFO dfs.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hostname/xxx.xxx.xxx.xxx
************************************************************/
Various files were created under /tmp/hadoop-javian/dfs/name.
$ ls /tmp/hadoop-javian/dfs/name
current  image
$ ls /tmp/hadoop-javian/dfs/name/current
VERSION  edits  fsimage  fstime
$ ls /tmp/hadoop-javian/dfs/name/image
fsimage
Start the Hadoop daemons.
$ bin/start-all.sh
starting namenode, logging to /home/javian/hadoop/hadoop-0.16.0/bin/../logs/hadoop-javian-namenode-giant.out
localhost: starting datanode, logging to /home/javian/hadoop/hadoop-0.16.0/bin/../logs/hadoop-javian-datanode-giant.out
localhost: starting secondarynamenode, logging to /home/javian/hadoop/hadoop-0.16.0/bin/../logs/hadoop-javian-secondarynamenode-giant.out
starting jobtracker, logging to /home/javian/hadoop/hadoop-0.16.0/bin/../logs/hadoop-javian-jobtracker-giant.out
localhost: starting tasktracker, logging to /home/javian/hadoop/hadoop-0.16.0/bin/../logs/hadoop-javian-tasktracker-giant.out
ps shows Java processes running for the following classes:
org.apache.hadoop.dfs.NameNode
org.apache.hadoop.dfs.DataNode
org.apache.hadoop.dfs.SecondaryNameNode
org.apache.hadoop.mapred.JobTracker
org.apache.hadoop.mapred.TaskTracker
Of these, the NameNode and the JobTracker each have a web interface, reachable at the following URLs:
- NameNode - http://localhost:50070/
- JobTracker - http://localhost:50030/
The following URL apparently lets you browse inside the filesystem, though I haven't quite figured it out:
http://localhost:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
Copy the conf directory onto the distributed filesystem under the name input.
$ bin/hadoop dfs -put conf input
Checking the URL
http://localhost:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
confirms that a folder named /usr/javian/input has been created
on the distributed filesystem.
Run the example the same way as at the start.
$ bin/hadoop jar hadoop-0.16.0-examples.jar grep input output 'dfs[a-z.]+'
08/03/14 12:36:18 INFO mapred.FileInputFormat: Total input paths to process : 10
08/03/14 12:36:18 INFO mapred.JobClient: Running job: job_200803141216_0001
08/03/14 12:36:19 INFO mapred.JobClient:  map 0% reduce 0%
08/03/14 12:36:23 INFO mapred.JobClient:  map 9% reduce 0%
08/03/14 12:36:24 INFO mapred.JobClient:  map 18% reduce 0%
08/03/14 12:36:26 INFO mapred.JobClient:  map 36% reduce 0%
08/03/14 12:36:28 INFO mapred.JobClient:  map 54% reduce 0%
08/03/14 12:36:30 INFO mapred.JobClient:  map 72% reduce 0%
08/03/14 12:36:33 INFO mapred.JobClient:  map 90% reduce 0%
08/03/14 12:36:35 INFO mapred.JobClient:  map 100% reduce 0%
08/03/14 12:36:43 INFO mapred.JobClient:  map 100% reduce 18%
08/03/14 12:36:45 INFO mapred.JobClient:  map 100% reduce 100%
08/03/14 12:36:46 INFO mapred.JobClient: Job complete: job_200803141216_0001
08/03/14 12:36:46 INFO mapred.JobClient: Counters: 12
08/03/14 12:36:46 INFO mapred.JobClient:   Job Counters
08/03/14 12:36:46 INFO mapred.JobClient:     Launched map tasks=11
08/03/14 12:36:46 INFO mapred.JobClient:     Launched reduce tasks=1
08/03/14 12:36:46 INFO mapred.JobClient:     Data-local map tasks=11
08/03/14 12:36:46 INFO mapred.JobClient:   Map-Reduce Framework
08/03/14 12:36:46 INFO mapred.JobClient:     Map input records=1342
08/03/14 12:36:46 INFO mapred.JobClient:     Map output records=48
08/03/14 12:36:46 INFO mapred.JobClient:     Map input bytes=40589
08/03/14 12:36:46 INFO mapred.JobClient:     Map output bytes=1290
08/03/14 12:36:46 INFO mapred.JobClient:     Combine input records=48
08/03/14 12:36:46 INFO mapred.JobClient:     Combine output records=44
08/03/14 12:36:46 INFO mapred.JobClient:     Reduce input groups=43
08/03/14 12:36:46 INFO mapred.JobClient:     Reduce input records=44
08/03/14 12:36:46 INFO mapred.JobClient:     Reduce output records=43
08/03/14 12:36:46 INFO mapred.FileInputFormat: Total input paths to process : 1
08/03/14 12:36:47 INFO mapred.JobClient: Running job: job_200803141216_0002
08/03/14 12:36:48 INFO mapred.JobClient:  map 0% reduce 0%
08/03/14 12:36:51 INFO mapred.JobClient:  map 100% reduce 0%
08/03/14 12:36:57 INFO mapred.JobClient:  map 100% reduce 100%
08/03/14 12:36:58 INFO mapred.JobClient: Job complete: job_200803141216_0002
08/03/14 12:36:58 INFO mapred.JobClient: Counters: 12
08/03/14 12:36:58 INFO mapred.JobClient:   Job Counters
08/03/14 12:36:58 INFO mapred.JobClient:     Launched map tasks=1
08/03/14 12:36:58 INFO mapred.JobClient:     Launched reduce tasks=1
08/03/14 12:36:58 INFO mapred.JobClient:     Data-local map tasks=1
08/03/14 12:36:58 INFO mapred.JobClient:   Map-Reduce Framework
08/03/14 12:36:58 INFO mapred.JobClient:     Map input records=43
08/03/14 12:36:58 INFO mapred.JobClient:     Map output records=43
08/03/14 12:36:58 INFO mapred.JobClient:     Map input bytes=1542
08/03/14 12:36:58 INFO mapred.JobClient:     Map output bytes=1198
08/03/14 12:36:58 INFO mapred.JobClient:     Combine input records=0
08/03/14 12:36:58 INFO mapred.JobClient:     Combine output records=0
08/03/14 12:36:58 INFO mapred.JobClient:     Reduce input groups=3
08/03/14 12:36:58 INFO mapred.JobClient:     Reduce input records=43
08/03/14 12:36:58 INFO mapred.JobClient:     Reduce output records=43
It takes quite a while, presumably because the processes are now communicating with each other.
After it finishes, checking the same URL as before shows that
/usr/javian/output/part-00000 has been created on the distributed filesystem.
To see the contents of part-00000, pull the file out of the distributed
filesystem onto the local filesystem:
$ bin/hadoop dfs -get output output
$ ls output
part-00000
Or look at the file directly on the distributed filesystem:
$ bin/hadoop dfs -cat output/*
Finally, stop the daemons.
$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode