1. Create an input directory in HDFS
$ hdfs dfs -mkdir /input
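If the parent directories do not exist yet, or you want to confirm that the directory was actually created, the commands below are one way to do it (a minimal sketch, assuming HDFS is running and the same /input path is used):

# Create the directory together with any missing parent directories
$ hdfs dfs -mkdir -p /input

# List the HDFS root to confirm that /input now exists
$ hdfs dfs -ls /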
2. Store a test file in HDFS
2-1. Sample data
$ ll /usr/local/hadoop
total 168
drwxrwxr-x 12 hadoop-user hadoop  4096 Oct 11 11:19 ./
drwxr-xr-x 14 root        root    4096 Oct 10 16:47 ../
drwxrwxr-x  2 hadoop-user hadoop  4096 Jun  2 15:24 bin/
drwxrwxr-x  3 hadoop-user hadoop  4096 Jun  2 15:24 etc/
drwxrwxr-x  2 hadoop-user hadoop  4096 Jun  2 15:24 include/
drwxrwxr-x  3 hadoop-user hadoop  4096 Jun  2 15:24 lib/
drwxrwxr-x  2 hadoop-user hadoop  4096 Jun  2 15:24 libexec/
-rw-rw-r--  1 hadoop-user hadoop 99253 Jun  2 15:24 LICENSE.txt
drwxr-xr-x  3 hadoop-user hadoop  4096 Oct 13 12:37 logs/
-rw-------  1 hadoop-user hadoop     0 Oct 11 11:19 nohup.out
-rw-rw-r--  1 hadoop-user hadoop 15915 Jun  2 15:24 NOTICE.txt
-rw-r--r--  1 hadoop-user hadoop  1366 Jun  2 15:24 README.txt
drwxrwxr-x  3 hadoop-user hadoop  4096 Sep 27 13:03 sbin/
drwxrwxr-x  4 hadoop-user hadoop  4096 Jun  2 15:24 share/
drwxr-xr-x  3 hadoop-user hadoop  4096 Oct 10 13:42 tmp/
drwxr-xr-x  3 hadoop-user hadoop  4096 Sep 25 11:10 yarn_data/
* We will use the README.txt file inside the Hadoop directory as the sample data.
2-2. Copy the sample data to HDFS
$ hdfs dfs -put /usr/local/hadoop/README.txt /input
2-2-1. Verify that the sample data was copied
$ hdfs dfs -ls /input/
Found 1 items
-rw-r--r--   3 hadoop-user supergroup       1366 2017-10-13 13:01 /input/README.txt
* With README.txt now visible in HDFS, the test data preparation is complete.
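If you want to double-check the file contents without copying it back to the local filesystem, you can read it straight from HDFS (a quick sanity check, assuming the /input/README.txt path above):

# Stream the file from HDFS and show only the first few lines
$ hdfs dfs -cat /input/README.txt | head -n 5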
3. Run wordcount
$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar wordcount /input/README.txt /output
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.8.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hive-0.8.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/10/13 13:15:45 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.10.30:8032
17/10/13 13:15:46 INFO input.FileInputFormat: Total input files to process : 1
17/10/13 13:15:46 INFO mapreduce.JobSubmitter: number of splits:1
17/10/13 13:15:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1507865821865_0002
17/10/13 13:15:47 INFO impl.YarnClientImpl: Submitted application application_1507865821865_0002
17/10/13 13:15:47 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1507865821865_0002/
17/10/13 13:15:47 INFO mapreduce.Job: Running job: job_1507865821865_0002
17/10/13 13:15:54 INFO mapreduce.Job: Job job_1507865821865_0002 running in uber mode : false
17/10/13 13:15:54 INFO mapreduce.Job:  map 0% reduce 0%
17/10/13 13:15:58 INFO mapreduce.Job:  map 100% reduce 0%
17/10/13 13:16:03 INFO mapreduce.Job:  map 100% reduce 100%
17/10/13 13:16:03 INFO mapreduce.Job: Job job_1507865821865_0002 completed successfully
17/10/13 13:16:03 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=1836
		FILE: Number of bytes written=276889
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1466
		HDFS: Number of bytes written=1306
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=2318
		Total time spent by all reduces in occupied slots (ms)=2343
		Total time spent by all map tasks (ms)=2318
		Total time spent by all reduce tasks (ms)=2343
		Total vcore-milliseconds taken by all map tasks=2318
		Total vcore-milliseconds taken by all reduce tasks=2343
		Total megabyte-milliseconds taken by all map tasks=2373632
		Total megabyte-milliseconds taken by all reduce tasks=2399232
	Map-Reduce Framework
		Map input records=31
		Map output records=179
		Map output bytes=2055
		Map output materialized bytes=1836
		Input split bytes=100
		Combine input records=179
		Combine output records=131
		Reduce input groups=131
		Reduce shuffle bytes=1836
		Reduce input records=131
		Reduce output records=131
		Spilled Records=262
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=168
		CPU time spent (ms)=1250
		Physical memory (bytes) snapshot=465510400
		Virtual memory (bytes) snapshot=3973697536
		Total committed heap usage (bytes)=354942976
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=1366
	File Output Format Counters
		Bytes Written=1306
* This is the result of running wordcount from the hadoop-mapreduce-examples-2.8.1.jar that ships with the Hadoop installation.
* The output is created in the /output directory specified when the job was run.
* Note: the output directory does not need to be created in advance; it is created automatically when the job runs (see the cleanup tip below if you need to re-run the job).
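One caveat when re-running the job: MapReduce refuses to start if the output directory already exists, so /output has to be removed between runs. A minimal sketch, assuming the same /output path as above:

# Delete the previous output directory (recursively) before re-running wordcount
$ hdfs dfs -rm -r /output

# Or bypass the HDFS trash and delete it immediately
$ hdfs dfs -rm -r -skipTrash /output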
4. Check the output files after running wordcount
$ hdfs dfs -ls /output
Found 2 items
-rw-r--r--   3 hadoop-user supergroup          0 2017-10-13 13:16 /output/_SUCCESS
-rw-r--r--   3 hadoop-user supergroup       1306 2017-10-13 13:16 /output/part-r-00000
* Confirm that the part-r-00000 file has been created in the /output directory, as shown above. The empty _SUCCESS file is simply a marker indicating that the job finished successfully.
5. View the results
$ hdfs dfs -cat /output/part-r-00000
(BIS),	1
(ECCN)	1
(TSU)	1
(see	1
5D002.C.1,	1
740.13)	1
<http://www.wassenaar.org/>	1
Administration	1
Apache	1
BEFORE	1
BIS	1
Bureau	1
Commerce,	1
Commodity	1
Control	1
Core	1
Department	1
ENC	1
Exception	1
Export	2
For	1
Foundation	1
Government	1
Hadoop	1
Hadoop,	1
Industry	1
Jetty	1
License	1
Number	1
Regulations,	1
SSL	1
Section	1
Security	1
See	1
Software	2
Technology	1
The	4
This	1
U.S.	1
Unrestricted	1
about	1
algorithms.	1
and	6
and/or	1
another	1
any	1
as	1
asymmetric	1
at:	2
both	1
by	1
check	1
classified	1
code	1
code.	1
concerning	1
country	1
country's	1
country,	1
cryptographic	3
currently	1
details	1
distribution	2
eligible	1
encryption	3
exception	1
export	1
following	1
for	3
form	1
from	1
functions	1
has	1
have	1
http://hadoop.apache.org/core/	1
http://wiki.apache.org/hadoop/	1
if	1
import,	2
in	1
included	1
includes	2
information	2
information.	1
is	1
it	1
latest	1
laws,	1
libraries	1
makes	1
manner	1
may	1
more	2
mortbay.org.	1
object	1
of	5
on	2
or	2
our	2
performing	1
permitted.	1
please	2
policies	1
possession,	2
project	1
provides	1
re-export	2
regulations	1
reside	1
restrictions	1
security	1
see	1
software	2
software,	2
software.	2
software:	1
source	1
the	8
this	3
to	2
under	1
use,	2
uses	1
using	2
visit	1
website	1
which	2
wiki,	1
with	1
written	1
you	1
your	1
* Judging from the output above, the data set turned out to be smaller than expected.
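To see the most frequent words first, or to keep a local copy of the result, something like the following works (assuming the same /output path; the local file name wordcount_result.txt is just an example):

# Sort the word counts by the second column (count), descending, and show the top 10
$ hdfs dfs -cat /output/part-r-00000 | sort -k2 -nr | head -n 10

# Merge all part-* files in /output into a single local file
$ hdfs dfs -getmerge /output ./wordcount_result.txt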