
What Hadoop's TextInputFormat does, and how to implement a custom one

Published: 2024/2/28 · Programming Q&A · 35 · 豆豆

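Code first: [source code]

Without further ado, here is the code. This is TextInputFormat from the package org.apache.hadoop.mapred, i.e. the old MapReduce API: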

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.mapred;

import java.io.*;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.*;

import com.google.common.base.Charsets;

/**
 * An {@link InputFormat} for plain text files.  Files are broken into lines.
 * Either linefeed or carriage-return are used to signal end of line.  Keys are
 * the position in the file, and values are the line of text..
 */
@InterfaceAudience.Public
@InterfaceStability.Stable
public class TextInputFormat extends FileInputFormat<LongWritable, Text>
  implements JobConfigurable {

  private CompressionCodecFactory compressionCodecs = null;

  public void configure(JobConf conf) {
    compressionCodecs = new CompressionCodecFactory(conf);
  }

  protected boolean isSplitable(FileSystem fs, Path file) {
    final CompressionCodec codec = compressionCodecs.getCodec(file);
    if (null == codec) {
      return true;
    }
    return codec instanceof SplittableCompressionCodec;
  }

  public RecordReader<LongWritable, Text> getRecordReader(
                                          InputSplit genericSplit, JobConf job,
                                          Reporter reporter)
    throws IOException {

    reporter.setStatus(genericSplit.toString());
    String delimiter = job.get("textinputformat.record.delimiter");
    byte[] recordDelimiterBytes = null;
    if (null != delimiter) {
      recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
    }
    return new LineRecordReader(job, (FileSplit) genericSplit,
        recordDelimiterBytes);
  }
}

First, the class comment:

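/**
 * An {@link InputFormat} for plain text files.  Files are broken into lines.
 * Either linefeed or carriage-return are used to signal end of line.  Keys are
 * the position in the file, and values are the line of text..
 */

In other words, this is an InputFormat for plain-text files. Each file is broken into lines, and either a linefeed or a carriage return marks the end of a line. The key of each record is the line's position in the file (its byte offset, as a LongWritable), and the value is the text of the line without its terminator (as a Text).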
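To make the key/value contract concrete, here is a stdlib-only Java simulation of the records a TextInputFormat job would see for one unsplit file. It is a sketch, not Hadoop's LineReader: the class name LineRecordsSketch and the Rec pair (standing in for LongWritable/Text) are hypothetical, and it assumes the whole file fits in a byte array. It treats LF, CR and CR+LF as line endings and keys each line by its starting byte offset.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stdlib-only simulation of TextInputFormat's records for a whole
// (unsplit) file. Not Hadoop's LineReader; it only mirrors the documented behavior.
final class LineRecordsSketch {

    // Stand-in for the (LongWritable key, Text value) pair.
    record Rec(long offset, String line) {}

    static List<Rec> read(byte[] data) {
        List<Rec> out = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            int lineStart = pos;
            // Scan forward to the next LF or CR (or end of data).
            while (pos < data.length && data[pos] != '\n' && data[pos] != '\r') {
                pos++;
            }
            // The value excludes the terminator; the key is the byte offset.
            String line = new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8);
            out.add(new Rec(lineStart, line));
            // Consume the terminator: CR, CR followed by LF, or LF.
            if (pos < data.length && data[pos] == '\r') {
                pos++;
                if (pos < data.length && data[pos] == '\n') {
                    pos++;
                }
            } else if (pos < data.length) { // data[pos] is '\n'
                pos++;
            }
        }
        return out;
    }
}
```

For the input "hello\nworld\r\nfoo\rbar" this yields keys 0, 6, 13 and 17. The keys grow by line length plus terminator length, which is why they are byte offsets rather than line numbers.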

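An InputFormat describes the format of a job's input data: how the input is divided into InputSplits and how each split is read as key/value records.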
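In the old mapred API an InputFormat has two jobs: getSplits divides the input into InputSplits, and getRecordReader returns a RecordReader that turns one split into key/value pairs. Split boundaries rarely fall exactly on line breaks, so a text reader needs a boundary rule. The stdlib-only sketch below models the rule used by LineRecordReader: a split that does not start at offset 0 discards its first (possibly partial) line, and every split keeps reading while the next line starts at or before its end offset. SplitSketch, getSplits and readSplit are hypothetical names, fixed-size splits are a simplification of FileInputFormat's split sizing, and only '\n' is treated as a line ending.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two InputFormat roles, modeled on LineRecordReader's
// split-boundary rule. Fixed-size splits and '\n'-only line endings are simplifications.
final class SplitSketch {

    record Split(int start, int length) {}

    // Role 1: cut the input into byte ranges.
    static List<Split> getSplits(int fileLength, int splitSize) {
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    // Offset just past the '\n' ending the line that contains pos (or data.length).
    private static int nextLineStart(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\n') {
            pos++;
        }
        return Math.min(pos + 1, data.length);
    }

    // Role 2: read whole lines for one split, as "offset:line" strings.
    static List<String> readSplit(byte[] data, Split split) {
        int end = split.start() + split.length();
        int pos = split.start();
        if (pos != 0) {
            // The previous split already emitted the line containing this offset.
            pos = nextLineStart(data, pos);
        }
        List<String> lines = new ArrayList<>();
        // A line that starts at or before 'end' belongs to this split,
        // even if it runs past the split's end.
        while (pos <= end && pos < data.length) {
            int next = nextLineStart(data, pos);
            int lineEnd = (data[next - 1] == '\n') ? next - 1 : next; // drop the '\n'
            lines.add(pos + ":" + new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = next;
        }
        return lines;
    }
}
```

Concatenating readSplit over all splits gives the same records as reading the file as a single split, whatever the split size, so every line is emitted exactly once even when a split ends mid-line.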

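TextInputFormat overrides isSplitable from its parent class, FileInputFormat, and implements getRecordReader, which returns a LineRecordReader. It also implements JobConfigurable.configure, which creates the CompressionCodecFactory that isSplitable relies on: an uncompressed file is always splittable, and a compressed file is splittable only if its codec is a SplittableCompressionCodec.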
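The isSplitable decision can be sketched without Hadoop: look up a codec for the file, treat "no codec" as plain text, and otherwise allow splitting only for codecs that can resume decompression mid-stream. In Hadoop the lookup is done by CompressionCodecFactory.getCodec based on the file name suffix; the three-entry suffix table below is an illustrative assumption, not the factory's real configuration, and SplitabilitySketch is a hypothetical name.

```java
import java.util.Map;

// Hypothetical stdlib-only sketch of TextInputFormat.isSplitable.
// The suffix table is an illustrative assumption, not CompressionCodecFactory.
final class SplitabilitySketch {

    // Suffix -> "is this codec splittable?"
    private static final Map<String, Boolean> CODECS = Map.of(
            ".gz", false,       // GzipCodec is not a SplittableCompressionCodec
            ".deflate", false,  // DefaultCodec is not splittable either
            ".bz2", true);      // BZip2Codec implements SplittableCompressionCodec

    static boolean isSplitable(String fileName) {
        for (Map.Entry<String, Boolean> e : CODECS.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue(); // codec found: splittable only if the codec allows it
            }
        }
        return true; // no codec: plain text can be split at any byte offset
    }
}
```

The practical consequence is that a .gz text file is processed as a single split (one map task) no matter how large it is, while an uncompressed or .bz2 file can be divided among several map tasks.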

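The only charset in play is Charsets.UTF_8, and it is used solely to turn the optional custom record delimiter (the textinputformat.record.delimiter property) into bytes. If the property is not set, recordDelimiterBytes stays null and LineRecordReader falls back to the default line endings.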
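Here is a stdlib-only sketch of what a custom delimiter does: the delimiter string is encoded as UTF-8 bytes, and records end only where that exact byte sequence appears, so a newline inside a record is kept. DelimiterSketch is a hypothetical name, and this naive byte scan is not Hadoop's LineReader; it only mirrors the observable splitting.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stdlib-only sketch of splitting on a custom record delimiter.
final class DelimiterSketch {

    static List<String> split(byte[] data, String delimiter) {
        // Same encoding step as TextInputFormat.getRecordReader.
        byte[] delim = delimiter.getBytes(StandardCharsets.UTF_8);
        if (delim.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int recordStart = 0;
        int i = 0;
        while (i <= data.length - delim.length) {
            // Compare the bytes at i with the delimiter bytes.
            if (Arrays.equals(data, i, i + delim.length, delim, 0, delim.length)) {
                records.add(new String(data, recordStart, i - recordStart, StandardCharsets.UTF_8));
                i += delim.length;
                recordStart = i;
            } else {
                i++;
            }
        }
        if (recordStart < data.length) {
            // Bytes after the last delimiter form the final record.
            records.add(new String(data, recordStart, data.length - recordStart, StandardCharsets.UTF_8));
        }
        return records;
    }
}
```

In a real job the delimiter would be set on the JobConf, e.g. job.set("textinputformat.record.delimiter", "##"). A delimiter such as § shows why the encoding step matters: that one character becomes two bytes in UTF-8, and it is the byte pair that gets matched.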

Beyond that there isn't much to say [mainly because I don't understand the rest...]
