HDFS Basic Operations

I. dfs: the bin/hdfs dfs commands

appendToFile

Usage: hdfs dfs -appendToFile <localsrc> ... <dst>

Appends one or more files from the local Linux file system to the specified file on HDFS. It can also read input from stdin.

·         hdfs dfs -appendToFile localfile /user/hadoop/hadoopfile

·         hdfs dfs -appendToFile localfile1 localfile2 /user/hadoop/hadoopfile

·         hdfs dfs -appendToFile localfile hdfs://nn.example.com/hadoop/hadoopfile

·         hdfs dfs -appendToFile - hdfs://nn.example.com/hadoop/hadoopfile Reads the input from stdin.

Exit Code:

Returns 0 on success and 1 on error.

cat: view file contents

Usage: hdfs dfs -cat URI [URI ...]

Displays the contents of the given files.

Example:

·         hdfs dfs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2

·         hdfs dfs -cat file:///file3 /user/hadoop/file4

Exit Code:

Returns 0 on success and -1 on error.

chgrp (change group): change the group a file or directory belongs to

Usage: hdfs dfs -chgrp [-R] GROUP URI [URI ...]

Changes the group association of files.

Options

·         The -R option will make the change recursively through the directory structure.

chmod: change the permissions of a file or directory

Usage: hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]

Changes the permissions of files.

Options

·         The -R option will make the change recursively through the directory structure.
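Example (the path here is only illustrative):

·         hdfs dfs -chmod -R 755 /user/hadoop/dir1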

chown: change the owner of a file or directory

Usage: hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ]

Changes the owner of files.

Options

·         The -R option will make the change recursively through the directory structure.
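Example (the owner, group, and path here are only illustrative):

·         hdfs dfs -chown -R hadoop:hadoop /user/hadoop/dir1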

copyFromLocal: copy a file from the local Linux system to HDFS

-f overwrites the destination if it already exists.

e.g. ./hdfs dfs -copyFromLocal /usr/local/java/sparkword.txt  /out/wc/

/usr/local/java/sparkword.txt is the source file on Linux

/out/wc/ is the destination path on HDFS

Usage: hdfs dfs -copyFromLocal <localsrc> URI

Options:

·         The -f option will overwrite the destination if it already exists.

copyToLocal: copy a file from HDFS to the local Linux system

e.g. ./hdfs dfs -copyToLocal /out/wc/sparkword.txt  /usr/local/java/sparkword.txt

/usr/local/java/sparkword.txt is the target file on Linux

/out/wc/sparkword.txt is the source file on HDFS

Usage: hdfs dfs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

The -crc option enables checksum verification, i.e. it checks that the file copied from HDFS to Linux is complete; if the check fails, the file should be downloaded again.

count: list the number of directories, the number of files, and the content size

Usage: hdfs dfs -count  [-q] [-h] <paths>

Lists the number of directories, number of files, and content size. The output columns with -count are: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME

The output columns with -count -q are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME

The -h option shows sizes in human readable format.

Example:

·         hdfs dfs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2

·         hdfs dfs -count -q hdfs://nn1.example.com/file1

·         hdfs dfs -count -q -h hdfs://nn1.example.com/file1

Exit Code:

Returns 0 on success and -1 on error.

cp: copy files or directories

Usage: hdfs dfs -cp [-f] [-p | -p[topax]] URI [URI ...] <dest>

Copies files or directories; the destination can be overwritten, and the original permission information can be preserved.

Options:

·         The -f option will overwrite the destination if it already exists.

·         The -p option will preserve file attributes [topx] (timestamps, ownership, permission, ACL, XAttr). If -p is specified with no arg, then preserves timestamps, ownership, permission. If -pa is specified, then preserves permission also because ACL is a super-set of permission. Determination of whether raw namespace extended attributes are preserved is independent of the -p flag.

Example:

·         hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2

·         hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir

Exit Code:

Returns 0 on success and -1 on error.

du: display the size of files or directories.

Usage: hdfs dfs -du [-s] [-h] URI [URI ...]

Options:

·         The -s option will result in an aggregate summary of file lengths being displayed, rather than the individual files.

·         The -h option will format file sizes in a "human-readable" fashion (e.g. 64.0m instead of 67108864)

Example:

·         hdfs dfs -du /user/hadoop/dir1 /user/hadoop/file1 hdfs://nn.example.com/user/hadoop/dir1

Exit Code: Returns 0 on success and -1 on error.

dus

Usage: hdfs dfs -dus <args>

Displays a summary of file lengths.

Note: This command is deprecated. Instead use hdfs dfs -du -s.

expunge: empty the trash.

Usage: hdfs dfs -expunge

 

get: download a file from HDFS

Usage: hdfs dfs -get [-ignorecrc] [-crc] <src> <localdst>

Copy files to the local file system. Files that fail the CRC check may be copied with the -ignorecrc option. Files and CRCs may be copied using the -crc option.

Example:

·         hdfs dfs -get /user/hadoop/file localfile

·         hdfs dfs -get hdfs://nn.example.com/user/hadoop/file localfile

Exit Code:

Returns 0 on success and -1 on error.

getfacl

Usage: hdfs dfs -getfacl [-R] <path>

Displays the Access Control Lists (ACLs) of files and directories.

Options:

·         -R: List the ACLs of all files and directories recursively.

·         path: File or directory to list.

Examples:

·         hdfs dfs -getfacl /file

·         hdfs dfs -getfacl -R /dir

Exit Code:

Returns 0 on success and non-zero on error.

getfattr

Usage: hdfs dfs -getfattr [-R] -n name | -d [-e en] <path>

Displays the extended attribute names and values (if any) for a file or directory.

Options:

·         -R: Recursively list the attributes for all files and directories.

·         -n name: Dump the named extended attribute value.

·         -d: Dump all extended attribute values associated with pathname.

·         -e encoding: Encode values after retrieving them. Valid encodings are "text", "hex", and "base64". Values encoded as text strings are enclosed in double quotes ("), and values encoded as hexadecimal and base64 are prefixed with 0x and 0s, respectively.

·         path: The file or directory.

Examples:

·         hdfs dfs -getfattr -d /file

·         hdfs dfs -getfattr -R -n user.myAttr /dir

Exit Code:

Returns 0 on success and non-zero on error.

getmerge

Usage: hdfs dfs -getmerge <src> <localdst> [addnl]

Merges the files under the source directory into a single file on the local file system.
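Example (the paths are only illustrative; the optional addnl argument adds a newline at the end of each merged file):

·         hdfs dfs -getmerge /user/hadoop/dir1 ./merged.txt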

ls

Usage: hdfs dfs -ls [-R] <args>

Options:

·         The -R option will return stat recursively through the directory structure.

For a file returns stat on the file with the following format:

permissions number_of_replicas userid groupid filesize modification_date modification_time filename

For a directory it returns list of its direct children as in Unix. A directory is listed as:

permissions userid groupid modification_date modification_time dirname

Example:

·         hdfs dfs -ls /user/hadoop/file1

Exit Code:

Returns 0 on success and -1 on error.

lsr

Usage: hdfs dfs -lsr <args>

Recursive version of ls.

Note: This command is deprecated. Instead use hdfs dfs -ls -R

mkdir: create directories

Usage: hdfs dfs -mkdir [-p] <paths>

Takes path uri's as argument and creates directories.

Options:

·         The -p option behavior is much like Unix mkdir -p, creating parent directories along the path.

Example:

·         hdfs dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2

·         hdfs dfs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir

Exit Code:

Returns 0 on success and -1 on error.

moveFromLocal: move a file from the local file system to HDFS

Usage: hdfs dfs -moveFromLocal <localsrc> <dst>

Similar to put command, except that the source localsrc is deleted after it's copied.

moveToLocal: move a file from HDFS to the local file system

Usage: hdfs dfs -moveToLocal [-crc] <src> <dst>

Displays a "Not implemented yet" message.

mv: move files within HDFS

Usage: hdfs dfs -mv URI [URI ...] <dest>

Moves files from source to destination. This command allows multiple sources as well in which case the destination needs to be a directory. Moving files across file systems is not permitted.

Example:

·         hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2

·         hdfs dfs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1

Exit Code:

Returns 0 on success and -1 on error.

put: upload files to HDFS

Usage: hdfs dfs -put <localsrc> ... <dst>

Copy single src, or multiple srcs from local file system to the destination file system. Also reads input from stdin and writes to destination file system.

·         hdfs dfs -put localfile /user/hadoop/hadoopfile

·         hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir

·         hdfs dfs -put localfile hdfs://nn.example.com/hadoop/hadoopfile

·         hdfs dfs -put - hdfs://nn.example.com/hadoop/hadoopfile Reads the input from stdin.

Exit Code:

Returns 0 on success and -1 on error.

rm: delete files

Usage: hdfs dfs -rm [-f] [-r|-R] [-skipTrash] URI [URI ...]

Delete files specified as args.

Options:

·         The -f option will not display a diagnostic message or modify the exit status to reflect an error if the file does not exist.

·         The -R option deletes the directory and any content under it recursively.

·         The -r option is equivalent to -R.

·         The -skipTrash option will bypass trash, if enabled, and delete the specified file(s) immediately. This can be useful when it is necessary to delete files from an over-quota directory.

Example:

·         hdfs dfs -rm hdfs://nn.example.com/file /user/hadoop/emptydir

Exit Code:

Returns 0 on success and -1 on error.

rmr: recursively delete

Usage: hdfs dfs -rmr [-skipTrash] URI [URI ...]

Recursive version of delete.

Note: This command is deprecated. Instead use hdfs dfs -rm -r

setfacl

Usage: hdfs dfs -setfacl [-R] [-b|-k -m|-x <acl_spec> <path>]|[--set <acl_spec> <path>]

Sets Access Control Lists (ACLs) of files and directories.

Options:

·         -b: Remove all but the base ACL entries. The entries for user, group and others are retained for compatibility with permission bits.

·         -k: Remove the default ACL.

·         -R: Apply operations to all files and directories recursively.

·         -m: Modify ACL. New entries are added to the ACL, and existing entries are retained.

·         -x: Remove specified ACL entries. Other ACL entries are retained.

·         --set: Fully replace the ACL, discarding all existing entries. The acl_spec must include entries for user, group, and others for compatibility with permission bits.

·         acl_spec: Comma separated list of ACL entries.

·         path: File or directory to modify.

Examples:

·         hdfs dfs -setfacl -m user:hadoop:rw- /file

·         hdfs dfs -setfacl -x user:hadoop /file

·         hdfs dfs -setfacl -b /file

·         hdfs dfs -setfacl -k /dir

·         hdfs dfs -setfacl --set user::rw-,user:hadoop:rw-,group::r--,other::r-- /file

·         hdfs dfs -setfacl -R -m user:hadoop:r-x /dir

·         hdfs dfs -setfacl -m default:user:hadoop:r-x /dir

Exit Code:

Returns 0 on success and non-zero on error.

setfattr

Usage: hdfs dfs -setfattr -n name [-v value] | -x name <path>

Sets an extended attribute name and value for a file or directory.

Options:

·         -n name: The extended attribute name.

·         -v value: The extended attribute value. There are three different encoding methods for the value. If the argument is enclosed in double quotes, then the value is the string inside the quotes. If the argument is prefixed with 0x or 0X, then it is taken as a hexadecimal number. If the argument begins with 0s or 0S, then it is taken as a base64 encoding.

·         -x name: Remove the extended attribute.

·         path: The file or directory.

Examples:

·         hdfs dfs -setfattr -n user.myAttr -v myValue /file

·         hdfs dfs -setfattr -n user.noValue /file

·         hdfs dfs -setfattr -x user.myAttr /file

Exit Code:

Returns 0 on success and non-zero on error.

setrep: set the replication factor

Usage: hdfs dfs -setrep [-R] [-w] <numReplicas> <path>

Changes the replication factor of a file. If path is a directory then the command recursively changes the replication factor of all files under the directory tree rooted at path.

Options:

·         The -w flag requests that the command wait for the replication to complete. This can potentially take a very long time.

·         The -R flag is accepted for backwards compatibility. It has no effect.

Example:

·         hdfs dfs -setrep -w 3 /user/hadoop/dir1

Exit Code:

Returns 0 on success and -1 on error.

stat

Usage: hdfs dfs -stat URI [URI ...]

Returns the stat information on the path.

Example:

·         hdfs dfs -stat path

Exit Code: Returns 0 on success and -1 on error.

tail: display the last kilobyte of a file

Usage: hdfs dfs -tail [-f] URI

Displays last kilobyte of the file to stdout.

Options:

·         The -f option will output appended data as the file grows, as in Unix.

Example:

·         hdfs dfs -tail pathname

Exit Code: Returns 0 on success and -1 on error.

test  

Usage: hdfs dfs -test -[ezd] URI

Options:

·         The -e option will check to see if the file exists, returning 0 if true.

·         The -z option will check to see if the file is zero length, returning 0 if true.

·         The -d option will check to see if the path is directory, returning 0 if true.

Example:

·         hdfs dfs -test -e filename

text: view file contents in text form (similar to cat)

Usage: hdfs dfs -text <src>

Takes a source file and outputs the file in text format. The allowed formats are zip and TextRecordInputStream.

touchz: create an empty file

Usage: hdfs dfs -touchz URI [URI ...]

Create a file of zero length.

Example:

·         hdfs dfs -touchz pathname

Exit Code: Returns 0 on success and -1 on error.

II. dfsadmin: the bin/hdfs dfsadmin commands

Leave safe mode:

./hdfs dfsadmin -safemode leave

enter enters safe mode

leave exits safe mode

get queries the current safe mode status

View information about the machines in the HDFS cluster:

./hdfs dfsadmin -report -live

setQuota   sets the maximum number of items allowed under a directory

./hdfs dfsadmin -setQuota 10 /lisi  means the /lisi directory may contain at most 10 items

-clrQuota clears the name quota

./hdfs dfsadmin -clrQuota /lisi

-setSpaceQuota limits the total amount of data that can be stored under a directory

./hdfs dfsadmin -setSpaceQuota 4k /lisi/  means at most 4k of data can be stored under the /lisi directory
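The matching clear command removes the space quota again; for example, for the same directory:

./hdfs dfsadmin -clrSpaceQuota /lisi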

View the quota settings:

./hdfs dfs -count -q -h /lisi        (-q shows the quota information)

III. Access via HttpFS

1. Edit httpfs-env.sh and uncomment  export HTTPFS_HTTP_PORT=14000     (the HTTP port)

2. Edit core-site.xml:

<property>

    <name>hadoop.proxyuser.root.hosts</name>

    <value>*</value>   <!-- any host is allowed to access -->

</property>

<property>

    <name>hadoop.proxyuser.root.groups</name>

    <value>*</value>  <!-- any user is allowed to access -->

</property>

3. Edit hdfs-site.xml:

<property>

          <name>dfs.webhdfs.enabled</name>

           <value>true</value>  <!-- enable WebHDFS (HTTP) access -->

</property>   

4. Restart the NameNode.

5. Start HttpFS (this automatically starts an embedded Tomcat container):

sbin/httpfs.sh start

6. On Linux, run  curl -i "http://node11:14000/webhdfs/v1?user.name=root&op=LISTSTATUS"

or open  http://node11:14000/webhdfs/v1?user.name=root&op=LISTSTATUS  in a browser

node11 is the host name (an IP address also works)

14000 is the port

webhdfs/v1 is a fixed prefix

user.name=root means the request runs as the root user

op=LISTSTATUS is the operation to perform
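As a further example (assuming the sparkword.txt file uploaded in the copyFromLocal example above still exists at /out/wc/sparkword.txt), a file can be read through HttpFS with the OPEN operation:

curl -i "http://node11:14000/webhdfs/v1/out/wc/sparkword.txt?user.name=root&op=OPEN"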

 

More operations are documented at http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html

IV. The NameNode in HDFS

The NameNode is the metadata management node of HDFS. It maintains the directory tree of the entire file system, the metadata of every file and directory, and the list of blocks that make up each file. It receives client requests: a client first asks the NameNode which DataNodes hold the data, and then uses the returned addresses to fetch the data from those DataNodes.

The NameNode metadata is stored in the directory configured by the dfs.namenode.name.dir property in hdfs-site.xml,

which is ${hadoop.tmp.dir}/dfs/name/current by default (${hadoop.tmp.dir} is configured in core-site.xml).

On my machine this is /opt/hadoop-2.5/dfs/name/current; this directory mainly contains fsimage, edits, and fstime files.

fsimage: the metadata image file. It stores a snapshot of the NameNode's in-memory metadata at a point in time.

To view the contents of an fsimage file:

* First start the viewer server: bin/hdfs oiv -i <some fsimage file> -o <output file name>

* Browse the contents: bin/hdfs dfs -ls  -R webhdfs://127.0.0.1:5978/

* Export the result to XML: bin/hdfs oiv -p XML -i  tmp/dfs/name/current/fsimage_0000000000000000055  -o fsimage.xml

-i is the input: the fsimage file to inspect

-o is the output: the file the result is written to

edits: the log of client operations on HDFS. Operations such as uploading files, downloading files, or creating files are recorded in the edits files.

To view the contents of an edits file:

bin/hdfs oev -i tmp/dfs/name/current/edits_0000000000000000057-0000000000000000186   -o edits.xml

fstime: stores the time of the most recent checkpoint.

V. The DataNode in HDFS

The DataNode is the node that actually stores data in HDFS. Data is split into blocks for storage.

The default block size is 128 MB. A 200 MB file is therefore split into two blocks, one of 128 MB and one of 72 MB. A file of 20 MB, which is smaller than 128 MB, occupies a block whose size equals the actual file size.

HDFS is not well suited to storing small files: the more small files there are, the more blocks there are, and since the NameNode keeps the block list in memory, more blocks means more NameNode memory is consumed, which puts pressure on the NameNode and reduces the serving capacity of the cluster.

DataNode data is stored under ${hadoop.tmp.dir}/dfs/data/current/BP-1620470987-192.168.52.138-1479043560569/current/finalized
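A minimal Java sketch of how the blocks and replicas of a file can be inspected through the FileSystem API (the hdfs://node11:8020 address and the /student file are the ones used in the Java example in section VII; adjust them to your cluster):

// imports: java.net.URI, java.util.Arrays, org.apache.hadoop.conf.Configuration,
//          org.apache.hadoop.fs.FileSystem / FileStatus / Path / BlockLocation
URI uri = new URI("hdfs://node11:8020");
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(uri, conf);
FileStatus status = fs.getFileStatus(new Path("/student"));
// block size and replication factor of this file
System.out.println(status.getBlockSize() + "  " + status.getReplication());
// one BlockLocation per block, with the hosts that store its replicas
BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
for (BlockLocation block : blocks) {
    System.out.println(block.getOffset() + " " + block.getLength() + " " + Arrays.toString(block.getHosts()));
}
fs.close();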

The block size can be changed in hdfs-site.xml:

<property>

    <name>dfs.blocksize</name>   

    <value>134217728</value>

</property>

Replication keeps multiple copies of the data to guard against data loss. The default number of replicas is 3; it can be changed in hdfs-site.xml.

The cluster places the replicas on DataNodes on different machines.

<property>

    <name>dfs.replication</name>

    <value>3</value>

</property>

The replication factor of a specific file can also be changed from the command line:

./hdfs dfs -setrep 5 /aa    sets the replication factor of the /aa file to 5
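To verify the blocks and replicas of a file, hdfs fsck can be used; for example, for the /aa file above:

./hdfs fsck /aa -files -blocks -locations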

VI. Packing many small files into a har archive: merging them into one large file reduces the NameNode memory consumed by many small files

Command to create a har archive:

bin/hadoop archive -archiveName xxx.har -p  /src  /dest

e.g. ./hadoop archive -archiveName aa.har -p  /out /

View the contents of a har archive:

*bin/hadoop fs -lsr har:///dest/xxx.har

e.g. ./hadoop fs -lsr har:///aa.har

VII. Operating on HDFS from Java

/**
 * Basic HDFS operations from Java
 * @author xiaozhou
 */
public class Hdfs {

    public static void main(String[] args) throws Exception {
        // HDFS address
        URI uri = new URI("hdfs://node11:8020");
        Configuration con = new Configuration();
        FileSystem fileSystem = FileSystem.get(uri, con);
        //readFromHdfs(fileSystem);
        //uploadToHdfs(fileSystem);
        deleteFromHdfs(fileSystem);
    }

    // Read data from HDFS
    public static void readFromHdfs(FileSystem fileSystem) throws Exception {
        // read the contents of the /student file in the root directory
        FSDataInputStream inputStream = fileSystem.open(new Path("/student"));
        // 1st argument: the input stream
        // 2nd argument: the output stream; System.out prints to the console, while a file stream
        //   such as new FileOutputStream(new File("d:/aa.txt")) would write to a file
        // 3rd argument: the buffer size
        // 4th argument: whether to close the streams when the copy finishes; true closes them,
        //   so no extra closeStream call is needed afterwards
        IOUtils.copyBytes(inputStream, System.out, 1024, true);
    }

    // Upload a file to HDFS
    public static void uploadToHdfs(FileSystem fileSystem) throws Exception {
        // check whether the target file already exists
        boolean exists = fileSystem.exists(new Path("/out/pom.xml"));
        System.out.println(exists);
        if (!exists) {
            // the target path on HDFS; it must not exist yet
            FSDataOutputStream outputStream = fileSystem.create(new Path("/out/pom.xml"));
            // the local file to upload
            FileInputStream inputStream = new FileInputStream(new File("D:/pom.xml"));
            IOUtils.copyBytes(inputStream, outputStream, 1024, true);
        }
    }

    // Delete a file: deleteOnExit removes the path when the FileSystem is closed;
    // fileSystem.delete(new Path(...), false) would delete it immediately
    public static void deleteFromHdfs(FileSystem fileSystem) throws Exception {
        boolean flag = fileSystem.deleteOnExit(new Path("/out/pom.xml"));
        System.out.println(flag);
    }
}
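A method like the following (a minimal sketch; the /out directory is the one used in the upload example above) could be added to the class to list a directory through the same FileSystem API:

    // List the direct children of an HDFS directory with their size and replication factor
    public static void listFromHdfs(FileSystem fileSystem) throws Exception {
        FileStatus[] statuses = fileSystem.listStatus(new Path("/out"));
        for (FileStatus status : statuses) {
            System.out.println(status.getPath() + "  " + status.getLen() + "  " + status.getReplication());
        }
    }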

VIII. HDFS NameNode high availability

The NameNodes keep their data in sync through a JournalNode cluster: when new data is written to HDFS, the JournalNodes propagate the data of the NameNode in the active state to the NameNode in the standby state.

When the active NameNode goes down, the FailoverController switches the standby NameNode into the active state so that it can keep serving clients.

IX. The HDFS Trash

* Configuration: add the following to core-site.xml on every node (not only the master node):

<property>

    <name>fs.trash.interval</name>

    <value>1000</value>  <!-- files deleted to the trash are cleaned up after 1000 minutes -->

</property>
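With the trash enabled, files removed with hdfs dfs -rm are moved into the current user's trash directory instead of being deleted immediately; for the root user used in the examples above they can be inspected with (the path is the usual default, adjust if your setup differs):

./hdfs dfs -ls /user/root/.Trash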

X. Ways to deal with small files in HDFS

1. Merge many small files into a single file programmatically before uploading them:

// fs is a FileSystem instance obtained as in the Java example in section VII;
// IOUtils here is org.apache.commons.io.IOUtils
final Path path = new Path("/combinedfile");
final FSDataOutputStream create = fs.create(path);
final File dir = new File("C:\\Windows\\System32\\drivers\\etc");
for (File fileName : dir.listFiles()) {
    System.out.println(fileName.getAbsolutePath());
    final FileInputStream fileInputStream = new FileInputStream(fileName.getAbsolutePath());
    final List<String> readLines = IOUtils.readLines(fileInputStream);
    for (String line : readLines) {
        // readLines strips the line terminators, so re-add a newline after each line
        create.write((line + "\n").getBytes());
    }
    fileInputStream.close();
}
create.close();

2. Use a SequenceFile to pack many small records into a single file:

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path seqFile = new Path("seqFile.seq");

// the Writer inner class writes the file; here both the key and the value are Text
SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, seqFile, Text.class, Text.class);

// append records through the writer
writer.append(new Text("key"), new Text("value"));
IOUtils.closeStream(writer); // close the writer

// the Reader inner class reads the file back; it is created after the file has been written
SequenceFile.Reader reader = new SequenceFile.Reader(fs, seqFile, conf);

// read the records back through the reader
Text key = new Text();
Text value = new Text();
while (reader.next(key, value)) {
    System.out.println(key);
    System.out.println(value);
}
IOUtils.closeStream(reader); // close the reader

3. Pack the files into a har archive (as in section VI).

XI. Copying data between clusters

hadoop distcp hdfs://source  hdfs://destination
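For example (the nn1 and nn2 NameNode addresses are only illustrative):

hadoop distcp hdfs://nn1:8020/src/dir hdfs://nn2:8020/dest/dir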
