
Linux--Analysis of the Sys_Read System Call Path

Published: 2023/12/15 | linux | 豆豆

This article, collected and compiled by 生活随笔, analyzes the path of the Linux sys_read system call. The editor found it quite good and shares it here for reference.

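Note:

This article uses the read() call as an example to describe how the system handles the layers down to the block driver. If anything is incorrect or incomplete, feel free to leave a comment or email guopeixin@126.com to discuss it. Thanks in advance.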

I. The layered model a read passes through

First, let's look at the layers a read call passes through:

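As the layer diagram shows, a read request to disk first passes through the virtual file system layer (VFS layer). It then goes to the concrete file system layer (e.g. ext2) and the cache layer (page cache). After that come the generic block layer, the I/O scheduler layer and the block device driver layer. Finally it reaches the physical block device layer.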

The following excerpt from another document briefly describes the role of each layer:

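• VFS layer: hides the differences between the concrete file systems below it and offers a single uniform interface to the layers above. This layer is what lets devices be abstracted as files, so operating a device is as simple as operating a file.

• Concrete file system layer: different file systems (e.g. ext2 and NTFS) carry out operations differently, and each defines its own set of operations. The excerpted document points to its own references for more on file systems.

• Cache layer: introduced to improve Linux disk-access performance. It keeps part of the on-disk data in memory. When a request arrives and the data is in the cache and up to date, it is handed directly to the user program, skipping the disk and improving performance.

• Generic block layer: receives disk requests from the layer above and ultimately issues I/O requests. It hides the characteristics of the underlying hardware and gives block devices a common abstract view.

• I/O scheduler layer: receives I/O requests from the generic block layer, queues them, and tries to merge adjacent requests (when their data is contiguous on disk). It then follows the configured scheduling algorithm and calls back the request-handling function provided by the driver layer to process each I/O request.

• Driver layer: each driver corresponds to a specific physical block device. It takes I/O requests from the layer above. Using the information in each request, it drives the data transfer by sending commands to that device's controller.

• Device layer: the physical devices themselves, which define the specifications for operating them.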
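The cache layer's behavior can be made concrete with a minimal user-space sketch (not kernel code) of a read-through page cache. The names `cached_read_page()` and `disk_read()` are hypothetical, as is the tiny 16-byte page. The only point is that an up-to-date cached page is served without touching the "disk".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SZ  16   /* tiny page size, for illustration only */
#define NR_PAGES 4

static char disk[NR_PAGES * PAGE_SZ];  /* hypothetical backing "disk" */
static int disk_reads;                 /* how often the slow path ran */

struct cached_page {
    bool uptodate;                     /* page holds current disk data */
    char data[PAGE_SZ];
};
static struct cached_page cache[NR_PAGES];

/* Slow path: stands in for block layer + scheduler + driver + device. */
static void disk_read(int index, char *dst)
{
    disk_reads++;
    memcpy(dst, disk + index * PAGE_SZ, PAGE_SZ);
}

/* Copy page `index` into dst, going to the "disk" only on a miss. */
void cached_read_page(int index, char *dst)
{
    assert(index >= 0 && index < NR_PAGES);
    struct cached_page *p = &cache[index];
    if (!p->uptodate) {                /* miss: fill the page from the device */
        disk_read(index, p->data);
        p->uptodate = true;
    }
    memcpy(dst, p->data, PAGE_SZ);     /* hit path: served from memory */
}
```

Reading the same page twice increments `disk_reads` only once. A real page cache also has to invalidate and revalidate pages, which this sketch ignores.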

II. Where the system call starts: sys_read

1. Analysis of the sys_read code

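sys_read is ultimately registered as a system call, the kernel-side entry point behind the read() API, and calls to it can be seen in many kernel modules.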

The code of sys_read(), followed by vfs_read(), which it calls, is as follows:

    asmlinkage ssize_t sys_read(unsigned int fd, char __user * buf, size_t count)
    {
        struct file *file;
        ssize_t ret = -EBADF;
        int fput_needed;

        file = fget_light(fd, &fput_needed);
        if (file) {
            loff_t pos = file_pos_read(file);
            ret = vfs_read(file, buf, count, &pos);
            file_pos_write(file, pos);
            fput_light(file, fput_needed);
        }

        return ret;
    }

    ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
    {
        ssize_t ret;

        if (!(file->f_mode & FMODE_READ))
            return -EBADF;
        if (!file->f_op || (!file->f_op->read && !file->f_op->aio_read))
            return -EINVAL;
        if (unlikely(!access_ok(VERIFY_WRITE, buf, count)))
            return -EFAULT;

        ret = rw_verify_area(READ, file, pos, count);
        if (ret >= 0) {
            count = ret;
            if (file->f_op->read)
                ret = file->f_op->read(file, buf, count, pos);
            else
                ret = do_sync_read(file, buf, count, pos);
            if (ret > 0) {
                fsnotify_access(file->f_path.dentry);
                add_rchar(current, ret);
            }
            inc_syscr(current);
        }

        return ret;
    }

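As shown above, the call stack is sys_read() → vfs_read() → file->f_op->read(). file->f_op->read is a function pointer that the concrete file system registers with the VFS. For ext2, the file system discussed in this article, it is do_sync_read.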
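The file-position handling in sys_read() can be observed from user space: it reads the position with file_pos_read() before the read and stores it back with file_pos_write() afterwards. The sketch below assumes a POSIX system. It reads 5 bytes with read(2) and then calls pread(2), which takes an explicit offset. The helper `offsets_after_reads()` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes `data` to an anonymous temp file, rewinds,
 * reads `nread` bytes into `out` with read(2), then issues one pread(2)
 * at offset 0. Records the file offset after each call. Returns the byte
 * count from read(2), or -1 on error. */
int offsets_after_reads(const char *data, size_t nread, char *out,
                        off_t *after_read, off_t *after_pread)
{
    assert(data != NULL && out != NULL);
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    int fd = fileno(fp);
    size_t len = strlen(data);
    char scratch[16];
    ssize_t got;
    int ret = -1;

    if (write(fd, data, len) != (ssize_t)len || lseek(fd, 0, SEEK_SET) != 0)
        goto done;

    /* read(2): reads at the current position and advances it,
     * mirroring file_pos_read()/file_pos_write() in sys_read(). */
    got = read(fd, out, nread);
    if (got < 0)
        goto done;
    *after_read = lseek(fd, 0, SEEK_CUR);

    /* pread(2): uses its offset argument and leaves the position alone. */
    if (pread(fd, scratch, sizeof scratch, 0) < 0)
        goto done;
    *after_pread = lseek(fd, 0, SEEK_CUR);
    ret = (int)got;

done:
    fclose(fp);
    return ret;
}
```

With the input "hello world" and `nread = 5`, both recorded offsets should be 5. read(2) moves the position, and pread(2), a separate system call not covered in this article, leaves it where it was.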

III. The role of the ext2 file system in a sys_read call

1. How ext2 registers its file_operations

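The operation table ext2_file_operations is not installed in a single call chain. At module init, init_ext2_fs() calls register_filesystem() to register the ext2 file system type. Later, at mount time, the path ext2_get_sb() → ext2_fill_super() → ext2_iget() runs. ext2_iget(), which is also called whenever an inode is read in, is where an inode gets its file_operations pointer. The operation tables are defined as follows: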

    /*
     * We have mostly NULL's here: the current defaults are ok for
     * the ext2 filesystem.
     */
    const struct file_operations ext2_file_operations = {
        .llseek         = generic_file_llseek,
        .read           = do_sync_read,
        .write          = do_sync_write,
        .aio_read       = generic_file_aio_read,
        .aio_write      = generic_file_aio_write,
        .unlocked_ioctl = ext2_ioctl,
    #ifdef CONFIG_COMPAT
        .compat_ioctl   = ext2_compat_ioctl,
    #endif
        .mmap           = generic_file_mmap,
        .open           = generic_file_open,
        .release        = ext2_release_file,
        .fsync          = ext2_sync_file,
        .splice_read    = generic_file_splice_read,
        .splice_write   = generic_file_splice_write,
    };

    const struct address_space_operations ext2_aops = {
        .readpage              = ext2_readpage,
        .readpages             = ext2_readpages,
        .writepage             = ext2_writepage,
        .sync_page             = block_sync_page,
        .write_begin           = ext2_write_begin,
        .write_end             = generic_write_end,
        .bmap                  = ext2_bmap,
        .direct_IO             = ext2_direct_IO,
        .writepages            = ext2_writepages,
        .migratepage           = buffer_migrate_page,
        .is_partially_uptodate = block_is_partially_uptodate,
    };

    const struct address_space_operations ext2_nobh_aops = {
        .readpage    = ext2_readpage,
        .readpages   = ext2_readpages,
        .writepage   = ext2_nobh_writepage,
        .sync_page   = block_sync_page,
        .write_begin = ext2_nobh_write_begin,
        .write_end   = nobh_write_end,
        .bmap        = ext2_bmap,
        .direct_IO   = ext2_direct_IO,
        .writepages  = ext2_writepages,
        .migratepage = buffer_migrate_page,
    };

The relevant code in ext2_iget() is as follows:

    struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
    {
        ...
        if (S_ISREG(inode->i_mode)) {
            inode->i_op = &ext2_file_inode_operations;
            if (ext2_use_xip(inode->i_sb)) {
                inode->i_mapping->a_ops = &ext2_aops_xip;
                inode->i_fop = &ext2_xip_file_operations;
            } else if (test_opt(inode->i_sb, NOBH)) {
                inode->i_mapping->a_ops = &ext2_nobh_aops;
                inode->i_fop = &ext2_file_operations;
            } else {
                inode->i_mapping->a_ops = &ext2_aops;
                inode->i_fop = &ext2_file_operations;
            }
        } else if (S_ISDIR(inode->i_mode)) {
            inode->i_op = &ext2_dir_inode_operations;
            inode->i_fop = &ext2_dir_operations;
            if (test_opt(inode->i_sb, NOBH))
                inode->i_mapping->a_ops = &ext2_nobh_aops;
            else
                inode->i_mapping->a_ops = &ext2_aops;
        } else if (S_ISLNK(inode->i_mode)) {
            if (ext2_inode_is_fast_symlink(inode))
                inode->i_op = &ext2_fast_symlink_inode_operations;
            else {
                inode->i_op = &ext2_symlink_inode_operations;
                if (test_opt(inode->i_sb, NOBH))
                    inode->i_mapping->a_ops = &ext2_nobh_aops;
                else
                    inode->i_mapping->a_ops = &ext2_aops;
            }
        } else {
            inode->i_op = &ext2_special_inode_operations;
            if (raw_inode->i_block[0])
                init_special_inode(inode, inode->i_mode,
                    old_decode_dev(le32_to_cpu(raw_inode->i_block[0])));
            else
                init_special_inode(inode, inode->i_mode,
                    new_decode_dev(le32_to_cpu(raw_inode->i_block[1])));
        }
        ...
    }
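The way vfs_read() uses these tables can be modeled in user space. The sketch below keeps only the checks and the dispatch from the vfs_read() listing in section II. It returns -EBADF for a file not opened for reading and -EINVAL when neither ->read nor ->aio_read exists. It calls ->read if present, and otherwise falls back to a do_sync_read()-style wrapper around ->aio_read. All structure and function names are hypothetical. The real ->aio_read takes a kiocb and an iovec rather than a plain buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct toy_file;
/* Hypothetical simplified signature shared by both read hooks. */
typedef ssize_t (*toy_read_fn)(struct toy_file *, char *, size_t, long long *);

struct toy_file_ops {
    toy_read_fn read;      /* plays the role of f_op->read (may be NULL) */
    toy_read_fn aio_read;  /* plays the role of f_op->aio_read */
};

struct toy_file {
    int readable;                     /* stands in for f_mode & FMODE_READ */
    const struct toy_file_ops *f_op;
};

/* Like do_sync_read(): a synchronous wrapper around the aio_read hook. */
static ssize_t toy_sync_read(struct toy_file *f, char *buf, size_t n, long long *pos)
{
    return f->f_op->aio_read(f, buf, n, pos);
}

/* Checks and dispatch modeled on vfs_read(); access_ok()/rw_verify_area() omitted. */
ssize_t toy_vfs_read(struct toy_file *f, char *buf, size_t n, long long *pos)
{
    assert(f != NULL && pos != NULL);
    if (!f->readable)
        return -EBADF;
    if (!f->f_op || (!f->f_op->read && !f->f_op->aio_read))
        return -EINVAL;
    if (f->f_op->read)
        return f->f_op->read(f, buf, n, pos);   /* file system's own entry */
    return toy_sync_read(f, buf, n, pos);       /* fallback branch */
}

/* Example "file system" implementing only aio_read: fills buf with 'x'. */
static ssize_t toy_fs_aio_read(struct toy_file *f, char *buf, size_t n, long long *pos)
{
    (void)f;
    for (size_t i = 0; i < n; i++)
        buf[i] = 'x';
    *pos += (long long)n;
    return (ssize_t)n;
}

static const struct toy_file_ops toy_fs_ops = { .read = NULL, .aio_read = toy_fs_aio_read };
```

ext2 itself fills in both hooks (.read = do_sync_read, .aio_read = generic_file_aio_read). vfs_read() therefore calls ->read directly, and do_sync_read() then invokes ->aio_read. Leaving .read NULL, as toy_fs_ops does, reaches the same kind of wrapper through the fallback branch.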

2. The read call stack at this layer

IV. What the page cache does during sys_read

1. What the page cache does during sys_read

The ext2_iget() code above contains inode->i_mapping->a_ops = &ext2_aops. This assignment is where the page-cache operations are registered.

The previous part noted that ext2's portion of the call ends at mapping->a_ops->readpage(file, page). This resolves to ext2_aops.readpage(file, page), i.e. ext2_readpage().
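The readpage call is reached from a loop that splits the requested byte range into pages. Below is a simplified, hypothetical sketch in the spirit of the 2.6-era do_generic_file_read(), assuming 4096-byte pages. It computes each page index and in-page offset, and calls ->readpage only on a cache miss. The copy to user space, locking and readahead are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12                      /* assumed 4096-byte pages */
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)
#define TOY_MAX_PAGES  64

/* Stand-in for address_space_operations; only ->readpage is modeled. */
struct toy_aops {
    int (*readpage)(unsigned long index);      /* 0 on success */
};

static bool page_cached[TOY_MAX_PAGES];        /* toy page-cache membership */

/* Walk [pos, pos + count) one page at a time; returns pages touched or -1. */
int toy_generic_read(const struct toy_aops *a_ops, unsigned long pos, size_t count)
{
    int touched = 0;
    while (count > 0) {
        unsigned long index  = pos >> TOY_PAGE_SHIFT;      /* page in file */
        unsigned long offset = pos & (TOY_PAGE_SIZE - 1);  /* byte in page */
        size_t chunk = TOY_PAGE_SIZE - offset;
        if (chunk > count)
            chunk = count;
        assert(index < TOY_MAX_PAGES);
        if (!page_cached[index]) {                         /* cache miss */
            if (a_ops->readpage(index) != 0)
                return -1;
            page_cached[index] = true;
        }
        /* copying `chunk` bytes to the user buffer would happen here */
        pos += chunk;
        count -= chunk;
        touched++;
    }
    return touched;
}

/* Example ->readpage that only counts calls; ext2_readpage() would start I/O. */
static int readpage_calls;
static int counting_readpage(unsigned long index)
{
    (void)index;
    readpage_calls++;
    return 0;
}
static const struct toy_aops counting_aops = { .readpage = counting_readpage };
```

A 10000-byte read starting at offset 1000 spans pages 0, 1 and 2, so it triggers three readpage calls. Rereading page 0 afterwards hits the cache and triggers none.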

The call stack of ext2_readpage() is as follows:
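In kernels of this era, ext2_readpage() hands the page to mpage_readpage() together with ext2_get_block. That callback maps a file-relative block number to a disk block by walking the inode's i_block[] array: 12 direct pointers, then a single-, double- and triple-indirect block. The hypothetical `toy_block_to_path()` below sketches only the offset calculation, modeled on ext2_block_to_path(). `ptrs` is the number of 4-byte block pointers per block, for example 256 for 1 KiB blocks.

```c
#include <assert.h>

#define NDIR_BLOCKS 12               /* i_block[0..11]: direct pointers */
#define IND_BLOCK   NDIR_BLOCKS      /* i_block[12]: single indirect */
#define DIND_BLOCK  (IND_BLOCK + 1)  /* i_block[13]: double indirect */
#define TIND_BLOCK  (DIND_BLOCK + 1) /* i_block[14]: triple indirect */

/* Translate a file-relative block number into the chain of array offsets
 * needed to reach it. Returns the chain depth (1..4), or 0 if out of range. */
int toy_block_to_path(long block, long ptrs, int offsets[4])
{
    assert(ptrs > 0 && offsets != 0);
    int n = 0;

    if (block < 0)
        return 0;
    if (block < NDIR_BLOCKS) {
        offsets[n++] = (int)block;
    } else if ((block -= NDIR_BLOCKS) < ptrs) {
        offsets[n++] = IND_BLOCK;
        offsets[n++] = (int)block;
    } else if ((block -= ptrs) < ptrs * ptrs) {
        offsets[n++] = DIND_BLOCK;
        offsets[n++] = (int)(block / ptrs);
        offsets[n++] = (int)(block % ptrs);
    } else if (((block -= ptrs * ptrs) / ptrs) / ptrs < ptrs) {
        offsets[n++] = TIND_BLOCK;
        offsets[n++] = (int)(block / ptrs / ptrs);
        offsets[n++] = (int)((block / ptrs) % ptrs);
        offsets[n++] = (int)(block % ptrs);
    }
    return n;
}
```

With 1 KiB blocks, file block 300 lies in the double-indirect range. It resolves to i_block[13], entry 0 of the first-level block, and entry 32 of the second-level block.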

V. The roles of the generic block layer and the I/O scheduler layer

This part is relatively simple: the path can be found directly by following the call to submit_bio(). The relevant call stack is as follows:
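Roughly, submit_bio() passes the bio to generic_make_request(), which calls the queue's make_request_fn. For queues set up with blk_init_queue(), that is __make_request(), as the blk_queue_make_request(q, __make_request) line in the next section's listing shows. There the elevator checks whether the bio can be merged into an existing request. The sketch below models only that adjacency test on sector ranges. `toy_req`, `toy_try_merge()` and the `max_sectors` limit are hypothetical simplifications, not the kernel's elv_merge() API.

```c
#include <assert.h>

/* Toy request: a contiguous run of 512-byte sectors. */
struct toy_req {
    unsigned long long sector;   /* first sector */
    unsigned int nr_sectors;     /* length in sectors */
};

enum merge_kind { NO_MERGE, BACK_MERGE, FRONT_MERGE };

/* Try to merge a new I/O of `nr` sectors at `sector` into `rq`.
 * Back merge: the new I/O starts exactly where rq ends.
 * Front merge: the new I/O ends exactly where rq starts. */
enum merge_kind toy_try_merge(struct toy_req *rq, unsigned long long sector,
                              unsigned int nr, unsigned int max_sectors)
{
    assert(rq != 0 && nr > 0);
    if (rq->nr_sectors + nr > max_sectors)
        return NO_MERGE;                      /* would exceed the size limit */
    if (rq->sector + rq->nr_sectors == sector) {
        rq->nr_sectors += nr;
        return BACK_MERGE;
    }
    if (sector + nr == rq->sector) {
        rq->sector = sector;
        rq->nr_sectors += nr;
        return FRONT_MERGE;
    }
    return NO_MERGE;
}
```

Two 8-sector reads at sectors 100 and 108 collapse into one 16-sector request. This is the "merge adjacent requests" behavior described for the I/O scheduler layer in section I.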

VI. What the driver does

Whew, after all this analysis we still haven't seen the block device driver take part. Don't worry, here it comes.

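A block device driver usually registers the function that does the actual work by calling blk_init_queue(), which the block layer exports. blk_init_queue() is a thin wrapper around blk_init_queue_node(). Inside that function, the registration comes down to q->request_fn = rfn.

The relevant code is as follows: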

    struct request_queue *
    blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
    {
        struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);

        if (!q)
            return NULL;

        q->node = node_id;
        if (blk_init_free_list(q)) {
            kmem_cache_free(blk_requestq_cachep, q);
            return NULL;
        }

        /*
         * if caller didn't supply a lock, they get per-queue locking with
         * our embedded lock
         */
        if (!lock)
            lock = &q->__queue_lock;

        q->request_fn       = rfn;
        q->prep_rq_fn       = NULL;
        q->unplug_fn        = generic_unplug_device;
        q->queue_flags      = (1 << QUEUE_FLAG_CLUSTER);
        q->queue_lock       = lock;

        blk_queue_segment_boundary(q, 0xffffffff);

        blk_queue_make_request(q, __make_request);
        blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE);

        blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
        blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);

        q->sg_reserved_size = INT_MAX;

        blk_set_cmd_filter_defaults(&q->cmd_filter);

        /*
         * all done
         */
        if (!elevator_init(q, NULL)) {
            blk_queue_congestion_threshold(q);
            return q;
        }

        blk_put_queue(q);
        return NULL;
    }
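The division of labor set up here can be modeled in user space. The driver supplies rfn, the block layer stores it in q->request_fn, and the block layer later calls it to process queued requests. Everything below is a hypothetical sketch. `toy_init_queue()` plays the role of blk_init_queue(), and `toy_fetch()` loosely stands in for elv_next_request(), the 2.6-era helper drivers used to take the next request.

```c
#include <assert.h>
#include <stddef.h>

#define QDEPTH 8

struct toy_queue;
typedef void (*toy_request_fn)(struct toy_queue *q);  /* cf. request_fn_proc */

struct toy_queue {
    toy_request_fn request_fn;          /* set once by the "driver" */
    unsigned long long pending[QDEPTH]; /* ring buffer of start sectors */
    int head, count;
};

/* Like blk_init_queue(): the driver hands over its handler, the queue keeps it. */
void toy_init_queue(struct toy_queue *q, toy_request_fn rfn)
{
    assert(q != NULL && rfn != NULL);
    q->request_fn = rfn;
    q->head = q->count = 0;
}

/* Block-layer side: queue a request; -1 if the ring is full. */
int toy_add_request(struct toy_queue *q, unsigned long long sector)
{
    if (q->count == QDEPTH)
        return -1;
    q->pending[(q->head + q->count) % QDEPTH] = sector;
    q->count++;
    return 0;
}

/* Driver side: take the next queued request; -1 when the queue is empty. */
int toy_fetch(struct toy_queue *q, unsigned long long *sector)
{
    if (q->count == 0)
        return -1;
    *sector = q->pending[q->head];
    q->head = (q->head + 1) % QDEPTH;
    q->count--;
    return 0;
}

/* The block layer "runs" the queue by calling back into the driver. */
void toy_run_queue(struct toy_queue *q)
{
    q->request_fn(q);
}

/* Example driver handler: drains the queue, counting requests served. */
static int served;
static void toy_driver_request_fn(struct toy_queue *q)
{
    unsigned long long sector;
    while (toy_fetch(q, &sector) == 0)
        served++;   /* a real driver would program the controller for `sector` */
}
```

After two requests are queued, one call to toy_run_queue() drains both through the driver's handler. This mirrors the note below that request_fn is the entry point into the block driver.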

That completes the analysis of the whole flow.

The final summary flowchart is as follows:

Note:

According to 吳仲杰, the request_fn function is indeed the entry point into the block driver.
