MySQL Source Code Walkthrough: MVCC

Published: 2023/12/31

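I. What Is MVCC

MVCC (Multi-Version Concurrency Control) is something I once carelessly mistook for source-code version control, which was rather silly of me. MVCC is really about keeping data safe under concurrent access, and developers who have written code that guards shared data across threads will find it easier to grasp. Later on, when blockchains tried to raise transaction throughput, some chains adopted parallel transaction execution and used MVCC-style control to manage those transactions. In MySQL, when multiple clients access the server and reads are mixed with writes, inconsistent data can appear (dirty reads and phantom reads). This is where MVCC comes in, specifically under the two isolation levels RC (Read Committed) and RR (Repeatable Read); RR is MySQL's default isolation level. Different MySQL versions may apply MVCC differently, so consult the official documentation for your version and treat the documentation or the source code as the reference rather than guessing. If you want to go deeper into data safety in databases, I recommend Designing Data-Intensive Applications, which explains MVCC clearly and analyzes many related topics in more depth.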

II. MVCC in MySQL

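In MySQL, MVCC is in effect only under the Read Committed and Repeatable Read isolation levels, so those are the only two cases in which discussing MVCC is meaningful. To implement MVCC, the InnoDB engine adds three hidden columns to every row (Oracle and other databases do something similar):
DB_ROW_ID: a 6-byte row ID. InnoDB generates it when the table has no primary key; Oracle has a similar ROWID.
DB_TRX_ID: a 6-byte transaction ID, recording the ID of the last transaction that inserted or updated the row.
DB_ROLL_PTR: a 7-byte roll pointer to the undo log record written to the rollback segment. It links the versions of a row together into a version chain. If transactions are not committed regularly, the old undo records cannot be cleaned up and the rollback space fills up.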
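
To make the version chain concrete, here is a minimal sketch. It is not InnoDB code: RowVersion, update_row, and chain_trx_ids are hypothetical names, and the real roll pointer is a 7-byte address into the undo log rather than an in-memory pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for one version of a row.
struct RowVersion {
  std::uint64_t trx_id;                  // plays the role of DB_TRX_ID
  std::string value;                     // user column payload
  std::shared_ptr<RowVersion> roll_ptr;  // plays the role of DB_ROLL_PTR
};

// An update puts a new version at the head of the chain; the previous
// head stays reachable through roll_ptr.
std::shared_ptr<RowVersion> update_row(std::shared_ptr<RowVersion> head,
                                       std::uint64_t trx_id,
                                       std::string value) {
  assert(trx_id > 0);  // 0 is never a valid writer in this sketch
  return std::make_shared<RowVersion>(
      RowVersion{trx_id, std::move(value), std::move(head)});
}

// Walk the chain from the newest version to the oldest and collect the
// IDs of the transactions that wrote each version.
std::vector<std::uint64_t> chain_trx_ids(
    const std::shared_ptr<RowVersion> &head) {
  std::vector<std::uint64_t> ids;
  for (const RowVersion *v = head.get(); v != nullptr; v = v->roll_ptr.get()) {
    ids.push_back(v->trx_id);
  }
  return ids;
}
```

Calling update_row three times with transaction IDs 10, 20, and 30 leaves a chain whose head was written by 30 and whose roll pointers lead back to 20 and then 10.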
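MVCC distinguishes two kinds of reads: the snapshot read and the current read. A snapshot read takes no locks and returns only the version visible to it. A current read covers inserts, updates, and deletes (and locking reads such as SELECT ... FOR UPDATE) and must take locks. Why call it a read? Because an insert, update, or delete has to read its way to the target position before it can write!
MySQL has two schemes for implementing transaction isolation. Besides MVCC, today's main topic, here is a brief note on the LBCC (lock-based concurrency control) scheme, which relies on two locks:
Record lock: locks the index record rather than the row itself. If no primary key index is defined, InnoDB creates a hidden one, as mentioned above.
Gap lock: a lock on the gap before or after a given record, or between two records. It is mainly used to solve the phantom read problem under the RR isolation level.
MVCC cannot be discussed without the Read View (which is a bit like the views in PBFT). Read Views are created differently under the two isolation levels mentioned earlier: under RR, a transaction creates one snapshot and keeps using it for its later consistent reads, whereas under RC every read generates a new Read View.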
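
The difference between the two isolation levels can be sketched as a small policy object. This is only an illustration under my own assumptions: Isolation, Snapshot, Transaction, and view_for_read are hypothetical names, and a snapshot here merely records the logical time at which it was taken.

```cpp
#include <cassert>
#include <cstdint>

enum class Isolation { ReadCommitted, RepeatableRead };

// Hypothetical stand-in for a Read View: it only remembers when it was taken.
struct Snapshot {
  std::uint64_t taken_at;
};

class Transaction {
 public:
  explicit Transaction(Isolation iso) : iso_(iso), has_view_(false), view_{0} {}

  // Called at the start of each consistent (non-locking) read.
  const Snapshot &view_for_read(std::uint64_t now) {
    assert(!has_view_ || now >= view_.taken_at);  // logical time moves forward
    if (iso_ == Isolation::ReadCommitted || !has_view_) {
      view_.taken_at = now;  // RC: every read; RR: first read only
      has_view_ = true;
    }
    return view_;
  }

 private:
  Isolation iso_;
  bool has_view_;
  Snapshot view_;
};
```

With two reads at logical times 5 and 9, an RR transaction answers both from the snapshot taken at 5, while an RC transaction takes a second snapshot at 9 and can therefore see whatever was committed in between.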
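A query can run in one of two situations: inside the transaction that made a change, or outside it. In MySQL, a pure query does not produce a transaction ID. An ID is assigned only once the transaction performs a write (insert, update, or delete), not when the transaction begins.
The difference is that inside the same transaction, the transaction's own updates are visible.
So what is a Read View? The undo log was mentioned above; a Read View is the read view evaluated against those snapshot versions. Each version of a row is identified by the DB_TRX_ID described above, and its DB_ROLL_PTR points to the next (older) version in the chain. Anyone with experience of linked lists in C will find this very easy to follow. The transaction ID is normally assigned by incrementing a counter, so the newest transaction has the largest ID. With the Read View understood, the MVCC flow becomes clear:
1. Split the transactions into three segments: committed transactions; a mix of uncommitted and committed transactions; and transactions that have not started. The boundaries are the smallest and the largest ID among the currently known active transactions, both maintained by the Read View.
2. The three segments mean the following. An ID below the minimum belongs to a transaction that has already committed, so its data is visible to the query. An ID above the maximum belongs to a transaction that had not started, so its data is not visible. The middle segment, "uncommitted and committed transactions", needs a word of explanation: if the transaction ID is in the Read View's array of uncommitted transactions, the data is not visible; if it is not in the array, the data is visible. Remember that there is only one such array of uncommitted transactions, and the check is made against it.
3. Classify the transaction ID of a row version against these bounds: below the minimum ID means committed; above the maximum ID means not started; everything else falls into the mixed segment.
4. Based on the result, decide which version's data to use.
5. Process that version's data and return it.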
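
The classification in steps 1 to 3 can be sketched in a few lines. This is a simplified model rather than the InnoDB class: SimpleReadView and make_view are hypothetical names, although the member names deliberately echo m_up_limit_id, m_low_limit_id, and m_creator_trx_id from the source shown later. I assume here that the upper bound is the next ID to be assigned, so an ID equal to it also counts as "not started".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a Read View.
struct SimpleReadView {
  std::uint64_t up_limit_id;     // smallest active ID: anything below is committed
  std::uint64_t low_limit_id;    // next ID to assign: anything at or above is unseen
  std::uint64_t creator_trx_id;  // the transaction that owns this view
  std::vector<std::uint64_t> active_ids;  // sorted IDs active at snapshot time

  bool changes_visible(std::uint64_t id) const {
    if (id < up_limit_id || id == creator_trx_id) return true;  // committed or own
    if (id >= low_limit_id) return false;                       // not started yet
    // Middle segment: visible only if the writer was not active.
    return !std::binary_search(active_ids.begin(), active_ids.end(), id);
  }
};

// Build a view from the creator's ID, the next unassigned ID, and the
// list of transactions that are active right now.
SimpleReadView make_view(std::uint64_t creator, std::uint64_t next_trx_id,
                         std::vector<std::uint64_t> active) {
  std::sort(active.begin(), active.end());
  SimpleReadView view;
  view.low_limit_id = next_trx_id;
  view.up_limit_id = active.empty() ? next_trx_id : active.front();
  view.creator_trx_id = creator;
  view.active_ids = std::move(active);
  assert(view.up_limit_id <= view.low_limit_id);
  return view;
}
```

For a view built with next ID 100 and active transactions 90 and 95, a version written by 80 is visible, versions written by 90 or 95 are not, a version written by 92 (in the middle range but already committed) is visible, and anything from 100 upward is not.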

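III. Reading the Source

With the analysis above in hand, let's look at how the source code implements it:
1. Basic data structures
The basic data structures cover the transaction system, MVCC, and the Read View: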

// storage/innobase/include
/** The transaction system central memory data structure. */
struct trx_sys_t {
  TrxSysMutex mutex; /*!< mutex protecting most fields in
                     this structure except when noted
                     otherwise */

  MVCC *mvcc; /*!< Multi version concurrency control
              manager */

  volatile trx_id_t max_trx_id; /*!< The smallest number not yet
                                assigned as a transaction id or
                                transaction number. This is declared
                                volatile because it can be accessed
                                without holding any mutex during
                                AC-NL-RO view creation. */

  std::atomic<trx_id_t> min_active_id;
  /*!< Minimal transaction id which is
  still in active state. */

  trx_ut_list_t serialisation_list;
  /*!< Ordered on trx_t::no of all the
  currently active RW transactions */

#ifdef UNIV_DEBUG
  trx_id_t rw_max_trx_no; /*!< Max trx number of read-write
                          transactions added for purge. */
#endif /* UNIV_DEBUG */

  char pad1[64]; /*!< To avoid false sharing */

  trx_ut_list_t rw_trx_list; /*!< List of active and committed in
                             memory read-write transactions, sorted
                             on trx id, biggest first. Recovered
                             transactions are always on this list. */

  char pad2[64]; /*!< To avoid false sharing */

  trx_ut_list_t mysql_trx_list; /*!< List of transactions created
                                for MySQL. All user transactions are
                                on mysql_trx_list. The rw_trx_list
                                can contain system transactions and
                                recovered transactions that will not
                                be in the mysql_trx_list.
                                mysql_trx_list may additionally contain
                                transactions that have not yet been
                                started in InnoDB. */

  trx_ids_t rw_trx_ids; /*!< Array of Read write transaction IDs
                        for MVCC snapshot. A ReadView would take
                        a snapshot of these transactions whose
                        changes are not visible to it. We should
                        remove transactions from the list before
                        committing in memory and releasing locks
                        to ensure right order of removal and
                        consistent snapshot. */

  char pad3[64]; /*!< To avoid false sharing */

  Rsegs rsegs; /*!< Vector of pointers to rollback
               segments. These rsegs are iterated
               and added to the end under a read
               lock. They are deleted under a write
               lock while the vector is adjusted.
               They are created and destroyed in
               single-threaded mode. */

  Rsegs tmp_rsegs; /*!< Vector of pointers to rollback
                   segments within the temp tablespace;
                   This vector is created and destroyed
                   in single-threaded mode so it is not
                   protected by any mutex because it is
                   read-only during multi-threaded
                   operation. */

  /** Length of the TRX_RSEG_HISTORY list (update undo logs for committed
  transactions). */
  std::atomic<uint64_t> rseg_history_len;

  TrxIdSet rw_trx_set; /*!< Mapping from transaction id
                       to transaction instance */

  ulint n_prepared_trx; /*!< Number of transactions currently
                        in the XA PREPARED state */

  bool found_prepared_trx; /*!< True if XA PREPARED trxs are
                           found. */
};

// storage/innobase/include/read0read.h
/** The MVCC read view manager */
class MVCC {
 public:
  /** Constructor
  @param size Number of views to pre-allocate */
  explicit MVCC(ulint size);

  /** Destructor.
  Free all the views in the m_free list */
  ~MVCC();

  /** Allocate and create a view.
  @param view View owned by this class created for the caller. Must be
  freed by calling view_close()
  @param trx Transaction instance of caller */
  void view_open(ReadView *&view, trx_t *trx);

  /**
  Close a view created by the above function.
  @param view view allocated by trx_open.
  @param own_mutex true if caller owns trx_sys_t::mutex */
  void view_close(ReadView *&view, bool own_mutex);

  /**
  Release a view that is inactive but not closed. Caller must own
  the trx_sys_t::mutex.
  @param view View to release */
  void view_release(ReadView *&view);

  /** Clones the oldest view and stores it in view. No need to
  call view_close(). The caller owns the view that is passed in.
  It will also move the closed views from the m_views list to the
  m_free list. This function is called by Purge to determine whether it should
  purge the delete marked record or not.
  @param view Preallocated view, owned by the caller */
  void clone_oldest_view(ReadView *view);

  /**
  @return the number of active views */
  ulint size() const;

  /**
  @return true if the view is active and valid */
  static bool is_view_active(ReadView *view) {
    ut_a(view != reinterpret_cast<ReadView *>(0x1));

    return (view != nullptr && !(intptr_t(view) & 0x1));
  }

  /**
  Set the view creator transaction id. Note: This should be set only
  for views created by RW transactions. */
  static void set_view_creator_trx_id(ReadView *view, trx_id_t id);

 private:
  /**
  Validates a read view list. */
  bool validate() const;

  /**
  Find a free view from the active list, if none found then allocate
  a new view. This function will also attempt to move delete marked
  views from the active list to the freed list.
  @return a view to use */
  inline ReadView *get_view();

  /**
  Get the oldest view in the system. It will also move the delete
  marked read views from the views list to the freed list.
  @return oldest view if found or NULL */
  inline ReadView *get_oldest_view() const;

  ReadView *get_view_created_by_trx_id(trx_id_t trx_id) const;

 private:
  // Prevent copying
  MVCC(const MVCC &);
  MVCC &operator=(const MVCC &);

 private:
  typedef UT_LIST_BASE_NODE_T(ReadView) view_list_t;

  /** Free views ready for reuse. */
  view_list_t m_free;

  /** Active and closed views, the closed views will have the
  creator trx id set to TRX_ID_MAX */
  view_list_t m_views;
};

/** Mapping read-write transactions from id to transaction instance, for
creating read views and during trx id lookup for MVCC and locking. */
struct TrxTrack {
  explicit TrxTrack(trx_id_t id, trx_t *trx = nullptr) : m_id(id), m_trx(trx) {
    // Do nothing
  }

  trx_id_t m_id;
  trx_t *m_trx;
};

struct TrxTrackHash {
  size_t operator()(const TrxTrack &key) const { return (size_t(key.m_id)); }
};

/** Comparator for TrxMap */
struct TrxTrackHashCmp {
  bool operator()(const TrxTrack &lhs, const TrxTrack &rhs) const {
    return (lhs.m_id == rhs.m_id);
  }
};

/** Comparator for TrxMap */
struct TrxTrackCmp {
  bool operator()(const TrxTrack &lhs, const TrxTrack &rhs) const {
    return (lhs.m_id < rhs.m_id);
  }
};

// typedef std::unordered_set<TrxTrack, TrxTrackHash, TrxTrackHashCmp> TrxIdSet;
typedef std::set<TrxTrack, TrxTrackCmp, ut_allocator<TrxTrack>> TrxIdSet;

// storage/innobase/include
// Friend declaration
class MVCC;

/** Read view lists the trx ids of those transactions for which a consistent
read should not see the modifications to the database. */
class ReadView {
  /** This is similar to a std::vector but it is not a drop
  in replacement. It is specific to ReadView. */
  class ids_t {
    typedef trx_ids_t::value_type value_type;

    /**
    Constructor */
    ids_t() : m_ptr(), m_size(), m_reserved() {}

    /**
    Destructor */
    ~ids_t() { UT_DELETE_ARRAY(m_ptr); }

    /** Try and increase the size of the array. Old elements are copied across.
    It is a no-op if n is < current size.
    @param n Make space for n elements */
    void reserve(ulint n);

    /**
    Resize the array, sets the current element count.
    @param n new size of the array, in elements */
    void resize(ulint n) {
      ut_ad(n <= capacity());

      m_size = n;
    }

    /**
    Reset the size to 0 */
    void clear() { resize(0); }

    /**
    @return the capacity of the array in elements */
    ulint capacity() const { return (m_reserved); }

    /**
    Copy and overwrite the current array contents
    @param start Source array
    @param end Pointer to end of array */
    void assign(const value_type *start, const value_type *end);

    /**
    Insert the value in the correct slot, preserving the order.
    Doesn't check for duplicates. */
    void insert(value_type value);

    /**
    @return the value of the first element in the array */
    value_type front() const {
      ut_ad(!empty());

      return (m_ptr[0]);
    }

    /**
    @return the value of the last element in the array */
    value_type back() const {
      ut_ad(!empty());

      return (m_ptr[m_size - 1]);
    }

    /**
    Append a value to the array.
    @param value the value to append */
    void push_back(value_type value);

    /**
    @return a pointer to the start of the array */
    trx_id_t *data() { return (m_ptr); }

    /**
    @return a const pointer to the start of the array */
    const trx_id_t *data() const { return (m_ptr); }

    /**
    @return the number of elements in the array */
    ulint size() const { return (m_size); }

    /**
    @return true if size() == 0 */
    bool empty() const { return (size() == 0); }

   private:
    // Prevent copying
    ids_t(const ids_t &);
    ids_t &operator=(const ids_t &);

   private:
    /** Memory for the array */
    value_type *m_ptr;

    /** Number of active elements in the array */
    ulint m_size;

    /** Size of m_ptr in elements */
    ulint m_reserved;

    friend class ReadView;
  };

 public:
  ReadView();
  ~ReadView();

  /** Check whether transaction id is valid.
  @param[in] id transaction id to check
  @param[in] name table name */
  static void check_trx_id_sanity(trx_id_t id, const table_name_t &name);

  /** Check whether the changes by id are visible.
  @param[in] id transaction id to check against the view
  @param[in] name table name
  @return whether the view sees the modifications of id. */
  bool changes_visible(trx_id_t id, const table_name_t &name) const
      MY_ATTRIBUTE((warn_unused_result)) {
    ut_ad(id > 0);

    if (id < m_up_limit_id || id == m_creator_trx_id) {
      return (true);
    }

    check_trx_id_sanity(id, name);

    if (id >= m_low_limit_id) {
      return (false);

    } else if (m_ids.empty()) {
      return (true);
    }

    const ids_t::value_type *p = m_ids.data();

    return (!std::binary_search(p, p + m_ids.size(), id));
  }

  /**
  @param id transaction to check
  @return true if view sees transaction id */
  bool sees(trx_id_t id) const { return (id < m_up_limit_id); }

  /**
  Mark the view as closed */
  void close() {
    ut_ad(m_creator_trx_id != TRX_ID_MAX);
    m_creator_trx_id = TRX_ID_MAX;
  }

  /**
  @return true if the view is closed */
  bool is_closed() const { return (m_closed); }

  /**
  Write the limits to the file.
  @param file file to write to */
  void print_limits(FILE *file) const {
    fprintf(file,
            "Trx read view will not see trx with"
            " id >= " TRX_ID_FMT ", sees < " TRX_ID_FMT "\n",
            m_low_limit_id, m_up_limit_id);
  }

  /** Check and reduce low limit number for read view. Used to
  block purge till GTID is persisted on disk table.
  @param[in] trx_no transaction number to check with */
  void reduce_low_limit(trx_id_t trx_no) {
    if (trx_no < m_low_limit_no) {
      /* Save low limit number set for Read View for MVCC. */
      ut_d(m_view_low_limit_no = m_low_limit_no);
      m_low_limit_no = trx_no;
    }
  }

  /**
  @return the low limit no */
  trx_id_t low_limit_no() const { return (m_low_limit_no); }

  /**
  @return the low limit id */
  trx_id_t low_limit_id() const { return (m_low_limit_id); }

  /**
  @return true if there are no transaction ids in the snapshot */
  bool empty() const { return (m_ids.empty()); }

#ifdef UNIV_DEBUG
  /**
  @return the view low limit number */
  trx_id_t view_low_limit_no() const { return (m_view_low_limit_no); }

  /**
  @param rhs view to compare with
  @return true if this view is less than or equal rhs */
  bool le(const ReadView *rhs) const {
    return (m_low_limit_no <= rhs->m_low_limit_no);
  }
#endif /* UNIV_DEBUG */

 private:
  /**
  Copy the transaction ids from the source vector */
  inline void copy_trx_ids(const trx_ids_t &trx_ids);

  /**
  Opens a read view where exactly the transactions serialized before this
  point in time are seen in the view.
  @param id Creator transaction id */
  inline void prepare(trx_id_t id);

  /**
  Copy state from another view. Must call copy_complete() to finish.
  @param other view to copy from */
  inline void copy_prepare(const ReadView &other);

  /**
  Complete the copy, insert the creator transaction id into the
  m_trx_ids too and adjust the m_up_limit_id, if required */
  inline void copy_complete();

  /**
  Set the creator transaction id, existing id must be 0 */
  void creator_trx_id(trx_id_t id) {
    ut_ad(m_creator_trx_id == 0);
    m_creator_trx_id = id;
  }

  friend class MVCC;

 private:
  // Disable copying
  ReadView(const ReadView &);
  ReadView &operator=(const ReadView &);

 private:
  /** The read should not see any transaction with trx id >= this
  value. In other words, this is the "high water mark". */
  trx_id_t m_low_limit_id;

  /** The read should see all trx ids which are strictly
  smaller (<) than this value. In other words, this is the
  "low water mark". */
  trx_id_t m_up_limit_id;

  /** trx id of creating transaction, set to TRX_ID_MAX for free
  views. */
  trx_id_t m_creator_trx_id;

  /** Set of RW transactions that was active when this snapshot
  was taken */
  ids_t m_ids;

  /** The view does not need to see the undo logs for transactions
  whose transaction number is strictly smaller (<) than this value:
  they can be removed in purge if not needed by other views */
  trx_id_t m_low_limit_no;

#ifdef UNIV_DEBUG
  /** The low limit number up to which read views don't need to access
  undo log records for MVCC. This could be higher than m_low_limit_no
  if purge is blocked for GTID persistence. Currently used for debug
  variable INNODB_PURGE_VIEW_TRX_ID_AGE. */
  trx_id_t m_view_low_limit_no;
#endif /* UNIV_DEBUG */

  /** AC-NL-RO transaction view that has been "closed". */
  bool m_closed;

  typedef UT_LIST_NODE_T(ReadView) node_t;

  /** List of read views in trx_sys */
  byte pad1[64 - sizeof(node_t)];
  node_t m_view_list;
};

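Looking at the data structures above, their cohesion is fairly good, and good cohesion makes them a lot easier to study, since at least you don't have to keep jumping around. The English comments are quite clear too.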

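2. Read flow
Seen from the outside, a complete MVCC pass starts with a SELECT. Its call stack was mentioned earlier:
do_command -> dispatch_sql_command -> mysql_execute_command -> m_sql_cmd->execute ----> row_sel -> row_sel_get_clust_rec, which finally calls one of the following (one is for the clustered index and one for non-clustered indexes, depending on the actual scenario):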

// storage/innobase/lock/lock0lock.cc
/** Checks that a record is seen in a consistent read.
@return true if sees, or false if an earlier version of the record
should be retrieved */
bool lock_clust_rec_cons_read_sees(
    const rec_t *rec,     /*!< in: user record which should be read or
                          passed over by a read cursor */
    dict_index_t *index,  /*!< in: clustered index */
    const ulint *offsets, /*!< in: rec_get_offsets(rec, index) */
    ReadView *view)       /*!< in: consistent read view */
{
  ut_ad(index->is_clustered());
  ut_ad(page_rec_is_user_rec(rec));
  ut_ad(rec_offs_validate(rec, index, offsets));

  /* Temp-tables are not shared across connections and multiple
  transactions from different connections cannot simultaneously
  operate on same temp-table and so read of temp-table is
  always consistent read. */
  if (srv_read_only_mode || index->table->is_temporary()) {
    ut_ad(view == nullptr || index->table->is_temporary());
    return (true);
  }

  /* NOTE that we call this function while holding the search
  system latch. */

  trx_id_t trx_id = row_get_rec_trx_id(rec, index, offsets);

  return (view->changes_visible(trx_id, index->table->name));
}

/** Checks that a non-clustered index record is seen in a consistent read.

NOTE that a non-clustered index page contains so little information on
its modifications that also in the case false, the present version of
rec may be the right, but we must check this from the clustered index
record.

@return true if certainly sees, or false if an earlier version of the
clustered index record might be needed */
bool lock_sec_rec_cons_read_sees(
    const rec_t *rec,          /*!< in: user record which
                               should be read or passed over
                               by a read cursor */
    const dict_index_t *index, /*!< in: index */
    const ReadView *view)      /*!< in: consistent read view */
{
  ut_ad(page_rec_is_user_rec(rec));

  /* NOTE that we might call this function while holding the search
  system latch. */

  if (recv_recovery_is_on()) {
    return (false);

  } else if (index->table->is_temporary()) {
    /* Temp-tables are not shared across connections and multiple
    transactions from different connections cannot simultaneously
    operate on same temp-table and so read of temp-table is
    always consistent read. */

    return (true);
  }

  trx_id_t max_trx_id = page_get_max_trx_id(page_align(rec));

  ut_ad(max_trx_id > 0);

  return (view->sees(max_trx_id));
}

Now look at the function that produces the final return value:

/** Check whether the changes by id are visible.
@param[in] id transaction id to check against the view
@param[in] name table name
@return whether the view sees the modifications of id. */
bool changes_visible(trx_id_t id, const table_name_t &name) const
    MY_ATTRIBUTE((warn_unused_result)) {
  ut_ad(id > 0);

  if (id < m_up_limit_id || id == m_creator_trx_id) {
    return (true);
  }

  check_trx_id_sanity(id, name);

  if (id >= m_low_limit_id) {
    return (false);

  } else if (m_ids.empty()) {
    return (true);
  }

  const ids_t::value_type *p = m_ids.data();

  return (!std::binary_search(p, p + m_ids.size(), id));
}

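Note that this check differs in a few details from the description given earlier. The source code is authoritative; the earlier analysis was mainly meant to explain how the mechanism is applied. The code adds two checks, one for equality and one for emptiness. Equality means the data was written by this very transaction, so it is of course visible. An empty array also means visible (the ID lies in the middle range and no transaction was active).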

3. Read View creation
As mentioned, under RR the first query creates the Read View. Let's look at the actual process:

// row0sel.cc
dberr_t row_search_mvcc(byte *buf, page_cur_mode_t mode,
                        row_prebuilt_t *prebuilt, ulint match_mode,
                        const ulint direction) {
  DBUG_TRACE;

  dict_index_t *index = prebuilt->index;
  ibool comp = dict_table_is_comp(index->table);
  const dtuple_t *search_tuple = prebuilt->search_tuple;
  ......
  /* Do some start-of-statement preparations */

  if (!prebuilt->sql_stat_start) {
    /* No need to set an intention lock or assign a read view */

    if (!MVCC::is_view_active(trx->read_view) && !srv_read_only_mode &&
        prebuilt->select_lock_type == LOCK_NONE) {
      ib::error(ER_IB_MSG_1031) << "MySQL is trying to perform a"
                                   " consistent read but the read view is not"
                                   " assigned!";
      trx_print(stderr, trx, 600);
      fputc('\n', stderr);
      ut_error;
    }
  } else if (prebuilt->select_lock_type == LOCK_NONE) {
    /* This is a consistent read */
    /* Assign a read view for the query */

    if (!srv_read_only_mode) {
      trx_assign_read_view(trx);  // called here
    }

    prebuilt->sql_stat_start = FALSE;
  } else {
  wait_table_again:
    err = lock_table(0, index->table,
                     prebuilt->select_lock_type == LOCK_S ? LOCK_IS : LOCK_IX,
                     thr);

    if (err != DB_SUCCESS) {
      table_lock_waited = TRUE;
      goto lock_table_wait;
    }
    prebuilt->sql_stat_start = FALSE;
  }
  ......
}

/** Assigns a read view for a consistent read query. All the consistent reads
within the same transaction will get the same read view, which is created
when this function is first called for a new started transaction.
@return consistent read view */
ReadView *trx_assign_read_view(trx_t *trx) /*!< in/out: active transaction */
{
  ut_ad(trx->state == TRX_STATE_ACTIVE);

  if (srv_read_only_mode) {
    ut_ad(trx->read_view == nullptr);
    return (nullptr);

  } else if (!MVCC::is_view_active(trx->read_view)) {
    trx_sys->mvcc->view_open(trx->read_view, trx);
  }

  return (trx->read_view);
}

/** Allocate and create a view.
@param view View owned by this class created for the caller. Must be
freed by calling view_close()
@param trx Transaction instance of caller */
void MVCC::view_open(ReadView *&view, trx_t *trx) {
  ut_ad(!srv_read_only_mode);

  /** If no new RW transaction has been started since the last view
  was created then reuse the existing view. */
  if (view != nullptr) {
    uintptr_t p = reinterpret_cast<uintptr_t>(view);

    view = reinterpret_cast<ReadView *>(p & ~1);

    ut_ad(view->m_closed);

    /* NOTE: This can be optimised further, for now we only
    reuse the view iff there are no active RW transactions.

    There is an inherent race here between purge and this
    thread. Purge will skip views that are marked as closed.
    Therefore we must set the low limit id after we reset the
    closed status after the check. */

    if (trx_is_autocommit_non_locking(trx) && view->empty()) {
      view->m_closed = false;

      if (view->m_low_limit_id == trx_sys_get_max_trx_id()) {
        return;
      } else {
        view->m_closed = true;
      }
    }

    mutex_enter(&trx_sys->mutex);

    UT_LIST_REMOVE(m_views, view);

  } else {
    mutex_enter(&trx_sys->mutex);

    view = get_view();
  }

  if (view != nullptr) {
    view->prepare(trx->id);

    UT_LIST_ADD_FIRST(m_views, view);  // added to the view list managed by MVCC

    ut_ad(!view->is_closed());

    ut_ad(validate());
  }

  trx_sys_mutex_exit();
}

/**
Find a free view from the active list, if none found then allocate
a new view.
@return a view to use */
ReadView *MVCC::get_view() {
  ut_ad(mutex_own(&trx_sys->mutex));

  ReadView *view;

  if (UT_LIST_GET_LEN(m_free) > 0) {
    view = UT_LIST_GET_FIRST(m_free);
    UT_LIST_REMOVE(m_free, view);
  } else {
    view = UT_NEW_NOKEY(ReadView());

    if (view == nullptr) {
      ib::error(ER_IB_MSG_918) << "Failed to allocate MVCC view";
    }
  }

  return (view);
}

/**
Opens a read view where exactly the transactions serialized before this
point in time are seen in the view.
@param id Creator transaction id */
void ReadView::prepare(trx_id_t id) {
  ut_ad(mutex_own(&trx_sys->mutex));

  m_creator_trx_id = id;

  m_low_limit_no = m_low_limit_id = m_up_limit_id = trx_sys->max_trx_id;

  if (!trx_sys->rw_trx_ids.empty()) {
    copy_trx_ids(trx_sys->rw_trx_ids);
  } else {
    m_ids.clear();
  }

  ut_ad(m_up_limit_id <= m_low_limit_id);

  if (UT_LIST_GET_LEN(trx_sys->serialisation_list) > 0) {
    const trx_t *trx;

    trx = UT_LIST_GET_FIRST(trx_sys->serialisation_list);

    if (trx->no < m_low_limit_no) {
      m_low_limit_no = trx->no;
    }
  }

  ut_d(m_view_low_limit_no = m_low_limit_no);

  m_closed = false;
}

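Looking at the final step of creating the Read View, there are two cases: the view pointer is either null or not. If it is not null, the existing view is reused. If it is null, a view is taken from the free list (or newly allocated when the free list is empty). The view is then prepared and returned.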
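
The free-list part of that logic can be illustrated with a small pool. This is a sketch of the general technique only, loosely modeled on the m_free and m_views lists; View, ViewPool, open, and close are hypothetical names, and the real code additionally holds trx_sys->mutex and prepares the view.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>

// Hypothetical minimal view object; a real Read View carries the snapshot.
struct View {
  bool closed = true;
};

class ViewPool {
 public:
  // Reuse a view from the free list if there is one, otherwise allocate.
  View *open() {
    if (free_.empty()) {
      free_.emplace_back(new View());
      ++allocated_;
    }
    // Move the first free node to the front of the active list.
    active_.splice(active_.begin(), free_, free_.begin());
    View *view = active_.front().get();
    view->closed = false;
    return view;
  }

  // Return a view to the free list so that a later open() can reuse it.
  void close(View *view) {
    for (auto it = active_.begin(); it != active_.end(); ++it) {
      if (it->get() == view) {
        view->closed = true;
        free_.splice(free_.begin(), active_, it);
        return;
      }
    }
    assert(false && "view is not owned by this pool");
  }

  std::size_t allocated() const { return allocated_; }
  std::size_t active() const { return active_.size(); }

 private:
  std::list<std::unique_ptr<View>> free_;
  std::list<std::unique_ptr<View>> active_;
  std::size_t allocated_ = 0;
};
```

Opening, closing, and opening again hands back the same object without a second allocation, which is the point of keeping closed views around instead of destroying them.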

4. Creating and reading MVCC versions
First look at where versioning is initiated, namely the update operation mentioned earlier:

/** Updates a record when the update causes no size changes in its fields.
@param[in] flags Undo logging and locking flags
@param[in] cursor Cursor on the record to update; cursor stays valid and
positioned on the same record
@param[in,out] offsets Offsets on cursor->page_cur.rec
@param[in] update Update vector
@param[in] cmpl_info Compiler info on secondary index updates
@param[in] thr Query thread, or null if flags & (btr_no_locking_flag |
btr_no_undo_log_flag | btr_create_flag | btr_keep_sys_flag)
@param[in] trx_id Transaction id
@param[in,out] mtr Mini-transaction; if this is a secondary index, the caller
must mtr_commit(mtr) before latching any further pages
@return locking or undo log related error code, or
@retval DB_SUCCESS on success
@retval DB_ZIP_OVERFLOW if there is not enough space left on the compressed
page (IBUF_BITMAP_FREE was reset outside mtr) */
dberr_t btr_cur_update_in_place(ulint flags, btr_cur_t *cursor, ulint *offsets,
                                const upd_t *update, ulint cmpl_info,
                                que_thr_t *thr, trx_id_t trx_id, mtr_t *mtr) {
  dict_index_t *index;
  buf_block_t *block;
  page_zip_des_t *page_zip;
  dberr_t err;
  rec_t *rec;
  roll_ptr_t roll_ptr = 0;
  ulint was_delete_marked;
  ibool is_hashed;

  rec = btr_cur_get_rec(cursor);
  index = cursor->index;
  ut_ad(rec_offs_validate(rec, index, offsets));
  ut_ad(!!page_rec_is_comp(rec) == dict_table_is_comp(index->table));
  ut_ad(trx_id > 0 || (flags & BTR_KEEP_SYS_FLAG) ||
        index->table->is_intrinsic());
  /* The insert buffer tree should never be updated in place. */
  ut_ad(!dict_index_is_ibuf(index));
  ut_ad(dict_index_is_online_ddl(index) == !!(flags & BTR_CREATE_FLAG) ||
        index->is_clustered());
  ut_ad((flags & ~(BTR_KEEP_POS_FLAG | BTR_KEEP_IBUF_BITMAP)) ==
            (BTR_NO_UNDO_LOG_FLAG | BTR_NO_LOCKING_FLAG | BTR_CREATE_FLAG |
             BTR_KEEP_SYS_FLAG) ||
        thr_get_trx(thr)->id == trx_id);
  ut_ad(fil_page_index_page_check(btr_cur_get_page(cursor)));
  ut_ad(btr_page_get_index_id(btr_cur_get_page(cursor)) == index->id);

  DBUG_PRINT("ib_cur",
             ("update-in-place %s (" IB_ID_FMT ") by " TRX_ID_FMT ": %s",
              index->name(), index->id, trx_id,
              rec_printer(rec, offsets).str().c_str()));

  block = btr_cur_get_block(cursor);
  page_zip = buf_block_get_page_zip(block);

  /* Check that enough space is available on the compressed page. */
  if (page_zip) {
    ut_ad(!index->table->is_temporary());

    if (!btr_cur_update_alloc_zip(page_zip, btr_cur_get_page_cur(cursor), index,
                                  offsets, rec_offs_size(offsets), false,
                                  mtr)) {
      return (DB_ZIP_OVERFLOW);
    }

    rec = btr_cur_get_rec(cursor);
  }

  /* Do lock checking and undo logging */
  err = btr_cur_upd_lock_and_undo(flags, cursor, offsets, update, cmpl_info,
                                  thr, mtr, &roll_ptr);
  if (UNIV_UNLIKELY(err != DB_SUCCESS)) {
    /* We may need to update the IBUF_BITMAP_FREE
    bits after a reorganize that was done in
    btr_cur_update_alloc_zip(). */
    goto func_exit;
  }

  if (!(flags & BTR_KEEP_SYS_FLAG) && !index->table->is_intrinsic()) {
    row_upd_rec_sys_fields(rec, nullptr, index, offsets, thr_get_trx(thr),
                           roll_ptr);
  }

  was_delete_marked =
      rec_get_deleted_flag(rec, page_is_comp(buf_block_get_frame(block)));

  is_hashed = (block->index != nullptr);

  if (is_hashed) {
    /* TO DO: Can we skip this if none of the fields
    index->search_info->curr_n_fields
    are being updated? */

    /* The function row_upd_changes_ord_field_binary works only
    if the update vector was built for a clustered index, we must
    NOT call it if index is secondary */

    if (!index->is_clustered() ||
        row_upd_changes_ord_field_binary(index, update, thr, nullptr, nullptr,
                                         nullptr)) {
      /* Remove possible hash index pointer to this record */
      btr_search_update_hash_on_delete(cursor);
    }

    rw_lock_x_lock(btr_get_search_latch(index));
  }

  assert_block_ahi_valid(block);
  row_upd_rec_in_place(rec, index, offsets, update, page_zip);

  if (is_hashed) {
    rw_lock_x_unlock(btr_get_search_latch(index));
  }

  btr_cur_update_in_place_log(flags, rec, index, update, trx_id, roll_ptr, mtr);

  if (was_delete_marked &&
      !rec_get_deleted_flag(rec, page_is_comp(buf_block_get_frame(block)))) {
    /* The new updated record owns its possible externally
    stored fields */

    lob::BtrContext btr_ctx(mtr, nullptr, index, rec, offsets, block);
    btr_ctx.unmark_extern_fields();
  }

  ut_ad(err == DB_SUCCESS);

func_exit:
  if (page_zip && !(flags & BTR_KEEP_IBUF_BITMAP) && !index->is_clustered() &&
      page_is_leaf(buf_block_get_frame(block))) {
    /* Update the free bits in the insert buffer. */
    ibuf_update_free_bits_zip(block, mtr);
  }

  return (err);
}

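There are also insert and other operations; look at the related functions if you are interested. Queries start in the row_search_mvcc() function mentioned earlier: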

dberr_t row_search_mvcc(byte *buf, page_cur_mode_t mode,
                        row_prebuilt_t *prebuilt, ulint match_mode,
                        const ulint direction) {
  ......
  else if (index == clust_index) {
    /* Fetch a previous version of the row if the current
    one is not visible in the snapshot; if we have a very
    high force recovery level set, we try to avoid crashes
    by skipping this lookup */

    if (srv_force_recovery < 5 &&
        !lock_clust_rec_cons_read_sees(rec, index, offsets,
                                       trx_get_read_view(trx))) {
      rec_t *old_vers;
      /* The following call returns 'offsets' associated with 'old_vers' */
      err = row_sel_build_prev_vers_for_mysql(
          trx->read_view, clust_index, prebuilt, rec, &offsets, &heap,
          &old_vers, need_vrow ? &vrow : nullptr, &mtr,
          prebuilt->get_lob_undo());

      if (err != DB_SUCCESS) {
        goto lock_wait_or_error;
      }

      if (old_vers == nullptr) {
        /* The row did not exist yet in
        the read view */

        goto next_rec;
      }

      rec = old_vers;
      prev_rec = rec;
      ut_d(prev_rec_debug = row_search_debug_copy_rec_order_prefix(
               pcur, index, prev_rec, &prev_rec_debug_n_fields,
               &prev_rec_debug_buf, &prev_rec_debug_buf_size));
    }
  }
  ......
}

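What follows is creating the view, matching against it, and checking visibility, all of which was covered above. Now look at how the data of a specific record version is handled:
row_search_mvcc -> row_sel_build_prev_vers_for_mysql -> row_vers_build_for_consistent_read -> trx_undo_prev_version_build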

bool trx_undo_prev_version_build(
    const rec_t *index_rec ATTRIB_USED_ONLY_IN_DEBUG,
    mtr_t *index_mtr ATTRIB_USED_ONLY_IN_DEBUG, const rec_t *rec,
    const dict_index_t *const index, ulint *offsets, mem_heap_t *heap,
    rec_t **old_vers, mem_heap_t *v_heap, const dtuple_t **vrow, ulint v_status,
    lob::undo_vers_t *lob_undo) {
  DBUG_TRACE;

  trx_undo_rec_t *undo_rec = nullptr;
  dtuple_t *entry;
  trx_id_t rec_trx_id;
  ulint type;
  undo_no_t undo_no;
  table_id_t table_id;
  trx_id_t trx_id;
  roll_ptr_t roll_ptr;
  upd_t *update = nullptr;
  byte *ptr;
  ulint info_bits;
  ulint cmpl_info;
  bool dummy_extern;
  byte *buf;

  ut_ad(!rw_lock_own(&purge_sys->latch, RW_LOCK_S));
  ut_ad(mtr_memo_contains_page(index_mtr, index_rec, MTR_MEMO_PAGE_S_FIX) ||
        mtr_memo_contains_page(index_mtr, index_rec, MTR_MEMO_PAGE_X_FIX));
  ut_ad(rec_offs_validate(rec, index, offsets));
  ut_a(index->is_clustered());

  roll_ptr = row_get_rec_roll_ptr(rec, index, offsets);

  *old_vers = nullptr;

  if (trx_undo_roll_ptr_is_insert(roll_ptr)) {
    /* The record rec is the first inserted version */
    return true;
  }

  rec_trx_id = row_get_rec_trx_id(rec, index, offsets);

  /* REDO rollback segments are used only for non-temporary objects.
  For temporary objects NON-REDO rollback segments are used. */
  bool is_temp = index->table->is_temporary();

  ut_ad(!index->table->skip_alter_undo);

  if (trx_undo_get_undo_rec(roll_ptr, rec_trx_id, heap, is_temp,
                            index->table->name, &undo_rec)) {
    if (v_status & TRX_UNDO_PREV_IN_PURGE) {
      /* We are fetching the record being purged */
      undo_rec = trx_undo_get_undo_rec_low(roll_ptr, heap, is_temp);
    } else {
      /* The undo record may already have been purged,
      during purge or semi-consistent read. */
      return false;
    }
  }

  type_cmpl_t type_cmpl;
  ptr = trx_undo_rec_get_pars(undo_rec, &type, &cmpl_info, &dummy_extern,
                              &undo_no, &table_id, type_cmpl);

  if (table_id != index->table->id) {
    /* The table should have been rebuilt, but purge has
    not yet removed the undo log records for the
    now-dropped old table (table_id). */
    return true;
  }

  ptr = trx_undo_update_rec_get_sys_cols(ptr, &trx_id, &roll_ptr, &info_bits);

  /* (a) If a clustered index record version is such that the
  trx id stamp in it is bigger than purge_sys->view, then the
  BLOBs in that version are known to exist (the purge has not
  progressed that far);

  (b) if the version is the first version such that trx id in it
  is less than purge_sys->view, and it is not delete-marked,
  then the BLOBs in that version are known to exist (the purge
  cannot have purged the BLOBs referenced by that version
  yet).

  This function does not fetch any BLOBs. The callers might, by
  possibly invoking row_ext_create() via row_build(). However,
  they should have all needed information in the *old_vers
  returned by this function. This is because *old_vers is based
  on the transaction undo log records. The function
  trx_undo_page_fetch_ext() will write BLOB prefixes to the
  transaction undo log that are at least as long as the longest
  possible column prefix in a secondary index. Thus, secondary
  index entries for *old_vers can be constructed without
  dereferencing any BLOB pointers. */

  ptr = trx_undo_rec_skip_row_ref(ptr, index);

  ptr = trx_undo_update_rec_get_update(ptr, index, type, trx_id, roll_ptr,
                                       info_bits, nullptr, heap, &update,
                                       lob_undo, type_cmpl);
  ut_a(ptr);

  if (row_upd_changes_field_size_or_external(index, offsets, update)) {
    /* We should confirm the existence of disowned external data,
    if the previous version record is delete marked. If the trx_id
    of the previous record is seen by purge view, we should treat
    it as missing history, because the disowned external data
    might be purged already.

    The inherited external data (BLOBs) can be freed (purged)
    after trx_id was committed, provided that no view was started
    before trx_id. If the purge view can see the committed
    delete-marked record by trx_id, no transactions need to access
    the BLOB. */

    /* the row_upd_changes_disowned_external(update) call could be
    omitted, but the synchronization on purge_sys->latch is likely
    more expensive. */

    if ((update->info_bits & REC_INFO_DELETED_FLAG) &&
        row_upd_changes_disowned_external(update)) {
      bool missing_extern;

      rw_lock_s_lock(&purge_sys->latch);

      missing_extern =
          purge_sys->view.changes_visible(trx_id, index->table->name);

      rw_lock_s_unlock(&purge_sys->latch);

      if (missing_extern) {
        /* treat as a fresh insert, not to
        cause assertion error at the caller. */
        return true;
      }
    }

    /* We have to set the appropriate extern storage bits in the
    old version of the record: the extern bits in rec for those
    fields that update does NOT update, as well as the bits for
    those fields that update updates to become externally stored
    fields. Store the info: */

    entry = row_rec_to_index_entry(rec, index, offsets, heap);
    /* The page containing the clustered index record
    corresponding to entry is latched in mtr. Thus the
    following call is safe. */
    row_upd_index_replace_new_col_vals(entry, index, update, heap);

    buf = static_cast<byte *>(
        mem_heap_alloc(heap, rec_get_converted_size(index, entry)));

    *old_vers = rec_convert_dtuple_to_rec(buf, index, entry);
  } else {
    buf = static_cast<byte *>(mem_heap_alloc(heap, rec_offs_size(offsets)));

    *old_vers = rec_copy(buf, rec, offsets);
    rec_offs_make_valid(*old_vers, index, offsets);
    row_upd_rec_in_place(*old_vers, index, offsets, update, nullptr);
  }

  /* Set the old value (which is the after image of an update) in the
  update vector to dtuple vrow */
  if (v_status & TRX_UNDO_GET_OLD_V_VALUE) {
    row_upd_replace_vcol((dtuple_t *)*vrow, index->table, update, false,
                         nullptr, nullptr);
  }

#if defined UNIV_DEBUG || defined UNIV_BLOB_LIGHT_DEBUG
  ut_a(!rec_offs_any_null_extern(
      *old_vers,
      rec_get_offsets(*old_vers, index, nullptr, ULINT_UNDEFINED, &heap)));
#endif  // defined UNIV_DEBUG || defined UNIV_BLOB_LIGHT_DEBUG

  /* If vrow is not NULL it means that the caller is interested in the values of
  the virtual columns for this version.
  If the UPD_NODE_NO_ORD_CHANGE flag is set on cmpl_info, it means that the
  change which created this entry in undo log did not affect any column of any
  secondary index (in particular: virtual), and thus the values of virtual
  columns were not recorded in undo. In such case the caller may assume that the
  values of (virtual) columns present in secondary index are exactly the same as
  they are in the next (more recent) version.
  If on the other hand the UPD_NODE_NO_ORD_CHANGE flag is not set, then we will
  make sure that *vrow points to a properly allocated memory and contains the
  values of virtual columns for this version recovered from undo log.
  This implies that if the caller has provided a non-NULL vrow, and the *vrow is
  still NULL after the call, (and old_vers is not NULL) it must be because the
  UPD_NODE_NO_ORD_CHANGE flag was set for this version.
  This last statement is an important assumption made by the
  row_vers_impl_x_locked_low() function. */
  if (vrow && !(cmpl_info & UPD_NODE_NO_ORD_CHANGE)) {
    if (!(*vrow)) {
      *vrow = dtuple_create_with_vcol(v_heap ? v_heap : heap,
                                      index->table->get_n_cols(),
                                      dict_table_get_n_v_cols(index->table));
      dtuple_init_v_fld(*vrow);
    }

    ut_ad(index->table->n_v_cols);
    trx_undo_read_v_cols(index->table, ptr, *vrow,
                         v_status & TRX_UNDO_PREV_IN_PURGE, false, nullptr,
                         (v_heap != nullptr ? v_heap : heap));
  }

  if (update != nullptr) {
    update->reset();
  }

  return true;
}

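This is one of the functions involved in building the version chain described earlier. It parses undo log records and follows the pointers one by one, forming a live version chain.

In this way, through view creation, the visibility check, and the matching rules applied to the version chain in MVCC, the actual data of the right version can be retrieved.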
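
Putting the pieces together, a consistent read amounts to walking from the newest version toward older ones until the view accepts one. The sketch below shows that loop under my own simplifications: Version and first_visible are hypothetical names, the visibility rule is passed in as a callable, and previous versions are plain in-memory nodes instead of being rebuilt from undo records as trx_undo_prev_version_build does.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical in-memory version node; InnoDB rebuilds older versions
// from undo log records instead of storing them like this.
struct Version {
  std::uint64_t trx_id;  // writer of this version (DB_TRX_ID)
  int value;             // payload of this version
  const Version *older;  // previous version, reached via the roll pointer
};

// Return the newest version the reader may see, or nullptr when the row
// did not exist yet in the reader's snapshot.
template <typename Visible>
const Version *first_visible(const Version *newest, Visible is_visible) {
  for (const Version *v = newest; v != nullptr; v = v->older) {
    assert(v->trx_id > 0);  // every stored version has a real writer
    if (is_visible(v->trx_id)) {
      return v;
    }
  }
  return nullptr;
}
```

With a chain written by transactions 120, 90, and 10 (newest first) and a rule that accepts only IDs below 90, the walk skips the first two versions and returns the one written by 10. A nullptr result corresponds to the "row did not exist yet in the read view" branch in row_search_mvcc.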

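IV. Summary

MVCC is one way of handling data synchronization and safety, and an effective means of isolating transactions. If a database enforced strictly serial reads and writes, this mechanism would never have appeared. In practice, though, quite a few methods have been proposed to get better results and to raise concurrency and access speed, and Designing Data-Intensive Applications covers them. So make sure you understand the principles and then compare them against the concrete implementation; the whole story becomes clear. Knowing what is so and why it is so: that is knowledge.
Keep at it, returning youth!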
