
Google Archive Patch Source Code Analysis


If this article is too long for you, jump straight to the conclusions summarized here:

Google Archive Patch is a delta algorithm built strictly on the Zip file format. Its core diffing algorithm is still BsDiff and its core patching algorithm is still BsPatch; what it adds is a preprocessing step. When generating a patch, the contents of the old and new Zip files are each expanded into a "delta-friendly" file, and the delta is computed between those two files. When applying a patch, the old Zip's contents are expanded into its delta-friendly form, the patch algorithm produces the delta-friendly form of the new file, and then the offsets and lengths recorded in the patch for each ZipEntry, together with the compression level, strategy and nowrap flags, are used to rebuild the new Zip file. Delta-friendly (uncompressed) files are used because changes to uncompressed data are easy to describe: if the string "abc" becomes "abcd", the change is simply "one character 'd' was appended". Once the data has been compressed, the change is no longer easy to describe, so compressed entries are expanded into uncompressed form first.
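To make the last point concrete, here is a minimal, self-contained sketch using only the JDK Deflater (the input strings are made up for illustration; the article's claim is about deflate output in general, not this exact example):

import java.util.zip.Deflater;

public class CompressedBytesAreNotDiffFriendly {
  // Deflate a byte array with the JDK Deflater (nowrap = true, as inside a Zip entry).
  static byte[] deflate(byte[] input) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(input);
    deflater.finish();
    byte[] buffer = new byte[input.length + 64];
    int length = deflater.deflate(buffer);
    deflater.end();
    byte[] result = new byte[length];
    System.arraycopy(buffer, 0, result, 0, length);
    return result;
  }

  public static void main(String[] args) {
    byte[] oldText = "some repetitive content ... version 1".getBytes();
    byte[] newText = "some repetitive content ... version 2, one extra clause".getBytes();
    // The plain texts share a long common prefix, so a byte-level diff of them is tiny.
    // The deflated outputs, however, can differ from very near the start (for example the
    // dynamic Huffman tables in the header depend on the whole input), so a byte-level diff
    // of the compressed forms describes the change poorly - which is why the entries are
    // expanded into delta-friendly form before diffing.
    System.out.println("old compressed: " + deflate(oldText).length + " bytes");
    System.out.println("new compressed: " + deflate(newText).length + " bytes");
  }
}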

Slow generation: Google Archive Patch takes longer to generate a patch because, once the Zip contents are uncompressed, the delta-friendly files are larger, so BsDiff has more data to chew through. If the expanded content is, say, twice as large, diffing takes roughly twice as long as diffing the original files.

Slow application: applying the patch also takes longer. Part of the cost is again the larger delta-friendly files, but that is not the root cause; BsPatch itself is extremely fast, and even doubling its input adds negligible time. The real cost is rebuilding the Zip file from the output stream: some regions must be recompressed and others copied, and the bulk of the time goes into the recompression.

Small patches: as explained above, the patch is small because the delta is computed over the delta-friendly files, where the changes between the two files are easy to describe.

Project

Google Archive Patch

Three modules matter: shared, generator and applier. shared is common code used by the other two, generator produces the delta, and applier applies it. generator contains a Java implementation of bsdiff, and applier contains a Java implementation of bspatch.

The Zip file format

Since Google Archive Patch is a delta algorithm built strictly on the Zip format, it is worth reviewing that format first. The articles I found online all describe the structures in file order, which I think is a small mistake: the format is easier to understand back to front, so that is the order used here.

A Zip file generally consists of three sections.
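Schematically, from the start of the file to the end (a rough sketch; the figure from the original post is not reproduced here):

  +----------------------------------------------+
  | Local entries (header + file data, per entry)|
  +----------------------------------------------+
  | Central Directory (one record per entry)     |
  +----------------------------------------------+
  | End of Central Directory (EOCD)              |
  +----------------------------------------------+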

Let's walk through the three sections one at a time, starting with the last one.

End of Central Directory

Offset | Bytes | Description | Notes
0 | 4 | End of Central Directory SIGNATURE = 0x06054b50 | record header marker, fixed value 0x06054b50
4 | 2 | disk number for this archive | ignored
6 | 2 | disk number for the central directory | ignored
8 | 2 | num entries in the central directory on this disk | ignored
10 | 2 | num entries in the central directory overall | total number of central directory records
12 | 4 | the length of the central directory | size of the central directory
16 | 4 | the file offset of the central directory | offset of the central directory
20 | 2 | the length of the zip file comment | comment length
22 | n | from here to the EOF is the zip file comment | comment content

This section consists of a single record with the layout shown above. Its purpose is to locate the Central Directory.

Central Directory

The End of Central Directory record points to the Central Directory; its layout is as follows.

Offset | Bytes | Description | Notes
0 | 4 | Central Directory SIGNATURE = 0x02014b50 | record header marker, fixed value 0x02014b50
4 | 2 | the version-made-by | ignored
6 | 2 | the version-needed-to-extract | ignored
8 | 2 | the general-purpose flags, read for language encoding | general-purpose bit flags
10 | 2 | the compression method | compression method
12 | 2 | the MSDOS last modified file time | last modified time
14 | 2 | the MSDOS last modified file date | last modified date
16 | 4 | the CRC32 of the uncompressed data | crc32 checksum
20 | 4 | the compressed size | compressed size
24 | 4 | the uncompressed size | uncompressed size
28 | 2 | the length of the file name | file name length
30 | 2 | the length of the extras | extra field length
32 | 2 | the length of the comment | file comment length
34 | 2 | the disk number | ignored
36 | 2 | the internal file attributes | ignored
38 | 4 | the external file attributes | ignored
42 | 4 | the offset of the local section entry, where the data is | offset of the local entry
46 | i | the file name | file name
46+i | j | the extras | extra field
46+i+j | k | the comment | file comment

This section consists of n records with the layout shown above. Its purpose is to locate the actual data of each entry in the Zip file.

Contents of ZIP entries

Each Central Directory record points to a local entry; finally, the layout of a local entry:

Offset | Bytes | Description | Notes
0 | 4 | Local Entry SIGNATURE = 0x04034b50 | record header marker, fixed value 0x04034b50
4 | 2 | the version-needed-to-extract | ignored
6 | 2 | the general-purpose flags | general-purpose bit flags
8 | 2 | the compression method | compression method
10 | 2 | the MSDOS last modified file time | last modified time
12 | 2 | the MSDOS last modified file date | last modified date
14 | 4 | the CRC32 of the uncompressed data | crc32 checksum
18 | 4 | the compressed size | compressed size
22 | 4 | the uncompressed size | uncompressed size
26 | 2 | the length of the file name | file name length
28 | 2 | the length of the extras | extra field length
30 | i | the file name | file name
30+i | j | the extras | extra field
30+i+j | k | file data | where the actual compressed data lives

This section consists of n records with the layout shown above.

How Google Archive Patch parses a Zip file

Google Archive Patch implements its own minimal Zip parser. The parsing is done by com.google.archivepatcher.generator.MinimalZipParser, and the parsed data is carried by MinimalCentralDirectoryMetadata, MinimalZipArchive and MinimalZipEntry. The end result is a list of MinimalZipEntry objects sorted by offset.

private static List<MinimalZipEntry> listEntriesInternal(RandomAccessFileInputStream in)
    throws IOException {
  // Step 1: Locate the end-of-central-directory record header.
  long offsetOfEocd = MinimalZipParser.locateStartOfEocd(in, 32768);
  if (offsetOfEocd == -1) {
    // Archive is weird, abort.
    throw new ZipException("EOCD record not found in last 32k of archive, giving up");
  }

  // Step 2: Parse the end-of-central-directory data to locate the central directory itself
  in.setRange(offsetOfEocd, in.length() - offsetOfEocd);
  MinimalCentralDirectoryMetadata centralDirectoryMetadata = MinimalZipParser.parseEocd(in);

  // Step 3: Extract a list of all central directory entries (contiguous data stream)
  in.setRange(
      centralDirectoryMetadata.getOffsetOfCentralDirectory(),
      centralDirectoryMetadata.getLengthOfCentralDirectory());
  List<MinimalZipEntry> minimalZipEntries =
      new ArrayList<MinimalZipEntry>(centralDirectoryMetadata.getNumEntriesInCentralDirectory());
  for (int x = 0; x < centralDirectoryMetadata.getNumEntriesInCentralDirectory(); x++) {
    minimalZipEntries.add(MinimalZipParser.parseCentralDirectoryEntry(in));
  }

  // Step 4: Sort the entries in file order, not central directory order.
  Collections.sort(minimalZipEntries, LOCAL_ENTRY_OFFSET_COMAPRATOR);

  // Step 5: Seek out each local entry and calculate the offset of the compressed data within
  for (int x = 0; x < minimalZipEntries.size(); x++) {
    MinimalZipEntry entry = minimalZipEntries.get(x);
    long offsetOfNextEntry;
    if (x < minimalZipEntries.size() - 1) {
      // Don't allow reading past the start of the next entry, for sanity.
      offsetOfNextEntry = minimalZipEntries.get(x + 1).getFileOffsetOfLocalEntry();
    } else {
      // Last entry. Don't allow reading into the central directory, for sanity.
      offsetOfNextEntry = centralDirectoryMetadata.getOffsetOfCentralDirectory();
    }
    long rangeLength = offsetOfNextEntry - entry.getFileOffsetOfLocalEntry();
    in.setRange(entry.getFileOffsetOfLocalEntry(), rangeLength);
    long relativeDataOffset = MinimalZipParser.parseLocalEntryAndGetCompressedDataOffset(in);
    entry.setFileOffsetOfCompressedData(entry.getFileOffsetOfLocalEntry() + relativeDataOffset);
  }

  // Done!
  return minimalZipEntries;
}

The code above does the following:

  • Locate the start offset of the End of Central Directory record
  • Locate the Central Directory section
  • Parse the Central Directory records
  • Sort the entries by offset, ascending
  • Parse each local entry to find the offset of its actual data

Locating the End of Central Directory is straightforward: scan the bytes for the signature 0x06054b50. The implementation scans the last 32 KB of the Zip file, returns the offset when the signature is found, and throws an exception otherwise. One question is what happens if the record does not lie within the last 32 KB; I could not find any guarantee that it always does, and Android's Multidex implementation scans the last 64 KB, so let's just assume the scan succeeds. The implementation:

public static long locateStartOfEocd(RandomAccessFileInputStream in, int searchBufferLength)
    throws IOException {
  final int maxBufferSize = (int) Math.min(searchBufferLength, in.length());
  final byte[] buffer = new byte[maxBufferSize]; // 32k
  final long rangeStart = in.length() - buffer.length;
  in.setRange(rangeStart, buffer.length);
  readOrDie(in, buffer, 0, buffer.length); // read to buffer
  int offset = locateStartOfEocd(buffer); // locate
  if (offset == -1) {
    return -1;
  }
  return rangeStart + offset;
}

public static int locateStartOfEocd(byte[] buffer) {
  int last4Bytes = 0; // This is the 32 bits of data from the file
  for (int offset = buffer.length - 1; offset >= 0; offset--) {
    last4Bytes <<= 8;
    last4Bytes |= buffer[offset];
    if (last4Bytes == EOCD_SIGNATURE) { // 0x06054b50
      return offset;
    }
  }
  return -1;
}

Once the start of the End of Central Directory is known, the record is parsed into a MinimalCentralDirectoryMetadata:

public static MinimalCentralDirectoryMetadata parseEocd(InputStream in)
    throws IOException, ZipException {
  if (((int) read32BitUnsigned(in)) != EOCD_SIGNATURE) { // 0x06054b50
    throw new ZipException("Bad eocd header");
  }

  // *** 4 bytes encode EOCD_SIGNATURE, ignore (already found and verified).
  // 2 bytes encode disk number for this archive, ignore.
  // 2 bytes encode disk number for the central directory, ignore.
  // 2 bytes encode num entries in the central directory on this disk, ignore.
  // *** 2 bytes encode num entries in the central directory overall [READ THIS]
  // *** 4 bytes encode the length of the central directory [READ THIS]
  // *** 4 bytes encode the file offset of the central directory [READ THIS]
  // 2 bytes encode the length of the zip file comment, ignore.
  // Everything else from here to the EOF is the zip file comment, or junk. Ignore.
  skipOrDie(in, 2 + 2 + 2);
  int numEntriesInCentralDirectory = read16BitUnsigned(in); // number
  if (numEntriesInCentralDirectory == 0xffff) {
    // If 0xffff, this is a zip64 archive and this code doesn't handle that.
    throw new ZipException("No support for zip64");
  }
  long lengthOfCentralDirectory = read32BitUnsigned(in); // length
  long offsetOfCentralDirectory = read32BitUnsigned(in); // offset
  return new MinimalCentralDirectoryMetadata(
      numEntriesInCentralDirectory, offsetOfCentralDirectory, lengthOfCentralDirectory);
}

As the code shows, only three important values are extracted:

  • the number of Central Directory records, n
  • the offset of the Central Directory
  • the total length of the Central Directory

The parser then restricts the readable range to [offset, offset+length] (backed internally by a RandomAccessFile) and loops n times, parsing one Central Directory record per iteration. The code for parsing a single record:

public static MinimalZipEntry parseCentralDirectoryEntry(InputStream in) throws IOException {
  // *** 4 bytes encode the CENTRAL_DIRECTORY_ENTRY_SIGNATURE, verify for sanity
  // 2 bytes encode the version-made-by, ignore
  // 2 bytes encode the version-needed-to-extract, ignore
  // *** 2 bytes encode the general-purpose flags, read for language encoding. [READ THIS]
  // *** 2 bytes encode the compression method, [READ THIS]
  // 2 bytes encode the MSDOS last modified file time, ignore
  // 2 bytes encode the MSDOS last modified file date, ignore
  // *** 4 bytes encode the CRC32 of the uncompressed data [READ THIS]
  // *** 4 bytes encode the compressed size [READ THIS]
  // *** 4 bytes encode the uncompressed size [READ THIS]
  // *** 2 bytes encode the length of the file name [READ THIS]
  // *** 2 bytes encode the length of the extras, needed to skip the bytes later [READ THIS]
  // *** 2 bytes encode the length of the comment, needed to skip the bytes later [READ THIS]
  // 2 bytes encode the disk number, ignore
  // 2 bytes encode the internal file attributes, ignore
  // 4 bytes encode the external file attributes, ignore
  // *** 4 bytes encode the offset of the local section entry, where the data is [READ THIS]
  // n bytes encode the file name
  // n bytes encode the extras
  // n bytes encode the comment
  if (((int) read32BitUnsigned(in)) != CENTRAL_DIRECTORY_ENTRY_SIGNATURE) {
    throw new ZipException("Bad central directory header");
  }
  skipOrDie(in, 2 + 2); // Skip version stuff
  int generalPurposeFlags = read16BitUnsigned(in);
  int compressionMethod = read16BitUnsigned(in);
  skipOrDie(in, 2 + 2); // Skip MSDOS junk
  long crc32OfUncompressedData = read32BitUnsigned(in);
  long compressedSize = read32BitUnsigned(in);
  long uncompressedSize = read32BitUnsigned(in);
  int fileNameLength = read16BitUnsigned(in);
  int extrasLength = read16BitUnsigned(in);
  int commentLength = read16BitUnsigned(in);
  skipOrDie(in, 2 + 2 + 4); // Skip the disk number and file attributes
  long fileOffsetOfLocalEntry = read32BitUnsigned(in);
  byte[] fileNameBuffer = new byte[fileNameLength];
  readOrDie(in, fileNameBuffer, 0, fileNameBuffer.length);
  skipOrDie(in, extrasLength + commentLength);
  // General purpose flag bit 11 is an important hint for the character set used for file names.
  boolean generalPurposeFlagBit11 = (generalPurposeFlags & (0x1 << 10)) != 0;
  return new MinimalZipEntry(
      compressionMethod,
      crc32OfUncompressedData,
      compressedSize,
      uncompressedSize,
      fileNameBuffer,
      generalPurposeFlagBit11,
      fileOffsetOfLocalEntry);
}

It extracts the following data:

  • compression method
  • crc32 checksum
  • uncompressed size
  • compressed size
  • file name
  • general-purpose flags
  • offset of the local entry

The result is a list of n MinimalZipEntry objects. After sorting by offset in ascending order, the list is traversed again and, for each entry, the offset of its actual data inside the local entry is computed:

public static long parseLocalEntryAndGetCompressedDataOffset(InputStream in) throws IOException {
  // *** 4 bytes encode the LOCAL_ENTRY_SIGNATURE, verify for sanity
  // 2 bytes encode the version-needed-to-extract, ignore
  // 2 bytes encode the general-purpose flags, ignore
  // 2 bytes encode the compression method, ignore (redundant with central directory)
  // 2 bytes encode the MSDOS last modified file time, ignore
  // 2 bytes encode the MSDOS last modified file date, ignore
  // 4 bytes encode the CRC32 of the uncompressed data, ignore (redundant with central directory)
  // 4 bytes encode the compressed size, ignore (redundant with central directory)
  // 4 bytes encode the uncompressed size, ignore (redundant with central directory)
  // *** 2 bytes encode the length of the file name, needed to skip the bytes later [READ THIS]
  // *** 2 bytes encode the length of the extras, needed to skip the bytes later [READ THIS]
  // The rest is the data, which is the main attraction here.
  if (((int) read32BitUnsigned(in)) != LOCAL_ENTRY_SIGNATURE) {
    throw new ZipException("Bad local entry header");
  }
  int junkLength = 2 + 2 + 2 + 2 + 2 + 4 + 4 + 4;
  skipOrDie(in, junkLength); // Skip everything up to the length of the file name
  final int fileNameLength = read16BitUnsigned(in);
  final int extrasLength = read16BitUnsigned(in);

  // The file name is already known and will match the central directory, so no need to read it.
  // The extra field length can be different here versus in the central directory and is used for
  // things like zipaligning APKs. This single value is the critical part as it dictates where the
  // actual DATA for the entry begins.
  return 4 + junkLength + 2 + 2 + fileNameLength + extrasLength;
}

Very simple: it skips everything in the local entry that precedes the actual data and returns that offset. For example, for an entry with a 10-byte file name and an empty local extra field, the compressed data starts 4 + 22 + 2 + 2 + 10 + 0 = 40 bytes after the local entry signature, matching the 30 + i + j formula in the table above.

That completes the Zip parsing.

Generating the delta file

The implementation lives mainly in FileByFileV1DeltaGenerator:

@Override
public void generateDelta(File oldFile, File newFile, OutputStream patchOut)
    throws IOException, InterruptedException {
  try (TempFileHolder deltaFriendlyOldFile = new TempFileHolder();
      TempFileHolder deltaFriendlyNewFile = new TempFileHolder();
      TempFileHolder deltaFile = new TempFileHolder();
      FileOutputStream deltaFileOut = new FileOutputStream(deltaFile.file);
      BufferedOutputStream bufferedDeltaOut = new BufferedOutputStream(deltaFileOut)) {
    PreDiffExecutor.Builder builder =
        new PreDiffExecutor.Builder()
            .readingOriginalFiles(oldFile, newFile)
            .writingDeltaFriendlyFiles(deltaFriendlyOldFile.file, deltaFriendlyNewFile.file);
    for (RecommendationModifier modifier : recommendationModifiers) {
      builder.withRecommendationModifier(modifier);
    }
    PreDiffExecutor executor = builder.build();
    PreDiffPlan preDiffPlan = executor.prepareForDiffing();
    DeltaGenerator deltaGenerator = getDeltaGenerator();
    deltaGenerator.generateDelta(
        deltaFriendlyOldFile.file, deltaFriendlyNewFile.file, bufferedDeltaOut);
    bufferedDeltaOut.close();
    PatchWriter patchWriter =
        new PatchWriter(
            preDiffPlan,
            deltaFriendlyOldFile.file.length(),
            deltaFriendlyNewFile.file.length(),
            deltaFile.file);
    patchWriter.writeV1Patch(patchOut);
  }
}

protected DeltaGenerator getDeltaGenerator() {
  return new BsDiffDeltaGenerator();
}

It does the following:

  • Create three temporary files: the delta-friendly old file, the delta-friendly new file, and the delta file. All three are deleted automatically when the JVM exits.
  • Call PreDiffExecutor.prepareForDiffing to build a PreDiffPlan; this method does a great deal of complex work, detailed below.
  • Run the BsDiff delta algorithm to produce the delta file.
  • Write the patch file; the patch format is described later.

Now let's look at PreDiffExecutor.prepareForDiffing:

public PreDiffPlan prepareForDiffing() throws IOException {
  PreDiffPlan preDiffPlan = generatePreDiffPlan();
  List<TypedRange<JreDeflateParameters>> deltaFriendlyNewFileRecompressionPlan = null;
  if (deltaFriendlyOldFile != null) {
    // Builder.writingDeltaFriendlyFiles() ensures old and new are non-null when called, so a
    // check on either is sufficient.
    deltaFriendlyNewFileRecompressionPlan =
        Collections.unmodifiableList(generateDeltaFriendlyFiles(preDiffPlan));
  }
  return new PreDiffPlan(
      preDiffPlan.getQualifiedRecommendations(),
      preDiffPlan.getOldFileUncompressionPlan(),
      preDiffPlan.getNewFileUncompressionPlan(),
      deltaFriendlyNewFileRecompressionPlan);
}

It does the following:

  • Call generatePreDiffPlan to build a PreDiffPlan object; this function is detailed below.
  • Using that plan, call generateDeltaFriendlyFiles to produce the delta-friendly files; also detailed below.
  • Construct a new PreDiffPlan from the recommendation list, the list of old-file entries to uncompress, the list of new-file entries to uncompress, and the recompression plan for the new delta-friendly file.

Now for generatePreDiffPlan:

private PreDiffPlan generatePreDiffPlan() throws IOException {
  Map<ByteArrayHolder, MinimalZipEntry> originalOldArchiveZipEntriesByPath =
      new HashMap<ByteArrayHolder, MinimalZipEntry>();
  Map<ByteArrayHolder, MinimalZipEntry> originalNewArchiveZipEntriesByPath =
      new HashMap<ByteArrayHolder, MinimalZipEntry>();
  Map<ByteArrayHolder, JreDeflateParameters> originalNewArchiveJreDeflateParametersByPath =
      new HashMap<ByteArrayHolder, JreDeflateParameters>();

  for (MinimalZipEntry zipEntry : MinimalZipArchive.listEntries(originalOldFile)) {
    ByteArrayHolder key = new ByteArrayHolder(zipEntry.getFileNameBytes());
    originalOldArchiveZipEntriesByPath.put(key, zipEntry);
  }

  DefaultDeflateCompressionDiviner diviner = new DefaultDeflateCompressionDiviner();
  for (DivinationResult divinationResult : diviner.divineDeflateParameters(originalNewFile)) {
    ByteArrayHolder key = new ByteArrayHolder(divinationResult.minimalZipEntry.getFileNameBytes());
    originalNewArchiveZipEntriesByPath.put(key, divinationResult.minimalZipEntry);
    originalNewArchiveJreDeflateParametersByPath.put(key, divinationResult.divinedParameters);
  }

  PreDiffPlanner preDiffPlanner =
      new PreDiffPlanner(
          originalOldFile,
          originalOldArchiveZipEntriesByPath,
          originalNewFile,
          originalNewArchiveZipEntriesByPath,
          originalNewArchiveJreDeflateParametersByPath,
          recommendationModifiers.toArray(new RecommendationModifier[] {}));
  return preDiffPlanner.generatePreDiffPlan();
}

public List<DivinationResult> divineDeflateParameters(File archiveFile) throws IOException {
  List<DivinationResult> results = new ArrayList<>();
  for (MinimalZipEntry minimalZipEntry : MinimalZipArchive.listEntries(archiveFile)) {
    JreDeflateParameters divinedParameters = null;
    if (minimalZipEntry.isDeflateCompressed()) {
      // TODO(pasc): Reuse streams to avoid churning file descriptors
      MultiViewInputStreamFactory isFactory =
          new RandomAccessFileInputStreamFactory(
              archiveFile,
              minimalZipEntry.getFileOffsetOfCompressedData(),
              minimalZipEntry.getCompressedSize());

      // Keep small entries in memory to avoid unnecessary file I/O.
      if (minimalZipEntry.getCompressedSize() < (100 * 1024)) {
        try (InputStream is = isFactory.newStream()) {
          byte[] compressedBytes = new byte[(int) minimalZipEntry.getCompressedSize()];
          is.read(compressedBytes);
          divinedParameters =
              divineDeflateParameters(new ByteArrayInputStreamFactory(compressedBytes));
        } catch (Exception ignore) {
          divinedParameters = null;
        }
      } else {
        divinedParameters = divineDeflateParameters(isFactory);
      }
    }
    results.add(new DivinationResult(minimalZipEntry, divinedParameters));
  }
  return results;
}

public JreDeflateParameters divineDeflateParameters(
    MultiViewInputStreamFactory compressedDataInputStreamFactory) throws IOException {
  byte[] copyBuffer = new byte[32 * 1024];
  // Iterate over all relevant combinations of nowrap, strategy and level.
  for (boolean nowrap : new boolean[] {true, false}) {
    Inflater inflater = new Inflater(nowrap);
    Deflater deflater = new Deflater(0, nowrap);

    strategy_loop:
    for (int strategy : new int[] {0, 1, 2}) {
      deflater.setStrategy(strategy);
      for (int level : LEVELS_BY_STRATEGY.get(strategy)) {
        deflater.setLevel(level);
        inflater.reset();
        deflater.reset();
        try {
          if (matches(inflater, deflater, compressedDataInputStreamFactory, copyBuffer)) {
            end(inflater, deflater);
            return JreDeflateParameters.of(level, strategy, nowrap);
          }
        } catch (ZipException e) {
          // Parse error in input. The only possibilities are corruption or the wrong nowrap.
          // Skip all remaining levels and strategies.
          break strategy_loop;
        }
      }
    }
    end(inflater, deflater);
  }
  return null;
}

generatePreDiffPlan builds three map objects:

  • The first map holds the old archive's data: the key is a ByteArrayHolder wrapping the entry's file name bytes, and the value is the MinimalZipEntry.
  • The second map holds the new archive's data, keyed the same way.
  • The third map holds the divined deflate settings for each entry in the new archive, i.e. the compression level, strategy and nowrap flag: the key is again a ByteArrayHolder of the file name bytes, and the value is a JreDeflateParameters.

The first two maps come straight from the Zip parsing described earlier: listEntries returns a List of MinimalZipEntry, the key is MinimalZipEntry.getFileNameBytes(), and the value is the entry itself.

The third map is obtained the hard way, by divination. The approach is brute force: three nested loops decompress the entry's data and then recompress it with every combination of level, strategy and nowrap; if the recompressed bytes equal the compressed bytes parsed out of the Zip, the corresponding level, strategy and nowrap have been found. These three values are carried by a JreDeflateParameters.

These three maps are used to construct a PreDiffPlanner, whose generatePreDiffPlan method returns the PreDiffPlan:

PreDiffPlan generatePreDiffPlan() throws IOException {
  List<QualifiedRecommendation> recommendations = getDefaultRecommendations();
  for (RecommendationModifier modifier : recommendationModifiers) {
    // Allow changing the recommendations base on arbitrary criteria.
    recommendations = modifier.getModifiedRecommendations(oldFile, newFile, recommendations);
  }

  // Process recommendations to extract ranges for decompression & recompression
  Set<TypedRange<Void>> oldFilePlan = new HashSet<>();
  Set<TypedRange<JreDeflateParameters>> newFilePlan = new HashSet<>();
  for (QualifiedRecommendation recommendation : recommendations) {
    if (recommendation.getRecommendation().uncompressOldEntry) {
      long offset = recommendation.getOldEntry().getFileOffsetOfCompressedData();
      long length = recommendation.getOldEntry().getCompressedSize();
      TypedRange<Void> range = new TypedRange<Void>(offset, length, null);
      oldFilePlan.add(range);
    }
    if (recommendation.getRecommendation().uncompressNewEntry) {
      long offset = recommendation.getNewEntry().getFileOffsetOfCompressedData();
      long length = recommendation.getNewEntry().getCompressedSize();
      JreDeflateParameters newJreDeflateParameters =
          newArchiveJreDeflateParametersByPath.get(
              new ByteArrayHolder(recommendation.getNewEntry().getFileNameBytes()));
      TypedRange<JreDeflateParameters> range =
          new TypedRange<JreDeflateParameters>(offset, length, newJreDeflateParameters);
      newFilePlan.add(range);
    }
  }

  List<TypedRange<Void>> oldFilePlanList = new ArrayList<>(oldFilePlan);
  Collections.sort(oldFilePlanList);
  List<TypedRange<JreDeflateParameters>> newFilePlanList = new ArrayList<>(newFilePlan);
  Collections.sort(newFilePlanList);
  return new PreDiffPlan(
      Collections.unmodifiableList(recommendations),
      Collections.unmodifiableList(oldFilePlanList),
      Collections.unmodifiableList(newFilePlanList));
}

This function mainly produces two lists:

  • For the old file, the offset and length of the compressed data of every entry recommended for uncompression, carried by TypedRange with a Void type parameter.
  • For the new file, the offset and length of the compressed data of every entry recommended for uncompression, carried by TypedRange with a JreDeflateParameters type parameter, whose value comes from the third map built in the previous step.

Both lists are sorted by offset in ascending order.

Where do the entries "recommended for uncompression" come from? From the following function:

private List<QualifiedRecommendation> getDefaultRecommendations() throws IOException {
  List<QualifiedRecommendation> recommendations = new ArrayList<>();

  // This will be used to find files that have been renamed, but not modified. This is relatively
  // cheap to construct as it just requires indexing all entries by the uncompressed CRC32, and
  // the CRC32 is already available in the ZIP headers.
  SimilarityFinder trivialRenameFinder =
      new Crc32SimilarityFinder(oldFile, oldArchiveZipEntriesByPath.values());

  // Iterate over every pair of entries and get a recommendation for what to do.
  for (Map.Entry<ByteArrayHolder, MinimalZipEntry> newEntry :
      newArchiveZipEntriesByPath.entrySet()) {
    ByteArrayHolder newEntryPath = newEntry.getKey();
    MinimalZipEntry oldZipEntry = oldArchiveZipEntriesByPath.get(newEntryPath);
    if (oldZipEntry == null) {
      // The path is only present in the new archive, not in the old archive. Try to find a
      // similar file in the old archive that can serve as a diff base for the new file.
      List<MinimalZipEntry> identicalEntriesInOldArchive =
          trivialRenameFinder.findSimilarFiles(newFile, newEntry.getValue());
      if (!identicalEntriesInOldArchive.isEmpty()) {
        // An identical file exists in the old archive at a different path. Use it for the
        // recommendation and carry on with the normal logic.
        // All entries in the returned list are identical, so just pick the first one.
        // NB, in principle it would be optimal to select the file that required the least work
        // to apply the patch - in practice, it is unlikely that an archive will contain multiple
        // copies of the same file that are compressed differently, so don't bother with that
        // degenerate case.
        oldZipEntry = identicalEntriesInOldArchive.get(0);
      }
    }

    // If the attempt to find a suitable diff base for the new entry has failed, oldZipEntry is
    // null (nothing to do in that case). Otherwise, there is an old entry that is relevant, so
    // get a recommendation for what to do.
    if (oldZipEntry != null) {
      recommendations.add(getRecommendation(oldZipEntry, newEntry.getValue()));
    }
  }
  return recommendations;
}

This function does the following:

  • Build a similarity finder, internally a map keyed by CRC32 whose values are lists of old-file MinimalZipEntry (several files can share the same CRC32).
  • Iterate over the new file's MinimalZipEntry list and check whether the same path exists in the old archive. If it does not, look up entries with the same CRC32 via the similarity finder and take the first match; if there is no match, the entry has no diff base in the old archive and nothing is done for it.
  • For each resulting old/new entry pair, call getRecommendation to obtain a QualifiedRecommendation and add it to the list; this object holds the old and new entries and whether each should be uncompressed.
  • Return the list of QualifiedRecommendation.

How is a QualifiedRecommendation decided? It comes from getRecommendation:

private QualifiedRecommendation getRecommendation(MinimalZipEntry oldEntry, MinimalZipEntry newEntry)
    throws IOException {

  // Reject anything that is unsuitable for uncompressed diffing.
  // Reason singled out in order to monitor unsupported versions of zlib.
  if (unsuitableDeflate(newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEITHER,
        RecommendationReason.DEFLATE_UNSUITABLE);
  }

  // Reject anything that is unsuitable for uncompressed diffing.
  if (unsuitable(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEITHER,
        RecommendationReason.UNSUITABLE);
  }

  // If both entries are already uncompressed there is nothing to do.
  if (bothEntriesUncompressed(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEITHER,
        RecommendationReason.BOTH_ENTRIES_UNCOMPRESSED);
  }

  // The following are now true:
  // 1. At least one of the entries is compressed.
  // 1. The old entry is either uncompressed, or is compressed with deflate.
  // 2. The new entry is either uncompressed, or is reproducibly compressed with deflate.

  if (uncompressedChangedToCompressed(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEW,
        RecommendationReason.UNCOMPRESSED_CHANGED_TO_COMPRESSED);
  }

  if (compressedChangedToUncompressed(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_OLD,
        RecommendationReason.COMPRESSED_CHANGED_TO_UNCOMPRESSED);
  }

  // At this point, both entries must be compressed with deflate.
  if (compressedBytesChanged(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_BOTH,
        RecommendationReason.COMPRESSED_BYTES_CHANGED);
  }

  // If the compressed bytes have not changed, there is no need to do anything.
  return new QualifiedRecommendation(
      oldEntry,
      newEntry,
      Recommendation.UNCOMPRESS_NEITHER,
      RecommendationReason.COMPRESSED_BYTES_IDENTICAL);
}

There are seven cases:

  • The entry is compressed but its JreDeflateParameters could not be divined, i.e. the compression level, strategy and nowrap flag are unknown. Without those three parameters the data cannot be recompressed, so the recommendation is to uncompress neither entry, the reason being that no suitable deflate parameters were found to reproduce the compressed data.
  • The old or new entry is compressed with an unsupported compression method; recommend uncompressing neither, because the compression algorithm is unsupported.
  • Both entries are already uncompressed; recommend uncompressing neither, because there is nothing to uncompress.
  • The old entry is uncompressed and the new one is compressed; recommend uncompressing the new entry, because the file changed from uncompressed to compressed.
  • The old entry is compressed and the new one is uncompressed; recommend uncompressing the old entry, because the file changed from compressed to uncompressed.
  • Both entries are compressed and the compressed bytes changed; recommend uncompressing both, because the file changed.
  • The compressed bytes did not change; recommend uncompressing neither, because the file did not change.

With all of that in place, here is how the delta-friendly files are generated:

private List<TypedRange<JreDeflateParameters>> generateDeltaFriendlyFiles(PreDiffPlan preDiffPlan)
    throws IOException {
  try (FileOutputStream out = new FileOutputStream(deltaFriendlyOldFile);
      BufferedOutputStream bufferedOut = new BufferedOutputStream(out)) {
    DeltaFriendlyFile.generateDeltaFriendlyFile(
        preDiffPlan.getOldFileUncompressionPlan(), originalOldFile, bufferedOut);
  }
  try (FileOutputStream out = new FileOutputStream(deltaFriendlyNewFile);
      BufferedOutputStream bufferedOut = new BufferedOutputStream(out)) {
    return DeltaFriendlyFile.generateDeltaFriendlyFile(
        preDiffPlan.getNewFileUncompressionPlan(), originalNewFile, bufferedOut);
  }
}

public static <T> List<TypedRange<T>> generateDeltaFriendlyFile(
    List<TypedRange<T>> rangesToUncompress, File file, OutputStream deltaFriendlyOut)
    throws IOException {
  return generateDeltaFriendlyFile(
      rangesToUncompress, file, deltaFriendlyOut, true, DEFAULT_COPY_BUFFER_SIZE);
}

public static <T> List<TypedRange<T>> generateDeltaFriendlyFile(
    List<TypedRange<T>> rangesToUncompress,
    File file,
    OutputStream deltaFriendlyOut,
    boolean generateInverse,
    int copyBufferSize)
    throws IOException {
  List<TypedRange<T>> inverseRanges = null;
  if (generateInverse) {
    inverseRanges = new ArrayList<TypedRange<T>>(rangesToUncompress.size());
  }
  long lastReadOffset = 0;
  RandomAccessFileInputStream oldFileRafis = null;
  PartiallyUncompressingPipe filteredOut =
      new PartiallyUncompressingPipe(deltaFriendlyOut, copyBufferSize);
  try {
    oldFileRafis = new RandomAccessFileInputStream(file);
    for (TypedRange<T> rangeToUncompress : rangesToUncompress) {
      long gap = rangeToUncompress.getOffset() - lastReadOffset;
      if (gap > 0) {
        // Copy bytes up to the range start point
        oldFileRafis.setRange(lastReadOffset, gap);
        filteredOut.pipe(oldFileRafis, PartiallyUncompressingPipe.Mode.COPY);
      }

      // Now uncompress the range.
      oldFileRafis.setRange(rangeToUncompress.getOffset(), rangeToUncompress.getLength());
      long inverseRangeStart = filteredOut.getNumBytesWritten();
      // TODO(andrewhayden): Support nowrap=false here? Never encountered in practice.
      // This would involve catching the ZipException, checking if numBytesWritten is still zero,
      // resetting the stream and trying again.
      filteredOut.pipe(oldFileRafis, PartiallyUncompressingPipe.Mode.UNCOMPRESS_NOWRAP);
      lastReadOffset = rangeToUncompress.getOffset() + rangeToUncompress.getLength();

      if (generateInverse) {
        long inverseRangeEnd = filteredOut.getNumBytesWritten();
        long inverseRangeLength = inverseRangeEnd - inverseRangeStart;
        TypedRange<T> inverseRange =
            new TypedRange<T>(inverseRangeStart, inverseRangeLength, rangeToUncompress.getMetadata());
        inverseRanges.add(inverseRange);
      }
    }

    // Finish the final bytes of the file
    long bytesLeft = oldFileRafis.length() - lastReadOffset;
    if (bytesLeft > 0) {
      oldFileRafis.setRange(lastReadOffset, bytesLeft);
      filteredOut.pipe(oldFileRafis, PartiallyUncompressingPipe.Mode.COPY);
    }
  } finally {
    try {
      oldFileRafis.close();
    } catch (Exception ignored) {
      // Nothing
    }
    try {
      filteredOut.close();
    } catch (Exception ignored) {
      // Nothing
    }
  }
  return inverseRanges;
}

This function is clever and somewhat involved. The process is:

  • Iterate over the list of ranges to uncompress. For each range, subtract the last read position (lastReadOffset) from the range's offset to obtain a gap, and copy the gap bytes verbatim in COPY mode.
  • Restrict the input to [offset, offset+length], record the number of bytes written so far as inverseRangeStart, uncompress the range, and set lastReadOffset to offset + length.
  • If generateInverse is true (it always is here, since the caller passes true), record the number of bytes written so far as inverseRangeEnd; inverseRangeEnd minus inverseRangeStart is the uncompressed size. Build a TypedRange from these values and add it to the list.
  • After all ranges have been processed, copy whatever data remains between the current read position and the end of the file.
  • Return the list of TypedRange.

The process is fairly abstract, so here is a rough sketch of the whole expansion (the original post shows this as a figure).
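A schematic view; the entry names and number of ranges are illustrative only:

  original Zip:         [gap][compressed entry A][gap][compressed entry B][trailing data]
                          |          |             |          |                |
                         COPY    UNCOMPRESS       COPY    UNCOMPRESS          COPY
                          v          v             v          v                v
  delta-friendly file: [gap][uncompressed A ....][gap][uncompressed B ....][trailing data]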

The gaps are descriptive metadata, the compressed ranges are the entries' actual compressed data, and the trailing region is whatever is left at the end of the Zip file. Gaps are copied verbatim, compressed ranges are uncompressed, and for each one the offset and length of the uncompressed output are recorded; once all ranges are processed, the leftover data at the end of the file is copied as well.

Note especially that the returned TypedRange describes the offset and length of the data in the new file after uncompression. This information is crucial: it is what makes it possible to rebuild the Zip file later.

Once the old and new delta-friendly files exist, the rest is simple: run BsDiff to produce the delta, then write it into the patch file. The patch format is:

Offset | Bytes | Description | Notes
0 | 8 | Versioned Identifier | header marker, fixed string "GFbFv1_0"
8 | 4 | Flags (currently unused, but reserved) | flags, reserved
12 | 8 | Delta-friendly old archive size | size of the old archive's delta-friendly file, 64-bit unsigned
20 | 4 | Num old archive uncompression ops | number of old-archive entries to uncompress, 32-bit unsigned
24 | i | Old archive uncompression op 1...n | offset and length of each of the n entries to uncompress
24+i | 4 | Num new archive recompression ops | number of new-archive entries to recompress, 32-bit unsigned
24+i+4 | j | New archive recompression op 1...n | offset and length of each of the n entries to recompress
24+i+4+j | 4 | Num delta descriptor records | number of delta descriptors, 32-bit unsigned
24+i+4+j+4 | k | Delta descriptor record 1...n | delta descriptor records, n in total
24+i+4+j+4+k | l | Delta 1...n | the delta payloads themselves

The structure of an Old Archive Uncompression Op:

Bytes | Description | Notes
8 | Offset of first byte to uncompress | offset of the data to uncompress, 64-bit unsigned
8 | Number of bytes to uncompress | number of bytes to uncompress, 64-bit unsigned

The structure of a New Archive Recompression Op:

Bytes | Description | Notes
8 | Offset of first byte to compress | offset of the data to recompress, 64-bit unsigned
8 | Number of bytes to compress | number of bytes to recompress, 64-bit unsigned
4 | Compression settings | compression parameters: level, strategy, nowrap

The structure of the Compression Settings:

Bytes | Description | Notes
1 | Compatibility window ID | compatibility window; currently always 0, the default window
1 | Deflate level | compression level, in [1, 9]
1 | Deflate strategy | encoding strategy, in [0, 2]
1 | Wrap mode | 0 = wrap, 1 = nowrap

The Compatibility Window: the default window has ID 0 and uses the following configuration:

  • compression with the deflate algorithm (zlib)
  • a 32768-byte buffer
  • verified compression levels 1 through 9
  • verified encoding strategies 0 through 2
  • verified wrap modes, both wrap and nowrap

The default compatibility window covers Android 4.0 and later.

How is this window determined? The class DefaultDeflateCompatibilityWindow exposes getIncompatibleValues, which returns the JreDeflateParameters (the carrier of level, strategy and nowrap) that are not compatible. Internally it compresses a reference corpus with every combination of the three parameters and compares the hex encoding of the output against built-in expected values; a match means compatible, a mismatch means incompatible.
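A minimal sketch of such a check, assuming the shared module is on the classpath; the method name and the shape of its result follow the description above and should be treated as an assumption rather than the exact API:

import com.google.archivepatcher.shared.DefaultDeflateCompatibilityWindow;
import com.google.archivepatcher.shared.JreDeflateParameters;

public class CompatibilityCheck {
  public static void main(String[] args) {
    DefaultDeflateCompatibilityWindow window = new DefaultDeflateCompatibilityWindow();
    // Each reported entry is a (level, strategy, nowrap) combination whose deflate output
    // on this JVM does not match the baseline recorded in the compatibility window.
    for (JreDeflateParameters parameters : window.getIncompatibleValues()) {
      System.out.println("not reproducible here: " + parameters);
    }
  }
}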

One caveat: officially, compression levels 1-9, strategies 0-2, and both wrap and nowrap are compatible, but in my own tests on a PC a handful of combinations, roughly four, were not. I have not tested on Android, so I do not know whether the same problem exists there.

The Delta Descriptor Record describes the delta algorithm. The current V1 patch format supports only BsDiff, so there is exactly one such record. Its structure:

Bytes | Description | Notes
1 | Delta format ID | enum id of the delta algorithm; bsdiff is 0
8 | Old delta-friendly region start | offset in the old delta-friendly file where the delta applies
8 | Old delta-friendly region length | length of that region in the old delta-friendly file
8 | New delta-friendly region start | offset in the new delta-friendly file where the delta applies
8 | New delta-friendly region length | length of that region in the new delta-friendly file
8 | Delta length | length of the generated delta

The patch file is written by writeV1Patch:

public void writeV1Patch(OutputStream out) throws IOException {
  // Use DataOutputStream for ease of writing. This is deliberately left open, as closing it would
  // close the output stream that was passed in and that is not part of the method's documented
  // behavior.
  @SuppressWarnings("resource")
  DataOutputStream dataOut = new DataOutputStream(out);

  dataOut.write(PatchConstants.IDENTIFIER.getBytes("US-ASCII")); // GFbFv1_0
  dataOut.writeInt(0); // Flags (reserved)
  dataOut.writeLong(deltaFriendlyOldFileSize);

  // Write out all the delta-friendly old file uncompression instructions
  dataOut.writeInt(plan.getOldFileUncompressionPlan().size());
  for (TypedRange<Void> range : plan.getOldFileUncompressionPlan()) {
    dataOut.writeLong(range.getOffset());
    dataOut.writeLong(range.getLength());
  }

  // Write out all the delta-friendly new file recompression instructions
  dataOut.writeInt(plan.getDeltaFriendlyNewFileRecompressionPlan().size());
  for (TypedRange<JreDeflateParameters> range : plan.getDeltaFriendlyNewFileRecompressionPlan()) {
    dataOut.writeLong(range.getOffset());
    dataOut.writeLong(range.getLength());
    // Write the deflate information
    dataOut.write(PatchConstants.CompatibilityWindowId.DEFAULT_DEFLATE.patchValue);
    dataOut.write(range.getMetadata().level);
    dataOut.write(range.getMetadata().strategy);
    dataOut.write(range.getMetadata().nowrap ? 1 : 0);
  }

  // Now the delta section
  // First write the number of deltas present in the patch. In v1, there is always exactly one
  // delta, and it is for the entire input; in future versions there may be multiple deltas, of
  // arbitrary types.
  dataOut.writeInt(1);

  // In v1 the delta format is always bsdiff, so write it unconditionally.
  dataOut.write(PatchConstants.DeltaFormat.BSDIFF.patchValue);

  // Write the working ranges. In v1 these are always the entire contents of the delta-friendly
  // old file and the delta-friendly new file. These are for forward compatibility with future
  // versions that may allow deltas of arbitrary formats to be mapped to arbitrary ranges.
  dataOut.writeLong(0); // i.e., start of the working range in the delta-friendly old file
  dataOut.writeLong(deltaFriendlyOldFileSize); // i.e., length of the working range in old
  dataOut.writeLong(0); // i.e., start of the working range in the delta-friendly new file
  dataOut.writeLong(deltaFriendlyNewFileSize); // i.e., length of the working range in new

  // Finally, the length of the delta and the delta itself.
  dataOut.writeLong(deltaFile.length());
  try (FileInputStream deltaFileIn = new FileInputStream(deltaFile);
      BufferedInputStream deltaIn = new BufferedInputStream(deltaFileIn)) {
    byte[] buffer = new byte[32768];
    int numRead = 0;
    while ((numRead = deltaIn.read(buffer)) >= 0) {
      dataOut.write(buffer, 0, numRead);
    }
  }
  dataOut.flush();
}

It performs the following steps:

  • Write the file header, "GFbFv1_0"
  • Write the reserved flags, value 0
  • Write the size of the old file's delta-friendly file
  • Write the number of old-file entries to uncompress
  • Write, in order, the offset and length of each of the n old-file entries to uncompress
  • Write the number of new-file entries to recompress
  • Write, in order, the offset and length of each of the n new-file entries to recompress, together with the compatibility window (window id, compression level, strategy, nowrap)
  • Write the number of delta descriptors; only bsdiff is used, so the value is 1
  • Write the delta format id, the offset and length of the region of the old delta-friendly file the delta applies to, and the offset and length of the region of the new delta-friendly file it applies to
  • Write the length of the delta produced by bsdiff
  • Write the bsdiff delta itself

Reconstructing the new file

Patch application goes through com.google.archivepatcher.applier.FileByFileV1DeltaApplier.applyDelta, which eventually ends up in applyDeltaInternal:

private void applyDeltaInternal(
    File oldBlob, File deltaFriendlyOldBlob, InputStream deltaIn, OutputStream newBlobOut)
    throws IOException {

  // First, read the patch plan from the patch stream.
  PatchReader patchReader = new PatchReader();
  PatchApplyPlan plan = patchReader.readPatchApplyPlan(deltaIn);
  writeDeltaFriendlyOldBlob(plan, oldBlob, deltaFriendlyOldBlob);

  // Apply the delta. In v1 there is always exactly one delta descriptor, it is bsdiff, and it
  // takes up the rest of the patch stream - so there is no need to examine the list of
  // DeltaDescriptors in the patch at all.
  long deltaLength = plan.getDeltaDescriptors().get(0).getDeltaLength();
  DeltaApplier deltaApplier = getDeltaApplier();
  // Don't close this stream, as it is just a limiting wrapper.
  @SuppressWarnings("resource")
  LimitedInputStream limitedDeltaIn = new LimitedInputStream(deltaIn, deltaLength);
  // Don't close this stream, as it would close the underlying OutputStream (that we don't own).
  @SuppressWarnings("resource")
  PartiallyCompressingOutputStream recompressingNewBlobOut =
      new PartiallyCompressingOutputStream(
          plan.getDeltaFriendlyNewFileRecompressionPlan(),
          newBlobOut,
          DEFAULT_COPY_BUFFER_SIZE);
  deltaApplier.applyDelta(deltaFriendlyOldBlob, limitedDeltaIn, recompressingNewBlobOut);
  recompressingNewBlobOut.flush();
}

It does the following:

  • Parse the patch file into a PatchApplyPlan
  • Generate the old file's delta-friendly file
  • Apply the patch algorithm to produce the new file's delta-friendly data; as that data is streamed out, the new Zip file is rebuilt on the fly.

For the first step, the parsing code is:

public PatchApplyPlan readPatchApplyPlan(InputStream in) throws IOException {
  // Use DataOutputStream for ease of writing. This is deliberately left open, as closing it would
  // close the output stream that was passed in and that is not part of the method's documented
  // behavior.
  @SuppressWarnings("resource")
  DataInputStream dataIn = new DataInputStream(in);

  // Read header and flags.
  byte[] expectedIdentifier = PatchConstants.IDENTIFIER.getBytes("US-ASCII");
  byte[] actualIdentifier = new byte[expectedIdentifier.length];
  dataIn.readFully(actualIdentifier);
  if (!Arrays.equals(expectedIdentifier, actualIdentifier)) {
    throw new PatchFormatException("Bad identifier");
  }
  dataIn.skip(4); // Flags (ignored in v1)
  long deltaFriendlyOldFileSize =
      checkNonNegative(dataIn.readLong(), "delta-friendly old file size");

  // Read old file uncompression instructions.
  int numOldFileUncompressionInstructions =
      (int) checkNonNegative(dataIn.readInt(), "old file uncompression instruction count");
  List<TypedRange<Void>> oldFileUncompressionPlan =
      new ArrayList<TypedRange<Void>>(numOldFileUncompressionInstructions);
  long lastReadOffset = -1;
  for (int x = 0; x < numOldFileUncompressionInstructions; x++) {
    long offset = checkNonNegative(dataIn.readLong(), "old file uncompression range offset");
    long length = checkNonNegative(dataIn.readLong(), "old file uncompression range length");
    if (offset < lastReadOffset) {
      throw new PatchFormatException("old file uncompression ranges out of order or overlapping");
    }
    TypedRange<Void> range = new TypedRange<Void>(offset, length, null);
    oldFileUncompressionPlan.add(range);
    lastReadOffset = offset + length; // To check that the next range starts after the current one
  }

  // Read new file recompression instructions
  int numDeltaFriendlyNewFileRecompressionInstructions = dataIn.readInt();
  checkNonNegative(
      numDeltaFriendlyNewFileRecompressionInstructions,
      "delta-friendly new file recompression instruction count");
  List<TypedRange<JreDeflateParameters>> deltaFriendlyNewFileRecompressionPlan =
      new ArrayList<TypedRange<JreDeflateParameters>>(
          numDeltaFriendlyNewFileRecompressionInstructions);
  lastReadOffset = -1;
  for (int x = 0; x < numDeltaFriendlyNewFileRecompressionInstructions; x++) {
    long offset =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file recompression range offset");
    long length =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file recompression range length");
    if (offset < lastReadOffset) {
      throw new PatchFormatException(
          "delta-friendly new file recompression ranges out of order or overlapping");
    }
    lastReadOffset = offset + length; // To check that the next range starts after the current one

    // Read the JreDeflateParameters
    // Note that v1 only supports the default deflate compatibility window.
    checkRange(
        dataIn.readByte(),
        PatchConstants.CompatibilityWindowId.DEFAULT_DEFLATE.patchValue,
        PatchConstants.CompatibilityWindowId.DEFAULT_DEFLATE.patchValue,
        "compatibility window id");
    int level = (int) checkRange(dataIn.readUnsignedByte(), 1, 9, "recompression level");
    int strategy = (int) checkRange(dataIn.readUnsignedByte(), 0, 2, "recompression strategy");
    int nowrapInt = (int) checkRange(dataIn.readUnsignedByte(), 0, 1, "recompression nowrap");
    TypedRange<JreDeflateParameters> range =
        new TypedRange<JreDeflateParameters>(
            offset,
            length,
            JreDeflateParameters.of(level, strategy, nowrapInt == 0 ? false : true));
    deltaFriendlyNewFileRecompressionPlan.add(range);
  }

  // Read the delta metadata, but stop before the first byte of the actual delta.
  // V1 has exactly one delta and it must be bsdiff.
  int numDeltaRecords = (int) checkRange(dataIn.readInt(), 1, 1, "num delta records");

  List<DeltaDescriptor> deltaDescriptors = new ArrayList<DeltaDescriptor>(numDeltaRecords);
  for (int x = 0; x < numDeltaRecords; x++) {
    byte deltaFormatByte =
        (byte)
            checkRange(
                dataIn.readByte(),
                PatchConstants.DeltaFormat.BSDIFF.patchValue,
                PatchConstants.DeltaFormat.BSDIFF.patchValue,
                "delta format");
    long deltaFriendlyOldFileWorkRangeOffset =
        checkNonNegative(dataIn.readLong(), "delta-friendly old file work range offset");
    long deltaFriendlyOldFileWorkRangeLength =
        checkNonNegative(dataIn.readLong(), "delta-friendly old file work range length");
    long deltaFriendlyNewFileWorkRangeOffset =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file work range offset");
    long deltaFriendlyNewFileWorkRangeLength =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file work range length");
    long deltaLength = checkNonNegative(dataIn.readLong(), "delta length");
    DeltaDescriptor descriptor =
        new DeltaDescriptor(
            PatchConstants.DeltaFormat.fromPatchValue(deltaFormatByte),
            new TypedRange<Void>(
                deltaFriendlyOldFileWorkRangeOffset, deltaFriendlyOldFileWorkRangeLength, null),
            new TypedRange<Void>(
                deltaFriendlyNewFileWorkRangeOffset, deltaFriendlyNewFileWorkRangeLength, null),
            deltaLength);
    deltaDescriptors.add(descriptor);
  }

  return new PatchApplyPlan(
      Collections.unmodifiableList(oldFileUncompressionPlan),
      deltaFriendlyOldFileSize,
      Collections.unmodifiableList(deltaFriendlyNewFileRecompressionPlan),
      Collections.unmodifiableList(deltaDescriptors));
}

The steps are:

  • Read and verify the file header
  • Skip the 4 flag bytes
  • Read the size of the old delta-friendly file, checked non-negative
  • Read the number of old-file ranges to uncompress, checked non-negative
  • Read the n old-file ranges (offset, length), each checked non-negative
  • Read the number of new-file ranges to recompress, checked non-negative
  • Read the n new-file ranges (offset, length), each checked non-negative, together with the compression level, strategy and nowrap value for each
  • Read the number of delta descriptors
  • Read the n delta descriptors: the delta format id, the offset and length of the old-file region the delta applies to, the offset and length of the new-file region it applies to, and the delta length
  • Return the PatchApplyPlan object

Next, the plan's list of old-file TypedRange to uncompress is handed to DeltaFriendlyFile.generateDeltaFriendlyFile to produce the old delta-friendly file, exactly as during patch generation, so the process is not repeated here. The code:

private void writeDeltaFriendlyOldBlob(
    PatchApplyPlan plan, File oldBlob, File deltaFriendlyOldBlob) throws IOException {
  RandomAccessFileOutputStream deltaFriendlyOldFileOut = null;
  try {
    deltaFriendlyOldFileOut =
        new RandomAccessFileOutputStream(deltaFriendlyOldBlob, plan.getDeltaFriendlyOldFileSize());
    DeltaFriendlyFile.generateDeltaFriendlyFile(
        plan.getOldFileUncompressionPlan(),
        oldBlob,
        deltaFriendlyOldFileOut,
        false,
        DEFAULT_COPY_BUFFER_SIZE);
  } finally {
    try {
      deltaFriendlyOldFileOut.close();
    } catch (Exception ignored) {
      // Nothing
    }
  }
}

Finally, the new file is produced. BsPatch writes its output into an OutputStream that has been wrapped, decorator style, in a PartiallyCompressingOutputStream; that wrapper is constructed with the list of TypedRange describing which parts of the new delta-friendly data must be recompressed. The actual Zip reconstruction ends up in PartiallyCompressingOutputStream.writeChunk:

private int writeChunk(byte[] buffer, int offset, int length) throws IOException {
  if (bytesTillCompressionStarts() == 0 && !currentlyCompressing()) {
    // Compression will begin immediately.
    JreDeflateParameters parameters = nextCompressedRange.getMetadata();
    if (deflater == null) {
      deflater = new Deflater(parameters.level, parameters.nowrap);
    } else if (lastDeflateParameters.nowrap != parameters.nowrap) {
      // Last deflater must be destroyed because nowrap settings do not match.
      deflater.end();
      deflater = new Deflater(parameters.level, parameters.nowrap);
    }
    // Deflater will already have been reset at the end of this method, no need to do it again.
    // Just set up the right parameters.
    deflater.setLevel(parameters.level);
    deflater.setStrategy(parameters.strategy);
    deflaterOut = new DeflaterOutputStream(normalOut, deflater, compressionBufferSize);
  }

  int numBytesToWrite;
  OutputStream writeTarget;
  if (currentlyCompressing()) {
    // Don't write past the end of the compressed range.
    numBytesToWrite = (int) Math.min(length, bytesTillCompressionEnds());
    writeTarget = deflaterOut;
  } else {
    writeTarget = normalOut;
    if (nextCompressedRange == null) {
      // All compression ranges have been consumed.
      numBytesToWrite = length;
    } else {
      // Don't write past the point where the next compressed range begins.
      numBytesToWrite = (int) Math.min(length, bytesTillCompressionStarts());
    }
  }

  writeTarget.write(buffer, offset, numBytesToWrite);
  numBytesWritten += numBytesToWrite;

  if (currentlyCompressing() && bytesTillCompressionEnds() == 0) {
    // Compression range complete. Finish the output and set up for the next run.
    deflaterOut.finish();
    deflaterOut.flush();
    deflaterOut = null;
    deflater.reset();
    lastDeflateParameters = nextCompressedRange.getMetadata();
    if (rangeIterator.hasNext()) {
      // More compression ranges await in the future.
      nextCompressedRange = rangeIterator.next();
    } else {
      // All compression ranges have been consumed.
      nextCompressedRange = null;
      deflater.end();
      deflater = null;
    }
  }

  return numBytesToWrite;
}

private boolean currentlyCompressing() {
  return deflaterOut != null;
}

private long bytesTillCompressionStarts() {
  if (nextCompressedRange == null) {
    // All compression ranges have been consumed
    return -1L;
  }
  return nextCompressedRange.getOffset() - numBytesWritten;
}

private long bytesTillCompressionEnds() {
  if (nextCompressedRange == null) {
    // All compression ranges have been consumed
    return -1L;
  }
  return (nextCompressedRange.getOffset() + nextCompressedRange.getLength()) - numBytesWritten;
}

This function is the heart of the reconstruction and is quite cleverly written; it is worth setting a breakpoint and stepping through it. Briefly, the process is:

  • The PartiallyCompressingOutputStream constructor fetches the first range from compressionRanges.
  • If the distance to the next compressed range is 0 and we are not currently compressing, read the compression settings (level, strategy, nowrap), configure the Deflater accordingly, and wrap the output stream in a DeflaterOutputStream.
  • If we are currently compressing, write the smaller of the incoming length and the remaining length of the compressed range, targeting the deflater stream, i.e. compressing rather than copying.
  • If we are not compressing and there is no next compressed range, write the full length; otherwise write the smaller of the incoming length and the distance to the next compressed range, targeting the plain stream, i.e. copying rather than compressing.
  • If we are compressing and the current range has now been fully written, finish and flush the deflater stream, reset the compression state, and advance to the next compressed range.
  • Repeat until all data has been written.

The process is fairly involved, so again a rough sketch (the original post shows this as a figure).
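A schematic view; entry names and the number of ranges are illustrative only:

  delta-friendly new data: [gap][uncompressed A ....][gap][uncompressed B ....][trailing data]
                             |          |              |          |                 |
                           COPY   DEFLATE(level,     COPY   DEFLATE(level,         COPY
                                  strategy, nowrap)         strategy, nowrap)
                             v          v              v          v                 v
  rebuilt new Zip:         [gap][compressed entry A][gap][compressed entry B][trailing data]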

The data being written is the newly synthesized delta-friendly form of the new file.

Gap regions are copied verbatim to the output stream; the ranges that were uncompressed earlier are recompressed with the recorded compression level, strategy and nowrap parameters; and the remaining trailing data is written to the target as-is.

And with that, the whole reconstruction is complete.

Why does this reproduce the Zip file? The analysis above makes it clear: Google Archive Patch records, for every piece of data in the new file that needs recompression, the parameters to use. That data is recompressed with those parameters and written at its real position in the new file, while everything else in the Zip is copied verbatim. Together, the two operations yield the new Zip file, and for an APK there is no need to worry about the signature.

A genuinely elegant trick!

Compressing and decompressing the patch file

Google Archive Patch does not compress the patch file itself; compressing it to keep it small is left to you, and the client has to decompress it accordingly before applying it. This keeps the choice of patch compression completely free and easy to extend.
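For example, a minimal sketch of compressing the generated patch with the JDK's gzip streams before shipping it; any codec could be substituted, and the file names here are made up:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class CompressPatch {
  public static void main(String[] args) throws IOException {
    try (BufferedInputStream in = new BufferedInputStream(new FileInputStream("archive.patch"));
        GZIPOutputStream out = new GZIPOutputStream(new FileOutputStream("archive.patch.gz"))) {
      byte[] buffer = new byte[32 * 1024];
      int read;
      while ((read = in.read(buffer)) >= 0) {
        out.write(buffer, 0, read);
      }
    }
    // The client would reverse this with a GZIPInputStream before handing the patch
    // stream to FileByFileV1DeltaApplier.applyDelta.
  }
}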

A generic diff/patch framework

Over the past few days I put together a simple generic framework for generating and applying patches; see CorePatch on GitHub. It currently supports bsdiff, Google Archive Patch, and full updates (a plain file copy).

Optimizations

Optimizing patch generation

Patch generation uses bsdiff, but the base files become much larger once uncompressed, so generating the delta takes much longer. There is not much to optimize here; the only real lever is to replace BsDiff with a better delta algorithm.

Optimizing patch application

Application uses BsPatch, which is extremely fast, so there is nothing to gain there. The part worth looking at is the reconstruction of the new Zip file, i.e. the writeChunk function above, and its only expensive operation is compression, which accounts for roughly 80-90% of the total time; so in practice there is little room for optimization here either.

Summary

The core of Google Archive Patch is: produce delta-friendly files, run the delta algorithm over them, and record the offsets and lengths of the regions of the new delta-friendly file that need recompression. When applying the patch, recompress those regions with the compression parameters stored in the patch, and copy everything else, such as the rest of the Zip structure, verbatim. The result is a faithful reconstruction of the new file. The advantage of this approach is that the patch is smaller than a file-level bsdiff patch; the drawbacks are longer generation and application times.

A fundamental requirement of the algorithm is that compressing the same data with the same compression level, encoding strategy and nowrap setting always produces the same bytes. If that assumption does not hold, the whole scheme falls apart.
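A minimal sketch of that requirement, using only the JDK (the method name is mine, not from the project): recompressing an entry's uncompressed bytes with the recorded parameters must reproduce the original compressed bytes exactly, which is also the essence of what divineDeflateParameters checks during patch generation.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class DeterministicDeflate {
  // Returns true if deflating 'uncompressed' with the given settings reproduces
  // 'expectedCompressed' byte for byte on this JVM.
  static boolean reproduces(byte[] uncompressed, byte[] expectedCompressed,
      int level, int strategy, boolean nowrap) throws IOException {
    Deflater deflater = new Deflater(level, nowrap);
    deflater.setStrategy(strategy);
    ByteArrayOutputStream recompressed = new ByteArrayOutputStream();
    try (DeflaterOutputStream out = new DeflaterOutputStream(recompressed, deflater)) {
      out.write(uncompressed);
    }
    deflater.end();
    return Arrays.equals(recompressed.toByteArray(), expectedCompressed);
  }
}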

Original post: http://fucknmb.com/2017/10/05/Google-Archive-Patch-%E6%BA%90%E7%A0%81%E8%A7%A3%E6%9E%90/
