
Google Archive Patch Source Code Analysis


If this article is too long for you, jump straight to the conclusions summarized here:

Google Archive Patch is a delta algorithm built strictly on the Zip file format. Its core diffing algorithm is still BsDiff and its core patching algorithm is still BsPatch; what it adds is a preprocessing step. When generating a patch, the contents of the old and new Zip files are each expanded into a "delta-friendly" file, and the delta is computed between those two files. When applying a patch, the old Zip's contents are expanded into its delta-friendly form, the patch algorithm produces the delta-friendly form of the new file, and then the offsets and lengths recorded in the patch for each ZipEntry, together with the compression level, strategy and nowrap flags, are used to rebuild the new Zip file. Delta-friendly (uncompressed) files are used because changes to uncompressed data are easy to describe: if the string "abc" becomes "abcd", the change is simply "one character 'd' was appended". Once the data has been compressed, the change is no longer easy to describe, so compressed entries are expanded into uncompressed form first.
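To make the last point concrete, here is a minimal, self-contained sketch using only the JDK Deflater (the input strings are made up for illustration; the article's claim is about deflate output in general, not this exact example):

import java.util.zip.Deflater;

public class CompressedBytesAreNotDiffFriendly {
  // Deflate a byte array with the JDK Deflater (nowrap = true, as inside a Zip entry).
  static byte[] deflate(byte[] input) {
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(input);
    deflater.finish();
    byte[] buffer = new byte[input.length + 64];
    int length = deflater.deflate(buffer);
    deflater.end();
    byte[] result = new byte[length];
    System.arraycopy(buffer, 0, result, 0, length);
    return result;
  }

  public static void main(String[] args) {
    byte[] oldText = "some repetitive content ... version 1".getBytes();
    byte[] newText = "some repetitive content ... version 2, one extra clause".getBytes();
    // The plain texts share a long common prefix, so a byte-level diff of them is tiny.
    // The deflated outputs, however, can differ from very near the start (for example the
    // dynamic Huffman tables in the header depend on the whole input), so a byte-level diff
    // of the compressed forms describes the change poorly - which is why the entries are
    // expanded into delta-friendly form before diffing.
    System.out.println("old compressed: " + deflate(oldText).length + " bytes");
    System.out.println("new compressed: " + deflate(newText).length + " bytes");
  }
}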

Slow generation: Google Archive Patch takes longer to generate a patch because, once the Zip contents are uncompressed, the delta-friendly files are larger, so BsDiff has more data to chew through. If the expanded content is, say, twice as large, diffing takes roughly twice as long as diffing the original files.

Slow application: applying the patch also takes longer. Part of the cost is again the larger delta-friendly files, but that is not the root cause; BsPatch itself is extremely fast, and even doubling its input adds negligible time. The real cost is rebuilding the Zip file from the output stream: some regions must be recompressed and others copied, and the bulk of the time goes into the recompression.

Small patches: as explained above, the patch is small because the delta is computed over the delta-friendly files, where the changes between the two files are easy to describe.

Project

Google Archive Patch

Three modules matter: shared, generator and applier. shared is common code used by the other two, generator produces the delta, and applier applies it. generator contains a Java implementation of bsdiff, and applier contains a Java implementation of bspatch.

The Zip file format

Since Google Archive Patch is a delta algorithm built strictly on the Zip format, it is worth reviewing that format first. The articles I found online all describe the structures in file order, which I think is a small mistake: the format is easier to understand back to front, so that is the order used here.

A Zip file generally consists of three sections.
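Schematically, from the start of the file to the end (a rough sketch; the figure from the original post is not reproduced here):

  +----------------------------------------------+
  | Local entries (header + file data, per entry)|
  +----------------------------------------------+
  | Central Directory (one record per entry)     |
  +----------------------------------------------+
  | End of Central Directory (EOCD)              |
  +----------------------------------------------+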

Let's walk through the three sections one at a time, starting with the last one.

End of Central Directory

Offset | Bytes | Description | Notes
0 | 4 | End of Central Directory SIGNATURE = 0x06054b50 | record header marker, fixed value 0x06054b50
4 | 2 | disk number for this archive | ignored
6 | 2 | disk number for the central directory | ignored
8 | 2 | num entries in the central directory on this disk | ignored
10 | 2 | num entries in the central directory overall | total number of central directory records
12 | 4 | the length of the central directory | size of the central directory
16 | 4 | the file offset of the central directory | offset of the central directory
20 | 2 | the length of the zip file comment | comment length
22 | n | from here to the EOF is the zip file comment | comment content

This section consists of a single record with the layout shown above. Its purpose is to locate the Central Directory.

Central Directory

The End of Central Directory record points to the Central Directory; its layout is as follows.

Offset | Bytes | Description | Notes
0 | 4 | Central Directory SIGNATURE = 0x02014b50 | record header marker, fixed value 0x02014b50
4 | 2 | the version-made-by | ignored
6 | 2 | the version-needed-to-extract | ignored
8 | 2 | the general-purpose flags, read for language encoding | general-purpose bit flags
10 | 2 | the compression method | compression method
12 | 2 | the MSDOS last modified file time | last modified time
14 | 2 | the MSDOS last modified file date | last modified date
16 | 4 | the CRC32 of the uncompressed data | crc32 checksum
20 | 4 | the compressed size | compressed size
24 | 4 | the uncompressed size | uncompressed size
28 | 2 | the length of the file name | file name length
30 | 2 | the length of the extras | extra field length
32 | 2 | the length of the comment | file comment length
34 | 2 | the disk number | ignored
36 | 2 | the internal file attributes | ignored
38 | 4 | the external file attributes | ignored
42 | 4 | the offset of the local section entry, where the data is | offset of the local entry
46 | i | the file name | file name
46+i | j | the extras | extra field
46+i+j | k | the comment | file comment

This section consists of n records with the layout shown above. Its purpose is to locate the actual data of each entry in the Zip file.

Contents of ZIP entries

Each Central Directory record points to a local entry; finally, the layout of a local entry:

Offset | Bytes | Description | Notes
0 | 4 | Local Entry SIGNATURE = 0x04034b50 | record header marker, fixed value 0x04034b50
4 | 2 | the version-needed-to-extract | ignored
6 | 2 | the general-purpose flags | general-purpose bit flags
8 | 2 | the compression method | compression method
10 | 2 | the MSDOS last modified file time | last modified time
12 | 2 | the MSDOS last modified file date | last modified date
14 | 4 | the CRC32 of the uncompressed data | crc32 checksum
18 | 4 | the compressed size | compressed size
22 | 4 | the uncompressed size | uncompressed size
26 | 2 | the length of the file name | file name length
28 | 2 | the length of the extras | extra field length
30 | i | the file name | file name
30+i | j | the extras | extra field
30+i+j | k | file data | where the actual compressed data lives

This section consists of n records with the layout shown above.

How Google Archive Patch parses a Zip file

Google Archive Patch implements its own minimal Zip parser. The parsing is done by com.google.archivepatcher.generator.MinimalZipParser, and the parsed data is carried by MinimalCentralDirectoryMetadata, MinimalZipArchive and MinimalZipEntry. The end result is a list of MinimalZipEntry objects sorted by offset.

private static List<MinimalZipEntry> listEntriesInternal(RandomAccessFileInputStream in)
    throws IOException {
  // Step 1: Locate the end-of-central-directory record header.
  long offsetOfEocd = MinimalZipParser.locateStartOfEocd(in, 32768);
  if (offsetOfEocd == -1) {
    // Archive is weird, abort.
    throw new ZipException("EOCD record not found in last 32k of archive, giving up");
  }

  // Step 2: Parse the end-of-central-directory data to locate the central directory itself
  in.setRange(offsetOfEocd, in.length() - offsetOfEocd);
  MinimalCentralDirectoryMetadata centralDirectoryMetadata = MinimalZipParser.parseEocd(in);

  // Step 3: Extract a list of all central directory entries (contiguous data stream)
  in.setRange(
      centralDirectoryMetadata.getOffsetOfCentralDirectory(),
      centralDirectoryMetadata.getLengthOfCentralDirectory());
  List<MinimalZipEntry> minimalZipEntries =
      new ArrayList<MinimalZipEntry>(centralDirectoryMetadata.getNumEntriesInCentralDirectory());
  for (int x = 0; x < centralDirectoryMetadata.getNumEntriesInCentralDirectory(); x++) {
    minimalZipEntries.add(MinimalZipParser.parseCentralDirectoryEntry(in));
  }

  // Step 4: Sort the entries in file order, not central directory order.
  Collections.sort(minimalZipEntries, LOCAL_ENTRY_OFFSET_COMAPRATOR);

  // Step 5: Seek out each local entry and calculate the offset of the compressed data within
  for (int x = 0; x < minimalZipEntries.size(); x++) {
    MinimalZipEntry entry = minimalZipEntries.get(x);
    long offsetOfNextEntry;
    if (x < minimalZipEntries.size() - 1) {
      // Don't allow reading past the start of the next entry, for sanity.
      offsetOfNextEntry = minimalZipEntries.get(x + 1).getFileOffsetOfLocalEntry();
    } else {
      // Last entry. Don't allow reading into the central directory, for sanity.
      offsetOfNextEntry = centralDirectoryMetadata.getOffsetOfCentralDirectory();
    }
    long rangeLength = offsetOfNextEntry - entry.getFileOffsetOfLocalEntry();
    in.setRange(entry.getFileOffsetOfLocalEntry(), rangeLength);
    long relativeDataOffset = MinimalZipParser.parseLocalEntryAndGetCompressedDataOffset(in);
    entry.setFileOffsetOfCompressedData(entry.getFileOffsetOfLocalEntry() + relativeDataOffset);
  }

  // Done!
  return minimalZipEntries;
}

The code above does the following:

  • Locate the start offset of the End of Central Directory record
  • Locate the Central Directory section
  • Parse the Central Directory records
  • Sort the entries by offset, ascending
  • Parse each local entry to find the offset of its actual data

Locating the End of Central Directory is straightforward: scan the bytes for the signature 0x06054b50. The implementation scans the last 32 KB of the Zip file, returns the offset when the signature is found, and throws an exception otherwise. One question is what happens if the record does not lie within the last 32 KB; I could not find any guarantee that it always does, and Android's Multidex implementation scans the last 64 KB, so let's just assume the scan succeeds. The implementation:

public static long locateStartOfEocd(RandomAccessFileInputStream in, int searchBufferLength)
    throws IOException {
  final int maxBufferSize = (int) Math.min(searchBufferLength, in.length());
  final byte[] buffer = new byte[maxBufferSize]; // 32k
  final long rangeStart = in.length() - buffer.length;
  in.setRange(rangeStart, buffer.length);
  readOrDie(in, buffer, 0, buffer.length); // read to buffer
  int offset = locateStartOfEocd(buffer); // locate
  if (offset == -1) {
    return -1;
  }
  return rangeStart + offset;
}

public static int locateStartOfEocd(byte[] buffer) {
  int last4Bytes = 0; // This is the 32 bits of data from the file
  for (int offset = buffer.length - 1; offset >= 0; offset--) {
    last4Bytes <<= 8;
    last4Bytes |= buffer[offset];
    if (last4Bytes == EOCD_SIGNATURE) { // 0x06054b50
      return offset;
    }
  }
  return -1;
}

Once the start of the End of Central Directory is known, the record is parsed into a MinimalCentralDirectoryMetadata:

public static MinimalCentralDirectoryMetadata parseEocd(InputStream in)
    throws IOException, ZipException {
  if (((int) read32BitUnsigned(in)) != EOCD_SIGNATURE) { // 0x06054b50
    throw new ZipException("Bad eocd header");
  }

  // *** 4 bytes encode EOCD_SIGNATURE, ignore (already found and verified).
  // 2 bytes encode disk number for this archive, ignore.
  // 2 bytes encode disk number for the central directory, ignore.
  // 2 bytes encode num entries in the central directory on this disk, ignore.
  // *** 2 bytes encode num entries in the central directory overall [READ THIS]
  // *** 4 bytes encode the length of the central directory [READ THIS]
  // *** 4 bytes encode the file offset of the central directory [READ THIS]
  // 2 bytes encode the length of the zip file comment, ignore.
  // Everything else from here to the EOF is the zip file comment, or junk. Ignore.
  skipOrDie(in, 2 + 2 + 2);
  int numEntriesInCentralDirectory = read16BitUnsigned(in); // number
  if (numEntriesInCentralDirectory == 0xffff) {
    // If 0xffff, this is a zip64 archive and this code doesn't handle that.
    throw new ZipException("No support for zip64");
  }
  long lengthOfCentralDirectory = read32BitUnsigned(in); // length
  long offsetOfCentralDirectory = read32BitUnsigned(in); // offset
  return new MinimalCentralDirectoryMetadata(
      numEntriesInCentralDirectory, offsetOfCentralDirectory, lengthOfCentralDirectory);
}

As the code shows, only three important values are extracted:

  • the number of Central Directory records, n
  • the offset of the Central Directory
  • the total length of the Central Directory

The parser then restricts the readable range to [offset, offset+length] (backed internally by a RandomAccessFile) and loops n times, parsing one Central Directory record per iteration. The code for parsing a single record:

public static MinimalZipEntry parseCentralDirectoryEntry(InputStream in) throws IOException {
  // *** 4 bytes encode the CENTRAL_DIRECTORY_ENTRY_SIGNATURE, verify for sanity
  // 2 bytes encode the version-made-by, ignore
  // 2 bytes encode the version-needed-to-extract, ignore
  // *** 2 bytes encode the general-purpose flags, read for language encoding. [READ THIS]
  // *** 2 bytes encode the compression method, [READ THIS]
  // 2 bytes encode the MSDOS last modified file time, ignore
  // 2 bytes encode the MSDOS last modified file date, ignore
  // *** 4 bytes encode the CRC32 of the uncompressed data [READ THIS]
  // *** 4 bytes encode the compressed size [READ THIS]
  // *** 4 bytes encode the uncompressed size [READ THIS]
  // *** 2 bytes encode the length of the file name [READ THIS]
  // *** 2 bytes encode the length of the extras, needed to skip the bytes later [READ THIS]
  // *** 2 bytes encode the length of the comment, needed to skip the bytes later [READ THIS]
  // 2 bytes encode the disk number, ignore
  // 2 bytes encode the internal file attributes, ignore
  // 4 bytes encode the external file attributes, ignore
  // *** 4 bytes encode the offset of the local section entry, where the data is [READ THIS]
  // n bytes encode the file name
  // n bytes encode the extras
  // n bytes encode the comment
  if (((int) read32BitUnsigned(in)) != CENTRAL_DIRECTORY_ENTRY_SIGNATURE) {
    throw new ZipException("Bad central directory header");
  }
  skipOrDie(in, 2 + 2); // Skip version stuff
  int generalPurposeFlags = read16BitUnsigned(in);
  int compressionMethod = read16BitUnsigned(in);
  skipOrDie(in, 2 + 2); // Skip MSDOS junk
  long crc32OfUncompressedData = read32BitUnsigned(in);
  long compressedSize = read32BitUnsigned(in);
  long uncompressedSize = read32BitUnsigned(in);
  int fileNameLength = read16BitUnsigned(in);
  int extrasLength = read16BitUnsigned(in);
  int commentLength = read16BitUnsigned(in);
  skipOrDie(in, 2 + 2 + 4); // Skip the disk number and file attributes
  long fileOffsetOfLocalEntry = read32BitUnsigned(in);
  byte[] fileNameBuffer = new byte[fileNameLength];
  readOrDie(in, fileNameBuffer, 0, fileNameBuffer.length);
  skipOrDie(in, extrasLength + commentLength);
  // General purpose flag bit 11 is an important hint for the character set used for file names.
  boolean generalPurposeFlagBit11 = (generalPurposeFlags & (0x1 << 10)) != 0;
  return new MinimalZipEntry(
      compressionMethod,
      crc32OfUncompressedData,
      compressedSize,
      uncompressedSize,
      fileNameBuffer,
      generalPurposeFlagBit11,
      fileOffsetOfLocalEntry);
}

It extracts the following data:

  • compression method
  • crc32 checksum
  • uncompressed size
  • compressed size
  • file name
  • general-purpose flags
  • offset of the local entry

The result is a list of n MinimalZipEntry objects. After sorting by offset in ascending order, the list is traversed again and, for each entry, the offset of its actual data inside the local entry is computed:

public static long parseLocalEntryAndGetCompressedDataOffset(InputStream in) throws IOException {
  // *** 4 bytes encode the LOCAL_ENTRY_SIGNATURE, verify for sanity
  // 2 bytes encode the version-needed-to-extract, ignore
  // 2 bytes encode the general-purpose flags, ignore
  // 2 bytes encode the compression method, ignore (redundant with central directory)
  // 2 bytes encode the MSDOS last modified file time, ignore
  // 2 bytes encode the MSDOS last modified file date, ignore
  // 4 bytes encode the CRC32 of the uncompressed data, ignore (redundant with central directory)
  // 4 bytes encode the compressed size, ignore (redundant with central directory)
  // 4 bytes encode the uncompressed size, ignore (redundant with central directory)
  // *** 2 bytes encode the length of the file name, needed to skip the bytes later [READ THIS]
  // *** 2 bytes encode the length of the extras, needed to skip the bytes later [READ THIS]
  // The rest is the data, which is the main attraction here.
  if (((int) read32BitUnsigned(in)) != LOCAL_ENTRY_SIGNATURE) {
    throw new ZipException("Bad local entry header");
  }
  int junkLength = 2 + 2 + 2 + 2 + 2 + 4 + 4 + 4;
  skipOrDie(in, junkLength); // Skip everything up to the length of the file name
  final int fileNameLength = read16BitUnsigned(in);
  final int extrasLength = read16BitUnsigned(in);

  // The file name is already known and will match the central directory, so no need to read it.
  // The extra field length can be different here versus in the central directory and is used for
  // things like zipaligning APKs. This single value is the critical part as it dictates where the
  // actual DATA for the entry begins.
  return 4 + junkLength + 2 + 2 + fileNameLength + extrasLength;
}

Very simple: it skips everything in the local entry that precedes the actual data and returns that offset. For example, for an entry with a 10-byte file name and an empty local extra field, the compressed data starts 4 + 22 + 2 + 2 + 10 + 0 = 40 bytes after the local entry signature, matching the 30 + i + j formula in the table above.

That completes the Zip parsing.

Generating the delta file

The implementation lives mainly in FileByFileV1DeltaGenerator:

@Override
public void generateDelta(File oldFile, File newFile, OutputStream patchOut)
    throws IOException, InterruptedException {
  try (TempFileHolder deltaFriendlyOldFile = new TempFileHolder();
      TempFileHolder deltaFriendlyNewFile = new TempFileHolder();
      TempFileHolder deltaFile = new TempFileHolder();
      FileOutputStream deltaFileOut = new FileOutputStream(deltaFile.file);
      BufferedOutputStream bufferedDeltaOut = new BufferedOutputStream(deltaFileOut)) {
    PreDiffExecutor.Builder builder =
        new PreDiffExecutor.Builder()
            .readingOriginalFiles(oldFile, newFile)
            .writingDeltaFriendlyFiles(deltaFriendlyOldFile.file, deltaFriendlyNewFile.file);
    for (RecommendationModifier modifier : recommendationModifiers) {
      builder.withRecommendationModifier(modifier);
    }
    PreDiffExecutor executor = builder.build();
    PreDiffPlan preDiffPlan = executor.prepareForDiffing();
    DeltaGenerator deltaGenerator = getDeltaGenerator();
    deltaGenerator.generateDelta(
        deltaFriendlyOldFile.file, deltaFriendlyNewFile.file, bufferedDeltaOut);
    bufferedDeltaOut.close();
    PatchWriter patchWriter =
        new PatchWriter(
            preDiffPlan,
            deltaFriendlyOldFile.file.length(),
            deltaFriendlyNewFile.file.length(),
            deltaFile.file);
    patchWriter.writeV1Patch(patchOut);
  }
}

protected DeltaGenerator getDeltaGenerator() {
  return new BsDiffDeltaGenerator();
}

It does the following:

  • Create three temporary files: the delta-friendly old file, the delta-friendly new file, and the delta file. All three are deleted automatically when the JVM exits.
  • Call PreDiffExecutor.prepareForDiffing to build a PreDiffPlan; this method does a great deal of complex work, detailed below.
  • Run the BsDiff delta algorithm to produce the delta file.
  • Write the patch file; the patch format is described later.

Now let's look at PreDiffExecutor.prepareForDiffing:

public PreDiffPlan prepareForDiffing() throws IOException {
  PreDiffPlan preDiffPlan = generatePreDiffPlan();
  List<TypedRange<JreDeflateParameters>> deltaFriendlyNewFileRecompressionPlan = null;
  if (deltaFriendlyOldFile != null) {
    // Builder.writingDeltaFriendlyFiles() ensures old and new are non-null when called, so a
    // check on either is sufficient.
    deltaFriendlyNewFileRecompressionPlan =
        Collections.unmodifiableList(generateDeltaFriendlyFiles(preDiffPlan));
  }
  return new PreDiffPlan(
      preDiffPlan.getQualifiedRecommendations(),
      preDiffPlan.getOldFileUncompressionPlan(),
      preDiffPlan.getNewFileUncompressionPlan(),
      deltaFriendlyNewFileRecompressionPlan);
}

It does the following:

  • Call generatePreDiffPlan to build a PreDiffPlan object; this function is detailed below.
  • Using that plan, call generateDeltaFriendlyFiles to produce the delta-friendly files; also detailed below.
  • Construct a new PreDiffPlan from the recommendation list, the list of old-file entries to uncompress, the list of new-file entries to uncompress, and the recompression plan for the new delta-friendly file.

Now for generatePreDiffPlan:

private PreDiffPlan generatePreDiffPlan() throws IOException {
  Map<ByteArrayHolder, MinimalZipEntry> originalOldArchiveZipEntriesByPath =
      new HashMap<ByteArrayHolder, MinimalZipEntry>();
  Map<ByteArrayHolder, MinimalZipEntry> originalNewArchiveZipEntriesByPath =
      new HashMap<ByteArrayHolder, MinimalZipEntry>();
  Map<ByteArrayHolder, JreDeflateParameters> originalNewArchiveJreDeflateParametersByPath =
      new HashMap<ByteArrayHolder, JreDeflateParameters>();

  for (MinimalZipEntry zipEntry : MinimalZipArchive.listEntries(originalOldFile)) {
    ByteArrayHolder key = new ByteArrayHolder(zipEntry.getFileNameBytes());
    originalOldArchiveZipEntriesByPath.put(key, zipEntry);
  }

  DefaultDeflateCompressionDiviner diviner = new DefaultDeflateCompressionDiviner();
  for (DivinationResult divinationResult : diviner.divineDeflateParameters(originalNewFile)) {
    ByteArrayHolder key = new ByteArrayHolder(divinationResult.minimalZipEntry.getFileNameBytes());
    originalNewArchiveZipEntriesByPath.put(key, divinationResult.minimalZipEntry);
    originalNewArchiveJreDeflateParametersByPath.put(key, divinationResult.divinedParameters);
  }

  PreDiffPlanner preDiffPlanner =
      new PreDiffPlanner(
          originalOldFile,
          originalOldArchiveZipEntriesByPath,
          originalNewFile,
          originalNewArchiveZipEntriesByPath,
          originalNewArchiveJreDeflateParametersByPath,
          recommendationModifiers.toArray(new RecommendationModifier[] {}));
  return preDiffPlanner.generatePreDiffPlan();
}

public List<DivinationResult> divineDeflateParameters(File archiveFile) throws IOException {
  List<DivinationResult> results = new ArrayList<>();
  for (MinimalZipEntry minimalZipEntry : MinimalZipArchive.listEntries(archiveFile)) {
    JreDeflateParameters divinedParameters = null;
    if (minimalZipEntry.isDeflateCompressed()) {
      // TODO(pasc): Reuse streams to avoid churning file descriptors
      MultiViewInputStreamFactory isFactory =
          new RandomAccessFileInputStreamFactory(
              archiveFile,
              minimalZipEntry.getFileOffsetOfCompressedData(),
              minimalZipEntry.getCompressedSize());

      // Keep small entries in memory to avoid unnecessary file I/O.
      if (minimalZipEntry.getCompressedSize() < (100 * 1024)) {
        try (InputStream is = isFactory.newStream()) {
          byte[] compressedBytes = new byte[(int) minimalZipEntry.getCompressedSize()];
          is.read(compressedBytes);
          divinedParameters =
              divineDeflateParameters(new ByteArrayInputStreamFactory(compressedBytes));
        } catch (Exception ignore) {
          divinedParameters = null;
        }
      } else {
        divinedParameters = divineDeflateParameters(isFactory);
      }
    }
    results.add(new DivinationResult(minimalZipEntry, divinedParameters));
  }
  return results;
}

public JreDeflateParameters divineDeflateParameters(
    MultiViewInputStreamFactory compressedDataInputStreamFactory) throws IOException {
  byte[] copyBuffer = new byte[32 * 1024];
  // Iterate over all relevant combinations of nowrap, strategy and level.
  for (boolean nowrap : new boolean[] {true, false}) {
    Inflater inflater = new Inflater(nowrap);
    Deflater deflater = new Deflater(0, nowrap);

    strategy_loop:
    for (int strategy : new int[] {0, 1, 2}) {
      deflater.setStrategy(strategy);
      for (int level : LEVELS_BY_STRATEGY.get(strategy)) {
        deflater.setLevel(level);
        inflater.reset();
        deflater.reset();
        try {
          if (matches(inflater, deflater, compressedDataInputStreamFactory, copyBuffer)) {
            end(inflater, deflater);
            return JreDeflateParameters.of(level, strategy, nowrap);
          }
        } catch (ZipException e) {
          // Parse error in input. The only possibilities are corruption or the wrong nowrap.
          // Skip all remaining levels and strategies.
          break strategy_loop;
        }
      }
    }
    end(inflater, deflater);
  }
  return null;
}

generatePreDiffPlan builds three map objects:

  • The first map holds the old archive's data: the key is a ByteArrayHolder wrapping the entry's file name bytes, and the value is the MinimalZipEntry.
  • The second map holds the new archive's data, keyed the same way.
  • The third map holds the divined deflate settings for each entry in the new archive, i.e. the compression level, strategy and nowrap flag: the key is again a ByteArrayHolder of the file name bytes, and the value is a JreDeflateParameters.

The first two maps come straight from the Zip parsing described earlier: listEntries returns a List of MinimalZipEntry, the key is MinimalZipEntry.getFileNameBytes(), and the value is the entry itself.

The third map is obtained the hard way, by divination. The approach is brute force: three nested loops decompress the entry's data and then recompress it with every combination of level, strategy and nowrap; if the recompressed bytes equal the compressed bytes parsed out of the Zip, the corresponding level, strategy and nowrap have been found. These three values are carried by a JreDeflateParameters.

These three maps are used to construct a PreDiffPlanner, whose generatePreDiffPlan method returns the PreDiffPlan:

PreDiffPlan generatePreDiffPlan() throws IOException {
  List<QualifiedRecommendation> recommendations = getDefaultRecommendations();
  for (RecommendationModifier modifier : recommendationModifiers) {
    // Allow changing the recommendations base on arbitrary criteria.
    recommendations = modifier.getModifiedRecommendations(oldFile, newFile, recommendations);
  }

  // Process recommendations to extract ranges for decompression & recompression
  Set<TypedRange<Void>> oldFilePlan = new HashSet<>();
  Set<TypedRange<JreDeflateParameters>> newFilePlan = new HashSet<>();
  for (QualifiedRecommendation recommendation : recommendations) {
    if (recommendation.getRecommendation().uncompressOldEntry) {
      long offset = recommendation.getOldEntry().getFileOffsetOfCompressedData();
      long length = recommendation.getOldEntry().getCompressedSize();
      TypedRange<Void> range = new TypedRange<Void>(offset, length, null);
      oldFilePlan.add(range);
    }
    if (recommendation.getRecommendation().uncompressNewEntry) {
      long offset = recommendation.getNewEntry().getFileOffsetOfCompressedData();
      long length = recommendation.getNewEntry().getCompressedSize();
      JreDeflateParameters newJreDeflateParameters =
          newArchiveJreDeflateParametersByPath.get(
              new ByteArrayHolder(recommendation.getNewEntry().getFileNameBytes()));
      TypedRange<JreDeflateParameters> range =
          new TypedRange<JreDeflateParameters>(offset, length, newJreDeflateParameters);
      newFilePlan.add(range);
    }
  }

  List<TypedRange<Void>> oldFilePlanList = new ArrayList<>(oldFilePlan);
  Collections.sort(oldFilePlanList);
  List<TypedRange<JreDeflateParameters>> newFilePlanList = new ArrayList<>(newFilePlan);
  Collections.sort(newFilePlanList);
  return new PreDiffPlan(
      Collections.unmodifiableList(recommendations),
      Collections.unmodifiableList(oldFilePlanList),
      Collections.unmodifiableList(newFilePlanList));
}

This function mainly produces two lists:

  • For the old file, the offset and length of the compressed data of every entry recommended for uncompression, carried by TypedRange with a Void type parameter.
  • For the new file, the offset and length of the compressed data of every entry recommended for uncompression, carried by TypedRange with a JreDeflateParameters type parameter, whose value comes from the third map built in the previous step.

Both lists are sorted by offset in ascending order.

Where do the entries "recommended for uncompression" come from? From the following function:

private List<QualifiedRecommendation> getDefaultRecommendations() throws IOException {
  List<QualifiedRecommendation> recommendations = new ArrayList<>();

  // This will be used to find files that have been renamed, but not modified. This is relatively
  // cheap to construct as it just requires indexing all entries by the uncompressed CRC32, and
  // the CRC32 is already available in the ZIP headers.
  SimilarityFinder trivialRenameFinder =
      new Crc32SimilarityFinder(oldFile, oldArchiveZipEntriesByPath.values());

  // Iterate over every pair of entries and get a recommendation for what to do.
  for (Map.Entry<ByteArrayHolder, MinimalZipEntry> newEntry :
      newArchiveZipEntriesByPath.entrySet()) {
    ByteArrayHolder newEntryPath = newEntry.getKey();
    MinimalZipEntry oldZipEntry = oldArchiveZipEntriesByPath.get(newEntryPath);
    if (oldZipEntry == null) {
      // The path is only present in the new archive, not in the old archive. Try to find a
      // similar file in the old archive that can serve as a diff base for the new file.
      List<MinimalZipEntry> identicalEntriesInOldArchive =
          trivialRenameFinder.findSimilarFiles(newFile, newEntry.getValue());
      if (!identicalEntriesInOldArchive.isEmpty()) {
        // An identical file exists in the old archive at a different path. Use it for the
        // recommendation and carry on with the normal logic.
        // All entries in the returned list are identical, so just pick the first one.
        // NB, in principle it would be optimal to select the file that required the least work
        // to apply the patch - in practice, it is unlikely that an archive will contain multiple
        // copies of the same file that are compressed differently, so don't bother with that
        // degenerate case.
        oldZipEntry = identicalEntriesInOldArchive.get(0);
      }
    }

    // If the attempt to find a suitable diff base for the new entry has failed, oldZipEntry is
    // null (nothing to do in that case). Otherwise, there is an old entry that is relevant, so
    // get a recommendation for what to do.
    if (oldZipEntry != null) {
      recommendations.add(getRecommendation(oldZipEntry, newEntry.getValue()));
    }
  }
  return recommendations;
}

This function does the following:

  • Build a similarity finder, internally a map keyed by CRC32 whose values are lists of old-file MinimalZipEntry (several files can share the same CRC32).
  • Iterate over the new file's MinimalZipEntry list and check whether the same path exists in the old archive. If it does not, look up entries with the same CRC32 via the similarity finder and take the first match; if there is no match, the entry has no diff base in the old archive and nothing is done for it.
  • For each resulting old/new entry pair, call getRecommendation to obtain a QualifiedRecommendation and add it to the list; this object holds the old and new entries and whether each should be uncompressed.
  • Return the list of QualifiedRecommendation.

How is a QualifiedRecommendation decided? It comes from getRecommendation:

private QualifiedRecommendation getRecommendation(MinimalZipEntry oldEntry, MinimalZipEntry newEntry)
    throws IOException {

  // Reject anything that is unsuitable for uncompressed diffing.
  // Reason singled out in order to monitor unsupported versions of zlib.
  if (unsuitableDeflate(newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEITHER,
        RecommendationReason.DEFLATE_UNSUITABLE);
  }

  // Reject anything that is unsuitable for uncompressed diffing.
  if (unsuitable(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEITHER,
        RecommendationReason.UNSUITABLE);
  }

  // If both entries are already uncompressed there is nothing to do.
  if (bothEntriesUncompressed(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEITHER,
        RecommendationReason.BOTH_ENTRIES_UNCOMPRESSED);
  }

  // The following are now true:
  // 1. At least one of the entries is compressed.
  // 1. The old entry is either uncompressed, or is compressed with deflate.
  // 2. The new entry is either uncompressed, or is reproducibly compressed with deflate.

  if (uncompressedChangedToCompressed(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_NEW,
        RecommendationReason.UNCOMPRESSED_CHANGED_TO_COMPRESSED);
  }

  if (compressedChangedToUncompressed(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_OLD,
        RecommendationReason.COMPRESSED_CHANGED_TO_UNCOMPRESSED);
  }

  // At this point, both entries must be compressed with deflate.
  if (compressedBytesChanged(oldEntry, newEntry)) {
    return new QualifiedRecommendation(
        oldEntry,
        newEntry,
        Recommendation.UNCOMPRESS_BOTH,
        RecommendationReason.COMPRESSED_BYTES_CHANGED);
  }

  // If the compressed bytes have not changed, there is no need to do anything.
  return new QualifiedRecommendation(
      oldEntry,
      newEntry,
      Recommendation.UNCOMPRESS_NEITHER,
      RecommendationReason.COMPRESSED_BYTES_IDENTICAL);
}

There are seven cases:

  • The entry is compressed but its JreDeflateParameters could not be divined, i.e. the compression level, strategy and nowrap flag are unknown. Without those three parameters the data cannot be recompressed, so the recommendation is to uncompress neither entry, the reason being that no suitable deflate parameters were found to reproduce the compressed data.
  • The old or new entry is compressed with an unsupported compression method; recommend uncompressing neither, because the compression algorithm is unsupported.
  • Both entries are already uncompressed; recommend uncompressing neither, because there is nothing to uncompress.
  • The old entry is uncompressed and the new one is compressed; recommend uncompressing the new entry, because the file changed from uncompressed to compressed.
  • The old entry is compressed and the new one is uncompressed; recommend uncompressing the old entry, because the file changed from compressed to uncompressed.
  • Both entries are compressed and the compressed bytes changed; recommend uncompressing both, because the file changed.
  • The compressed bytes did not change; recommend uncompressing neither, because the file did not change.

With all of that in place, here is how the delta-friendly files are generated:

private List<TypedRange<JreDeflateParameters>> generateDeltaFriendlyFiles(PreDiffPlan preDiffPlan)
    throws IOException {
  try (FileOutputStream out = new FileOutputStream(deltaFriendlyOldFile);
      BufferedOutputStream bufferedOut = new BufferedOutputStream(out)) {
    DeltaFriendlyFile.generateDeltaFriendlyFile(
        preDiffPlan.getOldFileUncompressionPlan(), originalOldFile, bufferedOut);
  }
  try (FileOutputStream out = new FileOutputStream(deltaFriendlyNewFile);
      BufferedOutputStream bufferedOut = new BufferedOutputStream(out)) {
    return DeltaFriendlyFile.generateDeltaFriendlyFile(
        preDiffPlan.getNewFileUncompressionPlan(), originalNewFile, bufferedOut);
  }
}

public static <T> List<TypedRange<T>> generateDeltaFriendlyFile(
    List<TypedRange<T>> rangesToUncompress, File file, OutputStream deltaFriendlyOut)
    throws IOException {
  return generateDeltaFriendlyFile(
      rangesToUncompress, file, deltaFriendlyOut, true, DEFAULT_COPY_BUFFER_SIZE);
}

public static <T> List<TypedRange<T>> generateDeltaFriendlyFile(
    List<TypedRange<T>> rangesToUncompress,
    File file,
    OutputStream deltaFriendlyOut,
    boolean generateInverse,
    int copyBufferSize)
    throws IOException {
  List<TypedRange<T>> inverseRanges = null;
  if (generateInverse) {
    inverseRanges = new ArrayList<TypedRange<T>>(rangesToUncompress.size());
  }
  long lastReadOffset = 0;
  RandomAccessFileInputStream oldFileRafis = null;
  PartiallyUncompressingPipe filteredOut =
      new PartiallyUncompressingPipe(deltaFriendlyOut, copyBufferSize);
  try {
    oldFileRafis = new RandomAccessFileInputStream(file);
    for (TypedRange<T> rangeToUncompress : rangesToUncompress) {
      long gap = rangeToUncompress.getOffset() - lastReadOffset;
      if (gap > 0) {
        // Copy bytes up to the range start point
        oldFileRafis.setRange(lastReadOffset, gap);
        filteredOut.pipe(oldFileRafis, PartiallyUncompressingPipe.Mode.COPY);
      }

      // Now uncompress the range.
      oldFileRafis.setRange(rangeToUncompress.getOffset(), rangeToUncompress.getLength());
      long inverseRangeStart = filteredOut.getNumBytesWritten();
      // TODO(andrewhayden): Support nowrap=false here? Never encountered in practice.
      // This would involve catching the ZipException, checking if numBytesWritten is still zero,
      // resetting the stream and trying again.
      filteredOut.pipe(oldFileRafis, PartiallyUncompressingPipe.Mode.UNCOMPRESS_NOWRAP);
      lastReadOffset = rangeToUncompress.getOffset() + rangeToUncompress.getLength();

      if (generateInverse) {
        long inverseRangeEnd = filteredOut.getNumBytesWritten();
        long inverseRangeLength = inverseRangeEnd - inverseRangeStart;
        TypedRange<T> inverseRange =
            new TypedRange<T>(inverseRangeStart, inverseRangeLength, rangeToUncompress.getMetadata());
        inverseRanges.add(inverseRange);
      }
    }

    // Finish the final bytes of the file
    long bytesLeft = oldFileRafis.length() - lastReadOffset;
    if (bytesLeft > 0) {
      oldFileRafis.setRange(lastReadOffset, bytesLeft);
      filteredOut.pipe(oldFileRafis, PartiallyUncompressingPipe.Mode.COPY);
    }
  } finally {
    try {
      oldFileRafis.close();
    } catch (Exception ignored) {
      // Nothing
    }
    try {
      filteredOut.close();
    } catch (Exception ignored) {
      // Nothing
    }
  }
  return inverseRanges;
}

This function is clever and somewhat involved. The process is:

  • Iterate over the list of ranges to uncompress. For each range, subtract the last read position (lastReadOffset) from the range's offset to obtain a gap, and copy the gap bytes verbatim in COPY mode.
  • Restrict the input to [offset, offset+length], record the number of bytes written so far as inverseRangeStart, uncompress the range, and set lastReadOffset to offset + length.
  • If generateInverse is true (it always is here, since the caller passes true), record the number of bytes written so far as inverseRangeEnd; inverseRangeEnd minus inverseRangeStart is the uncompressed size. Build a TypedRange from these values and add it to the list.
  • After all ranges have been processed, copy whatever data remains between the current read position and the end of the file.
  • Return the list of TypedRange.

The process is fairly abstract, so here is a rough sketch of the whole expansion (the original post shows this as a figure).
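A schematic view; the entry names and number of ranges are illustrative only:

  original Zip:         [gap][compressed entry A][gap][compressed entry B][trailing data]
                          |          |             |          |                |
                         COPY    UNCOMPRESS       COPY    UNCOMPRESS          COPY
                          v          v             v          v                v
  delta-friendly file: [gap][uncompressed A ....][gap][uncompressed B ....][trailing data]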

The gaps are descriptive metadata, the compressed ranges are the entries' actual compressed data, and the trailing region is whatever is left at the end of the Zip file. Gaps are copied verbatim, compressed ranges are uncompressed, and for each one the offset and length of the uncompressed output are recorded; once all ranges are processed, the leftover data at the end of the file is copied as well.

Note especially that the returned TypedRange describes the offset and length of the data in the new file after uncompression. This information is crucial: it is what makes it possible to rebuild the Zip file later.

Once the old and new delta-friendly files exist, the rest is simple: run BsDiff to produce the delta, then write it into the patch file. The patch format is:

Offset | Bytes | Description | Notes
0 | 8 | Versioned Identifier | header marker, fixed string "GFbFv1_0"
8 | 4 | Flags (currently unused, but reserved) | flags, reserved
12 | 8 | Delta-friendly old archive size | size of the old archive's delta-friendly file, 64-bit unsigned
20 | 4 | Num old archive uncompression ops | number of old-archive entries to uncompress, 32-bit unsigned
24 | i | Old archive uncompression op 1...n | offset and length of each of the n entries to uncompress
24+i | 4 | Num new archive recompression ops | number of new-archive entries to recompress, 32-bit unsigned
24+i+4 | j | New archive recompression op 1...n | offset and length of each of the n entries to recompress
24+i+4+j | 4 | Num delta descriptor records | number of delta descriptors, 32-bit unsigned
24+i+4+j+4 | k | Delta descriptor record 1...n | delta descriptor records, n in total
24+i+4+j+4+k | l | Delta 1...n | the delta payloads themselves

The structure of an Old Archive Uncompression Op:

Bytes | Description | Notes
8 | Offset of first byte to uncompress | offset of the data to uncompress, 64-bit unsigned
8 | Number of bytes to uncompress | number of bytes to uncompress, 64-bit unsigned

The structure of a New Archive Recompression Op:

Bytes | Description | Notes
8 | Offset of first byte to compress | offset of the data to recompress, 64-bit unsigned
8 | Number of bytes to compress | number of bytes to recompress, 64-bit unsigned
4 | Compression settings | compression parameters: level, strategy, nowrap

The structure of the Compression Settings:

Bytes | Description | Notes
1 | Compatibility window ID | compatibility window; currently always 0, the default window
1 | Deflate level | compression level, in [1, 9]
1 | Deflate strategy | encoding strategy, in [0, 2]
1 | Wrap mode | 0 = wrap, 1 = nowrap

The Compatibility Window: the default window has ID 0 and uses the following configuration:

  • compression with the deflate algorithm (zlib)
  • a 32768-byte buffer
  • verified compression levels 1 through 9
  • verified encoding strategies 0 through 2
  • verified wrap modes, both wrap and nowrap

The default compatibility window covers Android 4.0 and later.

How is this window determined? The class DefaultDeflateCompatibilityWindow exposes getIncompatibleValues, which returns the JreDeflateParameters (the carrier of level, strategy and nowrap) that are not compatible. Internally it compresses a reference corpus with every combination of the three parameters and compares the hex encoding of the output against built-in expected values; a match means compatible, a mismatch means incompatible.
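A minimal sketch of such a check, assuming the shared module is on the classpath; the method name and the shape of its result follow the description above and should be treated as an assumption rather than the exact API:

import com.google.archivepatcher.shared.DefaultDeflateCompatibilityWindow;
import com.google.archivepatcher.shared.JreDeflateParameters;

public class CompatibilityCheck {
  public static void main(String[] args) {
    DefaultDeflateCompatibilityWindow window = new DefaultDeflateCompatibilityWindow();
    // Each reported entry is a (level, strategy, nowrap) combination whose deflate output
    // on this JVM does not match the baseline recorded in the compatibility window.
    for (JreDeflateParameters parameters : window.getIncompatibleValues()) {
      System.out.println("not reproducible here: " + parameters);
    }
  }
}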

One caveat: officially, compression levels 1-9, strategies 0-2, and both wrap and nowrap are compatible, but in my own tests on a PC a handful of combinations, roughly four, were not. I have not tested on Android, so I do not know whether the same problem exists there.

The Delta Descriptor Record describes the delta algorithm. The current V1 patch format supports only BsDiff, so there is exactly one such record. Its structure:

Bytes | Description | Notes
1 | Delta format ID | enum id of the delta algorithm; bsdiff is 0
8 | Old delta-friendly region start | offset in the old delta-friendly file where the delta applies
8 | Old delta-friendly region length | length of that region in the old delta-friendly file
8 | New delta-friendly region start | offset in the new delta-friendly file where the delta applies
8 | New delta-friendly region length | length of that region in the new delta-friendly file
8 | Delta length | length of the generated delta

The patch file is written by writeV1Patch:

public void writeV1Patch(OutputStream out) throws IOException {
  // Use DataOutputStream for ease of writing. This is deliberately left open, as closing it would
  // close the output stream that was passed in and that is not part of the method's documented
  // behavior.
  @SuppressWarnings("resource")
  DataOutputStream dataOut = new DataOutputStream(out);

  dataOut.write(PatchConstants.IDENTIFIER.getBytes("US-ASCII")); // GFbFv1_0
  dataOut.writeInt(0); // Flags (reserved)
  dataOut.writeLong(deltaFriendlyOldFileSize);

  // Write out all the delta-friendly old file uncompression instructions
  dataOut.writeInt(plan.getOldFileUncompressionPlan().size());
  for (TypedRange<Void> range : plan.getOldFileUncompressionPlan()) {
    dataOut.writeLong(range.getOffset());
    dataOut.writeLong(range.getLength());
  }

  // Write out all the delta-friendly new file recompression instructions
  dataOut.writeInt(plan.getDeltaFriendlyNewFileRecompressionPlan().size());
  for (TypedRange<JreDeflateParameters> range : plan.getDeltaFriendlyNewFileRecompressionPlan()) {
    dataOut.writeLong(range.getOffset());
    dataOut.writeLong(range.getLength());
    // Write the deflate information
    dataOut.write(PatchConstants.CompatibilityWindowId.DEFAULT_DEFLATE.patchValue);
    dataOut.write(range.getMetadata().level);
    dataOut.write(range.getMetadata().strategy);
    dataOut.write(range.getMetadata().nowrap ? 1 : 0);
  }

  // Now the delta section
  // First write the number of deltas present in the patch. In v1, there is always exactly one
  // delta, and it is for the entire input; in future versions there may be multiple deltas, of
  // arbitrary types.
  dataOut.writeInt(1);

  // In v1 the delta format is always bsdiff, so write it unconditionally.
  dataOut.write(PatchConstants.DeltaFormat.BSDIFF.patchValue);

  // Write the working ranges. In v1 these are always the entire contents of the delta-friendly
  // old file and the delta-friendly new file. These are for forward compatibility with future
  // versions that may allow deltas of arbitrary formats to be mapped to arbitrary ranges.
  dataOut.writeLong(0); // i.e., start of the working range in the delta-friendly old file
  dataOut.writeLong(deltaFriendlyOldFileSize); // i.e., length of the working range in old
  dataOut.writeLong(0); // i.e., start of the working range in the delta-friendly new file
  dataOut.writeLong(deltaFriendlyNewFileSize); // i.e., length of the working range in new

  // Finally, the length of the delta and the delta itself.
  dataOut.writeLong(deltaFile.length());
  try (FileInputStream deltaFileIn = new FileInputStream(deltaFile);
      BufferedInputStream deltaIn = new BufferedInputStream(deltaFileIn)) {
    byte[] buffer = new byte[32768];
    int numRead = 0;
    while ((numRead = deltaIn.read(buffer)) >= 0) {
      dataOut.write(buffer, 0, numRead);
    }
  }
  dataOut.flush();
}

It performs the following steps:

  • Write the file header, "GFbFv1_0"
  • Write the reserved flags, value 0
  • Write the size of the old file's delta-friendly file
  • Write the number of old-file entries to uncompress
  • Write, in order, the offset and length of each of the n old-file entries to uncompress
  • Write the number of new-file entries to recompress
  • Write, in order, the offset and length of each of the n new-file entries to recompress, together with the compatibility window (window id, compression level, strategy, nowrap)
  • Write the number of delta descriptors; only bsdiff is used, so the value is 1
  • Write the delta format id, the offset and length of the region of the old delta-friendly file the delta applies to, and the offset and length of the region of the new delta-friendly file it applies to
  • Write the length of the delta produced by bsdiff
  • Write the bsdiff delta itself

Reconstructing the new file

Patch application goes through com.google.archivepatcher.applier.FileByFileV1DeltaApplier.applyDelta, which eventually ends up in applyDeltaInternal:

private void applyDeltaInternal(
    File oldBlob, File deltaFriendlyOldBlob, InputStream deltaIn, OutputStream newBlobOut)
    throws IOException {

  // First, read the patch plan from the patch stream.
  PatchReader patchReader = new PatchReader();
  PatchApplyPlan plan = patchReader.readPatchApplyPlan(deltaIn);
  writeDeltaFriendlyOldBlob(plan, oldBlob, deltaFriendlyOldBlob);

  // Apply the delta. In v1 there is always exactly one delta descriptor, it is bsdiff, and it
  // takes up the rest of the patch stream - so there is no need to examine the list of
  // DeltaDescriptors in the patch at all.
  long deltaLength = plan.getDeltaDescriptors().get(0).getDeltaLength();
  DeltaApplier deltaApplier = getDeltaApplier();
  // Don't close this stream, as it is just a limiting wrapper.
  @SuppressWarnings("resource")
  LimitedInputStream limitedDeltaIn = new LimitedInputStream(deltaIn, deltaLength);
  // Don't close this stream, as it would close the underlying OutputStream (that we don't own).
  @SuppressWarnings("resource")
  PartiallyCompressingOutputStream recompressingNewBlobOut =
      new PartiallyCompressingOutputStream(
          plan.getDeltaFriendlyNewFileRecompressionPlan(),
          newBlobOut,
          DEFAULT_COPY_BUFFER_SIZE);
  deltaApplier.applyDelta(deltaFriendlyOldBlob, limitedDeltaIn, recompressingNewBlobOut);
  recompressingNewBlobOut.flush();
}

It does the following:

  • Parse the patch file into a PatchApplyPlan
  • Generate the old file's delta-friendly file
  • Apply the patch algorithm to produce the new file's delta-friendly data; as that data is streamed out, the new Zip file is rebuilt on the fly.

For the first step, the parsing code is:

public PatchApplyPlan readPatchApplyPlan(InputStream in) throws IOException {
  // Use DataOutputStream for ease of writing. This is deliberately left open, as closing it would
  // close the output stream that was passed in and that is not part of the method's documented
  // behavior.
  @SuppressWarnings("resource")
  DataInputStream dataIn = new DataInputStream(in);

  // Read header and flags.
  byte[] expectedIdentifier = PatchConstants.IDENTIFIER.getBytes("US-ASCII");
  byte[] actualIdentifier = new byte[expectedIdentifier.length];
  dataIn.readFully(actualIdentifier);
  if (!Arrays.equals(expectedIdentifier, actualIdentifier)) {
    throw new PatchFormatException("Bad identifier");
  }
  dataIn.skip(4); // Flags (ignored in v1)
  long deltaFriendlyOldFileSize =
      checkNonNegative(dataIn.readLong(), "delta-friendly old file size");

  // Read old file uncompression instructions.
  int numOldFileUncompressionInstructions =
      (int) checkNonNegative(dataIn.readInt(), "old file uncompression instruction count");
  List<TypedRange<Void>> oldFileUncompressionPlan =
      new ArrayList<TypedRange<Void>>(numOldFileUncompressionInstructions);
  long lastReadOffset = -1;
  for (int x = 0; x < numOldFileUncompressionInstructions; x++) {
    long offset = checkNonNegative(dataIn.readLong(), "old file uncompression range offset");
    long length = checkNonNegative(dataIn.readLong(), "old file uncompression range length");
    if (offset < lastReadOffset) {
      throw new PatchFormatException("old file uncompression ranges out of order or overlapping");
    }
    TypedRange<Void> range = new TypedRange<Void>(offset, length, null);
    oldFileUncompressionPlan.add(range);
    lastReadOffset = offset + length; // To check that the next range starts after the current one
  }

  // Read new file recompression instructions
  int numDeltaFriendlyNewFileRecompressionInstructions = dataIn.readInt();
  checkNonNegative(
      numDeltaFriendlyNewFileRecompressionInstructions,
      "delta-friendly new file recompression instruction count");
  List<TypedRange<JreDeflateParameters>> deltaFriendlyNewFileRecompressionPlan =
      new ArrayList<TypedRange<JreDeflateParameters>>(
          numDeltaFriendlyNewFileRecompressionInstructions);
  lastReadOffset = -1;
  for (int x = 0; x < numDeltaFriendlyNewFileRecompressionInstructions; x++) {
    long offset =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file recompression range offset");
    long length =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file recompression range length");
    if (offset < lastReadOffset) {
      throw new PatchFormatException(
          "delta-friendly new file recompression ranges out of order or overlapping");
    }
    lastReadOffset = offset + length; // To check that the next range starts after the current one

    // Read the JreDeflateParameters
    // Note that v1 only supports the default deflate compatibility window.
    checkRange(
        dataIn.readByte(),
        PatchConstants.CompatibilityWindowId.DEFAULT_DEFLATE.patchValue,
        PatchConstants.CompatibilityWindowId.DEFAULT_DEFLATE.patchValue,
        "compatibility window id");
    int level = (int) checkRange(dataIn.readUnsignedByte(), 1, 9, "recompression level");
    int strategy = (int) checkRange(dataIn.readUnsignedByte(), 0, 2, "recompression strategy");
    int nowrapInt = (int) checkRange(dataIn.readUnsignedByte(), 0, 1, "recompression nowrap");
    TypedRange<JreDeflateParameters> range =
        new TypedRange<JreDeflateParameters>(
            offset,
            length,
            JreDeflateParameters.of(level, strategy, nowrapInt == 0 ? false : true));
    deltaFriendlyNewFileRecompressionPlan.add(range);
  }

  // Read the delta metadata, but stop before the first byte of the actual delta.
  // V1 has exactly one delta and it must be bsdiff.
  int numDeltaRecords = (int) checkRange(dataIn.readInt(), 1, 1, "num delta records");

  List<DeltaDescriptor> deltaDescriptors = new ArrayList<DeltaDescriptor>(numDeltaRecords);
  for (int x = 0; x < numDeltaRecords; x++) {
    byte deltaFormatByte =
        (byte)
            checkRange(
                dataIn.readByte(),
                PatchConstants.DeltaFormat.BSDIFF.patchValue,
                PatchConstants.DeltaFormat.BSDIFF.patchValue,
                "delta format");
    long deltaFriendlyOldFileWorkRangeOffset =
        checkNonNegative(dataIn.readLong(), "delta-friendly old file work range offset");
    long deltaFriendlyOldFileWorkRangeLength =
        checkNonNegative(dataIn.readLong(), "delta-friendly old file work range length");
    long deltaFriendlyNewFileWorkRangeOffset =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file work range offset");
    long deltaFriendlyNewFileWorkRangeLength =
        checkNonNegative(dataIn.readLong(), "delta-friendly new file work range length");
    long deltaLength = checkNonNegative(dataIn.readLong(), "delta length");
    DeltaDescriptor descriptor =
        new DeltaDescriptor(
            PatchConstants.DeltaFormat.fromPatchValue(deltaFormatByte),
            new TypedRange<Void>(
                deltaFriendlyOldFileWorkRangeOffset, deltaFriendlyOldFileWorkRangeLength, null),
            new TypedRange<Void>(
                deltaFriendlyNewFileWorkRangeOffset, deltaFriendlyNewFileWorkRangeLength, null),
            deltaLength);
    deltaDescriptors.add(descriptor);
  }

  return new PatchApplyPlan(
      Collections.unmodifiableList(oldFileUncompressionPlan),
      deltaFriendlyOldFileSize,
      Collections.unmodifiableList(deltaFriendlyNewFileRecompressionPlan),
      Collections.unmodifiableList(deltaDescriptors));
}

The steps are:

  • Read and verify the file header
  • Skip the 4 flag bytes
  • Read the size of the old delta-friendly file, checked non-negative
  • Read the number of old-file ranges to uncompress, checked non-negative
  • Read the n old-file ranges (offset, length), each checked non-negative
  • Read the number of new-file ranges to recompress, checked non-negative
  • Read the n new-file ranges (offset, length), each checked non-negative, together with the compression level, strategy and nowrap value for each
  • Read the number of delta descriptors
  • Read the n delta descriptors: the delta format id, the offset and length of the old-file region the delta applies to, the offset and length of the new-file region it applies to, and the delta length
  • Return the PatchApplyPlan object

Next, the plan's list of old-file TypedRange to uncompress is handed to DeltaFriendlyFile.generateDeltaFriendlyFile to produce the old delta-friendly file, exactly as during patch generation, so the process is not repeated here. The code:

private void writeDeltaFriendlyOldBlob(
    PatchApplyPlan plan, File oldBlob, File deltaFriendlyOldBlob) throws IOException {
  RandomAccessFileOutputStream deltaFriendlyOldFileOut = null;
  try {
    deltaFriendlyOldFileOut =
        new RandomAccessFileOutputStream(deltaFriendlyOldBlob, plan.getDeltaFriendlyOldFileSize());
    DeltaFriendlyFile.generateDeltaFriendlyFile(
        plan.getOldFileUncompressionPlan(),
        oldBlob,
        deltaFriendlyOldFileOut,
        false,
        DEFAULT_COPY_BUFFER_SIZE);
  } finally {
    try {
      deltaFriendlyOldFileOut.close();
    } catch (Exception ignored) {
      // Nothing
    }
  }
}

Finally, the new file is produced. BsPatch writes its output into an OutputStream that has been wrapped, decorator style, in a PartiallyCompressingOutputStream; that wrapper is constructed with the list of TypedRange describing which parts of the new delta-friendly data must be recompressed. The actual Zip reconstruction ends up in PartiallyCompressingOutputStream.writeChunk:

private int writeChunk(byte[] buffer, int offset, int length) throws IOException {
  if (bytesTillCompressionStarts() == 0 && !currentlyCompressing()) {
    // Compression will begin immediately.
    JreDeflateParameters parameters = nextCompressedRange.getMetadata();
    if (deflater == null) {
      deflater = new Deflater(parameters.level, parameters.nowrap);
    } else if (lastDeflateParameters.nowrap != parameters.nowrap) {
      // Last deflater must be destroyed because nowrap settings do not match.
      deflater.end();
      deflater = new Deflater(parameters.level, parameters.nowrap);
    }
    // Deflater will already have been reset at the end of this method, no need to do it again.
    // Just set up the right parameters.
    deflater.setLevel(parameters.level);
    deflater.setStrategy(parameters.strategy);
    deflaterOut = new DeflaterOutputStream(normalOut, deflater, compressionBufferSize);
  }

  int numBytesToWrite;
  OutputStream writeTarget;
  if (currentlyCompressing()) {
    // Don't write past the end of the compressed range.
    numBytesToWrite = (int) Math.min(length, bytesTillCompressionEnds());
    writeTarget = deflaterOut;
  } else {
    writeTarget = normalOut;
    if (nextCompressedRange == null) {
      // All compression ranges have been consumed.
      numBytesToWrite = length;
    } else {
      // Don't write past the point where the next compressed range begins.
      numBytesToWrite = (int) Math.min(length, bytesTillCompressionStarts());
    }
  }

  writeTarget.write(buffer, offset, numBytesToWrite);
  numBytesWritten += numBytesToWrite;

  if (currentlyCompressing() && bytesTillCompressionEnds() == 0) {
    // Compression range complete. Finish the output and set up for the next run.
    deflaterOut.finish();
    deflaterOut.flush();
    deflaterOut = null;
    deflater.reset();
    lastDeflateParameters = nextCompressedRange.getMetadata();
    if (rangeIterator.hasNext()) {
      // More compression ranges await in the future.
      nextCompressedRange = rangeIterator.next();
    } else {
      // All compression ranges have been consumed.
      nextCompressedRange = null;
      deflater.end();
      deflater = null;
    }
  }

  return numBytesToWrite;
}

private boolean currentlyCompressing() {
  return deflaterOut != null;
}

private long bytesTillCompressionStarts() {
  if (nextCompressedRange == null) {
    // All compression ranges have been consumed
    return -1L;
  }
  return nextCompressedRange.getOffset() - numBytesWritten;
}

private long bytesTillCompressionEnds() {
  if (nextCompressedRange == null) {
    // All compression ranges have been consumed
    return -1L;
  }
  return (nextCompressedRange.getOffset() + nextCompressedRange.getLength()) - numBytesWritten;
}

This function is the heart of the reconstruction and is quite cleverly written; it is worth setting a breakpoint and stepping through it. Briefly, the process is:

  • The PartiallyCompressingOutputStream constructor fetches the first range from compressionRanges.
  • If the distance to the next compressed range is 0 and we are not currently compressing, read the compression settings (level, strategy, nowrap), configure the Deflater accordingly, and wrap the output stream in a DeflaterOutputStream.
  • If we are currently compressing, write the smaller of the incoming length and the remaining length of the compressed range, targeting the deflater stream, i.e. compressing rather than copying.
  • If we are not compressing and there is no next compressed range, write the full length; otherwise write the smaller of the incoming length and the distance to the next compressed range, targeting the plain stream, i.e. copying rather than compressing.
  • If we are compressing and the current range has now been fully written, finish and flush the deflater stream, reset the compression state, and advance to the next compressed range.
  • Repeat until all data has been written.

The process is fairly involved, so again a rough sketch (the original post shows this as a figure).
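A schematic view; entry names and the number of ranges are illustrative only:

  delta-friendly new data: [gap][uncompressed A ....][gap][uncompressed B ....][trailing data]
                             |          |              |          |                 |
                           COPY   DEFLATE(level,     COPY   DEFLATE(level,         COPY
                                  strategy, nowrap)         strategy, nowrap)
                             v          v              v          v                 v
  rebuilt new Zip:         [gap][compressed entry A][gap][compressed entry B][trailing data]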

The data being written is the newly synthesized delta-friendly form of the new file.

Gap regions are copied verbatim to the output stream; the ranges that were uncompressed earlier are recompressed with the recorded compression level, strategy and nowrap parameters; and the remaining trailing data is written to the target as-is.

And with that, the whole reconstruction is complete.

Why does this reproduce the Zip file? The analysis above makes it clear: Google Archive Patch records, for every piece of data in the new file that needs recompression, the parameters to use. That data is recompressed with those parameters and written at its real position in the new file, while everything else in the Zip is copied verbatim. Together, the two operations yield the new Zip file, and for an APK there is no need to worry about the signature.

A genuinely elegant trick!

Compressing and decompressing the patch file

Google Archive Patch does not compress the patch file itself; compressing it to keep it small is left to you, and the client has to decompress it accordingly before applying it. This keeps the choice of patch compression completely free and easy to extend.
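For example, a minimal sketch of compressing the generated patch with the JDK's gzip streams before shipping it; any codec could be substituted, and the file names here are made up:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class CompressPatch {
  public static void main(String[] args) throws IOException {
    try (BufferedInputStream in = new BufferedInputStream(new FileInputStream("archive.patch"));
        GZIPOutputStream out = new GZIPOutputStream(new FileOutputStream("archive.patch.gz"))) {
      byte[] buffer = new byte[32 * 1024];
      int read;
      while ((read = in.read(buffer)) >= 0) {
        out.write(buffer, 0, read);
      }
    }
    // The client would reverse this with a GZIPInputStream before handing the patch
    // stream to FileByFileV1DeltaApplier.applyDelta.
  }
}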

A generic diff/patch framework

Over the past few days I put together a simple generic framework for generating and applying patches; see CorePatch on GitHub. It currently supports bsdiff, Google Archive Patch, and full updates (a plain file copy).

Optimizations

Optimizing patch generation

Patch generation uses bsdiff, but the base files become much larger once uncompressed, so generating the delta takes much longer. There is not much to optimize here; the only real lever is to replace BsDiff with a better delta algorithm.

Optimizing patch application

Application uses BsPatch, which is extremely fast, so there is nothing to gain there. The part worth looking at is the reconstruction of the new Zip file, i.e. the writeChunk function above, and its only expensive operation is compression, which accounts for roughly 80-90% of the total time; so in practice there is little room for optimization here either.

Summary

The core of Google Archive Patch is: produce delta-friendly files, run the delta algorithm over them, and record the offsets and lengths of the regions of the new delta-friendly file that need recompression. When applying the patch, recompress those regions with the compression parameters stored in the patch, and copy everything else, such as the rest of the Zip structure, verbatim. The result is a faithful reconstruction of the new file. The advantage of this approach is that the patch is smaller than a file-level bsdiff patch; the drawbacks are longer generation and application times.

A fundamental requirement of the algorithm is that compressing the same data with the same compression level, encoding strategy and nowrap setting always produces the same bytes. If that assumption does not hold, the whole scheme falls apart.
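A minimal sketch of that requirement, using only the JDK (the method name is mine, not from the project): recompressing an entry's uncompressed bytes with the recorded parameters must reproduce the original compressed bytes exactly, which is also the essence of what divineDeflateParameters checks during patch generation.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class DeterministicDeflate {
  // Returns true if deflating 'uncompressed' with the given settings reproduces
  // 'expectedCompressed' byte for byte on this JVM.
  static boolean reproduces(byte[] uncompressed, byte[] expectedCompressed,
      int level, int strategy, boolean nowrap) throws IOException {
    Deflater deflater = new Deflater(level, nowrap);
    deflater.setStrategy(strategy);
    ByteArrayOutputStream recompressed = new ByteArrayOutputStream();
    try (DeflaterOutputStream out = new DeflaterOutputStream(recompressed, deflater)) {
      out.write(uncompressed);
    }
    deflater.end();
    return Arrays.equals(recompressed.toByteArray(), expectedCompressed);
  }
}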

Original post: http://fucknmb.com/2017/10/05/Google-Archive-Patch-%E6%BA%90%E7%A0%81%E8%A7%A3%E6%9E%90/
