Memory management: reading large files in Java

Published: 2023/12/14 java 25 豆豆

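I need advice from someone who knows Java and its memory issues very well.

I have a large file (about 1.5 GB) and I need to cut it into many smaller files (for example, 100 small files).

I know in general how to do this (using a BufferedReader), but I would like to know whether you have any advice regarding memory, or tips on how to do it faster.

My file contains text, it is not binary, and each line has about 20 characters.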

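Use byte APIs (e.g. FileInputStream, ByteChannel) rather than character APIs (BufferedReader etc.). Otherwise you would be encoding and decoding needlessly.

Splitting a text file by bytes is a bad idea.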

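To save memory, do not unnecessarily store or duplicate the data in memory (i.e. do not assign it to variables outside the loop). Process the output immediately, as soon as the input comes in.

It really doesn't matter whether you use a BufferedReader or not. It will not cost significantly more memory, as some implicitly seem to suggest. At worst it will cost a few percent in performance. The same applies to NIO: it only improves scalability, not memory use. It only becomes interesting when you have hundreds of threads running on the same file.

Just loop through the file, write every line to the other file immediately as you read it, count the lines, and when the count reaches 100, switch to the next file, and so on.

Kickoff example: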

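    // Needs from java.io: BufferedReader, BufferedWriter, FileInputStream,
    // FileOutputStream, InputStreamReader, OutputStreamWriter.
    // close(...) is a helper that is not shown here; it is assumed to close
    // the given resource if it is non-null.
    String encoding = "UTF-8";
    int maxlines = 100;
    BufferedReader reader = null;
    BufferedWriter writer = null;

    try {
        reader = new BufferedReader(new InputStreamReader(new FileInputStream("/bigfile.txt"), encoding));
        int count = 0;
        for (String line; (line = reader.readLine()) != null;) {
            if (count++ % maxlines == 0) {
                // Every maxlines lines: close the current output file and open the next one.
                close(writer);
                writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("/smallfile" + (count / maxlines) + ".txt"), encoding));
            }
            writer.write(line);
            writer.newLine();
        }
    } finally {
        close(writer);
        close(reader);
    }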

Yes, just pipe it from a FileInputStream to a FileOutputStream using nothing but a suitably sized byte buffer array.
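The byte-buffer approach from the comment above could look roughly like the following minimal sketch. The class name `ByteCopySketch`, the method name `copy`, and the 16 KiB buffer size are illustrative assumptions, not something given in the thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ByteCopySketch {
    // Hypothetical buffer size; the thread debates values from 8 KiB up to 1 MiB.
    static final int BUFFER_SIZE = 16 * 1024;

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int n;
        // read(byte[]) returns the number of bytes read, or -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

With files, the two streams would typically be a `FileInputStream` and a `FileOutputStream` opened in a try-with-resources statement. No charset decoding happens here, so a split at an arbitrary byte offset can cut a line (or a multi-byte character) in half.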

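Counting lines doesn't work for me. The point is: I have one file and I need to split it into, say, 200 files (this number can change; it will come from the database). How do I do that? Just counting lines doesn't work. What else can I do?

Then count the number of bytes written instead of the number of lines. You can know the file size in bytes beforehand.

Using lineStr.getBytes().length?

For example. Don't forget to specify the proper encoding, e.g. line.getBytes(encoding). Otherwise it will mess up: the byte length depends on the character encoding used. If you don't actually care about text lines, I would rather use InputStream/OutputStream and count the transferred bytes. By the way, it is unclear whether you mean that the files are stored in the database or that the split parameters are stored in the database. If the files themselves are also stored in the database, that may hog memory as well. The exact solution will depend on the database used.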
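A minimal sketch of the byte-counting variant discussed above, assuming the number of target files is already known. The class name `SplitByBytesSketch`, the `part<N>.txt` naming, and the rule of starting a new file once the byte budget is reached are illustrative choices, not taken from the thread.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class SplitByBytesSketch {
    /**
     * Splits source into about `parts` files of similar byte size, breaking only
     * at line ends. Returns the number of files actually written.
     */
    static int split(Path source, Path targetDir, int parts, Charset charset) throws IOException {
        // Byte budget per output file, derived from the known input size (rounded up).
        long bytesPerPart = Math.max(1, (Files.size(source) + parts - 1) / parts);
        int separatorBytes = System.lineSeparator().getBytes(charset).length;
        int fileIndex = 0;
        long written = 0;
        BufferedWriter writer = null;
        try (BufferedReader reader = Files.newBufferedReader(source, charset)) {
            for (String line; (line = reader.readLine()) != null;) {
                if (writer == null || written >= bytesPerPart) {
                    if (writer != null) {
                        writer.close();
                    }
                    writer = Files.newBufferedWriter(targetDir.resolve("part" + fileIndex++ + ".txt"), charset);
                    written = 0;
                }
                writer.write(line);
                writer.newLine();
                // The byte length depends on the charset, so encode with the same one.
                written += line.getBytes(charset).length + separatorBytes;
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return fileIndex;
    }
}
```

Because a file is only closed after the line that crosses the budget, each part can be slightly larger than the budget, and the number of parts can come out lower than requested when lines are long relative to the budget.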

What about the "close" method you used?
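The thread does not show that helper. One plausible shape for it, offered purely as an assumption (the class name `CloseSketch` is hypothetical), is a null-safe quiet close:

```java
import java.io.Closeable;
import java.io.IOException;

final class CloseSketch {
    /** Closes the resource if it is non-null, ignoring any IOException from close(). */
    static void close(Closeable resource) {
        if (resource != null) {
            try {
                resource.close();
            } catch (IOException ignore) {
                // Nothing sensible can be done here; a real program might log this.
            }
        }
    }
}
```

Swallowing the exception when closing a writer can hide a failed final flush, so on Java 7 and later a try-with-resources statement is often the safer choice.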

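First, if your file contains binary data, then using a BufferedReader would be a big mistake (because you would be converting the data to String, which is unnecessary and could easily corrupt the data); you should use a BufferedInputStream instead. If it is text data and you need to split it along line breaks, then using a BufferedReader is fine (assuming the file contains lines of a sensible length).

Regarding memory, there should be no problem if you use a decently sized buffer (I would use at least 1 MB to make sure the hard disk is doing mostly sequential reads and writes).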
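For reference, the buffer size can be passed explicitly when wrapping the stream. The sketch below is only an illustration; the class name `BufferSizeSketch` and the constant are assumptions. Note that the size argument of `BufferedReader` counts chars, not bytes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class BufferSizeSketch {
    // 1 Mi chars, following the "at least 1 MB" suggestion; other commenters
    // in this thread argue that 8 to 16 KiB is already enough.
    static final int BUFFER_CHARS = 1024 * 1024;

    /** Opens a text file for reading with an explicitly sized buffer. */
    static BufferedReader open(Path file, Charset charset) throws IOException {
        return new BufferedReader(new InputStreamReader(Files.newInputStream(file), charset), BUFFER_CHARS);
    }
}
```

Which size is best is probably something to measure on the actual hardware and workload rather than assume.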

If speed turns out to be a problem, you could have a look at the java.nio packages; they are supposedly faster than java.io.

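Yes, I will use a BufferedReader, because I have a text file and need to read it line by line. Now I have another problem: I cannot detect the size of the new file while writing it. The idea is that when the size of the new file exceeds xx MB, a new file is started.

@CC: you could simply keep adding up the String length of the lines you are copying. But how that translates to file size depends on the character encoding (and it does not work well at all for variable-length encodings such as UTF-8).

I would suggest adding a custom FilterOutputStream between the FileOutputStream (at the bottom) and the OutputStreamWriter. Implement this filter just to keep track of the number of bytes going through it (Apache Commons IO may already have such a utility).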
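A minimal sketch of such a counting filter, assuming nothing beyond java.io. The class name `CountingOutputStreamSketch` and the accessor `getCount` are illustrative names, not an existing API.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class CountingOutputStreamSketch extends FilterOutputStream {
    private long count;

    CountingOutputStreamSketch(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Delegate the whole range; FilterOutputStream's default writes byte by byte.
        out.write(b, off, len);
        count += len;
    }

    /** Number of bytes passed to the underlying stream so far. */
    long getCount() {
        return count;
    }
}
```

It would sit in the chain as `new BufferedWriter(new OutputStreamWriter(new CountingOutputStreamSketch(new FileOutputStream(...)), encoding))`. The count only reflects bytes that have left the writer's buffers, so it lags until the writer is flushed.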

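Also, a common misperception is that "nio" is faster than "io". This may be the case in certain situations, but generally "nio" was written to be more scalable than "io", where "scalable" is not necessarily the same as "faster".

@james: the filter will not yield the correct result when there is a BufferedWriter on top of it, although the difference may not be big.

It will lag behind, yes, but the alternative is trying to approximate bytes from chars, which, as you pointed out, is ugly. I am assuming there is a fudge factor regardless. If the count needs to be very accurate, you could flush after every line, but that will of course slow performance.

Using a 1 MiB buffer is very unlikely to be any faster than a buffer of between 8 and 16 KiB.

@Software Monkey: Hm, wouldn't accessing the hard disk in 16 KiB chunks cause it to thrash badly? OS or hardware caches may mitigate this through prefetching. In the end, the optimal buffer size is probably best determined by benchmarking the actual use case.

@Michael: In my tests, bulk reads and writes stopped gaining any meaningful throughput for buffers larger than that. YMMV. Back then, in the early 2000s, the sweet spot was about 10K. It might be a bit larger now, but probably not by much. It is probably around X * disk allocation unit, where X is small.

@MichaelBorgwardt I ran into the same problem in my information retrieval project: I had to find the best buffer size and the best readers and writers. Everywhere I read that NIO tools are faster than IO tools, but in my tests IO worked faster!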

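You can consider using memory-mapped files, via FileChannels.

This is generally a lot faster for large files. There are performance trade-offs that could make it slower, so YMMV.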
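As a rough illustration of the memory-mapped approach, the sketch below maps a file read-only in chunks and scans the bytes. Counting newline bytes is just a stand-in for real processing; the class name `MappedReadSketch` and the 64 MiB chunk size are assumptions.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedReadSketch {
    // Hypothetical chunk size; a single mapping cannot exceed Integer.MAX_VALUE bytes.
    static final long CHUNK = 64L * 1024 * 1024;

    /** Counts '\n' bytes by mapping the file into memory chunk by chunk. */
    static long countNewlines(Path file) throws IOException {
        long newlines = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long pos = 0; pos < size; pos += CHUNK) {
                long len = Math.min(CHUNK, size - pos);
                MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (map.hasRemaining()) {
                    if (map.get() == '\n') {
                        newlines++;
                    }
                }
            }
        }
        return newlines;
    }
}
```

Whether this beats a plain buffered stream for a single sequential pass is exactly what the replies in this thread dispute, so it is worth benchmarking before adopting it.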

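Related answer: Java NIO FileChannel versus FileOutputstream performance / usefulness

If you are just doing a straight read-through of the file, this will most likely not gain you much of anything.

Usually not much faster. The last time I benchmarked it, I got 20% on reading.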

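This is a very good article:

http://java.sun.com/developer/technicalArticles/Programming/PerfTuning/

In summary, for great performance you should:

Avoid accessing the disk.

Avoid accessing the underlying operating system.

Avoid method calls.

Avoid processing bytes and characters individually.

For example, to reduce access to the disk, you can use a large buffer. The article describes various approaches.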

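Does it have to be done in Java? That is, does it need to be platform independent? If not, I would suggest using the 'split' command on *nix. If you really wanted to, you could execute this command from your Java program. Although I have not tested it, I imagine it performs faster than whatever Java IO implementation you could come up with.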
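Calling the command from Java could be sketched as follows. This assumes a POSIX `split` is available on the PATH; the class name `SplitCommandSketch`, the method names, and the use of the `-l` (lines per file) option are illustrative choices.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class SplitCommandSketch {
    /** Builds the argument list: split -l <linesPerFile> <input> <outputPrefix>. */
    static List<String> command(int linesPerFile, String input, String outputPrefix) {
        return Arrays.asList("split", "-l", Integer.toString(linesPerFile), input, outputPrefix);
    }

    /** Runs split in workDir and returns its exit code (0 means success). */
    static int run(File workDir, int linesPerFile, String input, String outputPrefix)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(linesPerFile, input, outputPrefix))
                .directory(workDir)
                .inheritIO() // forward the command's output and errors to this process
                .start();
        return process.waitFor();
    }
}
```

For example, `SplitCommandSketch.run(new File("/data"), 100, "bigfile.txt", "smallfile_")` would ask `split` to write files of 100 lines each, named with the prefix `smallfile_`, into the hypothetical directory `/data`.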

    package all.is.well;

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    /**
     * @author Naresh Bhabat
     *
     * The following implementation helps to deal with extra large files in Java.
     * This program is tested for dealing with a 2GB input file.
     * There are some points where extra logic can be added in future.
     *
     * Please note: if we want to deal with a binary input file, then instead of
     * reading lines we need to read bytes from the read file object.
     *
     * It uses a random access file, which is almost like a streaming API.
     *
     * ****************************************
     * Notes regarding the executor framework and its readings.
     * Please note: ExecutorService executor = Executors.newFixedThreadPool(10);
     *
     * For 10 threads: total time required for reading and writing the text in seconds: 349.317
     * For 100: total time required for reading the text and writing in seconds: 464.042
     * For 1000: total time required for reading and writing text: 466.538
     * For 10000: total time required for reading and writing in seconds: 479.701
     */
    public class DealWithHugeRecordsinFile {

        static final String FILEPATH = "C:\\springbatch\\bigfile1.txt.txt";
        static final String FILEPATH_WRITE = "C:\\springbatch\\writinghere.txt";
        static volatile RandomAccessFile fileToWrite;
        static volatile RandomAccessFile file;
        static volatile String fileContentsIter;
        static volatile int position = 0; // only used as a rough counter in log output

        public static void main(String[] args) throws IOException, InterruptedException {
            long currentTimeMillis = System.currentTimeMillis();
            try {
                fileToWrite = new RandomAccessFile(FILEPATH_WRITE, "rw"); // for random write, independent of thread obstacles
                file = new RandomAccessFile(FILEPATH, "r"); // for random read, independent of thread obstacles
                seriouslyReadProcessAndWriteAsynch();
            } catch (IOException e) {
                e.printStackTrace();
            }
            Thread currentThread = Thread.currentThread();
            System.out.println(currentThread.getName());
            long currentTimeMillis2 = System.currentTimeMillis();
            double time_seconds = (currentTimeMillis2 - currentTimeMillis) / 1000.0;
            System.out.println("Total time required for reading the text in seconds " + time_seconds);
        }

        /**
         * Something asynchronously serious.
         *
         * @throws IOException
         */
        public static void seriouslyReadProcessAndWriteAsynch() throws IOException, InterruptedException {
            ExecutorService executor = Executors.newFixedThreadPool(10); // see the class comment for timings
            while (true) {
                // RandomAccessFile.readLine() maps each byte to one char, so this is only safe for ASCII text.
                final String readLine = file.readLine();
                if (readLine == null) {
                    break;
                }
                Runnable genuineWorker = new Runnable() {
                    @Override
                    public void run() {
                        // Do hard processing here in this thread; some time is consumed
                        // and an exception is ignored in the write method.
                        writeToFile(FILEPATH_WRITE, readLine);
                        // System.out.println(" :" + Thread.currentThread().getName());
                    }
                };
                executor.execute(genuineWorker);
            }
            executor.shutdown();
            // Block until all submitted tasks have finished.
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
            System.out.println("Finished all threads");
            file.close();
            fileToWrite.close();
        }

        /**
         * @param filePath
         * @param data
         */
        private static void writeToFile(String filePath, String data) {
            try {
                // fileToWrite.seek(position);
                data = "\n" + data;
                if (!data.contains("Randomization")) {
                    return;
                }
                System.out.println("Let us do something time consuming to make this thread busy" + (position++) + " :" + data);
                System.out.println("Lets consume through this loop");
                int i = 1000;
                while (i > 0) {
                    i--;
                }
                // Several pool threads share this file, so serialize the actual write.
                synchronized (fileToWrite) {
                    fileToWrite.write(data.getBytes());
                }
                throw new Exception();
            } catch (Exception exception) {
                System.out.println("exception was thrown but still we are able to proceed further"
                        + "\n This can be used for marking failure of the records");
                // exception.printStackTrace();
            }
        }
    }

Yes.

I also think that using read() with arguments, such as read(char[], int init, int end), is a better way to read such a large file (for example: read(buffer, 0, buffer.length)).
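A minimal sketch of that bulk-read pattern for character data. The class name `BulkCharReadSketch` and the 8192-char buffer are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

final class BulkCharReadSketch {
    /** Copies chars in bulk via read(char[], int, int); returns the number of chars copied. */
    static long copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[8192]; // hypothetical size
        long total = 0;
        int n;
        // read(...) returns how many chars were stored in the buffer, or -1 at end of input.
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }
}
```

Compared with calling the no-argument `read()` once per character, this makes one call per buffer, which is the saving the answer is pointing at.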

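I also ran into the problem of missing values when using a BufferedReader instead of a BufferedInputStream on a binary data input stream. So in that case, using a BufferedInputStream is much better.

You can use java.nio, which is faster than the classical input/output streams:

http://java.sun.com/javase/6/docs/technotes/guides/io/index.html

See my comment on Michael Borgwardt's post.

Unless you accidentally read in the whole input file instead of reading it line by line, your primary limitation will be disk speed. You may want to start with a file containing 100 lines, write it to 100 different files with one line in each, and make the triggering mechanism work on the number of lines written to the current file. That program will be easy to scale to your situation.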

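Don't use read without arguments.

It is very slow.

It is better to read into a buffer and move it to the file quickly.

Use a BufferedInputStream, because it supports binary reading.

That's all.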
