Reading Word content with Java POI
生活随笔
This article, collected and organized by 生活随笔, shows how to read Word content with Java POI. The editor found it useful and shares it here for reference.
1. Add the JAR dependencies (Maven)
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>4.1.0</version>
</dependency>
<!-- poi-ooxml provides XWPF (XWPFDocument, XWPFWordExtractor) for .docx files -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.0</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>4.1.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-scratchpad -->
<!-- poi-scratchpad provides HWPF (WordExtractor) for legacy .doc files -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>4.1.0</version>
</dependency>

2. Read all text (without table-specific handling)
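The readDoc method in this section chooses HWPF or XWPF purely from the file extension, so a misnamed file ends up in the wrong parser. As an alternative sketch (plain JDK, not part of the original code), the first bytes of the file can be checked instead: legacy .doc files are OLE2 compound documents starting with D0 CF 11 E0 A1 B1 1A E1, and .docx files are ZIP packages starting with "PK\3\4". The class and method names are hypothetical, and this only identifies the container: .xls and .ppt are also OLE2, while .xlsx, .pptx and ordinary .zip files are also ZIP. POI also ships a FileMagic class in org.apache.poi.poifs.filesystem for this kind of check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper: classifies a file by its leading "magic" bytes instead of its extension.
final class WordFormatSniffer {
    enum Container { OLE2, ZIP, UNKNOWN }

    // OLE2 compound-file signature; legacy .doc files (and .xls/.ppt) start with it
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature "PK\3\4"; .docx (and .xlsx/.pptx/.zip) start with it
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies an already-read header (at least 8 bytes for a reliable OLE2 match).
    static Container sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    // Reads (and consumes) up to 8 bytes from the stream, then classifies them.
    static Container sniff(InputStream in) throws IOException {
        return sniff(in.readNBytes(OLE2_MAGIC.length));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Because sniff(InputStream) consumes the first bytes, reopen the file (or use a BufferedInputStream with mark/reset) before handing it to WordExtractor for OLE2 input or XWPFDocument for ZIP input.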
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public static String readDoc(String path) {
    String result = "";
    String lower = path.toLowerCase();
    // First check whether the file is .doc or .docx (by extension)
    try {
        if (lower.endsWith(".doc")) {
            // .doc (Word 97-2003): HWPF WordExtractor from poi-scratchpad
            try (InputStream is = new FileInputStream(path);
                 WordExtractor extractor = new WordExtractor(is)) {
                // All text of the document
                result = extractor.getText();
                System.out.println(result);
                System.out.println("=================1=================");
                // Document metadata, such as author and last-modified time
                System.out.println(extractor.getMetadataTextExtractor().getText());
                System.out.println("=================2=================");
                // Text of each paragraph
                String[] paraTexts = extractor.getParagraphText();
                for (int i = 0; i < paraTexts.length; i++) {
                    System.out.println("Paragraph " + (i + 1) + " : " + paraTexts[i]);
                }
                System.out.println("=================3=================");
                // Raw text assembled from the document's text pieces
                System.out.println(extractor.getTextFromPieces());
                System.out.println("=================4=================");
                // Endnote text
                System.out.println(extractor.getEndnoteText());
            }
        } else if (lower.endsWith(".docx")) {
            // .docx (Office Open XML): XWPFWordExtractor from poi-ooxml;
            // closing the extractor also closes the XWPFDocument it wraps
            try (InputStream is = new FileInputStream(path);
                 XWPFWordExtractor extractor = new XWPFWordExtractor(new XWPFDocument(is))) {
                result = extractor.getText();
            }
        } else {
            System.out.println("This file is not a Word document");
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return result;
}

3. Read table content
以下代碼包含讀取段落內(nèi)容、表格內(nèi)容
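import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public static void readTableData(String path) {
    try (InputStream is = new FileInputStream(path);
         XWPFDocument doc = new XWPFDocument(is)) {
        // Body-level paragraphs (paragraphs inside table cells are not included)
        List<XWPFParagraph> paras = doc.getParagraphs();
        for (XWPFParagraph para : paras) {
            // Paragraph properties, if needed:
            // CTPPr pr = para.getCTP().getPPr();
            System.out.println(para.getText());
        }
        // All top-level tables in the document
        List<XWPFTable> tables = doc.getTables();
        for (XWPFTable table : tables) {
            // Table properties, if needed:
            // CTTblPr pr = table.getCTTbl().getTblPr();
            // Rows of the table
            for (XWPFTableRow row : table.getRows()) {
                // Cells of the row
                for (XWPFTableCell cell : row.getTableCells()) {
                    System.out.println(cell.getText());
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Reference article: java poi word 表格_java 使用POI 讀寫word 表格 (using POI in Java to read and write Word tables), https://blog.csdn.net/weixin_33045961/article/details/114433011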
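To see what XWPFTable, XWPFTableRow and XWPFTableCell correspond to inside the file: a .docx is a ZIP package whose main body is word/document.xml, where a table is a w:tbl element, each row a w:tr, and each cell a w:tc holding paragraphs (w:p) with text runs (w:t). The sketch below walks that structure using only the JDK and returns each top-level table as rows of cell strings, which keeps the row grouping that readTableData loses when it prints one cell per line. It illustrates the file format and is not a replacement for POI: the class name is hypothetical, nested tables are ignored, and joining a cell's paragraphs with a newline is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads table text straight from the .docx XML, without POI.
final class DocxTableSketch {
    // WordprocessingML main namespace (prefix "w" in document.xml)
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns each top-level table as a list of rows; each row is a list of cell texts.
    // Closes the given stream.
    static List<List<List<String>>> readTables(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    return tablesOf(parse(zip.readAllBytes()));
                }
            }
        }
        throw new IllegalArgumentException("no word/document.xml entry: not a .docx package");
    }

    private static Document parse(byte[] xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        // Refuse DOCTYPE declarations (guards against XXE in untrusted files)
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    }

    private static List<List<List<String>>> tablesOf(Document xml) {
        List<List<List<String>>> tables = new ArrayList<>();
        for (Element body : children(xml.getDocumentElement(), "body")) {
            for (Element tbl : children(body, "tbl")) {      // w:tbl, cf. XWPFTable
                List<List<String>> rows = new ArrayList<>();
                for (Element tr : children(tbl, "tr")) {     // w:tr, cf. XWPFTableRow
                    List<String> cells = new ArrayList<>();
                    for (Element tc : children(tr, "tc")) {  // w:tc, cf. XWPFTableCell
                        cells.add(cellText(tc));
                    }
                    rows.add(cells);
                }
                tables.add(rows);
            }
        }
        return tables;
    }

    // Joins the cell's paragraphs with '\n'; each paragraph is the concatenation of its w:t runs.
    private static String cellText(Element tc) {
        StringJoiner paragraphs = new StringJoiner("\n");
        for (Element p : children(tc, "p")) {
            StringBuilder sb = new StringBuilder();
            NodeList runs = p.getElementsByTagNameNS(W, "t");
            for (int i = 0; i < runs.getLength(); i++) {
                sb.append(runs.item(i).getTextContent());
            }
            paragraphs.add(sb);
        }
        return paragraphs.toString();
    }

    // Direct child elements of parent in the w: namespace with the given local name.
    private static List<Element> children(Node parent, String localName) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                out.add((Element) n);
            }
        }
        return out;
    }
}
```

With POI, the same grid can be built inside readTableData's loops by collecting cell.getText() into a per-row list instead of printing each cell.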