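This article shows, by example, how to read the contents of Doc, Docx, and Pdf documents in C#. It is shared here for reference. The details are as follows:
Doc documents: Microsoft Word 14.0 Object Library (a GAC object; Word must be installed before it can be called. The COM version number differs depending on the installed version of Word.)
Docx documents: Microsoft Word 14.0 Object Library (a GAC object; Word must be installed before it can be called. The COM version number differs depending on the installed version of Word.)
Pdf documents: PDFBox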
/* Author: GhostBear */
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Text.RegularExpressions;
using org.pdfbox.pdmodel;
using org.pdfbox.util;
using Microsoft.Office.Interop.Word;

namespace TestPdfReader
{
    class Program
    {
        static void Main(string[] args)
        {
            // PDF: extract the text with PDFBox.
            PDDocument doc = PDDocument.load(@"C:\resume.pdf");
            PDFTextStripper pdfStripper = new PDFTextStripper();
            string text = pdfStripper.getText(doc);
            doc.close();
            // Turn tabs and line breaks into spaces, then remove every space.
            string result = text.Replace('\t', ' ').Replace('\n', ' ').Replace('\r', ' ').Replace(" ", "");
            Console.WriteLine(result);

            // Doc, Docx: both formats are opened through the same Word interop call.
            object docPath = @"C:\resume.doc";
            object docxPath = @"C:\resume.docx"; // pass this instead of docPath to read the .docx file
            object missing = System.Reflection.Missing.Value;
            object readOnly = true;
            Application wordApp;
            wordApp = new Application();
            Document wordDoc = wordApp.Documents.Open(ref docPath, ref missing, ref readOnly,
                ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing);
            string text2 = FilterString(wordDoc.Content.Text);
            // Close the document and quit Word so no WINWORD process is left running.
            wordDoc.Close(ref missing, ref missing, ref missing);
            wordApp.Quit(ref missing, ref missing, ref missing);
            Console.WriteLine(text2);
            Console.Read();
        }

        // Removes bell characters, tabs, line feeds, and any other whitespace.
        private static string FilterString(string input)
        {
            return Regex.Replace(input, @"(\a|\t|\n|\s+)", "");
        }
    }
}
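The listing above strips every whitespace character, which also joins adjacent words together. If readable text is preferred, one possible variant collapses each run of whitespace into a single space instead. This is only a sketch and not part of the original code: `TextCleanup` and `CollapseWhitespace` are hypothetical names, and it uses nothing beyond `System.Text.RegularExpressions`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original listing.
static class TextCleanup
{
    // Collapses tabs, line breaks, and repeated spaces into single spaces.
    public static string CollapseWhitespace(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        // The bell character (\a) is not matched by \s, so replace it first.
        // The original filter strips the same character.
        string withoutBell = input.Replace("\a", " ");

        // Replace each run of whitespace with one space and trim both ends.
        return Regex.Replace(withoutBell, @"\s+", " ").Trim();
    }
}
```

It could be called in place of the original filter, for example `TextCleanup.CollapseWhitespace(wordDoc.Content.Text)` or `TextCleanup.CollapseWhitespace(pdfStripper.getText(doc))`.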
I hope this article is helpful for your C# programming.