Lucene -- open source text search engine API (lecture notes)

2019-11-18 14:49:42
Source: reposted
Contributed by: a site user

/**
 * This is the plain-text version of a talk on Lucene. If you need the
 * PDF version, contact me (pengjy@263.net).
 * Author: pengjy
 * Date: 2002-04
 * Keywords: lucene, api, token, index, chinese, unicode
 */
................page 1 ................

Lucene

an open source text search engine API
high-performance,
full-featured, pure Java

pengjy@263.net

................page 2 ................
Agenda

Overview
APIs
How Does a Search Engine Work
Features
For Chinese characters

................page 3 ................
Overview

An Apache Jakarta Project
High-performance, full-featured
Open source text search engine APIs
Easy to use, fast to build your own search engine

................page 4 ................
Overview

Version 1.2 rc4
Applications using Lucene:
2a.WebSearch
Jive Forums
RockyNewsgroup.org

................page 5 ................
APIs

org.apache.lucene.analysis
defines an abstract Analyzer API for converting
text from a java.io.Reader into a TokenStream,
an enumeration of Tokens. A TokenStream is composed
by applying TokenFilters to the output of a Tokenizer.
A few simple implementations are provided, including
StopAnalyzer and the grammar-based StandardAnalyzer
(which uses JavaCC).
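
The Tokenizer-plus-TokenFilter composition described above can be sketched in plain Java, without the Lucene classes. The names below (SimpleAnalysisSketch, tokenize, lowerCase, analyze) are hypothetical and only mimic the shape of the idea; they are not Lucene's own signatures.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for the Tokenizer / TokenFilter idea; not Lucene classes.
final class SimpleAnalysisSketch {

    // "Tokenizer": split the input into runs of letters and digits.
    static Iterator<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens.iterator();
    }

    // "TokenFilter": wraps another token stream and lower-cases each token lazily.
    static Iterator<String> lowerCase(final Iterator<String> input) {
        return new Iterator<String>() {
            @Override public boolean hasNext() { return input.hasNext(); }
            @Override public String next() { return input.next().toLowerCase(Locale.ROOT); }
        };
    }

    // "Analyzer": builds the stream by applying the filter to the tokenizer output.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        Iterator<String> stream = lowerCase(tokenize(text));
        while (stream.hasNext()) {
            out.add(stream.next());
        }
        return out;
    }
}
```

Further filters (stop words, stemming) would wrap the stream in the same way, which is why the filter takes and returns the same stream type.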

................page 6 ~9................
APIs

org.apache.lucene.document
provides a simple Document class. A document is
simply a set of named Fields, whose values may be
strings or instances of java.io.Reader.

org.apache.lucene.index
provides two primary classes: IndexWriter, which
creates and adds documents to indices; and IndexReader,
which accesses the data in the index.

org.apache.lucene.queryParser
uses JavaCC to implement a QueryParser

org.apache.lucene.search
provides data structures to represent queries
(TermQuery for individual words, PhraseQuery for phrases,
and BooleanQuery for boolean combinations of queries) and
the abstract Searcher which turns queries into Hits.
IndexSearcher implements search over a single IndexReader.

org.apache.lucene.store
defines an abstract class for storing persistent
data, the Directory, a collection of named files written
by an OutputStream and read by an InputStream. Two
implementations are provided, FSDirectory, which uses
a file system directory to store files, and RAMDirectory
which implements files as memory-resident data structures.

org.apache.lucene.util
contains a few handy data structures, e.g.,
BitVector and PriorityQueue.

................page 10 ................
How Does a Search Engine Work

Create indices

input --> analyzer --> filters --> tokens --> indices
             ^
             |
          tokenize
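
As a rough illustration of the last step of this pipeline, the sketch below turns analyzed tokens into a toy in-memory inverted index (term to document ids). The class InvertedIndexSketch and its methods are hypothetical; Lucene's real index is file-based and far more elaborate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// A toy in-memory inverted index; an illustration only, not Lucene's on-disk format.
final class InvertedIndexSketch {
    // term -> ascending list of ids of the documents that contain it
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Analyze the text (lower-case, split on non-alphanumerics) and record its terms.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // Doc ids only grow, so checking the last entry is enough to avoid duplicates.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    // Return a copy of the ids of all documents containing the term.
    List<Integer> search(String term) {
        List<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        return docs == null ? new ArrayList<Integer>() : new ArrayList<Integer>(docs);
    }
}
```

Searching is then a dictionary lookup rather than a scan of every document, which is the point of building the index up front.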

................page 11 ~ 14 ................
How Does a Search Engine Work

Store Indices
Rather than maintaining a single index, Lucene builds
multiple index segments. For each new document indexed,
Lucene creates a new index segment.
It merges small segments with larger ones -- this
keeps the total number of segments small so searches remain
fast.

To prevent conflicts (or locking overhead) between
index readers and writers, Lucene never modifies segments
in place; it only creates new ones. When merging segments,
Lucene writes a new segment and deletes the old ones --
after any active readers have closed them.
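
The "never modify in place, only write new segments" idea can be modeled in a few lines. This is a simplified sketch under stated assumptions: a segment is just a map from term to document ids, and SegmentMergeSketch, Segment and merge are hypothetical names, not Lucene's classes or its actual merge policy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of write-once segments; the merge shown here is an assumption.
final class SegmentMergeSketch {

    // A segment holds term -> doc ids; the term map is wrapped read-only after creation.
    static final class Segment {
        final Map<String, List<Integer>> postings;

        Segment(Map<String, List<Integer>> postings) {
            this.postings = Collections.unmodifiableMap(
                    new TreeMap<String, List<Integer>>(postings));
        }
    }

    // Merging never touches its inputs: it builds a brand-new segment from copies.
    static Segment merge(Segment a, Segment b) {
        Map<String, List<Integer>> merged = new TreeMap<>();
        for (Segment s : new Segment[] {a, b}) {
            for (Map.Entry<String, List<Integer>> e : s.postings.entrySet()) {
                merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                      .addAll(e.getValue());
            }
        }
        for (List<Integer> docs : merged.values()) {
            Collections.sort(docs);
        }
        return new Segment(merged);
    }
}
```

Because the inputs stay untouched, a reader that still holds an old segment keeps seeing consistent data until it closes it; only then can the old segment be discarded.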

A Lucene index segment consists of several files:
A dictionary index containing one entry for each 100 entries
in the dictionary
A dictionary containing one entry for each unique word
A postings file containing an entry for each posting

Since Lucene never updates segments in place, they
can be stored in flat files instead of complicated B-trees.
For quick retrieval, the dictionary index contains offsets
into the dictionary file, and the dictionary holds offsets
into the postings file.
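
The two-level lookup (a small dictionary index pointing into the full dictionary) can be sketched as follows. Assumptions: list positions stand in for file offsets, the terms passed in are unique, and SparseDictionarySketch with its interval parameter is a hypothetical name, not a Lucene class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy two-level dictionary: every interval-th term is copied into a small index.
final class SparseDictionarySketch {
    private final List<String> dictionary;                      // all unique terms, sorted
    private final List<String> indexTerms = new ArrayList<>();  // every interval-th term
    private final int interval;

    SparseDictionarySketch(List<String> uniqueTerms, int interval) {
        this.dictionary = new ArrayList<>(uniqueTerms);
        Collections.sort(this.dictionary);
        this.interval = interval;
        for (int i = 0; i < dictionary.size(); i += interval) {
            indexTerms.add(dictionary.get(i));
        }
    }

    // Returns the position of the term in the dictionary, or -1 if it is absent.
    int lookup(String term) {
        int slot = Collections.binarySearch(indexTerms, term);
        if (slot < 0) {
            slot = -slot - 2;   // index entry just before the insertion point
        }
        if (slot < 0) {
            return -1;          // term sorts before the first dictionary entry
        }
        int start = slot * interval;
        int end = Math.min(start + interval, dictionary.size());
        for (int i = start; i < end; i++) {   // short sequential scan of one block
            if (dictionary.get(i).equals(term)) {
                return i;
            }
        }
        return -1;
    }
}
```

With an interval of 100, as on the slide, only one index entry per 100 dictionary entries has to be searched before a short scan of at most 100 terms.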

Lucene also implements a variety of tricks to compress
the dictionary and posting files -- thereby reducing disk
I/O -- without incurring substantial CPU overhead.
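
One such trick, shown here only as an illustrative sketch, is to store the gaps between sorted document ids rather than the ids themselves, and to write each gap as a variable-length integer (7 data bits per byte, with the high bit marking that more bytes follow). The class and method names are hypothetical, and the input is assumed to be non-negative and ascending.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Gap (delta) encoding plus variable-length integers for a posting list.
final class PostingCompressionSketch {

    // Encode ascending, non-negative doc ids as variable-length gaps.
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int gap = docId - previous;
            previous = docId;
            while ((gap & ~0x7F) != 0) {          // more than 7 bits left
                out.write((gap & 0x7F) | 0x80);   // low 7 bits, continuation bit set
                gap >>>= 7;
            }
            out.write(gap);                       // final byte, continuation bit clear
        }
        return out.toByteArray();
    }

    // Reverse of encode: read the gaps and sum them back into doc ids.
    static List<Integer> decode(byte[] bytes) {
        List<Integer> docIds = new ArrayList<>();
        int previous = 0;
        int i = 0;
        while (i < bytes.length) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[i++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            docIds.add(previous);
        }
        return docIds;
    }
}
```

Small gaps fit in a single byte, so frequent terms with densely packed postings cost far less than four bytes per entry, and decoding is only a handful of shifts and masks.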

................page 15 ~ 22 ................
Features

Incremental indexing
Incremental indexing allows easy adding of documents to
an existing index. Lucene supports both incremental and batch
indexing.

Data sources
Lucene allows developers to deliver the document to the
indexer through a String or an InputStream, permitting the
data source to be abstracted from the data. However, with
this approach, the developer must supply the appropriate
readers for the data.

Indexing control
Some search engines can automatically crawl through a
directory tree or a Website to find documents to index.
Since Lucene operates primarily in incremental mode, it lets
the application find and retrieve documents.

File formats
Lucene supports a filter mechanism, which offers a simple
alternative to indexing word processing documents, SGML
documents, and other file formats.

Content tagging
Lucene supports content tagging by treating documents
as collections of fields, and supports queries that
specify which field(s) to search. This permits semantically
richer queries like "author contains 'Hamilton' AND body
contains 'Constitution'".
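
To make the fielded query concrete, here is a minimal sketch that treats each document as a map of field names to text and evaluates the example query with a linear scan. FieldQuerySketch and its helpers are hypothetical; a real engine would answer this from per-field index entries rather than by scanning.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Documents as collections of named fields; a linear scan stands in for the index.
final class FieldQuerySketch {

    // Build a two-field document.
    static Map<String, String> doc(String author, String body) {
        Map<String, String> fields = new HashMap<>();
        fields.put("author", author);
        fields.put("body", body);
        return fields;
    }

    // True when the named field exists and contains the word (case-insensitive substring).
    static boolean contains(Map<String, String> doc, String field, String word) {
        String value = doc.get(field);
        return value != null
                && value.toLowerCase(Locale.ROOT).contains(word.toLowerCase(Locale.ROOT));
    }

    // author contains 'Hamilton' AND body contains 'Constitution'
    static List<Map<String, String>> search(List<Map<String, String>> docs) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> d : docs) {
            if (contains(d, "author", "Hamilton") && contains(d, "body", "Constitution")) {
                hits.add(d);
            }
        }
        return hits;
    }
}
```

A document whose body mentions Hamilton but whose author field does not is correctly excluded, which an untagged full-text search could not guarantee.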

Stop-word processing
Search engines will not index certain words, called stop
words, such as "a", "and", and "the". Lucene handles stop
words with the more general Analyzer mechanism, and provides
the StopAnalyzer class, which eliminates stop words from the
input stream.
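
A stop-word filter of this kind can be sketched in plain Java as below. The class name StopWordSketch and the three-word list are illustrative assumptions; this is not Lucene's StopAnalyzer or its default word list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stop-word filter; the word list is illustrative only.
final class StopWordSketch {
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "and", "the"));

    // Lower-case the text, split it into tokens, and drop the stop words.
    static List<String> analyze(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

The same analysis has to be applied to queries as to documents; otherwise a query containing a stop word would look for a term that was never indexed.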

Query features
Lucene supports a wide range of query features, including
all of those listed below:
Boolean queries, such as AND queries.
Returning a "relevance" score with each hit.
Adjacency or proximity queries -- "search followed
by engine" or "Knicks near Celtics".
Searching on single keywords.
Searching multiple indexes at once and merging the results to
give a meaningful relevance score.

However, Lucene does not support the valuable "Soundex",
or "sounds like," query.

Concurrency
Lucene allows users to search an index transactionally,
even if another user is simultaneously updating the index.

Non-English support
As Lucene preprocesses the input stream through the
Analyzer class provided by the developer, it is possible to
perform language-specific filtering.

................page 23 ................
For Chinese characters

JavaCC -- the Java Compiler Compiler.

It can be used to:
build complex compilers for languages such as
Java or C++.
write tools that parse Java source code and perform
automatic analysis or transformation tasks.
Grammars are written in EBNF (Extended Backus-Naur Form).

................page 24 ................
For Chinese characters

org.apache.lucene.analysis.standard.StandardTokenizer.jj

TOKEN : { // token patterns
<ALPHANUM: (<LETTER>|<DIGIT>)+ >
| <EMAIL: <ALPHANUM> "@" <ALPHANUM> ("." <ALPHANUM>)+ > // email address

}

................page 25 ................
For Chinese characters

Add Unicode CJK to StandardTokenizer.jj
< #UNICODECJK:
[
"\u4e00"-"\u9faf", //CJK Unified Ideographs
"\u3400"-"\u4dbf", //CJK Unified Ideographs Extension A
"\u3000"-"\u303f", //CJK Symbols and Punctuation
"\u2e80"-"\u2eff", //CJK Radicals Supplement
"\u3200"-"\u32ff", //Enclosed CJK Letters and Months
"\ufe30"-"\ufe4f", //CJK Compatibility Forms
"\u3300"-"\u33ff", //CJK Compatibility
"\uf900"-"\ufaff" //CJK Compatibility Ideographs
]>
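
To check which characters such a character class would accept, the same ranges can be tested in plain Java. This is only a sketch for experimenting with the ranges listed above; CjkRangeSketch is a hypothetical helper and is not part of the generated tokenizer.

```java
// Membership test for the ranges listed in the UNICODECJK character class.
final class CjkRangeSketch {
    // Inclusive [start, end] pairs, in the same order as the grammar fragment.
    private static final char[][] RANGES = {
        {'\u4e00', '\u9faf'},   // CJK Unified Ideographs
        {'\u3400', '\u4dbf'},   // CJK Unified Ideographs Extension A
        {'\u3000', '\u303f'},   // CJK Symbols and Punctuation
        {'\u2e80', '\u2eff'},   // CJK Radicals Supplement
        {'\u3200', '\u32ff'},   // Enclosed CJK Letters and Months
        {'\ufe30', '\ufe4f'},   // CJK Compatibility Forms
        {'\u3300', '\u33ff'},   // CJK Compatibility
        {'\uf900', '\ufaff'}    // CJK Compatibility Ideographs
    };

    // True when the character falls inside any of the listed ranges.
    static boolean isCjk(char c) {
        for (char[] range : RANGES) {
            if (c >= range[0] && c <= range[1]) {
                return true;
            }
        }
        return false;
    }

    // Count the characters of the string that fall inside the listed ranges.
    static int countCjk(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isCjk(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }
}
```

Note that the list includes the CJK Symbols and Punctuation block, so punctuation such as the ideographic full stop also counts as a match here.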

................page 26 ................
For Chinese characters

Add Unicode CJK
Build Lucene (use Lucene 1.2 src and Ant 1.4)
Test: Windows 2000 Server + WebLogic 6.1 SP2 +
MS SQL Server 2000 + Jive 2.2.3 + Lucene


................page 27 ................

Thank you!

My email: pengjy@263.net

................The end ................
