Dissect Lucene - The File System
org.apache.lucene.store
The three abstract classes Directory, InputStream, and OutputStream together form an abstract file system.
Directory defines the basic operations of a simple file system:
    public abstract class Directory {
        /** Returns an array of strings, one for each file in the directory. */
        public abstract String[] list()
            throws IOException;

        /** Returns true iff a file with the given name exists. */
        public abstract boolean fileExists(String name)
            throws IOException;

        /** Returns the time the named file was last modified. */
        public abstract long fileModified(String name)
            throws IOException;

        /** Set the modified time of an existing file to now. */
        public abstract void touchFile(String name)
            throws IOException;

        /** Removes an existing file in the directory. */
        public abstract void deleteFile(String name)
            throws IOException;

        /** Renames an existing file in the directory.
            If a file already exists with the new name, then it is replaced.
            This replacement should be atomic. */
        public abstract void renameFile(String from, String to)
            throws IOException;

        /** Returns the length of a file in the directory. */
        public abstract long fileLength(String name)
            throws IOException;

        /** Creates a new, empty file in the directory with the given name.
            Returns a stream writing this file. */
        public abstract OutputStream createFile(String name)
            throws IOException;

        /** Returns a stream reading an existing file. */
        public abstract InputStream openFile(String name)
            throws IOException;

        /** Construct a {@link Lock}.
         * @param name the name of the lock file
         */
        public abstract Lock makeLock(String name);

        /** Closes the store. */
        public abstract void close()
            throws IOException;
    }
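InputStream and OutputStream define input from and output to a file, respectively. These are the classes in org.apache.lucene.store, not java.io.InputStream and java.io.OutputStream, and a "file" here is not a java.io.File.
Lucene performs all operations on its (index) files through these three basic classes rather than through the Java I/O API directly. The benefit is one more layer of abstraction and one less layer of coupling.
For example, Lucene provides:
FSDirectory - File System Directory, backed by the real on-disk file system
RAMDirectory - Random Access Memory Directory, a virtual file system held in memory
In the same way, we could implement others, such as:
MYSQLDirectory - a file system stored in a MySQL database
Many problems are solved this way, by adding one more layer of abstraction.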
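To make the decoupling concrete, here is a minimal sketch of the same idea. It does not use Lucene's API: `SimpleDirectory`, `InMemoryDirectory`, and `SegmentWriter` are hypothetical names invented for illustration, and whole-file byte arrays stand in for the stream classes. The point is only that the caller sees the interface, so the storage backend can be swapped.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the Directory abstraction (not Lucene's API).
interface SimpleDirectory {
    String[] list() throws IOException;
    boolean fileExists(String name) throws IOException;
    void writeFile(String name, byte[] data) throws IOException;
    byte[] readFile(String name) throws IOException;
    void deleteFile(String name) throws IOException;
}

// One possible backend: every "file" is a byte array kept in a map.
class InMemoryDirectory implements SimpleDirectory {
    private final Map<String, byte[]> files = new HashMap<>();

    public synchronized String[] list() {
        return files.keySet().toArray(new String[0]);
    }

    public synchronized boolean fileExists(String name) {
        return files.containsKey(name);
    }

    public synchronized void writeFile(String name, byte[] data) {
        files.put(name, data.clone()); // copy so callers cannot mutate stored data
    }

    public synchronized byte[] readFile(String name) throws IOException {
        byte[] data = files.get(name);
        if (data == null) throw new IOException("no such file: " + name);
        return data.clone();
    }

    public synchronized void deleteFile(String name) throws IOException {
        if (files.remove(name) == null) throw new IOException("no such file: " + name);
    }
}

// Client code depends only on the interface, never on a concrete backend.
class SegmentWriter {
    static void writeSegment(SimpleDirectory dir, String name, byte[] payload)
            throws IOException {
        dir.writeFile(name, payload);
    }
}
```

A disk-backed or database-backed class implementing the same interface could be passed to `SegmentWriter.writeSegment` without changing the caller.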
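1) FSDirectory
File System Directory
It consists of three classes: FSDirectory, FSInputStream, and FSOutputStream.
It is simple: a wrapper around Java I/O.
FSDirectory's constructor is private; instead, the class provides a factory method to construct FSDirectory instances. The instances are cached, for two main reasons: efficiency and synchronization.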
    /** Returns the directory instance for the named location.
     *
     * <p>Directories are cached, so that, for a given canonical path, the same
     * FSDirectory instance will always be returned. This permits
     * synchronization on directories.
     *
     * @param file the path to the directory.
     * @param create if true, create, or erase any existing contents.
     * @return the FSDirectory for the named file. */
    public static FSDirectory getDirectory(File file, boolean create)
        throws IOException {
        file = new File(file.getCanonicalPath());
        FSDirectory dir;
        synchronized (DIRECTORIES) {
            dir = (FSDirectory)DIRECTORIES.get(file);
            if (dir == null) {
                dir = new FSDirectory(file, create);
                DIRECTORIES.put(file, dir);
            } else if (create) {
                dir.create();
            }
        }
        synchronized (dir) {
            dir.refCount++;
        }
        return dir;
    }
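The caching pattern in this factory method can be tried out in isolation. The following is a rough sketch, not Lucene's code: `CachedDir` and its `references()` helper are hypothetical, and unlike the listing above it guards both the cache and the reference count with a single lock to keep the example short. It shows that two different spellings of the same path yield the same instance, and that the instance leaves the cache once every holder has closed it.

```java
import java.io.File;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a cached, reference-counted factory (not Lucene's FSDirectory).
class CachedDir {
    // Canonical path -> shared instance. This map also serves as the single lock.
    private static final Map<File, CachedDir> CACHE = new HashMap<>();

    private final File path;
    private int refCount; // guarded by CACHE

    // Private constructor: instances can only be obtained through getDirectory.
    private CachedDir(File path) {
        this.path = path;
    }

    static CachedDir getDirectory(File file) throws IOException {
        // Canonicalize so that "index" and "./index" map to the same cache key.
        File canonical = file.getCanonicalFile();
        synchronized (CACHE) {
            CachedDir dir = CACHE.get(canonical);
            if (dir == null) {
                dir = new CachedDir(canonical);
                CACHE.put(canonical, dir);
            }
            dir.refCount++;
            return dir;
        }
    }

    // Drops one reference; the last close evicts the instance from the cache.
    void close() {
        synchronized (CACHE) {
            if (--refCount <= 0) {
                CACHE.remove(path);
            }
        }
    }

    int references() {
        synchronized (CACHE) {
            return refCount;
        }
    }
}
```

Because every caller for a given canonical path receives the same object, code that needs mutual exclusion on a directory can simply synchronize on that object.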
2) RAMDirectory
Memory resident Directory
It consists of RAMFile, RAMDirectory, RAMInputStream, and RAMOutputStream.
It uses a simple, straightforward in-memory structure to model the whole virtual file system.
File: RAMFile
    class RAMFile {
        Vector buffers = new Vector();
        long length;
        long lastModified = System.currentTimeMillis();
    }
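RAMFile is the basic building block of this virtual file system. It simulates a file's byte stream with a Vector of buffers, each BUFFER_SIZE bytes long.
Directory: a Hashtable mapping each file name (String) to its RAMFile, held in RAMDirectory.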
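How a list of fixed-size buffers can stand in for one contiguous byte stream is easiest to see in code. The sketch below is an illustration under assumptions, not Lucene's implementation: `BlockFile` is a hypothetical class, the block size of 1024 is an assumed value, and an `ArrayList` replaces the `Vector`. A byte position is split into a buffer index (position / BUFFER_SIZE) and an offset inside that buffer (position % BUFFER_SIZE).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a block-buffered in-memory file (not Lucene's RAMFile).
class BlockFile {
    static final int BUFFER_SIZE = 1024; // assumed block size, for illustration only

    private final List<byte[]> buffers = new ArrayList<>();
    private long length; // number of bytes written so far

    // Appends src to the end of the file, allocating new blocks as needed.
    void append(byte[] src) {
        int offset = 0;
        while (offset < src.length) {
            int bufferIndex = (int) (length / BUFFER_SIZE);
            int bufferOffset = (int) (length % BUFFER_SIZE);
            if (bufferIndex == buffers.size()) {
                buffers.add(new byte[BUFFER_SIZE]);
            }
            // Copy as much as fits into the current block.
            int n = Math.min(BUFFER_SIZE - bufferOffset, src.length - offset);
            System.arraycopy(src, offset, buffers.get(bufferIndex), bufferOffset, n);
            offset += n;
            length += n;
        }
    }

    // Random access: locate the block, then the byte inside it.
    byte readByte(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        return buffers.get((int) (pos / BUFFER_SIZE))[(int) (pos % BUFFER_SIZE)];
    }

    long length() {
        return length;
    }

    int bufferCount() {
        return buffers.size();
    }
}
```

With this layout, appending 2500 bytes allocates three blocks, and only the last one is partly filled; the `length` field records where the real data ends.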