How to index file names (and other file metadata) in Nutch?

It seems that Nutch indexes only (some of) the parse results. It runs the indexing filters, which determine what is indexed. These indexing filters get a Parse result as a parameter. How can I get file names and other file metadata, such as the owner, indexed? Of course I need to add an indexing filter, but to do so, do I also have to add a parser that parses all file types and extracts their metadata?
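
To make the question concrete, here is a minimal sketch of the metadata-gathering part, assuming the indexing filter can see the document URL in addition to the Parse result. It uses only the Java standard library. The class name `FileMetadataFields` and the field names (`filename`, `owner`, `size`, `lastModified`) are hypothetical and not part of Nutch. A custom indexing filter could presumably call a helper like this and add the returned values to the document. Because the values come from the file system rather than from the file contents, this part would not need a per-filetype parser.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not a Nutch class): derives index fields from a URL. */
public class FileMetadataFields {

    /** Returns the last path segment of a URL, or "" if the URL has no path. */
    public static String fileName(String url) {
        // URI.create throws IllegalArgumentException for malformed URLs
        // (for example, ones containing unescaped spaces).
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty()) {
            return "";
        }
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1);
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /**
     * Collects the file name and, for local file: URLs that point to an
     * existing file, the owner, size and last-modified time.
     * Field names are assumptions chosen for illustration.
     */
    public static Map<String, String> collect(String url) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("filename", fileName(url));

        URI uri = URI.create(url);
        if ("file".equalsIgnoreCase(uri.getScheme())) {
            // Paths.get(URI) expects a local file URI without a remote host part.
            Path file = Paths.get(uri);
            if (Files.exists(file)) {
                BasicFileAttributes attrs =
                        Files.readAttributes(file, BasicFileAttributes.class);
                fields.put("owner", Files.getOwner(file).getName());
                fields.put("size", Long.toString(attrs.size()));
                fields.put("lastModified", attrs.lastModifiedTime().toString());
            }
        }
        return fields;
    }
}
```

For a URL such as `file:/tmp/docs/report.pdf`, `fileName` returns `report.pdf`. The owner and size lookups only work when the crawler process can reach the file through the local file system. For files fetched over HTTP or FTP, only the name can be derived this way.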