We are developing a recommendation engine to improve the editorial process for user-contributed wiki articles. The project aims to build a proof-of-concept website, modeled on Wikipedia, that accepts user-submitted articles, filters spam and vandalism, matches articles with editors, and offers semantic search over this restricted index.
The standard keyword model of information retrieval does not represent documents richly enough to support the features described above. We plan to use recent developments in machine learning to address this, offering semantic search through an indexing process that draws on parallel implementations of Incremental Singular Value Decomposition and Latent Dirichlet Allocation. These computationally heavy algorithms are established methods for returning document matches beyond the keyword model through disambiguation and topic discovery.
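To illustrate how a decomposition-based index can match documents beyond shared keywords, here is a minimal sketch of latent semantic indexing. It is only an illustration under stated assumptions: it uses a plain batch SVD from NumPy rather than the incremental, parallel variant the project plans to use, it does not cover Latent Dirichlet Allocation, and the tiny vocabulary, the documents, and the function names (`build_index`, `project_query`, `cosine_scores`) are all hypothetical.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents.
vocab = ["car", "automobile", "engine", "flower", "petal"]
term_doc = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)

def build_index(term_doc, k):
    """Batch SVD truncated to k latent dimensions (not incremental)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k = U[:, :k]
    # Document coordinates in the latent space, one row per document.
    doc_vecs = Vt[:k, :].T * s[:k]
    return U_k, doc_vecs

def project_query(query_counts, U_k):
    """Fold a term-count query vector into the same latent space."""
    return query_counts @ U_k

def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between the query and every document."""
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    return doc_vecs @ query_vec / np.where(denom == 0, 1.0, denom)

U_k, doc_vecs = build_index(term_doc, k=2)

# Query containing only the word "car".
query = np.zeros(len(vocab))
query[vocab.index("car")] = 1.0

keyword_overlap = query @ term_doc  # raw keyword matching per document
scores = cosine_scores(project_query(query, U_k), doc_vecs)

for i, (kw, sc) in enumerate(zip(keyword_overlap, scores)):
    print(f"doc {i}: keyword overlap={kw:.0f}, latent similarity={sc:.2f}")
```

In this toy corpus the second document never mentions "car", so keyword matching gives it no overlap, yet it scores highly in the latent space because it shares "engine" with the first document. The two documents about flowers stay near zero.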
We hope to build a concurrent 'wrapper' around our models of user, document, and editor behavior that will make further use of these parallel implementations. This should allow near-real-time access, indexing, and filtering, and in turn provide a more robust means of document retrieval.