CorMine Intelligent Data

CorMine Intelligent Data's Knowledge Management Framework

We’ve developed a data-centric model built on proprietary processes and technologies that capture and organize information through automated analysis of the content of large data stores and the use of controlled vocabularies: taxonomies, thesauri, and ontologies. This approach is flexible enough to adapt to each client’s unique needs and powerful enough to support an inclusive suite of tools and services that give researchers and analysts the most direct access possible to the information they need.

Our approach includes the following:

Life Cycle of Data:
We recognize that data has a shelf life. The accuracy, applicability and appropriateness of content change over time as document stores grow and change via additions, revisions and shifts in context. Our technologies and processes are designed with this life cycle in mind. Using iterative capture and classification processes, sophisticated document version and change records, and meaningful data archives, we can make sure that the information our technologies organize and retrieve is accurate, up to date and in context.
Automated Large-Scale Text Classification:
Our classifiers are designed to discover and map relationships between concepts present in very large data stores via a number of advanced statistical methods. Depending upon the nature of the available data and the needs of our customers, the output of the classifier can be used to enrich and extend an existing taxonomy or related organizational hierarchy, to aid the development of new taxonomies or, once loaded into one of our custom search engines, to search and browse the concepts in the data itself.
Dynamic Knowledge Bases:
We call the principal product of our classification tools and techniques Dynamic Knowledge Bases (DKBs) to distinguish them from conventional, current-generation databases. Our DKBs differ from conventional data stores in that they are organized both by explicit metadata (like traditional databases) and by the implicit concepts, the knowledge, that the data itself contains. Since the data structures are based on analysis of the content in the data rather than on arbitrarily applied (or top-down) labels, users are able to conduct searches based upon a consistent conceptual framework, in conjunction with keyword and full-text searches. As the data grows and changes, regular iterations of our classification processes allow the dynamic expansion and enrichment of the organizational structure used to map the concepts contained in the data, thus ensuring that the conceptual framework that drives the searches is always relevant and up to date.
Multi-Source Fusion:
Our methods and technologies are designed to be format and source agnostic. This allows us to take the data wherever and however we find it and to classify it in a common conceptual space. The content in this space may comprise many separate document sources, formats and genres. Our custom search engines allow users to search and retrieve, via a single set of queries, a wide variety of documents using concepts inherent in the combined corpora.
Concept-Based Searches:
Keyword searches and straight text matching have inherent disadvantages. Keyword searches are only as useful as the keywords applied to a given document, which are often of limited scope and arbitrarily assigned. Text searches depend on users predicting the natural language used in the desired documents. Concept searches, by comparison, are tremendously powerful and allow for effective and scalable searches based on the ideas contained within a single document and across the corpora as a whole.
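To make the contrast concrete, here is a minimal Python sketch, assuming a tiny hand-built thesaurus. The thesaurus, the sample documents and the function names are all hypothetical illustrations and do not describe any CorMine product or algorithm. It shows a literal keyword match missing a relevant document that a concept lookup finds.

```python
# Hypothetical thesaurus: maps surface terms to a shared concept label.
THESAURUS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "physician": "doctor",
    "doctor": "doctor",
}

# Hypothetical document store.
DOCUMENTS = {
    "doc1": "the automobile market grew last year",
    "doc2": "a physician reviewed the trial results",
    "doc3": "car sales fell in the spring",
}


def keyword_search(query, documents):
    """Return ids of documents containing the literal query word."""
    return sorted(
        doc_id for doc_id, text in documents.items() if query in text.split()
    )


def concepts_in(text, thesaurus):
    """Map each word to its concept label, skipping words not in the thesaurus."""
    return {thesaurus[word] for word in text.split() if word in thesaurus}


def concept_search(query, documents, thesaurus):
    """Return ids of documents sharing at least one concept with the query."""
    wanted = concepts_in(query, thesaurus)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if wanted & concepts_in(text, thesaurus)
    )


print(keyword_search("car", DOCUMENTS))             # literal match only
print(concept_search("car", DOCUMENTS, THESAURUS))  # also finds "automobile"
```

In this toy example the keyword search for "car" returns only the document that uses that exact word, while the concept search also returns the document that says "automobile".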
Text Analytics:
Our classification techniques use a series of statistical algorithms that take advantage of the latent conceptual and semantic content of large sets of documents to uncover common concepts that may occur in various forms in natural language. We combine the output of these technologies and processes with a variety of powerful analytic tools to uncover patterns and continuities, to strengthen search and retrieval via our engines and, by way of thesauri and taxonomies, to enrich and expand the quality and effectiveness of the search tool itself.
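As a rough illustration of how statistics over a document set can reveal that two different words express the same concept, the Python sketch below builds simple co-occurrence vectors and compares them with cosine similarity. This is a generic textbook technique chosen for the example, not the proprietary algorithms described above, and the sample documents and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical mini-corpus: "car" and "automobile" never appear together,
# but they appear alongside the same neighboring words.
DOCS = [
    "engine fuel car",
    "engine fuel automobile",
    "patient clinic doctor",
    "patient clinic physician",
]


def cooccurrence_vectors(docs):
    """For each word, count the other words it shares a document with."""
    vectors = defaultdict(Counter)
    for doc in docs:
        words = set(doc.split())
        for word in words:
            for other in words:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = cooccurrence_vectors(DOCS)
# Words used in the same contexts score high; unrelated words score zero.
print(cosine(vectors["car"], vectors["automobile"]))
print(cosine(vectors["car"], vectors["doctor"]))
```

Here "car" and "automobile" receive near-identical vectors because they share the neighbors "engine" and "fuel", even though no single document contains both words.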
Taxonomy and Ontology Utilization and Enhancement:
Our flexible classification technologies can be harnessed to take advantage of existing taxonomies, to enhance these taxonomies via the analytics associated with our tools, or even to build new ones from the concepts embedded in the data. Using these taxonomies as a framework, our tools can help to produce powerful ontologies that relate document sets to the conceptual schemata of a particular domain or enterprise.
Data Capture and Normalization:
Using a variety of spiders and crawlers, we can find and retrieve all manner of text-based data regardless of its format and location. Our parsers and preprocessors allow us to classify and sort disparate data types (technical articles, patent applications, emails, blogs, wikis, research proposals, engineering specifications, progress reports, websites, etc.) all within a single conceptual space.
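The idea of normalization can be sketched in a few lines of Python: each source format gets its own small parser, and every parser emits the same record shape so that downstream classification sees uniform input. The `Record` type, the two parsers and the sample inputs are hypothetical, and the regex-based HTML handling is a deliberate simplification for the example rather than a robust parser.

```python
import re
from dataclasses import dataclass
from html import unescape


@dataclass
class Record:
    """Hypothetical common shape shared by every source format."""
    source: str
    title: str
    body: str


def _clean(text):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def from_email(raw):
    """Split headers from body at the first blank line; use Subject as title."""
    headers, _, body = raw.partition("\n\n")
    subject = ""
    for line in headers.splitlines():
        if line.lower().startswith("subject:"):
            subject = line.split(":", 1)[1]
    return Record("email", _clean(subject), _clean(body))


def from_html(raw):
    """Use <title> as title; strip the remaining tags to get plain body text."""
    match = re.search(r"<title>(.*?)</title>", raw, flags=re.I | re.S)
    title = match.group(1) if match else ""
    without_title = re.sub(r"<title>.*?</title>", " ", raw, flags=re.I | re.S)
    text = re.sub(r"<[^>]+>", " ", without_title)
    return Record("web", _clean(unescape(title)), _clean(unescape(text)))


email = from_email("From: a@example.com\nSubject: Test   report\n\nHello\n  world")
page = from_html("<html><title>Spec &amp; Notes</title><body><p>Line one</p></body></html>")
print(email)
print(page)
```

Once every source is reduced to the same record shape, a single classifier or index can process emails, web pages and other formats without knowing where each record came from.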
Tailored Search and Retrieval Engines:
We’ve built our search engines from the ground up to take advantage of the strengths of our DKB technologies. Our Zeugma research engine is designed to be tailored to your specific data and your particular needs. Build searches based upon content areas or groups of concepts. Perform federated searches. Store, organize and share document sets according to your particular needs, and build upon concepts by grouping document sets, searches and search agents, all within an intuitive and powerful interface.

The dogmas of the quiet past are inadequate to the stormy present. The occasion is piled high with difficulty, and we must rise with the occasion. As our case is new, so we must think anew and act anew.
Abraham Lincoln

© Copyright 2006 by CorMine Intelligent Data, all rights reserved