Surpassing Search: New Xerox text mining software goes beyond "keywords" to deliver more relevant information

What it is: New Xerox text mining software for service offerings is discriminating, smart and easy to use.

How it's different: Takes ordinary search to the next level by digging into more documents, analyzing the meaning of words in context, and accepting queries in everyday language.

Why it matters: Makes it easy to retrieve information from massive databases in legal cases, fraud detection, drug discovery, risk management, and more.

Researchers at Xerox unveiled FactSpotter, new document search software that goes beyond conventional "keyword" search, enabling it, in effect, to spot the one or two golden nuggets among the pebbles on the shore. Developed in Grenoble, France, by researchers at the Xerox Research Centre Europe, the new text mining software combines a powerful linguistic engine with an easy-to-use interface so that anyone can query the system in everyday language. Unlike traditional enterprise search tools, FactSpotter looks not only for the keywords contained in a query but also at the context of the document that contains those words. For example, if searching for documents that reference Angelina Jolie, FactSpotter will also return results where the pronoun "she" is used instead of Jolie's full name.
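The pronoun example above can be illustrated with a small Python sketch. This is only a toy heuristic, not FactSpotter's linguistic engine, whose internals are not described here: it assumes a known list of names and simply links "she" or "he" to the person named most recently in an earlier sentence. The function name `sentences_about`, the `PRONOUNS` set and the sample text are all hypothetical.

```python
import re

# Assumption: only these two pronouns are handled in this toy example.
PRONOUNS = {"she", "he"}

def sentences_about(name, known_names, text):
    """Return sentences that mention `name` directly, or through a pronoun
    whose most recent antecedent (from an earlier sentence) is `name`."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    hits = []
    antecedent = None  # person named most recently in a previous sentence
    for sentence in sentences:
        lowered = sentence.lower()
        words = set(re.findall(r"[a-z]+", lowered))
        mentioned = [n for n in known_names if n.lower() in lowered]
        by_pronoun = antecedent == name and bool(words & PRONOUNS)
        if name in mentioned or by_pronoun:
            hits.append(sentence)
        if mentioned:
            # Remember the name that appears last in this sentence.
            antecedent = max(mentioned, key=lambda n: lowered.rfind(n.lower()))
    return hits

TEXT = (
    "Angelina Jolie attended the premiere. She signed autographs afterwards. "
    "Brad Pitt arrived late. He left early."
)

for sentence in sentences_about("Angelina Jolie", ["Angelina Jolie", "Brad Pitt"], TEXT):
    print(sentence)
```

A plain keyword match on "Angelina Jolie" would return only the first sentence; the heuristic also picks up the second, where she is referred to as "She". Real coreference resolution is far more involved than this nearest-antecedent rule.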

The "smart" search engine can comb through almost any document regardless of the language, location, format or type; take advantage of the way humans think, speak and ask questions; and discriminate among the results, highlighting just a handful of relevant answers instead of returning thousands of unrelated responses.

"Our advanced search engine goes beyond today's typical 'keyword' search or current data-mining programs, which typically end up searching only 40 percent of all the documents that are relevant because the keywords are too limiting," said Frédérique Segond, manager of parsing and semantics research at XRCE. "Xerox's tool is more accurate because it delves into documents, extracting the concepts and the relationships among them. By 'understanding' the context, it returns the right information to the searcher, and it even highlights the exact location of the answer within the document."

FactSpotter is part of Xerox's ongoing intelligent document technology research that complements its growing portfolio of services-related innovations. The technology helps customers better manage data- and document-intensive work processes in industries like banking, finance and legal. Xerox plans to launch FactSpotter next year as part of its Xerox Litigation Services offerings, which include electronic discovery (e-discovery) services that primarily support legal and regulatory compliance.

"Today's knowledge worker has quite a task in front of them. Each and every day they search for specific data, information, or corporate knowledge in order to do their job well," said Mike Maziarka, director, InfoTrends Dynamic Content Software and Image Scanning Trends Consulting Services. "We all need tools that will make it easier to search for that 'needle' among the 'haystack' of masses of information that exist in our world today. FactSpotter meets this need because it can make searches easier to conduct, more accurate, and more encompassing. This ultimately improves the focus of the results and allows workers to be more productive."

Next Generation of Searching

The new software goes beyond traditional search engines in several ways:

FactSpotter's novel interface lets users express their queries naturally instead of being forced to adapt their questions to the logic of computers. Traditional systems, on the other hand, split a query into isolated words and return only documents that contain exactly those words.

Unlike traditional search engines that return the entire document, forcing the user to find the relevant information manually, FactSpotter returns the specific portion of a searched document that is relevant to the query.

FactSpotter takes into account the context of the entire document instead of just a cluster of nearby words. It introduces the concept of "relation," searching within and across sentences and paragraphs.

FactSpotter recognizes abstract concepts, like "people" or "building," and will retrieve all the words that fit within that category.
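Two of the differences listed above, matching on abstract concepts and returning only the relevant passage, can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the FactSpotter implementation: the `CONCEPTS` lexicon, the function names and the sample documents are hypothetical, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical concept lexicon: maps an abstract category to member words.
CONCEPTS = {
    "people": {"lawyer", "judge", "engineer", "manager"},
    "building": {"office", "warehouse", "courthouse", "factory"},
}

def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def split_sentences(document):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

def keyword_search(query, documents):
    """Baseline: return whole documents containing every query word exactly."""
    terms = set(tokenize(query))
    return [doc for doc in documents if terms <= set(tokenize(doc))]

def concept_search(query, documents):
    """Expand concept names in the query, then return only matching sentences."""
    groups = [CONCEPTS.get(term, set()) | {term} for term in tokenize(query)]
    hits = []
    for doc in documents:
        for sentence in split_sentences(doc):
            words = set(tokenize(sentence))
            # A sentence matches when it contains a word from every group.
            if all(words & group for group in groups):
                hits.append(sentence)
    return hits

DOCS = [
    "The judge entered the courthouse at noon. The hearing was postponed.",
    "A manager inspected the warehouse. Nothing was missing.",
    "The quarterly report covers revenue only.",
]

print(keyword_search("people building", DOCS))
print(concept_search("people building", DOCS))
```

With these sample documents, the exact-word baseline finds nothing for the query "people building", while the concept-expanded search returns the two sentences that pair a person with a building. A production system would rely on linguistic parsing rather than a fixed word list.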

By analyzing the meaning of both the query and the searched document, FactSpotter will dramatically simplify and speed up time-consuming activities. For example, during the electronic discovery phase of a legal trial, FactSpotter will allow specific facts to be found quickly among thousands (and often millions) of documents. By delivering complete and relevant answers, FactSpotter could revolutionize the operations of data-intensive businesses such as electronic legal discovery, risk management, pharmaceutical research, competitive and market intelligence, security intelligence and fraud detection.
 