There are several approaches to this, all of which in principle can work.
There are companies and technologies that are doing it bottom up - by embedding semantical annotations (meta-data) right into the data. The opposite camp is exploring the top-down approach, which relies on analyzing existing information.
The ultimate top-down solution would be a fully blown natural language processor, which is able to understand text like people do.
Download ebooks on http://www.frenchtheory.com/
See that post with different algorithms in metabole
See the journal French Metablog with today different posts