Document Analyzer Using LLM

Abstract

This talk describes an R&D project on the automated analysis of documents using the openCAESAR framework and Large Language Models (LLMs). It identifies the challenges of manual document analysis and proposes a solution that leverages LLMs to structure unstructured document content into analyzable datasets guided by semantic ontologies. The talk outlines a five-step process involving ontology specification, automated dataset extraction, SME curation, knowledge base publishing, and user query analysis. Finally, it discusses future work aimed at enhancing the Document Analyzer’s integration, user experience, precision, and applicability to a wider range of documents.

Speaker

Kareem Elaasar is an M.Sc. student in Software Engineering at the University of California, Irvine. His interests include machine learning, artificial intelligence, and systems engineering, with a focus on applying AI technologies to improve engineering processes and knowledge management. He has interned at NASA’s Jet Propulsion Laboratory, where he developed tools to convert documents into knowledge graphs using language models and graph databases. His recent projects explore semantic web technologies, UML diagram synthesis, and embedding inversion techniques for deep learning models.
