Speed Session: From Text to Metadata: Automated Product Tagging with Python and NLP

Aayushi Verma is a Data Science Fellow at the Institute for Defense Analyses (IDA), where she collaborates with the Chief Data Officer to drive IDA's Data Strategy. She has developed numerous data pipelines and visualization dashboards to bring data-driven insights to staff. Her data science interests include machine learning/deep learning, image processing, and extracting stories from data. Aayushi holds an M.S. in Data Science from Pace University, and a B.Sc. (Hons.) in Astrophysics from the University of Canterbury.

As a research organization, the Institute for Defense Analyses (IDA) produces a variety of deliverables for our sponsors, including reports, memoranda, slides, and other formats. Because of their length and volume, quickly summarizing these products so that information on specific research topics can be retrieved efficiently poses a challenge. IDA has led numerous initiatives for historical tagging of documents, but this is a manual, time-consuming process that must be repeated periodically to tag newer products. To address this challenge, we have developed a Python-based automated product tagging pipeline using natural language processing (NLP) techniques.

This pipeline uses NLP keyword extraction techniques to identify descriptive keywords within each product's content. Filtering these keywords against IDA's research taxonomy terms produces a set of product tags that serve as metadata. The process also standardizes tagging: manual tagging introduces variability in quality across project leaders, authors, and divisions, whereas the tags produced by this pipeline are consistent and descriptive of the contents. The pipeline thus provides an automated, standardized process for streamlined topic summarization of IDA's research products, and has many applications for quantifying and analyzing IDA's research in terms of these product tags.
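The two-step idea described above (extract candidate keywords, then keep only those that match taxonomy terms) can be illustrated with a minimal sketch. The abstract does not say which keyword extraction technique the pipeline uses, so this example swaps in a plain n-gram frequency count from the standard library as a stand-in. The stop-word list, the taxonomy terms, and the function names `extract_keywords` and `tag_product` are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop words and taxonomy terms; neither comes from the talk.
STOP_WORDS = {"a", "an", "and", "the", "this", "to", "of", "in", "for", "on", "with"}
TAXONOMY = {"machine learning", "cybersecurity", "logistics", "test and evaluation"}


def extract_keywords(text, max_ngram=3, top_k=20):
    """Rank candidate word n-grams by raw frequency.

    This is a simple stand-in for a real keyword extractor, not the
    method used in the pipeline described in the talk.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(1, max_ngram + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip n-grams that begin or end with a stop word.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    return [keyword for keyword, _ in counts.most_common(top_k)]


def tag_product(text, taxonomy=TAXONOMY, top_k=20):
    """Keep only the extracted keywords that are also taxonomy terms."""
    keywords = extract_keywords(text, top_k=top_k)
    return sorted(kw for kw in keywords if kw in taxonomy)


sample = (
    "This report applies machine learning to logistics planning. "
    "Machine learning models forecast logistics demand across depots."
)
print(tag_product(sample))  # ['logistics', 'machine learning']
```

Because the final tags are restricted to a fixed vocabulary, two documents on the same topic receive the same tag spelling regardless of who wrote them, which is the standardization benefit the abstract describes. A production pipeline would likely need a stronger extractor and some normalization (for example, handling plurals and synonyms) before matching against the taxonomy.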
