Data is an essential part of modern computational science, allowing researchers to pose and answer new scientific questions. The successful collection, management, security, and storage of petabytes of data – generated by ongoing research and the work of partner programs such as the Atmospheric Radiation Measurement program, Air Force Weather, and Veterans Affairs – is enabled by division-wide efforts to improve the data lifecycle for users and collaborators. These efforts include:
- Projects such as Constellation DOI and DataFed, which aim to ease the identification, referencing, and retrieval of large datasets in laboratories across the nation and around the world.
- Groundbreaking security frameworks such as CITADEL, specifically designed to protect the sensitive personal data required for health research.
The explosion of data calls for the ability to train and deploy artificial intelligence (AI) and machine learning models at scale to support advanced data analysis and accelerate scientific discovery. By partnering with both computational scientists and experimentalists, the National Center for Computational Sciences offers expertise in extreme-scale distributed training of AI models; deploying, evaluating, and improving AI frameworks and methods; and bridging high-performance computing with experimental facilities at the edge.
At the heart of the National Center for Computational Sciences’ data projects is the understanding that collaboration, data sharing, and accessibility are critical to scientific progress.