Project 1: Cloud Computing
Mentor: Volodymyr Kindratenko
The goal of this project is to deploy, maintain, and experiment with the latest release of OpenStack cloud operating system software on a cluster at the Innovative Systems Lab. The purpose of this experimental OpenStack deployment is to gain and maintain operational awareness of the new features and functionality ahead of the NCSA's production cloud, provide NCSA staff and affiliate faculty with a platform to experiment with the new OpenStack functionality, and to study and evaluate new projects within the OpenStack environment. This project is best suited for students interested in system administration, deployment and operation of complex cloud and HPC environments. Requirements: experience with distributed systems.
Project 2: Deep Learning
Mentor: Volodymyr Kindratenko
This project will involve deployment and evaluation of existing deep learning frameworks on an HPC cluster and on a cloud. The goal is to gain hands-on experience with deep learning codes, frameworks, and methodologies and to support upcoming projects requiring deep learning. The work may also require parallelizing codes to work on multiple nodes. This project is best suited for students interested in the development of machine learning techniques and their applications in science and technology fields. Requirements: Machine Learning/Deep Learning coursework.
Project 3: Computational Materials Science: Multi-scale Simulations and Machine Learning
Co-mentors: Andre Schleife and Andrew Ferguson
Modern computational materials science uses sophisticated simulation techniques to study properties of advanced and complex materials, including biomolecules, condensed-matter crystals, and polymers. At the same time, bridging length and time scales from atomistic resolution to actual samples is an important challenge. In this project, we aim to combine atomistic simulations and Maxwell modeling techniques to accurately describe nano- and meso-structured materials. These simulations yield accurate results, but their high computational cost makes them difficult to apply to high-throughput materials design.
By extracting data from these simulations, collecting that data in well-structured databases using modern materials schemas, and establishing connections to underlying structural descriptors, we aim to leverage supervised machine-learning techniques to significantly accelerate the materials design process. Working towards this goal, students will use existing data and generate new data for complex structures using Maxwell modeling. They will develop an open-source tool that interfaces an external Maxwell solver with the Python-based scikit-learn machine-learning library to perform supervised machine learning and guided materials design and discovery. To disseminate results to a broad scientific audience and the general public through accurate yet intuitive visualization, students will have the opportunity to develop codes based on the open-source ray-tracer Blender/LuxRender and the open-source yt framework to produce image files and movies that are compatible with virtual and mixed reality viewers such as Google Cardboard or Windows Mixed Reality. View examples for possible outcomes.
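As a rough illustration of the kind of tool described above, the following minimal sketch trains a scikit-learn regressor on solver outputs and uses it to rank candidate structures. It is not the project's actual code: `run_maxwell_solver` is a hypothetical stand-in for an external Maxwell solver, the two descriptors (a radius and a spacing) and the analytic response are invented for illustration, and the choice of a random forest is only one of many possible supervised models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def run_maxwell_solver(descriptors):
    """Hypothetical placeholder for an external Maxwell solver.

    A real tool would launch the solver and parse its output; here a
    cheap analytic function stands in for the simulated optical response.
    """
    radius, spacing = descriptors
    return np.sin(radius) + 0.5 * spacing**2


# Build a small training set of (structural descriptors, solver response).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(200, 2))
y = np.array([run_maxwell_solver(x) for x in X])

# Hold out part of the data to check how well the surrogate generalizes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # coefficient of determination on held-out data

# Guided design: score many candidate structures with the cheap surrogate
# and keep the one with the highest predicted response.
candidates = rng.uniform(0.0, 2.0, size=(1000, 2))
best = candidates[np.argmax(model.predict(candidates))]
print(f"held-out R^2 = {r2:.3f}, best candidate descriptors = {best}")
```

In such a workflow, only the most promising candidates would be passed back to the expensive solver for confirmation, which is where the acceleration comes from.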
Project 4: Data Storage and Analysis Framework for Semiconductor Nanocrystals used in Bioimaging
Co-mentors: Andre Schleife and Michal Ondrejcek
Light-emitting molecules are a central technology in biology and medicine that provide the ability to optically tag proteins and nucleic acids that mediate human disease. In particular, fluorescent dyes are a key part of molecular diagnostics and optical imaging reagents. We recently made major breakthroughs in engineering fluorescent semiconductor nanocrystals to increase the number of distinct molecules that can be accurately measured, far beyond what is possible with such organic dye molecules. We aim to develop nanocrystals that are able to distinguish diseased from healthy tissue and determine how the complex genetics underlying cancer respond to therapy, using measurement techniques and microscopes that are already widely accessible.
To achieve this goal, we need to understand a complex design space that includes the size, shape, composition, and internal structure of the different nanocrystals. To this end, we have started implementing a database that stores and catalogs optical properties and other relevant data describing semiconductor nanocrystals. Students in this team will work with computational and experimental researchers in several departments to turn this data into descriptors that are useful and efficient in the context of machine learning. Schemas will be extended accordingly, and the web interface will be improved so that data and analysis workflows can be efficiently shared between multiple researchers.
Students will first test the current descriptors and then implement improvements based on these tests. The framework will be interfaced with Globus and the Materials Data Facility and their underlying workflows. Students will also develop code that automatically analyzes data stored in the facility, e.g. to verify and validate experimental and computational results against each other. This project is highly interdisciplinary, and students will work with a team of researchers in bioengineering, materials science, mechanical engineering, and NCSA.
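The automated cross-validation of experimental and computational results mentioned above could, in its simplest form, look like the sketch below. Everything in it is an assumption made for illustration: the `Record` fields, the emission-peak quantity, the 5% tolerance, and the sample values are placeholders rather than the project's actual schema or data.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical database entry pairing a measurement with a calculation."""
    sample_id: str
    measured_peak_nm: float   # emission peak from experiment
    computed_peak_nm: float   # emission peak from simulation


def flag_disagreements(records, rel_tol=0.05):
    """Return (sample_id, relative error) for records outside the tolerance."""
    flagged = []
    for r in records:
        rel_err = abs(r.computed_peak_nm - r.measured_peak_nm) / abs(r.measured_peak_nm)
        if rel_err > rel_tol:
            flagged.append((r.sample_id, rel_err))
    return flagged


# Placeholder values for illustration only; not real measurements.
records = [
    Record("nc-001", measured_peak_nm=620.0, computed_peak_nm=615.0),
    Record("nc-002", measured_peak_nm=540.0, computed_peak_nm=590.0),
]
flagged = flag_disagreements(records)
for sample_id, rel_err in flagged:
    print(f"{sample_id}: relative error {rel_err:.1%} exceeds tolerance")
```

A production version would read records from the database rather than a hard-coded list and would likely use property-specific tolerances.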
Project 5: Parallel Framework Integration for Analysis and Visualization in Python
Mentor: Matthew Turk
Students will work on yt, a Python-based, MPI-parallel analysis and visualization tool, to enable seamless interoperability with newer tools such as dask and xarray. Students should be familiar with Python and the Python ecosystem of tools, and would benefit from familiarity with MPI, Cython, and potentially I/O libraries such as HDF5.
Project 6: Resolving Racial Health Disparities by using Advanced Statistics and Machine Learning on Complex Multidimensional Datasets
Mentors: Liudmila Mainzer, Zeynep Madak-Erdogan
African American women have a 4- to 5-fold greater risk of death from breast cancer compared to Caucasian women, even after controlling for stage at diagnosis, treatment, and other known prognostic factors. Our initial cross-sectional studies suggest that the composition of serum from African American and Caucasian women differed and reflected biochemical changes due to socioeconomic status. Thus, we are now tackling a complex multidimensional dataset including proteomic, genomic, biometric, geographic, and socioeconomic measurements. These dimensions need to be harmonized, and correct statistical approaches applied, in order to determine the exact combination of factors that drives this racial health disparity. Additionally, we are planning to increase the size of our dataset, which will make the problem computationally challenging. We are also extending our analyses to other health disparity problems and other datasets. We invite a talented student to participate in this important and exciting project and to get involved in the optimization of our analysis pipelines and the development of advanced statistical approaches and data analytics. Skills desired: statistics, machine learning, computing, bioinformatics.
Project 7: The Impact of Housing Discrimination on the Pollution Exposure Gap in the United States
Mentor: Ignacio Sarmiento Barbieri
The choice of residence determines the neighborhood with which one interacts on a daily basis. It also serves as a primary channel through which households can express demand for amenities and public services (e.g. parks, clean air, public safety). But what happens when individuals are not free to choose where to live? In this project, we are examining the effects of racial discrimination in the housing market. In particular, we are interested in a key public good: clean air. We are generating experimental evidence on the impact of housing discrimination on pollution exposure.
Project 8: Modeling and Detection of Black Hole Collisions with the Blue Waters Supercomputer
Mentors: Eliu Huerta, Roland Haas
The Laser Interferometer Gravitational-Wave Observatory's (LIGO) detection of gravitational waves from merging black holes in September 2015 inaugurated a new era in astronomy and astrophysics, opening a window to observe the Universe through gravitational radiation. Occurring 100 years after Einstein's announcement of his theory of general relativity, the detection spurred world-wide interest in physics and science in general, making headline news around the world. The recent Nobel Prize awarded for this detection and the announcement of the detection of a binary neutron star system by LIGO/Virgo underline the importance of these efforts and the interest that the wider society has in them.
In this project, a pair of REU-INCLUSION students will write Python/C libraries to extract information from numerical relativity simulations that describe mergers of black holes and neutron stars. Through this work the students will become familiar with one of the more exciting research topics in contemporary astronomy, and this work will provide them with new tools to study phenomena across science domains that require high performance environments. These simulations will also be used to create scientific visualizations for outreach purposes.