Gulf Con 2026 Presentation Notes

Hey All!
It was great seeing everyone at Gulf Con a couple weeks ago! I am copying the notes for all of our Data Management Sessions below! Please reply to this thread or feel free to add any additional details here if there was something that I missed. Thanks!

Presentation 1 - GRIIDC: How an established repository evolves with changing technology and standards

  • GRIIDC is a multidisciplinary data repository
    • Making data publicly available through the Harte Research Institute at TAMUCC
    • The team ensures everything goes through QA/QC
    • This system is fee based to store, free to access
    • If you are interested in using the system the fee covers data storage and help with planning for the data management
    • DOIs are issued for each dataset and the team helps distribute it
  • Gulf wide organizations are included in the repository
    • 3614 datasets
    • Total of 172 TB
    • 50,000+ downloads
  • GoMRI was the first organization to require open data, now everyone is requiring that you share data or make it open
  • FAIR guiding principles - Findable, Accessible, Interoperable, Reusable
    • Searchable
    • Metadata
    • Google dataset search
    • Standardize keywords
    • Collects POC for each dataset
    • Follows ISO 19115-2 metadata standards
  • TRUST principles - Transparency, responsibility, user focus, sustainability, technology
    • GRIIDC uses 2 virtual machines that host the data
    • Datasets larger than 25 GB are stored on AWS
    • Data backups are on all systems so no data is lost
  • GRIIDC has devised their own standardized keywords
  • The GRIIDC monitoring page has been updated
    • There is a tracking status of data in the system that can be used by a data submitter or a user to see where the data is during the upload process
    • A user can also download the report about the data from this page
  • There is also a map search option so that a user can see what data is available in the system in a specific area
  • Challenges
    • Some of the metadata attributes
      • Keywords – have to be backfilled as time goes on
      • There are free text areas that need to be standardized
    • Large datasets – curation takes time (curation = filling out the metadata information and making sure it's human readable)
      • Storing data costs money
      • Takes time to curate the data
    • Compliance
      • Researchers have to know the rules of inputting metadata and what the requirements are
      • Data has to be QA/QCed
    • Collaboration
      • Data repositories
      • Researchers
        • Need buy in from the researchers to achieve curation
      • Funding agencies
        • Need to create the timelines for funding data archiving into repositories

Presentation 2 Part 1 - A New Coastal Data Ecosystem: How Florida’s Seafloor Mapping Initiative is Meeting Diverse User Needs

  • Two phase data collection
    • LiDAR to collect shallow data
      • 20-40 m deep
    • Multibeam for anything deeper
  • Prioritization
    • The team gave researchers a set of tokens to place along the coast, showing where they wanted data collected
    • Sites were then chosen based on which areas were selected most often
  • LiDAR – 75,595 km²
    • LiDAR point cloud
    • Bare Earth DEM
    • GIS data
    • Mapping reports
  • Sonar – 64,382 km²
    • Sonar point cloud
    • Bathymetric attributes
    • Reports
  • Use case – aircraft carrier was sunk off the coast of Florida
    • Largest off coast artificial reef in the world
    • Currently a larger ship is being prepped to be sunk that will become the largest artificial reef in the world
  • Challenges
    • Funds were received by 2024
    • Funds need to be spent by 2026
    • All LiDAR data has been collected
    • This coastal mapping program is just part of the work that is being done by the FDEP team
    • Technical issues
      • Hurricanes/inclement weather
      • Size of data to use/download – it is very large and takes a long time to download and a lot of power to use
      • Lots of moving pieces
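
The token-based prioritization noted above can be illustrated with a short Python sketch. This is only a guess at the general idea: the site names and token placements are hypothetical, and the notes do not record how the team actually tallied tokens or handled ties.

```python
from collections import Counter


def rank_sites(token_placements, top_n):
    """Return the top_n sites ordered by how many tokens were placed on them."""
    counts = Counter(token_placements)
    # most_common() sorts by count, highest first
    return counts.most_common(top_n)


# Hypothetical input: each entry is one token a researcher placed on a site.
placements = ["Site A", "Site B", "Site A", "Site C", "Site A", "Site B"]
top_sites = rank_sites(placements, top_n=2)
print(top_sites)
```

Here "Site A" (3 tokens) and "Site B" (2 tokens) would come out on top.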

Presentation 2 Part 2 - Topo-bathymetric integration – workflow improvements

  • Working with USGS to stitch the models together
  • Really strong metadata that will show details of how and when the data was collected
    • The metadata was recorded at the time of data collection, with some gaps that will need to be filled during QA/QC
  • There is a dashboard on the FDEP hub site that shows this work
  • Data is going to be available next summer
  • The project team wanted the inland bathymetry to be included in the maps
    • Terrestrial and the bathymetry have been connected if there is terrestrial data available
    • Florida GIO.gov
      • Initiatives – maps are available along with data timelines
      • Jim’s information is there as well
  • This project has won several awards nationally and internationally
  • Questions -
    • Project off the coast of Destin – need to collaborate on where the break spot was (20-30 miles out, which is outside of the project area)
    • Sediment identification – are vendors coming in to do that, or what are the next steps?
      • FSU is coming in to collect some backscatter data and would be doing some of that work
      • NOAA has done a lot of the Panhandle data

Presentation 3 - Fine-Tuned Large Language Models for Natech Analytics

  • Environmental hazards – air pollution, water insecurity
    • Corpus Christi recently experienced a water shortage
    • This is important as there was a lack of quality, accessibility, and affordability of water
  • Natechs – technological accidents triggered by natural hazards (for example, a storm causing a release at an industrial facility)
    • Unpermitted release of pollutants into neighborhoods
    • Need to capture these unpredicted events that are not captured in traditional models
  • There are no current models that look at Natechs
    • The project team wanted to look to see if there was a correlation between natechs happening and natural disasters in countries
      • In other words, do countries with more natural disasters tend to have more natechs?
    • Looking at climate effects on localized environmental health disparities on overburdened communities
    • LLM
      • There are a lot of documents and reports that are available regarding the impacts of natural disasters and when Natechs happen
      • A pretrained model can be used to look through this data
      • First had to work out which prompts to use for the LLM to get outputs that would answer the questions the team was trying to answer
      • To fine-tune the LLM the project team used Meta AI
    • Fine-tuning brought the LLM's accuracy to 0.958, with a precision of 0.957
    • Giving the pretrained model too much data impacted its precision and accuracy
    • What are the hazards that are being triggered
      • Natechs cause 6% of emission incidents
      • Lightning strikes and freezing events could also trigger natechs
      • Strong seasonal trends seem to trigger natech impacts
        • Meaning there is more preparedness that needs to happen during certain times
      • Gulf Wide vs. Just Texas
        • Hurricanes at the Gulf Wide view were a major trigger of natech events
        • The Gulf Coast is facing higher excess emissions (14%) than Texas alone
  • Takeaways
    • Fine tuned LLM – turns air emission narratives into structured natech analytics
    • Two-decade analysis of Texas natechs
    • AI-assisted Natech research and management
  • Model the issues arising from climate disasters
    • How can we measure future air pollution/water insecurity?
    • What are the health impacts of these things?
    • Solutions need to be designed and implemented
  • Need data input from our local partners
    • Brought in Texas A&M University to help with the data collection
  • This prototype is the starting point and can be applied more broadly
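
For anyone unfamiliar with the accuracy and precision figures quoted above, here is a minimal Python sketch of the standard definitions for a two-class problem (natech vs. not natech). The counts are hypothetical and chosen only to make the arithmetic easy; the notes do not say how the team computed their metrics or how many classes were involved.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all reports that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of reports flagged as natech that really were natechs."""
    return tp / (tp + fp)


# Hypothetical confusion-matrix counts (not from the presentation):
# tp = natechs correctly flagged, fp = non-natechs wrongly flagged,
# tn = non-natechs correctly passed over, fn = natechs missed.
tp, fp, tn, fn = 90, 5, 100, 5

acc = accuracy(tp, fp, tn, fn)   # 190 correct out of 200 reports
prec = precision(tp, fp)         # 90 true natechs out of 95 flagged
print(f"accuracy={acc:.3f}, precision={prec:.3f}")
```

A high precision means few false alarms, while accuracy also counts the reports correctly left unflagged.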