Gulf Con 2026 Presentation Notes

bjensen · June 4, 2026, 5:10pm

Hey all!
It was great seeing everyone at Gulf Con a couple of weeks ago! I'm copying the notes from all of our Data Management Sessions below. If I missed something, please reply to this thread with any additional details. Thanks!

Presentation 1 - GRIIDC: How an established repository evolves with changing technology and standards

GRIIDC is a multidisciplinary data repository
- Makes data publicly available through the Harte Research Institute at TAMUCC
- The team ensures all data go through QA/QC
- Storage is fee-based; access is free
- For those interested in using the system, the fee covers data storage and help with data management planning
- A DOI is issued for each dataset, and the team helps distribute it
Gulf-wide organizations are included in the repository
- 3614 datasets
- Total of 172 TB
- 50,000+ downloads
GoMRI was the first organization to require open data; now everyone requires that you share your data or make it open
FAIR guiding principles - Findable, Accessible, Interoperable, Reusable
- Searchable
- Metadata
- Google dataset search
- Standardized keywords
- Collects a point of contact (POC) for each dataset
- Follows ISO 19115-2 metadata standards
TRUST principles - Transparency, Responsibility, User focus, Sustainability, Technology
- GRIIDC hosts the data on two virtual machines
- Datasets larger than 25 GB are stored on AWS
- All systems are backed up so that no data is lost
GRIIDC has devised its own standardized keywords
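The storage rule from the GRIIDC talk can be sketched as a tiny routing function. This is a hypothetical illustration, not GRIIDC's code. The function name, the "vm"/"aws" labels, and the reading of 25 GB as 25 * 10**9 bytes (rather than binary gibibytes) are all assumptions.

```python
# Hypothetical sketch of the ">25 GB goes to AWS" rule; names and units are assumptions.
THRESHOLD_BYTES = 25 * 10**9  # 25 GB, assuming decimal gigabytes


def storage_target(size_bytes):
    """Return "aws" for datasets larger than 25 GB, otherwise "vm"."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    # Strictly greater than the threshold, matching ">25 GB" in the notes
    return "aws" if size_bytes > THRESHOLD_BYTES else "vm"


if __name__ == "__main__":
    for size_gb in (3, 25, 40):
        print(f"{size_gb} GB -> {storage_target(size_gb * 10**9)}")
```

A dataset of exactly 25 GB stays on the VMs under this reading. If the real cutoff is inclusive, `>` would become `>=`.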
The GRIIDC monitoring page has been updated
- A status tracker lets data submitters and users see where a dataset is in the upload process
- Users can also download a report about the data from this page
There is also a map search option that shows what data are available in a specific area
Challenges
- Some of the metadata attributes
  - Keywords have to be backfilled over time
  - Free-text fields need to be standardized
- Large datasets: curation takes time (curation = filling out the metadata and making sure it is human-readable)
  - Storing data costs money
  - Takes time to curate the data
- Compliance
  - Researchers have to know the rules and requirements for entering metadata
  - Data have to go through QA/QC
- Collaboration
  - Data repositories
  - Researchers
    - Need buy-in from researchers to achieve curation
  - Funding agencies
    - Need to set timelines for funding the archiving of data in repositories

Presentation 2 Part 1 - A New Coastal Data Ecosystem: How Florida’s Seafloor Mapping Initiative is Meeting Diverse User Needs

Two-phase data collection
- LiDAR to collect shallow-water data
  - 20-40 m deep
- Multibeam sonar for anything deeper
Prioritization
- The team gave researchers a set of tokens to place along the coast to indicate where they wanted data collected
- The sites that received the most tokens were then selected
LiDAR - 75,595 km²
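The token exercise amounts to a simple vote count. The sketch below assumes each token counts as one vote and that sites are ranked purely by token count. The presenters may well have weighed other factors. The `top_sites` helper and the site names are hypothetical.

```python
from collections import Counter


def top_sites(placements, n):
    """Return the n sites that received the most tokens (hypothetical helper).

    `placements` holds one entry per token, naming the site it was placed on.
    """
    counts = Counter(placements)
    # most_common() sorts by count, highest first; ties keep first-seen order
    return [site for site, _count in counts.most_common(n)]


if __name__ == "__main__":
    # Made-up placements: each string is one token dropped on a candidate site
    placements = ["Site A", "Site B", "Site A", "Site C", "Site B", "Site A"]
    print(top_sites(placements, 2))  # ['Site A', 'Site B']
```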
- LiDAR point cloud
- Bare Earth DEM
- GIS data
- Mapping reports
Sonar - 64,382 km²
- Sonar point cloud
- Bathymetric attributes
- Reports
Use case: an aircraft carrier was sunk off the coast of Florida
- Largest offshore artificial reef in the world
- A larger ship is currently being prepared for sinking and will become the world's largest artificial reef
Challenges
- Funds were received by 2024
- Funds need to be spent by 2026
- All LiDAR data have been collected
- This coastal mapping program is just part of the work that is being done by the FDEP team
- Technical issues
  - Hurricanes/inclement weather
  - Data size: the datasets are very large, take a long time to download, and require a lot of computing power to use
  - Lots of moving pieces

Presentation 2 Part 2 - Topobathymetric integration workflow improvements

Working with USGS to stitch the models together
Detailed metadata that shows how and when the data were collected
- The metadata were recorded at the time of data collection; some gaps will need to be filled during QA/QC
There is a dashboard on the FDEP hub site that shows this work
Data will be available next summer
The project team wanted inland bathymetry included in the maps
- Terrestrial and bathymetric data have been joined wherever terrestrial data are available
- Florida GIO.gov
  - Initiatives – maps are available along with data timelines
  - Jim’s information is there as well
This project has won several awards nationally and internationally
Questions -
- Project off the coast of Destin – need to collaborate on where the break spot was (20-30 miles out, which is outside of the project area)
- Sediment identification: will vendors come in to do that, or what are the next steps?
  - FSU is coming in to collect backscatter data and would do some of that work
  - NOAA has collected much of the Panhandle data

Presentation 3 - Fine-Tuned Large Language Models for Natech Analytics

Environmental hazards: air pollution, water insecurity
- Corpus Christi recently experienced a water shortage
- This matters because water quality, accessibility, and affordability were all lacking
Natechs: technological accidents triggered by natural hazards
- Unpermitted releases of pollutants into neighborhoods
- These unpredicted events need to be captured because traditional models miss them
No current models look at natechs
- The project team wanted to see whether there is a correlation between natechs and natural disasters at the country level
  - In other words, do countries with more natural disasters tend to have more natechs?
- Looking at climate effects on localized environmental health disparities in overburdened communities
- LLM
  - Many documents and reports are available on the impacts of natural disasters and on when natechs happen
  - A pretrained model can be used to sift through these documents
  - The team first had to work out which prompts would get the LLM to produce outputs that answered their research questions
  - To fine-tune the LLM, the project team used Meta AI
- Fine-tuning brought the LLM's accuracy to 0.958, with a precision of 0.957
- When a pretrained model was given too much data, its precision and accuracy suffered
- Which hazards are triggering natechs?
  - Natechs cause 6% of emission incidents
  - Lightning strikes and freezing events could also trigger natechs
  - There appear to be strong seasonal trends in natech impacts
    - Meaning more preparedness is needed at certain times of the year
  - Gulf-wide vs. Texas only
    - Gulf-wide, hurricanes were a major trigger of natech events
    - The Gulf Coast as a whole faces higher excess emissions (14%) than Texas alone
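For reference, accuracy is the fraction of all predictions that are correct. Precision is TP / (TP + FP): the fraction of predicted positives that are truly positive. The talk did not say how the labels were defined, so the sketch below assumes a binary "natech vs. not natech" classification. The function name and sample labels are hypothetical and unrelated to the project's data.

```python
def accuracy_precision(y_true, y_pred, positive=1):
    """Compute accuracy and precision for binary labels (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually negative
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Guard against division by zero when nothing is predicted positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


if __name__ == "__main__":
    # 1 = natech, 0 = not natech (made-up labels for illustration)
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 1, 1, 0, 1, 0]
    acc, prec = accuracy_precision(y_true, y_pred)
    print(f"accuracy={acc:.3f}, precision={prec:.3f}")  # accuracy=0.625, precision=0.600
```

Accuracy and precision can diverge. A model can score high accuracy on a dataset where natechs are rare while still flagging many false positives. Reporting both, as the presenters did, gives a fuller picture.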
Takeaways
- A fine-tuned LLM turns air emission narratives into structured natech analytics
- Two-decade analysis of Texas natechs
- AI-assisted natech research and management
Modeling the issues that arise from climate disasters
- How can we measure future air pollution/water insecurity?
- What are the health impacts of these hazards?
- Solutions need to be designed and implemented
Need data input from our local partners
- Brought in Texas A&M University to help with data collection
This prototype is the starting point and can be applied more broadly