Ocean Hypercapnia Data Challenge


Stage: Submissions Close


We are seeking to work with talented scientists in any field who can devise a better numerical approach for predicting two oceanic state variables, Dissolved Inorganic Carbon (DIC) and Alkalinity (ALK), that control carbon dioxide levels in the ocean.

Rewards & Prizes

We are excited to collaborate with the eventual winners to accelerate our collective knowledge of the potential threat of ocean hypercapnia. There are two awards available to participants.

1. The Peer Choice Award ($500)

Openness accelerates innovation and reproducibility. If you have a numerical approach or an idea for how we can better predict ocean carbon dioxide levels using the existing global data-set, simply write a short synopsis or upload a short video describing your approach. The winner of the 'Peer Choice Award' will be determined by an open vote of Thinkable members.

2. The Challenge Award ($3000 & co-authorship)

This will be awarded to the individual or team whose predictions for DIC and ALK beat ours by the largest margin (quantified by RSE). The winners of this award may, if they choose, become co-authors on the follow-up publication to our Nature paper and on any future papers that use their winning approach.


1. The Peer Choice Award

Those wishing to submit an idea or technique for the 'Peer Choice Award' do not have to submit predictions, but must submit their proposed approach so that it can be considered in an open vote. The award will go to the individual or team whose proposed numerical approach gains the most votes from our researcher members.

2. The Challenge Award

We have tested our SOMLO approach on the three disclosed test datasets below, obtaining final RSE values of 11.4 umol/kg for DIC and 7.9 umol/kg for ALK (TA). The award will go to the individual or team that achieves the greatest improvement on our RSE values for both DIC and ALK. To be named the overall challenge winner, an entry must improve on our RSE by at least 2 umol/kg for DIC and 1 umol/kg for ALK. Final entries will be evaluated on the combined RSE of their predictions for test_dataset_1, 2 and 3, for both DIC and ALK. The overall winner will then need to share their scripts privately with us for final verification before the award is made.
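As an illustration of the scoring arithmetic, here is a minimal Python sketch. The page does not spell out the RSE formula, so this assumes RSE is the root-mean-square of the residuals pooled over all three test sets (a degrees-of-freedom correction would shift the numbers slightly). The function names `rse` and `qualifies` are hypothetical.

```python
import math

# Baseline SOMLO RSE values quoted above (umol/kg).
BASELINE_RSE = {"DIC": 11.4, "ALK": 7.9}
# Minimum improvement required to win the Challenge Award (umol/kg).
MIN_IMPROVEMENT = {"DIC": 2.0, "ALK": 1.0}


def rse(observed, predicted):
    """Root of the mean squared residual (an assumed definition of RSE)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def qualifies(dic_rse, alk_rse):
    """True if both RSE values beat the baseline by at least the minimum margin."""
    return (BASELINE_RSE["DIC"] - dic_rse >= MIN_IMPROVEMENT["DIC"]
            and BASELINE_RSE["ALK"] - alk_rse >= MIN_IMPROVEMENT["ALK"])
```

In other words, an entry needs a combined RSE of at most 9.4 umol/kg for DIC and at most 6.9 umol/kg for ALK. Pool the observations and predictions from all three test sets before calling `rse`, since entries are judged on the combined value.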

1. We do not require individuals or teams to have any formal qualifications or oceanography background to participate in this challenge. It is open to anyone.

2. Entrants are allowed to use any combination of predictor variables listed in the training data-sets (excluding DIC and/or ALK). They can also bring in other variables not listed if they choose (e.g. satellite-derived chlorophyll, n-vector, etc.).

3. Please provide a paragraph summary of your approach with your final submission.
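Rule 2 mentions the n-vector as one possible extra variable. A minimal Python sketch, assuming a spherical Earth and one particular axis convention (x toward 0°N 0°E, y toward 0°N 90°E, z toward the North Pole), converts a latitude/longitude pair into an n-vector, so that points on either side of the ±180° meridian end up close together as predictors. The function name `n_vector` is hypothetical.

```python
import math


def n_vector(lat_deg, lon_deg):
    """Unit normal vector for a latitude/longitude pair on a spherical Earth.

    Axis convention (an assumption): x toward 0°N 0°E, y toward 0°N 90°E,
    z toward the North Pole.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )
```

Replacing the raw Latitude and Longitude columns with these three components removes the artificial jump at the dateline, where 179.9°E and 179.9°W would otherwise look far apart.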

Global Ocean Data-sets

Training files

We have split the world's surface-ocean database of DIC and ALK (~30,000 measurements) into three independent training datasets, each containing coinciding predictor variables such as latitude, temperature, salinity, nutrients and oxygen.

Each dataset includes the following variables:
Latitude (deg North)
Longitude (deg East)
data_number (for assessing which set of variables each entry used)
Depth (metres)
Pressure (dbar)
MLD (Mixed Layer Depth in metres)
Temperature (°C)
Salinity (psu)
Oxygen (umol/kg)
Nitrate (umol/kg)
Silicate (umol/kg)
Phosphate (umol/kg)
DIC_input (Dissolved Inorganic Carbon in umol/kg)
TA_input (Alkalinity in umol/kg)

Test Data

There are three independent test datasets for which you must predict DIC and ALK, each using your approach applied with the corresponding training dataset above. Each test set represents ~10% of the global dataset and is not included in the training data.
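The pairing of each test set with its corresponding training set can be sketched in Python as below. This is a hypothetical illustration: the column names (e.g. "Temperature") are assumptions about the file headers, and ordinary least squares is only a placeholder for whatever numerical approach you choose. The sketch also enforces rule 2 by refusing DIC_input or TA_input as predictors.

```python
import numpy as np

# Columns that must not be used as predictors (rule 2 above).
TARGETS = ("DIC_input", "TA_input")


def fit_linear(X, y):
    """Placeholder model: ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def predict_test_sets(train_sets, test_sets, predictors, target):
    """Fit on training set i and predict test set i, for i = 1, 2, 3.

    Each set is a dict mapping a column name to a 1-D array.
    Returns a list with one prediction array per test set.
    """
    if target not in TARGETS:
        raise ValueError(f"target must be one of {TARGETS}")
    if any(p in TARGETS for p in predictors):
        raise ValueError("DIC and ALK may not be used as predictors")
    if len(train_sets) != len(test_sets):
        raise ValueError("need one training set per test set")
    predictions = []
    for train, test in zip(train_sets, test_sets):
        X_train = np.column_stack([np.asarray(train[p], dtype=float) for p in predictors])
        X_test = np.column_stack([np.asarray(test[p], dtype=float) for p in predictors])
        coef = fit_linear(X_train, np.asarray(train[target], dtype=float))
        predictions.append(predict_linear(coef, X_test))
    return predictions
```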

How to submit?

After you have predicted DIC and ALK for each of the three test datasets above using your numerical approach, combine your final predictions into one CSV file for DIC and a separate CSV file for ALK, and include a link to them (e.g. Dropbox or Google Drive) in your submission.
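One way to produce the combined files is sketched below in Python. The column layout (test_set, row, value), the two-decimal formatting and the file names in the usage note are assumptions made for illustration, not a required format; match whatever layout the example submission uses.

```python
import csv


def write_combined_csv(path, predictions_per_set, value_name):
    """Write predictions from all test sets into a single CSV file.

    predictions_per_set: one sequence of predictions per test set, in the
    order test_dataset_1, test_dataset_2, test_dataset_3.
    Columns (an assumed layout): test_set, row, <value_name>.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["test_set", "row", value_name])
        for set_no, preds in enumerate(predictions_per_set, start=1):
            for row_no, value in enumerate(preds, start=1):
                writer.writerow([set_no, row_no, f"{float(value):.2f}"])
```

For example, `write_combined_csv("DIC_predictions.csv", dic_preds, "DIC")` and `write_combined_csv("ALK_predictions.csv", alk_preds, "ALK")` would give the two separate files, where `dic_preds` and `alk_preds` are hypothetical lists holding one array per test set.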

Here are our final numbers as an example submission using a Dropbox link:

What was our approach to predict DIC and ALK?

Our data analysis was performed in R and combined a neural-network clustering algorithm with a principal-component regression. For DIC predictions, the optimal parameter set was temperature, salinity, phosphate and oxygen, while for ALK it was salinity, oxygen, phosphate and silicate. Click here to watch a brief summary of our approach and to download the open-access paper that details it.
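For readers new to this family of methods, here is a rough Python sketch of a cluster-then-regress pipeline. It is not our R implementation: plain k-means is swapped in as a simple stand-in for the neural-network clustering step, and fitting a separate principal-component regression within each cluster is an assumption about how the two steps are combined. All function names and default values (`k`, `n_components`) are hypothetical.

```python
import numpy as np


def _sq_dist(X, centres):
    # Squared Euclidean distance from every row of X to every centre.
    return ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means clustering (a stand-in for a neural-network clusterer)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = _sq_dist(X, centres).argmin(axis=1)
        for j in range(k):
            members = labels == j
            if members.any():  # leave empty clusters where they are
                centres[j] = X[members].mean(axis=0)
    return centres


def _standardise(X, mean, std):
    return (X - mean) / std


def fit_pcr(X, y, n_components):
    """Principal-component regression: OLS on the leading PCs of standardised X."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = _standardise(X, mean, std)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    V = vt[:n_components].T  # loadings of the retained components
    A = np.column_stack([np.ones(len(X)), Z @ V])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return mean, std, V, coef


def predict_pcr(model, X):
    mean, std, V, coef = model
    return coef[0] + (_standardise(X, mean, std) @ V) @ coef[1:]


def fit_cluster_pcr(X, y, k=4, n_components=3):
    """Cluster the standardised predictors, then fit one PCR model per cluster."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    Xs = _standardise(X, mu, sd)
    centres = kmeans(Xs, k)
    labels = _sq_dist(Xs, centres).argmin(axis=1)
    models = {int(j): fit_pcr(X[labels == j], y[labels == j], n_components)
              for j in np.unique(labels)}
    return mu, sd, centres, models


def predict_cluster_pcr(fitted, X):
    mu, sd, centres, models = fitted
    keys = np.array(sorted(models))  # clusters that actually received a model
    nearest = _sq_dist(_standardise(X, mu, sd), centres[keys]).argmin(axis=1)
    labels = keys[nearest]
    y_hat = np.empty(len(X))
    for j in np.unique(labels):
        y_hat[labels == j] = predict_pcr(models[int(j)], X[labels == j])
    return y_hat
```

To mirror the parameter sets above, X for DIC would hold the temperature, salinity, phosphate and oxygen columns, and X for ALK the salinity, oxygen, phosphate and silicate columns. Keeping fewer components than predictors is what distinguishes PCR from ordinary least squares; with all components retained the two give the same fit.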
