Transmission retrieval: WASP-39b JWST¶
This tutorial shows how perform an atmospheric retrieval of the transmission spectra of WASP-39b, constrained by the JWST observations. We will replicate the analysis presented in Welbanks et al., that is, a retrieval of:
the combined NIRISS, NIRCam, NIRSpec, and MIRI observations, with fixed offsets
1D atmosphere with free VMRs (constant-with-altitude)
a Madhu temperature profile, T(p)
a Rayleigh opacity model
a gray, patchy cloud-deck model
We can break the analysis into the following steps:
Setup¶
For the setup we will need three ingredients:
A configuration file to define the system parameters, atmospheric model, posterior sampling, etc.
An observation file defining the data points: depths, uncertainties, and bin wavelengths
Cross-section files for the atmospheric species
Lets start with the required input files, and then go over the configuration file.
Observation file¶
Pyrat Bay observation files tell the code what data is being fit.
These are a plain text files containing the transit depth,
uncertainty, central wavelength (um), bin half-width (um), and label.
There is always one data point per row. Each data point is being
assumed as a top-hat passband within the respective wavelength range.
Below you can find an extract of the WASP-39b JWST transit data. Click the link to see/download the entire file. Note in the header that comments are allowed, and there are two special flags that let users define the depth units and where the data starts.
# Comment lines (like this one) and blank lines are ignored,
# central-wavelength and half-width units are always microns
# @DEPTH_UNITS sets the depth and uncert units (e.g., none, percent, ppt, ppm)
@DEPTH_UNITS
ppm
# depth depth_err wl half_width instrument
@DATA
20667.1 274.2 0.52132 0.00425 nirspec_prism
20940.2 201.4 0.53004 0.00446 nirspec_prism
21186.6 160.5 0.53919 0.00469 nirspec_prism
21296.6 140.8 0.54883 0.00494 nirspec_prism
21068.6 127.9 0.55899 0.00522 nirspec_prism
21401.9 120.0 0.56972 0.00551 nirspec_prism
21323.3 115.7 0.58106 0.00583 nirspec_prism
21600.7 110.7 0.59308 0.00619 nirspec_prism
21188.3 109.9 0.60584 0.00657 nirspec_prism
21165.7 108.5 0.61940 0.00700 nirspec_prism
20782.0 102.8 2.16754 0.02619 nirspec_prism
Cross sections¶
Following the analysis of Welbanks et al., we will include line-sampled cross sections for these molecules: H₂O, CO, CO₂, CH⁴, SO₂, H₂S, HCN, NH₃, and C₂H₂. Here we will work with the latests opacity sources for these species from ExoMol and HITEMP.
The current recommendations for sampled cross sections for JWST retrievals are a resolution \(R>20.000\). So, here we will use a cross section grid at \(R=25.000\), sampling from \(0.5-12\) μm in wavelength (to cover the spectral range of the data), from \(500-2500\) K in temperature, and from \(100-1.0^{-9}\) bar in pressure.
Now, beware that cross section files have many assumptions baked into them. In addition, one might need to adjust the ranges or sampling resolution of the grid for specific project. Thus, below there are two options, (a) download and use already made the cross-section files (b) compute your own cross sections starting from the line-list files (where you can customize at will).
The Zenodo repository doi.org/10.5281/zenodo.16965391 contains the cross-section files that we will use for this JWST atmospheric retrieval. See the list below for direct links to the files for each molecule.
These cross sections have been computed assuming an H₂/He-dominated atmosphere, and terrestrial isotopic ratios. The lines have Voigt profiles with a wing cut-off at 300 HWHM and at 25 cm-1. The grids sampling are:
Wavelength: \(0.15-33\) μm, at a constant resolution of \(R=25.000\)
Temperature: \(200-5000\) K, with \(\Delta T = 150\) K
Pressure: \(1.0^{-9}-1.0^{3}\) bar, equally sampled in log(p) with 4 samples per dex.
Species (source) |
References |
|---|---|
H2O (exomol, pokazatel) |
|
CO (HITEMP, li) |
|
CO2 (ames, ai3000k) |
|
CH4 (exomol, mm) |
|
SO2 (exomol, exoames) |
|
H2S (exomol, ayt2) |
|
HCN (exomol, harris larner) |
|
NH3 (exomol, coyute) |
|
C2H2 (exomol, acety) |
Note
If you want to see the source script or need to customize the cross sections (e.g., broader temperature ranges, finer resolution, different line profiles), follow the steps in the ‘Compute cross sections’ tab.
TBD
Configuration file¶
Lastly, the configuration file will put together the inputs, define the atmospheric model, and configure the retrieval options. Here below is the file we will use for the JWST observation of WASP-39b.
Click here to show/hide: wasp39b_retrieval_transit_jwst.cfg
[pyrat]
# Pyrat Bay run mode [tli pt atmosphere spectrum radeq opacity mcmc]
runmode = retrieval
# Output file names
logfile = ret_wasp39b_fiducial/WASP39b_jwst_fiducial.log
# Verbosity level [1--3]:
verb = 2
# Observing geometry, select between: [transit eclipse emission]
rt_path = transit
# The observations
dunits = ppm
obsfile = ../inputs/obs_wasp39b_transit_jwst.dat
# Wavelength sampling
wl_low = 0.499 um
wl_high = 12.0 um
# System parameters
rstar = 0.9096 rsun
mstar = 0.9065 msun
tstar = 5512.0
rplanet = 1.2959 rjup
mplanet = 0.266 mjup
# +/- 0.033 mjup
smaxis = 0.0486 au
refpressure = 0.1 bar
# Atmospheric model:
nlayers = 81
ptop = 1e-9 bar
pbottom = 1e2 bar
# Temperature-profile model [isothermal guillot madhu]
tmodel = madhu
#tpars = 1000.0
# Atmospheric composition and abundances:
chemistry = free
species =
H2 He Na K H2O CH4 CO
CO2 HCN NH3 H2S SO2 C2H2
uniform_vmr =
0.85 0.1445 1.000e-10 1.000e-10 1.000e-10 1.000e-10 1.000e-10
1.000e-10 1.000e-10 1.000e-10 1.000e-10 1.000e-10 1.000e-10
# Radius model (hydrostatic equilibrium)
radmodel = hydro_m
# Line-sampled cross sections
sampled_cross_sec =
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_H2O_exomol_pokazatel.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_CO_hitemp_2019.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_CO2_ames_ai3000k.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_SO2_exomol_exoames.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_H2S_exomol_ayt2.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_HCN_exomol_harris_larner.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_C2H2_exomol_acety.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_NH3_exomol_coyute.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_CH4_exomol_mm.npz
# Continuum cross sections
continuum_cross_sec =
{ROOT}pyratbay/data/CIA/CIA_Borysow_H2H2_0060-7000K_0.6-500um.dat
{ROOT}pyratbay/data/CIA/CIA_Borysow_H2He_0050-7000K_0.5-031um.dat
# Alkali opacity, select from: [sodium_vdw potassium_vdw]
alkali =
sodium_vdw
potassium_vdw
# Clour models [lecavelier deck]
clouds =
deck 2.0
lecavelier 0.0 -4.0
# Rayleigh models, select from: [rayleigh_H rayleigh_He rayleigh_H2]
rayleigh =
rayleigh_H2
rayleigh_He
# VMR parameters (constant with altitude VMRs)
vmr_vars =
log_H2O
log_CO2
log_CO
log_SO2
log_H2S
log_HCN
log_CH4
log_NH3
log_C2H2
log_Na
log_K
# Bulk species
bulk = H2 He
# Param name value lo_bound hi_bound step prior prior_sigma
retrieval_params =
log_p1 -4.0 -9.0 2.0 0.3
log_p2 -7.2 -9.0 2.0 0.3
log_p3 -1.0 -2.0 2.0 0.3
a1 1.50 0.02 2.0 0.02
a2 0.35 0.02 2.0 0.02
T0 850.0 800.0 1300.0 30.0
M_planet 0.266 0.1 0.43 0.05 0.266 0.033
log_p_ref -1.0 -9.0 2.0 0.3
log_H2O -1.70 -12.0 -0.3 0.3
log_CO2 -3.00 -12.0 -0.3 0.3
log_CO -2.30 -12.0 -0.3 0.3
log_SO2 -4.05 -12.0 -0.3 0.3
log_H2S -3.15 -12.0 -0.3 0.3
log_HCN -6.00 -12.0 -0.3 0.3
log_CH4 -5.00 -12.0 -0.3 0.3
log_NH3 -5.00 -12.0 -0.3 0.3
log_C2H2 -8.00 -12.0 -0.3 0.3
log_Na -3.25 -12.0 -0.3 0.3
log_K -7.00 -12.0 -0.3 0.3
log_k_ray 3.8 -4.0 10.0 0.3
alpha_ray -2.0 -20.0 2.0 0.1
log_p_cl -1.0 -9.0 2.0 0.3
f_patchy 0.6 0.0 1.0 0.05
# Retrieval setup:
sampler = multinest
nlive = 1000
resume = True
theme = blue
data_color = black
post_processing = True
# Retrieval temperature boundaries:
tlow = 800
thigh = 2500
# If set, plot wavelength axes in log-scale, using these tick labels:
wl_ticks = 0.7 1.0 2.0 3.0 5.0 7.0 10.0
Lets break this down:
# Pyrat Bay run mode [tli pt atmosphere spectrum radeq opacity mcmc]
runmode = retrieval
# Output file names
logfile = ret_wasp39b_fiducial/WASP39b_jwst_fiducial.log
# Verbosity level [1--3]:
verb = 2
This first section defines what we want to run. runmode
indicates that we want a retrieval. logfile sets the path to
the output files. Note that logfile can contain a folder,
which will be created if needed. Finally, verb sets the
screen-output verbosity.
# Observing geometry, select between: [transit eclipse emission]
rt_path = transit
# The observations
dunits = ppm
obsfile = ../inputs/obs_wasp39b_transit_jwst.dat
# Wavelength sampling
wl_low = 0.499 um
wl_high = 12.0 um
Here we define the observing path of the observation (in this case we have a transit), the location of the observation file discussed above (and the desired output units)
wl_low and wl_high set the and the spectral range to model.
Note that the wavelenght sampling is partly set by the opacity
files (resolution and maximum wavelength coverage). One can trim
the wavelength ranges (as shown here) to extract only the region
covered by the observations. One can also lower the resolution
via a wl_thinning = n parameter, which will take every n-th
sample of the opacity files (with n an integer).
# System parameters
rstar = 0.9096 rsun
mstar = 0.9065 msun
tstar = 5512.0
rplanet = 1.2959 rjup
mplanet = 0.266 mjup
# +/- 0.033 mjup
smaxis = 0.0486 au
refpressure = 0.1 bar
And this next section defines the system parameters. For a transit run, the relevant properties will be the stellar radius and the planetary mass, radius, and reference pressure.
Note that this rplanet value is the reference altitute
situated at the refpressure pressure (this is the constrain
to compute the layer’s \(r(p)\) profile under hydrostatic
equilibrium). Also note that refpressure does not need to be
at one of the sampled layers (it can be anywhere in between the
atmosphere pressure range).
# Atmospheric model:
nlayers = 81
ptop = 1e-9 bar
pbottom = 1e2 bar
# Temperature-profile model [isothermal guillot madhu]
tmodel = madhu
#tpars = 1000.0
# Atmospheric composition and abundances:
chemistry = free
species =
H2 He Na K H2O CH4 CO
CO2 HCN NH3 H2S SO2 C2H2
uniform_vmr =
0.85 0.1445 1.000e-10 1.000e-10 1.000e-10 1.000e-10 1.000e-10
1.000e-10 1.000e-10 1.000e-10 1.000e-10 1.000e-10 1.000e-10
# Radius model (hydrostatic equilibrium)
radmodel = hydro_m
These parameters define the atmospheric-profile models. The pressure parameters are clear, the only constraint is that the bottom pressure must be covered by the opacity files. That is, it’s only possible to extrapolate to lower pressures (because then the opacities are in the Doppler broadening regime).
For the temperature profile we will use the [Madhusudhan2009] model. The parameters will be set below when discussing the retrieval parameters.
For the composition we will adopt constant-with-altitude VMR profiles, so we define the species to include and their initial VMR values.
Finally we set the radius-profile model, this is a hydrostatic-equilibrium model assuming a variable gravity depending on the mass of the planet \(g(r) = GM/r^2\).
# Line-sampled cross sections
sampled_cross_sec =
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_H2O_exomol_pokazatel.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_CO_hitemp_2019.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_CO2_ames_ai3000k.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_SO2_exomol_exoames.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_H2S_exomol_ayt2.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_HCN_exomol_harris_larner.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_C2H2_exomol_acety.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_NH3_exomol_coyute.npz
inputs/cross_section_0.15-33.0um_0200-5000K_R025K_CH4_exomol_mm.npz
# Continuum cross sections
continuum_cross_sec =
{ROOT}pyratbay/data/CIA/CIA_Borysow_H2H2_0060-7000K_0.6-500um.dat
{ROOT}pyratbay/data/CIA/CIA_Borysow_H2He_0050-7000K_0.5-031um.dat
# Alkali opacity, select from: [sodium_vdw potassium_vdw]
alkali =
sodium_vdw
potassium_vdw
# Clour models [lecavelier deck]
clouds =
deck 2.0
lecavelier 0.0 -4.0
# Rayleigh models, select from: [rayleigh_H rayleigh_He rayleigh_H2]
rayleigh =
rayleigh_H2
rayleigh_He
Now we define the atmospheric absorbers. Make sure that all absorber
species are defined in the atmospheric composition. Note that some of
these set constraints to the domain to be expored. The
sampled_cross_sec files set the maximum resolution, spectral
range, temperature range, and pressure. The continuum_cross_sec
files set temperature range constraints, but one can span beyond their
wavelength ranges.
On top of the line sampled opacities, we add Na and K alkali models from [Burrows2000], CIA, and H₂ and He Rayleigh opacities. For clouds we add a cloud-deck model and a parametric [Lecavelier2008] Rayleigh model.
vmr_vars =
log_H2O
log_CO2
log_CO
log_SO2
log_H2S
log_HCN
log_CH4
log_NH3
log_C2H2
log_Na
log_K
# Bulk species
bulk = H2 He
# Param name value lo_bound hi_bound step prior prior_sigma
retrieval_params =
log_p1 -4.0 -9.0 2.0 0.3
log_p2 -7.2 -9.0 2.0 0.3
log_p3 -1.0 -2.0 2.0 0.3
a1 1.50 0.02 2.0 0.02
a2 0.35 0.02 2.0 0.02
T0 850.0 800.0 1300.0 30.0
M_planet 0.266 0.1 0.43 0.05 0.266 0.033
log_p_ref -1.0 -9.0 2.0 0.3
log_H2O -1.70 -12.0 -0.3 0.3
log_CO2 -3.00 -12.0 -0.3 0.3
log_CO -2.30 -12.0 -0.3 0.3
log_SO2 -4.05 -12.0 -0.3 0.3
log_H2S -3.15 -12.0 -0.3 0.3
log_HCN -6.00 -12.0 -0.3 0.3
log_CH4 -5.00 -12.0 -0.3 0.3
log_NH3 -5.00 -12.0 -0.3 0.3
log_C2H2 -8.00 -12.0 -0.3 0.3
log_Na -3.25 -12.0 -0.3 0.3
log_K -7.00 -12.0 -0.3 0.3
log_k_ray 3.8 -4.0 10.0 0.3
alpha_ray -2.0 -20.0 2.0 0.1
log_p_cl -1.0 -9.0 2.0 0.3
f_patchy 0.6 0.0 1.0 0.05
Here we define which VMR abundances are considered free parameters and which are ‘bulk’ (i.e., filler) abundances. All other species’ VMRs will be kept fixed at their initial values as defined above.
And then we define the retrievals parameters, their initial values,
boundaries, and priors. Since here we will sample the posterior using
pymultinest [Feroz2009] [Buchner2014], the most important values are
the lower and upper boundaries (the initial value is irrelevant for
the retrieval). The step value determine which parameters are left
free to fit (step>0) and which are kept fixed at their initial
value (step=0, thus making it trivial to try runs with different
configurations).
If desired, one can also set Gaussian priors by specifiying the prior
value and uncertainty after the parameter’s step, as done here for
the planet mass.
Thus, in summary, this retrieval will fit:
temperature profile:
log_p1toT0parameters.mass and reference pressure:
M_planet(units as inmplanet) andlog_p_refin \(\log10(p/{\rm bar})\)aundances:
log_H2Otolog_Kvalues set the \(\log10({\rm VMR})\) for the given species.Rayleigh:
log_k_rayandalpha_rayset the strength and slope of the [Lecavelier2008] model.cloud deck:
log_p_clsets the cloud-top pressure in \(\log10(p/{\rm bar})\) units.cloud patchy factor:
f_patchya value between 0 and 1, which affects both cloud deck and [Lecavelier2008] models.
# Retrieval setup:
sampler = multinest
nlive = 1000
resume = True
theme = blue
data_color = black
post_processing = True
# Retrieval temperature boundaries:
tlow = 800
thigh = 2500
# If set, plot wavelength axes in log-scale, using these tick labels:
wl_ticks = 0.7 1.0 2.0 3.0 5.0 7.0 10.0
Finally, we configure the posterior sampler. In this case we use
pymultinest [Feroz2009] [Buchner2014], with 1000 live points.
resume=True allows you to pick up a previous run and continue from
there.
tlow and thigh allow the code to set additional temperature
range constraints (beyond those set by the temperature-model
parameters).
theme and data_color allow you to customize the color of the
models and data points in the output plots. Any valid matplotlib
color
is a valid color.
The wl_ticks parameter has two effects: if set, it makes the code
to plot wavelengths axes in log scale with the given ticks (otherwise
defaults to a linear scale).
The post_processing = True parameter indicates to compute median
+/-1sigma, and +/-2sigma statistics out of the posterior distribution.
Note that this is a post-process step done after the posterior
sampling is finished. These statistics are computed for the spectra,
the temperature profiles, contribution functions, and VMRs (along with
plots of them). All these data will be neatly packed into a picke
file.
Retrieval run¶
To launch the retrieval run, we use the following command from the
prompt. Since we are using multinest, we will make use of its MPI
parallel-computing capability (thus, the prefix mpirun -n 64):
# Launch the retrieval with 64 parallel CPUs
mpirun -n 64 pbay -c wasp39b_retrieval_transit_jwst.cfg
You can adjust the number of CPUs according to your machine/cluster
limitations. Pyrat Bay internally uses shared memory to optimize
the memory demand.
That’s it. Now we wait until the run is over. This should take from one to a few days depending on your machine.
Retrieval outputs¶
TBD
Detection statistics¶
OK, we have now a posterior distribution for species on WASP-39b based on the JWST observations, for some there are well constrained VMRs, for others there are upper limits. We want now to assess the siginificance of each detection. For this we will run a series of leave-one-out retrievals with the same configuration as before, but removing opacity from one species at a time.
Here’s an extract of what changed in the cofiguration for the run without H₂O:
# Pyrat Bay run mode [tli pt atmosphere spectrum radeq opacity mcmc]
runmode = retrieval
# Output file names
logfile = ret_wasp39b_no_H2O/WASP39b_jwst_no_H2O.log
...
# Param name value lo_bound hi_bound step prior prior_sigma
retrieval_params =
log_p1 -4.0 -9.0 2.0 0.3
log_p2 -7.2 -9.0 2.0 0.3
log_p3 -1.0 -2.0 2.0 0.3
a1 1.50 0.02 2.0 0.02
a2 0.35 0.02 2.0 0.02
T0 850.0 800.0 1300.0 30.0
M_planet 0.266 0.1 0.43 0.05 0.266 0.033
log_p_ref -1.0 -9.0 2.0 0.3
log_H2O -10.0 -12.0 -0.3 0.0
log_CO2 -3.00 -12.0 -0.3 0.3
...
As you see, we only change two lines:
edit
logfileto set the proper name for the outputsfix the H₂O free parameter (
step=0) and set the initial value to a negligible VMR (-10.0)
And then, there are two more optimizations for a better efficiency:
to remove the H₂O file from
sampled_cross_secto consume less resourcesset
post_processing = False, this will prevent the code from running the post processing after the retrieval. We will do that in a separate call in a background process.
Here are sample config files for leave-one-out runs for H₂O, CO₂, and SO₂:
The recommendation is, since we want to run a series of retrievals, we write a script like the one below to concatenate one run after the other.
Note that we added the pbay --post ... calls after each
posterior sampling, with an ampersand at the end to rnu it in the
background. This is useful since the post-processing uses only a
single CPU and might take a few hours to complete. Putting it in
background allow us to launch each retrival right after the previous
one.
# Launch the retrievals, and then the post-processing in the background
mpirun -n 64 pbay -c wasp39b_retrieval_transit_jwst_no_H2O.cfg
pbay --post wasp39b_retrieval_transit_jwst_no_H2O.cfg &
mpirun -n 64 pbay -c wasp39b_retrieval_transit_jwst_no_CO2.cfg
pbay --post wasp39b_retrieval_transit_jwst_no_CO2.cfg &
mpirun -n 64 pbay -c wasp39b_retrieval_transit_jwst_no_SO2.cfg
pbay --post wasp39b_retrieval_transit_jwst_no_SO2.cfg &
Then to start the retrievals, run this command from the prompt:
# Launch leave-one-out retrievals
sh wasp39b_loo_retrievals.sh