Reading HDF/netCDF Data using IDL

In this seminar we will look at the IDL commands available for exploring HDF and netCDF data sets and retrieving data from them. Then we will look at a procedure that automatically prints all relevant information from a file (using a MODIS file for HDF-EOS and a GOES file for netCDF), and a second procedure for extracting data once you know what you are looking for.

HDF/netCDF files typically contain several datasets. The file as a whole and each dataset within it are accompanied by a number of descriptive fields called attributes. Importing your desired dataset from an unfamiliar file into IDL will require the following steps:

  1. Open the file and assign it a file ID.
  2. Find the number of file attributes and datasets.
  3. Read the file attributes.
  4. Select a dataset and assign it a dataset ID.
  5. Find the number of dataset attributes.
  6. Read dataset attributes.
  7. Import the dataset.

Once you are completely familiar with the file, steps 2, 3, 5 and 6 may be omitted. Required syntax will be designated by bold script, user supplied variables by italics. The user supplied variables may be displayed by using the print command. I will include examples as though you were typing them directly while in the IDL environment.

Reading Data in HDF

1. Open the file and assign it a file ID

    fileID = HDF_SD_Start('filename', /read)

If you have assigned the actual filename to a variable (file = 'my_file.hdf'), you do not need the quotes. When you are completely through with the file you should close it using the HDF_SD_End, fileID command.

2. Find the number of file attributes and datasets

    HDF_SD_FileInfo, FileID, num_datasets, num_attributes

3. Read the file attributes

    HDF_SD_AttrInfo, FileID, attribute_index, name = attr_name, data = attr_data
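
Steps 1 to 3 can be combined into a loop over every file attribute. The following is a minimal sketch, not part of the seminar files: the procedure name list_hdf_file_atts is made up for illustration, and because it contains a for loop it needs to be saved in a .pro file and compiled rather than typed line by line.

    ; Hypothetical helper: print the name and type of every file attribute
    pro list_hdf_file_atts, file
      fileID = HDF_SD_Start(file, /read)
      HDF_SD_FileInfo, fileID, num_datasets, num_attributes
      print, 'datasets: ', strtrim(num_datasets, 2), $
             '   file attributes: ', strtrim(num_attributes, 2)
      ; attribute indices are zero based
      for i = 0, num_attributes - 1 do begin
        HDF_SD_AttrInfo, fileID, i, name = attr_name, data = attr_data
        print, strtrim(i, 2) + ': ' + attr_name
        help, attr_data              ; shows the type and size of the value
      endfor
      HDF_SD_End, fileID
    end

You would then call it with list_hdf_file_atts, 'my_file.hdf' from the IDL command line.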

4. Select a dataset and assign it a dataset ID

If you already know the name of the dataset, then you may use

    dataset_index = HDF_SD_NameToIndex(fileID, dataset_name)

Then you use this index to assign a dataset ID:

    datasetID = HDF_SD_Select(fileID, dataset_index)

If you don't yet know the exact name of the dataset, you'll have to explore the datasets one by one (steps 5 and 6). The datasets are zero indexed; for example, EOS data starts with the geolocation data so that longitude is index = 0, latitude is index = 1, etc.

5. Find the number of dataset attributes

    HDF_SD_GetInfo, datasetID, name = dataset_name, natts = num_attributes, $
                ndims = num_dims, dims = dimvector

All of the variables assigned with the equals signs are optional; request only the ones you need.
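
When you do not yet know the dataset names, exploring them one by one is easiest with a loop over the dataset indices. This is a sketch under the same assumptions as the step-by-step commands above; the procedure name list_hdf_datasets is hypothetical, and it must be saved in a .pro file.

    ; Hypothetical helper: list the index, name, dimensions and attribute count
    ; of every dataset in an HDF file
    pro list_hdf_datasets, file
      fileID = HDF_SD_Start(file, /read)
      HDF_SD_FileInfo, fileID, num_datasets, num_attributes
      for i = 0, num_datasets - 1 do begin
        datasetID = HDF_SD_Select(fileID, i)
        HDF_SD_GetInfo, datasetID, name = dataset_name, natts = num_atts, $
                        ndims = num_dims, dims = dimvector
        print, strtrim(i, 2) + '  ' + dataset_name + $
               '  dims: ' + strjoin(strtrim(dimvector, 2), ' x ') + $
               '  attributes: ' + strtrim(num_atts, 2)
        HDF_SD_EndAccess, datasetID  ; release the dataset before selecting the next
      endfor
      HDF_SD_End, fileID
    end
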

6. Read the dataset attributes

    HDF_SD_AttrInfo, datasetID, attribute_index, name = attr_name, data = attr_data

Note that this is the same syntax used to read the global file information. You will want to get the scale factor and offset from the attribute data to convert from the integerized data to the true data (see below).
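
If you already know what the scaling attributes are called, you can look them up by name instead of looping over the indices. The sketch below assumes the attributes are named 'scale_factor' and 'add_offset', which is common in MODIS products but should be checked against your own file. The function name get_hdf_scaling is made up for illustration.

    ; Hypothetical helper: return [scale, offset] for an open dataset.
    ; The attribute names are an assumption; check them in your file.
    function get_hdf_scaling, datasetID
      scale = 1.0d & offset = 0.0d   ; defaults used when an attribute is absent
      idx = HDF_SD_AttrFind(datasetID, 'scale_factor')
      if idx ge 0 then begin         ; HDF_SD_AttrFind returns -1 if not found
        HDF_SD_AttrInfo, datasetID, idx, data = d
        scale = d[0]
      endif
      idx = HDF_SD_AttrFind(datasetID, 'add_offset')
      if idx ge 0 then begin
        HDF_SD_AttrInfo, datasetID, idx, data = d
        offset = d[0]
      endif
      return, [scale, offset]
    end
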

7. Import the selected dataset

    HDF_SD_GetData, datasetID, data_variable, $
                Start = [x, y, z], Count=[xdim,ydim,zdim], Stride=[xjump,yjump,zjump]

The Start, Count, and Stride keywords are optional; without them the entire dataset is read.
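
As an illustration of the optional keywords, the following reads a subsampled block instead of the full array. The numbers are arbitrary and assume a two-dimensional dataset large enough to contain the block; Count gives the number of elements returned in each dimension, not the distance spanned in the file.

    ; Begin at element [10, 20], take every second element in each
    ; dimension, and return a 100 x 50 array
    HDF_SD_GetData, datasetID, subset, Start = [10, 20], $
                Count = [100, 50], Stride = [2, 2]
    help, subset
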

If the data is in integerized form, you will need to convert it to the true values:

    true_data = scale*(filedata - offset)
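
Here is a small worked example of the conversion that you can type at the command line. All of the numbers are made up for illustration (they are not read from a file), and -9999 stands in for whatever fill value your dataset reports. The fill values are replaced by NaN after scaling so that they are not mistaken for real data.

    filedata  = [100, 250, -9999, 400]   ; made-up integerized values
    fillvalue = -9999                    ; made-up fill value
    scale     = 0.01                     ; made-up scale factor
    offset    = -15000.0                 ; made-up offset
    true_data = scale*(float(filedata) - offset)
    bad = where(filedata eq fillvalue, nbad)
    if nbad gt 0 then true_data[bad] = !Values.F_NaN
    print, true_data

Converting to float before scaling avoids integer overflow when the offset is large.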

Now you're done!

Procedures for Exploring and Importing data from HDF-EOS Files

Both sample files discussed below may be copied from the MetoGrads directory into a newly created directory 'hdfsem' by the following UNIX commands:

mkdir hdfsem
cp ~gcm/hdf_netcdf/*.pro hdfsem/

The modis_sds.pro procedure will loop through all the attributes for each dataset in an HDF file, and print the information out to a text file of your choice. When you run the program, you will be prompted to select the HDF file from a directory. There is a cloud file in the MetoGrads directory. To print information about it into the file 'cloud_info.txt' issue the following command from the IDL command line:

modis_sds,'cloud_info.txt'

When the file selection window appears, edit the Filter box (top of the window) to read '/homes/metogra/gcm/hdf_netcdf/*.hdf' and hit the 'filter' button at the bottom of the window. You'll see an HDF file on the right side. Select this with the mouse and hit the 'okay' button at the bottom of the screen. IDL will tell you the information has been saved in 'cloud_info.txt'. You can open this file to find what the variables are named so that you can use the exact character string to extract the data.

The modread.pro procedure will read a specified variable from an HDF file, convert it from integerized form, and also provide information like the variable dimensions and the fill value (which it replaces with the IDL fill value !Values.F_NaN). Let's look at the 'Cloud_Top_Temperature' variable. The procedure requires a file name, but instead of spelling the whole thing out, let's pick it out of a list and store it in a variable called 'filename':

filename = dialog_pickfile(filter='*.hdf')

Now we can read the data and put it in the variable Tcld using the following command:

modread, filename, Tcld, 'Cloud_Top_Temperature', dims, fillvalue

Now that the variable is stored in Tcld, you can explore it using IDL.
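
A few commands that are useful for a first look are sketched below. They assume Tcld is a two-dimensional floating-point array in which the fill values have already been replaced by NaN, as described above; the /nan keyword tells each routine to skip those values.

    help, Tcld                                ; type and dimensions
    print, min(Tcld, /nan), max(Tcld, /nan)   ; range of the valid data
    print, mean(Tcld, /nan)                   ; mean of the valid data
    print, total(finite(Tcld))                ; number of valid points
    tv, bytscl(Tcld, /nan)                    ; quick image in a graphics window
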


Reading Data in netCDF

Much of this will look similar to the HDF methodology, so some of the commentary is reduced. Those who want a fuller explanation may refer to similar sections above.

1. Open the file and assign it a file ID

    fileID = ncdf_open('filename', /nowrite)

When you are completely through with the file you should close it using the ncdf_close, fileID command.

2. Find the number of file attributes and datasets (or variables). The information will be contained in the structure variable that we have named 'fileinq_struct', but you may give it any name you wish so long as you use the proper record names.

    fileinq_struct = ncdf_inquire(fileID)

    nvars = fileinq_struct.nvars
    natts = fileinq_struct.ngatts

3. Read the file (global) attributes

    global_attname=ncdf_attname(fileID,attndx,/global)
    ncdf_attget,fileID,global_attname,value,/global
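
To see every global attribute at once, steps 1 to 3 can be wrapped in a loop. This is a minimal sketch; the procedure name list_ncdf_global_atts is hypothetical, and it must be saved in a .pro file. It opens the file with /nowrite and takes the number of global attributes from the ngatts tag of the structure returned by ncdf_inquire.

    ; Hypothetical helper: print every global attribute of a netCDF file
    pro list_ncdf_global_atts, file
      fileID = ncdf_open(file, /nowrite)
      fileinq_struct = ncdf_inquire(fileID)
      for attndx = 0, fileinq_struct.ngatts - 1 do begin
        global_attname = ncdf_attname(fileID, attndx, /global)
        ncdf_attget, fileID, global_attname, value, /global
        ; character attributes are returned as byte arrays, so convert them
        info = ncdf_attinq(fileID, global_attname, /global)
        if info.datatype eq 'CHAR' then value = string(value)
        print, global_attname, ': ', value
      endfor
      ncdf_close, fileID
    end
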

4. Use the variable index to get the name, dimensions, and number of attributes

    varinq_struct=ncdf_varinq(fileID,varndx)
    variable_name = varinq_struct.name
    dimensions = varinq_struct.dim
    numatts = varinq_struct.natts

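Note that the dim tag holds dimension IDs rather than sizes; ncdf_diminq returns the name and size for each ID. If you do not yet know the variable names, you'll have to explore the variables one by one by index in order to find the one you want. Once you know the name, ncdf_varid (step 6) plays the role that NameToIndex plays in HDF.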
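
Exploring the variables one by one is easiest with a loop over the variable indices. The sketch below assumes a classic netCDF file in which the variable IDs run from 0 to nvars-1; the procedure name list_ncdf_vars is made up for illustration. It uses ncdf_diminq to turn each dimension ID into a name and a size.

    ; Hypothetical helper: list the index, name, type and dimensions
    ; of every variable in a netCDF file
    pro list_ncdf_vars, file
      fileID = ncdf_open(file, /nowrite)
      fileinq_struct = ncdf_inquire(fileID)
      for varndx = 0, fileinq_struct.nvars - 1 do begin
        varinq_struct = ncdf_varinq(fileID, varndx)
        line = strtrim(varndx, 2) + '  ' + varinq_struct.name + $
               '  (' + varinq_struct.datatype + ')'
        ; the loop is skipped for scalar variables (ndims = 0)
        for i = 0, varinq_struct.ndims - 1 do begin
          ncdf_diminq, fileID, varinq_struct.dim[i], dimname, dimsize
          line = line + '  ' + dimname + '=' + strtrim(dimsize, 2)
        endfor
        print, line
      endfor
      ncdf_close, fileID
    end
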

5. Read the variable attributes

First get the name of the attribute by index

    attname=ncdf_attname(fileID,varndx,attndx)

Now read the attribute

    ncdf_attget,fileID,varndx,attname,value

Note how this uses the same command as for getting global attributes, but the variable index must be included when the /global switch is not set.
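
The two commands above can be looped over all of the attributes of one variable. This is a sketch only; the procedure name list_ncdf_var_atts is hypothetical, and varndx is the variable index you found in step 4.

    ; Hypothetical helper: print every attribute of the variable with index varndx
    pro list_ncdf_var_atts, file, varndx
      fileID = ncdf_open(file, /nowrite)
      varinq_struct = ncdf_varinq(fileID, varndx)
      print, 'Attributes of ', varinq_struct.name
      for attndx = 0, varinq_struct.natts - 1 do begin
        attname = ncdf_attname(fileID, varndx, attndx)
        ncdf_attget, fileID, varndx, attname, value
        ; character attributes are returned as byte arrays, so convert them
        info = ncdf_attinq(fileID, varndx, attname)
        if info.datatype eq 'CHAR' then value = string(value)
        print, '  ', attname, ': ', value
      endfor
      ncdf_close, fileID
    end
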

6. Get an ID for the variable

    varID=ncdf_varid(fileID,varname)

7. Import the selected dataset

    ncdf_varget,fileID,varID,variable

If the data is in integerized form, you will need to convert it to the true values using the scale factor and offset that (hopefully) are stored in the attributes for the variable.
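
Steps 6 and 7 and the conversion can be sketched as a single function. This is an assumption-laden example, not the seminar's ncdfread.pro: it assumes the attributes are named 'scale_factor' and 'add_offset' and that they follow the common netCDF packing convention, true = scale*raw + offset. Note that this differs from the HDF formula given earlier, so check the documentation for your file before relying on either form. The function name read_scaled_ncdf is hypothetical.

    ; Hypothetical helper: read a variable by name and apply its scaling.
    ; Attribute names and the formula are assumptions; verify them for your file.
    function read_scaled_ncdf, file, varname
      fileID = ncdf_open(file, /nowrite)
      varID = ncdf_varid(fileID, varname)
      ncdf_varget, fileID, varID, raw
      scale = 1.0 & offset = 0.0     ; defaults used when an attribute is absent
      varinq_struct = ncdf_varinq(fileID, varID)
      for attndx = 0, varinq_struct.natts - 1 do begin
        attname = ncdf_attname(fileID, varID, attndx)
        if attname eq 'scale_factor' then begin
          ncdf_attget, fileID, varID, attname, value
          scale = value[0]
        endif
        if attname eq 'add_offset' then begin
          ncdf_attget, fileID, varID, attname, value
          offset = value[0]
        endif
      endfor
      ncdf_close, fileID
      return, scale*raw + offset
    end

You would call it as data = read_scaled_ncdf(filename, 'variable_name').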

Now you're done!

IDL Procedures for Exploring and Reading netCDF Data

You will need to copy the GOES data file to your directory, as well as the IDL procedures 'ncdfshow.pro' and 'ncdfread.pro'.

The ncdfshow.pro procedure will find the number of variables, the number of attributes per variable, and loop through them in order to write them into a text file that you specify. To run it from the IDL command line type:

    ncdfshow,'fileinfo.txt'

You should typically provide something more descriptive than 'fileinfo.txt' as the name of the file to write the result into. You will be prompted to select a file, so select the GOES data file provided. After it runs, open fileinfo.txt and see what information is in the file.

Now that you know the name of the variable you want to look at, you can run the ncdfread.pro procedure to get the data in raw form. You will want to put the actual filename into a variable that's a little easier to handle first.

    filename = dialog_pickfile()
    ncdfread,filename,'variable_name',data_variable,dims

Where 'variable_name' is a string that must match exactly the name you found in 'fileinfo.txt', data_variable is where the data ends up, and dims is a vector of the array dimensions.

Now that you have the data in an IDL variable, you may need to use any offset and scale information found in the attributes to scale it to a physically meaningful quantity.