Reading HDF Data using IDL

In this seminar we will explore the commands available for exploring HDF data sets and retrieving data using IDL. Then we will look at a procedure for automatically printing all relevant information from an HDF-EOS file, and a second procedure for extracting data once you know what you are looking for.

HDF files typically contain several datasets. The file as a whole and each dataset within it are accompanied by a number of descriptive fields called attributes. Importing your desired dataset from an unfamiliar file into IDL will require the following steps:

  1. Open the file and assign it a file ID.
  2. Find the number of file attributes and datasets.
  3. Read the file attributes.
  4. Select a dataset and assign it a dataset ID.
  5. Find the number of dataset attributes.
  6. Read dataset attributes.
  7. Import the dataset.

Once you are completely familiar with the file, steps 2, 3, 5, and 6 may be omitted. In the syntax lines below, the IDL routine and keyword names must be typed as shown; the remaining names are user-supplied variables, which may be displayed by using the print command. I will write the examples as though you were typing them directly in the IDL environment.

1. Open the file and assign it a file ID

    fileID = HDF_SD_Start('filename', /read)

If you have assigned the actual filename to a variable (file = 'my_file.hdf'), you do not need the quotes. When you are completely through with the file, you should close it using the HDF_SD_End, fileID command. Note that HDF_SD_End takes the file ID returned by HDF_SD_Start, not the filename.
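
As a minimal sketch of this step, the lines below open a file, display its file ID, and close it again. The name 'my_file.hdf' is only a placeholder, and the HDF_IsHDF check is an optional extra that is not required by the steps above.

    file = 'my_file.hdf'                  ; placeholder name; substitute your own file
    print, HDF_IsHDF(file)                ; prints 1 if the file is in HDF format
    fileID = HDF_SD_Start(file, /read)    ; no quotes, because file is a variable
    print, fileID                         ; display the file ID
    HDF_SD_End, fileID                    ; close the file when completely finished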

2. Find the number of file attributes and datasets

    HDF_SD_FileInfo, FileID, num_datasets, num_attributes

3. Read the file attributes

    HDF_SD_AttrInfo, FileID, attribute_index, name = attr_name, data = attr_data
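
Steps 2 and 3 can be combined into a loop over every file attribute. The sketch below is one way to do that, written as a small procedure to be saved in its own .pro file; the name list_file_attrs is hypothetical and not part of IDL. It uses help rather than print for the attribute data, because some file attributes can be very long strings.

    pro list_file_attrs, file
       ; list_file_attrs is a hypothetical helper, not an IDL routine
       fileID = HDF_SD_Start(file, /read)
       HDF_SD_FileInfo, fileID, num_datasets, num_attributes
       print, 'Datasets: ', num_datasets, '   File attributes: ', num_attributes
       ; attributes are zero indexed, like datasets
       for i = 0, num_attributes - 1 do begin
          HDF_SD_AttrInfo, fileID, i, name = attr_name, data = attr_data
          print, i, '  ', attr_name
          help, attr_data                 ; shows the type and size of the attribute data
       endfor
       HDF_SD_End, fileID
    end

It would be called as list_file_attrs, 'my_file.hdf', where the filename is a placeholder.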

4. Select a dataset and assign it a dataset ID

If you already know the name of the dataset, then you may use

    dataset_index = HDF_SD_NameToIndex(fileID, dataset_name)

Then you use this index to assign a dataset ID:

    datasetID = HDF_SD_Select(fileID, dataset_index)

If you don't yet know the exact name of the dataset, you'll have to explore the datasets one by one by index (steps 5 and 6). The datasets are zero indexed; for example, EOS data starts with the geolocation data so that longitude is index = 0, latitude is index = 1, etc.
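
A short sketch of selecting by name is given here, assuming fileID is already open from step 1. HDF_SD_NameToIndex returns -1 when no dataset has the requested name, so it is worth testing the index before selecting. The dataset name is the one used later in this seminar and may not exist in your own file. HDF_SD_EndAccess, which is not covered in the steps above, releases a dataset ID once you have finished with it.

    dataset_name = 'Cloud_Top_Temperature'     ; must match the name in the file exactly
    dataset_index = HDF_SD_NameToIndex(fileID, dataset_name)
    print, dataset_index                       ; -1 means the name was not found
    if dataset_index ge 0 then datasetID = HDF_SD_Select(fileID, dataset_index)
    ; ... steps 5 to 7 go here ...
    if dataset_index ge 0 then HDF_SD_EndAccess, datasetID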

5. Find the number of dataset attributes

    HDF_SD_GetInfo, datasetID, name = dataset_name, natts = num_attributes, $
                ndims = num_dims, dims = dimvector

All of the keyword variables (those assigned with equals signs) are optional; request only the ones you need.
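
When the dataset names are not yet known, steps 4 and 5 can be looped over every index to print a short listing. The following is a sketch only; list_datasets is a hypothetical procedure name, and the procedure would be saved in its own .pro file.

    pro list_datasets, file
       ; list_datasets is a hypothetical helper, not an IDL routine
       fileID = HDF_SD_Start(file, /read)
       HDF_SD_FileInfo, fileID, num_datasets, num_attributes
       for i = 0, num_datasets - 1 do begin
          datasetID = HDF_SD_Select(fileID, i)
          HDF_SD_GetInfo, datasetID, name = dataset_name, natts = num_atts, $
                      dims = dimvector
          print, i, '  ', dataset_name, '  attributes:', num_atts, '  dims:', dimvector
          HDF_SD_EndAccess, datasetID     ; release this dataset before the next one
       endfor
       HDF_SD_End, fileID
    end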

6. Read the dataset attributes

    HDF_SD_AttrInfo, datasetID, attribute_index, name = attr_name, data = attr_data

Note that this is the same syntax used to read the global file attributes, with the dataset ID in place of the file ID. You will want to get the scale factor and offset from the attribute data to convert from the integerized data to the true data (see below).
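
If you already know the attribute names, HDF_SD_AttrFind can look up an attribute index by name, returning -1 when the attribute is absent. The sketch below assumes the scale factor and offset are stored under the names 'scale_factor' and 'add_offset'; that is an assumption about your file, so check the names against the attribute listing first.

    scale_index = HDF_SD_AttrFind(datasetID, 'scale_factor')    ; assumed attribute name
    offset_index = HDF_SD_AttrFind(datasetID, 'add_offset')     ; assumed attribute name
    if scale_index ge 0 then HDF_SD_AttrInfo, datasetID, scale_index, data = scale
    if offset_index ge 0 then HDF_SD_AttrInfo, datasetID, offset_index, data = offset
    help, scale, offset      ; reports UNDEFINED for any attribute that was not found

Attribute data come back as arrays, so a single-valued attribute is a one-element array (scale[0], offset[0]).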

7. Import the selected dataset

    HDF_SD_GetData, datasetID, data_variable, $
                Start = [x, y, z], Count=[xdim,ydim,zdim], Stride=[xjump,yjump,zjump]

The Start, Count, and Stride keywords are optional. If they are omitted, the entire dataset is read.
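
As an illustration of the optional keywords, the lines below read a small block and a coarsely sampled block instead of the whole dataset. This is a sketch that assumes datasetID refers to a two-dimensional dataset with at least a few dozen elements along each dimension; the block sizes are arbitrary.

    HDF_SD_GetInfo, datasetID, dims = dimvector
    print, dimvector                      ; check the dataset size first
    ; a 10 x 10 block beginning at the first element
    HDF_SD_GetData, datasetID, block, Start = [0, 0], Count = [10, 10]
    ; 5 x 5 samples taken with a stride of 2 in each dimension
    HDF_SD_GetData, datasetID, thinned, Start = [0, 0], Count = [5, 5], Stride = [2, 2]
    help, block, thinned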

If the data is in integerized form, you will need to convert it to the true values:

    true_data = scale*(filedata - offset)
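
To show the conversion at work, here is a small self-contained sketch with made-up numbers; the integer values, scale, offset, and fill value are invented for illustration and do not come from any real file. It also marks fill values as NaN so that they are ignored in later statistics.

    filedata = [-5000, 0, 5000, -9999]    ; made-up integerized data
    scale = 0.01                          ; made-up scale factor
    offset = -15000.0                     ; made-up offset
    fillvalue = -9999                     ; made-up fill value
    true_data = scale*(float(filedata) - offset)
    ; replace fill values with NaN, testing against the original integers
    bad = where(filedata eq fillvalue, nbad)
    if nbad gt 0 then true_data[bad] = !Values.F_NaN
    print, true_data
    print, min(true_data, /nan), max(true_data, /nan)

With real data, scale, offset, and fillvalue would come from the dataset attributes read in step 6.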

Now you're done!

Procedures for Exploring and Importing data from HDF-EOS Files

Both sample procedures discussed below may be copied from the MetoGrads directory into a newly created directory 'hdfsem' by the following UNIX commands:

mkdir hdfsem
cp ~gcm/hdf_netcdf/*.pro hdfsem/

The modis_sds.pro procedure will loop through all the attributes for each dataset in an HDF file and print the information to a text file of your choice. When you run the program, you will be prompted to select the HDF file from a directory. There is a cloud file in the MetoGrads directory. To print information about it into the file 'cloud_info.txt', issue the following command from the IDL command line:

modis_sds,'cloud_info.txt'

When the file selection window appears, edit the Filter box (top of the window) to read '/homes/metogra/gcm/hdf_netcdf/*.hdf' and hit the 'filter' button at the bottom of the window. You'll see an HDF file on the right side. Select this with the mouse and hit the 'okay' button at the bottom of the window. IDL will tell you the information has been saved in 'cloud_info.txt'. You can open this file to find out what the variables are named, so that you can use the exact character string to extract the data.

The modread.pro procedure will read a specified variable from an HDF file, convert it from integerized form, and also provide information like the variable dimensions and the fill value (which it replaces with the IDL not-a-number value !Values.F_NaN). Let's look at the 'Cloud_Top_Temperature' variable. The procedure requires a file name, but instead of spelling the whole thing out, let's pick it out of a list and store it in a variable called 'filename':

filename = dialog_pickfile(filter='*.hdf')

Now we can read the data and put it in the variable Tcld using the following command:

modread, filename, Tcld, 'Cloud_Top_Temperature', dims, fillvalue

Now that the variable is stored in Tcld, you can explore it using IDL.
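
For readers who want to see how the seven steps fit together in one procedure, a sketch follows. It is not the actual modread.pro; read_sds_sketch is a hypothetical name, and the attribute names 'scale_factor', 'add_offset', and '_FillValue' are assumptions that should be checked against the attribute listing for your file.

    pro read_sds_sketch, file, data, sds_name, dims, fillvalue
       ; hypothetical sketch, not the modread.pro procedure itself
       fileID = HDF_SD_Start(file, /read)
       index = HDF_SD_NameToIndex(fileID, sds_name)
       if index lt 0 then begin
          HDF_SD_End, fileID
          message, 'No dataset named ' + sds_name
       endif
       datasetID = HDF_SD_Select(fileID, index)
       HDF_SD_GetInfo, datasetID, dims = dims
       HDF_SD_GetData, datasetID, filedata

       ; scale and offset default to values that leave the data unchanged
       scale = 1.0
       offset = 0.0
       a = HDF_SD_AttrFind(datasetID, 'scale_factor')    ; assumed attribute name
       if a ge 0 then begin
          HDF_SD_AttrInfo, datasetID, a, data = attr_data
          scale = attr_data[0]
       endif
       a = HDF_SD_AttrFind(datasetID, 'add_offset')      ; assumed attribute name
       if a ge 0 then begin
          HDF_SD_AttrInfo, datasetID, a, data = attr_data
          offset = attr_data[0]
       endif
       data = scale*(float(filedata) - offset)

       ; replace fill values with NaN if a fill value attribute is present
       a = HDF_SD_AttrFind(datasetID, '_FillValue')      ; assumed attribute name
       if a ge 0 then begin
          HDF_SD_AttrInfo, datasetID, a, data = attr_data
          fillvalue = attr_data[0]
          bad = where(filedata eq fillvalue, nbad)
          if nbad gt 0 then data[bad] = !Values.F_NaN
       endif

       HDF_SD_EndAccess, datasetID
       HDF_SD_End, fileID
    end

Assuming the sketch has been saved and compiled, it could be called in the same way as modread, and the result examined with standard IDL commands:

    read_sds_sketch, filename, Tcld, 'Cloud_Top_Temperature', dims, fillvalue
    help, Tcld
    print, min(Tcld, /nan), max(Tcld, /nan)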