Reading HDF data in Matlab
originally written by Brian Vant-Hull
This page will walk the reader through the process of converting HDF data into Matlab data arrays. How you proceed once you have the array depends on your knowledge of Matlab, and those who do not know how to use the various plot options are encouraged to consult the reference manuals. Most of what's here is in the Matlab help files, but is not explained in nearly enough detail: it took me nearly a full day to make the process work. Hopefully any readers who know how to plot an array of data in Matlab can now be looking at plots of HDF data within 15 minutes of receiving the file.
There are six steps in the importation process:
- Open the HDF file and assign it a file ID
- Extract information about the file description.
- Extract detailed information about the file itself.
- Assign dataset ID(s).
- Extract detailed information about the datasets.
- Import the dataset(s) you are interested in.
- Each step uses information from the previous step. Once you know your data set, steps 2, 3 and 5 may be skipped.
The single Matlab command "hdfsd" with various parameters is used for every command in the process. Italics will be used to represent user inputs, boldface will represent commands or syntax which must be typed exactly as shown. All spaces are optional. For each command I will also show an example and explanations.
Open HDF file and assign it a file ID
fileID = hdfsd('start','filename', 'read')
The filename is the actual HDF file. These are often long and cumbersome, so you have a chance to define a file ID which is shorter and easier to remember. The computer will issue a numerical file ID which can be used interchangeably with the ID you came up with. For all the examples below, I'm interested in studying aspects of clouds in a modis file. The actual filename is impossibly long, so let's suppose I'm trying to look at a file named MOD06_20223.hdf. I might issue the command
>> CloudID = hdfsd('start', 'MOD06_20223.hdf','read')
Note the use of quotation marks. The computer comes back with
CloudID = 393216
The computer uses this number to identify the file for this session, while we will use "CloudID".
Extract information about the file description
[numdatasets, numdescripters, status] = hdfsd('fileinfo', fileID)
You may name the parameters in the square braces what you please, but they represent the number of datasets, the number of sets of descriptive information (which is referred to as "attributes"), and the status, which has no use that I can see and can be omitted. Here's my example:
>> [numdata, numdescr] = hdfsd('fileinfo', CloudID)
Note that there are no quotation marks around CloudID. The computer responds:
numdata = 44
numdescr = 8
Somehow you have to figure out which of these 44 datasets are the ones you want. One of these 8 descriptors (or "attributes" ) will have the information. This brings us to the next step.
Extract detailed information about the file itself
DescripterN = hdfsd('readattr', 'fileID', n)
Where you may have to figure out which attribute number is the one you want. For HDF, all numbered lists are assumed to start at zero. Assume I know Brightness Temperature is somewhere in the file. Let's experiment and see what happens:
>> descriptor0 = hdfsd('readattr', CloudID, 0)
The computer responds: HDFEOS V2.4
This isn't what we want. So we try the next one:
>> descriptor1 = hdfsd('readattr', ,CloudID, 1)
This time the computer comes back with several screens full of information. Part of it is
DataFieldName = "Brightness_Temperature"
DataType = DFNT_INT16
DimList = ("cell_along_swath_5km", "cell_along_swath_5km")
Now we know how to get brightness temperature (along with all the other things in the list). It's data set 8. We need to set up an ID for it.
Assign Dataset ID(s)
DatasetID = hdfsd('select', fileID, DataIndex)
Where DataIndex comes from the data set number you found in the previous step. But since everything starts counting from the geofields latitude (0) and longitude (1) you'll have to add 1 to the data index. For example:
>> BTid = hdfsd('select', CloudID, 9)
The computer responds: BTid = 39824
Once again, we will use our own data set ID rather than the computer generated number. Let's see if we did it right:
Extract Data Set Information
[name, numdim, dimvector, datatype, numdescriptors, status] = hdfsd('getinfo',DatasetID)
Most of these make sense. You are looking at an array of 2 or more dimensions: the number of dimensions is numdim, and the size of each array dimension is dimvector. In my cloud example I'll shorten it a little:
>> [name,numdim,dimvector,type,numdescr] = hdfsd('getinfo', BTid)
The computer responds:
name = Brightness Temperature
numdim = 3
dimvector = 7 406 270 type = int16
numdescr = 10
We have to be careful to realize that the dimvector is backwards: the number of points in each direction is z = 7, y = 406, x = 270. Don't forget! You may use "hdfsd('getattr',DatasetID, n)" to get more detailed information, but it's not described fully in the Matlab help files, and except for the units it's not easy to decipher what the output means. Further investigation is needed.
Read HDF Datasests and Import into Matlab
DatasetVar = hdfsd('readdata', datasetID, startvector, stridevector, endvector)
Where the vectors are the points in each dimension where you start, the distance you move in each step (number of datapoints, not actual distance), and where you stop. I've had problems making the stridevector work right for 3 dimensional arrays where I just wanted to read two dimensions, but you can always import the entire dataset and then whittle it down in Matlab.
For my example we'll need to set up the vectors first. Remember you can name them whatever you want.
>> startvector = [0 0 0]
>> endvector = dimvector (as defined from the previous command!)
>> stride =  (this reads all the data)
>> BTvar = hdfsd('readdata', BTid, startvector, stride, endvector)
You now have the 3 dimensional array "BTvar" in Matlab!