
NeXus File Structure

The following sections describe the basic structure and hierarchy of all NeXus files. The reason for imposing some discipline in the data organization is so that people can easily navigate the files and understand their contents without external documentation. As far as possible, NeXus files should be self-describing, at least to those familiar with the experimental technique. The NeXus file structure is an agreement on what information is included and where, and in no way removes the basic portability of HDF files; generic HDF-aware visualization tools, file content listings and editing programs, etc., should still read the files without problem.

The following sections discuss in general terms the types of data objects stored in NeXus files and their organization. The section on NeXus contents will describe in more detail what goes where.




NeXus Objects

NeXus data files contain two types of entity: data items and data groups.

  1. Data items can be scalar values or multidimensional arrays of a variety of sizes (1-byte, 2-byte, 4-byte, 8-byte) and types (characters, integers, floats). If extra information is required to describe a particular data item, for example to specify the data units, it can be attached to the data object in the form of data attributes.
  2. Data groups are like folders that can contain a number of data items and/or other groups. In effect, a NeXus file can be viewed as a small file system. Just as files are stored in folders (or subdirectories) to make them easy to locate, so NeXus data items are stored in groups. The group hierarchy is designed to make it easy to navigate a NeXus file.
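
As an illustration of the two kinds of entity, the following is a minimal Python sketch of an in-memory model. It does not use HDF or the NeXus API; the class names DataItem and Group, the lookup method, and the sample/temperature example are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass
class DataItem:
    """A scalar or array value plus its data attributes (e.g. units)."""
    value: Any
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Group:
    """A folder-like container of data items and/or other groups."""
    children: Dict[str, Union["Group", DataItem]] = field(default_factory=dict)

    def lookup(self, path: str) -> Union["Group", DataItem]:
        """Resolve a slash-separated path, like a path in a file system."""
        node: Union[Group, DataItem] = self
        for part in path.strip("/").split("/"):
            if not isinstance(node, Group):
                raise KeyError(f"{part!r} lies below a data item, not a group")
            node = node.children[part]
        return node


# Hypothetical content: a group holding one scalar item with a units attribute.
root = Group()
sample = Group()
root.children["sample"] = sample
sample.children["temperature"] = DataItem(4.2, {"units": "K"})

item = root.lookup("/sample/temperature")
print(item.value, item.attrs["units"])
```

The path lookup mirrors the file-system analogy: each path component names a child of the group reached so far.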


NeXus Data

All NeXus data are stored as multidimensional arrays. These are equivalent to Scientific Data Sets in HDF4 and simple datasets in HDF5. The multidimensional data can have any data type (floating point, integer, character string etc.) and can be endowed with arbitrary attributes such as labels, units, calibration offsets etc.

In HDF4, we use SDSs even for scalar values, although this is much less efficient than using HDF Vdata. This is because we consider it important that all data can have at least the "unit" attribute defined, and the original versions of HDF4 did not support Vdata attributes. We were also concerned about the extra complexity in the NeXus API entailed by using more than one type of data object. In the HDF4 version, we make use of some "tricks" for improving the storage efficiency of SDSs, such as preventing the creation of default dimension scales, and increasing the size of the HDF header blocks. This is not a problem in the HDF5 version, which has a unified data model similar to the HDF4 SDS.

NeXus Attributes

Attributes are extra information associated with particular data sets. They are used to annotate the data, e.g. with units or calibration offsets, and may be of any data type. In addition, NeXus defines other attributes that will be used to identify primary data signals, plotting axes etc. Finally, NeXus files will themselves be annotated with global attributes used to identify the NeXus version, file owner, etc.


NeXus Groups

NeXus data are linked together in groups, which can be thought of as folders or directories in a file system. HDF allows data objects to be linked to more than one group without occupying any more storage space. This mechanism is analogous to the linking (or aliasing) of files in Unix (or Macintosh) file systems, although no one instance of such a link can be considered the "parent" of the others.
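
The linking behaviour can be pictured with ordinary Python object references. This is only an analogy under stated assumptions: the two dicts are hypothetical stand-ins for groups, the name polar_angle is invented for the example, and no HDF call is involved.

```python
# Two groups modeled as plain dicts (hypothetical stand-ins for HDF groups).
detector = {"polar_angle": [10.0, 20.0, 30.0]}
data = {}

# "Link" the existing item into a second group: no copy of the values is made.
data["polar_angle"] = detector["polar_angle"]

# Both names refer to the same stored object ...
print(data["polar_angle"] is detector["polar_angle"])

# ... so a change made through one group is visible through the other.
detector["polar_angle"][0] = 12.5
print(data["polar_angle"][0])
```

As in the text, neither reference is the "parent" of the other; both are equally valid routes to the same object.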

We use NeXus groups in order to make the layout of NeXus files easy to understand. The advantage of such a hierarchical organization is most evident when a lot of information, e.g. instrument descriptions, is stored. If there is minimal information available, some of the hierarchy will appear redundant but, once the principles of the NeXus file organization are understood, it should be easy to identify and retrieve any information that is stored.

In addition to their regular names, groups can have class names. This allows us to use some object-oriented concepts in designing NeXus files as discussed in more detail in the next section.



NeXus Classes

NeXus groups can be assigned both names and classes. This allows us to use some object-oriented concepts in designing NeXus files. In particular, we use group classes to define the type of group object and its expected contents whereas the group name labels a particular instance of that object. In some cases, the groups will actually define physical objects, such as crystal monochromators or disk choppers. In others, the group will define a logical set of descriptive data.


Class Rules



NeXus Layout

In this section, we describe the group structure of NeXus files. It is important to remember that it is not necessary for all these groups to be present in each NeXus file. However, if they are present, their locations should conform to the layout below. The actual contents of these groups will be described in a later section.


NeXus Groups

Although NeXus groups can have any descriptive name, their classes are restricted to those defined by the NeXus standard. These all have NX as a prefix. The following groups are examples of those typically found in each NeXus file. The links are to XML files in which the group contents are formally defined using the NeXus meta-DTD format.


NXentry All the data, including instrument and sample descriptions, which logically make up a single scan or measurement. At many facilities, this corresponds to the entity that is defined by a single run number, which could be used to name the NXentry group. There can be many NXentry groups in each NeXus file.
NXdata The data to be plotted i.e. a single data set comprising the measurements along with the data errors, and the default axis scales and labels required to plot the data. There can be more than one NXdata group in each NXentry if there are several detector banks producing plottable data.
NXsample The information needed to define the physical state of the sample during the scan e.g. temperature, magnetic field, crystal mosaic.
NXinstrument The information needed to describe the instrument. In general, this group will contain several other groups describing the instrument components e.g. choppers, collimators, detectors. We give a few examples here of NeXus groups stored in the NXinstrument group. A comprehensive listing is given in the NeXus contents section.
NXsource The properties of the source that may be relevant for the experiment, such as the reactor power or the accelerator target material. This group could also contain logged data concerning the source performance such as the proton beam current.
NXchopper If users want to store detailed instrumental descriptions with the NeXus data, they should store them in groups such as this. Each group corresponds to a beamline component of the instrument which has a defined position with respect to the sample position, i.e. positive distances are downstream from the sample, negative are upstream. The sample has, by definition, a distance of zero.
NXdetector The information needed to describe the type, position, solid angle and efficiency of the detectors. It is also commonly used to store scattering angles and, in time-of-flight machines, the time-of-flight, since the time-gating of the measurements is a property of the detector counting chain.

The simplest HDF files conforming to the NeXus standard would consist of a single NXdata group contained within a single NXentry group. This would allow plotting programs to automatically select and plot the data but would not be sufficient for more detailed data analysis requiring instrumental and sample parameters.
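
The minimal layout can be sketched in Python with nested dicts standing in for HDF groups. The helper names group and has_minimal_layout, the dict layout, and the group names run_1234 and bank_1 are assumptions made for this sketch; only the class names NXentry and NXdata come from the text above.

```python
def group(nx_class, **children):
    """Build a group as a plain dict: a class name plus named children."""
    return {"class": nx_class, "children": children}


def has_minimal_layout(root):
    """Return True if some top-level NXentry group contains an NXdata group."""
    for entry in root["children"].values():
        if entry.get("class") != "NXentry":
            continue
        if any(child.get("class") == "NXdata"
               for child in entry["children"].values()):
            return True
    return False


# The group names ("run_1234", "bank_1") are free-form labels chosen here
# for illustration; only the class names are fixed by the NeXus standard.
root = group(None, run_1234=group("NXentry", bank_1=group("NXdata")))

print(has_minimal_layout(root))
```

The check looks only at class names, never at group names, which reflects the rule that names are descriptive while classes are restricted.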



NeXus Data

One of the aims of the NeXus design was to make it possible for standard plotting applications to identify the plottable data automatically (or semi-automatically). Much of the data stored in NeXus files are meta-data, i.e. not actual measurements but additional information required to interpret the data. In order to distinguish the actual measurements from this meta-data, the measurements are stored separately in groups with the class NXdata. The data defining the axis scales, i.e. the physical values corresponding to the data dimensions, are also stored in the same group to make automatic plotting easier.


NXdata Rules


Technical Note
The NXdata group is only required because of limitations in the way HDF4 provides dimension scales. In HDF4, each dimension scale is a one-dimensional SDS which is linked to the data SDS by name rather than by a unique tag/reference pair. This means that it is not possible to have more than one dimension scale of the same name in the entire HDF file. This is an unacceptable limitation for the NeXus format. Instead, we identify dimension scales by their appearance in the same NXdata group and the value of their "axis" attribute. HDF5 has no standard way of identifying dimension scales.

Identifying Plottable Data

Any program whose aim is to identify plottable data should use the following procedure.

  1. Open the first top level NeXus group with class NXentry.
  2. Open the first NeXus group with class NXdata.
  3. Loop through NeXus data items in this group searching for the item with attribute "signal" = 1. This is the plottable data.
  4. Check to see if this data item has an attribute called "axes". If so, the names of the dimension scales are given as a comma-delimited string within this attribute, in the C-order of the data array, and you can skip the next two steps.
  5. If the "axes" attribute is not defined, search for the one-dimensional NeXus data items with attribute "primary" = 1. These are the dimension scales to label the axes of each dimension of the data.
  6. Link each dimension scale to the respective data dimension by the "axis" attribute (= 1,2,...,rank of data).
  7. If necessary, close the NXdata group, open the next one and repeat steps 3 to 6.
  8. If necessary, close the NXentry group, open the next one and repeat steps 2 to 7.
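
The procedure above can be sketched in Python over an in-memory tree. This is not the NeXus API: the dict layout and the helpers group, item, rank and find_plottable are assumptions made for this sketch, as are the example names entry1, data1, counts, polar_angle and time_of_flight.

```python
def group(nx_class, groups=None, items=None):
    """A group: class name, child groups by name, data items by name."""
    return {"class": nx_class, "groups": groups or {}, "items": items or {}}


def item(value, **attrs):
    """A data item: a (nested) list of values plus its attributes."""
    return {"value": value, "attrs": attrs}


def rank(value):
    """Number of dimensions of a nested-list array (0 for a scalar)."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0] if value else None
    return depth


def find_plottable(root):
    """Yield (entry name, NXdata name, signal name, axis names) tuples."""
    for entry_name, entry in root["groups"].items():          # steps 1 and 8
        if entry["class"] != "NXentry":
            continue
        for data_name, nxdata in entry["groups"].items():     # steps 2 and 7
            if nxdata["class"] != "NXdata":
                continue
            items = nxdata["items"]
            signal = next((name for name, it in items.items()
                           if it["attrs"].get("signal") == 1), None)  # step 3
            if signal is None:
                continue
            attrs = items[signal]["attrs"]
            if "axes" in attrs:                                # step 4
                axes = [name.strip() for name in attrs["axes"].split(",")]
            else:                                              # steps 5 and 6
                axes = [None] * rank(items[signal]["value"])
                for name, it in items.items():
                    a = it["attrs"]
                    if (rank(it["value"]) == 1 and a.get("primary") == 1
                            and 1 <= a.get("axis", 0) <= len(axes)):
                        axes[a["axis"] - 1] = name
            yield entry_name, data_name, signal, axes


nxdata = group("NXdata", items={
    "counts": item([[1, 2, 3], [4, 5, 6]], signal=1),
    "polar_angle": item([10.0, 20.0], axis=1, primary=1),
    "time_of_flight": item([100.0, 200.0, 300.0], axis=2, primary=1),
})
root = group(None, groups={"entry1": group("NXentry", groups={"data1": nxdata})})

for found in find_plottable(root):
    print(found)
```

A dimension with no matching scale is left as None in the returned list, so a caller can fall back to plotting against the array index.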

Consult the NeXus API section, which describes the routines available to program these operations. In the course of time, generic NeXus browsers will provide this functionality automatically.


NeXus Histograms

Neutron counts are often stored as histograms. In time-of-flight measurements, the counts are collected in a set of contiguous time bins defined by the counting electronics. In position-sensitive detectors, the counts are collected in a set of contiguous detector pixels in either one or two dimensions. Although this is a very common state of affairs, very little software makes explicit use of the histogram nature of these counts even though it can make some operations, such as integration or unit conversion, much simpler to perform. It is not even easy to plot the data as histograms in many widely-used software packages.

As far as NeXus is concerned, the main problem that we must address is how to store histograms and their dimension scales. There are two issues: how to store the data (counts or distributions?), and how to store the dimension scales (bin centers or bin boundaries?). Whichever we choose, it must be possible to read the data in the alternative form.

Data Storage

Counts are collected between lower and upper bin boundaries. However, what should be plotted are not the counts themselves, but the distributions that they represent i.e. counts/bin-width, since the counts represent an integral of the true distribution over the size of the bin. This becomes obvious when the bins are not all of equal size. In order to make the plotting of NeXus data as easy as possible, it is recommended that the data are stored as distributions automatically with the units specified accordingly e.g. counts/microsecond, counts/mm. However, the data may be stored as integer counts with units set to "counts". When the API is able to provide automatic unit conversion, the calling program will be able to specify what units the returned data should have.
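
The conversion from counts to a distribution is a division by the bin width. The following Python sketch shows it for one dimension; the function name counts_to_distribution and the example numbers are assumptions made for illustration.

```python
def counts_to_distribution(counts, boundaries):
    """Divide each count by the width of its bin (counts per unit of the scale)."""
    if len(boundaries) != len(counts) + 1:
        raise ValueError("need exactly one more bin boundary than counts")
    return [c / (hi - lo)
            for c, lo, hi in zip(counts, boundaries, boundaries[1:])]


# Hypothetical time-of-flight bins of unequal width, in microseconds.
boundaries = [0.0, 1.0, 3.0, 6.0]
counts = [10, 20, 30]

distribution = counts_to_distribution(counts, boundaries)
print(distribution)  # units: counts/microsecond
```

In this example the raw counts rise from bin to bin only because the bins get wider; the distribution is flat, which is why plotting the counts directly would be misleading.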

Dimension Scale Storage

For contiguous histogram bins, there is one more bin boundary than the number of data points. Since very few plotting or analysis packages treat histogram data explicitly, we recommend that all dimension scales are stored as bin centers, but with the offset from the first bin center to its lower bin boundary specified by the "histogram_offset" attribute. This allows programs to reconstruct the original bin boundaries when required. An advantage of this scheme is that it ensures that the size of each dimension scale is the same as that of the corresponding data dimension. The existence of the "histogram_offset" attribute defines the data set as a histogram. However, there are occasions where the process of reconstructing the original bin boundaries leads to unacceptable rounding errors, so its use is not mandatory.
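
A Python sketch of the reconstruction follows. It assumes that each stored bin center is the midpoint of its bin, which the text above does not state explicitly; the function name boundaries_from_centers and the example numbers are also assumptions.

```python
def boundaries_from_centers(centers, histogram_offset):
    """Rebuild contiguous bin boundaries from bin centers.

    Assumes each center is the midpoint of its bin, so that the upper
    boundary of a bin is 2 * center - lower boundary.
    """
    boundaries = [centers[0] - histogram_offset]
    for center in centers:
        boundaries.append(2.0 * center - boundaries[-1])
    return boundaries


# Hypothetical scale: three bin centers, with the first bin starting
# 0.5 below its center.
centers = [0.5, 2.0, 4.5]
print(boundaries_from_centers(centers, histogram_offset=0.5))
```

Each boundary is computed from the previous one, so any rounding error in the stored centers is carried forward along the scale. That is one way the unacceptable rounding errors mentioned above can arise.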


Comments to: Ray Osborn <ROsborn@anl.gov>
Revised: Saturday, September 14, 2002

Copyright © 1996-2002 NeXus Design Team. All rights reserved.