V Accessing and managing Gravitational Waves data
V.1 Accessing frames through the Framelib
The Framelib is integrated into VEGA. This means that all the usual Framelib
functions are accessible from the command line and in macros. For example, you
can try running the macro vframel_ex.C and compare it with the example
FrameFull.c of the Framelib manual.
There is a way to draw the time series (vectors) extracted from frames, and
this is described in the chapter "Representing gravitational wave
data".
V.2 Metadatabase for easy data access
V.2.1 What for?
In a GW detection experiment, we will certainly end up analyzing large
amounts of data. Even if the format of these data is fixed, the data itself
will consist of many hundreds of files, each of many megabytes. This is true
even for local data that a user may want to analyze on his or her own machine.
Furthermore, when somebody uses some files locally, chances are that those files
contain frames that are not contiguous, consisting of chunks of interesting data
scattered over the whole time the experiment was running. How, then, can one
access this data by time? Or by frame/run number?
One solution, for those who enjoy headaches, is to give each file a meaningful
name, containing for example the start time of the first frame in that file,
and to be very disciplined. But what if somebody wants to access a part of the
data that spans two or more frames? This, and other such questions, led to the
idea of a metadatabase.
Suppose we have a bunch of frame files in a directory. We create a
database, consisting in our case of a ROOT file containing two Trees, that
indexes the location (file), start time, and other interesting parameters
(trigger conditions...) of all the frames present in the files of that
directory.
To access one particular frame, one just gives its time to the metadatabase,
which returns that frame. We will see all the possibilities of this approach.
V.2.2 Principle and structure
From the user's point of view, the analysis environment should provide
simple access to any vector or frame contained in a set of frame files, even
remotely. One should also be able to access a vector without caring about
frame boundaries. It should also be possible to make accesses conditioned by
some trigger, slow monitoring value or user-defined condition. Furthermore, one
should be able to manage a set of files that is as big as possible (of the order
of 1 TB or more).
The option chosen in VEGA to solve this problem is to build a database that
contains metadata about frames and indexes these frames. The structure of the
database is drawn in the following figure:
The complete frame information is kept in the frame files, while the
database serves as an index to access them. The access is a two-step process.
First, one gives the starting time of the vector one wants. This starting time
is used to determine, with the help of what is called a "time hash table", the
frame files that possibly contain the desired information. Then, only the
metadata corresponding to these files are searched and finally, the desired
frame is accessed in the relevant file or the desired vector is reconstructed
and given to the user.
The metadata is kept in a container, called a Tree, introduced in ROOT and
specifically designed to handle large amounts of data and access them quickly.
The tree structure can also contain conditions for frame access.
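The two-step access described above can be illustrated with a minimal sketch. It is not the VEGA implementation: ToyFrameMeta, ToyFrameIndex and the fixed bucket width are hypothetical names and choices made only for this illustration, and plain standard C++ containers replace the ROOT Trees. The first step picks the candidate frames from a coarse time bucket, the second step scans only the metadata of those candidates.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical metadata record for one frame (not the VEGA layout).
struct ToyFrameMeta {
    std::string file;   // frame file holding the frame
    double start;       // GPS start time, in seconds
    double length;      // frame duration, in seconds
};

// Toy index: a coarse "time hash table" in front of the frame metadata.
class ToyFrameIndex {
public:
    explicit ToyFrameIndex(double bucketWidth) : width_(bucketWidth) {
        assert(bucketWidth > 0.0);
    }

    // Register a frame in every time bucket it overlaps.
    void Add(const ToyFrameMeta& m) {
        meta_.push_back(m);
        const std::size_t id = meta_.size() - 1;
        for (long b = Bucket(m.start); b <= Bucket(m.start + m.length); ++b) {
            hash_[b].push_back(id);
        }
    }

    // Return the file containing time t, or an empty string if none does.
    std::string FindFile(double t) const {
        const auto it = hash_.find(Bucket(t));   // step 1: candidate frames
        if (it == hash_.end()) return std::string();
        for (std::size_t id : it->second) {      // step 2: scan their metadata
            const ToyFrameMeta& m = meta_[id];
            if (t >= m.start && t < m.start + m.length) return m.file;
        }
        return std::string();
    }

private:
    long Bucket(double t) const {
        return static_cast<long>(std::floor(t / width_));
    }

    double width_;
    std::vector<ToyFrameMeta> meta_;
    std::map<long, std::vector<std::size_t> > hash_;
};
```

With a bucket width of 100 s, for instance, a request for a given GPS time only looks at the few frames registered in the matching bucket, whatever the total number of indexed frames.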
V.2.3 Creation of a metadatabase
- VFrDataBase(char* filename, char* mode, char* framefilenames, char* opt)
To create a local database, we need to create a database object of class
VFrDataBase. At the same time, we can build it by looking for frame files.
For example, in
vega[] vd = new VFrDataBase("demoDB.root","CREATE","./")
we call the VFrDataBase constructor.
The first parameter is the name of the database file.
The second one is the mode with which we open the database. Here, it is
opened in "CREATE" mode, since we want to build it.
The third parameter is the path to the directory containing the frame files.
Here, it is "./", meaning the current directory. You can put whatever path you
like and specify particular files/directories with wildcards.
The search is recursive by default: all the subdirectories are searched as
well. If you want to turn this option off, add a last parameter and specify
"S".
That's it! The time needed to index the files depends on the number and size
of those files. Typically, indexing amounts to a sequential read of all the
files, so its speed should be close to that of the connection to the disk
containing the directories.
V.2.4 Accessing data through a metadatabase
V.2.4.1 Opening an existing database
To access the data easily, we first need to open a database. We can use the
constructor in "READ" mode, which is the default:
vega[] vd = new VFrDataBase("./demoDB.root")
Replace "./demoDB.root" with the path/name of your local database.
V.2.4.2 Extracting frames
Frames are extracted from the database with
- FrameH* GetFrame(Double_t time)
- FrameH* GetFrameR(Double_t relativetime)
The first method extracts a frame at an absolute time, the second at a time
relative to the reference time (see chapter “Dealing with time”).
For example, in
vega[] FrameH* frame = vd->GetFrame(time)
time is a double expressing a time that is contained in the frame.
This method returns a pointer to a well-known FrameH structure.
Once a frame has been extracted, it is possible to extract the next one or
the preceding one in the database:
- FrameH* GetNextFrame()
For example:
vega[] frame = vd->GetNextFrame()
or
- FrameH* GetPreviousFrame()
For example:
vega[] frame = vd->GetPreviousFrame()
Be careful: while GetFrame checks that the requested time is really contained
in one of the frames of the database, GetNextFrame and GetPreviousFrame make no
such check. They simply return the frame that is next (or previous) in time in
the database, even if it is years away.
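Since GetNextFrame and GetPreviousFrame do not check for gaps, a user who needs contiguous data may want to compare the times of two successive frames. The sketch below shows one possible check. ToyFrameSpan and IsContiguous are hypothetical names introduced for this illustration; in practice the start time and duration would be read from the frame header, whose exact fields should be taken from the Framelib manual.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical summary of a frame header: GPS start time and duration (s).
struct ToyFrameSpan {
    double start;
    double length;
};

// True if 'next' begins where 'current' ends, within 'tolerance' seconds.
bool IsContiguous(const ToyFrameSpan& current, const ToyFrameSpan& next,
                  double tolerance = 1e-6) {
    assert(current.length > 0.0 && tolerance >= 0.0);
    const double expectedStart = current.start + current.length;
    return std::fabs(next.start - expectedStart) <= tolerance;
}
```

A loop over successive frames could call such a check after each step and stop, or start a new data segment, as soon as it returns false.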
V.2.4.3 Extracting vectors of any length
Sometimes, one would like to extract a vector whose length is smaller than
that of a frame, or a vector that spans two or more frames. One can
extract such a vector with:
- FrVect* GetVect(Text_t* nameofvect, Double_t start, Double_t length, Option_t* opt="")
- FrVect* GetVectR(Text_t* nameofvect, Double_t start, Double_t length, Option_t* opt="")
The first method extracts a vector starting at an absolute time, the second at
a time relative to the reference time (see chapter “Dealing with time”).
For example:
vega[] FrVect* vect = vd->GetVect("adc.IFO_DMRO",start,length)
GetVect returns a vector of type FrVect given its name and type. The vector
starts at time "start" and has a length "length" (both are doubles). Vectors
extracted from the database are concatenated as needed to obtain the desired
vector. The string nameofvect indicates the type (adc, proc, sim) and the name
of the series to be extracted. The convention for the format is "type.name".
For example, "adc.IFO_DMRO" is a valid name.
If only the name is given, the groups of series in the frame will be
searched in the order adc, proc, sim for that name. If more than one identical
name was defined in the frame, the first found series with that name will be
used.
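The naming rule above can be summarized by a small sketch. It only mimics the documented behaviour on a toy description of a frame; ToyFrameContent and ResolveSeries are hypothetical names, and splitting at the first dot is an assumption of this illustration, not a statement about how VEGA parses the string.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Toy description of a frame: for each group (adc, proc, sim), its series names.
typedef std::map<std::string, std::set<std::string> > ToyFrameContent;

// Resolve "type.name" or a bare "name" into a (type, name) pair.
// The returned type is empty if the series is not found.
std::pair<std::string, std::string> ResolveSeries(const ToyFrameContent& frame,
                                                  const std::string& nameofvect) {
    assert(!nameofvect.empty());
    const std::string::size_type dot = nameofvect.find('.');
    if (dot != std::string::npos) {
        // Explicit "type.name": look only in the requested group.
        const std::string type = nameofvect.substr(0, dot);
        const std::string name = nameofvect.substr(dot + 1);
        const ToyFrameContent::const_iterator it = frame.find(type);
        if (it != frame.end() && it->second.count(name) > 0) {
            return std::make_pair(type, name);
        }
        return std::make_pair(std::string(), name);
    }
    // Bare name: search the groups in the order adc, proc, sim.
    static const char* const kOrder[] = {"adc", "proc", "sim"};
    for (const char* type : kOrder) {
        const ToyFrameContent::const_iterator it = frame.find(type);
        if (it != frame.end() && it->second.count(nameofvect) > 0) {
            return std::make_pair(std::string(type), nameofvect);
        }
    }
    return std::make_pair(std::string(), nameofvect);
}
```

If the same name exists in both the adc and proc groups, the bare name resolves to the adc series, which is why giving the full "type.name" is the safer habit.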
The extracted values of the vector are cast to double by default. This
behaviour can be changed: if the option string opt = "nocast", the extracted
vector retains the original data type. By default, “opt” is empty (no option).
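To make the idea of a vector "not caring about frame boundaries" concrete, here is a minimal sketch of how samples from several frames can be assembled into one vector. It is only an illustration under simplifying assumptions: ToyFrame and ExtractSpan are hypothetical, each toy frame holds a single series at a fixed sampling rate, and gaps between frames are silently skipped, which may differ from what GetVect does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy frame holding one series sampled at a fixed rate (not a Framelib type).
struct ToyFrame {
    double start;               // GPS start time, in seconds
    double rate;                // samples per second
    std::vector<double> data;   // samples, already cast to double
};

// Collect the samples with time in [start, start+length) from time-ordered
// frames, concatenating across frame boundaries.
std::vector<double> ExtractSpan(const std::vector<ToyFrame>& frames,
                                double start, double length) {
    assert(length >= 0.0);
    std::vector<double> out;
    const double stop = start + length;
    for (const ToyFrame& f : frames) {
        for (std::size_t i = 0; i < f.data.size(); ++i) {
            const double t = f.start + static_cast<double>(i) / f.rate;
            if (t >= start && t < stop) out.push_back(f.data[i]);
        }
    }
    return out;
}
```

Requesting one second of data starting in the middle of a one-second frame returns the second half of that frame followed by the first half of the next one.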
V.2.4.4 Extracting n-tuples of SMS data
As was seen in the chapter “IV – Hands on VEGA”, in order to manipulate slow
monitoring (SMS) data, one first has to extract them into an ntuple.
This is done with the VFrDataBase methods:
- VNtuple* ExtractSMS(Text_t* nameofntuple, Text_t* varlist, Double_t start, Double_t length)
- VNtuple* ExtractSMSR(Text_t* nameofntuple, Text_t* varlist, Double_t start, Double_t length)
The first method extracts SMS data starting at an absolute time, the second at
a time relative to the reference time (see chapter “Dealing with time”).
- nameofntuple is the name of the new ntuple to be built.
- varlist is the list of variables to extract. Colons separate
the variables, and each variable name is constructed with the name of the slow
monitoring station, followed by the name of the variable, separated by a dot,
like stationname.varname. So “TiServer.TA1:TiServer.T2:To9Tp.G41” is
a valid variables list. If one wants to extract all the variables of a station,
just give the name of the station, without any variable.
“TiServer:To9Tp” will work (of course if these stations exist).
- start is the time from which the extraction starts in the
database. It is an absolute GPS time for the ExtractSMS method
and a time relative to the reference time for ExtractSMSR.
- length is the length of the extraction section, in
seconds.
The variables and slow monitoring stations are looked for in the first frame
accessible after the time start. If variable names were given, the search
continues until the end of the search interval (until the time start+length is
reached). But if all the variables of a station were asked for, and this
station is not present, the ntuple will not be built.
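The format of the variable list can be illustrated by a short parsing sketch. It is not the VEGA parser: ToySmsRequest and ParseVarList are hypothetical names, and the code only applies the documented convention (items separated by colons, station and variable separated by a dot, a bare station name meaning all its variables).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One requested item: a station and, optionally, one of its variables.
struct ToySmsRequest {
    std::string station;
    std::string variable;   // empty means "all the variables of the station"
};

// Split "station.var:station.var:station" into individual requests.
std::vector<ToySmsRequest> ParseVarList(const std::string& varlist) {
    assert(!varlist.empty());
    std::vector<ToySmsRequest> requests;
    std::istringstream stream(varlist);
    std::string item;
    while (std::getline(stream, item, ':')) {
        if (item.empty()) continue;   // ignore stray colons
        ToySmsRequest r;
        const std::string::size_type dot = item.find('.');
        if (dot == std::string::npos) {
            r.station = item;         // whole station requested
        } else {
            r.station = item.substr(0, dot);
            r.variable = item.substr(dot + 1);
        }
        requests.push_back(r);
    }
    return requests;
}
```

Applied to "TiServer.TA1:TiServer.T2:To9Tp.G41", this yields three requests, while "TiServer:To9Tp" yields two requests with empty variable names.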
V.2.5 Getting general information about the metadatabase and its contents
V.2.5.1 Getting the start time of the metadatabase
To get the start time of the first frame indexed in the metadatabase, use the
VFrDataBase method
- double GetStart()
For example, if vd is a valid database,
vega[] vd->GetStart()
will output this start time, while
vega[] st = vd->GetStart()
will put it in a double that may be reused.
V.2.5.2 Printing information about the metadatabase
One can get some information about the metadatabase by using the
VFrDataBase method
- double Print()
For example, if vd is a valid database,
vega[] vd->Print()
This will show a rather extensive output of the metadatabase content, and
will print the dump of the first frame at the end. This may be used to collect
names of vectors, etc.
V.3 Dealing with selected or triggered data
V.3.1 Condition information in the metadatabase
When the metadatabase is built by reading the frame files, the trigger
information contained in each frame is retrieved and arranged so as to
allow easy extraction of frames or vectors satisfying some selection.
The triggers (structures of type FrTrigData in the FrameLib) are converted
to more general objects called conditions. In the future, this will make it
possible to use other information, such as slow monitoring data or quality
information, as conditions. The user may even add his own conditions, without
copying the whole files just to add a FrTrigData structure. A simple scheme of
the database was given above. This scheme is now enhanced by the addition of
condition trees and an index for fast access:
V.3.2 Condition Sets
Condition sets (class VConditionSet) are objects designed to be used as
iterators on the condition trees, allowing a user to select a subset of all the
conditions, through a selection formula, and to determine successively all the
time intervals where this selection is valid. In other words, a condition set
contains a selection expression on the conditions and is used as a pointer to
the successive intervals of time satisfying this selection.
This will be clearer with an example. Suppose two triggers were defined in
a set of frames, say "Trig1" and "Trig2". These triggers have a given amplitude,
which may vary, and we would like to select, for example, all time intervals for
which "Trig1.amp>50 && Trig2.amp>1". This is our selection expression. A
condition set is built with this expression, for example:
vega[] cs = new VConditionSet(w,"Trig1.amp>50 && Trig2.amp>1")
The object "w" is a valid metadatabase. A condition set is attached to a
particular metadatabase since it contains the condition information.
The user may stop here, since the extraction of all the successive frames or
vectors satisfying the selection expression is described in the paragraphs below
and does not need any other manipulation of the condition set. Nevertheless,
the interested reader can play with the condition set to understand its
behavior.
When first built, a condition set points to the beginning of the
metadatabase. One can make it jump to the next set of valid conditions by using
the method NextFormSet():
vega[] cs->NextFormSet()
Using it repeatedly, the set will point to the successive selected time
intervals, in red in the following figure:
Actually, the selected intervals are not forced to have limits at the
beginning or end of a frame. This depends on the start time and length of the
triggers and is not dependent on the frame length.
Once the condition set has jumped to the next selected interval, one may
retrieve its start and end time:
vega[] start = cs->GetIntersectionStart()
vega[] end = cs->GetIntersectionEnd()
Why is the word "Intersection" appearing here?
Each selection formula used in a condition set references one or more
conditions (or triggers). For example, the formula above references "Trig1" and
"Trig2". Each of these conditions has a start time and an end time. But in the
case where there is more than one referenced condition, the start and end time
of each of them may not coincide. So, when searching a match for the selection,
the first thing done is to search for an intersection of all the referenced
conditions:
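The intersection step can be sketched in a few lines. This is only an illustration of the idea, with hypothetical names (ToyInterval, Intersect); it treats each referenced condition as a single time interval and does not reproduce the internals of VConditionSet.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Validity interval of one condition, in GPS seconds.
struct ToyInterval {
    double start;
    double end;
};

// Compute the common part of all the intervals.
// Returns false (and an unusable 'result') if they do not all overlap.
bool Intersect(const std::vector<ToyInterval>& conditions, ToyInterval& result) {
    assert(!conditions.empty());
    result = conditions.front();
    for (const ToyInterval& c : conditions) {
        result.start = std::max(result.start, c.start);   // latest start
        result.end = std::min(result.end, c.end);         // earliest end
    }
    return result.start < result.end;
}
```

For two conditions valid on [10, 20] and [15, 30], the intersection is [15, 20]; only on that interval does it make sense to evaluate a formula that references both.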
In the example above, we have only used a sequential search, getting all
the interesting intervals. It is also possible to jump directly to the interval
that immediately follows a specified time:
vega[] cs->NearestGEFormSet(time)
where "time" is in GPS format (double, seconds.nanoseconds).
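A direct jump of this kind is typically a binary search over intervals sorted by start time. The sketch below assumes that "GE" stands for "greater than or equal", which the text suggests but does not state, and uses hypothetical names (ToySelectedInterval, NearestGE); it says nothing about how VEGA actually stores or searches its intervals.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One selected time interval, in GPS seconds.
struct ToySelectedInterval {
    double start;
    double end;
};

// Ordering used to verify that the input is sorted by start time.
static bool StartsBefore(const ToySelectedInterval& a, const ToySelectedInterval& b) {
    return a.start < b.start;
}

// Index of the first interval whose start is >= time, or -1 if there is none.
int NearestGE(const std::vector<ToySelectedInterval>& sorted, double time) {
    assert(std::is_sorted(sorted.begin(), sorted.end(), StartsBefore));
    const std::vector<ToySelectedInterval>::const_iterator it = std::lower_bound(
        sorted.begin(), sorted.end(), time,
        [](const ToySelectedInterval& s, double t) { return s.start < t; });
    if (it == sorted.end()) return -1;
    return static_cast<int>(it - sorted.begin());
}
```

The cost grows with the logarithm of the number of intervals, so jumping to a given time stays cheap even when many intervals satisfy the selection.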
V.3.3 Extracting frames with a condition
There are two ways of extracting frames that satisfy a given condition:
directly or sequentially. The direct method is used to extract a particular
frame whose approximate time is known. The sequential method is used if one
needs to process or view sequentially all the frames that correspond to a
given selection expression.
V.3.3.1 Direct methods
The methods of VFrDataBase that allow direct conditioned access to frames
are:
- FrameH* GetNextFrame(Double_t time, char* selection)
- FrameH* GetNextFrameR(Double_t relativetime, char* selection)
The first method takes an absolute time, the second a time relative to the
reference time (see chapter “Dealing with time”). "selection" is a selection
expression referring to conditions existing in the database (hence in the
frames), such as "Trig1.amp>50 && Trig2.amp>2".
Example:
vega[] frame = vd->GetNextFrame(time,"Trig1.amp>50&&Trig2.amp>2")
The search begins at time "time". The frame returned is the one containing the
start time of the first interval satisfying the selection expression. This
method returns a pointer to a well-known FrameH structure.
V.3.3.2 Sequential methods
In order to access sequentially all the frames of interest, satisfying a
selection expression, one needs an object that points to the intervals of
interest. This is the purpose of condition sets. Once a condition set has been
defined, one can use it to extract the frames that were recorded at the
corresponding times. The methods of VFrDataBase to do so are:
- FrameH* GetNextFrame(VConditionSet* condset)
- FrameH* GetNextFrameR(VConditionSet* condset)
The first method extracts a frame at an absolute time, the second at a time
relative to the reference time (see chapter “Dealing with time”). "condset" is
a condition set defined as explained in the paragraph "Condition Sets".
Example:
vega[] frame = vd->GetNextFrame(condset)
The search is governed by the condition set "condset". The frame returned is
the one containing the start time of the next interval pointed to by "condset".
These methods are meant to be called sequentially in a loop, and return a
pointer to a well-known FrameH structure.
One warning: the memory for returned frames is allocated by the system, but
it is the user's duty to free it after use. One has to do:
vega[] FrameFree(frame)
after finishing using the frame object.
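In compiled code built with a modern C++ compiler, forgetting this call in a loop can be avoided by letting a smart pointer do it. The sketch below shows the pattern with stand-ins only: ToyFrameH, ToyGetNextFrame and ToyFrameFree are hypothetical replacements for the real frame type, extraction call and FrameFree, so that the example compiles without the Framelib. This is a suggestion for user code, not something VEGA provides.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the frame type and for the allocate/free calls.
struct ToyFrameH {
    double start;
};
static int g_liveFrames = 0;   // counts frames not yet freed (for the demo)

ToyFrameH* ToyGetNextFrame(double start) {
    ++g_liveFrames;
    return new ToyFrameH{start};
}

void ToyFrameFree(ToyFrameH* frame) {
    assert(frame != nullptr);
    --g_liveFrames;
    delete frame;
}

// Deleter that calls the free function when the smart pointer goes out of scope.
struct ToyFrameDeleter {
    void operator()(ToyFrameH* frame) const { ToyFrameFree(frame); }
};
typedef std::unique_ptr<ToyFrameH, ToyFrameDeleter> ToyFramePtr;

// Process n frames; each one is freed automatically at the end of its iteration.
int ProcessFrames(int n) {
    int processed = 0;
    for (int i = 0; i < n; ++i) {
        ToyFramePtr frame(ToyGetNextFrame(1000.0 + i));
        if (!frame) break;   // no more frames
        ++processed;         // ... use frame->start here ...
    }
    return processed;
}
```

With the real library, the deleter would call FrameFree on the FrameH pointer instead of the toy function; the loop body itself would not change.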
V.3.4 Extracting vectors with a condition
Instead of extracting full frames, it is possible to extract only some
vectors. The start time and length of the condition time interval determine
the start time and length of the vectors. The user does not (yet) control the
length of the vector returned.
As for frames, the extraction may be direct or sequential.
V.3.4.1 Direct methods
The methods of VFrDataBase that allow direct conditioned access to
vectors are:
- FrVect* GetNextVect(Text_t* nameofvect, Double_t gpstime, char* selection)
- FrVect* GetNextVectR(Text_t* nameofvect, Double_t gpstime, char* selection)
They return a vector of type FrVect given its name and type. The vector spans
a time interval for which the conditions expressed in the selection expression
"selection" are valid. This time interval is the nearest one following the
time "gpstime".
Thus, "gpstime" is not the start time of the vector but the start time of
the search.
The string nameofvect indicates the type (adc, proc, sim) and the name of
the series to be extracted. The convention for the format is "type.name". For
example "adc.IFO_DMRO" is a good format. If only the name is given, the groups
of series in the frame will be searched in the order adc, proc, sim for that
name. If more than one identical name was defined in the frame, the first found
series with that name will be used.
The first method starts the search at an absolute time, the second at a
time relative to the reference time (see chapter “Dealing with time”).
For example:
vega[] vect = vd->GetNextVect("adc.IFO_DMRO",gpstime,"Trig1>50")
will return a vector of type FrVect.
Frames where the condition expression "Trig1>50" holds are searched for first.
This gives a start time and length for the time interval of interest.
Then, the vector of type adc and named "IFO_DMRO" is assembled, with a start
time and length matching those of the interval.
V.3.4.2 Sequential methods
In order to access sequentially all the vectors of interest, satisfying a
selection expression, one needs an object that points to the time intervals
of interest. This is the purpose of condition sets. Once a condition set
has been defined, one can use it to extract the vectors that were recorded at
the corresponding times. The method of VFrDataBase to do so is:
- FrVect* GetNextVect(Text_t* nameofvect, VConditionSet* condset)
It returns a vector of type FrVect given its name and type. First, the
condition set "condset" jumps to the next time interval satisfying its internal
conditions. Then, the vector that spans this time interval is extracted.
The string nameofvect indicates the type (adc, proc, sim) and the name of
the series to be extracted. The convention for the format is "type.name". For
example "adc.IFO_DMRO" is a good format. If only the name is given, the groups
of series in the frame will be searched in the order adc, proc, sim for that
name. If more than one identical name was defined in the frame, the first found
series with that name will be used.
For example:
vega[] vect = vd->GetNextVect("adc.IFO_DMRO",condset)
The search is governed by the condition set "condset". The returned
vector (of type FrVect) is the next vector that satisfies the internal
selection expression of the condition set "condset".
This method has been designed to be called sequentially in a loop.
One warning: the memory for returned vectors is allocated by the system,
but it is the user's duty to free it after use. One has to do:
vega[] FrVectFree(vect)
after finishing using the vector object.
V.3.5 Getting information about condition sets
V.3.5.1 Printing the names of the conditions present in a metadatabase
When printing information about a metadatabase, the dump of the first frame
is made. But this doesn’t show all the conditions that exist in a
metadatabase. To show the names of all the conditions gathered when building the
metadatabase, one has to use again the Print() method of VFrDataBase, but with
an option “conditions”. For example if vd is a valid
database:
vega[] vd->Print("conditions")
will do it.
V.3.5.2 Current status of a condition set
A condition set, as explained above, points to a particular section of data
satisfying some criteria. It may be useful to get some information about the
current status of a particular condition set. The following methods of
VConditionSet are used to retrieve this information:
class VConditionSet
- double GetIntersectionStart()
- double GetIntersectionEnd()
Return the start and end time of the current region of interest. This region
is the intersection region of all the conditions that compose the condition
set.
Example, provided cs is a valid condition set:
vega[] cs->GetIntersectionStart()
(double)661234567.1
vega[] cs->GetIntersectionEnd()
(double)661234569.3
class VConditionSet
- double GetLastConditionResult()
Returns the result of the condition formula that defines the condition set,
applied to the current set of conditions. When searching for the next condition
set, this result should be non-zero.
class VConditionSet
- double Eval(const char* formula)
Returns the result of the formula “formula” applied to the current set of
conditions. This makes it possible to see, in a set of conditions, the status
of a particular condition. For example, if “vd” is a valid metadatabase,
suppose the specified condition set is:
vega[] cs = new VConditionSet(vd,"Trig1.a>2 && Trig2.a<20")
The fact that you jumped to the next valid condition set with a
GetNextVect() or GetNextFrame() call (see VFrDataBase) ensures that the
condition "Trig1.a>2 && Trig2.a<20" is satisfied. But you may wonder what the
actual value of the amplitude of condition “Trig1” is. It can be obtained with:
vega[] cs->Eval("Trig1.a")
(double) 11.876
V.4 N-tuples adapted for GW data analysis
Slow monitoring data spans many frames. To deal with SMS data in a simple yet
efficient way, we build a special object called an ntuple that will contain all
this data. We developed a particular ntuple (VNtuple) that differs from the
standard ROOT one in that it is adapted to our needs. You can think of an
ntuple as a list of all the SMS data put in a tree-like structure:
The difference with a simple array is that each leaf of the tree can be any
kind of object, even a tree itself. This leads to a hierarchical structure,
like a directory structure in a file system.
In fact, these more general ntuples are called Trees. In simple ntuples, such
as VNtuples, each leaf is a single float parameter.
VNtuples are derived from ROOT TNtuples and therefore, have access to all
the methods available in TNtuples. We will focus on the ones that were added
and on the ones most frequently used. For the others, the reader may refer to
http://root.cern.ch/root/html/TNtuple.html.
V.4.1 Building a VNtuple
The standard constructor is:
- VNtuple(Text_t* name, Text_t* title, Text_t* listofvariables, Int_t bufsize=32000);
- name is the name of the new VNtuple.
- title is its title.
- listofvariables is the list of variables to be put in the
ntuple. It is a char string made of names separated by colons, for example
"t:x:y:z:var1". This list determines the number of variables in the ntuple.
- bufsize is the buffer size used internally when writing to
disk. The default value is sufficient for most uses.
Example:
vnt = new VNtuple("vnt","Example vntuple","t:TP1:TP2:PR1:PR2")
will build a new ntuple object with 5 variables. In a compiled program, vnt
has to be declared as a VNtuple* before use. It will be automatically declared
if one uses the interpreter.
V.4.2 Filling an N-tuple and getting data
The ExtractSMS method of VFrDataBase does the job for you in case you want
to extract SMS data. It is nevertheless useful to know how to fill an ntuple and
access the data that is inside it, in case you want to loop on this
data.
- void Fill(Float_t* x)
- void Fill(Float_t x0, Float_t x1, Float_t x2=0, Float_t x3=0, Float_t x4=0, Float_t x5=0, Float_t x6=0, Float_t x7=0, Float_t x8=0, Float_t x9=0, Float_t x10=0, Float_t x11=0, Float_t x12=0, Float_t x13=0, Float_t x14=0)
- x is an array of floats.
- x0 to x14: if you have between 2 and 15 variables, you can call the
second form of the Fill method and pass the values directly.
Example :
vnt->Fill(1.43,4.23,b,c,d)
will fill the ntuple with the values specified.
One can fill an ntuple in a loop as many times as needed, especially if the
ntuple is on disk. To make an ntuple on disk, simply open a ROOT file in
“RECREATE” mode BEFORE building the ntuple. The ntuple will
automatically be attached to the file and its contents will be flushed to
disk as soon as the buffers are full (every 32 kbytes by default).
- void GetEntry(Int_t i)
- Float_t* GetArgs()
Extract the variables corresponding to a given entry in the ntuple.
- i is the index of the entry (in filling order).
GetEntry() fills an internal array with the contents of entry i, and GetArgs()
returns a pointer to an array of floats containing the values. There are two
different methods because in the case of Trees, the objects to be returned may
be very complex, and one can try to get only a subset of these objects.
The GetEntries() method returns the number of entries in the ntuple.
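A simple loop to process all the values of an ntuple would be:
Int_t nvar = vnt->GetNvar();   // number of variables per entry
for ( int i=0; i < vnt->GetEntries(); i++ ) {
   vnt->GetEntry(i);           // load entry i into the internal array
   float* x = vnt->GetArgs();  // pointer to the values of entry i
   printf("nvar = %d, x0 = %f\n",nvar,x[0]);
}
Of course, one can do any kind of treatment in the loop.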
If it’s just a matter of printing the values, the Scan() method may
be used. The previous example could be replaced by a simple
vnt->Scan() call.
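For readers who want to experiment with the Fill / GetEntry / GetArgs logic without ROOT, the following toy class mimics the behaviour described above with standard C++ containers. It is only a sketch: ToyNtuple and SumColumn are hypothetical names, everything is kept in memory, and nothing here reflects the actual implementation of VNtuple or TNtuple.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a flat ntuple: rows of floats, one column per variable.
class ToyNtuple {
public:
    // varlist is a colon-separated list of variable names, e.g. "t:TP1:TP2".
    explicit ToyNtuple(const std::string& varlist) {
        std::istringstream stream(varlist);
        std::string name;
        while (std::getline(stream, name, ':')) names_.push_back(name);
        assert(!names_.empty());
        current_.assign(names_.size(), 0.0f);
    }

    // Append one entry; x must point to GetNvar() floats.
    void Fill(const float* x) { rows_.emplace_back(x, x + names_.size()); }

    std::size_t GetEntries() const { return rows_.size(); }
    std::size_t GetNvar() const { return names_.size(); }

    // Copy entry i into the internal array returned by GetArgs().
    void GetEntry(std::size_t i) {
        assert(i < rows_.size());
        current_ = rows_[i];
    }
    const float* GetArgs() const { return current_.data(); }

private:
    std::vector<std::string> names_;
    std::vector<std::vector<float> > rows_;
    std::vector<float> current_;
};

// Loop over all the entries and sum the values of one column.
double SumColumn(ToyNtuple& nt, std::size_t column) {
    assert(column < nt.GetNvar());
    double sum = 0.0;
    for (std::size_t i = 0; i < nt.GetEntries(); ++i) {
        nt.GetEntry(i);
        sum += nt.GetArgs()[column];
    }
    return sum;
}
```

The two-call pattern is the same as in the loop above: GetEntry() selects the entry, GetArgs() gives access to its values.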
V.4.3 Drawing
The drawing methods are described in "Representing gravitational wave
data".
Damir BUSKULIC
Last update: 19/11/2001