The FrameLib is integrated into VEGA. This means that all the usual FrameLib
functions are accessible from the command line and in macros. For example, you
can try to run the macro vframel_ex.C and compare it with the example
FrameFull.c of the FrameLib manual.
The time series (vectors) extracted from frames can also be drawn; this is
described in the chapter "Representing gravitational wave data".
V.2 General scheme for data access
In a GW detection experiment, we will certainly end up analyzing very large
amounts of data. Even once the format of these data is fixed, the data itself
will consist of many hundreds of files, each of many megabytes. This is true
even of the local data a user may want to analyze on his own machine.
Furthermore, when somebody uses some files locally, chances are that those files
contain frames that are not contiguous: chunks of interesting data scattered over
the whole time the experiment was running. How, then, can this data be accessed
by time, or by frame or run number?
The fundamental idea is to separate the files containing the real data from the
metadata (data about data), which holds information about frame file locations,
beginning and end times, conditions, triggers, etc. We will call the place where
this metadata is stored an information database; it may take various forms: a
metadatabase, a bookkeeping database, or the frame files themselves. A mechanism
called the "frame channel" acts as a data provider: it gets the file locations
from the information database, extracts the data from the frame files and
provides the result to the user.
This is shown in the following picture:
Except when building a metadatabase, the
user interacts mainly with the frame channel to get the data in the form of
frames or vectors.
The frame channel interacts with the information database through the exchange
of metadata, while accessing the frame files directly. This scheme can be
extended to any kind of information or metadata and to any frame location, even
a remote one.
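The separation between data and metadata can be illustrated with a minimal
sketch in plain C++ (it uses no VEGA or FrameLib calls). It assumes a
hypothetical FrameRecord holding only a file name, a start time and a length per
frame, and a hypothetical FrameIndex that answers "which file holds the frame
containing time t?". The actual VEGA metadatabase layout is the one described in
Appendix A, not this one.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical metadata record for one frame: where it lives and which time
// span it covers. The frame data itself stays in the frame file.
struct FrameRecord {
    std::string file;  // frame file holding this frame
    double start;      // GPS start time of the frame (s)
    double length;     // frame duration (s)
};

// Hypothetical information database: frame records sorted by start time,
// whatever order the files were scanned in.
class FrameIndex {
public:
    explicit FrameIndex(std::vector<FrameRecord> records) : recs_(std::move(records)) {
        std::sort(recs_.begin(), recs_.end(),
                  [](const FrameRecord& a, const FrameRecord& b) { return a.start < b.start; });
    }

    // Returns the record of the frame containing time t, or nothing if t
    // falls before the first frame or in a gap between frames.
    std::optional<FrameRecord> findFrame(double t) const {
        // First record starting strictly after t...
        auto it = std::upper_bound(
            recs_.begin(), recs_.end(), t,
            [](double time, const FrameRecord& r) { return time < r.start; });
        if (it == recs_.begin()) return std::nullopt;
        --it;  // ...so the previous record starts at or before t.
        if (t < it->start + it->length) return *it;
        return std::nullopt;
    }

private:
    std::vector<FrameRecord> recs_;
};
```

Because the records are sorted once, each lookup is a binary search, and frames
that are non-contiguous or spread over several files need no special handling.
The metadata described above also covers conditions and triggers, which this
sketch leaves out.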
V.3 Different information databases for different uses
Several types of information database are foreseen. Depending on the complexity
of the frame information, the user may choose one or the other.
V.3.1 Simple access: direct files
If one has a few files containing frames ordered sequentially, there is no need
to build a database. One could access them with direct FrameLib calls, and this
is certainly doable. It may be convenient, though, to have exactly the same
calling sequences as in the more complex cases, with the metadata built on the
fly from the information contained in the files themselves. So when building and
opening a frame channel, there is a direct files mode, selected by adding the
prefix "file:" to the names of the files one wants to open, as in:
vega[] fc = new VFrameChannel("file:demo400000000.F demo400000045.F")
The commands and functions to extract a
frame or a vector are, from then on, the same as in the following
cases.
V.3.2 More complex case: metadatabase
When the set of files becomes large, when the frames are scattered in
non-sequential order across a set of files, or when one wants to use conditions
or triggers, it is necessary to build a metadatabase that keeps the information
about frame locations and characteristics. It is used as an index to access the
real frames. One builds such a metadatabase with, for example:
vega[] vd = new VFrameMetaDB("demoDB.root","CREATE","demo*")
The first argument of the metadatabase constructor is the name of the ROOT file
that will contain the metadata, in our case "demoDB.root"; the second is the
opening mode ("READ" by default, but we used "CREATE" to build a new database).
The third argument is the path to the files. As can be seen, wildcards can be
used to specify frame file names, but one could equally specify a directory
name. The search is recursive by default, which means it can start at the top of
a directory tree and descend through all its subdirectories.
The metadatabase structure and methods are
described in Appendix A.
Once a metadatabase is built, one should connect a frame channel to it:
vega[] fc = new VFrameChannel("demoDB.root")
The prefix "metadb:" may be put in front of the metadatabase name, but since
this is the default, it is not mandatory.
The instructions to extract a frame or vector are the same as in the direct file
access case and are explained in the following paragraphs. The only difference
is that frames mixed across various files, or a selected (triggered) set of
frames, are guaranteed to be handled without problems.
V.3.3 Full case: bookkeeping database
In the case of VIRGO, a bookkeeping database is built continuously as frames
come from the detector. Connecting a frame channel to such a database is
foreseen, using the prefix "bkdb:". This will be described in future versions of
VEGA.
V.4 Access to the data through a Frame Channel
V.4.1 Opening a frame channel and connecting to an information database
VFrameChannel(char* dbname)
To create a frame channel, we create an object of class VFrameChannel. At the
same time, it can be connected to an information database, as for example in
vega[] fc = new VFrameChannel("demoDB.root")
which calls the VFrameChannel constructor.
The parameter is the name of the information database. It may be preceded by
the type of database:
"file:" for direct file access. The frame files themselves provide the meta
information.
"metadb:" for a metadatabase, as described in section V.3.2 (this is the
default).
"bkdb:" for an ORACLE bookkeeping database; this feature is not yet available
but is in progress.
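The prefix rule can be sketched as follows, assuming that a name without a
recognized prefix is treated as a metadatabase name (the default mentioned in
V.3.2). DbType, DbName and parseDbName are hypothetical helpers for illustration;
they are not part of the VEGA API.

```cpp
#include <cassert>
#include <string>

// Hypothetical database kinds matching the three documented prefixes.
enum class DbType { File, MetaDB, BkDB };

struct DbName {
    DbType type;
    std::string location;  // file list, ROOT file name, or database name
};

// Hypothetical parser: strips a known prefix and reports the database type.
// Without a recognized prefix, the name is taken as a metadatabase (the default).
DbName parseDbName(const std::string& name) {
    struct Prefix { const char* text; DbType type; };
    const Prefix prefixes[] = {{"file:", DbType::File},
                               {"metadb:", DbType::MetaDB},
                               {"bkdb:", DbType::BkDB}};
    for (const Prefix& p : prefixes) {
        const std::string pre(p.text);
        if (name.compare(0, pre.size(), pre) == 0)
            return {p.type, name.substr(pre.size())};
    }
    return {DbType::MetaDB, name};
}
```

For instance, "demoDB.root" and "metadb:demoDB.root" both yield a metadatabase
located in demoDB.root.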
V.4.2 Accessing data through a Frame Channel
V.4.2.1 Extracting frames
Frames are extracted through the frame channel 'fc', opened as in the previous
paragraph, with
FrameH* GetFrame(Double_t time)
FrameH* GetFrameR(Double_t relativetime)
The first method extracts a frame at an absolute time, the second at a time
relative to the reference time (see chapter “Dealing with time”).
As in
vega[] FrameH* frame = fc->GetFrame(time)
where time is a double expressing a time contained in the frame. This method
returns a pointer to the usual FrameLib FrameH structure.
Once a frame has been extracted, it is possible to extract the next or the
preceding one in time:
FrameH* GetNextFrame()
For example:
vega[] frame = fc->GetNextFrame()
or
FrameH* GetPreviousFrame()
For example:
vega[] frame = fc->GetPreviousFrame()
Be careful: while GetFrame checks that the requested time is actually contained
in one of the frames of the database, GetNextFrame and GetPreviousFrame do not.
They simply return the next (or previous) frame in time in the database, even if
it is years away.
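The difference between GetFrame and GetNextFrame/GetPreviousFrame can be
modeled with a toy cursor over a time-sorted list of frames. This sketch assumes
the database behaves like a sorted list with a current position; Frame, Cursor
and its methods are hypothetical stand-ins, not VEGA classes.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical frame summary: only its time span is kept.
struct Frame { double start, length; };

// Toy cursor over frames sorted by start time.
class Cursor {
public:
    explicit Cursor(std::vector<Frame> frames) : frames_(std::move(frames)) {}

    // Like GetFrame: succeeds only if t lies inside a frame, and moves there.
    const Frame* getFrame(double t) {
        for (std::size_t i = 0; i < frames_.size(); ++i) {
            if (t >= frames_[i].start && t < frames_[i].start + frames_[i].length) {
                pos_ = i;
                return &frames_[pos_];
            }
        }
        return nullptr;  // no frame contains t; position unchanged
    }

    // Like GetNextFrame: no time check at all, just the next frame in the
    // list, however far away in time it is.
    const Frame* getNextFrame() {
        if (pos_ + 1 >= frames_.size()) return nullptr;
        return &frames_[++pos_];
    }

    // Like GetPreviousFrame: the previous frame in the list.
    const Frame* getPreviousFrame() {
        if (pos_ == 0) return nullptr;
        return &frames_[--pos_];
    }

private:
    std::vector<Frame> frames_;
    std::size_t pos_ = 0;
};
```

With frames starting at GPS 600000001 and 700000000, for instance,
getNextFrame() moves from the first to the second even though they are about
three years apart.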
V.4.2.2 Extracting vectors of any length
Sometimes one would like to extract a vector whose length is shorter than a
frame, or a vector spanning two or more frames. Such a vector can be extracted
with:
It returns a vector of type FrVect given its name and type. The vector starts at
time "start" and has length "length" (both are doubles). Vectors extracted
through the frame channel from successive frames are concatenated as needed to
obtain the requested vector. The string nameofvect gives the type (adc, proc,
sim) and the name of the series to be extracted, using the format "type.name".
For example, "adc.IFO_DMRO" is a valid name.
If only the name is given, the groups of series in the frame are searched for
that name in the order adc, proc, sim. If more than one series with that name is
defined in the frame, the first one found is used.
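The naming rule can be written down as a small lookup, assuming a frame is
summarized by the list of series names in each group. FrameGroups, SeriesRef and
resolveSeries are hypothetical. Treating a dot as a "type." separator only when
the part before it is adc, proc or sim is an additional assumption, made so that
bare names containing dots are still searched as a whole.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical frame summary: for each group ("adc", "proc", "sim"),
// the series names in the order they appear in the frame.
using FrameGroups = std::map<std::string, std::vector<std::string>>;

struct SeriesRef { std::string group, name; };

// Resolves "type.name" directly, or a bare "name" by searching the groups in
// the order adc, proc, sim and keeping the first match.
std::optional<SeriesRef> resolveSeries(const FrameGroups& frame, const std::string& spec) {
    auto has = [&](const std::string& group, const std::string& name) {
        auto it = frame.find(group);
        if (it == frame.end()) return false;
        for (const std::string& n : it->second)
            if (n == name) return true;
        return false;
    };
    const std::string groups[] = {"adc", "proc", "sim"};
    const auto dot = spec.find('.');
    if (dot != std::string::npos) {
        const std::string prefix = spec.substr(0, dot);
        for (const std::string& g : groups) {
            if (g == prefix) {  // explicit "type.name"
                const std::string name = spec.substr(dot + 1);
                if (has(g, name)) return SeriesRef{g, name};
                return std::nullopt;
            }
        }
    }
    // Bare name: search adc, then proc, then sim.
    for (const std::string& g : groups)
        if (has(g, spec)) return SeriesRef{g, spec};
    return std::nullopt;
}
```

A name present in both adc and proc therefore resolves to the adc series unless
the "proc." prefix is given explicitly.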
The extracted values of the vector are cast to double by default. This behaviour
can be changed: if the option string opt is "nocast", the extracted vector keeps
its original data type. By default, opt is empty (no option).
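The concatenation step can be sketched for a single channel, assuming frames
that are sorted, contiguous and sampled at the same rate, with raw 16-bit ADC
samples (an assumption; real channels may have other types). FrameData and
extractVector are hypothetical names. The sketch converts every value to double,
mirroring the default cast described above; with "nocast" the original type
would be kept instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical frame holding one ADC channel sampled at a fixed rate.
struct FrameData {
    double start;                   // GPS time of the first sample (s)
    double rate;                    // sampling rate (Hz)
    std::vector<std::int16_t> adc;  // raw samples, original data type
};

// Builds the vector covering [start, start + length) by taking the relevant
// samples from each consecutive frame and converting every value to double.
std::vector<double> extractVector(const std::vector<FrameData>& frames,
                                  double start, double length) {
    std::vector<double> out;
    const double end = start + length;
    for (const FrameData& f : frames) {
        for (std::size_t i = 0; i < f.adc.size(); ++i) {
            const double t = f.start + static_cast<double>(i) / f.rate;
            if (t >= start && t < end) out.push_back(static_cast<double>(f.adc[i]));
        }
    }
    return out;
}
```

The sample-time comparison is exact here only for exactly representable times;
with real GPS times, a small tolerance or pure index arithmetic would be safer.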
V.4.2.3 Extracting n-tuples of SMS data
As was seen in the chapter “IV – Hands on VEGA”, in order to manipulate slow
monitoring data, one first has to extract it into an ntuple. This is done with
the VFrameChannel methods:
The first method extracts SMS data starting at an absolute time, the second at a
time relative to the reference time (see chapter “Dealing with time”).
nameofntuple
is the name of the new ntuple to be built.
varlist
is the list of variables to extract. The variables are separated by colons, and
each variable name is made of the name of the slow monitoring station followed
by the name of the variable, separated by a dot: stationname.varname. So
“TiServer.TA1:TiServer.T2:To9Tp.G41” is a valid variable list. To extract all
the variables of a station, just give the station name without any variable:
“TiServer:To9Tp” will work (provided these stations exist).
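Splitting such a list is straightforward. Here is a sketch, assuming a
hypothetical SmsItem in which an empty variable name stands for "all variables
of the station"; this representation and parseVarList are illustrative, not the
VEGA implementation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One requested item: a station and a variable. An empty variable means
// "all variables of this station".
struct SmsItem { std::string station, variable; };

// Splits a colon-separated list such as "TiServer.TA1:To9Tp" into items.
std::vector<SmsItem> parseVarList(const std::string& varlist) {
    std::vector<SmsItem> items;
    std::stringstream ss(varlist);
    std::string token;
    while (std::getline(ss, token, ':')) {
        if (token.empty()) continue;  // tolerate "a::b" or a trailing colon
        const auto dot = token.find('.');
        if (dot == std::string::npos)
            items.push_back({token, ""});  // whole station requested
        else
            items.push_back({token.substr(0, dot), token.substr(dot + 1)});
    }
    return items;
}
```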
start
is the time at which the extraction starts in the database. It is an absolute
GPS time for the ExtractSMS method and a time relative to the reference time for
ExtractSMSR.
length
is the length of the extraction section, in
seconds.
The variables and slow monitoring stations are looked for in the first frame
available after time start. If variable names were given, the search continues
until the end of the search section (until time start+length is reached). But if
all the variables of a station were requested and this station is not present,
the ntuple will not be built.
V.4.3 Getting general information about the information database and its contents
V.4.3.1 Getting the start time of the information database
To get the start time of the first frame indexed in the information database
(files, metadb, bookkeeping database) to which a frame channel is connected, use
the VFrameChannel method
double GetStart()
For example, if fc is a valid frame channel,
vega[] fc->GetStart()
prints this start time, while
vega[] st = fc->GetStart()
stores it in a double that may be reused.
V.4.3.2 Printing information about the frame channel
Some information about the frame channel and its connected database can be
obtained with the VFrameChannel method
void Print()
For example, if fc is a valid frame channel,
vega[] fc->Print()
This shows summary information about the information database and ends with a
dump of the first frame, which is useful for collecting the names of vectors,
etc.
V.5 Dealing with selected or triggered data
V.5.1 Condition information in the information database
A frame channel is connected to an information database that may be of various
types. However, every such database is expected to keep information about
triggers, conditions and other data, such as quality words, that allow a user to
select a subset of all the frames. When the database is built by reading the
frame files or extracting information from them, the trigger information
contained in each frame is gathered and arranged so as to allow easy extraction
of the frames or vectors satisfying some selection.
The triggers (structures of type FrTrigData in the FrameLib) are converted to
more general objects called conditions. In the future, this will allow other
information, such as slow monitoring data or quality information, to be used as
conditions. The user may even add his own conditions without copying whole files
just to add an FrTrigData structure.
The reader may find an example, the structure of the VEGA metadatabase, in
Appendix A.
V.5.2 Condition Sets
Condition sets (class VConditionSet) are
objects designed to be used as iterators on the conditions gathered in info
databases, allowing a user to select a subset of all the conditions, through a
selection formula, and to determine successively all the time intervals where
this selection is valid. In other words, a condition set contains a selection
expression on the conditions and is used as a pointer to the successive
intervals of time satisfying this selection.
This will be clearer with an example. Suppose two triggers were defined in a set
of frames, say "Trig1" and "Trig2". These triggers have a given amplitude, which
may vary, and we would like to select, for example, all time intervals for which
"Trig1.amplitude>50 && Trig2.amplitude>1". This is our selection expression. A
condition set is built with this expression, as for example in:
The object "fc" is a valid frame channel connected to an info database. Each
information database has its own type of condition set, so the frame channel
asks the info database to create a new condition set adapted to its needs. All
condition sets specific to a database derive from a common parent class,
VConditionSet, which is the type of object returned by the CreateConditionSet()
method.
In other words, each database acts as a
factory of condition sets that will be used only by itself and the frame
channel.
The user may stop here, since the extraction of all the successive frames or
vectors satisfying the selection expression is described in the paragraphs below
and does not need any other manipulation of the condition set. Nevertheless, the
interested reader can play with it to understand its behavior.
When first built, a condition set points to the beginning of the info database.
It can be made to jump to the next set of valid conditions with the method
NextFormSet():
vega[] cs->NextFormSet()
Used repeatedly, it makes the set point to the successive selected time
intervals, shown in dark in the following figure:
Note that the selected intervals do not have to start or end at frame
boundaries: their limits depend on the start time and length of the triggers,
not on the frame length.
Once the condition set has jumped to the next selected interval, its start and
end time can be retrieved:
vega[] start = cs->GetIntersectionStart()
vega[] end = cs->GetIntersectionEnd()
Why does the word "Intersection" appear here?
Each selection formula used in a condition set references one or more conditions
(or triggers). For example, the formula above references "Trig1" and "Trig2".
Each of these conditions has a start time and an end time. When more than one
condition is referenced, their start and end times may not coincide. So, when
searching for a match to the selection, the first step is to search for an
intersection of all the referenced conditions:
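For the two-trigger example above, the intersection step can be sketched in
plain C++, assuming each condition is available as a time-sorted list of
non-overlapping occurrences with one amplitude each. Trigger, Selected and
selectIntervals are hypothetical; real condition sets are created by the
information database and may work differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical trigger occurrence: a time span and one parameter.
struct Trigger { double start, end, amplitude; };

// An interval where both triggers are present and the selection holds.
struct Selected { double start, end; };

// Sweeps two time-sorted, non-overlapping trigger lists, intersects every
// overlapping pair and keeps the intersections for which `select` is true.
std::vector<Selected> selectIntervals(
        const std::vector<Trigger>& trig1, const std::vector<Trigger>& trig2,
        const std::function<bool(const Trigger&, const Trigger&)>& select) {
    std::vector<Selected> out;
    std::size_t i = 0, j = 0;
    while (i < trig1.size() && j < trig2.size()) {
        const double lo = std::max(trig1[i].start, trig2[j].start);
        const double hi = std::min(trig1[i].end, trig2[j].end);
        if (lo < hi && select(trig1[i], trig2[j])) out.push_back({lo, hi});
        // Advance whichever occurrence finishes first.
        if (trig1[i].end < trig2[j].end) ++i; else ++j;
    }
    return out;
}
```

For the selection "Trig1.amplitude>50 && Trig2.amplitude>1", the predicate would
be a lambda returning a.amplitude > 50 && b.amplitude > 1. The two-pointer sweep
visits each occurrence once, so its cost is linear in the number of occurrences.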
In the example above, we have only used a sequential search, getting all the
interesting intervals in turn. It is also possible to jump directly to the
interval immediately following a specified time:
vega[] cs->NearestGEFormSet(time)
where "time" is in GPS format (a double, seconds.nanoseconds).
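Jumping to the interval that follows a given time amounts to a search over the
interval start times. Here is a sketch, assuming the selected intervals are
already known and sorted, and assuming "GE" means the first interval whose start
is greater than or equal to the given time. Interval and nearestGE are
hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical selected interval.
struct Interval { double start, end; };

// Returns the index of the first interval whose start is >= t, or
// intervals.size() if there is none. Intervals must be sorted by start.
std::size_t nearestGE(const std::vector<Interval>& intervals, double t) {
    auto it = std::lower_bound(
        intervals.begin(), intervals.end(), t,
        [](const Interval& iv, double time) { return iv.start < time; });
    return static_cast<std::size_t>(it - intervals.begin());
}
```

A binary search (std::lower_bound) finds the interval in logarithmic time,
instead of stepping through every earlier interval with NextFormSet().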
V.5.3 Extracting frames with a condition
There are two ways of extracting frames that satisfy a given condition: directly
or sequentially. The direct method is used when a particular frame, whose
approximate time is known, is to be extracted. The sequential method is used
when one needs to process or view, one after the other, all the frames that
match a given selection expression.
V.5.3.1 Direct methods
The methods of VFrameChannel that allow direct conditioned access to frames are:
FrameH* GetNextFrame(Double_t time, char* selection)
The first method extracts a frame at an absolute time, the second at a time
relative to the reference time (see chapter “Dealing with time”). "selection" is
a selection expression referring to conditions existing in the database (hence
in the frames), such as "Trig1.amp>50 && Trig2.amp>2".
The search begins at time "time". The frame returned is the one containing the
start time of the first interval satisfying the selection expression. This
method returns a pointer to the usual FrameLib FrameH structure.
V.5.3.2 Sequential methods
In order to access sequentially all the frames of interest satisfying a
selection expression, one needs an object that points to the intervals of
interest: this is the role of condition sets. Once a condition set has been
defined, it can be used to extract the frames recorded at the corresponding
times. The methods of VFrameChannel to do so are:
FrameH* GetNextFrame(VConditionSet* condset)
FrameH* GetNextFrameR(VConditionSet* condset)
The first method extracts a frame at an absolute time, the second at a time
relative to the reference time (see chapter “Dealing with time”). "condset" is a
condition set defined as explained in the paragraph "Condition Sets".
Example:
vega[] frame = fc->GetNextFrame(condset)
The search is governed by the condition set "condset". The frame returned is the
one containing the start time of the next interval pointed to by "condset".
These methods are meant to be called repeatedly in a loop.
Each call returns a pointer to the usual FrameLib FrameH structure.
One warning: the memory for returned frames is allocated by the system, but it
is the user's duty to free it. When finished with the frame object, call:
vega[] FrameFree(frame)
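In a compiled program, one way to make sure each returned frame is freed is to
wrap the pointer in a std::unique_ptr with a custom deleter. The sketch below
uses stand-in types (FakeFrame, fakeGetFrame, fakeFrameFree) so that it runs on
its own; with the real FrameLib, the deleter would call FrameFree on a FrameH*
instead. The same pattern would apply to vectors released with FrVectFree.

```cpp
#include <cassert>
#include <memory>

// Stand-ins for a library that returns heap-allocated objects the caller
// must release with a dedicated function (as FrameFree does for frames).
struct FakeFrame { double start; };
static int g_live = 0;  // frames allocated but not yet freed (demo counter)

FakeFrame* fakeGetFrame(double t) { ++g_live; return new FakeFrame{t}; }
void fakeFrameFree(FakeFrame* f) { if (f) { --g_live; delete f; } }

// Deleter that calls the library's release function instead of plain delete.
struct FrameDeleter {
    void operator()(FakeFrame* f) const { fakeFrameFree(f); }
};

// Owning pointer: the frame is released automatically when it goes out of scope.
using FramePtr = std::unique_ptr<FakeFrame, FrameDeleter>;
```

Each FramePtr releases its frame when it leaves scope, so an early return or a
break out of an extraction loop cannot leak a frame.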
V.5.4 Extracting vectors with a condition
Instead of extracting full frames, it is possible to extract only some vectors.
The start time and length of the condition time interval determine the start
time and length of the vectors. The user does not (yet) control the length of
the returned vector.
As for frames, the extraction may be direct
or sequential.
V.5.4.1 Direct methods
The methods of VFrameChannel that allow direct conditioned access to vectors
are:
They return a vector of type FrVect given its name and type. The vector spans a
time interval in which the conditions of the selection expression "selection"
are satisfied. This time interval is the next one after time "gpstime".
Thus, "gpstime" is not the start time of the vector but the start time of the
search.
The string nameofvect gives the type (adc, proc, sim) and the name of the series
to be extracted, using the format "type.name". For example, "adc.IFO_DMRO" is a
valid name. If only the name is given, the groups of series in the frame are
searched for that name in the order adc, proc, sim. If more than one series with
that name is defined in the frame, the first one found is used.
The first method starts the search at an
absolute time, the second at a time relative to the reference time (see chapter
“Dealing with time”).
Frames where the condition expression "Trig1>50" holds are searched for first.
This gives the start time and length of the time interval of interest. Then the
vector of type adc named "IFO_DMRO" is assembled, with a start time and length
matching those of the interval.
V.5.4.2 Sequential methods
In order to access sequentially all the vectors of interest satisfying a
selection expression, one needs an object that points to the time intervals of
interest: this is the role of condition sets. Once a condition set has been
defined, it can be used to extract the vectors recorded at the corresponding
times. The method of VFrameChannel to do so is:
It returns a vector of type FrVect given its name and type. First, the condition
set "condset" jumps to the next time interval satisfying its internal
conditions. Then the vector spanning this time interval is extracted.
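Extracting the vector that spans one selected interval reduces to index
arithmetic on the sampled series: if sample k of a series starting at t0 with
sampling rate f is taken at t0 + k/f, the interval [start, end) covers the
indices from ceil((start - t0) f) up to, but excluding, ceil((end - t0) f). A
sketch, assuming uniform sampling, with a hypothetical Series type and
sliceInterval helper:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical uniformly sampled series: sample k is taken at t0 + k / rate.
struct Series {
    double t0;
    double rate;
    std::vector<double> samples;
};

// Returns the samples whose times fall in [start, end), i.e. the part of the
// series spanning one selected interval. Indices are clamped to the series.
std::vector<double> sliceInterval(const Series& s, double start, double end) {
    const double n = static_cast<double>(s.samples.size());
    const double first = std::clamp(std::ceil((start - s.t0) * s.rate), 0.0, n);
    const double last = std::clamp(std::ceil((end - s.t0) * s.rate), 0.0, n);
    if (last <= first) return {};
    return std::vector<double>(s.samples.begin() + static_cast<std::ptrdiff_t>(first),
                               s.samples.begin() + static_cast<std::ptrdiff_t>(last));
}
```

Floating-point rounding of (start - t0) f can move a boundary by one sample, so
code working with real GPS times may need a small tolerance.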
The string nameofvect gives the type (adc, proc, sim) and the name of the series
to be extracted, using the format "type.name". For example, "adc.IFO_DMRO" is a
valid name. If only the name is given, the groups of series in the frame are
searched for that name in the order adc, proc, sim. If more than one series with
that name is defined in the frame, the first one found is used.
The search is governed by the condition set "condset". The returned vector (of
type FrVect) is the next vector satisfying the internal selection expression of
the condition set "condset".
This method has been designed to be called repeatedly in a loop.
One warning: the memory for returned vectors is allocated by the system, but it
is the user's duty to free it. When finished with the vector object, call:
vega[] FrVectFree(vect)
V.5.5 Getting information about condition sets
V.5.5.1 Printing the names of the conditions present in an information database
Printing information about a frame channel produces a dump of the first frame,
but this does not show all the conditions that exist in the connected database.
To show the names of all the conditions gathered when building the info
database, use the Print() method of VFrameChannel again, but with the option
“conditions”. For example, if fc is a valid frame channel:
vega[] fc->Print("conditions")
will do it.
V.5.5.2 Current status of a condition set
A condition set, as explained above, points
to a particular section of data satisfying some criteria. It may be useful to
get some information about the current status of a particular condition set. The
following methods of VConditionSet are used to retrieve this
information:
class VConditionSet
double GetIntersectionStart()
double GetIntersectionEnd()
Returns the start and end time of the current region of interest. This region is
the intersection of all the conditions that compose the condition set.
Example, provided cs is a valid condition set:
vega[] cs->GetIntersectionStart()
(double)661234567.1
vega[] cs->GetIntersectionEnd()
(double)661234569.3
class VConditionSet
double GetLastConditionResult()
Returns the result of the condition formula that defines the condition set,
applied to the current set of conditions. When searching for the next condition
set, this result should be non-zero.
class VConditionSet
double Eval(const char* formula)
Returns the result of the formula “formula” applied to the current set of
conditions. This makes it possible to check the status of a particular condition
within the set. For example, if “fc” is a valid frame channel, suppose the
specified condition set is:
The fact that you jumped to the next valid condition set with a GetNextVect() or
GetNextFrame() call (see VFrameChannel) ensures that the condition
"Trig1.a>2 && Trig2.a<20" is satisfied. But you may wonder what the actual
amplitude of condition “Trig1” is. This can be obtained with:
vega[] cs->Eval("Trig1.a")
(double) 11.876
V.6 N-tuples adapted for GW data analysis
Slow monitoring data spans many frames. To handle SMS data in a simple yet
efficient way, we build a special object called an ntuple that contains all this
data. We developed a particular ntuple class (VNtuple), which differs from the
standard ROOT one in being adapted to our needs. You can think of an ntuple as a
list of all the SMS data arranged in a tree-like structure:
The difference from a simple array is that each leaf of the tree can be any kind
of object, even a tree itself. This leads to a hierarchical structure, like the
directory structure of a file system.
In fact, these more general ntuples are called Trees. In simple ntuples such as
VNtuples, each leaf is a single float parameter.
VNtuples are derived from ROOT TNtuples and therefore have access to all the
methods available in TNtuples. We will focus on the methods that were added and
on those most frequently used. For the others, the reader may refer to
http://root.cern.ch/root/html/TNtuple.html.
listofvariables
is the list of variables to be put in the ntuple. It is a character string made
of names separated by colons, for example "t:x:y:z:var1". This list also
determines the number of variables.
bufsize
is the buffer size used internally when writing to disk. The default value is
sufficient for most uses.
Example:
vnt = new VNtuple("vnt","Example vntuple","t:TP1:TP2:PR1:PR2")
builds a new ntuple object with 5 variables. In a compiled program, vnt has to
be declared as a VNtuple* before use; the interpreter declares it automatically.
V.6.2 Filling an N-tuple and getting data
The ExtractSMS method of VFrameChannel does
the job for you in case you want to extract SMS data. It is nevertheless useful
to know how to fill an ntuple and access the data that is inside it, in case you
want to loop on this data.
x0 to x14: if you have fewer than 15 variables and more than one, you can call
the second form of the Fill method, passing the values directly.
Example:
vnt->Fill(1.43,4.23,b,c,d)
fills the ntuple with the values specified.
One can fill an ntuple in a loop as much as one wants, especially if the ntuple
is on disk. To put an ntuple on disk, simply open a ROOT file in “RECREATE” mode
BEFORE building the ntuple. The ntuple is then automatically attached to the
file, and its contents are flushed to disk as soon as the buffers are full
(every 32 kbytes by default).
void GetEntry(Int_t i)
Float_t* GetArgs()
Extracts the variables corresponding to a given entry in the ntuple.
i
is the index of the entry (in filling order).
GetEntry() fills an
internal array with the contents of entry i and GetArgs() returns a pointer to
an array of floats containing the values. There are two different methods
because in the case of Trees, the objects to be returned may be very complex,
and one can try to get only a subset of these objects.
The GetEntries() method returns the number of entries in the ntuple.
A simple loop to process all the values of an ntuple would be:
for (int i = 0; i < vnt->GetEntries(); i++) {
   vnt->GetEntry(i);
   Float_t* x = vnt->GetArgs();
   printf("entry %d: x0 = %f\n", i, x[0]);
}
Of course, any kind of processing can be done inside the loop.
If it’s just a matter of printing the
values, the Scan() method may be used. The previous example could be replaced by
a simple vnt->Scan()
call.
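The Fill / GetEntry / GetArgs pattern can be illustrated with a toy class.
ToyNtuple is a hypothetical stand-in written for this sketch, not VNtuple or
TNtuple: it keeps every entry in memory (no disk buffers, no ROOT), and its
method names simply mirror the ones described above.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for an ntuple of floats (NOT the VNtuple/TNtuple API).
class ToyNtuple {
public:
    // Variable names come from a colon-separated list such as "t:TP1:TP2".
    explicit ToyNtuple(const std::string& varlist) {
        std::stringstream ss(varlist);
        std::string name;
        while (std::getline(ss, name, ':')) names_.push_back(name);
        args_.resize(names_.size());
    }
    int GetNvar() const { return static_cast<int>(names_.size()); }
    long GetEntries() const { return static_cast<long>(rows_.size()); }

    // Appends one entry; rejects it unless there is one value per variable.
    bool Fill(std::initializer_list<float> values) {
        if (values.size() != names_.size()) return false;
        rows_.emplace_back(values);
        return true;
    }
    // Copies entry i into the internal buffer...
    void GetEntry(long i) { args_ = rows_.at(static_cast<std::size_t>(i)); }
    // ...whose contents GetArgs then exposes.
    float* GetArgs() { return args_.data(); }

private:
    std::vector<std::string> names_;
    std::vector<std::vector<float>> rows_;
    std::vector<float> args_;
};
```

GetEntry copies one entry into an internal buffer and GetArgs only exposes that
buffer, which is why the two calls are separate, as explained above.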
V.6.3 Drawing
The drawing methods are described in
"Representing gravitational wave data".