CLIDATA

The CLIDATA system is primarily intended for archiving climatological data, for data quality control, and for the administration of climatological stations and station observations. The system was designed to replace the older CLICOM system, which was used in the Czech Republic from 1993 to 2000.

The system is designed for the Oracle database environment, which provides simple and secure access to stored data.

Thanks to its flexibility, easy administration and multi-language support, the system can be set up in any country and for any meteorological service. It has been in operational use at the Czech Hydrometeorological Institute for 14 years and has been successfully installed in more than 30 other countries.

The system is particularly user-friendly during the definition of stations, station observations and manual key entry of the data. The system facilitates the population of data from automated (real-time) stations as well as the definition of personalised key entry forms.


1 Introduction

2 History of automated processing of climate data

3 History of CLIDATA

  3.1. ORACLE database system
  3.2. Programming works
  3.3. CLIDATA development coordinator
  3.4. Restrictions in CLIDATA development
  3.5. Programming code
  3.6. Language versions
  3.7. Assigning of specific development tasks
  3.8. Putting into operation
  3.9. Basic access to CLIDATA structures
  3.10. User rights
  3.11. Starting CLIDATA
  3.12. Entering data in CLIDATA
  3.12.1. Definition of an element
  3.12.2. Definition of the station
  3.12.3. Definition of station elements
  3.12.4. Definition of a data acquisition form sheet
  3.12.5. Entering of data

4 Description of basic parts of CLIDATA

  4.1. Geography of the station
  4.2. Description of the station observation
  4.3. Data
  4.3.2. One minute rainfall intensity
  4.3.3. Meteorological phenomena
  4.3.4. Pentade, decade and monthly data
  4.3.5. Number of days
  4.3.6. Long-termed averages and extremes
  4.3.7. Normals
  4.4. Data acquisition and data control
  4.4.1. Data acquisition
  4.4.2. Data control
  4.5. System administration
  4.5.1. Relation to CLICOM
  4.5.2. Simple code lists
  4.5.3. Administration of import files
  4.5.4. Calculation equations
  4.5.5. Preparation of data e-mails
  4.5.6. FTP transfer of data and information
  4.5.7. Administration of CLIDATA
  4.6. Products
  4.6.1. Wind rose
  4.6.2. X-day function
  4.6.3. Rainfall intensity
  4.6.4. User EDATA
  4.6.5. Regular inquiries
  4.6.6. CLIMAT report
  4.6.7. Frequency characteristics

5 Using CLIDATA in the Czech Republic and in the world

  5.1. CLIDATA in ČHMÚ
  5.2. CLIDATA in the world

6 Short-period forecast

7 Conclusion

In recent years, the importance of climate database applications has increased considerably. Historical climate data together with the output of automatic meteorological stations make up a large volume of climatological information. Existing database systems take various approaches to data processing. Advanced climatological applications comprise measurement data, station information, metadata, a data section and other products. Integrated components make it possible to link the climatological application directly with other applications (such as GIS) and to provide various outputs. The Czech Hydrometeorological Institute (“ČHMÚ”) develops such a climatological database application, CLIDATA, which includes metadata and is used for basic processing of climate data.

Metadata provide a geographic description of the measuring stations and support station administration. Additional information can be included in the station description, for instance notes, maps, photos or a sun horizon line if the landscape near the station is complicated. The sun horizon line also influences sunshine measurements and solar radiation parameters. Photos help to describe the influence of cities on observations (if the station is installed in a built-up area). Each station definition contains an observation description. A list of station elements and observation schedules is stored in the database for existing and historic stations. Several pieces of descriptive information can exist for each element, depending on how the observation of the individual components developed over time. A measuring instrument can be recorded in the database for each element. The station information includes other details (such as replacement and repairs of instruments, the number of instruments used, results of check visits, a list of faults in the station observation, or training of observers) as well as detailed information about the station observer.

Data can be entered into the application in any of four ways. The first method is an import of data from previous applications. The second method is an import of text files. The application defines import methods that describe the format of a line of the text file to be imported (such as the identification of the station, the date and the element). In that case, the user only has to provide the server with the text file. Once the import is completed, the user checks the import protocol. If the import has failed, the user has to import the text file again. The third method is an import of data from telecommunication networks or other sources of electronic data. In that case, the format of the data report must be identified. As in the cases above, the description of the internal structure of the data report is stored in the application. The fourth method is to key in data from paper reports. First, it is necessary to define a data acquisition form sheet, which looks like the paper statement. The form sheet can be organised by lines or by columns and can include check sums in the columns and/or lines. Once the data are filled in, preset check formulae immediately check them for correctness. Any faulty or suspicious values are marked with various colours. Special data acquisition form sheets are available for data about meteorological phenomena. They cover as many as 24 phenomena per station per day. The form sheet includes the symbol of the meteorological phenomenon as well as its start, end and intensity. The data from the data acquisition form sheets are saved directly into the database tables.

Data are verified in three steps. First, the upper and lower limits for each element are checked irrespective of the location and altitude of the station. This check is only preliminary. Then, a “static” check is carried out on the basis of formulae that define mutual relations among the elements. For instance, the maximum daily air temperature must be higher than or equal to all temperatures measured during the day. This check is carried out in the data acquisition form sheets. Each value is given a quality code after the check. The user decides whether the value is correct, estimates the correct value, or calculates the correct value externally. The third level of data verification uses extreme value tables. It is essential that the database stores a sufficient quantity of data for each station and each element: empirical probabilities must first be calculated from at least twenty years of data. A geographic information system (GIS) then displays symbols of various colours on a map, depending on data intervals formed by the empirical probability values. This makes it easy for the user to find data that might be incorrect and were not detected in the previous verification levels. Once this verification is over, the complete monthly record is given a quality tag confirming that the values for the observation month, element and time have been verified.
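The three verification levels can be illustrated with a minimal Python sketch. It is not taken from CLIDATA: the function names, the `Limits` type and the numeric limits used in the usage note are assumptions made for illustration only, and the third function shows just one simple way to obtain an empirical probability from a station's history.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Limits:
    """Hypothetical lower and upper limit of an element."""
    lower: float
    upper: float


def check_limits(value: float, limits: Limits) -> bool:
    """Level 1: preliminary range check, independent of station location."""
    return limits.lower <= value <= limits.upper


def check_daily_maximum(t_max: float, readings: Sequence[float]) -> bool:
    """Level 2 ("static" check): the daily maximum temperature must be
    higher than or equal to every temperature measured during the day."""
    return all(t_max >= reading for reading in readings)


def empirical_probability(value: float, history: Sequence[float]) -> float:
    """Level 3 (assumed form): share of historical values that do not
    exceed the new value. The history should cover at least twenty years."""
    if not history:
        raise ValueError("history must not be empty")
    return sum(1 for h in history if h <= value) / len(history)
```

For example, `check_daily_maximum(20.0, [12.0, 21.4, 16.3])` fails because one reading exceeds the stated maximum, and `check_limits(75.0, Limits(-60.0, 60.0))` fails against an assumed temperature range.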

Basic calculations of the climate values are included in the database application. The application is able to process the climate values and create various outputs. CLIDATA databases are used by the Czech Academy of Science, universities and other clients.

International co-operation between meteorological services all over the world is an important part of CLIDATA and a prerequisite for its further development. Currently, CLIDATA is used in 20 countries.

Key words:

climate data, database applications, data processing, CLIDATA

1 Introduction

Climatology has been becoming more and more important in view of the information about climate change (Jones et al., 1999; Jones, Moberg, 2003). It is impossible to learn the patterns and phenomena of the climate system without a detailed analysis of climate data for the period of instrumental measurements (e.g. Luterbacher et al., 2004). The behaviour of the climate system cannot be described unless reliable and accessible sets of climate elements are available over a long period. National meteorological services now operate measuring networks of meteorological stations with varying levels of equipment (synoptic, climate and rainfall measurement stations). The measuring stations provide a large quantity of up-to-date data. All these data create a climate record that is based on a long history of instrumental measurements of meteorological data. Boundary conditions of climate models are used for the creation of emission scenarios. Furthermore, analyses of the current and past climates and analyses of the impacts of climate change on individual industries (forestry and water management, agriculture, power generation) cannot exist without basic climate data (Rodó, Comín, 2003; Arnell, 2004; Battisti, 2004). Opinions have recently appeared that climate change results from an increasing variability of the climate (Storch et al., 1995) caused by changes in energy relations in the climate system. Reliable climate data are also essential for analysing the historical variability of the climate.

CLIDATA is a climate database application of Czech origin. It is a software extension to the standard ORACLE database system. International collaboration in meteorology and climatology within the World Meteorological Organisation ("WMO") is possible only if there is a consistent approach to measurement methods and to the delivery and basic processing of measured or observed data and information. The Czech Republic ranks among the developed European countries in meteorology and climatology. Such countries use specialized agencies or authorized organizations to carry out these activities. In the Czech Republic, the Ministry of Environment delegated its allowance organization, the Czech Hydrometeorological Institute ("ČHMÚ"), to perform basic data processing, to maintain a database and to prepare source information for the state administration. All these obligations are anchored in the deed of foundation of ČHMÚ.

The quantity of climate data keeps increasing, and database processing in climatology must be of a top level. The objective of this publication is to provide details about CLIDATA, the climate database application that was developed in ČHMÚ from 1996 to 2000 and that is still being extended. CLIDATA is used by Czech meteorological services as well as in other countries. In CLIDATA, climate data can be processed in a consistent and user-friendly way.

Chapter 2 of this publication gives a brief overview of the history of automatic processing of climate data. Chapter 3 briefly describes the history of CLIDATA and outlines prerequisites for its further development. Chapter 4 describes the start of CLIDATA and operating fundamentals. Chapter 5 is the most extensive chapter and provides a detailed description of individual parts of the application: geography of stations, data acquisition, data control, system administration and related products. Chapter 6 describes the current version of CLIDATA in the Czech Republic and in other countries, and Chapter 7 briefly draws conclusions about the future development of CLIDATA.

It should be pointed out that CLIDATA is not the work of one individual. This climate database application was developed thanks to the knowledge and expertise of several developers and authors. The basic idea, most assignments to the software developers and the testing from 1996 to 2003 can be regarded as original work of the author of this publication. In that period, the author co-operated closely with his colleagues, in particular with Dr. Luboš Coufal, who then headed the Climatology Department in ČHMÚ. Since 2004, only minor adjustments and debugging have been carried out. The author is no longer directly responsible for the operation of CLIDATA, but he is still involved in the preparation of individual changes and in the integration of CLIDATA into an operational application. The author did not develop any of the program code of the application. This work describes only the user parts and climate data parts of CLIDATA. Without the co-operation of many people, CLIDATA would never have been developed and could not be used in the day-to-day activities of ČHMÚ.

2 History of automated processing of climate data

In the Czech Republic, climate data are typically data measured and observed since instrumental measurements started. It is well known that the first regular measurements in the Czech Republic started in 1775 in Klementinum in Prague (Seydl, 1963). Information from the Klementinum station ranks among the oldest records kept in ČHMÚ’s database. As more information has become available for database processing, the methods for data storage and general processing have changed. Figure 1 shows a considerable increase in the number of stations that are now kept in CLIDATA. All data since 1961 are stored in the database. In 2002, more than 1,000 stations (incl. experimental and foreign ones) were used to provide the data necessary for the assessment of the disastrous floods of 2002 in the Czech Republic (Kubát et al., 2003). From 1775 to 1849, only the data from the Klementinum station in Prague were available. Only then did the number of meteorological stations start increasing. Before 1978, only paper reports were available for interval and daily observation data. Each month, general data processing was carried out (Fig. 2). Monthly values were published in annual bulletins from 1916 to 1977 for the climate stations and from 1936 to 1978 for the rainfall stations (Krška, Šamaj, 2001).

Fig. 1: Number of stations stored in CLIDATA as of 1 January 2007

In the 1970s, a gradual migration of interval data and daily data started so that the data could be processed by computers. Though the purpose of this work is not to cover the history of computers, it should be mentioned that the climate data were stored on punch cards, punch tapes, magnetic tapes, floppy disks and external magnetic tape units. In 1993, climate experts started storing the data in the first database, CLICOM, on standalone computers. So far, backup methods have always developed faster than the data could be backed up. For instance, before all data had been stored on punch cards and punch tapes, the data backup method changed. New data were then backed up on magnetic tapes, and existing data had to be transferred from the old media. Therefore, historic data and up-to-date data were processed separately in ČHMÚ until 2000. The standard processing of up-to-date data on a monthly basis started in 1983. The data were stored on a magnetic medium used by a computer in Praha-Komořany, where ČHMÚ has its headquarters (it was an EC 1055 mainframe computer until 1993). At the same time, historic data from 1961 onwards were collected from some stations.

Fig. 2: Monthly statement of meteorological measurements and observations (P2TURN01 station, June 2006)

In 1992, a decision was made to stop using the EC 1055 and to introduce CLICOM for the processing of up-to-date as well as historic data (CLICOM Project, 1989). From then on, the data were no longer available only in the central computer department of ČHMÚ in Prague, but became accessible from all PCs in the individual departments of ČHMÚ. CLICOM was the first database application to be used regularly by climatology experts in the Czech Republic. Data were stored in DataEase, a database application developed by U.S. NOAA in 1983. Installations became popular all over the world, in particular in developing countries (Report of the Meeting, 1999; Meeting of the WMO, 2000).

For ČHMÚ, this system represented enormous progress. However, it soon became clear that CLICOM was rather a restriction on the further development of climate data processing.

3 History of CLIDATA

The main disadvantage of CLICOM was a lack of co-operation among installations in the computer network. Data were transferred between workstations in the headquarters and the branch offices on diskettes or through the first e-mail system. Consequently, the work had to be organized carefully, and there was still a danger that ČHMÚ would provide non-identical data to some external users. ČHMÚ staff tried to modify CLICOM so that it would meet their needs as well as possible. Some components were added, the user environment was translated into Czech and the necessary manuals were drafted (Coufal, Tolasz, 1994; Coufal et al., 1996). ČHMÚ experts became recognized in WMO for their knowledge of the system. With the financial aid of the Czech Republic, CLICOM was installed in the meteorological services of Macedonia and Moldavia.

In 1996, ČHMÚ decided to develop its own application in accordance with recommendations submitted by various WMO expert groups (Coufal, Tolasz, 1998; Report of the Climate Database Management Systems Evaluation Workshop, 2002). From the very beginning it was essential to define clearly the basic rules that would apply until 2000, when the application was put into operation. The key principles were as follows:

  • The application will be built on the ORACLE database system.
  • ČHMÚ will not be involved in the programming.
  • One person only will co-ordinate the works on behalf of ČHMÚ.
  • If necessary applications exist (e.g. a table processor, statistics package or mapping software), they will be used and will not be developed.
  • The ORACLE database and ORACLE tools will be used preferentially to develop the program code.
  • The user environment will exist in various language versions that can be modified and customised.
  • The database and programming part of the application will be in English only.
  • The application will be developed gradually, by individual logical parts.

3.1. ORACLE database system

The decision to use ORACLE was influenced considerably by WMO’s recommendations, and it proved to be a good choice. It was decided to develop the application in the ČHMÚ branch office in Ostrava. It was necessary to find a programming company experienced in ORACLE database programming. The company had to be big enough not to depend on a single employee and had to be able to develop the application step by step. Those conditions were met by ATACO, s.r.o., which has been developing database applications for the management of technology processes for a long time. ATACO became a reliable partner for ČHMÚ not only for the development of CLIDATA, but also for its operation and installation in other countries.

3.2. Programming works

During the development of CLIDATA, ČHMÚ provided specialized advice and prepared assignments for the individual parts of the application. ČHMÚ, however, was not involved in the programming. ATACO provided the software developers and guaranteed that the development would continue at the same level even in case of personnel changes. ATACO now also provides consulting on database issues for the Meteorology and Climatology Department in ČHMÚ.

3.3. CLIDATA development coordinator

ČHMÚ created a specialised department in Ostrava responsible for the climate database. The department was delegated to co-ordinate the development work and to stay in contact with ATACO. The author of this publication headed the department from 1996 to 2003. To ensure that the author could be substituted on the part of ČHMÚ, he was deputised by his methodology superior, the head of the Climatology Department of ČHMÚ in Prague (a rather untypical arrangement).

3.4. Restrictions in CLIDATA development

Products that can be replaced with existing software have not been developed. The application can be accessed through SQL (Structured Query Language) or ODBC (Open Database Connectivity). This means that the data stored in CLIDATA can be processed by means of various statistics packages, GIS applications or MS Office applications (typically MS Excel). It was clear that the users would require table outputs whose definitions would always depend on the user and his current requirements. Even an experienced climatologist with many years of practice cannot define unambiguously all the types of outputs that are normally needed in practice. Therefore, a user environment in Oracle Discoverer (Fig. 3) is an integral part of the application. In this environment, it is possible to process the climatological data outside the CLIDATA user interface, and users need only a low level of authorization (see Chapter 4.1). Special applications were also developed for map outputs in CLIDATA-GIS (Tolasz, Stříž, 2003). Even users without knowledge of GIS can create map outputs based on the climate data stored in CLIDATA. Throughout the development and use of CLIDATA, many other applications were developed in ČHMÚ. They now extend the original capabilities and functionalities of the system.
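The idea of reading the stored data from an external tool through SQL can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database from the Python standard library stands in for the Oracle server, the table `monthly_data` and its columns are hypothetical and do not reproduce the real CLIDATA schema, and the numeric values are invented sample numbers.

```python
import sqlite3

# In-memory SQLite stands in for the Oracle server (assumption).
# Table and column names are hypothetical, values are invented samples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monthly_data ("
    "station TEXT, element TEXT, year INTEGER, month INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO monthly_data VALUES (?, ?, ?, ?, ?)",
    [
        ("U1MIL001", "SRA", 2005, 1, 31.2),
        ("U1MIL001", "SRA", 2005, 2, 44.8),
        ("P2TURN01", "SRA", 2005, 2, 52.1),
    ],
)


def monthly_values(connection, station, element):
    """Return (year, month, value) rows for one station and one element."""
    cursor = connection.execute(
        "SELECT year, month, value FROM monthly_data "
        "WHERE station = ? AND element = ? ORDER BY year, month",
        (station, element),
    )
    return cursor.fetchall()


result = monthly_values(conn, "U1MIL001", "SRA")
```

A statistics package or spreadsheet connected through ODBC would issue a comparable query; only the connection layer differs.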

Fig. 3: Oracle Discoverer environment

3.5. Programming code

In order to streamline changes in the individual application versions and the source database, it was decided to develop in ORACLE only. The source database versions changed gradually from the start of development until trial operation: from Oracle 8, through Oracle 9 and 9i, to Oracle 10g, which is being used now. The basic parts of the application did not face any problems when upgraded. Only a replication extension had to be modified after the first change of the source database version. The same programming code of the application now runs smoothly in databases installed under various operating systems (Solaris, Linux, Windows).

There is only one place in CLIDATA where this principle was not observed. One of the assignments required that CLIDATA be able to take over all basic data and metadata from CLICOM. This part of the application was developed not in ORACLE but in C++, and was integrated as a special programming unit into the CLIDATA structure. Later on it became evident that this part of the application gave CLIDATA an advantage over other successors of the previous application, CLICOM (CLICOM Project, 1989).

3.6. Language versions

At the beginning it was decided to develop an application that would be able to replace CLICOM in various countries. Therefore, the application can switch between language versions (Czech, English, French, German and Spanish), and each user can also prepare a version in another language. This means that the application can be operated in all ORACLE-supported languages as well as in the application system environment (Windows, Unix or Linux). In each language version, CLIDATA communicates with the user in the specific language and defines the description texts for form sheets, tables and help. This eliminates foreign-language problems for standard users. The programming code, however, exists in English only.

3.7. Assigning of specific development tasks

In the early stages, the co-operation between ČHMÚ and ATACO was marked by two big issues: the climate experts lacked knowledge of relational databases and Oracle, and the program developers lacked knowledge of climatology. Therefore it was decided to assign development tasks step by step instead of contracting a complete project. This approach was rather untypical, but both parties were able to gradually gain the necessary expertise (both database and climatology knowledge).

3.8. Putting into operation

CLIDATA was put into regular operation in January 2000 in the ČHMÚ branch office in Ostrava. Within several weeks, CLIDATA was introduced in all branch offices. Users were trained in entering and correcting new data in the standard monthly mode. Further training was provided to users who processed the data and used them for drafting opinions and other routine work in ČHMÚ. All historical data from CLICOM (the previous application) were migrated into CLIDATA well in advance, and user manuals were drafted (Coufal, Tolasz, 1999). In the first half of 2000, several user training sessions were organized jointly for all ČHMÚ branch offices. Then, the users were trained directly in their departments. The transfer from CLICOM data processing to CLIDATA data processing was not expected to be easy, but it was completed smoothly. The data in the DataEase CLICOM format were archived as a safe back-up and were used only once, after an unauthorised intervention in some data structures at the end of 2006. The success of the transfer to CLIDATA was confirmed by installations of CLIDATA in other countries (Ghana and Macedonia) in 2000.

3.9. Basic access to CLIDATA structures

From the users' point of view, CLIDATA is divided into several basic parts that provide the users with various levels of services. Only some users have access to all parts of CLIDATA. Typically, users are authorised to work in some parts of CLIDATA only and to use certain database rights only.

3.10. User rights

The core of CLIDATA is the database in the Oracle database server (Fig. 4). This part defines the tables, the table relations and the programming code of the application. All climate data including metadata are stored in the tables. The user must enter his or her username and password to access the data. The user rights are read, hide, insert, update and delete, and they apply to individual parts of the application as well as to individual parts of the tables.

  • The read right enables the user to display and export data stored in the tables, provided that the user has the necessary authorisation and the data are not hidden. This is the lowest level of access authorisation. It is, however, sufficient for the routine work of a climatologist.
  • The hide right is automatically assigned to each user who has not logged in. If the user does not log in, logs in incorrectly, or logs in before the rights have been defined, the contents of all tables are hidden.
  • The insert right enables the user to insert new records in the tables.
  • The update right enables existing entries in the tables to be updated or corrected.
  • The delete right is for users who will delete entries in the tables.

Fig. 4: Structure of the application

By combining several of the rights above, each user can be given the group of authorisations he needs to work in the database. For instance, if the user enters new information into the database or corrects existing data, the required rights for the basic data tables are read, insert, update and delete. It is, however, possible to restrict the rights to certain parts of a table only. For instance, the widely used read right for the monthly data tables can be restricted to a certain element or to certain stations only. A user with such restrictions can, for instance, process the monthly rainfall aggregates for the stations within the competence of ČHMÚ in Ústí nad Labem, while other information in the table will not exist for that user.
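The combination of rights and row-level restrictions described above can be modelled with a short Python sketch. The `Grant` and `User` classes and the `is_allowed` function are hypothetical and only illustrate the logic; in CLIDATA itself the rights are enforced by the database. Treating U1MIL001 as a station of the Ústí nad Labem branch is likewise an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Grant:
    """Rights on one table, optionally restricted to elements or stations."""
    table: str
    rights: Set[str]                     # subset of read/insert/update/delete
    elements: Optional[Set[str]] = None  # None means no element restriction
    stations: Optional[Set[str]] = None  # None means no station restriction


@dataclass
class User:
    name: str
    logged_in: bool = False
    grants: List[Grant] = field(default_factory=list)


def is_allowed(user: User, right: str, table: str,
               element: str, station: str) -> bool:
    """Return True if the user may apply `right` to the given record."""
    if not user.logged_in:
        return False  # "hide": everything is hidden without a valid login
    for grant in user.grants:
        if grant.table != table or right not in grant.rights:
            continue
        if grant.elements is not None and element not in grant.elements:
            continue
        if grant.stations is not None and station not in grant.stations:
            continue
        return True
    return False


# Read-only access to monthly rainfall of one (assumed) Ústí station.
analyst = User("analyst", logged_in=True, grants=[
    Grant("MDATA", {"read"}, elements={"SRA"}, stations={"U1MIL001"}),
])
```

With this grant, `is_allowed(analyst, "read", "MDATA", "SRA", "U1MIL001")` succeeds, while the same request for another station or for the update right is refused.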

3.11. Starting CLIDATA

The user can access CLIDATA data within CLIDATA or outside the regular CLIDATA interface. The most frequent tool, however, is CLIDATA itself (Fig. 5) or iCLIDATA, an Internet version of CLIDATA (Fig. 6).

Fig. 5: CLIDATA logging in

Fig. 6: iCLIDATA logging in

CLIDATA must be installed on the user's PC, and the network environment must enable the user to work with the CLIDATA database instance on the database server. If iCLIDATA is used, an access name and password are needed, and the Oracle Application Server running in ČHMÚ’s network is accessed through a web client. Users with at least the read right can use Oracle Discoverer, a standard application developed by Oracle (Fig. 7).

Fig. 7: Oracle Discoverer logging in

The environment of this application looks like a standard table processor (e.g. MS Excel, Fig. 3). However, it uses the contents of the database tables on the database server by means of pre-defined database views. Higher-level users can access the database tables directly through an SQL client (Fig. 8).

Fig. 8: SQL logging in

There is also a special application that can be used for processing the data in the database table: CLIDATA GIS. This application enables the users to create map outputs in ArcView products developed by ESRI.

3.12. Entering data in CLIDATA

To process the data in CLIDATA, it is essential to have general knowledge of climatology and of the structure of the data stored in the database tables. The principle of CLIDATA and its relational basis can be explained with a relatively simple example in which data are entered for the daily rainfall aggregate from the Milešovka station in February 2005. New data cannot be entered into the database tables unless a basic description has been created.

3.12.1. Definition of an element

First, it is necessary to define an element. Otherwise it is impossible to enter the daily rainfall aggregates into the database. In this case, the element is SRA (Rainfall Aggregates in Interval) with the definition set forth in Fig. 9.

Fig. 9: Definition of the SRA element.

The definition of the element includes the abbreviation (SRA), name (Rainfall) and description (Rainfall Aggregates). Each element is assigned a unit that is selected from an existing list of units (Fig. 10). A lower or upper limit, that is, a physical range, can also be defined for the element. The physical range cannot be exceeded. This means that it is impossible to enter into the database a value that is higher than the upper limit or lower than the lower limit.
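A minimal sketch of such an element definition with its physical range might look as follows. The `Element` class is hypothetical, and the unit "mm" and the limits 0.0 and 500.0 are assumptions chosen for illustration; the real values are those shown in the definition form sheet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    """Hypothetical model of an element definition."""
    abbreviation: str
    name: str
    description: str
    unit: str
    lower_limit: Optional[float] = None  # None means no lower limit
    upper_limit: Optional[float] = None  # None means no upper limit

    def accepts(self, value: float) -> bool:
        """Return True if the value lies within the physical range."""
        if self.lower_limit is not None and value < self.lower_limit:
            return False
        if self.upper_limit is not None and value > self.upper_limit:
            return False
        return True


# Abbreviation, name and description follow the text;
# the unit and both limits are assumed for this example.
SRA = Element("SRA", "Rainfall", "Rainfall Aggregates",
              unit="mm", lower_limit=0.0, upper_limit=500.0)
```

A value such as `SRA.accepts(-1.0)` would be rejected before it reaches the data tables, which mirrors the rule that the physical range cannot be exceeded.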

Fig. 10: List of units

In the right part of the definition form sheet, calculation methods are defined for EDATA (information about element extremes, Table 1), NDATA (calculation of normal values of the element, Table 2) and MDATA (pentade, decade and monthly data, Table 3). This example shows the monthly, decade and pentade maxima and aggregates in MDATA, irregularities in EDATA and the normal sum in NDATA. The lower part of the definition form sheet is used for the calculation of MDATA_COUNT (the number of days in a month when a limit was exceeded or was not reached, Table 4). It is also possible to define a conversion table for the import of data from CLICOM.

Table 1: Contents of EDATA

3.12.2. Definition of the station

In order to define the Milešovka station it is necessary to prepare and fill in several supporting tables. The definition form sheet (Fig. 11) contains fields for which a list of possible values should exist (Country, River Basin and District). The definition of the Country includes, among other things, the definition of the time valid in the country. That time is used for saving the measurement results (Fig. 12).

Fig. 11: Definition of the station in Milešovka

Fig. 12: Definition of the country for stored data

The unique ID of the station is an indicative (U1MIL001) combined with an observation interval (from 1 August 1997 until now). Next, the name of the station, its geographical coordinates and its altitude are filled in. Other records are not mandatory, but they make the climatologist’s future work with the data easier. If any parameter of the station changes considerably, the entry must be closed: the current end of observation is entered and a new entry is made for the station with the same indicative, but with a new observation interval. It is possible to enter 21.12.9999 as the end of observation. This indicates that the station still exists.
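The rule for closing a station entry and opening a new one can be sketched in Python. The `StationEntry` class and the `change_station` function are hypothetical. The open-ended date is taken literally from the text, the altitude values are invented, and ending the old entry on the day before the change is an assumption about how the two intervals meet.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta

# Open-ended "end of observation" as quoted in the text (21.12.9999).
OPEN_END = date(9999, 12, 21)


@dataclass(frozen=True)
class StationEntry:
    """One entry, identified by the indicative plus observation interval."""
    indicative: str
    begin: date
    end: date
    name: str
    altitude_m: float


def change_station(current, change_date, **changes):
    """Close the current entry and open a new one with changed parameters.

    Assumption: the old entry ends on the day before the change.
    """
    if not current.begin < change_date <= current.end:
        raise ValueError("change date must lie inside the current interval")
    closed = replace(current, end=change_date - timedelta(days=1))
    opened = replace(current, begin=change_date, **changes)
    return closed, opened


# Invented altitude values, for illustration only.
entry = StationEntry("U1MIL001", date(1997, 8, 1), OPEN_END,
                     "Milešovka", 830.0)
closed, opened = change_station(entry, date(2005, 3, 1), altitude_m=833.0)
```

Both resulting entries keep the indicative U1MIL001 and differ only in their observation intervals and in the changed parameter.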

3.12.3. Definition of station elements

Each definition of the station contains a list of measured and observed elements (that is, the relations between the definition of the station and that of an element). The form sheet used for the definition of station elements is shown as a table in Fig. 13.

Fig. 13: Definition of station elements

Some tables need to be prepared in advance before the filling in of the form sheet starts. The definitions include the element as well as observation schemes, if any, and observation intervals. The basic data in CLIDATA are generally divided into two independent tables. Nonregular data that are measured at preset climate times during a day (for instance at 7 a.m., 2 p.m. and 9 p.m. of the mean local solar time) are governed by the observation schemes and are entered in RDATA_N (table 5). Regular data that are measured at regular intervals during a day (for instance, every hour or every 15 minutes) are governed by the measurement interval and are entered in RDATA_R (table 5). For SRA, the observation scheme 2 has been defined with the measurement time 07:00. This means that a daily rainfall aggregate measured at 7 a.m. is saved in RDATA_N. The definition of the element in the station also defines a measuring instrument and a method used for the calculation of some daily values or derived characteristics.

3.12.4. Definition of a data acquisition form sheet

If the daily rainfall values are to be entered manually in the database, it is necessary to prepare a data acquisition form sheet. The system allows the data acquisition form sheets to be customised, so that they match the data acquisition statement as closely as possible.

Fig. 14: Definition of the data acquisition form sheet

The definition of the form sheet is given in Fig. 14, while Fig. 15 shows a selection of the elements in the form sheet. The user can define not only the appearance of the form sheet (such as elements or a calendar on a line or in a column), but also checks of column sums and line sums.

Fig. 15: Adding of elements in the data acquisition form sheet

3.12.5. Entering of data

Manual entry of data into the database from non-automated climate and rainfall measuring stations is still rather widespread. However, more and more climate data are acquired directly in the station (by an automatic station or by an observer) and are then imported into the database. In the manual data entry form sheet (Fig. 16) it is possible to change the data orientation (by lines or by columns) and to fill in the last acquired value in the remaining part of the line/column.

The steps that need to be carried out before the climate data are entered into the database show that the application maintains the integrity of specific pieces of information in the database. At the beginning of the work this can cause certain problems for the user, but in the end the climate data are well arranged and easy to survey. On the one hand, it is clear from the example that CLIDATA is rather flexible. On the other hand, administration of the application is rather demanding (see Chapter 5.4).

Fig. 16: Using the data acquisition form sheet

4 Description of basic parts of CLIDATA

CLIDATA is divided into several logical parts for the users. Depending on the level of authorisation, the user uses one or several parts of the application. Each well designed database application must contain general information about the stored data, that is, the metadata. In CLIDATA the metadata describe mostly measuring locations (geography of stations) and, in the system part of the application, definitions of parameters relating to the measurement, storing and general data processing. Climate data in CLIDATA are accessible by various tools (Chapter 4.2). More details about the data processing are given in the following chapters. Particular attention is also paid to the acquisition and verification of data. The system part of CLIDATA provides access to user tools used for the definition of data import, for data administration, for verification of certain automatic actions (for instance, regular querying of the database and sending of results to preset e-mail addresses, or the definition of an interval for checking of import directories) and for data replication in ČHMÚ’s network environment.

4.1. Geography of the station

The geography of the station in the application is a complete database description of measuring locations, measured elements and the methods used for the calculation of characteristics and daily values. This part of CLIDATA also includes user tools for processing of records in the tables GEOGRAPHY (table 6) and ST_ELEMENTS (table 7). Figure 17 shows a basic form sheet for the station geography. The body of the form sheet consists of basic fields. On the right-hand side of the form sheet, there are options for the form sheet and tools for opening other form sheets.

Fig. 17: Station geography form sheet

A record in the GEOGRAPHY table is uniquely identified by the combination of the following items: Station (GH_ID) and the dates when the observation starts (BEGIN_DATE) and ends (END_DATE). This means that several records typically exist for one station in the GEOGRAPHY table. In the history of measurements, the locations of stations often changed within towns and villages. In that case, the following parameters changed: geographical co-ordinates (which can now be measured with an accuracy of degrees, minutes and seconds with a GPS), altitude, details of the station or characteristics of the measuring programme (STATION_TYPE) (Lipina et al., 2000). For an example of a "moving" station see Table 8.

ČHMÚ has been applying this international indicative in all climate stations. For creation of a hydrological indicative, numbers of a hydrology sequence have been used. It is also used in other countries for measurements on floating buoys and ships. A deviation indicates a real time of the climatologic period, while the data in the database are stored in an agreed time (for instance, 7, 14 and 21 hours). For some calculations (e.g. psychrometer-based humidity) and for estimation of certain data, reference stations are employed. The table is designed for data that occur irregularly, for instance 9SpSpspsp in the SYNOP report. For some elements, missing values are filled in automatically by a multinominal function and boundary conditions.

The station in Jeseník has been chosen deliberately as an example of a station that moved very often. When processing the climate data, particular attention should be paid to such records. Only then will complete information be available to data users. Furthermore, this station is located in a town that changed its administrative classification (in 1992, a part of the Šumperk District split off to form the Jeseník District). It was then decided to assign the entire measurement history to the new district – Jeseník.

The meaning of individual items is clear from the explanation of the GEOGRAPHY table in Table 6. Other items should be included into the description of the station. Those items are available to users in specific tables only.

On the right-hand side of the form sheet (Fig. 17), there are control tools that can be used for the currently displayed record. Chapter 4.3 clarifies briefly the principle of a relational database: the data cannot be stored unless they are described perfectly. This inevitably results in certain operating complications. Changes can occur in a station that is currently measuring, and a new record in the geography then needs to be created. Parameters GH_ID, BEGIN_DATE, END_DATE define the station as follows:

O1CERV01, 1.1.1957, 31.12.9999

Now an adjustment is necessary so that two records could exist in the database:

O1CERV01, 1.1.1957, 14.8.2005
O1CERV01, 15.8.2005, 31.12.9999

Fig. 19: A map cutout with the location of the station

A reason for such a change might be that a forest near the station was cut down, considerably changing the surroundings of the station. Data from the station are imported continuously. By the time the change is made in the database, the observation can no longer be closed as of 14 August 2005, because the database already includes data from 15 August 2005 at least. The record could have been closed only within the first hour after midnight CET, before data with the new date were imported. Alternatively, the import could have been stopped and all the imported data placed into supporting tables. It is also essential to keep in mind that data availability is of key importance, and that similar changes in descriptive tables are typically reflected in the database with a monthly delay. Therefore, supporting sub-programmes were created. They make it possible to close the observation and create a reliable record for the station irrespective of the dates. The database sub-programmes are named Interruption and Merging and enable the data integrity to be kept.
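The split of the O1CERV01 record described above can be sketched in a few lines of Python. This is only an illustration of the idea of closing one observation interval and opening the next one on the following day; it is not the code of the Interruption sub-programme. The class GeographyRecord and the function interrupt are hypothetical names, and only the three key items GH_ID, BEGIN_DATE and END_DATE are modelled.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta

# End date used in the text to mark a station that still exists.
OPEN_END = date(9999, 12, 31)


@dataclass(frozen=True)
class GeographyRecord:
    """Hypothetical, reduced model of one GEOGRAPHY record (key items only)."""
    gh_id: str
    begin_date: date
    end_date: date


def interrupt(record: GeographyRecord, new_begin: date):
    """Split one record into two adjoining records; the second starts on new_begin."""
    if not (record.begin_date < new_begin <= record.end_date):
        raise ValueError("split date must fall inside the observation interval")
    closed = replace(record, end_date=new_begin - timedelta(days=1))
    opened = replace(record, begin_date=new_begin)
    return closed, opened


original = GeographyRecord("O1CERV01", date(1957, 1, 1), OPEN_END)
closed, opened = interrupt(original, date(2005, 8, 15))
print(closed)   # interval 1.1.1957 to 14.8.2005
print(opened)   # interval 15.8.2005 to 31.12.9999
```

In a real database the dependent data rows would have to be reassigned to the new record in the same transaction; that step is deliberately left out of the sketch.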

Fig. 20: Heliographic horizon of the station

4.2. Description of the station observation

In the Station Geography form sheet, it is possible to access a related form sheet named Description of Station Observation (Fig. 22). This form sheet is created for the ST_ELEMENTS table (table 7). It lists all elements measured in the station. The combination of the station and measuring period is unique. An example of changes in temperature measurements (element T) because of an automatic control shows that the database stores general information about the change in the measuring method. Not only the measuring instrument but also information about the observation schedule is important for the description of the observation. Chapter 4.3.3 describes the division of the basic data into RDATA_N and RDATA_R (table 5). Fig. 23 shows a list of climate schemes of the observation schedule for RDATA_N. The definition of applied times is absolutely free and can be adjusted to specific climate practice. For RDATA_R it is necessary to define a measuring interval that regularly fills an entire day or a part of the day. ČHMÚ typically uses the following intervals: 10 and 15 minutes, 1, 3, 6, 12 and 24 hours (Fig. 24). Most items in the table use 15 minute and 1 hour intervals. The other intervals apply to the calculated elements, e.g. rainfall aggregates for 3 and 6 hours based on hourly rainfall aggregates, storing of “synoptic” rainfalls every 12 hours with a variable measuring interval, or 24 hours for some “daily” characteristics mentioned in SYNOP (SYNOP is a report on ground level meteorological observations from a ground level station, 2003).

Fig. 21: Example of a definition of the station surroundings

The second part of the form sheet includes information about the calculation and storing of some calculated elements (for details see Chapter 5.4.4). It is also possible there to assign historical units to some elements; these units are related to the currently used unit by a definition in a special table (Fig. 25). For instance, a wind speed in Beaufort degrees (Beaufort -> m.s-1), a wind direction marked with an abbreviation instead of degrees (such as NE -> 45°) or air pressure in torr (torr -> hPa) can be converted. The defined conversions can be used for the collection of data from historic statements or for the preparation of data for some specific purposes.
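The three conversions mentioned above can be illustrated as follows. The actual conversion table of CLIDATA (Fig. 25) is not reproduced here, so the functions below are assumptions: the pressure factor follows from 760 torr = 1013.25 hPa, the compass abbreviations are mapped on a 16-point rose with N taken as 0° (a service may prefer 360° for north), and the Beaufort conversion uses the common empirical relation v = 0.836 · B^(3/2), which yields a representative speed rather than the bounds of the Beaufort class. All function names are hypothetical.

```python
# 760 torr correspond to one standard atmosphere, i.e. 1013.25 hPa.
TORR_TO_HPA = 1013.25 / 760.0

# 16-point compass rose, clockwise from north in steps of 22.5 degrees.
COMPASS_16 = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def torr_to_hpa(pressure_torr: float) -> float:
    """Convert air pressure from torr to hPa."""
    return pressure_torr * TORR_TO_HPA


def compass_to_degrees(abbreviation: str) -> float:
    """Convert a wind direction abbreviation (e.g. 'NE') to degrees."""
    return COMPASS_16.index(abbreviation.upper()) * 22.5


def beaufort_to_ms(beaufort: int) -> float:
    """Representative wind speed in m/s for a Beaufort degree (empirical relation)."""
    return 0.836 * beaufort ** 1.5


print(compass_to_degrees("NE"))        # 45.0
print(round(torr_to_hpa(750.0), 1))    # pressure in hPa
print(round(beaufort_to_ms(6), 1))     # wind speed in m/s
```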

Fig. 22: Description of the station observation

Fig. 23: List of observation schemes

Fig. 24: Measurement intervals

Fig. 25: Historic units

4.3. Data

Though this chapter describes in particular the calculation of the estimated data used typically in climatology, it is first necessary to provide general information about the structure of the two identical basic data tables, RDATA_N and RDATA_R. The estimated data are pentade, decade and monthly data in MDATA (table 3), long-term averages and extremes in EDATA (table 1) and normals in NDATA (table 2). The other basic data are one-minute rainfall aggregates in INTENSITY_RAINFALL (table 9) and information about the occurrence and duration of meteorological phenomena in MET_PHENOMENA (table 10), which is used as a source for MDATA_PHENOMENA (table 11). The latter is a derived table that contains the number of days per month when a phenomenon or phenomena occurred. It was very difficult to persuade database experts and CLIDATA developers that the estimated data should be stored permanently in the database. From their point of view, this was just a waste of processor time and an unnecessary use of disk capacity, since all those data could be calculated only when a user requires them.

The experience from the climate practice shows however that a typical user needs the estimated data more frequently and in bigger time/geographic volumes than the basic data.

Fig. 26: Combination of the daily cloudiness and sunshine (Ostrava, Mošnov, 15 January 2006)

4.3.1. Basic nonregular and regular data

CLIDATA includes two independent tables with an identical structure (RDATA_N and RDATA_R in table 5). Users can work with nonregular data (N) and regular data (R) separately. Practical operation has shown that this division works very well, as the data are used in quite different ways: the regular data are used for operative meteorology, while the nonregular data are used in standard climate practice. It is also possible to make calculations using the RDATA_R data and to store the resulting values in RDATA_N. The original intention was to use RDATA_R only to archive data that are not used much in climatology, as operative meteorology lacks tools for long-term data archiving. As a result, the climatologists have been given a tool that makes it possible to view the climate in a completely different way, as a long-term average state of the atmosphere. In CLIDATA it is now rather easy to process a long-term daily course of individual elements (Fig. 26) or to calculate average daily characteristics from 15 minute data, either for days defined with a midnight-to-midnight interval or for "climatologic" days beginning with the morning climatology time, that is to say 7 a.m. local central time (Fig. 27).
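The difference between a calendar day and a "climatologic" day can be shown on a synthetic 15 minute series. The sketch below is only an illustration, not the CLIDATA calculation: the function daily_mean is a hypothetical name, the series is artificial (0.0 on the first day, 1.0 on the second), and the climatologic day is assumed to run from 7 a.m. of the given date to 7 a.m. of the next date.

```python
from datetime import datetime, timedelta
from statistics import mean


def daily_mean(samples, day_start):
    """Mean of all samples falling into the 24 hours starting at day_start."""
    day_end = day_start + timedelta(hours=24)
    values = [value for time, value in samples if day_start <= time < day_end]
    return mean(values) if values else None


# Synthetic 15 minute series covering two days (96 samples per day):
# every sample of 1 October is 0.0, every sample of 2 October is 1.0.
first = datetime(2006, 10, 1, 0, 0)
samples = [(first + timedelta(minutes=15 * i), float(i // 96)) for i in range(192)]

calendar_day = daily_mean(samples, datetime(2006, 10, 1, 0, 0))
climatologic_day = daily_mean(samples, datetime(2006, 10, 1, 7, 0))

print(calendar_day)       # only samples of 1 October
print(climatologic_day)   # 68 samples of 1 October and 28 samples of 2 October
```

The same measurements therefore give two different daily averages, depending only on where the boundary of the day is placed.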

Fig. 27: An average daily temperature of a calendar and climatologic days (Doksany, October 2006)

Table 5 shows that a month is used as the basis of the data model for both basic data tables. The combination of “the station, element, year, month and time” (EG_GH_ID, EG_EL_ABBREVIATION, YEAR, MONTH, TIME) is a unique key. The values are defined in the line by the items VAL01 through VAL31. This model was used in CLICOM, and the previous application proved that it was a good choice for the users. It is however not so good for programming and complicates the calculation of estimated data that need to be prepared in advance for the users. At the beginning of the development, the data model was discussed at length. One option considered was an element model where the unique key would be a combination of "the station, year, month, day and time" (EG_GH_ID, YEAR, MONTH, DAY, TIME) and the values of the elements would be defined in a preset order (for instance, the maximum temperature, minimum temperature, temperature, wind speed, wind direction, rainfall aggregate, height of new snow, total height of snow...). As the climate stations differ considerably, such a model either contains many empty fields that load the computing capacity unnecessarily, or requires several different tables to be defined, which makes the application difficult to control (each type of station would have a special data structure and the user would access the temperatures in the synoptic, climatologic and agro-meteorological stations in different ways). The last option is a full data model where the unique key consists of "the station, element, date, time and value" (EG_GH_ID, EG_EL_ABBREVIATION, DATE, VALUE). Such a model is easy to control both for the users and for software developers. It is, however, very demanding in terms of computer capacity (the data key is much longer than the data information).
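The relation between the monthly model and the full model can be illustrated by expanding one monthly line into individual values. The key items EG_GH_ID, EG_EL_ABBREVIATION, YEAR, MONTH, TIME and the value items VAL01 through VAL31 are taken from the text; the dictionary representation, the function unpivot and the sample values are assumptions made for this sketch only.

```python
import calendar


def unpivot(row: dict) -> list:
    """Expand one monthly line (VAL01..VAL31) into one tuple per stored value."""
    days_in_month = calendar.monthrange(row["YEAR"], row["MONTH"])[1]
    result = []
    for day in range(1, days_in_month + 1):
        value = row.get(f"VAL{day:02d}")
        if value is not None:  # empty days produce no line in the full model
            result.append((row["EG_GH_ID"], row["EG_EL_ABBREVIATION"],
                           row["YEAR"], row["MONTH"], day, row["TIME"], value))
    return result


# One line of the monthly model: the key is stored once for up to 31 values.
monthly_row = {
    "EG_GH_ID": "U1MIL001", "EG_EL_ABBREVIATION": "SRA",
    "YEAR": 2006, "MONTH": 2, "TIME": "07:00",
    "VAL01": 0.0, "VAL02": 3.4, "VAL28": 1.2,
}

for line in unpivot(monthly_row):
    print(line)  # in the full model the whole key is repeated with every value
```

The printed lines make the trade-off visible: the full model repeats the complete key for each value, while the monthly model stores it once per line.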

Data are either imported into the basic tables (Chapter 5.4.3) or filled in using the data acquisition form sheet (Chapter 5.3). The data to be imported are either up-to-date data with the shortest possible interval, ranging from 15 minutes to 1 day (for operative meteorology), or historic data transferred from other software environments. Manual entry is used for up-to-date data, entered in monthly cycles, from stations that have not been automated yet, and for historic data (if they can be entered directly into CLIDATA).

Each change in the basic data tables (a value being added, edited or deleted) is checked immediately. If any calculations are defined for that element in the description of the station observation (Chapter 5.1.1) or in the definition of the elements, the system enters the information into supporting tables. The supporting tables are processed regularly, beginning with the 1st minute. This means that the information in the database is always complete (for instance, once the air temperature is saved for the three climate times, the average daily temperature will be calculated within 5 minutes, or if the psychrometer value needed for the calculation of the air humidity changes, the result will be calculated again). In the ČHMÚ database environment, the changes are entered in the replication tables and transferred, at a preset interval, in a star-shaped way among the individual database servers (Fig. 28).

Fig. 28: Star-shaped replications of basic data in ČHMÚ

The values in the basic data tables can include flags (FIxx as a data flag or FIIxx as a quality flag). Data flag options are defined in a special table (Fig. 29). Those options provide more details about results of the measurement.

Fig. 29: Data flags of the values in the basic tables

For instance, the flag N attached to a total snow cover height of 0 cm means that a discontinuous snow cover was recorded. Similarly, the flag L for the wet-bulb thermometer temperature means that there was ice on the thermometer housing. The quality flags are defined directly in the system and cannot be customised. For each value, the following flags can be entered:

  • E – estimated value
  • C – calculated value
  • + – an attempt was made to enter a value beyond the limits (see Chapter 4.3.1).

The entire monthly record is given a validation flag that informs the user about the progress of data validation. Below are meanings of the validation flags:

  • A – the record has been validated
  • B – the record has been validated in CLICOM (the previous application)
  • C – this is a calculated record
  • N – the record has not been validated

Records where the validation flag is A, B or C cannot be changed. A value in such a record cannot be edited or deleted even by a user with the highest authorisation. If such a value needs to be checked again or edited, it is necessary to change the validation flag (Fig. 30) from A or B to N. After validation, the flag will be set to A.
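The locking behaviour of the validation flags can be summarised in a small sketch. The flag letters and their meaning come from the list above; the record layout (a dictionary with a validation flag and daily values) and the function names edit_value, reopen and validate are hypothetical and serve only to illustrate the rule that a validated record must first be set back to N before it is edited.

```python
LOCKED_FLAGS = {"A", "B", "C"}   # validated, validated in CLICOM, calculated


class LockedRecordError(Exception):
    """Raised when a value of a locked monthly record is to be changed."""


def edit_value(record: dict, day: int, new_value: float) -> None:
    """Change one daily value; allowed only while the record is not locked."""
    if record["validation_flag"] in LOCKED_FLAGS:
        raise LockedRecordError("set the validation flag to N first")
    record["values"][day] = new_value


def reopen(record: dict) -> None:
    """Change the validation flag from A or B to N, as described in the text."""
    if record["validation_flag"] in {"A", "B"}:
        record["validation_flag"] = "N"


def validate(record: dict) -> None:
    """Mark the record as validated again."""
    record["validation_flag"] = "A"


record = {"validation_flag": "A", "values": {1: 0.0, 2: 3.4}}
reopen(record)               # A -> N
edit_value(record, 2, 3.6)   # now permitted
validate(record)             # N -> A, the record is locked again
print(record)
```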

Fig. 30: Form for changes in validation flags

4.3.2. One minute rainfall intensity

Thanks to automated rain gauges it is possible to store one minute rainfall intensities in INTENSITY_RAINFALL (Table 9). The current rainfall intensity can then be evaluated immediately. Originally it was necessary to analyse an ombrograph record that was typically available only after the end of the month. When processing the table, it is necessary to be careful with values measured in periods when the temperature is negative: there is a delay and a loss that depend on the heating intensity of the rain gauge.

4.3.3. Meteorological phenomena

The meteorological phenomena in CLIDATA are stored in MET_PHENOMENA (table 10). Each record in that table includes information about one meteorological phenomenon, the duration of which is always interrupted at the end of the calendar day: a continuous snowfall at night from 10 p.m. on one day until 4 a.m. of the next day is stored in the database in two subsequent records (Chromá et al., 2005). All phenomena are pre-defined (Fig. 31).
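The interruption of a phenomenon at the end of the calendar day can be sketched as follows. The function split_at_midnight is a hypothetical name, and the representation of the end of the day as 00:00 of the next date is an assumption; the source does not say how MET_PHENOMENA stores the boundary time.

```python
from datetime import datetime, timedelta


def split_at_midnight(start: datetime, end: datetime) -> list:
    """Split the interval [start, end) into parts that never cross midnight."""
    parts = []
    while start.date() < end.date():
        # 00:00 of the day following the current start
        midnight = datetime.combine(start.date() + timedelta(days=1),
                                    datetime.min.time())
        parts.append((start, midnight))
        start = midnight
    if start < end:
        parts.append((start, end))
    return parts


# Snowfall from 10 p.m. on one day until 4 a.m. of the next day.
records = split_at_midnight(datetime(2006, 1, 14, 22, 0),
                            datetime(2006, 1, 15, 4, 0))
for part_start, part_end in records:
    print(part_start, "->", part_end)   # two records, split at midnight
```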

Fig. 31: Form for the definition of meteorological phenomena

A special font extension with the phenomenon file.ttf is used to display the flag. The flags will display correctly in other applications too (Figure 32). In the case of storms, additional information can be stored (such as a storm direction, maximum vicinity or maximum gust of wind depending on direction/time). In the Czech Republic, times of meteorological phenomena are estimated only in some stations. In that case, the time is replaced with a time flag (Fig. 33).

Fig. 32: A list of meteorological phenomena in MS Excel

Fig. 33: Time flags of meteorological phenomena

4.3.4. Pentade, decade and monthly data

Pentade, decade and monthly data are calculated by means of MAX, MIN, SUM or AVG functions for following periods/days:

  • pentades: 1 – 5, 6 – 10, 11 – 15, 16 – 20, 21 – 25 and 26 – EoM,
  • decades: 1 – 10, 11 – 20 and 21 – EoM, and
  • month: 1 – EoM,

where EoM stands for the last day in the month (ranging from 28 to 31).

Fig. 9 shows the definition of the calculation used for the rainfall aggregate – SRA. The values are calculated for decades and the month using SUM of nonregular data (from RDATA_N). AVG is calculated if the condition 3/5 (Calculation of monthly and annual 30-year standard normals, 1989) is fulfilled. This means that as many as 5 values may be missing for the calculation, but no more than 3 of them may follow each other. The resulting values are stored in MDATA (table 3). The individual periods are identified as follows:

0 – month, 1 – the first decade, 2 – the second decade, 3 – the third decade, 4 – the first pentade, 5 – the second pentade, 6 – the third pentade, 7 – the fourth pentade, 8 – the fifth pentade and 9 – the sixth pentade
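The period codes and the 3/5 condition can be sketched together. The day ranges and the codes 0 to 9 follow the lists above, and the rule is implemented exactly as worded in the text (at most 5 missing values, at most 3 of them consecutive). The function names period_bounds, passes_3_5 and aggregate are hypothetical, and representing a missing value by None is an assumption of this sketch, not the way MDATA is actually calculated.

```python
import calendar


def period_bounds(year: int, month: int) -> dict:
    """Map each period code (0..9) to its first and last day of the month."""
    eom = calendar.monthrange(year, month)[1]   # last day of the month, 28..31
    return {
        0: (1, eom),                                            # month
        1: (1, 10), 2: (11, 20), 3: (21, eom),                  # decades
        4: (1, 5), 5: (6, 10), 6: (11, 15),                     # pentades
        7: (16, 20), 8: (21, 25), 9: (26, eom),
    }


def passes_3_5(values: list) -> bool:
    """True if at most 5 values are missing and at most 3 of them in a row."""
    missing = run = longest_run = 0
    for value in values:
        if value is None:
            missing += 1
            run += 1
            longest_run = max(longest_run, run)
        else:
            run = 0
    return missing <= 5 and longest_run <= 3


def aggregate(daily: dict, year: int, month: int, func) -> dict:
    """Apply func (e.g. sum, max, min) to the available values of every period."""
    result = {}
    for code, (first_day, last_day) in period_bounds(year, month).items():
        values = [daily[d] for d in range(first_day, last_day + 1)
                  if daily.get(d) is not None]
        result[code] = func(values) if values else None
    return result


# Daily rainfall aggregates for February 2006: 1.0 mm on every day.
daily_sra = {day: 1.0 for day in range(1, 29)}
sums = aggregate(daily_sra, 2006, 2, sum)
print(sums[0], sums[3], sums[9])   # month, third decade, sixth pentade
```

The sketch also shows why the last decade and the last pentade have a variable length: their upper bound is the end of the month, not a fixed day.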

If MAX or MIN functions are defined for the element, information about the date of the calculated extreme is also stored in MDATA. If one extreme value occurs several times in the period, only the first date will be stored and a multiple occurrence flag will be attached to the value. MDATA also includes a list of values of other elements relating to the time when the extreme, calculated by means of MAX and MIN, occurred (Fig. 34). This additional information is available because users requested that not only the maximum wind velocity but also the direction of the wind be displayed. This is an example of efficient co-operation between the user, client and software developer.

Fig. 34: A list of element values in the period of occurrence of an extreme value

4.3.5. Number of days

Frequently used characteristics include the number of days with values of the climate element and the number of days with occurrence of the meteorological phenomena. An application administrator can agree with the users to edit calculations of the characteristics in the element table (see Chapter 4.3.1). The characteristics are stored in MDATA_COUNT (table 4) and MDATA_PHENOMENA (table 11). The numbers of days with the meteorological phenomena are based on the definition in the system administration (see Chapter 5.4.2).

4.3.6. Long-termed averages and extremes

Long-term averages and extremes in EDATA (table 1) are calculated in several independent actions. The values are then used for different purposes. Individual groups of the calculated values differ in SOURCE. This item can take any of the following values:

  • S – standard – a calculation for a standard period defined in a table of system parameters (Fig. 12),
  • U – user – a user defined calculation for certain stations, elements and periods (Fig. 35),
  • F – full – a calculation for all data stored in the database for the element and scheme,
  • T – temporary – temporary values of the characteristics derived from another station for purposes of a data check.

Fig. 35: User calculation in EDATA

The STANDARD characteristics are used as a basis for a spatial analysis of data quality in a special GIS application (see Chapter 5.3). The calculation requires a minimum quantity of data stored in the database for the period defined in the system parameter table. For the current period from 1961 to 2005, data for at least 10 years are needed (Moberg et al., 2006; Štekl et al., 2001). If the station does not fulfil the ten-year limit, it is necessary to estimate the TEMPORARY characteristics using other stations and coefficients/constants. The FULL characteristics are used for standard assessment reports by ČHMÚ. Typically, the reports are handed over to mass media. The user can also choose the calculation of the characteristics for any period and for a chosen list of stations and elements (Fig. 35). The USER characteristics are available to the users until the next calculation is started. The record (Table 1) will also include the user name of the user who entered the calculation. Each user may store only one calculation of their own in the database.

4.3.7. Normals

The system parameter tables (Fig. 12) also define two normal periods for which the normals are calculated. WMO recommends that the standard normal period should be 1961–1990, but it is also possible to use another period that lasts at least thirty years. In practice, ČHMÚ prefers other normal periods (1961–2000 and 1971–2000). Most of the calculated normals do not meet all criteria set for the calculation of normal characteristics. This is, in particular, the case of quality control and data homogeneity. The data quality control is sufficient in CLIDATA. Homogenised data in the database differ in a station indicative. This means they cannot be replaced with original data.

4.4. Data acquisition and data control

4.4.1. Data acquisition

In the case of non-automated networks, the most typical approach is to enter the data from database form sheets into the database manually. The application offers a user-friendly definition of the data acquisition form sheets where the screen corresponds as much as possible to the station measurement report (see Chapter 4.3.5). To compare the measurement report with the corresponding data acquisition form sheet see Fig. 36. The intention of the software developers was to develop that part of the application in line with approaches typical of data collection. It is also possible to use function keys on the numerical part of the keyboard (for instance, it is possible to toggle whether Enter moves along lines or along columns, or to set that the value from the current field will be added into the remaining cells in the line or column). During the data acquisition, the upper and lower limits defined for the element are checked, and it is checked whether the characters are permissible.

Fig. 36: Comparison of the data acquisition form sheet and station measurement record

Because the user cannot check the screen when he enters the data, the field with an incorrect value is highlighted in colour and the value is replaced with an agreed string (Fig. 37).
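The checks performed during key entry can be sketched as a single function that tests the characters and the limits of one field. The placeholder string "###", the accepted number format and the function name check_entry are assumptions made for this illustration; the source only states that an incorrect value is highlighted and replaced with an agreed string.

```python
import re

PLACEHOLDER = "###"   # hypothetical "agreed string" shown instead of a bad value
NUMBER = re.compile(r"^[+-]?\d+([.,]\d+)?$")   # digits, optional sign and decimals


def check_entry(text: str, lower=None, upper=None):
    """Return (text to display, ok flag) for one keyed-in field."""
    text = text.strip()
    if text == "":
        return "", True                       # an empty field is not an error
    if not NUMBER.match(text):
        return PLACEHOLDER, False             # impermissible characters
    value = float(text.replace(",", "."))
    if (lower is not None and value < lower) or \
       (upper is not None and value > upper):
        return PLACEHOLDER, False             # outside the physical range
    return text, True


# Daily rainfall aggregate with assumed limits of 0 and 500 mm.
for typed in ["12.5", "-1", "1x", ""]:
    print(repr(typed), check_entry(typed, lower=0.0, upper=500.0))
```

A form sheet would use the returned flag to colour the field and keep the placeholder until the user corrects the entry.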

Fig. 37: Data acquisition form sheet with incorrect values

If the incorrect value remains in the data acquisition form sheet without being corrected, the user will be informed in the first control level by a different colour of the field. In some countries, the data are acquired twice and the system checks whether they are different. In CLIDATA this is possible, but ČHMÚ does not use this feature.

4.4.2. Data control

The first level control takes place during the acquisition or import of the data. If the value to be stored does not comply with the limits, it will not be stored and the system will set a quality flag (see Chapter 5.2.1) that indicates this problem. After the data are acquired or imported, they are checked in the control form sheet. It works in the same way as the data acquisition form sheet, but it is not necessary to use identical form sheets for the acquisition and the control. Once the control combination (Station Indicative, Year and Month) is entered, the data will be loaded into the form sheet and checked for compliance with the defined control equations (Fig. 38).

Fig. 38: Control equation for data control

Fig. 39: Indication of faulty values in the control form sheet

Only those equations are selected that include elements defined in the control form sheet. If a value, or values, in the control form sheet do not comply with an equation, they are highlighted in green (Fig. 39). The user has to decide what steps will be taken in respect of the faulty condition. In the form sheet, there are several buttons: S (a correct value), O (an estimated value), V (a calculated value) and N (an empty value). Once the function is selected, the values can be changed. If O or V is chosen, the value is assigned a quality flag. The user should make a decision about all values that the application highlighted. Each highlighted value will be given an R control code. It is thus possible to find out at any time whether all values in the database have been corrected. To facilitate the work, the user can display each equation with a non-conforming value (Fig. 40).
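The selection and evaluation of control equations can be illustrated with a minimal sketch. The real equations of CLIDATA are defined in the form shown in Fig. 38 and are not reproduced here; the three equations below, the element abbreviations TMIN, TMAX, SNO and SCE, and the names ControlEquation and check_day are hypothetical. The sketch also assumes that an equation is selected only if all of its elements are present in the control form sheet.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ControlEquation:
    """Hypothetical control equation: a named condition over several elements."""
    name: str
    elements: frozenset
    holds: Callable[[dict], bool]


EQUATIONS = [
    ControlEquation("TMIN <= T", frozenset({"TMIN", "T"}),
                    lambda v: v["TMIN"] <= v["T"]),
    ControlEquation("T <= TMAX", frozenset({"T", "TMAX"}),
                    lambda v: v["T"] <= v["TMAX"]),
    ControlEquation("SNO <= SCE", frozenset({"SNO", "SCE"}),
                    lambda v: v["SNO"] <= v["SCE"]),
]


def check_day(values: dict, form_elements: frozenset) -> set:
    """Return the elements to highlight for one day of the control form sheet."""
    highlighted = set()
    for equation in EQUATIONS:
        if not equation.elements <= form_elements:
            continue   # the equation uses an element that is not on the form
        if any(values.get(e) is None for e in equation.elements):
            continue   # an empty value cannot be tested
        if not equation.holds(values):
            highlighted |= equation.elements
    return highlighted


form = frozenset({"TMIN", "T", "TMAX"})
day = {"TMIN": 5.0, "T": 3.0, "TMAX": 10.0}
print(sorted(check_day(day, form)))   # ['T', 'TMIN']
```

Both elements of a failed equation are highlighted, because the equation alone cannot tell which of them is wrong; that decision is left to the user, as described above.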

Fig. 40: Control equation for the value

If a user changes the value, the system will store information about the change into a special table. In the control form sheet it is possible to view a history of all changes for the value (Fig. 41).

Fig. 41: History of value changes in the control form sheet

The changes in the values are monitored not only in the control form sheets, but also in the entire application. HISTORIC_VALUE_N (or HISTORIC_VALUE_R) is a table that provides the general information necessary for the creation of lists of changes. The table also shows the user name and the exact time of the change. For example, Fig. 42 shows a list of actions taken by a user when revising the data during one month.

Fig. 42: A list of user changes

Control equations can also be used to control the data in batches. An example of an assignment and its results is shown in Fig. 43. The batch control is recommended before a rather big quantity of source data is processed, or for the preparation of various studies or verifications by data correction staff.

The next step in the control process is a spatial analysis that uses GIS applications to display a simple proportional symbol map for each station, element and period (Fig. 44). The proportional symbol maps are based on a comparison of the current value with percentiles stored in EDATA. The user can easily survey the position of the current value in the empirical distribution of values, based on an observation period of at least ten years, for the month that is checked for that element in the station. Only some percentiles (1, 2, 5, 10, 20, 50, 70, 90, 95, 98 and 99%) of the elements are used. The percentiles are always valid for the respective month of the year. Estimated values for a station with a short observation period are approximate only. Nevertheless, they are sufficient for the data control. If the value exceeds the 99th percentile or is lower than the 1st percentile, a special font is used to highlight the value (the card chart does not display). If a user believes that the current value might be incorrect, he can view a list of values for the station compared with values in neighbouring stations (Fig. 45). This simple GIS application enables the user to define source layers in the map to make orientation as easy as possible.
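The comparison of a current value with the stored percentiles can be sketched as follows. The percentile levels are those listed in the text; the threshold values, the function names percentile_band and is_extreme, and the treatment of a value equal to a threshold (counted as having reached that level) are assumptions of this illustration and do not describe the actual GIS application.

```python
from bisect import bisect_right

# Percentile levels named in the text (in %).
LEVELS = (1, 2, 5, 10, 20, 50, 70, 90, 95, 98, 99)


def percentile_band(value: float, thresholds: list):
    """Return the pair of neighbouring levels between which the value lies.

    thresholds holds the element values at LEVELS in non-decreasing order.
    None stands for "below the 1st" or "above the 99th" percentile.
    """
    position = bisect_right(thresholds, value)
    lower = LEVELS[position - 1] if position > 0 else None
    upper = LEVELS[position] if position < len(LEVELS) else None
    return lower, upper


def is_extreme(value: float, thresholds: list) -> bool:
    """True if the value is below the 1st or above the 99th percentile."""
    return value < thresholds[0] or value > thresholds[-1]


# Hypothetical percentiles of a daily temperature for one station and month.
january_t = [-12.0, -10.0, -8.0, -6.0, -4.0, 0.0, 2.0, 5.0, 7.0, 9.0, 10.0]

print(percentile_band(6.0, january_t))    # (90, 95)
print(percentile_band(11.0, january_t))   # (99, None)
print(is_extreme(11.0, january_t))        # True
```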

Fig. 43: Data control in batches – result

Fig. 44: Example of a spatial data analysis

Fig. 45: List of values in the spatial data analysis

Fig. 46: Form sheet for system administration

4.5. System administration

This part of the application is intended for administrators who administer the installations. The use of CLIDATA has shown that the application is very simple and easy to survey for the users, but requires demanding administration and permanent supervision by an administrator. The administrator is obliged and authorised to create in CLIDATA an environment typical for a specific meteorological service. This part ranks among the most important parts of the application. Without due knowledge, climate data cannot be stored correctly. Administration of the operating system and of the database environment is not included in CLIDATA.

Though some parts of the system administration were mentioned in previous chapters, below is a brief outline of all available items (Fig. 46).

4.5.1. Relation to CLICOM

As mentioned several times in the chapters above, one of the requirements set before the development of CLIDATA was that it should be related to CLICOM, which had been used in many developing countries. Particular attention was paid to the possible transfer of descriptive texts and data (Fig. 47). This covers the import of data, station geography, element descriptions and information on meteorological phenomena. Sometimes it is necessary to import data that have not been calculated from data stored in the application.

Fig. 47: Possible transfer of descriptive texts and data from CLICOM
Fig. 48: Definition of a type of time
Fig. 49: Meaning of codes for encoded elements

4.5.2. Simple code lists

The descriptive information includes several simple code lists with basic information used for the description of stations, observations or measurements. Some of this information, including the sample form sheets in the figures, has been mentioned above. This information includes (Fig. 10) historic units (Fig. 25), instruments, districts, regions, GIS regions, elements, type of time (Fig. 48) and applicability of a daylight saving scheme (Fig. 51), pedology, geo-relief, description of vegetation and categories of anthropogenic influences near the station, regular observations (Fig. 24), climate schemes (Fig. 23), river basin, element flags (Fig. 29), phenomenon flags (Fig. 33), description of values (Fig. 49), list of phenomena (Fig. 31) and definitions of phenomenon days (Fig. 50), control equations (Fig. 38) and tabulated values (Fig. 54).

Fig. 50: Definition of phenomenon days

Some simple code lists contain not only a description but also a certain logic. The type of time (Fig. 48), combined with the definition of time (Fig. 12), makes it possible to convert zone times so that all data in a single database installation are kept in one time. This is important when comparing data and information. The applicability of the daylight saving scheme (Fig. 51) is used when processing meteorological phenomena in the Czech Republic, where all types of stations (Instructions of Meteorological Station Observers, 2003) record meteorological phenomena in the time currently in force, that is, in Central European Time (CET) in one part of the year and in Central European Summer Time in the other. The values are stored in the database in CET for the whole year. Thanks to the description of values (Fig. 49), encoded information can be decoded and presented in a form comprehensible to the general public. In order to calculate MDATA_PHENOMENA (see Chapter 5.2.5 and Table 11), it is necessary to define the phenomenon days (Fig. 50). The tabulated values (Fig. 54) include limits for elements applicable in altitude zones in individual months of the year. These values are used for another level of data control in the control form sheet.
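The effect of the daylight saving flag can be shown with a minimal sketch. It assumes that the flag simply states whether a recorded time was read in summer time; the function name and the boolean parameter are hypothetical and do not reflect the actual CLIDATA code list.

```python
from datetime import datetime, timedelta


def to_cet(recorded_time, summer_time_applies):
    """Convert an observer's clock time to CET for storage.

    Central European Summer Time is one hour ahead of CET, so one hour is
    subtracted when the daylight saving scheme applies. The boolean flag is
    an assumed stand-in for the code list entry described in the text.
    """
    if summer_time_applies:
        return recorded_time - timedelta(hours=1)
    return recorded_time
```

For example, a phenomenon recorded at 14:00 summer time on 1 July would be stored as 13:00 CET, while a winter record is stored unchanged.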

Fig. 51: Applicability of the daylight saving scheme

4.5.3. Administration of import files

The application offers simple user options for defining the formats of import files. Fig. 52 shows an example of the definition for the most complicated case, a SYNOP release in the format of the WMO press service. For regular text files the definition is easier. The definition makes it possible to process individual data groups, assign the data groups to an element in the database, or make calculations under a certain condition.

Fig. 52: Definition of an import file format for a SYNOP release
Fig. 53: Form for the definition of calculation equations
Fig. 54: Control limits for selected elements

4.5.4. Calculation equations

The application is flexible enough to include a user-friendly definition of calculation formulae (Fig. 53) that are assigned to the station in the form sheet where the elements of the station are defined (Fig. 13). Each change in RDATA_N, RDATA_R or MET_PHENOMENA is monitored, and information that values need to be calculated again is stored in supporting tables. For instance, any saving or editing of the value of air pressure at the climate observation times (07, 14 and 21 hours) will start calculation equations #4 (average daily air pressure), #8 (water vapour pressure) and #7 (relative air humidity). This recalculation in RDATA_N triggers a sequence of other calculations: monthly and decade values as well as long-term extremes are calculated again (EDATA with F or S, see Chapter 5.2.6). If the change relates to a preset normal period, NDATA is recalculated too (Chapter 5.2.7). In ČHMÚ databases, information needed for data replication between individual parts of the database structure is stored as well (Chapter 6.1). The recalculation does not start immediately after a value changes, but at time intervals preset in the administration menu of CLIDATA (Chapter 5.4.7).
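The deferred recalculation described here (changes are noted first and processed later in batches) can be sketched as follows. This is an illustrative model only: the class, the element abbreviation "P" and the in-memory set standing in for the supporting tables are assumptions, not CLIDATA structures. The equation numbers 4, 8 and 7 are the ones quoted in the text.

```python
class RecalcQueue:
    """Collects pending recalculations and runs them in batches."""

    def __init__(self, dependencies):
        # Maps an element abbreviation to the equation numbers it triggers.
        self.dependencies = dependencies
        # Stand-in for the supporting tables that note what must be recalculated.
        self.pending = set()

    def record_change(self, station, element, day):
        """Note which equations must be re-run after a value was saved or edited."""
        for equation in self.dependencies.get(element, []):
            self.pending.add((station, equation, day))

    def run_batch(self, compute):
        """Process everything noted since the last run; called at preset intervals."""
        batch = sorted(self.pending)
        self.pending.clear()
        for station, equation, day in batch:
            compute(station, equation, day)
        return len(batch)
```

Because the pending items are kept in a set, editing the same value several times between two batch runs leads to a single recalculation per equation, which is one plausible reason for not recalculating immediately.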

Fig. 55: Preparation of automatic e-mails

4.5.5. Preparation of data e-mails

CLIDATA is intended for general administration and archiving of climate data. It is possible to prepare a number of data outputs. Some of them can be fully automated: an inquiry is sent to the database or results are e-mailed. Fig. 55 shows the definition form sheet. It is possible to set an interval for repeating an inquiry. The interval is given either as a number of days or as a fraction of a day (1 means once a day, 2 means once every two days and 1/24 means every hour). The example in Fig. 55 shows an inquiry for daily rainfall aggregates (SRA) that exceeded 30 mm. This inquiry is repeated twice a day for the previous day and sent to an e-mail conference named EXTREMY. This part of the application also sends information about the operation of the application, the success rate of imports and the scope of required data recalculation.
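The interval convention (a number of days, where fractions denote parts of a day) can be illustrated with a small sketch. The function name is hypothetical, and the reading that 1/2 corresponds to twice a day is an inference from the examples in the text, not a documented setting.

```python
from datetime import datetime, timedelta
from fractions import Fraction

MICROSECONDS_PER_DAY = 86_400_000_000


def next_run(last_run, interval_days):
    """Return the next start time for an interval given in days.

    `interval_days` may be an integer or a string such as "1/24";
    exact fractions avoid rounding drift over many repetitions.
    """
    interval = Fraction(interval_days)
    if interval <= 0:
        raise ValueError("the interval must be positive")
    return last_run + timedelta(microseconds=round(interval * MICROSECONDS_PER_DAY))
```

Starting from midnight on 1 January 2007, an interval of "1/24" gives 01:00 on the same day, 2 gives midnight on 3 January, and "1/2" gives noon on 1 January.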

In CLIDATA it is also possible to attach files to e-mails where the files do not contain replies to database inquiries, but outputs from other independent applications. Fig. 56 shows a definition for sending a model.txt file to the Emergency Call Centre in Ostrava. The file contains calculated results of the hydrological rainfall-runoff model.

Fig. 56: Sending of files
Fig. 57: Definition of FTP data transfer

4.5.6. FTP transfer of data and information

In CLIDATA it is also possible to transfer data files via FTP. Such a file can contain a reply to a database inquiry (Fig. 57). GET is a parameter that makes it possible to actively fetch files into the database environment, for instance when importing data into the application.

Fig. 58: Administrator’s menu in CLIDATA

4.5.7. Administration of CLIDATA

Some parts of the application can be administered directly, without access to the database or operating system (Fig. 58). The part dealing with voluntary observers is of particular interest with regard to users. It is included in this part of the application because it contains a number of sensitive data (such as birth certificate numbers, addresses and remunerations paid to observers) and is therefore kept separate from users with the lowest authorisation. The part containing the system report (Fig. 59) is used when the application is to be localised into foreign languages; see the description in Chapter 3.6. The process administration (for database jobs see Fig. 60) not only displays a list of all defined processes (import processes, data processes and system processes) but also highlights their current state: green processes are in progress, while red ones have been stopped. Each process is assigned a start interval that depends on the importance of the process (the figure shows replication processes that have been started with an interval of 1 minute).

Fig. 59: Part of a localisation form sheet
Fig. 60: Process administration

The administration of processes and authorisations (Fig. 61) is also a very sensitive part of the application. The administrator may forbid a user to enter the application or may restrict the user's rights to the minimum scope that is really necessary. Restrictions are either defined in roles or expressed flexibly as SQL conditions. For instance, the condition eg_gh_id='P1PKLM01' and eg_el_abbreviation='SRA' and year='2007' authorises the user to process rainfall aggregates from the Klementinum station, Prague, in 2007 only.

Fig. 61: Administration of users
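How an SQL condition can narrow the rows a user may work with is sketched below using an in-memory SQLite database. Only the three column names and the sample condition come from the text; the table name demo_data, the value column, the second station identifier and all data values are invented for the demonstration, and CLIDATA itself runs on Oracle.

```python
import sqlite3

# Condition quoted in the text; it is written by an administrator,
# so it is treated here as trusted configuration, not as user input.
RESTRICTION = "eg_gh_id='P1PKLM01' and eg_el_abbreviation='SRA' and year='2007'"


def build_demo_database():
    """Create a small in-memory table with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demo_data ("
        "eg_gh_id TEXT, eg_el_abbreviation TEXT, year TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO demo_data VALUES (?, ?, ?, ?)",
        [
            ("P1PKLM01", "SRA", "2007", 12.5),
            ("P1PKLM01", "SRA", "2006", 3.0),
            ("P1PKLM01", "T", "2007", 21.5),
            ("X9DEMO01", "SRA", "2007", 7.5),  # made-up station identifier
        ],
    )
    return conn


def fetch_allowed(conn, restriction):
    """Return only the rows that satisfy the user's restriction."""
    query = (
        "SELECT eg_gh_id, eg_el_abbreviation, year, value "
        "FROM demo_data WHERE " + restriction
    )
    return conn.execute(query).fetchall()
```

Calling `fetch_allowed(build_demo_database(), RESTRICTION)` returns only the single 2007 rainfall row for P1PKLM01; the other three rows stay hidden from that user.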

4.6. Products

CLIDATA is an open database system. The database administrator and the application itself make it possible to process data tables or data views. It is also possible to process data from other environments. The intention has been to reduce users’ demands for developing products that already exist in other environments. We have also managed to limit unnecessary data export out of the application. Only seven basic modules are available in the application: wind rose, x-day function, rainfall intensity, user-defined extreme values, regular inquiries, CLIMAT report and frequency characteristics.

4.6.1. Wind rose

Since the calculation method for and presentation of wind roses have not been defined clearly (Nosek, 1972; Guide to Climate Practices – Second edition, 1983), our objective was to combine several methods in CLIDATA so that the user could choose the most suitable type of wind rose. The calculation methods are shown in the introductory form sheet (Fig. 62). First, it is necessary to choose the period, the station and a pair of wind direction and wind speed elements. Only then will the wind rose be calculated correctly and divided by wind speed values.

Fig. 62: Form for the calculation of a wind rose

The division can be adjusted as necessary. It is possible to use a preset calculation option: a so-called stability wind rose, used for dispersion studies and for cases when the wind is defined in intervals of 1, 2–4, 5–9 and > 9 m·s⁻¹. The user may choose the number of wind rose directions (8, 16, 18 and 36), relative or absolute frequency in the selected number of directions, the type of graphical output (a bar chart, line chart, radial chart or alignment chart) and other characteristics. The result is displayed on the screen (Fig. 63). The values are stored in a database table. A simple chart can be exported into a chosen graphic format.
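The core of a wind rose calculation, counting observations by direction sector and speed class, can be sketched as follows. This is one common approach, not necessarily any of the methods combined in CLIDATA. The function names, the centring of the first sector on north and the treatment of class boundaries are assumptions, and calm conditions are not handled separately.

```python
from collections import Counter


def wind_rose(observations, sectors=8, speed_limits=(1, 4, 9)):
    """Count (direction sector, speed class) pairs.

    `observations` is an iterable of (direction in degrees, speed in m/s).
    Sector 0 is centred on north; speed classes follow the upper limits in
    `speed_limits`, giving <= 1, 2-4, 5-9 and > 9 m/s for the defaults.
    """
    width = 360 / sectors
    counts = Counter()
    for direction, speed in observations:
        # Shift by half a sector so that, e.g., 350 degrees counts as north.
        sector = int(((direction + width / 2) % 360) // width)
        speed_class = sum(speed > limit for limit in speed_limits)
        counts[(sector, speed_class)] += 1
    return counts


def relative_frequencies(counts):
    """Convert absolute counts to relative frequencies."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}
```

With eight sectors, directions of 0° and 350° both fall into sector 0 (north) and 90° falls into sector 2 (east); a speed of 10 m/s lands in the highest class.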

Fig. 63: An example of a calculated wind rose
Fig. 64: Form sheet where an X-day function is entered

4.6.2. X-day function

In this special product the user may choose data from TDATA_N (non-regular data). The original intention was to make it possible to choose one-day, two-day and three-day rainfall aggregates that exceed a 30 mm limit. During the creation of the product, it was decided to make the solution more general, and it is now possible to define other combinations too (Fig. 64). For instance, it is now easy to answer questions such as “What is the latest record of five consecutive tropical days?”, “When were the tropical days recorded?” or “When did South Bohemia witness more than 750 mm of rainfall in a week?”. The result is displayed on the screen and the values are stored in X_DAY_FUNCTION (Fig. 65).
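The original use case, finding x-day rainfall aggregates above a limit, can be sketched with a simple sliding window. The function name and the input layout are hypothetical, the input is assumed to contain one value per consecutive day without gaps, and the sample values in the usage note are invented.

```python
from datetime import date, timedelta


def x_day_exceedances(daily, x, limit):
    """Find all windows of x consecutive days whose sum exceeds the limit.

    `daily` is a list of (date, value) pairs for consecutive days.
    Returns a list of (first_day, last_day, total) tuples.
    """
    hits = []
    for start in range(len(daily) - x + 1):
        window = daily[start:start + x]
        total = sum(value for _, value in window)
        if total > limit:
            hits.append((window[0][0], window[-1][0], total))
    return hits
```

For daily totals of 10, 25, 8 and 0 mm starting on 1 July 2007, the two-day function with a 30 mm limit reports the windows 1–2 July (35 mm) and 2–3 July (33 mm).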

Fig. 65: Discoverer worksheet for X_DAY_FUNCTION

4.6.3. Rainfall intensity

Information about rainfall intensity can be found quickly in the part of the application that uses the minute rainfall aggregate table (INTENSITY_RAINFALL, Table 9). It is possible to define an interval for the creation of sums (e.g. 5 minutes) and to set the type of graphical output (Fig. 66). This product is used to check data measured by automated rain gauges.
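Building sums over a chosen interval from one-minute values can be sketched in a few lines. The function name and the plain list input are assumptions; the real data are held in the INTENSITY_RAINFALL table, whose layout is not reproduced here.

```python
def interval_sums(minute_values, interval=5):
    """Aggregate one-minute rainfall values into sums over fixed intervals.

    The last sum may cover fewer minutes if the series length is not a
    multiple of the interval.
    """
    if interval < 1:
        raise ValueError("the interval must be at least one minute")
    return [
        sum(minute_values[start:start + interval])
        for start in range(0, len(minute_values), interval)
    ]
```

A series of eleven invented minute values `[0, 1, 2, 0, 0, 3, 0, 0, 0, 1, 2]` yields the five-minute sums `[3, 4, 2]`; an unusually high sum stands out immediately when checking an automated gauge.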

Fig. 66: Working with the rainfall intensity worksheet

4.6.4. User EDATA

Fig. 35 shows the form sheet used for the calculation of long-term averages and extremes. The calculation can be customised. It is possible to choose any stations and elements, but only stations that have been measuring for at least ten years in the chosen period are included in the calculation. In EDATA the results are distinguished by the source of data (SOURCE = U (user)) and by the user name of the person who prepared the calculation.

4.6.5. Regular inquiries

ČHMÚ is frequently asked to supply a certain set of data regularly each month. Therefore, a part of CLIDATA prepares these outputs for customers. Typically, letters and e-mails are sent each month (Fig. 67). The output contains only the information that is defined in a form sheet for the specific customer (Fig. 68).

Fig. 67: Regular inquiry form sheet
Fig. 68: Contents of the regular inquiry form sheet

4.6.6. CLIMAT report

Users in other countries create regular CLIMAT reports in CLIDATA. Data from a specified network of climate stations are supplied to the international data sharing network operated by WMO. The CLIMAT report lists monthly values of basic climate elements as well as observations with long-term averages and normals. Advanced services typically use an application that generates the report directly at the stations, and the report is sent out at the end of the month. In developing countries the report is generated in national centres, and CLIDATA may facilitate this process considerably (Fig. 69). The output is a simple text file that is sent out of CLIDATA as agreed.

Fig. 69: Preparation of a CLIMAT report

4.6.7. Frequency characteristics

Some industries (for instance, civil engineering) require combined information from several climate elements. In CLICOM this calculation was named “frequency characteristics” and we decided to keep the same name in CLIDATA. Generally, the purpose is to calculate the frequency of occurrences when the value of one element is within an interval while the value of another element is within another interval. For instance, a minimum temperature from -5 to +5 °C occurred X times while the daily rainfall aggregate was between 10 mm and 15 mm. It is also possible to calculate the occurrence of a certain temperature for a specified wind direction and speed. A special form sheet is used for the calculation (Fig. 70) and the results are stored in CROSS_COUNT.
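The joint counting behind the frequency characteristics can be sketched as follows. The function name, the pairing of values per day and the choice of inclusive interval bounds are assumptions made for illustration; the sample values in the usage note are invented.

```python
def cross_count(records, first_interval, second_interval):
    """Count records where both elements fall within their intervals.

    `records` is an iterable of (value_of_first_element, value_of_second_element)
    pairs, e.g. one pair per day. Interval bounds are treated as inclusive.
    """
    low_a, high_a = first_interval
    low_b, high_b = second_interval
    return sum(
        1
        for a, b in records
        if low_a <= a <= high_a and low_b <= b <= high_b
    )
```

For pairs of (minimum temperature in °C, daily rainfall in mm) such as `[(-2, 12), (3, 9), (6, 11), (0, 15)]`, the call `cross_count(records, (-5, 5), (10, 15))` counts two days.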

Fig. 70: Preparation of frequency characteristics

5 Using CLIDATA in the Czech Republic and in the world

As mentioned in Chapter 3 describing the history of CLIDATA, the authors intended from the very beginning of the development to establish CLIDATA not only in the Czech Republic. Meteorological services in large and advanced countries generally use their own database applications, which, however, cannot be transferred to another environment.

5.1. CLIDATA in ČHMÚ

CLIDATA is now the main climate database of ČHMÚ. It is also used in operational meteorology and in other fields, for instance in hydrology and air quality. All authorised users can use CLIDATA in ČHMÚ’s computer network. Database instances exist in seven locations (Prague, České Budějovice, Plzeň, Ústí nad Labem, Hradec Králové, Ostrava and Brno) and are connected in a star-shaped configuration. The users in ČHMÚ are divided into several groups:

  • an administrator of the application: defines the application environment and new records in the administration system, regularly monitors the data being loaded into the database, checks data, co-operates with Oracle database administrators in creating new users and cancelling users that no longer exist, and assigns rights to users
  • an administrator of data contents: is responsible for one database instance as well as for imports, data acquisition and data control
  • a revising user: acquires non-automated data, carries out the control process and is responsible for meeting the deadlines set for monthly data processing
  • a user: holds rights that enable him or her to process the data stored in CLIDATA smoothly and carry out work tasks correctly (preparation of source data and opinions, drafting of studies, communication with mass media...)

Each user may propose changes to the application and inform the administrators about any data errors. The most frequently used tool is the e-mail conference of CLIDATA users. Proposals can be discussed there, so that the administrator can be sure that a proposed modification does not conflict with the requirements of other users. Simple modifications and corrections of errors are carried out immediately. Major changes are described twice a year and the necessary work is planned. After the modification, a new version is made available. Once a user logs in, he or she is notified of the new version and should start an update, which takes only several minutes. Each part of the application is identified by a current version number. This number is located in the upper bar after the name of the form sheet, for instance

2.6.11.29, where 2.6 is a code of the year (2006) and 11.29 is the date (29.11.) when the form sheet was changed.
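The version number can be decoded mechanically, as the sketch below shows. The text gives only the single example 2.6 for 2006, so the rule used here (first part times 1000 plus second part) is an assumption that reproduces this example and may not match how other years are encoded; the function name is hypothetical.

```python
from datetime import date


def parse_form_version(version):
    """Decode a form sheet version such as "2.6.11.29" into a date.

    Assumed rule: the first two parts encode the year, the last two parts
    are the month and the day of the change.
    """
    parts = version.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated parts")
    thousands, remainder, month, day = (int(part) for part in parts)
    return date(thousands * 1000 + remainder, month, day)
```

Under this assumption, `parse_form_version("2.6.11.29")` gives 29 November 2006, which makes it easy to sort form sheets by the date of their last change.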

Each autumn many of the users meet at a general methodology meeting organised by the Climatology Department of ČHMÚ. The agenda of the meeting includes data processing issues. Since users of some parts of CLIDATA approach the application differently, the considerable flexibility of CLIDATA has started to become something of an issue, and these approaches should now be made more consistent. Because of personnel changes in departments, users must be trained regularly to use CLIDATA correctly and not to introduce system errors.

5.2. CLIDATA in the world

In 2004, the Director of ČHMÚ and the Secretary-General of WMO signed a memorandum of co-operation providing meteorological services with access to CLIDATA. It is, however, always necessary to raise funds for IT, database software, system software and two-stage training (for the administrator and for the users). In 2002 CLIDATA was tested thoroughly by WMO in evaluation workshops (Report of the Climate Database Management Systems Evaluation Workshop, 2002).

Table 12: Installations of CLIDATA in the world

The test did not result in a ranking of the best and worst applications, but only in information about which applications met the criteria set for a database application suitable for meteorological services. CLIDATA fulfilled all test criteria and has been unofficially regarded since the test as the best solution.

Table 12 lists countries and organisations where CLIDATA was installed and is now used. ČHMÚ ensures that all users can access the latest versions and take part in e-mail conferences. The users can also view a dedicated information web site in English. AGRHYMET in Niger is a partner for training and installations in Africa. Several international organisations contribute to the financing of CLIDATA development.

6 Short-period forecast

In the history of climate database processing, several applications were developed for one meteorological service only. Once their development stopped, the databases went out of use after a certain period of time. CLICOM was used worldwide, but its gradual replacement has started. From the very beginning, user feedback was an issue in CLICOM (the authors at NOAA never used CLICOM in practice and soon stopped its development). A big advantage of CLIDATA is that the authors have established close contacts with users in the country of origin, the Czech Republic. The software developers meet the users every day at work. They also discuss issues in e-mail conferences and web discussion rooms. They can respond to each reasonable comment submitted by the users, prepare new versions and various improvements, and administer the application part of CLIDATA.

In 2005 CLIDATA acquired a new part, FENODATA, which processes phenology data. FENODATA is an independent, standalone instance. Its structure, control and administration are the same as those of CLIDATA. Currently, users in ČHMÚ are preparing applications that integrate CLIDATA (climate data) and FENODATA (phenology data). After the integration is completed, ČHMÚ will be able to use this powerful tool when analysing the influence of climate characteristics on phenophases. CLIDATA now also includes pilot data (ADATA). They were not included in the original version of CLIDATA. CLIDATA users, in particular those from abroad, required these data, so they have been integrated into the application. The table with pilot data will become a standard part of new versions only after thorough testing is completed.

Soon ČHMÚ will complete another database application project, SDNES (an operational inter-disciplinary database). SDNES will considerably influence the import part of CLIDATA. Data will no longer be imported into CLIDATA, but will be replicated from SDNES. This will streamline CLIDATA administration, but the import part will still be kept in the application and will be available for other meteorological services. SDNES will use mode data (long-term averages and normals) and metadata (station descriptions) from CLIDATA.

In the future, ČHMÚ will try to promote CLIDATA in the world. Meteorological services are interested in CLIDATA, but other climate database applications now exist abroad.

7 Conclusion

This thesis is not an operation manual for CLIDATA users. It provides a brief description of the history, development and current situation. Since 2000 CLIDATA has been used by ČHMÚ in climate practice. Installation in other meteorological services in WMO member states has started. The climate database application does not include climate data only; metadata are an integral part of CLIDATA too. For many years climatology experts have been trying to define a border between metadata and data (Guide to Climate Practices, 1983). The WMO Climate Commission intends to make access to metadata consistent and devotes much time and labour to this task. The author of this work has been working since 2005 as head of the climate data and metadata expert team nominated by the WMO Climate Commission and has been trying to take advantage of many years of experience in climate database applications.

CLIDATA represented a step change in the level of climate data processing. System and application tools help users avoid additional errors and force them to take standard actions in a defined sequence. As a result, the climate data are ready for standard use with a quality typical of advanced meteorological services, not only in Europe. CLIDATA has been becoming more extensive and more complicated, and its administration is now rather complex. This does not restrict typical users in their work, but some options existing in CLIDATA are not used sufficiently. Only a few experts in ČHMÚ and in ATACO, which develops the software, have detailed knowledge of CLIDATA. All parts of CLIDATA are described in detail within CLIDATA itself, and users can access the descriptions in line with their database access authorisations. This work should fill a gap by providing an easy-to-survey description of CLIDATA for a standard user, free of “complicated” descriptions of the system, database and programme components.

A clear advantage of CLIDATA is that the application is used every day in the country of its origin, in ČHMÚ departments in the Czech Republic. Several dozen in-house users exert sufficient pressure to eliminate faults and errors and propose many minor as well as extensive improvements. All this results in regular updates of CLIDATA that are provided to users all over the world.