Preparing and Submitting Tabular Data


The CDS and other astronomical data centers store and distribute astronomical data to promote their use, primarily by professional astronomers. To ensure the scientific quality of the data, we require that the data be related to a publication in a refereed journal, either as tables or catalogues actually published, or as a paper describing the data and their context.

In order to facilitate the usability of the data, and to allow their processing by the data centers, we require that:

the data are described accurately enough to allow an unambiguous interpretation, as well as an understanding of the context in which the data were acquired and/or processed; a single ASCII file, named ReadMe, is designed for this role.
the data are in a format that can be used by the tools currently in use in our discipline, normally flat ASCII files; other formats can be accepted, but are converted into flat files.

A full description of the standard conventions used for the documentation of the catalogues is available at URL http://vizier.u-strasbg.fr/doc/catstd.htx . The present document only tries to answer some frequently asked questions about how to prepare the data for their inclusion in the Data Center documents. It covers the following topics: how to prepare the data files, how to fill the ReadMe description file, how to deposit the data, what happens to your data, and contacts.

1 How to prepare the Data files

It is assumed that each component of the data set is stored in a file; each file can represent a table, a spectrum (1-D data), or an image (2-D data). As a general rule, plain ASCII data files (also called flat files) are preferred, simply because such files can always be processed. More explicitly, the following formats can be used:

for tables and catalogues: ASCII (simple flat files), with their structure (description of columns) detailed in the ReadMe file. Some other data formats can be accepted, but are converted into flat files: LaTeX, FITS, or TSV / CSV. TSV (tab-separated values) and CSV (character-separated values) are presentations in which a dedicated character (the tab in TSV, or a punctuation character in CSV, typically the semicolon) is used as a column separator; this is one of the output formats offered by spreadsheets.
What cannot be used: PostScript, or the internal document formats of Word or Excel.
for spectra (1-D data): either FITS file(s), or 2-column ASCII tables.
What cannot be used: PostScript, Word/Excel documents, GIF or JPEG images.
for images (2-D data): FITS is the preferred format; for images of the sky, the inclusion of the FITS-WCS (World Coordinate System) parameters describing the conversion between celestial coordinates and pixel position is strongly encouraged.
What cannot be used: PostScript, Word/Excel documents.

Therefore: never submit PostScript files. PostScript is a language designed for printers, not for storing scientific data!
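As an illustration of the conversion from CSV to a flat file mentioned above, here is a minimal Python sketch. It is not the procedure used by the data centers; the function name csv_to_flat, the sample records, and the choices made (semicolon separator, no header row, left-justified fields, one blank between columns) are all assumptions made for the example.

```python
import csv
import io


def csv_to_flat(text, delimiter=";"):
    """Turn delimiter-separated text into fixed-width lines (hypothetical helper)."""
    # Assumption: no header row; empty lines are skipped.
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    ncols = max(len(row) for row in rows)
    # Strip each cell and pad short rows so every record has the same number of fields.
    rows = [[cell.strip() for cell in row] + [""] * (ncols - len(row)) for row in rows]
    # Each column is as wide as its longest value.
    widths = [max(len(row[i]) for row in rows) for i in range(ncols)]
    # Left-justify every field to its column width, with one blank between columns.
    return [" ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in rows]


if __name__ == "__main__":
    sample = "HD 1;12.5;G0\nHD 22;7.25;K2III\n"  # made-up records
    for line in csv_to_flat(sample):
        print(line)
```

Every output line has the same length, so each field starts at a fixed position that can then be documented in the ReadMe file.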

A short word about file naming conventions: according to the ISO 9660 standard, file names are restricted to 8 + 3 characters: at most 8 characters in the set [a-z0-9_-], followed by a dot and an extension made of 3 characters, with the following conventions: .dat for data files, .fit for FITS files, .tex for TeX/LaTeX files, and .txt for text files (ASCII files containing only printable text).
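The naming rule above can be checked before submission with a short Python sketch. The helper name is_valid_name is hypothetical, and two points are assumptions of this example rather than statements of the standard: only the four extensions listed above are accepted, and the ReadMe file is treated as a special case because its name does not follow the 8 + 3 pattern.

```python
import re

# At most 8 characters from [a-z0-9_-], a dot, then one of the listed extensions.
NAME_RE = re.compile(r"[a-z0-9_-]{1,8}\.(dat|fit|tex|txt)")


def is_valid_name(filename):
    """Return True if filename follows the 8 + 3 convention (hypothetical helper)."""
    if filename == "ReadMe":  # assumption: the description file is exempt
        return True
    return NAME_RE.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ["table1.dat", "Table1.dat", "verylongname.dat", "ReadMe"]:
        print(name, is_valid_name(name))
```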

Full details about the files and directories structures can be found in the Adopted Standards for Catalogues document.

2 How to fill the `ReadMe` description file

This file is aimed at describing all data files stored in a catalogued data set, and at providing the necessary explanations and references to the stored material.

All catalogues available at CDS and in associated astronomical data centers have such an associated file, and numerous examples can be found on the FTP directories at CDS.

A full description of the conventions used in this ReadMe file can be found in the Standards for Astronomical Catalogues, and a template is readily accessible for A&A tables. A typical example is J/A+A/382/389/ReadMe. Short explanations about how to fill the ReadMe file:

the Keywords: part lists the following keywords:
- ADC_Keywords introduces the list of data-related keywords, out of a controlled set (see also examples at ADC).
- Keywords: introduces the list of keywords as in the printed publication
Unlike the Keywords: set, which is generally related to the scientific goal of a paper, the ADC_Keywords are strictly related to the tabular material collected in the paper.
the Description: section is expected to describe the context of the data, like the instrumentation used or the observing conditions -- it therefore differs from the Abstract which tends to describe the scientific results that the author derived from the data.
the File Summary: section describes the files making up the set: for each file are specified its filename, the length of the longest line (lrecl), the number of records (number of lines), and a caption (short title of the file). Lengthy notes can be added if necessary.

the Byte-by-byte Description of file: section describes the structure of each of the data files (files with the .dat extension). This description is made in a tabular form, each row describing one field (column) of the data file. The description contains the following columns:

the starting column of the data field

the format of the field as a fortran-like format:

An	for a character column made of n characters;
In	for a column containing an integer number of n digits;
Fn.d	for a column containing a number n characters wide, with up to d digits in the fractional part;
En.d or Dn.d	for a number using the exponential notation.

the units used in the field; the use of SI units is strongly encouraged, and CGS units should be avoided (for instance, use mW/m² instead of ergs/s/cm²).
the label (heading) of the field, made of a single word (no embedded blank); a few basic conventions are used for usual parameters (e.g. positions) and related quantities (e.g. mean errors).
the explanations can start with the following special characters related to some important data characteristics:

* (the asterisk) indicating a lengthy note

[...] (square brackets) indicating data ranges

? (question mark) indicating a possibility of blank or NULL (unspecified) values

the References: section contains the necessary references; the usage of the bibcode is strongly encouraged. For large sets of references, it is suggested to gather them into a dedicated reference file named refs.dat .
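To make the File Summary and Byte-by-byte Description sections more concrete, the following Python sketch computes the record length (lrecl) and number of records of a data file, and reads one record from a list of (starting column, format, label) entries. It is a simplified illustration, not the checking code used by the data centers: the helper names, the field list, and the sample records are invented for the example, and a blank field is simply returned as None, as allowed by a "?" in the explanations.

```python
import re

# Fortran-like formats described above: An, In, Fn.d, En.d, Dn.d
FORMAT_RE = re.compile(r"([AIFED])(\d+)(?:\.(\d+))?")


def file_summary(lines):
    """Return (lrecl, number of records) for a list of lines without newlines."""
    return max(len(line) for line in lines), len(lines)


def parse_record(line, fields):
    """Split one line using (start, format, label) triples; start is 1-based."""
    record = {}
    for start, fmt, label in fields:
        match = FORMAT_RE.fullmatch(fmt)
        if match is None:
            raise ValueError(f"unsupported format: {fmt}")
        kind, width = match.group(1), int(match.group(2))
        text = line[start - 1:start - 1 + width].strip()
        if not text:
            record[label] = None  # blank (unspecified) value
        elif kind == "A":
            record[label] = text
        elif kind == "I":
            record[label] = int(text)
        else:
            # F, E and D formats; Python expects "E" rather than "D" exponents.
            record[label] = float(text.replace("D", "E").replace("d", "e"))
    return record


# Hypothetical description of a 15-byte record.
FIELDS = [(1, "A5", "Name"), (7, "F5.2", "Vmag"), (13, "I3", "Nobs")]

if __name__ == "__main__":
    lines = ["HD 1   7.25  12", "HD 22 12.50    "]  # made-up data
    print(file_summary(lines))
    for line in lines:
        print(parse_record(line, FIELDS))
```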

3 How to deposit the data

If not too bulky, data files with their ReadMe file can be uploaded from

http://cdsweb.u-strasbg.fr/cgi-bin/Submit

where some basic checks on the ReadMe and data files are performed. The checking procedure is also available as the anafile package, which can be installed on Linux with the standard configure and make procedures (see its man page).

Alternatively, you can:

upload the files with their ReadMe via ftp (recommended for large files) at the node

cdsarc.u-strasbg.fr (130.79.128.5)

in a subdirectory of the incoming directory. Use anonymous as userid, and your e-mail address as password, and move to the incoming directory with the command
cd incoming
Don't worry if the dir command then answers
550 No files found.
This directory is protected so that the file names cannot be listed.
There, create a directory with a name of your choice (e.g. your name, or the A&A reference, but without blanks or special characters) with the command:
mkdir my_choice
(NOTICE: remember the name you chose, since it cannot be listed later!)
Then move to the directory you just created with the command:
cd my_choice
Then deposit your files with put or mput commands.
Finally, send an e-mail to cats(at)simbad.u-strasbg.fr saying where you have placed the files.
e-mail your files to cats(at)simbad.u-strasbg.fr if they are not too bulky (less than a few megabytes);
or mail the data (CD or DVD) to the attention of

Dr François Ochsenbein
Centre de Données astronomiques
11, rue de l'Université
67000 STRASBOURG, France
francois(at)astro.u-strasbg.fr
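The FTP deposit steps described above (anonymous login, cd incoming, mkdir, cd, put) can be scripted. The following Python sketch uses the standard ftplib module; the function names deposit and main are hypothetical, the host name is the one quoted in the text, and the sketch has not been checked against the live server, so treat it as an outline rather than a tested procedure.

```python
from ftplib import FTP
from pathlib import Path

HOST = "cdsarc.u-strasbg.fr"  # node quoted in the text above


def deposit(ftp, directory, paths):
    """Create `directory` under incoming/ and upload each file into it."""
    ftp.cwd("incoming")
    ftp.mkd(directory)  # remember this name: the directory cannot be listed
    ftp.cwd(directory)
    for path in map(Path, paths):
        with path.open("rb") as handle:
            ftp.storbinary(f"STOR {path.name}", handle)


def main(email, directory, paths):
    """Log in anonymously, with the e-mail address as password, and deposit."""
    with FTP(HOST) as ftp:
        ftp.login(user="anonymous", passwd=email)
        deposit(ftp, directory, paths)


# Example call (not executed here; the address and names are placeholders):
# main("me@example.org", "my_choice", ["ReadMe", "table1.dat"])
```

The upload logic is kept separate from the connection so that it can be tried against a stand-in object before any real transfer. An e-mail announcing the deposit is still required afterwards.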

4 What happens to your data

At the CDS, some checking procedures are executed to verify the compatibility between the data files and their description. This can lead to exchanges with the authors, which we try to keep to a minimum. Once the data are public, they are accessible as plain files in FTP directories at CDS and other participating data centers (e.g. at CfA/Harvard (USA), CADC (Canada), or NAOJ/ADAC (Japan)). The data are also added to the VizieR service, with mirrors at CfA/Harvard (USA), CADC (Canada), NAOJ/ADAC (Japan), Cambridge (UK), IUCAA (India), BAO (China).

5 Contacts

For any question related to the preparation of the data, for problems related to non-standard data formatting, or for any other difficulty in the management or the transfer of the electronic tables, contact François Ochsenbein directly (francois(at)astro.u-strasbg.fr).
