The Edge-on Galaxy Database

Edge-on galaxy candidates in the Pan-STARRS1 DR2 survey found by ANN

Examples

Catalogs and samples of galaxies with well-defined selection criteria and high-precision observational data on morphology, photometry, structural parameters, redshifts, and internal kinematics are extremely important for comparing modern cosmological models with real observations. The goal of this project is to create a catalog of galaxies seen nearly edge-on. We plan to increase the number of known edge-on galaxies dramatically, thanks to the better sky coverage of the Pan-STARRS survey.

Team

  • Alexandra Antipova (SAO RAS)
  • Dmitry Bizyaev (APO, SAI MSU)
  • Svyatoslav Borisov (SAI MSU)
  • Stefan Kautsch (NOVA)
  • Dmitry Makarov (SAO RAS)
  • Lidia Makarova (SAO RAS)
  • Alexander Marchuk (SPbU)
  • Alexander Mosenkov (Pulkovo Observatory)
  • Vladimir Reshetnikov (SPbU)
  • Sergey Savchenko (SPbU)
  • Iliya Tikhonenko (SPbU)
  • Pavel Usachev (SPbU)

Imfit 2D decomposition of the PS1 edge-on galaxy candidates (DR1)

We have started a new project on the 2D decomposition of edge-on galaxies.

Gallery

Superthin galaxy with a prominent pseudo-bulge
One more example
S0 disk with a pseudo-bulge
Truncated disk with a dust lane
Three aligned galaxies
An LSB face-on galaxy projected onto an edge-on one
Overheated disk
Polar ring (45 degrees)
Interaction
Merging
Integral shape
Chain

Final sample

The final sample of edge-on galaxies in the Pan-STARRS DR2 survey contains 18314 candidates that meet the following conditions:
  • There are no "wrong" votes in visual classification.
  • There is at least one "good" vote.
  • The number of "unsuitable" votes is less than 75% of the total vote number.
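The three conditions above can be expressed as a single predicate. The sketch below is a minimal illustration, assuming each candidate's votes are stored as per-class counts in a hypothetical `Votes` container; the field names and the choice to count votes of all five classes in the total are assumptions, not the project's actual database schema.

```python
from dataclasses import dataclass


@dataclass
class Votes:
    """Per-class vote counts for one candidate (hypothetical container)."""
    good: int = 0
    acceptable: int = 0
    unsuitable: int = 0
    problems: int = 0
    wrong: int = 0

    @property
    def total(self) -> int:
        # Assumption: every vote cast, in any class, counts toward the total.
        return self.good + self.acceptable + self.unsuitable + self.problems + self.wrong


def in_final_sample(v: Votes) -> bool:
    """Return True if the candidate satisfies all three final-sample conditions."""
    return (
        v.wrong == 0                          # no "wrong" votes
        and v.good >= 1                       # at least one "good" vote
        and v.unsuitable < 0.75 * v.total     # "unsuitable" below 75% of all votes
    )
```

For example, a candidate with two GOOD votes and one Unsuitable vote passes, while one with one GOOD and three Unsuitable votes does not (3 is not below 75% of 4).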

The final sample contains 3350 of the 5749 EGIS galaxies and 2250 of the 3029 RFGC galaxies in the PS1 zone (Dec > -30).

Photometry comparison

We compared our automatic SExtractor (magPetro) r-band Pan-STARRS photometry with the SDSS-based aperture photometry of EGIS galaxies.
Photometry comparison
The grey dots represent galaxies common to our list of candidates and the EGIS catalog by Bizyaev et al. (2014). The large cyan dots are the running median values, and the error bars show the 25% and 75% quartiles of the point distribution. The horizontal dashed blue line is the median difference between PS1 and SDSS photometry for galaxies in the range [15, 17.2] SDSS magnitudes. There is no significant trend between the two photometries from the different surveys between ~13 and 17.2 mag. The clear trend for galaxies brighter than 13 mag illustrates the well-known problem with extended objects in the Pan-STARRS1 survey [1]. Faint objects with r_PS1 > 17.2 mag show the opposite effect, which can be interpreted as Malmquist bias.

The systematic difference between the PS1 and SDSS photometry is small:
rPetro: Me(PS1-SDSS) = 0.00365 [-0.03068, 0.03588] Sigma = 0.0495
rKron: Me(PS1-SDSS) = 0.02828 [-0.00373, 0.06062] Sigma = 0.0478
The values in brackets are the 25% and 75% quartiles. The standard deviation is estimated from the median absolute deviation as Sigma = 1.4826*MAD.
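A minimal sketch of these robust statistics using only Python's `statistics` module: the median difference Me, its 25% and 75% quartiles, the MAD-based Sigma, and a binned running median like the one drawn in the plot. The function names and the fixed-bin scheme are assumptions, and the quartile method (the `statistics.quantiles` default, "exclusive") may differ slightly from the one actually used. To reproduce the numbers above, `robust_summary` would be applied to galaxies in the [15, 17.2] mag range.

```python
import statistics

MAD_TO_SIGMA = 1.4826  # converts MAD to a standard deviation for Gaussian data


def robust_summary(diffs):
    """Median, 25%/75% quartiles and MAD-based sigma of PS1-SDSS differences."""
    med = statistics.median(diffs)
    q25, _, q75 = statistics.quantiles(diffs, n=4)
    mad = statistics.median(abs(d - med) for d in diffs)
    return med, q25, q75, MAD_TO_SIGMA * mad


def running_median(mags, diffs, edges):
    """(bin centre, median, q25, q75) of diffs in magnitude bins [edges[i], edges[i+1])."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = [d for m, d in zip(mags, diffs) if lo <= m < hi]
        if len(sel) >= 2:  # statistics.quantiles needs at least two points
            med, q25, q75, _ = robust_summary(sel)
            out.append((0.5 * (lo + hi), med, q25, q75))
    return out
```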

Photometry refinement

From an analysis of the statistics and a visual inspection of the candidates, we compiled a list of about 5500 objects with problems in the automatic SExtractor photometry. A script was written to tune the parameters manually and improve the results. The goal was to choose the parameters "nthresh" and "nthresh" so that, in all five grizy filters, the regions belonging to the galaxy are extracted as well as possible without excessive segmentation on the one hand, while background stars and galaxies are separated on the other. SExtractor photometry was then rerun for all objects using the individual parameters.
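One way to organise such per-object tuning is to keep a table of default extraction settings and override it only for the problematic objects, reusing each tuned set in all five grizy filters. The sketch below is hypothetical: the key names mirror the keyword arguments of `sep.extract` (a Python port of SExtractor), and the default values are placeholders, not the project's settings.

```python
# Hypothetical default settings; the key names follow the keyword arguments of
# sep.extract (a Python port of SExtractor). "thresh" is a placeholder value.
DEFAULT_PARAMS = {"thresh": 1.5, "minarea": 5, "deblend_nthresh": 32, "deblend_cont": 0.005}
FILTERS = "grizy"


def params_for(obj_id, overrides):
    """Defaults, updated with the manually tuned values for this object (if any)."""
    params = dict(DEFAULT_PARAMS)  # copy so the defaults are never modified
    params.update(overrides.get(obj_id, {}))
    return params


def extraction_plan(object_ids, overrides):
    """Yield (object_id, filter, params): one parameter set per object, used in all five filters."""
    for obj_id in object_ids:
        params = params_for(obj_id, overrides)
        for band in FILTERS:
            yield obj_id, band, params
```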

Cross-identification

The candidates were cross-identified with galaxies from the HyperLeda database. 21359 of the 22731 objects were identified with known galaxies, so 1372 candidates turned out to be new galaxies. Among the identified objects, three pairs of candidates turned out to belong to the same galaxies. The list of Pan-STARRS edge-on galaxy candidates contains 3590 of the 5749 EGIS galaxies and 2435 of the 3029 RFGC galaxies that fall within the Dec > -30 area.
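Positional cross-identification can be sketched as a nearest-neighbour match on the sky. The brute-force version below assumes a hypothetical 10 arcsec matching radius and simple `(id, ra, dec)` tuples in degrees; it also flags catalog galaxies matched by more than one candidate, the situation described above for three pairs. For catalogs of this size a spatial index, or astropy's `SkyCoord.match_to_catalog_sky`, would be the practical choice.

```python
import math
from collections import Counter


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) positions in degrees (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(candidates, catalog, radius_arcsec=10.0):
    """Map each candidate id to the nearest catalog id within the radius, or None (new galaxy)."""
    matches = {}
    for cid, ra, dec in candidates:
        best_id, best_sep = None, radius_arcsec / 3600.0
        for gid, gra, gdec in catalog:  # brute force: O(N * M)
            sep = angular_sep_deg(ra, dec, gra, gdec)
            if sep <= best_sep:
                best_id, best_sep = gid, sep
        matches[cid] = best_id
    return matches


def shared_matches(matches):
    """Catalog ids matched by more than one candidate (several candidates, one galaxy)."""
    counts = Counter(g for g in matches.values() if g is not None)
    return {g for g, n in counts.items() if n > 1}
```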

Interesting cases

Jellyfish-like galaxy
Tidal streams
Very prominent polar ring?
Interaction
Foreground objects that look like a jellyfish galaxy
The superimposed LSB galaxy

Classification system

Software for the visual inspection and classification of the candidates was developed. Its interface is similar to Zooniverse, but it is directly connected to the database. Its advantage is the use of Aladin Lite for visualising the objects: Aladin makes it possible to inspect an object in different sky surveys and to zoom the image.

Statistics on classification by people

This plot shows the fraction of votes that each participant gave to each class of edge-on galaxy candidates.
People statistics
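The per-participant fractions in this plot could be computed as follows: a sketch assuming the votes are available as `(participant, class)` pairs using the class labels of the classification scheme.

```python
from collections import Counter, defaultdict

CLASSES = ("GOOD", "Acceptable", "Unsuitable", "Problems", "Wrong")


def vote_fractions(votes):
    """votes: iterable of (participant, class_label). Returns {participant: {class: fraction}}."""
    per_person = defaultdict(Counter)
    for person, label in votes:
        per_person[person][label] += 1
    fractions = {}
    for person, counts in per_person.items():
        total = sum(counts.values())  # all votes by this participant
        fractions[person] = {c: counts[c] / total for c in CLASSES}
    return fractions
```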

Artificial tests

The completeness map shows the probability that our ANN algorithm detects an edge-on galaxy candidate, as a function of the central surface brightness and the exponential radial scale length.
Completeness map
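Given the results of the artificial-galaxy tests, a completeness map is the fraction of injected galaxies recovered in each bin of central surface brightness and scale length. This sketch assumes each trial is recorded as `(mu0, h, detected)`; the bin edges are up to the user, bins are half-open `[lo, hi)`, and empty bins are returned as `None`.

```python
import bisect


def bin_index(x, edges):
    """Index of the half-open bin [edges[i], edges[i+1]) containing x, or None."""
    i = bisect.bisect_right(edges, x) - 1
    return i if 0 <= i < len(edges) - 1 else None


def completeness_map(trials, mu_edges, h_edges):
    """trials: iterable of (mu0, h, detected). Returns a 2D list of detection fractions."""
    n_mu, n_h = len(mu_edges) - 1, len(h_edges) - 1
    injected = [[0] * n_h for _ in range(n_mu)]
    found = [[0] * n_h for _ in range(n_mu)]
    for mu0, h, detected in trials:
        i, j = bin_index(mu0, mu_edges), bin_index(h, h_edges)
        if i is None or j is None:
            continue  # outside the mapped parameter range
        injected[i][j] += 1
        found[i][j] += bool(detected)
    return [[found[i][j] / injected[i][j] if injected[i][j] else None
             for j in range(n_h)] for i in range(n_mu)]
```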

Visual inspection. Stage 1

We have finished the first stage of the visual classification of the candidates using the Zooniverse platform.

Accepted designations

GOOD
Edge-on galaxy: 85 ≤ i ≤ 90 (good for analysis)
Acceptable
Nearly edge-on galaxy: 80 ≤ i ≤ 85 (acceptable for analysis)
Unsuitable
Not an edge-on galaxy, but possibly a real galaxy or a pair of galaxies
Problems
Stars or defects significantly overlap with the body of the edge-on galaxy and could compromise further analysis
Wrong
Not a galaxy at all: image defects or asterisms.

These cases need inspection:
Impossible combination: GOOD+Wrong 71
Impossible combination: Acceptable+Wrong 98
Impossible combination: Unsuitable+Wrong 81
Suspicious combination: GOOD+Unsuitable 4013

The "impossible" combinations for 177 galaxies were revised and erroneous classifications were removed from the analysis.
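Flagging the combinations listed above is a set-membership check on the classes each candidate received. A minimal sketch, assuming the labels for one candidate are available as a collection of class names:

```python
IMPOSSIBLE = [("GOOD", "Wrong"), ("Acceptable", "Wrong"), ("Unsuitable", "Wrong")]
SUSPICIOUS = [("GOOD", "Unsuitable")]


def flag_combinations(labels):
    """Return (impossible, suspicious) combinations present among a candidate's labels."""
    present = set(labels)
    impossible = [f"{a}+{b}" for a, b in IMPOSSIBLE if a in present and b in present]
    suspicious = [f"{a}+{b}" for a, b in SUSPICIOUS if a in present and b in present]
    return impossible, suspicious
```

A candidate that received GOOD, Unsuitable and Wrong votes falls into two impossible combinations and one suspicious one.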

Statistics

Voted as                       Count
GOOD                           11877
Silver sample: GOOD ≥ 50%      4945
Gold sample: GOOD ≥ 80%        1794
Acceptable                     16390
Unsuitable                     12882
Wrong                          4132
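The silver and gold thresholds can be written with integer arithmetic to avoid floating-point edge cases at exactly 50% or 80%. This sketch assumes the percentage is GOOD votes divided by all votes for the candidate, and that the tiers are nested (every gold galaxy also counts as silver); the table does not state either explicitly.

```python
from collections import Counter


def tiers(n_good, n_total):
    """Sample tiers for one candidate; nested, so gold implies silver."""
    out = set()
    if n_total > 0 and 2 * n_good >= n_total:        # GOOD >= 50%
        out.add("silver")
    if n_total > 0 and 5 * n_good >= 4 * n_total:    # GOOD >= 80%
        out.add("gold")
    return out


def tier_counts(candidates):
    """candidates: iterable of (n_good, n_total). Returns counts per tier."""
    return Counter(t for n_good, n_total in candidates for t in tiers(n_good, n_total))
```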

Artificial galaxy tests

To estimate the completeness of the catalog of edge-on galaxies in Pan-STARRS, we ran a set of tests with artificial galaxies. Each artificial galaxy is a pure exponential disk with varying central surface brightness, radial exponential scale length, and vertical-to-radial scale length ratio. The artificial galaxies are placed at random positions, with random position angles, in selected Pan-STARRS images. The image shows a preliminary map of the completeness of our edge-on galaxy catalog, calculated by Sergey Savchenko.
Preliminary completeness map
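A sketch of how such a model image could be generated with NumPy: an exactly edge-on disk with exponential radial and vertical profiles, integrated numerically along the line of sight. Assumptions: the inclination is fixed at 90°, `mu0` is taken as the surface brightness at the centre of the projected image, the photometric zero point `zp` is a placeholder, the position angle is measured from the array's +x axis, and 0.25 arcsec/pixel is used as the Pan-STARRS1 pixel scale.

```python
import numpy as np


def edge_on_disk(shape, x0, y0, pa_deg, mu0, h_pix, z0_over_h, zp=25.0, pix_scale=0.25):
    """Model image (counts/pixel) of an exactly edge-on, pure exponential disk.

    Luminosity density j ~ exp(-R/h) * exp(-|z|/z0) is summed along the line of
    sight s, with R = sqrt(r**2 + s**2) and r the distance along the major axis.
    """
    ny, nx = shape
    y, x = np.mgrid[0:ny, 0:nx].astype(float)
    pa = np.radians(pa_deg)  # from the +x axis of the array (assumption)
    dx, dy = x - x0, y - y0
    r = dx * np.cos(pa) + dy * np.sin(pa)    # along the major axis
    z = -dx * np.sin(pa) + dy * np.cos(pa)   # perpendicular to the disk plane
    s = np.linspace(-10 * h_pix, 10 * h_pix, 401)  # line-of-sight samples, +/- 10 h
    radial = np.exp(-np.sqrt(r[..., None] ** 2 + s ** 2) / h_pix).sum(axis=-1)
    radial0 = np.exp(-np.abs(s) / h_pix).sum()     # same sum at r = 0, for normalisation
    i0 = 10 ** (-0.4 * (mu0 - zp)) * pix_scale ** 2  # central counts per pixel
    return i0 * (radial / radial0) * np.exp(-np.abs(z) / (z0_over_h * h_pix))
```

Adding the model to a real cutout, e.g. `cutout + edge_on_disk(cutout.shape, x0, y0, pa, mu0, h, q)` with position and angle drawn from `numpy.random.default_rng()`, mimics the injection described above; the model's own photon noise is ignored.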

Visual inspection of the candidates

Despite the huge improvement in the quality of edge-on galaxy selection achieved with the ANN approach, the final sample contains a significant number of objects that the ANN falsely identified as edge-on galaxies. Typical misclassified objects are asterisms, image defects, bright stars, and real galaxies that are not seen edge-on. The interface was created by Dmitry Bizyaev using the Zooniverse platform. The instructions were prepared by Alexander Mosenkov.

The Edge-on Galaxy database

This project was added to the Edge-on Galaxies Database by Dmitry Makarov.

Training of the artificial neural network

To select edge-on galaxies, an artificial neural network was trained on the sample of Edge-on Galaxies in SDSS (EGIS, Bizyaev et al. 2014) using the open-source machine learning platform TensorFlow. This raised the fraction of correct detections to 99.3%. Finally, five models were trained and used simultaneously, which allowed us to reduce the number of false detections significantly. As a result, 26587 edge-on galaxy candidates were selected. This stage was carried out by Sergey Savchenko.
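The text does not say how the outputs of the five models were combined. Two common rules are sketched below as assumptions: requiring every model to vote "edge-on" (which minimises false detections) or thresholding the mean probability.

```python
from statistics import fmean
from typing import Sequence


def ensemble_is_edge_on(probs: Sequence[float], threshold: float = 0.5, rule: str = "all") -> bool:
    """Combine per-model edge-on probabilities (e.g. softmax outputs) into one decision."""
    if not probs:
        raise ValueError("need at least one model output")
    if rule == "all":      # unanimous: every model must reach the threshold
        return all(p >= threshold for p in probs)
    if rule == "mean":     # average the probabilities, then threshold
        return fmean(probs) >= threshold
    raise ValueError(f"unknown rule: {rule!r}")
```

With outputs `[0.9, 0.8, 0.95, 0.7, 0.4]`, the unanimous rule rejects the object while the mean rule (0.75) accepts it, which is why a unanimity requirement cuts false detections at some cost in completeness.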

  1. The EGIS galaxy sample was split into a training subsample (80% of the original galaxies) and a test subsample (the remaining 20%). The network was trained on the training subsample, and the fraction of correct identifications was evaluated on the test subsample. To enlarge the original sample, data augmentation was used: the original galaxies were slightly modified (mirror reflection, rotation by an arbitrary angle, rescaling, addition of noise, etc.). As a result, 300,000 objects for the training set were generated from the original EGIS sample.
  2. Convolutional neural networks were used. The final architecture consists of:
    • Convolutional layer 5x5x16 (5x5 kernel, 16 filters)
    • Convolutional layer 5x5x16
    • Batch-normalization layer
    • Max-pooling layer
    • Dropout layer
    • Convolutional layer 5x5x32 (5x5 kernel, 32 filters)
    • Convolutional layer 5x5x32
    • Batch-normalization layer
    • Max-pooling layer
    • Dropout layer
    • Convolutional layer 5x5x64 (5x5 kernel, 64 filters)
    • Convolutional layer 5x5x64
    • Batch-normalization layer
    • Max-pooling layer
    • Dropout layer
    • Fully connected layer of size 500
    • Fully connected layer of size 2 (the actual edge-on / not edge-on decision)
  3. ReLU is used as the activation function in all layers except the last one, which uses softmax. Input images are rescaled to 48x48 pixels.

    For reference:

    batch-norm layer
    normalizes the input values to avoid very large and very small values.
    max-pooling
    reduces the dimensionality of the data by keeping only the maximum element from each NxN block (2x2 was used).
    dropout
    makes a certain fraction of neurons inactive: at each training step, random neurons in the layer are switched off. This prevents overfitting, where the network simply memorizes specific images. Because some neurons are switched off, the network has to act more cleverly and look for characteristic patterns instead of memorizing specific galaxies, which is exactly what we need. Once training is finished and the network is used to analyse data, the dropout layers are ignored and all neurons are active (their larger number is compensated by normalization). A dropout fraction of 0.3 was used in this work.
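The three auxiliary layers described above can be illustrated with NumPy. This is a didactic sketch, not the TensorFlow implementation used in the project: `dropout` follows the "inverted" convention (surviving activations are rescaled by 1/(1 - rate) during training, as Keras's `Dropout` layer does), so nothing needs rescaling at inference; `batch_norm` uses the statistics of the current batch only, whereas a trained layer uses running averages at inference.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max-pooling of an (H, W) or (H, W, C) array; an odd last row/column is trimmed."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2, *x.shape[2:]).max(axis=(1, 3))


def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest by 1/(1 - rate)."""
    if not training or rate == 0.0:
        return x  # at inference every unit is active and no scaling is applied
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise each feature over the batch axis (axis 0) to zero mean and unit variance."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```

With `rate=0.3`, as chosen in this work, about 70% of activations survive each training step, and the 1/0.7 rescaling keeps their expected sum equal to the inference-time value.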

First attempt

In the first stage, we investigated whether the automatic object catalog generated by Pan-STARRS could be used to select edge-on galaxies. As a reference sample, we used the Revised Flat Galaxy Catalogue (RFGC) by I. Karachentsev et al. (1999). The aims were

  1. to find a correspondence between the properties of the RFGC galaxies and those of the objects in the automatic Pan-STARRS catalog
  2. to choose a selection criterion and to estimate the loss rate and the number of wrong objects

Unfortunately, the analysis showed no correlation between the properties of the RFGC galaxies and those of the corresponding objects in the automatic catalog. Evidently, the automatic algorithm performs poorly for extended, highly elongated objects. Attempts to choose selection criteria showed that reducing the loss rate to an acceptable level (below 20%) leads to a catastrophic increase in the amount of "trash" in the final sample: the good-to-bad ratio was 1 to 500 or worse, which was totally unacceptable for the purposes of the project. The detailed analysis and report were prepared by Iliya Tikhonenko.
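The two quantities estimated in this first attempt, the loss rate and the good-to-bad ratio, can be computed for any candidate cut as sketched below. Assumptions: objects are labelled by whether they have an RFGC counterpart, the `axis_ratio` property in the test example is hypothetical, and every selected non-RFGC object is counted as "bad", which overstates contamination if some are real edge-on galaxies missing from RFGC.

```python
import math


def evaluate_cut(objects, is_selected):
    """Loss rate and good-to-bad ratio of a selection rule.

    objects: iterable of (properties, has_rfgc_counterpart) pairs.
    """
    n_ref = n_ref_selected = n_other_selected = 0
    for props, is_ref in objects:
        chosen = bool(is_selected(props))
        if is_ref:
            n_ref += 1
            n_ref_selected += chosen
        else:
            n_other_selected += chosen  # counted as "trash" (upper bound)
    loss_rate = 1 - n_ref_selected / n_ref if n_ref else math.nan
    if n_other_selected:
        good_to_bad = n_ref_selected / n_other_selected
    else:
        good_to_bad = math.inf if n_ref_selected else math.nan
    return loss_rate, good_to_bad
```

Scanning a threshold and plotting loss rate against good-to-bad ratio makes the trade-off described above explicit: any cut that keeps the loss rate below 20% can be read off directly with its contamination.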