The Neptune Database - an Introduction

The Neptune database is a relational database of microfossil occurrence records from DSDP and ODP publications. It was produced by David Lazarus, Cinzia Spencer-Cervato, Hans Thierstein and colleagues at ETH-Zurich and has subsequently been implemented in various projects. It is currently being developed by David Lazarus, Heiko Pälike and colleagues, who have made a version available to us. An extensive publication on the database is: Spencer-Cervato, C. (1999). The Cenozoic Deep Sea Microfossil Record: Explorations of the DSDP/ODP Sample Set Using the Neptune Database. Palaeontologia Electronica, 2(2, art.4): 1-268. See also Lazarus, D. B. (1994). Neptune: A marine Micropaleontology Database. Mathematical Geology, 26(7): 817-832.

Access to the nannoplankton data is possible from here, thanks to David Lazarus and Johan Renaudie. Following this link logs you in automatically, so you do not need to create an account. The "about" link on the Neptune site provides more information and contacts.

The database includes over 17,000 nannofossil samples and over 202,000 nannofossil occurrence records. For the other groups the totals are lower but of similar scale. So it is a very large data source, and there has been significant effort to enhance its utility through production of uniform age models for each site and careful synonymising of taxa (for planktic forams this was done by Brian Huber; for nannofossils it was done initially by Katharina von Salis, with subsequent updating by ourselves: Jeremy Young, Paul Bown, Jackie Lees). However, the database does have major limitations.

So, the database is noisy and its reliability declines as we go back through the geological record.
Nonetheless it is by far the biggest database on nannofossil occurrences available to us.
Number of nannofossil samples per 2Ma time bin, showing massive bias toward the recent.
Cretaceous nannofossil samples. Coverage is not quite as bad as it looks on the upper graph, but it is very thin.

What we have done

  1. To enable comparisons with modern data, all age assignments were recalculated to the GTS2012 timescale.
  2. Samples were grouped into 1Ma time bins. This is the finest sampling that is justified for the Paleogene and Cretaceous.
  3. In each time bin, for each taxon, the number of samples in which the taxon occurs was determined and divided by the total number of samples in the time bin. This gives us the occurrence frequency of the taxon in the time bin, which we express as a percentage. For example, there are 46 planktonic foram samples in the database of age 34-35Ma and Hantkenina alabamensis is recorded in 19 of them, so it has an occurrence frequency of 41.3%. This statistic is independent of the number of samples, although obviously with fewer samples the data is less reliable.
  4. Scripts have been written to display this data on individual species pages and to allow plotting of data for several species. These plots are produced with the JavaScript library RaphaelJS.
  5. We have used these plots to identify anomalies and clean up the data - in particular we have improved age-models for the Cretaceous. This work has been done collaboratively with the Neptune team in Berlin.
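
The binning and occurrence-frequency calculation in steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the scripts used on this site: the function name, the (age, set of taxa) sample layout and the convention that a sample lying exactly on a bin boundary falls in the older bin are all assumptions made for the example.

```python
import math
from collections import defaultdict

def occurrence_frequency(samples, taxon, bin_width=1.0):
    """Occurrence frequency of `taxon` per time bin.

    samples: iterable of (age_ma, set_of_taxon_names) pairs (hypothetical layout).
    Returns {bin_start_ma: (samples_with_taxon, total_samples, percent)}.
    """
    totals = defaultdict(int)  # samples per bin
    hits = defaultdict(int)    # samples per bin that record the taxon
    for age_ma, taxa in samples:
        # Assumed convention: bin covers [bin_start, bin_start + bin_width)
        bin_start = math.floor(age_ma / bin_width) * bin_width
        totals[bin_start] += 1
        if taxon in taxa:
            hits[bin_start] += 1
    return {b: (hits[b], n, 100.0 * hits[b] / n) for b, n in sorted(totals.items())}

# The worked example from step 3: 46 samples of age 34-35Ma, 19 with the taxon.
taxon = "Hantkenina alabamensis"
samples = [(34.5, {taxon})] * 19 + [(34.5, set())] * 27
with_taxon, total, percent = occurrence_frequency(samples, taxon)[34.0]
print(with_taxon, total, round(percent, 1))
```

Returning the raw counts alongside the percentage keeps the sample number available, which matters because the same percentage is much less reliable when it rests on few samples.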

Graphs on species pages

Data for Discoaster deflandrei. This shows accurately that it is a very abundant species in the Oligocene and Early Miocene, with lower abundance in the Late and Middle Eocene, but there are also tails of rare reported occurrences outside its true range.
Don't over-interpret this data

These graphs show the percentage of samples in which the taxon occurs, per 1Ma time bin. The bar below shows the accepted range from the main database, which is based on literature data and is usually an expert assessment of the true range of the taxon. The colours show the stages, the bar at the base of the figure shows the zones, and below this are numbers indicating age in Ma.
Further explanation is provided by pop-up tool-tip data if you hover the cursor over the graph.

Interpreting the graphs

Range chart plotter

The range chart plotter is a flexible tool for comparing range data from this site with Neptune data. You can also reach it from the Tools Menu. The output is customisable in terms of taxa plotted, sorting order, scales, etc. Age data is given as Ma ages along the bottom of the graph, by colour coding of the chronostrat units and from tooltip boxes.
The image below shows the output for Helicosphaera in the Cenozoic. Again the data is a useful, objective guide to which species have been recorded, although it needs to be interpreted with caution. The overall pattern is a good depiction of the changing composition of assemblages through this time interval, although some details, such as the long trail of H. euphratis records after the Early Miocene, are probably artefacts.

These are customisable plots, produced with the JavaScript library RaphaelJS.
Range data for some Helicosphaera species.

Recorded diversity plots

As described above, for individual species we plot the occurrence frequency of the taxon in the database. If these occurrence frequencies are summed for a group of species, e.g. all the species of a genus, then we obtain the recorded diversity for that group of species, i.e. the average number of species of the genus (or higher taxonomic category) that have been recorded in samples of the age bin. This recorded diversity is something of a hybrid statistic since it is a product of both the abundance of the group and its diversity, but it does make a useful proxy for the prominence/importance of the group.
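
The equivalence between "sum of occurrence frequencies" and "average number of species recorded per sample" can be checked with a small Python sketch. The function name and the toy data are hypothetical, and frequencies are handled as fractions rather than percentages so that the sum reads directly as a number of species.

```python
def recorded_diversity(sample_taxa, group_species):
    """Sum of occurrence frequencies (as fractions) over the species in a group.

    sample_taxa: list of sets of species names, one set per sample in one time bin.
    group_species: list of species names belonging to the group (e.g. one genus).
    Returns None if the time bin has no samples.
    """
    n = len(sample_taxa)
    if n == 0:
        return None
    total = 0.0
    for species in group_species:
        # Occurrence frequency of this species in the bin
        total += sum(1 for taxa in sample_taxa if species in taxa) / n
    return total

# Toy time bin with four samples and a three-species group (made-up names).
bin_samples = [{"sp. A", "sp. B"}, {"sp. A"}, set(), {"sp. A", "sp. B", "sp. C"}]
group = ["sp. A", "sp. B", "sp. C"]

diversity = recorded_diversity(bin_samples, group)
mean_per_sample = sum(len(taxa & set(group)) for taxa in bin_samples) / len(bin_samples)
print(diversity, mean_per_sample)
```

Both numbers come out the same, which is why the statistic mixes abundance and diversity: a group scores highly either by having many species or by having a few species that occur in most samples.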
Plots of recorded diversity of any higher taxon can be made from the range chart plotter by selecting the taxon from the dropdown menu. They display more neatly if the mikrotax range bars are turned off, and it may be useful to adjust the vertical scale.
For radiolaria and diatoms this data is displayed directly on the higher taxon pages, as in the example below. Data is more reliable if there are more samples in the time bin. To indicate this, the bars are made increasingly transparent when there are <50 samples per time bin.
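
One simple way to turn sample count into bar transparency is a linear ramp below the 50-sample threshold. The sketch below is an assumption about how such a mapping might look, not the function used for the plots on this site; the minimum opacity of 0.2 and the linear shape are invented for illustration.

```python
def bar_opacity(n_samples, full_at=50, min_opacity=0.2):
    """Opacity in [min_opacity, 1.0] for a bar backed by n_samples samples.

    Bins with at least `full_at` samples are drawn fully opaque; below that the
    opacity falls off linearly (assumed shape) towards `min_opacity`.
    """
    if n_samples >= full_at:
        return 1.0
    return min_opacity + (1.0 - min_opacity) * max(n_samples, 0) / full_at

for n in (0, 10, 25, 50, 200):
    print(n, round(bar_opacity(n), 2))
```

Keeping a non-zero minimum means that bins with very few samples remain visible, while still being clearly marked as less reliable.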

Space-time plots

Space-time plots are a way to look at variation in occurrence frequency with latitude through time. The size of squares on the plot indicates number of samples and the colour shading indicates the occurrence frequency. The actual occurrence frequency is also given as a figure in the centre of each box. On each species page a link is provided to allow plotting of a space-time plot for that species, and various options can be changed on the plotting page. NB the example below plots data for Neogloboquadrina pachyderma.
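
The numbers behind each square of a space-time plot can be thought of as the occurrence-frequency calculation repeated per latitude band as well as per time bin. The following Python sketch shows that grouping under stated assumptions: the sample layout, the 10 degree band width and the function name are hypothetical, and the sketch does not say whether present-day or palaeolatitude is used.

```python
import math
from collections import defaultdict

def space_time_grid(samples, taxon, age_bin=1.0, lat_band=10.0):
    """Sample count and occurrence frequency per (time bin, latitude band) cell.

    samples: iterable of (age_ma, latitude_deg, set_of_taxon_names) (assumed layout).
    Returns {(age_start, lat_start): (n_samples, percent)}.
    """
    cells = defaultdict(lambda: [0, 0])  # [n_samples, n_with_taxon]
    for age_ma, lat, taxa in samples:
        key = (math.floor(age_ma / age_bin) * age_bin,
               math.floor(lat / lat_band) * lat_band)
        cells[key][0] += 1
        if taxon in taxa:
            cells[key][1] += 1
    return {key: (n, 100.0 * hits / n) for key, (n, hits) in cells.items()}

taxon = "Neogloboquadrina pachyderma"
samples = [
    (2.5, 62.0, {taxon}),   # made-up samples for illustration only
    (2.1, 65.0, set()),
    (2.9, -45.0, {taxon}),
]
for cell, (n, percent) in sorted(space_time_grid(samples, taxon).items()):
    print(cell, n, percent)
```

In plotting terms, the first value of each cell would drive the size of the square and the second its colour shading.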

Biogeographic map plots

An alternative way to view the data is to plot data for one time interval on a biogeographic map. Circles here indicate all drilling sites for which occurrence data for the relevant fossil group is available in that time interval. Colour-coding indicates the occurrence frequency for that site (tooltip data provides more precise data for each site). NB Data is currently plotted on a modern basemap but we intend to replace this with palaeogeographic basemaps. The example below plots data for Neogloboquadrina pachyderma for the 2-4Ma time slice.
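
For the map, the same statistic is grouped by drilling site within a single time slice. A minimal Python sketch, assuming a hypothetical (site, age, set of taxa) sample layout and made-up site names, might look like this:

```python
from collections import defaultdict

def site_frequencies(samples, taxon, min_age, max_age):
    """Occurrence frequency of `taxon` per site for min_age <= age < max_age.

    samples: iterable of (site_name, age_ma, set_of_taxon_names) (assumed layout).
    Returns {site_name: (n_samples, percent)}; sites with no samples in the
    time slice are omitted, so they would get no circle on the map.
    """
    per_site = defaultdict(lambda: [0, 0])  # [n_samples, n_with_taxon]
    for site, age_ma, taxa in samples:
        if min_age <= age_ma < max_age:
            per_site[site][0] += 1
            if taxon in taxa:
                per_site[site][1] += 1
    return {site: (n, 100.0 * hits / n) for site, (n, hits) in per_site.items()}

taxon = "Neogloboquadrina pachyderma"
samples = [
    ("Site A", 2.5, {taxon}),  # made-up sites and ages
    ("Site A", 3.5, set()),
    ("Site A", 5.0, {taxon}),  # outside the 2-4Ma slice, ignored
    ("Site B", 3.9, {taxon}),
]
print(site_frequencies(samples, taxon, 2.0, 4.0))
```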

Species occurrence tables

Occurrence tables are given under the biogeographic map plot. These provide details of all samples plotted, and links back to the relevant Scientific Results volumes. NB If you click links on the space-time plot then only the data from the selected square will be plotted.


Site Distribution tables

Site distribution tables provide an overview of all the occurrence data of a group of fossils from a given site. They are in essence reconstructions of the original data tables from which the database was composed and one use of them is to check the data against the original. The tables are sometimes very large but they can be reduced in size by selecting data from a limited time (or depth) interval. The species shown are then reduced to those which occur in this interval. Taxon names can be displayed either as those used in the original report or as the respective current names, as used in mikrotax. In addition, species columns are colour coded according to sample depth to show the interval in which, according to the sample age, the species is predicted to occur. This can help highlight data anomalies.
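
The effect of restricting a distribution table to a depth interval, and dropping species columns that become empty, can be illustrated with a short Python sketch. The row layout, the abundance codes ("R", "F", "C") and the function name are hypothetical and are not taken from the Neptune schema.

```python
def trim_distribution_table(rows, top_depth, bottom_depth):
    """Restrict a site distribution table to a depth interval.

    rows: list of (depth_m, {species_name: abundance_code}) (assumed layout).
    Returns (species, table): the species that occur within the interval, and
    one row per retained sample as [depth, code_for_species_1, ...], with an
    empty string where a species is not recorded.
    """
    kept = [(depth, occ) for depth, occ in rows if top_depth <= depth <= bottom_depth]
    # Only species recorded somewhere in the interval keep a column
    species = sorted({name for _, occ in kept for name in occ})
    table = [[depth] + [occ.get(name, "") for name in species] for depth, occ in kept]
    return species, table

rows = [
    (10.0, {"sp. A": "R"}),              # made-up samples
    (20.0, {"sp. B": "C"}),
    (30.0, {"sp. B": "F", "sp. C": "R"}),
]
species, table = trim_distribution_table(rows, 15.0, 35.0)
print(species)
for row in table:
    print(row)
```

Here the column for "sp. A" disappears because its only record lies above the selected interval.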

A reminder of the problems with the data

Don't forget the data is not perfect. We recommend that you use this data as you might use Wikipedia. It is a powerful way to explore knowledge and develop hypotheses, but any hypotheses drawn from this data should be tested rather than assumed to be correct.