Introduction to EDA


Exploratory data analysis (EDA) provides a more comprehensive understanding of scientific data than basic charts, tables, or numeric statistics can provide on their own.
Site Example 1 and Site Example 2 include several figures with specific examples of EDA based on environmental data from actual projects, but the same approaches can also be used in other branches of science and in business.

EDA uses visual statistical procedures to find patterns and trends across multiple variables quickly. This supports efficient hypothesis testing and assessment of whether those trends or patterns are statistically significant. Preconceived hypotheses are not always necessary, because hypotheses can emerge from the trends and patterns themselves. Combined with appropriate education and experience, EDA is a powerful tool for answering questions and solving complex problems.
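
As a rough illustration of screening many variables at once and attaching a significance statistic to each relationship, the Python sketch below computes the Pearson correlation for every pair of variables and estimates a two-sided p-value by permutation: one variable is shuffled many times, and the sketch counts how often a correlation at least as strong arises by chance. This is a generic technique chosen for illustration, not a description of any particular PASS workflow; the function names and sample concentrations are hypothetical.

```python
import itertools
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    Shuffles y repeatedly and counts how often the shuffled |r| is at least
    as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    # Add-one correction: the observed arrangement counts as one permutation.
    return (extreme + 1) / (n_perm + 1)


def screen_pairs(data, n_perm=2000, seed=0):
    """Return (var_a, var_b, r, p) for every variable pair, strongest |r| first."""
    results = []
    for a, b in itertools.combinations(data, 2):
        r = pearson_r(data[a], data[b])
        p = permutation_p_value(data[a], data[b], n_perm, seed)
        results.append((a, b, r, p))
    return sorted(results, key=lambda row: abs(row[2]), reverse=True)


# Hypothetical sample results for illustration only (not project data).
samples = {
    "lead_mg_kg": [12, 45, 30, 80, 22, 95, 60, 15],
    "zinc_mg_kg": [40, 120, 90, 210, 70, 250, 160, 50],
    "ph": [6.8, 7.1, 6.5, 7.0, 6.9, 6.6, 7.2, 6.7],
}

for a, b, r, p in screen_pairs(samples):
    print(f"{a} vs {b}: r = {r:+.2f}, permutation p = {p:.4f}")
```

Each flagged pair should still be examined on a scatter plot, because a single outlier can drive a correlation. Testing many pairs also raises the chance of a spurious "significant" result, so a multiple-comparison adjustment is usually warranted.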

Environmental investigations usually generate large quantities of data at significant expense, yet valuable information in those data is often overlooked. The data are generally put to relatively simple, though important, uses: assessing risks to human health and other organisms, establishing environmental baselines, indicating trends, serving as criteria for regulatory action at a given concentration, providing the basis for damages lawsuits, and so on.

At a more complex level of need, a scientist or engineer might be trying to decide whether active remediation is needed, whether monitored natural attenuation is occurring at an acceptable rate, whether sufficient data have been collected to minimize data gaps, or how and where to install active contaminant remediation systems. PASS has experience with such complex data needs, for example:
  • Determining why the sediments in a lake are devoid of benthic organisms in certain areas,
  • Resolving whether a facility being sued is responsible for widespread contamination, and
  • Discriminating between natural soil constituents, industrial fill materials and waste-contaminated soils at a RCRA site.

EDA can clarify complex scenarios by rapidly assessing alternative hypotheses and providing statistics on whether visualized relationships are significant. Conceptual site models (CSMs) developed with EDA are easier for clients and regulators to understand than numeric tables or maps of sample locations annotated with lists of contaminant concentrations.
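
One minimal way to put a number on an apparent difference between two groups of samples, for example background samples versus samples from a suspected source area, is a permutation test on the difference in group means. The Python sketch below is a generic technique chosen for illustration, not the method used in the projects listed above; the function name, group labels, and arsenic concentrations are hypothetical and do not come from any project.

```python
import random
from statistics import mean


def mean_difference_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (mean of group_a minus mean of group_b, p-value).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction: the observed labeling counts as one permutation.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical arsenic concentrations in mg/kg, for illustration only.
background = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1]
site = [9.8, 12.4, 8.9, 15.2, 11.0, 10.3]

diff, p = mean_difference_test(site, background)
print(f"site minus background = {diff:.2f} mg/kg, permutation p = {p:.4f}")
```

A small p-value indicates only that the groups differ more than random relabeling would typically produce. Explaining why they differ, such as native soil versus fill versus waste, still depends on site knowledge and other lines of evidence.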