The GCP To–Do List

An outline of how people who would like to contribute programming or analytic skills to the GCP can do so.

Introduction
Types of Analysis
Statistical Tests
Graphical Displays
Project Possibilities
Control Data
Data Protection

Introduction

This page is largely based on the project’s early analysis page, but adds annotations, some prioritization of projects (indicated by numbers in parentheses), and suggestions for methods that might be appropriate. For example, a display like a tapestry, perhaps enhanced by a modulated hum or overtone drone expressing occasional clusters of non-randomness, would be nice, and could probably be quite beautiful. A richer set of possibilities for experimental music (archived page) based on the GCP data has been developed, presenting an opportunity for someone with musical interest and programming or scripting skills.

The notes below are at best an outline to sketch some possibilities and requirements, and actual implementations will depend on the creativity and skill of the contributors, working together with people in the GCP planning group. As an example of what is involved, one of the most desirable facilities for the project is a form to specify the parameters to be used in automated data processing for analysis, with graphics, to assess a given global event.

If you would like to proceed, write to Roger Nelson, the project director, to coordinate your project and provide access to the necessary resources and information. For example, Ari Kahn sent a suggestion to check out a specialized graphics programming language which might yield interesting displays using the EGG network data. There are examples at processing.org. I’m happy to help people who want to do things like this, and to provide a web presence for applications people create.

Recent Suggestions

The material on this page is mostly fairly old, and some of the projects have actually been done. If you are looking for something that appeals, it may be worthwhile to scan through the whole page, but here are a few suggestions made in response to inquiries about what someone might do to contribute.

a menu of graphical displays that website visitors could select to see what the summarized data look like last week or on a particular day a year ago, etc.
a musical rendition of the data that could be turned on (most people would not want to listen) based on statistical parameters of the current data. I envision a "chord" that would be chaotic noise most of the time, but would become more harmonious if strong inter-egg correlations or temporal autocorrelations occurred.
obtaining data from long-term physical variables like cosmic ray flux, or something else that is either random or has a separable random component, to see if correlations might exist with GCP data or with events that correlate with departures from expectation in our data.
any creative scientific or aesthetic use of the data that appeals to you.
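
As a starting point for the musical rendition suggested in the list above, here is a minimal sketch in Python of one possible mapping from a coherence measure to a chord. Everything in it is an assumption made for illustration: the function name chord_frequencies, the just-intonation major chord, the 220 Hz base, and the idea that some statistic (say, a rescaled inter-egg correlation) has already been reduced to a value between 0 and 1. It detunes the chord rather than producing true noise, and it only computes frequencies; generating sound is left to whatever audio tool a contributor prefers.

```python
import random

# Just-intonation ratios for a major chord plus the octave (an arbitrary choice).
CONSONANT_RATIOS = [1.0, 5.0 / 4.0, 3.0 / 2.0, 2.0]


def chord_frequencies(coherence, base_hz=220.0, max_detune_semitones=1.0, rng=None):
    """Return one frequency (Hz) per chord tone.

    coherence is a hypothetical value in [0, 1] derived from the data.
    At 1 the chord is exactly in tune; toward 0 each tone is detuned by a
    larger random amount, up to max_detune_semitones in either direction.
    """
    rng = rng or random.Random()
    c = min(1.0, max(0.0, coherence))  # clamp out-of-range input
    freqs = []
    for ratio in CONSONANT_RATIOS:
        detune = rng.uniform(-max_detune_semitones, max_detune_semitones) * (1.0 - c)
        freqs.append(base_hz * ratio * 2.0 ** (detune / 12.0))
    return freqs
```

Calling chord_frequencies(0.0) repeatedly gives a different out-of-tune cluster each time, while values near 1 settle toward the consonant chord.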

Types of Analysis

As time and programming skills permit, we will present multiple versions of most analyses, calculated and displayed either automatically or by selection from a small set of options. Priority is indicated with a number in parentheses.

(1) Event calculation should be provided to correspond to predictions made following specified rules for defining the beginning and end times for events, including pre-specified time periods to be associated with point events. This is now done on an ad hoc basis, manually.

(2) Update calculations using all available data within a specified period.
This would include data from all eggs except those which send data only once or a few times per day. At present, this function exists in the daily eggsummary report, which gives Z-scores for 15-minute blocks, extreme scores, and three graphical summaries. All of the raw, second-by-second data are directly available via the web-based extract form.

(2) Defined Period calculations and displays might cover a given span of time,
say, the past month, week, day (already available), or hour.

(2) On-line calculations with all data from permanently connected Eggs.
One example exists now in the Real-Time Display.

(2) Viewer-specified calculation time periods, using input forms to get the necessary information. It may be necessary to restrict these to a small subset of calculations. It will also be necessary to clarify the interpretation in light of post hoc selection and multiple analysis concerns.

Statistical Tests

The focus for most analyses will be anomalous shifts of the segment distribution mean, and a composite across eggs is defined as the formal test of the primary hypothesis. We are interested in exploratory assessments of other parameters as possible indicators. We also expect to explore correlations with environmental variables including automatically registered global-scale measures such as sidereal time, geomagnetic field fluctuations, and seismographic activity.

A second major focus will be on the transitional probabilities within the sequences of interest. Atmanspacher and Scheingraber’s Scaling Index Analysis is one procedure for exploring this area, and Jiri Wackermann’s measure of Omega-Complexity, when implemented, will provide a comprehensive perspective on internal structure in the data.

The following list of likely measures, with short names and brief annotation, suggests some useful projects for analyst-programmers. Items preceded by (1) are priority items, (2) next priority, etc.

✓ Segment Variability: Segment-based Chi-square calculations and statistics.
The Chi-square sums the squared Z-scores corresponding to mean shifts within the specified segments, across all segments and all eggs. This is the primary measure for formal analysis. What is now needed is a user-friendly procedure, based on HTML forms, to specify the segments of interest, automate the computations, and create tabular and graphical displays of results.

(1) Block Variability: Arbitrary-block Chi-square calculations and statistics.
Similar to the segment-based analysis, but with the option to specify a sub-segment length. That length (for example, 1 second, 1 minute, or 15 minutes, ...) would define data-blocks whose mean shifts would be the basis for the Chi-square test statistic. (Ref. Dick Bierman)
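
The block calculation described above can be sketched as follows. This is a minimal illustration in Python, not the project's analysis code. It assumes (the page does not say) that each trial is the sum of 200 fair bits, giving an expected mean of 100 and a variance of 50; the function names are hypothetical, and missing trials are not handled. Setting block_len to the full segment length gives the segment-based case.

```python
import math

# Assumption (not stated on this page): each trial is the sum of 200 fair bits,
# so the expected mean is 100 and the variance is 50.
TRIAL_MEAN = 100.0
TRIAL_VAR = 50.0


def block_z(trials):
    """Z-score of the mean shift for one block of trials from one egg."""
    n = len(trials)
    return (sum(trials) - n * TRIAL_MEAN) / math.sqrt(n * TRIAL_VAR)


def block_chisquare(eggs, block_len):
    """Sum squared block Z-scores over all complete blocks and all eggs.

    eggs is a list of per-egg trial sequences (one value per second).
    A trailing partial block is dropped. Returns (chi_square, degrees_of_freedom).
    """
    chisq = 0.0
    df = 0
    for trials in eggs:
        for start in range(0, len(trials) - block_len + 1, block_len):
            z = block_z(trials[start:start + block_len])
            chisq += z * z
            df += 1
    return chisq, df
```

Under the null hypothesis the result would be compared with a Chi-square distribution on the returned degrees of freedom.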

(1) Intercorrelation: Correlation Matrix calculation and analysis.
A general assessment of the hypothesis that some influences might affect all of the REG devices. Application to signed deviations, and also to squared deviations (Chi-square). Desirable to include pattern search or extreme value search algorithms. (Ref. James Spottiswoode)
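
A minimal sketch of the correlation matrix idea, in Python with only the standard library. The input format (one equal-length list of Z-scores per egg) and the function names are assumptions for illustration. The squared option applies the same calculation to squared deviations, and most_extreme_pair is a very simple stand-in for the extreme value search mentioned above.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(z_by_egg, squared=False):
    """Pairwise correlations between eggs; squared=True uses squared Z-scores."""
    series = [[z * z for z in s] if squared else list(s) for s in z_by_egg]
    k = len(series)
    matrix = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            matrix[i][j] = matrix[j][i] = pearson(series[i], series[j])
    return matrix


def most_extreme_pair(matrix):
    """Return (i, j, r) for the off-diagonal entry with the largest |r|."""
    best = None
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            if best is None or abs(matrix[i][j]) > abs(best[2]):
                best = (i, j, matrix[i][j])
    return best
```

With many eggs the number of pairs grows quickly, so any extreme value found this way needs a correction for multiple comparisons before it is interpreted.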

(1) Parameters: Multi-level calculations of normal distribution statistics.
A relatively simple set of parameter computations that could be applied to the active, prediction-based data selections, or to arbitrarily selected data blocks, including those obtained in a resampling procedure to generate empirical distribution parameters.

(1) Calibration: Global testing for conformance to probability theory.
A relatively simple set of calibration tests that would be applied to the individual and collective data streams.
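
One simple calibration test of this kind might look like the sketch below, in Python. It is an illustration under stated assumptions, not the project's procedure: it assumes each trial is the sum of 200 fair bits (mean 100, variance 50), it uses a normal approximation for the sampling distribution of the variance, and the function name is hypothetical.

```python
import math

# Assumption (not stated on this page): trials are sums of 200 fair bits.
TRIAL_MEAN = 100.0
TRIAL_VAR = 50.0


def calibration_summary(trials):
    """Compare the sample mean and variance of a data stream with theory.

    Returns a dict with the sample statistics and approximate Z-scores.
    z_var uses the normal-theory approximation Var(s^2) = 2*sigma^4/(n-1).
    Requires at least two trials.
    """
    n = len(trials)
    mean = sum(trials) / n
    var = sum((t - mean) ** 2 for t in trials) / (n - 1)
    z_mean = (mean - TRIAL_MEAN) / math.sqrt(TRIAL_VAR / n)
    z_var = (var - TRIAL_VAR) / (TRIAL_VAR * math.sqrt(2.0 / (n - 1)))
    return {"n": n, "mean": mean, "var": var, "z_mean": z_mean, "z_var": z_var}
```

The same function can be applied to a single egg or to the pooled trials of all eggs, which corresponds to the individual and collective data streams mentioned above.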

(1) Complexity: Omega complexity (Phase 2), using Jiri Wackermann’s methods.
A comprehensive measure to be applied to large blocks of unselected data to assess internal structure. Based on procedures used to summarize and display structure in neurophysiological data. You can download descriptions in various formats.

(1) Music Explorations: Random music to use human perception of structure.
We consider it possible that musical and rhythmic patterns created from the data might reveal structure. Even if this possibility is not borne out, the resulting music should have an aesthetic value.

(2) Optimization: Fitting analysis strategies to the experimental questions.
Write a note to discuss ideas.

(2) Factors: Prediction-based testing, multiple regression, ANOVA.
Write a note to discuss ideas.

(2) Clusters: Cluster/factor analysis, grade-of-membership.
Write a note to discuss ideas.

(2) Transitions: Scaling index analysis, autocorrelation strategies.
Write a note to discuss ideas. See also notes on Fourier analysis.

(2) Dynamics: Time series, dynamic systems models, chaos-dynamics.
Write a note to discuss ideas.

(2) Probability Complexity: Probability modeling, Complexity theory.
Write a note to discuss ideas.

(2) AI: Neural nets or other artificial intelligence programs.
Write a note to discuss ideas.

(2) Exploration: Open-ended category for bright ideas and incisive questions.
Write a note to discuss ideas.

Graphical Displays

We want the website to be attractive and elegant, and well-designed displays of the data can greatly contribute to the art of the matter. Some displays are aesthetically pleasing by their nature. Where the best form is a graph, we will prefer a straight display, following Edward Tufte’s “waste no ink” dictum of simplicity, readability, and elegance. On the other hand, dynamic, flowing and changing displays will be used where this makes sense, and displays may be multi-dimensional, with multiple spatial dimensions, color, motion, and sound if these help to make the information accessible.

(1) Simple pictures of hypothesis support (e.g., cumulative deviation of Chi-square; histogram with error bars). These should be among the most immediate displays, creating an automatically updated picture of results.

(1) Displays of time line, prediction registry, independent variables. This is conceived as a comprehensive, overall picture of what is happening in the project. It could take many forms, and should be very interesting to the viewer as a compact way of showing the nature and purpose of the project. Graphical versions of non-data displays, e.g., status of EGG-net.

(1) Running and Cumulative Z-scores, running and cumulative Chi-squares. This already exists for the Daily eggsummaries. What is needed is a flexible set of tools allowing specification of a time-period of interest, and selection of subsets of eggs, e.g., those close to or distant from an event. See related note in the statistics list.
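
The series behind such displays can be computed in a few lines. The sketch below, in Python, is an illustration only. It assumes the input is a list of Z-scores for the chosen period and eggs, it reads "running" as a moving-window combined (Stouffer) Z, and the function names are hypothetical. Plotting is left to whatever graphics tool is used.

```python
import math
from itertools import accumulate


def cumulative_z(zs):
    """Cumulative sum of Z-scores (a random walk under the null hypothesis)."""
    return list(accumulate(zs))


def cumulative_chisq_deviation(zs):
    """Cumulative sum of (Z^2 - 1): the deviation of Chi-square from its
    expectation of 1 per degree of freedom."""
    return list(accumulate(z * z - 1.0 for z in zs))


def running_z(zs, window):
    """Moving-window combined Z: sum of Z-scores in the window / sqrt(window)."""
    return [
        sum(zs[i:i + window]) / math.sqrt(window)
        for i in range(len(zs) - window + 1)
    ]
```

Selecting a subset of eggs, for example those near an event, only changes which Z-scores are passed in.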

(1) Global map of overall and local hot-spot structure, coherence. One version already exists in the daily movie, but a variety of compact, summary overviews in graphic form can be envisioned. For example, check out this proposal for a Desktop Egg display of the activity.

(1) Spatial Principal Component Analysis and Omega-complexity. Visual or graphic aspect of the analysis described above in the statistics list. This is a 3-dimensional macro-state space capable of summarizing huge quantities of information. Also pretty graphics.

(2) Global and local composites of statistical parameters. Tables may be the better display for much of this, but graphs may be desirable for some aspects.

(2) Algorithmic conversion of data streams to musical chords. The daily movie already presents an interesting version, but it would be good to have other translations since musical intuitions are highly variable.

(2) Map of correlations with environmental variables.
Write to discuss ideas.

(2) QEEG frequency and correlation arrays.
Write to discuss ideas.

(2) Resonance and coherence meters.
Write to discuss ideas.

Project Possibilities

It is most helpful if people who wish to contribute programming or analysis develop projects that are directly interesting. We can help with information, access to data, and advice, but generally have a full plate. There are any number of good ideas, and we will list here a few that need a champion, because although they are worth doing, we don’t have enough hands to do them all. Let us know if you do want to work on one or another of these.

From Allen Ernst:

Muslim prayer times for morning and evening are determined in a very precise manner astronomically, for each local area. Picture the waves of consciousness produced as the sun travels across the orient, with spikes produced as people in large cities of the faithful come to prayer simultaneously by PA address from minarets. Maybe a phenomenon like this might explain ramping data that has not been accounted for otherwise. Since you had such good luck with mass meditations, this might yield something interesting.

The above idea is a good one, I think. I am not sufficiently familiar with the timing to be sure there would be a strong focus at particular times, but it may be the case. I believe there are 5 prayer times each day, not just two. I’m uncertain whether those are set, or instead might be called by the local imams. In any case it would require a good deal of detailed information to set up an analysis, which would probably need to be a signal averaging process. It is a fine project for someone to undertake, perhaps as a thesis project for a graduate student. Or, if you are interested in working it up to the point of precise timing and consideration of the sweep through the 37 timezones, we could extract the corresponding data. In any case, I will make the question and preliminary description part of the project’s todo list.
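
The signal averaging mentioned above could be sketched along these lines in Python. This is a generic epoch-averaging illustration, not an analysis the project has specified. It assumes the data are a single evenly sampled series and that the event times (here, hypothetical prayer-time indices) have already been converted to positions in that series.

```python
def epoch_average(series, event_indices, before, after):
    """Average the data in a window around each event.

    The window runs from `before` samples ahead of the event to `after`
    samples past it. Events whose window would run off either end of the
    series are skipped. Returns (averaged_window, number_of_events_used).
    """
    length = before + after + 1
    sums = [0.0] * length
    used = 0
    for idx in event_indices:
        start, end = idx - before, idx + after + 1
        if start < 0 or end > len(series):
            continue
        for k, value in enumerate(series[start:end]):
            sums[k] += value
        used += 1
    if used == 0:
        raise ValueError("no event has a complete window inside the series")
    return [s / used for s in sums], used
```

If many events share a consistent response, it should stand out in the averaged window while unrelated fluctuations tend to cancel.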

Geomagnetic correlations have been considered, and though we have a few samples, it would be desirable to take a serious look at the question. The device variance, network variance, and the dispersion measure all could be tested for correlation to the aa index, which is available from government resources. Other cosmic variables can also be considered. The essential requirement is a continuing long-term measure at frequent intervals, at least hourly, preferably minutes or seconds resolution.

There is a suggestion of correlation between sunspot counts and the GCP Netvar measure in long-term trends. Dick Shoup suggests it might make a good project to look at dates of significant solar events as a group. A list of events considered significant in 2012 may be found at sidc.be/news/173/welcome.html

Control Data

Control data are needed to establish the viability of the statistical results from active data generated during events specified via the prediction protocol. The GCP controls are based on quality-controlled equipment design, thorough device calibration, and a procedure called resampling. Resampling examines data near but not in the active segments, to build an empirical distribution to compare with theoretical expectations. This is one of the projects suggested in the statistics list.
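
A minimal sketch of the resampling idea in Python follows. It is an illustration under assumptions, not the project's procedure: the data are taken to be one list of values, active segments are given as half-open (start, end) index ranges, and the function names are hypothetical. It does not itself enforce "near"; to restrict the draws to data surrounding an event, pass in only that portion of the series.

```python
import random


def resample_statistic(series, active, seg_len, statistic, n_draws=1000, seed=0):
    """Evaluate `statistic` on randomly placed segments that avoid active data.

    active is a list of (start, end) half-open index ranges to exclude.
    Returns a list of n_draws statistic values (an empirical distribution).
    """
    rng = random.Random(seed)

    def overlaps(start):
        end = start + seg_len
        return any(start < a_end and a_start < end for a_start, a_end in active)

    candidates = [s for s in range(len(series) - seg_len + 1) if not overlaps(s)]
    if not candidates:
        raise ValueError("no segment of this length fits outside the active data")
    starts = (rng.choice(candidates) for _ in range(n_draws))
    return [statistic(series[s:s + seg_len]) for s in starts]


def empirical_p(observed, null_values):
    """Share of resampled values at least as large as the observed one.

    The +1 in numerator and denominator keeps the estimate above zero.
    """
    hits = sum(1 for v in null_values if v >= observed)
    return (hits + 1) / (len(null_values) + 1)
```

The empirical distribution can then be compared both with the active-segment result and with the theoretical expectation.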

Data Protection

The raw data are quite well protected against various errors, but it remains possible for some malfunction to generate defective data, or for a malicious attack over the network to inject spurious data. Both are considered highly unlikely, but it would be desirable to have automatic conformity checks to assure the integrity of data upon entry into any analytic process. While the active data predicted to show deviations are immediately interesting, it is also important to test calibration data (i.e., data surrounding the active data) against expectation models. This is the purpose of the resampling procedure described above.
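
An automatic conformity check of the kind suggested here might start as simply as the following Python sketch. The record format, a list of (timestamp, value) pairs with None marking a missing trial, is hypothetical, as is the assumption that valid trial values are integers from 0 to 200 (sums of 200 bits). Real checks would need to follow the actual data format.

```python
def check_records(records, bits_per_trial=200):
    """Return a list of (index, problem) pairs for records that fail basic checks.

    records is a list of (timestamp, value) pairs; value may be None for a
    missing trial, which is not treated as an error.
    """
    problems = []
    last_ts = None
    for i, (ts, value) in enumerate(records):
        if last_ts is not None and ts <= last_ts:
            problems.append((i, "timestamp not increasing"))
        last_ts = ts
        if value is None:
            continue
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append((i, "value is not an integer"))
        elif not 0 <= value <= bits_per_trial:
            problems.append((i, "value out of range"))
    return problems
```

Structural checks like these catch gross defects only; subtler problems would show up in the statistical calibration and resampling tests.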