| To Do: |
| A list of projects for the GCP
Introduction
This is a page for people who would
like to contribute programming or analytic skills to the GCP. It is largely based on the
"Analysis" page, but has annotations and some prioritization of projects
indicated by numbers in parentheses, as well as suggestions for methods that might be
appropriate. For example, a display like a
tapestry, perhaps enhanced by a modulated
hum or overtone drone expressing occasional clusters of
non-randomicity, would be nice, and probably could be quite beautiful.
A richer set of possibilities for
music
based on the GCP data has recently been developed, presenting an
opportunity for someone with musical interest and
programming or scripting skills.
The notes below are at best an outline to sketch some possibilities and
requirements, and actual implementations will depend on the creativity and skill of the
contributors, working together with people in the GCP planning group.
As an example of what is involved, one of the most desirable facilities for the project is a
form to specify the parameters to be used in automated data processing for
analysis with graphics to assess a given global event.
If you would like to
proceed, write to Roger Nelson, the project
director, who can coordinate your project and provide access to the necessary resources and
information.
For example, Ari Kahn sent a suggestion to check out a specialized graphics
programming language (http://processing.org/) which might
yield interesting displays using the EGG network data.
There are examples at http://processing.org/exhibition/.
I'm happy to help people who want to do things like this, and to provide
a web presence for applications people create.
The material in this page is mostly fairly old, and some
projects actually have been done. If you are looking for
something that appeals, it may be worthwhile to scan through
the whole page, but here are a few suggestions made in
response to an inquiry about what someone might do to
contribute.
1) a menu of graphical displays that website visitors could
select to see what the summarized data looked like last week
or on a particular day a year ago, etc.
2) a musical rendition of the data that could be turned on
(most people would not want to listen) based on statistical
parameters of the current data. I envision a "chord" that
would be chaotic noise most of the time, but would become
more harmonious if strong inter-egg correlations or temporal
autocorrelations occurred.
3) obtaining data from long-term physical variables like
cosmic ray flux or something else that is either random or
has a separable random component to see if correlations
might exist with GCP data or with events that correlate with
departures from expectation in our data.
4) any creative scientific or aesthetic use of the data that
appeals to you.
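Suggestion 2 above could start from something as small as the following Python sketch. It averages the Pearson correlation over all pairs of egg streams and uses the result to blend a chord from a deliberately dissonant shape toward a consonant one. Everything here is an assumption made for illustration: the function names, the two sets of frequency ratios, the 220 Hz base pitch, and the `full_scale` threshold are hypothetical. The sketch also substitutes a fixed dissonant chord for the "chaotic noise" described above, and it produces frequencies only, leaving sound synthesis to whatever audio tool is used.

```python
import math
from itertools import combinations

# Hypothetical chord shapes, expressed as frequency ratios above a base pitch.
DISSONANT = (1.0, 1.06, 1.41, 1.89)  # clustered, tritone-heavy ratios
CONSONANT = (1.0, 1.25, 1.5, 2.0)    # just-intonation major chord plus octave


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant stream carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def mean_inter_egg_correlation(streams):
    """Average Pearson correlation over all pairs of egg data streams."""
    pairs = list(combinations(streams, 2))
    if not pairs:
        return 0.0
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def chord_frequencies(coherence, base_hz=220.0, full_scale=0.5):
    """Blend from the dissonant to the consonant chord as |coherence| grows.

    full_scale is the (assumed) correlation magnitude at which the chord
    becomes fully consonant.
    """
    blend = min(abs(coherence) / full_scale, 1.0)
    return [base_hz * ((1 - blend) * d + blend * c)
            for d, c in zip(DISSONANT, CONSONANT)]
```

In use, `chord_frequencies(mean_inter_egg_correlation(streams))` would be recomputed for each new window of data, so the chord drifts toward consonance only while the eggs are correlated. A temporal autocorrelation measure could be fed into the same function in place of the inter-egg average.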
As time and programming skills permit, we will present multiple versions of most analyses, calculated and displayed either automatically or by selection from a small set of options. Priority is indicated with a number in parentheses.
(1) "Event" calculation should be provided to correspond to predictions made following specified rules for defining the beginning and end times for events, including pre-specified time periods to be associated with "point" events. This is now done on an ad hoc basis, manually.
(2) "Update" calculations using all available data within a specified period. This would include data from all eggs except those which send data only once or a few times per day. At present, this function exists in the daily eggsummary report, which gives Z-scores for 15-minute blocks, extreme scores, and three graphical summaries. All of the raw, second-by-second data are directly available via the web-based extract form.
(2) "Defined Period" calculations and displays might cover a given date and time, say, the past month, week, day (already available), or hour.
(2) "On-line" calculations with all data from permanently connected Eggs. One example exists now in the "Real-Time Display".
(2) "Viewer" specified calculation time periods, using input forms to get necessary information. It may be necessary to restrict these to a small subset of calculations. It will be necessary to clarify the interpretation in light of post hoc selection and multiple analysis concerns.
The focus for most analyses will be
anomalous shifts of the segment distribution mean, and a composite across eggs is defined
as the formal test of the primary hypothesis. We are interested in exploratory assessments
of other parameters as possible indicators. We also expect to explore correlations with
environmental variables including automatically registered global-scale measures such as
sidereal time, geomagnetic field fluctuations, and seismographic activity.
A second major focus will be on the
transitional probabilities within the sequences of interest. Atmanspacher and
Scheingraber's Scaling Index Analysis is one procedure for exploring this area, and Jiri
Wackerman's measure of Omega-Complexity, when implemented, will provide a comprehensive
perspective on internal structure in the data.
The following list of likely measures, with short names and brief annotation, suggests some useful projects for analyst-programmers.
(1) Arbitrary-block Chi-square calculations and statistics, "Block Variability". Similar to the segment-based analysis, but with the option to specify a sub-segment length. That length (for example, 1 second, 1 minute, or 15 minutes) would define data blocks whose mean shifts would be the basis for the Chi-square test statistic. (Ref. Dick Bierman)
(1) Correlation Matrix calculation and analysis, "Intercorrelation". A general assessment of the hypothesis that some influences might affect all of the REG devices. Application to signed deviations, and also to squared deviations (Chi-square). Desirable to include pattern search or extreme value search algorithms. (Ref. James Spottiswoode)
(1) Multi-level calculations of normal distribution statistics, "Parameters". A relatively simple set of parameter computations that could be applied to the active, prediction-based data selections, or to arbitrarily selected data blocks, including those obtained in a resampling procedure to generate empirical distribution parameters.
(1) Global testing for conformance to probability theory, "Calibration". A relatively simple set of calibration tests that would be applied to the individual and collective data streams.
(1) Omega complexity (Phase 2), using Jiri Wackerman's methods, "Complexity". A comprehensive measure to be applied to large blocks of unselected data to assess internal structure. Based on procedures used to summarize and display structure in neurophysiological data. You can download descriptions in various formats.
(1) Random music to use human perception of structure, "Music Explorations". We consider it possible that musical and rhythmic patterns created from the data might reveal structure. Even if this possibility is not borne out, the resulting music should have an aesthetic value.
(2) Fitting analysis strategies to the experimental questions, "Optimization".
(2) Prediction-based testing, multiple regression, ANOVA, "Factors".
(2) Cluster/factor analysis, grade-of-membership, "Clusters".
(2) Scaling index analysis, autocorrelation strategies, "Transitions".
(2) Time series, dynamic systems models, chaos-dynamics, "Dynamics".
(2) Probability modeling, Complexity theory, "Complexity".
(2) Neural nets or other artificial intelligence programs, "AI".
(2) Open-ended category for bright ideas and incisive questions, "Exploration".
Graphical Displays
We want the website to be attractive and elegant, and well-designed displays of the data can greatly contribute to the art of the matter. Some displays are aesthetically pleasing by their nature. Where the best form is a graph, we will prefer a straight display, following Edward Tufte's "waste no ink" dictum of simplicity, readability, and elegance. On the other hand, dynamic, flowing and changing displays will be used where this makes sense, and displays may be multi-dimensional, with multiple spatial dimensions, color, motion, and sound if these help to make the information accessible.
(1) Simple pictures of hypothesis support (e.g., cumulative deviation of Chi-square; histogram with error bars). These should be among the most immediate displays, creating an automatically updated picture of "results".
(1) Displays of time line, prediction registry, independent variables. This is conceived as a comprehensive, overall picture of what is happening in the project. It could take many forms, and should be very interesting to the viewer as a compact way of showing the nature and purpose of the project. Graphical versions of non-data displays, e.g., status of EGG-net.
(1) Running and Cumulative Z-scores, running and cumulative Chi-squares. This already exists for the Daily eggsummaries. What is needed is a flexible set of tools allowing specification of a time-period of interest, and selection of subsets of eggs, e.g., those close to or distant from an event. See related note in the statistics list.
(1) Global map of overall and local "hot-spot" structure, coherence. One version already exists in the daily movie, but a variety of compact, summary overviews in graphic form can be envisioned. For example, check out this proposal for a Desktop Egg display of the activity.
(1) Spatial Principal Component Analysis and Omega-complexity.
Visual or graphic aspect of the analysis described above in the statistics list. This is a 3-dimensional "macro-state space" capable of summarizing huge quantities of information. Also pretty graphics.
(2) Global and local composites of statistical parameters. Tables may be the better display for much of this, but graphs may be desirable for some aspects.
(2) Algorithmic conversion of data streams to musical chords. The daily movie already presents an interesting version, but it would be good to have other translations since musical intuitions are highly variable.
(2) Map of correlations with environmental variables.
(2) QEEG frequency and correlation arrays.
(2) Resonance and coherence "meters".
Project Possibilities
It is most helpful if people who wish to contribute programming or analysis develop projects that are directly interesting. We can help with information, access to data, and advice, but generally have a full plate. There are any number of good ideas, and we will list here a few that need a champion, because although they are worth doing, we don't have enough hands to do them all. Let us know if you do want to work on one or another of these.
Control Data
Control data are needed to establish the viability of the statistical results from "active" data generated during events specified via the prediction protocol. The GCP controls are based on quality-controlled equipment design, thorough device calibration, and a procedure called resampling. Resampling examines data near but not in the active segments, to build an empirical distribution to compare with theoretical expectations. This is one of the projects suggested in the statistics list.
Data Protection
The raw data are quite well protected against various errors, but it remains possible for some malfunction to generate defective data, or for a malicious attack over the network to inject spurious data. Both are considered highly unlikely, but it would be desirable to have automatic conformity checks to assure the integrity of data upon entry into any analytic process. While the "active" data predicted to show deviations are immediately interesting, it is important also to test calibration data (i.e., data surrounding the active data) against expectation models. This is the purpose of the resampling procedure described above.
GCP Home Email: rdnelson@princeton.edu |