To Do: Automated Analysis and Graphics

The primary analyses involve extracting the data for specified time periods corresponding to "global events" from the existing daily files, each of which contains all data from all of the egg-sites during a 24-hour period (UTC). The term "egg" refers to a single REG (random event generator) in the network, which now has over 20 separate egg-sites. The data are "trials", each consisting of the sum of 200 random bits, taken at a rate of one trial per second, resulting in an array of numbers with an expected mean of 100 and a standard deviation of sqrt(50), about 7.07. Each egg produces a continuous sequence of one trial per second, and all data are archived on the noosphere server, where they are always available via a general extract form.
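The expected mean and standard deviation follow from treating each bit as a fair coin (mean 0.5, variance 0.25), so that 200 bits give a mean of 100 and a variance of 50. A minimal Python sketch of that arithmetic is given here, with a software pseudo-random generator standing in for the hardware REG. The names `simulate_trial`, `BITS_PER_TRIAL`, `EXPECTED_MEAN` and `EXPECTED_SD` are illustrative only and are not part of the project's software.

```python
import math
import random

BITS_PER_TRIAL = 200                              # bits summed in one trial
EXPECTED_MEAN = BITS_PER_TRIAL * 0.5              # 100.0
EXPECTED_SD = math.sqrt(BITS_PER_TRIAL * 0.25)    # sqrt(50), about 7.07


def simulate_trial(rng):
    """Return the number of 1-bits among 200 pseudo-random bits (one trial)."""
    return bin(rng.getrandbits(BITS_PER_TRIAL)).count("1")


rng = random.Random(12345)  # fixed seed so the sketch is repeatable
trials = [simulate_trial(rng) for _ in range(10_000)]

sample_mean = sum(trials) / len(trials)
sample_sd = math.sqrt(
    sum((t - sample_mean) ** 2 for t in trials) / (len(trials) - 1)
)

print(f"expected mean {EXPECTED_MEAN}, sample mean {sample_mean:.2f}")
print(f"expected sd {EXPECTED_SD:.3f}, sample sd {sample_sd:.3f}")
```

With ten thousand simulated trials the sample values should fall close to 100 and 7.07, which is a quick sanity check on any code that reads real egg data.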

Once extracted for a particular analysis, the data are converted to Z-scores representing the deviation from expectation, calculated as (value - 100)/(standard error). For a single trial the standard error is sqrt(50); for the mean of a block of n trials it is sqrt(50/n). The data may be used directly as the second-by-second values, or they may be compacted in "blocks" of, e.g., a minute, or 15 minutes, or an arbitrary size. In the latter case, the mean of the block of data for each egg is compared with expectation (still 100) and converted to a Z-score. In the most frequently used analysis, these Z-scores are squared to produce a Chisquare-distributed quantity with one degree of freedom. Chisquares are additive, so the Chisquares for all eggs and all blocks can be summed to yield a Chisquare with N-eggs x N-blocks degrees of freedom. This represents the accumulated deviation of the means (or of the trials, if raw data are used) from expectation, and a probability for that deviation is calculated from the Chisquare distribution for the appropriate number of degrees of freedom.
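The calculation described above might be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the function names, the dictionary layout (egg identifier mapped to a list of trial values), and the choice to drop an incomplete final block are all hypothetical. The tail probability is computed with a simple series for the incomplete gamma function so that the sketch needs only the standard library; a statistics package would be the more robust choice.

```python
import math

EXPECTED_MEAN = 100.0     # expectation for the sum of 200 fair bits
TRIAL_VARIANCE = 50.0     # variance of a single trial


def chi2_sf(x, df):
    """Upper-tail Chisquare probability, via the series for the
    regularized lower incomplete gamma function. Adequate for a sketch;
    a statistics library would be more robust for extreme values."""
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= h / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-h + a * math.log(h) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def block_chisquare(trials_by_egg, block_size=1):
    """Sum squared block Z-scores over all eggs; return (chisq, df, p)."""
    std_error = math.sqrt(TRIAL_VARIANCE / block_size)
    chisq, df = 0.0, 0
    for trials in trials_by_egg.values():
        # Assumption: an incomplete block at the end is simply dropped.
        for start in range(0, len(trials) - block_size + 1, block_size):
            block = trials[start:start + block_size]
            z = (sum(block) / block_size - EXPECTED_MEAN) / std_error
            chisq += z * z
            df += 1
    return chisq, df, chi2_sf(chisq, df)


# Made-up data: two eggs, four trials each, analyzed in blocks of two trials.
data = {"egg_a": [105, 105, 100, 100], "egg_b": [95, 95, 110, 90]}
chisq, df, p = block_chisquare(data, block_size=2)
print(chisq, df, round(p, 4))
```

Setting `block_size=1` gives the raw second-by-second analysis, in which case the number of degrees of freedom is simply the total number of trials.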

We currently have only one form, a general "extract" form that allows specification of a date and the beginning and end times for a data subset. What we would like to have is a more flexible form that would allow specification of the date and times and, in addition, the block-size, ranging from 1 second to the full specified time-period. (A further refinement, which might be left for a second stage of software development, would allow specification of subsets of eggs, such as those in the US, or Europe, or a particular range of timezones.)
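One way to picture the parameters such a form would collect is as a small record with a validity check on the block-size. The Python sketch below is only a suggestion: the class name `AnalysisRequest`, its field names, the use of strings for egg identifiers, and the assumption that a period lies within a single UTC day are all hypothetical, and the date in the usage lines is arbitrary.

```python
from dataclasses import dataclass
from datetime import date, datetime, time
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnalysisRequest:
    """Hypothetical parameters for one analysis request."""
    day: date                                   # UTC date of the daily file
    start: time                                 # beginning of the period (UTC)
    end: time                                   # end of the period (UTC)
    block_size: int = 1                         # seconds per block
    egg_ids: Optional[Tuple[str, ...]] = None   # None means all eggs

    def period_seconds(self) -> int:
        """Length of the requested period in seconds."""
        begin = datetime.combine(self.day, self.start)
        finish = datetime.combine(self.day, self.end)
        return int((finish - begin).total_seconds())

    def validate(self) -> None:
        """Raise ValueError if the times or block-size are inconsistent."""
        length = self.period_seconds()
        if length <= 0:
            raise ValueError("end time must be later than start time")
        if not 1 <= self.block_size <= length:
            raise ValueError(
                f"block size must be between 1 and {length} seconds"
            )


# Arbitrary example: a 15-minute period analyzed in one-minute blocks.
request = AnalysisRequest(date(2000, 1, 1), time(0, 0), time(0, 15), block_size=60)
request.validate()
print(request.period_seconds() // request.block_size, "blocks per egg")
```

The optional `egg_ids` field leaves room for the second-stage refinement of selecting subsets of eggs without changing the rest of the record.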

Having specified these parameters, we would like to have automated calculations of the Z-scores and Chisquares as described above, and an output table showing the identifying information and the computed results for each egg (and for subsets, if these are specified) and for all eggs combined. From the same data, we would like to have a graph showing the cumulative deviation from zero of the Chisquare values minus their expectation (which is 1 for each Chisquare with one degree of freedom).
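As a rough illustration of the table and the graph data, the following Python sketch builds one summary row per egg plus a combined row, and the running sum of (Z squared minus 1) that would be plotted. The function names, the row layout, and the Z-scores themselves are made up for the example; the order in which values enter the cumulative sum (here, simply the order given) is also an assumption.

```python
from itertools import accumulate


def cumulative_deviation(z_scores):
    """Running sum of (Z^2 - 1); its expectation is zero at every step."""
    return list(accumulate(z * z - 1.0 for z in z_scores))


def summary_row(label, z_scores):
    """One output-table row: label, degrees of freedom, Chisquare sum."""
    return label, len(z_scores), sum(z * z for z in z_scores)


# Made-up Z-scores for two eggs, for illustration only.
z_by_egg = {"egg_a": [1.0, 2.0, 0.5], "egg_b": [0.0, -1.0, 1.5]}

rows = [summary_row(egg, zs) for egg, zs in z_by_egg.items()]
all_z = [z for zs in z_by_egg.values() for z in zs]
rows.append(summary_row("all eggs", all_z))

for label, df, chisq in rows:
    print(f"{label:<10}{df:>4}{chisq:>10.3f}")

# The list below is what a plotting routine would draw against block number.
print(cumulative_deviation(z_by_egg["egg_a"]))
```

A curve that wanders near zero indicates behavior consistent with expectation, while a sustained trend away from zero reflects an accumulating deviation.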

You can see examples of such graphs on the website under Selected Examples. More directly useful information may be found on the page with the general extract form. That page has a link to the Basket Data File Format, which gives details of the CSV data files and also has links to a zipped file containing the primary functions for data processing. These include some Perl scripts which are directly usable but may also be a good starting point for other scripts. In addition, the scripts that generate the automatic tables and graphs may be valuable models, and I will provide access to them if this would be helpful.

I hope this is sufficient information to get you started. Probably you will have other questions, or need certain specific information. Please let me know what I can do to be helpful.