Explanation of Hierarchical Temporal Memory Trace Formation

Hierarchical Spatiotemporal Sparse Distributed Memory Trace of a Sequence

This animation shows the progression through time of a hierarchical temporal sparse distributed representation (SDR) memory trace representing the input sequence, “BALL”. NOTE: This figure is unfolded in time: the blue (horizontal) links are actually recurrent. A main point of this figure is that the three input sources to a level, bottom-up (U), horizontal (H), and top-down (D), combine on each time step to determine the code that becomes active at that level. Note in particular that each L2 code lasts ("persists") for two time steps and the L3 code persists for all four time steps.
This model has an input level that uses a localist representation: a 4x4 grid of binary pixels. Each internal level consists of one macrocolumn ("mac") composed of a set of winner-take-all (WTA) minicolumns [N.b.: In general, a level will be a tiling of many macs, e.g., the tiling of hypercolumns of V1]. The L1, L2, and L3 macs consist of 12, 9, and 6 minicolumns, respectively. The figure shows the temporal trace unfolded (unrolled) in time. This is therefore a recurrent model, not a model in which time is explicitly spatialized, e.g., Waibel et al.'s Time Delay Neural Network.
Black arrows show the propagation of U signals via the feedforward synaptic matrices from one level of the cortical hierarchy to the next, e.g., V1 to V2 to V4, via the U-fibers of the sub-cortical white matter (i.e., the cortex's "cable basement"). These signals are shown propagating up through all levels within the same time step, t. It is known that this upward pass of signals in cortex takes about 100-150 ms, i.e., one Theta cycle. Our view is therefore that the cortex as a whole updates its overall representation of the world every Theta cycle, so each t is one Theta cycle. [N.b.: Even though the overall cortical representation "refreshes" at Theta, the macs at higher levels are refreshed at progressively higher multiples of the Theta period (determined by their persistence parameters). Furthermore, it is possible, in any given mac, that the computation that combines the U, H, and D signals to determine the next code results in the code active during the current persistence period being reinstated for the next persistence period. Thus, longer-timescale aspects of the world can be maintained while their shorter-timescale components update at higher frequencies.]
Blue arrows show the propagation of H signals via the recurrent H synaptic matrix in each internal level. These signals are shown originating from a code active at t and arriving back at the cells of the same mac at t+1. That is, the figure is unfolded in time.
Magenta arrows show propagating D signals, originating from the superjacent code active on the same time step.
This figure serves equally well as a picture of the initial formation (learning) of the trace in response to “BALL” and as a picture of the reinstatement (retrieval) of a previously learned trace of “BALL”.

You can see the forward (upward) sweep of activation on each time step. In the real cortex, there may be 7-10 levels. We envision that each level does its processing in a gamma cycle and that the theta cycle corresponds to one sweep through all levels.
The processing that takes place at each level is the evaluation of the total input (U, D, and H signals), the calculation of the familiarity, G, of that total (context-dependent) input, and the choice of a code to become active to represent that total input. See my 2010 and 2014 papers for descriptions of the processing.
The pictorial convention is that at the beginning of each new time step, the whole network appears showing the codes that were active at the end of the prior time step. As the signals propagate and the code selection algorithm executes at each level, the code may be changed, in which case you will see the new code become active while the prior code fades out. On the other hand, if the age of the code at a given level is less than its persistence, it remains active. This is shown as a momentary pulsing (blinking) of the code. Again, the longer persistence of a code at level J+1 is what allows that code to become associatively linked with multiple (here, two) successive codes at level J.
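The persistence rule described above can be sketched as a simple update schedule. This is only an illustration: the function name and the list-based representation are hypothetical, and the persistence of 1 for L1 is an assumption (the text states only that L2 codes persist for two time steps and the L3 code for four).

```python
def update_schedule(persistence, n_steps):
    """For each time step, list the levels whose code is up for re-selection.

    persistence[j] is the number of time steps a code at level j+1 stays
    active. A level runs code selection only when its current code has
    reached that age; otherwise the code simply remains active.
    """
    schedule = []
    for t in range(n_steps):
        updating = [level + 1 for level, p in enumerate(persistence) if t % p == 0]
        schedule.append(updating)
    return schedule


# Assumed persistences: L1 = 1 (assumption), L2 = 2, L3 = 4 time steps.
schedule = update_schedule([1, 2, 4], n_steps=4)
for t, (letter, levels) in enumerate(zip("BALL", schedule)):
    print(f"t={t} input={letter!r} levels selecting a code: {levels}")
```

Under these assumptions, L1 selects a code on every step, L2 on steps 0 and 2, and L3 only on step 0, so each L2 code spans two successive L1 codes. Note that selecting a code at a persistence boundary may still reinstate the code that was already active.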

Note that each L2 code remains active for two time steps and the L3 code remains active for all four time steps.
Note, in particular, the convergence of U, D, and H signals at each internal level (only U and H at the top) on each time step. Each unit in a mac normalizes these inputs separately and then multiplies them (or possibly nonlinearly transformed versions of them) to determine the unit's overall local degree of support.
As suggested by this animation, the formation (learning) of a memory trace is essentially single-trial (one-shot). When presented with a novel sequence, the model will detect low familiarity (G~0.0) on each time step at each level. In this case, codes will be selected at random and full-strength associative connections will be made between them. When presented with a familiar sequence, the model will detect high familiarity (G~1.0) on each time step and at each level, which minimizes the amount of randomness in the code selection process, allowing the deterministic influence of prior learning to dominate code selection and, with high probability, reinstate the previously learned codes.
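A minimal sketch of this combination step follows. It rests on assumptions not given in the text: each raw input sum is normalized by dividing by a stated maximum possible sum, the optional nonlinear transform is a simple power, and the names normalize, local_support, and exponents are hypothetical.

```python
import numpy as np


def normalize(raw_sums, max_sum):
    """Scale each unit's raw input sum into [0, 1] (assumed normalization)."""
    raw = np.asarray(raw_sums, dtype=float)
    return np.clip(raw / max_sum, 0.0, 1.0)


def local_support(sources, exponents=None):
    """Multiply the separately normalized inputs available at a level.

    sources maps a source name ("U", "H", "D") to a pair
    (raw sums per unit, maximum possible sum). A source that is absent,
    such as D at the top level, is simply left out of the product.
    """
    exponents = exponents or {}
    n_units = len(next(iter(sources.values()))[0])
    support = np.ones(n_units)
    for name, (raw, max_sum) in sources.items():
        # Optional nonlinear transform: raise the normalized input to a power.
        support *= normalize(raw, max_sum) ** exponents.get(name, 1.0)
    return support


# Three units in one minicolumn; the numbers are made up for illustration.
internal = local_support({"U": ([4, 2, 0], 4), "H": ([1, 2, 2], 2), "D": ([2, 1, 2], 2)})
top = local_support({"U": ([4, 2, 0], 4), "H": ([1, 2, 2], 2)})
print(internal, top)
```

Because the inputs are multiplied rather than added, a unit with no support from any one source (the third unit here, with zero U input) ends up with zero overall support, whatever the other sources say.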
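The effect of G on code selection can be illustrated with a deliberately simplified stand-in. This is not the code selection algorithm of the 2010 and 2014 papers: here G just mixes a uniform distribution with a winner-take-all choice inside each minicolumn, G is taken as given rather than computed, and all function names are hypothetical.

```python
import numpy as np


def winner_probabilities(support, G):
    """Mix a uniform draw (G = 0) with a deterministic argmax choice (G = 1)."""
    support = np.asarray(support, dtype=float)
    uniform = np.full(support.shape, 1.0 / support.size)
    greedy = np.zeros(support.shape)
    greedy[np.argmax(support)] = 1.0
    return (1.0 - G) * uniform + G * greedy


def select_code(support_per_minicolumn, G, rng):
    """Pick one winning unit in each WTA minicolumn; the code is the set of winners."""
    return [
        int(rng.choice(len(s), p=winner_probabilities(s, G)))
        for s in support_per_minicolumn
    ]


rng = np.random.default_rng(0)
# Two minicolumns of three units each; support values are made up.
support = [[0.1, 0.9, 0.2], [0.7, 0.1, 0.3]]
novel = select_code(support, G=0.0, rng=rng)     # winners drawn uniformly at random
familiar = select_code(support, G=1.0, rng=rng)  # winners are the best-supported units
print(novel, familiar)
```

With G = 0 every unit in a minicolumn is equally likely to win, which corresponds to assigning a random code to a novel input. With G = 1 the best-supported unit wins in every minicolumn, so the code favored by prior learning is reinstated.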
