The video below illustrates the novel principles by which Sparsey's code selection algorithm (CSA), originally described in my 1996 thesis and improved in several publications since, e.g., Rinkus (2010), Rinkus (2014), and Rinkus (2017), causes similar inputs to be mapped to similar codes (SISC), where the similarity metric may in general be spatiotemporal and/or multimodal.  The first principle is that the codes are sparse distributed codes (SDCs), a.k.a. sparse distributed representations (SDRs), or Hebbian cell assemblies.  In Sparsey, an SDC coding field is a set of Q winner-take-all (WTA) competitive modules (CMs), each with K binary units.  Thus a code is a set of Q units, one per CM.  And the natural similarity metric for SDCs is size of intersection.  Given this representational format, SISC can be achieved simply by making the scalar level of noise in the code choice process vary inversely with the familiarity (directly with the novelty) of the input.  This video and discussion and the corresponding DOWNLOADABLE APP explain this in more detail.  The app in particular allows the user to play with parameters and understand the computationally simple means by which Sparsey achieves SISC. To run the app, download the zip file (a compressed Java "dist" folder), extract all, cd into the extracted dist folder, and double-click the jar file.

The idea that similar inputs should be mapped to similar codes is uncontentious.  However, what makes Sparsey unique is that its learning algorithm creates new codes, adhering to SISC, in fixed time, i.e., with a number of operations that remains constant as the number of stored items grows. A corollary is that when learning (creating the code of) a new input X, Sparsey does not explicitly compare X to each stored input, nor even to a log number of the stored inputs (as is the case for any tree data structure). Furthermore, the fact that similarity is preserved during learning means that the most similar stored input can also be retrieved in fixed time. Sparsey is thus a direct alternative to locality sensitive hashing (LSH). In fact, it is simpler, representationally richer [it captures arbitrarily graded degrees of similarity, whereas LSH theory is based on distinguishing close vs. far (a binary distinction)], applies equally well to both spatial and spatiotemporal (sequence) domains, learns the similarity metric (hash) directly from the data (based on single-trial, on-line learning), and is biologically plausible.

At the start of the video, the coding field ("Mac") at lower right contains Q=10 WTA competitive modules (CMs), each with K=8 binary units. CM 1 has a yellow background to indicate that the V-to-μ plot (upper left) and "single CM" panel (lower left) are showing the details of CM 1 specifically. For each CM (in the Mac panel), we show the V and ρ distributions. The V distribution in a CM is generated as follows.

  1. One cell is randomly chosen to have the max V value, i.e., the current value of the "Max V" (or equivalently, "G") slider, which is V=1 at the start of the video. A unit's V value is a measure of its input summation (normalized to [0,1]), i.e., of how closely its current input matches its afferent weight pattern (tuning), i.e., the "evidence" or "support" that it should become active. We do not show inputs explicitly in this application. That is, the V values that the user selects (or that are automatically selected based on distribution parameters, see below) implicitly reflect some history of inputs, and thus of prior weight increases, but the specifics of those implied inputs and weight increases are not needed to illustrate how similarity is preserved.
  2. The V values of the other K-1 cells in the CM are drawn uniformly from the crosstalk range, determined by the min and max crosstalk sliders at upper right, which are set to 10% and 30% at the start of this video (see the sketch after this list).  Indeed, the crosstalk parameters impose further implicit constraints on the history of inputs, i.e., higher average crosstalk implies that more inputs have been experienced.
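
For concreteness, here is a minimal sketch (in Python, not the app's actual Java code) of how a V distribution with these properties could be generated; Q, K, and the crosstalk limits are taken from the settings described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch: generate the V distribution for a mac with Q CMs of K
# cells each, following steps 1-2 above.
Q, K = 10, 8            # CMs per mac, cells per CM (as in the video)
max_V = 1.0             # current "Max V" / "G" slider value
lo, hi = 0.10, 0.30     # min / max crosstalk sliders

V = rng.uniform(lo, hi, size=(Q, K))   # crosstalk V values for all cells
winners = rng.integers(0, K, size=Q)   # one randomly chosen cell per CM...
V[np.arange(Q), winners] = max_V       # ...gets the max V value
```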

Above each CM's V distribution is its ρ distribution. Sparsey chooses/selects/activates a code in a mac by choosing one cell from the ρ distribution in each CM (i.e., softmax). The ρ distributions shown are those produced by putting the V values through the V-to-μ transform (a sigmoid, but collapsible to a constant function, depending on G) and renormalizing the μ's to ρ's.
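
The exact sigmoid Sparsey uses, and its range-multiplier, eccentricity, and inflection-point parameterization, is not spelled out on this page, so the following sketch assumes a generic logistic whose output range scales with G (the function and parameter names are mine, not the app's). What matters here is only the overall V → μ → ρ pipeline and the fact that as G → 0 the transform collapses to a constant, making ρ uniform.

```python
import numpy as np

def v_to_mu(V, G, ecc=10.0, inflect=0.5, mult=100.0):
    """Assumed V-to-mu transform: a sigmoid in V whose output range scales
    with G, collapsing toward a constant function as G -> 0.  The parameters
    stand in for the app's eccentricity, inflection-point, and
    range-multiplier sliders; the exact form is an assumption."""
    sig = 1.0 / (1.0 + np.exp(-ecc * (V - inflect)))   # expansive sigmoid in V
    return 1.0 + G * mult * sig                        # range shrinks as G -> 0

def mu_to_rho(mu):
    """Renormalize one CM's mu values into a win-probability distribution."""
    return mu / mu.sum()

# One CM's V values -> rho (softmax-style choice distribution) -> one winner
V_cm = np.array([1.0, 0.12, 0.25, 0.18, 0.29, 0.11, 0.22, 0.15])
rho = mu_to_rho(v_to_mu(V_cm, G=1.0))
winner = np.random.default_rng(0).choice(len(V_cm), p=rho)
print(rho.round(3), winner)
```

With G=1 most of the probability mass lands on the max-V cell; with G=0 every cell's μ equals the same constant, so ρ is uniform and the choice is pure noise.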

You can see that some of the ρ bars are pink. That indicates an error, i.e., a CM in which the cell with the highest V was not ultimately chosen by the softmax on the ρ distribution. Why do we consider this an error? Because the fact that there is a cell in each CM with a high V implies that the current input is highly familiar. In particular, if there is a cell with V=1 in every CM, this means that the binary weights from all features comprising the current input (which again, is not being explicitly modeled here) were increased on some prior occasion. Assuming the mac is in a regime (phase of life) such that not too many codes have been stored, then this situation, i.e., every CM has a cell with V=1 (which further entails that G=1), strongly indicates that the current input has been seen before, and that the set of Q cells (one per CM) with V=1 was the code assigned to it on that prior occasion (at which time the weights from that input's active features to those Q cells were increased). So, that is why we can consider CMs in which the cell with max V was not chosen to be errors.

With the above in mind, when "Max V" is set to 1, the "Actual Accuracy" field reports the fraction of the Q CMs in which the correct cell, i.e., the cell with highest V, wins. Note that we can consider the (softmax) choice process in each CM to be a Bernoulli trial, where success probability is just the ρ value of the max-V cell, which we can call the max-ρ value, and the failure probability is 1 - max-ρ. Thus, the "Expected Accuracy" field reports the average of the Q max ρ values.
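
In symbols, writing ρ_q^max for the ρ value of the max-V cell in CM q:

$$\text{Expected Accuracy} \;=\; \frac{1}{Q}\sum_{q=1}^{Q}\rho_q^{\max},$$

and, since each CM's choice is a separate Bernoulli trial, the expected number of CMs in which the max-V cell wins is Q times this value.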

But now consider the situation when we lower the "Max V" slider. A lower max V means that the current input has not been seen exactly before. Again, assuming the regime where not too many codes have been stored in the mac, if the max V is 0.5, that implies that the current input has about half of its features in common with the closest-matching previously stored input. A max V of 0.75 means that the current input has about 75% of its features in common with the closest-matching stored input, etc. So, in order to enforce SISC, i.e., preserve similarity from the input space into the code space, what we want is that as we lower the "Max V" slider, while keeping the max and min crosstalk limits constant, the Expected Accuracy is lowered too. That is, as max V decreases, we want the probability of picking the max-ρ cell (which is also the max-V cell) in each CM to decrease as well. This has the effect of decreasing the expected intersection of the code assigned to the current input with the code of the most closely-matching stored input. Sparsey reduces the probability of choosing the max-ρ cell by squashing the V-to-μ sigmoid. This squashing of the sigmoid has the effect of flattening the ρ distributions, which can be viewed as titrating the amount of noise in the individual softmaxes and in the selection of the overall code.
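
Putting the two sketches above together, the following (again under the assumed generic sigmoid, not Sparsey's exact equations) shows the expected accuracy falling as the "Max V" slider is lowered while the crosstalk limits are held at 10% and 30%.

```python
import numpy as np

rng = np.random.default_rng(1)
Q, K, lo, hi = 10, 8, 0.10, 0.30
ecc, inflect, mult = 10.0, 0.5, 100.0    # assumed sigmoid parameters

def expected_accuracy(max_V):
    V = rng.uniform(lo, hi, size=(Q, K))
    V[:, 0] = max_V                      # the max-V cell in each CM
    G = max_V                            # G = average per-CM max V = max_V here
    mu = 1.0 + G * mult / (1.0 + np.exp(-ecc * (V - inflect)))
    rho = mu / mu.sum(axis=1, keepdims=True)
    return rho[:, 0].mean()              # average max-rho over the Q CMs

for max_V in (1.0, 0.75, 0.5, 0.25):
    print(max_V, round(expected_accuracy(max_V), 3))   # falls as max_V falls
```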

So, with the above in mind, if you now watch the video, what you see is that as the "Max V" slider is lowered smoothly (you can see the black bars in the V plots of all the CMs in the Mac panel drop together), the "Expected Accuracy", i.e., the expected intersection with the closest-matching stored code, decreases smoothly. This constitutes a demonstration of SISC. Note that as you decrease the max V you will also see the number of pink ρ bars increase. (Actually, each time the "Max V" slider moves a little bit, a new draw is made. That's why you see the pink bars updating quite a bit. To make this clearer, you might want to put the cursor somewhere on the Max V slider and just do successive clicks; each click will change the max V value slightly and generate a new sample.) If you download the app, you can experiment with the various parameters (eccentricity, range multiplier, and inflection pt sliders) and see how they affect the V-to-μ transform and thus the "Expected Accuracy".

Note 1: The V-to-μ transform is modulated as a function of the global familiarity (G) of the input, which is defined simply as the average maximum V value across the Q WTA competitive modules (CMs) that comprise a Sparsey coding field ("Mac"). This V-to-μ transform models the nonlinearity of a neuron, and Sparsey is (to my knowledge) unique in proposing that the nonlinearity of the principal cells of a cortical representational field is modulated on-line and quickly, in fact within a few, e.g., 5-10, ms (i.e., within a half-cycle of gamma), on the basis of a "mesoscale" measure of the subsuming representational field (e.g., the L2/3 population of a macrocolumn).
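
In symbols, with V_{q,k} the V value of cell k in CM q:

$$G \;=\; \frac{1}{Q}\sum_{q=1}^{Q}\max_{k}\,V_{q,k}.$$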

Note 2: The modulation of the V-to-μ transform, from a highly expansive sigmoid (when G ≈ 1, which means the input is maximally familiar, i.e., that it has likely been seen before) to the constant function (when G ≈ 0, which means the input is completely unfamiliar), can be viewed as modulating the amount of noise in the process of choosing winners in the CMs. I postulate that this variable "noise" mechanism is implemented by neuromodulators, e.g., ACh and NE (See 2010 paper).

So, what does the user of this app have to do to see how SISC is being preserved?

  • 1. Add a few cells (say, K=6) to the upper left chart by clicking in it K times (you can click at different V locations, but technically that's not necessary, since, as you will see, the cells' V values will be overridden in subsequent steps).
  • 2. Click on the "Generate New Sample" button. This populates the Q CMs in the lower right panel with K cells each. You will see that there is one cell in each CM that has a black bar in the V chart and that the height of that bar is V=1. The rest of the cells in each CM have gray bars and their V values are randomly drawn from the uniform distribution of V values between the "Min Crosstalk" and "Max Crosstalk" limits set by the two sliders at upper right.
  • 3. As explained above, the fact that there is a cell with V=1 in each CM, implies that the current input (which again, is not explicitly shown in this app) has been seen before, i.e., it is perfectly familiar. The fact that other cells have random V values between the low and high crosstalk limits implies that these other cells have been active in other codes and that the inputs corresponding to those codes have some features in common with the current input. (Note: There is a passive decay mechanism used in Sparsey, but that is not needed for the current discussion.)
  • 4. You will note that there is a particular value in the "Expected Accuracy" field. That is the average probability that the cell with the max V in its CM wins, averaged across the Q CMs. It reflects the current actual distributions of V values in the CMs and the current parameters of the V-to-μ transform. Thus, the expected number of CMs in which the max V cell wins is just that value, the expected accuracy, times Q.
  • 5. If you now slide the "Max V" slider from 1.0 down (i.e., as you consider inputs that are progressively less similar to their closest-matching stored input), you will see the expected accuracy field, and thus the expected intersection with the code of the most closely-matching stored input, smoothly decrease. As you slide it back up, the expected accuracy smoothly increases. That's it. That demonstrates SISC!

The results below show how expected intersection size (as % of Q) decreases smoothly as the novelty of the current input increases (i.e., as the similarity of the current input to the closest-matching stored input decreases). Seven experiments were run, using different parameter settings for the V-to-μ transform. "mult" refers to the "range multiplier" slider in the app, "ecc" to the "sigmoid eccentricity" slider, and "inflect" to the "Horiz. Inflection Pt. Position" slider.  All experiments except the last used a Mac with Q=10 CMs, each with K=8 units, and min and max crosstalk limits of 10% and 30%; the last used Q=20 CMs, each with K=10 units, and min and max crosstalk limits of 20% and 50%. The graded similarity preservation can be seen in all cases. Sparsey preserves similarity from input space to SDR intersection space.

You can do many other things with the app as well. For instance, for any particular "Max V" (or equivalently, "G") setting, you can vary the other parameters to see how they also cause smooth variation in the expected accuracy. Thus, it is not only the V-to-μ transform's range that could be modulated based on G; we could also choose to modulate these other parameters, e.g., sigmoid eccentricity, based on G. This is a subject of ongoing/future research. One interesting manipulation is explorable by playing with the "Phase of Life" buttons and is described below.

The "Phase of Life" buttons

The four "Phase of Life" radio buttons simulate conditions within a mac that might exist at different points of the system's lifetime. Since Sparsey's mac is an associative memory in which SDRs (memory traces) are stored in superposition, as more and more SDRs are stored, the expected intersection size over the stored SDRs will increase, thus crosstalk interference will increase. As noted above, the max V value (of one randomly chosen cell in each CM) is set by the user's setting of the "Max V" or "G" sliders (they are locked together). The rest of the cells in each CM are drawn from uniform distributions whose low and high limits are set either by the "Life Phase" buttons, or by the user setting the sliders at upper right (as was done in the video immediately above). Also, the user can choose whether those crosstalk distribution limits are determined relative to the currently spececified max V or "absolutely" (i.e., relative to the max possible max V, i.e., V=1).

  • Early: The idea is that early in the model's life, when few memories have been stored, there will be little crosstalk (interference, overlap) between memory traces. To simulate this, the V values of the other cells are chosen randomly from the interval [0,0.1].
  • Middle: In this case, the other V values are selected from the interval [0.1,0.5]. This simulates a later period of life after a lot more inputs have been stored and any given cell will have been used in the codes of many of those inputs.
  • Late: The other cells' V values are chosen from interval [0.2,0.8], indicating mounting crosstalk between traces.
  • Old: The model has stored so many traces that crosstalk is very high. Specifically, the K-1 other V values are chosen from interval [0.6,0.9].
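
A minimal sketch of these presets (the intervals are those given above; the code and names are illustrative, not the app's):

```python
import numpy as np

# Each "Phase of Life" button just changes the interval from which the K-1
# non-max V values in every CM are drawn.
PHASE_CROSSTALK = {
    "Early":  (0.0, 0.1),
    "Middle": (0.1, 0.5),
    "Late":   (0.2, 0.8),
    "Old":    (0.6, 0.9),
}

def sample_V(phase, Q=10, K=8, max_V=1.0, seed=2):
    rng = np.random.default_rng(seed)
    lo, hi = PHASE_CROSSTALK[phase]
    V = rng.uniform(lo, hi, size=(Q, K))
    V[:, 0] = max_V        # one max-V cell per CM (its position is arbitrary)
    return V
```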

A major principle to see here is that, for any given set of sigmoid parameters, increasing crosstalk reduces the expected number of CMs in which the cell with the highest V value ends up being chosen winner. When the input is perfectly familiar, i.e., when G=1, each CM in which the max-V cell is not chosen constitutes an error, i.e., the wrong cell has been activated in that CM. However, a code consists of Q units; thus, making a few unit-level errors may still allow the code, as a whole, to exert the proper influence on downstream computations. More generally, the fact that in Sparsey the Q units that comprise a code are chosen in independent processes is what allows for graded levels of similarity (intersection) between codes, i.e., the SISC property. Note that although we don't actually make the final winner choices in this video, the gradual flattening of the ρ distributions as one goes from "Early" to "Old" implies the graded reduction in the expected number of max-V cells that end up winning in their respective CMs.

So, what if we move the sigmoid inflection point, let's call that parameter Y, to the right as the coding field ages, i.e., as more and more SDR codes are stored in it and the level of crosstalk interference rises? For example, select "Old". You can see that expected accuracy is low. But if you then slide Y to the right, you can see that expected accuracy recovers. On the other hand, suppose you select "Early". You will see highly peaked distributions in each CM [assuming you've populated the mac (see above)]. If you now slide the Y slider around, you will see that the range of Y for which the correct cells (those with the max V in their CM) have a very high probability of winning is very large, extending from just over 10% up to 100%. Now select the "Middle" button. Note that the model can still ensure a high probability of reactivating the correct code, but that the range of Y for which this is the case has contracted to the right, typically spanning from about 50% to 100%. Click on "Late" and the range of Y yielding a high probability of success shrinks further. So, simply shifting the sigmoid's inflection point to higher V values recovers/preserves retrieval accuracy.
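
As a sketch of this effect, again under the assumed generic sigmoid (with a fairly steep eccentricity chosen so the effect is easy to see), shifting the inflection point Y toward higher V values restores a high average max-ρ even under "Old"-level crosstalk:

```python
import numpy as np

rng = np.random.default_rng(3)
Q, K, G, ecc, mult = 10, 8, 1.0, 40.0, 100.0   # assumed sigmoid parameters
V = rng.uniform(0.6, 0.9, size=(Q, K))         # "Old"-phase crosstalk
V[:, 0] = 1.0                                  # max-V cell in each CM

for Y in (0.5, 0.7, 0.9, 0.95):                # slide the inflection point right
    mu = 1.0 + G * mult / (1.0 + np.exp(-ecc * (V - Y)))
    rho = mu / mu.sum(axis=1, keepdims=True)
    print(Y, round(rho[:, 0].mean(), 3))       # expected accuracy rises with Y
```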

The question arises: why not simply keep Y at a high value throughout the model's life? The answer is that there is a tradeoff involving Y. When Y is very high, even fairly high V values will be squashed to near-zero ρ values. This greatly diminishes the propensity of the model to assign similar (i.e., more overlapped) codes to similar moments. Thus, the gradedness of the categories that the model would embed would be much reduced. Keeping Y at or near 100% would simulate a person who stored almost all new experiences very uniquely, but was impoverished at forming categories based on similarity. Such a person would have very (i.e., too) strong episodic memory ability/capacity and weak semantic memory ability, perhaps something like a savant syndrome, e.g., Luria's famous patient S, "the mnemonist".

Y should depend on how saturated a region's afferent synaptic matrices have become, i.e., what fraction of its synapses have been increased to the high setting, e.g., w=1, assuming binary synapses. I think that such a neurophysiological parameter would be quite easy for the brain to keep track of through the life of an organism. Moreover, it seems reasonable to believe that such a "degree of saturation" parameter could be maintained on a region-by-region (or macrocolumn-by-macrocolumn) basis across cortex. Thus, for example, the brain region(s) most used in storing the memory/knowledge of a given individual's particular field of expertise might fill up faster than other regions.

Sigmoid eccentricity parameter

The Sigmoid Eccentricity parameter simply controls how abruptly the V-to-μ map changes from its low to its higher values. It also affects the granularity of the categories (spatial or spatiotemporal) formed by the model. However, we will not discuss it further here...except to repeat that it too can be modulated by G to achieve/enforce SISC.

Summary

The goal of this page, the video, and the downloadable Java app is to show how Sparsey's Code Selection Algorithm (CSA) achieves/enforces SISC extremely efficiently. It does so by transforming a local (cell-level) measure of support, V (essentially a cell's input summation, but normalized to [0,1]), over a competitive population of cells, using global (macrocolumn-level) information, G, into a distribution of relative likelihoods of becoming active (i.e., of winning the competition), μ, and then, by simple normalization, into a final probability distribution of winning, ρ. Crucially, this transform, specifically its range, is modulated by G in a way that enforces that more similar inputs are mapped to more similar (more highly intersecting) SDCs.