### 2018

- Rinkus, G. (Submitted abstract) First Spike Combinatorial Coding: The Key to Brain’s Computational Efficiency. Submitted to Cognitive Computing 2018, Hannover.
**Abstract:** For a single source neuron, spike coding schemes can be based on rate or on precise spike time(s) relative to an event, e.g., to a particular phase of gamma. Both are fundamentally temporal, requiring a decode window duration *T* much longer than a single spike. But, if information is represented by population activity (distributed codes, cell assemblies) then messages are carried by populations of spikes propagating in bundles of axons. This allows an *atemporal* coding scheme where the signal is encoded in the *instantaneous sum* of *simultaneously* arriving spikes, in principle, allowing *T* to shrink to the duration of a single spike. In one type of atemporal population coding scheme, the fraction of active neurons in a source population (thus, the fraction of active afferent synapses) carries the message. However, any single message carried by this variable-size code can represent only *one* value (signal). In contrast, if the source field uses fixed-size, combinatorial coding, then any one active code can represent *multiple* values, in fact, the *entire likelihood distribution* over all values, e.g., of a scalar variable, stored in the field. Consequently, the vector of single, e.g., *first*, spikes sent by such a code can *simultaneously* transmit the full distribution. Combining fixed-size combinatorial coding and first-spike coding may be key to explaining the speed and energy efficiency of probabilistic computation in the brain.
- Rinkus, G. (2018) Sparse distributed representation, hierarchy, critical periods, metaplasticity: the keys to lifelong fixed-time learning and best-match retrieval. (Accepted Talk) Biological Distributed Algorithms 2018 (London). (Abstract)
**Abstract:** Among the more important hallmarks of human intelligence, which any artificial general intelligence (AGI) should have, are the following. 1. It must be capable of on-line learning, including with single/few trials. 2. Memories/knowledge must be permanent over lifelong durations, safe from catastrophic forgetting. Some confabulation, i.e., semantically plausible retrieval errors, may gradually accumulate over time. 3. The time to both: a) learn a new item; and b) retrieve the best-matching / most relevant item(s), i.e., do similarity-based retrieval, must remain constant throughout the lifetime. 4. The system should never become full: it must remain able to store new information, i.e., make new permanent memories, throughout very long lifetimes. No artificial computational system has been shown to have all these properties. Here, we describe a neuromorphic associative memory model, Sparsey, which does, in principle, possess them all. We cite prior results supporting possession of hallmarks 1 and 3 and sketch an argument, hinging on strongly recursive, hierarchical, part-whole compositional structure of natural data, that Sparsey also possesses hallmarks 2 and 4.

### 2017

- A Radically Novel Explanation of Probabilistic Computing in the Brain. Invited Talk to Xaq Pitkow Lab Weekly Seminar Dec. 18, 2017.
**Abstract:** It is widely believed that the brain computes probabilistically and via some form of population coding. I describe a concept and mechanism of probabilistic computation and learning that differs radically from existing probabilistic population coding (PPC) models. The theory, Sparsey, is based on the idea that items of information (e.g., concepts) are represented as sparse distributed representations (SDRs), i.e., relatively small subsets of cells chosen from a much larger field, where the subsets may overlap, cf. ‘cell assembly’, ‘summary statistic’ (Pitkow & Angelaki, 2017). A Sparsey coding field consists of *Q* WTA competitive modules (CMs), each consisting of *K* units. Thus, all codes are of fixed size, *Q*, and the code space is *K*^*Q*. This allows an extremely simple way to represent the likelihood/probability of a concept: the probability of concept X is simply the fraction of X’s SDR code present in the currently active code. But to make sense, this requires that more similar concepts map to more similar (more highly intersecting) codes (“SISC” property). If SISC is enforced, then any single active SDR code simultaneously represents both a particular concept (at 100% likelihood) and the entire likelihood distribution over all concepts stored in the field (with likelihoods proportional to the sizes of their codes’ intersections with the currently active code). The core of Sparsey is a learning/inference algorithm, the code selection algorithm (CSA), which ensures SISC and which runs in fixed time, i.e., the number of operations needed both to learn (store) a new item and to retrieve the best-matching stored item remains constant as the number of stored items increases (cf. locality-sensitive hashing). Since any SDR code represents the entire distribution, the CSA also realizes fixed-time ‘belief update’.
I will describe the CSA and address the neurobiological correspondence of the theory’s elements/processes and highlight relationships with Pitkow & Angelaki, 2017.
- Superposed Episodic and Semantic Memory via Sparse Distributed Representation (arXiv 2017, Submitted)
- The Brain’s Computational Efficiency derives from using Sparse Distributed Representations. Rejected from Cognitive Computational Neuroscience 2017.
**Abstract:** Machine learning (ML) representation formats have been dominated by: a) localism, wherein individual items are represented by single units, e.g., Bayes Nets, HMMs; and b) fully distributed representations (FDR), wherein items are represented by unique activation patterns over all the units, e.g., Deep Learning (DL) and its progenitors. DL has had great success vis-a-vis classification accuracy and learning complex mappings (e.g., AlphaGo). But, without massive machine parallelism (MP), e.g., GPUs, TPUs, and thus high power, DL learning is intractably slow. The brain is also massively parallel, but uses only 20 watts and moreover, the forms of MP used in DL, model / data parallelism and shared parameters, are patently non-biological, suggesting DL’s core principles do not emulate biological intelligence. We claim that a basic disconnect between DL/ML and biology and the key to biological intelligence is that instead of FDR or localism, the brain uses sparse distributed representations (SDR), i.e., “cell assemblies”, wherein items are represented by small sets of binary units, which may overlap, and where the pattern of overlaps embeds the similarity/statistical structure (generative model) of the domain. We’ve previously described an SDR-based, extremely efficient, one-shot learning algorithm in which the primary operation is permanent storage of experienced events based on single trials (episodic memory), but in which the generative model (semantic memory, classification) emerges automatically, and as a computationally free, in terms of time and power, side effect of the episodic storage process. Here, we discuss fundamental differences between the mainstream localist/FDR-based and our SDR-based approaches.
- A Radically new Theory of How the Brain Represents and Computes with Probabilities. (arXiv)
**Abstract:** The brain is believed to implement probabilistic reasoning and to represent information via population, or distributed, coding. Most previous population-based probabilistic (PPC) theories share several basic properties: 1) continuous-valued neurons; 2) fully(densely)-distributed codes, i.e., all(most) units participate in every code; 3) graded synapses; 4) rate coding; 5) units have innate unimodal tuning functions (TFs); 6) intrinsically noisy units; and 7) noise/correlation is considered harmful. We present a radically different theory that assumes: 1) binary units; 2) only a small subset of units, i.e., a sparse distributed code (SDC) (cell assembly, ensemble), comprises any individual code; 3) binary synapses; 4) signaling formally requires only single (first) spikes; 5) units initially have completely flat TFs (all weights zero); 6) units are not inherently noisy; but rather 7) noise is a resource generated/used to cause similar inputs to map to similar codes, controlling a tradeoff between storage capacity and embedding the input space statistics in the pattern of intersections over stored codes, indirectly yielding correlation patterns. The theory, Sparsey, was introduced 20 years ago as a canonical cortical circuit/algorithm model, but not elaborated as an alternative to PPC theories. Here, we show that the active SDC simultaneously represents both the most similar/likely input and the coarsely-ranked distribution over all stored inputs (hypotheses). Crucially, Sparsey's code selection algorithm (CSA), used for both learning and inference, achieves this with a single pass over the weights for each successive item of a sequence, thus performing spatiotemporal pattern learning/inference with a number of steps that remains constant as the number of stored items increases. We also discuss our approach as a radically new implementation of graphical probability modeling.

Abstract: The abilities to perceive, learn, and use generalities, similarities, classes, i.e., semantic memory (SM), are central to cognition. Machine learning (ML), neural network, and AI research has been primarily driven by tasks requiring such abilities. However, another central facet of cognition, single-trial formation of permanent memories of experiences, i.e., episodic memory (EM), has had relatively little focus. Only recently has EM-like functionality been added to Deep Learning (DL) models, e.g., Neural Turing Machine, Memory Networks. However, in these cases: a) EM is implemented as a separate module, which entails substantial data movement (and so, time and power) between the DL net itself and EM; and b) individual items are stored localistically within the EM, precluding realizing the exponential representational efficiency of distributed over localist coding. We describe Sparsey, an unsupervised, hierarchical, spatial/spatiotemporal associative memory model differing fundamentally from mainstream ML models, most crucially, in its use of sparse distributed representations (SDRs), or, cell assemblies, which admits an extremely efficient, single-trial learning algorithm that maps input similarity into code space similarity (measured as intersection). SDRs of individual inputs are stored in superposition and because similarity is preserved, the patterns of intersections over the assigned codes reflect the similarity, i.e., statistical, structure, of all orders, not simply pairwise, over the inputs. Thus, SM, i.e., a generative model, is built as a computationally free side effect of the act of storing episodic memory traces of individual inputs, either spatial patterns or sequences. We report initial results on MNIST and on the Weizmann video event recognition benchmarks. While we have not yet attained SOTA class accuracy, learning takes only minutes on a single CPU.

### 2014

- Sparsey™: Event recognition via deep hierarchical sparse distributed codes. (2014) Frontiers in Computational Neuroscience. v. 8 December 2014 | doi: 10.3389/fncom.2014.00160 (Frontiers Link)
- Sparse Distributed Coding & Hierarchy: The Keys to Scalable Machine Intelligence. DARPA UPSIDE Year 1 Review Presentation. 3/11/14. (PPT)

Abstract: The visual cortex's hierarchical, multi-level organization is captured in many biologically inspired computational vision models, the general idea being that progressively larger scale (spatially/temporally) and more complex visual features are represented in progressively higher areas. However, most earlier models use localist representations (codes) in each representational field (which we equate with the cortical macrocolumn, "mac"), at each level. In localism, each represented feature/concept/event (hereinafter "item") is coded by a single unit. The model we describe, Sparsey, is hierarchical as well but crucially, it uses sparse distributed coding (SDC) in every mac in all levels. In SDC, each represented item is coded by a small subset of the mac's units. The SDCs of different items can overlap and the size of overlap between items can be used to represent their similarity. The difference between localism and SDC is crucial because SDC allows the two essential operations of associative memory, storing a new item and retrieving the best-matching stored item, to be done in fixed time for the life of the model. Since the model's core algorithm, which does both storage and retrieval (inference), makes a single pass over all macs on each time step, the overall model's storage/retrieval operation is also fixed-time, a criterion we consider essential for scalability to the huge ("Big Data") problems. A 2010 paper described a nonhierarchical version of this model in the context of purely spatial pattern processing. 
Here, we elaborate a fully hierarchical model (arbitrary numbers of levels and macs per level), describing novel model principles like progressive critical periods, dynamic modulation of principal cells' activation functions based on a mac-level familiarity measure, representation of multiple simultaneously active hypotheses, a novel method of time warp invariant recognition, and we report results showing learning/recognition of spatiotemporal patterns.

### 2013

- A cortical theory of super-efficient probabilistic inference based on sparse distributed representations. 22nd Annual Computational Neuroscience Meeting, Paris, July 13-18. BMC Neuroscience 2013, 14(Suppl 1):P324 (Abstract)
- Constant-Time Probabilistic Learning & Inference in Hierarchical Sparse Distributed Representations, Invited Talk at the Neuro-Inspired Computational Elements (NICE) Workshop, Sandia Labs, Albuquerque, NM, Feb 2013.

### 2012

- Probabilistic Computing via Sparse Distributed Representations. Invited Talk at Lyric Semiconductor Theory Seminar, Dec. 14, 2012.
- Quantum Computation via Sparse Distributed Representation. (2012) Gerard Rinkus. NeuroQuantology 10(2) 311-315. (NeuroQuantology Link)

Abstract: Quantum superposition states that any physical system simultaneously exists in all of its possible states, the number of which is exponential in the number of entities composing the system. The strength of presence of each possible state in the superposition—i.e., the probability with which it would be observed if measured—is represented by its probability amplitude coefficient. The assumption that these coefficients must be represented physically disjointly from each other, i.e., localistically, is nearly universal in the quantum theory/computing literature. Alternatively, these coefficients can be represented using sparse distributed representations (SDR), wherein each coefficient is represented by a small subset of an overall population of representational units and the subsets can overlap. Specifically, I consider an SDR model in which the overall population consists of Q clusters, each having K binary units, so that each coefficient is represented by a set of Q units, one per cluster. Thus, K^Q coefficients can be represented with KQ units. We can then consider the particular world state, X, whose coefficient’s representation, R(X), is the set of Q units active at time t to have the maximal probability and the probabilities of all other states, Y, to correspond to the size of the intersection of R(Y) and R(X). Thus, R(X) simultaneously serves both as the representation of the particular state, X, and as a probability distribution over all states. Thus, set intersection may be used to classically implement quantum superposition. If algorithms exist for which the time it takes to store (learn) new representations and to find the closest-matching stored representation (probabilistic inference) remains constant as additional representations are stored, this would meet the criterion of quantum computing. Such algorithms, based on SDR, have already been described. 
They achieve this "quantum speed-up" with no new esoteric technology, and in fact, on a single-processor, classical (Von Neumann) computer.
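
The representation described in this abstract can be illustrated with a minimal Python sketch. It is not the published algorithm: the sizes `Q` and `K`, the helper names, the randomly drawn codes, and the normalization of intersection sizes into a distribution are all assumptions made here for illustration.

```python
import random

Q, K = 8, 4  # hypothetical sizes: Q clusters, each with K binary units


def random_code(rng):
    """A code is one active unit per cluster, stored as Q winner indices."""
    return tuple(rng.randrange(K) for _ in range(Q))


def overlap(code_a, code_b):
    """Intersection size: number of clusters in which the winners agree."""
    return sum(a == b for a, b in zip(code_a, code_b))


def distribution(active, stored):
    """Normalize each stored code's overlap with the active code.

    Normalizing by the sum is an assumption of this sketch; the abstract
    only says probabilities correspond to intersection sizes.
    """
    raw = {name: overlap(active, code) for name, code in stored.items()}
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in raw}
    return {name: value / total for name, value in raw.items()}


rng = random.Random(0)
# Random codes stand in for learned ones, so overlaps here are arbitrary.
stored = {name: random_code(rng) for name in ("X", "Y", "Z")}
dist = distribution(stored["X"], stored)

print(K ** Q, "possible codes from", K * Q, "units")  # 65536 from 32
print(dist)
```

With state X active, X overlaps itself in all `Q` clusters and so receives the largest share, while every other stored state is weighted by how many units it shares with X.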

### 2010

- A cortical sparse distributed coding model linking mini- and macrocolumn-scale functionality. (2010) Gerard Rinkus. Frontiers in Neuroanatomy 4:17. doi:10.3389/fnana.2010.00017 (Frontiers Link)

Abstract: No generic function for the minicolumn - i.e., one that would apply equally well to all cortical areas and species - has yet been proposed. I propose that the minicolumn does have a generic functionality, which only becomes clear when seen in the context of the function of the higher-level, subsuming unit, the macrocolumn. I propose that: (a) a macrocolumn's function is to store sparse distributed representations of its inputs and to be a recognizer of those inputs; and (b) the generic function of the minicolumn is to enforce macrocolumnar code sparseness. The minicolumn, defined here as a physically localized pool of approximately 20 L2/3 pyramidals, does this by acting as a winner-take-all (WTA) competitive module, implying that macrocolumnar codes consist of approximately 70 active L2/3 cells, assuming approximately 70 minicolumns per macrocolumn. I describe an algorithm for activating these codes during both learning and retrievals, which causes more similar inputs to map to more highly intersecting codes, a property which yields ultra-fast (immediate, first-shot) storage and retrieval. The algorithm achieves this by adding an amount of randomness (noise) into the code selection process, which is inversely proportional to an input's familiarity. I propose a possible mapping of the algorithm onto cortical circuitry, and adduce evidence for a neuromodulatory implementation of this familiarity-contingent noise mechanism. The model is distinguished from other recent columnar cortical circuit models in proposing a generic minicolumnar function in which a group of cells within the minicolumn, the L2/3 pyramidals, compete (WTA) to be part of the sparse distributed macrocolumnar code.
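
As a rough illustration of familiarity-contingent noise in winner-take-all modules, the sketch below picks one winner per module and makes the choice more random as familiarity drops. The familiarity measure, the linear mixing rule, and the function name `select_code` are assumptions of this sketch, not the algorithm described in the paper.

```python
import random


def select_code(inputs_per_cm, rng):
    """Pick one winner per competitive module (CM).

    inputs_per_cm: list of Q lists, each holding K normalized input
    sums in [0, 1]. Returns (familiarity, list of winner indices).
    """
    # Hypothetical familiarity measure: mean of each module's best match.
    familiarity = sum(max(cm) for cm in inputs_per_cm) / len(inputs_per_cm)
    code = []
    for cm in inputs_per_cm:
        if rng.random() < familiarity:
            # Familiar input: take the unit with the highest input sum.
            winner = max(range(len(cm)), key=cm.__getitem__)
        else:
            # Novel input: noise dominates, so pick a unit uniformly.
            winner = rng.randrange(len(cm))
        code.append(winner)
    return familiarity, code


rng = random.Random(1)
familiar = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
novel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(select_code(familiar, rng))  # familiarity 1.0, winners [1, 0]
print(select_code(novel, rng))     # familiarity 0.0, winners chosen at random
```

A fully familiar input reactivates its best-matching units deterministically, while a fully novel input gets an essentially random code, which keeps overlap with stored codes low.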

### 2009

- Familiarity-Contingent Probabilistic Sparse Distributed Code Selection in Cortex. (**in prep**, also see this page)
- Overcoding-and-Pruning: A Novel Neural Model of Temporal Chunking and Short-term Memory. (2009) Gerard Rinkus. Invited Talk in Gabriel Kreiman Lab, Dept. of Ophthalmology and Neuroscience, Children's Hospital, Boston, July 31, 2009.
- Overcoding-and-paring: a bufferless neural chunking model. (2009) Gerard Rinkus. Frontiers in Computational Neuroscience. Conference Abstract: Computational and systems neuroscience. (COSYNE '09) doi: 10.3389/conf.neuro.10.2009.03.292

### 2008

- Population Coding Using Familiarity-Contingent Noise. (abstract/poster) *AREADNE 2008: Research in Encoding And Decoding of Neural Ensembles*, Santorini, Greece, June 26-29. (abstract) (poster)
- Overcoding-and-pruning: A novel neural model of sequence chunking (manuscript **in prep**) **-- Patented**

Abstract: We present a radically new model of chunking, the process by which a monolithic representation emerges for a sequence of items, called overcoding-and-pruning (OP). Its core insight is this: if a sizable population of neurons is assigned to represent an ensuing sequence immediately, at sequence start, it can then be repeatedly pruned as a function of each successive item. This solves the problem of assigning unique chunk representations to sequences that start in the same way, e.g., "CAT" and "CAR", without requiring temporary buffering of the items' representations. OP rests on two well-supported assumptions: 1) information is represented in cortex by sparse distributed representations; and 2) neurons at progressively higher cortical stages have progressively longer activation duration, or persistence. We believe that this type of mechanism has been missed so far due to the historical bias of thinking in terms of localist representations, which cannot support it since pruning cannot be applied to a single representational unit.
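
The overcoding-and-pruning idea can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the pool size, the helper names, and especially the hash-based rule in `keeps`, which merely stands in for whatever item-dependent signals would drive pruning in the actual model.

```python
import hashlib


def keeps(unit, position, item):
    """Hypothetical pruning rule: a deterministic pseudo-random test.

    It depends only on the unit, the item, and the item's position, and
    retains roughly half of the units it is applied to.
    """
    digest = hashlib.sha256(f"{unit}:{position}:{item}".encode()).digest()
    return digest[0] < 128


def chunk_code(sequence, pool_size=256):
    """Return the active population after each pruning step."""
    active = set(range(pool_size))  # overcode: assigned at sequence start
    history = [set(active)]
    for position, item in enumerate(sequence):
        # Prune the surviving population as a function of the current item.
        active = {u for u in active if keeps(u, position, item)}
        history.append(set(active))
    return history


cat = chunk_code("CAT")
car = chunk_code("CAR")
print([len(step) for step in cat])  # population shrinks at every step
print(cat[2] == car[2])             # True: identical after the shared "CA"
print(len(cat[3] & car[3]))         # units the two final chunk codes share
```

No item is buffered at any point: the two sequences carry the same population through their shared prefix and only diverge when the third item prunes it differently.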

### 2007

- A Functional Role for the Minicolumn in Cortical Population Coding. Invited Talk at *Cortical Modularity and Autism*, University of Louisville, Louisville, KY, Oct 12-14, 2007. (PPT) (pdf) Animations do not show in pdf version.

### 2006

- Hierarchical Sparse Distributed Representations of Sequence Recall and Recognition. Presentation given at The Redwood Center for Theoretical Neuroscience (University of California, Berkeley) on Feb 22, 2006. (PPT) (video) (Note: PPT presentation uses heavy animations)

### 2005

- Time-Invariant Recognition of Spatiotemporal Patterns in a Hierarchical Cortical Model with a Caudal-Rostral Persistence Gradient (2005) (poster) Rinkus, G. J. & Lisman, J. *Society for Neuroscience Annual Meeting, 2005*. Washington, DC. Nov 12-16. Note that this poster is almost identical to the one presented at the First Annual Computational Cognitive Neuroscience Conference.
- A Neural Network Model of Time-Invariant Spatiotemporal Pattern Recognition (2005) (abstract) Rinkus, G. J. *First Annual Computational Cognitive Neuroscience Conference*, Washington, DC, Nov. 10-11.

### 2004 and earlier

- A Neural Model of Episodic and Semantic Spatiotemporal Memory (2004) Rinkus, G.J. *Proceedings of the 26th Annual Conference of the Cognitive Science Society*. Kenneth Forbus, Dedre Gentner & Terry Regier, Eds. LEA, NJ. 1155-1160. Chicago, Ill. (pdf) A Quicktime animation that walks you through the example in Figure 4 of the paper.

- Software tools for emulation and analysis of augmented communication. (2003) Lesher, G.W., Moulton, B.J., Rinkus, G. & Higginbotham, D.J. *CSUN 2003*, California State University, Northridge.
- Adaptive Pilot-Vehicle Interfaces for the Tactical Air Environment. (2001) Mulgund, S.S., Zacharias, G.L., & Rinkus, G.J. in Psychological Issues in the Design and Use of Virtual Adaptive Environments. Hettinger, L.J. & Haas, M. (Eds.) LEA, NJ. 483-524.
- Leveraging word prediction to improve character prediction in a scanning configuration. (2002) Lesher, G.W. & Rinkus, G.J. *Proceedings of the RESNA 2002 Annual Conference.* Reno. (pdf)
- Domain-specific word prediction for augmentative communications (2001) Lesher, G.W. & Rinkus, G.J. *Proceedings of the RESNA 2002 Annual Conference*, Reno. (pdf)
- Logging and analysis of augmentative communication. (2000) Lesher, G.W., Rinkus, G.J., Moulton, B.J., & Higginbotham, D.J. *Proc. of the RESNA 2000 Annual Conference*, Reno. 82-85.
- Intelligent fusion and asset manager processor (IFAMP). (1998) Gonsalves, P.G. & Rinkus, G.J. *Proc. of the IEEE Information Technology Conference* (Syracuse, NY) 15-18. (pdf)
- A Monolithic Distributed Representation Supporting Multi-Scale Spatio-Temporal Pattern Recognition (1997) *Int'l Conf. on Vision, Recognition, Action: Neural Models of Mind and Machine*, Boston University, Boston, MA May 29-31. (abstract)
- Situation Awareness Modeling and Pilot State Estimation for Tactical Cockpit Interfaces. (1997) Mulgund, S., Rinkus, G., Illgen, C. & Zacharias, G. Presented at *HCI International*, San Francisco, CA, August. (pdf)
- OLIPSA: On-Line Intelligent Processor for Situation Assessment. (1997) S. Mulgund, G. Rinkus, C. Illgen & J. Friskie. *Second Annual Symposium and Exhibition on Situational Awareness in the Tactical Air Environment*, Patuxent River, MD. (pdf)
- A Neural Network Based Diagnostic Test System for Armored Vehicle Shock Absorbers. (1996) Sincebaugh, P., Green, W. & Rinkus, G. *Expert Systems with Applications*, **11**(2), 237-244.
- A Combinatorial Neural Network Exhibiting Episodic and Semantic Memory Properties for Spatio-Temporal Patterns (1996) G. J. Rinkus. Doctoral Thesis. Boston University. Boston, MA. (ResearchGate)
- TEMECOR: An Associative, Spatiotemporal Pattern Memory for Complex State Sequences. (1995) *Proceedings of the 1995 World Congress on Neural Networks*. LEA and INNS Press. 442-448. (pdf)
- Context-sensitive spatio-temporal memory. (1993) *Proceedings of World Congress On Neural Networks*. LEA. v.2, 344-347.
- Context-sensitive Spatio-temporal Memory. (1993) Technical Report CAS/CNS-93-031, Boston University Dept. of Cognitive and Neural Systems. Boston, MA. (pdf)
- A Neural Model for Spatio-temporal Pattern Memory (1992) *Proceedings of the Wang Conference: Neural Networks for Learning, Recognition, and Control*, Boston University, Boston, MA
- Learning as Natural Selection in a Sensori-Motor Being (1988) *Proceedings of the 1st Annual Conference of the Neural Network Society*, Boston.
- Learning as Natural Selection in a Sensori-Motor Being (1986) G.J. Rinkus. Master's Thesis. Hofstra University, Hempstead, NY.

Abstract: A model is described in which three types of memory (episodic memory, complex sequence memory, and semantic memory) coexist within a single distributed associative memory. Episodic memory stores traces of specific events. Its basic properties are: high capacity, single-trial learning, memory trace permanence, and ability to store non-orthogonal patterns. Complex sequence memory is the storage of sequences in which states can recur multiple times: e.g. [A B B A C B A]. Semantic memory is general knowledge of the degree of featural overlap between the various objects and events in the world. The model's initial version, TEMECOR-I, exhibits episodic and complex sequence memory properties for both uncorrelated and correlated spatiotemporal patterns. Simulations show that its capacity increases approximately quadratically with the size of the model. An enhanced version of the model, TEMECOR-II, adds semantic memory properties. The TEMECOR-I model is a two-layer network that uses a sparse, distributed internal representation (IR) scheme in its layer two (L2). Noise and competition allow the IRs of each input state to be chosen in a random fashion. This randomness effects an orthogonalization in the input-to-IR mapping, thereby increasing capacity. Successively activated IRs are linked via Hebbian learning in a matrix of horizontal synapses. Each L2 cell participates in numerous episodic traces. A variable threshold prevents interference between traces during recall. The random choice of IRs in TEMECOR-I precludes the continuity property of semantic memory: that there be a relationship between the similarity (degree of overlap) of two IRs and the similarity of the corresponding inputs. To create continuity in TEMECOR-II, the choice of the IR is a function of both noise (Λ) and signals propagating in the L2 horizontal matrix and input-to-IR map. These signals are deterministic and shaped by prior experience.
On each time slice, TEMECOR-II computes an expected input based on the history-dependent influences, and then computes the difference between the expected and actual inputs. When the current situation is completely familiar, Λ=0, and the choice of IRs is determined by the history-dependent influences. The resulting IR has large overlap with previously used IRs. As perceived novelty increases, so does Λ, with the result that the overlap between the chosen IR and any previously-used IRs decreases.