Neurithmic Systems
Finding the Fundamental Cortical Algorithm of Intelligence

Highlights

  • Rod Rinkus Extended Research Statement
  • Sparsey is likely already at least as fast without machine parallelism (MP) as gradient-based methods are with MP, and it can easily be sped up by 100-1,000x via simple, existing, non-GPU-based MP, e.g., SIMD, ASIC.
  • CCN2017 Submission (Rejected): The Brain’s Computational Efficiency derives from using Sparse Distributed Representations. Related post on Quora.
  • Apr 21 '17: Invited Talk: Intel Microarchitecture Technology Lab. Hillsboro, OR (host: Narayan Srinivasa) Sparse Distributed Coding Enables Super-Efficient Probabilistic Modeling.
  • Apr 20 '17: Invited Talk: IBM Almaden Machine Intelligence Lab. San Jose, CA (host: Winfried Wilcke) Sparse Distributed Coding Enables Super-Efficient Probabilistic Modeling.
  • Mar 5 '17: NICE 2017 (IBM Almaden) Poster: "A Radically new Theory of How the Brain Represents and Computes with Probabilities"
  • Jan 30 '17: A recent arXiv paper describing how Sparsey constitutes a radically different theory [from the mainstream probabilistic population coding (PPC) theories] of how the brain represents and computes with probabilities. The theory includes fundamentally different concepts / explanations for noise, correlation, and the origin of the classical unimodal single-cell receptive field. An applet explaining core principles in the paper.
  • Aug 22 '16: Results on MNIST. Preliminary result: 91% on a substantial subset of MNIST. To my knowledge, these are the first reported results of ANY SDR-based model (e.g., Numenta, Kanerva, Franklin & Snaider, Hecht-Nielsen) on ANY well-known benchmark! Single-trial learning, no gradients, no MCMC, no need for machine parallelism, just simple Hebbian learning with binary units and effectively binary weights.
  • Jun 1 '16: Hyper-essay discussing potential weaknesses of Deep Learning models, particularly in light of Sparsey's unique and powerful properties, including fixed-time learning, best-match retrieval, and belief update, of space-time patterns.
  • Mar 8 '16: Talk at NICE 2016 (Berkeley, March 7-9), focused on the idea that the gains in computational speed possible via algorithmic parallelism (i.e., distributed representations, and specifically, sparse distributed representations, SDR), particularly with regard to probabilistic computing, e.g., belief update, are greater than the gains achievable via machine parallelism. Algorithmic and machine parallelism are, however, orthogonal resources and so can be leveraged multiplicatively. (Slides, 57 MB)
  • Dec '15: Historical Highlights for Sparsey: a talk given at the Redwood Neuroscience Institute in 2004, before it became Numenta. A better 2006 talk (video) at RNI after it moved to Berkeley.
  • Dec '15: Preliminary results of a hierarchical Sparsey model on the Weizmann event recognition benchmark. To our knowledge, these are the first published results for a model based on sparse distributed representations (SDR) on this or any video event recognition benchmark!
  • Sep '15: Brain Works How?, a new blog aimed at stimulating discussion of the coming revolution in machine intelligence: sparse distributed representation.
  • Apr '15: Short video explaining hierarchical compositional (part-whole) representation in deep (9-level) Sparsey network.
  • Apr '15: Animation showing a notional mapping of Sparsey onto cortex.
  • Mar '15: U.S. Patent 8,983,884 B2 awarded (after almost 5 years!) to Gerard Rinkus for "Overcoding and Paring (OP): A Bufferless Chunking Process and Uses Thereof".  A completely novel mechanism for learning / assigning unique "chunk" codes to different sequences that have arbitrarily long prefixes in common, without requiring any buffering of the sequence items.  Interestingly, OP requires the use of sparse distributed representations and is in fact undefined for localist representations.  In addition, it is ideally suited to use in arbitrarily deep hierarchical representations.
  • Mar '15: Movie of an 8-level model, with 3,261 macs, in a learning/recognition experiment with a 64x64, 36-frame snippet. Interestingly, based on a single learning trial, most of the spatiotemporal SDC memory traces are reinstated virtually perfectly, even though the "V1" and "V2" traces are only 57% and 85% accurate, respectively.
  • Dec '14: Movie of 6-level hexagonal topology Sparsey model recognizing an 8-frame 32x32 natural-derived snippet. Also see below.
  • Dec '14: Paper published in Frontiers in Computational Neuroscience: "Sparsey: event recognition via deep hierarchical sparse distributed codes"
  • Neurithmic Systems's YouTube channel
  • Jan '15: A fundamentally different concept of the representational basis of features in Sparsey, compared to the localist "sparse basis" (a.k.a. "sparse coding") concept (PPT slide).
  • CNS 2013 Poster: "A cortical theory of super-efficient probabilistic inference based on sparse distributed representations" (abstract, summary)
  • Feb '13: NICE (Neuro-Inspired Computational Elements) Workshop talk (click on "Day1: Session 2" link on NICE page), "Constant-Time Probabilistic Learning & Inference in Hierarchical Sparse Distributed Representations" (PPT)

Sparse Distributed Representation (SDR): A Revolution in Probabilistic Computing

Neurithmic Systems is developing scalable, online-learning probabilistic reasoning models based on sparse distributed representations (SDR), a.k.a. sparse distributed codes (SDC). Following on prior ONR- and DARPA-supported research, we are developing these models for video event recognition and understanding, as well as for multimodal inputs, e.g., visual + auditory + text. However, the technology is modality-neutral and can be applied to any type of discrete multivariate time series data.

SDR provides massive algorithmic speedup for both learning and best-match retrieval of spatial or spatiotemporal patterns. In fact, Neurithmic's core algorithm, TEMECOR (now called Sparsey®), invented by Dr. Gerard Rinkus in the early '90s, both stores (learns) sequences and retrieves (recognizes) the best-matching stored sequence in fixed time for the life of the system. This was demonstrated in Dr. Rinkus's 1996 thesis and described in his 2004 and 2006 talks at the Redwood Neuroscience Institute, among other venues. To date, no other published information processing method achieves this level of performance! Sparsey implements what computational scientists have long been seeking: computing directly with probability distributions and, moreover, updating from one probability distribution to the next in fixed time, i.e., time that does not increase as the number of hypotheses stored in (represented by) the distribution increases.
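Why the learning and retrieval time can stay fixed as items accumulate can be illustrated with a toy binary associative memory. This is a minimal sketch, not Sparsey's actual algorithm. It assumes a mac made of Q competitive modules (CMs) of K binary units each, a code consisting of exactly one winning unit per CM, random code assignment on learning, and binary Hebbian weights. The names `ToyMac`, `Q`, `K`, and `N_IN` are hypothetical. Both `learn` and `retrieve` touch only a fixed-size weight array, so their cost depends on the number of inputs and units, not on how many items have already been stored.

```python
import random

# Hypothetical sizes: a mac of Q competitive modules (CMs), K binary units per CM,
# fed by an input field of N_IN binary units.
Q, K, N_IN = 8, 4, 64


class ToyMac:
    """Toy binary associative memory (illustration only, not Sparsey's CSA)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        # w[i][q][k] == 1 once input unit i has been paired with unit k of CM q.
        self.w = [[[0] * K for _ in range(Q)] for _ in range(N_IN)]

    def learn(self, active_inputs):
        """Single-trial learning: pick one random winner per CM, set weights to 1."""
        code = tuple(self.rng.randrange(K) for _ in range(Q))
        for i in active_inputs:
            for q, k in enumerate(code):
                self.w[i][q][k] = 1  # binary Hebbian update
        return code

    def retrieve(self, active_inputs):
        """Activate, in each CM, the unit with the largest summed input.

        The work done is proportional to len(active_inputs) * Q * K and does
        not grow with the number of items stored so far.
        """
        active_inputs = list(active_inputs)
        code = []
        for q in range(Q):
            sums = [sum(self.w[i][q][k] for i in active_inputs) for k in range(K)]
            code.append(max(range(K), key=sums.__getitem__))
        return tuple(code)


if __name__ == "__main__":
    mac = ToyMac(seed=1)
    patterns = [range(8 * p, 8 * p + 8) for p in range(8)]  # 8 disjoint inputs
    codes = [mac.learn(p) for p in patterns]
    # Full and partial cues both reinstate the stored code.
    print(all(mac.retrieve(p) == c for p, c in zip(patterns, codes)))
    print(mac.retrieve(range(0, 5)) == codes[0])
```

The disjoint input patterns are chosen only to keep the toy free of crosstalk; with overlapping inputs, a real model needs a more careful code-selection rule to keep retrieval accurate.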

The magic of SDR is precisely this: any single active SDR code simultaneously functions not only as the single item (i.e., feature, concept, event) that it represents exactly, but also as the complete probability distribution over all items stored in the database. With respect to the model animation shown here, each macrocolumn constitutes an independent database. Because SDR codes are fundamentally distributed entities, i.e., in our case, sets of co-active binary units chosen from some much larger pool (e.g., a macrocolumn), whenever one specific SDR code is active, all other SDR codes that intersect with it are also partially physically active, in proportion to how many units they share with the fully active code. And because these shared units are physically active (in neural terms, spiking), all these partially active codes also influence the next state of the computation. But the next state of the computation will just be another of the stored SDR codes that has become active, which will in general have some other pattern of overlaps with all of the stored codes, and thus embody some other probability distribution over the items.
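The claim that one active code also carries a distribution over all stored items can be made concrete with a small numeric sketch. The following are assumptions, not details from this page: codes are represented as sets of (CM, winning unit) pairs, and each stored code's degree of partial activation, measured as its overlap with the active code, is normalized to sum to 1 so that it can be read as a probability distribution. The item names and the helper `overlap_distribution` are hypothetical.

```python
def overlap_distribution(active, stored):
    """Return each stored item's overlap with the active code, normalized to sum to 1.

    active: set of (cm, unit) pairs forming the currently active SDR code.
    stored: dict mapping item name -> its SDR code (same representation).
    """
    overlaps = {name: len(active & code) for name, code in stored.items()}
    total = sum(overlaps.values())
    if total == 0:
        return {name: 0.0 for name in stored}
    return {name: n / total for name, n in overlaps.items()}


# Hypothetical codes in a mac with 4 CMs: one winning unit per CM.
stored = {
    "cat": {(0, 1), (1, 2), (2, 0), (3, 3)},
    "dog": {(0, 1), (1, 2), (2, 1), (3, 0)},  # shares 2 units with "cat"
    "car": {(0, 2), (1, 0), (2, 3), (3, 1)},  # shares no units with "cat"
}

if __name__ == "__main__":
    # Activating "cat" also partially activates "dog", but not "car".
    print(overlap_distribution(stored["cat"], stored))  # cat ~0.667, dog ~0.333, car 0.0
    # Activating a different stored code yields a different distribution.
    print(overlap_distribution(stored["car"], stored))  # car 1.0, others 0.0
```

Here the same set operation that identifies the active item also yields graded support for every other stored item, which is the sense in which a single code stands for a whole distribution.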

Virtually all graphical probabilistic models to date, e.g., dynamic Bayes nets and HMMs, use localist representations. In addition, influential cortically inspired recognition models such as Neocognitron and HMAX also use localist representations. This page shows what an SDR-based model of the cortical visual hierarchy would look like. Also see the NICE Workshop and CNS 2013 links above.

Memory trace of 8-frame 32x32 natural event snippet playing out in a 6-level Sparsey model with 108 macs (proposed analogs of cortical macrocolumns).

This movie shows a memory trace that occurs during an event recognition test trial, when this 6-level model (with 108 macs) is presented with an instance identical to one of the 30 training snippets. A small fraction of the U, H, and D signals underlying the trace is shown. See this page for more details. What is really happening here is that the Code Selection Algorithm (CSA) [see Rinkus (2014) for a description] runs in every mac [having sufficient bottom-up (U) input to be activated] at every level and on every frame. The CSA combines the U, H, and D signals arriving at the mac, computes the overall familiarity of the spatiotemporal moment represented by that total input, and in so doing effectively retrieves (activates the SDR code of) the spatiotemporally closest-matching stored moment in the mac.
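The CSA itself is specified in Rinkus (2014). The sketch below is only a simplified illustration of the behavior described in the paragraph above, not that algorithm. Assumptions: each mac has Q competitive modules (CMs) of K binary units; the U (bottom-up), H (horizontal), and D (top-down) signals arrive as per-unit values already normalized to [0, 1]; they are combined by multiplication; familiarity G is the average over CMs of the largest combined input; and a softmax whose sharpness grows with G stands in for the published transform. The function name `csa_step` and the parameter `eta_max` are hypothetical.

```python
import math
import random


def csa_step(u, h, d, rng, eta_max=100.0):
    """One simplified code-selection step for a single mac (illustrative only).

    u, h, d: Q x K nested lists of normalized bottom-up (U), horizontal (H),
    and top-down (D) inputs to each unit, each value in [0, 1].
    Returns (G, code), where code holds one winning unit index per CM.
    """
    # 1. Combine the three signals per unit (multiplicative rule is an assumption).
    v = [[uu * hh * dd for uu, hh, dd in zip(ur, hr, dr)]
         for ur, hr, dr in zip(u, h, d)]
    # 2. Familiarity G: average over CMs of the best-supported unit's input.
    g = sum(max(row) for row in v) / len(v)
    # 3. Sharpness rises with G (a softmax stand-in for the published transform).
    eta = eta_max * g
    code = []
    for row in v:
        weights = [math.exp(eta * x) for x in row]
        # Familiar input: the winner is almost surely the max-input unit (retrieval).
        # Novel input (G near 0): the winner is near-uniformly random (a new code).
        code.append(rng.choices(range(len(row)), weights=weights)[0])
    return g, tuple(code)


if __name__ == "__main__":
    Q, K = 8, 4
    rng = random.Random(0)
    target = [rng.randrange(K) for _ in range(Q)]  # a previously stored code
    u = [[1.0 if k == target[q] else 0.0 for k in range(K)] for q in range(Q)]
    ones = [[1.0] * K for _ in range(Q)]
    g, code = csa_step(u, ones, ones, rng)
    print(g, code == tuple(target))  # 1.0 True
    zeros = [[0.0] * K for _ in range(Q)]
    g, code = csa_step(zeros, ones, ones, rng)
    print(g)  # 0.0: each CM picks its winner uniformly at random
```

The design point being illustrated is that a single pass both measures familiarity and chooses the code: when G is high, the mac reactivates the best-matching stored code, and when G is low, it tends to assign a fresh, random code to the novel input.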
