@jeremynixon / Thinking / metalearning-the-structure-of-information.md
---
title: "Metalearning the Structure of Information"
visibility: public
---

# Metalearning the Structure of Information

Category: [[machine-intelligence|Machine Intelligence]]

[Read the original document](https://docs.google.com/document/d/12aZCRdXh5VmtaNFCzuV2pzIO2veA8vtQXtZhN4zMWoM/edit?usp=drivesdk&sa=D&ust=1596495076469000&usg=AOvVaw2Q78BVM997BprTl_4aZCUb)

<!-- gdoc-inlined -->

---

Practical Examples of Encoding Structure

1. Attentional ShapeContextNet for Point Cloud Recognition
2. Aggregated Residual Transformations for Deep Neural Networks
3. Why does deep and cheap learning work so well?
4. Symmetry Regularization
5. Group Equivariant Convolutional Networks
6. Spatial Transformer Networks
7. Stochastic Video Generation with a Learned Prior
8. Relational inductive biases, deep learning, and graph networks
9. Distant Transfer for Continual Learning [Marc Pickett]
10. Relational Deep Reinforcement Learning

I’m extremely surprised that I haven’t seen this comprehensive mode of thinking made explicit anywhere. If I’m actually the first person to come up with this abstraction (and the surprise of, and questions asked by, people like Pedro Domingos and Ryan Adams suggest that it’s rare), then I have a serious duty at hand.

This is key to algorithm learning and to creating low-bias models against the curse of dimensionality. This is an ensemble model.

Types of structure in information:

1. Hierarchical / Compositional / Combinatorial Structure
2. Relational / Graphical Structure
3. Recursive Structure
4. Temporal / Sequential Structure
5. Clustering Structure
6. Discreteness - quantized
7. Continuity - distribution
8. Smoothness
9. Sparsity
10. Locality
11. Linearity / Polynomial / Exponential Structure

Principles of Structure:

1. Simplicity vs. complexity
2. Bias-Variance Decomposition
3. Abstraction - level of abstraction at which more or less structure, or different types of structure, are present
4. Framed as Compression
    1. Degree of Compression
5. Directionality
6. Discrete vs. Continuous
7. Abstraction - fine vs. coarse grain structure
8. Similarity, say, with a feature or set of features
9. Randomness, degree to which there is structure, compressibility of data
10. Homogeneity - degree to which the same operations can be run over objects in the structure
11. Dimensionality - Interactions between features vs. single feature structure

Examples

1. Hierarchical / Compositional / Combinatorial
    1. Images
    2. Language
    3. Set of axioms to Euclidean geometry
    4. Organization’s management structure
2. Relational / Graphical
    1. Social Network
    2. Worldview (Tension with Hierarchical)
3. Recursive (Top-down hierarchical)
    1. Trees
4. Temporal
    1. Periodicity
    2. Messages’ bursting structure
    3. Quantized, like hitting lights for predicting arrival time
    4. Making food in a kitchen (Tempo)
    5. Dancing (Rhythmic / Periodic)
    6. Option / Permanence - School choice, Tattoos, Relationships
5. Discreteness
    1. Categories - Number of Fields in an Academy
    2. Binary - Graduated or Not Graduated, Accepted or Not Accepted, Given an Offer or Not Given an Offer
6. Continuity
    1. Intensity of emotion
    2. Amount of time on a task
7. Causal
    1. Counterfactual - If I had done x, simulation.
    2. Imagination - If I do x, simulation.

Hierarchical Structure

1. Abstraction
2. Images
    1. Objects - Object Parts - Shapes - Lines / Curves
3. Audio
    1. Words - Phonemes
4. Businesses / Governments
5. Sciences
    1. Physics
    2. Chemistry
    3. Biology
        1. Ontology of Species
        2. Organ Systems - Organs - Tissues - Cells - Nuclei + Organelles
        3. Brain
6. Natural Language
    1. Fields - Concepts - Words (Combinatorial as well)
    2. Paragraph - Sentence - Phrase - Word - Character
7. Time
    1. Centuries - Decades - Years - Months - Weeks - Days - Hours - Minutes - Seconds
8. Measurement
    1. Kilometers - Meters - Centimeters - Millimeters
9. Object Oriented Systems
    1. Classes - Objects
10. Economy
    1. GDP = Consumer Spending + Investment + Government Spending + (Exports - Imports)

Relational / Graphical Structure

1. Object Oriented Structure
    1. Object (Entity)
    2. X is a Y relationships (Classification, Inheritance)
    3. X has a Y relationships (Composition / Aggregation)
    4. Properties of an Object
2. Causal Graph - X leads to Y
3. Dependency - X depends on Y
4. Subject - Object relationships (in sentences)
    1. Linking verbs - ‘is’, ‘has’, ‘are’, ‘being’, ‘sense’ etc. between Object and Subject
5. Co-occurrence
    1. Ex. Words mentioned in concert with one another
6. Link - are connected
    1. Linkage Distribution
7. Locality
8. Edge Density

Temporal Structure

1. Periodicity
    1. Hierarchical Periodicity
    2. Seasonality
2. Burstiness
3. Stationary vs. Non-Stationary Distributions
4. Permanence / Option Structure
5. Quantized
    1. Ex. hitting lights when predicting arrival time
6. Autoregression / Autocovariance
7. Feedback
    1. Positive Feedback
    2. Negative Feedback
    3. Length of feedback loops
8. Synchronicity vs. Asynchronicity
    1. Discrete vs. Continuous
9. Exponential Decay vs. Windowing
    1. Continuity vs. Discreteness
10. Stability & Equilibrium
11. Derivatives - change over time
12. Objectness - these pixels move together
13. Asymmetry between past and future
14. Exclusive ability to directly impact present
15. Strong predictor of causality / anti-causality

Relevant Links

1. https://sites.google.com/site/icml18limitedlabels/
2. https://arxiv.org/pdf/1608.08225.pdf

Papers

1.

Notes

Regularizers impose a smoothness inductive bias, and weight decay / L2 regularization happens to impose smoothness. But at the end of the day, we do induction. We realized that this bias worked well in the past and impose it on new data.
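
The smoothness claim about weight decay / L2 can be checked on a toy problem. The following is a minimal sketch, assuming NumPy and a made-up 1-D regression task. The helper names `fit_polynomial` and `roughness`, the polynomial degree, the noise level, and the penalty value are all illustrative assumptions rather than anything from these notes, and "smoothness" is measured here (also an assumption) as the mean squared second difference of the fitted curve.

```python
import numpy as np


def fit_polynomial(x, y, degree, l2_penalty):
    """Closed-form ridge regression on polynomial features (hypothetical helper)."""
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    features = np.vander(x, degree + 1, increasing=True)
    # Solve (X^T X + lambda * I) w = X^T y; lambda = 0 is plain least squares.
    gram = features.T @ features + l2_penalty * np.eye(degree + 1)
    return np.linalg.solve(gram, features.T @ y)


def roughness(weights, grid):
    """Mean squared second difference of the fitted curve on a grid."""
    curve = np.vander(grid, len(weights), increasing=True) @ weights
    return float(np.mean(np.diff(curve, n=2) ** 2))


# Toy data: a smooth underlying relationship plus noise (illustrative values).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(np.pi * x) + 0.3 * rng.normal(size=x.shape)
grid = np.linspace(-1.0, 1.0, 200)

w_free = fit_polynomial(x, y, degree=9, l2_penalty=0.0)   # no weight decay
w_decay = fit_polynomial(x, y, degree=9, l2_penalty=1.0)  # with weight decay

print("weight norm:", np.linalg.norm(w_free), np.linalg.norm(w_decay))
print("roughness:  ", roughness(w_free, grid), roughness(w_decay, grid))
```

In this setup the penalized fit should have both a smaller weight norm and a lower roughness score than the unpenalized one. That only shows the bias being imposed; it says nothing about whether the true relationship is actually smooth.
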
There is a big difference between having a causal model (the true relationship is smooth, so imposing that prior will lead to a more efficient search in function space) and just predicting that it will work well because it worked on past data (with no model for why it’s working).

There are different inductive biases imposed by every form of regularization, which should be listed and maximized.

1. Dropout
    1. Algorithmic. Cuts the signal for inputs to a network.
        1. Does dropout work for linear models? For trees? How to deal with it at test time?
2. Norm Penalties
    1. L1 (Sharp)
    2. L2 (Smooth) / Weight Decay
3. Model averaging
    1. (The averaging step, not the step where variance is created through bagging, alternate parameterization, feature elimination (à la RF & ERF), etc.)
4. Intelligent Initialization
5. Noise Injection
6. Early Stopping
7. Constraints on optimization
8. Train and test time data augmentation
9. Multi-task learning
    1. Multi-class as multi-task
10. Pruning
11. Weight Sharing
12. Stochastic Optimization
13. All models? All Priors?

---

*Source: [Original Google Doc](https://docs.google.com/document/d/12aZCRdXh5VmtaNFCzuV2pzIO2veA8vtQXtZhN4zMWoM/edit?usp=drivesdk&sa=D&ust=1596495076469000&usg=AOvVaw2Q78BVM997BprTl_4aZCUb)*