Instructor: Matilde Marcolli

There will be three TAs for the class: Vinny Augustine, Shival Dasu, and Vibhor Kumar.

(Darren Waterston, "Linguistics", oil on wood, 2002)

- Partha Niyogi, "The computational nature of language learning and evolution", MIT Press, 2006.

- Andras Kornai, "Mathematical Linguistics", Springer, 2010.
- G.E. Revesz, "Introduction to formal languages", McGraw-Hill, 1983.
- Christopher D. Manning, Hinrich Schuetze, "Foundations of statistical natural language processing", MIT Press, 1999.
- Mark C. Baker, "The Atoms of Language", Basic Books, 2001.
- Peter Forster and Colin Renfrew, "Phylogenetic methods and the prehistory of language", McDonald Institute Monographs, 2006.
- Prashant Parikh, "Language and equilibrium", MIT Press, 2010.
- Charlotte Galves et al. (Eds.), "Parameter theory and linguistic change", Oxford, 2012.
- G.E. Barton, R.C. Berwick, E.S. Ristad, "Computational complexity and natural language", MIT Press, 1987.
- Ruslan Mitkov, "The Oxford Handbook of Computational Linguistics", Oxford, 2009.
- Chris Heunen et al. (Eds.), "Quantum physics and linguistics", Oxford, 2013.

- Tuesday January 6: What is Linguistics? World language families, diachronic/synchronic viewpoints, Levels of structure: phonology, morphology, syntax, semantics; Overview of phonetics: IPA charts, tones and suprasegmental features, autosegmental phonology, feature geometry, optimality theory
- Thursday January 8: Overview of morphology: allomorphy, morphological typology, polysynthetic languages, lexicology; Overview of Syntax: transformational grammars, principles and parameters, government and binding, minimalist program, head-driven phrase structure grammar, lexical functional grammar, tree-adjoining grammar (continued on Jan 13)
- Tuesday January 13: Historical linguistics: sound changes, borrowing, analogical change, semantic shifts, syntactic changes, grammaticalization, comparative methods and reconstruction of proto-languages, phylogenetic linguistics, family branches by shared innovations
- Thursday January 15: Phylogenetic Linguistics: computational methods, Swadesh lexical lists, cognates and coding of lexical data, neighbor-join method, Q-test matrices, UPGMA, maximum parsimony, maximum likelihood, Bayesian inference trees; Wave Theory of language change; origins of modern linguistics from Panini to de Saussure and Chomsky
- Tuesday January 20: Phylogenetic Inference: Hidden Markov Models, invariants, Viterbi sequence, polynomial formulation, phylogenetic algebraic geometry, model parameters, Segre embeddings, Secant variety of a Segre variety, determinantal varieties, tropical semiring and tropicalization, Viterbi sequence and tropical polynomials, Newton polygon, normal fan, and inference functions
- Thursday January 22: Formal languages: grammars, context free and context sensitive, the Chomsky hierarchy, Types and Machine Recognition, finite state automata, pushdown stack automata, Turing machines, recursively enumerable grammars, linear bounded automata
- Tuesday January 27: formal languages from group theory, word problem, recursive languages, regular languages, Cayley graphs and context free languages; context-free grammars and parse trees, ambiguities, parse trees and natural languages, operations, transformational grammars, tree-adjoining grammar, non-context-freeness of natural languages
- Thursday January 29: Probabilistic linguistics: Bernoulli and Markov measures, hidden Markov models, probabilistic context free grammars, sentence probabilities, inside and outside probabilities (Chomsky normal form), training, probabilistic tree adjoining grammars
- Tuesday February 3: Graph grammars, context free case, examples from Feynman graphs and quantum field theory, insertion Lie algebra; Languages and Complexity: physical systems from formal languages (subshifts of finite type, random walk with barrier, self-avoiding random walk), Kolmogorov complexity and data compression, morphological complexity of languages, Kolmogorov complexity and entropy, Kraft inequality, optimal encoding and Shannon entropy, universal Levin probability, syntactic parameters and complexity, Zipf's law and entropy, Zipf's law and complexity
- Thursday February 5: Coding theory and linguistics: error correcting codes and code languages, code parameters, asymptotic bound, Gilbert-Varshamov bound and Shannon random code ensemble, asymptotic bound and Kolmogorov complexity, asymptotic bound as phase transition, syntactic parameters and the asymptotic bound: a measure of language relatedness (continued on February 10)
- Tuesday February 10: Natural Language Processing: tagging, collocations, disambiguation, supervised and unsupervised learning, machine translation, text alignment, translation probabilities, information retrieval, vector space model
- Thursday February 12: CLASS CANCELLED due to ongoing Student-Faculty Conference
- Tuesday February 17: Models of language acquisition: learning algorithm, inductive inference approach, learnability, probabilistic learnability
- Thursday February 19: Models of language acquisition: probably approximately correct model, statistical learning theory, weak convergence, Vapnik-Chervonenkis dimension and learnability; Syntactic Parameter setting: 3-parameter model, Gibson-Wexler Trigger Learning Algorithm, Parameter space as a Markov Chain, closed sets and learnability
- Tuesday February 24: population dynamics of 2-language model, trigger learning algorithm, batch error-based learning, cue-based learning
- Thursday February 26: multiple languages, Markov Chain model, homogeneous and non-homogeneous population, multilingual learners, bilingualism, communicative fitness, languages as association matrices (measures)
- Tuesday March 3: Communicative Fitness, languages as association matrices, encoders/decoders, communicability, best response approximation algorithm, learning with full and partial information; linguistic coherence as emergent property, cue-based social learning
- Thursday March 5: statistical physics models of language learning and evolution; automatic construction of symbolic parsers for syntactic parameters; parameter setting via conditional entropies; algebraic versus probabilistic methods in Linguistics
- Tuesday March 10: student presentations

- What is Linguistics? (Part I) general introduction, phonology
- What is Linguistics? (Part II) morphology, syntax
- What is Linguistics? (Part III) historical linguistics, phylogenetic linguistics, wave theory, roots of modern linguistics
- Geometry of Phylogenetic Inference: hidden Markov models and polynomial maps, phylogenetic algebraic geometry, tropicalization and Viterbi sequence
- Formal Languages (Part I) context free, context sensitive, Chomsky hierarchy, types and machine recognition, finite state automata, pushdown stack automata, Turing machines
- Formal Languages (Part II) formal languages from group theory, word problem, recursive languages, regular languages, Cayley graphs and context free languages
- Parsing Trees: from formal languages to natural languages; context-free grammars and parse trees, ambiguities, parse trees and natural languages, operations, transformational grammars, tree-adjoining grammar, non-context-freeness of natural languages (Swiss German)
- Probabilistic Linguistics: Bernoulli and Markov measures, hidden Markov models, probabilistic context free grammars, probabilistic tree adjoining grammars
- Graph Grammars: parallelism and graph grammars, examples based on Feynman graphs
- Languages and Complexity: Kolmogorov complexity, morphological and syntactic complexity, Zipf's law
- Coding Theory and Linguistics: error correcting codes and code language, code parameters, asymptotic bound and Kolmogorov complexity, syntactic parameters, language families and codes
- Natural Language Processing: tagging, collocations, disambiguation, supervised and unsupervised learning, machine translation, text alignment, translation probabilities, information retrieval, vector space model
- Models of Language Acquisition: learning algorithm, inductive inference approach, learnability, probabilistic learnability
- Models of Language Acquisition: Part II: probably approximately correct model, statistical learning theory, weak convergence, Vapnik-Chervonenkis dimension and learnability
- Language Acquisition and Parameter Setting: 3-parameter model, Gibson-Wexler Trigger Learning Algorithm, Parameter space as a Markov Chain, closed sets and learnability
- Language Acquisition and Parameters: Part II Learning Algorithms and (inhomogeneous) Markov Chains
- Models of Language Evolution: 2-language model, population dynamics, trigger learning algorithm, batch error-based learning, cue-based learning
- Models of Language Evolution, Part II: multiple languages, Markov Chain model, homogeneous and non-homogeneous population, multilingual learners, bilingualism
- Models of Language Evolution, Part III: Communicative Fitness, languages as association matrices, encoders/decoders, communicability, best response approximation algorithm, learning with full and partial information
- Models of Language Evolution, Part IV: linguistic coherence as emergent property, cue-based social learning, language learning and evolution and statistical physics
- Additional Topics: Syntactic parameters and language acquisition: automatic construction of symbolic parsers; parameter setting via conditional entropies; discussion of algebraic versus probabilistic methods in Linguistics

- pdf Noam Chomsky, "Three models for the description of Language"
- pdf Seymour Ginsburg, Barbara Partee, "A mathematical model of Transformational Grammars"
- pdf Stuart M. Shieber, "Evidence against the context-freeness of natural language"
- pdf Haitao Liu, "Dependency direction as a means of word-order typology: a method based on dependency treebanks"

- pdf Partha Niyogi, Robert C. Berwick, "A dynamical systems model for language change"
- pdf C.F. Cuskley, M. Pugliese, C. Castellano, F. Colaiori, V. Loreto, F. Tria, "Internal and external dynamics in language: evidence from verb regularity in a historical corpus of English"

- pdf N.Saitou, M.Nei, "The Neighbor-joining Method: a new method for reconstructing phylogenetic trees"
- pdf R.Mihaescu, D.Levy, L.Pachter, "Why neighbor-joining works"
- pdf A.Delmestri, N.Cristianini, "Linguistic Phylogenetic Inference by PAM-like Matrices"
- pdf F.Petroni, M.Serva, "Language distance and tree reconstruction"
- pdf A.Bouchard-Cote, D.Hall, T.L.Griffiths, D.Klein, "Automated reconstruction of ancient languages using probabilistic models of sound change"
- pdf H.Luqman, "A Phylogenetic approach to comparative linguistics: a test study using the languages of Borneo"
- pdf B.Chor, T.Tuller, "Finding the Maximum Likelihood Tree is Hard"
- pdf L.Pachter, B.Sturmfels, "The Mathematics of Phylogenomics"
- pdf N.Eriksson, K.Ranestad, B.Sturmfels, S.Sullivant, "Phylogenetic Algebraic Geometry"
- pdf L.Pachter, B.Sturmfels, "Tropical geometry of statistical models"
- pdf G. Longobardi, C. Guardiano, G. Silvestri, A. Boattini, A. Ceolin, "Towards a syntactic phylogeny of modern Indo-European languages"
- pdf G. Longobardi, C. Guardiano, "Evidence for syntax as a signal of historical relatedness"

- pdf W. Labov, "Transmission and Diffusion"
- pdf J. Nerbonne, "Measuring the diffusion of linguistic change"

- pdf Marc van Oostendorp, "Feature Geometry"
- pdf Andras Kornai, "The generative power of feature geometry"
- pdf G.N. Clements, "The Geometry of phonological features"
- pdf Alan Prince, Paul Smolensky, "Optimality Theory in Phonology"

- pdf R.D. Levine, W.D. Meurers, "Head-driven Phrase Structure Grammar"

- pdf Carol Neidle, "Lexical Functional Grammar"
- pdf Ronald M. Kaplan, Joan Bresnan, "Lexical-Functional Grammar: A formal system for grammatical representation"

- pdf A.Copestake, D.Flickinger, C.Pollard, I.A.Sag, "Minimal Recursion Semantics: An Introduction"

- pdf K.Ehret, B.Szmrecsanyi, "An information-theoretic approach to assess linguistic complexity"
- pdf M.Bane, "Quantifying and measuring Morphological Complexity"
- pdf A.Kaltchenko, "Algorithms for estimating information distance with applications to bioinformatics and linguistics"
- pdf R.Clark, "Kolmogorov complexity and the information content of parameters"
- pdf M.Gell-Mann, "What is Complexity?"
- pdf C.E.Shannon, "Prediction and Entropy of Printed English"
- pdf D.Link, "Traces of the Mouth: Andrei Andreyevich Markov's mathematization of writing"
- pdf A.K.Zvonkin, L.A.Levin, "The complexity of finite objects and the development of the concepts of information and randomness by means of the theory of algorithms"

- pdf T.Ceccherini-Silberstein, W.Woess, "Growth and Ergodicity of Context-free Languages"
- pdf Y.Wang, L.Yang, H.Xie, "Complexity of unimodal maps with aperiodic kneading sequences"
- pdf Eibe Frank, "Formal Languages and Automata", Chapter 6
- pdf J.Shallit, "Number Theory and Formal Languages"

- pdf R.Sproat, M.Yarmohammadi, I.Shafran, B.Roark, "Lexicographic Semirings"
- pdf S.Giraudo, J.G.Luque, L.Mignot, F.Nicart, "Operads, quasiorders and regular languages"

- Project N.1 Topological Analysis of Syntactic Parameters
- Project N.2 Dimension Reduction of Syntactic Parameters
- Project N.3 A Spin Glass Model of Syntax
- Project N.4 Syntactic Phylogenetic Trees
- Project N.5 A Kanerva Network model of Syntax

- January 15, 4-5pm: Nakul Dawra on Chomsky's Three models for the description of Language
- February 19, 4-5pm: Ella Mathews on Language Distance and Tree Reconstruction
- February 26, 4-5pm: Shival Dasu on Evidence against the context-freeness of natural language
- March 5, 4-5pm: Ella Mathews on Evidence for Syntax as a Signal of Historical Relatedness
- March 10, 10:30-11:15am: Haebin Lim on Operads in Linguistics (slides of the talk)
- March 10, 11:15am-12:00pm: Sadaf Amouzegar on Feature Geometry
- March 28, 11:30-12:30 SLN 159: Chris Estrada on Automata and Formal Languages

- Oral presentations based on assigned reading material
- A computer project based on some of the material discussed in class

- Terraling
- The Indo-European Language Family Map
- Syntactic Structures of the World's Languages
- World Atlas of Language Structures
- Global Language Network
- Online Linguistics Database
- Linguistics Data Resources
- IPA sound charts
- LaTeX for Linguists
- WordNet: English lexical database
- Links to other Semantic Networks
- Chinese WordNet links
- Delph-In: Head-Driven Phrase Structure Grammar parsers
- Head-Driven Phrase Structure Grammar
- LaTeX package for AVMs
- Text Encoding Initiative: Feature Structures
- The Babel System: HPSG Interactive
- Lexical Functional Grammar
- Glue Logic theorem prover
- LTAG parser
- XTAG Project
- Phylogeny Programs
- Swadesh List
- Swadesh Lists of Brazilian Aboriginal Languages
- English translation of Panini's Ashtadhyayi