Homepage of Daniel S. Wilkerson

my email address as an image

For four and a half years I was a Programmer-Analyst III at the University of California at Berkeley. I worked for Alex Aiken and David Wagner. I continue to work on research and software in the areas of programming languages and security. My resume.

On the right is a picture of my hero, Ben Franklin! I have made printable versions of excerpts from his Autobiography on Virtue and Humility to carry with me; you may want to as well. Benjamin Franklin

It is certainly of more consequence to a man, that he has learnt to govern his passions in spite of temptation, to be just in his dealings, to be temperate in his pleasures, to support himself with fortitude under his misfortunes, to behave with prudence in all his affairs and every circumstance of life; I say, it is of much more real advantage to him to be thus qualified, than to be a master of all the arts and sciences in the world beside. Virtue alone is sufficient to make a man great, glorious, and happy. -- Benjamin Franklin

Some paintings by my mom, such as the one below, are at http://carolynhocker.com/. She paints portraits and other subjects for hire; contact me if you are interested.

Homepage of Carolyn Hocker


Software

Trend Profiler (trend-prof) aims at finding performance surprises (super-linearities) in C / C++ code based on trends in the runs of the program on small to medium inputs. That is, we predict which parts of the code will not scale well to larger inputs. For each basic block (more or less a line of source code), trend-prof constructs a model that predicts how many times that basic block is executed as a function of input size. The current trend-prof tool works on C / C++ compiled with gcc. A collaboration with Simon Goldsmith. NOTE: Tigris.org has shut down and so we are currently moving trend-prof to github.

[After going on for 7 paragraphs about the best methods of sweeping the streets of Philadelphia] Some may think these trifling matters not worth minding or relating; but when they consider that tho' dust blown into the eyes of a single person, or into a single shop on a windy day, is but of small importance, yet the great number of the instances in a populous city, and its frequent repetitions give it weight and consequence, perhaps they will not censure very severely those who bestow some attention to affairs of this seemingly low nature. Human felicity is produc'd not so much by great pieces of good fortune that seldom happen, as by little advantages that occur every day. -- Benjamin Franklin

Oink is a collection of composable backends for the Elsa C and C++ frontend. Oink computes both 1) expression-level and type-level data-flow, and 2) statement-level intra-procedural control-flow [by delegating to Elsa]. Oink also comes with a client of the data-flow analysis that does type qualifier inference: Cqual++, a C/C++ frontend for Cqual. Whole-program analyses may be attempted using the linker imitator.

After the Heartbleed bug came out, someone at a government lab that will not let me use their name wrote me, saying:
Yes, you are interpreting me correctly. CQual++ found heartbleed while the commercial tools I tried did not. I also applied CQual++ to an important internal project and found it very effective (though a bit difficult to run and interpret) at identifying places where sanitization routines weren't being called consistently.
and
I've been applying cqual to a fairly significant library that we've been developing that does a fair amount of both network and filesystem related stuff. I don't think I've hit any false positives. There were a number of places where input was being validated but not marked as such which of course caused issue reports but that's not really a false positive.

Compound Init is a small package for implementing the semantics of Compound Initializer syntax as described in the C99 standard, section 6.7.8, including Designated Initializers and the gcc Range Initializer extension to the syntax. It is released under the BSD license and is designed for maximum ease of inclusion into an existing compiler. Compound Init is now a subdirectory of the Oink project which is why the link above points to Oink.

Build Interceptor captures the .i files of any project while it is built from source using the gcc toolchain without *any* modification to the build process of the project you are trying to capture. Includes "Simple Build-Process Interceptor" which used to be posted here. NOTE: Tigris.org has shut down and so build-interceptor has moved to github.

Delta assists you in minimizing "interesting" files subject to a test of their interestingness. A common such situation is when attempting to isolate a small failure-inducing substring of a large input that causes your program to exhibit a bug. Our implementation is based on the algorithm of the Delta Debugging project at Saarland University. Delta is a collaboration with Scott McPeak. You have Microsoft to thank for asking that Delta be released as Open Source; see the Delta page for more. NOTE: Tigris.org has shut down and so delta has moved to github.

Subversion dumpfile tools written by me in 2006:

MTS: Multi-TeSter: a simple domain-specific language for maintaining tests that supports two properties I have found desirable:

punch-clock and punch-clock-report: simple Python tools for tracking your time on multiple projects and generating billing reports; they use a human-readable and editable text file database.

Contributions of software must say "I put this email, including any attachments, into the public domain.". Contributions by corporations or major contributions require that you sign a one page Public Domain Dedication in saying in one page what the previous quotation says in one sentence (hey, that's the law for you); email me for the Dedication. This arrangement prevents copyright difficulties. New features in contributed software should be accompanied by tests.


Articles

Harmony Explained: Progress Towards A Scientific Theory of Music: Most music theory books are like medieval medical textbooks: they contain unjustified superstition, non-reasoning, and funny symbols glorified by Latin phrases. How does music, in particular harmony, actually work, presented as a real, scientific theory of music?

In particular we derive from first principles of Physics and Computation the following three fundamental phenomena of music:

While the Major Scale has been independently derived before by others in a similar manner as we do here [Helmholtz1863, p. 300], [Birkhoff1933, p. 92], I believe the derivation of the Standard Chord Dictionary as well as the difference in feeling between the Major and Minor Triads to be an original contribution to science and art.

I intend this article to be satisfying to scientists as an original contribution to science (as a set of testable conjectures that explain observed phenomena), yet I also intend it to be approachable by musicians and other curious members of the general public who may have long wondered at the curious properties of tonal music and been frustrated by the lack of satisfying, readable exposition on the subject. Therefore I have written in a deliberately plain and conversational style, avoiding unnecessarily formal language.

Harmony Explained: Progress Towards A Scientific Theory of Music.
Daniel S. Wilkerson. 20 February 2012.


Static-Time Relational Query Plan Generator: In PLDI'11 there is a paper "Data Representation Synthesis", by Peter Hawkins, Alex Aiken, Kathleen Fisher, Martin Rinard, and Mooly Sagi, PLDI'11, June 4-8, 2011, San Jose, California, USA. The core idea is a "Static-Time Relational Query Plan Generator": one represents data in the relational model and then runs a query plan generator at *static* time to generate functions that query and update the model. There is more than one way to generate the data structures and as well as more than one way to implement a given query or update; therefore allowing for human guidance of this static-time query plan generator is part of the problem.

I thought of this idea myself years ago while working on the rather horrible internals of Scott McPeak's Elsa C/C++ front-end (which is not Scott's fault: the semantics of C++ are a disaster); in particular, I noticed that most of what I was doing amounted to implementing the queries of a relational database by hand, which was very tedious, verbose, and error-prone. I worked on this idea and its ramifications for quite a while and presented it to several people in 2005, at the time calling it "Orth" (I have since picked a new name for a more expanded version of the idea). One of them was Alex Aiken, also an author on the above paper. Despite my request to Alex, there is no citation for my design given in the paper, though eventually they added an acknowledgment to the electronic version. I have therefore decided to acknowledge myself here in order to provide a record of this fact.

Show me your flowcharts [code], and conceal your tables [schema], and I shall continue to be mystified; show me your tables [schema] and I won't usually need your flowcharts [code]: they'll be obvious. -- Fred Brooks, "The Mythical Man Month", ch 9.

Distributed Transactions for Google App Engine: I have an app I've wanted to build for years, but I couldn't build it until we had cloud computing; to that end I started using Google App Engine (GAE) the day it came out. I needed to maintain a global invariant across the database, so I needed distributed transactions (DTs), but GAE didn't have them, so I started writing a design document. I still have not built the app, but we finished the design for DTs. This paper is a tutorial, not just an algorithm, so we motivate the ACID properties from scratch, instead of treating them as if they were handed down by God on stone tablets, and we prove everything from scratch also. This paper is also a design document rather than just a "core algorithm", so addresses the rich collection of features that people expect; most theoreticians do not do this, but that's where much of the difficulty is. Erick Armbrust at Google has implemented it as part of tapioca-orm. Ryan Barrett, one of the founders of the Google App Engine team, said "Thanks. This has been on our todo list for years." Presented by me at CodeCon 2009 and at Google I/O 2009; see the video of the talk on the right.

Distributed Transactions for Google App Engine: Optimistic Distributed Transactions built upon Local Multi-Version Concurrency Control.
Daniel S. Wilkerson, Simon Fredrick Vicente Goldsmith, Ryan Barrett, Erick Armbrust, Robert Johnson, Alfred Fuller. Progress report presented at Google I/O in May 2009; final version 15 June 2011.

I think that 'innovation' is a four-letter word in the industry. It should never be used in polite company. ... It was Edison who said '1% inspiration, 99% perspiration'. That may have been true a hundred years ago. These days it's '0.01% inspiration, 99.99% perspiration', and the inspiration is the easy part. As a project manager, I have never had trouble finding people with crazy ideas. I have trouble finding people who can execute. ... So no, I don't think people need more innovation. I'd rather see more people sell their product on some plain old-fashioned 'being good'. -- Linus Torvalds

Manhattan-Connected Exact-Coloring Is NP-Complete: I came to this problem through a friend who was taking Professor Richard M. Karp's CS 270 Combinatorial Algorithms and Data Structures class at The University of California, Berkeley. At the time I was just trying to help a friend with his homework (of course with the intent to provide proper notification to the course grader). After I solved it my friend told me that in fact neither Karp nor his teaching assistant had a solution.

Manhattan-Connected Exact-Coloring Is NP-Complete.
Daniel S. Wilkerson. 31 March 2011.


Trend Profiler: I am proud of this paper as it is a simple solution to a practical problem; it also demonstrates the elegance and wide applicability of power-laws and has a neat use of trigonometry. Alex Aiken proposed the problem of empirically determining the time complexity of a program, which, just as with the Winnowing problem below, is so simple that I was sure if it were possible at all someone would have done it already; yet he was right there was definitely something there. My contributions include that I conjectured power-laws as a low-dimensional way to model the number of executions of a line of code in a program. The paper was a study of an implementation mostly done by Simon and a little by me called Trend Profiler (trend-prof), mentioned above in the Software section. Simon and I gave a talk on Trend Prof at CodeCon 2009.

Measuring Empirical Computational Complexity.
SEC/FSE'07: The 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering,
Simon F. Goldsmith, Alex S. Aiken, Daniel S. Wilkerson.

I think it's much less important to get somebody who has PhDs in all these subjects. If you can find somebody who never went to college but has built a lot of things as a tinkerer - knows how to operate the equipment, run into their own little garage or laboratory quickly and whip something out - that's the person that companies are missing out on, and all their requisition requirements overlook those people. I can admire the tinkerers. -- Woz

Winnowing: Now this is a cool paper. It is theoretically cool, but more importantly it actually made a real string-matching index 8 times smaller and 3 times faster on the same queries. I am now only interested in this kind of practical stuff.

Winnowing: Local Algorithms for Document Fingerprinting.
SIGMOD 2003: ACM Special Interest Group on Management of Data,
Saul Schleimer, Daniel S. Wilkerson, Alex Aiken.
Please note that the code in the article has some bugs.
Please beware that unfortunately the Winnowing Algorithm is patented: United States Patent 6,757,675.

Theory is gray, but the golden tree of life is green. -- Goethe

Other: I worked on the theory part of this paper and the relevance of that part to reality is pretty questionable. The implementation side done by Alan and Brent is pretty cool, but I didn't work on that.

System Area Network Mapping.
SPAA 1997: ACM Symposium on Parallel Algorithms and Architectures,
A. Mainwaring, B. Chun, S. Schleimer, D. Wilkerson.

The papers below are useless theoretical junk, but you have to put something on your resume. I no longer do this kind of stuff.

Faster Computation on Directed Networks of Automata.
PODC 95: ACM Symposium on Principles of Distributed Computing,
R. Ostrovsky, D. Wilkerson.

Nearly Tight Bounds for Wormhole Routing.
FOCS 94: IEEE Symposium on Foundations of Computer Science,
A. Ranade, S. Schleimer, D. Wilkerson.

Optimal Leapfrogging.
The Mathematics Magazine, 1994,
J. Auslander, A. Benjamin, D. Wilkerson.

Thomas Godfrey, a self-taught mathematician, great in his way, and afterward inventor of what is now called Hadley's Quadrant. But he knew little out of his way, and was not a pleasing companion; as, like most great mathematicians I have met with, he expected universal precision in everything said, or was for ever denying or distinguishing upon trifles, to the disturbance of all conversation. He soon left us. -- Benjamin Franklin, from his Autobiography


Essays

A little retrospection shows that although many fine, useful software systems have been designed by committees and built as part of multipart projects, those software systems that have excited passionate fans are those that are the products of one or a few designing minds, great designers. -- Fred Brooks

A Proposal for A Book of Piano Grooves
originally written 20 September 2009, posted 16 January 2010

A Proposal for Proquints: Identifiers that are Readable, Spellable, and Pronounceable
26 January 2009
Open Source implementations:

Effective Management: Want Things, Acknowledge People, Maintain Integrity
18 January 2009

An Intuitive Explanation of the Information Entropy of a Random Variable
Or: How to Play Twenty Questions
8 and 15 October 2006, 26 January 2009

Windows NT crashed.
I am the Blue Screen of Death.
No one hears your screams. -- Peter Rothman

Outline Proposal for Strongly Type-Checked Make and Shell
2 May 2006, edited 4 October 2006

On the Aesthetic of Programming:
Towards Academics and Engineers Talking Usefully About Programming Languages
19 Oct 2005 and 7-8 June 2006

Proposal for an Exercise-Pedia:
A Collaboratively-Maintained Set of Exercises Teaching Fluency in the Whole of Practical Programming and Computer Science
30 August 2005

A crash reduces
your expensive computer
to a simple stone. -- James Lopez

Proposal for an SQL Database Editor
24 August 2005

On the Hammer of Modernism and the Cleaning of Toilets
25 April 2005 and 8 June 2006

Why the '+1' in Two's Complement Arithmetic?
22 June 2004


Non-fiction

If you would not be forgotten
as soon as you are dead and rotten
write things worth the reading
or do things worth the writing.
-- Benjamin Franklin

Groundhog Day
Approximately 5:20 pm, Thursday, 2 February 2006, Berkeley, California

Buxom Red Girl in the Bookstore
Approximately 4:00 pm, Sunday, 6 June 2004, Berkeley, California

Vote Early and Often
Approximately 11:30 am, Tuesday, 2 March 2004, Berkeley, California

A Walk In Berkeley
Approximately 7 pm, Saturday, 16 June 2001, Berkeley, California Official NaNoWriMo 2006 Winner


Fiction

I am an Official NaNoWriMo 2006 Winner. (Note: Everyone who finishes "wins".)


© Copyright 2002-2007 Daniel S. Wilkerson