This summer, I’ve been working as an intern for Microsoft on the Direct2D/DirectWrite team. While I can’t really talk about what my work entails, I can talk about some of the fun things I’ve done this summer in my free time and the non-work-related components of my internship. I suppose most people wouldn’t start blogging about their internship by describing a bookstore, but I went to this place today and it was so incredible that I had to write about it.

In Capitol Hill, there’s a small store by the name Ada’s Technical Books. It’s in a house that’s been converted to a cafe and bookstore, and is quite possibly the most amazing bookstore I’ve ever seen. As you walk in, you’re greeted by an small cafe counter to your left and an open area to your right with short bookcases and comfy chairs. Toys, puzzles, and “Maker”-appropriate items like lockpicks and Raspberry Pis.

Many computer science problems utilize graph-based data structures. Their use can range from explicit inclusion in an algorithm-centric problem (like path-finding) to a more “behind-the-scenes” presence in Bayesian networks or descriptions of finite automata. Unfortunately, visualizing large graphs can be difficult to do, especially for debugging. Unlike lists or dictionaries, which can be represented clearly by plain text printing, depicting a graph tends to require more graphics overhead than is reasonable for most programmers to write simply for debugging purposes. I’ve found that Graphviz, a free graph visualization utility, can be quite useful in debugging graph-related programs.

Installing Graphviz

If you’re on a Debian-based Linux OS (e.g. Ubuntu), you can install Graphviz using apt-get. Just run $ sudo apt-get install graphviz and you’ll have everything you need to complete the steps in this blog post. Mac OS X users can use brew equivalently.

Windows users should install using a binary downloaded from the Graphviz Windows page, but there might be some issues with setting the PATH variable for running in the commandline.

Making a basic graph

Once you’ve installed, the next thing you’ll want to do is create a basic graph to ensure the installation succeeded and to gain practice using Graphviz tools. We do this by creating a *.dot file that describes the graph we wish to display. If you’re the type of person who likes to jump right in and experiment first before reading too much, or if you love formal language specification, the DOT grammar is fairly readable and can give a quick introduction to creating DOT files.

The below is a fairly representative DOT file to demonstrate some of the capabilities of Graphviz. Open your favorite text editor, copy/paste it in, and save it as firstgraph.dot:

digraph G { // Style information for nodes A [style="filled", color=".05 .3 1.0"]

// Edge declarations A -> {B, C}; D -> E [label="-1"]; E -> F [label="3"]; F -> D [label="10"]; }

This creates a directed graph (also called a digraph) with six nodes and two connected components. Some of the edges have labels, and one of the nodes is colored. After you’ve copied (or downloaded) this file, open up a terminal to the directory with firstgraph.dot in it and run $ dot firstgraph.dot -Tpng -o firstgraph.png. The resulting image file should look something like the below:

This past week, my ICPC team worked the 2013 Greater New York Regional problem packet. One of my favorite problems in this set was Problem E: Deranged Exams. The code required to solve this problem isn’t that complicated, but the math behind it is a little unusual. In this post, I aim to explain the math and provide a solution to this problem.

Problem Description

The full problem statement is archived online; in shortened form, we can consider the problem to be:

Given a “matching” test of $n$ questions (each question maps to exactly one answer, and no two questions have the same answer), how many possible ways are there to answer at least the first $k$ questions wrong?

It turns out that there’s a really nice solution to this problem using a topic from combinatorics called “derangements.” (Note that the problem title was a not-so-subtle hint towards the solution.)

Derangements

While the idea of a permutation should be familiar to most readers, the closely related topic of a derangement is rarely discussed in most undergraduate curriculum. So, it is reasonable to start with a definition:

A derangement is a permutation in which no element is in its original place. The number of derangements on $n$ elements is denoted $D_n$; this is also called the subfactorial of $n$, denoted $!n$.

The sequence $\langle D_n\rangle$ is A000166 in OEIS (a website with which, by the way, every competitive programmer should familiarize themselves).

It turns out that there is both a recursive and an explicit formula for $D_n$:

This is significant because we can use the explicit formulation for computing single values of derangements, or we can use dynamic programming to rapidly compute $D_n$ for relatively small $n$.

A friend recently emailed me asking for titles of books I’d recommend to read over the summer, particularly to prepare for computer science and mathematics. I’ve adapted my suggestions into this post. I’d like to note that I’ve restricted my responses to “non-textbooks;” otherwise, I’d have several more additions that would increase the average page count and price quite drastically. As such, these books don’t have problems to work or present an extreme level of detail, but in many cases they present enough information to provide a strong foundation and context for math and CS classes.

I will most likely write a separate blog post about this book. I read it during the end of the fall semester and found that it presented a very interesting approach to designing reusable code by utilizing principles from abstract algebra. It’s written to be accessible by someone who hasn’t studied abstract algebra yet, which means it also can serve as an introduction to that subject.

CODE: The Hidden Language of Computer Hardware and Software

Four years ago, I wrote a review of this book on RoboDesigners. At that time, my perspective was that of a high school student and I thought the book was interesting; with the additional perspective of a year of college study in Computer Science, I cannot recommend this book highly enough.

By “building” a computer piece-by-piece from the idea of a relay through developing a simple assembly language, it covers nearly all of the material as the Digital Logic Design course I took, but in an easy-to-read book. If you comprehend the material in this book, you will be able to coast through DLD.

When a mathematician with Hardy’s stature writes a book on why he studies math, it’s probably advisable to read it! Multiple professors of mine have said it’s a book any mathematician should read and I wholeheartedly agree. It’s really short (the printing I’ve linked above is only 154 pages), but the content is amazing. Hardy addresses the complaints many have with pure math and embodies the spirit of “doing mathematics for mathematics’ sake.” If you are thinking about pursuing a theoretical route in either CS or math, I highly recommend you read this book.

To grow my programming repertoire, I decided to learn a functional language; at the recommendation of a friend, I selected Haskell. Thus far, it seems great. As a mathematician at heart, I love the way that the notation and language constructs resemble math (list comprehensions, tuples, function composition, etc). In this blog post, I will outline the major resources I am using to learn Haskell.

To learn Haskell, I am using the ebook Learn You a Haskell for Great Good. Yes–terrible grammar in the title, but it’s (fairly) grammatically correct on the inside. This is a great introduction to Haskell, although I’d highly recommend prior knowledge of another programming language like Java or C++.

Unfortunately, that ebook is somewhat lacking in practice problems. It does have examples, but there isn’t a true “exercise” section like one would find in a textbook. This is a common fault with online programming tutorials; to be honest, creating a good exercise set is hard work. To remedy this problem, I turned to a favorite site of mine, HackerRank.com. While designed for competitive programmers, this site also has an “introductory” set of functional programming challenges (see here). These range in difficulty from very easy to extremely hard. This provides a great compliment to the tutorial I referenced above.

Finally, one last resource I am going to use after finishing Learn You a Haskell is a set of lectures by former University of Virginia student-teacher Nishant Shukla. Although I have not been able to view them in great detail, they appear to present a great introduction to Haskell.

How many divisors are there of the number $1281942112$? It turns out that determining the answer to this problem is (at most) only as difficult as determining the prime factorization of the number. In this blog post, I will outline a solution to this (and similar) problems.

The Math

The Fundamental Theorem of Arithmetic guarantees each positive integer greater than $1$ a unique prime factorization. We write this factorization as:

$$N = p_0^{e_0}p_1^{e_1}\cdots p_n^{e_n}$$

where $p_k$ is a prime number, and $e_k$ is its corresponding exponent. This provides us with useful information regarding divisors of $N$: any divisor of $N$ must be comprised of some combination of those prime factors (and exponents). Specifically, we can define the divisor, $D$, as:

$$D = p_0^{a_0}p_1^{a_1}\cdots p_n^{a_n}$$

where the $p_k$ are the same as in the factorization of $N$ and $a_k \in {0, 1, \ldots, e_k}$. To find the total number of divisors, we multiply together the number of options we have for each exponent. That is,

$$\text{Number of Divisors}\; = (e_0+1)(e_1+1)\cdots(e_n + 1)$$

Example: Consider $N = 20$. In this case, $N$ has $6$ divisors; to determine this without needing to list them all, we may note that $N = 2^2\cdot 5^1$. Using the notation described above, this means that $p_0 = 2,\;p_1 = 5$ and $e_0 = 2\;e_1 = 1$. Each of our divisors will be of the form $2^{a_0}\cdot 5^{a_1}$, where $a_0$ could be $0, 1,$ or $2$ and $a_1$ could be $0$ or $1$. Since we have $e_0+1 = 3$ options for $a_0$ and $e_1+1 = 2$ options for $a_1$, we have $3\cdot 2 = 6$ total divisors. In case you were wondering, the list of divisors is:

Algorithmic efficiency is imperative for success in programming competitions; your programs must be accurate and fast. To help evaluate algorithms for speed, computer scientists focus on what is called “asymptotics,” or “asymptotic analysis.” The key question answered by asymptotics is: “When your input gets really big, how many steps does your program take?” This post seeks to explain basic asymptotic analysis and its application to computing simple program runtime.

The underlying principle of asymptotic analysis is that a program’s runtime depends on the number of elementary operations it performs. The fewer elementary operations, the faster the program (and vice-versa). What do I mean by “elementary operation?” By this, I refer to any operation such that the runtime is not affected by the input size. This is more commonly referred to as a constant-time operation. Examples of such operations are assignment, basic arithmetic operations (+, -, *, /, %), accessing an array element, increment/decrement operations, function returns, and boolean expressions.

A First Example

So, a good way of gauging the runtime of a program is to count the number of elementary operations it performs. Let’s jump right in by analyzing a simple program.

1 2 3 4 5 6 7 8

publicstaticinttest(int N){ int i = 0;

while(i < N) { i++; } return i; }

Obviously, this program always returns $N$, so the loop is unnecessary. However, let’s just analyze the method as-is.

Lines 2 and 7 each contribute one constant-time operation. The loop contributes two constant-time operations per iteration (one for the comparison, one for the increment), plus one extra constant-time operation for the final comparison that terminates the loop. So, the total number of operations is:

(Notice how I used sigma (summation) notation for counting a loop’s operation. This is useful, because loops and sigma notation behave in much the same way.)

Thus, it will take $3+2N$ operations to perform that method, given an input $N$. If each operation takes $2\times 10^{-9}$ (about the speed of a 2 GHz processor), it would take 5 seconds to run this program for an input of $N=10^{10}$.

I am an avid member of the Math.StackExchange community. We have recently reached a milestone, as our request to create a site blog has been approved by the Stack Exchange administration. I volunteered to write a post which I believe should be useful to competition programmers.

Using Green’s Theorem, this post derives a formula for the area of any simple polygon, dependent solely on the coordinates of the vertices. This is useful for some computational geometry problems in programming; for example, the formula can be used to compute the area of the convex hull of a set of points.

After managing a fairly successful blog for many years about competitive robotics, I am attempting to re-brand myself as I begin my studies in the field of Computer Science and Mathematics.

This blog will be the place I post interesting pieces of code I either develop or find, as well as math concepts useful to competitive programmers. This blog will focus heavily on ACM-style competitions, and may occasionally contain hints or my solutions to problems from sites like USACO or UVA Online Judge. I will attempt to post most of my code from here on a GitHub repository, but I’m still experimenting with that.

Java is my “native language,” although I know Visual BASIC, C++ to a fair extent (few people can actually say they “really know” C++), and some Python. I’ll try to mix up the languages I post (I’ll have a tag for each language), but I predict most of my posts will be Java-oriented.

As always, topic suggestions are welcome via comments on any post.