As part of a recent homework set in my complex analysis course, I was tasked with a problem that required a slight generalization on part of Schwarz’s Lemma:

Lemma (Schwarz): Let $f$ be analytic on the unit disk with $|f(z)| \leq 1$ for all $z$ on the disk and $f(0) = 0$. Then $|f(z)| < |z|$ and $f’(0)\leq 1$. If either $|f(z)|=|z|$ for some $z\neq0$ or if $|f’(0)|=1$, then $f$ is a rotation, i.e., $f(z)=az$ for some complex constant $a$ with $|a|=1$.

The homework assignment asked us (within the context of a larger problem) to consider the case when $f(\zeta) = 0$ for some $\zeta \neq 0$ on the interior of the unit disk. The secret to this problem was to find some analytic function $\varphi$ that maps the unit disk to itself, but swaps $0$ and $\zeta$. Then, we may consider $\varphi^{-1}\circ f\circ \varphi$ and apply Schwarz’s Lemma.

For the past couple years, I have worked as a teaching assistant for UVa’s CS 2150 (Program and Data Representation) course. We recently started a GroupMe chat for the course staff, and I thought it would be fun to create a chatbot to help remind all the TAs to submit timesheets, keep track of when people are holding office hours, and remember when/where TA meetings are being held. Setting up a basic chatbot is a lot simpler than it sounds and is really fun–I wrote my bot from scratch using Python in just one day.

GroupMe Bot Overview

GroupMe has a very brief tutorial explaining how their API may be used for bots. The easiest way to create a bot is through their web form, which prompts you for the bot’s name, callback URL (technically optional, but you want it), avatar URL (optional), and the name of the group where the bot will live. Once you’ve done this, you will be given a unique bot ID token. Anyone with this token can pretend to be your bot, so keep it safe. (Security is somewhat laughable here: your bot asserts its ID and the server believes it with no “login” procedure.) We’ll talk more about the callback URL in a moment; for now, just leave it blank.

Once you’ve done these steps, you have created a bot–as far as GroupMe is concerned. If you send a specifically formatted JSON mssage, the newly created bot will post in your group. However, if left at this point, your “bot” is little more than a proxy for human-written messages submitted with curl. Your bot needs some way of reading messages sent to the group, formulating a response, and only then sending messages to the GroupMe servers.

While interning at Microsoft over the summer, I received a first-generation Surface Book with an i5-6300U CPU (2.4 GHz dual core with up to 3.0 GHz), 8GB RAM, and a “GeForce GPU” (officially unnamed, but believed to be equivalent to a GT 940). This is a huge step up from my older laptop, so I wanted to set it up for my ML work. In this post, I’ll outline how I set it up with TensorFlow and GPU acceleration.

CUDA + cuDNN

If you want to use GPU acceleration, the typical way to do so is with NVIDIA’s CUDA API. CUDA 8.0 is compatible with the Surface Book and is (as of this writing) the most up-to-date version of CUDA. Download it from the NVIDIA website and run their installer.

For work with deep learning, you’ll also want to install cuDNN. To install, just download the library from NVIDIA’s website and unzip it in a convenient place (I chose C:\cudnn). The only “installation” you need to do is to add C:\cudnn\bin to your PATH environment variable.

Nearly everyone is familiar with two-dimensional plots, and most college students in the hard sciences are familiar with three dimensional plots. However, modern datasets are rarely two- or three-dimensional. In machine learning, it is commonplace to have dozens if not hundreds of dimensions, and even human-generated datasets can have a dozen or so dimensions. At the same time, visualization is an important first step in working with data. In this blog entry, I’ll explore how we can use Python to work with n-dimensional data, where $n\geq 4$.

Packages

I’m going to assume we have the numpy, pandas, matplotlib, and sklearn packages installed for Python. In particular, the components I will use are as below:

1 2 3 4 5 6 7 8

import matplotlib.pyplot as plt import pandas as pd

from sklearn.decomposition import PCA as sklearnPCA from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA from sklearn.datasets.samples_generator import make_blobs

from pandas.tools.plotting import parallel_coordinates

Plotting 2D Data

Before dealing with multidimensional data, let’s see how a scatter plot works with two-dimensional data in Python.

First, we’ll generate some random 2D data using sklearn.samples_generator.make_blobs. We’ll create three classes of points and plot each class in a different color. After running the following code, we have datapoints in X, while classifications are in y.

1

X, y = make_blobs(n_samples=200, centers=3, n_features=2, random_state=0)

To create a 2D scatter plot, we simply use the scatter function from matplotlib. Since we want each class to be a separate color, we use the c parameter to set the datapoint color according to the y (class) vector.

Like many of my fellow Americans, I stayed up late tonight to watch the polling results for the 2016 General Election. As of my writing this, it appears that Donald Trump will win by a slight margin. The New York Times is predicting that the popular vote will go to Hillary Clinton, while Politico and the Wall Street Journal are showing the current popular vote is Trump’s by about 1 million.

Thanks to a helpful blog post by CodeBlocQ, I’ve now enabled Disqus-powered comments on the blog! Let me know what you think about my posts, and I’ll keep an eye on discussions to respond to questions/comments/concerns!

The second part of the Microsoft series should be out soon; I wanted to get comments working before I did so, but it took me a while to find the time to actually get it up and running.

As suggested by this post’s title, I spent this past summer as an intern with Microsoft in Redmond, Washington. The experience was highly educational for me–as my first (and last!) “real” internship, I learned a lot about software development and the importance of corporate culture, as well as discovering a lot about myself. Overall, the experience was a positive one, though, and I had an enormous amount of fun!

This is the first of a three-part series on my time at Microsoft. This post focuses on fun recreational activities for interns in the Seattle area.

Outdoors

The Pacific Northwest is home to some of the most amazing views I’ve ever seen. Seattle is conveniently located close to the beach, the mountains, Puget Sound, rainforests, and many hiking trails and campsites. Exploring the outdoors also has the advantage of being very inexpensive, which is great if you’re saving your internship money for college expenses. If you visit National Parks, consider the National Park Passport Program–if you’re going to once-in-a-lifetime parks, it’s a good idea to get your passbook stamped!

This summer, I’ve been working as an intern for Microsoft on the Direct2D/DirectWrite team. While I can’t really talk about what my work entails, I can talk about some of the fun things I’ve done this summer in my free time and the non-work-related components of my internship. I suppose most people wouldn’t start blogging about their internship by describing a bookstore, but I went to this place today and it was so incredible that I had to write about it.

In Capitol Hill, there’s a small store by the name Ada’s Technical Books. It’s in a house that’s been converted to a cafe and bookstore, and is quite possibly the most amazing bookstore I’ve ever seen. As you walk in, you’re greeted by an small cafe counter to your left and an open area to your right with short bookcases and comfy chairs. Toys, puzzles, and “Maker”-appropriate items like lockpicks and Raspberry Pis.

Many computer science problems utilize graph-based data structures. Their use can range from explicit inclusion in an algorithm-centric problem (like path-finding) to a more “behind-the-scenes” presence in Bayesian networks or descriptions of finite automata. Unfortunately, visualizing large graphs can be difficult to do, especially for debugging. Unlike lists or dictionaries, which can be represented clearly by plain text printing, depicting a graph tends to require more graphics overhead than is reasonable for most programmers to write simply for debugging purposes. I’ve found that Graphviz, a free graph visualization utility, can be quite useful in debugging graph-related programs.

Installing Graphviz

If you’re on a Debian-based Linux OS (e.g. Ubuntu), you can install Graphviz using apt-get. Just run $ sudo apt-get install graphviz and you’ll have everything you need to complete the steps in this blog post. Mac OS X users can use brew equivalently.

Windows users should install using a binary downloaded from the Graphviz Windows page, but there might be some issues with setting the PATH variable for running in the commandline.

Making a basic graph

Once you’ve installed, the next thing you’ll want to do is create a basic graph to ensure the installation succeeded and to gain practice using Graphviz tools. We do this by creating a *.dot file that describes the graph we wish to display. If you’re the type of person who likes to jump right in and experiment first before reading too much, or if you love formal language specification, the DOT grammar is fairly readable and can give a quick introduction to creating DOT files.

The below is a fairly representative DOT file to demonstrate some of the capabilities of Graphviz. Open your favorite text editor, copy/paste it in, and save it as firstgraph.dot:

digraph G { // Style information for nodes A [style="filled", color=".05 .3 1.0"]

// Edge declarations A -> {B, C}; D -> E [label="-1"]; E -> F [label="3"]; F -> D [label="10"]; }

This creates a directed graph (also called a digraph) with six nodes and two connected components. Some of the edges have labels, and one of the nodes is colored. After you’ve copied (or downloaded) this file, open up a terminal to the directory with firstgraph.dot in it and run $ dot firstgraph.dot -Tpng -o firstgraph.png. The resulting image file should look something like the below:

This past week, my ICPC team worked the 2013 Greater New York Regional problem packet. One of my favorite problems in this set was Problem E: Deranged Exams. The code required to solve this problem isn’t that complicated, but the math behind it is a little unusual. In this post, I aim to explain the math and provide a solution to this problem.

Problem Description

The full problem statement is archived online; in shortened form, we can consider the problem to be:

Given a “matching” test of $n$ questions (each question maps to exactly one answer, and no two questions have the same answer), how many possible ways are there to answer at least the first $k$ questions wrong?

It turns out that there’s a really nice solution to this problem using a topic from combinatorics called “derangements.” (Note that the problem title was a not-so-subtle hint towards the solution.)

Derangements

While the idea of a permutation should be familiar to most readers, the closely related topic of a derangement is rarely discussed in most undergraduate curriculum. So, it is reasonable to start with a definition:

A derangement is a permutation in which no element is in its original place. The number of derangements on $n$ elements is denoted $D_n$; this is also called the subfactorial of $n$, denoted $!n$.

The sequence $\langle D_n\rangle$ is A000166 in OEIS (a website with which, by the way, every competitive programmer should familiarize themselves).

It turns out that there is both a recursive and an explicit formula for $D_n$:

This is significant because we can use the explicit formulation for computing single values of derangements, or we can use dynamic programming to rapidly compute $D_n$ for relatively small $n$.