Problem 4: Shakespeare and Dictionaries (0pts)

Note that this problem is in shakespeare.py, not hw04.py. It will not be tested and graded by OJ.

We will use dictionaries to approximate the entire works of Shakespeare! We're going to use a bigram language model. Here's the idea: We start with some word -- we'll use "The" as an example. Then we look through all of the texts of Shakespeare, and for every instance of "The" we record the word that follows it and add that word to a list, known as the successors of "The". Now suppose we've done this for every word Shakespeare ever used.
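The recording step above can be sketched as follows. This is only a minimal illustration, not the assignment's reference solution: the function name `build_successors_table`, the whitespace tokenization, and the tiny sample text are all assumptions made for the example.

```python
def build_successors_table(tokens):
    """Map each word in tokens to the list of words that directly follow it.

    Hypothetical helper for illustration; the name and signature are assumed.
    """
    table = {}
    # Pair every token with the token that comes right after it.
    for prev, word in zip(tokens, tokens[1:]):
        # Repeats are kept on purpose, so a successor that follows prev
        # often is more likely to be drawn when choosing at random later.
        table.setdefault(prev, []).append(word)
    return table


# A tiny stand-in for the real corpus, with the period as its own token.
text = "The cat sat . The dog sat ."
table = build_successors_table(text.split())
print(table["The"])
```

Keeping duplicates in each list is a design choice: a uniform random draw from the list then reflects how often each successor actually appears in the text.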

Let's go back to "The". We randomly choose a word from the successors of "The", say "cat". Then we look up the successors of "cat", randomly choose a word from that list, and continue in the same way. The process ends when we reach a period ("."), and at that point we have generated a Shakespearean sentence!
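The random walk just described might look like the sketch below. It assumes the successors have already been collected into a dictionary that maps each word to its list of successors; the name `construct_sentence` and the hand-written `example_table` are hypothetical and stand in for the real data.

```python
import random


def construct_sentence(word, table):
    """Follow randomly chosen successors from word until a period is reached.

    Hypothetical helper for illustration; the name and signature are assumed.
    """
    words = [word]
    # The "word in table" check is an extra guard for tiny tables where a
    # word has no recorded successors; the description above assumes that
    # a period is always reached eventually.
    while word != "." and word in table:
        word = random.choice(table[word])
        words.append(word)
    return " ".join(words)


# Hand-written successors for illustration only.
example_table = {
    "The": ["cat", "dog"],
    "cat": ["sat"],
    "dog": ["sat"],
    "sat": ["."],
}
sentence = construct_sentence("The", example_table)
print(sentence)
```

With this table the result is either "The cat sat ." or "The dog sat .", depending on the random draw. On a table where no period is reachable from the starting word, the loop would not stop, so a real implementation may want a length limit.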

The object that we'll be looking things up in is called a "successor table", although really it's just a dictionary. The keys in this dictionary are words, and the values are lists of successors to those words.
