Software Engineer Interviews - #3 OEIS CLI

百变鹏仔, 4 months ago (01-16) #Python

Tags: software engineer

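Introduction

This is the third post in the Software Engineer Interviews series. This time I'm bringing a challenge I did a few years ago, for a position I actually got; the process also involved other technical interviews, such as a screening of past experience.

If you missed the previous posts in this series, you can find them here.

The challenge

This challenge was also a take-home coding task: I had to develop a CLI program that queries the OEIS (On-Line Encyclopedia of Integer Sequences) and returns the total number of results along with the names of the first five sequences returned by the query.

Thankfully, the OEIS query system supports a JSON output format, so you can get the results by calling the URL and passing the sequence as a query string.

Example input and output: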

oeis 1 1 2 3 5 7

Found 1096 results. Showing first five:
1. The prime numbers.
2. a(n) is the number of partitions of n (the partition numbers).
3. Prime numbers at the beginning of the 20th century (today 1 is no longer regarded as a prime).
4. Palindromic primes: prime numbers whose decimal expansion is a palindrome.
5. a(n) = floor(3^n / 2^n).

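Note: this result is outdated!

Solving the challenge

The plan to solve this challenge is as follows:

Create a client file responsible for fetching data from the OEIS query system

Create a formatter responsible for returning output formatted for the console

Since this is a coding challenge, I'll use Poetry to help me create the project structure and to make it easy for anyone to run it. You can check how to install and use Poetry on their website.

I'll start by creating the package with: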

poetry new oeis

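This creates a folder named oeis containing Poetry's configuration file, a tests folder, and another folder also named oeis, which will be the root of our project.

I'll also add an optional package called click, which helps build CLI tools. It isn't required and could be replaced with tools from Python's standard library, although the result would be less elegant.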
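For comparison, here is a minimal sketch of what the standard-library route could look like with argparse. The helper name parse_sequence is hypothetical and is not part of the article's project; it only illustrates collecting a space-separated sequence without click.

```python
import argparse


def parse_sequence(argv: list[str]) -> list[str]:
    """Collect the positional terms of a sequence using only argparse."""
    parser = argparse.ArgumentParser(prog="oeis")
    # nargs="*" gathers zero or more positional values into a list
    parser.add_argument("sequence", nargs="*", help="terms of the sequence")
    args = parser.parse_args(argv)
    return args.sequence


print(parse_sequence(["1", "2", "3", "4", "5"]))
# ['1', '2', '3', '4', '5']
```

Note that argparse returns a list here, so downstream code would receive a list of strings rather than a tuple.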

Inside the project folder, run:

poetry add click

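This adds click as a dependency of our project.

Now we can move on to the entrypoint file. If you open the folder oeis/oeis, you'll see there is already an __init__.py file. Let's update it to import click and to define a main function that will be invoked as the command: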

    # oeis/oeis/__init__.py
    import click


    @click.command()
    def oeis():
        pass


    if __name__ == "__main__":
        oeis()

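This is the starting point of our CLI. See the @click.command? It's a decorator from click that helps us define oeis as a command.

Now, remember that we need to receive a space-separated sequence of numbers? We need to add it as an argument. click has an option for that: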

    # oeis/oeis/__init__.py
    import click


    @click.command()
    @click.argument("sequence", nargs=-1)
    def oeis(sequence: tuple[str, ...]):
        # Temporary print to check that the arguments arrive as expected
        print(sequence)


    if __name__ == "__main__":
        oeis()

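This adds an argument named sequence, and the nargs=-1 option tells click to accept any number of values, separated by spaces, and pass them in as a tuple. I added a print so we can test that the argument is being passed correctly.

To tell Poetry that we have a command, we need to open pyproject.toml and add the following lines: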

    # oeis/pyproject.toml
    [tool.poetry.scripts]
    oeis = "oeis:oeis"

This adds a script named oeis that calls the oeis function in the oeis module. Now we run:

poetry install

This lets us call the script. Let's try it:

    ❯ poetry run oeis 1 2 3 4 5
    ('1', '2', '3', '4', '5')

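Perfect, the command and its arguments are parsed as we expected! Let's move on to the client. Under the oeis/oeis folder, create a folder named clients and, inside it, a file named __init__.py and a file named oeis_client.py.

If we expected to have other clients in this project, we could develop a base client class, but since we only have this one, that could be considered over-engineering. In the OEIS client class, we should have a base URL, which is the URL without any path, and we'll use it to build our queries: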

    # oeis/oeis/clients/oeis_client.py
    import requests
    from urllib.parse import urlencode


    class OEISClient:
        def __init__(self) -> None:
            self.base_url = "https://oeis.org/"

        def query_results(self, sequence: tuple[str, ...]) -> list:
            url_params = self.build_url_params(sequence)
            full_url = self.base_url + "search?" + url_params

            response = requests.get(full_url)
            # Raise an exception for 4xx/5xx responses instead of parsing them
            response.raise_for_status()
            return response.json()

        def build_url_params(self, sequence: tuple[str, ...]) -> str:
            sequence_str = ",".join(sequence)
            params = {"q": sequence_str, "fmt": "json"}
            return urlencode(params)

As you can see, we're importing the requests package. We need to add it to Poetry before we can use it:

poetry add requests

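Now, the client has a base URL that doesn't change. Let's dig into the other methods:

build_url_params: joins the terms of the sequence with commas and URL-encodes them, together with fmt=json, into a query string.

query_results: builds the full URL from the base URL and those parameters, sends a GET request, raises an error for an unsuccessful HTTP status, and returns the parsed JSON.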
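To make the URL construction concrete, here is a small standalone sketch that only builds the search URL and makes no network call. The function name build_search_url is hypothetical; the base URL, the search path, and the q and fmt parameters are the ones used by the client in this article.

```python
from urllib.parse import urlencode

BASE_URL = "https://oeis.org/"


def build_search_url(sequence: tuple[str, ...], fmt: str = "json") -> str:
    """Build the OEIS search URL for a sequence, mirroring the article's client."""
    # Terms are joined with commas; urlencode percent-encodes each comma as %2C
    params = {"q": ",".join(sequence), "fmt": fmt}
    return BASE_URL + "search?" + urlencode(params)


print(build_search_url(("1", "2", "3", "4", "5")))
# https://oeis.org/search?q=1%2C2%2C3%2C4%2C5&fmt=json
```

Using urlencode instead of concatenating the string by hand keeps the query valid even if a term contains characters that need escaping.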

We also need to update our main file to call this method:

    # oeis/oeis/__init__.py
    import click

    from oeis.clients.oeis_client import OEISClient

    # Created once at import time and reused by the command
    oeis_client = OEISClient()


    @click.command()
    @click.argument("sequence", nargs=-1)
    def oeis(sequence: tuple[str, ...]):
        data = oeis_client.query_results(sequence)
        print(data)


    if __name__ == "__main__":
        oeis()

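Here we build the client instance outside the function, so a new instance isn't created every time the command is called, and we use that instance inside the command.

Running this produces a very, very long response, since the OEIS has thousands of entries. Because we only need to know the total size and the first five entries, we can do the following: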

    # oeis/oeis/__init__.py
    import click

    from oeis.clients.oeis_client import OEISClient

    oeis_client = OEISClient()


    @click.command()
    @click.argument("sequence", nargs=-1)
    def oeis(sequence: tuple[str, ...]):
        data = oeis_client.query_results(sequence)
        size = len(data)
        # Slicing never fails: shorter lists simply yield fewer entries
        top_five = data[:5]
        print(size)
        print(top_five)


    if __name__ == "__main__":
        oeis()

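Running this is already much better than before. We now print the total size and the first five entries (if they exist).

But we don't need all of that either. Let's build a formatter to properly format our output. Create a folder named formatters containing an __init__.py file and an oeis_formatter.py file.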

    # oeis/oeis/formatters/oeis_formatter.py
    def format_output(query_result: list) -> str:
        size = len(query_result)
        top_five = query_result[:5]
        # Single quotes inside the f-string keep this valid before Python 3.12
        top_five_list = [f"{i+1}. {entry['name']}" for i, entry in enumerate(top_five)]
        top_five_str = "\n".join(top_five_list)

        first_line = f"Found {size} results. Showing the first {len(top_five)}:\n"

        return first_line + top_five_str

This file basically formats the first five results into the output we want. Let's use it in the main file:

    # oeis/oeis/__init__.py
    import click

    from oeis.clients.oeis_client import OEISClient
    from oeis.formatters import oeis_formatter

    oeis_client = OEISClient()


    @click.command()
    @click.argument("sequence", nargs=-1)
    def oeis(sequence: tuple[str, ...]):
        data = oeis_client.query_results(sequence)
        output = oeis_formatter.format_output(data)
        print(output)


    if __name__ == "__main__":
        oeis()

If you run this code, you'll now get:

Found 10 results. Showing the first 5:
1. a(n) is the number of partitions of n (the partition numbers).
2. a(n) = floor(3^n / 2^n).
3. Partition triangle A008284 read from right to left.
4. Number of n-stacks with strictly receding walls, or the number of Type A partitions of n in the sense of Auluck (1951).
5. Number of partitions of n into prime power parts (1 included); number of nonisomorphic Abelian subgroups of symmetric group S_n.

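The output now comes back in the format we expect, but notice that it says it found 10 results. That's wrong: if you search on the OEIS website, you'll see many more results. Unfortunately, the OEIS API was updated and the JSON response no longer includes a count of the results. The count still appears in the text-formatted output, though, and we can use it to find out how many results there are.

To do that, we can change the URL to use fmt=text and apply a regular expression to find the value we want. Let's update the client code to fetch the text data, and update the formatter to use this data so we can output it.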

    # oeis/oeis/clients/oeis_client.py
    import re

    import requests
    from urllib.parse import urlencode


    class OEISClient:
        def __init__(self) -> None:
            self.base_url = "https://oeis.org/"
            # Captures the total from the "Showing ... of <count>" line
            self.count_regex = re.compile(r"Showing .* of (\d+)")

        def query_results(self, sequence: tuple[str, ...]) -> list:
            url_params = self.build_url_params(sequence, fmt="json")
            full_url = self.base_url + "search?" + url_params

            response = requests.get(full_url)
            response.raise_for_status()
            return response.json()

        def get_count(self, sequence: tuple[str, ...]) -> str:
            url_params = self.build_url_params(sequence, fmt="text")
            full_url = self.base_url + "search?" + url_params

            response = requests.get(full_url)
            response.raise_for_status()
            return self.get_response_count(response.text)

        def build_url_params(self, sequence: tuple[str, ...], fmt: str) -> str:
            sequence_str = ",".join(sequence)
            params = {"q": sequence_str, "fmt": fmt}
            return urlencode(params)

        def get_response_count(self, response_text: str) -> str:
            match = self.count_regex.search(response_text)

            if not match:
                raise Exception("Count not found!")

            return match.group(1)

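As you can see, we added two new methods:

get_count: requests the same search with fmt=text and returns the result count found in the response.

get_response_count: applies the regular expression to the response text and returns the captured count, raising an exception when there is no match.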
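The count extraction can be tried in isolation, without calling the OEIS. The sketch below is illustrative only: the function name extract_count and the SAMPLE_TEXT excerpt are assumptions, and the real fmt=text response may be laid out differently. It assumes the response contains a line of the form "Showing 1-10 of N".

```python
import re

# Assumed pattern: a line such as "Showing 1-10 of 7821" in the text response
COUNT_REGEX = re.compile(r"Showing .* of (\d+)")

# Hypothetical excerpt of a fmt=text response, for illustration only
SAMPLE_TEXT = "Search: seq:1,2,3,4,5\nShowing 1-10 of 7821\n"


def extract_count(response_text: str) -> int:
    """Return the total number of results reported in the text response."""
    match = COUNT_REGEX.search(response_text)
    if match is None:
        raise ValueError("Count not found!")
    return int(match.group(1))


print(extract_count(SAMPLE_TEXT))
# 7821
```

Returning an int rather than a string is a design choice of this sketch; the article's client keeps the count as a string because it is only interpolated into the output.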

    # oeis/oeis/formatters/oeis_formatter.py
    def format_output(query_result: list, count: str) -> str:
        top_five = query_result[:5]
        # Single quotes inside the f-string keep this valid before Python 3.12
        top_five_list = [f"{i+1}. {entry['name']}" for i, entry in enumerate(top_five)]
        top_five_str = "\n".join(top_five_list)

        first_line = f"Found {count} results. Showing the first {len(top_five)}:\n"

        return first_line + top_five_str

In this file, we only added a new parameter to the function and used it in place of the length of the query result.

    # oeis/oeis/__init__.py
    import click

    from oeis.clients.oeis_client import OEISClient
    from oeis.formatters import oeis_formatter

    oeis_client = OEISClient()


    @click.command()
    @click.argument("sequence", nargs=-1)
    def oeis(sequence: tuple[str, ...]):
        data = oeis_client.query_results(sequence)
        # Second request: the total is only available in the text format
        count = oeis_client.get_count(sequence)
        output = oeis_formatter.format_output(data, count)
        print(output)


    if __name__ == "__main__":
        oeis()

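Here we just call the new method on the client and pass the information to the formatter. Running it again produces the output we were expecting: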

    ❯ poetry run oeis 1 2 3 4 5
    Found 7821 results. Showing the first 5:
    1. The positive integers. Also called the natural numbers, the whole numbers or the counting numbers, but these terms are ambiguous.
    2. Digital sum (i.e., sum of digits) of n; also called digsum(n).
    3. Powers of primes. Alternatively, 1 and the prime powers (p^k, p prime, k >= 1).
    4. The nonnegative integers.
    5. Palindromes in base 10.

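The code is basically ready. For a real challenge, though, remember to use Git whenever possible, make small commits, and of course add unit tests, a code formatting library, a type checker, and whatever else you feel is needed.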
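As one possible starting point for the unit tests mentioned above, here is a sketch that uses the standard library's unittest. To stay self-contained it carries its own copy of the formatter, assuming entries are joined with newlines; in the real project the function would be imported from oeis.formatters.oeis_formatter instead. The test class and the sample entries are hypothetical.

```python
import unittest


def format_output(query_result: list, count: str) -> str:
    # Local copy of the formatter so this sketch runs on its own
    top_five = query_result[:5]
    lines = [f"{i + 1}. {entry['name']}" for i, entry in enumerate(top_five)]
    first_line = f"Found {count} results. Showing the first {len(top_five)}:\n"
    return first_line + "\n".join(lines)


class FormatOutputTest(unittest.TestCase):
    def test_limits_output_to_five_entries(self):
        # Seven made-up entries: only the first five should be listed
        entries = [{"name": f"Sequence {n}"} for n in range(7)]
        output = format_output(entries, "7")
        self.assertEqual(len(output.splitlines()), 6)  # header + five entries
        self.assertNotIn("Sequence 5", output)

    def test_handles_fewer_than_five_entries(self):
        output = format_output([{"name": "The prime numbers."}], "1")
        self.assertEqual(
            output, "Found 1 results. Showing the first 1:\n1. The prime numbers."
        )
```

Saved under the tests folder that Poetry created, a file like this can be run with python -m unittest.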

Good luck!
