A Source-Code Study of the SimpleHTTPServer Standard Library Module

Contents

Introduction
SimpleHTTPRequestHandler

Introduction

This module is built on top of the BaseHTTPServer module and adds concrete handlers for GET and HEAD requests.

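This module implements an HTTP server for browsing the directory it is started from (the server root) and its subdirectories. It can be launched directly from the command line:

python -m SimpleHTTPServer

Then open http://127.0.0.1:8000 or http://localhost:8000 in a browser.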
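The same server can also be started from code. The sketch below is an illustration rather than part of the module under study: it uses Python 3, where SimpleHTTPServer was merged into http.server, and the helper names start_server and fetch_status are made up for this example. Port 0 asks the operating system for any free port instead of the default 8000.

```python
import http.client
import http.server
import socketserver
import threading


def start_server(port=0):
    """Serve the current working directory in a background thread.

    port=0 lets the OS choose a free port; read it back from
    httpd.server_address. (Hypothetical helper for illustration.)
    """
    httpd = socketserver.TCPServer(
        ("127.0.0.1", port), http.server.SimpleHTTPRequestHandler
    )
    thread = threading.Thread(target=httpd.serve_forever, daemon=True)
    thread.start()
    return httpd


def fetch_status(httpd, path="/"):
    """Send one GET request to the server and return the status code."""
    host, port = httpd.server_address
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

Call httpd.shutdown() and httpd.server_close() when the server is no longer needed.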

SimpleHTTPRequestHandler

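# Module-level names that the excerpt below relies on
__version__ = "0.6"

import os
import posixpath
import BaseHTTPServer
import urllib
import cgi
import sys
import shutil
import mimetypes
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

class SimpleHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):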

    server_version = "SimpleHTTP/" + __version__

    def do_GET(self):
        f = self.send_head()
        if f:
            # f is a file object (or a file-like object)
            # from which the message body can be read
            self.copyfile(f, self.wfile)
            f.close()

    def do_HEAD(self):
        f = self.send_head()
        if f:
            # HEAD returns only the response headers; no message body is sent
            f.close()

    def send_head(self):
        path = self.translate_path(self.path)
        f = None
        # path is a directory
        if os.path.isdir(path):
            if not self.path.endswith('/'):
                # HTTP redirect, following Apache's behaviour:
                # if the client requests a directory path without a trailing "/",
                # ask it to request (self.path + "/") instead.
                # See `Issue17324` for the reasoning.
                self.send_response(301)
                self.send_header("Location", self.path + "/")
                self.end_headers()
                return None
            # If the requested directory contains `index.html` or `index.htm`, serve that file instead of a listing.
            for index in "index.html", "index.htm":
                index = os.path.join(path, index)
                if os.path.exists(index):
                    path = index
                    break
            else:
                return self.list_directory(path)

        # path is a file
        ctype = self.guess_type(path)
        try:
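            # The file is opened in binary mode rather than `rt` mode: text mode translates
            # line endings according to the platform, so the number of bytes read could
            # differ from the file size sent in `Content-Length` below.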
            f = open(path, 'rb')
        except IOError:
            self.send_error(404, "File not found")
            return None
        self.send_response(200)
        self.send_header("Content-type", ctype)
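        # Get the file size. os.path.getsize(path) would also work for the size alone, but a single
        # fstat on the already-open file yields both the size (fs[6], i.e. st_size) and st_mtime.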
        fs = os.fstat(f.fileno())
        self.send_header("Content-Length", str(fs[6]))
        self.send_header("Last-Modified", self.date_time_string(fs.st_mtime))
        self.end_headers()
        return f # the message body is simply the file content

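    def list_directory(self, path):
        """List the files in a directory, in HTML format."""
        # When a directory is requested, the file list is rendered as HTML into a file-like object, which is returned.
        # StringIO is used so the result matches `return f` in send_head: callers handle both through one interface.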
        try:
            list = os.listdir(path)
        except os.error:
            self.send_error(404, "No permission to list directory")
            return None
        list.sort(key=lambda a: a.lower())
        f = StringIO()
        displaypath = cgi.escape(urllib.unquote(self.path))
        # The markup below is what the browser shows under "View page source"; nothing special to explain.
        f.write('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">')
        f.write("<html>\n<title>Directory listing for %s</title>\n" % displaypath)
        f.write("<body>\n<h2>Directory listing for %s</h2>\n" % displaypath)
        f.write("<hr>\n<ul>\n")
        for name in list:
            fullname = os.path.join(path, name)
            displayname = linkname = name
            # Append / for directories or @ for symbolic links
            if os.path.isdir(fullname):
                displayname = name + "/"
                linkname = name + "/"
            if os.path.islink(fullname):
                displayname = name + "@"
                # Note: a link to a directory displays with @ and links with /
            f.write('<li><a href="%s">%s</a>\n'
                    % (urllib.quote(linkname), cgi.escape(displayname)))
        f.write("</ul>\n<hr>\n</body>\n</html>\n")
        length = f.tell()
        f.seek(0)
        self.send_response(200)
        encoding = sys.getfilesystemencoding()
        self.send_header("Content-type", "text/html; charset=%s" % encoding)
        self.send_header("Content-Length", str(length))
        self.end_headers()
        return f

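    def translate_path(self, path):
        """Translate the path in the HTTP request into a local filesystem path."""
        # abandon query parameters
        # A URL may carry a query string or fragment, but this code does not process them; they are simply dropped.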
        path = path.split('?',1)[0]
        path = path.split('#',1)[0]
        # Issue17324
        trailing_slash = True if path.rstrip().endswith('/') else False
        path = posixpath.normpath(urllib.unquote(path))
        words = path.split('/')
        words = filter(None, words) # drop empty strings
        path = os.getcwd()
        for word in words:
            drive, word = os.path.splitdrive(word)
            head, word = os.path.split(word)
            if word in (os.curdir, os.pardir): continue
            path = os.path.join(path, word)
        if trailing_slash:
            path += '/'
        return path

    def copyfile(self, source, outputfile):
        shutil.copyfileobj(source, outputfile)

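    def guess_type(self, path):
        """Guess the file's MIME type, used for the `Content-type` response header."""
        # This code guesses the type from the file extension only.
        # In practice, inspecting the file content (usually its first bytes) is more accurate,
        # but that means reading the file (disk I/O), so it costs some performance.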
        base, ext = posixpath.splitext(path)
        if ext in self.extensions_map:
            return self.extensions_map[ext]
        ext = ext.lower()
        if ext in self.extensions_map:
            return self.extensions_map[ext]
        else:
            return self.extensions_map['']

    if not mimetypes.inited:
        mimetypes.init() # try to read system mime.types
    extensions_map = mimetypes.types_map.copy()
    extensions_map.update({
        '': 'application/octet-stream', # Default
        '.py': 'text/plain',
        '.c': 'text/plain',
        '.h': 'text/plain',
        })
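The extension lookup in guess_type can be tried outside the handler. The following is a minimal Python 3 sketch, assuming the same override table as the class above; the module-level function and the sniff_type helper (with its small signature table) are hypothetical, added only to illustrate the content-based approach mentioned in the comments.

```python
import mimetypes
import posixpath

# System table plus the handler's overrides, mirroring extensions_map above.
EXTENSIONS_MAP = mimetypes.types_map.copy()
EXTENSIONS_MAP.update({
    "": "application/octet-stream",  # default for unknown extensions
    ".py": "text/plain",
    ".c": "text/plain",
    ".h": "text/plain",
})


def guess_type(path, extensions_map=EXTENSIONS_MAP):
    """Guess a MIME type from the extension: exact match first, then lowercase."""
    base, ext = posixpath.splitext(path)
    if ext in extensions_map:
        return extensions_map[ext]
    return extensions_map.get(ext.lower(), extensions_map[""])


# Hypothetical content-based alternative: compare the first bytes of a file
# against a few well-known signatures.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_type(head):
    """Return a MIME type for the leading bytes of a file, or None if unknown."""
    for magic, ctype in SIGNATURES:
        if head.startswith(magic):
            return ctype
    return None
```

The extension lookup never touches the disk, while sniff_type needs at least the first few bytes of each file, which is the trade-off the comments in guess_type describe.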

cgi.escape

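This function is similar to the _quote_html function in the BaseHTTPServer module: it escapes HTML special characters so that file names cannot inject markup into the generated page.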

In [82]: cgi.escape?
Type:        function
String form: <function escape at 0xa77a3e4>
File:        /usr/lib/python2.7/cgi.py
Definition:  cgi.escape(s, quote=None)
Docstring:
Replace special characters "&", "<" and ">" to HTML-safe sequences.
If the optional flag quote is true, the quotation mark character (")
is also translated.
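To see the effect on a directory listing entry, here is a small Python 3 sketch. It assumes html.escape as a stand-in for cgi.escape (which was removed in Python 3.8) and urllib.parse.quote in place of urllib.quote; the function name list_item is made up for this example.

```python
import html
from urllib.parse import quote


def list_item(name):
    """Build one listing line: URL-quote the link target, HTML-escape the label."""
    # quote=False escapes only &, < and >, like cgi.escape(s) without its quote flag.
    return '<li><a href="%s">%s</a>\n' % (quote(name), html.escape(name, quote=False))
```

A file named <b>.txt then appears in the page as the text &lt;b&gt;.txt instead of being interpreted as a tag.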

Issue17324

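See Issue17324 for details.

Traditionally, URLs do not end with a slash. Whether a trailing "/" is present mainly indicates whether the URL points to a file or to a directory, for example:

http://localhost:8000/too.txt points to a file named too.txt under the site root

http://localhost:8000/too.txt/ points to a directory named too.txt under the site root

Browsers and HTTP servers can both rely on this convention to optimise their handling and speed up page loading.

So if the file too.txt is requested as http://localhost:8000/too.txt/, the server should treat the path as a directory and, since no such directory exists, return a 404 error.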
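The trailing_slash handling in translate_path is what preserves this distinction. Below is a Python 3 sketch of the same steps, written as a standalone function; the root parameter is an assumption added so the example does not depend on the current working directory (the original uses os.getcwd()).

```python
import os
import posixpath
from urllib.parse import unquote


def translate_path(path, root):
    """Map a request path to a local path under root, keeping a trailing slash."""
    path = path.split("?", 1)[0]   # drop the query string
    path = path.split("#", 1)[0]   # drop the fragment
    trailing_slash = path.rstrip().endswith("/")
    # normpath collapses "." and ".." segments and strips the trailing slash.
    path = posixpath.normpath(unquote(path))
    words = [w for w in path.split("/") if w]
    result = root
    for word in words:
        drive, word = os.path.splitdrive(word)
        head, word = os.path.split(word)
        if word in (os.curdir, os.pardir):
            continue
        result = os.path.join(result, word)
    if trailing_slash:
        result += "/"              # restore the slash that normpath removed
    return result
```

Because the slash is restored, a request for /too.txt/ maps to a path ending in "/". When too.txt is a regular file, that path is not a directory and, at least on POSIX systems, cannot be opened as a file either, so send_head ends up in its 404 branch.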

filter

The first argument of filter accepts two kinds of values: a callable object, or None.

When it is None, filter behaves like filter(lambda x: bool(x), sequence).

In [40]: filter?
Type:        builtin_function_or_method
String form: <built-in function filter>
Namespace:   Python builtin
Docstring:
filter(function or None, sequence) -> list, tuple, or string

Return those items of sequence for which function(item) is true.  If
function is None, return the items that are true.  If sequence is a tuple
or string, return the same type, else return a list.

In [41]: filter(None, "saifasdf/a/asdfi///asdif".split("/"))
Out[41]: ['saifasdf', 'a', 'asdfi', 'asdif']
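The session above is Python 2, where filter returns a list. In Python 3 filter returns a lazy iterator, so a sketch of the same path-splitting step has to wrap it in list(); the function name split_path_words is made up for this example.

```python
def split_path_words(path):
    """Split a URL path on "/" and drop the empty strings between slashes."""
    # filter(None, ...) keeps only truthy items; list() materialises the iterator.
    return list(filter(None, path.split("/")))
```

A list comprehension such as [w for w in path.split("/") if w] gives the same result and reads the same way in both Python versions.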
