rollup

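The next-generation ES module bundler

Some plugins

This section introduces some commonly used plugins.

Official plugins

Plugin	Purpose
@rollup/plugin-alias	Adds support for alias configuration
@rollup/plugin-commonjs	Lets CJS files be imported with ESM syntax
@rollup/plugin-json	Lets json files be imported directly
@rollup/plugin-babel	Brings in babel's capabilities
@rollup/plugin-replace	String replacement
@rollup/plugin-typescript	Adds support for ts
@rollup/plugin-node-resolve	Resolves modules using the Node resolution algorithm
@rollup/plugin-terser	Minifies the output

All official rollup plugins can be found in the rollup/plugins repository.

Community plugins

Plugin	Purpose
rollup-plugin-typescript2	Adds support for ts, with friendlier error output
rollup-plugin-copy	Copies files, with glob pattern support
rollup-plugin-dts	d.ts generation support
rollup-plugin-esbuild	Brings in esbuild's capabilities (transform, minify)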

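How tree-shaking works

If we know nothing about how tree-shaking works, some of the practices around it can be hard to understand. So below I work bottom-up, starting from building the most basic data structures, and walk through the whole process step by step.

AST

As we know, rollup is built on ESM, which makes static code analysis possible. So how is the code analysed? The answer is through the AST (abstract syntax tree). If you have written plugins for babel or eslint, the AST will already be familiar.

Many JS parsers can turn code into an AST (babel, recast, swc, espree...)

TIP

Although there are many parsers, most of them have converged on the estree spec so that an AST can be reused across tools.

Here we use acornjs/acorn to parse code into an AST.

In this playground you can choose acorn as the parser, parse a snippet of code, and inspect the structure of the resulting AST.

walker

As mentioned above, an AST is a tree. To visit its nodes, we can write our own DFS (depth-first search) to traverse it.

This is a little inconvenient, though. For example, if I only want to find the function node named foo, I have to keep descending into the children of many different node types, which is tedious to write.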
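To make the hand-written approach concrete, here is a minimal sketch of such a DFS. It does not use acorn: the AstNode shape, the isNode guard, the findFunction helper, and the hand-built tree are all assumptions made for illustration, and real ESTree nodes carry more fields.

```typescript
// Minimal ESTree-like node shape (assumption); real nodes also carry positions etc.
interface AstNode {
  type: string
  [key: string]: unknown
}

// Treat any object with a string `type` property as a node.
function isNode(value: unknown): value is AstNode {
  return typeof value === 'object' && value !== null
    && typeof (value as { type?: unknown }).type === 'string'
}

// Depth-first search returning the first function declaration called `name`.
function findFunction(node: AstNode, name: string): AstNode | null {
  if (node.type === 'FunctionDeclaration' && isNode(node.id) && node.id.name === name)
    return node

  // Visit every property that holds a node or an array of nodes.
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value]
    for (const child of children) {
      if (!isNode(child))
        continue
      const found = findFunction(child, name)
      if (found)
        return found
    }
  }
  return null
}

// Hand-written AST for: let x = 10; function foo() { return 20 }
const program: AstNode = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'let',
      declarations: [{
        type: 'VariableDeclarator',
        id: { type: 'Identifier', name: 'x' },
        init: { type: 'Literal', value: 10 },
      }],
    },
    {
      type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'foo' },
      params: [],
      body: {
        type: 'BlockStatement',
        body: [{ type: 'ReturnStatement', argument: { type: 'Literal', value: 20 } }],
      },
    },
  ],
}

const fooNode = findFunction(program, 'foo')
console.log(fooNode?.type) // FunctionDeclaration
```

Even in this small form, the traversal logic and the "is this the node I want" check are tangled together, which is the inconvenience described above.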

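So in most cases we use a walker (Abstract Syntax Tree walker) to traverse the syntax tree, for example acorn-walk. That way we can go straight to nodes of the type we need and then filter by other information to reach the target node.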

import * as walk from 'acorn-walk'
import * as acorn from 'acorn'

const ast = acorn.parse(`
  let x = 10;
  function foo(){
    return 20;
  }
`, { ecmaVersion: 'latest' }
)

walk.simple(ast, {
  Literal(node) {
    console.log(`Found a literal: ${node.value}`)
    // Found a literal: 10
    // Found a literal: 20
  },
  Function(node, state) {
    console.log(`Found a function: ${node.id.name}`)
    // Found a function: foo
  },
})

Another option is a walker with hooks, such as Rich-Harris/estree-walker. It runs the DFS over the AST internally and lets us execute our own code in its enter/leave hooks.

import * as acorn from 'acorn'
import { walk } from 'estree-walker'
import type { Node } from 'estree'

const ast = acorn.parse(`
  let x = 10;
  function foo(){
    return 20;
  }
`, { ecmaVersion: 'latest' }) as Node

walk(ast, {
  enter(node, parent, prop, index) {
    console.log({node})
  },
  leave(node, parent, prop, index) {
    // some code happens
  }
});
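To show what the enter/leave hooks amount to, the sketch below implements a tiny hook-based DFS by hand. It is not the estree-walker implementation: walkTree, AstNode, and isNode are hypothetical names, and the tree is written by hand rather than produced by a parser.

```typescript
// Minimal ESTree-like node shape (assumption for this sketch).
interface AstNode {
  type: string
  [key: string]: unknown
}

interface Visitor {
  enter?: (node: AstNode, parent: AstNode | null) => void
  leave?: (node: AstNode, parent: AstNode | null) => void
}

function isNode(value: unknown): value is AstNode {
  return typeof value === 'object' && value !== null
    && typeof (value as { type?: unknown }).type === 'string'
}

// DFS: call enter before visiting the children and leave after them.
function walkTree(node: AstNode, visitor: Visitor, parent: AstNode | null = null): void {
  visitor.enter?.(node, parent)
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value]
    for (const child of children) {
      if (isNode(child))
        walkTree(child, visitor, node)
    }
  }
  visitor.leave?.(node, parent)
}

// Hand-written AST for the statement: x + 1
const tree: AstNode = {
  type: 'ExpressionStatement',
  expression: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Identifier', name: 'x' },
    right: { type: 'Literal', value: 1 },
  },
}

const events: string[] = []
walkTree(tree, {
  enter: node => events.push(`enter ${node.type}`),
  leave: node => events.push(`leave ${node.type}`),
})
console.log(events.join('\n'))
```

Each node is entered before its children and left after them, so the enter/leave pairs nest like brackets. The scope tracking later in this article relies on exactly that property.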

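The Scope class

We define a Scope class that records information about the current scope.

Basic properties of a scope

A scope has a name, name; a parent scope, parent; and the names of the variables declared in it, names. A function scope may also have parameters, which are variables too, so we record them in params.

class Scope {
  name?: string
  parent?: Scope
  names: string[]
  params: string[]
  constructor(options: {
    name?: string; parent?: Scope; params?: string[]; names?: string[]
  } = {}) {
    // Name of the scope
    this.name = options.name
    // Parent scope
    this.parent = options.parent
    // Variables defined in this scope
    this.names = options.names || []

    // Parameters of a function scope are also variables of that scope
    this.params = options.params || []
  }
}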

Adding a variable

We also need a method that adds the variables of the current Scope to names as the AST is traversed.

add(name) {
  this.names.push(name)
}

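Looking up a variable

Since the parent property records the enclosing Scope, we can walk up recursively to find the scope in which a variable is defined.

// Recursively walk up to find the scope that defines the variable name
findDefiningScope(name) {
  // Check whether the current scope defines the variable
  if (this.params.includes(name) || this.names.includes(name))
    return this

  else if (this.parent)
    return this.parent.findDefiningScope(name)

  else
    return null
}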
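Putting the pieces together, the following sketch assembles the class from the fragments above and exercises it on two hand-built scopes. The example program in the comment and the scope names 'global' and 'foo' are made up for illustration.

```typescript
class Scope {
  name?: string
  parent?: Scope
  names: string[]
  params: string[]

  constructor(options: {
    name?: string; parent?: Scope; params?: string[]; names?: string[]
  } = {}) {
    this.name = options.name
    this.parent = options.parent
    this.names = options.names ?? []
    this.params = options.params ?? []
  }

  // Record a variable declared in this scope.
  add(name: string): void {
    this.names.push(name)
  }

  // Walk up the parent chain until a scope declares `name`.
  findDefiningScope(name: string): Scope | null {
    if (this.params.includes(name) || this.names.includes(name))
      return this
    return this.parent ? this.parent.findDefiningScope(name) : null
  }
}

// Scopes for: let x = 10; function foo(a) { let y = a + x }
const globalScope = new Scope({ name: 'global' })
globalScope.add('x')
globalScope.add('foo')

const fooScope = new Scope({ name: 'foo', parent: globalScope, params: ['a'] })
fooScope.add('y')

console.log(fooScope.findDefiningScope('a')?.name) // foo
console.log(fooScope.findDefiningScope('x')?.name) // global
console.log(fooScope.findDefiningScope('z')) // null
```

A lookup that returns null means no scope in the chain declares the name. The analysis step later uses this result to decide that a variable comes from outside the module.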

analysis

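Next we write an analysis module that walks an AST with DFS, produces the information we need, and attaches it to the AST nodes.

currentScope

First, we need a currentScope variable at the top of the function to record which Scope we are in during the DFS.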

function analysis(ast, magicString, moduleInstance) {
  let currentScope = new Scope()
  // ...
}

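Initializing state on the top-level nodes

We initialize the state of each top-level node: its source code, the global variables it defines, the external variables it depends on, and whether it has been included in the bundle.

// Top-level nodes under body belong to the top-level scope
ast.body.forEach((statement) => {
  Object.defineProperties(statement, {
    // Attach the corresponding source code to each child node
    _source: { value: magicString.snip(statement.start, statement.end) },
    _defines: { value: {} }, // global variables of the module defined by this statement
    _dependsOn: { value: {} }, // external variables this statement depends on
    _included: { value: false, writable: true }, // whether this statement has been included in the bundle
  })
})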

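Collecting module-level variables and building _scope

We collect the global variables as well as the variables of each child Scope, and attach each Scope to its node as the _scope property. To keep the code simple, we only handle function and variable declaration nodes.

// Build the scope chain and record all variables and params
// (statement is the current top-level statement from ast.body.forEach)
walk(statement, {
  enter(node) {
    let newScope
    switch (node.type) {
      case FUNCTION_DECLARATION: {
        // Function parameters are also variables declared inside the function
        const params = node.params.map(x => x.name)
        // Add the function name (stored in node.id) to the current scope
        addToScope(node)
        // A function declaration creates a new scope
        newScope = new Scope({
          parent: currentScope,
          params,
        })
        break
      }
      // A variable declaration does not create a new scope
      case VARIABLE_DECLARATION:
        node.declarations.forEach(addToScope)
        break
    }
    // If a new scope was created, the nodes visited next are inside it,
    // so currentScope must point to the new scope from here on
    if (newScope) {
      // Mark nodes that create a new scope with _scope
      Object.defineProperty(node, '_scope', { value: newScope })
      currentScope = newScope
    }
  },
  leave(node) {
    // If a node created a scope, go back to the parent scope (backtrack)
    if (node._scope)
      currentScope = currentScope.parent
  },
})

function addToScope(identifierNode) {
  const name = identifierNode.id.name
  currentScope.add(name) // add the name, e.g. test, to the current scope

  // If we are in the global scope
  if (!currentScope.parent) {
    // Record that this statement declares the variable in the global scope
    statement._defines[name] = true
  }
}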

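Finding external dependencies: _dependsOn

We have now walked the module with DFS, found the variables it defines, and added each one to the corresponding scope. Besides the variables defined in the current module, a file also uses variables from other files, so next we need to handle those.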

ast._scope = currentScope

ast.body.forEach((statement) => {
  // Find external dependencies (_dependsOn)
  walk(statement, {
    enter(node) {
      // If this node created a new scope, switch to it
      if (node._scope)
        currentScope = node._scope

      // For an identifier, search upward from the current scope to see whether the scope chain defines it
      if (node.type === IDENTIFIER) {
        const definingScope = currentScope.findDefiningScope(node.name)
        // If it is not defined, the variable is an external dependency
        if (!definingScope)
          statement._dependsOn[node.name] = true
      }
    },
    leave(node) {
      // On exit, go back to the parent scope
      if (node._scope)
        currentScope = currentScope.parent
    },
  })
})

That completes our analysis function.
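The two passes can be tried end to end without a parser. The sketch below is a simplified variant of the analysis described above, under several assumptions: it uses a hand-written AST, a hypothetical walkTree helper in place of estree-walker, a Map in place of the _scope property, and plain arrays named defines and dependsOn in place of _defines and _dependsOn.

```typescript
// Minimal ESTree-like node shape (assumption for this sketch).
interface AstNode { type: string; [key: string]: unknown }

const isNode = (v: unknown): v is AstNode =>
  typeof v === 'object' && v !== null && typeof (v as { type?: unknown }).type === 'string'

// Tiny DFS walker with enter/leave hooks (hypothetical stand-in for estree-walker).
function walkTree(node: AstNode, enter: (n: AstNode) => void, leave: (n: AstNode) => void): void {
  enter(node)
  for (const value of Object.values(node)) {
    for (const child of Array.isArray(value) ? value : [value]) {
      if (isNode(child))
        walkTree(child, enter, leave)
    }
  }
  leave(node)
}

class Scope {
  parent: Scope | null
  params: string[]
  names: string[] = []
  constructor(parent: Scope | null = null, params: string[] = []) {
    this.parent = parent
    this.params = params
  }

  findDefiningScope(name: string): Scope | null {
    if (this.params.includes(name) || this.names.includes(name))
      return this
    return this.parent ? this.parent.findDefiningScope(name) : null
  }
}

function analyse(program: AstNode) {
  const scopes = new Map<AstNode, Scope>() // plays the role of node._scope
  const entries = (program.body as AstNode[]).map(statement => ({
    statement, defines: [] as string[], dependsOn: [] as string[],
  }))
  let current = new Scope()
  // Backtrack to the parent scope when leaving a node that created one.
  const leave = (node: AstNode): void => {
    if (scopes.has(node) && current.parent)
      current = current.parent
  }

  // Pass 1: record declarations and build child scopes.
  for (const entry of entries) {
    const declare = (node: AstNode): void => {
      const name = (node.id as AstNode).name as string
      current.names.push(name)
      if (!current.parent)
        entry.defines.push(name)
    }
    walkTree(entry.statement, (node) => {
      if (node.type === 'FunctionDeclaration') {
        declare(node)
        const params = (node.params as AstNode[]).map(p => p.name as string)
        current = new Scope(current, params)
        scopes.set(node, current)
      }
      else if (node.type === 'VariableDeclaration') {
        (node.declarations as AstNode[]).forEach(n => declare(n))
      }
    }, leave)
  }

  // Pass 2: an identifier that no scope in the chain defines is an external dependency.
  for (const entry of entries) {
    walkTree(entry.statement, (node) => {
      current = scopes.get(node) ?? current
      if (node.type !== 'Identifier')
        return
      const name = node.name as string
      if (!current.findDefiningScope(name) && !entry.dependsOn.includes(name))
        entry.dependsOn.push(name)
    }, leave)
  }
  return entries
}

// Hand-written AST for:
//   const a = b + 1
//   function foo(x) { return x + a }
const id = (name: string): AstNode => ({ type: 'Identifier', name })
const program: AstNode = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration',
      declarations: [{
        type: 'VariableDeclarator',
        id: id('a'),
        init: { type: 'BinaryExpression', operator: '+', left: id('b'), right: { type: 'Literal', value: 1 } },
      }],
    },
    {
      type: 'FunctionDeclaration',
      id: id('foo'),
      params: [id('x')],
      body: {
        type: 'BlockStatement',
        body: [{ type: 'ReturnStatement', argument: { type: 'BinaryExpression', operator: '+', left: id('x'), right: id('a') } }],
      },
    },
  ],
}

const result = analyse(program)
console.log(result.map(({ defines, dependsOn }) => ({ defines, dependsOn })))
```

The first statement defines a and depends on the external name b. The second defines foo and has no external dependencies, because x resolves to a parameter and a resolves to the module scope.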

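The Module class

For simplicity, we treat one module as one file.

Initialization

We pass in the module's code, the absolute path of the file, and the bundle it belongs to (covered below), and produce the ast together with information about imports, exports, and top-level variables.

import MagicString from 'magic-string'
import { parse } from 'acorn'

class Module {
  constructor({ code, path, bundle }: { code: string; path: string; bundle: Bundle }) {
    this.code = new MagicString(code) // code of the current module
    this.path = path // absolute path of the current module
    this.bundle = bundle // the bundle this module belongs to

    // Generate the ast
    this.ast = parse(code, { ecmaVersion: 7, sourceType: 'module' })

    this.imports = {} // all imports of the current module
    this.exports = {} // all exports of the current module
    this.definitions = {} // the statements that define each global variable

    // Analyse the ast
    this.analysis()
  }
}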

Extracting imports and exports from the AST, then analysing it

Next we analyse the ast, using the analysis function written above. All of this happens in the Module's analysis method.

analysis() {
  // Collect the module's imports and exports from the ast nodes
  this.ast.body?.forEach((node) => {
    // import a, { name as n, age } from './msg'
    if (node.type === IMPORT_DECLARATION) {
      // Source of the import: ./msg
      const source = node.source.value

      // All specifier nodes of the import: a, { name as n, age }
      const specifiers = node.specifiers

      specifiers.forEach((specifier) => {
        // { name as n }
        // Local variable: n
        const localName = specifier.local.name
        // Imported variable: name
        const name = specifier.imported?.name ?? localName

        // Record which variable of which module each local variable comes from
        // this.imports.n = { name: 'name', localName: 'n', source: './msg' }
        this.imports[localName] = { name, localName, source }
      })
    }
    // export const a = 1
    else if (node.type === EXPORT_NAMED_DECLARATION) {
      const { declaration } = node
      // declaration is null for forms such as export { a }, which this sketch does not handle
      if (declaration?.type === VARIABLE_DECLARATION) {
        const varname = declaration.declarations[0].id.name
        // Record the export information of the current module
        // this.exports.a = { node, localName: 'a', expression: node for `const a = 1` }
        this.exports[varname] = {
          node,
          localName: varname,
          expression: declaration, // the declaration that creates the export
        }
      }
    }
  })
  analysis(this.ast, this.code, this)
  this.ast.body.forEach((statement) => {
    Object.keys(statement._defines).forEach((name) => {
      // name: global variable name
      // statement: the statement node that defines the variable
      this.definitions[name] = statement
    })
  })
}
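To see what the imports record looks like for a mixed import, here is a small standalone sketch of the specifier loop. The interface names and the collectImports helper are hypothetical, and the declaration object is written by hand in the shape an ESTree ImportDeclaration would have.

```typescript
// Only the fields this sketch reads; real ESTree nodes carry more.
interface SpecifierLike {
  type: string
  local: { name: string }
  imported?: { name: string } // absent on a default import specifier
}

interface ImportDeclarationLike {
  type: 'ImportDeclaration'
  source: { value: string }
  specifiers: SpecifierLike[]
}

interface ImportRecord { name: string; localName: string; source: string }

// Same logic as the loop above: map each local name to where it comes from.
function collectImports(node: ImportDeclarationLike): Record<string, ImportRecord> {
  const imports: Record<string, ImportRecord> = {}
  for (const specifier of node.specifiers) {
    const localName = specifier.local.name
    // Falls back to the local name when there is no `imported` (as the article's code does).
    const name = specifier.imported?.name ?? localName
    imports[localName] = { name, localName, source: node.source.value }
  }
  return imports
}

// Hand-written node for: import a, { name as n, age } from './msg'
const declaration: ImportDeclarationLike = {
  type: 'ImportDeclaration',
  source: { value: './msg' },
  specifiers: [
    { type: 'ImportDefaultSpecifier', local: { name: 'a' } },
    { type: 'ImportSpecifier', local: { name: 'n' }, imported: { name: 'name' } },
    { type: 'ImportSpecifier', local: { name: 'age' }, imported: { name: 'age' } },
  ],
}

const imports = collectImports(declaration)
console.log(Object.keys(imports)) // the local names: a, n, age
console.log(imports['n']) // the record that maps local n back to the export called name
```

Note that with this fallback a default import is recorded under its local name rather than as default, so the toy bundler only resolves named imports correctly.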

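Expanding nodes to get top-level nodes and all the nodes they depend on

The purpose of this step is to replace import statements such as import { foo } from './bar' with code from the corresponding module. Because we want tree-shaking, we should not substitute the module's entire source. Instead we expand each node, collect only the nodes that are actually used, and then use the _source property recorded earlier in place of the import statement.

// Expand the top-level nodes
expandAllStatements() {
  const allStatements = [] as any[]
  this.ast.body.forEach((statement) => {
    // Skip import statements; they are replaced by the declarations they pull in
    if (statement.type === IMPORT_DECLARATION)
      return

    const statements = this.expandStatement(statement)

    allStatements.push(...statements)
  })
  return allStatements
}

// Expand one node (a node may depend on or declare several variables) and find
// the declaration statements of the variables it depends on.
// They may be declared in the current module or in an imported module.
expandStatement(statement) {
  const res = [] as any[]

  const depend = Object.keys(statement._dependsOn) // external dependencies
  depend.forEach((name) => {
    const definition = this.define(name)
    definition && res.push(definition)
  })

  // TODO: this is not complete tree-shaking yet. A statement in the entry file that
  // uses no external variables and is never called is still added to the bundle.
  if (!statement._included) {
    statement._included = true
    // The core of tree-shaking
    res.push(statement)
  }
  return res
}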

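define

The define method called in expandStatement above is shown below. It collects the external nodes that a node depends on into allStatements as well.

define(name) {
  // If name is among the imports, it was imported from another module
  if (Object.prototype.hasOwnProperty.call(this.imports, name)) {
    // e.g. this.imports.n = { name: 'name', localName: 'n', source: './msg' }
    const importDeclaration = this.imports[name]
    // Fetch the msg module
    const module = this.bundle.fetchModule(importDeclaration.source, this.path)

    // Look up the export called name in the msg module
    // e.g. module.exports.name = { node, localName: 'name', expression: node for `const name = 1` }
    const exportDeclaration = module?.exports[importDeclaration.name]
    // Nothing to include if the module or the export cannot be found
    if (!exportDeclaration)
      return null

    // Recurse: name in msg may itself be imported from somewhere else
    return module?.define(exportDeclaration.localName) ?? null
  }
  else {
    // Otherwise the variable is defined in this module
    const statement = this.definitions[name]
    if (statement && !statement._included) {
      statement._included = true
      return statement
    }

    return null
  }
}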
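The recursion in define is easier to follow on a toy model. The sketch below replaces Module instances and file loading with an in-memory table; the ToyModule shape, the module ids, and the defineIn helper are all assumptions made for illustration, not part of the article's implementation.

```typescript
interface Statement { source: string; included: boolean }

interface ToyModule {
  imports: Record<string, { name: string; source: string }>
  exports: Record<string, { localName: string }>
  definitions: Record<string, Statement>
}

// In-memory stand-in for the files that fetchModule would read.
const modules: Record<string, ToyModule> = {
  './msg': {
    imports: {},
    exports: { name: { localName: 'name' }, age: { localName: 'age' } },
    definitions: {
      name: { source: 'const name = "rollup"', included: false },
      age: { source: 'const age = 8', included: false },
    },
  },
  './index': {
    // import { name as n } from './msg'
    imports: { n: { name: 'name', source: './msg' } },
    exports: {},
    definitions: {},
  },
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key)

// Resolve `name` as seen from `moduleId` to the statement that defines it,
// marking that statement as included the first time it is requested.
function defineIn(moduleId: string, name: string): Statement | null {
  const mod = modules[moduleId]
  if (!mod)
    return null

  if (hasOwn(mod.imports, name)) {
    const imported = mod.imports[name]
    const target = imported ? modules[imported.source] : undefined
    const exported = imported && target && hasOwn(target.exports, imported.name)
      ? target.exports[imported.name]
      : undefined
    // Follow the import into the other module (it may re-import from elsewhere).
    return imported && exported ? defineIn(imported.source, exported.localName) : null
  }

  const statement = hasOwn(mod.definitions, name) ? mod.definitions[name] : undefined
  if (statement && !statement.included) {
    statement.included = true
    return statement
  }
  return null
}

const first = defineIn('./index', 'n')
const second = defineIn('./index', 'n')
console.log(first?.source) // const name = "rollup"
console.log(second) // null
console.log(modules['./msg']?.definitions['age']?.included) // false
```

The first call pulls in the one statement that defines name, the second call returns null because that statement is already included, and the unused age definition is never marked. That unmarked statement is what ends up shaken out of the bundle.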

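The Bundle class

Without code splitting, bundling from a single entry file yields one bundle, and we need a class to record information about it.

API entry point and build

We expose a function called rollup as the external API; it starts bundling from the entry. It does two things: it creates a Bundle instance and calls its build method.

function rollup(entry: string, outputFileName: string) {
  // The bundle represents the whole build, including information about all modules
  const bundle = new Bundle({ entry })
  bundle.build(outputFileName)
}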

rollup(entry, 'bundle.js')

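The build method

In the build method, we call this.fetchModule to get the Module instance for the entry, then call its expandAllStatements method to get every node that is used.

import fs from 'node:fs'
import path from 'node:path'

class Bundle {
  constructor(options: { entry: string }) {
    const { entry } = options

    // Absolute path of the entry file, including the extension
    // TODO: the tests currently use the .js extension, so paths are resolved as .js for now
    this.entryPath = entry

    // All modules: the entry file and the modules it depends on
    this.modules = {}
  }

  build(outputFileName: string) {
    // Starting from the absolute path of the entry file, find the entry module and create its Module object
    const entryModule = this.fetchModule(this.entryPath)
    // Expand all statements of the entry module into an array of statements
    this.nodes = entryModule?.expandAllStatements() ?? []

    const { code } = this.generate()

    fs.writeFileSync(outputFileName, code, 'utf-8')
  }
}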

The fetchModule method

The fetchModule method returns the Module instance that corresponds to a file.

// importee: the module being loaded; importer: the module that imports it
  fetchModule(importee: string, importer?: string) {
    const route
      = !importer // if no module imports this one, it is the entry module
        ? importee
        : path.isAbsolute(importee) // if it is an absolute path
          ? importee
          // e.g. import a from './msg': resolve the importee path relative to the importer
          : path.resolve(path.dirname(importer), `${importee.replace(/\.js$/, '')}.js`)

    // If the file exists
    if (fs.existsSync(route)) {
      // Read the source code from the absolute path
      const code = fs.readFileSync(route, 'utf-8')
      const module = new Module({
        code,
        path: route,
        // The bundle object this module belongs to
        bundle: this,
      })
      return module
    }
  }

The generate method

Using the Bundle feature of magic-string, we add a clone of each node's source, node._source.clone(), and strip the export keyword from export statements.

// Generate code from this.nodes
generate() {
  const magicString = new MagicString.Bundle()
  this.nodes.forEach((node) => {
    const source = node._source.clone()
    // Remove the export keyword
    if (node.type === EXPORT_NAMED_DECLARATION)
      source.remove(node.start, node.declaration.start)

    magicString.addSource({
      content: source,
    })
  })
  return { code: magicString.toString() }
}
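The effect of generate can be imitated with plain string slicing. The sketch below is a deliberately simplified stand-in, not how magic-string works internally: it ignores source maps, and the TopLevelNode shape and render helper are hypothetical. It only shows how the start and end offsets on the nodes are used to drop the export keyword and join the kept statements.

```typescript
// Only the offsets this sketch needs from a top-level AST node.
interface TopLevelNode {
  type: string
  start: number
  end: number
  declaration?: { start: number }
}

// Join the source of each kept node, skipping the leading `export ` where present.
function render(code: string, nodes: TopLevelNode[]): string {
  return nodes
    .map((node) => {
      const from = node.type === 'ExportNamedDeclaration' && node.declaration
        ? node.declaration.start // start after the export keyword
        : node.start
      return code.slice(from, node.end)
    })
    .join('\n')
}

const code = 'export const a = 1\nconsole.log(a)'

// Offsets computed from the string itself rather than typed in by hand.
const nodes: TopLevelNode[] = [
  {
    type: 'ExportNamedDeclaration',
    start: 0,
    end: code.indexOf('\n'),
    declaration: { start: code.indexOf('const a') },
  },
  {
    type: 'ExpressionStatement',
    start: code.indexOf('console'),
    end: code.length,
  },
]

const output = render(code, nodes)
console.log(output)
```

The output keeps const a = 1 and console.log(a) on separate lines with the export keyword removed, which matches what the remove call does on the cloned source above.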

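Why can ESM be tree-shaken?

First, in Node.js, top-level code always runs once the module is imported. In the example below, as soon as another file imports index.ts, console.log("hello") and const num = 1 + 2 are guaranteed to run, even if the importer does nothing with the import.

// index.ts
console.log("hello")

const num = 1 + 2

export { num }

In ESM, import/export statements must appear at the top level of a module, so by analysing the top-level statements of the AST we can easily tell which modules a module uses and which of their exported variables it uses.

That is why ESM is said to have a static structure, which lets dependency analysis tools build the dependency relationships between modules precisely.

CJS, by contrast, can call require and assign to module.exports outside the top-level scope. Whether such an export is ever reached cannot be determined statically, so we cannot know which exports will be used, and tree-shaking becomes very hard.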
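The difference can be illustrated with a small simulation. The function below imitates a CJS module body whose exports depend on a run-time value; loadCjsLikeModule and its env parameter are hypothetical names, and the code stands in for a real module that would assign to module.exports inside an if block.

```typescript
type ModuleExports = Record<string, unknown>

// Imitates a CJS module body: the exports object is filled in while the code runs.
function loadCjsLikeModule(env: string): ModuleExports {
  const moduleExports: ModuleExports = {}

  // In real CJS this would be: if (process.env.NODE_ENV === 'development') exports.debug = ...
  if (env === 'development')
    moduleExports['debug'] = () => 'debug info'

  moduleExports['run'] = () => 'running'
  return moduleExports
}

// The set of exports is only known after executing the module.
const devKeys = Object.keys(loadCjsLikeModule('development'))
const prodKeys = Object.keys(loadCjsLikeModule('production'))
console.log(devKeys) // [ 'debug', 'run' ]
console.log(prodKeys) // [ 'run' ]

// The ESM equivalent cannot be written this way: `export` is only valid at the
// top level, so a bundler can list a module's exports by reading its top-level
// statements without running any code.
```

Because the export list here changes with the input, a tool that only reads the source cannot say whether debug exists, which is the situation a bundler faces with CJS.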

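Side effects

What is a side effect

As said above, top-level code always runs once the module is imported; we can think of this as evaluating an expression. During that evaluation it is hard to tell whether the expression produces a "side effect".

A side effect means that an expression or function, while executing, changes state outside itself, including but not limited to:

Modifying a global variable
Modifying a function argument
Performing I/O

A module may run code that affects the environment when it is imported. In that case, even if another module does not use a given variable, we cannot simply remove its code from the bundle, as in the following example: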

// test.js
const foo = (() => {
  window.greet = 'hello'
  return 1
})();

const bar = 234

export { foo, bar }

// index.js
import { bar, foo } from './test.js'
console.log(bar)

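We use index.js as the bundle entry. Although index.js never uses the foo variable, the declaration of foo clearly cannot be removed outright, because evaluating the expression that produces foo "may" affect the environment, for example by attaching a value to window.

// Wrong tree-shaking result: the declaration of foo is removed entirely. This differs from the original logic and is a latent bug
const bar = 234
console.log(bar)

// -----
// Logically correct tree-shaking result
(() => {
  window.greet = 'hello'
  return 1
})();

const bar = 234
console.log(bar)

Static analysis cannot always tell whether a side effect occurs, but the developer does know whether an expression has side effects. Many bundlers therefore treat the /*#__PURE__*/ comment as an annotation that lets developers mark expressions as side-effect free, so that the tool can tree-shake them when their result is unused.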

const foo = /*#__PURE__*/ (() => {
  return 1 + 2
})();

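Pure functions

A pure function always returns the same output for the same input and produces no side effects.

JS style guidance recommends writing pure functions where possible. One reason is that functions with side effects cause problems:

Hard to test: the function's input and output are affected by external state, not only by its parameters
Hard to understand: you have to keep track of factors beyond the function's parameters
Unfriendly to tree-shaking
Concurrency problems

For example, the Bad case below modifies the external state a, which is strongly discouraged.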

// Bad case
let a = { key: 1 }
function merge1(b) {
  a = {
    ...a,
    ...b
  }
}
merge1({ key: 2 })

// Good case
let a = { key: 1 }
function merge2(a, b) {
  const mergedConfig = {
    ...a,
    ...b
  }
  return mergedConfig
}
const newConfig = merge2(a, { key: 2 })
