Character

Packages for working with characters and strings

chardet ---> character encoding detection

The module uses occurrence analysis to determine the most likely encoding.

import chardet from 'chardet';

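// Detect from an in-memory buffer
const fromBuffer = chardet.detect(Buffer.from('hello there!'));
// or detect from a file asynchronously
const fromFile = await chardet.detectFile('/path/to/file');
// or detect from a file synchronously
const fromFileSync = chardet.detectFileSync('/path/to/file');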

Possible return values

UTF-8, UTF-16 LE, UTF-16 BE, UTF-32 LE, UTF-32 BE, ISO-2022-JP, ISO-2022-KR, ISO-2022-CN, Shift_JIS, Big5, EUC-JP, EUC-KR, GB18030, ISO-8859-1, ISO-8859-2, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, windows-1250, windows-1251, windows-1252, windows-1253, windows-1254, windows-1255, windows-1256, KOI8-R
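To give a feel for what encoding detection has to decide, here is a minimal sketch of a much simpler first-pass heuristic: byte-order-mark sniffing followed by a strict UTF-8 validity check. This is not chardet's occurrence analysis and not its implementation; `sniffEncoding` is a hypothetical helper name, and the returned labels merely mirror the list above.

```javascript
// Hypothetical helper (not part of chardet): guess an encoding from a
// byte-order mark, falling back to a strict UTF-8 validity check.
function sniffEncoding(bytes) {
  const startsWith = (...signature) => signature.every((b, i) => bytes[i] === b);

  // UTF-32 must be checked before UTF-16: the UTF-32 LE mark (FF FE 00 00)
  // begins with the UTF-16 LE mark (FF FE).
  if (startsWith(0xff, 0xfe, 0x00, 0x00)) return 'UTF-32 LE';
  if (startsWith(0x00, 0x00, 0xfe, 0xff)) return 'UTF-32 BE';
  if (startsWith(0xef, 0xbb, 0xbf)) return 'UTF-8';
  if (startsWith(0xff, 0xfe)) return 'UTF-16 LE';
  if (startsWith(0xfe, 0xff)) return 'UTF-16 BE';

  try {
    // With fatal: true the decoder throws on any invalid UTF-8 sequence.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return 'UTF-8';
  } catch {
    return null; // undecided: a statistical detector would take over here
  }
}
```

Legacy single-byte and multi-byte encodings carry no such markers, which is why a library like chardet has to fall back on statistics.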

leven ---> compute the Levenshtein (minimum edit) distance between two strings

import leven from 'leven';

leven('cat', 'cow');
//=> 2
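For reference, the distance itself can be computed with the classic dynamic-programming recurrence. The sketch below is a plain textbook version, not leven's optimized implementation; `levenshtein` is a hypothetical name, and it compares UTF-16 code units rather than full Unicode characters.

```javascript
// Textbook Levenshtein distance using two rolling rows.
function levenshtein(a, b) {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i chars into the empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (free when the chars match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

levenshtein('cat', 'cow'); //=> 2
```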

mrmime ---> get the MIME type from a file name or extension

import { lookup, mimes } from 'mrmime';

// Get a MIME type
// ---
lookup('txt'); //=> "text/plain"
lookup('.txt'); //=> "text/plain"
lookup('a.txt'); //=> "text/plain"

// Unknown extension
// ---
lookup('.xyz'); //=> undefined

// Add extension to dictionary
// ---
mimes['xyz'] = 'hello/world';
lookup('xyz'); //=> "hello/world"
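The three accepted input shapes ('txt', '.txt', 'a.txt') all reduce to "take whatever follows the last dot". A minimal sketch of that idea follows; `mimeTable` and `lookupMime` are hypothetical names, the table holds only a few illustrative entries, and none of this is claimed to be mrmime's actual source.

```javascript
// Hypothetical miniature dictionary; a real one has hundreds of entries.
const mimeTable = new Map([
  ['txt', 'text/plain'],
  ['html', 'text/html'],
  ['json', 'application/json'],
]);

// Accepts a bare extension, a dotted extension, or a full file name.
function lookupMime(nameOrExtension) {
  const input = String(nameOrExtension).trim().toLowerCase();
  const dot = input.lastIndexOf('.');
  // No dot: the whole input is the extension; otherwise use the part after the last dot.
  const extension = dot === -1 ? input : input.slice(dot + 1);
  return mimeTable.get(extension);
}

lookupMime('a.txt'); //=> "text/plain"
lookupMime('.xyz');  //=> undefined
```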

A similar library: broofa/mime

transliteration ---> transliterate characters

Well suited to converting Chinese file names into Latin characters while keeping them recognizable.

import { transliterate as tr, slugify } from 'transliteration';

tr('你好, world!');
// Ni Hao , world!
slugify('你好, world!');
// ni-hao-world

A Python equivalent: barseghyanartur/transliterate

strip-ansi ---> strip ANSI escape codes from a string

import stripAnsi from 'strip-ansi';

stripAnsi('\u001B[4mUnicorn\u001B[0m');
//=> 'Unicorn'

stripAnsi('\u001B]8;;https://github.com\u0007Click\u001B]8;;\u0007');
//=> 'Click'
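The two examples above cover two families of escape sequences: CSI sequences (styling such as underline) and OSC sequences (terminal hyperlinks). A deliberately simplified sketch that handles just those two shapes is shown below. `stripBasicAnsi` is a hypothetical name, and the patterns are far narrower than the regular expression strip-ansi relies on, so treat it as an illustration only.

```javascript
// CSI: ESC [ followed by numeric parameters and a final letter (e.g. ESC[4m, ESC[0m).
const CSI_SEQUENCE = /\u001B\[[0-9;?]*[A-Za-z]/g;
// OSC: ESC ] followed by a payload and terminated by BEL (e.g. hyperlink markers).
const OSC_SEQUENCE = /\u001B\][^\u0007]*\u0007/g;

// Simplified stripper: covers only the two sequence shapes described above.
function stripBasicAnsi(text) {
  return text.replace(OSC_SEQUENCE, '').replace(CSI_SEQUENCE, '');
}

stripBasicAnsi('\u001B[4mUnicorn\u001B[0m'); //=> 'Unicorn'
```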

ohash ---> lightweight content hashing

import { hash, objectHash, murmurHash } from 'ohash'

console.log(objectHash({ foo: 'bar'}))  // serialize an object into a stable, safe string
// "object:1:string:3:foo:string:3:bar,"

console.log(murmurHash('Hello World')) // hash a string to a 32-bit positive integer
// "2708020327"

console.log(hash({ foo: 'bar'})) // objectHash first, then murmurHash
// "2736179692"
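The "serialize first, hash second" pipeline can be sketched without the library. In the sketch below all names (`stableSerialize`, `fnv1a32`, `contentHash`) are hypothetical, the serialization format differs from ohash's, and FNV-1a is swapped in for MurmurHash purely to keep the code short. The point is only that sorting keys before hashing makes the result independent of property order.

```javascript
// Serialize with object keys sorted, so logically equal objects give equal strings.
function stableSerialize(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableSerialize).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const body = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableSerialize(value[key])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units (a stand-in for MurmurHash, not the same function).
function fnv1a32(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // 32-bit multiply by the FNV prime
  }
  return h >>> 0; // reinterpret as an unsigned 32-bit integer
}

// Serialize, then hash, then stringify.
function contentHash(value) {
  return String(fnv1a32(stableSerialize(value)));
}

contentHash({ a: 1, b: 2 }) === contentHash({ b: 2, a: 1 }); //=> true
```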

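fuse.js ---> fuzzy search library

Fuse.js uses the Bitap algorithm to find the best match. Bitap is a string-search algorithm that represents the pattern as bitmasks and updates the match state with bitwise shifts and ANDs, which also makes it straightforward to tolerate a bounded number of errors.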

import Fuse from 'fuse.js'

const list = [
  { "title": "Old Man's War","author": "John Scalzi","tags": ["fiction"] },
  { "title": "The Lock Artist","author": "Steve","tags": ["thriller"] }
]
const options = {
  includeScore: true,
  // Search in `author` and in `tags` array
  keys: ['author', 'tags']
}
const fuse = new Fuse(list, options)
const result = fuse.search('tion')
/**
[
  {
    "item": {
      "title": "Old Man's War",
      "author": "John Scalzi",
      "tags": ["fiction"]
    },
    "refIndex": 0,
    "score": 0.03
  }
]
 */
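The bitmask idea behind Bitap can be sketched in its simplest, exact-match form (often called shift-and). This is not Fuse.js's implementation, which additionally tolerates errors and produces a score. `bitapSearch` is a hypothetical name, and the 31-character pattern limit is an assumption made here so the state fits safely in a 32-bit integer.

```javascript
// Exact-match Bitap (shift-and): returns the index of the first match, or -1.
function bitapSearch(text, pattern) {
  const m = pattern.length;
  if (m === 0) return 0;
  if (m > 31) throw new RangeError('pattern too long for a 32-bit state');

  // For each character, bit i is set when pattern[i] equals that character.
  const masks = new Map();
  for (let i = 0; i < m; i++) {
    masks.set(pattern[i], (masks.get(pattern[i]) ?? 0) | (1 << i));
  }

  const accept = 1 << (m - 1); // set when the whole pattern has matched
  let state = 0;               // bit i set: pattern[0..i] matches the text ending here

  for (let j = 0; j < text.length; j++) {
    // Extend every partial match by one character and start a new one at bit 0,
    // then keep only those consistent with the current text character.
    state = ((state << 1) | 1) & (masks.get(text[j]) ?? 0);
    if (state & accept) return j - m + 1;
  }
  return -1;
}

bitapSearch('fiction', 'tion'); //=> 3
```

The fuzzy variant keeps one such state per allowed error count, which is what lets a query like 'tion' still match entries that are only approximately equal.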
