Skip to main content

TinyURL

Definition

URL shortening is a service that produces short aliases for long URLs, commonly referred to as short links. Upon clicking, these short links direct to the original URLs. The following illustration depicts how the process works:

Requirements

Functional requirements

  • Short URL generation: Our service should be able to generate a unique shorter alias of the given URL.
  • Redirection: Given a short link, our system should be able to redirect to the user to the original URL.
  • Custom short links: Users should be able to generate custom short links for their URLs using our system.
  • Deletion: User should be able to delete a short link generated by our system, given the rights.
  • Update: User should be able to update the long URL associated with the short link, given the proper rights.
  • Expiry time: There must be a default expiration time for the short links, but users should be able to set the expiration time based on their requirements.

Non-functional requirements

  • Scalability: Our system should be horizontally scalable with increasing demand.
  • Availability: Our system should be highly available, because even a fraction of the second downtime would result in URL redirection failures. Since our system's domain is in URLs, we don't have he leverage of downtime, and our design must be fault-tolerance conditions instilled in it.
  • Latency: The System should perform at low latency to provide the user with a smooth experience.
  • Readability: The short links generated by our system should be easily readable, distinguishable, and typeable.
  • Unpredictability: From a security standpoint, the short links generated by our system should be highly unpredictable. This ensures that the next-in-line short URL is not serially produced, eliminating the possibility of someone guessing all the short URLs that our system has ever produced or will produce.

Resource estimation

Assumptions

  • There are 100 million Daily Active Users (DAU).
  • There are 200 million new URL shortening requests per month.
  • We assume that the shortening:redirection request ratio is 1:100.
  • A URL shortening entry requires 500 Bytes of database storage.
  • Each entry will have a maximum of five years of expiry time, unless explicitly deleted.

Storage estimation

Since entries are saved for a time period of 5 years and there are a total of 200 million entries per month, the total entries will be approximately 12 billion.

200 Million/Month * 12 months/year * 5 years = 12 Billion * 500 Bytes = 6TB

Query rate estimation

Based on the estimations above, we can expect 20 billion redirection requests per month

200 Million * 100 = 20 Billion

We can extend our calculations for Queries Per Second (QPS) for our system from this baseline. The number of seconds in one month, given the average number of days per month is 30.42:

30.42 days × 24 hours × 60 minutes × 60 seconds = 2628288 seconds

Considering the calculation above, new URL shortening requests per second will be:

20 Billion / 2628288 seconds = 76 URLs/s

With a 1:100 shortening to redirecting ratio, the URL redirection rate per second will be:

100 × 76 URLs/s = 7.6 K URLs/s

Bandwidth estimation

Shortening requests: The expected arrival rate will be 76 new URLs per second. The total incoming data would be 304 Kbps per second:

76 × 500 Bytes × 8 bits = 304 Kbps

Redirection requests: Since the expected rate would be 7.6K URLs redirections per second, the total outgoing data would be 30.4Mbps per second:

7.6 K × 500 Bytes × 8 bits = 30.4 Mbps

Memory estimation (per day)

We need memory estimates in case we want to cache some of the frequently accessed URL redirection requests. Let’s assume a split of 80-20 in the incoming requests. 20 percent of redirection requests generate 80 percent of the traffic.

Since the redirection requests per second are 7.6 K, the total would be 0.66 billion for one day.

7.6 K × 3600 seconds × 24 hours = 0.66 billion

Since we would only consider caching 20 percent of these per-day redirection requests, the total memory requirements estimate would be 66 GB.

0.2 × 0.66 Billion × 500 Bytes = 66 GB

Number of servers estimation

We adopt the same approximation discussed in the Back-of-the-Envelope Calculations to calculate the number of servers needed: the number of daily active users and the requests handling limit of a server are the two main factors in depicting the total number of servers required. Recall that a typical server can serve 64,000 requests per second (RPS). Considering our assumption of using daily active users as a proxy for the number of requests per second for peak load times, we get 100 million requests per second. Then, we use the following formula to calculate the number of servers:

Building blocks we will use

With the estimations done, we can identify the key building blocks in our design. Such a list is given below:

  • Database(s) will be needed to store the mapping of long URLs and the corresponding short URLs.
  • Sequencer will provide unique IDs that will serve as a starting point for each short URL generation.
  • Load balancers at various layers will ensure smooth requests distribution among available servers.
  • Caches will be utilized to store the most frequent short URLs related requests.
  • Rate limiters will be used to avoid system exploitation.

Besides these building blocks, we'll also need the following additional components to achieve the desired service:

  • Servers to handle and navigate the service requests along with running the application logic.
  • A Base-58 encoder to transform the sequencer’s numeric output to a more readable and usable alphanumeric form.

Design and Deployment

System APIs

  • Shortening a URL
  • Redirecting a short URL
  • Deleting a short URL

Shortening a URL

shortURL(api_dev_key, original_url, custom_alias=None, expiry_date=None)

Redirecting a short URL

redirectURL(api_dev_key, url_key)

Deleting a short URL

deleteURL(api_dev_key, url_key)

Design

Components

  • Database: NoSQL, MongoDB
  • Short URL generator:
    • A sequencer to generate unique IDs
    • A Base-58 encoder to enhance the readability of the short URL
  • Other building blocks:
    • Load balancing
    • Cache
    • Rate limiter

Design diagram

A simple design diagram of the URL shortening system is given below.

Workflow

1. Shortening

Each new request for shor link computation gets forwarded to the short URL generator (SUG) by the application server. Upon successful generation of the short link, the system sends one copy back to the user and stores the record in the database for future use.

2. Redirection

Application servers, upon receiving the redirection requests, check the storage units (caching system and database) fro the required record. If found, the application server redirects the user to the associated long URL.

3. Deletion

A logged-in user can delete a record by requesting the application server which forwards the user details and the associated URL's information to the database server for deletion. A system-initiated deletion can also be triggered upon an expiry time, as we'll see ahead.

This task begins with checking the eligibility of the requested short URL. The maximum lenght allowed is 11 alphanumeric digits. We can find the details on the allowed format and the specific digits in the next lesson. Once verified, the system checks its availability in the database. If the requested URL is available, the user receives a successful short URL generation message, or an error message in the opposite case.

Encoder

Base-58

Convert base-10 to base-58

这个计算过程是将一个十进制(Base-10)数转换成五十八进制(Base-58)数的过程。在每一步中,都涉及到除以 58 然后取余数的操作。这个过程从原始的 Base-10 数字开始,直到这个数字被减小到小于 58 为止。每一步的被除数是通过将前一步的被除数除以 58 并向下取整得到的。

让我们一步步解释你的示例:

原始数字: 2468135791013

首先,2468135791013 % 58 = 17,得到余数 17。 然后,我们将 2468135791013 除以 58 并向下取整,得到 42554065362。 下一步: 42554065362

现在,42554065362 % 58 = 6,得到余数 6。 接着,42554065362 / 58 = 733690782(向下取整)。 继续这个过程,直到得到的商小于 58 为止。每一步都计算余数,并更新被除数为原来的商。

Convert base-58 to base-10

这个计算过程是将一个五十八进制(Base-58)数转换成十进制(Base-10)数的过程。在 Base-58 到 Base-10 的转换中,每个 Base-58 数字位的值乘以 58 的幂,然后将这些乘积相加。这里的关键是确定每个字符在 Base-58 中的数值,并计算其相应的 58 的幂次。

让我们一步步解释你的示例:

1. Base-58 数字: 27qMi57J

在 Base-58 编码中,通常使用一组特定的 58 个字符(如字母和数字,但去掉了一些容易混淆的字符)。每个字符都对应一个从 0 到 57 的数值。例如,在你的例子中,字符 2 可能代表数值 1,7 代表 6,依此类推。

2. 计算过程
  • 第一位: 2
    • 在 Base-58 中,2 的数值可能是 1。
    • 1 × 58^7 = 2207984167552
  • 第二位: 7
    • 7 的数值是 6。
    • 6 × 58^6 = 228412155264
  • 依此类推,每个字符乘以其对应的 58 的幂次。
3. 完整的转换
  • q: 可能对应数值 48,48 × 58^5 = 31505124864
  • M: 对应数值 20,20 × 58^4 = 226329920
  • i: 对应数值 41,41 × 58^3 = 7999592
  • 5: 对应数值 4,4 × 58^2 = 13456
  • 7: 对应数值 6,6 × 58^1 = 348
  • J: 对应数值 17,17 × 58^0 = 17
4. 计算结果

最后,将所有这些乘积加起来,得到最终的十进制数:

2207984167552 + 228412155264 + ... + 17 = (最终结果)

The scope of the short URL generator

The short URL generator is the backbone of our URL shortening service. The output of this short URL generator depends on the design-imposed limitations, as given below:

  • The generated short URL should contain alphanumeric characters.
  • None of the characters should look alike.
  • The minimum default length of the generated short URL should be six characters.

These limitations define the scope of our short URL generator. We can define the scope, as shown below:

  • Starting range: 必须有 6 个字符串,所以 ID 的大小必须大于 58^5=656*10^6 结果,所以 ID 可以考虑从 1Billion (10 digits) 开始。
  • Ending point: 我们需要先计算出在 base-58 编码中 使用 2 进制表示的时候最高可以是多少位,我们需要公式去计算。

步骤解释

  1. 表示一个数字需要的比特数, 这是通过计算 log2(n) 得出的,其中 n 是基数。log2(n) 给出了在二进制系统中表示基数 n 需要的最小比特数。
    • 例如,对于十进制(Base-10),n 是 10。因此,需要的比特数是 log2(10),约等于 3.32。这意味着大约需要 3.32 位来表示一个十进制数字。
    • 对于五十八进制(Base-58),n 是 58。所以,需要的比特数是 log2(58),约等于 5.85。
  2. 在 64 位中可以表示的最大数字位数, 这是通过将 64 位总数除以表示一个数字所需的比特数来计算的。
    • 对于 Base-10,计算方法是 64 / log2(10),得到大约 19.31,向下取整得到 19 -> 在 Base-10(十进制)中,64 位可以表示大约 19 位数。
    • 对于 Base-58,计算方法是 64 / log2(58),得到大约 10.95,向下取整得到 10 -> 在 Base-58 中,64 位可以表示大约 11 位数。

The sequencer's lifetime

The number of years that our sequencer can provide us with unique IDs depends on two factors:

  • Total numbers available in the sequencer = 2^64 - 10^9 (starting from 1 Billion as discussed above)
  • Number of requests per year = 200 Million per month × 12 = 2.4 Billion (as assumed in Requirements)

So, taking the above two factors into consideration, we can calculate the expected life of our sequencer.

Evaluation

RequirementsTechniques
Availability
  • The Amazon S3 service backs up the storage and cache servers daily. We can restore them upon fault occurrence.
  • Global server load balancing to handle the system's traffic.
  • Rate limiters to limit each user's resource allocation.
Scalability
  • Horizontal sharding of the database.
  • Distribution of the data based on consistent hashing.
  • MongoDB - as the NoSQL database.
Readability
  • Introduction of the base-58 encoder to generate short URLs.
  • Removal of non-alphanumeric characters.
  • Removal of look-alike characters.
Latency
  • Unnoticeable delay in the overall operation.
  • MongoDB, because of its low latency and high throughput in reading tasks.
  • Distributed cache to minimize the service delays.
Unpredictability
  • Randomly selecting and associating an ID to each request, from the pool of unused and readily available unique IDs.