534 Design TinyURL

How would you design a URL shortening service that is similar toTinyURL?

Background: TinyURL is a URL shortening service where you enter a URL such ashttps://leetcode.com/problems/design-tinyurland it returns a short URL such ashttp://tinyurl.com/4e9iAk.

Requirements:

  1. For instance, "http://tinyurl.com/4e9iAk" is the tiny url for the page "https://leetcode.com/problems/design-tinyurl". The identifier (the highlighted part) can be any string with 6 alphanumeric characters containing 0-9, a-z, A-Z.

  2. Each shortened URL must be unique; that is, no two different URLs can be shortened to the same URL.

Note about Questions: Below are just a small subset of questions to get you started. In real world, there could be many follow ups and questions possible and the discussion is open-ended (No one true or correct way to solve a problem). If you have more ideas or questions, please ask in Discuss and we may compile it here!

Questions:

  1. How many unique identifiers possible? Will you run out of unique URLs?

  2. Should the identifier be increment or not? Which is easier to design? Pros and cons

  3. Mapping an identifier to an URL and its reversal - Does this problem ring a bell to you?

  4. How do you store the URLs? Does a simple flat file database work?

  5. What is the bottleneck of the system? Is it read-heavy or write-heavy?

  6. Estimate the maximum number of URLs a single machine can store.

  7. Estimate the maximum number of queries per second (QPS) for decoding a shortened URL in a single machine.

  8. How would you scale the service? For example, a viral link which is shared in social media could result in a peak QPS at a moment's notice.

  9. How could you handle redundancy? i,e, if a server is down, how could you ensure the service is still operational?

  10. Keep URLs forever or prune, pros/cons? How we do pruning? (Contributed by @alex_svetkin)

  11. What API would you provide to a third-party developer? (Contributed by @alex_svetkin)

  12. If you can enable caching, what would you cache and what's the expiry time? (Contributed by @Humandroid)

The Idea: If base 10 was our alphabet, then simply generating keys that increment from 0 can provide each url a unique key. Using this same idea, we begin counting from 0 but in base 62. This new alphabet set include lower and uppercase characters, and base 10 digits, which total to 62 characters.

Complexity: O(1) average and O(N) worst time. O(2N) space for bidirectional map. Where N is the number of tiny urls generated.

inline void pause() { cin.ignore(numeric_limits<streamsize>::max(), '\n'); }

class URLService {
public:

    string get_tiny(const string &url) {
        if (large_to_short_db.find(url) != large_to_short_db.end())
            return large_to_short_db[url];
        else {
            string shortened = base_10_to_base_62(CURRENT_INDEX++);
            large_to_short_db[url] = shortened;
            short_to_large_db[shortened] = url;
            return home_url + shortened;
        }
    }

    auto get_large(const string &url) {
        string without_tiny = url.substr(home_url.length(), url.length() - home_url.length());
        return short_to_large_db[without_tiny];
    }

#ifdef _DEBUG
    void generate_keys(const int go_to_COUNTER)
    {
        int max_base_10_pow = to_string(go_to_COUNTER).length();
        for (int i = 0; i <= go_to_COUNTER; i++) {
            cout << setw(max_base_10_pow) << i << ": " << home_url + base_10_to_base_62(CURRENT_INDEX++) << endl;
        }
    }
#endif

private:
    static long long CURRENT_INDEX;
    const string elements = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const string home_url = "http://tiny.url/";
    const int url_base = elements.length();
    unordered_map<string, string> large_to_short_db, short_to_large_db;

    string base_10_to_base_62(int base10_num)
    {
        if (base10_num == 0) return "0";
        string shortened = "";
        while (base10_num != 0) {
            string nth_base = string(1, elements[base10_num % url_base]);
            shortened.insert(0, nth_base);
            base10_num /= url_base;
        }
        return shortened;
    }
};

long long URLService::CURRENT_INDEX = 0; // change for new seed


int main()
{
    URLService service1;
    service1.generate_keys(1000);
    pause();

    URLService service2;
    service2.generate_keys(1000);
    pause();

    URLService service3;
    string tiny1 = service3.get_tiny("youtube.com");  cout << tiny1 << endl;
    string tiny2 = service3.get_tiny("leetcode.com"); cout << tiny2 << endl; 
    string tiny3 = service3.get_tiny("twitch.tv");    cout << tiny3 << endl;

    cout << service3.get_large(tiny1) << endl;
    cout << service3.get_large(tiny2) << endl;
    cout << service3.get_large(tiny3) << endl;
}

Last updated