Javascript: Splitting a URL into parts using Regex

Sep 6, 2014 • javaScript

How to split a URL into different parts in JS by using regex

JavaScript (js) is a lightweight, interpreted, object-oriented language with first-class functions, most known as the scripting language for Web pages. Most of the web application developers, designers etc.. use js on a daily basis. Recently I got selected for Google Summer of code 2014, and a part of my project was related to JavaScript where I had to make use of JavaScript for splitting a URL into corresponding parts. When I googled this, I came across many tutorials but later on I set down to write a function making use of the powerful JavaScript Regex (thanks to my friend, Arjun, who helped me in writing this). Here is the code that I wrote:

 
    function subDomain(url) {
        url = url.replace(new RegExp(/^s+/), "");
        url = url.replace(new RegExp(/s+$/), "");
        url = url.replace(new RegExp(/\/g), "/");
        url = url.replace(new RegExp(/^http://|^https://|^ftp:///i), "");
        url = url.replace(new RegExp(/^www./i), "");
        url = url.replace(new RegExp(//(.*)/), "");
        if (url.match(new RegExp(/.[a-z]{2,3}.[a-z]{2}$/i))) {
            url = url.replace(new RegExp(/.[a-z]{2,3}.[a-z]{2}$/i), "");
        } else if (url.match(new RegExp(/.[a-z]{2,4}$/i))) {
            url = url.replace(new RegExp(/.[a-z]{2,4}$/i), "");
        }
        var subDomain = (url.match(new RegExp(/./g))) ? true : false;
        return (subDomain);
    }
    function parseUrl(url) {
        var parser = document.createElement('a');
        parser.href = url;
        var datas = [parser.hostname, parser.port, parser.pathname];
        return (datas);
    }
    function printer(test_var) {
        var k = "";
        k = test_var.toString();
        var k1 = addhttp(k);
        q = k1[0];
        scheme = k1[2];
        k = k1[1];
        k = parseUrl(k);
        return k;
    }
    function addhttp(url) {
        var k3 = ""
        if (url.match(/http:///)) {
            url = url.substring(7);
            k3 = "http://"
        }
        if (url.match(/https:///)) {
            url = url.substring(8);
            k3 = "https://"
        }
        var k = "";
        k = url;
        if (url.match(/^www./)) {
            url = url.substring(4);
        }
        if (!/^(f|ht)tps?:///i.test(url)) {
            url = "http://" + url;
        }
        var k2 = [k, url, k3]
        return k2;
    }
    //var q = 'https://demo.testfire.net:1234/hey'
    //var output = document.getElementById('output');
    var temp = "";
    var hosts = "";
    var port = "";
    var scheme = "";
    var path = "";
    var sub = "";
    var datas = printer(q);
    sub = datas[0];
    port = datas[1];
    path = datas[2];
    if (!subDomain(sub)) {
        hosts = sub;
    } else {
        hosts = sub.split(".")[1] + "." + sub.split(".")[2];
    };
    if (!port) {
        port = 80;
    }
    if (!scheme) {
        scheme = 'http://';
    }

One obvious question that comes to everyone’s mind is that, what the hell does this code do ?. I can explain that. Even though this code looks very huge, its functionality is very simple. Here I’m assuming that you have the URL to split it in a variable named “q”. If so, the URL is passed as an argument to the function printer and the splitting occurs. After the splitting, 5 new JavaScript variables are created namely scheme, host, port, path **and **sub.

For example, consider that the URL saved in the variable q is “http:/security.securethelock.com:443/hey”. After the splitting, the 5 new variables will be created with the following values in it:

scheme: http://

host: securethelock.com

port: 443

path: hey

sub: security.securethelock.com

If port is not specified, the default value, 80,  will be stored. If q is a main domain (and not a subdomain) like “http://www.securethelock.com”, sub = host = securethelock.com . If path is not specified, the corresponding variable will be empty.

Anirudh Anand

Security Engineer @CRED | Web Application Security ♥ | Google, Microsoft, Zendesk, Gitlab Hall of Fames | Blogger | CTF lover - @teambi0s | Certs - eWDP, OSCP