When a browser downloads a web page, it has to perform at least these steps before it gets the content:

  1. Resolve the host name, i.e., obtain the IP address associated with that domain
  2. Connect a socket to that IP address
  3. Write the request to the socket
  4. Read the response data from the socket

In this article I will try to describe some of them using both Qt and clib sockets.

 

Qt TCP Sockets

Connecting socket

Socket declaration (class scope):

class MainWindow : public QMainWindow
{
    Q_OBJECT
public:
    explicit MainWindow(QWidget *parent = 0);
    ~MainWindow();
    void log(const QString log) const;
    const QString generateGet() const;
private slots:
    void on_download_clicked();
    void lookedUp(const QHostInfo &host);
    void onSocketConnected();
    void onSocketRead();
    void onSocketDisconnected();
    void on_clean_clicked();
private:
    Ui::MainWindow *ui;
    QTcpSocket socket;
};

 Socket signal connections:

connect(&socket, SIGNAL(connected()), this, SLOT(onSocketConnected()));
connect(&socket, SIGNAL(readyRead()), this, SLOT(onSocketRead()));
connect(&socket, SIGNAL(disconnected()), this, SLOT(onSocketDisconnected()));

 Socket connection:

void MainWindow::lookedUp(const QHostInfo &host)
{
    if (host.error() != QHostInfo::NoError) {
        log("Lookup failed: " + host.errorString());
        return;
    }
    if (!host.addresses().empty()) {
        const QHostAddress address = host.addresses().first();
        log("Found address: " + address.toString());
        log("Connecting to socket");
        if (socket.state() == QAbstractSocket::ConnectedState) {
            socket.disconnectFromHost();
        }
        socket.connectToHost(address, 80);
    }
}

 This slot is executed once the host name lookup finishes.

 

Writing a HTTP request on the socket

Once the socket is connected to the endpoint, the onSocketConnected() slot registered above is executed:

void MainWindow::onSocketConnected()
{
    log("socket connected");
    QString const request = generateGet();
    log("\n\nFull request:\n" + request);
    ui->websiteView->textCursor().insertText("========== NEW REPLY ========== ");
    log("Writing to socket the request");
    socket.write(request.toLatin1().data());
    log("The request was written");
}

 where the request is written to the socket. The request needs at least these lines:

  • GET/POST <path> HTTP/1.1\r\n
  • Host: <hostname>\r\n
  • \r\n

For instance, a request for the main page of Google:

GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n

const QString MainWindow::generateGet() const
{
    const QUrl url(ui->address->toPlainText());
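    // An empty path (e.g. "http://example.com") has to be requested as "/"
    const QString path = url.path().isEmpty() ? QString("/") : url.path();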
    log("Setting path: " + path);
    const QString HTTPHeader = " HTTP/1.1\r\n";
    log("Setting HTTP header: " + HTTPHeader);
    const QString connectionHeader = "Connection: Keep-Alive\r\n";
    log("Setting connection header: " + connectionHeader);
    const QString encodingHeader = "Accept-Encoding: gzip, deflate\r\n";
    log("Setting encoding header: " + encodingHeader);
    const QString languageHeader = "Accept-Language: es-ES,en,*\r\n";
    log("Setting language header: " + languageHeader);
    const QString userAgentHeader = "User-Agent: Mozilla/5.0\r\n";
    log("Setting user agent header: " + userAgentHeader);
    const QString hostHeader = "Host: " + url.host() + "\r\n";
    log("Setting host header: " + hostHeader);
    QString fullRequest = "GET " + path + HTTPHeader + connectionHeader
                          + encodingHeader + languageHeader + userAgentHeader
                          + hostHeader + "\r\n";
    return fullRequest;
}

 

Reading the data (HTTP response and webpage)

void MainWindow::onSocketRead()
{
    QByteArray socketData = socket.readAll();
    log("Reading from socket " + QString::number(socketData.size()) + " bytes");
    ui->websiteView->textCursor().insertText(QString(socketData));
    ui->websiteView->moveCursor(QTextCursor::End);
}

 

 

clib TCP sockets

clib socket main functions:

  • socket: creates a socket descriptor
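  • setsockopt: sets a socket option (for example TCP_NODELAY or SO_REUSEADDR); the family and protocol are chosen earlier, in the socket call
  • getsockopt: reads the current value of a socket option
  • connect: connects the socket to the specified remote address
  • bind: assigns the specified local address to the socket
  • accept: once the socket is bound and listening, it accepts an incoming client connection
  • shutdown: closes one or both directions of the connection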

As a client we are only interested in socket, setsockopt and connect.
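
As a small illustration of the option calls listed above, the following sketch creates a TCP socket, enables TCP_NODELAY with setsockopt and reads the value back with getsockopt. The helper name nodelay_roundtrip is hypothetical and is not part of the article's code.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if TCP_NODELAY reads back as enabled,
 * 0 if it reads back as disabled, and -1 on any error. */
int nodelay_roundtrip(void)
{
    int on = 1, value = 0;
    socklen_t len = sizeof(value);
    int sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (sock == -1)
        return -1;
    /* Set the option, then ask the kernel what it actually stored */
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == -1 ||
        getsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &value, &len) == -1) {
        close(sock);
        return -1;
    }
    close(sock);
    return value != 0;
}
```

The socket is never connected here; options of this kind can be set on a freshly created descriptor.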

Connecting socket

int socket_connect(char *host, in_port_t port){
    struct hostent *hp;
    struct sockaddr_in addr;
    int on = 1, sock;
    /* Resolve the host name to an address */
    if((hp = gethostbyname(host)) == NULL){
        herror("gethostbyname");
        exit(1);
    }
    memset(&addr, 0, sizeof(addr)); /* clear the fields we do not fill in */
    memcpy(&addr.sin_addr, hp->h_addr, hp->h_length);
    addr.sin_port = htons(port);
    addr.sin_family = AF_INET;
    sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    if(sock == -1){
        perror("socket");
        exit(1);
    }
    /* Disable Nagle's algorithm so that small writes are sent immediately */
    if(setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, (const char *)&on, sizeof(int)) == -1){
        perror("setsockopt");
        exit(1);
    }
    if(connect(sock, (struct sockaddr *)&addr, sizeof(struct sockaddr_in)) == -1){
        perror("connect");
        exit(1);
    }
    return sock;
}

It first resolves the host name, then creates the socket, sets its options and finally connects to the endpoint.
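
gethostbyname is considered obsolete, and getaddrinfo is the usual replacement because it also handles IPv6. The sketch below shows only the resolution step with getaddrinfo. The helper name resolve_first_address and its return convention are assumptions made for this example.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolves host/service and writes the first address
 * found, in numeric text form, into out.
 * Returns 0 on success and -1 on failure. */
int resolve_first_address(const char *host, const char *service,
                          char *out, size_t out_len)
{
    struct addrinfo hints, *results;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    if (getaddrinfo(host, service, &hints, &results) != 0)
        return -1;

    /* NI_NUMERICHOST asks for the address as text, without a reverse lookup */
    int rc = getnameinfo(results->ai_addr, results->ai_addrlen,
                         out, (socklen_t)out_len, NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(results);
    return rc == 0 ? 0 : -1;
}
```

For example, resolve_first_address("127.0.0.1", "80", buffer, sizeof(buffer)) fills buffer with the same numeric address. In a real client, the ai_family, ai_socktype, ai_protocol and ai_addr fields of each result would be passed to socket() and connect() instead of filling a sockaddr_in by hand.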

 

Writing a HTTP request on the socket

Generating the HTTP request with the minimum headers:

char *generate_http_request(const char* hostname, const char* path)
{
    const char *format = "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n";
    char *http_request;
    /* With a NULL buffer, snprintf only returns the length the text needs */
    int length = snprintf(NULL, 0, format, path, hostname);
    if (length < 0)
        return NULL;
    http_request = malloc((size_t)length + 1); /* +1 for the terminating NUL */
    if (http_request == NULL)
        return NULL;
    snprintf(http_request, (size_t)length + 1, format, path, hostname);
    printf("GET+Host data: %s\n", http_request);
    return http_request; /* the caller must free() it */
}

 Writing the request on the socket:

write(fd, req, strlen(req));

send() could also be used.
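
A single write() call is allowed to send fewer bytes than requested, so a more defensive client loops until everything is written. The following is a minimal sketch of that idea. The helper name write_all is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling write() until len bytes are sent.
 * Returns 0 on success and -1 on error (errno is set by write). */
int write_all(int fd, const char *data, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = write(fd, data + sent, len - sent);
        if (n == -1) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        sent += (size_t)n; /* advance past the bytes already written */
    }
    return 0;
}
```

With this helper, the call above would become write_all(fd, req, strlen(req)).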

 

Reading the data (HTTP response and webpage)

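We just read from the socket until read() returns zero (end of file) or a negative value (error). Note that an HTTP/1.1 server may keep the connection open for a while unless the request includes a "Connection: close" header, so the end of file can arrive late:

ssize_t n;
while((n = read(fd, buffer, BUFFER_SIZE - 1)) > 0){
    buffer[n] = '\0'; /* terminate exactly after the bytes received */
    fprintf(stdout, "%s", buffer);
}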
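
The data read from the socket contains the HTTP response headers followed by the web page itself. As a sketch of how the two could be separated, the hypothetical helpers below extract the status code and locate the start of the body. They assume the response is held in one NUL-terminated buffer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extracts the status code from the status line,
 * e.g. 200 from "HTTP/1.1 200 OK". Returns -1 if the line is malformed. */
int parse_status_code(const char *response)
{
    int code;
    /* %*d reads and discards the two version numbers */
    if (sscanf(response, "HTTP/%*d.%*d %d", &code) != 1)
        return -1;
    return code;
}

/* Hypothetical helper: returns a pointer to the body, i.e. the first byte
 * after the blank line that ends the headers, or NULL if that blank line
 * has not been received yet. */
const char *find_body(const char *response)
{
    const char *separator = strstr(response, "\r\n\r\n");
    return separator ? separator + 4 : NULL;
}
```

Since the read loop delivers the response in pieces, a real client would accumulate the pieces in a larger buffer before calling these helpers.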

 

Wireshark capture

wireshark

The code is available on GitHub.