TL;DR; Python is the most required skill if you're looking for a data science job on Stackoverflow.

Searching for a data science job on stackoverflow will return at the moment 162 job openings.

Show the code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
<?php

require_once(__DIR__ . '/vendor/autoload.php');
// Add parameters to the query via the constructor
$query = new JobApis\Jobs\Client\Queries\StackoverflowQuery();

// Add parameters via the set() method
$query->set('q', 'data scientist');

// Instantiating a provider with a query object
$client = new JobApis\Jobs\Client\Providers\StackoverflowProvider($query);

// Get a Collection of Jobs
$jobs = $client->getJobs();

$data = [];

foreach($jobs->all() as $key => $job) {
    $data[$key] = ["skills" => $job->getSkills(), "description" => $job->getDescription()];
}

$fp = fopen("data-science-jobs.csv", 'w');
foreach ($data as $key => $fields) {
    if ($key === 0) {
        fputcsv($fp,["skills", "description"]);
    }
    fputcsv($fp, $fields);
}

fclose($fp);

?>

If we are looking at the skills required section, we will see that the first 10 skills are:

Show the code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import datetime

%matplotlib inline

jobs_data = pd.read_csv(
    "data/stackoverflow/data-science-jobs.csv"
)

from collections import Counter

skills = jobs_data["skills"].dropna()
skills_list = skills.str.split(",").as_matrix().flat
single_skills_list = [item.strip(" ") for sublist in skills_list for item in sublist]

occurences = Counter(single_skills_list)
sorted_occurences = occurences.most_common()

sorted_occurences_first_10 = sorted_occurences[:10]
sorted_occurences_dict = dict((x, y) for x, y in sorted_occurences_first_10)
explode = (0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
fig = plt.figure(figsize=(10,10))
plt.pie([float(v) for v in sorted_occurences_dict.values()], labels=[k for k in sorted_occurences_dict],
            autopct='%1.1f%%', explode=explode, shadow=True, startangle=140)

Contrary to my expectation, the degree is mentioned in only a quarter of the job posts:

Show the code

1
2
3
4
5
description = jobs_data["description"].dropna()
description_degree = description.str.lower().str.contains('degree').value_counts()

fig = plt.figure(figsize=(10,10))
description_degree.plot(kind="pie", labels=["Degree not required","Degree required"], autopct='%1.1f%%')

Full code here.