Skill Gap Analysis

Comparing Team Capabilities to Market Demand in Data & Analytics Roles

Objective

This section evaluates how our team’s current technical capabilities compare to the skills most frequently demanded in the job market for Data & Analytics roles.

Using Lightcast job postings, we extracted the software and technical skills that employers request. We then compared these industry requirements with the team’s self-assessed proficiency in key tools: SQL, Python, Tableau, Power BI, Microsoft Excel, R, and AWS.

The goal is to identify strengths, highlight capability gaps, and provide insights that can guide learning plans and role alignment within the team.

Team Skill Matrix

The heatmap below shows each team member’s self-assessed proficiency (1–5) across selected analytics and data engineering skills.

Darker shades indicate higher proficiency.

```python
import pandas as pd

# Skills evaluated in the matrix; names match Lightcast's SOFTWARE_SKILLS_NAME labels
team_members = ["Dinara", "Nhat", "Leo"]
skills = [
    "SQL (Programming Language)",
    "Python (Programming Language)",
    "Tableau (Business Intelligence Software)",
    "Power BI",
    "Microsoft Excel",
    "R (Programming Language)",
    "Amazon Web Services",
]

# Self-assessed proficiency on a 1–5 scale, one entry per team member
df_skills = pd.DataFrame(
    {
        "Name": team_members,
        "SQL (Programming Language)": [5, 3, 3],
        "Python (Programming Language)": [4, 3, 4],
        "Tableau (Business Intelligence Software)": [3, 4, 2],
        "Power BI": [3, 5, 2],
        "Microsoft Excel": [5, 4, 4],
        "R (Programming Language)": [4, 3, 3],
        "Amazon Web Services": [4, 3, 3],
    }
).set_index("Name")
```

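```python
import plotly.express as px

# Heatmap of the skill matrix; text_auto prints each rating inside its cell
fig_heat = px.imshow(
    df_skills,
    text_auto=True,
    aspect="auto",
    color_continuous_scale="Blues",
    labels=dict(color="Skill Level"),
    title="Team Skill Matrix"
)
fig_heat.update_yaxes(title="Team Member")
fig_heat.show()
```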

Market Alignment Analysis

Extracting Industry Demand

We filter job postings to focus exclusively on data-centric roles within data-heavy industries: NAICS sectors 51 (Information), 52 (Finance and Insurance), and 54 (Professional, Scientific, and Technical Services).

These sectors employ a large share of data & analytics professionals, which makes them reasonable benchmarks for industry expectations.

Within these sectors, we keep only analytics-related postings, identified by keyword matching on job titles.

Finally, we extract the software skills listed in each posting and count how often each skill appears across the filtered postings.

We treat posting frequency as a proxy for demand: the more often a skill appears, the more employers ask for it, and the more it is likely to matter for employability.

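```python
# NAICS 2-digit sectors: 51 Information, 52 Finance and Insurance,
# 54 Professional, Scientific, and Technical Services
naics_data_industries = [51, 52, 54]
df_data_industry = lightcast_jp[
    lightcast_jp["NAICS_2022_2"].astype(float).isin(naics_data_industries)
]

# Title keywords that flag analytics-related roles. str.contains treats the joined
# pattern as a regex and matches substrings, so short keywords such as "ai" and
# "ml" also match inside longer words (e.g. "retail").
data_keywords = [
    "data", "analytics", "analysis", "analyst",
    "machine learning", "ml", "ai",
    "business intelligence", "cloud",
    "sql", "python"
]
title_pattern = "|".join(data_keywords)

# Analytics roles across all industries (the analysis below uses df_filtered)
df_data_roles = lightcast_jp[
    lightcast_jp["TITLE_NAME"].str.lower().str.contains(title_pattern, na=False)
]

# Analytics roles within the selected sectors; na=False treats missing titles as non-matches
df_filtered = df_data_industry[
    df_data_industry["TITLE_NAME"].str.lower().str.contains(title_pattern, na=False)
]
```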
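Because the keyword pattern is matched as a substring, short terms such as "ai" and "ml" can pull in unrelated titles; "Retail Sales Associate", for instance, contains "ai". A minimal sketch of a stricter alternative, assuming whole-word matching is the intended behavior, wraps the keywords in regex word boundaries. The helper `title_matches` and the sample titles are hypothetical and for illustration only; applying this filter to the Lightcast data would change the posting counts reported below.

```python
import re

import pandas as pd

data_keywords = [
    "data", "analytics", "analysis", "analyst",
    "machine learning", "ml", "ai",
    "business intelligence", "cloud",
    "sql", "python",
]


# Hypothetical helper: match keywords only as whole words (\b = word boundary)
def title_matches(titles: pd.Series, keywords: list[str]) -> pd.Series:
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return titles.str.contains(pattern, case=False, regex=True, na=False)


# Illustrative titles, not taken from the Lightcast data
sample_titles = pd.Series([
    "Retail Sales Associate",
    "Senior Data Analyst",
    "AI Engineer",
    None,
])

# Original approach (substring) vs. whole-word approach
substring_hits = sample_titles.str.lower().str.contains("|".join(data_keywords), na=False)
whole_word_hits = title_matches(sample_titles, data_keywords)

print(pd.DataFrame({
    "title": sample_titles,
    "substring": substring_hits,
    "whole_word": whole_word_hits,
}))
```

The non-capturing group `(?:...)` keeps pandas from warning about match groups, and `re.escape` guards against keywords that contain regex metacharacters.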

Software Skills Extraction

Lightcast provides skill data as comma-separated text fields.

We parse and standardize these strings, explode them into individual skills, and aggregate counts across all postings. This produces a clean, frequency-based ranking of the most in-demand technologies in data-oriented careers.

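```python
# Split the comma-separated SOFTWARE_SKILLS_NAME field into a list of trimmed
# skill names per posting; missing fields become empty lists
df_filtered = df_filtered.copy()
df_filtered["software_skill"] = (
    df_filtered["SOFTWARE_SKILLS_NAME"]
        .fillna("")
        .str.split(",")
        .apply(lambda lst: [s.strip() for s in lst if s.strip() != ""])
)

# One row per (posting, skill); postings with no skills explode to NaN and are dropped
df_sw_exploded = (
    df_filtered
        .explode("software_skill")
        .dropna(subset=["software_skill"])
)

# Number of mentions of each skill across the filtered postings, most frequent first
industry_skill_counts = (
    df_sw_exploded["software_skill"]
        .value_counts()
        .to_frame(name="count")
)

industry_skill_counts.head()
```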

```
software_skill                              count
SQL (Programming Language)                   7684
Python (Programming Language)                4620
Dashboard                                    4202
Tableau (Business Intelligence Software)     4085
Power BI                                     3695
```
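To make the parsing steps concrete, here is a minimal sketch that applies the same split, strip, explode, and count sequence to three made-up postings (the data are illustrative, not from Lightcast). Blank entries and missing fields drop out, so only real skill names are counted.

```python
import pandas as pd

# Three illustrative postings; the third lists no software skills
toy = pd.DataFrame({
    "SOFTWARE_SKILLS_NAME": [
        "SQL (Programming Language), Python (Programming Language)",
        "SQL (Programming Language),, Power BI",
        None,
    ]
})

# Split on commas, trim whitespace, and drop empty strings
toy["software_skill"] = (
    toy["SOFTWARE_SKILLS_NAME"]
        .fillna("")
        .str.split(",")
        .apply(lambda lst: [s.strip() for s in lst if s.strip() != ""])
)

# One row per (posting, skill); the empty list for posting 3 becomes NaN and is dropped
toy_counts = (
    toy.explode("software_skill")
        .dropna(subset=["software_skill"])["software_skill"]
        .value_counts()
)
print(toy_counts)
```

Note that this counts mentions: a skill listed twice in the same posting would be counted twice.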

Gap Calculation

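To quantify alignment between our team and the market, we compare two scores for each skill:

Team Score: the average of the three members’ self-assessed ratings (1–5 scale)
Industry Score: demand derived from posting frequency, rescaled so that the most frequently requested of the seven evaluated skills scores 5 (each skill’s count divided by the highest count, times 5)

The Gap metric is defined as: \(\text{Gap} = \text{Industry Score} - \text{Team Score}\)

A positive gap means market demand exceeds the team’s average proficiency, highlighting opportunities for additional training or hiring.
A negative gap means the team’s average proficiency already exceeds the relative market demand for that skill.

Because the Industry Score measures relative demand rather than a required proficiency level, the gap is best read as a signal for prioritizing skills, not as an exact shortfall.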
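A worked check of the definitions, using the posting counts and ratings reported in this section, shows why SQL has a positive gap while Python's is negative. Here \(c_s\) denotes the posting count for skill \(s\), and the maximum runs over the seven evaluated skills:

```latex
\begin{aligned}
\text{Industry Score}_s &= 5 \times \frac{c_s}{\max_k c_k},
\qquad \text{Gap}_s = \text{Industry Score}_s - \text{Team Score}_s \\[6pt]
\text{Industry Score}_{\text{SQL}} &= 5 \times \frac{7684}{7684} = 5,
\qquad \text{Team Score}_{\text{SQL}} = \frac{5 + 3 + 3}{3} = \frac{11}{3} \approx 3.667 \\
\text{Gap}_{\text{SQL}} &= 5 - \frac{11}{3} = \frac{4}{3} \approx 1.333 \\[6pt]
\text{Industry Score}_{\text{Python}} &= 5 \times \frac{4620}{7684} \approx 3.006,
\qquad \text{Team Score}_{\text{Python}} = \frac{4 + 3 + 4}{3} = \frac{11}{3} \approx 3.667 \\
\text{Gap}_{\text{Python}} &= 5 \times \frac{4620}{7684} - \frac{11}{3} \approx -0.660
\end{aligned}
```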

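```python
# Team Score: mean self-assessed rating per skill across members
team_avg = df_skills.mean().to_frame(name="team_score")

# Posting counts for the seven evaluated skills (index = skill name)
ind_skills = industry_skill_counts.rename_axis("Skill")
ind_skills = ind_skills.loc[skills]

gap_df = team_avg.join(ind_skills, how="left")

# Industry Score: rescale counts so the most-demanded evaluated skill scores 5
max_demand = gap_df["count"].max()
gap_df["industry_score"] = (gap_df["count"] / max_demand) * 5

# Gap = Industry Score - Team Score (positive = demand exceeds team proficiency)
gap_df["gap"] = gap_df["industry_score"] - gap_df["team_score"]
gap_df
```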

```
Skill                                     team_score  count  industry_score        gap
SQL (Programming Language)                  3.666667   7684        5.000000   1.333333
Python (Programming Language)               3.666667   4620        3.006247  -0.660420
Tableau (Business Intelligence Software)    3.000000   4085        2.658121  -0.341879
Power BI                                    3.333333   3695        2.404347  -0.928987
Microsoft Excel                             4.333333   3642        2.369859  -1.963474
R (Programming Language)                    3.333333   2312        1.504425  -1.828909
Amazon Web Services                         3.333333   1126        0.732691  -2.600642
```
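The team average can mask individual differences: Tableau averages 3.0, for example, while Leo's self-rating is 2. As one possible extension, not part of the analysis above, the same gap can be computed per member by subtracting each member's rating from the Industry Score. This sketch rebuilds the inputs from the ratings and posting counts reported in this section so that it runs on its own.

```python
import pandas as pd

# Self-assessed ratings (1–5) from the team skill matrix
ratings = pd.DataFrame(
    {
        "SQL (Programming Language)": [5, 3, 3],
        "Python (Programming Language)": [4, 3, 4],
        "Tableau (Business Intelligence Software)": [3, 4, 2],
        "Power BI": [3, 5, 2],
        "Microsoft Excel": [5, 4, 4],
        "R (Programming Language)": [4, 3, 3],
        "Amazon Web Services": [4, 3, 3],
    },
    index=["Dinara", "Nhat", "Leo"],
)

# Posting counts for the same skills, taken from the gap table
counts = pd.Series(
    {
        "SQL (Programming Language)": 7684,
        "Python (Programming Language)": 4620,
        "Tableau (Business Intelligence Software)": 4085,
        "Power BI": 3695,
        "Microsoft Excel": 3642,
        "R (Programming Language)": 2312,
        "Amazon Web Services": 1126,
    }
)

# Industry Score: rescale so the most-demanded skill scores 5
industry_score = counts / counts.max() * 5

# Per-member gap = industry_score - rating, aligned on skill (column) labels
member_gap = ratings.rsub(industry_score, axis="columns")
print(member_gap.round(2))

# Skills where a member's rating falls below the relative market demand
for member, row in member_gap.iterrows():
    print(member, row[row > 0].index.tolist())
```

With these inputs, Dinara has no positive gaps, Nhat's positive gaps are SQL and (marginally) Python, and Leo's are SQL, Tableau, and Power BI.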

Skill Gap Visualizations

Bar Chart Comparison

```python
# Long format: one row per (skill, source) pair for a grouped bar chart
gap_long = (
    gap_df[["team_score", "industry_score"]]
    .rename(columns={"team_score": "Team Score", "industry_score": "Industry Score"})
    .reset_index(names="Skill")
    .melt(id_vars="Skill", var_name="Source", value_name="Score")
)
# Order skills by industry demand, highest first
skill_order = gap_df["industry_score"].sort_values(ascending=False).index.to_list()
fig_gap = px.bar(
    gap_long.sort_values("Score", ascending=False),
    x="Score",
    y="Skill",
    color="Source",
    barmode="group",
    orientation="h",
    title="Skill Gap Analysis",
    labels={"Score": "Score (0–5)", "Skill": "", "Source": ""},
    category_orders={"Skill": skill_order}
)
fig_gap.update_layout(
    legend_title_text="",
    xaxis=dict(range=[0, 5]),
    legend=dict(orientation="v", yanchor="top", y=0.15, xanchor="left", x=0.75)
)
fig_gap.show()
```

Radar Chart Comparison

```python
import plotly.graph_objects as go

# Close each polygon by repeating the first skill (and value) at the end
skills_list = gap_df.index.tolist()
industry_values = gap_df["industry_score"].tolist()

skills_list += [skills_list[0]]
industry_values += [industry_values[0]]

fig = go.Figure()

# One translucent polygon per team member, in the same skill order as gap_df
for member in df_skills.index:
    values = df_skills.loc[member, gap_df.index].tolist()
    values_loop = values + [values[0]]

    fig.add_trace(go.Scatterpolar(
        r=values_loop,
        theta=skills_list,
        fill='toself',
        name=member,
        opacity=0.3,
        line=dict(width=1)
    ))

# Industry demand drawn last and highlighted in orange
fig.add_trace(go.Scatterpolar(
    r=industry_values,
    theta=skills_list,
    fill='toself',
    name='Industry Demand',
    line=dict(color='rgba(255, 127, 14, 0.9)', width=3),
    fillcolor='rgba(255, 127, 14, 0.5)'
))
fig.update_layout(
    title="Skill Gap Analysis: Team vs. Industry Demand",
    polar=dict(radialaxis=dict(visible=True, range=[0, 5])),
    showlegend=True,
    legend=dict(orientation="h", y=-0.2)
)
fig.show()
```

Improvement Plan

How can the team collaborate to bridge skill gaps?

Our team averages between 3.0 and 4.3 out of 5 on every evaluated skill, but proficiency is uneven across members, and SQL, the most in-demand skill, is the one area where market demand exceeds the team average (gap of +1.33). To close these gaps, we developed a collaborative improvement plan in which the strongest member in each skill guides the others through structured peer learning; for example, Dinara (SQL rated 5) can lead SQL practice, and Nhat (Power BI rated 5, Tableau rated 4) can lead dashboarding work. The plan emphasizes small but consistent practice in the skills with the highest industry demand: SQL, Python, Tableau, Power BI, and Microsoft Excel. We will practice SQL by solving LeetCode problems and joining SQL contests, and we will strengthen Power BI and Tableau skills by building hands-on visualizations. We will also stay active on GitHub to document our progress and maintain a strong professional portfolio.
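The peer-learning idea, where members with higher expertise guide others, can be expressed as a simple pairing rule. This minimal sketch assumes a hypothetical rule in which the highest-rated member for each skill mentors anyone rated at least two points lower; the two-point threshold and the `pair_mentors` helper are illustrative choices, not part of the plan itself.

```python
import pandas as pd

# Self-assessed ratings (1–5) from the team skill matrix
ratings = pd.DataFrame(
    {
        "SQL (Programming Language)": [5, 3, 3],
        "Python (Programming Language)": [4, 3, 4],
        "Tableau (Business Intelligence Software)": [3, 4, 2],
        "Power BI": [3, 5, 2],
        "Microsoft Excel": [5, 4, 4],
        "R (Programming Language)": [4, 3, 3],
        "Amazon Web Services": [4, 3, 3],
    },
    index=["Dinara", "Nhat", "Leo"],
)


def pair_mentors(ratings: pd.DataFrame, min_diff: int = 2) -> dict[str, tuple[str, list[str]]]:
    """Hypothetical rule: the top-rated member mentors members at least min_diff points lower."""
    pairs = {}
    for skill in ratings.columns:
        col = ratings[skill]
        mentor = col.idxmax()  # first member with the highest rating
        mentees = col[col <= col.max() - min_diff].index.tolist()
        if mentees:
            pairs[skill] = (mentor, mentees)
    return pairs


for skill, (mentor, mentees) in pair_mentors(ratings).items():
    print(f"{skill}: {mentor} -> {', '.join(mentees)}")
```

Under this rule, Dinara would mentor Nhat and Leo in SQL, and Nhat would mentor Leo in Tableau and both Dinara and Leo in Power BI; lowering `min_diff` to 1 would also pair members on Python, Excel, R, and AWS.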