Gender Disparities in Labor Force Analysis (2024)

Skill Gap Analysis

Comparing Team Capabilities to Market Demand in Data & Analytics Roles

Objective

This section evaluates how our team’s current technical capabilities compare to the skills most frequently demanded in the job market for Data & Analytics roles.

Using Lightcast job postings, we extracted the software and technical skills that employers request most often. We then compared this demand with the team’s self-assessed proficiency in seven tools: SQL, Python, Tableau, Power BI, Microsoft Excel, R, and AWS.

The goal is to identify strengths, highlight capability gaps, and provide insights that can guide learning plans and role alignment within the team.


Team Skill Matrix

The heatmap below shows each team member’s self-assessed proficiency (1–5) across selected analytics and data engineering skills.

Darker shades indicate higher proficiency.

import pandas as pd
import plotly.express as px

# Team members and the skills they rated.
# Skill names match the Lightcast skill labels so they can be joined later.
team_members = ["Dinara", "Nhat", "Leo"]
skills = [
    "SQL (Programming Language)",
    "Python (Programming Language)",
    "Tableau (Business Intelligence Software)",
    "Power BI",
    "Microsoft Excel",
    "R (Programming Language)",
    "Amazon Web Services",
]

# Self-assessed proficiency (1-5): one row per team member, one column per skill.
df_skills = pd.DataFrame(
    {
        "Name": team_members,
        "SQL (Programming Language)": [5, 3, 3],
        "Python (Programming Language)": [4, 3, 4],
        "Tableau (Business Intelligence Software)": [3, 4, 2],
        "Power BI": [3, 5, 2],
        "Microsoft Excel": [5, 4, 4],
        "R (Programming Language)": [4, 3, 3],
        "Amazon Web Services": [4, 3, 3],
    }
).set_index("Name")

fig_heat = px.imshow(
    df_skills,
    text_auto=True,
    aspect="auto",
    color_continuous_scale="Blues",
    labels=dict(color="Skill Level"),
    title="Team Skill Matrix"
)
fig_heat.update_yaxes(title="Team Member")
fig_heat

Market Alignment Analysis

Extracting Industry Demand

We filter job postings to focus exclusively on data-centric roles within data-heavy industries, defined here as three 2-digit NAICS sectors: 51 (Information), 52 (Finance and Insurance), and 54 (Professional, Scientific, and Technical Services).

These sectors employ a large proportion of data & analytics professionals, making them strong benchmarks for industry expectations.

We then identify postings related to analytics roles using keyword matching on job titles.

Finally, we extract software skills listed in job descriptions and compute how frequently each skill appears across the filtered postings.

We treat posting frequency as a proxy for demand: the more often a skill appears in the filtered postings, the more important we assume it is for employability.

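# lightcast_jp is the Lightcast job postings DataFrame loaded earlier in the project.
# Keep postings in NAICS sectors 51, 52, and 54.
naics_data_industries = [51, 52, 54]
df_data_industry = lightcast_jp[
    lightcast_jp["NAICS_2022_2"].astype(float).isin(naics_data_industries)
]

data_keywords = [
    "data", "analytics", "analysis", "analyst",
    "machine learning", "ml", "ai",
    "business intelligence", "cloud",
    "sql", "python"
]

# Substring pattern applied to lowercased titles. Short keywords such as
# "ml" and "ai" also match inside longer words (e.g. "html", "retail").
title_pattern = "|".join(data_keywords)

# Data-related postings across all industries (not used further below).
df_data_roles = lightcast_jp[
    lightcast_jp["TITLE_NAME"].str.lower().str.contains(title_pattern, na=False)
]

# Data-related postings within the selected industries.
# na=False treats missing titles as non-matches instead of failing on NaN.
df_filtered = df_data_industry[
    df_data_industry["TITLE_NAME"].str.lower().str.contains(title_pattern, na=False)
]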
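
The title filter above relies on substring matching, so it is worth checking how short keywords behave. The sketch below is an illustration only, not part of the original pipeline: it compares the substring pattern with an assumed whole-word variant built with `\b` anchors, using hypothetical job titles. The helper name `match_flags` is invented for this example.

```python
import re

# Keywords copied from the title filter above.
data_keywords = [
    "data", "analytics", "analysis", "analyst",
    "machine learning", "ml", "ai",
    "business intelligence", "cloud",
    "sql", "python",
]

# Substring pattern, equivalent to the one used in the filter.
substring_pattern = re.compile("|".join(data_keywords))

# Assumed alternative: only match keywords that appear as whole words.
word_pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in data_keywords) + r")\b"
)


def match_flags(title):
    """Return (substring_match, whole_word_match) for a job title."""
    lowered = title.lower()
    return (
        bool(substring_pattern.search(lowered)),
        bool(word_pattern.search(lowered)),
    )


# Hypothetical titles, for illustration only.
titles = [
    "Senior Data Analyst",
    "Retail Store Manager",
    "HTML Email Developer",
    "AI Engineer",
]

for title in titles:
    print(title, match_flags(title))
```

Titles such as "Retail Store Manager" pass the substring filter because "retail" contains "ai", but they fail the whole-word variant. Switching patterns would change which postings are kept, so the skill counts reported below would need to be recomputed if the stricter filter were adopted.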

Software Skills Extraction

Lightcast provides skill data as comma-separated text fields.

We parse and standardize these strings, explode them into individual skills, and aggregate counts across all postings. This produces a clean, frequency-based ranking of the most in-demand technologies in data-oriented careers.

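# Work on a copy so the filtered frame can be modified safely.
df_filtered = df_filtered.copy()

# Turn each comma-separated string into a list of trimmed, non-empty skill names.
df_filtered["software_skill"] = (
    df_filtered["SOFTWARE_SKILLS_NAME"]
        .fillna("")
        .str.split(",")
        .apply(lambda lst: [s.strip() for s in lst if s.strip() != ""])
)

# One row per (posting, skill); postings with no skills become NaN and are dropped.
df_sw_exploded = (
    df_filtered
        .explode("software_skill")
        .dropna(subset=["software_skill"])
)

# Number of postings that mention each skill, sorted from most to least frequent.
industry_skill_counts = (
    df_sw_exploded["software_skill"]
        .value_counts()
        .to_frame(name="count")
)

industry_skill_counts.head()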
software_skill                              count
SQL (Programming Language)                   7684
Python (Programming Language)                4620
Dashboard                                    4202
Tableau (Business Intelligence Software)     4085
Power BI                                     3695
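
To make the parse, explode, and count steps concrete, here is a small dependency-free sketch of the same logic on toy input. The raw strings are hypothetical and `parse_skills` is a helper name invented for this example. It uses `collections.Counter` in place of pandas, so it shows what the pipeline computes rather than how the project implements it.

```python
from collections import Counter

# Hypothetical raw values mimicking a comma-separated skills field,
# including stray whitespace, a trailing comma, and missing entries.
raw_skill_fields = [
    "SQL (Programming Language), Python (Programming Language)",
    "Power BI,  SQL (Programming Language) ,",
    None,
    "",
]


def parse_skills(field):
    """Split on commas, trim whitespace, and drop empty pieces."""
    if not field:
        return []
    return [piece.strip() for piece in field.split(",") if piece.strip()]


# "Explode" every posting into individual skills and count occurrences.
skill_counts = Counter(
    skill
    for field in raw_skill_fields
    for skill in parse_skills(field)
)

for skill, count in skill_counts.most_common():
    print(f"{skill}: {count}")
```

One caveat of splitting on commas is that a skill name containing a comma would be broken into separate pieces. Whether any Lightcast skill names do is not something this example checks.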

Gap Calculation

To quantify alignment between our team and the market, we compare:

  • Team Score - the team’s average self-assessed proficiency (1–5 scale)
  • Industry Score - demand-derived skill importance (posting frequency rescaled to a 0–5 range, relative to the most requested of the seven skills)

The Gap metric is defined as: \(\text{Gap} = \text{Industry Score} - \text{Team Score}\)

  • A positive gap means market demand exceeds the team’s average proficiency, which highlights an opportunity for additional training or hiring.
  • A negative gap means the team’s average proficiency exceeds the demand-based score.
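
# Average self-assessed proficiency per skill across team members.
team_avg = df_skills.mean().to_frame(name="team_score")

# Keep only the posting counts for the seven skills the team rated.
ind_skills = industry_skill_counts.rename_axis("Skill")
ind_skills = ind_skills.loc[skills]

gap_df = team_avg.join(ind_skills, how="left")

# Rescale counts to 0-5, with the most requested of the seven skills set to 5.
max_demand = gap_df["count"].max()
gap_df["industry_score"] = (gap_df["count"] / max_demand) * 5

# Positive gap: demand exceeds team proficiency. Negative gap: team exceeds demand.
gap_df["gap"] = gap_df["industry_score"] - gap_df["team_score"]
gap_df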
                                          team_score  count  industry_score        gap
SQL (Programming Language)                  3.666667   7684        5.000000   1.333333
Python (Programming Language)               3.666667   4620        3.006247  -0.660420
Tableau (Business Intelligence Software)    3.000000   4085        2.658121  -0.341879
Power BI                                    3.333333   3695        2.404347  -0.928987
Microsoft Excel                             4.333333   3642        2.369859  -1.963474
R (Programming Language)                    3.333333   2312        1.504425  -1.828909
Amazon Web Services                         3.333333   1126        0.732691  -2.600642
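
As a check on the arithmetic, the sketch below recomputes the first and last rows by hand from the posting counts and team ratings shown above. The function name `industry_score` and the `scale` parameter are introduced only for this example.

```python
# Posting counts copied from the table above.
counts = {
    "SQL (Programming Language)": 7684,
    "Amazon Web Services": 1126,
}

# Team averages from the skill matrix: SQL (5 + 3 + 3) / 3, AWS (4 + 3 + 3) / 3.
team_scores = {
    "SQL (Programming Language)": (5 + 3 + 3) / 3,
    "Amazon Web Services": (4 + 3 + 3) / 3,
}


def industry_score(count, max_count, scale=5):
    """Rescale a posting count so the most requested skill scores `scale`."""
    return count / max_count * scale


# SQL has the highest count among the rated skills, so it anchors the scale.
max_count = max(counts.values())

gaps = {
    skill: industry_score(count, max_count) - team_scores[skill]
    for skill, count in counts.items()
}

for skill, gap in gaps.items():
    print(f"{skill}: gap = {gap:+.6f}")
```

Because SQL defines the top of the scale, its industry score is always 5, and it is the only skill with a positive gap. Every other industry score is a fraction of SQL's demand, so the gaps depend on the choice of normalization as much as on the team ratings.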

Skill Gap Visualizations

Bar Chart Comparison

# Reshape to long format: one row per (skill, score source).
gap_long = (
    gap_df[["team_score", "industry_score"]]
    .reset_index(names="Skill")
    .melt(id_vars="Skill", var_name="Source", value_name="Score")
)

# Order the skills on the y-axis by industry demand.
skill_order = gap_df["industry_score"].sort_values(ascending=False).index.to_list()
fig_gap = px.bar(
    gap_long.sort_values("Score", ascending=False),
    x="Score",
    y="Skill",
    color="Source",
    barmode="group",
    orientation="h",
    title="Skill Gap Analysis",
    # Industry scores can fall below 1, so the axis is labeled 0-5.
    labels={"Score": "Score (0–5)", "Skill": "", "Source": ""},
    category_orders={"Skill": skill_order}
)
fig_gap.update_layout(
    legend_title_text="",
    xaxis=dict(range=[0, 5]),
    legend=dict(orientation="v", yanchor="top", y=0.15, xanchor="left", x=0.75)
)
fig_gap

Radar Chart Comparison

import plotly.graph_objects as go

skills_list = gap_df.index.tolist()
industry_values = gap_df["industry_score"].tolist()

# Repeat the first point at the end so each polygon closes.
skills_list += [skills_list[0]]
industry_values += [industry_values[0]]

fig = go.Figure()

# One semi-transparent trace per team member.
for member in df_skills.index:
    values = df_skills.loc[member].tolist()
    values_loop = values + [values[0]]

    fig.add_trace(go.Scatterpolar(
        r=values_loop,
        theta=skills_list,
        fill='toself',
        name=member,
        opacity=0.3,
        line=dict(width=1)
    ))

# Industry demand drawn on top with a thicker outline.
fig.add_trace(go.Scatterpolar(
    r=industry_values,
    theta=skills_list,
    fill='toself',
    name='Industry Demand',
    line=dict(color='rgba(255, 127, 14, 0.9)', width=3),
    fillcolor='rgba(255, 127, 14, 0.5)'
))
fig.update_layout(
    title="Skill Gap Analysis",
    polar=dict(radialaxis=dict(visible=True, range=[0, 5])),
    showlegend=True,
    legend=dict(orientation="h", y=-0.2)
)
fig.show()

Improvement Plan

How can the team collaborate to bridge skill gaps?

While the team’s average self-rating is 3 or higher in all seven skills, proficiency is uneven across members, and SQL, the most requested skill, is the one area where the demand-based score (5.00) exceeds the team average (3.67). To address this, we developed a collaborative improvement plan in which members with higher expertise guide others through structured peer learning. The plan focuses on small, consistent practice activities in the skills with the highest industry demand: SQL, Python, Tableau/Power BI, and Microsoft Excel. For example, we will practice SQL by solving LeetCode problems and joining SQL contests, and strengthen Power BI and Tableau skills by building hands-on visualizations. We will also stay active on GitHub to document progress and maintain a strong professional portfolio.
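
One possible way to turn the peer-learning idea into concrete pairings is sketched below. The pairing rule (the highest-rated member or members mentor everyone rated lower) is an assumption made for illustration, not the team's documented process, and `pair_mentors` is a hypothetical helper. The ratings are copied from the Team Skill Matrix.

```python
# Self-assessed ratings copied from the Team Skill Matrix (1-5 scale),
# limited to the high-demand skills named in the plan.
team_skills = {
    "SQL (Programming Language)": {"Dinara": 5, "Nhat": 3, "Leo": 3},
    "Python (Programming Language)": {"Dinara": 4, "Nhat": 3, "Leo": 4},
    "Tableau (Business Intelligence Software)": {"Dinara": 3, "Nhat": 4, "Leo": 2},
    "Power BI": {"Dinara": 3, "Nhat": 5, "Leo": 2},
    "Microsoft Excel": {"Dinara": 5, "Nhat": 4, "Leo": 4},
}


def pair_mentors(scores):
    """Assumed rule: top-rated members mentor everyone rated below them."""
    top = max(scores.values())
    mentors = [member for member, score in scores.items() if score == top]
    learners = [member for member, score in scores.items() if score < top]
    return mentors, learners


pairings = {skill: pair_mentors(scores) for skill, scores in team_skills.items()}

for skill, (mentors, learners) in pairings.items():
    print(f"{skill}: {', '.join(mentors)} -> {', '.join(learners)}")
```

Under this rule each of the five skills has at least one mentor and one learner, so every team member both teaches and learns somewhere in the plan.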

© 2025 · AD 688 Web Analytics · Boston University

Team 5