Overview

Data cleaning? Oh yeah, I live for this stuff. This project was all about whipping a messy dataset into shape and transforming raw shark attack data from 2013 to 2023 into clear, actionable insights. Because let’s be real, if you want solid takeaways, you need clean, structured data.

So, what did I do? I tackled inconsistencies, standardized formats, and removed junk data so we could build a dashboard that actually tells a story.

Data Cleaning: Making Order Out of Chaos

Step one? Roll up the sleeves and get our hands dirty. Here’s how I took this dataset from hot mess to data goldmine:

1. Preserving the Original Data

First things first, always keep a backup. I duplicated the dataset to make sure I could track every change. No accidental data losses on our watch.

2. Filtering for Relevance

Removed duplicate records (because who needs redundant data?).

Focused only on shark attacks from 2013 to 2023. Anything outside that window got the boot.

Spotted and eliminated non-year values in the Year column (because “01” is not a year).
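
For anyone who'd rather script this step than click through Excel, here's a minimal Python sketch of the same filtering idea. It is not the workbook logic from this project: the function name `filter_rows`, the dictionary-per-row layout, and the "four digits means a year" rule are all assumptions made for illustration.

```python
def filter_rows(rows, start=2013, end=2023):
    """Drop exact duplicate rows and keep only rows with a valid Year in range.

    Assumes each row is a dict of strings with a "Year" key (hypothetical layout).
    """
    seen = set()
    kept = []
    for row in rows:
        # Two rows count as duplicates only if every field matches.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)

        year = str(row.get("Year", "")).strip()
        # Assumption: a usable year is exactly four decimal digits, so "01" is rejected.
        if not (year.isdecimal() and len(year) == 4):
            continue
        if start <= int(year) <= end:
            kept.append(row)
    return kept


sample = [
    {"Year": "2015", "Country": "USA"},
    {"Year": "2015", "Country": "USA"},        # exact duplicate
    {"Year": "01", "Country": "USA"},          # not a year
    {"Year": "2010", "Country": "AUSTRALIA"},  # outside the window
    {"Year": "2023", "Country": "AUSTRALIA"},
]
cleaned = filter_rows(sample)
```

Running duplicates first and the year check second mirrors the order of the steps above, but the order doesn't change the result here.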

3. Cleaning Up the Date Column

Cross-referenced other fields (like Investigator Source and PDFs) to correct inconsistent or missing dates.

Cleaning the date column

Standardized everything into dd/mm/yyyy format because I like things neat.

Data standardization

Extracted months for better trend analysis using Excel’s TEXT function.

Extract months from the dates column
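
The date work above was done in Excel, but the same two moves (one output format, plus a month column) can be sketched in Python. Treat this as an illustration only: the list of input formats in `KNOWN_FORMATS` is an assumption, since the post doesn't list which formats showed up in the raw data, and the helper names are made up for this example.

```python
from datetime import datetime

# Assumed input formats; adjust to whatever the raw column actually contains.
KNOWN_FORMATS = ("%d-%b-%Y", "%Y-%m-%d", "%d/%m/%Y")


def parse_date(raw):
    """Try each assumed format in turn; return None if nothing matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def standardize(raw):
    """Return (dd/mm/yyyy string, month name), or None for unparseable input."""
    parsed = parse_date(raw)
    if parsed is None:
        return None
    return parsed.strftime("%d/%m/%Y"), parsed.strftime("%B")
```

Values that come back as `None` are the ones that would still need the manual cross-referencing described above. Month names depend on the system locale; the default locale gives English names.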

4. Filtering Out Irrelevant Incidents

The “Type” column was a mess, filled with entries like “Unverified” and “Questionable.” I filtered out anything that wasn’t a confirmed shark attack.

Filtering out irrelevant incidents

5. Standardizing Country Names

Countries were all over the place: spelling errors, random capitalization, you name it. I fixed that by converting everything to uppercase and correcting inconsistencies.

Standardizing Country Names
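
A scripted take on the same cleanup might look like the sketch below. The entries in `CORRECTIONS` are hypothetical examples, not the actual fixes made in this project, and `clean_country` is just an illustrative name.

```python
# Hypothetical correction table: maps a known variant to the preferred spelling.
CORRECTIONS = {
    "UNITED STATES": "USA",
    "UNITED STATES OF AMERICA": "USA",
    "AUSTRALA": "AUSTRALIA",
}


def clean_country(raw):
    """Collapse stray whitespace, uppercase, then apply known corrections."""
    name = " ".join(raw.split()).upper()
    return CORRECTIONS.get(name, name)
```

Uppercasing first keeps the correction table small, because each variant only has to be listed once regardless of how it was capitalized.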

6. Cleaning the Activity Column

When people enter “Swimming, Swims, Swam,” they mean the same thing. I consolidated variations into standardized categories.

Activity Consolidation
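
One way to sketch this kind of consolidation in Python is a small keyword table. Only the swimming variants come from the example above; the other categories, the keywords, and the "Other" fallback are assumptions for illustration.

```python
# Category -> keywords that signal it. Checked in order, first match wins.
# Only "Swimming" reflects the post; the rest are hypothetical.
ACTIVITY_KEYWORDS = [
    ("Surfing", ("surf",)),
    ("Swimming", ("swim", "swam")),
    ("Fishing", ("fish",)),
]


def categorize_activity(raw):
    """Map a free-text activity to a standard category via substring match."""
    text = raw.lower()
    for category, keywords in ACTIVITY_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return category
    return "Other"
```

Substring matching is blunt: an entry that mentions two activities lands in whichever category is listed first, so the table order matters.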

7. Fixing Victim Details

Created a “Victims” column to accurately reflect the number of individuals involved.

New victim column

Replaced placeholders in the “Name” column with “Unknown” where necessary.

Victim name clean up

Standardized the “Sex” column (goodbye, “M” and “F,” hello, “Male” and “Female”).

Gender column standardization
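
The name and sex cleanups above boil down to two small lookups. Here's a hedged Python sketch: the placeholder list in `NAME_PLACEHOLDERS` is hypothetical (the post doesn't say which placeholders appeared), and sending unrecognized sex values to "Unknown" is an assumption too.

```python
SEX_LABELS = {"M": "Male", "F": "Female", "MALE": "Male", "FEMALE": "Female"}

# Hypothetical placeholder values seen in a "Name" column.
NAME_PLACEHOLDERS = {"", "male", "female", "unknown", "n/a"}


def clean_sex(raw):
    """Expand M/F codes; anything unrecognized becomes "Unknown" (assumption)."""
    return SEX_LABELS.get(raw.strip().upper(), "Unknown")


def clean_name(raw):
    """Replace placeholder names with "Unknown"; otherwise just trim."""
    text = raw.strip()
    return "Unknown" if text.lower() in NAME_PLACEHOLDERS else text
```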

8. Age Group Categorization

Because analyzing raw ages is messy, I grouped them into categories:

0-3: Baby

4-12: Child

13-19: Teenager

20-39: Adult

40+: Older Adult

I then used XLOOKUP to assign each entry to the right category.

Categorization of age group column
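
The lookup itself was done with XLOOKUP, and the post doesn't spell out the exact formula. As a rough Python equivalent of a "find the bracket this age falls into" lookup, here's a sketch using the brackets listed above. The "Unknown" label for missing or negative ages is an assumption.

```python
from bisect import bisect_right

# Lower bound of each bracket, paired by position with its label.
LOWER_BOUNDS = [0, 4, 13, 20, 40]
LABELS = ["Baby", "Child", "Teenager", "Adult", "Older Adult"]


def age_group(age):
    """Return the label of the bracket whose lower bound is the largest one <= age."""
    if age is None or age < 0:
        return "Unknown"  # assumption: the post does not say how gaps were labeled
    return LABELS[bisect_right(LOWER_BOUNDS, age) - 1]
```

Storing only the lower bounds means each bracket ends where the next one begins, so there are no gaps or overlaps to keep in sync.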

9. Cleaning Up Injury & Fatality Data

Consolidated “Injury” descriptions into broader categories (e.g., “Legs” instead of “Ankle, Thigh, Calf”).

Cleaning up the injury column

Standardized fatality data for clear yes/no insights.

Clean up fatality column
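
Both of these cleanups are category mappings, so they can be sketched the same way in Python. The "Legs" grouping comes from the example above; the "Arms" group, the exact keyword sets, the raw Y/N codes, and the "Other" and "Unknown" fallbacks are assumptions for illustration.

```python
import re

# "Legs" follows the post's example; "Arms" and the keyword sets are hypothetical.
BODY_PART_GROUPS = {
    "Legs": {"leg", "legs", "ankle", "ankles", "thigh", "thighs", "calf", "calves"},
    "Arms": {"arm", "arms", "hand", "hands", "wrist", "wrists"},
}


def injury_category(description):
    """Match whole words so that, e.g., "harmed" does not count as "arm"."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    for category, keywords in BODY_PART_GROUPS.items():
        if words & keywords:
            return category
    return "Other"


def fatal_flag(raw):
    """Normalize assumed Y/N style codes to Yes/No; leave the rest as Unknown."""
    value = raw.strip().upper()
    if value in {"Y", "YES"}:
        return "Yes"
    if value in {"N", "NO"}:
        return "No"
    return "Unknown"
```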

10. Organizing Time of Day

Grouped attacks into morning, afternoon, evening, and night for better trend analysis.

Organizing time of day
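
Here's a minimal Python sketch of time-of-day bucketing. The post doesn't give the cutoffs or the raw time format, so both are assumptions: this version accepts values like "14:30" or "14h30" and uses 05:00, 12:00, 17:00, and 21:00 as hypothetical boundaries.

```python
import re


def time_of_day(raw):
    """Bucket a time string into Morning/Afternoon/Evening/Night.

    Assumed input shapes: "HH:MM" or "HHhMM". Anything else is "Unknown".
    """
    match = re.match(r"^\s*(\d{1,2})[:h](\d{2})", raw)
    if not match:
        return "Unknown"
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour > 23 or minute > 59:
        return "Unknown"
    # Hypothetical cutoffs; pick whatever boundaries suit the analysis.
    if 5 <= hour < 12:
        return "Morning"
    if 12 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 21:
        return "Evening"
    return "Night"
```

Keeping an explicit "Unknown" bucket matters for this dataset, since many records have no usable time at all.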

11. Shark Species Standardization

Removed unconfirmed shark involvement cases.

Cleaned up species naming conventions and categorized unknown sharks as Unidentified.

Species clean up
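
The species cleanup can be sketched in Python along these lines. Only the "Unidentified" label and the three species named later in this post come from the project; the placeholder list and the whole-word matching rule are assumptions, and a real mapping would need many more entries.

```python
import re

# Keyword -> standardized label. Limited to the species this post names.
CANONICAL = {"white": "White shark", "bull": "Bull shark", "tiger": "Tiger shark"}

# Hypothetical values that carry no species information.
PLACEHOLDERS = {"", "unknown", "unidentified", "shark"}


def clean_species(raw):
    """Standardize known species, label blanks as Unidentified, keep the rest."""
    text = " ".join(raw.split())
    if text.lower() in PLACEHOLDERS:
        return "Unidentified"
    # Whole-word match, so "Whitetip" is not mistaken for "white".
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in CANONICAL:
            return CANONICAL[word]
    return text
```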

12. Streamlining Columns

Got rid of redundant data: duplicate case numbers, hyperlinks, and unnecessary columns. Clutter is the enemy of good analysis.

Redundant column clean up

Data Analysis: The Good Stuff

With clean data in hand, I built a dashboard to uncover key insights:

Shark attack dashboard

1. Shark Attacks Per Year

Majority of shark attacks? Non-fatal.

Shark Attacks per year chart

2. Top Shark Species Responsible for Attacks

White sharks, Bull sharks, and Tiger sharks are the usual suspects: big and aggressive.

Shark species that attack the most chart

3. Provoked vs. Unprovoked Incidents

Most attacks are unprovoked, meaning people aren’t out there messing with sharks; they just happen to be in the wrong place at the wrong time.

Provoked vs unprovoked incidents pie chart

4. Injury Trends

Legs are the most commonly injured body part, probably because they move the most in the water, making them prime targets.

Injury patterns chart

5. Geographic Distribution

The USA and Australia have the highest number of shark attacks, making them key areas for further study.

Geographic Distribution chart

6. Time of Day Trends

Most attacks happen in the morning and afternoon, though a lack of time data in many cases means we need more info to confirm strong patterns.

Time of the day shark attacks occur pie chart

Final Thoughts

This project was all about turning messy data into real insights. Once cleaned, the dataset gave us a clear picture of global shark attack trends. Key takeaways? Most shark attacks are non-fatal and unprovoked, the USA and Australia are hotspots, and White, Bull, and Tiger sharks are the ones to watch out for.

Data cleaning may not be the flashiest part of analytics, but trust me, it’s the secret sauce that makes everything work.