dcaribou/transfermarkt-datasets

transfermarkt-datasets

Clean, structured and automatically updated football (soccer) dataset built from Transfermarkt data -- 79,000+ games, 37,000+ players, 1,800,000+ appearances and more, refreshed weekly.

🌍 New: International football data — The dataset now includes countries, national_teams, national team competition games (🏆 World Cup, UEFA Euro, Copa América, AFCON, AFC Asian Cup), and international_caps / international_goals / current_national_team_id on every player profile.

What's in it

The dataset consists of 12 tables: competitions, games, clubs, players, appearances, player valuations, club games, game events, game lineups, transfers, countries and national teams. Each table holds the attributes of one entity plus the ID columns used to join it to the others.

| Table | Description | Scale (rows) |
| --- | --- | --- |
| competitions | Leagues, tournaments and national team competitions | 40+ |
| clubs | Club details, squad size, market value | 400+ |
| players | Player profiles, positions, market values, international caps | 37,000+ |
| games | Match results, lineups, attendance | 79,000+ |
| appearances | One row per player per game played | 1,800,000+ |
| player_valuations | Historical market value records | 500,000+ |
| club_games | Per-club view of each game | 150,000+ |
| game_events | Goals, cards, substitutions | 1,100,000+ |
| game_lineups | Starting and bench lineups | 2,800,000+ |
| transfers | Player transfers between clubs | 87,000+ |
| countries | Country details and confederation membership | 100+ |
| national_teams | National team profiles, squad size, FIFA ranking | 100+ |

Download Dataset · Open in GitHub Codespaces · Kaggle · data.world

ER diagram
classDiagram
direction LR
competitions --|> games : competition_id
competitions --|> clubs : domestic_competition_id
clubs --|> players : current_club_id
clubs --|> club_games : opponent/club_id
clubs --|> game_events : club_id
players --|> appearances : player_id
players --|> game_events : player_id
players --|> player_valuations : player_id
games --|> appearances : game_id
games --|> game_events : game_id
games --|> clubs : home/away_club_id
games --|> club_games : game_id
countries --|> national_teams : country_id
national_teams --|> players : current_national_team_id
class competitions {
    competition_id
    type
}
class games {
    game_id
    home/away_club_id
    competition_id
}
class game_events {
    game_id
    player_id
}
class clubs {
    club_id
    domestic_competition_id
}
class club_games {
    club_id
    opponent_club_id
    game_id
}
class players {
    player_id
    current_club_id
    current_national_team_id
    international_caps
    international_goals
}
class player_valuations{
    player_id
}
class appearances {
    appearance_id
    player_id
    game_id
}
class countries {
    country_id
    country_name
    confederation
}
class national_teams {
    national_team_id
    country_id
    confederation
    fifa_ranking
}
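The relationships in the diagram translate directly into joins on the shared ID columns. The sketch below illustrates one of them (players to appearances via player_id) in Python. It is only an illustration under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for DuckDB, the rows are made up, only the key columns from the diagram are modelled, and the helper names (`appearances_per_player`, `build_toy_db`) are hypothetical.

```python
import sqlite3

# Join path taken from the ER diagram: players -> appearances via player_id.
APPEARANCES_PER_PLAYER = """
SELECT p.player_id, COUNT(a.appearance_id) AS n_appearances
FROM players AS p
LEFT JOIN appearances AS a ON a.player_id = p.player_id
GROUP BY p.player_id
ORDER BY n_appearances DESC, p.player_id
"""


def appearances_per_player(con):
    """Run the join on any connection exposing execute(...).fetchall()."""
    return con.execute(APPEARANCES_PER_PLAYER).fetchall()


def build_toy_db():
    """In-memory stand-in with made-up rows (not real dataset values).

    Column types are assumptions; only the key columns are modelled.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE players (player_id INTEGER PRIMARY KEY);
        CREATE TABLE appearances (
            appearance_id TEXT PRIMARY KEY,
            player_id INTEGER,
            game_id INTEGER
        );
        INSERT INTO players VALUES (1), (2), (3);
        INSERT INTO appearances VALUES
            ('a1', 1, 100), ('a2', 1, 101), ('a3', 2, 100);
        """
    )
    return con


toy = build_toy_db()
rows = appearances_per_player(toy)
print(rows)  # player 3 has no appearances, so the LEFT JOIN reports 0
```

The LEFT JOIN keeps players without any appearance rows. The same pattern should apply to the other arrows in the diagram, for example games to appearances via game_id.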

Getting started

The fastest way to explore the dataset is to download the DuckDB database file -- a single file containing all 12 tables, ready to query.

1. Download the database

Download DuckDB Database

Or via the command line:

curl -LO https://pub-e682421888d945d684bcae8890b0ec20.r2.dev/data/transfermarkt-datasets.duckdb
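If curl is not available, the same download can be sketched with the Python standard library. The URL is the one given above; the `download` helper and the local file name are illustrative choices, not part of the project.

```python
import shutil
import urllib.request
from pathlib import Path

# Same URL as in the curl command above.
DB_URL = (
    "https://pub-e682421888d945d684bcae8890b0ec20.r2.dev"
    "/data/transfermarkt-datasets.duckdb"
)


def download(url: str, dest: str) -> Path:
    """Stream `url` to `dest` without holding the whole file in memory."""
    target = Path(dest)
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling `download(DB_URL, "transfermarkt-datasets.duckdb")` should leave the database file in the current directory, ready for the next step.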

2. Query with any DuckDB client

Open the file with the DuckDB CLI, Python, R, or any compatible client:

-- DuckDB CLI
-- $ duckdb transfermarkt-datasets.duckdb

SHOW TABLES;

SELECT player_id, name, position, market_value_in_eur
FROM players
WHERE position = 'Attack'
ORDER BY market_value_in_eur DESC
LIMIT 10;

-- player_id | name            | position | market_value_in_eur
-- 418560    | Erling Haaland  | Attack   | 200000000
-- 342229    | Kylian Mbappé   | Attack   | 180000000
-- 371998    | Vinicius Junior | Attack   | 180000000
-- 433177    | Bukayo Saka     | Attack   | 130000000
-- ...
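The same query can be run from Python. The following is a minimal sketch, assuming the third-party duckdb package is installed (for example via pip) and the database file sits in the working directory. The helper names `open_database` and `top_players` are hypothetical; the table and column names are the ones used in the SQL above.

```python
def open_database(path="transfermarkt-datasets.duckdb"):
    """Open the downloaded file read-only (needs the third-party duckdb package)."""
    import duckdb  # imported lazily so top_players has no hard dependency

    return duckdb.connect(path, read_only=True)


def top_players(con, position, limit=10):
    """Most valuable players for a position, highest market value first."""
    sql = (
        "SELECT player_id, name, position, market_value_in_eur "
        "FROM players WHERE position = ? "
        "ORDER BY market_value_in_eur DESC "
        f"LIMIT {int(limit)}"  # int() keeps the interpolated value numeric
    )
    # The position is passed as a bound parameter rather than pasted into the SQL.
    return con.execute(sql, [position]).fetchall()
```

A typical call would be `top_players(open_database(), "Attack")`, which returns a list of tuples in the column order of the SELECT.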

Tip: You can also query individual CSV files remotely with DuckDB -- no download required:

INSTALL httpfs; LOAD httpfs;
SELECT * FROM read_csv_auto('https://pub-e682421888d945d684bcae8890b0ec20.r2.dev/data/players.csv.gz') LIMIT 10;
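For a quick look at a CSV file without DuckDB at all, the gzipped file can also be streamed with the Python standard library. This is an alternative to the query above, not a DuckDB feature, and it is only a sketch: the `head_remote_csv` helper is hypothetical, and UTF-8 encoding with a header row is assumed.

```python
import csv
import gzip
import itertools
import urllib.request

# Same URL as in the DuckDB query above.
PLAYERS_CSV_URL = (
    "https://pub-e682421888d945d684bcae8890b0ec20.r2.dev/data/players.csv.gz"
)


def head_remote_csv(url, limit=10):
    """Return the first `limit` rows of a gzipped CSV as dicts.

    The response is decompressed on the fly, and reading stops once
    `limit` rows have been parsed. Assumes UTF-8 text with a header row.
    """
    with urllib.request.urlopen(url) as response:
        with gzip.open(response, mode="rt", encoding="utf-8", newline="") as text:
            return list(itertools.islice(csv.DictReader(text), limit))
```

`head_remote_csv(PLAYERS_CSV_URL)` mirrors the `LIMIT 10` in the SQL; all values come back as strings, since the csv module does no type inference.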

Community

Getting in touch

To keep things tidy, there are two simple guidelines:

  • Keep the conversation centralised and public by getting in touch via the Discussions tab.
  • Avoid topic duplication by having a quick look at the FAQs first.

Sponsoring

Maintenance of this project is made possible by sponsors. If you'd like to sponsor this project, you can use the Sponsor button at the top.

Contributing

Contributions to transfermarkt-datasets are most welcome. If you want to contribute new fields or assets to this dataset, the instructions are quite simple:

  1. Fork the repo
  2. Set up your local environment
  3. Populate the data directory
  4. Start modifying assets or creating new ones in the dbt project
  5. If it's all looking good, create a pull request with your changes 🚀

If you run into any issues while following these steps, please get in touch.

For full setup and workflow details, see the Developer guide.
