Adomas Fiseris
Data Scientist, AVP at Citizens
I’m a risk data scientist and AVP at Citizens focused on consumer bank compliance and operational risk analytics. I design and orchestrate end-to-end Python and SQL pipelines on a cloud-native analytics platform to turn large-scale transaction data into production-ready risk scores and decision tools. My work blends feature engineering and model development on modern ML frameworks with scalable ETL, rigorous data quality monitoring, and audit-ready validation in partnership with Risk, Compliance, and Engineering teams. I keep my practice sharp through Kaggle competitions, benchmarking new methods and bringing proven ideas back into regulated production environments.
Current focus: NLP for conduct risk, streaming inference, production observability.
Education
Georgia Tech
Program: Master of Science in Analytics, Computational Track - Data Science and ML
Start: January 2026 (two-year program)
ISM University of Management and Economics
Degree: Bachelor's degree in Economics
Years: 2012 - 2016
Activities: Erasmus exchange at the University of Economics, Prague (VSE), spring semester 2015
GPA: 3.7
Honors: 100 Talent scholarship
Kaggle & Trading Competitions
Hull Tactical - Market Prediction (Kaggle 2025)
Competition: Kaggle Hull Tactical: Market Prediction | Placement: Pending (competition active) | Best public score: 0.494 (Sharpe-like metric)
Approach: streaming ensemble of LightGBM and ElasticNet models with a volatility-aware allocation policy.
Overview: predict daily S&P 500 excess returns and convert them into allocations between 0 and 2 while staying within 120% of index volatility. Kaggle evaluation streams one day at a time via the API to prevent lookahead.
Feature engineering:
- Explicit feature contract so the inference schema matches training during single-row streaming.
- Rolling means, standard deviations, and z-scores over 21-, 42-, and 63-day windows for key signals such as lagged_forward_returns.
- Short lags such as 2-day and 5-day for local momentum and mean reversion.
- Group proxies for macro, rates, and sentiment capturing level, one day delta, five day trend, and spread.
- Expanding mean imputer with stored scaling stats so inputs are fully imputed and standardized exactly as in training.
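A minimal sketch of the rolling-window and lag features; the `__vol__w21`-style column naming follows the convention used elsewhere on this page, but the function itself is illustrative, not the production pipeline:

```python
import pandas as pd

def add_rolling_features(df: pd.DataFrame, col: str = "lagged_forward_returns",
                         windows=(21, 42, 63), lags=(2, 5)) -> pd.DataFrame:
    """Rolling mean/vol/z-score plus short lags for one signal column."""
    out = df.copy()
    for w in windows:
        m = out[col].rolling(w, min_periods=w).mean()
        s = out[col].rolling(w, min_periods=w).std()
        out[f"{col}__mean__w{w}"] = m
        out[f"{col}__vol__w{w}"] = s
        out[f"{col}__z__w{w}"] = (out[col] - m) / s
    # short lags for local momentum and mean reversion
    for lag in lags:
        out[f"{col}__lag{lag}"] = out[col].shift(lag)
    return out
```

With `min_periods=w`, early rows stay NaN rather than being computed on partial windows, which keeps the single-row streaming contract honest.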
Return ensemble:
- Four LightGBM boosters on the engineered set (about 289 features) averaged into a single gradient boosting signal.
- Four ElasticNet regressions serialized as coefficient and intercept bundles, averaged into a linear signal.
- Blend as ensemble_pred = w_lgb * lgb_mean + w_enet * elastic_mean with weights tuned offline.
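The blend step itself reduces to a few lines; the weights below are placeholders, not the offline-tuned values:

```python
import numpy as np

def blend(lgb_preds, enet_preds, w_lgb=0.6, w_enet=0.4):
    """Average each model family's predictions, then take a weighted blend."""
    lgb_mean = np.mean(lgb_preds, axis=0)       # mean over the 4 LightGBM boosters
    elastic_mean = np.mean(enet_preds, axis=0)  # mean over the 4 ElasticNet models
    return w_lgb * lgb_mean + w_enet * elastic_mean
```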
Volatility aware allocation:
- SmoothRegimePolicy maps predicted excess return to allocation using a leverage coefficient k_t based on rolling volatility (lagged_forward_returns__vol__w21). Low volatility raises sensitivity; high volatility de-leverages automatically.
- VolGovernor tracks an EWMA of strategy variance versus market variance and scales allocations to target the 1.2x volatility cap.
- Daily loop: compute ensemble_pred, apply SmoothRegimePolicy, adjust with VolGovernor, clip to [0, 2], and return allocation to the API.
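A compressed sketch of the allocation logic above. The exact functional form of SmoothRegimePolicy and the VolGovernor parameters are illustrative assumptions, not the submitted values:

```python
import numpy as np

def smooth_regime_allocation(pred: float, rolling_vol: float, k: float = 50.0) -> float:
    """Map predicted excess return to an allocation in [0, 2].
    Sensitivity k_t rises as rolling volatility falls (assumed form)."""
    k_t = k / max(rolling_vol, 1e-8)
    return float(np.clip(1.0 + k_t * pred, 0.0, 2.0))

class VolGovernor:
    """EWMA tracker of strategy vs. market variance, scaling toward the 1.2x cap."""
    def __init__(self, alpha: float = 0.05, cap: float = 1.2):
        self.alpha, self.cap = alpha, cap
        self.strat_var = self.mkt_var = 1e-8
    def scale(self, alloc: float, strat_ret: float, mkt_ret: float) -> float:
        self.strat_var = (1 - self.alpha) * self.strat_var + self.alpha * strat_ret ** 2
        self.mkt_var = (1 - self.alpha) * self.mkt_var + self.alpha * mkt_ret ** 2
        ratio = (self.strat_var / self.mkt_var) ** 0.5
        if ratio > self.cap:          # de-leverage only when over the vol target
            alloc *= self.cap / ratio
        return float(np.clip(alloc, 0.0, 2.0))
```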
MAP - Charting Student Math Misunderstandings (Kaggle 2025)
Competition: Kaggle MAP: Charting Student Math Misunderstandings | Placement: 962/1,857 | Private score: 0.94310 MAP@3
Approach: deep learning ensemble (DeepSeek 7B, Gemma2-9B, ModernBERT-base) targeting ranked Category:Misconception labels.
Problem: diagnose math misconceptions from open-ended explanations (ages 9–16). For each response, predict up to three Category:Misconception labels covering correctness, presence of misconception, and specific misconception, evaluated with MAP@3.
Modeling strategy:
- Unified label space: canonical Category:Misconception set, normalized label variants, and reindexed outputs into a shared class order before fusion.
- Correctness aware inputs: derived question level correctness flags and injected them as text prompts and ModernBERT special tokens to guide predictions.
- Ensemble: ModernBERT-base sequence classifier (64 class focus) with templated question, answer, explanation, plus a correctness token, DeepSeek 7B classifier over the full label set, and Gemma2-9B IT with LoRA adapters and a compact correctness prompt.
- Fusion and decoding: log probability blends (60% DeepSeek, 35% Gemma2-9B IT, 5% ModernBERT) with top three labels selected from the unified space.
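The fusion step can be sketched as a weighted sum of per-class log probabilities in the shared label order, followed by top-3 decoding; array names are illustrative:

```python
import numpy as np

def fuse_logprobs(deepseek_lp, gemma_lp, modernbert_lp,
                  weights=(0.60, 0.35, 0.05), top_k=3):
    """Blend per-class log probabilities (already reindexed into one label
    space) and return the top-k class indices for MAP@3 submission."""
    blended = (weights[0] * np.asarray(deepseek_lp)
               + weights[1] * np.asarray(gemma_lp)
               + weights[2] * np.asarray(modernbert_lp))
    return np.argsort(blended)[::-1][:top_k]
```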
Predict Calorie Expenditure (Kaggle Playground S5E5, 2025)
Competition: Kaggle Playground Series S5E5: Predict Calorie Expenditure | Placement: 1,533 / 4,316 teams | Best public score: 0.05794 RMSLE | Best private score: 0.05942 RMSLE
Final model: stacked ensemble of XGBoost, CatBoost, and a neural network with a Ridge meta learner.
Overview: tabular regression to predict workout calorie burn on a large synthetic dataset derived from Calories Burnt Prediction. Target is continuous calories, metric is RMSLE, and submissions require id with Calories for each test row.
Feature engineering:
- BMI computed as Weight divided by (Height divided by 100) squared.
- Timed intensity as Duration multiplied by Heart_Rate.
- Heart rate zone percent of max as Heart_Rate divided by (220 - Age).
- Mifflin-St Jeor BMR as a sex-specific basal metabolic rate estimate.
- One hot encoding for Sex where needed.
Base models (Optuna tuned, 5-fold CV):
- CatBoost with depth 10, about 1.6k trees, learning rate around 0.083.
- XGBoost with histogram based trees, tuned depth, learning rate, and regularization.
- Neural network with three dense layers (256 to 32 to 64), GELU activations, dropout around 0.17, early stopping via SciKeras.
Ensemble: Ridge stack on out-of-fold predictions from XGBoost, CatBoost, and the neural network to produce the final leaderboard submission.
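A minimal sketch of the stacking step; fitting the meta-learner in log1p space is an assumption that matches the RMSLE metric, not a confirmed detail of the submission:

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_stack(oof_preds: np.ndarray, y: np.ndarray,
              test_preds: np.ndarray) -> np.ndarray:
    """Ridge meta-learner over out-of-fold base-model predictions
    (columns = XGBoost, CatBoost, NN, assumed already in log1p space)."""
    meta = Ridge(alpha=1.0)
    meta.fit(oof_preds, np.log1p(y))        # optimize in log space for RMSLE
    return np.expm1(meta.predict(test_preds))
```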
Link: GitHub repo (modeling done outside Kaggle, CSV predictions submitted).
Jane Street Real-Time Market Data Forecasting (Kaggle 2024-2025)
Competition: Kaggle Jane Street Real-Time Market Data Forecasting | Placement: N/A (submission failed in later test phases) | Best leaderboard score: 0.006171 weighted R squared on responder_6
Models: XGBoost gradient boosted trees and a two layer LSTM evaluated separately, with the XGBoost run producing the top score.
Overview: real-time responder_6 forecasting on roughly 4.5 million rows per phase with 79 anonymized features and nine responders. Submission is via the streaming evaluation API with strict runtime constraints, reflecting non-stationary, heavy-tailed market data and evolving private tests.
Data and validation:
- Used the 79 raw features plus optional one-day lags of all responders from lags.parquet, keeping feature engineering minimal to avoid regime overfitting and to stay within memory and time limits.
- Chronological validation: trained on early periods and held out about the last 100 days as a pseudo live set to mirror leaderboard behavior.
- Implemented the official weighted zero mean R squared metric inside XGBoost and PyTorch loops for early stopping and aligned model selection.
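The metric as described is a zero-mean R squared, i.e. one minus the weighted squared error over the weighted sum of squared targets, with no mean subtraction:

```python
import numpy as np

def weighted_zero_mean_r2(y_true, y_pred, weights):
    """R² = 1 - sum(w * (y - ŷ)²) / sum(w * y²); used for early stopping."""
    y_true, y_pred, w = map(np.asarray, (y_true, y_pred, weights))
    return 1.0 - np.sum(w * (y_true - y_pred) ** 2) / np.sum(w * y_true ** 2)
```

Note that predicting all zeros scores exactly 0 under this metric, which makes small positive scores like 0.006 meaningful on heavy-tailed market data.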
XGBoost setup: histogram tree method, tuned with Optuna (max_depth 4, learning_rate about 0.10, subsample about 0.71, colsample_bytree about 0.73, gamma about 0.26), trained around 75 rounds with early stopping; best public and private score was 0.006171.
Two layer LSTM: two LSTM layers with 32 hidden units each followed by a dense head, dropout about 0.4, sequence length 1. Trained with weighted MSE, Adam at lr 5e-5 and weight decay 1e-3, early stopping on weighted R squared. Best LSTM score was about 0.004329, lagged responder variant slightly negative.
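A PyTorch sketch of that architecture; the input width matches the 79 raw features, while the head layout is a plausible reading of "dense head" rather than the exact submitted code:

```python
import torch
import torch.nn as nn

class ResponderLSTM(nn.Module):
    """Two stacked LSTM layers (32 hidden units each) with a linear head."""
    def __init__(self, n_features: int = 79, hidden: int = 32, dropout: float = 0.4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                            batch_first=True, dropout=dropout)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):             # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predict responder_6 from the last step
```

With sequence length 1, the recurrence carries no history, so the model behaves closer to a gated feed-forward network than a true sequence model, one reason the trees dominated here.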
Ensembling: no blend deployed in the final submission; compared tree and RNN families to understand when trees dominate short-horizon market data. Future work notes include blending XGBoost and LSTM for regime diversification.
Link: GitHub repo for the local modeling code and utilities.
Core stack
Cloud: AWS (EC2 · RDS · S3 · SageMaker)
Data: PostgreSQL
Orchestration & CI: GitFlow · Jenkins
Containers: Docker
MLOps: MLflow
Modeling: PyTorch · JAX · H2O.ai
Language: Python