Aviv Gerbi

Data Science portfolio

View the Project on GitHub avivgerbi/portfolio

Data Scientist

Technical Skills: Python, SQL, Deep Learning, NLP, Computer Vision, GenAI, LLMs

Education

Work Experience

R&D Group Lead @ Israeli Military Intelligence - Unit 8200 (2024 - Present)

Software Engineer Team Lead @ Israeli Military Intelligence - Unit 8200 (2022 - 2024)

Projects

Hebrew Sentence-BERT – Semantic Representation Learning for Low-Resource Languages

We developed a Sentence-BERT (SBERT) model tailored for Hebrew, applying representation learning and contrastive training to generate high-quality sentence embeddings. Using the Hebrew Natural Language Inference (NLI) dataset for training and the Hebrew Semantic Textual Similarity (STS) benchmark for evaluation, we fine-tuned four transformer architectures (AlephBERT, mBERT, DictaBERT, and RoBERTa) within the SentenceTransformer framework. The best-performing configuration, DictaBERT with Contrastive Loss, achieved a Pearson correlation of 0.648 and a Spearman correlation of 0.661 on the STS benchmark, outperforming all other models in capturing subtle semantic relationships in Hebrew text. Beyond the research contribution, this project showcases the data science workflow for low-resource NLP, including data preprocessing, model evaluation, and optimization under computational constraints.
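To make the evaluation step concrete, here is a minimal sketch of how STS-style scoring is commonly done: compute the cosine similarity of each sentence-embedding pair, then correlate those similarities with the human-annotated scores using Pearson and Spearman. This is not the project's actual evaluation code. The embeddings and gold scores are toy values invented for illustration, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation: linear agreement between two score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data (invented): 2-d "embeddings" for four sentence pairs and
# hypothetical gold similarity scores on a 0-5 scale.
embedding_pairs = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([1.0, 0.0], [0.5, 0.5]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 1.0], [-1.0, 0.2]),
]
gold_scores = [4.8, 3.0, 1.0, 0.2]

predicted = [cosine_similarity(u, v) for u, v in embedding_pairs]
print("Pearson:", pearson(predicted, gold_scores))
print("Spearman:", spearman(predicted, gold_scores))
```

Spearman only checks whether the model orders the pairs the same way the annotators did, while Pearson also rewards a linear relationship between the two score scales, which is why the two are usually reported together.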

EyeVision – AI-Powered Violence Detection System for Daycares

EyeVision is an AI-powered system that uses computer vision and video analytics to automatically detect signs of violence in daycares from regular CCTV footage. Leveraging deep learning models for both visual and motion analysis, our solution achieved over 90% detection accuracy in real time and delivers alerts through a dedicated app for parents and authorities. Beyond its technical innovation, EyeVision presents a scalable business solution with strong social impact potential, and it was awarded First Place in the College of Management’s Outstanding Final Project Competition.

ChatGPT The Tweets – Sentiment Analysis on Social Media Impact

ChatGPT The Tweets is a large-scale data science and machine learning pipeline designed to analyze public sentiment and trends about ChatGPT on Twitter. Using PySpark for scalable big data processing and logistic regression for sentiment classification, we analyzed over 190,000 tweets to uncover correlations between sentiment, geography, and user occupation. Additionally, we applied N-Gram and TF-IDF models to extract trending keywords and popular discussion topics. Beyond its analytical depth, this project demonstrates how AI-driven social listening can inform business and product strategy through real-time public opinion insights.
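The core idea of the pipeline (n-gram features, TF-IDF weighting, then logistic regression) can be sketched in a few lines of plain Python. This is a deliberately small, single-machine illustration and not the project's PySpark code. The four tweets and their labels are invented, the function names are hypothetical, and the smoothed IDF form log((N + 1) / (df + 1)) is one common variant chosen here as an assumption.

```python
import math
from collections import Counter


def featurize(text, max_n=2):
    """Lowercase, split on whitespace, and emit unigrams plus bigrams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(t for d in docs for t in set(featurize(d)))
    return {t: math.log((n_docs + 1) / (df + 1)) for t, df in doc_freq.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector (raw counts times IDF); unseen terms are dropped."""
    counts = Counter(featurize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, vec):
    """Predicted probability that the tweet is positive."""
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in vec.items()))


def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Logistic regression fitted with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = score(weights, bias, vec) - y  # gradient of the log loss
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias


# Invented toy corpus: 1 = positive sentiment, 0 = negative sentiment.
tweets = [
    "chatgpt is amazing for coding",
    "love how chatgpt helps me write",
    "chatgpt gave wrong answers again",
    "hate the wrong chatgpt output",
]
labels = [1, 1, 0, 0]

idf = fit_idf(tweets)
vectors = [tfidf_vector(t, idf) for t in tweets]
weights, bias = train_logreg(vectors, labels)

new_tweet = tfidf_vector("chatgpt is amazing", idf)
print("P(positive):", score(weights, bias, new_tweet))
```

Note that "chatgpt" appears in every tweet, so its IDF is zero and it contributes nothing to the classifier. That is the property that makes TF-IDF useful for surfacing distinctive keywords rather than the topic word itself.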

Reinforcement Learning for Flappy Bird – Comparing SARSA and Q-Learning

In this project, we applied Reinforcement Learning (RL) techniques to train an AI agent to play Flappy Bird using the SARSA and Q-Learning algorithms. We built a custom state-space preprocessing pipeline that reduced over 10^21 possible states to only 10,000, enabling efficient tabular learning. The project incorporated reward shaping, epsilon-decay exploration, and extensive hyperparameter experimentation to evaluate performance and stability. Results showed that SARSA with epsilon decay achieved the best balance between exploration and exploitation, outperforming the other configurations with an average reward of 27.55 and a maximum score of 343.
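The difference between the two algorithms comes down to one line in the update rule, which the following sketch illustrates. It is a generic tabular implementation under stated assumptions, not the project's code. The state features (horizontal and vertical offsets to the next pipe gap), the bin size, the action encoding, and all hyperparameter values are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # assumed encoding: 0 = do nothing, 1 = flap


def discretize(dx, dy, bin_size=10):
    """Collapse pixel offsets into coarse grid cells to shrink the state space."""
    return (int(dx // bin_size), int(dy // bin_size))


def decayed_epsilon(episode, start=1.0, end=0.01, decay=0.995):
    """Exponentially decaying exploration rate with a lower floor."""
    return max(end, start * decay ** episode)


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Off-policy: bootstrap from the best action available in s_next."""
    target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: bootstrap from the action the agent actually takes next."""
    target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])


# One transition applied to two separate Q-tables to show the difference.
s, s_next = discretize(37, -12), discretize(27, -5)
q_table = defaultdict(float)
sarsa_table = defaultdict(float)
for table in (q_table, sarsa_table):
    table[(s_next, 0)] = 1.0
    table[(s_next, 1)] = 3.0

q_learning_update(q_table, s, 1, r=1.0, s_next=s_next, alpha=0.5, gamma=0.9)
sarsa_update(sarsa_table, s, 1, r=1.0, s_next=s_next, a_next=0,
             alpha=0.5, gamma=0.9)

rng = random.Random(0)
print("Q-Learning value:", q_table[(s, 1)])
print("SARSA value:", sarsa_table[(s, 1)])
print("Greedy action in s_next:", epsilon_greedy(q_table, s_next, 0.0, rng))
```

Because SARSA's target uses the action actually chosen (including exploratory ones), its value estimates account for the risk of exploring, whereas Q-Learning always assumes the greedy action will follow.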

Talks & Lectures