Get with the Times: PyTabKit for better Tabular Machine Learning over Sk-Learn (CODE Included)

For too long, Scikit-Learn has been the go-to library for machine learning on tabular data, offering a broad collection of algorithms, preprocessing utilities, and model evaluation tools. Yes, it still works perfectly well, but why keep driving your grandfather’s run-down ’58 Chevy? Let it remain an antique. Enter PyTabKit: a new framework designed to replace Scikit-Learn for classification and regression on tabular data, leveraging cutting-edge techniques like RealMLP and improved default hyperparameters for GBDTs.
Full article link: arXiv:2407.04491 (https://arxiv.org/abs/2407.04491)
Citation: @inproceedings{holzmuller2024better,title={Better by default: {S}trong pre-tuned {MLPs} and boosted trees on tabular data}, author={Holzm{\"u}ller, David and Grinsztajn, Leo and Steinwart, Ingo}, booktitle = {Neural {Information} {Processing} {Systems}},year={2024}}
Why Move Beyond Scikit-Learn?
Scikit-Learn provides a solid foundation for model development, but it lacks highly optimized deep learning methods for tabular data and strong pre-tuned default hyperparameters. Recent research has demonstrated that:
RealMLP Can Rival GBDTs
- Deep learning models for tabular data have traditionally required extensive…