Hotel Reservations
The hospitality industry faces a significant challenge with booking cancellations. When a guest cancels a reservation, hotels lose revenue and struggle to re-allocate rooms on short notice. High cancellation rates complicate resource planning, staffing, and inventory management.
You have been hired as a Data Scientist for a major hotel chain. Your task is to build a binary classification model to predict whether a booking will be canceled (is_canceled = True) or not. By identifying high-risk bookings, the hotel can implement strategic overbooking or targeted deposit policies.
Task
Using the provided historical data, train a model to predict the is_canceled status for the reservations in the test set. You are provided with:
- train.csv: Contains the features and the target labels (is_canceled) for training your model.
- test.csv: Contains the features for the reservations you need to predict. The target column is omitted.
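Because test.csv carries no labels, you can only estimate your F1 Score before submitting by holding out part of train.csv. The sketch below is one possible approach, not a required one, and uses only the Python standard library. It assumes is_canceled is stored as the text True/False (or 1/0). The helper names (load_rows, parse_label, stratified_split) are hypothetical. The split is stratified, so the share of canceled bookings stays roughly the same in both parts.

```python
import csv
import random
from typing import Dict, List, Tuple

Row = Dict[str, str]


def load_rows(path: str) -> List[Row]:
    """Read a CSV file such as train.csv into a list of {column: value} dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def parse_label(value: str) -> bool:
    """Convert an is_canceled cell to bool (assumes 'True'/'False' or '1'/'0')."""
    v = value.strip().lower()
    if v in ("true", "1"):
        return True
    if v in ("false", "0"):
        return False
    raise ValueError(f"unexpected is_canceled value: {value!r}")


def stratified_split(
    rows: List[Row], holdout_frac: float = 0.2, seed: int = 42
) -> Tuple[List[Row], List[Row]]:
    """Split rows into (train, validation), keeping the cancellation rate similar in both."""
    rng = random.Random(seed)
    # Group rows by label so each class is split separately.
    by_label: Dict[bool, List[Row]] = {True: [], False: []}
    for row in rows:
        by_label[parse_label(row["is_canceled"])].append(row)

    train: List[Row] = []
    valid: List[Row] = []
    for group in by_label.values():
        rng.shuffle(group)  # the group lists are local, so shuffling in place is safe
        n_valid = round(len(group) * holdout_frac)
        valid.extend(group[:n_valid])
        train.extend(group[n_valid:])
    return train, valid


# Usage (requires train.csv in the working directory):
# rows = load_rows("train.csv")
# train_rows, valid_rows = stratified_split(rows, holdout_frac=0.2)
```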
Dataset
- id: Unique identifier for the reservation.
- hotel: The name/city of the hotel.
- lead_time: Number of days between the booking date and the arrival date.
- arrival_date_month: The month of arrival.
- deposit_type: Type of deposit made for the booking (No Deposit, Non Refund, Refundable).
- country: Country of origin of the guest.
- arrival_date_week_number: Week number of the year for arrival.
- stays_in_weekend_nights: Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay.
- stays_in_week_nights: Number of week nights (Monday to Friday) the guest stayed or booked to stay.
- is_repeated_guest: Indicates whether the guest has stayed at the hotel before (0 or 1).
- required_car_parking_spaces: Number of car parking spaces required by the guest.
- adults/children/babies: Number of guests in each category.
- meal: Type of meal plan booked.
- booking_changes: Number of changes made to the booking.
- previous_cancellations: Number of previous bookings that were cancelled by the customer.
- total_of_special_requests: Number of special requests made by the guest (e.g. twin bed, high floor).
- customer_type: Type of booking (Contract, Group, Transient, etc.).
- resort_status: Boolean indicating if the property is a Resort Hotel.
- is_canceled: The target variable (True if canceled, False otherwise).
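Several of these columns can be turned into numeric features that a model can use. The sketch below is a minimal example under stated assumptions. It treats adults, children and babies as three separate columns. It treats missing numeric cells (empty or NA) as 0, which is an assumption. It expects arrival_date_month to hold English month names or month numbers. The feature list is illustrative, build_features is a hypothetical name, and the choice of features is yours. For brevity it leaves out the categorical columns country, meal, customer_type and hotel, which would each need their own encoding.

```python
from typing import Dict, List

# Category values taken from the deposit_type description; month names assumed to be English.
DEPOSIT_TYPES = ["No Deposit", "Non Refund", "Refundable"]
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def to_number(value: str) -> float:
    """Parse a numeric cell; empty or 'NA' cells become 0.0 (an assumption)."""
    v = value.strip()
    if v == "" or v.upper() == "NA":
        return 0.0
    return float(v)


def month_number(value: str) -> float:
    """Map a month name such as 'July' (or a number such as '7') to 7.0."""
    v = value.strip()
    if v.isdigit():
        return float(int(v))
    return float(MONTHS.index(v.capitalize()) + 1)


def build_features(row: Dict[str, str]) -> List[float]:
    """Turn one reservation (a dict of CSV cells) into a numeric feature vector."""
    total_nights = to_number(row["stays_in_weekend_nights"]) + to_number(row["stays_in_week_nights"])
    party_size = sum(to_number(row[c]) for c in ("adults", "children", "babies"))
    features = [
        to_number(row["lead_time"]),
        total_nights,  # derived: full length of stay
        party_size,    # derived: adults + children + babies
        to_number(row["previous_cancellations"]),
        to_number(row["booking_changes"]),
        to_number(row["total_of_special_requests"]),
        to_number(row["required_car_parking_spaces"]),
        to_number(row["is_repeated_guest"]),
        month_number(row["arrival_date_month"]),
    ]
    # One-hot encode deposit_type: one 0/1 slot per known category.
    features.extend(1.0 if row["deposit_type"] == d else 0.0 for d in DEPOSIT_TYPES)
    return features
```

The resulting vectors can be passed to any classifier that accepts numeric inputs.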
Evaluation
Submissions are evaluated based on the F1 Score.
- 100 points: F1 Score at or above 0.75.
- 0 to 99 points: F1 Score between 0.60 and 0.74 (calculated linearly).
- 0 points: F1 Score below 0.60.
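F1 is the harmonic mean of precision and recall, F1 = 2TP / (2TP + FP + FN). The sketch below treats is_canceled = True as the positive class. This is an assumption, though it is the usual convention for this kind of task. Because F1 depends on where you draw the line between True and False, you can tune the cutoff on a validation split when a model outputs a cancellation score. The functions f1_score and best_threshold are hypothetical standard-library helpers and do not come from any particular library.

```python
from typing import List, Sequence, Tuple


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """F1 with is_canceled = True as the positive class: 2TP / (2TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    # With no true positives, precision and recall are 0 (or undefined), so F1 is 0.
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def best_threshold(
    y_true: Sequence[bool], scores: Sequence[float], candidates: List[float]
) -> Tuple[float, float]:
    """Return (cutoff, F1) for the candidate cutoff that maximises F1 on validation data."""
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1 = f1_score(y_true, [s >= t for s in scores])
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Tune the cutoff on held-out rows rather than on the rows used to fit the model. Otherwise the F1 estimate will be optimistic.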
Submission Format
You must submit a CSV file with exactly two columns: id and answer. The answer column should contain your boolean predictions (True or False).
id,answer
1,False
2,True
3,False
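The sketch below shows one way to produce this file with Python's csv module. format_submission and write_submission are hypothetical helpers, and "submission.csv" is only an example file name. The explicit "True"/"False" mapping keeps the answer column in the exact form shown above. The lineterminator="\n" argument makes the output use the same line endings as the sample.

```python
import csv
import io
from typing import Iterable, Tuple


def format_submission(predictions: Iterable[Tuple[str, bool]]) -> str:
    """Render (id, predicted is_canceled) pairs as the two-column id,answer CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "answer"])
    for row_id, canceled in predictions:
        writer.writerow([row_id, "True" if canceled else "False"])
    return buf.getvalue()


def write_submission(path: str, predictions: Iterable[Tuple[str, bool]]) -> None:
    """Write the submission to disk, e.g. write_submission("submission.csv", preds)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(format_submission(predictions))
```

Each id should come from test.csv, with exactly one row per reservation.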