This should be doable using the Tobii Gaze SDK. The key aspects of the calibration process to consider are that the person has to look at a precise point of the Active Display Area at a precise time, and that how long the person has to look at each point is not fixed but determined dynamically. Each point is a single command to the eye tracker, and a callback is invoked when the eye tracker has collected enough eye-gaze data for that point.
The animations in the built-in calibration are there to help the person focus on the precise calibration point. In your case you would have to make sure the person can focus on a small enough point that the eye-gaze data the eye tracker collects for it is valid.
In the Tobii Gaze SDK C-API download you will find the documented calibration API in the tobiigaze_calibration.h header file. There are two C++ samples illustrating the use of the API: Samples/MinimalCalibration and Samples/wxWidgetsCalibrationSample (in the latter, take a look at CalibrationViewModel.cpp).