Asking for input before each calibration point sounds like a good idea.
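The flow of pausing for user input before each calibration point can be sketched in Python. `CalibrationBackend` and its methods `start`, `add_point` and `compute_and_apply` are hypothetical names standing in for whatever calibration calls your own SDK wrapper exposes. They are not the Gaze SDK API. The normalized screen coordinates are also an assumption.

```python
from typing import Callable, Iterable, Protocol, Tuple

# Assumed: calibration points in normalized screen coordinates (0..1).
Point = Tuple[float, float]


class CalibrationBackend(Protocol):
    """Hypothetical wrapper around the eye tracker's calibration calls."""

    def start(self) -> None: ...
    def add_point(self, point: Point) -> None: ...
    def compute_and_apply(self) -> bool: ...


def prompt_user(point: Point) -> None:
    """Default confirmation step: block until the user presses Enter."""
    input(f"Look at the target at {point} and press Enter... ")


def run_calibration(
    backend: CalibrationBackend,
    points: Iterable[Point],
    wait_for_user: Callable[[Point], None] = prompt_user,
) -> bool:
    """Collect one calibration sample per point, asking for input before each."""
    backend.start()
    for point in points:
        # Wait for confirmation that the user is fixating the target
        # before sampling, instead of advancing on a fixed timer.
        wait_for_user(point)
        backend.add_point(point)
    # Returns whatever success flag the (hypothetical) backend reports.
    return backend.compute_and_apply()
```

Passing `wait_for_user` as a parameter keeps the input mechanism (keyboard, button, voice) separate from the calibration sequence, which also makes the loop easy to test with a fake backend.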
Regarding mixing the EyeX SDK and the Gaze SDK: it is technically possible, but it might be a bit complicated. Since you require custom calibration, do not depend on on-screen interaction, and are interested in porting your software to Linux (if/when you have an eye tracker that runs on Linux), I would recommend just using the Gaze SDK.
Read more about the differences between Tobii Gaze SDK and Tobii EyeX SDK.