NumPy includes lots of functions and macros that make it pretty easy to access the data of an ndarray object within a C or C++ extension. Given a 1D ndarray called v, one can access element i with PyArray_GETPTR1(v, i). So if you want to copy each element in the array to a std::vector of the same type, you can iterate over each element and copy it, like so (I'm assuming an array of doubles):
npy_intp vsize = PyArray_SIZE(v);
std::vector<double> out(vsize);
for (int i = 0; i < vsize; i++) {
out[i] = *reinterpret_cast<double*>(PyArray_GETPTR1(v, i));
}
One could also do a bulk memcpy-like operation, but keep in mind that NumPy ndarrays may be mis-aligned for the data type, have non-native byte order, or other subtle attributes that make such copies less than desirable. But assuming that you are aware of these, one could do:
npy_intp vsize = PyArray_SIZE(v);
std::vector<double> out(vsize);
std::memcpy(out.data(), PyArray_DATA(v), sizeof(double) * vsize);
Using either approach, out now contains a copy of the ndarray's data, and you can manipulate it however you like. Keep in mind that, unless you really need the data as a std::vector, the NumPy C API may be perfectly fine to use in your extension as a way to access and manipulate the data. That is, unless you need to pass the data to some other function which must take a std::vector or you want to use C++ library code that relies on std::vector, I'd consider doing all your processing directly on the native array types.
As to your last question, one generally uses PyArg_BuildValue to construct a tuple which is returned from your extension functions. Your tuple would just contain two ndarray objects.