This work focuses on the reliable detection and segmentation of bird vocalizations recorded in the open field. Acoustic detection of avian sounds can be used for the automated monitoring of multiple bird taxa and for querying long-term recordings for species of interest. We tackle these tasks with two approaches: A) DenseNets are applied to weakly labeled data to infer the attention maps of the dataset (i.e., saliency maps and class activation maps, CAM). We push this idea further by directing the attention maps to a detection framework based on the YOLO v2 deep network in order to localize bird vocalizations. B) A deep autoencoder, namely the U-net, maps the audio spectrogram of bird vocalizations to a corresponding binary mask that encircles the spectral blobs of the vocalizations while suppressing other audio sources. We focus solely on procedures that require minimal human attendance and are suitable for scanning massive volumes of data, so that these data can be analyzed, insights and hypotheses evaluated, and patterns of bird activity identified. We hope this approach will be valuable to researchers, conservation practitioners, and decision makers who need to design policies on biodiversity issues.
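As an illustration of how a class activation map turns a weakly labeled classifier into a localizer, the sketch below uses NumPy and assumes a network that ends in global average pooling followed by a linear layer. Under that assumption the CAM for class c is the weighted sum of the final convolutional feature maps, CAM_c = sum_k w_k^c f_k, computed over the frequency-by-time grid. The helper names `class_activation_map` and `cam_to_box`, the 0.5 threshold, and the toy feature maps are hypothetical, chosen for illustration only. This is not the authors' DenseNet or YOLO v2 pipeline; it shows only how a CAM is formed and thresholded into a candidate time-frequency box.

```python
import numpy as np

def class_activation_map(features, class_weights):
    """Compute CAM_c = sum_k w_k^c * f_k (hypothetical helper).

    features: (K, F, T) final conv feature maps, i.e. K channels over a
    frequency-by-time grid. class_weights: (K,) weights of the linear
    classifier for class c that follows global average pooling.
    """
    cam = np.tensordot(class_weights, features, axes=([0], [0]))  # (F, T)
    cam = np.maximum(cam, 0.0)  # keep only positive class evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # normalize to [0, 1]

def cam_to_box(cam, threshold=0.5):
    """Threshold a normalized CAM into a binary mask and return the
    bounding box (f_min, f_max, t_min, t_max) of its active cells,
    or None if no cell reaches the threshold (hypothetical helper)."""
    mask = cam >= threshold
    if not mask.any():
        return None
    rows = np.where(mask.any(axis=1))[0]  # active frequency bins
    cols = np.where(mask.any(axis=0))[0]  # active time frames
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

# Toy example: channel 0 responds to a "vocalization" blob, channel 1
# is uniform background that the classifier weights ignore.
features = np.zeros((2, 8, 10))
features[0, 2:4, 5:8] = 1.0
features[1] = 0.3
weights = np.array([1.0, 0.0])

cam = class_activation_map(features, weights)
print(cam_to_box(cam))  # (2, 3, 5, 7)
```

The box is expressed in feature-map cells; in practice the CAM would first be upsampled to the spectrogram's resolution before its coordinates are mapped back to hertz and seconds.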