How I Use Vagrant and Docker in Consultancy Projects
By Doug Ashton – Data Scientist, UK
Just like you, I like to try out all the latest tech. If there’s a new feature in Shiny, I’ll download the latest version without thinking. I currently have 4 versions of R on my laptop, 270 packages, 2 versions of Java, and a number of other open source tools. While being on the cutting edge is part of my job, it conflicts with the strict audit and reproducibility requirements that we have for project work.
One problem with R is that, because CRAN changes so quickly, it can be difficult to keep a consistent combination of packages across your team and production servers. The R community has responded to this problem with a number of noteworthy packages for managing package libraries, such as packrat, checkpoint, switchr and our own pkgsnap. Another approach is to use the MRAN mirror to freeze CRAN to a particular date.
A bigger problem is how R interacts with the various system dependencies you have installed. This is why, at Mango, we use continuous integration and unit testing to make sure our results are reproducible on dedicated build servers. Even this can leave you scratching your head when test results don’t match.
All this led us to look for a better way of working. We needed an environment that was easily reproducible, and more in line with the production environment we are deploying to. We’ve already been using Docker for some time so this was the natural choice.
Docker
As described in a previous post, Docker is designed to provide an isolated, portable and repeatable wrapper around your applications. We use this in a number of ways:
1. Reproducible environments
Each project can run inside its own container, completely sandboxed from the rest of your system. We have a number of base images, each built on a specific R version and provisioned with a standard set of packages (using our pkgsnap package) and RStudio Server. Each project can build on one of these images, adding any project-specific package dependencies. The recipe to build this image is stored in a Dockerfile, which can be saved in the project directory. An example project Dockerfile is shown in this demonstration.
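To make the idea of a per-project recipe concrete, here is a minimal Python sketch that renders the text of a project Dockerfile layered on a base image. The image name `example/r-base:3.2.3`, the package list and the helper `render_dockerfile` are hypothetical placeholders, not Mango's actual images or tooling. The sketch also installs packages straight from CRAN rather than through pkgsnap.

```python
def render_dockerfile(base_image, r_packages, cran_repo="https://cran.r-project.org"):
    """Return Dockerfile text that extends base_image with extra R packages."""
    # FROM pins the project to one base image (and therefore one R version).
    lines = [f"FROM {base_image}"]
    if r_packages:
        # Build an R character vector such as c("dplyr", "ggplot2").
        quoted = ", ".join(f'"{name}"' for name in r_packages)
        lines.append(
            f"RUN Rscript -e 'install.packages(c({quoted}), repos=\"{cran_repo}\")'"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical base image and packages, for illustration only.
    print(render_dockerfile("example/r-base:3.2.3", ["dplyr", "ggplot2"]), end="")
```

Saving the output as `Dockerfile` in the project directory and running `docker build -t my-project .` (the tag `my-project` is again just an example) would then build the project image.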
2. System dependencies
If there are system dependencies such as database connections or external libraries, then building an image with these installed makes it much easier to distribute the project to others. This also makes Docker a great way of trying a new technology without the pain of installing it on your system. For example, the excellent jupyter/all-spark-notebook image has everything you need to get started with Spark from R, Python or Scala.
3. Scalability
Once you’re used to working in containers, scaling up compute power when you need it becomes much easier. Your container will work just the same on your laptop and on a 32-core EC2 instance. You just spin up a node, pull the image and deploy your application. Multiple containers from the same image can be spawned across a grid in seconds, and a small-scale Spark cluster can be swapped out for a much larger one.
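The pull and deploy steps can be sketched as a small Python helper that plans the Docker CLI calls for running several containers from one image. The image name, container names and host ports are hypothetical. The sketch assumes each container serves RStudio Server on its default port 8787, and it only builds the commands rather than executing them.

```python
def plan_containers(image, count, first_host_port=8787, container_port=8787):
    """Return the docker commands (as argv lists) to run `count` containers."""
    # Pull once per node so every container starts from the same image.
    commands = [["docker", "pull", image]]
    for i in range(count):
        commands.append([
            "docker", "run",
            "-d",                                             # run in the background
            "--name", f"worker-{i}",                          # hypothetical naming scheme
            "-p", f"{first_host_port + i}:{container_port}",  # host:container port
            image,
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical image name, for illustration only.
    for argv in plan_containers("example/project:1.0", 3):
        print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run` on the target node. Giving every container its own host port keeps several instances from clashing on the same machine.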
Vagrant
For larger software development projects we also use Vagrant as a tool for reproducible development environments. As described in an earlier post, Vagrant is a set of command line tools for managing virtual machines (VMs). It creates a dedicated VM for each project that is consistent across the development team, and it adds only a small file (the Vagrantfile) to version control.
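As a rough illustration of how little needs to be versioned, this Python sketch renders a minimal Vagrantfile. The box name `ubuntu/trusty64`, the memory size and the choice of the VirtualBox provider are assumptions for the example, not the configuration used on any real project.

```python
def render_vagrantfile(box, memory_mb=2048):
    """Return the text of a minimal Vagrantfile for a VirtualBox VM."""
    return (
        'Vagrant.configure("2") do |config|\n'
        f'  config.vm.box = "{box}"\n'             # base box shared by the whole team
        '  config.vm.provider "virtualbox" do |vb|\n'
        f'    vb.memory = {memory_mb}\n'           # VM memory in megabytes
        '  end\n'
        'end\n'
    )


if __name__ == "__main__":
    # Hypothetical box name, for illustration only.
    print(render_vagrantfile("ubuntu/trusty64"), end="")
```

With a file like this committed to the project repository, each developer can run `vagrant up` to create the VM and `vagrant ssh` to log in to it.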
More resources
Translated from: https://www.pybloggers.com/2015/12/how-i-use-vagrant-and-docker-in-consultancy-projects/